From:	IN%"linguist@tamsun.tamu.edu"  "The Linguist List"  3-MAY-1993 23:40:21.15
To:	IN%"LINGUIST@TAMVM1.BITNET"  "Multiple recipients of list LINGUIST"
CC:	
Subj:	4.314 Sum: Spanish corpora

Date: Tue, 27 Apr 1993 08:35:31 -0500
From: The Linguist List <linguist@tamsun.tamu.edu>
Subject: 4.314 Sum: Spanish corpora
Sender: The LINGUIST Discussion List <LINGUIST@TAMVM1.BITNET>
To: Multiple recipients of list LINGUIST <LINGUIST@TAMVM1.BITNET>
Reply-to: The Linguist List <linguist@tamsun.tamu.edu>
Message-id: <01GXR0KO4II68WWAZM@vax.csc.cuhk.hk>
Comments: To: linguist@tamvm1.tamu.edu

----------------------------------------------------------------------
LINGUIST List:  Vol-4-314. Tue 27 Apr 1993. ISSN: 1068-4875. Lines: 139
 
Subject: 4.314 Sum: Spanish corpora
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar@tamuts.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry@emunix.emich.edu>
 
Asst. Editor: Ron Reck <rreck@EMUNIX.EMICH.EDU>
 
-------------------------Directory-------------------------------------
 
1)
Date: Mon, 26 Apr 93 23:44:16 EST
From: decio@mace.cc.purdue.edu (Gabriel Decio)
Subject: summary--Spanish corpora
 
-------------------------Messages--------------------------------------
1)
Date: Mon, 26 Apr 93 23:44:16 EST
From: decio@mace.cc.purdue.edu (Gabriel Decio)
Subject: summary--Spanish corpora
 
Thanks to all who responded to my query on Spanish corpora available
online.  Below is a summary of the responses I got.
 
Text begins=============================================================
 
********************* Text Corpora List: Addresses ***************************
CORPORA@NORA.HD.UIB.NO          for messages to the list
CORPORA-REQUEST@NORA.HD.UIB.NO  for messages to list administrator
FILESERV@NORA.HD.UIB.NO         for requests to file server (try sending HELP)
******************************************************************************
 
I'm looking for online Spanish corpora, preferably newspaper or
magazine articles.  I've heard there is a collection at the University
of Miami, but I haven't been able to find it.  Can anyone help me out?
BTW, I already know what is available in the Oxford Text Archive.
 
 ----------------------------------------------------------------
        Doug McKee              E-mail: mckeed@sra.com
        SRA Corp.               Phone: (703) 558-7820
        2000 15th St. N         Fax: (703) 558-4723
        Arlington, VA 22201
        USA
 ----------------------------------------------------------------
 
========================================================================
I would like to mention the Catalogue of Projects in Electronic Text (CPET)
at Georgetown University, Washington DC. This catalogue can be accessed
via Telnet to: guvax3.georgetown.edu with username: CPET (you will need
VT-100 keys).
 
A manual can be fetched from our fileserver (FILESERV@NORA.HD.UIB.NO)
by sending
 
send info cpet.manual
 
either as the subject or the only line in the message.
 
A list of Romance-language projects (as of Feb. 1991, 64 KB) can be
fetched from the file server with the line:
 
send info roman.projects
 
For further information about CPET, contact
Margaret Friedman (mfriedman@guvax.georgetown.edu)
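
The fileserver requests described above can also be composed in a script.
What follows is only a minimal Python sketch: the function name
fileserver_request and the sender address are hypothetical, and it takes
the option of putting the command as the only line of the message body.
It builds the message but does not send it; delivery (for example via
smtplib) depends on the local mail setup, and the fileserver address
dates from 1993 and may no longer be in service.

```python
from email.message import EmailMessage

# Fileserver address as given in the message above.
FILESERVER = "FILESERV@NORA.HD.UIB.NO"


def fileserver_request(sender: str, command: str) -> EmailMessage:
    """Build a mail message whose only body line is the fileserver command."""
    msg = EmailMessage()
    msg["From"] = sender          # hypothetical sender, supplied by the caller
    msg["To"] = FILESERVER
    msg.set_content(command)      # the command is the only line of the body
    return msg


# Example: request the list of Romance-language projects.
request = fileserver_request("you@example.org", "send info roman.projects")
print(request)
```

Swapping the command for "send info cpet.manual" would request the CPET
manual instead.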
 
==================================================================
 
There is a Swedish archive at Gothenburg University containing Spanish
newspaper and magazine articles. Please contact:
        David Mighetto  <mighetto@rom.gu.se>
 
===================================================================
 
Concerning English corpora, I'd like to mention that I wrote a survey
of electronic corpora and related resources which will be published in
the book "Talking Data:  Transcription and coding in discourse research",
Edwards & Lampert, Erlbaum Publishers, due out April 15.
 
Other surveys are available through:
the ICAME archive (anonymous ftp to nora.hd.uib.no), and
CPET (cited in the preceding message).
There is also the Oxford Text Archive, which specializes, however, in
literature and Biblical texts:  anonymous ftp to black.ox.ac.uk.
 
Hope that helps.
 
=======================================================================
 
There are some literary works available electronically from Project
Gutenberg. You can get them via anonymous ftp: just ftp to 128.174.201.12
and, once logged in, enter "cd etext/etext92" (or "etext91" or "etext93").
Among their offerings are works like "Moby Dick" and "Through the Looking
Glass". I think they even have Clinton's Inaugural Address.
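
The anonymous ftp steps above could be scripted roughly as follows. This
is a hedged sketch using Python's standard ftplib, not a tested recipe:
the function names list_etexts and fetch_listing are hypothetical, the
directory names assume that "etext91" and "etext93" sit under "etext/"
like "etext92", and the numeric address is the one quoted in the message,
which may well no longer answer.

```python
from ftplib import FTP

# Address as quoted in the message; it may no longer be reachable.
HOST = "128.174.201.12"
# Assumption: all three yearly directories live under "etext/".
ETEXT_DIRS = ["etext/etext91", "etext/etext92", "etext/etext93"]


def list_etexts(ftp, directory):
    """On an already logged-in FTP connection, cd to `directory`
    and return the file names found there."""
    ftp.cwd(directory)
    return ftp.nlst()


def fetch_listing(directory, host=HOST):
    """Open an anonymous ftp session, list one directory, and close."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        return list_etexts(ftp, directory)


# Usage (needs network access and a live server, so it is not run here):
#   print(fetch_listing("etext/etext92"))
```

Keeping list_etexts separate from the connection code means it can be
tried against a stand-in object without touching the network.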
 
I've also been looking for e-texts in Spanish, but without much luck. I
have some newspaper articles and some interviews that someone was kind
enough to send me once. (I posted a query on Linguist about Spanish corpora a
while back.)
 
============================================================================
 
There are zillions of e-texts! Here are a few sources.
1. The Oxford Text Archive: I can send you their catalogue and order
   form. They have *lots* of texts in several languages. They will FTP
   the texts to you free over the Internet.
2. Georgetown Catalogue of Projects in Electronic Text (CPET): there was
   a posting on LINGUIST not too long ago ... if you have access to Gopher,
   you can find it under 'North America', 'Washington DC'.
3. Commercial: in catalogues such as MacWarehouse, you can find CD-ROMs
   of text like 'Front Page News'.
4. ACL/DCI: they have a CD-ROM with over a million words of Dow Jones
   or the Wall Street Journal (or both? I forget).
5. The Linguistic Data Consortium (LDC): lots of non-literary e-corpora,
   including transcriptions of spoken data.
6. ICAME: they have a CD-ROM of famous e-corpora + tools (concordances
   and stuff) that goes for about $500, and includes the Brown corpus,
   the LOB corpus, the London-Lund corpus, and the Helsinki Diachronic
   corpus (see 'corpus' and these entries in the Oxford Companion to
   the English Language).
7. The CHILDES database - caretaker and child language in several
   different languages.
 
End of text============================================================
 
--
--------decio@mace.cc.purdue.edu----------------------------------------
|Gabriel A. Decio               |   XX     XXX     XXX     XXX     XX  |
|Dept. of English               |    XX   XX XX   XX XX   XX XX   XX   |
|Purdue University              |     XX XX   XX XX   XX XX   XX XX    |
|West Lafayette, IN             |      XXX     XXX     XXX     XXX     |
------------------------------------------------------------------------
LINGUIST List: Vol-4-314.
