Date: Thu, 9 Jul 1992 18:15:32 EDT
From: Elaine Brennan & Allen Renear
Subject: 6.0123 R: E-Text Projects in Humanities (1/154)
Sender: "HUMANIST: Humanities Computing"

Humanist Discussion Group, Vol. 6, No. 0123. Thursday, 9 Jul 1992.

Date: Thu, 9 Jul 1992 17:04 EDT
From: MFRIEDMAN@GUVAX.BITNET
Subject: Text Projects in the Humanities

In response to Professor Maurizio Lana's inquiry about information resources for electronic text projects in the humanities, the following is some background on the Georgetown Center for Text and Technology's catalogue of projects. I plan to send Dr. Lana a copy of our list of electronic text projects, and I invite any other interested persons to contact me at MFRIEDMAN@guvax.georgetown.edu, either with information on projects or to receive a copy of the list.

Since April of 1989, the Center for Text and Technology (CTT), under the aegis of the Academic Computer Center at Georgetown University, has been compiling a catalogue of projects that create and analyze electronic text in the humanities. The Georgetown University Catalogue of Projects in Electronic Text is a database that documents electronic text projects throughout the world. It records a variety of information on the many collections of literary works, historical documents, and linguistic data that are available from commercial vendors and scholarly sources.
The database is written in Ingres and resides on a VAX 8700 computer at Georgetown University. Off-campus users can search it by connecting through Telnet or a modem.

The electronic text projects documented in the database are machine-readable files of primary materials from humanities disciplines. Whether entered by keyboarding or by scanning with an optical character reader, these text files generally take the form either of large corpora for linguistic analysis (such as the new British National Corpus of one hundred million words currently being developed by Oxford University Press and others) or of major works of major authors for analysis of style and content (such as the compact disc of the Thesaurus Linguae Graecae containing 1400 years of classical Greek texts). The catalogue does not include electronic versions of encyclopedias, dictionaries, or secondary studies, nor does it include concordances, databases, and computer-assisted instruction programs that do not contain full-text versions of primary works; these materials are beyond the scope of this project.

Unlike the databases that research libraries often make available, the electronic texts catalogued at Georgetown are intended by their developers to be searched and manipulated directly by humanists. Often, therefore, the text is encoded with a markup language to facilitate integration with other files; occasionally, the texts are combined with a commercial text-analysis tool such as WordCruncher, Folio Views, or Micro-OCP. With electronic text and integrated analysis software, the researcher not only has the equivalent of an interactive concordance for finding instances of key words but can also search for clusters of words, exact phrases, and co-occurrences of key words (combined with Boolean operators) in contexts of various sizes.
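The co-occurrence search described above can be illustrated with a minimal sketch. It assumes a plain-text source and very simple word tokenization; the function names and the `window` parameter are hypothetical and do not reflect the interface of WordCruncher, Folio Views, Micro-OCP, or any other package.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurrences(text, term_a, term_b, window=10):
    """Find places where term_a AND term_b occur near each other.

    Returns a list of (position, context) pairs, one for each
    occurrence of term_a that has term_b within `window` words
    on either side. `window` sets the size of the context.
    """
    words = tokenize(text)
    hits = []
    for i, word in enumerate(words):
        if word != term_a:
            continue
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        context = words[start:end]
        if term_b in context:
            hits.append((i, " ".join(context)))
    return hits


line = "Time and the hour runs through the roughest day"
print(cooccurrences(line, "time", "hour", window=3))
```

Widening or narrowing `window` corresponds to searching "contexts of various sizes"; an OR or NOT search would change only the membership test on the context.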
Statistical programs show where the desired term or concept is concentrated in a work or series of works, and parsing programs can analyze parts of speech and syntactic structures.

In general, therefore, the combination of electronic text and searching software can be said to provide the researcher with both microscopic and macroscopic views of the text. The former provides access to small-scale features of a single work; for example, within seconds, a philosopher could locate the single occurrence of the phrase "consciousness of absolute being" in the nine-megabyte, three-volume translation of Hegel's Lectures on the Philosophy of Religion. By contrast, the macroscopic view of the text highlights the ways in which one work differs from other works by the same author or by the author's contemporaries; for example, if one searches an eleven-megabyte file of Shakespeare's works for the word "time," one finds a greater concentration in Macbeth than in the other tragedies, and by exploring the contexts, one can see how the title character's over-reaching can be explained thematically in terms of his attempt to usurp the providential function that belongs to time.

Given these advantages, it is not surprising that the conversion of primary texts to electronic form is proliferating throughout the world. Nevertheless, because of the unpublicized, academic nature of such projects, locating them can be difficult. For this reason, we rely heavily on the discussion groups on BITNET and Internet, not only to identify new projects but also to request information about them and to disseminate the material we compile. Electronic mail provides access to the most recent developments and permits us to receive and transmit information throughout the world quickly and economically.
Among the sixty discussion groups we monitor are those in language and literature (Ansax-L, C18-L, Chaucer, Contex-L, English, Ficino, Linguist, Litera-L, Literary, Reed-L, Rustex-L, Shakesper, and Wwp-L), culture and religion (Ccnet-L, Indology, Japan, Judaica, and Religion), libraries (Cdrom-L, Fisc-L, Libref-L, Pacs-L, and Tei-L), philosophy and history (History, Philos-L, and Philosop), and the humanities in general (Erl-L, Gutnberg, Humanist, and Pmc-Talk).

In our search for news of projects, we also review a wide range of publications, including popular magazines and newspapers (such as the Chronicle of Higher Education), agency reports (such as the List of Awards of the National Endowment for the Humanities), trade publications (including InfoWorld and EDU Magazine), discipline-specific journals (such as Computers and Philosophy and Computers and the Classics), the newsletters of numerous academic computing centers, and the journals central to humanities computing (Computers and the Humanities, Bits and Bytes Review, and the ICAME Journal).

Once we have identified a new project, we request ten categories of information:

0. Identifying acronym or short reference;
1. Name and affiliation of operation (including collaborators) with references to any published description;
2. Contact person and/or vendor with addresses;
3. Primary disciplinary focus (and secondary interests);
4. Focus: time period, geographical area, or individual;
5. Language(s) coded;
6. Intended use(s) and size (number of works, or entries, or citations);
7. File format(s);
8. Form(s) of access (online, tape, diskette, CD-ROM, etc.);
9. Source(s) of the archival holdings: encoded in-house, or obtained from elsewhere.

Because the catalogue is constantly being updated, any printing would be almost immediately obsolete.
Consequently, the CTT has converted the catalogue to an online database searchable through Telnet and dial-in access so that current information can be made available to researchers. In addition, searches of the catalogue are performed on request, and updated lists of projects and addresses are posted regularly on the HUMANIST electronic bulletin board and distributed through surface and electronic mail.

For further information about the project, or to request a specific search, please contact:

Margaret Friedman, Project Assistant
The Center for Text and Technology
Academic Computer Center
238 Reiss Science Building
Georgetown University
Washington, DC 20057
(202) 687-6096
BITNET: mfriedman@guvax
Internet: mfriedman@guvax.georgetown.edu