From:	IN%"EDITORS@BROWNVM.BITNET"  "Elaine Brennan & Allen Renear" 10-JUL-1992 06:24:49.36
To:	IN%"B071767@vax.csc.cuhk.hk"  "Tze-wan Kwan"
CC:	
Subj:	6.0123  R:  E-Text Projects in Humanities  (1/154)

Date: Thu, 9 Jul 1992 18:15:32 EDT
From: Elaine Brennan & Allen Renear <EDITORS@BROWNVM.BITNET>
Subject: 6.0123  R:  E-Text Projects in Humanities  (1/154)
Sender: "HUMANIST: Humanities Computing" <HUMANIST@BROWNVM.BITNET>
To: Tze-wan Kwan <B071767@vax.csc.cuhk.hk>
Reply-to: Elaine Brennan & Allen Renear <EDITORS@BROWNVM.BITNET>
Message-id: <01GM7CGDKU6O8WW5RP@vax.csc.cuhk.hk>

Humanist Discussion Group, Vol. 6, No. 0123. Thursday, 9 Jul 1992.
 
Date:    Thu, 9 Jul 1992 17:04 EDT
From:    MFRIEDMAN@GUVAX.BITNET
Subject: Text Projects in the Humanities
 
In response to Professor Maurizio Lana's inquiry about information resources
for electronic text projects in the humanities, the following is some
background on the catalogue of projects compiled by the Georgetown Center
for Text and Technology.  I plan to send Dr. Lana a copy of our list of
electronic text projects, and I invite any other interested persons to
contact me at MFRIEDMAN@guvax.georgetown.edu, either with information on
projects or to receive a copy of the list.
 
 
    Since April of 1989, the Center for Text & Technology (CTT), under
    the aegis of the Academic Computer Center at Georgetown
    University, has been compiling a catalogue of projects that create
    and analyze electronic text in the humanities.  The Georgetown
    University Catalogue of Projects in Electronic Text is a database
    of electronic text projects throughout the world, describing the
    many collections of literary works, historical documents, and
    linguistic data that are available from commercial vendors and
    scholarly sources.  The database is built with Ingres and resides
    on a VAX 8700 computer at Georgetown University.  Off-campus users
    can search it by connecting through Telnet or by modem.
 
    The electronic texts documented in the database are
    machine-readable files of primary materials from humanities
    disciplines.  Whether entered by keyboarding or by scanning with
    an optical character reader, these text files generally take one
    of two forms: large corpora for linguistic analysis (such as the
    new British National Corpus of one hundred million words currently
    being developed by Oxford University Press and others) or the
    major works of important authors for analysis of style and content
    (such as the compact disc of the Thesaurus Linguae Graecae
    containing 1400 years of ancient Greek texts).  The catalogue
    excludes electronic encyclopedias, dictionaries, and secondary
    studies, as well as concordances, databases, and computer-assisted
    instruction programs that do not contain full-text versions of
    primary works; these materials are beyond the scope of this
    project.
 
    Unlike the databases that research libraries often make available,
    the electronic texts catalogued at Georgetown are intended by their
    developers to be searched and manipulated directly by humanists.
    Often, therefore, the text is encoded with a markup language to
    facilitate integration with other files; occasionally, the texts
    are packaged with a commercial text-analysis tool such as
    WordCruncher, Folio Views, or Micro-OCP.
 
    With electronic text and integrated analysis software, the
    researcher not only has the equivalent of an interactive
    concordance for finding instances of key words but can also search
    for clusters of words, exact phrases, and co-occurrences of key
    words (combined with Boolean operators such as AND, OR, and NOT)
    in contexts of various sizes.
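 
    The kinds of search just described can be sketched in a few lines
    of Python.  This is a minimal illustration, not the method used by
    WordCruncher, Folio Views, or Micro-OCP: it assumes plain-text input,
    a naive lower-case tokenizer, and hypothetical function names
    (phrase_positions, near, context).  The window parameter stands for
    the "context size," and negate=True turns AND into NOT.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lower-case word tokens (a deliberately naive rule)."""
    return re.findall(r"[a-z']+", text.lower())


def phrase_positions(tokens: List[str], phrase: str) -> List[int]:
    """Start positions of an exact multi-word phrase."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def near(tokens: List[str], word: str, other: str, window: int,
         negate: bool = False) -> List[int]:
    """Positions of `word` with `other` within `window` tokens (word AND other).

    With negate=True, keep only positions where `other` is absent
    from the window (word NOT other).
    """
    other_positions = [j for j, t in enumerate(tokens) if t == other]
    hits = []
    for i, t in enumerate(tokens):
        if t != word:
            continue
        found = any(abs(i - j) <= window for j in other_positions)
        if found != negate:  # keep when found XOR negate
            hits.append(i)
    return hits


def context(tokens: List[str], i: int, size: int) -> str:
    """Return `size` tokens on either side of position i, like a concordance line."""
    return " ".join(tokens[max(0, i - size):i + size + 1])


tokens = tokenize("Tomorrow, and tomorrow, and tomorrow, creeps in this petty pace")
print(phrase_positions(tokens, "tomorrow and tomorrow"))          # [0, 2]
print(near(tokens, "tomorrow", "creeps", window=1))               # [4]
print(near(tokens, "tomorrow", "creeps", window=1, negate=True))  # [0, 2]
print(context(tokens, 4, 2))  # tomorrow and tomorrow creeps in
```

    A real text-analysis package would index the text once rather than
    rescanning it for every query, but the logic of the query is the same.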
 
    Statistical programs show where the desired term or concept is
    concentrated in a work or series of works, and parsing programs
    can analyze parts of speech and syntactic structures.
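 
    One simple way to show where a term is concentrated is to count its
    occurrences per thousand words in each segment (each work, act, or
    chapter), so that longer segments do not look richer in the term
    merely because they contain more words.  The sketch below assumes
    that measure and a hypothetical concentration() function; actual
    statistical programs may use other measures.  The three toy
    segments are illustrative strings, not real texts.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens (a deliberately naive rule)."""
    return re.findall(r"[a-z']+", text.lower())


def concentration(segments: Dict[str, str], term: str,
                  per: int = 1000) -> List[Tuple[str, float]]:
    """Occurrences of `term` per `per` tokens in each segment, densest first.

    Normalizing by length keeps a long work from looking richer in a term
    merely because it contains more words.
    """
    rates = []
    for name, text in segments.items():
        tokens = tokenize(text)
        if not tokens:
            continue  # skip empty segments rather than divide by zero
        rates.append((name, tokens.count(term) * per / len(tokens)))
    rates.sort(key=lambda pair: pair[1], reverse=True)
    return rates


segments = {
    "A": "time and time again",
    "B": "no clock here at all",
    "C": "time flies fast",
}
for name, rate in concentration(segments, "time"):
    print(f"{name}: {rate:.1f} per 1,000 words")
# A: 500.0 per 1,000 words
# C: 333.3 per 1,000 words
# B: 0.0 per 1,000 words
```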
 
    In general, therefore, the combination of electronic text and
    searching software can be said to provide the researcher with both
    microscopic and macroscopic views of the text.  The former
    provides access to small-scale features of a single work; for
    example, within seconds, a philosopher could locate the single
    occurrence of the phrase "consciousness of absolute being" from
    the nine-megabyte, three-volume translation of Hegel's Lectures on
    the Philosophy of Religion.  By contrast, the macroscopic view of
    the text highlights the ways in which one work differs from other
    works by the same author or the author's contemporaries; for
    example, if one searches an eleven-megabyte file of Shakespeare's
    works for the word 'time,' one finds a greater concentration in
    Macbeth than in the other tragedies, and by exploring the
    contexts, one can see how the title character's over-reaching can
    be explained thematically in terms of his attempt to usurp the
    providential function that belongs to time.
 
    Given these advantages, it is not surprising that the conversion
    of primary texts to electronic form is proliferating throughout
    the world.  Nevertheless, because such academic projects are often
    unpublicized, locating them can be difficult.  For this reason, we
    rely heavily on the discussion groups on BITNET and the Internet,
    not only to identify new projects but also to request information
    about them and to disseminate the material we compile.  Electronic
    mail provides access to the most recent developments and lets us
    receive and transmit information throughout the world quickly and
    economically.  Among the sixty discussion groups we monitor are
    those in language and literature (Ansax-L, C18-L, Chaucer,
    Contex-L, English, Ficino, Linguist, Litera-L, Literary, Reed-L,
    Rustex-L, Shakesper, and Wwp-L), culture and religion (Ccnet-L,
    Indology, Japan, Judaica, and Religion), libraries (Cdrom-L,
    Fisc-L, Libref-L, Pacs-L, and Tei-L), philosophy and history
    (History, Philos-L, and Philosop), and the humanities in general
    (Erl-L, Gutnberg, Humanist, and Pmc-Talk).
 
    In our search for news of projects, we also review a wide range of
    publications, including popular magazines and newspapers (such as
    the Chronicle of Higher Education), agency reports (such as the
    List of Awards of the National Endowment for the Humanities),
    trade publications (including InfoWorld and EDU Magazine),
    discipline-specific journals (such as Computers and Philosophy and
    Computers and the Classics), the newsletters of numerous academic
    computing centers, and the journals central to humanities
    computing (Computers and the Humanities, Bits and Bytes Review,
    and the ICAME Journal).
 
    Once we have identified a new project, we request ten categories
    of information:
 
    0.  Identifying acronym or short reference;
 
    1.  Name and affiliation of operation (including collaborators)
        with references to any published description;
 
    2.  Contact person and/or vendor with addresses;
 
    3.  Primary disciplinary focus (and secondary interests);
 
    4.  Focus: time period, geographical area, or individual;
 
    5.  Language(s) coded;
 
    6.  Intended use(s) and size (number of works, entries, or
        citations);
 
    7.  File format(s);
 
    8.  Form(s) of access (online, tape, diskette, CD-ROM, etc.);
 
    9.  Source(s) of the archival holdings: encoded in-house or
        obtained from elsewhere.
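 
    The ten categories amount to a fixed record layout.  As a minimal
    sketch in Python, each catalogue entry could be modelled as a
    dataclass with a simple search that filters on any field.  The field
    names and the search() function are hypothetical (the CTT's actual
    Ingres tables are not described here), and the two records in the
    example are placeholders, not real catalogue entries.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProjectRecord:
    """One catalogue entry; hypothetical field names, one per category."""
    acronym: str             # 0. identifying acronym or short reference
    name_affiliation: str    # 1. operation, collaborators, published descriptions
    contact: str             # 2. contact person and/or vendor, with addresses
    disciplines: List[str]   # 3. primary focus first, then secondary interests
    focus: str               # 4. time period, geographical area, or individual
    languages: List[str]     # 5. language(s) coded
    uses_and_size: str       # 6. intended use(s) and size
    file_formats: List[str]  # 7. file format(s)
    access: List[str]        # 8. online, tape, diskette, CD-ROM, etc.
    source: str              # 9. encoded in-house or obtained elsewhere


def search(records: List[ProjectRecord], language: Optional[str] = None,
           access: Optional[str] = None) -> List[ProjectRecord]:
    """Return records matching every criterion that is given (None = any)."""
    return [r for r in records
            if (language is None or language in r.languages)
            and (access is None or access in r.access)]


# Placeholder records for illustration only.
records = [
    ProjectRecord("PROJ-A", "Placeholder project A", "n/a", ["Classics"],
                  "Antiquity", ["Greek"], "placeholder", ["plain text"],
                  ["CD-ROM"], "encoded in-house"),
    ProjectRecord("PROJ-B", "Placeholder project B", "n/a", ["Linguistics"],
                  "Modern English", ["English"], "placeholder", ["SGML"],
                  ["online", "tape"], "obtained from elsewhere"),
]
print([r.acronym for r in search(records, language="Greek")])  # ['PROJ-A']
print([r.acronym for r in search(records, access="online")])   # ['PROJ-B']
```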
 
 
    Because the catalogue is constantly being updated, any printed
    version would be obsolete almost immediately.  Consequently, the
    CTT has made the catalogue available as an online database,
    searchable through Telnet and dial-in access, so that researchers
    have current information.  In addition, searches of the catalogue
    are performed on request, and updated lists of projects and
    addresses are posted regularly on the HUMANIST electronic bulletin
    board and distributed by surface and electronic mail.
 
    For further information about the project, or to request a
    specific search, please contact:
 
        Margaret Friedman, Project Assistant
        The Center for Text and Technology
        Academic Computer Center
        238 Reiss Science Building
        Georgetown University
        Washington, DC 20057
 
        (202) 687-6096
        BITNET: mfriedman@guvax    Internet: mfriedman@guvax.georgetown.edu