From estephen@whosnext.cad.gatech.edu Wed Jul 21 22:03:20 1993
Path: cam-eng!uknet!mcsun!uunet!gatech!cae!usenet
From: estephen@whosnext.cad.gatech.edu (Eric R Stephens)
Newsgroups: comp.infosystems.www
Subject: Hypertext WhitePaper
Message-ID: <22kauo$kh5@cae.cad.gatech.edu>
Date: 21 Jul 93 21:03:20 GMT
Organization: Georgia Institute of Technology, CAE/CAD Lab
Lines: 303
NNTP-Posting-Host: whosnext.cad.gatech.edu

The following white paper was written by Tracy Valleau of LinksWare.
It makes an excellent case for hypertext and presents some solution techniques.
Please respond to LINKSWARE@aol.com

____________________
(Font: Monaco 9)
Management in the Information Age. 
Hypermedia: The Next Generation of Software
Copyright 1993 Tracy Valleau

Software developers have taken advantage of many of the computer's
capabilities, but the industry has yet to deal significantly with what may
well be the single most important aspect of the information revolution: data
management.

         TEXT         SOUND                 TEXT---------SOUND
           |            |                    |    \    /    |
GRAPHICS   |            |   OTHER            |     \  /     |   
        \  |            |  /                 |      \/      |
         \ |            | /                  |      /\      |
          ||            ||                   |     /  \     |
          ||            ||                   |    /    \    |
           ______________                    |   /      \   |
               DATABASE                   GRAPHICS-------OTHER 
               INDEXER                                               
              MULTIMEDIA                       (HYPERMEDIA)

The Problem

Already overwhelmed with information, we are faced with a situation that is
entirely one-sided: as more and more information is generated, software
engineers continue to pursue products whose sole purpose is to create yet
more information.

The information age yields such vast amounts of data (the Library of Congress
receives over 30,000 new documents each day) that it is beyond human
capability to deal with it - and we have turned to computers for help.

Today's software, even that on the "cutting edge" (such as intelligent agents
and minimization through collaboration), is geared toward producing data. The
only apparent exception is the database (or indexer), the old war-horse of
the computer industry.

Unfortunately, the documents generated by word-processors, not to mention
spreadsheets, movies, page-layouts, sounds, scripts, and graphics, do not
lend themselves to insertion into standard databases.

Further, even if it were possible, the effort required to continually insert
such an onslaught of information renders it impractical. Additionally, the
extra storage space requirements would be prohibitive.

Research

The process of retrieving information is research. "Re"-search: to search
again. The searcher (i.e., the user) tries to, and eventually will, construct
a path along which relevant information is found. The process of actually
doing the search, however, sends the user down many a path to a dead end.
Thus, the researcher does indeed "re"-search until one or more relevant paths
are found.

In an ideal environment, no paths would dead-end; all would have some degree
of relevancy. Unfortunately this is not the case. Thus the process of aiding
research is one of reducing the number of irrelevant paths taken.

While current computer databases can index data (much like the card catalog
in a library) such programs merely show where a reference is located. They
cannot and do not settle the problem of whether those references are, in
fact, relevant (to the user's inquiry).

The Old Solutions

In the traditional computerized world, there are only a few approaches and
virtually all have been limited to researching text documents (as opposed to
graphic, sound, movie or script).

The brute force approach: a search engine in the database or indexer parses
every document looking for a particular word. The result will be a list of
documents that contain the word, thus eliminating all others. The researcher
now searches this constrained set. This approach suffers, however, from the
obvious impediment of time: such searches may take many minutes to complete.

The process can be quickened by performing a pre-search. That is, a database
of words can be created in advance of the request, thus eliminating the
computer search time. Once again, the user is presented with a list of
potentially useful files to review.
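
The two approaches above can be sketched in a few lines. This is only an
illustration, not the author's engine: the function names, the tokenizer,
and the three sample documents are all assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a document and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def brute_force_search(documents, word):
    """Scan every document at query time; cost grows with total text size."""
    word = word.lower()
    return sorted(name for name, text in documents.items()
                  if word in tokenize(text))

def build_index(documents):
    """Pre-search: map each word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in tokenize(text):
            index[token].add(name)
    return index

def indexed_search(index, word):
    """Answer a query with a single lookup in the prebuilt index."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical sample data, invented for this sketch.
documents = {
    "memo.txt": "The elephant exhibit opens in June.",
    "budget.txt": "Quarterly budget for the zoo.",
    "notes.txt": "Elephant feeding schedule and notes.",
}
index = build_index(documents)
```

Both searches return the same list of files for a word known when the index
was built. A document added afterward is seen by the brute force scan but
not by the index until the index is rebuilt.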

An even more useful approach is that of the traditional database. By entering
the documents themselves into a special format, the search can be quite
specific, yielding the exact, and only the exact, matches for a wide variety
of search criteria.

Unfortunately, this process alters the original context of the data by
forcing it into the mold required by the database record structure. 

(Some newer databases can include graphics, sound and movies as a field of a
record, but they cannot be used as keys. Specialized databases exist for
graphics and movies alone, but these obviously do not integrate other forms
of data documents.)

Even the hot new multimedia programs are not databases; a multimedia program
(usually) creates a third entity by assembling multiple files, typically
resulting in a file which is much like a 'slide show': a generally linear,
'read-only' presentation file with an intended beginning, middle and end.

The Limitations

1)	Traditional approaches fail to encompass new documents (in all but the
brute force method). That is, the database and pre-index contain only the
documents known at the time of the search. To include new data documents,
they must be specifically entered. (The brute force search overcomes this,
but the time constraint remains prohibitive.)

2)	These processes are not dynamic; they produce some third file or list
based upon a criterion. Adding new data or searching a new path requires a
complete reformulation of the list.

3)	Each technique requires the user to know, in advance of the search, a
key (i.e., word) upon which to base the search. 

4)	The search yields only the same thing as does a search through the
library's card catalog: a list of items for further searching. (Admittedly,
the database yields up the actual card for convenient viewing, but not the
original document.)

5)	The discovery of a possible additional path requires the user to start
the search process all over again.

6)	Searches are limited to text keys and cannot be performed on
information stored as movies, sound, pictures and, particularly, scripts
(files that cause the computer to take some action(s)). 

7)	None of these methods can provide for serendipity. Research keys will
provide exactly and only the matches sought. (In some cases, of course, this
is the desired result. However, in research, this is a stifling limitation.)

8)	Most importantly, all the traditional approaches limit the search to
instances of the key; there is no attempt (by the search engine) to further
whittle down those occurrences into those which are actually or potentially
relevant.

The Need

Given the extraordinary growth rate of available information, merely sorting
the wheat from the chaff is no longer sufficient.

A method must be found that can meet some minimum expectations:

- It must limit searches by relevancy as well as more traditional
criteria.

- It must make allowances for new data.

- It must not require a massive retraining of users.

- It must not duplicate nor alter existing data.

- It must be dynamic.

- It must integrate various types of data.

The Solution

As the diagram at the beginning of this paper suggests, the research "trail"
can best be followed via hypermedia, in which documents are related to one
another rather than to some third database.

There are profound differences between the traditional database or multimedia
approach to data management and the hypermedia approach.

High on the list of differences are (i) that the documents are related, in
their original form, to one another, rather than to some third structure and
(ii) that the hypermedia approach is dynamic.

Combined, these two features alone provide the missing relevancy, the missing
serendipity, and the missing methodology to integrate, manage, research and
retrieve data.

The Process

Using a hypermedia engine to retrieve information is significantly different
from the database approach. First, the researcher uses any original document
as a starting point, rather than starting with a search engine. (A search
engine is useful in a hypermedia system, but in a different fashion.)

Links to related information are indicated within the body of the text -
usually by underline, italics or bold-faced type. (An additional list of
linked words may also be available for direct reference.)

To retrieve the linked file, the researcher need only select the linked word
(again, either within the text body or from a list). Doing so causes the
hypermedia engine to perform the action dictated by the linked file: if the
linked file is text or graphics, it will be retrieved and displayed on the
screen; if sound or movie, it will be played; if a script, it will be
performed. A link can even refer to other information within the same file,
causing that paragraph or section to be displayed.
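
As a rough sketch of the dispatch just described, the action taken can be
keyed on the type of the linked file. The Link record, the ACTIONS table and
the activate function are hypothetical names invented for illustration; the
function only returns a description of the action rather than performing it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Link:
    word: str                      # the linked word the researcher selects
    target: str                    # the file the link points to
    kind: str                      # "text", "graphics", "sound", "movie" or "script"
    section: Optional[str] = None  # optional section within the target

# Assumed mapping from file type to the action named in the text.
ACTIONS = {
    "text": "display",
    "graphics": "display",
    "sound": "play",
    "movie": "play",
    "script": "perform",
}

def activate(link, current_file):
    """Describe what the engine would do when the link is selected."""
    if link.kind not in ACTIONS:
        raise ValueError(f"unknown file type: {link.kind}")
    # A link into the file already being viewed just moves to that section.
    if link.target == current_file and link.section is not None:
        return f"show section {link.section} of {current_file}"
    return f"{ACTIONS[link.kind]} {link.target}"
```

For example, activate(Link("roar", "lion.snd", "sound"), "zoo.txt") yields
"play lion.snd".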

(Note that links can be added and retrieved from virtually all types of
files, although in some cases it makes little sense [e.g., adding links to
scripts].)

The user can add or remove links with ease, so that should a new document
seem relevant to a subject, adding the link from the subject to that document
is made as simple as possible (e.g., clicking on a screen button or selecting
a menu item).
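
One way to keep link editing this cheap is to hold the links in a table
outside the documents, so that adding or removing a link never touches the
original files. The LinkTable class below is an assumed design for
illustration only, not a description of the LinksWare engine.

```python
class LinkTable:
    """Links stored apart from the documents they connect."""

    def __init__(self):
        # (source document, linked word) -> set of target documents
        self._links = {}

    def add(self, source, word, target):
        """Link a word in the source document to a target document."""
        self._links.setdefault((source, word.lower()), set()).add(target)

    def remove(self, source, word, target):
        """Remove one link; removing a missing link is a no-op."""
        key = (source, word.lower())
        targets = self._links.get(key, set())
        targets.discard(target)
        if not targets:
            self._links.pop(key, None)

    def targets(self, source, word):
        """List the documents a word in the source document links to."""
        return sorted(self._links.get((source, word.lower()), set()))
```

A screen button or menu item would then need to call only add or remove with
the current document, the selected word, and the chosen target.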

The Advantages

1)	Relevancy can be assured to a significant degree since the links are
created by users of the system rather than a computer search.

2)	Further, by having the hypermedia engine track retrieval and link
frequencies, a priority retrieval system is automatically in place.

3)	Hypermedia is inherently dynamic both for the user and the data. Links
are easily added or changed and new data is automatically included (see
below).

4)	A unique search mechanism is possible. Because of the nature of the
hypermedia engine, a search on a given word can retrieve both documents in
which the word is linked, and the files to which the word is linked. (In this
instance, a minimum of two files are retrieved from a single word - unlike
the database's single file per key.)

5)	It's a natural and intuitive interface, requiring virtually no
retraining and no specialized skills.

6)	It does not require any new documents, new document formats, or changes
to existing data.

7)	Finally, by maintaining a list of linked words, any new document can be
hypermedia link-searched, regardless of whether or not any links have been
specifically made to that document. 

This is the hypermedia expanded search, and it goes a long way to curing the
problem of new data documents. (For example, document A has the word
"elephant" and it is linked to document B. New document C arrives and also
contains the word "elephant". Although no explicit link has been made to the
"elephant" in document C, an implicit link exists to both document A and
document B [through document A].) In this manner, serendipity returns to
research.
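
The link search of advantage 4 and the expanded search of advantage 7 can be
sketched with the elephant example. Everything here is an assumption made
for illustration: the document texts are invented, and the data layout and
function names are not taken from the LinksWare engine.

```python
import re

# Explicit links: (source document, linked word) -> target document.
links = {("A", "elephant"): "B"}

# Invented document contents for the example.
documents = {
    "A": "The elephant is the largest land animal.",
    "B": "Photographs from the savanna.",
    "C": "An elephant was seen near the river.",
}

def words(text):
    """Return the set of lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def link_search(word):
    """Return documents in which the word is linked, and the files it links to."""
    word = word.lower()
    sources = {src for (src, w) in links if w == word}
    targets = {tgt for (src, w), tgt in links.items() if w == word}
    return sorted(sources), sorted(targets)

def expanded_search(doc_name):
    """Infer implicit links for a document from the list of linked words."""
    implicit = {}
    doc_words = words(documents[doc_name])
    for (src, w), tgt in links.items():
        if w in doc_words and src != doc_name:
            implicit.setdefault(w, set()).update({src, tgt} - {doc_name})
    return {w: sorted(found) for w, found in implicit.items()}
```

Here link_search("elephant") returns document A as a source and document B
as a target, while expanded_search("C") reports that "elephant" in the new
document C leads implicitly to both A and B.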

Potential

It is apparent that a hypermedia system must maintain two sets of references:
those to various types of documents (i.e., files); and those to various types
of retrievers (words, graphics, frames, phrases and so on). These two, along
with link and retrieval monitoring, provide a nearly ideal platform upon
which to build various automatic and semi-automatic search engines.
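
A minimal sketch of link and retrieval monitoring follows. The Monitor class
is hypothetical, and scoring a document by the simple sum of its retrievals
and inbound links is an assumption chosen for brevity; the paper does not
specify how priorities are computed.

```python
from collections import Counter

class Monitor:
    """Counts how often each document is linked to and retrieved."""

    def __init__(self):
        self.retrievals = Counter()
        self.inbound_links = Counter()

    def record_link(self, target):
        """Note that a new link to the target was created."""
        self.inbound_links[target] += 1

    def record_retrieval(self, target):
        """Note that the target was retrieved through a link."""
        self.retrievals[target] += 1

    def rank(self, candidates):
        """Order candidates by assumed priority, highest first, ties by name."""
        def score(doc):
            return self.retrievals[doc] + self.inbound_links[doc]
        return sorted(candidates, key=lambda doc: (-score(doc), doc))
```

A search engine built on the two reference sets could pass its result list
through rank so that frequently used documents are offered first.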

Such an engine could operate within a minimized collaborative environment, or
with agents. Equally, it could be used to support a single specialized
software program.

Its dynamic nature also allows for one more significant capability: an
environment in which the data is linked as it is viewed. A software package
using a hypermedia engine could be created wherein the user, in reading
documents or otherwise viewing files, can automatically or manually establish
links from one document to another. Thus, rather than hiring an army of
specialists to establish links in a massive effort, a self-building reference
library of multiple file types could be created in the passive process of
viewing those files. In short, the users both create and update the
hypermedia set.

Summary

The solution to the increasing weight of data in the information age must
involve both humans and computers. To avoid being overwhelmed by the sheer
volume, and yet make that volume useful, requires a new approach to data
manipulation for retrieval.

In order to succeed, such an approach must fit within the existing structure,
and work with standard file formats. It must not require the regeneration nor
alteration of current data.

Unlike multimedia, which produces a linear presentation, hypermedia
establishes connections between multiple files and file types. There is
no beginning, middle or end. While multimedia is like an electronic book,
hypermedia is like an electronic encyclopedia.

A hypermedia engine, in its various implementations, can offer most features
of older methodologies while increasing the range, scope and depth of the
retrieval process. Such an approach is simple, elegant, intuitive and
self-supporting. It can be made to solve many of the problems of data
retrieval, including new data documents, relevancy, and serendipity.

The sooner we come to grips with the needs of data management, the less
danger we face. In the sheer mass of data we are creating, let us not injure
ourselves by that which we overlook.

The Author

This paper was prepared by Tracy Valleau, president of LinksWare Corporation
and author of the LinksWare hypermedia engine that performs the functions
listed above. LinksWare and the hypermedia engine are covered by copyrights.
Patent disclosures have been filed. Please contact Mr. Valleau directly for
more information. This paper is copyright 1993 Tracy Valleau. Permission to
reproduce this document, except for profit, is hereby granted.

Tracy Valleau 
LinksWare Corporation 
641 Lily Street, Monterey, CA 93940-1631
(408) 372-4155     fax (408) 646-9104 
AppleLink, America OnLine, GEnie: LINKSWARE
CompuServe: 72103,2073
Internet: LINKSWARE@aol.com


