Visual Resources Association Conference
Los Angeles, CA
February 12, 1999
Digital Reference: Building and Rebuilding
Ben Howell Davis, Manager, Communications, Getty Information Institute
The digital information age seems to be in a
constant state of becoming. The speed at which new technologies affect
intellectual pursuits is blinding, and the relationship between complex digital
technology and cultural endeavors is anything but simple.
Cultural materials have special needs and require
special schema. Content, audience, software architecture, and visual design are
at once nouns and verbs in the new digital world. The Internet, for instance,
is a different medium. Although it metaphorically has kinship with books,
television networks, movies, CD-ROMs, etc., it is not the same as any of them.
It is a networked computational medium; it makes information perform without
regard for time or distance. Cultural information, then, has special performance
requirements.
A producer of digital reference must
understand what content is being communicated, to whom, how, and what
impression it should make. That process will be reversed and reversed again as
the digital form of the information takes shape. In other words, the content
will be affected by the visual design, the perception of whom it should be
reaching, and how it should perform.
In his book How Buildings Learn: What
Happens After They’re Built, Stewart Brand
looks at how structures first perceived as permanent
in fact change continually over time:
The word
"building" contains the double reality. It means both "the
action of the verb BUILD" and "that which is built" —both verb
and noun, both the action and the result. Whereas "architecture" may
strive to be permanent, a "building" is always building and
rebuilding. The idea is crystalline, the fact fluid. Could the idea be revised
to match the fact?
Cultural inquiry, the context that
surrounds and pervades physical culture, is itself a multi-layered and dynamic
structure. Until the Internet, there had never been a medium that could
communicate such a fluid condition. The Getty Center has produced a number of
digital reference works that have undergone surprising and profound transformations
as they have gone from print to CD-ROM to the Web.
Vocabulary works like the Art and
Architecture Thesaurus (AAT), the Union List of Artist Names (ULAN), and the Getty
Thesaurus of Geographic Names (TGN), and databases like the Getty Provenance Index
and the Bibliography of the History of Art (BHA), have all encountered the new
digital reference domains and flourished, but in ways that no one imagined
before the Internet.
Print reference works have obvious
limitations. They take a long time to produce and update, and works like ULAN
are bound by what James Bower (Manager, Getty Information Institute
Vocabulary Program) calls "the Tyranny of the Preferred Form." Bower
explains:
That is, when we
produced ULAN in print, we had to choose one name form under which to enter
each cluster in the printed alphabetic sequence. Doing so left the suggestion
that there was one "preferred form" for each artist, when actually we
were trying to demonstrate patterns of usage and encourage appropriate
selection of forms according to different users' standards.
Another difficulty with print is the size of
the data collection. The Getty Thesaurus of Geographic Names (TGN), for instance,
would be something like twenty-four volumes if it were printed. Patricia
Harpring, Senior Editor for TGN, notes that vocabularies are often "so big and
complex and have so many cross references that it would be
very difficult to navigate through the material in hard copy." In
addition, it is virtually impossible to sort data in different ways in print
because you can’t search across it all at once. Nor is it possible
to accept contributions to these kinds of projects in any timely fashion
when print publication must be scheduled months, often years, in advance.
Harpring does point out some of the positive
aspects of print for high-volume reference works:
People tell us it
is easier to comprehend the big picture of relationships in hierarchies,
especially the AAT, in hard copy. They say they often refer to the book to get
an idea of the layout of the hierarchies. Some people don't have access to a
computer at all. Books are generally more practical for libraries that don't
have adequate computer availability for all their patrons. Many people just
like the look and feel of a book.
Burton Fredericksen, Director of the Getty
Provenance Index, also reminds us that electronic databases can be aggregated
and searched, whereas books must be looked at one at a time. The editing of printed
books is very formal and precise, leaving no room for works in progress,
informal unedited work, or casual standards that an audience or contributor can
react to directly.
CD-ROMs have many of these same limitations.
They require a deadline-driven production schedule, are fixed bodies of information
that are out of date as soon as they are produced, can’t be updated, and are
expensive to produce and market. And, as James Bower states, "Users of the
CD editions simply transcribe (or cut-and-paste) terms from a vocabulary into
their local database, without tracking any elements that would help them update
their data if the vocabulary changed."
Although CDs have the advantage of allowing
a variety of searching and cross-referencing capabilities, the size of the data
set is still a concern. TGN, for instance, is so large it will not fit on
current CD-ROM media. It would require more than one disc and, with that, expensive
packaging. DVD technology may remedy this, but how long will it take DVD to
become a standard format?
Cumulative reference works like the
vocabularies and databases described are never finished. They are great examples
of works in a constant state of becoming. They must be built and rebuilt for
available technologies. With international collaborators, global data
resources, and cross-cultural partners, these kinds of works are often
multilingual as well. Media like print and CD-ROM offer advantages and
disadvantages in terms of usability and maintenance. Print and CD-ROM are also
tangible things; they can be bought and sold. This last reason for using them
may be the most pragmatic of all. Cost recovery for producing material like
this is marginal at best, so de-materializing the final product might be
unthinkable.
With the creation of the Mosaic browser, the
World Wide Web became the public face of the Internet. Cumulative reference works
now have a vast platform for collection, distribution, presentation, and
evaluation: the perfect situation for building and rebuilding information
resources that require cross-referencing, updating, user contribution, and
constant maintenance. No other medium had ever come close to satisfying the
demands such works make in order to be fully functional.
As an experiment, researchers at the Getty
Information Institute built a system called a.k.a. that took advantage of an
early Internet searching protocol called WAIS (Wide Area Information Servers).
The a.k.a. system allowed the simultaneous search of a number of
databases mounted on a local server. The International Repertory of the
Literature of Art (RILA, one of the early bibliographic databases that now
make up the BHA), the Getty Provenance Index (Sales Catalogues and Sales
Content), and the Avery Index to Architectural Periodicals (Avery) could now be
queried by keyword, with a list of the relevant material in each database
displayed in response. The implication of the experiment was that databases resident on
the Internet could be searched simultaneously from any user’s workstation
anywhere on the Net.
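The idea of one keyword query fanned out across several databases can be sketched in a few lines. This is only an illustration of the principle, not the a.k.a. implementation and not the WAIS protocol: the in-memory "databases" and their records are invented placeholders, and the function name search_all is an assumption.

```python
from typing import Dict, List

# Hypothetical stand-ins for the mounted databases.
# The records are invented for illustration, not taken from the real resources.
DATABASES: Dict[str, List[str]] = {
    "RILA": ["Baroque altarpieces in Rome", "Dutch landscape painting"],
    "Provenance Index": ["Sale of a Dutch landscape, Amsterdam 1780"],
    "Avery": ["Baroque church facades", "Modern housing in Rotterdam"],
}


def search_all(keyword: str,
               databases: Dict[str, List[str]] = DATABASES) -> Dict[str, List[str]]:
    """Run one keyword query against every database and group hits by source."""
    needle = keyword.lower()
    results: Dict[str, List[str]] = {}
    for name, records in databases.items():
        # Case-insensitive substring match; a real system would use an index.
        hits = [record for record in records if needle in record.lower()]
        if hits:
            results[name] = hits
    return results


if __name__ == "__main__":
    for source, hits in search_all("dutch").items():
        print(source, hits)
```

Grouping the hits by source mirrors the behavior described above, in which a list of relevant material is shown for each database.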
What next became of interest was how to
sharpen searches by enhancing the keyword query. Adapting the AAT and ULAN
as search filters made more exact searches possible. Used as a search
enhancement, the AAT allows for the construction
of networks of links and paths composed of synonyms, broader and narrower
terms, and related concepts that are used to refine and expand searches. The
AAT contains over 120,000 terms that describe objects, textual materials,
images, architecture, and material culture from antiquity to the present
(focused mainly on the Western World) and provides vocabulary to describe the
materials and techniques related to physical attributes, construction, and
conservation.
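A thesaurus used as a search filter amounts to query expansion: the user's term is replaced by a set of terms drawn from its synonyms and its broader, narrower, and related concepts. The sketch below assumes a tiny hand-made thesaurus; the Concept structure, the sample entry, and the expand_query function are hypothetical and do not reproduce actual AAT records or the Getty software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    """One thesaurus entry (a hypothetical, simplified structure)."""
    preferred: str
    synonyms: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


# Illustrative entry only; not copied from the AAT.
THESAURUS: Dict[str, Concept] = {
    "chairs": Concept(
        preferred="chairs",
        synonyms=["chair"],
        broader=["seating furniture"],
        narrower=["armchairs", "rocking chairs"],
        related=["stools"],
    ),
}


def expand_query(term: str,
                 thesaurus: Dict[str, Concept] = THESAURUS,
                 include_narrower: bool = True,
                 include_broader: bool = False,
                 include_related: bool = False) -> Set[str]:
    """Return the set of search terms implied by one query term."""
    terms = {term}
    concept = thesaurus.get(term)
    if concept is None:
        return terms  # unknown term: search it unchanged
    terms.update(concept.synonyms)
    if include_narrower:
        terms.update(concept.narrower)  # refine toward more specific concepts
    if include_broader:
        terms.update(concept.broader)   # widen toward more general concepts
    if include_related:
        terms.update(concept.related)
    return terms
```

Narrower terms are included by default because a search for a general concept usually should also find its specific kinds; broader and related terms widen the search and are left as options.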
Used as a search enhancement, the ULAN's
"clustered" format allows all name forms (pseudonyms, nicknames, and
an unlimited number of orthographic and linguistic variants) associated with a
particular artist or architect to be linked in a single, merged record. This
feature provides a great number of access points. The ULAN contains more than
200,000 names representing approximately 100,000 individual artists, including
performance artists, decorative artists, and architects. Chronological coverage
ranges from ancient to contemporary, although post-medieval artists and
architects predominate. Geographic coverage is global, although most names are
Western European and American.
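The clustered format can be pictured as an index in which every name form points to the same record, so that no single form has to be "preferred" for lookup. The following is a minimal sketch under that assumption; the cluster numbers, the sample name lists, and the function names are illustrative and are not ULAN data or identifiers.

```python
from typing import Dict, List

# Hypothetical clusters: each record gathers all known name forms of one artist.
# The IDs and name lists are illustrative, not actual ULAN records.
CLUSTERS: Dict[int, List[str]] = {
    1: ["Le Corbusier", "Charles-Édouard Jeanneret", "Charles-Edouard Jeanneret"],
    2: ["El Greco", "Domenikos Theotokopoulos"],
}


def normalize(name: str) -> str:
    """Fold case and collapse whitespace so lookups tolerate minor differences."""
    return " ".join(name.casefold().split())


def build_index(clusters: Dict[int, List[str]]) -> Dict[str, int]:
    """Map every normalized name form to the ID of its cluster."""
    index: Dict[str, int] = {}
    for cluster_id, names in clusters.items():
        for name in names:
            index[normalize(name)] = cluster_id
    return index


NAME_INDEX = build_index(CLUSTERS)


def variants_for(name: str) -> List[str]:
    """Return every name form in the cluster that contains the given name."""
    cluster_id = NAME_INDEX.get(normalize(name))
    if cluster_id is None:
        return []
    return list(CLUSTERS[cluster_id])
```

A query on any one form can then be expanded to all forms in the cluster before the databases are searched, which is what makes each variant an access point.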
The appearance of the Web had a profound
influence on the way databases and vocabularies could now perform. They could
be searched as individual resources, searched collectively and simultaneously, and
searching could be enhanced by applying the vocabularies as filters. The Web
also serves as a platform for the continued global creation of the
resources. James Bower notes: "Now that the vocabularies were on-line, it
had profound implications for how we make updates/changes known to our users,
and whether/how they implement those changes in their local files." The
Web has made it possible for more people to use the resources than was ever conceivable
for print or CD-ROM products. Patricia Harpring, speaking about the TGN:
We average between
60,000 and 70,000 queries per month - which is very high for a cultural
database. We can better track who our users are and what kind of questions they
ask, thus we can improve the interface or add data to suit their needs. We can
accept contributions and comments much more easily and you can link our data to
other data sets, e.g., TGN may be linked to maps held by someone else.
There are still some interesting limitations
of the current Web in terms of speed and reliability of connections as well as
the fact that using a laptop in the field with a Net connection is not yet very
viable. But new projects show some striking directions. The Getty's ARTHUR (ART
media and text HUb and Retrieval System) uses the AMORE image system (developed
by NEC USA, Inc.) to index and search 30,000 images and associated text of 300
selected Web sites organized into five databases. Images can be retrieved by
image similarity, contextual similarity (text in the Web page near the image
that is similar), or by keywords in the Web pages. The AMORE software uses
algorithms that detect edges, shapes, and colors to facilitate visual
searching. In addition, ARTHUR has enhanced query filtering through the use of
the AAT, ULAN, and TGN. This demonstration project seems to point to vast
new uses of the Web, databases, imaging technology, and vocabularies that may
combine print, CD-ROM and Web resources in ways not yet predicted.
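Retrieval by image similarity can be illustrated with one generic technique, color histogram comparison. This is not the AMORE algorithm, whose details are not given here; it is a simple stand-in, and the representation of an image as a list of RGB tuples and the function names are assumptions made for the sketch.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Return a normalized histogram over bins**3 coarse color cells."""
    counts = [0] * (bins ** 3)
    for r, g, b in pixels:
        # Scale each channel from 0-255 down to a bin number in 0..bins-1.
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        counts[rb * bins * bins + gb * bins + bb] += 1
    total = len(pixels) or 1  # avoid dividing by zero for an empty image
    return [count / total for count in counts]


def similarity(hist_a: Sequence[float], hist_b: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical color distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def rank_by_similarity(query: Sequence[Pixel],
                       collection: dict) -> List[str]:
    """Order the names of the images in a collection from most to least similar."""
    query_hist = color_histogram(query)
    scores = {name: similarity(query_hist, color_histogram(pixels))
              for name, pixels in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Color alone ignores edges and shapes, so a working system would combine several such measures with the textual and vocabulary-based filtering described above.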
The transformation of printed reference to
CD-ROM to Web resources has allowed works first conceived as library books to
become cultural information performers. The work put into formulating schema
for cumulative reference works has paid off in the network environment in
profound ways. The ability to ask precise and informed questions of Internet
information, make juxtapositions and comparisons, and draw surprising results
forces us to see the Web as a rendering system with vast scope. Cumulative
reference works now have a fluid environment for building and rebuilding. Like
all current Web enterprises, however, the task will be to better understand how
this rendering tool can support itself.
Reference:
Art and Architecture Thesaurus
http://www.gii.getty.edu/vocabulary/aat.html
Union List of Artist Names
http://www.gii.getty.edu/vocabulary/ulan.html
Getty Thesaurus of Geographic Names
http://www.gii.getty.edu/vocabulary/tgn.html
Bibliography of the History of Art
http://www.gii.getty.edu/bha/index.html
Getty Provenance Index
http://www.gii.getty.edu/provenance/index.html
Avery Index to Architectural Periodicals
http://www.gii.getty.edu/index/avery.html
Ben Davis is Program Manager, Communications
at the Getty Information Institute and directs projects in digital publication,
digital design, and digital communication.