Visual Resources Association Conference

Los Angeles, CA

February 12, 1999

Digital Reference: Building and Rebuilding

Ben Howell Davis, Manager, Communications, Getty Information Institute


The digital information age seems to be in a constant state of becoming. The speed at which new technologies affect intellectual pursuits is blinding. The meeting of complex digital technology and cultural endeavors is not a simple one.

Cultural materials have special needs and require special schema. Content, audience, software architecture, and visual design are at once nouns and verbs in the new digital world. The Internet, for instance, is a different medium. Although it metaphorically has kinship with books, television networks, movies, CD ROMs, etc., it is not the same as any of them. It is a networked computational medium; it makes information perform without regard for time or distance. Cultural information then, has special performance requirements.

A producer of digital reference must understand what content is being communicated, to whom, how, and what impression it should make. That process will be reversed and reversed again as the digital form of the information takes shape. In other words, the content will be affected by the visual design, the perception of whom it should reach, and how it should perform.

In his book How Buildings Learn: What Happens After They’re Built, Stewart Brand looks at how structures at first perceived to be permanent always change over time:

The word "building" contains the double reality. It means both "the action of the verb BUILD" and "that which is built" —both verb and noun, both the action and the result. Whereas "architecture" may strive to be permanent, a "building" is always building and rebuilding. The idea is crystalline, the fact fluid. Could the idea be revised to match the fact?

Cultural inquiry, the context that surrounds and pervades physical culture, is itself a multi-layered and dynamic structure. Until the Internet there had never been a medium that could communicate such a fluid condition. The Getty Center has produced a number of digital reference works that have undergone surprising and profound transformations as they have gone from print to CD-ROM to the Web.

Vocabulary works like the Art and Architecture Thesaurus (AAT), the Union List of Artist Names (ULAN), and the Getty Thesaurus of Geographic Names (TGN), and databases like the Getty Provenance Index and the Bibliography of the History of Art (BHA), have all encountered the new digital reference domains and flourished, but in ways that no one imagined before the Internet.

Print reference works have obvious limitations. They take a long time to produce and update, and works like ULAN are bound by what James Bower (Manager, Getty Information Institute Vocabulary Program) calls "the Tyranny of the Preferred Form." Bower explains:

That is, when we produced ULAN in print, we had to choose one name form under which to enter each cluster in the printed alphabetic sequence. Doing so left the suggestion that there was one "preferred form" for each artist, when actually we were trying to demonstrate patterns of usage and encourage appropriate selection of forms according to different users' standards.

Another difficulty with print is the size of the data collection. The Getty Thesaurus of Geographic Names (TGN), for instance, would be something like twenty-four volumes if it were printed. Patricia Harpring, Senior Editor for TGN, notes that vocabularies are often "so big and complex and have so many cross references that it would be very difficult to navigate through the material in hard copy." In addition, it is virtually impossible to sort data in different ways in print because you can’t search across it all at once. And it certainly isn’t possible to accept contributions to these kinds of projects in any timely fashion when print publication must be scheduled months, often years, in advance.

Harpring does point out some of the positive aspects of print and high-volume reference works:

People tell us it is easier to comprehend the big picture of relationships in hierarchies, especially the AAT, in hard copy. They say they often refer to the book to get an idea of the layout of the hierarchies. Some people don't have access to a computer at all. Books are generally more practical for libraries that don't have adequate computer availability for all their patrons. Many people just like the look and feel of a book.

Burton Fredericksen, Director of the Getty Provenance Index, also reminds us that electronic databases can be aggregated and searched. Books must be looked at one at a time. The editing of printed books is very formal and precise, leaving no room for works in progress, informal unedited work, or casual standards for an audience or contributor to react to directly.

CD-ROMs have many of the same limitations. They require a deadline production schedule, are fixed bodies of information that are out of date as soon as they are produced, can’t be updated, and are expensive to produce and market. And, as James Bower states, "Users of the CD editions simply transcribe (or cut-and-paste) terms from a vocabulary into their local database, without tracking any elements that would help them update their data if the vocabulary changed."
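Bower's point about tracking can be made concrete with a small sketch. It assumes, purely for illustration, that each vocabulary term carries a stable identifier and that the local database stores that identifier next to the copied label; the identifiers, terms, and the function name stale_records are hypothetical and are not taken from any Getty product.

```python
from typing import Dict, List, Tuple

# Two releases of a vocabulary, keyed by a stable term id.
# The ids and labels are invented placeholders, not real vocabulary data.
OLD_RELEASE: Dict[str, str] = {"t1": "water-colour", "t2": "etching"}
NEW_RELEASE: Dict[str, str] = {"t1": "watercolor", "t2": "etching"}

# A local database that kept the term id alongside the label it copied.
local_records: List[Dict[str, str]] = [
    {"object": "obj-1", "term_id": "t1", "label": "water-colour"},
    {"object": "obj-2", "term_id": "t2", "label": "etching"},
]


def stale_records(
    records: List[Dict[str, str]], vocabulary: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (object, local label, current label) for every outdated record."""
    outdated = []
    for record in records:
        current = vocabulary.get(record["term_id"])
        # A record is stale when the vocabulary still has the term
        # but its label no longer matches the locally copied one.
        if current is not None and current != record["label"]:
            outdated.append((record["object"], record["label"], current))
    return outdated


print(stale_records(local_records, NEW_RELEASE))
```

Had the local records kept only the copied label, there would be nothing to compare against when the vocabulary changed, which is the situation Bower describes.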

Although CDs have the advantage of allowing a variety of searching and cross-referencing capabilities, the size of the data set is still a concern. TGN, for instance, is so large it will not fit on current CD-ROM media. It would require more than one disc and, with that, expensive packaging. DVD technology may remedy this, but how long will it take DVD to become a standard format?

Cumulative reference works like the vocabularies and databases described are never finished. They are great examples of works in a constant state of becoming. They must be built and rebuilt for available technologies. With international collaborators, global data resources, and cross-cultural partners, these kinds of works are often multilingual as well. Mediums like print and CD-ROM offer advantages and disadvantages in terms of usability and maintenance. Print and CD-ROM are also tangible things; they can be bought and sold. This last reason for using them may be the most pragmatic of all. Cost recovery for producing material like this is marginal at best. To de-materialize the final product might be unthinkable.

With the creation of the Mosaic browser, the Internet was transformed into the World Wide Web. Cumulative reference works now have a vast platform for collection, distribution, presentation, and evaluation. It is the perfect situation for building and rebuilding information resources that require cross-referencing, updating, user contribution, and constant maintenance. No other medium had ever come close to satisfying the demands such works make in order to be fully functional.

As an experiment, researchers at the Getty Information Institute built a system called a.k.a. that took advantage of an early Internet searching protocol called WAIS (Wide Area Information Servers). The a.k.a. system allowed the simultaneous search of a number of databases mounted on a local server. The International Repertory of the Literature of Art (RILA, one of the early bibliographic databases that now make up the BHA), The Getty Provenance Index (Sales Catalogues and Sales Content), and the Avery Index to Architectural Periodicals (Avery) could now be queried by keyword, and lists of relevant material in each database would be displayed. The implication of the experiment was that databases resident on the Internet could be searched simultaneously from any user’s workstation anywhere on the Net.
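The idea of one keyword query fanned out across several databases can be sketched in a few lines. This is only an illustration of the principle, not the a.k.a. system or the WAIS protocol: the databases are in-memory lists, the records are invented placeholders, and search_all is a hypothetical name.

```python
from typing import Dict, List

# Hypothetical stand-ins for the mounted databases.
# The record texts are invented placeholders, not real entries.
DATABASES: Dict[str, List[str]] = {
    "RILA": ["Essay on Venetian painting", "Survey of Gothic architecture"],
    "Provenance Index": ["Sale catalogue listing a Venetian painting"],
    "Avery": ["Article on Gothic architecture in periodicals"],
}


def search_all(keyword: str, databases: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Run one keyword against every database and group the hits by source."""
    needle = keyword.lower()
    return {
        name: [record for record in records if needle in record.lower()]
        for name, records in databases.items()
    }


results = search_all("gothic", DATABASES)
for name, hits in results.items():
    print(f"{name}: {len(hits)} hit(s)")
```

The result keeps one list per database, mirroring the way a.k.a. displayed the relevant material found in each source separately.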

What next became of interest was how to sharpen searches by enhancing the keyword query. By adapting the AAT and ULAN as search filters, more exact searches could be done. The AAT, when used as a search enhancement, allows for the construction of networks of links and paths composed of synonyms, broader and narrower terms, and related concepts that are used to refine and expand searches. The AAT contains over 120,000 terms that describe objects, textual materials, images, architecture, and material culture from antiquity to the present (focused mainly on the Western World) and provides vocabulary to describe the materials and techniques related to physical attributes, construction, and conservation.
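A thesaurus used as a search filter amounts to query expansion: the user's keyword is replaced by a set of linked terms. The sketch below assumes a very simple record shape (preferred term, synonyms, broader, narrower, related); the entry shown is an invented toy example rather than actual AAT data, and expand_query is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    """One thesaurus concept with its links to other terms."""
    preferred: str
    synonyms: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


# Toy entry for illustration only; not copied from the AAT.
THESAURUS: Dict[str, Concept] = {
    "chairs": Concept(
        preferred="chairs",
        synonyms=["chair"],
        broader=["seating furniture"],
        narrower=["armchairs"],
        related=["stools"],
    ),
}


def expand_query(
    term: str,
    thesaurus: Dict[str, Concept],
    include_narrower: bool = True,
    include_broader: bool = False,
    include_related: bool = False,
) -> Set[str]:
    """Return the set of search terms implied by one keyword.

    The lookup is by preferred term only; unknown keywords pass through unchanged.
    """
    key = term.lower()
    terms: Set[str] = {key}
    concept = thesaurus.get(key)
    if concept is None:
        return terms
    terms.update(concept.synonyms)
    if include_narrower:
        terms.update(concept.narrower)  # refine: more specific concepts
    if include_broader:
        terms.update(concept.broader)   # expand: more general concepts
    if include_related:
        terms.update(concept.related)   # expand: associated concepts
    return terms


print(sorted(expand_query("Chairs", THESAURUS)))
```

Leaving the broader and related links off by default keeps a search narrow; switching them on widens it, which corresponds to the refine and expand uses described above.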

Used as a search enhancement, the ULAN's "clustered" format allows all name forms (pseudonyms, nicknames, and an unlimited number of orthographic and linguistic variants) associated with a particular artist or architect to be linked in a single, merged record. This feature provides a great number of access points. The ULAN contains more than 200,000 names representing approximately 100,000 individual artists, including performance artists, decorative artists, and architects. Chronological coverage ranges from ancient to contemporary, although post-medieval artists and architects predominate. Geographic coverage is global, though most names are Western European and American.
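The clustered record can be pictured as one identifier with many equal name forms, any of which leads to the same record. The sketch below is an assumption about shape only: the record id is hypothetical, the name forms are a well-known example chosen for illustration rather than a copy of the ULAN entry, and no form is marked as preferred.

```python
from typing import Dict, List, Optional

# One cluster: a record id mapped to all of its name forms.
# The id is hypothetical and the list is illustrative, not a ULAN record.
CLUSTERS: Dict[str, List[str]] = {
    "artist-001": [
        "Le Corbusier",
        "Charles-Édouard Jeanneret",
        "Jeanneret, Charles-Édouard",
    ],
}


def normalize(name: str) -> str:
    """Fold case and collapse whitespace so variant typing still matches."""
    return " ".join(name.casefold().split())


def build_index(clusters: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every name form to its record id; each form is an access point."""
    index: Dict[str, str] = {}
    for record_id, names in clusters.items():
        for name in names:
            index[normalize(name)] = record_id
    return index


INDEX = build_index(CLUSTERS)


def find_record(name: str) -> Optional[str]:
    """Look up a record id by any of its name forms."""
    return INDEX.get(normalize(name))


def all_name_forms(name: str) -> List[str]:
    """Return every name form in the cluster that the given name belongs to."""
    record_id = find_record(name)
    return CLUSTERS[record_id] if record_id is not None else []


print(all_name_forms("le corbusier"))
```

A query on any one form can then be widened to every form in the cluster before the databases are searched.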

The appearance of the Web had a profound influence on the way databases and vocabularies could perform. They could be searched as individual resources or searched collectively and simultaneously, and searching could be enhanced by applying the vocabularies as filters. The Web also serves as a platform for the continued global creation of the resources. James Bower notes: "Now that the vocabularies were on-line, it had profound implications for how we make updates/changes known to our users, and whether/how they implement those changes in their local files." The Web has made it possible for more people to use the resources than was ever conceivable for print or CD-ROM products. Patricia Harpring, speaking about the TGN:

We average between 60,000 and 70,000 queries per month - which is very high for a cultural database. We can better track who our users are and what kind of questions they ask, thus we can improve the interface or add data to suit their needs. We can accept contributions and comments much more easily and you can link our data to other data sets, e.g., TGN may be linked to maps held by someone else.

There are still some interesting limitations of the current Web in terms of speed and reliability of connections, as well as the fact that using a laptop in the field with a Net connection is not yet very viable. But new projects show some striking directions. The Getty's ARTHUR (ART media and text HUb and Retrieval System) uses the AMORE image system (developed by NEC USA, Inc.) to index and search 30,000 images and associated text from 300 selected Web sites organized into five databases. Images can be retrieved by image similarity, contextual similarity (text in the Web page near the image that is similar), or by keywords in the Web pages. The AMORE software uses algorithms that detect edge, shape, and color to facilitate visual searching. In addition, ARTHUR has enhanced query filtering through the use of the AAT, ULAN, and TGN. This demonstration project seems to point to vast new uses of the Web, databases, imaging technology, and vocabularies that may combine print, CD-ROM, and Web resources in ways not yet predicted.
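How AMORE compares images is not described here, so the following is not its algorithm. It is a minimal sketch of one common, much simpler way to score color similarity, histogram intersection, with images represented as plain lists of RGB pixels; the function names and the sample pixels are invented for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Count pixels per coarse color bin and normalize the counts to sum to 1."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value onto a bin number 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def similarity(hist_a: Sequence[float], hist_b: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical color distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


# Tiny invented "images": four pixels each.
red = [(255, 0, 0)] * 4
nearly_red = [(250, 10, 5)] * 4
blue = [(0, 0, 255)] * 4

print(similarity(color_histogram(red), color_histogram(nearly_red)))
print(similarity(color_histogram(red), color_histogram(blue)))
```

Color alone ignores edge and shape, which the text says AMORE also uses; a fuller system would combine several such scores.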

The transformation of printed reference to CD-ROM to Web resources has allowed works first conceived as library books to become cultural information performers. The work put into formulating schema for cumulative reference works has paid off in the network environment in profound ways. The ability to ask precise and informed questions of Internet information, make juxtapositions and comparisons, and draw surprising results forces us to see the Web as a rendering system with vast scope. Cumulative reference works now have a fluid environment for building and rebuilding. Like all current Web enterprises, however, the task will be to better understand how this rendering tool can support itself.


Art and Architecture Thesaurus

Union List of Artist Names

Getty Thesaurus of Geographic Names

Bibliography of the History of Art

Getty Provenance Index

Avery Index

Ben Davis is Program Manager, Communications at the Getty Information Institute and directs projects in digital publication, digital design, and digital communication.