Virtual environments (VEs) are the subject of a very young field of study that has nonetheless received a large amount of attention from a variety of disciplines, including computer science, psychology, cognitive science, visual art, and industrial engineering. All of these disciplines seek to improve upon the human-centered paradigm exemplified by VE systems. However, implementing VE systems is a nontrivial task because of the technological limitations of the hardware. Before discussing these limitations and how best to approach them, a number of definitions should be provided.
We can be more precise by defining a "virtual environment system." A VE system is multi-modal, interactive and adaptive, reconfigurable in software, and able to generate supernormal situations. It consists of the human, the human-machine interface, and the computer-generated synthetic environment.
A virtual environment system consists of a visual display system, head tracker, and auditory displays (VETREC, 1992). Oftentimes, a glove-input device is included in the system, but such devices are no longer very popular due to a number of technical difficulties. Additional components of a VE system may include head-mounted displays (HMDs), automatic speech recognition, haptic input interfaces, force-feedback devices, whole body movement sensing, and/or sensing of physiological responses, such as eye movement, blood pressure or heart rate (VETREC, 1992). The equipment and its properties will be discussed in further detail below.
In the broad sense, a virtual environment system is any system that attempts to fool people into accepting a computer-generated, synthesized world as real. The most important traits of a convincing VE are interactivity and adaptability. The ability to persuade a user that the VE is the real world leads us to define the concepts of immersion and presence.
Immersion and Presence:
The definition of a virtual environment given above can be extended to
describe the subjective sensation a user might feel: "Sensory
information generated only by and within a computer and associated
display technology compels a feeling of being present in an
environment other than the one a person is actually in" (Sheridan,
1992a).
Not only is a virtual environment supposed to fool the observer into perceiving different surroundings, but it is also supposed to imbue the user with the experience of "being there." This sense of "being there" is termed immersion or presence.
Without the feeling of immersion or presence, VE research would not have generated much excitement. The ability to provide a sense of immersion is very desirable in entertainment applications as well as in teleoperation or training where the tasks to be performed are wide-ranging, complex and uncertain (Held & Durlach, 1992). The desire for more immersive virtual environments motivates much of the research in the area. Scientists are trying to qualify and quantify what creates the experience of presence, how to measure it, how to improve it, and what effect it has on various VE tasks (Slater & Usoh, 1993).
Simulator:
A knowledgeable reader might have noted that a number of the
qualities associated with virtual environments have been present for
many years in what are called simulators. However, virtual
environments have some characteristics which distinguish them from
simulators. For example, a VE is more flexible than the typical
simulator; a VE is reconfigurable for different levels of fidelity
and/or various skill levels as well as for the characteristics of a
particular task (VETREC, 1992). In addition, a simulator is generally
trying to match the real world (the degree of which is measured as
"simulator fidelity") while a VE need not present an exact copy of a
real world environment. A simulator is closely tied to some physical
situation, while a VE is most closely associated with the human user
(Carr, 1995). This is encouraging; VE systems have much wider
applicability than ordinary simulators since they focus on providing
immersion, rather than merely a replica of the real world. Negroponte
predicted in 1970 that human factors would eventually play a strong
role in the design of computer systems and computer interfaces. VE
systems differ from the simulators that existed at that time because
they embrace a human-centric view, encouraging designers to look to
the human being to justify design decisions.
Kriloff, in 1976, presented another view, suggesting that the human should conform to the machine, since machine programming is fixed and man is adaptable, with the caveat that unpleasant side effects might occur. Human factors engineering, which originated in the late 1940s, is concerned with designing man-machine systems that reduce such side effects and are easy to use, safe, and efficient. The view that a VE sits closer to the human end of man-machine interaction accurately reflects the relatively recent shift toward more human-centric systems. Simulators are being replaced by VEs in situations where the human and his or her sensations are the primary concern. Simulators produce a situation to which the human must adapt, rather than adapting the situation to human sensory capabilities. Both simulators and VEs present a synthetic task to the user; therefore, the concept of a task should be better defined.
Task:
Human factors engineers recognized very early in the growth of
their discipline that some formalization of human behavior was needed
in order to examine the behavior involved in man-machine interaction.
Thus, the formal concept of a task was born. A task is an arbitrary
unit of work, one or more specific related actions necessary to change
or verify a system's state. A task may be mental, physical or a
combination of these (Van Cott & Paramore, 1985). A task has a number
of characteristics, including a set of conditions that require and
initiate human performance, a specific purpose, a definite start and
end defined by an initiating and terminating cue or stimulus, a
relatively short life span, the ability to be interrupted by another
task, and the ability to be performed by many people at once
(Christensen, 1993). For the purpose of VE research, these
definitions are more than sufficient.
Realism:
When we mention the idea of a task, we are implying the
existence of a real-world context for that task. In a simulator, we
are trying to make the synthetic task as much like the real world task
as possible. Furthermore, we could say that "realism" is the degree
to which the sensory stimulation coming from the artificial
environment matches that originating from an equivalent real
environment (Christou & Parker, 1995). A serious issue of perception
arises immediately: what is veridical and what is perceived? The
common assumption is to believe that "what I see is real," a
perspective philosophers call naive representationalism or naive
realism (Weintraub, 1993). Research shows that the veridical world is
not the same as the perceived world, yet in VE work, the main thrust
is to utilize and exploit the inherent desire for naive realism to
produce the experience of presence. We exploit this desire by
studying the aspects of the human visual system that contribute to the
perception of a world as real and using the results to synthesize a
convincing situation. However, in VE systems, supernormal situations are
possible, making comparison to the real world difficult, if not
impossible. Therefore, realism is only really relevant to the
discussion of VEs when the VE is attempting to simulate traits of the
real world.
Could a virtual environment simulate a world that has no functional or logical connection to the real world? The idea is difficult to grasp and seems far-fetched, yet the capability may exist. However, a human would be unlikely to understand such a world and would experience little or no immersion. Thus, a VE must embrace reality in some manner in order to present a coherent and usable environment. Characterizing the ways in which a VE should mimic the real world is an important part of creating an interactive and immersive environment.
Helmholtz, in 1882, proposed the Doctrine of Unconscious Inference, which states that people apply what they know about the world to achieve the percept, and that information from the world is oftentimes insufficient (Weintraub, 1993). His century-old doctrine suggests the basis for studying realism in VEs: the manner in which the world is perceived as real is based upon previously gathered information about the world. Realism, then, could be achieved through the study of the preconceptions that lead to perception of the world as real.
Realism in virtual environments is connected to the feeling of immersion. Hendrix and Barfield report that, according to a subjective questionnaire, people react more to the realism of the interaction in a VE than to the realism of the objects in it (Hendrix & Barfield, 1995). The functional behavior of objects in a VE thus appears to determine its subjective realism more than the objects' actual appearance does. However, what constitutes functional behavior is not clear. The term should include perceptual functionality, such as how objects behave when the subject's head turns or how objects move within the scene. Clearly, the exactitude of the human-machine interface is not the only determinant of immersiveness and/or realism; the significance of the behavior of the objects within the simulated environment should not be overlooked.
The computer graphics world has, for much of its short life, pushed technology ever further in the pursuit of photorealistic images. New, higher-resolution displays and more powerful computational engines have been developed to improve the realism of images on a computer screen. The computer graphics field is finally acknowledging that the behavior of the objects in a scene is likely to be more important to realism than the quality of the image. This shift in focus is reflected in increased research into physically based modeling and animation.
The young discipline of virtual environments is only beginning to absorb the large body of knowledge from psychophysics and perception that offers clues to the concept of realism. The perceived appearance and behavior of objects are being carefully quantified relative to some of the peculiar equipment used in VE systems. Therefore, a brief survey of standard VE equipment and its characteristics is warranted.
The visual display gives the subject the most salient and detailed information about the synthetic world. Ideally, it is a real-time display showing precise, continuous motion imagery while making full use of normal visual sensory abilities. A visual display system incorporates the actual display surface, a system for monitoring the location and motion of the head and/or eyes, a system for generating the stimulus, and a positioning system for the displays (VETREC, 1992).
The type of visual display most commonly used in virtual environment work is a head-mounted display. An HMD is a helmet that sits on the user's head and presents an image very close to the eyes. Usually, the helmet contains a tracking device which permits the graphics hardware to update the image to match the motion of the participant's head, allowing them to "look around" a scene. Often, to allow the presentation of stereoscopic images, an HMD will have a separate display for each eye.
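The head-tracking loop can be sketched as follows, assuming the tracker reports head position and yaw only (real trackers report a full six-degree-of-freedom pose). Each frame, the scene is drawn with the inverse of the tracked head pose, so turning the head one way makes the world appear to turn the other. The function names are hypothetical, and the coordinate convention (right-handed, y up, looking down -z) is an assumption.

```python
import math

def head_pose_matrix(x, y, z, yaw_deg):
    """4x4 head-to-world transform: rotate about the vertical (y) axis, then translate."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [
        [c, 0.0, s, x],
        [0.0, 1.0, 0.0, y],
        [-s, 0.0, c, z],
        [0.0, 0.0, 0.0, 1.0],
    ]

def view_matrix(x, y, z, yaw_deg):
    """World-to-eye transform: inverse of the rigid head pose, i.e. [R^T | -R^T t]."""
    m = head_pose_matrix(x, y, z, yaw_deg)
    r = [[m[i][j] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    rt = [[r[j][i] for j in range(3)] for i in range(3)]  # transpose = inverse rotation
    tt = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [tt[0]], rt[1] + [tt[1]], rt[2] + [tt[2]], [0.0, 0.0, 0.0, 1.0]]

def transform(m, p):
    """Apply a 4x4 matrix to a 3D point (homogeneous w = 1)."""
    v = list(p) + [1.0]
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))
```

With the head turned 90° to the left (positive yaw in this convention), a point straight ahead in world coordinates lands on the +x (right) side of eye space, which is the "look around" behavior described above.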
A number of variations on the generic HMD have been developed. See-through displays using half-silvered mirrors allow virtual images to be superimposed on the real world (Kalawsky, 1993; Barfield, Rosenberg, & Lotens, 1995). Some HMDs use spinning color filters to present a color image from a grayscale display (Allen, 1993). Eye tracking has been introduced into a few HMDs to follow the user's gaze with high-resolution inserts (Kalawsky, 1993). New optical systems with wider fields of view (FOVs) and better adaptability to the vision of different users are constantly being developed. Significant effort is being devoted to designing and prototyping new HMDs at laboratories at the University of North Carolina and the University of Washington, while the commercial world continues to produce a variety of HMD styles (Barfield & Furness, 1995).
Despite the focus on the ubiquitous HMD, a number of other types of displays are worth mentioning, since they lend insight into some of the limitations of HMDs. The CrystalEyes field-sequential system uses glasses with liquid-crystal shutters that are synchronized with a monitor to time-multiplex images to the two eyes (Lipton, 1991). This system can present a high-resolution, narrow-FOV, three-dimensional image. The CAVE at the University of Illinois surrounds the user with a cube whose walls carry back-projected images. These images are also time-multiplexed and viewed through shutter glasses, so that the user perceives a large-FOV, three-dimensional image. The CAVE's design presents some limitations: the projected images have lower resolution than those seen on a normal computer monitor, and a great deal of computation is required to update the images at a sufficiently fast rate (Cruz-Neira, Sandin, & DeFanti, 1993). At the time of this writing, a number of other interesting alternatives to the HMD are being developed; presenting them all would be laborious. The reader is invited to consult Kalawsky (1993) and the journal Presence: Teleoperators and Virtual Environments for more information.
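Time-multiplexing divides the display's refresh between the eyes: the monitor alternates left and right images, and the shutter glasses open only the matching eye. A minimal sketch of that schedule, assuming a hypothetical 120 Hz refresh rate (the text gives no rate) and hypothetical function names:

```python
def field_schedule(refresh_hz, n_fields):
    """Return (time_s, eye) for each displayed field; eyes alternate every field."""
    period = 1.0 / refresh_hz
    return [(i * period, "left" if i % 2 == 0 else "right") for i in range(n_fields)]

def per_eye_rate(refresh_hz):
    """Each eye sees every other field, so its effective update rate is half the refresh."""
    return refresh_hz / 2.0
```

At 120 Hz each eye is updated only 60 times per second, which is one reason such systems need both a fast display and fast image generation.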
Head-mounted displays have a number of characteristics that determine their weight, comfort, durability, size, and price. Generally, HMDs weigh from 1 to 10 pounds and permit a number of different adjustments to ensure a secure fit. Some HMDs allow the inter-pupillary distance (IPD) and the eye relief (the distance from the surface of the eye to the nearest surface of the display optics) to be adjusted to suit the individual. FOV ranges from 20° to 140° horizontally and 23° to 90° vertically, and most HMDs are capable of displaying around 50° horizontal by 40° vertical. The displays are usually active liquid-crystal displays (LCDs), although electroluminescent, light-emitting diode, plasma, and cathode-ray tube (CRT) based displays have been successfully constructed. The pixel resolution of these displays hovers around 600 pixels by 400 pixels, although both higher- and lower-resolution systems have been produced. The design of HMDs is a fairly complex and interesting field; the curious reader is referred to the work of Barfield, Hendrix, Bjorneseth, Kaczmarek, and Lotens (1995) for an introduction.
Ultimately, virtual environment designers would like a visual display system that has high spatial resolution, high luminance and contrast, and high-fidelity color. The system should incorporate a wide FOV with rapid stimulus generation and a high update rate. The system should also provide a high-quality stereoscopic configuration, interfere minimally with the user's motion and overall comfort, and eliminate noise and distortion. Safety is an important issue, as are reliability and durability (VETREC, 1992). Unfortunately, these characteristics are far from being achieved. Ideally, an HMD would weigh as little as a pair of glasses, fill the visual field, match the resolving power of the eye, and properly coordinate all visuo-spatial cues (Biocca & Delaney, 1995). However, the possibilities for VE technology are still substantial even if these constraints are only partially satisfied.
Another strength of virtual environment systems is that they are more cost-effective than traditional simulators. Because a VE is reconfigurable in software, changing a simulator from, for example, an F-14 fighter to a Cessna 150 costs only the price of a programmer, not the price of a whole new cockpit mock-up. In this example, the VE is more cost-effective by at least two orders of magnitude (a new simulator might cost millions of dollars, while reprogramming a VE may cost only tens of thousands of dollars). A VE could also be modified for each user based on a variety of constraints, from the user's visual acuity to his or her spatial reasoning abilities. The reconfigurability of VEs allows them to be supremely adaptable, requiring only standard equipment to provide a variety of services (VETREC, 1992).
In addition, virtual environment systems are networkable. Multiple users can be supported in a single environment, permitting cooperation and human-human interaction. Networking VEs together also allows users to be geographically distributed and permits dispersed resources to be shared among them. Thus, VEs are well matched to teleoperation tasks. Application areas for teleoperation include space exploration, undersea oil and science (geology, biology), nuclear power plants, toxic waste cleanup, construction, agriculture, mining, warehousing, mail delivery, firefighting, policing, military operations, telesurgery, and entertainment (Sheridan, 1992b). The ability to network VE systems opens the way to these applications.
Finally, VEs have the ability to present supernormal situations. The user can be given additional information in a synthetic world, gaining perspectives and opportunities for communication not possible in the real world. The potential for augmenting normal perception is staggering; adding information to a person's sense of the real world has a great many potential applications.
Before the lovefest with VEs goes too far, it must be noted that VEs are well suited only to a particular set of tasks. Tasks that require a tracked and transitional viewpoint, 3D spatial reasoning and visualization, and complex interactions with the environment exploit the advantages of VE technology (Stanney, 1995). Suggested applications include teleoperation (Sheridan, 1992b), entertainment, training (Ellis, 1995a; VETREC, 1992), education (Travis, Watson, & Atyeo, 1994), architecture (Slater & Usoh, 1993; Henry & Furness, 1993), scientific and medical visualization (Ellis, 1995b; Kalawsky, 1993), design, manufacturing and marketing, and telecommunication (Durlach & Mavor, 1995).
However, for each potential application, a careful analysis of both the task and the users is needed; failure to recognize the misapplication of VE technology can have disastrous effects. Some have suggested the formation of a virtual task taxonomy to direct design efforts for maximizing human performance in VEs. Classifying tasks according to types of displays and interactions which best improve efficiency in VEs would be extraordinarily helpful in determining for which tasks VE technology might be effective (Stanney, 1995). However, the development of such a taxonomy is a formidable job, since the size and complexity of the application-space are imposing.
The vertical and horizontal fields of view in an HMD are rarely equal, since few displays are made with equal resolution along each axis. FOV is believed to have a strong effect on the feeling of presence (Robinett & Rolland, 1992): the wider an image appears, the more like the real world it seems. Studies have shown that a wider FOV improves performance on some tasks (Ellis, 1995a).
In order to achieve a wide FOV, the pixel resolution of the display is compromised. For example, normal computer monitors have a resolution of about 1280 pixels by 1024 pixels and a diagonal measure of about 17 inches. When viewed from a normal sitting distance of 15 inches, the individual pixels are not detectable, and the FOV is about 20° horizontal. Viewed from a distance of 2 inches, the individual pixels are clearly discernible, and the FOV is about 100° horizontal. HMDs are analogous to the latter case: displays with resolutions of roughly 600 pixels by 400 pixels are placed about an inch from the surface of the eye.
Because of the placement of the display, an HMD has low angular resolution: each pixel covers a large visual angle. Text is difficult to present, and depth perception is seriously distorted. Users of typical HMDs have an effective visual acuity that would qualify them as legally blind (i.e., 20/200 vision or worse) (Ellis, 1995a). In aviation, low-resolution displays were shown to increase the root-mean-square deviation from an optimal descent path (Fadden, Browne, & Widemann, 1991). The pixel resolution of the display is therefore important for a variety of applications.
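A rough way to connect display geometry to these acuity figures: divide the horizontal FOV by the pixel count to get the angle each pixel subtends, then treat that angle as the smallest resolvable detail. Snellen 20/20 vision corresponds to a minimum angle of resolution of 1 arcminute, so the Snellen denominator is about 20 times the pixel size in arcminutes. This first approximation ignores the optics, pixel fill factor, and contrast, and the function names are illustrative.

```python
def arcmin_per_pixel(fov_deg, pixels):
    """Average angle subtended by one pixel, in arcminutes (uniform spacing assumed)."""
    return fov_deg * 60.0 / pixels

def snellen_denominator(mar_arcmin):
    """20/20 resolves 1 arcmin, so a minimum angle of resolution of m arcmin is ~20/(20*m)."""
    return 20.0 * mar_arcmin
```

A 600-pixel display spread over 50° gives 5 arcminutes per pixel (roughly 20/100); the same 600 pixels spread over 100° give 10 arcminutes per pixel, right at the 20/200 threshold for legal blindness. This is the resolution-versus-FOV trade-off in numerical form.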
The ability of the human eye to discriminate fine detail is called visual acuity. Acuity is determined by the physiology of the retina, whose receptors vary in density and precision across its surface. The central region of sharpest vision, called the fovea, has a resolution of about 30 seconds of arc (Buser & Imbert, 1991; Boff & Lincoln, 1988; Goldstein, 1989). HMDs are far from presenting images that match this level of acuity.
As the eye moves, the high-acuity foveal region points at different parts of the image. Because the fovea may be directed at any part of the display, an image of sufficiently high resolution must match foveal acuity everywhere. Most displays available today have inadequate resolution. In addition, eye movement may cause vignetting, the partial to total loss of light from the image. Vignetting occurs when the pupil of the eye is not at the intended exit pupil of the HMD's optical system (Ma, Hollerbach, & Hunter, 1993). This problem can be partially alleviated by enlarging the exit pupil.
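Inverting the same arithmetic shows how far displays are from that goal. If every pixel must subtend no more than the fovea's roughly 30 arcseconds (0.5 arcminute), the pixel count grows linearly with the FOV to be covered. A sketch using FOV values from the ranges quoted earlier; the helper name is hypothetical.

```python
import math

def pixels_for_acuity(fov_deg, target_arcmin=0.5):
    """Pixels needed across fov_deg so that each pixel subtends at most target_arcmin."""
    return math.ceil(fov_deg * 60.0 / target_arcmin)
```

Covering 50° at foveal acuity takes 6,000 pixels across, ten times the roughly 600 pixels of a typical HMD display; covering 140° would take 16,800.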
A number of display systems have troublesome interfaces with the computational engines that produce the images. Often, the frame buffer of the graphics computer does not map precisely onto the surface of the display; usually some image clipping occurs. Thus, the actual FOV of the display may differ from the published characteristics of the HMD. Unless the behavior of the computer-to-display interface is properly represented, the perspective geometry will be computed incorrectly. In one system, neglecting the clipping of the frame buffer resulted in an error of 5° in the presented FOV, causing objects to look bigger and closer than intended (Rolland, Gibson, & Ariely, 1995). Clearly, the designer of a visual display system should understand the behavior of the interface between the graphics engine and the actual display.
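The clipping effect can be reasoned about with a simple model. Suppose the software renders a symmetric frustum of horizontal FOV θ_r into the full frame buffer, but only a central fraction f of the frame buffer's width reaches the display. The visible part then represents only 2·atan(f·tan(θ_r/2)), yet it is stretched to fill the display's optical FOV, so objects appear magnified. The numbers below are hypothetical and are not those of the system Rolland, Gibson, and Ariely measured.

```python
import math

def shown_fov_deg(render_fov_deg, visible_fraction):
    """FOV actually represented by the central visible_fraction of the frame-buffer width."""
    half = math.radians(render_fov_deg) / 2.0
    return math.degrees(2.0 * math.atan(visible_fraction * math.tan(half)))

def central_magnification(display_fov_deg, shown_deg):
    """Approximate size scaling of small central objects when shown_deg fills display_fov_deg."""
    return math.tan(math.radians(display_fov_deg) / 2.0) / math.tan(math.radians(shown_deg) / 2.0)
```

Rendering 60° but displaying only 90% of the width shows about 55° of the scene across a 60° display, so central objects appear about 11% larger (1/0.9) than intended, and therefore closer.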
Furthermore, a designer should try to build a head-mounted display with a large eye relief. A large eye relief would accommodate the 30-50% of the population aged 20-45 who use spectacles, more than half of whom wear them when using optical devices (Ma, Hollerbach, & Hunter, 1993). However, increasing the eye relief reduces the FOV. Many HMDs do not allow any eye relief adjustment at all to accommodate wearers of eyeglasses.
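The trade-off between eye relief and FOV can be illustrated with a deliberately simplified, aperture-limited model: if the field is limited by the edge of the final lens, the eye sees that lens subtending about 2·atan(D/(2·ER)), where D is the clear aperture and ER the eye relief. Real HMD optics are more complicated, and the 40 mm aperture used below is an assumption chosen only for illustration.

```python
import math

def aperture_limited_fov_deg(aperture_mm, eye_relief_mm):
    """Angle subtended by a lens of the given clear aperture, seen from eye_relief_mm away."""
    return math.degrees(2.0 * math.atan(aperture_mm / (2.0 * eye_relief_mm)))
```

Under this model, moving from 15 mm to 25 mm of eye relief shrinks the field of a 40 mm lens from about 106° to about 77°.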
User variance, as illustrated by the issue of users with corrected vision, is one of the critical elements of display system design. A VE display system should be flexible enough to handle the variation between users yet robust enough to present the same image to each person. One of the major factors that differs from user to user is the interpupillary distance. IPD varies from about 53 mm to 73 mm, averaging around 63 mm (the author's own IPD measures 61 mm: 29 mm to the left of center and 32 mm to the right). This variation places the following constraint on the design: either the optics must provide an exit pupil wide enough to accommodate both wide- and narrow-eyed viewers, or a mechanical adjustment, as on binoculars, should be incorporated into the optical system (Robinett & Rolland, 1992).
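The exit-pupil option can be sized with back-of-the-envelope arithmetic. Assuming the two optical axes are fixed at the 63 mm average, each eye's pupil sits (IPD - 63)/2 mm off its axis, so across the 53 to 73 mm range each pupil moves up to 5 mm either way. The exit pupil must cover that travel plus the diameter of the eye's own pupil; the 4 mm pupil diameter below is an assumption, and the helper is hypothetical.

```python
def required_exit_pupil_mm(ipd_min_mm, ipd_max_mm, axis_separation_mm, eye_pupil_mm):
    """Exit-pupil diameter needed so every IPD in range keeps the whole eye pupil inside it."""
    # Lateral offset of one eye from its optical axis at the worse end of the IPD range.
    worst_offset = max(abs(ipd_min_mm - axis_separation_mm),
                       abs(ipd_max_mm - axis_separation_mm)) / 2.0
    # The exit pupil is centred on the axis, so it must extend offset + pupil radius each side.
    return 2.0 * (worst_offset + eye_pupil_mm / 2.0)
```

Under these assumptions the exit pupil would need to be about 14 mm wide; the alternative is to move the optics to the eyes, as a mechanical IPD adjustment does.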
User variance includes more than just physiological differences; psychological issues are also important. For example, familiarity with a particular display system has been shown to significantly affect performance on some tasks in a VE (Stanney, 1995). In particular, novice users of HMD systems often fail to take advantage of the multiple viewpoints presented or, even more critically, of the adjustments possible on the HMD itself.
Another problem with head-mounted displays stems from limitations of the display technology. Generally, the displays in HMDs lack luminance and contrast. In the real world, intensity within a single scene might range from 1 to 1000 (e.g., a sunbeam in a dark room), while a typical CRT has an intensity range of only 1 to 100, and an LCD has even less. Contrast also aids visual acuity; as a display gets dimmer, acuity decreases (Christou & Parker, 1995). The relationship between contrast and visual acuity is described by the Contrast Sensitivity Function (CSF), which gives the contrast needed to detect a pattern as a function of its spatial frequency; as contrast falls, the finest resolvable detail becomes coarser (Weintraub, 1993; Goldstein, 1989). Because HMDs are unable to produce realistic contrast and luminance values, visual acuity suffers.
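The luminance ranges quoted above are easier to compare in logarithmic units, since perceived brightness scales roughly with the logarithm of intensity. A small sketch that converts a maximum-to-minimum luminance ratio into log10 units and into Michelson contrast, (Lmax - Lmin)/(Lmax + Lmin), using only the ratios from the text; the function names are illustrative.

```python
import math

def log_units(l_min, l_max):
    """Dynamic range expressed in log10 units (orders of magnitude)."""
    return math.log10(l_max / l_min)

def michelson_contrast(l_min, l_max):
    """Michelson contrast between the darkest and brightest levels."""
    return (l_max - l_min) / (l_max + l_min)
```

The real-world scene spans 3 log units against the CRT's 2, so the display must compress the scene's range by a factor of ten before any other losses are considered.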
The color capability of the display technology used in most head-mounted displays is also inadequate. The range of colors that can be presented simply does not match the colors that the human eye can perceive and discriminate. Moreover, for each color, the brightness control produces fewer brightness levels than can normally be differentiated (Christou & Parker, 1995; Barfield et al., 1995). Thus, the use of color displays adds another layer of complexity to the limits of HMDs.
The optics that rest between the user and the display surface also contribute to the difficulty of designing an HMD. Besides increasing the FOV, the optics provide the user with an image that he or she can focus on, even though the display surface may be very close to the eye (Robinett & Rolland, 1992; Hodges & Davis, 1993). Because most HMD optics are magnifiers used to widen the FOV, they introduce a convex spherical distortion, and most code for presenting images on these displays fails to account for it. The optics end up curving normally straight lines and introduce a number of other aberrations. The main aberrations in HMD optics can be described (in layman's terms) as blurring, curvature, distortion, and color (Ma, Hollerbach, & Hunter, 1995). Humans have some ability to adapt to aberrations, and much work in optometry has been devoted to quantifying human tolerances to these distortions.
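A common first-order description of this kind of distortion is a radial polynomial: a point at distance r from the optical axis is imaged at r(1 + k·r²), where k > 0 gives pincushion distortion, the form usually associated with simple magnifiers. The sketch below uses a hypothetical k in normalized coordinates to show a straight vertical line losing its straightness; it is a textbook simplification, not the model of any particular HMD.

```python
def distort(x, y, k):
    """Map an ideal point to where the optics image it: r -> r * (1 + k * r^2)."""
    scale = 1.0 + k * (x * x + y * y)
    return x * scale, y * scale

# Points on the straight line x = 0.5 at the top, middle and bottom of the field.
line = [distort(0.5, y, 0.2) for y in (-0.5, 0.0, 0.5)]
```

The middle point moves out to x = 0.525 while the ends move out to x = 0.55, so the imaged line is curved: its ends are pushed farther from the axis than its middle.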
The optical system in an HMD presents another problem, since its idiosyncrasies are usually not modeled in the code used to display the image. Generally, graphics systems model the eye as a single point at the center of the perspective projection. However, the pupil is not well represented as a single point, and most displays do not account for the movement of the eye (Rolland, Gibson, & Ariely, 1995). Moreover, displacing the "virtual eye" from the true eye position in some models results in significant spatial errors as well as slower task completion (Rolland, Biocca, Barlow, & Kancherla, 1995). Most HMD optical systems also have large exit pupils, which likewise are not accurately represented as a point. Clearly, the geometric model used to compute the image should match the characteristics of both the display optics and the optics of the human eye.
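The consequence of a misplaced center of projection can be sketched in a two-dimensional top view. The renderer projects an object onto the (virtual) image plane from a modeled eye position, but the real eye views that drawn point from somewhere else, so the object is seen in the wrong direction. The 5 mm offset and 1 m image distance below are hypothetical, as is the function name.

```python
import math

def direction_error_deg(obj_x, obj_z, screen_z, model_eye_x, true_eye_x):
    """Angular error (degrees) in an object's seen direction when the projection is
    centred at model_eye_x but the eye is actually at true_eye_x.

    Top view: both eye positions lie on z = 0, the image plane is at z = screen_z,
    and the object is at (obj_x, obj_z); all lengths share one unit.
    """
    # Where the renderer draws the object: project from the modelled centre onto the plane.
    screen_x = model_eye_x + (obj_x - model_eye_x) * screen_z / obj_z
    # Direction in which the real eye sees the drawn point vs. the object's true direction.
    seen = math.atan2(screen_x - true_eye_x, screen_z)
    actual = math.atan2(obj_x - true_eye_x, obj_z)
    return math.degrees(seen - actual)
```

With a 5 mm offset, an object 500 mm away (half the image distance) is misplaced by about 0.29°. The error vanishes when the modeled and true eyes coincide, and also for objects lying exactly on the image plane, where the drawn point and the object coincide.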
Human perceptual distortions can further complicate the precise modeling of the visual display. Even a distortion-free display and a precise formulation of the geometry will still result in some inaccurate perceptions. This is due, in part, to the psychology of self-location, which holds that accurately perceiving an object's location requires combining perceived distance, perceived direction, and the perceived location of the viewpoint (Psotka, Lewis, & King, 1996). Because the visual system is an information-loss system, a number of filters, from the physiological to the psychological level, act to extract relevant information from the stream of data received from the real world (Weintraub, 1993). The physiological characteristics of these filters have been discussed, but human psychological biases are not well modeled in most VE visual display systems.
The field of view of the display is another important design parameter for developing immersive simulations. Spatial resolution is generally compromised to provide a wider FOV; a narrow FOV with high resolution gives an unrealistic sense of tunnel vision. Conversely, a low-resolution, wide-FOV display gives a more primitive, yet more realistic, image. Because of this trade-off, HMDs are simply inappropriate for certain tasks. One hardware solution follows the eye with a high-resolution insert of about 30° (VETREC, 1992; Travis, Watson, & Atyeo, 1994; Ellis, 1995a; Yoshida, Rolland, & Reif, 1995).
The exponential growth of technology should not be ruled out as a solution to the problems with VE systems. Active-matrix LCDs have already surpassed good-quality CRTs and are far ahead of them in size, weight, power consumption, and operating voltage (Ma, Hollerbach, & Hunter, 1993). A recent development in LCD technology places a 640-pixel by 480-pixel display on a single chip with pixels measuring only 30 microns by 30 microns. Not only is this chip small, but it also has low power consumption and a low production cost (MicroDisplay, 1996).
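The physical size implied by those figures is easy to check: 640 × 480 pixels at a 30-micron pitch give the active area below. The calculation assumes the pixels tile with no border gaps, an idealization, and the helper name is hypothetical.

```python
import math

def active_area_mm(cols, rows, pitch_um):
    """Width, height and diagonal (mm) of a pixel array with the given pitch in microns."""
    w = cols * pitch_um / 1000.0
    h = rows * pitch_um / 1000.0
    return w, h, math.hypot(w, h)
```

That is about 19.2 mm by 14.4 mm, a 24 mm diagonal (just under one inch), so such a chip depends entirely on magnifying optics to fill a useful FOV.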
Another example of the potential of current technology is the CAE Fiber Optic HMD (FOHMD), considered to be one of the best visual displays currently available. The CAE FOHMD uses two 83.5° monocular FOVs with an adjustable binocular overlap of up to 38°. It provides a horizontal FOV of 162°. The background resolution is 5 arcminutes, and a high-resolution insert (24° × 18°) provides 1.5-arcminute resolution. In addition, the displays are bright, at 30 foot-lamberts. The head tracker's performance is boosted by additional accelerometers for predictive tracking, yielding an update rate of about 100 Hz (Ellis, 1995b; Kalawsky, 1993). Of course, the FOHMD is a fairly heavy piece of equipment and is prohibitively expensive for most uses.
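Two pieces of arithmetic help interpret these specifications. Assuming the monocular fields simply add minus their overlap, the total horizontal FOV is 2 × 83.5° - overlap, so the quoted 162° corresponds to an overlap of about 5°, and the full 38° overlap narrows the field to 129°. The second calculation shows why a high-resolution insert is attractive: it concentrates pixels where the eye is looking. The function names are illustrative.

```python
def binocular_fov_deg(monocular_deg, overlap_deg):
    """Total horizontal FOV when two monocular fields overlap by overlap_deg."""
    return 2.0 * monocular_deg - overlap_deg

def pixels_across(fov_deg, arcmin_per_pixel):
    """Pixels needed to span fov_deg at the given angular resolution."""
    return fov_deg * 60.0 / arcmin_per_pixel
```

Covering the whole 162° at the insert's 1.5-arcminute resolution would take 6,480 pixels across, whereas the 5-arcminute background needs 1,944 and the 24° × 18° insert only 960 × 720.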
Another interesting display is the Sparcchair, developed at Sun Microsystems. The Sparcchair trades a wide FOV for high resolution: it has a resolution of 1120 pixels by 900 pixels with a 20° by 25° FOV. The Sparcchair was developed for a specific task requiring high resolution, so this configuration seems reasonable (Reichlen, 1993). Yet even with the arrival of new technologies and designs, some trade-offs simply cannot be avoided.
The design of the optical system in an HMD also involves several unavoidable trade-offs. Robinett and Rolland (1992) have given the problems of HMD optics a fairly comprehensive treatment. They attempt to quantify the problems associated with optical distortion and IPD variation by constructing an extensive model of the image based on the layout of the optics in an HMD. A comprehensive simulation should provide a consistent image by accounting for the properties of the HMD's geometry, including the relative positions of the display screens, optics, and eyes (Robinett & Rolland, 1992).
Once a computational model of the HMD geometry has been included in the code, IPD variation can be accounted for by using the IPD as a parameter in the calculation and presentation of the graphics. Measuring a user's IPD is a fairly trivial task, and IPD adjustments on HMDs have become commonplace (Robinett & Holloway, 1995; Ma, Hollerbach, & Hunter, 1993). Further calculations have revealed ways to account for all of the various transforms in the optical system (including some tracker transforms), as well as for off-center perspective projection (Robinett & Holloway, 1995). Hodges and Davis (1993) have also described the perspective geometry of a display system. Their work, which describes the effects of pixels on stereo depth perception, has led to other solutions to display difficulties. Through extensive modeling and calculation, the optical distortions in HMDs can be corrected.
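As a sketch of treating IPD as a rendering parameter, the helpers below place each eye's center of projection at its own monocular offset from the head's midline (so an asymmetric IPD such as the author's 29 mm/32 mm split is handled) and compute an off-center (asymmetric) frustum for that eye. The assumptions are hypothetical: each eye views a virtual screen 1000 mm wide at 1000 mm, centered on an optical axis 31.5 mm from the midline. The left/right bounds follow the usual convention of off-axis projection calls such as OpenGL's glFrustum, not any specific published HMD model.

```python
def eye_positions_mm(left_pd_mm, right_pd_mm):
    """x positions of the left and right eyes relative to the head midline (left negative)."""
    return -left_pd_mm, right_pd_mm

def off_axis_frustum(eye_x, screen_center_x, screen_width, screen_dist, near):
    """Left/right frustum bounds at the near plane for an eye at eye_x viewing a screen
    centred at screen_center_x, screen_dist away (eye_x, screen values in one unit)."""
    half = screen_width / 2.0
    scale = near / screen_dist  # similar triangles: screen plane -> near plane
    left = (screen_center_x - half - eye_x) * scale
    right = (screen_center_x + half - eye_x) * scale
    return left, right

left_eye, right_eye = eye_positions_mm(29.0, 32.0)
left_bounds = off_axis_frustum(left_eye, -31.5, 1000.0, 1000.0, 0.1)
```

For the left eye, which sits 2.5 mm inboard of its optical axis, the frustum extends slightly farther to the left than to the right; an eye centered on its axis gets a symmetric frustum.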
Watson and Hodges (1995), using Robinett and Rolland's model of the optics' geometry (1992), implemented pre-distortions in software to correct for optical distortion. Their work is particularly interesting because it represents a software solution to a hardware limitation - a methodology discussed in more detail below.
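A software pre-distortion inverts the optical mapping: for each point where part of the image should appear, find the display position that the optics will move there. The sketch assumes the cubic radial model r_d = r(1 + k·r²) purely for illustration (not necessarily the model Watson and Hodges used). Rather than use the closed-form cubic solution, it solves for the radius with a few Newton iterations. The names and the value of k are hypothetical.

```python
import math

def distort(x, y, k):
    """Optical mapping assumed for illustration: r -> r * (1 + k * r^2)."""
    s = 1.0 + k * (x * x + y * y)
    return x * s, y * s

def predistort(xd, yd, k, iterations=10):
    """Find the display point (x, y) that the optics map onto the desired point (xd, yd)."""
    rd = math.hypot(xd, yd)
    if rd == 0.0:
        return 0.0, 0.0
    r = rd  # initial guess: no distortion
    for _ in range(iterations):
        f = r + k * r ** 3 - rd      # residual of r * (1 + k r^2) = rd
        df = 1.0 + 3.0 * k * r ** 2  # derivative with respect to r
        r -= f / df
    scale = r / rd  # radial mapping keeps the direction, so only rescale
    return xd * scale, yd * scale
```

Drawing each point at `predistort(...)` means the optics' own distortion carries it back to the intended position, so straight lines in the scene appear straight to the viewer.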
Inter-pupillary distance should not be the only parameter used to characterize user variance. A number of additional tests should be performed to assess other individual differences. Lampton, Knerr, Goldberg, Bliss, Moshell, and Blatt (1994) suggest a battery of tests to determine a subject's visual acuity, color and object recognition, size estimation, distance estimation, search and a number of other visual skills used in locomotion, object manipulation and target tracking. Such a battery seems more appropriate for rigorous experimentation in VE systems than for off-the-shelf VE systems. A good system should be able to accommodate population variance without seriously compromising performance. Thus, the job of VE designers is a difficult one; they must devise solutions that work around the limitations of the equipment and yet are capable of presenting a realistic environment.
Overall, then, the realism of a VE is reduced by the low quality of the virtual world. Photorealism suffers from the low resolution of the display and the computational limitations of the graphics engine. Functional and logical realism suffer for the same reasons and for the others mentioned above. Clearly, the application of VE systems to simulating a real-world task is warranted only if a suitable level of realism can be obtained.
Virtual environment systems should provide the sensory cues that are necessary for a particular task. A fully real physical world is too complex to simulate, so providing task-specific information in the best way possible is the only feasible solution (Zeltzer, 1991). Thus, some applications might benefit from a VE-type display, but the demands of many other tasks may be best met by more traditional (and cheaper) display types (Ellis, 1995b; Stanney, 1995).
For example, Smets and Overbeeke (1995) argue that spatial resolution is not important for some tasks, implying that low-resolution HMDs may be tolerable in some situations. How much resolution is necessary is obviously a function of the type of task (Travis, Watson, & Atyeo, 1994). In summary, one might ask:
A task analysis is the decomposition of a task into behavioral components that can be further analyzed (VETREC, 1992). However, visual tasks are fairly complex. Researchers know which types of visual stimulation users find informative for particular tasks, but they have trouble linking the type of stimulus to the type of task. For example, stereovision and motion parallax provide useful information about the relative distances of objects from the observer (Christou & Parker, 1995), but this result is hard to translate to a particular type of task.
Since we can derive the information to which the visual system is sensitive, the in-context (ecological) significance of this information, and the limitations on the use of that information by the visual system, we can design a VE to stimulate the visual system in a realistic manner. However, the limits of the human visual system must be accommodated first, before other contributions to realism can be analyzed (Christou & Parker, 1995).
Realism in a VE can be improved by recognizing the redundancy in the human visual system. Tasks that provide a great deal of redundancy (i.e., multiple cues to the same piece of information) are well suited to VE systems. Repeated information in the visual system reduces ambiguity and improves the signal-to-noise ratio (England, 1995).
Further analysis reveals that spatial visualization, orientation, spatial memory, and spatial scanning skills are helpful in predicting the performance of a human-machine interface (Stanney, 1995). A task can be analyzed in terms of these component skills to determine its suitability for a given interface.
Figure 2.1: Assuming a square, 15 foot by 15 foot object and a display that is 600 pixels by 400 pixels, this plot shows the concept of pixellation of depth. The sample object remains the same size despite being at significantly different depths. For example, the object remains 2 pixels by 2 pixels from about 4000 feet to 7000 feet, a range which is much different from discriminability in the real world.
Pixellation of depth causes two major problems: first, the ability to judge depth is severely impaired; second, the range over which objects are visible is greatly reduced. The impairment of depth estimation can be further exacerbated by improperly applied anti-aliasing techniques (Christou & Parker, 1995). Human depth judgment has an acuity of about 5 minutes of visual arc near the fovea, although lower values have been reported for special cases (Yeh, 1993).
Figure 2.1 shows the threshold problem caused by low-resolution displays. As distance increases, the jump from one pixel to no pixels occurs well before the human visual system would reach its threshold of detectability. The display assumed here is unable to match human abilities. This inadequacy lies at the heart of the problem with visibility in HMDs and has received some acknowledgment in the literature (Christou & Parker, 1995; Pioch, 1995), but no reasonable solutions have been presented.
$$\theta = \tan^{-1}\left(\frac{S}{D}\right) \tag{1}$$
Figure 2.2: Basic model of visual acuity. The tangent of the visual angle, θ, equals the ratio of the separation, S, of the two point-objects A and B to the distance, D, from the cornea of the eye to the line through A and B, measured along its perpendicular bisector.
Formula (1) gives a fairly accurate representation of the angle subtended by the separation of two objects as a function of depth. Visual acuity can now be defined as the threshold value of θ: the smallest angle at which A and B can still be discriminated.
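Equation (1) and the acuity threshold translate directly into code. This sketch assumes a fixed 1-arcmin acuity threshold (the value used later in this section) and treats the separation and distance as lengths in the same unit; the function names are illustrative.

```python
import math

ARCMIN_PER_RADIAN = 60.0 * 180.0 / math.pi


def visual_angle_arcmin(separation, distance):
    """Equation (1): theta = arctan(S / D), converted to minutes of arc."""
    return math.atan(separation / distance) * ARCMIN_PER_RADIAN


def discriminable(separation, distance, acuity_arcmin=1.0):
    """True if A and B subtend more than the assumed acuity threshold."""
    return visual_angle_arcmin(separation, distance) > acuity_arcmin


# Two points 15 ft apart, viewed from 40,000 ft and from 60,000 ft.
print(discriminable(15.0, 40_000.0), discriminable(15.0, 60_000.0))   # True False
```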
Depth acuity refers to the ability of a subject to discriminate between two objects positioned at different depths (Goldstein, 1989; Graham, 1951). Depth acuity is a particularly complex issue, since a depth percept is constructed from a number of cues. Depth cues can be classified into stereopsis cues and pictorial depth cues. Stereopsis refers to the production of a three-dimensional scene from the images acquired by each eye. Stereopsis cues also include accommodation and convergence, which help determine depth by noting the state of rotation of the eyes (convergence) and the focus of the lens (accommodation).
The main pictorial depth cues generally include occlusion, linear perspective, size and familiar size, distance to horizon, color, shading, atmospheric effects, texture gradient, focus, shadow, and motion parallax (Goldstein, 1991; Buser & Imbert, 1992; Graham, 1951; Boff & Lincoln, 1988). Interaction among pictorial depth cues is very difficult to quantify. However, the influence of occlusion, linear perspective, and size constancy cues is known to be stronger, under most conditions, than most of the other cues. Linear perspective and size constancy are the cues used most frequently in VEs, because most HMDs cannot produce a stereo image of decent quality.
The other pictorial depth cues are generally more situation-dependent than the linear perspective and size constancy cues. For example, occlusion is useless unless two objects are placed so that one is at least partially in front of another. The color range available on most HMDs is not sufficient to produce a significant color depth effect. In addition, the "looking through binoculars" feeling of an HMD is not likely to produce an accurate familiar size cue. Most importantly, the deficiencies in color and resolution make blurring and defocusing cues nearly worthless, preventing the use of anti-aliasing techniques.
Because of the limitations of the visual displays, some depth cues are simply unavailable, and the remaining cues generally lack the precision of the real world. Since size constancy and linear perspective are the main depth cues used in VE displays, the examination of these cues will provide insight into the depth perception problems that result from poor pixel resolution.
First, the size constancy cue is based on the observation that familiar objects produce smaller retinal images as they move farther away. Prior knowledge of the size of the object is an important component of the size constancy cue. Size constancy was first noted in the literature in 1889, when evidence was given to match the virtual retina theory (as expressed in Equation (1)) (Maurtius, 1889).
Figure 2.3: Size constancy. An object appears to shrink as the distance between it and the observer increases. (a) The size of the object is given as h at distances A, B, and C. (b) The object as seen at distance A. (c) The object as seen at distance B. (d) The object as seen at distance C.
Linear perspective cues generally require that the observer be some distance above the plane being viewed. Humans have their eyes conveniently located some distance above the ground, which helps to provide this type of cue.
Figure 2.4: The effect of viewpoint height. For a constant viewing distance, d, and a consistent fixation point, increasing the observation height decreases the visual angle subtended by the object and moves the horizon line. (a) The size of the object is given as h and the object is viewed from locations A, B, and C. (b) The object as seen from location A; viewpoint height is zero. (c) The object as seen from location B; viewpoint height is . (d) The object as seen from location C; the viewpoint height is d.
Figure 2.5: With the viewpoint located at a height and fixed at a single point, the size of the object shrinks and it appears to move towards the horizon as the separation between the object and observer increases. (a) The size of the object is given as h and is viewed at distances A, B, and C. (b) The object as seen at distance A. (c) The object as seen at distance B. (d) The object as seen from distance C.
Given a particular viewpoint height, the linear perspective cue can be described as the motion of an object towards a central "infinity point" as it moves away from the observer. The following figure illustrates this idea:
Figure 2.6: Result of tracing the corner points of a square of fixed width and height as the separation between the observer and the object increases from zero to infinity.
Conveniently, both of these depth cues can be described by simple mathematics. A prediction of subject performance in a depth perception task can be based both on the perspective geometry and on the results of previous work in human visual performance. The development of a predictive model of visual depth perception in VEs will facilitate the quantification of the threshold and depth estimation problems described above.
The first component of this model is a formula describing the visual angle subtended by an object as a function of viewing distance. For the following calculations, a simple model with no viewpoint height is assumed:
Figure 2.7: A simple model for the size constancy calculation. The angle subtended by the object, θ, decreases as distance increases according to the tangent function given in Equation (2).
Substituting the parameters of this model into Equation (1):
$$\theta = \tan^{-1}\left(\frac{h}{d}\right) \tag{2}$$
Figure 2.8: A plot of Equation (2), where h is the object size and d is the viewing distance. The size of the object is assumed to be 15 feet by 15 feet.
Given a value of 1 minute of visual angle for human spatial acuity, the greatest distance at which a 15 foot by 15 foot object can be detected is:
$$d_{\max} = \frac{h}{\tan(1')} = \frac{15\ \text{ft}}{\tan(1/60^\circ)} \approx 51{,}566\ \text{ft} \tag{3}$$
However, visual acuity is not independent of the viewing distance (Boff & Lincoln, 1988; Geise, 1946), since environmental noise may further add to or detract from it. The actual visual acuity at such a great distance is difficult to determine. Nagata plotted the degradation of several cues as a function of distance and found that the size constancy cue starts to become useless at about 1000 m (Nagata, 1991). An engineering approach to determining an actual visibility point will be discussed below.
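Inverting Equation (2) at the acuity threshold gives the detection distance. A short check of the arithmetic follows, assuming a constant 1-arcmin acuity, which, as just noted, overstates real performance at long range.

```python
import math


def max_detection_distance(size, acuity_arcmin=1.0):
    """Distance at which an object of `size` subtends the acuity threshold.

    Assumes acuity is independent of viewing distance, which overstates
    real performance at long range.
    """
    return size / math.tan(math.radians(acuity_arcmin / 60.0))


print(round(max_detection_distance(15.0)))   # 51566 (feet, for a 15 ft object)
```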
Visibility in computer displays has been an issue since the late 1940s. Fitts (1951) describes a number of tests regarding visibility of CRT displays, and notes that object size, brightness, and contrast are the main contributing factors to visibility in a normal display. An HMD has certain characteristics which determine visibility, namely: field of view, pixel resolution, and display size. Contrast and brightness are also important in HMDs, but since the spatial resolution is so poor, visibility is not likely to be affected as significantly by those factors.
Figure 2.9: Parameters of a head-mounted display. For some floating-point number x.y, we define:
Given the characteristics of an HMD presented in Figure 2.9, a formula for the actual number of pixels and displayed size of an object can be stated.
(4)
(5)
The problems caused by low display resolution are best illustrated with a particular example. The following list of constraints is typical of HMDs:
Horizontal resolution = 600 pixels
Vertical resolution = 400 pixels
Diagonal FOV = 60°
Horizontal FOV = 48°
Vertical FOV = 36°
These constraints are based roughly upon the current state of the art (as described in the section entitled Virtual Environment Equipment). Given these values, we can compute the visual angle subtended by one pixel:
$$\theta_{\text{pixel}} = \frac{\text{horizontal FOV}}{\text{horizontal resolution}} = \frac{48^\circ}{600} = 0.08^\circ = 4.8\ \text{arcmin} \tag{6}$$
Clearly, the visual angle subtended by one pixel in an average HMD is greater than the values for human visual acuity found in the literature. According to the visual angles given above, a 15' x 15' object in the real world would be barely visible at 51,566 feet, whereas in the display the same object would be just visible at 10,743 feet.
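The per-pixel angle and the corresponding one-pixel distance can be checked the same way. The sketch assumes the FOV is spread evenly across the pixels, which real HMD optics only approximate.

```python
import math


def pixel_angle_deg(fov_deg, pixels):
    """Visual angle of one pixel, assuming the FOV is spread evenly over the pixels."""
    return fov_deg / pixels


def one_pixel_distance(size, fov_deg, pixels):
    """Distance at which an object of `size` subtends exactly one pixel."""
    return size / math.tan(math.radians(pixel_angle_deg(fov_deg, pixels)))


horizontal = pixel_angle_deg(48.0, 600)   # 0.08 deg, i.e. 4.8 arcmin
vertical = pixel_angle_deg(36.0, 400)     # 0.09 deg, i.e. 5.4 arcmin
print(round(one_pixel_distance(15.0, 48.0, 600)))   # 10743
```

Using the coarser vertical pixel instead gives about 9,549 feet, so the horizontal figure is the more generous of the two.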
The HMD characteristics needed to match a human visual acuity of 1 minute of arc can be easily calculated. For an HMD with the typical FOV of 48° horizontal by 36° vertical, the display would need a resolution of 2,880 pixels (horizontal) by 2,160 pixels (vertical) to match foveal acuity. Conversely, for an HMD with a typical resolution of 600 pixels (horizontal) by 400 pixels (vertical), the display would need a FOV of only 10° by 6.7°.
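These figures follow from requiring one pixel to subtend one arcminute. A minimal sketch of that arithmetic, with illustrative function names:

```python
def pixels_needed(fov_deg, acuity_arcmin=1.0):
    """Pixels across a FOV so that one pixel subtends the acuity threshold."""
    return round(fov_deg * 60.0 / acuity_arcmin)


def fov_allowed(pixels, acuity_arcmin=1.0):
    """Largest FOV (degrees) that a given pixel count can span at that acuity."""
    return pixels * acuity_arcmin / 60.0


print(pixels_needed(48.0), pixels_needed(36.0))       # 2880 2160
print(fov_allowed(600), round(fov_allowed(400), 1))   # 10.0 6.7
```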
In addition to the desired resolution and the number of pixels per object, the near complete visibility distance can be calculated. The near complete visibility distance is defined as the point at which the object is first fully contained in the display (i.e., it is neither cut off nor larger than the display). In this simple case:
(7)
Having calculated the limits on visibility imposed by a display, we can now examine the behavior of the object as it appears at different depths. A depth range is defined as the set of continuous distances over which an object stays the same size (i.e., the same number of pixels). Depth ranges arise because the object's displayed size can only change in whole-pixel steps, so small changes in distance leave it unchanged.
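Depth ranges can be computed directly once a rounding rule is chosen. The sketch below assumes that the displayed width is the object's angular size rounded to the nearest whole pixel, using the 48°/600-pixel horizontal geometry above; that rule is an assumption, not necessarily the one used to produce the figures. Under it, a 15-foot object is drawn 2 pixels wide from roughly 4,300 to 7,160 feet and 1 pixel wide from roughly 7,160 to 21,500 feet, close to the ranges quoted for Figures 2.1 and 2.11.

```python
import math


def pixels_spanned(size, distance, pixel_deg):
    """Displayed size in whole pixels, rounding the angular size to the nearest pixel."""
    angle_deg = math.degrees(math.atan(size / distance))   # Equation (2)
    return math.floor(angle_deg / pixel_deg + 0.5)


def depth_range(size, n_pixels, pixel_deg):
    """(near, far) distances over which the object is drawn n_pixels wide.

    Under nearest-pixel rounding the object is n pixels wide while its
    angular size lies in [n - 0.5, n + 0.5) pixels, so Equation (2) is
    inverted at both limits.
    """
    if n_pixels < 1:
        raise ValueError("n_pixels must be at least 1")
    near = size / math.tan(math.radians((n_pixels + 0.5) * pixel_deg))
    far = size / math.tan(math.radians((n_pixels - 0.5) * pixel_deg))
    return near, far


PIXEL_DEG = 48.0 / 600   # horizontal angle per pixel, in degrees
for n in (3, 2, 1):
    near, far = depth_range(15.0, n, PIXEL_DEG)
    print(f"{n} px: {near:,.0f} ft to {far:,.0f} ft")
```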
Figure 2.10: A plot of the discrete size steps caused by low pixel resolution. The assumed size of the target object is 15 feet by 15 feet, the FOV is taken to be 48°, and the pixel resolution is 600 pixels by 400 pixels.
Not only does the pixellation of depth reduce depth resolution, but it also reduces the total range over which an object can be seen. Since the smallest visible unit is one pixel, and the visual angle subtended by one pixel is greater than the smallest angle the human eye can resolve, an object will disappear prematurely as it moves into the distance and its image shrinks below one pixel.
Figure 2.11: The discretization of object size as a function of pixel resolution and distance. The target object is assumed to be 15 feet by 15 feet, the FOV is taken to be 48°, and the resolution is assumed to be 600 pixels by 600 pixels.
Figure 2.11 dramatically illustrates the effects of pixellation on the appearance of an object at various depths. In this model, a viewer would be unable to discriminate between an object at 7,500 feet and an object at 21,000 feet. However, in some ways, the detection threshold issue is more of a concern than the distance discrimination issue. Because human depth acuity at a great distance is considerably poorer than at a close distance (Boff & Lincoln, 1988; Geise, 1946), the effect on distance discrimination is less important. From the calculations above, the predicted distance at which a human could spot a 15 foot tall object is about 51,000 feet, more than twice the distance at which the one-pixel cutoff occurs in this simple model. Although the actual distance may be smaller in practice, it is still significantly greater than the distance at which the object can be seen with current displays.
The pixellation of depth cues also has a significant effect on linear perspective. One would expect an object to exhibit the same stepping problem when it moves towards the horizon as when it changes size. However, the model must include a non-zero viewpoint height to observe this effect. Since the appearance of the object as a function of distance is simpler when the viewpoint height is greater than the object height, the model will assume:
Figure 2.12: A side view of a model for calculating an object's visual angle. As distance increases, the visual angle subtended by the object decreases according to a tangent function.
From the model in Figure 2.12, the following formulas can be derived:
(8)
(9)
(10)
The formula describing the number of pixels composing the object is the same as before:
(11)
Finally, the formulas determining the location of the end points of the object can be defined:
(12)
These equations are used for the vertical dimension only. To fully understand the behavior of the object, the horizontal dimension should also be considered. The following figure shows the model of the object as viewed from above:
Figure 2.13: A top view of the same scene as depicted in the previous figure. The visual angle decreases according to Equation (13), for a fixed object width and increasing distance.
Repeating the previous derivations for the model shown in Figure 2.13, we have:
(13)
(14)
(15)
(16)
The endpoints of the object will reflect both the effect of size constancy and the effect of linear perspective, since the endpoints are determined both by the location of the object and by its size.
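The behavior described by these endpoint equations can be illustrated with an independent sketch; it is not a transcription of Equations (8) through (16). It assumes a horizontal line of sight (so the horizon falls on the center row), an object standing on the ground and centered horizontally, a hypothetical eye height of 30 feet (chosen to exceed the 15-foot object height, as the model requires), and a linear mapping from visual angle to pixels on a 600 by 400 display with a 48° by 36° FOV. It returns fractional edge positions before any rounding.

```python
import math


def edge_positions(d, w=15.0, h=15.0, eye_height=30.0,
                   res=(600, 400), fov_deg=(48.0, 36.0)):
    """Fractional screen positions (left, right, top, bottom) of the object's edges.

    x grows to the right and y grows downward; (res[0]/2, res[1]/2) is the
    point on the horizon straight ahead. Angles map linearly to pixels.
    """
    cx, cy = res[0] / 2, res[1] / 2
    px_per_deg_x = res[0] / fov_deg[0]
    px_per_deg_y = res[1] / fov_deg[1]
    half_width_deg = math.degrees(math.atan((w / 2) / d))     # shrinks with distance
    top_deg = math.degrees(math.atan((eye_height - h) / d))   # angle below the horizon
    bottom_deg = math.degrees(math.atan(eye_height / d))      # angle below the horizon
    return (cx - half_width_deg * px_per_deg_x,
            cx + half_width_deg * px_per_deg_x,
            cy + top_deg * px_per_deg_y,
            cy + bottom_deg * px_per_deg_y)


for d in (1_000, 5_000, 20_000):
    left, right, top, bottom = edge_positions(d)
    print(f"{d:>6} ft  x: {left:6.1f}-{right:6.1f}  y: {top:6.1f}-{bottom:6.1f}")
```

Horizontally the object shrinks symmetrically about column 300; vertically both edges slide toward row 200 while the gap between them closes, the combination of linear perspective and size change that pixel rounding then quantizes.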
Figure 2.14: A plot showing the results of linear perspective and size constancy on object location. The space between the top and bottom lines (dotted) and between the right and left lines (solid) indicates the size of the object at various distances. The dimensions of the display are assumed to be 600 pixels by 400 pixels, and the object size is assumed to be 15 feet by 15 feet. In the right-left case, the object remains centered in the middle of the screen, at 300 pixels, while in the top-bottom case, the object moves towards 200 pixels.
Figure 2.14 shows a number of inconsistencies in the shape of the object as it recedes in depth. At a number of points the object is taller than it is wide, due to the 4 x 3 aspect ratio of the display. The interaction of size constancy and linear perspective is quite apparent. Figure 2.15 shows in more detail the behavior of the left and right points. In the horizontal case, the behavior is as expected: the size decreases consistently until the cutoff threshold point. Also, the cutoff point in this model (~5,400 feet) is closer than that in the simple model with a zero viewpoint height (~22,000 feet).
Figure 2.15: The predicted movement of the left and right edges of a 15' by 15' object as viewing distance increases. The size of the object at a particular distance is given by the vertical distance between the plots for the left and right edges. The display is assumed to be 600 pixels wide.
The plot of the left and right points of the object shows no inconsistencies in the shape of the object. Again, the effect of pixel size on the appearance of the object is apparent. The movement of the top and bottom endpoints is more interesting, since the observer is not viewing along the line to the center of the object. With the observer above the object being viewed, the object will move according to the equations that model linear perspective and will shrink according to the equations for size constancy. However, the changes in the appearance of the object due to the two depth cues do not necessarily happen at the same time, as Figure 2.16 shows:
Figure 2.16: The predicted movement of the top and bottom edges of a 15' by 15' object as viewing distance increases. The size of the object at a particular distance is given by the vertical distance between the plots for the top and bottom points. The display is assumed to be 400 pixels tall.
Most notably, the object will disappear briefly at a distance of approximately 8,800 feet. The object, which is one pixel in size and moving towards the horizon, reaches a point where not enough of it is in either the pixel it is moving from or the pixel it is moving to. Thus, the object disappears until a sufficient portion of it moves into the new pixel.
Figure 2.17: The disappearance-reappearance problem. The linear perspective and size constancy geometry predict the location and size of the object shown in the left column. Because of rounding in the graphics software and hardware, the object is actually displayed as in the right column. A traversal from (a) to (c) represents the result of increasing the viewing distance.
This disappearance-reappearance problem at the threshold distance has a parallel in the visible range. The size constancy and linear perspective steps do not occur at the same time, as shown in Figure 2.17. Thus, an object may shrink and grow intermittently. Linear perspective may move the object to a point where it overlaps more pixels and thus appears a pixel bigger than predicted by size constancy alone. So, the disappearance-reappearance problem implies a similar growth-shrinkage problem. Depth estimation is clearly compromised by the disappearance-reappearance and growth-shrinkage problems.
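The disappearance-reappearance effect can be reproduced once each edge is rounded to the nearest pixel boundary. This sketch assumes a horizontal line of sight, a hypothetical 30-foot eye height, a 15-foot object standing on the ground, and 400 rows spanning a 36° vertical FOV; because the eye height is assumed, the distances it reports are illustrative and do not correspond to the 8,800-foot figure given for Figure 2.16.

```python
import math


def displayed_rows(d, h=15.0, eye_height=30.0, rows=400, vfov_deg=36.0):
    """Rows covered by the object once its top and bottom edges are rounded.

    Edge positions are measured in pixels below the horizon row; adding the
    integer row index of the horizon would not change the difference.
    """
    px_per_deg = rows / vfov_deg
    top = math.degrees(math.atan((eye_height - h) / d)) * px_per_deg
    bottom = math.degrees(math.atan(eye_height / d)) * px_per_deg
    # Snap each edge to the nearest pixel boundary (round half up).
    return math.floor(bottom + 0.5) - math.floor(top + 0.5)


def visibility_changes(distances, **geometry):
    """Distances at which the object vanishes or reappears during a scan."""
    events, was_visible = [], None
    for d in distances:
        visible = displayed_rows(d, **geometry) > 0
        if was_visible is not None and visible != was_visible:
            events.append((d, "reappears" if visible else "disappears"))
        was_visible = visible
    return events


for distance, change in visibility_changes(range(9_000, 60_001, 100)):
    print(distance, change)
```

With these assumptions the scan reports the object vanishing near 12,800 feet, reappearing near 19,100 feet, and vanishing again near 38,200 feet, even though its true angular size shrinks smoothly throughout.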
The complexity of the problems in perspective geometry grows with the complexity of the model of the observer and the target stimulus. The problems associated with low resolution require more sophisticated analysis than is commonly thought. These problems deserve careful treatment, since a well-constructed solution would have broad applications.
The obvious hardware solution to the problems caused by a lack of spatial resolution is to build displays with more pixels per inch. However, the technology to accomplish this is not yet available, nor is it clear that additional pixels would be used to improve the spatial resolution of a display: the demand for a wider FOV may outweigh the desire for better pixel resolution.
Thus, another kind of solution must be found. Perceptual tradeoffs are notoriously tricky and are best handled in a flexible way. Computer software is inherently adaptable and is a powerful tool for solving perception and display problems. Through careful measurement of human performance using the display with various software-controlled parameters, a reasonable solution can be achieved with relatively little effort.
The abstract idea of engineering software to match human perceptual performance is not a new one. Robinett and Rolland's model of the optical system in HMDs (1992) led to Watson and Hodges' work (1995) involving the software predistortion of images to compensate for optical distortion.
The compromises made by VE systems designers should be based as much as possible on the best available evidence regarding the interaction between the human visual system and objective performance metrics (VETREC, 1992). An effective design results from trading off sets of variables, including economic and psychological cost factors, in order to optimize resources for reaching task goals (Miller, 1976). Determining operational parameters inevitably involves tradeoffs among cost, performance, and efficiency. Zeltzer (1991) identifies the throughput of geometric primitives, visual update rate, and display resolution as the major design parameters for a visual display. In addition, temporal sensitivity and spatial resolution trade off against each other (a high-resolution image cannot always be updated fast enough to show smooth motion), and image intensity and perceived color and brightness influence one another (Christou & Parker, 1995).
Given that any VE visual system design incorporates a significant number of tradeoffs between hardware limitations and human perceptual capabilities, software offers an orthogonal domain in which to seek solutions. With the exception of the work by Robinett and Rolland (1992) and Watson and Hodges (1995), little effort has been made outside of traditional computer graphics to bridge human visual perception and the adaptability of software. Because of the flexibility of software and the ease and speed with which results can be tested, it is an obvious direction in which to pursue solutions to some of the more daunting perceptual difficulties found in VE systems.
The visibility-resolution problem itself has other implications. Not only would problems with visibility in VEs be solved, but other "smart" systems that suffer from the effects of poor resolution in depth judgment could also be improved. Most notably, night-vision goggles suffer from poor resolution which limits visibility and the overall effectiveness of the device. Thus, finding a solution for the effects of low resolution displays on visibility has other potentially useful ramifications.