3-dimensional electro-optical see-through displays
11803059 · 2023-10-31
Assignee
Inventors
Cpc classification
G02B2027/0147
PHYSICS
G02B2027/0187
PHYSICS
H04N13/383
ELECTRICITY
International classification
G02B26/00
PHYSICS
G02B30/34
PHYSICS
G06T19/00
PHYSICS
Abstract
A display placed in an optical pathway extending from an entrance pupil of a person's eye to a real-world scene beyond the eye. The display includes at least one 2-D added-image source that is addressable to produce a light pattern corresponding to a virtual object. The source is situated to direct the light pattern toward the person's eye to superimpose the virtual object on an image of the real-world scene as perceived by the eye via the optical pathway. An active-optical element is situated between the eye and the added-image source at a location that is optically conjugate to the entrance pupil and at which the active-optical element forms an intermediate image of the light pattern from the added-image source. The active-optical element has variable optical power and is addressable to change its optical power to produce a corresponding change in perceived distance at which the intermediate image is formed, as an added image to the real-world scene, relative to the eye.
Claims
1. A stereoscopic headmounted display for creating an image of a virtual 3-D scene containing objects located at differing 3-D depths within the virtual 3-D scene, the headmounted display comprising: a 2-D display addressable to produce at least two light patterns corresponding to respective virtual objects located at differing 3-D depths within the virtual 3-D scene, each of the respective virtual objects located at a respective virtual distance associated with a respective accommodation cue provided by the 2-D display; an active-optical element disposed at a location to receive the at least two light patterns emitted by the 2-D display, the active-optical element having an addressable and variable optical power corresponding to the differing 3-D depths of the virtual objects within the virtual 3-D scene; and a controller in communication with the 2-D display and the active-optical element, and configured to create and update the at least two light patterns on the 2-D display and configured to change the variable optical power of the active-optical element synchronously with the update of the at least two light patterns to match the variable optical power to the respective accommodation cues being provided by the 2-D display, wherein the active-optical element is addressable to change the variable optical power at a switching rate, f.sub.N, given by f.sub.N=f.sub.min/N, wherein N is a total number of focal planes and f.sub.min is a lowest response speed among the 2-D display, the active-optical element, and electronics driving the 2-D display and the active-optical element.
2. The stereoscopic headmounted display of claim 1, wherein the 2-D display comprises a 2-D array of light-producing pixels.
3. The stereoscopic headmounted display of claim 1, comprising a spherical mirror disposed on the optical axis in optical communication with the active-optical element and at a location such that the active-optical element is disposed between the spherical mirror and the 2-D display.
4. The stereoscopic headmounted display of claim 3, wherein the active-optical element is offset along the optical axis from the center of curvature of the spherical mirror, the spherical mirror having a radius of curvature, R.
5. The stereoscopic headmounted display of claim 4, wherein the active-optical element is offset by Δ, and an exit pupil of the 2-D display is magnified by an amount m.sub.p, given by m.sub.p=R/(R−2Δ).
6. The stereoscopic headmounted display of claim 1, wherein the active-optical element comprises a refractive active-optical element.
7. The stereoscopic headmounted display of claim 6, wherein the refractive active-optical element comprises a liquid lens.
8. The stereoscopic headmounted display of claim 1, wherein the controller is configured to change the optical power of the active-optical element to produce a corresponding change in perceived distance at which an intermediate image is formed.
9. The stereoscopic headmounted display of claim 8, wherein the stereoscopic headmounted display is a see-through display and wherein the intermediate image is added to a real-world scene observable through the see-through display.
10. The stereoscopic headmounted display of claim 8, wherein a distance at which the intermediate image is formed can serve as an accommodation cue for the person with respect to the intermediate image.
11. The stereoscopic headmounted display of claim 8, wherein the controller is configured to change the optical power of the active-optical element to correspondingly move the intermediate image along an optical pathway relative to a focal plane to produce a corresponding change in distance of the image relative to the person's eye.
12. The stereoscopic headmounted display of claim 1, wherein the stereoscopic headmounted display is a see-through display.
13. The stereoscopic headmounted display of claim 1, wherein the active-optical element is configured to change the variable optical power at a rate of at least 60 Hz.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(37) The following disclosure is presented in the context of representative embodiments that are not to be construed as being limiting in any way. This disclosure is directed toward all novel and non-obvious features and aspects of the various disclosed embodiments, alone and in various combinations and sub-combinations with one another. The disclosed methods, apparatus, and systems are not limited to any specific aspect or feature or combination thereof, nor do the disclosed embodiments require that any one or more specific advantages be present or problems be solved.
(38) Although the operations of the disclosed methods are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement of the operations, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed systems, methods, and apparatus can be used in conjunction with other things and methods.
(39) The following explanations of terms are provided to better describe the present disclosure and to guide those of ordinary skill in the art in the practice of the present disclosure.
(40) This disclosure sometimes uses terms like “produce,” “generate,” “select,” “receive,” “exhibit,” and “provide” to describe the disclosed methods. These terms are high-level abstractions of the actual operations that are performed. The actual operations that correspond to these terms may vary depending on the particular implementation and are readily discernible by one of ordinary skill in the art.
(41) The singular forms “a,” “an,” and “the” include the plural forms unless the context clearly dictates otherwise. The term “includes” means “comprises.” Unless the context dictates otherwise, the term “coupled” means mechanically, electrically, or electromagnetically connected or linked and includes both direct connections or direct links and indirect connections or indirect links through one or more intermediate elements not affecting the intended operation of the described system.
(42) Certain terms may be used such as “up,” “down,” “upper,” “lower,” and the like. These terms are used, where applicable, to provide some clarity of description when dealing with relative relationships. But, these terms are not intended to imply absolute relationships, positions, and/or orientations.
(43) The term “or” refers to a single element of stated alternative elements or a combination of two or more elements, unless the context clearly indicates otherwise.
(44) Unless explained otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this disclosure belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below. The materials, methods, and examples are illustrative only and not intended to be limiting. Other features of the disclosure are apparent from the following detailed description and the claims.
(45) Unless otherwise indicated, all numbers expressing quantities of components, percentages, temperatures, times, and so forth, as used in the specification or claims are to be understood as being modified by the term “about” or “approximately.” Accordingly, unless otherwise indicated, implicitly or explicitly, the numerical parameters set forth are approximations that may depend on the desired properties sought and/or limits of detection under standard test conditions/methods. When directly and explicitly distinguishing embodiments from discussed prior art, the embodiment numbers are not approximates unless the word “about” is recited.
(46) The various embodiments of displays described herein provide multiple focal planes in an optical see-through display. A particularly desirable display configuration is head-mountable; however, head-mountability is not a mandatory feature. For example, contemplated as being within the scope of the invention are displays relative to which a viewer simply places his or her head or at least his or her eyes. The displays include binocular displays (intended and configured for use with both eyes) as well as monocular displays (intended and configured for use with one eye).
(47) Each of the various embodiments of displays described herein comprises an active-optical element that can change its focal length by application of an appropriate electrical stimulus (e.g., voltage) or command. An active-optical element can be refractive (e.g., a lens) or reflective (e.g., a mirror).
(48) A practical active-optical element in this regard is a so-called “liquid lens.” A liquid lens operates according to the electrowetting phenomenon, and can exhibit a wide range of optical power. Electrowetting is exemplified by placement of a small volume (e.g., a drop) of water on an electrically conductive substrate, wherein the water is covered by a thin layer of an electrical insulator. A voltage applied to the substrate modifies the contact angle of the liquid drop relative to the substrate. Currently available liquid lenses actually comprise two liquids having the same density. One liquid is an electrical insulator while the other liquid (water) is electrically conductive. The liquids are not miscible with each other but contact each other at a liquid-liquid interface. Changing the applied voltage causes a corresponding change in curvature of the liquid-liquid interface, which in turn changes the focal length of the lens. One commercial source of liquid lenses is Varioptic, Inc., Lyon, France. In one example embodiment the respective liquid lens exhibits an optical power ranging from −5 to +20 diopters (−5D to 20D) by applying an AC voltage ranging from 32 V.sub.rms to 60 V.sub.rms, respectively. Such a lens is capable of dynamically controlling the focal distance of a light pattern produced by a 2-D micro-display from infinity to as close as the near point of the eye.
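The voltage-to-power behavior described above can be sketched numerically. The function below is a minimal illustration, not the lens's actual calibration: it assumes a simple linear mapping from 32 V.sub.rms–60 V.sub.rms to −5 D–+20 D, whereas a real liquid lens is specified as linear only over part of that range (38–49 V.sub.rms, per Table 2 below).

```python
# Sketch: linear voltage-to-optical-power model for an electrowetting
# liquid lens. The endpoint values (32-60 Vrms mapping to -5..+20 D)
# come from the text; the linearity across the full range is an
# illustrative assumption, not a manufacturer calibration.
def liquid_lens_power(v_rms, v_min=32.0, v_max=60.0, p_min=-5.0, p_max=20.0):
    """Return optical power in diopters for an applied RMS voltage."""
    if not (v_min <= v_rms <= v_max):
        raise ValueError("voltage outside the lens's operating range")
    frac = (v_rms - v_min) / (v_max - v_min)
    return p_min + frac * (p_max - p_min)
```

A real driver would instead interpolate a measured voltage-vs-power calibration table supplied by the lens vendor.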
First Representative Embodiment
(49) A representative embodiment of a stereoscopic display 10 is shown in the drawings.
(50) The display 10 comprises a 2-D micro-display 12 (termed herein an “added-image source”), a focusing lens 14, a beam-splitter (BS) 16, and a condensing (e.g., concave spherical) mirror 18. The added-image source 12 generates a light pattern intended to be added, as an image, to the view of the “real world” being perceived by a user wearing or otherwise using the display 10.
(51) To illustrate generally the operation of the display 10, reference is made to the drawings.
(52) The focusing lens 14 is drawn as a singlet in the drawings but in this embodiment comprises two lenses: an accommodation lens 14a and an objective lens 14b.
(53) Thus, the two lenses 14a, 14b together form an intermediate image of the light pattern produced by the added-image source 12, and the mirror 18 relays and directs the intermediate image toward the viewer's eye via the beam-splitter 16. The mirror 18 is configured to ensure a conjugate exit pupil is formed at the eye of a person using the display 10. By placing the eye at the conjugate pupil position, the viewer sees both the image of the light pattern produced by the added-image source 12 and a view of the real world. Although the mirror 18 in this embodiment is spherically concave, it will be understood that it alternatively could be aspherical-concave.
(54) In certain alternative configurations, the mirror 18 can be omitted. The main benefit of the mirror is its ability to fold the optical pathway and provide a compact optical system in the display. In certain situations such compactness may not be necessary.
(55) The accommodation lens 14a is a liquid lens in this embodiment, which is an example of a refractive active-optical element. It will be understood that any of several other types of refractive active-optical elements can alternatively be used, such as but not limited to a liquid-crystal lens. Further alternatively, the accommodation lens can be a reflective active-optical element, such as an actively deformable mirror. In other words, any of various optical elements can be used that have the capability of changing their focal length upon being addressed (i.e., upon command).
(56) Based on first-order optics and use of a liquid lens as an active-optical element, the accommodation cue, d, of the display 10 (i.e., the distance from the eye 15 to the image plane of the virtual object produced by the added-image source 12) is determined by:
(57) 1/d=2/R−Φ−1/u,  (1)
where Φ=Φ.sub.0+Φ.sub.A−Φ.sub.0Φ.sub.At is the combined optical power of the focusing lens 14, Φ.sub.0 is the optical power of the objective lens 14b, Φ.sub.A is the optical power of the accommodation lens 14a, t is the axial separation between the objective lens 14b and the accommodation lens 14a, u is the axial distance from the 2-D added-image source 12 to the focusing lens 14, and R is the radius of curvature of the mirror 18. All distances are defined by the sign convention used in optical design.
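The accommodation-cue relation can be checked numerically. The sketch below borrows representative parameters from the examples later in the document (an 18-mm focal-length objective, so Φ.sub.0 ≈ 55.6 D; t = 6 mm, u = −34 mm, and R = 70 mm); pairing exactly these values with the equation is illustrative, and the returned value is the magnitude of the dioptric cue under the stated sign convention.

```python
def combined_power(phi_obj, phi_acc, t):
    """Combined power (diopters) of two thin lenses separated by t meters:
    Phi = Phi0 + PhiA - Phi0*PhiA*t."""
    return phi_obj + phi_acc - phi_obj * phi_acc * t

def accommodation_cue_diopters(phi_acc, phi_obj=1 / 0.018,
                               t=0.006, u=-0.034, R=0.070):
    """Dioptric distance of the virtual image from the eye, from
    1/d = 2/R - Phi - 1/u (u is negative per the sign convention).
    Default parameters are borrowed from the examples for illustration."""
    phi = combined_power(phi_obj, phi_acc, t)
    inv_d = 2.0 / R - phi - 1.0 / u
    return abs(inv_d)  # report the cue magnitude in diopters
```

With these parameters, an accommodation-lens power of 5 D yields a cue near 1 diopter (image about 1 m away), and raising the lens power moves the image closer, consistent with the roughly 1 D to 6 D range exercised in the experiments.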
(58) This display 10 has multiple addressable focal planes for improved depth perception. Similarly to the accommodative ability of the crystalline lens in the human visual system, the liquid lens 14a or other refractive active-optical element provides an addressable accommodation cue that ranges from infinity to as close as the near-point of the eye. Unlike mechanical focusing methods, and unlike retinal scanning displays (RSDs) based on reflective deformable membrane mirrors (DMMs), the transmissive nature of the liquid lens 14a or other refractive active-optical element allows for a compact and practical display that has substantially no moving mechanical parts and that does not compromise the accommodation range.
(59) Focus cues are addressable with this embodiment in at least one of two modes. One mode is a variable-single-focal-plane mode, and the other is a time-multiplexed multi-focal-plane mode. In the variable-single-focal-plane mode, the accommodation cue of a displayed virtual object is continuously addressed from far to near distances and vice versa. Thus, the accommodation cue provided by a virtual object can be arbitrarily manipulated in a viewed 3-D world. In the time-multiplexed multi-focal-plane mode, the active-optical element, operating synchronously with graphics hardware and software driving the added-image source, is driven time-sequentially to render both accommodation and retinal blur cues for virtual objects at different depths. In comparison to the conventional time-multiplexed RSD approach using individually addressable pixels, use in this embodiment of the 2-D added-image source to render multiple full-color 2-D images on a frame-sequential basis substantially eliminates any requirement for high addressing speeds.
(60) This embodiment is head-mountable, as shown, for example, in the drawings.
EXAMPLE 1
(62) In this example a monocular display was constructed, in which the accommodation lens 14a was a liquid lens (“Arctic 320” manufactured by Varioptic, Inc., Lyon, France) having a variable optical power from −5 to +20 diopters by applying an AC voltage from 32 V.sub.rms to 60 V.sub.rms, respectively. The liquid lens 14a, having a clear aperture of 3 mm, was coupled to an objective lens 14b having an 18-mm focal length. The source of images to be placed in a viewed portion of the real world was an organic-LED, full-color, 2-D added-image source (“micro-display,” 0.59 inches square) having 800×600 pixels and a refresh rate of up to 85 Hz (manufactured by eMagin, Inc., Bellevue, Wash.). The mirror 18 was spherically concave, with a 70-mm radius of curvature and a 35-mm clear aperture. Based on these parametric combinations, the display had an exit-pupil diameter of 3 mm, an eye-relief of 20 mm, a diagonal field of view (FOV) of about 28°, and an angular resolution of 1.7 arcmins. The 28° FOV was derived by accounting for the chief-ray angle in the image space.
(64) Based on the parametric selections in this example and on Eq. (1), the accommodation cue of the display can be determined as a function of the voltage applied to the liquid lens 14a.
(65) As indicated by these results, varying the applied voltage over the linear range of the liquid lens 14a varied the accommodation cue over a corresponding range of dioptric distances.
(66) In the multi-focal-plane mode, the liquid lens 14a is fast-switched among multiple discrete driving voltages to provide multiple respective focal distances, such as I″ and II″ in the drawings.
(68) In the multi-focal-plane mode, the dioptric spacing between adjacent focal planes and the overall range of accommodation cues can be controlled by changing the voltages applied to the liquid lens 14a. Switching among various multi-focal-plane settings, or between the variable-single-focal-plane mode and the multi-focal-plane mode, does not require any hardware modifications. These distinctive capabilities provide a flexible management of focus cues suited for a variety of applications, which may involve focal planes spanning a wide depth range or dense focal planes within a relatively smaller depth range for better accuracy.
(69) Certain embodiments are operable in a mode that is essentially a combination of both operating modes summarized above.
Variable-Single-Focal-Plane Mode
(70) Operating the system under the variable-single-focal-plane mode allows for the dynamic rendering of accommodation cues, which may vary with the viewer's position of interest in the viewing volume. Operation in this mode usually requires some form of feedback and thus some form of feedback control. The feedback control need not be automatic. The feedback can be generated by a user using the display and responding to accommodation and/or convergence cues provided by the display and feeding back his or her responses using a user interface. Alternatively, the feedback can be produced using sensors producing data that are fed to a computer or processor controlling the display. A user interface also typically requires a computer or processor to interpret commands from the interface and produce corresponding address commands for the active-optical element.
(71) In this mode the added-image source 12 produces a light pattern corresponding to a desired image to be added, as a virtual object, to the real-world view being produced by the display 10. Meanwhile, the voltage applied to the liquid lens 14a is dynamically adjusted to focus the added image of the light pattern at different focal distances, from infinity to as close as the near point of the eye, in the real-world view. This dynamic adjustment can be achieved using a “user interface,” which in this context is a device manipulated by a user to produce and input data and/or commands to the display. An example command is the particular depth at which the user would like the added image placed in the real-world view. The image of the light pattern produced by the added-image source 12 is thus contributed, at the desired depth, to the view of the “real” world being provided by the display 10. Another user interface is a 3-D eye-tracker, for example, that is capable of tracking the convergence point of the left and right eyes in 3-D space. A hand-held device offers easy and robust control of slowly changing points of interest, but usually lacks the ability to respond to rapidly updating points of interest at a pace comparable to the speed of moderate eye movements. An eye-tracker interface, which may be applicable for images of virtual objects graphically rendered with the depth-of-field effects, enables synchronous action between the focus cues of the virtual images and the viewer's eye movements. In various experiments we adopted a hand-held device, e.g., “SpaceTraveler” (3DConnexion, Inc., Fremont, Calif.) for manipulating accommodation cues of the display in 3-D space.
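The depth-to-voltage control step described above can be sketched as follows. This is a hypothetical mapping, not the display's actual driver: it interpolates linearly in diopters across the lens's stated linear range (38–49 V.sub.rms), and the choice of which voltage endpoint addresses the near plane versus the far plane is left as parameters, since that pairing depends on the optical configuration.

```python
# Hypothetical sketch: convert a user-requested depth into a liquid-lens
# drive voltage for the variable-single-focal-plane mode. v_near/v_far
# (which endpoint of the 38-49 Vrms linear range maps to the near plane)
# and the 1 D..6 D addressable range are illustrative assumptions.
def depth_to_voltage(depth_m, v_near=49.0, v_far=38.0,
                     d_near=1 / 6.0, d_far=1.0):
    """Map a desired depth (meters) to a drive voltage by interpolating
    linearly in diopters between the far plane (d_far) and near plane
    (d_near)."""
    cue = 1.0 / depth_m                        # requested cue, diopters
    cue_near, cue_far = 1.0 / d_near, 1.0 / d_far
    cue = min(max(cue, cue_far), cue_near)     # clamp to addressable range
    frac = (cue - cue_far) / (cue_near - cue_far)
    return v_far + frac * (v_near - v_far)
```

In a real system this function would sit between the user interface (or eye tracker) and the lens-driver electronics, and would use the vendor's measured voltage-vs-power calibration rather than a linear fit.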
(72) The variable-single-focal-plane mode meets specific application needs, such as substantially matching the accommodation cues of virtual and real objects in mixed and augmented realities being perceived by the user of the display. The accommodation and/or focus cues can be pre-programmed, if desired, to animate the virtual object to move in 3-D space, as perceived by the user.
(73) To demonstrate the addressability of focus cues in the variable-single-focal-plane mode, three bar-type resolution targets were placed along the visual axis of an actually constructed display. The targets served as references to the virtual image with variable focus cues. As shown on the left side of each sub-image in the captured photographs, the three targets appear at their respective distances alongside the virtual image.
(74) The added-image source 12 was addressed to produce an image of a torus and to place the image of the torus successively, at a constant rate of change, along the visual axis of the display at 16 cm, 33 cm, and 100 cm from the eye, or in reverse order. Meanwhile, the voltage applied to the liquid lens 14a was changed synchronously with the rate of change of the distance of the virtual torus from the eye. By varying the voltage between 38 V.sub.rms and 49 V.sub.rms, the accommodation cue of the displayed torus image was varied correspondingly from 6 diopters to 1 diopter.
(75) Meanwhile, a digital camcorder captured images of the displayed torus together with the reference targets.
(77) A series of focus cues can be pre-programmed to animate a virtual object in the real-world view so that it appears to move smoothly in three-dimensional space.
Multi-Focal-Plane Mode
(78) Although the variable-single-focal-plane mode is a useful mode for many applications, the multi-focal-plane mode addresses needs for a true 3-D display, in which depth perceptions are not limited by a single or a variable focal plane that may need an eye tracker or the like to track a viewer's point of interest in a dynamic manner. In other words, the multi-focal-plane mode can be used without the need for feedback or feedback control. Compared to volumetric displays, a display operating in the multi-focal-plane mode balances accuracy of depth perception, practicability for device implementation, and accessibility to computational resources and graphics-rendering techniques.
(79) In the multi-focal-plane mode, the liquid lens 14a is rapidly switched among multiple selectable driving voltages to provide multiple respective focal distances, such as I″ and II″ in the drawings.
(80) Use of the display in the time-multiplexed multi-focal-plane mode is made possible, for example, by using the liquid lens 14a as an active-optical element to control the accommodation cue. There are a few major differences between this mode as used with certain of the displays described herein versus the conventional retinal scanning display (RSD) technique. Firstly, the subject embodiments of the display 10 use a liquid lens 14a (a refractive active-optical element), rather than a reflective DMM device. Use of the liquid lens 14a provides a compact and practical display without compromising the range of accommodation cues. Secondly, instead of addressing each pixel individually by a laser-scanning mechanism as in the RSD technique, the subject embodiments use a 2-D added-image source 12 to generate and present high-resolution images (typically in full color) in a time-sequential, image-by-image manner to respective focal planes. Consequently, the subject embodiments do not require the very high addressing speed (at the MHz level) conventionally required to render images pixel-by-pixel. Rather, the addressing speeds of the added-image source 12 and of the active-optical element 14a are substantially reduced to, e.g., the 100-Hz level. In contrast, the pixel-sequential rendering approach used in a conventional RSD system requires MHz operation speeds for both the DMM device and the mechanism for scanning multiple laser beams.
(81) For an example display in a dual-focal-plane mode (as an example of a multi-focal-plane mode), the driving signal of the liquid lens 14a and an exemplary manner of driving the production of virtual objects are shown in the drawings.
(82) f.sub.N=f.sub.min/N,
where N is the total number of focal planes and f.sub.min is the lowest response speed (in Hz) among the added-image source 12, the active-optical element 14a, and the electronics driving these components. The waveforms in the drawings illustrate this driving scheme.
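The rate limit just described, in which each focal plane refreshes at the slowest component's rate divided by the number of planes, can be sketched as:

```python
def focal_plane_switch_rate(n_planes, component_speeds_hz):
    """Per-plane refresh rate f_N = f_min / N, where f_min is the slowest
    response speed (Hz) among the added-image source, the active-optical
    element, and their driving electronics."""
    f_min = min(component_speeds_hz)
    return f_min / n_planes
```

For a dual-focal-plane display limited by 75-Hz graphics electronics this gives 37.5 Hz per plane; if instead a 74-ms liquid lens is the bottleneck (about 13.5 Hz), the per-plane rate drops to roughly 7 Hz, matching Table 1.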
EXAMPLE 2
(83) In this example, the liquid lens 14a (Varioptic “Arctic 320”) was driven by a square wave oscillating between 49 V.sub.rms and 38 V.sub.rms, respectively. Meanwhile, the accommodation cue provided by the display 10 was fast-switched between the depths of 100 cm and 16 cm. The period, T, of the driving signal was adjustable in the image-rendering program. Ideally, T should be set to match the response speed of the slowest component in the display 10, which determines the frame rate of the display operating in the dual-focal-plane mode. For example, if T is set at 200 ms, matching the speed (f.sub.min) of the slowest component in the display 10, the speed of the display will be 5 Hz, and the virtual objects at the two depths will appear alternatingly to a user of the display. If T is set at 20 ms (50 Hz), faster than the slowest component (in one example the highest refresh rate of the electronics driving the added-image source 12 is 75 Hz), then the virtual objects will be rendered at a speed of about f.sub.min/2=37.5 Hz. In another example, the control electronics driving the liquid lens 14a allows for a high-speed operational mode, in which the driving voltage is updated every 600 μs to drive the liquid lens. The response speed of this liquid lens 14a (about 75 ms, as listed in Table 2) then limits the achievable display speed.
(84) This is shown in Table 1. In the left-hand column of Table 1, potential limiting factors to the maximum speed of the display operating in a dual-focal-plane mode are listed, including the liquid lens 14a, the added-image source 12, and the driving electronics (“graphics card”). For example, if the particular liquid lens 14a used in the display 10 is the “Arctic 320”, then the maximum achievable frame rate in the dual-focal-plane mode is 7 Hz. A more recent type of liquid lens, namely the “Arctic 314” from Varioptic, has a purported 5˜10 times faster response speed than the Arctic 320.
(85) TABLE 1
Limiting Factor             Hardware Speed (ms)   Max. Display Speed (Hz)
Liquid Lens, Arctic 320     74                    7
Graphics Card, 75 Hz        13.3                  37.5
OLED Micro-display, 85 Hz   11.8                  42.5
Liquid Lens, Arctic 314     9                     56
Flicker-Free Frequency      8.4                   60
Second Representative Embodiment
EXAMPLE 3
(86) A display 30 according to this embodiment and example comprised a faster liquid lens 34a than used in the first embodiment. Specifically, the faster liquid lens 34a was the “Arctic 314” manufactured by Varioptic, Inc. This liquid lens 34a had a response speed of about 9 ms, which allowed the frame rate of the display 30 (operating in dual-focal-plane mode) to be increased to 37.5 Hz. Referring to FIG. 9(a), the display 30 (only the respective portion, termed a “monocular” portion, for one eye is shown; a binocular display would include two monocular portions for stereoscopic viewing) also included a spherical concave mirror 38, a 2-D added-image source 32, and a beam-splitter (BS) 36.
(87) An alternative object-rendering scheme was used in this embodiment and example to reduce artifacts and further improve the accuracy of the convergence cues produced by the display 30. The liquid lens 34a had a clear aperture of 2.5 mm rather than the 3-mm clear aperture of the liquid lens 14a. To compensate for the reduced clear aperture, the liquid lens 34a was offset along the optical axis by a distance Δ from the center of curvature of the mirror 38. As a result, the exit pupil of the display 30 is magnified by an amount m.sub.p,
(88) m.sub.p=R/(R−2Δ),
relative to the size of the clear aperture of the liquid lens 34a. The focus cue is specified by the distance z from the virtual image to the exit pupil of the display 30, given as:
(89) z=R(R−Δ)/(R−2Δ)−[2/R−1/((R−Δ)−v)].sup.−1,
where v=(Φ+1/u).sup.−1 is the axial distance from the focusing lens to the intermediate image of the added-image source 32, and the remaining quantities are as defined for Eq. (1).
(90) The liquid lens 34a had a variable optical power ranging from −5 to +20 diopters by applying an AC voltage ranging from 32 V.sub.rms to 60 V.sub.rms, respectively. The other optical components (e.g., the beam-splitter 36 and singlet objective lens 34b) were as used in Example 1. The axial distance t between the objective lens 34b and the liquid lens 34a was 6 mm, the offset Δ was 6 mm, and the object distance (−u) was 34 mm. With these parameters, the display 30 exhibited a 24° diagonal field-of-view (FOV) with an exit pupil of 3 mm. A comparison of the Arctic 314 and Arctic 320 lenses is shown in Table 2.
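The exit-pupil magnification produced by offsetting the active element from the mirror's center of curvature can be checked against the Example 3 parameters. The relation m.sub.p = R/(R − 2Δ) used below is the one reconstructed in this document; with R = 70 mm and Δ = 6 mm it enlarges the 2.5-mm clear aperture to approximately the stated 3-mm exit pupil.

```python
def pupil_magnification(R_mm, delta_mm):
    """Exit-pupil magnification m_p = R / (R - 2*delta) for an active
    element offset delta along the axis from the spherical mirror's
    center of curvature (radius R)."""
    return R_mm / (R_mm - 2.0 * delta_mm)

# Example 3 parameters: R = 70 mm, delta = 6 mm, 2.5-mm clear aperture.
exit_pupil_mm = 2.5 * pupil_magnification(70.0, 6.0)  # approx. 3.0 mm
```

The small residual (about 3.02 mm vs. the quoted 3 mm) is consistent with the document rounding the exit-pupil diameter.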
(91) TABLE 2
Parameter                  ARCTIC 320                ARCTIC 314
Applied voltage            0-60 V.sub.rms            0-60 V.sub.rms
Optical power              −5 D~20 D                 −5 D~20 D
Effective aperture         3.0 mm                    2.5 mm
Response time              75 msec (90% rise time)   9 msec (90% rise time)
Operating wavelength       Visible                   Visible
Linear range               38~49 V.sub.rms           38~49 V.sub.rms
Drive frequency            1 kHz                     1 kHz
Wavefront distort. (rms)   <0.5 μm                   80 nm (typ.)
Transmittance @ 587 nm     >90%                      >97%
(92) Given the dependence of the optical power Φ upon the voltage U applied to the liquid lens 34a, the focal distance z provided by the display 30 can be addressed directly by the applied voltage.
(93) As illustrated in the drawings, the driving voltage of the liquid lens 34a was switched between two levels to address two respective focal planes.
(94) Using this lens-driving scheme, virtual objects were rendered time-sequentially at the two respective focal planes.
(95) In this example, since the response speed of the liquid lens 34a was about 9 ms, longitudinal shifts of the focal planes during the settling time of the liquid lens were expected as the driving signal was switched between the two voltages. This phenomenon can produce minor image blur and less than ideally accurate depth representations. A liquid lens (or other adaptive optical element) having a faster response speed can reduce these artifacts and render more accurate focus cues at high speed.
(96) Experiments were also performed to investigate another scheme for image rendering. As shown in the drawings, in this scheme each image frame was displayed only after the liquid lens 34a had substantially settled, which shortened the duration of the image frames.
(98) Due to the shortened duration of image frames, the brightness level may be correspondingly lower, as quantified by:
(99) B=t.sub.1/(t.sub.1+t.sub.2),
where t.sub.1 is the duration for which each image frame is displayed and t.sub.2 is the duration for which the frame is blanked while the liquid lens settles. If t.sub.1=t.sub.2=13.3 ms, the relative brightness level in this scheme is B=0.5.
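The brightness trade-off of blanking frames while the lens settles can be sketched as a duty-cycle calculation. The relation B = t.sub.1/(t.sub.1 + t.sub.2) assumed here is inferred from the surrounding text (it reproduces the stated B = 0.5 when t.sub.1 = t.sub.2 = 13.3 ms), not quoted verbatim from the original equation.

```python
def relative_brightness(t_on_ms, t_off_ms):
    """Relative brightness B = t_on / (t_on + t_off), where t_on is the
    time each image frame is displayed and t_off is the time the frame
    is blanked while the liquid lens settles."""
    return t_on_ms / (t_on_ms + t_off_ms)
```

With one 75-Hz graphics frame displayed and one blanked per depth (13.3 ms each), this gives B = 0.5, matching the comparison of the two rendering schemes later in the text.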
(100) A faster liquid lens and/or added-image source and higher-speed driving electronics are beneficial for producing accurate focus cues at a substantially flicker-free rate. To reduce flicker further, the liquid lens can be driven in an overshoot manner, similar to the overshoot driving used to shorten focusing time in auto-focusing imaging systems. Other active-optical technologies, such as high-speed DMMs and liquid-crystal lenses, could also be used in the time-multiplexed multi-focal-plane mode to reduce flicker.
(101) In any event, by using a faster active-optical element, a display operating in the time-multiplexed multi-focal-plane mode was produced and operated in this example. The display was capable of rendering nearly correct focus cues and other depth cues such as occlusion and shading, and the focus cues were presentable within a wide range, from infinity to as close as 8 diopters.
(102) We compared the effects of two rendering schemes having respective refresh rates: the first scheme had a higher refresh rate (e.g., f=37.5 Hz) and produced a brighter image (B=1.0), but with reduced image sharpness and focus-cue accuracy due to the limited response speed of the liquid lens; the second scheme produced sharper images and more accurate focus cues, but with compromised speed (e.g., f=18.75 Hz) and image brightness (B=0.5) due to the limited frame rate of the driving electronics.
Third Representative Embodiment
(103) This embodiment is directed to a display that is gaze-contingent and that is capable of rendering nearly correct focus cues in real-time for the attended region of interest. The display addresses accommodation cues produced in the variable-single-focal-plane mode in synchrony with the graphical rendering of retinal blur cues and tracking of the convergence distance of the eye.
(104) This embodiment is termed herein a “variable-focus gaze-contingent display” (VF-GCD). It can produce improved focus-cue presentation and better matching of accommodation and convergence in the variable-single-focal-plane mode. Thus, this embodiment utilizes a display operating in the variable-single-focal-plane mode and provides integrated convergence tracking to provide accurate rendering of real-time focus cues. Unlike conventional stereoscopic displays, which typically fix the distance of the focal plane in the visual space, the VF-GCD automatically tracks the viewer's current 3-D point-of-gaze (POG) and adjusts the focal plane of the display to match the viewer's current convergence distance in real-time. (In contrast, a display operating in the variable-single-focal-plane mode with a user interface typically has delayed feedback, produced by the user mentally processing feedback information and using that information to respond to accommodation and/or convergence cues.) Also, in contrast to volumetric displays, which typically render the entire 3-D scene as a discretized space of voxels, the VF-GCD renders the projected 2-D image of the 3-D scene onto moving image planes, thereby significantly improving rendering efficiency and taking full advantage of commercially available graphics electronics for rendering focus cues.
(105) This embodiment incorporates three principles for rendering nearly correct focus cues: addressable accommodation cues, convergence tracking, and real-time rendering of retinal blur cues. Reference is made again to
(106) By passively involving the viewer (user) for feedback purposes, the VF-GCD forms a closed-loop system that can respond in real-time to user feedback in the form of convergent or divergent eye rotations. See
(107) In this embodiment the focal plane moves in three dimensions, matching the convergence depth of the viewer. In practice, the addressable accommodation cue is realized by an active-optical element having variable optical power. From a practical standpoint, the active-optical element should satisfy the following conditions: (1) It should provide a variable range of optical power that is compatible with the accommodative range of the human eye. (2) It should be optically conjugate to the entrance pupil of the viewer, making the display appear to have a fixed FOV that is independent of focus changes. (3) It should have a response speed that substantially matches the speed of rapid eye movements.
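One iteration of the resulting closed loop can be sketched as follows. The helper name and the clamping behavior are assumptions made for illustration only; the power range corresponds to the −5D to 20D liquid lens described in this embodiment:

```python
def vf_gcd_step(z_tracked_diopters, lens_range=(-5.0, 20.0)):
    """One pass of a gaze-contingent loop (hypothetical interface):
    clamp the tracked convergence distance to the active-optical
    element's addressable power range and return the focal-plane
    command to apply before rendering DOF blur at that distance."""
    lo, hi = lens_range
    return max(lo, min(hi, z_tracked_diopters))

# Per frame: track gaze -> triangulate convergence -> drive the lens ->
# render with DOF. An in-range convergence distance passes through
# unchanged; out-of-range values are clamped to the lens limits.
cmd = vf_gcd_step(8.0)            # 8.0 diopters
cmd_clamped = vf_gcd_step(25.0)   # 20.0 diopters (lens limit)
```

The actual system would repeat this step continuously, with the eye tracker supplying the convergence distance and the renderer updating the retinal-blur cues in synchrony.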
(108) The display of this embodiment comprises a liquid lens (Arctic 314 made by Varioptic), which has a variable optical power ranging from −5 diopters (−5D) (1 diopter=1/meter) to 20D, a clear aperture of ˜3 mm, and a response speed of about 10 msec.
(109) To maintain proper focus cues, the VF-GCD computes changes in the viewer's convergence distance using a binocular eye-tracking system adapted from a pair of 2-D monocular eye-trackers. In general, current monocular eye-trackers utilize one or more of non-imaging-based tracking, image-based tracking, and model-based tracking methods. Among the image-based tracking methods, dark-pupil tracking is generally regarded as the simplest and most robust.
(110) To compute the viewer's convergence distance in 3-D space, a pair of monocular trackers was used to triangulate the convergence point using the lines of sight of both eyes, as shown in
(111)
where IPD is the inter-pupillary distance of the viewer. As shown in
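Since the triangulation equation itself is not reproduced above, the sketch below uses one plausible geometry consistent with the description: each monocular tracker reports a gaze angle measured from the straight-ahead direction (positive rotating toward the nose), and the two lines of sight intersect on the median plane. The exact formula in the patent depends on its figure, so this form is an assumption:

```python
import math

def convergence_distance_mm(theta_left_deg, theta_right_deg, ipd_mm=64.0):
    """Triangulate the convergence distance from two monocular gaze
    angles (degrees from straight ahead, positive toward the nose).
    Assumed symmetric geometry: Z = IPD / (tan(theta_L) + tan(theta_R))."""
    t = math.tan(math.radians(theta_left_deg)) + math.tan(math.radians(theta_right_deg))
    return ipd_mm / t

# A target 1 m away on the median plane with IPD = 64 mm subtends
# about 1.833 degrees per eye, giving ~1000 mm.
z_mm = convergence_distance_mm(1.833, 1.833)
```

Converting the result to diopters (1000/z_mm) gives the value used to address the active-optical element.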
(112) The VF-GCD also desirably includes an image-rendering system capable of simulating real-time retinal blur effects, which is commonly referred to as “DOF rendering.” Depth-of-field effects improve the photo-realistic appearance of a 3-D scene by simulating a thin-lens camera model with a finite aperture, thereby inducing a circle of confusion into the rendered image for virtual objects outside the focal plane. Virtual scenes rendered with DOF effects provide a more realistic appearance of the scene than images rendered with the more typical pinhole-camera model and can potentially reduce visual artifacts. Real-time DOF has particular relevance in the VF-GCD since the focal distance of the display changes following the convergence distance of the viewer. Maintaining the expected blurring cues is thus important to preventing depth confusion as the viewer browses objects at varying depths in the scene.
(113) Graphically rendering DOF effects can be done in any of several ways that differ significantly in rendering accuracy and speed. For instance, ray-tracing and accumulation-buffer methods provide good visual results for rendered blur cues but are typically not feasible for real-time systems. Single-layer and multiple-layer post-processing methods tend to yield acceptable real-time performance with somewhat lower visual accuracy. The latter methods are computationally feasible due to the highly parallel nature of their algorithms, making them suitable for implementation on currently available high-performance graphics processing units (GPUs). We used a single-layer post-processing DOF method. To illustrate this DOF algorithm, note the rabbits rendered in
(114) A key component of the DOF algorithm is the computation of the DOB (depth of blur) map, which is used for weighted blending of the pin-hole and blurred images. The DOB map is created by normalizing the depth values Z′, which are retrieved from the z-buffer for the image, with respect to the viewer's current convergence distance Z given by the binocular eye-tracker:
(115)
where Z.sub.near and Z.sub.far indicate the nearest and furthest depths, respectively, of the rendered 3-D space from the viewer's eyes. Note that all distances expressed in capital letters in Eq. (6) are defined in dioptric rather than Euclidean space. Taking
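Because Eq. (6) itself is not reproduced above, the sketch below uses one plausible normalization consistent with the description (each pixel's dioptric offset from the convergence distance, normalized by the larger offset to either end of the rendered volume), together with the weighted blending of the pin-hole and blurred renders. The exact normalization is an assumption:

```python
import numpy as np

def dob_map(zbuf_d, z_d, z_near_d, z_far_d):
    """Depth-of-blur (DOB) map, a plausible form of the omitted Eq. (6).
    All quantities are in diopters: zbuf_d holds the per-pixel depths Z'
    from the z-buffer, z_d is the tracked convergence distance Z, and
    z_near_d/z_far_d bound the rendered volume."""
    denom = max(z_near_d - z_d, z_d - z_far_d, 1e-9)
    return np.clip(np.abs(zbuf_d - z_d) / denom, 0.0, 1.0)

def blend_dof(pinhole_img, blurred_img, dob):
    """Single-layer post-processing DOF: weighted blend of the pin-hole
    and blurred renders, with blur weight given by the DOB map."""
    w = dob[..., None] if pinhole_img.ndim == 3 else dob
    return (1.0 - w) * pinhole_img + w * blurred_img

# Convergence at 1D in a 3D-to-0D volume: a pixel at 3D is fully
# blurred, while a pixel at the convergence distance stays sharp.
dob = dob_map(np.array([3.0, 1.0, 0.0]), z_d=1.0, z_near_d=3.0, z_far_d=0.0)
```

In a GPU implementation the same per-pixel arithmetic runs in a fragment shader, which is what makes the single-layer method real-time capable.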
(116) We constructed a VF-GCD comprising a variable-focus display, convergence tracking, and real-time DOF rendering. The optical path of the VF-GCD was arranged perpendicularly, mainly for ergonomic reasons, to prevent the spherical mirror from blocking the center FOV of both eyes. The key element for controlling focal distance in real-time was a liquid lens, which was coupled to an imaging lens to provide variable and sufficient optical power. The entrance pupil of the viewer was optically conjugate with the aperture of the liquid lens. As a result, without affecting the size of the FOV, the focus adjustment of the eye was optically compensated by the optical-power change of the liquid lens, thus forming a closed-loop control system as shown in
(117) The capability of the VF-GCD was demonstrated in an experiment as outlined in
(118)
(119) This embodiment is directed to a variable-focus gaze-contingent display that is capable of rendering nearly correct focus cues of a volumetric space in real-time and in a closed-loop manner. Compared to a conventional stereoscopic display, the VF-GCD provided rendered focus cues more accurately, with reduced visual artifacts such as the conflict between convergence and accommodation. Compared to conventional volumetric displays, the VF-GCD was much simpler and conserved hardware and computational resources.
(120) Although this embodiment and example were described in the context of a monocular system, the embodiment encompasses corresponding binocular systems that can provide both binocular and monocular depth cues.
Fourth Representative Embodiment
(121) This embodiment is directed to the multi-focal-plane mode that operates in a so-called “depth fused” manner. A large number of focal planes and small dioptric spacings between them are desirable for improving image quality and reducing perceptual effects in the multi-focal-plane mode. But, to keep the number of focal planes to a manageable level, a depth-weighted blending technique can be implemented. This technique can lead to a “depth-fused 3-D” (DFD) perception, in which two overlapped images displayed at two different respective depths may be perceived as a single-depth image. The luminance ratio between the two images may be modulated to change the perceived depth of the fused image. The DFD effect can be incorporated into the multi-focal-plane mode. Another concern addressed by this embodiment is the choice of diopter spacing between adjacent focal planes.
(122) In this embodiment a systematic approach is utilized to address these issues. It is based on quantitative evaluation of the modulation transfer functions (MTF) of DFD images formed on the retina. The embodiment also takes into account most of the ocular factors, such as pupil size, monochromatic and chromatic aberrations, diffraction, Stiles-Crawford effect (SCE), and accommodation; and also takes into account certain display factors, such as dioptric midpoint, dioptric spacing, depth filter, and spatial frequency of the target. Based on the MTFs of the retinal images of the display and the depth of field (DOF) of the human visual system under photopic viewing conditions, the optimal arrangement of focal planes was determined, and the depth-weighted fusing function between adjacent focal planes was characterized.
(123)
L=L.sub.1(z)+L.sub.2(z)=w.sub.1(z)L+w.sub.2(z)L (7)
where w.sub.1(z) and w.sub.2(z) are the depth-weighted fusing functions modulating the luminance of the front and back focal planes, respectively. Typically, w.sub.1(z)+w.sub.2(z)=1 is enforced such that the luminance of the fused pixel is L.sub.1 when w.sub.1(z)=1 and is L.sub.2 when w.sub.2(z)=1. We hereafter assume the peak luminance of individual focal planes is normalized to be uniform, without considering system-specific optical losses that may be present in some forms of multi-focal-plane displays (e.g., in spatially multiplexed displays where light may be projected through a thick stack of display panels). Optical losses of a system should be characterized to normalize non-uniformity across the viewing volume before applying depth-weighted fusing functions.
(124) The depth-fused 3-D perception effect indicates that, as the depth-weighted fusing functions (w.sub.1 and w.sub.2) change, the perceived depth {circumflex over (z)} of the fused pixel will change accordingly. This is formulated as:
{circumflex over (z)}=f(w.sub.1,w.sub.2) (8)
For instance, when w.sub.1(z)=1, the perceived depth should be z.sub.1, and should be z.sub.2 when w.sub.2(z)=1. In a generalized n-focal-plane DFD system, the dioptric distances from the eye to the n focal planes are denoted as z.sub.1, z.sub.2, . . . , z.sub.n in distance order, where z.sub.1 is the closest one to the eye. We assume that the 3-D scenes contained between a pair of adjacent focal planes are rendered only on this corresponding focal-plane pair. Under this assumption, a given focal plane at z.sub.i will render all the 3-D scenes contained between the (i−1).sup.th and the (i+1).sup.th focal planes. Within the depth range of z.sub.i−1≥z≥z.sub.i+1, many scene points may be projected onto the same pixel of the i.sup.th focal plane, among which only the closest scene point to the eye is un-occluded and thus effectively determines the depth-weighted fusing function modulating the luminance of the specific pixel.
(125) The closest scene point corresponding to a specific pixel can typically be retrieved from the z-buffer in a computer-graphics renderer. Let us assume the depth of the closest 3-D scene point projected onto a given pixel of the i.sup.th focal plane is z. Based on the depth-fused 3-D perception described above, the luminance of the 3-D point is distributed between the (i−1).sup.th and i.sup.th focal planes if z.sub.i−1≥z≥z.sub.i, or otherwise between the i.sup.th and (i+1).sup.th focal planes if z.sub.i≥z≥z.sub.i+1. The luminance attribution to the i.sup.th focal plane is weighted by the depth z. It may be characterized by the ratio of the luminance attribution L.sub.i(z) on the i.sup.th focal plane at z.sub.i to the total scene luminance L(z), written as g.sub.i(z)=L.sub.i(z)/L(z), where L(z)=L.sub.i−1(z)+L.sub.i(z) if z.sub.i−1≥z≥z.sub.i or L(z)=L.sub.i(z)+L.sub.i+1(z) if z.sub.i≥z≥z.sub.i+1. In general, the depth-weighted fusing function, w.sub.i(z), of the i.sup.th focal plane can be defined as:
(126)
In summary, by knowing the rendered depth z of a 3-D virtual scene, the luminance levels of the multi-focal plane images can be modulated accordingly by the depth-weighted fusing functions in Eq. (9) to render pseudo-correct focus cues.
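A minimal sketch of depth-weighted fusing for n focal planes follows, using a linear (tent) depth filter purely for illustration (the non-linear form derived later in this embodiment would replace the linear ramp). The weights on the bracketing plane pair sum to 1, consistent with Eq. (7):

```python
def fusing_weights(z, planes):
    """Depth-weighted fusing weights w_i(z) for focal planes at dioptric
    distances planes[0] > planes[1] > ... (z_1 nearest the eye).
    Illustrative linear (tent) filter; only the two planes bracketing
    the rendered depth z receive non-zero luminance weight."""
    w = [0.0] * len(planes)
    if z >= planes[0]:
        w[0] = 1.0               # nearer than the nearest focal plane
    elif z <= planes[-1]:
        w[-1] = 1.0              # farther than the farthest focal plane
    else:
        for i in range(len(planes) - 1):
            zi, zj = planes[i], planes[i + 1]
            if zj <= z <= zi:    # z lies between planes i and i+1
                g = (z - zj) / (zi - zj)   # luminance ratio g_i(z)
                w[i], w[i + 1] = g, 1.0 - g
                break
    return w

# Six planes at 0.6D spacing: a pixel rendered at 1.5D splits its
# luminance evenly between the 1.8D and 1.2D planes.
w = fusing_weights(1.5, [3.0, 2.4, 1.8, 1.2, 0.6, 0.0])
```

In a renderer, these weights scale the pixel luminance written to each focal-plane image, with z taken from the z-buffer as described above.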
(127) In displays comprising DFD operability, adjacent focal planes are separated in space by a considerable distance. The retinal image quality is expected to be worse when the eye is accommodated at a distance between the front and back focal planes than when it is focused on the front or back focal plane. However, both the dioptric spacing between adjacent focal planes and the depth-weighted fusing functions can be selected such that the perceived depth of the fused pixel {circumflex over (z)} closely matches the rendered depth z and the image-quality degradation is minimally perceptible as the observer accommodates to different distances between the focal planes.
(128) The optical quality of a fused pixel in DFD displays may be quantitatively measured by the point spread function (PSF) of the retinal image, or equivalently by the modulation transfer function (MTF), which is characterized by the ratio of the contrast modulation of the retinal image to that of a sinusoidal object on the 3-D display. Without loss of generality, hereafter a dual-focal plane display is assumed and the results therewith can be extended to n focal planes. Based on Eq. (7), when the eye is accommodated at the rendered distance z, the PSF of the fused pixel, PSF.sub.12, may be described as:
PSF.sub.12(z)=w.sub.1(z)PSF.sub.1(z,z.sub.1)+w.sub.2(z)PSF.sub.2(z,z.sub.2) (10)
where PSF.sub.1(z, z.sub.1) and PSF.sub.2(z, z.sub.2) are the point spread functions of the front and back pixels, respectively, corresponding to the eye accommodated distance z. The MTF of a DFD display can then be calculated via the Fourier Transform (FT) of the PSF.sub.12 and subsequently the FT of the PSF.sub.1 and PSF.sub.2.
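The PSF-to-MTF computation of Eq. (10) can be sketched numerically. The Gaussian PSFs below are illustrative stand-ins for the CODE V ray-traced PSFs of the text (the more defocused plane gets the wider spread); the Fourier-transform-and-normalize step is the general relationship:

```python
import numpy as np

def dfd_mtf(psf1, psf2, w1):
    """MTF of a depth-fused pixel: form PSF_12 = w1*PSF_1 + (1-w1)*PSF_2
    per Eq. (10), Fourier-transform it, and normalize to unit modulation
    at zero spatial frequency."""
    psf12 = w1 * psf1 + (1.0 - w1) * psf2
    mtf = np.abs(np.fft.rfft(psf12))
    return mtf / mtf[0]

# Illustrative 1-D Gaussian PSFs (not ray-traced): the eye is nearly
# focused on plane 1, so plane 2's PSF is more spread out.
x = np.linspace(-1.0, 1.0, 513)
psf_front = np.exp(-(x / 0.05) ** 2)
psf_back = np.exp(-(x / 0.20) ** 2)
mtf12 = dfd_mtf(psf_front, psf_back, w1=0.5)
```

Sweeping w1 from 1 to 0 then traces how the fused MTF degrades and recovers as the rendered depth moves between the two focal planes.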
(129) Multiple factors may affect the retinal image quality—PSF.sub.12 and MTF.sub.12—of a DFD display. Table 3 categorizes the parameters, along with their notation and typical range, into two types: ocular and display factors. Ocular factors are mostly related to the human visual system when viewing DFD images from a viewer's perspective. These variables, including pupil size, pupil apodization, reference wavelength, and accommodation state, should be carefully considered when modeling the eye optics. Display factors are related to the practical configuration of the display with DFD operability, such as the covered depth range, dioptric midpoint of two adjacent focal planes to the eye, dioptric spacing between two adjacent focal planes, depth-weighted fusing functions, as well as the spatial frequency of a displayed target.
(130) TABLE 3

  Type of factor   Factor                     Notation                                Typical range
  Ocular           Pupil diameter             D                                       2 mm~8 mm
                   Stiles-Crawford effect     β                                       −0.116 mm.sup.−2
                   Reference wavelength       λ                                       F (486.1 nm), d (587.6 nm), C (656.3 nm)
                   Accommodation              z                                       z.sub.i+1 < z < z.sub.i
  Display          Focal range                z.sub.1 − z.sub.n                       3 D
                   Medial focus               z.sub.i,i+1 = (z.sub.i + z.sub.i+1)/2   0 D~3 D
                   Dioptric spacing           Δz = z.sub.i − z.sub.i+1                0 D~1 D
                   Depth filter               w.sub.i, w.sub.i+1                      0 ≤ w.sub.i, w.sub.i+1 ≤ 1
                   Target spatial frequency   v                                       1 cpd~30 cpd
(131) Instead of using observer- and display-specific measurements to evaluate the PSF and MTF of DFD displays, we adopted a schematic Arizona eye model to simulate and analyze the retinal image quality of simulated targets and derive generalizable results. In the fields of optical design and ophthalmology, various schematic eye models have been widely used to predict the performance of an optical system involving human subjects. In this study, the Arizona eye model was set up in CODE V. The Arizona eye model is designed to match clinical levels of aberration at both on- and off-axis fields, and it can accommodate to different distances. The accommodative distance z, as shown in
(132) Ocular characteristics of the HVS, such as depth of field, pupil size, diffraction, Stiles-Crawford effect, monochromatic and chromatic aberrations, and accommodation, play important roles in the perceived image quality of a DFD display. Although there have been investigations of image-quality dependence upon pupil size, high-order aberration, and accommodation, the treatment of the aforementioned factors lacks generality for average subjects and for a full-color DFD display with different display configurations. For instance, only monochromatic aberrations specific to one user's eye were considered, and a linear depth-weighted fusing function was assumed.
(133) To simulate the PSF/MTF of the retinal images accurately in a DFD display, we first examined the dependence of the polychromatic MTF of a fused pixel upon eye-pupil diameter while fixing other ocular and display factors. In particular, we examined the MTFs under the condition that the luminance of a rendered pixel is equally distributed between front and back focal planes separated by 0.5D, and the eye is accommodated at the midpoint between the two focal planes. The midpoint is generally expected to have the worst retinal image quality for a fused pixel. Assuming the same pupil size, we further compared the MTFs of the fused pixel against those of a real pixel physically placed at the dioptric midpoint between the two focal planes. For pupil diameters no larger than 4 mm, we found the MTF differences of the fused pixel from a real pixel at the same distance are acceptable for spatial frequencies below 20 cpd, while considerable degradation is observed for larger pupils. Therefore, we set the pupil diameter of the eye model to 4 mm, which corresponds well to typical pupil sizes when viewing conventional HMD-like displays. Second, to account for the directional sensitivity of photoreceptors on the human retina, commonly referred to as the Stiles-Crawford effect (SCE), a Gaussian apodization filter was applied to the entrance pupil with an amplitude transmittance coefficient of β=−0.116 mm.sup.−2. Consequently, the SCE may induce a slightly contracted effective pupil, and thus reduce spherical aberration and improve the MTF.
(134) Furthermore, the image source in the model was set up with polychromatic wavelengths, including the F, d, and C components listed in Table 3, to simulate a full-color DFD display. To compensate for the longitudinal chromatic aberration (LCA) that commonly exists in human eyes, we inserted a zero-optical-power achromat at 15 mm from the cornea vertex with LCA opposite to that of the Arizona eye model. In a practical DFD display, instead of inserting an achromat directly in front of the eye, the display optics may be optimized to have an equivalent chromatic aberration that compensates for the LCA of the visual system. Finally, the effect of diffraction was accounted for in the modeling software (CODE V) while simulating PSFs. The effect of accommodation is discussed below with depth filters.
(135) Based on the model setup described above, for a given eye-accommodation status and display settings, PSF.sub.1(z,z.sub.1) and PSF.sub.2(z,z.sub.2) for an on-axis point source were simulated separately in CODE V. Using the relationship in Eq. (10), a series of PSF.sub.12(z) were computed by varying w.sub.1 from 1 to 0, which corresponds to varying the rendered depth z from z.sub.1 to z.sub.2. The corresponding MTF.sub.12(z) of the DFD display was derived by taking the FT of PSF.sub.12.
(136) To evaluate the retinal image quality of a depth-fused pixel against a physical pixel placed at the same distance, we further simulated the PSF of a real point source placed at distance z, PSF.sub.ideal(z), and computed the corresponding MTF.sub.ideal(z). The degradation of MTF.sub.12(z) from MTF.sub.ideal(z) was expected to vary with the dioptric spacing of the two adjacent focal planes, rendered depth z, as well as eye-specific parameters. Through comprehensive analysis of the retinal image quality of the DFD display, threshold values were established to ensure the degradation from a real display condition was minimally perceptible to average subjects. Optimal depth-weighted fusing functions were then obtained.
(137) As mentioned earlier, a fused pixel that is rendered to be at the dioptric midpoint of two adjacent focal planes was expected to have the worst retinal image quality compared with other points between the focal planes. Therefore, in the following analysis, we used the retinal image quality of a fused pixel rendered at the midpoint of two adjacent focal planes as a criterion for determining appropriate settings for display designs.
(138) In this study, to determine optimal dioptric spacing, the overall focal range of the DFD display was set to cover depths from 3D (z.sub.1) to 0D (z.sub.n). Within this range, we further assumed a constant dioptric spacing between two adjacent focal planes (e.g., z.sub.i and z.sub.i+1), independent of the dioptric midpoint of the focal-plane pair relative to the eye, denoted z.sub.i,i+1=(z.sub.i+z.sub.i+1)/2 in Table 3. Using the simulation method described above, we validated this assumption by examining the dependence of the MTF of a fused pixel at the midpoint of two focal planes upon the dioptric distance of the midpoint from the eye while fixing other ocular and display factors (i.e., w.sub.1=w.sub.2=0.5, Δz=0.5D, z=z.sub.i,i+1). As expected, the MTF of a fused pixel at the midpoint varies as the midpoint gets closer to the eye, because ocular aberrations are highly correlated with accommodation. However, the average variation is less than 15% for spatial frequencies below 20 cpd for z.sub.i,i+1 within the 0D~3D range.
(139) Under these assumptions, the effect of dioptric spacing on DFD displays can be evaluated by setting the midpoint of a pair of adjacent focal planes at an arbitrary position within the depth range without loss of generality. We thus chose 1D as the midpoint of a focal-plane pair and varied their dioptric spacing Δz from 0.2D to 1D at an interval of 0.2D. For each dioptric spacing condition, the MTF of a fused pixel at the dioptric midpoint (i.e., MTF.sub.12 (z=z.sub.i,i+1)) of the two focal planes was calculated with the assumption that the luminance level was evenly divided between front and back focal planes.
(140) By setting a dioptric spacing of Δz=0.6D and a dioptric midpoint of z.sub.12=1D from the eye, we further examined the MTF of a fused pixel while incrementally varying the eye accommodation distance from the front focal plane (z.sub.1=1.3D) to the back focal plane (z.sub.2=0.7D) at an increment of 0.1D, as shown in
(141) To further demonstrate the pseudo-correct focus cues created using a DFD display, we configured a dual-focal-plane display similar to that of the previous paragraph (i.e., z.sub.12=1D and Δz=0.6D). We simulated multiple retinal images of a Snellen E target by convolving the target with the PSF.sub.12(z) defined in Eq. (10), while the luminance of the target was evenly divided between the two focal planes (i.e., w.sub.1=w.sub.2=0.5). Thus, the fused target was expected to appear at the dioptric midpoint of the two focal planes. In
(142) To derive the dependence of the rendered accommodation cue on the depth-weighted fusing function as described in Eq. (8), we extended the MTF simulation shown in
(143) This evaluation can be extended to more than two focal planes covering a much larger depth range. As an example, we chose a 6-focal-plane DFD display covering a depth range from 3D to 0D. By assuming a 0.6D dioptric spacing, six focal planes were placed at 3D (z.sub.1), 2.4D (z.sub.2), 1.8D (z.sub.3), 1.2D (z.sub.4), 0.6D (z.sub.5), and 0D (z.sub.6), respectively. In this display configuration, we repeated the above-described simulations independently to each adjacent pair of focal planes. The black solid curves in
(144) Based on the simulated results shown in
(145)
where z′.sub.i,i+1 represents the pseudo-correct accommodation cue rendered by a luminance ratio of g.sub.i(z=z′.sub.i,i+1)=0.5, and Δz′ characterizes the nonlinearity of g.sub.i(z). Ideally, z′.sub.i,i+1 is equal to the dioptric midpoint z.sub.i,i+1. Table 4 lists detailed parameters of g.sub.i(z) for the six-focal-plane DFD display. As the distance of the focal planes from the eye increased from 2.7D to 0.3D, the difference between z.sub.i,i+1 and z′.sub.i,i+1 increased from −0.013D to +0.024D. The slight mismatch between z′.sub.i,i+1 and z.sub.i,i+1 may be attributed to the dependence of spherical aberration on eye-accommodation distances. The nonlinear fittings of the luminance ratio functions were plotted as red dashed curves in
(146) TABLE 4. Parameters of Eq. (15) for a 6-focal-plane DFD display.

  i                          1        2        3        4        5
  z.sub.i,i+1 (diopters)     2.7      2.1      1.5      0.9      0.3
  z′.sub.i,i+1 (diopters)    2.7134   2.1082   1.5034   0.8959   0.2758
  Δz′ (diopters)             0.0347   0.0318   0.0366   0.0408   0.0534
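The fitted luminance-ratio function can be evaluated from the Table 4 parameters. Since the exact form of Eq. (15) is not reproduced above, the sketch below uses one plausible sigmoid realization, chosen only because it satisfies the stated properties (g.sub.i=0.5 at z′.sub.i,i+1, with Δz′ characterizing the nonlinearity); the functional form is an assumption:

```python
import math

def luminance_ratio(z, z_mid, dz):
    """A plausible sigmoid stand-in for the fitted luminance-ratio
    function g_i(z): equals 0.5 at the fitted midpoint z_mid = z'_i,i+1,
    and dz = Δz' controls how sharply the ratio transitions between the
    bracketing focal planes. All arguments in diopters. Assumed form."""
    return 1.0 / (1.0 + math.exp(-(z - z_mid) / dz))

# Table 4, pair i = 3 (focal planes at 1.8D and 1.2D): the ratio is 0.5
# at the fitted midpoint and saturates toward 1 (front plane) and
# 0 (back plane) at the focal planes themselves.
g_mid = luminance_ratio(1.5034, z_mid=1.5034, dz=0.0366)
g_front = luminance_ratio(1.8, z_mid=1.5034, dz=0.0366)
g_back = luminance_ratio(1.2, z_mid=1.5034, dz=0.0366)
```

Whatever the fitted form, the small Δz′ values in Table 4 imply a transition much steeper than a linear ramp, which is what distinguishes the non-linear filter from the linear one compared below.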
(147)
(148) To quantitatively evaluate the retinal-image quality differences between the linear and nonlinear fusing functions, we further evaluated the MTFs of the retinal images simulated with the method described above. A display operating in the dual-focal-plane mode, with z.sub.1=1.8D and z.sub.2=1.2D, was assumed in the simulation without loss of generality. The eye-accommodation distance z was varied from z.sub.1 to z.sub.2 at an interval of 0.1D. For each eye-accommodation distance,
(149) In summary, the non-linear depth-weighted fusing functions shown in
(150) In this embodiment we presented an exemplary systematic method to address two issues in configuring a display for operation in the multi-focal-plane mode: (1) the appropriate dioptric spacing between adjacent focal planes; and (2) the depth-weighted fusing function to render a continuous 3-D volume. By taking account of both ocular and display factors, we determined the optimal spacing between two adjacent focal planes to be ˜0.6D to ensure the MTF of a fused pixel at the dioptric midpoint is comparable to the DOF effect of the HVS on the MTF of a real pixel at the same distance under photopic viewing conditions. We further characterized the optimal form of a set of depth-weighted fusing functions as a function of rendered accommodation cues. Based on simulation results, the non-linear form of depth filters appears to be better than a box filter in terms of improved depth continuity, and better than a linear filter in terms of retinal image contrast modulation. Although our evaluation did not take into account certain other ocular factors such as scattering on the retina and psychophysical factors such as the neuron response, it provides a systematic framework that can objectively predict the optical quality and guide efforts to configure DFD displays for operation in the multi-focal-plane mode.
Subjective Evaluations
(151) To better understand how depth perception is affected by the displays disclosed herein, and how the human visual system responds to the addressable focal planes in the display, we performed two user studies. One was a depth judgment experiment, in which we explored the perceived depth of the displayed virtual object with respect to the variable accommodation cues rendered by the display. The other was an accommodative response measurement, in which we quantitatively measured the accommodative response of a user to a virtual object being presented at different depths. Both experiments were carried out using a display operating in the variable-single-focal-plane mode, configured as a monocular bench prototype.
(152) The major purpose of the depth judgment experiment was to determine the relationship of the perceived depths of virtual objects versus the accommodation cues rendered by the active optical element. A depth judgment task was devised to evaluate depth perceptions in the display in two viewing conditions. In Case A, a subject was asked to estimate subjectively the depth of a virtual stimulus without seeing any real target references. In Case B, a subject was asked to position a real reference target at the same perceived depth as the displayed virtual object.
(153)
(154) On an optical bench, the real target was mounted on a rail to allow movement along the visual axis of the display. To avoid the accommodative dependence on the luminance, multiple light sources were employed to create a uniform illumination on the real target throughout the viewing space. The rail was about 1.5 meters long, but due to the mechanical mounts, the real target could be as close as about 15 cm to the viewer's eye, specifying the measurement range of perceived depths from 0.66 diopters to about 7 diopters. The accommodation distance of the virtual target was controlled by applying five different voltages to the liquid lens, 49, 46.8, 44.5, 42.3, and 40 V.sub.rms, which corresponded to rendered depths at 1, 2, 3, 4 and 5 diopters, respectively.
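The voltage-to-depth calibration above can be inverted by interpolation to command an arbitrary rendered depth. The sketch below simply interpolates between the five measured pairs; treating the curve as piecewise-linear is an assumption (the true voltage-to-power response of the lens would be characterized per device):

```python
import numpy as np

# Measured calibration pairs from the experiment: rendered accommodation
# distance (diopters) versus liquid-lens drive voltage (Vrms).
depths_d = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
volts_rms = np.array([49.0, 46.8, 44.5, 42.3, 40.0])

def voltage_for_depth(z_diopters):
    """Piecewise-linear interpolation between the calibration points
    (illustrative; the actual response curve need not be linear)."""
    return float(np.interp(z_diopters, depths_d, volts_rms))

v = voltage_for_depth(2.5)   # midway between 46.8 and 44.5 Vrms -> 45.65
```

Note that np.interp requires the depth samples to be in increasing order, which is why the table is stored depth-first rather than voltage-first.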
(155) Ten subjects, 8 males and 2 females, participated in the depth-judgment experiments. The average age of the subjects was 28.6 years. Six subjects had previous experience with stereoscopic displays; the other four were from unrelated fields. All subjects had normal or corrected vision.
(156) The depth-judgment task started with a 10-minute training session, followed by 25 consecutive trials. The tasks were to subjectively (Case A) and objectively (Case B) determine the depth of a virtual target displayed at one of five depths: 1, 2, 3, 4, or 5 diopters. Each of the five depths was repeated in five trials. In each trial, the subject was first asked to close his/her eyes. The virtual stimulus was then displayed, and the real target was placed randomly along the optical rail. The experimenter blocked the real target with a black board and instructed the subject to open his/her eyes. The subject was then asked to subjectively estimate the perceived depth of the virtual target and rate its depth as Far, Middle, or Near accordingly (Case A). The blocker of the real target was then removed. Following the subject's instructions, the experimenter moved the real target along the optical rail in the direction in which the real target appeared to approach the depth of the virtual target. The subject made a fine depth judgment by repeatedly moving the real target backward and forward from the initial judged position until he/she determined that the virtual and real targets appeared to be collocated at the same depth. The position of the real target was then recorded as the objective measurement of the perceived depth of the virtual display (Case B). Considering that all depth cues except the accommodation cue were minimized in the subjective experiment (Case A), we expected the depth-estimation accuracy to be low. Therefore, the subjective depth estimations for stimuli at 2 and 4 diopters were disregarded to avoid low-confidence random guessing. Only virtual targets at 1, 3, and 5 diopters were considered valid stimuli, corresponding to Far, Middle, and Near depths, respectively.
(157) To counter potential learning effects, the order of the first five trials, with depths of 1D, 2D, 3D, 4D, and 5D, respectively, was counter-balanced among the ten subjects using a double Latin Square design. The remaining twenty trials for each subject were generated in random orders, with the additional requirement that two consecutive trials have different rendered depths. Overall, 10×25 trials were performed, with 150 valid data points collected for the subjective experiment and 250 data points for the objective experiment.
(158) After completing all the trials, each subject was asked to fill out a questionnaire asking how well he/she could perceive depth without (Case A) or with (Case B) seeing the real reference target. The subject was given three choices, ranking his/her sense of depth as Strong, Medium, or Weak in both Cases A and B.
(159) We first analyzed the data of the subjective assessments of the perceived depth in the viewing condition without the real target references (Case A). For each subject, we counted the number of correct and incorrect depth estimations among the 15 valid trials to compute the error rate. For example, when the virtual target was presented at 5 diopters, the correct count would increase by 1 only if the subject estimated the perceived depth as Near; otherwise (either Middle or Far), the error count would increase by 1. The same counting method was applied to stimuli displayed at 3 diopters and at 1 diopter. The average error rate for each subject was quantified as the overall error count divided by 15.
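The per-subject error-rate computation above can be sketched as follows; the trial data are hypothetical, for illustration only, and the depth-to-label mapping follows the text (1 diopter = Far, 3 diopters = Middle, 5 diopters = Near).

```python
# Mapping from rendered depth (diopters) to the correct subjective label.
CORRECT_LABEL = {1: "Far", 3: "Middle", 5: "Near"}

def error_rate(trials):
    """trials: list of (rendered_depth_diopters, response_label) pairs.
    Trials at 2 and 4 diopters are disregarded, as described in the text."""
    valid = [(d, r) for d, r in trials if d in CORRECT_LABEL]
    errors = sum(1 for d, r in valid if r != CORRECT_LABEL[d])
    return errors / len(valid)

# Hypothetical subject: 15 valid trials with 2 incorrect estimations.
trials = ([(5, "Near")] * 5 + [(3, "Middle")] * 4 + [(3, "Far")]
          + [(1, "Far")] * 4 + [(1, "Middle")])
print(error_rate(trials))  # 2 errors / 15 valid trials
```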
(160) The objective measurement results of the perceived depth were then analyzed. For each subject, the perceived depth at each rendered depth (5, 4, 3, 2, and 1 diopters) was computed by averaging the measurements of the five repeated presentations of that stimulus among the 25 trials. The results from the ten subjects were then averaged to compute the mean perceived depth at each rendered depth.
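The two-stage averaging above (within subject, then across subjects) can be sketched as follows; the measurement values and the two-subject data set are hypothetical, for illustration only.

```python
import statistics

def per_subject_means(measurements):
    """measurements: {rendered_depth_diopters: [measured depths, one per repeat]}.
    Returns the within-subject mean perceived depth at each rendered depth."""
    return {d: statistics.mean(v) for d, v in measurements.items()}

def group_mean(subject_means, depth):
    """Average the per-subject means at one rendered depth across subjects."""
    return statistics.mean(m[depth] for m in subject_means)

# Hypothetical data for two subjects at two of the five rendered depths.
s1 = per_subject_means({5: [4.8, 5.1, 5.0, 4.9, 5.2], 1: [1.1, 0.9, 1.0, 1.0, 1.0]})
s2 = per_subject_means({5: [5.0, 5.0, 5.0, 5.0, 5.0], 1: [1.0, 1.0, 1.0, 1.0, 1.0]})
print(group_mean([s1, s2], 5))
```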
(161) Finally, we compared the subjective ranking data on depth perception in the two cases: without (Case A) and with (Case B) a real target reference. To compare the ranking data across users, we assigned values of 1, 2, and 3 to the rankings of Strong, Medium, and Weak, respectively. The average ranking and the standard deviation for each viewing condition were then computed across the ten subjects. The results are plotted in
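The ranking analysis above can be sketched as follows; the numeric encoding (Strong = 1, Medium = 2, Weak = 3) is from the text, while the ten rankings below are hypothetical, for illustration only.

```python
import statistics

# Numeric encoding of the three ranking choices, per the text.
SCORE = {"Strong": 1, "Medium": 2, "Weak": 3}

def ranking_stats(rankings):
    """Mean and sample standard deviation of the encoded rankings
    for one viewing condition across subjects."""
    vals = [SCORE[r] for r in rankings]
    return statistics.mean(vals), statistics.stdev(vals)

# Hypothetical rankings from ten subjects for one viewing condition (Case B).
case_b = ["Strong"] * 7 + ["Medium"] * 3
mean_b, sd_b = ranking_stats(case_b)
```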
(162) Although the depth-judgment tasks relied primarily on focus cues, the results indicated that, under the monocular viewing condition without perspective or binocular depth cues, the perceived depth matched the rendered accommodation cue with good accuracy in Case A and matched the rendered accommodation cues well in Case B. In contrast to usability studies on traditional stereoscopic displays, which have suggested that rendering conflicting binocular disparity and focus cues distorts and compresses perceived depths, the user studies reported herein suggest that depth perception is improved by appropriately rendering accommodation cues in this display with addressable focal planes. The depth-judgment task described above demonstrated the potential of this optical see-through display with addressable focus cues for mixed- and augmented-reality applications, approximating the viewing condition in the real world.
(163) The major purpose of the accommodative response measurements was to quantify the accommodative response of the human visual system to the depth cues presented through the subject display. In this experiment, the accommodative responses of the eye were measured with a near-infrared (NIR) auto-refractor (RM-8000B, Topcon). The auto-refractor has a refractive-power measurement range of −20 to +20 diopters, a measurement speed of about 2 sec, and an RMS measurement error of 0.33 diopters. The eye relief of the auto-refractor is about 50 mm. In the objective measurement, the auto-refractor was placed directly in front of the beam-splitter so that the exit pupil of the auto-refractor coincided with that of the display. Throughout the data-acquisition procedure, the ambient lights were turned off to prevent their influence on accommodative responses.
(164) During the test, a subject with normal vision was asked to focus on the virtual display, which was presented at 1 diopter, 3 diopters, and 5 diopters, respectively, in a three-trial test. In each trial, after the subject set his or her focus on the virtual display, the accommodative response of the subject's eye was recorded every 2 sec for up to nine measurement points. The results for one subject are plotted in
(165) Whereas the invention has been described in connection with various representative embodiments, it will be understood that it is not limited to those embodiments. On the contrary, it is intended to cover all alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.