SYSTEM AND METHOD FOR CORRECTED VIDEO-SEE-THROUGH FOR HEAD MOUNTED DISPLAYS
20230239457 · 2023-07-27
Inventors
Cpc classification
H04N13/383
ELECTRICITY
H04N13/117
ELECTRICITY
H04N13/239
ELECTRICITY
International classification
H04N13/117
ELECTRICITY
Abstract
A head mounted display system with video-see-through (VST) is taught. The system and method process video images captured by at least two forward facing video cameras mounted to the headset to produce generated images whose viewpoints correspond to the viewpoint of the user if the user was not wearing the display system. By generating VST images which have viewpoints corresponding to the user's viewpoint, errors in sizing, distances and positions of objects in the VST images are prevented.
Claims
1. A head mounted display system comprising: a display capable of being worn by a user in front of their eyes and displaying images to the user; at least two video cameras having fixed respective fields of view relative to pupils of the eyes of the user when the head mounted display system is worn by the user, the at least two video cameras operable to capture video images from the respective fields of view; a computational device operable to: obtain distances between the display and objects located in the respective fields of view of the at least two video cameras; determine based on the fixed respective fields of view of the at least two video cameras, and the distances, a transformation to transform captured video images from each of the at least two video cameras to locations of the pupils of the user; and apply the transformation to the captured video images to generate an image for display on the display, the generated image corresponding to a viewpoint at the pupils of the user, including displaying the objects located in the respective fields of view of the at least two video cameras at depths corresponding to distances of the objects from the pupils of the user, the depths computed based on the obtained distances and the fixed respective fields of view of the at least two video cameras relative to the locations of the pupils of the user.
2. The head mounted display system according to claim 1 wherein the computational device is operable to compute the distances between the display and the objects located in the respective fields of view of the at least two video cameras from the captured video images.
3. The head mounted display system according to claim 2, wherein the computational device is operable to generate an image for each pupil of the user, each generated image corresponding to the viewpoint of the respective pupil of the user and each generated image is displayed to the respective pupil of the user providing the user with a stereoscopic image.
4. The head mounted display system according to claim 2, wherein the locations of the pupils of the user are virtual locations, selected by the user.
5. The head mounted display system according to claim 1 wherein the computational device is operable to generate an image for each pupil of the user, each generated image corresponding to the viewpoint of the respective pupil of the user and each generated image is displayed to the respective pupil of the user providing the user with a stereoscopic image.
6. The head mounted display system according to claim 1 wherein the computational device is mounted to the display.
7. The head mounted display system according to claim 1, wherein the computational device is further operable to obtain an inter-pupil distance between the pupils of the user and determine the transformation further based on the inter-pupil distance.
8. The head mounted display system according to claim 1 wherein the computational device is connected to the display by a wire tether.
9. The head mounted display system according to claim 1 wherein the computational device is wirelessly connected to the display.
10. The head mounted display system of claim 1 wherein the locations of pupils of the user are virtual locations, selected by the user.
11. The head mounted display system of claim 1, wherein the at least two video cameras have fixed locations relative to the display, and wherein the computational device is operable to determine the fixed respective fields of view of the at least two video cameras relative to the pupils of the user based on an inter-pupil distance and an eye-to-display distance.
12. A method of operating a head mounted display worn by a user in front of their eyes, the head mounted display having at least two video cameras operable to capture video images, the method comprising the steps of: determining respective fields of view of the at least two video cameras relative to each pupil of the eyes of the user; obtaining video images captured by the at least two video cameras; computing distances between the display and objects located in the respective fields of view of the at least two video cameras based on the video images captured by the at least two video cameras; determining, based on the respective fields of view of each of the at least two video cameras relative to the pupil of each eye of the user, and the computed distances, a transformation to transform the captured video images from each of the at least two video cameras to locations of the pupils of the user; applying the transformation to the captured video images to render a generated image corresponding to a viewpoint at the pupils of the user, wherein the generated image displays the objects at depths corresponding to distances of the objects from the pupils of the user, the depths computed based on the computed distances and the respective fields of view of the at least two video cameras relative to the pupils of the user; and displaying the generated image to the user on the head mounted display.
13. The method of claim 12, further comprising processing the captured video images to render a respective generated image for each pupil of the user, each respective generated image corresponding to the viewpoint of the respective pupil of the user.
14. The method of claim 12, further comprising obtaining an inter-pupil distance between the pupils of the user and determining the transformation further based on the inter-pupil distance.
15. The method of claim 12, further comprising receiving a selection of virtual locations of the pupils of the user; and determining the transformation to transform the captured video images from each of the at least two video cameras to the virtual locations of the pupils of the user.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] Preferred embodiments of the present invention will now be described, by way of example only, with reference to the attached Figures, wherein:
[0016]
[0017]
[0018]
[0019]
[0020]
DETAILED DESCRIPTION OF THE INVENTION
[0021] A user 20 is illustrated in
[0022] However, as illustrated in the figure, the locations of the pupils of eyes 36 of user 20 do not correspond to the location of video cameras 28a, 28b and thus the respective viewpoints of the images acquired by cameras 28a, 28b (indicated by lines 40 and 44) do not correspond to what would be the actual viewpoints (indicated by dashed lines 48 and 52) of the user's eyes 36 if object 32 was viewed without head mounted display 24. Thus, when the images captured by cameras 28a, 28b are displayed to user 20 in head mounted display 24, object 32 appears closer to user 20 and/or larger than it actually is. In many applications, such as the above-mentioned surgical case, such distortions cannot be tolerated.
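By way of illustration only, the size error described above can be estimated with a simple pinhole approximation. The following sketch assumes a camera mounted directly in front of the pupil on the same optical axis; the object height, viewing distance and camera offset are hypothetical values chosen for the example and are not taken from the figures.

```python
import math

def angular_size_deg(object_height_m, distance_m):
    """Angle subtended by an object of the given height centred on the optical axis."""
    return math.degrees(2.0 * math.atan(object_height_m / (2.0 * distance_m)))

# Hypothetical values: a 0.10 m object, 0.50 m from the pupil,
# with the camera 0.08 m in front of the pupil.
OBJECT_HEIGHT_M = 0.10
EYE_DISTANCE_M = 0.50
CAMERA_OFFSET_M = 0.08

eye_angle = angular_size_deg(OBJECT_HEIGHT_M, EYE_DISTANCE_M)
camera_angle = angular_size_deg(OBJECT_HEIGHT_M, EYE_DISTANCE_M - CAMERA_OFFSET_M)

# A ratio above 1 means the uncorrected camera image shows the object larger
# than the unaided eye would see it.
apparent_magnification = camera_angle / eye_angle
```

Under these assumed values the camera sees the object subtending a larger angle than the eye would, which is consistent with the object appearing closer and/or larger when the uncorrected image is displayed.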
[0023] In
[0024] Unit 104 includes a display, or displays, (not shown in this figure) which are operable to display a different video image to each of the eyes of user 108 and unit 104 can also include head tracking and orientation measuring systems which can be used to determine the position and orientation of the head (and thus the eyes) of user 108. Unit 104 can also include depth sensors 110, such as a RealSense Depth Camera D435, manufactured by Intel, a LIDAR scanner, or any other suitable system which can determine the distance between unit 104 and objects in front of unit 104.
[0025] Computation device 112 can be a conventional computing device, such as a personal computer, single board computer, etc. or can be a purpose-built computing device which provides the necessary computational processing, as described below.
[0026] Computation device 112 can be located within unit 104 or can be separate from unit 104 and, in the latter case, computational device 112 can be connected to unit 104 via a wired tether 116 or via a wireless data connection 120.
[0027] Unit 104 also includes at least two video cameras 124 which are mounted to unit 104 and which face generally forward, with respect to the viewpoint of user 108, when user 108 is wearing unit 104. It is contemplated that, in a minimal viable product configuration, cameras 124 can be (or can include) the above-mentioned depth sensors 110, provided that sensors 110 are visible light cameras and allow access to their captured images for subsequent image processing by computation device 112.
[0028] In the case where unit 104 is a custom headset, cameras 124 are mounted to the front of the headset and appropriately communicate with computation device 112. In the case where unit 104 is a commercially available headset, cameras 124 can be provided on a module which is designed to be attached to the commercially available headset with cameras 124 facing outward from unit 104 and the module can appropriately communicate with computational device 112.
[0029] Preferably, cameras 124 are mounted such that there are no “blindspots”, relative to the expected field of view of a user wearing unit 104, and that all areas of the user's field of view are captured by cameras 124. While not essential, it is preferred that the total combined field of view coverage of cameras 124 is at least one-hundred and eighty degrees, both horizontally and vertically.
[0030] Preferably, several cameras 124 (e.g., eight or more) are provided, each of which is a color camera with a relatively narrow field of view (FOV), and cameras 124 are placed close to each other on the front face of unit 104. Such a configuration is advantageous as it simplifies the image processing required to produce a generated view (as described below) and it allows relatively low resolution (and hence low expense) cameras to be employed while still providing an overall sufficient quality of a generated view.
[0031] As should be apparent to those of skill in the art, it is not necessary that all cameras 124 have the same resolution, FOV or even that all cameras be color cameras, as the preferred processing methods of the present invention can compensate for such differences.
[0032] The locations of cameras 124 on unit 104, the inter-camera distances, the FOV of cameras 124 and their positioning relative to the displays in unit 104 are determined at the time of manufacture (in the case of a custom headset) or at the time of manufacture and installation of the camera module (in the case of a module to be attached to a commercial headset), and this information is provided to computation device 112 as an input for the image processing described below which is performed by computational device 112.
[0033] Additional inputs to computational device 112 include the distance 130 between the pupils of the eyes 134 of the user 108, as shown in
[0034] Similarly, distance 138 can be determined by any suitable means, such as by a time of flight sensor 146 in unit 104 or from any focus adjustments made by user 108 that are required to adjust an optical path to bring images on display 142 into focus, etc.
[0035] As will now be apparent to those of skill in the art, with these physical parameters, system 100 can determine the location of each camera 124 relative to each pupil of user 108.
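By way of illustration only, the determination of camera locations relative to each pupil can be sketched as follows. The coordinate frame (origin at the centre of the display, x to the user's right, y up, z forward), the assumption that the pupils are vertically centred on the display, and all function names are hypothetical choices made for this example rather than details of system 100.

```python
def pupil_positions(ipd_m, eye_to_display_m):
    """Pupil locations in the display frame, assuming the pupils sit
    symmetrically about the display centre and behind it along z."""
    half = ipd_m / 2.0
    return {
        "left": (-half, 0.0, -eye_to_display_m),
        "right": (half, 0.0, -eye_to_display_m),
    }

def camera_offsets(camera_positions, ipd_m, eye_to_display_m):
    """For each eye, the vector from that pupil to every camera."""
    pupils = pupil_positions(ipd_m, eye_to_display_m)
    return {
        eye: [tuple(c - p for c, p in zip(cam, pupil)) for cam in camera_positions]
        for eye, pupil in pupils.items()
    }

# Hypothetical rig: two cameras 0.02 m in front of the display, 0.06 m apart.
cameras = [(-0.03, 0.0, 0.02), (0.03, 0.0, 0.02)]
offsets = camera_offsets(cameras, ipd_m=0.064, eye_to_display_m=0.05)
```

In this sketch the inter-pupil distance 130 and the eye-to-display distance 138 are the only user-specific inputs; the camera positions would be the fixed values recorded at manufacture.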
[0036] A method in accordance with an aspect of the present invention, will now be described, with reference to
[0037] The method commences at step 200 wherein the physical parameters of unit 104 and user 108 are determined and provided to computational device 112. As mentioned above, these physical parameters include the number of cameras 124 on unit 104, as well as their locations relative to the display 142 in unit 104. It is contemplated that, in most cases, this information will be a constant, fixed at the time of manufacture and/or assembly of unit 104 and provided once to computational device 112. However, it is also contemplated that different units 104 may be used with computational device 112 and, in such cases, these different units 104 may have different physical parameters which can be provided to computational device 112 when these units 104 are connected thereto.
[0038] The inter-pupil distance 130 and eye to display 142 distance 138 are also determined and provided to computational unit 112 such that computational unit 112 can determine the location, distance and FOV of each camera 124 with respect to each of the pupils of user 108.
[0039] At step 204, cameras 124 are activated and begin capturing video from their respective FOVs and provide that captured video to computational device 112. Also, depth information 160, from depth sensors 110 if present, is captured and is also provided to computational device 112.
[0040] In a current embodiment of the present invention, computation device 112 employs the technique of light field rendering to process video captured by cameras 124. Specifically, light field rendering is employed to create a generated view from the video captured by cameras 124 which is correct for the viewpoint of user 108 looking at display 142. While light field rendering is discussed herein, the present invention is not so limited and other suitable techniques for processing video captured by cameras, such as view interpolation methods, will occur to those of skill in the art and can be used.
[0041] At step 208, computational device 112 uses the depth information and the video captured by cameras 124 to produce a generated view of the real world in front of user 108, the generated view corresponding to the viewpoint of the user as would be viewed by the user if they were not wearing unit 104.
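By way of illustration only, the role of depth information in producing a view at the pupil can be sketched with a pinhole reprojection of a single pixel. This is a simplified stand-in and not the light field rendering technique itself; it assumes the camera and the virtual pupil camera share the same intrinsics and have parallel optical axes, and the function and parameter names are hypothetical.

```python
def reproject_pixel(u, v, depth_m, focal_px, cx, cy, cam_offset_m):
    """Map a camera pixel with known depth to the pixel it would occupy in a
    virtual pinhole camera located at the pupil.

    cam_offset_m is the camera position minus the pupil position (x, y, z),
    with both optical axes assumed parallel (translation only, no rotation).
    Returns (u_pupil, v_pupil, depth_from_pupil_m).
    """
    # Back-project the pixel to a 3D point in the camera frame.
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    # Express the same point in the pupil frame.
    xp = x + cam_offset_m[0]
    yp = y + cam_offset_m[1]
    zp = depth_m + cam_offset_m[2]
    # Project into the virtual pupil camera (same intrinsics assumed).
    return (focal_px * xp / zp + cx, focal_px * yp / zp + cy, zp)

# Hypothetical values: 700 px focal length, principal point (320, 240),
# camera 0.08 m in front of the pupil, object 0.42 m from the camera.
u_p, v_p, z_p = reproject_pixel(420.0, 240.0, 0.42, 700.0, 320.0, 240.0,
                                (0.0, 0.0, 0.08))
```

In this example a pixel 100 px right of the image centre in the camera moves to 84 px right of centre at the pupil, and its depth increases from 0.42 m to 0.50 m, so the object is drawn smaller and farther away, as the unaided eye would see it.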
[0042] Specifically, computational device 112 uses the depth information 160 with the light field rendering technique to estimate the specific cameras 124a, 124b, etc. which will capture light rays 164, 168 that would reach the pupils of the eyes of user 108 from each object 172 in front of user 108, if user 108 was observing the real world directly, without unit 104. The video captured by these cameras 124 is then processed by computational unit 112 to produce a generated image 178 which is viewed 182 by user 108.
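By way of illustration only, the estimation of which camera captures a given light ray can be sketched as a nearest-camera lookup. The sketch assumes all cameras lie in a single plane parallel to the display and that the single camera closest to the ray crossing is chosen; a practical light field renderer would typically blend several neighbouring cameras. All names and values are hypothetical.

```python
def ray_plane_hit(pupil, obj, plane_z):
    """(x, y) point where the straight line from the pupil to the object
    crosses the camera plane z = plane_z."""
    t = (plane_z - pupil[2]) / (obj[2] - pupil[2])
    return (pupil[0] + t * (obj[0] - pupil[0]),
            pupil[1] + t * (obj[1] - pupil[1]))

def nearest_camera(pupil, obj, cameras_xy, plane_z):
    """Index of the camera whose in-plane position is closest to the ray crossing."""
    hx, hy = ray_plane_hit(pupil, obj, plane_z)
    return min(
        range(len(cameras_xy)),
        key=lambda i: (cameras_xy[i][0] - hx) ** 2 + (cameras_xy[i][1] - hy) ** 2,
    )

# Hypothetical rig: four cameras in a row, 0.07 m in front of the right pupil.
cameras_xy = [(-0.06, 0.0), (-0.02, 0.0), (0.02, 0.0), (0.06, 0.0)]
right_pupil = (0.032, 0.0, 0.0)
cam_for_right_object = nearest_camera(right_pupil, (0.5, 0.0, 1.0), cameras_xy, 0.07)
cam_for_left_object = nearest_camera(right_pupil, (-0.5, 0.0, 1.0), cameras_xy, 0.07)
```

The depth information matters here because the position of the object along the ray determines where the ray crosses the camera plane, and therefore which camera is selected.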
[0043] At step 212 the generated view is displayed to user 108 on display 142 and the process returns to step 204. Preferably, computational device 112 has sufficient processing capacity to render generated view 178 at a frame rate of at least 30 FPS and more preferably, at a frame rate greater than 60 FPS.
[0044] While the method described above provides advantages over the prior art in that the field of view of the generated image of the real world that is provided to the user corresponds to the viewpoint the user would have if they were not wearing unit 104, preferably computational device 112 produces two generated images, one for each eye 134 of user 108, to provide a stereoscopic view for user 108. In this case, each generated image will correspond to the viewpoint of the eye 134 of user 108 for which it is generated and such stereoscopic images provide a more useful result in many cases. Thus, for such cases, steps 200 to 212 are repeated for each eye 134 of user 108.
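By way of illustration only, the flow of steps 204 to 212, with a generated view rendered for each eye, can be sketched as a simple loop. The callables passed in (capture_frames, capture_depth, render_view, show) are hypothetical placeholders for the camera, depth sensor, rendering and display functions, which are not specified here.

```python
def run_vst(capture_frames, capture_depth, render_view, show, pupils, max_frames):
    """Run the capture/render/display loop for a fixed number of frames.

    pupils maps an eye name to the pupil location determined at step 200.
    """
    for _ in range(max_frames):
        frames = capture_frames()   # step 204: video from every camera
        depth = capture_depth()     # step 204: depth data, or None if no sensor
        # Step 208: one generated view per eye, each for its own pupil location.
        views = {eye: render_view(frames, depth, pupil)
                 for eye, pupil in pupils.items()}
        show(views)                 # step 212: display, then loop back to 204

# Usage with trivial stand-ins for the hardware and the renderer.
shown = []
run_vst(
    capture_frames=lambda: ["frame_a", "frame_b"],
    capture_depth=lambda: None,
    render_view=lambda frames, depth, pupil: (len(frames), pupil),
    show=shown.append,
    pupils={"left": (-0.032, 0.0, -0.05), "right": (0.032, 0.0, -0.05)},
    max_frames=3,
)
```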
[0045] It is contemplated that, in some embodiments, depth sensors 110 may be omitted and the necessary depth information for computational device 112 can be determined directly from the video images captured by cameras 124 using known image processing techniques.
[0046] If it is desired, generated images 178 can be stored, in addition to being displayed to user 108, and in such a case the generated images can be stored on computational device 112 or on a separate storage device (not shown).
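By way of illustration only, one well known way to obtain depth from two camera images is stereo triangulation, in which depth equals focal length times baseline divided by disparity. The particular technique is not specified above, so its use here is an assumption, as are the rectified, parallel camera pair and the numeric values.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified, parallel stereo pair.

    focal_px     -- focal length in pixels (assumed equal for both cameras)
    baseline_m   -- distance between the two camera centres in metres
    disparity_px -- horizontal pixel shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

# Hypothetical values: 700 px focal length, 0.06 m baseline, 84 px disparity.
depth_m = depth_from_disparity(700.0, 0.06, 84.0)
```

With these assumed values the point lies about 0.5 m from the cameras. Finding the matching pixels that yield the disparity is a separate correspondence problem which this sketch does not address.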
[0047] While the above-described aspects of the present invention provide a user of a head mounted display system with a viewpoint-correct view of the real world, it is also contemplated that in some circumstances it may be desired to provide the user with a real world view that corresponds to a different viewpoint. Specifically, it is contemplated that computational device 112 can be provided with a selected location, a “virtual viewpoint”, for the pupils of the eyes of the user. That is, computational device 112 can be provided with a location for the pupils of the user which does not, in fact, correspond to the actual location of the pupils.
[0048] For example, computational device 112 can be instructed that the locations of the pupils of the user are one foot further apart (distance 130 is one foot longer) than they actually are. In such a case the generated views produced by computational device 112 would appear enlarged, or magnified, relative to the actual real-world view which would otherwise be experienced by the user if they were not wearing unit 104. Similarly, a virtual viewpoint defining the pupils of user 108 as being located to one side or the other of user 108 or above or below user 108 could be employed if desired.
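By way of illustration only, the effect of a widened virtual inter-pupil distance can be estimated from the vergence angle between the two lines of sight to an object. The sketch assumes an object straight ahead and treats the distance at which the actual pupils would see the same vergence angle as a rough proxy for how close the object appears; that perceptual interpretation, the function names and the numeric values are assumptions made for this example.

```python
import math

FOOT_M = 0.3048  # one foot in metres

def vergence_deg(ipd_m, distance_m):
    """Angle between the two lines of sight to an object straight ahead."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

def equivalent_distance(actual_ipd_m, virtual_ipd_m, distance_m):
    """Distance at which the actual pupils would see the same vergence angle
    that the virtual pupils see for an object at distance_m."""
    return distance_m * actual_ipd_m / virtual_ipd_m

# Hypothetical values: 0.064 m actual inter-pupil distance, object 2.0 m away.
actual_ipd = 0.064
virtual_ipd = actual_ipd + FOOT_M
apparent_distance = equivalent_distance(actual_ipd, virtual_ipd, 2.0)
```

Under these assumptions an object 2.0 m away produces the vergence angle of an object roughly 0.35 m away, which is consistent with the generated views appearing enlarged relative to the unaided view.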
[0049] As will now be apparent, the present invention provides a head mounted display system with video-see-through images that correspond to the user's viewpoint. Thus, distortions in distance, position and size which would occur without the present invention are avoided.
[0050] The above-described embodiments of the invention are intended to be examples of the present invention and alterations and modifications may be effected thereto, by those of skill in the art, without departing from the scope of the invention which is defined solely by the claims appended hereto.