SYSTEM AND METHOD FOR CREATING A NAVIGABLE, THREE-DIMENSIONAL VIRTUAL REALITY ENVIRONMENT HAVING ULTRA-WIDE FIELD OF VIEW

Abstract

The present invention relates to a system and method for capturing video of a real-world scene over a field of view that may exceed the field of view of a user, manipulating the captured video, and then stereoscopically displaying the manipulated image to the user in a head mounted display to create a virtual environment having length, width, and depth in the image. By capturing and manipulating video for a field of view that exceeds the field of view of the user, the system and method can quickly respond to movement by the user to update the display allowing the user to look and pan around, i.e., navigate, inside the three-dimensional virtual environment.

Claims

1-13. (canceled)

14. A system for creating a navigable, panoramic three-dimensional virtual reality environment with stereoscopic perception of three-dimensional depth to a user, comprising: first and second optical elements each optical element having a horizontal field of view of at least 80 degrees, the first optical element providing a first video image stream and the second optical element providing a second video image stream, each stream corresponding to a different stereoscopic viewpoint of a scene; a storage device for storing the first and second video image streams, each stream comprising image frames; an image renderer disposed in communication with the storage device to receive the first and second video image streams, the image renderer configured to create an output stream of merged stereoscopic image frames corresponding to a selected direction of view of the user, the merged frames comprising image frames from the first video image stream which are temporally matched to respective image frames from the second video image stream.

15. The system according to claim 14, wherein the first and second optical elements each include at least one optical imaging detector configured to record the first and second video image streams corresponding to the different stereoscopic viewpoints of the scene.

16. The system according to claim 14, wherein the image renderer is configured to split the merged frames of the output stream and transform each of the split frames onto respective first and second models.

17. The system according to claim 16, wherein the center of origin of each model is selected to correspond to a location at which a respective one of the user's eyes is to be positioned.

18. The system according to claim 16, wherein each model comprises a wireframe model.

19. The system according to claim 16, wherein each model comprises a sphere.

20. The system according to claim 14, wherein the renderer comprises a non-transitory computer usable medium having a computer readable program code embodied therein, the computer readable program code adapted to be executed to implement a method for rendering the first and second video image streams to create a panoramic three-dimensional virtual reality environment having stereoscopic perception of three-dimensional depth comprising the steps of creating first and second wireframe models, and transforming each of the first and second video image streams by wrapping the first and second video image streams onto the first and second wireframe models respectively.

21. The system according to claim 14, wherein the field of view of each optical element is at least 100 degrees.

22. The system according to claim 14, wherein the field of view of each optical element is at least 120 degrees.

23. The system according to claim 14, wherein the field of view of each optical element is at least 150 degrees.

24. The system according to claim 14, wherein the field of view of each optical element is at least 170 degrees.

25. The system according to claim 14, wherein the field of view of each optical element is at least 183 degrees.

26. The system according to claim 14, wherein the field of view of each optical element is at least 210 degrees.

27. The system according to claim 14, wherein the first and second optical elements comprise a first pair of optical elements, wherein the system comprises a second pair of optical elements facing in a different direction from the first pair, each optical element of the second pair having a horizontal field of view of at least 80 degrees, and wherein the first and second pairs of optical dements are configured to provide for the recording of a real-world scene of up to a full 360° by 360° field of view in three dimensions.

28. A method for rendering video image streams to create a navigable, panoramic three-dimensional virtual reality environment having stereoscopic perception of three-dimensional depth comprising: providing the system of claim 15; capturing the first and second video image streams from a respective one of the at least one imaging detector of the first and second optical elements; merging selected frames of the first and second video image streams in stereoscopic format to create the output stream of merged stereoscopic image frames; receiving position data indicating a direction of sight of the user; calculating a first and second region of interest of the merged stereoscopic image pairs based on the position data; and displaying the first and second regions of interest on a head mounted stereoscopic display, whereby a navigable, panoramic three-dimensional virtual reality environment having stereoscopic perception of three-dimensional depth is created.

29. A according to claim 28, comprising creating first and second wireframe models, and wrapping respective first and second sections of the merged stereoscopic image frames onto the first and second wireframe models respectively.

30. A method according to claim 29, wherein the displaying of the first and second regions of interest comprises rotating the first and second wireframe models in response to the position data.

31. A method according to claim 28, comprising removing distortion from the first and second image frames.

32. A method according to claim 28, wherein the merging of selected frames of the first and second video image streams comprises temporally synchronizing the first and second video image streams.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The foregoing summary and the following detailed description of the preferred embodiments of the present invention will be best understood when read in conjunction with the appended drawings, in which:

[0012] FIG. 1 schematically illustrates an exemplary system for creating a navigable, panoramic three-dimensional virtual reality environment in accordance with the present invention;

[0013] FIGS. 2A, 2B schematically illustrate an exemplary method for creating a navigable, panoramic three-dimensional virtual reality environment in accordance with the present invention; and

[0014] FIGS. 3A-3D schematically illustrate exemplary configurations for optical imaging devices used in the present invention, with FIGS. 3A, 3B showing side elevational and top views, respectively, of a mirror system, FIG. 3C showing a top view of a lens system have forward and rearward pairs of imaging devices, and FIG. 3D showing a top view of a hexagonally configured system having six imaging devices.

DETAILED DESCRIPTION OF THE INVENTION

[0015] Referring now to the figures, wherein like elements are numbered alike throughout, the present invention relates to systems and methods for creating a navigable, panoramic three-dimensional virtual reality environment having an ultra-wide field of view, FIGS. 1, 2A, 2B. For example, in one of its aspects, the systems and methods provide a navigable, panoramic three-dimensional virtual reality environment by capturing video image data of a scene having a wide field of view. By capturing video data of a scene that extends beyond the field of view of a user and storing and transforming that video data, the systems and methods of the present invention can create a navigable, panoramic three-dimensional virtual reality environment that can rapidly render and present to the user video information of new portions of the scene that come into the user's view as the user moves their head. As a result, the user gets the feeling of being inside a virtual environment, being able to perceive depth and pan and zoom, due to the manipulation and display of the video image data by the system and method of the present invention.

[0016] For example, as shown in FIG. 1, an exemplary system in accordance with the present invention may include a wide field of view optical imaging device 100 which may include at least two optical elements 106, 108 each oriented to image different viewpoints of a scene. Respective optical imaging detectors 102, 104 may be associated with each optical element 106, 108. Alternatively, a single imaging detector may be provided and configured to alternately receive images from each of the optical elements 106, 108. The optical elements 106, 108 may include any suitable optical imaging device capable of imaging a wide field of view, such as fisheye lenses, Fresnel lenses, mirror lenses, and catadioptric systems, for example. For instance, the two optical elements 106, 108 may be provided in the form of fisheye lenses which can capture video image data over a field of 180° by 180°. (For example, a 185° lens Raynox® Circular Fisheye Conversion Lens DCR-CF187PRO, Yoshida Industry Co., Ltd. Tokyo, Japan.) Alternatively, the optical elements may be provided in the form of parabolic reflectors 306, 308 and associated detectors 302, 304, FIGS. 3A, 3B. (For example, Pano Pro™ MkIIVB, Pano Pro, Ltd, Watlington, UK.) Further, two pairs of optical imaging detectors with associated optical elements may be used to capture video image data having a wider field of view. For example, a pair of forward facing optical imaging detectors 312 and a pair of rearward facing optical imaging detectors 310 may be employed, FIG. 3C. Still further, five (or four) optical imaging detectors and optical elements 314 maybe spaced equidistant around the circle with an optional sixth optical imaging detector and optical element 314 pointing upward out of the plane of the circle, FIG. 3D.

[0017] The optical imaging detectors 102, 104 may record first and second video image data comprising left and right video image streams 2, 4 and may communicate directly with an image renderer 110 to transmit the respective left and right video image streams 2, 4 acquired by the optical imaging detectors 102, 104 to the image renderer 110. Such a configuration in which the optical imaging detectors 102, 104 communication directly with the image renderer 110 may be used for real-time imaging. Alternatively, the optical imaging device 100 may record and store the left and right video image streams 2, 4 for subsequent download to the image renderer 110. For example, the left and right video image streams 2, 4 may be subsequently loaded on to a hard-drive or other suitable storage device to be accessed by the image renderer 110 at a later date.

[0018] The image renderer 110 may communicate with a head mounted display 130 (iWear® VR920, Vuzix Corporation, Rochester, NY) worn by a user which may include a tracking device 140 configured to track the movement of the user. A position detector 120 may be disposed in communication with the tracking device 140 to receive tracking data from the tracking device 140 to determine the direction of view of the user. Although the position detector 120 is shown as a separate element in FIG. 1, the position detector 120 may optionally be provided as an integral part of the tracking device 140. (For example, FSRK-USB-1, Hill Crest Labs, Rockville, MD.) The position detector 120 may be disposed in communication with the image renderer 110 to supply the direction of view data to the image renderer 110. By combining knowledge of the direction of view of the user with the left and right video image streams 2, 4, the image renderer 110 may control the rendering and display of selected portions of the left and right video image streams 2, 4 corresponding to the user's direction of view on respective left and right screens 134, 132 of the head mounted display 130 so the user perceives a navigable, panoramic three-dimensional virtual reality environment.

[0019] Specifically, the image renderer 110 may take the left and right video image streams 2, 4, and a perform polar to rectangular conversion to merge respective temporally matched image frames of the left and right video image streams 2, 4 into a merged image pair 8. That is, for each image frame of the left video image stream 2 the temporally corresponding image frame of the right video image stream 4 is merged thereto to create a series of merged image pairs 8 for all video frames to provide an output stream of the merged image pairs 8. Merging temporally matched image frames 8 from the left and right video image streams 2, 4 ensures that the video image data from each optical imaging detector 102, 104 remains temporally synchronized, and is but one example of how the image streams 2, 4 may be synchronized. That is, data will always be in sync as the left and right video image streams 2, 4 are now merged together in the output stream. With the left and right video image streams 2, 4 now merged in stereo format in the output stream, the output stream can be processed to play in the head mounted display 130. (In addition, in certain configurations of the optical imaging device 100 it is possible that one optical element 106, 108 may image the other, in which case the image renderer 110 may patch the images of the left and right video image streams 2, 4 to obscure the image of the other optical element.)

[0020] To create the head mounted display output, the image renderer 110 may create virtual wireframe spheres 12, 14 (one for each eye) to provide a framework for the virtual reality environment. As each merged image pair 8 is encountered in the output stream, the merged image pair 8 may be split into respective left and right portions that are then wrapped onto respective left, right wireframe sphere 12, 14. The image renderer 110 may virtually place a camera inside the left sphere 14 to create a feed to the left screen 134 of the head mounted display 130 for display to the left eye of the user. Likewise, the image renderer 110 may virtually place a camera inside the right sphere 12 to create a feed to the right screen 132 of the head mounted display 130 for display to the right eye of the user. As these virtual cameras are looking only at a small section of the inside of the virtual spheres 12, 14 the cameras only see part of the image projected around the virtual spheres 12, 14. As the user turns his head anticlockwise a signal is sent back from the tracking device 140 via the position detector 120 to inform the image renderer 110 that the user's head is moving anticlockwise, in response to which the image renderer 110 rotates both spheres 12, 14 clockwise around the spheres' vertical axes. As a result, the virtual cameras inside the virtual spheres 12, 14 will now see an image that pans to the anticlockwise as the cameras are always fixed looking forward, so the inside of the virtual spheres 12, 14 pan clockwise which gives the feeling to the user of movement turning anticlockwise the opposite way.

[0021] Turning to FIGS. 2A, 2B, an exemplary process by which the image renderer creates the navigable, panoramic three-dimensional virtual reality environment is illustrated. In one exemplary method in accordance with the present invention, a navigable, panoramic three-dimensional virtual reality environment having ultra-wide field of view may be created using a fisheye stereo side-by-side video publisher for creating the output stream of merged image pairs 8, FIG. 2A, in conjunction with a spherical stereo side-by-side viewer for processing and displaying the output stream to the user, FIG. 2B. Turning first to the publisher, FIG. 2A, the publisher may be implemented using FFMpeg open source libraries for handling multimedia data and Microsoft® DirectX® protocols, though other suitable libraries or algorithms may be used to effect the creation of the output stream of merged image pairs 8. As a first step 200, the video publisher may initialize global variables, which may include initializing the bit rate, frame size, audio channel, and audio rate, as well as selecting the output video stream compression, and output FPS frame rate. (The name of the output video file may also be selected at step 200.) In addition at step 200, the publisher may receive the left and right video image streams 2, 4, with the left video image stream 2 having a plurality of left image frames and the right video image stream 4 having a plurality of right image frames.

[0022] After global initialization, publication of the merged image pairs 8 may proceed by creating two parallel publication threads for processing the left and right video image streams 2, 4, at steps 210a and 210b, respectively. Initialization of variables for each publication thread may include initialization of the DirectX® variables: FilterGraph, SampleGrabber, GraphBuilder, IMediaControl, IVideoWindow, IBasicAudio, and IMediaSeek, steps 210a, 210b. In addition, the rendering modes associated with sharpening, brightness, contrast, and noise median may be set, steps 220a, 220b. With the variables initialized and rendering modes set, the left and right video image streams 2, 4 may be processed to remove distortion introduced by the optical imaging device 100 using the transformation provided in Table 1, steps 230a, 230b. The distortion-free streams may then be rendered to respective left and right bitmap streams, steps 240a, 240b, which may then be merged to create the merged image pairs 8 of the output stream, step 250.

TABLE-US-00001 TABLE 1 Transformation for each pixel of the current frame: x_dst, y_dst - destination coordinates x_src, y_src - source coordinates distance = FrameHeight * PI distance_1 = 1.0 / distance distance_2 = distance / 1.1 phi = x_dst * distance_1 theta = −y_dst * distance_1 + PI/2 s = SIN( theta ) v0 = s * SIN( phi ) v1 = COS( theta ) r = SQRT(v1*v1 + v0*v0) s = ATAN(r / ( s * COS( phi ))) / r x_src = v0 * distance_2 * s y_src = v1 * distance_2 * s

[0023] Having formed the output stream of merged image pairs 8, the image renderer 110 may proceed with transformation of the output stream for display in the head mounted display 130 to create the navigable, panoramic three-dimensional virtual reality environment. Beginning at step 260, the video publisher may be initialized by initializing the DirectX® D3D interfaces (e.g., IDirect3D9, IDirect3DDevice9, IDirect3DVertexBuffer9, and IDirect3DTexture9), FIG. 2B. In addition at step 260, the system may be configured to initialize the DirectShow interface (e.g., FilterGraph, SampleGrabber, GraphBuilder, IMediaControl, IVideoWindow, IBasicAudio, IMediaSeek) as well as common parameters, such as all parameters for the stereo video stream and look-up parameters for left and right eye image buffers. After initialization, rendering may proceed by creating two parallel threads for rendering the output video stream of merged image pairs 8 and operating the tracking device 140, at steps 270 and 290, respectively. Specifically, for the tracking thread, the process may include the steps for initializing the tracking device 140, step 290, and tracking the user's movement using the position detector 140, step 292. Returning to the rendering thread, at step 270 the 3D environment may be initialized by filling quad vertices (D3D), creating the Direct3D interface, creating a D3D device and setting the DirectX D3D display mode, setting a render state for the D3D device, setting light for D3D device, creating a vertex buffer for D3D device, preparing backbuffer flipping, and preparing a D3D scene. Once the 3D environment is initialized, the frame-by-frame process of rendering the video stream may proceed at step 272. Specifically, the process may retrieve a current video frame containing a merged image pair 8 and may split the merged image pairs 8 of the video output stream into respective left and right images for use in a left eye buffer and right eye buffer, step 272. Further, the status of the user's position and direction of view in the parallel tracking thread may be checked and updated. As a next step in creating the navigable, panoramic three-dimensional virtual reality environment, the left and right images generated at step 272 may be wrapped around the virtual spheres 12, 14 using the exemplary code listed in Table 2 in view of the following geometric conventions.

[0024] In analytic geometry, a sphere with center (x.sub.0, y.sub.0, z.sub.0) and radius, r, is the locus of all points (x, y, z) such that (x−x.sub.0).sup.2+(y−y.sub.0).sup.2+(z−z.sub.0).sup.2=r.sup.2. The points on the sphere with radius r can be parameterized via

x=x.sub.0+r sin θ cos φ

y=y.sub.0+r sin θ sin φ(0≤φ≤2π and 0≤θ≤π)

z=z.sub.0+r cos θ.

[0025] A sphere of any radius centered at zero is an integral surface of the following differential form x dx+y dy+z dz=0. In Cartesian coordinates (x, y, z) on the sphere and (X, Y) on the plane, the projection and its inverse are given by the formulas

[00001] $(X, Y) = (\frac{x}{1 - z}, \frac{y}{1 - z}), and (x, y, z) = (\frac{2 X}{1 + X^{2} + Y^{2}}, \frac{2 Y}{1 + X^{2} + Y^{2}}, \frac{- 1 + X^{2} + Y^{2}}{1 + X^{2} + Y^{2}}) .$

[0026] In spherical coordinates (φ, θ) on the sphere (with φ the zenith angle, 0≤φ≤π, and θ the azimuth, 0≤θ≤2π) and polar coordinates (R, Θ) on the plane, the projection and its inverse are

[00002] $(R, Θ) = (\frac{\sin φ}{1 - \cos φ}, θ), (φ, θ) = (2 arc \tan (\frac{1}{R}), Θ) .$

[0027] Here, φ is understood to have value π when R=0. Also, there are many ways to rewrite these formulas using trigonometric identities. In cylindrical coordinates (r, θ, z) on the sphere and polar coordinates (R, Θ) on the plane, the projection and its inverse are

[00003] $(R, Θ) = (\frac{r}{1 - z}, θ), (r, θ, z) = (\frac{2 R}{1 + R^{2}}, Θ, \frac{R^{2} - 1}{R^{2} + 1}) .$

[0028] Having transformed the left and right images from a rectangular to spherical form by wrapping the left and right images around respective spheres 12, 14, further processing can proceed by selecting that portion of the wrapped images for display on each of the left and right eyes of the user, step 276. That is, in response to the user's direction of view as determined at step 292, a region of interest for each of the left and right wrapped images is selected for display in the head mounted display 130, step 276. Specifically, the process may, for example, execute the steps for D3D device render by: preparing the D3D matrix, choosing the left and right regions of interest, and controlling the video stream (play, stop, pause) as well as controlling panning (up, down, left, right, zoom). The left and right regions of interest may then be displayed on the respective left and right displays 134, 132 of the head mounted display 130. In the exemplary DirectX® implementation display may be accomplished by having D3D render left and right image buffers containing the left and right regions of interest to the left and right displays 134, 132 of the head mounted display 130, step 278. After a video frame is thus displayed to each of the left and right displays 134, 132, the process may be repeated for each video frame by sending a signal to the rendered 110 to return to step 272, step 280, to create a navigable, panoramic three-dimensional virtual reality environment having ultra-wide field of view.

[0029] These and other advantages of the present invention will be apparent to those skilled in the art from the foregoing specification. Accordingly, it will be recognized by those skilled in the art that changes or modifications may be made to the above-described embodiments without departing from the broad inventive concepts of the invention. It should therefore be understood that this invention is not limited to the particular embodiments described herein, but is intended to include all changes and modifications that are within the scope and spirit of the invention as set forth in the claims.

TABLE-US-00002 TABLE 2 Simple Transformation: width, height - width and height of the window Fov - current field of view angle (degrees); pan, tilt - camera direction point (degrees). panDest, tiltDest - model coordinates of clicked (x, y) point (degrees). x, y - screen clicked point coordinates (pixels); r - panorama model sphere radius; vx, vy, vz - coordinates of vector from origin to picked point in projection plane; vx2, vy2, vz2 - transformed vector of picked point. function radiansToDegree( radian ) { return (radian * 180.0) / 3.14159265358979323846264; } function DegreeToRadians( degree ) { return (degree / 180.0) * 3.14159265358979323846264; } input: x, y, Fov, pan, tilt, width, height output: panDest, tiltDest r = (sqrt(w*w + h*h) / 2.0) / tan (degreeToRadians(fov/2.0) ); vx = r; vy = w/2.0 − x; vz = h/2.0 − y; pan = degreeToRadians( pan ); tilt = degreeToRadians ( −tilt ); cosp = cos( pan ); sinp = sin( pan ); cost = cos( tilt ); sint = sin( tilt ); vx2 = vx*cost*cosp − vy*sinp + vz*cosp*sint; vy2 = vx*cost*sinp + vy*cosp + vz*sint*sinp; vz2 = −vx*sint + vz*cost; sinDest = vy2 / sqrt(vx2*vx2+ vy2*vy2); panDest = radinasToDegree( arcsin(sinDest) ); if( vx2 < = 0 && vy2 > 0 ) panDest = 180 − panDest; else if( vx2 < 0 && vy2 < = 0 ) panDest = −180 − panDest; sinDest = vz2 / sqrt(vx2*vx2 + vy2*vy2 + vz2*vz2); tiltDest = radiansToDegree( arcsin(sinDest) );

SYSTEM AND METHOD FOR CREATING A NAVIGABLE, THREE-DIMENSIONAL VIRTUAL REALITY ENVIRONMENT HAVING ULTRA-WIDE FIELD OF VIEW

Inventors

Cpc classification

Classification Explorer

H04N13/383

ELECTRICITY

Classification Explorer

H04N13/117

ELECTRICITY

Classification Explorer

H04N13/239

ELECTRICITY

Classification Explorer

G06F3/011

PHYSICS

Classification Explorer

H04N13/344

ELECTRICITY

Classification Explorer

G06T15/10

PHYSICS

Classification Explorer

H04N13/243

ELECTRICITY

Classification Explorer

G06T2215/08

PHYSICS

International classification

Classification Explorer

H04N13/344

ELECTRICITY

Classification Explorer

H04N13/117

ELECTRICITY

Classification Explorer

H04N13/243

ELECTRICITY

Classification Explorer

H04N13/383

ELECTRICITY

Classification Explorer

G06T15/10

PHYSICS

Classification Explorer

G06F3/01

PHYSICS

Classification Explorer

H04N13/239

ELECTRICITY

Abstract

Claims

Description