TEAM AUGMENTED REALITY SYSTEM
20190279428 · 2019-09-12
Inventors
- Newton Eliot Mack (Culver City, CA, US)
- Philip R. Mass (Culver City, CA, US)
- Winston C. Tao (Culver City, CA, US)
CPC classification
A63F13/212
HUMAN NECESSITIES
H04N9/646
ELECTRICITY
G06F3/011
PHYSICS
A63F13/213
HUMAN NECESSITIES
A63F13/65
HUMAN NECESSITIES
G06F3/0346
PHYSICS
International classification
G06T19/00
PHYSICS
A63F13/212
HUMAN NECESSITIES
A63F13/211
HUMAN NECESSITIES
A63F13/213
HUMAN NECESSITIES
Abstract
A system for combining live action and virtual images in real time into a final composite image as viewed by a user through a head mounted display, and which uses a self-contained tracking sensor to enable large groups of users to use the system simultaneously and in complex walled environments, and a color keying based algorithm to determine display of real or virtual imagery to the user.
Claims
1. A system comprising: a helmet mounted display (HMD) for a user; a front-facing camera or cameras; and a low latency keying module configured to mix virtual and live action environments and objects in an augmented reality game or simulation.
2. The system of claim 1 wherein the keying module is configured to composite the live action image from the front facing camera with a rendered virtual image from the point of view of the HMD, and send the composited image to the HMD so the user can see the combined image.
3. The system of claim 1 wherein the keying module is configured to take in a live action image from the camera, and perform a color difference and despill operation on the image to determine how to mix it with an image of a virtual environment.
4. The system of claim 1 wherein the camera is mounted to the front of the HMD and facing forward, to provide a view of the real environment in the direction that the user is looking.
5. The system of claim 1 further comprising an upward-facing tracking sensor configured to be carried by the user of the HMD and to detect overhead tracking markers.
6. The system of claim 5 wherein the sensor is configured to determine a position of the user in a physical space, and the keying module is configured to determine which areas of the physical space will be visually replaced by virtual elements.
7. The system of claim 1 wherein areas of the live action environment that are painted a solid blue or green color are visually replaced by virtual elements.
8. The system of claim 1 wherein the sensor is configured to calculate the position of the HMD in a physical environment, and that information is used to render a virtual image from the correct point of view that is mixed with the live action view and displayed in the HMD.
9. The system of claim 1 wherein each user of the HMD has a separate tracking sensor and rendering computer, whose function is independent of the sensors and rendering computers of the other users.
10. The system of claim 9 wherein a tracking system of the sensor is not dependent on the other users because it can calculate the complete position and orientation of the HMD based upon the view of the overhead markers without communicating with any external sensors.
11. The system of claim 1 wherein the front facing camera or cameras are configured to provide a real time view of the environment that the user is facing.
12. The system of claim 1 wherein the low latency is on the order of 25 milliseconds.
13. The system of claim 1 wherein the sensor is a self-contained 6DOF tracking sensor.
14. The system of claim 1 wherein the keying module is configured to allow an environment designer to determine which components of an environment of the user are to be optically passed through and which are to be replaced by virtual elements.
15. The system of claim 1 wherein the keying module is configured to handle transitions between virtual and real worlds in a game or simulation by reading the image from the front facing camera, performing a color difference key process on the image to remove the solid blue or green elements from the image, and then combining this image with a virtual rendered image.
16. The system of claim 1 wherein the keying module is embodied in low latency programmable hardware.
17. The system of claim 1 wherein the keying module is configured to calculate the color difference between the red, green and blue elements of a region of a live action image, to use that difference to determine the portions of the live action image to remove, and use a despill calculation to limit the amount of blue or green in the image and remove colored fringes from the image.
18. The system of claim 1 wherein the number of users of the self-contained tracking system can be greater than five because the tracking system can calculate its position based on a view of overhead markers without needing to communicate with an external tracking computer.
19. The system of claim 1 wherein the users of the self-contained tracking system can be located very close to each other without experiencing tracking problems, as the tracking system can calculate its position even when occluded to either side.
20. The system of claim 1 wherein the users of the self-contained tracking system can walk very close to head height walls without experiencing tracking problems, as the tracking system can calculate its position even when occluded to either side.
21-52. (canceled)
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] The foregoing and other features and advantages of the present invention will be more fully understood from the following detailed description of illustrative embodiments, taken in conjunction with the accompanying drawings.
DETAILED DESCRIPTION
[0057] The following is a detailed description of presently known best mode(s) of carrying out the inventions. This description is not to be taken in a limiting sense, but is made for the purpose of illustrating the general principles of the inventions.
[0058] A rapid, efficient, reliable system is disclosed herein for combining live action images on a head mounted display that can be worn by multiple moving users with matching virtual images in real time. Applications ranging from video games to military and industrial simulations can implement the system in a variety of desired settings that are otherwise difficult or impossible to achieve with existing technologies. The system thereby can greatly improve the visual and user experience, and enable a much wider usage of realistic augmented reality simulation.
[0059] The process can work with a variety of head mounted displays and cameras that are being developed.
[0060] An objective of the present disclosure is to provide a method and apparatus for rapidly and easily combining live action and virtual elements in a head mounted display worn by multiple moving users in a wide area.
[0062] User 200 can carry at least one hand controller 220; in this embodiment it is displayed as a gun. Hand controller 220 also has a self-contained tracking sensor 214 with upward facing lens 216 mounted rigidly to it. The users 200 are moving through an area which optionally has walls 100 to segment the simulation area. Walls 100 and floor 110 may be painted a solid blue or green color to enable a real time keying process that selects which portions of the real world environment will be replaced with virtual imagery. The walls 100 are positioned using world coordinate system 122 as a global reference. World coordinate system 122 can also be used as the reference for the virtual scene, to keep a 1:1 match between the virtual and the real world environment positions. There is no need to have walls 100 for the system to work, and the system can work in a wide open area.
[0063] One of the system advantages is that it can work in environments with many high physical walls 100, which are frequently needed for realistic environment simulation. Physical props 118 can also be placed in the environment. They can be colored a realistic color that does not match the blue or green keyed colors, so that the object that the user may touch or hold (such as lamp posts, stairs, or guard rails) can be easily seen and touched by the user with no need for a virtual representation of the object. This also makes safety-critical items like guardrails safer, as there is no need to have a perfect VR recreation of the guardrail that is registered 100% accurately for the user to be able to grab it.
[0064] An embodiment of the present disclosure is illustrated in
[0065] User 200 can be surrounded by walls 100 and floor 110, optionally with openings 102. Since most existing VR tracking technologies require a horizontal line of sight to HMD 210 and hand controller 220, the use of high walls 100 prevents those technologies from working. The use of self-contained tracking sensor 214 with overhead tracking targets 111 enables high walls 100 to be used in the simulation, which is important to maintain a sense of simulation reality, as one user 200 can see other users 200 (or other scene objects not painted a blue or green keying color) through the front facing cameras 212. As previously noted, most other tracking technologies depend upon an unobstructed sideways view of the various users in the simulation, preventing realistically high walls from being used to separate one area from another. This lowers the simulation accuracy, which can be critical for most situations.
[0066] To calculate the current position of the tracking sensor 214 in the world, a map of the existing fiducial marker 3D positions 111 must be known. In order to generate a map of the positions of the optical markers 111, a nonlinear least squares optimization is performed using a series of views of identified optical markers 111, a process known as a bundle adjustment (or bundled solve) that is well known to machine vision practitioners. The bundle adjustment can be calculated using the open source CERES optimization library by Google Inc. of Mountain View, Calif. (http://ceres-solver.org/nnls_tutorial.html#bundle-adjustment). Since the total number of targets 111 is small, the resulting calculation is quick, and can be performed rapidly with an embedded computer 280.
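The bundle adjustment step can be sketched in simplified form. The toy example below is an assumption for illustration, not the CERES-based path the disclosure cites: it refines only the marker positions under known, fixed camera poses using plain NumPy Gauss-Newton, whereas a full bundled solve also refines the camera poses.

```python
import numpy as np

def project(P, X):
    """Pinhole projection of (N,3) points X by a 3x4 camera matrix P."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]

def refine_markers(cams, obs, X0, iters=15, eps=1e-6):
    """Gauss-Newton refinement of marker positions X0 (N,3), holding the
    camera matrices fixed; a full bundle adjustment would refine both."""
    X = X0.copy()
    for _ in range(iters):
        r = np.concatenate([(project(P, X) - o).ravel()
                            for P, o in zip(cams, obs)])
        J = np.zeros((r.size, X.size))
        for j in range(X.size):  # numerical Jacobian, column by column
            Xp = X.ravel().copy()
            Xp[j] += eps
            rp = np.concatenate([(project(P, Xp.reshape(X.shape)) - o).ravel()
                                 for P, o in zip(cams, obs)])
            J[:, j] = (rp - r) / eps
        X = (X.ravel() - np.linalg.lstsq(J, r, rcond=None)[0]).reshape(X.shape)
    return X
```

Because the marker count is small, the normal equations stay tiny, which is why the text can claim the solve runs quickly on an embedded computer.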
[0067] Once the overall target map is known and tracking camera 216 can see and recognize at least four optical markers 111, the current position and orientation (or pose) of tracking sensor 214 can be solved. This can be solved with the Perspective-Three-Point (P3P) method described by Laurent Kneip of ETH Zurich in "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation." Since the number of targets 111 is still relatively small (at least four, but typically fewer than thirty), the numerical solution to the pose calculation can be solved very rapidly, in a matter of milliseconds on a small embedded computer 280 contained in the self-contained tracking sensor.
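The text specifies Kneip's P3P solver; since overhead markers typically lie on a common ceiling plane, a homography decomposition is one simpler alternative that can be sketched compactly. The sketch below is an assumption for illustration (coplanar markers, normalized image coordinates with intrinsics already removed), not the cited method.

```python
import numpy as np

def homography_dlt(plane_pts, img_pts):
    """Direct linear transform homography from >= 4 plane/image pairs."""
    A = []
    for (x, y), (u, v) in zip(plane_pts, img_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 3)

def pose_from_plane(plane_pts, img_pts):
    """Camera pose (R, t) from markers on the plane z = 0."""
    H = homography_dlt(plane_pts, img_pts)
    H = H / np.linalg.norm(H[:, 0])   # fix the homography's scale
    if H[2, 2] < 0:                   # keep the marker plane in front
        H = -H
    r1, r2, t = H[:, 0], H[:, 1], H[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)       # snap to the nearest rotation
    return U @ Vt, t
```

Either way, with four or more identified markers in view the pose solve is a small closed-form or least-squares problem, consistent with the millisecond timing claimed above.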
[0068] Once the sensor pose can be solved, the resulting overhead target map can then be referenced to the physical stage coordinate system defined by floor 110. This can be achieved by placing tracking sensor 214 on the floor 110 while keeping the targets 111 in sight of tracking camera 216. Since the pose of tracking camera 216 is known and the position of tracking camera 216 with respect to the floor 110 is known (as the tracking sensor 214 is resting on the floor 110), the relationship of the targets 111 with respect to the ground plane 110 can be rapidly solved with a single 6DOF transformation, a technique well known to practitioners in the field.
[0069] After the overall target map is known and referenced to the floor 110, when the tracking sensor 214 can see at least four targets 111 in its field of view, it can calculate its position and orientation, or pose, anywhere under the extent of targets 111, which can cover the ceiling of a very large space (for example, 50 m × 50 m × 10 m).
[0070] A schematic of an embodiment of the present disclosure is shown in
[0071] The field of view of the lens 218 on tracking camera 216 is a trade-off between what the lens 218 can see and the limited resolution that can be processed in real time. This wide angle lens 218 can have a field of view of about ninety degrees, which provides a useful trade-off between the required size of optical markers 111 and the stability of the optical tracking solution.
[0072] An embodiment of the present disclosure is illustrated in
[0073] The data flow of the tracking and imaging data is illustrated in
[0074] Tracking data 215 is passed to both 3D engine 500 and wall renderer 410. Wall renderer 410 can be a simple renderer that uses the wall position and color data from a 3D environment lighting model 400 to generate a matched clean wall view 420. 3D environment lighting model 400 can be a simple 3D model of the walls 100, the floor 110, and their individual lighting variations. Since real time keying algorithms that separate blue or green colors from the rest of an image are extremely sensitive to lighting variations within those images, it is advantageous to remove those lighting variations from the live action image before attempting the keying process. This process is disclosed in U.S. Pat. No. 7,999,862. Wall renderer 410 uses the current position tracking data 215 to generate a matched clean wall view 420 of the real world walls 100 from the same point of view that the HMD 210 is presently viewing those same walls 100. In this way, the appearance of the walls 100 without any moving subjects 200 in front of them is known, which is useful for making keying an automated process. This matched clean wall view 420 is then passed to the lighting variation removal stage 430.
[0075] As previously noted, HMD 210 contains front facing cameras 212 connected via a low-latency data connection to the eye displays in HMD 210. This low latency connection is important to users being able to use HMD 210 without feeling ill, as the real world representation needs to pass through to user 200's eyes with absolute minimum latency. However, this low latency requirement can drive the constraints on image processing in unusual ways. As previously noted, the algorithms used for blue and green screen removal are sensitive to lighting variations, and so typically require modifying their parameters on a per-shot basis in traditional film and television VFX production. However, as the user 200 is rapidly moving his head around, and walking around multiple walls 100, the keying process must become more automated. By removing the lighting variations from the front facing camera image 213, it becomes possible to cleanly replace the physical appearance of the blue or green walls 100 and floor 110, and rapidly and automatically provide a high quality, seamless transition between the virtual environment and the real world environment for the user 200.
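The lighting variation removal can be illustrated with a simplified flat-field style correction: scale each pixel by the ratio of a uniform reference backing color to the matched clean wall view, so the unevenly lit backing becomes uniform before keying. This sketch is a crude stand-in; the interpolation algorithm actually cited (U.S. Pat. No. 7,999,862) differs in its details.

```python
import numpy as np

def even_out_backing(live, clean, reference, eps=1e-6):
    """Scale live pixels so the clean wall plate maps to a uniform
    reference color, evening out backing lighting variations."""
    gain = reference / np.maximum(clean, eps)
    return np.clip(live * gain, 0.0, 1.0)
```

A correction of this kind also scales foreground pixels, which is one reason the cited interpolation method is preferred in practice.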
[0076] This is achieved with the following steps, which can take place on portable computer 230 or, for example, inside very low latency circuitry in HMD 210. The front facing camera image 213 along with the matched clean wall view 420 are passed to the lighting variation removal processor 430. This lighting variation removal uses a simple algorithm to combine the clean wall view 420 with the live action image 213 in a way that reduces or eliminates the lighting variations in the blue or green background walls 100, without affecting the non-blue and non-green portions of the image. This can be achieved by a simple interpolation algorithm, described in U.S. Pat. No. 7,999,862, that can be implemented on the low latency circuitry in HMD 210. This results in evened camera image 440, which has had the variations in the blue or green background substantially removed. Evened camera image 440 is then passed to low latency keyer 450. Low latency keyer 450 can use a simple, high speed algorithm such as a color difference method to remove the blue or green elements from the scene, and create keyed image 452. The color difference method is well known to practitioners in the field. Since the evened camera image 440 has little or no variation in the blue or green background lighting, keyed image 452 can be high quality with little or no readjustment of keying parameters required as user 200 moves around the simulation area and sees different walls 100 with different lighting conditions.
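The color difference and despill operations can be sketched as below for a green backing (swap channels for blue). This is a minimal textbook form of the color difference method, not the exact production implementation.

```python
import numpy as np

def color_difference_key(img):
    """img: float RGB in [0, 1], green-backed. Returns (despilled, alpha)
    where alpha = 1 keeps the live action pixel and alpha = 0 removes it."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # color difference: the backing is where green exceeds red and blue
    alpha = np.clip(1.0 - (g - np.maximum(r, b)), 0.0, 1.0)
    despilled = img.copy()
    # despill: clamp green to max(red, blue) to suppress green fringes
    despilled[..., 1] = np.minimum(g, np.maximum(r, b))
    return despilled, alpha
```

Note the operation is purely per-pixel arithmetic (a subtract, a max, a clamp), which is what makes it practical in the low latency hardware discussed below.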
[0077] Keyed image 452 is then sent to low latency image compositor 460 along with the rendered virtual view 510. Low latency image compositor 460 can then rapidly combine keyed image 452 and rendered virtual view 510 into the final composited HMD image 211. The image combination at this point becomes very simple, as keyed image 452 already has transparency information, and the image compositing step becomes a very simple linear mix between virtual and live action based upon transparency level.
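The final mix in low latency image compositor 460 then reduces to a per-pixel linear interpolation driven by the transparency channel:

```python
import numpy as np

def composite(keyed, alpha, virtual):
    """Linear mix: alpha = 1 shows the (despilled) live action pixel,
    alpha = 0 shows the rendered virtual view."""
    a = alpha[..., None]
    return keyed * a + virtual * (1.0 - a)
```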
[0078] A perspective view of the system is illustrated in
[0079] Since the walls in this embodiment are painted a solid color to aid the keying process, it will typically be difficult to measure the actual wall using stereo depth processing methods. However, edges 104 and corners 106 typically provide areas of high contrast, even when painted a solid color, and can be used to measure the depth to the edges 104 and corners 106 of walls 100. This would be insufficient for general tracking use, as corners are not always in view. However, combined with the overall 3D tracking data 215 from self-contained tracking sensor 214, these depth measurements can be used to calculate the 3D locations of the edges 104 and corners 106 in the overall environment. Once the edges and corners of walls 100 are known in 3D space, it is straightforward to determine the color and lighting levels of walls 100 by having a user 200 move around walls 100 until their color and lighting information (as viewed through front facing cameras 212) has been captured from every angle and applied to 3D environment lighting model 400. This environment lighting model 400 is then used as described in
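Combining a stereo depth measurement of a corner with the tracking data is a single rigid-body transform. A sketch, with hypothetical naming and assuming a normalized-camera ray with z-depth:

```python
import numpy as np

def corner_world_position(T_world_cam, ray, depth):
    """Back-project a corner: `ray` is the normalized-camera direction
    [x, y, 1] through the corner pixel, `depth` its stereo z-depth."""
    p_cam = np.asarray(ray) * depth        # point in camera coordinates
    R, t = T_world_cam[:3, :3], T_world_cam[:3, 3]
    return R @ p_cam + t
```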
[0080] A view of the image before and after compositing is shown in
[0081] Another goal of the system is illustrated in
[0082] A perspective view of the present embodiment is shown in
[0083] A perspective view of the present embodiment is shown in
[0084]
[0085] A perspective view of the physical environment being set up is shown in
[0086] A block diagram showing the method of operations is shown in
[0087] Section B shows a method of generating the lighting model 400. Once the HMD 210 is tracking with respect to the overhead tracking targets 111 and the world coordinate system 122, the basic 3D geometry of the walls is established. This can be achieved either by loading a very simple geometric model of the locations of the walls 100, or determined by combining the distance measurements from stereo cameras 212 on HMD 210 to calculate the 3D positions of edges 104 and corners 106 of walls 100. Once the simplified 3D model of the walls 100 is established, user 200 moves around walls 100 so that every section of walls 100 is viewed by the cameras 212 on HMD 210. The color image data from cameras 212 is then projected onto the simplified lighting model 400, to provide an overall view of the color and lighting variations of walls 100 through the scene. Once this is complete, simple lighting model 400 is copied to the other portable computers 230 of other users.
[0088] Section C shows a method of updating the position of user 200 and hand controller 220 in the simulation. The tracking data 215 from self contained trackers 214 mounted on HMD 210 and hand controller 220 is sent to the real time 3D engine 500 running on the user's portable computer 230. The 3D engine 500 then sends position updates for the user and their hand controller over a standard wireless network to update the other user's 3D engines. The other users' 3D engines update once they receive the updated position information, and in this way all the users stay synchronized with the overall scene.
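The position updates exchanged between 3D engines can be packed compactly. The wire format below is purely illustrative (the disclosure only specifies "a standard wireless network"; the field layout is an assumption):

```python
import struct

# hypothetical message: user id, position x y z, orientation quaternion w x y z
POSE_FMT = "<B7f"

def pack_pose(user_id, position, quaternion):
    """Serialize one pose update into a 29-byte little-endian message."""
    return struct.pack(POSE_FMT, user_id, *position, *quaternion)

def unpack_pose(payload):
    """Recover (user_id, position, quaternion) from a packed message."""
    vals = struct.unpack(POSE_FMT, payload)
    return vals[0], vals[1:4], vals[4:8]
```

A small fixed-size message like this keeps per-user network traffic low, consistent with the claim that a standard gaming LAN can carry dozens or hundreds of users.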
[0089] A similar method is shown in Section D for the updates of moving scene objects. The tracking data 215 is sent to a local portable computer 230 running a build of the 3D engine 500, so that the position of the moving scene object 140 is updated in the 3D engine 500. 3D engine 500 then transmits the updated object position on a regular basis to the other 3D engines 500 used by other players, so the same virtual object motion is perceived by each player.
[0090] In an alternative embodiment, the depth information from the stereo cameras 212 can be used as part of the keying process, either by occluding portions of the live action scene behind virtual objects as specified by their distance from the user, or by using depth keying instead of the blue or green screen keying process as a means to separate the live action player in the foreground from the background walls. There are multiple techniques to get a clean key, some of which do not involve a green screen, such as difference matting, so other technologies to separate the foreground players from the background walls can also be used.
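The difference matting alternative mentioned above compares the live image against the clean plate directly; a minimal per-pixel sketch (the threshold value is an assumption for illustration):

```python
import numpy as np

def difference_matte(live, clean, threshold=0.1):
    """Foreground mask: 1 where the live image departs from the clean
    plate by more than `threshold` in any channel, else 0 (background)."""
    diff = np.abs(live - clean).max(axis=-1)
    return (diff > threshold).astype(float)
```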
[0091] Thus, systems of the present disclosure can have many unique advantages such as those discussed immediately below. Since each tracking sensor 214 is self-contained and connected to an individual portable computer 230, the system can scale to very large numbers of users (dozens or hundreds) in a single location, without compromising overall tracking or system stability. In addition, since each tracking sensor 214 has an upward facing camera 216 viewing tracking targets 111, many users can be very close together without compromising the tracking performance of the system for any individual user. This is important for many simulations like group or team scenarios. Since the portable computers 230 are running standard 3D engines 500 which already have high speed communication over standard WiFi-type connections, the system scales in the same way that a standard gaming local area network scales, which can handle dozens or hundreds of users with existing 3D engine technology that is well understood by practitioners in the art.
[0092] The use of a low latency, real time keying algorithm enables a rapid separation between which portions of the scene are desired to be normally visible, and which portions of the scene will be replaced by CGI. Since this process can be driven by the application of a specific paint color, virtual and real world objects can be combined by simply painting one part of the real world object the keyed color. In addition, due to the upward-facing tracking camera and use of overhead tracking targets, the system can easily track even when surrounded by high walls painted a single uniform color, which would make traditional motion capture technologies and most other VR tracking technologies fail. The green walls can be aligned with the CGI versions of these walls, so that players can move through rooms and into buildings in a realistic manner, with a physical green wall transformed into a visually textured wall that can still be leaned against or looked around.
[0093] The keying algorithm can be implemented to work at high speed in the type of low latency hardware found in modern head mounted displays. This makes it possible for users to see their teammates and any other scene features not painted the keying color as they would normally appear, making it possible to instantly read each other's body language and motions, and enhancing the value of team or group scenarios. In addition, using the depth sensing capability of the multiple front facing cameras 212, a simplified 3D model of the walls 100 that has all of the color and lighting variations can be captured. This simple 3D lighting model can then be used to create a clean wall image of what a given portion of the walls 100 would look like without anyone in front of them, which is an important element to automated creation of high quality real time keying. It is also possible to track the users' finger position based on the HMD position and the depth sensing of the front facing cameras, and calculate whether the user's hand has intersected a virtual control switch in the simulation.
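The hand-intersection test at the end of the paragraph reduces, in the simplest case, to an axis-aligned bounds check. This is a hypothetical simplification; real virtual switches may require oriented boxes or meshes.

```python
import numpy as np

def finger_hits_switch(finger, box_min, box_max):
    """True when the tracked fingertip (world coords) lies inside the
    virtual switch's axis-aligned bounding box."""
    p = np.asarray(finger)
    return bool(np.all(p >= box_min) and np.all(p <= box_max))
```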
[0094] A third person spectator VR system can also be easily integrated into the overall whole, so that the performance of the users while integrated into the virtual scene can be easily witnessed by an external audience for entertainment or analysis. In addition, it is straightforward to add the use of moving tracked virtual obstacles, whose positions are updated in real time across all of the users in the simulation. The same methods can be used to overlay the visual appearance of the user's hand controller, showing an elaborate weapon or control in place of a more pedestrian controller. Finally, a projected blueprint 123 can be generated on the floor 110 of the system, enabling rapid alignment of physical walls 100 with their virtual counterparts.
[0095] In an alternative embodiment, the walls 100 can be visually mapped even if they are not painted a blue or green, to provide a difference key method to remove the background without needing a blue or green component.
SUMMARIES OF SELECTED ASPECTS OF THE DISCLOSURE
[0096] 1. A team augmented reality system that uses self-contained tracking systems with an upward-facing tracking sensor to track the positions of large numbers of simultaneous users in a space.
[0097] The system uses an upward-facing tracking sensor to detect overhead tracking markers, thus making it unaffected by objects near the user, including large numbers of other users or high walls that are painted a single color. Since the tracking system is contained with the user, and does not have any dependencies on other users, the tracked space can be very large (50 m × 50 m × 10 m) and the number of simultaneous users in a space can be very large without overloading the system. This is required to achieve realistic simulation scenarios with large numbers of participants.
2. A HMD with a low latency keying algorithm to provide a means to seamlessly mix virtual and live action environments and objects.
[0098] The use of a keying algorithm enables a rapid, simple way of determining which components of the environment are to be passed through optically to the end user, and which components are to be replaced by virtual elements. This means that simulations can freely mix and match virtual and real components to best fit the needs of the game or simulation, and the system will automatically handle the transitions between the two worlds.
3. A team augmented reality system that lets users see all the movements of the other members of their group and objects not the keyed color.
[0099] Further to #1 above, a player can see his other teammates automatically in the scene, as they are not painted green. The system includes the ability to automatically transition between the virtual and real worlds with a simple, inexpensive, easy to apply coat of paint.
4. A team augmented reality system that uses depth information to generate a 3D textured model of the physical surroundings, so that the background color and lighting variations can be rapidly removed to improve the real time keying results.
[0100] The success or failure of the keying algorithms depends on the lighting of the green or blue walls. If the walls have a lot of uneven lighting and the keying algorithm cannot compensate for this, the key may not be very good, and the illusion of a seamless transition from live action to virtual will be compromised. However, automatically building the lighting map of the blue or green background environment solves this problem automatically, so that the illusion works no matter which direction the user aims his head.
[0101] 5. A team augmented reality system that can incorporate a third person spectator AR system for third person viewing of the team immersed in their environment.
[0102] The ability to see how a team interacts is key to some of the educational, industrial and military applications of this technology. The system includes the common tracking origin made possible by the use of the same overhead tracking technology for the users as for the spectator VR camera. It also means that the camera operator can follow the users and track wherever they will go inside the virtual environment.
6. A team augmented reality system that can project a virtual blueprint in the displays of users, so that the physical environment can be rapidly set up to match the virtual generated environment.
[0103] This system feature helps set up the environments; otherwise it is prohibitively difficult to align everything correctly between the virtual world and the live action world.
[0104] Although the inventions disclosed herein have been described in terms of preferred embodiments, numerous modifications and/or additions to these embodiments would be readily apparent to one skilled in the art. The embodiments can be defined, for example, as methods carried out by any one, any subset of or all of the components as a system of one or more components in a certain structural and/or functional relationship; as methods of making, installing and assembling; as methods of using; methods of commercializing; as methods of making and using the units; as kits of the different components; as an entire assembled workable system; and/or as sub-assemblies or sub-methods. The scope further includes apparatus embodiments/claims versions of method claims and method embodiments/claims versions of apparatus claims. It is intended that the scope of the present inventions extend to all such modifications and/or additions.