MULTI-CAMERA SCENE REPRESENTATION INCLUDING STEREO VIDEO FOR VR DISPLAY
20190349561 · 2019-11-14
Inventors
CPC classification
H04N23/54
ELECTRICITY
H04N13/111
ELECTRICITY
H04N13/239
ELECTRICITY
International classification
H04N13/111
ELECTRICITY
H04N13/239
ELECTRICITY
Abstract
This invention encompasses a device capable of capturing two sets of videos or pictures, each from a slightly different perspective, and using software to combine these two sets of media into one three-dimensional image that can be shared with others. One embodiment of the invention calls for a tray with a hand grip that holds two cell phones and can adjust them to approximately an interpupillary distance apart, such that a user can take a picture with the device and have a recipient of the message view either the user, or an object the user is pointing the device at, in three-dimensional view. The software also has image recognition abilities, such that it can build a three-dimensional environment from the one-sided capture of an image and then pull data from an image recognition database to complete a three-dimensional representation of the object. Dual Bluetooth with close-range detection for shutter control was developed and tested successfully.
Claims
1. An apparatus for capturing stereo images comprising a mount, where the mount comprises a first rack and a second rack, where each rack comprises a retention mechanism; and a first camera and a second camera, where the first camera is secured to the first rack by the retention mechanism of the first rack, and where the second camera is secured to the second rack by the retention mechanism of the second rack, where each camera comprises a lens, where the lenses are a certain distance from each other.
2. The apparatus of claim 1, wherein a distance between each rack is adjustable thereby adjusting the distance between each lens.
3. The apparatus of claim 2, wherein the distance between each lens is between 65 mm and 130 mm, inclusive.
4. The apparatus of claim 1, wherein the lenses are at a distance from each other of between 60 mm and 70 mm, inclusive.
5. The apparatus of claim 1, wherein the first camera is a first mobile phone and the second camera is a second mobile phone.
6. The apparatus of claim 1, wherein the first camera is a first 360 degree camera, and where the second camera is a second 360 degree camera.
7. The apparatus of claim 1, wherein each rack has a separate rotational degree of freedom allowing each rack to rotate independently.
8. A system for generating stereo image or video files from separate capture sources comprising a capture system having a first camera and a second camera, where the capture system generates a first media and a second media, where the first media comprises image or video data from the first camera, and where the second media comprises image or video data from the second camera; a computer system comprising one or more processors executing programming logic, the programming logic configured to: recognize objects in the first media and second media and retrieve pattern recognition objects from pattern recognition databases; take the first media, the second media, and the pattern recognition objects, and create a three-dimensional representation of an entire scene, including both the side views of objects originally contained in the first media and second media as well as side views of objects created by the software from the pattern recognition objects; take both the side views of objects originally contained in the first media and second media as well as side views of objects created by the software from the pattern recognition objects, and combine these into two separate maps, one for each eye; identify discrepancies in the objects between the first media and second media, and correct these discrepancies by comparing the first media and the second media; and generate an output media having a stereo image or video file; and a display system for displaying the output media.
9. The system of claim 8, wherein the first media comprises video data, and where the programming logic is further configured to vary the frame rate of video data within the first media based upon available light.
10. The system of claim 8, wherein the programming logic is further configured to create a scene through spatial map construction using the first media from both cameras.
11. The system of claim 8, wherein the programming logic is further configured to create a virtual reality environment through backside image additions by retrieving object recognition data from one or more object recognition databases.
12. The system of claim 8, wherein each camera of the capture system is a mobile phone.
13. The system of claim 8, wherein each camera of the capture system is a 360 degree camera.
14. The system of claim 8, wherein the display system is a mobile phone.
15. The system of claim 8, wherein the display system is a set of goggles, where the goggles comprises a first display for displaying an image to a first eye of a user, and a second display for displaying an image to a second eye of a user.
16. The system of claim 8, wherein each camera of the capture system comprises a lens, wherein the lenses are at a distance from each other of between 60 mm and 70 mm, inclusive.
17. A method of providing a three-dimensional stereo viewing experience, comprising the steps of, in order: first, securing two cameras to an adjustable mount; second, capturing two sets of pictures or videos using the cameras from adjustably different perspectives; third, transmitting the two sets of pictures or videos to a processing center; fourth, identifying temporal overlap in the two sets of pictures or videos and creating a temporal overlapped series of pictures or videos; fifth, creating a single file from the temporal overlapped series of pictures or videos; sixth, forming a single image or video file; seventh, transmitting the single image or video file to a customer.
18. The method of claim 17, wherein each camera is a mobile phone.
19. The method of claim 17, wherein each camera is a 360 degree camera.
20. The method of claim 17, wherein the mount comprises a strap, band or bracket.
Description
BRIEF DESCRIPTION OF THE FIGURES
[0034] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of this invention.
DETAILED DESCRIPTION OF THE INVENTION
[0045] Many aspects of the invention can be better understood with reference to the drawings below. The components in the drawings are not necessarily drawn to scale. Instead, emphasis is placed upon clearly illustrating the components of the present invention. Moreover, like reference numerals designate corresponding parts throughout the several views in the drawings.
[0048] In the left-eye panel of the stereogram, the distance AB is the representation of the front face of the cube; in the right-eye panel there is, in addition, BC, the representation of the cube's depth, i.e., the intercept on the screen of the rays from the cameras' principal points to the back of the cube. To first order, this interval computes to a dz/z, where a is the lens separation, z is the distance to the cube, and dz is the cube's depth. (To simplify the account, the right and left screens are taken to be superimposed, as they would be in a 3D display with LCD goggles.) Hence the depth/width ratio of the cube's view, as embodied in its representation on the viewing screen, is r = (a dz/z)/dx = a/z, since dx = dz for a cube. This ratio depends solely on the distance of the target from the twin lenses and their separation, and remains constant with scale or magnification changes. The depth/width ratio of the actual object, of course, is 1.00.
[0049] This stereogram of the cube, whose depth/width ratio had been captured with recording parameters a_c and z_c and embodied in the ratio BC/AB = r_c = a_c/z_c, is now viewed by an observer with interocular separation a_o at a distance z_o. An overall scale change in BC/AB does not matter, but unless r_o = r_c, i.e., a_o/z_o = a_c/z_c, this no longer represents a cube; rather it becomes, for this observer at this distance, a configuration whose depth is R times that of a cube, where R = r_c/r_o.
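The depth distortion factor R described above can be computed directly from the capture and viewing geometry. The following sketch uses illustrative parameter values, not values taken from the specification:

```python
# Numeric sketch of the depth/width analysis above. The capture
# parameters (camera separation a_c, subject distance z_c) and the
# viewing parameters (interocular separation a_o, viewing distance z_o)
# are illustrative values chosen for this example.

def depth_width_ratio(separation_mm: float, distance_mm: float) -> float:
    """First-order depth/width ratio r = a / z of a unit cube's
    on-screen stereo representation."""
    return separation_mm / distance_mm

# Capture: lenses 65 mm apart, cube 1000 mm away.
r_c = depth_width_ratio(65.0, 1000.0)

# Viewing: observer with 65 mm interocular separation at 2000 mm.
r_o = depth_width_ratio(65.0, 2000.0)

# Distortion factor: the perceived depth is R times that of a true cube.
R = r_c / r_o
print(R)  # 2.0 -- the cube appears stretched to twice its true depth
```

At equal separations, R reduces to z_o/z_c, so doubling the viewing distance doubles the apparent depth, consistent with the discussion above.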
[0050] But we also need the ability to pan in an arbitrary direction, so we insert a flat image consistent with stereographic projection, where in Cartesian coordinates (x, y, z) on the sphere and (X, Y) on the plane, the projection and its inverse are given by the following equations:
[0051] For all points except the south pole (0, 0, -1) of the unit sphere, projection onto the equatorial plane gives

(X, Y) = (x/(1+z), y/(1+z)),

with inverse

(x, y, z) = (2X/(1+X^2+Y^2), 2Y/(1+X^2+Y^2), (1-X^2-Y^2)/(1+X^2+Y^2)). (Equation 3)
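The projection and its inverse can be verified numerically. A minimal sketch, assuming the standard stereographic projection from the south pole of the unit sphere (the equation body is not fully reproduced in this text):

```python
import math

# Numerical check of stereographic projection: projection from the
# south pole (0, 0, -1) of the unit sphere onto the equatorial plane.
# This is the standard textbook form, assumed here since the original
# equation is not fully reproduced.

def project(x: float, y: float, z: float) -> tuple:
    """Sphere -> plane; defined for all points except the south pole."""
    return x / (1.0 + z), y / (1.0 + z)

def unproject(X: float, Y: float) -> tuple:
    """Plane -> sphere (inverse projection)."""
    s = 1.0 + X * X + Y * Y
    return 2.0 * X / s, 2.0 * Y / s, (1.0 - X * X - Y * Y) / s

# Round-trip an arbitrary point on the unit sphere.
x, y, z = 0.6, 0.0, 0.8
X, Y = project(x, y, z)
back = unproject(X, Y)
assert all(math.isclose(a, b) for a, b in zip((x, y, z), back))
```

The round-trip assertion confirms that the two formulas are mutual inverses for any point on the sphere other than the pole itself.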
[0052] But we need two distinct maps to cover the disparities between left and right eyes when the gaze falls in a particular direction in perceived three-space, generating (x_2, y_2, z_2) and (X_2, Y_2) for the unique fields as above.
[0053] Further, we need to parameterize motion through the scene along various user selected paths, functionally moving the origins and requiring significant recalculation of the relative objects' shapes features and positions within the displayed video sequence for each eye.
[0054] In some embodiments, the system can function as a phone app, where users understand that image stability and interpupillary distance are critical to simple image construction directly from the video files for display by binocular viewing means.
[0056] Particular embodiments of the current disclosure have a capture apparatus that includes two mobile phones with integrated cameras, where the lenses of each camera are separated by a certain distance. Each mobile phone captures a media, whereby a first media and second media are captured by the capture apparatus. A dual phone mount may be used to secure the cameras in a fixed position with the lenses of each camera a certain distance apart.
[0057] Further embodiments of the capture apparatus include a mobile phone application that uses on-board WiFi utilities for the coordination of shutter and zoom controls and image uploads of the two phones. This approach designates one phone as a master that is in control, and the other as a slave that copies the master's settings and timing, with the master directing these activities once the photos are initiated.
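The master/slave coordination described above can be sketched with a plain TCP connection standing in for the phones' WiFi link. The message format, port number, and trigger_shutter() hook are all hypothetical; the specification does not define a wire protocol:

```python
import json
import socket

# Sketch of master/slave shutter coordination: the master copies its
# capture settings to the slave and triggers the capture. The port,
# JSON message shape, and trigger_shutter callback are assumptions
# made for this example, not details from the specification.

PORT = 5577  # arbitrary choice for this sketch

def master_send_trigger(slave_ip: str, settings: dict) -> None:
    """Master: send capture settings and the trigger command."""
    with socket.create_connection((slave_ip, PORT)) as conn:
        conn.sendall(json.dumps({"cmd": "capture", "settings": settings}).encode())

def slave_serve_once(trigger_shutter) -> dict:
    """Slave: wait for one command, apply settings, fire the shutter."""
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            msg = json.loads(conn.recv(4096).decode())
    trigger_shutter(msg["settings"])  # apply zoom/exposure, then capture
    return msg
```

A real implementation would also need timestamp exchange to bound the shutter skew between the two phones, which matters for moving subjects.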
[0058] A creation apparatus is also disclosed herein. The creation apparatus comprises a computerized system that includes machine-readable instructions on a non-transitory medium. The instructions enable the computerized system to receive a first media and a second media and convert them into a three-dimensional image, or sequence of images (video), stored as an output media. A particular embodiment disclosed herein has instructions that concatenate images together to form a three-dimensional image. For example, a first image file and a second image file, each of the same dimension, are concatenated together to form a third image file that is twice the width of the first or second image file. The raw image may be twice the width of the two individual images, but for high-resolution images, various compression algorithms may be employed to reduce the file size. The concatenated image may or may not be the full objective space of the original pictures; that is, the images may be cropped and still contain the concatenated 3D stereo image of interest. This approach is not generally automated.
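The concatenation step above can be sketched in a few lines. Here synthetic arrays stand in for decoded left and right frames:

```python
import numpy as np

# Sketch of the side-by-side concatenation described above: two
# equal-sized images are joined horizontally into a single frame twice
# the width, the layout expected by split-screen stereo displays.

def concatenate_stereo(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Join the left and right images horizontally for stereo display."""
    if left.shape != right.shape:
        raise ValueError("left and right images must share dimensions")
    return np.hstack([left, right])

# Synthetic 1080p frames standing in for the two captured media files.
left = np.zeros((1080, 1920, 3), dtype=np.uint8)
right = np.full((1080, 1920, 3), 255, dtype=np.uint8)
frame = concatenate_stereo(left, right)
print(frame.shape)  # (1080, 3840, 3): same height, twice the width
```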
[0059] The creation apparatus may further comprise computer readable instructions capable of recognizing objects within the media files created by the mobile phones. Pattern recognition objects from a pattern recognition database are used to determine objects within a media file. These objects are used to create three-dimensional representations of a scene in the media file, thereby allowing for the creation of side views of objects that were not previously visible by the particular media file. Furthermore, the recognized objects may be used to create two separate maps, one for each eye, that are used when displaying the output media file to a user.
[0060] The three-dimensional media file produced by the creation apparatus is displayed on a display apparatus. The display apparatus displays the output media file to a user such that the user perceives the output media file in three dimensions. For example, goggles with a display that matches dimensions of the output media file, and whose display segregates each half of the output media file to one eye of the user, may appropriately display the output media file to user such that the user perceives the resulting image as a three-dimensional view. A mobile phone may be used in conjunction with the goggles or as a part of the goggles.
[0061] Additionally, hardwired solutions for single button shutter triggering were identified and successfully tested.
[0065] The dual phone mount includes a retention system. The retention system has rubber bands, Velcro bands, both, or similar straps or bands that are used in conjunction with a protective spacer between the mobile phones. Alternatively, the retention system may use hook and loop fasteners, magnets, friction strips, or similar contact retention strips for restraining the mobile phones to the mount. Slotted brackets within a trough may be used to allow for variable wall spacing for phones with different thicknesses and for precise mating with the mount. The dual phone mount may also include an adjustment system. The adjustment system has a split rack, that is, two racks that are independent of each other, each capable of holding and retaining a mobile phone. Each rack of the split rack has a rotational degree of freedom allowing the axes of the cameras of the mobile phones to cross, that is, be non-parallel. This allows for close-ups with zoom. Placement of a rubber band, approximately one inch in width, around the mobile phones to provide a restorative force pulling the mobile phones together allows for smooth adjustment of the relative horizontal spacing of the mobile phones' cameras when zooming. Furthermore, the spacer is located within the band's loop to prevent additional movement after the desired setting is achieved, or to prevent the mobile phones from moving too close together. The retention system may include linear and angular measurement markings along its length to provide quantitative estimates and guidance for quick pre-set values, although often it is possible to align images visually.
[0066] Particular embodiments provide for a grip portion, where the grip portion is attached to a bracket portion, and the bracket portion comprises a front section, a back section, two end sections, a trough, and means of adjustment, where the trough comprises a cavity bounded by the front section, the back section, the two end sections, and a bottom section, where a user can place two cell phones in the trough and adjust their location relative to one another through the means of adjustment.
[0067] The dual phone mount, in particular embodiments, sets the lenses of each camera of each mobile phone between 6.5 centimeters and 13.0 centimeters apart. As discussed above, the distance between the lenses of each camera may be varied by varying the distance between the mobile phones.
[0068] WiFi links can be used for on-board processing in a phone app. This approach, when considered in the context of existing display technologies, can be used not only for virtual presence for anyone, but also as a prosthetic for visually impaired persons, as it allows for inline enhancements of brightness and contrast, potentially with edge enhancements and artificial color, for those who would like or need better views of their own environment. It will be a primary goal of this effort to provide this functionality.
[0069] Turning to
[0070] Capture configurations as shown in
[0072] Also, as shown in
[0074] A computer system for generating an output media file, in certain embodiments, includes instructions to place the left image on the left field of view, and the right image on the right field of view, in the concatenated image. The computer system then aligns objects in the respective images so that the subject elements are at approximately the same vertical height in the concatenated image, at both the top and the bottom of the subjects. This means that the apparent tops of the objects are at the same line number in the concatenated image, that the images are scaled so that the objects are of the same approximate scale, and that horizontal-vertical scale proportionality is maintained for any vertical scaling adjustment (vertical-only image stretching is not suitable and will result in adverse viewing effects). The resultant smaller image is the maximum extent of the stereo 3D effect, and any area not encompassed by the smaller image can be cropped out of the final produced image concatenation. Apparent horizontal scale should not be considered when aligning or scaling the left and right images. The concatenated image can then be moved in its entirety such that the vertical center of the image is at the vertical center of the displayed field of view. Differences in left and right image resolution after scaling may be ignored, but will result in differences in the left and right resolutions in the concatenated image.
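The vertical alignment and cropping step above can be sketched as follows. In practice the vertical offset would come from matching the subject's top edge in each image; here it is taken as a given:

```python
import numpy as np

# Sketch of the align-and-crop step described above: the right image is
# shifted vertically by dy rows relative to the left image, and both
# are cropped to the rows they share, which is "the resultant smaller
# image" that bounds the stereo 3D effect. The offset dy is assumed
# known; estimating it is outside this sketch.

def align_and_crop(left: np.ndarray, right: np.ndarray, dy: int):
    """Crop both images to their common rows after a vertical shift of
    dy pixels (dy may be negative)."""
    h = left.shape[0]
    top_l, top_r = max(0, dy), max(0, -dy)
    common = h - abs(dy)  # rows shared by both images after the shift
    return (left[top_l:top_l + common],
            right[top_r:top_r + common])

# Toy 10-row images with a 2-row vertical misalignment.
left = np.arange(10 * 4).reshape(10, 4)
right = np.arange(10 * 4).reshape(10, 4)
l, r = align_and_crop(left, right, dy=2)
print(l.shape, r.shape)  # (8, 4) (8, 4): both cropped to common rows
```

Note that, per the paragraph above, only uniform scaling is applied before this step; the crop is what removes the non-overlapping margins.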
[0075] Object recognition software exists from third parties for the purposes of completing un-captured perspective views of virtual environments, including software created by Dynamic Ventures, Inc. and Facebook, Inc.
[0076] While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the invention, which is provided to aid in understanding the features and functionality that can be included in the invention. The invention is not restricted to the illustrated example architectures or configurations, but the desired features can be implemented using a variety of alternative architectures and configurations.
[0077] Indeed, it will be apparent to one of skill in the art how alternative functional configurations can be constructed to implement the desired features of the present invention. Additionally, with regard to flow diagrams, operational descriptions and method claims, the order in which the steps are presented herein shall not mandate that various embodiments be implemented to perform the recited functionality in the same order unless the context dictates otherwise.
[0078] Although the invention is described above in terms of various exemplary embodiments and implementations, it should be understood that the various features, aspects and functionality described in one or more of the individual embodiments are not limited in their applicability to the particular embodiment with which they are described, but instead can be applied, alone or in various combinations, to one or more of the other embodiments of the invention, whether or not such embodiments are described and whether or not such features are presented as being a part of a described embodiment. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments.