Patent classifications
G06T7/596
Recognition of activity in a video image sequence using depth information
Techniques are provided for recognition of activity in a sequence of video image frames that include depth information. A methodology embodying the techniques includes segmenting each of the received image frames into multiple windows and generating spatio-temporal image cells from groupings of windows from a selected sub-sequence of the frames. The method also includes calculating a four-dimensional (4D) optical flow vector for each of the pixels of each of the image cells and calculating a three-dimensional (3D) angular representation from each of the optical flow vectors. The method further includes generating a classification feature for each of the image cells based on a histogram of the 3D angular representations of the pixels in that image cell. The classification features are then provided to a recognition classifier configured to recognize the type of activity depicted in the video sequence based on the generated classification features.
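The per-cell feature described above can be sketched as follows. This is a minimal illustration, not the patented method: it assumes hyperspherical angles as the 3D angular representation (the abstract does not fix a parameterization) and arbitrary bin counts.

```python
import numpy as np

def angular_feature(flow4d, bins=(4, 4, 8)):
    """Convert the 4D optical-flow vectors of one spatio-temporal cell
    into a normalized histogram of 3D angles, usable as a
    classification feature.

    flow4d : (N, 4) array, one flow vector per pixel of the cell.
    """
    x1, x2, x3, x4 = flow4d.T
    r = np.linalg.norm(flow4d, axis=1) + 1e-12
    r3 = np.linalg.norm(flow4d[:, :3], axis=1) + 1e-12
    # Three hyperspherical angles of each 4D vector (one possible
    # 3D angular representation).
    a1 = np.arccos(np.clip(x4 / r, -1.0, 1.0))    # [0, pi]
    a2 = np.arccos(np.clip(x3 / r3, -1.0, 1.0))   # [0, pi]
    a3 = np.arctan2(x2, x1)                       # [-pi, pi]
    hist, _ = np.histogramdd(
        np.stack([a1, a2, a3], axis=1),
        bins=bins,
        range=[(0, np.pi), (0, np.pi), (-np.pi, np.pi)],
    )
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-12)

# Example: random flow vectors standing in for one image cell.
rng = np.random.default_rng(0)
feature = angular_feature(rng.normal(size=(500, 4)))
```

The flattened, normalized histograms from all cells would then be concatenated and fed to the recognition classifier.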
Method for Determining Distance Information from Images of a Spatial Region
A method includes defining a disparity range having discrete disparities and taking first, second, and third images of a spatial region using first, second, and third imaging units. The imaging units are arranged in an isosceles triangle geometry. The method includes determining first similarity values for a pixel of the first image for all the discrete disparities along a first epipolar line associated with the pixel in the second image. The method includes determining second similarity values for the pixel for all discrete disparities along a second epipolar line associated with the pixel in the third image. The method includes combining the first and second similarity values and determining a common disparity based on the combined similarity values. The method includes determining a distance to a point within the spatial region for the pixel from the common disparity and the isosceles triangle geometry.
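A toy version of the combined-cost disparity search might look like the sketch below. It is illustrative only: SAD over a small window stands in for the unspecified similarity measure, and horizontal/vertical epipolar lines are a simplification of the isosceles-triangle camera geometry.

```python
import numpy as np

def common_disparity(img1, img2, img3, px, py, d_max=8, win=2):
    """For one pixel of the first image, accumulate a matching cost at
    every discrete disparity along the epipolar line in the second
    image and along the one in the third image, then return the common
    disparity minimizing the combined cost."""
    def sad(other, ox, oy):
        a = img1[py - win:py + win + 1, px - win:px + win + 1]
        b = other[oy - win:oy + win + 1, ox - win:ox + win + 1]
        return float(np.abs(a - b).sum())

    costs = []
    for d in range(d_max):
        if px - d - win < 0 or py - d - win < 0:
            costs.append(np.inf)          # disparity leaves the image
            continue
        c1 = sad(img2, px - d, py)        # first epipolar line (horizontal)
        c2 = sad(img3, px, py - d)        # second epipolar line (vertical)
        costs.append(c1 + c2)             # combined similarity values
    return int(np.argmin(costs))

# Synthetic check: shift a random image by 3 px to fake the side cameras.
rng = np.random.default_rng(0)
img1 = rng.random((32, 32))
img2 = np.roll(img1, -3, axis=1)          # img2[y, x] == img1[y, x + 3]
img3 = np.roll(img1, -3, axis=0)
d = common_disparity(img1, img2, img3, px=16, py=16)  # 3
```

The distance for the pixel would then follow from the common disparity and the triangle geometry, e.g. z = f·b/d for focal length f and baseline b.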
POINT DEPTH ESTIMATION FROM A SET OF 3D-REGISTERED IMAGES
Embodiments provide 3D coordinates, at the correct physical position, for points in a scene observed in a series of images. A method may comprise obtaining a plurality of images, including a base image having at least one annotated point corresponding to a point of an object shown in the base image, and a plurality of side images showing the object from viewpoints different from that of the base image, wherein the side images are given with their camera poses relative to the base image. The method may further comprise extracting, from at least some of the side images, image patches showing the annotated point, wherein a plurality of sets of image patches are extracted, one set at each of a plurality of candidate depth values; classifying each set as having a candidate depth value that is correct or incorrect; and outputting a correct depth value.
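The classify-per-candidate-depth loop can be illustrated with a toy sketch. The abstract leaves the classifier open, so photo-consistency (negative variance across the patch set) is used here purely as a stand-in; `patch_consistency` and `classify_depth` are illustrative names, not from the patent.

```python
import numpy as np

def patch_consistency(patches):
    """Stand-in classifier score: patches that all show the annotated
    point at the right depth should agree, so score a set by the
    negative variance across its patches."""
    stack = np.stack(patches)
    return -float(stack.var(axis=0).mean())

def classify_depth(candidate_patch_sets):
    """candidate_patch_sets: {depth: [patch, ...]} -- one set of side-
    image patches extracted per candidate depth value. Returns the
    depth whose set is classified as correct (most consistent)."""
    return max(candidate_patch_sets,
               key=lambda z: patch_consistency(candidate_patch_sets[z]))

# Toy example: only at depth 2.0 do the side-image patches agree.
rng = np.random.default_rng(1)
base = rng.random((5, 5))
sets = {
    1.0: [rng.random((5, 5)) for _ in range(3)],             # mismatched
    2.0: [base + 0.01 * rng.random((5, 5)) for _ in range(3)],  # correct
    3.0: [rng.random((5, 5)) for _ in range(3)],             # mismatched
}
best = classify_depth(sets)  # 2.0
```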
Image capturing apparatus, monitoring system, image processing apparatus, image capturing method, and non-transitory computer readable recording medium
There is provided an image capturing apparatus that captures a plurality of images, calculates a three-dimensional position from the plurality of images, and outputs the plurality of images and information about the three-dimensional position. The image capturing apparatus includes an image capturing unit, a camera parameter storage unit, a position calculation unit, a position selection unit, and an image complementing unit. The image capturing unit outputs the plurality of images using at least three cameras. The camera parameter storage unit stores in advance camera parameters including occlusion information. The position calculation unit calculates three-dimensional positions of a plurality of points. The position selection unit selects a piece of position information relating to a subject area that does not have an occlusion, and outputs selected position information. The image complementing unit generates a complementary image, and outputs the complementary image and the selected position information.
Gesture operation method based on depth values and system thereof
A gesture operation method based on depth values, and a system thereof, are disclosed. A stereoscopic-image camera module acquires a first stereoscopic image. An algorithm is then performed to determine whether the first stereoscopic image includes a triggering gesture. The stereoscopic-image camera module then acquires a second stereoscopic image, and another algorithm is performed to determine whether the second stereoscopic image includes a command gesture, in which case the operation corresponding to the command gesture is performed.
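The trigger-then-command flow amounts to a two-state machine. A minimal sketch, assuming the two gesture classifiers are supplied externally (the abstract does not specify them; all names here are illustrative):

```python
class GestureSession:
    """Two-stage gesture flow: a triggering gesture in a first
    stereoscopic image arms the session; a command gesture in a later
    stereoscopic image then yields the operation to perform."""

    def __init__(self, is_trigger, classify_command):
        self.is_trigger = is_trigger            # stereo image -> bool
        self.classify_command = classify_command  # stereo image -> command
        self.armed = False

    def process(self, stereo_image):
        if not self.armed:
            if self.is_trigger(stereo_image):
                self.armed = True               # triggering gesture seen
            return None
        command = self.classify_command(stereo_image)
        self.armed = False                      # require a fresh trigger
        return command                          # caller runs the operation

# Toy classifiers standing in for the depth-based algorithms.
session = GestureSession(lambda img: img == "open_palm",
                         lambda img: "zoom" if img == "pinch" else None)
```

Keeping the trigger separate from the command reduces accidental activations, which is the usual rationale for such two-stage designs.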
Modular imaging system
An example modular imaging system may include at least a modular imaging enclosure and a power supply. The modular imaging enclosure may include multiple panels coupled together to define an interior space of the modular imaging enclosure where an object may be placed in order to obtain image data relating to the object. Individual ones of the panels may include imaging panels having one or more light emitting elements, an array of cameras, and a panel computer. The power supply can supply power to these electronic components so they can operate to capture images of an object in appropriate lighting. The modular nature of the imaging system allows for easy packaging and shipment, as well as easy setup and teardown, making the system readily transportable to the location of the object or objects to be imaged.
PERIMETER ESTIMATION FROM POSED MONOCULAR VIDEO
Techniques for estimating a perimeter of a room environment at least partially enclosed by a set of adjoining walls using posed images are disclosed. A set of images and a set of poses are obtained. A depth map is generated based on the set of images and the set of poses. A set of wall segmentation maps are generated based on the set of images, each of the set of wall segmentation maps indicating a target region of a corresponding image that contains the set of adjoining walls. A point cloud is generated based on the depth map and the set of wall segmentation maps, the point cloud including a plurality of points that are sampled along portions of the depth map that align with the target region. The perimeter of the environment along the set of adjoining walls is estimated based on the point cloud.
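The point-cloud sampling step described above can be sketched as a masked unprojection. This is a schematic, assuming a standard pinhole model and a 4×4 world-from-camera pose matrix; `wall_points` and its parameters are illustrative, not from the patent.

```python
import numpy as np

def wall_points(depth, wall_mask, K, pose, stride=4):
    """Sample points along the portions of a depth map that align with
    the wall segmentation map's target region, lifting them into world
    coordinates with the image's pose. Perimeter estimation would then
    run on the resulting cloud."""
    ys, xs = np.nonzero(wall_mask)
    ys, xs = ys[::stride], xs[::stride]           # sparse sampling
    z = depth[ys, xs]
    valid = z > 0
    ys, xs, z = ys[valid], xs[valid], z[valid]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts = np.stack([(xs - cx) * z / fx,           # pinhole unprojection
                    (ys - cy) * z / fy,
                    z,
                    np.ones_like(z)])
    return (pose @ pts)[:3].T                     # (N, 3) wall points

# Toy frame: a flat wall 2 m away, every pixel segmented as wall.
K = np.array([[100.0, 0, 16], [0, 100.0, 16], [0, 0, 1]])
cloud = wall_points(np.full((32, 32), 2.0),
                    np.ones((32, 32), bool), K, np.eye(4))
```

Accumulating such points over the whole set of posed images yields the cloud from which the perimeter along the adjoining walls is estimated.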
Dynamic lighting for objects in images
Techniques and systems are described herein for determining dynamic lighting for objects in images. Using such techniques and systems, a lighting condition of one or more captured images can be adjusted. Techniques and systems are also described herein for determining depth values for one or more objects in an image. In some cases, the depth values (and the lighting values) can be determined using only a single camera and a single image, in which case no depth sensor is needed to produce the depth values.
IMAGE PROCESSING APPARATUS, SYSTEM THAT GENERATES VIRTUAL VIEWPOINT VIDEO IMAGE, CONTROL METHOD OF IMAGE PROCESSING APPARATUS AND STORAGE MEDIUM
To prevent an object that should exist in a virtual viewpoint video image from disappearing, the image processing apparatus generates three-dimensional shape data on a moving object from images captured from a plurality of viewpoints and outputs the data to the apparatus that generates a virtual viewpoint video image. Then, in a case where it is not possible to generate three-dimensional shape data on an object that behaves as the moving object during part of the image capturing period, three-dimensional shape data on the object generated in the past is output to that apparatus instead.
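The fall-back behavior is essentially a last-known-good cache keyed by object. A minimal sketch, assuming a per-frame shape generator that returns `None` on failure (all names here are illustrative):

```python
class ShapeDataProvider:
    """Outputs per-frame 3D shape data for a moving object; when shape
    generation fails for part of the capture period, falls back to the
    most recently generated shape so the object does not disappear from
    the virtual viewpoint video image."""

    def __init__(self, generator):
        self.generate = generator       # frame -> shape data, or None
        self.last_shape = {}            # object_id -> last good shape

    def shape_for(self, object_id, frame):
        shape = self.generate(frame)
        if shape is not None:
            self.last_shape[object_id] = shape   # remember for later
            return shape
        # Generation failed: reuse shape data generated in the past.
        return self.last_shape.get(object_id)

# Toy generator: shape data is available only for frame 0.
shapes = {0: "mesh_t0", 1: None, 2: None}
provider = ShapeDataProvider(lambda f: shapes[f])
```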
Game engine responsive to motion-capture data for mixed-reality environments
An immersive content presentation system can capture the motion or position of a performer in a real-world environment. A game engine can be modified to receive the position or motion of the performer and identify predetermined gestures or positions that can be used to trigger actions in a 3-D virtual environment, such as generating a digital effect, transitioning virtual assets through an animation graph, adding new objects, and so forth. Views of the 3-D environment can be rendered, and composited views can be generated. Information for constructing the composited views can be streamed to numerous display devices in many different physical locations using a customized communication protocol. Multiple real-world performers can interact with virtual objects through the game engine in a shared mixed-reality experience.