Patent classifications
G06T7/596
Recognition of activity in a video image sequence using depth information
Techniques are provided for recognition of activity in a sequence of video image frames that include depth information. A methodology embodying the techniques includes segmenting each of the received image frames into multiple windows and generating spatio-temporal image cells from groupings of windows from a selected sub-sequence of the frames. The method also includes calculating a four-dimensional (4D) optical flow vector for each of the pixels of each of the image cells and calculating a three-dimensional (3D) angular representation from each of the optical flow vectors. The method further includes generating a classification feature for each of the image cells based on a histogram of the 3D angular representations of the pixels in that image cell. The classification features are then provided to a recognition classifier configured to recognize the type of activity depicted in the video sequence based on the generated classification features.
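The core feature-extraction step above (per-pixel angles binned into a per-cell histogram) can be sketched as follows. This is a minimal illustration, not the patented method: the exact angle parameterization of the 4D flow is not given in the abstract, so the sketch assumes the 3D angular representation is the set of angles between the unit flow vector and the three spatial axes, and the function name and bin count are illustrative.

```python
import numpy as np

def angular_histogram_feature(flow, n_bins=8):
    """Sketch: turn per-pixel 4D optical flow vectors (one spatio-temporal
    cell) into 3D angular representations, then histogram them into a
    single classification feature. `flow` is an (N, 4) array."""
    norms = np.linalg.norm(flow, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = flow / norms
    # Assumed 3D angular representation: angles between the unit flow
    # vector and each of the three spatial axes, each in [0, pi].
    angles = np.arccos(np.clip(unit[:, :3], -1.0, 1.0))
    hist, _ = np.histogramdd(angles, bins=n_bins, range=[(0.0, np.pi)] * 3)
    # Normalize so the feature is a distribution over angle bins.
    return hist.ravel() / max(hist.sum(), 1.0)
```

A recognition classifier (e.g. an SVM or small network) would then consume the concatenated per-cell features.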
Dense depth computations aided by sparse feature matching
A system for dense depth computation aided by sparse feature matching generates a first image using a first camera, a second image using a second camera, and a third image using a third camera. The system generates a sparse disparity map using the first image and the third image by (1) identifying a set of feature points within the first image and a set of corresponding feature points within the third image, and (2) identifying feature disparity values based on the set of feature points and the set of corresponding feature points. The system also applies the first image, the second image, and the sparse disparity map as inputs for generating a dense disparity map.
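Step (2) of the sparse-map construction can be sketched as below. This is an illustrative fragment only: it assumes the first and third cameras are rectified so that disparity is a horizontal pixel offset, and it takes the matched feature points from step (1) as given (any standard detector/matcher could supply them). The function name and NaN convention are assumptions.

```python
import numpy as np

def sparse_disparity_map(pts_first, pts_third, shape):
    """Sketch: fill a sparse disparity map from matched feature points
    between the first and third (rectified) images. Unmatched pixels
    are left as NaN."""
    disp = np.full(shape, np.nan)
    for (x1, y1), (x3, _y3) in zip(pts_first, pts_third):
        # Horizontal disparity for a rectified pair.
        disp[int(round(y1)), int(round(x1))] = x1 - x3
    return disp
```

The resulting sparse map would then be fed, together with the first and second images, into the dense-disparity stage.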
APPARATUS FOR ESTIMATING CAMERA POSE USING MULTI-VIEW IMAGE OF 2D ARRAY STRUCTURE AND METHOD USING SAME
Disclosed herein are an apparatus for estimating a camera pose using multi-view images of a 2D array structure and a method using the same. The method performed by the apparatus includes acquiring multi-view images from a 2D array camera system, forming a 2D image link structure corresponding to the multi-view images in consideration of the geometric structure of the camera system, estimating an initial camera pose based on adjacent images extracted from the 2D image link structure and pairs of corresponding feature points, and estimating a final camera pose by reconstructing a 3D structure based on the initial camera pose and performing correction so as to minimize a reprojection error of the reconstructed 3D structure.
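The correction criterion in the final step, the reprojection error, can be made concrete with a short sketch. This is standard projective geometry, not the patented optimizer: it only evaluates the error that the pose-correction step would minimize, given intrinsics K, a candidate pose (R, t), reconstructed 3D points, and their observed 2D feature locations.

```python
import numpy as np

def reprojection_error(K, R, t, points_3d, points_2d):
    """Sketch: mean pixel distance between observed feature points and
    reconstructed 3D points projected through the estimated pose."""
    cam = R @ points_3d.T + t.reshape(3, 1)  # world -> camera frame
    proj = K @ cam                           # camera -> homogeneous pixels
    proj = (proj[:2] / proj[2]).T            # perspective divide
    return np.linalg.norm(proj - points_2d, axis=1).mean()
```

A bundle-adjustment-style solver would iterate on (R, t) and the 3D structure until this quantity is minimized.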
METHODS AND APPARATUS FOR GENERATING A THREE-DIMENSIONAL RECONSTRUCTION OF AN OBJECT WITH REDUCED DISTORTION
Methods, systems, and computer readable media for generating a three-dimensional reconstruction of an object with reduced distortion are described. In some aspects, a system includes at least two image sensors, at least two projectors, and a processor. Each image sensor is configured to capture one or more images of an object. Each projector is configured to illuminate the object with an associated optical pattern and from a different perspective. The processor is configured to perform the acts of receiving, from each image sensor, for each projector, images of the object illuminated with the associated optical pattern and generating, from the received images, a three-dimensional reconstruction of the object. The three-dimensional reconstruction has reduced distortion because the received images of the object are captured while each projector illuminates the object with its associated optical pattern from a different perspective.
Multimodal foreground background segmentation
The subject disclosure is directed towards a framework that is configured to allow different background-foreground segmentation modalities to contribute towards segmentation. In one aspect, pixels are processed based upon RGB background separation, chroma keying, IR background separation, current depth versus background depth and current depth versus threshold background depth modalities. Each modality may contribute as a factor that the framework combines to determine a probability as to whether a pixel is foreground or background. The probabilities are fed into a global segmentation framework to obtain a segmented image.
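The per-pixel combination step can be sketched as follows. The abstract does not specify the combination rule, so this sketch assumes a weighted average of per-modality foreground probabilities; the function name, weighting scheme, and default weights are all illustrative, and the global segmentation framework that consumes the result is omitted.

```python
def fuse_modalities(probs, weights=None):
    """Sketch: combine per-modality foreground probabilities for one
    pixel (e.g. RGB separation, chroma keying, IR separation, depth
    tests) into a single foreground likelihood in [0, 1]."""
    if weights is None:
        weights = [1.0] * len(probs)  # assumed: equal contribution
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)
```

The fused per-pixel probabilities would then be handed to the global segmentation framework to produce the final segmented image.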
Systems and methods for generating 3D images based on fluorescent illumination
There is provided a computer implemented method for generating a three dimensional (3D) image based on fluorescent illumination, comprising: receiving in parallel by each of at least three imaging sensors positioned at a respective parallax towards an object having a plurality of regions with fluorescent illumination therein, a respective sequence of a plurality of images including fluorescent illumination of the plurality of regions, each of the plurality of images separated by an interval of time; analyzing the respective sequences, to create a volume-dataset indicative of the depth of each respective region of the plurality of regions; and generating a 3D image according to the volume-dataset.
POLKA LINES: LEARNING STRUCTURED ILLUMINATION AND RECONSTRUCTION FOR ACTIVE STEREO
The present disclosure relates generally to image processing, and more particularly, to techniques for structured illumination and reconstruction of three-dimensional (3D) images. Disclosed herein is a method to jointly learn structured illumination and reconstruction, parameterized by a diffractive optical element and a neural network in an end-to-end fashion. The disclosed approach has a differentiable image formation model for active stereo, relying on both wave and geometric optics, and a trinocular reconstruction network. The jointly optimized pattern, dubbed “Polka Lines,” together with the reconstruction network, makes accurate active-stereo depth estimates across imaging conditions. The disclosed method is validated in simulation and used with an experimental prototype, and several variants of the Polka Lines patterns specialized to the illumination conditions are demonstrated.
Method and apparatus for processing image, electronic device, and storage medium
Disclosed are a method and apparatus for processing an image, an electronic device, and a storage medium. A specific implementation comprises: acquiring a matching association relationship of feature points in each to-be-modeled image frame in a to-be-modeled image frame set, the to-be-modeled image frames in the set belonging to at least two different to-be-modeled image sequences; determining a first feature point set of each to-be-modeled image frame based on the matching association relationship, the first feature point set including first feature points, each of which matches a corresponding feature point in a to-be-modeled image frame of a different to-be-modeled image sequence; and selecting, based on the number of first feature points in the first feature point set of each to-be-modeled image frame, a to-be-modeled image frame from the to-be-modeled image frame set for three-dimensional reconstruction.
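The final selection step, picking frames by how many of their feature points match points in a different image sequence, can be sketched as below. The abstract only requires selection "based on the number" of such points, so ranking with a top-k cutoff is an assumption, as are the function and parameter names.

```python
def select_frames_for_reconstruction(frame_match_counts, top_k=5):
    """Sketch: rank to-be-modeled frames by their count of cross-sequence
    matched feature points and keep the best candidates for 3D
    reconstruction. `frame_match_counts` maps frame id -> count."""
    ranked = sorted(frame_match_counts.items(),
                    key=lambda kv: kv[1], reverse=True)
    return [frame_id for frame_id, _count in ranked[:top_k]]
```

Frames rich in cross-sequence matches tie the separate image sequences together, which is why they are preferred seeds for reconstruction.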
Streaming mixed-reality environments between multiple devices
An immersive content presentation system can capture the motion or position of a performer in a real-world environment. A game engine can be modified to receive the position or motion of the performer and identify predetermined gestures or positions that can be used to trigger actions in a 3-D virtual environment, such as generating a digital effect, transitioning virtual assets through an animation graph, adding new objects, and so forth. Views of the 3-D environment can be rendered, and composited views can be generated. Information for constructing the composited views can be streamed to numerous display devices in many different physical locations using a customized communication protocol. Multiple real-world performers can interact with virtual objects through the game engine in a shared mixed-reality experience.
Method and system for multiple stereo based depth estimation and collision warning/avoidance utilizing the same
The present teaching relates to a method, system, medium, and implementation of determining depth information in autonomous driving. Stereo images are first obtained from multiple stereo pairs selected from at least two stereo pairs. The at least two stereo pairs have stereo cameras installed with the same baseline and in the same vertical plane. Left images from the multiple stereo pairs are fused to generate a fused left image, and right images from the multiple stereo pairs are fused to generate a fused right image. Disparity is then estimated based on the fused left and right images, and depth information can be computed based on the stereo images and the disparity.
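The last step, recovering depth from disparity, follows the standard rectified-stereo relation Z = f * B / d. The sketch below shows only that conversion; the fusion of the multiple stereo pairs and the disparity estimation happen upstream, and the parameter names are illustrative.

```python
def depth_from_disparity(disparity, focal_px, baseline_m):
    """Sketch: depth (meters) from disparity (pixels) for a rectified
    stereo pair with focal length `focal_px` (pixels) and baseline
    `baseline_m` (meters), via Z = f * B / d."""
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity
```

Because the at least two stereo pairs share the same baseline and vertical plane, a single (f, B) pair suffices for the fused images, and the computed depths feed directly into collision warning/avoidance.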