G06V10/806

MEDIA PROCESSING METHOD, RELATED APPARATUS, AND STORAGE MEDIUM

Provided is a video processing method, including: obtaining a to-be-processed video and generating a first gait energy diagram, the to-be-processed video including an object with a to-be-recognized identity; obtaining a second gait energy diagram, the second gait energy diagram being generated based on a video including an object with a known identity; inputting the first gait energy diagram and the second gait energy diagram into a deep neural network; extracting respective identity information of the first gait energy diagram and the second gait energy diagram, and determining a fused gait feature vector from gait feature vectors of the first gait energy diagram and the second gait energy diagram; and calculating a similarity based on at least the fused gait feature vector. The identity information of the first gait energy diagram includes gait feature vectors, and the identity information of the second gait energy diagram includes gait feature vectors.
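As a rough illustration of the first step only, the sketch below assumes the gait energy diagram follows the common gait energy image construction, i.e. a pixel-wise average of aligned binary silhouette frames. The function name `gait_energy_image` and the nested-list frame format are hypothetical; the feature extraction, fusion, and similarity stages performed by the deep neural network are not modeled here.

```python
def gait_energy_image(silhouettes):
    """Average equally sized binary silhouette frames pixel by pixel.

    Assumption: frames are already cropped and aligned, and each frame is a
    list of rows holding 0 (background) or 1 (body) values.
    """
    n = len(silhouettes)
    rows = len(silhouettes[0])
    cols = len(silhouettes[0][0])
    return [
        [sum(frame[r][c] for frame in silhouettes) / n for c in range(cols)]
        for r in range(rows)
    ]


# Two tiny 2x2 frames stand in for silhouettes extracted from a video.
frames = [
    [[0, 1], [1, 1]],
    [[0, 0], [1, 1]],
]
gei = gait_energy_image(frames)
```

Pixels that are body in every frame approach 1, while pixels that move in and out of the silhouette take intermediate values, so the single image summarizes both shape and motion.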

Multi-view vector processing method and multi-view vector processing device
10796205 · 2020-10-06 ·

A multi-view vector processing method and a multi-view vector processing device are provided. A multi-view vector x represents an object containing information on at least two non-discrete views. A model of the multi-view vector is established, where the model includes at least the following components: a population mean of the multi-view vector, a view component for each view of the multi-view vector, and noise. The population mean, the parameters of each view component, and the parameters of the noise are obtained by using training data of the multi-view vector x. The device includes a processor and a storage medium storing program codes, and the program codes implement the aforementioned method when executed by the processor.

Perceptual data association

Embodiments provide for perceptual data association from at least a first and a second sensor disposed at different positions in an environment, in respective series of local scene graphs that identify characteristics of objects in the environment that are updated asynchronously and merging the series of local scene graphs to form a coherent image of the environment from multiple perspectives.

PERCEPTUAL DATA ASSOCIATION
20200311462 · 2020-10-01 ·

Embodiments provide for perceptual data association from at least a first and a second sensor disposed at different positions in an environment, in respective series of local scene graphs that identify characteristics of objects in the environment that are updated asynchronously and merging the series of local scene graphs to form a coherent image of the environment from multiple perspectives.

GENERATING MULTI MODAL IMAGE REPRESENTATION FOR AN IMAGE
20200311467 · 2020-10-01 ·

Technologies for generating a multi-modal representation of an image based on the image content are provided. The disclosed techniques include receiving an image, to be classified, that comprises one or more embedded text characters. The one or more embedded text characters are identified from the image and a first machine learning model is used to generate a text vector that represents a numerical representation of the one or more embedded text characters. A second machine learning model is used to generate an image vector that represents a numerical representation of the graphical portion of the image. The text vector and the image vector are used as input to generate a multi-modal vector that contains information from both the text vector and the image vector. The image may be classified into one of a plurality of image classifications based upon the information in the multi-modal vector.
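The fusion and classification steps can be pictured with the minimal sketch below. It assumes the multi-modal vector is formed by simple concatenation and that classification is a linear scoring over that vector; both choices, along with the names `fuse_vectors`, `classify`, and the example weights, are hypothetical stand-ins and are not taken from the abstract.

```python
def fuse_vectors(text_vec, image_vec):
    """Hypothetical fusion: concatenate the text and image vectors."""
    return list(text_vec) + list(image_vec)


def classify(multi_modal_vec, class_weights):
    """Score each class with a dot product and return the best label.

    class_weights maps a label to a weight list of the same length as the
    multi-modal vector (an assumed linear classifier, for illustration).
    """
    scores = {
        label: sum(w * x for w, x in zip(weights, multi_modal_vec))
        for label, weights in class_weights.items()
    }
    return max(scores, key=scores.get)


text_vec = [1.0, 0.0]    # stand-in for the first model's output
image_vec = [0.0, 2.0]   # stand-in for the second model's output
fused = fuse_vectors(text_vec, image_vec)
label = classify(fused, {"meme": [1, 0, 0, 0], "photo": [0, 0, 0, 1]})
```

Concatenation keeps both modalities intact so that the downstream classifier can weigh text and image evidence independently.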

Information processing device, learned model, information processing method, and computer program product

According to an embodiment, an information processing device includes one or more processors. The one or more processors are configured to acquire a map in which, for each of grids in a particular space, observation information representing object information on an object or the observation information representing non-observation information on non-observation of the object is correlated; and correct, for each of the grids, correlation of the observation information by using a learned model based on the observation information correlated with other peripheral grids.

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM
20240013407 · 2024-01-11 ·

An information processing apparatus comprises a first computation unit configured to obtain first features of an image of a tracking target, a second computation unit configured to obtain second features of an image of a search region, a third computation unit configured to obtain an inference tensor representing likelihoods that the tracking target is present at respective positions of the image of the search region, using the first features and the second features, and a fourth computation unit configured to obtain an inference map representing a position of the tracking target in the image of the search region, using the inference tensor.
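One way to picture the last two computation units is the sketch below. It assumes, purely for illustration, that the likelihoods come from sliding the target features over the search-region features as a single-channel cross-correlation, and that the position is read off as the peak of the resulting score map. The abstract does not state how the inference tensor is computed, and the names `correlation_map` and `peak_position` are hypothetical.

```python
def correlation_map(template, search):
    """Slide the template over the search features and sum the products.

    Both inputs are 2D lists; the output has one score per valid offset.
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    scores = []
    for r in range(sh - th + 1):
        row = []
        for c in range(sw - tw + 1):
            row.append(sum(
                template[i][j] * search[r + i][c + j]
                for i in range(th) for j in range(tw)
            ))
        scores.append(row)
    return scores


def peak_position(score_map):
    """Return the (row, col) offset holding the highest score."""
    best = max(
        (value, r, c)
        for r, row in enumerate(score_map)
        for c, value in enumerate(row)
    )
    return best[1], best[2]


template = [[1, 1], [1, 1]]      # stand-in for the first features
search = [                       # stand-in for the second features
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
scores = correlation_map(template, search)
position = peak_position(scores)
```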

Detecting Boxes

A method for detecting boxes includes receiving a plurality of image frame pairs for an area of interest including at least one target box. Each image frame pair includes a monocular image frame and a respective depth image frame. For each image frame pair, the method includes determining corners for a rectangle associated with the at least one target box within the respective monocular image frame. Based on the determined corners, the method includes the following: performing edge detection and determining faces within the respective monocular image frame; and extracting planes corresponding to the at least one target box from the respective depth image frame. The method includes matching the determined faces to the extracted planes and generating a box estimation based on the determined corners, the performed edge detection, and the matched faces of the at least one target box.

SHAPE FUSION FOR IMAGE ANALYSIS

Various types of image analysis benefit from a multi-stream architecture that allows the analysis to consider shape data. A shape stream can process image data in parallel with a primary stream, where data from layers of a network in the primary stream is provided as input to a network of the shape stream. The shape data can be fused with the primary analysis data to produce more accurate output, such as to produce accurate boundary information when the shape data is used with semantic segmentation data produced by the primary stream. A gate structure can be used to connect the intermediate layers of the primary and shape streams, using higher level activations to gate lower level activations in the shape stream. Such a gate structure can help focus the shape stream on the relevant information and reduce any additional weight of the shape stream.
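The gating idea can be sketched as follows, under the assumption that the gate is a sigmoid of a weighted combination of the shape-stream and primary-stream activations at each position, and that it scales the shape activation element-wise. The scalar weights, the bias, and the function name `gate_shape_activations` are hypothetical; a real network would learn these parameters and operate on multi-channel feature maps.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def gate_shape_activations(shape_act, primary_act, w_shape, w_primary, bias):
    """Scale each shape activation by a gate computed from both streams.

    Assumed form: gate = sigmoid(w_shape * s + w_primary * p + bias).
    """
    gated = []
    for s, p in zip(shape_act, primary_act):
        gate = sigmoid(w_shape * s + w_primary * p + bias)
        gated.append(s * gate)
    return gated


# With zero evidence from the primary stream the gate sits at 0.5.
half_open = gate_shape_activations([2.0, 4.0], [0.0, 0.0], 0.0, 1.0, 0.0)
# Strong primary-stream activation opens the gate almost fully.
open_gate = gate_shape_activations([2.0], [50.0], 0.0, 1.0, 0.0)
```

Because the gate lies between 0 and 1, positions that the primary stream considers irrelevant are attenuated rather than amplified.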

SCALABLE DATA FUSION ARCHITECTURE AND RELATED PRODUCTS
20200302221 · 2020-09-24 ·

Provided are a scalable data fusion method and related products. The scalable data fusion method is applied in a central device and includes: receiving sensing data transmitted by each of M first edge devices, wherein M is an integer equal to or greater than 1; fusing the sensing data transmitted by each of the M first edge devices to obtain M pieces of fused data respectively corresponding to the M first edge devices; distributing the M pieces of fused data to the M first edge devices respectively; receiving object information transmitted by each of the M first edge devices, wherein the object information is obtained based on the fused data; and integrating the object information transmitted by each of the M first edge devices and constructing surrounding information based on the integrated object information.