G06V10/806

Video Content based on Multiple Capture Devices

Techniques for video content based on multiple capture devices are described and are implementable to enable multiple video capture devices to be utilized for a video feed. Generally, the described implementations enable video content captured by multiple video capture devices to be utilized, such as to integrate different instances of video content into a merged video content stream. In at least one implementation, this provides higher-quality video attributes than are available from a single video content source.

INFORMATION PROCESSING APPARATUS, METHOD FOR CONTROLLING THE SAME, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM
20230077498 · 2023-03-16

This disclosure provides an apparatus which comprises an acquisition unit that acquires an image; a divider that divides the image into a plurality of partial images; a converter that converts the partial images into tokens having fixed dimensional vectors; an adder that adds a class-token having a vector of the same dimension as the tokens to the tokens obtained by the converter; an encoder that updates the tokens obtained by the adder based on a relevance between the tokens to obtain final encoded representations; an acquiring unit that acquires an encoded representation corresponding to the class-token in the encoded representations obtained by the encoder as class-token encoded representations; and a combining unit that combines the class-token encoded representations to obtain a feature vector of the input image.
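The dividing, converting, and adding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed apparatus: the function names `split_into_patches` and `build_token_sequence` are hypothetical, plain flattening stands in for the converter (which would typically be a learned projection), the class-token is an all-zero placeholder, and the encoder and combining unit are omitted.

```python
def split_into_patches(image, patch):
    """Divide a 2D image (list of rows) into non-overlapping patch x patch blocks."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([row[left:left + patch] for row in image[top:top + patch]])
    return patches


def build_token_sequence(image, patch):
    """Flatten each patch into a fixed-length token and prepend a class-token."""
    tokens = [[v for row in p for v in row] for p in split_into_patches(image, patch)]
    dim = patch * patch
    class_token = [0.0] * dim  # assumption: placeholder for a learned vector
    return [class_token] + tokens


# A 4x4 image split into 2x2 patches gives four tokens plus the class-token.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
sequence = build_token_sequence(image, 2)
```

Every entry of `sequence` has the same length, which is what lets an encoder relate the class-token to the patch tokens.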

Methods and apparatus for human pose estimation from images using dynamic multi-headed convolutional attention

An apparatus for 3D human pose estimation using a dynamic multi-headed convolutional attention mechanism is presented. The apparatus contains two dynamic multi-headed convolutional attention mechanisms, one with spatial attention and another with temporal attention, and leverages the spatial attention mechanism to extract frame-wise inter-joint dependencies by analyzing sections of limbs that are related. The temporal attention mechanism extracts global inter-frame relationships by analyzing correlations between the temporal profiles of joints. The temporal profile mechanism leads to a more diverse temporal attention map while achieving substantial parameter reduction.

A MONTE CARLO RENDERING IMAGE DENOISING MODEL, METHOD AND DEVICE BASED ON GENERATIVE ADVERSARIAL NETWORK
20220335574 · 2022-10-20

The present invention discloses a denoising model of Monte Carlo rendering based on a Generative Adversarial Network (GAN) and its construction method, including: constructing a training sample and constructing a Generative Adversarial Network (GAN), including Denoising Net and Critic Net, wherein Denoising Net is used to denoise the input noise rendering image and auxiliary features, and output the denoising rendering image, and Critic Net is used to classify the input denoising rendering image and the target rendering image corresponding to the noise rendering image, and output the classification result. The training samples are used to tune the network parameters of the Generative Adversarial Network (GAN). After the tuning is completed, the denoising network determined by the network parameters is used as the Monte Carlo rendering image denoising model. A denoising method and device for the Monte Carlo rendering image are also disclosed, which can realize the denoising of noisy Monte Carlo renderings.

INSTANCE SEGMENTATION METHOD AND APPARATUS
20220335619 · 2022-10-20

An instance segmentation method and apparatus are provided. A to-be-trained segmentation network performs the following processing on each instance group that is in a sample original image and that is of pixels of a marked instance, where each instance group includes at least one marked instance, and the processing includes: predicting at least two different first basic feature maps and a first attention feature map corresponding to each first basic feature map; performing weighted processing on the at least two first basic feature maps and pixel values of respective first attention feature maps corresponding to the at least two first basic feature maps, to obtain a first feature fusion map; and training the to-be-trained segmentation network based on the first feature fusion map and the sample original image. A segmentation model can precisely determine pixels of an instance in an original image.
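The weighted processing step can be read as a pixel-wise weighted sum in which each basic feature map is scaled by its own attention map. The sketch below assumes that reading; the function name `fuse_feature_maps` and the list-of-rows representation are hypothetical, and prediction of the maps and training of the network are not modeled.

```python
def fuse_feature_maps(basic_maps, attention_maps):
    """Pixel-wise weighted sum: fused[y][x] = sum_i basic_i[y][x] * attention_i[y][x]."""
    if len(basic_maps) < 2 or len(basic_maps) != len(attention_maps):
        raise ValueError("need at least two basic maps, each with one attention map")
    height, width = len(basic_maps[0]), len(basic_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for basic, attention in zip(basic_maps, attention_maps):
        for y in range(height):
            for x in range(width):
                fused[y][x] += basic[y][x] * attention[y][x]
    return fused


# Two 1x2 basic feature maps with their attention maps (illustrative values).
basic_a, attention_a = [[1.0, 2.0]], [[0.25, 1.0]]
basic_b, attention_b = [[3.0, 4.0]], [[0.75, 0.0]]
fused = fuse_feature_maps([basic_a, basic_b], [attention_a, attention_b])
```

At the second pixel the attention for the second map is zero, so the fused value comes entirely from the first map.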

Image processing method and apparatus, computer-readable storage medium, and computer device

An image processing method is provided. The method includes obtaining at least two images, the at least two images being based on the same target object captured from different imaging angles, respectively; extracting, by using feature extraction networks included in an image processing model, target features of the at least two images, the feature extraction networks being configured to extract features of images corresponding to the different imaging angles, respectively; and determining, based on the target features, a classification result corresponding to the target object.

SYSTEMS AND METHODS FOR DETECTION OF CONCEALED THREATS

Described herein are systems for detecting a representation of an object in a radio frequency (RF) image. The system transmits one or more first RF signals toward an object, and receives one or more second RF signals, associated with the one or more transmitted RF signals, that have been reflected from the object. The system determines a plurality of first feature maps corresponding to a RF image associated with the one or more second RF signals. The system combines the plurality of first feature maps. The system further detects a representation of the object in the RF image based at least in part on the combined plurality of first feature maps.

METHODS AND DEVICES FOR EFFICIENT GENERAL DECONVOLUTION IMPLEMENTATION ON HARDWARE ACCELERATOR
20230075264 · 2023-03-09

Methods and devices are provided for implementing efficient general deconvolution on a hardware accelerator. In one method for implementing a deconvolution operation between a multi-dimensional kernel in a pre-defined dimensional count and a multi-dimensional input map in the pre-defined dimensional count to obtain a multi-dimensional output feature map in the pre-defined dimensional count, the multi-dimensional kernel is subsampled into a plurality of non-overlapping multi-dimensional sub-kernels, each in the pre-defined dimensional count, based on a pre-defined common stride parameter; a plurality of multi-dimensional sub-output feature maps in the pre-defined dimensional count is obtained by applying a stride-dependent virtual padded deconvolution operation between the multi-dimensional input map and each multi-dimensional sub-kernel in the plurality of non-overlapping multi-dimensional sub-kernels; and, in response to determining the pre-defined common stride parameter is not greater than two, a pre-defined multi-dimensional interleave-concatenation method is used to interleave-concatenate and reorder the plurality of multi-dimensional sub-output feature maps into the multi-dimensional output feature map.
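The idea of subsampling the kernel by stride and interleaving the sub-outputs can be illustrated in one dimension. The sketch below is an assumption-laden simplification rather than the claimed method: it is 1D only, a zero-padded stride-1 convolution stands in for the "stride-dependent virtual padded" operation, the stride-not-greater-than-two condition is not modeled, and all function names are hypothetical. A direct transposed convolution is included as a reference to check against.

```python
def transposed_conv1d(x, w, stride):
    """Reference: scatter each input sample, scaled by the kernel, into the output."""
    out = [0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out


def full_conv1d(x, w):
    """Stride-1 convolution with implicit zero padding on both sides."""
    out = [0] * (len(x) + len(w) - 1)
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i + k] += xi * wk
    return out


def transposed_conv1d_by_subkernels(x, w, stride):
    """Same result, built from per-phase sub-kernels and interleaving."""
    out = [0] * ((len(x) - 1) * stride + len(w))
    for phase in range(stride):
        sub_kernel = w[phase::stride]        # non-overlapping subsample of the kernel
        if not sub_kernel:
            continue                         # stride larger than the kernel length
        sub_output = full_conv1d(x, sub_kernel)
        for j, value in enumerate(sub_output):
            out[j * stride + phase] = value  # interleave into the final output
    return out


x, w = [1, 2, 3], [1, 10, 100]
direct = transposed_conv1d(x, w, 2)
interleaved = transposed_conv1d_by_subkernels(x, w, 2)
```

Each output position receives contributions only from kernel taps that share its phase modulo the stride, which is why the sub-outputs can be computed independently and then interleaved without any accumulation across phases.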

CASE QUERY APPARATUS AND METHOD AND STORAGE MEDIUM

According to one embodiment, a case query apparatus includes a processing circuit. The processing circuit acquires a query condition represented by a query target. The processing circuit acquires a meta query condition represented by a description concerning a viewpoint to focus on when querying a case similar to the query condition. The processing circuit calculates a similarity degree between the query condition and each of a plurality of reference cases represented by a query target. The processing circuit queries a similar reference case similar to the query condition from a viewpoint of the meta query condition, among the plurality of reference cases, based on the similarity degree. The processing circuit presents a query result on the similar reference case.

METHOD AND SYSTEM FOR PROCESSING IMAGE BASED ON WEIGHTED MULTIPLE KERNELS
20230073175 · 2023-03-09

Systems and methods for processing a plurality of images include obtaining input data including the plurality of images; providing the input data to a first machine learning model; providing an output of the first machine learning model to a second machine learning model and a third machine learning model; generating a first feature map corresponding to a plurality of kernels based on an output of the second machine learning model; generating a second feature map corresponding to a plurality of weights based on an output of the third machine learning model; generating a predicted kernel based on a weighted sum of the plurality of kernels; and generating output data based on the input data and the predicted kernel.
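The last two steps, forming a predicted kernel as a weighted sum of candidate kernels and applying it to the input, can be sketched as follows. The sketch assumes one global weight per kernel and a single image patch; the abstract's feature maps may well carry per-pixel kernels and weights, and the three machine learning models that produce them are not modeled. The names `predict_kernel` and `apply_kernel` are hypothetical.

```python
def predict_kernel(kernels, weights):
    """Weighted sum of equally sized 2D kernels."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    rows, cols = len(kernels[0]), len(kernels[0][0])
    predicted = [[0.0] * cols for _ in range(rows)]
    for kernel, weight in zip(kernels, weights):
        for r in range(rows):
            for c in range(cols):
                predicted[r][c] += weight * kernel[r][c]
    return predicted


def apply_kernel(patch, kernel):
    """Filter one image patch: sum of element-wise products with the kernel."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


# Two 2x2 candidate kernels blended with equal weights (illustrative values).
kernels = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
predicted = predict_kernel(kernels, [0.5, 0.5])
output_value = apply_kernel([[2.0, 9.0], [9.0, 4.0]], predicted)
```

With equal weights the predicted kernel averages the two diagonal entries of the patch and ignores the off-diagonal ones.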