G06V10/449

Downscaler and Method of Downscaling
20220100466 · 2022-03-31 ·

A hardware downscaler and an architecture for implementing a FIR filter in which the downscaler can be arranged for downscaling by a half in one dimension. The downscaler can comprise: hardware logic implementing a first three-tap FIR filter; and hardware logic implementing a second three-tap FIR filter; wherein the output from the hardware logic implementing the first three-tap filter is provided as an input to the hardware logic implementing the second three-tap filter.
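
As a rough software illustration of the cascade described in this abstract, the sketch below feeds the output of one three-tap FIR filter into a second one and then keeps every other sample to halve the length. The tap values `(0.25, 0.5, 0.25)`, the edge replication, and the function names `fir3` and `downscale_by_half` are assumptions made for illustration; the abstract does not specify coefficients, edge handling, or where decimation happens.

```python
def fir3(samples, taps=(0.25, 0.5, 0.25)):
    """Apply a three-tap FIR filter; edge samples are replicated (assumed)."""
    n = len(samples)
    out = []
    for i in range(n):
        left = samples[max(i - 1, 0)]
        right = samples[min(i + 1, n - 1)]
        out.append(taps[0] * left + taps[1] * samples[i] + taps[2] * right)
    return out


def downscale_by_half(samples):
    """Cascade two three-tap filters, then keep every second sample."""
    first_stage = fir3(samples)        # first three-tap filter
    second_stage = fir3(first_stage)   # its output feeds the second filter
    return second_stage[::2]           # decimate by two (assumed placement)


print(downscale_by_half([0, 0, 0, 16, 0, 0, 0]))
```

With these assumed taps, the two cascaded filters behave like a single five-tap filter with weights 1, 4, 6, 4, 1 (divided by 16) away from the edges, which is one reason a cascade of short filters can stand in for a longer one.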

Downscaler and Method of Downscaling
20220092731 · 2022-03-24 ·

A hardware downscaling module and downscaling methods for downscaling a two-dimensional array of values. The hardware downscaling module comprises a first group of one-dimensional downscalers and a second group of one-dimensional downscalers. The first group of one-dimensional downscalers is arranged to receive a two-dimensional array of values and to perform downscaling in series in a first dimension. The second group of one-dimensional downscalers is arranged to receive an output from the first group of one-dimensional downscalers and to perform downscaling in series in a second dimension.
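
The two-group arrangement can be sketched in software as follows: a chain of one-dimensional downscalers runs along the first dimension, and its output is passed to a second chain running along the other dimension. The pairwise-average `halve` function, the stage counts, and the name `downscale_2d` are hypothetical stand-ins; the abstract does not say what each one-dimensional downscaler computes.

```python
def halve(values):
    """Hypothetical 1-D downscaler: average adjacent pairs.

    A trailing unpaired sample (odd length) is dropped.
    """
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values) - 1, 2)]


def downscale_2d(grid, stages_first=1, stages_second=1):
    """Downscale along rows in series, then along columns in series."""
    rows = [list(row) for row in grid]
    for _ in range(stages_first):          # first group, applied in series
        rows = [halve(row) for row in rows]
    cols = [list(col) for col in zip(*rows)]
    for _ in range(stages_second):         # second group, applied in series
        cols = [halve(col) for col in cols]
    return [list(row) for row in zip(*cols)]


grid = [[r * 4 + c for c in range(4)] for r in range(4)]
print(downscale_2d(grid))         # one halving per dimension
print(downscale_2d(grid, 2, 2))   # two halvings per dimension
```

Each stage in a group halves the size along its dimension, so running k stages in series reduces that dimension by a factor of 2^k in this sketch.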

EFFICIENT DATA LAYOUTS FOR CONVOLUTIONAL NEURAL NETWORKS
20220076056 · 2022-03-10 ·

Systems and methods for efficient implementation of a convolutional layer of a convolutional neural network are disclosed. In one aspect, weight values of kernels in a kernel stack of a convolutional layer can be reordered into a tile layout with tiles of runnels. Pixel values of input activation maps of the convolutional layer can be reordered into an interleaved layout comprising a plurality of clusters of input activation map pixels. The output activation maps can be determined using the clusters of the input activation map pixels and kernels tile by tile.
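
One plausible reading of the interleaved input layout is sketched below: the values found at the same pixel position in every input activation map are gathered into one cluster, so they sit next to each other in memory. This reading, and the name `interleave_activation_maps`, are assumptions. The kernel reordering into tiles of runnels is not modeled here.

```python
def interleave_activation_maps(maps):
    """Group values at the same pixel position across all maps.

    maps: list of C activation maps, each a list of H rows of W values.
    Returns H*W clusters in row-major order, each holding C values.
    """
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [activation_map[y][x] for activation_map in maps]
        for y in range(height)
        for x in range(width)
    ]


maps = [
    [[1, 2], [3, 4]],        # activation map 0
    [[10, 20], [30, 40]],    # activation map 1
]
print(interleave_activation_maps(maps))
```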

Fixation Generation For Machine Learning
20210334610 · 2021-10-28 ·

The disclosure extends to methods, systems, and apparatuses for automated fixation generation and more particularly relates to generation of synthetic saliency maps. A method for generating saliency information includes receiving a first image and an indication of one or more sub-regions within the first image corresponding to one or more objects of interest. The method includes generating and storing a label image by creating an intermediate image having one or more random points. The random points have a first color in regions corresponding to the sub-regions, and the remainder of the intermediate image has a second color. Generating and storing the label image further includes applying a Gaussian blur to the intermediate image.
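
The label-image steps can be sketched as follows: scatter random points inside each sub-region on an otherwise uniform image, then blur the result. The sketch assumes the first color is 1.0 and the second color is 0.0, takes sub-regions as half-open boxes `(x0, y0, x1, y1)`, and uses a separable Gaussian with replicated edges. The function names, the point count, and the `sigma` value are hypothetical choices, not details from the abstract.

```python
import math
import random


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(line, kernel):
    """Convolve one row or column with the kernel, replicating edges."""
    radius = len(kernel) // 2
    n = len(line)
    return [
        sum(kernel[k + radius] * line[min(max(i + k, 0), n - 1)]
            for k in range(-radius, radius + 1))
        for i in range(n)
    ]


def make_label_image(width, height, boxes, points_per_box=20, sigma=1.5, seed=0):
    """Build a synthetic saliency label from sub-region boxes."""
    rng = random.Random(seed)
    # Intermediate image: everything starts in the second color (0.0).
    image = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for _ in range(points_per_box):
            x = rng.randint(x0, x1 - 1)   # randint bounds are inclusive
            y = rng.randint(y0, y1 - 1)
            image[y][x] = 1.0             # random point in the first color
    # Separable Gaussian blur: rows first, then columns.
    kernel = gaussian_kernel(sigma, radius=int(3 * sigma))
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


label = make_label_image(32, 32, [(8, 8, 16, 16)])
print(len(label), len(label[0]))
```

Passing a fixed `seed` keeps the generated label reproducible, which helps when the labels are used as training targets.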

Expression recognition method, apparatus, electronic device, and storage medium

Embodiments of the present disclosure provide an expression recognition method, apparatus, electronic device and storage medium. An expression recognition model includes a convolutional neural network model, a fully connected network model and a bilinear network model. During expression recognition, an image to be recognized is pre-processed to obtain a facial image and a key point coordinate vector. The convolutional neural network model processes the facial image to output a first feature vector, and the fully connected network model processes the key point coordinate vector to output a second feature vector. The bilinear network model combines the first feature vector and the second feature vector to obtain second-order information, from which the expression recognition result is obtained. This process improves robustness to gestures and illumination, and improves the accuracy of expression recognition.
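
A common way to obtain second-order information from two feature vectors is their outer product, which contains every pairwise product of one element from each vector. The sketch below shows that operation only. Whether the bilinear network model here uses a plain outer product is an assumption, and `bilinear_fusion` is a hypothetical name.

```python
def bilinear_fusion(first, second):
    """Flattened outer product of two feature vectors (assumed form)."""
    return [a * b for a in first for b in second]


# Toy stand-ins for the CNN output and the fully connected output.
image_features = [1, 2]
keypoint_features = [3, 4, 5]
print(bilinear_fusion(image_features, keypoint_features))
```

The result has `len(first) * len(second)` entries, so in practice the fused vector is usually much larger than either input.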

DEEP LEARNING BASED ADAPTIVE ARITHMETIC CODING AND CODELENGTH REGULARIZATION
20210295164 · 2021-09-23 ·

A deep learning based compression (DLBC) system applies trained models to compress binary code of an input image to a target codelength. For a set of binary codes representing the quantized coefficients of an input image, the DLBC system applies a first model that is trained to predict feature probabilities based on the context of each bit of the binary codes. The DLBC system compresses the binary code via adaptive arithmetic coding based on the determined probability of each bit. The compressed binary code represents a balance between a reconstruction quality of a reconstruction of the input image and a target compression ratio of the compressed binary code.
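
To see why per-bit probabilities matter for arithmetic coding, the sketch below computes the ideal codelength of a bit sequence, which is the sum of -log2 of the probability assigned to each bit that actually occurred. An arithmetic coder approaches this length. The probabilities are passed in as plain numbers; the trained context model that would produce them is not modeled, and `ideal_codelength` is a hypothetical name.

```python
import math


def ideal_codelength(bits, prob_of_one):
    """Ideal codelength in bits given a predicted P(bit == 1) per position.

    Raises ValueError if an observed bit was assigned probability zero.
    """
    total = 0.0
    for bit, p_one in zip(bits, prob_of_one):
        p = p_one if bit == 1 else 1.0 - p_one
        total += -math.log2(p)
    return total


# An uninformative predictor costs one bit per symbol.
print(ideal_codelength([1, 0], [0.5, 0.5]))
# A confident, correct predictor costs less than one bit per symbol.
print(ideal_codelength([1, 1, 1], [0.9, 0.9, 0.9]))
```

The better the predicted probabilities match the actual bits, the shorter the code, which is the lever a codelength target can act on.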

Fixation generation for machine learning

The disclosure extends to methods, systems, and apparatuses for automated fixation generation and more particularly relates to generation of synthetic saliency maps. A method for generating saliency information includes receiving a first image and an indication of one or more sub-regions within the first image corresponding to one or more objects of interest. The method includes generating and storing a label image by creating an intermediate image having one or more random points. The random points have a first color in regions corresponding to the sub-regions, and the remainder of the intermediate image has a second color. Generating and storing the label image further includes applying a Gaussian blur to the intermediate image.

Low-power iris scan initialization

Apparatuses, methods, and systems are presented for sensing scene-based occurrences. Such an apparatus may comprise a vision sensor system comprising a first processing unit and dedicated computer vision (CV) computation hardware configured to receive sensor data from at least one sensor array comprising a plurality of sensor pixels and capable of computing one or more CV features using readings from neighboring sensor pixels. The vision sensor system may be configured to send an event to be received by a second processing unit in response to processing of the one or more computed CV features by the first processing unit. The event may indicate possible presence of one or more irises within a scene.

VIDEO SYNTHESIS METHOD, MODEL TRAINING METHOD, DEVICE, AND STORAGE MEDIUM
20210243383 · 2021-08-05 ·

Embodiments of this application disclose methods, systems, and devices for video synthesis. In one aspect, a method comprises obtaining a plurality of frames corresponding to source image information of a first to-be-synthesized video, each frame of the source image information. The method also comprises obtaining a plurality of frames corresponding to target image information of a second to-be-synthesized video. For each frame of the plurality of frames corresponding to the target image information of the second to-be-synthesized video, the method comprises fusing a respective source image from the first to-be-synthesized video, a corresponding source motion key point, and a respective target motion key point corresponding to the frame using a pre-trained video synthesis model, and generating a respective output image in accordance with the fusing. The method further comprises repeating the fusing and the generating steps for the second to-be-synthesized video to produce a synthesized video.

Deep learning based adaptive arithmetic coding and codelength regularization
11100394 · 2021-08-24 ·

A deep learning based compression (DLBC) system applies trained models to compress binary code of an input image to a target codelength. For a set of binary codes representing the quantized coefficients of an input image, the DLBC system applies a first model that is trained to predict feature probabilities based on the context of each bit of the binary codes. The DLBC system compresses the binary code via adaptive arithmetic coding based on the determined probability of each bit. The compressed binary code represents a balance between a reconstruction quality of a reconstruction of the input image and a target compression ratio of the compressed binary code.