G06V10/806

SYSTEM AND METHOD FOR DETECTING LIVENESS OF BIOMETRIC INFORMATION
20230245493 · 2023-08-03

The present teaching relates to a method, system, medium, and implementations for detecting liveness. When an image is received with visual information claimed to represent a person's palm, a region of interest (ROI) in the image that corresponds to the palm is identified. Each of a plurality of fake palm detectors individually generates a decision on whether the ROI corresponds to the specific type of fake palm that detector is designed to detect. These individual decisions from the plurality of fake palm detectors are then combined to derive a liveness detection decision with respect to the ROI.
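The combination step can be sketched as an ensemble vote. This is a minimal illustration, not the patent's actual rule: the spoof-type names and the "any detector flags a fake → not live" policy are assumptions.

```python
# Hypothetical sketch: combine per-detector fake-palm decisions into one
# liveness verdict. The AND-style combination rule is an assumption.

def combine_decisions(decisions):
    """Judge the palm live only if no detector flags its fake type.

    decisions: dict mapping a fake-palm type to True if that detector
    decided the ROI is that kind of fake.
    """
    return not any(decisions.values())

# Each detector targets one spoof type (names are illustrative).
decisions = {
    "printed_photo": False,
    "screen_replay": False,
    "silicone_model": False,
}
verdict = combine_decisions(decisions)  # True -> live
```

Other policies (majority vote, weighted scores) would slot into the same interface.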

SYSTEM AND METHOD FOR GENERATING REGION OF INTERESTS FOR PALM LIVENESS DETECTION
20230245502 · 2023-08-03

The present teaching relates to detecting palm liveness. When an image is received with visual information claimed to represent a person's palm, an initial region of interest (ROI) corresponding to the palm is identified in the image and its initial dimension is determined. When the initial dimension is smaller than a specified dimension, the initial ROI is extended in one or more directions into an expansion region of a certain expansion dimension, generating the ROI from the visual information in the image. A plurality of decisions are obtained with respect to the ROI, each made individually on whether the ROI represents a specific type of fake palm. The decisions are then combined to derive a liveness detection decision on whether the palm captured in the image is live.
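The ROI-expansion step can be sketched geometrically. The symmetric, clip-to-image policy below is an assumption; the abstract says only that the ROI is extended in some directions when it is smaller than a specified dimension.

```python
# Hedged sketch: grow an undersized ROI to a minimum dimension, centered
# on the original box and clipped to the image bounds. The centering and
# clipping choices are assumptions for illustration.

def expand_roi(x, y, w, h, min_dim, img_w, img_h):
    """Return (x, y, w, h) with each side at least min_dim, inside the image."""
    new_w, new_h = max(w, min_dim), max(h, min_dim)
    # Center the expansion around the original ROI, then clip to the image.
    new_x = min(max(0, x - (new_w - w) // 2), img_w - new_w)
    new_y = min(max(0, y - (new_h - h) // 2), img_h - new_h)
    return new_x, new_y, new_w, new_h
```

An ROI already at or above the minimum is returned unchanged.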

CROSS-SCALE DEFECT DETECTION METHOD BASED ON DEEP LEARNING
20230306577 · 2023-09-28

A cross-scale defect detection method based on deep learning, including: (S1) building a vision data acquisition system to acquire a surface image of a part to be processed; and building a defect dataset; (S2) building a deep learning-based cross-scale defect detection model; and inputting the defect dataset obtained in the step (S1) into the deep learning-based cross-scale defect detection model for model training; and (S3) building a defect detection system according to the deep learning-based cross-scale defect detection model and the vision data acquisition system; and detecting a defect of the surface image of the part to be processed.
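The three steps S1–S3 can be sketched as a pipeline. The class names and the trivial stand-in model below are illustrative assumptions, not the patent's architecture.

```python
# Hypothetical pipeline mirroring S1-S3: acquire images into a dataset,
# train a model on it, then detect defects in new surface images.

class ThresholdModel:
    """Toy stand-in for the deep model: flags pixels well above the mean."""
    def fit(self, images):
        self.mean = sum(sum(img) / len(img) for img in images) / len(images)

    def predict(self, image):
        return [i for i, v in enumerate(image) if v > self.mean * 2]

class DefectDetectionPipeline:
    def __init__(self, model):
        self.model = model   # cross-scale detection model (S2)
        self.dataset = []    # defect dataset built in S1

    def acquire(self, image):
        """S1: vision data acquisition adds a surface image to the dataset."""
        self.dataset.append(image)

    def train(self):
        """S2: fit the detection model on the acquired dataset."""
        self.model.fit(self.dataset)

    def detect(self, image):
        """S3: detect defects in a new surface image."""
        return self.model.predict(image)
```

A real system would replace `ThresholdModel` with the trained cross-scale network.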

TARGET OBJECT DETECTION METHOD AND APPARATUS, AND READABLE STORAGE MEDIUM
20230306750 · 2023-09-28

A target object detection method, including: obtaining images collected by more than one camera installed on a target vehicle; determining a high-dimensional parameter feature in a high-dimensional space corresponding to the parameter information of each camera; fusing features of the images via a target object detection model according to the high-dimensional parameter features; and determining position information of a target object based on the fused features, where the order of the cameras corresponding to the images is the same as the order of the cameras corresponding to the high-dimensional parameter features.
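The order-alignment requirement can be sketched as pairing each image feature with its camera's parameter embedding by index. The toy embedding and additive fusion are placeholders, not the model's actual operations.

```python
# Illustrative sketch: image features and per-camera parameter features
# must follow the same camera order, so element i of each list refers to
# camera i. The embedding and fusion ops below are assumptions.

def embed_params(params):
    """Toy stand-in for lifting camera parameters to a high-dim feature."""
    return [p * w for p in params for w in (1.0, 0.5)]

def fuse(image_feats, camera_params):
    """Fuse per-camera image features with the matching parameter features."""
    assert len(image_feats) == len(camera_params), "camera order must match"
    fused = []
    for feat, params in zip(image_feats, camera_params):
        pf = embed_params(params)
        # Simple additive fusion as a placeholder for the model's fusion op.
        fused.append([f + p for f, p in zip(feat, pf)])
    return fused
```

Keeping both lists in one canonical camera order is what lets `zip` stand in for the model's alignment.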

Method and device for visual question answering, computer apparatus and medium

The present disclosure provides a method for visual question answering, which relates to a field of computer vision and natural language processing. The method includes: acquiring an input image and an input question; constructing a Visual Graph based on the input image, wherein the Visual Graph comprises a Node Feature and an Edge Feature; updating the Node Feature by using the Node Feature and the Edge Feature to obtain an updated Visual Graph; determining a question feature based on the input question; fusing the updated Visual Graph and the question feature to obtain a fused feature; and generating a predicted answer for the input image and the input question based on the fused feature. The present disclosure further provides an apparatus for visual question answering, a computer device and a non-transitory computer-readable storage medium.
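The node-update step can be sketched as one round of edge-weighted message passing. The exact update rule in the disclosure may differ; this sum-of-weighted-neighbors form is one common choice and the weights here are assumptions.

```python
# Hedged sketch of updating Visual Graph node features using the node
# features themselves plus edge-weighted neighbor features.

def update_nodes(node_feats, edges):
    """node_feats: list of feature vectors; edges: dict (i, j) -> weight.

    Each node keeps its own feature and adds in each out-neighbor's
    feature scaled by the connecting edge's weight.
    """
    updated = []
    for i, feat in enumerate(node_feats):
        agg = list(feat)
        for (a, b), w in edges.items():
            if a == i:
                agg = [x + w * y for x, y in zip(agg, node_feats[b])]
        updated.append(agg)
    return updated
```

The updated graph would then be fused with the question feature to predict the answer.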

Multimodal medical image fusion method based on darts network

A multimodal medical image fusion method based on a DARTS network is provided. Feature extraction is performed on a multimodal medical image by using a differentiable architecture search (DARTS) network. In the search phase, the network learns by using the gradient of the network weights as a loss function. A network architecture best suited to the current dataset is selected from different convolution operations and connections between different nodes, so that the features extracted by the network have richer detail. In addition, a plurality of indicators that represent image grayscale information, correlation, detail information, structural features, and image contrast are used as the network loss function, so that effective fusion of medical images can be achieved through unsupervised learning, without a gold standard.
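The no-reference loss idea can be sketched as a weighted sum of image statistics, so the fusion is trainable without a gold-standard target. The two indicators (contrast, detail) and their weights below are illustrative assumptions, not the paper's full indicator set.

```python
# Toy sketch of a multi-indicator, unsupervised fusion loss: the fused
# image is penalized for losing the sources' contrast and detail.
# Images are flat lists of intensities for simplicity.

def contrast(img):
    """Variance as a stand-in for an image-contrast indicator."""
    m = sum(img) / len(img)
    return sum((v - m) ** 2 for v in img) / len(img)

def detail(img):
    """Total neighbor difference as a stand-in for a detail indicator."""
    return sum(abs(a - b) for a, b in zip(img, img[1:]))

def fusion_loss(fused, sources, w_contrast=1.0, w_detail=1.0):
    """Lower loss when the fused image keeps the best source statistics."""
    target_c = max(contrast(s) for s in sources)
    target_d = max(detail(s) for s in sources)
    return (w_contrast * abs(target_c - contrast(fused))
            + w_detail * abs(target_d - detail(fused)))
```

A real implementation would add correlation and structural terms and compute them on 2-D tensors.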

Image processing system

Disclosed is a multi-modal convolutional neural network (CNN) for fusing image information from a frame-based camera, such as a near infra-red (NIR) camera, and an event camera, for analysing facial characteristics in order to produce classifications such as head pose or eye gaze. The neural network processes image frames acquired from each camera through a plurality of convolutional layers to provide a respective set of one or more intermediate images. The network fuses at least one corresponding pair of intermediate images, generated from each of the image frames, through an array of fusing cells. Each fusing cell is connected to at least a respective element of each intermediate image and is trained to weight each element from each intermediate image to provide the fused output. The neural network further comprises at least one task network configured to generate one or more task outputs for a region of interest.
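A fusing cell's per-element weighting can be sketched as a learned convex blend of the two modalities' feature maps. The fixed weights below stand in for trained parameters; in the disclosure each cell's weights are learned.

```python
# Sketch of an array of fusing cells: each output element is a weighted
# combination of the corresponding elements from the frame-camera and
# event-camera intermediate images (flattened to 1-D here).

def fuse_maps(frame_map, event_map, weights):
    """Elementwise weighted blend of two same-shape intermediate images.

    weights[i] is cell i's (trained) preference for the frame modality;
    1 - weights[i] goes to the event modality.
    """
    return [w * f + (1.0 - w) * e
            for w, f, e in zip(weights, frame_map, event_map)]
```

The fused map would then feed the task networks (head pose, eye gaze).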

Information processing apparatus and method of inferring
11769340 · 2023-09-26

A non-transitory computer-readable recording medium stores a program for causing a computer to execute a process including: for each of plural pieces of first-type training data including image information, first semantic information, and a first class of a relevant first object, generating a first hyperdimensional vector (HV) from the image information and the first semantic information, and storing the first HV in a storage unit in correlation with the first class; and for each of plural pieces of second-type training data including second semantic information and a second class of a relevant second object, obtaining, from the storage unit, a predetermined number of HVs exhibiting a higher degree of matching with an HV generated from the second semantic information, generating a second HV of the second-type training data based on the predetermined number of HVs, and storing the second HV in the storage unit in correlation with the second class.
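The hyperdimensional-vector machinery can be sketched with bipolar vectors: bundle two HVs into a class HV, store it, and retrieve the best-matching classes for a query. The dimension, sign-based bundling, and dot-product matching are standard HDC choices assumed for illustration, not the patent's parameters.

```python
# Minimal hyperdimensional-computing sketch: random bipolar HVs, bundling
# by elementwise sum + re-binarization, and similarity-based retrieval.
import random

DIM = 1000  # hypervector dimensionality (illustrative)

def random_hv(seed):
    """Deterministic random bipolar hypervector for a given seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bundle(a, b):
    """Combine two HVs (e.g., image HV and semantic HV) into one."""
    return [1 if x + y >= 0 else -1 for x, y in zip(a, b)]

def similarity(a, b):
    """Normalized dot product: ~1 for identical HVs, ~0 for random pairs."""
    return sum(x * y for x, y in zip(a, b)) / DIM

def top_matches(store, query, k):
    """Return the k stored classes whose HVs best match the query HV."""
    return sorted(store, key=lambda c: -similarity(store[c], query))[:k]
```

Averaging the retrieved HVs would stand in for generating the second HV from the best matches.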

Generating synthesized digital images utilizing a multi-resolution generator neural network

This disclosure describes methods, non-transitory computer-readable storage media, and systems that generate synthesized digital images via multi-resolution generator neural networks. The disclosed system extracts multi-resolution features from a scene representation to condition a spatial feature tensor and a latent code that modulate the output of a generator neural network. For example, the disclosed system utilizes a base encoder of the generator neural network to generate a feature set from a semantic label map of a scene. The disclosed system then utilizes a bottom-up encoder to extract multi-resolution features and generate a latent code from the feature set. Furthermore, the disclosed system determines a spatial feature tensor by utilizing a top-down encoder to up-sample and aggregate the multi-resolution features. The disclosed system then utilizes a decoder to generate a synthesized digital image based on the spatial feature tensor and the latent code.
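The data flow (base encoder → bottom-up encoder → top-down encoder → decoder) can be sketched structurally with toy stand-ins. Every function body below is a placeholder assumption; only the wiring mirrors the abstract.

```python
# Structural sketch of the described generator flow on 1-D "images".

def base_encode(label_map):
    """Base encoder: semantic label map -> feature set (toy scaling)."""
    return [v / 10.0 for v in label_map]

def bottom_up(feats):
    """Bottom-up encoder: multi-resolution features plus a latent code."""
    multi_res = [feats, feats[::2]]        # two "resolutions"
    latent = sum(feats) / len(feats)       # latent code
    return multi_res, latent

def top_down(multi_res):
    """Top-down encoder: up-sample the coarse level and aggregate by sum."""
    coarse = [v for v in multi_res[1] for _ in (0, 1)][: len(multi_res[0])]
    return [a + b for a, b in zip(multi_res[0], coarse)]

def decode(spatial, latent):
    """Decoder: latent code modulates the spatial feature tensor."""
    return [latent * s for s in spatial]

def generate(label_map):
    feats = base_encode(label_map)
    multi_res, latent = bottom_up(feats)
    spatial = top_down(multi_res)
    return decode(spatial, latent)
```

In the real system each stage is a neural network and the tensors are multi-channel 2-D feature maps.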

Multi-modal, multi-technique vehicle signal detection

A vehicle includes one or more cameras that capture a plurality of two-dimensional images of a three-dimensional object. A light detector and/or a semantic classifier search within those images for lights of the three-dimensional object. A vehicle signal detection module fuses information from the light detector and/or the semantic classifier to produce a semantic meaning for the lights. The vehicle can be controlled based on the semantic meaning. Further, the vehicle can include a depth sensor and an object projector. The object projector can determine regions of interest within the two-dimensional images, based on the depth sensor. The light detector and/or the semantic classifier can use these regions of interest to efficiently perform the search for the lights.
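The fusion of the light detector and semantic classifier can be sketched as confidence-weighted agreement. The labels, confidence scheme, and noisy-or style boost on agreement are assumptions, not the module's documented logic.

```python
# Hedged sketch: fuse a light detector's and a semantic classifier's
# outputs, each a (label, confidence) pair, into one semantic meaning.

def fuse_signal(detector_out, classifier_out):
    """Return the agreed label with boosted confidence, or the
    higher-confidence label when the two sources disagree."""
    (d_label, d_conf), (c_label, c_conf) = detector_out, classifier_out
    if d_label == c_label:
        # Noisy-or style boost: agreement raises confidence toward 1.0.
        return d_label, min(1.0, d_conf + c_conf * (1.0 - d_conf))
    return (d_label, d_conf) if d_conf >= c_conf else (c_label, c_conf)
```

The fused label (e.g. "brake" or "left turn") would then feed vehicle control, with regions of interest from the depth-based object projector narrowing where each source searches.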