Patent classification: G06V10/462
DIGITAL ASSISTANT REFERENCE RESOLUTION
Systems and processes for operating a digital assistant are provided. An example process for performing a task includes, at an electronic device having one or more processors and memory, receiving a spoken input including a request, receiving an image input including a plurality of objects, selecting a reference resolution module of a plurality of reference resolution modules based on the request and the image input, determining, with the selected reference resolution module, whether the request references a first object of the plurality of objects based on at least the spoken input, and in accordance with a determination that the request references the first object of the plurality of objects, determining a response to the request including information about the first object.
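A minimal Python sketch of the described flow, assuming hypothetical resolver classes (AttributeResolver, SpatialResolver) and a keyword heuristic; the abstract does not say how the module is selected or how a request is matched to an object, so both are placeholders.

    from dataclasses import dataclass, field

    @dataclass
    class DetectedObject:
        label: str                                 # e.g. "mug", from the image input
        attributes: set = field(default_factory=set)

    class AttributeResolver:
        # Resolves a reference when the request names an object or one of its attributes.
        def resolve(self, tokens, objects):
            for obj in objects:
                if obj.label in tokens or obj.attributes & tokens:
                    return obj
            return None

    class SpatialResolver:
        # Resolves references phrased in spatial terms ("the one on the left").
        SPATIAL = {"left", "right", "nearest", "closest"}
        def resolve(self, tokens, objects):
            for obj in objects:
                if obj.attributes & self.SPATIAL & tokens:
                    return obj
            return None

    def respond(spoken_input, objects):
        tokens = set(spoken_input.lower().split())
        # Select a reference resolution module based on the request wording.
        module = SpatialResolver() if tokens & SpatialResolver.SPATIAL else AttributeResolver()
        target = module.resolve(tokens, objects)
        if target is not None:                     # the request references an object
            return f"That is a {target.label}."    # response includes its information
        return "I couldn't tell which object you meant."

    print(respond("what is the red one",
                  [DetectedObject("mug", {"red"}), DetectedObject("pen", {"blue"})]))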
CONTOUR SHAPE RECOGNITION METHOD
Provided is a contour shape recognition method, including: sampling and extracting salient feature points of a contour of a shape sample; calculating a feature function of the shape sample at a semi-global scale by using three types of shape descriptors; dividing the scale with a single pixel as a spacing to acquire a shape feature function in a full-scale space; storing feature function values at various scales into a matrix to acquire three types of feature grayscale map representations of the shape sample in the full-scale space; synthesizing the three types of grayscale map representations of the shape sample, as three channels of RGB, into a color feature representation image; constructing a two-stream convolutional neural network by taking the shape sample and the feature representation image as inputs at the same time; and training the two-stream convolutional neural network, and inputting a test sample into a trained network model to achieve shape classification.
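A NumPy sketch of building the full-scale-space feature maps. The abstract does not name the three shape descriptors, so normalized chord length, turning angle, and local triangle area are used as illustrative stand-ins; the two-stream network and its training are omitted.

    import numpy as np

    def feature_maps(contour, n_scales):
        """contour: (N, 2) array of sampled salient contour points (closed).
        Returns an (n_scales, N, 3) array: one grayscale map per descriptor,
        stacked as the three RGB channels of the color feature image."""
        cross = lambda a, b: a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0]
        n = len(contour)
        maps = np.zeros((n_scales, n, 3))
        for s in range(1, n_scales + 1):           # scale = spacing along the contour
            prev, nxt = np.roll(contour, s, axis=0), np.roll(contour, -s, axis=0)
            chord = np.linalg.norm(nxt - prev, axis=1)                 # descriptor 1
            v1, v2 = contour - prev, nxt - contour
            angle = np.arctan2(cross(v1, v2), (v1 * v2).sum(axis=1))   # descriptor 2
            area = 0.5 * np.abs(cross(nxt - contour, prev - contour))  # descriptor 3
            maps[s - 1] = np.stack([chord, angle, area], axis=1)
        # Normalize each channel to [0, 1] so it can be stored as an RGB image.
        maps -= maps.min(axis=(0, 1))
        maps /= maps.max(axis=(0, 1)) + 1e-9
        return maps

    theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
    ellipse = np.stack([2 * np.cos(theta), np.sin(theta)], axis=1)
    print(feature_maps(ellipse, 50).shape)         # -> (50, 100, 3)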
SYSTEM AND METHOD FOR CALIBRATING A TIME DIFFERENCE BETWEEN AN IMAGE PROCESSOR AND AN INERTIAL MEASUREMENT UNIT BASED ON INTER-FRAME POINT CORRESPONDENCE
Systems and methods are used for calibrating a time difference between an image signal processor (ISP) and an inertial measurement unit (IMU) of an image capture device. An image capture device includes a lens, an image sensor, an IMU, and an ISP. The image sensor detects images as frames and the IMU captures motion data. The ISP detects one or more key points on the frames and matches the one or more key points between the frames. The ISP computes one or more calibration parameters. The one or more calibration parameters are based on the matched key points and a time difference between the ISP and the IMU. The ISP performs a calibration using the calibration parameters.
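A sketch of the time-offset estimation, assuming both motion signals have been resampled to a common rate; mean key-point displacement stands in for the ISP's motion estimate, and a brute-force cross-correlation stands in for whatever cost the patent actually optimizes.

    import numpy as np

    def camera_motion_signal(tracks):
        """tracks: (frames, points, 2) matched key-point positions. The mean
        displacement per frame pair is a proxy for camera motion speed."""
        return np.linalg.norm(np.diff(tracks, axis=0), axis=2).mean(axis=1)

    def estimate_offset(cam, imu, max_lag):
        """Lag (in samples) at which the image-derived and IMU motion signals
        correlate best; the lag gives the ISP-IMU time difference."""
        def score(lag):
            a, b = (cam[lag:], imu[:-lag or None]) if lag >= 0 else (cam[:lag], imu[-lag:])
            n = min(len(a), len(b))
            return np.corrcoef(a[:n], b[:n])[0, 1]
        return max(range(-max_lag, max_lag + 1), key=score)

    t = np.arange(200)
    imu = np.sin(t / 10.0)                    # gyro signal
    cam = np.sin((t - 7) / 10.0)              # same motion seen 7 samples later
    print(estimate_offset(cam, imu, 20))      # -> 7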
SENSOR TRANSFORMATION ATTENTION NETWORK (STAN) MODEL
A sensor transformation attention network (STAN) model including sensors configured to collect input signals, attention modules configured to calculate attention scores of feature vectors corresponding to the input signals, a merge module configured to calculate attention values of the attention scores, and generate a merged transformation vector based on the attention values and the feature vectors, and a task-specific module configured to classify the merged transformation vector is provided.
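A minimal NumPy sketch of the STAN merge step, assuming per-sensor feature vectors have already been extracted; the random linear layers below stand in for the (unspecified) attention, merge, and task-specific modules.

    import numpy as np
    rng = np.random.default_rng(0)

    n_sensors, feat_dim, n_classes = 3, 16, 5
    features = [rng.normal(size=feat_dim) for _ in range(n_sensors)]   # one per sensor

    # Attention modules: map each sensor's feature vector to a scalar attention score.
    w_att = [rng.normal(size=feat_dim) for _ in range(n_sensors)]
    scores = np.array([w @ f for w, f in zip(w_att, features)])

    # Merge module: softmax the scores into attention values, then take the
    # attention-weighted sum of the feature vectors as the merged transformation vector.
    values = np.exp(scores - scores.max())
    values /= values.sum()
    merged = sum(v * f for v, f in zip(values, features))

    # Task-specific module: a linear classifier over the merged vector.
    w_cls = rng.normal(size=(n_classes, feat_dim))
    print("predicted class:", int(np.argmax(w_cls @ merged)))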
SALIENT OBJECT DETECTION FOR ARTIFICIAL VISION
There is provided a method for creating artificial vision with an implantable visual stimulation device. The method comprises receiving an input image comprising, for each of multiple points of the image, a depth value; performing a local background enclosure calculation on the input image to determine salient object information; and generating a visual stimulus to visualise the salient object information using the visual stimulation device. Determining the salient object information is based on a spatial variance of at least one of the multiple points of the image in relation to a surface model that defines a surface in the input image.
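A coarse sketch of a local-background-enclosure score on a depth map: a point scores high when it is closer than its surroundings in many directions. The window radius, direction count, and depth margin are illustrative, and the patent's surface model and spatial-variance test are not reproduced here.

    import numpy as np

    def lbe_saliency(depth, radius=5, n_dirs=8, margin=0.1):
        """Fraction of directions in which the background (larger depth)
        encloses each pixel; values near 1 mark salient foreground objects."""
        angles = np.linspace(0, 2 * np.pi, n_dirs, endpoint=False)
        sal = np.zeros_like(depth, dtype=float)
        for a in angles:
            dy, dx = int(round(radius * np.sin(a))), int(round(radius * np.cos(a)))
            shifted = np.roll(np.roll(depth, dy, axis=0), dx, axis=1)
            sal += shifted - depth > margin     # background encloses in this direction
        return sal / n_dirs

    depth = np.full((40, 40), 2.0)              # far background at 2 m
    depth[18:23, 18:23] = 1.0                   # a small near object at 1 m
    print(lbe_saliency(depth)[20, 20])          # -> 1.0 at the object centre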
OBJECT RECOGNITION APPARATUS, OBJECT RECOGNITION METHOD, AND RECORDING MEDIUM
In an object recognition apparatus, a storage unit stores a table in which a plurality of feature amounts are associated with each object having feature points of the respective feature amounts. An object region detection unit detects object regions of a plurality of objects from an input image. A feature amount extraction unit extracts feature amounts of feature points from the input image. A refining unit refers to the table and narrows all objects subject to recognition down to object candidates corresponding to the object regions, based on the feature amounts of the feature points belonging to the object regions. A matching unit recognizes the plurality of objects by matching the feature points belonging to each of the object regions against the feature points of each of the object candidates, and outputs a recognition result.
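A sketch of the refine-then-match flow, assuming quantized feature IDs play the role of the stored feature amounts and a simple vote count plays the role of the matcher; the abstract specifies neither the table layout nor the matching criterion.

    from collections import Counter

    # Table: quantized feature amount -> objects having a feature point with it.
    TABLE = {
        101: {"bottle", "can"},
        102: {"bottle"},
        200: {"can"},
        300: {"book"},
    }

    def refine(region_features):
        """Narrow all recognizable objects down to candidates for one region."""
        candidates = set()
        for f in region_features:
            candidates |= TABLE.get(f, set())
        return candidates

    def match(region_features, candidates):
        """Vote: the candidate sharing the most feature amounts with the region wins."""
        votes = Counter()
        for f in region_features:
            for obj in TABLE.get(f, set()) & candidates:
                votes[obj] += 1
        return votes.most_common(1)[0][0] if votes else None

    regions = {"region A": [101, 102, 102], "region B": [101, 200]}
    for name, feats in regions.items():
        print(name, "->", match(feats, refine(feats)))
    # region A -> bottle, region B -> can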
METHOD OF DETECTING, SEGMENTING AND EXTRACTING SALIENT REGIONS IN DOCUMENTS USING ATTENTION TRACKING SENSORS
A method and system for detecting, segmenting, and extracting salient regions in documents by using attention tracking sensors is provided. The method includes: receiving an image that corresponds to a document; receiving, from a sensor, a sequence of measurements that correspond to a human reading of the document; determining, based on the sequence of measurements, at least one region of the document as being a salient document region; demarcating the salient document region in an electronically displayable manner; and outputting a file that includes a displayable version of the document with the demarcated document region. The salient document region may include a title, a section header, and/or a table. The sensor may be an eye-tracking sensor that detects a sequence of eye-gaze positions on the document as a function of time.
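A sketch of turning a gaze-sample sequence into one demarcated region: bin gaze positions over the page, keep cells with enough dwell, and return their bounding box. The grid size and dwell threshold are illustrative choices, not taken from the patent.

    import numpy as np

    def salient_region(gaze_xy, page_w, page_h, grid=20, min_dwell=3):
        """gaze_xy: list of (x, y) gaze samples in page coordinates.
        Returns (x0, y0, x1, y1) of the densest-read region, or None."""
        hist = np.zeros((grid, grid), dtype=int)
        for x, y in gaze_xy:
            hist[min(int(y / page_h * grid), grid - 1),
                 min(int(x / page_w * grid), grid - 1)] += 1
        ys, xs = np.nonzero(hist >= min_dwell)      # cells the reader dwelt on
        if len(xs) == 0:
            return None
        return (xs.min() * page_w / grid, ys.min() * page_h / grid,
                (xs.max() + 1) * page_w / grid, (ys.max() + 1) * page_h / grid)

    # Simulated reading of a title line near the top of the page.
    gaze = [(100 + 5 * i, 50) for i in range(30)]
    print(salient_region(gaze, page_w=600, page_h=800))   # -> (90.0, 40.0, 270.0, 80.0)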
IMAGING SYSTEM FOR DETECTING HUMAN-OBJECT INTERACTION AND A METHOD FOR DETECTING HUMAN-OBJECT INTERACTION
The present application discloses an imaging system for detecting human-object interaction and a corresponding method for detecting human-object interaction. The imaging system includes an event sensor, an image sensor, and a controller. The event sensor is configured to obtain an event data set of the targeted scene according to variations of light intensity sensed by pixels of the event sensor when an event occurs in the targeted scene. The image sensor is configured to capture a visual image of the targeted scene. The controller is configured to detect a human according to the event data set, trigger the image sensor to capture the visual image when the human is detected, and detect the human-object interaction in the targeted scene according to the visual image and a series of event data sets obtained by the event sensor during the event.
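A sketch of the trigger logic, with a hypothetical event-rate test standing in for the human detector and stub capture/interaction functions; the abstract does not state how the human or the interaction is actually detected.

    import numpy as np

    def human_present(event_data, min_events=500):
        """Placeholder detector: a dense burst of intensity-change events
        is treated as a human entering the scene."""
        return len(event_data) >= min_events

    def monitor(event_stream, capture_frame, detect_interaction):
        event_history = []
        for event_data in event_stream:            # one event data set per time slice
            event_history.append(event_data)
            if human_present(event_data):
                frame = capture_frame()            # trigger the image sensor
                # Fuse the visual image with the event data gathered during the event.
                return detect_interaction(frame, event_history)
        return None

    # Stub sensors for illustration: each row is an (x, y, t, polarity) event.
    stream = [np.zeros((n, 4)) for n in (10, 40, 900)]
    result = monitor(stream,
                     capture_frame=lambda: "visual image",
                     detect_interaction=lambda img, ev: f"interaction found in {img}")
    print(result)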
System and method for reconstructing ECT image
The present disclosure provides a system and method for PET image reconstruction. The method may include processes for obtaining physiological information and/or rigid motion information. The image reconstruction may be performed based on the physiological information and/or rigid motion information.
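A sketch of gated reconstruction in one dimension: keep only counts from a chosen physiological gate, then run a few MLEM iterations against a toy system matrix. The gating signal, system matrix, and sizes are all illustrative, and rigid-motion correction is reduced here to selecting a quiescent gate.

    import numpy as np
    rng = np.random.default_rng(1)

    n_pix, n_det = 16, 32
    A = rng.random((n_det, n_pix))               # toy system (projection) matrix
    truth = np.zeros(n_pix)
    truth[6:10] = 5.0                            # simple 1-D "activity" to recover

    phase = rng.random(2000)                     # physiological phase per event
    p = (A @ truth) / (A @ truth).sum()
    det = rng.choice(n_det, size=2000, p=p)      # detector hit per event

    # Gate: use only events from a quiescent part of the physiological cycle.
    gated = det[phase < 0.3]
    sino = np.bincount(gated, minlength=n_det).astype(float)

    x = np.ones(n_pix)                           # MLEM iterations
    sens = A.sum(axis=0)
    for _ in range(20):
        x *= (A.T @ (sino / np.maximum(A @ x, 1e-9))) / sens
    print(np.round(x, 2))                        # peaks over pixels 6-9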
Deep learning-based feature extraction for LiDAR localization of autonomous driving vehicles
In one embodiment, a method for extracting point cloud features for use in localizing an autonomous driving vehicle (ADV) includes selecting a first set of keypoints from an online point cloud, the online point cloud generated by a LiDAR device on the ADV for a predicted pose of the ADV, and extracting a first set of feature descriptors from the first set of keypoints using a feature learning neural network running on the ADV. The method further includes locating a second set of keypoints on a pre-built point cloud map, each keypoint of the second set corresponding to a keypoint of the first set; extracting a second set of feature descriptors from the pre-built point cloud map; and estimating a position and orientation of the ADV based on the first set of feature descriptors, the second set of feature descriptors, and the predicted pose of the ADV.
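A sketch of the descriptor-matching and pose steps, with simple local point statistics standing in for the learned descriptors and a Kabsch/SVD fit standing in for the patent's estimator; keypoint selection is elided, and the demo uses a pure translation so the toy descriptor remains valid across the two clouds.

    import numpy as np
    rng = np.random.default_rng(2)

    def toy_descriptor(cloud, keypoints, radius=5.0):
        """Stand-in for the feature learning network: each keypoint is described
        by the mean offset and spread of its neighbours within `radius`."""
        feats = []
        for kp in keypoints:
            nb = cloud[np.linalg.norm(cloud - kp, axis=1) < radius]
            feats.append(np.concatenate([nb.mean(axis=0) - kp, nb.std(axis=0)]))
        return np.array(feats)

    def kabsch(src, dst):
        """Rigid transform (R, t) with dst ~= R @ src + t, via SVD."""
        cs, cd = src.mean(axis=0), dst.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                 # repair an improper rotation
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, cd - R @ cs

    map_cloud = rng.random((300, 3)) * 20        # pre-built point cloud map
    true_t = np.array([0.5, -0.2, 0.0])          # ADV offset from the map frame
    online_cloud = map_cloud + true_t            # online LiDAR point cloud

    map_kp = map_cloud[:10]                      # map keypoints (selection elided)
    online_kp = map_kp + true_t                  # corresponding online keypoints

    # Match keypoints by nearest descriptor, then estimate the pose.
    d_map = toy_descriptor(map_cloud, map_kp)
    d_on = toy_descriptor(online_cloud, online_kp)
    match = np.argmin(np.linalg.norm(d_map[:, None] - d_on[None], axis=2), axis=1)
    R, t = kabsch(map_kp, online_kp[match])
    print(np.round(t, 2))                        # -> [ 0.5 -0.2  0. ]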