Patent classifications
G06V10/449
EFFICIENT DATA LAYOUTS FOR CONVOLUTIONAL NEURAL NETWORKS
Systems and methods for efficient implementation of a convolutional layer of a convolutional neural network are disclosed. In one aspect, weight values of kernels in a kernel stack of a convolutional layer can be reordered into a tile layout with tiles of runnels. Pixel values of input activation maps of the convolutional layer can be reordered into an interleaved layout comprising a plurality of clusters of input activation map pixels. The output activation maps can then be determined tile by tile using the clusters of input activation map pixels and the kernel tiles.
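The interleaved input layout described above can be illustrated with a minimal sketch: per-channel activation maps are reordered so that each "cluster" holds, for one pixel position, that pixel's value from every channel. The function name and nested-list representation are assumptions for illustration, not the patented layout.

```python
def interleave_activation_maps(maps):
    """Reorder per-channel activation maps (a C x H x W nested list) into an
    interleaved layout: one cluster per pixel position, holding that pixel's
    value from every channel. Hypothetical helper, not the patented format."""
    height, width = len(maps[0]), len(maps[0][0])
    return [[m[r][c] for m in maps]
            for r in range(height) for c in range(width)]
```

A layout like this lets a convolution kernel read all channel values for a pixel from contiguous memory, which is the kind of access pattern the tile-by-tile processing benefits from.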
Object detection and classification
Object detection and classification across disparate fields of view are provided. A first image generated by a first recording device with a first field of view, and a second image generated by a second recording device with a second field of view can be obtained. An object detection component can detect a first object within the first field of view, and a second object within the second field of view. An object classification component can determine first and second level classification categories of the first object. A data processing system can create a data structure indicating a probability identifier for a descriptor of the first object. An object matching component can correlate the first object with the second object based on the descriptor of the first object, the probability identifier for the descriptor of the first object, or a descriptor of the second object.
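The cross-camera correlation step described above can be sketched as a descriptor comparison weighted by the first object's probability identifier. The cosine-similarity metric, the weighting, and the threshold below are assumptions for illustration, not the patented matching rule.

```python
def correlate(desc_a, prob_a, desc_b, threshold=0.7):
    """Sketch of cross-field-of-view object matching: compare the first
    object's descriptor with the second's by cosine similarity, weighted by
    the probability identifier attached to the first descriptor. All
    specifics (metric, weighting, threshold) are illustrative assumptions."""
    dot = sum(a * b for a, b in zip(desc_a, desc_b))
    norm = (sum(a * a for a in desc_a) ** 0.5) * \
           (sum(b * b for b in desc_b) ** 0.5)
    similarity = dot / norm if norm else 0.0
    return prob_a * similarity >= threshold
```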
LOW-COST FACE RECOGNITION USING GAUSSIAN RECEPTIVE FIELD FEATURES
Methods and systems may provide for facial recognition of at least one input image utilizing hierarchical feature learning and pair-wise classification. Receptive field theory may be used on the input image to generate a pre-processed multi-channel image. Channels in the pre-processed image may be activated based on the amount of feature rich details within the channels. Similarly, local patches may be activated based on the discriminant features within the local patches. Features may be extracted from the local patches and the most discriminant features may be selected in order to perform feature matching on pair sets. The system may utilize patch feature pooling, pair-wise matching, and large-scale training in order to quickly and accurately perform facial recognition at a low cost for both system memory and computation.
Measuring cervical spine posture using nostril tracking
A method for detecting deviation from a preferred cervical spine posture when using a mobile device is disclosed. The mobile device uses a front-facing camera to capture images of the user and applies a nostril tracking algorithm to the images. The nostril tracking algorithm is used in real time to measure displacement of the user's nostrils and correlate the nostril displacement to a cervical spine flexion angle. The user's cervical spine flexion angle is communicated using an alarm device, such as a row of lights, which allows the user to monitor and correct their posture and avoid potential injury.
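The displacement-to-angle correlation can be sketched with a simple small-angle geometry: vertical nostril displacement in the image, divided by an assumed face-to-camera distance expressed in pixels, gives a flexion angle. This calibration model is an illustrative assumption, not the patent's correlation.

```python
import math

def flexion_angle(baseline_y, current_y, camera_distance_px):
    """Estimate a cervical flexion angle (degrees) from vertical nostril
    displacement in image coordinates. camera_distance_px is an assumed
    calibration constant (face-to-camera distance in pixel units); the
    arctangent mapping is a simple geometric sketch, not the patented one."""
    displacement = current_y - baseline_y
    return math.degrees(math.atan2(displacement, camera_distance_px))
```

In a real system the alarm device would compare this angle against a threshold and light progressively more of the row of lights as flexion increases.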
GENERATING NUMERIC EMBEDDINGS OF IMAGES
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating numeric embeddings of images. One of the methods includes obtaining training images; generating a plurality of triplets of training images; and training a neural network on each of the triplets to determine trained values of a plurality of parameters of the neural network, wherein training the neural network comprises, for each of the triplets: processing the anchor image in the triplet using the neural network to generate a numeric embedding of the anchor image; processing the positive image in the triplet using the neural network to generate a numeric embedding of the positive image; processing the negative image in the triplet using the neural network to generate a numeric embedding of the negative image; computing a triplet loss; and adjusting the current values of the parameters of the neural network using the triplet loss.
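The triplet loss computed in the training loop above is the standard hinge formulation: the anchor-positive distance should be smaller than the anchor-negative distance by at least a margin. The squared-Euclidean distance and margin value below are conventional choices, assumed here for illustration.

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet (hinge) loss on numeric embeddings: penalize triplets
    where the anchor is not closer to the positive than to the negative by
    at least `margin`. Distance metric and margin are assumed defaults."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin,
               0.0)
```

A zero loss means the triplet is already satisfied and contributes no gradient; a positive loss drives the parameter adjustment step described in the abstract.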
Sensor noise profile
The invention relates to a feature extraction technique based on edge extraction. It can be used in computer vision systems, including image, facial, and object recognition systems, as well as scene interpretation, classification, and captioning systems. A model, or profile, of the sensor's noise is used to improve feature extraction or object detection on an image from that sensor.
OBJECT RECOGNITION BASED ON BOOSTING BINARY CONVOLUTIONAL NEURAL NETWORK FEATURES
Techniques related to implementing convolutional neural networks for object recognition are discussed. Such techniques may include generating a set of binary neural features via convolutional neural network layers based on input image data and applying a strong classifier to the set of binary neural features to generate an object label for the input image data.
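The strong-classifier stage over binary features can be sketched in the usual boosting form: each binary feature casts a weighted vote, and the sign of the weighted sum yields the object label. The vote weights and threshold below are illustrative assumptions, not values from the patent.

```python
def strong_classifier(binary_features, alphas, threshold=0.0):
    """AdaBoost-style strong classifier over binary neural features: each
    feature in {0, 1} votes with weight alpha (mapped to +/-1), and the sign
    of the weighted sum gives the label. Weights/threshold are illustrative."""
    score = sum(a * (1 if f else -1)
                for f, a in zip(binary_features, alphas))
    return 1 if score > threshold else 0
```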
Low-power always-on face detection, tracking, recognition and/or analysis using events-based vision sensor
Techniques disclosed herein utilize a vision sensor that integrates a special-purpose camera with dedicated computer vision (CV) computation hardware and a dedicated low-power microprocessor for the purposes of detecting, tracking, recognizing, and/or analyzing subjects, objects, and scenes in the view of the camera. The vision sensor processes the information retrieved from the camera using the included low-power microprocessor and sends events (or indications that one or more reference occurrences have occurred, and, possibly, associated data) to the main processor only when needed or as defined and configured by the application. This allows the general-purpose microprocessor (which is typically relatively high-speed and high-power to support a variety of applications) to stay in a low-power state (e.g., sleep mode) most of the time, as is conventional, becoming active only when events are received from the vision sensor.
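The event-gating behavior described above can be sketched as a filter run on the sensor's low-power microprocessor: per-frame detections are compared against the application-configured reference occurrences, and an event is emitted (waking the main processor) only on a match. All names and the data representation are illustrative assumptions.

```python
def gate_events(frames_detections, configured):
    """Sketch of event gating on a low-power vision sensor: inspect each
    frame's detection labels and emit an event for the main processor only
    when a configured reference occurrence (e.g. 'face_detected') appears.
    Labels and structure are hypothetical, for illustration only."""
    events = []
    for frame_index, detections in enumerate(frames_detections):
        hits = [d for d in detections if d in configured]
        if hits:
            # The main (general-purpose) processor would be woken here.
            events.append((frame_index, hits))
    return events
```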
Method and Processing Unit for Correlating Image Data Content from Disparate Sources
A signal processing appliance is disclosed that will simultaneously process the image data sets from disparate types of imaging sensors and data sets taken by them under varying conditions of viewing geometry, environmental conditions, lighting conditions, and at different times. Processing techniques that emulate how the human visual path processes and exploits data are implemented. The salient spatial, temporal, and color features of observed objects are calculated and cross-correlated over the disparate sensors and data sets to enable improved object association, classification and recognition. The appliance uses unique signal processing devices and architectures to enable near real-time processing.
VISUAL MAPPING METHOD, AND COMPUTER PROGRAM RECORDED ON RECORDING MEDIUM FOR EXECUTING METHOD THEREFOR
A visual mapping method is proposed for generating a feature map by mapping feature points of an image captured by a camera onto point cloud data acquired by a lidar. The method may include generating, by a data generator, a first feature map based on point cloud data obtained from a lidar and an image captured by a camera, and generating, by the data generator, a third feature map by mapping the first feature map onto a second feature map generated from pre-stored point cloud data. The present method is a technology developed with support from the Ministry of Trade, Industry and Energy/Korea Planning and Evaluation Institute of Industrial Technology (Project No. 201792/Business name-Excellent enterprise research center promotion project (ATC+)/Project name-Development of real-time risk detection and mapping solution based on 3D scanning technology to ensure safety in autonomous driving).