Patent classifications
G06V10/464
SYSTEM AND METHOD FOR NEURAL NETWORKS
A method for a neural network includes receiving an input from a vector of inputs, determining a table index based on the input, and retrieving a hash table from a plurality of hash tables, wherein the hash table corresponds to the table index. The method also includes determining an entry index of the hash table based on an index matrix, wherein the index matrix includes one or more index values, and each of the one or more index values corresponds to a vector in the hash table and determining an entry value in the hash table corresponding to the entry index. The method also includes determining a value index, wherein the vector in the hash table includes one or more entry values, and wherein the value index corresponds to one of the one or more entry values in the vector and determining a layer response.
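The lookup sequence in the claim above — quantize an input to a table index, select an entry via an index matrix, pick one entry value, and accumulate a layer response — can be sketched as follows. This is a minimal illustration, not the patented implementation; the class name, the binning-based `quantize` step, and the positional choice of value index are all assumptions.

```python
# Illustrative sketch of a table-lookup layer: names and the simple
# binning quantizer are assumptions, not taken from the patent.

class TableLookupLayer:
    def __init__(self, tables, index_matrix):
        self.tables = tables              # list of hash tables: {entry_index: vector}
        self.index_matrix = index_matrix  # input position -> entry index

    def quantize(self, x, num_tables):
        # Map a real-valued input in [0, 1] to a table index by binning.
        return min(int(x * num_tables), num_tables - 1)

    def forward(self, inputs):
        # Layer response: accumulate one looked-up entry value per input.
        response = 0.0
        for pos, x in enumerate(inputs):
            table = self.tables[self.quantize(x, len(self.tables))]
            entry = table[self.index_matrix[pos]]  # vector of entry values
            value_index = pos % len(entry)         # choose one entry value
            response += entry[value_index]
        return response
```

The point of such a structure is that a forward pass becomes table lookups and additions rather than multiply-accumulate operations.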
TRAINING METHOD, TRAINING APPARATUS, REGION CLASSIFIER, AND NON-TRANSITORY COMPUTER READABLE MEDIUM
A region classifier training method includes generating a first network which outputs a saliency map with respect to an input image; generating superpixels of the input image; generating a weak segmentation for extracting a target region based on the saliency map and the superpixels; and training and generating a second network being a region classifier which classifies the target region when the input image is input, by using the weak segmentation as supervised data.
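The weak-segmentation step above can be sketched in a few lines, assuming the saliency map and the superpixel label map are per-pixel arrays of the same shape. The thresholding rule and function name are illustrative assumptions, not the patent's specification.

```python
# Sketch of weak segmentation from a saliency map plus superpixels:
# a superpixel is kept as part of the target region when its mean
# saliency exceeds a threshold (threshold value is an assumption).

def weak_segmentation(saliency, superpixels, threshold=0.5):
    sums, counts = {}, {}
    for s_row, p_row in zip(saliency, superpixels):
        for s, p in zip(s_row, p_row):
            sums[p] = sums.get(p, 0.0) + s
            counts[p] = counts.get(p, 0) + 1
    kept = {p for p in sums if sums[p] / counts[p] > threshold}
    # Binary mask usable as supervised data for the second network.
    return [[1 if p in kept else 0 for p in row] for row in superpixels]
```

Averaging saliency over superpixels rather than thresholding pixels directly is what makes the pseudo-labels respect object boundaries.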
Three-dimensional facial recognition method and system
The present disclosure provides a three-dimensional facial recognition method and system. The method includes: performing pose estimation on an input binocular vision image pair by using a three-dimensional facial reference model, to obtain a pose parameter and a virtual image pair of the three-dimensional facial reference model with respect to the binocular vision image pair; reconstructing a facial depth image of the binocular vision image pair by using the virtual image pair as prior information; detecting, according to the pose parameter, a local grid scale-invariant feature descriptor corresponding to an interest point in the facial depth image; and generating a recognition result of the binocular vision image pair according to the detected local grid scale-invariant feature descriptor and training data having attached category annotations. The present disclosure can reduce computational costs and required storage space.
SYSTEMS AND METHODS FOR CLUSTERING OF NEAR-DUPLICATE IMAGES IN VERY LARGE IMAGE COLLECTIONS
Detection of near-duplicate images is important for detecting the reuse of copyrighted material. Some applications require the clustering of near-duplicates instead of the comparison to an original. Representing images as bags of visual words is the first step for our clustering approach. An inverted index points from visual words to all the images containing that visual word. In the next step, matches are geometrically verified in pairs of images that share a large fraction of their visual words. Geometric verification may use affine, perspective, or other transformations. The verification step provides a similarity measure based on the fraction of the matching image points and on their distributions in the compared images. The resulting distance matrix is very sparse because most images in the collection are not compared to each other. This distance matrix is used as input for a modified agglomerative hierarchical clustering approach that can handle a sparse distance matrix.
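The inverted-index candidate-generation step described above can be sketched as follows, assuming each image is represented as a set of visual-word ids. Function and parameter names are illustrative; geometric verification itself is out of scope here.

```python
from collections import defaultdict
from itertools import combinations

# Sketch of candidate-pair generation via an inverted index. Only pairs
# sharing a large fraction of visual words go on to geometric
# verification; all other pairs are never compared, which is what keeps
# the resulting distance matrix sparse.

def candidate_pairs(bags, min_shared_fraction=0.5):
    inverted = defaultdict(set)          # visual word -> images containing it
    for img, words in bags.items():
        for w in words:
            inverted[w].add(img)

    shared = defaultdict(int)            # (img_a, img_b) -> shared-word count
    for imgs in inverted.values():
        for a, b in combinations(sorted(imgs), 2):
            shared[(a, b)] += 1

    pairs = {}
    for (a, b), n in shared.items():
        frac = n / min(len(bags[a]), len(bags[b]))
        if frac >= min_shared_fraction:
            pairs[(a, b)] = frac         # pre-verification similarity
    return pairs
```

The sparse pair set returned here would then feed geometric verification, and the surviving similarities form the sparse distance matrix for agglomerative clustering.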
Multimodal and real-time method for filtering sensitive media
A multimodal and real-time method for filtering sensitive content, receiving as input a digital video stream, the method including segmenting the digital video into video fragments along the video timeline; extracting features containing significant information from the digital video input on sensitive media; reducing the semantic gap between the low-level video features and the high-level sensitive concept; classifying the video fragments, generating a high-level label (positive or negative) with a confidence score for each fragment representation; performing high-level fusion to properly combine the possible high-level labels and confidence scores for each fragment; and predicting the sensitive moments by combining the labels of the fragments along the video timeline, indicating the moments when the content becomes sensitive.
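The final fusion-and-prediction step above can be sketched as follows, assuming each fragment has already been classified into a (label, confidence) pair. The tuple layout, confidence threshold, and function name are assumptions for illustration.

```python
# Sketch of timeline fusion: merge confidently-positive fragments into
# contiguous sensitive intervals. Field order and threshold are
# illustrative assumptions, not the patented method's exact form.

def sensitive_moments(fragments, min_confidence=0.5):
    """fragments: list of (start_sec, end_sec, label, confidence), where
    label is 'positive' (sensitive) or 'negative'. Returns merged
    [start, end] intervals flagged as sensitive."""
    intervals = []
    for start, end, label, conf in sorted(fragments):
        if label != "positive" or conf < min_confidence:
            continue
        if intervals and start <= intervals[-1][1]:   # contiguous or overlapping
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            intervals.append([start, end])
    return intervals
```

Merging adjacent positive fragments is what turns per-fragment labels into the "moments when the content becomes sensitive" reported to the user.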
Methods and arrangements for identifying objects
In some arrangements, product packaging is digitally watermarked over most of its extent to facilitate high-throughput item identification at retail checkouts. Imagery captured by conventional or plenoptic cameras can be processed (e.g., by GPUs) to derive several different perspective-transformed views, further minimizing the need to manually reposition items for identification. Crinkles and other deformations in product packaging can be optically sensed, allowing such surfaces to be virtually flattened to aid identification. Piles of items can be 3D-modelled and virtually segmented into geometric primitives to aid identification, and to discover locations of obscured items. Other data (e.g., including data from sensors in aisles, shelves and carts, and gaze tracking for clues about visual saliency) can be used in assessing identification hypotheses about an item. A great variety of other features and arrangements are also detailed.
Support vector machine adapted sign language classification method
A sign language recognizer is configured to detect interest points in an extracted sign language feature, wherein the interest points are localized in space and time in each image acquired from a plurality of frames of a sign language video; apply a filter to determine one or more extrema of a central region of the interest points; associate features with each interest point using a neighboring pixel function; cluster a group of extracted sign language features from the images based on a similarity between the extracted sign language features; represent each image by a histogram of visual words corresponding to the respective image to generate a code book; train a classifier to classify each extracted sign language feature using the code book; detect a posture in each frame of the sign language video using the trained classifier; and construct a sign gesture based on the detected postures.
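The codebook and histogram-of-visual-words steps shared by this claim (and the related patents below, which carry the same abstract) can be sketched as follows, assuming descriptors are short feature vectors already extracted at the interest points. In practice the codebook would be learned by clustering (e.g., k-means); here it is passed in as fixed centers, and the function names are illustrative.

```python
# Sketch of the bag-of-visual-words representation: each descriptor is
# assigned to its nearest codebook center ("visual word"), and an image
# becomes the histogram of its words. Codebook learning is omitted.

def nearest_word(desc, codebook):
    # Index of the codebook center with smallest squared distance.
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(desc, codebook[i])))

def bow_histogram(descriptors, codebook):
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    return hist
```

The fixed-length histogram is what allows a conventional classifier (an SVM in the first patent's title) to be trained on variable numbers of interest points per frame.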
AUTOMATED SIGN LANGUAGE RECOGNITION METHOD
A sign language recognizer is configured to detect interest points in an extracted sign language feature, wherein the interest points are localized in space and time in each image acquired from a plurality of frames of a sign language video; apply a filter to determine one or more extrema of a central region of the interest points; associate features with each interest point using a neighboring pixel function; cluster a group of extracted sign language features from the images based on a similarity between the extracted sign language features; represent each image by a histogram of visual words corresponding to the respective image to generate a code book; train a classifier to classify each extracted sign language feature using the code book; detect a posture in each frame of the sign language video using the trained classifier; and construct a sign gesture based on the detected postures.
Deformable-Surface Tracking Based Augmented Reality Image Generation
There are provided systems and methods for performing deformable-surface tracking based augmented reality image generation. In one implementation, such a system includes a hardware processor and a system memory storing an augmented reality three-dimensional image generator. The hardware processor is configured to execute the augmented reality three-dimensional image generator to receive image data corresponding to a two-dimensional surface, and to identify an image template corresponding to the two-dimensional surface based on the image data. In addition, the hardware processor is configured to execute the augmented reality three-dimensional image generator to determine a surface deformation of the two-dimensional surface. The hardware processor is further configured to execute the augmented reality three-dimensional image generator to generate an augmented reality three-dimensional image including at least one feature of the two-dimensional surface, based on the image template and the surface deformation of the two-dimensional surface.
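The generation step above can be sketched minimally, assuming the image template is a 2-D grid of control points and the determined surface deformation is a per-point 3-D displacement. Rendering is out of scope; the function name and data layout are assumptions.

```python
# Sketch of lifting a 2-D template into a deformed 3-D surface: each
# template point is offset by its estimated deformation. The actual
# system's deformation model is not specified in the abstract.

def deform_template(template_points, deformation):
    """template_points: list of (x, y); deformation: list of (dx, dy, dz).
    Returns deformed 3-D points (x + dx, y + dy, dz)."""
    return [(x + dx, y + dy, dz)
            for (x, y), (dx, dy, dz) in zip(template_points, deformation)]
```

An AR renderer would then texture this deformed geometry with features from the matched image template, so the generated content follows the surface's deformation.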
SIGN LANGUAGE METHOD USING CLUSTERING
A sign language recognizer is configured to detect interest points in an extracted sign language feature, wherein the interest points are localized in space and time in each image acquired from a plurality of frames of a sign language video; apply a filter to determine one or more extrema of a central region of the interest points; associate features with each interest point using a neighboring pixel function; cluster a group of extracted sign language features from the images based on a similarity between the extracted sign language features; represent each image by a histogram of visual words corresponding to the respective image to generate a code book; train a classifier to classify each extracted sign language feature using the code book; detect a posture in each frame of the sign language video using the trained classifier; and construct a sign gesture based on the detected postures.