METHODS AND SYSTEMS FOR CORRECTING, BASED ON SPEECH, INPUT GENERATED USING AUTOMATIC SPEECH RECOGNITION
Methods and systems for correcting, based on subsequent second speech, an error in an input generated from first speech using automatic speech recognition, without an explicit indication in the second speech that a user intended to correct the input with the second speech, include determining that a time difference between when search results in response to the input were displayed and when the second speech was received is less than a threshold time, and based on the determination, correcting the input based on the second speech. The methods and systems also include determining that a difference in acceleration of a user input device, used to input the first speech and second speech, between when the search results in response to the input were displayed and when the second speech was received is less than a threshold acceleration, and based on the determination, correcting the input based on the second speech.
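The two threshold checks this abstract claims (elapsed time since the results were displayed, and change in device acceleration) can be sketched as a single predicate. The threshold values and function names below are illustrative assumptions; the abstract specifies neither.

```python
# Threshold values are illustrative; the patent abstract specifies neither.
TIME_THRESHOLD_S = 5.0
ACCEL_THRESHOLD = 0.5   # change in device acceleration, arbitrary units

def should_treat_as_correction(results_displayed_at, second_speech_at,
                               accel_at_display, accel_at_speech):
    """True when the second speech likely corrects the first input: it arrived
    soon after the search results were displayed, and the input device's
    acceleration barely changed in between (the user had not moved on)."""
    time_diff = second_speech_at - results_displayed_at
    accel_diff = abs(accel_at_speech - accel_at_display)
    return time_diff < TIME_THRESHOLD_S and accel_diff < ACCEL_THRESHOLD

# Speech 2 s after results with the device nearly still -> treat as correction.
print(should_treat_as_correction(100.0, 102.0, 0.10, 0.12))  # True
```

When the predicate holds, the system would re-run recognition treating the second utterance as a replacement for the misrecognized portion of the first.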
PHOTO PROCESSING METHOD AND APPARATUS
The present disclosure provides a photo processing method and an apparatus for grouping photos into photo albums based on facial recognition results. The method includes: performing face detection on multiple photos, to obtain a face image feature set, each face image feature in the face image feature set corresponding to one of the multiple photos; determining a face-level similarity for each pair of face image features in the face image feature set; determining a photo-level similarity between each pair of photos in the multiple photos in accordance with their associated face-level similarities; generating a photo set for each target photo in the multiple photos, wherein any photo-level similarity between the target photo and another photo in the photo set exceeds a predefined photo-level threshold; and generating a label for each photo set using photographing location and photographing time information associated with the photos in the photo set.
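The face-level-to-photo-level aggregation and the thresholded grouping described above might look like the following sketch. The abstract does not specify how face-level similarities combine into a photo-level similarity, so the max-over-pairs rule here is an assumption, as is the toy scalar feature.

```python
def photo_similarity(faces_a, faces_b, face_sim):
    """Photo-level similarity between two photos, taken here as the maximum
    face-level similarity over all pairs of their face features. The abstract
    leaves the aggregation rule unspecified, so the max is an assumption."""
    if not faces_a or not faces_b:
        return 0.0
    return max(face_sim(fa, fb) for fa in faces_a for fb in faces_b)

def photo_set_for(target, photos, faces, face_sim, threshold):
    """All other photos whose photo-level similarity to the target photo
    exceeds the predefined photo-level threshold."""
    return [p for p in photos
            if p != target
            and photo_similarity(faces[target], faces[p], face_sim) > threshold]

# Toy example: one scalar "face feature" per face, similarity = 1 - distance.
face_sim = lambda a, b: 1.0 - abs(a - b)
faces = {"p1": [0.10], "p2": [0.12], "p3": [0.90]}
print(photo_set_for("p1", ["p1", "p2", "p3"], faces, face_sim, 0.9))  # ['p2']
```

The album label would then be derived per photo set from shared location and time metadata (e.g. "Paris, June 2023"), which this sketch omits.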
LEARNING MODEL FOR SALIENT FACIAL REGION DETECTION
One embodiment provides a method comprising receiving a first input image and a second input image. Each input image comprises a facial image of an individual. For each input image, a first set of facial regions of the facial image is distinguished from a second set of facial regions of the facial image based on a learning-based model. The first set of facial regions comprises age-invariant facial features, and the second set of facial regions comprises age-sensitive facial features. The method further comprises determining whether the first input image and the second input image comprise facial images of the same individual by performing face verification based on the first set of facial regions of each input image.
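The verification step, restricted to the age-invariant region set, can be sketched as follows. Cosine similarity stands in for the learned model's comparison; the region names, feature vectors, and threshold are all illustrative assumptions.

```python
import math

def verify_same_person(regions_a, regions_b, threshold=0.8):
    """Face verification using only the first region set (age-invariant
    features) and ignoring age-sensitive regions. Cosine similarity is a
    stand-in for the learned model's comparison; region names and the
    threshold are illustrative."""
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
        return dot / norm
    shared = [r for r in regions_a if r in regions_b]
    sims = [cosine(regions_a[r], regions_b[r]) for r in shared]
    return sum(sims) / len(sims) >= threshold

# Two images whose age-invariant regions (e.g. around the eyes) match closely.
a = {"eyes": [1.0, 0.0], "nose_bridge": [0.0, 1.0]}
b = {"eyes": [1.0, 0.0], "nose_bridge": [0.0, 1.0]}
print(verify_same_person(a, b))  # True
```

Excluding the age-sensitive set is the point of the design: features that drift with age (skin texture, jawline) would otherwise dominate the distance between photos taken years apart.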
SYSTEMS AND METHODS FOR DISPLAYING SUBJECTS OF A VIDEO PORTION OF CONTENT
Systems and methods are described herein for displaying subjects of a portion of content. Media data of content is analyzed during playback, and a number of action signatures are identified. Each action signature is associated with a particular subject within the content. The action signature is stored, along with a timestamp corresponding to a playback position at which the action signature begins, in association with an identifier of the particular subject. Upon receiving a command, icons representing each of a number of action signatures at or near the current playback position are displayed. Upon receiving user selection of an icon corresponding to a particular signature, a portion of the content corresponding to the action signature is played back.
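The store-and-lookup behavior described above (signature plus start timestamp plus subject identifier, queried at or near the current playback position) could be organized as a small index. Class and method names are illustrative; the abstract defines no API.

```python
class ActionSignatureIndex:
    """Stores each action signature with the playback timestamp at which it
    begins and the identifier of its subject, then answers "which signatures
    are at or near this playback position?". Names are illustrative; the
    abstract defines no API."""
    def __init__(self):
        self._entries = []  # (timestamp_seconds, subject_id, signature)

    def add(self, timestamp, subject_id, signature):
        self._entries.append((timestamp, subject_id, signature))
        self._entries.sort()

    def near(self, position, window=10.0):
        """Signatures whose start lies within `window` seconds of `position`."""
        return [e for e in self._entries if abs(e[0] - position) <= window]

idx = ActionSignatureIndex()
idx.add(12.0, "player-7", "goal-celebration")
idx.add(95.0, "player-3", "slide-tackle")
print(idx.near(15.0))  # [(12.0, 'player-7', 'goal-celebration')]
```

Each entry returned by `near` would back one on-screen icon; selecting the icon seeks playback to the entry's timestamp.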
Personal computing device control using face detection and recognition
Systems and methods are provided for control of a personal computing device based on user face detection and recognition techniques.
Facial Image-Processing Method and System Thereof
A facial image-processing method includes: transforming a facial image in a template with a 2D Fourier transformation (FT) to obtain 2D FT data of the color channels of the facial image and 2D FT data of the template, and computing first light intensities of the color channels and a second light intensity of the template from the 2D FT data; computing an intensity mean value and an intensity maximum in each of the color channels; processing the first light intensities and the second light intensity with singular value decomposition (SVD) to obtain intensity-spectrum SVD matrixes and a template SVD matrix; computing a compensation weight coefficient for each color channel from the intensity mean value, the intensity maximum, and the SV maximums of the intensity-spectrum SVD matrixes and the template SVD matrix; and compensating the facial image with the compensation weight coefficients to obtain a compensated facial image.
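The pipeline above (2D FT per channel, SVD of the intensity spectra, per-channel weight from the intensity statistics and the largest singular values) can be sketched with NumPy. The abstract names the inputs to the weight but not the formula, so the particular combination below (mean/max scaled by the template-to-channel singular-value ratio) is purely an assumption.

```python
import numpy as np

def compensation_weights(image, template):
    """Per-channel compensation weights following the abstract's recipe:
    2D FT -> light-intensity spectra -> SVD -> weight from the intensity
    mean, intensity maximum, and largest singular values. The exact formula
    (mean/max scaled by the template/channel singular-value ratio) is an
    assumption; the abstract gives none."""
    sv_template = np.linalg.svd(np.abs(np.fft.fft2(template)),
                                compute_uv=False)[0]
    weights = []
    for c in range(image.shape[2]):
        channel = image[:, :, c].astype(float)
        spectrum = np.abs(np.fft.fft2(channel))   # light intensity of channel
        sv_channel = np.linalg.svd(spectrum, compute_uv=False)[0]
        weights.append((channel.mean() / channel.max())
                       * (sv_template / sv_channel))
    return np.array(weights)

def compensate(image, template):
    """Scale each color channel by its compensation weight."""
    return np.clip(image * compensation_weights(image, template), 0, 255)

# A uniform image against a matching template needs no correction: weights ~ 1.
img = np.full((4, 4, 3), 100.0)
tpl = np.full((4, 4), 100.0)
print(compensation_weights(img, tpl))  # [1. 1. 1.]
```

A channel darker than the template yields a larger weight, brightening that channel toward the template's illumination, which is the stated goal of the compensation.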
Car Onboard Speech Processing Device
The present invention provides a speech processing device with which it is possible to achieve smooth communication between the passengers of a host vehicle and the passengers of a desired vehicle. In a communication system according to the present invention, a first communication device 10 transmits the position of a first vehicle Mc, the speech of a speaker 601, and a direction d of utterance to multiple unspecified second vehicles Mr in the surroundings of the first vehicle Mc. A second communication device 10 processes the speech in a sound field formed inside the second vehicles Mr by a speaker array comprising a plurality of speakers 41 so that the virtual sound source of the speech is formed in the direction of the position of the first vehicle Mc, and outputs the processed speech using the speaker array at a sound volume calculated on the basis of the position of the first vehicle Mc, the positions of the second vehicles Mr, and the direction d of utterance of the speaker in the first vehicle Mc.
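The volume calculation names its inputs (both vehicle positions and the utterance direction d) but not a formula, so a sketch has to assume one. Below, volume falls off with inter-vehicle distance and with the angle between the utterance direction and the bearing to the receiving vehicle; both the inverse-square falloff and the cosine weighting are assumptions.

```python
import math

def playback_volume(src_pos, dst_pos, utter_dir_rad, base_gain=1.0):
    """Sketch of the volume rule: attenuate with the distance between the
    first and second vehicles, and with the angle between the speaker's
    utterance direction and the bearing to the receiver. The inverse-square
    falloff and cosine weighting are assumptions; the abstract gives no
    formula."""
    dx, dy = dst_pos[0] - src_pos[0], dst_pos[1] - src_pos[1]
    dist = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    # Zero volume when the receiver is behind the direction of utterance.
    directional = max(0.0, math.cos(bearing - utter_dir_rad))
    return base_gain * directional / max(dist, 1.0) ** 2

# Receiver 10 m away, directly in the utterance direction.
print(playback_volume((0.0, 0.0), (10.0, 0.0), 0.0))  # 0.01
```

The virtual-sound-source placement itself (steering the speaker array so the speech appears to come from the first vehicle's direction) would be handled separately by per-speaker delay and gain, which this sketch does not model.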
Systems and methods for 3D facial modeling
In an embodiment, a 3D facial modeling system includes a plurality of cameras configured to capture images from different viewpoints, a processor, and a memory containing a 3D facial modeling application and parameters defining a face detector, wherein the 3D facial modeling application directs the processor to obtain a plurality of images of a face captured from different viewpoints using the plurality of cameras, locate a face within each of the plurality of images using the face detector, wherein the face detector labels key feature points on the located face within each of the plurality of images, determine disparity between corresponding key feature points of located faces within the plurality of images, and generate a 3D model of the face using the depth of the key feature points.
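The disparity-to-depth step is classical stereo triangulation; a minimal sketch for one matched key feature point in a rectified two-camera pair follows. The function names and the calibration numbers in the example are illustrative, not from the patent.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Classical rectified-stereo triangulation: depth = f * B / disparity,
    applied to a key feature point matched across two views."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity or mismatch")
    return focal_px * baseline_m / disparity

def keypoint_to_3d(x, y, depth, focal_px, cx, cy):
    """Back-project a pixel with known depth into camera coordinates
    (pinhole model; cx, cy is the principal point)."""
    return ((x - cx) * depth / focal_px,
            (y - cy) * depth / focal_px,
            depth)

# Illustrative calibration: 800 px focal length, 10 cm baseline.
z = depth_from_disparity(400.0, 390.0, 800.0, 0.1)
print(z)                                          # 8.0 (meters)
print(keypoint_to_3d(400.0, 300.0, z, 800.0, 320.0, 240.0))  # (0.8, 0.6, 8.0)
```

Repeating this for every labeled key feature point (eye corners, nose tip, mouth corners, and so on) yields the sparse 3D point set from which the face model is generated.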
METHOD AND SYSTEM OF FACIAL EXPRESSION RECOGNITION USING LINEAR RELATIONSHIPS WITHIN LANDMARK SUBSETS
A system, article, and method to provide facial expression recognition using linear relationships within landmark subsets.