Patent classifications
G10L15/25
CORRECTING LIP-READING PREDICTIONS
Implementations generally relate to correcting lip-reading predictions. In some implementations, a method includes receiving video input of a user, where the user is talking in the video input. The method further includes predicting one or more words from mouth movement of the user to provide one or more predicted words. The method further includes correcting one or more correction candidate words from the one or more predicted words. The method further includes predicting one or more sentences from the one or more predicted words.
Method and system for recording audio content in a group conversation
A method for recording audio content in a group conversation among a plurality of members includes: controlling an image capturing device to continuously capture images of the members; executing an image processing procedure on the images of the members to determine whether a specific gesture is detected; when the determination is affirmative, controlling an audio recording device to activate and perform directional audio collection with respect to a direction that is associated with the specific gesture to record audio data; and controlling a data storage to store the audio data and a time stamp associated with the audio data as an entry of conversation record.
Automatic Routing Using Search Results
In general, the subject matter described in this specification can be embodied in methods, systems, and program products for providing search results automatically to a user of a computing device. A spoken input provided by a user to a computing device is received. The spoken input is transmitted to a computer server system that is remote from the computing device. Search result information that is responsive to the spoken input is received by the computing device in response to the transmitted spoken input. An alert is provided to the user that the device will connect the user to a target of the search result information if the user does not intervene to stop the connecting of the user. The user is connected to the target of the search result information based on a determination that the user has not intervened to stop the connecting of the user.
SPEECH INSTRUCTION CONTROL METHOD IN VEHICLE CABIN AND RELATED DEVICE
In a method of applying speech control in a vehicle, a control device of the vehicle obtains audio data in the vehicle cabin. The control device recognizes that the audio data includes instruction information, and obtains image data in an instruction information-related event segment in the audio data in the vehicle cabin. The control device obtains, based on the image data, image data of a person in a specific position in the vehicle cabin, and extracts lip motion information of the person from the image data. The control device obtains a matching degree between the lip motion information of the person in the specific position and the instruction information, and then determines, based on the matching degree, where to execute an instruction corresponding to the instruction information.
FOVEATED BEAMFORMING FOR AUGMENTED REALITY DEVICES AND WEARABLES
An augmented reality (AR) device, such as AR glasses, may include a microphone array. The sensitivity of the microphone array can be directed to a target by beamforming, which includes combining the audio of each microphone of the array in a particular way based on a location of the target. The present disclosure describes systems and methods to determine the location of the target based on a gaze of a user and beamform the audio accordingly. This eye-tracked beamforming (i.e., foveated beamforming) can be used by AR applications to enhance sounds from a gaze direction and to suppress sounds from other directions. Additionally, the gaze information can be used to help visualize the results of an AR application, such as speech-to-text.
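As an illustration of what "combining the audio of each microphone of the array in a particular way based on a location of the target" can look like, the following is a minimal sketch of delay-and-sum beamforming steered by a gaze angle. It is not taken from the disclosure: the choice of delay-and-sum, the linear array geometry, the far-field (plane wave) assumption, the whole-sample delays, and the names `steering_delays` and `delay_and_sum` are all assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_delays(mic_positions_m, gaze_angle_rad, sample_rate_hz):
    """Whole-sample delays that time-align a plane wave from the gaze direction.

    Assumptions (hypothetical, for illustration): microphones lie on one axis
    at the given positions, and the angle is measured from broadside
    (0 = straight ahead, positive = toward larger positions).
    """
    raw = [x * math.sin(gaze_angle_rad) / SPEED_OF_SOUND * sample_rate_hz
           for x in mic_positions_m]
    base = min(raw)  # shift so that every delay is non-negative
    return [round(d - base) for d in raw]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay, then average the channels."""
    length = len(channels[0])
    out = [0.0] * length
    for samples, delay in zip(channels, delays):
        for n in range(delay, length):
            out[n] += samples[n - delay]
    return [v / len(channels) for v in out]


# Two microphones 10 cm apart; the source lies along the array axis on the
# side of the second microphone, so that microphone hears the click first.
SAMPLE_RATE_HZ = 34300
mics = [0.0, 0.1]
far_mic = [0.0] * 32
near_mic = [0.0] * 32
near_mic[5] = 1.0   # click reaches the nearer microphone at sample 5
far_mic[15] = 1.0   # and the farther microphone 10 samples later

steered = delay_and_sum([far_mic, near_mic],
                        steering_delays(mics, math.pi / 2, SAMPLE_RATE_HZ))
unsteered = delay_and_sum([far_mic, near_mic],
                          steering_delays(mics, 0.0, SAMPLE_RATE_HZ))
```

With the beam steered toward the source, the two copies of the click line up and add coherently; with the beam pointed straight ahead, they stay 10 samples apart and each is halved by the averaging. A practical system would likely use fractional delays or frequency-domain weights rather than whole-sample shifts.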
VOICE NOTE WITH FACE TRACKING
Methods and systems are disclosed for performing operations for generating a voice note. The operations include receiving, by a messaging application, a request from a first participant to send a voice message to a second participant in a communication session. The operations include, in response to receiving the request, generating an audio file comprising a specified duration of speech input received from the first participant. The operations include associating the audio file with an avatar that represents the first participant. The operations include presenting an interactive visual indicator of the avatar among a plurality of messages in the communication session. The operations include receiving, by the messaging application, input that selects the interactive visual indicator of the avatar. The operations include, in response to receiving the input, rendering an animation of the avatar speaking the speech input while playing the audio file.