IPIQ

G10L15/25

CORRECTING LIP-READING PREDICTIONS

20230031536 · 2023-02-02 ·

Sony Group Corporation

Implementations generally relate to correcting lip-reading predictions. In some implementations, a method includes receiving video input of a user, where the user is talking in the video input. The method further includes predicting one or more words from mouth movement of the user to provide one or more predicted words. The method further includes correcting one or more correction candidate words from the one or more predicted words. The method further includes predicting one or more sentences from the one or more predicted words.
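
The abstract does not say how correction candidates are selected or how they are corrected. Here is a minimal sketch under stated assumptions. It assumes each predicted word carries a model confidence score. Low-confidence words that are not in a known vocabulary are treated as correction candidates and snapped to the closest vocabulary entry with Python's standard `difflib.get_close_matches`. The vocabulary, the 0.8 threshold, and the helper names `correct_predictions` and `predict_sentence` are hypothetical. The sentence step is a plain join, far simpler than the sentence prediction the abstract describes.

```python
import difflib

# Hypothetical vocabulary of words the lip-reading model is expected to produce.
VOCABULARY = ["turn", "on", "the", "lights", "off", "music", "play", "stop"]


def correct_predictions(predictions, threshold=0.8, vocabulary=VOCABULARY):
    """Replace low-confidence predicted words with their closest vocabulary match.

    `predictions` is a list of (word, confidence) pairs. A word is treated as a
    correction candidate when its confidence is below `threshold` and it is not
    already in the vocabulary (an assumed rule, not one stated in the abstract).
    """
    corrected = []
    for word, confidence in predictions:
        if confidence < threshold and word not in vocabulary:
            # get_close_matches ranks candidates by difflib.SequenceMatcher ratio.
            matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.6)
            if matches:
                word = matches[0]
        corrected.append(word)
    return corrected


def predict_sentence(words):
    """Naive stand-in for sentence prediction: join the corrected words."""
    return " ".join(words)


if __name__ == "__main__":
    predicted = [("turm", 0.41), ("on", 0.93), ("the", 0.97), ("lighs", 0.55)]
    print(predict_sentence(correct_predictions(predicted)))
```

With the sample input above, "turm" and "lighs" are low-confidence, out-of-vocabulary words, so they are corrected to "turn" and "lights".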

Method and system for recording audio content in a group conversation

11488596 · 2022-11-01 ·

Hsiao-Han Chen

A method for recording audio content in a group conversation among a plurality of members includes: controlling an image capturing device to continuously capture images of the members; executing an image processing procedure on the images of the members to determine whether a specific gesture is detected; when the determination is affirmative, controlling an audio recording device to activate and perform directional audio collection with respect to a direction that is associated with the specific gesture to record audio data; and controlling a data storage to store the audio data and a time stamp associated with the audio data as an entry of conversation record.

Automatic Routing Using Search Results

20230082927 · 2023-03-16 ·

Google Llc

In general, the subject matter described in this specification can be embodied in methods, systems, and program products for providing search results automatically to a user of a computing device. A spoken input provided by a user to a computing device is received. The spoken input is transmitted to a computer server system that is remote from the computing device. Search result information that is responsive to the spoken input is received by the computing device in response to the transmitted spoken input. An alert is provided to the user that the device will connect the user to a target of the search result information if the user does not intervene to stop the connection. The user is connected to the target of the search result information based on a determination that the user has not intervened to stop the connection.
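
The "connect unless the user intervenes" behavior amounts to a cancellable countdown. This is a minimal Python sketch. It assumes the user interface sets a `threading.Event` when the user cancels. The `auto_connect` function and its `connect` and `alert` callbacks are hypothetical hooks, not part of the described system.

```python
import threading


def auto_connect(target, connect, alert, cancel_event, timeout_s=5.0):
    """Alert the user, then connect to `target` unless `cancel_event` is set in time.

    `connect` and `alert` are caller-supplied callables (hypothetical hooks).
    Returns True if the connection was made, False if the user intervened.
    """
    alert(f"Connecting you to {target} in {timeout_s:g} s unless you cancel.")
    # Event.wait returns True as soon as the event is set, or False on timeout.
    if cancel_event.wait(timeout=timeout_s):
        return False  # the user intervened, so do not connect
    connect(target)
    return True


if __name__ == "__main__":
    cancel = threading.Event()
    # Simulate a user who cancels 0.2 s into a 1 s countdown.
    threading.Timer(0.2, cancel.set).start()
    connected = auto_connect("example target", print, print, cancel, timeout_s=1.0)
    print("connected:", connected)
```

Waiting on an event, rather than sleeping and then polling a flag, lets a cancellation end the countdown immediately instead of at the next poll.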

SPEECH INSTRUCTION CONTROL METHOD IN VEHICLE CABIN AND RELATED DEVICE

20230129816 · 2023-04-27 ·

Huawei Technologies Co., Ltd.

In a method of applying speech control in a vehicle, a control device of the vehicle obtains audio data in the vehicle cabin. The control device recognizes that the audio data includes instruction information, and obtains image data in an instruction information-related event segment in the audio data in the vehicle cabin. The control device obtains, based on the image data, image data of a person in a specific position in the vehicle cabin, and extracts lip motion information of the person from the image data. The control device obtains a matching degree between the lip motion information of the person in the specific position and the instruction information, and then determines, based on the matching degree, where to execute an instruction corresponding to the instruction information.
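
The abstract does not define the "matching degree". One simple reading is sketched here, and it rests on an assumption: lip motion for each seat and the recognized instruction are each encoded as fixed-length feature vectors of the same size. Under that assumption, the matching degree is their cosine similarity. The instruction is attributed to the best-matching seat only if that score clears a threshold. The feature vectors, the 0.7 threshold, and the names `cosine` and `locate_speaker` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def locate_speaker(lip_features, instruction_features, threshold=0.7):
    """Return the seat whose lip motion best matches the instruction, or None.

    `lip_features` maps a seat name to a hypothetical lip-motion feature vector.
    """
    scores = {seat: cosine(vec, instruction_features) for seat, vec in lip_features.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Reject weak matches so an instruction is not routed to a seat by default.
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    lip = {
        "driver": [0.1, 0.0, 0.2],
        "front_passenger": [0.9, 0.1, 0.4],
        "rear_left": [0.0, 0.0, 0.0],  # nobody speaking
    }
    print(locate_speaker(lip, [0.8, 0.2, 0.5]))
```

In the sample, the front passenger's vector is the closest match, so the instruction would be executed for that position.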

FOVEATED BEAMFORMING FOR AUGMENTED REALITY DEVICES AND WEARABLES

20230071778 · 2023-03-09 ·

An augmented reality (AR) device, such as AR glasses, may include a microphone array. The sensitivity of the microphone array can be directed to a target by beamforming, which includes combining the audio of each microphone of the array in a particular way based on a location of the target. The present disclosure describes systems and methods to determine the location of the target based on a gaze of a user and beamform the audio accordingly. This eye-tracked beamforming (i.e., foveated beamforming) can be used by AR applications to enhance sounds from a gaze direction and to suppress sounds from other directions. Additionally, the gaze information can be used to help visualize the results of an AR application, such as speech-to-text.
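
Delay-and-sum is the textbook form of the beamforming described here. This sketch is not the patent's own beamformer. It assumes a linear microphone array along one axis, a far-field source, and an eye tracker that reports the gaze as an azimuth angle (0 degrees means straight ahead). Under those assumptions it steers the array toward the gaze using whole-sample delays. The function names, the 343 m/s speed of sound, and the array geometry are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 °C


def steering_delays(mic_x, gaze_azimuth_deg, fs):
    """Integer sample offsets k_i for microphones at positions mic_x (metres) on one axis.

    A far-field plane wave from the gaze azimuth reaches the mic at x_i earlier by
    x_i * sin(theta) / c, so channel i observes roughly s[n + k_i].
    """
    s = math.sin(math.radians(gaze_azimuth_deg))
    return [round(x * s * fs / SPEED_OF_SOUND) for x in mic_x]


def delay_and_sum(channels, delays):
    """Align each channel by its steering offset and average: y[n] = mean_i x_i[n - k_i]."""
    n_samples = len(channels[0])
    out = []
    for n in range(n_samples):
        acc = 0.0
        for x, k in zip(channels, delays):
            idx = n - k
            if 0 <= idx < n_samples:  # treat samples outside the buffer as zero
                acc += x[idx]
        out.append(acc / len(channels))
    return out
```

Sound arriving from the gaze direction adds coherently after alignment. Sound from other directions stays misaligned and is attenuated by the averaging. Whole-sample delays keep the example short. A practical implementation would typically use fractional delays or a frequency-domain phase shift so the steering direction is not quantized to the sample grid.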

VOICE NOTE WITH FACE TRACKING

20230127090 · 2023-04-27 ·

Methods and systems are disclosed for performing operations for generating a voice note. The operations include receiving, by a messaging application, a request from a first participant to send a voice message to a second participant in a communication session. The operations include, in response to receiving the request, generating an audio file comprising a specified duration of speech input received from the first participant. The operations include associating the audio file with an avatar that represents the first participant. The operations include presenting an interactive visual indicator of the avatar among a plurality of messages in the communication session. The operations include receiving, by the messaging application, input that selects the interactive visual indicator of the avatar. The operations include, in response to receiving the input, rendering an animation of the avatar speaking the speech input while playing the audio file.
