IPIQ

G10L15/25

REMOTELESS CONTROL OF DRONE BEHAVIOR

20230047759 · 2023-02-16 ·

A drone system is configured to capture an audio stream that includes voice commands from an operator, to process the audio stream for identification of the voice commands, and to perform operations based on the identified voice commands. The drone system can identify a particular voice stream in the audio stream as an operator voice, and perform the command recognition with respect to the operator voice to the exclusion of other voice streams present in the audio stream. The drone can include a directional camera that is automatically and continuously focused on the operator to capture a video stream usable in disambiguation of different voice streams captured by the drone.

METHOD AND DEVICE FOR GENERATING SPEECH VIDEO ON BASIS OF MACHINE LEARNING

20220358703 · 2022-11-10 ·

A device for generating a speech video may include a first encoder to receive a person background image corresponding to a video part of a speech video of a person and extract an image feature vector from the person background image, a second encoder to receive a speech audio signal corresponding to an audio part of the speech video and extract a voice feature vector from the speech audio signal, a combiner to generate a combined vector by combining the image feature vector output from the first encoder and the voice feature vector output from the second encoder, and a decoder to reconstruct the speech video of the person using the combined vector as an input. The person background image input to the first encoder includes a face and an upper body of the person, with a portion related to speech of the person covered with a mask.

VOICE ACTIVITY DETECTION METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM

20220358929 · 2022-11-10 ·

BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Disclosed are a voice activity detection method and apparatus, an electronic device, and a storage medium, relating to artificial intelligence fields such as deep learning and intelligent voice. The method may include: acquiring time-aligned voice data and video data; performing a first detection of a voice start point and a voice end point of the voice data using a trained voice detection model; performing a second detection of a lip movement start point and a lip movement end point of the video data; and correcting the result of the first detection using the result of the second detection, and taking the corrected result as the voice activity detection result. The disclosed solution may improve the accuracy of the voice activity detection result.
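
The correction step in this abstract can be illustrated with a minimal sketch. The abstract does not state how the lip-movement result corrects the audio result, so the rule below is an assumption: clip the audio-detected segment to the lip-movement segment widened by a tolerance. The names `Segment`, `correct_with_lip_movement`, and `tolerance` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """A detected interval in seconds on the shared (time-aligned) clock."""
    start: float
    end: float


def correct_with_lip_movement(
    voice: Segment,
    lip: Optional[Segment],
    tolerance: float = 0.25,
) -> Optional[Segment]:
    """Assumed correction rule (not taken from the patent text):
    keep only the part of the audio-detected segment that overlaps the
    lip-movement segment, widened by `tolerance` seconds on each side.
    Returns None when no lip movement supports the detected voice.
    """
    if lip is None:
        return None  # assumption: voice without lip movement is treated as noise
    start = max(voice.start, lip.start - tolerance)
    end = min(voice.end, lip.end + tolerance)
    if start >= end:
        return None  # the two detections do not overlap
    return Segment(start, end)


# Usage: the audio model reports 1.0-5.0 s, the lips move during 2.0-4.0 s.
corrected = correct_with_lip_movement(Segment(1.0, 5.0), Segment(2.0, 4.0))
```

Clipping is only one possible policy; a real system might instead adjust each endpoint separately or weight the two detectors by confidence.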

PORTABLE TERMINAL DEVICE AND INFORMATION PROCESSING SYSTEM

20230039067 · 2023-02-09 ·

A portable terminal device in an information processing system and method includes a camera and a microphone. Data of obtained images and voice are transmitted to a server that identifies operations to be executed based on the received voice and image data. The server transmits an identification of one or more results of the plurality of operations to the portable terminal device. When the portable terminal device receives only one result from the server, an operation corresponding to the one result is executed, and when a plurality of results is received, the portable terminal device displays information corresponding to the plurality of results as candidates. Additional voice is captured for selecting one of the plurality of results during the displaying of the information. A determination of one result from the plurality of results is made based on the captured voice, and an operation corresponding to the determined result is executed.
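
The device-side branching described in this abstract (execute directly on a single result, otherwise display candidates and select by a further voice input) might be sketched as follows. This is an illustration under assumptions, not the patented implementation: the callback names `execute`, `display`, and `capture_voice` are hypothetical, and matching the spoken selection by case-insensitive substring is an assumed stand-in for whatever recognition the real system uses.

```python
from typing import Callable, Optional, Sequence


def handle_server_results(
    results: Sequence[str],
    execute: Callable[[str], None],
    display: Callable[[Sequence[str]], None],
    capture_voice: Callable[[], str],
) -> Optional[str]:
    """Run the operation for a single result, or let the user pick by voice."""
    if not results:
        return None
    if len(results) == 1:
        # Only one result came back from the server: execute it directly.
        execute(results[0])
        return results[0]
    # Several results: show them as candidates and listen for a selection.
    display(results)
    spoken = capture_voice().strip().lower()
    for candidate in results:
        # Assumed matching rule: the candidate's label occurs in the utterance.
        if candidate.lower() in spoken:
            execute(candidate)
            return candidate
    return None  # nothing matched; no operation is executed


# Usage with stand-in callbacks that record what happened.
executed: list = []
shown: list = []
chosen = handle_server_results(
    ["Call Alice", "Call Alan"],
    execute=executed.append,
    display=shown.append,
    capture_voice=lambda: "call alan please",
)
```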

Automatic dialing

11494453 · 2022-11-08 ·

Google Llc

In general, the subject matter described in this specification can be embodied in methods, systems, and program products for providing search results automatically to a user of a computing device. A spoken input provided by a user to a computing device is received. The spoken input is transmitted to a computer server system that is remote from the computing device. Search result information that is responsive to the spoken input is received by the computing device in response to the transmitted spoken input. An alert is provided to the user that the device will connect the user to a target of the search result information if the user does not intervene to stop the connection. The user is connected to the target of the search result information based on a determination that the user has not intervened to stop the connection.
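
The alert-then-connect behavior in this abstract can be shown with a short sketch. All names here (`connect_unless_stopped`, `alert`, `user_intervened`, `connect`) are hypothetical, and the intervention check is reduced to a single callback; a real device would presumably wait for a timeout window, which the abstract does not specify.

```python
from typing import Callable


def connect_unless_stopped(
    target: str,
    alert: Callable[[str], None],
    user_intervened: Callable[[], bool],
    connect: Callable[[str], None],
) -> bool:
    """Alert the user, then connect to `target` unless the user intervened.

    Returns True if the connection was made, False if the user stopped it.
    """
    alert(f"Connecting to {target} unless you cancel")
    if user_intervened():
        return False  # the user stopped the automatic connection
    connect(target)
    return True


# Usage with stand-in callbacks that record the alert and the connection.
events: list = []
connected = connect_unless_stopped(
    "555-0100",
    alert=events.append,
    user_intervened=lambda: False,
    connect=events.append,
)
```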

Lip language recognition method and mobile terminal using sound and silent modes

11495231 · 2022-11-08 ·

Beijing BOE Technology Development Co., Ltd.

A lip language recognition method, applied to a mobile terminal having a sound mode and a silent mode, includes: training a deep neural network in the sound mode; collecting a user's lip images in the silent mode; and identifying content corresponding to the user's lip images with the deep neural network trained in the sound mode. The method further includes: switching from the sound mode to the silent mode when a privacy need of the user arises.
