G10L17/02

Securely executing voice actions with speaker identification and authorization code
11665543 · 2023-05-30

In some implementations, (i) audio data representing a voice command spoken by a speaker and (ii) a speaker identification result indicating that the voice command was spoken by the speaker are obtained. A voice action is selected based at least on a transcription of the audio data. A service provider corresponding to the selected voice action is selected from among a plurality of different service providers. One or more input data types that the selected service provider uses to perform authentication for the selected voice action are identified. A request to perform the selected voice action and one or more values that correspond to the identified one or more input data types are provided to the service provider.
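
The flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the keyword table `ACTIONS`, the registry `PROVIDERS`, the `Provider` type, and the input type names `speaker_id` and `authorization_code` are all hypothetical, and keyword matching stands in for whatever action selection the patent actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    required_inputs: tuple  # input data types this provider uses for authentication


# Hypothetical lookup tables, for illustration only.
ACTIONS = {"pay": "transfer_money", "unlock": "unlock_door"}
PROVIDERS = {
    "transfer_money": Provider("bank", ("speaker_id", "authorization_code")),
    "unlock_door": Provider("smart_home", ("speaker_id",)),
}


def build_request(transcription, speaker_id, available_values):
    """Select an action and provider, then attach the values the provider needs."""
    text = transcription.lower()
    # Select a voice action from the transcription (naive keyword match).
    action = next((a for kw, a in ACTIONS.items() if kw in text), None)
    if action is None:
        raise ValueError("no voice action matches the transcription")
    # Select the service provider that corresponds to the action.
    provider = PROVIDERS[action]
    # Identify the input data types the provider uses and collect their values.
    values = dict(available_values, speaker_id=speaker_id)
    missing = [t for t in provider.required_inputs if t not in values]
    if missing:
        raise KeyError(f"missing input data types: {missing}")
    return {
        "provider": provider.name,
        "action": action,
        "auth": {t: values[t] for t in provider.required_inputs},
    }
```

Calling `build_request("Pay Alice ten dollars", "spk-1", {"authorization_code": "1234"})` would yield a request for the hypothetical `bank` provider carrying both the speaker identification result and the authorization code, while an unlock command would carry only the speaker identification result.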

Method for searching for contents having same voice as voice of target speaker, and apparatus for executing same
11664015 · 2023-05-30

A method for searching, from among a plurality of contents, for content having the same voice as a voice of a target speaker includes: extracting a speaker feature vector corresponding to the voice of the target speaker; repeatedly selecting a subset of speakers from a training dataset, a predetermined number of times; generating a linear discriminant analysis (LDA) transformation matrix for each of the selected subsets of speakers; projecting the extracted speaker feature vector onto each selected subset of speakers using the corresponding LDA transformation matrix; assigning, to each projection region of the extracted speaker feature vector, a value corresponding to the nearest speaker class in the corresponding subset of speakers; generating a hash value corresponding to the extracted feature vector based on the assigned values; and searching the contents for content having a hash value similar to the generated hash value.
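
A heavily simplified sketch of the hashing idea follows. It makes several assumptions that are not in the abstract: feature vectors are 2-D, each random subset contains exactly two speakers (so the LDA transformation reduces to Fisher's two-class discriminant direction), the hash is the tuple of nearest speaker labels, and similarity between hashes is measured by Hamming distance. All function names are hypothetical.

```python
import random


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def mean(vectors):
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def fisher_direction(class_a, class_b, ridge=1e-6):
    """Two-class LDA direction w = Sw^-1 (mean_a - mean_b) for 2-D features."""
    ma, mb = mean(class_a), mean(class_b)
    # Within-class scatter matrix Sw, with a small ridge so it stays invertible.
    s = [[ridge, 0.0], [0.0, ridge]]
    for cls, m in ((class_a, ma), (class_b, mb)):
        for v in cls:
            d = (v[0] - m[0], v[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    # Apply the closed-form inverse of the 2x2 matrix Sw to the mean difference.
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det)
    return w, ma, mb


def make_hasher(training, repetitions, seed=0):
    """Build a hash function from `repetitions` random two-speaker subsets."""
    rng = random.Random(seed)
    speakers = sorted(training)
    projections = []
    for _ in range(repetitions):
        a, b = rng.sample(speakers, 2)
        w, ma, mb = fisher_direction(training[a], training[b])
        projections.append((w, a, dot(w, ma), b, dot(w, mb)))

    def hash_vector(x):
        code = []
        for w, a, pa, b, pb in projections:
            p = dot(w, x)
            # Assign the label of the nearest speaker class in the projected space.
            code.append(a if abs(p - pa) <= abs(p - pb) else b)
        return tuple(code)

    return hash_vector


def hamming(h1, h2):
    """Number of positions at which two hashes disagree."""
    return sum(x != y for x, y in zip(h1, h2))
```

Under these assumptions, content would be ranked by the Hamming distance between its stored hash and the hash of the target speaker's feature vector, with smaller distances treated as more likely to be the same voice.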

ROLE SEPARATION METHOD, MEETING SUMMARY RECORDING METHOD, ROLE DISPLAY METHOD AND APPARATUS, ELECTRONIC DEVICE, AND COMPUTER STORAGE MEDIUM
20230162757 · 2023-05-25

Provided are a role separation method, a meeting summary recording method, a role display method and apparatus, an electronic device, and a computer storage medium, relating to the field of speech processing. The role separation method comprises: obtaining sound source angle data corresponding to a speech data frame, acquired by a speech acquisition device, of a role to be separated (S102); on the basis of the sound source angle data, performing identity recognition on the role to be separated to obtain a first identity recognition result of the role to be separated (S104); and separating the role on the basis of the first identity recognition result of the role to be separated (S106). Because the role is separated in real time, the user experience remains smooth.
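
Steps S102 to S106 can be sketched under a strong simplifying assumption that is not stated in the abstract: each role speaks from a roughly fixed direction, so a frame whose sound source angle falls within a tolerance of a known role's angle is attributed to that role. The class name `RoleSeparator` and the 15-degree tolerance are hypothetical.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


class RoleSeparator:
    """Assigns each speech frame to a role using only its sound source angle."""

    def __init__(self, tolerance_deg=15.0):
        self.tolerance = tolerance_deg
        self.role_angles = []  # one reference angle per role seen so far

    def assign(self, angle_deg):
        # Identity recognition: reuse a known role if the angle is close enough.
        for role_id, ref in enumerate(self.role_angles):
            if angular_distance(angle_deg, ref) <= self.tolerance:
                return role_id
        # Otherwise treat the frame as coming from a new role.
        self.role_angles.append(angle_deg % 360.0)
        return len(self.role_angles) - 1
```

Because each frame is labeled as soon as its angle arrives, this sketch works frame by frame, which is consistent with the real-time separation the abstract describes.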

NOISE CANCELLATION PROCESSING METHOD, DEVICE AND APPARATUS
20230164477 · 2023-05-25

A noise cancellation processing method, device and apparatus are provided. The noise cancellation processing method includes: collecting first voice data in a surrounding environment by using a noise-cancelling earphone in response to detecting that the noise-cancelling earphone is in a wearing state and a noise cancellation mode is enabled; extracting to-be-recognized voiceprint feature information according to the first voice data; identifying similarities between registered voiceprint feature information stored in a registered voiceprint database and the to-be-recognized voiceprint feature information entry by entry; and in response to at least one of the similarities being greater than a first preset threshold, performing a preset action in the noise-cancelling earphone.
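
The entry-by-entry comparison against the registered voiceprint database can be sketched as below. The abstract does not name a similarity measure, so cosine similarity is an assumption here, as are the function names, the 0.8 threshold, and the placeholder action labels.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches_registered(voiceprint, registered_db, threshold=0.8):
    """Compare entry by entry; return the first registered name above threshold."""
    for name, registered in registered_db.items():
        if cosine_similarity(voiceprint, registered) > threshold:
            return name
    return None


def handle_ambient_voice(voiceprint, registered_db, threshold=0.8):
    """Pick the earphone's response; the action labels are placeholders."""
    if matches_registered(voiceprint, registered_db, threshold) is not None:
        return "preset_action"
    return "keep_cancelling"
```

Returning on the first match reflects the "at least one of the similarities" condition: a single registered voiceprint above the threshold is enough to trigger the preset action.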

INTELLIGENT RECOMMENDATION METHOD, VEHICLE-MOUNTED DEVICE, AND STORAGE MEDIUM
20230162514 · 2023-05-25

This application provides an intelligent recommendation method. The method includes capturing images of occupants in a vehicle using at least one camera, and obtaining attributes of occupants in the vehicle according to the captured images. Voice information of the occupants in the vehicle is collected using a microphone. Once the attributes of occupants in the vehicle and the voice information are sent to a cloud server, recommendation information can be obtained from the cloud server, wherein the recommendation information is generated based on a user intention that is obtained based on the attributes of occupants in the vehicle and the voice information.

Channel-compensated low-level features for speaker recognition
11657823 · 2023-05-23

A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.
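
The training loop described above (degrade the signal, compute features, compare with handcrafted features, update weights until a threshold loss is met) can be illustrated with a toy stand-in. This is not the patented system: a single linear 1-D convolution kernel replaces the CNN, additive Gaussian noise replaces the channel noise simulator, a moving average replaces the handcrafted features, and the bottleneck layer, DNN back end, and dropout are omitted. All names and hyperparameters are hypothetical.

```python
import math
import random


def degrade(signal, rng, noise_std=0.1):
    """Stand-in channel noise simulator: additive Gaussian noise."""
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def handcrafted(signal, width):
    """Stand-in handcrafted feature: moving average over each window."""
    return [sum(signal[i:i + width]) / width
            for i in range(len(signal) - width + 1)]


def forward(kernel, signal):
    """Single 1-D convolution layer (valid mode, no nonlinearity)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def train(clean, width=4, lr=0.05, threshold=0.01, max_steps=500, seed=0):
    """Fit the kernel so features of the degraded signal match the clean target."""
    rng = random.Random(seed)
    target = handcrafted(clean, width)  # features of the raw (clean) signal
    kernel = [0.0] * width
    losses = []
    for _ in range(max_steps):
        noisy = degrade(clean, rng)     # fresh degradation at every step
        out = forward(kernel, noisy)
        err = [o - t for o, t in zip(out, target)]
        loss = sum(e * e for e in err) / len(err)  # mean squared error
        losses.append(loss)
        if loss <= threshold:           # stop once the threshold loss is met
            break
        for j in range(width):
            grad = 2.0 * sum(e * noisy[i + j] for i, e in enumerate(err)) / len(err)
            kernel[j] -= lr * grad      # gradient descent weight update
    return kernel, losses


clean_signal = [math.sin(0.2 * i) for i in range(200)]
kernel, losses = train(clean_signal)
```

The key point the sketch preserves is that the target features come from the clean signal while the input is the degraded signal, so the learned weights are pushed toward features that are insensitive to the simulated channel noise.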
