G10L15/20

SYSTEM AND METHOD FOR ROBUST WAKEWORD DETECTION IN PRESENCE OF NOISE IN NEW UNSEEN ENVIRONMENTS WITHOUT ADDITIONAL DATA

The current disclosure relates to systems and methods for wakeword or keyword detection in Virtual Personal Assistants (VPAs). In particular, systems and methods are provided for wakeword detection using deep neural networks that include a parametric pooling layer, wherein the parametric pooling layer includes trainable parameters, enabling the layer to learn to distinguish between informative feature vectors and non-informative or noisy feature vectors extracted from a variable-length acoustic signal. In one example, a parametric pooling layer may aggregate a variable-length feature map, comprising a plurality of feature vectors extracted from an acoustic signal, into an embedding vector of pre-determined length by weighting each of the plurality of feature vectors based on one or more learned parameters of the parametric pooling layer and aggregating the plurality of weighted feature vectors into the embedding vector.
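
The pooling step described in this abstract can be illustrated with a minimal sketch. The abstract only states that feature vectors are weighted by learned parameters and then aggregated; the dot-product scoring, the softmax normalization, and the names `parametric_pool` and `query` below are assumptions made for illustration, not the disclosed implementation.

```python
import math


def parametric_pool(feature_map, query):
    """Aggregate a variable-length feature map into one fixed-length embedding.

    feature_map: list of T feature vectors, each of length D (T may vary).
    query: parameter vector of length D; assumed to be learned in training.
    """
    # Score each feature vector against the parameter vector (assumed scoring).
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in feature_map]
    # Softmax over time turns the scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum over time has length D regardless of T.
    dim = len(query)
    return [
        sum(w * vec[d] for w, vec in zip(weights, feature_map))
        for d in range(dim)
    ]
```

With an all-zero `query`, every frame receives the same weight and the sketch reduces to average pooling; a trained `query` would instead give low weight to frames it scores as uninformative.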

Leveraging entity relations to discover answers using a knowledge graph

An approach is provided that receives a question at a question-answering (QA) system. A number of passages are identified that are relevant to the received question. A question knowledge graph is generated that corresponds to the question, and a set of passage knowledge graphs is also generated, with each passage knowledge graph corresponding to one of the identified passages. Each of the passage knowledge graphs is compared to the question knowledge graph, with the comparison resulting in a set of knowledge graph candidate answers (kgCAs). A set of candidate answers (CAs) is computed by the QA system, with at least one of the CAs being based on one of the kgCAs.
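
One way to picture the graph comparison is sketched below. It assumes each knowledge graph is a list of (subject, relation, object) triples and that the question graph marks the unknown entity with a placeholder; the triple representation, the `?x` placeholder, and the function name `kg_candidate_answers` are hypothetical and do not come from the abstract.

```python
def kg_candidate_answers(question_kg, passage_kgs, unknown="?x"):
    """Compare each passage graph to the question graph and collect kgCAs.

    question_kg: list of (subject, relation, object) triples, where the
        entity being asked about is written as `unknown`.
    passage_kgs: list of passage graphs, each a list of triples.
    """
    candidates = []
    for passage_kg in passage_kgs:
        for q_subj, q_rel, q_obj in question_kg:
            for p_subj, p_rel, p_obj in passage_kg:
                if q_rel != p_rel:
                    continue  # relations must agree
                if q_subj == unknown and q_obj == p_obj:
                    answer = p_subj  # passage subject fills the gap
                elif q_obj == unknown and q_subj == p_subj:
                    answer = p_obj  # passage object fills the gap
                else:
                    continue
                if answer not in candidates:
                    candidates.append(answer)
    return candidates


question = [("?x", "wrote", "Hamlet")]
passages = [
    [("Shakespeare", "wrote", "Hamlet"), ("Shakespeare", "born_in", "Stratford")],
    [("Marlowe", "wrote", "Faustus")],
]
kgcas = kg_candidate_answers(question, passages)
```

In this toy example only the first passage graph shares both the relation and the known entity with the question graph, so it alone contributes a candidate.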

Vehicular apparatus, vehicle, operation method of vehicular apparatus, and storage medium
11521615 · 2022-12-06

A vehicular apparatus having at least one of a voice calling function and a voice recognition function, the apparatus comprising a voice input unit including a plurality of microphones, the voice input unit being disposed between a driver's seat and a passenger seat with respect to a vehicle width direction; and a control unit configured to control a directionality direction and a gain level of each of the plurality of microphones, wherein the control unit controls the directionality directions of the plurality of microphones in two directions, the two directions being a driver's seat side and a passenger seat side, and controls a gain level on the passenger seat side to be lower than a gain level on the driver's seat side.
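
The gain relationship in this abstract can be sketched as follows, assuming the two directional signals (driver side and passenger side) have already been formed from the microphone array. The function name `mix_beams`, the default gain values, and the simple weighted sum are illustrative assumptions; the abstract does not specify how the two directions are combined.

```python
def mix_beams(driver_beam, passenger_beam, driver_gain=1.0, passenger_gain=0.5):
    """Combine two directional signals with a lower passenger-side gain.

    driver_beam, passenger_beam: equal-length lists of audio samples,
        assumed to be the outputs of two already-steered beams.
    """
    if passenger_gain >= driver_gain:
        raise ValueError("passenger-side gain must be lower than driver-side gain")
    if len(driver_beam) != len(passenger_beam):
        raise ValueError("beams must have the same number of samples")
    # Weighted sum: the driver side dominates the mixed signal.
    return [
        driver_gain * d + passenger_gain * p
        for d, p in zip(driver_beam, passenger_beam)
    ]
```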

End-to-end multi-talker overlapping speech recognition
11521595 · 2022-12-06

A method for training a speech recognition model with a loss function includes receiving an audio signal including a first segment corresponding to audio spoken by a first speaker, a second segment corresponding to audio spoken by a second speaker, and an overlapping region where the first segment overlaps the second segment. The overlapping region includes a known start time and a known end time. The method also includes generating a respective masked audio embedding for each of the first and second speakers. The method also includes applying a masking loss after the known end time to the respective masked audio embedding for the first speaker when the first speaker was speaking prior to the known start time, or applying the masking loss prior to the known start time when the first speaker was speaking after the known end time.
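
The masking loss can be read as a penalty on a speaker's masked embedding in the region where that speaker is known to be silent. The sketch below assumes a mean-squared-magnitude penalty and per-frame timestamps; the penalty form, the argument names, and the function name `masking_loss` are assumptions, since the abstract does not define the loss.

```python
def masking_loss(masked_embedding, frame_times, start_time, end_time, spoke_first):
    """Penalize energy in a speaker's masked embedding where they are silent.

    masked_embedding: list of frames, each a list of floats.
    frame_times: timestamp of each frame, same length as masked_embedding.
    start_time, end_time: known bounds of the overlapping region.
    spoke_first: True if this speaker was speaking before start_time
        (so they are silent after end_time), False if they were speaking
        after end_time (so they are silent before start_time).
    """
    if spoke_first:
        silent = [f for f, t in zip(masked_embedding, frame_times) if t > end_time]
    else:
        silent = [f for f, t in zip(masked_embedding, frame_times) if t < start_time]
    count = sum(len(frame) for frame in silent)
    if count == 0:
        return 0.0
    # Mean squared value over all silent-region entries (assumed penalty form).
    return sum(v * v for frame in silent for v in frame) / count
```

For a speaker who talks first, only frames after the overlap contribute to the loss, which pushes the model to zero out that speaker's embedding once the other speaker has taken over.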

Systems and methods for noise cancellation

A computing device may receive audio data from a microphone representing audio in an environment of the device, which may correspond to an utterance and noise. A model may be trained to process the audio data to cancel noise from the audio data. The model may include an encoder that includes one or more dense layers, one or more recurrent layers, and a decoder that includes one or more dense layers.
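
The layer ordering in this abstract (dense encoder, recurrent layer, dense decoder) can be sketched as a forward pass. The layer sizes, the ReLU, tanh, and sigmoid activations, the choice to have the decoder output a per-feature mask applied to the input frame, and the names `init_params` and `denoise` are all assumptions for illustration; the weights are random and untrained, so the sketch shows data flow only.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(n_features, n_hidden, seed=0):
    """Random, untrained parameters for the three stages (illustrative only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "enc_w": matrix(n_hidden, n_features), "enc_b": [0.0] * n_hidden,
        "rec_w": matrix(n_hidden, 2 * n_hidden), "rec_b": [0.0] * n_hidden,
        "dec_w": matrix(n_features, n_hidden), "dec_b": [0.0] * n_features,
    }


def denoise(frames, params):
    """Run each audio frame through encoder, recurrent layer, and decoder."""
    hidden = [0.0] * len(params["rec_b"])
    output = []
    for frame in frames:
        # Encoder: dense layer with ReLU.
        enc = [max(0.0, v) for v in dense(frame, params["enc_w"], params["enc_b"])]
        # Recurrent layer: new state depends on the encoding and previous state.
        pre = dense(enc + hidden, params["rec_w"], params["rec_b"])
        hidden = [math.tanh(v) for v in pre]
        # Decoder: dense layer producing a mask in (0, 1) per feature (assumed).
        logits = dense(hidden, params["dec_w"], params["dec_b"])
        mask = [1.0 / (1.0 + math.exp(-v)) for v in logits]
        output.append([m * f for m, f in zip(mask, frame)])
    return output
```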

VOICE PROCESSING SYSTEM AND VOICE PROCESSING METHOD
20220383878 · 2022-12-01

A voice processing system includes: a first acquisition processor that acquires voice data collected by a microphone installed in a microphone-speaker device; a second acquisition processor that acquires authentication information of a wearer who wears the microphone-speaker device, the authentication information being acquired by an authentication information acquirer installed in the microphone-speaker device; and a control processor that executes predetermined processing related to the voice data, which is acquired by the first acquisition processor, on the basis of the authentication information acquired by the second acquisition processor.

EXPLAINING ANOMALOUS PHONETIC TRANSLATIONS

A method includes: receiving, by a computing device, a digital voice stream; receiving, by the computing device, converted text that represents the digital voice stream; identifying, by the computing device, an erroneously converted portion of the converted text; selecting, by the computing device, the erroneously converted portion for explainability processing; parsing, by the computing device, the erroneously converted portion into parts based on a predetermined parsing level; collecting, by the computing device, supplementary input data related to the erroneously converted portion; and determining, by the computing device and based on the supplementary input data, a reason why the erroneously converted portion was erroneously converted.