G10L15/05

SPEECH PROCESSING METHOD AND APPARATUS
20220399012 · 2022-12-15

A speech processing method includes obtaining first speech information from a user, determining one or more similar speech segments in the first speech information and deleting one or more similar frames in each of the one or more similar speech segments to obtain second speech information, and analyzing the second speech information to determine a user intent corresponding to the first speech information. A duration of the first speech information exceeds a preset analysis duration threshold, and a duration of the second speech information does not exceed the preset analysis duration threshold.
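
The frame-deletion step can be illustrated with a minimal sketch. The abstract does not say how similarity is measured or which frames are dropped, so everything here is an assumption: frames are plain feature vectors, a "similar speech segment" is a run of adjacent frames whose cosine similarity stays above a hypothetical `sim_threshold`, and frames are removed round-robin from the tail of each run until the hypothetical `max_frames` limit is met.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def find_similar_segments(frames, sim_threshold=0.95):
    """Return (start, end) pairs, end exclusive, for runs of two or more
    adjacent frames whose pairwise similarity stays at or above the threshold."""
    segments = []
    start = 0
    for i in range(1, len(frames) + 1):
        if i == len(frames) or cosine(frames[i - 1], frames[i]) < sim_threshold:
            if i - start >= 2:
                segments.append((start, i))
            start = i
    return segments

def trim_to_threshold(frames, max_frames, sim_threshold=0.95):
    """Drop frames from similar segments until at most max_frames remain.

    Assumption: the first frame of every segment is always kept. If the
    similar segments do not hold enough removable frames, the result can
    still be longer than max_frames.
    """
    excess = len(frames) - max_frames
    if excess <= 0:
        return list(frames)
    # Indices that may be removed from each segment (everything but its first frame).
    removable = [list(range(s + 1, e)) for s, e in find_similar_segments(frames, sim_threshold)]
    drop = set()
    while excess > 0 and any(removable):
        for candidates in removable:
            if excess == 0:
                break
            if candidates:
                drop.add(candidates.pop())  # remove from the tail of the run
                excess -= 1
    return [f for i, f in enumerate(frames) if i not in drop]
```

Removing one frame per segment per pass spreads the shortening across all similar segments rather than collapsing a single one; that is a design choice of this sketch, not something the abstract states.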

SYSTEM AND METHOD FOR ROBUST WAKEWORD DETECTION IN PRESENCE OF NOISE IN NEW UNSEEN ENVIRONMENTS WITHOUT ADDITIONAL DATA

The current disclosure relates to systems and methods for wakeword or keyword detection in Virtual Personal Assistants (VPAs). In particular, systems and methods are provided for wakeword detection using deep neural networks including a parametric pooling layer, wherein the parametric pooling layer includes trainable parameters, enabling the layer to learn to distinguish between informative feature vectors and non-informative/noisy feature vectors extracted from a variable length acoustic signal. In one example, a parametric pooling layer may aggregate a variable length feature map, comprising a plurality of feature vectors extracted from an acoustic signal, into an embedding vector of pre-determined length, by weighting each of the plurality of feature vectors based on one or more learned parameters in a parametric pooling layer, and aggregating the plurality of weighted feature vectors into the embedding vector.
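
As a rough illustration of the pooling step, the sketch below weights each feature vector with a softmax over learned scores and sums the weighted vectors into one fixed-length embedding. The scoring form (a dot product with a parameter vector `weights` plus `bias`) is an assumption chosen for illustration; the abstract only says the weights depend on learned parameters.

```python
import math

def parametric_pool(feature_map, weights, bias=0.0):
    """Aggregate a variable-length list of feature vectors into one vector.

    feature_map: list of T vectors, each of dimension D (T may vary).
    weights, bias: hypothetical learned parameters that score each vector.
    Returns a vector of dimension D regardless of T.
    """
    if not feature_map:
        raise ValueError("feature_map must contain at least one vector")
    # One scalar score per feature vector.
    scores = [sum(w * x for w, x in zip(weights, vec)) + bias for vec in feature_map]
    # Numerically stable softmax over time, so the weights sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the feature vectors.
    dim = len(feature_map[0])
    return [sum(a * vec[d] for a, vec in zip(alphas, feature_map)) for d in range(dim)]
```

With all-zero parameters every vector gets the same weight and the result reduces to average pooling; training the parameters lets the layer down-weight noisy vectors instead.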

Dialogue processing apparatus, a vehicle including the same, and a dialogue processing method

A dialogue processing apparatus includes: a speech input device configured to receive a speech signal of a user; a first buffer configured to store the received speech signal therein; an output device; and a controller. The controller is configured to: detect an utterance end time point on the basis of the stored speech signal; generate a second speech recognition result corresponding to a speech signal after the utterance end time point on the basis of whether an intention of the user is to be identified from a first speech recognition result corresponding to a speech signal before the utterance end time point; and control the output device to output a response corresponding to the intention of the user determined on the basis of at least one of the first speech recognition result or the second speech recognition result.
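
The controller logic can be sketched as follows, under one reading of the abstract: the second recognition result is generated only when no intent can be identified from the first. The names `resolve_intent`, `recognize`, and `identify_intent` are hypothetical, and the buffered "speech signal" is stood in for by a list of word tokens so the example runs without an audio stack.

```python
def resolve_intent(buffered, end_index, recognize, identify_intent):
    """Return (intent, text) for a buffered signal split at a detected end point.

    buffered: stored speech signal (here, a list of tokens).
    end_index: position of the detected utterance end time point.
    recognize, identify_intent: caller-supplied callables (assumptions).
    """
    first = recognize(buffered[:end_index])
    intent = identify_intent(first)
    if intent is not None:
        return intent, first
    # Intent not identifiable yet: also recognize what follows the end point.
    second = recognize(buffered[end_index:])
    combined = (first + " " + second).strip()
    return identify_intent(combined), combined

# Toy stand-ins for a recognizer and an intent classifier.
def toy_recognize(tokens):
    return " ".join(tokens)

def toy_identify_intent(text):
    words = text.split()
    if words[:2] == ["navigate", "to"] and len(words) > 2:
        return "navigation"
    return None
```

For example, if a pause after "navigate to" triggers the end point early, `resolve_intent(["navigate", "to", "the", "airport"], 2, toy_recognize, toy_identify_intent)` falls back to the buffered remainder and resolves the intent from the combined text.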

METHOD FOR VOICE RECOGNITION, ELECTRONIC DEVICE AND STORAGE MEDIUM

A method for voice recognition includes: performing, by an electronic device, voice recognition on voice information; and updating, by the electronic device, a waiting duration for end point detection (EPD) from a first preset duration to a second preset duration in response to recognizing a preset keyword from the voice information, where the first preset duration is less than the second preset duration.
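
A minimal sketch of the waiting-duration update follows. The two durations, the keyword set, and the function names are illustrative assumptions; the abstract only requires that the first preset duration be shorter than the second.

```python
FIRST_WAIT_S = 0.5   # hypothetical default waiting duration, in seconds
SECOND_WAIT_S = 1.5  # hypothetical extended waiting duration, in seconds

def epd_wait_duration(partial_transcript, keywords,
                      first=FIRST_WAIT_S, second=SECOND_WAIT_S):
    """Return the waiting duration to use given what has been recognized so far."""
    if first >= second:
        raise ValueError("first preset duration must be less than the second")
    words = partial_transcript.lower().split()
    # Switch to the longer wait once any preset keyword has been recognized.
    return second if any(k in words for k in keywords) else first

def utterance_ended(silence_s, partial_transcript, keywords):
    """True when trailing silence has lasted at least the current waiting duration."""
    return silence_s >= epd_wait_duration(partial_transcript, keywords)
```

With a keyword such as "call", a 0.8 s pause after the keyword does not end the utterance, which gives the user time to add a name or number before the end point is declared.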

Neural network accelerator with compact instruct set
11520561 · 2022-12-06

Described herein is a neural network accelerator with a set of neural processing units and an instruction set for execution on the neural processing units. The instruction set is a compact instruction set including various compute and data move instructions for implementing a neural network. Among the compute instructions are an instruction for performing a fused operation comprising sequential computations, one of which involves matrix multiplication, and an instruction for performing an elementwise vector operation. The instructions in the instruction set are highly configurable and can handle data elements of variable size. The instructions also implement a synchronization mechanism that allows asynchronous execution of data move and compute operations across different components of the neural network accelerator as well as between multiple instances of the neural network accelerator.

Wakeword detection using a neural network

A system and method perform wakeword detection using a feedforward neural network model. A first output of the model indicates when the wakeword appears on a right side of a first window of input audio data. A second output of the model indicates when the wakeword appears in the center of a second window of input audio data. A third output of the model indicates when the wakeword appears on a left side of a third window of input audio data. Using these outputs, the system and method determine a beginpoint and endpoint of the wakeword.
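
One way the three outputs could yield a beginpoint and endpoint is sketched below. The abstract does not give the rule, so this is purely an assumption: the window where the "right side" output peaks is taken to end at the wakeword's end, the window where the "left side" output peaks is taken to start at the wakeword's start, and the "center" output serves only as a consistency check. The function name, `hop`, and `threshold` are hypothetical.

```python
def locate_wakeword(right, center, left, window_len, hop=1, threshold=0.5):
    """Estimate (beginpoint, endpoint) in frames from three per-window score lists.

    right[i], center[i], left[i]: model outputs for the window that starts
    at frame i * hop and spans window_len frames.
    Returns None if any output never reaches the threshold or the peaks
    are inconsistent with a wakeword moving right-to-left through the window.
    """
    if not (len(right) == len(center) == len(left)):
        raise ValueError("score lists must have equal length")
    if not right:
        return None

    def peak(scores):
        idx = max(range(len(scores)), key=scores.__getitem__)
        return idx if scores[idx] >= threshold else None

    r, c, l = peak(right), peak(center), peak(left)
    if r is None or c is None or l is None or not (r <= c <= l):
        return None
    endpoint = r * hop + window_len  # right edge (exclusive) of the "right side" window
    beginpoint = l * hop             # left edge of the "left side" window
    return (beginpoint, endpoint) if beginpoint < endpoint else None
```

As the window slides forward in time, the wakeword enters at the right edge, passes through the center, and leaves at the left edge, which is why the sketch expects the three peaks in that order.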