G10L15/14

Multi-assistant control

A multi-assistant controller includes an audio recorder and a detector. The audio recorder is configured to receive sampled audio from a microphone, store the sampled audio in a circular buffer, and transfer the sampled audio from the circular buffer to a particular voice-activated assistant. The detector is configured to store multiple wake-up phrases recognizable by multiple voice-activated assistants, search the sampled audio to determine multiple probabilities that the sampled audio includes the wake-up phrases, select the particular wake-up phrase that has the highest probability among the probabilities, and send a callback to the particular voice-activated assistant that the particular wake-up phrase has been detected. The sampled audio that is transferred to the particular voice-activated assistant includes the particular wake-up phrase that was detected.
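
The control flow above can be sketched as follows. This is an illustrative sketch only: the class names, the text-based scoring (a real detector would score acoustic features), and the chunk handling are assumptions, not the controller's actual design.

```python
from collections import deque

class WakePhraseDetector:
    """Stores wake-up phrases for several assistants and selects the
    phrase with the highest probability for a piece of sampled audio."""

    def __init__(self, phrases):
        self.phrases = phrases  # assistant name -> wake-up phrase

    def score(self, audio_text, phrase):
        # Placeholder probability: fraction of phrase words present in a
        # transcript; a real detector scores acoustic features instead.
        words = phrase.lower().split()
        return sum(w in audio_text.lower() for w in words) / len(words)

    def detect(self, audio_text):
        # Compute a probability per wake-up phrase and pick the highest.
        probs = {a: self.score(audio_text, p) for a, p in self.phrases.items()}
        best = max(probs, key=probs.get)
        return best, probs[best]

class AudioRecorder:
    """Circular buffer of audio chunks; on a detector callback the buffered
    audio (which still contains the wake-up phrase) is transferred."""

    def __init__(self, capacity=8):
        self.buffer = deque(maxlen=capacity)  # oldest chunks are overwritten

    def push(self, chunk):
        self.buffer.append(chunk)

    def transfer(self):
        chunks = list(self.buffer)
        self.buffer.clear()
        return chunks
```

The circular buffer matters because the wake-up phrase has already been spoken by the time it is detected; buffering lets the controller hand the assistant audio that includes the phrase itself.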

Method and system for providing adjunct sensory information to a user

A method for providing information to a user, the method including: receiving an input signal from a sensing device associated with a sensory modality of the user; generating a preprocessed signal upon preprocessing the input signal with a set of preprocessing operations; extracting a set of features from the preprocessed signal; processing the set of features with a neural network system; mapping outputs of the neural network system to a device domain associated with a device including a distribution of haptic actuators in proximity to the user; and at the distribution of haptic actuators, cooperatively producing a haptic output representative of at least a portion of the input signal, thereby providing information to the user.
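
The pipeline stages can be sketched end to end. Every stage here is a stand-in chosen for illustration: DC removal for the preprocessing operations, per-band RMS energy for the feature set, a fixed squashing function in place of the trained neural network, and a simple clamp for the actuator mapping.

```python
import math

def preprocess(signal):
    # Minimal preprocessing operation: remove the DC offset.
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]

def extract_features(signal, n_bands=4):
    # Split the signal into n_bands segments and take the RMS of each.
    seg = len(signal) // n_bands
    return [math.sqrt(sum(x * x for x in signal[i * seg:(i + 1) * seg]) / seg)
            for i in range(n_bands)]

def neural_net(features):
    # Stand-in for a trained network: a fixed elementwise squashing.
    return [math.tanh(f) for f in features]

def map_to_actuators(outputs, n_actuators=4):
    # Map network outputs onto actuator drive levels in [0, 1].
    return [min(1.0, max(0.0, o)) for o in outputs[:n_actuators]]

def haptic_pipeline(signal):
    # Input signal -> preprocess -> features -> network -> device domain.
    return map_to_actuators(neural_net(extract_features(preprocess(signal))))
```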

Systems and methods for improving content discovery in response to a voice query using a recognition rate which depends on detected trigger terms

A transcription of a query for content discovery is generated, and a context of the query is identified, as well as a first plurality of candidate entities to which the query refers. A search is performed based on the context of the query and the first plurality of candidate entities, and results are generated for output. A transcription of a second voice query is then generated, and it is determined whether the second transcription includes a trigger term indicating a corrective query. If so, the context of the first query is retrieved. A second term of the second query similar to a term of the first query is identified, and a second plurality of candidate entities to which the second term refers is determined. A second search is performed based on the second plurality of candidates and the context, and new search results are generated for output.
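
The corrective-query detection can be sketched as below. The trigger terms and the prefix-based similarity test are invented stand-ins; the actual system would use its recognition rate and phonetic similarity rather than string prefixes.

```python
# Assumed trigger terms indicating a corrective query.
TRIGGER_TERMS = ("i meant", "not that", "no,")

def is_corrective(transcription):
    # The second query is treated as corrective if it contains a trigger term.
    t = transcription.lower()
    return any(term in t for term in TRIGGER_TERMS)

def similar_term(first_query, second_query):
    # Crude stand-in for phonetic similarity: a shared 3-letter prefix.
    # Returns the term of the second query similar to a term of the first.
    for t2 in second_query.lower().split():
        for t1 in first_query.lower().split():
            if t1 != t2 and len(t1) >= 3 and t1[:3] == t2[:3]:
                return t2
    return None
```

On a trigger hit, the retained context of the first query plus the similar term would seed the second entity lookup and search.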

DETERMINING INPUT FOR SPEECH PROCESSING ENGINE
20230135768 · 2023-05-04

A method of presenting a signal to a speech processing engine is disclosed. According to an example of the method, an audio signal is received via a microphone. A portion of the audio signal is identified, and a probability is determined that the portion comprises speech directed by a user of the speech processing engine as input to the speech processing engine. In accordance with a determination that the probability exceeds a threshold, the portion of the audio signal is presented as input to the speech processing engine. In accordance with a determination that the probability does not exceed the threshold, the portion of the audio signal is not presented as input to the speech processing engine.
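
The gating decision reduces to a threshold test per portion. In this sketch the probabilities are assumed to be precomputed and attached to each portion; the patent's actual probability model (how directedness is estimated) is not specified here.

```python
def gate_for_engine(portions, threshold=0.5):
    """Present a portion to the speech processing engine only when its
    probability of being user-directed input exceeds the threshold."""
    presented = []
    for portion in portions:
        if portion["probability"] > threshold:
            presented.append(portion["audio"])
        # Portions at or below the threshold are simply not presented.
    return presented
```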

EFFICIENT EMPIRICAL DETERMINATION, COMPUTATION, AND USE OF ACOUSTIC CONFUSABILITY MEASURES
20230206914 · 2023-06-29

A computer-implemented method includes generating an empirically derived acoustic confusability measure by processing example utterances and iterating from an initial estimate of the acoustic confusability measure to improve the measure. The method can further include using the acoustic confusability measure to selectively limit phrases to make recognizable by a speech recognition application.
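
A toy version of the empirical measure and its use can be sketched as follows. The example data, the additive smoothing, and the product-over-aligned-phones phrase score are all assumptions for illustration; they are not the patent's estimator or its iteration scheme.

```python
from collections import Counter

def confusion_probs(examples, smooth=0.5):
    # examples: (intended_phone, recognized_phone) pairs from test utterances.
    pair = Counter(examples)
    total = Counter(i for i, _ in examples)
    phones = {p for pr in examples for p in pr}
    def p(a, b):
        # Smoothed empirical probability that phone a is recognized as b.
        return (pair[(a, b)] + smooth) / (total[a] + smooth * len(phones))
    return p

def phrase_confusability(p, phrase_a, phrase_b):
    # Product of per-phone confusion probabilities over aligned phones.
    prob = 1.0
    for a, b in zip(phrase_a, phrase_b):
        prob *= p(a, b)
    return prob

def admit(p, grammar, candidate, limit=0.01):
    # Make the candidate phrase recognizable only if it is not too
    # confusable with any phrase already in the grammar.
    return all(phrase_confusability(p, g, candidate) < limit for g in grammar)
```

Rejecting highly confusable candidates this way limits the recognizable phrase set, which is the selective-limiting use the abstract describes.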

Mandarin and dialect mixed modeling and speech recognition

The present disclosure provides a modeling method for speech recognition and a device. The method includes: determining N types of tags; training a neural network on speech data of Mandarin to generate a recognition model whose outputs are the N types of tags; inputting speech data of each dialect into the recognition model to obtain an output tag for each frame of the speech data of each dialect; determining, according to the output tags and the tagged true tags, error rates of the N types of tags for each dialect; generating M types of target tags from the tags with error rates greater than a preset threshold; and training an acoustic model on further speech data of Mandarin and of the dialects, the outputs of the acoustic model being the N types of tags and the M types of target tags corresponding to each dialect.
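
The tag-splitting step can be sketched as below. The per-frame data layout and the `tag@dialect` naming for dialect-specific target tags are assumptions made for illustration.

```python
def tag_error_rates(predicted, true_tags, tags):
    # predicted / true_tags: frame-aligned tag sequences for one dialect.
    errors = {t: 0 for t in tags}
    totals = {t: 0 for t in tags}
    for p, t in zip(predicted, true_tags):
        totals[t] += 1
        if p != t:
            errors[t] += 1
    # Per-tag frame error rate (0.0 for tags never seen in the reference).
    return {t: errors[t] / totals[t] if totals[t] else 0.0 for t in tags}

def target_tags(dialect, rates, threshold=0.3):
    # One new dialect-specific target tag per poorly recognized tag;
    # well-recognized tags keep sharing the Mandarin tag set.
    return [f"{tag}@{dialect}" for tag, rate in rates.items() if rate > threshold]
```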

METHOD AND DEVICE FOR PERFORMING VOICE RECOGNITION USING GRAMMAR MODEL

A method of updating speech recognition data including a language model used for speech recognition, the method including obtaining language data including at least one word; detecting a word that does not exist in the language model from among the at least one word; obtaining at least one phoneme sequence regarding the detected word; obtaining components constituting the at least one phoneme sequence by dividing the at least one phoneme sequence into predetermined unit components; determining information regarding probabilities that the respective components constituting each of the at least one phoneme sequence appear during speech recognition; and updating the language model based on the determined probability information.
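
The out-of-vocabulary handling can be sketched as follows. The toy pronunciation table, the choice of two-phoneme unit components, and the uniform appearance probabilities are all assumptions; a real system derives the phoneme sequences and probabilities from its recognizer.

```python
# Toy grapheme-to-phoneme table standing in for a real pronunciation model.
LEXICON = {"hello": ["HH", "AH", "L", "OW"]}

def oov_words(words, language_model):
    # Detect words that do not exist in the language model.
    return [w for w in words if w not in language_model]

def unit_components(phonemes, unit=2):
    # Divide the phoneme sequence into predetermined unit components.
    return [tuple(phonemes[i:i + unit]) for i in range(0, len(phonemes), unit)]

def update_language_model(language_model, words):
    for w in oov_words(words, language_model):
        comps = unit_components(LEXICON.get(w, []))
        if not comps:
            continue  # no pronunciation available for this word
        # Assumed uniform probability that each component appears.
        language_model[w] = {c: 1.0 / len(comps) for c in comps}
    return language_model
```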

SPEECH PROCESSING SYSTEM AND SPEECH PROCESSING METHOD

A speech intelligibility enhancing system for enhancing speech, the system comprising: a speech input for receiving speech to be enhanced; an enhanced speech output to output the enhanced speech; and a processor configured to convert speech received by the speech input to enhanced speech to be output by the enhanced speech output, the processor being configured to: extract a portion of the speech received by the speech input; calculate the power of the portion; estimate the contribution due to late reverberation to the power of the portion of the speech when reverberated; calculate a target late reverberation power; determine a time tᵢ for the estimated late reverberation contribution to decay to the target late reverberation power; calculate a pause duration using the time tᵢ; and insert a pause having the calculated duration into the speech received by the speech input at a first location, wherein the first location is followed by the portion.
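
The decay-time and pause-insertion steps can be sketched under an assumed model: late reverberation decaying exponentially at the standard 60 dB per RT60. The patent's actual estimator may differ, and the RT60 value here is illustrative.

```python
import math

def decay_time(late_power, target_power, rt60):
    """Time tᵢ for late reverberation at late_power to decay to
    target_power, assuming 60 dB of decay per RT60 seconds."""
    if late_power <= target_power:
        return 0.0
    # Solve late_power * 10**(-6 * t / rt60) == target_power for t.
    return (rt60 / 6.0) * math.log10(late_power / target_power)

def insert_pause(samples, location, pause_duration, sample_rate):
    # Insert pause_duration seconds of silence so the location is
    # followed by the (now less masked) portion.
    n = int(round(pause_duration * sample_rate))
    return samples[:location] + [0.0] * n + samples[location:]
```

For example, dropping from a late-reverberation power of 1.0 to a target of 0.001 (30 dB) with RT60 = 0.6 s gives tᵢ = 0.3 s, which then sets the inserted pause duration.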