G10L15/14

Efficient empirical determination, computation, and use of acoustic confusability measures

A computer-implemented method includes generating an empirically derived acoustic confusability measure by processing example utterances and iterating from an initial estimate of the acoustic confusability measure to improve it. The method can further include using the acoustic confusability measure to selectively limit the phrases that a speech recognition application makes recognizable.
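
The iterative estimation above can be illustrated with a toy model: substitution costs over phoneme pairs start uniform and are refined from observed recognition confusions, and the resulting weighted edit distance yields a confusability score. All names here (`edit_distance`, `refine_costs`, the 0.5 cost floor, the positional alignment) are illustrative assumptions, not the patented algorithm.

```python
from collections import defaultdict

def edit_distance(a, b, sub_cost):
    # weighted Levenshtein distance over phoneme sequences;
    # insertions/deletions cost 1.0, substitutions use sub_cost
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * 1.0
    for j in range(1, n + 1):
        d[0][j] = j * 1.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else sub_cost.get((a[i - 1], b[j - 1]), 1.0)
            d[i][j] = min(d[i - 1][j] + 1.0, d[i][j - 1] + 1.0, d[i - 1][j - 1] + sub)
    return d[m][n]

def confusability(a, b, sub_cost):
    # map distance into [0, 1]; 1.0 means acoustically identical
    return 1.0 - edit_distance(a, b, sub_cost) / max(len(a), len(b), 1)

def refine_costs(confusion_pairs, sub_cost):
    # one refinement pass from the initial estimate; the method in the
    # abstract iterates this kind of update over example utterances
    counts = defaultdict(int)
    for said, heard in confusion_pairs:
        for p, q in zip(said, heard):  # naive positional alignment
            if p != q:
                counts[(p, q)] += 1
    if counts:
        peak = max(counts.values())
        for pair, c in counts.items():
            # frequently confused phoneme pairs get cheaper substitutions
            sub_cost[pair] = 1.0 - 0.5 * (c / peak)
    return sub_cost
```

To limit the recognizable vocabulary, an application could then reject any candidate phrase whose confusability with an already-enabled phrase exceeds a chosen threshold.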

Determining input for speech processing engine

A method of presenting a signal to a speech processing engine is disclosed. According to an example of the method, an audio signal is received via a microphone. A portion of the audio signal is identified, and a probability is determined that the portion comprises speech directed by a user of the speech processing engine as input to the speech processing engine. In accordance with a determination that the probability exceeds a threshold, the portion of the audio signal is presented as input to the speech processing engine. In accordance with a determination that the probability does not exceed the threshold, the portion of the audio signal is not presented as input to the speech processing engine.
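
The threshold test can be illustrated with a minimal gate. The energy-based `speech_probability` below is a stand-in assumption for whatever trained classifier the method actually uses; only the exceeds-threshold branching mirrors the text.

```python
import math

def speech_probability(samples):
    # toy stand-in for a trained directedness classifier:
    # short-time energy squashed through a logistic function
    energy = sum(x * x for x in samples) / max(len(samples), 1)
    return 1.0 / (1.0 + math.exp(-10.0 * (energy - 0.1)))

def present_if_directed(samples, engine, threshold=0.5,
                        prob_fn=speech_probability):
    # forward the portion to the engine only if the probability that
    # it is user-directed speech exceeds the threshold
    if prob_fn(samples) > threshold:
        return engine(samples)
    return None  # withheld from the speech processing engine
```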

Enhancing signature word detection in voice assistants

Systems and methods for detecting a spoken sentence in a speech recognition system are disclosed herein. Speech data is buffered based on an audio signal captured at a computing device operating in an active mode. The speech data is buffered irrespective of whether it comprises a signature word. The buffered speech data is processed to detect the presence of a sentence comprising at least one command and a query for the computing device. Processing the buffered speech data includes detecting the signature word in the buffered speech data and, in response to detecting the signature word, initiating detection of the sentence in the buffered speech data.
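
A sketch of the buffer-first behavior, assuming word-level speech data for simplicity (a real system buffers audio frames): everything is buffered regardless of the signature word, and detecting the signature word hands the buffered span to downstream command/query detection. The class and method names are hypothetical.

```python
from collections import deque

class SignatureWordDetector:
    def __init__(self, signature, capacity=32):
        self.signature = signature
        # ring buffer: speech is retained whether or not it
        # contains the signature word
        self.buffer = deque(maxlen=capacity)

    def feed(self, word):
        self.buffer.append(word)
        if word.lower() == self.signature:
            # signature word detected: initiate sentence (command +
            # query) detection over the buffered speech data
            return list(self.buffer)
        return None
```

Because the buffer predates the detection, a command spoken before the signature word ("turn on the lights, <signature>") is still available to the sentence detector.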

Noise speed-ups in hidden Markov models with applications to speech recognition

A learning computer system may estimate unknown parameters and states of a stochastic or uncertain system having a probability structure. The system may include a data processing system with a hardware processor configured to: receive data; generate random, chaotic, fuzzy, or other numerical perturbations of the data, one or more of the states, or the probability structure; estimate observed and hidden states of the stochastic or uncertain system using the data, the generated perturbations, previous states of the stochastic or uncertain system, or estimated states of the stochastic or uncertain system; and cause perturbations or independent noise to be injected into the data, the states, or the stochastic or uncertain system so as to speed up training or learning of the probability structure and of the system parameters or states.
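
The noise-injection idea can be illustrated on a smaller cousin of the HMM: EM for a 1-D two-component Gaussian mixture, with annealed noise added to the data at each iteration. The annealing schedule (`noise_scale / k**2`) and all parameter choices below are illustrative assumptions, not the claimed method.

```python
import math
import random

def em_gmm_1d(data, iters=50, noise_scale=0.0, seed=0):
    # EM for a two-component 1-D Gaussian mixture, with optional
    # annealed noise injected into the data at each iteration
    rng = random.Random(seed)
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for k in range(1, iters + 1):
        scale = noise_scale / (k * k)  # assumed annealing schedule
        noisy = [x + rng.gauss(0.0, scale) for x in data] if scale else data
        # E-step: posterior responsibilities of each component
        resp = []
        for x in noisy:
            p = [w[j] / math.sqrt(2 * math.pi * var[j])
                 * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                 for j in range(2)]
            s = sum(p) or 1e-12
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, variances
        for j in range(2):
            nj = sum(r[j] for r in resp) or 1e-12
            mu[j] = sum(r[j] * x for r, x in zip(resp, noisy)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, noisy)) / nj, 1e-6)
            w[j] = nj / len(noisy)
    return mu
```

With well-separated components the injected noise mostly anneals away; the claim in the abstract is that suitably chosen noise can speed convergence of this kind of EM/Baum-Welch training.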

Masking systems and methods

Term masking is performed by: generating a time-alignment value for a plurality of units of sound in vocal audio content contained in a mixed audio track; force-aligning each of the plurality of units of sound to the vocal audio content based on the time-alignment value, thereby generating a plurality of force-aligned units of sound; identifying, from the plurality of force-aligned units of sound, a force-aligned unit of sound to be altered; and altering the identified force-aligned unit of sound.
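
Assuming the forced-alignment step has already produced word-level timestamps, the alteration step reduces to editing the corresponding sample range. This sketch silences the span; the patented method may apply other alterations (a tone, reversal, a substitute sound).

```python
def mask_terms(samples, sample_rate, alignments, blocklist):
    # alignments: (word, start_sec, end_sec) tuples from forced alignment
    out = list(samples)
    for word, start, end in alignments:
        if word.lower() in blocklist:
            a = int(start * sample_rate)
            b = min(int(end * sample_rate), len(out))
            for i in range(a, b):
                out[i] = 0.0  # silence the masked term
    return out
```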

Control method and control apparatus for speech interaction

The present disclosure provides a control method and a control apparatus for speech interaction. An implementation of the control method includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and, based on the wake-up word result, playing a prompt tone and/or executing a speech instruction contained in the audio signal.
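
One plausible reading of "playing a prompt tone and/or executing a speech instruction based on the wake-up word result", sketched over tokenized audio. The branching policy (bare wake word → prompt tone; wake word plus trailing command → execute directly) is an assumption, as is the default wake word.

```python
def handle_audio(words, wake_word="hi_assistant"):
    # returns (play_prompt_tone, instruction) from the wake-word result
    if wake_word not in words:
        return (False, None)            # no wake word: stay silent
    instruction = words[words.index(wake_word) + 1:]
    if instruction:
        # wake word followed by a command: execute it directly
        return (False, " ".join(instruction))
    return (True, None)                 # bare wake word: prompt and wait
```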

System and method for handling unwanted telephone calls through a branching node

Disclosed herein are systems and methods for handling unwanted telephone calls through a branching node. In one aspect, an exemplary method comprises: intercepting a call request from a terminal device of a calling party to a terminal device of a called party; establishing a connection through the branching node via two different communication channels, a first communication channel with the terminal device of the called party and a second communication channel with a call recorder; duplicating media data between the terminal devices such that one data stream is directed toward a receiving device of the media data and a second data stream is directed toward the call recorder; recording the call and sending the recording to an automatic speech recognizer that converts it into digital information suitable for analysis; and, when the call is unwanted, handling the call based on classification of the call.
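
The duplication and classification steps can be sketched separately: `branch_call` mirrors media frames into two streams, and `classify_and_handle` is a deliberately naive phrase-matching stand-in for whatever classifier consumes the speech recognizer's output. Function names, the phrase heuristic, and the drop/warn/allow actions are assumptions.

```python
def branch_call(media_frames, receiver, recorder):
    # the branching node duplicates each media frame into two streams:
    # one toward the receiving device, one toward the call recorder
    for frame in media_frames:
        receiver.append(frame)
        recorder.append(frame)

def classify_and_handle(transcript, unwanted_phrases):
    # naive phrase matching over the speech recognizer's output;
    # a real system would apply a trained call classifier here
    hits = sum(1 for p in unwanted_phrases if p in transcript.lower())
    if hits >= 2:
        return "drop"   # clearly unwanted: terminate the call
    if hits == 1:
        return "warn"   # suspicious: notify the called party
    return "allow"
```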

Device and method for supporting creation of reception history, non-transitory computer readable recording medium

The present invention makes it possible to efficiently create an appropriate dialogue history. The device for supporting creation of a dialogue history (1) is provided with: a dialogue utterance focus point information store (19) which, for utterance data representing utterances, stores dialogue scene data indicating the dialogue scenes of the utterances, utterance types indicating the types of the utterances, and utterance focus point information for the utterances; and an input/output interface (20) which, for each of the dialogue scenes indicated by the dialogue scene data stored in the dialogue utterance focus point information store (19), causes a display device to display one or more of the utterances, the utterance types, and the utterance focus point information. Based on an operation input to the input/output interface (20), the dialogue utterance focus point information store (19) adds, modifies, or deletes one or more of the dialogue scene data, the utterance types, and the utterance focus point information.
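
A minimal in-memory sketch of the store (19) and the add/modify/delete operations the input/output interface (20) would trigger; the data structure and field names are assumptions.

```python
class DialogueHistoryStore:
    # per dialogue scene, stores entries of utterance, utterance type,
    # and utterance focus point information
    def __init__(self):
        self.scenes = {}

    def add(self, scene, utterance, utterance_type, focus_point):
        self.scenes.setdefault(scene, []).append(
            {"utterance": utterance, "type": utterance_type,
             "focus": focus_point})

    def modify(self, scene, index, **fields):
        self.scenes[scene][index].update(fields)

    def delete(self, scene, index):
        del self.scenes[scene][index]

    def display(self, scene):
        # what the input/output interface would send to the display
        # device for one dialogue scene
        return [(e["utterance"], e["type"], e["focus"])
                for e in self.scenes.get(scene, [])]
```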