G10L25/00

Method and apparatus for activating speech recognition

A device to process an audio signal representing input sound includes a user voice verifier configured to generate a first indication based on whether the audio signal represents a user's voice. The device includes a speaking target detector configured to generate a second indication based on whether the audio signal represents at least one of a command or a question. The device includes an activation signal unit configured to selectively generate an activation signal based on the first indication and the second indication. The device also includes an automatic speech recognition engine configured to be activated, responsive to the activation signal, to process the audio signal.
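The two-stage gate described above can be sketched as follows; the verifier and detector here are hypothetical stand-ins for the claimed components, and the threshold and intent labels are made up for illustration.

```python
# Sketch of the two-indication activation gate: the ASR engine is activated
# only when the audio both matches the user's voice and carries a command
# or question. All scores and labels below are illustrative assumptions.

def user_voice_verifier(frame: dict) -> bool:
    """First indication: does the frame match the enrolled user's voice?"""
    return frame.get("speaker_score", 0.0) >= 0.8  # hypothetical threshold

def speaking_target_detector(frame: dict) -> bool:
    """Second indication: does the frame carry a command or a question?"""
    return frame.get("intent") in {"command", "question"}

def activation_signal(frame: dict) -> bool:
    """Selectively activate ASR only when both indications hold."""
    return user_voice_verifier(frame) and speaking_target_detector(frame)

frames = [
    {"speaker_score": 0.9, "intent": "command"},   # user issuing a command
    {"speaker_score": 0.9, "intent": "chatter"},   # user, but not directed speech
    {"speaker_score": 0.2, "intent": "question"},  # question from a non-user
]
activations = [activation_signal(f) for f in frames]
print(activations)  # only the first frame activates the ASR engine
```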

Restroom maintenance systems having a voice activated virtual assistant

Exemplary embodiments of restroom monitoring systems having virtual assistants include a communications gateway located in a restroom. The communications gateway has a processor, memory, short-range communications circuitry, long-range communications circuitry, a microphone, and a speaker. The communications gateway contains logic for listening for a wake-up word and, upon detecting the wake-up word, capturing a request; logic for processing the request to determine what is being requested; logic for verifying the request with the requester; and one of a plurality of wave files and a voice synthesizer. The system further includes one or more dispensers located in the restroom, each having short-range communications circuitry for communicating status or product level to the communications gateway.
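The gateway's listen/verify loop might look like the sketch below; the wake word, request vocabulary, and dispenser reporting format are all assumptions, not details from the claims.

```python
# Minimal sketch of the gateway logic: detect a wake word, capture the
# request that follows it, determine what is being requested, and answer
# using the dispenser status reported over short-range communications.

WAKE_WORD = "gateway"  # hypothetical wake word

def handle_audio(words, dispenser_levels):
    """Listen for the wake word, capture the request, and verify it."""
    if WAKE_WORD not in words:
        return None  # keep listening
    request = words[words.index(WAKE_WORD) + 1 :]  # capture the request
    if "towels" in request:  # determine what is being requested
        level = dispenser_levels.get("towel", 0)
        # Verification step: confirm the request with the requester.
        return f"Towel dispenser is at {level}%. Did you want a refill?"
    return "Request not understood."

levels = {"towel": 15}  # product level reported by a dispenser
print(handle_audio(["gateway", "check", "towels"], levels))
print(handle_audio(["just", "chatting"], levels))  # no wake word: ignored
```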

Method of generating estimated value of local inverse speaking rate (ISR) and device and method of generating predicted value of local ISR accordingly

A method is disclosed. The proposed method includes: providing an initial speech corpus including plural utterances; based on a maximum a posteriori (MAP) condition, according to the respective sequences of syllable duration, syllable duration prosodic state, syllable tone, base-syllable type, and break type of the kth utterance, using a probability of an ISR of the kth utterance x_k to estimate an estimated value x̂_k of x_k; and, through the MAP condition, according to the respective sequences of syllable duration, syllable duration prosodic state, syllable tone, base-syllable type, and break type of the lth breath group/prosodic phrase group (BG/PG) of the kth utterance, using a probability of an ISR of the lth BG/PG of the kth utterance x_{k,l} to estimate an estimated value x̂_{k,l} of x_{k,l}, wherein x̂_{k,l} is the estimated value of the local ISR, and the mean of the prior probability model of x̂_{k,l} is x̂_k.
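A hedged sketch of the MAP estimation structure described above; the factorization into a likelihood over the observed sequences and a prior over the ISR is a standard reading of a MAP condition, and the exact model forms are not specified in the abstract.

```latex
% Utterance-level MAP estimate, with seq_k collecting the syllable duration,
% prosodic state, tone, base-syllable type, and break type sequences:
\hat{x}_k = \arg\max_{x_k} P\left(x_k \mid \mathrm{seq}_k\right)
          = \arg\max_{x_k} P\left(\mathrm{seq}_k \mid x_k\right) P\left(x_k\right)

% Local (BG/PG-level) MAP estimate, whose prior is centered on the
% utterance-level estimate:
\hat{x}_{k,l} = \arg\max_{x_{k,l}} P\left(\mathrm{seq}_{k,l} \mid x_{k,l}\right)
                P\left(x_{k,l}\right),
\qquad \mathbb{E}\left[x_{k,l}\right] = \hat{x}_k
```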

Expandable dialogue system

A system that allows non-engineer administrators, without knowledge of programming, machine language, or artificial intelligence systems, to expand the capabilities of a dialogue system. The dialogue system may have a knowledge system, a user interface, and a learning model. The user interface allows non-engineers to use the knowledge system, defined by a small set of primitives and a simple language, to annotate a user utterance. The annotation may include selecting actions to take based on the utterance and subsequent actions, and configuring associations. A dialogue state is continuously updated and presented to the user as the actions and associations take place. Rules are generated based on the actions, associations, and dialogue state, allowing a wide range of results to be computed.
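One way to picture the annotation-to-rule flow is sketched below; the rule shape, field names, and matching strategy are assumptions for illustration, not the system's actual format.

```python
# Sketch: an annotated utterance is recorded as a rule pairing the
# utterance with selected actions and configured associations, and the
# dialogue state is updated whenever a matching utterance arrives.

def make_rule(utterance, actions, associations):
    """Record an annotation so similar utterances can reuse its actions."""
    return {"pattern": utterance.lower(),
            "actions": actions,
            "associations": associations}

def apply_rules(utterance, rules, state):
    """Update the dialogue state with actions from any matching rule."""
    for rule in rules:
        if rule["pattern"] == utterance.lower():
            state = dict(state, pending_actions=rule["actions"])
    return state

rules = [make_rule("Book a flight", ["search_flights"], {"city": "destination"})]
state = apply_rules("book a flight", rules, {"turn": 1})
print(state["pending_actions"])  # the annotated actions are reused
```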

System and method for contextual search query revision

Systems and methods for contextual search query revision are disclosed. A user utterance including at least one semantic component is received and a plurality of candidate n-grams including the at least one semantic component and at least one additional semantic component selected from a set of prior semantic components is generated. A probability that each of the plurality of candidate n-grams is an intended n-gram is calculated and a selected one of the plurality of candidate n-grams is output based on the probability.
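The candidate-selection step can be sketched as follows; the hand-assigned scores stand in for the calculated probabilities, and the candidate shapes are an illustrative reading of combining the spoken component with prior semantic components.

```python
# Sketch: build candidate n-grams from the current semantic component and
# prior components, score each candidate's probability of being the
# intended n-gram, and output the most probable one.

def revise_query(component, prior_components, score):
    """Return the candidate n-gram with the highest probability."""
    candidates = [(component,)] + [
        (prior, component) for prior in prior_components
    ]
    return max(candidates, key=score)

# Hypothetical probabilities standing in for a learned model.
scores = {("weather",): 0.2, ("paris", "weather"): 0.7, ("rome", "weather"): 0.1}
best = revise_query("weather", ["paris", "rome"], lambda c: scores.get(c, 0.0))
print(best)  # the context-expanded n-gram is selected
```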

SPEECH ANALYSER AND RELATED METHOD

A speech analyser and related methods are disclosed. The speech analyser comprises an input module for providing speech data based on a speech signal; a primary feature extractor for providing primary feature metrics of the speech data; a secondary feature extractor for providing secondary feature metrics associated with the speech data; and a speech model module comprising a neural network with model layers including an input layer, one or more intermediate layers including a first intermediate layer, and an output layer for providing a speaker metric. The speech model module is configured to condition an intermediate layer based on the secondary feature metrics, so that the conditioned output of that intermediate layer is provided as input to the next model layer in the neural network.
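One plausible reading of conditioning an intermediate layer on secondary feature metrics is a feature-wise scale-and-shift modulation; the sketch below assumes that reading, and the weights and metric names are made up.

```python
# Sketch: compute an intermediate layer's output, then condition it on
# secondary feature metrics before passing it to the next model layer.
# The scale/shift form and all coefficients are illustrative assumptions.

def intermediate_layer(x, secondary, w=1.0):
    """Return the layer output conditioned on the secondary metrics."""
    h = [w * xi for xi in x]                 # plain layer output
    scale = 1.0 + 0.1 * secondary["rate"]    # conditioning from the metrics
    shift = 0.05 * secondary["pitch"]
    return [scale * hi + shift for hi in h]  # input to the next model layer

out = intermediate_layer([0.5, -0.2], {"rate": 2.0, "pitch": 1.0})
print(out)
```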

Real-time vs. non-real-time audio streaming

One or more items of audio data are received. An expected bitrate of the audio data is determined. An input bitrate of the audio data is determined. An R value is determined using the expected bitrate and the input bitrate. The R value is compared to an R threshold.
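A minimal sketch of the comparison described above; the abstract does not define the R formula or the threshold, so the ratio of input to expected bitrate and the 0.9 threshold here are assumptions.

```python
# Sketch: derive an R value from the expected and input bitrates and
# compare it to a threshold to classify the stream. The formula and
# threshold are illustrative assumptions.

def is_real_time(expected_bitrate, input_bitrate, r_threshold=0.9):
    """Classify the stream by comparing the R value to the threshold."""
    r = input_bitrate / expected_bitrate  # assumed R formula
    return r >= r_threshold

print(is_real_time(128_000, 127_000))  # near the expected rate
print(is_real_time(128_000, 64_000))   # half the expected rate
```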

System and method for interactive cognitive task assistance

A cognitive assistant that allows a maintainer to speak to an application using natural language is disclosed. The maintainer can interact with the application quickly and hands-free, without complex user interfaces or memorized voice commands. The assistant provides instructions to the maintainer using augmented-reality audio and visual cues, walks the maintainer through maintenance tasks, and verifies proper execution using IoT sensors. If, after completing a step, the IoT sensor readings are not as expected, the maintainer is notified of how to resolve the situation.
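The step-verification flow can be sketched as below, assuming a simple mapping from each maintenance step to its expected IoT sensor readings; all step and sensor names are illustrative.

```python
# Sketch: after the maintainer completes a step, compare the IoT sensor
# readings to the expected values and return guidance when they differ.

def verify_step(step, sensors, expected):
    """Check IoT readings for a step; return guidance on any mismatch."""
    want = expected[step]
    mismatches = {k: v for k, v in want.items() if sensors.get(k) != v}
    if not mismatches:
        return "Step verified. Proceed to the next step."
    return f"Check failed for {sorted(mismatches)}; re-seat and retry."

expected = {"replace_filter": {"panel_closed": True, "pressure_ok": True}}
print(verify_step("replace_filter",
                  {"panel_closed": True, "pressure_ok": True}, expected))
print(verify_step("replace_filter",
                  {"panel_closed": False, "pressure_ok": True}, expected))
```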

Method and apparatus for detecting correctness of pitch period
11741980 · 2023-08-29

A method and an apparatus for detecting correctness of a pitch period are disclosed. The method includes determining, according to an initial pitch period of an input signal in a time domain, a pitch frequency bin of the input signal, where the initial pitch period is obtained by performing open-loop detection on the input signal; determining, based on an amplitude spectrum of the input signal in a frequency domain, a pitch period correctness decision parameter of the input signal associated with the pitch frequency bin; and determining the correctness of the initial pitch period according to the decision parameter.
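The bin mapping and decision step might look like the sketch below; mapping a period of P samples to bin N/P of an N-point spectrum follows from the definitions, while the particular decision parameter used here (amplitude concentration at the pitch bin) is an illustrative stand-in for the claimed one.

```python
# Sketch: map the open-loop pitch period to its frequency bin, form a
# decision parameter from the amplitude spectrum at that bin, and decide
# whether the initial pitch period is correct.

def pitch_bin(period_samples, fft_size):
    """Frequency bin of the pitch: bin = fft_size / period (rounded)."""
    return round(fft_size / period_samples)

def decision_parameter(amplitude_spectrum, bin_index):
    """Fraction of spectral amplitude concentrated at the pitch bin."""
    total = sum(amplitude_spectrum)
    return amplitude_spectrum[bin_index] / total if total else 0.0

spectrum = [0.0] * 16
spectrum[4] = 8.0   # strong component at bin 4
spectrum[8] = 2.0   # weaker harmonic at bin 8
b = pitch_bin(4, 16)          # initial period of 4 samples maps to bin 4
p = decision_parameter(spectrum, b)
print(b, p > 0.5)  # parameter supports the initial pitch period
```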