G10L15/32

Synthesizing higher order conversation features for a multiparty conversation

Technology is provided for identifying synthesized conversation features from recorded conversations. The technology can identify, for each of one or more utterances, data for multiple modalities, such as acoustic data, video data, and text data. The technology can extract features, for each particular utterance of the one or more utterances, from each of the data for the multiple modalities associated with that particular utterance. The technology can also apply a machine learning model that receives the extracted features and/or previously synthesized conversation features and produces one or more additional synthesized conversation features.

SPEECH RECOGNITION ERROR CORRECTION METHOD, RELATED DEVICES, AND READABLE STORAGE MEDIUM
20220383853 · 2022-12-01

A speech recognition error correction method and device, and a readable storage medium are provided. The method includes: acquiring to-be-recognized speech data and a first recognition result of the speech data, re-recognizing the speech data with reference to context information in the first recognition result to obtain a second recognition result, and determining a final recognition result based on the second recognition result. In the method, the speech data is re-recognized with reference to context information in the first recognition result, which fully considers context information in the recognition result and the application scenario of the speech data. If any error occurs in the first recognition result, the first recognition result is corrected based on the second recognition result. Therefore, the accuracy of speech recognition can be improved.

USAGE OF VOICE RECOGNITION CONFIDENCE LEVELS IN A PASSENGER INTERFACE
20220383870 · 2022-12-01

A voice recognition system for an elevator system including: one or more microphones configured to capture a voice command from an individual and convert the voice command into an audio signal; a command arbitrator including one or more speech interpretation systems, the command arbitrator being configured to analyze the audio signal and determine an interpreted command for the elevator system from the audio signal using the one or more speech interpretation systems, wherein the interpreted command includes a confidence measure associated with the interpreted command, and wherein the confidence measure is an indicator depicting how confident the command arbitrator is that the interpreted command matches the voice command from the individual.
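A minimal sketch of the arbitration idea described above, under stated assumptions: each speech interpretation system is modeled as a hypothetical callable that returns an interpreted command with a confidence measure between 0.0 and 1.0, and the arbitrator simply keeps the most confident candidate. The abstract does not say how candidates are combined, so the max-confidence rule, the `InterpretedCommand` type, and the stub interpreters are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InterpretedCommand:
    command: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


# Hypothetical interface: an interpreter maps an audio signal to a command.
Interpreter = Callable[[bytes], InterpretedCommand]


def arbitrate(audio: bytes, interpreters: List[Interpreter]) -> InterpretedCommand:
    """Run every interpreter and keep the most confident result (assumed rule)."""
    if not interpreters:
        raise ValueError("at least one speech interpretation system is required")
    candidates = [interpret(audio) for interpret in interpreters]
    return max(candidates, key=lambda c: c.confidence)


# Stub interpreters standing in for real speech interpretation systems.
def stub_a(audio: bytes) -> InterpretedCommand:
    return InterpretedCommand("go to floor 5", 0.62)


def stub_b(audio: bytes) -> InterpretedCommand:
    return InterpretedCommand("go to floor 9", 0.91)


best = arbitrate(b"\x00\x01", [stub_a, stub_b])
print(best.command, best.confidence)
```

A passenger interface could then compare `best.confidence` against a threshold of its choosing before acting on the command.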

DYNAMIC SPEECH RECOGNITION METHODS AND SYSTEMS WITH USER-CONFIGURABLE PERFORMANCE

Methods and systems are provided for assisting operation of a vehicle using speech recognition. One method involves identifying a user-configured speech recognition performance setting value selected from among a plurality of speech recognition performance setting values, selecting a speech recognition model configuration corresponding to the user-configured speech recognition performance setting value from among a plurality of speech recognition model configurations, where each speech recognition model configuration of the plurality of speech recognition model configurations corresponds to a respective one of the plurality of speech recognition performance setting values, and recognizing an audio input as an input state using the speech recognition model configuration corresponding to the user-configured speech recognition performance setting value.
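The one-to-one correspondence between performance setting values and model configurations can be sketched as a lookup table. Everything named here is hypothetical: the setting values (`"fast"`, `"balanced"`, `"accurate"`), the `ModelConfig` fields, and their numbers are placeholders, since the abstract does not specify what a configuration contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical tuning knobs; the abstract does not name any.
    name: str
    beam_width: int
    vocabulary_size: int


# One configuration per user-selectable performance setting value.
CONFIGS = {
    "fast": ModelConfig("small", beam_width=4, vocabulary_size=5_000),
    "balanced": ModelConfig("medium", beam_width=8, vocabulary_size=20_000),
    "accurate": ModelConfig("large", beam_width=16, vocabulary_size=50_000),
}


def select_config(setting: str) -> ModelConfig:
    """Return the model configuration matching the user-configured setting."""
    try:
        return CONFIGS[setting]
    except KeyError:
        raise ValueError(f"unknown performance setting: {setting!r}") from None


config = select_config("balanced")
print(config)
```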

SYSTEM AND METHOD FOR EXTRACTING AND DISPLAYING SPEAKER INFORMATION IN AN ATC TRANSCRIPTION

A system for extracting speaker information in an ATC transcription and displaying the speaker information on a graphical display unit is provided. The system is configured to: segment a stream of audio received from an ATC and other aircraft into a plurality of chunks; determine, for each chunk, if the speaker is enrolled in an enrolled speaker database; when the speaker is enrolled in the enrolled speaker database, decode the chunk using a speaker-dependent automatic speech recognition (ASR) model and tag the chunk with a permanent name for the speaker; when the speaker is not enrolled in the enrolled speaker database, assign a temporary name for the speaker, tag the chunk with the temporary name, and decode the chunk using a speaker-independent speech recognition model; format the decoded chunk as text; and signal the graphical display unit to display the formatted text along with an identity for the speaker.
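The per-chunk branching above can be sketched as follows. This is an illustration under assumptions, not the patented implementation: speaker identification is reduced to a hypothetical voiceprint key, the enrolled speaker database is a plain dictionary, and the temporary naming scheme (`Speaker 1`, `Speaker 2`, ...) is invented for the example.

```python
from typing import Dict, Tuple


def tag_chunk(
    voiceprint: str,
    enrolled: Dict[str, str],
    temp_names: Dict[str, str],
) -> Tuple[str, str]:
    """Return (speaker name, ASR model kind) for one audio chunk.

    `voiceprint` is a hypothetical stand-in for whatever speaker
    identification the real system performs on the chunk.
    """
    if voiceprint in enrolled:
        # Enrolled speaker: permanent name, speaker-dependent model.
        return enrolled[voiceprint], "speaker-dependent"
    # Unknown speaker: assign a temporary name once and reuse it afterwards.
    if voiceprint not in temp_names:
        temp_names[voiceprint] = f"Speaker {len(temp_names) + 1}"
    return temp_names[voiceprint], "speaker-independent"


enrolled_db = {"vp-tower": "Tower"}
temporary: Dict[str, str] = {}

for vp in ["vp-tower", "vp-unknown-a", "vp-unknown-b", "vp-unknown-a"]:
    name, model = tag_chunk(vp, enrolled_db, temporary)
    print(f"{name}: decoded with {model} model")
```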

Third party account linking for voice user interface

Methods and systems for adding functionality to an account of a language processing system where the functionality is associated with a second account of a first application system are described herein. In a non-limiting embodiment, an individual may log into a first account of a language processing system and log into a second account of a first application system. While logged into both the first account and the second account, a button included within a webpage provided by the first application may be invoked. A request capable of being serviced using the first functionality may be received by the language processing system from a device associated with the first account. The language processing system may send first account data and the second account data to the first application system to facilitate an action associated with the request, thereby enabling the first functionality for the first account.

Device arbitration by multiple speech processing systems
11513766 · 2022-11-29

A device can perform device arbitration, even when the device is unable to communicate with a remote system over a wide area network (e.g., the Internet). Upon detecting a wakeword in an utterance, the device can wait a period of time for data to arrive at the device, which, if received, indicates to the device that another speech interface device in the environment detected an utterance. If the device receives data prior to the period of time lapsing, the device can determine the earliest-occurring wakeword based on multiple wakeword occurrence times, and may designate whichever device detected the wakeword first as the designated device to perform an action with respect to the user speech. To account for differences in sound capture latency between speech interface devices, a pre-calculated time offset value can be applied to wakeword occurrence time(s) during device arbitration.
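The earliest-wakeword rule with latency compensation can be sketched as below. The abstract only says that a pre-calculated offset "can be applied" to the occurrence times; subtracting a per-device capture latency is an assumption made for this example, as are the device names and timestamps.

```python
from typing import Dict


def designate_device(
    wakeword_times: Dict[str, float],
    latency_offsets: Dict[str, float],
) -> str:
    """Pick the device whose offset-adjusted wakeword time is earliest.

    Times and offsets are in seconds. Subtracting the offset is an assumed
    convention; devices without a known offset are left unadjusted.
    """
    if not wakeword_times:
        raise ValueError("no wakeword occurrences reported")
    adjusted = {
        device: t - latency_offsets.get(device, 0.0)
        for device, t in wakeword_times.items()
    }
    return min(adjusted, key=adjusted.get)


# Hypothetical reports collected during the wait period.
times = {"kitchen": 10.120, "living_room": 10.100}
offsets = {"kitchen": 0.050, "living_room": 0.010}

print(designate_device(times, {}))       # raw times only
print(designate_device(times, offsets))  # with latency compensation
```

In this made-up data the raw timestamps favor `living_room`, while the compensated timestamps favor `kitchen`, which is the kind of reversal the offset is meant to catch.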

Method, device, and system of selectively using multiple voice data receiving devices for intelligent service

An electronic device is provided, which includes a user interface, at least one communication module, a microphone, at least one speaker, at least one processor operatively connected with the user interface, the at least one communication module, the microphone, and the at least one speaker, and at least one memory operatively connected with the at least one processor. The at least one memory stores instructions which, when executed, instruct the at least one processor to: while the electronic device is connected, by wire or wirelessly, with an access point (AP) connected with at least one external electronic device, after receiving, through the microphone, part of a wake-up utterance to invoke a voice-based intelligent assistant service, broadcast identification information about the electronic device and receive identification information broadcast from the external electronic device; after receiving the whole wake-up utterance through the microphone, individually transmit first information related to the wake-up utterance received through the microphone to the at least one external electronic device and individually receive, from the external electronic device, second information related to the wake-up utterance received by the at least one external electronic device; and determine whether to transmit voice information received after the wake-up utterance to an external server based on at least part of the first information and the second information. Various other embodiments are possible as well.