G10L15/01

Speech recognition with parallel recognition tasks
11527248 · 2022-12-13

The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRSs). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating confidence in the correctness of the recognition result. The method also includes completing a portion of the speech recognition tasks, including generating one or more recognition results and one or more confidence values for those results; determining whether the one or more confidence values meet a confidence threshold; aborting a remaining portion of the speech recognition tasks for SRSs that have not generated a recognition result; and outputting a final recognition result based on at least one of the generated recognition results.
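
The parallel-recognition flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: each SRS is modeled as a hypothetical async callable returning a `(transcript, confidence)` pair, and the names `recognize_parallel` and `make_fake_srs`, as well as the rule of returning the highest-confidence result, are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Sequence, Tuple

Result = Tuple[str, float]  # (transcript, confidence)
Recognizer = Callable[[bytes], Awaitable[Result]]  # hypothetical SRS interface


async def recognize_parallel(
    audio: bytes, recognizers: Sequence[Recognizer], threshold: float
) -> Optional[Result]:
    """Run all recognizers concurrently; stop early once a result meets the threshold."""
    pending = {asyncio.create_task(r(audio)) for r in recognizers}
    results: List[Result] = []
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            results.extend(task.result() for task in done)
            if any(conf >= threshold for _, conf in results):
                break  # confident enough: the remaining tasks are aborted below
    finally:
        for task in pending:
            task.cancel()
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
    # Assumption: the final output is the completed result with the highest confidence.
    return max(results, key=lambda r: r[1]) if results else None


def make_fake_srs(text: str, confidence: float, delay: float) -> Recognizer:
    """Build a stand-in recognizer that 'finishes' after `delay` seconds."""
    async def srs(audio: bytes) -> Result:
        await asyncio.sleep(delay)
        return text, confidence
    return srs


if __name__ == "__main__":
    fast = make_fake_srs("hello world", 0.9, delay=0.01)
    slow = make_fake_srs("yellow world", 0.95, delay=5.0)
    print(asyncio.run(recognize_parallel(b"", [fast, slow], threshold=0.8)))
```

In this sketch the slow recognizer is cancelled as soon as the fast one clears the threshold, so overall latency tracks the fastest sufficiently confident SRS rather than the slowest one.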

Voice message capturing system
11527251 · 2022-12-13

Systems, apparatuses, and methods for capturing voice messages are provided. In one embodiment, a method can include receiving, by one or more processors of a mobile user device, a user input indicative of a voice message at a first time. The method can further include identifying contextual data indicative of one or more computing devices within proximity of the mobile user device. The method can include providing a set of data for storage in one or more memory devices of the mobile user device. The set of data can indicate the voice message and the contextual data indicative of the computing devices. The method can further include providing an output indicative of the voice message and the contextual data to one or more secure computing devices at a second time.

Rescoring Automatic Speech Recognition Hypotheses Using Audio-Visual Matching

A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.
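
The selection step described here (score every candidate against the lip motion, then keep the best) can be sketched in Python as follows. The synthesis and scoring models are not specified in the abstract, so they are passed in as hypothetical callables; `select_transcription`, `synthesize`, and `agreement_score` are illustrative names only.

```python
from typing import Any, Callable, Sequence


def select_transcription(
    candidates: Sequence[str],
    video: Any,
    synthesize: Callable[[str], Any],
    agreement_score: Callable[[Any, Any], float],
) -> str:
    """Return the candidate whose synthesized speech best agrees with the video."""
    if not candidates:
        raise ValueError("at least one candidate transcription is required")
    # Score every candidate: synthesize it, then compare against the lip motion.
    scored = [(agreement_score(synthesize(text), video), text) for text in candidates]
    # Keep the candidate with the highest agreement score.
    return max(scored, key=lambda pair: pair[0])[1]


# Toy stand-ins (assumptions): the "synthesized representation" is just a word
# count, and the "video" is the number of word-like lip movements observed.
def toy_synthesize(text: str) -> int:
    return len(text.split())


def toy_agreement(word_count: int, lip_movements: int) -> float:
    return 1.0 / (1.0 + abs(word_count - lip_movements))


if __name__ == "__main__":
    hypotheses = ["recognize speech", "wreck a nice beach"]
    print(select_transcription(hypotheses, 4, toy_synthesize, toy_agreement))
```

Injecting the two models as callables keeps the selection logic independent of whichever synthesis and matching models are actually used.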

WAKEUP TESTING METHOD AND APPARATUS, ELECTRONIC DEVICE AND READABLE STORAGE MEDIUM

The present disclosure provides a wakeup testing method and apparatus, an electronic device and a readable storage medium, and relates to the technical fields of automatic driving and intelligent transportation. The wakeup testing method includes: acquiring a plurality of wakeup audios; playing the plurality of wakeup audios in a vehicle, and configuring a noise environment during the playing of each wakeup audio; and obtaining, according to wakeup results of at least one to-be-tested vehicle-mounted client in the vehicle for different wakeup audios, wakeup rates of the at least one to-be-tested vehicle-mounted client. The present disclosure can reduce the cost of wakeup testing and improve its accuracy.
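
As a rough illustration of the final step, the wakeup rate of each client can be computed as the fraction of played wakeup audios that woke it. The record layout `(client_id, audio_id, woke)` and the function name `wakeup_rates` are assumptions for this sketch; the abstract does not define a data format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (client_id, audio_id, whether the client woke up)
WakeupRecord = Tuple[str, str, bool]


def wakeup_rates(records: Iterable[WakeupRecord]) -> Dict[str, float]:
    """Return, per client, successful wakeups divided by audios played."""
    played: Dict[str, int] = defaultdict(int)
    woke_count: Dict[str, int] = defaultdict(int)
    for client_id, _audio_id, woke in records:
        played[client_id] += 1
        if woke:
            woke_count[client_id] += 1
    return {client: woke_count[client] / played[client] for client in played}


if __name__ == "__main__":
    log = [
        ("client_a", "audio_1", True),
        ("client_a", "audio_2", True),
        ("client_a", "audio_3", False),
        ("client_a", "audio_4", True),
        ("client_b", "audio_1", False),
        ("client_b", "audio_2", True),
        ("client_b", "audio_3", False),
        ("client_b", "audio_4", False),
    ]
    print(wakeup_rates(log))
```

Because several clients hear the same playback, one test pass yields a rate for each of them.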

ERROR CORRECTION IN SPEECH RECOGNITION

Systems and methods for speech recognition correction include receiving a voice recognition input from an individual user and using a trained error correction model to add a new alternative result to a results list based on the received voice input processed by a voice recognition system. The error correction model is trained using contextual information corresponding to the individual user. The contextual information comprises a plurality of historical user correction logs, a plurality of personal class definitions, and an application context. A re-ranker re-ranks the results list with the new alternative result and a top result from the re-ranked results list is output.
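
The add-then-re-rank flow can be sketched in Python under a strong simplification: the trained error correction model is replaced here by a plain lookup table built from one user's past corrections. The function name `correct_and_rerank`, the `boost` parameter, and the scoring rule are all assumptions for illustration, not part of the described system.

```python
from typing import Dict, List, Tuple

Hypothesis = Tuple[str, float]  # (recognized text, recognizer score)


def correct_and_rerank(
    results: List[Hypothesis],
    correction_log: Dict[str, str],
    boost: float = 0.1,
) -> List[Hypothesis]:
    """Add alternatives suggested by past user corrections, then re-rank."""
    ranked: Dict[str, float] = dict(results)
    for text, score in results:
        corrected = correction_log.get(text)
        if corrected is not None:
            # Assumption: a correction the user made before outranks its source.
            ranked[corrected] = max(ranked.get(corrected, 0.0), score + boost)
    # Highest score first; the first entry is the top result to output.
    return sorted(ranked.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    recognizer_output = [("call june", 0.6), ("call joan", 0.5)]
    user_corrections = {"call june": "call jun"}  # hypothetical correction log
    reranked = correct_and_rerank(recognizer_output, user_corrections)
    print(reranked[0][0])
```

A real system would also draw on the personal class definitions and application context mentioned in the abstract; this sketch uses only the correction history.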

Filtering directive invoking vocal utterances

Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: receiving, from a user, voice data defining a candidate directive invoking vocal utterance for invoking a directive to execute a first text based command to perform a first computer function of a computer system, wherein the candidate directive invoking vocal utterance includes at least one word or phrase of the text based command, wherein the computer system is configured to perform the first computer function in response to the first text based command and wherein the computer system is configured to perform a second computer function in response to a second text based command; determining, based on machine logic, whether a word or phrase of the candidate vocal utterance sounds confusingly similar to a speech rendering of a word or phrase defining the second text based command.
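
A rough Python sketch of the "confusingly similar" check follows. The abstract does not say how similarity is measured, so this example substitutes character-level string similarity from the standard library's `difflib.SequenceMatcher` for a true phonetic comparison; the function name `confusable_commands` and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def confusable_commands(
    candidate: str, other_commands: Iterable[str], threshold: float = 0.8
) -> List[str]:
    """Return the existing commands that the candidate utterance resembles too closely."""
    candidate_norm = candidate.lower().strip()
    flagged = []
    for command in other_commands:
        # Spelling similarity stands in for acoustic similarity (an assumption).
        ratio = SequenceMatcher(None, candidate_norm, command.lower().strip()).ratio()
        if ratio >= threshold:
            flagged.append(command)
    return flagged


if __name__ == "__main__":
    existing = ["open mail", "shut down"]
    print(confusable_commands("open male", existing))
    print(confusable_commands("play music", existing))
```

A candidate utterance that returns a non-empty list would be filtered out or sent back to the user for a different phrase. A production system would more plausibly compare phoneme sequences, since spelling similarity misses pairs such as "write" and "right".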

Automated Social Agent Interaction Quality Monitoring and Improvement

A system for monitoring and improving social agent interaction quality includes a computing platform having processing hardware and a system memory storing a software code. The processing hardware is configured to execute the software code to receive, from a social agent, interaction data describing an interaction of the social agent with a user, and to perform an assessment of the interaction, using the interaction data, as one of successful or including a flaw. When the assessment indicates that the interaction includes the flaw, the processing hardware is further configured to execute the software code to identify an interaction strategy for correcting the flaw, and to deliver, to the social agent, one or both of the assessment and the interaction strategy to correct the flaw in the interaction.