G10L15/32

Electronic device for controlling predefined function based on response time of external electronic device on user input, and method thereof
11531835 · 2022-12-20 · ·

Various embodiments of the disclosure relate to an electronic device for controlling a predefined function based on a response time of an external electronic device on a user input, and a method thereof. The electronic device includes: a memory configured to store one or more applications; a communication module comprising communication circuitry configured to communicate with an external electronic device; and a processor, wherein the processor is configured to control the electronic device to: receive an input; generate first control data for controlling at least one application among the one or more applications using a first recognition method based at least on the input; transmit at least part of the input to the external electronic device through the communication module, wherein the external electronic device is configured to generate second control data for controlling the at least one application using a second recognition method based at least on the input; identify a time that passes until the second control data is received after the at least part of the input is transmitted to the external electronic device; control the at least one application using the first control data based on the passing time satisfying a first predefined condition; and control the at least one application using the second control data based on the passing time satisfying a second predefined condition.
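The selection step in this abstract can be illustrated with a minimal sketch. The abstract does not say what the two predefined conditions are, so this example assumes the simplest reading: the remote result is used if it arrives within a timeout, and the local result is used otherwise. The names `ControlData`, `choose_control_data`, and `timeout_s` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlData:
    source: str   # "local" (first recognition method) or "remote" (second)
    command: str  # command to pass to the application


def choose_control_data(
    local: ControlData,
    remote: Optional[ControlData],
    elapsed_s: float,
    timeout_s: float = 1.0,  # assumed threshold, not specified in the abstract
) -> ControlData:
    """Pick which control data drives the application.

    Assumed conditions: use the remote result when it exists and arrived
    within timeout_s; otherwise fall back to the locally generated result.
    """
    if remote is not None and elapsed_s <= timeout_s:
        return remote
    return local
```

In this reading, a slow or missing server response never blocks the application, because the on-device result is always available as a fallback.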

Method for controlling the operation of an appliance by a user through voice control
11532304 · 2022-12-20 · ·

A method for controlling operation of an appliance by a user through voice control includes at least the steps of: detecting, by the appliance, a control action performed by the user on the appliance; activating a voice control system by the appliance; capturing, by the voice control system, a voice input from the user as a captured voice input; recognizing, by the voice control system, a piece of information and/or an instruction in the captured voice input from the user as a recognized information and/or instruction; and executing, by the voice control system, a user control action on the appliance in accordance with the recognized information and/or instruction.

Speech-to-text system
11532308 · 2022-12-20 · ·

Systems and methods for processing speech transcription in a speech processing system are disclosed. A first transcription of a first utterance is received. In response to receiving an indication of an erroneous transcribed word in the first transcription, control circuitry automatically activates an audio receiver for receiving a second utterance. In response to receiving the second utterance, an audio file of the second utterance and an indication of a location of the erroneous transcribed word within the first transcription are transmitted to a speech recognition system for a second transcription of the second utterance. Subsequently, the erroneous transcribed word in the first transcription is replaced with a transcribed word from the second transcription.
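The final replacement step can be sketched as follows. This is only an illustration, assuming the location of the erroneous word is a zero-based word index and that words are separated by whitespace; the abstract does not specify either. The function name `replace_word` is hypothetical.

```python
def replace_word(transcript: str, index: int, replacement: str) -> str:
    """Replace the word at a zero-based word index with a corrected word.

    Assumes whitespace-separated words; the index stands in for the
    "indication of a location" mentioned in the abstract.
    """
    words = transcript.split()
    if not 0 <= index < len(words):
        raise IndexError(f"word index {index} is outside the transcript")
    words[index] = replacement
    return " ".join(words)
```

For example, `replace_word("play the knew song", 2, "new")` yields `"play the new song"`.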

VOICE ANALYSIS SYSTEM
20220399011 · 2022-12-15 · ·

[Object] To provide a highly accurate voice analysis system. [Solution] A voice analysis system 1 includes a first voice analysis terminal 3 and a second voice analysis terminal 5. The first voice analysis terminal 3 includes a first term analysis unit 7 that obtains first conversation information, a first conversation storage unit 9 that stores the first conversation information, a first analysis unit 11 that analyzes the first conversation information, a presentation storage unit 13, a related term storage unit 15, a display unit 17, a topic word storage unit 19, and a conversation information reception unit 25 that receives second conversation information from the second voice analysis terminal 5. The second voice analysis terminal 5 includes a second term analysis unit 21 that obtains the second conversation information and a second conversation storage unit 23. The first analysis unit 11 employs the first conversation section or the second conversation section as a correct conversation section using a relationship between a first topic word and a specific related term and a relationship between a second topic word and the specific related term.

PROVIDING HIGH QUALITY SPEECH RECOGNITION
20220399006 · 2022-12-15 ·

A computer-implemented method, system and computer program product for providing high quality speech recognition are disclosed. A first speech-to-text model is selected to perform speech recognition of a customer's spoken words and a second speech-to-text model is selected to perform speech recognition of an agent's spoken words during a call. The combined results of the speech-to-text models used to process the customer's and agent's spoken words are then analyzed to generate a reference speech-to-text result. The customer speech data that was processed by the first speech-to-text model is reprocessed by multiple other speech-to-text models. A similarity analysis is performed on the results of these speech-to-text models with respect to the reference speech-to-text result, resulting in similarity scores being assigned to these speech-to-text models. The speech-to-text model with the highest similarity score is then selected as the new speech-to-text model for performing speech recognition of the customer's spoken words during the call.
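The similarity analysis and model selection can be sketched as below. The abstract does not name a similarity measure, so this example assumes a word-level sequence ratio from Python's standard `difflib` module as a stand-in. The function names and the model names in the usage note are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple


def similarity(reference: str, hypothesis: str) -> float:
    """Word-level similarity in [0, 1]; an assumed stand-in metric."""
    return SequenceMatcher(None, reference.split(), hypothesis.split()).ratio()


def select_model(reference: str, candidates: Dict[str, str]) -> Tuple[str, float]:
    """Score each candidate transcript against the reference result and
    return the name and score of the most similar model."""
    scores = {name: similarity(reference, text) for name, text in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Given a reference of `"i want to cancel my order"` and candidate transcripts keyed by model name, `select_model` returns the model whose output is closest to the reference, which would then take over recognition of the customer's speech.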

Speech recognition with parallel recognition tasks
11527248 · 2022-12-13 · ·

The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meet a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
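The control flow described here (start several recognizers, stop waiting once a result is confident enough, abort the rest) can be sketched with `asyncio`. This is a simplified illustration, not the patented method: `fake_recognizer` is a hypothetical stand-in for a real SRS, the threshold value is assumed, and the final result is taken to be the highest-confidence result gathered so far.

```python
import asyncio
from typing import Awaitable, Iterable, List, Tuple

Result = Tuple[str, str, float]  # (recognizer name, text, confidence)


async def fake_recognizer(name: str, text: str, confidence: float,
                          latency_s: float) -> Result:
    """Hypothetical SRS: sleeps to simulate latency, then returns a result."""
    await asyncio.sleep(latency_s)
    return name, text, confidence


async def recognize(recognizers: Iterable[Awaitable[Result]],
                    threshold: float = 0.8) -> Result:
    """Run recognizers concurrently and abort the rest once any completed
    result meets the (assumed) confidence threshold."""
    pending = {asyncio.ensure_future(r) for r in recognizers}
    results: List[Result] = []
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
            results.extend(task.result() for task in done)
            if any(conf >= threshold for _, _, conf in results):
                break
    finally:
        # Abort recognizers that have not produced a result yet.
        for task in pending:
            task.cancel()
    # Assumed policy: output the most confident result collected so far.
    return max(results, key=lambda r: r[2])
```

If no result reaches the threshold, the loop simply waits for every recognizer to finish and returns the best of the completed results.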

Text independent speaker recognition

Text independent speaker recognition models can be utilized by an automated assistant to verify that a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying that a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model and a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
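Two of the ideas in this abstract, updating a speaker embedding from previous utterances and verifying with both model outputs, can be sketched as follows. The abstract does not say how the embedding is updated or how the two outputs are combined, so this example assumes a running mean and a weighted sum against a fixed threshold. All function names, the weight, and the threshold are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between an utterance embedding and a speaker embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_embedding(current: Sequence[float], new: Sequence[float],
                     count: int) -> List[float]:
    """Fold a new utterance embedding into the speaker embedding.

    Assumed update rule: running mean, where `count` is the number of
    utterances already averaged into `current`.
    """
    return [(c * count + n) / (count + 1) for c, n in zip(current, new)]


def verify(ti_score: float, td_score: float,
           ti_weight: float = 0.5, threshold: float = 0.7) -> bool:
    """Combine text-independent and text-dependent scores (assumed weighted sum)."""
    combined = ti_weight * ti_score + (1.0 - ti_weight) * td_score
    return combined >= threshold
```

With the default weight, scores of 0.9 and 0.8 combine to 0.85 and pass verification, while scores of 0.4 and 0.5 combine to 0.45 and fail.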