Patent classifications
G10L15/065
Improving speech recognition transcriptions
An approach to correcting transcriptions of speech recognition models may be provided. A list of similar-sounding phonemes associated with the phonemes of high-frequency terms may be generated for a particular node associated with a virtual assistant. An utterance may be transcribed, and the transcription may receive a confidence score regarding its correctness based on audio metrics and other factors. The phonemes of the utterance can be compared to the phonemes of the high-frequency terms from the list, and a score for the matching phonemes and similar-sounding phonemes can be determined. If the sounds-similar score for a term from the high-frequency term list is determined to be above a threshold, the transcription can be replaced with the term, providing a corrected transcription.
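The correction step described in this abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented method: the `SIMILAR` phoneme table, the position-wise scoring with a 0.5 weight for similar-sounding pairs, the function names, both threshold values, and the choice to attempt correction only when the confidence score is low are all assumptions made for the example.

```python
# Hypothetical table of similar-sounding phoneme pairs (ARPAbet-style symbols).
SIMILAR = {frozenset(p) for p in [("P", "B"), ("T", "D"), ("M", "N"), ("S", "Z")]}


def sounds_similar_score(utterance, term):
    """Position-wise score in [0, 1]: 1.0 per exact match, 0.5 per similar pair."""
    if not utterance or not term:
        return 0.0
    total = 0.0
    for a, b in zip(utterance, term):
        if a == b:
            total += 1.0
        elif frozenset((a, b)) in SIMILAR:
            total += 0.5
    # Normalise by the longer sequence so length mismatches lower the score.
    return total / max(len(utterance), len(term))


def correct_transcription(text, confidence, phonemes, high_freq_terms,
                          min_confidence=0.8, min_similarity=0.7):
    """Replace a low-confidence transcription with the best-matching frequent term.

    high_freq_terms maps each term to its phoneme list. Both thresholds are
    assumed values chosen only for illustration.
    """
    if confidence >= min_confidence:
        return text  # assumption: confident transcriptions are left alone
    best_term, best_score = text, 0.0
    for term, term_phonemes in high_freq_terms.items():
        score = sounds_similar_score(phonemes, term_phonemes)
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score > min_similarity else text
```

For example, with a term list containing "bill" (B IH L) and "claim" (K L EY M), a low-confidence transcription "pill" (P IH L) would be replaced by "bill", because P and B are listed as similar sounding and the remaining phonemes match exactly.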
METHOD AND DEVICE FOR INFORMATION PROCESSING
An information processing method and an electronic device are provided. The method includes: obtaining audio data collected by a slave device; obtaining contextual data corresponding to the slave device; and obtaining a recognition result of recognizing the audio data based on the contextual data. The contextual data characterizes a voice environment of the audio data collected by the slave device.
Speech endpointing
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing are described. In one aspect, a method includes the action of accessing voice query log data that includes voice queries spoken by a particular user. The actions further include determining, based on the voice query log data, a pause threshold for the particular user. The actions further include receiving, from the particular user, an utterance. The actions further include determining that the particular user has stopped speaking for at least a period of time equal to the pause threshold. The actions further include, based on that determination, processing the utterance as a voice query.
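The endpointing flow in this abstract can be illustrated with a minimal sketch. The abstract does not say how the pause threshold is derived from the log, so the rule used here (the longest logged within-query pause scaled by a margin), the default value, the frame-level speech flags, and the function names are all hypothetical.

```python
def pause_threshold(logged_pauses, margin=1.2, default=0.5):
    """Derive a per-user pause threshold in seconds from logged queries.

    logged_pauses holds one list of within-query pause durations per logged
    voice query. Hypothetical rule: longest observed pause times a margin.
    """
    pauses = [p for query in logged_pauses for p in query]
    return max(pauses) * margin if pauses else default


def find_endpoint(is_speech, frame_seconds, threshold):
    """Return the index of the frame at which the silence following speech
    first lasts at least `threshold` seconds, or None if that never happens."""
    silent_frames = 0
    heard_speech = False
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_frames = 0
        elif heard_speech:
            # Leading silence before any speech is ignored.
            silent_frames += 1
            if silent_frames * frame_seconds >= threshold:
                return i
    return None
```

Once `find_endpoint` returns an index, the audio up to that frame would be handed off for processing as a voice query. A user whose logs show long mid-query pauses gets a larger threshold and is therefore cut off less eagerly.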
Recommending Results In Multiple Languages For Search Queries Based On User Profile
Systems and methods for a media guidance application that generates results in multiple languages for search queries. In particular, the media guidance application resolves multiple language barriers by taking automatic and manual user language settings and applying those settings to a variety of potential search results.
Electronic device and operation method thereof
Provided are an electronic device and an operation method thereof. The electronic device includes: a first sound receiver configured to receive a sound input while power is supplied to the first sound receiver in a standby state; a trigger word/phrase recognizer configured to recognize whether the sound input received by the first sound receiver corresponds to a trigger word or phrase; a second sound receiver configured to receive a sound input by receiving supply of power based on the trigger word or phrase being recognized by the trigger word/phrase recognizer; and a data transceiver configured to output a first sound input signal supplied from the first sound receiver and a second sound input signal supplied from the second sound receiver.
MITIGATING FALSE POSITIVES AND/OR FALSE NEGATIVES IN HOT WORD FREE ADAPTATION OF AUTOMATED ASSISTANT
Hot word free adaptation of one or more function(s) of an automated assistant, responsive to determining, based on gaze measure(s) and/or active speech measure(s), that a user is engaging with the automated assistant. Implementations relate to various techniques for mitigating false positive occurrences and/or false negative occurrences of hot word free adaptation through utilization of personalized parameter(s) for at least some user(s) of an assistant device. The personalized parameter(s) are utilized in determining whether condition(s) are satisfied, where those condition(s), if satisfied, indicate that the user is engaging in hot word free interaction with the automated assistant and result in adaptation of function(s) of the automated assistant.