Patent classifications
G10L15/01
SEMIAUTOMATED RELAY METHOD AND APPARATUS
A call captioning system for captioning a hearing user's (HU's) voice signal during an ongoing call with an assisted user (AU) includes: an AU communication device with a display screen and a caption service activation feature, and a first processor programmed to, during an ongoing call, receive the HU's voice signal. Prior to activation of the caption service via the activation feature, the processor uses an automated speech recognition (ASR) engine to generate HU voice signal captions, detects errors in the HU voice signal captions, uses the errors to train the ASR engine on the HU's voice signal to increase the accuracy of the HU captions generated by the ASR engine, and stores the trained ASR engine for subsequent use. Upon activation of the caption service during the ongoing call, the processor uses the trained ASR engine to generate HU voice signal captions and present them to the AU via the display screen.
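The pre-activation phase described above amounts to a silent adaptation loop: captions are generated, errors are detected, and the engine is adapted to the HU's voice before the AU ever turns captioning on. The following is a minimal sketch of that loop; the class, method names, and the simple substitution-table "training" are illustrative stand-ins, not taken from the patent.

```python
class AdaptiveASREngine:
    """Toy ASR engine that learns speaker-specific corrections."""

    def __init__(self):
        # Speaker-specific substitution table built from detected errors.
        self.corrections = {}

    def transcribe(self, raw_words):
        # Apply any learned corrections to the raw recognition output.
        return [self.corrections.get(w, w) for w in raw_words]

    def train_on_errors(self, raw_words, corrected_words):
        # Record each mismatch so future captions for this HU improve.
        for raw, fixed in zip(raw_words, corrected_words):
            if raw != fixed:
                self.corrections[raw] = fixed


# Pre-activation phase: adapt quietly to the HU's voice signal.
engine = AdaptiveASREngine()
engine.train_on_errors(["recognise", "speach"], ["recognize", "speech"])

# Caption-service phase: the stored, trained engine captions the call.
print(engine.transcribe(["please", "speach", "clearly"]))
```

The point of the two-phase split is that by the time the AU activates captioning, the engine has already accumulated speaker-specific corrections and starts at a higher accuracy.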
ERROR-CORRECTION AND EXTRACTION IN REQUEST DIALOGS
A system comprises a machine that is configured to act upon requests from a user and sensing means for sensing an operational-mode dialog stream from the user for the machine. The system also comprises a computing system that is configured to train a neural network through machine learning to output, for each training example in a training dialog stream dataset, a corrected request for the machine. The computing system is also configured to, in an operational mode, using the trained neural network, generate a corrected, operational-mode request for the machine based on the operational-mode dialog stream from the user for the machine, wherein the operational-mode dialog stream is sensed by the sensing means.
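The abstract separates a training mode (learn a mapping from dialog streams to corrected requests) from an operational mode (apply that mapping to newly sensed dialog). A real system would train a neural network; in this runnable sketch a simple lookup table stands in for the trained model so the two-phase control flow is visible. All names and the example dataset are invented.

```python
def train(training_dialogs):
    """Training mode: learn to map a dialog stream to a corrected request."""
    model = {}
    for dialog_stream, corrected_request in training_dialogs:
        model[dialog_stream] = corrected_request
    return model


def correct_request(model, operational_dialog):
    """Operational mode: emit a corrected request for the machine."""
    # Fall back to the raw dialog when no correction is known.
    return model.get(operational_dialog, operational_dialog)


dataset = [("turn of the lights", "turn off the lights")]
model = train(dataset)
print(correct_request(model, "turn of the lights"))
```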
FINGERPRINTING DATA TO DETECT VARIANCES
A system and method for characterizing the data used to train a model for machine learning inference. Training data and production data may both be fingerprinted, and the fingerprints may be compared to detect undesirable variances between training and production data. This may allow performance issues relating to differences in the training data set versus the production data set to be more easily identified. Parameters used for characterization can be determined based on the type of training data such as numerical data, image data, or audio data.
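For numerical data, the fingerprint-and-compare idea can be sketched with summary statistics: fingerprint both datasets, then flag a variance when the fingerprints diverge. The statistics chosen here (mean and standard deviation) and the tolerance are illustrative assumptions; as the abstract notes, the characterization parameters depend on the data type.

```python
import statistics


def fingerprint(values):
    # Characterize a numerical dataset with summary statistics.
    return {"mean": statistics.mean(values),
            "stdev": statistics.pstdev(values)}


def variance_detected(train_fp, prod_fp, tolerance=0.5):
    # Flag production data that drifts from the training distribution.
    return (abs(train_fp["mean"] - prod_fp["mean"]) > tolerance or
            abs(train_fp["stdev"] - prod_fp["stdev"]) > tolerance)


training = [1.0, 2.0, 3.0, 4.0, 5.0]
production = [6.0, 7.0, 8.0, 9.0, 10.0]  # shifted distribution
print(variance_detected(fingerprint(training), fingerprint(production)))
```

Comparing compact fingerprints rather than the raw datasets is what makes the check cheap enough to run continuously against production traffic.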
Hotword detection on multiple devices
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
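The arbitration step in the abstract reduces to: compute a local hotword likelihood, receive the second device's likelihood, compare, and only initiate speech recognition if the local value wins. A minimal sketch of that comparison, with invented score values and a tie-breaking rule the abstract does not specify:

```python
def should_initiate_recognition(first_value, second_value):
    # Compare the local hotword likelihood against the value received
    # from the second computing device; ties go to the first device here
    # (an assumption -- the abstract leaves tie-breaking open).
    return first_value >= second_value


local_score = 0.92   # likelihood computed by the first device
remote_score = 0.71  # likelihood received from the second device

if should_initiate_recognition(local_score, remote_score):
    print("first device initiates speech recognition")
else:
    print("first device stays idle")
```

The comparison is what prevents several nearby devices from all waking up and transcribing the same utterance.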
Determination of transcription accuracy
A method may include obtaining audio of a communication session between a first device of a first user and a second device of a second user. The method may further include obtaining a transcription of second speech of the second user. The method may also include identifying one or more first sound characteristics of first speech of the first user. The method may also include identifying one or more first words indicating a lack of understanding in the first speech. The method may further include determining an experienced emotion of the first user based on the one or more first sound characteristics. The method may also include determining an accuracy of the transcription of the second speech based on the experienced emotion and the one or more first words.
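The method combines two signals from the first user's speech: an emotion inferred from sound characteristics, and words indicating a lack of understanding. The sketch below shows one way those signals could drive an accuracy estimate; the phrase list, the pitch-variance emotion rule, and the scoring weights are all invented for illustration.

```python
# Words from the first user's speech that suggest the transcription of the
# second user's speech was not understood (illustrative list).
CONFUSION_WORDS = {"what", "pardon", "huh", "sorry"}


def detect_emotion(pitch_variance):
    # Toy rule: high pitch variance in the first speech suggests frustration.
    return "frustrated" if pitch_variance > 0.5 else "neutral"


def estimate_accuracy(first_speech_words, pitch_variance):
    # Start from full confidence and discount for each negative signal.
    score = 1.0
    if detect_emotion(pitch_variance) == "frustrated":
        score -= 0.3
    confusion_hits = sum(1 for w in first_speech_words
                         if w in CONFUSION_WORDS)
    score -= 0.2 * confusion_hits
    return round(max(score, 0.0), 2)


print(estimate_accuracy(["pardon", "what"], pitch_variance=0.8))
```

Using the listener's reaction as a proxy lets the system estimate transcription accuracy without a ground-truth transcript to compare against.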
Information processing apparatus, information processing system, and information processing method
Provided is an apparatus that includes a voice recognition section that executes a voice recognition process on user speech and a learning processing section that executes a process of updating a degree of confidence on the basis of an interaction made between a user and the information processing apparatus after the user speech. The degree of confidence is an evaluation value indicating the reliability of a voice recognition result of the user speech. The voice recognition section generates confidence data for the recognition of the user speech, in which multiple user speech candidates based on the voice recognition result are each associated with a degree of confidence, that is, an evaluation value indicating the reliability of the corresponding user speech candidate.
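The loop described above is: recognition produces several candidate transcriptions, each with a confidence value, and the interaction that follows (for example, whether the user accepted or corrected the result) updates those values. The data layout, update rule, and delta below are assumptions for the sketch, not details from the patent.

```python
def recognize(utterance_id):
    # Stand-in recognizer: returns candidate texts with confidence values
    # (evaluation values indicating reliability of each candidate).
    return {"turn on the light": 0.6, "turn on the night": 0.4}


def update_confidence(candidates, accepted_text, delta=0.2):
    # Learning step: reward the candidate the subsequent interaction
    # confirmed; penalize the others. Values are clamped to [0, 1].
    updated = {}
    for text, conf in candidates.items():
        adjust = delta if text == accepted_text else -delta
        updated[text] = round(min(max(conf + adjust, 0.0), 1.0), 2)
    return updated


candidates = recognize("utt-001")
print(update_confidence(candidates, "turn on the light"))
```

Feeding the post-speech interaction back into the confidence values is what lets the apparatus become more reliable for a particular user over time.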