Patent classifications
G10L2015/221
Acoustic model training using corrected terms
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
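The flow described in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Word` timing record, the spelling-similarity heuristic in `is_correction` (the abstract does not say how a replacement is classified as a correction), and the `training_example` helper. The sketch only shows how an audio portion might be paired with a replacement term once the replacement is classified as a correction.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Word:
    """One transcribed term with its time span in the audio (hypothetical)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def is_correction(selected: str, replacement: str, threshold: float = 0.5) -> bool:
    """Assumed heuristic: a replacement that is spelled similarly to the
    selected text is treated as a fix for a misrecognition, not a rephrase."""
    a, b = selected.lower(), replacement.lower()
    return a != b and SequenceMatcher(None, a, b).ratio() >= threshold


def training_example(samples, sample_rate, words, sel_start, sel_end, replacement):
    """Return (audio_portion, replacement) for the selected terms, or None
    when the replacement is not classified as a correction."""
    selected = " ".join(w.text for w in words[sel_start:sel_end])
    if not is_correction(selected, replacement):
        return None
    lo = int(words[sel_start].start * sample_rate)
    hi = int(words[sel_end - 1].end * sample_rate)
    return samples[lo:hi], replacement


# Usage: a 1-second toy signal at 16 samples per second.
words = [Word("call", 0.0, 0.5), Word("john", 0.5, 1.0)]
samples = list(range(16))
example = training_example(samples, 16, words, 1, 2, "joan")
```

The returned pair would then be handed to whatever acoustic-model training routine is in use; that step is outside this sketch.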
INTELLIGENT LIST READING
Systems and processes for operating an intelligent automated assistant to perform intelligent list reading are provided. In one example process, a spoken user request associated with a plurality of data items is received. The process determines whether a degree of specificity of the spoken user request is less than a threshold level. In response to determining that a degree of specificity of the spoken user request is less than a threshold level, one or more attributes related to the spoken user request are determined. The one or more attributes are not defined in the spoken user request. Additionally, a list of data items based on the spoken user request and the one or more attributes is obtained. A spoken response comprising a subset of the list of data items is generated and the spoken response is provided.
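As a rough illustration of the process in this abstract, the sketch below measures specificity as the number of attributes in the request, fills in attributes from context when that count is under a threshold, and reads out only the first few matches. The specificity measure, the `context_attrs` source, and the function name `build_spoken_response` are assumptions made for the example; the abstract does not define them.

```python
def build_spoken_response(request_attrs, context_attrs, items,
                          threshold=2, max_spoken=3):
    """Return a short spoken-style string listing a subset of matching items.

    request_attrs: attributes stated in the request (hypothetical format).
    context_attrs: attributes not stated in the request, e.g. inferred
                   from location or history (assumed source).
    """
    attrs = dict(request_attrs)
    # Specificity is assumed to be the count of stated attributes.
    if len(attrs) < threshold:
        for key, value in context_attrs.items():
            attrs.setdefault(key, value)  # never override what the user said

    matches = [item for item in items
               if all(item.get(k) == v for k, v in attrs.items())]
    subset = matches[:max_spoken]
    if not subset:
        return "Found no matching items"
    names = ", ".join(item["name"] for item in subset)
    return f"Found {len(matches)}; reading {len(subset)}: {names}"


# Usage with toy data.
items = [
    {"name": "A", "type": "restaurant", "city": "Paris"},
    {"name": "B", "type": "restaurant", "city": "Rome"},
    {"name": "C", "type": "cafe", "city": "Paris"},
]
vague = build_spoken_response({"type": "restaurant"}, {"city": "Paris"}, items)
specific_enough = build_spoken_response({"type": "restaurant"}, {"city": "Paris"},
                                        items, threshold=1)
```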
METHODS AND SYSTEMS FOR TRANSCRIPTION OF AUDIO DATA
Systems, devices, and methods transcribe words recorded in audio data. A computer-generated transcript is provided. The transcript comprises records for each word in the computer-generated transcript. At least one confirmation input is received for each record. The at least one confirmation input modifies a selected record and automatically identifies a next record for receiving a next confirmation input. A sequence of confirmation inputs may rapidly modify and validate each record in a sequence of records in the computer-generated transcript. A validated transcript is generated from the modified records and is provided from an evidence management system.
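The confirm-and-advance behavior described above might look like the following sketch. The `Record` and `TranscriptReview` names, and the choice to treat a confirmation with no correction as "accept as-is", are illustrative assumptions rather than details from the patent.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One word of the computer-generated transcript (hypothetical)."""
    word: str
    validated: bool = False


class TranscriptReview:
    def __init__(self, words):
        self.records = [Record(w) for w in words]
        self.cursor = 0  # index of the record awaiting confirmation

    def confirm(self, correction=None):
        """Apply one confirmation input to the selected record, then
        automatically move on to the next record."""
        if self.cursor >= len(self.records):
            raise IndexError("all records are already confirmed")
        record = self.records[self.cursor]
        if correction is not None:
            record.word = correction
        record.validated = True
        self.cursor += 1

    def validated_transcript(self):
        """Join the records once every one has received a confirmation."""
        if not all(r.validated for r in self.records):
            raise ValueError("some records are still unconfirmed")
        return " ".join(r.word for r in self.records)


# Usage: accept two words, correct the third.
review = TranscriptReview(["the", "suspect", "flea"])
review.confirm()
review.confirm()
review.confirm("fled")
result = review.validated_transcript()
```

Because each call advances the cursor, a reviewer can work through a transcript with one input per word.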
INCREMENTAL POST-EDITING AND LEARNING IN SPEECH TRANSCRIPTION AND TRANSLATION SERVICES
Computer systems and computer-implemented methods provide for interactive and incremental post-editing of real-time speech transcription and translation. A first component is automatic identification of potentially problematic regions in the output (e.g., transcription or translation) that are either likely to be technically processed badly or risky in terms of their content or expression. A second component is intelligent, efficient interfaces that permit multiple editors to correct system output concurrently and collaboratively, so that corrections can be seamlessly inserted and become part of a running presentation. A third component is incremental learning and adaptation that allows the system to use the human corrective feedback to deliver instantaneous improvement of system behavior downstream. A fourth component is transfer learning to transfer short-term learning into long-term learning if the modifications warrant long-term retention.
INPUT ASSISTANCE SYSTEM, INPUT ASSISTANCE METHOD, AND NON-VOLATILE RECORDING MEDIUM STORING PROGRAM
An input assistance system includes a terminal device including a display screen, an acquisition unit, a recognition unit, an input item display unit, a recognition result display unit, and a reception unit. The acquisition unit acquires utterance voice data of a user. The recognition unit performs voice recognition of the utterance voice data to generate text data. The input item display unit displays a plurality of input items including the input item associated with the text data. The recognition result display unit displays the text data. The reception unit accepts an operation of selecting the input item associated with the text data displayed by the recognition result display unit from the plurality of input items displayed by the input item display unit. The reception unit accepts the operation of selecting the input item associated with the text data when the plurality of input items and the text data are displayed.
VOICE IDENTIFICATION FOR OPTIMIZING VOICE SEARCH RESULTS
Systems and methods are provided for processing a voice input stream with interruptions and/or supplemental comments. Generally, a virtual voice assistant may receive an input stream with a first input comprising a voice query from a first voice and a second input comprising a secondary query from a second voice (e.g., an interruption or a supplement). The virtual assistant may determine that the second voice does not match the first voice, and then process the voice query to produce first results. Some embodiments may determine whether the secondary query is a supplement or an interruption and, e.g., choose to ignore an interruption or set aside a supplement if it may be used to help the search query. In some embodiments, results for the first query may be compared with results for the first query with a portion of the supplement.
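One way to picture the voice-matching step is the sketch below, which compares speaker embeddings by cosine similarity. The embeddings, the 0.8 threshold, and the `handle_inputs` helper are assumptions for illustration; the abstract does not say how voices are compared, and it does not describe the same-speaker case, which is merged here only to make the example complete.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def handle_inputs(first, second, threshold=0.8):
    """first/second: dicts with 'embedding' (speaker vector, hypothetical)
    and 'text' (recognized words).

    If the second voice does not match the first, only the first query is
    processed and the second input is set aside for a later decision on
    whether it is an interruption or a supplement.
    """
    similarity = cosine_similarity(first["embedding"], second["embedding"])
    if similarity >= threshold:
        # Assumed behavior: same speaker, so treat as one continued query.
        return {"query": first["text"] + " " + second["text"], "set_aside": None}
    return {"query": first["text"], "set_aside": second["text"]}


# Usage with toy two-dimensional embeddings.
first = {"embedding": [1.0, 0.0], "text": "find action movies"}
other_voice = {"embedding": [0.0, 1.0], "text": "with car chases"}
same_voice = {"embedding": [1.0, 0.1], "text": "from the nineties"}
mismatch = handle_inputs(first, other_voice)
match = handle_inputs(first, same_voice)
```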
INFORMATION PROCESSING DEVICE, METHOD OF INFORMATION PROCESSING, AND PROGRAM
[Object] To provide a technology that can improve the accuracy of speech recognition for collected sound data. [Solution] Provided is an information processing device including: a collected sound data acquisition portion that acquires collected sound data; and an output controller that causes an output portion to output at least whether or not a state of the collected sound data is suitable for speech recognition.
Information processing device, information processing method, and program
There is provided an information processing device including an analysis unit configured to analyze a character string indicating contents of utterance obtained as a result of speech recognition, and a display control unit configured to display the character string indicating the contents of the utterance and an analysis result on a display screen.
Method, apparatus and device for implementing voice application, computer readable storage medium
The present disclosure provides a method, an apparatus and a device for implementing a voice application, and a computer readable storage medium. A feedback content and a template identifier corresponding to a voice command of a user are determined on a server side, and the IoT device acts on the result of that determination. As Internet information is iteratively updated, the voice commands are updated as well; because the processing function for voice commands can be updated on the server side, the voice application in the IoT device does not need to be updated. Therefore, the processing capability of the voice application can be updated without upgrading the voice application itself, thereby alleviating the problem of an excessively long upgrade process due to the OTA upgrade process in the prior art.
SYSTEM AND METHODS FOR ROBUST VOICE-BASED HUMAN-IOT COMMUNICATION
A system and method for robust voice-based communication between humans and the Internet of Things.