G10L2015/221

ACOUSTIC MODEL TRAINING USING CORRECTED TERMS

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
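
The flow in this abstract (classify a replacement as a correction, then pair the matching audio portion with the replacement term) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how a replacement is classified as a correction, so the string-similarity test, the threshold, and the names `AlignedWord`, `is_correction`, and `training_example` are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class AlignedWord:
    """A transcribed term with its (assumed) sample range in the audio."""
    text: str
    start: int  # index of the first audio sample for this term
    end: int    # index one past the last audio sample for this term


def is_correction(selected, replacement, threshold=0.5):
    """Assumed heuristic: similar spellings suggest a correction, not new content."""
    ratio = SequenceMatcher(None, selected.lower(), replacement.lower()).ratio()
    return ratio >= threshold


def training_example(audio, alignment, selected_index, replacement):
    """Return (audio portion, replacement term) if the edit looks like a correction."""
    word = alignment[selected_index]
    if not is_correction(word.text, replacement):
        return None
    # The audio portion for the selected term becomes a training sample
    # labeled with the replacement term.
    return audio[word.start:word.end], replacement


# Hypothetical usage: 10 audio samples, two aligned terms.
audio = list(range(10))
alignment = [AlignedWord("call", 0, 4), AlignedWord("john", 4, 10)]
example = training_example(audio, alignment, 1, "joan")
```

An edit that replaces a term with unrelated text (for example "john" with "tomorrow") falls below the assumed threshold and yields no training example.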

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM

There is provided an information processing device including an analysis unit configured to analyze a character string indicating contents of utterance obtained as a result of speech recognition, and a display control unit configured to display the character string indicating the contents of the utterance and an analysis result on a display screen.

AUTOMATIC OUT OF VOCABULARY WORD DETECTION IN SPEECH RECOGNITION
20230267918 · 2023-08-24

Presented herein are systems and methods for detecting out-of-vocabulary (OOV) words in an automatic speech recognition (ASR) system, determining an intended word for the OOV word, and adding the intended word to a repository of words. A method may involve receiving audio input data including a series of spoken words; determining that one of the spoken words is an out-of-vocabulary word absent from a repository of words; generating word candidates based on characteristics of the out-of-vocabulary word; presenting the word candidates on a display; receiving intended word input data that indicates a selection of one of the word candidates as an intended word for the out-of-vocabulary word; and adding the intended word to the repository of words. Additionally, one or more devices or apparatuses may be configured to perform such a method.
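
A minimal sketch of the detect, suggest, and add loop described above, assuming the repository of words is a simple in-memory set and that "characteristics of the out-of-vocabulary word" means spelling similarity. Both are assumptions made for illustration; the function names and the candidate pool are hypothetical.

```python
import difflib


def find_oov_words(words, repository):
    """Return the recognized words that are absent from the repository."""
    known = {w.lower() for w in repository}
    return [w for w in words if w.lower() not in known]


def generate_candidates(oov_word, candidate_pool, limit=3):
    """Rank candidates by spelling similarity to the OOV word (assumed criterion)."""
    return difflib.get_close_matches(
        oov_word.lower(), candidate_pool, n=limit, cutoff=0.6
    )


def add_intended_word(repository, intended_word):
    """Add the word the user selected to the repository."""
    repository.add(intended_word.lower())


# Hypothetical usage.
repository = {"the", "patient", "takes"}
recognized = ["the", "patient", "takes", "metformn"]
oov = find_oov_words(recognized, repository)
candidates = generate_candidates(oov[0], ["metformin", "methadone", "morphine"])
# In a real system the candidates would be shown on a display and the user
# would pick one; here the first candidate stands in for that selection.
add_intended_word(repository, candidates[0])
```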

SYSTEM AND METHOD FOR IMPROVING AIR TRAFFIC COMMUNICATION (ATC) TRANSCRIPTION ACCURACY BY INPUT OF PILOT RUN-TIME EDITS
20230267917 · 2023-08-24

Systems and methods are provided for training an Automatic Speech Recognition (ASR) model during runtime of a transcription system. The system includes a background processor configured to operate with the transcription system to display, with an identifier, a speech-to-text sample of an audio segment of a cockpit communication that is converted using an ASR model. The background processor receives a response by a user during runtime of the transcription system and display of the speech-to-text sample; causes a change of the identifier to either a positive or a negative attribute upon a determination, by review of the displayed content of the speech-to-text sample, of the correctness of the conversion of the speech-to-text sample using the ASR model; and trains the ASR model based on information associated with the content of the speech-to-text sample in accordance with the response by the user.

Techniques to enhance transcript of speech with indications of speaker emotion

In one aspect, a device includes at least one processor and storage accessible to the at least one processor. The storage includes instructions executable by the at least one processor to analyze the decibel levels of audio of a user's speech. The instructions are executable to, based on the analysis, enhance a transcript of the user's speech with indications of particular words from the user's speech as being associated with one or more emotions of the user.
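
One way to picture the decibel analysis is the sketch below. It assumes the audio is already split into per-word sample blocks normalized to the range [-1.0, 1.0], and it only marks words that are noticeably louder than the utterance average. The abstract does not specify thresholds or how loudness maps to emotions, so the 6 dB margin and the `[loud]` marker are illustrative assumptions.

```python
import math


def rms_dbfs(samples):
    """Level of a block of samples in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0) on silence


def annotate_loud_words(word_samples, margin_db=6.0):
    """Mark words whose level exceeds the utterance mean by margin_db (assumed rule)."""
    levels = [(word, rms_dbfs(samples)) for word, samples in word_samples]
    mean_level = sum(level for _, level in levels) / len(levels)
    return " ".join(
        f"{word} [loud]" if level > mean_level + margin_db else word
        for word, level in levels
    )


# Hypothetical usage with synthetic constant-amplitude blocks.
utterance = [
    ("please", [0.05] * 100),
    ("stop", [0.8] * 100),
    ("now", [0.05] * 100),
]
transcript = annotate_loud_words(utterance)
```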

ELECTRONIC APPARATUS FOR PROVIDING ADVERTISEMENT THROUGH VOICE ASSISTANT AND CONTROL METHOD THEREOF
20220148034 · 2022-05-12

An electronic apparatus includes a microphone, a speaker, a memory, and a processor. The processor obtains a product keyword from a text converted from a user voice that is input through the microphone; identifies a product category related to the product keyword; determines whether to provide an advertisement related to the user voice based on an interest level of the user according to utterance history information and an advertisement fatigue level of the user; and, based on a determination to provide the advertisement, obtains advertisement information based on a target product category identified according to the interest level of the user and controls the speaker to output an advertisement voice based on the advertisement information.

DETERMINING SUGGESTED SUBSEQUENT USER ACTIONS DURING DIGITAL ASSISTANT INTERACTION

Systems and processes for operating an intelligent automated assistant are provided. An example process includes receiving an utterance including a user request, determining, based on the user request, a domain associated with the user request, determining, based on the domain, a first subsequent user action and a second subsequent user action, determining, based on the domain, a first parameter for the first subsequent user action and a second parameter for the second subsequent user action, in accordance with a determination that a first score associated with the first subsequent user action is higher than a score associated with the second subsequent user action, selecting the first subsequent user action as a suggested subsequent user action, and providing the suggested subsequent user action.
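
The selection step in this process can be illustrated with the sketch below, which assumes each candidate subsequent action already carries a parameter and a score. The domain table, the action names, and the score values are hypothetical placeholders, not values from the source.

```python
from dataclasses import dataclass


@dataclass
class CandidateAction:
    """A possible follow-up action with its parameter and score."""
    name: str
    parameter: str
    score: float


# Hypothetical mapping from a request domain to candidate follow-up actions.
DOMAIN_ACTIONS = {
    "weather": [
        CandidateAction("show_forecast", "tomorrow", 0.8),
        CandidateAction("set_reminder", "bring umbrella", 0.4),
    ],
}


def suggest_next_action(domain):
    """Pick the highest-scoring candidate action for the domain, if any."""
    candidates = DOMAIN_ACTIONS.get(domain, [])
    if not candidates:
        return None
    return max(candidates, key=lambda action: action.score)


suggestion = suggest_next_action("weather")
```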

Systems, methods, and storage media for providing presence of modifications in user dictation
11328729 · 2022-05-10

Systems and methods for providing presence of modifications in user dictation are disclosed. Exemplary implementations may: obtain primary audio information representing sound, including speech from a recording user, captured by a client computing platform; perform speech recognition on the primary audio information to generate a textual transcript; effectuate presentation of the transcript to the recording user; receive user input from the recording user; alter, based on the received user input from the recording user, a portion of the transcript to generate an altered transcript; effectuate presentation of the altered transcript in conjunction with audio playback of at least some of the primary audio information in a reviewing interface on a client computing platform; receive user input from a reviewing user; alter, based on the received user input from the reviewing user, portions of the altered transcript to generate a reviewed transcript; and store the reviewed transcript in electronic storage.

Display apparatus, voice acquiring apparatus and voice recognition method thereof

Disclosed are a display apparatus, a voice acquiring apparatus and a voice recognition method thereof, the display apparatus including: a display unit which displays an image; a communication unit which communicates with a plurality of external apparatuses; and a controller which includes a voice recognition engine to recognize a user's voice, receives a voice signal from a voice acquiring unit, and controls the communication unit to receive candidate instruction words from at least one of the plurality of external apparatuses to recognize the received voice signal.

INFORMATION PROCESSING DEVICE
20220139418 · 2022-05-05

The present invention addresses the problem of providing a technique for assisting the realization of more efficient business activities, while taking account of objective indicators. In a server 1 which supports a user U having a telephone call with a call destination C, an acquiring unit 101 acquires information recorded during the call between the user U and the call destination C, as call information. An extracting unit 102 detects utterance segments VS1 to VSn in which speech is present, from the acquired call information, and extracts speech information VI1 to VIm from each utterance segment VS1 to VSn. An analyzing unit 103 performs analysis based on elements E1 to Ep, on the basis of the extracted speech information VI1 to VIm. A generating unit 104 generates business support information for supporting the call with the user U, on the basis of the results of the analysis. A presenting unit 105 presents the generated business support information to the user U. The abovementioned problem is thus resolved.