G10L15/01

HYBRID LIVE CAPTIONING SYSTEMS AND METHODS

A computer system configured to generate captions is provided. The computer system includes a memory and a processor coupled to the memory. The processor is configured to access a first buffer configured to store text generated by an automated speech recognition (ASR) process; access a second buffer configured to store text generated by a captioning client process; identify either the first buffer or the second buffer as a source buffer of caption text; generate caption text from the source buffer; and communicate the caption text to a target process.

HYBRID LIVE CAPTIONING SYSTEMS AND METHODS

A computer system configured to generate captions is provided. The computer system includes a memory and a processor coupled to the memory. The processor is configured to access a first buffer configured to store text generated by an automated speech recognition (ASR) process; access a second buffer configured to store text generated by a captioning client process; identify either the first buffer or the second buffer as a source buffer of caption text; generate caption text from the source buffer; and communicate the caption text to a target process.

Methods, Systems And Apparatuses For Improved Speech Recognition And Transcription
20230122559 · 2023-04-20 ·

Methods, systems, and apparatuses for improved speech recognition and transcription of user utterances are described herein. User utterances may be processed by a speech recognition computing device as well as an acoustic model. The acoustic model may be trained using historical user utterance data and machine learning techniques. The acoustic model may be used to determine whether a transcription determined by the speech recognition computing device should be overridden with an updated transcription.

Semiautomated relay method and apparatus

A method to transcribe communications includes the steps of obtaining a plurality of hypothesis transcriptions of a voice signal generated by a speech recognition system, determining consistent words that are included in at least first and second of the plurality of hypothesis transcriptions, in response to determining the consistent words, providing the consistent words to a device for presentation of the consistent words to an assisted user, and presenting the consistent words via a display screen on the device, wherein a rate of the presentation of the words on the display screen is variable.

Semiautomated relay method and apparatus

A method to transcribe communications includes the steps of obtaining a plurality of hypothesis transcriptions of a voice signal generated by a speech recognition system, determining consistent words that are included in at least first and second of the plurality of hypothesis transcriptions, in response to determining the consistent words, providing the consistent words to a device for presentation of the consistent words to an assisted user, and presenting the consistent words via a display screen on the device, wherein a rate of the presentation of the words on the display screen is variable.

COMPUTER SYSTEMS EXHIBITING IMPROVED COMPUTER SPEED AND TRANSCRIPTION ACCURACY OF AUTOMATIC SPEECH TRANSCRIPTION (AST) BASED ON A MULTIPLE SPEECH-TO-TEXT ENGINES AND METHODS OF USE THEREOF

In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least components of: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines; where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and the accuracy of automatic speech transcription and generating a transcript of the audio recording from respective accepted hypotheses for the plurality of audio segments.

Distilling to a Target Device Based on Observed Query Patterns
20230111618 · 2023-04-13 · ·

A method includes receiving user queries directed toward a cloud-based assistant service. For each received user query directed toward the cloud-based assistant service, the method also includes extracting one or more attributes from the user query and logging the user query into one or more of a plurality of category buckets based on the one or more attributes extracted from the user query. The method also includes determining when at least one of the plurality of category buckets includes a threshold number of the user queries logged into the at least one category bucket, and when the at least one of the plurality of category buckets includes the threshold number of the user queries, generating a distilled model of the cloud-based assistant service. The distilled model of the cloud-based assistant service is configured to execute on one or more target client devices.

Distilling to a Target Device Based on Observed Query Patterns
20230111618 · 2023-04-13 · ·

A method includes receiving user queries directed toward a cloud-based assistant service. For each received user query directed toward the cloud-based assistant service, the method also includes extracting one or more attributes from the user query and logging the user query into one or more of a plurality of category buckets based on the one or more attributes extracted from the user query. The method also includes determining when at least one of the plurality of category buckets includes a threshold number of the user queries logged into the at least one category bucket, and when the at least one of the plurality of category buckets includes the threshold number of the user queries, generating a distilled model of the cloud-based assistant service. The distilled model of the cloud-based assistant service is configured to execute on one or more target client devices.

Semiautomated relay method and apparatus

A call captioning system for captioning a hearing user's (HU's) voice signal during an ongoing call with an assisted user (AU) includes: an AU communication device with a display screen and a caption service activation feature, and a first processor programmed to, during an ongoing call, receive the HU's voice signal. Prior to activating the caption service via the activation feature, the processor uses an automated speech recognition (ASR) engine to generate HU voice signal captions, detect errors in the HU voice signal captions, use the errors to train the ASR software to the HU's voice signal to increase accuracy of the HU captions generated by the ASR engine; and store the trained ASR engine for subsequent use. Upon activating the caption service during the ongoing call, the processor uses the trained ASR engine to generate HU voice signal captions and present them to the AU via the display screen.

Semiautomated relay method and apparatus

A call captioning system for captioning a hearing user's (HU's) voice signal during an ongoing call with an assisted user (AU) includes: an AU communication device with a display screen and a caption service activation feature, and a first processor programmed to, during an ongoing call, receive the HU's voice signal. Prior to activating the caption service via the activation feature, the processor uses an automated speech recognition (ASR) engine to generate HU voice signal captions, detect errors in the HU voice signal captions, use the errors to train the ASR software to the HU's voice signal to increase accuracy of the HU captions generated by the ASR engine; and store the trained ASR engine for subsequent use. Upon activating the caption service during the ongoing call, the processor uses the trained ASR engine to generate HU voice signal captions and present them to the AU via the display screen.