G10L15/01

Information processing apparatus, information processing system, and information processing method

Provided is an apparatus that includes a voice recognition section that executes a voice recognition process on a user speech and a learning processing section that updates a degree of confidence on the basis of an interaction between the user and the information processing apparatus after the user speech. The degree of confidence is an evaluation value indicating the reliability of a voice recognition result of the user speech. The voice recognition section generates data on degrees of confidence in recognition of the user speech, in which plural user speech candidates based on the voice recognition result are each associated with a degree of confidence, an evaluation value indicating the reliability of that candidate.
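
The candidate list and its confidence update can be illustrated with a minimal sketch. The `Candidate` type, the additive `step`, and the renormalization are hypothetical choices made for illustration; the abstract does not specify how the learning processing section computes the update.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    confidence: float  # evaluation value in [0, 1]


def update_confidences(candidates, confirmed_text, step=0.2):
    """Raise the candidate confirmed by the follow-up interaction, lower the rest.

    The additive step and the renormalization are illustrative assumptions,
    not the update rule of the described apparatus.
    """
    updated = []
    for c in candidates:
        if c.text == confirmed_text:
            score = c.confidence + step
        else:
            score = max(c.confidence - step, 0.0)
        updated.append(Candidate(c.text, score))
    total = sum(c.confidence for c in updated)
    if total == 0:
        return updated
    # Renormalize so the confidences sum to 1 again.
    return [Candidate(c.text, c.confidence / total) for c in updated]


# Plural user speech candidates, each associated with a degree of confidence.
candidates = [
    Candidate("play jazz", 0.5),
    Candidate("play chess", 0.4),
    Candidate("pay jess", 0.1),
]
# Assume the later interaction showed that the user meant "play chess".
revised = update_confidences(candidates, "play chess")
best = max(revised, key=lambda c: c.confidence)
```

Here the initially second-ranked candidate overtakes the first once the interaction supports it.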

System and method of automated model adaptation

Methods, systems, and computer readable media for automated transcription model adaptation include obtaining audio data from a plurality of audio files. The audio data is transcribed to produce at least one audio file transcription which represents a plurality of transcription alternatives for each audio file. Speech analytics are applied to each audio file transcription. A best transcription is selected from the plurality of transcription alternatives for each audio file. Statistics from the selected best transcription are calculated. An adapted model is created from the calculated statistics.
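
As a rough sketch of the select-then-adapt loop, the example below picks the highest-scoring alternative per file and builds a unigram model from word counts. The numeric score stands in for the output of speech analytics, and the unigram model stands in for the adapted model; both are assumptions, as are the file names.

```python
from collections import Counter


def select_best(alternatives):
    """Return the text of the alternative with the highest score."""
    return max(alternatives, key=lambda alt: alt[1])[0]


def adapt_unigram_model(files):
    """Count words in each file's best transcription and normalize to probabilities."""
    counts = Counter()
    for alternatives in files.values():
        counts.update(select_best(alternatives).split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical transcription alternatives: (text, analytics score) per audio file.
files = {
    "call1.wav": [("reset my password", 0.9), ("reset my pass word", 0.6)],
    "call2.wav": [("my password expired", 0.8), ("my pass word expired", 0.7)],
}
model = adapt_unigram_model(files)
```

Only words from the selected transcriptions contribute statistics, so the rejected split "pass word" leaves no trace in the adapted model.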

Computer systems exhibiting improved computer speed and transcription accuracy of automatic speech transcription (AST) based on a multiple speech-to-text engines and methods of use thereof

In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least the following components: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines, where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and accuracy of automatic speech transcription; and generating a transcript of the audio recording from respective accepted hypotheses for the plurality of audio segments.
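
The accept-and-skip idea can be sketched as follows. The engines are stub functions returning a hypothesis and a confidence, and the fixed engine order and the `threshold` acceptance test are assumptions; the abstract does not state how an engine is chosen for a segment or how a hypothesis is accepted.

```python
def transcribe(segments, engines, threshold=0.8):
    """Try engines in order per segment; stop at the first accepted hypothesis."""
    transcript = []
    calls = 0
    for segment in segments:
        best_text, best_conf = "", -1.0
        for engine in engines:
            text, conf = engine(segment)
            calls += 1
            if conf > best_conf:
                best_text, best_conf = text, conf
            if conf >= threshold:
                break  # accepted: no need to submit the segment to another engine
        transcript.append(best_text)
    return " ".join(transcript), calls


# Hypothetical stub engines standing in for distinct speech recognition models.
def general_engine(segment):
    return {"s1": ("hello there", 0.9), "s2": ("by or sell", 0.4)}[segment]


def finance_engine(segment):
    return {"s1": ("hello their", 0.5), "s2": ("buy or sell", 0.95)}[segment]


text, calls = transcribe(["s1", "s2"], [general_engine, finance_engine])
```

With two segments and two engines, exhaustive transcription would take four engine calls; accepting the first segment's hypothesis early reduces that to three.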

SYSTEM AND METHODS FOR CHATBOT AND SEARCH ENGINE INTEGRATION
20220407961 · 2022-12-22

A system and method for chatbot and search engine integration comprising a chatbot crawler engine configured to detect all possible paths through a conversational flow between a chatbot and a user, and a chatbot search integration manager configured to receive a processed conversation flow from the chatbot crawler engine, parse the conversation flow to identify keywords and features, and build an indexable data structure which can be integrated into search engines in order to expose the information and data contained within the chatbot's knowledge base. This integration may allow search engine users to be redirected to a website hosting the chatbot when an indexed data structure comprises information relevant to a search engine query.
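
A minimal sketch of the two stages, assuming the conversational flow is available as a directed graph of dialog states: the crawler enumerates every loop-free path from the start state, and the index maps each keyword to the paths that contain it. The graph representation, the state names, and the keyword-to-path index are hypothetical.

```python
def all_paths(flow, start):
    """Enumerate every loop-free path from the start state to a dead end."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        # Skip states already on this path so conversational loops terminate.
        next_nodes = [n for n in flow.get(path[-1], []) if n not in path]
        if not next_nodes:
            paths.append(path)
        for n in next_nodes:
            stack.append(path + [n])
    return paths


def build_index(paths):
    """Map each keyword in a state name to the conversation paths containing it."""
    index = {}
    for path in paths:
        for node in path:
            for word in node.split():
                index.setdefault(word, set()).add(" > ".join(path))
    return index


# Hypothetical conversational flow: state -> states reachable from it.
flow = {
    "greeting": ["billing", "support"],
    "billing": ["refund policy"],
    "support": ["reset password", "greeting"],
}
paths = all_paths(flow, "greeting")
index = build_index(paths)
```

A search query containing "refund" would then resolve to the single conversation path that reaches the refund policy state.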

SYSTEM AND METHODS FOR AN AUTOMATED CHATBOT TESTING PLATFORM
20220407960 · 2022-12-22

A system and method for automated chatbot testing to provide training and quality assurance of conversational artificial intelligence systems, comprising: a chatbot testing administrator interface which allows chatbot makers to define what a chatbot is supposed to do, create test scripts to test the performance of the chatbot, and review the results of the chatbot tests; and a chatbot testing server which provides an interface between chatbot testing agents and the administrator interface, instantiates chatbot agents and distributes them across available hardware, and runs testing programs which activate, configure, and deactivate chatbot testing agents as needed. A plurality of chatbot agents may run in parallel to provide automated testing based upon test script configuration.

USING SPEECH MANNERISMS TO VALIDATE AN INTEGRITY OF A CONFERENCE PARTICIPANT
20220399024 · 2022-12-15

Techniques are provided to validate a digitized audio signal that is generated by a conference participant. Reference speech features of the conference participant are obtained, either via samples provided explicitly by the participant, or collected passively via prior conferences. The speech features include one or more of word choices, filler words, common grammatical errors, idioms, common phrases, pace of speech, or other features. The reference speech features are compared to features observed in the digitized audio signal. If the reference speech features are sufficiently similar to the observed speech features, the digitized audio signal is validated and the conference participant is allowed to remain in the conference. If the validation is not successful, a variety of possible actions are taken, including alerting an administrator and/or terminating the participant's attendance in the conference.
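
One way to picture the comparison step is a similarity test between feature vectors. The sketch below uses only filler-word rates as features and cosine similarity with a fixed threshold; the marker list, the similarity measure, and the threshold are all assumptions, and other features named in the abstract (such as pace of speech) are left out for brevity.

```python
import math

# Hypothetical marker words used as speech features.
FILLERS = ("um", "uh", "like", "basically")


def features(transcript):
    """Rate of each filler word relative to the total word count."""
    words = transcript.lower().split()
    n = max(len(words), 1)
    return [words.count(f) / n for f in FILLERS]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def validate(reference, observed, threshold=0.9):
    """Treat the audio as validated when the feature vectors are similar enough."""
    return cosine(reference, observed) >= threshold


reference = features("um so basically we should um ship it like today")
same_speaker = features("um basically the build is um green like always")
other_speaker = features("uh the build is uh uh green")

keeps_access = validate(reference, same_speaker)
flagged = not validate(reference, other_speaker)
```

A failed validation would feed whatever follow-up action the system is configured for, such as alerting an administrator.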

Speech recognition with parallel recognition tasks
11527248 · 2022-12-13

The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRSs). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in the correctness of the recognition result. The method also includes completing a portion of the speech recognition tasks, including generating one or more recognition results and one or more confidence values for the one or more recognition results; determining whether the one or more confidence values meet a confidence threshold; aborting a remaining portion of the speech recognition tasks for SRSs that have not generated a recognition result; and outputting a final recognition result based on at least one of the generated one or more recognition results.
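
The start-all, stop-early pattern can be sketched with `asyncio`. Each recognizer here is a stub that sleeps for a fixed delay and returns a canned result; the delays, names, threshold, and the rule of returning the highest-confidence completed result are assumptions made for illustration.

```python
import asyncio


async def recognizer(name, delay, text, confidence):
    """Stub SRS: simulate recognition latency, then return a result."""
    await asyncio.sleep(delay)
    return name, text, confidence


async def recognize(specs, threshold=0.8):
    # Initiate all recognition tasks in parallel.
    tasks = [asyncio.create_task(recognizer(*spec)) for spec in specs]
    results = []
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)
        if results[-1][2] >= threshold:
            break  # confidence threshold met; stop waiting for the rest
    aborted = 0
    for task in tasks:
        if not task.done():
            task.cancel()  # abort tasks that have not produced a result
            aborted += 1
    await asyncio.gather(*tasks, return_exceptions=True)
    return max(results, key=lambda r: r[2]), aborted


specs = [
    ("fast", 0.01, "call mom", 0.9),
    ("slow", 0.5, "call tom", 0.95),
]
final, aborted = asyncio.run(recognize(specs))
```

The slow recognizer would have reported a higher confidence, but it is aborted because the fast one already met the threshold, which trades a possibly better result for lower latency.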