Patent classifications
G10L15/04
IDENTIFICATION AND CLASSIFICATION OF TALK-OVER SEGMENTS DURING VOICE COMMUNICATIONS USING MACHINE LEARNING MODELS
A system and methods are provided to analyze audio signals from an incoming voice call. The system includes a processor and a computer readable medium operably coupled thereto, to perform voice analysis operations which include receiving a first audio signal comprising a first audio waveform of a first speech between at least two users during the incoming voice call, accessing speech segment parameters for analyzing the audio signals, determining one or more talk-over segments in the first audio waveform using the speech segment parameters, extracting audio features from each of the one or more talk-over segments, determining, using a machine learning (ML) model trained for interruption analysis of the audio signals, whether each of the one or more talk-over segments is a negative interruption or a non-negative interruption based on the audio features, and determining whether to output a first notification for the negative interruption or the non-negative interruption.
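The pipeline described above can be sketched in a few lines. This is an illustrative reconstruction, not the patented implementation: the interval-overlap logic, feature names (`duration`, `energy`), and the threshold-based stand-in for the trained ML classifier are all assumptions.

```python
# Hypothetical sketch: detect talk-over segments from per-speaker
# voice-activity intervals, then classify each as a negative or
# non-negative interruption. Thresholds are illustrative only.

def find_talk_over_segments(speaker_a, speaker_b, min_overlap=0.2):
    """Return (start, end) intervals, in seconds, where both speakers
    are active at once; overlaps shorter than min_overlap are ignored."""
    segments = []
    for a_start, a_end in speaker_a:
        for b_start, b_end in speaker_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if end - start >= min_overlap:
                segments.append((start, end))
    return segments

def classify_interruption(features):
    """Stand-in for the trained ML model: label a talk-over segment
    from simple audio features extracted over the segment."""
    # Assumption: long, loud overlaps are treated as negative interruptions.
    if features["duration"] > 1.0 and features["energy"] > 0.7:
        return "negative"
    return "non-negative"

segments = find_talk_over_segments([(0.0, 3.0), (5.0, 8.0)],
                                   [(2.5, 4.0), (7.0, 7.5)])
# segments -> [(2.5, 3.0), (7.0, 7.5)]
```

In the claimed system the classification step would be a model trained on labeled interruption data; the rule here only marks where that model plugs in.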
Communication with in-game characters
A system for coordinating reactions of a virtual character with a script spoken by a player in a video game or presentation, comprising an internet-connected server executing software and streaming video games or presentations to a player's computerized device. The system senses the start of a dialogue between the player and the virtual character, displays a script for the player on a display of the computerized platform, and prompts the player to speak the script. A timer then starts, or the system tracks an audio stream of the spoken script, determines where the player is in the script by the timer or the audio stream, and causes specific actions and responses of the virtual character according to a pre-programmed association of actions and responses of the character to points of time or specific variations in the audio stream.
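The timer-based variant amounts to a pre-programmed association of character actions with time offsets into the spoken script. A minimal sketch, with entirely hypothetical reaction names and timings:

```python
# Illustrative sketch of timer-based script tracking: each character
# reaction is pre-associated with a time offset (in seconds) from the
# moment the player begins speaking the script.

REACTIONS = [
    (0.0, "make eye contact"),
    (2.5, "nod"),
    (6.0, "look surprised"),
]

def reactions_up_to(elapsed_seconds):
    """Return the character actions whose scheduled offset has been
    reached at the given elapsed time since the player started speaking."""
    return [action for t, action in REACTIONS if t <= elapsed_seconds]
```

The audio-stream variant would replace the timer lookup with alignment of the live audio against the script text, but the association table plays the same role.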
Context configurable keywords
A system incorporating configurable keywords. The system can detect a keyword in audio data and execute one function for the keyword if a first application is operating, but a second function for the keyword if a second application is operating. Each keyword may be associated with multiple different functions. If a keyword is recognized during keyword detection, a function associated with that keyword is determined based on another application running on the system. Thus detection of a same keyword may result in a different function based on system context.
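The context-dependent dispatch can be modeled as a lookup keyed on both the keyword and the active application. The keyword, application, and function names below are made up for illustration:

```python
# Minimal sketch of context-configurable keywords: the same detected
# keyword resolves to different functions depending on which
# application is currently operating on the system.

KEYWORD_FUNCTIONS = {
    ("next", "music_player"): "skip_track",
    ("next", "slideshow"): "advance_slide",
    ("stop", "music_player"): "pause_playback",
    ("stop", "timer"): "cancel_timer",
}

def resolve_keyword(keyword, active_app):
    """Look up the function for a detected keyword given system context;
    returns None if the keyword has no function in this context."""
    return KEYWORD_FUNCTIONS.get((keyword, active_app))
```

So `"next"` skips a track while the music player runs, but advances a slide during a slideshow, matching the abstract's "different function based on system context."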
SYSTEMS AND METHODS FOR GENERATING TRAILERS FOR AUDIO CONTENT
An electronic device receives an audio file and divides the audio file into a plurality of segments. The electronic device, automatically, without user input, determines, for each segment, a descriptor from a plurality of descriptors and a value of the descriptor for the segment. The electronic device selects one or more segments of the plurality of segments, based on a comparison of the respective values of respective descriptors for respective segments and genre-specific criteria selected based on a genre of the audio file. The electronic device generates a trailer for the audio file using the selected one or more segments.
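The selection step compares per-segment descriptor values against genre-specific criteria. A hedged sketch, in which the descriptor names (`tension`, `warmth`), thresholds, and genre table are all invented for illustration:

```python
# Hypothetical sketch of genre-driven segment selection for an audio
# trailer: each segment carries descriptor values, and criteria chosen
# by the audio file's genre decide which segments make the trailer.

GENRE_CRITERIA = {
    "thriller": {"descriptor": "tension", "min_value": 0.8},
    "romance": {"descriptor": "warmth", "min_value": 0.7},
}

def select_trailer_segments(segments, genre, max_segments=3):
    """Pick up to max_segments whose genre-relevant descriptor value
    meets the genre's threshold, highest value first."""
    crit = GENRE_CRITERIA[genre]
    eligible = [s for s in segments
                if s.get(crit["descriptor"], 0.0) >= crit["min_value"]]
    eligible.sort(key=lambda s: s[crit["descriptor"]], reverse=True)
    return eligible[:max_segments]
```

Generating the trailer would then concatenate the audio of the selected segments; that assembly step is omitted here.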
Computer systems exhibiting improved computer speed and transcription accuracy of automatic speech transcription (AST) based on multiple speech-to-text engines and methods of use thereof
In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least components of: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines, where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and accuracy of automatic speech transcription; and generating a transcript of the audio recording from respective accepted hypotheses for the plurality of audio segments.
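The per-segment routing and early-acceptance logic can be sketched as follows. The engines here are simulated callables and the confidence threshold is an assumption; the claimed system describes engine selection and hypothesis acceptance without specifying either.

```python
# Sketch of multi-engine transcription: each audio segment is offered
# to speech-to-text engines in turn, and a hypothesis is accepted as
# soon as its confidence clears a threshold, so the segment is never
# re-submitted to the remaining engines (the claimed speed gain).

def transcribe(segments, engines, accept_threshold=0.85):
    """Build a transcript by accepting, per segment, the first
    hypothesis whose confidence meets the threshold (falling back to
    the best hypothesis seen if none does)."""
    transcript = []
    for seg in segments:
        best = None
        for engine in engines:
            text, confidence = engine(seg)
            if best is None or confidence > best[1]:
                best = (text, confidence)
            if confidence >= accept_threshold:
                break  # accepted: skip the remaining engines
        transcript.append(best[0])
    return " ".join(transcript)
```

A segment handled well by the first engine costs one engine call; only hard segments incur additional calls, which is where the speed improvement over always querying every engine comes from.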