Patent classifications
G10L15/193
Artificial intelligence apparatus and method for recognizing speech of user in consideration of word usage frequency
Disclosed herein is an artificial intelligence apparatus for recognizing speech of a user including a microphone and a processor configured to obtain, via the microphone, speech data including speech of a user, determine a frequency weight for each word using a speech recognition log, generate a speech recognition result corresponding to the speech data using the frequency weight, and perform control corresponding to the speech recognition result.
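The frequency-weighting step described in this abstract can be illustrated with a minimal sketch. It assumes the speech recognition log is a list of past transcripts and that the recognizer returns candidate texts with base scores; the function names, the log-count weighting, and the `alpha` mixing factor are hypothetical choices for illustration, not details taken from the patent.

```python
import math
from collections import Counter


def frequency_weights(recognition_log):
    """Build a per-word weight from past transcripts (hypothetical scheme: log(1 + count))."""
    counts = Counter(w for text in recognition_log for w in text.lower().split())
    return {word: math.log1p(count) for word, count in counts.items()}


def rescore(candidates, weights, alpha=0.5):
    """Pick the candidate whose base score plus weighted frequency bonus is highest.

    `candidates` is a list of (text, base_score) pairs; `alpha` is an assumed mixing factor.
    """
    def total(candidate):
        text, base_score = candidate
        bonus = sum(weights.get(w, 0.0) for w in text.lower().split())
        return base_score + alpha * bonus

    return max(candidates, key=total)[0]


# Illustrative data only: a tiny log and two competing hypotheses.
log = ["play jazz music", "play jazz", "play the news"]
weights = frequency_weights(log)
candidates = [("play chess music", -4.0), ("play jazz music", -4.2)]
best = rescore(candidates, weights)
```

Here the hypothesis containing the frequently used word "jazz" overtakes a candidate with a slightly better base score, which is the kind of effect the abstract attributes to the frequency weight.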
System and method for controlling an application using natural language communication
A system and method are disclosed for setting up a communication link between a device or application and a system with a controller. The controller can collect and send information to the application. A user interfaces with the controller to access the functionality of the application by providing commands to the controller. The system allows the user to interface with multiple applications.
System and/or method for semantic parsing of air traffic control audio
The method S200 can include: at an aircraft, receiving an audio utterance from air traffic control S210, converting the audio utterance to text, determining commands from the text using a question-and-answer model S240, and optionally controlling the aircraft based on the commands S250. The method functions to automatically interpret flight commands from the air traffic control (ATC) stream.
Speech recognition method, apparatus, and device, and computer-readable storage medium
The provided speech recognition method, apparatus, device, and computer-readable storage medium pertain to the field of artificial intelligence technologies. The method includes: obtaining or generating a dynamic target language model based on reply information of a first intent, where the dynamic target language model includes a front-end part and a core part; obtaining a speech signal and parsing the speech signal to generate a keyword; and invoking the dynamic target language model to determine a second intent and a service content. The front-end part of the dynamic target language model parses out the second intent based on the keyword, and the core part of the dynamic target language model parses out the service content based on the keyword. The speech recognition method prevents the provided service content from deviating from the user requirement and achieves a good recognition effect.
Artificial intelligence apparatus for recognizing speech of user using personalized language model and method for the same
An artificial intelligence apparatus for recognizing speech of a user includes a microphone and a processor configured to receive, via the microphone, a sound signal corresponding to the speech of the user, acquire personalized identification information corresponding to the speech, recognize the speech from the sound signal using a global language model, calculate a reliability for the recognition, and, if the calculated reliability exceeds a predetermined first reference value, update a personalized language model corresponding to the personalized identification information using the recognition result.
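The reliability-gated update in this abstract can be sketched as follows. This is a minimal illustration that assumes a personalized language model can be approximated by per-user word counts; the threshold value, the `maybe_update` helper, and the user identifier are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict

# Assumed value standing in for the "predetermined first reference value".
FIRST_REFERENCE_VALUE = 0.8

# Hypothetical store: one word-count "model" per personalized identifier.
personal_models = defaultdict(Counter)


def maybe_update(user_id, recognized_text, reliability, threshold=FIRST_REFERENCE_VALUE):
    """Update the user's model only when the recognition reliability exceeds the threshold."""
    if reliability > threshold:
        personal_models[user_id].update(recognized_text.lower().split())
        return True
    return False


# Illustrative calls: a confident result is learned, an unreliable one is discarded.
updated_hi = maybe_update("user-a", "call mom", 0.93)
updated_lo = maybe_update("user-a", "cold mop", 0.41)
```

Gating on reliability keeps misrecognized text out of the personalized model, so recognition errors are less likely to reinforce themselves over time.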
Phoneme-based contextualization for cross-lingual speech recognition in end-to-end models
A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
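The rescoring step in this abstract can be illustrated with a simplified sketch. The patent describes executing a decoding graph; the code below swaps in a plain list of scored phoneme hypotheses and a fixed bonus per matched biasing term. The phoneme symbols, the pronunciation of the biasing term, and the `bonus` value are all assumptions made for illustration.

```python
def contains(sequence, sub):
    """Return True if `sub` occurs as a contiguous run inside `sequence`."""
    n = len(sub)
    return any(sequence[i:i + n] == sub for i in range(len(sequence) - n + 1))


def rescore_phoneme_hypotheses(hypotheses, bias_phonemes, bonus=2.0):
    """Add `bonus` to a hypothesis score for each biasing term whose phonemes it contains.

    `hypotheses` is a list of (phoneme_list, score) pairs; `bias_phonemes` maps each
    second-language term to an assumed rendering in first-language phonemes.
    """
    rescored = []
    for phonemes, score in hypotheses:
        hits = sum(contains(phonemes, term) for term in bias_phonemes.values())
        rescored.append((phonemes, score + bonus * hits))
    return sorted(rescored, key=lambda h: h[1], reverse=True)


# Hypothetical biasing list: one second-language place name mapped to first-language phonemes.
bias_phonemes = {"Creteil": ["k", "r", "eh", "t", "ey"]}
hypotheses = [
    (["k", "r", "ae", "t", "ax", "l"], -3.0),
    (["k", "r", "eh", "t", "ey"], -3.5),
]
ranked = rescore_phoneme_hypotheses(hypotheses, bias_phonemes)
```

Matching at the phoneme level lets a term from the second language be boosted even when the model's wordpiece vocabulary, trained on the first language, spells it poorly.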
Language and grammar model adaptation
Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) to a new set of words. Instead of retraining the ASR models, language models, and grammar models, the system modifies only one grammar model and ensures its compatibility with the existing models in the ASR system.
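One way to picture modifying a single grammar model while keeping it compatible with the rest of the system is sketched below. The abstract does not say how compatibility is checked, so this example assumes a simple rule: a new word is accepted only if its pronunciation uses phonemes the existing models already know. The phoneme inventory, the slot-based grammar layout, and the `adapt_grammar` helper are hypothetical.

```python
# Assumed phoneme inventory of the existing (unchanged) models.
KNOWN_PHONEMES = {"k", "ae", "t", "d", "ao", "g", "b", "er"}


def adapt_grammar(grammar, slot, new_entries, known_phonemes=KNOWN_PHONEMES):
    """Return a copy of `grammar` with compatible `new_entries` added to `slot`.

    `grammar` maps slot names to {word: pronunciation}; `new_entries` maps each new
    word to a list of phonemes. Incompatible words are returned in a rejected list.
    """
    adapted = {name: dict(words) for name, words in grammar.items()}
    rejected = []
    for word, pronunciation in new_entries.items():
        if set(pronunciation) <= known_phonemes:
            adapted.setdefault(slot, {})[word] = pronunciation
        else:
            rejected.append(word)
    return adapted, rejected


# Illustrative data: "dog" fits the known inventory, "zebra" does not.
grammar = {"ANIMAL": {"cat": ["k", "ae", "t"]}}
new_words = {"dog": ["d", "ao", "g"], "zebra": ["z", "iy", "b", "r", "ax"]}
adapted, rejected = adapt_grammar(grammar, "ANIMAL", new_words)
```

Only the grammar dictionary changes in this sketch; the original grammar object and the assumed phoneme inventory are left untouched, mirroring the idea of avoiding retraining.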