Patent classifications
G10L15/193
AUTOMATIC LEARNING OF ENTITIES, WORDS, PRONUNCIATIONS, AND PARTS OF SPEECH
Systems for automatic speech recognition and/or natural language understanding automatically learn new words by finding subsequences of phonemes that, if they were a new word, would enable a successful tokenization of a phoneme sequence. Systems can learn alternate pronunciations of words by finding phoneme sequences with a small edit distance to existing pronunciations. Systems can learn the part of speech of words by finding part-of-speech variations that would enable parses by syntactic grammars. Systems can learn what types of entities a word describes by finding sentences that a semantic grammar could parse if the words were on an entity list.
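The first two learning strategies in this abstract (new words from untokenizable phoneme sequences, and alternate pronunciations by edit distance) can be illustrated with a small sketch. It assumes a hypothetical lexicon that maps each word to a tuple of phoneme symbols. It tokenizes by exact dynamic-programming cover, and it ranks candidate new words shortest first. Both are simplifications chosen for illustration, not the patented method. The part-of-speech and entity-list strategies are not shown.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical lexicon format: word -> pronunciation as a tuple of phoneme symbols.
Lexicon = Dict[str, Tuple[str, ...]]
NEW_WORD = "<NEW>"  # placeholder label for a not-yet-named word


def tokenize(phonemes: Sequence[str], lexicon: Lexicon) -> Optional[List[str]]:
    """Return one word sequence whose pronunciations exactly cover `phonemes`, or None."""
    n = len(phonemes)
    best: List[Optional[List[str]]] = [None] * (n + 1)  # best[i]: words covering phonemes[:i]
    best[0] = []
    for i in range(n):
        if best[i] is None:
            continue
        for word, pron in lexicon.items():
            j = i + len(pron)
            if pron and j <= n and tuple(phonemes[i:j]) == pron and best[j] is None:
                best[j] = best[i] + [word]
    return best[n]


def new_word_candidates(phonemes: Sequence[str], lexicon: Lexicon, max_len: int = 8) -> List[Tuple[str, ...]]:
    """Phoneme subsequences that, added as one new word, make `phonemes` tokenizable (shortest first)."""
    if tokenize(phonemes, lexicon) is not None:
        return []  # already covered by known words; nothing to learn
    found: Dict[Tuple[str, ...], None] = {}
    for i in range(len(phonemes)):
        for j in range(i + 1, min(len(phonemes), i + max_len) + 1):
            trial = dict(lexicon)
            trial[NEW_WORD] = tuple(phonemes[i:j])
            if tokenize(phonemes, trial) is not None:
                found[tuple(phonemes[i:j])] = None  # dict keeps first-seen order and drops duplicates
    return sorted(found, key=len)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def alternate_pronunciation_of(candidate: Sequence[str], lexicon: Lexicon, max_distance: int = 1) -> List[str]:
    """Known words whose pronunciation is within `max_distance` edits of `candidate` (but not identical)."""
    return [w for w, pron in lexicon.items() if 0 < edit_distance(candidate, pron) <= max_distance]
```

For example, with a lexicon containing only "the" (`DH AH`) and "cat" (`K AE T`), the phoneme sequence `DH AH Z UW K AE T` cannot be tokenized. The shortest subsequence that fixes this is `Z UW`, which becomes a new-word candidate. A sequence such as `K AA T` is one edit away from "cat" and would be proposed as an alternate pronunciation of it.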
Context driven device arbitration
This disclosure describes, in part, context-driven device arbitration techniques to select a speech interface device from multiple speech interface devices to provide a response to a command included in a speech utterance of a user. In some examples, the context-driven arbitration techniques may include executing multiple pipeline instances to analyze audio signals and device metadata received from each of the multiple speech interface devices which detected the speech utterance. A remote speech processing service may execute the multiple pipeline instances and analyze the audio signals and/or metadata, at various stages of the pipeline instances, to determine which speech interface device is to respond to the speech utterance.
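The staged arbitration idea can be sketched as a function that narrows a set of per-device reports. An early stage uses audio-signal quality, and a later stage uses context that becomes available once the command is understood. Everything here is a hypothetical illustration: the `DeviceReport` fields (`snr_db`, `wakeword_confidence`, `has_display`), the 6 dB margin, and the display-preference rule are assumptions and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeviceReport:
    device_id: str
    snr_db: float               # hypothetical audio-signal metric reported by the device
    wakeword_confidence: float  # hypothetical detector confidence in [0, 1]
    has_display: bool           # hypothetical device metadata


def arbitrate(reports: List[DeviceReport], needs_display: bool, snr_margin_db: float = 6.0) -> str:
    """Pick one device to respond, narrowing the candidates stage by stage."""
    if not reports:
        raise ValueError("no device detected the utterance")
    # Stage 1 (audio signals): keep devices whose SNR is close to the best one.
    best_snr = max(r.snr_db for r in reports)
    finalists = [r for r in reports if r.snr_db >= best_snr - snr_margin_db]
    # Stage 2 (context, after the command is understood): prefer a display if the response needs one.
    if needs_display:
        with_display = [r for r in finalists if r.has_display]
        if with_display:
            finalists = with_display
    # Tie-break on wake-word confidence.
    return max(finalists, key=lambda r: r.wakeword_confidence).device_id
```

Splitting the decision this way mirrors the abstract's point that arbitration can happen at different pipeline stages. A device that is acoustically plausible early on can still lose once the command's requirements are known.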
Teleconference recording management system
An example operation may include one or more of: receiving a plurality of local audio files from a plurality of audio devices that participated in a teleconference, where each local audio file includes a locally captured audio recording of a user of a respective audio device during the teleconference; generating combined audio playback information for the teleconference based on the plurality of local audio files received from the plurality of audio devices, the generating including detecting audio portions within the plurality of local audio files and synchronizing a playing order of the detected audio portions based on timing information included in the plurality of local audio files; and transmitting the combined audio playback information of the teleconference to at least one audio device among the plurality of audio devices.
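A minimal sketch of generating combined playback information follows. It detects audio portions in each local recording with a simple frame-energy threshold and orders them by absolute start time. The sketch assumes each local file carries a wall-clock `start_time` and a sample rate (hypothetical fields standing in for the "timing information"). Energy thresholding is a stand-in for whatever voice-activity detection a real system would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalRecording:
    user: str
    start_time: float   # assumed: wall-clock seconds when local capture began
    sample_rate: int
    samples: List[float]


@dataclass
class Segment:
    user: str
    start: float
    end: float


def detect_portions(rec: LocalRecording, frame_len: int, threshold: float) -> List[Segment]:
    """Return contiguous runs of frames whose mean energy is at or above `threshold`."""
    segments: List[Segment] = []
    current_start = None
    n_frames = len(rec.samples) // frame_len  # a partial trailing frame is ignored
    for f in range(n_frames):
        frame = rec.samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        t = rec.start_time + f * frame_len / rec.sample_rate
        if energy >= threshold and current_start is None:
            current_start = t
        elif energy < threshold and current_start is not None:
            segments.append(Segment(rec.user, current_start, t))
            current_start = None
    if current_start is not None:  # speech ran to the end of the recording
        segments.append(Segment(rec.user, current_start,
                                rec.start_time + n_frames * frame_len / rec.sample_rate))
    return segments


def combined_playback(recordings: List[LocalRecording], frame_len: int = 160,
                      threshold: float = 1e-3) -> List[Segment]:
    """Merge detected portions from all participants into one playing order."""
    portions = [s for r in recordings for s in detect_portions(r, frame_len, threshold)]
    return sorted(portions, key=lambda s: s.start)
```

Because each segment keeps its absolute start and end times, the resulting list can be sent to a participant's device as playback instructions without re-sending the raw audio of every participant.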
Device for recognizing speech input from user and operating method thereof
Provided are a device for recognizing a speech input including a named entity from a user and an operating method thereof. The device is configured to: generate a weighted finite state transducer model by using a vocabulary list including a plurality of named entities; obtain a first string from a speech input received from a user, by using a first decoding model; obtain a second string by using a second decoding model that uses the weighted finite state transducer model, the second string including a word sequence, which corresponds to at least one named entity, and an unrecognized word sequence not identified as a named entity; and output a text corresponding to the speech input by substituting the unrecognized word sequence of the second string with a word sequence included in the first string.
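The final substitution step, which fills the unrecognized parts of the entity-aware second string with words from the general first string, can be sketched as follows. The abstract does not say how the two strings are aligned. This sketch assumes both decoders emit word timings and fills each unrecognized span with the first-string words whose temporal midpoint falls inside it. The `Word` and `Span` types are hypothetical, and the WFST construction itself is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # assumed word timing in seconds
    end: float


@dataclass
class Span:
    words: List[str]  # entity words; empty for an unrecognized span
    start: float
    end: float
    is_entity: bool


def merge_outputs(first: List[Word], second: List[Span]) -> str:
    """Keep entity spans from the second string; fill unrecognized spans from the first string."""
    out: List[str] = []
    for span in second:
        if span.is_entity:
            out.extend(span.words)
        else:
            # Substitute with first-string words whose midpoint lies inside this span.
            out.extend(w.text for w in first
                       if span.start <= (w.start + w.end) / 2 < span.end)
    return " ".join(out)
```

In this sketch, a general decoder that mishears a contact name ("john doh") still contributes the surrounding words. Meanwhile, the vocabulary-constrained decoder supplies the correctly spelled entity.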
Speech correction system and speech correction method
The speech correction system includes a storage device, an audio receiver and a processing device. The processing device includes a speech recognition engine and a determination module. The storage device is configured to store a database. The audio receiver is configured to receive an audio signal. The speech recognition engine is configured to identify a key speech pattern in the audio signal and generate a candidate vocabulary list and a transcode corresponding to the key speech pattern; wherein the candidate vocabulary list includes a candidate vocabulary corresponding to the key speech pattern and a vocabulary score corresponding to the candidate vocabulary. The determination module is configured to determine whether the vocabulary score is greater than a score threshold. If the vocabulary score is greater than the score threshold, the determination module stores the candidate vocabulary corresponding to the vocabulary score in the database.
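The determination module's rule is simple: store a candidate vocabulary entry only when its score strictly exceeds the threshold. A minimal sketch uses Python's built-in `sqlite3` as the database. The table name `vocabulary`, its columns, and the `(word, score)` candidate format are assumptions for illustration. The speech recognition engine and transcode generation are not modeled.

```python
import sqlite3
from typing import List, Tuple


def store_confident_candidates(conn: sqlite3.Connection,
                               candidates: List[Tuple[str, float]],
                               score_threshold: float) -> List[str]:
    """Persist candidates whose vocabulary score is greater than the threshold."""
    # Hypothetical schema: one row per learned vocabulary entry.
    conn.execute("CREATE TABLE IF NOT EXISTS vocabulary (word TEXT PRIMARY KEY, score REAL)")
    stored: List[str] = []
    for word, score in candidates:
        if score > score_threshold:  # strictly greater, as the abstract describes
            conn.execute("INSERT OR REPLACE INTO vocabulary (word, score) VALUES (?, ?)",
                         (word, score))
            stored.append(word)
    conn.commit()
    return stored
```

Candidates at or below the threshold are simply discarded here. A real system might instead log them for later review.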
Enhancing test coverage of dialogue models
In one or more embodiments described herein, devices, computer-implemented methods, and/or computer program products are provided that facilitate enhancing test coverage of dialogue models. According to an embodiment, a system can comprise a processor that executes computer-executable components stored in memory. The computer-executable components can comprise a conversation processing component that receives and processes a first conversation. The computer-executable components can further comprise a node marking component that tags a first node of a node map as an accessed node if the first node was accessed during processing of the first conversation. The computer-executable components can further comprise a reporting component that generates a report comprising a list of nodes, wherein the list of nodes comprises one or more second nodes that were not accessed during processing of the first conversation.
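The three components (conversation processing, node marking, and reporting) can be sketched over a toy dialogue graph. The node map format (node mapped to a table of user intent and next node), the `"start"` root, and the rule that an unrecognized intent leaves the dialogue on the current node are all hypothetical choices for illustration.

```python
from typing import Dict, List, Set

# Hypothetical node map: node -> {user intent -> next node}
NodeMap = Dict[str, Dict[str, str]]


def process_conversation(node_map: NodeMap, conversation: List[str], root: str = "start") -> Set[str]:
    """Walk the dialogue graph for one conversation and mark every node it reaches."""
    accessed = {root}
    node = root
    for intent in conversation:
        # Assumption: an unrecognized intent leaves the dialogue on the current node.
        node = node_map.get(node, {}).get(intent, node)
        accessed.add(node)
    return accessed


def coverage_report(node_map: NodeMap, accessed: Set[str]) -> List[str]:
    """List nodes (including leaf targets) that no processed conversation accessed."""
    all_nodes = set(node_map) | {t for edges in node_map.values() for t in edges.values()}
    return sorted(all_nodes - accessed)
```

The accessed sets from several test conversations can be unioned before calling `coverage_report`, so the report shows which parts of the dialogue model no test exercised.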
Voice Control Command Generation Method and Terminal
A voice control command generation method includes: displaying, by a terminal, prompt information in response to a first operation, where the prompt information prompts a user to enter a to-be-recorded operation; receiving, by the terminal, one or more operations from the user; recording, by the terminal in response to a second operation of the one or more operations, operation information corresponding to the one or more operations; determining, by the terminal based on a third operation of the one or more operations, first text information corresponding to the operation information; receiving, by the terminal, a first voice command; and performing, by the terminal, a corresponding operation based on the operation information when the text of the first voice command matches the first text information.
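The record-then-bind flow can be sketched as a small class. Recording starts, user operations are captured, the recording is bound to a text phrase, and a later voice command whose text matches the phrase replays the stored operations. The class and method names are hypothetical. Operations are plain strings for brevity, and "matches" is assumed to mean equality after lowercasing and trimming whitespace and end punctuation.

```python
from typing import Dict, List, Optional


def _normalize(text: str) -> str:
    # Assumed matching rule: case-insensitive, ignoring extra spaces and trailing punctuation.
    return " ".join(text.lower().strip(" .!?").split())


class VoiceCommandRecorder:
    def __init__(self) -> None:
        self._commands: Dict[str, List[str]] = {}
        self._recording: Optional[List[str]] = None

    def start_recording(self) -> None:
        """First operation: the prompt is shown and operation capture begins."""
        self._recording = []

    def record(self, operation: str) -> None:
        """Capture one user operation while recording is active."""
        if self._recording is not None:
            self._recording.append(operation)

    def finish(self, text: str) -> None:
        """Bind the recorded operations to the text information chosen by the user."""
        if self._recording is None:
            raise RuntimeError("not recording")
        self._commands[_normalize(text)] = self._recording
        self._recording = None

    def handle_voice(self, transcript: str) -> Optional[List[str]]:
        """Return the operations to perform if the voice command's text matches a stored phrase."""
        return self._commands.get(_normalize(transcript))
```

A production terminal would store structured UI events rather than strings and would likely allow fuzzier matching. The exact-match rule keeps the sketch easy to verify.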