Patent classifications
G10L15/32
Speech recognition system, speech recognition method and computer program product
A speech recognition system and a method thereof are provided. The speech recognition system connects to an external general-purpose speech recognition system and includes a storage unit and a processing unit. The storage unit stores a specific application speech recognition module, a comparison module and an enhancement module. The specific application speech recognition module converts a speech signal into a first phonetic text. The general-purpose speech recognition system converts the speech signal into a written text. The comparison module receives the first phonetic text and the written text, converts the written text into a second phonetic text, and aligns the second phonetic text with the first phonetic text according to similarity of pronunciation to output a phonetic text alignment result. The enhancement module receives the phonetic text alignment result and, after path weighting, combines it with the written text and the first phonetic text to form the output recognized text.
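The alignment step performed by the comparison module can be illustrated with a small sketch. The abstract does not say which alignment algorithm is used, so this example swaps in a standard weighted edit-distance alignment. The similarity groups, the cost values and the names `SIMILAR_GROUPS`, `substitution_cost` and `align` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical groups of similar-sounding phonemes (an assumption, not from the abstract).
SIMILAR_GROUPS = [{"b", "p"}, {"d", "t"}, {"g", "k"}, {"s", "z"}, {"m", "n"}]
GAP_COST = 1.0  # assumed cost of leaving a phoneme unmatched


def substitution_cost(a: str, b: str) -> float:
    """Return 0 for identical phonemes, 0.5 for similar ones, 1 otherwise."""
    if a == b:
        return 0.0
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 0.5
    return 1.0


def align(first: List[str], second: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Align two phoneme sequences; None marks a gap on that side."""
    n, m = len(first), len(second)
    # cost[i][j] is the cheapest alignment of first[:i] with second[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        cost[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + substitution_cost(first[i - 1], second[j - 1]),
                cost[i - 1][j] + GAP_COST,
                cost[i][j - 1] + GAP_COST,
            )
    # Walk back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j]
                == cost[i - 1][j - 1] + substitution_cost(first[i - 1], second[j - 1])):
            pairs.append((first[i - 1], second[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + GAP_COST:
            pairs.append((first[i - 1], None))
            i -= 1
        else:
            pairs.append((None, second[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs
```

In this sketch, a pair such as `("k", "g")` is treated as a cheap substitution, so similar-sounding phonemes from the two recognizers end up in the same alignment slot rather than being split into gaps.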
Dialogue processing apparatus, a vehicle including the same, and a dialogue processing method
A dialogue processing apparatus includes: a speech input device configured to receive a speech signal of a user; a first buffer configured to store the received speech signal therein; an output device; and a controller. The controller is configured to: detect an utterance end time point on the basis of the stored speech signal; generate a second speech recognition result corresponding to a speech signal after the utterance end time point on the basis of whether an intention of the user is to be identified from a first speech recognition result corresponding to a speech signal before the utterance end time point; and control the output device to output a response corresponding to the intention of the user determined on the basis of at least one of the first speech recognition result or the second speech recognition result.
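The controller logic described above can be sketched as follows. This is only an illustration under strong assumptions: the buffer holds word tokens in place of audio frames, `recognize` is a hypothetical stand-in for a speech recognizer, and intent identification is reduced to a keyword lookup (`INTENT_KEYWORDS`). None of these details come from the abstract.

```python
from typing import Callable, Optional, Sequence

# Hypothetical keyword-to-intent table used only for this sketch.
INTENT_KEYWORDS = {"navigate": "navigation", "play": "media", "call": "phone"}


def identify_intent(text: str) -> Optional[str]:
    """Return the first intent whose keyword appears in the text, else None."""
    for word in text.lower().split():
        if word in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[word]
    return None


def determine_intent(
    buffer: Sequence[str],
    end_index: int,
    recognize: Callable[[Sequence[str]], str],
) -> Optional[str]:
    """Use the speech before the utterance end point first; fall back to the rest."""
    first_result = recognize(buffer[:end_index])
    intent = identify_intent(first_result)
    if intent is not None:
        return intent
    # The intent could not be identified, so also recognize the speech
    # stored after the detected utterance end time point.
    second_result = recognize(buffer[end_index:])
    return identify_intent(first_result + " " + second_result)
```

For example, with the buffer `["um", "please", "navigate", "home"]` and an end point detected after the second token, the first result carries no intent, so the sketch recognizes the remaining tokens and identifies a navigation intent.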
ERROR CORRECTION IN SPEECH RECOGNITION
Systems and methods for speech recognition correction include receiving a voice recognition input from an individual user and using a trained error correction model to add a new alternative result to a results list based on the received voice input processed by a voice recognition system. The error correction model is trained using contextual information corresponding to the individual user. The contextual information comprises a plurality of historical user correction logs, a plurality of personal class definitions, and an application context. A re-ranker re-ranks the results list with the new alternative result and a top result from the re-ranked results list is output.
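The flow of adding a new alternative and re-ranking can be sketched in a few lines. The abstract describes a trained error correction model; this example swaps in a plain lookup over historical user corrections, so it illustrates only the data flow. The scoring rule, the `boost` parameter and the function names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Result = Tuple[str, float]  # (recognized text, score)


def propose_alternative(
    top_text: str, correction_log: Dict[str, Counter]
) -> Optional[Tuple[str, float]]:
    """Return the user's most frequent past correction of top_text and its share."""
    corrections = correction_log.get(top_text)
    if not corrections:
        return None
    text, count = corrections.most_common(1)[0]
    return text, count / sum(corrections.values())


def rerank(
    results: List[Result], correction_log: Dict[str, Counter], boost: float = 0.2
) -> List[Result]:
    """Add a new alternative derived from the correction log, then re-rank by score."""
    if not results:
        return []
    ranked = list(results)
    top_text, top_score = max(ranked, key=lambda r: r[1])
    alternative = propose_alternative(top_text, correction_log)
    if alternative is not None:
        alt_text, share = alternative
        if all(text != alt_text for text, _ in ranked):
            # Assumed scoring rule: the alternative outranks the top result
            # only when more than half of past corrections agree on it.
            ranked.append((alt_text, top_score + boost * (share - 0.5)))
    return sorted(ranked, key=lambda r: r[1], reverse=True)
```

A caller would take the first element of the returned list as the output. Personal class definitions and application context, which the abstract also lists as contextual information, are left out of this sketch.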
CHANNEL-AGNOSTIC CONVERSATION INTELLIGENCE SERVICE
An online system, for example a multi-tenant system, interacts with various conversation channels, for example various telephony services, and with artificial intelligence provider systems that perform artificial-intelligence-based analysis of conversations. The analysis of a conversation determines additional information describing the conversation, for example the sentiment of an utterance, entities mentioned in an utterance, the intent of an utterance, and so on. The online system stores the information describing conversations using a normalized representation that conforms to a unified conversation schema. Various applications may use the result of the analysis obtained from the AI provider systems to take further action, for example to recommend a specific workflow to an agent who is a participant in the conversation.
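A normalized representation of this kind might look like the sketch below. The abstract does not define the schema, so the `UtteranceRecord` fields, the two provider payload layouts and the sentiment thresholds are all hypothetical, chosen only to show how differently shaped analysis results can be mapped onto one record type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class UtteranceRecord:
    """Hypothetical unified record for one analyzed utterance."""
    channel: str
    speaker: str
    text: str
    sentiment: Optional[str] = None
    intent: Optional[str] = None
    entities: List[str] = field(default_factory=list)


def from_provider_a(channel: str, payload: Dict[str, Any]) -> UtteranceRecord:
    """Normalize an assumed payload that nests its results under 'analysis'."""
    analysis = payload.get("analysis", {})
    sentiment = analysis.get("sentiment")
    return UtteranceRecord(
        channel=channel,
        speaker=payload["who"],
        text=payload["transcript"],
        sentiment=sentiment.lower() if sentiment else None,
        intent=analysis.get("intent"),
        entities=[e["name"] for e in analysis.get("entities", [])],
    )


def from_provider_b(channel: str, payload: Dict[str, Any]) -> UtteranceRecord:
    """Normalize an assumed payload that reports sentiment as a score in [-1, 1]."""
    tone = payload.get("tone")
    if tone is None:
        sentiment = None
    elif tone > 0.25:
        sentiment = "positive"
    elif tone < -0.25:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return UtteranceRecord(
        channel=channel,
        speaker=payload["speaker_id"],
        text=payload["utterance"],
        sentiment=sentiment,
        intent=payload.get("labels", {}).get("intent"),
        entities=list(payload.get("mentions", [])),
    )
```

Applications that consume `UtteranceRecord` objects then do not need to know which channel or AI provider produced the underlying data.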
Device with voice command input capability
A system including at least one computerized device with voice command capability processed remotely includes a low power processor, executing a loose algorithmic model to recognize a wake word prefix in a voice command, the loose model having a low false rejection rate but suffering a high false acceptance rate, and a second processor which can operate in at least a low power/low clock rate mode and a high power/high clock rate mode. When the first processor determines the presence of the wake word, it causes the second processor to switch to the high power/high clock rate mode and to execute a tight algorithmic model to verify the presence of the wake word. By using the two processors in this manner, the average overall power required by the computerized device is reduced, as is the amount of waste heat generated by the system.
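The two-stage wake word check can be sketched as a small simulation. The thresholds, the mode names and the use of precomputed scores in place of real loose and tight models are assumptions. The abstract also does not say what happens to the second processor after verification, so this sketch assumes it returns to low power on rejection and stays in high power on acceptance.

```python
from dataclasses import dataclass

# Assumed thresholds: the loose model accepts easily (few false rejections,
# many false acceptances); the tight model is strict.
LOOSE_THRESHOLD = 0.3
TIGHT_THRESHOLD = 0.8


@dataclass
class SecondProcessor:
    """Simulated second processor with a low and a high power mode."""
    mode: str = "low_power"
    wakeups: int = 0  # number of times it was switched to high power

    def verify(self, tight_score: float) -> bool:
        """Switch to high power and run the (simulated) tight model."""
        self.mode = "high_power"
        self.wakeups += 1
        accepted = tight_score >= TIGHT_THRESHOLD
        if not accepted:
            # False acceptance by the loose model: drop back to low power.
            self.mode = "low_power"
        return accepted


def handle_frame(loose_score: float, tight_score: float, second: SecondProcessor) -> bool:
    """Return True only if both stages detect the wake word."""
    if loose_score < LOOSE_THRESHOLD:
        # The low power processor rejects; the second processor is never woken.
        return False
    return second.verify(tight_score)
```

The power saving in this sketch comes from the first branch: most audio never passes the loose check, so the second processor stays in its low power mode for most of the time.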
Agent system, agent server, method of controlling agent server, and storage medium
According to an embodiment, an agent system includes: a plurality of agent functions mounted on a plurality of different objects, each configured to provide a service that includes causing an output to give a voice response to a speech of a user; and an information provider configured, when agent functions of the same kind among the plurality of agent functions are mounted on more than one of the objects, to include attribute information associated with those agent functions in their response content and to provide the attribute information to a portable mobile terminal of the user.
Synthesizing higher order conversation features for a multiparty conversation
Technology is provided for identifying synthesized conversation features from recorded conversations. The technology can identify, for each of one or more utterances, data for multiple modalities, such as acoustic data, video data, and text data. The technology can extract features, for each particular utterance of the one or more utterances, from each of the data for the multiple modalities associated with that particular utterance. The technology can also apply a machine learning model that receives the extracted features and/or previously synthesized conversation features and produces one or more additional synthesized conversation features.