Patent classifications
G10L15/222
Device-directed utterance detection
A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device-directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device-directed classifier, the device may reject the interrupt event and increase a volume of the output audio, or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
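As an illustrative sketch only (the class, scores, and thresholds below are invented for exposition, not taken from the patent), the two-stage architecture can be modeled as a fast frame-level detector that ducks the output volume, paired with a slower full-utterance classifier that later accepts or rejects the interrupt:

```python
class InterruptPipeline:
    """Hypothetical two-stage interrupt handling: a low-latency detector
    lowers output volume on a potential voice command; a full-utterance
    device-directed classifier then accepts or rejects the event."""

    def __init__(self, detect_threshold=0.5, accept_threshold=0.8):
        self.detect_threshold = detect_threshold
        self.accept_threshold = accept_threshold
        self.volume = 1.0  # current output-audio volume (0.0-1.0)

    def on_audio_frame(self, interrupt_score):
        """Fast path: duck the output volume as soon as speech looks
        device-directed enough to be a potential voice command."""
        if interrupt_score >= self.detect_threshold:
            self.volume = 0.2
            return True
        return False

    def on_utterance_complete(self, directed_score):
        """Slow path: the full-utterance classifier either accepts the
        interrupt (end output, forward audio for speech processing) or
        rejects it (restore the output volume)."""
        if directed_score >= self.accept_threshold:
            self.volume = 0.0
            return "accept"
        self.volume = 1.0
        return "reject"
```

The split mirrors the latency/accuracy trade-off the abstract describes: the cheap detector reacts per frame, while the accurate decision waits for the whole utterance.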
Triggering voice control disambiguation
In various embodiments, a voice command is associated with a plurality of processing steps to be performed. The plurality of processing steps may include analysis of audio data using automatic speech recognition, generating and selecting a search query from the utterance text, and conducting a search of a database of items using the search query. The plurality of processing steps may include additional or different steps, depending on the type of the request. In performing one or more of these processing steps, an error or ambiguity may be detected. An error or ambiguity may either halt the processing step or create more than one path of actions. A model may be used to determine if and how to request additional user input to attempt to resolve the error or ambiguity. The voice-enabled device or a second client device is then caused to output a request for the additional user input.
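A minimal sketch of the disambiguation decision, with the function name, candidate format, and score margin all assumed for illustration (the abstract's "model" could be anything; a score-margin rule is just the simplest stand-in):

```python
def resolve_or_ask(candidate_queries, score_margin=0.2):
    """Given (query, score) candidates from query generation, either
    pick a winner or request additional user input when the top
    candidates are too close to disambiguate."""
    ranked = sorted(candidate_queries, key=lambda c: c[1], reverse=True)
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= score_margin:
        return {"action": "search", "query": ranked[0][0]}
    return {"action": "ask_user",
            "prompt": f"Did you mean '{ranked[0][0]}' or '{ranked[1][0]}'?"}
```

The returned `ask_user` action corresponds to the abstract's final step, where a device is caused to output a request for additional input.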
MAN-MACHINE DIALOGUE MODE SWITCHING METHOD
The present disclosure discloses a man-machine dialogue mode switching method, which is applicable to an electronic device. The method includes receiving a current user sentence spoken by a current user; determining whether a dialogue field to which the current user sentence belongs is a preset dialogue field; if yes, switching the current dialogue mode to a full-duplex dialogue mode; and if not, switching the current dialogue mode to a half-duplex dialogue mode. In the present disclosure, the dialogue mode is switched by determining whether the dialogue field to which the current user sentence belongs is the preset dialogue field, and the dialogue mode can be automatically switched and adjusted according to differences between dialogue fields, so that the man-machine dialogue always proceeds in the most suitable dialogue mode and can be carried out smoothly.
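The branching rule above is simple enough to sketch directly; the preset dialogue fields below are hypothetical examples, since the abstract does not name any:

```python
# Hypothetical preset dialogue fields for which full-duplex is preferred.
PRESET_FULL_DUPLEX_FIELDS = {"navigation", "music"}

def switch_dialogue_mode(dialogue_field):
    """Switch to full-duplex (listen while speaking) when the current
    sentence's dialogue field is a preset field; otherwise fall back
    to half-duplex (strict turn-taking)."""
    if dialogue_field in PRESET_FULL_DUPLEX_FIELDS:
        return "full-duplex"
    return "half-duplex"
```

Full-duplex suits fields with rapid back-and-forth corrections (e.g. navigation), while half-duplex avoids the device talking over the user elsewhere.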
Voice prompt avatar
A method, computer program product, and system for a cognitive dialoguing avatar, the method including identifying a user, a target entity, and a user goal; initiating communication with the target entity; cognitively evaluating a question from a dialog with the target entity; cognitively determining an answer to the question by evaluating stored user information to progress toward the user goal; and communicating the determined answer to the target entity.
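A bare-bones sketch of the answering loop, with the function, data shapes, and fallback reply all invented for exposition (the patent's "cognitive" evaluation is far richer than a dictionary lookup):

```python
def avatar_dialogue(questions, stored_user_info):
    """For each question posed by the target entity, determine an answer
    from stored user information and communicate it; questions with no
    stored answer are deferred rather than guessed."""
    answers = []
    for question in questions:
        answer = stored_user_info.get(
            question, "Let me check and get back to you.")
        answers.append(answer)
    return answers
```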
Voice user interface for intervening in conversation of at least one user by adjusting two different thresholds
An electronic device is provided. The electronic device includes a memory configured to store at least one instruction, and at least one processor, where the at least one processor is configured to execute the instruction to obtain voice data from a conversation of at least one user, convert the voice data to text data, determine at least one parameter indicating a characteristic of the conversation based on at least one of the voice data or the text data, adjust a condition for triggering intervention in the conversation based on the determined at least one parameter, and output feedback based on the text data when the adjusted condition is satisfied, wherein the adjustment of the condition includes adjusting a first threshold and a second threshold based on a change of the at least one parameter.
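One way to picture the two-threshold gate is sketched below; the parameter (speaking rate), the two thresholds (silence gap and relevance), and the scaling factors are all assumptions made for illustration, not the patent's definitions:

```python
class InterventionGate:
    """Hypothetical intervention condition with two adjustable
    thresholds, both retuned from a conversation parameter."""

    def __init__(self, silence_threshold=2.0, relevance_threshold=0.7):
        self.silence_threshold = silence_threshold      # seconds of pause
        self.relevance_threshold = relevance_threshold  # feedback relevance

    def adjust(self, speaking_rate):
        """Faster conversations raise both thresholds, so the device
        intervenes less readily (scaling is illustrative)."""
        scale = 1.0 + 0.5 * speaking_rate
        self.silence_threshold = 2.0 * scale
        self.relevance_threshold = min(0.95, 0.7 * scale)

    def should_intervene(self, silence_seconds, relevance_score):
        """The adjusted condition: both thresholds must be satisfied."""
        return (silence_seconds >= self.silence_threshold
                and relevance_score >= self.relevance_threshold)
```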
Policy authoring for task state tracking during dialogue
Conversational understanding systems allow users to conversationally interface with a computing device. In examples, a query may be received that includes a request for execution of a task. A data exchange task definition may be accessed. The data exchange task definition assists a conversational understanding system in managing task state tracking for information needed for task execution. Using the data exchange task definition, a per-turn policy for interacting with the user computing device is generated based on the state of a dialogue with a computing device and an evaluation of a process flow chart provided by a task owner resource. The task owner resource may be independent from the conversational understanding system. A response to the query may be generated and output based on the per-turn policy. In examples, the per-turn policy is used to generate one or more responses during a dialogue with a user via a computing device.
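A toy rendering of per-turn policy generation, under the assumption (mine, not the patent's) that the data exchange task definition can be reduced to a list of required slots and that the dialogue state is the set of slots already filled:

```python
def per_turn_policy(task_definition, filled_slots):
    """Generate this turn's policy from the task definition and the
    current dialogue state: prompt for the next missing slot, or
    execute the task once all required information is collected."""
    missing = [slot for slot in task_definition["required_slots"]
               if slot not in filled_slots]
    if missing:
        return {"action": "prompt", "slot": missing[0]}
    return {"action": "execute", "task": task_definition["name"]}
```

Keeping the slot list in the task definition is what lets the task owner resource stay independent of the conversational understanding system, as the abstract notes.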
Methods, Systems And Apparatuses For Improved Speech Recognition And Transcription
Methods, systems, and apparatuses for improved speech recognition and transcription of user utterances are described herein. User utterances may be processed by a speech recognition computing device as well as an acoustic model. The acoustic model may be trained using historical user utterance data and machine learning techniques. The acoustic model may be used to determine whether a transcription determined by the speech recognition computing device should be overridden with an updated transcription.
AUTOMATED CALL REQUESTS WITH STATUS UPDATES
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to synthetic call status updates. In some implementations, a method includes determining, by a task manager module, that a triggering event has occurred to provide a current status of a user call request. The method may then determine, by the task manager module, the current status of the user call request. A representation of the current status of the user call request is generated. Then, the generated representation of the current status of the user call request is provided to the user.
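A compact sketch of the task manager's role, assuming (for illustration only) that the triggering event is elapsed time since the last update; the class and message format are invented:

```python
class TaskManager:
    """Hypothetical task manager that provides a representation of the
    current status of a user call request when a triggering event
    (here, an elapsed-time interval) has occurred."""

    def __init__(self, update_interval=60.0):
        self.update_interval = update_interval  # seconds between updates
        self.last_update = None

    def maybe_update(self, now, status):
        """Return a status representation to provide to the user, or
        None when no triggering event has occurred yet."""
        if (self.last_update is None
                or now - self.last_update >= self.update_interval):
            self.last_update = now
            return f"Your call request is currently: {status}"
        return None
```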
Automated calling system
Methods, systems, and apparatus for an automated calling system are disclosed. Some implementations are directed to using a bot to initiate telephone calls and conduct telephone conversations with a user. The bot may be interrupted while providing synthesized speech during the telephone call. The interruption can be classified into one of multiple disparate interruption types, and the bot can react to the interruption based on the interruption type. Some implementations are directed to determining that a first user is placed on hold by a second user during a telephone conversation, and maintaining the telephone call in an active state in response to determining that the first user hung up the telephone call. The first user can be notified when the second user rejoins the call, and a bot associated with the first user can notify the first user that the second user has rejoined the telephone call.
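A keyword-rule sketch of classifying an interruption into disparate types; the three type names and the trigger phrases are placeholders for exposition (a real system would likely classify with a trained model rather than keywords):

```python
def classify_interruption(text):
    """Classify a user's interruption of the bot's synthesized speech.
    Returns one of three illustrative types, each implying a different
    bot reaction."""
    t = text.lower()
    if any(w in t for w in ("stop", "wait", "hold on")):
        return "barge-in"          # bot should pause immediately
    if any(w in t for w in ("yeah", "uh-huh", "okay")):
        return "acknowledgement"   # backchannel; bot may keep speaking
    return "substantive"           # new content; bot should yield the turn
```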
Duplex communications for conversational AI by dynamically responsive interrupting content
Systems and methods of presenting interrupting content during human speech are disclosed. The proposed systems offer improved duplex communications in conversational AI platforms. In some embodiments, the system receives speech data and evaluates the data using linguistic models. If the linguistic models detect indications of linguistic irregularities such as mispronunciation, a smart feedback assistant can determine that the system should interrupt the speaker in near-real-time and provide feedback regarding their pronunciation. In addition, conversational irregularities may also be detected, causing the smart feedback assistant to interrupt with presentation of moderating guidance. In some cases, emotion models may also be utilized to detect emotional states based on the speaker's voice in order to offer near-immediate feedback. Users can also customize the manner and occasions in which they are interrupted.
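The interrupt decision itself can be sketched as below; the score, threshold, and overlap guard are all illustrative assumptions, standing in for the abstract's linguistic-model outputs and user customization:

```python
def should_interrupt(irregularity_score, already_interrupting,
                     threshold=0.8):
    """Decide in near-real-time whether the smart feedback assistant
    should present interrupting content. A detected linguistic
    irregularity (e.g. mispronunciation) must score above a
    user-tunable threshold, and back-to-back interruptions are
    suppressed."""
    if already_interrupting:
        return False
    return irregularity_score >= threshold
```

Raising `threshold` is one simple hook for the user customization the abstract mentions: a higher value means fewer, more confident interruptions.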