Patent classifications
G10L15/222
Voice controlled media playback system
Disclosed herein are systems and methods for receiving a voice command and determining an appropriate action for the media playback system to execute based on user identification. The systems and methods receive a voice command for a media playback system and determine whether the voice command was received from a registered user of the media playback system. In response to determining that the voice command was received from a registered user, the systems and methods configure an instruction for the media playback system based on content from the voice command and information in a user profile for the registered user.
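A minimal sketch of the described flow. The names (`UserProfile`, `build_instruction`, `REGISTERED_USERS`) and the specific profile fields are illustrative assumptions, not from the patent text:

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    user_id: str
    preferred_zone: str = "living_room"
    default_volume: int = 30

# Hypothetical registry of users registered with the playback system.
REGISTERED_USERS = {"alice": UserProfile("alice", preferred_zone="kitchen")}

def build_instruction(speaker_id: str, command_content: str):
    """Configure an instruction only if the speaker is a registered user."""
    profile = REGISTERED_USERS.get(speaker_id)
    if profile is None:
        return None  # not a registered user: no instruction is configured
    # Combine content from the voice command with user-profile information.
    return {
        "action": command_content,
        "zone": profile.preferred_zone,
        "volume": profile.default_volume,
    }
```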
Methods, systems and apparatuses for improved speech recognition and transcription
Methods, systems, and apparatuses for improved speech recognition and transcription of user utterances are described herein. User utterances may be processed by a speech recognition computing device as well as an acoustic model. The acoustic model may be trained using historical user utterance data and machine learning techniques. The acoustic model may be used to determine whether a transcription determined by the speech recognition computing device should be overridden with an updated transcription.
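The override decision can be sketched as a confidence comparison. The margin threshold and the idea of comparing scalar confidences are assumptions for illustration; the patent describes a trained acoustic model, not this rule:

```python
def final_transcription(asr_text: str, asr_conf: float,
                        model_text: str, model_conf: float,
                        margin: float = 0.1) -> str:
    """Override the speech recognizer's transcription only when the
    acoustic model is more confident by at least `margin`."""
    if model_conf >= asr_conf + margin:
        return model_text  # updated transcription wins
    return asr_text        # keep the original transcription
```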
DEVICE INCLUDING SPEECH RECOGNITION FUNCTION AND METHOD OF RECOGNIZING SPEECH
A device including a speech recognition function which recognizes speech from a user, includes: a loudspeaker which outputs speech to a space; a microphone which collects speech in the space; a first speech recognition unit which recognizes the speech collected by the microphone; a command control unit which issues a command for controlling the device, based on the speech recognized by the first speech recognition unit; and a control unit which prohibits the command control unit from issuing the command, based on the speech to be output from the loudspeaker.
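The control unit's role, preventing the device from obeying its own loudspeaker output, can be sketched with a simple comparison. The string match here is a stand-in assumption; a real device would compare audio, not text:

```python
def should_issue_command(recognized_text: str, loudspeaker_text: str) -> bool:
    """Prohibit command issuance when the recognized speech matches the
    speech the device's own loudspeaker is outputting (self-trigger guard)."""
    return recognized_text.strip().lower() != loudspeaker_text.strip().lower()
```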
POLICY AUTHORING FOR TASK STATE TRACKING DURING DIALOGUE
Conversational understanding systems allow users to conversationally interface with a computing device. In examples, a query may be received that includes a request for execution of a task. A data exchange task definition may be accessed. The data exchange task definition assists a conversational understanding system in managing task state tracking for information needed for task execution. Using the data exchange task definition, a per-turn policy for interacting with the user computing device is generated based on the state of a dialogue with a computing device and an evaluation of a process flow chart provided by a task owner resource. The task owner resource may be independent from the conversational understanding system. A response to the query may be generated and output based on the per-turn policy. In examples, the per-turn policy is used to generate one or more responses during a dialogue with a user via a computing device.
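A per-turn policy of this kind can be sketched as slot-filling against a task definition. The slot-based task definition and the `prompt`/`execute` actions are illustrative assumptions standing in for the patent's process flow chart:

```python
def per_turn_policy(task_definition: dict, dialogue_state: dict) -> dict:
    """Generate a per-turn policy: prompt for the first missing piece of
    information needed for task execution, else signal readiness."""
    for slot in task_definition["required_slots"]:
        if slot not in dialogue_state:
            return {"action": "prompt", "slot": slot}
    return {"action": "execute", "args": dialogue_state}
```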
CALL MANAGEMENT SYSTEM AND ITS SPEECH RECOGNITION CONTROL METHOD
A speech recognition server has a speech recognition engine and a mode control table to hold a speech recognition mode for each call. The speech recognition engine has a mode management unit to designate a speech recognition mode for a decoder, and an output analysis unit to analyze recognition result data speech-to-text converted by speech recognition. The output analysis unit designates the speech recognition mode for the mode management unit in accordance with the result of analyzing the recognition result data. The mode management unit designates the speech recognition mode for the decoder in accordance with the designation by the output analysis unit. Upon speech recognition of call data, it is possible to suppress hardware resource consumption while improving user satisfaction.
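The interaction between the output analysis unit and the mode management unit can be sketched as follows. The mode names and the keyword rule are assumptions; the point is only that analysis of recognition results drives the per-call mode table:

```python
class ModeManager:
    """Mode management unit backed by a per-call mode control table."""
    def __init__(self):
        self.mode_by_call = {}
    def designate(self, call_id: str, mode: str) -> None:
        self.mode_by_call[call_id] = mode

def analyze_recognition_result(text: str) -> str:
    """Output analysis unit: choose a mode from the recognized text.
    Escalate to the resource-heavy continuous mode only when needed."""
    return "continuous" if "operator" in text.lower() else "keyword-spotting"
```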
REINFORCEMENT LEARNING TECHNIQUES FOR SELECTING A SOFTWARE POLICY NETWORK AND AUTONOMOUSLY CONTROLLING A CORRESPONDING SOFTWARE CLIENT BASED ON SELECTED POLICY NETWORK
Techniques are disclosed that enable automating user interface input by generating a sequence of actions to perform a task utilizing a multi-agent reinforcement learning framework. Various implementations process an intent associated with received user interface input using a holistic reinforcement policy network to select a software reinforcement learning policy network. The sequence of actions can be generated by processing the intent, as well as a sequence of software client state data, using the selected software reinforcement learning policy network. The sequence of actions is utilized to control the software client corresponding to the selected software reinforcement learning policy network.
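The two-level selection can be sketched with lookups in place of learned networks. Everything here is an illustrative stand-in: the patent's policies are trained reinforcement learning networks, not keyword rules:

```python
# Hypothetical stand-ins for client-specific policy networks.
POLICY_NETWORKS = {
    "book_flight": lambda state: ["open_search", f"enter:{state['query']}", "submit"],
    "send_email": lambda state: ["open_compose", f"type:{state['query']}", "send"],
}

def holistic_policy(intent: str) -> str:
    """Holistic policy: select which software policy network handles the intent."""
    return "book_flight" if "flight" in intent else "send_email"

def generate_actions(intent: str, state: dict) -> list:
    """Generate the sequence of UI actions for the selected software client."""
    selected = POLICY_NETWORKS[holistic_policy(intent)]
    return selected(state)
```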
AUTOMATED CALLING SYSTEM
Methods, systems, and apparatus for an automated calling system are disclosed. Some implementations are directed to using a bot to initiate telephone calls and conduct telephone conversations with a user. The bot may be interrupted while providing synthesized speech during the telephone call. The interruption can be classified into one of multiple disparate interruption types, and the bot can react to the interruption based on the interruption type. Some implementations are directed to determining that a first user is placed on hold by a second user during a telephone conversation, and maintaining the telephone call in an active state in response to determining that the first user has hung up the telephone call. The first user can be notified when the second user rejoins the call, and a bot associated with the first user can notify the first user that the second user has rejoined the telephone call.
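Interruption-type classification can be sketched with a keyword rule. The three type names and the keyword lists are assumptions for illustration; the patent only says interruptions are classified into disparate types:

```python
def classify_interruption(utterance: str) -> str:
    """Classify a user interruption of the bot's synthesized speech."""
    u = utterance.lower()
    if any(w in u for w in ("stop", "wait", "hold on")):
        return "meaningful"      # bot should pause and yield the turn
    if any(w in u for w in ("uh huh", "okay", "right")):
        return "non-meaningful"  # backchannel: bot keeps speaking
    return "ambiguous"
```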
Automated call requests with status updates
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to synthetic call status updates. In some implementations, a method includes determining, by a task manager module, that a triggering event has occurred to provide a current status of a user call request. The method may then determine, by the task manager module, the current status of the user call request. A representation of the current status of the user call request is generated. Then, the generated representation of the current status of the user call request is provided to the user.
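The task-manager logic can be sketched as a trigger check followed by a status representation. The event names and the text format of the representation are assumptions:

```python
def current_status_representation(call_state: dict, triggering_event: str):
    """On a triggering event, generate a representation of the current
    status of the user's call request; otherwise provide nothing."""
    if triggering_event not in ("status_requested", "status_changed"):
        return None  # no triggering event has occurred
    return f"Call request {call_state['request_id']}: {call_state['status']}"
```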
Spoken notifications
An example method includes, at an electronic device: receiving an indication of a notification; in accordance with receiving the indication of the notification: obtaining one or more data streams from one or more sensors; determining, based on the one or more data streams, whether a user associated with the electronic device is speaking; and in accordance with a determination that the user is not speaking: causing an output associated with the notification to be provided.
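The gating step can be sketched directly. Representing the sensor data streams as microphone energy levels, and the 0.5 speaking threshold, are assumptions for illustration:

```python
def provide_notification_output(notification: str, mic_levels: list):
    """Provide a spoken output for the notification only in accordance
    with a determination that the user is not speaking."""
    user_is_speaking = any(level > 0.5 for level in mic_levels)
    if user_is_speaking:
        return None  # defer: do not talk over the user
    return f"You have a notification: {notification}"
```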
DEVICE-DIRECTED UTTERANCE DETECTION
A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device-directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device-directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
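The two-stage architecture can be sketched as a fast detector that lowers the volume, followed by an accurate classifier that accepts or rejects the interrupt. The energy threshold and the keyword rule are assumptions standing in for the trained detector and semantic classifier:

```python
def interrupt_detector(frame_energy: float, threshold: float = 0.4) -> bool:
    """Low-latency first stage: flag a potential device-directed utterance."""
    return frame_energy > threshold

def device_directed_classifier(utterance: str) -> bool:
    """High-accuracy second stage over the entire utterance."""
    return utterance.lower().startswith(("turn", "play", "stop", "set"))

class InterruptHandler:
    def __init__(self, volume: int):
        self.volume = volume
    def on_audio(self, frame_energy: float, utterance: str) -> str:
        if not interrupt_detector(frame_energy):
            return "ignore"
        original = self.volume
        self.volume //= 2            # quickly lower the output audio
        if device_directed_classifier(utterance):
            self.volume = 0          # accept: end the output audio
            return "accept"
        self.volume = original       # reject: increase the volume back
        return "reject"
```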