Patent classifications
G10L2015/223
RESPONDING TO QUERIES WITH VOICE RECORDINGS
Implementations are provided for responding to user queries with audio recordings that are prerecorded by human beings, rather than generated automatically using speech synthesis processing. In various implementations, a query provided by a user at an input component of a computing device may be used to search a corpus of voice recordings. From the searching, a plurality of candidate responsive voice recordings may be identified and ranked based on measures of credibility associated with the speakers that created the candidate responsive voice recordings. Based on the ranking, one or more of the plurality of candidate responsive voice recordings may be provided for presentation to the user at an output component of the same computing device or a different computing device.
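The search-then-rank flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the keyword match, the `VoiceRecording` fields, and the per-speaker `credibility` score are all hypothetical stand-ins for whatever retrieval and credibility measures an actual system would use.

```python
from dataclasses import dataclass

@dataclass
class VoiceRecording:
    speaker: str
    text: str          # transcript of the prerecorded answer
    credibility: float  # hypothetical credibility score for the speaker, in [0, 1]

def rank_candidates(query: str, corpus: list[VoiceRecording],
                    top_n: int = 3) -> list[VoiceRecording]:
    """Find candidate responsive recordings by naive keyword overlap,
    then rank them by the credibility of the speaker who created them."""
    terms = set(query.lower().split())
    candidates = [r for r in corpus if terms & set(r.text.lower().split())]
    return sorted(candidates, key=lambda r: r.credibility, reverse=True)[:top_n]
```

A real system would replace the keyword overlap with semantic retrieval, but the ranking step (credibility of the recording's creator, not relevance alone) is the distinguishing idea.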
VOICE INFORMATION PROCESSING METHOD AND ELECTRONIC DEVICE
A voice information processing method and an electronic device are provided. The voice information processing method may include: a first device (1100) obtains first voice information, and when the first voice information includes a wakeup keyword, the first device (1100) sends a voice assistant wakeup instruction to a second device (1200), such that the second device (1200) launches a voice assistant; then the first device (1100) obtains second voice information and sends the second voice information to the second device (1200), the second device (1200) determines a voice triggered event corresponding to the second voice information by using the voice assistant, and feeds target information associated with performance of the voice triggered event back to the first device (1100), such that the first device (1100) performs the voice triggered event based on the target information. The method can reduce the computing burden of the first device (1100).
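The two-device split above can be sketched as a pair of classes: a lightweight first device that only spots the wakeup keyword and forwards subsequent voice information, and a second device that runs the voice assistant and returns target information. The keyword, the lamp example, and all method names are hypothetical illustrations.

```python
WAKEUP_KEYWORD = "hey assistant"  # hypothetical wakeup keyword

class SecondDevice:
    """Runs the heavyweight voice assistant."""
    awake = False

    def launch_assistant(self):
        self.awake = True

    def handle(self, utterance: str) -> dict:
        # Determine the voice-triggered event for the forwarded voice
        # information and return target information needed to perform it.
        if "lamp" in utterance:
            return {"event": "toggle_lamp", "target": "living_room_lamp"}
        return {"event": "unknown", "target": None}

class FirstDevice:
    """Lightweight device: detects the keyword, delegates everything else."""
    def __init__(self, peer: SecondDevice):
        self.peer = peer

    def on_voice(self, utterance: str):
        if WAKEUP_KEYWORD in utterance.lower():
            self.peer.launch_assistant()   # send wakeup instruction
            return None
        # Forward second voice information; perform the event locally
        # using the target information fed back by the peer.
        return self.peer.handle(utterance)
```

The computing burden on the first device is reduced because intent determination happens entirely on the second device.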
VEHICLE AVATAR DEVICES FOR INTERACTIVE VIRTUAL ASSISTANT
A system and method for providing avatar device status indicators for voice assistants in multi-zone vehicles. The method comprises: receiving at least one signal from a plurality of microphones, wherein each microphone is associated with one of a plurality of spatial zones, and one of a plurality of avatar devices; wherein the at least one signal further comprises a speech signal component from a speaker; wherein the speech signal component is a voice command or question; sending zone information associated with the speaker and with one of the plurality of spatial zones to an avatar; and activating one of the plurality of avatar devices in a respective one of the plurality of spatial zones associated with the speaker.
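The zone-to-avatar activation step can be sketched simply: pick the spatial zone whose microphone hears the speaker most strongly, then activate that zone's avatar device. The signal-level dictionary and zone names are hypothetical; a real system would localize the speaker from the microphone array rather than compare raw levels.

```python
def activate_avatar(mic_levels: dict[str, float],
                    avatars_by_zone: dict[str, str]) -> str:
    """Return the avatar device to activate, given per-zone speech
    signal levels and the zone-to-avatar-device mapping."""
    speaker_zone = max(mic_levels, key=mic_levels.get)
    return avatars_by_zone[speaker_zone]
```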
Systems and methods for screenless computerized social-media access
Systems and methods for screenless computerized social-media access may include (1) producing, via an audio speaker that is communicatively coupled to a computing device, a computer-generated verbal description of a social-media post provided via a social-media application, (2) detecting, via a microphone that is communicatively coupled to the computing device, an audible response to the social-media post from a user of the computing device, and (3) digitally responding to the social-media post in accordance with the detected audible response. Various other methods, systems, and computer-readable media are also disclosed.
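The three-step loop above (speak a description, listen for an audible response, respond digitally) can be sketched with injected speak/listen/react callbacks. The function name, the "like" keyword, and the reaction vocabulary are hypothetical illustrations of the flow, not the claimed implementation.

```python
from typing import Callable

def respond_to_post(post: str,
                    speak: Callable[[str], None],
                    listen: Callable[[], str],
                    react: Callable[[str], None]) -> str:
    """Screenless flow: (1) verbally describe the post via the speaker,
    (2) capture the user's audible response via the microphone,
    (3) respond to the post digitally per the detected response."""
    speak(f"New post: {post}")
    heard = listen()
    reaction = "like" if "like" in heard.lower() else "none"
    react(reaction)
    return reaction
```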
Natural language processing routing
Devices and techniques are generally described for a speech processing routing architecture. In various examples, first data comprising a first feature definition is received. The first feature definition may include a first indication of first source data and first instructions for generating feature data using the first source data. In various examples, the feature data may be generated according to the first feature definition. In some examples, a speech processing system may receive a first request to process a first utterance. The feature data may be retrieved from a non-transitory computer-readable memory. The speech processing system may determine a first skill for processing the first utterance based at least in part on the feature data.
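The routing architecture above has three moving parts: a feature definition (source data plus instructions for generating feature data), a store the feature data is retrieved from, and a routing step that picks a skill based on that data. A minimal sketch, with hypothetical names throughout and a dictionary standing in for the non-transitory computer-readable memory:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FeatureDefinition:
    source: dict                       # indication of the source data
    generate: Callable[[dict], dict]   # instructions for producing feature data

FEATURE_STORE: dict[str, dict] = {}    # stands in for persistent feature memory

def ingest(name: str, definition: FeatureDefinition) -> None:
    """Generate feature data according to the definition and store it."""
    FEATURE_STORE[name] = definition.generate(definition.source)

def route_utterance(utterance: str, feature_name: str) -> str:
    """Determine a skill for processing the utterance based at least
    in part on the retrieved feature data (toy rule shown)."""
    features = FEATURE_STORE[feature_name]
    return features.get("preferred_skill", "fallback_skill")
```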
State detection and responses for electronic devices
This disclosure describes, in part, techniques for utilizing global models to generate local models for electronic devices in an environment, and techniques for utilizing the global models and/or the local models to provide notifications that are based on anomalies detected within the environment. For instance, a remote system may receive an identifier associated with an electronic device and identify a global model using the identifier. The remote system may then receive data indicating state changes of the electronic device and use the data and the global model to generate a local model associated with the electronic device. Using the global model and/or local model, the remote system can identify anomalies associated with the electronic device and, in response to identifying an anomaly, notify the user. The remote system can further cause the electronic device to change states after receiving a request from the user.
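The global-to-local model flow can be sketched numerically: the global model supplies a prior for how often a device of that type typically changes state, the local model adapts that prior from observed state-change data, and an anomaly is flagged when observed behavior departs far from the local expectation. The exponential-smoothing update and the 3x threshold are illustrative choices, not the disclosed method.

```python
class GlobalModel:
    """Prior for a device type, identified via the device's identifier."""
    def __init__(self, typical_daily_toggles: float):
        self.typical = typical_daily_toggles

class LocalModel:
    """Per-device model generated from the global model plus observed data."""
    def __init__(self, global_model: GlobalModel):
        self.expected = global_model.typical  # start from the global prior

    def observe(self, daily_toggles: int) -> None:
        # Adapt toward this device's actual state-change rate.
        self.expected = 0.8 * self.expected + 0.2 * daily_toggles

    def is_anomaly(self, daily_toggles: int, factor: float = 3.0) -> bool:
        # Notify the user when activity far exceeds the local expectation.
        return daily_toggles > factor * self.expected
```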
Systems and methods for generating labeled data to facilitate configuration of network microphone devices
Systems and methods for generating training data are described herein. Pieces of metadata can be received from a plurality of networked sensor systems, where each piece of metadata is associated with a specific set of sensor data captured by one of the plurality of networked sensor systems and includes a set of characteristics for the specific set of captured sensor data. A probabilistic model can be generated based on the received metadata, and simulations can be performed based upon a training corpus by generating multiple scenarios; for each scenario, a scenario-specific version of a particular annotated sample is generated by performing a simulation using the particular annotated sample. The scenario-specific versions of annotated samples from the training corpus can be stored as a training data set on the at least one network device.
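The scenario-generation loop can be sketched as: fit a simple probabilistic model to characteristics seen in the received metadata, sample scenarios from it, and produce a scenario-specific version of each annotated sample. The `room`/`snr_db` characteristics and uniform sampling are hypothetical stand-ins for whatever acoustic characteristics and model a real system would use.

```python
import random

def build_scenarios(metadata: list[dict], n: int, seed: int = 0) -> list[dict]:
    """Sample n acoustic scenarios from a toy probabilistic model fit
    to the characteristics captured in the sensor metadata."""
    rng = random.Random(seed)
    rooms = [m["room"] for m in metadata]
    snrs = [m["snr_db"] for m in metadata]
    return [{"room": rng.choice(rooms),
             "snr_db": rng.uniform(min(snrs), max(snrs))}
            for _ in range(n)]

def simulate(annotated_sample: str, scenario: dict) -> dict:
    """Produce a scenario-specific version of an annotated sample."""
    return {"sample": annotated_sample, **scenario}
```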
Enhanced graphical user interface for voice communications
Enhanced graphical user interfaces for transcription of audio and video messages are disclosed. Audio data may be transcribed, and the transcription may include emphasized words and/or punctuation corresponding to emphasis of user speech. Additionally, the transcription may be translated into a second language. A message spoken by a user depicted in one or more images of video data may also be transcribed and provided to one or more devices.
VOICE ASSISTANT SYSTEM WITH AUDIO EFFECTS RELATED TO VOICE COMMANDS
Voice command type entry used as a basis for applying “audio effects” (see definition herein), “sound effects” (see definition herein) and/or audio edits (see definition herein) to a sound signal. This may be done so that the various types of instructed audio processing evoke, in typical listeners, a desired sentiment or mood. Artificial intelligence may be used to accomplish this objective.
Spoken language understanding models
Techniques for using a federated learning framework to update machine learning models for a spoken language understanding (SLU) system are described. The system determines which labeled data is needed to update the models based on the models generating an undesired response to an input. The system identifies users to solicit labeled data from, and sends a request to a user device to speak an input. The device generates labeled data using the spoken input, and updates the on-device models using the spoken input and the labeled data. The updated model data is provided to the system to enable the system to update the system-level (global) models.
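The federated update cycle above can be sketched in three steps: flag inputs where the model's response was undesired, let each device label its own spoken input and update its on-device model, then fold the device updates back into the global model. Dictionaries stand in for the models here; all names and the merge rule are hypothetical.

```python
def needs_labeled_data(model_response: str, desired_response: str) -> bool:
    """The system flags inputs where the model gave an undesired response."""
    return model_response != desired_response

def on_device_update(local_model: dict, spoken_input: str, label: str) -> dict:
    """The device labels its own spoken input and updates its local model;
    the raw audio never needs to leave the device."""
    updated = dict(local_model)
    updated[spoken_input] = label
    return updated

def federated_merge(global_model: dict, device_updates: list[dict]) -> dict:
    """The system folds device-level update data into the global model."""
    merged = dict(global_model)
    for update in device_updates:
        merged.update(update)
    return merged
```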