G10L21/00

Speech recognition biasing
11670300 · 2023-06-06

Systems and methods are described that include a robot and/or an associated computing system that can use various cues about an environment of the robot to apply a bias to increase the accuracy of speech transcription. In some implementations, audio data corresponding to a spoken instruction to a robot is received. Candidate transcriptions of the audio data are obtained. A respective action of the robot corresponding to each of the candidate transcriptions of the audio data is determined. For each of the candidate transcriptions, one or more scores are determined that indicate characteristics of a potential outcome of performing the respective action corresponding to that candidate transcription. A particular candidate transcription is selected from among the candidate transcriptions based at least on the one or more scores. The action determined for the particular candidate transcription is performed.
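The selection step in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the `Candidate` fields, the 0 to 1 score scales, the example transcriptions, and the choice to blend a recognizer confidence with an outcome score through a weighted sum. The abstract only states that selection is based at least on the outcome scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str             # candidate transcription of the audio data
    asr_score: float      # hypothetical recognizer confidence, 0..1
    action: str           # robot action determined for this transcription
    outcome_score: float  # hypothetical score of the action's potential outcome, 0..1


def select_candidate(candidates, outcome_weight=0.5):
    """Pick the candidate with the highest blended score.

    The weighted sum is an assumed scoring rule, not the patented method.
    """
    if not candidates:
        raise ValueError("at least one candidate transcription is required")
    return max(
        candidates,
        key=lambda c: (1 - outcome_weight) * c.asr_score
        + outcome_weight * c.outcome_score,
    )


# Made-up example: the recognizer slightly prefers "cap", but fetching a cup
# is the more plausible outcome in the robot's environment.
candidates = [
    Candidate("bring me the cap", 0.55, "fetch:cap", 0.10),
    Candidate("bring me the cup", 0.45, "fetch:cup", 0.90),
]
best = select_candidate(candidates)
print(best.text, "->", best.action)
```

With `outcome_weight=0.0` the sketch falls back to the recognizer's own ranking, which makes the effect of the bias easy to compare.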

Methods and apparatuses for generating and using low-resolution preview tracks with high-quality encoded object and multichannel audio signals

A low-quality rendition of a complex soundtrack is created, synchronized and combined with the soundtrack. The low-quality rendition may be monitored in mastering operations, for example, to control the removal, replacement or addition of aural content in the soundtrack without the need for expensive equipment that would otherwise be required to render the soundtrack.

Method and apparatus for assigning keyword model to voice operated function
09786296 · 2017-10-10

A method, performed in an electronic device, for assigning a target keyword to a function is disclosed. In this method, a list of a plurality of target keywords is received at the electronic device via a communication network, and a particular target keyword is selected from the list of target keywords. Further, the method may include receiving a keyword model for the particular target keyword via the communication network. In this method, the particular target keyword is assigned to a function of the electronic device such that the function is performed in response to detecting the particular target keyword based on the keyword model in an input sound received at the electronic device.
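As a rough sketch of the assignment flow described above, the following assumes a keyword model can be represented as a callable that reports whether its keyword occurs in an input sound. The `KeywordDevice` class, the keyword list, and the substring-based stand-in model are all hypothetical; a real keyword model would operate on audio features received over the network.

```python
class KeywordDevice:
    """Toy device that maps target keywords to functions (hypothetical API)."""

    def __init__(self):
        self.bindings = {}  # keyword -> (keyword model, function)

    def assign(self, keyword, model, function):
        # Assign the target keyword to a device function, keeping its model.
        self.bindings[keyword] = (model, function)

    def on_input_sound(self, sound):
        # Run the function of the first keyword whose model detects it.
        for keyword, (model, function) in self.bindings.items():
            if model(sound):
                return function()
        return None


# Stand-ins for data that would arrive via the communication network.
available_keywords = ["hey camera", "start music"]
chosen = available_keywords[0]


def keyword_model(sound):
    # Placeholder detector: text matching instead of acoustic scoring.
    return chosen in sound


device = KeywordDevice()
device.assign(chosen, keyword_model, lambda: "camera opened")
result = device.on_input_sound("um hey camera please")
print(result)
```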

Emotion type classification for interactive dialog system

Techniques for selecting an emotion type code associated with semantic content in an interactive dialog system. In an aspect, fact or profile inputs are provided to an emotion classification algorithm, which selects an emotion type based on the specific combination of fact or profile inputs. The emotion classification algorithm may be rules-based or derived from machine learning. A previous user input may be further specified as input to the emotion classification algorithm. The techniques are especially applicable in mobile communications devices such as smartphones, wherein the fact or profile inputs may be derived from usage of the diverse function set of the device, including online access, text or voice communications, scheduling functions, etc.
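The rules-based variant of the classifier can be sketched as a lookup over combinations of inputs. The rule table, input keys, and emotion type codes below are invented for illustration; the abstract does not specify any of them.

```python
# Each rule pairs a required combination of fact/profile inputs with an
# emotion type code. All keys, values, and codes are hypothetical.
RULES = [
    ({"calendar": "birthday_today"}, "happy"),
    ({"last_call": "missed", "previous_user_input": "complaint"}, "apologetic"),
]
DEFAULT_EMOTION = "neutral"


def select_emotion_type(inputs):
    """Return the code of the first rule whose conditions all match."""
    for conditions, emotion in RULES:
        if all(inputs.get(key) == value for key, value in conditions.items()):
            return emotion
    return DEFAULT_EMOTION


print(select_emotion_type({"calendar": "birthday_today"}))
print(select_emotion_type({"last_call": "missed"}))
```

A rule fires only when its whole combination of inputs is present, which mirrors the abstract's emphasis on the specific combination rather than on any single input.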

Visual indication of an operational state

This disclosure describes architectures and techniques to visually indicate an operational state of an electronic device. In some instances, the electronic device comprises a voice-controlled device configured to interact with a user through voice input and visual output. The voice-controlled device may be positioned in a home environment, such as on a table in a room. The user may interact with the voice-controlled device through speech and the voice-controlled device may perform operations requested by the speech. As the voice-controlled device enters different operational states while interacting with the user, one or more lights of the voice-controlled device may be illuminated to indicate the different operational states.
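One simple way to picture the state-to-light relationship is a table keyed by operational state. The state names and light patterns in this sketch are assumptions; the abstract does not enumerate either.

```python
from enum import Enum


class State(Enum):
    # Hypothetical operational states of a voice-controlled device.
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"


# Hypothetical light patterns: (color, animation) per state.
LIGHT_PATTERNS = {
    State.IDLE: ("off", "none"),
    State.LISTENING: ("blue", "solid"),
    State.PROCESSING: ("blue", "spin"),
    State.SPEAKING: ("cyan", "pulse"),
}


def lights_for(state):
    """Return the light pattern that indicates the given state."""
    return LIGHT_PATTERNS[state]


for state in State:
    print(state.value, "->", lights_for(state))
```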

Voice pattern coding sequence and cataloging voice matching system

A method for voice pattern coding and catalog matching. The method includes identifying a set of vocal variables for a user, by a voice recognition system, based, at least in part, on a user interaction with the voice recognition system. The method further includes generating a voice model of speech patterns that represent the speaking of a particular language using the identified set of vocal variables, wherein the voice model is adapted to improve recognition of the user's voice by the voice recognition system. The method further includes matching the generated voice model to a catalog of speech patterns, and identifying a voice model code that represents speech patterns in the catalog that match the generated voice model. The method further includes providing the identified voice model code to the user.
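The catalog-matching step might look like the following sketch, which assumes a voice model can be reduced to a small numeric vector and that "matching" means nearest neighbor by Euclidean distance. The catalog codes, the two features, and the distance criterion are all assumptions made for illustration.

```python
import math

# Hypothetical catalog: voice model code -> (mean pitch in Hz, syllables per second).
CATALOG = {
    "VM-001": (120.0, 4.5),
    "VM-002": (210.0, 5.2),
}


def match_voice_model(model):
    """Return the catalog code whose speech pattern is closest to the model."""
    return min(CATALOG, key=lambda code: math.dist(CATALOG[code], model))


code = match_voice_model((200.0, 5.0))
print(code)
```

In practice the features would need scaling so that one dimension does not dominate the distance; the sketch skips that for brevity.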

METHOD OF AUDIO DEBUGGING FOR TELEVISION AND ELECTRONIC DEVICE
20170289536 · 2017-10-05

The present disclosure relates to a method of audio debugging for a television and to an electronic device. In the method, when audio is to be received from an external media source, the audio is sampled at its own sampling rate and the sampling result is stored; when the audio is to be played, the sampling result is retrieved from the storage area and played. With the solution provided by the disclosure, the television receives and plays audio at the audio's own sampling rate, thereby avoiding noise in the audio and improving the audio quality.

Cue-aware privacy filter for participants in persistent communications

A cue, for example a facial expression or hand gesture, is identified, and a device communication is filtered according to the cue.
