Patent classifications
G10L2015/088
Voice Filtering Other Speakers From Calls And Audio Messages
A method includes receiving a first instance of raw audio data corresponding to a voice-based command, and receiving a second instance of the raw audio data corresponding to an utterance of audible contents for an audio-based communication spoken by a user, the second instance also capturing one or more additional sounds that are not spoken by the user. When a voice filtering recognition routine determines to activate voice filtering for at least the voice of the user, the method also includes obtaining a respective speaker embedding of the user and processing, using the respective speaker embedding, the second instance of the raw audio data to generate enhanced audio data for the audio-based communication that isolates the utterance of the audible contents spoken by the user and excludes at least a portion of the one or more additional sounds that are not spoken by the user. The method also includes executing the audio-based communication using the enhanced audio data.
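The enhancement step above can be sketched as frame-level masking against the target speaker embedding: keep frames whose embedding is close to the speaker's, drop the rest. The cosine threshold, the frame/embedding representation, and all function names below are illustrative assumptions, not the patent's actual implementation.

```python
def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def filter_frames(frames, frame_embeddings, speaker_embedding, threshold=0.7):
    """Keep only audio frames whose embedding matches the target speaker.

    frames: opaque per-frame audio chunks; frame_embeddings: one embedding
    per frame (hypothetical - a real system would run an embedding model).
    """
    return [f for f, e in zip(frames, frame_embeddings)
            if cosine(e, speaker_embedding) >= threshold]
```

A real voice filter would operate on spectrogram masks rather than hard frame selection, but the speaker-embedding comparison is the common core.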
Detecting self-generated wake expressions
A speech-based audio device may be configured to detect a user-uttered wake expression. For example, the audio device may generate parameters indicating whether output audio is currently being produced by an audio speaker, whether the output audio contains speech, whether the output audio contains a predefined expression, the loudness of the output audio, the loudness of the input audio, and/or an echo characteristic. Based on the parameters, the audio device may determine whether an occurrence of the predefined expression in the input audio is the result of an utterance of the predefined expression by a user.
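The parameter-based decision can be illustrated with a small heuristic: a wake detection is treated as self-generated when the device's own speaker is playing the expression and the input is not clearly louder than the expected echo. The specific parameters and the 1.5 margin are assumed tuning values, not taken from the patent.

```python
def is_self_generated(wake_detected_in_input,
                      speaker_active,
                      output_contains_expression,
                      echo_level,
                      input_level):
    """Heuristic self-wake check over the abstract's parameters."""
    if not wake_detected_in_input:
        return False
    if speaker_active and output_contains_expression:
        # If the input is not much louder than the echo estimate, assume
        # the device heard itself; 1.5 is an assumed margin.
        return input_level <= echo_level * 1.5
    return False
```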
Methods and systems for detecting and processing speech signals
Provided are methods, systems, and apparatuses for detecting, processing, and responding to audio signals, including speech signals, within a designated area or space. A platform for multiple media devices connected via a network is configured to process speech, such as voice commands, detected at the media devices, and to respond to the detected speech by causing the media devices to simultaneously perform one or more requested actions. The platform is capable of scoring the quality of a speech request; handling speech requests from multiple endpoints of the platform using a centralized processing approach, a decentralized processing approach, or a combination thereof; and combining partial processing of speech requests from multiple endpoints into a coherent whole when necessary.
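The request-scoring and endpoint-selection idea can be sketched as a centralized arbiter that ranks captures of the same utterance across devices. The scoring weights and metrics below are hypothetical.

```python
def score_request(snr_db, loudness_db, distance_hint=0.0):
    """Score one device's capture of a speech request.

    Assumed weighting: favor clean (high SNR), loud captures; penalize
    a distance hint if available.
    """
    return 0.6 * snr_db + 0.4 * loudness_db - distance_hint

def pick_endpoint(captures):
    """Centralized arbitration: captures maps endpoint -> (snr_db, loudness_db).

    Returns the endpoint whose capture scores highest.
    """
    return max(captures, key=lambda ep: score_request(*captures[ep]))
```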
Methods, systems, and media for providing information relating to detected events
Methods, systems, and media for providing information are provided. In some implementations, a method for providing information is provided, the method comprising: associating a first recording device of a group of recording devices located in an environment of a user with a trigger term; receiving, from a user device, a query that includes the trigger term; in response to receiving the query, determining that audio data is to be transmitted from at least one recording device from the group of recording devices in the environment of the user; identifying the first recording device based on the inclusion of the trigger term in the received query; receiving the audio data from the first recording device; identifying a characteristic of an animate object in the environment of the user based on the received audio data; and presenting information indicating the characteristic of the animate object on the user device.
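The trigger-term step can be sketched as a lookup from trigger terms to the recording devices they are associated with; the example terms and device identifiers below are made up.

```python
def route_query(query, trigger_map):
    """Identify the recording device for a query.

    trigger_map: trigger term -> recording device id. Returns the device
    associated with the first trigger term found in the query, else None.
    """
    words = query.lower().split()
    for term, device in trigger_map.items():
        if term in words:
            return device
    return None
```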
Voice control method and apparatus, and computer storage medium
A voice control method can be applied to a first terminal and includes: receiving a user's voice operation instruction after the first terminal is activated, the voice operation instruction being used to control the first terminal to perform a target operation; sending an instruction execution request to a server after the voice operation instruction is received, the instruction execution request being used to request that the server determine, according to device information of the terminals in the device network in which the first terminal is located, whether the first terminal is to respond to the voice operation instruction; and performing the target operation when a response message indicating that the first terminal is to respond to the voice operation instruction is received from the server.
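The server's side of this scheme can be sketched as choosing a single responding terminal from the device network; using a per-terminal detection confidence as the criterion is an assumption, since the abstract only says "device information".

```python
def arbitrate(instruction_requests):
    """Server-side arbitration over a device network.

    instruction_requests: terminal id -> detection confidence (assumed
    metric). Returns a dict granting a response to exactly one terminal
    and denying the rest; empty input yields no grants.
    """
    if not instruction_requests:
        return {}
    winner = max(instruction_requests, key=instruction_requests.get)
    return {t: (t == winner) for t in instruction_requests}
```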
Audio processing system, conferencing system, and audio processing method
An audio processing system includes: an audio receiver that receives audio; a speaker specifier that specifies a speaker on the basis of the received audio; an audio determinator that determines, on the basis of the received audio, whether or not a specified word for starting the reception of a predetermined command is included in the audio; a command specifier that specifies, when the specified word is included in the audio, a command on the basis of a command keyword which is included in the audio and follows the specified word; a target user specifier that specifies, on the basis of the content of the command, a target user with respect to which the command is to be executed; and a command executor that executes the command with respect to the target user.
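The specified-word / command-keyword / target-user chain can be sketched as a small parser. The "to &lt;name&gt;" rule for resolving the target user is a toy assumption; the patent derives the target from the content of the command.

```python
def parse_command(transcript, specified_word, known_commands):
    """Parse a command that follows the specified (wake) word.

    Returns (command, target) if a known command keyword follows the
    specified word, else None. Target is the word after 'to', if present
    (an illustrative rule only).
    """
    words = transcript.lower().split()
    if specified_word not in words:
        return None
    after = words[words.index(specified_word) + 1:]
    for i, w in enumerate(after):
        if w in known_commands:
            target = after[i + 2] if i + 2 < len(after) and after[i + 1] == "to" else None
            return (w, target)
    return None
```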
Speech-processing system
A system may include first and second speech-processing systems with corresponding first and second wakewords. An utterance may contain two or more wakewords. The system determines which wakeword was spoken first and can send data to that wakeword's speech-processing system to perform further processing.
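Determining which wakeword was spoken first can be sketched as finding the earliest wakeword position in a transcript; a real system would compare audio timestamps from the wakeword detectors rather than text offsets.

```python
def route_by_first_wakeword(transcript, wakewords):
    """Route to the speech-processing system whose wakeword appears first.

    wakewords: lowercase wakeword -> system name. Returns None when no
    wakeword occurs in the utterance.
    """
    text = transcript.lower()
    hits = [(text.find(w), system)
            for w, system in wakewords.items()
            if text.find(w) != -1]
    return min(hits)[1] if hits else None
```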
Methods and systems for recommending content in context of a conversation
A media guidance application may monitor a conversation among users, and identify keywords in the conversation, without the use of wakewords. The keywords are used to search for media content that is relevant to the on-going conversation. Accordingly, the media guidance application presents relevant content to the users, during the conversation, to more actively engage the users. A conversation monitoring window may be used to present conversation information as well as relevant content. A listening mode may be used to manage when the media guidance application processes speech from a conversation. The media guidance application may access user profiles for keywords, select content types, select content sources, and determine relevancy of media content, to provide content in context of a conversation.
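Wakeword-free keyword matching against a content catalog can be sketched as set overlap between conversation words and per-title topic tags; the stopword list, tags, and ranking rule are illustrative.

```python
def recommend(conversation, catalog,
              stopwords=frozenset({"the", "a", "is", "about"})):
    """Rank catalog titles by keyword overlap with an ongoing conversation.

    catalog: title -> set of topic tags. No wakeword is involved: every
    non-stopword in the conversation is a candidate keyword.
    """
    keywords = {w for w in conversation.lower().split() if w not in stopwords}
    scored = [(len(keywords & tags), title) for title, tags in catalog.items()]
    return [title for score, title in sorted(scored, reverse=True) if score > 0]
```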
Interactive media system using audio inputs
An interactive media system enables creation, editing, and presentation of voice-driven interactive media content. The interactive media content may include prompts for user input via voice, manual input, or gestures. In the case of an audio input, the interactive media player application obtains a text string representing the spoken phrase and matches the text string against a set of expected values, each corresponding to a different predefined response and associated with a different possible action. Based on the matching of the phrase to an expected value, the interactive media player application dynamically selects and performs the action associated with the matching response. The action may comprise, for example, transitioning to playback of a different media object (e.g., a second video segment) and/or causing some other functionality programmatically accessible by the interactive media player application to occur.
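Matching the recognized phrase to an expected value can be sketched as a normalized lookup with a substring fallback; a production matcher would more likely use fuzzy or semantic matching. The phrases and action names below are invented.

```python
def match_response(spoken, responses):
    """Map a recognized phrase to the action of its matching response.

    responses: expected phrase -> action name. Tries an exact match after
    normalization, then substring containment; returns None on no match.
    """
    text = spoken.strip().lower()
    if text in responses:
        return responses[text]
    for phrase, action in responses.items():
        if phrase in text:
            return action
    return None
```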
Systems, methods, and storage media for performing actions based on utterance of a command
Systems and methods for recognizing and executing spoken commands using speech recognition. Exemplary implementations may: store actionable phrases; obtain audio information representing sound captured by a mobile client computing platform associated with a user; detect any spoken instances of a predetermined keyword present in the sound represented by the audio information; perform speech recognition on the sound represented by the audio information; identify an utterance of an individual actionable phrase in speech temporally adjacent to the spoken instance of the predetermined keyword that is present in the sound represented by the audio information; perform natural language processing to identify an individual command uttered temporally adjacent to the spoken instance of the predetermined keyword that is present in the sound represented by the audio information; and effectuate performance of instructions corresponding to the command.
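Temporal adjacency to the keyword can be sketched over word-level timestamps: take the speech within a fixed window after the keyword and hand it to downstream command recognition. The 3-second window is an assumed value.

```python
def command_near_keyword(segments, keyword, window_s=3.0):
    """Extract speech temporally adjacent to a spoken keyword.

    segments: list of (start_time_s, word) pairs from speech recognition.
    Returns the words uttered within window_s after the first keyword
    occurrence, or an empty list if the keyword was not detected.
    """
    times = [t for t, w in segments if w == keyword]
    if not times:
        return []
    t0 = times[0]
    return [w for t, w in segments
            if t0 < t <= t0 + window_s and w != keyword]
```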