Patent classifications
G10L21/16
Voice trigger for a digital assistant
A method for operating a voice trigger is provided. In some implementations, the method is performed at an electronic device including one or more processors and memory storing instructions for execution by the one or more processors. The method includes receiving a sound input. The sound input may correspond to a spoken word or phrase, or a portion thereof. The method includes determining whether at least a portion of the sound input corresponds to a predetermined type of sound, such as a human voice. The method includes, upon a determination that at least a portion of the sound input corresponds to the predetermined type, determining whether the sound input includes predetermined content, such as a predetermined trigger word or phrase. The method also includes, upon a determination that the sound input includes the predetermined content, initiating a speech-based service, such as a voice-based digital assistant.
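As an illustration only, the two-stage gating described in this abstract can be sketched in Python. The function name `run_voice_trigger` and the three callables passed to it are hypothetical and do not come from the patent; the sketch only shows the ordering of the checks, in which the content check runs only after the sound-type check succeeds.

```python
from typing import Callable, Sequence


def run_voice_trigger(
    sound_input: Sequence[float],
    is_predetermined_type: Callable[[Sequence[float]], bool],
    has_predetermined_content: Callable[[Sequence[float]], bool],
    initiate_service: Callable[[], None],
) -> bool:
    """Return True if the speech-based service was initiated.

    All three callables are placeholders (assumptions) for whatever
    detectors and service a real device would provide.
    """
    # Stage 1: does the input look like the predetermined type (e.g. a voice)?
    if not is_predetermined_type(sound_input):
        return False
    # Stage 2: reached only if stage 1 passed; look for the trigger content.
    if not has_predetermined_content(sound_input):
        return False
    # Both determinations succeeded, so start the service.
    initiate_service()
    return True
```

Passing the detectors in as callables keeps the sketch independent of any particular voice or keyword detector.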
Signal processing apparatus, signal processing method, and storage medium
A signal processing apparatus includes a detection unit configured to perform a voice detection process on each of a plurality of audio signals captured by a plurality of microphones arranged at mutually different positions, a determination unit configured to determine a degree of similarity between two or more of the plurality of audio signals in which voice is detected by the detection unit, and a suppression unit configured to perform a process of suppressing the voice contained in at least one of the two or more audio signals, in response to a determination by the determination unit that the degree of similarity between the two or more audio signals is less than a threshold.
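The similarity test in this abstract can be illustrated with a short Python sketch. The abstract does not say how the degree of similarity is computed; cosine similarity is used here purely as an assumption, and the names `cosine_similarity` and `should_suppress` as well as the default threshold are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length signals (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a silent signal as dissimilar to everything
    return dot / (norm_a * norm_b)


def should_suppress(a: Sequence[float], b: Sequence[float],
                    threshold: float = 0.5) -> bool:
    """Suppress when voice-bearing signals are LESS similar than the threshold."""
    return cosine_similarity(a, b) < threshold
```

The function only makes the decision; which of the signals is then suppressed, and how, is left open here just as it is in the abstract.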
Multi-device audio streaming system with synchronization
Embodiments include an electronic control unit comprising an audio input device for receiving an audio stream from an external audio source, the audio stream being split between an audio path and a haptic path; a wireless transceiver in the haptic path for transmitting the audio stream to at least one wearable haptic device using short-range wireless communication; and a processor coupled to the transceiver and configured to calculate an amount of latency associated with transmission of the audio stream to the wearable haptic device(s), and partition the audio stream into a plurality of audio packets including a time-to-play based on the calculated latency. The control unit further includes a buffer in the audio path for inserting a time delay into the audio stream based on the calculated latency, and an audio output device in the audio path for outputting the time-delayed audio stream to an external audio listening device.
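A minimal Python sketch of the synchronization idea in this abstract follows. It assumes the latency has already been measured and is given in milliseconds; the names `AudioPacket`, `packetize`, and `delay_audio_path`, and the choice to pad the audio path with silence, are illustrative assumptions rather than details from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AudioPacket:
    samples: List[float]
    time_to_play_ms: float  # when the haptic device should render the packet


def packetize(stream: Sequence[float], packet_size: int, start_ms: float,
              latency_ms: float, sample_rate_hz: int) -> List[AudioPacket]:
    """Split the haptic-path stream into packets stamped with a time-to-play."""
    packets = []
    for i in range(0, len(stream), packet_size):
        offset_ms = 1000.0 * i / sample_rate_hz  # position of packet in stream
        packets.append(AudioPacket(list(stream[i:i + packet_size]),
                                   start_ms + latency_ms + offset_ms))
    return packets


def delay_audio_path(stream: Sequence[float], latency_ms: float,
                     sample_rate_hz: int) -> List[float]:
    """Delay the audio path by the same latency by prepending silence."""
    delay_samples = round(latency_ms * sample_rate_hz / 1000.0)
    return [0.0] * delay_samples + list(stream)
```

Stamping the packets and delaying the local audio by the same latency value is what would let both paths start playback at the same moment.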
Listening devices for obtaining metrics from ambient noise
A device may receive audio data based on a capturing of sounds associated with a structure. The device may obtain a model associated with the structure. The model may have been trained to receive the audio data as input, determine a score that identifies a likelihood that a sound is present in the audio data, and identify the sound based on the score. The device may determine at least one parameter associated with the sound. The device may generate a metric based on the at least one parameter associated with the sound, and perform an action based on the metric.
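The score-then-metric flow in this abstract can be sketched as follows. The trained model is not reproduced; its output is assumed to be a mapping from sound label to likelihood score. The names `identify_sound` and `total_duration_metric`, the example labels, and the use of duration as the parameter are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def identify_sound(scores: Dict[str, float],
                   threshold: float = 0.5) -> Optional[str]:
    """Pick the most likely label, or None if no score reaches the threshold."""
    if not scores:
        return None
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= threshold else None


def total_duration_metric(events: Iterable[Tuple[str, float]],
                          label: str) -> float:
    """Example metric: total seconds for which a given sound was identified.

    Each event is (label, duration_seconds); duration stands in for the
    'parameter associated with the sound' named in the abstract.
    """
    return sum(duration for event_label, duration in events
               if event_label == label)
```

For example, `identify_sound({"water_leak": 0.9, "hvac": 0.2})` selects `"water_leak"`, and an action could then be chosen based on the accumulated metric for that label.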
Apparatus and method for voice event detection
A voice event detection apparatus is disclosed. The apparatus comprises a vibration-to-digital converter and a computing unit. The vibration-to-digital converter is configured to convert an input audio signal into vibration data. The computing unit is configured to trigger a downstream module according to a sum of vibration counts of the vibration data for a number X of frames. In an embodiment, the voice event detection apparatus is capable of correctly distinguishing a wake phoneme from the input vibration data so as to trigger a downstream module of a computing system. Thus, the power consumption of the computing system is reduced.
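The trigger rule in this abstract, a sum of vibration counts over X frames, can be illustrated with a small Python sketch. How a vibration is counted is not stated in the abstract; counting upward crossings of an amplitude threshold is an assumption made here, and the names `count_vibrations` and `should_trigger` and their parameters are hypothetical.

```python
from collections import deque
from typing import Iterable, Sequence


def count_vibrations(frame: Sequence[float], amplitude_threshold: float) -> int:
    """Count how many times |sample| rises above the threshold in one frame."""
    count = 0
    previously_above = False
    for sample in frame:
        above = abs(sample) > amplitude_threshold
        if above and not previously_above:
            count += 1  # a new excursion above the threshold begins
        previously_above = above
    return count


def should_trigger(frames: Iterable[Sequence[float]], x: int,
                   amplitude_threshold: float, count_threshold: int) -> bool:
    """Trigger once the counts of the last x frames sum to count_threshold."""
    window = deque(maxlen=x)  # holds the most recent x per-frame counts
    for frame in frames:
        window.append(count_vibrations(frame, amplitude_threshold))
        if len(window) == x and sum(window) >= count_threshold:
            return True
    return False
```

Because the decision uses only integer counts and a running sum, the downstream module can stay idle until `should_trigger` returns True.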
Method for preventing intelligible voice recordings
A method for preventing intelligible voice recording is provided. The voice of a subject or interlocutor is recorded for a given time interval, thereby providing a voice recording. The voice recording is cut into shorter time interval segments, thereby providing a set of voice recording segments. The set of voice recording segments is mixed in a randomly rearranged order. The mixed set of voice recording segments is spliced into a single randomly mixed voice recording. Emitting the randomly mixed voice recording while the subject or interlocutor is speaking prevents intelligible recording of the voice of the subject or interlocutor.
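The cut, mix, and splice steps of this abstract can be sketched in a few lines of Python. The recording is assumed to be a list of samples; the function name `scramble_recording`, the fixed segment length, and the optional seed are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence


def scramble_recording(samples: Sequence[float], segment_len: int,
                       seed: Optional[int] = None) -> List[float]:
    """Cut a recording into segments, shuffle them, and splice them back."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    # Cut: consecutive segments of segment_len samples (the last may be shorter).
    segments = [list(samples[i:i + segment_len])
                for i in range(0, len(samples), segment_len)]
    # Mix: rearrange the segments in random order.
    rng.shuffle(segments)
    # Splice: concatenate the shuffled segments into one recording.
    return [sample for segment in segments for sample in segment]
```

The output contains exactly the same samples as the input, only in a different segment order, which is what keeps the masking signal matched to the speaker's own voice.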