Patent classifications
G10L25/78
CONVERSATION FACILITATING METHOD AND ELECTRONIC DEVICE USING THE SAME
A method for facilitating a multiparty conversation is disclosed. An electronic device using the method may facilitate a multiparty conversation by identifying participants of a conversation, localizing relative positions of the participants, detecting speeches of the conversation, matching one of the participants to each of the detected speeches according to the relative positions of the participants, counting participations of the matched participant in the conversation, identifying a passive subject from all the participants according to the participations of all the participants in the conversation, finding a topic of the conversation between the participants, and engaging the passive subject by addressing the passive subject and speaking a sentence related to the topic.
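A minimal sketch, in Python, of the participation-counting step described above, assuming speaker matching has already produced a list of which participant uttered each detected speech; the helper name `identify_passive_subject` and the sample data are illustrative, not taken from the patent.

```python
from collections import Counter

def identify_passive_subject(matched_speakers, participants):
    """Return the participant matched to the fewest detected speeches."""
    counts = Counter(matched_speakers)              # participations per speaker
    # Participants never matched to a speech count as zero.
    return min(participants, key=lambda p: counts.get(p, 0))

# Example: Carol has not been matched to any speech yet, so she is the
# passive subject the device would address with a topic-related sentence.
participants = ["Alice", "Bob", "Carol"]
matched = ["Alice", "Bob", "Alice", "Alice", "Bob"]
print(identify_passive_subject(matched, participants))  # -> Carol
```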
Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality
Array microphone systems and methods that can automatically focus and/or place beamformed lobes in response to detected sound activity are provided. The automatic focus and/or placement of the beamformed lobes can be inhibited based on a remote far end audio signal. The quality of the coverage of audio sources in an environment may be improved by ensuring that beamformed lobes are optimally picking up the audio sources even if they have moved and changed locations.
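A minimal sketch of the inhibition idea, assuming a simple energy threshold on the remote far-end signal and lobes represented as azimuth angles; the threshold, the nearest-lobe placement policy, and all names are illustrative assumptions rather than the patented method.

```python
import numpy as np

def far_end_active(far_end_frame, energy_threshold=1e-3):
    """Treat the remote far end as active when its frame energy is high,
    so loudspeaker playback is not mistaken for a local talker."""
    return np.mean(far_end_frame ** 2) > energy_threshold

def update_lobes(lobe_azimuths, detected_azimuth, far_end_frame):
    if far_end_active(far_end_frame):
        return lobe_azimuths                # inhibit auto focus / placement
    # Steer the lobe nearest to the newly detected sound activity toward it.
    i = min(range(len(lobe_azimuths)),
            key=lambda k: abs(lobe_azimuths[k] - detected_azimuth))
    lobe_azimuths[i] = detected_azimuth
    return lobe_azimuths

lobes = [30.0, 120.0, 210.0]                # lobe azimuths in degrees
silent_far_end = np.zeros(480)              # far end quiet: placement allowed
print(update_lobes(lobes, 95.0, silent_far_end))   # [30.0, 95.0, 210.0]
```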
Contextual suppression of assistant command(s)
Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
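A minimal sketch of the described flow, with toy stand-ins for the warm word, ASR, and speaker-identification models (here they operate on text rather than audio); the suppression rule, the authorization check, and every function name are illustrative assumptions.

```python
def maybe_run_command(audio_stream, warm_word_model, asr_model, sid_model,
                      authorized_speakers):
    hit = warm_word_model(audio_stream)            # (word, start, end) or None
    if hit is None:
        return False                               # no warm word detected
    word, start, end = hit
    preamble, postamble = audio_stream[:start], audio_stream[end:]
    context = asr_model(preamble) + " " + asr_model(postamble)
    # Contextual suppression: the surrounding speech suggests the warm word
    # was not meant as a command, e.g. "don't stop the music".
    if "don't" in context or "do not" in context:
        return False
    speaker = sid_model(audio_stream)              # None if not identifiable
    return speaker is not None and speaker in authorized_speakers

# Toy stand-ins operating on text instead of audio, for illustration only.
warm_word = lambda a: (("stop", a.find("stop"), a.find("stop") + 4)
                       if "stop" in a else None)
asr = lambda a: a
sid = lambda a: "alice"
print(maybe_run_command("don't stop the music", warm_word, asr, sid, {"alice"}))   # False
print(maybe_run_command("please stop the music", warm_word, asr, sid, {"alice"}))  # True
```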
User voice control system
Embodiments include techniques and objects related to a wearable audio device that includes a microphone to detect a plurality of sounds in an environment in which the wearable audio device is located. The wearable audio device further includes a non-acoustic sensor to detect that a user of the wearable audio device is speaking. The wearable audio device further includes one or more processors, communicatively coupled to the microphone and the non-acoustic sensor, to alter, based on an identification by the non-acoustic sensor that the user of the wearable audio device is speaking, one or more of the plurality of sounds to generate a sound output. Other embodiments may be described or claimed.
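A minimal sketch of the alteration step, under the assumption that "altering" means attenuating the captured ambient sounds while the non-acoustic sensor reports that the wearer is speaking; the gain value and function name are illustrative only.

```python
import numpy as np

def process_frame(mic_frame, user_is_speaking, duck_gain=0.25):
    """Attenuate captured ambient sound while the wearer is speaking."""
    gain = duck_gain if user_is_speaking else 1.0
    return mic_frame * gain                  # altered sound output

frame = np.random.randn(256)
out = process_frame(frame, user_is_speaking=True)
print(out.std() < frame.std())               # True: ambient sound was ducked
```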
USER VOICE DETECTOR DEVICE AND METHOD USING IN-EAR MICROPHONE SIGNAL OF OCCLUDED EAR
A device and a method for detecting the voice of a user of an intra-aural device. The intra-aural device has an in-ear microphone adapted to be in fluid communication with the ear canal of the user, occluded from the environment outside the ear. A signal provided by the in-ear microphone is obtained to determine an acquired voice indicator signal, and a voice produced by the user is detected when the acquired voice indicator signal is larger than a corresponding threshold value. Although the method also reduces voice interference coming from a non-user, the results are improved when the non-user voice is captured with an outer-ear microphone of the intra-aural device.
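A minimal sketch of the threshold comparison, assuming the acquired voice indicator is simply the frame energy of the in-ear signal and that the optional outer-ear microphone rejects non-user voice by requiring the occluded in-ear level to dominate the outside level; the margin and all names are illustrative assumptions.

```python
import numpy as np

def user_voice_detected(in_ear_frame, threshold, outer_frame=None, margin=2.0):
    indicator = np.mean(in_ear_frame ** 2)     # acquired voice indicator signal
    if indicator <= threshold:
        return False
    if outer_frame is not None:
        # Require the occluded in-ear level to dominate the outside level,
        # which helps reject voices that do not belong to the wearer.
        return indicator > margin * np.mean(outer_frame ** 2)
    return True

user_speech = 0.1 * np.ones(160)               # strong occluded in-ear level
quiet_outside = np.zeros(160)
print(user_voice_detected(user_speech, threshold=1e-4, outer_frame=quiet_outside))  # True
```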
FRONTEND CAPTURE
Disclosed are systems and methods for a frontend capture module of a video conferencing application, which can modify an input signal received from a microphone device to match predetermined signal characteristics, such as voice signal level and expected noise floor. An input stage, a suppression module, and an output stage amplify the voice signal portion of the input signal and suppress the noise signal of the input signal to predetermined ranges. The input stage selectively applies gains defined by a gain table, based on the signal level of the input signal. The suppression module selectively applies a suppression gain to the input signal based on the presence or absence of a voice signal in the input signal. The output stage further amplifies the portions of the input signal containing a voice signal and applies a gain table to maintain a consistent noise floor.
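A minimal sketch of the three stages, assuming a dB-domain gain table keyed on input level and fixed suppression and boost gains; the table values, thresholds, and names are illustrative assumptions, not the disclosed module.

```python
import numpy as np

# Hypothetical gain table: (upper bound on input level in dBFS, gain in dB).
GAIN_TABLE = [(-50, 18.0), (-35, 9.0), (-20, 3.0), (0, 0.0)]

def level_dbfs(x):
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def table_gain(level):
    for upper_bound, gain_db in GAIN_TABLE:
        if level <= upper_bound:
            return 10 ** (gain_db / 20)
    return 1.0

def frontend_capture(frame, voice_present, suppression_db=-15.0, boost_db=6.0):
    frame = frame * table_gain(level_dbfs(frame))       # input stage
    if voice_present:
        frame = frame * 10 ** (boost_db / 20)           # output stage boost
    else:
        frame = frame * 10 ** (suppression_db / 20)     # suppression module
    return frame

quiet_voice = 0.003 * np.random.randn(480)
print(round(level_dbfs(frontend_capture(quiet_voice, voice_present=True)), 1))
```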
MICROPHONE UNIT COMPRISING INTEGRATED SPEECH ANALYSIS
A microphone unit has a transducer, for generating an electrical audio signal from a received acoustic signal; a speech coder, for obtaining compressed speech data from the audio signal; and a digital output, for supplying digital signals representing said compressed speech data. The speech coder may be a lossy speech coder, and may contain a bank of filters with centre frequencies that are non-uniformly spaced, for example mel frequencies.
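A minimal sketch of how non-uniformly spaced centre frequencies arise from uniform spacing on the mel scale, as the abstract mentions; the filter count and frequency range are illustrative choices.

```python
import numpy as np

def mel_centre_frequencies(n_filters=20, f_min=100.0, f_max=8000.0):
    """Centre frequencies uniform on the mel scale, hence non-uniform in hertz."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    return inv_mel(np.linspace(mel(f_min), mel(f_max), n_filters))

# The spacing between neighbouring centre frequencies widens as frequency rises.
print(np.round(mel_centre_frequencies(5)))
```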