G10L21/028

User voice control system

Embodiments include techniques and objects related to a wearable audio device that includes a microphone to detect a plurality of sounds in an environment in which the wearable audio device is located. The wearable audio device further includes a non-acoustic sensor to detect that a user of the wearable audio device is speaking. The wearable audio device further includes one or more processors communicatively coupled to alter, based on an identification by the non-acoustic sensor that the user of the wearable audio device is speaking, one or more of the plurality of sounds to generate a sound output. Other embodiments may be described or claimed.
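
The core idea can be sketched as a mixing step gated by a non-acoustic speech flag. This is a minimal illustration, not the claimed implementation; the function name, the attenuation factor, and the assumption that sounds arrive as per-source sample arrays are all hypothetical.

```python
import numpy as np

def mix_output(sounds, user_is_speaking, speech_attenuation=0.25):
    """Combine detected environmental sounds into one output frame.

    When the non-acoustic sensor (e.g. an accelerometer sensing jaw
    vibration) reports that the user is speaking, the environmental
    sounds are attenuated so they do not mask the user's own voice.

    sounds: list of equal-length 1-D numpy arrays, one per sound source.
    user_is_speaking: boolean flag from the non-acoustic sensor.
    """
    gain = speech_attenuation if user_is_speaking else 1.0
    return gain * np.sum(sounds, axis=0)
```

A non-acoustic sensor avoids the chicken-and-egg problem of detecting the user's speech acoustically in a noisy mix.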

Device for outputting sound and method therefor

A device for outputting sound and a method therefor are provided. The sound output method includes predicting external sound to be received from an external environment, variably adjusting sound to be output from the device, based on the predicted external sound, and outputting the adjusted sound.
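
The predict-then-adjust loop might look like the following gain computation. This is a sketch under assumed names and units (decibel levels, a target signal-to-noise margin, a gain cap); the abstract does not specify how the adjustment is computed.

```python
def adjust_output(samples, predicted_external_db, target_snr_db=10.0,
                  max_gain_db=20.0):
    """Variably adjust output level against a predicted external sound level.

    Raises the output gain so the signal stays target_snr_db above the
    predicted ambient level, capped at max_gain_db to protect hearing
    and avoid clipping.
    """
    gain_db = min(max(predicted_external_db + target_snr_db, 0.0), max_gain_db)
    gain = 10 ** (gain_db / 20.0)          # dB -> linear amplitude gain
    return [s * gain for s in samples]
```

Predicting the external sound (rather than only reacting to it) lets the device apply the adjustment before the noise arrives.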

Device for outputting sound and method therefor

A device for outputting sound and a method therefor are provided. The sound output method includes predicting external sound to be received from an external environment, variably adjusting sound to be output from the device, based on the predicted external sound, and outputting the adjusted sound.

AUDIO CONTENT SEGMENTATION METHOD AND APPARATUS

Embodiments of the present invention provide an audio content segmentation method and an apparatus. The method includes: obtaining at least one piece of first segmentation location information of audio content; sending a segmentation location message to a server, wherein the segmentation location message carries the at least one piece of first segmentation location information of the audio content and an audio identifier of the audio content; receiving a segmentation location recommendation message sent by the server, wherein the segmentation location recommendation message carries the audio identifier of the audio content and the at least one piece of third segmentation location information; and segmenting the audio content according to the at least one piece of third segmentation location information.
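
The message exchange can be modeled with two simple payloads and a final cutting step. The class and field names below are assumptions for illustration; the abstract only specifies what each message carries, not its wire format.

```python
from dataclasses import dataclass

@dataclass
class SegmentationLocationMessage:
    """Client -> server: proposed (first) segmentation locations."""
    audio_id: str
    locations_ms: list  # first segmentation location information, in ms

@dataclass
class SegmentationRecommendation:
    """Server -> client: recommended (third) segmentation locations."""
    audio_id: str
    locations_ms: list

def segment(content_len_ms, recommendation):
    """Cut [0, content_len_ms) at the server-recommended locations."""
    cuts = sorted(set(recommendation.locations_ms))
    bounds = [0] + [c for c in cuts if 0 < c < content_len_ms] + [content_len_ms]
    return list(zip(bounds[:-1], bounds[1:]))
```

The round trip lets the server refine client-proposed cut points (e.g. by aggregating where other users cut the same audio) before the client commits to them.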

Adaptive diarization model and user interface
11710496 · 2023-07-25

A computing device receives a first audio waveform representing a first utterance and a second utterance. The computing device receives identity data indicating that the first utterance corresponds to a first speaker and the second utterance corresponds to a second speaker. The computing device determines, based on the first utterance, the second utterance, and the identity data, a diarization model configured to distinguish between utterances by the first speaker and utterances by the second speaker. The computing device receives, exclusively of receiving further identity data indicating a source speaker of a third utterance, a second audio waveform representing the third utterance. The computing device determines, by way of the diarization model and independently of the further identity data of the first type, the source speaker of the third utterance. The computing device updates the diarization model based on the third utterance and the determined source speaker.
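
The enroll-identify-update cycle can be sketched as nearest-centroid classification over utterance feature vectors. Real diarization systems use learned speaker embeddings and more sophisticated models; treating each utterance as a ready-made fixed-length vector is an assumption for brevity.

```python
import numpy as np

class DiarizationModel:
    """Nearest-centroid sketch of an adaptive diarization model."""

    def __init__(self):
        self.centroids = {}  # speaker id -> running mean feature vector
        self.counts = {}

    def enroll(self, speaker, utterance):
        """Update (or create) a speaker's centroid with a labeled utterance."""
        v = np.asarray(utterance, dtype=float)
        n = self.counts.get(speaker, 0)
        c = self.centroids.get(speaker, np.zeros_like(v))
        self.centroids[speaker] = (c * n + v) / (n + 1)
        self.counts[speaker] = n + 1

    def identify(self, utterance):
        """Attribute an unlabeled utterance to the nearest enrolled speaker."""
        v = np.asarray(utterance, dtype=float)
        return min(self.centroids,
                   key=lambda s: np.linalg.norm(self.centroids[s] - v))

    def identify_and_update(self, utterance):
        """Identify without identity data, then adapt the model (the claimed update step)."""
        speaker = self.identify(utterance)
        self.enroll(speaker, utterance)
        return speaker
```

The key property matches the abstract: after supervised enrollment of the first two utterances, the third utterance is attributed with no further identity data, and that attribution itself refines the model.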

ROLE SEPARATION METHOD, ELECTRONIC DEVICE, AND COMPUTER STORAGE MEDIUM
20230238015 · 2023-07-27

Embodiments of the present application provide a role separation method, an electronic device, and a computer storage medium. The role separation method includes: acquiring sound source information of target voice data and a voiceprint feature of the target voice data; determining, according to the sound source information, at least one candidate position corresponding to a sound source position; calculating a similarity between a voiceprint feature of a role corresponding to the at least one candidate position and the voiceprint feature of the target voice data; and determining a target role corresponding to the target voice data according to the similarity. By means of the embodiments of the present application, the accuracy of the role separation is improved.
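
The two-stage idea (narrow candidates by sound source position, then decide by voiceprint similarity) can be sketched as below. The angle representation of positions, the tolerance, and the fallback to all roles are assumptions; the abstract does not fix these details.

```python
import numpy as np

def cosine_sim(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_role(target_voiceprint, source_angle, roles, angle_tolerance=30.0):
    """Pick the role whose enrolled voiceprint best matches the target voice.

    roles: dict role -> (position_angle_deg, voiceprint). Only roles whose
    known position is within angle_tolerance of the estimated sound source
    are candidates; among candidates, voiceprint similarity decides.
    """
    candidates = {r: vp for r, (ang, vp) in roles.items()
                  if abs(ang - source_angle) <= angle_tolerance}
    if not candidates:
        # No role near the estimated position: fall back to all roles.
        candidates = {r: vp for r, (_, vp) in roles.items()}
    return max(candidates,
               key=lambda r: cosine_sim(candidates[r], target_voiceprint))
```

Combining spatial and voiceprint evidence is what the abstract credits for the improved separation accuracy: either cue alone is ambiguous when speakers sit close together or sound alike.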

Voice Filtering Other Speakers From Calls And Audio Messages
20230005480 · 2023-01-05

A method includes receiving a first instance of raw audio data corresponding to a voice-based command and receiving a second instance of the raw audio data corresponding to an utterance of audible contents for an audio-based communication spoken by a user. When a voice filtering recognition routine determines to activate voice filtering for at least the voice of the user, the method also includes obtaining a respective speaker embedding of the user and processing, using the respective speaker embedding, the second instance of the raw audio data to generate enhanced audio data for the audio-based communication that isolates the utterance of the audible contents spoken by the user and excludes at least a portion of the one or more additional sounds that are not spoken by the user. The method also includes executing.
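
The embedding-conditioned filtering step can be sketched as per-frame gating. Production voice filtering conditions a neural mask-estimation network on the speaker embedding; the per-frame embeddings, the cosine threshold, and hard zeroing of non-matching frames are simplifying assumptions here.

```python
import numpy as np

def filter_to_speaker(frames, frame_embeddings, speaker_embedding,
                      threshold=0.5):
    """Keep only audio frames whose embedding matches the target speaker.

    frames: list of 1-D numpy sample arrays for the second (contents) audio.
    frame_embeddings: one voice-characterizing vector per frame.
    speaker_embedding: the user's enrolled speaker embedding.
    """
    spk = np.asarray(speaker_embedding, float)
    spk /= np.linalg.norm(spk)
    out = []
    for frame, emb in zip(frames, frame_embeddings):
        e = np.asarray(emb, float)
        sim = float(e @ spk / np.linalg.norm(e))
        # Frames unlike the target speaker are suppressed from the output.
        out.append(frame if sim >= threshold else np.zeros_like(frame))
    return out
```

Because the filter is keyed to the user's own embedding, other speakers and background sounds are excluded from the transmitted message without needing to model each interfering source.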