Patent classifications
G10L17/12
Methods, apparatus and systems for authentication
Methods, apparatus and systems for biometric authentication based on an audio signal are provided. The audio signal comprises a representation of a voice signal of a user conducted via at least part of the user's skeleton. Further embodiments may relate to biometric authentication based upon a combination of a bone-conducted audio signal, or a bone-conducted voice biometric process, with an air-conducted voice signal.
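One simple reading of combining a bone-conducted and an air-conducted voice biometric is score-level fusion. The sketch below is an illustrative assumption, not the patent's method; the function name, weight and threshold are all invented for the example:

```python
def fused_authentication(bone_score, air_score, weight=0.5, threshold=0.7):
    """Accept the user when a weighted blend of the bone-conducted and
    air-conducted voice-biometric similarity scores clears a threshold.

    All parameters are illustrative; a real system would learn or tune them.
    """
    fused = weight * bone_score + (1.0 - weight) * air_score
    return fused >= threshold
```

With equal weights, a strong match on both channels passes while a weak match on both fails, which captures the intuition that the two conduction paths corroborate each other.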
Multi-user devices in a connected home environment
A device implementing a system for responding to a voice request includes a processor configured to receive a voice request for content, the voice request corresponding to a first user account, the device being associated with the first user account, a second user account and a default account. The processor is further configured to determine that the content is unavailable via the first user account and, in response to that determination, to provide the content via at least one of the second user account or the default account.
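The fallback order the abstract describes can be sketched as a simple lookup chain. The dict-based accounts and the names `resolve_content` and `available` are assumptions for the example, not from the patent:

```python
def resolve_content(content_id, first_account, second_account, default_account):
    """Return the name of the first account able to provide the content.

    The requesting user's account is tried first; if the content is
    unavailable there, the second user account and finally the default
    account are consulted, mirroring the fallback in the abstract.
    """
    for account in (first_account, second_account, default_account):
        if content_id in account.get("available", set()):
            return account["name"]
    return None  # content unavailable via any account
```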
Apparatus and method for own voice suppression
An own voice suppression apparatus applicable to a hearing aid is disclosed. The own voice suppression apparatus comprises an air conduction sensor, an own voice indication module and a suppression module. The air conduction sensor is configured to generate an audio signal. The own voice indication module is configured to generate an indication signal according to at least one of the user's mouth-vibration information and a comparison result of the user's voice feature vector. The suppression module, coupled to the air conduction sensor and the own voice indication module, is configured to generate an own-voice-suppressed signal according to the indication signal and the audio signal.
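The signal flow can be sketched as a per-sample gate: the indication signal marks samples believed to contain the wearer's own voice, and the suppression module attenuates those samples. The function name and the fixed attenuation factor are illustrative assumptions, not from the patent:

```python
def suppress_own_voice(audio, indication, attenuation=0.1):
    """Attenuate samples flagged as the wearer's own voice.

    audio:      sequence of samples from the air conduction sensor
    indication: sequence of 0/1 flags (1 = own voice detected), e.g.
                derived from mouth-vibration data or a voice-feature
                comparison result
    returns:    the own-voice-suppressed signal
    """
    return [s * attenuation if flag else s
            for s, flag in zip(audio, indication)]
```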
Electronic device and method for operation thereof
Various embodiments of the present disclosure relate to a method for providing an intelligent assistance service and an electronic device for performing the same. According to an embodiment, an electronic device comprises at least one communication circuit, at least one microphone, at least one speaker, at least one processor operatively connected to the communication circuit, the microphone, and the speaker, and at least one memory electrically connected to the processor. The memory stores instructions which, when executed, cause the processor to receive a wake-up utterance calling a voice-based intelligent assistance service, to identify, in response to the wake-up utterance, a session in progress with the voice-based intelligent assistance service, and, upon receiving a control command, to provide the control command to an external device through the identified session. Other embodiments are also possible.
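The wake-up-then-route flow can be sketched as a small state machine: a wake-up utterance establishes or identifies the in-progress session, and later control commands are forwarded through it. The class, method names and string protocol here are all assumptions for illustration:

```python
class AssistantDevice:
    """Illustrative sketch of session reuse for an intelligent assistant."""

    def __init__(self):
        self.active_session = None  # stand-in for a channel to an external device

    def on_utterance(self, utterance):
        if utterance == "wake-up":
            # On wake-up, identify a session already in progress with the
            # assistance service, creating one only if none exists.
            if self.active_session is None:
                self.active_session = {"target": "external-device",
                                       "commands": []}
            return self.active_session
        # Control commands are provided through the identified session.
        self.active_session["commands"].append(utterance)
        return self.active_session
```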
IDENTIFICATION DEVICE, IDENTIFICATION METHOD, AND RECORDING MEDIUM
An identification device includes: an obtainer that obtains voice data; an identifier that obtains, through speaker identification processing, a score indicating a degree of similarity between the voice data obtained by the obtainer and voice data on an utterance of a predetermined speaker; and a corrector that, upon determining that the obtained voice data has a feature that degrades the identification performance of the speaker identification processing, corrects the score to reduce the influence of that degradation and outputs the corrected score.
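One way to read the corrector is as a conditional score offset: when the voice data shows a feature known to hurt identification, the similarity score is compensated before being output. Here a short utterance duration stands in for that feature, and the fixed offset is likewise an assumption; the abstract does not specify either:

```python
def correct_score(score, duration_s, min_duration_s=2.0, compensation=0.05):
    """Correct a speaker-similarity score for a performance-degrading feature.

    If the utterance is shorter than min_duration_s -- an illustrative
    stand-in for "a feature that degrades identification performance" --
    a small compensating offset is added so genuine speakers are not
    unfairly penalised; otherwise the score passes through unchanged.
    """
    if duration_s < min_duration_s:
        return score + compensation
    return score
```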
A SYSTEM AND A METHOD FOR LOW LATENCY SPEAKER DETECTION AND RECOGNITION
A system for recognizing a user of a communicating device as belonging to a list of known users from an utterance included in a voice signal received from the communicating device. The system applies an utterance of a speaker to a machine learning voiceprint extraction model, which extracts and outputs a voiceprint set comprising an i-vector or a speaker embedding based on the utterance. The voiceprint set is either applied directly to a machine learning model to compute an utterance match score, or first reduced to a lower-dimensional voiceprint set by a machine learning hashing model, with the reduced set then applied to the machine learning model to compute the match score. The output match score is calibrated by a machine learning score normalization model (NL-NORM) and compared to a match score threshold; when the calibrated match score is greater than the threshold, the user is identified as belonging to the list of known users.
Information processing apparatus, control method, and program
The information processing apparatus (2000) computes a first score representing a degree of similarity between the input sound data (10) and the registrant sound data (22) of the registrant (20). The information processing apparatus (2000) obtains a plurality of pieces of segmented sound data (12) by segmenting the input sound data (10) in the time direction. The information processing apparatus (2000) computes, for each piece of segmented sound data (12), a second score representing the degree of similarity between that segmented sound data (12) and the registrant sound data (22). The information processing apparatus (2000) makes a first determination, using at least the second scores, of whether the sound included in the input sound data (10) comes from one speaker or from multiple speakers. The information processing apparatus (2000) makes a second determination of whether the input sound data (10) includes the sound of the registrant (20), based on the first score, the second scores, and the result of the first determination.
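Under illustrative assumptions (a toy similarity function, equal-length segments, simple thresholds), the two determinations can be sketched as follows: the spread of the per-segment scores flags a possible second speaker, and the final decision combines the whole-utterance score, the segment scores and that flag. None of the concrete functions or thresholds come from the patent:

```python
from statistics import mean, pstdev

def split_segments(samples, n):
    """Segment the input sound data along the time axis into n pieces."""
    size = max(1, len(samples) // n)
    return [samples[i:i + size] for i in range(0, size * n, size)]

def similarity(a, b):
    """Toy stand-in for the speaker-similarity scorer (1.0 = identical)."""
    return 1.0 - min(1.0, abs(mean(a) - mean(b)))

def contains_registrant(input_data, registrant_data,
                        n_segments=4, spread_threshold=0.2,
                        accept_threshold=0.5):
    first_score = similarity(input_data, registrant_data)        # whole input
    second_scores = [similarity(seg, registrant_data)            # per segment
                     for seg in split_segments(input_data, n_segments)]
    # First determination: a large spread in per-segment scores suggests
    # the input mixes more than one speaker.
    multiple_speakers = pstdev(second_scores) > spread_threshold
    if multiple_speakers:
        # Second determination for mixed input: judge on the best-matching
        # segment rather than the diluted whole-utterance score.
        return max(second_scores) > accept_threshold
    return first_score > accept_threshold
```

The point of the branching is that when a second speaker is present, the whole-utterance score underestimates the registrant's match, so the segment scores carry the decision.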