G10L17/20

Apparatus for recognizing voice speaker and method for the same

Disclosed herein are an apparatus and a method for recognizing a voice speaker. The apparatus includes a voice feature extraction unit configured to extract a feature vector from a voice signal input through a microphone, and a speaker recognition unit configured to calculate a speaker recognition score by selecting a reverberant environment from multiple reverberant-environment learning data sets based on the extracted feature vector, and to recognize the speaker by applying to the speaker recognition score a weight that depends on the selected reverberant environment.
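The abstract does not say how the reverberant environment is chosen or how the weight is applied. Below is a minimal Python sketch under several assumptions. Each environment's learning data set is summarized by a hypothetical centroid vector. Each environment has its own set of enrolled speaker models. The weight simply scales a cosine score before a threshold decision. `select_environment`, `recognize_speaker`, and all data are illustrative, not the patented method.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_environment(feature: Vector, env_centroids: Dict[str, Vector]) -> str:
    """Pick the environment whose (hypothetical) centroid best matches the feature vector."""
    return max(env_centroids, key=lambda env: cosine(feature, env_centroids[env]))


def recognize_speaker(
    feature: Vector,
    env_centroids: Dict[str, Vector],
    env_weights: Dict[str, float],
    speaker_models: Dict[str, Dict[str, Vector]],
    threshold: float,
) -> Tuple[Optional[str], str, float]:
    """Return (accepted speaker or None, selected environment, weighted best score)."""
    env = select_environment(feature, env_centroids)
    weight = env_weights[env]
    # Score only against models trained on the selected environment's data set,
    # then scale by the environment-dependent weight (assumed multiplicative).
    scores = {spk: weight * cosine(feature, model)
              for spk, model in speaker_models[env].items()}
    best = max(scores, key=scores.get)
    accepted = best if scores[best] >= threshold else None
    return accepted, env, scores[best]
```

Because the weight is applied before the threshold, a less reliable environment (here a hypothetical `hall` weighted 0.8) needs a stronger raw match to be accepted.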

Speaker recognition device, speaker recognition method, and recording medium

A speaker recognition device includes: a feature calculator that calculates two or more acoustic features of the voice in an obtained utterance; a similarity calculator that calculates two or more similarities, each being a similarity between one of one or more speaker-specific features of a target speaker for recognition and one of the two or more acoustic features; a combination unit that combines the two or more similarities to obtain a combined value; and a determiner that determines whether the speaker of the utterance is the target speaker based on the combined value. Here, (i) at least two of the two or more acoustic features have different properties, (ii) at least two of the two or more similarities have different properties, or (iii) both conditions hold.
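The abstract leaves the similarity measures and the combination rule open. Below is a minimal Python sketch under stated assumptions. Two similarities with different properties are used: cosine similarity, which is scale-invariant, and an inverse-Euclidean similarity, which is scale-sensitive. They are combined by a weighted sum and compared with a threshold. The function names, weights, and threshold are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
SimilarityFn = Callable[[Vector, Vector], float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Angle-based similarity; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_similarity(a: Vector, b: Vector) -> float:
    """Distance-based similarity mapped to (0, 1]; identical vectors score 1.0."""
    return 1.0 / (1.0 + math.dist(a, b))


def combined_score(
    utterance_features: Sequence[Vector],
    speaker_features: Sequence[Vector],
    similarity_fns: Sequence[SimilarityFn],
    weights: Sequence[float],
) -> float:
    """Weighted sum of one similarity per (utterance feature, speaker feature) pair."""
    if not (len(utterance_features) == len(speaker_features)
            == len(similarity_fns) == len(weights)):
        raise ValueError("need one speaker feature, similarity function, and weight per feature")
    sims = [fn(u, s) for fn, u, s in zip(similarity_fns, utterance_features, speaker_features)]
    return sum(w * s for w, s in zip(weights, sims))


def is_target_speaker(utterance_features, speaker_features, similarity_fns, weights,
                      threshold: float) -> bool:
    """Accept the utterance as the target speaker if the combined value clears the threshold."""
    return combined_score(utterance_features, speaker_features,
                          similarity_fns, weights) >= threshold
```

Pairing a magnitude-blind similarity with a magnitude-sensitive one is one way to satisfy condition (ii). Using two different feature types for the two pairs would correspond to condition (i).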

VOICE RECOGNITION DEVICE AND METHOD

The disclosure relates to an electronic apparatus for recognizing user voice and a method of recognizing, by the electronic apparatus, the user voice. According to an embodiment, the method of recognizing the user voice includes obtaining an audio signal segmented into a plurality of frame units, determining an energy component for each filter bank by applying a filter bank distributed according to a preset scale to a frequency spectrum of the audio signal segmented into the frame units, smoothing the determined energy component for each filter bank, extracting a feature vector of the audio signal based on the smoothed energy component for each filter bank, and recognizing the user voice in the audio signal by inputting the extracted feature vector to a voice recognition model.
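The pipeline in this abstract runs from a frame spectrum to filter-bank energies, then smoothing, then a feature vector. The sketch below is in plain Python and rests on several assumptions. The "preset scale" is taken to be the mel scale with triangular filters. The spectrum comes from a naive DFT. The smoothing, which the abstract does not specify, is a hypothetical first-order recursive average over frames. The feature vector is the log of the smoothed energies. The recognition model itself is out of scope here.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|X[k]|^2 for k = 0..N/2 via a naive O(N^2) DFT (adequate for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(num_filters: int, n_fft: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters with edges spaced evenly on the mel scale (assumed preset scale)."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def extract_features(frames: Sequence[Sequence[float]], sample_rate: float,
                     num_filters: int = 8, alpha: float = 0.5,
                     eps: float = 1e-10) -> List[List[float]]:
    """Per-frame log filter-bank energies after recursive smoothing over time."""
    bank = mel_filter_bank(num_filters, len(frames[0]), sample_rate)
    features: List[List[float]] = []
    smoothed: List[float] = []
    for frame in frames:
        spec = power_spectrum(frame)
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        # Hypothetical smoothing: s_t = alpha * s_{t-1} + (1 - alpha) * e_t, seeded with e_0.
        if not smoothed:
            smoothed = energies
        else:
            smoothed = [alpha * s + (1.0 - alpha) * e for s, e in zip(smoothed, energies)]
        features.append([math.log(s + eps) for s in smoothed])
    return features
```

For a 1 kHz tone sampled at 8 kHz in 64-sample frames, most of the energy lands in the fourth filter. A following silent frame only halves the smoothed energy when `alpha=0.5` instead of dropping it to zero.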

Condition-invariant feature extraction network

To generate substantially condition-invariant and speaker-discriminative features, embodiments are associated with a feature extractor capable of extracting features from speech frames based on first parameters, a speaker classifier capable of identifying a speaker based on the features and on second parameters, and a condition classifier capable of identifying a noise condition based on the features and on third parameters. The first parameters of the feature extractor and the second parameters of the speaker classifier are trained to minimize a speaker classification loss, the first parameters of the feature extractor are further trained to maximize a condition classification loss, and the third parameters of the condition classifier are trained to minimize the condition classification loss.
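The three training rules can be written as one minimax problem, as in gradient-reversal (domain-adversarial) training. This is a sketch under stated assumptions. The trade-off weight $\lambda > 0$ and the learning rate $\mu$ are not given in the abstract. $\theta_f$, $\theta_s$, and $\theta_c$ denote the first, second, and third parameters.

```latex
\min_{\theta_f,\,\theta_s}\;\max_{\theta_c}\;
  E(\theta_f,\theta_s,\theta_c)
  = \mathcal{L}_{\text{spk}}(\theta_f,\theta_s) - \lambda\,\mathcal{L}_{\text{cond}}(\theta_f,\theta_c)

\theta_f \leftarrow \theta_f - \mu\left(
  \frac{\partial \mathcal{L}_{\text{spk}}}{\partial \theta_f}
  - \lambda\,\frac{\partial \mathcal{L}_{\text{cond}}}{\partial \theta_f}\right)
\qquad
\theta_s \leftarrow \theta_s - \mu\,\frac{\partial \mathcal{L}_{\text{spk}}}{\partial \theta_s}
\qquad
\theta_c \leftarrow \theta_c - \mu\,\frac{\partial \mathcal{L}_{\text{cond}}}{\partial \theta_c}
```

The sign flip on the condition term in the $\theta_f$ update makes the feature extractor ascend the condition classification loss while still descending the speaker classification loss. The condition classifier descends its own loss as usual.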

TEXT PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND MEDIUM
20230326466 · 2023-10-12

Provided are a text processing method and apparatus, an electronic device, and a medium. The method includes: acquiring target text information generated from audio information; determining a to-be-error-corrected word in the target text information and a target candidate replacement word corresponding to that word; determining, according to the target candidate replacement word, a target replacement word for the to-be-error-corrected word; and updating the target text information based on the target replacement word.
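The abstract does not say how the word to correct or its candidates are found. Below is a minimal Python sketch under stated assumptions. Any word missing from a hypothetical domain vocabulary is treated as a to-be-error-corrected word. Its candidates are vocabulary words within a Levenshtein distance of 2. The closest candidate becomes the target replacement word, with ties broken alphabetically. All names here are illustrative.

```python
from typing import Iterable, List, Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit insert, delete, and substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


def candidate_replacements(word: str, vocabulary: Iterable[str],
                           max_distance: int = 2) -> List[str]:
    """Candidate replacement words: vocabulary entries close to the suspect word."""
    return [v for v in vocabulary if edit_distance(word, v) <= max_distance]


def choose_replacement(word: str, candidates: Sequence[str]) -> Optional[str]:
    """Target replacement word: the closest candidate, ties broken alphabetically."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: (edit_distance(word, c), c))


def correct_text(text: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Replace out-of-vocabulary words with their closest in-vocabulary candidate."""
    vocab = sorted(set(vocabulary))
    known = set(vocab)
    out = []
    for word in text.split():
        if word in known:
            out.append(word)
            continue
        target = choose_replacement(word, candidate_replacements(word, vocab, max_distance))
        out.append(target if target is not None else word)  # keep the word if no candidate
    return " ".join(out)
```

In practice an ASR error corrector would more likely score candidates with phonetic similarity and a language model. Edit distance is used here only to keep the sketch self-contained.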

VOICE PROCESSING DEVICE, VOICE PROCESSING METHOD, RECORDING MEDIUM, AND VOICE AUTHENTICATION SYSTEM
20230326465 · 2023-10-12

The present disclosure implements speaker verification with high accuracy regardless of the input device. An integration unit (110) integrates voice data input through an input device with the frequency characteristic of that input device, and a feature extraction unit (120) extracts, from the integrated feature obtained by integrating the voice data and the frequency characteristic, a speaker feature for verifying the speaker of the voice.
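The abstract does not define "integrate" or the extractor's architecture. Below is a minimal Python sketch under stated assumptions. Integration is taken to mean appending the device's frequency characteristic, for example a per-band gain vector, to every frame of voice features. A trained feature extraction unit is replaced by mean pooling plus a hypothetical linear projection. `integrate`, `extract_speaker_feature`, and the projection matrix are illustrative placeholders.

```python
from typing import List, Sequence

Vector = Sequence[float]


def integrate(frames: Sequence[Vector], device_response: Vector) -> List[List[float]]:
    """Integrated feature: each voice frame concatenated with the device's frequency characteristic."""
    return [list(frame) + list(device_response) for frame in frames]


def extract_speaker_feature(integrated: Sequence[Vector],
                            projection: Sequence[Vector]) -> List[float]:
    """Mean-pool integrated frames over time, then apply a placeholder linear projection."""
    dim = len(integrated[0])
    pooled = [sum(frame[i] for frame in integrated) / len(integrated) for i in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in projection]
```

Feeding the device characteristic alongside the voice data lets a learned extractor discount device coloration. Here the projection simply subtracts the device gains from the pooled voice bands to illustrate that idea.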