G10L25/27

METHOD FOR DETECTING VOICE, METHOD FOR TRAINING, AND ELECTRONIC DEVICES
20220358955 · 2022-11-10

A method for detecting a voice, a method for training, apparatuses, and an electronic device. An implementation of the method includes: when performing voice detection, obtaining a first feature vector corresponding to a to-be-detected voice by a voice encoding model in a confidence detection model, and obtaining a second feature vector corresponding to a to-be-detected text that corresponds to the to-be-detected voice by a text encoding model in the confidence detection model; then processing the first feature vector and the second feature vector by a decoding model in the confidence detection model to obtain a target feature vector; and performing classification on the target feature vector by a classification model in the confidence detection model.
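The described pipeline (voice encoder and text encoder feeding a decoder, whose output is classified) can be sketched as follows. This is a minimal illustration only: the four sub-models are stand-in random linear layers, and all dimensions, the fusion-by-concatenation step, and the two-class output are assumptions not stated in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

class ConfidenceDetector:
    """Sketch of the confidence detection model: voice encoder + text
    encoder -> decoder fuses both -> classifier. Layers are random
    stand-ins for trained networks; dimensions are hypothetical."""

    def __init__(self, voice_dim=40, text_dim=32, hidden=16):
        self.W_voice = rng.normal(size=(voice_dim, hidden)) / np.sqrt(voice_dim)
        self.W_text = rng.normal(size=(text_dim, hidden)) / np.sqrt(text_dim)
        self.W_dec = rng.normal(size=(2 * hidden, hidden)) / np.sqrt(2 * hidden)
        self.W_cls = rng.normal(size=(hidden, 2)) / np.sqrt(hidden)

    def detect(self, voice_feats, text_feats):
        h_voice = np.tanh(voice_feats @ self.W_voice)   # first feature vector
        h_text = np.tanh(text_feats @ self.W_text)      # second feature vector
        fused = np.concatenate([h_voice, h_text])       # decoder input
        target = np.tanh(fused @ self.W_dec)            # target feature vector
        logits = target @ self.W_cls                    # classification model
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()                      # e.g. P(mismatch), P(match)

detector = ConfidenceDetector()
confidence = detector.detect(rng.normal(size=40), rng.normal(size=32))
```

The output is a probability distribution over the (assumed) two confidence classes for the voice/text pair.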

Real-Time Accent Conversion Model
20220358903 · 2022-11-10

Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data including the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent. The computing device is configured to convert the synthesized audio data into a synthesized version of the received speech content having the second accent.
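The two-stage structure (recognize an accent-independent linguistic representation, then resynthesize it with the target accent) can be sketched as below. The phoneme-posterior representation, frame size, and random projections standing in for the two trained models are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

N_PHONES = 40   # size of the linguistic representation (hypothetical)
FRAME = 256     # samples per analysis frame (hypothetical)

# Stage 1: stand-in for the first model, trained on first-accent audio.
# It maps each audio frame to a phoneme-posterior-like vector.
W_recognize = rng.normal(size=(FRAME, N_PHONES))

def linguistic_representation(audio):
    frames = audio[: len(audio) // FRAME * FRAME].reshape(-1, FRAME)
    logits = frames @ W_recognize
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # one distribution per frame

# Stage 2: stand-in for the second model, trained on audio from both
# accents. It maps the linguistic representation back to audio frames
# carrying the second accent.
W_synthesize = rng.normal(size=(N_PHONES, FRAME))

def synthesize_second_accent(posteriors):
    return (posteriors @ W_synthesize).reshape(-1)

speech_first_accent = rng.normal(size=4 * FRAME)   # mic input stand-in
posteriors = linguistic_representation(speech_first_accent)
speech_second_accent = synthesize_second_accent(posteriors)
```

The key design point is the bottleneck: only the linguistic content passes between the stages, so the first accent's acoustics are discarded and the second stage is free to impose the target accent.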

Activity Recognition Using Inaudible Frequencies For Privacy

Sound presents an invaluable signal source that enables computing systems to perform daily activity recognition. However, microphones are optimized for human speech and hearing ranges, capturing private content such as speech while omitting useful inaudible information that can aid in acoustic recognition tasks. This disclosure presents an activity recognition system that recognizes activities using sounds with frequencies inaudible to humans, thereby preserving privacy. Real-world activity recognition performance of the system is comparable to simulated results, with over 95% classification accuracy across all environments, suggesting immediate viability in performing privacy-preserving daily activity recognition.
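The core privacy mechanism, dropping the audible band before any features reach a classifier, can be sketched as follows. The sample rate, 20 kHz cutoff, window size, the two synthetic "activities" with their ultrasonic signatures, and the nearest-centroid classifier are all assumptions for illustration, not details from the disclosure.

```python
import numpy as np

SR = 96_000         # hypothetical rate; must exceed 2x the ultrasonic band
CUTOFF_HZ = 20_000  # everything below (the human-audible band) is discarded
N = 4096            # samples per analysis window

def inaudible_features(window):
    """Magnitude spectrum restricted to inaudible frequencies, so speech
    and other audible content never reach the classifier."""
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(N, d=1 / SR)
    return spectrum[freqs >= CUTOFF_HZ]

t = np.arange(N) / SR
rng = np.random.default_rng(2)

def record(freq_hz):
    # Toy recording: an ultrasonic tone (hypothetical signature) plus noise.
    return np.sin(2 * np.pi * freq_hz * t) + 0.1 * rng.normal(size=N)

# Toy "training": one centroid per activity in the inaudible feature space.
centroids = {
    "faucet_running": inaudible_features(record(31_000)),
    "blender": inaudible_features(record(42_000)),
}

def classify(window):
    feats = inaudible_features(window)
    return min(centroids, key=lambda k: np.linalg.norm(feats - centroids[k]))

label = classify(record(31_000))
```

Because `inaudible_features` masks out every bin below 20 kHz, the classifier is structurally unable to observe speech content, which is the privacy argument in a nutshell.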

LINEAR PREDICTION ANALYSIS DEVICE, METHOD, PROGRAM, AND STORAGE MEDIUM

An autocorrelation calculation unit 21 calculates an autocorrelation R_O(i) from an input signal. A prediction coefficient calculation unit 23 performs linear prediction analysis using a modified autocorrelation R′_O(i), obtained by multiplying the autocorrelation R_O(i) by a coefficient w_O(i). It is assumed here, for at least some orders i, that the coefficient w_O(i) corresponding to the order i increases monotonically with a value that is negatively correlated with the fundamental frequency of the input signal of the current frame or a past frame.
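The procedure, compute the autocorrelation, apply a pitch-dependent lag window, then solve for the prediction coefficients, can be sketched as below. The Gaussian window shape and the choice of the pitch period (in samples) as the "value negatively correlated with the fundamental frequency" are assumptions; Levinson-Durbin recursion stands in for the prediction coefficient calculation.

```python
import numpy as np

def autocorrelation(x, order):
    # R_O(i) for i = 0 .. order
    return np.array([x[: len(x) - i] @ x[i:] for i in range(order + 1)])

def lag_window(order, pitch_period):
    """w_O(i): for each lag i, the weight grows monotonically as the
    pitch period grows (a value negatively correlated with the
    fundamental frequency). The Gaussian shape is an assumption."""
    i = np.arange(order + 1)
    return np.exp(-0.5 * (i / pitch_period) ** 2)

def levinson_durbin(r, order):
    """Solve the normal equations for the LP coefficients a (a[0] = 1)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err

order = 10
rng = np.random.default_rng(3)
signal = np.sin(2 * np.pi * 0.03 * np.arange(400)) + 0.01 * rng.normal(size=400)
pitch_period = 1.0 / 0.03                   # negatively correlated with f0
r = autocorrelation(signal, order)
r_mod = lag_window(order, pitch_period) * r  # R'_O(i) = w_O(i) * R_O(i)
coeffs, pred_err = levinson_durbin(r_mod, order)
```

Note the monotonic relationship the abstract asks for: at a fixed lag i, a longer pitch period (lower fundamental frequency) yields a larger w_O(i), i.e. weaker damping of the autocorrelation.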

EMOTION RECOGNITION APPARATUS, EMOTION RECOGNITION MODEL LEARNING APPARATUS, METHODS AND PROGRAMS FOR THE SAME

The present invention provides emotion recognition technology that achieves high emotion recognition accuracy for all speakers. The emotion recognition device comprises an emotion representation vector extraction unit that extracts an emotion representation vector from the input utterance data to be recognized and another from preregistered calm-emotion utterance data spoken by the same speaker, and a second emotion recognition unit that uses a second emotion recognition model to obtain an emotion recognition result for the input utterance data from these two emotion representation vectors. The second emotion recognition model is a model that accepts an emotion representation vector of input utterance data and an emotion representation vector of calm-emotion utterance data as input, and outputs an emotion recognition result for the input utterance data.
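One way to read the design is that the registered calm utterance supplies a per-speaker baseline. The sketch below assumes this: the second model classifies the difference between the input utterance's emotion vector and the same speaker's calm vector. The extraction network, the difference-based fusion, the emotion label set, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
EMB = 8                                     # emotion-vector size (hypothetical)
EMOTIONS = ["calm", "happy", "angry", "sad"]

W_extract = rng.normal(size=(20, EMB))       # stand-in extraction unit
W_classify = rng.normal(size=(EMB, len(EMOTIONS)))  # stand-in second model

def emotion_vector(utterance_features):
    # Emotion representation vector extraction unit (random stand-in).
    return np.tanh(utterance_features @ W_extract)

def recognize(input_utterance, registered_calm_utterance):
    """Second emotion recognition unit: classify the input utterance
    relative to the speaker's own calm baseline (assumed fusion rule)."""
    v_input = emotion_vector(input_utterance)
    v_calm = emotion_vector(registered_calm_utterance)
    logits = (v_input - v_calm) @ W_classify
    return EMOTIONS[int(np.argmax(logits))]

speaker_calm = rng.normal(size=20)           # preregistered calm utterance
result = recognize(rng.normal(size=20), speaker_calm)
```

Subtracting the calm baseline is what gives the speaker-independence claim its plausibility: two speakers whose neutral voices differ still map to the same origin before classification.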
