Patent classifications
G10L25/24
Method and apparatus for camera activation
During operation, a first personal-area network will activate a first camera. The first camera may be activated manually or triggered by an audio signal. The event that causes the first camera to activate will also cause the first personal-area network to send an acoustic signature to other personal-area networks. Personal-area networks that receive the acoustic signature will modify their audio triggers so that the acoustic signature can be better distinguished from other noises.
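One way to picture "modifying an audio trigger" is a trigger that fires when incoming audio features are close to a stored signature, and that learns new signatures broadcast by other networks. The sketch below assumes this template-matching view. `AudioTrigger`, `cosine_similarity`, the feature vectors, and the 0.9 threshold are all hypothetical and are not taken from the patent.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class AudioTrigger:
    # Hypothetical trigger: fires when incoming audio features are close to
    # any stored acoustic signature. The threshold value is an assumption.
    threshold: float = 0.9
    signatures: list = field(default_factory=list)

    def add_signature(self, signature):
        """Adapt the trigger to a signature received from another network."""
        self.signatures.append(list(signature))

    def fires(self, features):
        return any(cosine_similarity(features, s) >= self.threshold
                   for s in self.signatures)


# Usage: a second network receives the signature broadcast by the first one.
trigger = AudioTrigger()
trigger.add_signature([1.0, 0.0, 0.0])
print(trigger.fires([0.9, 0.1, 0.0]))  # True: close to the received signature
print(trigger.fires([0.0, 1.0, 0.0]))  # False: unrelated noise
```

A real system would derive the signature from acoustic features of the triggering event; storing it as a template is only one of several ways a trigger could be adapted.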
Multi-modal emotion recognition device, method, and storage medium using artificial intelligence
A multi-modal emotion recognition system is disclosed. The system includes a data input unit for receiving video data and voice data of a user; a data pre-processing unit including a voice pre-processing unit for generating voice feature data from the voice data and a video pre-processing unit for generating one or more face feature data from the video data; and a preliminary inference unit for generating, based on the video data, situation determination data indicating whether the user's situation changes along a temporal sequence. The system further comprises a main inference unit for generating at least one sub feature map based on the voice feature data or the face feature data, and for inferring the user's emotion state based on the sub feature map and the situation determination data.
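The abstract does not say how the preliminary inference unit decides that the situation has changed. As a minimal stand-in, the sketch below flags a change whenever the mean absolute pixel difference between consecutive grayscale frames exceeds a threshold. Frame differencing, the function names, and the threshold of 20 are assumptions for illustration only; the patented unit presumably uses a learned model.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equal-size grayscale frames."""
    total = count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def situation_changes(frames, threshold=20.0):
    """One flag per consecutive frame pair: True if the scene changed.
    Frame differencing is a hypothetical substitute for the patent's unit."""
    return [mean_abs_diff(prev, cur) > threshold
            for prev, cur in zip(frames, frames[1:])]


still = [[10, 10], [10, 10]]
moved = [[200, 10], [10, 200]]
print(situation_changes([still, still, moved]))  # [False, True]
```

The resulting flags play the role of the situation determination data that the main inference unit combines with the sub feature map.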
Method and electronic apparatus for detecting tampering audio, and storage medium
Disclosed are a method, an electronic apparatus, and a storage medium for detecting audio tampering. The method includes: acquiring a signal to be detected, and performing a wavelet transform of a first preset order on the signal to obtain a first low-frequency coefficient and first high-frequency coefficients corresponding to the signal, where the number of first high-frequency coefficients equals the first preset order; performing an inverse wavelet transform on the first high-frequency coefficients whose order is greater than or equal to a second preset order, to obtain a first high-frequency component signal corresponding to the signal to be detected; and calculating a first Mel cepstrum feature of the first high-frequency component signal frame by frame, and concatenating the first Mel cepstrum features of the current frame and a preset number of additional frames.
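The wavelet steps above can be sketched with a Haar wavelet. The sketch assumes that "order" maps to decomposition level (level 1 being the finest detail), that the signal length is divisible by 2 to the power of the number of levels, and that context frames are the preceding ones. The Haar choice and all function names are hypothetical. The Mel cepstrum step is omitted: `stack_context` accepts any per-frame feature vectors, for example MFCCs computed with an audio library.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal, levels):
    """Multi-level Haar DWT; returns (approximation, [detail_1 .. detail_levels])."""
    if len(signal) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_idwt_step(approx, detail):
    """Invert one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def high_freq_component(signal, levels, min_level):
    """Reconstruct from detail coefficients at levels >= min_level only;
    the approximation and detail levels below min_level are zeroed."""
    approx, details = haar_dwt(signal, levels)
    rec = [0.0] * len(approx)
    for level in range(levels, 0, -1):
        detail = details[level - 1]
        if level < min_level:
            detail = [0.0] * len(detail)
        rec = haar_idwt_step(rec, detail)
    return rec


def stack_context(frame_features, context):
    """Concatenate each frame's feature vector with those of the `context`
    preceding frames (oldest first; the first frame is repeated at the start)."""
    stacked = []
    for t in range(len(frame_features)):
        row = []
        for k in range(context, -1, -1):
            row.extend(frame_features[max(t - k, 0)])
        stacked.append(row)
    return stacked
```

For example, `high_freq_component([1, -1, 1, -1], 1, 1)` recovers the alternating signal (up to rounding), because all of its energy sits in the finest detail level, while a constant signal yields all zeros.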
METHOD FOR IDENTIFYING AN AUDIO SIGNAL
A data processing system for identifying an audio signal includes an audio sensor, a receiver module, a signal recognition module, and a receiver device. The receiver module receives audio data from the audio sensor. The receiver module transmits the audio data to the signal recognition module. The signal recognition module calculates time-varying vector arrays of octave band energies, and/or of fractional octave band energies, and calculates time-varying vector arrays of Mel-Frequency Cepstral Coefficients (MFCC) values based on the received audio data. The signal recognition module generates audio feature image data based on the vector arrays. The signal recognition module includes binary classifier machine learning models and inference models to identify the audio signal based on the generated audio feature image data. The signal recognition module transmits a notification message to the receiver device.
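The time-varying octave-band energy arrays can be illustrated with a direct DFT in plain Python. This sketch assumes band edges at fc times 2 to the power of plus or minus 1/(2·fraction) (so `fraction=3` gives third-octave bands), a 64-sample frame with a 32-sample hop, and hypothetical center frequencies. It does not reproduce the MFCC arrays, the feature-image step, or the classifiers.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum via a direct DFT (O(n^2), fine for short frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def band_energies(frame, sample_rate, centers, fraction=1):
    """Energy per 1/fraction-octave band; edges are fc * 2**(+-1/(2*fraction))."""
    spectrum = power_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    half_width = 2 ** (1 / (2 * fraction))
    energies = []
    for fc in centers:
        lo, hi = fc / half_width, fc * half_width
        energies.append(sum(p for k, p in enumerate(spectrum)
                            if lo <= k * bin_hz < hi))
    return energies


def band_energy_array(signal, sample_rate, centers,
                      frame_len=64, hop=32, fraction=1):
    """Time-varying array: one row of band energies per analysis frame."""
    return [band_energies(signal[s:s + frame_len], sample_rate, centers, fraction)
            for s in range(0, len(signal) - frame_len + 1, hop)]


sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(256)]
rows = band_energy_array(tone, sr, centers=[250, 500, 1000, 2000])
print(len(rows))  # 7 frames
```

For the 1 kHz test tone, nearly all energy lands in the 1000 Hz band of every row; stacking such rows over time gives the kind of vector array the abstract describes.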
SPEECH EMOTION RECOGNITION METHOD AND SYSTEM BASED ON FUSED POPULATION INFORMATION
The present invention discloses a speech emotion recognition method and system based on fused population information. The method includes the following steps: S1: acquiring a user's audio data; S2: preprocessing the audio data and obtaining a Mel spectrogram feature; S3: cutting off a front mute segment and a rear mute segment of the Mel spectrogram feature; S4: obtaining population deep feature information through a population classification network; S5: obtaining Mel spectrogram deep feature information through a Mel spectrogram preprocessing network; S6: fusing the population deep feature information and the Mel spectrogram deep feature information through SENet to obtain fused information; and S7: obtaining an emotion recognition result from the fused information through a classification network.
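Steps S3 and S6 can be sketched as follows. The trimming rule (drop leading and trailing frames more than `top_db` below the peak) and the 60 dB default are assumptions. The fusion function is a generic squeeze-and-excitation block over stacked channels: the weights `w1` and `w2` would be learned, and the way the two deep features are stacked into channels is hypothetical.

```python
import math


def trim_mute_segments(mel_db, top_db=60.0):
    """S3 sketch: drop leading and trailing frames whose loudest mel band is
    more than `top_db` below the utterance peak. Interior quiet frames stay."""
    if not mel_db:
        return []
    peak = max(max(frame) for frame in mel_db)
    voiced = [i for i, frame in enumerate(mel_db) if max(frame) >= peak - top_db]
    return mel_db[voiced[0]:voiced[-1] + 1]


def se_fuse(channels, w1, w2):
    """S6 sketch: squeeze-and-excitation reweighting of stacked channels.
    channels: C sequences (one per channel); w1: r x C; w2: C x r."""
    squeeze = [sum(ch) / len(ch) for ch in channels]              # global average pool
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze)))  # FC + ReLU
              for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]                                       # FC + sigmoid
    return [[g * x for x in ch] for g, ch in zip(gates, channels)]


mel = [[-100.0], [-10.0], [-90.0], [0.0], [-100.0]]
print(trim_mute_segments(mel))  # [[-10.0], [-90.0], [0.0]]
```

The excitation gates lie between 0 and 1, so the fusion rescales each channel by a learned, input-dependent weight rather than simply concatenating the two feature sets.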
Clockwork hierarchal variational encoder
A method of providing a frame-based mel spectral representation of speech includes receiving a text utterance having at least one word and selecting a mel spectral embedding for the text utterance. Each word has at least one syllable and each syllable has at least one phoneme. For each phoneme, the method further includes using the selected mel spectral embedding to: (i) predict a duration of the corresponding phoneme based on corresponding linguistic features associated with the word that includes the corresponding phoneme and corresponding linguistic features associated with the syllable that includes the corresponding phoneme; and (ii) generate a plurality of fixed-length predicted mel-frequency spectrogram frames based on the predicted duration for the corresponding phoneme. Each fixed-length predicted mel-frequency spectrogram frame represents mel-spectral information of the corresponding phoneme.
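Step (ii), turning a predicted phoneme duration into a number of fixed-length frames, can be sketched as below. The 12.5 ms frame shift, the rounding rule, and the per-frame index tag are hypothetical. The durations are taken as given here, so the duration predictor of step (i) is not shown.

```python
FRAME_SHIFT_MS = 12.5  # hypothetical frame shift; the abstract does not specify one


def frames_for_duration(duration_ms, frame_shift_ms=FRAME_SHIFT_MS):
    """Number of fixed-length frames for a predicted phoneme duration (at least 1)."""
    return max(1, round(duration_ms / frame_shift_ms))


def expand_to_frames(phonemes, durations_ms, frame_shift_ms=FRAME_SHIFT_MS):
    """Emit one entry per predicted frame: (phoneme, index within phoneme,
    frame count). The index tag is a hypothetical conditioning feature."""
    frames = []
    for phoneme, duration in zip(phonemes, durations_ms):
        n = frames_for_duration(duration, frame_shift_ms)
        for i in range(n):
            frames.append((phoneme, i, n))
    return frames


print(expand_to_frames(["k", "ae"], [50, 25]))
# [('k', 0, 4), ('k', 1, 4), ('k', 2, 4), ('k', 3, 4), ('ae', 0, 2), ('ae', 1, 2)]
```

Each emitted entry corresponds to one fixed-length mel-frequency spectrogram frame that a decoder would then predict for that phoneme.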
Device wakeup method and apparatus, electronic device, and storage medium
The present disclosure relates to a device wakeup method and apparatus, an electronic device, and a storage medium. The wakeup method is applied to a first electronic device and includes: receiving a wakeup message from a second electronic device and, upon determining that the first electronic device is currently in an unawakened state, acquiring locally collected voice data; performing MFCC extraction on the voice data to acquire a first MFCC of the voice data; parsing the wakeup message to obtain a second MFCC included in it; matching the first MFCC against the second MFCC and, upon determining that the difference between them is less than or equal to a set threshold value, generating a wakeup instruction; and waking up the first electronic device in response to the wakeup instruction.
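The abstract does not define how the "difference" between the two MFCCs is measured. The sketch below assumes a mean frame-wise Euclidean distance between two MFCC sequences, truncated to the shorter one, and a caller-supplied threshold. MFCC extraction and wakeup-message parsing are not shown, and both function names are hypothetical.

```python
import math


def mfcc_distance(mfcc_a, mfcc_b):
    """Mean frame-wise Euclidean distance between two MFCC sequences
    (lists of per-frame coefficient vectors), truncated to the shorter one."""
    n = min(len(mfcc_a), len(mfcc_b))
    if n == 0:
        raise ValueError("both MFCC sequences must contain at least one frame")
    return sum(math.dist(a, b) for a, b in zip(mfcc_a, mfcc_b)) / n


def should_wake(is_awake, local_mfcc, message_mfcc, threshold):
    """Return True (generate a wakeup instruction) only for an unawakened
    device whose local MFCCs are within `threshold` of the message's MFCCs."""
    if is_awake:
        return False
    return mfcc_distance(local_mfcc, message_mfcc) <= threshold
```

A practical implementation would also need to align the two recordings in time, since the devices capture the same utterance from different positions; truncation is the simplest possible alignment.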