Patent classifications
G10L17/14
METHOD AND APPARATUS FOR RECOGNIZING SPEAKER BY USING A RESONATOR
Provided are a method and device for recognizing a speaker by using a resonator. The method of recognizing the speaker includes receiving a plurality of electrical signals corresponding to a speech of the speaker from a plurality of resonators having different resonance bands; obtaining a difference of magnitudes of the plurality of electrical signals; and recognizing the speaker based on the difference of magnitudes of the plurality of electrical signals.
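The comparison step in this abstract can be sketched as follows. The abstract does not say which magnitudes are subtracted or how the result is matched, so the adjacent-band differences, the Euclidean nearest-match rule, and the names `band_differences` and `recognize` are all assumptions made for illustration.

```python
from math import sqrt

def band_differences(magnitudes):
    """Differences between magnitudes of adjacent resonance bands (assumed pairing)."""
    return [b - a for a, b in zip(magnitudes, magnitudes[1:])]

def distance(u, v):
    """Euclidean distance between two equal-length difference patterns."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def recognize(magnitudes, enrolled):
    """Return the enrolled speaker whose difference pattern is closest (assumed rule)."""
    probe = band_differences(magnitudes)
    return min(enrolled, key=lambda name: distance(probe, enrolled[name]))

# Hypothetical enrolled patterns, one magnitude per resonator.
enrolled = {
    "alice": band_differences([0.2, 0.9, 0.4, 0.1]),
    "bob": band_differences([0.7, 0.3, 0.6, 0.5]),
}
result = recognize([0.25, 0.85, 0.45, 0.1], enrolled)
print(result)
```

Working on differences rather than raw magnitudes makes the pattern insensitive to a constant offset across all bands, which is one plausible reason for the choice.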
Speech recognition with acoustic models
Methods, systems, and apparatus, including computer programs encoded on computer storage media for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final CTC output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
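The stacking and subsampling steps can be illustrated with a short sketch. The stack size, the subsampling step, and the function names are assumptions; the RNN layers and the CTC output layer are not modeled here.

```python
def stack_frames(frames, stack_size):
    """Concatenate each frame with the stack_size - 1 frames that follow it."""
    stacked = []
    for t in range(len(frames) - stack_size + 1):
        merged = []
        for frame in frames[t:t + stack_size]:
            merged.extend(frame)
        stacked.append(merged)
    return stacked

def subsample(frames, step):
    """Keep every step-th frame, reducing the sequence length."""
    return frames[::step]

# Ten toy frames with two features each (hypothetical acoustic data).
frames = [[float(t), float(t) + 0.5] for t in range(10)]
stacked = stack_frames(frames, stack_size=3)  # wider frames
reduced = subsample(stacked, step=3)          # fewer time steps
print(len(stacked), len(stacked[0]), len(reduced))
```

Stacking widens each frame so that dropping intermediate frames loses little context, while the network processes fewer time steps.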
Voiceprint security with messaging services
An online system authenticates a user through a voiceprint biometric verification process. When a user needs to be authenticated, the online system generates a random phrase and provides it to the user. The online system receives an audio recording of the randomly generated phrase and retrieves a previously trained voiceprint model for the user. The online system analyzes the audio recording by applying the voiceprint model to determine whether the recording satisfies two criteria: first, that the voice in the recording belongs to the user; second, that the recording includes a vocalization of the randomly generated phrase. If the audio recording satisfies both criteria, the online system authenticates the user and can then provide the user with access to a new communication session.
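A minimal sketch of the two-criteria decision follows. The word list, the threshold, and the function names are hypothetical, and the speaker score and transcript are taken as inputs because the abstract does not describe how the voiceprint model or the speech recognizer produce them.

```python
import secrets

# Hypothetical vocabulary for challenge phrases.
WORDS = ["amber", "river", "falcon", "orchid", "granite", "meadow", "copper", "lantern"]

def generate_phrase(n_words=4):
    """Build a random challenge phrase from the vocabulary."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))

def normalize(text):
    """Lowercase and collapse whitespace before comparing phrases."""
    return " ".join(text.lower().split())

def authenticate(challenge, transcript, speaker_score, threshold=0.8):
    """Accept only if the voice matches AND the spoken phrase matches the challenge."""
    voice_ok = speaker_score >= threshold                       # first criterion
    phrase_ok = normalize(transcript) == normalize(challenge)   # second criterion
    return voice_ok and phrase_ok

phrase = generate_phrase()
print(authenticate(phrase, phrase, speaker_score=0.9))
```

Requiring a freshly generated phrase means a replayed recording of the user's voice fails the second criterion even if it passes the first.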
AUTHENTICATION METHOD, AUTHENTICATION SYSTEM, SMART SPEAKER AND PROGRAM
An authentication method includes a first step and a second step. The first step causes a voice including a predetermined character string to be output from a speaker 23. The second step acquires voice information by receiving an utterance voice of the target user via a microphone 21 after the first step, and determines from the voice information whether or not the target user is the specific user. In the second step, it is determined whether a character string recognized from the voice information matches the predetermined character string. It is also determined whether the characteristics of the utterance voice of the target user match the characteristics of the voice of the specific user, based on a characteristic amount recognized from the voice information and a characteristic amount of voice information registered in advance as the voice of the specific user.
A Method Of Sequence To Sequence Data Processing And A System For Sequence To Sequence Data Processing
A computer implemented method of sequence to sequence data processing, comprising: inputting a first input comprising a first input data sequence into a model, the model outputting a first output data sequence, a first part of the model generating an intermediate state comprising information relating to an alignment relationship between the first input data sequence and the first output data sequence, the intermediate state being used in the model to generate the first output data sequence; storing the intermediate state; modifying the model to replace the first part with the stored intermediate state; inputting a second input comprising a second input data sequence into the modified model, the modified model outputting a second output data sequence using the intermediate state.
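The store-and-replace idea can be shown with a toy sketch. The class `Seq2SeqModel`, its trivial index-based alignment, and the copy-through decoder are assumptions chosen only to make the control flow visible; the abstract does not specify the model.

```python
class Seq2SeqModel:
    """Toy model: an alignment part followed by a part that emits outputs."""

    def __init__(self):
        # When set, this stored state replaces the alignment part.
        self.stored_alignment = None

    def compute_alignment(self, inputs, out_len):
        """First part of the model: map each output position to an input index."""
        n = len(inputs)
        return [min(i * n // out_len, n - 1) for i in range(out_len)]

    def run(self, inputs, out_len):
        """Generate outputs, using the stored alignment if one is present."""
        if self.stored_alignment is not None:
            alignment = self.stored_alignment
        else:
            alignment = self.compute_alignment(inputs, out_len)
        outputs = [inputs[j] for j in alignment]
        return outputs, alignment

model = Seq2SeqModel()
out1, state = model.run(["a", "b", "c"], out_len=6)  # first input
model.stored_alignment = state                        # store, then replace the first part
out2, _ = model.run(["x", "y", "z"], out_len=3)       # out_len is ignored now
print(out1, out2)
```

The second output follows the timing of the first because the modified model reuses the stored alignment instead of recomputing it.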
APPARATUS FOR PROCESSING AN AUDIO SIGNAL FOR THE GENERATION OF A MULTIMEDIA FILE WITH SPEECH TRANSCRIPTION
Apparatus for processing a signal, in particular an audio signal or a signal comprising an audio track, comprising a portable container which houses at least one processor, and ports for interfacing externally, suitable for connection with means for acquiring the audio signal to be processed. The apparatus includes a control module (10) for controlling the processing procedure; a module (22) for processing the input signal; a speech transcription module (40); a diarization module (30) for recognizing and tracking each change of speaker in the second sampled audio signal; and a module (50) for generating a multimedia file, which produces, from the output of the speech transcription module (40) and the diarization module (30), at least one multimedia PDF containing an audio and/or video digital file. The multimedia PDF allows synchronized playback of the digital file and/or navigation of the transcribed text.