Patent classifications
G10L21/057
COMPUTER-READABLE RECORDING MEDIUM HAVING STORED THEREIN PROGRAM FOR GENERATING MODEL, INFORMATION PROCESSING APPARATUS, AND METHOD FOR GENERATING MODEL
A computer-readable recording medium has stored therein a program for causing a computer to execute a process including: generating a voice processing model by executing machine learning using training data, the training data associating first training voice data obtained with a first microphone, second training voice data obtained with a second microphone different from the first microphone, and clarified training voice data with one another, the clarified training voice data being obtained by a clarifying process on voice contained in at least one of the first training voice data and the second training voice data, the voice processing model generating clarified voice data in response to input of first inference voice data and second inference voice data.
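A minimal sketch of the supervised setup the abstract describes: two-microphone recordings are associated with a clarified target, and a model is fit to map the pair of inputs to that target. The per-sample linear mix and all function names here are hypothetical stand-ins for the actual learned voice-processing model.

```python
def make_training_data(mic1, mic2, clarified):
    """Associate first/second microphone samples with the clarified target."""
    assert len(mic1) == len(mic2) == len(clarified)
    return list(zip(mic1, mic2, clarified))

def fit_mix_weights(triples, steps=2000, lr=0.1):
    """Learn weights w1, w2 so that w1*x1 + w2*x2 approximates the target."""
    w1, w2 = 0.0, 0.0
    for _ in range(steps):
        for x1, x2, y in triples:
            err = (w1 * x1 + w2 * x2) - y  # prediction error on this sample
            w1 -= lr * err * x1            # gradient step on each weight
            w2 -= lr * err * x2
    return w1, w2

def clarify(w1, w2, x1, x2):
    """Inference: generate clarified output from the two microphone inputs."""
    return w1 * x1 + w2 * x2
```

At inference time, the trained weights are applied to new first and second inference voice data; a real implementation would replace the linear mix with a learned network operating on frames or spectra.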
Training apparatus, method of the same and program
A training device changes feedback formant frequencies, which are formant frequencies of a picked-up speech signal, applies a lowpass filter to convert the picked-up speech signal, adds high-pass noise to the converted speech signal, and feeds the converted speech signal with the high-pass noise added back to a subject. It calculates a compensatory response vector by using two sets of pickup formant frequencies: the formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal converted with the feedback formant frequencies changed, and the formant frequencies of a speech signal acquired while feeding back a speech signal converted without that change. The device then determines an evaluation based on the compensatory response vector and a correct compensatory response vector.
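A hedged sketch of the evaluation step: the compensatory response is taken here as the difference between formant frequencies picked up under perturbed feedback and under unperturbed feedback, and it is scored against the reference ("correct") response by cosine similarity. The specific difference-and-cosine formulation is an assumption, not taken from the abstract.

```python
import math

def compensatory_response(perturbed_formants, unperturbed_formants):
    """Difference between formants picked up with and without the
    feedback-formant perturbation (one entry per formant, e.g. F1, F2)."""
    return [p - u for p, u in zip(perturbed_formants, unperturbed_formants)]

def evaluate(response, correct_response):
    """Cosine similarity between the measured and correct response vectors."""
    dot = sum(a * b for a, b in zip(response, correct_response))
    na = math.sqrt(sum(a * a for a in response))
    nb = math.sqrt(sum(b * b for b in correct_response))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)
```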
Voice capturing method and voice capturing system
A voice capturing method includes the following operations: storing, by a buffer, voice data from a plurality of microphones; determining, by a processor, whether a target speaker exists and whether a direction of the target speaker changes according to the voice data and target speaker information; inserting a voice segment corresponding to a previous tracking direction into a current position in the voice data to generate fusion voice data when the target speaker exists and the direction of the target speaker changes from the previous tracking direction to a current tracking direction; performing, by the processor, a voice enhancement process on the fusion voice data according to the current tracking direction to generate enhanced voice data; performing, by the processor, a voice shortening process on the enhanced voice data to generate voice output data; and playing, by a playing circuit, the voice output data.
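An illustrative sketch of the buffered capture flow, with hypothetical names: on a tracking-direction change, a buffered segment from the previous direction is spliced in at the current position, enhancement is applied according to the current direction, and the result is time-shortened so playback catches up to real time.

```python
def fuse(voice_data, position, prev_segment, direction_changed):
    """Insert the previous-direction segment at the current position."""
    if direction_changed:
        return voice_data[:position] + prev_segment + voice_data[position:]
    return voice_data

def enhance(samples, gain=1.5):
    """Stand-in for direction-aware enhancement (e.g. beamforming gain)."""
    return [s * gain for s in samples]

def shorten(samples, keep_every=2):
    """Naive time compression: drop samples to offset the inserted segment."""
    return samples[::keep_every]
```

A real system would shorten speech with a pitch-preserving method (e.g. time-scale modification) rather than sample dropping; the decimation here only illustrates where the step sits in the pipeline.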
NOISE FILTERING AND VOICE ISOLATION DEVICE AND METHOD
A method of isolating a voice signal from a user, the method including capturing a first audio signal by a first microphone; capturing a second audio signal by a second microphone, the second microphone located at a distance from the first microphone; transmitting the first audio signal from the first microphone and the second audio signal from the second microphone to a processor; comparing, by the processor, the first audio signal and the second audio signal with a time delay corresponding to the distance between the first microphone and the second microphone; and finding a commonality between the first audio signal and the second audio signal if the first audio signal and the second audio signal are substantially different.
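A rough sketch under stated assumptions: the delay (in samples) is derived from the microphone spacing and the speed of sound, the second signal is shifted by that delay to align the two captures, and the "commonality" is taken as the sample values where the aligned signals agree within a tolerance. The delay model and agreement test are illustrative, not the patent's method.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

def delay_samples(distance_m, sample_rate_hz):
    """Propagation delay between the two microphones, in whole samples."""
    return round(distance_m / SPEED_OF_SOUND * sample_rate_hz)

def commonality(sig1, sig2, delay, tol=0.05):
    """Values where the delay-aligned signals agree within the tolerance."""
    shifted = sig2[delay:]  # align sig2 to sig1 by the propagation delay
    return [a for a, b in zip(sig1, shifted) if abs(a - b) <= tol]
```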
FAST PLAYBACK IN MEDIA FILES WITH REDUCED IMPACT TO SPEECH QUALITY
The present invention is a computer program product and method for increasing the playback speed of audio or other media files. The computer program product and method identify pedagogic media files and add a flag to the metadata of the media file. The flag represents the number and type of pauses or silent sections in the pedagogic media file. Based on the flag, the computer program product and method may fast-forward or remove a portion of the pauses and silent sections to provide a new playback speed.
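A hedged sketch of the pause-handling idea: silent sections are located by an energy threshold, and each pause is shortened to a fraction of its original length, leaving speech segments untouched. The threshold detector and the keep-fraction parameter are assumptions standing in for the metadata flag the abstract describes.

```python
def find_pauses(samples, threshold=0.01, min_len=3):
    """Return (start, end) index ranges of runs of low-energy samples."""
    pauses, start = [], None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                pauses.append((start, i))
            start = None
    if start is not None and len(samples) - start >= min_len:
        pauses.append((start, len(samples)))
    return pauses

def remove_pause_fraction(samples, pauses, keep_fraction=0.5):
    """Shorten each detected pause to keep_fraction of its length."""
    out, prev = [], 0
    for start, end in pauses:
        out.extend(samples[prev:start])                # speech before pause
        keep = int((end - start) * keep_fraction)       # shortened pause
        out.extend(samples[start:start + keep])
        prev = end
    out.extend(samples[prev:])                          # trailing speech
    return out
```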
SPEAKING RHYTHM TRANSFORMATION APPARATUS, MODEL LEARNING APPARATUS, METHODS THEREFOR, AND PROGRAM
The aim is to convert a speech rhythm accurately. A model storage unit (10) stores a speech rhythm conversion model, which is a neural network that receives as input a first feature value vector including information related to the speech rhythm of at least a phoneme extracted from a first speech signal, resulting from a speech uttered by a speaker in a first group, converts the speech rhythm of the first speech signal to the speech rhythm of a speaker in a second group, and outputs the speech rhythm of the speaker in the second group. A feature value extraction unit (11) extracts, from the input speech signal resulting from the speech uttered by the speaker in the first group, information related to the vocal tract spectrum and information related to the speech rhythm. A conversion unit (12) inputs the first feature value vector including the information related to the speech rhythm extracted from the input speech signal to the speech rhythm conversion model and obtains the post-conversion speech rhythm. A speech synthesis unit (13) uses the post-conversion speech rhythm and the information related to the vocal tract spectrum extracted from the input speech signal to generate an output speech signal.
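A simplified stand-in for the three-stage flow (not the abstract's neural network): speech rhythm is modeled as per-phoneme durations, the conversion model is reduced to a per-phoneme scaling from first-group to second-group statistics, and "synthesis" simply recombines the converted durations with the untouched spectral information.

```python
def extract_features(phonemes):
    """Split (label, duration, spectrum) tuples into rhythm and spectrum."""
    labels = [p for p, _, _ in phonemes]
    durations = [d for _, d, _ in phonemes]   # the "speech rhythm" part
    spectra = [s for _, _, s in phonemes]     # vocal-tract spectrum part
    return labels, durations, spectra

def convert_rhythm(labels, durations, scale):
    """Stand-in for the conversion model: scale each phoneme's duration."""
    return [d * scale.get(p, 1.0) for p, d in zip(labels, durations)]

def synthesize(labels, new_durations, spectra):
    """Recombine the converted rhythm with the original spectral info."""
    return list(zip(labels, new_durations, spectra))
```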