Patent classifications
G10L25/87
Voice alignment method and apparatus
Example methods and apparatus for providing voice alignment are described. One example method includes: obtaining an original voice and a test voice, where the test voice is a voice generated after the original voice is transmitted over a communications network; performing loss detection and/or discontinuity detection on the test voice, where the loss detection is used to determine whether the test voice has a voice loss compared with the original voice, and the discontinuity detection is used to determine whether the test voice has voice discontinuity compared with the original voice; and aligning the test voice with the original voice based on a result of the loss detection and/or the discontinuity detection, to obtain an aligned original voice and an aligned test voice, where the result of the loss detection and/or the discontinuity detection indicates a manner of aligning the test voice with the original voice.
VOICE TRANSMISSION COMPENSATION APPARATUS, VOICE TRANSMISSION COMPENSATION METHOD AND PROGRAM
A speech transmission compensation apparatus that assists discrimination of speech heard by a user, includes: one or more computers each including a memory and a processor configured to: accept input of a speech signal, detect a specific type of sound in the speech signal, analyze an acoustic characteristic of the specific type of sound in the speech signal and output the acoustic characteristic; accept input of the acoustic characteristic being output by the memory and the processor, generate a vibration signal of a duration corresponding to the acoustic characteristic and output the vibration signal; and accept input of the vibration signal being output by the memory and the processor and provide the user with vibration for the duration on the basis of the vibration signal.
OBFUSCATING AUDIO SAMPLES FOR HEALTH PRIVACY CONTEXTS
A supervised discriminator for detecting bio-markers in an audio sample dataset is trained, and a denoising autoencoder is trained to learn a latent space that is used to reconstruct an output audio sample with a same fidelity as an input audio sample of the audio sample dataset. A conditional auxiliary generative adversarial network (GAN) is trained to generate the output audio sample with the same fidelity as the input audio sample, wherein the output audio sample is void of the bio-markers. The conditional auxiliary generative adversarial network (GAN), the corresponding supervised discriminator, and the corresponding denoising autoencoder are deployed in an audio processing system.
SYNCHRONIZED CONTROLLER
A system and method are described herein for configuring an audio distribution system, comprising a Redis server, the Redis server adapted to store Redis data to be used in configuring the audio distribution system; a plurality of audio devices, the plurality of audio devices and Redis server interconnected to form the audio distribution system, wherein each of the plurality of audio devices comprises: at least one processor; an electronic communications interface operatively connected to the at least one processor and adapted to receive data from a user and transfer the data to the at least one processor; and a memory operatively connected with the at least one processor, wherein the memory stores computer-executable instructions that, when executed by the at least one processor, cause the at least one processor in a first audio device to execute a method for configuring the audio distribution system that comprises: establishing communications using the electronic communications interface between the user and the at least one processor of the first audio device, such that data input by the user is received by the at least one processor of the first audio device; establishing communications to each of the remaining plurality of audio devices and Redis server in the audio distribution system; obtaining information from each of the remaining plurality of audio devices with which communications have been established, such information including one or more of an audio device name, part number, serial number, internet protocol address number, and physical location; receiving configuration information from the user that pertains to a specific audio device of the plurality of audio devices in the audio distribution system and that, when installed on the specific audio device, causes the specific audio device to operate in a known manner; and copying that configuration information to others of the same specific type of audio device in the audio distribution system.
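The final step of the abstract above, copying one device's configuration to other devices of the same type, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `AudioDevice` record, the `propagate_config` helper, the choice of part number as the meaning of "same specific type", and the sample configuration keys. A plain in-memory list stands in for the Redis-backed store.

```python
from dataclasses import dataclass, field


@dataclass
class AudioDevice:
    # Fields mirror the device information listed in the abstract.
    name: str
    part_number: str
    serial_number: str
    ip_address: str
    location: str
    config: dict = field(default_factory=dict)


def propagate_config(devices, target_serial, config):
    """Apply config to the target device and to every device of the same type.

    Assumption: two devices are of the "same specific type" when their
    part numbers match.
    """
    target = next(d for d in devices if d.serial_number == target_serial)
    updated = []
    for device in devices:
        if device.part_number == target.part_number:
            device.config = dict(config)  # copy so devices do not share one dict
            updated.append(device.name)
    return updated


devices = [
    AudioDevice("amp-lobby", "AMP-100", "S001", "10.0.0.11", "Lobby"),
    AudioDevice("amp-hall", "AMP-100", "S002", "10.0.0.12", "Hall"),
    AudioDevice("mic-stage", "MIC-200", "S003", "10.0.0.13", "Stage"),
]
updated = propagate_config(devices, "S001", {"gain_db": -6, "mute": False})
print(updated)
```

In this sketch only the two devices sharing the part number `AMP-100` receive the configuration; the microphone is left untouched.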
SPEECH RECOGNITION APPARATUS, CONTROL METHOD, AND NON-TRANSITORY STORAGE MEDIUM
A speech recognition apparatus (2000) acquires source data (10) representing an audio signal including an utterance. The speech recognition apparatus (2000) converts the source data (10) into a text string (30). The speech recognition apparatus (2000) generates a concatenated text (40) representing a content of an utterance by concatenating a text (32) included in the text string (30). Herein, texts (32) adjacent to each other in the text string (30) are such that parts of associated audio signals overlap each other on a time axis. At a time of concatenating texts (32) adjacent to each other, the speech recognition apparatus (2000) eliminates a trailing portion of a preceding text (32) and a leading portion of a succeeding text (32).
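The concatenation step described above can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: it assumes each recognized word carries timestamps on a shared time axis, and it picks the midpoint of the overlap between adjacent chunks as the cut point. The abstract does not say how much of each text is eliminated, and the `Word`, `Chunk`, and `concatenate` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds on the shared time axis
    end: float


@dataclass
class Chunk:
    start: float  # start of the chunk's audio
    end: float    # end of the chunk's audio
    words: list


def concatenate(chunks):
    """Join chunk transcripts, trimming both sides of each overlap.

    Assumption: the cut point is the midpoint of the overlap, so the
    trailing words of the preceding chunk and the leading words of the
    succeeding chunk that fall past the cut are dropped.
    """
    kept = []
    for i, chunk in enumerate(chunks):
        lo = float("-inf")
        hi = float("inf")
        if i > 0:
            lo = (chunks[i - 1].end + chunk.start) / 2
        if i < len(chunks) - 1:
            hi = (chunk.end + chunks[i + 1].start) / 2
        for word in chunk.words:
            center = (word.start + word.end) / 2
            if lo <= center < hi:
                kept.append(word.text)
    return " ".join(kept)


first = Chunk(0.0, 4.0, [
    Word("the", 0.0, 0.5), Word("quick", 0.5, 1.0), Word("brown", 1.5, 2.0),
    Word("fox", 2.5, 3.0), Word("jum", 3.6, 4.0),   # word cut off at the chunk end
])
second = Chunk(2.0, 6.0, [
    Word("box", 2.5, 3.0),                           # misrecognized at the chunk start
    Word("jumps", 3.5, 4.2), Word("over", 4.5, 5.0),
])
result = concatenate([first, second])
print(result)
```

The words nearest a chunk boundary are the ones most likely to be truncated or misrecognized, which is why both sides of the overlap are trimmed rather than only one.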
SYSTEMS AND METHODS FOR SPEECH RECOGNITION
A speech recognition method is provided. The method may include: obtaining speech data and a speech recognition result of the speech data, the speech data including speech of a plurality of speakers, and the speech recognition result including a plurality of words; determining speaking time of each of the plurality of speakers by processing the speech data; determining, based on the speaking times of the plurality of speakers and the speech recognition result, a corresponding relationship between the plurality of words and the plurality of speakers; determining, based on the corresponding relationship, at least one conversion word from the plurality of words, each of the at least one conversion word corresponding to at least two of the plurality of speakers; and re-determining the corresponding relationship between the plurality of words and the plurality of speakers based on the at least one conversion word.
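The word-to-speaker correspondence above can be sketched with simple time-overlap bookkeeping. The abstract does not specify how the correspondence is re-determined, so the rule used here (give each conversion word to the speaker whose speaking time overlaps it most) is an assumption, as are the function names and the tuple layouts for words and speaker turns.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def map_words(words, turns):
    """For each word, total the overlap with each speaker's speaking time.

    words: list of (text, start, end); turns: list of (speaker, start, end).
    """
    mapping = []
    for text, w_start, w_end in words:
        totals = {}
        for speaker, t_start, t_end in turns:
            shared = overlap(w_start, w_end, t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        mapping.append((text, totals))
    return mapping


def conversion_words(mapping):
    """Indices of words that correspond to at least two speakers."""
    return [i for i, (_, totals) in enumerate(mapping) if len(totals) >= 2]


def resolve(mapping):
    """Assumed rule: assign each word to the speaker with the largest overlap."""
    return [
        (text, max(totals, key=totals.get) if totals else None)
        for text, totals in mapping
    ]


turns = [("A", 0.0, 2.0), ("B", 1.9, 4.0)]
words = [("hello", 0.2, 0.8), ("there", 1.0, 1.5),
         ("yes", 1.8, 2.4), ("sure", 2.6, 3.0)]
mapping = map_words(words, turns)
ambiguous = conversion_words(mapping)
final = resolve(mapping)
print(ambiguous, final)
```

Here only "yes" straddles the change of speaker, so it is the single conversion word; it is assigned to speaker B, whose turn covers most of it.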
AUTOMATIC DUBBING METHOD AND APPARATUS
A method and system for automatic dubbing are disclosed, comprising, responsive to receiving a selection of media content for playback on a user device by a user of the user device, processing extracted speeches of a first voice from the media content to generate replacement speeches using a set of phonemes of a second voice of the user of the user device, and replacing the extracted speeches of the first voice with the generated replacement speeches in the audio portion of the media content for playback on the user device.