G10L21/0324

SPATIAL AUDIO WIND NOISE DETECTION

A device includes one or more processors configured to obtain audio signals representing sound captured by at least three microphones and determine spatial audio data based on the audio signals. The one or more processors are further configured to determine a metric indicative of wind noise in the audio signals. The metric is based on a comparison of a first value and a second value. The first value corresponds to an aggregate signal based on the spatial audio data, and the second value corresponds to a differential signal based on the spatial audio data.
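
The comparison described above can be illustrated with a minimal sketch. It assumes three microphone signals, uses their average as the aggregate signal and two pairwise differences as the differential signals, and takes an energy ratio as the metric. The function names and this particular choice of signals are hypothetical, not taken from the abstract.

```python
import math
import random

def energy(signal):
    """Sum of squared samples."""
    return sum(x * x for x in signal)

def wind_metric(m0, m1, m2):
    """Ratio of differential energy to aggregate energy (assumed metric).

    Sound from a distant source is nearly identical at closely spaced
    microphones, so the differences are small. Wind noise is largely
    uncorrelated between microphones, so the differences are large.
    """
    aggregate = [(a + b + c) / 3.0 for a, b, c in zip(m0, m1, m2)]
    diff_x = [b - a for a, b in zip(m0, m1)]
    diff_y = [c - a for a, c in zip(m0, m2)]
    return (energy(diff_x) + energy(diff_y)) / (energy(aggregate) + 1e-12)

rng = random.Random(0)
# Coherent case: the same tone reaches all three microphones.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
coherent_metric = wind_metric(tone, tone, tone)
# Wind-like case: independent noise at each microphone.
noise = [[rng.gauss(0.0, 1.0) for _ in range(1600)] for _ in range(3)]
wind_like_metric = wind_metric(*noise)
```

A detector built on this idea would compare the metric against a threshold; the threshold value would need tuning for a given microphone geometry.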

SPEAKER-SPECIFIC VOICE AMPLIFICATION

A method, system and computer program product for amplifying a single voice during an audio conversation. One embodiment of the method may comprise receiving, by a computing device, an audio sample of speech from a user, and generating, by the computing device, a user-specific acoustic model for enhancement of speech by the user based upon the audio sample. The method may further comprise receiving a live audiovisual stream, the live audiovisual stream including live speech by the user during an audio conversation, wherein the live audiovisual stream includes background noise, and using, by the computing device, the user-specific acoustic model to selectively amplify the live speech during the live audiovisual stream without amplifying the background noise.

COMPUTER-IMPLEMENTED METHOD FOR SPEECH SYNTHESIS, COMPUTER DEVICE, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM
20220189454 · 2022-06-16

A computer-implemented method for speech synthesis, a computer device, and a non-transitory computer readable storage medium are provided. The method includes: obtaining a speech text to be synthesized; obtaining a Mel spectrum corresponding to the speech text; inputting the Mel spectrum into a complex neural network to obtain a complex spectrum corresponding to the speech text, wherein the complex spectrum comprises real component information and imaginary component information; and obtaining synthetic speech corresponding to the speech text according to the complex spectrum. The method can efficiently and simply complete speech synthesis.
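
The abstract does not say how speech is obtained from the complex spectrum. One plausible final step, sketched here as an assumption, is an inverse discrete Fourier transform of each frame, combining the real and imaginary components into complex bins. The function name and the single-frame, half-spectrum layout are hypothetical.

```python
import cmath
import math

def frame_from_complex_spectrum(real, imag, n):
    """Rebuild one real-valued frame of n samples (n even) from bins 0..n/2.

    The real and imaginary components are combined into complex bins, the
    upper half of the spectrum is filled in by conjugate symmetry, and a
    direct inverse DFT is evaluated.
    """
    half = [complex(r, i) for r, i in zip(real, imag)]
    full = half + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return [
        sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

# A spectrum whose only non-zero bin is bin 1 with value -4j describes one
# period of a unit-amplitude sine over 8 samples.
frame = frame_from_complex_spectrum([0, 0, 0, 0, 0], [0, -4, 0, 0, 0], 8)
```

A full synthesizer would apply this per frame and overlap-add the results; that part is omitted here.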

MULTI-MODE CHANNEL CODING

A channel encoder for encoding a frame includes a multi-mode redundancy encoder for redundancy encoding the frame in accordance with a certain coding mode from a set of different coding modes, wherein the coding modes differ from each other with respect to an amount of redundancy added to the frame, and wherein the multi-mode redundancy encoder is configured to output a coded frame including at least one code word; and a colorator for applying a coloration sequence to the at least one code word, wherein the coloration sequence is such that at least one bit of the code word is changed by the application of the coloration sequence, and wherein the coloration sequence is selected in accordance with the certain coding mode.
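
The coloration step can be sketched as a bitwise XOR between the code word and a mask chosen by coding mode. XOR is an assumption here (the abstract only requires that at least one bit changes), and the mode numbers and mask values are hypothetical.

```python
# Hypothetical coloration sequences, one per coding mode. Every mask is
# non-zero, so applying it always changes at least one bit of the code word.
COLORATION = {
    1: 0b10110100,
    2: 0b01101011,
    3: 0b11010010,
}

def colorate(code_word, mode):
    """Apply the coloration sequence selected by the coding mode."""
    return code_word ^ COLORATION[mode]

def decolorate(colored_word, mode):
    """Undo the coloration; XOR with the same mask is its own inverse."""
    return colored_word ^ COLORATION[mode]

code_word = 0b11001010
colored = {mode: colorate(code_word, mode) for mode in COLORATION}
restored = {mode: decolorate(word, mode) for mode, word in colored.items()}
```

Because each mode uses a different mask, the same code word yields a different transmitted word per mode, and removing the mask for the correct mode restores the original.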

METHODS AND SYSTEMS FOR PROCESSING RECORDED AUDIO CONTENT TO ENHANCE SPEECH
20220165289 · 2022-05-26

Methods and systems are disclosed that are configured to perform automated volume leveling on speech content in an audio file containing speech and non-speech segments. A low pass filter and a high pass filter may be applied to the audio data, and normalization may be performed. Speech and non-speech segments may be detected. Gain adjustments may be made to achieve a substantially constant short term loudness. Processing may be applied to enhance speech parameters, such as attack and release. An upward expander may be used to achieve a target loudness level. A limiter and/or dynamic range compressor may be utilized to satisfy true peak and/or short term loudness specifications. A file of processed audio data may be generated and transmitted to one or more destinations for broadcast and/or streaming.
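
The gain-adjustment step can be sketched as follows. This is a simplified illustration under stated assumptions: block RMS level stands in for a proper short term loudness measurement, a fixed level threshold stands in for speech detection, and the function names, target level, and threshold are hypothetical.

```python
import math

def rms_db(block):
    """Block level in dB relative to full scale, from the mean square."""
    mean_sq = sum(x * x for x in block) / len(block)
    return 10.0 * math.log10(mean_sq + 1e-12)

def level_speech(samples, block_size, target_db=-20.0, speech_floor_db=-50.0):
    """Scale each block above the floor so its RMS level hits the target.

    Blocks at or below the floor are treated as non-speech and left alone.
    """
    out = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        level = rms_db(block)
        if level > speech_floor_db:
            gain = 10.0 ** ((target_db - level) / 20.0)
        else:
            gain = 1.0
        out.extend(x * gain for x in block)
    return out

fs = 16000
quiet = [0.01 * math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
silence = [0.0] * 1600
leveled = level_speech(quiet + silence + loud, block_size=1600)
```

Switching gain abruptly at block boundaries would be audible in practice; a real leveler would smooth the gain over time, which is where attack and release settings come in.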

SYSTEMS AND METHODS FOR PRE-FILTERING AUDIO CONTENT BASED ON PROMINENCE OF FREQUENCY CONTENT

A system is disclosed for processing electronic audio signals. The system includes an input process for receiving digital samples of an electronic audio signal; a frame division process for allocating sequences of the digital samples of the electronic audio signal to respective frames; a frequency transform process for processing the digital samples by frame thereby to register, for each of the frames, a respective frequency set; a filtering process for filtering frequencies of each frequency set into a respective one of a plurality of orders based on relative prominence; an amplitude sequence process for generating multiple amplitude sequences based on the orders, each amplitude sequence n respectively comprising a sequence of amplitudes of the nth-order frequency content in the frames; and an output process for generating user-apprehendable content for a user interface of the system based on the multiple amplitude sequences. Related systems, methods and computer-readable media are also disclosed.
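
The frame division, frequency transform, ordering, and amplitude sequence processes can be sketched as below. The sketch assumes that "relative prominence" means ranking frequency bins by magnitude within each frame, which is one reading of the abstract; the function names and the use of non-overlapping frames are hypothetical.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..n/2 for one frame (direct evaluation)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def amplitude_sequences(samples, frame_size, orders):
    """Build one amplitude sequence per order.

    Sequence i holds, frame by frame, the magnitude of the bin ranked i
    (0 = most prominent) in that frame.
    """
    sequences = [[] for _ in range(orders)]
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        ranked = sorted(dft_magnitudes(samples[start:start + frame_size]), reverse=True)
        for order in range(orders):
            sequences[order].append(ranked[order])
    return sequences

# Two frames of a signal with a strong component in bin 4 and a weaker one in bin 10.
signal = [
    math.sin(2 * math.pi * 4 * t / 64) + 0.5 * math.sin(2 * math.pi * 10 * t / 64)
    for t in range(128)
]
seqs = amplitude_sequences(signal, frame_size=64, orders=2)
```

Each resulting sequence could then feed a display or other user-facing output, which is the role the abstract assigns to the output process.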