G10L2019/0011

SELF-SUPERVISED PITCH ESTIMATION

Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating two training samples from a sample of audio training data by applying two different pitch shifts to that sample. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. The known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the accuracy of the relative pitch output by the encoder improves. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
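
The comparison step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method claimed in the abstract: it assumes a log-frequency frame in which one bin corresponds to one pitch step, uses a Huber loss as one possible way to compare the shifts, and replaces the encoder with a hypothetical peak-picking stand-in (`toy_encoder`). All function names are invented for this example.

```python
def shift_bins(frame, k):
    """Shift a log-frequency magnitude frame by k bins (k > 0 raises the pitch).

    Assumption: one bin equals one pitch step; vacated bins are zero-filled.
    """
    n = len(frame)
    return [frame[i - k] if 0 <= i - k < n else 0.0 for i in range(n)]


def toy_encoder(frame, scale=1.0):
    """Hypothetical stand-in for the trained encoder: scaled index of the peak bin."""
    peak = max(range(len(frame)), key=lambda i: frame[i])
    return scale * peak


def relative_pitch_loss(pred_a, pred_b, shift_a, shift_b, delta=1.0):
    """Huber loss between the predicted pitch difference and the known shift difference."""
    err = (pred_a - pred_b) - (shift_a - shift_b)
    mag = abs(err)
    if mag <= delta:
        return 0.5 * err * err
    return delta * (mag - 0.5 * delta)


def calibrate_offset(relative_preds, labeled_pitches):
    """Mean offset that maps relative encoder outputs onto labeled absolute pitches."""
    diffs = [label - rel for rel, label in zip(relative_preds, labeled_pitches)]
    return sum(diffs) / len(diffs)


# One training pair: the same frame shifted by +3 and -2 bins.
frame = [0.0] * 32
frame[10] = 1.0
sample_a, sample_b = shift_bins(frame, 3), shift_bins(frame, -2)
loss = relative_pitch_loss(toy_encoder(sample_a), toy_encoder(sample_b), 3, -2)
```

In a real training loop the loss would be backpropagated through the encoder; here it only shows that an encoder whose output differences match the known shift difference incurs zero loss, while a mis-scaled one does not.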

Concept for encoding of information

An information encoder for encoding an information signal includes: a converter for converting the linear prediction coefficients of the predictive polynomial $A(z)$ to frequency values $f_1 \ldots f_n$ of a spectral frequency representation of the predictive polynomial $A(z)$, wherein the converter is configured to determine the frequency values $f_1 \ldots f_n$ by analyzing a pair of polynomials $P(z)$ and $Q(z)$ being defined as $P(z) = A(z) + z^{-m-l} A(z^{-1})$ and $Q(z) = A(z) - z^{-m-l} A(z^{-1})$,
wherein m is
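
A minimal Python sketch of the polynomial pair follows. The abstract breaks off before defining m and l, so the sketch makes two explicit assumptions: m is taken to be the order of A(z), and l is a non-negative integer, with l = 1 giving the classic line spectral pair construction. The grid-and-bisection root search is a simple illustrative choice, not the analysis method of the patent, and all function names are hypothetical.

```python
import cmath
import math


def lsp_polynomials(a, l=1):
    """Coefficients (in powers of z^-1) of P(z) and Q(z) for A(z) = sum a[k] z^-k.

    Assumptions: m = len(a) - 1 is the order of A(z); l is a non-negative integer.
    """
    m = len(a) - 1
    n = m + l
    padded = [a[j] if j <= m else 0.0 for j in range(n + 1)]
    # z^(-m-l) * A(z^-1) has coefficient a[n - j] at power z^-j.
    mirrored = [a[n - j] if 0 <= n - j <= m else 0.0 for j in range(n + 1)]
    p = [x + y for x, y in zip(padded, mirrored)]
    q = [x - y for x, y in zip(padded, mirrored)]
    return p, q


def _circle_value(coeffs, omega, antisymmetric):
    """Real-valued function whose zeros are the unit-circle roots of the polynomial."""
    n = len(coeffs) - 1
    total = sum(c * cmath.exp(-1j * omega * j) for j, c in enumerate(coeffs))
    total *= cmath.exp(1j * omega * n / 2)  # removes the linear phase term
    return total.imag if antisymmetric else total.real


def unit_circle_zeros(coeffs, antisymmetric, steps=512):
    """Angles in (0, pi) where the polynomial vanishes, via sign changes and bisection."""
    grid = [math.pi * (i + 0.5) / steps for i in range(steps)]
    zeros = []
    for lo, hi in zip(grid, grid[1:]):
        f_lo = _circle_value(coeffs, lo, antisymmetric)
        f_hi = _circle_value(coeffs, hi, antisymmetric)
        if f_lo == 0.0:
            zeros.append(lo)
            continue
        if f_lo * f_hi >= 0:
            continue
        for _ in range(50):  # bisection on the bracketed sign change
            mid = 0.5 * (lo + hi)
            f_mid = _circle_value(coeffs, mid, antisymmetric)
            if f_lo * f_mid <= 0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        zeros.append(0.5 * (lo + hi))
    return zeros


# First-order example: A(z) = 1 - 0.9 z^-1.
p, q = lsp_polynomials([1.0, -0.9])
frequencies = sorted(unit_circle_zeros(p, False) + unit_circle_zeros(q, True))
```

For this first-order example, P(z) is symmetric and Q(z) is antisymmetric, and the single frequency value found lies at the angle whose cosine is 0.9.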

Method for speech coding, method for speech decoding and their apparatuses
09852740 · 2017-12-26

High-quality speech is reproduced with a small amount of data in speech coding and decoding, which perform compression coding and decoding of a speech signal into a digital signal. In a speech coding method based on code-excited linear prediction (CELP), the noise level of the speech in the coding period concerned is evaluated using a code or coding result of at least one of spectrum information, power information, and pitch information, and different excitation codebooks are used depending on the evaluation result.

Apparatus and method realizing a fading of an MDCT spectrum to white noise prior to FDNS application

An apparatus for decoding an encoded audio signal to obtain a reconstructed audio signal includes a receiving interface for receiving one or more frames comprising information on a plurality of audio signal samples of an audio signal spectrum of the encoded audio signal, and a processor for generating the reconstructed audio signal. The processor is configured to generate the reconstructed audio signal by fading a modified spectrum to a target spectrum, if a current frame is not received by the receiving interface or if the current frame is received by the receiving interface but is corrupted, wherein the modified spectrum includes a plurality of modified signal samples, wherein, for each of the modified signal samples of the modified spectrum, an absolute value of the modified signal sample is equal to an absolute value of one of the audio signal samples of the audio signal spectrum.

APPARATUS AND METHOD FOR SELECTING ONE OF A FIRST ENCODING ALGORITHM AND A SECOND ENCODING ALGORITHM USING HARMONICS REDUCTION

An apparatus for selecting one of a first encoding algorithm and a second encoding algorithm includes a filter configured to receive an audio signal, to reduce the amplitude of harmonics in the audio signal and to output a filtered version of the audio signal. First and second estimators are provided for estimating first and second quality measures in the form of SNRs or segmented SNRs associated with the first and second encoding algorithms without actually encoding and decoding the portion of the audio signal using the first and second encoding algorithms. A controller is provided for selecting the first encoding algorithm or the second encoding algorithm based on a comparison between the first quality measure and the second quality measure.

Audio signal processing system for discontinuity correction

An audio signal processing device comprises a discontinuity detector configured to determine an occurrence of a discontinuity from a sudden increase of an amplitude of decoded audio obtained by decoding the first audio packet which is received correctly after an occurrence of a packet loss, and a discontinuity corrector for correcting the discontinuity of the decoded audio.
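
The detector and corrector described above can be illustrated with a short Python sketch. The specifics are assumptions made for this example only: the "sudden increase of an amplitude" is measured as a power ratio between the last concealed frame and the first correctly received frame, the threshold of 4.0 is arbitrary, and the correction is a linear gain ramp. The abstract does not specify any of these, and the function names are hypothetical.

```python
import math


def frame_power(frame):
    """Mean squared amplitude of one frame of decoded samples."""
    return sum(x * x for x in frame) / len(frame)


def detect_discontinuity(concealed_frame, first_good_frame, ratio_threshold=4.0):
    """Flag a discontinuity when power jumps sharply after a packet loss.

    Assumption: the jump is measured against the last concealed frame.
    """
    prev = max(frame_power(concealed_frame), 1e-12)
    return frame_power(first_good_frame) > ratio_threshold * prev


def correct_discontinuity(concealed_frame, first_good_frame):
    """Attenuate the start of the good frame and ramp the gain linearly back to 1."""
    prev = max(frame_power(concealed_frame), 1e-12)
    cur = max(frame_power(first_good_frame), 1e-12)
    start_gain = min(1.0, math.sqrt(prev / cur))
    n = len(first_good_frame)
    span = max(n - 1, 1)
    return [
        x * (start_gain + (1.0 - start_gain) * i / span)
        for i, x in enumerate(first_good_frame)
    ]


concealed = [0.1] * 8   # faded-out concealment output during the loss
recovered = [1.0] * 8   # first correctly received frame after the loss
if detect_discontinuity(concealed, recovered):
    recovered = correct_discontinuity(concealed, recovered)
```

The ramp keeps the first sample at roughly the level of the concealed audio and restores the full decoded amplitude by the end of the frame.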

Systems, methods, apparatus, and computer-readable media for adaptive formant sharpening in linear prediction coding

A method of processing an audio signal includes determining an average signal-to-noise ratio for the audio signal over time. The method also includes determining a formant-sharpening factor based on the determined average signal-to-noise ratio. The method further includes applying a filter that is based on the determined formant-sharpening factor to a codebook vector that is based on information from the audio signal.

AUDIO SIGNAL PROCESSING DEVICE, AUDIO SIGNAL PROCESSING METHOD, AND AUDIO SIGNAL PROCESSING PROGRAM

An audio signal processing device comprises a discontinuity detector configured to determine an occurrence of a discontinuity from a sudden increase of an amplitude of decoded audio obtained by decoding the first audio packet which is received correctly after an occurrence of a packet loss, and a discontinuity corrector for correcting the discontinuity of the decoded audio.

Stereo Encoding Method and Apparatus, and Stereo Decoding Method and Apparatus
20220122619 · 2022-04-21

A stereo encoding method includes: performing downmix processing on a left channel signal of a current frame and a right channel signal of the current frame to obtain a primary channel signal of the current frame and a secondary channel signal of the current frame; and when determining to perform differential encoding on a pitch period of the secondary channel signal, performing differential encoding on the pitch period of the secondary channel signal using an estimated pitch period value of the primary channel signal to obtain a pitch period index value of the secondary channel signal, where the pitch period index value of the secondary channel signal is used to generate a to-be-sent stereo encoded bitstream.
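
The differential encoding step can be sketched as follows. This is an illustrative sketch under assumptions not stated in the abstract: the downmix is a simple sum/difference, pitch periods are integers in samples, and the index is a clamped offset quantized with a fixed number of bits. Function names and the 4-bit default are hypothetical.

```python
def downmix(left, right):
    """Assumed sum/difference downmix into primary and secondary channel signals."""
    primary = [(l + r) / 2 for l, r in zip(left, right)]
    secondary = [(l - r) / 2 for l, r in zip(left, right)]
    return primary, secondary


def encode_pitch_index(secondary_pitch, primary_pitch_estimate, bits=4):
    """Encode the secondary pitch period as an offset from the primary estimate.

    Offsets outside the representable range are clamped (an assumption).
    """
    half = 1 << (bits - 1)
    diff = secondary_pitch - primary_pitch_estimate
    diff = max(-half, min(half - 1, diff))
    return diff + half


def decode_pitch(index, primary_pitch_estimate, bits=4):
    """Recover the secondary pitch period from the index and the primary estimate."""
    half = 1 << (bits - 1)
    return primary_pitch_estimate + index - half


# The secondary channel's pitch period is 3 samples longer than the primary estimate.
index = encode_pitch_index(83, 80)
restored = decode_pitch(index, 80)
```

Because the two channels of a stereo signal usually have similar pitch periods, the offset needs far fewer bits than the absolute value; the decoder must form the same primary-channel estimate for the scheme to round-trip.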

Apparatus and method for improved signal fade out in different domains during error concealment

An apparatus for decoding an audio signal is provided, having a receiving interface, configured to receive a first frame having a first audio signal portion of the audio signal, and configured to receive a second frame having a second audio signal portion of the audio signal; a noise level tracing unit, wherein the noise level tracing unit is configured to determine noise level information depending on at least one of the first audio signal portion and the second audio signal portion; a first reconstruction unit for reconstructing, in a first reconstruction domain, a third audio signal portion of the audio signal depending on the noise level information; a transform unit for transforming the noise level information to a second reconstruction domain; and a second reconstruction unit for reconstructing, in the second reconstruction domain, a fourth audio signal portion of the audio signal depending on the noise level information.