G10L2019/0011

Audio signal discontinuity correction processing system

An audio signal processing system and method are executed by an audio signal processing device to decode an audio packet to obtain decoded audio and to determine an occurrence of a discontinuity that occurs with a sudden increase in the amplitude of the decoded audio. The audio packet may be one received correctly after an occurrence of a packet loss, and the discontinuity is corrected to improve subjective quality of the decoded audio, wherein correcting the discontinuity of the decoded audio comprises causing distances between ISF/LSF parameters corresponding to a frame in which a packet loss has occurred to be equal.
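
As a rough illustration of "causing distances between ISF/LSF parameters to be equal", the following minimal sketch respaces a parameter vector uniformly. The function name `equalize_lsf_spacing` and the choice to keep the first and last values as fixed end points are assumptions made for illustration; the abstract does not say which end points are used.

```python
def equalize_lsf_spacing(lsf):
    """Return a copy of `lsf` with equal distances between neighbours.

    Assumption: the first and last values are kept as end points and the
    values in between are spread uniformly.
    """
    n = len(lsf)
    if n < 2:
        return list(lsf)
    step = (lsf[-1] - lsf[0]) / (n - 1)
    return [lsf[0] + i * step for i in range(n)]


# Hypothetical parameters of a lost frame (radians), tightly clustered.
smoothed = equalize_lsf_spacing([0.1, 0.15, 0.9, 1.0])
```

Widening tightly clustered parameters in this way flattens sharp spectral peaks, which is one plausible reason such a correction would limit a sudden amplitude increase.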

Self-supervised pitch estimation

Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
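
The training signal described above can be sketched as follows, assuming the audio sample has already been converted to a log-frequency magnitude frame so that a shift by a whole number of bins corresponds to a known pitch shift. The names `shifted_pair`, `relative_pitch_error`, and `toy_encoder` are hypothetical, and the peak-picking encoder is only a stand-in for a trainable network; the update step is omitted.

```python
import random


def shifted_pair(frame, max_shift, out_len, rng):
    """Crop two windows from one log-frequency frame at random offsets.

    Cropping at offset k moves every spectral peak down by k bins,
    which acts as a known pitch shift.
    """
    k1 = rng.randint(0, max_shift)
    k2 = rng.randint(0, max_shift)
    return frame[k1:k1 + out_len], frame[k2:k2 + out_len], k1, k2


def relative_pitch_error(pred1, pred2, k1, k2, bins_per_semitone=1.0):
    """Predicted pitch difference minus the known shift, in semitones."""
    known = (k2 - k1) / bins_per_semitone
    return (pred1 - pred2) - known


def toy_encoder(window):
    """Stand-in for the trainable encoder: index of the largest bin."""
    return max(range(len(window)), key=lambda i: window[i])


rng = random.Random(0)
frame = [0.0] * 40
frame[20] = 1.0  # a single spectral peak
w1, w2, k1, k2 = shifted_pair(frame, max_shift=8, out_len=32, rng=rng)
err = relative_pitch_error(toy_encoder(w1), toy_encoder(w2), k1, k2)
```

In training, a loss built from `err` (for example its absolute value) would drive the encoder update; because only pitch differences are supervised, the output remains relative until it is calibrated with labeled samples.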

Apparatus and method for generating an adaptive spectral shape of comfort noise

An apparatus for decoding an encoded audio signal to obtain a reconstructed audio signal is provided, having: a receiving interface for receiving one or more frames, a coefficient generator, and a signal reconstructor. The coefficient generator is configured to determine one or more first audio signal coefficients, and one or more noise coefficients. Moreover, the coefficient generator is configured to generate one or more second audio signal coefficients, depending on the one or more first audio signal coefficients and depending on the one or more noise coefficients. The audio signal reconstructor is configured to reconstruct a first portion of the reconstructed audio signal depending on the one or more first audio signal coefficients and the audio signal reconstructor is configured to reconstruct a second portion of the reconstructed audio signal depending on the one or more second audio signal coefficients, if the current frame is not received by the receiving interface or if the current frame being received by the receiving interface is corrupted.
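
One simple way to derive the second coefficients from the first coefficients and the noise coefficients is a linear cross-fade. The sketch below assumes that rule purely for illustration; the abstract only says the second coefficients depend on both inputs, and the name `blend_toward_noise` and the `fade` parameter are hypothetical.

```python
def blend_toward_noise(signal_coeffs, noise_coeffs, fade):
    """Cross-fade signal coefficients toward noise coefficients.

    fade = 0.0 returns the signal coefficients unchanged,
    fade = 1.0 returns the noise coefficients.
    """
    if len(signal_coeffs) != len(noise_coeffs):
        raise ValueError("coefficient vectors must have the same length")
    if not 0.0 <= fade <= 1.0:
        raise ValueError("fade must lie in [0, 1]")
    return [(1.0 - fade) * s + fade * n
            for s, n in zip(signal_coeffs, noise_coeffs)]


# Halfway between the last good spectral shape and the noise shape.
second = blend_toward_noise([1.0, 2.0], [0.0, 0.0], 0.5)
```

Increasing `fade` over consecutive lost or corrupted frames would move the spectral shape gradually from the last good frame toward the comfort noise shape.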

Audio signal discontinuity processing system

An audio signal processing device comprises a discontinuity detector configured to determine an occurrence of a discontinuity from a sudden increase of an amplitude of decoded audio obtained by decoding the first audio packet which is received correctly after an occurrence of a packet loss, and a discontinuity corrector for correcting the discontinuity of the decoded audio by changing, in a state buffer, a distance between elements of Immittance Spectral Frequency/Line Spectral Frequency (ISF/LSF) parameters of a past frame.

CONCEPT FOR ENCODING OF INFORMATION

An information encoder for encoding an information signal includes: a converter for converting the linear prediction coefficients of the predictive polynomial A(z) to frequency values $f_1 \ldots f_n$ of a spectral frequency representation of the predictive polynomial A(z), wherein the converter is configured to determine the frequency values $f_1 \ldots f_n$ by analyzing a pair of polynomials P(z) and Q(z) being defined as

$$P(z) = A(z) + z^{-m-l} A(z^{-1}) \quad \text{and} \quad Q(z) = A(z) - z^{-m-l} A(z^{-1}),$$

wherein m is an order of the predictive polynomial A(z) and l is greater than or equal to zero, wherein the converter is configured to obtain the frequency values by establishing a strictly real spectrum derived from P(z) and a strictly imaginary spectrum from Q(z) and by identifying zeros of the strictly real spectrum derived from P(z) and the strictly imaginary spectrum derived from Q(z).

Concept for encoding of information

An information encoder for encoding an information signal includes: a converter for converting the linear prediction coefficients of the predictive polynomial A(z) to frequency values $f_1 \ldots f_n$ of a spectral frequency representation of the predictive polynomial A(z), wherein the converter is configured to determine the frequency values $f_1 \ldots f_n$ by analyzing a pair of polynomials P(z) and Q(z) being defined as
$$P(z) = A(z) + z^{-m-l} A(z^{-1}) \quad \text{and} \quad Q(z) = A(z) - z^{-m-l} A(z^{-1}),$$
wherein m is an order of the predictive polynomial A(z) and l is greater than or equal to zero, wherein the converter is configured to obtain the frequency values by establishing a strictly real spectrum derived from P(z) and a strictly imaginary spectrum from Q(z) and by identifying zeros of the strictly real spectrum derived from P(z) and the strictly imaginary spectrum derived from Q(z).
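
The zero search described above can be sketched in a few lines. Because P(z) has symmetric coefficients and Q(z) has antisymmetric ones, multiplying each spectrum by a phase term of half the polynomial length leaves a purely real function for P and a purely imaginary one for Q. This sketch evaluates both directly on a uniform grid and refines sign changes by bisection; that evaluation strategy, the function name `lsf_from_lpc`, and the default `l=1` are assumptions for illustration, not the method claimed in the abstract.

```python
import math


def lsf_from_lpc(a, l=1, grid=512, iters=40):
    """Frequencies in (0, pi) where the spectra of P and Q are zero.

    `a` holds the coefficients of A(z) = a[0] + a[1] z^-1 + ... + a[m] z^-m.
    """
    m = len(a) - 1
    n = m + l
    ext = list(a) + [0.0] * l
    # z^(-m-l) A(1/z) is the zero-padded coefficient list reversed.
    p = [ext[k] + ext[n - k] for k in range(n + 1)]
    q = [ext[k] - ext[n - k] for k in range(n + 1)]

    def real_p(w):  # P(e^{jw}) * e^{jwn/2}, strictly real
        return sum(c * math.cos(w * (n / 2 - k)) for k, c in enumerate(p))

    def imag_q(w):  # imaginary part of Q(e^{jw}) * e^{jwn/2}
        return sum(c * math.sin(w * (n / 2 - k)) for k, c in enumerate(q))

    def zeros(f):
        found = []
        pts = [math.pi * i / grid for i in range(1, grid)]
        for lo, hi in zip(pts, pts[1:]):
            flo = f(lo)
            if flo * f(hi) < 0:  # sign change brackets a zero
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    fmid = f(mid)
                    if flo * fmid <= 0:
                        hi = mid
                    else:
                        lo, flo = mid, fmid
                found.append(0.5 * (lo + hi))
        return found

    return sorted(zeros(real_p) + zeros(imag_q))


# Second-order example: poles at radius 0.9 and angle pi/3.
freqs = lsf_from_lpc([1.0, -0.9, 0.81])
```

The search is restricted to the open interval (0, π), so the trivial zeros at 0 and π that appear for some combinations of m and l are not reported.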

APPARATUS AND METHOD REALIZING IMPROVED CONCEPTS FOR TCX LTP

An apparatus for decoding an encoded audio signal to obtain a reconstructed audio signal is provided. The apparatus includes a receiving interface, a delay buffer, a sample selector for selecting a plurality of selected audio signal samples, and a sample processor for processing the selected audio signal samples to obtain reconstructed audio signal samples of the reconstructed audio signal. The sample selector is configured to select, if a current frame is received by the receiving interface and if the current frame being received by the receiving interface is not corrupted, the plurality of selected audio signal samples from the audio signal samples being stored in the delay buffer depending on a pitch lag information being included by the current frame.
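
A minimal sketch of selecting samples from a delay buffer according to a pitch lag is given below. The function name `select_ltp_samples`, the convention that the newest sample sits at the end of the buffer, and the periodic repetition when more samples are requested than one pitch cycle holds are all assumptions for illustration.

```python
def select_ltp_samples(delay_buffer, pitch_lag, count):
    """Pick `count` samples starting `pitch_lag` samples before the end.

    Assumption: the newest sample is at the end of `delay_buffer`, and
    the last pitch cycle is repeated if `count` exceeds `pitch_lag`.
    """
    if not 0 < pitch_lag <= len(delay_buffer):
        raise ValueError("pitch_lag must lie in 1..len(delay_buffer)")
    start = len(delay_buffer) - pitch_lag
    return [delay_buffer[start + (i % pitch_lag)] for i in range(count)]


# Eight buffered samples, a pitch lag of three, five samples requested.
selected = select_ltp_samples([0, 1, 2, 3, 4, 5, 6, 7], pitch_lag=3, count=5)
```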

APPARATUS AND METHOD FOR IMPROVED SIGNAL FADE OUT IN DIFFERENT DOMAINS DURING ERROR CONCEALMENT

An apparatus for decoding an audio signal is provided, having a receiving interface, configured to receive a first frame having a first audio signal portion of the audio signal, and configured to receive a second frame having a second audio signal portion of the audio signal; a noise level tracing unit, wherein the noise level tracing unit is configured to determine noise level information depending on at least one of the first audio signal portion and the second audio signal portion; a first reconstruction unit for reconstructing, in a first reconstruction domain, a third audio signal portion of the audio signal depending on the noise level information; a transform unit for transforming the noise level information to a second reconstruction domain; and a second reconstruction unit for reconstructing, in the second reconstruction domain, a fourth audio signal portion of the audio signal depending on the noise level information.
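
The idea of tracing a noise level and fading toward it during concealment can be illustrated with the sketch below. The crude minimum tracker, the geometric fade, and the names `trace_noise_level` and `fade_levels` are assumptions for illustration; the sketch works on scalar levels only and does not model the transform between reconstruction domains.

```python
def trace_noise_level(levels, rise=1.05):
    """Crude minimum tracker: follow drops at once, rise only slowly."""
    noise = levels[0]
    for value in levels[1:]:
        noise = min(value, noise * rise)
    return noise


def fade_levels(last_level, noise_level, lost_frames, damping=0.5):
    """Target level for each consecutive lost frame.

    Each step closes a fixed fraction of the remaining gap, so the
    level converges on `noise_level` rather than on silence.
    """
    targets = []
    level = last_level
    for _ in range(lost_frames):
        level = noise_level + damping * (level - noise_level)
        targets.append(level)
    return targets


# Hypothetical per-frame levels of correctly received frames.
noise = trace_noise_level([1.0, 0.5, 2.0])
targets = fade_levels(last_level=1.0, noise_level=0.2, lost_frames=3)
```

Fading toward a traced noise level instead of toward zero avoids the audible drop to silence during longer bursts of lost frames.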

Apparatus and method for improved signal fade out for switched audio coding systems during error concealment

An apparatus for decoding an audio signal includes a receiving interface, wherein the receiving interface is configured to receive a first frame and a second frame. Moreover, the apparatus includes a noise level tracing unit for determining noise level information being represented in a tracing domain. Furthermore, the apparatus includes a first reconstruction unit for reconstructing a third audio signal portion of the audio signal depending on the noise level information and a second reconstruction unit for reconstructing a fourth audio signal portion depending on noise level information being represented in the second reconstruction domain.

APPARATUS AND METHOD FOR IMPROVED SIGNAL FADE OUT FOR SWITCHED AUDIO CODING SYSTEMS DURING ERROR CONCEALMENT

An apparatus for decoding an audio signal includes a receiving interface, wherein the receiving interface is configured to receive a first frame and a second frame. Moreover, the apparatus includes a noise level tracing unit for determining noise level information being represented in a tracing domain. Furthermore, the apparatus includes a first reconstruction unit for reconstructing a third audio signal portion of the audio signal depending on the noise level information and a second reconstruction unit for reconstructing a fourth audio signal portion depending on noise level information being represented in the second reconstruction domain.