Method of encoding, method of decoding, encoder, and decoder of an audio signal using transformation of frequencies of sinusoids
10734005 · 2020-08-04
Assignee
Inventors
- Maciej Bartkowiak (Poznan, PL)
- Tomasz Zernicki (Poznan, PL)
- Lukasz Januszkiewicz (Jastrowie, PL)
- Marcin Chryszczanowicz (Poznan, PL)
Cpc classification
G10L19/12
PHYSICS
G10L19/087
PHYSICS
G10L19/02
PHYSICS
G06F17/14
PHYSICS
International classification
G10L19/20
PHYSICS
G06Q10/08
PHYSICS
G06F17/14
PHYSICS
G10L19/087
PHYSICS
G10L19/02
PHYSICS
G10L19/005
PHYSICS
G10L19/008
PHYSICS
G10L19/12
PHYSICS
Abstract
The invention concerns an audio signal encoding method comprising the steps of: collecting the audio signal samples, determining sinusoidal components in subsequent frames, estimating the amplitudes and frequencies of the components for each frame, merging the pairs thus obtained into sinusoidal trajectories, splitting particular trajectories into segments, transforming particular trajectories, comprising their amplitude and frequency variations, to the frequency domain by means of a digital transform performed on segments longer than the frame duration, quantization and selection of transform coefficients in the segments, entropy encoding, and outputting the quantized coefficients as output data. The method is characterized in that the length of the segments into which each trajectory is split is individually adjusted in time for each trajectory.
Claims
1. An audio signal encoding method comprising the steps of: collecting audio signal samples, determining sinusoidal components in subsequent frames, estimation of value of amplitude and value of frequency of sinusoidal components in each frame, compression of the values of amplitude and the values of frequency to obtain output data, characterized in that compression of the values of the amplitudes of sinusoidal components and values of frequency of sinusoidal components includes steps of: forming rows of values of amplitudes of sinusoidal components determined in subsequent frames, wherein each row comprises values of amplitude of sinusoidal trajectory in subsequent frames, and forming rows of frequencies of sinusoidal components determined in subsequent frames to obtain sinusoidal trajectories, wherein each row comprises values of frequency of sinusoidal trajectory in subsequent frames, splitting particular trajectories into segments having individual lengths greater than one frame, representing the amplitudes of sinusoidal components and frequencies of sinusoidal components in logarithmic scale, by computing logarithms of amplitudes and frequencies for all frames of the segment, transforming rows of amplitudes of sinusoidal components represented in logarithmic scale over the segments to the frequency domain by means of orthogonal transform and transforming rows of frequencies of sinusoidal components represented in logarithmic scale over the segments to the frequency domain by means of orthogonal transform to obtain transform coefficients, quantization of the transform coefficients with quantization levels to obtain quantized transform coefficients, and selection in the segments of the quantized transform coefficients to be encoded and forming arrays of their indices and discarding the remaining quantized transform coefficients, entropy encoding of only selected quantized transform coefficients together with the arrays of their indices to obtain output data.
2. The method according to the claim 1 characterized in that, in the step of quantization, the quantization levels are selected individually for each trajectory.
3. The method according to the claim 2 characterized in that the quantization levels are adjusted in subsequent segments.
4. The method according to the claim 1 characterized in that entropy encoding of only selected quantized transform coefficients together with the arrays of their indices also involves encoding at least one noise distribution parameter corresponding to total energy of discarded coefficients.
5. The method according to the claim 4, characterized in that, entropy encoding of only selected quantized transform coefficients together with the arrays of their indices and together with at least one noise distribution parameter involves also encoding of at least one additional noise parameter indicating the type of noise distribution.
6. The method according to the claim 1 characterized in that the individually adjusted length of the segments into which each trajectory is split is determined in an optimization process, wherein minimization of output data rate is set as an optimization criterion.
7. The method according to claim 1 characterized in that a number of coefficients subjected to encoding with entropy code is selected individually in each segment.
8. The method according to claim 1 characterized in that the quantized coefficients are outputted in such a way that coefficients obtained from trajectories being a continuation of trajectories encoded in the previous groups of segments are outputted first.
9. An audio signal encoder comprising an analog-to-digital converter and a processing unit characterized in that processing unit is adapted to execute a method as defined in claim 1.
10. An audio signal decoding method comprising the steps of: retrieving encoded data, decoding sinusoidal components from the encoded data, synthesis of the audio signal from the sinusoidal components, characterized in that decoding includes entropy decoding of encoded quantized transform coefficients, array of their indices, reconstruction of vectors of quantized transform coefficients, scaling quantized transform coefficients, subjecting the quantized transform coefficients to an inverse orthogonal frequency transform to obtain rows of amplitudes of sinusoidal components represented in logarithmic scale and rows of frequencies of sinusoidal components represented in logarithmic scale over a segment of sinusoidal trajectory, converting values of frequency and amplitude back to linear scale with exponential operation, and reconstructing segments of trajectories, merging newly decoded segments to segments decoded already to recover the continuity of the sinusoidal trajectories.
11. The method according to the claim 10, characterized in that it includes decoding of at least one noise parameter and before subjecting the quantized coefficients to an inverse transform it includes reconstruction of discarded quantized coefficients with noise generated on a basis of the at least one noise parameter.
12. The method according to the claim 11 characterized in that it includes decoding of additional parameter specifying distribution of noise used for reconstruction of discarded quantized coefficients with noise generated on a basis of the at least one noise parameter.
13. The method according to the claim 10, characterized in that quantized transform coefficients corresponding to segments continued from segments reconstructed in previous groups of segments are outputted in the order of reconstruction of the segments decoded in previous groups of segments.
14. An audio signal decoder, comprising a digital-to-analog converter and a processing unit characterized in that processing unit is adapted to execute a method as defined in claim 10.
Description
SHORT DESCRIPTION OF DRAWINGS
(1) The embodiments of the present invention are illustrated in the drawing, in which
(2)
(3)
(4)
(5)
(6)
DETAILED DESCRIPTIONS OF EMBODIMENTS OF THE INVENTION
(7) In the first embodiment the method according to the invention was implemented in an encoder according to the invention shown in
(8) A block diagram of the decoder 210 according to the invention is shown in
(9) The signal processor 112 performs encoding according to the flowchart illustrated in
(10) For the purpose of encoding, each trajectory describing the change of frequency or the change of amplitude of the sinusoidal component is in step 315 split into segments having a length of N frames. For each segment, the frequency and amplitude values are in blocks 316 and 317 represented in the logarithmic scale, according to the formula:
$$x_{\log}(n,k) = \log_a x(n,k)$$
in which $x(n,k)$ represents the amplitude or the frequency of a single signal component indicated by index $k$ in the $n$-th frame, where $k$ belongs to the range from 1 to $K$, $n$ belongs to the range from 0 to $N-1$, and $a$ is a selected logarithm base. A vector containing the values of $x_{\log}(n,k)$ corresponding to the current segment is transformed to the frequency domain by means of an orthogonal transform 318, 319, such as the discrete cosine transform known from the literature: N. Ahmed, T. Natarajan, K. R. Rao, Discrete Cosine Transform, IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90-93, January 1974; or by means of any other suitable transform returning a vector of spectral coefficients $X(m,k)$, according to the formula:
$$X(m,k) = w_m \sum_{n=0}^{N-1} x_{\log}(n,k)\,\varphi_m(n)$$
in which $\varphi_m(n)$ is a transform base function representing the $m$-th spectral component, $m$ belongs to the range from 0 to $N-1$, and $w_m$ is a normalization factor of the function. The values of the transform coefficients $X(m,k)$ are independently quantized in the quantization step by means of quantization units 320, 321, using a quantization step size that provides an appropriately low frequency and amplitude error of the signal reconstructed at the decoder, for example a frequency error of less than 10 cents and an amplitude error of less than 1 dB. The quantization methods and the methods of quantization step size selection are known to those skilled in the art and described in detail for example in: L. R. Rabiner, R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978, and M. Bosi, R. E. Goldberg, Introduction to Digital Audio Coding and Standards, Springer, 2003. A key step for obtaining a high compression ratio is the step of quantization and selection of only a few quantized coefficients $X(m,k)$ for further encoding. Selection blocks 322 and 323 perform this step, discarding all the coefficients with absolute values below a certain threshold or arbitrarily discarding a selected number of coefficients with the smallest absolute values. In the next steps the array of indices of the selected coefficients 324, 326 and the array of values of the selected coefficients 325, 327 are encoded. The coefficients that are not selected are lost. Preferably an additional parameter, ACEnergy, representing their total energy is sent in their place. This enables the decoder to reconstruct coefficients corresponding to the lost ones in such a way that the total signal energy is not changed, which is beneficial for human perception of the sound.
A further improvement can be obtained by sending information about the shape of the envelope of the lost coefficients in the form of a second parameter, which may take three values corresponding to a Poisson, Gaussian or exponential function, respectively.
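The per-segment encoding steps described above (logarithm, orthogonal transform, quantization, selection, and the ACEnergy parameter) can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names `dct_orthonormal` and `encode_segment`, the parameters `step` and `keep`, the default base 2, the choice of an orthonormal DCT-II as the transform, and the computation of ACEnergy from the unquantized discarded coefficients are all hypothetical choices made for the example.

```python
import math

def dct_orthonormal(x):
    """Orthonormal DCT-II of a list of floats (one possible orthogonal transform)."""
    n_frames = len(x)
    coeffs = []
    for m in range(n_frames):
        # w_m is the normalization factor of the m-th basis function.
        w_m = math.sqrt((1.0 if m == 0 else 2.0) / n_frames)
        acc = sum(x[n] * math.cos(math.pi * (2 * n + 1) * m / (2 * n_frames))
                  for n in range(n_frames))
        coeffs.append(w_m * acc)
    return coeffs

def encode_segment(values, step, keep, base=2.0):
    """Encode one trajectory segment of positive amplitude or frequency values.

    Returns (indices, selected, ac_energy). All names are illustrative.
    """
    # Logarithmic representation of every frame of the segment.
    x_log = [math.log(v, base) for v in values]
    # Transform of the logarithmic row to the frequency domain.
    coeffs = dct_orthonormal(x_log)
    # Uniform quantization with a single step size (an assumption).
    quantized = [round(c / step) for c in coeffs]
    # Keep the `keep` coefficients with the largest absolute values.
    ranked = sorted(range(len(quantized)), key=lambda m: -abs(quantized[m]))
    indices = sorted(ranked[:keep])
    selected = [quantized[m] for m in indices]
    # Total energy of the discarded coefficients (assumed definition of ACEnergy).
    kept = set(indices)
    ac_energy = sum(c * c for m, c in enumerate(coeffs) if m not in kept)
    return indices, selected, ac_energy
```

For a perfectly steady trajectory, for example eight frames at 440 Hz, the logarithmic row is constant, so only the coefficient with index 0 is non-zero and the discarded energy is negligible. This is the property that makes long segments of slowly varying trajectories cheap to encode.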
(11) At the next stage 328 the content of all arrays is encoded using one of the known entropy encoding techniques, such as the Huffman code known from: D. A. Huffman, A Method for the Construction of Minimum-Redundancy Codes, Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, September 1952, which returns an output compressed data sequence 115.
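As an illustration of the entropy encoding stage, the sketch below builds a Huffman code for an array of integer symbols (such as quantized coefficient values or their indices) and encodes it as a bit string. The function names `huffman_code`, `huffman_encode` and `huffman_decode` are hypothetical, and the text does not state how the code tables are built or transmitted, so this is a generic textbook construction rather than the encoder's actual bitstream format.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a dict mapping each symbol to its Huffman codeword (a bit string)."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1 and merge them.
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_encode(symbols, table):
    """Concatenate the codewords of all symbols."""
    return "".join(table[s] for s in symbols)

def huffman_decode(bits, table):
    """Invert huffman_encode using the same code table."""
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Because the quantized coefficients of smooth trajectories are dominated by a few frequently occurring values, the most frequent symbols receive the shortest codewords.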
(12)
(13) The energy restoring unit operates on the AC coefficients of both the frequency trajectory and the amplitude trajectory. This introduces a certain randomness into the signal, namely the noise that was lost in the encoding process. The energy distribution modelled with the Poisson, Gaussian or exponential function corresponds to the shape of the distribution occurring in natural musical signals.
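One way to picture the energy restoration is sketched below. The text does not specify how the noise is generated or scaled, so the following is an assumption made only for illustration: the discarded coefficient positions are filled with Gaussian random values that are rescaled so that their total energy equals the decoded ACEnergy value. The function name `restore_discarded` and its parameters are hypothetical, and only the Gaussian case of the three distribution shapes is shown.

```python
import math
import random

def restore_discarded(n_coeffs, indices, values, ac_energy, rng=None):
    """Rebuild a full coefficient vector, filling gaps with energy-matched noise.

    indices/values: positions and dequantized values of the transmitted coefficients.
    ac_energy: total energy of the discarded coefficients (decoded parameter).
    """
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    coeffs = [0.0] * n_coeffs
    for i, v in zip(indices, values):
        coeffs[i] = v
    kept = set(indices)
    missing = [m for m in range(n_coeffs) if m not in kept]
    # Draw Gaussian noise for every discarded position (assumed distribution).
    noise = [rng.gauss(0.0, 1.0) for _ in missing]
    energy = sum(z * z for z in noise)
    if missing and energy > 0.0 and ac_energy > 0.0:
        # Scale the noise so that its total energy equals ac_energy.
        gain = math.sqrt(ac_energy / energy)
        for m, z in zip(missing, noise):
            coeffs[m] = gain * z
    return coeffs
```

The transmitted coefficients are left untouched; only the positions that the encoder discarded receive noise, so the total energy of the coefficient vector matches that of the original.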
(14) Next, the inverse transform 416, 417 is computed according to the formula:
$$\hat{x}_{\log}(n,k) = v_n \sum_{m=0}^{N-1} \hat{X}(m,k)\,\varphi_m(n),$$
where $\hat{X}(m,k)$ stands for the reconstructed value of the quantized transform coefficient, $\hat{x}_{\log}(n,k)$ is the reconstructed logarithmic value of the signal frequency or the reconstructed logarithmic value of the signal amplitude in the frame indicated by index $n$ for the trajectory of the sinusoidal component indicated by $k$ of the decoded signal in the current trajectory segment, $\varphi_m(n)$ is the base function of the transform inverse to the one used in the encoding process, and $v_n$ is a normalization factor of the function. Encoding techniques using a transform are widely known from the literature, e.g.: N. S. Jayant, P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice-Hall, 1984, and K. Sayood, Introduction to Data Compression, Morgan Kaufmann, 2000. In the subsequent step the reconstructed logarithmic values of frequency and amplitude are converted to the linear scale by means of the antilogarithm 418, 419, according to the formula:
$$\hat{x}(n,k) = a^{\hat{x}_{\log}(n,k)}$$
(15) In the above, $a$ is the logarithm base used in the encoder, while $\hat{x}_{\log}(n,k)$ is the reconstructed logarithmic frequency value or the reconstructed logarithmic amplitude value in the frame indicated by index $n$ for the current segment of the sinusoidal trajectory describing the $k$-th component of the decoded signal. In the next step of decoding, the reconstructed trajectory segments are merged in the blocks 420, 421 with the segments already decoded in order to recover the continuity of the trajectory waveforms. The last decoding step is the synthesis of the signal 214 described by the sinusoidal trajectories, the synthesis being performed in the block 422 using techniques known from the literature, e.g.: R. J. McAulay, T. F. Quatieri, Speech analysis/synthesis based on a sinusoidal representation, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-34 (4), 1986.
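The decoder-side steps (scaling, inverse transform, antilogarithm, and merging of segments) can be sketched as follows. The names `idct_orthonormal`, `decode_segment` and `merge_segments`, the uniform `step`, and the default base 2 are assumptions made for the example. The sketch uses the inverse of an orthonormal DCT-II, in which the normalization factor depends on the coefficient index and is therefore applied inside the sum; other orthogonal transforms would use their own normalization. Merging is shown as plain concatenation, which is the simplest possible reading of the merging step, and the final sinusoidal synthesis is not shown.

```python
import math

def idct_orthonormal(coeffs):
    """Inverse of the orthonormal DCT-II (one possible inverse orthogonal transform)."""
    n_frames = len(coeffs)
    out = []
    for n in range(n_frames):
        acc = 0.0
        for m, c in enumerate(coeffs):
            # Normalization factor of the m-th basis function.
            w_m = math.sqrt((1.0 if m == 0 else 2.0) / n_frames)
            acc += w_m * c * math.cos(math.pi * (2 * n + 1) * m / (2 * n_frames))
        out.append(acc)
    return out

def decode_segment(quantized, step, base=2.0):
    """Reconstruct linear-scale values of one trajectory segment.

    quantized: full vector of quantized coefficients (zeros where discarded).
    """
    # Scale the quantized indices back to coefficient values.
    coeffs = [q * step for q in quantized]
    # Inverse transform yields the reconstructed logarithmic values.
    x_log = idct_orthonormal(coeffs)
    # Antilogarithm converts back to the linear scale.
    return [base ** v for v in x_log]

def merge_segments(decoded_so_far, new_segment):
    """Append a newly decoded segment to the trajectory decoded so far."""
    return decoded_so_far + new_segment
```

For example, a four-frame segment whose only non-zero coefficient is the one with index 0 decodes to a constant value in every frame, which corresponds to a steady amplitude or frequency.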
(16)
(17) The disclosed invention allows a significant reduction, by a factor of several, of the number of bits required to encode the signal while maintaining good quality of the decoded signal at bit rates in the range of 8 kb/s to 16 kb/s.
(18) For one skilled in the art it is clear that the invention may be practiced in many different ways and using different conventional devices. It is clear that various modifications of the embodiments of the invention using FPGA matrices, ASIC circuits, signal processors, and other typical components are within the scope of protection.