LPC RESIDUAL SIGNAL ENCODING/DECODING APPARATUS OF MODIFIED DISCRETE COSINE TRANSFORM (MDCT)-BASED UNIFIED VOICE/AUDIO ENCODING DEVICE
20220406320 · 2022-12-22
Assignee
Inventors
- Seung Kwon BEACK (Daejeon, KR)
- Tae Jin LEE (Daejon, KR)
- Min Je KIM (Daegu, KR)
- Kyeongok Kang (Daejeon, KR)
- Dae Young Jang (Daejeon, KR)
- Jin Woo HONG (Daejeon-si, KR)
- Jeongil Seo (Daejeon, KR)
- Chieteuk AHN (Seoul, KR)
- Hochong Park (Seoul, KR)
- Young-Cheol PARK (Gangwon-do, KR)
CPC classification
G10L19/087
PHYSICS
G10L19/22
PHYSICS
International classification
G10L19/087
PHYSICS
G10L19/125
PHYSICS
G10L19/22
PHYSICS
Abstract
Disclosed is an LPC residual signal encoding/decoding apparatus of an MDCT-based unified voice/audio encoding device. The LPC residual signal encoding apparatus analyzes a property of an input signal, selects an encoding method for the LPC filtered signal, and encodes the LPC residual signal based on one of a real filterbank, a complex filterbank, and algebraic code excited linear prediction (ACELP).
Claims
1. A processing method performed by a device, comprising: identifying a previous frame which has a first characteristic to be coded by a first coding scheme; identifying a current frame which has a second characteristic to be coded by a second coding scheme; identifying additional information for cancelling a time-domain aliasing introduced by Modified Discrete Cosine Transform (MDCT); and adding (i) a first signal related to the previous frame, (ii) a second signal related to the additional information, and (iii) a third signal related to the current frame.
2. The processing method of claim 1, wherein the first characteristic is a speech characteristic, and the second characteristic is an audio characteristic.
3. The processing method of claim 1, wherein the first coding scheme is a time-domain coding scheme, and the second coding scheme is a frequency-domain coding scheme.
4. The processing method of claim 3, wherein the time-domain coding scheme includes CELP (code-excited linear prediction), and the frequency-domain coding scheme includes MDCT (Modified Discrete Cosine Transform).
5. The processing method of claim 1, wherein the first signal is derived from a portion of the previous frame, the second signal is derived from the additional information, and the third signal is derived from the current frame.
6. The processing method of claim 1, wherein the additional information is different from the previous frame.
7. The processing method of claim 1, wherein the additional information is used for restoring the current frame.
8. The processing method of claim 1, wherein the additional information has a length corresponding to a portion of an entire length of the current frame.
9. The processing method of claim 1, wherein the additional information is applied to a boundary between the previous frame and the current frame.
10. A processing method performed by a device, comprising: identifying a previous frame which has a first characteristic to be coded by a first coding scheme; identifying a current frame which has a second characteristic to be coded by a second coding scheme; identifying additional information for compensating for a time-domain aliasing introduced by Modified Discrete Cosine Transform (MDCT); determining a first signal based on the additional information; determining a second signal based on a portion of the previous frame; determining a third signal based on the current frame; and adding the first signal, the second signal, and the third signal to restore the current frame.
11. The processing method of claim 10, wherein the first characteristic is a speech characteristic, and the second characteristic is an audio characteristic.
12. The processing method of claim 10, wherein the first coding scheme is a time-domain coding scheme, and the second coding scheme is a frequency-domain coding scheme, and wherein the time-domain coding scheme includes CELP (code-excited linear prediction), and the frequency-domain coding scheme includes MDCT (Modified Discrete Cosine Transform).
13. The processing method of claim 10, wherein the additional information is different from the previous frame.
14. The processing method of claim 10, wherein the additional information has a length corresponding to a portion of an entire length of the current frame.
15. The processing method of claim 10, wherein the additional information is applied to a boundary between the previous frame and the current frame.
16. A processing method performed by a device, comprising: identifying a previous frame which has a first characteristic to be coded by a first coding scheme; identifying a current frame which has a second characteristic to be coded by a second coding scheme; processing for modifying a specific area of the previous frame to be overlap-added with the current frame; and performing overlap-add for a first signal related to the specific area of the previous frame and a second signal related to the current frame.
17. The processing method of claim 16, wherein the previous frame is divided into a first area and a second area, wherein the second area is located after the first area in the previous frame, and wherein the specific area corresponds to the second area.
18. The processing method of claim 16, wherein the specific area is modified for artificially compensating for a time-domain aliasing introduced by processing the current frame using a frequency-domain coding.
19. The processing method of claim 16, wherein the specific area is modified based on an artificial TDA (time-domain aliasing) signal.
20. The processing method of claim 16, wherein the specific area is modified using a sine window corresponding to a left portion of a window for the current frame.
Description
BRIEF DESCRIPTION OF DRAWINGS
BEST MODE FOR CARRYING OUT THE INVENTION
[0028] Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below in order to explain the present invention with reference to the figures.
[0030] Referring to
[0031] The signal analyzing unit 110 may analyze a property of an input signal and may select an encoding method for an LPC filtered signal. As an example, when the input signal is an audio signal, the input signal is encoded by the first encoding unit 120 or the second encoding unit 130, and when the input signal is a voice signal, the input signal is encoded by the third encoding unit 140. In this instance, the signal analyzing unit 110 may transfer a control command to select the encoding method, and may control one of the first encoding unit 120, the second encoding unit 130, and the third encoding unit 140 to perform encoding. Accordingly, one of a real filterbank based residual signal encoding, a complex filterbank based residual signal encoding, and an algebraic code excited linear prediction (ACELP) based residual signal encoding may be performed.
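As a sketch of this dispatch, the three-way selection can be modeled with two hypothetical flags, `is_voice` and `use_complex`; in the apparatus itself the decision comes from the signal analysis, so both flags are assumptions for illustration:

```python
def select_encoder(is_voice, use_complex):
    # Hypothetical control flags standing in for the analyzer's decision:
    # a voice signal goes to the ACELP unit (third encoding unit), while
    # an audio signal goes to the real or complex filterbank unit.
    if is_voice:
        return "acelp"                               # third encoding unit 140
    return "complex" if use_complex else "real"      # second unit 130 / first unit 120
```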
[0032] The first encoding unit 120 may encode the LPC residual signal based on the real filterbank according to the selection of the signal analyzing unit. As an example, the first encoding unit 120 may perform a modified discrete cosine transform (MDCT) based filterbank with respect to the LPC residual signal and may encode the LPC residual signal.
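A minimal sketch of the MDCT filterbank step, using the textbook kernel (the normalization is an assumption; the patent does not fix one): a frame of 2N samples yields N real coefficients.

```python
import numpy as np

def mdct(frame):
    # Forward MDCT: 2N time samples -> N real coefficients
    # (textbook kernel with the +N/2 phase shift; normalization assumed).
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return (np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) * frame).sum(axis=1)
```

For instance, a 512-sample frame produces 256 coefficients, matching the mode-1 row of Table 1 below.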
[0033] The second encoding unit 130 may encode the LPC residual signal based on the complex filterbank according to the selection of the signal analyzing unit. As an example, the second encoding unit 130 may perform a discrete Fourier transform (DFT) based filterbank with respect to the LPC residual signal, and may encode the LPC residual signal. Also, the second encoding unit 130 may perform a modified discrete sine transform (MDST) based filterbank with respect to the LPC residual signal, and may encode the LPC residual signal.
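One way to sketch the complex filterbank is to pair the MDCT (cosine kernel) as the real part with the MDST (sine kernel) as the imaginary part; this is an illustration of the MDCT/MDST pairing under assumed kernels, not the patent's exact implementation:

```python
import numpy as np

def mdct_mdst(frame):
    # Complex filterbank sketch: the MDCT supplies the real part and the
    # MDST the imaginary part, so 2N input samples yield N complex values,
    # i.e. twice the data of the real (MDCT-only) filterbank.
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    phase = np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)
    real = (np.cos(phase) * frame).sum(axis=1)   # MDCT part
    imag = (np.sin(phase) * frame).sum(axis=1)   # MDST part
    return real + 1j * imag
```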
[0034] The third encoding unit 140 may encode the LPC residual signal based on the ACELP according to the selection of the signal analyzing unit. That is, when the input signal is a voice signal, the third encoding unit 140 may encode the LPC residual signal based on the ACELP.
[0036] Referring to
[0037] That is, when the signal analyzing unit 210 analyzes the input signal and generates a control command to control a switch, one of a first encoding unit 220, a second encoding unit 230, and a third encoding unit 240 may perform encoding according to the controlling of the switch. Here, the first encoding unit 220 encodes the LPC residual signal based on the real filterbank, the second encoding unit 230 encodes the LPC residual signal based on the complex filterbank, and the third encoding unit 240 encodes the LPC residual signal based on the ACELP.
[0039] Here, when the complex filterbank is performed with respect to the same size of frame, twice the amount of data is output compared to when the real based (e.g., MDCT based) filterbank is performed, due to the imaginary part. That is, when the complex filterbank is applied to the same input, twice the amount of data needs to be encoded. However, in a case of an MDCT based residual signal, an aliasing occurs on a time axis. Conversely, in a case of a complex transform, such as a DFT, an aliasing does not occur on the time axis.
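The MDCT's time-axis aliasing, and its cancellation by overlap-add, can be demonstrated numerically. This is a sketch using the textbook MDCT/IMDCT pair (the normalization is an assumption), not the patent's codec:

```python
import numpy as np

def mdct(x):
    # Forward MDCT: 2N samples -> N coefficients.
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return (np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) * x).sum(axis=1)

def imdct(X):
    # Inverse MDCT: N coefficients -> 2N samples that still contain
    # time-domain aliasing; half-overlapped neighbor frames cancel it.
    N = len(X)
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)
    return (np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) * X).sum(axis=1) / N

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(3 * N)
# A single frame does not round-trip: aliasing remains on the time axis ...
roundtrip = imdct(mdct(x[:2 * N]))
# ... but overlap-adding two half-overlapped frames restores the shared region.
restored = imdct(mdct(x[:2 * N]))[N:] + imdct(mdct(x[N:3 * N]))[:N]
```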
[0041] Referring to
[0042] That is, when the signal analyzing unit 310 generates a control signal based on the property of the input signal and transfers a command to select an encoding method, one of the first encoding unit 320 and the second encoding unit 330 may perform encoding. In this instance, when the input signal is an audio signal, the first encoding unit 320 performs encoding, and when the input signal is a voice signal, the second encoding unit 330 performs encoding.
[0043] Here, the first encoding unit 320 may perform either a real filterbank based encoding or a complex filterbank based encoding, and may include an MDCT encoding unit (not illustrated) to perform an MDCT based encoding, an MDST encoding unit (not illustrated) to perform an MDST based encoding, and an outputting unit (not illustrated) to output at least one of an MDCT coefficient and an MDST coefficient according to the property of the input signal.
[0044] Accordingly, the first encoding unit 320 performs the MDCT based encoding and the MDST based encoding as a complex transform, and determines whether to output only the MDCT coefficient or to output both the MDCT coefficient and the MDST coefficient based on a status of the control signal of the signal analyzing unit 310.
[0046] Referring to
[0047] The audio decoding unit 410 may decode an LPC residual signal that is encoded in a frequency domain. That is, when the input signal is an audio signal, the signal is encoded in the frequency domain, and thus, the audio decoding unit 410 inversely performs the encoding process to decode the audio signal. In this instance, the audio decoding unit 410 may include a first decoding unit (not illustrated) to decode an LPC residual signal encoded based on a real filterbank, and a second decoding unit (not illustrated) to decode an LPC residual signal encoded based on a complex filterbank. The voice decoding unit 420 may decode an LPC residual signal encoded in a time domain. That is, when the input signal is a voice signal, the signal is encoded in the time domain, and thus, the voice decoding unit 420 inversely performs the encoding process to decode the voice signal.
[0048] The distortion controller 430 may compensate for a distortion between an output signal of the audio decoding unit 410 and an output signal of the voice decoding unit 420. That is, the distortion controller 430 may compensate for a discontinuity or distortion occurring when the output signal of the audio decoding unit 410 and the output signal of the voice decoding unit 420 are connected.
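One plausible compensation strategy for such a distortion controller is a short cross-fade across the decoder boundary. The patent leaves the exact method open, so the linear fade and the `overlap` parameter below are assumptions for illustration:

```python
import numpy as np

def smooth_boundary(audio_out, voice_out, overlap):
    # Cross-fade the tail of one decoder's output into the head of the
    # next decoder's output to suppress discontinuity at the connection.
    t = (np.arange(overlap) + 0.5) / overlap                    # 0 -> 1 ramp
    fade = audio_out[-overlap:] * (1.0 - t) + voice_out[:overlap] * t
    return np.concatenate([audio_out[:-overlap], fade, voice_out[overlap:]])
```

With matching signals on both sides, the cross-faded region passes the signal through unchanged, which is the desired behavior at a clean boundary.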
[0050] Referring to
[0051] Also, in an encoding process, a window applied as a preprocess of a real based (e.g., MDCT based) filterbank and a window applied as a preprocess of a complex based filterbank may be defined differently, and when the MDCT based filterbank is performed, a window may be defined as given in Table 1 below, according to a mode of a previous frame.
TABLE 1
  MDCT based residual   MDCT based residual   Number of coefficients
  filterbank mode of    filterbank mode of    transformed to a
  a previous frame      a current frame       frequency domain        ZL    L    M    R    ZR
  1, 2, 3               1                     256                      64  128  128  128   64
  1, 2, 3               2                     512                     192  128  384  128  192
  1, 2, 3               3                     1024                    448  128  896  128  448
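The five sections of Table 1 (ZL, L, M, R, ZR) describe a window built from zero pads, overlap slopes, and a flat middle. A sketch of that layout follows; the sine shape of the slopes is an assumption consistent with the sine window mentioned later in the description:

```python
import numpy as np

def residual_window(ZL, L, M, R, ZR):
    # Zero pad -> rising sine slope -> flat region -> falling sine slope
    # -> zero pad, following the section lengths of Table 1.
    rise = np.sin(np.pi / (2 * L) * (np.arange(L) + 0.5)) if L else np.zeros(0)
    fall = np.sin(np.pi / (2 * R) * (np.arange(R) + 0.5))[::-1] if R else np.zeros(0)
    return np.concatenate([np.zeros(ZL), rise, np.ones(M), fall, np.zeros(ZR)])

# Table 1, mode 1: sections (64, 128, 128, 128, 64) give a 512-sample
# window, i.e. 256 MDCT coefficients for the frame.
w = residual_window(64, 128, 128, 128, 64)
```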
[0052] As an example, a shape of a window of an MDCT residual filterbank mode 1 will be described with reference to
[0053] Referring to
[0054] Also, when both of the current frame and the previous frame are in a complex filterbank mode, a shape of a window of the current frame may be defined as given in Table 2 below.
TABLE 2
  MDCT based residual   MDCT based residual   Number of coefficients
  filterbank mode of    filterbank mode of    transformed to a
  a previous frame      a current frame       frequency domain        ZL    L    M    R    ZR
  1                     1                     288                       0   32  224   32    0
  1                     2                     576                       0   32  480   64    0
  2                     2                     576                       0   64  448   64    0
  1                     3                     1152                      0   32  992  128    0
  2                     3                     1152                      0   64  960  128    0
  3                     3                     1152                      0  128  896  128    0
[0055] Unlike Table 1, Table 2 does not include the ZL and ZR sections, and the frame size equals the number of coefficients transformed into the frequency domain. That is, the number of transformed coefficients is ZL+L+M+R+ZR.
[0056] Also, a window shape, when an MDCT based filterbank is applied in the previous frame and a complex based filterbank is applied in the current frame, will be described as given in Table 3.
TABLE 3
  MDCT based residual   MDCT based residual   Number of coefficients
  filterbank mode of    filterbank mode of    transformed to a
  a previous frame      a current frame       frequency domain        ZL    L    M    R    ZR
  1, 2, 3               1                     288                       0  128  128   32    0
  1, 2, 3               2                     576                       0  128  384   64    0
  1, 2, 3               3                     1152                      0  128  896  128    0
[0057] Here, an overlap size of a left side of the window, that is a size overlapped with the previous frame, may be set to “128”.
[0058] Also, a window shape, when the previous frame is in the complex filterbank mode and the current frame is in an MDCT based filterbank mode, will be described as given in Table 4.
TABLE 4
  MDCT based residual   MDCT based residual   Number of coefficients
  filterbank mode of    filterbank mode of    transformed to a
  a previous frame      a current frame       frequency domain        ZL    L    M    R    ZR
  1, 2, 3               1                     256                      64  128  128  128   64
  1, 2, 3               2                     512                     192  128  384  128  192
  1, 2, 3               3                     1024                    448  128  896  128  448
[0059] Here, the same window of Table 1 may be applicable to Table 4. However, the R section of the window may be transformed to “128” with respect to the complex filterbank modes 1 and 2 of the previous frame. An example of the transformation will be described in detail with reference to
[0060] Referring to
[0061] Also, when the previous frame is encoded by using the ACELP and the current frame is in an MDCT filterbank mode, the window may be defined as given in Table 5.
TABLE 5
  MDCT based residual   MDCT based residual   Number of coefficients
  filterbank mode of    filterbank mode of    transformed to a
  a previous frame      a current frame       frequency domain        ZL    L    M     R    ZR
  0                     1                     320                     160    0   256  128   96
  0                     2                     576                     288    0   512  128  224
  0                     3                     1152                    512  128  1024  128  512
[0062] That is, Table 5 defines a window of each mode of the current frame when a last mode of the previous frame is zero. Here, when the last mode of the previous frame is zero and a mode of the current frame is “3”, Table 6 may be applicable.
TABLE 6
  MDCT based residual   MDCT based residual   Number of coefficients
  filterbank mode of    filterbank mode of    transformed to a
  a previous frame      a current frame       frequency domain        ZL       L    M     R    ZR
  0                     3                     1152                    512+α    α   1024  128  512
[0063] Here, α may satisfy 0≤α≤sN/2, or α=sN. In this instance, a transform coefficient may be 5×sN. As an example, sN=128 in Table 6.
[0064] Accordingly, the frame connection method when 0≤α≤sN/2 and the frame connection method when α=sN are different, and will be described in detail with reference to
[0065] Detailed description with reference to
[0066] When sN=128, the connection is processed as shown in
[0067] Next, w_a is applied last, and the block to be overlap-added last is generated. w_a is applied once more at the end, since windowing after the transformation from frequency to time is considered. The generated block, ((w_a×x_b)+(w_a^r×x_b^r))×w_a, is overlap-added and connected to an MDCT block of mode 3.
[0068] As described above, a block that expresses a residual signal as a complex signal and performs encoding/decoding is embodied to encode/decode an LPC residual signal. Thus, an LPC residual signal encoding/decoding apparatus that improves encoding performance, and that does not generate an aliasing on a time axis, may be provided.
[0069] Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.