Device and method for bandwidth extension for audio signals
09747908 · 2017-08-29
Assignee
Inventors
CPC classification
G10L25/18
PHYSICS
G10L19/167
PHYSICS
International classification
G10L19/00
PHYSICS
G10L19/24
PHYSICS
Abstract
An audio signal decoding apparatus is provided that includes a receiver that receives encoded information, a memory, and a processor that demultiplexes low-band encoding parameters, index information, and scale factor information from the encoded information. The processor also decodes the low-band encoding parameters to obtain a synthesized low frequency spectrum, replicates a high frequency subband spectrum based on the index information using the synthesized low frequency spectrum, and adjusts an amplitude of the replicated high frequency subband spectrum using the scale factor information. The processor further estimates a frequency of a harmonic component in the synthesized low frequency spectrum, adjusts a frequency of a harmonic component in the high frequency subband spectrum using the estimated harmonic frequency, and generates an output signal using the synthesized low frequency spectrum and the high frequency subband spectrum.
Claims
1. An audio signal decoding apparatus, comprising: a receiver that receives an encoded information; a memory; and a processor that demultiplexes encoding parameters, index information that identifies a most correlated portion from a low frequency spectrum for one or more high frequency subbands, and scale factor information from the encoded information; replicates a high frequency subband spectrum based on the index information using a synthesized low frequency spectrum, the synthesized low frequency spectrum being obtained by decoding the encoding parameters; adjusts an amplitude of the replicated high frequency subband spectrum using the scale factor information, estimates a frequency of a harmonic component in the synthesized low frequency spectrum; adjusts a frequency of a harmonic component in the high frequency subband spectrum using the estimated harmonic frequency; and generates an output signal using the synthesized low frequency spectrum and the high frequency subband spectrum; wherein, within the harmonic frequency estimation, the processor splits a preselected portion of the synthesized low frequency spectrum into plural blocks; identifies a frequency of a spectral peak having a maximum amplitude in each of the plural blocks; calculates spacing between each of the identified spectral peak frequencies; and calculates the harmonic frequency using the spacing between the identified spectral peak frequencies.
2. An audio signal decoding apparatus, comprising: a receiver that receives an encoded information; a memory; and a processor that demultiplexes encoding parameters, index information that identifies a most correlated portion from a low frequency spectrum for one or more high frequency subbands, and scale factor information from the encoded information; replicates a high frequency subband spectrum based on the index information using a synthesized low frequency spectrum, the synthesized low frequency spectrum being obtained by decoding the encoding parameters; adjusts an amplitude of the replicated high frequency subband spectrum using the scale factor information, estimates a frequency of a harmonic component in the synthesized low frequency spectrum; adjusts a frequency of a harmonic component in the high frequency subband spectrum using the estimated harmonic frequency; and generates an output signal using the synthesized low frequency spectrum and the high frequency subband spectrum; wherein, within the harmonic frequency adjustment, the processor further adjusts the plurality of spectral peak frequencies so that the spacing between the spectral peak frequencies after the adjustment is equal to the estimated harmonic frequency, using, as a reference, the highest frequency of the spectral peaks in the synthesized low frequency spectrum.
3. An audio signal decoding method, comprising: receiving encoded information; demultiplexing encoding parameters, index information that identifies a most correlated portion from a low frequency spectrum for one or more high frequency subbands, and scale factor information from the encoded information; replicating a high frequency subband spectrum based on the index information using the synthesized low frequency spectrum, the synthesized low frequency spectrum being obtained by decoding the encoding parameters; adjusting an amplitude of the replicated high frequency subband spectrum using the scale factor information, estimating a frequency of a harmonic component in the synthesized low frequency spectrum; adjusting a frequency of a harmonic component in the high frequency subband spectrum using the estimated harmonic frequency; and generating an output signal using the synthesized low frequency spectrum and the high frequency subband spectrum; wherein, within an harmonic frequency estimation splitting a preselected portion of the synthesized low frequency spectrum into plural blocks; identifying a frequency of a spectral peak having a maximum amplitude in each of the plural blocks; calculating spacing between each of the identified spectral peak frequencies; and calculating the harmonic frequency using the spacing between the identified spectral peak frequencies.
4. An audio signal decoding method, comprising: receiving an encoded information; demultiplexing encoding parameters, index information that identifies a most correlated portion from a low frequency spectrum for one or more high frequency subbands, and scale factor information from the encoded information; replicating a high frequency subband spectrum based on the index information using the synthesized low frequency spectrum, the synthesized low frequency spectrum being obtained by decoding the encoding parameters; adjusting an amplitude of the replicated high frequency subband spectrum using the scale factor information, estimating a frequency of a harmonic component in the synthesized low frequency spectrum; adjusting a frequency of a harmonic component in the high frequency subband spectrum using the estimated harmonic; and generating an output signal using the synthesized low frequency spectrum and the high frequency subband spectrum; wherein, within the harmonic frequency adjustment, adjusting the plurality of spectral peak frequencies so that the spacing between the spectral peak frequencies after the adjustment is equal to the estimated harmonic frequency, using, as a reference, the highest frequency of the spectral peaks in the synthesized low frequency spectrum.
5. The audio signal decoding apparatus according to claim 1, wherein the processor calculates the harmonic frequency using an average value of the spacing between the identified spectral peak frequencies.
6. The audio signal decoding method according to claim 3, wherein calculating the harmonic frequency uses an average value of the spacing between the identified spectral peak frequencies.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
DESCRIPTION OF EMBODIMENTS
(15) The main principle of the present invention is described in this section using
(16) (Embodiment 1)
(17) The configuration of a codec according to the present invention is illustrated in
(18) At an encoding apparatus side illustrated in
(19) Finally, the multiplexing section (307) integrates the core encoding parameters, the index information and the scale factor information into a bitstream.
(20) In a decoding apparatus illustrated in
(21) A core decoding section reconstructs synthesized low frequency signals using the core encoding parameters (402). The synthesized low frequency signal is up-sampled (403), and used for bandwidth extension (410).
(22) This bandwidth extension is performed as follows. The synthesized low frequency signal is energy-normalized (404). A low frequency portion, identified by the index information as the portion most correlated with each subband of the high frequency signal of the input signal (as derived at the encoding apparatus side), is copied into the high frequency band (405). The energy level is then adjusted according to the scale factor information so that it matches the energy level of the high frequency signal of the input signal (406).
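The copy-and-scale step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of RMS as the energy measure, and the choice to normalize only the copied segment are assumptions made for the example.

```python
import math

def energy_normalize(spectrum):
    """Scale spectral coefficients to unit RMS; an all-zero input is returned unchanged."""
    rms = math.sqrt(sum(c * c for c in spectrum) / len(spectrum))
    return [c / rms for c in spectrum] if rms > 0.0 else list(spectrum)

def replicate_subband(low_band, start_index, width, scale_factor):
    """Copy `width` bins of the low band starting at `start_index` (the index
    information) and scale them so the copy has RMS equal to `scale_factor`.
    Assumption: the scale factor is an RMS target for the subband."""
    if start_index < 0 or width <= 0:
        raise ValueError("start_index must be >= 0 and width must be > 0")
    segment = low_band[start_index:start_index + width]
    if len(segment) != width:
        raise ValueError("index information points outside the low band")
    return [scale_factor * c for c in energy_normalize(segment)]

# Toy low band; the most correlated portion is assumed to start at bin 1.
low_band = [0.0, 3.0, 0.0, 0.0, 4.0, 0.0, 0.0, 1.0]
high_subband = replicate_subband(low_band, start_index=1, width=4, scale_factor=0.5)
```

In a real decoder the energy measure and the normalization granularity would be fixed by the codec specification; here they are only chosen to keep the example small.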
(23) Further, a harmonic frequency is estimated from the synthesized low frequency spectrum (407). The estimated harmonic frequency is used to adjust the frequency of the tonal component in the high frequency signal spectrum (408).
(24) The reconstructed high frequency signal is transformed from a frequency domain to a time domain (409), and is added to the up-sampled synthesized low frequency signal to generate an output signal in the time domain.
(25) The detailed processing of the harmonic frequency estimation scheme is as follows:
1) From the synthesized low frequency signal (LF) spectrum, a portion for estimating a harmonic frequency is selected. The selected portion should have a clear harmonic structure so that the harmonic frequency estimated from it is reliable. Usually, for every harmonic, a clear harmonic structure is observed from 1 to 2 kHz to around a cut-off frequency.
2) The selected portion is split into a plurality of blocks, each with a width close to the human voice pitch frequency (about 100 to 400 Hz).
3) In each block, the spectral peak (the spectrum whose amplitude is the maximum within the block) and the spectral peak frequency (the frequency of that spectral peak) are searched for.
4) Post-processing is applied to the identified spectral peaks in order to avoid errors or to improve the accuracy of the harmonic frequency estimation.
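Steps 2) and 3) of the estimation scheme above can be sketched as a block-wise peak search. The function name, the bin-based units, and the toy spectrum are assumptions for illustration; the block width in bins would be derived from the pitch range and the transform resolution.

```python
def find_block_peaks(spectrum, start_bin, stop_bin, block_width):
    """Return the bin index of the largest-magnitude coefficient in each block
    of `block_width` bins covering spectrum[start_bin:stop_bin]."""
    if block_width <= 0:
        raise ValueError("block_width must be positive")
    peaks = []
    for block_start in range(start_bin, stop_bin, block_width):
        block_stop = min(block_start + block_width, stop_bin)
        # The peak of a block is the bin with the maximum absolute amplitude.
        peak_bin = max(range(block_start, block_stop), key=lambda b: abs(spectrum[b]))
        peaks.append(peak_bin)
    return peaks

# Toy spectrum with tonal peaks every 4 bins, starting at bin 2.
spectrum = [0.1] * 16
for b in (2, 6, 10, 14):
    spectrum[b] = 1.0
peak_bins = find_block_peaks(spectrum, start_bin=0, stop_bin=16, block_width=4)
```

A block that contains no harmonic still yields a peak in this sketch, which is one reason the post-processing of step 4) is needed.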
(26) The spectrum illustrated in
(27) Based on the synthesized low frequency signal spectrum, spectral peaks and spectral peak frequencies are calculated. However, a spectral peak that has a small amplitude and an extremely short frequency spacing from an adjacent spectral peak is discarded, which avoids estimation errors in calculating the harmonic frequency value.
1) The spacing between the identified spectral peak frequencies is calculated.
2) A harmonic frequency is estimated based on the spacing between the identified spectral peak frequencies. One method for estimating the harmonic frequency is as follows:
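(28)
$$\mathrm{Spacing}_{\mathrm{peak}}(n) = \mathrm{Pos}_{\mathrm{peak}}(n+1) - \mathrm{Pos}_{\mathrm{peak}}(n), \quad n \in [1, N-1]$$
$$\mathrm{Est}_{\mathrm{Harmonic}} = \frac{1}{N-1} \sum_{n=1}^{N-1} \mathrm{Spacing}_{\mathrm{peak}}(n) \qquad \text{(Equation 1)}$$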
(29) where
(30) Est.sub.Harmonic is the calculated harmonic frequency;
(31) Spacing.sub.peak is the frequency spacing between the detected peak positions;
(32) N is the number of the detected peak positions;
(33) Pos.sub.peak is the position of the detected peak;
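The discarding of unreliable peaks and the averaging of the peak spacing can be sketched as below. The pruning thresholds `min_spacing` and `min_amplitude` are hypothetical parameters introduced for this example; the source does not give values for them.

```python
def prune_peaks(positions, amplitudes, min_spacing, min_amplitude):
    """Drop a peak that is both weak and very close to the previously kept peak.
    `min_spacing` and `min_amplitude` are illustrative thresholds."""
    kept = []
    for pos, amp in zip(positions, amplitudes):
        too_close = bool(kept) and pos - kept[-1] < min_spacing
        if too_close and abs(amp) < min_amplitude:
            continue
        kept.append(pos)
    return kept

def estimate_harmonic_frequency(peak_positions):
    """Average spacing between consecutive peak positions (same unit as the input)."""
    if len(peak_positions) < 2:
        raise ValueError("need at least two peaks")
    spacings = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    return sum(spacings) / len(spacings)

positions = [40, 43, 52, 63, 76]
amplitudes = [1.0, 0.1, 0.9, 1.1, 0.8]
kept = prune_peaks(positions, amplitudes, min_spacing=6, min_amplitude=0.3)
est_harmonic = estimate_harmonic_frequency(kept)
```

Here the weak peak at position 43 is removed, and the remaining spacings 12, 11 and 13 average to 12.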
(34) The harmonic frequency estimation may also be performed according to the following method:
1) In the synthesized low frequency signal (LF) spectrum, a portion having a clear harmonic structure is selected so that the estimated harmonic frequency is reliable. Usually, for every harmonic, a clear harmonic structure can be seen from 1 to 2 kHz to around a cut-off frequency.
2) The spectrum having the maximum amplitude (absolute value), and its frequency, are identified within the selected portion of the synthesized low frequency signal spectrum.
3) A set of spectral peaks is identified that are spaced at substantially equal frequency intervals from the frequency of the maximum-amplitude spectrum and whose absolute amplitudes exceed a predetermined threshold. As the predetermined threshold, it is possible to apply, for example, a value twice the standard deviation of the spectral amplitudes contained in the selected portion.
4) The spacing between these spectral peak frequencies is calculated.
5) The harmonic frequency is estimated based on the spacing between these spectral peak frequencies. Also in this case, the method in Equation (1) can be used to estimate the harmonic frequency.
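The threshold-based variant above can be sketched as follows. This is a simplified illustration: peaks are taken as local maxima of the magnitude spectrum that exceed twice the standard deviation of the magnitudes, and the check for substantially equal spacing from the strongest peak is left out. The function name and the toy data are assumptions.

```python
import statistics

def threshold_peaks(spectrum):
    """Return (strongest_bin, peak_bins) for the selected spectrum portion.
    Peaks are interior local maxima of |spectrum| above 2 x the (population)
    standard deviation of the magnitudes; the first and last bins are skipped."""
    mags = [abs(c) for c in spectrum]
    threshold = 2.0 * statistics.pstdev(mags)
    strongest = max(range(len(mags)), key=mags.__getitem__)
    peaks = [b for b in range(1, len(mags) - 1)
             if mags[b] > threshold and mags[b] >= mags[b - 1] and mags[b] > mags[b + 1]]
    return strongest, peaks

# Toy portion: noise floor of 0.05 with tonal peaks every 5 bins.
portion = [0.05] * 20
for b in (3, 8, 13, 18):
    portion[b] = 1.0
strongest_bin, peak_bins = threshold_peaks(portion)
spacings = [b - a for a, b in zip(peak_bins, peak_bins[1:])]
est_harmonic = sum(spacings) / len(spacings)
```

A fuller version would additionally keep only those peaks whose distance from `strongest_bin` is close to a multiple of a common spacing.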
(35) At a very low bitrate, the harmonic components in the synthesized low frequency signal spectrum may not be well encoded. In this case, some of the identified spectral peaks may not correspond to the harmonic components of the input signal at all. Therefore, in calculating the harmonic frequency, any spacing between spectral peak frequencies that differs largely from the average value should be excluded from the calculation.
(36) Also, there is a case where not all the harmonic components can be encoded (meaning that some of the harmonic components are missing in the synthesized low frequency signal spectrum) due to the relatively low amplitude of the spectral peak, the bitrate constraints for encoding, or the like. In these cases, the spacing between the spectral peak frequencies extracted at the missing harmonic portion is considered to be twice or a few times the spacing between the spectral peak frequencies extracted at the portion which retains good harmonic structure. In this case, the average value of the extracted values of the spacing between the spectral peak frequencies where the values are included in the predetermined range including the maximum spacing between the spectral peak frequencies is defined as an estimated harmonic frequency value. Thus, it becomes possible to properly replicate the high frequency spectrum. The specific procedure comprises the following steps: 1) The minimum and maximum values of the spacing between the spectral peak frequencies are identified;
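(37) [2]
$$\mathrm{Spacing}_{\mathrm{peak}}(n) = \mathrm{Pos}_{\mathrm{peak}}(n+1) - \mathrm{Pos}_{\mathrm{peak}}(n), \quad n \in [1, N-1]$$
$$\mathrm{Spacing}_{\mathrm{min}} = \min(\{\mathrm{Spacing}_{\mathrm{peak}}(n)\})$$
$$\mathrm{Spacing}_{\mathrm{max}} = \max(\{\mathrm{Spacing}_{\mathrm{peak}}(n)\}) \qquad \text{(Equation 2)}$$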
(38) where;
(39) Spacing.sub.peak is the frequency spacing between the detected peak positions;
(40) Spacing.sub.min is the minimum frequency spacing between the detected peak positions;
(41) Spacing.sub.max is the maximum frequency spacing between the detected peak positions;
(42) N is the number of the detected peak positions;
(43) Pos.sub.peak is the position of the detected peak;
(44) 2) Every spacing between spectral peak frequencies is identified in the range of:
(45) [3]
$$[k \cdot \mathrm{Spacing}_{\mathrm{min}},\ \mathrm{Spacing}_{\mathrm{max}}], \quad k \in [1, 2]$$
(46) 3) The average value of the identified spacing values between the spectral peak frequencies in the above range is defined as the estimated harmonic frequency value.
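Steps 1) to 3) above can be sketched as follows. The function name and the fallback for an empty range are assumptions added for the example; the value of `k` is a tuning parameter within the stated interval.

```python
def estimate_from_spacing_range(peak_positions, k):
    """Average the peak spacings that fall in [k * min_spacing, max_spacing]."""
    if not 1.0 <= k <= 2.0:
        raise ValueError("k must lie in [1, 2]")
    if len(peak_positions) < 2:
        raise ValueError("need at least two peaks")
    spacings = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    lower, upper = k * min(spacings), max(spacings)
    selected = [s for s in spacings if lower <= s <= upper]
    if not selected:
        # Assumption: if the range is empty (k * min > max), use every spacing.
        selected = spacings
    return sum(selected) / len(selected)

# Spacings are 10, 10, 20, 20, 10.
positions = [10, 20, 30, 50, 70, 80]
est_k15 = estimate_from_spacing_range(positions, k=1.5)
est_k10 = estimate_from_spacing_range(positions, k=1.0)
```

With `k=1.5` only the spacings of 20 fall in the range; with `k=1.0` every spacing is included and the plain average is obtained.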
(47) Next, one example of a harmonic frequency adjustment scheme is described below.
(48) 1) The last encoded spectral peak and its spectral peak frequency are identified in the synthesized low frequency signal (LF) spectrum.
(49) 2) The spectral peak and the spectral peak frequency are identified within the high frequency spectrum replicated by bandwidth extension.
(50) 3) Using the highest spectral peak frequency as a reference, among spectral peaks of the synthesized low frequency signal spectrum, the spectral peak frequencies are adjusted so that the values of the spacing between the spectral peak frequencies are equal to the estimated value of the spacing between the harmonic frequencies. This processing is illustrated in
(51) Harmonic frequency adjustment schemes as described below are also possible.
1) The spectral peak having the highest spectral peak frequency in the synthesized low frequency signal (LF) spectrum is identified.
2) The spectral peaks and the spectral peak frequencies within the high frequency (HF) spectrum replicated by bandwidth extension are identified.
3) Using the highest spectral peak frequency of the synthesized low frequency signal spectrum as a reference, possible spectral peak frequencies in the HF spectrum are calculated. Each spectral peak in the high frequency spectrum replicated by the bandwidth extension is shifted to the frequency, among the calculated spectral peak frequencies, that is closest to its own spectral peak frequency. This processing is illustrated in
(52) Thereafter, the spectral peak extracted in the replicated high frequency spectrum is shifted to a frequency which is the closest to the spectral peak frequency, among the possible spectral peak frequencies calculated as described above.
(53) There is also a case where the estimated harmonic value Est.sub.Harmonic does not correspond to an integer frequency bin. In this case, the spectral peak frequency is selected to be a frequency bin which is the closest to the frequency derived based on Est.sub.Harmonic.
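The adjustment described above can be sketched as a two-step procedure: build a grid of candidate peak bins anchored at the highest LF peak, then move each replicated HF peak to its nearest candidate. All names are hypothetical, positions are expressed in frequency bins, and the handling of two peaks that snap to the same candidate is not addressed.

```python
def candidate_peak_bins(last_lf_peak_bin, est_harmonic, hf_stop_bin):
    """Candidate HF peak bins at last_lf_peak_bin + m * est_harmonic, m = 1, 2, ...
    A non-integer result is rounded to the nearest bin (Python's round() resolves
    exact .5 ties to the even bin)."""
    if est_harmonic <= 0:
        raise ValueError("est_harmonic must be positive")
    bins = []
    m = 1
    while True:
        candidate = round(last_lf_peak_bin + m * est_harmonic)
        if candidate >= hf_stop_bin:
            return bins
        bins.append(candidate)
        m += 1

def snap_hf_peaks(hf_peak_bins, candidates):
    """Map each replicated HF peak bin to the closest candidate bin."""
    return [min(candidates, key=lambda c: abs(c - p)) for p in hf_peak_bins]

candidates = candidate_peak_bins(last_lf_peak_bin=60, est_harmonic=12.4, hf_stop_bin=128)
adjusted = snap_hf_peaks([70, 88, 95, 112, 120], candidates)
```

The sketch returns only the new peak positions; moving the spectral coefficients themselves (and, if desired, preserving the subband energy) would be a separate step.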
(54) There also may be a method of estimating a harmonic frequency in which the previous frame spectrum is utilized to estimate the harmonic frequency, and a method of adjusting the frequencies of tonal components in which the previous frame spectrum is taken into consideration so that the transition between frames is smooth when adjusting the tonal component. It is also possible to adjust the amplitude such that, even when the frequencies of the tonal components are shifted, the energy level of the original spectrum is maintained. All such minor variations are within the scope of the present invention.
(55) The above descriptions are all given as examples, and the ideas of the present invention are not limited by the given examples. Those skilled in the art will be able to modify and adapt the present invention without deviating from the spirit of the invention.
(56) [Effect]
(57) The bandwidth extension method according to the present invention replicates the high frequency spectrum utilizing the synthesized low frequency signal spectrum which is the most correlated with the high frequency spectrum, and shifts the spectral peaks to the estimated harmonic frequencies. Thus, it becomes possible to maintain both the fine structure of the spectrum and the harmonic structure between the low frequency band spectral peaks and the replicated high frequency band spectral peaks.
(58) (Embodiment 2)
(59) Embodiment 2 of the present invention is illustrated in
(60) The encoding apparatus according to Embodiment 2 is substantially the same as that of Embodiment 1, except for harmonic frequency estimation sections (708 and 709) and a harmonic frequency comparison section (710).
(61) The harmonic frequency is estimated separately from synthesized low frequency spectrum (708) and high frequency spectrum (709) of the input signal, and flag information is transmitted based on the comparison result between the estimated values of those (710). As one of the examples, the flag information can be derived as in the following equation:
(62) [4]
(63)
$$\mathrm{Flag} = \begin{cases} 1, & \text{if } \mathrm{Est}_{\mathrm{Harmonic\_LF}} \in [\mathrm{Est}_{\mathrm{Harmonic\_HF}} - \mathrm{Threshold},\ \mathrm{Est}_{\mathrm{Harmonic\_HF}} + \mathrm{Threshold}] \\ 0, & \text{otherwise} \end{cases} \qquad \text{(Equation 3)}$$
where Est.sub.Harmonic.sub._.sub.LF is the estimated harmonic frequency from the synthesized low frequency spectrum; Est.sub.Harmonic.sub._.sub.HF is the estimated harmonic frequency from the original high frequency spectrum; Threshold is a predetermined threshold for the difference between Est.sub.Harmonic.sub._.sub.LF and Est.sub.Harmonic.sub._.sub.HF; Flag is the flag signal to indicate whether the harmonic adjustment should be applied.
(64) That is, the harmonic frequency estimated from the synthesized low frequency signal spectrum (synthesized low frequency spectrum) Est.sub.Harmonic.sub._.sub.LF is compared with the harmonic frequency estimated from the high frequency spectrum of the input signal Est.sub.Harmonic.sub._.sub.HF. When the difference between the two values is small enough, it is considered that the estimation from the synthesized low frequency spectrum is accurate enough, and a flag (Flag=1) meaning that it may be used for harmonic frequency adjustment is set. On the other hand, when the difference between the two values is not small, it is considered that the estimated value from the synthesized low frequency spectrum is not accurate, and a flag (Flag=0) meaning that it should not be used for harmonic frequency adjustment is set.
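The flag decision can be sketched directly from the comparison described above. The function name and the example values (in Hz) are illustrative only.

```python
def harmonic_flag(est_harmonic_lf, est_harmonic_hf, threshold):
    """Return 1 when the LF estimate lies within +/- threshold of the HF estimate
    (harmonic adjustment may be applied), otherwise 0."""
    lower = est_harmonic_hf - threshold
    upper = est_harmonic_hf + threshold
    return 1 if lower <= est_harmonic_lf <= upper else 0

flag_close = harmonic_flag(200.0, 203.0, threshold=5.0)  # estimates agree
flag_far = harmonic_flag(200.0, 230.0, threshold=5.0)    # estimates disagree
```

The encoder would transmit this single bit alongside the index and scale factor information.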
(65) At decoding apparatus side illustrated in
(66) [Effect]
(67) For several input signals, there is a case where the harmonic frequency estimated from the synthesized low frequency spectrum is different from the harmonic frequency of the high frequency spectrum of the input signal. Especially at low bitrate, the harmonic structure of the low frequency spectrum is not well maintained. By sending the flag information, it becomes possible to avoid the adjustment of the tonal component using a wrongly estimated value of the harmonic frequency.
(68) (Embodiment 3)
(69) Embodiment 3 of the present invention is illustrated in
(70) The encoding apparatus according to Embodiment 3 is substantially the same as that of Embodiment 2, except for a differential device (910).
(71) The harmonic frequency is estimated separately from the synthesized low frequency spectrum (908) and high frequency spectrum (909) of the input signal. The difference between the two estimated harmonic frequencies (Diff) is calculated (910), and transmitted to the decoding apparatus side.
(72) At decoding apparatus side illustrated in
(73) Instead of the difference value, the harmonic frequency estimated from the high frequency spectrum of the input signal may also be directly transmitted to the decoding section. Then, the received harmonic frequency value of the high frequency spectrum of the input signal is used to perform the harmonic frequency adjustment. Thus, it becomes unnecessary to estimate the harmonic frequency from the synthesized low frequency spectrum at the decoding apparatus side.
(74) [Effect]
(75) There is a case where, for several signals, the harmonic frequency estimated from the synthesized low frequency spectrum is different from the harmonic frequency of the high frequency spectrum of the input signal. Therefore, by sending the difference value, or the harmonic frequency value derived from the high frequency spectrum of the input signal, it becomes possible to adjust the tonal component of the high frequency spectrum replicated through bandwidth extension by the decoding apparatus at the receiving side more accurately.
(76) (Embodiment 4)
(77) Embodiment 4 of the present invention is illustrated in
(78) The encoding apparatus according to Embodiment 4 is the same as any other conventional encoding apparatus, or is the same as the encoding apparatus in Embodiment 1, 2 or 3.
(79) At decoding apparatus side illustrated in
(80) Especially when the available bitrate is low, there is a case where some of the harmonic components of the low frequency spectrum are hardly encoded, or are not encoded at all. In this case, the estimated harmonic frequency value can be used to inject the missing harmonic components.
(81) This will be illustrated in the
(82) Another approach for injecting the missing harmonic component is as follows:
1. The harmonic frequency is estimated using the encoded LF spectrum (1103).
1.1 The harmonic frequency is estimated using the spacing between spectral peak frequencies identified in the encoded low frequency spectrum.
1.2 The values of the spacing between the spectral peak frequencies derived from the missing harmonic portion become twice or a few times the values of the spacing derived from a portion which has a good harmonic structure. Such spacing values are grouped into different categories, and the average spacing value between the spectral peak frequencies is estimated for each of the categories. The details are as follows:
a. The minimum value and the maximum value of the spacing between the spectral peak frequencies are identified.
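(83) [5]
$$\mathrm{Spacing}_{\mathrm{peak}}(n) = \mathrm{Pos}_{\mathrm{peak}}(n+1) - \mathrm{Pos}_{\mathrm{peak}}(n), \quad n \in [1, N-1]$$
$$\mathrm{Spacing}_{\mathrm{min}} = \min(\{\mathrm{Spacing}_{\mathrm{peak}}(n)\})$$
$$\mathrm{Spacing}_{\mathrm{max}} = \max(\{\mathrm{Spacing}_{\mathrm{peak}}(n)\}) \qquad \text{(Equation 4)}$$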
(84) where;
(85) Spacing.sub.peak is the frequency spacing between the detected peak positions;
(86) Spacing.sub.min is the minimum frequency spacing between the detected peak positions;
(87) Spacing.sub.max is the maximum frequency spacing between the detected peak positions;
(88) N is the number of the detected peak positions;
(89) Pos.sub.peak is the position of the detected peak; b. Every spacing value is identified in the range of:
(90) [6]
$$r_1 = [\mathrm{Spacing}_{\mathrm{min}},\ k \cdot \mathrm{Spacing}_{\mathrm{min}})$$
$$r_2 = [k \cdot \mathrm{Spacing}_{\mathrm{min}},\ \mathrm{Spacing}_{\mathrm{max}}], \quad 1 < k \le 2$$
c. The average values of the spacing values identified in the above ranges are calculated as the estimated harmonic frequency values.
(91)
(92) where
(93) Est.sub.Harmonic.sub.
(94) N.sub.1 is the number of the detected peak positions belonging to r.sub.1
(95) N.sub.2 is the number of the detected peak positions belonging to r.sub.2 2. Using the estimated harmonic frequency values, the missing harmonic components are injected. 2.1 The selected LF spectrum is split into several regions. 2.2 The missing harmonics are identified by utilizing region information and the estimated frequencies.
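Steps 1.2 a to c above can be sketched as a grouping of the peak spacings into the two ranges, followed by a per-range average. The function name and return convention are assumptions, and each average is taken over the number of spacings that fall in the range.

```python
def grouped_harmonic_estimates(peak_positions, k):
    """Split peak spacings into r1 = [min, k*min) and r2 = [k*min, max] and
    return the average spacing of each range (None for an empty r2)."""
    if not 1.0 < k <= 2.0:
        raise ValueError("k must satisfy 1 < k <= 2")
    if len(peak_positions) < 2:
        raise ValueError("need at least two peaks")
    # Positions are assumed strictly increasing, so every spacing is positive.
    spacings = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    boundary = k * min(spacings)
    r1 = [s for s in spacings if s < boundary]   # portion with good harmonic structure
    r2 = [s for s in spacings if s >= boundary]  # portion with missing harmonics
    est_r1 = sum(r1) / len(r1)
    est_r2 = sum(r2) / len(r2) if r2 else None
    return est_r1, est_r2

# Spacings are 10, 10, 20, 10, 20: two gaps each hide one missing harmonic.
est_r1, est_r2 = grouped_harmonic_estimates([10, 20, 30, 50, 60, 80], k=1.5)
```

The first value approximates the true harmonic spacing, while the second reflects the wider gaps left by harmonics that were not encoded.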
(96) For example, assume that the selected LF spectrum is split into three regions r.sub.1, r.sub.2, and r.sub.3.
(97) Based on the region information, the harmonics are identified and injected.
(98) Due to the signal characteristics for harmonics, the spectral gap between harmonics is Est.sub.Harmonic.sub.
(99) Similarly, Est.sub.Harmonic.sub.
(100) Further, as for its amplitude, it is possible to use the average value of the amplitudes of all the harmonic components which are not missing or the average value of the amplitudes of the harmonic components preceding and following the missing harmonic component. Alternatively, as for the amplitude, a spectral peak with the minimum amplitude in the WB spectrum may be used. The harmonic component generated using the frequency and amplitude is injected into the LF spectrum for restoring the missing harmonic component.
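The injection step can be sketched as below, using the amplitude option that averages the harmonic components preceding and following the gap. The function name, the `(bin, amplitude)` representation, and the `tolerance` parameter that decides whether a gap is close enough to a multiple of the harmonic spacing are assumptions for illustration.

```python
def inject_missing_harmonics(peaks, est_harmonic, tolerance=0.25):
    """Insert peaks where a gap between consecutive peaks is about an integer
    multiple (>= 2) of est_harmonic. `peaks` is a list of (bin, amplitude)
    sorted by bin; injected peaks get the mean amplitude of their neighbours."""
    if est_harmonic <= 0:
        raise ValueError("est_harmonic must be positive")
    if not peaks:
        return []
    result = [peaks[0]]
    for (prev_bin, prev_amp), (next_bin, next_amp) in zip(peaks, peaks[1:]):
        gap = next_bin - prev_bin
        missing = round(gap / est_harmonic) - 1
        fits_grid = abs(gap - (missing + 1) * est_harmonic) <= tolerance * est_harmonic
        if missing > 0 and fits_grid:
            amplitude = (prev_amp + next_amp) / 2.0
            for m in range(1, missing + 1):
                result.append((round(prev_bin + m * est_harmonic), amplitude))
        result.append((next_bin, next_amp))
    return result

# The harmonic expected at bin 30 was not encoded.
encoded_peaks = [(10, 1.0), (20, 1.0), (40, 0.5), (50, 0.5)]
restored = inject_missing_harmonics(encoded_peaks, est_harmonic=10.0)
```

Using the average of all non-missing harmonics, or the minimum peak amplitude, as mentioned above, would only change how `amplitude` is computed.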
(101) [Effect]
(102) There is a case where the synthesized low frequency spectrum is not maintained for several signals. Especially at low bitrate, there is a possibility that several harmonic components may be missing. By injecting the missing harmonic components in the LF spectrum, it becomes possible not only to extend the LF, but also improve the harmonic characteristics of the reconstructed harmonics. This can suppress the auditory influence due to missing harmonics to further improve the sound quality.
(103) The disclosure of Japanese Patent Application No. 2013-122985 filed on Jun. 11, 2013, including the specification, drawings and abstract, is incorporated herein by reference in its entirety.
INDUSTRIAL APPLICABILITY
(104) The encoding apparatus, decoding apparatus and encoding and decoding methods according to the present invention are applicable to a wireless communication terminal apparatus, a base station apparatus in a mobile communication system, a tele-conference terminal apparatus, a video conference terminal apparatus, and a voice over Internet protocol (VoIP) terminal apparatus.