Encoding and decoding methods, and encoding and decoding apparatuses for stereo signal
11238875 · 2022-02-01
Cpc classification
H04S3/00
ELECTRICITY
G10L19/20
PHYSICS
H04S2420/03
ELECTRICITY
G10L19/008
PHYSICS
H04S3/008
ELECTRICITY
International classification
G10L19/008
PHYSICS
G10L19/20
PHYSICS
H04S3/00
ELECTRICITY
Abstract
This disclosure provides an encoding method, a decoding method, an encoding apparatus, and a decoding apparatus for a stereo signal. The encoding method includes: determining an inter-channel time difference in a current frame; performing interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame; performing delay alignment on a stereo signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo signal after the delay alignment in the current frame; performing time-domain downmixing processing on the stereo signal after the delay alignment in the current frame, to obtain a primary-channel signal and a secondary-channel signal in the current frame; and quantizing the inter-channel time difference after the interpolation processing in the current frame, the primary-channel signal, and the secondary-channel signal.
Claims
1. An encoding method for a stereo audio signal, comprising: determining an inter-channel time difference in a current frame; performing interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing; performing delay alignment on a stereo audio signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo audio signal after the delay alignment; performing time-domain downmixing processing on the stereo audio signal after the delay alignment, to obtain a primary-channel signal and a secondary-channel signal in the current frame; quantizing the inter-channel time difference after the interpolation processing, and writing the quantized inter-channel time difference into a bitstream; and quantizing the primary-channel signal and the secondary-channel signal in the current frame, and writing the quantized primary-channel signal and the quantized secondary-channel signal into the bitstream; wherein the inter-channel time difference after the interpolation processing is calculated according to a formula A=α·B+(1−α)·C, wherein A is the inter-channel time difference after the interpolation processing, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, α is a first interpolation coefficient, and 0<α<1; wherein the first interpolation coefficient α is inversely proportional to an encoding and decoding delay, and is directly proportional to a frame length of the current frame, wherein the encoding and decoding delay comprises an encoding delay in a process of encoding, by an encoding end, the primary-channel signal and the secondary-channel signal that are obtained after the time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, the bitstream to obtain a primary-channel signal and a secondary-channel signal.
2. The method according to claim 1, wherein the first interpolation coefficient α satisfies a formula α=(N−S)/N, wherein S is the encoding and decoding delay, and N is the frame length of the current frame.
3. An encoding method for a stereo audio signal, comprising: determining an inter-channel time difference in a current frame; performing interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing; performing delay alignment on a stereo audio signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo audio signal after the delay alignment; performing time-domain downmixing processing on the stereo audio signal after the delay alignment, to obtain a primary-channel signal and a secondary-channel signal in the current frame; quantizing the inter-channel time difference after the interpolation processing, and writing the quantized inter-channel time difference into a bitstream; and quantizing the primary-channel signal and the secondary-channel signal in the current frame, and writing the quantized primary-channel signal and the quantized secondary-channel signal into the bitstream; wherein the inter-channel time difference after the interpolation processing is calculated according to a formula A=(1−β)·B+β·C, wherein A is the inter-channel time difference after the interpolation processing, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, β is a second interpolation coefficient, and 0<β<1; wherein the second interpolation coefficient β is directly proportional to an encoding and decoding delay, and is inversely proportional to a frame length of the current frame, wherein the encoding and decoding delay comprises an encoding delay in a process of encoding, by an encoding end, the primary-channel signal and the secondary-channel signal that are obtained after the time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, the bitstream to obtain a primary-channel signal and a secondary-channel signal.
4. The method according to claim 3, wherein the second interpolation coefficient β satisfies a formula β=S/N, wherein S is the encoding and decoding delay, and N is the frame length of the current frame.
5. An encoding apparatus, comprising: at least one processor; and one or more memories coupled to the at least one processor and storing programming instructions for execution by the at least one processor to: determine an inter-channel time difference in a current frame; perform interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing; perform delay alignment on a stereo audio signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo audio signal after the delay alignment; perform time-domain downmixing processing on the stereo audio signal after the delay alignment, to obtain a primary-channel signal and a secondary-channel signal; and quantize the inter-channel time difference after the interpolation processing, and write the quantized inter-channel time difference into a bitstream; and quantize the primary-channel signal and the secondary-channel signal, and write the quantized primary-channel signal and the quantized secondary-channel signal into the bitstream; wherein the inter-channel time difference after the interpolation processing is calculated according to a formula A=α·B+(1−α)·C, wherein A is the inter-channel time difference after the interpolation processing, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, α is a first interpolation coefficient, and 0<α<1; wherein the first interpolation coefficient α is inversely proportional to an encoding and decoding delay, and is directly proportional to a frame length of the current frame, wherein the encoding and decoding delay comprises an encoding delay in a process of encoding, by an encoding end, the primary-channel signal and the secondary-channel signal that are obtained after the time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, the bitstream to obtain a primary-channel signal and a secondary-channel signal.
6. The apparatus according to claim 5, wherein the first interpolation coefficient α satisfies a formula α=(N−S)/N, wherein S is the encoding and decoding delay, and N is the frame length of the current frame.
7. An encoding apparatus, comprising: at least one processor; and one or more memories coupled to the at least one processor and storing programming instructions for execution by the at least one processor to: determine an inter-channel time difference in a current frame; perform interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing; perform delay alignment on a stereo audio signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo audio signal after the delay alignment; perform time-domain downmixing processing on the stereo audio signal after the delay alignment, to obtain a primary-channel signal and a secondary-channel signal; and quantize the inter-channel time difference after the interpolation processing, and write the quantized inter-channel time difference into a bitstream; and quantize the primary-channel signal and the secondary-channel signal, and write the quantized primary-channel signal and the quantized secondary-channel signal into the bitstream; wherein the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=(1−β)·B+β·C, wherein A is the inter-channel time difference after the interpolation processing, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, β is a second interpolation coefficient, and 0<β<1; wherein the second interpolation coefficient β is directly proportional to an encoding and decoding delay, and is inversely proportional to a frame length of the current frame, wherein the encoding and decoding delay comprises an encoding delay in a process of encoding, by an encoding end, the primary-channel signal and the secondary-channel signal that are obtained after the time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, the bitstream to obtain a primary-channel signal and a secondary-channel signal.
8. The apparatus according to claim 7, wherein the second interpolation coefficient β satisfies a formula β=S/N, wherein S is the encoding and decoding delay, and N is the frame length of the current frame.
9. A non-transitory computer-readable storage medium storing computer instructions, that when executed by one or more processors, cause the one or more processors to perform operations comprising: determining an inter-channel time difference in a current frame; performing interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing; performing delay alignment on a stereo audio signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo audio signal after the delay alignment; performing time-domain downmixing processing on the stereo audio signal after the delay alignment, to obtain a primary-channel signal and a secondary-channel signal in the current frame; quantizing the inter-channel time difference after the interpolation processing, and writing the quantized inter-channel time difference into a bitstream; and quantizing the primary-channel signal and the secondary-channel signal in the current frame, and writing the quantized primary-channel signal and the quantized secondary-channel signal into the bitstream; wherein the inter-channel time difference after the interpolation processing is calculated according to a formula A=α·B+(1−α)·C, wherein A is the inter-channel time difference after the interpolation processing, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, α is a first interpolation coefficient, and 0<α<1; wherein the first interpolation coefficient α is inversely proportional to an encoding and decoding delay, and is directly proportional to a frame length of the current frame, wherein the encoding and decoding delay comprises an encoding delay in a process of encoding, by an encoding end, the primary-channel signal and the secondary-channel signal that are obtained after the time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, the bitstream to obtain a primary-channel signal and a secondary-channel signal.
10. The non-transitory computer-readable storage medium according to claim 9, wherein the first interpolation coefficient α satisfies a formula α=(N−S)/N, wherein S is the encoding and decoding delay, and N is the frame length of the current frame.
11. A non-transitory computer-readable storage medium storing computer instructions, that when executed by one or more processors, cause the one or more processors to perform operations comprising: determining an inter-channel time difference in a current frame; performing interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing; performing delay alignment on a stereo audio signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo audio signal after the delay alignment; performing time-domain downmixing processing on the stereo audio signal after the delay alignment, to obtain a primary-channel signal and a secondary-channel signal in the current frame; quantizing the inter-channel time difference after the interpolation processing, and writing the quantized inter-channel time difference into a bitstream; and quantizing the primary-channel signal and the secondary-channel signal in the current frame, and writing the quantized primary-channel signal and the quantized secondary-channel signal into the bitstream; wherein the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=(1−β)·B+β·C, wherein A is the inter-channel time difference after the interpolation processing, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, β is a second interpolation coefficient, and 0<β<1; wherein the second interpolation coefficient β is directly proportional to an encoding and decoding delay, and is inversely proportional to a frame length of the current frame, wherein the encoding and decoding delay comprises an encoding delay in a process of encoding, by an encoding end, the primary-channel signal and the secondary-channel signal that are obtained after the time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, the bitstream to obtain a primary-channel signal and a secondary-channel signal.
12. The non-transitory computer-readable storage medium according to claim 11, wherein the second interpolation coefficient β satisfies a formula β=S/N, wherein S is the encoding and decoding delay, and N is the frame length of the current frame.
Description
DESCRIPTION OF EMBODIMENTS
(19) The following describes the technical solutions in this disclosure with reference to the accompanying drawings.
(20) To better understand the encoding and decoding methods in the embodiments of this disclosure, the following first describes in detail the processes of existing time-domain stereo encoding and decoding methods.
(22) 110. An encoding end estimates an inter-channel time difference of a stereo signal, to obtain the inter-channel time difference of the stereo signal.
(23) The stereo signal includes a left-channel signal and a right-channel signal. The inter-channel time difference of the stereo signal is a time difference between the left-channel signal and the right-channel signal.
(24) 120. Perform delay alignment on the left-channel signal and the right-channel signal based on the estimated inter-channel time difference.
(25) 130. Encode the inter-channel time difference of the stereo signal, to obtain an encoding index of the inter-channel time difference, and write the encoding index into a stereo encoded bitstream.
(26) 140. Determine a channel combination scale factor, encode the channel combination scale factor to obtain an encoding index of the channel combination scale factor, and write the encoding index into the stereo encoded bitstream.
(27) 150. Perform, based on the channel combination scale factor, time-domain downmixing processing on a left-channel signal and a right-channel signal that are obtained after the delay alignment.
(28) 160. Separately encode a primary-channel signal and a secondary-channel signal that are obtained after the downmixing processing, to obtain bitstreams of the primary-channel signal and the secondary-channel signal, and write the bitstreams into the stereo encoded bitstream.
(30) 210. Decode a received bitstream to obtain a primary-channel signal and a secondary-channel signal.
(31) Step 210 is equivalent to separately performing primary-channel signal decoding and secondary-channel signal decoding to obtain the primary-channel signal and the secondary-channel signal.
(32) 220. Decode the received bitstream to obtain a channel combination scale factor.
(33) 230. Perform time-domain upmixing processing on the primary-channel signal and the secondary-channel signal based on the channel combination scale factor, to obtain a left-channel reconstructed signal and a right-channel reconstructed signal that are obtained after the time-domain upmixing processing.
(34) 240. Decode the received bitstream to obtain an inter-channel time difference.
(35) 250. Adjust, based on the inter-channel time difference, a delay of the left-channel reconstructed signal and the right-channel reconstructed signal that are obtained after the time-domain upmixing processing, to obtain a decoded stereo signal.
(36) In the existing time-domain stereo encoding and decoding methods, an additional encoding delay (specifically, this delay may be the time required for encoding the primary-channel signal and the secondary-channel signal) and an additional decoding delay (specifically, this delay may be the time required for decoding the primary-channel signal and the secondary-channel signal) are introduced in the processes of encoding (step 160) and decoding (step 210) the primary-channel signal and the secondary-channel signal. However, the processes of encoding and decoding the inter-channel time difference do not introduce the same encoding delay and decoding delay. Therefore, the inter-channel time difference of the stereo signal that is finally obtained by decoding deviates from the inter-channel time difference of the original stereo signal. As a result, there is a delay between a signal in the decoded stereo signal and the same signal in the original stereo signal, which degrades the accuracy of the stereo sound image of the decoded stereo signal.
(37) Specifically, the processes of encoding and decoding the inter-channel time difference do not involve the encoding delay and decoding delay that arise in the processes of encoding and decoding the primary-channel signal and the secondary-channel signal. Therefore, the primary-channel signal and the secondary-channel signal that the decoding end currently obtains by decoding do not match the inter-channel time difference that it currently obtains by decoding.
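The mismatch described above can be illustrated with a short sketch. It assumes, for illustration only, that the decoded primary-channel and secondary-channel signals are delayed by exactly S samples relative to the frame grid of the encoding end, while the inter-channel time difference decoded in the same frame is not delayed. The function names are hypothetical, and the values N=320 and S=192 are used only as an example. Under this assumption, part of each decoded frame originates from the previous input frame, so a time difference estimated only on the current frame does not describe the whole decoded frame.

```python
def source_frames_of_output(frame_index, frame_length, codec_delay):
    """For each output sample of one decoded frame, return the index of the
    input frame that the sample came from, assuming a pure delay of
    codec_delay samples on the primary/secondary signal path."""
    start = frame_index * frame_length
    return [(start + k - codec_delay) // frame_length
            for k in range(frame_length)]


def fraction_from_previous_frame(frame_length, codec_delay):
    """Fraction of decoded frame 1 whose samples belong to input frame 0."""
    sources = source_frames_of_output(1, frame_length, codec_delay)
    return sources.count(0) / frame_length


if __name__ == "__main__":
    # Example values only: frame length N = 320, codec delay S = 192.
    print(fraction_from_previous_frame(320, 192))
```

With these example values, 192 of the 320 output samples come from the previous input frame, which is the kind of offset that the interpolation described below is meant to compensate.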
(39) Therefore, this disclosure provides a new encoding method for a stereo signal. According to the encoding method, interpolation processing is performed on an inter-channel time difference in a current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing in the current frame, and the inter-channel time difference after the interpolation processing in the current frame is encoded and then transmitted to a decoding end. However, delay alignment is still performed by using the inter-channel time difference in the current frame. Compared with the prior art, the inter-channel time difference after the interpolation processing in the current frame better matches the primary-channel signal and the secondary-channel signal that are obtained after encoding and decoding, and has a relatively high degree of matching with the corresponding stereo signal. This reduces the deviation between the inter-channel time difference of the stereo signal that is finally obtained by the decoding end through decoding and the inter-channel time difference of the original stereo signal. Therefore, the quality of the stereo signal that is finally obtained by the decoding end through decoding can be improved.
(40) It should be understood that the stereo signal in this disclosure may be an original stereo signal, a stereo signal including two signals that are included in a multi-channel signal, or a stereo signal including two signals that are jointly generated by a plurality of signals included in a multi-channel signal. The encoding method for a stereo signal may also be an encoding method for a stereo signal that is used in a multi-channel encoding method. The decoding method for a stereo signal may also be a decoding method for a stereo signal that is used in a multi-channel decoding method.
(42) 410. Determine an inter-channel time difference in a current frame.
(43) It should be understood that a stereo signal processed herein may include a left-channel signal and a right-channel signal, and the inter-channel time difference in the current frame may be obtained by estimating a delay of the left-channel signal and the right-channel signal. An inter-channel time difference in a previous frame of the current frame may be obtained by estimating a delay of a left-channel signal and a right-channel signal in a process of encoding a stereo signal in the previous frame. For example, a cross-correlation coefficient of a left channel and a right channel is calculated based on the left-channel signal and the right-channel signal in the current frame, and then an index value corresponding to a maximum value of the cross-correlation coefficient is used as the inter-channel time difference in the current frame.
(44) Specifically, delay estimation may be performed in a manner described in Example 1 to Example 3, to obtain the inter-channel time difference in the current frame.
Example 1
(45) At a current sampling rate, a maximum value and a minimum value of the inter-channel time difference are T_max and T_min respectively, where T_max and T_min are preset real numbers, and T_max>T_min. In this case, a maximum value of the cross-correlation coefficient of the left and right channels may be searched for among index values between the minimum value and the maximum value of the inter-channel time difference. Finally, the index value corresponding to the found maximum value of the cross-correlation coefficient of the left and right channels is determined as the inter-channel time difference in the current frame. Specifically, values of T_max and T_min may be 40 and −40 respectively. In this way, the maximum value of the cross-correlation coefficient of the left and right channels may be searched for in a range of −40≤i≤40, and then the index value corresponding to the maximum value of the cross-correlation coefficient is used as the inter-channel time difference in the current frame.
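The search in Example 1 may be sketched as follows. This is a minimal illustration, not the implementation of this disclosure: the function names are hypothetical, the cross-correlation is an unnormalized sum of products, and the sign convention (a positive index means that the left channel lags the right channel) is an assumption.

```python
def cross_correlation(left, right, lag):
    """Unnormalized cross-correlation sum of left[k + lag] * right[k]
    over the samples where both indices are inside the frame."""
    n = len(left)
    if lag >= 0:
        return sum(left[k + lag] * right[k] for k in range(n - lag))
    return sum(left[k + lag] * right[k] for k in range(-lag, n))


def estimate_itd(left, right, t_min=-40, t_max=40):
    """Return the index in [t_min, t_max] that maximizes the
    cross-correlation of the two channels."""
    return max(range(t_min, t_max + 1),
               key=lambda lag: cross_correlation(left, right, lag))


if __name__ == "__main__":
    # Toy frame: one impulse in each channel, left delayed by 5 samples.
    right = [0.0] * 320
    left = [0.0] * 320
    right[100] = 1.0
    left[105] = 1.0
    print(estimate_itd(left, right))
```

A practical encoder would typically normalize the correlation and work on preprocessed signals; those choices are outside this sketch.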
Example 2
(46) At a current sampling rate, a maximum value and a minimum value of the inter-channel time difference are T_max and T_min respectively, where T_max and T_min are preset real numbers, and T_max>T_min. A cross-correlation function of the left and right channels is calculated based on the left-channel signal and the right-channel signal in the current frame. In addition, smoothing processing is performed on the calculated cross-correlation function of the left and right channels in the current frame based on a cross-correlation function of the left and right channels in previous L frames (L is an integer greater than or equal to 1), to obtain a smoothed cross-correlation function of the left and right channels. Then, a maximum value of the cross-correlation coefficient of the left and right channels after the smoothing processing is searched for in a range of T_min≤i≤T_max, and the index value i corresponding to the maximum value is used as the inter-channel time difference in the current frame.
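Example 2 does not specify the smoothing rule. The sketch below assumes, purely for illustration, an equal-weight average of the current cross-correlation function and those of the previous L frames; the class and function names are hypothetical. Each cross-correlation function is represented as a list whose entry k corresponds to index value T_min + k.

```python
from collections import deque


class CorrelationSmoother:
    """Averages the current cross-correlation function with those of up
    to num_prev_frames previous frames (assumed equal weights)."""

    def __init__(self, num_prev_frames=1):
        self.history = deque(maxlen=num_prev_frames)

    def smooth(self, corr):
        frames = list(self.history) + [list(corr)]
        smoothed = [sum(vals) / len(frames) for vals in zip(*frames)]
        self.history.append(list(corr))  # store the unsmoothed function
        return smoothed


def pick_itd(smoothed, t_min):
    """Map the position of the maximum back to an index value i."""
    best = max(range(len(smoothed)), key=lambda k: smoothed[k])
    return t_min + best
```

For instance, with L=1 and T_min=−1, a peak that appears in only one frame is attenuated by the average before the maximum is searched for.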
Example 3
(47) After the inter-channel time difference in the current frame is estimated according to the method in Example 1 or Example 2, inter-frame smoothing processing is performed on inter-channel time differences in previous M frames (M is an integer greater than or equal to 1) of the current frame and the estimated inter-channel time difference in the current frame, and the inter-channel time difference obtained after the smoothing processing is used as the inter-channel time difference in the current frame.
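The inter-frame smoothing in Example 3 may be sketched as follows. The smoothing rule is not given in the text, so this sketch assumes a plain mean over the estimates of the previous M frames and the current estimate, and it assumes that the stored history holds the unsmoothed estimates. The class name is hypothetical.

```python
from collections import deque


class ItdSmoother:
    """Mean of the current ITD estimate and up to M previous estimates."""

    def __init__(self, num_prev_frames=2):
        self.prev = deque(maxlen=num_prev_frames)

    def smooth(self, estimated_itd):
        values = list(self.prev) + [estimated_itd]
        smoothed = sum(values) / len(values)
        self.prev.append(estimated_itd)  # keep the raw estimate (assumption)
        return smoothed
```

A weighted or median-based rule would fit the same interface; the mean is used here only because it is the simplest choice.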
(48) It should be understood that, before the delay of the left-channel signal and the right-channel signal (the left-channel signal and the right-channel signal herein are time-domain signals) is estimated to obtain the inter-channel time difference in the current frame, time-domain preprocessing may be further performed on the left-channel signal and the right-channel signal in the current frame. Specifically, high-pass filtering processing may be performed on the left-channel signal and the right-channel signal in the current frame to obtain a preprocessed left-channel signal and a preprocessed right-channel signal in the current frame. In addition, the time-domain preprocessing herein may alternatively be processing other than the high-pass filtering processing, for example, pre-emphasis processing.
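As one possible form of the pre-emphasis processing mentioned above, the following sketch applies a first-order filter y[n] = x[n] − μ·x[n−1] to one channel. The filter form is a common choice rather than something stated in this disclosure, and the coefficient value 0.68 is a placeholder assumption.

```python
def pre_emphasis(signal, mu=0.68, prev_sample=0.0):
    """First-order pre-emphasis: y[n] = x[n] - mu * x[n-1].

    prev_sample carries the last input sample of the previous frame so
    that the filter can run continuously across frame boundaries.
    """
    out = []
    for x in signal:
        out.append(x - mu * prev_sample)
        prev_sample = x
    return out
```

The same function would be applied separately to the left-channel signal and the right-channel signal before delay estimation.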
(49) 420. Perform interpolation processing based on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing in the current frame.
(50) It should be understood that the inter-channel time difference in the current frame may be a time difference between the left-channel signal in the current frame and the right-channel signal in the current frame, and the inter-channel time difference in the previous frame of the current frame may be a time difference between a left-channel signal in the previous frame of the current frame and a right-channel signal in the previous frame of the current frame.
(51) It should be understood that performing interpolation processing based on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame is equivalent to performing weighted average processing on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame. In this way, the finally obtained inter-channel time difference after the interpolation processing in the current frame is between the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame.
(52) There may be a plurality of specific manners for performing interpolation processing based on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame. For example, interpolation processing may be performed in the following Manner 1 or Manner 2.
Manner 1
(53) The inter-channel time difference after the interpolation processing in the current frame is calculated according to formula (1).
A = α·B + (1−α)·C (1)
(54) In formula (1), A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, α is a first interpolation coefficient, and α is a real number satisfying 0<α<1.
(55) The inter-channel time difference can be adjusted by using the formula A = α·B + (1−α)·C, so that the finally obtained inter-channel time difference after the interpolation processing in the current frame is between the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, and the inter-channel time difference after the interpolation processing in the current frame matches, as much as possible, an inter-channel time difference of an original stereo signal that is not encoded and decoded.
(56) Specifically, assuming that the current frame is an i-th frame, the previous frame of the current frame is an (i−1)-th frame. In this case, an inter-channel time difference after the interpolation processing in the i-th frame may be determined according to formula (2).
d_int(i) = α·d(i) + (1−α)·d(i−1) (2)
(57) In formula (2), d_int(i) is the inter-channel time difference after the interpolation processing in the i-th frame, d(i) is the inter-channel time difference in the current frame, d(i−1) is the inter-channel time difference in the (i−1)-th frame, and α has the same meaning as α in formula (1), and is also the first interpolation coefficient.
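Formula (2) can be applied frame by frame as in the following sketch. The function name and the initial value used for the frame before the first frame are assumptions made for illustration.

```python
def interpolate_itd_sequence(itds, alpha, initial_prev=0.0):
    """Apply d_int(i) = alpha * d(i) + (1 - alpha) * d(i - 1) to a
    sequence of per-frame inter-channel time differences d(i)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    interpolated = []
    prev = initial_prev  # assumed d(-1) for the very first frame
    for cur in itds:
        interpolated.append(alpha * cur + (1.0 - alpha) * prev)
        prev = cur  # the unmodified d(i) becomes d(i - 1) of the next frame
    return interpolated
```

Note that the value carried to the next frame is the estimated d(i), not the interpolated value, so each output always lies between two consecutive estimates.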
(58) The first interpolation coefficient may be directly set by technical personnel. For example, the first interpolation coefficient α may be directly set to 0.4 or 0.6.
(59) In addition, the first interpolation coefficient α may also be determined based on a frame length of the current frame and an encoding and decoding delay. The encoding and decoding delay herein may include an encoding delay in a process of encoding, by the encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal. Further, the encoding and decoding delay herein may be a sum of the encoding delay and the decoding delay. The encoding and decoding delay may be determined after an encoding and decoding algorithm used by a codec is determined. Therefore, the encoding and decoding delay is a known parameter for an encoder or a decoder.
(60) Optionally, the first interpolation coefficient α may be specifically inversely proportional to the encoding and decoding delay, and is directly proportional to the frame length of the current frame. In other words, the first interpolation coefficient α decreases as the encoding and decoding delay increases, and increases as the frame length of the current frame increases.
(61) Optionally, the first interpolation coefficient α may be determined according to a formula (3).
(62) α = (N − S)/N (3)
(63) In the formula (3), N is the frame length of the current frame, and S is the encoding and decoding delay.
(64) When N=320 and S=192, the following may be obtained according to the formula (3):
(65) α = (N − S)/N = (320 − 192)/320 = 0.4 (4)
(66) Finally, it can be obtained that the first interpolation coefficient α is 0.4.
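The calculation above follows the relation α = (N − S)/N recited in claim 2. A minimal sketch is given below; the function name is hypothetical, and the range check on S is an assumption added so that the result satisfies 0<α<1.

```python
def first_interpolation_coefficient(frame_length, codec_delay):
    """alpha = (N - S) / N, where N is the frame length and S is the
    encoding and decoding delay, both expressed in samples."""
    if not 0 < codec_delay < frame_length:
        raise ValueError("expected 0 < S < N so that 0 < alpha < 1")
    return (frame_length - codec_delay) / frame_length


if __name__ == "__main__":
    print(first_interpolation_coefficient(320, 192))
```

Because N and S are fixed once the codec configuration is chosen, the value can be computed once and reused for every frame.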
(67) Alternatively, the first interpolation coefficient α is pre-stored. Because the encoding and decoding delay and the frame length may be known in advance, the corresponding first interpolation coefficient α may also be determined and stored in advance based on the encoding and decoding delay and the frame length. Specifically, the first interpolation coefficient α may be pre-stored at the encoding end. In this way, when performing interpolation processing, the encoding end may directly perform interpolation processing based on the pre-stored first interpolation coefficient α without calculating a value of the first interpolation coefficient α. This can reduce the computational complexity of the encoding process and improve encoding efficiency.
Manner 2
(68) The inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula (5).
A=(1−β)·B+β·C (5)
(69) In the formula (5), A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, β is a second interpolation coefficient, and is a real number satisfying 0<β<1.
(70) The inter-channel time difference can be adjusted by using the formula A=(1−β)·B+β·C, so that the finally obtained inter-channel time difference after interpolation processing in the current frame is between the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, and the inter-channel time difference after the interpolation processing in the current frame matches, as much as possible, an inter-channel time difference of an original stereo signal that is not encoded and decoded.
(71) Specifically, assuming that the current frame is an i-th frame, the previous frame of the current frame is an (i−1)-th frame. In this case, an inter-channel time difference after the interpolation processing in the i-th frame may be determined according to a formula (6).
d_int(i)=(1−β)·d(i)+β·d(i−1) (6)
(72) In the formula (6), d_int(i) is the inter-channel time difference after the interpolation processing in the i-th frame, d(i) is the inter-channel time difference in the current frame, d(i−1) is the inter-channel time difference in the (i−1)-th frame, and β has the same meaning as β in the formula (5), that is, it is the second interpolation coefficient.
(73) The second interpolation coefficient β may be directly set by technical personnel. For example, the second interpolation coefficient β may be directly set to 0.6 or 0.4.
(74) In addition, the second interpolation coefficient β may also be determined based on a frame length of the current frame and an encoding and decoding delay. The encoding and decoding delay herein may include an encoding delay in a process of encoding, by the encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal. Further, the encoding and decoding delay herein may be a sum of the encoding delay and the decoding delay.
(75) Optionally, the second interpolation coefficient β may be specifically directly proportional to the encoding and decoding delay. In addition, the second interpolation coefficient β may be specifically inversely proportional to the frame length of the current frame.
(76) Optionally, the second interpolation coefficient β may be determined according to a formula (7).
(77) β=S/N (7)
(78) In the formula (7), N is the frame length of the current frame, and S is the encoding and decoding delay.
(79) When N=320 and S=192, the following may be obtained according to the formula (7):
(80) β=S/N=192/320=0.6 (8)
(81) Finally, it can be obtained that the second interpolation coefficient β is 0.6.
(82) Alternatively, the second interpolation coefficient β is pre-stored. Because the encoding and decoding delay and the frame length may be known in advance, the corresponding second interpolation coefficient β may also be determined and stored in advance based on the encoding and decoding delay and the frame length. Specifically, the second interpolation coefficient β may be pre-stored at the encoding end. In this way, when performing interpolation processing, the encoding end may directly perform interpolation processing based on the pre-stored second interpolation coefficient β without calculating a value of the second interpolation coefficient β. This can reduce calculation complexity of an encoding process and improve encoding efficiency.
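As a hedged illustration of Manner 2, the sketch below assumes that formula (7) has the form β=S/N, which reproduces the worked value 0.6 for N=320 and S=192. It also shows that the Manner 2 formula gives the same result as the Manner 1 formula A=α·B+(1−α)·C whenever β=1−α. All function names are hypothetical.

```python
def second_interp_coeff(frame_len: int, codec_delay: int) -> float:
    """Assumed form of formula (7): beta = S / N."""
    if not 0 < codec_delay < frame_len:
        raise ValueError("need 0 < S < N so that 0 < beta < 1")
    return codec_delay / frame_len


def interpolate_manner2(beta: float, itd_cur: float, itd_prev: float) -> float:
    """Formula (5): A = (1 - beta) * B + beta * C."""
    return (1.0 - beta) * itd_cur + beta * itd_prev


def interpolate_manner1(alpha: float, itd_cur: float, itd_prev: float) -> float:
    """Manner 1 form: A = alpha * B + (1 - alpha) * C."""
    return alpha * itd_cur + (1.0 - alpha) * itd_prev


beta = second_interp_coeff(320, 192)
a2 = interpolate_manner2(beta, itd_cur=12.0, itd_prev=2.0)
a1 = interpolate_manner1(1.0 - beta, itd_cur=12.0, itd_prev=2.0)
```

The two manners differ only in which of the two weights is named as the coefficient; the interpolated value always lies between the previous-frame and current-frame values.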
(83) 430. Perform delay alignment on a stereo signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo signal after the delay alignment in the current frame.
(84) When delay alignment is performed on the left-channel signal and the right-channel signal in the current frame, one or two of the left-channel signal and the right-channel signal may be compressed or extended based on the inter-channel time difference in the current frame, so that there is no inter-channel time difference between a left-channel signal and a right-channel signal after the delay alignment. The left-channel signal and the right-channel signal after the delay alignment in the current frame, which are obtained after delay alignment is performed on the left-channel signal and the right-channel signal in the current frame, are stereo signals after the delay alignment in the current frame.
(85) 440. Perform time-domain downmixing processing on the stereo signal after the delay alignment in the current frame, to obtain a primary-channel signal and a secondary-channel signal in the current frame.
(86) When time-domain downmixing processing is performed on the left-channel signal and the right-channel signal after the delay alignment, the left-channel signal and the right-channel signal may be down-mixed into a middle channel (Mid channel) signal and a side channel (Side channel) signal. The middle channel signal can indicate related information between the left channel and the right channel, and the side channel signal can indicate difference information between the left channel and the right channel.
(87) Assuming that L represents the left-channel signal and R represents the right-channel signal, the middle channel signal is 0.5×(L+R) and the side channel signal is 0.5×(L−R).
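The mid/side relation above can be sketched as follows. The downmix uses exactly the expressions 0.5×(L+R) and 0.5×(L−R); the matching upmix (L = Mid + Side, R = Mid − Side) follows by simple algebra and is included only to show that the pair is invertible. Function names are illustrative.

```python
def downmix_mid_side(left, right):
    """Mid = 0.5 * (L + R), Side = 0.5 * (L - R), sample by sample."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def upmix_mid_side(mid, side):
    """Algebraic inverse of the downmix: L = Mid + Side, R = Mid - Side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [1.0, 2.0, -3.0]
right = [0.5, 2.0, 1.0]
mid, side = downmix_mid_side(left, right)
left_rec, right_rec = upmix_mid_side(mid, side)
```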
(88) In addition, when time-domain downmixing processing is performed on the left-channel signal and the right-channel signal after the delay alignment, to control a ratio of the left-channel signal and the right-channel signal in the downmixing processing, a channel combination scale factor may be calculated, and then time-domain downmixing processing is performed on the left-channel signal and the right-channel signal based on the channel combination scale factor, to obtain a primary-channel signal and a secondary-channel signal.
(89) There are a plurality of methods for calculating the channel combination scale factor. For example, a channel combination scale factor in the current frame may be calculated based on frame energy of the left channel and the right channel. A specific process is as follows:
(90) (1). Calculate frame energy of the left-channel signal and the right-channel signal based on the left-channel signal and the right-channel signal after the delay alignment in the current frame.
(91) The frame energy rms_L of the left channel in the current frame satisfies:
(92)
(93) The frame energy rms_R of the right channel in the current frame satisfies:
(94)
(95) x′_L(n) is the left-channel signal after the delay alignment in the current frame, x′_R(n) is the right-channel signal after the delay alignment in the current frame, n is a sampling point number, and n=0, 1, . . . , N−1.
(96) (2). Calculate the channel combination scale factor in the current frame based on the frame energy of the left channel and the right channel.
(97) The channel combination scale factor ratio in the current frame satisfies:
(98)
(99) Therefore, the channel combination scale factor is calculated based on the frame energy of the left-channel signal and the right-channel signal.
(100) After the channel combination scale factor ratio is obtained, time-domain downmixing processing may be performed based on the channel combination scale factor ratio. For example, the primary-channel signal and the secondary-channel signal after the time-domain downmixing processing may be determined according to a formula (12).
(101)
(102) Y(n) is the primary-channel signal in the current frame, X(n) is the secondary-channel signal in the current frame, x′_L(n) is the left-channel signal after the delay alignment in the current frame, x′_R(n) is the right-channel signal after the delay alignment in the current frame, n is the sampling point number, n=0, 1, . . . , N−1, N is the frame length, and ratio is the channel combination scale factor.
(103) (3). Quantize the channel combination scale factor, and write a quantized channel combination scale factor into a bitstream.
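The three steps above can be sketched as follows. The formulas for the frame energy, the scale factor, and the downmix are not reproduced in this text, so every formula in the sketch is an assumption: frame energy is taken as the root mean square of the frame, the scale factor as rms_R/(rms_L+rms_R), and the downmix as a 2×2 mix weighted by ratio and 1−ratio. The sketch illustrates the data flow only; quantization of the scale factor (step (3)) is omitted.

```python
import math


def frame_rms(x):
    """ASSUMED frame energy: root mean square over the N samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def channel_combination_ratio(left, right):
    """ASSUMED scale factor: rms_R / (rms_L + rms_R)."""
    rms_l, rms_r = frame_rms(left), frame_rms(right)
    total = rms_l + rms_r
    if total == 0.0:
        return 0.5  # silent frame: fall back to an equal mix (assumption)
    return rms_r / total


def downmix_with_ratio(left, right, ratio):
    """ASSUMED downmix: Y = ratio*L + (1-ratio)*R, X = (1-ratio)*L - ratio*R."""
    primary = [ratio * l + (1.0 - ratio) * r for l, r in zip(left, right)]
    secondary = [(1.0 - ratio) * l - ratio * r for l, r in zip(left, right)]
    return primary, secondary


left = [1.0, -1.0, 1.0, -1.0]
right = [1.0, -1.0, 1.0, -1.0]
ratio = channel_combination_ratio(left, right)
primary, secondary = downmix_with_ratio(left, right, ratio)
```

With identical channels the assumed ratio is 0.5, so the primary channel equals either input and the secondary channel is zero, which is the behavior one would expect from a mid/side style downmix.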
(104) 450. Quantize the inter-channel time difference after the interpolation processing in the current frame, and write a quantized inter-channel time difference into a bitstream.
(105) Specifically, in a process of quantizing the inter-channel time difference after the interpolation processing in the current frame, any quantization algorithm in the prior art may be used to quantize the inter-channel time difference after the interpolation processing in the current frame, to obtain a quantization index. Then, the quantization index is encoded and then written into a bitstream.
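Since any quantization algorithm may be used, the sketch below shows only one simple possibility: a uniform scalar quantizer that maps the interpolated inter-channel time difference to an integer index. The range of ±40 samples and the step of 1 sample are hypothetical values chosen for illustration, as are the function names.

```python
def quantize_itd(itd: float, max_abs: float = 40.0, step: float = 1.0) -> int:
    """Clamp the value to [-max_abs, max_abs] and return a non-negative index."""
    clamped = max(-max_abs, min(max_abs, itd))
    return int(round((clamped + max_abs) / step))


def dequantize_itd(index: int, max_abs: float = 40.0, step: float = 1.0) -> float:
    """Map a quantization index back to a time difference in samples."""
    return index * step - max_abs


index = quantize_itd(4.0)          # interpolated value from the encoder
itd_decoded = dequantize_itd(index)
```

Interpolation generally produces a fractional value, so the step size determines how much of that fractional precision survives in the bitstream.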
(106) 460. Quantize the primary-channel signal and the secondary-channel signal in the current frame, and write a quantized primary-channel signal and a quantized secondary-channel signal into the bitstream.
(107) Optionally, a monophonic signal encoding and decoding method may be used to encode the primary-channel signal and the secondary-channel signal that are obtained after the downmixing processing. Specifically, bits of encoding a primary channel and a secondary channel may be allocated based on parameter information obtained in a process of encoding a primary-channel signal in the previous frame and/or a secondary-channel signal in the previous frame and a total number of bits of encoding the primary-channel signal and the secondary-channel signal. Then, the primary-channel signal and the secondary-channel signal are separately encoded based on a bit allocation result, to obtain an encoding index of encoding the primary channel and an encoding index of encoding the secondary channel.
(108) It should be understood that the bitstream obtained after the step 460 includes a bitstream that is obtained after the inter-channel time difference after the interpolation processing in the current frame is quantized and a bitstream that is obtained after the primary-channel signal and the secondary-channel signal are quantized.
(109) Optionally, in the method 400, the channel combination scale factor that is used when time-domain downmixing processing is performed in the step 440 may be quantized, to obtain a corresponding bitstream.
(110) Therefore, the bitstream finally obtained in the method 400 may include the bitstream that is obtained after the inter-channel time difference after the interpolation processing in the current frame is quantized, the bitstream that is obtained after the primary-channel signal and the secondary-channel signal in the current frame are quantized, and the bitstream that is obtained after the channel combination scale factor is quantized.
(111) In this disclosure, the inter-channel time difference in the current frame is used at the encoding end to perform delay alignment, to obtain the primary-channel signal and the secondary-channel signal. However, interpolation processing is performed on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, so that the inter-channel time difference in the current frame that is obtained after the interpolation processing can match the primary-channel signal and the secondary-channel signal that are obtained by encoding and decoding. The inter-channel time difference after the interpolation processing is encoded and then transmitted to the decoding end, so that the decoding end can perform decoding based on the inter-channel time difference in the current frame that matches the primary-channel signal and the secondary-channel signal that are obtained by decoding. This can reduce a deviation between an inter-channel time difference of a stereo signal that is finally obtained by decoding and an inter-channel time difference of an original stereo signal. Therefore, accuracy of a stereo sound image of the stereo signal that is finally obtained by decoding is improved.
(112) It should be understood that the bitstream finally obtained in the method 400 may be transmitted to the decoding end, and the decoding end may decode the received bitstream to obtain the primary-channel signal and the secondary-channel signal in the current frame and the inter-channel time difference in the current frame, and adjust, based on the inter-channel time difference in the current frame, a delay of a left-channel reconstructed signal and a right-channel reconstructed signal that are obtained after time-domain upmixing processing, to obtain a decoded stereo signal. A specific process executed by the decoding end may be the same as the process of the time-domain stereo decoding method in the prior art shown in
(113) The decoding end decodes the bitstream generated in the method 400, and a difference between a signal in the finally obtained stereo signal and the same signal in the original stereo signal may be shown in
(114) It should be understood that downmixing processing may be further implemented herein in another manner, to obtain the primary-channel signal and the secondary-channel signal.
(115) A detailed process of the encoding method for a stereo signal in the embodiments of this disclosure is described below with reference to
(116)
(117) 610. Perform time-domain preprocessing on a stereo signal, to obtain a left-channel signal and a right-channel signal after the preprocessing.
(118) Specifically, the time-domain preprocessing on the stereo signal may be implemented by using high-pass filtering, pre-emphasis processing, or the like.
(119) 620. Perform delay estimation based on the left-channel signal and the right-channel signal after the preprocessing in the current frame, to obtain an estimated inter-channel time difference in the current frame.
(120) The estimated inter-channel time difference in the current frame is equivalent to the inter-channel time difference in the current frame in the method 400.
(121) 630. Perform delay alignment on the left-channel signal and the right-channel signal based on the estimated inter-channel time difference in the current frame, to obtain a stereo signal after the delay alignment.
(122) 640. Perform interpolation processing on the estimated inter-channel time difference.
(123) An inter-channel time difference after the interpolation processing is equivalent to the inter-channel time difference after the interpolation processing in the current frame in the foregoing description.
(124) 650. Quantize the inter-channel time difference after the interpolation processing.
(125) 660. Determine a channel combination scale factor based on the stereo signal after the delay alignment, and quantize the channel combination scale factor.
(126) 670. Perform, based on the channel combination scale factor, time-domain downmixing processing on a left-channel signal and a right-channel signal that are obtained after the delay alignment, to obtain a primary-channel signal and a secondary-channel signal.
(127) 680. Encode, by using a monophonic signal encoding and decoding method, the primary-channel signal and the secondary-channel signal that are obtained after the time-domain downmixing processing.
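The key point of steps 630 to 650 is that two different values of the inter-channel time difference are in play: the estimated value drives delay alignment, while the interpolated value is what gets quantized. A minimal sketch of that bookkeeping follows. The class name `ItdPath`, the initial previous-frame value of 0, and the use of the Manner 1 formula are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ItdPath:
    """Tracks the previous frame's estimated ITD across frames."""
    alpha: float            # first interpolation coefficient, 0 < alpha < 1
    prev_itd: float = 0.0   # assumed start-up value before the first frame

    def process(self, estimated_itd: float):
        """Return (itd_for_alignment, itd_for_bitstream) for one frame."""
        itd_for_alignment = estimated_itd                      # step 630
        itd_for_bitstream = (self.alpha * estimated_itd        # step 640
                             + (1.0 - self.alpha) * self.prev_itd)
        self.prev_itd = estimated_itd  # d(i-1) is the uninterpolated estimate
        return itd_for_alignment, itd_for_bitstream


path = ItdPath(alpha=0.4)
frame1 = path.process(10.0)
frame2 = path.process(20.0)
```

Note that the stored previous-frame value is the estimated time difference d(i−1), not the previously interpolated value, in line with the definition of C in the interpolation formulas.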
(128) The foregoing describes in detail the encoding method for a stereo signal in the embodiments of this disclosure with reference to
(129) The following describes in detail the decoding method for a stereo signal in the embodiments of this disclosure with reference to
(130)
(131) 710. Decode a bitstream to obtain a primary-channel signal and a secondary-channel signal in a current frame, and an inter-channel time difference in the current frame.
(132) It should be understood that, in the step 710, a method for decoding the primary-channel signal needs to correspond to a method for encoding the primary-channel signal by an encoding end. Similarly, a method for decoding the secondary channel also needs to correspond to a method for encoding the secondary-channel signal by the encoding end.
(133) Optionally, the bitstream in the step 710 may be a bitstream received by the decoding end.
(134) It should be understood that a stereo signal processed herein may include a left-channel signal and a right-channel signal, and the inter-channel time difference in the current frame may be obtained by estimating, by the encoding end, a delay of the left-channel signal and the right-channel signal, and then the inter-channel time difference in the current frame is quantized before being transmitted to the decoding end (the inter-channel time difference in the current frame may be specifically determined after the decoding end decodes the received bitstream). For example, the encoding end calculates a cross-correlation function of a left channel and a right channel based on a left-channel signal and a right-channel signal in the current frame, then uses an index value corresponding to a maximum value of the cross-correlation function as the inter-channel time difference in the current frame, quantizes and encodes the inter-channel time difference in the current frame, and transmits a quantized inter-channel time difference to the decoding end. The decoding end decodes the received bitstream to determine the inter-channel time difference in the current frame. A specific manner in which the encoding end estimates the delay of the left-channel signal and the right-channel signal may be shown by the example 1 to the example 3 in the foregoing description.
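The cross-correlation example in the paragraph above can be sketched as follows. This is a brute-force illustration rather than the estimation method of the foregoing examples; the search range `max_shift` and the sign convention (a positive result means the right channel lags the left channel) are assumptions.

```python
def estimate_itd(left, right, max_shift):
    """Return the lag in [-max_shift, max_shift] that maximizes the
    cross-correlation sum over k of left[k] * right[k + lag]."""
    n = len(left)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_shift, max_shift + 1):
        corr = 0.0
        for k in range(n):
            j = k + lag
            if 0 <= j < n:          # ignore samples outside the frame
                corr += left[k] * right[j]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


# Toy frame: the same impulse appears 3 samples later in the right channel.
left = [0.0] * 16
right = [0.0] * 16
left[5] = 1.0
right[8] = 1.0
itd = estimate_itd(left, right, max_shift=6)
```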
(135) 720. Perform time-domain upmixing processing on the primary-channel signal and the secondary-channel signal in the current frame, to obtain a left-channel reconstructed signal and a right-channel reconstructed signal that are obtained after the time-domain upmixing processing.
(136) Specifically, time-domain upmixing processing may be performed, based on a channel combination scale factor, on the primary-channel signal and the secondary-channel signal in the current frame that are obtained by decoding, to obtain the left-channel reconstructed signal and the right-channel reconstructed signal that are obtained after the time-domain upmixing processing (which may also be referred to as a left-channel signal and a right-channel signal that are obtained after the time-domain upmixing processing).
(137) It should be understood that the encoding end and the decoding end may use many methods to perform time-domain downmixing processing and time-domain upmixing processing respectively. However, a method for performing time-domain upmixing processing by the decoding end needs to correspond to a method for performing time-domain downmixing processing by the encoding end. For example, when the encoding end obtains the primary-channel signal and the secondary-channel signal according to the formula (12), the decoding end may first obtain the channel combination scale factor by decoding the received bitstream, and then obtain the left-channel signal and the right-channel signal that are obtained after the time-domain upmixing processing according to a formula (13).
(138)
(139) In the formula (13), x′_L(n) is the left-channel signal after the time-domain upmixing processing in the current frame, x′_R(n) is the right-channel signal after the time-domain upmixing processing in the current frame, Y(n) is the primary-channel signal in the current frame that is obtained by decoding, X(n) is the secondary-channel signal in the current frame that is obtained by decoding, n is a sampling point number, n=0, 1, . . . , N−1, N is a frame length, and ratio is the channel combination scale factor that is obtained by decoding.
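The requirement that upmixing correspond to downmixing can be illustrated with a round trip. Formulas (12) and (13) are not reproduced in this text, so the sketch assumes a downmix of the form Y = ratio·L + (1−ratio)·R and X = (1−ratio)·L − ratio·R, and derives the upmix as the algebraic inverse of that assumed matrix. The forms actually used by the codec may differ.

```python
def downmix(left, right, ratio):
    """ASSUMED downmix matrix [[ratio, 1-ratio], [1-ratio, -ratio]]."""
    primary = [ratio * l + (1.0 - ratio) * r for l, r in zip(left, right)]
    secondary = [(1.0 - ratio) * l - ratio * r for l, r in zip(left, right)]
    return primary, secondary


def upmix(primary, secondary, ratio):
    """Inverse of the assumed matrix: the same matrix scaled by
    1 / (ratio**2 + (1 - ratio)**2)."""
    gain = 1.0 / (ratio * ratio + (1.0 - ratio) * (1.0 - ratio))
    left = [gain * (ratio * y + (1.0 - ratio) * x)
            for y, x in zip(primary, secondary)]
    right = [gain * ((1.0 - ratio) * y - ratio * x)
             for y, x in zip(primary, secondary)]
    return left, right


left = [0.2, -0.7, 1.0]
right = [0.1, 0.4, -0.5]
primary, secondary = downmix(left, right, ratio=0.3)
left_rec, right_rec = upmix(primary, secondary, ratio=0.3)
```

The denominator ratio² + (1−ratio)² is never zero, so this assumed matrix is invertible for every value of the scale factor.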
(140) 730. Perform interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing in the current frame.
(141) In the step 730, performing interpolation processing based on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame is equivalent to performing weighted average processing on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame. In this way, the finally obtained inter-channel time difference after the interpolation processing in the current frame is between the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame.
(142) In the step 730, the following manner 3 and manner 4 may be used when interpolation processing is performed based on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame.
Manner 3
(143) The inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula (14).
A=α·B+(1−α)·C (14)
(144) In the formula (14), A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, and α is a first interpolation coefficient and is a real number satisfying 0<α<1.
(145) The inter-channel time difference can be adjusted by using the formula A=α·B+(1−α)·C, so that the finally obtained inter-channel time difference after interpolation processing in the current frame is between the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, and the inter-channel time difference after the interpolation processing in the current frame matches, as much as possible, an inter-channel time difference of an original stereo signal that is not encoded and decoded.
(146) Assuming that the current frame is an i-th frame, the previous frame of the current frame is an (i−1)-th frame. In this case, the formula (14) may be transformed into a formula (15).
d_int(i)=α·d(i)+(1−α)·d(i−1) (15)
(147) In the formula (15), d_int(i) is the inter-channel time difference after the interpolation processing in the i-th frame, d(i) is the inter-channel time difference in the current frame, and d(i−1) is the inter-channel time difference in the (i−1)-th frame.
(148) The first interpolation coefficient α in the formulas (14) and (15) may be directly set by technical personnel (may be directly set according to experience). For example, the first interpolation coefficient α may be directly set to 0.4 or 0.6.
(149) Optionally, the first interpolation coefficient α may also be determined based on a frame length of the current frame and an encoding and decoding delay. The encoding and decoding delay herein may include an encoding delay in a process of encoding, by the encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal. Further, the encoding and decoding delay herein may be a sum of the encoding delay at the encoding end and the decoding delay at the decoding end.
(150) Optionally, the first interpolation coefficient α may be specifically inversely proportional to the encoding and decoding delay and directly proportional to the frame length of the current frame. In other words, the first interpolation coefficient α decreases as the encoding and decoding delay increases, and increases as the frame length of the current frame increases.
(151) Optionally, the first interpolation coefficient α may be calculated according to a formula (16).
(152) α=(N−S)/N (16)
(153) In the formula (16), N is the frame length of the current frame, and S is the encoding and decoding delay.
(154) It is assumed that the frame length of the current frame is 320, and the encoding and decoding delay is 192, in other words, N=320 and S=192. In this case, N and S are substituted into the formula (16) to obtain:
(155) α=(N−S)/N=(320−192)/320=0.4 (17)
(156) Finally, it can be obtained that the first interpolation coefficient α is 0.4.
(157) Optionally, the first interpolation coefficient α is pre-stored. Specifically, the first interpolation coefficient α may be pre-stored at the decoding end. In this way, when performing interpolation processing, the decoding end may directly perform interpolation processing based on the pre-stored first interpolation coefficient α without calculating a value of the first interpolation coefficient α. This can reduce calculation complexity of a decoding process and improve decoding efficiency.
Manner 4
(158) The inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula (18).
A=(1−β)·B+β·C (18)
(159) In the formula (18), A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, and β is a second interpolation coefficient and is a real number satisfying 0<β<1.
(160) The inter-channel time difference can be adjusted by using the formula A=(1−β)·B+β·C, so that the finally obtained inter-channel time difference after interpolation processing in the current frame is between the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, and the inter-channel time difference after the interpolation processing in the current frame matches, as much as possible, an inter-channel time difference of an original stereo signal that is not encoded and decoded.
(161) Assuming that the current frame is an i-th frame, the previous frame of the current frame is an (i−1)-th frame. In this case, the formula (18) may be transformed into the following formula:
d_int(i)=(1−β)·d(i)+β·d(i−1) (19)
(162) In the formula (19), d_int(i) is the inter-channel time difference after the interpolation processing in the i-th frame, d(i) is the inter-channel time difference in the current frame, and d(i−1) is the inter-channel time difference in the (i−1)-th frame.
(163) Similar to the manner for setting the first interpolation coefficient α, the second interpolation coefficient β may also be directly set by technical personnel (may be directly set according to experience). For example, the second interpolation coefficient β may be directly set to 0.6 or 0.4.
(164) Optionally, the second interpolation coefficient β may also be determined based on a frame length of the current frame and an encoding and decoding delay. The encoding and decoding delay herein may include an encoding delay in a process of encoding, by the encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal. Further, the encoding and decoding delay herein may be a sum of the encoding delay at the encoding end and the decoding delay at the decoding end.
(165) Optionally, the second interpolation coefficient β may be specifically directly proportional to the encoding and decoding delay, and is inversely proportional to the frame length of the current frame. In other words, the second interpolation coefficient β increases as the encoding and decoding delay increases, and decreases as the frame length of the current frame increases.
(166) Optionally, the second interpolation coefficient β may be determined according to a formula (20).
(167) β=S/N (20)
(168) In the formula (20), N is the frame length of the current frame, and S is the encoding and decoding delay.
(169) It is assumed that N=320, and S=192. In this case, N=320 and S=192 are substituted into the formula (20) to obtain:
(170) β=S/N=192/320=0.6 (21)
(171) Finally, it can be obtained that the second interpolation coefficient β is 0.6.
(172) Optionally, the second interpolation coefficient β is pre-stored. Specifically, the second interpolation coefficient β may be pre-stored at the decoding end. In this way, when performing interpolation processing, the decoding end may directly perform interpolation processing based on the pre-stored second interpolation coefficient β without calculating a value of the second interpolation coefficient β. This can reduce calculation complexity of a decoding process and improve decoding efficiency.
(173) 740. Adjust a delay of the left-channel reconstructed signal and the right-channel reconstructed signal based on the inter-channel time difference after the interpolation processing in the current frame.
(174) It should be understood that, optionally, the left-channel reconstructed signal and the right-channel reconstructed signal that are obtained after the delay adjustment are decoded stereo signals.
(175) Optionally, after the step 740, the method may further include obtaining the decoded stereo signals based on the left-channel reconstructed signal and the right-channel reconstructed signal that are obtained after the delay adjustment. For example, de-emphasis processing is performed on the left-channel reconstructed signal and the right-channel reconstructed signal that are obtained after the delay adjustment, to obtain the decoded stereo signals. For another example, post-processing is performed on the left-channel reconstructed signal and the right-channel reconstructed signal that are obtained after the delay adjustment, to obtain the decoded stereo signals.
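De-emphasis at the decoding end is typically the inverse of a pre-emphasis filter applied during time-domain preprocessing at the encoding end. The sketch below uses the common first-order pair y(n)=x(n)−μ·x(n−1) and z(n)=y(n)+μ·z(n−1). The filter order, the coefficient μ=0.68, and the zero initial state are assumptions for illustration; the text does not specify them.

```python
def pre_emphasis(x, mu=0.68):
    """First-order pre-emphasis: y[n] = x[n] - mu * x[n-1], with x[-1] = 0."""
    out, prev = [], 0.0
    for v in x:
        out.append(v - mu * prev)
        prev = v
    return out


def de_emphasis(y, mu=0.68):
    """Inverse filter: z[n] = y[n] + mu * z[n-1], with z[-1] = 0."""
    out, prev = [], 0.0
    for v in y:
        prev = v + mu * prev
        out.append(prev)
    return out


signal = [1.0, 0.5, -0.25, 0.0, 0.75]
restored = de_emphasis(pre_emphasis(signal))
```

In a real codec the filter state would be carried from frame to frame instead of being reset to zero, and the same de-emphasis would be applied to each reconstructed channel.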
(176) In this disclosure, by performing interpolation processing on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, the inter-channel time difference after the interpolation processing in the current frame can match the primary-channel signal and the secondary-channel signal that are obtained by decoding currently. This can reduce a deviation between an inter-channel time difference of a stereo signal that is finally obtained by decoding and an inter-channel time difference of an original stereo signal. Therefore, accuracy of a stereo sound image of the stereo signal that is finally obtained by decoding is improved.
(177) Specifically, a difference between a signal in the stereo signal finally obtained in the method 700 and the same signal in the original stereo signal may be shown in
(178) It should be understood that the encoding method of the encoding end corresponding to the method 700 may be an existing time-domain stereo encoding method. For example, the time-domain stereo encoding method corresponding to the method 700 may be the method 100 shown in
(179) A detailed process of the decoding method for a stereo signal in the embodiments of this disclosure is described below with reference to
(180)
(181) 810. Decode a primary-channel signal and a secondary-channel signal respectively based on a received bitstream.
(182) Specifically, a decoding method for decoding the primary-channel signal by the decoding end corresponds to an encoding method for encoding the primary-channel signal by an encoding end. A decoding method for decoding the secondary-channel signal by the decoding end corresponds to an encoding method for encoding the secondary-channel signal by the encoding end.
(183) 820. Decode the received bitstream to obtain a channel combination scale factor.
(184) Specifically, the received bitstream may be decoded to obtain an encoding index of the channel combination scale factor, and then the channel combination scale factor is obtained by decoding based on the obtained encoding index of the channel combination scale factor.
(185) 830. Perform time-domain upmixing processing on the primary-channel signal and the secondary-channel signal based on the channel combination scale factor, to obtain a left-channel reconstructed signal and a right-channel reconstructed signal that are obtained after the time-domain upmixing processing.
(186) 840. Decode the received bitstream to obtain an inter-channel time difference in a current frame.
(187) 850. Perform interpolation processing based on the inter-channel time difference in the current frame that is obtained by decoding and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing in the current frame.
(188) 860. Adjust, based on the inter-channel time difference after the interpolation processing, a delay of the left-channel reconstructed signal and the right-channel reconstructed signal that are obtained after the time-domain upmixing processing, to obtain a decoded stereo signal.
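Step 860 can be illustrated with a minimal sketch. The disclosure does not fix a sign convention or a sample-shifting scheme in this passage, so the following are assumptions made only for illustration: the function name `adjust_delay` is hypothetical, a positive inter-channel time difference (in samples) is taken to mean that the right channel is delayed, and vacated samples are zero-filled rather than taken from a previous frame.

```python
from typing import List, Tuple


def adjust_delay(left: List[float], right: List[float],
                 itd: float) -> Tuple[List[float], List[float]]:
    """Delay one reconstructed channel by the (rounded) ITD in samples.

    Assumed convention (not taken from the disclosure): itd > 0 delays the
    right channel, itd < 0 delays the left channel. Vacated samples are
    zero-filled, which is a simplification.
    """
    n = len(left)
    if len(right) != n:
        raise ValueError("left and right must have the same length")
    # Clamp the shift so that it never exceeds the frame length.
    shift = max(-n, min(n, int(round(itd))))
    left_out, right_out = list(left), list(right)
    if shift > 0:
        right_out = [0.0] * shift + right[: n - shift]
    elif shift < 0:
        left_out = [0.0] * (-shift) + left[: n + shift]
    return left_out, right_out
```

In this sketch the value passed as `itd` would be the inter-channel time difference after the interpolation processing obtained in step 850.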
(189) It should be understood that, in this disclosure, the process of performing interpolation processing based on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame may be performed at the encoding end or the decoding end. After interpolation processing is performed at the encoding end based on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame, interpolation processing does not need to be performed at the decoding end, the inter-channel time difference after the interpolation processing in the current frame may be obtained directly based on the bitstream, and subsequent delay adjustment is performed based on the inter-channel time difference after the interpolation processing in the current frame. However, when interpolation processing is not performed at the encoding end, the decoding end needs to perform interpolation processing based on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame, and then performs subsequent delay adjustment based on the inter-channel time difference after the interpolation processing in the current frame that is obtained through the interpolation processing.
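The choice described above can be sketched as follows. This is only an illustration: the function names and the `encoder_interpolated` flag are hypothetical (the disclosure does not state how the decoding end learns which end performed the interpolation), and the interpolation uses the formula A=α·B+(1−α)·C with 0<α<1.

```python
def interpolate_itd(cur_itd: float, prev_itd: float, alpha: float) -> float:
    """Return A = alpha * B + (1 - alpha) * C for 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * cur_itd + (1.0 - alpha) * prev_itd


def itd_for_delay_adjustment(decoded_itd: float, prev_itd: float,
                             alpha: float,
                             encoder_interpolated: bool) -> float:
    """Pick the ITD that the decoding end uses for delay adjustment.

    encoder_interpolated is a hypothetical flag: True means the bitstream
    already carries the ITD after the interpolation processing.
    """
    if encoder_interpolated:
        # The encoding end already interpolated; use the decoded value as is.
        return decoded_itd
    # Otherwise the decoding end performs the interpolation itself.
    return interpolate_itd(decoded_itd, prev_itd, alpha)
```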
(190) The foregoing describes in detail the encoding and decoding methods for a stereo signal in the embodiments of this disclosure with reference to
(191)
(192) a determining module 910, configured to determine an inter-channel time difference in a current frame;
(193) an interpolation module 920, configured to perform interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing in the current frame;
(194) a delay alignment module 930, configured to perform delay alignment on a stereo signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo signal after the delay alignment in the current frame;
(195) a downmixing module 940, configured to perform time-domain downmixing processing on the stereo signal after the delay alignment in the current frame, to obtain a primary-channel signal and a secondary-channel signal in the current frame; and
(196) an encoding module 950, configured to quantize the inter-channel time difference after the interpolation processing in the current frame, and write a quantized inter-channel time difference into a bitstream.
(197) The encoding module 950 is further configured to quantize the primary-channel signal and the secondary-channel signal in the current frame, and write a quantized primary-channel signal and a quantized secondary-channel signal into the bitstream.
(198) In this disclosure, the inter-channel time difference in the current frame is used at the encoding apparatus to perform delay alignment, to obtain the primary-channel signal and the secondary-channel signal. However, interpolation processing is performed on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, so that the inter-channel time difference in the current frame that is obtained after the interpolation processing can match the primary-channel signal and the secondary-channel signal that are obtained by encoding and decoding. The inter-channel time difference after the interpolation processing is encoded and then transmitted to the decoding end, so that the decoding end can perform decoding based on the inter-channel time difference in the current frame that matches the primary-channel signal and the secondary-channel signal that are obtained by decoding. This can reduce a deviation between an inter-channel time difference of a stereo signal that is finally obtained by decoding and an inter-channel time difference of an original stereo signal. Therefore, accuracy of a stereo sound image of the stereo signal that is finally obtained by decoding is improved.
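The distinction drawn above, where delay alignment uses the inter-channel time difference in the current frame while the bitstream carries the value after the interpolation processing, can be sketched as follows. The class and function names are hypothetical, the initial previous-frame value of 0 is an assumption, and quantization is omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EncoderItdState:
    # ITD of the previous frame; 0.0 before the first frame is an assumption.
    prev_itd: float = 0.0


def process_frame_itd(state: EncoderItdState, cur_itd: float,
                      alpha: float) -> Tuple[float, float]:
    """Return (ITD used for delay alignment, ITD written to the bitstream)."""
    # Delay alignment uses the ITD of the current frame unchanged.
    itd_for_alignment = cur_itd
    # The value to be quantized and transmitted is A = alpha*B + (1-alpha)*C.
    itd_for_bitstream = alpha * cur_itd + (1.0 - alpha) * state.prev_itd
    # The non-interpolated ITD becomes "previous frame" for the next call.
    state.prev_itd = cur_itd
    return itd_for_alignment, itd_for_bitstream
```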
(199) Optionally, in an embodiment, the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=α·B+(1−α)·C, where A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, α is a first interpolation coefficient, and 0<α<1.
(200) Optionally, in an embodiment, the first interpolation coefficient α is inversely proportional to an encoding and decoding delay, and is directly proportional to a frame length of the current frame, where the encoding and decoding delay includes an encoding delay in a process of encoding, by an encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal.
(201) Optionally, in an embodiment, the first interpolation coefficient α satisfies a formula α=(N−S)/N, where S is the encoding and decoding delay, and N is the frame length of the current frame.
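As a worked illustration of α=(N−S)/N, the sketch below computes the coefficient for one pair of values. The function name is hypothetical, and N=320 and S=192 samples are assumed values chosen only to make the arithmetic concrete; they are not taken from this passage.

```python
def first_interpolation_coefficient(frame_length: int, codec_delay: int) -> float:
    """Return alpha = (N - S) / N, requiring 0 < S < N so that 0 < alpha < 1."""
    if not 0 < codec_delay < frame_length:
        raise ValueError("expected 0 < codec_delay < frame_length")
    return (frame_length - codec_delay) / frame_length


# Assumed example values: N = 320 samples, S = 192 samples.
alpha = first_interpolation_coefficient(320, 192)  # (320 - 192) / 320 = 0.4
```

With these assumed values, the interpolated value is A=0.4·B+0.6·C, so a larger encoding and decoding delay relative to the frame length gives the previous frame more weight.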
(202) Optionally, in an embodiment, the first interpolation coefficient α is pre-stored.
(203) Optionally, in an embodiment, the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=(1−β)·B+β·C.
(204) In the formula, A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, β is a second interpolation coefficient, and 0<β<1.
(205) Optionally, in an embodiment, the second interpolation coefficient β is directly proportional to an encoding and decoding delay, and is inversely proportional to a frame length of the current frame, where the encoding and decoding delay includes an encoding delay in a process of encoding, by an encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal.
(206) Optionally, in an embodiment, the second interpolation coefficient β satisfies a formula β=S/N, where S is the encoding and decoding delay, and N is the frame length of the current frame.
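When the first and second interpolation coefficients are both derived from the same S and N, that is, α=(N−S)/N and β=S/N, the two interpolation formulas give the same result. The short derivation below makes this explicit; it applies only to that case and not to independently pre-stored coefficients.

```latex
\begin{align*}
\alpha &= \frac{N-S}{N} = 1 - \frac{S}{N} = 1 - \beta,\\
A &= (1-\beta)\,B + \beta\,C = \alpha\,B + (1-\alpha)\,C.
\end{align*}
```

The constraints 0<α<1 and 0<β<1 are then both equivalent to 0<S<N.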
(207) Optionally, in an embodiment, the second interpolation coefficient β is pre-stored.
(208)
(209) a decoding module 1010, configured to decode a bitstream to obtain a primary-channel signal and a secondary-channel signal in a current frame, and an inter-channel time difference in the current frame;
(210) an upmixing module 1020, configured to perform time-domain upmixing processing on the primary-channel signal and the secondary-channel signal in the current frame, to obtain a primary-channel signal and a secondary-channel signal that are obtained after the time-domain upmixing processing;
(211) an interpolation module 1030, configured to perform interpolation processing based on the inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing in the current frame; and a delay adjustment module 1040, configured to adjust, based on the inter-channel time difference after the interpolation processing in the current frame, a delay of the primary-channel signal and the secondary-channel signal that are obtained after the time-domain upmixing processing.
(212) In this disclosure, by performing interpolation processing on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, the inter-channel time difference after the interpolation processing in the current frame can match the primary-channel signal and the secondary-channel signal that are obtained by decoding currently. This can reduce a deviation between an inter-channel time difference of a stereo signal that is finally obtained by decoding and an inter-channel time difference of an original stereo signal. Therefore, accuracy of a stereo sound image of the stereo signal that is finally obtained by decoding is improved.
(213) Optionally, in an embodiment, the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=α·B+(1−α)·C, where A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, α is a first interpolation coefficient, and 0<α<1.
(214) Optionally, in an embodiment, the first interpolation coefficient α is inversely proportional to an encoding and decoding delay, and is directly proportional to a frame length of the current frame, where the encoding and decoding delay includes an encoding delay in a process of encoding, by an encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal.
(215) Optionally, in an embodiment, the first interpolation coefficient α satisfies a formula α=(N−S)/N, where S is the encoding and decoding delay, and N is the frame length of the current frame.
(216) Optionally, in an embodiment, the first interpolation coefficient α is pre-stored.
(217) Optionally, in an embodiment, the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=(1−β)·B+β·C, where A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, β is a second interpolation coefficient, and 0<β<1.
(218) Optionally, in an embodiment, the second interpolation coefficient β is directly proportional to an encoding and decoding delay, and is inversely proportional to a frame length of the current frame, where the encoding and decoding delay includes an encoding delay in a process of encoding, by an encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal.
(219) Optionally, in an embodiment, the second interpolation coefficient β satisfies a formula β=S/N, where S is the encoding and decoding delay, and N is the frame length of the current frame.
(220) Optionally, in an embodiment, the second interpolation coefficient β is pre-stored.
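To illustrate how the interpolation behaves across consecutive frames at the decoding end, the following sketch applies A=(1−β)·B+β·C to a sequence of decoded inter-channel time differences. The function name and the initial previous-frame value of 0 are assumptions made for the example.

```python
from typing import Iterable, List


def interpolate_itd_sequence(decoded_itds: Iterable[float], beta: float,
                             initial_prev_itd: float = 0.0) -> List[float]:
    """Apply A = (1 - beta) * B + beta * C frame by frame.

    C is the decoded (non-interpolated) ITD of the previous frame;
    initial_prev_itd is an assumed value for the first frame.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    interpolated = []
    prev = initial_prev_itd
    for cur in decoded_itds:
        interpolated.append((1.0 - beta) * cur + beta * prev)
        prev = cur
    return interpolated
```

For example, with β=0.5 the decoded sequence 4, 8, 8 yields 2, 6, 8: the interpolated value lags a change in the decoded value by part of a frame and then settles on it.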
(221)
(222) a memory 1110, configured to store a program; and
(223) a processor 1120, configured to execute the program stored in the memory 1110, where when the program in the memory 1110 is executed, the processor 1120 is specifically configured to: perform interpolation processing based on an inter-channel time difference in a current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing in the current frame; perform delay alignment on a stereo signal in the current frame based on the inter-channel time difference in the current frame, to obtain a stereo signal after the delay alignment in the current frame; perform time-domain downmixing processing on the stereo signal after the delay alignment in the current frame, to obtain a primary-channel signal and a secondary-channel signal in the current frame; quantize the inter-channel time difference after the interpolation processing in the current frame, and write a quantized inter-channel time difference into a bitstream; and quantize the primary-channel signal and the secondary-channel signal in the current frame, and write a quantized primary-channel signal and a quantized secondary-channel signal into the bitstream.
(224) In this disclosure, the inter-channel time difference in the current frame is used at the encoding apparatus to perform delay alignment, to obtain the primary-channel signal and the secondary-channel signal. However, interpolation processing is performed on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, so that the inter-channel time difference in the current frame that is obtained after the interpolation processing can match the primary-channel signal and the secondary-channel signal that are obtained by encoding and decoding. The inter-channel time difference after the interpolation processing is encoded and then transmitted to the decoding end, so that the decoding end can perform decoding based on the inter-channel time difference in the current frame that matches the primary-channel signal and the secondary-channel signal that are obtained by decoding. This can reduce a deviation between an inter-channel time difference of a stereo signal that is finally obtained by decoding and an inter-channel time difference of an original stereo signal. Therefore, accuracy of a stereo sound image of the stereo signal that is finally obtained by decoding is improved.
(225) Optionally, in an embodiment, the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=α·B+(1−α)·C, where A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, α is a first interpolation coefficient, and 0<α<1.
(226) Optionally, in an embodiment, the first interpolation coefficient α is inversely proportional to an encoding and decoding delay, and is directly proportional to a frame length of the current frame, where the encoding and decoding delay includes an encoding delay in a process of encoding, by an encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal.
(227) Optionally, in an embodiment, the first interpolation coefficient α satisfies a formula α=(N−S)/N, where S is the encoding and decoding delay, and N is the frame length of the current frame.
(228) Optionally, in an embodiment, the first interpolation coefficient α is pre-stored.
(229) The first interpolation coefficient α may be stored in the memory 1110.
(230) Optionally, in an embodiment, the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=(1−β)·B+β·C.
(231) In the formula, A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, β is a second interpolation coefficient, and 0<β<1.
(232) Optionally, in an embodiment, the second interpolation coefficient β is directly proportional to an encoding and decoding delay, and is inversely proportional to a frame length of the current frame, where the encoding and decoding delay includes an encoding delay in a process of encoding, by an encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal.
(233) Optionally, in an embodiment, the second interpolation coefficient β satisfies a formula β=S/N, where S is the encoding and decoding delay, and N is the frame length of the current frame.
(234) Optionally, in an embodiment, the second interpolation coefficient β is pre-stored.
(235) The second interpolation coefficient may be stored in the memory 1110.
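One way to picture a pre-stored coefficient is a small table that is filled in ahead of time and only read at run time. The sketch below is an assumption-laden illustration: the table name, the lookup function, and the listed (frame length, delay) pairs are all hypothetical, and the entries are simply computed from β=S/N.

```python
# Hypothetical (frame_length, codec_delay) configurations, in samples.
_SUPPORTED_CONFIGS = [(320, 192), (640, 192)]

# Table built once ahead of time from beta = S / N and kept in memory.
BETA_TABLE = {(n, s): s / n for n, s in _SUPPORTED_CONFIGS}


def lookup_beta(frame_length: int, codec_delay: int) -> float:
    """Read a pre-stored second interpolation coefficient.

    Raises KeyError for a configuration that was not pre-stored.
    """
    return BETA_TABLE[(frame_length, codec_delay)]
```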
(236)
(237) a memory 1210, configured to store a program; and
(238) a processor 1220, configured to execute the program stored in the memory 1210, where when the program in the memory 1210 is executed, the processor 1220 is specifically configured to: decode a bitstream to obtain a primary-channel signal and a secondary-channel signal in a current frame; perform time-domain upmixing processing on the primary-channel signal and the secondary-channel signal in the current frame, to obtain a primary-channel signal and a secondary-channel signal that are obtained after the time-domain upmixing processing; perform interpolation processing based on an inter-channel time difference in the current frame and an inter-channel time difference in a previous frame of the current frame, to obtain an inter-channel time difference after the interpolation processing in the current frame; and adjust, based on the inter-channel time difference after the interpolation processing in the current frame, a delay of the primary-channel signal and the secondary-channel signal that are obtained after the time-domain upmixing processing.
(239) In this disclosure, by performing interpolation processing on the inter-channel time difference in the current frame and the inter-channel time difference in the previous frame of the current frame, the inter-channel time difference after the interpolation processing in the current frame can match the primary-channel signal and the secondary-channel signal that are obtained by decoding currently. This can reduce a deviation between an inter-channel time difference of a stereo signal that is finally obtained by decoding and an inter-channel time difference of an original stereo signal. Therefore, accuracy of a stereo sound image of the stereo signal that is finally obtained by decoding is improved.
(240) Optionally, in an embodiment, the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=α·B+(1−α)·C, where A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, α is a first interpolation coefficient, and 0<α<1.
(241) Optionally, in an embodiment, the first interpolation coefficient α is inversely proportional to an encoding and decoding delay, and is directly proportional to a frame length of the current frame, where the encoding and decoding delay includes an encoding delay in a process of encoding, by an encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal.
(242) Optionally, in an embodiment, the first interpolation coefficient α satisfies a formula α=(N−S)/N, where S is the encoding and decoding delay, and N is the frame length of the current frame.
(243) Optionally, in an embodiment, the first interpolation coefficient α is pre-stored.
(244) The first interpolation coefficient α may be stored in the memory 1210.
(245) Optionally, in an embodiment, the inter-channel time difference after the interpolation processing in the current frame is calculated according to a formula A=(1−β)·B+β·C, where A is the inter-channel time difference after the interpolation processing in the current frame, B is the inter-channel time difference in the current frame, C is the inter-channel time difference in the previous frame of the current frame, β is a second interpolation coefficient, and 0<β<1.
(246) Optionally, in an embodiment, the second interpolation coefficient β is directly proportional to an encoding and decoding delay, and is inversely proportional to a frame length of the current frame, where the encoding and decoding delay includes an encoding delay in a process of encoding, by an encoding end, a primary-channel signal and a secondary-channel signal that are obtained after time-domain downmixing processing, and a decoding delay in a process of decoding, by a decoding end, a bitstream to obtain a primary-channel signal and a secondary-channel signal.
(247) Optionally, in an embodiment, the second interpolation coefficient β satisfies a formula β=S/N, where S is the encoding and decoding delay, and N is the frame length of the current frame.
(249) Optionally, in an embodiment, the second interpolation coefficient β is pre-stored.
(250) The second interpolation coefficient may be stored in the memory 1210.
(251) It should be understood that the encoding and decoding methods for a stereo signal in the embodiments of this disclosure may be performed by a terminal device or a network device in
(252) As shown in
(253) It should be understood that, in
(254) In
(255) The first terminal device or the second terminal device in
(256) In audio communication, a network device may implement transcoding of an encoding and decoding format of an audio signal. As shown in
(257) Similarly, as shown in
(258) In
(259) It should be further understood that the stereo encoder in
(260) It should be understood that the encoding and decoding methods for a stereo signal in the embodiments of this disclosure may also be performed by a terminal device or a network device in
(261) As shown in
(262) It should be understood that, in
(263) In
(264) The first terminal device or the second terminal device in
(265) In audio communication, a network device may implement transcoding of an encoding and decoding format of an audio signal. As shown in
(266) Similarly, as shown in
(267) It should be understood that, in
(268) It should be further understood that the stereo encoder in
(269) A person of ordinary skill in the art may be aware that, in combination with the examples described in the embodiments disclosed in this specification, units and algorithm steps may be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on particular applications and design constraint conditions of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this disclosure.
(270) It may be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, refer to a corresponding process in the foregoing method embodiments, and details are not described herein again.
(271) In the several embodiments provided in this disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the described apparatus embodiments are merely examples. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
(272) The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of the embodiments.
(273) In addition, functional units in the embodiments of this disclosure may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.
(274) When the functions are implemented in the form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this disclosure essentially, or the part contributing to the prior art, or some of the technical solutions may be implemented in a form of a software product. The software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this disclosure. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
(275) The foregoing descriptions are merely specific implementations of this disclosure, but are not intended to limit the protection scope of this disclosure. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this disclosure shall fall within the protection scope of this disclosure. Therefore, the protection scope of this disclosure shall be subject to the protection scope of the claims.