Method and apparatus for decoding speech/audio bitstream
09734836 · 2017-08-15
Assignee
Inventors
Cpc classification
G10L19/06
PHYSICS
G10L19/167
PHYSICS
G10L19/005
PHYSICS
International classification
G10L19/00
PHYSICS
G10L19/06
PHYSICS
G10L19/005
PHYSICS
Abstract
A method and an apparatus for decoding a speech/audio bitstream are disclosed, where the method for decoding a speech/audio bitstream includes determining whether a current frame is a normal decoding frame or a redundancy decoding frame, obtaining a decoded parameter of the current frame by means of parsing when the current frame is a normal decoding frame or a redundancy decoding frame, performing post-processing on the decoded parameter of the current frame to obtain a post-processed decoded parameter of the current frame, and using the post-processed decoded parameter of the current frame to reconstruct a speech/audio signal.
Claims
1. A method for decoding a speech/audio bitstream, comprising: performing decoding operations on a bit stream, wherein a decoded parameter of a first frame and a decoded parameter of a second frame are acquired via the decoding operations, and wherein the second frame is a previous frame adjacent to the first frame; performing, according to the decoded parameter of the second frame, post-processing on the decoded parameter of the first frame to obtain a post-processed decoded parameter of the first frame when at least one of the first frame or the second frame is a redundancy decoding frame; and reconstructing a speech/audio signal using the post-processed decoded parameter of the first frame, wherein the decoded parameter of the first frame comprises a spectral pair parameter of the first frame, wherein the decoded parameter of the second frame comprises a spectral pair parameter of the second frame, and wherein performing post-processing on the decoded parameter of the first frame comprises weighting the spectral pair parameter of the first frame and the spectral pair parameter of the second frame.
2. The method according to claim 1, wherein the post-processed spectral pair parameter of the first frame is obtained through calculation using the formula lsp[k] = α*lsp_old[k] + β*lsp_mid[k] + δ*lsp_new[k], wherein 0≦k≦M, wherein lsp[k] is the post-processed spectral pair parameter of the first frame, wherein lsp_old[k] is the spectral pair parameter of the second frame, wherein lsp_mid[k] is a middle value of the spectral pair parameter of the first frame, wherein lsp_new[k] is the spectral pair parameter of the first frame, wherein M is an order of spectral pair parameters, wherein α is a weight of the spectral pair parameter of the second frame, wherein β is a weight of the middle value of the spectral pair parameter of the first frame, wherein δ is a weight of the spectral pair parameter of the first frame, wherein α≧0, wherein β≧0, wherein δ≧0, and wherein α+β+δ=1.
3. The method according to claim 2, wherein a value of β is 0 or is less than a preset threshold when the first frame is the redundancy decoding frame, a signal class of the first frame is not unvoiced, and a signal class of a next frame of the first frame is unvoiced.
4. The method according to claim 2, wherein a value of β is 0 or is less than a preset threshold when the first frame is the redundancy decoding frame, a signal class of the first frame is not unvoiced, and a spectral tilt factor of the second frame is less than a preset spectral tilt factor threshold.
5. The method according to claim 2, wherein a value of β is 0 or is less than a preset threshold when the first frame is the redundancy decoding frame, a signal class of the first frame is not unvoiced, a signal class of a next frame of the first frame is unvoiced, and a spectral tilt factor of the second frame is less than a preset spectral tilt factor threshold.
6. The method according to claim 1, wherein a weight of the spectral pair parameter of the second frame is 0 or less than a preset threshold when a signal class of the first frame is unvoiced, the second frame is the redundancy decoding frame, and a signal class of the second frame is not unvoiced.
7. The method according to claim 1, wherein a weight of the spectral pair parameter of the first frame is 0 or is less than a preset threshold when the first frame is the redundancy decoding frame, a signal class of the first frame is not unvoiced, and a signal class of a next frame of the first frame is unvoiced.
8. The method according to claim 1, wherein a weight of the spectral pair parameter of the first frame is 0 or is less than a preset threshold when the first frame is the redundancy decoding frame, a signal class of the first frame is not unvoiced, and a spectral tilt factor of the second frame is less than a preset spectral tilt factor threshold.
9. The method according to claim 1, wherein a weight of the spectral pair parameter of the first frame is 0 or is less than a preset threshold when the first frame is the redundancy decoding frame, a signal class of the first frame is not unvoiced, a signal class of a next frame of the first frame is unvoiced and a spectral tilt factor of the second frame is less than a preset spectral tilt factor threshold.
10. The method according to claim 4, wherein a smaller spectral tilt factor indicates the signal class, which is more inclined to be unvoiced, of a frame corresponding to the spectral tilt factor.
11. The method according to claim 1, wherein the decoded parameter of the first frame comprises an adaptive codebook gain and wherein performing the post-processing on the decoded parameter of the first frame comprises attenuating an adaptive codebook gain of at least one subframe of the first frame when the first frame is the redundancy decoding frame and a next frame of the first frame is an unvoiced frame.
12. The method according to claim 1, wherein the first frame is the redundancy decoding frame, wherein the decoded parameter comprises a bandwidth extension envelope, and wherein performing the post-processing on the decoded parameter of the first frame comprises performing correction on the bandwidth extension envelope of the first frame according to at least one of a bandwidth extension envelope of the second frame or the spectral tilt factor of the second frame when the first frame is not an unvoiced frame, a next frame of the first frame is an unvoiced frame, and a spectral tilt factor of the second frame is less than a preset spectral tilt factor threshold.
13. The method according to claim 12, wherein a correction factor used when correction is performed on the bandwidth extension envelope of the first frame is inversely proportional to the spectral tilt factor of the second frame and is directly proportional to a ratio of the bandwidth extension envelope of the second frame to the bandwidth extension envelope of the first frame.
14. The method according to claim 1, wherein the first frame is the redundancy decoding frame, wherein the decoded parameter comprises a bandwidth extension envelope, and wherein performing the post-processing on the decoded parameter of the first frame comprises using a bandwidth extension envelope of the second frame to perform adjustment on a bandwidth extension envelope of the first frame when the second frame is a normal decoding frame, and a signal class of the first frame is same as a signal class of the second frame.
15. A decoder for decoding a speech/audio bitstream, comprising: a processor; and a memory coupled to the processor, wherein the processor is configured to: perform decoding operations on a bit stream, wherein a decoded parameter of a first frame and a decoded parameter of a second frame are acquired via the decoding operations, and wherein the second frame is a previous frame adjacent to the first frame; perform post-processing on the decoded parameter of the first frame to obtain a post-processed decoded parameter of the first frame when at least one of the first frame or the second frame is a redundancy decoding frame; and reconstruct a speech/audio signal using the post-processed decoded parameter of the first frame, wherein the decoded parameter of the first frame comprises a spectral pair parameter of the first frame, wherein the decoded parameter of the second frame comprises a spectral pair parameter of the second frame, and wherein the post-processed decoded parameter of the first frame is calculated by weighting the spectral pair parameter of the first frame and the spectral pair parameter of the second frame.
16. A non-transitory computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to: perform decoding operations on a bit stream, wherein a decoded parameter of a first frame and a decoded parameter of a second frame are acquired via the decoding operations, and wherein the second frame is a previous frame adjacent to the first frame; perform post-processing on the decoded parameter of the first frame to obtain a post-processed decoded parameter of the first frame when at least one of the first frame or the second frame is a redundancy decoding frame; and reconstruct a speech/audio signal using the post-processed decoded parameter of the first frame, wherein the decoded parameter of the first frame comprises a spectral pair parameter of the first frame, wherein the decoded parameter of the second frame comprises a spectral pair parameter of the second frame, and wherein the post-processed decoded parameter of the first frame is calculated by weighting the spectral pair parameter of the first frame and the spectral pair parameter of the second frame.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments. The accompanying drawings in the following description show merely some embodiments of the present application, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
(2)
(3)
(4)
(5)
DESCRIPTION OF EMBODIMENTS
(6) To make a person skilled in the art understand the technical solutions in the present application better, the following clearly describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application. The described embodiments are merely some but not all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative efforts shall fall within the protection scope of the present application.
(7) The following provides respective descriptions in detail.
(8) In the specification, claims, and accompanying drawings of the present application, the terms “first” and “second” are intended to distinguish between similar objects but do not necessarily indicate a specific order or sequence. It should be understood that data termed in such a way is interchangeable in proper circumstances so that the embodiments of the present application described herein can, for example, be implemented in orders other than the order illustrated or described herein. Moreover, the terms “include”, “contain” and any other variants mean to cover a non-exclusive inclusion, for example, a process, method, system, product, or device that includes a list of steps or units is not necessarily limited to those steps or units, but may include other steps or units not expressly listed or inherent to such a process, method, system, product, or device.
(9) A method for decoding a speech/audio bitstream provided in this embodiment of the present application is first introduced. The method is executed by a decoder. The decoder may be any apparatus that needs to output speech, for example, a mobile phone, a notebook computer, a tablet computer, or a personal computer.
(10)
(11) Step 101: Determine whether a current frame is a normal decoding frame or a redundancy decoding frame.
(12) A normal decoding frame means that information about a current frame can be obtained directly from a bitstream of the current frame by means of decoding. A redundancy decoding frame means that information about a current frame cannot be obtained directly from a bitstream of the current frame by means of decoding, but redundant bitstream information of the current frame can be obtained from a bitstream of another frame.
(13) In an embodiment of the present application, when the current frame is a normal decoding frame, the method provided in this embodiment of the present application is executed only when a previous frame of the current frame is a redundancy decoding frame. The previous frame of the current frame and the current frame are two immediately neighboring frames. In another embodiment of the present application, when the current frame is a normal decoding frame, the method provided in this embodiment of the present application is executed only when there is a redundancy decoding frame among a particular quantity of frames before the current frame. The particular quantity may be set as needed, for example, may be set to 2, 3, 4, or 10.
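The gating rule described above can be illustrated with a minimal Python sketch. The names FrameKind and should_post_process and the lookback parameter are hypothetical and are not taken from the present application. The sketch assumes that a redundancy decoding frame is always post-processed and that a normal decoding frame is post-processed only when a redundancy decoding frame appears among the most recent frames before it.

```python
from enum import Enum


class FrameKind(Enum):
    NORMAL = "normal"          # decoded from the frame's own bitstream
    REDUNDANCY = "redundancy"  # decoded from redundant info carried in another frame


def should_post_process(current, history, lookback=1):
    """Return True when post-processing applies to the current frame.

    history lists the kinds of earlier frames, oldest first.
    lookback is the "particular quantity" of earlier frames to inspect
    (1 means only the immediately previous frame). Hypothetical helper.
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    if current is FrameKind.REDUNDANCY:
        return True
    recent = list(history)[-lookback:]
    return FrameKind.REDUNDANCY in recent
```

With lookback set to 1 the check reduces to the first embodiment (only the immediately neighboring previous frame is inspected); larger values such as 2, 3, 4, or 10 correspond to the second embodiment.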
(14) Step 102: If the current frame is a normal decoding frame or a redundancy decoding frame, obtain a decoded parameter of the current frame by means of parsing.
(15) The decoded parameter of the current frame may include at least one of a spectral pair parameter, an adaptive codebook gain (gain_pit), an algebraic codebook, and a bandwidth extension envelope, where the spectral pair parameter may be at least one of a linear spectral pair (LSP) parameter and an immittance spectral pair (ISP) parameter. It may be understood that, in this embodiment of the present application, post-processing may be performed on any one of the decoded parameters or on all of the decoded parameters. Furthermore, how many parameters and which parameters are selected for post-processing may be determined according to application scenarios and environments, which is not limited in this embodiment of the present application.
(16) When the current frame is a normal decoding frame, information about the current frame can be directly obtained from a bitstream of the current frame by means of decoding in order to obtain the decoded parameter of the current frame. When the current frame is a redundancy decoding frame, the decoded parameter of the current frame can be obtained according to redundant bitstream information of the current frame in a bitstream of another frame by means of parsing.
(17) Step 103: Perform post-processing on the decoded parameter of the current frame to obtain a post-processed decoded parameter of the current frame.
(18) For different decoded parameters, different post-processing may be performed. For example, post-processing performed on a spectral pair parameter may be using a spectral pair parameter of the current frame and a spectral pair parameter of a previous frame of the current frame to perform adaptive weighting to obtain a post-processed spectral pair parameter of the current frame. Post-processing performed on an adaptive codebook gain may be performing adjustment, for example, attenuation, on the adaptive codebook gain.
(19) This embodiment of the present application does not impose limitation on specific post-processing. Furthermore, which type of post-processing is performed may be set as needed or according to application environments and scenarios.
(20) Step 104: Use the post-processed decoded parameter of the current frame to reconstruct a speech/audio signal.
(21) As can be seen from the above, in this embodiment, after obtaining a decoded parameter of a current frame by means of parsing, a decoder side may perform post-processing on the decoded parameter of the current frame and use the post-processed decoded parameter of the current frame to reconstruct a speech/audio signal. Stable quality can therefore be obtained when a decoded signal transitions between a redundancy decoding frame and a normal decoding frame, which improves the quality of the speech/audio signal that is output.
(22) In an embodiment of the present application, the decoded parameter of the current frame includes a spectral pair parameter of the current frame and the performing post-processing on the decoded parameter of the current frame may include using the spectral pair parameter of the current frame and a spectral pair parameter of a previous frame of the current frame to obtain a post-processed spectral pair parameter of the current frame. Furthermore, adaptive weighting is performed on the spectral pair parameter of the current frame and the spectral pair parameter of the previous frame of the current frame to obtain the post-processed spectral pair parameter of the current frame. Furthermore, in an embodiment of the present application, the following formula may be used to obtain through calculation the post-processed spectral pair parameter of the current frame:
lsp[k] = α*lsp_old[k] + δ*lsp_new[k], 0≦k≦M,
where lsp[k] is the post-processed spectral pair parameter of the current frame, lsp_old[k] is the spectral pair parameter of the previous frame, lsp_new[k] is the spectral pair parameter of the current frame, M is an order of spectral pair parameters, α is a weight of the spectral pair parameter of the previous frame, and δ is a weight of the spectral pair parameter of the current frame, where α≧0, δ≧0, and α+δ=1.
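As an illustration of the two-term weighting, the following Python sketch computes the post-processed spectral pair parameters from the previous-frame and current-frame values. The function name smooth_lsp is hypothetical. The weight δ is derived from α using the constraint α+δ=1.

```python
def smooth_lsp(lsp_old, lsp_new, alpha):
    """Blend previous-frame and current-frame spectral pair parameters.

    Implements lsp[k] = alpha * lsp_old[k] + delta * lsp_new[k]
    with delta = 1 - alpha, so that alpha + delta = 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(lsp_old) != len(lsp_new):
        raise ValueError("both parameter vectors must have the same order")
    delta = 1.0 - alpha
    return [alpha * old + delta * new for old, new in zip(lsp_old, lsp_new)]
```

Setting alpha to 0 keeps the current-frame parameters unchanged, and setting alpha to 1 reuses the previous-frame parameters.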
(23) In another embodiment of the present application, the following formula may be used to obtain through calculation the post-processed spectral pair parameter of the current frame:
lsp[k] = α*lsp_old[k] + β*lsp_mid[k] + δ*lsp_new[k], 0≦k≦M,
where lsp[k] is the post-processed spectral pair parameter of the current frame, lsp_old[k] is the spectral pair parameter of the previous frame, lsp_mid[k] is a middle value of the spectral pair parameter of the current frame, lsp_new[k] is the spectral pair parameter of the current frame, M is an order of spectral pair parameters, α is a weight of the spectral pair parameter of the previous frame, β is a weight of the middle value of the spectral pair parameter of the current frame, and δ is a weight of the spectral pair parameter of the current frame, where α≧0, β≧0, δ≧0, and α+β+δ=1.
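The three-term weighting may be sketched in Python as follows. The function name smooth_lsp3 is hypothetical, and the tolerance used to check α+β+δ=1 is an assumption made only to absorb floating-point rounding.

```python
import math


def smooth_lsp3(lsp_old, lsp_mid, lsp_new, alpha, beta, delta):
    """Blend previous-frame, middle, and current-frame spectral pair values.

    Implements lsp[k] = alpha*lsp_old[k] + beta*lsp_mid[k] + delta*lsp_new[k]
    under the constraints alpha, beta, delta >= 0 and alpha + beta + delta = 1.
    """
    weights = (alpha, beta, delta)
    if any(w < 0.0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    if not len(lsp_old) == len(lsp_mid) == len(lsp_new):
        raise ValueError("all parameter vectors must have the same order")
    return [
        alpha * old + beta * mid + delta * new
        for old, mid, new in zip(lsp_old, lsp_mid, lsp_new)
    ]
```

With beta set to 0 the computation reduces to the two-term formula of the preceding embodiment.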
(24) Values of α, β, and δ in the foregoing formula may vary according to different application environments and scenarios. For example, when a signal class of the current frame is unvoiced, the previous frame of the current frame is a redundancy decoding frame, and a signal class of the previous frame of the current frame is not unvoiced, the value of α is 0 or is less than a preset threshold (α_TRESH), where a value of α_TRESH may approach 0. When the current frame is a redundancy decoding frame and a signal class of the current frame is not unvoiced, if a signal class of a next frame of the current frame is unvoiced, or a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, or a signal class of a next frame of the current frame is unvoiced and a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, the value of β is 0 or is less than a preset threshold (β_TRESH), where a value of β_TRESH may approach 0. When the current frame is a redundancy decoding frame and a signal class of the current frame is not unvoiced, if a signal class of a next frame of the current frame is unvoiced, or a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, or a signal class of a next frame of the current frame is unvoiced and a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, the value of δ is 0 or is less than a preset threshold (δ_TRESH), where a value of δ_TRESH may approach 0.
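One possible reading of the foregoing weight rules is sketched below in Python. The function name select_weights, the default weights (0.25, 0.25, 0.5), and the renormalization step are assumptions introduced for illustration, because the present application does not fix the weights used outside the listed cases. The sketch sets a weight to exactly 0 where the text allows "0 or less than a preset threshold", and it applies the β rule and the δ rule together, which is only one way of combining them.

```python
def select_weights(cur_class, prev_class, next_class,
                   cur_is_redundant, prev_is_redundant,
                   prev_tilt, tilt_thresh=0.16,
                   default=(0.25, 0.25, 0.5)):
    """Pick (alpha, beta, delta) for the three-term spectral pair weighting.

    Signal classes are plain strings such as "unvoiced", "voiced", "generic".
    default holds hypothetical weights used when no special case applies.
    """
    alpha, beta, delta = default

    # Unvoiced current frame after a redundancy-decoded, non-unvoiced frame:
    # suppress the previous-frame contribution.
    if cur_class == "unvoiced" and prev_is_redundant and prev_class != "unvoiced":
        alpha = 0.0

    # Redundancy-decoded, non-unvoiced current frame heading into unvoiced
    # content (judged by the next frame's class or the previous frame's tilt):
    # suppress the middle-value and current-frame contributions.
    if (cur_is_redundant and cur_class != "unvoiced"
            and (next_class == "unvoiced" or prev_tilt < tilt_thresh)):
        beta = 0.0
        delta = 0.0

    total = alpha + beta + delta
    if total <= 0.0:
        raise ValueError("default weights leave nothing to renormalize")
    # Renormalize so that alpha + beta + delta = 1 still holds.
    return (alpha / total, beta / total, delta / total)
```

The two special cases are mutually exclusive because the first requires an unvoiced current frame and the second requires a current frame that is not unvoiced.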
(25) The spectral tilt factor may be positive or negative, and a smaller spectral tilt factor of a frame indicates that the signal class of the frame is more inclined to be unvoiced.
(26) The signal class of the current frame may be unvoiced, voiced, generic, transition, inactive, or the like.
(27) The spectral tilt factor threshold may be set to different values according to different application environments and scenarios, for example, to 0.16, 0.15, 0.165, 0.1, 0.161, or 0.159.
(28) In another embodiment of the present application, the decoded parameter of the current frame may include an adaptive codebook gain of the current frame. When the current frame is a redundancy decoding frame, if the next frame of the current frame is an unvoiced frame, or a next frame of the next frame of the current frame is an unvoiced frame and an algebraic codebook of a current subframe of the current frame is a first quantity of times an algebraic codebook of a previous subframe of the current subframe or an algebraic codebook of the previous frame of the current frame, performing post-processing on the decoded parameter of the current frame may include attenuating an adaptive codebook gain of the current subframe of the current frame. When the current frame or the previous frame of the current frame is a redundancy decoding frame, if the signal class of the current frame is generic and the signal class of the next frame of the current frame is voiced or the signal class of the previous frame of the current frame is generic and the signal class of the current frame is voiced, and an algebraic codebook of one subframe in the current frame is different from an algebraic codebook of a previous subframe of the one subframe by a second quantity of times or an algebraic codebook of one subframe in the current frame is different from an algebraic codebook of the previous frame of the current frame by a second quantity of times, performing post-processing on the decoded parameter of the current frame may include adjusting an adaptive codebook gain of a current subframe of the current frame according to at least one of a ratio of an algebraic codebook of the current subframe of the current frame to an algebraic codebook of a neighboring subframe of the current subframe of the current frame, a ratio of an adaptive codebook gain of the current subframe of the current frame to an adaptive codebook gain of the neighboring subframe of the current subframe of the current frame, and a ratio of the algebraic codebook of the current subframe of the current frame to the algebraic codebook of the previous frame of the current frame.
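The attenuation case of this embodiment may be sketched in Python as follows. Several points are assumptions and are not stated in the present application: the comparison "a first quantity of times" is interpreted as a comparison of codebook magnitudes, the magnitude is measured as a sum of absolute values (codebook_level is a hypothetical helper), the condition is read as "next frame unvoiced, or (frame after next unvoiced and the codebook jump is present)", and the attenuation factor of 0.5 is an arbitrary illustrative value.

```python
def codebook_level(code):
    """Hypothetical magnitude measure of an algebraic codebook vector."""
    return sum(abs(c) for c in code)


def attenuate_gain_pit(gain_pit, cur_is_redundant, next_unvoiced,
                       next_next_unvoiced, cur_code, ref_code,
                       first_quantity=2.0, attenuation=0.5):
    """Return the (possibly attenuated) adaptive codebook gain of a subframe.

    cur_code is the algebraic codebook of the current subframe.
    ref_code is that of the previous subframe or of the previous frame.
    """
    if not cur_is_redundant:
        return gain_pit
    # True when the current codebook is at least first_quantity times the reference.
    jump = codebook_level(cur_code) >= first_quantity * codebook_level(ref_code)
    if next_unvoiced or (next_next_unvoiced and jump):
        return gain_pit * attenuation
    return gain_pit
```

The second case of this embodiment (generic-to-voiced transitions, adjusted according to codebook and gain ratios) is not covered by the sketch, because the adjustment rule itself is left open in the description.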
(29) Values of the first quantity and the second quantity may be set according to specific application environments and scenarios. The values may be integers or may be non-integers, where the values of the first quantity and the second quantity may be the same or may be different. For example, the value of the first quantity may be 2, 2.5, 3, 3.4, or 4 and the value of the second quantity may be 2, 2.6, 3, 3.5, or 4.
(30) For an attenuation factor used when the adaptive codebook gain of the current subframe of the current frame is attenuated, different values may be set according to different application environments and scenarios.
(31) In another embodiment of the present application, the decoded parameter of the current frame includes an algebraic codebook of the current frame. When the current frame is a redundancy decoding frame, if the signal class of the next frame of the current frame is unvoiced, the spectral tilt factor of the previous frame of the current frame is less than the preset spectral tilt factor threshold, and an algebraic codebook of at least one subframe of the current frame is 0, performing post-processing on the decoded parameter of the current frame includes using random noise or a non-zero algebraic codebook of the previous subframe of the current subframe of the current frame as an algebraic codebook of an all-0 subframe of the current frame. For the spectral tilt factor threshold, different values may be set according to different application environments or scenarios, for example, may be set to 0.16, 0.15, 0.165, 0.1, 0.161, or 0.159.
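The replacement of all-0 subframes may be illustrated with the following Python sketch. The function name fill_zero_subframes, the noise amplitude, the seeded random generator, and the preference for copying the previous subframe before falling back to random noise are assumptions made for illustration. The description permits either source without ranking them.

```python
import random


def fill_zero_subframes(subframes, cur_is_redundant, next_class, prev_tilt,
                        tilt_thresh=0.16, noise_amp=1.0, rng=None):
    """Replace all-zero algebraic codebook subframes of a redundancy frame.

    subframes is a list of codebook vectors, one per subframe.
    Returns new lists; the input is left unmodified.
    """
    triggered = (cur_is_redundant and next_class == "unvoiced"
                 and prev_tilt < tilt_thresh)
    if not triggered:
        return [list(sf) for sf in subframes]

    rng = rng or random.Random(0)  # fixed seed only to keep the sketch repeatable
    out = []
    for i, sf in enumerate(subframes):
        if any(c != 0 for c in sf):
            out.append(list(sf))  # non-zero subframe is kept as decoded
        elif i > 0 and any(c != 0 for c in subframes[i - 1]):
            out.append(list(subframes[i - 1]))  # reuse previous non-zero codebook
        else:
            out.append([rng.uniform(-noise_amp, noise_amp) for _ in sf])
    return out
```

A production decoder would typically draw noise from its own generator and scale it to the expected excitation energy; both choices are outside what the description specifies.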
(32) In another embodiment of the present application, the decoded parameter of the current frame includes a bandwidth extension envelope of the current frame. When the current frame is a redundancy decoding frame, the current frame is not an unvoiced frame, and the next frame of the current frame is an unvoiced frame, if the spectral tilt factor of the previous frame of the current frame is less than the preset spectral tilt factor threshold, performing post-processing on the decoded parameter of the current frame may include performing correction on the bandwidth extension envelope of the current frame according to at least one of a bandwidth extension envelope of the previous frame of the current frame and the spectral tilt factor. A correction factor used when correction is performed on the bandwidth extension envelope of the current frame is inversely proportional to the spectral tilt factor of the previous frame of the current frame and is directly proportional to a ratio of the bandwidth extension envelope of the previous frame of the current frame to the bandwidth extension envelope of the current frame. For the spectral tilt factor threshold, different values may be set according to different application environments or scenarios, for example, may be set to 0.16, 0.15, 0.165, 0.1, 0.161, or 0.159.
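The proportionality stated for the correction factor may be sketched in Python as follows. The function names, the proportionality constant scale, and the positive floor applied to the spectral tilt factor are assumptions. The floor is introduced only because the spectral tilt factor may be zero or negative, a case for which the description gives no rule. The bandwidth extension envelope is treated as a single scalar value for simplicity.

```python
def needs_bwe_correction(cur_is_redundant, cur_class, next_class,
                         prev_tilt, tilt_thresh=0.16):
    """Check the trigger condition for correcting the current envelope."""
    return (cur_is_redundant and cur_class != "unvoiced"
            and next_class == "unvoiced" and prev_tilt < tilt_thresh)


def bwe_correction_factor(cur_env, prev_env, prev_tilt,
                          scale=1.0, tilt_floor=0.0625):
    """Correction factor for the current bandwidth extension envelope.

    Directly proportional to prev_env / cur_env and inversely proportional
    to the previous frame's spectral tilt factor. scale and tilt_floor
    are hypothetical tuning values.
    """
    if cur_env <= 0.0:
        raise ValueError("current envelope must be positive")
    tilt = max(prev_tilt, tilt_floor)  # avoid division by zero or a negative factor
    return scale * (prev_env / cur_env) / tilt
```

How the factor is then applied to the envelope (for example as a multiplier, possibly with limiting) is not specified in the description and is left out of the sketch.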
(33) In another embodiment of the present application, the decoded parameter of the current frame includes a bandwidth extension envelope of the current frame. If the current frame is a redundancy decoding frame, the previous frame of the current frame is a normal decoding frame, the signal class of the current frame is the same as the signal class of the previous frame of the current frame or the current frame is a prediction mode of redundancy decoding, performing post-processing on the decoded parameter of the current frame includes using a bandwidth extension envelope of the previous frame of the current frame to perform adjustment on the bandwidth extension envelope of the current frame. The prediction mode of redundancy decoding indicates that, when redundant bitstream information is encoded, more bits are used to encode an adaptive codebook gain part and fewer bits are used to encode an algebraic codebook part or the algebraic codebook part may be even not encoded.
(34) It can be known from the above that, in an embodiment of the present application, at transition between an unvoiced frame and a non-unvoiced frame (when the current frame is an unvoiced frame and a redundancy decoding frame, the previous frame or next frame of the current frame is a non-unvoiced frame and a normal decoding frame, or the current frame is a non-unvoiced frame and a normal decoding frame and the previous frame or next frame of the current frame is an unvoiced frame and a redundancy decoding frame), post-processing may be performed on the decoded parameter of the current frame in order to eliminate a click phenomenon at the inter-frame transition between the unvoiced frame and the non-unvoiced frame, improving quality of a speech/audio signal that is output. In another embodiment of the present application, at transition between a generic frame and a voiced frame (when the current frame is a generic frame and a redundancy decoding frame, the previous frame or next frame of the current frame is a voiced frame and a normal decoding frame, or the current frame is a voiced frame and a normal decoding frame and the previous frame or next frame of the current frame is a generic frame and a redundancy decoding frame), post-processing may be performed on the decoded parameter of the current frame in order to rectify an energy instability phenomenon at the transition between the generic frame and the voiced frame, improving quality of a speech/audio signal that is output. In another embodiment of the present application, when the current frame is a redundancy decoding frame, the current frame is not an unvoiced frame, and the next frame of the current frame is an unvoiced frame, adjustment may be performed on a bandwidth extension envelope of the current frame in order to rectify an energy instability phenomenon in time-domain bandwidth extension, improving quality of a speech/audio signal that is output.
(35)
(36) Step 201: Determine whether a current frame is a normal decoding frame. If the current frame is a normal decoding frame, perform step 204, and if the current frame is not a normal decoding frame, perform step 202.
(37) Furthermore, whether the current frame is a normal decoding frame may be determined based on a jitter buffer management (JBM) algorithm.
(38) Step 202: Determine whether redundant bitstream information of the current frame exists. If redundant bitstream information of the current frame exists, perform step 204, and if redundant bitstream information of the current frame does not exist, perform step 203.
(39) If redundant bitstream information of the current frame exists, the current frame is a redundancy decoding frame. Furthermore, whether redundant bitstream information of the current frame exists may be determined from a jitter buffer or a received bitstream.
(40) Step 203: Reconstruct a speech/audio signal of the current frame based on an FEC technology and end the procedure.
(41) Step 204: Obtain a decoded parameter of the current frame by means of parsing.
(42) When the current frame is a normal decoding frame, information about the current frame can be directly obtained from a bitstream of the current frame by means of decoding in order to obtain the decoded parameter of the current frame. When the current frame is a redundancy decoding frame, the decoded parameter of the current frame can be obtained according to the redundant bitstream information of the current frame by means of parsing.
(43) Step 205: Perform post-processing on the decoded parameter of the current frame to obtain a post-processed decoded parameter of the current frame.
(44) Step 206: Use the post-processed decoded parameter of the current frame to reconstruct a speech/audio signal.
(45) Steps 204 to 206 may be performed by referring to steps 102 to 104, and details are not described herein again.
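The control flow of steps 201 to 206 may be summarized with the following Python sketch. The FrameInput structure and the callables parse, post_process, reconstruct, and conceal are hypothetical placeholders for the decoder's parsing, post-processing, synthesis, and FEC routines. The sketch omits the additional gating on whether a neighboring frame is a redundancy decoding frame.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameInput:
    primary: Optional[bytes]    # bitstream of the frame itself, None if unavailable
    redundant: Optional[bytes]  # redundant information carried in another frame


def decode_frame(frame, parse, post_process, reconstruct, conceal):
    """Run one frame through the flow of steps 201 to 206."""
    if frame.primary is not None:        # step 201: normal decoding frame
        params = parse(frame.primary)    # step 204
    elif frame.redundant is not None:    # step 202: redundancy decoding frame
        params = parse(frame.redundant)  # step 204
    else:
        return conceal()                 # step 203: FEC-based reconstruction
    params = post_process(params)        # step 205
    return reconstruct(params)           # step 206
```

In a real decoder the availability of the primary and redundant bitstreams would be reported by the jitter buffer management logic rather than by None checks.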
(46) As can be seen from the above, in this embodiment, after obtaining a decoded parameter of a current frame by means of parsing, a decoder side may perform post-processing on the decoded parameter of the current frame and use the post-processed decoded parameter of the current frame to reconstruct a speech/audio signal. Stable quality can therefore be obtained when a decoded signal transitions between a redundancy decoding frame and a normal decoding frame, which improves the quality of the speech/audio signal that is output.
(47) In this embodiment of the present application, the decoded parameter of the current frame obtained by parsing by a decoder may include at least one of a spectral pair parameter of the current frame, an adaptive codebook gain of the current frame, an algebraic codebook of the current frame, and a bandwidth extension envelope of the current frame. It may be understood that, even if the decoder obtains at least two of the decoded parameters by means of parsing, the decoder may still perform post-processing on only one of the at least two decoded parameters. Therefore, how many decoded parameters and which decoded parameters the decoder further performs post-processing on may be set according to application environments and scenarios.
(48) The following describes a decoder for decoding a speech/audio bitstream according to an embodiment of the present application. The decoder may be any apparatus that needs to output speech, for example, a mobile phone, a notebook computer, a tablet computer, or a personal computer.
(49)
(50) The determining unit 301 is configured to determine whether a current frame is a normal decoding frame or a redundancy decoding frame.
(51) A normal decoding frame means that information about a current frame can be obtained directly from a bitstream of the current frame by means of decoding. A redundancy decoding frame means that information about a current frame cannot be obtained directly from a bitstream of the current frame by means of decoding, but redundant bitstream information of the current frame can be obtained from a bitstream of another frame.
(52) In an embodiment of the present application, when the current frame is a normal decoding frame, the method provided in this embodiment of the present application is executed only when a previous frame of the current frame is a redundancy decoding frame. The previous frame of the current frame and the current frame are two immediately neighboring frames. In another embodiment of the present application, when the current frame is a normal decoding frame, the method provided in this embodiment of the present application is executed only when there is a redundancy decoding frame among a particular quantity of frames before the current frame. The particular quantity may be set as needed, for example, may be set to 2, 3, 4, or 10.
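The two triggering conditions described above can be sketched as follows. This is a minimal illustration, not the decoder's actual logic: the names `FrameKind`, `should_post_process`, and `lookback` are hypothetical, and the sketch assumes that a redundancy decoding frame is always a candidate for post-processing.

```python
from enum import Enum


class FrameKind(Enum):
    NORMAL = "normal"          # decoded directly from the frame's own bitstream
    REDUNDANCY = "redundancy"  # recovered from redundant information in another frame


def should_post_process(current, history, lookback=1):
    """Return True if the current frame should be post-processed.

    current:  FrameKind of the current frame
    history:  FrameKind values of earlier frames, oldest first
    lookback: the "particular quantity" of preceding frames to inspect
              (1 reproduces the immediately-neighboring-frame embodiment)
    """
    # Assumption: a redundancy decoding frame is always post-processed.
    if current is FrameKind.REDUNDANCY:
        return True
    # A normal frame is post-processed only if a redundancy frame
    # appears among the last `lookback` frames.
    recent = list(history)[-lookback:] if lookback > 0 else []
    return any(kind is FrameKind.REDUNDANCY for kind in recent)
```

With `lookback=1` only the immediately preceding frame is examined; a larger value such as 2, 3, 4, or 10 corresponds to the second embodiment.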
(53) The parsing unit 302 is configured to obtain a decoded parameter of the current frame by means of parsing when the determining unit 301 determines that the current frame is a normal decoding frame or a redundancy decoding frame.
(54) The decoded parameter of the current frame may include at least one of a spectral pair parameter, an adaptive codebook gain (gain_pit), an algebraic codebook, and a bandwidth extension envelope, where the spectral pair parameter may be at least one of a line spectral pair (LSP) parameter and an immittance spectral pair (ISP) parameter. It may be understood that, in this embodiment of the present application, post-processing may be performed on only any one of the decoded parameters, or post-processing may be performed on all of the decoded parameters. Furthermore, how many parameters and which parameters are selected for post-processing may be determined according to application scenarios and environments, which are not limited in this embodiment of the present application.
(55) When the current frame is a normal decoding frame, information about the current frame can be directly obtained from a bitstream of the current frame by means of decoding in order to obtain the decoded parameter of the current frame. When the current frame is a redundancy decoding frame, the decoded parameter of the current frame can be obtained according to redundant bitstream information of the current frame in a bitstream of another frame by means of parsing.
(56) The post-processing unit 303 is configured to perform post-processing on the decoded parameter of the current frame obtained by the parsing unit 302 to obtain a post-processed decoded parameter of the current frame.
(57) For different decoded parameters, different post-processing may be performed. For example, post-processing performed on a spectral pair parameter may be using a spectral pair parameter of the current frame and a spectral pair parameter of a previous frame of the current frame to perform adaptive weighting to obtain a post-processed spectral pair parameter of the current frame. Post-processing performed on an adaptive codebook gain may be performing adjustment, for example, attenuation, on the adaptive codebook gain.
(58) This embodiment of the present application does not impose limitation on specific post-processing. Furthermore, which type of post-processing is performed may be set as needed or according to application environments and scenarios.
(59) The reconstruction unit 304 is configured to use the post-processed decoded parameter of the current frame obtained by the post-processing unit 303 to reconstruct a speech/audio signal.
(60) It can be known from the above that, in this embodiment, after obtaining a decoded parameter of a current frame by means of parsing, a decoder side may perform post-processing on the decoded parameter of the current frame and use a post-processed decoded parameter of the current frame to reconstruct a speech/audio signal such that stable quality can be obtained when a decoded signal transitions between a redundancy decoding frame and a normal decoding frame, improving quality of a speech/audio signal that is output.
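The cooperation of the four units can be sketched as a simple pipeline. This is only an illustrative arrangement under stated assumptions: the class `Decoder` and the four callables passed to it are hypothetical stand-ins for units 301 to 304, and their signatures are not taken from this application.

```python
class Decoder:
    """Chains hypothetical stand-ins for units 301 to 304."""

    def __init__(self, determine, parse, post_process, reconstruct):
        self.determine = determine        # determining unit 301
        self.parse = parse                # parsing unit 302
        self.post_process = post_process  # post-processing unit 303
        self.reconstruct = reconstruct    # reconstruction unit 304

    def decode_frame(self, frame):
        # Classify the frame as a normal or a redundancy decoding frame.
        kind = self.determine(frame)
        # Obtain the decoded parameters by parsing, for either kind of frame.
        params = self.parse(frame, kind)
        # Post-process the decoded parameters before reconstruction.
        params = self.post_process(params, kind)
        # Reconstruct the speech/audio signal from the post-processed parameters.
        return self.reconstruct(params)
```

Passing the units in as callables mirrors the statement that which post-processing is performed may be set as needed.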
(61) In another embodiment of the present application, when the decoded parameter of the current frame includes a spectral pair parameter of the current frame, the post-processing unit 303 may be further configured to use the spectral pair parameter of the current frame and a spectral pair parameter of a previous frame of the current frame to obtain a post-processed spectral pair parameter of the current frame. Further, adaptive weighting may be performed on the spectral pair parameter of the current frame and the spectral pair parameter of the previous frame of the current frame to obtain the post-processed spectral pair parameter of the current frame. For example, in an embodiment of the present application, the post-processing unit 303 may use the following formula to calculate the post-processed spectral pair parameter of the current frame:
lsp[k] = α*lsp_old[k] + δ*lsp_new[k], 0≦k≦M,
where lsp[k] is the post-processed spectral pair parameter of the current frame, lsp_old[k] is the spectral pair parameter of the previous frame, lsp_new[k] is the spectral pair parameter of the current frame, M is an order of spectral pair parameters, α is a weight of the spectral pair parameter of the previous frame, and δ is a weight of the spectral pair parameter of the current frame, where α≧0 and δ≧0.
(62) In an embodiment of the present application, the post-processing unit 303 may use the following formula to obtain through calculation the post-processed spectral pair parameter of the current frame:
lsp[k] = α*lsp_old[k] + β*lsp_mid[k] + δ*lsp_new[k], 0≦k≦M,
where lsp[k] is the post-processed spectral pair parameter of the current frame, lsp_old[k] is the spectral pair parameter of the previous frame, lsp_mid [k] is a middle value of the spectral pair parameter of the current frame, lsp_new[k] is the spectral pair parameter of the current frame, M is an order of spectral pair parameters, α is a weight of the spectral pair parameter of the previous frame, β is a weight of the middle value of the spectral pair parameter of the current frame, and δ is a weight of the spectral pair parameter of the current frame, where α≧0, β≧0, and δ≧0.
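The weighting formula above can be written out as a short sketch. The function name `blend_spectral_pairs` and the use of plain Python lists are assumptions made for illustration; only the weighted sum itself comes from the formula.

```python
def blend_spectral_pairs(lsp_old, lsp_mid, lsp_new, alpha, beta, delta):
    """Compute lsp[k] = alpha*lsp_old[k] + beta*lsp_mid[k] + delta*lsp_new[k].

    lsp_old: spectral pair parameters of the previous frame
    lsp_mid: middle values of the spectral pair parameters of the current frame
    lsp_new: spectral pair parameters of the current frame
    """
    if not (len(lsp_old) == len(lsp_mid) == len(lsp_new)):
        raise ValueError("all spectral pair vectors must have the same length")
    if alpha < 0 or beta < 0 or delta < 0:
        raise ValueError("weights must be non-negative")
    return [
        alpha * old + beta * mid + delta * new
        for old, mid, new in zip(lsp_old, lsp_mid, lsp_new)
    ]
```

Setting `beta` to 0 makes the middle-value term vanish, which gives the two-term form that uses only α and δ.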
(63) Values of α, β, and δ in the foregoing formula may vary according to different application environments and scenarios. For example, when a signal class of the current frame is unvoiced, the previous frame of the current frame is a redundancy decoding frame, and a signal class of the previous frame of the current frame is not unvoiced, the value of α is 0 or is less than a preset threshold (α_TRESH), where a value of α_TRESH may approach 0. When the current frame is a redundancy decoding frame and a signal class of the current frame is not unvoiced, if a signal class of a next frame of the current frame is unvoiced, or a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, or a signal class of a next frame of the current frame is unvoiced and a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, the value of β is 0 or is less than a preset threshold (β_TRESH), where a value of β_TRESH may approach 0. When the current frame is a redundancy decoding frame and a signal class of the current frame is not unvoiced, if a signal class of a next frame of the current frame is unvoiced, or a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, or a signal class of a next frame of the current frame is unvoiced and a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, the value of δ is 0 or is less than a preset threshold (δ_TRESH), where a value of δ_TRESH may approach 0.
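The example conditions above can be expressed as two predicates. This is a sketch under stated assumptions: the `FrameInfo` record, its field names, the function names, and the default threshold of 0.16 are hypothetical, and the sketch only reports when a weight should be 0 or near 0 without choosing the weight values themselves.

```python
from dataclasses import dataclass


@dataclass
class FrameInfo:
    signal_class: str           # e.g. "unvoiced", "voiced", "generic"
    is_redundancy: bool         # True for a redundancy decoding frame
    spectral_tilt: float = 0.0  # spectral tilt factor of the frame


def alpha_should_vanish(cur, prev):
    """Condition under which alpha is 0 or below its preset threshold."""
    return (
        cur.signal_class == "unvoiced"
        and prev.is_redundancy
        and prev.signal_class != "unvoiced"
    )


def beta_or_delta_should_vanish(cur, prev, nxt, tilt_threshold=0.16):
    """Condition stated for both beta and delta in the examples above."""
    if not cur.is_redundancy or cur.signal_class == "unvoiced":
        return False
    # "A, or B, or A and B" reduces to "A or B".
    return nxt.signal_class == "unvoiced" or prev.spectral_tilt < tilt_threshold
```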
(64) The spectral tilt factor may be positive or negative, and a smaller spectral tilt factor of a frame indicates that the signal class of the frame is more inclined to be unvoiced.
(65) The signal class of the current frame may be unvoiced, voiced, generic, transition, inactive, or the like.
(66) The value of the spectral tilt factor threshold may be set according to different application environments and scenarios; for example, the threshold may be set to 0.16, 0.15, 0.165, 0.1, 0.161, or 0.159.
(67) In another embodiment of the present application, the post-processing unit 303 is further configured to attenuate an adaptive codebook gain of the current subframe of the current frame when the decoded parameter of the current frame includes an adaptive codebook gain of the current frame and the current frame is a redundancy decoding frame, if the next frame of the current frame is an unvoiced frame, or a next frame of the next frame of the current frame is an unvoiced frame and an algebraic codebook of a current subframe of the current frame is a first quantity of times an algebraic codebook of a previous subframe of the current subframe or an algebraic codebook of the previous frame of the current frame.
(68) For an attenuation factor used when the adaptive codebook gain of the current subframe of the current frame is attenuated, different values may be set according to different application environments and scenarios.
(69) A value of the first quantity may be set according to specific application environments and scenarios. The value may be an integer or may be a non-integer. For example, the value of the first quantity may be 2, 2.5, 3, 3.4, or 4.
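The attenuation rule can be sketched as follows. Several points are assumptions rather than statements of this application: the "size" of an algebraic codebook is measured here by its energy (sum of squares), "a first quantity of times" is read as "at least that many times", the condition is read as "next frame unvoiced, or (frame after next unvoiced and codebook jump)", and the default attenuation factor of 0.75 is purely illustrative.

```python
def codebook_energy(codebook):
    """Energy of an algebraic codebook vector (assumed size measure)."""
    return sum(x * x for x in codebook)


def attenuate_adaptive_gain(gain_pit, cur_codebook, ref_codebook,
                            is_redundancy, next_unvoiced, next_next_unvoiced,
                            first_quantity=2.0, attenuation=0.75):
    """Return the (possibly attenuated) adaptive codebook gain of a subframe.

    ref_codebook is the algebraic codebook of the previous subframe or of
    the previous frame, whichever the caller chooses to compare against.
    """
    if not is_redundancy:
        return gain_pit
    # Codebook of the current subframe is at least `first_quantity`
    # times the reference codebook.
    jump = codebook_energy(cur_codebook) >= first_quantity * codebook_energy(ref_codebook)
    if next_unvoiced or (next_next_unvoiced and jump):
        return gain_pit * attenuation
    return gain_pit
```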
(70) In another embodiment of the present application, the post-processing unit 303 is further configured to adjust an adaptive codebook gain of a current subframe of the current frame according to at least one of a ratio of an algebraic codebook of the current subframe of the current frame to an algebraic codebook of a neighboring subframe of the current subframe of the current frame, a ratio of an adaptive codebook gain of the current subframe of the current frame to an adaptive codebook gain of the neighboring subframe of the current subframe of the current frame, and a ratio of the algebraic codebook of the current subframe of the current frame to the algebraic codebook of the previous frame of the current frame when the decoded parameter of the current frame includes an adaptive codebook gain of the current frame, the current frame or the previous frame of the current frame is a redundancy decoding frame, the signal class of the current frame is generic and the signal class of the next frame of the current frame is voiced or the signal class of the previous frame of the current frame is generic and the signal class of the current frame is voiced, and an algebraic codebook of one subframe in the current frame is different from an algebraic codebook of a previous subframe of the one subframe by a second quantity of times or an algebraic codebook of one subframe in the current frame is different from an algebraic codebook of the previous frame of the current frame by a second quantity of times.
(71) A value of the second quantity may be set according to specific application environments and scenarios. The value may be an integer or may be a non-integer. For example, the value of the second quantity may be 2, 2.6, 3, 3.5, or 4.
(72) In another embodiment of the present application, the post-processing unit 303 is further configured to use random noise or a non-zero algebraic codebook of the previous subframe of the current subframe of the current frame as an algebraic codebook of an all-0 subframe of the current frame when the decoded parameter of the current frame includes an algebraic codebook of the current frame, the current frame is a redundancy decoding frame, the signal class of the next frame of the current frame is unvoiced, the spectral tilt factor of the previous frame of the current frame is less than the preset spectral tilt factor threshold, and an algebraic codebook of at least one subframe of the current frame is 0. For the spectral tilt factor threshold, different values may be set according to different application environments or scenarios, for example, may be set to 0.16, 0.15, 0.165, 0.1, 0.161, or 0.159.
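Replacing an all-0 subframe can be sketched as below, assuming the caller has already checked the frame-level conditions (redundancy decoding frame, unvoiced next frame, spectral tilt factor below the threshold). The function name, the uniform noise distribution, and the `noise_amplitude` parameter are hypothetical choices for illustration.

```python
import random


def fill_zero_subframes(subframes, rng=None, noise_amplitude=1.0):
    """Return a copy of `subframes` in which every all-0 algebraic codebook
    is replaced by the previous subframe's non-zero codebook, or by random
    noise when no non-zero previous subframe is available."""
    rng = rng or random.Random()
    filled = []
    for codebook in subframes:
        if any(codebook):
            # Non-zero codebook: keep as decoded.
            filled.append(list(codebook))
        elif filled and any(filled[-1]):
            # Reuse the non-zero codebook of the previous subframe.
            filled.append(list(filled[-1]))
        else:
            # No usable previous subframe: substitute random noise.
            filled.append([rng.uniform(-noise_amplitude, noise_amplitude)
                           for _ in codebook])
    return filled
```

In this sketch a subframe that was itself filled counts as a non-zero previous subframe for the one that follows it; a decoder could equally choose noise for every all-0 subframe.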
(73) In another embodiment of the present application, the post-processing unit 303 is further configured to perform correction on the bandwidth extension envelope of the current frame according to at least one of a bandwidth extension envelope of the previous frame of the current frame and the spectral tilt factor of the previous frame of the current frame when the current frame is a redundancy decoding frame, the decoded parameter includes a bandwidth extension envelope, the current frame is not an unvoiced frame and the next frame of the current frame is an unvoiced frame, and the spectral tilt factor of the previous frame of the current frame is less than the preset spectral tilt factor threshold. A correction factor used when correction is performed on the bandwidth extension envelope of the current frame is inversely proportional to the spectral tilt factor of the previous frame of the current frame and is directly proportional to a ratio of the bandwidth extension envelope of the previous frame of the current frame to the bandwidth extension envelope of the current frame. For the spectral tilt factor threshold, different values may be set according to different application environments or scenarios; for example, the threshold may be set to 0.16, 0.15, 0.165, 0.1, 0.161, or 0.159.
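The stated proportionality of the correction factor can be illustrated with a small sketch. The function name, the scalar treatment of the envelope, and the proportionality constant `scale` are assumptions; the text gives only the proportionality, not a concrete constant. Because the spectral tilt factor may be negative or zero, the sketch rejects non-positive values instead of guessing how a decoder would handle them.

```python
def bwe_correction_factor(prev_envelope, cur_envelope, prev_tilt, scale=1.0):
    """Correction factor that is inversely proportional to the previous
    frame's spectral tilt factor and directly proportional to the ratio
    prev_envelope / cur_envelope.

    `scale` is a hypothetical proportionality constant.
    """
    if cur_envelope <= 0:
        raise ValueError("current envelope must be positive")
    if prev_tilt <= 0:
        raise ValueError("this sketch only handles a positive spectral tilt factor")
    return scale * (prev_envelope / cur_envelope) / prev_tilt
```

Halving the spectral tilt factor doubles the factor, and doubling the envelope ratio doubles it as well, which is the behavior the paragraph describes.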
(74) In another embodiment of the present application, the post-processing unit 303 is further configured to use a bandwidth extension envelope of the previous frame of the current frame to perform adjustment on the bandwidth extension envelope of the current frame when the current frame is a redundancy decoding frame, the decoded parameter includes a bandwidth extension envelope, the previous frame of the current frame is a normal decoding frame, and the signal class of the current frame is the same as the signal class of the previous frame of the current frame or the current frame is a prediction mode of redundancy decoding.
(75) It can be known from the above that, in an embodiment of the present application, at transition between an unvoiced frame and a non-unvoiced frame (when the current frame is an unvoiced frame and a redundancy decoding frame, the previous frame or next frame of the current frame is a non-unvoiced frame and a normal decoding frame, or the current frame is a non-unvoiced frame and a normal decoding frame and the previous frame or next frame of the current frame is an unvoiced frame and a redundancy decoding frame), post-processing may be performed on the decoded parameter of the current frame in order to eliminate a click phenomenon at the inter-frame transition between the unvoiced frame and the non-unvoiced frame, improving quality of a speech/audio signal that is output. In another embodiment of the present application, at transition between a generic frame and a voiced frame (when the current frame is a generic frame and a redundancy decoding frame, the previous frame or next frame of the current frame is a voiced frame and a normal decoding frame, or the current frame is a voiced frame and a normal decoding frame and the previous frame or next frame of the current frame is a generic frame and a redundancy decoding frame), post-processing may be performed on the decoded parameter of the current frame in order to rectify an energy instability phenomenon at the transition between the generic frame and the voiced frame, improving quality of a speech/audio signal that is output. In another embodiment of the present application, when the current frame is a redundancy decoding frame, the current frame is not an unvoiced frame, and the next frame of the current frame is an unvoiced frame, adjustment may be performed on a bandwidth extension envelope of the current frame in order to rectify an energy instability phenomenon in time-domain bandwidth extension, improving quality of a speech/audio signal that is output.
(76)
(77) The processor 402 invokes code stored in the memory 403 using the bus 401 in order to determine whether a current frame is a normal decoding frame or a redundancy decoding frame, obtain a decoded parameter of the current frame by means of parsing if the current frame is a normal decoding frame or a redundancy decoding frame, perform post-processing on the decoded parameter of the current frame to obtain a post-processed decoded parameter of the current frame, and use the post-processed decoded parameter of the current frame to reconstruct a speech/audio signal.
(78) It can be known from the above that, in this embodiment, after obtaining a decoded parameter of a current frame by means of parsing, a decoder side may perform post-processing on the decoded parameter of the current frame and use a post-processed decoded parameter of the current frame to reconstruct a speech/audio signal such that stable quality can be obtained when a decoded signal transitions between a redundancy decoding frame and a normal decoding frame, improving quality of a speech/audio signal that is output.
(79) In an embodiment of the present application, the decoded parameter of the current frame includes a spectral pair parameter of the current frame and the processor 402 invokes the code stored in the memory 403 using the bus 401 in order to use the spectral pair parameter of the current frame and a spectral pair parameter of a previous frame of the current frame to obtain a post-processed spectral pair parameter of the current frame. Furthermore, adaptive weighting is performed on the spectral pair parameter of the current frame and the spectral pair parameter of the previous frame of the current frame to obtain the post-processed spectral pair parameter of the current frame. Further, in an embodiment of the present application, the following formula may be used to obtain through calculation the post-processed spectral pair parameter of the current frame:
lsp[k] = α*lsp_old[k] + δ*lsp_new[k], 0≦k≦M,
where lsp[k] is the post-processed spectral pair parameter of the current frame, lsp_old[k] is the spectral pair parameter of the previous frame, lsp_new[k] is the spectral pair parameter of the current frame, M is an order of spectral pair parameters, α is a weight of the spectral pair parameter of the previous frame, and δ is a weight of the spectral pair parameter of the current frame, where α≧0 and δ≧0.
(80) In another embodiment of the present application, the following formula may be used to obtain through calculation the post-processed spectral pair parameter of the current frame:
lsp[k] = α*lsp_old[k] + β*lsp_mid[k] + δ*lsp_new[k], 0≦k≦M,
where lsp[k] is the post-processed spectral pair parameter of the current frame, lsp_old[k] is the spectral pair parameter of the previous frame, lsp_mid [k] is a middle value of the spectral pair parameter of the current frame, lsp_new[k] is the spectral pair parameter of the current frame, M is an order of spectral pair parameters, α is a weight of the spectral pair parameter of the previous frame, β is a weight of the middle value of the spectral pair parameter of the current frame, and δ is a weight of the spectral pair parameter of the current frame, where α≧0, β≧0, and δ≧0.
(81) Values of α, β, and δ in the foregoing formula may vary according to different application environments and scenarios. For example, when a signal class of the current frame is unvoiced, the previous frame of the current frame is a redundancy decoding frame, and a signal class of the previous frame of the current frame is not unvoiced, the value of α is 0 or is less than a preset threshold (α_TRESH), where a value of α_TRESH may approach 0. When the current frame is a redundancy decoding frame and a signal class of the current frame is not unvoiced, if a signal class of a next frame of the current frame is unvoiced, or a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, or a signal class of a next frame of the current frame is unvoiced and a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, the value of β is 0 or is less than a preset threshold (β_TRESH), where a value of β_TRESH may approach 0. When the current frame is a redundancy decoding frame and a signal class of the current frame is not unvoiced, if a signal class of a next frame of the current frame is unvoiced, or a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, or a signal class of a next frame of the current frame is unvoiced and a spectral tilt factor of the previous frame of the current frame is less than a preset spectral tilt factor threshold, the value of δ is 0 or is less than a preset threshold (δ_TRESH), where a value of δ_TRESH may approach 0.
(82) The spectral tilt factor may be positive or negative, and a smaller spectral tilt factor of a frame indicates that the signal class of the frame is more inclined to be unvoiced.
(83) The signal class of the current frame may be unvoiced, voiced, generic, transition, inactive, or the like.
(84) The value of the spectral tilt factor threshold may be set according to different application environments and scenarios; for example, the threshold may be set to 0.16, 0.15, 0.165, 0.1, 0.161, or 0.159.
(85) In another embodiment of the present application, the decoded parameter of the current frame may include an adaptive codebook gain of the current frame. When the current frame is a redundancy decoding frame, if the next frame of the current frame is an unvoiced frame, or a next frame of the next frame of the current frame is an unvoiced frame and an algebraic codebook of a current subframe of the current frame is a first quantity of times an algebraic codebook of a previous subframe of the current subframe or an algebraic codebook of the previous frame of the current frame, the processor 402 invokes the code stored in the memory 403 using the bus 401 in order to attenuate an adaptive codebook gain of the current subframe of the current frame. When the current frame or the previous frame of the current frame is a redundancy decoding frame, if the signal class of the current frame is generic and the signal class of the next frame of the current frame is voiced or the signal class of the previous frame of the current frame is generic and the signal class of the current frame is voiced, and an algebraic codebook of one subframe in the current frame is different from an algebraic codebook of a previous subframe of the one subframe by a second quantity of times or an algebraic codebook of one subframe in the current frame is different from an algebraic codebook of the previous frame of the current frame by a second quantity of times, performing post-processing on the decoded parameter of the current frame may include adjusting an adaptive codebook gain of a current subframe of the current frame according to at least one of a ratio of an algebraic codebook of the current subframe of the current frame to an algebraic codebook of a neighboring subframe of the current subframe of the current frame, a ratio of an adaptive codebook gain of the current subframe of the current frame to an adaptive codebook gain of the neighboring subframe of the current subframe of the current frame, and a ratio of the algebraic codebook of the current subframe of the current frame to the algebraic codebook of the previous frame of the current frame.
(86) Values of the first quantity and the second quantity may be set according to specific application environments and scenarios. The values may be integers or may be non-integers, where the values of the first quantity and the second quantity may be the same or may be different. For example, the value of the first quantity may be 2, 2.5, 3, 3.4, or 4 and the value of the second quantity may be 2, 2.6, 3, 3.5, or 4.
(87) For an attenuation factor used when the adaptive codebook gain of the current subframe of the current frame is attenuated, different values may be set according to different application environments and scenarios.
(88) In another embodiment of the present application, the decoded parameter of the current frame includes an algebraic codebook of the current frame. When the current frame is a redundancy decoding frame, if the signal class of the next frame of the current frame is unvoiced, the spectral tilt factor of the previous frame of the current frame is less than the preset spectral tilt factor threshold, and an algebraic codebook of at least one subframe of the current frame is 0, the processor 402 invokes the code stored in the memory 403 using the bus 401 in order to use random noise or a non-zero algebraic codebook of the previous subframe of the current subframe of the current frame as an algebraic codebook of an all-0 subframe of the current frame. For the spectral tilt factor threshold, different values may be set according to different application environments or scenarios, for example, may be set to 0.16, 0.15, 0.165, 0.1, 0.161, or 0.159.
(89) In another embodiment of the present application, the decoded parameter of the current frame includes a bandwidth extension envelope of the current frame. When the current frame is a redundancy decoding frame, the current frame is not an unvoiced frame, and the next frame of the current frame is an unvoiced frame, if the spectral tilt factor of the previous frame of the current frame is less than the preset spectral tilt factor threshold, the processor 402 invokes the code stored in the memory 403 using the bus 401 in order to perform correction on the bandwidth extension envelope of the current frame according to at least one of a bandwidth extension envelope of the previous frame of the current frame and the spectral tilt factor of the previous frame of the current frame. A correction factor used when correction is performed on the bandwidth extension envelope of the current frame is inversely proportional to the spectral tilt factor of the previous frame of the current frame and is directly proportional to a ratio of the bandwidth extension envelope of the previous frame of the current frame to the bandwidth extension envelope of the current frame. For the spectral tilt factor threshold, different values may be set according to different application environments or scenarios, for example, may be set to 0.16, 0.15, 0.165, 0.1, 0.161, or 0.159.
(90) In another embodiment of the present application, the decoded parameter of the current frame includes a bandwidth extension envelope of the current frame. If the current frame is a redundancy decoding frame, the previous frame of the current frame is a normal decoding frame, the signal class of the current frame is the same as the signal class of the previous frame of the current frame or the current frame is a prediction mode of redundancy decoding, the processor 402 invokes the code stored in the memory 403 using the bus 401 in order to use a bandwidth extension envelope of the previous frame of the current frame to perform adjustment on the bandwidth extension envelope of the current frame.
(91) It can be known from the above that, in an embodiment of the present application, at transition between an unvoiced frame and a non-unvoiced frame (when the current frame is an unvoiced frame and a redundancy decoding frame, the previous frame or next frame of the current frame is a non-unvoiced frame and a normal decoding frame, or the current frame is a non-unvoiced frame and a normal decoding frame and the previous frame or next frame of the current frame is an unvoiced frame and a redundancy decoding frame), post-processing may be performed on the decoded parameter of the current frame in order to eliminate a click phenomenon at the inter-frame transition between the unvoiced frame and the non-unvoiced frame, improving quality of a speech/audio signal that is output. In another embodiment of the present application, at transition between a generic frame and a voiced frame (when the current frame is a generic frame and a redundancy decoding frame, the previous frame or next frame of the current frame is a voiced frame and a normal decoding frame, or the current frame is a voiced frame and a normal decoding frame and the previous frame or next frame of the current frame is a generic frame and a redundancy decoding frame), post-processing may be performed on the decoded parameter of the current frame in order to rectify an energy instability phenomenon at the transition between the generic frame and the voiced frame, improving quality of a speech/audio signal that is output. In another embodiment of the present application, when the current frame is a redundancy decoding frame, the current frame is not an unvoiced frame, and the next frame of the current frame is an unvoiced frame, adjustment may be performed on a bandwidth extension envelope of the current frame in order to rectify an energy instability phenomenon in time-domain bandwidth extension, improving quality of a speech/audio signal that is output.
(92) An embodiment of the present application further provides a computer storage medium. The computer storage medium may store a program and the program performs some or all steps of the method for decoding a speech/audio bitstream that are described in the foregoing method embodiments.
(93) It should be noted that, for brief description, the foregoing method embodiments are represented as series of actions. However, a person skilled in the art should appreciate that the present application is not limited to the described order of the actions, because according to the present application, some steps may be performed in other orders or simultaneously. In addition, a person skilled in the art should also understand that all the embodiments described in this specification are exemplary embodiments, and the involved actions and modules are not necessarily mandatory to the present application.
(94) In the foregoing embodiments, the description of each embodiment has a respective focus. For a part that is not described in detail in one embodiment, reference may be made to related descriptions in other embodiments.
(95) In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the described apparatus embodiments are merely exemplary. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic or other forms.
(96) The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
(97) In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in a form of hardware, or may be implemented in a form of a software functional unit.
(98) When the foregoing integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or all or some of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or a processor connected to a memory) to perform all or some of the steps of the methods described in the foregoing embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a universal serial bus (USB) flash drive, a read-only memory (ROM), a random access memory (RAM), a portable hard drive, a magnetic disk, or an optical disc.
(99) The foregoing embodiments are merely intended to describe the technical solutions of the present application, but not to limit the present application. Although the present application is described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still make modifications to the technical solutions described in the foregoing embodiments or make equivalent replacements to some technical features thereof, without departing from the scope of the technical solutions of the embodiments of the present application.