Virtual stereo synthesis method and apparatus

Abstract

A virtual stereo synthesis method includes acquiring at least one sound input signal on a first side and at least one sound input signal on a second side, separately performing ratio processing on a preset head related transfer function (HRTF) left-ear component and a preset HRTF right-ear component of each sound input signal on the second side, to obtain a filtering function of each sound input signal on the second side, separately performing convolution filtering on each sound input signal on the second side and the filtering function of the sound input signal on the second side, to obtain the filtered signal on the second side, and synthesizing all of the sound input signals on the first side and all of the filtered signals on the second side into a virtual stereo signal. The method may alleviate a coloration effect and reduce calculation complexity.

Claims

1. A virtual stereo synthesis method, comprising: acquiring at least one sound input signal on a first side and at least one sound input signal on a second side; separately performing ratio processing on a preset head related transfer function (HRTF) left-ear component and a preset HRTF right-ear component of each sound input signal on the second side, to obtain a filtering function of each of the sound input signals on the second side; separately performing convolution filtering on each of the sound input signals on the second side and the filtering function of each of the sound input signals on the second side, to obtain filtered signals on the second side; and synthesizing all of the sound input signals on the first side and all of the filtered signals on the second side into a virtual stereo signal, wherein synthesizing all of the sound input signals on the first side and all of the filtered signals on the second side comprises: summating all of the sound input signals on the first side and all of the filtered signals on the second side to obtain a synthetic signal; performing, using a fourth-order infinite impulse response (IIR) filter, timbre equalization on the synthetic signal; and using the timbre-equalized synthetic signal as the virtual stereo signal.

2. The method according to claim 1, wherein separately performing the ratio processing comprises: separately using a ratio of a left-ear frequency domain parameter to a right-ear frequency domain parameter of each of the sound input signals on the second side as a frequency-domain filtering function of each of the sound input signals on the second side, wherein the left-ear frequency domain parameter is related to the preset HRTF left-ear component and wherein the right-ear frequency domain parameter is related to the preset HRTF right-ear component; separately transforming the frequency-domain filtering function of each of the sound input signals on the second side to a time-domain function; and using the time-domain function as the filtering function of each of the sound input signals on the second side.

3. The method according to claim 2, wherein separately transforming the frequency-domain filtering function of each of the sound input signals on the second side comprises separately performing minimum phase filtering on the frequency-domain filtering function of each of the sound input signals on the second side.

4. The method according to claim 2, further comprising: separately using a frequency domain of the preset HRTF left-ear component of each of the sound input signals on the second side as the left-ear frequency domain parameter of each of the sound input signals on the second side, and separately using a frequency domain of the preset HRTF right-ear component of each of the sound input signals on the second side as the right-ear frequency domain parameter of each of the sound input signals on the second side; separately using a frequency domain of the preset HRTF left-ear component of each of the sound input signals on the second side as the left-ear frequency domain parameter of each of the sound input signals on the second side after diffuse-field equalization or subband smoothing, and separately using the frequency domain of the preset HRTF right-ear component of each of the sound input signals on the second side as the right-ear frequency domain parameter of each of the sound input signals on the second side after the diffuse-field equalization or the subband smoothing; or separately using the frequency domain of the preset HRTF left-ear component of each of the sound input signals on the second side as the left-ear frequency domain parameter of each of the sound input signals on the second side after diffuse-field equalization and subband smoothing is performed in sequence, and separately using the frequency domain of the preset HRTF right-ear component of each of the sound input signals on the second side as the right-ear frequency domain parameter of each of the sound input signals on the second side after diffuse-field equalization and subband smoothing is performed in sequence.

5. The method according to claim 1, wherein separately performing convolution filtering on each of the sound input signals on the second side and the filtering function of each of the sound input signals on the second side comprises: separately performing reverberation processing on each of the sound input signals on the second side; using the reverberation processed signals as sound reverberation signals on the second side; and separately performing convolution filtering on each of the sound reverberation signals on the second side and the filtering function of the corresponding sound input signals on the second side, to obtain the filtered signals on the second side.

6. The method according to claim 5, wherein separately performing the reverberation processing on each of the sound input signals on the second side, and using the reverberation processed signals as the sound reverberation signals on the second side comprises: separately passing each of the sound input signals on the second side through an all-pass filter, to obtain a reverberation signal of each of the sound input signals on the second side; and separately synthesizing each of the sound input signals on the second side and the reverberation signal of each of the sound input signals on the second side into the sound reverberation signals on the second side.

7. A virtual stereo synthesis apparatus, comprising: a memory comprising instructions; and a processor coupled to the memory, wherein the instructions cause the processor to be configured to: acquire at least one sound input signal on a first side and at least one sound input signal on a second side; separately perform ratio processing on a preset head related transfer function (HRTF) left-ear component and a preset HRTF right-ear component of each sound input signal on the second side, to obtain a filtering function of each of the sound input signals on the second side; separately perform convolution filtering on each of the sound input signals on the second side and on the filtering function of each of the sound input signals on the second side, to obtain filtered signals on the second side; and synthesize all of the sound input signals on the first side and all of the filtered signals on the second side; summate all of the sound input signals on the first side and all of the filtered signals on the second side to obtain a synthetic signal; perform, using a fourth-order infinite impulse response (IIR) filter, timbre equalization on the synthetic signal; and use the timbre-equalized synthetic signal as a virtual stereo signal.

8. The virtual stereo synthesis apparatus according to claim 7, wherein the instructions further cause the processor to be configured to: separately use a ratio of a left-ear frequency domain parameter to a right-ear frequency domain parameter of each of the sound input signals on the second side as a frequency-domain filtering function of each of the sound input signals on the second side, wherein the left-ear frequency domain parameter is related to the preset HRTF left-ear component, and wherein the right-ear frequency domain parameter is related to the preset HRTF right-ear component; separately transform the frequency-domain filtering function of each of the sound input signals on the second side to a time-domain function; and use the time-domain function as the filtering function of each of the sound input signals on the second side.

9. The virtual stereo synthesis apparatus according to claim 8, wherein the instructions further cause the processor to be configured to separately perform minimum phase filtering on the frequency-domain filtering function of each of the sound input signals on the second side.

10. The virtual stereo synthesis apparatus according to claim 8, wherein the instructions further cause the processor to be configured to: separately use a frequency domain of the preset HRTF left-ear component of each of the sound input signals on the second side as the left-ear frequency domain parameter of each of the sound input signals on the second side, and separately use a frequency domain of the preset HRTF right-ear component of each of the sound input signals on the second side as the right-ear frequency domain parameter of each of the sound input signals on the second side; separately use a frequency domain of the preset HRTF left-ear component of each of the sound input signals on the second side as the left-ear frequency domain parameter of each of the sound input signals on the second side after diffuse-field equalization or subband smoothing, and separately use the frequency domain of the preset HRTF right-ear component of each of the sound input signals on the second side as the right-ear frequency domain parameter of each of the sound input signals on the second side after the diffuse-field equalization or the subband smoothing; or separately use the frequency domain, after diffuse-field equalization and subband smoothing is performed in sequence, of the preset HRTF left-ear component of each of the sound input signals on the second side as the left-ear frequency domain parameter of each of the sound input signals on the second side, and separately use the frequency domain of the preset HRTF right-ear component of each of the sound input signals on the second side as the right-ear frequency domain parameter of each of the sound input signals on the second side after diffuse-field equalization and subband smoothing is performed in sequence.

11. The virtual stereo synthesis apparatus according to claim 7, wherein the instructions further cause the processor to be configured to: separately perform reverberation processing on each of the sound input signals on the second side; use the reverberation processed signals as sound reverberation signals on the second side; and separately perform convolution filtering on each of the sound reverberation signals on the second side and the filtering function of the corresponding sound input signals on the second side, to obtain the filtered signals on the second side.

12. The virtual stereo synthesis apparatus according to claim 11, wherein the instructions further cause the processor to be configured to: separately pass each of the sound input signals on the second side through an all-pass filter, to obtain a reverberation signal of each of the sound input signals on the second side; and separately synthesize each of the sound input signals on the second side and the reverberation signal of each of the sound input signals on the second side into the sound reverberation signals on the second side.

13. A non-transitory computer readable storage medium including at least one computer program code stored therein to perform virtual stereo synthesis associated with a computing device wherein when executed on a processor, the computer readable medium causes the processor to: acquire at least one sound input signal on a first side and at least one sound input signal on a second side; separately perform ratio processing on a preset head related transfer function (HRTF) left-ear component and a preset HRTF right-ear component of each sound input signal on the second side, to obtain a filtering function of each of the sound input signals on the second side; separately perform convolution filtering on each of the sound input signals on the second side and the filtering function of each of the sound input signals on the second side, to obtain filtered signals on the second side; and synthesize all of the sound input signals on the first side and all of the filtered signals on the second side; summate all of the sound input signals on the first side and all of the filtered signals on the second side to obtain a synthetic signal; perform, using a fourth-order infinite impulse response (IIR) filter, timbre equalization on the synthetic signal; and use the timbre-equalized synthetic signal as a virtual stereo signal.

14. The non-transitory computer readable storage medium according to claim 13, wherein the computer readable medium further causes the processor to be configured to: separately use a ratio of a left-ear frequency domain parameter to a right-ear frequency domain parameter of each of the sound input signals on the second side as a frequency-domain filtering function of each of the sound input signals on the second side, wherein the left-ear frequency domain parameter is related to the preset HRTF left-ear component, and wherein the right-ear frequency domain parameter is related to the preset HRTF right-ear component; separately transform the frequency-domain filtering function of each of the sound input signals on the second side to a time-domain function; and use the time-domain function as the filtering function of each of the sound input signals on the second side.

15. The non-transitory computer readable storage medium according to claim 14, wherein the computer readable medium further causes the processor to be configured to: separately perform minimum phase filtering on the frequency-domain filtering function of each of the sound input signals on the second side; transform the frequency-domain filtering function to the time-domain function; and use the time-domain function as the filtering function of each of the sound input signals on the second side.

16. The non-transitory computer readable storage medium according to claim 14, wherein the computer readable medium further causes the processor to be configured to: separately use a frequency domain of the preset HRTF left-ear component of each of the sound input signals on the second side as the left-ear frequency domain parameter of each of the sound input signals on the second side, and separately use a frequency domain of the preset HRTF right-ear component of each of the sound input signals on the second side as the right-ear frequency domain parameter of each of the sound input signals on the second side; separately use a frequency domain of the preset HRTF left-ear component of each of the sound input signals on the second side as the left-ear frequency domain parameter of each of the sound input signals on the second side after diffuse-field equalization or subband smoothing, and separately use the frequency domain of the preset HRTF right-ear component of each of the sound input signals on the second side as the right-ear frequency domain parameter of each of the sound input signals on the second side after diffuse-field equalization or subband smoothing; or separately use the frequency domain of the preset HRTF left-ear component of each of the sound input signals on the second side as the left-ear frequency domain parameter of each of the sound input signals on the second side after diffuse-field equalization and subband smoothing is performed in sequence, and separately use the frequency domain of the preset HRTF right-ear component of each of the sound input signals on the second side as the right-ear frequency domain parameter of each of the sound input signals on the second side after diffuse-field equalization and subband smoothing is performed in sequence.

17. The non-transitory computer readable storage medium according to claim 13, wherein the computer readable medium further causes the processor to be configured to: separately perform reverberation processing on each of the sound input signals on the second side; use the reverberation processed signals as sound reverberation signals on the second side; and separately perform convolution filtering on each of the sound reverberation signals on the second side and the filtering function of the corresponding sound input signals on the second side, to obtain the filtered signals on the second side.

18. The non-transitory computer readable storage medium according to claim 17, wherein the computer readable medium further causes the processor to be configured to: separately pass each of the sound input signals on the second side through an all-pass filter, to obtain a reverberation signal of each of the sound input signals on the second side; and separately synthesize each of the sound input signals on the second side and the reverberation signal of each of the sound input signals on the second side into the sound reverberation signals on the second side.

19. A virtual stereo synthesis method, comprising: acquiring at least one sound input signal on a first side and at least one sound input signal on a second side; separately performing ratio processing on a preset head related transfer function (HRTF) left-ear component and a preset HRTF right-ear component of each sound input signal on the second side, to obtain a filtering function of each of the sound input signals on the second side; separately performing convolution filtering on each of the sound input signals on the second side and the filtering function of each of the sound input signals on the second side, to obtain filtered signals on the second side; summating all of the sound input signals on the first side and all of the filtered signals on the second side to obtain a synthetic signal; performing, using a fourth-order infinite impulse response (IIR) filter, timbre equalization on the synthetic signal; and using the timbre-equalized synthetic signal as a virtual stereo signal.

20. A virtual stereo synthesis apparatus, comprising: a memory comprising instructions; and a processor coupled to the memory, wherein the instructions cause the processor to be configured to: acquire at least one sound input signal on a first side and at least one sound input signal on a second side; separately perform ratio processing on a preset head related transfer function (HRTF) left-ear component and a preset HRTF right-ear component of each sound input signal on the second side, to obtain a filtering function of each of the sound input signals on the second side; separately perform convolution filtering on each of the sound input signals on the second side and on the filtering function of each of the sound input signals on the second side, to obtain filtered signals on the second side; summate all of the sound input signals on the first side and all of the filtered signals on the second side to obtain a synthetic signal; perform, using a fourth-order infinite impulse response (IIR) filter, timbre equalization on the synthetic signal; and use the timbre-equalized synthetic signal as a virtual stereo signal.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) FIG. 1 is a schematic diagram of synthesizing a virtual sound;

(2) FIG. 2 is a flowchart of an implementation manner of a virtual stereo synthesis method according to this application;

(3) FIG. 3 is a flowchart of another implementation manner of a virtual stereo synthesis method according to this application;

(4) FIG. 4 is a flowchart of a method for obtaining a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of a sound input signal on the other side in step S302 shown in FIG. 3;

(5) FIG. 5 is a schematic structural diagram of an all-pass filter used in step S303 shown in FIG. 3;

(6) FIG. 6 is a schematic structural diagram of an implementation manner of a virtual stereo synthesis apparatus according to this application;

(7) FIG. 7 is a schematic structural diagram of another implementation manner of a virtual stereo synthesis apparatus according to this application; and

(8) FIG. 8 is a schematic structural diagram of still another implementation manner of a virtual stereo synthesis apparatus according to this application.

DESCRIPTION OF EMBODIMENTS

(9) Descriptions are provided in the following with reference to the accompanying drawings and specific implementation manners.

(10) Referring to FIG. 2, FIG. 2 is a flowchart of an implementation manner of a virtual stereo synthesis method according to this application. In this implementation manner, the method includes the following steps.

(11) Step S201: A virtual stereo synthesis apparatus acquires at least one sound input signal s.sub.1.sub.m(n) on one side and at least one sound input signal s.sub.2.sub.k(n) on the other side.

(12) In the present disclosure, an original sound signal is processed to obtain an output sound signal that has a stereo sound effect. In this implementation manner, there are a total of M simulated sound sources located on one side, which accordingly generate M sound input signals on the one side, and there are a total of K simulated sound sources located on the other side, which accordingly generate K sound input signals on the other side. The virtual stereo synthesis apparatus acquires the M sound input signals s.sub.1.sub.m(n) on the one side and the K sound input signals s.sub.2.sub.k(n) on the other side, where the M sound input signals s.sub.1.sub.m(n) on the one side and the K sound input signals s.sub.2.sub.k(n) on the other side are used as original sound signals, where s.sub.1.sub.m(n) represents the m.sup.th sound input signal on the one side, s.sub.2.sub.k(n) represents the k.sup.th sound input signal on the other side, 1≦m≦M, and 1≦k≦K.

(13) Generally, in the present disclosure, the sound input signals on the one side and the other side simulate sound signals that are sent from left side and right side positions of an artificial head center in order to be distinguished from each other. For example, if the sound input signal on the one side is a left-side sound input signal, the sound input signal on the other side is a right-side sound input signal, or if the sound input signal on the one side is a right-side sound input signal, the sound input signal on the other side is a left-side sound input signal, where the left-side sound input signal is a simulation of a sound signal that is sent from the left side position of the artificial head center, and the right-side sound input signal is a simulation of a sound signal that is sent from the right side position of the artificial head center. For example, in a dual-channel mobile terminal, a left channel signal is a left-side sound input signal, and a right channel signal is a right-side sound input signal. When a sound is played by a headset, the virtual stereo synthesis apparatus separately acquires the left and right channel signals that are used as original sound signals, and separately uses the left and the right channel signals as the sound input signals on the one side and the other side. Alternatively, for some mobile terminals whose replay signal sources include four channel signals, horizontal angles between simulated sound sources of the four channel signals and the front of the artificial head center are separately ±30° and ±110°, and elevation angles of the simulated sound sources are 0°. It is generally defined that, channel signals whose horizontal angles are positive angles (+30° and +110°) are right-side sound input signals, and channel signals whose horizontal angles are negative angles (−30° and −110°) are left-side sound input signals. 
When a sound is played by a headset, the virtual stereo synthesis apparatus acquires the left-side and right-side sound input signals that are separately used as the sound input signals on the one side and the other side.
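The sign convention in the four-channel example above can be sketched as follows. This is only an illustrative sketch: the function name `split_by_side`, the dictionary layout, and the sample values are hypothetical, and the handling of a source at exactly 0° is an assumption because the text does not assign such a source to either side.

```python
def split_by_side(channels):
    """Split {horizontal_angle_in_degrees: signal} into (left, right) lists.

    Convention taken from the text: positive horizontal angles are
    right-side signals, negative horizontal angles are left-side signals.
    """
    left, right = [], []
    for angle in sorted(channels):
        if angle > 0:
            right.append(channels[angle])
        elif angle < 0:
            left.append(channels[angle])
        else:
            # Assumption: the text does not define a side for 0 degrees.
            raise ValueError("a source at 0 degrees has no defined side")
    return left, right


# Hypothetical two-sample signals for the four channels at +/-30 and +/-110 degrees.
four_channels = {
    30: [0.1, 0.2],
    110: [0.3, 0.4],
    -30: [0.5, 0.6],
    -110: [0.7, 0.8],
}
left_signals, right_signals = split_by_side(four_channels)
```

Either list can then play the role of the sound input signals on the one side, with the other list serving as the sound input signals on the other side.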

(14) Step S202: The virtual stereo synthesis apparatus separately performs ratio processing on a preset HRTF left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a preset HRTF right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of each sound input signal s.sub.2.sub.k(n) on the other side, to obtain a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side.

(15) A preset HRTF is briefly described herein. HRTF data h.sub.θ,φ(n) is filter model data, measured in a laboratory, of transmission paths that are from a sound source at a position to two ears of an artificial head, and expresses a comprehensive filtering function of a human physiological structure on a sound wave from the position of the sound source, where a horizontal angle between the sound source and the artificial head center is θ, and an elevation angle between the sound source and the artificial head center is φ. Various HRTF experimental measurement databases are already available in the prior art. In the present disclosure, HRTF data of a preset sound source may be directly acquired, without performing measurement, from the HRTF experimental measurement databases in the prior art, and a simulated sound source position is a sound source position during measurement of corresponding preset HRTF data. In this implementation manner, each sound input signal correspondingly comes from a different preset simulated sound source, and therefore a different piece of HRTF data is correspondingly preset for each sound input signal. The preset HRTF data of each sound input signal can express a filtering effect on the sound input signal that is transmitted from a preset position to the two ears. Furthermore, preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the k.sup.th sound input signal on the other side includes two pieces of data, which are respectively a left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) that expresses a filtering effect on the sound input signal that is transmitted to the left ear of the artificial head and a right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) that expresses a filtering effect on the sound input signal that is transmitted to the right ear of the artificial head.

(16) The virtual stereo synthesis apparatus performs ratio processing on the left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and the right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) in preset HRTF data of each sound input signal s.sub.2.sub.k(n) on the other side, to obtain the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side. For example, the virtual stereo synthesis apparatus directly transforms the preset HRTF left-ear component and the preset HRTF right-ear component of the sound input signal on the other side to frequency domain, performs a ratio operation to obtain a value, and uses the obtained value as the filtering function of the sound input signal on the other side. Alternatively, the virtual stereo synthesis apparatus first transforms the preset HRTF left-ear component and the preset HRTF right-ear component of the sound input signal on the other side to frequency domain, performs subband smoothing, then performs a ratio operation to obtain a value, and uses the obtained value as the filtering function.
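The first variant of the ratio processing (transform both components to the frequency domain, then take the ratio) might be sketched as below. The sketch makes several assumptions that are not stated in this paragraph: the ratio direction (left over right) and the transform back to the time domain follow claim 2, the names `dft`, `idft`, `ratio_filter` and the guard value `eps` are hypothetical, and a naive discrete Fourier transform is used only to keep the example self-contained. Subband smoothing and minimum phase filtering are not shown.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def idft(spectrum):
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
        for t in range(n)
    ]


def ratio_filter(h_left, h_right, eps=1e-12):
    """Return a time-domain filter whose spectrum is H_left / H_right.

    Assumption: the ratio direction (left over right) follows claim 2.
    Assumption: eps replaces near-zero denominators; the text does not
    describe how such bins are handled.
    """
    if len(h_left) != len(h_right):
        raise ValueError("both HRTF components must have the same length")
    spectrum_left = dft(h_left)
    spectrum_right = dft(h_right)
    ratio = [
        l / (r if abs(r) > eps else eps)
        for l, r in zip(spectrum_left, spectrum_right)
    ]
    # Real input components give a conjugate-symmetric ratio, so the
    # imaginary parts after the inverse transform are rounding noise.
    return [value.real for value in idft(ratio)]


# Hypothetical 4-tap components: the right-ear component is a unit impulse,
# so the ratio filter reproduces the left-ear component.
h_c = ratio_filter([0.5, 0.25, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
```

Because the ratio is taken bin by bin on a DFT, the resulting filter corresponds to a circular deconvolution; a practical implementation would likely choose the transform length and any smoothing with that in mind.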

(17) Step S203: The virtual stereo synthesis apparatus separately performs convolution filtering on each sound input signal s.sub.2.sub.k(n) on the other side and the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, to obtain the filtered signal s.sub.2.sub.k.sup.h(n) on the other side.

(18) The virtual stereo synthesis apparatus calculates the filtered signal s.sub.2.sub.k.sup.h(n) on the other side corresponding to each sound input signal s.sub.2.sub.k(n) on the other side according to a formula s.sub.2.sub.k.sup.h(n)=conv(h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n), s.sub.2.sub.k(n)), where conv(x, y) represents a convolution of vectors x and y, s.sub.2.sub.k.sup.h(n) represents the k.sup.th filtered signal on the other side, h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) represents a filtering function of the k.sup.th sound input signal on the other side, and s.sub.2.sub.k(n) represents the k.sup.th sound input signal on the other side.
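The operation conv(x, y) in the formula above can be illustrated with a direct-form linear convolution. This is a minimal sketch: the sample values are hypothetical, and the text does not say whether the apparatus uses a direct, block-based, or FFT-based convolution.

```python
def conv(x, y):
    """Full linear convolution of two sequences (length len(x) + len(y) - 1)."""
    if not x or not y:
        return []
    out = [0.0] * (len(x) + len(y) - 1)
    for i, x_i in enumerate(x):
        for j, y_j in enumerate(y):
            out[i + j] += x_i * y_j
    return out


# Hypothetical filtering function and sound input signal on the other side.
h_c = [1.0, 1.0]
s_2k = [1.0, 2.0, 3.0]
s_2k_h = conv(h_c, s_2k)  # filtered signal on the other side
```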

(19) Step S204: The virtual stereo synthesis apparatus synthesizes all of the sound input signals s.sub.1.sub.m(n) on the one side and all of the filtered signals s.sub.2.sub.k.sup.h(n) on the other side into a virtual stereo signal s.sup.1(n).

(20) The virtual stereo synthesis apparatus synthesizes, according to

(21) $s^{1}(n) = \sum_{m=1}^{M} s_{1_{m}}(n) + \sum_{k=1}^{K} s_{2_{k}}^{h}(n),$
all of the sound input signals s.sub.1.sub.m(n) on the one side that are obtained in step S201 and all of the filtered signals s.sub.2.sub.k.sup.h(n) on the other side that are obtained in step S203 into the virtual stereo signal s.sup.1 (n).
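The summation in step S204 can be sketched as a sample-by-sample sum over all signals. The function name `synthesize` and the sample values are hypothetical. Zero-padding shorter signals to the length of the longest one is an assumption made here because convolution lengthens the filtered signals; the text does not describe how differing lengths are handled.

```python
from itertools import zip_longest


def synthesize(one_side_signals, filtered_other_side_signals):
    """Sum the M unfiltered signals and the K filtered signals sample by sample.

    Assumption: shorter signals are treated as zero-padded at the end.
    """
    signals = list(one_side_signals) + list(filtered_other_side_signals)
    return [sum(samples) for samples in zip_longest(*signals, fillvalue=0.0)]


# Hypothetical input: M = 2 signals on the one side, K = 1 filtered signal.
s_1 = synthesize([[1.0, 2.0], [3.0, 4.0]], [[10.0, 20.0, 30.0]])
```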

(22) In this implementation manner, ratio processing is performed on left-ear and right-ear components of preset HRTF data of each sound input signal on the other side, to obtain a filtering function that retains orientation information of the preset HRTF data. As a result, during synthesis of a virtual stereo, convolution filtering processing needs to be performed on only the sound input signal on the other side using the filtering function, and the filtered signal on the other side and a sound input signal on one side are synthesized to obtain the virtual stereo, without a need to simultaneously perform convolution filtering on the sound input signals that are on the two sides, which greatly reduces calculation complexity. In addition, during synthesis, convolution processing does not need to be performed on the sound input signal on the one side, and therefore an original audio is retained, which further alleviates a coloration effect and improves sound quality of the virtual stereo.

(23) It should be noted that, in this implementation manner, the generated virtual stereo is a virtual stereo that is input to an ear on one side. For example, if the sound input signal on the one side is a left-side sound input signal, and the sound input signal on the other side is a right-side sound input signal, the virtual stereo signal obtained according to the foregoing steps is a left-ear virtual stereo signal that is directly input to the left ear. If the sound input signal on the one side is a right-side sound input signal, and the sound input signal on the other side is a left-side sound input signal, the virtual stereo signal obtained according to the foregoing steps is a right-ear virtual stereo signal that is directly input to the right ear. In the foregoing manner, the virtual stereo synthesis apparatus can separately obtain a left-ear virtual stereo signal and a right-ear virtual stereo signal, and output the signals to the two ears using a headset, to achieve a stereo effect that is like a natural sound.

(24) In addition, in an implementation manner in which positions of virtual sound sources are all fixed, the virtual stereo synthesis apparatus is not required to execute step S202 each time virtual stereo synthesis is performed (for example, each time replay is performed using a headset). HRTF data of each sound input signal indicates filter model data of paths for transmitting the sound input signal from a sound source to two ears of an artificial head, and in a case in which a position of the sound source is fixed, the filter model data of the path for transmitting the sound input signal, generated by the sound source, from the sound source to the two ears of the artificial head is fixed. Therefore, step S202 may be separated out and executed in advance to acquire and save a filtering function of each sound input signal, and when the virtual stereo synthesis is performed, the filtering function, saved in advance, of each sound input signal is directly acquired to perform convolution filtering on a sound input signal on the other side generated by a virtual sound source on the other side. The foregoing case still falls within the protection scope of the virtual stereo synthesis method in the present disclosure.
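Executing step S202 in advance for fixed source positions could look like the sketch below. Everything here is hypothetical: the class name `FilterStore`, keying the saved filters by (θ, φ) in degrees, and the placeholder `toy_filter`, which stands in for the real ratio processing only so that the example runs on its own.

```python
class FilterStore:
    """Holds filtering functions computed ahead of time for fixed positions."""

    def __init__(self, compute_filter):
        # compute_filter(theta, phi) stands in for the ratio processing of S202.
        self._compute_filter = compute_filter
        self._filters = {}

    def precompute(self, positions):
        """Run the ratio processing once per (theta, phi) and save the result."""
        for theta, phi in positions:
            self._filters[(theta, phi)] = self._compute_filter(theta, phi)

    def get(self, theta, phi):
        """Return the saved filtering function without recomputing it."""
        return self._filters[(theta, phi)]


calls = []


def toy_filter(theta, phi):
    # Placeholder only: records the call and returns a fixed two-tap filter.
    calls.append((theta, phi))
    return [1.0, 0.0]


store = FilterStore(toy_filter)
store.precompute([(30, 0), (110, 0)])  # done once, before any replay
first = store.get(30, 0)   # used during synthesis
second = store.get(30, 0)  # later replay reuses the saved filter
```

With this arrangement each synthesis run performs only the lookup and the convolution filtering, which matches the idea of separating step S202 from the per-replay processing.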

(25) Referring to FIG. 3, FIG. 3 is a flowchart of another implementation manner of a virtual stereo synthesis method according to the present disclosure. In this implementation manner, the method includes the following steps.

(26) Step S301: A virtual stereo synthesis apparatus acquires at least one sound input signal s.sub.1.sub.m(n) on one side and at least one sound input signal s.sub.2.sub.k(n) on the other side.

(27) The virtual stereo synthesis apparatus acquires the at least one sound input signal s.sub.1.sub.m(n) on the one side and the at least one sound input signal s.sub.2.sub.k(n) on the other side, where s.sub.1.sub.m(n) represents the m.sup.th sound input signal on the one side, s.sub.2.sub.k(n) represents the k.sup.th sound input signal on the other side. In this implementation manner, there are a total of M sound input signals on the one side, and there are a total of K sound input signals on the other side, 1≦m≦M, and 1≦k≦K.

(28) Step S302: Separately perform ratio processing on a preset HRTF left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a preset HRTF right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of each sound input signal s.sub.2.sub.k(n) on the other side, to obtain a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side.

(29) The virtual stereo synthesis apparatus performs ratio processing on the left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and the right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) in preset HRTF data of each sound input signal s.sub.2.sub.k(n) on the other side, to obtain a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side.

(30) A specific method for obtaining the filtering function of each sound input signal on the other side is described using an example. Referring to FIG. 4, FIG. 4 is a flowchart of a method for obtaining the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side in step S302 shown in FIG. 3. Acquiring, by the virtual stereo synthesis apparatus, the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side includes the following steps.

(31) Step S401: The virtual stereo synthesis apparatus performs diffuse-field equalization on preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side.

(32) Preset HRTF data of the k.sup.th sound input signal on the other side is represented by h.sub.θ.sub.k.sub.,φ.sub.k(n), where a horizontal angle between a simulated sound source of the k.sup.th sound input signal on the other side and an artificial head center is θ.sub.k, an elevation angle between the simulated sound source of the k.sup.th sound input signal on the other side and the artificial head center is φ.sub.k, and h.sub.θ.sub.k.sub.,φ.sub.k(n) includes two pieces of data: a left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n). Generally, preset HRTF data obtained by means of measurement in a laboratory not only includes filter model data of transmission paths from a speaker, used as a sound source, to two ears of an artificial head, but also includes interference data such as a frequency response of the speaker, a frequency response of microphones that are disposed at the two ears to receive a signal of the speaker, and a frequency response of an ear canal of an artificial ear. This interference data affects a sense of orientation and a sense of distance of a synthetic virtual sound. Therefore, in this implementation manner, an optimal manner is used, in which the foregoing interference data is eliminated by means of diffuse-field equalization.

(33) (1) Furthermore, it is calculated that a frequency domain of the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side is H.sub.θ.sub.k.sub.,φ.sub.k(n).

(34) (2) An average energy spectrum DF_avg(n), in all directions, of the preset HRTF data frequency domain H.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side is calculated:

(35) $DF\_avg(n) = \frac{1}{2 \cdot T \cdot P} \sum_{φ_{k} = φ_{1}}^{φ_{P}} \sum_{θ_{k} = θ_{1}}^{θ_{T}} | H_{θ_{k}, φ_{k}}(n) |^{2},$
where |H.sub.θ.sub.k.sub.,φ.sub.k(n)| represents a modulus of H.sub.θ.sub.k.sub.,φ.sub.k(n), P represents a quantity of elevation angles between test sound sources and an artificial head center, and T represents a quantity of horizontal angles between the test sound sources and the artificial head center, where P and T are included in an HRTF experimental measurement database in which H.sub.θ.sub.k.sub.,φ.sub.k(n) is located. In the present disclosure, when HRTF data in different HRTF experimental measurement databases is used, the quantity P of elevation angles and the quantity T of horizontal angles may be different.

(36) (3) The average energy spectrum DF_avg(n) is inverted, to obtain an inversion DF_inv(n) of the average energy spectrum of the preset HRTF data frequency domain H.sub.θ.sub.k.sub.,φ.sub.k(n):

(37) $DF\_inv(n) = \frac{1}{DF\_avg(n)}.$

(38) (4) The inversion DF_inv(n) of the average energy spectrum of the preset HRTF data frequency domain H.sub.θ.sub.k.sub.,φ.sub.k(n) is transformed to time domain, and a real value is taken, to obtain an average inverse filtering sequence df_inv(n) of the preset HRTF data:
df_inv(n)=real(InvFT(DF_inv(n))),
where InvFT( ) represents inverse Fourier transform, and real(x) represents calculation of a real number part of a complex number x.

(39) (5) Convolution is performed on the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side and the average inverse filtering sequence df_inv(n) of the preset HRTF data, to obtain diffuse-field-equalized preset HRTF data h̄.sub.θ.sub.k.sub.,φ.sub.k(n):
h̄.sub.θ.sub.k.sub.,φ.sub.k(n)=conv(h.sub.θ.sub.k.sub.,φ.sub.k(n),df_inv(n)),
where conv(x,y) represents a convolution of vectors x and y, and h̄.sub.θ.sub.k.sub.,φ.sub.k(n) includes a diffuse-field-equalized preset HRTF left-ear component h̄.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a diffuse-field-equalized preset HRTF right-ear component h̄.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n).

(40) The virtual stereo synthesis apparatus performs the foregoing processing (1) to (5) on the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side, to obtain the diffuse-field-equalized HRTF data h̄.sub.θ.sub.k.sub.,φ.sub.k(n).
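Processing (1) to (5) above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the apparatus's implementation: the function names are hypothetical, a naive discrete Fourier transform is used for self-containment, every impulse response is assumed to have the same length, and the factor 2 in the average is assumed to account for the left-ear and right-ear components of each direction.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(N^2)), adequate for short HRTF data."""
    size = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(size):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / size)
                  for t in range(size))
        out.append(acc / size if inverse else acc)
    return out


def convolve(x, y):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def diffuse_field_inverse(hrtf_pairs):
    """Steps (1) to (4): df_inv(n) from (left, right) pairs, one pair per direction."""
    size = len(hrtf_pairs[0][0])
    energy = [0.0] * size
    for left, right in hrtf_pairs:
        for ear in (left, right):
            for n, value in enumerate(dft(ear)):
                energy[n] += abs(value) ** 2
    count = 2 * len(hrtf_pairs)  # 2 * T * P
    # 1 / DF_avg(n); assumes no frequency bin has zero average energy.
    df_inv_spectrum = [count / e for e in energy]
    return [v.real for v in dft(df_inv_spectrum, inverse=True)]


def equalize(hrtf_pair, df_inv):
    """Step (5): convolve both ear components with df_inv(n)."""
    left, right = hrtf_pair
    return convolve(left, df_inv), convolve(right, df_inv)
```

Because the inverse sequence depends only on the measurement database, it can be computed once and applied to the HRTF data of every direction.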

(41) Step S402: Perform subband smoothing on the diffuse-field-equalized preset HRTF data h̄.sub.θ.sub.k.sub.,φ.sub.k(n).

(42) The virtual stereo synthesis apparatus transforms the diffuse-field-equalized preset HRTF data h̄.sub.θ.sub.k.sub.,φ.sub.k(n) to frequency domain, to obtain a frequency domain H̄.sub.θ.sub.k.sub.,φ.sub.k(n) of the diffuse-field-equalized preset HRTF data. A time-domain transformation length of h̄.sub.θ.sub.k.sub.,φ.sub.k(n) is N.sub.1, and a quantity of frequency domain coefficients of H̄.sub.θ.sub.k.sub.,φ.sub.k(n) is N.sub.2, where N.sub.2=N.sub.1/2+1.

(43) The virtual stereo synthesis apparatus performs subband smoothing on the frequency domain H̄.sub.θ.sub.k.sub.,φ.sub.k(n) of the diffuse-field-equalized preset HRTF data, calculates a modulus, and uses the resulting frequency domain data as subband-smoothed preset HRTF data |Ĥ.sub.θ.sub.k.sub.,φ.sub.k(n)|:

(44) $| \hat{H}_{θ_{k}, φ_{k}}(n) | = \frac{1}{\sum_{j = 1}^{j_{\max} - j_{\min} + 1} hann(j)} \sum_{j = j_{\min}}^{j_{\max}} | \overline{H}_{θ_{k}, φ_{k}}(j) \cdot hann(j - j_{\min} + 1) |,$
where
$j_{\min} = \begin{cases} n - bw(n), & n - bw(n) > 1 \\ 1, & n - bw(n) \leq 1 \end{cases}, \quad j_{\max} = \begin{cases} n + bw(n), & n + bw(n) < M \\ M, & n + bw(n) \geq M \end{cases},$
bw(n)=⌊0.2*n⌋, ⌊x⌋ represents a maximum integer that is not greater than x, and hann(j)=0.5*(1−cos(2*π*j/(2*bw(n)+1))), j=0 . . . (2*bw(n)+1).
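The subband smoothing above can be sketched as follows. This is a hedged illustration rather than the disclosed implementation: the function name is hypothetical, M is assumed to be the number of available frequency domain coefficients, the window bounds are assumed to be clamped to the range 1 to M, and bins with bw(n)=0 (for which the single window weight is zero) are assumed to pass through unchanged.

```python
import math


def subband_smooth(modulus):
    """modulus[n - 1] holds |H(n)| for n = 1..M; returns the smoothed moduli."""
    count = len(modulus)  # M
    smoothed = []
    for n in range(1, count + 1):
        bw = int(math.floor(0.2 * n))
        j_min = max(n - bw, 1)
        j_max = min(n + bw, count)
        period = 2 * bw + 1
        # hann(1) .. hann(j_max - j_min + 1) for the current bandwidth.
        weights = [0.5 * (1.0 - math.cos(2.0 * math.pi * j / period))
                   for j in range(1, j_max - j_min + 2)]
        total = sum(weights)
        if total < 1e-12:
            # bw(n) = 0 gives one zero weight; keep the bin unchanged (assumption).
            smoothed.append(modulus[n - 1])
            continue
        acc = sum(modulus[j - 1] * weights[j - j_min]
                  for j in range(j_min, j_max + 1))
        smoothed.append(acc / total)
    return smoothed
```

Since bw(n) grows with n, the smoothing window widens toward high frequencies, which is where fine spectral detail is flattened the most.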

(45) Step S403: Use a preset HRTF left-ear frequency domain component Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) after the subband smoothing as a left-ear frequency domain parameter of the sound input signal on the other side, and use a preset HRTF right-ear frequency domain component Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) after the subband smoothing as a right-ear frequency domain parameter of the sound input signal on the other side. The left-ear frequency domain parameter represents a preset HRTF left-ear component of the sound input signal on the other side, and the right-ear frequency domain parameter represents a preset HRTF right-ear component of the sound input signal on the other side. Certainly, in another implementation manner, the preset HRTF left-ear component of the sound input signal on the other side may be directly used as the left-ear frequency domain parameter, or the preset HRTF left-ear component that has been subject to diffuse-field equalization may be used as the left-ear frequency domain parameter. It is similar for the right-ear frequency domain parameter.

(46) Step S404: Separately use a ratio of the left-ear frequency domain parameter of the sound input signal on the other side to the right-ear frequency domain parameter of the sound input signal on the other side as a frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side.

(47) The ratio of the left-ear frequency domain parameter of the sound input signal on the other side to the right-ear frequency domain parameter of the sound input signal on the other side further includes a modulus ratio and an argument difference between the left-ear frequency domain parameter and the right-ear frequency domain parameter, where the modulus ratio and the argument difference are correspondingly used as a modulus and an argument in the frequency-domain filtering function of the sound input signal on the other side, and the obtained filtering function can retain orientation information of the preset HRTF left-ear component and the preset HRTF right-ear component of the sound input signal on the other side.

(48) In this implementation manner, the virtual stereo synthesis apparatus performs a ratio operation on the left-ear frequency domain parameter and the right-ear frequency domain parameter of the sound input signal on the other side. Further, the modulus of the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side is obtained according to

(49) $| H_{θ_{k}, φ_{k}}^{c}(n) | = \frac{| \hat{H}_{θ_{k}, φ_{k}}^{l}(n) |}{| \hat{H}_{θ_{k}, φ_{k}}^{r}(n) |},$
the argument of the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) is obtained according to arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n))=arg(H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n))−arg(H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n)), and therefore the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side is obtained. |Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n)| and |Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n)| respectively represent a left-ear component and a right-ear component of the subband-smoothed preset HRTF data |Ĥ.sub.θ.sub.k.sub.,φ.sub.k(n)|, and H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) respectively represent a left-ear component and a right-ear component of the frequency domain H̄.sub.θ.sub.k.sub.,φ.sub.k(n) of the diffuse-field-equalized preset HRTF data. In subband smoothing, only a modulus value of a complex number is processed, that is, a value obtained after subband smoothing is the modulus value of the complex number, and does not include argument information. Therefore, when the argument of the frequency-domain filtering function is calculated, a frequency domain parameter that can represent the preset HRTF data and that includes argument information needs to be used, for example, the left-ear and right-ear components of the diffuse-field-equalized HRTF data.
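The ratio operation of step S404 can be sketched as below. The function name and argument layout are assumptions made for illustration: the moduli are taken from the subband-smoothed data, while the arguments are taken from complex spectra that still carry phase.

```python
import cmath


def ratio_filter(smoothed_left, smoothed_right, spectrum_left, spectrum_right):
    """Builds H^c(n) from a modulus ratio and an argument difference.

    smoothed_left, smoothed_right: real moduli after subband smoothing.
    spectrum_left, spectrum_right: complex spectra that retain phase.
    """
    result = []
    for mod_l, mod_r, h_l, h_r in zip(smoothed_left, smoothed_right,
                                      spectrum_left, spectrum_right):
        magnitude = mod_l / mod_r  # assumes a nonzero right-ear modulus
        argument = cmath.phase(h_l) - cmath.phase(h_r)
        result.append(cmath.rect(magnitude, argument))
    return result
```

The modulus ratio carries the interaural level difference and the argument difference carries the interaural time difference, which is how the single filter retains orientation information.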

(50) It should be noted that, in the foregoing description, when diffuse-field equalization and subband smoothing are performed, the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) is processed. However, the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) includes two pieces of data: the left-ear component and the right-ear component, and therefore in fact, it is equivalent to that the diffuse-field equalization and the subband smoothing are performed separately on the left-ear component and the right-ear component of a preset HRTF data.

(51) Step S405: Separately perform minimum phase filtering on the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, then transform the frequency-domain filtering function to a time-domain function, and use the time-domain function as a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side.

(52) The obtained frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) may be expressed as a position-independent delay plus a minimum phase filter. Minimum phase filtering is performed on the obtained frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) in order to reduce a data length and reduce calculation complexity during virtual stereo synthesis, without affecting the subjective listening effect.

(53) (1) The virtual stereo synthesis apparatus extends the modulus of the obtained frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) to a time-domain transformation length N.sub.1 thereof, and calculates a logarithmic value:

(54) $| \overline{H}_{θ_{k}, φ_{k}}^{c}(n) | = \begin{cases} -\ln(| H_{θ_{k}, φ_{k}}^{c}(n) |), & n \leq N_{2} \\ -\ln(| H_{θ_{k}, φ_{k}}^{c}(N_{1} - n + 1) |), & N_{2} < n \leq N_{1} \end{cases},$
where ln(x) is a natural logarithm of x, N.sub.1 is a time-domain transformation length of a time domain h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the frequency-domain filtering function, and N.sub.2 is a quantity of frequency domain coefficients of the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n).

(55) (2) Hilbert transform is performed on the modulus |H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n)|, in (1), of the obtained frequency-domain filtering function:
H.sub.θ.sub.k.sub.,φ.sub.k.sup.H(n)=Hilbert(|H.sub.θ.sub.k.sub.,φ.sub.k.sup.c|),
where Hilbert( ) represents Hilbert transform.

(56) (3) A minimum phase filter H.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) is obtained:

(57) $H_{θ_{k}, φ_{k}}^{mp}(n) = | H_{θ_{k}, φ_{k}}^{c}(n) | \, e^{i \cdot H_{θ_{k}, φ_{k}}^{H}(n)},$
where n=1 . . . N.sub.2.

(58) (4) A delay τ(θ.sub.k,φ.sub.k) is calculated:

(59) $τ(θ_{k}, φ_{k}) = -\left[ \frac{fs}{k_{\max}^{itd} - k_{\min}^{itd} + 1} \sum_{k = k_{\min}^{itd}}^{k_{\max}^{itd}} \frac{\arg(H_{θ_{k}, φ_{k}}^{c}(k)) - H_{θ_{k}, φ_{k}}^{H}(k)}{π \cdot fs \cdot \frac{k}{N_{2} - 1}} \right].$

(60) (5) The minimum phase filter H.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) is transformed to time domain, to obtain h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n):
h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n)=real(InvFT(H.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n))),
where InvFT( ) represents inverse Fourier transform, and real(x) represents a real number part of a complex number x.

(61) (6) The time domain h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) of the minimum phase filter is truncated according to a length N.sub.0, and the delay τ(θ.sub.k, φ.sub.k) is added:

(62) $h_{θ_{k}, φ_{k}}^{c}(n) = \begin{cases} 0, & 1 \leq n \leq τ(θ_{k}, φ_{k}) \\ h_{θ_{k}, φ_{k}}^{mp}(n - τ(θ_{k}, φ_{k})), & τ(θ_{k}, φ_{k}) < n \leq τ(θ_{k}, φ_{k}) + N_{0} \end{cases}.$

(63) Relatively large coefficients of the minimum phase filter H.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) obtained in (3) are concentrated in the front, and after relatively small coefficients in the rear are removed by means of truncation, a filtering effect does not change greatly. Therefore, generally, to reduce calculation complexity, the time domain h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) of the minimum phase filter is truncated according to the length N.sub.0, where a value of the length N.sub.0 may be selected according to the following steps. The coefficients of the time domain h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) of the minimum phase filter are sequentially compared, from the rear to the front, with a preset threshold e. A coefficient less than e is removed, and the comparison continues with the coefficient prior to the removed coefficient, and stops when a coefficient greater than e is reached. A total length of the remaining coefficients is N.sub.0, and the preset threshold e may be 0.01.
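The rule for selecting N.sub.0 can be sketched as a backward scan. In this sketch the function name is hypothetical, and the comparison is assumed to be made on the absolute value of each coefficient, which the text does not state explicitly.

```python
def truncation_length(h_mp, threshold=0.01):
    """Returns N0: the number of coefficients kept after trimming the tail."""
    n0 = len(h_mp)
    # Scan from the rear; drop coefficients whose magnitude is below the threshold.
    while n0 > 0 and abs(h_mp[n0 - 1]) < threshold:
        n0 -= 1
    return n0


def truncate(h_mp, threshold=0.01):
    """Keeps only the first N0 coefficients of the minimum phase filter."""
    return h_mp[:truncation_length(h_mp, threshold)]
```

Small coefficients in the middle of the response are kept, because the scan stops at the first coefficient from the rear that reaches the threshold.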

(64) A tailored filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) is finally obtained according to steps S401 to S405 above, to be used as the filtering function of the sound input signal on the other side.
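The minimum phase construction in step S405 can be illustrated with the sketch below. It is not the procedure of the disclosure verbatim: instead of an explicit Hilbert transform it uses the folded real cepstrum, a common discrete way to obtain the minimum phase that corresponds to a given modulus, and its sign conventions may differ from those of formula (54). The function names are hypothetical, the modulus is assumed to be given over the full, even transform length with no zero values, and the delay τ and the truncation to N.sub.0 are omitted.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(N^2)), adequate for short filters."""
    size = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(size):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / size)
                  for t in range(size))
        out.append(acc / size if inverse else acc)
    return out


def minimum_phase_spectrum(modulus):
    """Returns a spectrum with the given modulus and minimum phase."""
    size = len(modulus)  # full, symmetric, even length
    log_modulus = [math.log(m) for m in modulus]
    cepstrum = [c.real for c in dft(log_modulus, inverse=True)]
    half = size // 2
    # Fold the cepstrum onto its causal part.
    folded = [0.0] * size
    folded[0] = cepstrum[0]
    for n in range(1, half):
        folded[n] = 2.0 * cepstrum[n]
    folded[half] = cepstrum[half]
    return [cmath.exp(v) for v in dft(folded)]


def minimum_phase_impulse_response(modulus):
    spectrum = minimum_phase_spectrum(modulus)
    return [v.real for v in dft(spectrum, inverse=True)]
```

As a usage example, the modulus of the sequence 0.5, 1 (padded with zeros) yields a response close to 1, 0.5, that is, the same modulus with the energy moved to the front, which is what makes the subsequent truncation effective.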

(65) It should be noted that the foregoing example of obtaining the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side is used as an optimal manner, in which diffuse-field equalization, subband smoothing, ratio calculation, and minimum phase filtering are performed in sequence on the left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and the right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of the preset HRTF data of the sound input signal on the other side, to obtain the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side. However, in another implementation manner, the left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and the right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of the preset HRTF data of the sound input signal on the other side may also be separately used as the left-ear frequency domain parameter and the right-ear frequency domain parameter directly, and then ratio calculation is performed according to the formulas

(66) $| H_{θ_{k}, φ_{k}}^{c}(n) | = \frac{| H_{θ_{k}, φ_{k}}^{l}(n) |}{| H_{θ_{k}, φ_{k}}^{r}(n) |}$
and arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n))=arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n))−arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n)), to obtain the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, and the frequency-domain filtering function is transformed to time domain to obtain the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side. Alternatively, the left-ear component h̄.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and the right-ear component h̄.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of the diffuse-field-equalized preset HRTF data are transformed to frequency domain, and then are separately used as the left-ear frequency domain parameter H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and the right-ear frequency domain parameter H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n), and ratio calculation is performed according to the formulas

(67) $| H_{θ_{k}, φ_{k}}^{c}(n) | = \frac{| \overline{H}_{θ_{k}, φ_{k}}^{l}(n) |}{| \overline{H}_{θ_{k}, φ_{k}}^{r}(n) |}$
and arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n))=arg(H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n))−arg(H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n)), to obtain the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n), and the frequency-domain filtering function is transformed to time domain to obtain the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side. Alternatively, subband smoothing is directly performed on the preset HRTF data of the sound input signal on the other side according to

(68) $| \hat{H}_{θ_{k}, φ_{k}}(n) | = \frac{1}{\sum_{j = 1}^{j_{\max} - j_{\min} + 1} hann(j)} \sum_{j = j_{\min}}^{j_{\max}} | H_{θ_{k}, φ_{k}}(j) \cdot hann(j - j_{\min} + 1) |,$
the left-ear component and the right-ear component of the subband-smoothed preset HRTF data are separately used as the left-ear frequency domain parameter and the right-ear frequency domain parameter, ratio calculation is performed according to the formulas

(69) $| H_{θ_{k}, φ_{k}}^{c}(n) | = \frac{| \hat{H}_{θ_{k}, φ_{k}}^{l}(n) |}{| \hat{H}_{θ_{k}, φ_{k}}^{r}(n) |}$
and arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n))=arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n))−arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n)), and minimum phase filtering is performed, to obtain the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n). The step of subband smoothing in step S402 is generally set together with the step of minimum phase filtering in step S405, that is, if the step of minimum phase filtering is not performed, the step of subband smoothing is not performed. The step of subband smoothing is added before the step of minimum phase filtering, which further reduces the data length of the obtained filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, and therefore further reduces calculation complexity during virtual stereo synthesis.

(70) Step S303: Separately perform reverberation processing on each sound input signal s.sub.2.sub.k(n) on the other side and then use the processed signal as a sound reverberation signal ŝ.sub.2.sub.k(n) on the other side.

(71) After acquiring the at least one sound input signal s.sub.2.sub.k(n) on the other side, the virtual stereo synthesis apparatus separately performs reverberation processing on each sound input signal s.sub.2.sub.k(n) on the other side, to enhance filtering effects such as environment reflection and scattering during actual sound broadcasting, and enhance a sense of space of the input signal. In this implementation manner, reverberation processing is implemented using an all-pass filter. Specifics are as follows:

(72) (1) As shown in FIG. 5, filtering is performed on each sound input signal s.sub.2.sub.k(n) on the other side using three cascaded Schroeder all-pass filters, to obtain a reverberation signal s̄.sub.2.sub.k(n) of each sound input signal s.sub.2.sub.k(n) on the other side:
s̄.sub.2.sub.k(n)=conv(h.sub.k(n),s.sub.2.sub.k(n−d.sub.k)),
where conv(x,y) represents a convolution of vectors x and y, d.sub.k is a preset delay of the k.sup.th sound input signal on the other side, h.sub.k(n) is an all-pass filter of the k.sup.th sound input signal on the other side, and a transfer function thereof is

(73) $H_{k} (z) = \frac{- g_{k}^{1} + z^{- M_{k}^{1}}}{1 - g_{k}^{1} * z^{M_{k}^{1}}} * \frac{- g_{k}^{2} + z^{- M_{k}^{2}}}{1 - g_{k}^{2} * z^{M_{k}^{2}}} * \frac{- g_{k}^{3} + z^{- M_{k}^{3}}}{1 - g_{k}^{3} * z^{M_{k}^{3}}},$
where g.sub.k.sup.1, g.sub.k.sup.2, and g.sub.k.sup.3 are preset all-pass filter gains corresponding to the k.sup.th sound input signal on the other side, and M.sub.k.sup.1, M.sub.k.sup.2, and M.sub.k.sup.3 are preset all-pass filter delays corresponding to the k.sup.th sound input signal on the other side.

(74) (2) Separately add each sound input signal s.sub.2.sub.k(n) on the other side to the reverberation signal s̄.sub.2.sub.k(n) of the sound input signal on the other side, to obtain the sound reverberation signal ŝ.sub.2.sub.k(n) on the other side corresponding to each sound input signal on the other side:
ŝ.sub.2.sub.k(n)=s.sub.2.sub.k(n)+w.sub.k·s̄.sub.2.sub.k(n),
where w.sub.k is a preset weight of the reverberation signal s̄.sub.2.sub.k(n) of the k.sup.th sound input signal on the other side, and generally, a larger weight indicates a stronger sense of space of a signal but causes a greater negative effect (for example, an unclear voice or indistinct percussion music). In this implementation manner, the weight is determined in the following manner: a suitable value is selected in advance as the weight w.sub.k of the reverberation signal s̄.sub.2.sub.k(n) according to an experiment result, where the value enhances the sense of space of the sound input signal on the other side and does not cause a negative effect.
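The reverberation processing of step S303 can be sketched as below. This is a hedged illustration: the function names are hypothetical, and each stage is written in the standard causal Schroeder form y(n) = −g·x(n) + x(n−M) + g·y(n−M), which is an assumption about how the transfer function above is realized in the time domain.

```python
def schroeder_allpass(x, gain, delay):
    """One causal Schroeder all-pass stage with gain g and delay M samples."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y[n] = -gain * x[n] + x_delayed + gain * y_delayed
    return y


def add_reverberation(signal, pre_delay, gains, delays, weight):
    """Returns s + w * reverb, where reverb is the delayed, all-pass filtered input."""
    kept = max(len(signal) - pre_delay, 0)
    # Delay the input by d samples while keeping the original length.
    reverb = [0.0] * min(pre_delay, len(signal)) + list(signal[:kept])
    for gain, delay in zip(gains, delays):
        reverb = schroeder_allpass(reverb, gain, delay)
    return [s + weight * r for s, r in zip(signal, reverb)]
```

Passing three gains and three delays gives the three cascaded stages described above; a weight of 0 returns the input signal unchanged.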

(75) Step S304: Separately perform convolution filtering on each sound reverberation signal ŝ.sub.2.sub.k(n) on the other side and the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the corresponding sound input signal on the other side, to obtain a filtered signal s.sub.2.sub.k.sup.h(n) on the other side.

(76) After separately performing reverberation processing on each of the at least one sound input signal on the other side to obtain the sound reverberation signal ŝ.sub.2.sub.k(n) on the other side, the virtual stereo synthesis apparatus performs convolution filtering on each sound reverberation signal ŝ.sub.2.sub.k(n) on the other side according to a formula s.sub.2.sub.k.sup.h(n)=conv(h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n), ŝ.sub.2.sub.k(n)), to obtain the filtered signal s.sub.2.sub.k.sup.h(n) on the other side, where s.sub.2.sub.k.sup.h(n) represents the k.sup.th filtered signal on the other side, h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) represents a filtering function of the k.sup.th sound input signal on the other side, and ŝ.sub.2.sub.k(n) represents the k.sup.th sound reverberation signal on the other side.

(77) Step S305: Summate all of the sound input signals s.sub.1.sub.m(n) on the one side and all of the filtered signals s.sub.2.sub.k.sup.h(n) on the other side to obtain a synthetic signal s.sup.−1(n).

(78) Furthermore, the virtual stereo synthesis apparatus obtains the synthetic signal s.sup.−1(n) corresponding to the one side according to a formula

(79) $s^{-1}(n) = \sum_{m = 1}^{M} s_{1_{m}}(n) + \sum_{k = 1}^{K} s_{2_{k}}^{h}(n).$
For example, if the sound input signal on the one side is a left-side sound input signal, a left-ear synthetic signal is obtained, or if the sound input signal on the one side is a right-side sound input signal, a right-ear synthetic signal is obtained.
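Steps S304 and S305 can be sketched together as follows. The function names are hypothetical, and because each filtered signal is longer than its input, shorter signals are assumed to be zero-padded before summation.

```python
def convolve(h, x):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            out[i + j] += hi * xj
    return out


def synthesize_one_side(same_side_signals, other_side_signals, filters):
    """Sums the unfiltered same-side signals and the filtered other-side signals.

    same_side_signals: list of M signals, passed through unchanged.
    other_side_signals: list of K signals (after reverberation, if it is used).
    filters: list of K filtering functions, one per other-side signal.
    """
    filtered = [convolve(h, s) for h, s in zip(filters, other_side_signals)]
    length = max(len(s) for s in same_side_signals + filtered)
    out = [0.0] * length
    for signal in same_side_signals + filtered:
        for n, value in enumerate(signal):
            out[n] += value
    return out
```

Only the K other-side signals are convolved; the M same-side signals enter the sum untouched, which is where the reduction in calculation complexity comes from.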

(80) Step S306: Perform, using a fourth-order IIR filter, timbre equalization on the synthetic signal s.sup.−1(n) and then use the timbre-equalized synthetic signal as a virtual stereo signal s.sup.1(n).

(81) The virtual stereo synthesis apparatus performs timbre equalization on the synthetic signal s.sup.−1(n), to reduce a coloration effect, on the synthetic signal, from the convolution-filtered sound input signal on the other side. In this implementation manner, timbre equalization is performed using a fourth-order IIR filter eq(n). Furthermore, the virtual stereo signal s.sup.1(n) that is finally output to the ear on the one side is obtained according to a formula s.sup.1(n)=conv(eq(n),s.sup.−1(n)).

(82) A transfer function of eq(n) is

(83) $H(z) = \frac{b_{1} + b_{2} z^{-1} + b_{3} z^{-2} + b_{4} z^{-3} + b_{5} z^{-4}}{a_{1} + a_{2} z^{-1} + a_{3} z^{-2} + a_{4} z^{-3} + a_{5} z^{-4}},$
where b.sub.1=1.24939117710166, b.sub.2=−4.72162304562892, b.sub.3=6.69867047060726, b.sub.4=−4.22811576299464, and b.sub.5=1.00174331383529, and a.sub.1=1, a.sub.2=−3.76394096632083, a.sub.3=5.31938925722012, a.sub.4=−3.34508050090584, and a.sub.5=0.789702281674921.
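The timbre equalization of step S306 can be sketched as a direct evaluation of the difference equation that corresponds to the transfer function above. The coefficient values are the ones listed in the text; the function names are hypothetical and the filter is assumed to start from zero initial conditions.

```python
B = [1.24939117710166, -4.72162304562892, 6.69867047060726,
     -4.22811576299464, 1.00174331383529]
A = [1.0, -3.76394096632083, 5.31938925722012,
     -3.34508050090584, 0.789702281674921]


def iir_filter(b, a, x):
    """Evaluates a[0]*y(n) = sum_i b[i]*x(n-i) - sum_{i>=1} a[i]*y(n-i)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc / a[0])
    return y


def timbre_equalize(synthetic_signal):
    """Applies the fourth-order IIR filter eq(n) to the synthetic signal."""
    return iir_filter(B, A, synthetic_signal)
```

Running the recursion costs nine multiplications per sample, which is far cheaper than convolving with a long equalization impulse response.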

(84) For better comprehension of practical use of the virtual stereo synthesis method of this application, descriptions are further provided using an example, in which a sound generated by a dual-channel terminal is replayed by a headset, where a left channel signal is a left-side sound input signal s.sub.l(n), and a right channel signal is a right-side sound input signal s.sub.r(n), where preset HRTF data of the left-side sound input signal s.sub.l(n) is h.sub.θ,φ.sup.l(n), and preset HRTF data of the right-side sound input signal s.sub.r(n) is h.sub.θ,φ.sup.r(n).

(85) A virtual stereo synthesis apparatus separately processes the preset HRTF data h.sub.θ,φ.sup.l(n) of the left-side sound input signal and the preset HRTF data h.sub.θ,φ.sup.r(n) of the right-side sound input signal according to steps S401 to S405 above, to obtain a tailored filtering function h.sub.θ,φ.sup.c.sup.l(n) of the left-side sound input signal and a tailored filtering function h.sub.θ,φ.sup.c.sup.r(n) of the right-side sound input signal. In this example, horizontal angles θ.sub.l and θ.sub.r of the preset HRTF data of the left and right channel signals are 90° and −90°, and elevation angles φ.sub.l and φ.sub.r of the preset HRTF data of the left and right channel signals are both 0°. That is, the horizontal angles of the filtering functions of the left-side and right-side sound input signals are opposite numbers, and the elevation angles of the two filtering functions are the same. Therefore, h.sub.θ,φ.sup.c.sup.l(n) and h.sub.θ,φ.sup.c.sup.r(n) are the same function.

(86) The virtual stereo synthesis apparatus acquires the left-side sound input signal s.sub.l(n) as a sound input signal on one side, and the right-side sound input signal s.sub.r(n) as a sound input signal on the other side. The virtual stereo synthesis apparatus executes step S303 to perform reverberation processing on the right-side sound input signal. A reverberation signal s̄.sub.r(n) of the right-side sound input signal is first obtained according to s̄.sub.r(n)=conv(h.sub.r(n),s.sub.r(n−d.sub.r)) and

(87) $H_{r} (z) = \frac{- g_{r}^{1} + z^{- M_{r}^{1}}}{1 - g_{r}^{1} * z^{M_{r}^{1}}} * \frac{- g_{r}^{2} + z^{- M_{r}^{2}}}{1 - g_{r}^{2} * z^{M_{r}^{2}}} * \frac{- g_{r}^{3} + z^{- M_{r}^{3}}}{1 - g_{r}^{3} * z^{M_{r}^{3}}},$
and a right-side sound reverberation signal ŝ.sub.r(n) is obtained from the reverberation signal s̄.sub.r(n) according to ŝ.sub.r(n)=s.sub.r(n)+w.sub.r·s̄.sub.r(n). The virtual stereo synthesis apparatus executes steps S304 to S306 to obtain a left-ear virtual stereo signal s.sup.l(n). Similarly, the virtual stereo synthesis apparatus acquires the right-side sound input signal s.sub.r(n) as a sound input signal on one side, and the left-side sound input signal s.sub.l(n) as a sound input signal on the other side. The virtual stereo synthesis apparatus executes step S303 to perform reverberation processing on the left-side sound input signal. Further, a reverberation signal s̄.sub.l(n) of the left-side sound input signal is first obtained according to s̄.sub.l(n)=conv(h.sub.l(n),s.sub.l(n−d.sub.l)) and

(88) $H_{l} (z) = \frac{- g_{l}^{1} + z^{- M_{l}^{1}}}{1 - g_{l}^{1} * z^{M_{l}^{1}}} * \frac{- g_{l}^{2} + z^{- M_{l}^{2}}}{1 - g_{l}^{2} * z^{M_{l}^{2}}} * \frac{- g_{l}^{3} + z^{- M_{l}^{3}}}{1 - g_{l}^{3} * z^{M_{l}^{3}}},$
and a left-side sound reverberation signal ŝ.sub.l(n) is obtained from the reverberation signal s̄.sub.l(n) according to ŝ.sub.l(n)=s.sub.l(n)+w.sub.l·s̄.sub.l(n). The virtual stereo synthesis apparatus executes steps S304 to S306 to obtain a right-ear virtual stereo signal s.sup.r(n). The left-ear virtual stereo signal s.sup.l(n) is replayed by a left-side earphone, to enter the left ear of a user, and the right-ear virtual stereo signal s.sup.r(n) is replayed by a right-side earphone, to enter the right ear of the user, to form a stereo listening effect.
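The symmetric use of one filtering function for both ears in this dual-channel example can be sketched as below. To keep the sketch short, reverberation processing and timbre equalization are left out, and the function names are hypothetical.

```python
def convolve(h, x):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            out[i + j] += hi * xj
    return out


def add_padded(a, b):
    """Adds two sequences, treating the shorter one as zero-padded."""
    length = max(len(a), len(b))
    return [(a[n] if n < len(a) else 0.0) + (b[n] if n < len(b) else 0.0)
            for n in range(length)]


def render_two_ears(left_in, right_in, cross_filter):
    """Each ear receives its own channel plus the filtered opposite channel."""
    left_ear = add_padded(left_in, convolve(cross_filter, right_in))
    right_ear = add_padded(right_in, convolve(cross_filter, left_in))
    return left_ear, right_ear
```

Because the two source positions mirror each other, the same `cross_filter` serves both directions, so only one filtering function has to be stored.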

(89) Values of constants in the foregoing example are: T=72, P=1, N=512, N.sub.0=48, fs=44100, d.sub.l=220, d.sub.r=264, g.sub.l.sup.1=g.sub.l.sup.2=g.sub.l.sup.3=g.sub.r.sup.1=g.sub.r.sup.2=g.sub.r.sup.3=0.6, M.sub.l.sup.1=M.sub.r.sup.1=220, M.sub.l.sup.2=M.sub.r.sup.2=132, M.sub.l.sup.3=M.sub.r.sup.3=74, w.sub.l=w.sub.r=0.4225, θ=45°, and φ=0°.

(90) The values of the constants are numerical values that are obtained by means of multiple experiments and that provide an optimal replay effect for a virtual stereo signal. Certainly, in another implementation manner, other numerical values may also be used. The values of the constants in this implementation manner are not further limited herein.

(91) In this implementation manner, which is used as an optimized implementation manner, steps S303, S304, S305, and S306 are executed to perform reverberation processing, a convolution filtering operation, virtual stereo synthesis, and timbre equalization in sequence, to finally obtain a virtual stereo signal. However, in another implementation manner, steps S303 and S306 may be selectively performed. For example, steps S303 and S306 are not executed, while convolution filtering is directly performed on the sound input signal on the other side using the filtering function of the sound input signal on the other side, to obtain the filtered signal s.sub.2.sub.k.sup.h(n) on the other side, and steps S304 and S305 are executed to obtain the synthetic signal s.sup.−1(n) that is used as the final virtual stereo signal s.sup.1(n). Alternatively, step S306 is not executed, while steps S303 to S305 are executed to perform reverberation processing, a convolution filtering operation, and synthesis to obtain the synthetic signal s.sup.−1(n), and the synthetic signal s.sup.−1(n) is used as the virtual stereo signal s.sup.1(n). Alternatively, step S303 is not executed, while step S304 is directly executed to perform convolution filtering on the sound input signal on the other side, to obtain the filtered signal s.sub.2.sub.k.sup.h(n) on the other side, and steps S305 and S306 are executed to obtain the final virtual stereo signal s.sup.1(n).

(92) In this implementation manner, reverberation processing is performed on a sound input signal on the other side, which enhances a sense of space of a synthetic virtual stereo, and during synthesis of a virtual stereo, timbre equalization is performed on the virtual stereo using a filter, which reduces a coloration effect. In addition, in this implementation manner, existing HRTF data is improved. Diffuse-field equalization is first performed on the HRTF data, to eliminate interference data from the HRTF data, and then a ratio operation is performed on a left-ear component and a right-ear component that are in the HRTF data, to obtain improved HRTF data in which orientation information of the HRTF data is retained, that is, the filtering function in this application. As a result, convolution filtering needs to be performed on only the sound input signal on the other side to obtain a virtual stereo with a relatively good replay effect. Virtual stereo synthesis in this implementation manner is therefore different from that in the prior art, in which the convolution filtering is performed on sound input signals on both sides, and calculation complexity is greatly reduced. Moreover, an original input signal is completely retained on one side, which reduces a coloration effect. Further, in this implementation manner, the filtering function is further processed by means of subband smoothing and minimum phase filtering, which reduces a data length of the filtering function, and therefore further reduces the calculation complexity.

(93) Referring to FIG. 6, FIG. 6 is a schematic structural diagram of an implementation manner of a virtual stereo synthesis apparatus according to this application. In this implementation manner, the virtual stereo synthesis apparatus includes an acquiring module 610, a generation module 620, a convolution filtering module 630, and a synthesis module 640.

(94) The acquiring module 610 is configured to acquire at least one sound input signal s.sub.1.sub.m(n) on one side and at least one sound input signal s.sub.2.sub.k(n) on the other side, and send the at least one sound input signal on the one side and at least one sound input signal on the other side to the generation module 620 and the convolution filtering module 630.

(95) In the present disclosure, an original sound signal is processed to obtain an output sound signal that has a stereo sound effect. In this implementation manner, there are a total of M simulated sound sources located on one side, which accordingly generate M sound input signals on the one side, and there are a total of K simulated sound sources located on the other side, which accordingly generate K sound input signals on the other side. The acquiring module 610 acquires the M sound input signals s.sub.1.sub.m(n) on the one side and the K sound input signals s.sub.2.sub.k(n) on the other side, where the M sound input signals s.sub.1.sub.m (n) on the one side and the K sound input signals s.sub.2.sub.k(n) on the other side are used as original sound signals, where s.sub.1.sub.m(n) represents the m.sup.th sound input signal on the one side, s.sub.2.sub.k(n) represents the k.sup.th sound input signal on the other side, 1≦m≦M, and 1≦k≦K.

(96) Generally, in the present disclosure, the sound input signals on the one side and the other side simulate sound signals that are sent from left side and right side positions of an artificial head center in order to be distinguished from each other, for example, if the sound input signal on the one side is a left-side sound input signal, the sound input signal on the other side is a right-side sound input signal, or if the sound input signal on the one side is a right-side sound input signal, the sound input signal on the other side is a left-side sound input signal, where the left-side sound input signal is a simulation of a sound signal that is sent from the left side position of the artificial head center, and the right-side sound input signal is a simulation of a sound signal that is sent from the right side position of the artificial head center.

(97) The generation module 620 is configured to separately perform ratio processing on a preset HRTF left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a preset HRTF right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of each sound input signal s.sub.2.sub.k(n) on the other side, to obtain a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side, and send the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side to the convolution filtering module 630.

(98) Different HRTF experimental measurement databases can already be provided in the prior art. The generation module 620 may directly acquire, without performing measurement, HRTF data from the HRTF experimental measurement databases in the prior art, to perform presetting, and a simulated sound source position of a sound input signal is a sound source position during measurement of corresponding preset HRTF data. In this implementation manner, each sound input signal correspondingly comes from a different preset simulated sound source, and therefore a different piece of HRTF data is correspondingly preset for each sound input signal. The preset HRTF data of each sound input signal can express a filtering effect on the sound input signal that is transmitted from a preset position to the two ears. Furthermore, preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the k.sup.th sound input signal on the other side includes two pieces of data, which are respectively a left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) that expresses a filtering effect on the sound input signal that is transmitted to the left ear of the artificial head and a right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) that expresses a filtering effect on the sound input signal that is transmitted to the right ear of the artificial head.

(99) The generation module 620 performs ratio processing on the left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and the right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) in preset HRTF data of each sound input signal s.sub.2.sub.k(n) on the other side, to obtain the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side, for example, the generation module 620 directly transforms the preset HRTF left-ear component and the preset HRTF right-ear component of the sound input signal on the other side to frequency domain, performs a ratio operation to obtain a value, and uses the obtained value as the filtering function of the sound input signal on the other side, or the generation module 620 first transforms the preset HRTF left-ear component and the preset HRTF right-ear component of the sound input signal on the other side to frequency domain, performs subband smoothing, then performs a ratio operation to obtain a value, and uses the obtained value as the filtering function.

(100) The convolution filtering module 630 is configured to separately perform convolution filtering on each sound input signal s.sub.2.sub.k(n) on the other side and the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, to obtain the filtered signal s.sub.2.sub.k.sup.h(n) on the other side, and send all of the filtered signals s.sub.2.sub.k.sup.h(n) on the other side to the synthesis module 640.

(101) The convolution filtering module 630 calculates the filtered signal s.sub.2.sub.k.sup.h(n) on the other side corresponding to each sound input signal s.sub.2.sub.k(n) on the other side according to a formula s.sub.2.sub.k.sup.h(n)=conv(h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n),s.sub.2.sub.k(n)), where conv(x, y) represents a convolution of vectors x and y, s.sub.2.sub.k.sup.h(n) represents the k.sup.th filtered signal on the other side, h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) represents a filtering function of the k.sup.th sound input signal on the other side, and s.sub.2.sub.k(n) represents the k.sup.th sound input signal on the other side.
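The convolution step above can be sketched as follows. This is a minimal illustration that assumes NumPy and toy data; the function name `filter_second_side` and the sample values are hypothetical and are not part of the described apparatus.

```python
import numpy as np

def filter_second_side(signals, filters):
    """Convolve each second-side signal s_2k(n) with its filtering function h^c(n)."""
    if len(signals) != len(filters):
        raise ValueError("one filtering function is needed per second-side signal")
    # conv(h^c, s_2k): full linear convolution, output length len(h) + len(s) - 1
    return [np.convolve(h_c, s) for s, h_c in zip(signals, filters)]

# Toy data: a unit impulse passed through a 3-tap filter returns the filter taps.
s_2 = [np.array([1.0, 0.0, 0.0, 0.0])]
h_c = [np.array([0.5, 0.25, 0.125])]
filtered = filter_second_side(s_2, h_c)
```

Only the K signals on the other side pass through this function; the M signals on the one side are left untouched.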

(102) The synthesis module 640 is configured to synthesize all of the sound input signals s.sub.1.sub.m (n) on the one side and all of the filtered signals s.sub.2.sub.k.sup.h(n) on the other side into a virtual stereo signal s.sup.1(n).

(103) The synthesis module 640 is configured to synthesize, according to

(104) $s^{1}(n) = \sum_{m=1}^{M} s_{1_m}(n) + \sum_{k=1}^{K} s_{2_k}^{h}(n),$
all of the received sound input signals s.sub.1.sub.m(n) on the one side and all of the filtered signals s.sub.2.sub.k.sup.h(n) on the other side into the virtual stereo signal s.sup.1(n).
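The summation above may be illustrated with a short sketch. It assumes NumPy, and it assumes that signals of different lengths are aligned at n=0 and treated as zero-padded, which the text does not specify; the name `synthesize` is hypothetical.

```python
import numpy as np

def synthesize(first_side, filtered_second_side):
    """Sum all first-side signals and all filtered second-side signals sample by sample."""
    parts = list(first_side) + list(filtered_second_side)
    length = max(len(p) for p in parts)
    out = np.zeros(length)
    for p in parts:
        out[:len(p)] += p  # assumption: shorter signals are treated as zero-padded
    return out

s_1 = [np.array([1.0, 2.0, 3.0])]                      # M = 1 signal on the one side
s_2h = [np.array([0.5, 0.5, 0.5, 0.5]), np.array([1.0])]  # K = 2 filtered signals
virtual = synthesize(s_1, s_2h)
```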

(105) In this implementation manner, ratio processing is performed on left-ear and right-ear components of preset HRTF data of each sound input signal on the other side, to obtain a filtering function that retains orientation information of the preset HRTF data such that during synthesis of a virtual stereo, convolution filtering processing needs to be performed on only the sound input signal on the other side using the filtering function, and the sound input signal on the other side and a sound input signal on one side are synthesized to obtain the virtual stereo, without a need to simultaneously perform convolution filtering on the sound input signals that are on the two sides, which greatly reduces calculation complexity, and during synthesis, convolution processing does not need to be performed on the sound input signal on the one side, and therefore an original audio is retained, which further alleviates a coloration effect, and improves sound quality of the virtual stereo.

(106) It should be noted that, in this implementation manner, the generated virtual stereo is a virtual stereo that is input to an ear on one side, for example, if the sound input signal on the one side is a left-side sound input signal, and the sound input signal on the other side is a right-side sound input signal, the virtual stereo signal obtained by the foregoing module is a left-ear virtual stereo signal that is directly input to the left ear, or if the sound input signal on the one side is a right-side sound input signal, and the sound input signal on the other side is a left-side sound input signal, the virtual stereo signal obtained by the foregoing module is a right-ear virtual stereo signal that is directly input to the right ear. In the foregoing manner, the virtual stereo synthesis apparatus can separately obtain a left-ear virtual stereo signal and a right-ear virtual stereo signal, and output the signals to the two ears using a headset, to achieve a stereo effect that is like a natural sound.

(107) Referring to FIG. 7, FIG. 7 is a schematic structural diagram of another implementation manner of a virtual stereo synthesis apparatus according to the present disclosure. In this implementation manner, the virtual stereo synthesis apparatus includes an acquiring module 710, a generation module 720, a convolution filtering module 730, a synthesis module 740, and a reverberation processing module 750, where the synthesis module 740 includes a synthesis unit 741 and a timbre equalization unit 742.

(108) The acquiring module 710 is configured to acquire at least one sound input signal s.sub.1.sub.m(n) on one side and at least one sound input signal s.sub.2.sub.k(n) on the other side.

(109) The generation module 720 is configured to separately perform ratio processing on a preset HRTF left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a preset HRTF right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of each sound input signal s.sub.2.sub.k(n) on the other side, to obtain a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side, and send the filtering function to the convolution filtering module 730.

(110) Further optimized, the generation module 720 includes a processing unit 721, a ratio unit 722, and a transformation unit 723.

(111) The processing unit 721 is configured to separately use a frequency-domain representation of the preset HRTF left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) of each sound input signal on the other side, obtained after diffuse-field equalization and subband smoothing are performed in sequence, as a left-ear frequency domain parameter of each sound input signal on the other side, separately use a frequency-domain representation of the preset HRTF right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of each sound input signal on the other side, obtained after diffuse-field equalization and subband smoothing are performed in sequence, as a right-ear frequency domain parameter of each sound input signal on the other side, and send the left-ear and right-ear frequency domain parameters to the ratio unit 722.

(112) a. The processing unit 721 performs diffuse-field equalization on preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side. The preset HRTF data of the k.sup.th sound input signal on the other side is represented by h.sub.θ.sub.k.sub.,φ.sub.k(n), where a horizontal angle between a simulated sound source of the k.sup.th sound input signal on the other side and an artificial head center is θ.sub.k, an elevation angle between the simulated sound source of the k.sup.th sound input signal on the other side and the artificial head center is φ.sub.k, and h.sub.θ.sub.k.sub.,φ.sub.k(n) includes two pieces of data: a left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n). Generally, preset HRTF data obtained by means of measurement in a laboratory not only includes filter model data of transmission paths from a speaker, used as a sound source, to two ears of an artificial head, but also includes interference data such as a frequency response of the speaker, a frequency response of microphones that are disposed at the two ears to receive a signal of the speaker, and a frequency response of an ear canal of an artificial ear. This interference data affects a sense of orientation and a sense of distance of a synthetic virtual sound. Therefore, in this implementation manner, an optimal manner is used, in which the foregoing interference data is eliminated by means of diffuse-field equalization.

(113) (1) Furthermore, the processing unit 721 calculates the frequency domain H.sub.θ.sub.k.sub.,φ.sub.k(n) of the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side.

(114) (2) The processing unit 721 calculates an average energy spectrum DF_avg(n), in all directions, of the preset HRTF data frequency domain H.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side:

(115) $DF\_avg(n) = \frac{1}{2 * T * P} \sum_{φ_k = φ_1}^{φ_P} \sum_{θ_k = θ_1}^{θ_T} |H_{θ_k, φ_k}(n)|^{2},$
where |H.sub.θ.sub.k.sub.,φ.sub.k(n)| represents a modulus of H.sub.θ.sub.k.sub.,φ.sub.k(n), P and T represent a quantity P of elevation angles between test sound sources and an artificial head center, and a quantity T of horizontal angles between the test sound sources and the artificial head center, where P and T are included in an HRTF experimental measurement database in which H.sub.θ.sub.k.sub.,φ.sub.k(n) is located. In the present disclosure, when HRTF data in different HRTF experimental measurement databases is used, the quantity P of elevation angles and the quantity T of horizontal angles may be different.

(116) (3) The processing unit 721 inverts the average energy spectrum DF_avg(n), to obtain an inversion DF_inv(n) of the average energy spectrum of the preset HRTF data frequency domain H.sub.θ.sub.k.sub.,φ.sub.k(n):

(117) $DF\_inv(n) = \frac{1}{DF\_avg(n)}.$

(118) (4) The processing unit 721 transforms the inversion DF_inv(n) of the average energy spectrum of the preset HRTF data frequency domain H.sub.θ.sub.k.sub.,φ.sub.k(n) to time domain, and takes a real value, to obtain an average inverse filtering sequence df_inv(n) of the preset HRTF data:
df_inv(n)=real(InvFT(DF_inv(n))),
where InvFT( ) represents inverse Fourier transform, and real(x) represents calculation of a real number part of a complex number x.

(119) (5) The processing unit 721 performs convolution on the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side and the average inverse filtering sequence df_inv(n) of the preset HRTF data, to obtain diffuse-field-equalized preset HRTF data h̄.sub.θ.sub.k.sub.,φ.sub.k(n):
h̄.sub.θ.sub.k.sub.,φ.sub.k(n)=conv(h.sub.θ.sub.k.sub.,φ.sub.k(n),df_inv(n)),
where conv(x,y) represents a convolution of vectors x and y, and h̄.sub.θ.sub.k.sub.,φ.sub.k(n) includes a diffuse-field-equalized preset HRTF left-ear component h̄.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a diffuse-field-equalized preset HRTF right-ear component h̄.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n).

(120) The processing unit 721 performs the foregoing processing (1) to (5) on the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) of the sound input signal on the other side, to obtain the diffuse-field-equalized HRTF data h̄.sub.θ.sub.k.sub.,φ.sub.k(n).
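Processing (1) to (5) may be sketched as follows. This is an illustrative sketch assuming NumPy and an HRTF set stored as an array of shape (P, T, 2, N), that is, elevations, horizontal angles, ears, and taps. The array layout, the name `diffuse_field_equalize`, and the small floor `eps` that guards against division by zero are assumptions and do not come from the text.

```python
import numpy as np

def diffuse_field_equalize(hrirs, eps=1e-12):
    """hrirs: shape (P, T, 2, N) = elevations, horizontal angles, ears, taps."""
    P, T, ears, N = hrirs.shape
    H = np.fft.fft(hrirs, axis=-1)                                     # (1) frequency domain
    df_avg = np.sum(np.abs(H) ** 2, axis=(0, 1, 2)) / (ears * T * P)   # (2) average energy
    df_inv_spec = 1.0 / np.maximum(df_avg, eps)                        # (3) inversion (eps is an assumption)
    df_inv = np.real(np.fft.ifft(df_inv_spec))                         # (4) back to time domain
    out = np.empty((P, T, ears, 2 * N - 1))
    for idx in np.ndindex(P, T, ears):                                 # (5) convolve every response
        out[idx] = np.convolve(hrirs[idx], df_inv)
    return out, df_inv

# Toy data with P=1 elevation and T=4 horizontal angles (random, not measured).
rng = np.random.default_rng(0)
toy = rng.standard_normal((1, 4, 2, 16))
equalized, df_inv = diffuse_field_equalize(toy)
```

Because the same sequence df_inv(n) is applied to every direction and both ears, direction-independent components are attenuated while differences between directions are retained.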

(121) b. The processing unit 721 performs subband smoothing on the diffuse-field-equalized preset HRTF data h̄.sub.θ.sub.k.sub.,φ.sub.k(n). The processing unit 721 transforms the diffuse-field-equalized preset HRTF data h̄.sub.θ.sub.k.sub.,φ.sub.k(n) to frequency domain, to obtain a frequency domain H̄.sub.θ.sub.k.sub.,φ.sub.k(n) of the diffuse-field-equalized preset HRTF data. A time-domain transformation length of h̄.sub.θ.sub.k.sub.,φ.sub.k(n) is N.sub.1, and a quantity of frequency domain coefficients of H̄.sub.θ.sub.k.sub.,φ.sub.k(n) is N.sub.2, where N.sub.2=N.sub.1/2+1.

(122) The processing unit 721 performs subband smoothing on the frequency domain H̄.sub.θ.sub.k.sub.,φ.sub.k(n) of the diffuse-field-equalized preset HRTF data, calculates a modulus, and uses frequency domain data as subband-smoothed preset HRTF data |Ĥ.sub.θ.sub.k.sub.,φ.sub.k(n)|:

(123) $|\hat{H}_{θ_k, φ_k}(n)| = \frac{1}{\sum_{j=1}^{j_{\max} - j_{\min} + 1} hann(j)} \sum_{j = j_{\min}}^{j_{\max}} |\overline{H}_{θ_k, φ_k}(j) * hann(j - j_{\min} + 1)|,$ where $j_{\min} = \begin{cases} n - bw(n), & n - bw(n) > 1 \\ 1, & n - bw(n) \leq 1 \end{cases}$, $j_{\max} = \begin{cases} n + bw(n), & n + bw(n) < M \\ M, & n + bw(n) \geq M \end{cases},$
bw(n)=└0.2*n┘, └x┘ represents a maximum integer that is not greater than x, and
hann(j)=0.5*(1−cos(2*π*j/(2*bw(n)+1))),j=0 . . . (2*bw(n)+1).
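The subband smoothing above may be sketched as follows, assuming NumPy. Two points are assumptions rather than statements of the text: M is taken to be the number of frequency domain coefficients being smoothed, and a bin with bw(n)=0 is returned unchanged, because the window weight hann(1) is zero in that case and the normalization would otherwise divide by zero. The name `subband_smooth` is hypothetical.

```python
import numpy as np

def subband_smooth(mag):
    """Smooth a modulus spectrum |H(n)|, n = 1..M, with a Hann window of half-width bw(n)."""
    M = len(mag)                       # assumption: M = number of frequency coefficients
    out = np.empty(M)
    for n in range(1, M + 1):          # 1-based bin index, as in the text
        bw = n // 5                    # floor(0.2 * n) in exact integer arithmetic
        if bw == 0:
            out[n - 1] = mag[n - 1]    # assumption: one-bin window leaves the bin unchanged
            continue
        j_min = max(n - bw, 1)         # clamp the window to the valid bin range
        j_max = min(n + bw, M)
        j = np.arange(1, j_max - j_min + 2)             # window positions 1..(j_max - j_min + 1)
        w = 0.5 * (1.0 - np.cos(2.0 * np.pi * j / (2 * bw + 1)))
        out[n - 1] = np.sum(mag[j_min - 1:j_max] * w) / np.sum(w)
    return out

flat = subband_smooth(np.ones(32))     # a flat spectrum is expected to stay flat
```

The window widens with frequency, so high-frequency detail is averaged over more bins than low-frequency detail.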

(124) c. The processing unit 721 uses a preset HRTF left-ear frequency domain component Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) after the subband smoothing as a left-ear frequency domain parameter of the sound input signal on the other side, and uses a preset HRTF right-ear frequency domain component Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) after the subband smoothing as a right-ear frequency domain parameter of the sound input signal on the other side. The left-ear frequency domain parameter represents a preset HRTF left-ear component of the sound input signal on the other side, and the right-ear frequency domain parameter represents a preset HRTF right-ear component of the sound input signal on the other side. Certainly, in another implementation manner, the preset HRTF left-ear component of the sound input signal on the other side may be directly used as the left-ear frequency domain parameter, or the preset HRTF left-ear component that has been subject to diffuse-field equalization may be used as the left-ear frequency domain parameter. It is similar for the right-ear frequency domain parameter.

(125) It should be noted that, in the foregoing description, when diffuse-field equalization and subband smoothing are performed, the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) is processed. However, the preset HRTF data h.sub.θ.sub.k.sub.,φ.sub.k(n) includes two pieces of data: the left-ear component and the right-ear component, and therefore in fact, it is equivalent to that the diffuse-field equalization and the subband smoothing are performed separately on the left-ear component and the right-ear component of a preset HRTF data.

(126) The ratio unit 722 is configured to separately use a ratio of the left-ear frequency domain parameter of the sound input signal on the other side to the right-ear frequency domain parameter of the sound input signal on the other side as a frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side. The ratio of the left-ear frequency domain parameter of the sound input signal on the other side to the right-ear frequency domain parameter of the sound input signal on the other side further includes a modulus ratio and an argument difference between the left-ear frequency domain parameter and the right-ear frequency domain parameter, where the modulus ratio and the argument difference are correspondingly used as a modulus and an argument in the frequency-domain filtering function of the sound input signal on the other side, and the obtained filtering function can retain orientation information of the preset HRTF left-ear component and the preset HRTF right-ear component of the sound input signal on the other side.

(127) In this implementation manner, the ratio unit 722 performs a ratio operation on the left-ear frequency domain parameter and the right-ear frequency domain parameter of the sound input signal on the other side. Further, the modulus of the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side is obtained according to

(128) $|H_{θ_k, φ_k}^{c}(n)| = \frac{|\hat{H}_{θ_k, φ_k}^{l}(n)|}{|\hat{H}_{θ_k, φ_k}^{r}(n)|},$
the argument of the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) is obtained according to arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n))=arg(H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n))−arg(H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n)), and therefore the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side is obtained. |Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n)| and |Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n)| respectively represent a left-ear component and a right-ear component of the subband-smoothed preset HRTF data |Ĥ.sub.θ.sub.k.sub.,φ.sub.k(n)|, and H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) respectively represent a left-ear component and a right-ear component of the frequency domain H̄.sub.θ.sub.k.sub.,φ.sub.k(n) of the diffuse-field-equalized preset HRTF data. In subband smoothing, only a modulus value of a complex number is processed, that is, a value obtained after subband smoothing is the modulus value of the complex number, and does not include argument information. Therefore, when the argument of the frequency-domain filtering function is calculated, a frequency domain parameter that can represent the preset HRTF data and that includes argument information needs to be used, for example, the left-ear and right-ear components of the diffuse-field-equalized HRTF data.
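The ratio operation may be sketched as follows, assuming NumPy. The modulus is taken from the subband-smoothed spectra and the argument from the diffuse-field-equalized spectra, as described above. The name `ratio_filter` and the floor `eps` on the denominator are assumptions added for illustration.

```python
import numpy as np

def ratio_filter(mag_l_smooth, mag_r_smooth, H_l_eq, H_r_eq, eps=1e-12):
    """Frequency-domain filtering function: modulus ratio and argument difference."""
    modulus = mag_l_smooth / np.maximum(mag_r_smooth, eps)   # |H^l| / |H^r| (eps is an assumption)
    argument = np.angle(H_l_eq) - np.angle(H_r_eq)           # arg(H^l) - arg(H^r)
    return modulus * np.exp(1j * argument)

# One-bin toy example: left ear 2*exp(0.3j), right ear 4*exp(0.1j).
H_l = np.array([2.0 * np.exp(0.3j)])
H_r = np.array([4.0 * np.exp(0.1j)])
H_c = ratio_filter(np.abs(H_l), np.abs(H_r), H_l, H_r)
```

In this toy case the two ears differ by a factor of 0.5 in level and by 0.2 rad in phase, and that interaural difference is what the resulting filtering function carries.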

(129) The transformation unit 723 is configured to separately perform minimum phase filtering on the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, then transform the frequency-domain filtering function to a time-domain function, and use the time-domain function as a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side. The obtained frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) may be expressed as a position-independent delay plus a minimum phase filter. Minimum phase filtering is performed on the obtained frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) in order to reduce a data length and reduce calculation complexity during virtual stereo synthesis, and additionally, the subjective listening impression is not affected.

(130) (1) The transformation unit 723 extends the modulus of the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) obtained by the ratio unit 722 to a time-domain transformation length N.sub.1 thereof, and calculates a logarithmic value:

(131) $|\overline{H}_{θ_k, φ_k}^{c}(n)| = \begin{cases} -\ln(|H_{θ_k, φ_k}^{c}(n)|), & n \leq N_2 \\ -\ln(|H_{θ_k, φ_k}^{c}(N_1 - n + 1)|), & N_2 < n \leq N_1 \end{cases},$
where ln(x) is a natural logarithm of x, N.sub.1 is a time-domain transformation length of a time domain h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the frequency-domain filtering function, and N.sub.2 is a quantity of frequency domain coefficients of the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n).

(132) (2) The transformation unit 723 performs Hilbert transform on the extended logarithmic modulus |H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n)| obtained in (1):
H.sub.θ.sub.k.sub.,φ.sub.k.sup.H(n)=Hilbert(|H̄.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n)|),
where Hilbert( ) represents Hilbert transform.

(133) (3) The transformation unit 723 obtains a minimum phase filter H.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n):

(134) $H_{θ_k, φ_k}^{mp}(n) = |H_{θ_k, φ_k}^{c}(n)| \, e^{i * H_{θ_k, φ_k}^{H}(n)},$
where n=1 . . . N.sub.2.

(135) (4) The transformation unit 723 calculates a delay τ(θ.sub.k,φ.sub.k):

(136) $τ(θ_k, φ_k) = -\left\lfloor \frac{fs}{k_{\max}^{itd} - k_{\min}^{itd} + 1} \sum_{k = k_{\min}^{itd}}^{k_{\max}^{itd}} \frac{\arg(H_{θ_k, φ_k}^{c}(k)) - H_{θ_k, φ_k}^{H}(k)}{π * fs * \frac{K}{N_2 - 1}} \right\rfloor.$

(137) (5) The transformation unit 723 transforms the minimum phase filter H.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) to time domain, to obtain h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n):
h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n)=real(InvFT(H.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n))),
where InvFT( ) represents inverse Fourier transform, and real(x) represents calculation of a real number part of a complex number x.

(138) (6) The transformation unit 723 truncates the time domain h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) of the minimum phase filter according to a length N.sub.0, and adds the delay τ(θ.sub.k,φ.sub.k):

(139) $h_{θ_k, φ_k}^{c}(n) = \begin{cases} 0, & 1 \leq n \leq τ(θ_k, φ_k) \\ h_{θ_k, φ_k}^{mp}(n - τ(θ_k, φ_k)), & τ(θ_k, φ_k) < n \leq τ(θ_k, φ_k) + N_0 \end{cases}.$
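The construction of a minimum phase response from a modulus spectrum may be illustrated as follows. This sketch assumes NumPy and swaps in a homomorphic (cepstral folding) construction, which is a standard way to realize the Hilbert relation between logarithmic modulus and phase using only FFTs. It is not claimed to reproduce steps (1) to (5) sample for sample, and it does not compute the delay τ(θ.sub.k,φ.sub.k). The name `minimum_phase_ir` and the floor 1e-12 inside the logarithm are assumptions.

```python
import numpy as np

def minimum_phase_ir(mag_half, n_fft):
    """mag_half: modulus for the N2 = n_fft/2 + 1 non-negative-frequency bins (n_fft even)."""
    # Extend the modulus to the full transform length by mirroring.
    mag_full = np.concatenate([mag_half, mag_half[-2:0:-1]])
    log_mag = np.log(np.maximum(mag_full, 1e-12))   # floor is an assumption to avoid log(0)
    cep = np.real(np.fft.ifft(log_mag))             # real cepstrum of the log-modulus
    fold = np.zeros(n_fft)                          # fold the cepstrum onto its causal part
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0
    H_mp = np.exp(np.fft.fft(cep * fold))           # same modulus, minimum phase
    return np.real(np.fft.ifft(H_mp))

# Check with a known minimum phase filter [1, 0.5] (zero inside the unit circle).
h_ref = np.array([1.0, 0.5])
n_fft = 64
mag_half = np.abs(np.fft.rfft(h_ref, n_fft))
h_mp = minimum_phase_ir(mag_half, n_fft)
```

The energy of the recovered response sits in its first taps, which is the property that makes the later truncation to a short length N.sub.0 possible.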

(140) Relatively large coefficients of the minimum phase filter H.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) obtained in (3) are concentrated in the front, and after relatively small coefficients in the rear are removed by means of truncation, a filtering effect does not change greatly. Therefore, generally, to reduce calculation complexity, the time domain h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) of the minimum phase filter is truncated according to the length N.sub.0, where a value of the length N.sub.0 may be selected as follows. The time domain h.sub.θ.sub.k.sub.,φ.sub.k.sup.mp(n) of the minimum phase filter is sequentially compared, from the rear to the front, with a preset threshold e. A coefficient less than e is removed, and the comparison continues with the coefficient prior to the removed coefficient, and stops when a coefficient greater than e is reached, where a total length of remaining coefficients is N.sub.0, and the preset threshold e may be 0.01.
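The selection of N.sub.0 and the truncation with delay in (6) may be sketched as follows, assuming NumPy. Comparing the absolute value of each coefficient with the threshold e is an assumption, since the text does not state how negative coefficients are handled; the function names and the toy coefficients are hypothetical.

```python
import numpy as np

def select_length(h_mp, e=0.01):
    """Scan from the rear; drop coefficients below e until one is not below e."""
    n0 = len(h_mp)
    while n0 > 0 and abs(h_mp[n0 - 1]) < e:   # assumption: compare magnitudes
        n0 -= 1
    return n0

def truncate_and_delay(h_mp, n0, tau):
    """Keep the first n0 coefficients and prepend tau zero samples as the delay."""
    return np.concatenate([np.zeros(tau), h_mp[:n0]])

h_toy = np.array([1.0, -0.5, 0.2, 0.005, -0.02, 0.003, 0.001])
n0 = select_length(h_toy)                   # scan stops at -0.02, so five taps remain
h_c = truncate_and_delay(h_toy, n0, tau=2)
```

Note that the small coefficient 0.005 is kept because the scan stops at the first rear coefficient that reaches the threshold.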

(141) It should be noted that, the foregoing example in which the generation module obtains the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side is used as an optimal manner, in which diffuse-field equalization, subband smoothing, ratio calculation, and minimum phase filtering are performed in sequence on the left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and the right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of the preset HRTF data of the sound input signal on the other side, to obtain the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side. However, in another implementation manner, diffuse-field equalization, subband smoothing, and minimum phase filtering are selectively performed. The step of subband smoothing is generally set together with the step of minimum phase filtering, that is, if the step of minimum phase filtering is not performed, the step of subband smoothing is not performed. The step of subband smoothing is added before the step of minimum phase filtering, which further reduces the data length of the obtained filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, and therefore further reduces calculation complexity during virtual stereo synthesis.

(142) The reverberation processing module 750 is configured to separately perform reverberation processing on each sound input signal s.sub.2.sub.k(n) on the other side and then use the processed signal as a sound reverberation signal ŝ.sub.2.sub.k(n) on the other side, and send the sound reverberation signal on the other side to the convolution filtering module 730.

(143) After acquiring the at least one sound input signal s.sub.2.sub.k(n) on the other side, the reverberation processing module 750 separately performs reverberation processing on each sound input signal s.sub.2.sub.k(n) on the other side, to enhance filtering effects such as environment reflection and scattering during actual sound broadcasting, and enhance a sense of space of the input signal. In this implementation manner, reverberation processing is implemented using an all-pass filter. Specifics are as follows:

(144) (1) As shown in FIG. 5, filtering is performed on each sound input signal s.sub.2.sub.k(n) on the other side using three cascaded Schroeder all-pass filters, to obtain a reverberation signal s̄.sub.2.sub.k(n) of each sound input signal s.sub.2.sub.k(n) on the other side:
s̄.sub.2.sub.k(n)=conv(h.sub.k(n),s.sub.2.sub.k(n−d.sub.k)),
where conv(x, y) represents a convolution of vectors x and y, d.sub.k is a preset delay of the k.sup.th sound input signal on the other side, h.sub.k(n) is an all-pass filter of the k.sup.th sound input signal on the other side, and a transfer function thereof is:

(145) $H_k(z) = \frac{-g_k^{1} + z^{-M_k^{1}}}{1 - g_k^{1} * z^{-M_k^{1}}} * \frac{-g_k^{2} + z^{-M_k^{2}}}{1 - g_k^{2} * z^{-M_k^{2}}} * \frac{-g_k^{3} + z^{-M_k^{3}}}{1 - g_k^{3} * z^{-M_k^{3}}},$
where g.sub.k.sup.1, g.sub.k.sup.2, and g.sub.k.sup.3 are preset all-pass filter gains corresponding to the k.sup.th sound input signal on the other side, and M.sub.k.sup.1, M.sub.k.sup.2, and M.sub.k.sup.3 are preset all-pass filter delays corresponding to the k.sup.th sound input signal on the other side.

(146) (2) The reverberation processing module 750 separately adds each sound input signal s.sub.2.sub.k(n) on the other side to the weighted reverberation signal s̄.sub.2.sub.k(n) of the sound input signal on the other side, to obtain the sound reverberation signal ŝ.sub.2.sub.k(n) on the other side corresponding to each sound input signal on the other side:
ŝ.sub.2.sub.k(n)=s.sub.2.sub.k(n)+w.sub.k·s̄.sub.2.sub.k(n),
where w.sub.k is a preset weight of the reverberation signal s̄.sub.2.sub.k(n) of the k.sup.th sound input signal on the other side, and generally, a larger weight indicates a stronger sense of space of a signal but causes a greater negative effect (for example, an unclear voice or indistinct percussion music). In this implementation manner, a weight of the sound input signal on the other side is determined in the following manner: a suitable value is selected in advance as the weight w.sub.k of the reverberation signal s̄.sub.2.sub.k(n) according to an experiment result, where the value enhances the sense of space of the sound input signal on the other side and does not cause a negative effect.
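The reverberation processing in (1) and (2) may be sketched as follows, assuming NumPy. Each Schroeder all-pass section is implemented through its difference equation y(n) = −g·x(n) + x(n−M) + g·y(n−M), and the output is truncated to the input length, which is an assumption. The function names are hypothetical; the constants in the usage lines are the example values d.sub.l=220, g=0.6, M=220, 132, 74, and w.sub.l=0.4225 given earlier in this description.

```python
import numpy as np

def allpass(x, g, M):
    """Schroeder all-pass section: y[n] = -g*x[n] + x[n-M] + g*y[n-M]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        x_d = x[n - M] if n >= M else 0.0   # delayed input, zero before the signal starts
        y_d = y[n - M] if n >= M else 0.0   # delayed output (feedback path)
        y[n] = -g * x[n] + x_d + g * y_d
    return y

def reverberate(s, d, gains, delays, w):
    """Delay s by d samples, run three cascaded all-pass sections, add w times the result to s."""
    rev = np.concatenate([np.zeros(d), s])[:len(s)]   # s(n - d), truncated (assumption)
    for g, M in zip(gains, delays):
        rev = allpass(rev, g, M)
    return s + w * rev

impulse = np.zeros(2048)
impulse[0] = 1.0
out = reverberate(impulse, d=220, gains=[0.6, 0.6, 0.6], delays=[220, 132, 74], w=0.4225)
```

For an impulse input, the direct sound stays at n=0 and the first reverberant sample appears after the preset delay of 220 samples.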

(147) The convolution filtering module 730 is configured to separately perform convolution filtering on each sound reverberation signal ŝ.sub.2.sub.k(n) on the other side and the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the corresponding sound input signal on the other side, to obtain a filtered signal s.sub.2.sub.k.sup.h(n) on the other side, and send the filtered signal on the other side to the synthesis module 740.

(148) After receiving all the sound reverberation signals ŝ.sub.2.sub.k(n) on the other side, the convolution filtering module 730 performs convolution filtering on each sound reverberation signal ŝ.sub.2.sub.k(n) on the other side according to a formula s.sub.2.sub.k.sup.h(n)=conv(h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n),ŝ.sub.2.sub.k(n)), to obtain the filtered signal s.sub.2.sub.k.sup.h(n) on the other side, where s.sub.2.sub.k.sup.h(n) represents the k.sup.th filtered signal on the other side, h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) represents a filtering function of the k.sup.th sound input signal on the other side, and ŝ.sub.2.sub.k(n) represents the k.sup.th sound reverberation signal on the other side.
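As an illustration of the convolution filtering step, the following sketch implements conv(x, y) as a direct full-length linear convolution and applies it to each signal on the other side. The helper name `filter_other_side` is hypothetical; a practical implementation would more likely use a fast (FFT-based) convolution.

```python
def conv(h, x):
    """Full linear convolution of sequences h and x."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y


def filter_other_side(reverb_signals, filters):
    """Convolve the k-th reverberation signal with the k-th filtering function."""
    return [conv(h_c, s_hat) for h_c, s_hat in zip(filters, reverb_signals)]
```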

(149) The synthesis unit 741 is configured to summate all of the sound input signals s.sub.1.sub.m(n) on the one side and all of the filtered signals s.sub.2.sub.k.sup.h(n) on the other side to obtain a synthetic signal, and send the synthetic signal s.sup.1(n) to the timbre equalization unit 742.

(150) Furthermore, the synthesis unit 741 obtains the synthetic signal s.sup.1(n) corresponding to the one side according to a formula

(151) $\overline{s}^{1}(n) = \sum_{m=1}^{M} s_{1_{m}}(n) + \sum_{k=1}^{K} s_{2_{k}}^{h}(n).$
For example, if the sound input signal on the one side is a left-side sound input signal, a left-ear synthetic signal is obtained, or if the sound input signal on the one side is a right-side sound input signal, a right-ear synthetic signal is obtained.
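The summation can be sketched as follows. Because a filtered signal is longer than its input after convolution, this sketch assumes that shorter signals are zero-padded to the longest length; that padding choice and the function name `synthesize` are assumptions, not details stated in this application.

```python
def synthesize(one_side, filtered_other_side):
    """Sum all one-side input signals and all filtered other-side signals sample by sample."""
    signals = list(one_side) + list(filtered_other_side)
    length = max(len(s) for s in signals)
    out = [0.0] * length
    for s in signals:
        for n, v in enumerate(s):
            out[n] += v  # shorter signals contribute zeros past their end
    return out
```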

(152) The timbre equalization unit 742 is configured to perform, using a fourth-order IIR filter, timbre equalization on the synthetic signal s.sup.1(n) and then use the timbre-equalized synthetic signal as a virtual stereo signal s.sup.1(n).

(153) The timbre equalization unit 742 performs timbre equalization on the synthetic signal s.sup.1(n), to reduce the coloration effect that the convolution-filtered sound input signal on the other side has on the synthetic signal. In this implementation manner, timbre equalization is performed using a fourth-order IIR filter eq(n). Further, the virtual stereo signal s.sup.1(n) that is finally output to the ear on the one side is obtained according to a formula s.sup.1(n)=conv(eq(n),s.sup.1(n)).

(154) A transfer function of eq(n) is

(155) $H(z) = \frac{b_{1} + b_{2} z^{-1} + b_{3} z^{-2} + b_{4} z^{-3} + b_{5} z^{-4}}{a_{1} + a_{2} z^{-1} + a_{3} z^{-2} + a_{4} z^{-3} + a_{5} z^{-4}},$ where $\begin{matrix} b_{1} = 1.24939117710166 \\ b_{2} = -4.72162304562892 \\ b_{3} = 6.69867047060726 \\ b_{4} = -4.22811576399464 \\ b_{5} = 1.00174331383528 \end{matrix}$ and $\begin{matrix} a_{1} = 1 \\ a_{2} = -3.76394096632083 \\ a_{3} = 5.31928925722012 \\ a_{4} = -3.34508050090584 \\ a_{5} = 0.789702281674921 \end{matrix}.$
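A minimal sketch of applying such a filter in the time domain follows, assuming a direct-form difference equation a_1·y(n) = Σ b_i·x(n−i+1) − Σ a_i·y(n−i+1). The coefficients are copied as printed, reading a_2 as −3.76394096632083. The names `B`, `A`, and `iir_filter` are hypothetical, and the sketch does not check filter stability or normalize the gain.

```python
# Coefficients of eq(n) as listed for the transfer function H(z).
B = [1.24939117710166, -4.72162304562892, 6.69867047060726,
     -4.22811576399464, 1.00174331383528]
A = [1.0, -3.76394096632083, 5.31928925722012,
     -3.34508050090584, 0.789702281674921]


def iir_filter(b, a, x):
    """Filter x with H(z) = (sum b_i z^-i) / (sum a_i z^-i), zero initial state."""
    y = []
    for n in range(len(x)):
        # Feed-forward part: weighted current and past inputs.
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        # Feedback part: weighted past outputs.
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc / a[0])
    return y
```

With these definitions, `iir_filter(B, A, synthetic)` would play the role of conv(eq(n), ·) applied to a synthetic signal held in a hypothetical list `synthetic`.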

(156) In this implementation manner, which is used as an optimized implementation manner, reverberation processing, convolution filtering operation, virtual stereo synthesis, and timbre equalization are performed in sequence, to finally obtain a virtual stereo. However, in another implementation manner, reverberation processing and/or timbre equalization may not be performed, which is not limited herein.

(157) It should be noted that, the virtual stereo synthesis apparatus of this application may be an independent sound replay device, for example, a mobile terminal such as a mobile phone, a tablet computer, or an MP3 player, and the foregoing functions are also performed by the sound replay device.

(158) Referring to FIG. 8, FIG. 8 is a schematic structural diagram of still another implementation manner of a virtual stereo synthesis apparatus. In this implementation manner, the virtual stereo synthesis apparatus includes a processor 810 and a memory 820, where the processor 810 is connected to the memory 820 using a bus 830.

(159) The memory 820 is configured to store a computer instruction executed by the processor 810 and data that the processor 810 needs to store at work.

(160) The processor 810 executes the computer instruction stored in the memory 820, to acquire at least one sound input signal s.sub.1.sub.m(n) on one side and at least one sound input signal s.sub.2.sub.k(n) on the other side, separately perform ratio processing on a preset HRTF left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a preset HRTF right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of each sound input signal s.sub.2.sub.k(n) on the other side, to obtain a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side, separately perform convolution filtering on each sound input signal s.sub.2.sub.k(n) on the other side and the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, to obtain the filtered signal s.sub.2.sub.k.sup.h(n) on the other side, and synthesize all of the sound input signals s.sub.1.sub.m(n) on the one side and all of the filtered signals s.sub.2.sub.k.sup.h(n) on the other side into a virtual stereo signal s.sup.1(n).

(161) Further, the processor 810 acquires the at least one sound input signal s.sub.1.sub.m(n) on the one side and the at least one sound input signal s.sub.2.sub.k(n) on the other side, where s.sub.1.sub.m(n) represents the m.sup.th sound input signal on the one side, and s.sub.2.sub.k(n) represents the k.sup.th sound input signal on the other side.

(162) The processor 810 is configured to separately perform ratio processing on a preset HRTF left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and a preset HRTF right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of each sound input signal s.sub.2.sub.k(n) on the other side, to obtain a filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of each sound input signal on the other side.

(163) Further optimized, the processor 810 separately uses the frequency-domain representation of the preset HRTF left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) of each sound input signal on the other side, obtained after diffuse-field equalization and subband smoothing are performed in sequence, as a left-ear frequency domain parameter of each sound input signal on the other side, and separately uses the frequency-domain representation of the preset HRTF right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of each sound input signal on the other side, obtained after diffuse-field equalization and subband smoothing are performed in sequence, as a right-ear frequency domain parameter of each sound input signal on the other side. A manner in which the processor 810 performs diffuse-field equalization and subband smoothing is the same as that of the processing unit in the foregoing implementation manner. Refer to related text descriptions, and details are not described herein.

(164) The processor 810 separately uses a ratio of the left-ear frequency domain parameter of the sound input signal on the other side to the right-ear frequency domain parameter of the sound input signal on the other side as a frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side. Further, a modulus of the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side is obtained according to

(165) $|H_{\theta_{k},\varphi_{k}}^{c}(n)| = \frac{|\hat{H}_{\theta_{k},\varphi_{k}}^{l}(n)|}{|\hat{H}_{\theta_{k},\varphi_{k}}^{r}(n)|},$
an argument of the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) is obtained according to arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n))=arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n))−arg(H.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n)), and therefore the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side is obtained. |Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n)| and |Ĥ.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n)| respectively represent a left-ear component and a right-ear component of the subband-smoothed preset HRTF data |Ĥ.sub.θ.sub.k.sub.,φ.sub.k(n)|, and H.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and H.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) respectively represent a left-ear component and a right-ear component of the frequency domain H.sub.θ.sub.k.sub.,φ.sub.k(n) of the diffuse-field-equalized preset HRTF data.
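The ratio processing can be sketched per frequency bin as follows: the modulus is taken from the subband-smoothed magnitudes, and the argument from the diffuse-field-equalized spectra. The function name `ratio_filter` and the list-based representation are assumptions, and the sketch assumes the right-ear magnitude is non-zero in every bin.

```python
import cmath


def ratio_filter(H_l, H_r, mag_l, mag_r):
    """Build H_c bin by bin from a modulus ratio and an argument difference.

    H_l, H_r     : complex diffuse-field-equalized left/right spectra
    mag_l, mag_r : subband-smoothed left/right magnitudes (mag_r > 0)
    """
    H_c = []
    for hl, hr, ml, mr in zip(H_l, H_r, mag_l, mag_r):
        modulus = ml / mr                              # |H_c| = |H_l^| / |H_r^|
        argument = cmath.phase(hl) - cmath.phase(hr)   # arg(H_c) = arg(H_l) - arg(H_r)
        H_c.append(cmath.rect(modulus, argument))
    return H_c
```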

(166) The processor 810 separately performs minimum phase filtering on the frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, then transforms the frequency-domain filtering function to a time-domain function, and uses the time-domain function as the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side. The obtained frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) may be expressed as a position-independent delay plus a minimum phase filter. Minimum phase filtering is performed on the obtained frequency-domain filtering function H.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) in order to reduce a data length and reduce calculation complexity during virtual stereo synthesis, and additionally, subjective perception is not affected. A specific manner in which the processor 810 performs minimum phase filtering is the same as that of the transformation unit in the foregoing implementation manner. Refer to related text descriptions, and details are not described herein.
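For illustration only, the following sketch shows one common way to obtain a minimum phase time-domain function from a magnitude response, namely the homomorphic (real cepstrum) method. This technique is swapped in as an assumption: the specific procedure of the transformation unit is described elsewhere in this application and may differ. The names `dft` and `minimum_phase` are hypothetical, and the sketch assumes an even number of frequency bins with strictly positive magnitudes.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def minimum_phase(magnitude):
    """Return a minimum-phase impulse response whose DFT magnitude matches the input."""
    n = len(magnitude)
    # Real cepstrum: inverse DFT of the log magnitude.
    cepstrum = [v.real for v in dft([math.log(m) for m in magnitude], inverse=True)]
    # Fold the cepstrum onto non-negative quefrencies (causal part only).
    folded = [0.0] * n
    folded[0] = cepstrum[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cepstrum[i]
    folded[n // 2] = cepstrum[n // 2]
    # Exponentiate the spectrum of the folded cepstrum and return to the time domain.
    spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(spectrum, inverse=True)]
```

Because the energy of a minimum phase response is concentrated at its start, the result can usually be truncated to a shorter length, which is consistent with the stated goal of reducing the data length.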

(167) It should be noted that, the foregoing example in which the processor obtains the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side is used as an optimal manner, in which diffuse-field equalization, subband smoothing, ratio calculation, and minimum phase filtering are performed in sequence on the left-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.l(n) and the right-ear component h.sub.θ.sub.k.sub.,φ.sub.k.sup.r(n) of the preset HRTF data of the sound input signal on the other side, to obtain the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side. However, in another implementation manner, diffuse-field equalization, subband smoothing, and minimum phase filtering are selectively performed. The step of subband smoothing is generally set together with the step of minimum phase filtering, that is, if the step of minimum phase filtering is not performed, the step of subband smoothing is not performed. The step of subband smoothing is added before the step of minimum phase filtering, which further reduces the data length of the obtained filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the sound input signal on the other side, and therefore further reduces calculation complexity during virtual stereo synthesis.

(168) The processor 810 is configured to separately perform reverberation processing on each sound input signal s.sub.2.sub.k(n) on the other side and then use the processed signal as a sound reverberation signal ŝ.sub.2.sub.k(n) on the other side, to enhance filtering effects such as environment reflection and scattering during actual sound broadcasting, and enhance a sense of space of the input signal. In this implementation manner, reverberation processing is implemented using an all-pass filter. A specific manner in which the processor 810 performs reverberation processing is the same as that of the reverberation processing module in the foregoing implementation manner. Refer to related text descriptions, and details are not described herein.

(169) The processor 810 is configured to separately perform convolution filtering on each sound reverberation signal ŝ.sub.2.sub.k(n) on the other side and the filtering function h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) of the corresponding sound input signal on the other side, to obtain a filtered signal s.sub.2.sub.k.sup.h(n) on the other side. After receiving all the sound reverberation signals ŝ.sub.2.sub.k(n) on the other side, the processor 810 performs convolution filtering on each sound reverberation signal ŝ.sub.2.sub.k(n) on the other side according to a formula s.sub.2.sub.k.sup.h(n)=conv(h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n),ŝ.sub.2.sub.k(n)), to obtain the filtered signal s.sub.2.sub.k.sup.h(n) on the other side, where s.sub.2.sub.k.sup.h(n) represents the k.sup.th filtered signal on the other side, h.sub.θ.sub.k.sub.,φ.sub.k.sup.c(n) represents a filtering function of the k.sup.th sound input signal on the other side, and ŝ.sub.2.sub.k(n) represents the k.sup.th sound reverberation signal on the other side.

(170) The processor 810 is configured to summate all of the sound input signals s.sub.1.sub.m(n) on the one side and all of the filtered signals s.sub.2.sub.k.sup.h(n) on the other side to obtain a synthetic signal ŝ.sup.1(n).

(171) Further, the processor 810 obtains the synthetic signal s.sup.1(n) corresponding to the one side according to a formula

(172) $\overline{s}^{1}(n) = \sum_{m=1}^{M} s_{1_{m}}(n) + \sum_{k=1}^{K} s_{2_{k}}^{h}(n).$
For example, if the sound input signal on the one side is a left-side sound input signal, a left-ear synthetic signal is obtained, or if the sound input signal on the one side is a right-side sound input signal, a right-ear synthetic signal is obtained.

(173) The processor 810 is configured to perform, using a fourth-order IIR filter, timbre equalization on the synthetic signal s.sup.1(n) and then use the timbre-equalized synthetic signal as a virtual stereo signal s.sup.1(n). A specific manner in which the processor 810 performs timbre equalization is the same as that of the timbre equalization unit in the foregoing implementation manner. Refer to related text descriptions, and details are not described herein.

(174) In this implementation manner, which is used as an optimized implementation manner, reverberation processing, convolution filtering operation, virtual stereo synthesis, and timbre equalization are performed in sequence, to finally obtain a left-ear or right-ear virtual stereo signal. However, in another implementation manner, the processor may not perform reverberation processing and/or timbre equalization, which is not limited herein.

(175) By means of the foregoing solutions, in this application, ratio processing is performed on the left-ear and right-ear components of the preset HRTF data of each sound input signal on the other side, to obtain a filtering function that retains orientation information of the preset HRTF data. During synthesis of a virtual stereo, convolution filtering therefore needs to be performed on only the sound input signal on the other side using the filtering function, and the filtered signal on the other side and the original sound input signal on the one side are then synthesized to obtain the virtual stereo. Convolution filtering does not need to be performed simultaneously on the sound input signals on both sides, which greatly reduces calculation complexity. Because convolution processing is not performed on the sound input signal on the one side, the original audio is retained, which further alleviates a coloration effect and improves sound quality of the virtual stereo.

(176) In the several implementation manners provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiment is merely exemplary. For example, the module or unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.

(177) The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

(178) In addition, functional units in the embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in a form of hardware, or may be implemented in a form of a software functional unit.

(179) When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or all or a part of the technical solutions may be implemented in the form of a software product. The software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) or a processor to perform all or a part of the steps of the methods described in the implementation manners of this application. The foregoing storage medium includes any medium that can store program code, such as a universal serial bus (USB) flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
