Audio signal of an FM stereo radio receiver by using parametric stereo
09877132 · 2018-01-23
Assignee
Inventors
Cpc classification
H04S3/00
ELECTRICITY
H04S2420/03
ELECTRICITY
G10L19/008
PHYSICS
H04S5/00
ELECTRICITY
International classification
H04S3/00
ELECTRICITY
H04S5/00
ELECTRICITY
G10L19/008
PHYSICS
Abstract
The invention relates to a method for improving a stereo audio signal of an FM stereo radio receiver. The method comprises determining one or more parametric stereo parameters based on the stereo audio signal in a frequency-variant or frequency-invariant manner. Preferably, these PS parameters are time- and frequency-variant. Moreover, the method comprises generating the improved stereo signal based on a first audio signal and the one or more parametric stereo parameters. The first audio signal is obtained from the stereo audio signal, e.g. by a downmix operation.
Claims
1. A method for improving a left/right or mid/side audio signal output by a frequency modulation (FM) stereo radio receiver, the method comprising: receiving the left/right or mid/side audio signal from the FM stereo radio receiver; generating a first audio signal based on the left/right or mid/side audio signal by a downmix operation; determining one or more parametric stereo parameters based on the left/right or mid/side audio signal in a frequency-variant manner; receiving the first audio signal and outputting a decorrelated signal; and generating a stereo signal based on the first audio signal, the one or more parametric stereo parameters, and selectively on: a second audio signal or at least a frequency band thereof, the second audio signal being a received side signal or a residual signal, the residual signal indicating an error associated with representing the left/right or mid/side audio signal by the first audio signal and the one or more parametric stereo parameters, or the decorrelated signal, wherein: the generating the stereo signal selectively based on the second audio signal or the decorrelated signal is frequency-variant and uses: the second audio signal for a first frequency range and the decorrelated signal for a second frequency range, the frequencies of the first frequency range being lower than the frequencies of the second frequency range.
2. The method of claim 1, wherein the method further comprises generating a decorrelated signal based on the first audio signal, and the generating the stereo signal is based on the first audio signal, the one or more parametric stereo parameters, and the decorrelated signal or at least a frequency band thereof.
3. The method of claim 1, wherein the generating the first audio signal is according to the following formula:
(L+R)/a wherein L and R denote the left and right channels of a left/right audio signal and a is a real number.
4. The method of claim 1, wherein the first audio signal corresponds to a received mid signal.
5. The method of claim 1, further comprising deriving the second audio signal based on the left/right audio or mid/side audio signal.
6. The method of claim 1, wherein the generating the stereo signal selectively depends: on a radio reception indicator indicative of the radio reception condition, and/or on a quality indicator indicative of the quality of the received side signal.
7. The method of claim 1, wherein the one or more parametric stereo parameters include a parameter indicating a channel level difference and/or a parameter indicating an inter-channel cross-correlation.
8. The method of claim 1, further comprising: performing noise reduction of the first audio signal, and generating the stereo signal based on a noise reduced first audio signal and the one or more parametric stereo parameters.
9. The method of claim 1, further comprising: performing noise reduction on the left/right or mid/side audio signal, and generating the one or more parametric stereo parameters based on the noise reduced left/right or mid/side audio signal.
10. The method of claim 9, further comprising: obtaining the first audio signal from the noise reduced left/right or mid/side audio signal.
11. The method of claim 1, further comprising: determining a noise parameter characteristic for the noise power of the received side signal; and determining the one or more parametric stereo parameters based on the left/right or mid/side audio signal and the noise parameter in a frequency-variant or frequency-invariant manner.
12. The method of claim 1, further comprising: noticing that the FM stereo receiver selects mono output of the stereo radio signal or noticing poor radio reception; and using one or more upmix parameters for blind upmix in case that the FM stereo receiver selecting mono output of the stereo radio signal is noticed or poor reception is noticed.
13. The method of claim 12, wherein the one or more upmix parameters for blind upmix are one or more preset upmix parameters.
14. The method of claim 12, further comprising: detecting whether the left/right or mid/side audio signal is predominantly speech, the one or more upmix parameters for blind upmix being dependent on said detection.
15. The method of claim 1, further comprising: noticing that the FM stereo receiver selects mono output of the stereo radio signal or noticing poor radio reception; and when the FM stereo receiver switches to mono output or poor radio reception is noticed, the generating the stereo signal uses one or more upmix parameters which are based on one or more previously estimated parametric stereo parameters from the determining.
16. The method of claim 15, wherein the generating the stereo signal continues to use the one or more previously estimated parametric stereo parameters as upmix parameters when the FM stereo receiver switches to mono output or poor radio reception occurs.
17. The method of claim 1, further comprising selecting the normal stereo mode in a frequency-variant manner.
18. The method of claim 1, wherein the determining one or more parametric stereo parameters is carried out with error compensation.
Description
DESCRIPTION OF DRAWINGS
(1) The invention is explained below by way of illustrative examples with reference to the accompanying drawings.
DETAILED DESCRIPTION
(14)
(15) Instead of using a left/right representation at the output of the FM receiver 1 and the input of the apparatus 2, a mid/side representation may be used at the interface between the FM receiver 1 and the apparatus 2 (see M, S in
(16) Optionally, a signal strength signal 6 indicating the radio reception condition may be used for adapting the audio processing in the audio processing apparatus 2. This will be explained later in this specification.
(17) The combination of the FM radio receiver 1 and the audio processing apparatus 2 corresponds to an FM radio receiver having an integrated noise reduction system.
(18)
(19) An audio signal DM is obtained from the input signal. In case the input audio signal already uses a mid/side representation, the audio signal DM may directly correspond to the mid signal. In case the input audio signal has a left/right representation, the audio signal DM is generated by downmixing the input audio signal. Preferably, the resulting signal DM after downmix corresponds to the mid signal M and may be generated by the following equation:
DM=(L+R)/a, e.g. with a=2,
i.e. the downmix signal DM may correspond to the average of the L and R signals. For different values of a, the average of the L and R signals is amplified or attenuated.
(20) The apparatus further comprises an upmix stage 4 also called stereo mixing module or stereo upmixer. The upmix stage 4 is configured to generate a stereo signal L, R based on the audio signal DM and the PS parameters 5. Preferably, the upmix stage 4 does not only use the DM signal but also uses a side signal or some kind of pseudo side signal (not shown). This will be explained later in the specification in connection with more extended embodiments in
(21) The apparatus 2 is based on the idea that, due to its noise, the received side signal may be too noisy for reconstructing the stereo signal by simply combining the received mid and side signals; nevertheless, in this case the side signal or the side signal's component in the L/R signal may still be good enough for stereo parameter analysis in the PS parameter estimation stage 3. The resulting PS parameters 5 can then be used for generating a stereo signal L, R having a reduced level of noise in comparison to the audio signal directly at the output of the FM receiver 1.
(22) Thus, a bad FM radio signal can be cleaned up by using the parametric stereo concept. The major part of the distortion and noise in an FM radio signal is located in the side channel, which may be left out of the PS downmix. Nevertheless, even in case of bad reception the side channel is often of sufficient quality for PS parameter extraction.
(23) In all the following drawings, the input signal to the audio processing apparatus 2 is a left/right stereo signal. With minor modifications to some modules within the audio processing apparatus 2, the audio processing apparatus 2 can also process an input signal in mid/side representation. Therefore, the concepts discussed herein can be used in connection with an input signal in mid/side representation.
(24)
(25) The PS encoder 7 generates, based on the stereo audio input signal L, R, the audio signal DM and the PS parameters 5. Optionally, the PS encoder 7 further uses a signal strength signal 6. The audio signal DM is a mono downmix and preferably corresponds to the received mid signal. When summing the L/R channels to form the DM signal, the information of the received side channel may be completely excluded from the DM signal. Thus, in this case only the mid information is contained in the mono downmix DM. Hence, any noise from the side channel may be excluded from the DM signal. However, the side channel is part of the stereo parameter analysis in the encoder 7, as the encoder 7 typically takes L=M+S and R=M-S as input (consequently, DM=(L+R)/2=M).
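The cancellation of side-channel noise in the downmix can be checked numerically. The following sketch is illustrative only: the function name `downmix` and the sample values are assumptions made for this example and are not part of the specification.

```python
def downmix(left, right, a=2.0):
    """Return DM = (L + R) / a sample by sample (a = 2 gives the mid signal)."""
    return [(l + r) / a for l, r in zip(left, right)]


# Hypothetical mid, side and side-noise samples, chosen for illustration.
mid = [0.5, -0.25, 0.125]
side = [0.25, 0.5, -0.125]
noise = [0.0625, -0.125, 0.25]

# Received channels: the noise n is carried by the side signal only.
left = [m + (s + n) for m, s, n in zip(mid, side, noise)]
right = [m - (s + n) for m, s, n in zip(mid, side, noise)]

# The side signal and its noise cancel in the sum, leaving the mid signal.
dm = downmix(left, right)
```

With a=2 the list `dm` equals `mid`, although both input channels carry the side noise.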
(26) Experimental results indicate that a received side signal that contains intermediate levels of noise may not be good enough for reconstructing stereo itself but can be good enough for stereo parameter analysis in a PS encoder 7.
(27) The mono signal DM and the PS parameters 5 are used subsequently in the PS decoder 8 to reconstruct the stereo signal L, R.
(28)
(29) The use of a residual signal in a PS encoder/decoder is described e.g. in the MPEG Surround standard (see document ISO/IEC 23003-1:2007, MPEG Surround) and in the paper "MPEG Surround - The ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding", J. Herre et al., Audio Engineering Convention Paper 7084, 122nd Convention, May 5-8, 2007.
(30)
(31) The PS parameter estimation stage 3 may estimate as PS parameters 5 the correlation and the level difference between the L and R inputs. Optionally, the parameter estimation stage 3 receives the signal strength 6, which may be the signal power at the FM receiver. This information can be used to judge the reliability of the PS parameters 5, which is low e.g. in case of a low signal strength 6. In case of a low reliability, the PS parameters 5 may be set such that the output signal L, R is a mono output signal or a pseudo stereo output signal. In case of a mono output signal, the output signal L is equal to the output signal R. In case of a pseudo stereo output signal, default PS parameters may be used to generate a pseudo or default stereo output signal L, R.
(32) The PS decoder module 8 comprises a stereo mixing matrix 4a and a decorrelator 10. The decorrelator 10 receives the mono downmix DM and generates a decorrelated signal S which is used as a pseudo side signal. The decorrelator 10 may be realized by an appropriate all-pass filter as discussed in section 4 of the cited document "Low Complexity Parametric Stereo Coding in MPEG-4". The stereo mixing matrix 4a is a 2×2 upmix matrix in this embodiment.
(33) Dependent upon the estimated parameters 5, the matrix 4a mixes the DM signal with the received side signal S.sub.0 or the decorrelated signal S to create the stereo output signals L and R. The selection between the signal S.sub.0 and the signal S may depend on a radio reception indicator indicative of the reception conditions, such as the signal strength 6. One may instead or in addition use a quality indicator indicative of the quality of the received side signal. One example of such a quality indicator may be an estimated noise (power) of the received side signal. In case of a side signal comprising a high degree of noise, the decorrelated signal S may be used to create the stereo output signal L and R, whereas in low noise situations, the side signal S.sub.0 may be used. Various embodiments for estimating the noise of the received side signal are discussed later in this specification.
(34) As an example, in case of good reception conditions (i.e. the signal strength is high), the signal S.sub.0 is used for upmixing, whereas in case of bad conditions the upmixing is based on the decorrelated signal S. Preferably, the decision whether the stereo mixing module 4 uses the received side signal S.sub.0 or S is frequency dependent, e.g. for lower frequencies the received side signal S.sub.0 is used and for higher frequencies the decorrelated signal S is used. This will be discussed more in detail in connection with
(35) The frequency-variant or frequency-invariant selection between the signal S.sub.0 and the signal S may be done in the upmix stage 4 (e.g. by selector means in the upmix stage 4 which are controlled e.g. in dependency of the signal strength 6). Alternatively, the selection may be performed in the parameter estimation stage 3 (e.g. in dependency of the signal strength 6), and the parameter estimation stage 3 then sends upmix parameters to the upmix stage 4 that cause the respectively selected signal (either S.sub.0 or S) to be used for the upmix; e.g. in case of selecting S, the upmix parameters relating to the signal S.sub.0 are set to zero and the parameters relating to S are not set to zero. Alternatively, a selection signal (not shown) may be sent to the upmix stage 4.
(36) The upmix operation is preferably carried out according to the following matrix equation:
(37) $\begin{pmatrix} L \\ R \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} DM \\ S \end{pmatrix}$
(38) Here, the weighting factors α, β, γ, δ determine the weighting of the signals DM and S. The mono downmix DM preferably corresponds to the received mid signal. The signal S in the formula corresponds either to the decorrelated signal S or to the received side signal S.sub.0. The upmix matrix elements, i.e. the weighting factors α, β, γ, δ, may be derived e.g. as shown in the cited paper "Low Complexity Parametric Stereo Coding in MPEG-4" (see section 2.2), as shown in the cited MPEG-4 standardization document ISO/IEC 14496-3:2005 (see section 8.6.4.6.2) or as shown in the MPEG Surround specification document ISO/IEC 23003-1 (see section 6.5.3.2). These sections of the documents (and also sections referred to in these sections) are hereby incorporated by reference for all purposes.
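As a minimal sketch of the upmix operation, the four weighting factors can be applied sample by sample to the signals DM and S. The names `alpha`, `beta`, `gamma`, `delta` and the row order of the matrix are assumptions of this example; in practice the weights would be derived from the PS parameters 5 and may vary over time and frequency.

```python
def upmix(dm, s, alpha, beta, gamma, delta):
    """Apply a 2x2 upmix matrix to each (DM, S) sample pair.

    Returns (left, right) with left = alpha*DM + beta*S and
    right = gamma*DM + delta*S.
    """
    left = [alpha * d + beta * x for d, x in zip(dm, s)]
    right = [gamma * d + delta * x for d, x in zip(dm, s)]
    return left, right


# Hypothetical samples: with weights (1, 1, 1, -1) and S being the received
# side signal, the matrix reduces to L = DM + S0 and R = DM - S0.
dm = [0.5, -0.25]
s0 = [0.25, 0.5]
left, right = upmix(dm, s0, 1.0, 1.0, 1.0, -1.0)
```

Setting `beta` and `delta` to zero instead yields a mono output, i.e. both channels carry a scaled copy of DM.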
(39) Preferably, the selection between S and S.sub.0 is frequency dependent. This is shown in
(40) If the received side signal S.sub.0 corresponds to S.sub.0=(L-R)/2, so that L=M+S.sub.0 and R=M-S.sub.0, the mono downmix DM should preferably correspond to (L+R)/2; this allows perfect reconstruction, i.e. the stereo output signal equals the stereo input signal L, R.
(41) Instead of using a PS upmixer using the received side signal S.sub.0, a generalized PS upmixer using a residual signal may be used. The resulting signals L, R are a function of the PS parameters, the residual signal and the mono downmix.
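The frequency-dependent choice between the received side signal and the decorrelated signal can be sketched on signals that have already been split into frequency bands. The band representation (a list of bands ordered from low to high frequency), the function name and the single crossover index are assumptions of this example; the specification does not prescribe them.

```python
def select_side_bands(received_side, decorrelated, crossover_band):
    """Pick the side signal per frequency band.

    Both inputs are lists of bands ordered from low to high frequency; each
    band is a list of samples. Bands below crossover_band use the received
    side signal, the remaining bands use the decorrelated signal.
    """
    if len(received_side) != len(decorrelated):
        raise ValueError("both inputs need the same number of bands")
    return [
        received_side[k] if k < crossover_band else decorrelated[k]
        for k in range(len(received_side))
    ]


# Hypothetical three-band example: the two lowest bands keep the received side.
s0_bands = [[1, 2], [3, 4], [5, 6]]
dec_bands = [[9, 9], [8, 8], [7, 7]]
mixed = select_side_bands(s0_bands, dec_bands, crossover_band=2)
```

A crossover index of zero selects the decorrelated signal in every band, whereas an index equal to the number of bands selects the received side signal throughout.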
(42)
(43)
(44) The mono downmix signal DM may be generated by adding the L, R channels with same weighting factors (e.g. using weighting factors of 1 or using weighting factors of ½). The signal DM then corresponds to the received mid signal. When using weighting factors of ½, the amplitude of the signal DM is half of the amplitude of the signal DM in case when using weighting factors of 1.
(45) Optionally, some form of noise reduction may be also applied to the signal L/R or the signal DM (and/or the S.sub.0 signal if used). E.g. some noise reduction may be applied to the signal DM (see the optional noise reduction stage 11 in
(46) In certain reception conditions, the FM receiver 1 only provides a mono signal, with the conveyed side signal being muted. This will typically happen when the reception conditions are very bad and the side signal is very noisy. In case the FM stereo receiver 1 has switched to mono playback of the stereo radio signal, the upmix stage preferably uses upmix parameters for blind upmix, such as preset upmix parameters, and generates a pseudo stereo signal, i.e. the upmix stage generates a stereo signal using the upmix parameters for blind upmix.
(47) There are also embodiments of the FM stereo receiver 1 which switch at too poor reception conditions to mono playback. If the reception conditions are too poor for estimation of reliable PS parameters 5, the upmix stage preferably uses upmix parameters for blind upmix and generates a pseudo stereo signal based thereon.
(48)
(49) Optionally, a speech detector 14 may be added to indicate if the received signal is predominantly speech or music. Such speech detector 14 allows for signal dependent blind upmix. E.g. such a speech detector 14 may allow for signal dependent upmix parameters. Preferably, one or more upmix parameters may be used for speech and different one or more upmix parameters may be used for music. Such a speech detector 14 may be realized by a Voice Activity Detector (VAD). Strictly speaking, the upmix stage 4 in
(50)
(51) The same approach of using upmix parameters based on the previously estimated PS parameters can be also applied if the FM receiver 1 provides a noisy stereo signal during a short period of time, with the noisy stereo signal being too bad to estimate reliable PS parameters based thereon.
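The fallback behaviour described in the preceding paragraphs can be sketched as a small state holder. This is an illustration under stated assumptions: the class name, the `reliable` flag and the order of the fallbacks (previously estimated parameters first, preset parameters otherwise) are choices of this example, not requirements of the specification.

```python
class UpmixParameterSource:
    """Supplies upmix parameters, falling back when estimation is unreliable."""

    def __init__(self, preset):
        self.preset = preset    # used for blind upmix before any estimate exists
        self.last_good = None   # most recent reliable PS parameters

    def update(self, estimated, reliable):
        """Return the parameters to use for the current frame."""
        if reliable:
            self.last_good = estimated
            return estimated
        # Mono output or poor reception: reuse an earlier estimate if there is one.
        if self.last_good is not None:
            return self.last_good
        return self.preset


# Hypothetical parameter sets, represented as dictionaries for illustration.
source = UpmixParameterSource(preset={"cld": 0.0, "icc": 0.5})
first = source.update({"cld": 9.0, "icc": 0.1}, reliable=False)
second = source.update({"cld": 3.0, "icc": 0.8}, reliable=True)
third = source.update({"cld": -7.0, "icc": 0.0}, reliable=False)
```

Here `first` is the preset, `second` is the fresh estimate, and `third` repeats the last reliable estimate.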
(52) In the following, an advanced PS parameter estimation stage 3 providing error compensation is discussed with reference to
(53) When assuming that the noise in the side signal is independent of the mid signal: the ICC values get closer to 0 in comparison to the ICC values estimated based on a noiseless stereo signal, and the CLD values in decibel get closer to 0 dB in comparison to the CLD values estimated based on a noiseless stereo signal.
(54) For compensation of the error in the PS parameters the apparatus 2 preferably has a noise estimate stage which is configured to determine a noise parameter characteristic for the power of the noise of the received side signal that was caused by the (bad) radio transmission. The noise parameter is considered when estimating the PS parameters. This may be implemented as shown in
(55) According to
(56) The actual noisy stereo input signal values l.sub.w/noise and r.sub.w/noise, which are input to the inner PS parameter estimation stage 3 shown in
$l_{\text{w/noise}} = m + (s + n) = l_{\text{w/o noise}} + n$
$r_{\text{w/noise}} = m - (s + n) = r_{\text{w/o noise}} - n$
(57) It should be noted that here the received side signal is modeled as s+n, where s is the original (undistorted) side signal, and n is the noise (distortion signal) caused by the radio transmission channel. Furthermore, it is assumed here that the signal m is not distorted by noise from the radio transmission channel.
(58) Thus, the corresponding input powers L.sub.w/noise.sup.2, R.sub.w/noise.sup.2 and the cross correlation L.sub.w/noiseR.sub.w/noise can be written as:
$L_{\text{w/noise}}^2 = E(l_{\text{w/noise}}^2) = E((m+s)^2) + E(n^2) = L_{\text{w/o noise}}^2 + N^2$
$R_{\text{w/noise}}^2 = E(r_{\text{w/noise}}^2) = E((m-s)^2) + E(n^2) = R_{\text{w/o noise}}^2 + N^2$
$L_{\text{w/noise}}R_{\text{w/noise}} = E(l_{\text{w/noise}} \cdot r_{\text{w/noise}}) = E((l_{\text{w/o noise}} + n) \cdot (r_{\text{w/o noise}} - n)) = L_{\text{w/o noise}}R_{\text{w/o noise}} - N^2$
with the side signal noise power estimate $N^2 = E(n^2)$, where $E(\cdot)$ is the expectation operator.
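The sign of the noise term in the cross-correlation follows from expanding the product. The short derivation below is added for illustration and assumes, in line with the noise model above, that the noise n is uncorrelated with both noise-free channel signals, so that the two mixed terms vanish.

```latex
\begin{aligned}
E\big((l_{\text{w/o noise}} + n)(r_{\text{w/o noise}} - n)\big)
  &= E(l_{\text{w/o noise}}\, r_{\text{w/o noise}})
     - E(l_{\text{w/o noise}}\, n)
     + E(n\, r_{\text{w/o noise}})
     - E(n^2) \\
  &= L_{\text{w/o noise}} R_{\text{w/o noise}} - N^2
\end{aligned}
```

The same assumption removes the mixed terms in the two power expressions, which is why the noise power is added there and subtracted here.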
(59) By rearranging the above equations, the corresponding compensated powers and cross-correlation without noise can be determined to be:
$L_{\text{w/o noise}}^2 = L_{\text{w/noise}}^2 - N^2$
$R_{\text{w/o noise}}^2 = R_{\text{w/noise}}^2 - N^2$
$L_{\text{w/o noise}}R_{\text{w/o noise}} = L_{\text{w/noise}}R_{\text{w/noise}} + N^2$
(60) An error-compensated PS parameter extraction based on the compensated powers and cross correlation may be carried out as given by the formulas below:
$\mathrm{CLD} = 10 \cdot \log_{10}(L_{\text{w/o noise}}^2 / R_{\text{w/o noise}}^2)$
$\mathrm{ICC} = (L_{\text{w/o noise}}R_{\text{w/o noise}}) / (L_{\text{w/o noise}}^2 + R_{\text{w/o noise}}^2)$
(61) Such a parameter extraction compensates for the estimated N.sup.2 term in the calculation of the PS parameters.
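The error-compensated extraction can be sketched for a single block of samples as follows. Replacing the expectation by a block average, the function name and the small floor `eps` that keeps the logarithm and the division defined are assumptions of this example; the CLD and ICC expressions themselves follow the formulas above.

```python
import math


def compensated_ps_parameters(left, right, noise_power):
    """Estimate CLD (in dB) and ICC from noisy L/R samples.

    noise_power is the side-signal noise power estimate N^2. It is
    subtracted from both channel powers and added to the cross-correlation
    before CLD and ICC are computed.
    """
    n = len(left)
    p_left = sum(x * x for x in left) / n
    p_right = sum(x * x for x in right) / n
    cross = sum(x * y for x, y in zip(left, right)) / n

    # Floor the compensated powers (the floor value is an assumption).
    eps = 1e-12
    p_left = max(p_left - noise_power, eps)
    p_right = max(p_right - noise_power, eps)
    cross = cross + noise_power

    cld = 10.0 * math.log10(p_left / p_right)
    icc = cross / (p_left + p_right)
    return cld, icc


# Hypothetical block: constant mid signal, no side signal, alternating noise.
mid = [1.0, 1.0, 1.0, 1.0]
noise = [0.5, -0.5, 0.5, -0.5]
noisy_left = [m + n for m, n in zip(mid, noise)]
noisy_right = [m - n for m, n in zip(mid, noise)]

clean = compensated_ps_parameters(mid, mid, 0.0)
compensated = compensated_ps_parameters(noisy_left, noisy_right, 0.25)
```

In this constructed case the noise power is exactly 0.25 and the noise averages to zero against the mid signal, so `compensated` matches the parameters of the noise-free block.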
(62) In
(63) A variety of methods can be used for determining the side signal noise power N.sup.2, e.g.:
- When detecting power minima of the mid signal (e.g. pauses in speech), it can be assumed that the power of the side signal is noise only (i.e. the power of the side signal corresponds to N.sup.2 in these situations).
- The N.sup.2 estimate can be defined by a function of the signal strength data 6. The function (or lookup table) can be designed by experimental (physical) measurements.
- The N.sup.2 estimate can be defined by a function of the signal strength data 6 and/or the audio input signals (L and R). The function can be designed by heuristic rules.
- The N.sup.2 estimate can be based on studying the signal type coherence of the mid and side signals. The original mid and side signals can e.g. be assumed to have similar tonality-to-noise ratio or crest factor or other power envelope characteristics. Deviations of those properties can be used to indicate a high level of N.sup.2.
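The first of these methods, estimating the noise power during pauses of the mid signal, may be sketched as follows. The frame-based processing, the function name and the fixed `pause_threshold` are assumptions of this example; a practical detector would likely track the power minima adaptively.

```python
def estimate_side_noise_power(mid_frames, side_frames, pause_threshold):
    """Estimate N^2 as the mean side power over frames where the mid signal pauses.

    mid_frames and side_frames are equally long lists of sample lists.
    A frame counts as a pause when its mid power is below pause_threshold.
    Returns None when no pause frame is found.
    """
    def power(frame):
        return sum(x * x for x in frame) / len(frame)

    pause_powers = [
        power(side)
        for mid, side in zip(mid_frames, side_frames)
        if power(mid) < pause_threshold
    ]
    if not pause_powers:
        return None
    return sum(pause_powers) / len(pause_powers)


# Hypothetical frames: the second frame is a pause in the mid signal.
mid_frames = [[1.0, 1.0], [0.0, 0.0]]
side_frames = [[0.5, 0.5], [0.5, -0.5]]
n2 = estimate_side_noise_power(mid_frames, side_frames, pause_threshold=0.01)
```

Returning `None` when no pause is found leaves it to the caller to keep an earlier estimate or to fall back to one of the other methods.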
(64) In the following further preferred embodiments of the audio processing apparatus 2 are discussed.
(65) Preferably, the apparatus 2 is configured in such a way that for received side signals with practically only noise, the apparatus 2 smoothly switches to pseudo stereo (blind upmix) operation, as illustrated in
(66) For side signals with almost no noise, the apparatus 2 preferably switches smoothly to normal stereo operation instead of parametric stereo operation. In normal stereo operation, the signal improvement functionality of the apparatus 2 is essentially deactivated. For deactivation, the audio signal at the input of the apparatus 2 may be essentially fed through to the output of the apparatus 2.
(67) Alternatively, the normal stereo operation may be accomplished by using the received side signal S.sub.0, as illustrated in
L=DM+S.sub.0, R=DM-S.sub.0,
in case DM=M=(L+R)/2 and S.sub.0=(L-R)/2.
(68) More preferably, the normal stereo mode or the parametric stereo mode may be selected in a frequency-variant manner, i.e. the selection may be different for the different frequency bands. This is useful since the signal-to-noise ratio for the received side signal gets worse for higher frequencies.
(69) The smooth switching between different operation modes may be adapted dynamically to the current reception conditions, in order to provide always the best possible stereo signal at the output of the apparatus 2. In case of a high signal-to-noise ratio normal FM stereo operation (without noise reduction based on PS processing) is preferred, whereas in case of a low signal-to-noise ratio PS processing greatly improves the stereo signal.
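One conceivable way to realize such smooth switching is a crossfade controlled by the signal-to-noise ratio. The linear weighting curve, the function names and the threshold values `low_db` and `high_db` are hypothetical choices for this sketch and are not taken from the specification; the weight could also be computed separately for each frequency band.

```python
def stereo_mode_weight(snr_db, low_db=10.0, high_db=30.0):
    """Return the weight of normal stereo operation in the range 0..1.

    Below low_db only parametric stereo is used (weight 0), above high_db
    only normal stereo (weight 1); in between the weight rises linearly.
    """
    if high_db <= low_db:
        raise ValueError("high_db must be greater than low_db")
    t = (snr_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, t))


def crossfade(normal, parametric, weight):
    """Blend two equally long sample lists with the given weight."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(normal, parametric)]


# Hypothetical example: an SNR halfway between the thresholds blends equally.
w = stereo_mode_weight(20.0)
blended = crossfade([1.0, 0.0], [0.0, 1.0], w)
```

Because the weight changes continuously with the signal-to-noise ratio, the output does not jump when the reception conditions cross a threshold.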
(70) Preferably, the generation of the mono downmix DM in the PS encoder 7 should be done such that as little noise as possible from the side signal leaks into the mono downmix DM. This can require different downmix techniques than those typically used in a PS encoder (such as an MPEG-4 PS encoder), which is normally employed in the context of a very low bitrate coding system. The downmix can be as simple as a fixed (non-adaptive) downmix DM=M=(L+R)/2, where the downmix simply corresponds to the mid signal. Furthermore, the upmix in the PS decoder 8 is typically adapted to the actual downmix technique used in the PS encoder 7.
(71) It should be noted that although in several drawings the PS encoder 7 and the PS decoder 8 are shown as separate modules, it is of course advantageous in the context of an efficient implementation to merge PS encoder 7 and the PS decoder 8 as much as possible.
(72) The concepts discussed herein can be implemented in connection with any encoder using PS techniques, e.g. an HE-AAC v2 (High-Efficiency Advanced Audio Coding version 2) encoder as defined in the standard ISO/IEC 14496-3 (MPEG-4 Audio), an encoder based on MPEG Surround or an encoder based on MPEG USAC (Unified Speech and Audio coder) as well as encoders which are not covered by MPEG standards.
(73) In the following, by way of example, an HE-AAC v2 encoder is assumed; nevertheless, the concepts may be used in connection with any audio encoder using PS techniques.
(74) HE-AAC is a lossy audio compression scheme. HE-AAC v1 (HE-AAC version 1) makes use of spectral band replication (SBR) to increase the compression efficiency. HE-AAC v2 further includes parametric stereo to enhance the compression efficiency of stereo signals at very low bitrates. An HE-AAC v2 encoder inherently includes a PS encoder to allow operation at very low bitrates. The PS encoder of such an HE-AAC v2 encoder can be used as the PS encoder 7 of the audio processing apparatus 2. In particular, the PS parameter estimating stage within a PS encoder of an HE-AAC v2 encoder can be used as the PS parameter estimating stage 3 of the audio processing apparatus 2. Also the downmix stage within a PS encoder of an HE-AAC v2 encoder can be used as the downmix stage 9 of the apparatus 2.
(75) Hence, the concept discussed in this specification can be efficiently combined with an HE-AAC v2 encoder to realize an improved FM stereo radio receiver. Such an improved FM stereo radio receiver may have an HE-AAC v2 recording feature since the HE-AAC v2 encoder outputs an HE-AAC v2 bitstream which can be stored for recording purposes. This is shown in
(76) Optionally, the PS encoder 7 may be modified for the purpose of FM radio noise reduction to support a fixed downmix scheme, such as a downmix scheme according to DM=(L+R)/a.
(77) The mono downmix DM and the PS parameters 5 may be fed to the PS decoder 8 to generate the stereo signal L, R as discussed above. The mono downmix DM is fed to an HE-AAC v1 encoder for perceptual encoding of the mono downmix DM. The resulting perceptually encoded audio signal and the PS information are multiplexed into an HE-AAC v2 bitstream 18. For recording purposes, the HE-AAC v2 bitstream 18 can be stored in a memory such as a flash memory or a hard disk.
(78) The HE-AAC v1 encoder 17 comprises an SBR encoder and an AAC encoder (not shown). The SBR encoder typically performs signal processing in the QMF (quadrature mirror filterbank) domain and thus needs QMF samples. In contrast, the AAC encoder typically needs time domain samples (typically downsampled by a factor 2).
(79) The PS encoder 7 within the HE-AAC v2 encoder 16 typically provides the downmix signal DM already in the QMF domain.
(80) Since the PS encoder 7 may already send the QMF domain signal DM to the HE-AAC v1 encoder, the QMF analysis transform in the HE-AAC v1 encoder for the SBR analysis can be made obsolete. Thus, the QMF analysis that is normally part of the HE-AAC v1 encoder can be avoided by providing the downmix signal DM as QMF samples. This reduces the computing effort and allows for complexity saving.
(81) The time domain samples for the AAC encoder may be derived from the input of the apparatus 2, e.g. by performing the simple operation DM=(L+R)/2 in the time domain and by downsampling the time domain signal DM. This approach is probably the cheapest approach. Alternatively, the apparatus 2 may perform a half-rate QMF synthesis of the QMF domain DM samples.
(82) It should be noted that the PS encoder and PS decoder can be partly merged if both are implemented in the same module.