Method and apparatus for level control in blending an audio signal in an in-band on-channel radio system
RE049210 · 2022-09-13
Assignee
Inventors
- Gabriel Olochwoszcz (Hillsborough, NJ, US)
- Ashwini Pahuja (Roslyn Heights, NY, US)
- Scott Vincelette (Sparta, NJ, US)
Cpc classification
H04H20/22
ELECTRICITY
H04H40/36
ELECTRICITY
H03G3/3005
ELECTRICITY
G10L19/167
PHYSICS
International classification
H04H20/22
ELECTRICITY
Abstract
A method for processing a digital audio broadcast signal includes: separating an analog audio portion and a digital audio portion of the digital audio broadcast signal; determining the loudness of the analog audio portion and the digital audio portion over a first short time interval; using the loudness of the analog and digital audio portions to calculate a short term average gain; determining a long term average gain; converting one of the long term average gain or the short term average gain to dB; if an output has been blended to digital, adjusting a digital gain parameter by a preselected increment to produce a digital gain parameter; if an output has not been blended to digital, setting the digital gain parameter to the short term average gain; providing the digital gain parameter to an audio processor; and repeating the above steps using a second short time interval.
Claims
1. A method for processing .[.a digital.]. .Iadd.an .Iaddend.audio .[.broadcast.]. signal .Iadd.including multiple audio streams using an audio signal receiver.Iaddend., the method comprising: (a) separating .[.an analog audio portion.]. .Iadd.a first audio sample stream .Iaddend.of the .[.digital.]. audio .[.broadcast.]. signal from a .[.digital audio portion.]. .Iadd.second audio sample stream .Iaddend.of the .[.digital.]. audio .[.broadcast.]. signal .Iadd.using the audio signal receiver.Iaddend.; (b) determining a loudness of the .[.analog audio portion.]. .Iadd.first audio sample stream .Iaddend.over a first short time interval; (c) determining a loudness of the .[.digital audio portion.]. .Iadd.second audio sample stream .Iaddend.over the first short time interval; (d) using the loudness of the .[.analog audio portion.]. .Iadd.first audio sample stream .Iaddend.and the loudness of the .[.digital audio portion.]. .Iadd.second audio sample stream .Iaddend.to calculate a short term average gain; (e) determining a long term average gain; (f) converting one of the long term average gain or the short term average gain to .[.dB.]. .Iadd.decibels (dB) .Iaddend.and providing the converted average gain to an audio processor; (g) .[.if an output of a receiver has been blended to digital,.]. .Iadd.performing one of blending an output of a receiver to the second audio sample stream and .Iaddend.adjusting a .[.digital.]. .Iadd.second audio sample stream .Iaddend.gain parameter toward the long term average gain by a preselected increment, .[.to produce an adjusted digital gain parameter; (h) if the output of the receiver has not been blended to digital.]. .Iadd.or not blending the output of the receiver to the second audio sample stream and .Iaddend.setting the .[.digital.]. .Iadd.second audio sample stream .Iaddend.gain parameter to the short term average gain; .[.(i).]. .Iadd.(h) .Iaddend.providing the .[.digital.]. 
.Iadd.second audio sample stream .Iaddend.gain parameter .[.adjusted by.]. .Iadd.from .Iaddend.step (g) .[.or step (h).]. to the audio processor; and .[.(j).]. .Iadd.(i) .Iaddend.repeating steps (a) through .[.(i).]. .Iadd.(h) .Iaddend.using a second short time interval.
2. The method of claim 1, wherein: the short term average gain is a linear ratio of analog audio power .Iadd.of the first audio sample stream .Iaddend.and digital audio power .Iadd.of the second audio sample stream.Iaddend..
3. The method of claim 1, wherein: the .[.digital.]. .Iadd.second audio sample stream .Iaddend.gain parameter is adjusted toward the short term average gain; and the preselected increment is 1 dB.
4. The method of claim 1, wherein: the long term average gain comprises a running average of the short term average gain.
5. The method of claim 1, wherein: the long term average gain is determined independently of the short term average gain.
6. The method of claim 1, wherein: the long term average gain is converted to dB if a long time interval has been met and the short term average gain is converted to dB if the long time interval has not been met.
7. The method of claim 6, wherein the long time interval comprises an integer multiple of the short time intervals.
8. The method of claim 6, wherein: the short time interval is in a range from 1 to 5 seconds and the long time interval is in a range from 5 to 30 seconds.
9. The method of claim 1, wherein: the .[.analog audio portion.]. .Iadd.first audio sample stream .Iaddend.comprises a stream of samples analog modulated program material; and the .[.digital audio portion.]. .Iadd.second audio sample stream .Iaddend.comprises a stream of samples digitally modulated program material.
10. The method of claim 1, wherein the measurements of the levels of the .[.analog audio portion.]. .Iadd.first audio sample stream .Iaddend.and the .[.digital audio portion.]. .Iadd.second audio sample stream .Iaddend.are performed in accordance with the International Telecommunications Union Recommendation (ITU-R) BS.1770 specification.
11. .[.A radio.]. .Iadd.An audio signal .Iaddend.receiver comprising: processing circuitry configured to: (a) separate .[.an analog.]. .Iadd.a first .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.of .[.the digital.]. .Iadd.an .Iaddend.audio .[.broadcast.]. signal from a .[.digital.]. .Iadd.second .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.of the .[.digital.]. audio .[.broadcast.]. signal; (b) determine a loudness of the .[.analog.]. .Iadd.first .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.over a first short time interval; (c) determine a loudness of the .[.digital.]. .Iadd.second .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.over the first short time interval; (d) use the loudness of the .[.analog.]. .Iadd.first .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.and the loudness of the .[.digital.]. .Iadd.second .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.to calculate a short term average gain; (e) determine a long term average gain; (f) convert one of the long term average gain or the short term average gain to .[.dB.]. .Iadd.decibels (dB) .Iaddend.and provide the converted average gain to an audio processor; (g) .[.if an output of a receiver has been blended to digital,.]. .Iadd.perform one of blend an output of a receiver to the second audio sample stream and .Iaddend.adjust a .[.digital.]. .Iadd.second audio sample stream .Iaddend.gain parameter by a preselected increment .[.to produce an adjusted digital gain parameter; (h) if the output of the receiver has not been blended to digital.]., .Iadd.or not blend the output of the receiver to the second audio sample stream and .Iaddend.set the .[.digital.]. .Iadd.second audio sample stream .Iaddend.gain parameter to the short term average gain; .[.(i).]. .Iadd.(h) .Iaddend.provide the .[.digital.]. .Iadd.second audio sample stream .Iaddend.gain parameter .[.adjusted by.]. .Iadd.from .Iaddend.step (g) .[.or step (h).]. 
to the audio processor; and .[.(j).]. .Iadd.(i) .Iaddend.repeat steps (a) through .[.(i).]. .Iadd.(h) .Iaddend.using a second short time interval.
12. The .[.radio.]. .Iadd.audio signal .Iaddend.receiver of claim 11, wherein: the short term average gain is a linear ratio of analog audio power and digital audio power.
13. The .[.radio.]. .Iadd.audio signal .Iaddend.receiver of claim 11, wherein: the .[.digital.]. .Iadd.second audio sample stream .Iaddend.gain parameter is adjusted toward the short term average gain; and the preselected increment is 1 dB.
14. The .[.radio.]. .Iadd.audio signal .Iaddend.receiver of claim 11, wherein: the long term average gain comprises a running average of the short term average gain.
15. The .[.radio.]. .Iadd.audio signal .Iaddend.receiver of claim 11, wherein: the long term average gain is determined independently of the short term average gain.
16. The .[.radio.]. .Iadd.audio signal .Iaddend.receiver of claim 11, wherein: the long term average gain is converted to dB if a long time interval has been met and the short term average gain is converted to dB if the long time interval has not been met.
17. The .[.radio.]. .Iadd.audio signal .Iaddend.receiver of claim 16, wherein the long time interval comprises an integer multiple of the short time intervals.
18. The .[.radio.]. .Iadd.audio signal .Iaddend.receiver of claim 16, wherein: the short time interval is in a range from 1 to 5 seconds and the long time interval is in a range from 5 to 30 seconds.
19. The .[.radio.]. .Iadd.audio signal .Iaddend.receiver of claim 11, wherein: the .[.analog.]. .Iadd.first .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.comprises a stream of samples analog modulated program material; and the .[.digital.]. .Iadd.second .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.comprises a stream of samples digitally modulated program material.
20. The .[.radio.]. .Iadd.audio signal .Iaddend.receiver of claim 11, wherein the measurements of the levels of the .[.analog.]. .Iadd.first .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.and the .[.digital.]. .Iadd.second .Iaddend.audio .[.portion.]. .Iadd.sample stream .Iaddend.are performed in accordance with the International Telecommunications Union Recommendation (ITU-R) BS.1770 specification.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
DETAILED DESCRIPTION OF THE INVENTION
(6) Embodiments described herein relate to the processing of the digital and analog components of a digital radio broadcast signal. While aspects of the disclosure are presented in the context of an exemplary IBOC system, it should be understood that the present disclosure is not limited to IBOC systems and that the teachings herein are applicable to other forms of digital radio broadcasting as well.
(7) Referring to the drawings,
(8) In one example, a basic unit of transmission of the DAB signal is the modem frame, which is typically on the order of a second in duration. Exemplary AM and FM IBOC DAB transmission systems arrange the digital audio and data in units of modem frames. Some transmission systems are both simplified and enhanced by assigning a fixed number of audio frames to each modem frame. The audio frame period is the length of time required to render, e.g., play back audio for a user, the samples in an audio frame. For example, if an audio frame contains 1024 samples, and the sampling period is 22.67 μsec, then the audio frame period would be approximately 23.2 milliseconds. A scheduler determines the total number of bits allocated to the audio frames within each modem frame. The modem frame duration is advantageous because it may enable sufficiently long interleaving times to mitigate the effects of fading and short outages or noise bursts such as may be expected in a digital audio broadcasting system. Therefore the main digital audio signal can be processed in units of modem frames, and audio processing, error mitigation, and encoding strategies may be able to exploit this relatively large modem frame time without additional penalty.
(9) In typical implementations, an audio encoder may be used to compress the audio samples into audio frames in a manner that is more efficient and robust for transmission and reception of the IBOC signal over the radio channel. The audio encoder encodes the audio frames using the bit allocation for each modem frame. The remaining bits in the modem frame are typically consumed by the multiplexed data and overhead. Any suitable audio encoder can initially produce the compressed audio frames such as an HDC encoder as developed by Coding Technologies of Dolby Laboratories, Inc.; an Advanced Audio Coding (AAC) encoder; an MPEG-1 Audio Layer 3 (MP3) encoder; or a Windows Media Audio (WMA) encoder. Typical lossy audio encoding schemes, such as AAC, MP3, and WMA, utilize the modified discrete cosine transform (MDCT) for compressing audio data. MDCT based schemes typically compress audio samples in blocks of a fixed size. For example, in AAC encoding, the encoder may use a single MDCT block of length 1024 samples or 8 blocks of 128 samples. Accordingly, in implementations using an AAC coder, for example, each audio frame could be comprised of a single block of 1024 audio samples, and each modem frame could include 64 audio frames. In other typical implementations, each audio frame could be comprised of a single block of 2048 audio samples, and each modem frame could include 32 audio frames. Any other suitable combination of sample block sizes and audio frames per modem frame could be utilized.
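The frame arithmetic in the two preceding paragraphs can be checked with a short sketch. It assumes that the 22.67 μsec sampling period quoted in the audio frame example applies to both block-size configurations. The function and constant names are illustrative and do not come from the disclosure.

```python
# Sampling period quoted in the description (22.67 microseconds).
# Assumption: the same period applies to both example configurations.
SAMPLING_PERIOD_S = 22.67e-6


def audio_frame_period_s(samples_per_audio_frame,
                         sampling_period_s=SAMPLING_PERIOD_S):
    """Time needed to render one audio frame, in seconds."""
    return samples_per_audio_frame * sampling_period_s


def modem_frame_duration_s(samples_per_audio_frame,
                           audio_frames_per_modem_frame,
                           sampling_period_s=SAMPLING_PERIOD_S):
    """Audio time carried by one modem frame, in seconds."""
    return audio_frames_per_modem_frame * audio_frame_period_s(
        samples_per_audio_frame, sampling_period_s)


# The two example configurations named in the description.
config_a = modem_frame_duration_s(1024, 64)
config_b = modem_frame_duration_s(2048, 32)

print(f"audio frame period: {audio_frame_period_s(1024) * 1e3:.1f} ms")
print(f"modem frame (64 x 1024): {config_a:.3f} s")
print(f"modem frame (32 x 2048): {config_b:.3f} s")
```

Both configurations carry 65,536 samples per modem frame, so under this assumption they give the same duration of roughly 1.49 seconds, which is consistent with a modem frame on the order of a second.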
(10) In an exemplary IBOC DAB system, the broadcast signal includes main program service (MPS) audio, MPS data (MPSD), supplemental program service (SPS) audio, and SPS data (SPSD). MPS audio serves as the main audio programming source. In hybrid modes, it preserves the existing analog radio programming formats in both the analog and digital transmissions. MPSD, also known as program service data (PSD), includes information such as music title, artist, album name, etc. Supplemental program service can include supplementary audio content as well as PSD. Station Information Service (SIS) is also provided, which comprises station information such as call sign, absolute time, position correlated to GPS, and data describing the services available on the station. In certain embodiments, Advanced Applications Services (AAS) may be provided that include the ability to deliver many data services or streams and application specific content over one channel in the AM or FM spectrum, and enable stations to broadcast multiple streams on supplemental or sub-channels of the main frequency.
(11) A digital radio broadcast receiver performs the inverse of some of the functions described for the transmitter.
(12)
(13) In contrast, the analog signal (i.e., the digitized analog audio samples) spends an amount of time T.sub.ANALOG in the analog signal path 92. T.sub.ANALOG is typically a constant amount of time that is implementation dependent. It should be noted that the analog signal path 92 may be co-located with the digital signal path on the baseband processor 82 or separately located on an independent analog processing chip. Since the time spent traveling through the digital signal path T.sub.DIGITAL and the analog signal path T.sub.ANALOG may be different, it is desirable to align the samples from the digital signal with the samples from the analog signal within a predetermined amount so that they can be smoothly combined in the audio transition module 94. The alignment accuracy will preferably be chosen to minimize the introduction of audio distortions when blending from analog to digital and vice versa. The digital and analog signals are combined and travel through the audio transition module 94. Then the combined digitized audio signal is converted into analog for rendering via the digital-to-analog converter (DAC) 96. As used in this description, references to “analog” or “digital” with regard to a particular data sample stream connote the radio signal from which the sample stream was extracted, as both data streams are in a digital format for the processing described herein.
(14) One technique for determining time alignment between signals in digital and analog pathways performs a correlation between the samples of the two audio streams and looks for the peak of the correlation. Time samples of digital and analog audio are compared as one sample stream is shifted in time against the other. The alignment error can be calculated by successively applying offsets to the sample streams until the correlation peaks. The time offset between the two samples at peak correlation is the alignment error. Once the alignment error has been determined, the timing of the digital and/or analog audio samples can be adjusted to allow smooth blending of the digital and analog audio.
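The correlation search described above can be sketched as follows. This is a minimal brute-force illustration, not the receiver's actual correlator. The function name, the search range `max_offset`, and the synthetic test signal are assumptions made for the example.

```python
import random


def alignment_error(analog, digital, max_offset):
    """Return the offset k (in samples) that maximizes the correlation
    sum(analog[n] * digital[n + k]) over -max_offset..max_offset."""
    best_offset, best_score = 0, float("-inf")
    for k in range(-max_offset, max_offset + 1):
        score = 0.0
        for n, a in enumerate(analog):
            m = n + k
            if 0 <= m < len(digital):
                score += a * digital[m]
        if score > best_score:
            best_offset, best_score = k, score
    return best_offset


# Synthetic check: the "digital" stream is the "analog" stream delayed
# by a known number of samples (hypothetical test data).
rng = random.Random(0)
analog = [rng.gauss(0.0, 1.0) for _ in range(200)]
true_delay = 5
digital = [0.0] * true_delay + analog[:-true_delay]

estimated = alignment_error(analog, digital, max_offset=20)
print(f"estimated alignment error: {estimated} samples")
```

The returned offset is the amount by which one stream would need to be shifted before blending. A practical implementation would likely operate on decimated data to reduce the computation.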
(15) While the description of the previously existing blend technique illustrated in
(16)
(17) The correlation operation performed by the correlator may include multiplying together decimated data from each stream. The result of the multiplication may appear as noise, with a large peak when the data streams are aligned in time.
(18) In the system of
(19) Once the analog and digital data streams are sufficiently aligned, a blend operation may begin. The blend operation may be conducted, for example, by reducing the contribution of the analog data stream to the output audio while correspondingly increasing the contribution of the digital data stream until the latter is the exclusive source.
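A blend of this kind can be sketched as a crossfade. The linear ramp shape and the names `blend_to_digital` and `ramp_len` are assumptions for illustration only, since the description does not specify the ramp profile.

```python
def blend_to_digital(analog, digital, ramp_len):
    """Crossfade from the analog stream to the digital stream.

    The digital weight rises linearly from 0 to 1 over ramp_len samples
    and then stays at 1, so the digital stream becomes the exclusive source.
    Assumption: a linear ramp; other ramp shapes are possible.
    """
    if ramp_len <= 0:
        raise ValueError("ramp_len must be positive")
    output = []
    for n, (a, d) in enumerate(zip(analog, digital)):
        digital_weight = min(1.0, n / ramp_len)
        output.append((1.0 - digital_weight) * a + digital_weight * d)
    return output


# Constant test streams make the changing weights visible in the output.
analog = [1.0] * 6
digital = [0.0] * 6
blended = blend_to_digital(analog, digital, ramp_len=4)
print(blended)
```

Because the two weights always sum to one, streams that are time aligned and level matched pass through the transition without a change in amplitude.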
(20) The transition time between the analog and digital audio outputs is generally less than one second, which is limited by the diversity delay and receiver decoding times. The relatively short blend transition time presents challenges in designing blending systems. It has been observed that frequent transitions between the analog and digital audio can be somewhat annoying when the difference in audio quality and loudness between the digital audio and the analog audio is significant. This is especially significant when the digital signal has a wider audio bandwidth than the analog audio, and the digital signal is stereo while the analog is mono. This phenomenon can occur in mobile receivers in fringe coverage areas when highway overpasses (or power lines for AM) are frequently encountered.
(21) International Telecommunication Union Recommendation ITU-R BS.1770-3 specification, hereinafter referred to as ITU 1770, is a primary standard for loudness measurement. ITU 1770 algorithms can be used to measure audio program loudness and true-peak audio level. In ITU 1770, the Equivalent Sound Level, L.sub.eq, is simply defined as the RMS sound power of the signal relative to a reference sound power. This calculation is easily accomplished with minimal memory and MIPS (millions of instructions per second). An optional frequency weighting prior to the sound power calculation is specified as an “RLB” filter, which is a simple high pass at approximately 100 Hz followed by a filter that applies a 4 dB boost to frequencies above approximately 2 kHz. Adding the filter calculations for an RLB weighting filter does not require significantly more MIPS or memory.
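The unweighted power calculation can be sketched as below. This is a simplified illustration rather than a conforming ITU 1770 meter: it omits the optional frequency weighting, and the reference power of 1.0 (full scale) and the function names are assumptions.

```python
import math


def mean_square_power(samples):
    """Average power of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def level_db(samples, reference_power=1.0):
    """Sound power relative to a reference, in dB.

    Assumption: reference_power = 1.0 corresponds to full scale.
    """
    power = mean_square_power(samples)
    if power <= 0.0:
        return float("-inf")  # silence has no finite level
    return 10.0 * math.log10(power / reference_power)


# A full-scale sine wave over an integer number of periods has a mean
# square power of one half.
sine = [math.sin(2.0 * math.pi * n / 100.0) for n in range(1000)]
print(f"{level_db(sine):.2f} dB")
```

The calculation needs only a running sum and a sample count, which illustrates why the memory and processing requirements are small.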
(22) The loudness difference between analog and digital audio can change dynamically. For example, up to 10 dB in loudness difference has been measured when comparing analog and digital audio at various points in the same program. If the loudness difference is small when blending to digital and later in the program the difference becomes greater, possibly 10 dB greater, a blend back to analog would result in an unacceptably abrupt change in loudness. This is primarily due to the dynamic nature of the loudness difference between digital and analog audio within a single program. This loudness difference exists for numerous reasons including, but not limited to, different processing applied to the analog and digital audio, poor signal conditions, etc.
(23) A short term loudness match at the time a blend operation is performed, coupled with a long term loudness equalization of the digital audio, can solve this fundamental problem.
(24) There are conflicting requirements when setting analog and digital loudness in the HD Radio system. The first requirement, referred to as a “long term loudness difference”, requires that the loudness perceived over the duration of the program must be consistent whether listening to the analog stream or the digital stream. The second requirement, referred to as a “short term loudness difference”, occurs at the transition time between the two streams. This transition time is generally short (e.g., <1 second), and the loudness must be relatively equal (e.g., ±2 dB), or else the listener will perceive the difference. Measurements have found that the short and long term loudness values can be drastically different as the content of the program changes. Therefore, at the point of blend a short term value is used so that the transition time sounds smooth. The short term loudness can be determined over a short time interval. The short time interval is a time in the range of 1 to 5 seconds. In one embodiment, the short time interval is 2.97 seconds. The ideal short time interval for a particular application can be determined based on audio perception and perceptual memory, such as what is perceived by human hearing to be instantaneous.
(25) The short term loudness value can be slowly ramped to the long term value as the program continues so that the overall perceived loudness of a given program is the same regardless of whether the analog or digital audio stream is playing. The long term loudness can be determined over a long time interval. The long time interval is a time in the range of 5 to 30 seconds. In one embodiment, the long time interval is in a range of 5.94 to 29.72 seconds. The long time interval is always longer than the short time interval.
(26) Generally, the short time interval must be several seconds, and always less than the long time interval. In some embodiments, the long time interval is measured in integer multiples of the short time interval. This is not a strict requirement for the process, but was chosen to simplify implementation.
(27) When the level of the digital audio stream matches the level of the analog audio stream, the streams can be blended to produce an audio output signal. The short term loudness measurement is calculated and used to update a long term running average loudness value. The minimum time before blend may occur when level control is enabled is the short time interval.
(28)
(29) A short term average power (loudness) is calculated for each stream as shown in blocks 156 and 158. This calculation can be performed using the algorithm set forth in ITU 1770. Then a short term average gain is calculated as shown in block 160. The short term average gain is calculated as the linear ratio of analog audio power to digital audio power. Block 162 shows that the long term average gain is then calculated, either directly or using the short term gain. The short term average gain is the gain determined over the short time interval. The long term average gain is the gain determined over the long time interval.
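The gain calculations of blocks 160 and 162 can be sketched as follows, for the variant in which the long term gain is a running average of short term gains. Averaging in the linear domain, the unity-gain fallback for a silent digital stream, and all names are assumptions for illustration.

```python
from collections import deque


def short_term_gain(analog_power, digital_power):
    """Linear ratio of analog audio power to digital audio power."""
    if digital_power <= 0.0:
        # Assumption: fall back to unity gain when the digital stream is silent.
        return 1.0
    return analog_power / digital_power


class LongTermGain:
    """Running average of the most recent short term gains."""

    def __init__(self, short_intervals_per_long):
        self._history = deque(maxlen=short_intervals_per_long)

    def update(self, gain):
        """Add one short term gain and return the current running average."""
        self._history.append(gain)
        return sum(self._history) / len(self._history)

    def long_interval_met(self):
        """True once a full long interval of short term gains is available."""
        return len(self._history) == self._history.maxlen


tracker = LongTermGain(short_intervals_per_long=2)
first = tracker.update(short_term_gain(analog_power=0.2, digital_power=0.1))
second = tracker.update(short_term_gain(analog_power=0.4, digital_power=0.1))
print(first, second, tracker.long_interval_met())
```

Before the history is full, `update` returns a partial average, which corresponds to the optional partial long term gain mentioned in the description.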
(30) The next step depends on whether or not the long time interval has been met as shown in block 164. In one implementation, the long time interval is comprised of integer multiples of the short time interval and the long term gain is calculated using a running average of the short term gain. Another implementation could calculate the short term and long term gain independently over different intervals. An audio frame counter can be used to determine when each of the short and long time intervals has been met.
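An audio frame counter of the kind mentioned above might look like the following sketch, for the implementation in which the long time interval is an integer multiple of the short time interval. The class name, the frame counts, and the choice to keep the long interval flag set once it has first been reached are assumptions, not details taken from the disclosure.

```python
class IntervalCounter:
    """Counts audio frames to detect short and long time intervals."""

    def __init__(self, frames_per_short_interval, short_intervals_per_long):
        self._frames_per_short = frames_per_short_interval
        self._shorts_per_long = short_intervals_per_long
        self._frames = 0
        self._shorts_completed = 0

    @property
    def long_interval_met(self):
        # Assumption: stays True once enough short intervals have elapsed.
        return self._shorts_completed >= self._shorts_per_long

    def tick(self):
        """Advance by one audio frame.

        Returns (short_interval_met, long_interval_met) for this frame.
        """
        self._frames += 1
        if self._frames < self._frames_per_short:
            return False, self.long_interval_met
        self._frames = 0
        self._shorts_completed += 1
        return True, self.long_interval_met


# Hypothetical sizes: 3 audio frames per short interval, 2 short intervals
# per long interval (kept small so the output is easy to read).
counter = IntervalCounter(frames_per_short_interval=3,
                          short_intervals_per_long=2)
events = [counter.tick() for _ in range(6)]
print(events)
```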
(31) If the long time interval has been met, the long term gain (running average) over the full long time interval is used as shown in block 166. If the long time interval has not been met, the short term gain is used as shown in block 168. The short term gain may be averaged with previously calculated short term gain measurements to generate a partial long term gain, but this is not a strict requirement. In either case, the gain is converted from a linear ratio to integer dB (always rounding down), as shown in block 170, and provided to a host processor for the purpose of adjusting the digital audio loudness during blend to better match the loudness of the analog audio. The range of digital gain correction which can be applied is −8 dB to 7 dB, in 1 dB increments.
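The conversion in block 170 can be sketched as below. Since the gain is a ratio of powers, the sketch uses 10·log10. Interpreting "rounding down" as rounding toward negative infinity and clamping the result to the stated correction range of −8 dB to 7 dB are assumptions, as are the names.

```python
import math

MIN_GAIN_DB = -8  # lower limit of the stated correction range
MAX_GAIN_DB = 7   # upper limit of the stated correction range


def gain_to_integer_db(linear_power_ratio):
    """Convert a linear power ratio to integer dB.

    Assumptions: rounding down means floor (toward negative infinity),
    and values outside the correction range are clamped to its limits.
    """
    if linear_power_ratio <= 0.0:
        return MIN_GAIN_DB
    gain_db = math.floor(10.0 * math.log10(linear_power_ratio))
    return max(MIN_GAIN_DB, min(MAX_GAIN_DB, gain_db))


for ratio in (0.01, 0.5, 1.0, 2.0, 100.0):
    print(ratio, gain_to_integer_db(ratio))
```

With floor rounding, a ratio of 2.0 (about 3.01 dB) maps to 3 dB, while a ratio of 0.5 (about −3.01 dB) maps to −4 dB. If rounding toward zero were intended instead, the negative case would give −3 dB.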
(32) The next step depends on whether or not the output of the receiver has already been blended to digital as shown in block 172. If the output of the receiver has already been blended to digital, the digital gain is adjusted by a predetermined amount (e.g., 1 dB) towards the calculated long term gain, as shown in block 174. The adjustment step size should be less than 1.5 dB to avoid immediately perceptible changes in output volume. If the output of the receiver has not been blended to digital, the digital gain is set to the calculated gain, as shown in block 176. The updated digital gain parameter is provided to an external audio processor, as shown in block 178. Then the short time interval is ended as shown in block 180 and a new short time interval is used for subsequent iterations of the process as shown in block 150.
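The decision in blocks 172 through 176 can be sketched as a single update function. The function and parameter names are hypothetical, and the sketch assumes that a step never overshoots the target gain.

```python
def update_digital_gain(current_gain_db, calculated_gain_db,
                        blended_to_digital, step_db=1):
    """Return the next digital gain parameter in dB.

    If the output is already blended to digital, move at most step_db
    toward the calculated gain. Otherwise jump straight to it, since a
    gain change is not audible while the analog stream is playing.
    Assumption: the step is limited so it never overshoots the target.
    """
    if not blended_to_digital:
        return calculated_gain_db
    if current_gain_db < calculated_gain_db:
        return min(current_gain_db + step_db, calculated_gain_db)
    if current_gain_db > calculated_gain_db:
        return max(current_gain_db - step_db, calculated_gain_db)
    return current_gain_db


# While blended to digital, a 3 dB difference is closed one step per
# short time interval.
gain = 0
history = []
for _ in range(4):
    gain = update_digital_gain(gain, 3, blended_to_digital=True)
    history.append(gain)
print(history)
```

Calling this once per short time interval keeps each change below the 1.5 dB step size that the description identifies as the threshold for immediately perceptible changes in output volume.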
(33) The method illustrated in
(34) Updating the gain of the digital audio signal with this long term loudness difference value could drive the long term average loudness of the digital to match that of the analog. If the step size were kept small, for example 1 dB, and the update rate were sufficiently long, for example 3, 5 or 10 seconds, then the difference in audio level could be imperceptible to a listener. After a time the loudness measurements would stabilize and digital volume would reliably track the analog volume. This would minimize the potential volume difference at the next blend to analog without causing major changes in the digital volume during playback.
(35) In one embodiment, the short term level measurement can be performed on samples that occur after time alignment, resulting in a longer delay to blend. However, the time alignment algorithm can be run multiple times to ensure consistency. Then the short term level alignment function can be run concurrently with a second (or subsequent) execution of the time alignment algorithm, using the alignment value from the first execution. In addition, because the short term level alignment can be executed separately from time alignment, the level alignment algorithm could be run continuously (for example over a 3 second sample window) regardless of the time alignment range.
(36) The functions shown in
(37) While the present invention has been described in terms of its preferred embodiments, it will be apparent to those skilled in the art that various modifications can be made to the described embodiments without departing from the scope of the invention as defined by the following claims.