Loudness level and range processing
10355657 · 2019-07-16
CPC classification
H03G3/3005
ELECTRICITY
H04S3/006
ELECTRICITY
H04R2430/01
ELECTRICITY
Abstract
Loudness signal processors and methods for processing an input audio signal in order to control a resulting integrated loudness level and a resulting loudness range of an output audio signal by a predetermined target loudness level and by a predetermined target loudness range, the processors and methods comprising level detection and level distribution analysis; transfer function generation based on the level distribution, the predetermined target loudness level and the predetermined target loudness range; and calculation of a gain to apply to said input audio signal, resulting in said output audio signal.
Claims
1. A loudness signal processor for processing an input audio signal in order to control a resulting integrated loudness level and a resulting loudness range of an output audio signal by a target loudness level and by a target loudness range, the loudness signal processor comprising a level detector block having an input to receive the input audio signal and arranged to determine a time-varying level of the input audio signal; a distribution analyzer block coupled to the level detector block and arranged to include a priming with a loudness level distribution and to provide an input level distribution based on the priming and the time-varying level of the input audio signal; a transfer function generator block coupled to the distribution analyzer block and arranged to determine a transfer function in response to the input level distribution, the target loudness level, and the target loudness range; a gain control block coupled to the transfer function generator block and arranged to calculate a time-varying gain in response to the time-varying level of the input audio signal and the transfer function; a multiplier block coupled to the gain control block and arranged to receive the time-varying gain from the gain control block and to apply the time-varying gain to the input audio signal and generate the output audio signal.
2. The loudness signal processor according to claim 1, wherein an estimated integrated loudness level over a finite-length window of the output audio signal substantially matches the target loudness level, and wherein an estimated loudness range over a finite-length window of the output audio signal is substantially constrained by the target loudness range.
3. The loudness signal processor according to claim 1, wherein said priming is determined by metadata.
4. The loudness signal processor according to claim 1, wherein said distribution analyzer block is arranged to provide said input level distribution under consideration of weight factors or inclusion rules determined by metadata.
5. The loudness signal processor according to claim 1, wherein the priming is effective immediately.
6. The loudness signal processor according to claim 1, wherein the priming is effective at a specific relative or absolute time.
7. The loudness signal processor according to claim 1, wherein said distribution analyzer block is arranged to, over time, update the loudness level distribution based on the time-varying level of the input audio signal.
8. The loudness signal processor according to claim 1, wherein said distribution analyzer block is arranged to, over time, update the loudness level distribution based on the time-varying level of the input audio signal; wherein two or more loudness level distributions are stored, each being associated with a certain audio signal type; and wherein the distribution analyzer block is arranged to perform the update only for a loudness level distribution that is associated with an audio signal type corresponding to a type of the input audio signal.
9. The loudness signal processor according to claim 1, wherein the loudness signal processor is arranged to let the loudness level distribution predominantly affect the target loudness range properties of the transfer function, and the input level distribution predominantly affect the target loudness level properties of the transfer function.
10. The loudness signal processor according to claim 1, wherein the loudness level distribution has been generated in advance based on a collection of programs belonging to a same audio signal type that have first been loudness normalized individually, then their time-varying levels measured forming individual level distributions, and then all these individual level distributions combined into the loudness level distribution to be used for the priming.
11. The loudness signal processor according to claim 1, wherein the loudness level distribution is based on a pre-analysis of at least a part of said input audio signal.
12. The loudness signal processor according to claim 1, wherein the distribution analyzer block is arranged to estimate a time-varying level distribution of the time-varying level of said input audio signal, and is further arranged to provide the input level distribution based on the loudness level distribution, the time-varying level distribution, or a combination thereof.
13. The loudness signal processor according to claim 3, wherein said metadata comprises indication of whether or not the input audio signal has been loudness normalized or loudness processed, according to certain specifications.
14. The loudness signal processor according to claim 3, wherein said metadata comprises indication of whether the input audio signal contains predominantly speech.
15. The loudness signal processor according to claim 3, wherein said metadata correspond to whether or not the input audio signal contains predominantly music which has been dynamically processed.
16. The loudness signal processor according to claim 3, wherein at least part of said metadata are received from a separate source.
17. The loudness signal processor according to claim 3, wherein at least part of said metadata are received from a signal classifier.
18. The loudness signal processor according to claim 1, wherein the time-varying level of said input audio signal is based on at least one selection from a list comprising: an estimate of a time-varying loudness level of said input audio signal; an RMS calculation of said input audio signal; and an Leq-type measure of said input audio signal.
19. The loudness signal processor according to claim 1, wherein said transfer function generator block determines said transfer function based on integrating the levels within said input level distribution by performing an RMS calculation in which levels below a threshold level are excluded.
20. The loudness signal processor according to claim 1, wherein said transfer function generator block is arranged to determine a degree of compression of said time-varying transfer function on the basis of a difference between said target loudness range and a distance between two percentiles estimated on the basis of said time-varying level distribution.
Description
DRAWINGS
(1) The present invention will in the following be described with reference to the drawings, illustrating:
(2)-(7) [Brief descriptions of the individual figures are not reproduced in this text.]
DETAILED DESCRIPTION
(8) The present invention constitutes a loudness level and loudness range processor. It comprises an audio signal processor which can control the loudness level as well as the loudness range of the audio signal by applying a time-varying gain controlled on the basis of a continuous analysis of the signal. Various embodiments of the present invention, including alternatives and optional features, will be described in the following. Suitable combinations of the disclosed embodiments, alternatives and features other than those exemplified in the following are also within the scope of the present invention.
(9) Description of the Blocks
(10)
(11) A side chain comprises a Level detector block (204), which determines a time-varying level of the input signal. A Distribution Analyzer block (205) estimates the level distribution over time based on the output from the level detector (204). As the distribution is continually updated, taking into account new levels, the output of 205 is a time-varying level distribution.
(12) A Transfer Function Generator block (208) then generates a transfer function to be used by the Gain Control block (209). The transfer function is designed such that the dynamics processing of the input signal, based on the transfer function, will fulfill two target parameters: the Target loudness level and the Target loudness range. In other words, the target parameters specify properties of the desired result, and block 208 then calculates how best to obtain that result, based on what it knows about the input signal.
(13) The two target parameters are understood as goals or objectives of the loudness processing. It may be, due to the invention being a real-time processor, that the properties of the output signal would deviate somewhat from these targets. This might also depend on the particular input signal, and on how well it fits the model of the input estimated by the Distribution Analyzer. Even with some deviation from the target parameters, the processing might still be desirable in a given application. For instance, in a live broadcasting application, some deviation from the specified target parameters would be expected (see e.g. EBU (2010)).
An Embodiment Described in Detail
(14) A detailed description follows of an embodiment of the present invention. Several variants of the central elements are described. In order to provide sufficient detail in an unambiguous manner, MATLAB code (MATLAB by MathWorks) is provided to demonstrate one implementation. This MATLAB code relies only on functions available in a standard MATLAB installation, and full documentation for all functions and operators used in the following code snippets is thereby easily found, e.g. at the MathWorks website, http://www.mathworks.com/help, the relevant parts of the MATLAB documentation hereby incorporated by reference. However, trivial details, such as variable initialization and sample loops, have been omitted here for clarity. Note that in MATLAB code the percent sign, %, marks the beginning of a text comment, which is used in the following to provide a few explanations about the code.
(15) Prerequisites
(16) In the following, it is assumed that the Input signal (200) is stored as digital samples in the vector IS. The sample rate is stored in fs; fs=48000 Hz will be used in the examples, but any sample rate is within the scope of the invention. The variable i is the sample counter, i.e. the index of the current input and output sample. Note that even though all the input samples are stored in IS in this demonstration, the code never uses any element of IS at an index greater than i (i.e. the system is causal); that is, the examples all simulate real-time processing.
(17) Level Detector (204)
(18) This example embodiment of the invention shows an implementation of an RMS type of level detector, where LD(i) is the output of the level detector for the current sample, and detector_samps denotes the length of the detector's sliding window, in samples. Here, a 50 ms window (detector_samps=2400) is used. This may be considered a simple estimate of time-varying loudness level.
(19) The same principle could be implemented as an averaging FIR filter, having the squared samples as input.
(20) LD(i)=sqrt(mean(IS(i-detector_samps:i).^2)); % RootMeanSquare value
(21) LD(i)=20*log10(LD(i)); % Convert to dB
(22) Alternatively, an IIR type level detector could be employed, as is common in prior art dynamic compressors (see e.g. Zölzer (2011)).
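By way of a non-limiting illustration, the two alternatives mentioned above (an averaging FIR filter on the squared samples, and an IIR type detector) may be sketched as follows. The test tone, the one-pole smoother and its time constant tau are assumptions chosen for this sketch only and are not taken from the embodiment.

```matlab
% Sketch: two causal level detectors applied to a 1 kHz test tone.
fs = 48000;                          % sample rate (Hz), as in the examples
t  = (0:fs-1)/fs;                    % one second of signal (assumed test input)
IS = 0.1*sqrt(2)*sin(2*pi*1000*t);   % tone with RMS value 0.1, i.e. -20 dB

detector_samps = round(0.050*fs);    % 50 ms window = 2400 samples

% (a) Averaging FIR filter with the squared samples as input
b      = ones(1, detector_samps)/detector_samps;
ms_fir = filter(b, 1, IS.^2);        % sliding-window mean square
LD_fir = 10*log10(ms_fir + eps);     % 10*log10(mean square) equals 20*log10(RMS)

% (b) One-pole IIR smoother on the squared samples
tau    = 0.050;                      % time constant in seconds (hypothetical choice)
alpha  = exp(-1/(tau*fs));           % pole location derived from tau
ms_iir = filter(1-alpha, [1 -alpha], IS.^2);
LD_iir = 10*log10(ms_iir + eps);

fprintf('FIR detector: %.2f dB, IIR detector: %.2f dB\n', LD_fir(end), LD_iir(end));
```

Both detectors use only past and current samples, so either could feed the Distribution Analyzer block in a real-time setting.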
(23) Distribution Analyzer (205)
(24) According to this example embodiment of the invention, the Distribution Analyzer is implemented as a sliding window of DA_secs seconds length. The sliding window should preferably be long enough to cover the different kinds of loudness dynamics used in the program, e.g. conversation, moody passages, intense passages, etc., so that a stable level distribution can be estimated. However, the window should preferably not be so long that it covers (a large part of) a different program type, or other passages whose influence on the integrated loudness level is undesired. Thus, the optimal value for DA_secs depends on the content type and genre being processed; an example value may be in the range of 10-60 minutes. Program types where faster adaptation is desired may use shorter values, whereas program types that need particularly robust loudness processing, e.g. because of a large loudness range combined with long intervals between loudness level changes, may require a distribution analysis based on hours of past material.
(25) In this example, the DA estimates 2 percentiles of the statistical distribution corresponding to the samples in its analysis window; 10% and 90% are used as an example. The percentiles will provide the basis for estimating the loudness range of the input signal.
(26) DA_samps=DA_secs*fs;
(27) % Update DA sliding window with new sample
(28) DA_window=[LD(i) DA_window]; DA_window(DA_samps+1:end)=[]; % keep the newest DA_samps values
(29) % Compute distribution parameters
(30) percentiles=[10 90];
(31) DA_sort=sort(DA_window);
(32) DA_percen=DA_sort(round((length(DA_sort)-1)*percentiles./100+1));
(33) Optionally, the LD signal may be down-sampled prior to the DA, as an optimization.
(34) Alternatively, block 205 could model the distribution itself, for instance by maintaining a histogram representation, or by continually estimating the parameters of a suitable parametric distribution.
(35) Note that in some embodiments of the invention (involving presets), the DA is primed with a pre-computed and stored distribution. In this example, this corresponds to simply initializing DA_window with the preset vector of length DA_samps.
(36) The output of the Distribution Analyzer block (205) may comprise several of the determined values, e.g. DA_percen for range processing (compression/expansion) and DA_window as basis for level processing (overall gain).
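The histogram alternative mentioned above may, purely as an illustration, be sketched as follows. The bin width, the histogram span and the synthetic level sequence are assumptions made for this sketch; a practical implementation would additionally need some form of forgetting (e.g. subtracting levels that leave the analysis window), which is omitted here.

```matlab
% Sketch: percentile estimation from a running histogram of detected levels.
binw   = 0.5;                              % bin width in dB (hypothetical)
edges  = -80:binw:0;                       % bin edges in dB (hypothetical span)
counts = zeros(1, numel(edges)-1);         % running histogram

% Assumed level-detector output: three level plateaus, in dB
LD = [-20*ones(1,1000), -30*ones(1,500), -10*ones(1,500)];

for i = 1:numel(LD)
    % Clamp the level to the histogram span and locate its bin
    lev = min(max(LD(i), edges(1)), edges(end) - 1e-9);
    bin = floor((lev - edges(1))/binw) + 1;
    counts(bin) = counts(bin) + 1;         % update with the new level
end

% Read the percentiles from the cumulative histogram
cdf         = cumsum(counts)/sum(counts);
percentiles = [10 90];
DA_percen   = zeros(size(percentiles));
for p = 1:numel(percentiles)
    bin = find(cdf >= percentiles(p)/100, 1, 'first');
    DA_percen(p) = edges(bin);             % lower edge of the selected bin
end
```

Compared with sorting the full sliding window for every sample, the histogram needs only one counter update per new level, at the cost of quantizing the percentiles to the bin width.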
(37) Target Loudness Level and Range (206, 207)
(38) The two target parameters are in this embodiment of the present invention given in the 2 variables below. Note that different target values will be used in the simulations shown in the plots later in this document. Also note, that any target values are within the scope of the invention, as they are preferably user-specified, typically in accordance with broadcasting standards or program standards.
(39) TargetLoudnessLevel=-20; % dBFS
(40) TargetLoudnessRange=15; % dB
(41) Here, the target loudness level is specified according to a simple RMS-based calculation of the integrated loudness level (i.e. an Leq loudness level). Other measures could alternatively be used within the scope of the present invention, such as the gated, integrated loudness level (ITU-R, 2011), or the LLML (Vickers, 2001).
(42) In these examples, a target loudness range is used, defined as the difference, in dB, between a high and a low percentile of the level distribution. Other measures could alternatively be used within the scope of the present invention, such as variants of the LRA descriptor (Skovenborg, 2012), or the Dynamic Spread (Vickers, 2001).
(43) The implementation of blocks 205 and 208 (and possibly 204 and 301) would in any case have to correspond to the loudness level and loudness range measures which the target parameters refer to.
(44) Transfer Function Generator (208)
(45) Within the scope of the present invention, four different implementations of block 208 are shown in the following, with different properties and features. In the following code examples, TF denotes the transfer function, and variables with TF prefix are variables related to the generation of the transfer function TF.
(46) The vector LX contains a set of levels, in increasing order, which are essentially the X-axis of the transfer function (in the representation used here), i.e. LX determines the span and resolution of the transfer function. For example, LX=[-80:0.2:0]; (in MATLAB notation).
(47) TF Method 1
(48) TF_Range=diff(DA_percen);
(49) TF_Comp=TF_Range-TargetLoudnessRange; % how much compression needed (dB)
(50) % Generate compression transfer function
(51) for k=1:length(LX)
    if LX(k)<=DA_percen(1) % below lower percentile
        TF(k)=LX(k);
    else
        if TF_Comp>0
            r=TF_Range/(TF_Range-TF_Comp);
        else
            r=1; % don't apply dynamic expansion
        end
        TF(k)=(LX(k)-DA_percen(1))/r+DA_percen(1);
    end
(52) end
(53) % Calculate post-comp integrated level (RMS method)
(54) DA_window_comp=interp1(LX,TF,DA_window,'nearest');
(55) TF_IntegratedPostComp=10*log10(mean(10.^(DA_window_comp./10)));
(56) TF_Gain=TargetLoudnessLevel-TF_IntegratedPostComp;
(57) % Generate the Transfer Function
(58) TF=TF+TF_Gain;
(59) Method 1 calculates the degree of compression needed to match the target loudness range, but it does not apply dynamic expansion, in case the loudness range of the input signal is smaller than the specified target range. It then shifts the transfer function, corresponding to a static gain, in order for the compressed (output) signal to match the target loudness level. An equivalent method could be implemented based on other representations of the level distribution (by block 205).
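As a worked illustration of Method 1, the following self-contained sketch applies the same steps to a fixed, known level distribution. The distribution, the percentile values and the target values are assumptions chosen for the illustration; in the embodiment these come from block 205 and from the user-specified targets.

```matlab
% Sketch: TF method 1 applied to a known level distribution (illustrative values).
LX        = -80:0.2:0;                                   % transfer function x-axis (dB)
DA_window = [-20*ones(1,1000), -30*ones(1,500), -10*ones(1,500)];
DA_percen = [-30 -10];                                   % 10% and 90% percentiles of DA_window
TargetLoudnessLevel = -25;                               % dBFS
TargetLoudnessRange = 10;                                % dB

TF_Range = diff(DA_percen);                              % 20 dB input range
TF_Comp  = TF_Range - TargetLoudnessRange;               % 10 dB of compression needed
if TF_Comp > 0
    r = TF_Range/(TF_Range - TF_Comp);                   % compression ratio, here 2
else
    r = 1;                                               % no dynamic expansion
end

TF        = LX;                                          % unity below the lower percentile
above     = LX > DA_percen(1);
TF(above) = (LX(above) - DA_percen(1))/r + DA_percen(1); % compressed segment

% Static gain so that the compressed distribution meets the target level
DA_window_comp        = interp1(LX, TF, DA_window, 'nearest');
TF_IntegratedPostComp = 10*log10(mean(10.^(DA_window_comp./10)));
TF_Gain               = TargetLoudnessLevel - TF_IntegratedPostComp;
TF                    = TF + TF_Gain;

% Check the result against both targets
out_levels     = interp1(LX, TF, DA_window, 'nearest');
out_integrated = 10*log10(mean(10.^(out_levels./10)));
out_range      = max(out_levels) - min(out_levels);
```

With a stationary distribution such as this one, both targets are met exactly; in real-time operation the distribution estimate evolves over time, so some deviation from the targets is to be expected.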
(60) TF Method 2
(61) Similar to TF method 1, except:
(62) % Generate compression/expansion transfer function
(63) for k=1:length(LX)
    if LX(k)<=DA_percen(1) % below lower percentile
        r=1/2; % low-level expansion (noise reduction), ratio 1:2 as stated for Method 2
        TF(k)=(LX(k)-DA_percen(1))/r+DA_percen(1);
    else
        if TF_Comp>0
            r=TF_Range/(TF_Range-TF_Comp);
        else
            r=1; % don't apply dynamic expansion
        end
        TF(k)=(LX(k)-DA_percen(1))/r+DA_percen(1);
    end
(64) end
(65) Method 2 furthermore applies dynamic expansion at low levels, at a ratio 1:2, in order to perform single-ended noise reduction. Note that the threshold for what is regarded as low levels is signal-dependent, as it is the lower percentile of the level distribution. Thus, if the input signal was somehow gained up X dB, then the low levels threshold by this example would automatically also move up by X dB.
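The signal-dependent threshold described above may be illustrated by the following sketch, which compares the lower percentile of a level sequence with that of the same sequence gained up by X dB, and evaluates the 1:2 expansion for one level below the threshold. The level sequence and the value of X are assumptions chosen for the illustration.

```matlab
% Sketch: the low-level threshold follows the gain of the input signal.
LD = [-20*ones(1,1000), -30*ones(1,500), -10*ones(1,500)];  % assumed levels (dB)
X  = 6;                                                     % hypothetical gain offset (dB)
p  = 10;                                                    % lower percentile (%)

s1  = sort(LD);
th1 = s1(round((numel(s1)-1)*p/100 + 1));    % threshold for the original signal
s2  = sort(LD + X);
th2 = s2(round((numel(s2)-1)*p/100 + 1));    % threshold for the gained-up signal

% 1:2 expansion below the threshold: a level 4 dB below it maps to 8 dB below it
r_low   = 1/2;
lev_in  = th1 - 4;
lev_out = (lev_in - th1)/r_low + th1;
```

The threshold thus shifts by exactly X dB, so the noise reduction acts on the same parts of the programme regardless of its overall gain.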
(66) TF Method 3
(67) TF_Range=diff(DA_percen);
(68) TF_Comp=TF_Range-TargetLoudnessRange; % how much compression needed (dB)
(69) TF_Comp=max(TF_Comp, 0);
(70) % Generate compression transfer function
(71) if TF_Range>0
    r=TF_Range/(TF_Range-TF_Comp);
(72) else
    r=1; % initially
(73) end
(74) XY=[min(LX), min(LX);
    DA_percen(1), DA_percen(1);
    DA_percen(2)+0.01, DA_percen(2)+0.01-TF_Comp;
    max(LX), (max(LX)-DA_percen(2))/(r*2)+(DA_percen(2)+0.01-TF_Comp);
(75) % twice the compression at high levels
    ];
(76) TF=interp1(XY(:,1),XY(:,2),LX,'linear');
(77) % Calculate post-comp integrated level (RMS method)
(78) DA_window_comp=interp1(LX,TF,DA_window,'nearest');
(79) TF_IntegratedPostComp=10*log10(mean(10.^(DA_window_comp./10)));
(80) TF_Gain=TargetLoudnessLevel-TF_IntegratedPostComp;
(81) % Generate the Transfer Function
(82) TF=TF+TF_Gain;
(83) This method demonstrates that more breakpoints can be added to the transfer function. In this case, levels above the high percentile, i.e. the 10% loudest levels, are compressed with a ratio which is twice that of the normal levels. This feature may be desirable, as excessively high levels are known to be perceptually annoying. Note again that the generation of the specific TF is guided by signal properties (via the distribution analysis), without the user having to intervene when the signal type, genre, or level changes.
(84) Here, MATLAB's linear-interpolation function is employed to construct the TF, for convenience and brevity.
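To make the breakpoints of Method 3 concrete, the following sketch builds the breakpoint table for assumed percentiles of -30 dB and -10 dB and a target loudness range of 10 dB, and then measures the slope of the resulting transfer function in the middle and upper segments. The numerical values are assumptions chosen for the illustration, and the static gain step is left out.

```matlab
% Sketch: breakpoints and segment slopes of TF method 3 (illustrative values).
LX = -80:0.2:0;
DA_percen = [-30 -10];                     % assumed 10% and 90% percentiles (dB)
TargetLoudnessRange = 10;                  % dB

TF_Range = diff(DA_percen);                              % 20 dB
TF_Comp  = max(TF_Range - TargetLoudnessRange, 0);       % 10 dB
if TF_Range > 0
    r = TF_Range/(TF_Range - TF_Comp);                   % ratio 2
else
    r = 1;
end

% Breakpoint table: [input level, output level] per row
XY = [min(LX),           min(LX);
      DA_percen(1),      DA_percen(1);
      DA_percen(2)+0.01, DA_percen(2)+0.01-TF_Comp;
      max(LX),           (max(LX)-DA_percen(2))/(r*2)+(DA_percen(2)+0.01-TF_Comp)];
TF = interp1(XY(:,1), XY(:,2), LX, 'linear');

% Slopes (dB out per dB in) within the middle and the upper segment
slope_mid  = (interp1(LX, TF, -12) - interp1(LX, TF, -28))/16;
slope_high = (interp1(LX, TF, -2)  - interp1(LX, TF, -8))/6;
fprintf('slope between percentiles: %.3f, above high percentile: %.3f\n', ...
        slope_mid, slope_high);
```

The slopes come out close to 1/r and 1/(2r) respectively; the small offset of 0.01 dB in the third breakpoint is the reason they are not exactly equal to these values.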
(85) TF Method 4
(86) Similar to TF method 3, except:
(87) XY=[min(LX), min(LX);
    DA_percen(1), DA_percen(1);
    DA_percen(2)+0.01, DA_percen(2)+0.01-TF_Comp;
    max(LX), (max(LX)-DA_percen(2))/(r*2)+(DA_percen(2)+0.01-TF_Comp);
(88) % twice the compression at high levels
    ];
(89) TF=interp1(XY(:,1),XY(:,2),LX,'cubic');
(90) This method demonstrates that the transfer function does not need to consist of line segments. Here, a piecewise cubic fit generates a TF which is smooth, without any corners, and hence may sound better in some cases. This can be considered a generalization of the soft knee method known from prior art.
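The effect of a smooth fit may be illustrated by the following sketch, which compares a line-segment transfer function with a piecewise cubic one through the same breakpoints. The sketch uses the 'pchip' option of interp1 (shape-preserving piecewise cubic interpolation); this choice is an assumption, since the behaviour of the 'cubic' option is understood to differ between MATLAB releases. The breakpoint values are illustrative only.

```matlab
% Sketch: line-segment versus smooth transfer function through the same breakpoints.
LX = -80:0.2:0;
XY = [min(LX), min(LX);        % [input level, output level] per row (illustrative)
      -30,     -30;
      -9.99,   -19.99;
      max(LX), -17.49];

TF_lin    = interp1(XY(:,1), XY(:,2), LX, 'linear');   % line segments with corners
TF_smooth = interp1(XY(:,1), XY(:,2), LX, 'pchip');    % smooth, assumed interpolant

max_dev     = max(abs(TF_smooth - TF_lin));            % largest deviation (dB)
is_monotone = all(diff(TF_smooth) >= -1e-9);           % output never decreases with input
fprintf('largest deviation from the line-segment TF: %.2f dB\n', max_dev);
```

A shape-preserving interpolant keeps the transfer function monotonic, so that a louder input never results in a quieter output.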
(91) Note that in alternative implementations, the actual TF might not be represented explicitly (as in the examples shown here) but may instead be implemented in a functional form (i.e., as a set of rules).
(92) Gain Control Block (209)
(93) Based on the transfer function (TF), block 209 calculates the time-varying gain. In this example embodiment of the present invention, the nearest value in the TF vector is simply used. GC(i) is the calculated gain for the current sample, based on the transfer function TF and the level detector output LD(i).
(94) GC(i)=interp1(LX,TF,LD(i),'nearest')-LD(i); % time-varying gain (dB)
(95) Alternatively, a lower resolution of LX and TF could be used together with an interpolated lookup.
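The interpolated lookup mentioned above may be sketched as follows, using a transfer function stored on a coarse grid. The 5 dB grid spacing, the example transfer function (2:1 compression above -30 dB) and the example level are assumptions chosen for the illustration.

```matlab
% Sketch: gain lookup from a coarse transfer function with linear interpolation.
LX_coarse = -80:5:0;                               % coarse level grid (hypothetical)
TF_coarse = LX_coarse;                             % unity below -30 dB
idx       = LX_coarse > -30;
TF_coarse(idx) = (LX_coarse(idx) + 30)/2 - 30;     % 2:1 compression above -30 dB

LD_i = -17.33;                                     % current detector output (dB)
GC_i = interp1(LX_coarse, TF_coarse, LD_i, 'linear') - LD_i;   % gain in dB
g    = 10^(GC_i/20);                               % linear gain for the multiplier
fprintf('gain: %.3f dB (linear factor %.4f)\n', GC_i, g);
```

Because the breakpoint of this example lies on the coarse grid, the interpolated lookup reproduces the piecewise-linear transfer function exactly while storing far fewer values.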
(96) Multiplier Block (202)
(97) Block 202 applies the time-varying gain to the signal, and thereby produces the next output sample OS(i), and may preferably be embodied as:
(98) g=10^(GC(i)/20); % convert to linear gain
(99) OS(i)=IS(i)*g;
(100) As illustrated in
(101) Description According to the Plots
(102) The following description together with the plots of
(103) The input signal used in the demonstration is, for the sake of simplicity, composed of a pure 1 kHz tone at different levels:
1. 10 s at -20 dBFS
2. 5 s at -30 dBFS
3. 5 s at -10 dBFS
(104) This sequence repeats 3 times, for a total duration of 60 s. See
(105) Explanation to the Plots:
(106)
(107) For each simulation the corresponding figures show:
1. the Transfer Function resulting from block 208, in its state at the end of the input signal;
2. the input and output levels, measured over the duration of the test signal. The input level shown corresponds to the output of block 204, and the output level is measured in the same way (though doing so is not part of a preferred embodiment of the invention, but is done here to illustrate an effect of the invention);
3. the time-varying gain (i.e. the output of block 209);
4. the integrated loudness level of the whole test signal (i.e. the entire programme), measured before and after processing, in order to compare with the Target loudness level;
5. the loudness range of the whole test signal (i.e. the entire programme), measured before and after processing, in order to compare with the Target loudness range.
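The input-side measurements of the test signal can be checked with a short calculation. The following sketch represents the test signal by one level value per second (an assumption made for brevity, in place of the actual detector output) and applies the RMS-based integrated level and the 10%/90% percentile range used in the examples.

```matlab
% Sketch: integrated level and loudness range of the test signal.
seg = [-20*ones(1,10), -30*ones(1,5), -10*ones(1,5)];   % one 20 s sequence (dBFS)
L   = repmat(seg, 1, 3);                                % repeated 3 times: 60 s

% RMS-based integrated level: average the powers, then convert back to dB
integrated = 10*log10(mean(10.^(L/10)));

% Loudness range as the distance between the 10% and 90% percentiles
percentiles = [10 90];
L_sort      = sort(L);
L_percen    = L_sort(round((length(L_sort)-1)*percentiles./100 + 1));
lra         = diff(L_percen);

fprintf('integrated level: %.1f dBFS, loudness range: %.1f dB\n', integrated, lra);
```

The calculation gives an integrated level of about -15.2 dBFS and a loudness range of 20 dB for the unprocessed test signal.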
(108) Simulation 1:
(109) Transfer Function Generator method #1
(110) Integrated loudness level: input=-15.2, output=-23.7 (target=-25.0) dBFS
(111) Loudness range: input=20.0, output=20.0 (target=20.0) dB
(112) Because the Target loudness range was relatively large (20 dB) no compression was applied by the processor.
(113) Simulation 2:
(114) Transfer Function Generator method #1
(115) Integrated loudness level: input=-15.2, output=-24.0 (target=-25.0) dBFS
(116) Loudness range: input=20.0, output=10.0 (target=10.0) dB
(117) Here the Target loudness range was 10 dB, which was met by the compression applied, while the integrated loudness level of the entire programme matched the target within +/-1 dB.
(118) Simulation 3:
(119) Transfer Function Generator method #1
(120) Integrated loudness level: input=-15.2, output=-14.0 (target=-15.0) dBFS
(121) Loudness range: input=20.0, output=10.0 (target=10.0) dB
(122) A different target loudness level can also be matched.
(123) Simulation 4:
(124) Transfer Function Generator method #2
(125) Integrated loudness level: input=-15.2, output=-19.0 (target=-20.0) dBFS
(126) Loudness range: input=20.0, output=10.0 (target=10.0) dB
(127) Note the dynamic expansion (i.e. noise reduction) in the transfer function plot, applied below -30 dBFS.
(128) Simulation 5:
(129) Transfer Function Generator method #3
(130) Integrated loudness level: input=-15.2, output=-19.6 (target=-20.0) dBFS
(131) Loudness range: input=20.0, output=10.0 (target=10.0) dB
(132) Note the extra break-point in the transfer function plot, leading to a greater compression of levels above -10 dBFS.
(133) Simulation 6:
(134) Transfer Function Generator method #4
(135) Integrated loudness level: input=-15.2, output=-19.5 (target=-20.0) dBFS
(136) Loudness range: input=20.0, output=10.0 (target=10.0) dB
(137) Note the smooth transfer function, which matches the target parameters practically as well as the equivalent line-segment based transfer function
(138) Simulation 7:
(139) Based on the same setup as simulation 6, but this time with a look-ahead delay of 30 ms (block 201 on
(140) Note the slight differences in the Output level, compared to
(141) Simulation 8:
(142) Based on the same setup as simulation 2, but this time with the test signal being gained by -5 dB, and with the DA block (205) primed with a preset corresponding to the distribution from the test signal (i.e. at its original gain).
(143) Integrated loudness level: input=-20.2, output=-25.4 (target=-25.0) dBFS
(144) Loudness range: input=20.0, output=10.0 (target=10.0) dB
(145) Simulation 9:
(146) Same setup as simulation 8, i.e. based on simulation 2 but with the test signal being gained by -5 dB.
(147) However, this time the distribution analyzer block was NOT primed.
(148) Integrated loudness level: input=-20.2, output=-24.0 (target=-25.0) dBFS
(149) Loudness range: input=20.0, output=10.0 (target=10.0) dB
(150) Note the different time-varying gain and hence output level, compared to
REFERENCES
(151) EBU (2010) Recommendation R-128, European Broadcasting Union.
ITU-R (2011) Recommendation BS.1770-2, International Telecommunication Union.
Skovenborg (2012) Loudness Range (LRA) - Design and Evaluation, AES 132nd Conv.
Vickers (2001) Automatic Long-term Loudness and Dynamics Matching, AES 111th Conv.
Zölzer (2011) DAFX: Digital Audio Effects, Wiley.