Apparatus and method for decomposing an audio signal using a ratio as a separation characteristic
11183199 · 2021-11-23
Assignee
Inventors
- Alexander Adami (Gundelsheim, DE)
- Jürgen Herre (Erlangen, DE)
- Sascha Disch (Fürth, DE)
- Florin Ghido (Nuremberg, DE)
Cpc classification
G10H2250/035
PHYSICS
G10L19/008
PHYSICS
G10H2210/046
PHYSICS
H04S3/008
ELECTRICITY
G10H2250/235
PHYSICS
H04S2400/01
ELECTRICITY
International classification
G10L19/022
PHYSICS
H04S3/00
ELECTRICITY
Abstract
An apparatus for decomposing an audio signal into a background component signal and a foreground component signal includes: a block generator for generating a time sequence of blocks of audio signal values; an audio signal analyzer for determining a block characteristic of a current block of the audio signal and for determining an average characteristic for a group of blocks, the group of blocks including at least two blocks; and a separator for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal includes the background portion of the current block and the foreground component signal includes the foreground portion of the current block.
Claims
1. An apparatus for decomposing an audio signal into a background component signal and a foreground component signal, the apparatus comprising: a block generator for generating a time sequence of blocks of audio signal values; an audio signal analyzer for determining a block characteristic of a current block of the audio signal and for determining an average characteristic for a group of blocks, the group of blocks comprising at least two blocks; and a separator for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal comprises the background portion of the current block and the foreground component signal comprises the foreground portion of the current block.
2. The apparatus of claim 1, wherein the audio signal analyzer is configured for analyzing an amplitude-related measure as the block characteristic of the current block and the amplitude-related measure as the average characteristic for the group of blocks.
3. The apparatus of claim 1, wherein the audio signal analyzer is configured for analyzing a power measure or an energy measure for the current block and an average power measure or an average energy measure for the group of blocks.
4. The apparatus of claim 1, wherein the separator is configured to calculate a separation gain from the ratio, to weight the audio signal values of the current block using the separation gain to acquire the foreground portion of the current block, and to determine the background portion so that the background component signal constitutes a remaining signal, or wherein the separator is configured to calculate the separation gain from the ratio, to weight the audio signal values of the current block using the separation gain to acquire the background portion of the current block, and to determine the foreground portion so that the foreground component signal constitutes a remaining signal.
5. The apparatus of claim 1, wherein the separator is configured to calculate a separation gain using weighting the ratio using a predetermined weighting factor different from zero.
6. The apparatus of claim 5, wherein the separator is configured to calculate the separation gain using a term $1-(g_N/\psi(n))^{p}$ or $(\max(1-(g_N/\psi(n))))^{p}$, wherein $g_N$ is the predetermined weighting factor, $\psi(n)$ is the ratio and $p$ is a power greater than zero and being an integer or a non-integer number, and wherein $n$ is a block index, and wherein max is a maximum function for selecting a greater value of 1 and $(g_N/\psi(n))^{p}$.
7. The apparatus of claim 1, wherein the separator is configured to compare the ratio of the current block to a separation threshold and to separate the current block, when the ratio of the current block is in a predetermined relation to the separation threshold, and wherein the separator is configured to not separate a further block, the further block comprising a ratio not exhibiting the predetermined relation to the separation threshold, so that the further block fully belongs to the background component signal.
8. The apparatus of claim 7, wherein the separator is configured to separate a following block following the current block in time using comparing a ratio of the following block to a release threshold, and wherein the release threshold is set such that the ratio that is not in the predetermined relation to the separation threshold is in the predetermined relation to the release threshold.
9. The apparatus of claim 8, wherein the predetermined relation is “greater than” and wherein the release threshold is lower than the separation threshold, or wherein the predetermined relation is “lower than” and wherein the release threshold is greater than the separation threshold.
10. The apparatus of claim 1, wherein the block generator is configured to determine temporally overlapping blocks of audio signal values, or wherein the temporally overlapping blocks comprise a number of sampling values being less than or equal to 600.
11. The apparatus of claim 1, wherein the block generator is configured to perform a block-wise conversion of the audio signal being a time domain audio signal into a frequency domain to acquire a spectral representation for each block, wherein the audio signal analyzer is configured to calculate the block characteristic or the average characteristic using the spectral representation of the current block, and wherein the separator is configured to separate the spectral representation into the background portion and the foreground portion so that, for spectral bins of the background portion and the foreground portion corresponding to a same frequency, each comprises a spectral value different from zero, wherein a relation of the spectral value of the foreground portion and the spectral value of the background portion within a same frequency bin depends on the ratio of the block characteristic of the current block and the average characteristic of the group of blocks.
12. The apparatus of claim 1, wherein the block generator is configured to perform a block-wise conversion of a time domain into a frequency domain to acquire a spectral representation for each block, wherein time adjacent blocks are overlapping in an overlapping range, wherein the apparatus further comprises a signal composer for composing the background component signal and for composing the foreground component signal, and wherein the signal composer is configured for performing a frequency-time conversion for the background component signal and for the foreground component signal and for cross-fading time representations of the time-adjacent blocks within the overlapping range to acquire a time domain foreground component signal and a separate time domain background component signal.
13. The apparatus of claim 1, wherein the audio signal analyzer is configured to determine the average characteristic for the group of blocks using a weighted addition of individual block characteristics of blocks in the group of blocks.
14. The apparatus of claim 1, wherein the audio signal analyzer is configured to perform a weighted addition of individual block characteristics of blocks in the group of blocks, wherein a weighting value for a block characteristic of a block close in time to the current block is greater than a weighting value for a block characteristic of a further block less close in time to the current block.
15. The apparatus of claim 13, wherein the audio signal analyzer is configured to determine the group of blocks so that the group of blocks comprises at least twenty blocks before the current block or at least twenty blocks subsequent to the current block.
16. The apparatus of claim 1, wherein the audio signal analyzer is configured to use a normalization value depending on a number of blocks in the group of blocks or depending on weighting values for blocks in the group of blocks.
17. The apparatus of claim 1, further comprising a signal characteristic measurer for measuring a signal characteristic of at least one of the background component signals and the foreground component signal.
18. The apparatus of claim 17, wherein the signal characteristic measurer is configured to determine a foreground density using the foreground component signal or to determine a foreground prominence using the foreground component signal and the audio signal.
19. The apparatus of claim 1, wherein the foreground component signal comprises clap signals, wherein the apparatus further comprises a signal characteristic modifier for modifying the foreground component signal by increasing a number of claps or decreasing a number of claps or by applying a weight to the foreground component signal or the background component signal to modify an energy relation between the foreground component signal and the background component signal being a noise-like signal.
20. The apparatus of claim 1, further comprising a blind upmixer for upmixing the audio signal into a representation comprising a number of output channels being greater than a number of channels of the audio signal, wherein the blind upmixer is configured to spatially distribute the foreground component signal into each of the number of output channels wherein the foreground component signals in the number of output channels are correlated, and to spatially distribute the background component signal into each of the number of output channels, wherein the background component signals in the output channels are less correlated than the foreground component signals or are uncorrelated to each other.
21. The apparatus of claim 1, further comprising an encoder stage for separately encoding the foreground component signal and the background component signal to acquire an encoded representation of the foreground component signal and a separate encoded representation of the background component signal for transmission or storage or decoding.
22. A method of decomposing an audio signal into a background component signal and a foreground component signal, the method comprising: generating a time sequence of blocks of audio signal values; determining a block characteristic of a current block of the audio signal and determining an average characteristic for a group of blocks, the group of blocks comprising at least two blocks; and separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal comprises the background portion of the current block and the foreground component signal comprises the foreground portion of the current block.
23. A non-transitory digital storage medium having a computer program stored thereon to perform a method of decomposing an audio signal into a background component signal and a foreground component signal, the method comprising: generating a time sequence of blocks of audio signal values; determining a block characteristic of a current block of the audio signal and determining an average characteristic for a group of blocks, the group of blocks comprising at least two blocks; and separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal comprises the background portion of the current block and the foreground component signal comprises the foreground portion of the current block, when the computer program is run by a computer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present invention will be detailed subsequently referring to the appended drawings.
DETAILED DESCRIPTION OF THE INVENTION
(17)
(18) Furthermore, the apparatus comprises a separator 130 for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic. Thus, the ratio of the block characteristic of the current block and the average characteristic is used as a characteristic, based on which the separation of the current block of audio signal values is performed. Particularly, the background component signal at signal output 140 comprises the background portion of the current block, and the foreground component signal output at the foreground component signal output 150 comprises the foreground portion of the current block. The procedure illustrated in
(19) Advantageously, the audio signal analyzer 120 is configured for analyzing an amplitude-related measure as the block characteristic of the current block and for analyzing the same amplitude-related measure for the group of blocks.
(20) Advantageously, a power measure or an energy measure for the current block and an average power measure or an average energy measure for the group of blocks are determined by the audio signal analyzer, and the ratio between those two values for the current block is used by the separator 130 to perform the separation.
(21)
(22) In step 202, a separation gain is calculated from the ratio or the characteristic. Then, a threshold comparison can optionally be performed in step 204. When a threshold comparison is performed in step 204, the result can be that the characteristic is in a predetermined relation to the threshold. When this is the case, the control proceeds to step 206. When, however, it is determined in step 204 that the characteristic is not in the predetermined relation to the threshold, then no separation is performed and the control proceeds to the next block in the sequence of blocks.
(23) In accordance with the first aspect, a threshold comparison in step 204 can be performed or can, alternatively, not be performed as illustrated by the broken line 208. When it is determined in block 204 that the characteristic is in a predetermined relation to the separation threshold or, in the alternative of line 208, in any case, step 206 is performed, where the audio signals are weighted using a separation gain. To this end, step 206 receives the audio signal values of an input audio signal in a time representation or, advantageously, a spectral representation as illustrated by line 210. Then, depending on the application of the separation gain, the foreground component C is calculated as illustrated by the equation directly below
(24)
(25) Subsequently,
(26)
(27) The characteristic of the current block and the variability of the characteristic are both forwarded to the separator 130 via a connection line 129. The separator is then configured for separating the current block into a background portion and a foreground portion to generate the background component signal 140 and the foreground component signal 150. Particularly, the separator is configured, in accordance with the second aspect, to determine a separation threshold based on the variability determined by the audio signal analyzer and to separate the current block into the background component signal portion and the foreground component signal portion when the characteristic of the current block is in a predetermined relation to the separation threshold. When, however, the characteristic of the current block is not in the predetermined relation to the (variable) separation threshold, then no separation of the current block is performed and the whole current block is forwarded to, used as, or assigned to the background component signal 140.
(28) Specifically, the separator 130 is configured to determine a first separation threshold for a first variability and a second separation threshold for a second variability, wherein the first separation threshold is lower than the second separation threshold and the first variability is lower than the second variability, and wherein the predetermined relation is “greater than”.
(29) An example is illustrated in
(30) Depending on certain implementations, the separator 130 is configured to determine the (variable) separation threshold either using a table access, where the functions illustrated in
(31) As illustrated in
(32)
(33) Particularly, a separation stage 600 that is illustrated in detail in
(34) Advantageously, the input signal a(t) is separated/decomposed into distinctly perceivable claps c(t) and a more noise-like background signal n(t), so that the decomposed signal parts can be processed individually. After processing, the modified foreground and background signals c′(t) and n′(t) are re-synthesized, resulting in the output signal a′(t).
(35)
(36) Particularly, the system in
(37) The applause input signal a(t), i.e., the input signal comprising background components and applause components, is fed into a signal switch (not shown in
(38)
(39) The signal separator 130 in
(40) Furthermore, when the adaptive thresholding operation in accordance with the second aspect is performed, the audio signal analyzer additionally performs an envelope variability estimation as illustrated in block 174, and the variability measure $v(n)$ is forwarded to the separator, and particularly to the adaptive thresholding processing block 182, to finally obtain the gain $g_s(n)$ as will be described later on.
(41) A flow chart of the internals of the foreground signal detector is depicted in
(42)
(43) where $w(n)$ denotes a weighting window applied to the instantaneous energy estimates with window length $L_w = 2M+1$. As an indication of whether a distinct clap is active within the input signal, the energy ratio $\Psi(n)$ of instantaneous and average energy is used according to:
(44)
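The energy measures and the ratio described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the instantaneous energy of a block is the sum of squared spectral magnitudes, that the average energy is a weighted mean over $2M+1$ neighboring blocks, and that blocks outside the signal are skipped with renormalized weights. The function names and the small constant `eps` are hypothetical.

```python
def block_energy(spectrum):
    """Instantaneous energy of one block (assumed: sum of squared magnitudes)."""
    return sum(abs(x) ** 2 for x in spectrum)


def average_energy(energies, n, window):
    """Weighted average of block energies around block n.

    `window` has odd length L_w = 2M + 1. Blocks that fall outside the
    signal are skipped and the weights renormalized (assumption of this sketch).
    """
    M = (len(window) - 1) // 2
    num = 0.0
    den = 0.0
    for m in range(-M, M + 1):
        idx = n + m
        if 0 <= idx < len(energies):
            num += window[m + M] * energies[idx]
            den += window[m + M]
    return num / den if den > 0 else 0.0


def energy_ratio(energies, n, window, eps=1e-12):
    """Ratio of the instantaneous energy of block n to the local average energy."""
    return energies[n] / (average_energy(energies, n, window) + eps)


# Usage: a single loud block among quiet ones yields a ratio well above 1.
energies = [1.0, 1.0, 10.0, 1.0, 1.0]
psi = energy_ratio(energies, 2, [1.0, 1.0, 1.0])
```

A ratio close to 1 indicates a block that is no louder than its neighborhood, while a large ratio marks a candidate clap.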
(45) In the simpler case without adaptive thresholding, for time instances where the energy ratio exceeds the attack threshold $\tau_{attack}$, the separation gain which extracts the distinct clap part from the input signal is set to 1; consequently, the noise-like signal is zero at these time instances. A block diagram of a system with hard signal switching is depicted in
(46)
(47) In a further embodiment, the above equation is replaced by the following equation:
(48)
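A soft separation gain of the kind named in claim 6 might be sketched as below. The exact equation is not reproduced in this text, so the form used here, $\max(1 - g_N/\Psi(n), 0)^{p}$ with a lower bound of zero, is only one assumed reading of the claimed term. The function name, the default values and the handling of non-positive ratios are hypothetical.

```python
def separation_gain(ratio, g_n=1.0, p=0.5, tau_attack=0.0):
    """Soft separation gain from the energy ratio (assumed form, see lead-in).

    Returns 0 when the ratio is below the attack threshold, so the whole
    block stays in the background. Otherwise the gain grows toward 1 as
    the ratio increases.
    """
    if ratio <= 0.0 or ratio < tau_attack:
        return 0.0
    return max(1.0 - g_n / ratio, 0.0) ** p


# Usage: with tau_attack = 0 the gain depends only on the ratio and g_n.
g_quiet = separation_gain(0.5)   # ratio below g_n: nothing is routed to the foreground
g_loud = separation_gain(4.0)    # ratio well above g_n: most of the block is foreground
```

Because the gain stays below 1 for any finite ratio, part of every block remains in the background, which is consistent with the stated aim of avoiding an energy dropout there.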
(49) Note: if $\tau_{attack}=0$, the amount of signal routed to the distinct clap signal depends only on the energy ratio $\Psi(n)$ and the fixed gain $g_N$, yielding a signal-dependent soft decision. In a well-tuned system, the time period in which the energy ratio exceeds the attack threshold captures only the actual transient event. In some cases, it might be desirable to extract a longer period of time frames after an attack has occurred. This can be done, for instance, by introducing a release threshold $\tau_{release}$ indicating the level to which the energy ratio $\Psi$ has to decrease after an attack before the separation gain is set back to zero:
(50)
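The attack/release behavior described above amounts to a hysteresis on the energy ratio. The following sketch assumes the "greater than" relation with a release threshold below the attack threshold. The function name and the pluggable `gain_fn` are hypothetical, and a hard gain of 1 is used by default.

```python
def hysteresis_gains(ratios, tau_attack, tau_release, gain_fn=lambda r: 1.0):
    """Per-block separation gains with attack/release hysteresis.

    Separation starts when a ratio reaches tau_attack and continues for
    following blocks until the ratio falls below tau_release.
    """
    gains = []
    active = False
    for r in ratios:
        if r >= tau_attack:
            active = True            # attack detected
        elif active and r < tau_release:
            active = False           # ratio has decayed far enough: release
        gains.append(gain_fn(r) if active else 0.0)
    return gains


# Usage: blocks after the attack stay in the foreground until the ratio drops below 2.
gains = hysteresis_gains([1.0, 5.0, 3.0, 2.5, 1.5, 2.5], tau_attack=4.0, tau_release=2.0)
```

Note that the last block, with a ratio of 2.5, is not separated: it exceeds the release threshold but no new attack has occurred.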
(51) In a further embodiment, the immediately preceding equation is replaced by the following equation:
(52)
(53) An alternative but more static method is to simply route a certain number of frames after a detected attack to the distinct clap signal.
(54) In order to increase flexibility of the thresholding, thresholds could be chosen in a signal-adaptive manner, resulting in $\tau_{attack}(n)$ and $\tau_{release}(n)$, respectively. The thresholds are controlled by an estimate of the variability of the envelope of the applause input signal, where a high variability indicates the presence of distinctive and individually perceivable claps and a rather low variability indicates a more noise-like and stationary signal. Variability estimation could be done in the time domain as well as in the frequency domain. The advantageous method in this case is to do the estimation in the frequency domain:
$$v'(n) = \operatorname{var}\big(\left[\Phi_A(n-M),\ \Phi_A(n-M+1),\ \ldots,\ \Phi_A(n+m),\ \ldots,\ \Phi_A(n+M)\right]\big), \quad m = -M, \ldots, M$$
where $\operatorname{var}(\cdot)$ denotes the variance computation. To yield a more stable signal, the estimated variability is smoothed by low-pass filtering, yielding the final envelope variability estimate
$$v(n) = h_{TP}(n) * v'(n)$$
(55) where $*$ denotes a convolution. The mapping of envelope variability to corresponding threshold values can be done by mapping functions $f_{attack}(x)$ and $f_{release}(x)$ such that
$$\tau_{attack}(n) = f_{attack}(v(n))$$
$$\tau_{release}(n) = f_{release}(v(n))$$
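The variability estimate can be sketched as a sliding-window variance of the block energies followed by a low-pass filter. The text does not specify the low-pass filter $h_{TP}$, so a one-pole recursive smoother is swapped in here purely as an assumption. The function names, the truncation of the window at the signal edges and the coefficient `alpha` are hypothetical.

```python
def raw_variability(energies, n, M):
    """Variance of the block energies in the window [n - M, n + M].

    The window is truncated at the signal edges (assumption of this sketch).
    """
    lo = max(0, n - M)
    hi = min(len(energies), n + M + 1)
    window = energies[lo:hi]
    mean = sum(window) / len(window)
    return sum((e - mean) ** 2 for e in window) / len(window)


def smooth(values, alpha=0.9):
    """One-pole low-pass filter, used here as a stand-in for h_TP."""
    out = []
    state = values[0] if values else 0.0
    for v in values:
        state = alpha * state + (1.0 - alpha) * v
        out.append(state)
    return out


# Usage: raw variability per block, then the smoothed estimate v(n).
energies = [0.0, 0.0, 10.0, 0.0, 0.0]
v_raw = [raw_variability(energies, n, M=2) for n in range(len(energies))]
v = smooth(v_raw)
```

A stationary, noise-like passage gives a variability near zero, while isolated claps give large values.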
(56) In one embodiment, the mapping function could be realized as clipped linear functions, which corresponds to a linear interpolation of the thresholds. The configuration for this scenario is depicted in
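A clipped linear mapping from variability to threshold might look as follows. The breakpoints and threshold values used here are invented for illustration and are not taken from the source. The only property carried over from the description is that a lower variability maps to a lower threshold.

```python
def clipped_linear(x, x_lo, x_hi, y_lo, y_hi):
    """Linear interpolation between (x_lo, y_lo) and (x_hi, y_hi), clipped outside."""
    if x <= x_lo:
        return y_lo
    if x >= x_hi:
        return y_hi
    t = (x - x_lo) / (x_hi - x_lo)
    return y_lo + t * (y_hi - y_lo)


def attack_threshold(v):
    # Hypothetical breakpoints: thresholds between 2 and 4 for variability in [0, 1].
    return clipped_linear(v, 0.0, 1.0, 2.0, 4.0)


def release_threshold(v):
    # Hypothetical breakpoints, chosen below the attack threshold.
    return clipped_linear(v, 0.0, 1.0, 1.0, 2.0)
```

Keeping the release curve below the attack curve for every variability value preserves the hysteresis between attack and release.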
(57) The separated signals are obtained by
$$C(k,n) = g_s(n) \cdot A(k,n)$$
$$N(k,n) = A(k,n) - C(k,n)$$
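The two equations above translate directly into a per-block operation on the spectral values. In this minimal sketch the function name is hypothetical, and the spectrum is assumed to be a list of complex bins for one block.

```python
def separate_block(spectrum, gain):
    """Split one block of spectral values into foreground C and background N.

    C(k, n) = g_s(n) * A(k, n), and N(k, n) = A(k, n) - C(k, n) is the remainder.
    """
    foreground = [gain * a for a in spectrum]
    background = [a - c for a, c in zip(spectrum, foreground)]
    return foreground, background


# Usage: the two parts always add up to the input spectrum.
A = [2 + 0j, 4j]
C, N = separate_block(A, 0.25)
```

Since the same gain is applied to every bin of a block, the foreground and background share the spectral shape of the input and differ only in level.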
(58)
(59) Furthermore,
(60) Furthermore, as illustrated with respect to equations (7) to (9) in
(61) Furthermore,
(62) Particularly,
(63) Alternatively, as illustrated in the right portion of
(64) The separated applause signal parts can be fed into measurement stages where certain (perceptually motivated) characteristics of transient signals can be measured. An exemplary configuration for such a use case is depicted in
(65) Estimating the foreground density $\Theta_{FGD}(n)$ can be done by counting the event rate per second, i.e., the number of detected claps per second. The foreground prominence $\Theta_{FFG}(n)$ is given by the energy ratio of the estimated foreground clap signal C(n) and the input signal A(n):
(66)
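Both measures can be sketched under stated assumptions. A detected clap is taken to be a transition of the separation gain from zero to a positive value, and the prominence is taken to be foreground energy divided by input energy. Neither definition is spelled out in this text, so both are assumptions, as are the function names and the `block_rate_hz` parameter.

```python
def foreground_density(gains, block_rate_hz):
    """Detected clap events per second.

    An event is counted at each transition of the gain from 0 to a positive value.
    """
    previous = [0.0] + list(gains[:-1])
    onsets = sum(1 for prev, cur in zip(previous, gains) if prev == 0.0 and cur > 0.0)
    duration_s = len(gains) / block_rate_hz
    return onsets / duration_s if duration_s > 0 else 0.0


def foreground_prominence(fg_spectrum, in_spectrum, eps=1e-12):
    """Energy of the foreground block relative to the energy of the input block."""
    e_fg = sum(abs(c) ** 2 for c in fg_spectrum)
    e_in = sum(abs(a) ** 2 for a in in_spectrum)
    return e_fg / (e_in + eps)


# Usage: ten blocks at ten blocks per second, containing two separate events.
density = foreground_density([0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0], 10.0)
```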
(67) A block diagram of the restoration of the measured signal characteristics is depicted in
(68) While in the previous embodiment the signal characteristic was only measured, in this embodiment the system is used to modify signal characteristics. In one embodiment, the foreground processing could output a reduced number of the detected foreground claps, resulting in a density modification towards lower density of the resulting output signal. In another embodiment, the foreground processing could output an increased number of foreground claps, e.g., by adding a delayed version of the foreground clap signal to itself, resulting in a density modification towards increased density. Furthermore, by applying weights in the respective processing stages, the balance of foreground claps and noise-like background could be modified. Additionally, any processing such as filtering, adding reverb, delay, etc. in both paths can be used to modify the characteristics of an applause signal.
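The density increase by a delayed copy and the re-weighting of the two paths can be illustrated on time-domain sample lists. This is a sketch with hypothetical function names, and the delay and weights are free parameters that the text does not specify.

```python
def increase_density(foreground, delay, weight=1.0):
    """Add a delayed, weighted copy of the foreground clap signal to itself."""
    out = list(foreground) + [0.0] * delay
    for i, sample in enumerate(foreground):
        out[i + delay] += weight * sample
    return out


def rebalance(foreground, background, w_fg, w_bg):
    """Re-synthesize the output with modified foreground/background weights."""
    return [w_fg * c + w_bg * b for c, b in zip(foreground, background)]


# Usage: one clap becomes two, two samples apart.
denser = increase_density([1.0, 0.0, 0.0], delay=2)
```

The output of `increase_density` is longer than its input by `delay` samples, so the background path would need matching padding before the two are recombined.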
(69)
(70) Subsequently, further advantageous embodiments are discussed with respect to
(71) In the
(72) The exemplarily illustrated overlapping blocks consist, for example, of a current block 304 that overlaps within the overlap range with a preceding block 303 or a following block 305. Thus, when a group of blocks comprises at least two preceding blocks then this group of blocks would consist of the preceding block 303 with respect to the current block 304 and the further preceding block indicated with order number 3 in
(73) These blocks are, for example, formed by the block generator 110 that advantageously also performs a time-spectral conversion such as the DFT mentioned earlier or an FFT (Fast Fourier transform).
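Block generation with overlap and a block-wise DFT can be sketched with the standard library only. The block length, hop size and the direct (non-fast) DFT are illustrative choices. An analysis window, which a practical system would likely apply, is omitted here.

```python
import cmath
import math


def overlapping_blocks(samples, block_len, hop):
    """Time sequence of blocks; adjacent blocks overlap by block_len - hop samples."""
    last_start = len(samples) - block_len
    return [samples[s:s + block_len] for s in range(0, last_start + 1, hop)]


def dft(block):
    """Direct discrete Fourier transform of one block (O(N^2), for illustration)."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(block))
        for k in range(n)
    ]


# Usage: 50 % overlap, then one spectrum per block.
blocks = overlapping_blocks(list(range(8)), block_len=4, hop=2)
spectra = [dft(b) for b in blocks]
```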
(74) The result of the time-spectral conversion is a sequence of spectral blocks I to VIII, where each spectral block illustrated in
(75) Advantageously, a separation is then performed in the frequency domain, i.e., using the spectral representation where the audio signal values are spectral values. Subsequent to the separation, a foreground spectral representation, once again consisting of blocks I to VIII, and a background representation consisting of blocks I to VIII, are obtained. Naturally, and depending on the thresholding operation, it is not necessarily the case that each block of the foreground representation subsequent to the separation 130 has values different from zero. However, advantageously, at least the first aspect of the present invention ensures that each block in the spectral representation of the background component has values different from zero in order to avoid a dropout of energy in the background signal component.
(76) For each component, i.e., the foreground component and the background component, a spectral-time conversion is performed as has been discussed in the context of
(77) Advantageously, as illustrated in
(78) In particular, step 400 illustrates the determination of a general characteristic or of a ratio between a block characteristic and an average characteristic for a current block.
(79) In block 402, a raw variability is calculated with respect to the current block. In block 404, raw variabilities for preceding or following blocks are calculated so that the outputs of blocks 402 and 404 together form a sequence of raw variabilities. In block 406, the sequence is smoothed. Thus, at the output of block 406, a smoothed sequence of variabilities exists. The variabilities of the smoothed sequence are mapped to corresponding adaptive thresholds as illustrated in block 408 so that one obtains the variable threshold for the current block.
(80) An alternative embodiment is illustrated in
(81) In block 403, a sequence of variabilities is calculated using, for example, equation 6 of
(82) In block 405, the sequence of variabilities is mapped to a sequence of raw thresholds in accordance with equation 8 and equation 9 but with non-smoothed variabilities in contrast to equation 7 of
(83) In block 407, the sequence of raw thresholds is smoothed in order to finally obtain the (smoothed) threshold for the current block.
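The two orderings described above, smoothing the variabilities before mapping them and mapping first and then smoothing the raw thresholds, can be compared in a short sketch. The one-pole smoother, the clipped linear mapping and all numeric values are assumptions chosen for illustration.

```python
def smooth(values, alpha=0.5):
    """One-pole low-pass filter (assumed stand-in for the smoothing step)."""
    out = []
    state = values[0] if values else 0.0
    for v in values:
        state = alpha * state + (1.0 - alpha) * v
        out.append(state)
    return out


def to_threshold(v, v_lo=0.0, v_hi=1.0, t_lo=2.0, t_hi=4.0):
    """Clipped linear mapping from variability to threshold (hypothetical values)."""
    v = min(max(v, v_lo), v_hi)
    return t_lo + (v - v_lo) / (v_hi - v_lo) * (t_hi - t_lo)


def smooth_then_map(variabilities):
    """Smooth the raw variabilities, then map them to thresholds."""
    return [to_threshold(v) for v in smooth(variabilities)]


def map_then_smooth(variabilities):
    """Map the raw variabilities to raw thresholds, then smooth the thresholds."""
    return smooth([to_threshold(v) for v in variabilities])


# Usage: the orderings differ once the mapping clips.
a = smooth_then_map([0.0, 3.0])
b = map_then_smooth([0.0, 3.0])
```

While the mapping stays in its linear range the two orderings give the same thresholds, so the choice matters mainly for strongly varying signals that drive the mapping into its clipped regions.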
(84) Subsequently,
(85) Once again, in step 500, a characteristic or ratio between a current block characteristic and an average block characteristic is calculated.
(86) In step 502, an average or, generally, an expectation over the characteristics/ratios for the group of blocks is calculated.
(87) In block 504, differences between the characteristics/ratios and the average value/expectation value are calculated and, as illustrated in block 506, the addition of the differences, or of certain values derived from the differences, is performed, advantageously with a normalization. When the squared differences are added, the sequence of steps 502, 504, 506 reflects the calculation of a variance as has been outlined with respect to equation 6. However, when, for example, magnitudes of differences or powers of differences other than two are added together, a different statistical value derived from the differences between the characteristics and the average/expectation value is used as the variability.
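Steps 502 to 506 can be sketched as one function with a selectable power. A power of two gives the variance, and a power of one gives the mean absolute deviation. The function name and the normalization by the number of blocks are assumptions of this sketch.

```python
def variability(values, power=2.0):
    """Normalized sum of |value - mean| ** power over a group of blocks.

    power = 2 yields the (population) variance; other powers yield
    different statistics of the deviations from the mean.
    """
    mean = sum(values) / len(values)                         # step 502: expectation
    deviations = [abs(v - mean) ** power for v in values]    # step 504: differences
    return sum(deviations) / len(values)                     # step 506: normalized sum


# Usage: the same ratios under two different powers.
ratios = [0.0, 0.0, 10.0, 0.0, 0.0]
variance = variability(ratios)                    # power 2
mean_abs_dev = variability(ratios, power=1.0)     # power 1
```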
(88) Alternatively, however, as illustrated in step 508, also differences between time-following characteristics/ratios for adjacent blocks are calculated and used as the variability measure. Thus, block 508 determines a variability that does not rely on an average value but that relies on a change from one block to the other, wherein, as illustrated in
(89) Subsequently, examples of embodiments are defined that can be used separately from the below examples or in combination with any of the below examples:
1. Apparatus for decomposing an audio signal (100) into a background component signal (140) and a foreground component signal (150), the apparatus comprising: a block generator (110) for generating a time sequence of blocks of audio signal values; an audio signal analyzer (120) for determining a block characteristic of a current block of the audio signal and for determining an average characteristic for a group of blocks, the group of blocks comprising at least two blocks; and a separator (130) for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal (140) comprises the background portion of the current block and the foreground component signal (150) comprises the foreground portion of the current block.
2. Apparatus of example 1, wherein the audio signal analyzer is configured for analyzing an amplitude-related measure as the characteristic of the current block and the amplitude-related characteristic as the average characteristic for the group of blocks.
3. Apparatus of example 1 or 2, wherein the audio signal analyzer (120) is configured for analyzing a power measure or an energy measure for the current block and an average power measure or an average energy measure for the group of blocks.
4. Apparatus of one of the preceding examples, wherein the separator (130) is configured to calculate a separation gain from the ratio, to weight the audio signal values of the current block using the separation gain to obtain the foreground portion of the current frame and to determine the background component so that the background signal constitutes a remaining signal, or wherein the separator is configured to calculate a separation gain from the ratio, to weight the audio signal values of the current block using the separation gain to obtain the background portion of the current frame and to determine the foreground component so that the foreground component signal constitutes a remaining signal.
5. Apparatus of one of the preceding examples, wherein the separator (130) is configured to calculate a separation gain using weighting the ratio using a predetermined weighting factor different from zero.
6. Apparatus of example 5, wherein the separator (130) is configured to calculate the separation gain using a term $1-(g_N/\psi(n))^{p}$ or $(\max(1-(g_N/\psi(n))))^{p}$, wherein $g_N$ is the predetermined factor, $\psi(n)$ is the ratio and $p$ is a power greater than zero and being an integer or a non-integer number, and wherein $n$ is a block index, and wherein max is a maximum function.
7. Apparatus of one of the preceding examples, wherein the separator (130) is configured to compare a ratio of the current block to a threshold and to separate the current block, when the ratio of the current block is in a predetermined relation to the threshold and wherein the separator (130) is configured to not separate a further block, the further block having a ratio not having the predetermined relation to the threshold, so that the further block fully belongs to the background component signal (140).
8. Apparatus of example 7, wherein the separator (130) is configured to separate a following block following the current block in time using comparing the ratio of the following block to a further release threshold, wherein the further release threshold is set such that a block ratio that is not in the predetermined relation to the threshold is in the predetermined relation to the further release threshold.
9. Apparatus of example 8, wherein the predetermined relation is “greater than” and wherein the release threshold is lower than the separation threshold, or wherein the predetermined relation is “lower than” and wherein the release threshold is greater than the separation threshold.
10. Apparatus of one of the preceding examples, wherein the block generator (110) is configured to determine temporally overlapping blocks of audio signal values or wherein the temporally overlapping blocks have a number of sampling values being less than or equal to 600.
11. Apparatus of one of the preceding examples, wherein the block generator is configured to perform a block-wise conversion of the time domain audio signal into a frequency domain to obtain a spectral representation for each block, wherein the audio signal analyzer is configured to calculate the characteristic using the spectral representation of the current block, and wherein the separator (130) is configured to separate the spectral representation into the background portion and the foreground portion so that, for spectral bins of the background portion and the foreground portion corresponding to the same frequency, each have a spectral value different from zero, wherein a relation of the spectral value of the foreground portion and the spectral value of the background portion within the same frequency bin depends on the ratio.
12. Apparatus of one of the preceding examples, wherein the block generator (110) is configured to perform a block-wise conversion of the time domain into the frequency domain to obtain a spectral representation for each block, wherein time adjacent blocks are overlapping in an overlapping range (302), wherein the apparatus further comprises a signal composer (160a, 161a, 160b, 161b) for composing the background component signal and for composing the foreground component signal, wherein the signal composer is configured for performing a frequency-time conversion (161a, 160a, 160b) for the background component signal and for the foreground component signal and for cross-fading (161a, 161b) time representations of time-adjacent blocks within the overlapping range to obtain a time domain foreground component signal and a separate time domain background component signal.
13. Apparatus of one of the preceding examples, wherein the audio signal analyzer (120) is configured to determine the average characteristic for the group of blocks using a weighted addition of individual characteristics of blocks in the group of blocks.
14. Apparatus of one of the preceding examples, wherein the audio signal analyzer (120) is configured to perform a weighted addition of individual characteristics of blocks in the group of blocks, wherein a weighting value for a characteristic of a block close in time to the current block is greater than a weighting value for a characteristic of a further block less close in time to the current block.
15. Apparatus of example 13 or 14, wherein the audio signal analyzer (120) is configured to determine the group of blocks so that the group of blocks comprises at least twenty blocks before the corresponding block or at least twenty blocks subsequent to the current block.
16. Apparatus of one of the preceding examples, wherein the audio signal analyzer is configured to use a normalization value depending on a number of blocks in the group of blocks or depending on the weighting values for the blocks in the group of blocks.
17. Apparatus of one of the preceding examples, further comprising a signal characteristic measurer (702, 704) for measuring a signal characteristic of at least one of the background component signals or the foreground component signals.
18. Apparatus of example 17, wherein the signal characteristic measurer is configured to determine a foreground density (702) using the foreground component signal or to determine a foreground prominence (704) using the foreground component signal and the audio input signal.
19. Apparatus of one of the preceding examples, wherein the foreground component signal comprises clap signals, wherein the apparatus further comprises a signal characteristic modifier for modifying the foreground component signal by increasing a number of claps or decreasing a number of claps or by applying a weight to the foreground component signal or the background component signal to modify an energy relation between the foreground clap signal and the background component signal being a noise-like signal.
20. Apparatus of one of the preceding examples, further comprising a blind upmixer for upmixing the audio signal into a representation having a number of output channels being greater than a number of channels of the audio signal, wherein the upmixer is configured to spatially distribute the foreground component signal into the output channels wherein the foreground component signals in the number of output channels are correlated, and to spatially distribute the background component signal into the output channels, wherein the background component signals in the output channels are less correlated than the foreground component signals or are uncorrelated to each other.
21.
Apparatus of one of the preceding examples, further comprising an encoder stage (801, 802) for separately encoding the foreground component signal and the background component signal to obtain an encoded representation (804) of the foreground component signal and a separate encoded representation of the background component signal (806) for transmission or storage or decoding. 22. Method of decomposing an audio signal (100) into a background component signal (140) and a foreground component signal (150), the method comprising: generating (110) a time sequence of blocks of audio signal values; determining (120) a block characteristic of a current block of the audio signal and determining an average characteristic for a group of blocks, the group of blocks comprising at least two blocks; and separating (130) the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal (140) comprises the background portion of the current block and the foreground component signal (150) comprises the foreground portion of the current block.
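The ratio-based examples above can be illustrated with a minimal Python sketch. It is not the patented implementation, only one possible reading under stated assumptions: the block characteristic is taken to be the block energy, the average is a normalized weighted addition over preceding blocks with exponentially decaying weights, the maximum function is given a floor of 0 as its second argument, the predetermined relation is “greater than”, and processing is done in the time domain. All function names, parameter names and default values (`separate`, `g_n`, `p`, `threshold`, `release`, `group`, `decay`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Block = List[float]


def block_energy(block: Sequence[float]) -> float:
    """Block characteristic; energy is an assumed choice for this sketch."""
    return sum(x * x for x in block)


def weighted_average(history: Sequence[float], decay: float) -> float:
    """Weighted addition of earlier characteristics, normalized by the weight sum.

    history[-1] is the block closest in time and receives the largest weight.
    """
    count = len(history)
    weights = [decay ** (count - 1 - i) for i in range(count)]
    return sum(w * c for w, c in zip(weights, history)) / sum(weights)


def separate(
    blocks: Sequence[Sequence[float]],
    g_n: float = 1.0,        # predetermined weighting factor (assumed value)
    p: float = 0.5,          # power greater than zero (assumed value)
    threshold: float = 2.0,  # separation threshold (assumed value)
    release: float = 1.5,    # release threshold, lower than the separation threshold
    group: int = 20,         # number of preceding blocks in the group
    decay: float = 0.9,      # weight decay per block of distance (assumed value)
) -> Tuple[List[Block], List[Block]]:
    """Split each block into a foreground part and a remaining background part."""
    foreground: List[Block] = []
    background: List[Block] = []
    energies: List[float] = []
    active = False  # True while the previous block was separated
    for block in blocks:
        energy = block_energy(block)
        # The first block has no history, so it is compared with itself.
        average = weighted_average(energies[-group:], decay) if energies else energy
        if average > 0.0:
            ratio = energy / average
        else:
            ratio = math.inf if energy > 0.0 else 0.0
        energies.append(energy)
        # Hysteresis: once separation is active, the lower release threshold applies.
        active = ratio > (release if active else threshold)
        # Gain term max(1 - g_N / ratio, 0) ** p; the floor of 0 is an assumption.
        gain = max(1.0 - g_n / ratio, 0.0) ** p if active else 0.0
        fg = [gain * x for x in block]
        bg = [x - f for x, f in zip(block, fg)]  # background is the remaining signal
        foreground.append(fg)
        background.append(bg)
    return foreground, background
```

Calling `separate` on a list of equally long blocks returns two lists of blocks whose sample-wise sum reproduces the input. Blocks whose ratio does not exceed the applicable threshold receive a gain of zero and therefore fall entirely into the background.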
(90) Subsequently, further examples are described that can be used separately from the above examples or in combination with any of the above examples.
1. Apparatus for decomposing an audio signal into a background component signal and a foreground component signal, the apparatus comprising: a block generator (110) for generating a time sequence of blocks of audio signal values; an audio signal analyzer (120) for determining a characteristic of a current block of the audio signal and for determining a variability of the characteristic within a group of blocks comprising at least two blocks of the sequence of blocks; and a separator (130) for separating the current block into a background portion (140) and a foreground portion (150), wherein the separator (130) is configured to determine (182) a separation threshold based on the variability and to separate the current block into the background component signal (140) and the foreground component signal (150), when the characteristic of the current block is in a predetermined relation to the separation threshold, or to determine the whole current block as a foreground component signal, when the characteristic of the current block is in the predetermined relation to the separation threshold, or to determine the whole current block as a background component signal, when the characteristic of the current block is not in the predetermined relation to the separation threshold.
2. Apparatus of example 1, wherein the separator (130) is configured to determine a first separation threshold (401) for a first variability (501) and a second separation threshold (402) for a second variability (502), wherein the first separation threshold (401) is lower than the second separation threshold (402), and the first variability (501) is lower than the second variability (502) and wherein the predetermined relation is “greater than”, or wherein the first separation threshold is greater than the second separation threshold, wherein the first variability is lower than the second variability, and wherein the predetermined relation is “lower than”.
3. Apparatus of example 1 or 2, wherein the separator (130) is configured to determine the separation threshold using a table access or using a monotonic interpolation function interpolating between a first separation threshold (401) and a second separation threshold (402), so that, for a third variability (503), a third separation threshold (403) is obtained, and for a fourth variability (504), a fourth separation threshold (404) is obtained, wherein the first separation threshold (401) is associated with a first variability (501), and the second separation threshold (402) is associated with a second variability (502), wherein the third variability (503) and the fourth variability are located, with respect to their values, between the first variability (501) and the second variability (502), and wherein the third separation threshold (403) and the fourth separation threshold (404) are located, with respect to their values, between the first separation threshold (401) and the second separation threshold (402).
4. Apparatus of example 3, wherein the monotonic interpolation function is a linear function or a quadratic function or a cubic function or a power function with an order greater than 3.
5. Apparatus of one of examples 1 to 4, wherein the separator (130) is configured to determine, based on the variability of the characteristic with respect to the current block, a raw separation threshold (405) and based on the variability of at least one preceding or following block, at least one further raw separation threshold (405), and to determine (407) the separation threshold for the current block by smoothing a sequence of raw separation thresholds, the sequence comprising the raw separation threshold and the at least one further raw separation threshold, or wherein the separator (130) is configured to determine a raw variability (402) of the characteristic for the current block and, additionally, to calculate (404) a raw variability for a preceding or a following block, and wherein the separator (130) is configured for smoothing a sequence of raw variabilities comprising the raw variability for the current block and the at least one further raw variability for the preceding or the following block to obtain a smoothed sequence of variabilities, and to determine separation thresholds based on the smoothed variability of the current block.
6. Apparatus of one of the preceding examples, wherein the audio signal analyzer (120) is configured to determine the variability by calculating a characteristic of each block in the group of blocks to obtain a group of characteristics and by calculating a variance of the group of characteristics, wherein the variability corresponds to the variance or depends on the variance of the group of characteristics.
7. Apparatus of one of the preceding examples, wherein the audio signal analyzer (120) is configured to calculate the variability using an average or expected characteristic (502) and differences (504) between the characteristics in the group of characteristics and the average or expected characteristic, or by calculating the variability using differences (508) between characteristics of the group of characteristics following in time.
8. Apparatus of one of the preceding examples, wherein the audio signal analyzer (120) is configured to calculate the variability of the characteristic within the group of characteristics comprising at least two blocks preceding the current block or at least two blocks following the current block.
9. Apparatus of one of the preceding examples, wherein the audio signal analyzer (120) is configured to calculate the variability of the characteristic within the group of blocks consisting of at least thirty blocks.
10. Apparatus of one of the preceding examples, wherein the audio signal analyzer (120) is configured to calculate the characteristic as a ratio of a block characteristic of the current block and an average characteristic for a group of blocks comprising at least two blocks, and wherein the separator (130) is configured to compare the ratio to the separation threshold determined based on the variability of the ratio associated with the current block within the group of blocks.
11. Apparatus of example 10, wherein the audio signal analyzer (120) is configured to use, for the calculation of the average characteristic, and for the calculation of the variability, the same group of blocks.
12. Apparatus of one of the preceding examples, wherein the audio signal analyzer is configured for analyzing an amplitude-related measure as the characteristic of the current block and the amplitude-related characteristic as the average characteristic for the group of blocks.
13. Apparatus of one of the preceding examples, wherein the separator (130) is configured to calculate the separation gain from the characteristic, to weight the audio signal values of the current block using the separation gain to obtain the foreground portion of the current frame and to determine the background component so that the background signal constitutes a remaining signal, or wherein the separator is configured to calculate a separation gain from the characteristic, to weight the audio signal values of the current block using the separation gain to obtain the background portion of the current frame and to determine the foreground component so that the foreground component signal constitutes a remaining signal.
14. Apparatus of one of the preceding examples, wherein the separator (130) is configured to separate a following block following the current block in time by comparing the characteristic of the following block to a further release threshold, wherein the further release threshold is set such that a characteristic that is not in the predetermined relation to the threshold is in the predetermined relation to the further release threshold.
15. Apparatus of example 14, wherein the separator (130) is configured to determine the release threshold based on the variability and to separate the following block, when the characteristic of the current block is in a further predetermined relation to the release threshold.
16. Apparatus of example 14 or 15, wherein the predetermined relation is “greater than” and wherein the release threshold is lower than the separation threshold, or wherein the predetermined relation is “lower than” and wherein the release threshold is greater than the separation threshold.
17. Apparatus of one of the preceding examples, wherein the block generator (110) is configured to determine temporally overlapping blocks of audio signal values or wherein the temporally overlapping blocks have a number of sampling values being less than or equal to 600.
18. Apparatus of one of the preceding examples, wherein the block generator is configured to perform a block-wise conversion of the time domain audio signal into a frequency domain to obtain a spectral representation for each block, wherein the audio signal analyzer is configured to calculate the characteristic using the spectral representation of the current block, and wherein the separator (130) is configured to separate the spectral representation into the background portion and the foreground portion so that spectral bins of the background portion and the foreground portion corresponding to the same frequency each have a spectral value different from zero, wherein a relation of the spectral value of the foreground portion and the spectral value of the background portion within the same frequency bin depends on the characteristic.
19. Apparatus of one of the preceding examples, wherein the audio signal analyzer (120) is configured to calculate the characteristic using the spectral representation of the current block and to calculate the variability for the current block using the spectral representation of the group of blocks.
20. Method for decomposing an audio signal into a background component signal and a foreground component signal, the method comprising: generating (110) a time sequence of blocks of audio signal values; determining (120) a characteristic of a current block of the audio signal and determining a variability of the characteristic within a group of blocks comprising at least two blocks of the sequence of blocks; and separating (130) the current block into a background portion (140) and a foreground portion (150), wherein a separation threshold is determined based on the variability and wherein the current block is separated into the background component signal (140) and the foreground component signal (150), when the characteristic of the current block is in a predetermined relation to the separation threshold, or wherein the whole current block is determined as a foreground component signal, when the characteristic of the current block is in the predetermined relation to the separation threshold, or wherein the whole current block is determined as a background component signal, when the characteristic of the current block is not in the predetermined relation to the separation threshold.
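The variability-based examples above can likewise be sketched in Python. This is only an illustration under assumptions, not the method as implemented by the inventors: the variability is taken to be the variance of the characteristic over a group of blocks ending at the current block, the threshold is obtained by a clamped linear interpolation between two variability/threshold pairs (a linear function being one of the monotonic functions named above), and the smoothing is a first-order recursive filter, which the text does not prescribe. All names and numeric defaults (`v_lo`, `v_hi`, `t_lo`, `t_hi`, `alpha`, `group`) are hypothetical.

```python
from typing import List, Sequence


def variance(values: Sequence[float]) -> float:
    """Variability measure: population variance of the group of characteristics."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def threshold_from_variability(
    v: float, v_lo: float, v_hi: float, t_lo: float, t_hi: float
) -> float:
    """Monotonic (here linear) interpolation between two separation thresholds.

    A lower variability maps to a lower threshold, matching the "greater than"
    relation; values outside [v_lo, v_hi] are clamped (an assumption).
    """
    if v <= v_lo:
        return t_lo
    if v >= v_hi:
        return t_hi
    fraction = (v - v_lo) / (v_hi - v_lo)
    return t_lo + fraction * (t_hi - t_lo)


def smooth(values: Sequence[float], alpha: float) -> List[float]:
    """First-order recursive smoothing; the filter type is an assumed choice."""
    smoothed: List[float] = []
    state = None
    for v in values:
        state = v if state is None else alpha * state + (1.0 - alpha) * v
        smoothed.append(state)
    return smoothed


def adaptive_thresholds(
    characteristics: Sequence[float],
    group: int = 30,      # group size; thirty blocks is named in the examples
    v_lo: float = 0.0,    # first variability (assumed value)
    v_hi: float = 1.0,    # second variability (assumed value)
    t_lo: float = 2.0,    # first separation threshold (assumed value)
    t_hi: float = 4.0,    # second separation threshold (assumed value)
    alpha: float = 0.8,   # smoothing coefficient (assumed value)
) -> List[float]:
    """Return one smoothed separation threshold per block."""
    raw: List[float] = []
    for n in range(len(characteristics)):
        # Group of blocks ending at the current block; shorter at the start.
        window = characteristics[max(0, n - group + 1): n + 1]
        raw.append(threshold_from_variability(variance(window), v_lo, v_hi, t_lo, t_hi))
    return smooth(raw, alpha)
```

The returned sequence can be compared block by block with the characteristic, for example the ratio of block energy to average energy, to decide whether a block is separated. A steady signal with little variability then yields a threshold near `t_lo`, while a strongly fluctuating one pushes the threshold towards `t_hi`.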
(91) An inventively encoded audio signal can be stored on a digital storage medium or a non-transitory storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
(92) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
(93) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
(94) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(95) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
(96) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.
(97) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(98) A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
(99) A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
(100) A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
(101) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(102) In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
(103) While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.