DEVICE AND METHOD FOR CALCULATING LOUDSPEAKER SIGNALS FOR A PLURALITY OF LOUDSPEAKERS WHILE USING A DELAY IN THE FREQUENCY DOMAIN
20180012612 · 2018-01-11
Cpc classification
H04S2420/07
ELECTRICITY
H04S2420/13
ELECTRICITY
H04R2430/03
ELECTRICITY
Abstract
A device for calculating loudspeaker signals for a plurality of loudspeakers while using a plurality of audio sources, an audio source including an audio signal, includes a forward transform stage for transforming each audio signal, block-by-block, to a spectral domain so as to obtain for each audio signal a plurality of temporally consecutive short-term spectra, a memory for storing a plurality of temporally consecutive short-term spectra for each audio signal, a memory access controller for accessing a specific short-term spectrum among the plurality of short-term spectra for a combination consisting of a loudspeaker and an audio signal on the basis of a delay value, a filter stage for filtering the specific short-term spectrum for the combination of the audio signal and the loudspeaker by using a filter provided for the combination of the audio signal and the loudspeaker, so that a filtered short-term spectrum is obtained for each combination of an audio signal and a loudspeaker, a summing stage for summing up the filtered short-term spectra for a loudspeaker so as to obtain summed-up short-term spectra for each loudspeaker, and a backtransform stage for backtransforming, block-by-block, summed-up short-term spectra for the loudspeakers to a time domain so as to obtain the loudspeaker signals.
Claims
1. A device for calculating loudspeaker signals for a plurality of loudspeakers while using a plurality of audio sources, each audio source comprising an audio signal, said device comprising: a forward transform stage configured to transform each audio signal, block-by-block, to a spectral domain so as to acquire for each audio signal a plurality of temporally consecutive short-term spectra; a memory configured to store a plurality of temporally consecutive short-term spectra for each audio signal; a memory access controller configured to access a specific short-term spectrum among the plurality of temporally consecutive short-term spectra for a combination comprising a loudspeaker and an audio signal on the basis of a delay value; a filter stage configured to filter the specific short-term spectrum for the combination of the audio signal and the loudspeaker by using a filter provided for the combination of the audio signal and the loudspeaker, so that a filtered short-term spectrum is acquired for each combination of an audio signal and a loudspeaker; a summing stage configured to sum up the filtered short-term spectra for a loudspeaker so as to acquire summed-up short-term spectra for each loudspeaker; and a backtransform stage configured to backtransform, block-by-block, summed-up short-term spectra for the loudspeakers to a time domain so as to acquire the loudspeaker signals.
2. The device as claimed in claim 1, wherein the filter stage is configured to determine, from an impulse response of the filter provided for the combination of the loudspeaker and the audio signal, a modified impulse response in that a number of zeros is inserted at a temporal beginning of the impulse response, the number of zeros depending on the delay value for the combination of the audio signal and the loudspeaker, and on the block index of the specific short-term spectrum for the combination of the audio signal and the loudspeaker.
3. The device as claimed in claim 1, wherein the filter stage is configured to multiply, spectral value by spectral value, the specific short-term spectrum by a transmission function of the filter.
4. The device as claimed in claim 1, wherein the memory comprises, for each audio source, a frequency-domain delay line with an optional access to the short-term spectra stored for said audio source, an access operation being performable via a block index for each short-term spectrum.
5. The device as claimed in claim 1, wherein the forward transform stage comprises a number of transform blocks that is equal to the number of audio sources, wherein the backtransform stage comprises a number of transform blocks that is equal to the number of loudspeaker signals, wherein a number of frequency-domain delay lines is equal to the number of audio sources, and wherein the filter stage comprises a number of single filters that is equal to the product of the number of audio sources and the number of loudspeaker signals.
6. The device as claimed in claim 1, wherein the forward transform stage and the backtransform stage are configured in accordance with an overlap-save method, wherein the forward transform stage is configured to decompose the audio signal into overlapping blocks while using a stride value so as to acquire the short-term spectra, and wherein the backtransform stage is configured to discard, following backtransform of the filtered short-term spectra for a loudspeaker, specific areas in the backtransformed blocks and to piece together any portions that have not been discarded, so as to acquire the loudspeaker signal for the loudspeaker.
7. The device as claimed in claim 1, wherein the forward transform stage and the backtransform stage are configured in accordance with an overlap-add method, wherein the forward transform stage is configured to decompose the audio signal into adjacent blocks, while using a stride value, which are padded with zeros in accordance with the overlap-add method, a transform being performed with the blocks that have been zero-padded in accordance with the overlap-add method, wherein the backtransform stage is configured to sum up, following the backtransform of the spectra summed up for a loudspeaker, overlapping areas of backtransformed blocks so as to acquire the loudspeaker signal for the loudspeaker.
8. The device as claimed in claim 1, wherein the forward transform stage and the backtransform stage are configured to perform a discrete Fourier transform algorithm or an inverse discrete Fourier transform algorithm.
9. The device as claimed in claim 1, further comprising: a wave field synthesis operator configured to produce the delay value for each combination of a loudspeaker and an audio source while using a virtual position of the audio source and the position of the loudspeaker, and to provide same to the memory access controller or to the filter stage.
10. The device as claimed in claim 1, wherein the audio source comprises a directional characteristic, the filter stage being configured to use different filters for different combinations of loudspeakers and audio signals.
11. The device as claimed in claim 1, wherein the forward transform stage is configured to use a block-by-block fast Fourier transform, the length of the stage equals K+B, B being a stride in the generation of consecutive blocks, K being an order of the filter of the filter stage when the filter is configured to provide no further contribution to a delay.
12. A method of calculating loudspeaker signals for a plurality of loudspeakers while using a plurality of audio sources, each audio source comprising an audio signal, said method comprising: transforming each audio signal, block-by-block, to a spectral domain so as to acquire for each audio signal a plurality of temporally consecutive short-term spectra; storing a plurality of temporally consecutive short-term spectra for each audio signal; accessing a specific short-term spectrum among the plurality of temporally consecutive short-term spectra for a combination comprising a loudspeaker and an audio signal on the basis of a delay value; filtering the specific short-term spectrum for the combination of the audio signal and the loudspeaker by using a filter provided for the combination of the audio signal and the loudspeaker, so that a filtered short-term spectrum is acquired for each combination of an audio signal and a loudspeaker; summing up the filtered short-term spectra for a loudspeaker so as to acquire summed-up short-term spectra for each loudspeaker; and backtransforming, block-by-block, summed-up short-term spectra for the loudspeakers to a time domain so as to acquire the loudspeaker signals.
13. A non-transitory storage medium having stored thereon a computer program comprising a program code for performing a method of calculating loudspeaker signals for a plurality of loudspeakers while using a plurality of audio sources, each audio source comprising an audio signal, when the program code runs on a computer or a processor, the method comprising: transforming each audio signal, block-by-block, to a spectral domain so as to acquire for each audio signal a plurality of temporally consecutive short-term spectra; storing a plurality of temporally consecutive short-term spectra for each audio signal; accessing a specific short-term spectrum among the plurality of temporally consecutive short-term spectra for a combination comprising a loudspeaker and an audio signal on the basis of a delay value; filtering the specific short-term spectrum for the combination of the audio signal and the loudspeaker by using a filter provided for the combination of the audio signal and the loudspeaker, so that a filtered short-term spectrum is acquired for each combination of an audio signal and a loudspeaker; summing up the filtered short-term spectra for a loudspeaker so as to acquire summed-up short-term spectra for each loudspeaker; and backtransforming, block-by-block, summed-up short-term spectra for the loudspeakers to a time domain so as to acquire the loudspeaker signals.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0037] Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
[0045]
[0046]
[0047]
[0048]
[0049]
[0050]
[0051]
DETAILED DESCRIPTION OF THE INVENTION
[0052]
[0053] Thus, the memory access controller is configured to access a specific short-term spectrum among the plurality of short-term spectra for a combination of loudspeaker and audio signal on the basis of a delay value predefined for this audio signal/loudspeaker combination. The specific short-term spectra determined by the memory access controller 600 are then fed to a filter stage 300, which filters each of them with the filter provided for the respective combination of audio signal and loudspeaker and thereby obtains a sequence of filtered short-term spectra for each such combination. The filter stage 300 then feeds the filtered short-term spectra to a summing stage 400, which sums up the filtered short-term spectra for a loudspeaker such that a summed-up short-term spectrum is obtained for each loudspeaker. The summed-up short-term spectra are then fed to a backtransform stage 800, which backtransforms them block by block to the time domain, whereby the loudspeaker signals may be determined. The loudspeaker signals are thus output at an output 12 by the backtransform stage 800.
[0054] In one embodiment, wherein the device is a wave field synthesis device, the delay values 701 are supplied by a wave field synthesis operator (WFS operator) 700, which calculates the delay values 701 for each individual combination of audio signal and loudspeaker as a function of source positions fed in via an input 702 and as a function of the loudspeaker positions, i.e. those positions where the loudspeakers are arranged within the reproduction room, and which are supplied via an input 703. If the device is configured for an application other than wave field synthesis, e.g. for an ambisonics implementation or the like, there will also exist an element corresponding to the WFS operator 700 which calculates delay values for individual loudspeaker signals and/or which calculates delay values for individual audio signal/loudspeaker combinations. Depending on the implementation, the WFS operator 700 will also calculate scaling values in addition to delay values, which scaling values can typically also be taken into account by a scaling factor in the filter stage 300. Said scaling values may also be taken into account by scaling the filter coefficients used in the filter stage 300, without causing any additional computing expenditure.
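The delay calculation of such an operator can be illustrated by a minimal sketch. It assumes that the delay of a source/loudspeaker combination is simply the propagation time between the virtual source position and the loudspeaker position, expressed in samples. The function name `wfs_delays`, the two-dimensional coordinates, the speed of sound of 343 m/s and the sampling rate of 48 kHz are illustrative assumptions and are not taken from the description.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed value
SAMPLE_RATE = 48000      # Hz, assumed value


def wfs_delays(source_positions, speaker_positions,
               c=SPEED_OF_SOUND, fs=SAMPLE_RATE):
    """Return delays[m][n] in samples for loudspeaker m and source n.

    Hypothetical helper: the delay is modelled as the propagation time
    between the virtual source and the loudspeaker, scaled by the
    sampling rate. Positions are (x, y) tuples in metres.
    """
    delays = []
    for (x_spk, y_spk) in speaker_positions:
        row = []
        for (x_src, y_src) in source_positions:
            distance = math.hypot(x_spk - x_src, y_spk - y_src)
            row.append(distance / c * fs)
        delays.append(row)
    return delays
```

The resulting values are generally non-integer; how they are split into a block offset and a residual delay is a separate step.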
[0055] The memory access controller 600 may therefore be configured, in a specific implementation, to obtain delay values for different combinations of audio signal and loudspeaker, and to calculate an access value to the memory for each combination, as will be set forth with reference to
[0056]
[0057] In particular, the WFS operator 700 is configured to provide a delay value D, as is depicted in step 20 of
[0058] The delay achieved by controlling the filter in step 24 may be interpreted as a delay in the “time domain” even though, due to the specific implementation of the filter stage, said delay is applied in the frequency domain to the specific short-term spectrum which has been read out from the memory 200, specifically while using the multiple D.sub.b. Thus, the result is a splitting up into three blocks for the entire delay, as is depicted at 26 in
[0059] Subsequently, an advantageous implementation of the filter stage 300 will be discussed while referring to
[0060] In a step 30, an impulse response for an audio signal/loudspeaker combination is provided. For directional sound sources, in particular, one will have a dedicated impulse response for each combination of audio signal and loudspeaker. However, for other sources, too, there are different impulse responses at least for specific combinations of audio signal and loudspeaker. In a step 31, the number of zeros to be inserted, i.e. the value D.sub.A, is determined, as was depicted in
[0061] In the embodiment, the forward transform stage 100 is configured to determine the sequence of short-term spectra with the stride B from a sequence of temporal samples, so that a first sample of a first block of temporal samples converted into a short-term spectrum is spaced apart from a first sample of a second subsequent block of temporal samples by a number of samples which equals the stride value. The stride value is thus defined by the respectively first sample of the new block, said stride value being present, as will be set forth by means of
[0062] In addition, in order to enable optional storage in the memory 200, a time value associated with a short-term spectrum is advantageously stored as a block index which indicates the number of stride values by which the first sample of the short-term spectrum is temporally spaced apart from a reference value. The reference value is, e.g., the index 0 of the short-term spectrum at 249 in
[0063] In addition, the memory access means is advantageously configured to determine the specific short-term spectrum on the basis of the delay value and of the time value of the specific short-term spectrum in such a manner that the time value of the specific short-term spectrum equals or is larger by 1 than the integer result of a division of the time duration corresponding to the delay value by the time duration corresponding to the stride value. In one implementation, the integer result used is precisely that which is smaller than the delay that may actually be used. Alternatively, however, one might also use the integer result plus one, said value being a “rounding-up”, as it were, of the delay that may actually be used. In the event of rounding-up, a slightly too large delay is achieved, which may easily suffice for applications, however. Depending on the implementation, the question whether rounding-up or rounding-down is performed may be decided as a function of the amount of the remainder. For example, if the remainder is larger than or equal to 50% of the time duration corresponding to the stride, rounding-up may be performed, i.e. the value which is larger by one may be taken. In contrast, if the remainder is smaller than 50%, “rounding-down” may be performed, i.e. the very result of the integer division may be taken. Actually, one may speak of rounding-down when the remainder is not implemented as well, e.g. by inserting zeros.
[0064] In other words, the implementation presented above and comprising rounding-up and/or rounding-down may be useful when the delay is applied only with the granularity of a block length, i.e. when no finer delay is achieved by inserting zeros into an impulse response. However, if a finer delay is achieved by inserting zeros into an impulse response, rounding-down rather than rounding-up will be performed in order to determine the block offset.
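The rounding alternatives described above may be sketched as follows. The function name `block_offset` and the mode labels are hypothetical. The sketch also assumes that, when rounding up, a delay that is an exact multiple of the stride is kept as it is, which the description does not state explicitly.

```python
def block_offset(delay, stride, mode="down"):
    """Return the block offset for a delay given in samples.

    mode "down":    result of the integer division (used when the
                    remainder is realised by zeros in the filter).
    mode "up":      integer result plus one if a remainder exists
                    (exact multiples are kept, an assumption here).
    mode "nearest": round up when the remainder is at least 50% of
                    the stride, otherwise round down.
    """
    quotient, remainder = divmod(delay, stride)
    if mode == "down":
        return quotient
    if mode == "up":
        return quotient + 1 if remainder > 0 else quotient
    if mode == "nearest":
        return quotient + 1 if 2 * remainder >= stride else quotient
    raise ValueError("unknown rounding mode: %r" % mode)
```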
[0065] In order to explain this implementation, reference shall be made to
[0066] A specific exemplary access controller might read out, for example for the implementation of
[0067] In a specific implementation as was already illustrated with reference to
[0068] Advantageously, the memory 200 includes, for each audio source, a frequency-domain delay line, or FDL, 201, 202, 203 of
[0069] As is shown in
[0070] In an advantageous embodiment, the forward transform stage 100 and the backtransform stage 800 are configured in accordance with an overlap-save method, which will be explained below by means of
[0071] Alternatively, both the forward transform stage 100 and the backtransform stage 800 may be configured to perform an overlap-add method. The overlap-add method, which is also referred to as segmented convolution, is also a method of fast convolution and is controlled such that an input sequence is decomposed into actually adjacent blocks of samples with a stride B, as is depicted at 43. However, due to the attachment of zeros (also referred to as zero padding) for each block, as is shown at 44, said blocks become consecutive overlapping blocks. The input signal is thus split up into portions of the length B, which are then extended by the zero padding in accordance with step 44, so as to achieve a longer length for the result of the convolution operation. Subsequently, the blocks produced by step 44 and padded with zeros are transformed by the forward transform stage 100 in a step 45 so as to obtain the sequence of short-term spectra. Subsequently, in accordance with the processing performed in block 39 of
[0072] Depending on the implementation, the forward transform stage 100 and the backtransform stage 800 are configured as individual FFT blocks as shown in
[0073] As was already depicted by means of
[0074] There are several approaches to producing directional sound sources, or sound sources having directional characteristics, while using wave field synthesis. In addition to experimental results, most approaches are based on expanding or developing the sound field to form circular or spherical harmonics. The approach presented here also uses an expansion of the sound field of the virtual source to form circular harmonics so as to obtain a driving function for the secondary sources. This driving function will also be referred to as a WFS operator below.
[0075]
[0076] The following representation is an exemplary description of the wave field synthesis process. Alternative descriptions and implementations are also known. The sound field of the primary source ψ is generated in the region y<y.sub.L by using a linear distribution of secondary monopole sources along x (black dots).
[0077] Using the geometry of
[0078] It states that the sound pressure $P_R(\vec{r}_R,\vec{r},\omega)$ of a primary sound source may be generated at the receiver position R while using a linear distribution of secondary monopole line sound sources with $y=y_L$. To this end, the velocity $V_{\vec{n}}(\vec{r},\omega)$ of the primary source ψ at the positions of the secondary sources has to be known in the direction of its normal $\vec{n}$. In equation (1), ω is the angular frequency, c is the speed of sound, and $H_0^{(2)}$ is the Hankel function of the second kind of order 0. The path from the primary source position to the secondary source position is designated by $\vec{r}$. By analogy, $\vec{r}_R$ is the path from the secondary source to the receiver R. The two-dimensional sound field emitted by a primary source ψ with any directional characteristic desired may be described by an expansion to form circular harmonics.
wherein S(ω) is the spectrum of the source, and α is the azimuth angle of the vector $\vec{r}$. $\check{C}_{\nu}^{(2)}(\omega)$ are the circular-harmonics expansion coefficients of order ν. While using the motion equation, the WFS secondary source driving function Q ( . . . ) is indicated as
[0079] In order to obtain synthesis operators that can be realized, two assumptions are made: first of all, real loudspeakers behave rather like point sources if the size of the loudspeaker is small as compared to the emitted wavelength. Therefore, the secondary source driving function should use secondary point sources rather than line sources. Secondly, what is contemplated here is only the efficient processing of the WFS driving function. While calculation of the Hankel function involves a relatively large amount of effort, the near-field directional behavior is of relatively little importance from a practical point of view.
[0080] As a result, only the far-field approximation of the Hankel function is applied to the secondary and primary source descriptions (1) and (2). This results in the secondary source driving function
Consequently, the synthesis integral may be expressed as
For a virtual source having ideal monopole characteristics, the directivity term of the source driving function becomes simpler and results in G(ω,α)=1. In this case, only a gain
a delay term
corresponding to a frequency-independent time delay of
and a constant phase shift of j are applied to the secondary source signal.
[0081] In addition to the synthesis of monopole sources, a common WFS system enables reproduction of planar wave fronts, which are referred to as plane waves. These may be considered as monopole sources arranged at an infinite distance. As in the case of monopole sources, the resulting synthesis operator consists of a static filter, a gain factor, and a time delay.
[0082] For complex directional characteristics, the gain factor A( . . . ) becomes dependent on the directional characteristic, the alignment and the frequency of the virtual source as well as on the positions of the virtual and secondary sources. Consequently, the synthesis operator contains a non-trivial filter, specifically for each secondary source
As in the case of fundamental types of sources, the delay may be extracted from (4) from the propagation time between the virtual and secondary sources
[0083] For practical rendering, time-discrete filters for the directional characteristics are determined by the frequency response (8). Because of their ability to approximate any frequency response and their inherent stability, only FIR filters will be considered here. These directivity filters will be referred to as $h_{m,n}[k]$ below, wherein $n=0,\ldots,N-1$ designates the virtual-source index, $m=0,\ldots,M-1$ is the loudspeaker index, and k is a time domain index. K is the order of the directivity filter. Since such filters are needed for each combination of N virtual sources and M loudspeakers, their design has to be relatively efficient.
[0084] Here, a simple window (or frequency sampling) design is used. The desired frequency response (9) is evaluated at K+1 equidistantly sampled frequency values within the interval $0 \le \omega < 2\pi$. The discrete filter coefficients $h_{m,n}[k]$, $k=0,\ldots,K$, are obtained by an inverse discrete Fourier transform (IDFT) and by applying a suitable window function w[k] so as to reduce the Gibbs phenomenon caused by truncating the impulse response.
$$h_{m,n}[k] = w[k]\,\mathrm{IDFT}\{A_D(\vec{r}_R,\vec{r},\omega,\alpha)\} \qquad (10)$$
Implementing this design method enables several optimizations. First of all, owing to the conjugate symmetry of the frequency response $A_D(\vec{r}_R,\vec{r},\omega,\alpha)$, this function is evaluated only for approximately half of the raster points. Secondly, several parts of the secondary source driving function, e.g. the expansion coefficients $\check{C}_{\nu}^{(2)}(\omega)$, are identical for all of the driving functions of any given virtual source and, therefore, are calculated only once. The directivity filters $h_{m,n}[k]$ introduce synthesis errors in two ways. On the one hand, the limited filter order results in an incomplete approximation of $A_D(\vec{r}_R,\vec{r},\omega,\alpha)$. On the other hand, the infinite summation of (4) is replaced by a finite boundary. As a result, the beam width of the generated directional characteristics cannot become infinitely narrow.
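The design procedure of equation (10) may be sketched as follows. This is a minimal illustration under stated assumptions, not the design used in the embodiment: the desired response is passed in as a hypothetical callable `desired_response(omega)`, the filter order K is assumed to be even, a linear-phase term is added so that the impulse response is centred within the K+1 taps, and a Hamming window is chosen as one possible "suitable window function". The response is evaluated only for about half of the raster points, and the remaining points follow from conjugate symmetry.

```python
import cmath
import math


def design_directivity_filter(desired_response, order):
    """Frequency-sampling design of a real FIR filter with order+1 taps.

    desired_response: callable omega -> complex, evaluated for
    0 <= omega <= pi only (hypothetical interface).
    """
    if order % 2:
        raise ValueError("this sketch assumes an even filter order")
    n = order + 1
    half = order // 2
    spectrum = [0j] * n
    # Evaluate only about half of the raster points ...
    for i in range(half + 1):
        omega = 2.0 * math.pi * i / n
        # Linear-phase term centres the impulse response (assumption).
        spectrum[i] = desired_response(omega) * cmath.exp(-1j * omega * half)
    # ... and fill the rest by conjugate symmetry.
    for i in range(1, half + 1):
        spectrum[n - i] = spectrum[i].conjugate()
    spectrum[0] = complex(spectrum[0].real, 0.0)  # DC bin must be real
    taps = []
    for k in range(n):
        value = sum(spectrum[i] * cmath.exp(2j * math.pi * i * k / n)
                    for i in range(n)) / n        # IDFT
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / order)  # Hamming
        taps.append(window * value.real)
    return taps
```

For a frequency-independent response of 1, the sketch returns a unit impulse delayed by K/2 samples, which is a convenient sanity check.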
[0085]
[0086] WFS processing is generally implemented as a time-discrete processing system. It consists of two general tasks: calculating the synthesis operator and applying this operator to the time-discrete source signals. The latter will be referred to as WFS rendering in the following.
[0087] The impact of the synthesis operator on the overall complexity is typically low since said synthesis operator is calculated relatively rarely. If the source properties change in a discrete manner only, the operator will be calculated as needed. For continuously changing source properties, e.g. in the case of moving sound sources, it is typically sufficient to calculate said values on a coarse grid and to use simple interpolation methods in between.
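As an illustration of the last point, the following sketch linearly interpolates a delay value between two operator updates on a coarse grid. Linear interpolation is only one example of a "simple interpolation method", and the function name `interpolate_delay` is hypothetical.

```python
def interpolate_delay(delay_start, delay_end, num_steps):
    """Linearly interpolate between two delay values of the coarse grid.

    Returns num_steps values beginning at delay_start; delay_end itself
    belongs to the next grid interval.
    """
    step = (delay_end - delay_start) / num_steps
    return [delay_start + step * i for i in range(num_steps)]
```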
[0088] In contrast to this, application of the synthesis operator to the source signals is performed at the full audio sampling rate.
[0089] The number of scale and delay operations is formed by the product of the number of virtual sources N and the number of loudspeakers M. Thus, this product typically reaches high values. Consequently, the scale and delay operation is the most critical part, in terms of performance, of most WFS systems—even if only integer delays are used.
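The growth of this product can be made concrete with a short calculation. The function name and the example figures in the usage note are purely illustrative assumptions.

```python
def scale_delay_ops_per_second(num_sources, num_speakers, sample_rate):
    """Scale-and-delay operations per second for N sources and M
    loudspeakers when one operation per combination and output sample
    is performed."""
    return num_sources * num_speakers * sample_rate
```

For example, with hypothetical values of 32 sources, 128 loudspeakers and a sampling rate of 48 kHz, the function returns 196,608,000 operations per second.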
[0090]
[0091] By means of
[0092] In order to substantially reduce the computing resources that may be used, the invention proposes a signal processing scheme based on two interacting effects.
[0093] The first effect relates to the fact that the efficiency of FIR filters may frequently be increased by using fast convolution methods in the transform domain, such as overlap-save or overlap-add, for example. Generally, said algorithms transform segments of the input signal to the frequency domain by means of fast Fourier transform (FFT) techniques, perform a convolution by means of frequency domain multiplication, and transform the signal back to the time domain. Even though the actual performance highly depends on the hardware, the filter order above which transform-based filtering becomes more efficient than direct convolution typically lies between 16 and 50. For overlap-add algorithms and overlap-save algorithms, the forward and inverse FFT operations constitute the major part of the computational expenditure.
[0094] Advantageously, it is only the overlap-save method that is taken into account since it involves no addition of components of adjacent output blocks. In addition to the reduced arithmetic complexity as compared to overlap-add, said property results in a simpler control logic for the proposed processing scheme.
[0095] A further embodiment for reducing the computational expenditure exploits the structure of the WFS processing scheme. On the one hand, here each input signal is used for a large number of delay and filtering operations. On the other hand, the results for a large number of sound sources are summed for each loudspeaker. Thus, partitioning of the signal processing algorithm, which performs typical operations only once for each input or output signal, promises gains in efficiency. Generally, such partitioning of the WFS rendering algorithm results in considerable improvements in performance for moving sound sources of fundamental types of sources.
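The gain promised by such partitioning can be estimated by counting operations per block. The sketch below compares a scheme in which every source/loudspeaker combination runs its own fast convolution with a partitioned scheme in which each input is transformed once and each output is backtransformed once. The function name and the dictionary layout are hypothetical.

```python
def transform_counts(num_sources, num_speakers):
    """Per-block operation counts for N sources and M loudspeakers."""
    pairs = num_sources * num_speakers
    return {
        # Every combination performs its own forward and inverse transform.
        "unpartitioned": {"forward": pairs, "inverse": pairs,
                          "spectral_multiplies": pairs},
        # One forward transform per source, one inverse per loudspeaker.
        "partitioned": {"forward": num_sources, "inverse": num_speakers,
                        "spectral_multiplies": pairs},
    }
```

The number of spectral multiplications stays at N·M in both cases; only the number of transforms drops from 2·N·M to N+M.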
[0096] When transform-based fast convolution is employed for rendering directional sound sources, or sound sources having directional characteristics, the forward and inverse Fourier transform operations are obvious candidates for said partitioning. The resulting processing scheme is shown in
[0097]
[0098]
[0099] As was explained by means of
[0100] Conceptually, an arbitrary time delay may readily be built into the FIR directivity filter. Due to the large range of the delay value in a typical WFS system, however, this approach results in very long filter lengths and, thus, in large FFT block sizes. On the one hand, this considerably increases the computational expenditure and the storage requirements. On the other hand, the latency is not acceptable for many applications because of the block formation delay that such large FFT sizes entail.
[0101] For this reason, a processing scheme is proposed here which is based on a frequency-domain delay line and on partitioning of the delay value. Similarly to the conventional overlap-save method, the input signal is segmented into overlapping blocks of the size L with a stride (or delay block size) B between adjacent blocks. The blocks are transformed to the frequency domain and are designated by $X_n[l]$, wherein n designates the source, and l is the block index. These blocks are stored in a structure which enables indexed access of the form $X_n[l-i]$ to the most recent frequency domain blocks. Conceptually, this data structure is identical with the frequency-domain delay lines used within the context of partitioned convolution.
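A data structure with this kind of indexed access may be sketched as a bounded double-ended queue. The class name `FrequencyDomainDelayLine` and its methods are hypothetical; the sketch stores arbitrary objects as spectra and returns `None` for blocks that have not been written yet.

```python
from collections import deque


class FrequencyDomainDelayLine:
    """Keeps the most recent spectra of one source with indexed access."""

    def __init__(self, max_block_delay):
        # maxlen makes the deque drop the oldest spectrum automatically.
        self._blocks = deque(maxlen=max_block_delay + 1)

    def push(self, spectrum):
        """Store the spectrum of the newest block (block index l)."""
        self._blocks.appendleft(spectrum)

    def read(self, block_delay):
        """Return the spectrum of block l - block_delay."""
        if block_delay < 0 or block_delay >= self._blocks.maxlen:
            raise IndexError("block delay outside the delay line")
        if block_delay >= len(self._blocks):
            return None  # nothing stored yet for that block
        return self._blocks[block_delay]
```

One such delay line per audio source is sufficient, since all loudspeakers read from the same stored spectra with their own block offsets.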
[0102] The delay value D, indicated in samples, is partitioned into a multiple of the block delay quantity and into a remainder D.sub.r or D.sub.r′
$$D = D_b B + D_r \quad \text{with} \quad 0 \le D_r \le B-1,\; D_b \in \mathbb{N}_0. \qquad (11)$$
[0103] The block delay D.sub.b is applied as an indexed access to the frequency-domain delay line. By contrast, the remaining part is included into the directivity filter h.sub.m,n[k], which is formally expressed by a convolution with the delay operator δ(k−D.sub.r)
$$h^{d}_{m,n}[k] = h_{m,n}[k] * \delta(k - D_r). \qquad (12)$$
[0104] For integer delay values, this operation corresponds to prepending $D_r$ zeros to $h_{m,n}[k]$. The resulting filter is padded with zeros in accordance with the requirements of the overlap-save operation. Subsequently, the frequency-domain filter representation $H^{d}_{m,n}$ is obtained by means of an FFT.
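For integer delays, the partitioning of (11) and the construction of the delayed filter of (12) may be sketched as follows. The function names are hypothetical, and the final transform to the frequency domain is left out.

```python
def partition_delay(delay, stride):
    """Split an integer delay D into (D_b, D_r) with D = D_b * B + D_r."""
    return divmod(delay, stride)


def delayed_filter(taps, remainder, block_length):
    """Prepend D_r zeros to the impulse response and zero-pad the result
    to the block length L expected by the overlap-save operation."""
    padded = [0.0] * remainder + list(taps)
    if len(padded) > block_length:
        raise ValueError("delayed filter does not fit into one block")
    return padded + [0.0] * (block_length - len(padded))
```

For example, a delay of 1234 samples with a stride of 512 yields a block offset of 2 and a remainder of 210 samples.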
[0105] The frequency-domain representation of the signal component from the source n to the loudspeaker m is calculated as
$$C_{m,n}[l] = H^{d}_{m,n} \cdot X_n[l - D_b] \qquad (13)$$
wherein · designates an element-by-element complex multiplication. The frequency-domain representation of the driving signal for the loudspeaker m is determined by accumulating the corresponding component signals, which is implemented as a complex-valued vector addition
$$Y_m[l] = \sum_{n=0}^{N-1} C_{m,n}[l]. \qquad (14)$$
The remainder of the algorithm is identical with the ordinary overlap-save algorithm. The blocks $Y_m[l]$ are transformed to the time domain, and the loudspeaker driving signals $y_m[k]$ are formed by deleting a predetermined number of samples from each time domain block. This signal processing structure is schematically shown in
[0106] The lengths of the transformed segments and the shift between adjacent segments follow from the derivation of the conventional overlap-save algorithm. A linear convolution of a segment of the length L with a sequence of the length P, P<L, corresponds to a complex multiplication of two frequency domain vectors of the size L and yields L−P+1 output samples. Thus, the input segments are shifted by this amount, subsequently referred to as B=L−P+1. Conversely, in order to obtain B output samples from each input segment for a convolution with an FIR filter of the order K (length P=K+1), the transformed segments have a length of
L=K+B. (15)
[0107] If the integer part of the remainder portion D.sub.r of the delay is embedded into the filter h.sub.m,n.sup.d[k] in accordance with (12), the order of h.sub.m,n.sup.d[k] that may be used becomes K′=K+B−1. This is due to the fact that h.sub.m,n.sup.d[k] is preceded by a maximum of B−1 zeros, which is the maximum value for D.sub.r (11). Thus, the segment length that may be used for the proposed algorithm is indicated by
L=K+2B−1. (16)
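As a worked example of (15) and (16), the following sketch computes both segment lengths. The function name `segment_length` is hypothetical, and the values K=1023 and B=1024 are chosen for illustration only.

```python
def segment_length(filter_order, block_shift, embed_residual_delay=False):
    """Transform segment length L for B output samples per block.

    Without embedded delay: L = K + B           (eq. 15)
    With embedded delay:    L = K + 2B - 1      (eq. 16), because up to B - 1
    zeros are prepended to the filter, raising its order to K' = K + B - 1.
    """
    if filter_order < 0 or block_shift < 1:
        raise ValueError("filter_order must be >= 0 and block_shift >= 1")
    extra = block_shift - 1 if embed_residual_delay else 0
    return filter_order + extra + block_shift


conventional = segment_length(1023, 1024)        # K + B = 2047
proposed = segment_length(1023, 1024, True)      # K + 2B - 1 = 3070
```

For these values the proposed scheme transforms segments that are roughly 50% longer, which is the price paid for absorbing the residual delay into the filter.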
[0108] So far, only integer sample delay values D have been taken into account. However, the proposed processing scheme may be extended to include any delay values by incorporating an FD filter (FD=fractional delay) into the directivity filter h.sub.m,n.sup.d[k]. Here, only FIR-FD filters are taken into account since they may readily be integrated into the proposed algorithm. To this end, the residual delay D.sub.r is partitioned into an integer part D.sub.int and a fractional delay value d, as is customary in the FD filter design. The integer part is integrated into h.sub.m,n.sup.d[k] by preceding h.sub.m,n[k] with D.sub.int zeros. The fractional delay value is applied to h.sub.m,n.sup.d[k] by convolving it with an FD filter designed for this fractional value d. Thus, the order of h.sub.m,n.sup.d[k] that may be used is increased by the order K.sub.FD of the FD filter, and the block size L (16) that may be used changes to
L=K+K.sub.FD+2B−1. (17)
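The construction of the delayed filter for a non-integer residual delay can be sketched as follows. The text does not prescribe a particular FD filter design. A Lagrange interpolator is assumed here purely for illustration, evaluated directly at the fractional value d, and the names `lagrange_fd`, `convolve`, and `delayed_filter` are hypothetical.

```python
def lagrange_fd(order, d):
    """Lagrange fractional-delay FIR taps of the given order for a delay d (assumed design)."""
    taps = []
    for n in range(order + 1):
        c = 1.0
        for k in range(order + 1):
            if k != n:
                c *= (d - k) / (n - k)
        taps.append(c)
    return taps


def convolve(a, b):
    """Plain time-domain convolution of two tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def delayed_filter(h, d_r, fd_order):
    """Prepend D_int zeros and convolve with an FD filter for the fraction d."""
    d_int = int(d_r)              # integer part of a non-negative residual delay
    frac = d_r - d_int            # fractional delay value d
    return [0.0] * d_int + convolve(h, lagrange_fd(fd_order, frac))


h = [1.0, 0.5, 0.25]                              # filter of order K = 2
h_d = delayed_filter(h, 2.25, fd_order=1)         # D_int = 2, d = 0.25, K_FD = 1
```

The resulting filter has K + 1 + D.sub.int + K.sub.FD taps, which matches the increase of the filter order by K.sub.FD that leads to (17). Higher-order Lagrange filters are usually more accurate when the total delay lies near the centre of the filter, so a practical design might shift part of the integer delay into the FD filter.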
[0109] However, the advantages of using arbitrary delay values are highly limited. It has been shown that fractional delay values are beneficial only for moving virtual sources, whereas they have no positive effect on the quality as far as static sources are concerned. On the other hand, the synthesis of moving directional sound sources, or sound sources having directional characteristics, would entail constant temporal variation of synthesis filters, the design of which would dominate the overall complexity of rendering in a simple implementation.
[0110]
[0111] In a next step, fast convolution in accordance with the overlap-save method (OS) as well as a backtransform with an IFFT to the loudspeaker signals y.sub.0 . . . y.sub.M-1 is performed at stage 503. What is decisive here is the manner in which access to the spectra occurs. By way of example, access operations 504, 505, 506, and 507 are depicted in the figure. In relation to the time of the access operation 507, access operations 504, 505, and 506 are in the past.
[0112] If the loudspeaker 511 is driven by means of the access operation 507 and if, simultaneously, loudspeakers 510, 512 are driven by means of the access operation 506, it seems to the listener as if the loudspeaker signals of the loudspeakers 510, 512 are delayed as compared to the loudspeaker signal of the loudspeaker 511. The same applies to the access operation 505 and the loudspeaker signals of the loudspeakers 509, 513 as well as to the access operation 504 and to the loudspeaker signals of the loudspeakers 508, 514.
[0113] In this manner, each individual loudspeaker may be driven with a delay corresponding to a multiple of the block stride B. If further delay is to be provided which is smaller than the block stride B, this may be achieved by preceding the corresponding impulse response of the filter, which is the subject of the overlap-save operation, with zeros.
[0114]
[0115] In order to evaluate the potential increase in efficiency achieved by the proposed processing structure, a performance comparison is provided here which is based on the number of arithmetic commands. It should be understood that this comparison can only provide rough estimations of the relative performances of the different algorithms. The actual performance may differ on the basis of the characteristics of the actual hardware architecture. Performance characteristics of, in particular, the FFT operations involved differ considerably, depending on the library used, the actual FFT sizes, and the hardware. In addition, the memory capacity of the hardware used may have a critical impact on the efficiency of the algorithms compared. For this reason, the memory requirements for the filter coefficients and the delay line structures, which are the main sources of memory consumption, are also indicated.
[0116] The main parameters determining the complexity of a rendering algorithm for directional sound sources, or sound sources having directional characteristics, are the number of virtual sources N, the number of loudspeakers M, and the filter order K of the directivity filter. For methods based on fast convolution, the shift between adjacent input blocks, which is also referred to as the block delay B, affects performance and memory requirements. In addition, block-by-block operation of the fast convolution algorithms introduces an implementation latency period of B−1 samples. The maximally allowed delay value, which is referred to as D.sub.max and is indicated as a number of samples, influences the memory size that may be used for the delay line structures.
[0117] Three different algorithms are compared: linear convolution, filter-by-filter fast convolution, and the proposed processing structure. The method which is based on linear convolution performs NM time domain convolutions of the order K. This amounts to NM(2K+1) commands per sample. In addition, M(N−1) real additions may be used for accumulating the loudspeaker driving signals. The memory that may be used for an individual delay line is D.sub.max+K floating-point values. Each of the MN FIR filters h.sub.m,n[k] may use K+1 memory words for floating-point values. These performance numbers are summarized in the following table. The table shows a performance comparison for wave field synthesis signal processing schemes for directional sound sources, or sound sources having directional characteristics. The number of commands is indicated for calculating a sample for all of the loudspeakers. The memory requirements are specified as numbers of floating-point values.
TABLE-US-00001
algorithm | commands | delay line memory | filter storage
linear convolution | M[N(2K + 1) + (N − 1)] | N(D.sub.max + K) | MN(K + 1)
filter-by-filter fast convolution
[0118] The second algorithm, referred to as filter-by-filter fast convolution, calculates the MN FIR filters separately while using the overlap-save fast convolution method. In accordance with (15), the size of the FFT blocks in order to calculate B samples per block is L=K+B. For each filter, a real-valued FFT of the size L and an inverse FFT of the same size are performed. A number of commands of pL log.sub.2(L) is assumed for a forward or inverse FFT of the size L, wherein p is a proportionality constant which depends on the actual implementation. p may be assumed to have a value between 2.5 and 3.
[0119] Since the frequency transforms of real-valued sequences are symmetrical, complex vector multiplication of the length L, which is performed in the overlap-save method, may use approximately L/2 complex multiplications. Since a single complex multiplication is implemented by 6 arithmetic commands, the effort involved in one vector multiplication amounts to 3L commands. Thus, filtering while using the overlap-save method may use
MN[2p(K+B)log.sub.2(K+B)+3(K+B)]/B (18)
commands for one single output sample on all loudspeaker signals. Similarly to the direct convolution algorithm, the effort involved in accumulating the loudspeaker signals amounts to M(N−1) commands. The delay line memory is identical with the linear convolution algorithm. In contrast, the memory requirements for the filters are increased due to the zero paddings of the filters h.sub.m,n[k] prior to the frequency transform. It is to be noted that a frequency domain representation of a real filter of the length L may be stored in L real-valued floating-point values because of the symmetry of the transformed sequence.
[0120] For the proposed efficient processing scheme, the block size for a block delay B equals L=K+2B−1 (16). Thus, a single FFT or inverse FFT operation may use p(K+2B−1)log.sub.2(K+2B−1) commands. However, only N forward and M inverse FFT operations may be used for each audio block. The complex multiplication and addition are each performed on the frequency domain representation and may use 3(K+2B−1) and K+2B−1 commands, respectively, for each symmetrical frequency domain block of the length K+2B−1. Since each processed block yields B output samples, the overall number of commands for a sampling clock iteration amounts to
[(N+M)p(K+2B−1)log.sub.2(K+2B−1)+3MN(K+2B−1)+M(N−1)(K+2B−1)]/B. (19)
Since the frequency-domain delay line stores the input signals in blocks of the size L, with a shift of B, the number of memory positions that may be used for one single input signal is
⌈D.sub.max/B⌉(K+2B−1). (20)
By analogy therewith, a frequency-transformed filter may use K+2B−1 memory words.
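The command counts of the three algorithms can be compared numerically with the following sketch. The formulas are assembled from the per-operation costs stated in the text (pL log.sub.2(L) per transform, 3L per vector multiplication, L per vector addition, B output samples per block). The function names are hypothetical and p=3 is merely one value from the stated range.

```python
import math


def linear_conv_commands(N, M, K):
    """MN time-domain convolutions of order K plus accumulation, per output sample."""
    return M * (N * (2 * K + 1) + (N - 1))


def filterwise_fast_conv_commands(N, M, K, B, p=3.0):
    """One forward and one inverse transform per filter and block, L = K + B."""
    L = K + B
    per_block = M * N * (2 * p * L * math.log2(L) + 3 * L)
    return per_block / B + M * (N - 1)


def proposed_commands(N, M, K, B, p=3.0):
    """N forward and M inverse transforms per block, L = K + 2B - 1."""
    L = K + 2 * B - 1
    transforms = (N + M) * p * L * math.log2(L)
    multiplications = 3 * L * M * N
    additions = L * M * (N - 1)
    return (transforms + multiplications + additions) / B


# Example configuration: 16 sources, 128 loudspeakers, K = 1023, B = 1024.
linear = linear_conv_commands(16, 128, 1023)
filterwise = filterwise_fast_conv_commands(16, 128, 1023, 1024)
proposed = proposed_commands(16, 128, 1023, 1024)
```

For this configuration the model yields about 4.2·10.sup.6 commands per sample for linear convolution, and the proposed structure comes out roughly an order of magnitude below filter-by-filter fast convolution. These are rough operation counts only and say nothing about cache behaviour or the FFT library used.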
[0121] In order to evaluate the relative performance of these algorithms, an exemplary wave field synthesis rendering system shall be assumed for 16 virtual sources, 128 loudspeaker channels, directivity filters of the order 1023, and a block delay of 1024. Each parameter is varied separately so as to evaluate its influence on the overall complexity.
[0122]
[0123] The influence of the number of loudspeakers is shown in
[0124] The effect of the order of the directivity filters is examined in
[0125] In
[0126] For the contemplated configuration (N=16, M=128, K=1023, B=1024) and a maximum delay value D.sub.max=48000, which corresponds to a delay value of one second at a sampling frequency of 48 kHz, the linear convolution algorithm may use approximately 2.9.Math.10.sup.6 memory words. For the same parameters, the filter-by-filter fast convolution algorithm uses approximately 5.0.Math.10.sup.6 floating-point memory positions. The increase is due to the size of the pre-calculated frequency domain filter representations. The proposed algorithm may use approximately 8.6.Math.10.sup.6 words of the memory due to the frequency-domain delay line and to the increased block size for the frequency domain representations of the input signal and of the filters. Thus, the performance improvement of the proposed algorithm as compared to filter-by-filter fast convolution is obtained at the cost of an increase of about 72.7% in the memory that may be used. Thus, the proposed algorithm may be regarded as a space-time trade-off which uses additional memory in order to store pre-calculated results such as frequency-domain representations of the input signal, for example, so as to enable more efficient implementation.
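The memory figures quoted above can be reproduced with the following sketch, using N=16 sources, M=128 loudspeaker channels as in the exemplary system, K=1023, B=1024, and D.sub.max=48000. The function names are hypothetical, and the number of stored blocks per input signal, ⌈D.sub.max/B⌉, is an assumption that happens to match the quoted totals.

```python
def linear_conv_memory(N, M, K, D_max):
    """N time-domain delay lines plus MN filters of K + 1 taps."""
    return N * (D_max + K) + M * N * (K + 1)


def filterwise_memory(N, M, K, B, D_max):
    """Same delay lines; each filter is stored as a spectrum of L = K + B values."""
    return N * (D_max + K) + M * N * (K + B)


def proposed_memory(N, M, K, B, D_max):
    """Frequency-domain delay line and filters, both with L = K + 2B - 1 values."""
    L = K + 2 * B - 1
    blocks = -(-D_max // B)            # ceil(D_max / B) stored spectra per source
    return N * blocks * L + M * N * L


N, M, K, B, D_max = 16, 128, 1023, 1024, 48000
mem_linear = linear_conv_memory(N, M, K, D_max)           # about 2.9e6 words
mem_filterwise = filterwise_memory(N, M, K, B, D_max)     # about 5.0e6 words
mem_proposed = proposed_memory(N, M, K, B, D_max)         # about 8.6e6 words
increase_percent = 100.0 * (mem_proposed / mem_filterwise - 1.0)
```

With these inputs the relative increase over filter-by-filter fast convolution evaluates to about 72.7%, in agreement with the figure stated in the text.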
[0127] The additional memory requirements may have an adverse effect on the performance, e.g. due to reduced cache locality. At the same time, it is likely that the reduced number of commands, which implies a reduced number of memory access operations, minimizes this effect. It is therefore useful to examine and evaluate the performance gains of the proposed algorithm for the intended hardware architecture. By analogy therewith, the parameters of the algorithm, such as the FFT block size L or the block delay B, for example, are adjusted to the specific target platform.
[0128] Even though specific elements are described as device elements, it shall be noted that this description may equally be regarded as a description of steps of a method, and vice versa.
[0129] Depending on the circumstances, the inventive method may be implemented in hardware or in software. Implementation may be effected on a non-transitory storage medium, a digital storage medium, in particular a disc or CD which comprises electronically readable control signals which may cooperate with a programmable computer system such that the method is performed. Generally, the invention thus also consists in a computer program product having a program code, stored on a machine-readable carrier, for performing the method when the computer program product runs on a computer. In other words, the invention may thus be realized as a computer program which has a program code for performing the method, when the computer program runs on a computer.
[0130] While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.