Device and method for estimating direction of arrival
11567162 · 2023-01-31
Assignee
Inventors
- Kainan Chen (Munich, DE)
- Jürgen Geiger (Munich, DE)
- Mohammad Taghizadeh (Munich, DE)
- Peter Grosche (Munich, DE)
Cpc classification
G01S3/8006
PHYSICS
International classification
G01S3/808
PHYSICS
Abstract
A device for estimating Direction of Arrival (DOA) of sound from Q≥1 sound sources is provided. The device is configured to obtain a phase difference matrix, which includes measured phase difference values, each of the measured phase difference values being a measured value of a phase difference between two microphone units for a frequency bin in a range of frequencies of the sound. The device is further configured to generate a replicated phase difference matrix by replicating the measured phase difference values to other potential sinusoidal periods, calculate a DOA value for each phase difference value in the replicated phase difference matrix, and determine, as Q DOA results, the Q most prominent peak values in a histogram generated based on the calculated DOA values.
Claims
1. A device for estimating Direction of Arrival (DOA) of sound from Q >1 sound sources, the device being a component in a system comprising a plurality of microphone units, the device being configured to: obtain a phase difference matrix including measured phase difference values, each of the measured phase difference values being a measured value of a phase difference between two microphone units of the plurality of microphone units for a frequency bin in a range of frequencies of the sound, generate a replicated phase difference matrix by replicating the measured phase difference values to other potential sinusoidal periods, calculate a DOA value for each phase difference value in the replicated phase difference matrix, generate a first histogram from the calculated DOA values, select, as Q+q DOA candidates, Q+q most prominent peak values in the first histogram, wherein q=2, generate a second histogram based on the selected Q+q DOA candidates, and determine, as Q DOA results, Q most prominent peak values in the second histogram.
2. The device according to claim 1, wherein the device is further configured to: generate the replicated phase difference matrix by replicating the measured phase difference values based on a minimum aliasing frequency defined by
3. The device according to claim 2, wherein: the measured phase difference values in the phase difference matrix are wrapped into [−π, π], and the device is configured to generate the replicated phase difference matrix according to
4. The device according to claim 3, wherein the device is further configured to: calculate the DOA values based on the formula
5. The device according to claim 1, wherein the device is further configured to: remove complex calculated DOA values, before generating the first histogram.
6. The device according to claim 1, wherein, for generating the second histogram, the device is configured to: determine, for each selected DOA candidate, its related DOA values from the calculated DOA values, generate third histograms from each selected DOA candidate and its related DOA values, and generate the second histogram by merging the third histograms of all selected DOA candidates.
7. The device according to claim 6, wherein the device is further configured to: merge the third histograms of all selected DOA candidates to generate the second histogram by, for each histogram index, using the maximum value from all the third histograms as the value of the second histogram for that histogram index.
8. The device according to claim 6, wherein the device is further configured to: determine the related DOA values of a DOA candidate by determining, as its related phase difference values, the phase difference values in the replicated phase difference matrix that are in supposed correct sinusoidal periods, and calculate its related DOA values from its related phase difference values.
9. The device according to claim 6, wherein the device is further configured to: apply a soft mask to the peak values in each of the third histograms, before merging the third histograms into the second histogram, wherein the soft mask is designed as a peak filter with a smaller width at a DOA of 0° and larger widths at DOAs of ±90°.
10. The device according to claim 9, wherein the device is further configured to: apply a low-pass filter to the second histogram, before determining the Q DOA results.
11. The device according to claim 1, wherein: each microphone unit of the two microphone units includes an array of one or more microphones, and the one or more measured phase difference values of the phase difference matrix are obtained from measured phase differences between the one or more microphones of one of the microphone units and the one or more microphones of the other one of the microphone units.
12. An apparatus for determining Direction of Arrival (DOA) of sound from Q>1 sound sources, the apparatus comprising: a device configured to: obtain a phase difference matrix including measured phase difference values, each of the measured phase difference values being a measured value of a phase difference between two microphone units of a plurality of microphone units for a frequency bin in a range of frequencies of the sound, generate a replicated phase difference matrix by replicating the measured phase difference values to other potential sinusoidal periods, calculate a DOA value for each phase difference value in the replicated phase difference matrix, generate a first histogram from the calculated DOA values, select, as Q+q DOA candidates, Q+q most prominent peak values in the first histogram, wherein q=2, generate a second histogram based on the selected Q+q DOA candidates, and determine, as Q DOA results, Q most prominent peak values in the second histogram; and a sound receiver, including the two microphone units, configured to receive the sound, generate the phase difference matrix, and provide the phase difference matrix to the device.
13. A method of estimating Direction of Arrival (DOA) of sound from Q >1 sound sources, in a system comprising a plurality of microphone units, the method comprising: obtaining a phase difference matrix including measured phase difference values, each of the measured phase difference values being a measured value of a phase difference between two microphone units of the plurality of microphone units for a frequency bin in a range of frequencies of the sound, generating a replicated phase difference matrix by replicating the measured phase difference values to other potential sinusoidal periods, calculating a DOA value for each phase difference value in the replicated phase difference matrix, generating a first histogram from the calculated DOA values, selecting, as Q+q DOA candidates, Q+q most prominent peak values in the first histogram, wherein q=2, generating a second histogram based on the selected Q+q DOA candidates, and determining, as Q DOA results, Q most prominent peak values in the second histogram.
14. The device according to claim 10, wherein the low-pass filter is a Gaussian filter with a standard deviation σ according to
15. The apparatus according to claim 12, wherein the device is further configured to: generate the replicated phase difference matrix by replicating the measured phase difference values based on a minimum aliasing frequency defined by
16. The apparatus according to claim 15, wherein: the measured phase difference values in the phase difference matrix are wrapped into [−π, π], and the device is configured to generate the replicated phase difference matrix according to
17. The method according to claim 13, further comprising: generating the replicated phase difference matrix by replicating the measured phase difference values based on a minimum aliasing frequency defined by
18. The method according to claim 17, wherein: the measured phase difference values in the phase difference matrix are wrapped into [−π, π], and the method further comprises: generating the replicated phase difference matrix according to
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The above described aspects and implementation forms of embodiments of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
DETAILED DESCRIPTION OF THE EMBODIMENTS
(15)
(16) The device 100 of
(17) The device 100 is further configured to generate a replicated phase difference matrix μ by replicating the measured phase difference values in the obtained phase difference matrix μ.sub.0 to other potential sinusoidal periods.
(18) Then, the device 100 is configured to calculate a DOA value for each phase difference value in the replicated phase difference matrix μ, i.e. it calculates a DOA matrix θ. Finally, the device 100 is configured to determine, as Q DOA results, the Q most prominent peak values in a histogram generated based on the calculated DOA values θ.
(19) The device 100 is thereby configured to carry out a method according to an embodiment of the invention. As shown in
(20) The position of the device 100 in the sound source localization is shown in
(21) A more detailed overview of a device 100 according to an embodiment of the invention, which builds on the embodiment of the device 100 in
(22) In box 301, the phase difference matrix μ.sub.0 is obtained, and the replicated phase difference matrix μ is generated by replicating the measured phase difference values to other potential sinusoidal periods. In box 302, DOA values θ are calculated from the replicated phase difference matrix μ. That is, a DOA value θ is calculated for each phase difference value in the replicated phase difference matrix μ.
(23) In box 303, a DOA histogram h (denoted as first histogram) is generated from the calculated DOA values θ. In a simple implementation form of the device 100, the Q most prominent peak values in the first histogram h may be selected already at this point as Q DOA results. In an implementation form of the device 100, for improved robustness, more peaks in the histogram h are detected at box 304. In particular, here the Q+q most prominent peak values in the first histogram h may be detected as DOA candidates. q is preferably 2.
(24) In box 305, a binary masking may be applied, which takes as input the Q+q peaks detected at box 304 and the DOA values θ calculated at box 302. Thus, in box 305, the related DOA values θ.sub.1, θ.sub.2 . . . θ.sub.i are determined and output. At box 306, further histograms (denoted as third histograms) are produced from each selected DOA candidate and its related DOA values, and are output as h.sub.1, h.sub.2 . . . h.sub.i. At box 307, soft masking is applied to these histograms to output soft-masked histograms H.sub.1, H.sub.2 . . . H.sub.i. That is, a soft mask is applied to the peak values in each of the third histograms. At box 308, these histograms H.sub.1, H.sub.2 . . . H.sub.i are then merged into one histogram H (denoted as second histogram). In particular, the third histograms are merged to generate the second histogram by, for each histogram index, using the maximum value from all the third histograms as the value of the second histogram for that histogram index (denoted by “maximum”).
(25) At box 309, an optional low-pass filtering is applied to the histogram H. Specifically, a Gaussian filter may be applied. Then, at box 309, the Q most prominent peak values in the second histogram are determined as the Q estimated DOA results θ, and are output.
(26)
(27) The purpose of this step is to obtain a (replicated) phase difference matrix μ in all of the potential sinusoidal periods. Frequency bands below f.sub.a.sub.
(28)
where ⌊·⌋ denotes the floor operation, and μ is the replicated matrix. μ now contains μ.sub.0 in the correct sinusoidal period, together with some erroneous entries introduced by this step.
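The replication step can be illustrated with a minimal sketch. It assumes that the minimum aliasing frequency is c/(2Δd) and that a bin at frequency f is replicated to floor(f/f.sub.a) periods in each direction; both are assumptions made for this illustration, since the defining equations are not reproduced here. The function name, parameter names and the speed of sound of 343 m/s are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for this illustration


def replicate_phase(mu0, freq_hz, mic_distance_m, c=SPEED_OF_SOUND):
    """Return candidate phases mu0 + 2*pi*n for one frequency bin.

    Assumptions (not taken from the source equations):
    - minimum aliasing frequency f_a = c / (2 * mic_distance_m)
    - the wrapped phase may belong to any period n with |n| <= floor(f / f_a)
    """
    f_alias = c / (2.0 * mic_distance_m)
    n_max = math.floor(freq_hz / f_alias)
    return [mu0 + 2.0 * math.pi * n for n in range(-n_max, n_max + 1)]


# Below the aliasing frequency only the measured value itself remains.
low_band = replicate_phase(0.5, 1000.0, 0.05)
# Above it, the measured value is copied into neighbouring periods.
high_band = replicate_phase(0.5, 8000.0, 0.05)
```

Under these assumptions the candidate list is a superset of the physically possible periods; candidates that cannot correspond to a real angle are dealt with at a later stage.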
(29)
(30)
(31) Each phase difference value in the replicated phase difference matrix μ has a single corresponding DOA value θ. μ is transformed into the DOA matrix θ containing these values as
(32)
θ(i,j) denotes the DOA value for frequency bin index i and replication index j, and Δd denotes the distance between the two microphone units 203.
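As a hedged illustration of this transformation, the sketch below uses the common far-field relation μ = 2π·f·Δd·sin(θ)/c, which is assumed here and may differ in detail from the formula of the embodiment. The function name `phase_to_doa_deg` and the default speed of sound are hypothetical.

```python
import cmath
import math


def phase_to_doa_deg(mu, freq_hz, mic_distance_m, c=343.0):
    """Map one phase difference (radians) to a DOA in degrees.

    Assumed far-field model: mu = 2*pi*f*d*sin(theta)/c.
    cmath.asin is used so that arguments outside [-1, 1] yield a complex
    result instead of raising an error.
    """
    s = mu * c / (2.0 * math.pi * freq_hz * mic_distance_m)
    return cmath.asin(s) * (180.0 / math.pi)


# Phase difference produced by a source at 30 degrees, 1 kHz, 5 cm spacing.
mu_true = 2.0 * math.pi * 1000.0 * 0.05 * math.sin(math.radians(30.0)) / 343.0
theta_ok = phase_to_doa_deg(mu_true, 1000.0, 0.05)
# The same phase shifted by one full period is not physically possible here.
theta_bad = phase_to_doa_deg(mu_true + 2.0 * math.pi, 1000.0, 0.05)
```

In this example `theta_ok` is real-valued, while `theta_bad` has a non-zero imaginary part.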
(33)
(34) Now, μ̇ may define the phase differences in the correct sinusoidal periods, and the corresponding transformed DOA values may be defined as θ̇. It is known that θ̇ is theoretically constant across frequency in clean (low-noise) scenarios. This property can be expressed as
(35)
(36) By simplifying the above equation (6), the relationship of μ̇ between different frequencies can be determined as
(37)
(38) When the phase difference is in a wrong sinusoidal period, μ̈(i) = μ̇(i) + 2nπ (n ≠ 0, n ∈ ℤ). The wrongly estimated DOA is defined as θ̈(i). θ̈(i) is a complex number when the condition
(39)
is met. For this reason, all of the complex values are preferably removed from θ.
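The removal of complex values can be sketched as follows. Instead of computing complex angles and discarding them afterwards, this illustrative version checks whether the arcsine argument lies in [−1, 1], which is equivalent under the far-field model assumed here. All names and the example values are hypothetical.

```python
import math


def real_doas_deg(phases, freq_hz, mic_distance_m, c=343.0):
    """Convert candidate phases to DOAs, keeping only real-valued results.

    Assumed model: sin(theta) = mu * c / (2*pi*f*d). A candidate whose
    sine argument has magnitude above 1 would give a complex angle and is
    dropped.
    """
    scale = c / (2.0 * math.pi * freq_hz * mic_distance_m)
    doas = []
    for mu in phases:
        s = mu * scale
        if abs(s) <= 1.0:
            doas.append(math.degrees(math.asin(s)))
    return doas


# Source at 30 degrees, 4 kHz, 10 cm spacing: the true phase exceeds pi,
# so the measured (wrapped) value lies one period below it.
k = 2.0 * math.pi * 4000.0 * 0.1 / 343.0
mu_wrapped = k * math.sin(math.radians(30.0)) - 2.0 * math.pi
candidates = [mu_wrapped + 2.0 * math.pi * n for n in range(-2, 3)]
kept = real_doas_deg(candidates, 4000.0, 0.1)
```

Of the five candidates in this example, only two give real angles: the correct one near 30° and one aliased value.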
(40)
(41) Taking the above equation (6) and the mentioned simplifications, the relationship between the θ̈ values at different frequencies is obtained as
(42)
(43) This shows that θ̈ varies monotonically along the frequency axis. Together with the constancy of θ̇, this means that when θ is transformed into the histogram h, the correct peaks have higher amplitudes than the peaks arising from θ̈.
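The effect described here can be illustrated with a small histogram sketch. The bin width of 1° over the range from −90° to 90° and the example values are assumptions chosen for this illustration only.

```python
def doa_histogram(doas_deg, bin_width_deg=1.0):
    """Count DOA values (degrees) in equal-width bins covering [-90, 90]."""
    n_bins = int(round(180.0 / bin_width_deg))
    hist = [0] * n_bins
    for theta in doas_deg:
        idx = int((theta + 90.0) / bin_width_deg)
        idx = min(max(idx, 0), n_bins - 1)  # keep +/-90 inside the range
        hist[idx] += 1
    return hist


# Five frequency bins agree on the correct DOA (30 degrees), while the
# aliased values drift with frequency and land in different bins.
example_doas = [30.0] * 5 + [-20.0, -10.0, 5.0]
h = doa_histogram(example_doas)
```

Because the correct values fall into the same bin at every frequency while the aliased values spread out, the bin containing 30° collects the highest count.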
(44)
(45) If the sound sources 202 are broadband signals and the scenario is clean, the DOA results can be estimated from the positions of the Q peaks with the highest prominence. If the scenario is noisy and/or some of the sound sources 202 are weak, the corresponding peaks may be less prominent than the peaks from θ̈.
(46) To make the estimation carried out by the device 100 even more robust, in such a case, Q′=Q+q peaks may be taken from the histogram h as DOA candidates (practically, q is taken as 2, but it may also be another integer value, like 3 or higher).
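A minimal sketch of selecting Q+q candidates by peak prominence is given below. The prominence definition used (peak height minus the higher of the two lowest values found between the peak and the next higher bin on either side) is a common one and is assumed here; the function names are hypothetical, and peaks in the first and last bin are ignored for simplicity.

```python
def peak_prominences(hist):
    """Return {bin index: prominence} for each interior local maximum."""
    n = len(hist)
    result = {}
    for i in range(1, n - 1):
        if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]:
            # Walk outwards until a higher bin (or the border) is reached,
            # remembering the lowest value seen on each side.
            left_min = hist[i]
            j = i - 1
            while j >= 0 and hist[j] <= hist[i]:
                left_min = min(left_min, hist[j])
                j -= 1
            right_min = hist[i]
            j = i + 1
            while j < n and hist[j] <= hist[i]:
                right_min = min(right_min, hist[j])
                j += 1
            result[i] = hist[i] - max(left_min, right_min)
    return result


def select_candidates(hist, num_sources, extra=2):
    """Return the bin indices of the Q+q most prominent peaks."""
    prom = peak_prominences(hist)
    ranked = sorted(prom, key=prom.get, reverse=True)
    return ranked[: num_sources + extra]


example_hist = [0, 1, 5, 1, 0, 2, 0, 9, 0, 3, 0]
candidates = select_candidates(example_hist, num_sources=1)
```

With Q=1 and q=2, the three most prominent of the four peaks in the example are returned, ordered from most to least prominent.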
(47) This is shown in
(48)
(49) To evaluate whether the chosen peaks (DOA candidates) correspond to actual sound sources 202 rather than aliasing peaks, each of the peaks is processed individually. The position of the k.sup.th peak is denoted as p.sub.k, and from equation (3), the corresponding aliasing frequency can be determined as f.sub.a.sub.
(50) With these frequency indexes, binary masks can be applied to select the DOA values of the phases in supposed correct sinusoidal periods for the corresponding peaks from θ. The process of selecting the related DOA values for a peak value may be described as
(51)
where θ.sub.k includes the k.sup.th peak and its related DOA values.
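The idea of selecting, for one candidate, the phases in their supposed correct sinusoidal periods can be sketched as follows. This is not the selection rule of the embodiment, whose equation is not reproduced here; it is a substitute approach that, for each frequency bin, picks the period index implied by the candidate angle under an assumed far-field model. The names `related_doas` and `wrap` are hypothetical.

```python
import math


def related_doas(mu0, freqs_hz, peak_deg, mic_distance_m, c=343.0):
    """Unwrap each measured phase into the period implied by peak_deg.

    Assumed model: mu = k * sin(theta) with k = 2*pi*f*d/c. For a wrapped
    phase in [-pi, pi], the period index of the unwrapped phase mu_true is
    round(mu_true / (2*pi)).
    """
    out = []
    for mu, f in zip(mu0, freqs_hz):
        k = 2.0 * math.pi * f * mic_distance_m / c
        expected = k * math.sin(math.radians(peak_deg))
        n = round(expected / (2.0 * math.pi))  # supposed correct period
        s = (mu + 2.0 * math.pi * n) / k
        if abs(s) <= 1.0:  # skip values that would give complex angles
            out.append(math.degrees(math.asin(s)))
    return out


def wrap(x):
    """Wrap a phase into [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


# Simulated clean measurement: one source at 40 degrees, 10 cm spacing.
freqs = [1000.0, 3000.0, 5000.0, 7000.0]
measured = [wrap(2.0 * math.pi * f * 0.1 / 343.0 * math.sin(math.radians(40.0)))
            for f in freqs]
theta_k = related_doas(measured, freqs, 40.0, 0.1)
```

When the candidate coincides with an actual source, as in this example, the related DOA values agree across all frequency bins.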
(52)
(53) θ.sub.k of each peak is then transformed into a histogram h.sub.k. That is, a histogram h.sub.k is generated for the k.sup.th selected DOA candidate and its related DOA values, as is shown in
(54) A soft mask M.sub.k may now be applied to the histogram h.sub.k related to the k.sup.th peak, in order to highlight the correct peaks. The mask may be the same or different for each peak.
(55) Theoretically, the width of an aliasing peak is large. In contrast, the width of a correct peak p.sub.k is narrow at 0° and increases as the peak gets closer to ±90°. Given this property, the soft mask may be designed as a peak filter with a small width at 0° and a large width at ±90°. A practical soft mask with respect to the k.sup.th selected DOA candidate can preferably be designed as
(56)
where f.sub.nh denotes the considered highest frequency.
(57) The soft masking is preferably applied as a Schur (element-wise) product (∘) according to
H.sub.k = h.sub.k ∘ M.sub.k (12)
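The element-wise masking can be sketched as below. The mask shape is purely hypothetical: a Gaussian-shaped peak filter centred on the candidate, whose width grows with 1/cos(p.sub.k) so that it is narrow at 0° and wide towards ±90°. It stands in for the mask formula of the embodiment, which is not reproduced here.

```python
import math


def soft_mask(peak_deg, n_bins=180, base_width_deg=5.0):
    """Hypothetical peak filter: narrow at 0 degrees, wider towards +/-90."""
    # The floor of 0.1 on the cosine keeps the width finite at +/-90 degrees.
    width = base_width_deg / max(math.cos(math.radians(peak_deg)), 0.1)
    centers = [-90.0 + (i + 0.5) * 180.0 / n_bins for i in range(n_bins)]
    return [math.exp(-0.5 * ((x - peak_deg) / width) ** 2) for x in centers]


def schur_product(hist, mask):
    """Element-wise (Schur) product of a histogram and a mask."""
    return [a * b for a, b in zip(hist, mask)]


mask_front = soft_mask(0.0)
mask_side = soft_mask(60.0)
masked = schur_product([1, 2, 3], [0.5, 0, 1])
```

The design choice in this sketch is that the mask never exceeds 1, so masking can only attenuate histogram values away from the candidate position.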
(58)
(59) The masked histograms from the peak candidates are merged to H by “maximum” according to
H(i)=max(H.sub.1(i), . . . ,H.sub.k(i), . . . H.sub.Q′(i)) (13)
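Equation (13) amounts to a bin-wise maximum over the masked histograms, which can be sketched in a few lines. The function name `merge_max` is hypothetical, and the histograms are assumed to have equal length.

```python
def merge_max(histograms):
    """Merge equally long histograms by taking the maximum in every bin."""
    return [max(values) for values in zip(*histograms)]


merged = merge_max([[1, 0, 2], [0, 3, 1], [0, 0, 0]])
```

Taking the maximum rather than the sum means that a peak supported by several candidates is not counted more than once.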
(60)
(61) A low-pass filter, more preferably a Gaussian filter, is preferably further applied to this histogram H. Even more preferably, the Gaussian filter is applied with a standard deviation σ equal to the lowest localization resolution of the microphone setup. The reason for choosing this deviation is to balance the heights of the peaks closer to 0° and those closer to 90°. Theoretically, the widths of the aliasing peaks are large, while the widths of the correct peaks are narrow at 0° and increase as the peaks get closer to ±90°. Therefore, using the soft mask in this way can help to detect the correct peaks more reliably. A simplified equation to obtain the lowest resolution is given as
(62)
where f.sub.s denotes the sampling rate.
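A Gaussian low-pass filter over the histogram bins can be sketched as follows. The standard deviation is passed in as a parameter expressed in bins, since the resolution formula is not reproduced here; the kernel radius of three standard deviations and the clamping at the histogram borders are assumptions of this illustration.

```python
import math


def gaussian_smooth(hist, sigma_bins):
    """Smooth a histogram with a normalized Gaussian kernel."""
    radius = max(1, int(math.ceil(3.0 * sigma_bins)))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(hist)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # repeat the border bins
            acc += w * hist[j]
        out.append(acc)
    return out


impulse = [0.0] * 21
impulse[10] = 1.0
smoothed = gaussian_smooth(impulse, 2.0)
```

Applied to a single-bin impulse, the filter spreads the count over neighbouring bins while keeping the peak position and the total count unchanged.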
(63) Finally, Q peaks are selected by their peak prominence from the (optionally low-pass filtered) histogram H. The positions of the peaks are the DOA result output by the device 100.
(64)
(65) As a consequence, the device 100 of embodiments of the invention enhances the robustness and accuracy of sound source localization that uses microphones or microphone arrays, especially when the distance between the microphones is large. A potential application for such a device 100 or for the apparatus 200 is, for example, in a distant speech pickup device, in a tablet, in a mobile phone, or in a teleconference device. In each application, the invention specifically reduces or eliminates the negative effects of spatial aliasing.
(66) The invention has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed invention, from the studies of the drawings, this disclosure and the independent claims. In the claims as well as in the description the word “comprising” does not exclude other elements or steps and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items described.