Device and method for estimating direction of arrival of sound from a plurality of sound sources

11467244 · 2022-10-11

Assignee

Huawei Technologies Co., Ltd. (Shenzhen, CN)

Inventors

Cpc classification

International classification

Abstract

A device estimates direction of arrival (DOA) of sound from custom character sound sources received by P microphones, wherein P≥>1. The device is configured to transform the output signals of the microphones into the frequency domain and compute a covariance matrix for each of N frequency bins in a range of frequencies of the sound. Further, the device is configured to calculate an adapted covariance matrix from each of the covariance matrices for wide-band merging, calculate an accumulated covariance matrix from the N adapted covariance matrices, and estimate the DOA for each of the sound sources based on the accumulated covariance matrix. In order to calculate an adapted covariance matrix from a covariance matrix, the device is configured to spectrally decompose the covariance matrix and obtain a plurality of eigenvectors, rotate each obtained eigenvector, and construct each rotated eigenvector back to the shape of the covariance matrix.

Claims

1. A device for estimating a direction of arrival (DOA) of sound from Q sound sources received by P microphone units, wherein P≥Q>1, wherein the microphone units form pairs in which the microphone units have identical sensitivity patterns and are translationally separated by a known and constant displacement vector, the device being configured to: transform output signals of the P microphone units into a frequency domain and compute N covariance matrices by computing a covariance matrix for each of a plurality of N frequency bins in a range of frequencies of the sound, calculate N adapted covariance matrices for wide-band merging by calculating from each of the N covariance matrices a corresponding adapted covariance matrix of the N adapted covariance matrices, calculate an accumulated covariance matrix from the N adapted covariance matrices, and estimate the DOA for each of the Q sound sources based on the accumulated covariance matrix, wherein the calculation of the corresponding adapted covariance matrix comprises: spectrally decomposing a corresponding covariance matrix of the N adapted covariance matrices, obtaining a plurality of eigenvectors, rotating each of the obtained eigenvectors by Hadamard powering of a corresponding frequency to a complex-value eigenvector, and constructing each of the rotated eigenvectors back to a shape of the corresponding covariance matrix to obtain the corresponding adapted covariance matrix.

2. The device according to claim 1, wherein, in order to obtain the plurality of eigenvectors, the device is configured to select Q eigenvectors of the corresponding covariance matrix related to Q highest eigenvalues.

3. The device according to claim 1, wherein, in order to construct a rotated eigenvector, of the eigenvectors, back to the shape of the corresponding covariance matrix, the device is configured to reconstruct the corresponding covariance matrix by multiplying the rotated eigenvector, a diagonal matrix of size Q*Q, and an inverse of the rotated eigenvector.

4. The device according to claim 1, wherein, in order to rotate each of the obtained eigenvectors and construct each of the rotated eigenvectors back to the shape of the corresponding covariance matrix, the device is configured to perform, over all of the frequency bins, an accumulation iteration process based on the eigenvectors and the eigenvectors' related eigenvalues, or a summing process, or an averaging process based on reconstructed covariance matrices.

5. The device according to claim 1, wherein, in order to calculate the accumulated covariance matrix, the device is configured to weigh each of the N adapted covariance matrices.

6. The device according to claim 5, wherein the device is configured to weigh each of the adapted covariance matrix, of the adapted covariance matrices, based on a mean square of the frequency-domain transformed output signals of the microphone units for a same one of the frequency bins.

7. The device according to claim 1, wherein, in order to calculate the accumulated covariance matrix, the device is configured to accumulate the N adapted covariance matrices over a plurality of time frames.

8. The device according to claim 1, wherein, in order to estimate the DOA for each of the Q sound sources, the device is configured to estimate, based on the accumulated covariance matrix, accordingly adapted phase difference values, each of the adapted phase difference values being related to a phase difference between two of the microphone units, and estimate the DOAs based on the adapted phase difference values.

9. A method of estimating direction of arrival (DOA) of sound from Q≥1 sound sources received by P microphone units, wherein P≥Q>1, wherein the microphone units form pairs in which the microphone units have identical sensitivity patterns and are translationally separated by a known and constant displacement vector, the method comprising: transforming output signals of the P microphone units into a frequency domain and computing N covariance matrices by computing a covariance matrix for each of a plurality of N frequency bins in a range of frequencies of the sound, calculating N adapted covariance matrices for wide-band merging by calculating from each of the N covariance matrices a corresponding adapted covariance matrix of the N adapted covariance matrices, calculating an accumulated covariance matrix from the N adapted covariance matrices, and estimating the DOA for each of the Q sound sources based on the accumulated covariance matrix, wherein the calculating of the corresponding adapted covariance matrix comprises: spectrally decomposing a corresponding covariance matrix of the N adapted covariance matrices, obtaining a plurality of eigenvectors, rotating each of the obtained eigenvectors by Hadamard powering of a corresponding frequency to a complex-value eigenvector, and constructing each of the rotated eigenvectors back to a shape of the corresponding covariance matrix to obtain the corresponding adapted covariance matrix.

10. A non-transitory computer readable medium comprising a program code that is configured to perform the method according to claim 9 upon running on a computer.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The above described aspects and implementation forms of the disclosure will be explained in the following description of exemplary embodiments in relation to the enclosed drawings, in which:

(2) FIG. 1 shows a device according to an embodiment of the disclosure, which implements a method according to an embodiment of the disclosure.

(3) FIG. 2 shows an overview of the sound sources and the microphone units;

(4) FIG. 3 shows a device according to an embodiment of the disclosure; and

(5) FIG. 4 shows a method according to an embodiment of the disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENTS

(6) FIG. 1 shows a device 100 according to an embodiment of the disclosure. The device 100 is configured to estimate DOA of sound from multiple sound sources.

(7) In particular, the device 100 is designed for multichannel sound source localization in a 2D plane, where the sound is picked up by a plurality of microphone units, for instance, of a microphone array. In this respect, FIG. 2 shows an overview of a typical arrangement of a plurality of sound sources 200 and a plurality of microphone units 201 for picking up the sound of the sound sources 201. The geometry requirement in the exemplary embodiment is that the microphone units 201 form pairs, in which the microphone units 201 have identical sensitivity patterns, and are translationally separated by a known and constant displacement vector Δd. Each pair is formed by two microphone units 201. In FIG. 2, the number of microphone units 201 is twice the number of pairs. However, it is also possible that a microphone unit 201 belongs to two pairs. That is, for instance, three microphone units 201 may be arranged as a uniform linear array that every Δd to form two pairs. The same is possible with each uneven number of microphone units 201. The device 100 is preferably configured to operate in a determined scenario, in which the number of pairs of microphone units 201 is more than the number of sound sources 200.

(8) It is noted in the end that there are several special cases that fit the geometry requirements of the microphone units 201, such as 2-microphone array, uniform linear array, and some kind of circular arrays. Today tablets, mobile phones, smart TV, smart home speakers, AR/VR, and teleconference devices are using microphone arrays that fit this geometry condition.

(9) In particular, the device 100 shown in FIG. 1 is configured to estimate the DOA of sound from custom character sound sources 200 received by P microphone units 201, where P≥>1. The device 100 may include the microphone units 201, but is preferably a processing device 100, which receives and processes the output signals from the microphone units 201. To this end, the device 100 is configured to implement a method 110 according to an embodiment of the disclosure, which is described in the following.

(10) By implementing the method 110, the device 100 is configured to transform 111 the output signals of the P microphone units 201 into the frequency domain and compute a covariance matrix for each of a plurality of N frequency bins in a range of frequencies of the sound. N≥1 is a natural number. That is, the device 110 is configured to compute N covariance matrices.

(11) Further, the device 100 is configured to calculate 112 an adapted covariance matrix from each of the N covariance matrices for wide-band merging, to calculate 113 an accumulated covariance matrix from the N adapted covariance matrices (by accumulating the N adapted covariance matrices), and to estimate 114 the DOA for each of the custom character sound sources based on the accumulated covariance matrix.

(12) The device 100 is configured, in order to calculate an adapted covariance matrix (i.e. any one of the N adapted covariance matrices) from a covariance matrix (i.e. from any one of the N covariance matrices), to spectrally decompose the covariance matrix and obtain a plurality of eigenvectors (i.e. carry out an eigenvalue decomposition), rotate each obtained eigenvector (for the purpose of unifying the eigenvectors), and construct each rotated eigenvector back to the shape of the covariance matrix to obtain the adapted covariance matrix.

(13) The disclosure, implemented by device 100 and method 110, and the ESPRIT algorithm both firstly transform the output signals of the microphone units 201 into an N point frequency domain by DFT. The ESPRIT algorithm then estimates phase differences Δφ.sub.n between the microphone units 201 of each pair for each sound source from the n.sub.th frequency bin. The device 100 and method 110, however, estimate uniformed phase differences Δφ′ for each sound source 200 from all the frequency bins together, so that DOAs can be directly obtained by transforming the uniformed phase differences.

(14) An exemplary device 100 according to an embodiment of the disclosure, which builds on the device 100 shown in FIG. 1, is shown in FIG. 3. The multichannel recording signals, i.e. the multi-channel data obtained by the plurality of microphone units 201, are received by the device 100. In the device 100, these signals are transformed e.g. by a transforming unit 301, into the frequency domain (as mentioned a DFT with N points), and then the device 100 applies, e.g. by an estimation unit 300, an estimation algorithm that outputs custom character DOAs θ from these the narrow-band signals.

(15) Detailed steps carried out in the estimation unit 300 are shown in FIG. 4 and described in the following. The steps are part of a exemplary method 110 according to an embodiment of the disclosure, which builds on the method 110 shown in FIG. 1. The steps shown in FIG. 4 can be categorized as frequency bin wise narrow-band signal processing (step 111 and 112) and wide-band signal processing (step 113 and 114).

(16) The device 100 and method 110 can specifically be considered to implement an improved modification of the ESPRIT algorithm. A short overview of the ESPRIT algorithm is thus given at first.

(17) The ESPRIT algorithm obtains orthogonal signal subspaces by computing eigenvectors of a multichannel covariance matrix for each frequency bin. The signal in the frequency domain is denoted by X={X.sub.1, . . . X.sub.N}.
R.sub.n=E{X.sub.nX.sub.n*} Equation 1
R.sub.nU.sub.n=Σ.sub.nU.sub.n

(18) In Equation 1, R.sub.n denotes the covariance matrix, E{*} denotes the expectation process, U.sub.n denotes the eigenvector, and E.sub.n denotes the eigenvalue (diagonal matrix) on the n.sub.th frequency bin.

(19) It is here assumed that the lower the eigenvalue, the more diffusive the corresponding eigenvector. Thus, Q eigenvectors may chosen by the largest Q eigenvalues. The eigenvector matrix after selection is then denoted as U.sub.n,s(P×Q), wherein the columns of the eigenvector matrix represent the related pairs of microphone units 201, and the rows of the eigenvector matrix represent the related sound sources 200.

(20) To estimate the phase differences between a microphone unit 201 and its translationally shifted microphone unit 201 each pair, the microphone units 201 may be considered as two subarrays that separate the shifted microphone units (as shown in FIG. 2). In U.sub.n,s, the matrix is correspondingly separated to two sub-matrices U.sub.n,1, U.sub.n,2.

(21) It can accordingly be defined as
U.sub.n,1 custom character A.sub.n,1T.sub.n Equation 2
U.sub.n,2A.sub.n,2T.sub.n=A.sub.n,1Δφ.sub.nT.sub.n

(22) In Equations 2, T.sub.n denotes a non-singular matrix at the n.sub.thfrequency bin.

(23) The relationship of the phase difference between the two frequencies f.sub.i, f.sub.j is

(24) $\begin{matrix} \frac{{Δφ}_{i}}{f_{i}} = \frac{{Δφ}_{j}}{f_{j}} & Equation 3 \end{matrix}$

(25) Therefore, the phase differences between the frequencies are different, and that is why the ESPRIT algorithm has to be repeated for each frequency bin to estimate each narrow-band phase difference.

(26) The estimation of Δφ.sub.i from U.sub.n,2, U.sub.n,2 is
U.sub.1″=U.sub.2″ψ Equation 4
ψ=TΔφ′T.sup.−1

(27) In the end, the phase difference is transformed to DOA by

(28) $\begin{matrix} θ = \arcsin \frac{c {Δφ}_{n}}{2 π f_{n} Δ d} & Equation 5 \end{matrix}$
Now the improved algorithm implemented by the device 100, realized by performing the method 110, is described. In particular, a uniformed phase difference vector Δφ′ is defined by,

(29) $\begin{matrix} {Δφ}^{'} = \frac{Δφ}{f} & Equation 6 \end{matrix}$

(30) The uniformed phase difference vectors are theoretically equal for all the frequency bins. Therefore, if the covariance matrices R.sub.n are adapted to R.sub.n′ and merged together to obtain an accumulated covariance matrix R″, the uniformed phase difference vector Δφ′ can be estimated in wide-band. In this respect, in FIG. 4, like described above for the ESPRIT algorithm, the device 100 is configured to compute 111 a covariance matrix R.sub.n, for each X.sub.n, with n from 1 to N, wherein n denotes the nth frequency bin. Then, as described above for the ESPRIT algorithm, the device 100 is configured to carry out an eigenvalue decomposition 401 to obtain the eigenvectors U.sub.n and eigenvalues Σ.sub.n. Each eigenvector U.sub.n is then chosen 402 individually. Preferably, custom character eigenvectors U.sub.n of the covariance matrix R.sub.n related to the Q highest eigenvalues Σ.sub.n are chosen 402.

(31) By the feature of the steering vector A.sub.i,

(32) $\begin{matrix} \frac{{Δφ}_{i}}{f_{i}} = \frac{{Δφ}_{j}}{f_{j}} .fwdarw. A_{i}^{f_{i}^{- 1}} = A_{j}^{f_{j}^{- 1}} & Equation 7 \end{matrix}$
the device 100 is then configured to rotate 403 the eigenvectors U.sub.n, preferably by Hadamard powering of the corresponding frequency to the complex-value eigenvector, wherein the rotated eigenvector U.sub.n,s′ may be defined by

(33) $\begin{matrix} U_{n, s}^{'} = U_{n, s}^{f_{n}^{- 1}}, Σ_{n, s}^{'} = Σ_{n, s}^{f_{n}^{- 1}} & Equation 8 \end{matrix}$

(34) The device 100 is then configured to reconstruct 404 the adapted covariance matrix R′.sub.n. That means, each rotated eigenvector U.sub.n,s′ is constructed 404 back to the shape of the covariance matrix R.sub.n to obtain the adapted covariance matrix R′.sub.n. To this end, each rotated eigenvector U.sub.n,s, a diagonal matrix of size with the shape of Q*Q, and the inverse U.sub.n,s.sup.−1″ of the rotated eigenvector are preferably multiplied. The diagonal matrix may be Σ.sub.n,s′. Thus, the adapted covariance matrix R.sub.n′ may, for example, be defined by
R.sub.n′=U.sub.n,s′Σ.sub.n,s′U.sub.n,s.sup.−1′ Equation 9

(35) The phase differences are now uniformed for all the frequency bins, so that the estimation can be processed in wide-band by merging 113 the R.sub.n,s′ along frequencies of the N frequency bins. The merging 113 may be performed by the device 100 according to

(36) $\begin{matrix} R^{″} = {.Math.}_{n}^{N} β_{n} R_{n}^{'} & Equation 10 \end{matrix}$
where R″ is the accumulated covariance matrix and β.sub.n is a weighting function. That is, the device 100 is preferably configured to weigh each adapted covariance matrix R′.sub.n. One option of realizing the weighting function β.sub.n is to weigh each adapted covariance matrix R.sub.n′ based on the mean square |X.sub.n|.sup.−2 of the frequency-domain transformed output signals of the microphone units 201 for the same frequency bin, which may be represented by
β.sub.n=X.sub.n|.sup.−2 Equation 11

(37) The estimation of Δφ′ from R″ is preferably similar to the above-described ESPRIT algorithm, but can be made in wide-band. In particular, preferably an eigenvalue decomposition 405 of the accumulated covariance matrix R″ is carried out. Then, each eigenvector U″ is chosen 406. A division into two submatrices is then applied 407. Then, the phase differences Δφ′ are found 408. Finally, the phase differences Δφ′ are transformed 409 to the DOAs θ. These steps 405-409 may be carried out as the step 114 by the device 100 according to
R″U″=Σ″U″ Equation 12
U″.fwdarw.U″.sub.s
U″.sub.s.fwdarw.U″.sub.1,U″.sub.2
U″.sub.1=U″.sub.2ψ
ψ=TΔφ′T.sup.−1

(38) That is, in order to estimate 114 the DOA for each of the custom character sound sources 200, the device 100 is preferably configured to estimate, based on the accumulated covariance matrix R″, accordingly adapted phase difference values Δφ″, each of the adapted phase difference values Δφ′ being related to a phase difference Δφ′, between two microphone units 201, and estimate the DOAs θ based on the adapted phase difference values.

(39) A main difference of the algorithm implemented by the device 100 compared to the ESPRIT algorithm is the DOA estimation 409 in the end, namely

(40) $\begin{matrix} θ = \sin^{- 1} \frac{2 π c \arg Ψ}{Δ d} & Equation 13 \end{matrix}$

(41) It can be seen from Equation 13 that the frequency (f) does not appear in contrast to Equation 5 for the ESPRIT algorithm. That is, the DOA estimation in the end is frequency-independent, and thus for wide-band.

(42) In the following some advantageous modifications of the method 110 carried out by the device 100 are described.

(43) With respect to a first advantageous modification, it may for instance happen that, in order to process

(44) $U_{n, s}^{f_{n}^{- 1}},$
the device 100 is challenged with accuracy, when the frequencies are too high. When the frequency is getting larger, a higher level of quantization is used to prevent value distortions when representing the numbers in digital. Conventionally, double-precision floating-point format that occupies 8 bytes in computer memory is the highest level of the quantization, but it is still far below the precision requirement. To ensure that a floating-point computation in the device 100 can run accurately, the device 100 may be configured to perform, over all frequency bins, an accumulation iteration process based on the eigenvectors and their related eigenvalues, or a summing process, or an averaging process based on the reconstructed covariance matrices. The accumulation iteration process, for instance, may repeat

(45) $\begin{matrix} R_{n + 1}^{′′′} = β_{n + 1} U_{n + 1, s} Σ_{n + 1, s} U_{n + 1, s} + R_{n}^{″} R_{n + 1}^{′′′} \overset{def}{=} U_{n + 1, s}^{′′′} {.Math.}_{n + 1, s}^{′′′} U_{n + 1, s}^{′′′ - 1} R_{n + 1}^{′′} = U_{n + 1, s}^{′′′ f_{n}^{- 1} f_{n + 1}} Σ_{n + 1, s}^{′′′ f_{n}^{- 1} f_{n + 1}} U_{n + 1, s}^{- f_{n}^{- 1} f_{n + 1}} & Equation 14 \end{matrix}$
from frequency bin 1 to N-1. Then R″=R.sub.n″. Equation 13 is accordingly updated to

(46) 0 $\begin{matrix} θ = \sin^{- 1} \frac{2 π c f_{N} \arg Ψ}{Δ d} & Equation 15 \end{matrix}$

(47) In a second advantageous modification, the device 100 may be configured to accumulate R″ along time frames, i.e. to accumulate adapted covariance matrices over a plurality of time frames. This measure will also improve the robustness for short-time stationary source localization. A representation may be
R″(t)=αR″(t−1)+(1α)R″(t) Equation 16
R″(t) denotes the accumulated (i.e. adapted wide-band) covariance matrix at the time frame t.

(48) The device 100 may include a processing unit configured to carry out the above described operations. The processing unit may be any kind of programmable or non-programmable hardware (e.g., circuitry) or software, or a combination of hardware and software, that is configured to perform the above-described computations. For example, the processing unit may include a processor and a non-transitory memory carrying executable instructions which when carried out by the processor cause the processor to perform the respective operations.

(49) Embodiments of the present disclosure enhance the robustness, computing speed and accuracy of the sound source localization in real-time. Therefore, it has potential for sound source localization and supporting distance sound pickup purpose for these devices.

(50) The invention has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed invention, from the studies of the drawings, this disclosure and the independent claims. In the claims as well as in the description the word “comprising” does not exclude other elements or steps and the indefinite article “a” or “an” does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in the mutual different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

Device and method for estimating direction of arrival of sound from a plurality of sound sources

Assignee

Inventors

Cpc classification

Classification Explorer

G01S3/8006

PHYSICS

Classification Explorer

G01S3/8083

PHYSICS

International classification

Classification Explorer

G01S3/80

PHYSICS

Classification Explorer

G01S3/808

PHYSICS

Abstract

Claims

Description