Systems and methods for analyzing multichannel wave inputs
11259115 · 2022-02-22
CPC classification
H04R2430/21 (ELECTRICITY)
H04R3/02 (ELECTRICITY)
H04S3/02 (ELECTRICITY)
International classification
H04R3/02 (ELECTRICITY)
Abstract
A spatial-audio recording system includes a processor, and instructions stored in a computer-readable medium that, when read by the processor, cause the processor to perform operations. The operations include retrieving audio data recorded at a number of microphones, determining a recorded signal vector based on the audio data, and initializing values for an operator. The operations further include determining the directions of a plurality of waves by performing operations comprising iteratively, until an exit condition is satisfied: initializing or incrementing an index “i”; determining an ith direction using the operator; and updating the operator to correspond to the ith iteration.
Claims
1. A spatial-audio recording system comprising: a processor; and instructions stored in a non-transient computer-readable medium that, when read by the processor, cause the processor to perform operations comprising: retrieving audio data; determining a recorded signal vector based on the audio data; initializing values for an operator specific to a frequency; and determining a plurality of directions by performing operations comprising iteratively, until an exit condition is satisfied: initializing or incrementing an index “i”; determining an ith direction using the operator; and updating the operator to correspond to an ith iteration using the equation:
2. The system of claim 1, further comprising a display configured to display a visual indicator corresponding to at least one of the determined plurality of directions, and wherein the instructions, when read by the processor, further cause the processor to provide data indicative of the at least one of the determined plurality of directions to the display.
3. The system of claim 1, further comprising a speaker configured to output an audio signal corresponding to an isolated audio signal, and wherein the instructions, when read by the processor, further cause the processor to isolate, from the audio data, an audio signal corresponding to one of the determined plurality of directions as the isolated audio signal.
4. The system of claim 1, wherein the steering vector $h_n(\mathbf{s}_i)$ is a vector of length M having values that satisfy the equation
$h_n(\mathbf{s}_i)_m = e^{-j k_n \mathbf{s}_i \cdot \mathbf{r}_m}$
5. The system of claim 1, wherein determining the ith direction using the operator comprises retrieving data for the operator corresponding to an (i−1)th iteration, and performing a minimization process on an objective function that is a function of the operator corresponding to the (i−1)th iteration.
6. The system of claim 5, wherein $\mathcal{J}^{(i)}$ is the objective function that is a function of the operator corresponding to the (i−1)th iteration and satisfies the equation:
7. The system of claim 5, wherein the instructions, when read by the processor, further cause the processor to determine that the exit condition is satisfied by performing operations that include: setting an error tolerance value; determining a residual of the objective function; and determining that the residual of the objective function is less than or equal to the error tolerance value.
8. The system of claim 7, wherein determining the residual of the objective function is based on an equation in which $\mathcal{J}^{(i)}$ is the objective function corresponding to the ith iteration, N is a total number of frequencies of interest, $P_n$ is a vector having values corresponding to signal measurements for the nth frequency made using an array of microphones, and $w_n$ is an nth positive weight.
9. The system of claim 5, wherein the minimization process on the objective function is performed using methods that comprise a gradient method.
10. The system of claim 1, wherein retrieving data for the operator corresponding to an (i−1)th iteration comprises retrieving data for the operator generated during a previous iteration.
11. The system of claim 1, wherein the operator is expressed as a matrix determined by the processor.
12. The system of claim 11, wherein initializing the values for the operator is performed such that the initialized operator is an identity matrix.
13. A spatial-audio recording system, comprising: a plurality of microphones comprising a number M of microphones; a processor; and instructions stored in a computer-readable medium that, when read by the processor, cause the processor to perform operations comprising: retrieving audio data recorded by the microphones; determining a recorded signal vector based on the audio data; initializing values for an operator, the operator being an M×M matrix; and determining a plurality of directions by performing operations comprising iteratively, until an exit condition is satisfied: initializing or incrementing an index “i”; determining an ith direction using the operator by retrieving data for the operator corresponding to an (i−1)th iteration, and performing a minimization process on an objective function that is a function of the operator corresponding to the (i−1)th iteration, wherein the objective function satisfies the equation:
14. The system of claim 13, further comprising a display configured to display a visual indicator corresponding to at least one of the determined plurality of directions, and wherein the instructions, when read by the processor, further cause the processor to provide data indicative of the at least one of the determined plurality of directions to the display.
15. The system of claim 13, further comprising a speaker configured to output an audio signal corresponding to an isolated audio signal, and wherein the instructions, when read by the processor, further cause the processor to isolate, from the audio data, an audio signal corresponding to one of the determined plurality of directions as the isolated audio signal.
16. A method of determining one or more sources of an audio signal, comprising: retrieving audio data; determining a recorded signal vector based on the audio data; initializing values for an operator; and determining a plurality of directions by performing operations comprising iteratively, until an exit condition is satisfied: initializing or incrementing an index “i”; determining an ith direction using the operator by retrieving data for the operator corresponding to an (i−1)th iteration, and performing a minimization process on an objective function that is a function of the operator corresponding to the (i−1)th iteration, wherein the objective function satisfies the equation:
17. The method of claim 16, further comprising: providing data indicative of the at least one of the determined plurality of directions to a display device; and displaying, on the display device, a visual indicator corresponding to the determined plurality of directions.
18. The method of claim 16, further comprising: isolating, from the audio data, an audio signal corresponding to one of the determined plurality of directions; and outputting an audio signal corresponding to the isolated audio signal.
19. A spatial-wave analysis system comprising: a processor; and instructions stored in a non-transient computer-readable medium that, when read by the processor, cause the processor to perform operations comprising: retrieving wave signal data; determining a signal vector based on the wave signal data; initializing values for an operator specific to a frequency; determining a plurality of directions by performing operations comprising iteratively, until an exit condition is satisfied: initializing or incrementing an index “i”; determining an ith direction using the operator; and updating the operator to correspond to an ith iteration by using the equation:
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(11) The present disclosure provides for many different embodiments. While certain embodiments are described below and shown in the drawings, the present disclosure provides only some examples of the principles described herein and is not intended to limit the broad aspects of the principles described herein to the embodiments illustrated and described.
(12) Embodiments of the present disclosure provide for determining one or more directions of audio sources based on measurements of an audio field. The determination can be implemented by minimizing a cost-function that is a function of at least one of the directions.
(13) The present disclosure provides for an algorithm for the decomposition of a wave field (e.g., a broadband sound field) into its component plane-waves, and for determining respective directions and strengths of the plane-waves. The algorithm, which may be referred to herein as “Sequential Direction Detection” (or SDD), decomposes the wave field into L plane waves by recursively minimizing an objective function that determines the plane-wave directions, strengths and the number of plane-waves. A discussion of testing of the algorithm on synthetic and real data is included herein.
(14) A sound field at a point in any environment carries a tremendous amount of information, which a listener uses to understand source locations, message content, and the size and ambience of the space. It would be useful to decompose the sound into its components for identification, and to obtain the location/direction and content of individual source objects, especially in applications that recreate real scenes in virtual and augmented reality, where sources are usually broadband. Microphone arrays are often used for this purpose. One obstacle is the lack of algorithms that perform such decompositions reliably; as a result, steered beamforming is commonly used instead.
(15) Plane-wave decomposition with arrays of special shape, such as spherical/cylindrical, may be considered. However, in these cases the number of sources and their directions are not estimated.
(16) The problem of reconstructing the incident field at a location can be approached by imposing the prior that the scene is generated by an unknown number of distant broadband sources, whose signals are collected by a spatially compact array of M microphones. The signals from these sources (or their reflections) arrive at the array and can be modeled as far-field plane-waves incident from various directions. Imposing this prior, a formulation can be developed for identifying the incoming plane-wave directions by computing a cost function over those frequencies at which the array theoretically exhibits no spatial aliasing. A sequential operator formulation can be employed that successively identifies the leading-order plane-waves. After the directions are identified, a plane-wave representation can be built over the entire audible frequency range for these directions. Results from synthetic experiments are presented, along with a real demonstration.
Problem Statement
(17) A discussion of one of the problems addressed by the present solutions is provided below. Consider a broadband acoustic field received at an array of M sensors (microphones). The field is assumed to be created by an unknown number of plane-waves L (e.g., plane-waves having to-be-determined directions and strengths). After converting a frame of data to the frequency domain, assume that there are N frequencies, and the field at each frequency at a point r is
(18) $$p_n(\mathbf{r}) = \sum_{l=1}^{L} A_{nl}\, e^{-i k_n \mathbf{s}_l \cdot \mathbf{r}}, \qquad n = 1, \ldots, N, \qquad (1)$$
where $\mathbf{s}_l$ are the directions of arrival (DOA), $\omega_n$ are the circular frequencies with wave-numbers $k_n$, and $A_{nl}$ are the complex amplitudes. For microphone locations $\mathbf{r}_1, \ldots, \mathbf{r}_M$, the system of equations describing the microphone readings can be written in the form
$$\sum_{l=1}^{L} A_{nl}\, e^{-i k_n \mathbf{s}_l \cdot \mathbf{r}_m} = p_n(\mathbf{r}_m) \equiv p_{mn}, \qquad m = 1, \ldots, M, \quad n = 1, \ldots, N, \qquad (2)$$
or in matrix-vector form
$$H_n A_n = P_n, \qquad n = 1, \ldots, N, \qquad (3)$$
where $H_n$ is an M×L matrix with entries $(H_n)_{ml} = e^{-i k_n \mathbf{s}_l \cdot \mathbf{r}_m}$, and $A_n$ and $P_n$ are the vectors with components $A_{nl}$ and $p_{mn}$, respectively. Column-wise,
$$H_n = \big( h_n(\mathbf{s}_1)\ h_n(\mathbf{s}_2)\ \cdots\ h_n(\mathbf{s}_L) \big), \qquad (4)$$
where the $h_n(\mathbf{s}_l)$ are M-vectors, known as “steering” vectors, while $H_n$ is called the “steering matrix”. The steering matrix can be modified to account for scattering from the objects holding the microphone array. In this case, its entries $(H_n)_{ml}$ can be taken as object-related transfer functions, similar to the head-related transfer function (HRTF).
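The free-field forward model above can be illustrated with a short sketch. It is a minimal plain-Python illustration, not the disclosed implementation: it assumes a speed of sound of 343 m/s, an azimuth/elevation-to-unit-vector convention chosen only for this example (azimuth 0 is broadside to an array on the x-axis), and the sign convention $e^{-i k_n \mathbf{s}\cdot\mathbf{r}}$ used above. The helper names `unit_vector`, `steering_vector`, `steering_matrix` and `synthesize` are hypothetical, and scattering by the array mount (the object-related transfer functions) is not modeled.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air


def unit_vector(azimuth_deg, elevation_deg=0.0):
    """Direction s as a 3D unit vector; azimuth 0 is broadside to an x-axis array (assumed)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az), math.cos(el) * math.cos(az), math.sin(el))


def steering_vector(freq_hz, s, mic_positions):
    """h_n(s): entries exp(-i k_n s . r_m), one per microphone."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return [cmath.exp(-1j * k * sum(si * ri for si, ri in zip(s, r))) for r in mic_positions]


def steering_matrix(freq_hz, directions, mic_positions):
    """H_n stored as M rows of length L; column l is h_n(s_l)."""
    columns = [steering_vector(freq_hz, s, mic_positions) for s in directions]
    return [list(row) for row in zip(*columns)]


def synthesize(freq_hz, directions, amplitudes, mic_positions):
    """Noiseless microphone readings P_n = H_n A_n."""
    H = steering_matrix(freq_hz, directions, mic_positions)
    return [sum(h * a for h, a in zip(row, amplitudes)) for row in H]


# Four microphones on the x-axis, 0.1 m apart, and two plane waves at 1.5 kHz.
mics = [(0.1 * m, 0.0, 0.0) for m in range(4)]
dirs = [unit_vector(20.0), unit_vector(-30.0)]
H = steering_matrix(1500.0, dirs, mics)
P = synthesize(1500.0, dirs, [1.0, 0.5j], mics)
print(len(H), len(H[0]))  # 4 2
```

Every steering-vector entry has unit modulus, so $h_n^*(\mathbf{s}) h_n(\mathbf{s}) = M$ for this free-field model; a measured or modeled transfer function would replace `steering_vector` if mount scattering matters.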
(19) One problem can be set forth as follows: given $P_n$, determine L, the DOA $\mathbf{s}_1, \ldots, \mathbf{s}_L$, and the amplitudes $\{A_{nl}\}$. The field in Equation (1) (Eq. 1) is characterized by NL complex amplitudes $A_{nl}$ and L unit vectors $\mathbf{s}_l$, that is, 2(N+1)L real unknowns in 3D (two angles per direction) and (2N+1)L real unknowns in 2D (one angle per direction). Directions are assumed to be consistent across frequencies (i.e., the sources are assumed to be broadband). The microphone readings provide NM complex numbers $p_{mn}$, which yield 2NM real equations via Eq. 2 or, equivalently, Eq. 3. The system can be solved if the number of equations is at least the number of unknowns:
(20) $$2NM \ge 2(N+1)L \;\Rightarrow\; L \le \frac{NM}{N+1} \quad \text{(3D)}, \qquad 2NM \ge (2N+1)L \;\Rightarrow\; L \le \frac{2NM}{2N+1} \quad \text{(2D)}. \qquad (5)$$
(21) This shows that as the number of frequencies N (or the bandwidth) increases, the number of detectable DOA also increases. In all cases, however, L remains smaller than the number of microphones M.
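The counting argument above can be turned into a small helper. This is a sketch only, assuming the condition is exactly "real equations ≥ real unknowns" (2NM ≥ 2(N+1)L in 3D, 2NM ≥ (2N+1)L in 2D); it is a necessary condition, not a guarantee that a particular scene is resolvable. The function name `max_detectable_waves` is hypothetical.

```python
def max_detectable_waves(num_mics, num_freqs, three_d=True):
    """Largest L with 2NM >= (real unknowns per wave) * L.

    Each wave contributes N complex amplitudes plus its direction: 2(N+1) real
    unknowns in 3D (two angles) or 2N+1 in 2D (one angle).
    """
    equations = 2 * num_freqs * num_mics
    unknowns_per_wave = 2 * (num_freqs + 1) if three_d else 2 * num_freqs + 1
    return equations // unknowns_per_wave


# A single band on 16 microphones versus 20 bands on 64 microphones.
print(max_detectable_waves(16, 1), max_detectable_waves(16, 1, three_d=False))  # 8 10
print(max_detectable_waves(64, 20))                                              # 60
```

The bound grows with N but stays below M, since NM/(N+1) < M and 2NM/(2N+1) < M for every N ≥ 1.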
(22) Sequential Direction Detection Algorithm
(23) The solution of Eq. 3 can be sought by globally minimizing, with respect to $\{A_n\}$ and $\{\mathbf{s}_l\}$, a suitable cost function based on the discrepancy between measured and predicted data in a suitable norm, such as $L_2$:
$$\mathcal{J} = \sum_{n=1}^{N} w_n \left\| H_n A_n - P_n \right\|_2^2 \to \min, \qquad (6)$$
where $w_n$ are some positive weights (e.g., $w_n = 1$, $n = 1, \ldots, N$).
(24) Note that $\{\mathbf{s}_l\}$ determines $H_n$. Hence, the minimum of the functional in Eq. 6 is achieved when, for a given $H_n$, the amplitudes $A_n$ are related to $P_n$ by the least-squares solution, which in $L_2$ is
$$A_n = (H_n^* H_n)^{-1} H_n^* P_n, \qquad n = 1, \ldots, N, \qquad (7)$$
where $H_n^*$ is the conjugate transpose of $H_n$ and $H_n^* H_n$ is assumed to be invertible (otherwise a pseudo-inverse is used). In turn, this relation determines the optimal $A_n$ as functions of the directions $\{\mathbf{s}_l\}$. Substituting Eq. (7) into Eq. (6) shows that the objective function depends only on the L directions $\mathbf{s}_l$:
(25) $$\mathcal{J}(\mathbf{s}_1, \ldots, \mathbf{s}_L) = \sum_{n=1}^{N} w_n \left\| \left( I - H_n (H_n^* H_n)^{-1} H_n^* \right) P_n \right\|_2^2, \qquad (8)$$
where I is the M×M identity matrix.
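A minimal sketch of the amplitude step of Eq. (7) follows, in plain Python with no numerical libraries. It assumes microphones on the x-axis, a speed of sound of 343 m/s, and a small hand-written Gaussian elimination in place of a library solver; all function names are hypothetical. For noiseless data the least-squares amplitudes reproduce the true amplitudes, and for noisy data any perturbation of them increases the per-frequency residual $\|H_n A_n - P_n\|_2^2$.

```python
import cmath
import math


def steering_matrix(k, azimuths_deg, xs):
    """M x L plane-wave steering matrix (rows = microphones) for mics on the x-axis."""
    return [[cmath.exp(-1j * k * x * math.sin(math.radians(az))) for az in azimuths_deg]
            for x in xs]


def solve(a, b):
    """Solve a x = b for a small complex system by Gaussian elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ls_amplitudes(H, P):
    """Eq. (7): A_n = (H^* H)^{-1} H^* P_n for one frequency."""
    L = len(H[0])
    HhH = [[sum(row[i].conjugate() * row[j] for row in H) for j in range(L)] for i in range(L)]
    HhP = [sum(row[i].conjugate() * p for row, p in zip(H, P)) for i in range(L)]
    return solve(HhH, HhP)


def residual_sq(H, A, P):
    """||H A - P||_2^2, the per-frequency term of Eq. (6)."""
    return sum(abs(sum(h * a for h, a in zip(row, A)) - p) ** 2 for row, p in zip(H, P))


xs = [0.1 * m for m in range(8)]
k = 2 * math.pi * 1500.0 / 343.0          # assumed c = 343 m/s
H = steering_matrix(k, [20.0, -30.0], xs)
A_true = [1.0 + 0j, 0.5 - 0.2j]
P = [sum(h * a for h, a in zip(row, A_true)) for row in H]
A_hat = ls_amplitudes(H, P)
print(max(abs(a - b) for a, b in zip(A_hat, A_true)) < 1e-9)  # True
```

In a production setting a library least-squares routine would normally replace `solve`; the hand-written version only keeps the sketch dependency-free.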
(26) Despite the reduction in dimensionality achieved at this stage by considering only the directions (from 2(N+1)L to 2L unknowns in 3D, and from (2N+1)L to L in 2D), nonlinear optimization in L dimensions is still expensive. Furthermore, multiple local minima complicate the search for the global minimum. Herein is proposed a method for approximately determining the directions with relatively low computational complexity. Assume that distinct directions produce distinct steering vectors:
$$\mathbf{s} \ne \mathbf{t} \;\Rightarrow\; h_n(\mathbf{s}) \ne h_n(\mathbf{t}), \qquad n = 1, \ldots, N. \qquad (9)$$
SDD constructs the steering matrices $H_n$ by successively determining optimal directions $\mathbf{s}_1, \mathbf{s}_2, \ldots$, terminated by an exit criterion. At the lth step, the M×l steering matrix, which is a function of $\mathbf{s}$, is
$$H_n^{(l)}(\mathbf{s}) = \big( h_n^{(1)}\ \cdots\ h_n^{(l-1)}\ h_n(\mathbf{s}) \big). \qquad (10)$$
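(27) Here $h_n^{(k)} = h_n(\mathbf{s}_k)$, $k = 1, \ldots, l-1$, are constants, as the directions $\mathbf{s}_1, \ldots, \mathbf{s}_{l-1}$ were determined at earlier steps. Consider then the objective function $\mathcal{J}^{(l)}(\mathbf{s})$,
(28) $$\mathcal{J}^{(l)}(\mathbf{s}) = \sum_{n=1}^{N} w_n \left\| \left( I - H_n^{(l)}(\mathbf{s}) \left( H_n^{(l)*}(\mathbf{s})\, H_n^{(l)}(\mathbf{s}) \right)^{-1} H_n^{(l)*}(\mathbf{s}) \right) P_n \right\|_2^2, \qquad (11)$$
which is globally minimized at $\mathbf{s} = \mathbf{s}_l$. The process then continues recursively, assigning $h_n^{(l)} = h_n(\mathbf{s}_l)$ and setting the steering matrix at the lth iteration to $H_n = H_n^{(l)}(\mathbf{s}_l)$. The iteration terminates at $l = M - 1$ or when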
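(29) $$\epsilon^{(l)} = \left( \frac{\mathcal{J}^{(l)}(\mathbf{s}_l)}{\sum_{n=1}^{N} w_n \|P_n\|_2^2} \right)^{1/2} \le \epsilon_{tol}, \qquad (12)$$
where $\epsilon_{tol}$ is the tolerance and $\epsilon^{(l)}$ is the relative error in the $L_2$ norm for $H_n = H_n^{(l)}(\mathbf{s}_l)$.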
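(30) Consider now the first step of the algorithm, at which $\mathbf{s}_1$ is determined. This corresponds to a guess that the field is generated by a single plane wave. Then $H_n^{(1)}(\mathbf{s})$ has size M×1 and consists of the single vector $h_n(\mathbf{s})$, so the objective function for the first step is
(31) $$\mathcal{J}^{(1)}(\mathbf{s}) = \sum_{n=1}^{N} w_n \left( \|P_n\|_2^2 - \frac{\left| h_n^*(\mathbf{s})\, P_n \right|^2}{h_n^*(\mathbf{s})\, h_n(\mathbf{s})} \right). \qquad (13)$$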
(32) The global minimum of any $\mathcal{J}^{(l)}(\mathbf{s})$ over the two angles (in 3D) or the single angle (in 2D) is relatively easy to find (e.g., using gradient methods). Denote the minimizer as $\mathbf{s}_l$ and check whether the incident field is well approximated by l plane-waves using Eq. (12). If $l < M - 1$ and condition (12) does not hold, the process advances to the (l+1)th step.
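The first-step objective can be evaluated directly. The sketch below assumes a linear array on the x-axis with 0.1 m spacing, a speed of sound of 343 m/s, and a single synthetic source at 20 degrees azimuth; it scans $\mathcal{J}^{(1)}$ over a 1-degree azimuth grid, which stands in for the gradient methods mentioned above only because a grid is simple and robust for one angle. The names are hypothetical. Because $|h_n^* P_n|^2 / (h_n^* h_n)$ is proportional to the delay-and-sum (Bartlett) beam power, the first SDD step amounts to picking the strongest beamformer direction.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in air, m/s


def steer(freq, az_deg, xs):
    """h_n(s) for microphones on the x-axis; azimuth 0 is broadside (assumed convention)."""
    k = 2.0 * math.pi * freq / C
    return [cmath.exp(-1j * k * x * math.sin(math.radians(az_deg))) for x in xs]


def first_step_objective(az_deg, freqs, P, xs, weights=None):
    """J^(1)(s) = sum_n w_n (||P_n||^2 - |h_n^* P_n|^2 / (h_n^* h_n))."""
    weights = weights or [1.0] * len(freqs)
    total = 0.0
    for w, f, Pn in zip(weights, freqs, P):
        h = steer(f, az_deg, xs)
        hP = sum(hi.conjugate() * p for hi, p in zip(h, Pn))
        hh = sum(abs(hi) ** 2 for hi in h)   # equals M for unit-modulus entries
        total += w * (sum(abs(p) ** 2 for p in Pn) - abs(hP) ** 2 / hh)
    return total


# One synthetic source at 20 degrees with a different complex amplitude per frequency.
xs = [0.1 * m for m in range(16)]
freqs = [800.0, 1000.0, 1200.0, 1500.0]
P = [[(0.5 + 0.1 * n) * cmath.exp(0.3j * n) * e for e in steer(f, 20.0, xs)]
     for n, f in enumerate(freqs)]
best = min(range(-90, 91), key=lambda az: first_step_objective(az, freqs, P, xs))
print(best)  # 20
```

At all four frequencies the element spacing is below half a wavelength, so the objective has a single zero on the visible azimuth range, at the source direction.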
(33) Recursion for SDD Operators
(34) The computational complexity of an implementation that uses Eqs. (10) and (11) directly increases with l. This is due to several matrix-matrix multiplications and matrix inversions, which cost $O(l^3) + O(M^2)$ at the lth step. This can be reduced to $O(M^2)$ by using a recursive process for generating the SDD operators, namely the M×M matrices $L_n^{(l)}(\mathbf{s})$,
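$$L_n^{(l)}(\mathbf{s}) = I - H_n^{(l)}(\mathbf{s})\, G_n^{(l)}(\mathbf{s})\, H_n^{(l)*}(\mathbf{s}), \qquad (14)$$
where
$$G_n^{(l)}(\mathbf{s}) = \left( H_n^{(l)*}(\mathbf{s})\, H_n^{(l)}(\mathbf{s}) \right)^{-1}. \qquad (15)$$
Since $L_n^{(l)}(\mathbf{s})$ is an orthogonal projector (Hermitian and idempotent), the objective function for the lth step takes the form
$$\mathcal{J}^{(l)}(\mathbf{s}) = \sum_{n=1}^{N} w_n \left\| L_n^{(l)}(\mathbf{s})\, P_n \right\|_2^2 = \sum_{n=1}^{N} w_n\, P_n^* L_n^{(l)}(\mathbf{s})\, P_n. \qquad (16)$$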
For the constant matrices computed at step l−1, the notation $L_n^{(l-1)} = L_n^{(l-1)}(\mathbf{s}_{l-1})$, $G_n^{(l-1)} = G_n^{(l-1)}(\mathbf{s}_{l-1})$, and $H_n^{(l-1)} = H_n^{(l-1)}(\mathbf{s}_{l-1})$ will be used. Also, for brevity, the argument $\mathbf{s}$ of the matrix functions $L_n^{(l)}$, $G_n^{(l)}$, $H_n^{(l)}$ and of the vector function $h_n$ is dropped. Representing
$$H_n^{(l)} = \big( H_n^{(l-1)}\ \ h_n \big) \qquad (17)$$
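provides
(35) $$G_n^{(l)} = \begin{pmatrix} H_n^{(l-1)*} H_n^{(l-1)} & H_n^{(l-1)*} h_n \\ h_n^* H_n^{(l-1)} & h_n^* h_n \end{pmatrix}^{-1},$$
where $\left( G_n^{(l-1)} \right)^{-1} = H_n^{(l-1)*} H_n^{(l-1)}$. Using the following formula for an arbitrary (invertible) block matrix,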
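(36) $$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1} B E C A^{-1} & -A^{-1} B E \\ -E C A^{-1} & E \end{pmatrix},$$
with $E = (D - C A^{-1} B)^{-1}$. When D is a scalar, E is also a scalar, so
(37) $$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + E\, A^{-1} B C A^{-1} & -E\, A^{-1} B \\ -E\, C A^{-1} & E \end{pmatrix}.$$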
The following can be set:
$$G = G_n^{(l)}, \quad A^{-1} = G_n^{(l-1)}, \quad B = H_n^{(l-1)*} h_n, \quad C = h_n^* H_n^{(l-1)} = B^*, \quad D = h_n^* h_n,$$
$$E^{-1} = h_n^* h_n - h_n^* H_n^{(l-1)} G_n^{(l-1)} H_n^{(l-1)*} h_n = h_n^* L_n^{(l-1)} h_n. \qquad (21)$$
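Substituting this into definition (14) and simplifying, one obtains
(38) $$L_n^{(l)}(\mathbf{s}) = L_n^{(l-1)} - \frac{L_n^{(l-1)} h_n(\mathbf{s})\, h_n^*(\mathbf{s})\, L_n^{(l-1)}}{h_n^*(\mathbf{s})\, L_n^{(l-1)}\, h_n(\mathbf{s})}. \qquad (22)$$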
(39) For l = 1, set $L_n^{(0)} = I$. Eq. (22) uses the stored, previously determined constant matrices $L_n^{(l-1)}$ to compute $\mathcal{J}^{(l)}(\mathbf{s})$ (see Eq. (16)), which therefore requires only a few M×M matrix-vector multiplications. As soon as the optimal direction $\mathbf{s}_l$ is found, the constant matrix $L_n^{(l)}(\mathbf{s}_l)$ needed for the (l+1)th iteration can be computed using Eq. (22), also in $O(M^2)$ operations. The total complexity of the recursive algorithm for the maximum number of steps is $O(M^3)$, as opposed to $O(M^4)$.
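The recursion of Eq. (22) can be checked numerically against the direct definition of Eq. (14). In this plain-Python sketch (hypothetical names; microphones on the x-axis; 343 m/s assumed), the direct operator $I - H(H^*H)^{-1}H^*$ is formed from an orthonormal basis of the columns of $H$ obtained by Gram-Schmidt, which gives the same projector when the columns are linearly independent and avoids writing a matrix inverse. After three rank-one updates the two operators are expected to agree, each steering vector lies in the null space, and the trace equals $M - l$.

```python
import cmath
import math


def steer(k, az_deg, xs):
    """Free-field steering vector for microphones on the x-axis (assumed geometry)."""
    return [cmath.exp(-1j * k * x * math.sin(math.radians(az_deg))) for x in xs]


def sdd_update(L_prev, h):
    """Eq. (22): L^(l) = L^(l-1) - (L^(l-1) h)(L^(l-1) h)^* / (h^* L^(l-1) h)."""
    u = [sum(a * b for a, b in zip(row, h)) for row in L_prev]     # L^(l-1) h
    denom = sum(hi.conjugate() * ui for hi, ui in zip(h, u)).real  # h^* L^(l-1) h, real and >= 0
    size = len(h)
    return [[L_prev[i][j] - u[i] * u[j].conjugate() / denom for j in range(size)]
            for i in range(size)]


def direct_operator(columns):
    """I - H (H^* H)^{-1} H^*, formed from an orthonormal basis of span(H) (Gram-Schmidt)."""
    size = len(columns[0])
    basis = []
    for h in columns:
        v = list(h)
        for q in basis:
            c = sum(qi.conjugate() * vi for qi, vi in zip(q, v))
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        basis.append([x / norm for x in v])
    return [[(1.0 if i == j else 0.0) - sum(q[i] * q[j].conjugate() for q in basis)
             for j in range(size)] for i in range(size)]


xs = [0.1 * m for m in range(8)]
k = 2 * math.pi * 1500.0 / 343.0                             # assumed c = 343 m/s
cols = [steer(k, az, xs) for az in (20.0, -30.0, -60.0)]
L = [[complex(i == j) for j in range(8)] for i in range(8)]   # L^(0) = I
for h in cols:
    L = sdd_update(L, h)
D = direct_operator(cols)
err = max(abs(L[i][j] - D[i][j]) for i in range(8) for j in range(8))
trace = sum(L[i][i] for i in range(8)).real                  # M - l for a rank M - l projector
print(err < 1e-9, round(trace, 6))  # True 5.0
```

Each update touches only an M×M matrix and one matrix-vector product, which is where the $O(M^2)$ per-step cost comes from.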
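(40) Equation (22) reveals a number of features of the SDD algorithm. First, for any $\mathbf{s}$, the steering vector $h_n(\mathbf{s})$ is an eigenvector of $L_n^{(l)}(\mathbf{s})$ corresponding to a zero eigenvalue; that is, it belongs to the null space of $L_n^{(l)}(\mathbf{s})$. Indeed, as immediately follows from Eq. (22),
(41) $$L_n^{(l)}(\mathbf{s})\, h_n(\mathbf{s}) = L_n^{(l-1)} h_n(\mathbf{s}) - L_n^{(l-1)} h_n(\mathbf{s})\, \frac{h_n^*(\mathbf{s}) L_n^{(l-1)} h_n(\mathbf{s})}{h_n^*(\mathbf{s}) L_n^{(l-1)} h_n(\mathbf{s})} = 0.$$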
(42) Second, Eq. (22) shows that any eigenvector of $L_n^{(l-1)}$, $l > 1$, corresponding to a zero eigenvalue is also an eigenvector of $L_n^{(l)}$ with a zero eigenvalue, so the null space of the operator $L_n^{(l)}$ includes the null space of the operator $L_n^{(l-1)}$. Therefore, by induction, all vectors $h_n^{(1)}, h_n^{(2)}, \ldots, h_n^{(l-1)}$ are eigenvectors of $L_n^{(l)}$ corresponding to zero eigenvalues.
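(43) Third, this shows that for $\mathbf{s} = \mathbf{s}_{l-1}$, $L_n^{(l)}(\mathbf{s}) = L_n^{(l-1)}$, and so $\mathcal{J}^{(l)}(\mathbf{s}_{l-1}) = \mathcal{J}^{(l-1)}(\mathbf{s}_{l-1})$. Therefore, $\min_{\mathbf{s}} \mathcal{J}^{(l)}(\mathbf{s}) \le \mathcal{J}^{(l)}(\mathbf{s}_{l-1}) = \mathcal{J}^{(l-1)}(\mathbf{s}_{l-1}) = \min_{\mathbf{s}} \mathcal{J}^{(l-1)}(\mathbf{s})$, and by induction
$$\min_{\mathbf{s}} \mathcal{J}^{(l)}(\mathbf{s}) \le \cdots \le \min_{\mathbf{s}} \mathcal{J}^{(1)}(\mathbf{s}) \le \mathcal{J}^{(0)} \equiv \sum_{n=1}^{N} w_n \|P_n\|_2^2. \qquad (23)$$
If the inequalities in Eq. (23) are strict, the minimum of $\mathcal{J}^{(l)}(\mathbf{s})$ must be attained at some $\mathbf{s} = \mathbf{s}_l \ne \mathbf{s}_{l-1}$. This also means that all directions found are distinct.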
(44) Fourth, if $\mathbf{s}_1, \ldots, \mathbf{s}_l$ are all different, the steering vectors $h_n^{(1)}, \ldots, h_n^{(l)}$ corresponding to these directions are also different (see Eq. (9)). When these steering vectors are, in addition, linearly independent, $\operatorname{rank}\big(L_n^{(l)}(\mathbf{s}_l)\big) = M - l$, since the null space of $L_n^{(l)}(\mathbf{s}_l)$ is
$$\ker\big(L_n^{(l)}\big) = \operatorname{span}\big(h_n^{(1)}, \ldots, h_n^{(l)}\big), \qquad \dim \ker\big(L_n^{(l)}\big) = l. \qquad (24)$$
(45) This shows that $L_n^{(M)}(\mathbf{s}) \equiv 0$ and $\mathcal{J}^{(M)}(\mathbf{s}) \equiv 0$ for any $\mathbf{s}$, consistent with the fact that the maximum number of steps is $l = M - 1$.
(46) SDD Algorithm Summary
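(47) Define the following M×M matrices $L_n^{(l)}(\mathbf{s})$ and M×1 vectors $l_n^{(l)}(\mathbf{s})$ as functions of the direction $\mathbf{s}$:
(48) $$L_n^{(l)}(\mathbf{s}) = L_n^{(l-1)} - l_n^{(l)}(\mathbf{s})\, l_n^{(l)*}(\mathbf{s}), \qquad l_n^{(l)}(\mathbf{s}) = \frac{L_n^{(l-1)} h_n(\mathbf{s})}{\left( h_n^*(\mathbf{s})\, L_n^{(l-1)} h_n(\mathbf{s}) \right)^{1/2}}, \qquad L_n^{(l-1)} = L_n^{(l-1)}(\mathbf{s}_{l-1}), \qquad L_n^{(0)} = I,$$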
where I is the identity. Define the objective (steering) function as
$$\mathcal{J}_n^{(l)}(\mathbf{s}) = P_n^* L_n^{(l)}(\mathbf{s})\, P_n, \qquad \mathcal{J}^{(l)}(\mathbf{s}) = \sum_{n=1}^{N} w_n\, \mathcal{J}_n^{(l)}(\mathbf{s}), \qquad (27)$$
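and the relative norm of the residual
(49) $$\epsilon^{(l)} = \left( \frac{\mathcal{J}^{(l)}(\mathbf{s}_l)}{\|P\|^2} \right)^{1/2}, \qquad \|P\|^2 = \sum_{n=1}^{N} w_n \|P_n\|_2^2. \qquad (28)$$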
(50) One embodiment of the SDD algorithm is then the following:
Set a tolerance $\epsilon_{tol} < 1$; compute and store $\|P\|^2$; set $l = 0$, $\epsilon^{(0)} = 1$, and $L_n^{(0)} = I$.
While $\epsilon^{(l)} > \epsilon_{tol}$ (and $l < M - 1$):
1. $l = l + 1$;
2. find and store $\mathbf{s}_l = \arg\min_{\mathbf{s}} \mathcal{J}^{(l)}(\mathbf{s})$;
3. evaluate $L_n^{(l)}(\mathbf{s}_l)$;
4. evaluate $\epsilon^{(l)}$.
Set $L = l$; the required set of directions is $\{\mathbf{s}_1, \ldots, \mathbf{s}_L\}$.
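The steps above can be sketched end to end for the 2D (azimuth-only) case. This is a minimal illustration under stated assumptions, not the disclosed implementation: a uniform linear array on the x-axis, a speed of sound of 343 m/s, a fixed azimuth grid searched exhaustively in place of a gradient method, and hypothetical function names. It relies on the identity $P_n^* L_n^{(l)}(\mathbf{s}) P_n = P_n^* L_n^{(l-1)} P_n - |h_n^*(\mathbf{s}) L_n^{(l-1)} P_n|^2 / (h_n^*(\mathbf{s}) L_n^{(l-1)} h_n(\mathbf{s}))$, which follows from Eq. (22), so each trial direction costs one $O(M^2)$ matrix-vector product per frequency, and it caches $L_n^{(l-1)} P_n$ between steps. The example uses the three source azimuths of Experiment B below, with noiseless synthetic data at four arbitrarily chosen frequencies.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in air, m/s


def steer(freq, az_deg, xs):
    """h_n(s) for microphones on the x-axis; azimuth 0 is broadside (assumed convention)."""
    k = 2.0 * math.pi * freq / C
    return [cmath.exp(-1j * k * x * math.sin(math.radians(az_deg))) for x in xs]


def inner(u, v):
    """Hermitian inner product u^* v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def sdd(P, freqs, xs, grid, tol=0.05, weights=None):
    """SDD sketch: returns (directions found, final relative residual eps)."""
    M = len(xs)
    w = weights or [1.0] * len(freqs)
    norm_sq = sum(wn * inner(Pn, Pn).real for wn, Pn in zip(w, P))  # ||P||^2
    if norm_sq == 0.0:
        return [], 0.0
    ops = [[[complex(i == j) for j in range(M)] for i in range(M)] for _ in freqs]  # L_n^(0) = I
    LP = [list(Pn) for Pn in P]        # cached L_n^(l-1) P_n
    J, eps, found = norm_sq, 1.0, []   # J = J^(l-1)(s_{l-1}); J^(0) = ||P||^2
    while eps > tol and len(found) < M - 1:

        def objective(az):
            # J^(l)(s) = J^(l-1)(s_{l-1}) - sum_n w_n |h^* L P_n|^2 / (h^* L h), L = L_n^(l-1)
            drop = 0.0
            for wn, f, Ln, LPn in zip(w, freqs, ops, LP):
                h = steer(f, az, xs)
                denom = inner(h, matvec(Ln, h)).real
                # Skip directions whose steering vector is already (numerically) in the null space.
                if denom > 1e-10 * M:
                    drop += wn * abs(inner(h, LPn)) ** 2 / denom
            return J - drop

        s_l = min(grid, key=objective)   # exhaustive grid search stands in for a gradient method
        J = objective(s_l)
        found.append(s_l)
        for n, f in enumerate(freqs):    # rank-one operator update, Eq. (22)
            h = steer(f, s_l, xs)
            u = matvec(ops[n], h)        # L_n^(l-1) h_n(s_l)
            denom = inner(h, u).real
            if denom <= 1e-10 * M:
                continue
            c = inner(u, P[n]) / denom
            ops[n] = [[ops[n][i][j] - u[i] * u[j].conjugate() / denom for j in range(M)]
                      for i in range(M)]
            LP[n] = [lp - ui * c for lp, ui in zip(LP[n], u)]
        eps = math.sqrt(max(J, 0.0) / norm_sq)   # relative residual, Eq. (28)
    return found, eps


# Example: 16-element array, 0.1 m spacing, three sources, four frequencies (noiseless).
xs = [0.1 * m for m in range(16)]
freqs = [900.0, 1100.0, 1300.0, 1500.0]
truth = [20, -30, -60]
P = []
for n, f in enumerate(freqs):
    hs = [steer(f, az, xs) for az in truth]
    amps = [cmath.exp(1j * (1.3 * n + 2.1 * idx)) for idx in range(len(truth))]
    P.append([sum(a * h[m] for a, h in zip(amps, hs)) for m in range(len(xs))])
found, eps = sdd(P, freqs, xs, range(-90, 91))
print(found, round(eps, 6))
```

Because the true azimuths lie on the grid and the data are noiseless, the first detected directions are expected to land on or very near 20, −30 and −60 degrees. With noise or off-grid sources, a local refinement (for example, a gradient step) around each grid minimum would typically follow.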
Audio Analysis System
(51) Referring now to
(52) In one or more embodiments, the audio analysis system 108 can include, for example, an audio generator (e.g., a speaker). The audio analysis system 108 can be configured to isolate, from audio data recorded by the microphone array 106, one of the audio sources 104 and can output, via the audio generator, an audio signal that corresponds to the isolated audio source 104. In some embodiments, the audio signal can be output to a speech-to-text converter to generate text corresponding to the isolated audio signal. In one or more embodiments, the audio analysis system 108 can include, for example, a display, such as a liquid crystal display (LCD), a thin film transistor LCD (TFT-LCD), a blue phase LCD, an electronic paper (e-ink) display, a flexible display, a light emitting diode (LED) display, a digital light processing (DLP) display, a liquid crystal on silicon (LCOS) display, an organic light-emitting diode (OLED) display, a head-mounted display, or a 3D display. The audio analysis system 108 can be configured to determine or provide data indicative of one or more determined directions corresponding to respective locations of the audio sources 104, and the display can display a visual indicator indicative of the direction of the corresponding audio sources 104.
(53) Reference is made herein to one or more microphones. However, it is to be understood that other detectors or recording devices can be used in place of, or in addition to, the microphones, as appropriate. For example, an electromagnetic detector array can be used when analyzing radio waves or other electromagnetic waves.
(54) Referring now to
(55) The memory 204 can include one or more applications, services, routines, servers, daemons, or other executable logics for analyzing an audio signal, including one or more of a recorded signal analyzer 206, an operator manager 208, a direction determiner 210, and an exit condition manager 212. The memory 204 can also include, access, maintain or manage one or more data structures, including but not limited to operator data 214.
(56) The recorded signal analyzer 206 can include components, subsystems, modules, scripts, applications, or one or more sets of computer-executable instructions for analyzing a recorded signal. The recorded signal analyzer 206 can process one or more signals received by one or more microphones of a microphone array that includes M microphones. The recorded signal analyzer 206 can determine a vector $P_n$ of length M having entries $(P_n)_m = p_n(\mathbf{r}_m)$, where $p_n$ is an audio field corresponding to a frequency n (that may be determined by the recorded signal analyzer 206 according to Eq. 1) and $\mathbf{r}_m$ indicates an mth microphone location.
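One way to obtain the vectors $P_n$ from a frame of multichannel samples is a discrete Fourier transform of each channel, evaluated at the bins of interest. The sketch below is an assumption about that step, since the description does not specify the transform, window or sign convention: it evaluates an unwindowed DFT directly at selected bins, and the names `signal_vectors` and `bin_frequency` are hypothetical. In practice an FFT would usually be used, and the transform's sign convention must be consistent with the steering-vector convention $e^{-i k_n \mathbf{s}\cdot\mathbf{r}}$.

```python
import cmath
import math


def signal_vectors(frame, bins):
    """Map each DFT bin n to P_n, whose mth entry is channel m's DFT coefficient at bin n.

    frame: M channels, each a list of T real samples from one analysis frame.
    """
    vectors = {}
    for n in bins:
        vectors[n] = [sum(x * cmath.exp(-2j * math.pi * n * t / len(ch))
                          for t, x in enumerate(ch))
                      for ch in frame]
    return vectors


def bin_frequency(n, frame_size, sample_rate):
    """Center frequency in Hz of DFT bin n."""
    return n * sample_rate / frame_size


# Two channels carrying the same tone at bin 5, the second with a 0.7 rad phase lag.
T = 64
frame = [[math.cos(2 * math.pi * 5 * t / T) for t in range(T)],
         [math.cos(2 * math.pi * 5 * t / T - 0.7) for t in range(T)]]
P = signal_vectors(frame, [5])
print(round(abs(P[5][1]), 6), round(cmath.phase(P[5][1]), 6))  # 32.0 -0.7
```

The inter-channel phase difference at a bin is what the steering vectors model, so any per-channel delay in the recording chain would show up directly in $P_n$.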
(57) The operator manager 208 can include components, subsystems, modules, scripts, applications, or one or more sets of computer-executable instructions for managing an operator, and can include an operator updater 216. The operator manager 208 can determine an operator such as an SDD operator. The SDD operator can be specific to a frequency, and can be an M×M matrix. The operator manager 208 can initialize the SDD operator (e.g. as an identity matrix). The operator updater 216 of the operator manager 208 can iteratively update the SDD operator. For example, the operator updater 216 can iteratively update the SDD operator based on previously determined iterations of the SDD operator (e.g. according to Eq. 22, or according to Eq. 26).
(58) The direction determiner 210 can include components, subsystems, modules, scripts, applications, or one or more sets of computer-executable instructions for determining one or more directions corresponding to an audio signal (e.g. a direction that indicates a location or a direction of a source of the audio signal), and can include an objective function minimizer 216. The direction determiner 210 can determine one or more directions by iteratively minimizing an objective function that is a function of an SDD operator, such as the objective function provided in Eq. 27. The direction determiner 210 can determine at least one direction per iteration. For example, the objective function minimizer 216 can use a most recent iteration of the SDD operator (e.g. corresponding to an (i−1)th iteration) as part of the objective function to be minimized (e.g. according to Eq. 26). The objective function minimizer 216 can retrieve the most recent iteration of the SDD operator from operator data 214, which can store one or more iterations of the SDD operator. The objective function to be minimized can be a function of a direction to be determined, and minimizing the objective function can be performed by determining the direction that minimizes the objective function. Such a minimization process can be performed by the objective function minimizer 216, for example, by implementing a gradient method (e.g. a gradient descent method), or by another suitable minimization process.
(59) Furthermore, the direction determiner 210 can re-estimate the strengths of the already-estimated signals. For example, the direction determiner 210 can update the previously determined amplitudes $A_n$ by minimizing the cost function provided in Eq. 6 using the newly determined one or more directions. Thus, the estimates of the strengths of the isolated signals corresponding to the determined directions can be made more accurate.
(60) The exit condition manager 212 can include components, subsystems, modules, scripts, applications, or one or more sets of computer-executable instructions for managing an exit condition of the iterative process for determining the one or more directions. The exit condition manager 212 can monitor whether an exit condition is satisfied, and can terminate the iterative process when the exit condition is satisfied. The exit condition can be related to the size of an error or residual, such as the residual provided by Eq. 28. The exit condition can be based on the residual being equal to or lower than a predetermined threshold.
(61) The operator data 214 can include one or more data structures that store data for operators. For example, the operator data 214 can store operators corresponding to different iterations of the operator determined by the operator manager 208. The operator data can include, for example, SDD operators and/or vectors $l_n^{(l)}(\mathbf{s})$, as described herein.
(62) Referring now to
(63) In more detail, at block 302, the recorded signal analyzer 206 can process one or more signals received by one or more microphones of a microphone array that includes M microphones. The recorded signal analyzer 206 can determine a vector $P_n$ of length M having entries $(P_n)_m = p_n(\mathbf{r}_m)$, where $p_n$ is an audio field corresponding to a frequency n (that may be determined by the recorded signal analyzer 206 according to Eq. 1) and $\mathbf{r}_m$ indicates an mth microphone location.
(64) At block 304, the operator manager 208 can determine an SDD operator specific to a frequency. For example, the operator manager 208 can initialize the SDD operator as an M×M identity matrix.
(65) At block 306, the direction determiner 210 can initialize or increment an index “i”. At block 308, the direction determiner 210 can determine one or more directions by minimizing an objective function that is a function of the SDD operator (e.g. according to Eq. 26). The objective function minimizer 216 can retrieve the most recent iteration of the SDD operator from operator data 214, which can store one or more iterations of the SDD operator. The objective function to be minimized can be a function of a direction to be determined, and minimizing the objective function can be performed by determining the direction that minimizes the objective function. Such a minimization process can be performed by implementing a gradient method (e.g. a gradient descent method), or by another suitable minimization process.
(66) Furthermore, at block 306 the direction determiner 210 can re-estimate the strengths of the already-estimated signals. For example, the direction determiner 210 can update the previously determined amplitudes $A_n$ by minimizing the cost function provided in Eq. 6 using the newly determined one or more directions. Thus, the estimates of the strengths of the isolated signals corresponding to the determined directions can be made more accurate.
(67) At block 310, the operator manager 208 can update the SDD operator. For example, the operator updater 216 can update the SDD operator based on previously determined iterations of the SDD operator (e.g. according to Eq. 22, or according to Eq. 26).
(68) At block 312, the exit condition manager 212 can monitor whether an exit condition is satisfied, and can terminate the iterative process when the exit condition is satisfied. The exit condition can be related to a size of an error or residual, such as the residual provided by Eq. 28. The exit condition can be based on the residual being equal to or lower than a predetermined threshold.
(69) At block 314, the process ends. The audio analysis system 108 can store the determined directions, and/or can perform further analysis on the determined directions.
(70) Referring now to
(72) In more detail, at block 402, the direction determiner 210 can retrieve recorded signal data and weights corresponding to frequencies. The recorded signal data can be data determined by the recorded signal analyzer 206 by processing one or more signals received by one or more microphones of a microphone array that includes M microphones. The recorded signal data can include a vector $P_n$ of length M having entries $(P_n)_m = p_n(\mathbf{r}_m)$, where $p_n$ is an audio field corresponding to a frequency n (that may be determined by the recorded signal analyzer 206 according to Eq. 1) and $\mathbf{r}_m$ indicates an mth microphone location.
(73) At block 404, the direction determiner 210 can retrieve stored operator data corresponding to an (i−1)th iteration of an iterative process for determining one or more directions. For example, the stored operator data can be operator data determined by the operator manager 208 and stored as operator data 214.
(74) At block 406, the objective function minimizer 210 can determine a direction that minimizes an objective function, such as the function provided by Eq. 29. The objective function can include, or can be based on (determined from), an operator included in the stored operator data.
(75) Thus, one or more directions can be determined by the direction determiner 210.
Experimental Demonstration
(76) A set of experiments based on simulated and real data is described herein. In each simulation, a number of sources were positioned in a virtual room. Only direct paths were considered in the simulations. Each source signal was independently generated pink noise. The simulated microphones are omnidirectional and record at 44.1 kHz. Gaussian white noise was added to each simulated recording at an SNR of 10 dB.
(77) Four synthetic experiments, labeled A, B, C, and D, and one real experiment, labeled E, are described herein. In the synthetic experiments, the spectrum functions of MUSIC and SDD are compared, where the SDD spectrum function is the reciprocal of its objective function. (Note, however, that SDD is performed over multiple iterations and uses a more complex cost function than the one plotted.) In the real experiment, the directions computed by SDD are compared to a “ground truth.” The frame size was 2048 samples. The direction of a source relative to the array center is denoted as (azimuth, elevation), in degrees. Note that MUSIC is given the number of sources present and uses four frames of data to perform the modified covariance estimation described in reference [4], whereas SDD determines the number of sources itself and uses only one frame.
(79) Experiment A:
A horizontal 16-element uniform linear array with an element spacing of 0.1 meters records a single source at (20, 0). The algorithms process a single frequency band at approximately 1.5 kHz (where the wavelength is roughly twice the array spacing), and evaluate their spectra at 256 equally spaced points corresponding to azimuths between −90 and 90 degrees, both for display and for peak searching.
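To make the MUSIC side of Experiment A concrete, the following minimal NumPy sketch computes a narrowband MUSIC pseudospectrum (reference [1]) for the same geometry: 16 elements at 0.1 m spacing, 1.5 kHz, one source at (20, 0), and 256 azimuths between −90 and 90 degrees. Several choices are assumptions not stated in the disclosure: c = 343 m/s; unit-power complex Gaussian snapshots standing in for one band of pink noise; 200 snapshots with a plain sample covariance (the experiments give MUSIC four frames and the modified covariance estimate of reference [4], which this sketch does not implement); and the helper names themselves.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def ula_positions(M, d):
    """x-coordinates (m) of an M-element uniform linear array centred at 0."""
    return d * (np.arange(M) - (M - 1) / 2)


def ula_steering(x, az_deg, f):
    """(M, D) steering matrix for plane waves from azimuths az_deg (elevation 0)."""
    k = 2 * np.pi * f / C
    return np.exp(-1j * k * np.outer(x, np.sin(np.deg2rad(az_deg))))


def music_spectrum(X, x, f, grid_deg, n_sources):
    """Narrowband MUSIC pseudospectrum from (M, T) complex snapshots X."""
    M, T = X.shape
    R = X @ X.conj().T / T  # plain sample covariance
    _, V = np.linalg.eigh(R)  # eigenvalues returned in ascending order
    En = V[:, : M - n_sources]  # noise subspace (number of sources given)
    A = ula_steering(x, grid_deg, f)
    return 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)


def simulate_snapshots(x, az_deg, f, T, snr_db, rng):
    """Unit-power complex Gaussian sources plus white noise at snr_db per element."""
    A = ula_steering(x, np.atleast_1d(az_deg), f)
    K, M = A.shape[1], len(x)
    S = (rng.standard_normal((K, T)) + 1j * rng.standard_normal((K, T))) / np.sqrt(2)
    sigma = np.sqrt(10 ** (-snr_db / 10) / 2)
    noise = sigma * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))
    return A @ S + noise


rng = np.random.default_rng(1)
x = ula_positions(16, 0.1)
X = simulate_snapshots(x, 20.0, 1500.0, 200, 10.0, rng)
grid = np.linspace(-90, 90, 256)
spec = music_spectrum(X, x, 1500.0, grid, n_sources=1)
peak_deg = grid[np.argmax(spec)]
```

For a scene like Experiment B, `simulate_snapshots` also accepts a list of azimuths (for example `[20, -30, -60]`) together with `n_sources=3`; locating the resulting peaks then needs a peak-picking step that is omitted here.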
(81) Experiment B:
(82) The configuration of this experiment is the same as that of experiment A, except the acoustic scene now consists of three sources located at (20,0), (−30,0), and (−60,0). An additional result was obtained with MUSIC using only a single frame for covariance estimation.
(83) Experiment C:
(84) Extending the previous experiments to both azimuth and elevation, a recording is simulated using a 64-element array whose microphones are arranged in an equally spaced 8×8 grid with a spacing of 0.02 meters. The scene consists of four sources at (20, −10), (10, 25), (−30, 0), and (−32, 5); note that the last two sources are closely spaced. Both algorithms process 20 frequency bands in the approximate range 7.8 kHz-8.6 kHz, so that the wavelengths are between 2 and 3 times the array spacing. The spectrum functions are evaluated on a p×p grid corresponding to azimuths and elevations between −45 and 45 degrees, where p is 128 for MUSIC and 64 for SDD. For MUSIC, the spectra of the individual bands are summed.
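As a quick arithmetic check of the wavelength-to-spacing ratios, assuming a speed of sound of c = 343 m/s (the disclosure does not state the value used), with d the element spacing:

```latex
\lambda = \frac{c}{f}, \qquad
\frac{\lambda}{d}\bigg|_{f = 7.8\,\mathrm{kHz},\; d = 0.02\,\mathrm{m}}
  = \frac{343/7800}{0.02} \approx \frac{0.0440}{0.02} \approx 2.20, \qquad
\frac{\lambda}{d}\bigg|_{f = 8.6\,\mathrm{kHz},\; d = 0.02\,\mathrm{m}}
  = \frac{343/8600}{0.02} \approx \frac{0.0399}{0.02} \approx 1.99

\frac{\lambda}{d}\bigg|_{f = 1.5\,\mathrm{kHz},\; d = 0.1\,\mathrm{m}}
  = \frac{343/1500}{0.1} \approx 2.29 \quad \text{(Experiment A)}
```

Under this assumption the Experiment C band sits at roughly λ ≈ 2d, the low end of the stated range. For comparison, d ≤ λ/2 is the familiar condition for avoiding grating lobes when a uniform array is steered over all directions.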
(85) Experiment D:
(86) A more irregular array based on that of experiment E (
(87) Experiment E:
As a test of SDD's viability in practice, a 64-element array was used to record a moving source inside a room. Thirty-two frames of these recordings were processed with SDD on 50 frequency bands in the 2.5 kHz-4 kHz frequency range. The SDD objective function was evaluated on a 32×32 grid spanning −45 to 45 degrees in azimuth and elevation. A camera mounted at the center of the array also recorded video; as ground truth, incident angles were computed from the video frames corresponding to the processed frames.
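The disclosure does not say how the ground-truth angles were obtained from the video. One plausible approach, shown only as a hedged sketch, is to locate the source in each frame (by hand or with a tracker, not shown) and convert its pixel position to angles with an ideal pinhole camera model. The function name, the calibration parameters `fx`, `fy`, `cx`, `cy`, and the angle convention are all hypothetical.

```python
import math


def pixel_to_angles(u, v, fx, fy, cx, cy):
    """Azimuth/elevation (degrees) of the ray through pixel (u, v).

    Assumes an ideal pinhole camera at the array centre whose optical axis
    points along the (0, 0) direction, with no lens distortion. fx, fy are
    focal lengths in pixels and (cx, cy) is the principal point, all
    hypothetical calibration values. Image rows (v) grow downward, so
    elevation is positive for pixels above the principal point.
    """
    x = (u - cx) / fx  # normalized image-plane coordinates
    y = -(v - cy) / fy
    azimuth = math.degrees(math.atan2(x, 1.0))
    elevation = math.degrees(math.atan2(y, math.hypot(x, 1.0)))
    return azimuth, elevation


# Example with a hypothetical 640x480 camera, 500-pixel focal length
az_c, el_c = pixel_to_angles(320, 240, 500.0, 500.0, 320, 240)  # image centre
az_r, el_r = pixel_to_angles(820, 240, 500.0, 500.0, 320, 240)  # one focal length right
```

A pixel one focal length to the right of the principal point lies at 45 degrees azimuth under this model, which gives a simple sanity check on the calibration.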
(92) The relative error computed after each iteration of SDD in all synthetic experiments is now analyzed. As seen in
(93) This algorithm is similar to classical MUSIC and its extensions. Some comparative systems address multi-frequency sources; however, they perform no decomposition and instead rely on strong assumptions about the frequency content and the number of sources. In contrast, the SDD algorithm is general and does not require such assumptions.
(94) It is also possible to pursue a representation that combines the leading plane waves/sources with the ambient field. To achieve this, the current algorithm may be extended to include near sources, and the residual remaining after SDD/source estimation would be represented via a low-order ambisonics representation. Other possible uses include source localization and separation. Other embodiments relate to real-time implementations and to extending the algorithm to arrays on baffled objects.
(95) It is important to note that the construction and arrangement of the various exemplary embodiments are illustrative only. Although only a few embodiments have been described in detail in this disclosure, those skilled in the art who review this disclosure will readily appreciate that many modifications are possible without materially departing from the teachings and advantages of the subject matter described herein. The order or sequence of any process or method steps may be varied or re-sequenced according to alternative embodiments. Other substitutions, modifications, changes and omissions may also be made in the design, operating conditions and arrangement of the various exemplary embodiments without departing from the scope of the present disclosure.
(96) The following references are incorporated herein by reference in their entirety.
[1] R. Schmidt, “Multiple emitter location and signal parameter estimation,” IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276-280, March 1986.
[2] A. O'Donovan, R. Duraiswami, and J. Neumann, “Microphone arrays as generalized cameras for integrated audio visual processing,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), 2007, pp. 1-8.
[3] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, “Robust localization in reverberant rooms,” in Microphone Arrays: Signal Processing Techniques and Applications, M. S. Brandstein and D. B. Ward, Eds. Berlin, Heidelberg: Springer, 2001, pp. 157-180.
[4] D. Kundu, “Modified MUSIC algorithm for estimating DOA of signals,” Signal Processing, vol. 48, no. 1, pp. 85-90, January 1996.
[5] B. Rafaely, “Plane-wave decomposition of the sound field on a sphere by spherical convolution,” J. Acoust. Soc. Am., vol. 116, no. 4, pp. 2149-2157, 2004.
[6] T. Terada, T. Nishimura, Y. Ogawa, T. Ohgane, and H. Yamada, “DOA estimation for multi-band signal sources using compressed sensing techniques with Khatri-Rao processing,” IEICE Transactions on Communications, vol. E97.B, no. 10, pp. 2110-2117, 2014.
[7] D. N. Zotkin, R. Duraiswami, and N. A. Gumerov, “Plane-wave decomposition of acoustical scenes via spherical and cylindrical microphone arrays,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 1, pp. 2-16, 2010.