MICROPHONE ARRAYS
20220060818 · 2022-02-24
Inventors
Cpc classification
H04R2201/405
ELECTRICITY
International classification
Abstract
A system for capturing sound comprising a plurality of discrete microphones (112, 114, 116, 118) and a processing system (408). The plurality of discrete microphones are arranged in a circular array. The processing system (408) is arranged to perform a first signal processing algorithm on sound originating from one or more of a first set of directions relative to the array to isolate a first sound source. The processing system (408) is further arranged to perform a second signal processing algorithm on sound originating from one or more of a second set of directions relative to the array to isolate a second sound source therein. A method for receiving sound at a plurality of discrete microphones (112, 114, 116, 118) arranged in a circular array is also described.
Claims
1. A method comprising: receiving sound at a plurality of discrete microphones arranged in a circular array, at least some of said microphones producing signals in response to said sound, performing a first signal processing algorithm on sound originating from one or more of a first set of directions relative to said array to isolate a first sound source therein; and performing a second signal processing algorithm on sound originating from one or more of a second set of directions relative to said array to isolate a second sound source therein.
2. The method as claimed in claim 1 wherein the first processing algorithm comprises a broadside processing technique.
3. The method as claimed in claim 1 wherein the first set of directions comprises directions up to a threshold angle from a perpendicular to the plane of the array.
4. The method as claimed in claim 3 wherein the threshold angle is between 50° and 70°.
5. The method as claimed in claim 1 wherein the second processing algorithm comprises super-directive beamforming.
6. The method as claimed in claim 1 further comprising using an orientation sensor to determine an orientation of the array and using said orientation to determine a first and second portions of sound.
7. The method as claimed in claim 1 comprising receiving sound at a plurality of discrete microphones arranged in a plurality of concentric circular arrays.
8. The method as claimed in claim 7 wherein a radius of each concentric circular array is calculated by reference to a maximum phase mode order, M, the number of circular concentric arrays, P and a number of microphones in each concentric circular array, N, by equating the standard form of the frequency-weighted white noise gain to a form dependent on the aforementioned variables given by:
9. A system for capturing sound comprising: a plurality of discrete microphones arranged in a circular array, a processing system arranged to perform a first signal processing algorithm on sound originating from one or more of a first set of directions relative to said array to isolate a first sound source therein; wherein said processing system is arranged to perform a second signal processing algorithm on sound originating from one or more of a second set of directions relative to said array to isolate a second sound source therein.
10. The system as claimed in claim 9, further comprising a support structure.
11-14. (canceled)
15. The system as claimed in claim 9, wherein the processing subsystem is arranged to filter out spatial noise according to respective algorithms designated for each of a plurality of noise directions.
16. The system as claimed in claim 9, comprising a plurality of concentric circular arrays of microphones.
17-20. (canceled)
21. The system as claimed in claim 16, wherein a limiting aperture of the concentric circular arrays is equal to $2\pi/k_1$, where $k_1$ is a smallest wavenumber the array is designed to detect.
22. (canceled)
23. The system as claimed in claim 9 comprising a centre microphone(s) at a centre of the circular array(s).
24. The system as claimed in claim 9, wherein a maximum excited phase mode order for which the system is optimized is in the range 1 to 15.
25-26. (canceled)
27. The system as claimed in claim 9, wherein said microphones have a spacing less than or equal to half a wavelength of a highest frequency signal the array is designed to sample.
28. The system as claimed in claim 9, wherein the microphones are arranged at equal angular spacings around the circular array(s).
29. (canceled)
30. The system as claimed in claim 10, wherein the plurality of discrete microphones comprise a first array and said plurality of microphone signals comprise a plurality of first microphone signals, said system further comprising a second plurality of discrete microphones arranged on the support structure in a second circular array, concentric with the first circular array and arranged to provide a respective plurality of second microphone signals, wherein the first plurality of microphones are mounted so that they have vectors normal to their respective membranes oriented substantially radially with respect to the first circular array and the second plurality of microphones are mounted so that they have vectors normal to their respective membrane oriented substantially parallel to an axis of the second circular array.
31. (canceled)
32. (canceled)
33. The system as claimed in claim 9, wherein the first plurality of discrete microphone and the second plurality of microphones are mounted on a common ring.
34. The system as claimed in claim 9, wherein the second plurality of microphones are at the same angular positions on the circular array as the first plurality of microphones.
35. (canceled)
Description
[0069] Certain embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings.
[0088] The microphones may be miniature MEMS microphones which have low self-noise, allowing for improved phase and amplitude matching. As shown in
[0089] The concentric rings may be formed from aluminium tubes. The radius of the overall structure may be for example 30 cm.
[0090]
[0091]
[0092] By combining the signals from multiple microphones at each location around the circumference of the tube 200, the self-noise introduced by the individual microphones can be effectively reduced. Furthermore, it provides a wider range of angles over which high sensitivity can be achieved which facilitates use of the overall apparatus for isolating sound emanating from a wide field relative to the apparatus. Additionally, pair subsets of these four can simulate a directive element in the cross-sectional plane.
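The self-noise benefit described above can be put into numbers with a short sketch. It assumes, beyond what the text states, that the self-noise of each microphone is uncorrelated with the others and of equal power, and that the wanted signal adds coherently; under those assumptions averaging K microphones improves the signal-to-noise ratio by 10·log10(K) dB. The function name is hypothetical.

```python
import math


def self_noise_reduction_db(n_mics):
    """SNR improvement (dB) from averaging n_mics co-located microphones.

    Assumption (not from the source): uncorrelated, equal-power self-noise
    and a coherently summed signal, so noise power drops by a factor n_mics.
    """
    if n_mics < 1:
        raise ValueError("n_mics must be at least 1")
    return 10.0 * math.log10(n_mics)


if __name__ == "__main__":
    # Four microphones per location, as in the cluster described above.
    print(round(self_noise_reduction_db(4), 2))
```

Under these idealised assumptions, a cluster of four microphones gives roughly a 6 dB improvement; correlated noise or imperfect matching would reduce that figure.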
[0093]
[0094] The device may also comprise a camera (not shown). This can be used to assist in the determination of orientation (e.g. relative to a known image of its environment). It can also allow for visualisation of the environment on a remote device e.g. tablet, laptop. This is highly desirable for surveillance and video conferencing purposes. Furthermore it may enable remote control e.g. of the selection of the direction of the sound which is to be isolated. This may be used to steer the reception beam as is known per se and, together with the orientation sensor, to determine what processing algorithm to apply to the signals from the microphones (as is explained below).
[0095] Some embodiments of the device may require two-way data communication, therefore a receiver can be provided as well as the transmitter.
[0096] The processing unit 408 may perform all necessary processing of signals from the microphones but more typically controls transmission of data from these signals to a remote device allowing for storage and more powerful processing.
[0097] In certain embodiments, the combination of sensors and microphones allows beamforming using a single cluster of microphones on the array at a time i.e. the cluster with the best orientation. For example, some embodiments could use weighted combinations of clusters of microphones, or “backwards” facing microphones to eliminate background noise from the forward facing microphones.
[0098] For acoustic imaging purposes, the position of the device is measured using a number of sensors (including the orientation sensor 406) allowing all sound signals to be mapped to spatial positions, which can then be displayed on a remote screen allowing visualization. This technique is particularly advantageous in drone detection.
[0099] For surveillance purposes, the device could be realised as a camouflaged or disguised compact device. A wireless connection between the device and a remote receiver/transmitter allows the user to pinpoint the direction of interest or access the visualization obtained from the array. In embodiments where no camera or sensors are used, the orientation of the array may be predetermined and/or specified by a user to allow for the correct processing technique to be adopted.
[0100] In use of the device, signals from the microphones 112, 114, 116, 118; 202, 204, 206, 208 are processed using an appropriate algorithm. The algorithm is selected based on the direction from which the sound which it is desired to isolate is coming. As previously mentioned this could be established using any one or combination of:
[0101] a visual interface to select the direction from a mapped image of the scene; the orientation sensor(s); or pre-programmed directions representing physical positions of the sources of interest.
[0102] The selection of algorithm is based on the direction in question relative to the central axis of the microphone array. This is the line passing through the centre or common centres of the rings 102, 104, 106, 108; 200 and normal to the planes of the rings (or the plane of the backboard 402). If the direction of sound is within a 60 degree forwardly-projected cone centred on the array central axis, broadside processing is used. If it is outside this range, end-fire processing is used.
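The selection rule in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes the 60 degrees is the angle measured from the central axis (consistent with the threshold angle of claims 3 and 4), that the axis points along +z, and that a direction exactly on the threshold counts as broadside. All function and constant names are hypothetical.

```python
import math

BROADSIDE = "broadside"
ENDFIRE = "end-fire"


def angle_from_axis_deg(direction, axis=(0.0, 0.0, 1.0)):
    """Angle in degrees between a direction vector and the array central axis."""
    dot = sum(d * a for d, a in zip(direction, axis))
    norm_d = math.sqrt(sum(d * d for d in direction))
    norm_a = math.sqrt(sum(a * a for a in axis))
    if norm_d == 0.0 or norm_a == 0.0:
        raise ValueError("direction and axis must be non-zero vectors")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_d * norm_a)))
    return math.degrees(math.acos(cos_angle))


def select_processing_mode(direction, axis=(0.0, 0.0, 1.0), threshold_deg=60.0):
    """Pick the processing mode from the source direction.

    Assumption: threshold_deg is the cone half-angle measured from the axis.
    """
    if angle_from_axis_deg(direction, axis) <= threshold_deg:
        return BROADSIDE
    return ENDFIRE


if __name__ == "__main__":
    print(select_processing_mode((0.0, 0.0, 1.0)))  # source on the axis
    print(select_processing_mode((1.0, 0.0, 0.0)))  # source in the array plane
```

In a device with an orientation sensor, the `axis` argument would be updated from the sensed orientation before each selection.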
[0103] For the endfire processing mode the concept of the maximum excited phase mode may be understood as follows. The signals received at a set of N omni-directional microphones spaced evenly in a circle of radius R in a wavefield consisting of a single plane wave (given by $x(\vec{r},t)=Ae^{-i(\vec{k}\cdot\vec{r}+\omega t)}$) can be expressed, using the Jacobi-Anger expansion of complex exponentials of trigonometric functions, as
[0104] Where $J_m$ is the order m Bessel function of the first kind, and $k=\omega/c$ is the wavenumber of the wavefield with frequency ω and propagation speed c. The wavevector is given by $\vec{k}=-k(\sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta)$, with the minus sign introduced for later convenience. N is the number of microphones in the array. The Bessel function order M that the Jacobi-Anger expansion is truncated at is referred to as the maximum excited phase mode order in the context of circular array theory.
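As an illustration of the plane-wave model above, the following sketch evaluates the stated wavefield at N evenly spaced microphones on a ring of radius R lying in the z = 0 plane. The placement of microphone 0 at azimuth 0, the speed of sound of 343 m/s, and all function names are assumptions made for the example only.

```python
import cmath
import math


def wavevector(k, theta, phi):
    """Wavevector -k (sin(theta)cos(phi), sin(theta)sin(phi), cos(theta))."""
    return (-k * math.sin(theta) * math.cos(phi),
            -k * math.sin(theta) * math.sin(phi),
            -k * math.cos(theta))


def ring_positions(n_mics, radius):
    """Positions of n_mics evenly spaced microphones on a ring in the z = 0 plane."""
    return [(radius * math.cos(2.0 * math.pi * n / n_mics),
             radius * math.sin(2.0 * math.pi * n / n_mics),
             0.0) for n in range(n_mics)]


def plane_wave_samples(n_mics, radius, freq_hz, theta, phi,
                       t=0.0, amplitude=1.0, c=343.0):
    """Evaluate x(r, t) = A exp(-i (k.r + omega t)) at each microphone."""
    omega = 2.0 * math.pi * freq_hz
    k = omega / c  # wavenumber k = omega / c
    kx, ky, kz = wavevector(k, theta, phi)
    samples = []
    for x, y, z in ring_positions(n_mics, radius):
        phase = kx * x + ky * y + kz * z + omega * t
        samples.append(amplitude * cmath.exp(-1j * phase))
    return samples


if __name__ == "__main__":
    for value in plane_wave_samples(8, 0.10, 1000.0, math.pi / 3, 0.4):
        print(f"{value.real:+.4f} {value.imag:+.4f}j")
```

With this geometry the phase at microphone n reduces to kR sin θ cos(ϕ − ϕ_n), which is the form to which the Jacobi-Anger expansion is applied.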
[0105] The processing of the array (both in the endfire and broadside processing modes) is done in such a manner that a Fast Fourier Transform (FFT) is applied to the data from all the microphones in the array. The beams corresponding to the broadside operational mode are processed according to the scheme presented in
$Y(\omega)=W(\omega)^{H}X(\omega)$. (1)
[0106] Where W(ω) is the weighting vector corresponding to frequency ω, X(ω) is the frequency domain data vector corresponding to frequency ω and Y(ω) is the weighted frequency domain output corresponding to ω. This process is done once per direction, corresponding to one beam. The expression above is provided for narrowband cases. It can be generalized for any bandwidth by repeating the process for each frequency bin and by summing the contributions. If the time-domain signal is the preferable output, the inverse Fourier transform is applied.
[0107] For broadside beams, the weights W are applied to the frequency data vector $X=[X_1, X_2, \ldots, X_N]^{T}$ of dimension N×1. The weight vector $W(\omega)=[w_1(\omega), w_2(\omega), \ldots, w_N(\omega)]^{T}$ is also N×1, where index n denotes a particular microphone, $w_n$ is the weighting applied to microphone n, and $X_n(\omega)$ is the frequency domain data from microphone n. The weight vector can be changed according to the application and the desired response of the array. In a preferable embodiment W(ω) will be estimated by using least squares weight optimization. The constraints for the optimization will depend on the desired response, and are chosen for example to yield a super-directional response, minimum side-lobe level, or constant directivity. What is achievable via optimization is decided by the geometry of the array. For example, super-directivity at a frequency ω is only possible if the array rings and the microphones on the rings are spaced closer than a half wavelength. The output Y is a scalar value of dimension 1×1.
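The weighting of Eq. (1) can be sketched per frequency bin as below. This is a minimal illustration under stated assumptions: a naive DFT stands in for the FFT so that the example needs only the standard library, and uniform weights are used as a placeholder where the text prefers weights obtained by least squares optimization. All function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one microphone frame (stand-in for an FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * b * t / n) for t in range(n))
            for b in range(n)]


def apply_weights(weights, spectrum_vector):
    """Eq. (1) for one bin: Y = W^H X = sum over n of conj(w_n) * X_n."""
    if len(weights) != len(spectrum_vector):
        raise ValueError("weights and data must both have one entry per microphone")
    return sum(complex(w).conjugate() * x for w, x in zip(weights, spectrum_vector))


def beamform(frames, weights_per_bin):
    """Apply Eq. (1) in every frequency bin.

    frames: one list of time samples per microphone (N lists of equal length).
    weights_per_bin[b]: the N weights used in bin b.
    Returns the scalar output Y for each bin.
    """
    spectra = [dft(frame) for frame in frames]  # spectra[n][b]
    n_bins = len(spectra[0])
    return [apply_weights(weights_per_bin[b], [s[b] for s in spectra])
            for b in range(n_bins)]


def uniform_weights(n_mics, n_bins):
    """Placeholder weights (plain averaging); NOT the least squares design."""
    return [[complex(1.0 / n_mics)] * n_mics for _ in range(n_bins)]


if __name__ == "__main__":
    frame = [math.cos(2.0 * math.pi * 3 * t / 16) for t in range(16)]
    outputs = beamform([frame] * 4, uniform_weights(4, 16))
    print([round(abs(y), 3) for y in outputs])
```

A time-domain output would follow by applying an inverse transform to the per-bin outputs, as the text notes.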
[0108] In the end-fire operating mode, phase mode processing is used, as shown in
[0109] where $\tilde{X}$ denotes the phase domain signals. The individual phase mode signals are then weighted and summed to produce an output signal $\tilde{Y}$, analogously to Eq. (1): $\tilde{Y}=\tilde{W}^{H}\tilde{X}$. This processing scheme is shown in
[0110] when the array is steered to the direction $(\theta', \phi')$. The auxiliary phase mode weights $h_m$ can be used to shape the beampattern. When multiple rings are used, the microphone signals are transformed to the phase mode domain in each ring individually. The resulting signals are then weighted and summed over both rings and phase modes:
[0111] where the phase mode weights are given by
[0112] $P_m$ denotes the index of the largest ring (equivalent to the number of rings) included in the sampling of a given phase mode m at a given frequency and is given by
$P_m=\left|\{R_q \mid 1\le q\le P,\ kR_q<m+1\}\right|$.
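The ring count defined above can be computed directly from its definition, as in the following sketch. The radii in the usage example are hypothetical (only the 0.20 m largest radius echoes a value used elsewhere in this description), the speed of sound of 343 m/s is an assumption, and the function names are illustrative.

```python
import math


def wavenumber(freq_hz, c=343.0):
    """Wavenumber k = omega / c for a given frequency (c is an assumed value)."""
    return 2.0 * math.pi * freq_hz / c


def rings_sampling_mode(radii, m, freq_hz, c=343.0):
    """Number of rings with k * R_q < m + 1, i.e. the rings that sample phase mode m."""
    k = wavenumber(freq_hz, c)
    return sum(1 for radius in radii if k * radius < m + 1)


if __name__ == "__main__":
    radii = [0.05, 0.10, 0.20]  # hypothetical ring radii in metres
    for m in range(4):
        print(m, rings_sampling_mode(radii, m, 1000.0))
```

When the radii are listed in increasing order, the count equals the index of the largest ring that contributes to mode m, which matches the description of the ring count as both an index and a number of rings.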
[0113] This particular form of the weights has been found to yield nearly optimal white noise gain (WNG) for a given set of radii (fully optimal when $P_m=P\ \forall m$), while at the same time being flexible in terms of allowing a large range of radii without violating the assumptions that underlie phase mode processing. The WNG using this processing scheme is given by
[0114] The ring radii are derived from maximising the WNG, subject to the constraints of the maximum phase mode order, M, the number of discrete ring support structures, P, and the number of microphones on each discrete ring support structure, $N_p$.
[0115] Therefore the number of microphones in each ring, the limiting aperture and the radius of each discrete ring may each be optimised.
[0116] For wideband signals, e.g. speech, the array input is decomposed via FFT and each frequency bin component is processed as a narrowband signal as described above. When designing an array for wideband frequency acquisition it is desirable to weight the WNG at different frequencies against one another. Optimizing a weighted average WNG over the frequency bands of interest may result in an array with particularly low WNG at low frequencies. Therefore a weighted log-average WNG is used in this example, as given by:
[0117] where g(f) represents a frequency weighting function; for speech acquisition, for example, frequency bands are weighted by their relative importance to intelligibility, such as given by the Speech Intelligibility Index (SII). Using SII weighting as a criterion yields the upper frequency $f_1=8$ kHz.
[0118] One of the primary parameters of the array design is the radius of the largest ring, $R_P$. From a signal processing perspective, as large an aperture as possible is desirable, so the largest radius is limited by practical considerations of the physical size of the microphone array and support structure. For example, let the physical constraints determine that $R_P=0.20$ m. The smallest non-zero radius is constrained by
in order to ensure that at least one phase mode can be sampled without modal aliasing for the highest frequency for which the array has been designed, $f_{\max}$, where the speed of sound is denoted by c. The maximum excited phase mode order, M, is a parameter that may be varied in processing, but since the ring radii are determined by maximizing Eq. (6), a particular value $M_d$ is chosen for which the design is optimized.
[0119] The number of rings, P, is indirectly determined by the optimization procedure, though an upper bound may be set based on the desired number of elements in the array and the design phase mode order, $M_d$. For example, with $M_d=7$ the required number of elements per ring is 15, and with a cap of 256 elements in total (due to processing restrictions, for instance) this yields a maximum of 17 rings. An array optimised according to Eq. (6) tends to have fewer rings, and more elements in the largest ring.
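The arithmetic behind the ring-count bound can be sketched as follows. The relation of 2·M_d + 1 elements per ring is an assumption inferred from the figures quoted above (15 elements for M_d = 7); it is not stated explicitly in the text. The function names are hypothetical.

```python
def elements_per_ring(design_order):
    """Elements needed per ring for design order M_d.

    Assumption: 2 * M_d + 1, inferred from 15 elements at M_d = 7.
    """
    if design_order < 0:
        raise ValueError("design_order must be non-negative")
    return 2 * design_order + 1


def max_rings(total_elements, design_order):
    """Upper bound on the number of rings for a cap on the total element count."""
    return total_elements // elements_per_ring(design_order)


if __name__ == "__main__":
    print(elements_per_ring(7), max_rings(256, 7))
    print(elements_per_ring(5), max_rings(128, 5))
```

The same rule applied to 128 elements and a design order of 5 gives 11 rings, which agrees with the bound quoted for the specific embodiment in paragraph [0122].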
[0120] The effects of varying M in the processing are demonstrated in
[0121] A lower maximal phase order may then be used in processing in order to boost WNG at the expense of the directivity index (DI). However, practical restrictions on the minimum element spacing may make it difficult to sample the highest phase modes from the innermost ring arrays.
[0122] In a specific embodiment where $R_P=0.10$ cm, $N=128$ and $M_d=5$, the minimum element distance is 7.5 mm, giving an upper bound of P=11 rings. Maximising Eq. (6) using SII frequency weighting yields an array as seen in
[0123]
[0124] Different frequencies are represented in different line styles. The narrowing of the beam increases the directivity and hence spatial resolution of the array.