Method for Tracking a Target Acoustic Source
20170261593 · 2017-09-14
Inventors
- Marco CROCCO (Ovada (AL), IT)
- Vittorio MURINO (Rapallo (GE), IT)
- Andrea TRUCCO (Genova, IT)
- Samuele MARTELLI (Genova, IT)
CPC classification
- G01S3/8006 (PHYSICS)
- G06F18/2148 (PHYSICS)
- G01S7/539 (PHYSICS)
Abstract
A method of processing an acoustic image includes the steps of acquiring acoustic signals generated by acoustic sources in a predetermined region of space, generating a multispectral 3D acoustic image that includes a collection of 2D acoustic images, performing a frequency integration of the multispectral acoustic image for generating a 2D acoustic map, locating at least one target acoustic source of interest and modeling the signal spectrum associated with the target acoustic source, generating a classification map obtained by comparing the signal spectrum of each signal associated with each pixel of the multispectral acoustic image and the model of the signal spectrum associated with the target acoustic source to distinguish the spectrum of the signal associated with the target acoustic source from the signal spectra associated with the remaining acoustic sources, and merging the classification map and the acoustic map to obtain a merged map.
Claims
1. A method of processing an acoustic image comprising the following steps: a) acquiring acoustic signals generated by acoustic sources in a predetermined region of space; b) generating a multispectral 3D acoustic image (1), consisting of a collection of 2D acoustic images, each 2D acoustic image being formed by transposition of a position of each of the acquired acoustic sources into a grayscale or color model, each 2D acoustic image being identified by a single frequency or a frequency band, such that each 2D acoustic image has the position of each of the acquired acoustic sources marked thereon along axes of coordinates of the 2D acoustic image, for spatial allocation of the acquired acoustic sources; c) performing a frequency integration of said multispectral 3D acoustic image for generating a 2D acoustic map; d) locating at least one target acoustic source of interest and modeling a signal spectrum associated with said target acoustic source; e) generating a classification map obtained by comparing the signal spectrum of each signal associated with each pixel of said multispectral acoustic image and a model of the signal spectrum associated with said target acoustic source, the step of comparing being obtained by training a classification algorithm, said classification algorithm being executed for each pixel of said multispectral acoustic image, to thereby distinguish the spectrum of the signal associated with the target acoustic source from the signal spectra associated with the remaining acoustic sources; and f) merging said classification map and said acoustic map to obtain a merged map.
2. The method as claimed in claim 1, wherein step d) comprises the following step: d1) identifying a spectral signature of the signal generated by the target acoustic source, and wherein step e) comprises the following steps: e1) comparing the spectral signature of the signal generated by the target acoustic source with the spectral signatures of the signals associated with the individual pixels of said multispectral acoustic image (1), and e2) generating said classification map, such that a value of each pixel of said classification map indicates a probability that each signal being compared will be transmitted by the target acoustic source.
3. The method as claimed in claim 1, wherein step d) comprises a sub-step d2) of identifying spectral signatures of the signals generated by acoustic noise sources, the classification algorithm being trained to distinguish the spectrum of the signal associated with the target acoustic source from the signal spectra associated with the acoustic noise sources.
4. The method as claimed in claim 1, wherein step f) comprises multiplying values of the pixels of the acoustic map obtained in step c) by the values of the pixels of the classification map obtained in step e).
5. The method as claimed in claim 1, wherein step c) comprises weighting the frequencies or frequency bands of said multispectral acoustic image.
6. The method as claimed in claim 1, wherein said method comprises tracking the target acoustic source with the following steps: g) generating a probability function based on said merged map, and h) executing a tracking algorithm.
7. The method as claimed in claim 6, wherein step g) comprises the following steps: g1) transforming the merged map into a probability function, g2) generating an additional probability function indicating a possible dynamic path of the target acoustic source, obtained using predetermined dynamic models, and g3) comparing the probability function obtained in step g1) and the additional probability function generated in step g2) to express a conditional probability that said merged map (4) has been obtained using a dynamic model of the target acoustic source.
8. The method as claimed in claim 1, wherein step a) is carried out using an array of acoustic sensors and comprises a substep a1) of calibrating said array of acoustic sensors.
9. The method as claimed in claim 8, wherein said substep a1) comprises acquiring an optical image with a camera or videocamera device, further comprising a step c1) of superimposing the acoustic map generated in step c) on the acquired optical image.
10. The method as claimed in claim 6, wherein steps a) to h) are carried out in real-time mode.
Description
[0103] These and other characteristics and advantages of the present invention will be clearer from the following description of some embodiments shown in the annexed drawings, wherein:
[0107] It is specified that the embodiment shown in the figures is provided merely for illustrative purposes, in order to better understand the advantages and the characteristics of the method of the present invention.
[0108] In particular, the embodiment described below concerns a method for tracking a target acoustic source but, as mentioned above and as will be clear below, it comprises the method steps for processing an acoustic image according to the present invention.
[0109] Therefore the shown embodiment is not to be intended as limiting the inventive concept of the present invention, which is to provide a method for processing an acoustic image that allows a "clean" acoustic image to be obtained, that is, allows an acoustic source of interest to be identified without being affected by the noise sources present within the monitored region of space.
[0110] Moreover, the theoretical bases that have allowed the method steps of the present invention to be developed will be disclosed.
[0111] The method will now be described with particular reference to the annexed figures.
[0112] According to the shown embodiment, the method provides the following steps:
[0113] a) acquiring acoustic signals generated by acoustic sources in a predetermined region of space,
[0114] b) generating a multispectral 3D acoustic image 1, consisting of a collection of 2D acoustic images.
[0115] Each 2D acoustic image is formed by the transposition of the position of each of the acquired acoustic sources into a grayscale or color model.
[0116] Moreover, each 2D acoustic image is identified by a single frequency ω or a frequency band, such that each 2D acoustic image has the position x, y of each of the detected acoustic sources marked thereon along the axes x and y that subtend the plane of the 2D image, for spatial allocation of the acquired acoustic sources.
[0117] Moreover the method provides step c) performing a frequency integration of the multispectral acoustic image 1 for generating a 2D acoustic map 3.
[0118] According to the shown embodiment the method further provides a step h) executing a tracking algorithm.
[0119] Particularly, as regards steps a) and b), it is specified that, if a triplet of Cartesian coordinates in 3D space is denoted by (x, y, z) and a camera is assumed to be placed at the origin of the coordinates and oriented along the z axis, the coordinates x and y of the 2D image can be defined as follows:
(x, y) = (f·x/z, f·y/z)
[0120] where f is the focal length.
[0121] Let I^o_t(h, k) be defined as the optical image with resolution H×K, a function of the pixels with indexes h = 1 . . . H, k = 1 . . . K, showing the 3D scene within the time interval t.
[0122] The coordinates in the image plane of the (h, k)-th pixel are given by the following formula:
[0123] (x_h, y_k) = (hΔx, kΔy), wherein Δx and Δy are the horizontal and vertical pitch between adjacent pixels.
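To make the projection geometry above concrete, here is a minimal Python sketch (all names and numeric values are illustrative assumptions, not taken from the patent) that maps a 3D point to image-plane coordinates via (x, y) = (f·x/z, f·y/z) and then to the nearest pixel indexes (h, k) through the pitches Δx and Δy:

```python
def project_to_pixel(point_3d, f, dx, dy):
    """Project a 3D point (x, y, z), seen by a camera placed at the
    origin and oriented along the z axis, onto the image plane, then
    quantize to the (h, k) pixel grid with pitches dx, dy.

    Illustrative sketch only: names and values are assumptions."""
    x3, y3, z3 = point_3d
    # Pinhole projection: (x, y) = (f*x/z, f*y/z)
    x, y = f * x3 / z3, f * y3 / z3
    # Pixel coordinates satisfy (x_h, y_k) = (h*dx, k*dy)
    h = int(round(x / dx))
    k = int(round(y / dy))
    return (x, y), (h, k)

# Example: a source 50 m away, 2 m to the right, 1 m up,
# with an assumed 8 mm focal length and 10 um pixel pitch.
xy, hk = project_to_pixel((2.0, 1.0, 50.0), f=0.008, dx=1e-5, dy=1e-5)
```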
[0124] Analogously to the optical image, an acoustic image projecting the set of acoustic sources in 3D space onto the same image plane can be defined.
[0125] To construct the acoustic image, a planar array of acoustic sensors is preferably used in combination with the known SRP filter-and-sum beamforming [18], as described in patent application WO2014/115088.
[0126] The filter-and-sum beamforming, in combination with the geometric arrangement of the acoustic sensors, allows very wide acoustic bands to be acquired, while providing high resolution at lower frequencies as well as absence of artifacts at higher frequencies [19].
[0127] However, if two or more acoustic sources simultaneously emitting sound have a large difference in signal energy, the higher-energy source may obscure the acquisition of the lower-energy source.
[0128] Moreover, if two acoustic sources have a different nature, namely differently shaped signal spectra, the weaker source can very likely be stronger than, or at least comparable with, the other source in some frequency sub-bands.
[0129] To this end, the method of the present invention applies, after beamforming, a frequency sub-band normalization.
[0130] Moreover, according to an improvement of the present invention, step c) provides a step of weighting the frequency bands or frequencies of the multispectral acoustic image 1.
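As a hedged illustration of the sub-band normalization and of step c)'s weighted frequency integration, the sketch below assumes that the multispectral 3D acoustic image 1 is stored as a numpy array of shape (F, H, K), one 2D acoustic image per frequency band; the per-band peak normalization and the uniform default weights are plausible choices, not the patent's exact formulas:

```python
import numpy as np

def integrate_over_frequency(acoustic_image, weights=None, eps=1e-12):
    """Frequency sub-band normalization followed by weighted integration.

    acoustic_image : array of shape (F, H, K), one 2D acoustic image per
                     frequency band (the multispectral 3D image 1).
    weights        : optional length-F vector for step c)'s band weighting.

    Returns the 2D acoustic map 3, of shape (H, K). The per-band peak
    normalization below is one plausible scheme; the patent only states
    that a sub-band normalization is applied after beamforming.
    """
    F = acoustic_image.shape[0]
    # Normalize each band by its peak, so that a weak source dominating
    # some sub-band is not masked by a globally stronger source.
    band_max = acoustic_image.reshape(F, -1).max(axis=1)
    normalized = acoustic_image / (band_max[:, None, None] + eps)
    if weights is None:
        weights = np.ones(F) / F
    # Step c): weighted integration over the frequency axis.
    return np.tensordot(weights, normalized, axes=(0, 0))
```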
[0131] As described above, the acquisition step a) is carried out through an array of acoustic sensors and provides a sub-step a1) of calibrating said array of acoustic sensors.
[0132] Preferably, calibration sub-step a1) can comprise acquiring an optical image through a camera or videocamera device, there being provided a step c1) of superimposing the acoustic map 3 generated in step c) on the acquired optical image.
[0133] As said above, by carrying out the integration on frequencies, the 2D acoustic map 3 is obtained.
[0134] Such an acoustic map can be formally defined by equation (29):
Î_t(h,k) = Î_t^tr(h,k) + Î_t^dr(h,k) + n_t(h,k)  (29)
[0135] where
[0136] Î_t(h,k) is the acoustic map, whose contributions are:
[0137] Î_t^tr(h,k), the acoustic map that would be obtained if only the target acoustic source of interest were present within the space region under examination,
[0138] Î_t^dr(h,k), the acoustic map that would be obtained if only noise sources were present within the space region under examination,
[0139] n_t(h,k), the background noise produced in the absence of acoustic sources within the region under examination, and
[0140] R(x_m, y_m), the set of pixels comprised in the neighborhood of the coordinate (x_m, y_m) on the image plane, defined by:
R(x_m, y_m) = {(h,k) : (x_h − x_m)^2 + (y_k − y_m)^2 < r}.
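The set R(x_m, y_m) can be materialized as a boolean mask over the pixel grid; a short sketch under the same grid conventions as above (again an illustrative assumption, with r acting as a squared-distance threshold, as in the formula):

```python
import numpy as np

def region_mask(xm, ym, r, H, K, dx, dy):
    """Boolean (H, K) mask of the pixels (h, k) whose image-plane
    coordinates (h*dx, k*dy) fall within squared distance r of
    (xm, ym), i.e. the set R(xm, ym)."""
    h = np.arange(H)[:, None]   # column vector of h indexes
    k = np.arange(K)[None, :]   # row vector of k indexes
    return (h * dx - xm) ** 2 + (k * dy - ym) ** 2 < r
```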
[0141] Particularly, the method of the present invention provides, between step c) and step h), the following steps to be carried out:
[0142] d) locating at least one target acoustic source of interest and modeling the signal spectrum associated with the target acoustic source,
[0143] e) generating a classification map 2 obtained by comparing the signal spectrum of each signal associated with each pixel of the multispectral acoustic image 1 and the model of the signal spectrum associated with the target acoustic source,
[0144] f) merging the classification map 2 and the acoustic map 3 to obtain a merged map 4,
[0145] g) generating a probability function based on the merged map 4.
[0146] As will be described below, the comparison is obtained by training a classification algorithm, which is executed for each pixel of the multispectral acoustic image, to thereby distinguish the signal spectrum associated with the target acoustic source from the signal spectra associated with the remaining acoustic sources.
[0147] Particularly, steps d) and e), related to the implementation of the classification map 2, are illustrated in the annexed figures.
[0148] According to a preferred variant embodiment of the method of the present invention, step d) provides a step d1) of identifying the spectral signature 22 of the signal generated by the target acoustic source.
[0149] In combination with such characteristic, step e) provides the following sub-steps:
[0150] e1) comparing the spectral signature of the signal generated by the target acoustic source with the spectral signatures of the signals associated with the individual pixels of the multispectral acoustic image 1,
[0151] e2) generating the classification map 2, such that the value of each pixel of the classification map 2 indicates the probability that each signal being compared will be transmitted by the target acoustic source.
[0152] Particularly, in order to generate the classification map 2, it is possible to use the Tracking-by-Detection (TbD) approach used in video systems and widely described in [16], using as the starting point the detection carried out by the acoustic sensor array, namely the obtained multispectral acoustic image 1.
[0155] Particularly, a classifier 21 is used, trained so as to distinguish the spectrum 22 of the signal associated with the target acoustic source from the spectra of non-interest.
[0156] Such classifier 21 is applied to each pixel of the multispectral image 1 so as to obtain the classification map 2.
[0157] Formally, the classification map 2 can be defined, for each pixel, by equation (27):
D_t(h,k) ≈ 1 for (h,k) ∈ R(x_m, y_m) with sr = tr; D_t(h,k) ≈ 0 for (h,k) ∈ R(x_m, y_m) with sr = dr; D_t(h,k) ∈ [0,1] elsewhere  (27)
[0158] where
[0159] [0,1] is the range of values, from 0 to 1, taken by the map,
[0160] D_t(h,k) is the classification map,
[0161] x_m, y_m are the coordinates within the 2D image, and
[0162] sr is a generic acoustic source that can be the target acoustic source tr or one of the noise acoustic sources dr.
[0163] According to equation 27, the classification map D_t(h,k) can be divided into M regions, such that each region has a value of about 1 or about 0 depending on whether the sound associated with the pixel under examination belongs to the target acoustic source or to a noise acoustic source.
[0164] In addition, there are transition regions with indeterminate values ranging from 0 to 1.
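The patent prescribes a trained classifier applied pixel by pixel but does not mandate a specific model, so the following sketch uses logistic regression from scikit-learn as an illustrative stand-in; the training spectra for the target and noise sources, and the (F, H, K) layout of image 1, are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_spectral_classifier(target_spectra, noise_spectra):
    """Train classifier 21 to separate the target's spectral signature 22
    from the noise spectra. Each input row is one length-F spectrum."""
    X = np.vstack([target_spectra, noise_spectra])
    y = np.concatenate([np.ones(len(target_spectra)),
                        np.zeros(len(noise_spectra))])
    return LogisticRegression(max_iter=1000).fit(X, y)

def classification_map(classifier, acoustic_image):
    """Apply the classifier to the spectrum of every pixel of the
    multispectral image 1 (shape (F, H, K)) to obtain map 2: each pixel
    holds the probability, in [0, 1], that its signal comes from the
    target acoustic source."""
    F, H, K = acoustic_image.shape
    spectra = acoustic_image.reshape(F, H * K).T      # one row per pixel
    probs = classifier.predict_proba(spectra)[:, 1]   # P(target)
    return probs.reshape(H, K)
```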
[0166] According to a preferred variant embodiment of the method of the present invention, such step f) is obtained by multiplying the values of the pixels of the acoustic map 3 obtained in step c) by the values of the pixels of the classification map 2 obtained in step e).
[0167] The merged map 4 is thereby obtained.
[0168] Particularly, such merging consists of a pixel-by-pixel product between the classification map 2 and the acoustic map 3.
[0169] Given how the classification map 2 and the acoustic map 3 have been defined above, in equations (27) and (29), the merged map 4 resulting from the product of the two maps is defined by equation (31):
J_t(h,k) = D_t(h,k) · Î_t(h,k)  (31)
[0170] where
[0171] J_t(h,k) is the merged map 4.
[0172] As is clear from equation 31, the pixels associated with the regions of interest are kept unchanged, the pixels of the regions of non-interest are reduced to zero, and the contribution of the pixels of the regions far from any acoustic source, whether of interest or not, decreases.
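Under equation (31), step f) reduces to an element-wise product of the two maps; a minimal sketch, assuming both maps are (H, K) arrays defined on the same pixel grid:

```python
import numpy as np

def merge_maps(classification_map, acoustic_map):
    """Step f): pixel-wise product of classification map 2 and acoustic
    map 3 (equation (31)). Pixels classified as target keep their
    acoustic energy; pixels classified as noise are driven toward zero."""
    return np.asarray(classification_map) * np.asarray(acoustic_map)
```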
[0175] Advantageously step g) is carried out by performing the following steps:
[0176] g1) transforming the merged map 4 into a probability function,
[0177] g2) generating a further probability function indicating the possible dynamic path of the target acoustic source, obtained using predetermined dynamic models,
[0178] g3) comparing the function obtained in step g1) and the function generated in step g2) to express the conditional probability that the merged map 4 has been obtained using the dynamic model of the target acoustic source.
[0179] From a formal perspective, a state vector s_t = (x_t, v_t) is defined, where x_t denotes the coordinates of the target acoustic source on the 2D image plane at instant t, while v_t is its velocity, still at instant t.
[0180] By indicating with Z_t the observation at time t obtained by the acoustic sensor array, the tracking problem consists in estimating the vector s_t given the set Z_1:t, that is, the set of the observations from the initial instant to instant t.
[0181] Such estimation can be obtained through the posterior probability density function (PDF), which contains all the statistical information available about the variable s_t.
[0182] By using Bayes' theorem, such function can be expressed as:
p(s_t | Z_{1:t−1}) = ∫ p(s_t | s_{t−1}) p(s_{t−1} | Z_{1:t−1}) ds_{t−1}  (10)
p(s_t | Z_{1:t}) ∝ p(Z_t | s_t) p(s_t | Z_{1:t−1})  (11)
[0183] Equations 10 and 11 define the dynamic model of the acoustic source of interest.
[0184] Particularly, equation 10 propagates the PDF from instant t−1 (prediction step); the use of such function in combination with the likelihood function p(Z_t | s_t) allows the relation between the state vector and the performed measurements to be modelled, on the basis of equation 11 (update step).
[0185] Equation 11 can be approximated as follows:
p(s_t | Z_{1:t}) ≈ Σ_{p=1}^{P} ω_t^p δ(s_t − s_t^p)  (12)
[0186] so as to limit the contribution of disturbing noises and acoustic sources: samples s_t^p of the vector s_t are used, a weight ω_t^p being associated with each of them, where δ is the Dirac delta function.
[0187] Each sample at time t is estimated by using a predetermined dynamic model calculated at instant t−1.
[0188] The weighting values associated with each sample are calculated on the basis of the probability function.
[0189] According to a possible embodiment, the sampling step can be carried out on the basis of the distribution of the weights, so as to generate a higher number of samples for high weighting values while reducing the number of samples for low weighting values (resampling).
[0190] By using equation 12, the estimation of the position of the target acoustic source is given by the weighted sum of the samples:
ŝ_t = Σ_{p=1}^{P} ω_t^p s_t^p
[0191] Particularly, in order to estimate the proper position of the target acoustic source, the approach known as Track-Before-Detect described in [12] has been used, adapted to the audio tracking problem as described in [11].
[0192] On the basis of the teachings of such documents and of what has been described above, the posterior distribution for the tracking of the target acoustic source has been calculated based on the merged map 4, that is, on J_t(h,k).
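The recursion of equations (10) to (12) can be realized with a standard sampling-importance-resampling particle filter; in the sketch below the constant-velocity dynamic model, the noise levels, and the use of the merged map J_t as the per-pixel likelihood p(Z_t | s_t) are illustrative assumptions consistent with, but not dictated by, the text:

```python
import numpy as np

def particle_filter_step(particles, weights, merged_map, dx, dy,
                         dt=1.0, q_pos=0.5, q_vel=0.1, rng=None):
    """One predict/update/resample cycle of equations (10)-(12).

    particles  : (P, 4) array of states s_t^p = (x, y, vx, vy).
    weights    : (P,) normalized weights w_t^p.
    merged_map : (H, K) map J_t, used here as the likelihood p(Z_t | s_t).
    Returns the updated particles, weights and the state estimate
    s_hat = sum_p w_t^p s_t^p.
    """
    rng = np.random.default_rng() if rng is None else rng
    P = len(particles)
    H, K = merged_map.shape

    # Prediction, eq. (10): constant-velocity model plus process noise.
    particles = particles.copy()
    particles[:, 0] += dt * particles[:, 2] + q_pos * rng.standard_normal(P)
    particles[:, 1] += dt * particles[:, 3] + q_pos * rng.standard_normal(P)
    particles[:, 2:] += q_vel * rng.standard_normal((P, 2))

    # Update, eq. (11): weight each particle by the merged-map value
    # at its projected pixel (clipped to the image bounds).
    h = np.clip(np.round(particles[:, 0] / dx).astype(int), 0, H - 1)
    k = np.clip(np.round(particles[:, 1] / dy).astype(int), 0, K - 1)
    weights = weights * (merged_map[h, k] + 1e-12)
    weights /= weights.sum()

    # Resampling: duplicate high-weight particles, drop low-weight ones.
    idx = rng.choice(P, size=P, p=weights)
    particles, weights = particles[idx], np.full(P, 1.0 / P)

    # Position estimate as the weighted mean of the particle set.
    s_hat = weights @ particles
    return particles, weights, s_hat
```

The resampling line implements the sampling strategy described above: more samples are generated where the weights are high, fewer where they are low.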
[0193] The theoretical bases and the characteristics of the method of the present invention have therefore been applied to an experimental case focused on tracking a vehicle.
[0194] Particularly, the experimental scenario is illustrated in the annexed figures.
[0195] The array of acoustic sensors, in this particular case microphones, intended to acquire the acoustic sources, has been placed at the top left corner of the monitored scene.
[0196] In this case the acoustic source of interest was a vehicle travelling along a trajectory from point A to point B.
[0197] The vehicle was at about 50 meters from the microphone array, while nearby there were disturbing noise sources: people speaking near the microphone array, as well as noisy devices such as air-conditioners.
[0198] Moreover a motorway at about 500 meters from the microphone array caused a further noise signal.
[0200] It is noted how the tracking algorithm, based only on the acoustic map, produces a wrong trajectory, from point C to point D, compared to the real trajectory from point A to point B.
[0201] The path estimated from point C to point D follows the distribution of the acoustic sources, being irreversibly affected by the noise acoustic sources near the microphone array.
[0203] The situation is different when the tracking is based on the merged map 4.
[0204] The vehicle continues to follow the real trajectory from point A to point B.
[0205] First, it can be noted that, when the merged map is used, the image is cleaner, since all the disturbing acoustic sources are removed.
[0206] By properly locating the acoustic sources of interest, an estimated trajectory of the vehicle from point E to point F is obtained, which is close to the real trajectory of the vehicle from point A to point B.
REFERENCES
[0207] [1] Y. Huang, J. Chen, and J. Benesty, “Immersive audio schemes,” Signal Processing Magazine, IEEE, vol. 28, no. 1, pp. 20-32, January 2011.
[0208] [2] M. Pucher, D. Schabus, P. Schallauer, Y. Lypetskyy, F. Graf, H. Rainer, M. Stadtschnitzer, S. Sternig, J. Birchbauer, B. Schalko, and W. Schneider, "Multimodal Highway Monitoring for Robust Incident Detection," in Proc. 13th International IEEE Conference on Intelligent Transportation Systems, September 2010.
[0209] [3] G. Valenzise, L. Gerosa, M. Tagliasacchi, F. Antonacci, and A. Sarti, "Scream and gunshot detection and localization for audio-surveillance systems," in IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS), 2007, pp. 21-26.
[0211] [4] Q.-C. Pham, A. Lapeyronnie, C. Baudry, L. Lucat, P. Sayd, S. Ambellouis, D. Sodoyer, A. Flancquart, A.-C. Barcelo, F. Heer, F. Ganansia, and V. Delcourt, "Audio-video surveillance system for public transportation," in 2nd International Conference on Image Processing Theory Tools and Applications (IPTA), 2010, pp. 47-53.
[0212] [5] C. Clavel, T. Ehrette, and G. Richard, "Events detection for an audio-based surveillance system," in IEEE International Conference on Multimedia and Expo (ICME), 2005, pp. 1306-1309.
[0213] [6] M. S. Brandstein and H. F. Silverman, "A practical methodology for speech source localization with microphone arrays," Computer Speech & Language, vol. 11, no. 2, pp. 91-126, 1997.
[0214] [7] V. Cevher, R. Velmurugan, and J. H. McClellan, "Acoustic multitarget tracking using direction-of-arrival batches," IEEE Transactions on Signal Processing, vol. 55, no. 6, pp. 2810-2825, 2007.
[0215] [8] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174-188, February 2002.
[0216] [9] D. B. Ward, E. A. Lehmann, and R. C. Williamson, "Particle filtering algorithms for tracking an acoustic source in a reverberant environment," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 826-836, 2003.
[0217] [10] C.-E. Chen, H. Wang, A. Ali, F. Lorenzelli, R. Hudson, and K. Yao, "Particle filtering approach to localization and tracking of a moving acoustic source in a reverberant room," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, May 2006, pp. IV-IV.
[0218] [11] M. F. Fallon and S. Godsill, "Acoustic source localization and tracking using track before detect," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 6, pp. 1228-1242, 2010.
[0219] [12] D. Salmond and H. Birch, "A particle filter for track-before-detect," in Proceedings of the American Control Conference, vol. 5, 2001, pp. 3755-3760.
[0220] [13] E. A. Lehmann and A. M. Johansson, “Particle filter with integrated voice activity detection for acoustic source tracking,” in EURASIP Journal on Applied Signal Processing, vol. 2007, no. 1, pp. 28-28, 2007.
[0221] [14] M. Kepesi, F. Pernkopf, and M. Wohlmayr, “Joint position-pitch tracking for 2-channel audio,” In IEEE International Workshop on Content-Based Multimedia Indexing (CBMI), 2007, pp. 303-306.
[0222] [15] K. Wu, S. T. Goh, and A. W. Khong, “Speaker localization and tracking in the presence of sound interference by exploiting speech harmonicity,” In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 365-369.
[0223] [16] A. Smeulders, D. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah, "Visual tracking: An experimental survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1, 2013.
[0224] [17] M. Crocco and A. Trucco, “Design of superdirective planar arrays with sparse aperiodic layouts for processing broadband signals via 3-d beamforming,” Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 22, no. 4, pp. 800-815, April 2014.