SYSTEM AND METHOD FOR INTEGRATED EMERGENCY VEHICLE DETECTION AND LOCALIZATION

20230128993 · 2023-04-27

    Abstract

    A siren detector identifies those frequencies at which a siren is active and a localizer localizes the siren based on those frequencies.

    Claims

    1. An apparatus comprising a detector for detecting a siren that is emitted by a siren source located at a particular direction relative to a motor vehicle and a localizer in communication with said siren source for estimating said particular direction, said motor vehicle comprising a microphone array having a plurality of microphones, each of which connects to said siren detector, said siren detector being configured to identify those frequencies at which a siren is active and said localizer being configured to estimate said particular direction based on those frequencies.

    2. The apparatus of claim 1, wherein the detector includes a line detector.

    3. The apparatus of claim 1, wherein the detector includes an edge detector.

    4. The apparatus of claim 1, wherein the siren is represented on a spectrogram and wherein the localizer is configured to localize the siren based on information indicative of lines in different directions on the spectrogram.

    5. The apparatus of claim 1, wherein the siren is represented on a spectrogram and wherein the localizer is configured to localize the siren based at least in part on lines in different directions that periodically recur on the spectrogram.

    6. The apparatus of claim 1, wherein the localizer is configured to localize the siren based at least in part on information indicative of an assembly of different line segments on a spectrogram that represents the siren, the line segments having been assembled in relation to each other.

    7. The apparatus of claim 1, further comprising a dynamic system model, wherein the localizer is configured to localize the siren based at least in part on information indicative of different line segments that have been assembled in relation to each other using the dynamic system model, the line segments being representative of a time-varying spectrum of the siren as represented in a two-dimensional time-frequency space.

    8. The apparatus of claim 1, further comprising a dynamic system model that models evolution of slopes of line segments over time, the line segments representing portions of a time-varying spectrum of the siren, wherein the localizer is configured to localize the siren based at least in part on information indicative of an assembly of the line segments in relation to each other as provided by the dynamic system model.

    9. The apparatus of claim 1, further comprising a dynamic tonal model that specifies frequencies that are present in the siren for specified durations, wherein the localizer is configured to localize the siren at least in part on the basis of an assembly of line segments, each line segment representing a portion of a time-varying spectrum of the siren, the line segments having been assembled by the dynamic tonal model.

    10. The apparatus of claim 1, wherein the detector relies on a partly linear model.

    11. The apparatus of claim 1, wherein said localizer is configured to localize the siren at least in part on the basis of cross-phase spectral density information from different microphones.

    12. The apparatus of claim 1, wherein said localizer is configured to localize said siren at least in part on the basis of cross-power spectral density information from different microphones.

    13. The apparatus of claim 1, wherein, to carry out localization, the localizer localizes said siren based at least in part on power spectral density information between different microphones.

    14. The apparatus of claim 1, further comprising said array of microphones, said array of microphones being connected to both said localizer and to said siren detector.

    15. The apparatus of claim 1, further comprising said motor vehicle, said motor vehicle comprising a first array of microphones mounted thereon, said first array of microphones being connected to both said localizer and to said siren detector, said localizer and said siren detector both being mounted in said motor vehicle.

    16. The apparatus of claim 1, further comprising a motor vehicle, said motor vehicle comprising first and second arrays of microphones mounted thereon, said first and second arrays being connected to both said localizer and to said siren detector, said localizer and said siren detector both being mounted in said motor vehicle.

    17. The apparatus of claim 1, further comprising an array selector that is configured to select a first microphone array from among a set of microphone arrays and to cause said first microphone array to be connected to said localizer and to said siren detector.

    18. The apparatus of claim 1, further comprising an array selector that is configured to carry out a coarse localization of said siren and to select a first microphone array from among a set of microphone arrays based on said coarse localization, wherein as a result of said selection, said first microphone array connects to said localizer and to said siren detector to carry out a fine localization of said siren using said first microphone array.

    19. The apparatus of claim 1, wherein said localizer is configured to determine a direction-of-arrival of a siren based at least in part on variations in signal power across outputs of different microphones in a microphone array.

    20. The apparatus of claim 1, wherein said localizer is configured to determine direction-of-arrival of a siren based at least in part on phase differences between outputs of different microphones in a microphone array.

    21. A method comprising estimating a siren's direction-of-arrival, said siren having been emitted by a siren source and being incident on a vehicle, said method comprising receiving samples of said siren from different locations on said vehicle, said method comprising identifying those frequencies at which said siren is active and estimating said particular direction based on those frequencies.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0024] FIG. 1 shows an automobile having microphone arrays for detecting and localizing a siren;

    [0025] FIG. 2 shows spectrograms of time-varying fundamental frequencies of selected sirens;

    [0026] FIG. 3 shows example spectrograms of the sirens in FIG. 2 but with a harmonic component;

    [0027] FIG. 4 is a block diagram of a system for detecting and localizing a siren;

    [0028] FIG. 5 is a flow chart of a procedure carried out by the system of FIG. 4; and

    [0029] FIG. 6 shows a block diagram of a system that uses two or more arrays for detecting and localizing a siren.

    DETAILED DESCRIPTION

    [0030] FIG. 1 shows a siren 10 that is incident on an automobile 12. As suggested by the figure, the “siren” in this case is not the physical unit but a sound wave being emitted by a siren source 14 that is carried by an emergency vehicle 16.

    [0031] In practice, different emergency vehicles emit different kinds of sirens. In fact, it is possible for the same emergency vehicle to itself emit different kinds of sirens. As a result, the siren 10 is one of several types of sirens. Each siren can be identified by its characteristic time-varying spectrum.

    [0032] FIG. 2 shows spectrograms for three types of siren 10: a yelp 18, a high-low 20, and a wail 24. Each spectrogram shows the siren's fundamental frequency as a function of time.

    [0033] The high-low 20, which can be seen in the center frame, is characterized by a low steady pitch that jumps to a higher pitch, remains there for some period, and then falls back to the original low pitch. As such, the high-low 20 is characterized by two distinct tones. In contrast, the wail 24 and the yelp 18 are characterized by distinct frequency sweeps.

    [0034] In a wail 24, which is shown in the right-most frame, the pitch rises smoothly from a low frequency to a high frequency and then decays continuously back to the low frequency, thus avoiding the discontinuous nature of the high-low 20.

    [0035] A yelp 18, which is shown in the left-most frame, has a pattern similar to that of the wail 24. The yelp 18 can be viewed as a periodic version of a wail 24 but with a more rapid ascent to the highest frequency and a more rapid plunge back to the lowest frequency. The yelp 18 is particularly useful in an urban environment, in which acoustic multi-path reflections are likely.
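
    As a rough illustration of the three patterns just described, the following Python sketch generates a fundamental-frequency trajectory for each one. The function names, the frequency limits, and the repetition periods are illustrative assumptions and are not taken from the figures. A raised-cosine sweep merely stands in for the rise and fall of the wail 24 and the yelp 18.

```python
import math

def high_low_hz(t, f_low=450.0, f_high=600.0, period_s=1.0):
    """High-low: a steady low tone that jumps to a steady high tone and back."""
    return f_low if (t % period_s) < period_s / 2 else f_high

def sweep_hz(t, f_low=600.0, f_high=1400.0, period_s=4.0):
    """Smooth rise from f_low to f_high and back again, repeated every period_s."""
    phase = (t % period_s) / period_s
    return f_low + (f_high - f_low) * 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))

def wail_hz(t):
    return sweep_hz(t, period_s=4.0)   # slow sweep (assumed period)

def yelp_hz(t):
    return sweep_hz(t, period_s=0.3)   # same shape, much faster (assumed period)

# Sample each trajectory every 10 ms over four seconds.
times = [0.01 * n for n in range(400)]
tracks = {
    "high-low": [high_low_hz(t) for t in times],
    "wail": [wail_hz(t) for t in times],
    "yelp": [yelp_hz(t) for t in times],
}
```

    Plotting each entry of `tracks` against `times` gives a line drawing in the time-frequency plane of the kind the spectrograms depict: two flat levels for the high-low, and one slow or many fast sweeps for the wail and the yelp.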

    [0036] In FIG. 2, only the time-varying fundamental frequency 26 is shown. FIG. 3 shows time-varying spectra for the same three sirens 18, 20, 24 with the addition of one or more harmonic components 28.

    [0037] Referring back to FIG. 1, the automobile 12 has first and second microphone arrays 30, 32 mounted at front and rear ends thereof. Although two arrays 30, 32 are shown, it is nevertheless possible to carry out the procedure described herein using only one array. The locations of the arrays 30, 32 are provided only for example. The methods and systems described herein do not depend on the locations of the arrays 30, 32.

    [0038] The first array 30 comprises two or more microphones 34. The microphones 34 connect to a detector 36 and to a localizer 38, as shown in FIG. 4.

    [0039] The detector 36 detects the existence of a siren 10 and the times at which the siren 10 exists. It also identifies the type of siren 10. It does so by determining the time-varying spectrum for the siren 10 as received by the microphones 34 and comparing it with time-varying spectra of known sirens.

    [0040] Referring now to FIG. 4, the detector 36 has a first output 39 and a second output 40. The first output 39 indicates whether a siren 10 has been detected. The second output 40 identifies the time-varying spectrum of the detected siren 10. The localizer 38 uses the second output 40 to estimate the siren's direction-of-arrival.

    [0041] The spectrograms in FIGS. 2 and 3 show the sirens 10 as distinct lines that can be approximated by partly linear models. In these embodiments, the spectra are represented by narrow-band signals that are narrow enough to be regarded as tonal signals.

    [0042] Accordingly, in some embodiments, a detector 36 relies at least in part on a line-detection procedure. Such a procedure exploits the recurring patterns of a siren 10 as seen in FIGS. 2 and 3. It does so by searching for recurring lines at specific frequencies with specific slopes. A dynamic system model describes the time evolution of the lines' slopes. Once the detector 36 has identified the lines corresponding to the siren's sounds in the spectrogram, it makes them available at the second output 40 for use by the localizer 38. Examples of a detector 36 are described in U.S. Patent Publ. 2020/0025904, the contents of which are incorporated herein by reference.

    [0043] The localizer 38 determines a direction-of-arrival by comparing received signals at different microphones 34. These signals have features from which one can infer a direction-of-arrival.

    [0044] In some embodiments, the feature relied upon is a differential time-of-arrival across the array 30, 32. This delay in time corresponds to a phase shift in frequency. Such a phase shift can be identified based on cross-phase or cross-power spectral density of the microphone signals' spectra. This procedure includes identifying the direction-of-arrival by summing steered-response power over all frequencies, thus yielding a steered response power that depends only on direction-of-arrival. By identifying the direction-of-arrival that maximizes this frequency-independent steered response power, it is possible to estimate the siren's direction of arrival.

    [0045] The steered response power for a particular direction-of-arrival and frequency is obtained by weighting the cross-spectral density between two microphones 34 at a particular time with a complex exponential that depends on the phase shift that results from an incident wave arriving at the two microphones 34 at different times.

    [0046] In particular, for a wave incident on a microphone array 30, 32, it is possible to define a direction vector that identifies the wave's direction of arrival. For any pair of microphones 34 in the array, it is possible to identify a pair vector that represents the difference between the locations of the two microphones. An inner product of the pair vector and the direction vector, when divided by the velocity of sound, provides a measure of the differential time-of-arrival at the two microphones 34 of the pair. For each frequency, this provides a phase delay between the pair of microphones 34.

    [0047] The signals received at the two microphones 34 are also characterized by a time-varying cross spectral density that depends on frequency. In some embodiments, it is useful to weight the cross-spectral density with a time-varying weight function that is indicative of the confidence that a siren source 14 was emitting a siren 10 with a particular frequency at a particular time.

    [0048] In particular, let $m_i$ represent the position vector of the $i$-th microphone 34, which receives a time-varying signal $x_i(t)$ having a spectrum $X_i(t,\omega)$. In a three-dimensional Cartesian coordinate system, it is useful to define an elevation angle $\theta$ measured from the x-y plane toward the z axis and an azimuthal angle $\varphi$ measured from the x axis. For a plane wave incident on the array 30, 32, it is possible to define a direction vector $a(\varphi,\theta)$ that indicates the siren's direction-of-arrival. Such a direction vector takes the following form:

    [00001] $$a(\varphi,\theta) = \begin{bmatrix} \cos(\theta)\cos(\varphi) \\ \cos(\theta)\sin(\varphi) \\ \sin(\theta) \end{bmatrix}$$

    [0049] In general, a plane wave moving with velocity $v_{\mathrm{sound}}$ and arriving from an azimuth angle $\varphi$ and an elevation angle $\theta$ will arrive at two microphones 34 at different times. The difference in the times of arrival, $\tau_{i,j}$, for microphones 34 at position vectors $m_i$ and $m_j$ is:

    [00002] $$\tau_{i,j}(\varphi,\theta) = \frac{a(\varphi,\theta)^{T} \cdot (m_i - m_j)}{v_{\mathrm{sound}}}$$
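
    The direction vector and the differential time-of-arrival can be sketched in a few lines of Python. The function names, the speed of sound of 343 m/s, and the two microphone positions are assumptions chosen for illustration. The elevation is measured from the x-y plane, as the form of the direction vector implies.

```python
import math

V_SOUND = 343.0  # assumed speed of sound in air, m/s

def direction_vector(azimuth, elevation):
    """Unit vector a(phi, theta) for azimuth phi and elevation theta, in radians."""
    return (
        math.cos(elevation) * math.cos(azimuth),
        math.cos(elevation) * math.sin(azimuth),
        math.sin(elevation),
    )

def tdoa(m_i, m_j, azimuth, elevation, v_sound=V_SOUND):
    """Differential time of arrival tau_ij = a^T (m_i - m_j) / v_sound, in seconds."""
    a = direction_vector(azimuth, elevation)
    return sum(a_k * (p - q) for a_k, p, q in zip(a, m_i, m_j)) / v_sound

# Two hypothetical microphones 0.2 m apart along the x axis.
m1 = (0.1, 0.0, 0.0)
m2 = (-0.1, 0.0, 0.0)
tau_endfire = tdoa(m1, m2, azimuth=0.0, elevation=0.0)            # direction along x
tau_broadside = tdoa(m1, m2, azimuth=math.pi / 2, elevation=0.0)  # direction along y
```

    For a direction along the line joining the microphones, the delay equals the spacing divided by the speed of sound, roughly 0.58 ms here. For a direction perpendicular to that line, the delay vanishes.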

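    [0050] This time delay yields a corresponding phase delay $w_{i,j,\varphi,\theta}(\omega)$, which is conveniently represented by a complex exponential whose phase is the product of $2\pi$, the frequency $\omega$, and the delay, with $\mathrm{j}$ denoting the imaginary unit:

    $$w_{i,j,\varphi,\theta}(\omega) = \exp\bigl(\mathrm{j}\,2\pi\,\omega\,\tau_{i,j}(\varphi,\theta)\bigr)$$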

    [0051] Between any two microphones i and j, it is possible to define a cross spectral density by multiplying the conjugate of one microphone's spectrum with the other microphone's spectrum:


    $$\Gamma_{i,j}(t,\omega) = X_i(t,\omega)^{*} \cdot X_j(t,\omega)$$

    [0052] In some embodiments, it is useful to smooth the cross spectral density or to take an average over some time interval to obtain a more reliable estimate.
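
    A minimal sketch of the cross spectral density and of one possible smoothing scheme follows. The recursive (exponential) average and its factor `alpha` are assumptions. The text above calls only for smoothing or averaging over some time interval and does not prescribe a particular method.

```python
def cross_spectral_density(X_i, X_j):
    """Gamma_ij(w) = conj(X_i(w)) * X_j(w) for one frame of complex spectra."""
    return [xi.conjugate() * xj for xi, xj in zip(X_i, X_j)]

def smoothed_csd(frames_i, frames_j, alpha=0.8):
    """Recursive average over frames: G[t] = alpha * G[t-1] + (1 - alpha) * Gamma[t]."""
    smoothed = None
    history = []
    for X_i, X_j in zip(frames_i, frames_j):
        instantaneous = cross_spectral_density(X_i, X_j)
        if smoothed is None:
            smoothed = instantaneous          # first frame initializes the average
        else:
            smoothed = [alpha * s + (1.0 - alpha) * g
                        for s, g in zip(smoothed, instantaneous)]
        history.append(smoothed)
    return history

# Two frames, two frequency bins, with a constant phase offset between microphones.
frames_i = [[1 + 0j, 1j], [1 + 0j, 1j]]
frames_j = [[1j, 1j], [1j, 1j]]
csd = smoothed_csd(frames_i, frames_j)
```

    A larger `alpha` gives a steadier estimate at the cost of a slower response to a siren whose frequency is sweeping.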

    [0053] Using the above relations, the power in the direction of a plane wave arriving from azimuth φ and elevation θ in the frequency domain, hereafter referred to as the “steered response power,” is:

    [00003] $$\mathrm{SRP}_t(\varphi,\theta,\omega) = \sum_{i,j=1}^{N} w_{i,j,\varphi,\theta}(\omega)^{*} \cdot \Gamma_{i,j}(t,\omega)$$

    [0054] By summing over all frequencies, it is possible to obtain a total steered response power, $\mathrm{SRP}_t(\varphi,\theta)$:

    [00004] $$\mathrm{SRP}_t(\varphi,\theta) = \sum_{\omega} \mathrm{SRP}_t(\varphi,\theta,\omega)$$

    [0055] The estimate of direction-of-arrival is then obtained by identifying the direction-of-arrival that maximizes this total steered response power:


    $$(\varphi_{t,\max}, \theta_{t,\max}) = \operatorname*{arg\,max}_{\varphi,\theta}\ \mathrm{SRP}_t(\varphi,\theta)$$

    [0056] This provides an estimate for the azimuth $\varphi_{t,\max}$ and elevation $\theta_{t,\max}$ of the siren's direction-of-arrival at time $t$.
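
    The search for the maximizing direction can be sketched as a grid search over candidate azimuths and elevations. Everything specific in the following Python sketch is an assumption: the array geometry, the three frequencies, the grid spacing, and the noise-free simulated spectra. The steering weight is taken to be exp(j·2π·f·τ) with f in hertz. The simulated spectra assume that microphone k carries the phase −2π·f·(a·m_k)/v, so that the weights align every microphone pair at the true direction. Under the opposite sign convention the exponent changes sign.

```python
import cmath
import math

V_SOUND = 343.0  # assumed speed of sound in air, m/s

def direction_vector(az, el):
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def tdoa(m_i, m_j, az, el):
    """tau_ij = a^T (m_i - m_j) / v_sound."""
    a = direction_vector(az, el)
    return (dot(a, m_i) - dot(a, m_j)) / V_SOUND

def total_srp(spectra, mics, freqs_hz, az, el):
    """Steered response power for one frame, summed over pairs and frequencies."""
    total = 0.0
    for i, m_i in enumerate(mics):
        for j, m_j in enumerate(mics):
            tau = tdoa(m_i, m_j, az, el)
            for n, f in enumerate(freqs_hz):
                w = cmath.exp(2j * math.pi * f * tau)               # steering weight
                gamma = spectra[i][n].conjugate() * spectra[j][n]   # cross spectral density
                total += (w.conjugate() * gamma).real
    return total

def localize(spectra, mics, freqs_hz, az_grid, el_grid):
    """Return the (azimuth, elevation) candidate with the largest total SRP."""
    candidates = [(az, el) for az in az_grid for el in el_grid]
    return max(candidates, key=lambda d: total_srp(spectra, mics, freqs_hz, d[0], d[1]))

# Hypothetical square array with 0.2 m sides and three siren frequencies.
mics = [(0.1, 0.1, 0.0), (0.1, -0.1, 0.0), (-0.1, -0.1, 0.0), (-0.1, 0.1, 0.0)]
freqs_hz = [700.0, 1000.0, 1300.0]
true_az, true_el = math.radians(60.0), 0.0

# Simulated noise-free spectra for a plane wave with direction vector a_true.
a_true = direction_vector(true_az, true_el)
spectra = [[cmath.exp(-2j * math.pi * f * dot(a_true, m) / V_SOUND) for f in freqs_hz]
           for m in mics]

az_grid = [math.radians(d) for d in range(0, 360, 5)]
el_grid = [math.radians(d) for d in (0, 15, 30)]
est_az, est_el = localize(spectra, mics, freqs_hz, az_grid, el_grid)
```

    Because the sum runs over all ordered pairs, each term is matched by its complex conjugate and the total is real. A practical implementation would likely precompute the weights for the whole grid rather than recompute them for every frame.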

    [0057] In other embodiments, it is useful to apply a time-varying mask function to weight the cross spectral density with a value indicative of the confidence in the result. An example of such a mask M(t, ω) is:

    [00005] $$M(t,\omega) = \begin{cases} 1, & \text{siren detected at time } t \text{ and frequency } \omega \\ 0, & \text{otherwise} \end{cases}$$

    [0058] Multiplying the above mask with the cross spectral density $\Gamma_{i,j}(t,\omega)$ yields a modified cross spectral density:

    $$\tilde{\Gamma}_{i,j}(t,\omega) = M(t,\omega) \cdot \Gamma_{i,j}(t,\omega)$$

    [0059] The mask need not be a binary function as shown. In general, the mask M(t, ω) is a value between zero and unity that conveys the certainty or confidence that a siren was active at time t and frequency ω. A value of unity in such a case would mean high confidence of a detected siren and a value of zero would mean very low confidence of a detected siren.
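
    Both the binary and the confidence-valued mask can be applied with the same element-wise product, as the following sketch suggests. The helper names, the threshold of 0.5, and the sample values are assumptions made for illustration.

```python
def binary_mask(confidence, threshold=0.5):
    """Turn per-bin confidences in [0, 1] into a 0/1 mask."""
    return [[1.0 if c >= threshold else 0.0 for c in row] for row in confidence]

def apply_mask(mask, gamma):
    """Modified cross spectral density: M(t, w) * Gamma_ij(t, w) per frame t and bin w."""
    return [[m * g for m, g in zip(m_row, g_row)] for m_row, g_row in zip(mask, gamma)]

# Hypothetical detector confidences for 2 frames x 3 frequency bins.
confidence = [[0.9, 0.1, 0.0],
              [0.2, 0.8, 0.0]]
gamma = [[1 + 1j, 2 + 0j, 3j],
         [1 - 1j, 0.5j, 4 + 0j]]

soft = apply_mask(confidence, gamma)               # confidence-weighted
hard = apply_mask(binary_mask(confidence), gamma)  # binary
```

    With the binary mask, only the bin flagged in each frame survives. With the soft mask, every bin contributes in proportion to the detector's confidence.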

    [0060] In other embodiments, the feature relied upon is differential power. In this model, microphones 34 that are further from the siren source 14 output a signal with lower power than those closer to the siren source 14.

    [0061] To the extent that the siren 10 is the loudest sound in the environment, determining the siren's direction-of-arrival can be carried out without having to consider the siren's characteristic spectrum.

    [0062] However, in many cases, particularly when the siren is still far away, ambient traffic noise easily overwhelms the siren 10. This ambient traffic noise, which is distributed over a broad range of frequencies, hinders the localizer's operation. To overcome this difficulty, embodiments described herein rely at least in part on the siren's known time-varying spectrum, or on a rough model thereof based on dynamic system models or on recurring lines in the spectrogram.

    [0063] The localizer 38 exploits the fact that the siren 10 is band-limited. Therefore, the siren 10 exists in only a limited portion of the acoustic spectrum. As such, instead of processing interfering noise across a broad swath of frequencies, the localizer 38 filters out those components of the microphones' signals that are outside a limited portion of the frequency spectrum that is expected to also include the frequencies of the siren 10. The localizer 38 therefore essentially ignores those frequencies that are outside the band occupied by the siren 10. Instead, the localizer 38 processes only those components of the microphones' signals that are within those portions of the frequency spectrum that would be expected to also include the siren.

    [0064] However, in order to retain only the siren's frequencies, the localizer 38 must know what those frequencies actually are. It learns what these frequencies are from the information provided at the detector's second output 40.

    [0065] Referring now to FIG. 5, the detector 36 begins by acquiring signals from the microphones 34 (step 42). Having done so, the detector 36 proceeds to calculate a short-term spectrum of the microphones' signals (step 44). In some practices of the illustrated method, the short-term spectrum is computed using a Fourier transform.
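
    One way to picture step 44 is a windowed discrete Fourier transform applied to overlapping frames. The sketch below is only an illustration: the frame length, the hop size, the Hann window, the 8 kHz sampling rate, and the test tone are all assumptions, and the direct DFT is written out for clarity rather than speed.

```python
import cmath
import math

def short_term_spectrum(signal, frame_len=64, hop=32):
    """Hann-windowed DFT of overlapping frames.

    Returns frames[t][k] for frequency bins k = 0 .. frame_len // 2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames

# A 1 kHz test tone sampled at an assumed rate of 8 kHz.
fs = 8000.0
tone = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(256)]
spectrum = short_term_spectrum(tone)
peak_bin = max(range(len(spectrum[0])), key=lambda k: abs(spectrum[0][k]))
peak_hz = peak_bin * fs / 64
```

    Each row of `spectrum` is one time frame. The squared magnitudes of its entries give the energy per time-frequency bin.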

    [0066] In some practices, the detector 36 determines energy values for each time-frequency bin of each microphone signal. However, in other practices, the detector 36 simply determines the fundamental frequency and thus avoids having to inspect each time-frequency bin.

    [0067] The detector 36 then uses the energy spectrum of the signal provided by one of the microphones 34 to detect the existence of a siren 10 (step 46) and to then identify those frequencies at which the siren 10 is active (step 48).
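
    As a deliberately crude stand-in for step 48, the following sketch marks as active those bins whose energy stands well above a median-based noise floor. This is not the line-detection procedure described earlier. The threshold ratio, the function names, and the sample energies are assumptions chosen only to show what kind of information the localizer 38 receives.

```python
def active_bins(energy, ratio=10.0):
    """Indices of bins whose energy exceeds `ratio` times the median bin energy."""
    ordered = sorted(energy)
    floor = ordered[len(ordered) // 2]      # median as a crude noise-floor estimate
    return [k for k, e in enumerate(energy) if e > ratio * floor]

def bins_to_hz(bins, fs, n_fft):
    """Convert DFT bin indices to frequencies in hertz."""
    return [k * fs / n_fft for k in bins]

# Hypothetical energy spectrum of one frame from one microphone (9 bins, n_fft = 16).
energy = [1.0, 1.2, 0.9, 50.0, 1.1, 1.0, 40.0, 0.8, 1.0]
bins = active_bins(energy)
freqs_hz = bins_to_hz(bins, fs=8000.0, n_fft=16)
```

    The resulting list of bins, or a mask built from it, is what the localizer would then use to restrict its processing.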

    [0068] The detector 36 provides the foregoing information to the localizer 38, which then proceeds to use this information, together with the signals from all the microphones 34, to localize the siren (step 50). It does so by only using those frequencies that have been identified by the detector 36 as being occupied by an active siren's spectrum.

    [0069] Some practices of the illustrated method feature the use of bandpass filters to filter signals from the microphones. These bandpass filters are tuned to pass one or more frequencies or frequency bands of the siren. A direction-finding procedure then operates on the filtered outputs of the microphones. Some practices combine the filtering and direction-finding operations. An example of a suitable technique for combining such operations includes generalized cross-correlation.
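
    The combination of filtering and direction finding can be illustrated with a generalized cross-correlation that is evaluated only over the frequency bins inside the siren's band. The sketch below estimates the delay between two microphone signals. The phase-transform (PHAT) weighting, the band limits, the synthetic sweep, and the circular delay are assumptions made so that the example is small and self-checking.

```python
import cmath
import math

def half_dft(x):
    """DFT of a real sequence for bins 0 .. len(x) // 2."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts // 2 + 1)]

def band_limited_gcc_delay(x1, x2, fs, band_hz, max_lag):
    """Delay of x2 relative to x1, in samples, using only bins inside band_hz."""
    n_pts = len(x1)
    X1, X2 = half_dft(x1), half_dft(x2)
    lo, hi = band_hz
    bins = [k for k in range(len(X1)) if lo <= k * fs / n_pts <= hi]
    phase = {}
    for k in bins:
        g = X1[k].conjugate() * X2[k]
        phase[k] = g / (abs(g) + 1e-12)      # PHAT weighting keeps only the phase

    def correlation(lag):
        return sum((phase[k] * cmath.exp(2j * math.pi * k * lag / n_pts)).real
                   for k in bins)

    return max(range(-max_lag, max_lag + 1), key=correlation)

# Synthetic sweep from 500 Hz to 1500 Hz and a copy delayed circularly by 5 samples.
fs, n_pts, true_delay = 8000.0, 128, 5
duration = n_pts / fs
x1 = [math.sin(2 * math.pi * (500.0 * t + 0.5 * (1000.0 / duration) * t * t))
      for t in (n / fs for n in range(n_pts))]
x2 = [x1[(n - true_delay) % n_pts] for n in range(n_pts)]

delay = band_limited_gcc_delay(x1, x2, fs, band_hz=(600.0, 1400.0), max_lag=16)
```

    Restricting the sum to the bins in `band_hz` plays the role of the bandpass filter, so no separate filtering pass is needed before the correlation.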

    [0070] Upon completion, the localizer 38 then generates and displays the result of localization for the driver's benefit (step 52). In autonomous vehicles, the localization result is forwarded to the autonomous driving system, where it is used to determine whether an evasive maneuver is necessary or to plan the best route to yield to the emergency vehicle.

    [0071] Suitable methods used by the localizer 38 to carry out direction finding include inspection of cross correlations between signals from the individual microphones 34. Such cross-correlation methods include generalized cross correlation. A particularly useful form of cross correlation is one in which the integration of the signals is weighted based on information that is known about the type of siren 10 that has been detected by the detector 36. Moreover, the ability to exploit the known spectrum of the siren 10 in this way is also advantageous when applied to other direction-finding methods.

    [0072] Other methods used by the localizer 38 include a steered response power method. In such cases, the localizer 38 steers the array 30, 32 across multiple directions-of-arrival and, for each such direction, determines the power received from that direction in some band of frequencies.

    [0073] After having swept across the various candidate directions-of-arrival, the localizer 38 then identifies the direction having the highest incident power as the best estimate for the siren's direction-of-arrival. Information concerning the siren's spectrum provides a basis for assigning the aforementioned mask that weights the various frequencies based on how likely it is that energy from a siren 10 is present at that frequency.

    [0074] The weight can be one that varies continuously as a function of some parameter, such as signal energy or noise estimates. However, for cases in which the signal-of-interest is sparse when plotted in the time-frequency plane, as is the case for the sirens 10 shown in FIGS. 2 and 3, it is useful to use a binary-weighting method that simply excludes those time-frequency bins that do not contain energy from the siren 10.

    [0075] Although the foregoing procedure can be carried out with a single array, the existence of two or more arrays disposed around the automobile 12 provides the opportunity to avoid effects due to shading. Such shading arises from interference from the automobile 12 itself when the automobile 12 lies between the siren source and the array. Thus, in the embodiment shown in FIG. 1, it is preferable to use the first array 30 when the siren source lies ahead of the automobile 12 and to use the second array 32 when the siren source 14 lies behind the automobile 12.

    [0076] In embodiments that choose between the first and second arrays 30, 32, the act of choosing itself requires a coarse localization step. One must, after all, first know which half plane the siren source 14 lies in. One way to achieve this is to exploit acoustic shadowing.

    [0077] In some cases, the power received at microphones 34 that are located on different sides of the automobile 12 can differ by between twenty and thirty decibels depending on the siren's direction-of-arrival. This occurs, for example, as a result of acoustic shadowing. Such a result is particularly likely when the microphones have been integrated into the automobile's body.

    [0078] Some embodiments exploit acoustic shadowing by determining signal power ratios between microphone pairs (i,j):

    [00006] $$\mathrm{SPR}_{i,j}(t) = \sum_{\omega} M(t,\omega) \cdot 20\log_{10}\!\left(\frac{|X_i(t,\omega)|}{|X_j(t,\omega)|}\right)$$

    [0079] If the power ratio between signals received by microphones $i$ and $j$ exceeds a threshold, for example five decibels or ten decibels, the siren 10 is more likely to come from a direction that faces the automobile's body near the location of the $i$-th microphone rather than from the direction nearer the $j$-th microphone.
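
    A sketch of the signal power ratio and of the threshold test follows. The first function follows the masked sum over frequencies, using spectral magnitudes. Dividing by the number of active bins before comparing against the threshold is an added assumption, made so that the threshold does not depend on how many bins the mask selects. The function names, the small constant `eps`, and the sample spectra are likewise illustrative.

```python
import math

def signal_power_ratio_db(X_i, X_j, mask, eps=1e-12):
    """Masked sum over frequency bins of 20*log10(|X_i| / |X_j|), in decibels."""
    return sum(m * 20.0 * math.log10((abs(xi) + eps) / (abs(xj) + eps))
               for xi, xj, m in zip(X_i, X_j, mask))

def facing_side(X_i, X_j, mask, threshold_db=10.0):
    """Return 'i', 'j', or None, based on the mean ratio over the masked bins."""
    active = sum(mask)
    if active == 0:
        return None                      # no siren bins, so no decision
    mean_db = signal_power_ratio_db(X_i, X_j, mask) / active
    if mean_db > threshold_db:
        return "i"
    if mean_db < -threshold_db:
        return "j"
    return None

# Hypothetical spectra for one frame; the siren is assumed active in the first two bins.
X_i = [10 + 0j, 0 + 10j, 1 + 0j]
X_j = [1 + 0j, 1 + 0j, 5 + 0j]
mask = [1.0, 1.0, 0.0]
spr = signal_power_ratio_db(X_i, X_j, mask)
side = facing_side(X_i, X_j, mask)
```

    The third bin, where microphone j is the louder one, is excluded by the mask and therefore does not influence the decision.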

    [0080] By taking advantage of acoustic shadowing, it is possible to obtain a rough estimate of the siren's direction-of-arrival. It is also possible to use acoustic shadowing as a basis for selecting which of several microphone arrays distributed around the automobile 12 should be used to obtain a more accurate estimate using the steered response power method described earlier.

    [0081] Referring to FIG. 6, a system for using multiple arrays 30, 32 includes an array selector 54 that functions as a multiplexer for connecting an appropriately selected one of the arrays 30, 32 to the system shown in FIG. 4. In some embodiments, the array selector 54 compares the power received at the two arrays 30, 32 and chooses whichever array has the higher-power signal.
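
    A power comparison of this kind can be sketched as follows. Restricting the comparison to the bins in which the siren is believed to be active is an assumption added here so that loud noise outside the siren's band does not sway the choice. The array names, the data layout, and the sample values are hypothetical.

```python
def masked_power(spectra, mask):
    """Sum of M(w) * |X(w)|^2 over all microphones of one array."""
    return sum(m * abs(x) ** 2 for spectrum in spectra for x, m in zip(spectrum, mask))

def select_array(arrays, mask):
    """arrays maps an array name to the list of its microphones' spectra."""
    return max(arrays, key=lambda name: masked_power(arrays[name], mask))

arrays = {
    "front": [[0.1 + 0j, 2.0 + 0j], [0.1j, 1.8 + 0j]],
    "rear": [[3.0 + 0j, 0.2 + 0j], [2.5j, 0.3 + 0j]],   # loud only in a non-siren bin
}
mask = [0.0, 1.0]   # siren assumed active only in the second bin
chosen = select_array(arrays, mask)
unmasked_choice = select_array(arrays, [1.0, 1.0])
```

    In this example the masked comparison picks the front array, whereas a comparison over all bins would pick the rear array because of the energy in the first bin.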