METHOD AND APPARATUS FOR RECOGNIZING SOUND SOURCE
20220132263 · 2022-04-28
Assignee
Inventors
CPC classification
G08G1/0965
PHYSICS
H04S7/302
ELECTRICITY
H04R5/04
ELECTRICITY
H04R5/027
ELECTRICITY
International classification
H04S7/00
ELECTRICITY
H04R5/027
ELECTRICITY
H04R5/04
ELECTRICITY
Abstract
The present invention relates to a method and apparatus for recognizing a sound source. According to the sound source recognition method of the present invention, acoustic signals are detected by four acoustic sensors arranged in a rectangular shape when viewed in a horizontal direction, sound arrival times are measured, six interaural time differences (ITDs) are generated from the differences in sound arrival time between the respective acoustic sensors, and the location of a sound source is estimated based on the six ITDs. In addition, the type of the sound source is determined by extracting and classifying features of the sound source using a sum signal from the four acoustic sensors.
Claims
1-8. (canceled)
9. A method for recognizing a sound source, the method comprising: detecting acoustic signals from at least four acoustic sensors, wherein four of the acoustic sensors are arranged at vertices A, B, C, and D of a specific rectangle, respectively, when viewed in a horizontal direction, and acoustic signals detected by the four acoustic sensors are referred to as A(s), B(s), C(s), and D(s), respectively; calculating sound arrival times from the respective acoustic signals A(s), B(s), C(s), and D(s); generating six interaural time differences (ITDs) based on a difference in sound arrival time between the acoustic sensor at vertex A and the acoustic sensor at vertex B, between the acoustic sensor at vertex B and the acoustic sensor at vertex C, between the acoustic sensor at vertex C and the acoustic sensor at vertex D, between the acoustic sensor at vertex D and the acoustic sensor at vertex A, between the acoustic sensor at vertex A and the acoustic sensor at vertex C, and between the acoustic sensor at vertex B and the acoustic sensor at vertex D; and estimating a location of a sound source based on at least two of the six ITDs.
10. The method of claim 9, further comprising: calculating at least one of signals y(s), f(s), b(s), l(s), r(s), cl(s), cr(s), p(s), or q(s) by combining the acoustic signals A(s), B(s), C(s), or D(s) detected by the four acoustic sensors, as follows: y(s)=A(s)+B(s)+C(s)+D(s); f(s)=A(s)+B(s); b(s)=C(s)+D(s); l(s)=A(s)+D(s); r(s)=B(s)+C(s); cl(s)=A(s)+C(s); cr(s)=B(s)+D(s); p(s)=f(s)−b(s); and q(s)=l(s)−r(s); and estimating at least one of a volume level, direction, and moving direction of the sound source based on the signals y(s), f(s), b(s), l(s), r(s), cl(s), cr(s), p(s), or q(s).
11. The method of claim 9, further comprising determining a type of sound source by extracting a feature of the sound source using the signal y(s), which is a sum signal of the four sound signals, and classifying the extracted feature.
12. The method of claim 9, wherein: estimating the location of the sound source comprises: calculating an azimuth angle θ.sub.1 formed by a line connecting a first pair of acoustic sensors of the four acoustic sensors and the sound source based on an ITD between the first pair of acoustic sensors; calculating an azimuth angle θ.sub.2 formed by a line connecting a second pair of acoustic sensors of the four acoustic sensors and the sound source based on an ITD between the second pair of acoustic sensors, wherein the first pair of acoustic sensors and the second pair of acoustic sensors share one acoustic sensor; and calculating a distance to the sound source using the calculated azimuth angles θ.sub.1 and θ.sub.2, a distance between the first pair of acoustic sensors, and a distance between the second pair of acoustic sensors.
13. The method of claim 12, wherein calculating the distance to the sound source further comprises correcting an error by adopting an error correction function in order to correct the error occurring in the calculation of the distance to the sound source.
14. The method of claim 9, further comprising, before detecting the acoustic signals, performing trimming so that the same output signals are output from the four acoustic sensors in a state without an input signal, in order to perform initialization.
15. The method of claim 9, wherein detecting the acoustic signals comprises: canceling voice signals from signals input to the four acoustic sensors.
16. The method of claim 15, wherein detecting the acoustic signals comprises: removing noise signals common to the four acoustic sensors from the signals of the four acoustic sensors from which the voice signals have been canceled, and outputting resulting acoustic signals.
17. The method of claim 9, wherein at least one of the four acoustic sensors is disposed at a height different from that of remaining acoustic sensors.
18. The method of claim 9, wherein the acoustic sensors comprise noise shielding acoustic sensors around each of which a shield block is installed to suppress unnecessary signals.
19. An apparatus for recognizing a sound source, the apparatus comprising: at least four acoustic sensors configured to detect acoustic signals, wherein four of the acoustic sensors are arranged at vertices A, B, C, and D of a specific rectangle, respectively, when viewed in a horizontal direction, and acoustic signals detected by the four acoustic sensors are referred to as A(s), B(s), C(s), and D(s), respectively; a sound arrival time measurement unit configured to calculate sound arrival times from the respective acoustic signals A(s), B(s), C(s), and D(s); an ITD generation unit configured to generate six interaural time differences (ITDs) based on a difference in sound arrival time between the acoustic sensor at vertex A and the acoustic sensor at vertex B, between the acoustic sensor at vertex B and the acoustic sensor at vertex C, between the acoustic sensor at vertex C and the acoustic sensor at vertex D, between the acoustic sensor at vertex D and the acoustic sensor at vertex A, between the acoustic sensor at vertex A and the acoustic sensor at vertex C, and between the acoustic sensor at vertex B and the acoustic sensor at vertex D; and a sound source location estimation unit configured to estimate a location of a sound source based on at least two of the six ITDs.
20. A non-transitory computer-readable medium storing codes for causing a computer to perform the method of claim 9.
Description
DESCRIPTION OF DRAWINGS
MODE FOR INVENTION
[0032] Embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily implement them. The present invention may be embodied in many different forms and is not limited to the embodiments described herein. In order to clearly describe the present invention in the drawings, parts not related to the gist of the present invention are omitted, and the same reference numerals are assigned to the same or similar components throughout the specification.
[0033] The present invention will be described below based on a method for recognizing a sound source in an autonomous vehicle, but is not limited thereto. The method for recognizing a sound source according to the present invention may be applied to any apparatus or system requiring the recognition of a sound source, such as a robot or an AI speaker.
[0035] Although the four acoustic sensors may be disposed at the same height, at least one of the four acoustic sensors may be disposed at a height different from that of the other acoustic sensors in order to completely eliminate blind spots. By disposing at least one of the four acoustic sensors at a different height, it becomes possible to generate two or more ITDs in any direction. For example, only one of the four acoustic sensors may be disposed at a different height, two of the acoustic sensors may be disposed at a different height from the other two, or all four acoustic sensors may be disposed at different heights. In addition, in order to suppress unnecessary signals inside and outside the vehicle, a shield block may be installed around each of the acoustic sensors. For example, unnecessary noise such as wind noise may be blocked by the shield block.
[0036] Hereinafter, acoustic signals detected by the four acoustic sensors are referred to as A(s), B(s), C(s), and D(s), respectively.
[0038] The acoustic sensors installed as shown in
[0039] After the initialization through automatic trimming, the method for recognizing a sound source starts with step S200 of detecting acoustic signals A(s), B(s), C(s), and D(s) using the four acoustic sensors.
[0040] Sound sources of interest in connection with an autonomous vehicle are sounds coming from the outside of the vehicle such as the sirens of an ambulance, a fire truck and a traffic control police car, the drone sounds of a drone taxi and a police drone flying in the sky, and the sound of a motorcycle driving around the autonomous vehicle. Accordingly, for the purpose of improving sound source recognition rate, the acoustic signal detection step S200 may include step S210 of canceling voice signals such as human voice or music inside a vehicle from signals input to the acoustic sensors.
[0041] Furthermore, the acoustic signal detection step S200 may further include step S220 of removing, through mutual cancellation, noise signals such as random noise common to the channels from the signals of the four acoustic sensors from which the voice signals have been canceled. To suppress noise signals through mutual cancellation at the removal step S220, a filter may be constructed for each channel by applying a reference signal, corresponding to the signal to be detected, to a band-pass filter that passes only that signal. For example, random noise such as the tire friction sound generated while a vehicle is driving is not a meaningful signal, so it is preferable to filter it out and attenuate it in advance and then output the desired signal. In this case, noise may also be suppressed by a waveform smoothing method such as a moving average. When the noise signals are mutually canceled in this manner, the amount of data to be processed by the apparatus or system that performs the method for recognizing a sound source is reduced. In addition, the sound source recognition rate may be improved because only signals having a large weight are detected at the noise signal removal step S220.
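As an illustration only (not part of the claimed method), the moving-average waveform smoothing mentioned above may be sketched as follows; the function name and default window length are assumptions of this example.

```python
import numpy as np

def moving_average_smooth(signal: np.ndarray, window: int = 5) -> np.ndarray:
    """Suppress random noise (e.g., tire friction sound) by averaging each
    sample with its neighbors over a short sliding window."""
    kernel = np.ones(window) / window
    # mode="same" keeps the output aligned sample-for-sample with the input
    return np.convolve(signal, kernel, mode="same")
```

A larger window attenuates random noise more strongly but also smears fast transients, so the window length is a tuning parameter of the system.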
[0042] Thereafter, there is performed step S300 of measuring at least one of sound arrival time, arrival intensity, and frequency from each of the acoustic signals A(s), B(s), C(s), and D(s). The sound arrival time and the arrival intensity are then used in a sound source location recognition step S500 through the generation of an interaural time difference (ITD) or an interaural level difference (ILD). The frequency may be used to calculate the weight of the ITD or ILD.
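The description does not tie the arrival-time measurement of step S300 to a specific algorithm. One common approach, shown here only as a hedged sketch, estimates the difference in arrival time between two channels directly from the peak of their cross-correlation; the function name is an assumption of this example.

```python
import numpy as np

def itd_samples(sig_a: np.ndarray, sig_b: np.ndarray) -> int:
    """Estimate the difference in sound arrival time (in samples) between
    two channels from the peak of their cross-correlation; a positive
    result means the sound reached sig_a before sig_b."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # index (len(sig_a) - 1) corresponds to zero lag in "full" mode
    return int(np.argmax(corr) - (len(sig_a) - 1))
```

Dividing the returned sample lag by the sampling rate yields the ITD in seconds, which is the quantity used in the location recognition step.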
[0043] Then, a sound source volume level, direction, and moving direction recognition step S400, the sound source location recognition step S500, and a sound source type recognition step S600 may be performed simultaneously in parallel, or may be performed sequentially. When steps S400, S500, and S600 are performed simultaneously in parallel, recognition time is shortened.
[0044] First, the sound source volume level, direction, and moving direction recognition step S400 is discussed. Step S410 of calculating the signals y(s), f(s), b(s), l(s), r(s), cl(s), cr(s), p(s), and q(s) by combining the acoustic signals A(s), B(s), C(s), and D(s) detected by the four acoustic sensors is performed.
[0045] y(s) is the sum signal of the four acoustic signals, and is calculated as follows:
y(s)=A(s)+B(s)+C(s)+D(s) [Equation 1]
[0046] f(s) represents a front signal, b(s) represents a back signal, and they are calculated as follows, respectively:
f(s)=A(s)+B(s) [Equation 2]
b(s)=C(s)+D(s) [Equation 3]
[0047] l(s) represents a left signal, r(s) represents a right signal, and they are calculated as follows, respectively:
l(s)=A(s)+D(s) [Equation 4]
r(s)=B(s)+C(s) [Equation 5]
[0048] cl(s) represents a left cross signal, cr(s) represents a right cross signal, and they are calculated as follows, respectively:
cl(s)=A(s)+C(s) [Equation 6]
cr(s)=B(s)+D(s) [Equation 7]
[0049] p(s) represents the signal difference between the front and back signals, q(s) represents the signal difference between the left and right signals, and they are calculated as follows, respectively:
p(s)=f(s)−b(s) [Equation 8]
q(s)=l(s)−r(s) [Equation 9]
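The combinations of [Equation 1] to [Equation 9] may be rendered as the following illustrative sketch; the function name and the dictionary layout of the result are assumptions of this example, not part of the described method.

```python
import numpy as np

def combine_channels(A, B, C, D):
    """Form the combined signals of [Equation 1] to [Equation 9] from the
    four channel signals A(s), B(s), C(s), and D(s)."""
    A, B, C, D = (np.asarray(x, dtype=float) for x in (A, B, C, D))
    y = A + B + C + D          # sum signal, Equation 1
    f, b = A + B, C + D        # front / back, Equations 2-3
    l, r = A + D, B + C        # left / right, Equations 4-5
    cl, cr = A + C, B + D      # left / right cross, Equations 6-7
    p, q = f - b, l - r        # front-back / left-right differences, Equations 8-9
    return {"y": y, "f": f, "b": b, "l": l, "r": r,
            "cl": cl, "cr": cr, "p": p, "q": q}
```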
[0050] Then, there is performed step S420 of estimating the volume level, direction, and moving direction of the sound source based on the signals y(s), f(s), b(s), l(s), r(s), cl(s), cr(s), p(s), and q(s). In other words, the direction in which the sound source is generated, i.e., a forward, backward, leftward, or rightward direction, may be recognized by performing comparative analysis on the signals. For example, when f(s) is higher than b(s), it may be recognized that there is the sound source on the front side. In contrast, when l(s) is higher than r(s), it may be recognized that there is the sound source on the left side. In addition, the volume level of the sound source for each channel and the sum volume level of the sound source may be recognized by performing comparative analysis on the signals. In this case, the value of the signal y(s) is regarded as the volume level.
[0051] Furthermore, the value of the signal y(s) at a specific point in time and the value of the signal y(s) at a subsequent point in time may be compared with each other. When the value increases, the sound source is moving toward the autonomous vehicle; when the value decreases, the sound source is moving away from the autonomous vehicle. In addition, not only the direction in which the sound source is generated but also the direction in which it moves may be determined by performing comparative calculation on the signals f(s), b(s), l(s), r(s), cl(s), cr(s), p(s), and q(s) according to [Equation 2] to [Equation 9]. For example, when the signal f(s) is lower than b(s), gradually becomes equal to b(s), and then becomes higher than b(s), it can be seen that the sound source moves from the back to the front.
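The comparative analysis described above may be sketched as follows. The use of RMS level as the comparison measure and all function names are assumptions of this example.

```python
import numpy as np

def rms(x) -> float:
    """Root-mean-square level of a signal segment."""
    return float(np.sqrt(np.mean(np.square(x))))

def estimate_direction(f, b, l, r):
    """Coarse source direction by comparative analysis: a stronger front
    signal f(s) than back signal b(s) suggests a source ahead, etc."""
    fb = "front" if rms(f) > rms(b) else "back"
    lr = "left" if rms(l) > rms(r) else "right"
    return fb, lr

def moving_direction(y_prev, y_now) -> str:
    """Compare the sum signal y(s) at two points in time: a growing level
    suggests the source is approaching, a shrinking level that it recedes."""
    return "approaching" if rms(y_now) > rms(y_prev) else "receding"
```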
[0052] According to an embodiment of the present invention, the volume level of the sound source may be recognized by the signal y(s), which is the sum signal of the four acoustic signals, regardless of the direction.
[0053] The volume level, direction, and moving direction of the sound source are output to a host, which is the system host of the apparatus for recognizing a sound source, at step S430.
[0054] Now, there will be described step S500 of recognizing the location of the sound source.
[0055] In this case, the location of the sound source means the location of the sound source based on the azimuth angle and distance of the sound source.
[0056] To recognize the location of the sound source, first, there is performed step S510 of generating six ITDs or ILDs based on a difference in sound arrival time or sound arrival intensity between the front left (A) acoustic sensor and the front right (B) acoustic sensor, a difference in sound arrival time or sound arrival intensity between the front right (B) acoustic sensor and the back right (C) acoustic sensor, a difference in sound arrival time or sound arrival intensity between the back right (C) acoustic sensor and back left (D) acoustic sensor, a difference in sound arrival time or sound arrival intensity between the back left (D) acoustic sensor and the front left (A) acoustic sensor, a difference in sound arrival time or sound arrival intensity between the front left (A) acoustic sensor and the back right (C) acoustic sensor, and a difference in sound arrival time or sound arrival intensity between the front right (B) acoustic sensor and the back left (D) acoustic sensor.
[0057] Next, step S520 of estimating the location of the sound source based on at least two of the six generated ITDs or ILDs is performed. A method for estimating the location of a sound source based on ITDs according to an embodiment of the present invention will be described below.
[0058] When an ITD is generated based on a difference in sound arrival time from an acoustic signal, an azimuth angle θ formed by a line connecting the sound source and a center between two acoustic sensors and a horizontal line connecting the two acoustic sensors can be obtained from a distance R between two of the four acoustic sensors and the speed c (about 340 m/s) of sound traveling in the air, as follows:
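The equation itself is not reproduced in this text. Under the stated far-field assumption, the standard relation ITD = R·cos(θ)/c, i.e., θ = arccos(c·ITD/R), is consistent with the description and may be sketched as follows; the clipping guard and function name are additions of this example.

```python
import math

C_SOUND = 340.0  # speed of sound in air, m/s (as stated above)

def azimuth_from_itd(itd_seconds: float, sensor_distance_m: float) -> float:
    """Far-field azimuth (radians) between the sensor baseline and the
    direction to the source, from theta = arccos(c * ITD / R)."""
    ratio = C_SOUND * itd_seconds / sensor_distance_m
    ratio = max(-1.0, min(1.0, ratio))  # clip to the arccos domain to guard rounding
    return math.acos(ratio)
```

An ITD of zero gives θ = 90°, i.e., the source lies on the perpendicular bisector of the baseline, which is the degenerate case the diagonal sensor pairs address later in the description.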
[0059] The remaining azimuth angles can also be calculated in the same manner.
[0061] Meanwhile, since it is assumed that the sound source is spaced apart by an infinite distance, i.e., it is assumed that a sound arrives in parallel from the sound source to both acoustic sensors, the angle formed by a line connecting the sound source and a center between the two acoustic sensors and a horizontal line connecting the two acoustic sensors is considered to be the same as the angle formed by a line connecting the sound source and each of the two acoustic sensors and the horizontal line connecting the two acoustic sensors. Accordingly, in
[0064] In addition, when the values of the two azimuth angles θ are obtained in the same dimension, the location of the sound source may be estimated by calculation. In other words, two azimuth angles θ between the sound source and the acoustic sensors may be obtained from two ITDs generated from the arrival times of signals detected by three of the four acoustic sensors and from the dimensions of the structure of a given autonomous vehicle, and the distance to the sound source may be calculated using the values of the azimuth angles θ. In the case where one of the three acoustic sensors generating the two ITDs is not located on the same plane but at a different height, θ may be obtained by substitution onto the same plane through a simulation using trigonometry.
[0065] More specifically, assuming that acoustic sensors are arranged at respective corners of a vehicle, the distance from the vehicle having a width VW and a length VL to the sound source may be calculated by applying the above-described ITDs. A method of obtaining the distance D.sub.1 between the sound source and the back left (D) acoustic sensor will be described below with reference to
[0066] D.sub.1 is the estimated distance between the sound source and the back left (D) acoustic sensor, and D.sub.2 (=d.sub.11) is the estimated distance between the sound source and the front left (A) acoustic sensor. d.sub.12 is the distance over which an acoustic signal travels further to reach the back left (D) acoustic sensor from the location at which it arrives at the front left (A) acoustic sensor. Accordingly, D.sub.1 can be obtained as the sum of d.sub.11 and d.sub.12.
[0067] θ.sub.1 and θ.sub.2, which are necessary to obtain d.sub.11, are obtained based on the times at which the acoustic signal arrives at the front left (A) acoustic sensor, the front right (B) acoustic sensor, and the back left (D) acoustic sensor and on the distances between the acoustic sensors, as in the following [Equation 13] and [Equation 14]. In this case, since it is assumed that the acoustic sensors are arranged at the respective corners of the vehicle, the distance between the front left (A) acoustic sensor and the front right (B) acoustic sensor corresponds to the width VW of the vehicle, and the distance between the front left (A) acoustic sensor and the back left (D) acoustic sensor corresponds to the length VL of the vehicle. When not all acoustic sensors are arranged at the corners of the vehicle, the actual intervals between the acoustic sensors are used. t.sub.1 is the time when the acoustic signal generated from the sound source arrives at the front left (A) acoustic sensor, and t.sub.2 and t.sub.3 are the times when the acoustic signal arrives at the front right (B) acoustic sensor and the back left (D) acoustic sensor, respectively. The equations below are merely examples; when a different mathematical modeling method is used, they may be expressed in different forms.
[0068] The distance between the sound source and the back left (D) acoustic sensor can be obtained using the above-described method, and the distance between the sound source and the front left (A) acoustic sensor and the distance between the sound source and the back right (C) acoustic sensor based on the front right (B) acoustic sensor can also be calculated using the same method.
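Since [Equation 13] to [Equation 17] are not reproduced in this text, the following sketch illustrates one possible realization consistent with the description: two far-field bearings, obtained from the arrival times at the A-B pair and the A-D pair, are intersected to estimate the source position in the vehicle plane. The coordinate frame, the midpoint ray intersection, and all names are assumptions of this example, not the patent's exact equations.

```python
import math

C = 340.0  # speed of sound in air, m/s

def bearing(dt: float, baseline: float) -> float:
    """Far-field angle between a sensor baseline and the source direction,
    from the arrival-time difference dt across that baseline."""
    return math.acos(max(-1.0, min(1.0, C * dt / baseline)))

def locate(t_a: float, t_b: float, t_d: float, vw: float, vl: float):
    """Triangulate the source. Sensors: A=(0,0) front-left, B=(vw,0)
    front-right, D=(0,-vl) back-left. One bearing is measured on the A-B
    baseline, one on the A-D baseline, and the two rays are intersected."""
    th1 = bearing(t_a - t_b, vw)   # angle from the A->B axis (+x)
    th2 = bearing(t_d - t_a, vl)   # angle from the D->A axis (+y)
    # Ray 1 starts at the A-B midpoint, ray 2 at the A-D midpoint.
    p1 = (vw / 2.0, 0.0); u1 = (math.cos(th1), math.sin(th1))
    p2 = (0.0, -vl / 2.0); u2 = (math.sin(th2), math.cos(th2))
    # Solve p1 + s*u1 = p2 + t*u2 for s (nearly parallel rays from a very
    # distant source make this ill-conditioned, hence the error correction
    # discussed in the description).
    det = u1[0] * (-u2[1]) + u2[0] * u1[1]
    s = ((p2[0] - p1[0]) * (-u2[1]) + u2[0] * (p2[1] - p1[1])) / det
    return (p1[0] + s * u1[0], p1[1] + s * u1[1])
```

Because the bearings are computed under the far-field assumption while the rays are intersected at finite range, the estimate carries a systematic error that grows with distance, which is exactly the error the correction functions below are meant to absorb.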
[0069] The distances between the sound source and the acoustic sensors may be obtained as described above. However, in practice, the sound source is not located at infinity, and thus a distance obtained by the above method contains an error. In other words, although a basic model assumes that a sound source is located at infinity and considers that a line denoted by D.sub.1 and a line denoted by D.sub.add in
[0070] Alternatively, the error may be corrected as in [Equation 18] below:
[0071] In this case, C.sub.EA, C.sub.ED, and C.sub.ES are nonlinear error correction functions, and may be determined by a real distance-calculated distance comparison simulation or other calculations.
[0072] Since all four acoustic sensors are located at different locations, the arrival times of a sound entering the respective acoustic sensors are also different. In other words, ITDs are obtained using the time differences that occur when a sound arrives at the individual acoustic sensors from a sound source at asymmetric distances, the azimuth angles θ are obtained, and then the distance to the sound source is calculated by utilizing the given distances between the acoustic sensors, thereby recognizing the location at which the sound has been generated. In addition, according to the present invention, ITD.sub.R5 between the front left (A) acoustic sensor and the back right (C) acoustic sensor, which are located diagonally to each other, and ITD.sub.R6 between the front right (B) acoustic sensor and the back left (D) acoustic sensor may be generated. Accordingly, even when the sound source is placed at a location at which the values of ITD.sub.R1 to ITD.sub.R4 become 0, it is possible to recognize the location of the sound source without a blind spot because two or more ITDs can still be generated.
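The blind-spot argument can be illustrated numerically: for a source directly ahead on the vehicle's centerline, the side ITDs vanish while a diagonal ITD does not. The sensor coordinates and source position below are illustrative assumptions, not values from the description.

```python
import math

def itd(src, s1, s2, c=340.0):
    """Arrival-time difference (seconds) of a source between two sensors."""
    return (math.dist(src, s1) - math.dist(src, s2)) / c

# Illustrative rectangle of sensors (meters):
# A front-left, B front-right, C back-right, D back-left
A, B, C, D = (-1.0, 2.0), (1.0, 2.0), (1.0, -2.0), (-1.0, -2.0)
src = (0.0, 50.0)  # directly ahead on the vehicle's centerline

itd_ab = itd(src, A, B)  # zero: A and B are symmetric about the source
itd_cd = itd(src, C, D)  # zero as well
itd_ac = itd(src, A, C)  # nonzero: the diagonal pair still discriminates
```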
[0073] In addition, when acoustic sensors are disposed at different heights, azimuth angles θ may be obtained by substitution with the same plane through a simulation using trigonometry, and the distance to a sound source may be calculated based on the azimuth angles, thereby recognizing the location of the sound source.
[0074] The location of the sound source estimated as described above is output to the host at step S530.
[0075] Now, step S600 of recognizing the type of sound source will be described. Step S600 starts with step S610 of extracting the feature(s) of a sound source using the signal y(s), which is the sum signal of the four acoustic signals. The feature(s) may be extracted using a sound spectrogram technique, or using another acoustic signal feature extraction method, e.g., Mel-frequency cepstral coefficients (MFCCs). Then, step S620 of determining the type of sound source by classifying the extracted feature(s) is performed. At this determination step, the type of sound source may be determined by classifying the feature(s) using artificial intelligence such as deep neural networks (DNNs) and recognizing a target sound among overlapping sounds using a TensorFlow backend method or another scoring method (e.g., a weighting or labeling method of allocating a weight having a value between set minimum and maximum values and then performing calculation). Either a learning method or a non-learning method may be used to classify the sound source. Whether the sound source is, e.g., a siren sound, a drone sound, or a motorcycle sound may be determined by the sound source type determination step S620.
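As a hedged illustration of the non-learning classification path only (not the DNN/TensorFlow pipeline mentioned above), a nearest-centroid classifier over a crude spectral feature of y(s) might look as follows; the sample rate, reference sounds, and all names are assumptions of this example.

```python
import numpy as np

FS = 8000  # sample rate in Hz, assumed for this sketch

def feature(y: np.ndarray) -> np.ndarray:
    """Crude spectral feature of the sum signal y(s): the normalized
    magnitude spectrum (a stand-in for spectrogram or MFCC features)."""
    spec = np.abs(np.fft.rfft(y))
    return spec / (np.linalg.norm(spec) + 1e-12)

def classify(y: np.ndarray, centroids: dict) -> str:
    """Nearest-centroid classification: a simple non-learning method
    (the description equally allows learned classifiers such as DNNs)."""
    f = feature(y)
    return min(centroids, key=lambda name: np.linalg.norm(f - centroids[name]))

# Synthetic reference sounds standing in for recorded examples
t = np.arange(FS) / FS
centroids = {
    "siren": feature(np.sin(2 * np.pi * (600 + 300 * np.sin(2 * np.pi * 2 * t)) * t)),
    "motorcycle": feature(np.sin(2 * np.pi * 90 * t)),
}
```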
[0076] The type of sound source determined in the above manner is output to the host at step S630.
[0077] Although the sound source volume level, direction and moving direction recognition step S400, the sound source location recognition step S500, and the sound source type recognition step S600 have been described as being performed in parallel in this embodiment, they may be performed sequentially. The sequence of the steps illustrated in
[0079] The processing module 1200 includes components capable of performing the steps described above in conjunction with
[0080] Although the individual components of the processing module 1200 have been described as separate components, all the components may be combined and function as a single component, or only some components may be combined and function as a single component. However, as long as the above configurations perform the above-described functions, all the configurations fall within the scope of the present invention.
[0081] Since the above embodiments are only the most basic examples of the present invention, it should not be understood that the present invention is limited to the above embodiments, but it should be understood that the scope of the present invention must be defined based on the attached claims and equivalents thereto.