METHOD AND SYSTEM FOR AVOIDING HOWLING DISTURBANCE ON CONFERENCES
20220375445 · 2022-11-24
Inventors
Cpc classification
H04M3/568
ELECTRICITY
H04M3/002
ELECTRICITY
H04R3/02
ELECTRICITY
G10L25/18
PHYSICS
International classification
G10K11/16
PHYSICS
Abstract
A method and system for avoiding howling disturbance, especially on conferences. The method comprises the steps of: using a howling detector unit implemented inside a multipoint control unit to receive an audio stream input from a client; analyzing the audio input with the howling detector, using at least two of a skewness analysis, a flatness analysis, a crest analysis and a rolloff analysis, in order to verify whether howling noise is present; and preventing the audio stream input from being forwarded as an output to an audio mixer if howling noise is present.
Claims
1-15. (canceled)
16. A method for avoiding a howling disturbance on conferences, comprising: using a howling detector unit implemented inside a multipoint control unit to receive an audio stream input from a client; analyzing the audio input with the howling detector in order to verify whether howling noise is present, using at least two of a skewness analysis, a flatness analysis, a crest analysis, and a rolloff analysis; in response to detecting the howling noise, preventing the audio stream input to be forwarded as an output to an audio mixer.
17. The method of claim 16, wherein the analyzing of the audio input includes the skewness analysis, the skewness analysis being carried out by measuring an asymmetry of a distribution around a mean value.
18. The method of claim 16, wherein the analyzing of the audio input includes the flatness analysis, the flatness analysis being carried out by calculating the ratio of a geometric mean to an arithmetic mean and quantifying the audio input sound as a noise-like sound or as a tone-like sound.
19. The method of claim 16, wherein the analyzing of the audio input includes the crest analysis being carried out by calculating the ratio of the peak values within spectral frequencies to an arithmetic mean of an energy spectrum of the audio input.
20. The method of claim 16, wherein the analyzing of the audio input includes the rolloff analysis being carried out by calculating a frequency so that a pre-defined percentage of a signal energy is contained below a frequency of the audio input.
21. The method of claim 16, comprising: the multipoint control unit automatically starting a process to inform the one or more clients that are generating the howling noise disturbance and/or the other clients that are not generating the howling noise disturbance about the howling noise disturbance.
22. The method of claim 21, comprising: the client that generated the howling noise disturbance solving the problem locally so that no howling noise is present and the multipoint control unit automatically remixing the client to the audio stream again in response to the solving of the problem locally and informing this client and/or the other clients about the remixing.
23. A system for avoiding howling disturbance on conferences comprising: a multipoint control unit communicatively connectable to a plurality of clients, the clients including a first client and a second client; a howling noise detector configured to detect a howling noise, the howling detector being implemented inside the multipoint control unit to receive an audio stream input from each of the clients; a prevention unit configured to disable and/or enable a procedure to prevent the audio stream input to be forwarded to an audio mixer in response to detection of the howling noise in the audio stream input.
24. The system of claim 23, wherein the howling detector unit comprises: a skewness analysis unit; a flatness analysis unit; a crest analysis unit; and/or a rolloff analysis unit.
25. The system of claim 23, wherein the howling detector unit comprises a DC filter unit and/or a fast Fourier transformation (FFT) unit.
26. The system of claim 23, wherein the howling detector unit comprises at least one fixed sized circular queue to separately queue an output of at least one of: a skewness analysis unit, a flatness analysis unit, a crest analysis unit and/or a roll-off analysis unit.
27. The system of claim 26, wherein the howling detector unit comprises at least one average unit to individually average the output of the queue.
28. The system according to claim 27, wherein the howling detector unit comprises at least one comparator unit to compare an output of the average unit to an individual threshold value.
29. The system of claim 28, wherein the howling detector unit comprises an AND port unit to analyze at least two individual threshold value outputs generated by at least two comparator units and forward a single Boolean value informing the multipoint control unit to prevent the audio stream input from being mixed in the audio mixer or release the audio stream input to be mixed via the audio mixer.
30. The system of claim 23, wherein the system is included in a private branch exchange (PBX), a telecommunication device, a personal computer (PC), a closed loop control system equipped with Automatic Gain Control (AGC), or a telecommunication device having a web application.
Description
[0092] The invention and embodiments thereof will be described below in further detail in connection with the drawings.
[0093]
[0094]
[0095]
[0096]
[0097]
[0098]
[0099]
[0100]
[0101] As schematically illustrated in
[0102] Normally, the client applications include some kind of echo canceler implementation, but, as described previously, this is not enough to prevent howling. The MCU 101 normally has no echo canceler implementation in its streams: echo cancellation is a CPU-intensive procedure, it introduces delay in the audio, and it must be implemented as close to the audio device as possible.
[0103]
[0104] The howling detector unit 100 detects when howling noise starts and when it stops. Depending on the presence or absence of the noise, the howling detector unit 100 can block or release the incoming audio from being mixed to the other participants by the audio mixer unit 103. Whenever the noise presence state changes, the MCU 101 can also trigger the signaling layers of the system to send a message to the participants about the status change.
[0105] As an example, in the
[0106] As soon as Client 1 111 stops sending noise, its audio stream will automatically be mixed into the conference again, and a message reporting this will be sent to Client 1 111 as well. As an optional configuration, the participants Client 2 112 and Client 3 113 may receive a message informing them about the situation of Client 1 111.
[0107] The basic operation of any echo canceler is the comparison of its microphone input with the incoming stream. If the echo canceler detects that its microphone input audio is also present in its incoming audio stream to be played in a speaker, it must filter out the part of the signal that is relatively equal to its microphone input and then send it to be played in the speaker.
[0108]
[0109] If there is an acoustic loop 124 on the Client 2 112 side, it will arrive as an echo at Client 1 111, and its echo-canceler unit 125 must filter out this signal before sending it to the microphone.
[0110] Considering this example, if the participant of Client 1 111 is talking at the same time as the participant of Client 2 112, the double talk problem is present. As noted, the presence of double talk must disable or slow down the echo-canceler procedure in order to avoid even worse echo. This is done by the double talk detector.
[0111] Still considering this example, if the acoustic loop 123 in Client 1 111 and the acoustic loop 124 in Client 2 112 are both present, the system finds itself in a situation that favors the production of howling noise. This happens because both loops working together may appear to the echo canceler as a double talk situation, since both microphones may receive audio input. This may not be a problem if the acoustic loop has low gain, but if this gain is amplified by the microphone or speaker devices, or by the system operating the audio device, it may become strong enough to trigger the double talk detector, which would slow down or disable the echo canceler procedure. It could also be strong enough to prevent the echo canceler algorithm from filtering out the echo. In this case, considering this closed loop, the echo is very quickly transformed into a howling noise due to the high feedback rate of the system.
[0112] The situation gets even worse in a conference, since acoustic loops are more likely to occur among the several clients sharing the same audio stream. In this case, even if only some clients have low gain acoustic loops, these gains end up summed inside the MCU mixer, which could lead to an overall echo with increased power. This echo with increased power may quickly become a howling noise.
[0113] As described, the echo canceler works with two inputs, and it depends on analyzing the audio after the complete loop circuit has taken place.
[0114] The invention describes a system which is able to remove the howling by working only on the MCU 101 stream inputs, without the need for any other reference signal and without needing to wait for the complete loop circuit to happen. By this means, it may avoid any instability in the audio circuit loop before it takes place.
[0115] For each channel in the conference, there must be one howling detector unit 100 instance. The howling detector unit 100 does not perform any mathematical operation to transform the incoming audio into a filtered audio stream, as echo cancelers do. Such an approach reduces the CPU consumption of the algorithm, since it does not have to produce another stream as output. The only output generated by the howling detector unit 100 is a Boolean value informing the MCU unit 101 that it could block or release the audio from being mixed to the other conference participants.
[0116]
[0117] A common effect of a high gain acoustic loop is the creation of 0 Hz frequency components (DC) in the signal. This DC has no direct audible effect for the listener, but it may cause some divergence when the signal spectrum is analyzed. Therefore, a DC filter unit 301 is optionally added at the input of the howling detector unit 100. The DC filter unit 301 may be any simple time domain DC filter, such as one that subtracts the mean value from the audio input block.
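As a minimal sketch of the simplest DC filter mentioned above, the block mean can be subtracted from every sample. The function name `remove_dc` is a hypothetical helper chosen for illustration and does not appear in the source.

```python
from statistics import fmean


def remove_dc(block):
    """Subtract the block mean so that the 0 Hz (DC) component is removed."""
    if not block:
        return []
    mean = fmean(block)
    return [sample - mean for sample in block]


# A constant offset of 0.5 riding on an alternating signal is removed.
print(remove_dc([1.5, -0.5, 1.5, -0.5]))  # [1.0, -1.0, 1.0, -1.0]
```

Any other time domain DC blocker (for example a one-pole high-pass filter) would serve the same purpose; mean subtraction is shown only because it is the option the paragraph names.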
[0118] The output of the DC filter unit 301 becomes an input to an optional FFT unit 302. The FFT unit 302 could be of any size, but small sizes, such as 128 milliseconds of audio, are recommended in order to avoid delaying the howling detection. Small sizes for the FFT unit 302 will also improve the performance characteristics of the detector, but the size must not be so small that it affects the precision of the detector.
[0119] The output spectrum generated by the FFT unit 302 becomes an input to four spectral analysis units 303, 304, 305 and 306. Each of these spectral analysis units performs a statistical analysis on the frequency spectrum given as input by the FFT unit 302, and each of them gives an indication about the howling noise presence. Each individual statistical analysis is probably not able to determine the presence of the howling noise exactly, but when a combination of at least two of the analyses is evaluated together, they give an accurate indication of the presence of the howling noise.
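To illustrate how a 128 millisecond block maps to a transform size, the sketch below computes the block length in samples and a one-sided power spectrum. The 8000 Hz sampling rate is an assumption (the source does not state one), the names `frame_length` and `power_spectrum` are hypothetical, and a naive DFT is used in place of an optimized FFT purely to keep the example self-contained.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony rate, not given in the source
FRAME_MS = 128         # block duration suggested in the description


def frame_length(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of samples contained in one analysis block."""
    return sample_rate_hz * frame_ms // 1000


def power_spectrum(frame):
    """One-sided power spectrum (bins 0..N/2) via a naive O(N^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


print(frame_length())  # 1024 samples per 128 ms block at 8000 Hz
```

Under this assumed rate the block happens to be a power of two (1024 samples), which is convenient for a real FFT implementation.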
[0120] The skewness analysis unit 303 gives a measure of the asymmetry of a distribution around its mean value. A skewness equal to 0 indicates a symmetric distribution. Higher skewness values indicate more energy on the left, which represents the lower frequencies in the spectrum. Lower skewness values indicate more energy on the right, which represents the higher frequencies in the spectrum. The howling noise normally tends to have more concentrated power in the higher frequencies of the spectrum, so lower skewness values give an indication of howling noise presence. The skewness has high values for voiced speech, where substantial energy is present around the fundamental frequency, so it is also used to discriminate voiced speech from howling noise.
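One possible reading of this measure is sketched below, assuming the "distribution" is the power spectrum normalised to unit sum and taken over the bin index. This definition and the name `spectral_skewness` are assumptions; the source does not give a formula, and the values produced here are not necessarily on the same scale as the threshold quoted in Table 2.

```python
def spectral_skewness(power):
    """Skewness of the spectrum treated as a distribution over bin index.

    Positive values: energy concentrated in the low bins (tail to the right).
    Negative values: energy concentrated in the high bins.
    """
    total = sum(power)
    if total <= 0:
        return 0.0
    weights = [p / total for p in power]
    mean = sum(k * w for k, w in enumerate(weights))
    variance = sum(((k - mean) ** 2) * w for k, w in enumerate(weights))
    if variance <= 0:
        return 0.0
    third_moment = sum(((k - mean) ** 3) * w for k, w in enumerate(weights))
    return third_moment / variance ** 1.5


low_heavy = [8, 4, 2, 1, 0, 0, 0, 0]
high_heavy = list(reversed(low_heavy))
print(spectral_skewness(low_heavy) > 0, spectral_skewness(high_heavy) < 0)
```

The sign convention matches the paragraph: a spectrum weighted toward the low frequencies yields a higher skewness than one weighted toward the high frequencies.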
[0121] The flatness analysis unit 304 calculates the ratio of the geometric mean to the arithmetic mean and it will provide a way to quantify how noise-like a sound is, as opposed to being tone-like. Noise-like signals have higher flatness values, since noises tend to cover all frequencies of the spectrum. Tone-like signals have lower flatness values, since tones create peaks in specific regions of the spectrum. The howling noise normally tends to have one or two small peaks, but it also spreads the noise around the entire frequency spectrum, and then higher flatness values give an indication of howling noise presence.
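The ratio of geometric to arithmetic mean can be sketched as follows. The function name `spectral_flatness` and the small floor `eps` (used only to avoid taking the logarithm of zero) are assumptions added for illustration.

```python
import math


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    Close to 1 for noise-like (flat) spectra, close to 0 for tone-like ones.
    """
    if not power:
        return 0.0
    floored = [max(p, eps) for p in power]  # guard against log(0)
    log_mean = sum(math.log(p) for p in floored) / len(floored)
    arithmetic_mean = sum(floored) / len(floored)
    return math.exp(log_mean) / arithmetic_mean


print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))    # 1.0 for a perfectly flat spectrum
print(spectral_flatness([0.0, 0.0, 100.0, 0.0]))  # very small value for a single tone
```

Computing the geometric mean through logarithms avoids overflow or underflow when many bins are multiplied together.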
[0122] The crest analysis unit 305 calculates the ratio of the peak values within the spectral frequencies to the arithmetic mean of the energy spectrum, which will indicate how extreme the peaks are in a waveform. The howling noise normally tends to have some peaks in the spectrum, but they are not extreme peaks, and then lower crest values give an indication of howling noise presence.
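A minimal sketch of a crest measure is given below, assuming the conventional form of maximum bin power divided by mean bin power. The name `spectral_crest` is hypothetical, and because this form is always at least 1 for a non-zero spectrum, its scale may differ from the normalisation behind the crest threshold listed in Table 2.

```python
def spectral_crest(power):
    """Ratio of the largest spectral bin to the arithmetic mean of all bins."""
    if not power:
        return 0.0
    mean = sum(power) / len(power)
    if mean <= 0:
        return 0.0
    return max(power) / mean


print(spectral_crest([1.0, 1.0, 1.0, 1.0]))  # 1.0: no bin stands out
print(spectral_crest([0.0, 0.0, 4.0, 0.0]))  # 4.0: a single extreme peak
```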
[0123] The rolloff analysis unit 306 calculates the frequency below which a pre-defined percentage of the signal energy is contained. The howling noise tends to have higher rolloff values, so high rolloff values give an indication of howling noise presence. The suggested pre-defined percentage for the calculation is 90%.
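The rolloff calculation can be sketched by accumulating the spectrum until the chosen share of the total energy is reached. Returning the result as a fraction of the analysed band (0 to 1) rather than in hertz is an assumption made here, as is the name `spectral_rolloff`; the 90% default follows the suggestion in the paragraph.

```python
def spectral_rolloff(power, fraction=0.90):
    """Position (0..1 of the analysed band) below which `fraction` of the energy lies."""
    total = sum(power)
    if total <= 0 or len(power) < 2:
        return 0.0
    target = fraction * total
    cumulative = 0.0
    for k, p in enumerate(power):
        cumulative += p
        if cumulative >= target:
            return k / (len(power) - 1)
    return 1.0


low_energy = [5, 3, 1, 1, 0, 0, 0, 0]   # energy in the low bins
high_energy = [0, 0, 0, 0, 1, 1, 3, 5]  # energy in the high bins
print(spectral_rolloff(low_energy) < spectral_rolloff(high_energy))  # True
```

Multiplying the returned fraction by the Nyquist frequency would give the rolloff in hertz if that unit is preferred.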
[0124] The outputs of the spectrum analysis units 303, 304, 305 and 306 are queued separately by the fixed sized circular queues 307, 308, 309 and 310 respectively. The queue elements are initialized to zero at each position. Each time a new spectrum analysis result is inserted into a queue, the oldest queue value is removed, so that the queue keeps the same fixed length and older values, which become irrelevant as time passes, are ignored. The first accurate howling noise detection is given only after all elements of the queues 307, 308, 309 and 310 have been filled by the respective spectrum analysis 303, 304, 305 and 306 outputs; from then on, an accurate howling analysis result is given each time the FFT unit 302 gives an input to the spectrum analysis units 303, 304, 305 and 306.
[0125] The larger the queue size, the slower the first detection of the howling noise, so it is suggested to avoid large queue sizes. Large queue sizes must also be avoided so that audio from further in the past does not influence the present howling analysis result. However, the queue size must not be so small that it affects the stability of the howling detection procedure.
[0126] For a case where the FFT unit 302 creates a frequency spectrum based on 128 milliseconds of audio, the suggested value for each queue size is 4 elements. This gives an accurate howling noise result after the first 512 milliseconds of audio (4 x 128 milliseconds), and only the last 512 milliseconds of audio influence the present howling noise detection result. After the first 512 milliseconds of audio have arrived, the detector generates a result every 128 milliseconds.
[0127] The elements of each of the queues 307, 308, 309 and 310 will be averaged individually by the average units 311, 312, 313 and 314. This average is done in order to have an overall picture about the spectrum analysis results in the past periods of audio. The audio period relevant to each howling detection result is defined by the queue units 307, 308, 309 and 310 sizes and by the FFT size in FFT unit 302.
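The queueing and averaging described above can be sketched with a bounded double-ended queue. The class name `FeatureQueue` and its methods are hypothetical; the default size of 4 follows the suggestion for 128 millisecond blocks, and `is_warm` models the point after which the first accurate result is available.

```python
from collections import deque


class FeatureQueue:
    """Fixed-size circular queue for one spectral feature, with averaging."""

    def __init__(self, size=4):
        # Every position starts at zero, as in the description.
        self._values = deque([0.0] * size, maxlen=size)
        self._pushed = 0

    def push(self, value):
        """Insert the newest result; the oldest one is dropped automatically."""
        self._values.append(value)
        self._pushed += 1

    def is_warm(self):
        """True once every initial zero has been replaced by a real result."""
        return self._pushed >= self._values.maxlen

    def average(self):
        """Arithmetic mean of the values currently held."""
        return sum(self._values) / len(self._values)


queue = FeatureQueue(size=4)
for value in (4.0, 4.0, 4.0):
    queue.push(value)
print(queue.is_warm(), queue.average())  # False 3.0 (one initial zero still present)
queue.push(4.0)
print(queue.is_warm(), queue.average())  # True 4.0
```

One such queue would be kept per feature and per conference channel, so that each averaged value reflects only the most recent blocks of that channel.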
[0128] The outputs of the average units 311, 312, 313 and 314 are compared against individual threshold values in the comparator units 315, 316, 317 and 318 respectively. Each averaged input is compared with a threshold value determined specifically for the corresponding spectrum analysis parameter.
[0129] As described earlier, the skewness value must be low. Then a fixed threshold value (thd1) must be determined in order to represent how low the skewness should be in order to create an indication of howling noise presence. The comparator unit 315 compares the output of the average unit 311 against the fixed threshold value (thd1), and if it is lower than the fixed threshold value (thd1), it will output a true result.
[0130] As described earlier, the flatness value must be high. Then a fixed threshold value (thd2) must be determined in order to represent how high the flatness should be in order to create an indication of howling noise presence. The comparator unit 316 compares the output of the average unit 312 against the fixed threshold value thd2, and if it is higher than the fixed threshold value thd2, it will output a true result. As described earlier, the crest value must be low. Then a fixed threshold value (thd3) must be determined in order to represent how low the crest should be in order to create an indication of howling noise presence.
[0131] The comparator unit 317 compares the output of the average unit 313 against the fixed threshold value (thd3), and if it is lower than the fixed threshold value (thd3), it will output a true result.
[0132] As described earlier, the rolloff value must be high. Then a fixed threshold value (thd4) must be determined in order to represent how high the rolloff should be in order to create an indication of howling noise presence. The comparator unit 318 compares the output of the average unit 314 against the fixed threshold value (thd4), and if it is higher than the fixed threshold value (thd4), it will output a true result.
[0133] The AND port unit 319 receives the output from each of the comparator units 315, 316, 317 and 318. If all the comparators output true, indicating that all the threshold rules are satisfied, then the AND port unit 319 outputs true, indicating that a howling noise is present in the audio signal. If any one of the comparator units 315, 316, 317 and 318 outputs false, the AND port unit 319 outputs false, indicating that no howling noise is present (or is no longer present) in the audio signal.
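The four comparators and the AND combination can be sketched as a single function. The names `Thresholds` and `howling_present` are hypothetical; the example threshold values are read from the column headers of Table 2, interpreting the decimal commas there as decimal points, and the averaged feature values passed in are made-up inputs for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    skewness: float  # thd1: averaged skewness must be lower
    flatness: float  # thd2: averaged flatness must be higher
    crest: float     # thd3: averaged crest must be lower
    rolloff: float   # thd4: averaged rolloff must be higher


def howling_present(avg_skewness, avg_flatness, avg_crest, avg_rolloff, thd):
    """Return True only when all four comparator rules hold (logical AND)."""
    comparators = (
        avg_skewness < thd.skewness,
        avg_flatness > thd.flatness,
        avg_crest < thd.crest,
        avg_rolloff > thd.rolloff,
    )
    return all(comparators)


thd = Thresholds(skewness=16, flatness=0.2, crest=0.15, rolloff=0.55)
print(howling_present(10, 0.5, 0.10, 0.8, thd))  # True: every rule is satisfied
print(howling_present(20, 0.5, 0.10, 0.8, thd))  # False: skewness is too high
```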
[0134] Finally, after having a result about the howling presence, the howling noise detector outputs the result to other layers in the MCU unit 101. The MCU unit 101 decides then to block or unblock the audio from being mixed to the other conference participants. If the MCU unit 101 is in a state where the howling noise was already detected and the howling detector unit 100 outputs false, it may unblock the audio. If the MCU unit 101 is in a state where the howling noise was not detected and the howling detector unit 100 outputs true, it may block the audio.
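The block and unblock decision can be sketched as a small per-channel state holder that reacts only when the detector output differs from the current state. The class `ChannelGate` and its `notify` callback (standing in for the signaling layers that message the participants) are hypothetical names introduced for this example.

```python
class ChannelGate:
    """Tracks whether one channel is currently blocked from the audio mixer."""

    def __init__(self, notify=None):
        self.blocked = False
        # `notify` is called with the new blocked state on every change.
        self._notify = notify or (lambda blocked: None)

    def update(self, howling):
        """Apply one detector result; return True if audio may be mixed."""
        if howling != self.blocked:
            self.blocked = howling
            self._notify(self.blocked)
        return not self.blocked


events = []
gate = ChannelGate(notify=events.append)
forwarded = [gate.update(result) for result in (False, True, True, False)]
print(forwarded)  # [True, False, False, True]
print(events)     # [True, False]: one message when blocked, one when released
```

Notifying only on state changes keeps the signaling traffic to one message per howling episode start and one per episode end.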
[0135] As an example, a test audio stream containing several kinds of audio was created according to Table 1.
TABLE 1 Audio stream test specifications

Ref. Nr.   Audio signal type
900        A signal composed of 6 frequency peaks (6 tones)
901        A howling signal created with the microphone positioned against the speaker
902        The beginning of a low power howling signal
903        A DTMF tone signal
904        A recorded speech where the talker is pronouncing the “I” vowel for a period of time
905        A recorded speech where the talker is screaming the “A” vowel for a period of time
906        A recorded speech where the talker is just talking for a long period of time
907        A recorded speech where the talker is talking with intensity
908        A howling signal created with audio being filtered with a low pass filter
[0136] This audio stream was then used as input for the howling detector unit 100 and the output values of each average unit 311, 312, 313, 314 were saved. Then, the saved outputs were used to plot graphics according
[0137] Each point in the curve of the
[0138] By analyzing the plotted graphics shown in
TABLE 2 Audio stream test comparison

Audio signal    RollOff       Crest         Flatness     Skewness
type ref nr.    (thd: 0.55)   (thd: 0.15)   (thd: 0.2)   (thd: 16)
900             HIGH          LOW           LOW          LOW
901             HIGH          LOW           HIGH         LOW
902             HIGH          LOW           HIGH         LOW
903             LOW           HIGH          LOW          LOW
904             LOW           HIGH          HIGH         HIGH
905             LOW           LOW           HIGH         LOW
906             HIGH          LOW           HIGH         HIGH
907             LOW           LOW           HIGH         HIGH
908             HIGH          LOW           HIGH         LOW
[0139] Table 2 clearly shows a distinctive result when the howling noise is present. The audio stream parts indicated by the reference numbers 901, 902 and 908 are all parts of the audio stream where the howling noise is present. In Table 2, it can be seen that the stream parts where the howling noise is present share the same specific logic (high rolloff, low crest, high flatness and low skewness), and that no stream part other than the howling noise results in the same logic.
[0140] Since this specific logic occurs only when the howling noise is present, it can be used to detect the presence of the howling noise, which in turn makes it possible to take a decision based on it.
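The uniqueness of this pattern can be checked directly against the entries of Table 2. In the sketch below the rows are transcribed from the table in the column order rolloff, crest, flatness, skewness; the names `TABLE_2` and `HOWLING_PATTERN` are hypothetical.

```python
# Rows of Table 2 as (rolloff, crest, flatness, skewness).
TABLE_2 = {
    900: ("HIGH", "LOW", "LOW", "LOW"),
    901: ("HIGH", "LOW", "HIGH", "LOW"),
    902: ("HIGH", "LOW", "HIGH", "LOW"),
    903: ("LOW", "HIGH", "LOW", "LOW"),
    904: ("LOW", "HIGH", "HIGH", "HIGH"),
    905: ("LOW", "LOW", "HIGH", "LOW"),
    906: ("HIGH", "LOW", "HIGH", "HIGH"),
    907: ("LOW", "LOW", "HIGH", "HIGH"),
    908: ("HIGH", "LOW", "HIGH", "LOW"),
}

# Pattern required by the comparators: high rolloff, low crest,
# high flatness, low skewness.
HOWLING_PATTERN = ("HIGH", "LOW", "HIGH", "LOW")

matches = sorted(ref for ref, row in TABLE_2.items() if row == HOWLING_PATTERN)
print(matches)  # [901, 902, 908]
```

Only the three howling segments match; the six-tone signal 900 differs in flatness and the long speech segment 906 differs in skewness, which illustrates why a single measure is not sufficient on its own.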
Reference Numerals

100 Howling detector unit
101 Multipoint Control Unit (MCU)
103 Audio mixer
111 Client 1
112 Client 2
113 Client 3
123 Acoustic Loop 1
124 Acoustic Loop 2
126 Echo-canceller unit
127 Echo-canceller unit
128 Echo-canceller unit
301 DC filter unit
302 FFT unit
303 Skewness analysis unit
304 Flatness analysis unit
305 Crest analysis unit
306 Rolloff analysis unit
307 Circular queue
308 Circular queue
309 Circular queue
310 Circular queue
311 Average unit
312 Average unit
313 Average unit
314 Average unit
315 Comparator unit
316 Comparator unit
317 Comparator unit
318 Comparator unit
319 AND port unit
900 A signal composed of 6 frequency peaks (6 tones)
901 A howling signal created with the microphone positioned against the speaker
902 The beginning of a low power howling signal
903 A DTMF tone signal
904 A recorded speech where the talker is pronouncing the “I” vowel for a period of time
905 A recorded speech where the talker is screaming the “A” vowel for a period of time
906 A recorded speech where the talker is just talking for a long period of time
907 A recorded speech where the talker is talking with intensity
908 A howling signal created with audio being filtered with a low pass filter
thd1 Fixed threshold value 1
thd2 Fixed threshold value 2
thd3 Fixed threshold value 3
thd4 Fixed threshold value 4