Method, apparatus, and computer-readable media utilizing residual echo estimate information to derive secondary echo reduction parameters
11297178 · 2022-04-05
Assignee
Inventors
- Richard Dale Ferguson (Okotoks, CA)
- Linshan Li (Calgary, CA)
- Mahdi Javer (Calgary, CA)
- Nicholas Alexander Norrie (Calgary, CA)
CPC classification
H04M3/568
ELECTRICITY
International classification
H04M3/56
ELECTRICITY
Abstract
Method, apparatus, and program code embodied in computer-readable media, for providing enhanced echo suppression in a conferencing system having at least one microphone and at least one speaker. At least one microphone input signal is received, and at least one speaker input signal is provided. At least one processor has at least one primary echo-suppressor and at least one secondary echo-suppressor. The at least one primary echo-suppressor receives (i) the microphone input signal(s) and (ii) the speaker input signal(s). The at least one primary echo-suppressor provides at least one echo-suppressed microphone signal. The at least one secondary echo-suppressor receives the at least one echo-suppressed microphone signal and provides an output signal. The at least one processor provides the at least one echo-suppressed microphone signal to the at least one secondary echo-suppressor without providing the at least one speaker input signal directly to the at least one secondary echo-suppressor.
Claims
1. An apparatus providing enhanced echo suppression in a conferencing system having at least one microphone and at least one speaker, comprising: at least one primary echo-suppressor configured to receive at least one microphone input signal from the at least one microphone and at least one speaker input signal from the at least one speaker, and configured to produce at least one echo estimate signal and at least one echo-suppressed microphone signal; and at least one secondary echo-suppressor configured to receive the at least one echo-suppressed microphone signal and one or more selected from a group consisting of the at least one echo estimate signal and the at least one microphone input signal, and configured to produce an echo-processed microphone signal.
2. The apparatus of claim 1 wherein the at least one secondary echo-suppressor is configured to produce the echo-processed microphone signal without using the at least one speaker input signal.
3. The apparatus of claim 1 wherein the at least one primary echo-suppressor comprises an echo estimate processor configured to: receive the at least one speaker input signal and the at least one microphone input signal; and produce the at least one echo estimate signal.
4. The apparatus of claim 1 wherein the at least one primary echo-suppressor is configured to produce the at least one echo-suppressed microphone signal by subtracting the at least one echo estimate signal from the at least one microphone input signal.
5. The apparatus of claim 1 wherein the at least one secondary echo-suppressor is configured to: receive the at least one echo estimate signal and the at least one echo-suppressed microphone signal; produce at least one estimate of residual echo signal based on the at least one echo estimate signal; and produce the echo-processed microphone signal by subtracting the at least one estimate of residual echo signal from the at least one echo-suppressed microphone signal.
6. The apparatus of claim 1 wherein the at least one secondary echo-suppressor is configured to: receive the at least one echo-suppressed microphone signal and the at least one microphone input signal; subtract the at least one echo-suppressed microphone signal from the at least one microphone input signal to produce at least one approximation speaker echo estimate signal; produce at least one estimate of residual echo signal based on the at least one approximation speaker echo estimate signal; and produce the echo-processed microphone signal by subtracting the at least one estimate of residual echo signal from the at least one echo-suppressed microphone signal.
7. The apparatus of claim 1 wherein the at least one primary echo-suppressor uses a room transfer function to provide the at least one echo estimate signal.
8. A method of providing enhanced echo suppression in a conferencing system having at least one microphone and at least one speaker, comprising: receiving at least one microphone input signal from the at least one microphone; receiving at least one speaker input signal from the at least one speaker via at least one primary echo-suppressor; producing at least one echo estimate signal and at least one echo-suppressed microphone signal based on the at least one microphone input signal and the at least one speaker input signal via the at least one primary echo-suppressor; and producing an echo-processed microphone signal via at least one secondary echo-suppressor based on the at least one echo-suppressed microphone signal and one or more selected from a group consisting of the at least one echo estimate signal and the at least one microphone input signal.
9. The method of claim 8 wherein the at least one secondary echo-suppressor is configured to produce the echo-processed microphone signal without using the at least one speaker input signal.
10. The method of claim 8 wherein the at least one echo estimate signal is produced based on the at least one microphone input signal and the at least one speaker input signal via the at least one primary echo-suppressor.
11. The method of claim 8 wherein the at least one echo-suppressed microphone signal is produced based on the at least one echo estimate signal and the at least one microphone input signal via the at least one primary echo-suppressor.
12. The method of claim 8 wherein said producing an echo-processed microphone signal comprises: receiving the at least one echo estimate signal and the at least one echo-suppressed microphone signal via the at least one secondary echo-suppressor; producing at least one estimate of residual echo signal based on the at least one echo estimate signal; and producing the echo-processed microphone signal by subtracting the at least one estimate of residual echo signal from the at least one echo-suppressed microphone signal.
13. The method of claim 8 wherein said producing an echo-processed microphone signal comprises: receiving the at least one echo-suppressed microphone signal and the at least one microphone input signal via the at least one secondary echo-suppressor; subtracting the at least one echo-suppressed microphone signal from the at least one microphone input signal to produce at least one approximation speaker echo estimate signal; producing at least one estimate of residual echo signal based on the at least one approximation speaker echo estimate signal; and producing the echo-processed microphone signal by subtracting the at least one estimate of residual echo signal from the at least one echo-suppressed microphone signal.
14. A non-transitory computer readable media that includes a program code for providing enhanced echo suppression in a conferencing system having at least one microphone and at least one speaker, said program code comprising instructions causing at least one processor, which comprises at least one primary echo-suppressor and at least one secondary echo-suppressor, to perform: receiving at least one microphone input signal from the at least one microphone; receiving at least one speaker input signal from the at least one speaker via the at least one primary echo-suppressor; producing at least one echo estimate signal and at least one echo-suppressed microphone signal based on the at least one microphone input signal and the at least one speaker input signal via the at least one primary echo-suppressor; and producing an echo-processed microphone signal via the at least one secondary echo-suppressor based on the at least one echo-suppressed microphone signal and one or more selected from a group consisting of the at least one echo estimate signal and the at least one microphone input signal.
15. The non-transitory computer readable media of claim 14 wherein the at least one secondary echo-suppressor is configured to produce the echo-processed microphone signal without using the at least one speaker input signal.
16. The non-transitory computer readable media of claim 14 wherein the at least one echo estimate signal is produced based on the at least one microphone input signal and the at least one speaker input signal via the at least one primary echo-suppressor.
17. The non-transitory computer readable media of claim 14 wherein the at least one echo-suppressed microphone signal is produced based on the at least one echo estimate signal and the at least one microphone input signal via the at least one primary echo-suppressor.
18. The non-transitory computer readable media of claim 14 wherein the at least one secondary echo-suppressor is configured to: receive the at least one echo estimate signal and the at least one echo-suppressed microphone signal; produce at least one estimate of residual echo signal based on the at least one echo estimate signal; and produce the echo-processed microphone signal by subtracting the at least one estimate of residual echo signal from the at least one echo-suppressed microphone signal.
19. The non-transitory computer readable media of claim 14 wherein the at least one secondary echo-suppressor is configured to: receive the at least one echo-suppressed microphone signal and the at least one microphone input signal; subtract the at least one echo-suppressed microphone signal from the at least one microphone input signal to produce at least one approximation speaker echo estimate signal; produce at least one estimate of residual echo signal based on the at least one approximation speaker echo estimate signal; and produce the echo-processed microphone signal by subtracting the at least one estimate of residual echo signal from the at least one echo-suppressed microphone signal.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(11) The present invention is directed to apparatus and methods that enable groups of people (and other sound sources, for example, recordings, broadcast music, Internet sound, etc.), known as “participants”, to join together over a network, such as the Internet or similar electronic channel(s), in a remotely-distributed real-time fashion employing personal computers, network workstations, audio conference enabled equipment and/or other similarly connected appliances, often without face-to-face contact, to engage in effective audio conference meetings that utilize multi-user rooms (spaces) with distributed participants.
(12) Advantageously, embodiments of the present apparatus and methods afford an ability to provide all participants an end user experience having all sound sources transmitted with significantly reduced return echo signals, regardless of the number of potential return echo signals created, while maintaining optimum audio quality for all conference participants.
(13) A notable challenge to eliminating system return echo is the complex speaker-to-microphone signal relationships that are formed in combination with the changing characteristics present in reverberant rooms, people or objects moving about in the room, and the potential presence of double talk. The result is a wide range of situations to anticipate and calibrate for, while maintaining appropriate adaptive echo canceller coefficients and compensation factors, which affect the audio sound quality for all participant(s) on the audio call.
(14) A “conference enabled system” in this specification may include, but is not limited to, one or more of, or any combination of, device(s) such as: UC (unified communications) compliant devices and software, computers, dedicated software, audio devices, cell phones, laptops, tablets, smart watches, cloud-access devices, and/or any device capable of sending and receiving audio signals to/from a local area network or a wide area network (e.g., the Internet), containing integrated or attached microphones, amplifiers, speakers, and network adapters, as well as PSTN and other phone networks, etc.
(15) A “microphone” in this specification may include, but is not limited to, one or more of, or any combination of, transducer device(s) such as: condenser mics, dynamic mics, ribbon mics, USB mics, stereo mics, mono mics, shotgun mics, boundary mics, small diaphragm mics, large diaphragm mics, multi-pattern mics, strip microphones, digital microphones, fixed microphone arrays, dynamic microphone arrays, beam forming microphone arrays, and/or any transducer device capable of receiving acoustic signals and converting them to electrical and/or digital signals.
(16) A “communication connection” in this specification may include, but is not limited to, one or more of, or any combination of: analog signal connections; local communication interface(s) such as memory buffer(s), queues, named pipes, etc.; and digital network interface(s) and device(s) such as WIFI modems and cards, internet routers, internet switches, LAN cards, local area network devices, wide area network devices, PSTN and other phone networks, etc.
(17) A “device” in this specification may include, but is not limited to, one or more of, or any combination of processing device(s) such as, a cell phone, a Personal Digital Assistant, a smart watch or other body-borne device (e.g., glasses, pendants, rings, etc.), a personal computer, a laptop, a pad, a cloud-access device, a white board, and/or any device capable of sending/receiving messages to/from a local area network or a wide area network (e.g., the Internet), such as devices embedded in cars, trucks, aircraft, household appliances (refrigerators, stoves, thermostats, lights, electrical control circuits, the Internet of Things, etc.).
(18) An “engine” is preferably a program that performs a core function for other programs. An engine can be a central or focal program in an operating system, subsystem, or application program that coordinates the overall operation of other programs. It is also used to describe a special-purpose program containing an algorithm that can sometimes be changed. The best-known usage is the term search engine which uses an algorithm to search an index of topics given a search argument. An engine is preferably designed so that its approach to searching an index, for example, can be changed to reflect new rules for finding and prioritizing matches in the index. In artificial intelligence, for another example, the program that uses rules of logic to derive output from a knowledge base is called an inference engine.
(19) As used herein, a “server” may comprise one or more processors, one or more Random Access Memories (RAM), one or more Read Only Memories (ROM), one or more user interfaces, such as display(s), keyboard(s), mouse/mice, etc. A server is preferably apparatus that provides functionality for other computer programs or devices, called “clients.” This architecture is called the client-server model, and a single overall computation is typically distributed across multiple processes or devices. Servers can provide various functionalities, often called “services”, such as sharing data or resources among multiple clients, or performing computation for a client. A single server can serve multiple clients, and a single client can use multiple servers. A client process may run on the same device or may connect over a network to a server on a different device. Typical servers are database servers, file servers, mail servers, print servers, web servers, game servers, application servers, and chat servers. The servers discussed in this specification may include one or more of the above, sharing functionality as appropriate. Client-server systems are most frequently implemented by (and often identified with) the request-response model: a client sends a request to the server, which performs some action and sends a response back to the client, typically with a result or acknowledgement. Designating a computer as “server-class hardware” implies that it is specialized for running servers on it. This often implies that it is more powerful and reliable than standard personal computers, but alternatively, large computing clusters may be composed of many relatively simple, replaceable server components.
(20) The servers and devices in this specification typically use the one or more processors to run one or more stored “computer programs” and/or non-transitory “computer-readable media” to cause the device and/or server(s) to perform the functions recited herein. The media may include Compact Discs, DVDs, ROM, RAM, solid-state memory, or any other storage device capable of storing the one or more computer programs.
(21) A “signal” in this specification refers to a digital representation of an analog microphone or speaker signal as a voltage (v) or power (dB) for purposes of digital signal processing. Other digital signals, such as echo or power estimates, may be generated or derived from microphone or speaker signals as necessitated by processing requirements. Digitally processed audio signals are generally described in terms of standard sample rates (8 kHz, 24 kHz, 44.1 kHz, 48 kHz, 96 kHz, 192 kHz and higher) and format (16-bit Pulse Coded Modulation, 32-bit PCM, and others). Algorithms and processing detailed in this specification apply to signals processed at any sample rate and may be performed using floating-point or fixed-point calculations at 16-bit, 32-bit, 64-bit or other precision based on requirements of the specific process or operation employed in the audio processing chain, with no adverse effect on the invention.
(23) The remote user 101 may utilize a laptop computer device 104 connected with audio cables 103 to a headset 102. Utilization of a headset 102 will minimize the chance of an echo signal being generated at the remote user 101 far end. If the remote user 101 chooses to use the microphone and speaker built into the laptop computer device 104, the opportunity for a return echo signal to be generated at the far end is significantly increased, as there is minimal path-loss isolation between the built-in speaker and microphone. The remote user 101 can use any audio conference enabled system. The laptop computer device would typically run UC (Unified Communications) client software and/or a hardware device.
(24) The conference room 112 preferably contains an audio conference enabled system 106 that is connected via digital or analog connections 110 to a speaker system 109 and via digital or analog connections 111 to a microphone system 108. The in-room speaker system is, for the purpose of simplicity, shown as a single speaker unit 109; however, any number of speaker units is supported.
(25) There are notionally four participants illustrated in the room: Participant 1 107a, Participant 2 107b, Participant 3 107c, and Participant 4 107d. “Participant(s)”, “sound source(s)”, and “desired sound source(s)” can and will be used interchangeably and, in this context, mean substantially the same thing. Each participant illustrates, but is not limited to, an example of a desired sound source within the room 112.
(26) The remote user 101 and the conference room 112 are connected via a communication connection 105. The audio conference enabled system 106 is any device and/or software combination that supports audio conference capabilities and is within the scope of this invention.
(29) A return echo signal is created by the remote user 101 talking into their headset 102 microphone 205. This creates the initial audio signal that is fed through the audio conference systems 104, 106. The audio signal travels from the UC enabled laptop computer 104 through the communications connection 105 to the conference room 112 audio conference system 106. The audio conference system 106 communicates 110 with the speaker system 109, which audibly transmits the remote user 101 voice 205 to the conference room participants 107a, 107b, 107c, 107d. In addition to reaching the conference room participants, the speaker system 109 also transmits the remote user 101 voice 205 to the audio conference 106 microphone system 108 through direct path 206 and reflected path (reverberation) 203 audio signals. It is this transmission back through the microphone system 108 over communication path 111 that establishes an undesired return echo signal. If the return echo signal 203 goes unprocessed, feedback will occur through the audio conference systems 106, 104 back to the remote user 101 and be heard through the headset 102 speakers 201. Only one reverberant path signal 203 is shown for clarity; however, there are often a plurality of reverberant signals 203 picked up by the system microphone 108. The number of reverberant signals 203 picked up by the microphone system 108 depends on many factors, for example but not limited to the speaker system 109 volume, the room's reflective characteristics, and the position of the microphones 108 in relation to the speakers 109. It is this combination of factors that makes return echo 201 so difficult to eliminate from the desired audio pickup signal.
(30) The situation where any number of the participants 107a, 107b, 107c, and 107d are talking (desired signals) 202a, 202b, 202c, and 202d while the remote user 101 is also talking (undesired signals) 203, 206 creates a condition known as double talk. Since all signals 202a, 202b, 202c, 202d, 206, 203 are received at the microphone system 108 at the same time, the audio system 106 desirably filters out the return echo signals (undesired signals) 203, 206 while maintaining the integrity of the desired signals 202a, 202b, 202c, 202d. This composite signal has proven difficult to filter adequately and can compromise echo canceller adaptive settings and the resulting performance. A highly reverberant conference room 112 will produce an even more complex return echo signal 201 due to increased absorption and distortion of the signal upon every reflection; this has proven difficult to solve adequately in the current art of primary stage echo cancellation with secondary stage echo reduction, as the effect of physical room characteristics is unknown to the secondary echo reduction processor.
(31) Almost all audio conference systems in the current art implement primary and secondary stage echo cancellers and reducers to deal with the return echo signal problem; however, these have proven insufficient to solve the problem of conference call echo satisfactorily under all real-life situations.
Raw Microphone signal=Desired in-room signal+Undesired Noise signal+Undesired Speaker signal (1)
(35) The estimated speaker echo signal 512 is subtracted 506 from the raw microphone signal 504 yielding the echo cancelled microphone signal 507 as output from the Primary Echo Canceller (e.g., canceller and/or suppressor and/or reducer and/or attenuator and/or minimizer) 501.
Echo Cancelled Microphone=Raw Microphone signal−Estimated Speaker Echo signal (2)
(36) The room response (room transfer function) to the speaker signal 500 will vary depending on room size, layout, temperature, air pressure, and the presence or movement of people and objects within the room 112. Due to these variations, limitations in the first stage AEC processing (i.e., filter length, data precision, etc.), and non-linearities in the amplifier circuits and physical speaker characteristics, there will be errors in the echo estimate 512 caused by over- or under-estimation of the echo return signal; this results in a residual echo component present in the echo cancelled microphone signal 507. If the residual echo signal is passed into the audio-conferencing system 106, the undesired echo signal may continue to build on itself, resulting in poor audio quality and possible feedback. The resultant echo cancelled microphone signal 507 is comprised of the desired in-room signal 202 plus the undesired noise signal 502 and the undesired residual echo signal.
Echo Cancelled Microphone=Desired in-room signal+Undesired noise signal+Undesired residual echo signal (3)
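The primary stage relationship in equations (1)-(3) can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function name `primary_echo_cancel` and the use of a known room impulse response `rir` to model the room transfer function are assumptions for the example.

```python
import numpy as np

def primary_echo_cancel(raw_mic, speaker, rir):
    """Sketch of equations (1)-(2): the speaker signal convolved with an
    estimated room impulse response forms the speaker echo estimate, which
    is subtracted from the raw microphone signal."""
    echo_estimate = np.convolve(speaker, rir)[: len(raw_mic)]
    echo_cancelled = raw_mic - echo_estimate
    return echo_cancelled, echo_estimate
```

If the estimated impulse response matched the true room response exactly, the output would contain only the desired in-room and noise components; in practice the mismatch leaves the residual echo component of equation (3).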
(38) There are two possible outcomes of poor echo reduction performance: underestimation and overestimation of residual echo. In the case of underestimating residual echo, the impact to audio conferencing systems 106 is a residual echo signal fed back into the system which can continue to build if proper echo reduction cannot be achieved at both ends of a conferencing call. The second outcome, where the residual echo signal 513 is overestimated, causes degradation of the desired in-room signal 202 by subtracting 509 too much speaker signal 500 from the echo cancelled microphone signal 507 resulting in the second stage echo processor output signal 510 containing distortion and unintelligible speech.
Processed Microphone signal=Echo Cancelled Microphone−Estimated Residual Speaker Echo signal (4)
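Equation (4) and the two failure modes above can be illustrated with a minimal sketch (the function name and the numeric values are hypothetical, chosen only to show the effect of mis-estimation):

```python
import numpy as np

def secondary_echo_process(echo_cancelled, residual_estimate):
    # Equation (4): Processed Microphone = Echo Cancelled Microphone
    #               - Estimated Residual Speaker Echo
    return echo_cancelled - residual_estimate

# Hypothetical frame: desired in-room signal plus a true residual echo.
desired = np.array([1.0, -1.0, 0.5])
residual = np.array([0.2, 0.2, 0.2])
mic = desired + residual

under = secondary_echo_process(mic, 0.5 * residual)  # echo remains audible
over = secondary_echo_process(mic, 2.0 * residual)   # desired signal degraded
```

Only when the residual estimate matches the true residual does the output equal the desired signal; scaling it down or up reproduces the under- and over-estimation outcomes described above.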
(44) P_Nk^(n) 7043 is the noise floor power estimate for frame n, frequency sub-band k. The smoothed noise power estimate can be expressed as

P_Nk^(n) = α·P_Nk^(n−1) + (1 − α)·|N_k^(n)|²,  k = 0, 1, . . . , K (5)

where K is the total number of sub-bands and α is the forgetting factor (which determines how quickly a filter forgets past training and adapts to current data) from 0 to 1 with a typical value of 0.95. N_k^(n) is the noise frequency component for frame n and sub-band k. An example algorithm to estimate the noise floor power was proposed by R. Martin, “Spectral Subtraction based on minimum statistics”, Proc. EUSIPCO-94, pp. 1182-1185, Edinburgh, 1994. |N_k^(n)| 7042 is the absolute value (amplitude) of N_k^(n).
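The same first-order recursive smoother appears in equations (5), (6), and (7); one update step can be sketched as (the function name is illustrative, not from the patent):

```python
def smoothed_power(prev_power, amplitude, alpha=0.95):
    """One step of the recursive power smoother of equations (5)-(7):
    P^(n) = alpha * P^(n-1) + (1 - alpha) * |.|^2, with forgetting
    factor alpha in [0, 1] (typical value 0.95)."""
    return alpha * prev_power + (1.0 - alpha) * amplitude ** 2
```

Fed a constant amplitude, the estimate converges geometrically to the squared amplitude; the closer alpha is to 1, the slower the filter forgets past frames.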
(45) After the echo cancelled microphone signal 507 is transformed to the frequency domain 7044, the signal is further transformed 7045 from a complex signal (rectangular system) into its phase and amplitude (polar system) components. P_Mk^(n) 7046 is the echo cancelled microphone signal 507 (first stage AEC output signal) power estimate for frame n, frequency sub-band k. The smoothed AEC output power estimate can be expressed as

P_Mk^(n) = α·P_Mk^(n−1) + (1 − α)·|M_k^(n)|²,  k = 0, 1, . . . , K (6)

where K is the total number of sub-bands and α is the forgetting factor (which determines how quickly a filter forgets past training and adapts to current data) from 0 to 1 with a typical value of 0.95. M_k^(n) is the first stage AEC output signal frequency component for frame n and sub-band k, consisting of the local voice signal (useful signal), background noise, and echo residual leaked from the first stage AEC. |M_k^(n)| 7045 is the absolute value (amplitude) of M_k^(n) and φ_Mk^(n) is the phase of M_k^(n).
(46) P_Xk^(n) 70412 is the residual echo power estimate for frame n, frequency sub-band k. The smoothed residual echo power estimate can be expressed as

P_Xk^(n) = α·P_Xk^(n−1) + (1 − α)·(X̂_k^(n))²,  k = 0, 1, . . . , K (7)

where K is the total number of sub-bands and α is the forgetting factor from 0 to 1 with a typical value of 0.95. X̂_k^(n) is the residual echo estimate adaptive filter output for frame n, frequency sub-band k and can be expressed as:
(47)

X̂_k^(n) = Σ_{l=0}^{L−1} H_k^(n)(l)·|X_k^(n)(l)| = H_kL^(n)·|X_kL^(n)|^T (8)

where 70411 |X_k^(n)(l)| (l = 0, 1, . . . , L−1) is the amplitude of the first stage echo estimate signal for frame n−l and the vector format is:

|X_kL^(n)| = [|X_k^(n)(0)|, |X_k^(n)(1)|, . . . |X_k^(n)(L−1)|] (9)
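The residual echo estimate is an inner product of the L adaptive taps with the last L echo estimate amplitudes. A minimal sketch, with illustrative function names (the history update reflects step S1004 described later in the specification):

```python
import numpy as np

def residual_echo_estimate(h_kL, x_amp_history):
    """Adaptive filter output for sub-band k: the inner product of the
    L tap weights H_kL^(n) with the vector of the last L first stage
    echo estimate amplitudes |X_kL^(n)| (equation (9))."""
    return float(np.dot(h_kL, x_amp_history))

def update_history(x_amp_history, newest):
    # Discard the oldest amplitude and append the newest one (S1004).
    return np.concatenate((x_amp_history[1:], [newest]))
```

Each frame, the history vector slides by one entry before the filter output for that frame is computed.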
(48) E_k^(n) 704121 is the residual estimate error signal and can be expressed as

E_k^(n) = |M_k^(n)| − X̂_k^(n) (10)
(49) The adaptive residual echo estimate filter coefficients 704123 with L taps are updated as follows for the kth sub-band:

(50)

H_kL^(n+1) = H_kL^(n) + μ·E_k^(n)·|X_kL^(n)| / (|X_kL^(n)|·|X_kL^(n)|^T) (11)

where μ is the step size for the adaptive filter coefficient updating, and H_kL^(n) can be expressed as

H_kL^(n) = [H_k^(n)(0), H_k^(n)(1), . . . H_k^(n)(L−1)] (12)
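One update of equations (10)-(12) can be sketched as a normalized LMS step. This is an illustrative sketch: the function name and the small regularizing constant `eps` (to avoid division by zero when the history vector is silent) are assumptions, not part of the patent text.

```python
import numpy as np

def nlms_update(h_kL, x_amp_history, m_amp, mu=0.005, eps=1e-8):
    """Error signal E_k^(n) = |M_k^(n)| - Xhat_k^(n) (equation (10))
    drives a normalized-gradient update of the L tap weights
    (equation (11)); step size mu is typically small."""
    x = np.asarray(x_amp_history, dtype=float)
    xhat = float(np.dot(h_kL, x))            # adaptive filter output
    err = m_amp - xhat                       # residual estimate error
    h_next = h_kL + mu * err * x / (np.dot(x, x) + eps)
    return h_next, err
```

Repeated updates against a consistent target shrink the error magnitude frame over frame, which is the convergence behavior the normalization is meant to keep stable across input levels.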
W_k^(n) 7047 is the extended Wiener filter gain for frame n, frequency sub-band k and is expressed as

(51)

W_k^(n) = (P_Mk^(n) − P_Xk^(n) − P_Nk^(n)) / P_Mk^(n) (13)
Output from the extended Wiener filter is combined with the previously saved phase information (polar system) and transformed 7048 back to a complex (rectangular system) signal. Y_k^(n) 7048 is the signal output after echo suppression and noise reduction for frame n, frequency sub-band k and is expressed as

Y_k^(n) = W_k^(n)·|M_k^(n)|·exp(jφ_Mk^(n)) (14)

y^(n) 7049 is the nth frame time domain output signal and is calculated from the inverse STFT 7049:

y^(n) = STFT^−1(Y^(n)) (15)

where Y^(n) = [Y_0^(n), Y_1^(n), . . . Y_{K−1}^(n)].
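The gain computation and reconstruction of equations (13)-(14) can be sketched as follows. This assumes the conventional spectral-subtraction form of the Wiener gain built from the three smoothed powers, with the gain floored at zero and the denominator guarded against division by zero; the function names are illustrative.

```python
import numpy as np

def wiener_gain(p_m, p_x, p_n):
    """Gain for one sub-band: remove the estimated residual echo power
    and noise power from the AEC output power, relative to the AEC
    output power, floored at zero."""
    return max(p_m - p_x - p_n, 0.0) / max(p_m, 1e-12)

def apply_gain(gain, m_amp, m_phase):
    # Equation (14): scale the AEC output amplitude by the gain and
    # restore the previously saved phase (polar -> rectangular).
    return gain * m_amp * np.exp(1j * m_phase)
```

When estimated echo and noise powers dominate the microphone power, the gain goes to zero and the sub-band is fully suppressed; when they are negligible, the gain approaches one and the sub-band passes through unchanged.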
(55)

|X_kL^(n)| = [|X_k^(n)(0)|, |X_k^(n)(1)|, . . . |X_k^(n)(L−1)|] (16)

Discard the oldest amplitude (|X_k^(n)(0)|) and append the newest one (|X_k^(n)(L−1)|) (S1004).
S1009—Calculate the kth sub-band echo residual signal E_k^(n), which is the residual adaptive filter 704123 output subtracted from the first stage AEC output signal |M_k^(n)|.
S1010—Calculate the kth sub-band smoothed echo estimate signal power P_Xk^(n) from the current frame echo estimate (S1009). α is the forgetting factor; we choose 0.95.
S1011—Increase the sub-band frequency index k for the next calculation loop.
S1012—Update the echo residual adaptive filter coefficients with normalized gradient for the next frame, H_kL^(n+1). μ is a small number used as the adaptive filter step size; we choose 0.005. H_kL^(n) is the adaptive FIR filter coefficients with L taps: H_kL^(n) = [H_k^(n)(0), H_k^(n)(1), . . . H_k^(n)(L−1)].
S1013—Calculate the extended Wiener filter gain W_k^(n) for the kth sub-band frequency component.
S1014—Calculate the complex output signal Y_k^(n) for the kth sub-band component by applying the Wiener filter gain to the first stage AEC output signal amplitude |M_k^(n)| and restoring its phase φ_Mk^(n).
S1015—Repeat from S1006 to S1014 until all the sub-band frequency components have been calculated.
S1016—Calculate the current frame time-domain output signal y^(n) = STFT^−1(Y^(n)).
S1017—Finish the current frame calculation.
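The per-frame, per-sub-band flow of steps S1004 through S1014 can be consolidated into one sketch. All names here are illustrative, and the Wiener gain uses the conventional spectral-subtraction form as an assumption; `state` bundles the tap weights, the echo amplitude history, and the three smoothed powers for a single sub-band.

```python
import numpy as np

def process_subband_frame(state, m_amp, m_phase, x_amp, alpha=0.95, mu=0.005):
    """One sub-band, one frame of the second stage flow (S1004-S1014).
    state = (h taps, echo amplitude history, P_M, P_X, P_N)."""
    h, hist, p_m, p_x, p_n = state
    hist = np.concatenate((hist[1:], [x_amp]))             # S1004: slide history
    xhat = float(np.dot(h, hist))                          # residual filter output
    err = m_amp - xhat                                     # S1009: residual error
    p_x = alpha * p_x + (1 - alpha) * xhat ** 2            # S1010: smooth echo power
    h = h + mu * err * hist / (np.dot(hist, hist) + 1e-8)  # S1012: normalized update
    p_m = alpha * p_m + (1 - alpha) * m_amp ** 2           # smooth AEC output power
    w = max(p_m - p_x - p_n, 0.0) / max(p_m, 1e-12)        # S1013: Wiener gain
    y = w * m_amp * np.exp(1j * m_phase)                   # S1014: restore phase
    return (h, hist, p_m, p_x, p_n), y
```

A caller would run this for every sub-band k of a frame (S1011/S1015) and then take the inverse STFT across sub-bands to obtain the time-domain output (S1016).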
(57) The embodiments described in this application have been presented with respect to use in one or more conference rooms, preferably with local and remote multi-users. However, the present invention may also find applicability in other environments such as: 1. Commercial transit passenger and crew cabins such as, but not limited to, aircraft, buses, trains, and boats. All of these commercial applications can be outfitted with microphones and speakers which can benefit from consistent microphone audio signal quality with minimal echo signal conditions, which can vary from moderate to considerable; 2. Private transportation such as cars, trucks, and minivans, where command and control applications and voice communication applications are becoming more prominent; 3. Industrial applications such as manufacturing floors, warehouses, hospitals, and retail outlets, to allow for audio monitoring and to facilitate employee communications without having to use specific portable devices; and 4. Drive-through windows and similar applications, where ambient sound levels can be quite high and variable, can be controlled to consistent levels within the scope of the invention. Also, the processing described above may be carried out in one or more devices, one or more servers, cloud servers, etc.
(58) The individual components shown in outline or designated by blocks in the attached Drawings are all well-known in the electronic processing arts, and their specific construction and operation are not critical to the operation or best mode for carrying out the invention.
(59) While the present invention has been described with respect to what is presently considered to be the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. To the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.