Methods and systems for enhancing audio signals corrupted by noise

10726856 · 2020-07-28

Abstract

Systems and methods for audio signal processing include an input interface to receive a noisy audio signal including a mixture of a target audio signal and noise. An encoder maps each time-frequency bin of the noisy audio signal to one or more phase-related values from one or more phase quantization codebooks of phase-related values indicative of the phase of the target signal, and calculates, for each time-frequency bin of the noisy audio signal, a magnitude ratio value indicative of a ratio of a magnitude of the target audio signal to a magnitude of the noisy audio signal. A filter cancels the noise from the noisy audio signal based on the phase-related values and the magnitude ratio values to produce an enhanced audio signal. An output interface outputs the enhanced audio signal.

Claims

1. An audio signal processing system, comprising: an input interface to receive a noisy audio signal including a mixture of a target audio signal and noise; an encoder to map each time-frequency bin of the noisy audio signal to one or more phase-related values from one or more phase quantization codebooks of phase-related values indicative of the phase of the target signal, and to calculate, for each time-frequency bin of the noisy audio signal, a magnitude ratio value indicative of a ratio of a magnitude of the target audio signal to a magnitude of the noisy audio signal; a filter to cancel the noise from the noisy audio signal based on the one or more phase-related values and the magnitude ratio values to produce an enhanced audio signal; and an output interface to output the enhanced audio signal.

2. The audio signal processing system of claim 1, wherein one of the one or more phase-related values represents an approximate value of the phase of a target signal in each time-frequency bin.

3. The audio signal processing system of claim 1, wherein one of the one or more phase-related values represents an approximate difference between the phase of a target signal in each time-frequency bin and a phase of the noisy audio signal in the corresponding time-frequency bin.

4. The audio signal processing system of claim 1, wherein one of the one or more phase-related values represents an approximate difference between the phase of a target signal in each time-frequency bin and the phase of a target signal in a different time-frequency bin.

5. The audio signal processing system of claim 1, further comprising a phase-related-value weights estimator, wherein the phase-related-value weights estimator estimates phase-related-value weights for each time-frequency bin, and the phase-related-value weights are used to combine the different phase-related values.

6. The audio signal processing system of claim 1, wherein the encoder includes parameters that determine the mappings of the time-frequency bins to the one or more phase-related values in the one or more phase quantization codebooks.

7. The audio signal processing system of claim 6, wherein, given a predetermined set of phase values for the one or more phase quantization codebooks, the parameters of the encoder are optimized so as to minimize an estimation error between a training enhanced audio signal and a corresponding training target audio signal on a training dataset of pairs of training noisy audio signals and training target audio signals.

8. The audio signal processing system of claim 6, wherein the phase values of the first quantization codebook are optimized together with the parameters of the encoder in order to minimize an estimation error between a training enhanced audio signal and a corresponding training target audio signal on a training dataset of pairs of training noisy audio signals and training target audio signals.

9. The audio signal processing system of claim 1, wherein the encoder maps each time-frequency bin of the noisy speech to a magnitude ratio value from a magnitude quantization codebook of magnitude ratio values indicative of quantized ratios of magnitudes of the target audio signal to magnitudes of the noisy audio signal.

10. The audio signal processing system of claim 9, wherein the magnitude quantization codebook includes multiple magnitude ratio values including at least one magnitude ratio value greater than one.

11. The audio signal processing system of claim 9, further comprising: a memory to store the first quantization codebook and the second quantization codebook, and to store a neural network trained to process the noisy audio signal to produce a first index of the phase value in the phase quantization codebook and a second index of the magnitude ratio value in the magnitude quantization codebook, wherein the encoder determines the first index and the second index using the neural network, and retrieves the phase value from the memory using the first index, and retrieves the magnitude ratio value from the memory using the second index.

12. The audio signal processing system of claim 9, wherein the phase values and the magnitude ratio values are optimized together with the parameters of the encoder in order to minimize an estimation error between training enhanced speech and corresponding training target speech.

13. The audio signal processing system of claim 9, wherein the first quantization codebook and the second quantization codebook form a joint quantization codebook with combinations of the phase values and the magnitude ratio values, such that the encoder maps each time-frequency bin of the noisy speech to the phase value and the magnitude ratio value forming a combination in the joint quantization codebook.

14. The audio signal processing system of claim 13, wherein the phase values and the magnitude ratio values are combined such that the joint quantization codebook includes a subset of all possible combinations of phase values and magnitude ratio values.

15. The audio signal processing system of claim 13, wherein the phase values and the magnitude ratio values are combined, such that the joint quantization codebook includes all possible combinations of phase values and magnitude ratio values.

16. A method for audio signal processing that includes a hardware processor coupled with a memory, wherein the memory has stored instructions and other data, the method comprising: accepting, by an input interface, a noisy audio signal including a mixture of a target audio signal and noise; mapping, by the hardware processor, each time-frequency bin of the noisy audio signal to one or more phase-related values from one or more phase quantization codebooks of phase-related values indicative of the phase of the target signal; calculating, by the hardware processor, for each time-frequency bin of the noisy audio signal, a magnitude ratio value indicative of a ratio of a magnitude of the target audio signal to a magnitude of the noisy audio signal; cancelling, using a filter, the noise from the noisy audio signal based on the phase values and the magnitude ratio values to produce an enhanced audio signal; and outputting, by an output interface, the enhanced audio signal.

17. The method of claim 16, wherein the cancelling further comprises: updating time-frequency coefficients of the filter using the one or more phase values and the magnitude ratio values determined by the hardware processor for each time-frequency bin, and multiplying the time-frequency coefficients of the filter with a time-frequency representation of the noisy audio signal to produce a time-frequency representation of the enhanced audio signal.

18. The method of claim 16, wherein the stored other data includes a first quantization codebook, a second quantization codebook, and a neural network trained to process the noisy audio signal to produce a first index of the phase value in the first quantization codebook and a second index of the magnitude ratio value in the second quantization codebook, wherein the hardware processor determines the first index and the second index using the neural network, and retrieves the phase value from the memory using the first index, and retrieves the magnitude ratio value from the memory using the second index.

19. The method of claim 18, wherein the first quantization codebook and the second quantization codebook form a joint quantization codebook with combinations of the phase values and the magnitude ratio values, such that the hardware processor maps each time-frequency bin of the noisy speech to the phase value and the magnitude ratio value forming a combination in the joint quantization codebook.

20. A non-transitory computer readable storage medium having embodied thereon a program executable by a hardware processor for performing a method, the method comprising: accepting a noisy audio signal including a mixture of a target audio signal and noise; mapping each time-frequency bin of the noisy audio signal to a phase value from a first quantization codebook of phase values indicative of quantized phase differences between phases of the noisy audio signal and phases of the target audio signal; mapping, by the hardware processor, each time-frequency bin of the noisy audio signal to one or more phase-related values from one or more phase quantization codebooks of phase-related values indicative of the phase of the target signal; calculating, by the hardware processor, for each time-frequency bin of the noisy audio signal, a magnitude ratio value indicative of a ratio of a magnitude of the target audio signal to a magnitude of the noisy audio signal; cancelling, using a filter, the noise from the noisy audio signal based on the phase values and the magnitude ratio values to produce an enhanced audio signal; and outputting, by an output interface, the enhanced audio signal.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The presently disclosed embodiments will be further explained with reference to the attached drawings. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.

(2) FIG. 1A is a flow diagram illustrating a method for audio signal processing, according to embodiments of the present disclosure;

(3) FIG. 1B is a block diagram illustrating a method for audio signal processing, implemented using some components of the system, according to embodiments of the present disclosure;

(4) FIG. 1C is a flow diagram illustrating noise suppression from a noisy speech signal using deep recurrent neural networks, where a time-frequency filter is estimated at each time-frequency bin using the output of the neural network and a codebook of filter prototypes, this time-frequency filter is multiplied with a time-frequency representation of the noisy speech to obtain a time-frequency representation of an enhanced speech, and this time-frequency representation of an enhanced speech is used to reconstruct an enhanced speech, according to embodiments of the present disclosure;

(5) FIG. 1D is a flow diagram illustrating noise suppression using deep recurrent neural networks, where a time-frequency filter is estimated at each time-frequency bin using the output of the neural network and a codebook of filter prototypes, this time-frequency filter is multiplied with a time-frequency representation of the noisy speech to obtain an initial time-frequency representation of an enhanced speech (initial enhanced spectrogram in FIG. 1D), and this initial time-frequency representation of an enhanced speech is used to reconstruct an enhanced speech via a spectrogram refinement module as follows: the initial time-frequency representation of an enhanced speech is refined using a spectrogram refinement module for example based on a phase reconstruction algorithm to obtain a time-frequency representation of an enhanced speech (enhanced speech spectrogram in FIG. 1D), and this time-frequency representation of an enhanced speech is used to reconstruct an enhanced speech, according to embodiments of the present disclosure;

(6) FIG. 2 is another flow diagram illustrating noise suppression using deep recurrent neural networks, where a time-frequency filter is estimated as a product of a magnitude component and a phase component, each component being estimated at each time-frequency bin using the output of the neural network and a corresponding codebook of prototypes, this time-frequency filter is multiplied with a time-frequency representation of the noisy speech to obtain a time-frequency representation of an enhanced speech, and this time-frequency representation of an enhanced speech is used to reconstruct an enhanced speech, according to embodiments of the present disclosure;

(7) FIG. 3 is a flow diagram of an embodiment where only the phase component of the filter is estimated using a codebook, according to embodiments of the present disclosure;

(8) FIG. 4 is a flow diagram of the training stage of the algorithm, according to embodiments of the present disclosure;

(9) FIG. 5 is a block diagram illustrating a network architecture for speech enhancement, according to embodiments of the present disclosure;

(10) FIG. 6A illustrates a joint quantization codebook in the complex domain regularly combining a phase quantization codebook and a magnitude quantization codebook;

(11) FIG. 6B illustrates a joint quantization codebook in the complex domain irregularly combining phase and magnitude values, such that the joint quantization codebook can be described as the union of two joint quantization codebooks, each regularly combining a phase quantization codebook and a magnitude quantization codebook;

(12) FIG. 6C illustrates a joint quantization codebook in the complex domain irregularly combining phase and magnitude values, such that the joint quantization codebook is most easily described as a set of points in the complex domain, where the points do not necessarily share a phase or magnitude component with each other;

(13) FIG. 7A is a schematic illustrating a computing apparatus that can be used to implement some techniques of the methods and systems, according to embodiments of the present disclosure; and

(14) FIG. 7B is a schematic illustrating a mobile computing apparatus that can be used to implement some techniques of the methods and systems, according to embodiments of the present disclosure.

(15) While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art which fall within the scope and spirit of the principles of the presently disclosed embodiments.
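The joint codebooks of FIGS. 6A and 6B can be pictured as sets of complex filter values. The following Python sketch, using NumPy, builds a regular joint codebook as all combinations of a magnitude codebook and a phase codebook (as in FIG. 6A), and an irregular one as a union of smaller regular codebooks (as in FIG. 6B). The specific magnitude and phase values, and the helper names `regular_joint_codebook` and `union_codebook`, are illustrative assumptions rather than values taken from the disclosure.

```python
import itertools

import numpy as np


def regular_joint_codebook(magnitudes, phases):
    # Every (magnitude, phase) pair as a complex point m * e^{i*phi}:
    # a regular grid in the complex plane (cf. FIG. 6A).
    return np.array([m * np.exp(1j * p) for m, p in itertools.product(magnitudes, phases)])


def union_codebook(*codebooks):
    # Concatenate several regular codebooks into one irregular codebook
    # (cf. FIG. 6B), dropping duplicate points such as repeated zeros.
    points = np.concatenate(codebooks)
    unique = []
    for z in points:
        if all(abs(z - u) > 1e-9 for u in unique):
            unique.append(z)
    return np.array(unique)


phases4 = [-np.pi / 2, 0.0, np.pi / 2, np.pi]
phases8 = [k * np.pi / 4 for k in range(-3, 5)]  # eight phases, -3*pi/4 ... pi

# FIG. 6A style: all 4 x 4 magnitude/phase combinations.
full = regular_joint_codebook([0.0, 0.5, 1.0, 2.0], phases4)
# FIG. 6B style: a single zero, coarse phases at magnitude 0.5, finer phases at magnitude 1.
mixed = union_codebook(
    regular_joint_codebook([0.0], [0.0]),
    regular_joint_codebook([0.5], phases4),
    regular_joint_codebook([1.0], phases8),
)
print(len(full), len(mixed))  # 16 13
```

Pairing magnitude 0 with every phase in `full` yields four copies of the same point, which illustrates how a joint codebook containing only a subset of all combinations, as in claim 14, can avoid redundant entries.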

DETAILED DESCRIPTION

(16) Overview

(17) The present disclosure relates to providing systems and methods for speech processing, including speech enhancement with noise suppression.

(18) Some embodiments of the present disclosure include an audio signal processing system having an input interface to receive a noisy audio signal including a mixture of a target audio signal and noise; an encoder that maps each time-frequency bin of the noisy audio signal to one or more phase-related values from one or more phase quantization codebooks of phase-related values indicative of the phase of the target signal, and that calculates, for each time-frequency bin of the noisy audio signal, a magnitude ratio value indicative of a ratio of a magnitude of the target audio signal to a magnitude of the noisy audio signal; a filter that cancels the noise from the noisy audio signal based on the phase-related values and the magnitude ratio values to produce an enhanced audio signal; and an output interface that outputs the enhanced audio signal.

(19) Referring to FIG. 1A and FIG. 1B, FIG. 1A is a flow diagram illustrating an audio signal processing method 100A. The method 100A can use a hardware processor coupled with a memory, where the memory stores instructions and other data, and the instructions, when executed by the hardware processor, carry out some steps of the method. Step 110 includes accepting, via an input interface, a noisy audio signal having a mixture of a target audio signal and noise.

(20) Step 115 of FIG. 1A and FIG. 1B includes mapping, via the hardware processor, each time-frequency bin of the noisy audio signal to one or more phase-related values from one or more phase quantization codebooks of phase-related values indicative of the phase of the target signal. The one or more phase quantization codebooks can be stored in memory 109 or can be accessed through a network. The one or more phase quantization codebooks can contain values that have been set manually beforehand or that are obtained by an optimization procedure to optimize performance, for example via training on a dataset of training data. The values contained in the one or more phase quantization codebooks are indicative of the phase of the enhanced speech, either by themselves or in combination with the noisy audio signal. For each time-frequency bin, the system chooses the most relevant value or combination of values within the one or more phase quantization codebooks, and this value or combination of values is used to estimate a phase of the enhanced audio signal at that time-frequency bin. For example, if the phase-related values represent the difference between the phase of the noisy audio signal and the phase of the clean target signal, a phase quantization codebook may contain several values such as

$$\left\{-\frac{\pi}{2},\ 0,\ \frac{\pi}{2},\ \pi\right\},$$
and the system may select the value 0 for bins whose energy is strongly dominated by the target signal energy. Selecting the value 0 for such bins results in using the phase of the noisy signal as is, because the phase component of the filter at those bins is then $e^{0 \cdot i} = 1$, where $i$ denotes the imaginary unit, which leaves the phase of the noisy signal unchanged.
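A minimal NumPy sketch of applying such a phase-difference codebook, assuming the four-value codebook above and integer codes already chosen for each bin (the function name `apply_phase_codes` is hypothetical): each noisy coefficient is rotated by $e^{iv}$ for its selected value $v$, so the code pointing at the value 0 returns the noisy coefficient unchanged, and the magnitude is never altered.

```python
import numpy as np

# Hypothetical 4-entry phase-difference codebook from the example above (radians).
PHASE_CODEBOOK = np.array([-np.pi / 2, 0.0, np.pi / 2, np.pi])


def apply_phase_codes(noisy_tf, codes):
    """Rotate each noisy time-frequency coefficient by its selected codebook phase."""
    # e^{i * v_c}; code 1 (value 0) gives e^{0i} = 1 and leaves the noisy phase untouched.
    rotation = np.exp(1j * PHASE_CODEBOOK[codes])
    return noisy_tf * rotation


noisy = np.array([[1 + 1j, -2.0 + 0.5j]])
codes = np.array([[1, 3]])  # first bin: keep noisy phase; second bin: rotate by pi
out = apply_phase_codes(noisy, codes)
```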

(22) Step 120 of FIG. 1A and FIG. 1B includes calculating, by the hardware processor, for each time-frequency bin of the noisy audio signal, a magnitude ratio value indicative of a ratio of a magnitude of the target audio signal to a magnitude of the noisy audio signal. For example, an enhancement network may estimate a magnitude ratio value close to 0 for bins where the energy of the noisy signal is dominated by that of the noise signal, and a magnitude ratio value close to 1 for bins where the energy of the noisy signal is dominated by that of the target signal. It may estimate a magnitude ratio value larger than 1 for bins where the interaction of the target signal and the noise signal resulted in a noisy signal whose energy is smaller than that of the target signal, for example when the noise is out of phase with the target in that bin and partially cancels it.
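To make the three cases concrete, the sketch below computes the ideal ratio $|S_{t,f}|/|X_{t,f}|$ for three single-bin examples, assuming the clean target $S$ is known, as it is during training. The helper name `oracle_magnitude_ratio` is hypothetical; the third bin shows how noise that is out of phase with the target makes the noisy magnitude smaller than the target magnitude, giving a ratio above 1.

```python
import numpy as np


def oracle_magnitude_ratio(target_tf, noisy_tf, eps=1e-8):
    """Per-bin |S| / |X|: the ideal value a magnitude ratio estimate would aim for."""
    return np.abs(target_tf) / np.maximum(np.abs(noisy_tf), eps)


target = np.array([1.0 + 0j, 1.0 + 0j, 1.0 + 0j])
# Bin 0: target-dominated; bin 1: noise-dominated; bin 2: destructive interference.
noise = np.array([0.01 + 0j, 10.0 + 0j, -0.5 + 0j])
noisy = target + noise
ratios = oracle_magnitude_ratio(target, noisy)  # approximately [0.99, 0.09, 2.0]
```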

(23) Step 125 of FIG. 1A and FIG. 1B can include cancelling, using a filter, the noise from the noisy audio signal based on the phase values and the magnitude ratio values to produce an enhanced audio signal. The time-frequency filter is, for example, obtained at each time-frequency bin by multiplying the calculated magnitude ratio value at that bin with the complex exponential of the estimated phase difference between the noisy signal and the target signal, where that estimate is obtained by mapping the time-frequency bin to the one or more phase-related values from the one or more phase quantization codebooks. For example, if the calculated magnitude ratio value at bin $(t,f)$, for time frame $t$ and frequency $f$, is $m_{t,f}$ and the angular value of the estimated phase difference between the noisy signal and the target signal at that bin is $\theta_{t,f}$, then the value of the filter at that bin can be obtained as $m_{t,f}\, e^{i\theta_{t,f}}$. This filter can then be multiplied with a time-frequency representation of the noisy signal to obtain a time-frequency representation of an enhanced audio signal. For example, the time-frequency representation can be a short-time Fourier transform (STFT), in which case the obtained time-frequency representation of the enhanced audio signal can be processed by an inverse STFT to obtain a time-domain enhanced audio signal. Alternatively, the obtained time-frequency representation of the enhanced audio signal can be processed by a phase reconstruction algorithm to obtain a time-domain enhanced audio signal.
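The filtering and reconstruction path of Step 125 can be sketched with SciPy's `scipy.signal.stft` and `scipy.signal.istft`. This is a sketch under assumptions: the sampling rate, the 512-sample Hann-windowed frames, and the callable `estimate_filter` (a hypothetical stand-in for the encoder that produces $m_{t,f}$ and $\theta_{t,f}$) are illustrative choices, not details from the disclosure.

```python
import numpy as np
from scipy.signal import istft, stft


def enhance(noisy, estimate_filter, fs=16000, nperseg=512):
    """Apply w_{t,f} = m_{t,f} * exp(i * theta_{t,f}) in the STFT domain and invert."""
    # STFT of the noisy waveform, shape (freq_bins, frames); Hann window by default.
    _, _, noisy_tf = stft(noisy, fs=fs, nperseg=nperseg)
    # Hypothetical encoder stand-in: returns magnitude ratios and phase differences.
    m, theta = estimate_filter(noisy_tf)
    filt = m * np.exp(1j * theta)    # complex filter, one value per bin
    enhanced_tf = filt * noisy_tf    # bin-wise product
    _, enhanced = istft(enhanced_tf, fs=fs, nperseg=nperseg)
    return enhanced[: len(noisy)]    # istft output can be longer because of padding


def passthrough(noisy_tf):
    # m = 1 and theta = 0 everywhere: the filter is the identity.
    return np.ones(noisy_tf.shape), np.zeros(noisy_tf.shape)


rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
y = enhance(x, passthrough)
```

With the pass-through filter ($m_{t,f} = 1$, $\theta_{t,f} = 0$), the output matches the input up to floating-point error, which is a useful sanity check before substituting a trained estimator.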

(24) The speech enhancement method 100 is directed to, among other things, obtaining enhanced speech, which is a processed version of the noisy speech that is closer, according to some distance measure, to the underlying true clean speech or target speech.

(25) Note that the target speech, i.e., the clean speech, can be assumed to be available only during training and not during real-world use of the system, according to some embodiments. For training, clean speech can be obtained with a close-talking microphone, while the noisy speech is obtained with a far-field microphone recording at the same time, according to some embodiments. Alternatively, given separate clean speech signals and noise signals, one can add the signals together to obtain noisy speech signals, and the resulting clean and noisy pairs can be used together for training.
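For the second option, a common practice is to scale the noise so that each training mixture has a chosen signal-to-noise ratio (SNR). The sketch below assumes that practice; the SNR control and the name `mix_at_snr` are assumptions, since the text only states that clean and noise signals are added together.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    noise = noise[: len(clean)]               # match lengths
    p_clean = np.mean(clean ** 2)             # average signal power
    p_noise = np.mean(noise ** 2)             # average noise power before scaling
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = scale * noise
    return clean + scaled_noise, scaled_noise


rng = np.random.default_rng(0)
clean = rng.standard_normal(8000)             # stand-in for a clean utterance
noise = rng.standard_normal(10000)            # stand-in for a noise recording
noisy, used_noise = mix_at_snr(clean, noise, snr_db=5.0)
# (noisy, clean) is one training pair.
```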

(26) Step 130 of FIG. 1A and FIG. 1B can include outputting, by an output interface, the enhanced audio signal.

(27) Embodiments of the present disclosure provide unique aspects. For example, an estimate of the phase of the target signal is obtained by selecting or combining a limited number of values within one or more phase quantization codebooks. These aspects allow the present disclosure to obtain a better estimate of the phase of the target signal, resulting in better quality of the enhanced target signal.

(28) Referring to FIG. 1B, FIG. 1B is a block diagram illustrating a method for speech processing, implemented using some components of the system, according to embodiments of the present disclosure. For example, FIG. 1B can be a block diagram illustrating the system of FIG. 1A, wherein the system 100B is implemented using some components, including a hardware processor 140 in communication with an input interface 142, an occupant transceiver 144, a memory 146, a transmitter 148, and a controller 150. The controller can be connected to the set of devices 152. The occupant transceiver 144 can be a wearable electronic device that the occupant (user) wears to control the set of devices 152, and it can also send and receive information.

(29) It is contemplated that the hardware processor 140 can include two or more hardware processors, depending upon the requirements of the specific application. Other components, including input interfaces, output interfaces and transceivers, may also be incorporated with method 100.

(30) FIG. 1C is a flow diagram illustrating noise suppression using deep neural networks, where a time-frequency filter is estimated at each time-frequency bin using the output of the neural network and a codebook of filter prototypes, and this time-frequency filter is multiplied with a time-frequency representation of the noisy speech to obtain a time-frequency representation of an enhanced speech, according to embodiments of the present disclosure. The system is illustrated using the example of speech enhancement, that is, the separation of speech from noise within a noisy signal, but the same considerations apply to more general cases such as source separation, in which the system estimates multiple target audio signals from a mixture of target audio signals and potentially other non-target sources such as noise. For example, FIG. 1C illustrates an audio signal processing system 100C that uses processor 140 to estimate a target speech signal 190 from an input noisy speech signal 105 obtained from a sensor 103, such as a microphone monitoring an environment 102. The system 100C processes the noisy speech 105 using an enhancement network 154 with network parameters 152. The enhancement network 154 maps each time-frequency bin of a time-frequency representation of the noisy speech 105 to one or more filter codes 156 for that time-frequency bin. For each time-frequency bin, the one or more filter codes 156 are used to select or combine values corresponding to the one or more filter codes within a filter codebook 158 to obtain a filter 160 for that time-frequency bin. For example, if the filter codebook 158 contains five values $v_0 = -1$, $v_1 = 0$, $v_2 = 1$, $v_3 = -i$, $v_4 = i$, the enhancement network 154 may estimate a code $c_{t,f} \in \{0,1,2,3,4\}$ for a time-frequency bin $(t,f)$, in which case the value of the filter 160 at time-frequency bin $(t,f)$ may be set to $w_{t,f} = v_{c_{t,f}}$.
A speech estimation module 165 then multiplies the time-frequency representation of the noisy speech 105 with the filter 160 to obtain a time-frequency representation of the enhanced speech, and inverts that time-frequency representation of the enhanced speech to obtain the enhanced speech signal 190.
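Using the five prototypes of the example above, the code lookup and bin-wise multiplication of FIG. 1C reduce to array indexing. In this minimal NumPy sketch, the codes are supplied by hand in place of the output of the enhancement network 154, and the function names are hypothetical.

```python
import numpy as np

# Filter prototypes v_0..v_4 = -1, 0, 1, -i, i.
FILTER_CODEBOOK = np.array([-1, 0, 1, -1j, 1j], dtype=complex)


def filter_from_codes(codes):
    """w_{t,f} = v_{c_{t,f}}: look up one prototype per time-frequency bin."""
    return FILTER_CODEBOOK[codes]


def estimate_speech_tf(noisy_tf, codes):
    # Bin-wise product of the noisy time-frequency representation and the filter.
    return filter_from_codes(codes) * noisy_tf


noisy_tf = np.array([[2 + 1j, 3 - 1j],
                     [0.5j, -1.0 + 0j]])
codes = np.array([[2, 1],    # keep bin as is; zero it out
                  [3, 0]])   # rotate by -90 degrees; flip sign
enhanced_tf = estimate_speech_tf(noisy_tf, codes)
```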

(31) FIG. 1D is a flow diagram illustrating noise suppression using deep neural networks, where a time-frequency filter is estimated at each time-frequency bin using the output of the neural network and a codebook of filter prototypes, this time-frequency filter is multiplied with a time-frequency representation of the noisy speech to obtain an initial time-frequency representation of an enhanced speech (initial enhanced spectrogram in FIG. 1D), and this initial time-frequency representation of an enhanced speech is used to reconstruct an enhanced speech via a spectrogram refinement module as follows: the initial time-frequency representation of an enhanced speech is refined using a spectrogram refinement module for example based on a phase reconstruction algorithm to obtain a time-frequency representation of an enhanced speech (enhanced speech spectrogram in FIG. 1D), and this time-frequency representation of an enhanced speech is used to reconstruct an enhanced speech, according to embodiments of the present disclosure.

(32) For example, FIG. 1D illustrates an audio signal processing system 100D that uses processor 140 to estimate a target speech signal 190 from an input noisy speech signal 105 obtained from a sensor 103, such as a microphone monitoring an environment 102. The system 100D processes the noisy speech 105 using an enhancement network 154 with network parameters 152. The enhancement network 154 maps each time-frequency bin of a time-frequency representation of the noisy speech 105 to one or more filter codes 156 for that time-frequency bin. For each time-frequency bin, the one or more filter codes 156 are used to select or combine values corresponding to the one or more filter codes within a filter codebook 158 to obtain a filter 160 for that time-frequency bin. For example, if the filter codebook 158 contains five values $v_0 = -1$, $v_1 = 0$, $v_2 = 1$, $v_3 = -i$, $v_4 = i$, the enhancement network 154 may estimate a code $c_{t,f} \in \{0,1,2,3,4\}$ for a time-frequency bin $(t,f)$, in which case the value of the filter 160 at time-frequency bin $(t,f)$ may be set to $w_{t,f} = v_{c_{t,f}}$. A speech estimation module 165 then multiplies the time-frequency representation of the noisy speech 105 with the filter 160 to obtain an initial time-frequency representation of the enhanced speech, here denoted as initial enhanced spectrogram 166; processes this initial enhanced spectrogram 166 using a spectrogram refinement module 167, for example based on a phase reconstruction algorithm, to obtain a time-frequency representation of the enhanced speech, here denoted as enhanced speech spectrogram 168; and inverts that enhanced speech spectrogram 168 to obtain the enhanced speech signal 190.
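The disclosure leaves the phase reconstruction algorithm of the spectrogram refinement module 167 open. One well-known choice is Griffin-Lim-style iteration, which keeps the magnitudes of the initial enhanced spectrogram and alternates inverse and forward STFTs so that the phase becomes consistent with a real signal. The sketch below assumes that algorithm, SciPy's STFT with 512-sample frames, and a known output length; none of these specifics come from the text, and `griffin_lim_refine` is a hypothetical name.

```python
import numpy as np
from scipy.signal import istft, stft


def griffin_lim_refine(initial_tf, length, n_iter=32, fs=16000, nperseg=512):
    """Keep |initial_tf| and iteratively replace its phase with an STFT-consistent one.

    `length` is the number of samples of the signal whose STFT produced
    `initial_tf`; truncating to it keeps every re-computed STFT the same shape.
    """
    target_mag = np.abs(initial_tf)
    spec = initial_tf  # start from the filtered (initial) phase, not a random one
    for _ in range(n_iter):
        _, signal = istft(spec, fs=fs, nperseg=nperseg)
        _, _, rebuilt = stft(signal[:length], fs=fs, nperseg=nperseg)
        # Re-impose the target magnitudes, keep the consistent phase.
        spec = target_mag * np.exp(1j * np.angle(rebuilt))
    _, signal = istft(spec, fs=fs, nperseg=nperseg)
    return spec, signal[:length]


rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
_, _, X = stft(x, fs=16000, nperseg=512)  # an already-consistent spectrogram
spec, y = griffin_lim_refine(X, length=len(x), n_iter=5)
```

Starting from an already-consistent spectrogram, as in the usage above, the iteration leaves it essentially unchanged; with a filtered spectrogram it pulls the phase toward one that a real waveform can actually produce.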

(33) FIG. 2 is another flow diagram illustrating noise suppression using deep neural networks, where a time-frequency filter is estimated as a product of a magnitude component and a phase component, each component being estimated at each time-frequency bin using the output of the neural network and a corresponding codebook of prototypes, and this time-frequency filter is multiplied with a time-frequency representation of the noisy speech to obtain a time-frequency representation of an enhanced speech, according to embodiments of the present disclosure. For example, the method 200 of FIG. 2 uses processor 140 to estimate a target speech signal 290 from an input noisy speech signal 105 obtained from a sensor 103, such as a microphone monitoring an environment 102. The system 200 processes the noisy speech 105 using an enhancement network 254 with network parameters 252. The enhancement network 254 maps each time-frequency bin of a time-frequency representation of the noisy speech 105 to one or more magnitude codes 270 and one or more phase codes 272 for that time-frequency bin. For each time-frequency bin, the one or more magnitude codes 270 are used to select or combine magnitude values corresponding to the one or more magnitude codes within a magnitude codebook 276 to obtain a filter magnitude 274 for that time-frequency bin. For example, if the magnitude codebook 276 contains four values $v^{(m)}_0 = 0$, $v^{(m)}_1 = 0.5$, $v^{(m)}_2 = 1$, $v^{(m)}_3 = 2$, the enhancement network 254 may estimate a code $c^{(m)}_{t,f} \in \{0,1,2,3\}$ for a time-frequency bin $(t,f)$, in which case the value of the filter magnitude 274 at time-frequency bin $(t,f)$ may be set to

$$w^{(m)}_{t,f} = v^{(m)}_{c^{(m)}_{t,f}}.$$
For each time-frequency bin, the one or more phase codes 272 are used to select or combine phase-related values corresponding to the one or more phase codes within a phase codebook 280 to obtain a filter phase 278 for that time-frequency bin. For example, if the phase codebook 280 contains four values

$$v^{(p)}_0 = -\frac{\pi}{2},\quad v^{(p)}_1 = 0,\quad v^{(p)}_2 = \frac{\pi}{2},\quad v^{(p)}_3 = \pi,$$
the enhancement network 254 may estimate a code $c^{(p)}_{t,f} \in \{0,1,2,3\}$ for a time-frequency bin $(t,f)$, in which case the value of the filter phase 278 at time-frequency bin $(t,f)$ may be set to

$$w^{(p)}_{t,f} = e^{i v^{(p)}_{c^{(p)}_{t,f}}}.$$
The filter magnitudes 274 and filter phases 278 are combined to obtain a filter 260. For example, they can be combined by multiplying their values at each time-frequency bin $(t,f)$, in which case the value of the filter 260 at time-frequency bin $(t,f)$ may be set to

$$w_{t,f} = w^{(m)}_{t,f}\, w^{(p)}_{t,f} = v^{(m)}_{c^{(m)}_{t,f}}\, e^{i v^{(p)}_{c^{(p)}_{t,f}}}.$$
A speech estimation module 265 then multiplies, at each time-frequency bin, the time-frequency representation of the noisy speech 105 with the filter 260 to obtain a time-frequency representation of the enhanced speech, and inverts that time-frequency representation of the enhanced speech to obtain the enhanced speech signal 290.
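The FIG. 2 example, with four magnitude values and four phase values, can be sketched in NumPy as two codebook lookups followed by a product. The codes below are chosen by hand in place of network outputs, and the function name `filter_from_mag_phase_codes` is hypothetical.

```python
import numpy as np

MAG_CODEBOOK = np.array([0.0, 0.5, 1.0, 2.0])                     # v^{(m)}_0 .. v^{(m)}_3
PHASE_CODEBOOK = np.array([-np.pi / 2, 0.0, np.pi / 2, np.pi])    # v^{(p)}_0 .. v^{(p)}_3


def filter_from_mag_phase_codes(mag_codes, phase_codes):
    # w_{t,f} = v^{(m)}_{c^{(m)}} * exp(i * v^{(p)}_{c^{(p)}}), evaluated bin by bin.
    return MAG_CODEBOOK[mag_codes] * np.exp(1j * PHASE_CODEBOOK[phase_codes])


mag_codes = np.array([[2, 3], [0, 1]])
phase_codes = np.array([[1, 3], [0, 2]])
w = filter_from_mag_phase_codes(mag_codes, phase_codes)  # approximately [[1, -2], [0, 0.5i]]
```

With these two codebooks, the filter at each bin can take at most 4 x 4 = 16 complex values, while the network only has to choose among four options per component.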

(38) FIG. 3 is a flow diagram of an embodiment where only the phase component of the filter is estimated using a codebook, according to embodiments of the present disclosure. For example, the method 300 of FIG. 3 uses processor 140 to estimate a target speech signal 390 from an input noisy speech signal 105 obtained from a sensor 103, such as a microphone monitoring an environment 102. The method 300 processes the noisy speech 105 using an enhancement network 354 with network parameters 352. The enhancement network 354 estimates a filter magnitude 374 for each time-frequency bin of a time-frequency representation of the noisy speech 105, and also maps each time-frequency bin to one or more phase codes 372 for that time-frequency bin. For each time-frequency bin, the filter magnitude 374 estimated by the network is indicative of the ratio of the magnitude of the target speech to the magnitude of the noisy speech for that time-frequency bin. For example, the enhancement network 354 may estimate a filter magnitude $w^{(m)}_{t,f}$ for a time-frequency bin $(t,f)$ such that $w^{(m)}_{t,f}$ is a non-negative real number, whose range may be unlimited or may be limited to a specific range such as $[0,1]$ or $[0,2]$. For each time-frequency bin, the one or more phase codes 372 are used to select or combine phase-related values corresponding to the one or more phase codes within a phase codebook 380 to obtain a filter phase 378 for that time-frequency bin. For example, if the phase codebook 380 contains four values

(39) $$v^{(p)}_0 = -\frac{\pi}{2}, \quad v^{(p)}_1 = 0, \quad v^{(p)}_2 = \frac{\pi}{2}, \quad v^{(p)}_3 = \pi,$$
the enhancement network 354 may estimate a code $c^{(p)}_{t,f} \in \{0,1,2,3\}$ for a time-frequency bin t,f, in which case the value of the filter phase 378 at time-frequency bin t,f may be set to

(40) $$w^{(p)}_{t,f} = e^{i v^{(p)}_{c^{(p)}_{t,f}}}.$$
The filter magnitudes 374 and filter phases 378 are combined to obtain a filter 360. For example, they can be combined by multiplying their values at each time-frequency bin t,f, in which case the value of the filter 360 at time-frequency bin t,f may be set to

(41) $$w_{t,f} = w^{(m)}_{t,f}\, w^{(p)}_{t,f} = w^{(m)}_{t,f}\, e^{i v^{(p)}_{c^{(p)}_{t,f}}}.$$
A speech estimation module 365 then multiplies, at each time-frequency bin, the time-frequency representation of the noisy speech 105 by the filter 360 to obtain a time-frequency representation of the enhanced speech, and inverts that time-frequency representation to obtain the enhanced speech signal 390.
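The FIG. 3 variant, a continuous non-negative magnitude combined with a phase taken from a codebook as in (41), can be sketched as follows. This is an illustrative sketch only: it assumes the four phase codebook values are -π/2, 0, π/2 and π, and it clips the predicted magnitude to [0, 2], which is one of the example ranges given in the text. The function name and the test values are hypothetical.

```python
import numpy as np

# Assumed phase codebook: four evenly spaced phases (-pi/2, 0, pi/2, pi).
PHASE_CODEBOOK = np.array([-np.pi / 2, 0.0, np.pi / 2, np.pi])


def magnitude_phasebook_filter(w_mag, phase_codes, phase_codebook=PHASE_CODEBOOK, max_mag=2.0):
    """Filter w = w^(m) * exp(i * v^(p)[c^(p)]) with a continuous magnitude.

    w_mag: real array (T, F) from the network, clipped here to [0, max_mag].
    phase_codes: integer array (T, F) with values in {0, 1, 2, 3}.
    """
    w_mag = np.clip(np.asarray(w_mag, dtype=float), 0.0, max_mag)
    return w_mag * np.exp(1j * phase_codebook[phase_codes])


# Hypothetical network outputs for three bins, and the noisy coefficients they filter.
w = magnitude_phasebook_filter(np.array([[0.5, 3.0, -1.0]]), np.array([[3, 0, 1]]))
noisy_tf = np.array([[1.0 + 1.0j, 2.0, 1.0j]])
enhanced_tf = w * noisy_tf
```

The clipping step is a design choice for the sketch; an unbounded magnitude would simply drop the `np.clip` call.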

(42) FIG. 4 is a flow diagram illustrating training of an audio signal processing system 400 for speech enhancement, according to embodiments of the present disclosure. The system is illustrated using the example of speech enhancement, that is, the separation of speech from noise within a noisy signal, but the same considerations apply to more general cases such as source separation, in which the system estimates multiple target audio signals from a mixture of target audio signals and potentially other non-target sources such as noise. A noisy input speech signal 405 including a mixture of speech and noise, and the corresponding clean signals 461 for the speech and noise, are sampled from the training set of clean and noisy audio 401. The noisy input signal 405 is processed by an enhancement network 454, using stored network parameters 452, to compute a filter 460 for the target signal. A speech estimation module 465 then multiplies, at each time-frequency bin, the time-frequency representation of the noisy speech 405 by the filter 460 to obtain a time-frequency representation of the enhanced speech, and inverts that time-frequency representation to obtain the enhanced speech signal 490. An objective function computation module 463 computes an objective function as a distance between the clean speech and the enhanced speech. A network training module 457 can use the objective function to update the network parameters 452.
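The role of the objective function computation module 463 can be illustrated with a toy example. The sketch below assumes the distance is a mean squared error between complex time-frequency representations, which is only one possible choice since the text does not fix the distance, and it uses random arrays in place of the training set 401. Under this particular distance, the ideal complex ratio of clean to noisy coefficients drives the objective to zero, so it is the filter that a codebook-based filter would be pushed toward; the network and its parameter update are not modeled.

```python
import numpy as np


def objective(enhanced_tf, clean_tf):
    """Mean squared distance between enhanced and clean time-frequency representations."""
    diff = np.asarray(enhanced_tf) - np.asarray(clean_tf)
    return float(np.mean(np.abs(diff) ** 2))


rng = np.random.default_rng(0)
shape = (4, 5)  # (frames, frequencies), arbitrary toy size
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

# Ideal complex ratio S/X: multiplying the noisy input by it recovers the clean input.
oracle_filter = clean / noisy
loss_oracle = objective(oracle_filter * noisy, clean)  # zero up to rounding error
loss_passthrough = objective(noisy, clean)             # filter equal to one everywhere
```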

(43) FIG. 5 is a block diagram illustrating a network architecture 500 for speech enhancement, according to embodiments of the present disclosure. A sequence of feature vectors obtained from the input noisy speech 505, for example the log magnitude 520 of the short-time Fourier transform 510 of the input mixture, is used as input to a series of layers within an enhancement network 554. For example, the dimension of the input vector in the sequence can be F. The enhancement network can include multiple bidirectional long short-term memory (BLSTM) neural network layers, from the first BLSTM layer 530 to the last BLSTM layer 535. Each BLSTM layer is composed of a forward long short-term memory (LSTM) layer and a backward LSTM layer, whose outputs are combined and used as input by the next layer. For example, the dimension of the output of each LSTM in the first BLSTM layer 530 can be N, and both the input and output dimensions of each LSTM in all other BLSTM layers, including the last BLSTM layer 535, can be N. The output of the last BLSTM layer 535 can be used as input to a magnitude softmax layer 540 and a phase softmax layer 542. For each time frame and each frequency in a time-frequency domain, for example the short-time Fourier transform domain, the magnitude softmax layer 540 uses the output of the last BLSTM layer 535 to output $I^{(m)}$ non-negative numbers summing to 1, where $I^{(m)}$ is the number of values in the magnitude codebook 576, and these $I^{(m)}$ numbers represent the probabilities that the corresponding values in the magnitude codebook should be selected as the filter magnitude 574.
A filter magnitude computation module 550 can use these probabilities as a plurality of weighted magnitude codes 570 to combine multiple values in the magnitude codebook 576 in a weighted fashion, it can use only the largest probability as a unique magnitude code 570 to select the corresponding value in the magnitude codebook 576, or it can use a single value sampled according to these probabilities as a unique magnitude code 570 to select the corresponding value in the magnitude codebook 576; these are among multiple ways of using the output of the enhancement network 554 to obtain a filter magnitude 574. For each time frame and each frequency in a time-frequency domain, for example the short-time Fourier transform domain, the phase softmax layer 542 uses the output of the last BLSTM layer 535 to output $I^{(p)}$ non-negative numbers summing to 1, where $I^{(p)}$ is the number of values in the phase codebook 580, and these $I^{(p)}$ numbers represent the probabilities that the corresponding values in the phase codebook should be selected as the filter phase 578. A filter phase computation module 552 can likewise use these probabilities as a plurality of weighted phase codes 572 to combine multiple values in the phase codebook 580 in a weighted fashion, use only the largest probability as a unique phase code 572 to select the corresponding value in the phase codebook 580, or use a single value sampled according to these probabilities as a unique phase code 572 to select the corresponding value in the phase codebook 580, among multiple ways of using the output of the enhancement network 554 to obtain a filter phase 578. A filter combination module 560 combines the filter magnitudes 574 and the filter phases 578, for example by multiplying them, to obtain a filter 576.
A speech estimation module 565 uses a spectrogram estimation module 584 to process the filter 576 together with a time-frequency representation of the noisy speech 505 such as the short-time Fourier transform 582, for example by multiplying them with each other, to obtain an enhanced spectrogram, which is inverted in a speech reconstruction module 588 to obtain an enhanced speech 590.
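The three ways of turning softmax outputs into a filter value described for the filter magnitude computation module 550 and the filter phase computation module 552 (weighted combination, most probable code, sampled code) can be sketched as follows. The function names and codebook values are hypothetical, and the BLSTM layers are replaced by fixed logits. For the phase, the sketch stores the codebook as unit phasors $e^{iv}$ rather than raw angles; this is an assumption made here so that a weighted combination does not suffer from angle wrap-around, and such a combination can have modulus below one.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Probabilities over codebook entries; shifted by the max for numerical stability."""
    z = logits - np.max(logits, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def codebook_value(probs, codebook, mode="weighted", rng=None):
    """Map per-bin probabilities of shape (..., I) to one value per bin, shape (...).

    mode="weighted": probability-weighted combination of the codebook values
    mode="argmax":   codebook value of the most probable code
    mode="sample":   codebook value of a code drawn from the probabilities
    """
    codebook = np.asarray(codebook)
    if mode == "weighted":
        return probs @ codebook
    if mode == "argmax":
        return codebook[np.argmax(probs, axis=-1)]
    if mode == "sample":
        if rng is None:
            rng = np.random.default_rng()
        flat = probs.reshape(-1, probs.shape[-1])
        idx = np.array([rng.choice(codebook.size, p=p) for p in flat])
        return codebook[idx].reshape(probs.shape[:-1])
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical codebooks; phases are stored as unit phasors exp(i * v).
mag_codebook = np.array([0.0, 0.5, 1.0, 2.0])
phasors = np.exp(1j * np.array([-np.pi / 2, 0.0, np.pi / 2, np.pi]))

# One frame and one frequency; fixed logits stand in for the softmax layers 540 and 542.
p_mag = softmax(np.zeros((1, 1, 4)))                    # uniform probabilities
p_phase = softmax(np.array([[[0.0, 0.0, 10.0, 0.0]]]))  # strongly favors code 2
w = codebook_value(p_mag, mag_codebook, "weighted") * codebook_value(p_phase, phasors, "argmax")
```

Using the weighted mode keeps the whole pipeline differentiable, which is convenient for training, while the argmax and sampled modes yield exactly one codebook entry per bin.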

(44) Features

(45) According to aspects of the present disclosure, the combinations of the phase values and the magnitude ratio values can be selected to minimize an estimation error between training enhanced speech and the corresponding training target speech.

(46) Another aspect of the present disclosure can include combining the phase values and the magnitude ratio values regularly and fully, such that each phase value in the joint quantization codebook forms a combination with each magnitude ratio value in the joint quantization codebook. This is illustrated in FIG. 6A, which shows a phase codebook with six values, a magnitude codebook with four values, and a joint quantization codebook with regular combination in the complex domain, where the set of complex values in the joint quantization codebook is equal to the set of values of the form $m e^{i\theta}$ for all values $m$ in the magnitude codebook and all values $\theta$ in the phase codebook.
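A regular joint codebook as in FIG. 6A is the outer product of the magnitude and phase codebooks in the complex domain. The sketch below assumes that reading; the four magnitude values and six phase values are hypothetical, since the text gives only the codebook sizes, and the function name is illustrative.

```python
import numpy as np


def regular_joint_codebook(magnitudes, phases):
    """All complex values m * exp(i * theta) for every magnitude m and every phase theta."""
    m = np.asarray(magnitudes, dtype=float)[:, None]
    theta = np.asarray(phases, dtype=float)[None, :]
    return (m * np.exp(1j * theta)).ravel()


# Hypothetical values: four magnitudes and six evenly spaced phases.
magnitudes = [0.25, 0.5, 1.0, 2.0]
phases = 2 * np.pi * np.arange(6) / 6
joint = regular_joint_codebook(magnitudes, phases)
```

With such a codebook, a single code per bin selects a magnitude and a phase together.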

(47) Further, the phase values and the magnitude ratio values can be combined irregularly, such that the joint quantization codebook includes a first magnitude ratio value forming combinations with a first set of phase values and a second magnitude ratio value forming combinations with a second set of phase values, wherein the first set of phase values differs from the second set of phase values. This is illustrated in FIG. 6B, which shows a joint quantization codebook with irregular combination in the complex domain, where the set of values in the joint quantization codebook is equal to the union of the set of values of the form $m_1 e^{i\theta_1}$ for all values $m_1$ in magnitude codebook 1 and all values $\theta_1$ in phase codebook 1 with the set of values of the form $m_2 e^{i\theta_2}$ for all values $m_2$ in magnitude codebook 2 and all values $\theta_2$ in phase codebook 2. More generally, FIG. 6C illustrates a joint quantization codebook with a set of K complex values $w_k$, where $w_k = m_k e^{i\theta_k}$, $m_k$ is the unique value of a k-th magnitude codebook, and $\theta_k$ is the unique value of a k-th phase codebook.
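The irregular construction of FIG. 6B can be sketched as a union of regular sub-codebooks, each pairing its own magnitude values with its own phase values; the FIG. 6C case is the special case in which every sub-codebook holds a single magnitude and a single phase. The sub-codebook contents below are hypothetical, and this sketch concatenates the sub-codebooks without removing values that two of them might share.

```python
import numpy as np


def irregular_joint_codebook(sub_codebooks):
    """Concatenate regular sub-codebooks.

    sub_codebooks: list of (magnitudes, phases) pairs; pair k contributes every
    m * exp(i * theta) with m in magnitudes_k and theta in phases_k.
    Values shared by several sub-codebooks are not de-duplicated here.
    """
    parts = [
        (np.asarray(mags, dtype=float)[:, None]
         * np.exp(1j * np.asarray(phs, dtype=float)[None, :])).ravel()
        for mags, phs in sub_codebooks
    ]
    return np.concatenate(parts)


# FIG. 6B style (hypothetical values): a small magnitude with coarse phases,
# and larger magnitudes with finer phases.
fig6b = irregular_joint_codebook([
    ([0.5], 2 * np.pi * np.arange(4) / 4),
    ([1.0, 2.0], 2 * np.pi * np.arange(8) / 8),
])

# FIG. 6C style: K sub-codebooks, each with exactly one magnitude and one phase.
fig6c = irregular_joint_codebook([([1.0], [0.0]), ([2.0], [np.pi])])
```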

(48) Another aspect of the present disclosure can be that one of the one or more phase-related values represents an approximate value of the phase of a target signal in each time-frequency bin. Further, another aspect can be that one of the one or more phase-related values represents an approximate difference between the phase of a target signal in each time-frequency bin and the phase of the noisy audio signal in the corresponding time-frequency bin.
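The two readings of a phase-related value differ only in what the value is added to. The sketch below, with hypothetical function names and values, builds an enhanced coefficient either from an estimate of the target phase itself or from an estimate of the difference between the target and noisy phases; in the second case the value acts as the phase of a filter that multiplies the noisy coefficient. Both readings give the same result when the absolute phase equals the noisy phase plus the difference.

```python
import numpy as np


def enhance_from_target_phase(noisy_tf, mag_ratio, target_phase):
    """Phase-related value read as an estimate of the target phase itself."""
    return mag_ratio * np.abs(noisy_tf) * np.exp(1j * target_phase)


def enhance_from_phase_difference(noisy_tf, mag_ratio, phase_diff):
    """Phase-related value read as target phase minus noisy phase (a filter phase)."""
    return mag_ratio * np.exp(1j * phase_diff) * noisy_tf


noisy = 2.0 * np.exp(1j * 0.3)  # hypothetical noisy coefficient with phase 0.3 rad
diff = np.pi / 2                # hypothetical estimated phase difference
a = enhance_from_target_phase(noisy, 0.5, np.angle(noisy) + diff)
b = enhance_from_phase_difference(noisy, 0.5, diff)
```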

(49) It is also possible that one of the one or more phase-related values represents an approximate difference between the phase of a target signal in each time-frequency bin and the phase of a target signal in a different time-frequency bin. The different phase-related values can be combined using phase-related-value weights, which are estimated for each time-frequency bin. This estimation can be performed by the network, or it can be performed offline by estimating the best combination according to some performance criterion on some training data.
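When several phase estimates are available for a bin (for example, the noisy phase plus an estimated difference, and the phase estimated for a neighboring bin plus another estimated difference), one way to combine them with per-bin weights is a weighted circular mean. This is an assumption for illustration; the text does not specify how the weighted combination is computed, and the function name is hypothetical. Averaging unit phasors rather than raw angles avoids the wrap-around problem shown in the usage lines.

```python
import numpy as np


def combine_phase_estimates(phase_estimates, weights):
    """Weighted circular mean of K phase estimates.

    phase_estimates, weights: arrays of shape (K, ...) with non-negative weights.
    Returns angle(sum_k weights_k * exp(i * phase_k)) for every bin.
    """
    phasors = np.asarray(weights) * np.exp(1j * np.asarray(phase_estimates))
    return np.angle(phasors.sum(axis=0))


# Two estimates on either side of the +/-pi wrap-around, equally weighted.
estimates = np.array([np.pi - 0.1, -np.pi + 0.1])
combined = combine_phase_estimates(estimates, np.array([0.5, 0.5]))
naive = estimates.mean()  # plain average of the angles: close to 0, pointing the wrong way
```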

(50) Another aspect can be that the one or more phase-related values in the one or more phase quantization codebooks are chosen to minimize an estimation error between a training enhanced audio signal and a corresponding training target audio signal.
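One simple way to fit phase codebook values to training data is a Lloyd (k-means) iteration on the unit circle. This is a swapped-in technique used only as a proxy: it minimizes the quantization error of training phase values (for example, oracle differences between target and noisy phases), not the enhancement error between training enhanced and target signals named in the text, which would require training through the full system. All names and the synthetic data are hypothetical.

```python
import numpy as np


def circular_distance(a, b):
    """Absolute angular distance in [0, pi], handling wrap-around."""
    return np.abs(np.angle(np.exp(1j * (a - b))))


def learn_phase_codebook(train_phases, init_codebook, n_iter=20):
    """Lloyd (k-means) iterations for a phase codebook on the unit circle.

    Each step assigns every training phase to its nearest codebook value and moves
    that value to the angle of the mean unit phasor of its members, which minimizes
    the summed squared chord distance |exp(i*phi) - exp(i*v)|^2 within the cluster.
    """
    phases = np.asarray(train_phases, dtype=float)
    codebook = np.array(init_codebook, dtype=float)
    z = np.exp(1j * phases)
    for _ in range(n_iter):
        assign = np.argmin(circular_distance(phases[:, None], codebook[None, :]), axis=1)
        for k in range(codebook.size):
            members = z[assign == k]
            if members.size:  # an unused code keeps its previous value
                codebook[k] = np.angle(members.mean())
    return codebook


# Synthetic training phases clustered around four hypothetical centers.
rng = np.random.default_rng(0)
centers = np.array([-np.pi / 2, 0.0, np.pi / 2, np.pi])
raw = centers[rng.integers(0, 4, size=2000)] + 0.05 * rng.standard_normal(2000)
train = np.angle(np.exp(1j * raw))  # wrap into (-pi, pi]
learned = learn_phase_codebook(train, init_codebook=[-1.0, 0.3, 1.2, 2.8])
```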

(51) Another aspect can be that the encoder includes parameters that determine the mappings of the time-frequency bins to the one or more phase-related values in the one or more phase quantization codebooks. Given a predetermined set of phase values for the one or more phase quantization codebooks, the parameters of the encoder can be optimized so as to minimize an estimation error between a training enhanced audio signal and the corresponding training target audio signal. The phase values of the first quantization codebook can also be optimized together with the parameters of the encoder in order to minimize an estimation error between a training enhanced audio signal and the corresponding training target audio signal. Another aspect can be that at least one magnitude ratio value is greater than one.
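A one-bin worked example shows why a magnitude ratio above one can be useful: when the noise is roughly opposite in phase to the target, the noisy magnitude is smaller than the target magnitude, so the ideal ratio exceeds one and a ratio capped at one cannot restore the target magnitude. The coefficient values below are hypothetical.

```python
# One hypothetical time-frequency bin where target and noise partially cancel.
target = 1.0 + 0.0j
noise = -0.6 + 0.0j     # roughly opposite in phase to the target
noisy = target + noise  # |noisy| = 0.4 < |target| = 1.0

ideal_ratio = abs(target) / abs(noisy)  # approximately 2.5


def magnitude_error(ratio):
    """Error in the enhanced magnitude when the noisy magnitude is scaled by `ratio`."""
    return abs(ratio * abs(noisy) - abs(target))


capped_error = magnitude_error(min(ideal_ratio, 1.0))  # best a ratio capped at 1 can do
uncapped_error = magnitude_error(ideal_ratio)          # a ratio above 1 recovers the target
print(ideal_ratio, capped_error, uncapped_error)
```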

(52) Another aspect can be that the encoder maps each time-frequency bin of the noisy speech to a magnitude ratio value from a magnitude quantization codebook of magnitude ratio values indicative of quantized ratios of magnitudes of the target audio signal to magnitudes of the noisy audio signal. The magnitude quantization codebook includes multiple magnitude ratio values, including at least one magnitude ratio value greater than one. The system can further comprise a memory to store the first quantization codebook and the second quantization codebook, and to store a neural network trained to process the noisy audio signal to produce a first index of the phase value in the phase quantization codebook and a second index of the magnitude ratio value in the magnitude quantization codebook. The encoder determines the first index and the second index using the neural network, retrieves the phase value from the memory using the first index, and retrieves the magnitude ratio value from the memory using the second index. The combinations of the phase values and the magnitude ratio values can be optimized together with the parameters of the encoder in order to minimize an estimation error between training enhanced speech and the corresponding training target speech. The first quantization codebook and the second quantization codebook can form a joint quantization codebook with combinations of the phase values and the magnitude ratio values, such that the encoder maps each time-frequency bin of the noisy speech to the phase value and the magnitude ratio value forming a combination in the joint quantization codebook. The phase values and the magnitude ratio values can be combined such that the joint quantization codebook includes a subset of all possible combinations of phase values and magnitude ratio values.
Alternatively, the phase values and the magnitude ratio values can be combined such that the joint quantization codebook includes all possible combinations of phase values and magnitude ratio values.

(53) An aspect further includes a processor to update time-frequency coefficients of the filter using the phase values and the magnitude ratio values determined by the encoder for each time-frequency bin and to multiply the time-frequency coefficients of the filter with a time-frequency representation of the noisy audio signal to produce a time-frequency representation of the enhanced audio signal.

(55) FIG. 7A is a schematic illustrating by non-limiting example a computing apparatus 700A that can be used to implement some techniques of the methods and systems, according to embodiments of the present disclosure. The computing apparatus or device 700A represents various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. There can be a mother board or some other main aspect 750 of the computing device 700A of FIG. 7A.

(56) The computing device 700A can include a power source 708, a processor 709, a memory 710, a storage device 711, all connected to a bus 750. Further, a high-speed interface 712, a low-speed interface 713, high-speed expansion ports 714 and low-speed connection ports 715 can be connected to the bus 750. Also, a low-speed expansion port 716 is in connection with the bus 750.

(57) Contemplated are various component configurations that may be mounted on a common motherboard depending upon the specific application. Further still, an input interface 717 can be connected via bus 750 to an external receiver 706 and an output interface 718. A receiver 719 can be connected to an external transmitter 707 and a transmitter 720 via the bus 750. Also connected to the bus 750 can be an external memory 704, external sensors 703, machine(s) 702 and an environment 701. Further, one or more external input/output devices 705 can be connected to the bus 750. A network interface controller (NIC) 721 can be adapted to connect through the bus 750 to a network 722, through which data, among other things, can be rendered on a third-party display device, third-party imaging device, and/or third-party printing device outside of the computer device 700A.

(58) Contemplated also is that the memory 710 can store instructions that are executable by the computer device 700A, historical data, and any data that can be utilized by the methods and systems of the present disclosure. The memory 710 can include random access memory (RAM), read only memory (ROM), flash memory, or any other suitable memory systems. The memory 710 can be a volatile memory unit or units, and/or a non-volatile memory unit or units. The memory 710 may also be another form of computer-readable medium, such as a magnetic or optical disk.

(59) Still referring to FIG. 7A, a storage device 711 can be adapted to store supplementary data and/or software modules used by the computer device 700A. For example, the storage device 711 can store historical data and other related data as mentioned above regarding the present disclosure. The storage device 711 can include a hard drive, an optical drive, a thumb-drive, an array of drives, or any combinations thereof. Further, the storage device 711 can contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. Instructions can be stored in an information carrier. The instructions, when executed by one or more processing devices (for example, processor 709), perform one or more methods, such as those described above.

(60) The system can optionally be linked through the bus 750 to a display interface or user interface (HMI) 723 adapted to connect the system to a display device 725 and keyboard 724, wherein the display device 725 can include a computer monitor, camera, television, projector, or mobile device, among others.

(61) Still referring to FIG. 7A, the computer device 700A can include a user input interface 717, and a printer interface (not shown) can also be connected through bus 750 and adapted to connect to a printing device (not shown), wherein the printing device can include a liquid inkjet printer, solid ink printer, large-scale commercial printer, thermal printer, UV printer, or dye-sublimation printer, among others.

(62) The high-speed interface 712 manages bandwidth-intensive operations for the computing device 700A, while the low-speed interface 713 manages less bandwidth-intensive operations. Such allocation of functions is an example only. In some implementations, the high-speed interface 712 can be coupled to the memory 710, a user interface (HMI) 723, a keyboard 724 and display 725 (e.g., through a graphics processor or accelerator), and the high-speed expansion ports 714, which may accept various expansion cards (not shown) via bus 750. In this implementation, the low-speed interface 713 is coupled to the storage device 711 and the low-speed expansion port 715 via bus 750. The low-speed expansion port 715, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices 705 and to other devices such as a keyboard 724, a pointing device (not shown), a scanner (not shown), or a networking device such as a switch or router, e.g., through a network adapter.

(63) Still referring to FIG. 7A, the computing device 700A may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 726, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as a laptop computer 727. It may also be implemented as part of a rack server system 728. Alternatively, components from the computing device 700A may be combined with other components in a mobile device (not shown), such as a mobile computing device 700B. Each such device may contain one or more of the computing device 700A and the mobile computing device 700B, and an entire system may be made up of multiple computing devices communicating with each other.

(64) FIG. 7B is a schematic illustrating a mobile computing apparatus that can be used to implement some techniques of the methods and systems, according to embodiments of the present disclosure. The mobile computing device 700B includes a bus 795 connecting a processor 761, a memory 762, an input/output device 763, a communication interface 764, among other components. The bus 795 can also be connected to a storage device 765, such as a micro-drive or other device, to provide additional storage. There can be a mother board or some other main aspect 799 of the computing device 700B of FIG. 7B.

(65) Referring to FIG. 7B, the processor 761 can execute instructions within the mobile computing device 700B, including instructions stored in the memory 762. The processor 761 may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor 761 may provide, for example, for coordination of the other components of the mobile computing device 700B, such as control of user interfaces, applications run by the mobile computing device 700B, and wireless communication by the mobile computing device 700B.

(66) The processor 761 may communicate with a user through a control interface 766 and a display interface 767 coupled to the display 768. The display 768 may be, for example, a TFT (Thin-Film-Transistor Liquid Crystal Display) display or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 767 may comprise appropriate circuitry for driving the display 768 to present graphical and other information to a user. The control interface 766 may receive commands from a user and convert them for submission to the processor 761. In addition, an external interface 769 may provide communication with the processor 761, so as to enable near area communication of the mobile computing device 700B with other devices. The external interface 769 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.

(67) Still referring to FIG. 7B, the memory 762 stores information within the mobile computing device 700B. The memory 762 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. An expansion memory 770 may also be provided and connected to the mobile computing device 700B through an expansion interface 769, which may include, for example, a SIMM (single in line memory module) card interface. The expansion memory 770 may provide extra storage space for the mobile computing device 700B, or may also store applications or other information for the mobile computing device 700B. Specifically, the expansion memory 770 may include instructions to carry out or supplement the processes described above, and may also include secure information. Thus, for example, the expansion memory 770 may be provided as a security module for the mobile computing device 700B, and may be programmed with instructions that permit secure use of the mobile computing device 700B. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.

(68) The memory 762 may include, for example, flash memory and/or NVRAM memory (non-volatile random access memory), as discussed below. In some implementations, instructions are stored in an information carrier such that the instructions, when executed by one or more processing devices (for example, processor 761), perform one or more methods, such as those described above. The instructions can also be stored by one or more storage devices, such as one or more computer- or machine-readable media (for example, the memory 762, the expansion memory 770, or memory on the processor 761). In some implementations, the instructions can be received in a propagated signal, for example, over the transceiver 771 or the external interface 769.

(69) The mobile computing apparatus or device 700B is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart-phones, and other similar computing devices. The mobile computing device 700B may communicate wirelessly through the communication interface 764, which may include digital signal processing circuitry where necessary. The communication interface 764 may provide for communications under various modes or protocols, such as GSM voice calls (Global System for Mobile communications), SMS (Short Message Service), EMS (Enhanced Messaging Service), or MMS messaging (Multimedia Messaging Service), CDMA (code division multiple access), TDMA (time division multiple access), PDC (Personal Digital Cellular), WCDMA (Wideband Code Division Multiple Access), CDMA2000, or GPRS (General Packet Radio Service), among others. Such communication may occur, for example, through the transceiver 771 using a radio-frequency. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, a GPS (Global Positioning System) receiver module 773 may provide additional navigation and location related wireless data to the mobile computing device 700B, which may be used as appropriate by applications running on the mobile computing device 700B.

(70) The mobile computing device 700B may also communicate audibly using an audio codec 772, which may receive spoken information from a user and convert it to usable digital information. The audio codec 772 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 700B. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on the mobile computing device 700B.

(71) Still referring to FIG. 7B, the mobile computing device 700B may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 774. It may also be implemented as part of a smart-phone 775, personal digital assistant, or other similar mobile device.

Embodiments

(72) The following description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing one or more exemplary embodiments. Contemplated are various changes that may be made in the function and arrangement of elements without departing from the spirit and scope of the subject matter disclosed as set forth in the appended claims.

(73) Specific details are given in the following description to provide a thorough understanding of the embodiments. However, one of ordinary skill in the art will understand that the embodiments may be practiced without these specific details. For example, systems, processes, and other elements in the subject matter disclosed may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments. Further, like reference numbers and designations in the various drawings indicate like elements.

(74) Also, individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may be terminated when its operations are completed, but may have additional steps not discussed or included in a figure. Furthermore, not all operations in any particularly described process may occur in all embodiments. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, the function's termination can correspond to a return of the function to the calling function or the main function.

(75) Furthermore, embodiments of the subject matter disclosed may be implemented, at least in part, either manually or automatically. Manual or automatic implementations may be executed, or at least assisted, through the use of machines, hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine readable medium. One or more processors may perform the necessary tasks.

(76) Further, embodiments of the present disclosure and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Further some embodiments of the present disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Further still, program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

(77) According to embodiments of the present disclosure the term data processing apparatus can encompass all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

(78) A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. Computers suitable for the execution of a computer program can, by way of example, be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices.
Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

(79) To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.

(80) Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.

(81) The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

(82) Although the present disclosure has been described with reference to certain preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the present disclosure. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the present disclosure.