G10H1/16

Reverberation gain normalization

Systems and methods for providing accurate and independent control of reverberation properties are disclosed. In some embodiments, a system may include a reverberation processing system, a direct processing system, and a combiner. The reverberation processing system can include a reverb initial power (RIP) control system and a reverberator. The RIP control system can include a reverb initial gain (RIG) and a RIP corrector. The RIG can be configured to apply a RIG value to the input signal, and the RIP corrector can be configured to apply a RIP correction factor to the signal from the RIG. The reverberator can be configured to apply reverberation effects to the signal from the RIP control system. In some embodiments, one or more values and/or correction factors can be calculated and applied such that the signal output from a component in the reverberation processing system is normalized to a predetermined value (e.g., unity (1.0)).
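The RIG/RIP idea above can be sketched as a short numpy example. Everything here is an illustrative assumption rather than the patented implementation: the function and variable names are hypothetical, and the correction factor is computed as a simple power-matching scale so that the signal's initial power entering the reverberator is normalized to the predetermined value (unity).

```python
import numpy as np

def apply_rip_control(x, rig_value, target_power=1.0):
    """Hypothetical sketch: apply a reverb initial gain (RIG) value, then a
    RIP correction factor that normalizes the signal's initial power to
    target_power (e.g., unity) before it reaches the reverberator."""
    y = rig_value * x                                   # RIG stage
    initial_power = np.mean(y ** 2)                     # measured initial power
    correction = np.sqrt(target_power / initial_power)  # RIP correction factor
    return correction * y                               # normalized signal

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)           # arbitrary input signal
out = apply_rip_control(x, rig_value=0.3)
print(np.mean(out ** 2))                # mean power normalized to ~1.0
```

Because the correction factor rescales whatever power the RIG stage produced, downstream reverberation parameters can be controlled independently of the chosen gain, which is the stated goal of the normalization.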

Time-varying and nonlinear audio processing using deep neural networks

A computer-implemented method of processing audio data, the method comprising:
receiving input audio data (x) comprising a time-series of amplitude values;
transforming the input audio data (x) into an input frequency band decomposition (X1) of the input audio data (x);
transforming the input frequency band decomposition (X1) into a first latent representation (Z);
processing the first latent representation (Z) by a first deep neural network to obtain a second latent representation (Ẑ, Ẑ1);
transforming the second latent representation (Ẑ, Ẑ1) to obtain a discrete approximation (X̂3);
element-wise multiplying the discrete approximation (X̂3) and a residual feature map (R, X̂5) to obtain a modified feature map, wherein the residual feature map (R, X̂5) is derived from the input frequency band decomposition (X1);
processing a pre-shaped frequency band decomposition by a waveshaping unit to obtain a waveshaped frequency band decomposition (X̂1, X̂1.2), wherein the pre-shaped frequency band decomposition is derived from the input frequency band decomposition (X1), wherein the waveshaping unit comprises a second deep neural network;
summing the waveshaped frequency band decomposition (X̂1, X̂1.2) and a modified frequency band decomposition (X̂2, X̂1.1) to obtain a summation output (X̂0), wherein the modified frequency band decomposition (X̂2, X̂1.1) is derived from the modified feature map; and
transforming the summation output (X̂0) to obtain target audio data (ŷ).

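The claimed signal path can be traced end-to-end with a toy numpy sketch. All of the concrete choices below are loud assumptions, not the patented architecture: the frequency band decomposition is a frame-wise FFT magnitude, the two "deep neural networks" are stand-in random tanh layers, the residual feature map R is taken directly from X1, and the final transform back to audio is a crude placeholder. The sketch only illustrates how the intermediate quantities (Z, Ẑ, X̂3, X̂1, X̂2, X̂0, ŷ) feed into one another.

```python
import numpy as np

rng = np.random.default_rng(1)

def band_decompose(x, frame=16):
    """Toy frequency band decomposition X1: frame the signal and take
    FFT magnitudes (an illustrative stand-in, not the patent's transform)."""
    return np.abs(np.fft.rfft(x.reshape(-1, frame), axis=1))

x = rng.standard_normal(256)          # input audio data x (time-series)
X1 = band_decompose(x)                # input frequency band decomposition X1

W_enc = rng.standard_normal((9, 4)) * 0.1
Z = X1 @ W_enc                        # first latent representation Z

W_dnn1 = rng.standard_normal((4, 4)) * 0.1
Z_hat = np.tanh(Z @ W_dnn1)           # second latent (first DNN, stand-in)

W_dec = rng.standard_normal((4, 9)) * 0.1
X3_hat = Z_hat @ W_dec                # discrete approximation X̂3

R = X1                                # residual feature map, derived from X1
modified = X3_hat * R                 # element-wise multiply -> modified map
X2_hat = modified                     # modified band decomposition (derived)

W_ws = rng.standard_normal((9, 9)) * 0.1
X1_hat = np.tanh(X1 @ W_ws)           # waveshaping unit (second DNN, stand-in)

X0_hat = X1_hat + X2_hat              # summation output X̂0
y_hat = X0_hat.mean(axis=1)           # placeholder inverse transform -> ŷ
print(y_hat.shape)
```

Note the two parallel branches: the latent/residual branch models time-varying behavior, while the waveshaping branch models nonlinear distortion; their sum X̂0 is what gets transformed back into the target audio ŷ.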