G10H2210/215

Time-varying and nonlinear audio processing using deep neural networks

12334043 · 2025-06-17

WAVESHAPER TECHNOLOGIES INC.

A computer-implemented method of processing audio data, the method comprising: receiving input audio data (x) comprising a time-series of amplitude values; transforming the input audio data (x) into an input frequency band decomposition (X1) of the input audio data (x); transforming the input frequency band decomposition (X1) into a first latent representation (Z); processing the first latent representation (Z) by a first deep neural network to obtain a second latent representation (Ẑ, Ẑ1); transforming the second latent representation (Ẑ, Ẑ1) to obtain a discrete approximation (X̂3); element-wise multiplying the discrete approximation (X̂3) and a residual feature map (R, X̂5) to obtain a modified feature map, wherein the residual feature map (R, X̂5) is derived from the input frequency band decomposition (X1); processing a pre-shaped frequency band decomposition by a waveshaping unit to obtain a waveshaped frequency band decomposition (X̂1, X̂1.2), wherein the pre-shaped frequency band decomposition is derived from the input frequency band decomposition (X1), wherein the waveshaping unit comprises a second deep neural network; summing the waveshaped frequency band decomposition (X̂1, X̂1.2) and a modified frequency band decomposition (X̂2, X̂1.1) to obtain a summation output (X̂0), wherein the modified frequency band decomposition (X̂2, X̂1.1) is derived from the modified feature map; and transforming the summation output (X̂0) to obtain target audio data (ŷ).
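The data flow of the claimed steps can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the function names `conv_same` and `process`, the random FIR kernels standing in for the learned band decomposition, max pooling of absolute values as the route to the latent representation, a single dense-plus-tanh layer standing in for the first deep neural network, an element-wise `tanh` standing in for the waveshaping unit's second network, and the choice to feed X1 directly to the waveshaper. It shows only the order of operations and the shapes involved.

```python
import math
import random


def conv_same(signal, kernel):
    """Same-length 1-D correlation with zero padding at the edges."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out


def process(x, n_bands=4, kernel_len=5, pool=4, seed=0):
    """Hypothetical stand-in for the claimed pipeline (untrained weights)."""
    rng = random.Random(seed)
    n = len(x)
    # Assumed placeholders for learned parameters.
    kernels = [[rng.uniform(-1, 1) for _ in range(kernel_len)]
               for _ in range(n_bands)]
    mix = [[rng.uniform(-1, 1) for _ in range(n_bands)]
           for _ in range(n_bands)]

    # x -> input frequency band decomposition X1 (one row per band).
    X1 = [conv_same(x, k) for k in kernels]
    R = X1  # residual feature map, derived from X1 (here: X1 itself)

    # X1 -> first latent representation Z (max pooling of |X1|).
    Z = [[max(abs(v) for v in band[i:i + pool]) for i in range(0, n, pool)]
         for band in X1]

    # Z -> second latent representation Z_hat (placeholder first network).
    n_frames = len(Z[0])
    Z_hat = [[math.tanh(sum(mix[k][j] * Z[j][m] for j in range(n_bands)))
              for m in range(n_frames)]
             for k in range(n_bands)]

    # Z_hat -> discrete approximation X3_hat (unpool back to n samples).
    X3_hat = [[band[i // pool] for i in range(n)] for band in Z_hat]

    # Element-wise product with the residual gives the modified feature map.
    modified = [[a * r for a, r in zip(b3, br)]
                for b3, br in zip(X3_hat, R)]

    # Waveshaping unit applied to a pre-shaped decomposition (here: X1).
    waveshaped = [[math.tanh(2.0 * v) for v in band] for band in X1]

    # Sum the waveshaped and modified decompositions -> X0_hat.
    X0_hat = [[s + m for s, m in zip(bs, bm)]
              for bs, bm in zip(waveshaped, modified)]

    # X0_hat -> target audio y_hat (time-reversed kernels, summed over bands).
    y_hat = [0.0] * n
    for band, k in zip(X0_hat, kernels):
        for i, v in enumerate(conv_same(band, k[::-1])):
            y_hat[i] += v
    return y_hat


# Usage: a short sine burst in, a same-length processed signal out.
x = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
y_hat = process(x)
```

The multiplicative path (latent modulation times the residual) and the additive waveshaping path are kept as separate variables so that the summation step in the claim is visible. In a trained system the placeholders would be replaced by learned layers.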
