G10H2250/025

Time-varying and nonlinear audio processing using deep neural networks

A computer-implemented method of processing audio data, the method comprising:
receiving input audio data (x) comprising a time-series of amplitude values;
transforming the input audio data (x) into an input frequency band decomposition (X1) of the input audio data (x);
transforming the input frequency band decomposition (X1) into a first latent representation (Z);
processing the first latent representation (Z) by a first deep neural network to obtain a second latent representation (Ẑ, Ẑ1);
transforming the second latent representation (Ẑ, Ẑ1) to obtain a discrete approximation (X̂3);
element-wise multiplying the discrete approximation (X̂3) and a residual feature map (R, X̂5) to obtain a modified feature map, wherein the residual feature map (R, X̂5) is derived from the input frequency band decomposition (X1);
processing a pre-shaped frequency band decomposition by a waveshaping unit to obtain a waveshaped frequency band decomposition (X̂1, X̂1.2), wherein the pre-shaped frequency band decomposition is derived from the input frequency band decomposition (X1), and wherein the waveshaping unit comprises a second deep neural network;
summing the waveshaped frequency band decomposition (X̂1, X̂1.2) and a modified frequency band decomposition (X̂2, X̂1.1) to obtain a summation output (X̂0), wherein the modified frequency band decomposition (X̂2, X̂1.1) is derived from the modified feature map;
and transforming the summation output (X̂0) to obtain target audio data (ŷ).
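The claimed pipeline can be illustrated with a minimal NumPy sketch. Everything concrete below is an assumption for illustration only: the frequency band decomposition is rendered as a framed FFT, both "deep neural networks" are stand-in single-layer functions (`dnn_stub` and a tanh waveshaper), the discrete approximation is a sigmoid soft mask, and the residual feature map is taken to be the decomposition itself. The claim fixes none of these choices.

```python
import numpy as np

FRAME = 64  # hypothetical frame length for the band decomposition


def dnn_stub(z, w):
    # Stand-in for a trained deep neural network: one tanh layer.
    return np.tanh(z @ w)


def process(x, w1, w2):
    # X1: input frequency band decomposition (framed real FFT).
    X1 = np.fft.rfft(x.reshape(-1, FRAME), axis=1)        # (frames, bands)
    Z = np.abs(X1)                                        # first latent representation
    Z_hat = dnn_stub(Z, w1)                               # second latent representation
    X3_hat = 1.0 / (1.0 + np.exp(-Z_hat))                 # discrete approximation (soft mask)
    R = X1                                                # residual feature map, derived from X1
    X2_hat = X3_hat * R                                   # element-wise multiply -> modified decomposition
    # Waveshaping unit (second "DNN"): tanh shaping of real and imaginary parts.
    X1_hat = np.tanh(X1.real @ w2) + 1j * np.tanh(X1.imag @ w2)
    X0_hat = X1_hat + X2_hat                              # summation output
    y_hat = np.fft.irfft(X0_hat, n=FRAME, axis=1).reshape(-1)  # target audio data
    return y_hat
```

With identity-scaled weights the sketch runs end to end on any signal whose length is a multiple of the frame size, returning a waveform of the same length.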

Audio processing method, audio processing system, and computer-readable medium

An audio processing method obtains observed envelopes of picked-up sound signals, including: a first observed envelope representing the contour of a first sound signal, which contains a first target sound from a first sound source and a second spill sound from a second sound source; and a second observed envelope representing the contour of a second sound signal, which contains a second target sound from the second sound source and a first spill sound from the first sound source. Based on the observed envelopes, the method generates output envelopes, namely a first output envelope representing the contour of the first target sound in the first observed envelope and a second output envelope representing the contour of the second target sound in the second observed envelope, using a mix matrix that includes the mix proportion of the second spill sound in the first sound signal and the mix proportion of the first spill sound in the second sound signal.
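The mix-matrix step can be sketched as a 2x2 linear unmixing of the two observed envelopes. This is a hypothetical reading: the abstract only says the output envelopes are generated using a mix matrix that contains the two spill proportions; solving (inverting) that matrix and clipping the result to non-negative values are illustrative assumptions, as is the function name.

```python
import numpy as np


def separate_envelopes(e1_obs, e2_obs, a12, a21):
    """Recover target-sound envelopes from two observed envelopes.

    a12: mix proportion of the second spill sound in the first sound signal
    a21: mix proportion of the first spill sound in the second sound signal
    """
    # Observed = mix_matrix @ target, so invert the mix matrix to unmix.
    A = np.array([[1.0, a12],
                  [a21, 1.0]])
    E_obs = np.vstack([e1_obs, e2_obs])      # (2, frames)
    E_out = np.linalg.solve(A, E_obs)        # undo the spill mixing
    return np.clip(E_out, 0.0, None)         # envelopes are non-negative
```

For example, if each observed envelope is the target envelope plus a known proportion of the other source, the solve step recovers both target contours exactly.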

ACOUSTIC OUTPUT SYSTEM, ACOUSTIC OUTPUT DEVICE, INFORMATION PROCESSING DEVICE, SOUND PRODUCTION METHOD, AND SOUND DATA GENERATION METHOD

Disclosed is an acoustic output device including: an operation receiver; a communication unit; and a controller. The controller generates acoustic data in response to a performance operation on the operation receiver by a user, causes the communication unit to output the generated acoustic data to an information processing device automatically without user intervention after the performance operation, acquires sound data generated based on the outputted acoustic data in the information processing device from the communication unit automatically without user intervention after the performance operation, and causes a speaker to produce sound based on the acquired sound data automatically without user intervention after the performance operation.
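The round trip described above can be sketched in Python with in-process stand-ins for the two devices. The class and method names, the note-event "acoustic data", and the string-valued "sound data" are all hypothetical, and the hop through the communication unit is collapsed into a direct method call; the point is only the automatic generate, output, acquire, produce sequence triggered by one performance operation.

```python
class InformationProcessingDevice:
    # Stand-in for the external device: turns received acoustic data
    # (here, note events) into sound data (here, labeled samples).
    def generate_sound_data(self, acoustic_data):
        return [f"synth({note})" for note in acoustic_data]


class AcousticOutputDevice:
    def __init__(self, remote):
        self.remote = remote      # reachable via the communication unit
        self.spoken = []          # what the speaker has produced

    def on_performance_operation(self, notes):
        # Controller generates acoustic data from the performance operation,
        acoustic_data = list(notes)
        # outputs it and acquires the generated sound data automatically,
        sound_data = self.remote.generate_sound_data(acoustic_data)
        # then causes the speaker to produce sound, all without user input.
        self._speaker_produce(sound_data)

    def _speaker_produce(self, sound_data):
        self.spoken.extend(sound_data)
```

A single call to `on_performance_operation` thus drives the whole chain with no further user intervention.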