Patent classifications
H04S2420/07
Spatial audio signal format generation from a microphone array using adaptive capture
Apparatus including a processor configured to: receive at least two microphone audio signals; determine spatial metadata associated with the at least two microphone audio signals; and adaptively synthesize a plurality of spherical harmonic audio signals based on at least one microphone audio signal and the spatial metadata in order to output a spatial audio signal format of a pre-determined order.
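As an illustration of what "spherical harmonic audio signals synthesized from a microphone signal and spatial metadata" can mean, the sketch below builds one time-frequency tile of first-order Ambisonics (ACN channel order, SN3D normalization) from a single microphone tile, a direction (azimuth, elevation), and a direct-to-total energy ratio. It is a minimal sketch under those assumptions, not the adaptive method of this patent. The four mutually decorrelated input signals (`decorrelated`) and all function names are hypothetical.

```python
import math


def sh_first_order_sn3d(azimuth, elevation):
    """Real first-order spherical harmonics in ACN order (W, Y, Z, X), SN3D normalization."""
    return [
        1.0,
        math.sin(azimuth) * math.cos(elevation),
        math.sin(elevation),
        math.cos(azimuth) * math.cos(elevation),
    ]


def synthesize_foa_tile(s, decorrelated, azimuth, elevation, direct_ratio):
    """Synthesize one time-frequency tile of first-order Ambisonics.

    s: microphone signal value for this tile (real or complex).
    decorrelated: four mutually decorrelated copies of s (hypothetical input).
    azimuth, elevation: direction from the spatial metadata, in radians.
    direct_ratio: direct-to-total energy ratio from the spatial metadata, in [0, 1].
    """
    # Direct part: encode the signal at the estimated direction.
    direct_gain = math.sqrt(direct_ratio)
    direct = [direct_gain * g * s for g in sh_first_order_sn3d(azimuth, elevation)]
    # Diffuse part: in SN3D, each first-order channel of a diffuse field
    # carries one third of the energy of W.
    ambient_gain = math.sqrt(1.0 - direct_ratio)
    diffuse_gains = [1.0] + [1.0 / math.sqrt(3.0)] * 3
    ambient = [ambient_gain * g * d for g, d in zip(diffuse_gains, decorrelated)]
    return [a + b for a, b in zip(direct, ambient)]


# Fully direct sound arriving from the front (azimuth 0, elevation 0).
print(synthesize_foa_tile(1.0, [0.0] * 4, 0.0, 0.0, 1.0))  # [1.0, 0.0, 0.0, 1.0]
```

The energy split by `sqrt(direct_ratio)` and `sqrt(1 - direct_ratio)` keeps the total energy of W equal to that of the input tile when the decorrelated copies have the same energy as `s`.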
Parametric audio decoding
An apparatus includes a receiver and an up-mixer. The receiver is configured to receive a bitstream that includes an encoded mid signal and encoded stereo parameter information. The encoded stereo parameter information represents a first value of a stereo parameter and a second value of the stereo parameter. The first value is associated with a first frequency range. The second value is associated with a second frequency range that is distinct from the first frequency range. The up-mixer is configured to perform an up-mix operation on a frequency-domain decoded mid signal generated from the encoded mid signal. A particular value based on the first value and the second value is applied to the frequency-domain decoded mid signal during the up-mix operation.
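A minimal sketch of the up-mix step, assuming the stereo parameter is a side gain g in [-1, 1] and that the "particular value" is a bandwidth-weighted average of the two per-band values. Both the combining rule and the pan-style up-mix are assumptions for illustration; the abstract does not state them. Function names are hypothetical.

```python
def combine_parameter(first_value, second_value, first_width, second_width):
    """Bandwidth-weighted average of two per-band stereo parameter values (assumed rule)."""
    return (first_value * first_width + second_value * second_width) / (
        first_width + second_width
    )


def upmix(mid_bins, side_gain):
    """Up-mix frequency-domain mid bins with a side gain g: L = (1 + g) M, R = (1 - g) M.

    With this convention M = (L + R) / 2, so the mid signal is preserved.
    """
    left = [(1.0 + side_gain) * m for m in mid_bins]
    right = [(1.0 - side_gain) * m for m in mid_bins]
    return left, right


# Two frequency ranges of 4 and 12 bins carry side gains 0.5 and 0.1.
g = combine_parameter(0.5, 0.1, 4, 12)
left, right = upmix([1.0 + 0.0j, 0.5 - 0.5j], g)
```

Weighting by bandwidth lets the wider frequency range dominate the combined value; a plain mean is the special case of equal widths.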
METHOD FOR GENERATING FILTER FOR AUDIO SIGNAL, AND PARAMETERIZATION DEVICE FOR SAME
The present invention relates to a method for generating a filter for an audio signal and a parameterization device for the same, and more particularly, to a method for generating a filter for an audio signal that enables filtering of an input audio signal with low computational complexity, and a parameterization device therefor.
To this end, a method for generating a filter for an audio signal is provided, including: receiving at least one set of binaural room impulse response (BRIR) filter coefficients for binaural filtering of an input audio signal; converting the BRIR filter coefficients into a plurality of subband filter coefficients; obtaining average reverberation time information for the corresponding subband by using reverberation time information extracted from the subband filter coefficients; obtaining at least one coefficient for curve fitting of the obtained average reverberation time information; obtaining flag information indicating whether the length of the BRIR filter coefficients in the time domain exceeds a predetermined value; obtaining filter order information for determining a truncation length of the subband filter coefficients, wherein the filter order information is obtained by using either the average reverberation time information or the at least one coefficient, according to the obtained flag information, and the filter order information of at least one subband differs from that of another subband; and truncating the subband filter coefficients by using the obtained filter order information. A parameterization device therefor is also provided.
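The steps above can be sketched as follows, under several stated assumptions: reverberation times are already extracted per subband and per BRIR channel, "average" means the mean across channels, the curve fit is a straight line over the subband index, the long-BRIR flag selects the fitted curve, and the filter order is the reverberation time multiplied by a hypothetical subband frame rate `frames_per_second`, rounded up. None of these choices is confirmed by the abstract; all names are hypothetical.

```python
import math


def long_brir_flag(brir_length, threshold):
    """Flag: is the time-domain BRIR longer than the predetermined value?"""
    return brir_length > threshold


def linear_fit(values):
    """Least-squares line values[k] ~ a + b*k over subband index k; returns (a, b)."""
    n = len(values)
    k_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((k - k_mean) * (v - v_mean) for k, v in enumerate(values))
    den = sum((k - k_mean) ** 2 for k in range(n))
    b = num / den if den else 0.0
    return v_mean - b * k_mean, b


def filter_orders(rt_per_channel, flag, frames_per_second):
    """Per-subband filter order (truncation length in subband samples).

    rt_per_channel: reverberation times in seconds, one list per BRIR channel.
    frames_per_second: subband sample rate (assumed parameter).
    """
    n_bands = len(rt_per_channel[0])
    # Average reverberation time of each subband across channels.
    avg_rt = [sum(ch[k] for ch in rt_per_channel) / len(rt_per_channel)
              for k in range(n_bands)]
    if flag:
        # Long BRIR: use the curve-fit coefficients (assumed: a straight line).
        a, b = linear_fit(avg_rt)
        rt_used = [max(a + b * k, 0.0) for k in range(n_bands)]
    else:
        # Short BRIR: use the averaged reverberation times directly.
        rt_used = avg_rt
    return [max(1, math.ceil(rt * frames_per_second)) for rt in rt_used]


def truncate(subband_coeffs, orders):
    """Truncate each subband filter to its own filter order."""
    return [coeffs[:n] for coeffs, n in zip(subband_coeffs, orders)]
```

Because each subband gets its own order, high subbands with short reverberation keep only a few taps, which is where the computational saving comes from.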
Processing of microphone signals for spatial playback
Disclosed are methods and systems that convert a multi-microphone input signal to a multichannel output signal by means of a time- and frequency-varying matrix. For each time-frequency tile, the matrix is derived as a function of a dominant direction of arrival and a steering strength parameter. The dominant direction and steering strength parameter are in turn derived from characteristics of the multi-microphone signals, including values representative of the inter-channel amplitude and group-delay differences.
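A minimal two-microphone, two-output sketch of the idea: the inter-channel group delay is estimated from the slope of the inter-channel phase across adjacent frequency bins, mapped to a far-field direction of arrival, and the per-tile matrix blends a steered (panned) matrix with a passive identity matrix according to a steering strength. The speed of sound, the mic spacing, the panning law, and the blending rule are all assumptions; the sketch ignores the amplitude differences the abstract also mentions and does not derive the steering strength. Function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def group_delay(x1, x2, k, bin_hz):
    """Group delay (s) of mic 2 relative to mic 1 from the slope of the
    inter-channel phase between STFT bins k and k + 1."""
    phi0 = cmath.phase(x1[k] * x2[k].conjugate())
    phi1 = cmath.phase(x1[k + 1] * x2[k + 1].conjugate())
    dphi = (phi1 - phi0 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return dphi / (2 * math.pi * bin_hz)


def doa_from_delay(tau, spacing_m):
    """Far-field arrival angle (rad) from broadside; positive means mic 1 side."""
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / spacing_m))
    return math.asin(s)


def mixing_matrix(theta, strength):
    """2x2 matrix: blend a steered matrix (pan the input sum toward theta)
    with a passive identity matrix, weighted by the steering strength."""
    p = (math.pi / 2 - theta) / 2  # theta = +pi/2 sends everything to output 1
    g1, g2 = math.cos(p), math.sin(p)
    steered = [[g1 / 2, g1 / 2], [g2 / 2, g2 / 2]]
    passive = [[1.0, 0.0], [0.0, 1.0]]
    return [[strength * steered[i][j] + (1 - strength) * passive[i][j]
             for j in range(2)] for i in range(2)]
```

A steering strength of 0 leaves the microphone signals unchanged, while a strength of 1 fully pans the tile toward the estimated direction.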
SIGNAL PROCESSING DEVICE AND METHOD, LEARNING DEVICE AND METHOD, AND PROGRAM
The present technology relates to a signal processing device and method, a learning device and method, and a program that enable even an inexpensive device to perform high-quality audio playback.
A signal processing device includes: a decoding processing unit that demultiplexes an input bit stream into a first audio signal, metadata of the first audio signal, and first high-frequency band information for band expansion; and a band expanding unit that performs band expansion processing on the basis of a second audio signal and second high-frequency band information, thereby generating an output audio signal. The second audio signal is obtained by performing signal processing on the basis of the first audio signal and the metadata, and the second high-frequency band information is generated on the basis of the first high-frequency band information. The present technology can be applied to a smartphone.
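The ordering described above (render with metadata first, then band-expand using adapted high-frequency information) can be sketched as below, presumably so that the metadata-driven processing runs on the band-limited signal. Assumptions: the metadata rendering is reduced to a single gain, the high-frequency band information is a list of target band energies, the second information is the first scaled by the rendering power gain, and band expansion is a simple copy-up of the top low-band bins. These are illustrative choices, not the method of this application; function names are hypothetical.

```python
import math


def band_expand(low_bins, target_band_energies, band_size):
    """Copy-up band expansion: replicate the top `band_size` low-band bins into
    each high band and rescale every copy to that band's target energy."""
    patch = list(low_bins[-band_size:])
    patch_energy = sum(abs(x) ** 2 for x in patch)
    high = []
    for target in target_band_energies:
        gain = math.sqrt(target / patch_energy) if patch_energy > 0 else 0.0
        high.extend(gain * x for x in patch)
    return list(low_bins) + high


def derive_second_hf_info(first_hf_energies, render_gain):
    """Second high-band info: transmitted high-band energies scaled by the
    power gain applied during rendering (assumed relationship)."""
    return [e * render_gain ** 2 for e in first_hf_energies]


def process(first_audio, render_gain, first_hf_energies, band_size):
    # Stand-in for metadata-driven rendering: a single gain (hypothetical).
    second_audio = [render_gain * x for x in first_audio]
    second_hf = derive_second_hf_info(first_hf_energies, render_gain)
    return band_expand(second_audio, second_hf, band_size)
```

Adapting the high-band information to the rendering keeps the regenerated high band consistent with the rendered low band; without it, the high band would be regenerated at the pre-rendering level.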
SPATIAL AUDIO PROCESSING
According to an example embodiment, a method for processing a multi-channel input audio signal representing a sound field into a multi-channel output audio signal representing said sound field in accordance with a predefined loudspeaker layout is provided, the method comprising the following for at least one frequency band: obtaining spatial audio parameters that are descriptive of spatial characteristics of said sound field; estimating a signal energy of the sound field represented by the multi-channel input audio signal; estimating, based on said signal energy and the obtained spatial audio parameters, respective output signal energies for channels of the multi-channel output audio signal according to said predefined loudspeaker layout; determining a maximum output energy as the largest of the output signal energies across channels of said multi-channel output audio signal; and deriving, on the basis of said maximum output energy, a gain value for adjusting sound reproduction gain in at least one of said channels of the multi-channel output audio signal.
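A minimal sketch of the per-band steps, assuming the spatial audio parameters are a direct-to-total energy ratio plus energy-normalized panning gains for the loudspeaker layout (a hypothetical input), that diffuse energy spreads evenly over all channels, and that the gain caps the largest channel energy at a ceiling `max_allowed_energy`. The gain rule in particular is an assumption; the abstract only says the gain is derived from the maximum output energy.

```python
import math


def output_energies(input_energy, direct_ratio, panning_gains):
    """Estimated per-channel output energies for one frequency band."""
    n = len(panning_gains)
    direct = [direct_ratio * input_energy * g * g for g in panning_gains]
    diffuse = (1.0 - direct_ratio) * input_energy / n  # spread evenly (assumed)
    return [d + diffuse for d in direct]


def limiting_gain(energies, max_allowed_energy):
    """Gain that brings the largest channel energy down to the ceiling, or 1.0."""
    e_max = max(energies)
    if e_max <= max_allowed_energy:
        return 1.0
    return math.sqrt(max_allowed_energy / e_max)


# Fully direct sound panned hard to channel 0 with a ceiling of 1.0.
gain = limiting_gain(output_energies(4.0, 1.0, [1.0, 0.0]), 1.0)  # 0.5
```

Using the maximum across channels, rather than the total, targets the channel most likely to be over-driven when a directional source is panned to a single loudspeaker.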
AUDIO PROCESSING
According to an example embodiment, a technique for processing an input audio signal (101) comprising a multi-channel audio signal is provided, the technique comprising: deriving (104), based on the input audio signal (101), a first signal component (105-1) comprising a multi-channel audio signal that represents a focus portion of a spatial audio image conveyed by the input audio signal and a second signal component (105-2) comprising a multi-channel audio signal that represents a non-focus portion of the spatial audio image; processing (112) the second signal component (105-2) into a modified second signal component (113) wherein the width of the spatial audio image is extended from that of the second signal component (105-2); and combining (114) the first signal component (105-1) and the modified second signal component (113) into an output audio signal (115) comprising a multi-channel audio signal that represents a partially extended spatial audio image.
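A minimal stereo sketch of the derive, widen, and combine steps. Assumptions: the focus and non-focus components are obtained with a per-tile focus mask in [0, 1] (a hypothetical input), and widening is done by scaling the side signal of a mid/side decomposition. Neither choice is stated in the abstract; function names are hypothetical.

```python
def split_focus(left, right, focus_mask):
    """Split stereo time-frequency tiles into focus and non-focus parts using a
    per-tile focus mask in [0, 1] (hypothetical input)."""
    focus = ([m * x for m, x in zip(focus_mask, left)],
             [m * x for m, x in zip(focus_mask, right)])
    non_focus = ([(1.0 - m) * x for m, x in zip(focus_mask, left)],
                 [(1.0 - m) * x for m, x in zip(focus_mask, right)])
    return focus, non_focus


def widen(left, right, width):
    """Mid/side widening: scale the side signal by `width` (> 1 widens)."""
    mid = [(x + y) / 2 for x, y in zip(left, right)]
    side = [(x - y) / 2 for x, y in zip(left, right)]
    return ([m + width * s for m, s in zip(mid, side)],
            [m - width * s for m, s in zip(mid, side)])


def partially_widen(left, right, focus_mask, width):
    """Widen only the non-focus component, then add the focus component back."""
    (f_left, f_right), (n_left, n_right) = split_focus(left, right, focus_mask)
    w_left, w_right = widen(n_left, n_right, width)
    return ([a + b for a, b in zip(f_left, w_left)],
            [a + b for a, b in zip(f_right, w_right)])
```

With `width = 1.0` the output equals the input, so the widening amount can be adjusted continuously without altering the focus portion.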
AUDIO CROPPING
A method of cropping a portion of an audio signal captured from a plurality of spatially separated audio sources in a scene, the method comprising: capturing the audio signal with one or more recording devices; separating the audio signal into a plurality of components each associated with one or more of the plurality of audio sources; selecting a spatial region in the scene; determining which of the plurality of components are associated with an audio source positioned outside of the selected spatial region; and cropping the plurality of components associated with an audio source positioned outside of the selected spatial region out of the audio signal.
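A minimal sketch of the selection and cropping steps, assuming source separation has already produced one signal per component, that each component is tagged with a single 2-D source position, and that the selected spatial region is an axis-aligned rectangle. The `Component` type and the rectangle representation are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    samples: list    # separated signal of this component
    position: tuple  # (x, y) of the associated audio source, in metres


def inside(position, region):
    """True if the source position lies in the rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    return x0 <= position[0] <= x1 and y0 <= position[1] <= y1


def crop(components, region):
    """Drop components whose source lies outside the region and remix the rest."""
    kept = [c for c in components if inside(c.position, region)]
    n = max((len(c.samples) for c in components), default=0)
    out = [0.0] * n
    for c in kept:
        for i, s in enumerate(c.samples):
            out[i] += s
    return out
```

A component associated with several sources would need a policy, for example cropping it if any of its sources falls outside the region; the sketch sidesteps this by assuming one position per component.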
Audio distance estimation for spatial audio processing
A method for spatial audio signal processing including: determining at least one first direction parameter for at least one frequency band based on microphone signals received from a first microphone array; determining at least one second direction parameter for the at least one frequency band based on at least one microphone signal received from at least one second microphone, wherein microphones from the first microphone array and the at least one second microphone are spatially separated from each other; processing the determined at least one first direction parameter and the at least one second direction parameter to determine at least one distance parameter for the at least one frequency band; and enabling output and/or storage of the at least one distance parameter, at least one audio signal, and the at least one first direction parameter.
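One simple way two direction estimates from spatially separated capture points can yield a distance is 2-D triangulation, sketched below. It assumes known positions for the first array and the second microphone, azimuths measured in a common coordinate frame, and a single dominant source per band; it is a swapped-in illustration, not necessarily the processing the patent uses. Function names are hypothetical.

```python
import math


def triangulate_distance(p1, az1, p2, az2, eps=1e-9):
    """Distance from the first array at p1 to the point where its bearing line
    (azimuth az1) meets the bearing line from the second microphone at p2
    (azimuth az2). Returns None if the lines are parallel or meet behind p1."""
    d1 = (math.cos(az1), math.sin(az1))
    d2 = (math.cos(az2), math.sin(az2))
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    # Solve p1 + t*d1 = p2 + u*d2, i.e. t*d1 - u*d2 = b, for t by Cramer's rule.
    det = -d1[0] * d2[1] + d2[0] * d1[1]
    if abs(det) < eps:
        return None
    t = (-bx * d2[1] + d2[0] * by) / det
    return t if t >= 0 else None  # d1 is a unit vector, so t is a distance


def distances_per_band(p1, az1_bands, p2, az2_bands):
    """One distance estimate per frequency band (None where undefined)."""
    return [triangulate_distance(p1, a1, p2, a2)
            for a1, a2 in zip(az1_bands, az2_bands)]
```

Nearly parallel bearings make the estimate ill-conditioned, which is why the sketch rejects them instead of returning a very large distance.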
Crosstalk cancellation for opposite-facing transaural loudspeaker systems
Embodiments relate to audio processing for opposite-facing speaker configurations that results in multiple optimal listening regions around the speakers. A system includes a left speaker and a right speaker in an opposite-facing speaker configuration, and a crosstalk cancellation processor connected to the left speaker and the right speaker. The crosstalk cancellation processor applies crosstalk cancellation to an input audio signal to generate left and right output channels. The left output channel is provided to the left speaker and the right output channel to the right speaker, generating sound that includes multiple spaced-apart crosstalk-cancelled listening regions.
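For background, a generic crosstalk canceller at one frequency can be sketched as a regularized inverse of the 2x2 speaker-to-ear transfer matrix, C = (H^H H + beta I)^-1 H^H. The sketch assumes a symmetric setup described by an ipsilateral and a contralateral transfer value (`h_ipsi`, `h_contra`, hypothetical inputs). It is a textbook formulation and is not specific to the opposite-facing geometry or the multiple listening regions described above.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def herm2(a):
    """Conjugate (Hermitian) transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def inv2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def crosstalk_canceller(h_ipsi, h_contra, beta=0.0):
    """Regularized inverse of the symmetric speaker-to-ear matrix at one frequency."""
    h = [[h_ipsi, h_contra], [h_contra, h_ipsi]]
    hh = herm2(h)
    a = matmul2(hh, h)
    a = [[a[0][0] + beta, a[0][1]], [a[1][0], a[1][1] + beta]]
    return matmul2(inv2(a), hh)
```

With `beta = 0` the canceller is the exact inverse, so C H is the identity and each ear receives only its own channel; a small positive `beta` limits the filter gain at frequencies where the ipsilateral and contralateral paths are nearly equal and the matrix is close to singular.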