H04S1/00

Discrete binaural spatialization of sound sources on two audio channels

Embodiments relate to binaural spatialization of more than two sound sources on two audio channels of an audio system. Sound signals each emitted from a corresponding sound source are collected, and a respective virtual position within an angular range of a sound scene is assigned to each sound source. Multi-source audio signals are generated by panning each sound signal according to the respective virtual position. A first multi-source audio signal is spatialized to a first direction to generate a first left signal and a first right signal. A second multi-source audio signal is spatialized to a second direction to generate a second left signal and a second right signal. A binaural signal is generated using the first left signal, the second left signal, the first right signal, and the second right signal. The binaural signal is such that each sound source appears to originate from its respective virtual position.
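The abstract's pipeline can be sketched in Python: pan each source into two multi-source signals, spatialize each signal to one direction, then sum per ear. This is a minimal sketch under stated assumptions, not the patented method. The abstract does not give a panning law or spatialization filter, so the sketch assumes constant-power panning, and it uses a crude interaural time/level difference model in place of real HRTF filtering. The function names, the ±30° directions and the ITD/ILD constants are all hypothetical.

```python
import math

def pan_gains(position):
    """Constant-power gains for a virtual position in [0, 1] (assumed law).

    0 sends the source entirely to the first multi-source signal,
    1 entirely to the second.
    """
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)

def spatialize(signal, azimuth_deg, sample_rate=48000):
    """Crude stand-in for HRTF filtering: an interaural delay plus a level drop.

    Positive azimuth means the direction lies to the listener's right.
    Returns (left, right) lists of the same length as `signal`.
    """
    az = math.radians(azimuth_deg)
    # Assumed maximum ITD of about 0.66 ms, rounded to whole samples.
    delay = round(0.00066 * abs(math.sin(az)) * sample_rate)
    # Assumed maximum ILD of 6 dB at the far ear.
    far_gain = 10 ** (-6.0 * abs(math.sin(az)) / 20)
    near = list(signal)
    far = [far_gain * s for s in ([0.0] * delay + list(signal))[:len(signal)]]
    return (far, near) if az >= 0 else (near, far)

def binaural_mix(sources, positions, directions=(-30.0, 30.0)):
    """Pan sources into two multi-source signals, spatialize each, sum per ear."""
    n = len(sources[0])
    first, second = [0.0] * n, [0.0] * n
    for sig, pos in zip(sources, positions):
        g1, g2 = pan_gains(pos)
        for i, s in enumerate(sig):
            first[i] += g1 * s
            second[i] += g2 * s
    l1, r1 = spatialize(first, directions[0])
    l2, r2 = spatialize(second, directions[1])
    left = [a + b for a, b in zip(l1, l2)]
    right = [a + b for a, b in zip(r1, r2)]
    return left, right
```

Here a virtual position of 0 places a source at the first direction and 1 at the second; intermediate positions fall between them. Only two spatialization filters are needed however many sources there are, which is the point of panning before spatializing.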

METHOD FOR PROCESSING SOUND ON BASIS OF IMAGE INFORMATION, AND CORRESPONDING DEVICE

A method of processing an audio signal including at least one audio object based on image information includes: obtaining the audio signal and a current image that corresponds to the audio signal; dividing the current image into at least one block; obtaining motion information of the at least one block; generating index information including information for giving a three-dimensional (3D) effect in at least one direction to the at least one audio object, based on the motion information of the at least one block; and processing the audio object, in order to give the 3D effect in the at least one direction to the audio object, based on the index information.
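The block-motion step can be illustrated with exhaustive block matching. This sketch makes two assumptions the abstract does not state: motion is measured against the previous frame, and the "index information" is a single horizontal value. Both functions and the mapping from mean motion to an index in [-1, 1] are hypothetical illustrations, not the claimed method.

```python
def block_motion(prev, curr, block=4, search=2):
    """Exhaustive block matching between two grayscale frames (lists of rows).

    Returns {(row, col): (dy, dx)} for each block of `curr`, where (dy, dx)
    is the displacement from the best-matching block in `prev`, chosen by the
    smallest sum of absolute differences (ties favour smaller motion).
    """
    h, w = len(curr), len(curr[0])
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best_key, best_mv = None, (0, 0)
            for oy in range(-search, search + 1):
                for ox in range(-search, search + 1):
                    y0, x0 = by + oy, bx + ox
                    if y0 < 0 or x0 < 0 or y0 + block > h or x0 + block > w:
                        continue  # candidate lies partly outside the frame
                    sad = sum(abs(curr[by + i][bx + j] - prev[y0 + i][x0 + j])
                              for i in range(block) for j in range(block))
                    key = (sad, abs(oy) + abs(ox))
                    if best_key is None or key < best_key:
                        best_key, best_mv = key, (-oy, -ox)
            vectors[(by, bx)] = best_mv
    return vectors

def horizontal_index(vectors, search=2):
    """Hypothetical index in [-1, 1]: mean horizontal motion over all blocks,
    normalised by the search range. Negative could steer the audio object
    left, positive right."""
    if not vectors:
        return 0.0
    mean_dx = sum(dx for _, dx in vectors.values()) / len(vectors)
    return max(-1.0, min(1.0, mean_dx / search))
```

A frame whose content shifts one pixel to the right yields positive `dx` for blocks that can see the shift, and so a positive index. A renderer could then use that index to move the audio object toward the right.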

Spatial audio navigation

Methods and apparatus for spatial audio navigation that may, for example, be implemented by mobile multipurpose devices. A spatial audio navigation system provides navigational information in audio form to direct users to target locations. The system uses directionality of audio played through a binaural audio device to provide navigational cues to the user. A current location, target location, and map information may be input to pathfinding algorithms to determine a real world path between the user's current location and the target location. The system may then use directional audio played through a headset to guide the user on the path from the current location to the target location. The system may implement one or more of several different spatial audio navigation methods to direct a user when following a path using spatial audio-based cues.
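The core of a directional cue is the target's azimuth relative to the user's heading. A minimal sketch follows, assuming local planar coordinates in metres (x east, y north) and a heading measured clockwise from north. Constant-power stereo gains stand in for true binaural rendering. Pathfinding is out of scope here: `target_xy` would be the next waypoint on the computed path. All names are hypothetical.

```python
import math

def relative_azimuth(user_xy, heading_deg, target_xy):
    """Azimuth of the target relative to the user's heading, in degrees.

    Result is in (-180, 180]; positive means the target is to the right.
    """
    dx = target_xy[0] - user_xy[0]
    dy = target_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))  # clockwise from north
    rel = (bearing - heading_deg) % 360.0
    return rel - 360.0 if rel > 180.0 else rel

def stereo_cue(azimuth_deg):
    """Constant-power (left, right) gains; a stand-in for HRTF rendering.

    Azimuths beyond +/-90 degrees are clamped to the nearer side, so this
    sketch cannot convey front/back, which a binaural renderer could.
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * math.pi / 2
    return math.cos(theta), math.sin(theta)
```

For a user at the origin facing north with a waypoint due east, `relative_azimuth` gives 90° and `stereo_cue` places the cue fully in the right ear. As the user turns toward the waypoint, the cue moves to the centre.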

Camera microphone drainage system designed for beamforming
11570546 · 2023-01-31

An image capture device includes an audio depression formed into the housing with a drainage microphone mounted therein. A cover protects the drainage microphone disposed beneath the cover from an environment external to the image capture device. The cover and audio depression define a drainage channel extending from a channel entrance, through a channel volume, and out a channel exit. The surface area of the opening of the channel entrance is proportioned relative to the channel volume such that the ratio of the surface area to volume is greater than ten percent. This allows the cover to shift resonance outside of a desired frequency band.
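One way to see why the entrance-area-to-volume ratio moves the resonance is to model the drainage channel as a Helmholtz resonator, f = (c / 2π)·√(A / (V·L)). That model is an assumption for illustration; the abstract names none. The abstract also states no units for its ten-percent ratio, so the sketch assumes square and cubic millimetres. The dimensions in the test are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

def area_to_volume_ratio(area_mm2, volume_mm3):
    """Entrance area divided by channel volume, in 1/mm (assumed units)."""
    return area_mm2 / volume_mm3

def meets_ratio(area_mm2, volume_mm3, threshold=0.10):
    """Check the 'greater than ten percent' condition under the assumed units."""
    return area_to_volume_ratio(area_mm2, volume_mm3) > threshold

def helmholtz_frequency(area_mm2, volume_mm3, neck_length_mm):
    """Helmholtz resonance f = c/(2*pi) * sqrt(A / (V * L)), in Hz.

    Treating the channel as a Helmholtz resonator is an assumption;
    end corrections to the neck length L are ignored.
    """
    a = area_mm2 * 1e-6          # mm^2 -> m^2
    v = volume_mm3 * 1e-9        # mm^3 -> m^3
    l = neck_length_mm * 1e-3    # mm -> m
    return SPEED_OF_SOUND / (2 * math.pi) * math.sqrt(a / (v * l))
```

Under this model the resonance scales with the square root of A/V. Enlarging the entrance relative to the volume therefore pushes the resonance upward, for example above the band the beamforming microphones need.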

NOISE SUPPRESSION USING TANDEM NETWORKS

A device includes a memory configured to store instructions and one or more processors configured to execute the instructions. The one or more processors are configured to execute the instructions to receive audio data including a first audio frame corresponding to a first output of a first microphone and a second audio frame corresponding to a second output of a second microphone. The one or more processors are also configured to execute the instructions to provide the audio data to a first noise-suppression network and a second noise-suppression network. The first noise-suppression network is configured to generate a first noise-suppressed audio frame and the second noise-suppression network is configured to generate a second noise-suppressed audio frame. The one or more processors are further configured to execute the instructions to provide the noise-suppressed audio frames to an attention-pooling network. The attention-pooling network is configured to generate an output noise-suppressed audio frame.
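Attention pooling over the two networks' outputs can be sketched as a softmax-weighted blend of the candidate frames. The two "networks" below are trivial hypothetical stand-ins operating on magnitude frames, and `weights` stands in for learned parameters. The abstract does not say how the attention-pooling network computes its scores, so the dot-product scoring is an assumption.

```python
import math

def network_a(frame1, frame2, floor=0.1):
    """Hypothetical stand-in for the first noise-suppression network:
    average the two microphone magnitude frames, then subtract a noise floor."""
    return [max(0.0, (a + b) / 2 - floor) for a, b in zip(frame1, frame2)]

def network_b(frame1, frame2):
    """Hypothetical stand-in for the second network: per-bin minimum of the mics."""
    return [min(a, b) for a, b in zip(frame1, frame2)]

def attention_pool(candidates, weights):
    """Softmax attention pooling over candidate frames.

    Score each candidate as dot(weights, candidate), normalise the scores
    with a softmax, and return the weighted sum together with the weights.
    """
    scores = [sum(w * m for w, m in zip(weights, c)) for c in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for stability
    total = sum(exps)
    alphas = [e / total for e in exps]
    pooled = [sum(a * c[i] for a, c in zip(alphas, candidates))
              for i in range(len(candidates[0]))]
    return pooled, alphas

def tandem_suppress(frame1, frame2, weights):
    """Run both stand-in networks on the same audio data, then pool."""
    candidates = [network_a(frame1, frame2), network_b(frame1, frame2)]
    pooled, _ = attention_pool(candidates, weights)
    return pooled
```

Because the attention weights sum to one, each output bin is a convex combination of the two networks' estimates. The pooling stage can favour whichever network is more reliable for a given frame.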

COLORLESS GENERATION OF ELEVATION PERCEPTUAL CUES USING ALL-PASS FILTER NETWORKS
20230025801 · 2023-01-26

A system includes one or more computing devices that encode spatial perceptual cues into a monaural channel to generate a plurality of output channels. A computing device determines a target amplitude response for the mid and side channels of the plurality of output channels, defining a spatial perceptual cue associated with one or more frequency-dependent phase shifts. The computing device determines a transfer function of a single-input, multi-output allpass filter based on the target amplitude response, determines coefficients of the allpass filter based on the transfer function, and processes the monaural channel with the coefficients of the allpass filter to generate the plurality of output channels having the encoded spatial perceptual cues. The allpass filter is configured to be colorless with respect to the individual output channels, so that spatial cues can be placed in the audio stream independently of the overall coloration of the audio.
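Here is a minimal illustration of how an allpass filter can be colorless per channel yet shape the mid/side amplitude response. It assumes the simplest single-input, two-output case: the left output is the input itself (a trivial allpass) and the right output passes through a first-order allpass H(z) = (a + z⁻¹)/(1 + a·z⁻¹). Choosing `a` so that H reaches −90° phase at a target crossover frequency is a simplified, hypothetical stand-in for "determining coefficients from a target amplitude response".

```python
import cmath
import math

def allpass_coefficient(fc, fs):
    """First-order allpass coefficient whose phase is -90 degrees at fc.

    At that frequency the mid and side channels have equal amplitude.
    """
    t = math.tan(math.pi * fc / fs)
    return (t - 1.0) / (t + 1.0)

def allpass_response(a, w):
    """H(e^{jw}) for H(z) = (a + z^-1) / (1 + a z^-1)."""
    zinv = cmath.exp(-1j * w)
    return (a + zinv) / (1 + a * zinv)

def mid_side_response(a, w):
    """With L = x and R = allpass(x), return |M| and |S| at frequency w,
    where M = (L + R) / 2 and S = (L - R) / 2."""
    h = allpass_response(a, w)
    return abs(1 + h) / 2, abs(1 - h) / 2

def process(x, a):
    """Split a mono signal into (left, right); only the right is filtered."""
    right, x1, y1 = [], 0.0, 0.0
    for s in x:
        y = a * s + x1 - a * y1  # y[n] = a x[n] + x[n-1] - a y[n-1]
        right.append(y)
        x1, y1 = s, y
    return list(x), right
```

Each output channel has unit magnitude at every frequency, so neither is colored. The mid and side responses, however, trade energy across frequency (|M|² + |S|² = 1), and that frequency-dependent balance carries the cue.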

BEAM GENERATOR, BEAM GENERATING METHOD, AND CHIP

A beam generator, a beam generating method, and a chip are provided. The beam generator comprises a first channel, a second channel, and a signal merging module. The first channel comprises a first-channel filter that filters an input signal to obtain a first filtered signal, which comprises a desired signal. The second channel comprises: a second-channel blocking module that blocks the desired signal in the input signal to obtain a blocked signal; a compensation filter, connected to the second-channel blocking module, that compensates the blocked signal to obtain a second filtered signal; and an adaptive filter, connected to the compensation filter, that adaptively filters the second filtered signal to obtain a third filtered signal. The signal merging module merges the first filtered signal and the third filtered signal to obtain an output signal.
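The described structure resembles a generalized sidelobe canceller: a fixed path that keeps the desired signal, and a blocking path that feeds an adaptive noise estimate which is subtracted at the merge. The two-microphone sketch below rests on several assumptions. The first-channel filter is the mic average, blocking is the mic difference, the compensation filter is a fixed FIR (`comp`, hypothetical), and the adaptive filter uses NLMS. The desired signal is assumed to arrive identically at both mics.

```python
import random

def beam_generate(x1, x2, taps=4, mu=0.05, comp=(1.0,), eps=1e-8):
    """Two-microphone sketch of the first/second channel structure."""
    n = len(x1)
    # First channel: fixed filter that preserves the (broadside) desired signal.
    fixed = [(a + b) / 2 for a, b in zip(x1, x2)]
    # Second channel: blocking removes the desired signal, leaving noise.
    blocked = [a - b for a, b in zip(x1, x2)]
    # Compensation filter: plain FIR convolution with the hypothetical `comp`.
    comp_out = [sum(c * blocked[i - k] for k, c in enumerate(comp) if i - k >= 0)
                for i in range(n)]
    w = [0.0] * taps
    out = []
    for i in range(n):
        u = [comp_out[i - k] if i - k >= 0 else 0.0 for k in range(taps)]
        noise_est = sum(wk * uk for wk, uk in zip(w, u))  # adaptive filter output
        e = fixed[i] - noise_est  # merge: subtract the noise estimate
        norm = eps + sum(uk * uk for uk in u)
        w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]  # NLMS update
        out.append(e)
    return out

def demo_signals(n=20000, seed=0):
    """Synthetic scene: target s identical at both mics, noise at unequal gains."""
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]
    nz = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x1 = [a + b for a, b in zip(s, nz)]
    x2 = [a + 0.2 * b for a, b in zip(s, nz)]
    return s, nz, x1, x2
```

In the demo scene the fixed path carries the target plus 0.6 times the noise, while the blocked path carries only 0.8 times the noise. The adaptive filter can therefore learn to cancel most of the residual noise without touching the target.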

PERCEPTUAL OPTIMIZATION OF MAGNITUDE AND PHASE FOR TIME-FREQUENCY AND SOFTMASK SOURCE SEPARATION SYSTEMS

A method comprises: obtaining softmask values for frequency bins of time-frequency tiles representing an audio signal; reducing, or expanding and limiting, the softmask values; and applying the reduced, or expanded and limited, softmask values to the frequency bins to create a time-frequency representation of an estimated target source. An alternative method comprises, for each time-frequency tile: obtaining softmask values; applying the softmask values to the frequency bins to create a time-frequency representation of an estimated target source; obtaining a panning parameter estimate and a source phase concentration estimate for the target source; determining, using the panning parameter estimate and the softmask values, a magnitude for the time-frequency representation of the estimated target source; determining, using the panning parameter estimate and the source phase concentration estimate, a phase for the time-frequency representation of the estimated target source; and combining the magnitude and the phase.
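The first method (modify the softmask, then apply it per bin) can be sketched in a few lines. The abstract does not specify the reducing or expanding functions. This sketch assumes a power law for "reducing" and a gain followed by a ceiling clip for "expanding and limiting"; both functions and their default parameters are hypothetical.

```python
def reduce_mask(mask, power=2.0):
    """Hypothetical 'reduce': raising values in [0, 1] to a power > 1
    pushes them toward 0, leaving 0 and 1 unchanged."""
    return [m ** power for m in mask]

def expand_and_limit(mask, gain=1.5, ceiling=1.0):
    """Hypothetical 'expand and limit': scale values up, then clip at `ceiling`."""
    return [min(ceiling, gain * m) for m in mask]

def apply_mask(tile, mask):
    """Multiply each complex frequency bin of one time-frequency tile by its
    mask value, giving the estimated target source for that tile."""
    return [m * x for m, x in zip(mask, tile)]
```

Reducing suppresses bins where the mask is uncertain, trading some target energy for less leakage from other sources. Expanding and limiting does the opposite, restoring target energy while keeping every mask value at or below one.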