Patent classifications
H04S3/004
Generating binaural audio in response to multi-channel audio using at least one feedback delay network
Some embodiments are virtualization methods for generating a binaural signal in response to channels of a multi-channel audio signal. The methods apply a binaural room impulse response (BRIR) to each channel, including by using at least one feedback delay network (FDN) to apply a common late reverberation to a downmix of the channels. In some embodiments, input signal channels are processed in a first processing path to apply to each channel a direct response and early reflection portion of a single-channel BRIR for the channel, and the downmix of the channels is processed in a second processing path including at least one FDN which applies the common late reverberation. Typically, the common late reverberation emulates collective macro attributes of late reverberation portions of at least some of the single-channel BRIRs. Other aspects are headphone virtualizers configured to perform any embodiment of the method.
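The two processing paths described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the four delay lengths, the Hadamard feedback matrix, the feedback gain, and the way the delay-line outputs are split between the ears are all assumptions chosen for illustration.

```python
def convolve(x, h):
    """Direct-form FIR convolution, truncated to len(x) samples."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
    return y

def fdn_reverb(x, delays=(149, 211, 263, 293), feedback=0.7):
    """Four-line feedback delay network (assumed topology, illustrative values)."""
    # Orthogonal 4x4 Hadamard matrix scaled by 1/2; with feedback < 1 the loop decays.
    H = [[0.5, 0.5, 0.5, 0.5],
         [0.5, -0.5, 0.5, -0.5],
         [0.5, 0.5, -0.5, -0.5],
         [0.5, -0.5, -0.5, 0.5]]
    lines = [[0.0] * d for d in delays]  # circular delay buffers
    idx = [0] * 4
    out_l, out_r = [], []
    for s in x:
        taps = [lines[i][idx[i]] for i in range(4)]  # oldest sample of each line
        # Different tap sums feed each ear so the two outputs differ.
        out_l.append(taps[0] + taps[2])
        out_r.append(taps[1] + taps[3])
        for i in range(4):
            fb = sum(H[i][j] * taps[j] for j in range(4))
            lines[i][idx[i]] = s + feedback * fb
            idx[i] = (idx[i] + 1) % delays[i]
    return out_l, out_r

def virtualize(channels, early_brirs, late_gain=0.3):
    """Path 1: per-channel direct/early BRIR part. Path 2: FDN on the downmix."""
    n = len(channels[0])
    left, right = [0.0] * n, [0.0] * n
    for ch, (h_left, h_right) in zip(channels, early_brirs):
        for ear, h in ((left, h_left), (right, h_right)):
            for i, v in enumerate(convolve(ch, h)):
                ear[i] += v
    # Common late reverberation is applied once, to the downmix of all channels.
    downmix = [sum(col) / len(channels) for col in zip(*channels)]
    late_l, late_r = fdn_reverb(downmix)
    for i in range(n):
        left[i] += late_gain * late_l[i]
        right[i] += late_gain * late_r[i]
    return left, right
```

Running the FDN once on the downmix, rather than convolving every channel with a full-length BRIR, is the point of the second path: the cost of the late tail no longer grows with the channel count.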
HEAD TO HEADSET ROTATION TRANSFORM ESTIMATION FOR HEAD POSE TRACKING IN SPATIAL AUDIO APPLICATIONS
In an embodiment, a method comprises: estimating a first gravity direction in a source device reference frame for a source device; estimating a second gravity direction in a headset reference frame for a headset; estimating a rotation transform from the headset frame into a face reference frame using the first and second estimated gravity directions, a rotation transform from a camera reference frame to the source device reference frame, and a rotation transform from the face reference frame to the camera reference frame; estimating a relative position and attitude using source device motion data, headset motion data and the rotation transform from the headset frame to the face reference frame; using the relative position and attitude to estimate a head pose; and using the estimated head pose to render spatial audio for playback on the headset.
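One step of this abstract, relating the two gravity estimates through the camera and face frames, can be sketched as follows. This is an assumption-laden illustration rather than the claimed method: it returns only the smallest rotation that maps the headset gravity direction onto gravity expressed in the face frame, so rotation about the gravity axis stays unresolved. All function and parameter names are hypothetical.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(R):
    return [list(row) for row in zip(*R)]

def align(a, b):
    """Smallest rotation taking direction a onto direction b (Rodrigues form)."""
    a, b = normalize(a), normalize(b)
    v = [a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0]]          # rotation axis scaled by sin(angle)
    c = sum(x * y for x, y in zip(a, b))      # cos(angle)
    if c < -1.0 + 1e-9:
        raise ValueError("antiparallel vectors: rotation axis is ambiguous")
    K = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    K2 = matmul(K, K)
    k = 1.0 / (1.0 + c)
    return [[(1.0 if i == j else 0.0) + K[i][j] + k * K2[i][j]
             for j in range(3)] for i in range(3)]

def headset_to_face(g_source, g_headset, R_source_from_camera, R_camera_from_face):
    """Gravity-only estimate of the headset-to-face rotation (yaw about gravity unresolved)."""
    R_source_from_face = matmul(R_source_from_camera, R_camera_from_face)
    # Express the source-device gravity direction in the face frame.
    g_face = matvec(transpose(R_source_from_face), normalize(g_source))
    return align(g_headset, g_face)
```

A complete estimator would need further information, such as the motion data mentioned in the abstract, to fix the remaining degree of freedom.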
SYSTEM AND METHODS FOR CINEMATIC HEADPHONES
A cinema system comprises video equipment configured to generate video images from video data for presentation on a screen; a headphone system including left and right ear cups, each including at least one driver configured to drive highs and/or mids based on headphone sound signals and not configured to drive lows; a first DAC configured to convert audio data based on the headphone sound data to the headphone sound signals; and a control system configured to generate the audio data from at least headphone sound data; one or more low-frequency speakers configured to drive lows based on low-frequency speaker signals; a second DAC configured to generate the low-frequency speaker signals from low-frequency speaker data; and a server system configured to assist in providing the video data to the video equipment, the headphone sound data to the one or more headphone systems, and the low-frequency speaker data to the second DAC.
Spatial Audio Augmentation and Reproduction
An apparatus including circuitry configured for: obtaining at least one spatial audio signal including at least one audio signal, wherein the at least one spatial audio signal defines an audio scene forming at least in part media content; rendering an audio scene based on the at least one spatial audio signal; obtaining at least one augmentation audio signal; transforming the at least one augmentation audio signal to at least two audio objects; augmenting the audio scene based on the at least two audio objects.
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM
An information processing device (100) according to the present disclosure includes: an acquisition unit (141) configured to acquire a first image including a content image of an ear of a user; and a calculation unit (142) configured to calculate, based on the first image acquired by the acquisition unit (141), a head-related transfer function corresponding to the user by using a learned model having learned to output a head-related transfer function corresponding to an ear when an image including a content image of the ear is input.
Method and apparatus for binaural rendering audio signal using variable order filtering in frequency domain
The present invention relates to a method and an apparatus for binaural rendering of an audio signal using variable order filtering in the frequency domain. To this end, a method for processing an audio signal is provided, including: receiving an input audio signal; receiving a set of truncated subband filter coefficients for filtering each subband signal of the input audio signal, the set of truncated subband filter coefficients being constituted by one or more FFT filter coefficients generated by performing an FFT with a predetermined block size; generating at least one subframe for each subband; generating at least one filtered subframe for each subband; performing an inverse FFT on the filtered subframe for each subband; and generating a filtered subband signal by overlap-adding the transformed subframe for each subband. An apparatus for processing an audio signal using the same method is also provided.
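The subframe, FFT, multiply, inverse FFT, and overlap-add sequence listed in this abstract is the standard overlap-add fast-convolution pattern. The sketch below illustrates it under several assumptions: subband signals are real-valued, each subband has its own (possibly shorter) coefficient list to stand in for "variable order", and the names `overlap_add_filter` and `filter_subbands` are hypothetical. A small radix-2 FFT is included only to keep the example self-contained.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def overlap_add_filter(signal, coeffs, block=8):
    """Filter a real signal with FIR coeffs using block-wise FFT and overlap-add."""
    fft_size = 1
    while fft_size < block + len(coeffs) - 1:  # room for one block's linear convolution
        fft_size *= 2
    H = fft(list(coeffs) + [0.0] * (fft_size - len(coeffs)))
    out = [0.0] * (len(signal) + len(coeffs) - 1)
    for start in range(0, len(signal), block):
        sub = list(signal[start:start + block])             # one subframe
        X = fft(sub + [0.0] * (fft_size - len(sub)))
        Y = [a * b for a, b in zip(X, H)]                   # filtered subframe
        y = fft(Y, inverse=True)
        for i, v in enumerate(y):
            if start + i < len(out):
                out[start + i] += v.real / fft_size          # overlap-add the tails
    return out

def filter_subbands(subbands, coeff_sets, block=8):
    """Apply a separate (possibly shorter) filter to each subband signal."""
    return [overlap_add_filter(s, c, block) for s, c in zip(subbands, coeff_sets)]
```

Giving a subband a shorter coefficient list reduces its FFT size and per-block cost, which is one plausible reading of why the filter order would be varied per subband.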
Spatial audio augmentation and reproduction
An apparatus including circuitry configured for: obtaining at least one spatial audio signal including at least one audio signal, wherein the at least one spatial audio signal defines an audio scene forming at least in part media content; rendering an audio scene based on the at least one spatial audio signal; obtaining at least one augmentation audio signal; transforming the at least one augmentation audio signal to at least two audio objects; augmenting the audio scene based on the at least two audio objects.
Generating Binaural Audio in Response to Multi-Channel Audio Using at Least One Feedback Delay Network
Some embodiments are virtualization methods for generating a binaural signal in response to channels of a multi-channel audio signal. The methods apply a binaural room impulse response (BRIR) to each channel, including by using at least one feedback delay network (FDN) to apply a common late reverberation to a downmix of the channels. In some embodiments, input signal channels are processed in a first processing path to apply to each channel a direct response and early reflection portion of a single-channel BRIR for the channel, and the downmix of the channels is processed in a second processing path including at least one FDN which applies the common late reverberation. Typically, the common late reverberation emulates collective macro attributes of late reverberation portions of at least some of the single-channel BRIRs. Other aspects are headphone virtualizers configured to perform any embodiment of the method.
PERSONALIZED THREE-DIMENSIONAL AUDIO
A headphone system includes a calibration microphone for performing a calibration routine with a user. The calibration microphone receives a stimulus signal emitted by the headphone system and generates a response signal indicating variations in the stimulus signal that arise due to physiological attributes of the user. Based on the stimulus signal and the response signal, the calibration engine generates response data. The calibration engine processes the response data based on a headphone transfer function (HPTF) associated with the headphone system in order to create an inverse filter that can reduce or remove acoustic variations caused by the headphone system. The calibration engine generates a personalized HRTF for the user based on the response data and the inverse filter. The personalized HRTF can be used to implement highly accurate 3D audio and is thereby well-suited for applications to immersive audio and audio-visual entertainment.
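The abstract does not say how the inverse filter is computed. One common approach, assumed here purely for illustration, is regularized inversion of the HPTF per frequency bin, with the response data taken as the ratio of the response spectrum to the stimulus spectrum. The function names and the regularization constant `beta` are hypothetical.

```python
def regularized_inverse(hptf, beta=1e-3):
    """Per-bin regularized inverse conj(H) / (|H|^2 + beta) of a headphone transfer function."""
    return [h.conjugate() / (abs(h) ** 2 + beta) for h in hptf]

def personalized_hrtf(stimulus, response, hptf, beta=1e-3):
    """Estimate per-bin HRTF values from stimulus/response spectra (assumed pipeline)."""
    # Response data: what the ear and headphone together did to the stimulus.
    # Assumes no stimulus bin is exactly zero.
    response_data = [r / s for r, s in zip(response, stimulus)]
    # Remove the headphone's own contribution with the inverse filter.
    inverse = regularized_inverse(hptf, beta)
    return [d * g for d, g in zip(response_data, inverse)]
```

The `beta` term keeps the inverse bounded at frequencies where the HPTF magnitude is close to zero, at the cost of a slightly inexact inversion elsewhere.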
Surround Sound Location Virtualization
A computer program product having a non-transitory computer-readable medium including computer program logic encoded thereon that, when performed on a surround sound audio system that is configured to render left front, right front, and center front audio signals, and also render left and right near-field binaurally-encoded audio signals, causes the surround sound audio system to develop the left and right near-field binaurally-encoded audio signals, and provide the left near-field binaurally-encoded audio signal to a left non-occluding near-field driver and provide the right near-field binaurally-encoded audio signal to a right non-occluding near-field driver.