H04S7/30

METHOD FOR GENERATING CUSTOMIZED SPATIAL AUDIO WITH HEAD TRACKING

A headphone for spatial audio rendering includes a first database having an impulse response pair corresponding to a reference speaker location. A head sensor provides head orientation information to a second database having rotation filters, the filters corresponding to different azimuth and elevation positions relative to the reference speaker location. A digital signal processor combines the rotation filters with the impulse response pair to generate an output binaural audio signal to transducers of the headphone. Efficiencies in creating impulse response or HRTF databases are achieved by sampling the impulse response less frequently than in conventional methods. This sampling at coarser intervals reduces the number of data measurements required to generate a spherical grid and reduces the time involved in capturing the impulse responses. Impulse responses for data points falling between the sampled data points are generated by interpolating in the frequency domain.
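The abstract says only that in-between impulse responses are generated "by interpolating in the frequency domain," without fixing a method. One common realization is to interpolate magnitude and unwrapped phase separately between two measured grid points; the sketch below (NumPy, all names illustrative) assumes that approach:

```python
import numpy as np

def interpolate_ir(ir_a, ir_b, weight):
    """Estimate an impulse response between two measured grid positions
    by interpolating in the frequency domain.
    ir_a, ir_b: measured impulse responses (1-D arrays, equal length).
    weight: 0.0 -> ir_a, 1.0 -> ir_b.
    """
    H_a = np.fft.rfft(ir_a)
    H_b = np.fft.rfft(ir_b)
    # Interpolate magnitude and unwrapped phase separately so that
    # spectral notches and delays blend smoothly instead of cancelling.
    mag = (1 - weight) * np.abs(H_a) + weight * np.abs(H_b)
    phase = (1 - weight) * np.unwrap(np.angle(H_a)) + weight * np.unwrap(np.angle(H_b))
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(ir_a))
```

With a coarser measurement grid, each unmeasured direction on the spherical grid is filled by calls like this on its nearest measured neighbors (a bilinear variant would blend four neighbors in azimuth and elevation).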

USER INTERFACE FOR MULTI-USER COMMUNICATION SESSION

The present disclosure generally relates to user interfaces for multi-user communication sessions. In some examples, a device initiates a live stream in a communication session. In some examples, a device transitions between streaming live audio and live video. In some examples, a device enables synchronizing media playback during a live stream. In some examples, a device displays synchronized media playback and plays a reaction from a first participant of the communication session.

ACOUSTIC MEASUREMENT
20230007420 · 2023-01-05 ·

A method for determining subject specific digital audio data can comprise providing at least one respective audio signal input to each of a plurality of loudspeaker elements supported in a predetermined spatial relationship, in which respective locations of an effective point source of each loudspeaker element all lie in an imaginary surface that at least partially contains a spatial region where at least one aural cavity of a subject is located, thereby providing a distance between each respective location and each aural cavity of less than 1.5 meters. Responsive to at least one audio signal output from at least one of the loudspeaker elements, via at least one microphone element located at or within an aural cavity of the subject, respective subject specific audio data output is provided and is processed via an audio processing system, thereby providing subject specific digital audio data.

INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD
20230007232 · 2023-01-05 ·

Provided is an information processing device that performs processing on a content. An information processing device is provided with an estimation unit that estimates sounding coordinates at which a sound image is generated on the basis of a video stream and an audio stream, a video output control unit that controls an output of the video stream, and an audio output control unit that controls an output of the audio stream so as to generate the sound image at the sounding coordinates. A discrimination unit that discriminates a gazing point of a user who views video and audio is further provided, in which the estimation unit estimates the sounding coordinates at which the sound image of the object gazed at by the user is generated on the basis of a discrimination result.

SCALABLE PARALLAX SYSTEM FOR RENDERING DISTANT AVATARS, ENVIRONMENTS, AND DYNAMIC OBJECTS
20230237731 · 2023-07-27 ·

Methods, systems, and storage media for rendering digital environments are disclosed. Exemplary implementations may: receive a data stream at an area rendering server; render, by the area rendering server, a global digital environment based at least in part on the received data stream; generate a parallax rendering data stream based at least in part on the received data stream; receive the parallax rendering data stream at a composition server; combine, by the composition server, the received parallax rendering data stream into a new parallax rendering data stream; render, by one or more client platforms, a local digital environment based at least in part on the new parallax rendering data stream; and cause display of the local digital environment through an output of the one or more client platforms.

Grouping and transport of audio objects

An apparatus for audio signal processing of audio objects within at least one audio scene, the apparatus comprising at least one processor configured to: define for at least one time period at least one contextual grouping comprising at least two of a plurality of audio objects and at least one further audio object of the plurality of audio objects outside of the at least one contextual grouping, the plurality of audio objects within at least one audio scene; and define with respect to the at least one contextual grouping at least one first parameter and/or parameter rule type which is configured to be applied with respect to a common element associated with the at least two of the plurality of audio objects, and wherein the at least one first parameter and/or parameter rule type is configured to be applied with respect to an individual element associated with the at least one further audio object outside of the at least one contextual grouping, the at least one first parameter and/or parameter rule type being applied in audio rendering of both the at least two of the plurality of audio objects and the at least one further audio object.
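The claim describes a parameter rule applied once to a common element shared by grouped objects, and individually to objects outside the grouping. A minimal Python sketch of one possible data layout (the classes, the gain-offset rule, and all names are hypothetical illustrations, not the patented structure) could look like:

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    name: str
    gain_db: float = 0.0          # individual element of this object

@dataclass
class ContextualGroup:
    members: list                 # at least two AudioObjects
    common_gain_db: float = 0.0   # common element shared by members

def apply_gain_rule(scene_objects, groups, offset_db):
    """Apply one parameter rule (a gain offset, as an illustrative rule
    type) once per contextual grouping, and individually to objects
    outside any grouping."""
    grouped_ids = {id(m) for g in groups for m in g.members}
    for g in groups:
        g.common_gain_db += offset_db          # common element, applied once
    for obj in scene_objects:
        if id(obj) not in grouped_ids:
            obj.gain_db += offset_db           # individual element

def effective_gain_db(obj, groups):
    """Gain used at render time: the object's own gain plus any common
    group gain that applies to it."""
    return obj.gain_db + sum(g.common_gain_db for g in groups if obj in g.members)
```

The point of the grouping is visible in `apply_gain_rule`: one rule application covers every member of the group, while an ungrouped object receives its own application.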

Audio signal processing method and apparatus
11714596 · 2023-08-01 ·

Disclosed is an operation method of an audio signal processing device configured to process an audio signal including a first audio signal component and a second audio signal component. The operation method includes: receiving the audio signal; normalizing loudness of the audio signal, based on a pre-designated target loudness; acquiring the first audio signal component from the audio signal having the normalized loudness, by using a machine learning model; and de-normalizing loudness of the first audio signal component, based on the pre-designated target loudness.
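The four steps of the claimed method (normalize to a target loudness, separate with a learned model, de-normalize with the matching gain) can be sketched as below. The RMS-in-dB loudness proxy is a simplification of my own; a real implementation would measure loudness per ITU-R BS.1770, and `model` stands in for the machine learning model:

```python
import numpy as np

def loudness_db(x):
    # Simplified loudness proxy: RMS level in dB. Real integrated
    # loudness adds K-weighting and gating (ITU-R BS.1770); omitted here.
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def separate_with_loudness_normalization(x, model, target_db=-23.0):
    """Sketch of the claimed pipeline: normalize the input to a
    pre-designated target loudness, extract the first audio signal
    component with a model, then de-normalize with the same gain."""
    gain = 10 ** ((target_db - loudness_db(x)) / 20)
    normalized = x * gain               # loudness normalization
    component = model(normalized)       # e.g. a trained separation network
    return component / gain             # loudness de-normalization
```

Normalizing before inference keeps the model's input level in the range it was trained on; dividing the same gain back out restores the component to the scale of the original recording.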

RENDERING AUDIO

An apparatus, method and computer program are described comprising: providing an incoming audio indication in response to incoming audio (41), the incoming audio indication comprising visual representations of a plurality of audio modes (55-58); receiving at least one input from a user (59) for selecting one of the plurality of audio modes (42); and rendering audio (43) based, at least partially, on the selected audio mode, wherein one or more parameters of the rendered audio are determined based on the selected audio mode.

PERCEPTUAL OPTIMIZATION OF MAGNITUDE AND PHASE FOR TIME-FREQUENCY AND SOFTMASK SOURCE SEPARATION SYSTEMS

A method comprises: obtaining softmask values for frequency bins of time-frequency tiles representing an audio signal; reducing, or expanding and limiting, the softmask values; and applying the reduced, or expanded and limited, softmask values to the frequency bins to create a time-frequency representation of an estimated target source. An alternative method comprises, for each time-frequency tile: obtaining softmask values; applying the softmask values to the frequency bins to create a time-frequency domain representation of an estimated target source; obtaining a panning parameter estimate and a source phase concentration estimate for the target source; determining, using the panning parameter estimate and the softmask values, a magnitude for the time-frequency representation of the estimated target source; determining, using the panning parameter estimate and the source phase concentration estimate, a phase for the time-frequency representation of the estimated target source; and combining the magnitude and the phase.
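The first method's reduce-and-apply step can be sketched with one simple choice of reduction: a power-law compression of the mask values (exponents above 1 reduce mid-range mask values toward 0; exponents below 1 expand them, with clipping acting as the limiter). The abstract does not specify the reduction function, so this is an assumed instance:

```python
import numpy as np

def apply_reduced_softmask(mixture_tf, softmask, exponent=2.0):
    """Apply a reduced softmask to the time-frequency tiles of a mixture
    to estimate a target source.
    mixture_tf: complex STFT of the mixture (freq x time).
    softmask:   values in [0, 1], same shape as mixture_tf.
    exponent:   > 1 reduces mask values; < 1 expands them (then limited
                by the clip), one plausible reading of the claim.
    """
    reduced = np.clip(softmask, 0.0, 1.0) ** exponent
    return reduced * mixture_tf   # time-frequency estimate of the target
```

Reducing the mask suppresses low-confidence bins more aggressively than the raw softmask would, which is the perceptual motivation the title alludes to.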

EARPHONE AND METHOD FOR IDENTIFYING WHETHER AN EARPHONE IS BEING INSERTED INTO AN EAR OF A USER
20230232147 · 2023-07-20 ·

An earphone including a proximity sensor, an acceleration sensor, and a signal analysis device. The signal analysis device identifies an approaching movement of the earphone to an object using the proximity sensor signal. The signal analysis device ascertains whether the approaching movement is a movement of the earphone to an ear of the user. By filtering the acceleration sensor signal, the signal analysis device generates a high-pass filtered acceleration signal and a low-pass filtered acceleration signal. The signal analysis device determines an end time of the approaching movement based on a stabilization of the acceleration, using the low-pass filtered acceleration signal. The signal analysis device confirms that the approaching movement is a movement of the earphone to an ear of the user based on changes in the high-pass filtered acceleration signal after the ascertained end time of the approaching movement.
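The confirmation logic lends itself to a short sketch: low-pass the accelerometer signal, take its end-of-movement point where the smoothed signal stabilizes, then look for a contact transient in the high-pass residue after that point. All filter constants and thresholds below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def lowpass(x, alpha=0.3):
    """First-order IIR low-pass (exponential smoothing)."""
    y = np.empty(len(x))
    acc = x[0]
    for i, v in enumerate(x):
        acc += alpha * (v - acc)
        y[i] = acc
    return y

def movement_end(lp, stab_thresh=0.05, window=3):
    """First index at which the low-passed acceleration has stayed
    stable (per-sample change below stab_thresh) for `window` samples."""
    run = 0
    for i, d in enumerate(np.abs(np.diff(lp))):
        run = run + 1 if d < stab_thresh else 0
        if run >= window:
            return i + 1
    return None

def confirm_ear_insertion(accel, stab_thresh=0.05, hp_thresh=0.2):
    """Confirm insertion if the high-frequency residue after the end of
    the approaching movement shows a seating transient."""
    lp = lowpass(accel)
    hp = accel - lp          # complementary high-pass residue
    end = movement_end(lp, stab_thresh)
    if end is None:
        return False         # movement never stabilized
    return bool(np.max(np.abs(hp[end:])) > hp_thresh)
```

Splitting the signal this way lets the slow component date the end of the hand movement while the fast component isolates the brief jolt of the earbud seating in the ear, which is the cue the abstract describes.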