H04S7/306

Binaural rendering for headphones using metadata processing

Embodiments are described for a method of rendering audio for playback through headphones comprising receiving digital audio content, receiving binaural rendering metadata generated by an authoring tool processing the received digital audio content, receiving playback metadata generated by a playback device, and combining the binaural rendering metadata and playback metadata to optimize playback of the digital audio content through the headphones.

Method and device for processing music file, terminal and storage medium

Provided are a method and device for processing a music file, a terminal and a storage medium. The method comprises: in response to a received sound effect adjustment instruction, acquiring a music file, the adjustment of which is indicated by the sound effect adjustment instruction; carrying out vocals and accompaniment separation on the music file to obtain vocal data and accompaniment data in the music file; carrying out first sound effect processing on the vocal data to obtain target vocal data, and carrying out second sound effect processing on the accompaniment data to obtain target accompaniment data; and synthesizing the target vocal data and the target accompaniment data to obtain a target music file.

SOUND PROCESSING DEVICE, SOUND PROCESSING METHOD, AND SOUND PROCESSING PROGRAM
20230179946 · 2023-06-08 ·

Sound is flexibly reproduced. A sound processing device (10) according to an embodiment includes: an acquisition part (111) that acquires parameters relating to a way of sounding of sound from first sound data obtained by actual measurement; an adjustment part (1121) that adjusts the parameters in accordance with a space for reproduction; a synthesis part (1122) that generates third sound data from second sound data on the basis of parameters having been adjusted by the adjustment part (1121); and an output part (113) that reproduces the third sound data.

Adjustment of Reverberator Based on Source Directivity

An apparatus for assisting spatial rendering for room acoustics, the apparatus including means configured to: obtain directivity data having an identifier, wherein the directivity data includes data for at least two separate directions; obtain at least one room parameter; determine information associated with the directivity data; determine gain data based on the determined information; determine averaged gain data based on the gain data; and generate a bitstream defining a rendering, the bitstream including the averaged gain data and the at least one room parameter such that at least one audio signal associated with the identifier is configured to be rendered based on the at least one room parameter and the determined averaged gain data.

RENDERING ENCODED 6DOF AUDIO BITSTREAM AND LATE UPDATES
20230171557 · 2023-06-01 ·

Examples of the disclosure relate to apparatus, methods and computer programs for enabling audio content rendering. An example apparatus comprises means for receiving a bitstream which comprises audio content; means for receiving dynamic content independent from the bitstream; means for receiving at least one instruction for the dynamic content from at least one of: the received bitstream or the received dynamic content; and means for rendering audio with a renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction. In an embodiment, the means for receiving the at least one instruction comprises means for determining the presence of at least one instruction for the dynamic content in the bitstream. When the bitstream does not comprise the at least one instruction for the received dynamic content, the apparatus comprises means for rendering audio with a renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content. When the bitstream comprises the at least one instruction for the received dynamic content, the apparatus comprises means for rendering the audio with the renderer based upon the audio content of the bitstream, the received dynamic content, and the at least one instruction. In a further embodiment, the apparatus further comprises means for determining the position of audio elements in the audio scene and audio elements in the dynamic content. When the audio elements in the audio scene and the audio elements in the dynamic content are in a same acoustic environment, the apparatus comprises means for rendering audio with the renderer based upon the audio content of the bitstream without adapting the audio based upon the received dynamic content. When the audio elements in the audio scene and the audio elements in the dynamic content are not in the same acoustic environment, the apparatus comprises means for rendering the audio with the renderer based upon both the audio content of the bitstream and the received dynamic content. In an embodiment, the apparatus further comprises means for determining an anchor object in an audio scene; means for determining at least one instruction for dynamic content relative to the anchor object; and means for transmitting the audio scene in a bitstream, where the bitstream comprises the at least one instruction.
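The two branching embodiments in this abstract can be read as simple selection rules for what the renderer receives. The following is a minimal sketch of that reading, not the claimed apparatus: the `Bitstream` class, its field names, and both function names are hypothetical, and strings stand in for real audio and instruction payloads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bitstream:
    """Hypothetical container: audio content plus an optional instruction."""
    audio_content: str
    instruction: Optional[str] = None  # instruction for dynamic content, if any


def inputs_by_instruction(bitstream: Bitstream, dynamic_content: str) -> dict:
    """First embodiment: adapt to dynamic content only if an instruction is present."""
    if bitstream.instruction is None:
        # No instruction in the bitstream: render the bitstream audio unadapted.
        return {"audio": bitstream.audio_content}
    return {
        "audio": bitstream.audio_content,
        "dynamic": dynamic_content,
        "instruction": bitstream.instruction,
    }


def inputs_by_environment(bitstream: Bitstream, dynamic_content: str,
                          same_acoustic_environment: bool) -> dict:
    """Further embodiment: adapt only when the elements are in different environments."""
    if same_acoustic_environment:
        return {"audio": bitstream.audio_content}
    return {"audio": bitstream.audio_content, "dynamic": dynamic_content}


plain = inputs_by_instruction(Bitstream("scene"), "late update")
adapted = inputs_by_instruction(Bitstream("scene", "attach to anchor"), "late update")
```

How the renderer actually uses the instruction and the dynamic content is not specified in the abstract, so the sketch stops at selecting the inputs.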

SYSTEMS AND METHODS OF CALIBRATING EARPHONES
20170332186 · 2017-11-16 ·

Systems and methods to calibrate listening devices are disclosed herein. In some embodiments, a method to calibrate earphones includes determining head-related transfer functions (HRTFs) corresponding to different parts of a user's anatomy (e.g., one or both of a listener's pinnae). The resulting HRTFs are combined to form a composite HRTF. In some embodiments, a first and a second HRTF are respectively determined for a first and a second part of the user's anatomy. A composite HRTF of the user is generated by combining portions of the first and second HRTFs.
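The abstract does not say how the portions of the two HRTFs are chosen. One possible reading, shown here purely as an illustrative assumption, is to split by frequency band: take the first HRTF below a crossover frequency and the second above it. The function name `composite_hrtf`, the hard crossover, and the sample values are all hypothetical.

```python
import numpy as np


def composite_hrtf(hrtf_first, hrtf_second, freqs_hz, crossover_hz):
    """Combine two HRTF spectra sampled on the same frequency grid.

    Assumption (not from the source): bins below crossover_hz come from
    the first HRTF, and bins at or above it come from the second.
    """
    hrtf_first = np.asarray(hrtf_first, dtype=complex)
    hrtf_second = np.asarray(hrtf_second, dtype=complex)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    if not (hrtf_first.shape == hrtf_second.shape == freqs_hz.shape):
        raise ValueError("both HRTFs and the frequency grid must share one shape")
    return np.where(freqs_hz < crossover_hz, hrtf_first, hrtf_second)


freqs = np.array([100.0, 1000.0, 5000.0, 10000.0])
first = np.full(4, 1.0 + 0.0j)    # placeholder spectrum for the first body part
second = np.full(4, 2.0 + 0.0j)   # placeholder spectrum for the second body part
combined = composite_hrtf(first, second, freqs, crossover_hz=3000.0)
```

A practical combination would likely blend the two spectra smoothly around the crossover rather than switch abruptly; the hard switch keeps the sketch short.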

Device and method for adaptation of virtual 3D audio to a real room

The invention relates to the technical fields of binaural audio rendering and, to this end, estimation of room acoustic parameters like reverberation time. In particular, the invention provides a device and method for estimating such acoustic parameters. The device is configured to record an acoustic signal, particularly a speech signal, to estimate a frequency-dependent reverberation time in a lower frequency range based on the recorded acoustic signal, and to extend the frequency-dependent reverberation time to a higher frequency range based on a predetermined model to obtain an extended frequency-dependent reverberation time. Virtual 3D audio can thus be adapted to a real room.

AUDIO REPRODUCTION SYSTEMS AND METHODS

The system and method includes positioning a mobile device with a built-in loudspeaker at a first location in a listening environment and at least one microphone at at least one second location in the listening environment; emitting test audio content from the loudspeaker of the mobile device at the first position in the listening environment; receiving the test audio content emitted by the loudspeaker using the at least one microphone at the at least one second location in the listening environment; and, based at least in part on the received test audio content, determining one or more adjustments to be applied to desired audio content before playback by at least one earphone; wherein the first location and the second location are distant from each other so that the at least one microphone is within the near-field of the loudspeaker.

Matching reverberation in teleconferencing environments

A system and method of matching reverberation in teleconferencing environments. When the two ends of a conversation are in environments with differing reverberations, the method filters the reverberation so that when both signals are output at the near end (e.g., the audio signal from the far end and the sidetone from the near end), the reverberations match. In this manner, the user does not perceive an annoying difference in reverberations, and the user experience is improved.

SIGNAL PROCESSING METHODS AND SYSTEMS FOR RENDERING AUDIO ON VIRTUAL LOUDSPEAKER ARRAYS
20170245082 · 2017-08-24 ·

Techniques of rendering audio involve applying a balanced-realization state space model to each head-related transfer function (HRTF) to reduce the order of an effective FIR or even an infinite impulse response (IIR) filter. Along these lines, each HRTF G(z) is derived from a head-related impulse response filter (HRIR) via, e.g., a z-transform. The data of the HRIR may be used to construct a first state space representation [A, B, C, D] of the HRTF via the relation $G(z) = C(zI - A)^{-1}B + D$. This first state space representation is not unique and so for an FIR filter, A and B may be set to simple, binary-valued arrays, while C and D contain the HRIR data. This representation leads to a simple form of a Gramian Q whose eigenvectors provide system states that maximize the system gain as measured by a Hankel norm. Further, a factorization of Q provides a transformation into a balanced state space in which the Gramian is equal to a diagonal matrix of the eigenvalues of Q. By considering only those states associated with an eigenvalue greater than some threshold, the balanced state space representation of the HRTF may be truncated to provide an approximate HRTF that approximates the original HRTF very well while reducing the amount of computation required by as much as 90%.
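The construction described above can be sketched in NumPy. This is an illustrative reading of the abstract, not the patented implementation: it assumes a delay-line choice for the binary-valued A and B, takes Q to be the observability Gramian, uses an orthogonal eigenvector basis as the factorization of Q, and keeps a fixed number of states `r` in place of an eigenvalue threshold. The test signal is a synthetic decaying sinusoid, not measured HRIR data, and all function names are hypothetical.

```python
import numpy as np


def fir_state_space(h):
    """Delay-line realization [A, B, C, D] of an FIR filter with taps h[0..n]."""
    h = np.asarray(h, dtype=float)
    n = len(h) - 1
    A = np.eye(n, k=-1)          # binary shift matrix: state i feeds state i+1
    B = np.zeros((n, 1))
    B[0, 0] = 1.0                # the input enters the first delay element
    C = h[1:].reshape(1, n)      # delayed taps carry the HRIR data
    D = float(h[0])              # direct feedthrough tap
    return A, B, C, D


def observability_gramian(A, C):
    """Q = sum_k (A^T)^k C^T C A^k; the sum is finite because A is nilpotent."""
    n = A.shape[0]
    Q = np.zeros((n, n))
    row = C.copy()
    for _ in range(n):
        Q += row.T @ row
        row = row @ A
    return Q


def truncate(A, B, C, D, r):
    """Project onto the r eigenvectors of Q with the largest eigenvalues."""
    w, V = np.linalg.eigh(observability_gramian(A, C))
    keep = np.argsort(w)[::-1][:r]
    Vr = V[:, keep]
    return Vr.T @ A @ Vr, Vr.T @ B, C @ Vr, D


def impulse_response(A, B, C, D, length):
    """First `length` samples of the response to a unit impulse."""
    y = np.empty(length)
    y[0] = D
    x = B.copy()
    for k in range(1, length):
        y[k] = (C @ x).item()
        x = A @ x
    return y


k = np.arange(64)
h = 0.8 ** k * np.cos(0.5 * k)            # synthetic stand-in for an HRIR
full = fir_state_space(h)                 # 63 states
reduced = truncate(*full, r=4)            # 4 states
err = np.max(np.abs(impulse_response(*reduced, 64) - h))
```

For this delay-line realization the controllability Gramian is the identity, so rotating the states onto the eigenvectors of Q differs from a fully balanced realization only by a diagonal scaling of the states, and truncating either form yields the same reduced transfer function. The synthetic response here is nearly low-rank by construction, so a handful of states suffices; how many states measured HRIRs need is not something this sketch establishes.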