RENDERING BINAURAL AUDIO OVER MULTIPLE NEAR FIELD TRANSDUCERS
20230074817 · 2023-03-09
Inventors
- Mark F. Davis (Pacifica, CA)
- Nicolas R. Tsingos (San Francisco, CA)
- C. Phillip Brown (Castro Valley, CA)
CPC classification
- H04S2420/01
- H04S2420/03
- H04S2400/11
- H04S7/30
- H04S2420/11
- H04R2205/022
Abstract
An apparatus and method of rendering audio. A binaural signal is split on an amplitude weighting basis into a front binaural signal and a rear binaural signal, based on perceived position information of the audio. In this manner, the front-back differentiation of the binaural signal is improved.
Claims
1. (canceled)
2. A method of rendering a spatial audio signal, the method comprising: receiving the spatial audio signal, wherein the spatial audio signal includes position information for rendering; determining a plurality of weights based on the position information; rendering the spatial audio signal to form a plurality of rendered signals, wherein the plurality of rendered signals are amplitude weighted according to the plurality of weights; combining the plurality of rendered signals into a joint rendered signal; generating metadata that relates the joint rendered signal to the plurality of rendered signals; and providing the joint rendered signal and the metadata to a loudspeaker system.
3. The method of claim 2, wherein rendering the spatial audio signal to form the plurality of rendered signals comprises: rendering the spatial audio signal to generate an interim rendered signal; and weighting the interim signal according to the plurality of weights to generate the plurality of rendered signals.
4. The method of claim 2, wherein the plurality of weights correspond to a front-back perspective applied to the position information.
5. The method of claim 2, wherein rendering the spatial audio signal to form the plurality of rendered signals corresponds to splitting the spatial audio signal, on an amplitude weighting basis, according to the plurality of weights.
6. The method of claim 2, wherein the spatial audio signal includes a plurality of audio objects, wherein each of the plurality of audio objects is associated with a respective position of the position information; wherein processing the spatial audio signal includes processing the plurality of audio objects to extract the position information; and wherein the plurality of weights correspond to the respective position of each of the plurality of audio objects.
7. The method of claim 2, wherein each of the plurality of rendered signals is a binaural signal that includes a left channel and a right channel.
8. The method of claim 2, wherein the plurality of rendered signals includes a front signal and a rear signal, wherein the front signal includes a left front channel and a right front channel, and wherein the rear signal includes a left rear channel and a right rear channel.
9. The method of claim 2, wherein the plurality of rendered signals includes a front signal, a rear signal, and another signal, wherein the front signal includes a left front channel and a right front channel, wherein the rear signal includes a left rear channel and a right rear channel, and wherein the another signal is an unpaired channel.
10. The method of claim 2, further comprising: generating, by the loudspeaker system, the plurality of rendered signals from the joint rendered signal using the metadata; and outputting, from a plurality of loudspeakers, the plurality of rendered signals.
11. The method of claim 2, further comprising: generating headtracking data; computing, based on the headtracking data, a front delay, a first front set of filter parameters, a second front set of filter parameters, a rear delay, a first rear set of filter parameters, and a second rear set of filter parameters; for a front binaural signal that includes a first channel signal and a second channel signal: generating a first modified channel signal by applying the front delay and the first front set of filter parameters to the first channel signal; generating a second modified channel signal by applying the second front set of filter parameters to the second channel signal; for a rear binaural signal that includes a third channel signal and a fourth channel signal: generating a third modified channel signal by applying the second rear set of filter parameters to the third channel signal; generating a fourth modified channel signal by applying the rear delay and the first rear set of filter parameters to the fourth channel signal; outputting, from a first front loudspeaker, the first modified channel signal; outputting, from a second front loudspeaker, the second modified channel signal; outputting, from a first rear loudspeaker, the third modified channel signal; and outputting, from a second rear loudspeaker, the fourth modified channel signal.
12. A non-transitory computer readable medium storing a computer program that, when executed by a processor, controls an apparatus to execute processing including the method of claim 2.
13. An apparatus for rendering a spatial audio signal, the apparatus comprising: a receiver configured to receive the spatial audio signal, wherein the spatial audio signal includes position information for rendering audio; a first processor configured to process the spatial audio signal to determine a plurality of weights based on the position information, a renderer configured to render the spatial audio signal to form a plurality of rendered signals, wherein the plurality of rendered signals are amplitude weighted according to the plurality of weights; and a second processor configured to combine the plurality of rendered signals into a joint rendered signal and determine metadata that relates the joint rendered signal to the plurality of rendered signals, wherein the second processor is configured to provide the joint rendered signal and the metadata to a loudspeaker system.
Description
DETAILED DESCRIPTION
[0050] Described herein are techniques for binaural audio processing. In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of the present invention. It will be evident, however, to one skilled in the art that the present invention as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.
[0051] In the following description, various methods, processes and procedures are detailed. Although particular steps may be described in a certain order, such order is mainly for convenience and clarity. A particular step may be repeated more than once, may occur before or after other steps (even if those steps are otherwise described in another order), and may occur in parallel with other steps. A second step is required to follow a first step only when the first step must be completed before the second step is begun. Such a situation will be specifically pointed out when not clear from the context.
[0052] In this document, the terms “and”, “or” and “and/or” are used. Such terms are to be read as having an inclusive meaning. For example, “A and B” may mean at least the following: “both A and B”, “at least both A and B”. As another example, “A or B” may mean at least the following: “at least A”, “at least B”, “both A and B”, “at least both A and B”. As another example, “A and/or B” may mean at least the following: “A and B”, “A or B”. When an exclusive-or is intended, such will be specifically noted (e.g., “either A or B”, “at most one of A and B”).
[0054] In general, the spatial audio signal 110 includes position information, and the rendering system 102 uses the position information when generating the rendered signals 120 so that a listener perceives the audio as originating from the various positions indicated by the position information. The spatial audio signal 110 may include audio objects, such as in the Dolby Atmos™ system or the DTS:X™ system. The spatial audio signal 110 may include B-format signals (e.g., using four component channels: W for the sound pressure, X for the front-minus-back sound pressure gradient, Y for left-minus-right, and Z for up-minus-down), such as in the Ambisonics™ system. The spatial audio signal 110 may be a surround sound signal, such as a 5.1-channel or 7.1-channel signal. For channel signals (such as 5.1-channel), each channel may be assigned to a defined position and may be referred to as a bed channel. For example, the left bed channel may be provided to the left loudspeaker, and so on.
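By way of illustration, a minimal sketch of encoding a mono source into the four B-format components described above is given below. It assumes the FuMa-style 1/sqrt(2) scaling of the W channel (other normalization conventions exist), and the function name is purely illustrative.

```python
import numpy as np

def encode_bformat(signal, azimuth_rad, elevation_rad):
    """Encode a mono signal at a given direction into first-order B-format
    components (W, X, Y, Z): X front-back, Y left-right, Z up-down,
    W the omnidirectional pressure component."""
    w = signal / np.sqrt(2.0)                                 # FuMa-style W scaling (assumed)
    x = signal * np.cos(azimuth_rad) * np.cos(elevation_rad)  # front-minus-back gradient
    y = signal * np.sin(azimuth_rad) * np.cos(elevation_rad)  # left-minus-right gradient
    z = signal * np.sin(elevation_rad)                        # up-minus-down gradient
    return w, x, y, z
```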
[0055] According to an embodiment, the rendering system 102 generates the rendered signals 120 corresponding to front and rear binaural signals, each with left and right channels; and the loudspeaker system 104 includes four speakers that respectively output a left front channel, a right front channel, a left rear channel, and a right rear channel. Further details of the rendering system 102 and the loudspeaker system 104 are provided below.
[0057] For example, an embodiment of the rendering system 200 includes two renderers 204 (e.g., a front renderer and a rear renderer) that respectively render a front binaural signal and a rear binaural signal (collectively forming the rendered signals 120). When the position information of a particular object indicates the sound is exclusively in the front, the weights 210 may be 1.0 provided to the front renderer, and 0.0 provided to the rear renderer, for that particular object. When the position information indicates the sound is exclusively in the rear, the weights 210 may be 0.0 provided to the front renderer, and 1.0 provided to the rear renderer, for that particular object. When the position information indicates the sound is exactly between the front and the rear, the weights 210 may be 0.5 provided to the front renderer, and 0.5 provided to the rear renderer, for that particular object. When the position information is otherwise between the front and the rear, the weights 210 may be similarly apportioned between the front renderer and the rear renderer, for that particular object. The weights 210 may be apportioned in an energy preserving manner; for example, when the position information indicates the sound is exactly between the front and the rear, the weights 210 may be 1/sqrt(2) provided to the front renderer, and 1/sqrt(2) provided to the rear renderer, for that particular object.
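By way of illustration, the weight apportionment described above may be sketched as follows, assuming a hypothetical normalized front-back coordinate y (0.0 = fully front, 1.0 = fully rear); the function name and coordinate convention are illustrative, not prescribed by the description.

```python
import numpy as np

def front_rear_weights(y, energy_preserving=False):
    """Compute (front, rear) amplitude weights for one audio object.

    y is a hypothetical normalized front-back coordinate:
    0.0 = fully front, 1.0 = fully rear.
    """
    y = float(np.clip(y, 0.0, 1.0))
    if energy_preserving:
        # Constant-power apportionment: w_front**2 + w_rear**2 == 1, so at
        # y == 0.5 both weights are 1/sqrt(2), as in the example above.
        w_front = np.cos(y * np.pi / 2)
        w_rear = np.sin(y * np.pi / 2)
    else:
        # Simple amplitude apportionment: the weights sum to 1.0, so at
        # y == 0.5 both weights are 0.5.
        w_front = 1.0 - y
        w_rear = y
    return w_front, w_rear

# Example: an object exactly between the front and the rear.
print(front_rear_weights(0.5))                          # (0.5, 0.5)
print(front_rear_weights(0.5, energy_preserving=True))  # (~0.707, ~0.707)
```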
[0059] For example, an embodiment of the rendering system 250 includes two weight modules 256 (e.g., a front weight module and a rear weight module) that respectively generate a front binaural signal and a rear binaural signal (collectively forming the rendered signals 120), in a manner similar to that described above regarding the weight calculator 202.
[0060] An example of calculating the weights 210 for a first audio object, based on its front-back position, is as follows.
[0061] Continuing the example, further assume four loudspeakers arranged on the front left, the front right, the rear left, and the rear right. The renderer 254 generates a left interim rendered signal and a right interim rendered signal for the signal of the first audio object, and the weight modules 256 apply a front weight W1 and a rear weight W2 to those interim rendered signals to generate the rendered signals provided to the four loudspeakers.
[0062] Continuing the example for a second audio object, the renderer 254 generates a left interim rendered signal and a right interim rendered signal for the signal of the second audio object. The weight modules 256 apply the front weight W1 and the rear weight W2 as described above, to generate the rendered signals for the loudspeakers that now include the weighted audio of both audio objects.
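A minimal sketch of this per-object flow is given below. It reuses the front_rear_weights sketch above and assumes a hypothetical binauralize function (an HRTF renderer returning left and right interim signals of equal length); the feed names and the use of position[1] as the front-back coordinate are illustrative assumptions.

```python
import numpy as np

def render_objects(objects, binauralize, n_samples):
    """Accumulate weighted binaural renders of several audio objects into
    four loudspeaker feeds (front left/right, rear left/right).

    `objects` is a list of (signal, position) pairs; `binauralize` is a
    hypothetical HRTF renderer returning (left, right) arrays of length
    n_samples.  Each object is binauralized only once; the front/rear
    split is an amplitude weighting of the interim binaural signals.
    """
    feeds = {name: np.zeros(n_samples)
             for name in ("front_L", "front_R", "rear_L", "rear_R")}
    for signal, position in objects:
        left, right = binauralize(signal, position)         # interim rendered signals
        w_front, w_rear = front_rear_weights(position[1])   # position[1] assumed to be
                                                             # the front-back coordinate
        feeds["front_L"] += w_front * left
        feeds["front_R"] += w_front * right
        feeds["rear_L"] += w_rear * left
        feeds["rear_R"] += w_rear * right
    return feeds
```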
[0063] For B-format signals (e.g., first order Ambisonics™ or higher order Ambisonics™), the rendering system (e.g., the rendering system 250) may obtain the front and rear signals by beamforming, for example by applying a front-facing lobe and a rear-facing lobe to the soundfield basis signals.
[0064] For multiple pairs of speakers, a similar approach may be used, in which cosine lobes pointing towards the direction of each near-field speaker may be used to obtain different input signals or weights suitable for each binaural pair. Generally, higher order lobes would be used as the number of speaker pairs increases, in a way similar to how a higher order Ambisonics™ stream may be decoded on a traditional loudspeaker system.
[0065] For example, consider four loudspeakers arranged on the front left, the front right, the rear left, and the rear right. Further consider that the spatial audio signal 110 is a B-format signal having M basis signals (e.g., 4 basis signals w, x, y, z). The renderer 254 binauralizes each of the M basis signals to generate M interim rendered signals, and the weight modules 256 apply the beamforming weights (e.g., derived from lobes directed towards each loudspeaker pair) to combine the interim rendered signals into the rendered signals for the four loudspeakers.
[0066] In summary, for both the audio object case and the B-format case, the rendering of the input signal to binaural need only happen once per object (or soundfield basis signal); the matrixing/beamforming to generate the loudspeaker outputs is an additional linear combination operation.
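By way of illustration, a first-order lobe of this kind may be sketched as follows, assuming the B-format conventions given above (x front-back, y left-right) and a cardioid-like pattern parameter; the function name, the pattern value, and the horizontal-only simplification are illustrative assumptions.

```python
import numpy as np

def pair_input_from_bformat(w, x, y, z, azimuth_deg, pattern=0.5):
    """Form the input signal for one near-field loudspeaker pair by pointing
    a first-order (cosine) lobe at that pair.

    `pattern` = 0.5 gives a cardioid-like lobe, 1.0 a figure-of-eight; this
    horizontal-only sketch ignores the z (height) component.  The resulting
    signal would then be binauralized once for its pair.
    """
    az = np.deg2rad(azimuth_deg)
    dx, dy = np.cos(az), np.sin(az)
    # A linear combination (matrixing) of the soundfield basis signals.
    return (1.0 - pattern) * w + pattern * (dx * x + dy * y)

# Example for the front/rear pairs of the four-loudspeaker arrangement:
# front_in = pair_input_from_bformat(w, x, y, z, 0.0)    # lobe towards the front
# rear_in  = pair_input_from_bformat(w, x, y, z, 180.0)  # lobe towards the rear
```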
[0068] At 302, a spatial audio signal is received. The spatial audio signal includes position information for rendering audio. For example, the rendering system 200 may receive the spatial audio signal 110.
[0069] At 304, the spatial audio signal is processed to determine a number of weights based on the position information. For example, the weight calculator 202 may process the spatial audio signal 110 to determine the weights 210 based on the position information.
[0070] At 306, the spatial audio signal is rendered to form a number of rendered signals. The rendered signals are amplitude weighted according to the weights. The rendered signals may include a number of binaural signals that are amplitude weighted according to the weights. As discussed above, these weights may be explicitly based on the x, y, z position of each object, in which case the system binauralizes each object and then sends it to the different pairs of speakers with the appropriate weights. Alternatively, these weights may be implicit in the beamforming pattern, in which case several input signals are obtained that can be individually binauralized and sent to their appropriate speaker pairs.
[0071] For example, the renderers 204 may render the spatial audio signal 110, amplitude weighted according to the weights 210, to form the rendered signals 120 (e.g., a front binaural signal and a rear binaural signal).
[0072] As another example, the renderer 254 may render the spatial audio signal 110 to generate an interim rendered signal, and the weight modules 256 may weight the interim rendered signal according to the weights to generate the rendered signals 120.
[0073] At 308, a number of loudspeakers output the rendered signals. For example, the loudspeakers of the loudspeaker system 104 may output the rendered signals 120 as the auditory signals 130.
[0075] The processor 402 generally controls the operation of the rendering system 400. The processor 402 may execute one or more computer programs in order to implement the functions of the rendering system 200 or the rendering system 250.
[0076] The memory 404 generally stores the data operated on by the processor 402, such as digital representations of the signals processed by the rendering system 400 (e.g., the spatial audio signal 110 and the rendered signals 120). The memory 404 may also store any computer programs executed by the processor 402, and may include volatile or non-volatile components.
[0077] The input/output interfaces 406 and 408 generally interface the rendering system 400 with other components. The input/output interface 406 interfaces the rendering system 400 with the provider of the spatial audio signal 110. If the spatial audio signal 110 is stored locally, the input/output interface 406 may communicate with that local component. If the spatial audio signal 110 is received from a remote component, the input/output interface 406 may communicate with that remote component via a wired or wireless connection.
[0078] The input/output interface 408 interfaces the rendering system 400 with the loudspeaker system 104, for example via a wired or wireless connection, in order to provide the rendered signals 120.
[0080] The processor 502 generally controls the operation of the loudspeaker system 500, for example by executing one or more computer programs. The processor 502 may include, or be a component of, a programmable logic device or digital signal processor.
[0081] The memory 504 generally stores the data operated on by the processor 502, such as digital representations of the rendered signals 120. The memory 504 may also store any computer programs executed by the processor 502. The memory 504 may include volatile or non-volatile components.
[0082] The input/output interface 506 interfaces the loudspeaker system 500 with the rendering system 102, for example via a wired or wireless connection, in order to receive the rendered signals 120.
[0083] The input/output interface 508 interfaces the loudspeakers 510 with the other components of the loudspeaker system 500.
[0084] The loudspeakers 510 generally output the auditory signals 130 (4 shown, 130a, 130b, 130c and 130d) that correspond to the rendered signals 120. According to an embodiment, the rendered signals 120 include a front binaural signal and a rear binaural signal; the loudspeaker 510a outputs a left channel of the front binaural signal, the loudspeaker 510b outputs a right channel of the front binaural signal, the loudspeaker 510c outputs a left channel of the rear binaural signal, and the loudspeaker 510d outputs a right channel of the rear binaural signal.
[0085] Since the rendered signals 120 have been weighted based on a front-back perspective applied to the position information in the spatial audio signal 110 (as discussed above regarding the rendering system 102), the loudspeakers 510a-510b output the left and right channels of the weighted front binaural signal, and the loudspeakers 510c-510d output the left and right channels of the weighted rear binaural signal. In this manner, the audio processing system 100 improves the front-back differentiation of the binaural audio perceived by the listener.
[0088] The configurations of the loudspeakers in the loudspeaker system 600 may be varied as desired. For example, the angular separation of the loudspeakers may be adjusted to be greater than, or less than, 90 degrees. As another example, the angle of the front loudspeakers may be other than 45 and 315 degrees (e.g., 30 and 330 degrees). As a further example, the angle of the rear loudspeakers may be varied to be other than 135 and 225 degrees (e.g., 145 and 235 degrees).
[0089] The elevations of the loudspeakers in the loudspeaker system 600 may also be varied. For example, the loudspeakers may be increased, or decreased, in elevation relative to the elevations described above.
[0090] The quantities of the loudspeakers in the loudspeaker system 600 may also be varied. For example, a center loudspeaker may be added between the front loudspeakers 510a and 510b. Since this center loudspeaker outputs an unpaired channel, its corresponding renderer 204 may generate a single-channel (non-binaural) rendered signal for that loudspeaker.
[0091] Another option for varying the number of loudspeakers is discussed below.
[0094] The configurations, positions, angles, quantities, and elevations of the loudspeakers 710 may be varied as desired, similar to the options discussed regarding the loudspeaker system 600.
Visual Display Options
[0095] Embodiments may include a visual display to provide visual VR or AR aspects. For example, the loudspeaker system 600 may include, or be used together with, a visual display device.
[0096] As with the other options described above, the configurations, positions, angles, quantities, and elevations of the loudspeakers may be varied as desired.
Metadata and Binaural Coding Options
[0097] As an alternative to sending separate rendered signals from the rendering system to the loudspeaker system (e.g., as in the embodiments described above), the rendering system 802 may combine the rendered signals into a joint rendered signal, generate metadata 822 that relates the joint rendered signal to the individual rendered signals, and provide the joint rendered signal and the metadata 822 to the loudspeaker system.
[0099] This process of combining may also be referred to as downmixing or forming a joint signal. According to an embodiment, the metadata 822 includes front-back amplitude ratios of the left and right channels in various frequency bands (e.g., on a quadrature mirror filter (QMF) subband basis).
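A minimal sketch of forming such a joint signal with per-band ratio metadata, and of recovering the front and rear channels from it, is given below. An ordinary STFT (from scipy.signal) stands in for the QMF bank for brevity, and the function names and parameters are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def combine_with_metadata(front, rear, fs=48000, nperseg=1024):
    """Combine one channel pair (e.g., left front + left rear) into a joint
    channel plus per-band, per-frame front/back amplitude ratios.

    A plain STFT stands in for the QMF filter bank described above."""
    joint = front + rear
    _, _, F = stft(front, fs, nperseg=nperseg)
    _, _, R = stft(rear, fs, nperseg=nperseg)
    ratio = np.abs(F) / (np.abs(F) + np.abs(R) + 1e-12)   # the metadata
    return joint, ratio

def split_with_metadata(joint, ratio, fs=48000, nperseg=1024):
    """Approximately recover the front and rear channels from the joint
    channel using the transmitted per-band ratios."""
    _, _, J = stft(joint, fs, nperseg=nperseg)
    _, front = istft(J * ratio, fs, nperseg=nperseg)
    _, rear = istft(J * (1.0 - ratio), fs, nperseg=nperseg)
    return front, rear
```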
[0100] The rendering system 802 may be implemented by components similar to those described above regarding the rendering system 400.
[0103] The loudspeaker system 904 may be implemented by components similar to those described above regarding the loudspeaker system 500.
Headtracking Options
[0104] As mentioned above, the audio processing system 100 may also perform headtracking, so that the rendered signals are adjusted according to the orientation of the listener's head.
[0106] The sensor 1050 detects the orientation of the loudspeaker system 1004 and generates headtracking data 1060 that corresponds to the detected orientation. The sensor 1050 may be an accelerometer, a gyroscope, a magnetometer, an infrared sensor, a camera, a radio frequency link, or any other type of sensor that allows for headtracking. The sensor 1050 may be a multiaxis sensor. The sensor 1050 may be one of a number of sensors that generate the headtracking data 1060 (e.g., one sensor generates azimuthal data, another sensor generates elevational data, etc.).
[0107] The front headtracking system 1052 modifies the front binaural signal 120a according to the headtracking data 1060 to generate a modified front binaural signal 120a'. In general, the modified front binaural signal 120a' corresponds to the front binaural signal 120a, but modified so that the listener perceives the front binaural signal 120a according to the changed orientation of the loudspeaker system 1004.
[0108] The rear headtracking system 1054 modifies the rear binaural signal 120b according to the headtracking data 1060 to generate a modified rear binaural signal 120b'. In general, the modified rear binaural signal 120b' corresponds to the rear binaural signal 120b, but modified so that the listener perceives the rear binaural signal 120b according to the changed orientation of the loudspeaker system 1004.
[0109] Further details of the front and rear headtracking systems 1052 and 1054 are provided below.
[0110] The left front loudspeaker 1010a outputs a left channel of the modified front binaural signal 120a' as the left front auditory output 130a. The right front loudspeaker 1010b outputs a right channel of the modified front binaural signal 120a' as the right front auditory output 130b. The left rear loudspeaker 1010c outputs a left channel of the modified rear binaural signal 120b' as the left rear auditory output 130c. The right rear loudspeaker 1010d outputs a right channel of the modified rear binaural signal 120b' as the right rear auditory output 130d.
[0111] As with the other embodiments described above, the configurations, positions, angles, quantities, and elevations of the loudspeakers in the loudspeaker system 1004 may be varied as desired.
[0113] The calculation block 1102 generates a delay and filter parameters based on the headtracking data 1060, provides the delay to the delay blocks 1104 and 1106, and provides the filter parameters to the filter blocks 1108 and 1110. The filter coefficients may be calculated according to the Brown-Duda model (see C. P. Brown and R. O. Duda, “An efficient HRTF model for 3-D sound”, in WASPAA ‘97 (1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk Mountain House, New Paltz, NY, Oct. 1997)), and the delay values may be calculated according to the Woodworth approximation (see R. S. Woodworth and G. Schlosberg, Experimental Psychology, pp. 349-361 (Holt, Rinehart and Winston, NY, 1962)), or any corresponding system of inter-aural level and time difference.
[0114] The delay block 1104 applies the appropriate delay to the input left signal L 1122, and the delay block 1106 applies the appropriate delay to the input right signal R 1124. For example, a leftward turn provides a delay D1 to the delay block 1104, and zero delay to the delay block 1106. Similarly, a rightward turn provides zero delay to the delay block 1104, and a delay D2 to the delay block 1106.
[0115] The filter block 1108 applies the appropriate filtering to the delayed signal from the delay block 1104, and the filter block 1110 applies the appropriate filtering to the delayed signal from the delay block 1106. The appropriate filtering will be either ipsilateral filtering (for the “near” ear) or contralateral filtering (for the “far” ear), depending upon the headtracking data 1060. For example, for a leftward turn, the filter block 1108 applies a contralateral filter, and the filter block 1110 applies an ipsilateral filter. Similarly, for a rightward turn, the filter block 1108 applies an ipsilateral filter, and the filter block 1110 applies a contralateral filter.
[0116] The rear headtracking system 1054 may be implemented similarly to the front headtracking system 1052. Differences include operating on the rear binaural signal 120b (instead of on the front binaural signal 120a), and inverting the headtracking data 1060 from that used by the front headtracking system 1052. For example, when the headtracking data 1060 indicates a leftward turn of 30 degrees (+30 degrees), the front headtracking system 1052 uses (+30 degrees) for its processing, and the rear headtracking system 1054 inverts the headtracking data 1060 as (-30 degrees) for its processing. Another difference is that the delay and the filter coefficients for the rear are slightly different from those for the front. In any event, the front headtracking system 1052 and the rear headtracking system 1054 may share the calculation block 1102.
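By way of illustration, the delay portion of this processing may be sketched as follows, using the Woodworth approximation cited above. The head radius, speed of sound, sampling rate, and function names are illustrative assumptions, and the ipsilateral/contralateral shadowing filters (e.g., the Brown-Duda model) are omitted from the sketch.

```python
import numpy as np

HEAD_RADIUS_M = 0.0875    # assumed average head radius
SPEED_OF_SOUND = 343.0    # m/s, assumed

def woodworth_delay_samples(azimuth_rad, fs=48000):
    """Inter-aural time difference, in samples, for a head turn of
    `azimuth_rad`, using the Woodworth approximation cited above."""
    theta = abs(azimuth_rad)
    itd_seconds = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + np.sin(theta))
    return int(round(itd_seconds * fs))

def apply_headtracking_delay(left, right, azimuth_rad, fs=48000):
    """Delay the far-ear channel of one binaural pair for a head turn.

    Positive azimuth = leftward turn (the left channel is delayed, as in
    the description above); negative = rightward turn.  For the rear pair,
    call this with the azimuth inverted.  The ipsilateral / contralateral
    filtering is omitted from this sketch.
    """
    d = woodworth_delay_samples(azimuth_rad, fs)
    if azimuth_rad > 0:       # leftward turn: delay the left channel
        left = np.concatenate([np.zeros(d), left])[:len(left)]
    elif azimuth_rad < 0:     # rightward turn: delay the right channel
        right = np.concatenate([np.zeros(d), right])[:len(right)]
    return left, right

# Rear pair for a 30-degree leftward head turn: the angle is inverted.
# rear_L, rear_R = apply_headtracking_delay(rear_L, rear_R, -np.deg2rad(30.0))
```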
[0117] The details of the headtracking operations may otherwise be similar to those described in International Application Pub. No. WO 2017223110 A1.
Implementation Details
[0118] An embodiment may be implemented in hardware, executable modules stored on a computer readable medium, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, the steps executed by embodiments need not inherently be related to any particular computer or other apparatus, although they may be in certain embodiments. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, embodiments may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
[0119] Each such computer program is preferably stored on or downloaded to a storage media or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein. (Software per se and intangible or transitory signals are excluded to the extent that they are unpatentable subject matter.)
[0120] The above description illustrates various embodiments of the present invention along with examples of how aspects of the present invention may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the present invention as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents will be evident to those skilled in the art and may be employed without departing from the spirit and scope of the invention as defined by the claims.