System and method for rendering real-time spatial audio in virtual environment
11632647 · 2023-04-18
Assignee
Inventors
CPC classification
H04S2420/01 (ELECTRICITY)
H04S2400/15 (ELECTRICITY)
H04S2400/11 (ELECTRICITY)
International classification
Abstract
A new real-time spatial audio rendering system includes a real-time spatial audio rendering computer software application adapted to run on a communication device. The application renders stereo audio from mono audio sources in a listener's virtual room. The listener can be mobile. The stereo audio is rendered for each listener within the room. The real-time spatial audio rendering system has two different modes: with and without reverberation. Reverberation can provide a sense of the room's dimensions. First, the anechoic processing module produces the anechoic stereo audio that provides the sense of direction and distance of the spatial audio. When reverberation is desired, the reverberation processing module is also executed so that the spatial audio provides a sense of the room's dimensions.
Claims
1. A computer-implemented method for rendering real-time spatial audio from mono audio sources in a virtual environment, said method performed by a real-time spatial audio rendering computer software application within a real-time spatial audio rendering system and comprising: 1) determining whether reverberation is configured for rendering spatial audio from a set of mono audio sources; 2) determining a set of dynamic locations of said set of mono audio sources relative to a listener's location in a virtual environment respectively; 3) obtaining a set of discrete Head-Related Impulse Responses (HRIRs); 4) converting said set of discrete HRIRs into continuous HRIRs; 5) determining interaural time differences of each mono audio source within said set of mono audio sources based on said set of dynamic locations; 6) modifying said continuous HRIRs with said interaural time differences to generate modified HRIRs; 7) applying gain control on audio signals of each mono audio source within said set of mono audio sources to generate modified audio signals; 8) convoluting said modified audio signals by said modified HRIRs to generate spatial audio signals of each mono audio source within said set of mono audio sources; and 9) combining said spatial audio signals of all mono audio sources within said set of mono audio sources to generate anechoic audio, said anechoic audio adapted to be played back by a communication device.
2. The method of claim 1, wherein said spatial audio is stereo audio.
3. The method of claim 1 further comprising compressing said anechoic audio's level to a target range for playback by said communication device, wherein said spatial audio is stereo audio.
4. The method of claim 1, when reverberation is configured, further comprising: 1) generating Binaural Room Impulse Responses (BRIRs) based on a set of dimensions of a room of said listener and positions of said listener and said set of mono audio sources; 2) convoluting said audio signals of each mono audio source within said set of mono audio sources with said BRIRs to generate reverberation stereo audio of each mono audio source within said set of mono audio sources; 3) combining said reverberation stereo audio of all mono audio sources within said set of mono audio sources to generate combined reverberation audio; and 4) mixing said anechoic audio with said combined reverberation audio for both a left channel and a right channel to generate final spatial audio for playback on said communication device.
5. The method of claim 4, wherein said spatial audio is stereo audio.
6. The method of claim 4 further comprising compressing said final spatial audio's level to a target range.
7. The method of claim 6, wherein said spatial audio is stereo audio.
8. A real-time spatial audio rendering system having a real-time spatial audio rendering computer software application adapted to run on a communication device, said real-time spatial audio rendering computer software application adapted to: 1) determine whether reverberation is configured for rendering spatial audio from a set of mono audio sources; 2) determine a set of dynamic locations of said set of mono audio sources relative to a listener's location in a virtual environment respectively; 3) obtain a set of discrete Head-Related Impulse Responses (HRIRs); 4) convert said set of discrete HRIRs into continuous HRIRs; 5) determine interaural time differences of each mono audio source within said set of mono audio sources based on said set of dynamic locations; 6) modify said continuous HRIRs with said interaural time differences to generate modified HRIRs; 7) apply gain control on audio signals of each mono audio source within said set of mono audio sources to generate modified audio signals; 8) convolute said modified audio signals by said modified HRIRs to generate spatial audio signals of each mono audio source within said set of mono audio sources; and 9) combine said spatial audio signals of all mono audio sources within said set of mono audio sources to generate anechoic audio, said anechoic audio adapted to be played back by said communication device.
9. The real-time spatial audio rendering system of claim 8, wherein said spatial audio is stereo audio.
10. The real-time spatial audio rendering system of claim 8, wherein said real-time spatial audio rendering computer software application is further adapted to compress said anechoic audio's level to a target range for playback by said communication device.
11. The real-time spatial audio rendering system of claim 10, wherein said spatial audio is stereo audio.
12. The real-time spatial audio rendering system of claim 8, wherein, when reverberation is configured, said real-time spatial audio rendering computer software application is further adapted to: 1) generate Binaural Room Impulse Responses (BRIRs) based on a set of dimensions of a room of said listener and positions of said listener and said set of mono audio sources; 2) convolute said audio signals of each mono audio source within said set of mono audio sources with said BRIRs to generate reverberation stereo audio of each mono audio source within said set of mono audio sources; 3) combine said reverberation stereo audio of all mono audio sources within said set of mono audio sources to generate combined reverberation audio; and 4) mix said anechoic audio with said combined reverberation audio for both a left channel and a right channel to generate final spatial audio for playback on said communication device.
13. The real-time spatial audio rendering system of claim 12, wherein said spatial audio is stereo audio.
14. The real-time spatial audio rendering system of claim 12, wherein said real-time spatial audio rendering computer software application is further adapted to compress said final spatial audio's level to a target range.
15. The real-time spatial audio rendering system of claim 14, wherein said spatial audio is stereo audio.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
(2) Although the characteristic features of this disclosure will be particularly pointed out in the claims, the invention itself, and the manner in which it may be made and used, may be better understood by referring to the following description taken in connection with the accompanying drawings forming a part hereof, wherein like reference numerals refer to like parts throughout the several views and in which:
(11) A person of ordinary skill in the art will appreciate that elements of the figures above are illustrated for simplicity and clarity, and are not necessarily drawn to scale. The dimensions of some elements in the figures may have been exaggerated relative to other elements to aid understanding of the present teachings. Furthermore, a particular order in which certain elements, parts, components, modules, steps, actions, events and/or processes are described or illustrated may not actually be required. A person of ordinary skill in the art will appreciate that, for the purpose of simplicity and clarity of illustration, some commonly known and well-understood elements that are useful and/or necessary in a commercially feasible embodiment may not be depicted in order to provide a clear view of various embodiments in accordance with the present teachings.
DETAILED DESCRIPTION
(12) The new real-time (RT) spatial audio rendering system provides stereo audio output with or without reverberation. Reverberation provides a sense of the virtual room's dimensions. Reverberation is not always required: too much reverberation may reduce intelligibility and is not suitable for certain situations, such as a virtual meeting over the Internet with multiple participants. The RT spatial audio rendering system, in one implementation, includes a computer software application (also referred to herein as the real-time spatial audio rendering computer software application) running on a communication device operated by the listener, or on a server computer, for providing stereo audio to a listener from the mono audio signals of one or more audio sources. When the server computer performs the spatial audio rendering, the computer software application obtains the input data from the listener's communication device over an Internet connection, generates the stereo audio, and forwards the stereo audio data to the listener's communication device over the Internet for playback by that device. The spatial audio rendering software application includes one or more computer programs written in programming languages such as C, C++, C#, Java, etc.
(13) The process by which the RT spatial audio rendering software application provides spatial audio (such as stereo audio) is further shown and generally indicated at 100 in
(14) The communication device and a server computer are further illustrated by reference to
(15) The communication device 202 (such as a laptop computer, a tablet computer, a smartphone, etc.), is further illustrated in
(16) The server computer 206 is further illustrated in
(17) Referring to
(18) Referring to
(19) Turning back to
(20) In real time, the distance between the listener and an audio source can vary when the listener is mobile. As a result, the distances between the audio source and the listener's two ears also vary. The resulting latency difference is very important to the listener's sense of space. Accordingly, at 508, the spatial audio rendering software application determines the interaural time difference (ITD) of each mono audio source within the set of audio sources by calculating the distance of the audio source to each of the listener's two ears and dividing the distances by the speed of sound. The ITD calculation is further shown as follows:
ITD = (a/c) * (θ_I + sin θ_I)
(21) where a stands for the listener's head circumference, c stands for the speed of sound, and θ_I is the interaural azimuth in radians. θ_I ranges from 0 to π/2 for audio sources on the listener's left side, and from π/2 to π for audio sources on the listener's right side.
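The ITD formula above can be sketched as a short function. This is an illustrative sketch, not the patent's implementation; the function name and the numeric parameter values below are assumptions chosen for demonstration.

```python
import math

def itd_seconds(a: float, theta_i: float, c: float = 343.0) -> float:
    """Interaural time difference in seconds: ITD = (a/c) * (theta_i + sin(theta_i)).

    a: listener head parameter in meters (value below is illustrative)
    theta_i: interaural azimuth in radians
    c: speed of sound in m/s
    """
    return (a / c) * (theta_i + math.sin(theta_i))

# Source directly ahead (theta_i = 0) yields zero ITD;
# a source off to one side yields a positive delay.
itd_front = itd_seconds(a=0.0875, theta_i=0.0)
itd_side = itd_seconds(a=0.0875, theta_i=math.pi / 2)
```

A frontal source produces no interaural delay, while a lateral source produces the maximum delay for the given head parameter, consistent with the azimuth ranges described above.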
(22) At 510, the spatial audio rendering software application modifies the continuous HRIRs using the interaural time differences to generate modified HRIRs. In one implementation, additional samples of zeros are added to the continuous HRIRs. For example, when the audio source is on the left side, the ITD is 1 ms, and the sampling rate of the HRIRs is 48,000 Hz, 48 samples of zeros are added to the beginning of the right-side HRIR.
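The zero-padding step described above can be sketched as follows. This is a minimal illustration of prepending ITD-worth of zero samples to the far-ear HRIR; the function name is hypothetical.

```python
import numpy as np

def delay_hrir(hrir: np.ndarray, itd_s: float, fs: int = 48000) -> np.ndarray:
    """Prepend zeros to an HRIR to realize an interaural delay.

    E.g., itd_s = 0.001 (1 ms) at fs = 48000 Hz prepends 48 zero samples,
    matching the example in the text.
    """
    n = int(round(itd_s * fs))
    return np.concatenate([np.zeros(n), hrir])

# For a source on the left, delay the right-ear HRIR by the ITD.
right_hrir = np.ones(4)  # toy HRIR for illustration
right_hrir_delayed = delay_hrir(right_hrir, itd_s=0.001, fs=48000)
```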
(23) At 512, the spatial audio rendering software application applies gain control to the mono audio signals of the audio source. In particular, at 512, an audio source's volume is modified according to the distance between the mono audio source and the listener. A gain adjusting the volume is applied to the audio signals from the audio source; the gain follows sound propagation attenuation rules. In one implementation, the gain calculation is as follows:
(24) A(d) = A_ref * (d_ref / d)
(25) where A(d) is the gain at distance d, d_ref is the reference distance, and A_ref is the reference gain. d_ref and A_ref are predefined parameters, meaning that at distance d_ref, A_ref is the amount of gain to be applied to the mono audio signals. The mono audio signals are multiplied by A(d) to generate the modified audio signals of the audio source.
(26) At 514, the spatial audio rendering software application convolutes the modified mono audio signals of the audio source with the modified HRIRs (for both the right and left ears) to generate the stereo audio signals of the audio source. The stereo audio signals include both right and left channels. Note that the ITD and the gain A(d) are already accounted for at this step: the ITD is embedded in the modified HRIRs generated at 510, and A(d) has already been applied to the audio signals at 512. At 516, the spatial audio rendering software application combines the stereo audio signals of each audio source within the set of audio sources (such as the audio sources P1 and P2 shown in
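Steps 514 and 516 can be sketched together: convolve each source's modified mono signal with its left- and right-ear modified HRIRs, then sum the per-source stereo signals into the anechoic stereo audio. This is a minimal sketch under the assumption that gain and ITD have already been applied upstream; the function name is hypothetical.

```python
import numpy as np

def render_anechoic(sources):
    """sources: list of (mono_signal, hrir_left, hrir_right) tuples, with
    gain A(d) already applied to the signal and ITD already built into the
    HRIRs. Returns a (2, N) array: row 0 = left channel, row 1 = right.
    """
    # Longest possible convolution output across all sources.
    length = max(len(s) + max(len(hl), len(hr)) - 1 for s, hl, hr in sources)
    out = np.zeros((2, length))
    for s, hl, hr in sources:
        left = np.convolve(s, hl)
        right = np.convolve(s, hr)
        out[0, :len(left)] += left    # accumulate left channel
        out[1, :len(right)] += right  # accumulate right channel
    return out

# Toy example: one source with trivial single-tap/two-tap HRIRs.
stereo = render_anechoic([(np.ones(4), np.array([1.0]), np.array([0.5, 0.5]))])
```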
(27) When the room reverberation is desired for spatial audio rendering, the reverberation based on the Binaural Room Impulse Response (BRIR) is added during the spatial audio rendering. Referring to
(28) At 704, the spatial audio rendering software application generates BRIRs based on the room dimension and the positions of the listener and the audio sources. An illustrative virtual room is shown in
(29) At 706, the spatial audio rendering software application convolutes the mono audio signals of an audio source with the BRIRs to generate reverberation stereo audio (also referred to herein as reverberation audio and reverberation audio signals) of the audio source. At 708, the spatial audio rendering software application combines the generated reverberation stereo audio signals of all the audio sources within the set of audio sources (such as P1 and P2) to generate the combined reverberation stereo audio signals (or reverberation audio for short). In one implementation, the combination is achieved by adding the reverberation stereo audio signals of the set of audio sources together using the following equation:
(30) S = Σ_(i=1)^n S_i
(31) where S_i stands for the reverberation stereo audio data of the i-th audio source and n stands for the number of audio sources.
(32) At 710, the spatial audio rendering software application mixes the anechoic stereo audio and the combined reverberation stereo audio for both the left and right channels to generate the final stereo audio for playback on the device 202. In one implementation, the mixing is the addition of the two categories of audio data. In a further implementation, at 712, the spatial audio rendering software application compresses the final audio signal's level to a target range to prevent the playback from being too loud. For instance, at 712, a dynamic audio compressor is applied to compress the final audio signal level to a target range.
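The mixing and level-control steps can be sketched as below. The peak-based scaling here is a deliberately simple stand-in for the dynamic audio compressor mentioned in the text (an assumption, not the patent's compressor); the function name and target level are also illustrative.

```python
import numpy as np

def mix_and_limit(anechoic: np.ndarray, reverb: np.ndarray,
                  target_peak: float = 0.9) -> np.ndarray:
    """Mix anechoic and reverberation stereo audio by addition, then scale
    the result down if its peak exceeds target_peak, so playback is not
    too loud. Inputs are (2, N) arrays; shorter input is zero-padded.
    """
    length = max(anechoic.shape[1], reverb.shape[1])
    mix = np.zeros((2, length))
    mix[:, :anechoic.shape[1]] += anechoic
    mix[:, :reverb.shape[1]] += reverb
    peak = np.max(np.abs(mix))
    if peak > target_peak:
        mix *= target_peak / peak  # bring the peak down to the target
    return mix

# Toy example: two unit-level signals sum to peak 2.0, then get limited.
final = mix_and_limit(np.ones((2, 4)), np.ones((2, 4)))
```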
(33) Obviously, many additional modifications and variations of the present disclosure are possible in light of the above teachings. Thus, it is to be understood that, within the scope of the appended claims, the disclosure may be practiced otherwise than is specifically described above.
(34) The foregoing description of the disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. The description was selected to best explain the principles of the present teachings and practical application of these principles to enable others skilled in the art to best utilize the disclosure in various embodiments and various modifications as are suited to the particular use contemplated. It should be recognized that the words “a” or “an” are intended to include both the singular and the plural. Conversely, any reference to plural elements shall, where appropriate, include the singular.
(35) It is intended that the scope of the disclosure not be limited by the specification, but be defined by the claims set forth below. In addition, although narrow claims may be presented below, it should be recognized that the scope of this invention is much broader than presented by the claim(s). It is intended that broader claims will be submitted in one or more applications that claim the benefit of priority from this application. Insofar as the description above and the accompanying drawings disclose additional subject matter that is not within the scope of the claim or claims below, the additional inventions are not dedicated to the public and the right to file one or more applications to claim such additional inventions is reserved.