Method for acoustically rendering the size of a sound source
11388540 · 2022-07-12
Assignee
Inventors
Cpc classification
H04S7/305
ELECTRICITY
H04S2420/07
ELECTRICITY
H04S2420/01
ELECTRICITY
H04S2400/15
ELECTRICITY
H04R5/04
ELECTRICITY
H04S2400/11
ELECTRICITY
H04S2420/11
ELECTRICITY
H04R5/027
ELECTRICITY
H04S2400/01
ELECTRICITY
International classification
H04S7/00
ELECTRICITY
H04R5/04
ELECTRICITY
Abstract
Movement of a sound source toward or away from a listener may be simulated by crossfading a sound level of a point sound source signal representation of a source waveform and a spherical harmonic representation of the source waveform as a simulated distance of a listener from a sound source changes to generate a cross-faded waveform. A speaker may be driven with the cross-faded waveform.
Claims
1. A method for simulation of movement of a sound source towards or away from a listener, comprising: crossfading a sound level of a point sound source signal representation of a source waveform and a spherical harmonic representation of the source waveform as a simulated distance of a listener from a sound source changes to generate a cross-faded waveform; and driving a speaker with the cross-faded waveform.
2. The method of claim 1 wherein the point sound source signal is at a simulated first distance from the listener and the spherical harmonic representation is at a simulated second distance from the listener.
3. The method of claim 2, wherein the simulated second distance from the listener is less than the simulated first distance from the listener.
4. The method of claim 3 wherein the spherical harmonic representation of the source waveform is a lower order spherical harmonic representation.
5. The method of claim 4 wherein the lower order spherical harmonic is a zeroth order spherical harmonic.
6. The method of claim 4 further comprising: interpolating between the lower order spherical harmonic representation and a higher order spherical harmonic representation at a simulated third distance from the listener wherein the simulated third distance is greater than the second distance and driving the speaker with the interpolation between the lower order spherical harmonic representation and the higher order spherical harmonic representation.
7. The method of claim 2, wherein the simulated second distance moves farther from the listener.
8. The method of claim 7 wherein the spherical harmonic representation is a higher order spherical harmonic representation.
9. The method of claim 7 wherein the higher order spherical harmonic is a fifth order spherical harmonic.
10. The method of claim 1, further comprising removing an inter-aural time delay (ITD) from a head-related transfer function (HRTF) prior to convolution of the source waveform with the HRTF to generate the point sound source signal.
11. The method of claim 10 wherein the HRTF is filtered with a minimum phase filter.
12. The method of claim 10 wherein said crossfading the sound level includes applying an ITD to the cross-faded waveform using a fractional delay filter.
13. A system, comprising: a processor; a speaker; a memory coupled to the processor, the memory having executable instructions embodied therein, the instructions being configured to cause the processor to carry out a method for simulation of movement of a sound source towards or away from a listener when executed, the method comprising: crossfading a sound level of a point sound source signal representation of a source waveform and a spherical harmonic representation of the source waveform as a simulated distance of a sound source relative to a listener changes to generate a cross-faded waveform; and driving the speaker with the cross-faded waveform.
14. The system of claim 13 wherein the spherical harmonic representation of the source waveform is a lower order spherical harmonic representation.
15. The system of claim 14 wherein the lower order spherical harmonic is a zeroth order spherical harmonic.
16. The system of claim 14 further comprising: interpolating between the lower order spherical harmonic representation and a higher order spherical harmonic representation.
17. The system of claim 13 wherein an inter-aural time delay (ITD) is removed from a head-related transfer function (HRTF) used to generate the point sound source signal.
18. The system of claim 17 wherein the HRTF is filtered with a minimum phase filter.
19. The system of claim 17 wherein said crossfading includes applying an ITD to the cross-faded point sound source using a fractional delay filter.
20. The system of claim 13, wherein the simulated second distance moves farther from the listener.
21. The system of claim 20 wherein the spherical harmonic representation is a higher order spherical harmonic representation.
22. The system of claim 21 wherein the higher order spherical harmonic is a fifth order spherical harmonic.
23. A non-transitory computer readable medium with executable instructions embodied therein wherein execution of the instructions cause a processor to carry out a method for simulation of movement of a sound source towards or away from a listener comprising: crossfading a sound level of a point sound source signal representation of a source waveform and a spherical harmonic representation of the source waveform as a simulated distance of a sound source relative to a listener changes to generate a cross-faded waveform; and driving a speaker with the cross-faded waveform.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
(2)
(3)
(4)
(5)
(6)
DETAILED DESCRIPTION
(7) Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
(8) Introduction
(9) Aspects of the present disclosure relate to localization of sound in a sound system. Specifically, the present disclosure relates to transitioning between a point sound source simulation and a spherical harmonic representation of sound during the movement of a sound source towards or away from a listener. Typically in a sound system each speaker is connected to a main controller, sometimes referred to as an amplifier, which may also take the form of a computer or game console. Each speaker unit in the sound system has a defined data path used to identify the individual unit, called a channel. In most modern speaker systems the overall amplitude or volume of each channel is controllable with the main controller. Additionally, each speaker unit may comprise several individual speakers that have different frequency response characteristics. For example, a typical speaker unit comprises both a high-range speaker, sometimes referred to as a tweeter, and a mid-range speaker. These individual speakers typically cannot have their volume controlled individually; thus, for ease of discussion, “speaker” hereafter refers to a speaker unit, meaning the smallest group of speakers whose volume can be controlled.
(10) Sound Localization Through Application of Transfer Functions
(11) One way to create localized sound is through a binaural recording of the sound at some known location and orientation with respect to the sound source. High quality binaural recordings may be created with dummy head recorder devices made of materials which simulate the density, size and average inter-aural distance of the human head. In creation of these recordings, information such as inter-aural time delay and frequency dampening due to the head is captured within the recording.
(12) Techniques have been developed that allow any audio signal to be localized without the need to produce a binaural recording for each sound. These techniques take a source sound signal, which is in the time domain (amplitude over time), and apply a transform to place the signal in the frequency domain (amplitude over frequency). The transform may be a Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT) and the like. Once transformed, the source sound signal can be convolved with a Head Related Transfer Function (HRTF) through point multiplication at each frequency bin.
(13) The HRTF is a transformed version of the Head Related Impulse Response (HRIR), which captures the changes in sound emitted at a certain distance and angle as it passes between the ears of the listener. Thus the HRTF may be used to create a binaural version of a sound signal located at a certain distance from the listener. An HRIR is created by making a localized sound recording in an anechoic chamber, similar to the binaural recording discussed above. In general a broadband sound may be used for HRIR recording. Several recordings may be taken representing different simulated distances and angles of the sound source in relation to the listener. The localized recording is then transformed and the base signal is de-convolved by division at each frequency bin to generate the HRTF. Additionally, the source sound signal may be convolved with a Room Transfer Function (RTF) through point multiplication at each frequency bin. The RTF is the transformed version of the Room Impulse Response (RIR). The RIR captures the reverberations and secondary waves caused by reflections of the source sound wave within a room. The RIR may be used to create a more realistic sound and provide the listener with context for the sound. For example, and without limitation, an RIR may be used that simulates the reverberations of sounds within a concert hall or within a cave. The signal generated by transformation and convolution of the source sound signal with an HRTF followed by inverse transformation may be referred to herein as a point sound source simulation.
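By way of illustration only, the per-bin division and per-bin multiplication described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: it uses a naive discrete Fourier transform in place of an FFT, handles a single ear, assumes the base signal has no zero-valued frequency bins, and all function names and sample values are hypothetical.

```python
import cmath

def dft(samples):
    """Naive discrete Fourier transform, O(n^2); adequate for a short illustration."""
    n = len(samples)
    return [sum(samples[k] * cmath.exp(-2j * cmath.pi * b * k / n) for k in range(n))
            for b in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [sum(spectrum[b] * cmath.exp(2j * cmath.pi * b * k / n) for b in range(n)).real / n
            for k in range(n)]

def zero_pad(samples, length):
    """Pad with zeros so circular convolution equals linear convolution."""
    return list(samples) + [0.0] * (length - len(samples))

def estimate_transfer_function(base, recorded, length):
    """De-convolve the base signal from the recording by division at each bin."""
    base_bins = dft(zero_pad(base, length))
    recorded_bins = dft(zero_pad(recorded, length))
    return [r / b for r, b in zip(recorded_bins, base_bins)]

def apply_transfer_function(source, transfer, length):
    """Convolve by point multiplication at each bin, then inverse transform."""
    source_bins = dft(zero_pad(source, length))
    return idft([s * h for s, h in zip(source_bins, transfer)])

# Toy data (hypothetical values): a short base signal and its localized recording.
base = [1.0, 0.5]
recorded = [0.0, 0.8, 0.7, 0.15]
length = 8
hrtf = estimate_transfer_function(base, recorded, length)
localized = apply_transfer_function([1.0, -1.0, 0.5], hrtf, length)
```

In a binaural setting the same two steps would be repeated with a separate transfer function for each ear, and a room transfer function could be applied by one further per-bin multiplication.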
(14) The point source simulation recreates sounds as if they were a point source at some angle from the user. Larger sound sources are not easily reproducible with this model as the model lacks the ability to faithfully reproduce differences in sound pressure along the surface of the sound wave. Sound pressure differences which exist on the surface of a traveling sound wave are recognizable to the listener when a sound source is large and relatively close to the listener.
(15) Sound Localization Through Spherical Harmonics
(16) One approach to simulating sound pressure differences on the surface of a spherical sound wave is Ambisonics. Ambisonics, as discussed above, models the sound coming from a speaker as time-varying data on the surface of a sphere. A sound signal ƒ(t) arriving from location θ.
(17)
(18) Where φ is the azimuthal angle in the mathematically positive orientation and ϑ is the elevation of the spherical coordinates. This surround sound signal, ƒ(φ, ϑ, t), may then be described in terms of spherical harmonics, where each increase in the order n of the harmonic provides a greater degree of spatial resolution. The Ambisonic representation of a sound source is produced by spherical expansion up to an Nth truncation order, resulting in (eq. 2).
$$f(\varphi, \vartheta, t) = \sum_{n=0}^{N} \sum_{m=-n}^{n} Y_n^m(\varphi, \vartheta)\, \phi_{nm}(t) \quad \text{(eq. 2)}$$
(19) Where $Y_n^m$ represents the spherical harmonic matrix of order n and degree m (see
(20)
(21) Where individual terms of $Y_n^m$ can be computed through a recurrence relation as described in Zotter, Franz, “Analysis and Synthesis of Sound-Radiation with Spherical Arrays,” Ph.D. dissertation, University of Music and Performing Arts, Graz, 2009, which is incorporated herein by reference.
(22) Conventional Ambisonic sound systems require a specific definition for the expansion coefficients $\phi_{nm}(t)$ and normalization terms $N_n^{|m|}$. One traditional normalization method is through the use of a standard channel numbering system such as the Ambisonic Channel Numbering (ACN). ACN provides for fully normalized spherical harmonics and defines a sequence of spherical harmonics as $\mathrm{ACN} = n^2 + n + m$, where n is the order of the harmonic and m is the degree of the harmonic. The normalization term for ACN is (eq. 4)
(23)
(24) ACN is one method of normalizing spherical harmonics and it should be noted that this is provided by way of example and not by way of limitation. There exist other ways of normalizing spherical harmonics which have other advantages. One example, provided without limitation, of an alternative normalization technique is Schmidt semi-normalization.
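By way of example, and not by way of limitation, the channel sequence ACN = n² + n + m described above may be sketched as follows. The function names are hypothetical, and the sketch covers only the index arithmetic, not the normalization term.

```python
import math

def acn_index(order, degree):
    """Channel index for a harmonic of order n and degree m, with -n <= m <= n."""
    if order < 0 or abs(degree) > order:
        raise ValueError("degree must satisfy -order <= degree <= order")
    return order * order + order + degree

def acn_to_order_degree(index):
    """Invert acn_index(): recover (n, m) from a channel index."""
    order = math.isqrt(index)  # floor of the square root
    degree = index - order * order - order
    return order, degree

def channel_count(max_order):
    """Number of channels needed for an expansion truncated at order N."""
    return (max_order + 1) ** 2
```

For instance, under this numbering a fifth order expansion uses channel_count(5) = 36 channels, while a zeroth order expansion uses a single channel.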
(25) Manipulation may be carried out on the band-limited function on a unit sphere ƒ(θ) by decomposition of the function into the spherical spectrum $\phi_N$ using a spherical harmonic transform, which is described in greater detail in J. Driscoll and D. Healy, “Computing Fourier Transforms and Convolutions on the 2-Sphere,” Adv. Appl. Math., vol. 15, no. 2, pp. 202-250, June 1994, which is incorporated herein by reference.
$$\mathrm{SHT}\{f(\theta)\} = \phi_N = \int_{S^2} y_N(\theta)\, f(\theta)\, d\theta \quad \text{(eq. 5)}$$
(26) Similar to a Fourier transform, the spherical harmonic transform results in a continuous function which is difficult to calculate. Thus, to numerically calculate the transform, a Discrete Spherical Harmonic Transform (DSHT) is applied. The DSHT calculates the spherical transform over a discrete number of directions $\Theta = [\theta_1, \ldots, \theta_L]^T$. Thus the DSHT is defined as:
$$\mathrm{DSHT}\{f(\Theta)\} = \phi_N = Y_N^{\dagger}(\Theta)\, f(\Theta) \quad \text{(eq. 6)}$$
(27) Where † represents the Moore-Penrose pseudoinverse:
$$Y^{\dagger} = (Y^T Y)^{-1} Y^T \quad \text{(eq. 7)}$$
(28) The discrete spherical harmonic vectors result in a new matrix $Y_N(\Theta)$ with dimensions $L \times (N+1)^2$. The distribution of sampling sources for the discrete spherical harmonic transform may be described using any known method. By way of example and not by way of limitation, sampling methods used may be hyperinterpolation, Gauss-Legendre, equiangular sampling, equiangular cylindric, spiral points, HEALPix, or spherical t-designs. Methods for sampling are described in greater detail in Zotter, Franz, “Sampling Strategies for Acoustic Holography/Holophony on the Sphere,” in NAG-DAGA, 2009, which is incorporated herein by reference. Information about spherical t-design sampling and spherical harmonic manipulation can be found in Kronlachner, Matthias, “Spatial Transformations for the Alteration of Ambisonic Recordings,” Master Thesis, June 2014, available at http://www.matthiaskronlachner.com/wp-content/uploads/2013/01/Kronlachner_Master_Spatial_Transformations_Mobile.pdf.
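By way of illustration only, the discrete transform of (eq. 6) with the pseudoinverse of (eq. 7) may be sketched for a first order expansion (N = 1, so (N+1)² = 4 coefficients). The sketch rests on several assumptions that are not taken from the disclosure: the first order harmonics are given a simple unit scaling, the L = 6 sampling directions are the vertices of an octahedron, and all names are hypothetical.

```python
import math

def sh_first_order(azimuth, elevation):
    """First order real harmonics in ACN order; unit scaling assumed for illustration."""
    return [1.0,
            math.sin(azimuth) * math.cos(elevation),
            math.sin(elevation),
            math.cos(azimuth) * math.cos(elevation)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def pseudo_inverse(y):
    """(Y^T Y)^-1 Y^T, as in eq. 7; requires Y to have full column rank."""
    yt = [list(col) for col in zip(*y)]
    return matmul(invert(matmul(yt, y)), yt)

def dsht(samples, directions):
    """Spherical spectrum from samples taken at the given (azimuth, elevation) pairs."""
    y = [sh_first_order(az, el) for az, el in directions]  # L x (N+1)^2
    return [sum(w * s for w, s in zip(row, samples)) for row in pseudo_inverse(y)]

# Assumed sampling grid: the six vertices of an octahedron, in radians.
OCTAHEDRON = [(0.0, 0.0), (math.pi / 2, 0.0), (math.pi, 0.0),
              (3 * math.pi / 2, 0.0), (0.0, math.pi / 2), (0.0, -math.pi / 2)]
```

A practical system would more likely use one of the sampling schemes listed above, such as a spherical t-design, together with a numerically robust pseudoinverse routine; the normal-equation form is shown only because it mirrors (eq. 7).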
(29) Movement of Sound Sources
(30) The perceived location and distance of sound sources in an Ambisonic system may be changed by weighting the source signal with a direction-dependent gain g(θ) and applying an angular transformation $\mathcal{T}\{\theta\}$ to the source signal direction θ. After inversion of the angular transformation, the resulting source signal with the modified location, ƒ′(θ, t), is:
$$f'(\theta, t) = g(\mathcal{T}^{-1}\{\theta\})\, f(\mathcal{T}^{-1}\{\theta\}, t) \quad \text{(eq. 8)}$$
(31) The Ambisonic representation of this source signal is obtained by inserting $f(\theta, t) = y_N^T(\theta)\, \phi_N(t)$, resulting in the equation:
(32)
$$y_N^T(\theta)\, \phi_N'(t) = g(\mathcal{T}^{-1}\{\theta\})\, y_N^T(\mathcal{T}^{-1}\{\theta\})\, \phi_N(t) \quad \text{(eq. 9)}$$
(33) The transformed Ambisonic signal $\phi_N'(t)$ is produced by removing $y_N^T(\theta)$ using orthogonality after integration over two spherical harmonics and application of the discrete spherical harmonic transform (DSHT), producing the equation:
$$\phi_N'(t) = T\, \phi_N(t) \quad \text{(eq. 10)}$$
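(34) Where T represents the transformation matrix:
$$T = \mathrm{DSHT}\{\mathrm{diag}\{g(\mathcal{T}^{-1}\{\Theta\})\}\, y_N^T(\mathcal{T}^{-1}\{\Theta\})\} = Y_N^{\dagger}(\Theta)\, \mathrm{diag}\{g(\mathcal{T}^{-1}\{\Theta\})\}\, y_N^T(\mathcal{T}^{-1}\{\Theta\}) \quad \text{(eq. 11)}$$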
(35) Rotation of a sound source can be achieved by the application of a rotation matrix $T_r^{xyz}$, which is further described in Zotter, “Sampling Strategies for Acoustic Holography/Holophony on the Sphere,” and Kronlachner.
(36) Sound sources in the Ambisonic sound system may further be modified through warping. Generally a transformation matrix as described in Kronlachner may be applied to warp a signal in any particular direction. By way of example and not by way of limitation, a bilinear transform may be applied to warp a spherical harmonic source. The bilinear transform elevates or lowers the equator of the source from 0 to arcsin α for any α with −1<α<1. For higher order spherical harmonics the magnitude of signals must also be changed to compensate for the effect of playing the stretched source on additional speakers or the compressed source on fewer speakers. The enlargement of a sound source is described by the derivative of the angular transformation of the source (σ). Energy preservation after warping may then be provided using the gain factor g(μ′), where:
(37)
(38) Warping and compensation of a source distributes part of the energy to higher orders. Therefore the new warped spherical harmonics will require a different expansion order at higher decibel levels to avoid errors. As discussed earlier these higher order spherical harmonics capture the variations of sound pressure on the surface of the spherical sound wave.
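By way of illustration only, the equator shift produced by the bilinear warp may be sketched as follows. The sketch assumes the mapping μ′ = (μ + α)/(1 + αμ) with μ = sin(elevation), a form commonly used for this kind of warping; this formula is an assumption and is not stated in the present text. The energy-preserving gain is not reproduced here, and the function name is hypothetical.

```python
import math

def warp_elevation(elevation, alpha):
    """Warp an elevation angle (radians) toward a pole.

    Assumed mapping: mu' = (mu + alpha) / (1 + alpha * mu), mu = sin(elevation).
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    mu = math.sin(elevation)
    warped = (mu + alpha) / (1.0 + alpha * mu)
    # Clamp against rounding error before taking the arcsine.
    return math.asin(max(-1.0, min(1.0, warped)))
```

Under this assumed mapping the equator (elevation 0) moves to arcsin α while both poles stay fixed, so positive α compresses the upper hemisphere and stretches the lower one.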
(39) The computations for localization of sound sources in the spherical harmonics representation can be quite involved even for small sources as can be seen from the above discussion. Thus it would be beneficial to create a system that could capture the fidelity of the spherical harmonics representation with the reduced computing requirements of the transfer function model.
(40) Combination Spherical Harmonic and Point Sound Source Simulation
(41) According to aspects of the present disclosure a sound system may crossfade the point sound source simulation with the spherical harmonic representation of the sound source. The sound level crossfade between the two models is performed on the volume/amplitude. The system may determine the level of cross fade based on the simulated location and/or size of a sound source.
(42) Generally sound sources that are far away can be represented as point sources because only a narrow window of the signal is perceivable. This narrow perceivable window does not provide the listener with enough information to recognize higher order harmonic features within the source. Similarly, small sources and quiet sources do not produce enough information for the average person to perceive higher order features. In the spherical harmonic representation, far away, small or quiet sound sources may be represented as zeroth order sound signals 101. According to aspects of the present disclosure the far away, small and/or quiet sound sources are represented by point sound source simulation. Larger, louder and/or closer sound sources may be represented by the spherical harmonic representation. The benefit of using the point sound source simulation for far away, small and/or quiet sources is that it requires less computation than the spherical harmonic representation.
(43) The simulated locations of sound sources within a sound system are not always fixed, and it would be desirable to accurately simulate the effect of movement on a sound source as it approaches or moves away from the listener.
(44) By way of example, and not by way of limitation, the sound source may have a simulated location 301 that is at a point far away from the listener 320. This far away sound source 310 may be localized through transformation and convolution of the signal with an HRIR 212 chosen to simulate the point 310 far away from the user. The simulated location of the sound source may move to a second point 302 closer to the listener 320. The second point 302 may be close enough that the listener 320 would perceive differences in sound pressure on the surface of the spherical sound wave 311 if it were a natural sound. Thus the sound source at the second point 302 should be localized using discrete spherical harmonic functions at 203.
(45) A transition of the source sound between the first point and the second point may be performed by gradually lowering the volume of the transfer function representation while gradually raising the volume of the spherical harmonic representation during the crossfade 205. The volume of the point source simulation may be full while the spherical harmonic representation is zero or not calculated at 304. As the simulated location of the sound source moves, the volume of both representations is altered. At some point during the transition the volume of the spherical harmonic representation and the point source simulation will be equivalent at 305. When the simulated location of the source moves to some predetermined point from the user 320, the volume of the point source simulation will be attenuated at 306, leaving only the spherical harmonic representation. In an embodiment the crossfade at 305 may be incremented gradually so that each unit of distance the simulated location moves away from the first point and towards the second point corresponds to a linear decrease in the volume of the point sound source simulation and a linear increase in the volume of the spherical harmonic representation. In alternative embodiments the crossfade may be performed as a logarithmic or exponential function with respect to the simulated location of the sound source. Similar to the transition from a far source to a close source, the transition from a close source to a far source may be performed by lowering the volume of the spherical harmonic representation while increasing the volume of the point sound source simulation.
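By way of example, and not by way of limitation, the linear embodiment of the crossfade may be sketched as follows. The function names and the two distance parameters are hypothetical, and the sketch assumes both representations have already been rendered to the same speaker channel at the same length.

```python
def crossfade_gains(distance, far_distance, near_distance):
    """Return (point_gain, spherical_gain) for a linear crossfade over distance.

    At or beyond far_distance only the point source simulation is heard;
    at or inside near_distance only the spherical harmonic representation is heard.
    """
    if far_distance <= near_distance:
        raise ValueError("far_distance must exceed near_distance")
    progress = (far_distance - distance) / (far_distance - near_distance)
    progress = min(1.0, max(0.0, progress))
    return 1.0 - progress, progress

def crossfade(point_signal, spherical_signal, distance, far_distance, near_distance):
    """Mix the two renderings sample by sample with the gains above."""
    point_gain, spherical_gain = crossfade_gains(distance, far_distance, near_distance)
    return [point_gain * p + spherical_gain * s
            for p, s in zip(point_signal, spherical_signal)]
```

Because the gains depend only on the current distance, the same function serves both an approaching and a receding source. A logarithmic or exponential embodiment would replace only the computation of the progress value.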
(46) Additionally, as the simulated location of the sound source moves from the first point to the second point it may be desirable to apply a second HRIR chosen to simulate a transition point. In this case the first HRIR would be convolved with the source signal and the second HRIR would be convolved with the source signal. In some implementations, as the simulated location of the sound source moves from the first point to the transition point, the volume level of the two different HRIR convolved signals may be crossfaded incrementally, e.g., the volume level of the source signal convolved with the first HRIR may be decreased and the volume level of the source signal convolved with the second HRIR may be increased as the simulated location of the sound source moves from the first point to the transition point. Alternatively, the system may interpolate between the first and second HRTF and convolve the source signal with the interpolated HRTF. The system may then play back the first HRTF convolved signal, the interpolated HRTF convolved signal and the second HRTF convolved signal, respectively, to simulate movement of the location of the sound from the first point to the transition point.
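By way of illustration only, the two alternatives above may be sketched in the time domain. The sketch assumes a plain linear interpolation of impulse response coefficients with a fixed weight and two HRIRs of equal length; other interpolation schemes are possible, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def interpolate_hrir(first, second, weight):
    """Linear blend of two equal-length HRIRs; weight 0 gives first, 1 gives second."""
    if len(first) != len(second):
        raise ValueError("HRIRs must have equal length")
    return [(1.0 - weight) * a + weight * b for a, b in zip(first, second)]

def crossfade_convolved(signal, first, second, weight):
    """Convolve with each HRIR separately, then crossfade the two results."""
    a = convolve(signal, first)
    b = convolve(signal, second)
    return [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]
```

Since convolution is linear, for a fixed weight these two routes give the same output; the interpolated route needs one convolution per block instead of two, which is one reason an implementation might prefer it.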
(47) According to additional aspects of the present disclosure in generating the HRTF representation at 201 the Inter-aural time delay may optionally be reduced to zero during the transition between the first simulated location of the sound source and the second simulated location of the sound source. Inter-aural time delay (ITD) captures the time it takes for a sound wave to travel from one ear of the listener to the other ear of the listener. The listener may use the time delay information in the determination of the location of a sound. In general this information is captured by HRIR recordings. The ITD information may be removed from the HRTF recordings through the use of a minimum phase filter 202 or other suitable filter. The ITD may be adjusted during or after convolution of the source signal with the HRTF at 204 and application of the crossfade to the point sound source simulation at 205.
(48) ITD information may be adjusted through the use of a fractional delay filter 206. Fractional delays may be applied to the left or right signal depending on the simulated location of the source in relation to the user's head. By way of example and not by way of limitation, if the simulated location of the source is directly left of the listener's head then the right signal will have the greatest delay. Similarly, if the signal is in front of or behind the listener's head there will be no difference in the delay of the left and right signals. The delay between the left and right signals may be changed fractionally based on how far from the center front or center rear of the listener the simulated location of the source is.
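By way of illustration only, re-applying an ITD with a fractional delay may be sketched as follows. Several assumptions are made that are not taken from the disclosure: the delay follows a sine law of azimuth (zero at front and rear, greatest at the sides), azimuth is measured counter-clockwise from straight ahead so that positive values are to the listener's left, the maximum ITD value is a placeholder, and the fractional delay filter is a simple first order linear interpolator. All names are hypothetical.

```python
import math

def itd_seconds(azimuth, max_itd=0.00066):
    """Assumed sine-law ITD; positive means the right ear signal is delayed."""
    return max_itd * math.sin(azimuth)

def fractional_delay(signal, delay_samples):
    """Delay by a non-negative, possibly fractional, number of samples.

    Uses linear interpolation between neighboring samples; output length
    equals input length, so the delayed tail is truncated.
    """
    whole = int(math.floor(delay_samples))
    frac = delay_samples - whole
    out = []
    for n in range(len(signal)):
        i = n - whole
        current = signal[i] if 0 <= i < len(signal) else 0.0
        previous = signal[i - 1] if 0 <= i - 1 < len(signal) else 0.0
        out.append((1.0 - frac) * current + frac * previous)
    return out

def apply_itd(left, right, azimuth, sample_rate):
    """Delay the ear farther from the source; leave the nearer ear untouched."""
    delay = itd_seconds(azimuth) * sample_rate
    if delay >= 0.0:
        return list(left), fractional_delay(right, delay)
    return fractional_delay(left, -delay), list(right)
```

A production filter would more likely use a higher order interpolator, since linear interpolation attenuates high frequencies at non-integer delays.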
(49) According to aspects of the present disclosure as the simulated location of the source approaches the listener, the transition between the transfer function model and the spherical harmonic model occurs at the zeroth order spherical harmonic 311. Similarly as the simulated location of the sound source moves away from the user the transition should occur at the zeroth order harmonic 311. It should be understood that as the simulated location of the source moves away from the listener it may be represented by increasingly higher order spherical harmonics 312 representing widening of the sound source. According to additional aspects of the present disclosure as the distance of the sound source from the listener 320 increases it may reach a transition point 303 representing the narrowing extent of the sound source due to distance. Past this transition period 309 the sound source may be represented as the interpolation between the zeroth order harmonic and the previous harmonic order as shown in volume plot 307. On the volume plot 307 in
(50) System
(51) Turning to
(52) The example system 400 may include computing components which are coupled to a sound system 440 in order to process and/or output audio signals in accordance with aspects of the present disclosure. By way of example, and not by way of limitation, in some implementations the sound system 440 may be a set of stereo or surround headphones, and some or all of the computing components may be part of a headphone system 440. Furthermore, in some implementations, the system 400 may be part of an embedded system, mobile phone, personal computer, tablet computer, portable game device, workstation, game console, set-top box, stand-alone amplifier unit and the like.
(53) The example system may additionally be coupled to a game controller 430. The game controller may have numerous features which aid in tracking its location and which may be used to assist in the optimization of sound. A microphone array may be coupled to the controller for enhanced location detection. The game controller may also have numerous light sources that may be detected by an image capture unit, and the location of the controller within the room may be detected from the location of the light sources. Other location detection systems may be coupled to the game controller 430, including accelerometers and/or gyroscopic displacement sensors to detect movement of the controller within the room. According to aspects of the present disclosure the game controller 430 may also have user input controls such as a direction pad and buttons 433, joysticks 431, and/or touchpads 432. The game controller may also be mountable to the user's body.
(54) The system 400 may be configured to process audio signals to de-convolve and convolve impulse responses and generate spherical harmonic signals in accordance with aspects of the present disclosure. The system 400 may include one or more processor units 401, which may be configured according to well-known architectures, such as, e.g., single-core, dual-core, quad-core, multi-core, processor-coprocessor, accelerated processing unit and the like. The system 400 may also include one or more memory units 402 (e.g., RAM, DRAM, ROM, and the like).
(55) The processor unit 401 may execute one or more programs 404, portions of which may be stored in the memory 402, and the processor 401 may be operatively coupled to the memory 402, e.g., by accessing the memory via a data bus 420. The programs may be configured to process source audio signals 406, e.g. for converting the signals to localized signals for later use or output to the headphones 440. The programs may configure the processing unit 401 to generate spherical harmonic Data 409 representing the spherical harmonics of the signal data 406. Additionally the memory 402 may have HRTF Data 407 for convolution with the signal data 406. By way of example, and not by way of limitation, the memory 402 may include programs 404, execution of which may cause the system 400 to perform a method having one or more features in common with the example methods above, such as method 200 of
(56) The system 400 may also include well-known support circuits 410, such as input/output (I/O) circuits 411, power supplies (P/S) 412, a clock (CLK) 413, and cache 414, which may communicate with other components of the system, e.g., via the bus 420. The system 400 may also include a mass storage device 415 such as a disk drive, CD-ROM drive, tape drive, flash memory, or the like, and the mass storage device 415 may store programs and/or data. The system 400 may also include a user interface 418 and a display 416 to facilitate interaction between the system 400 and a user. The user interface 418 may include a keyboard, mouse, light pen, touch interface, or other device. The system 400 may also execute one or more general computer applications (not pictured), such as a video game, which may incorporate aspects of surround sound as computed by the sound localizing programs 404.
(57) The system 400 may include a network interface 408, configured to enable the use of Wi-Fi, an Ethernet port, or other communication methods. The network interface 408 may incorporate suitable hardware, software, firmware or some combination thereof to facilitate communication via a telecommunications network. The network interface 408 may be configured to implement wired or wireless communication over local area networks and wide area networks such as the Internet. The system 400 may send and receive data and/or requests for files via one or more data packets over a network.
(58) It will readily be appreciated that many variations on the components depicted in
(59) Conclusion
(60) While the above is a complete description of the preferred embodiment of the present invention, it is possible to use various alternatives, modifications and equivalents. Therefore, the scope of the present invention should be determined not with reference to the above description but should, instead, be determined with reference to the appended claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the claims that follow, the indefinite article “a”, or “an” refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase “means for.”