Systems and methods for improving audio virtualization
11611828 · 2023-03-21
Assignee
Inventors
Cpc classification
H04S2420/07
ELECTRICITY
H04S2420/01
ELECTRICITY
H04R5/04
ELECTRICITY
H04S5/00
ELECTRICITY
International classification
H04R5/04
ELECTRICITY
H04S7/00
ELECTRICITY
Abstract
Virtual sound room rendering is most realistic when the listener has themselves been the subject of the binaural room impulse response measurements, and most pleasing when the sound room involved has a high acoustic fidelity. Where the listener has no access to good sound rooms, non-personalised high fidelity sound rooms are modified using information from the listener's personalised binaural impulse response data to improve the realism of such rooms. Where sound rooms are available, information from higher fidelity non-personalised sound rooms is used to improve the sound quality of the listener's personalised room data. Alternatively, either personalised or non-personalised rooms can be improved through modification of their reverberation characteristics according to the listener's taste.
Claims
1. A digital signal processing method for creating binaural room impulse response data, the method comprising: providing data representing a personalized binaural room impulse response, said personalized binaural impulse response being created with respect of a target listener; providing data representing a non-personalized binaural room impulse response, said non-personalized binaural impulse response being created with respect of a dummy or a person other than the target listener; and using said personalized binaural impulse response data and said non-personalized binaural impulse response data to create data representing a hybrid binaural room impulse response; wherein creating said hybrid binaural room impulse response data involves modifying said non-personalized binaural room impulse response with at least one aspect of said personalized binaural room impulse response that is independent of a room in which said personalized binaural room impulse response is created, and using said modified non-personalized binaural room impulse response as said hybrid binaural room impulse response.
2. The method of claim 1, wherein said data comprises a plurality of portions, each portion representing a different aspect of said respective binaural room impulse response; the creating of said hybrid binaural room impulse response data involves: using at least one portion of said personalized binaural room impulse response data to provide the or each corresponding portion of said hybrid binaural room impulse response data; and using at least one other portion of said non-personalized binaural room impulse response data to provide the or each other corresponding portion of said hybrid binaural room impulse response data; said plurality of portions comprising a first portion representing a portion of the respective binaural room impulse response that is independent of a room which said respective binaural room impulse response represents; creating said hybrid binaural room impulse response data involving using the first portion of said personalized binaural room impulse response data to provide the first portion of said hybrid binaural room impulse response data; and said first portion comprising data representing a head related impulse response (HRIR) portion of the respective binaural room impulse response, said HRIR portion of said personalized binaural room impulse response data being used to provide the HRIR portion of said binaural room impulse response data, the HRIR data portion comprising data representing at least one frequency component of the HRIR portion of the personalized binaural room impulse response.
3. The method of claim 2, further comprising filtering said HRIR data portion of said personalized binaural room impulse response, and using said filtered HRIR data portion to provide the HRIR portion of said hybrid binaural room impulse response data, the filtering including high pass filtering or band pass filtering.
4. The method of claim 2, further comprising: overwriting said first portion of said non-personalized binaural room impulse response data with the first portion of said personalized binaural room impulse response data to create said hybrid binaural room impulse response data; and filtering the respective first portion of each of said personalized and non-personalized binaural room impulse response data prior to said overwriting, the filtering including high pass filtering or band pass filtering.
5. The method of claim 2, wherein said plurality of portions comprise at least one room-dependent portion that is dependent on a room which the respective binaural room impulse response represents; said personalized binaural room impulse response is created in a first room; said non-personalized binaural room impulse response is created in a second room having better acoustic characteristics than said first room; at least one room-dependent portion of said non-personalized binaural room impulse response data is used to provide the or each corresponding room-dependent portion of said hybrid binaural room impulse response data; and the creating of said hybrid binaural room impulse data involves using said at least one room-dependent portion of said non-personalized binaural room impulse response data to modify the or each corresponding room-dependent portion of said personalized binaural room impulse response data.
6. The method of claim 5, wherein data representing at least one selected from the group consisting of a reflections portion and a reverberation portion of the non-personalized binaural room impulse response is used to provide the or each corresponding portion of the hybrid binaural room impulse response data.
7. The method of claim 5, wherein said at least one room-dependent portion comprises data representing at least one characteristic of a reverberation portion of said non-personalized binaural room impulse response; and the creating of said hybrid binaural room impulse response data involves using said data representing at least one reverberation characteristic of said non-personalized binaural room impulse response to provide data representing the or each corresponding characteristic of a reverberation portion of said hybrid binaural room impulse response.
8. The method of claim 7, wherein said at least one characteristic is at least one selected from a group consisting of a time decay profile and a gain.
9. The method of claim 7, wherein said at least one characteristic comprises at least one time characteristic including a time decay characteristic and at least one frequency characteristic including a frequency response characteristic.
10. The method of claim 5, wherein said at least one room-dependent portion comprises data representing at least one characteristic of a reflection portion of said non-personalized binaural room impulse response; and the creating of said hybrid binaural room impulse response data involves using said data representing at least one reflection characteristic of said non-personalized binaural room impulse response to provide data representing the or each corresponding characteristic of a reflection portion of said hybrid binaural room impulse response.
11. The method of claim 5, wherein providing the or each corresponding room-dependent portion of said hybrid binaural room impulse response data involves performing digital signal analysis of the respective room-dependent portion of the non-personalized binaural room impulse response data and the personalized binaural room impulse response data using sub-band analysis filter banks.
12. The method of claim 5, wherein providing the or each corresponding room-dependent portion of said hybrid binaural room impulse response data involves performing a comparative listening test.
13. The method of claim 1, wherein the respective binaural room impulse response data comprises data representing an inter-aural time delay, the inter-aural time delay data of said personalized binaural room impulse response is used to provide the inter-aural time delay data of said hybrid binaural room impulse response data.
14. The method of claim 1, wherein the respective binaural room impulse response data includes at least one portion representing a portion of the respective binaural room impulse response that is dependent on a room that the respective binaural room impulse response represents; the creating of said hybrid room impulse response data involves modifying at least one room-dependent portion of said non-personalized binaural room impulse response data using an omni-directional head transfer function (HRTF) of said personalized binaural room impulse response data and an omni-directional head transfer function (HRTF) of said non-personalized binaural room impulse response data, and using said at least one modified room dependent portion in said hybrid binaural room impulse response data; said modifying involves filtering said at least one room-dependent portion of said non-personalized binaural room impulse data using a filter representing the difference between said omni-directional head transfer functions; and said filtering comprises equalization filtering and said filter comprises an equalization filter.
15. The method of claim 14, wherein the difference between said omni-directional head transfer functions is determined by digital signal analysis of said omni-directional head transfer functions.
16. The method of claim 14, wherein the difference between said omni-directional head transfer functions is determined by performing a comparative listening test, said comparative listening test involving comparing, by listening to, a test audio signal processed by the first portion of said non-personalized binaural room impulse data and the test audio signal processed by the first portion of said personalized binaural room impulse data, and adjusting, by adjustably filtering, said test audio signal processed by the first portion of said non-personalized binaural room impulse data to match the test audio signal processed by the first portion of said personalized binaural room impulse data.
17. The method of claim 14, wherein said at least one room dependent portion comprises data representing a reflections portion and a reverberation portion of the respective binaural room impulse response, said data representing at least one of said reflections portion and said reverberation portion is modified using said omni-directional head transfer functions.
18. The method of claim 1, wherein creating said hybrid binaural room impulse response data involves modifying said personalized binaural room impulse response with at least one aspect of said non-personalized binaural room impulse response that is dependent on a room in which said non-personalized binaural room impulse response is created, and using said modified personalized binaural room impulse response as said hybrid binaural room impulse response; and said at least one room-dependent portion comprises data representing at least one reverberation characteristic of said non-personalized binaural room impulse response.
19. The method of claim 1, further comprising creating a hybrid binaural room impulse data set comprising respective hybrid binaural room impulse data for each of a plurality of loudspeaker-to-head orientations.
20. The method of claim 1, further comprising: transforming an audio signal into a virtualized audio signal using said binaural room impulse response data; and rendering said virtualized audio signal to a listener.
21. A digital signal processing method for modifying data representing a binaural room impulse response, said data including data representing at least one selected from a group consisting of a reflections portion and a reverberation portion of said binaural room impulse response, said method comprising: modifying said data to modify at least one characteristic of said at least one selected from the group consisting of said reflections portion and of said reverberation portion; said at least one characteristic including a frequency response characteristic or time decay characteristics and being modified to conform to the or each corresponding characteristic of the respective portion of a reference binaural room impulse response, the reference binaural room impulse response being a personalized or non-personalized binaural room impulse response or a hybrid binaural room impulse response; and said modification to conform involves performing digital signal analysis of data representing said binaural room impulse response and data representing said reference binaural room impulse response.
22. The method of claim 21, wherein said modification to conform is performed by performing a comparative listening test between an audio signal rendered using said binaural room impulse response data and using said reference binaural room impulse response data.
23. The method of claim 21, wherein said modifying is performed empirically according to a listener's preference.
24. The method of claim 21, including performing sub-band analysis of all or part of said binaural room impulse response data; and said modifying involves modifying said at least one characteristic of at least one of the resulting sub-band data, and synthesizing the sub-band data, including any modified sub-band data.
25. The method of claim 21, wherein said at least one characteristic comprises at least one selected from a group consisting of a gain and decay envelope characteristic.
26. The method of claim 21, wherein said modifying is performed in real-time during audio virtualization of an audio signal using said binaural room impulse response data.
27. A digital signal processing apparatus for creating binaural room impulse response data, said apparatus comprising digital signal processing means for: providing data representing a personalized binaural room impulse response, said personalized binaural impulse response being created in respect of a target listener; providing data representing a non-personalized binaural room impulse response, said non-personalized binaural impulse response being created in respect of a dummy or a person other than the target listener; using said personalized binaural impulse response data and said non-personalized binaural impulse response data to create data representing a hybrid binaural room impulse response; and creating said hybrid binaural room impulse response data by modifying said non-personalized binaural room impulse response with at least one aspect of said personalized binaural room impulse response that is independent of a room in which said personalized binaural room impulse response is created, and using said modified non-personalized binaural room impulse response as said hybrid binaural room impulse response.
28. A system comprising the digital signal processing apparatus of claim 27, wherein the digital signal processing means is further for transforming an audio signal into a virtualized audio signal using said binaural room impulse response data; and the system further including headphones for rendering said virtualized audio signal to a listener.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the invention are now described by way of example and with reference to the drawings.
(2)-(17) [Figure descriptions not reproduced in this text.]
DETAILED DESCRIPTION OF THE DRAWINGS
(18) Binaural room impulse responses typically represent virtual loudspeakers in a virtual sound room as perceived by a human subject.
(19)
(20) A binaural room impulse response (whether personalised or non-personalised) is typically created for any one or more of: the or each loudspeaker; and the, or each, orientation of the head position with respect to the or each loudspeaker. This results in a respective binaural room impulse response for each of a plurality of loudspeaker-to-head orientations. Collectively, these responses, or more particularly data representing these responses, can be referred to as a binaural room impulse response data set, e.g. a BRIR data set or a PRIR data set.
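As an illustration of how such a data set might be organised in software, the sketch below keys each left-ear/right-ear impulse response pair by loudspeaker and head orientation. The class names, field names and the three head angles are hypothetical; they are not taken from any actual BRIR or PRIR file format.

```python
from dataclasses import dataclass, field


@dataclass
class BinauralResponse:
    """One loudspeaker-to-head orientation: left-ear and right-ear impulse responses."""
    left: list
    right: list


@dataclass
class RoomResponseDataSet:
    """Hypothetical container for a BRIR or PRIR data set."""
    personalised: bool          # True for a PRIR data set, False for a BRIR data set
    sample_rate_hz: int
    responses: dict = field(default_factory=dict)

    def add(self, loudspeaker, head_angle_deg, response):
        # One entry per loudspeaker-to-head orientation.
        self.responses[(loudspeaker, head_angle_deg)] = response

    def get(self, loudspeaker, head_angle_deg):
        return self.responses[(loudspeaker, head_angle_deg)]


# Placeholder samples only; real responses hold thousands of samples per ear.
prir = RoomResponseDataSet(personalised=True, sample_rate_hz=48000)
for angle in (-30, 0, 30):
    prir.add("front_left", angle, BinauralResponse(left=[1.0, 0.5], right=[0.0, 0.8]))
```

Keying on the (loudspeaker, head angle) pair makes it straightforward to look up the equivalent orientation in a second data set when information is transferred between a PRIR and a BRIR.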
(21)
(22) In this description no attempt is made to rigidly demarcate the HRIR, early reflections or reverberation samples in a binaural room impulse response in terms of time, as these will depend on the dimensions and surface characteristics of the room and the position of the subject in that room. However, a binaural room impulse response measured in a living room by an adult subject would typically comprise an HRIR portion spanning a first period, e.g. the first five milliseconds (ms), beginning from the onset 11 (
(23)
(24) Virtual sound room rendering is most realistic when the listener has themselves been the subject of the binaural room impulse response measurement. In other words, the listener must go to a room to be measured for best performance. Unfortunately the acoustical properties of sound rooms have a significant effect on the perceived quality of the reproduced sound. Music and film studios, professional listening rooms and auditoriums are designed with this in mind and will often sound considerably more pleasing than the average living room or home theatre. It makes sense therefore for listeners to seek out the best sound rooms to make PRIR measurements. The difficulty with this approach is that good sound rooms are few and far between and may not be accessible by the general public. A challenge therefore is to create a means by which a listener can take a BRIR measurement, made in an arbitrary sound room by an arbitrary person, and improve the virtual realism of such a non-personalised sound room when listening over their own headphones. In this way a BRIR of a good sound room could be downloaded over the internet, for example, processed to improve the rendering for the specific listener, and used as an alternative to a PRIR made in such a sound room. It would not be expected that the processed BRIR would ever sound superior to a PRIR made by the listener in the same room, but the aim is to make the BRIR more listenable. Human sound localisation and rendition is affected by three main processes. First, the time of arrival of a sound at each ear can be used by the brain to determine the direction of a sound, i.e. if it arrives at the left ear first then the sound is coming from the left side. Second, the sound is modified by the way it interacts with the outer ear (pinna), head and shoulders before entering the ear canal. This modification is used by the brain to help determine direction when there is no time delay between the ears, for example when the sound is coming from directly in front. Third, the ear that is receiving the loudest sound indicates to the brain that the sound source is on the same side as that ear.
(25) For low frequency sounds, both ears hear much the same signal since obstructions such as the head and pinna are small compared to the wavelength of the sound wave and are essentially invisible to such frequencies. It can be deduced therefore that low frequency components of a binaural room impulse response are similar across the general population except for the time delay between the two ears, this delay being related to the distance between the subject's ears.
(26) As the frequency of sound increases, so too does the level of interaction with the head; in particular, sounds coming from one side of the head will tend to be attenuated by the time they reach the ear canal on the far side, an effect known as head shadowing. As the frequency increases still further and the wavelength drops below the physical size of the subject's outer ear, the sound is modified by reflections and resonances set up around this structure prior to entering the ear canal. Such frequencies are also heavily affected by head shadowing.
(27) Another deduction that could therefore be made is that BRIR frequencies below those that begin to interact with the outer ear are mostly affected by head shadowing, and that the attenuation properties are probably similar from head to head since head composition and size do not vary much from person to person. Again it would be the variation in distance between subjects' ears that has the biggest impact.
(28) Another deduction is that, since the shapes of outer ears are clearly different across the general population, the greatest difference between BRIRs occurs in the frequency band where the sound interacts with the outer ear. In terms of personalisation, this is the region that makes a sound room rendered with a PRIR sound realistic and that with a BRIR sound vague. Worse, listening to another person's PRIR can not only cause vagueness in the virtual loudspeaker positions but can also cause an unnaturalness in the tonality or timbre of the overall sound being heard over the headphones, i.e. they can often sound too bright or too dull.
Modifying a BRIR Using Information from a PRIR
(29) One feature of an embodiment of the invention is the facility to improve the perceived sound quality of a BRIR data set by incorporating certain information from the listener's PRIR data set into the said BRIR data set. The preferred process of incorporating this information involves the following three steps. In alternative embodiments, any one of these steps may be used on its own, or any two may be used in combination with each other.
(30) 1. Use PRIR ITD Information
(31) First, the inter-aural time delay (ITD) information in the BRIR loudspeaker data is replaced by that of the listener's equivalent PRIR loudspeaker data. An example of such ITD information is disclosed in WO 2006024850. This information preferably comprises right-ear to left-ear delay values, typically measured in fractional sample periods, for each head orientation and for each loudspeaker (or for each loudspeaker-to-head orientation). Replacing this data ensures the listener experiences virtualisation delays matched to their own head size and ear separation.
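The ITD replacement can be sketched as follows for a single loudspeaker-to-head orientation. This is a minimal illustration, not the method of WO 2006024850: it assumes the ITD is estimated from the cross-correlation peak between the two ear responses and is applied in whole samples, whereas the description refers to fractional sample periods. All function names are hypothetical.

```python
import numpy as np


def estimate_itd_samples(left, right):
    """Lag in whole samples by which `right` trails `left` (negative if it leads)."""
    xcorr = np.correlate(right, left, mode="full")
    return int(np.argmax(np.abs(xcorr))) - (len(left) - 1)


def shift(signal, samples):
    """Delay (positive) or advance (negative) a signal, zero-padding to keep its length."""
    signal = np.asarray(signal, dtype=float)
    out = np.zeros_like(signal)
    if samples >= 0:
        out[samples:] = signal[:len(signal) - samples]
    else:
        out[:samples] = signal[-samples:]
    return out


def apply_prir_itd(brir_left, brir_right, prir_left, prir_right):
    """Re-time the BRIR right-ear response so the pair carries the PRIR's ITD."""
    correction = (estimate_itd_samples(prir_left, prir_right)
                  - estimate_itd_samples(brir_left, brir_right))
    return np.asarray(brir_left, dtype=float), shift(brir_right, correction)


def impulse(position, length=64):
    """Synthetic test response: a single unit impulse."""
    x = np.zeros(length)
    x[position] = 1.0
    return x


# Synthetic example: the BRIR has an ITD of 4 samples, the listener's PRIR has 7.
brir_l, brir_r = impulse(10), impulse(14)
prir_l, prir_r = impulse(10), impulse(17)
hybrid_l, hybrid_r = apply_prir_itd(brir_l, brir_r, prir_l, prir_r)
```

Where the data set already stores the delay values separately from the impulse samples, as the description suggests, the replacement reduces to copying the PRIR delay value over the BRIR one for each matching orientation.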
(32) 2. Use PRIR HRIR Information
(33) Second, for each loudspeaker represented in a BRIR the listener should have available a personalised measurement (PRIR) of the same, or similar, loudspeaker position. The room used to make this PRIR is unimportant since only the HRIR portions of the data set are used. Referring to
(34) Referring to
(35)
(36)
(37) Although the methods of
(38) The loudspeaker-to-head orientations of the PRIR loudspeakers being used to replace the BRIR HRIR information preferably have similar orientations as the loudspeakers they are replacing, although a precise match is not necessary. Where the listener uses the method of
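The basic overwrite of the HRIR portion can be sketched as below. The sketch assumes that both responses are already aligned at their onsets and that the HRIR portion occupies the first five milliseconds, which is only the example figure given earlier in this description; the function name and parameters are hypothetical, and the optional high-pass or band-pass filtering is not shown.

```python
import numpy as np


def overwrite_hrir(brir, prir, sample_rate_hz, hrir_ms=5.0):
    """Return a hybrid response: the PRIR's HRIR portion followed by the BRIR's remainder.

    Assumes both responses start at their onset and share the same sample rate.
    """
    n = int(round(sample_rate_hz * hrir_ms / 1000.0))
    n = min(n, len(brir), len(prir))
    hybrid = np.array(brir, dtype=float, copy=True)
    hybrid[:n] = np.asarray(prir, dtype=float)[:n]
    return hybrid


# Synthetic stand-in data so the two sources are easy to tell apart.
brir = np.ones(1000)
prir = 2.0 * np.ones(500)
hybrid = overwrite_hrir(brir, prir, sample_rate_hz=48000)
```

At 48 kHz the five-millisecond example corresponds to the first 240 samples; the early reflections and reverberation that follow remain those of the BRIR room.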
(39) 3. Use PRIR Omni-Directional HRTF Information
(40) Third, while using the PRIR HRIR in this way will significantly improve the ability of the listener to properly localise the BRIR loudspeakers, the early reflections and reverberation still retain the HRTF encoding of the person, or dummy, used to make the BRIR measurement. In particular, if their pinna shape is significantly different to the listener's, the listener may perceive an unnatural timbre in the virtualised room reverberation. Fortunately, since reflections and reverberation are made up of impulses arriving simultaneously from a wide range of directions, it would appear the brain is unable to judge the accuracy of the localisation, and hence one person's binaural reverberation will often sound as much out-of-head as another person's reverberation. As such it is possible to reduce colouration through simple equalisation filtering without significantly degrading the BRIR's out-of-head performance.
(41) To implement such an equalisation it is first necessary to estimate the omni-directional HRTF for both the BRIR and PRIR data sets. With these estimations at hand one can either create an equalisation function directly by analysing the difference between the two, or set up an A-B listening apparatus that allows the listener to create one through subjective comparison. The early reflection and reverberation samples for all the BRIR virtual loudspeakers can then be filtered with this response to reduce colouration of the virtual sound room. Using the reverberation data of the BRIR and PRIR directly to calculate such omni-directional HRTFs is not desirable since the frequency responses of the rooms are also embedded in this data, and these responses, at least for the BRIR, can be assumed to be unknown. Since the only portion of a binaural room response that has not made contact with any room surface is the HRIR, this data is a better candidate. The downside of using the HRIR is that typically one has only a relatively sparse set of measurements, particularly with a BRIR data set, and therefore estimating a good omni-directional average for the BRIR HRTF will be more challenging.
(42) Fortunately many PRIR/BRIR data sets (see for example WO 2006024850) include as many as seven different loudspeakers placed around the listener and measured at three look angles (i.e. head positions with respect to the loudspeakers), resulting in as many as twenty-one different HRIR directions for each ear. This number of directions would likely produce a useful average but more would be better. Indeed it is envisaged that PRIR data set formats would be expanded in the future to include the omni-HRTF data of the subject (human or dummy) that measured the sound room. Thereafter the fixed data set would be automatically inserted into any PRIR file made by the subject for the purposes of helping other listeners automate the colouration reduction step. Although a good average would require the subject to take perhaps twenty to thirty measurements in an even 3D spread around the head, this would not be overly onerous as it would only need to be undertaken once and stored for future use. In addition, since the main area of interest is the average HRIR colouration caused by the pinna, such measurements can, if desired, involve a small speaker or tweeter and effectively be made in any type of room without reducing the effectiveness of the data.
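The direct (analysis-based) route to the equalisation function can be sketched as below. Several choices here are assumptions made for illustration and are not specified in this description: the omni-directional HRTF is estimated as the power average of the HRIR magnitude spectra over all available directions, the equaliser is the ratio of the PRIR estimate to the BRIR estimate, and it is applied as zero-phase gains in the frequency domain (which amounts to circular filtering). The function names are hypothetical and the data is random stand-in data.

```python
import numpy as np


def omni_hrtf_magnitude(hrirs, n_fft=512):
    """Power-average the magnitude spectra of a set of HRIRs, one row per direction."""
    spectra = np.abs(np.fft.rfft(np.asarray(hrirs, dtype=float), n=n_fft, axis=-1))
    return np.sqrt(np.mean(spectra ** 2, axis=0))


def equalisation_gains(prir_hrirs, brir_hrirs, n_fft=512, floor=1e-6):
    """Per-bin gains that move the BRIR omni-directional response towards the PRIR's."""
    prir_omni = omni_hrtf_magnitude(prir_hrirs, n_fft)
    brir_omni = omni_hrtf_magnitude(brir_hrirs, n_fft)
    return prir_omni / np.maximum(brir_omni, floor)   # floor avoids division by zero


def equalise_tail(tail, gains):
    """Apply the gains to a reflections/reverberation segment of any length."""
    tail = np.asarray(tail, dtype=float)
    spectrum = np.fft.rfft(tail)
    # Interpolate the gain curve onto this segment's frequency bins (DC to Fs/2).
    src = np.linspace(0.0, 1.0, len(gains))
    dst = np.linspace(0.0, 1.0, len(spectrum))
    return np.fft.irfft(spectrum * np.interp(dst, src, gains), n=len(tail))


# Stand-in data: 21 directions per ear; the "BRIR subject" is simply 6 dB quieter.
rng = np.random.default_rng(0)
prir_hrirs = rng.standard_normal((21, 64))
brir_hrirs = 0.5 * prir_hrirs
gains = equalisation_gains(prir_hrirs, brir_hrirs)
tail = rng.standard_normal(1000)
equalised_tail = equalise_tail(tail, gains)
```

In this synthetic case every gain comes out as 2, so the equalised segment is the original scaled by 2. With real measurements the gain curve would typically be smoothed before use, since a sparse set of directions gives a noisy average.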
(43)
(44) An alternative to the steps described in
(45) The method of
(46)
(47) For example if a listener wants to modify the left-ear BRIR for the front left loudspeaker 5 then they would extract those impulse samples from the BRIR file and place it in the BRIR buffer 47. Likewise they would take the left-ear impulse samples of a PRIR front left loudspeaker and place them in the PRIR buffer 46. A left-ear equalisation filter 53 is loaded with filter coefficients generated by either the direct method
(48) Although
(49) The frequency range of the equalisation (EQ) filter 53 can be from DC to Fs/2 or it can be restricted in scope to focus on a particular region of interest. Since much of the colouration in the BRIR reflection and reverberation samples stems from the pinna of the subject that made the measurement, one mode of operation would be to operate the EQ filter, for example, over the range 3 kHz to 20 kHz. However, since colouration can also result from other larger physical features of the subject, a hard limit on the minimum frequency is not recommended. Nonetheless, as discussed earlier, if the listener is making PRIR measurements for the purpose of either using the high-passed HRIR portion to replace that in a BRIR data set or for making a collection of measurements to create an omni-directional HRTF where the low frequencies are not required, then it is possible to do so using a small loudspeaker transducer such as a tweeter or smart phone rather than a full-range loudspeaker.
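Restricting the equaliser to a region of interest can be illustrated with the short sketch below, which leaves the gains untouched inside the chosen band and sets them to unity elsewhere. The function name is hypothetical, the gain curve is assumed to be sampled uniformly from DC to Fs/2, and the abrupt band edges are a simplification; a practical equaliser would more likely taper the gains towards unity.

```python
import numpy as np


def restrict_eq_band(gains, sample_rate_hz, low_hz=3000.0, high_hz=20000.0):
    """Keep EQ gains inside [low_hz, high_hz]; use unity gain outside that range."""
    gains = np.asarray(gains, dtype=float)
    freqs = np.linspace(0.0, sample_rate_hz / 2.0, len(gains))   # bin centre frequencies
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    return np.where(in_band, gains, 1.0)


# Example: a flat 6 dB boost restricted to the 3 kHz to 20 kHz example range at 48 kHz.
full_range = np.full(257, 2.0)
restricted = restrict_eq_band(full_range, sample_rate_hz=48000)
```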
(50) Finally the hybrid BRIRs 49 are loaded into the listener's virtualiser and used to convolve audio in real-time, thereby recreating the virtual sound room over their headphones.
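The convolution step can be sketched offline as below: each loudspeaker signal is convolved with its left-ear and right-ear response and the results are summed per ear. This is a simplified illustration with hypothetical names and toy data; a real-time virtualiser would process audio in blocks rather than convolving whole signals at once.

```python
import numpy as np


def virtualise(feeds, brirs):
    """Render loudspeaker feeds to two headphone channels.

    feeds: {loudspeaker name: signal}
    brirs: {loudspeaker name: (left-ear response, right-ear response)}
    Returns an array of shape (2, length): row 0 is the left ear, row 1 the right.
    """
    length = max(len(sig) + max(len(brirs[name][0]), len(brirs[name][1])) - 1
                 for name, sig in feeds.items())
    out = np.zeros((2, length))
    for name, sig in feeds.items():
        for ear in (0, 1):
            rendered = np.convolve(sig, brirs[name][ear])
            out[ear, :len(rendered)] += rendered
    return out


# Toy two-loudspeaker example with two-tap responses.
feeds = {"L": [1.0, 0.0, 0.0], "R": [0.0, 1.0, 0.0]}
brirs = {"L": ([1.0, 0.5], [0.2, 0.1]),
         "R": ([0.3, 0.0], [1.0, 0.4])}
headphone_signal = virtualise(feeds, brirs)
```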
Modifying a PRIR Using Information from a BRIR
(51) The apparent sound quality of a room is largely dependent on the characteristics of the early reflections and reverberation. A high quality sound room will often have been designed to achieve a particular frequency response and damped reverberation characteristic. The reverberation decay rate will not be fixed across the frequency range and will normally decay faster for higher frequencies. The low frequency reverberation of a room is especially difficult to properly dampen and often requires specialised structural features to control such propagation. Consequently regular living rooms when used as a sound room will often suffer from a lack of reverberation damping, particularly in the lower registers. Hence it would be beneficial for PRIR measurements made in standard, non-treated rooms, to have their reverberation characteristics modified to follow that of a high quality sound room or studio as might be represented in a BRIR data set.
(52) While a number of alternative implementations are described below, preferred embodiments of this aspect take the listener's PRIR data set and improve the perceived quality of that virtual sound room by making its reverberation time and frequency characteristics conform to that of a BRIR data set. Rather than try to improve a non-personalised binaural room response (BRIR) as described previously, if the virtual sound room of a PRIR is of reasonable quality then it may be worthwhile to try and make it sound more like the virtual sound room of a BRIR. In this case the HRTF part of the PRIR is optimal already since it is that of the listener and does not contain any room reflections or reverberation. What may not be optimal is the reverberation frequency response and time decay characteristics of the PRIR sound room.
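One way to make a time decay characteristic conform to a reference is sketched below. It is an illustration under stated assumptions rather than the method of this description: the decay rate is estimated with Schroeder backward integration and a straight-line fit between -5 dB and -35 dB (a common room-acoustics technique substituted here for illustration), the correction is a single exponential envelope, and the whole reverberation segment is treated as one band, whereas the description works per sub-band. The function names and synthetic tails are hypothetical.

```python
import numpy as np


def decay_rate_db_per_s(tail, sample_rate_hz):
    """Slope of the energy decay curve in dB per second (negative for a decaying tail)."""
    tail = np.asarray(tail, dtype=float)
    energy = np.cumsum(tail[::-1] ** 2)[::-1]            # Schroeder backward integration
    edc_db = 10.0 * np.log10(energy / energy[0] + 1e-12)
    usable = (edc_db <= -5.0) & (edc_db >= -35.0)         # fit region, an assumed choice
    t = np.arange(len(tail)) / sample_rate_hz
    slope, _ = np.polyfit(t[usable], edc_db[usable], 1)
    return slope


def conform_decay(prir_tail, brir_tail, sample_rate_hz):
    """Reshape the PRIR tail so that its decay rate follows that of the BRIR tail."""
    prir_tail = np.asarray(prir_tail, dtype=float)
    difference = (decay_rate_db_per_s(brir_tail, sample_rate_hz)
                  - decay_rate_db_per_s(prir_tail, sample_rate_hz))
    t = np.arange(len(prir_tail)) / sample_rate_hz
    return prir_tail * 10.0 ** (difference * t / 20.0)


# Synthetic tails: an under-damped room (-60 dB/s) and a well-damped one (-120 dB/s).
fs = 8000
t = np.arange(2 * fs) / fs
prir_tail = 10.0 ** (-60.0 * t / 20.0)
brir_tail = 10.0 ** (-120.0 * t / 20.0)
conformed_tail = conform_decay(prir_tail, brir_tail, fs)
```

Because only the envelope is scaled, the fine structure of the listener's own reverberation samples is retained while the overall decay follows the reference room.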
(53) Use the BRIR Reverberation Information Directly
(54)
(55) A simplification of
(56) Use the BRIR Reverberation Information as a Subjective Reference
(57) A subjective method of modifying the PRIR reverberation to match that of the BRIR reverberation is illustrated in
(58) The envelope control 67 would typically drive some type of exponential or logarithmic function where the magnitude and sign of the power are altered by the listener. This is because room reverberation exhibits similar decay characteristics. Each time the listener adjusts the envelope control, the amplitudes of the reverberation samples in the corresponding sub-band PRIR buffer are adjusted to conform to the new exponential curve.
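The effect of such an envelope control on one sub-band buffer can be sketched as below. The mapping of the control to a value in decibels per second is an assumption made for illustration, as is the function name; a negative value steepens the decay and a positive value slows it.

```python
def apply_envelope(samples, sample_rate_hz, control_db_per_s):
    """Scale sub-band reverberation samples by an exponential curve.

    control_db_per_s is the listener's setting: its sign selects extra damping
    (negative) or a longer decay (positive), its magnitude sets how strongly.
    """
    return [s * 10.0 ** (control_db_per_s * (n / sample_rate_hz) / 20.0)
            for n, s in enumerate(samples)]


# Three samples at a 2 Hz "sample rate" so that the curve is easy to read:
# 0 dB at t = 0 s, -10 dB at t = 0.5 s and -20 dB at t = 1 s.
shaped = apply_envelope([1.0, 1.0, 1.0], sample_rate_hz=2, control_db_per_s=-20.0)
```

Each adjustment of the control would re-run this scaling on the original, unmodified sub-band samples so that successive adjustments do not accumulate.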
(59) Once the listener is satisfied with the sub-band matching, the PRIR reverberation sub-band samples are recombined into a full-band reverberation set 59 as shown in
(60) The filter-bank 55 shown in
(61) For clarification
(62) The method of
(63) Finally the hybrid BRIRs 49,
(64) It will be appreciated by those skilled in the art that there are many ways of analysing and synthesising a signal in time and frequency and that the sub-band filter bank methods of
Modifying a PRIR or BRIR for Improved Sound
(65) Another feature of an embodiment of the invention is the facility for allowing the headphone listener to override the reverberation properties of a PRIR, BRIR, equalised BRIR, hybrid PRIR or hybrid BRIR data sets, both in time and frequency, as a means of altering the perceived quality of the virtual sound room. As discussed earlier, often it is the controlled damping of the room reverberation that defines a good sound room, damping that is particularly difficult to control in regular living room environments without major structural changes to the room itself.
(66) A simplification of
(67) The filter-bank 55 can have any number of bands and be implemented in many different ways. If the number of sub-bands is relatively small, one method is to use band-pass filters deploying either IIRs or FIRs. The use of band-pass filters simplifies the design of non-uniform sub-bands 74 (
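A small FIR filter bank with non-uniform sub-bands can be sketched as below. The design is one possibility chosen for illustration and is not taken from this description: each band filter is the difference of two windowed-sinc low-pass filters, so that the bands sum to a pure delay and the response is reconstructed exactly when every sub-band gain is unity. The band edges, tap count and function names are hypothetical.

```python
import numpy as np


def lowpass(cutoff_hz, sample_rate_hz, taps=255):
    """Linear-phase windowed-sinc low-pass filter with unity gain at DC."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(2.0 * cutoff_hz / sample_rate_hz * n) * np.hamming(taps)
    return h / h.sum()


def filter_bank(edges_hz, sample_rate_hz, taps=255):
    """Return len(edges_hz) + 1 band filters whose sum is a delay of (taps - 1) / 2."""
    delta = np.zeros(taps)
    delta[(taps - 1) // 2] = 1.0
    lows = ([np.zeros(taps)]
            + [lowpass(edge, sample_rate_hz, taps) for edge in edges_hz]
            + [delta])
    # Band k passes what lies between consecutive low-pass responses.
    return [upper - lower for lower, upper in zip(lows[:-1], lows[1:])]


def modify_subbands(response, bank, gains):
    """Split into sub-bands, scale each band by its gain, and recombine."""
    return sum(g * np.convolve(response, h) for g, h in zip(gains, bank))


# Non-uniform bands: below 250 Hz, 250 Hz to 1 kHz, 1 kHz to 4 kHz, above 4 kHz.
bank = filter_bank([250.0, 1000.0, 4000.0], sample_rate_hz=48000)
rng = np.random.default_rng(1)
reverb = rng.standard_normal(100)                      # stand-in reverberation samples
unchanged = modify_subbands(reverb, bank, [1.0, 1.0, 1.0, 1.0])
damped_lows = modify_subbands(reverb, bank, [0.5, 1.0, 1.0, 1.0])
```

A per-band decay envelope could be applied in place of the fixed gains by scaling each filtered band sample by sample before the bands are summed.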
(68) The steps of
(69) It will be appreciated by those skilled in the art that there are many ways of analysing and synthesising a signal in time and frequency and that the sub-band filter bank methods of
(70) Embodiments of any aspect of the present invention may be implemented by a suitably configured digital signal processing (DSP) apparatus. The DSP apparatus may comprise hardware, firmware and/or software as is convenient. The subject matter of
(71) Aspects of the invention may be embodied in an audio system for virtualisation of a set of loudspeakers by headphones (where “headphones” is intended to embrace “ear phones”), wherein the system includes an audio virtualiser configured to transform audio loudspeaker signals into virtualised loudspeaker signals for playback over headphones, rendered using a set of binaural room impulse responses. Advantageously the binaural room impulse responses are modified as described herein or otherwise embody any of the various aspects of the present invention.
(72) Aspects of the invention may be embodied as an audio virtualiser configured to transform audio loudspeaker signals into virtualised loudspeaker signals for playback over headphones, rendered using a set of binaural room impulse responses. Advantageously the binaural room impulse responses are modified as described herein or otherwise embody any of the various aspects of the present invention. The audio virtualiser transforms audio loudspeaker signals in real time, the transformed, or virtualised, signals being rendered by the headphones to the listener in real time.
(73) It will be apparent that preferred embodiments of the invention manipulate digital room impulse responses in a way that allows the listener to better experience virtual sound rooms that they do not have the opportunity to visit in person.
(74) The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above teachings.