ACOUSTIC MEASUREMENT
20230007420 · 2023-01-05
Inventors
CPC classification
H04S2420/01
ELECTRICITY
International classification
Abstract
A method for determining subject specific digital audio data can comprise providing at least one respective audio signal input to each of a plurality of loudspeaker elements supported in a predetermined spatial relationship, in which respective locations of an effective point source of each loudspeaker element all lie in an imaginary surface that at least partially contains a spatial region where at least one aural cavity of a subject is located, thereby providing a distance between each respective location and each aural cavity of less than 1.5 meters. Responsive to at least one audio signal output from at least one of the loudspeaker elements, via at least one microphone element located at or within an aural cavity of the subject, respective subject specific audio data output is provided and is processed via an audio processing system, thereby providing subject specific digital audio data.
Claims
1. Apparatus for providing subject specific digital audio data, comprising: a plurality of loudspeakers, each responsive to at least one respective audio signal input and supported in a predetermined spatial relationship in which respective locations of an effective point source of each loudspeaker all lie in an imaginary surface that at least partially contains a spatial region where a subject comprising at least one aural cavity is locatable; at least one microphone locatable on or within an aural cavity of the subject, for providing a respective subject specific audio data output responsive to at least one physical characteristic of the subject and an audio signal output from at least one of the loudspeakers; and an audio processor configured to process the subject specific audio data output and provide subject specific digital audio data for said subject, responsive thereto, wherein a distance between each respective location and each aural cavity is less than 1.5 meters.
2. The apparatus as claimed in claim 1, wherein the subject specific digital audio data comprises data that represents a superposition of sound, from the plurality of effective point sources of the loudspeakers, at the aural cavity responsive to at least one physical characteristic of the subject.
3. The apparatus as claimed in claim 1, wherein each subject specific audio data output comprises a digital or analogue representation of a physical reverberation of an active element of the respective microphone responsive to a superposition of sound, including sound from the plurality of effective point sources of the loudspeakers, at the active element.
4. The apparatus as claimed in claim 1, wherein said distance is selected to provide a near field sound wave provided by a superposition of sound, including sound from the plurality of effective point sources of the loudspeakers, at each aural cavity.
5. The apparatus as claimed in claim 1, wherein each subject comprises at least one physical characteristic responsive to a shape and size of each aural cavity and/or a density, surface texture and/or layering of supporting flesh or flesh imitating material.
6. The apparatus as claimed in claim 1, wherein the imaginary surface comprises a hemisphere or a portion of a hemisphere or a cylinder or a portion of a cylinder or a combined surface that includes a full or partial hemisphere portion and a full or partial cylindrical portion.
7. The apparatus as claimed in claim 1, wherein the subject is a person, or a dummy mannequin, or an anthropomorphic model.
8. The apparatus as claimed in claim 1, wherein a position of at least one of the plurality of loudspeakers is adjustable responsive to a determined height of the subject.
9. The apparatus as claimed in claim 1, wherein each said respective audio signal input is representative of an impulsive input and the subject specific digital audio data comprises data representative of at least one Head Related Transfer Function (HRTF).
10. The apparatus as claimed in claim 1, wherein the predetermined spatial relationship is a spatial relationship predetermined from a regular 2-dimensional shape or a regular 3-dimensional shape.
11. A method for determining subject specific digital audio data, comprising: providing at least one respective audio signal input to each of a plurality of loudspeakers supported in a predetermined spatial relationship, in which respective locations of an effective point source of each loudspeaker all lie in an imaginary surface that at least partially contains a spatial region where at least one aural cavity of a subject is located, thereby providing a distance between each respective location and each aural cavity of less than 1.5 meters; responsive to at least one audio signal output from at least one of the loudspeakers, via at least one microphone located at or within an aural cavity of the subject, providing respective subject specific audio data output; and via an audio processing system, processing the subject specific audio data output, thereby providing subject specific digital audio data.
12. The method as claimed in claim 11, further comprising: providing the subject specific digital audio data as data that represents a superposition of sound at the aural cavity responsive to at least one physical characteristic of the subject.
13. The method as claimed in claim 11, further comprising: providing the subject specific audio data output as a digital or analogue representation of a physical reverberation of an active element of a respective microphone responsive to a superposition of sound at the active element.
14. The method as claimed in claim 11, further comprising: locating a subject that comprises a person or a dummy mannequin or an anthropomorphic model in a spatial region that is at least partially contained by an imaginary surface in which an effective point source of each loudspeaker lies.
15. The method as claimed in claim 14, further comprising: prior to or subsequent to locating the subject in the spatial region, adjusting a height of at least one loudspeaker with respect to a floor surface via which the subject is located.
16. The method as claimed in claim 11, further comprising: providing at least one near field compensated (NFC) Head Related Transfer Function (HRTF) via application of a near field compensation audio processing step to the subject specific audio data output and, optionally, modifying at least one NFC HRTF and providing at least one synthesised far-field HRTF.
17. The method as claimed in claim 11, further comprising: formatting a suitable collection of HRTFs and providing a subject specific binaural Ambisonic renderer.
18. The method as claimed in claim 11, wherein the predetermined spatial relationship is a spatial relationship predetermined from a regular 2-dimensional shape or a regular 3-dimensional shape.
19. A subject specific digital audio profile, determined from at least one analogue audio data output provided by at least one microphone located on or within at least one aural cavity of a subject, that comprises a subject specific Ambisonics renderer that modifies digital audio input data according to at least one physical characteristic of a subject and provides personalized audio data output responsive thereto, wherein: the at least one microphone is responsive to an audio signal output of at least one of a plurality of loudspeakers that are supported in a predetermined spatial relationship in which respective locations of an effective point source of each loudspeaker all lie in an imaginary surface that at least partially contains a spatial region where a subject comprising at least one aural cavity is locatable, wherein a distance between each respective location and each aural cavity is less than 1.5 meters; and the analogue audio data output is processed via a near-field compensation audio processing technique.
20. The subject specific digital audio profile as claimed in claim 19, wherein the subject specific digital audio data comprises data representative of at least one Head Related Transfer Function (HRTF).
Description
[0095] Embodiments of the present invention will now be described hereinafter, by way of example only, with reference to the accompanying drawings.
[0117] In the drawings like reference numerals refer to like parts.
[0119] Linear actuators 140 can adjust the support structure 120 to a height suitable for a person 150 to stand (or, if appropriate, sit) inside the acoustic chamber 100. A first portion 170a and a second portion 170b of the support structure 120 are each connected to the remainder of the support structure 120 via hinges, allowing the first and second portions 170a, 170b to swing outwards so that a person 150 can walk into the acoustic chamber 100.
[0120] A display 180 comprises part of a self-alignment system that gives feedback to the person 150 so that the person 150 can align themselves at a predetermined reference point in the acoustic chamber 100. The self-alignment system further comprises at least one video camera that provides a video feed to the display 180, which can be overlaid with visual instructions that tell the person 150 how to adjust their position within the chamber. Optionally, the self-alignment system further comprises at least one laser that measures the distance of a respective location of the person 150 from the laser.
[0121] At least one ear 190 of the person 150 is located within the acoustic chamber 100. Depending on the particular set of acoustic measurements that are desired, the combination of the signals transmitted by the loudspeakers 110 can generate a sweet spot centred in proximity to the centre of the head of the person 150, a sweet spot centred in proximity to the orifice of one ear 190, or two sweet spots each centred respectively in proximity to the opening of each of two ears of the person 150. Ear-locatable microphones are located on or within at least one ear 190. The ear-locatable microphones record sound transmitted by the loudspeakers 110 after the sound has been affected (e.g. via reflection, diffraction, and refraction) by the physical characteristics of the person 150. Example physical characteristics include the size, shape, and composition of the body, torso, head, facial features, and ears of the person 150. Optionally, ‘composition’ may refer to the density and/or surface texture and/or layering of flesh or flesh imitating material. The acoustic chamber 100 is of a size such that when the person 150 is aligned at the centre of the acoustic chamber 100, the loudspeakers 110 mounted to the support structure 120 are sufficiently close to the person 150 that the wave fronts of sound waves transmitted by the loudspeakers 110 are effectively non-planar. Such a distance may be referred to as ‘near field’. In the ‘near field’ of a subject, small changes in the distance of the subject to a source are perceptually relevant. Aptly, the near-field represents a region of space close to the head of a subject/listener such that the wave front curvature of a sound wave is perceptually significant.
[0122] It will be understood that instead of a person 150, a dummy mannequin or anthropomorphic model can be located in the acoustic chamber 100 and microphones can be located on or within at least one artificial ear/aural cavity. An ear or an artificial ear is an example of an aural cavity.
[0123] It will be understood that the acoustic chamber 100 shown in
[0124] It will be understood that the acoustic chamber 100 has an associated imaginary surface that the size and shape of the support structure 120 resembles. For example, the acoustic chamber as illustrated in
[0125] In
[0126] It will be understood that sound-dampening material, such as acoustic foam, can be mounted to the outside of the acoustic chamber 100 and/or between the beams of the support structure 120 and/or positioned externally to at least partially surround the acoustic chamber 100. By mounting acoustic foam to, or in proximity to, the acoustic chamber 100, external noise can be reduced, increasing the quality of acoustic measurements determined using the acoustic chamber 100.
[0131] Aptly, the acoustic chamber 100 provides apparatus for providing subject specific digital audio data. The acoustic chamber includes a plurality of loudspeaker elements 200, each of these is responsive to at least one respective audio signal input and is supported in a predetermined spatial relationship in which respective locations of an effective point source of each loudspeaker element 200 all lie in an imaginary surface that at least partially contains a spatial region where a subject 150 comprising at least one aural cavity 190 is locatable. At least one microphone element is locatable on or within an aural cavity 190 of the subject 150, for providing a respective subject specific audio data output responsive to at least one physical characteristic of the subject and an audio signal output from at least one of the loudspeaker elements 200. An audio processing element can be included for processing the subject specific audio data output and providing subject specific digital audio data for said subject 150, responsive thereto. A distance between each respective location and each aural cavity 190 is less than about 1.5 metres.
[0132] Aptly, a distance between each respective location and each aural cavity 190 is about 1.5 metres. Aptly, a distance between each respective location and each aural cavity 190 is less than about 1.45 metres; or less than about 1.4 metres; or less than about 1.35 metres; or less than about 1.3 metres; or less than about 1.25 metres; or less than about 1.2 metres; or less than about 1.15 metres; or less than about 1.1 metres; or less than about 1.05 metres; or less than about 1 metre; or less than about 0.95 metres; or less than about 0.9 metres; or is any value selected from these ranges; or any sub-range constructed from the values contained within any of these ranges. Aptly, each respective location is within the near-field of each aural cavity 190. Aptly, at least one respective location is within the near-field of each aural cavity 190. Aptly, at least two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, or sixteen respective locations are within the near-field of each aural cavity 190. Aptly, the acoustic chamber is adjustable to a height of 2 metres. Aptly, the acoustic chamber is adjustable to a height of less than 2 metres. Aptly, the acoustic chamber is adjustable to a height of up to 2 metres. Aptly, the acoustic chamber is adjustable to a height of, or above, 1 metre. Aptly, the acoustic chamber is adjustable to a height up to 1.5 metres; or up to 1.55 metres; or up to 1.6 metres; or up to 1.65 metres; or up to 1.7 metres; or up to 1.75 metres; or up to 1.8 metres; or up to 1.85 metres; or up to 1.9 metres; or up to 1.95 metres.
[0135] In step 410, the subject is aligned relative to a reference point within the acoustic measurement chamber. The reference point is determined by the predetermined relationship according to which the acoustic measurement chamber is arranged. Optionally, the reference point is at a known location relative to a predicted sweet spot that may be generated by the loudspeakers 110 of an acoustic chamber 100. At least one aural cavity (and optionally both) is located so that it is contained within an imaginary surface that contains the multiple loudspeaker effective point sources.
[0136] The alignment step 410 may involve manual assistance and/or a self-alignment system. The self-alignment system may comprise at least one display connected to at least one video camera device. Optionally, the self-alignment system comprises at least one laser. Each laser can provide measurements of the distance of a part of the subject to the respective laser. The at least one display may display real-time video footage of the subject to the subject or to an external observer. The video camera devices and the displays may also be connected to a processing unit that overlays guidance on the real-time footage, so that the subject or an external observer can more easily see the location of the head of the subject relative to the reference point. Adjusting the height of the acoustic chamber relative to the subject and aligning the subject relative to a reference point in the acoustic chamber can improve the accuracy of the acoustic measurements, and therefore the quality of the products of the audio processing of the acoustic measurements. A processing unit is a computing device capable of processing the video feeds of at least one video camera device and providing output to a display that shows real-time data to a subject or an external observer indicating a current position of the subject relative to the reference point. Optionally, a processing unit is a desktop computer, laptop computer, tablet, smartphone, server, or cloud computer. Optionally, the processing unit is capable of receiving data input, from at least one laser, that includes the distance of a part of the subject relative to the respective laser and providing output to a display responsive to the data input to aid the subject in the alignment process.
[0137] In step 420, at least one microphone element is placed on or within at least one ear or artificial ear or aural cavity of the subject.
[0138] In step 430, a first predetermined audio signal is played back through at least one of the loudspeaker elements. The predetermined audio signal may be an impulse of a particular frequency or a sinusoidal sweep of multiple frequencies inclusive thereof. A sinusoidal sweep of frequencies is an audio signal comprising a sinusoidal wave that progressively increases in frequency at a predetermined rate across a predetermined range of frequencies. Responsive to the predetermined audio signal and the physical characteristics of the subject, an audio signal (i.e. the HRTF associated with the first loudspeaker at a given location) is captured by the at least one microphone and is recorded, in a digital data form, to a memory unit. This step is then repeated for as many impulse (or signal representative thereof) and loudspeaker element (of a particular location) combinations as desired. If the predetermined audio signal is a sinusoidal sweep of multiple frequencies, there may be a further step wherein a deconvolution technique is applied to the captured audio signal to determine an impulse-equivalent response of audio stimuli to the physical characteristics of the subject. A sinusoidal sweep may also be referred to as a sine sweep. Aptly, the deconvolution technique comprises a deconvolution step whereby the recorded signal is convolved with an inverted copy of the sine sweep in order to effectively simulate an impulsive stimulus.
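The sweep playback and deconvolution described in this step can be sketched in Python (an illustrative sketch only: the exponential-sweep and inverse-filter formulation follows the widely used Farina method, and the function names are the editor's own rather than anything prescribed by the present disclosure):

```python
import numpy as np

def exp_sine_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 Hz to f2 Hz over `duration` seconds."""
    t = np.arange(int(duration * fs)) / fs
    L = duration / np.log(f2 / f1)          # sweep rate constant
    return np.sin(2.0 * np.pi * f1 * L * (np.exp(t / L) - 1.0))

def inverse_filter(sweep, f1, f2, duration, fs):
    """Time-reversed sweep with a -6 dB/octave amplitude envelope, so that
    convolving the sweep with this filter yields a band-limited impulse."""
    t = np.arange(len(sweep)) / fs
    L = duration / np.log(f2 / f1)
    return sweep[::-1] * np.exp(-t / L)

def impulse_response(recording, inv_filt):
    """Deconvolve a microphone recording of the sweep: the (head-related)
    impulse response appears concentrated around the peak of the result."""
    return np.convolve(recording, inv_filt, mode="full")
```

Convolving each ear-microphone recording with the inverse filter in this way effectively simulates the impulsive stimulus described above.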
[0139] As illustrated in
[0140] The Ambisonic audio file provides a surround-sound format that allows for the reproduction of a soundfield via an arbitrary loudspeaker layout, so long as the layout comprises a sufficient number of loudspeakers and, for a given number of loudspeakers, the loudspeakers are suitably arranged so that the signals from the loudspeakers appropriately interfere at a desired listening location. Via the steps in accordance with the present invention, a soundfield is decomposed into a component form based on the special mathematical functions known as ‘spherical harmonics’. By representing a soundfield in this way, certain transformations of the soundfield, such as rotational transformations, can be computed efficiently due to the natural mathematical symmetries of spherical harmonics.
[0141] For a given order of an Ambisonic format, it is the components of the decomposed soundfield that are decoded to generate the signals that are sent to each loudspeaker in a respective loudspeaker layout. The ‘order’ of an Ambisonics format is determined from the number of components into which a soundfield is decomposed.
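By way of illustration, a full-sphere Ambisonics format of order N comprises (N+1)^2 components, so first order has 4 components and third order has 16. A minimal Python sketch of the order/component relationship and of first-order plane-wave encoding follows (ACN channel ordering and SN3D normalisation are assumed conventions here, not requirements of the present disclosure):

```python
import numpy as np

def num_components(order):
    """A full-sphere Ambisonics format of order N has (N + 1)**2 components."""
    return (order + 1) ** 2

def encode_first_order(signal, azimuth, elevation):
    """Encode a mono signal as a first-order Ambisonic plane wave,
    B = s * Y(azimuth, elevation), in ACN order (W, Y, Z, X) with
    SN3D normalisation. Angles are in radians."""
    y = np.array([
        1.0,                                  # W (ACN 0): omnidirectional
        np.sin(azimuth) * np.cos(elevation),  # Y (ACN 1)
        np.sin(elevation),                    # Z (ACN 2)
        np.cos(azimuth) * np.cos(elevation),  # X (ACN 3)
    ])
    return np.outer(y, np.asarray(signal))
```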
[0142] Certain embodiments of the present invention provide a subject specific binaural Ambisonics renderer determined from near-field acoustic measurements of the subject. One advantage of the provided subject specific (e.g. a personal) Ambisonics renderer is that it provides a listener/user the benefit of lower-latency audio processing and a higher accuracy of sound localisation over conventional solutions. One area in which this is useful is when the head of the listener/user is being tracked in space and the head movements (e.g. rotational head movements) affect the sounds that the listener/user hears. This is useful in the context of professional computer gaming (which may also be referred to as ‘eSports’), for example, as a player who can more precisely and more quickly locate the source of an in-game sound has an advantage over their competitors.
[0147] In this context, the group delay of an audio signal is the time delay introduced during the reproduction of the audio signal into sound for the component frequencies of the audio signal.
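The definition can be checked numerically (a minimal SciPy sketch; the pure-delay FIR filter is a hypothetical example system, chosen because its group delay is the same at every component frequency):

```python
import numpy as np
from scipy.signal import group_delay

# An FIR filter that is a pure 5-sample delay: reproducing a signal through
# it delays every component frequency by exactly 5 samples.
b = np.zeros(8)
b[5] = 1.0

# gd holds the group delay in samples at each of 64 frequencies w (rad/sample)
w, gd = group_delay((b, [1.0]), w=64)
```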
[0153] In
[0154] In step 1020, a Low-Pass Filter (LPF) effect is applied to each set of intermediate HRTFs that attenuates the amplitude of frequencies in the HRTF signal above the crossover frequency, producing a first set of time-aligned HRTFs for each ear.
[0155] In step 1030, a second copy of the head-centred HRTFs is time-aligned by introducing a fixed time delay for all frequencies, producing a second set of intermediate HRTFs for each ear, where the time delay is calculated according to the location of each ear relative to the loudspeakers.
[0156] In step 1040, a High-Pass Filter (HPF) effect is applied to the second set of intermediate HRTFs that attenuates the amplitude of frequencies in the HRTF signal below the crossover frequency, producing a second set of time-aligned HRTFs for each ear.
[0157] In step 1050, the first and second sets of time-aligned HRTFs are combined for each ear, respectively, producing what is referred to as ‘hybrid HRTFs’ for each ear. These hybrid HRTFs for each ear can also be packaged together into a single set of stereophonic hybrid HRTFs. Optionally, the first and second sets of time-aligned HRTFs are combined via a linear phase crossover filter effect. It will be understood that alternative methods could be used to combine these two sets of HRTFs.
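One possible realisation of this combination step can be sketched as follows (an illustrative sketch only: the text explicitly leaves the combining method open, and the zero-phase Butterworth crossover used here is the editor's assumption, not the prescribed filter):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def hybrid_hrtf(hrtf_low, hrtf_high, crossover_hz, fs, order=4):
    """Combine two HRIRs: content below crossover_hz is taken from
    hrtf_low, content above it from hrtf_high. Zero-phase (filtfilt)
    Butterworth low/high pairs of equal order are power-complementary,
    so identical inputs are reconstructed essentially unchanged."""
    sos_lp = butter(order, crossover_hz, btype="low", fs=fs, output="sos")
    sos_hp = butter(order, crossover_hz, btype="high", fs=fs, output="sos")
    return sosfiltfilt(sos_lp, hrtf_low) + sosfiltfilt(sos_hp, hrtf_high)
```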
[0158] In
[0159] In step 1110, an incremental time delay is introduced, for example according to the curve 730, to a first copy of the head-centred HRTFs up to the crossover frequency 720, producing a first set of intermediate HRTFs for each ear. The time delay introduced is negligible for frequencies below the first-increment frequency 750. In general, the time delay introduced can be determined individually for each HRTF because each HRTF corresponds to a particular location relative to the subject measured.
[0160] In step 1120, a Low-Pass Filter (LPF) effect is applied to the intermediate HRTFs for each ear that attenuates the amplitude of frequencies in the HRTF signal above the cross over frequency, producing a first set of time-aligned HRTFs for each ear.
[0161] In step 1130, BiRADIAL HRTFs for each ear of a subject are obtained, for example according to the method steps as shown in
[0162] In step 1150, the truncated BiRADIAL HRTFs and time-aligned HRTFs are combined for each ear, respectively, producing another example of hybrid HRTFs for each ear. These hybrid HRTFs for each ear can also be packaged together to form stereophonic hybrid HRTFs. Optionally, the two sets of HRTFs are combined via a linear phase crossover filter effect. It will be understood that alternative methods could be used to combine these two sets of HRTFs.
[0164] In step 1210, the near-field time-aligned HRTFs or near-field BiRADIAL HRTFs are distance-compensated and encoded into a spherical harmonic format.
[0165] Conventionally, certain Ambisonics techniques are based on the assumption of plane wave theory: mathematically, encoding a source into spherical harmonics assumes that the source has a planar wavefront. In accordance with the present invention, for acoustic measurements taken in the near-field, which thus involve sound waves having a non-planar wavefront, near-field compensation (NFC) steps are applied so that the HRTFs are suitable for use in an Ambisonics renderer.
[0166] The Ambisonic components, $\beta_{mi}^{\sigma}$, of a plane wave signal, $s$, of incidence $(\varphi, \vartheta)$ may be defined:

$\beta_{mi}^{\sigma} = s \cdot Y_{mi}^{\sigma}(\varphi, \vartheta)$  (1)

[0167] For a (radial) point source of position $(\varphi, \vartheta, r_s)$ it is helpful to consider the near-field effect filter, $\Gamma_m$, such that:

$\beta_{mi}^{\sigma} = s \cdot \Gamma_m(r_s) \cdot Y_{mi}^{\sigma}(\varphi, \vartheta)$  (2)

$\Gamma_m(r_s) = k \cdot d_{\mathrm{ref}} \cdot h_m^{-}(k r_s) \cdot j^{-(m+1)}$  (3)
[0168] Where: $k$ is the wavenumber of the signal, $d_{\mathrm{ref}}$ is a reference distance, $h_m^{-}$ is the spherical Hankel function of degree $m$, $j$ is the imaginary unit, and $Y_{mi}^{\sigma}$ are the spherical harmonic functions.
[0174] Equation 2 can be simplified into the following form:
[0175] Whereby $F_m$ are the degree-dependent transfer functions which model the near-field effect of a signal originating from the point $(\varphi, \vartheta, r_s)$ having been measured from the origin. The filters apply a phase shift and bass-boost to sources as they approach the origin and have a greater effect on higher order components. The near-field properties of the original source and the reproduction loudspeaker are considered when applying NFC.
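Equation (3) can be evaluated directly (an illustrative Python sketch; reading $h_m^{-}$ as the spherical Hankel function of the second kind, and the roles of $k$, $d_{\mathrm{ref}}$ and $j$, are assumptions inferred from the surrounding text):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def spherical_hankel2(m, x):
    """Spherical Hankel function of the second kind:
    h_m(x) = j_m(x) - i * y_m(x)."""
    return spherical_jn(m, x) - 1j * spherical_yn(m, x)

def nfc_filter(m, freq, r_source, d_ref, c=343.0):
    """Near-field effect filter of equation (3):
    Gamma_m(r_s) = k * d_ref * h_m(k * r_s) * j**-(m + 1),
    with k the wavenumber at frequency `freq` in Hz."""
    k = 2.0 * np.pi * freq / c
    return k * d_ref * spherical_hankel2(m, k * r_source) * 1j ** -(m + 1)
```

For degree 0 the magnitude reduces to $d_{\mathrm{ref}} / r_s$, and for higher degrees the filter magnitude grows as the source approaches the origin, matching the bass-boost behaviour described above.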
[0176] In step 1220, mathematical functions representing an audio impulse source are encoded into a spherical harmonic format for a set of frequencies and are convolved with the HRTFs provided via step 1210. Interaural Time Differences (ITDs) are determined for each HRIR from the position of the subject, of whom/which the acoustic measurements were taken, relative to the loudspeakers and the predetermined spatial relationship according to which the loudspeakers are arranged.
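A common spherical-head approximation of an ITD is Woodworth's formula (an assumption for illustration only; the disclosure derives ITDs from the measured geometry rather than prescribing any particular model):

```python
import numpy as np

def woodworth_itd(azimuth, head_radius=0.0875, c=343.0):
    """Woodworth spherical-head ITD estimate, in seconds, for a distant
    source at `azimuth` radians (0 = straight ahead, pi/2 = to one side):
    ITD = (r / c) * (azimuth + sin(azimuth))."""
    return (head_radius / c) * (azimuth + np.sin(azimuth))
```

For a nominal 8.75 cm head radius this predicts an ITD of roughly 0.65 ms for a source directly to one side.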
[0177] In step 1230, after introducing time delays, synthesised far-field (time-aligned or BiRADIAL) HRTFs are derived. Optionally, the synthesised far-field HRTFs are derived in a spherical harmonic format. The synthesised far-field HRTFs might also be referred to as far-field-equivalent HRTFs.
[0178] Aptly, near-field (time-aligned or BiRADIAL) HRTFs may be encoded into spherical harmonic format in the form of a binaural Ambisonic renderer and distance compensated.
[0179] Aptly, impulse input sources may also be encoded into spherical harmonic format. These may be convolved with the encoded time-aligned or BiRADIAL HRTFs (that form part of a binaural renderer) to produce synthesised far-field time-aligned or BiRADIAL HRTFs. However, time-aligned or BiRADIAL HRTFs can occasionally be limited in their use because they may not reproduce ITDs at low frequencies. Therefore, a time delay can be reintroduced at this point. This results in head-centred synthesised far-field HRTFs. These synthesised HRTFs may then be used in an Ambisonic renderer or indeed converted to hybrid HRTFs at this point for improved reproduction accuracy.
[0180] It will be understood that synthesised far-field hybrid HRTFs may be determined in accordance with the present invention. Synthesised far-field hybrid HRTFs may be determined from near-field hybrid HRTFs that may be encoded into a spherical harmonic format and distance compensated. Impulse input sources, which may also be encoded into a spherical harmonic format, may be convolved with the near-field hybrid HRTFs to produce synthesised far-field hybrid HRTFs.
[0182] In step 1310, near-field hybrid (BiRADIAL or time-aligned) HRTFs or synthesised far-field hybrid HRTFs are determined for the specific subject.
[0183] In step 1320, where appropriate, the HRTFs provided via step 1310 are distance-compensated. The HRTFs are then integrated into a subject specific Ambisonics renderer. A subject specific Ambisonics renderer might also be referred to as a subject specific Ambisonics decoder or a subject specific Ambisonics profile.
[0184] In step 1330, the subject specific Ambisonics renderer is then provided to the user in an appropriate file format via an appropriate means, for example via electronic file transfer, email, cloud computer access, or providing headphones with the subject specific renderer inbuilt/on board.
[0185] In step 1340, the subject specific Ambisonics renderer can then be integrated into software, such as a music player, video player, web-browser, operating system, video game, video game engine, and the like, or (if appropriate) an application programming interface (API) thereof, executed on a computer, smart phone, cloud server, and the like to provide a subject specific binaural audio experience for the subject.
[0187] The steps shown in
[0188] The steps shown in
[0189] The steps shown in
[0190] The steps shown in
[0191] The steps shown in
[0192] Throughout the description and claims of this specification, the words “comprise” and “contain” and variations of them mean “including but not limited to” and they are not intended to (and do not) exclude other moieties, additives, components, integers or steps. Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
[0193] Features, integers, characteristics or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of the features and/or steps are mutually exclusive. The invention is not restricted to any details of any foregoing embodiments. The invention extends to any novel one, or novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.
[0194] The reader's attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.