CYLINDRICAL MICROPHONE ARRAY FOR EFFICIENT RECORDING OF 3D SOUND FIELDS
20170295429 · 2017-10-12
CPC classification
H04S2400/15
ELECTRICITY
G06F3/038
PHYSICS
H03G5/165
ELECTRICITY
G06F3/0488
PHYSICS
H04R2430/20
ELECTRICITY
H04R5/027
ELECTRICITY
G06F2203/0381
PHYSICS
H04S2420/11
ELECTRICITY
International classification
H04R5/027
ELECTRICITY
H04R1/28
ELECTRICITY
Abstract
Provided are methods, systems, and apparatuses for recording a three-dimensional (3D) sound field using a vertically-oriented cylindrical array with multiple circular arrays at different heights. The design of the cylindrical array is well-suited to providing a high resolution in azimuth and a reduced resolution in elevation, and offers improved performance over existing 3D sound reproduction systems. The methods, systems, and apparatuses provide a larger vertical aperture than horizontal aperture, as opposed to a spherical array, which has the same aperture for all dimensions, and further provide an alternative format to mixed-order spherical decomposition.
Claims
1. An apparatus for recording a three-dimensional sound field, the apparatus comprising: a cylindrical baffle; and a plurality of line arrays distributed around a circumference of the cylindrical baffle, each line array including microphone elements spaced apart from one another in a longitudinal direction of the cylindrical baffle, wherein each of the line arrays produces a set of vertical beamformer responses, the set of responses having a maximum response at a specified direction of arrival and specified elevation.
2. The apparatus of claim 1, wherein each set of vertical beamformer responses is processed in azimuth to produce cylindrical coefficients of the sound field at the specified elevation.
3. The apparatus of claim 1, wherein the cylindrical baffle has at least one rounded end to control diffraction effects.
4. The apparatus of claim 1, further comprising: one or more vertical beamformers to reduce diffraction effects at one or both ends of the cylindrical baffle.
5. The apparatus of claim 1, wherein the plurality of line arrays are positioned at regularly-spaced angles around the circumference of the cylindrical baffle.
6. The apparatus of claim 1, wherein the microphone elements of each line array are equally spaced apart from one another in the longitudinal direction of the cylindrical baffle.
7. The apparatus of claim 1, wherein the microphone elements of each line array are nonlinearly spaced apart from one another in the longitudinal direction of the cylindrical baffle such that a distance between adjacent microphone elements increases towards one or both ends of the array.
8. The apparatus of claim 1, wherein each microphone element is a micro-electro-mechanical system (MEMS) microphone.
9. A method for recording a three-dimensional sound field, the method comprising: receiving, at a plurality of vertical beamformers of a cylinder-shaped audio recording device, plane waves arriving at a specified elevation angle; generating, based on a decomposition in azimuth of the plane waves arriving at the specified elevation angle, cylindrical coefficients for the elevation angle; and storing the cylindrical coefficients for the elevation angle on a storage device.
10. The method of claim 9, further comprising: generating azimuthal mode decompositions; and applying a mode equalizer to each of the azimuthal mode decompositions.
11. The method of claim 10, wherein the mode equalizer applied to each of the azimuthal mode decompositions is specific to the azimuthal order and specified elevation.
12. The method of claim 10, wherein generating the azimuthal mode decompositions includes: assigning weights to a set of vertically beamformed outputs associated with the specified elevation; and combining the weighted outputs to produce the azimuthal mode decompositions.
13. The method of claim 9, wherein the plurality of vertical beamformers are configured to represent elevational information in the sound field for a specified reproduction loudspeaker array.
14. The method of claim 9, wherein the plurality of vertical beamformers are configured to represent elevational information in the sound field for a maximum resolution of sound in elevation.
15. A system for recording a three-dimensional sound field, the system comprising: system memory; at least one processor coupled to the system memory; and a non-transitory computer-readable medium associated with the at least one processor, the non-transitory medium having instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to: select plane waves arriving at a specified elevation angle at a plurality of vertical beamformers of a cylinder-shaped audio recording device, generate, based on a decomposition in azimuth of the plane waves arriving at the specified elevation angle, cylindrical coefficients for the elevation angle, and store the cylindrical coefficients for the elevation angle on a storage device.
16. The system of claim 15, wherein the at least one processor is caused to: generate azimuthal mode decompositions; and apply a mode equalizer to each of the azimuthal mode decompositions.
17. The system of claim 16, wherein the mode equalizer applied to each of the azimuthal mode decompositions is specific to the azimuthal order and specified elevation.
18. The system of claim 16, wherein the at least one processor is caused to: assign weights to a set of vertically beamformed outputs associated with the specified elevation; and combine the weighted outputs to produce the azimuthal mode decompositions.
19. The system of claim 15, wherein the plurality of vertical beamformers represent elevational information in the sound field for a specified reproduction loudspeaker array.
20. The system of claim 15, wherein the plurality of vertical beamformers represent elevational information in the sound field for a maximum resolution of sound in elevation.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0048] These and other objects, features, and characteristics of the present disclosure will become more apparent to those skilled in the art from a study of the following Detailed Description in conjunction with the appended claims and drawings, all of which form a part of this specification. In the drawings:
[0049]
[0050]
[0051]
[0052]
[0053]
[0054]
[0055]
[0056]
[0057] The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of what is claimed in the present disclosure.
[0058] In the drawings, the same reference numerals and any acronyms identify elements or acts with the same or similar structure or functionality for ease of understanding and convenience. The drawings will be described in detail in the course of the following Detailed Description.
DETAILED DESCRIPTION
[0059] Various examples and embodiments of the methods, systems, and apparatuses of the present disclosure will now be described. The following description provides specific details for a thorough understanding and enabling description of these examples. One skilled in the relevant art will understand, however, that one or more embodiments described herein may be practiced without many of these details. Likewise, one skilled in the relevant art will also understand that one or more embodiments of the present disclosure can include other features not described in detail herein. Additionally, some well-known structures or functions may not be shown or described in detail below, so as to avoid unnecessarily obscuring the relevant description.
[0060] As described above, existing approaches for spatial audio recording are either limited in their capabilities (e.g., unable to perform beamforming in elevation) or are impractical for many applications. Whereas some of the existing approaches described above are considered perception-based methods, in which only those spatial cues that are perceptually relevant are recorded, the present disclosure relates to the physical-based higher-order Ambisonics method of recording and reproducing spatial sound.
[0061] In view of the various limitations of existing approaches for spatial audio recording, embodiments of the present disclosure relate to methods, systems, and apparatuses for recording 3D sound fields using a vertically-oriented cylindrical array with multiple circular arrays at different heights. The techniques and designs described herein are well-suited to providing a high resolution in azimuth and a reduced resolution in elevation, and offer improved performance over existing 3D sound reproduction systems, which typically only have loudspeakers at two or three heights. The present disclosure provides an alternative format to the mixed-order spherical decomposition, and allows for less complex and less costly manufacture as compared with spherical arrays.
[0062] For example, one or more embodiments involve the use of low-cost silicon microphones that provide digital outputs and which can be easily interfaced to a digital processor, and subsequently to a digital storage device, without requiring a large number of analog-to-digital converters. One advantage of the methods, systems, and apparatuses described herein is that they provide a larger vertical aperture than horizontal aperture, as opposed to a spherical array, which has the same aperture for all dimensions. This is particularly relevant to loudspeaker reproduction arrays consisting of multiple rings (e.g., three) where the vertical spacing between rings is relatively small, which requires a high-resolution decomposition of the sound field in elevation, but with a small number of desired directions (e.g., three).
[0063]
where $J_m(\cdot)$ is the cylindrical Bessel function, $B_m(k_z,\omega)$ is the mth sound field expansion function, $k_z$ is the z-component of the vector wave number, and $k_R=\sqrt{k^2-k_z^2}$. The cylindrical description has a trigonometric expansion in azimuthal angle, but a continuous distribution in $k_z$. The plane wave coefficients are then a continuous function of $k_z$.
[0064] For a finite wave number $k$ and radius $R$, equation (12) may be truncated to a maximum order $M\approx\lceil kR\rceil$ in a manner similar to that used in the spherical case.
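By way of a non-limiting illustration, the truncation rule can be evaluated numerically. The sketch below assumes a speed of sound of 343 m/s and uses a radius of 0.09 m (half of the 180 mm example diameter given elsewhere in this description); the function name is hypothetical.

```python
import math


def truncation_order(frequency_hz, radius_m, speed_of_sound=343.0):
    """Return M = ceil(k * R), where k = 2*pi*f / c is the wave number.

    speed_of_sound defaults to 343 m/s, which is an assumption of this sketch.
    """
    k = 2.0 * math.pi * frequency_hz / speed_of_sound
    return math.ceil(k * radius_m)


if __name__ == "__main__":
    # Order needed at a few frequencies for a 0.09 m radius.
    for f in (500.0, 1000.0, 4000.0):
        print(f, truncation_order(f, 0.09))
```

The required order grows linearly with frequency, so the number of azimuthal modes that must be resolved is set by the highest frequency of interest.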
[0065] For an incident field consisting of plane waves arriving from an angle $\theta_i$ from the z-axis, $k_z=k\cos\theta_i$ and $k_R=k\sin\theta_i$. The integral in equation (12) can be transformed by the substitution $k_z=k\cos\theta_i$ into
[0066] For an incident field consisting of plane waves arriving solely from an angle $\theta_i$ from the z-axis, the sound pressure (equation (13)) simplifies to
If the incidence angle is $\theta_i=\pi/2$, then the plane wave distribution is z-independent and
which is the solution to the wave equation for a 2D sound field in a source-free region, with coefficients $i^m B_m(\omega)$.
[0067] The cylindrical decomposition of the sound field can be expressed in terms of amplitude modes in a similar manner to the spherical decomposition. Combining the negative and positive m terms in equation (13) gives
where $C_0(\theta,\omega)=B_0(\theta,\omega)$ and
$C_m(\theta,\omega)=B_m(\theta,\omega)+B_{-m}(\theta,\omega)$ (17)
$D_m(\theta,\omega)=i\left(B_m(\theta,\omega)-B_{-m}(\theta,\omega)\right)$ (18)
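Equations (17) and (18) can be checked numerically. The following sketch, with hypothetical function names and arbitrary test values, confirms that a pair of complex-exponential terms of order ±m equals the corresponding cosine and sine terms weighted by the amplitude-mode coefficients.

```python
import cmath
import math


def amplitude_modes(b_pos, b_neg):
    """Map (B_m, B_-m) to (C_m, D_m) following equations (17) and (18)."""
    c_m = b_pos + b_neg
    d_m = 1j * (b_pos - b_neg)
    return c_m, d_m


def exponential_pair(b_pos, b_neg, m, phi):
    """B_m * exp(i*m*phi) + B_-m * exp(-i*m*phi)."""
    return b_pos * cmath.exp(1j * m * phi) + b_neg * cmath.exp(-1j * m * phi)


def trigonometric_pair(c_m, d_m, m, phi):
    """C_m * cos(m*phi) + D_m * sin(m*phi)."""
    return c_m * math.cos(m * phi) + d_m * math.sin(m * phi)


if __name__ == "__main__":
    b_pos, b_neg = 0.3 + 0.4j, -0.2 + 0.1j  # arbitrary illustrative values
    c_m, d_m = amplitude_modes(b_pos, b_neg)
    print(exponential_pair(b_pos, b_neg, 2, 0.7))
    print(trigonometric_pair(c_m, d_m, 2, 0.7))
```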
[0068] For plane waves at elevation $\theta_i$, the amplitude mode expansion simplifies to (following equation (14))
[0069] Equation (19) provides an alternative to recording 3D sound fields using spherical harmonics, one in which the resolution of the sound field in elevation may be chosen independently of the azimuthal resolution. The field components can be determined for a set of $Q$ angles $\theta_q$, each of which has an expansion of the form of equation (19).
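As a worked sketch of how an amplitude-mode expansion follows from equations (17) and (18), assume that the single-elevation field has the form of a vertical phase factor multiplying an azimuthal series with coefficients $i^m B_m$ (the coefficients stated above for the horizontal case). The exact form and normalization of equations (14) and (19) are assumptions here, since they are not reproduced in this text.

```latex
% Assumed single-elevation field, truncated to order M:
\begin{align}
p(R,\phi,z,\omega) &= e^{ikz\cos\theta_i}\sum_{m=-M}^{M} i^{m} B_m(\theta_i,\omega)\,
    J_m(kR\sin\theta_i)\, e^{im\phi} \\
% Pair the +m and -m terms using J_{-m}(x) = (-1)^m J_m(x), so that
% i^{-m} J_{-m}(x) = i^{m} J_m(x):
i^{m} B_m J_m e^{im\phi} + i^{-m} B_{-m} J_{-m} e^{-im\phi}
  &= i^{m} J_m \left( B_m e^{im\phi} + B_{-m} e^{-im\phi} \right) \nonumber \\
  &= i^{m} J_m \left( C_m \cos m\phi + D_m \sin m\phi \right) \\
% Resulting amplitude-mode expansion (C_0 = B_0, and the m = 0 sine term vanishes):
p(R,\phi,z,\omega) &= e^{ikz\cos\theta_i}\sum_{m=0}^{M} i^{m} J_m(kR\sin\theta_i)
    \left[ C_m(\theta_i,\omega)\cos m\phi + D_m(\theta_i,\omega)\sin m\phi \right]
\end{align}
```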
[0070] In accordance with one or more embodiments described herein, the elevational decomposition may be carried out using a cylindrical microphone array consisting of a set of vertical line arrays placed around a cylinder (e.g., vertical line arrays 330 distributed around the circumference of cylindrical baffle 320 as shown in
[0071] For example, in accordance with at least one embodiment, the rounded ends (or end) of the cylinder may be hemispherical so that there is a smooth transition from the sides of the cylinder to the rounded ends, to minimize diffraction from the junction of the two. However, for a more compact implementation, one or both of the rounded ends may be flatter. It should be noted that, in most implementations, both ends of the cylinder should have rounded ends since high-order diffraction from either top or bottom can affect the mode responses of the array.
[0072] In order to record sound fields for listening by a single listener, the cylinder diameter should be, in accordance with at least one embodiment, of a size similar to, or greater than, the human head. If the diameter of the cylinder is equal to the mean human head diameter, then pairs of microphone signals from opposite sides of the array can provide an approximation to binaural recordings, which provides an alternative use of the microphone. The cylinder height should be sufficiently large so that the microphone arrays are not significantly affected by diffraction from either of the ends. For example, the cylinder may have a diameter of 180 mm and a height (including rounded ends) of 394 mm. As a second example, the cylinder diameter may be 175 mm and the cylinder height may be 450 mm. In practice, where a very compact microphone is required, the mode responses of the array will differ from the theoretical values that assume an infinite height cylinder, and in this case must be determined numerically using acoustic simulation software which implements techniques such as, but not limited to, equivalent source methods, finite difference time domain methods, or boundary element methods.
[0073] The following describes recording sound fields in the format of equation (19) in accordance with one or more embodiments of the present disclosure. It should be noted that the following description is based on an initial assumption that the cylindrical microphone array has infinite height.
[0074] The sound pressure on a rigid cylinder of radius a and infinite height has the amplitude mode form
where
is the cylinder mode response.
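For reference, the textbook result for a plane wave scattered by a rigid cylinder of infinite height is sketched below. It is assumed, not confirmed, to correspond to equation (21) up to normalization. Primes denote derivatives with respect to the argument, and the Hankel function of the first kind is assumed, which matches an $e^{-i\omega t}$ time convention.

```latex
% x = k a \sin\theta is the radial argument evaluated on the cylinder surface R = a.
\begin{align}
b_m(\theta,\omega) &= J_m(x) - \frac{J_m'(x)}{H_m^{(1)\prime}(x)}\, H_m^{(1)}(x),
    \qquad x = ka\sin\theta \\
% The Wronskian J_m(x) H_m^{(1)\prime}(x) - J_m'(x) H_m^{(1)}(x) = 2i/(\pi x) gives
  &= \frac{2i}{\pi x\, H_m^{(1)\prime}(x)}
\end{align}
```

The closed form shows that the response decays as $|H_m^{(1)\prime}(x)|$ grows, which happens at small $x$ and large $m$.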
[0075] The coefficients $C_m(\theta,\omega)$ and $D_m(\theta,\omega)$ can, in principle, be determined by multiplying the pressure (equation (20)) by $\exp(-ikz\cos\theta_i)$ and integrating over z
This selects only those components of the sound field consisting of plane waves with angle of elevation $\theta_i$.
[0076] It should be understood that, in practice, equation (22) may not be implemented since a finite number of samples of the pressure are available in z, obtained from a finite number of microphones. This limitation can be accounted for by including a general aperture weighting f(z), which may include delta functions to describe discrete arrays in z. The resulting integral over z is
where
is the vertical response produced by the integral over z at the radian frequency $\omega$. This expansion describes the output of a cylindrical microphone array which implements vertical beamforming (where beamforming refers to the design of a spatio-temporal filter that operates on the outputs of a microphone array). As understood by those skilled in the art, a beamformer is a signal processor used together with a microphone array to provide spatial filtering capabilities such as, for example, extracting a signal from a specific direction and reducing undesired interfering signals from other directions. The microphone array produces spatial samples of a propagating wave, which are then manipulated by the signal processor to produce the beamformer output signal. In at least one example, beamforming is accomplished by filtering the microphone signals and combining the outputs to extract (e.g., using constructive combining) the desired signal and reject (e.g., using destructive combining) interfering signals according to their spatial location. Depending on the particular arrangement, a beamformer can separate sources with overlapping frequency content that originate at different spatial locations. Typically, the beamforming filtering is accomplished by applying a delay to each microphone signal so that the microphone outputs are in phase for the desired location and correspondingly out of phase for other spatial locations. Amplitude weightings may also be applied to limit the effects of the finite array size and reduce side-lobes in the polar response of the beamformer. Returning to the output of the cylindrical microphone array that implements vertical beamforming, the output approximates the expansion in equation (22) if the beamformer is designed to respond primarily to $\theta=\theta_i$. The corresponding approximate sound field coefficients at the specified elevation can be found using the identities
[0077] A set of desired elevation angles $\theta_i=\theta_q$, $q=1,\ldots,Q$, can then be chosen (including the horizontal plane, $\theta_q=\pi/2$) to allow a 3D sound field to be represented as a horizontal field plus a finite number of elevational fields. This representation is a sparse plane wave approximation to the full 3D field in elevation. In at least one embodiment, the number $Q$ may be chosen to suit a reproduction array, which consists of loudspeakers at a finite number of elevational angles. In accordance with at least one other embodiment, the number $Q$ may be chosen to best represent human acuity in elevation.
[0078] In practice, the mode response function $b_m(\theta_q,\omega)$ is small at low frequencies and for large $m$, and so the equalization may be implemented using a regularized inverse
where $\lambda$ is a regularization parameter that prevents excessive gain at frequencies where $b_m(\theta_q,\omega)$ is small.
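A minimal sketch of a regularized inverse follows. It assumes the common Tikhonov form, conj(b) / (|b|² + λ), which may differ from the equation referenced above since that equation is not reproduced in this text. The function name is hypothetical.

```python
import math


def regularized_inverse(b, lam):
    """Return conj(b) / (|b|^2 + lam), an assumed Tikhonov-style inverse.

    For |b|^2 much larger than lam this approaches 1/b; as b tends to zero the
    gain is limited, with a maximum magnitude of 1 / (2 * sqrt(lam)).
    """
    b = complex(b)
    return b.conjugate() / (abs(b) ** 2 + lam)


if __name__ == "__main__":
    lam = 1e-4  # illustrative value only
    for magnitude in (1.0, 1e-1, 1e-2, 1e-3, 1e-6):
        gain = abs(regularized_inverse(magnitude, lam))
        print(magnitude, gain, "limit:", 1.0 / (2.0 * math.sqrt(lam)))
```

Larger values of λ trade accuracy of the equalized mode for lower amplification of microphone self-noise at low frequencies and high orders.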
[0079] It should be noted that mode response $b_m(\theta_q,\omega)$ may not precisely equal equation (21), since the cylinder is not infinite in height. The effect of a finite cylindrical baffle is to produce additional variations in the mode response. The mode response variations are reduced by using rounded ends on the baffle. The effects of the finite length are further reduced by the vertical beamforming, which tends to attenuate sound arriving from the ends of the baffle.
[0080] In accordance with at least one embodiment, the decomposition of each elevational sound field in azimuth is carried out using $L_\phi$ microphones in azimuth. The cylindrical microphone array thus consists of $L_\phi$ line arrays equally spaced around the cylinder, with each line array having $L_z$ elements. There are thus a total of $L_M=L_\phi L_z$ microphones.
[0081] The spatial Nyquist frequency in azimuth is obtained from the inter-microphone spacing $2\pi a/L_\phi$
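As an illustrative calculation, the sketch below assumes the usual half-wavelength sampling criterion, under which the spatial Nyquist frequency is c / (2d) for spacing d. That criterion, the 343 m/s speed of sound, and the function names are assumptions of this sketch; the radius and number of line arrays are taken from examples in this description.

```python
import math


def azimuthal_spacing(radius_m, num_line_arrays):
    """Arc length between adjacent line arrays: 2*pi*a / L_phi."""
    return 2.0 * math.pi * radius_m / num_line_arrays


def spatial_nyquist_hz(radius_m, num_line_arrays, speed_of_sound=343.0):
    """Frequency at which the spacing equals half a wavelength (assumed criterion)."""
    return speed_of_sound / (2.0 * azimuthal_spacing(radius_m, num_line_arrays))


if __name__ == "__main__":
    # 180 mm diameter cylinder with 32 line arrays.
    print(azimuthal_spacing(0.09, 32))
    print(spatial_nyquist_hz(0.09, 32))
```

Under this criterion the result is equivalent to c·L_φ / (4πa), so doubling the number of line arrays doubles the usable bandwidth in azimuth.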
[0082]
[0083]
[0084] It should also be noted that, in accordance with at least one embodiment, the microphone elements positioned on the cylindrical baffle (e.g., cylindrical baffle 320 or 420 as shown in
[0085] In at least one embodiment, the microphone has $L_\phi=32$ line arrays, and at each angle in azimuth there are $L_z=5$ microphones in elevation, at distances of, for example, ±20 and ±60 mm from the central microphone position.
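By way of illustration only, the vertical response of one such five-element line array can be sketched with a delay-and-sum beamformer. The sketch assumes free-field plane waves with pressure phase exp(i k z cos θ), ignores the baffle and its diffraction, and uses a 343 m/s speed of sound; the function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_weights(z_positions, freq_hz, theta_steer, shading=None):
    """Delay-and-sum weights w_l = a_l * exp(-i k z_l cos(theta_steer)).

    The optional amplitude shading a_l is normalized to sum to one, so the
    response in the steering direction is exactly unity.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    if shading is None:
        shading = [1.0] * len(z_positions)
    total = sum(shading)
    return [a / total * cmath.exp(-1j * k * z * math.cos(theta_steer))
            for a, z in zip(shading, z_positions)]


def vertical_response(z_positions, weights, freq_hz, theta):
    """Beamformer output for a unit plane wave at angle theta from the z-axis."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return sum(w * cmath.exp(1j * k * z * math.cos(theta))
               for w, z in zip(weights, z_positions))


if __name__ == "__main__":
    z = [-0.06, -0.02, 0.0, 0.02, 0.06]  # element heights in metres
    w = steering_weights(z, 2000.0, math.radians(60.0))
    for deg in range(0, 181, 30):
        print(deg, abs(vertical_response(z, w, 2000.0, math.radians(deg))))
```

Passing a tapered `shading` list (for example, smaller values at the outer elements) lowers the side-lobes at the cost of a wider main lobe.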
[0086] Each microphone may be, for example, a MEMS microphone that has frequency response characteristics which are well-matched (e.g., typically within ±1 dB of each other). In accordance with at least one embodiment, each MEMS microphone may have a digital output that converts the analog output of the MEMS mechanism into a digital representation of the sound pressure. In accordance with at least one embodiment, two microphone data signals may optionally be multiplexed onto a single data line, so that the total number of data lines is $L_\phi L_z/2$. The data lines are connected to a central processor unit which processes the data. In the case of a sigma-delta or pulse density modulation bitstream, the processor may down-sample and convert the bitstreams to a pulse code modulation (PCM) data format.
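A deliberately simplified sketch of the data-line count and of a bitstream-to-PCM conversion is given below. The block-averaging decimator is an assumption chosen for brevity; practical converters typically use multi-stage low-pass filtering before down-sampling. The function names are hypothetical.

```python
def data_line_count(num_line_arrays, mics_per_line):
    """Number of data lines when two microphones share each line."""
    return (num_line_arrays * mics_per_line + 1) // 2


def pdm_to_pcm(bits, decimation):
    """Convert a 1-bit stream (values 0 or 1) to PCM by block averaging.

    Each bit is mapped to -1 or +1, and every non-overlapping block of
    `decimation` bits is averaged into one PCM sample in [-1, 1].
    Trailing bits that do not fill a block are discarded.
    """
    if decimation <= 0:
        raise ValueError("decimation must be positive")
    pcm = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        block = bits[start:start + decimation]
        pcm.append(sum(2 * b - 1 for b in block) / decimation)
    return pcm


if __name__ == "__main__":
    print(data_line_count(32, 5))
    print(pdm_to_pcm([1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0], 4))
```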
[0087] In at least one embodiment, the PCM data may be directly transmitted via a serial interface, such as, for example, an Ethernet connection, to a computing device that receives and stores the microphone signals. The computing device may apply $Q$ vertical beamformers to each line array, producing $Q$ outputs for each line array which represent the sound arriving at the microphone from elevation angle $\theta_q$. The $Q$ outputs are then further decomposed into $2M+1$ azimuthal modes. Each mode has a mode equalizer (EQ) applied to produce the desired mixed-order Ambisonics representation.
[0088] In accordance with one or more other embodiments, the digital processor within the microphone array applies $Q$ vertical beamformers to each line array, producing $Q$ outputs for each line array, each of which represents the sound arriving at the microphone from elevation angle $\theta_q$. The $Q$ outputs are then further decomposed into $2M+1$ azimuthal modes. Each mode has a mode equalizer applied to produce the desired mixed-order Ambisonics representation. The $Q(2M+1)$ signals may then be transmitted via a serial interface, such as, for example, an Ethernet connection, to a computing device that receives and stores the elevational Ambisonics signals.
[0089]
[0090] The $L_\phi L_z$ microphones (which may be, for example, MEMS microphones) preferably produce serial digital outputs (530) which are connected to a digital processor 520. The processor 520 processes the microphone signals (which may optionally include multiplexing the signals (e.g., using optional multiplexer 560)) into a single data line (540) (which may itself consist of two or more serial data lines) that is fed to a digital data recording device 550 (e.g., a computer). The processor 520 may optionally apply delays directly to the serial sigma-delta or pulse density bitstreams to implement vertical beamforming, and then carries out serial data conversion to a multi-bit format. Amplitude shading may then be applied to the multi-bit signals to further control the vertical beamforming response.
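One way the bitstream delays might be computed is sketched below. It assumes that θ is measured from the z-axis to the direction from which the wave arrives (so higher elements receive the wavefront earlier when θ is below 90°), a 343 m/s speed of sound, and a 3.072 MHz bitstream rate. All three are assumptions of this sketch, as is the function name.

```python
import math


def steering_delays_samples(z_positions, theta_steer, sample_rate_hz,
                            speed_of_sound=343.0):
    """Non-negative integer delays (in bitstream samples) per element.

    An element at height z hears a wave arriving from angle theta_steer earlier
    by z*cos(theta_steer)/c than an element at z = 0, so it is delayed by that
    amount. The smallest delay is subtracted so that all delays are causal.
    """
    raw = [z * math.cos(theta_steer) / speed_of_sound for z in z_positions]
    offset = min(raw)
    return [round((t - offset) * sample_rate_hz) for t in raw]


if __name__ == "__main__":
    z = [-0.06, -0.02, 0.0, 0.02, 0.06]  # element heights in metres
    for deg in (0.0, 60.0, 90.0):
        print(deg, steering_delays_samples(z, math.radians(deg), 3_072_000))
```

Because the bitstream rate is far above the audio band, rounding to whole samples gives fine delay resolution without fractional-delay filters.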
[0091]
[0092] The $Q$ beamformer outputs are further processed to produce azimuthal mode decompositions, typically by weighting the set of $L_\phi$ vertically beamformed outputs associated with elevation $q$, at line array angles $\phi_l$, by weights of the form $\cos(m\phi_l)$ and $\sin(m\phi_l)$ and adding to produce a single mode response signal for the mth mode. Each mode response signal is then equalized by a mode equalizer specific to the azimuthal order $m$ and to the elevation $q$, for example as given in equation (31). In accordance with at least one embodiment, the mode equalizer may be implemented as a finite impulse response (FIR) digital filter that produces a frequency response close to the response in equation (31). This FIR filter can be designed by methods known to those skilled in the art, such as, for example, representing equation (31) numerically at discrete frequencies and then using an inverse Fourier transform to produce a discrete FIR impulse response, or using a least-squares design method.
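The frequency-sampling design route mentioned above can be sketched as follows. The desired equalizer response is assumed to be supplied already sampled at uniformly spaced frequencies from zero to half the sampling rate; the function name and the choice to delay the filter by half its length are assumptions of this sketch.

```python
import cmath
import math


def fir_from_frequency_samples(half_spectrum):
    """Design a real, causal FIR filter by frequency sampling.

    half_spectrum holds the desired response at bins 0..N/2 of an even-length
    N-point DFT. Returns N real taps, delayed by N/2 samples.
    """
    n_half = len(half_spectrum) - 1
    if n_half < 1:
        raise ValueError("need at least two frequency samples")
    n = 2 * n_half
    spectrum = [complex(h) for h in half_spectrum]
    # DC and Nyquist bins must be real for a real impulse response.
    spectrum[0] = complex(spectrum[0].real, 0.0)
    spectrum[n_half] = complex(spectrum[n_half].real, 0.0)
    # Fill the upper bins by Hermitian symmetry: H[N - k] = conj(H[k]).
    spectrum += [spectrum[n_half - j].conjugate() for j in range(1, n_half)]
    # Inverse DFT evaluated directly (O(N^2), adequate for a sketch).
    taps = []
    for t in range(n):
        acc = sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                  for k in range(n))
        taps.append(acc.real / n)
    # Rotate by N/2 so the impulse response is causal and centred.
    return taps[n_half:] + taps[:n_half]


if __name__ == "__main__":
    # A flat desired response yields a pure delay of N/2 samples.
    print(fir_from_frequency_samples([1.0, 1.0, 1.0, 1.0, 1.0]))
```

In use, `half_spectrum` would hold the regularized mode equalizer evaluated at each bin frequency; applying a window to the returned taps is a common refinement that is omitted here.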
[0093]
[0094] At block 705, plane waves arriving at a specified elevation angle may be received (e.g., selected) at a plurality (e.g., set) of vertical beamformers (e.g., vertical beamformers 620 as shown in
[0095] At block 710, cylindrical coefficients may be generated for the specified elevation angle using (e.g., based on) a decomposition in azimuth of the plane waves arriving at the specified elevation angle. In accordance with at least one embodiment described herein, block 710 may also include applying a mode equalizer to each of the azimuthal mode decompositions. The mode equalizer applied to each of the azimuthal mode decompositions may, for example, be specific to the azimuthal order and specified elevation. In addition, the azimuthal mode decompositions may be generated by assigning weights to a set of vertically beamformed outputs associated with the specified elevation, and combining (e.g., adding) the weighted outputs to produce the azimuthal mode decompositions.
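The weighting and combining step of block 710 can be illustrated with cosine and sine weights at equally spaced line-array angles. The sketch assumes the beamformed outputs for one elevation are given as one value per line array at a single frequency, and it normalizes so that a unit-amplitude cos(mφ) pattern returns a coefficient of one. The function name and normalization are assumptions; mode equalization is not included.

```python
import math


def azimuthal_modes(samples, max_order):
    """Decompose samples at phi_l = 2*pi*l/L into cosine and sine modes.

    Returns (c, d), where c[m] and d[m] are the cosine and sine mode
    amplitudes for m = 0..max_order. Requires L > 2 * max_order so that the
    discrete cosine and sine weights remain orthogonal.
    """
    count = len(samples)
    if 2 * max_order >= count:
        raise ValueError("need more than 2 * max_order samples")
    angles = [2.0 * math.pi * l / count for l in range(count)]
    c, d = [], []
    for m in range(max_order + 1):
        scale = 1.0 / count if m == 0 else 2.0 / count
        c.append(scale * sum(s * math.cos(m * a) for s, a in zip(samples, angles)))
        d.append(scale * sum(s * math.sin(m * a) for s, a in zip(samples, angles)))
    return c, d


if __name__ == "__main__":
    count = 32
    field = [0.5 + 2.0 * math.cos(3 * 2 * math.pi * l / count)
             - math.sin(5 * 2 * math.pi * l / count) for l in range(count)]
    c, d = azimuthal_modes(field, 7)
    print([round(x, 6) for x in c])
    print([round(x, 6) for x in d])
```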
[0096] In accordance with at least one embodiment, the example process 700 may optionally include, at block 715, storing or transmitting the coefficients (e.g., generated at block 710) in/to a storage device.
[0097] It should be noted that, in at least one embodiment, the plurality of vertical beamformers (e.g., that are part of a cylinder-shaped audio recording device, such as, for example, vertical beamformers 620 as shown in
[0098]
[0099] In a very basic configuration (801), the computing device (800) typically includes one or more processors (810) and system memory (820). A memory bus (830) can be used for communicating between the processor (810) and the system memory (820). Depending on the desired configuration, the processor (810) can be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or the like, or any combination thereof. The processor (810) can include one or more levels of caching, such as a level one cache (811) and a level two cache (812), a processor core (813), and registers (814). The processor core (813) can include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or the like, or any combination thereof. A memory controller (815) can also be used with the processor (810), or in some implementations the memory controller (815) can be an internal part of the processor (810).
[0100] Depending on the desired configuration, the system memory (820) can be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof. System memory (820) typically includes an operating system (821), one or more applications (822), and program data (824). The application (822) may include a system for recording a 3D sound field (823). In accordance with at least one embodiment of the present disclosure, the system for recording a 3D sound field (823) is designed to record and process one or more audio signals using a vertically-oriented cylindrical array with multiple circular arrays arranged at different heights, which provides a high resolution in azimuth and a reduced resolution in elevation.
[0101] Program Data (824) may include stored instructions that, when executed by the one or more processing devices, implement a system (823) and method for recording a 3D sound field. Additionally, in accordance with at least one embodiment, program data (824) may include audio signal data (825), which may relate to, for example, sound generated from a source located within some proximity of the vertically-oriented cylindrical array. In accordance with at least some embodiments, the application (822) can be arranged to operate with program data (824) on an operating system (821).
[0102] The computing device (800) can have additional features or functionality, and additional interfaces to facilitate communications between the basic configuration (801) and any required devices and interfaces.
[0103] System memory (820) is an example of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 800. Any such computer storage media can be part of the device (800).
[0104] The computing device (800) may be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a smartphone, a personal data assistant (PDA), a personal media player device, a tablet computer (tablet), a wireless web-watch device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. In addition, the computing device (800) may also be implemented as a personal computer including both laptop computer and non-laptop computer configurations, one or more servers, Internet-of-Things systems, and the like.
[0105] The foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. Insofar as such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those within the art that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In accordance with at least one embodiment, several portions of the subject matter described herein may be implemented via Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), digital signal processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers, as one or more programs running on one or more processors, as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of one skilled in the art in light of this disclosure.
[0106] In addition, those skilled in the art will appreciate that the mechanisms of the subject matter described herein are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described herein applies regardless of the particular type of non-transitory signal bearing medium used to actually carry out the distribution. Examples of a non-transitory signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
[0107] With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
[0108] Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.