Methods and apparatus for rendering audio objects
11564051 · 2023-01-24
Assignee
- Dolby Laboratories Licensing Corporation (San Francisco, CA)
- Dolby International AB (Amsterdam Zuidoost, NL)
Inventors
CPC classification
- H04S2400/15
- H04S5/005
- H04S2400/11
- H04S7/30
- H04S3/008
- H04S2400/01
International classification
- H04S7/00
- H04S3/00
Abstract
Multiple virtual source locations may be defined for a volume within which audio objects can move. A set-up process for rendering audio data may involve receiving reproduction speaker location data and pre-computing gain values for each of the virtual sources according to the reproduction speaker location data and each virtual source location. The gain values may be stored and used during “run time,” during which audio reproduction data are rendered for the speakers of the reproduction environment. During run time, for each audio object, contributions from virtual source locations within an area or volume defined by the audio object position data and the audio object size data may be computed. A set of gain values for each output channel of the reproduction environment may be computed based, at least in part, on the computed contributions. Each output channel may correspond to at least one reproduction speaker of the reproduction environment.
Claims
1. A method for rendering input audio including at least one audio object and associated metadata, wherein the metadata includes audio object size metadata and audio object position metadata corresponding to the at least one audio object, the method comprising: receiving the audio object size metadata and the audio object position metadata; determining at least one virtual audio object based on the input audio, the audio object size metadata and the audio object position metadata; determining a location of the at least one virtual audio object based on at least one of the audio object size metadata and the audio object position metadata; determining a gain of the at least one virtual audio object; and rendering the audio object to one or more speaker feeds based on the location of the at least one virtual audio object and the gain of the at least one virtual audio object.
2. The method of claim 1, further comprising: receiving reproduction environment data including reproduction speaker location data, wherein the rendering is based on the reproduction speaker location data.
3. The method of claim 2, further comprising: defining a plurality of virtual source locations according to the reproduction environment data; and computing, for each of the virtual source locations, a virtual source gain value for each of the plurality of output channels.
4. The method of claim 3, further comprising storing computed virtual source gain values in a memory system.
5. The method of claim 1, further comprising receiving reproduction environment data, wherein the reproduction environment data includes reproduction environment boundary data, further comprising: determining that the audio object size metadata includes an outside area or volume outside of a reproduction environment boundary; and applying a fade-out factor based, at least in part, on the outside area or volume.
6. The method of claim 5, further comprising: determining that the at least one audio object is within a threshold distance from a reproduction environment boundary; and providing no speaker feed signals to reproduction speakers on an opposing boundary of the reproduction environment.
7. A non-transitory medium having software stored thereon, the software including instructions for performing the method of claim 1.
8. An apparatus for rendering input audio including at least one audio object and associated metadata, wherein the metadata includes audio object size metadata and audio object position metadata corresponding to the at least one audio object, the apparatus comprising: a receiver configured to receive the audio object size metadata and the audio object position metadata; a first processor for determining at least one virtual audio object based on the input audio, the audio object size metadata and the audio object position metadata; a second processor for determining a location of the at least one virtual audio object based on at least one of the audio object size metadata and the audio object position metadata; a third processor for determining a gain of the at least one virtual audio object; and a renderer for rendering the audio object to one or more speaker feeds based on the location of the at least one virtual audio object and the gain of the at least one virtual audio object.
9. The apparatus of claim 8, further comprising: a second receiver for receiving reproduction environment data including reproduction speaker location data, wherein the rendering is based on the reproduction speaker location data.
10. The apparatus of claim 8, further comprising: a fourth processor for defining a plurality of virtual source locations according to the reproduction environment data and for computing, for each of the virtual source locations, a virtual source gain value for each of the plurality of output channels.
11. The apparatus of claim 10, wherein the fourth processor is further configured to store computed virtual source gain values in a memory system.
12. The apparatus of claim 8, further comprising a fourth processor configured to: receive reproduction environment data, wherein the reproduction environment data includes reproduction environment boundary data; determine that the audio object size metadata includes an outside area or volume outside of a reproduction environment boundary; and apply a fade-out factor based, at least in part, on the outside area or volume.
13. The apparatus of claim 12, wherein the fourth processor is further configured to: determine that an audio object is within a threshold distance from a reproduction environment boundary; and provide no speaker feed signals to reproduction speakers on an opposing boundary of the reproduction environment.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(19) Like reference numbers and designations in the various drawings indicate like elements.
DESCRIPTION OF EXAMPLE EMBODIMENTS
(20) The following description is directed to certain implementations for the purposes of describing some innovative aspects of this disclosure, as well as examples of contexts in which these innovative aspects may be implemented. However, the teachings herein can be applied in various different ways. For example, while various implementations have been described in terms of particular reproduction environments, the teachings herein are widely applicable to other known reproduction environments, as well as reproduction environments that may be introduced in the future. Moreover, the described implementations may be implemented in various authoring and/or rendering tools, which may be implemented in a variety of hardware, software, firmware, etc. Accordingly, the teachings of this disclosure are not intended to be limited to the implementations shown in the figures and/or described herein, but instead have wide applicability.
(21)
(22) The Dolby Surround 5.1 configuration includes left surround array 120 and right surround array 125, each of which includes a group of speakers that are gang-driven by a single channel. The Dolby Surround 5.1 configuration also includes separate channels for the left screen channel 130, the center screen channel 135 and the right screen channel 140. A separate channel for the subwoofer 145 is provided for low-frequency effects (LFE).
(23) In 2010, Dolby provided enhancements to digital cinema sound by introducing Dolby Surround 7.1.
(24) The Dolby Surround 7.1 configuration includes the left side surround array 220 and the right side surround array 225, each of which may be driven by a single channel. Like Dolby Surround 5.1, the Dolby Surround 7.1 configuration includes separate channels for the left screen channel 230, the center screen channel 235, the right screen channel 240 and the subwoofer 245. However, Dolby Surround 7.1 increases the number of surround channels by splitting the left and right surround channels of Dolby Surround 5.1 into four zones: in addition to the left side surround array 220 and the right side surround array 225, separate channels are included for the left rear surround speakers 224 and the right rear surround speakers 226. Increasing the number of surround zones within the reproduction environment 200 can significantly improve the localization of sound.
(25) In an effort to create a more immersive environment, some reproduction environments may be configured with increased numbers of speakers, driven by increased numbers of channels. Moreover, some reproduction environments may include speakers deployed at various elevations, some of which may be above a seating area of the reproduction environment.
(26)
(27) Accordingly, the modern trend is to include not only more speakers and more channels, but also to include speakers at differing heights. As the number of channels increases and the speaker layout transitions from a 2D array to a 3D array, the tasks of positioning and rendering sounds become increasingly difficult. To address this, the present assignee has developed various tools, as well as related user interfaces, which increase functionality and/or reduce authoring complexity for a 3D audio sound system. Some of these tools are described in detail with reference to
(28)
(29) As used herein with reference to virtual reproduction environments such as the virtual reproduction environment 404, the term “speaker zone” generally refers to a logical construct that may or may not have a one-to-one correspondence with a reproduction speaker of an actual reproduction environment. For example, a “speaker zone location” may or may not correspond to a particular reproduction speaker location of a cinema reproduction environment. Instead, the term “speaker zone location” may refer generally to a zone of a virtual reproduction environment. In some implementations, a speaker zone of a virtual reproduction environment may correspond to a virtual speaker, e.g., via the use of virtualizing technology such as Dolby Headphone™ (sometimes referred to as Mobile Surround™), which creates a virtual surround sound environment in real time using a set of two-channel stereo headphones. In GUI 400, there are seven speaker zones 402a at a first elevation and two speaker zones 402b at a second elevation, making a total of nine speaker zones in the virtual reproduction environment 404. In this example, speaker zones 1-3 are in the front area 405 of the virtual reproduction environment 404. The front area 405 may correspond, for example, to an area of a cinema reproduction environment in which a screen 150 is located, to an area of a home in which a television screen is located, etc.
(30) Here, speaker zone 4 corresponds generally to speakers in the left area 410 and speaker zone 5 corresponds to speakers in the right area 415 of the virtual reproduction environment 404. Speaker zone 6 corresponds to a left rear area 412 and speaker zone 7 corresponds to a right rear area 414 of the virtual reproduction environment 404. Speaker zone 8 corresponds to speakers in an upper area 420a and speaker zone 9 corresponds to speakers in an upper area 420b, which may be a virtual ceiling area. Accordingly, and as described in more detail in the Authoring and Rendering Application, the locations of speaker zones 1-9 that are shown in
(31) In various implementations described in the Authoring and Rendering Application, a user interface such as GUI 400 may be used as part of an authoring tool and/or a rendering tool. In some implementations, the authoring tool and/or rendering tool may be implemented via software stored on one or more non-transitory media. The authoring tool and/or rendering tool may be implemented (at least in part) by hardware, firmware, etc., such as the logic system and other devices described below with reference to
x_i(t) = g_i x(t), i = 1, . . . , N (Equation 1)
(32) In Equation 1, x_i(t) represents the speaker feed signal to be applied to speaker i, g_i represents the gain factor of the corresponding channel, x(t) represents the audio signal and t represents time. The gain factors may be determined, for example, according to the amplitude panning methods described in Section 2, pages 3-4 of V. Pulkki, Compensating Displacement of Amplitude Panned Virtual Sources (Audio Engineering Society (AES) International Conference on Virtual, Synthetic and Entertainment Audio), which is hereby incorporated by reference. In some implementations, the gains may be frequency dependent. In some implementations, a time delay may be introduced by replacing x(t) by x(t−Δt).
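Equation 1 amounts to scaling one audio signal by a per-channel gain factor. The following is a minimal sketch, with a sine/cosine pairwise panning law as one hypothetical way the gain factors might be obtained (the pan parameter and helper names are illustrative, not from the patent):

```python
import math

def speaker_feeds(x, gains):
    """Apply Equation 1: x_i(t) = g_i * x(t) for each speaker i.

    x     -- list of samples of one mono audio signal x(t)
    gains -- list of per-channel gain factors g_i, one per speaker
    Returns a list of N speaker-feed signals.
    """
    return [[g * sample for sample in x] for g in gains]

def stereo_pan_gains(pan):
    """Hypothetical sine/cosine pairwise panning law; pan in [0, 1],
    where 0 routes the signal entirely to the left channel."""
    return [math.cos(pan * math.pi / 2), math.sin(pan * math.pi / 2)]

feeds = speaker_feeds([1.0, 0.5, -0.25], stereo_pan_gains(0.0))
```

With pan = 0.0, the first (left) feed carries the signal unchanged and the second feed is silent, matching the constant-power panning law at its endpoint.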
(33) In some rendering implementations, audio reproduction data created with reference to the speaker zones 402 may be mapped to speaker locations of a wide range of reproduction environments, which may be in a Dolby Surround 5.1 configuration, a Dolby Surround 7.1 configuration, a Hamasaki 22.2 configuration, or another configuration. For example, referring to
(34)
(35) In some authoring implementations, an authoring tool may be used to create metadata for audio objects. As noted above, the term “audio object” may refer to a stream of audio data signals and associated metadata. The metadata may indicate the 3D position of the audio object, the apparent size of the audio object, rendering constraints as well as content type (e.g. dialog, effects), etc. Depending on the implementation, the metadata may include other types of data, such as gain data, trajectory data, etc. Some audio objects may be static, whereas others may move. Audio object details may be authored or rendered according to the associated metadata which, among other things, may indicate the position of the audio object in a three-dimensional space at a given point in time. When audio objects are monitored or played back in a reproduction environment, the audio objects may be rendered according to their position and size metadata according to the reproduction speaker layout of the reproduction environment.
(36)
(37) In the example shown in
(38) In the example shown in
(39)
(40) Returning to
(41) In some implementations, method 500 may include optional block 515, which involves decorrelating audio data. Block 515 may be part of a run-time process. In some such implementations, block 515 may involve convolution in the frequency domain. For example, block 515 may involve applying a finite impulse response (“FIR”) filter for each speaker feed signal.
(42) In some implementations, the processes of block 515 may or may not be performed, depending on an audio object size and/or an author's artistic intention. According to some such implementations, an authoring tool may link audio object size with decorrelation by indicating (e.g., via a decorrelation flag included in associated metadata) that decorrelation should be turned on when the audio object size is greater than or equal to a size threshold value and that decorrelation should be turned off if the audio object size is below the size threshold value. In some implementations, decorrelation may be controlled (e.g., increased, decreased or disabled) according to user input regarding the size threshold value and/or other input values.
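The size-threshold rule linking audio object size and decorrelation can be sketched as follows; the `user_override` parameter is a hypothetical stand-in for the user input mentioned above, not a name from the patent:

```python
def decorrelation_enabled(object_size, size_threshold, user_override=None):
    """Sketch of the size-linked decorrelation flag: decorrelation is
    turned on when the object size meets or exceeds the threshold and
    off below it, unless explicit user input overrides the rule."""
    if user_override is not None:
        return user_override
    return object_size >= size_threshold
```

In a metadata-driven implementation, the result of this decision would be carried as a decorrelation flag in the audio object's associated metadata.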
(43)
(44) The reproduction environment data also may include data indicating a correlation of output channels with reproduction speakers of a reproduction environment. For example, the reproduction environment may have a Dolby Surround 7.1 configuration such as that shown in
(45) In this example, block 525 involves defining virtual source locations 605 according to the reproduction environment data. The virtual source locations 605 may be defined within a virtual source volume. In some implementations, the virtual source volume may correspond with a volume within which audio objects can move. As shown in
(46) Moreover, the virtual source locations 605 may or may not be spaced uniformly within the virtual source volume 602, depending on the particular implementation. In some implementations, the virtual source locations 605 may be spaced uniformly in all directions. For example, the virtual source locations 605 may form a rectangular grid of N_x by N_y by N_z virtual source locations 605. In some implementations, the value of N may be in the range of 5 to 100. The value of N may depend, at least in part, on the number of reproduction speakers in the reproduction environment: it may be desirable to include two or more virtual source locations 605 between neighboring reproduction speaker locations.
(47) In other implementations, the virtual source locations 605 may have a first uniform spacing along the x and y axes and a second uniform spacing along the z axis. The virtual source locations 605 may form a rectangular grid of N_x by N_y by M_z virtual source locations 605. For example, in some implementations there may be fewer virtual source locations 605 along the z axis than along the x or y axes. In some such implementations, the value of N may be in the range of 10 to 100, whereas the value of M may be in the range of 5 to 10.
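A minimal sketch of such a grid, assuming a normalized unit virtual source volume [0, 1]^3 (the normalization is an assumption for illustration) and at least two points per axis:

```python
def virtual_source_grid(nx, ny, mz):
    """Build a rectangular grid of nx * ny * mz virtual source
    locations spanning a unit virtual source volume [0,1]^3.
    Spacing is uniform within each axis; the z axis may use a coarser
    count mz than the x and y axes, as described above.
    Each axis count must be at least 2."""
    def axis(n):
        # n uniformly spaced points, including both ends of the axis
        return [i / (n - 1) for i in range(n)]
    return [(x, y, z) for x in axis(nx) for y in axis(ny) for z in axis(mz)]

grid = virtual_source_grid(10, 10, 5)  # 500 virtual source locations
```

In a set-up process such as block 525, a grid like this would be built once from the reproduction environment data and then used to pre-compute per-channel virtual source gain values.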
(48) In this example, block 530 involves computing virtual source gain values for each of the virtual source locations 605. In some implementations, block 530 involves computing, for each of the virtual source locations 605, virtual source gain values for each channel of a plurality of output channels of the reproduction environment. In some implementations, block 530 may involve applying a vector-based amplitude panning (“VBAP”) algorithm, a pairwise panning algorithm or a similar algorithm to compute gain values for point sources located at each of the virtual source locations 605. In other implementations, block 530 may involve applying a separable algorithm to compute gain values for point sources located at each of the virtual source locations 605. As used herein, a “separable” algorithm is one for which the gain of a given speaker can be expressed as a product of two or more factors that may be computed separately for each of the coordinates of the virtual source location. Examples include algorithms implemented in various existing mixing console panners, including but not limited to the Pro Tools™ software and panners implemented in digital film consoles provided by AMS Neve. Some two-dimensional examples are provided below.
(49)
(50) Referring now to
(51)
G_l(x) = cos((π/2)x), if l = L, Ls
G_l(x) = sin((π/2)x), if l = R, Rs
G_l(y) = cos((π/2)y), if l = L, R
G_l(y) = sin((π/2)y), if l = Ls, Rs
(52) The overall gain is the product: G_l(x,y) = G_l(x) G_l(y). In general, these functions depend on all the coordinates of all speakers. However, G_l(x) does not depend on the y-position of the source, and G_l(y) does not depend on its x-position. To illustrate a simple calculation, suppose that the audio object position 615 is (0,0), the location of the L speaker. G_L(x) = cos(0) = 1. G_L(y) = cos(0) = 1. The overall gain is the product: G_L(x,y) = G_L(x) G_L(y) = 1. Similar calculations lead to G_Ls = G_Rs = G_R = 0.
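The four separable gain functions above can be evaluated directly. The following sketch assumes a square layout with L at (0,0), R at (1,0), Ls at (0,1) and Rs at (1,1) in normalized coordinates, mirroring the two-dimensional example:

```python
import math

def separable_gains(x, y):
    """Separable 2D panning gains G_l(x, y) = G_l(x) * G_l(y) for a
    hypothetical square layout: L(0,0), R(1,0), Ls(0,1), Rs(1,1),
    with source coordinates (x, y) normalized to [0,1] x [0,1]."""
    gx = {"L": math.cos(math.pi / 2 * x), "Ls": math.cos(math.pi / 2 * x),
          "R": math.sin(math.pi / 2 * x), "Rs": math.sin(math.pi / 2 * x)}
    gy = {"L": math.cos(math.pi / 2 * y), "R": math.cos(math.pi / 2 * y),
          "Ls": math.sin(math.pi / 2 * y), "Rs": math.sin(math.pi / 2 * y)}
    return {spk: gx[spk] * gy[spk] for spk in ("L", "R", "Ls", "Rs")}

g = separable_gains(0.0, 0.0)  # source at the L speaker position
```

This reproduces the worked example in the text: at (0,0) the L gain is 1 and the other three gains are 0.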
(53) It may be desirable to blend between different panning modes as an audio object enters or leaves the virtual reproduction environment 400a. For example, a blend of gains computed according to near-field panning methods and far-field panning methods may be applied when the audio object 610 moves from the audio object location 615 shown in
(54) Returning now to
(55)
(56) In this example, the run-time process begins with the receipt of audio reproduction data that includes one or more audio objects (block 540). The audio objects include audio signals and associated metadata, including at least audio object position data and audio object size data in this example. Referring to
(57) In this implementation, block 545 involves computing contributions from virtual sources within an area or volume defined by the audio object position data and the audio object size data. In the examples shown in
(58) In some examples, block 545 may involve retrieving, from a memory system, computed virtual source gain values for virtual source locations corresponding to an audio object position and size, and interpolating between the computed virtual source gain values. The process of interpolating between the computed virtual source gain values may involve determining a plurality of neighboring virtual source locations near the audio object position, determining computed virtual source gain values for each of the neighboring virtual source locations, determining a plurality of distances between the audio object position and each of the neighboring virtual source locations and interpolating between the computed virtual source gain values according to the plurality of distances.
(59) The process of computing contributions from virtual sources may involve computing a weighted average of computed virtual source gain values for virtual source locations within an area or volume defined by the audio object's size. Weights for the weighted average may depend, for example, on the audio object's position, the audio object's size and each virtual source location within the area or volume.
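The weighted-average computation described above can be sketched as follows; the inverse-distance weight function is a hypothetical choice for illustration, not a weighting specified by the patent:

```python
import math

def object_channel_gain(virtual_sources, weight_fn):
    """Weighted average of precomputed virtual source gain values for
    one output channel (a sketch of the contribution step of block 545).

    virtual_sources -- (location, gain) pairs for the virtual sources
                       inside the area or volume defined by the audio
                       object's position and size
    weight_fn       -- weight for each virtual source location
    """
    total_w = sum(weight_fn(loc) for loc, _ in virtual_sources)
    if total_w == 0.0:
        return 0.0
    return sum(weight_fn(loc) * g for loc, g in virtual_sources) / total_w

# Hypothetical inverse-distance weighting around an object at the origin.
obj = (0.0, 0.0, 0.0)
def inv_dist_weight(loc):
    return 1.0 / (1.0 + math.dist(loc, obj))

gain = object_channel_gain([((0.0, 0.0, 0.0), 1.0),
                            ((1.0, 0.0, 0.0), 0.0)], inv_dist_weight)
```

Here the nearer virtual source (weight 1) dominates the farther one (weight 0.5), so the averaged gain lands at 2/3 rather than the unweighted mean of 0.5.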
(60)
(61) The audio object 610 has a size indicated by the audio object volume 620b, a rectangular cross-sectional area of which is shown in
(62)
(63) Returning to
(64) The process of computing the audio object gain value for each of the plurality of output channels may involve determining a gain value g_l^size(x_o, y_o, z_o; s) for an audio object of size (s) to be rendered at location (x_o, y_o, z_o). This audio object gain value may sometimes be referred to herein as an “audio object size contribution.” According to some implementations, the audio object gain value g_l^size(x_o, y_o, z_o; s) may be expressed as:
(65) g_l^size(x_o, y_o, z_o; s) = [Σ_(x_vs, y_vs, z_vs) w(x_vs, y_vs, z_vs; x_o, y_o, z_o; s) g_l(x_vs, y_vs, z_vs)^p]^(1/p) (Equation 2)
(66) In Equation 2, (x_vs, y_vs, z_vs) represents a virtual source location, g_l(x_vs, y_vs, z_vs) represents the gain value for channel l for the virtual source location (x_vs, y_vs, z_vs), and w(x_vs, y_vs, z_vs; x_o, y_o, z_o; s) represents a weight for that gain value based, at least in part, on the audio object's position (x_o, y_o, z_o) and size (s).
(67) In some examples, the exponent p may have a value between 1 and 10. In some implementations, p may be a function of the audio object size s. For example, if s is relatively large, p may in some implementations be relatively small. According to some such implementations, p may be determined as follows:
p = 6, if s ≤ 0.5
p = 6 − 4(s − 0.5)/(s_max − 0.5), if s > 0.5,
wherein s_max corresponds to the maximum value of an internal scaled-up size s_internal (described below), and wherein an audio object size s = 1 may correspond with an audio object having a size (e.g., a diameter) equal to a length of one of the boundaries of the reproduction environment (e.g., equal to the length of one wall of the reproduction environment).
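As a sketch, the size contribution of Equation 2 and the size-dependent exponent p can be written out as below. The default s_max of 2.8 and the linear reading of the piecewise p formula are taken from the surrounding text; treat them as one plausible interpretation rather than the definitive implementation:

```python
def exponent_p(s, s_max=2.8):
    """Size-dependent exponent: p = 6 for s <= 0.5, decreasing linearly
    toward 2 as s approaches s_max (one reading of the piecewise rule)."""
    if s <= 0.5:
        return 6.0
    return 6.0 - 4.0 * (s - 0.5) / (s_max - 0.5)

def size_gain(weighted_sources, p):
    """Equation 2 sketch: g_l^size = (sum over virtual sources of
    w * g**p) ** (1/p), where `weighted_sources` holds (w, g) pairs for
    the virtual sources inside the object's area or volume."""
    return sum(w * g ** p for w, g in weighted_sources) ** (1.0 / p)

# A single fully weighted virtual source with gain 0.5, using p = 2
g = size_gain([(1.0, 0.5)], 2.0)
```

With a single source of weight 1, the p-norm collapses to the source's own gain, which makes the degenerate case easy to check.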
(68) Depending in part on the algorithm(s) used to compute the virtual source gain values, it may be possible to simplify Equation 2 if the virtual source locations are uniformly distributed along each axis and if the weight functions and the gain functions are separable, e.g., as described above. If these conditions are met, then g_l(x_vs, y_vs, z_vs) may be expressed as g_lx(x_vs) g_ly(y_vs) g_lz(z_vs), wherein g_lx(x_vs), g_ly(y_vs) and g_lz(z_vs) represent independent gain functions of the x, y and z coordinates of a virtual source's location.
(69) Similarly, w(x_vs, y_vs, z_vs; x_o, y_o, z_o; s) may factor as w_x(x_vs; x_o; s) w_y(y_vs; y_o; s) w_z(z_vs; z_o; s), wherein w_x(x_vs; x_o; s), w_y(y_vs; y_o; s) and w_z(z_vs; z_o; s) represent independent weight functions of the x, y and z coordinates of a virtual source's location. One such example is shown in
(70) If w(x_vs, y_vs, z_vs; x_o, y_o, z_o; s) can be factored as w_x(x_vs; x_o; s) w_y(y_vs; y_o; s) w_z(z_vs; z_o; s),
(71) Equation 2 simplifies to: g_l^size(x_o, y_o, z_o; s) = [f_l^x(x_o; s) f_l^y(y_o; s) f_l^z(z_o; s)]^(1/p),
(72) wherein
(73) f_l^x(x_o; s) = Σ_(x_vs) w_x(x_vs; x_o; s) g_lx(x_vs)^p, f_l^y(y_o; s) = Σ_(y_vs) w_y(y_vs; y_o; s) g_ly(y_vs)^p, and f_l^z(z_o; s) = Σ_(z_vs) w_z(z_vs; z_o; s) g_lz(z_vs)^p.
(74) The functions f may contain all the required information regarding the virtual sources. If the possible object positions are discretized along each axis, one can express each function f as a matrix. Each function f may be pre-computed during the set-up process of block 505 (see
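If object positions are discretized along each axis, each per-axis function f can be tabulated once during set-up and reused at run time. A toy sketch, in which the weight and gain callables are hypothetical stand-ins for the per-axis functions described above:

```python
def precompute_f(object_positions, vs_axis, w_axis, g_axis, p):
    """Tabulate one per-axis function (e.g. f_l^x) over discretized
    object positions: f(x_o) = sum over virtual-source coordinates of
    w_axis(x_vs, x_o) * g_axis(x_vs) ** p.

    object_positions -- discretized object coordinates along one axis
    vs_axis          -- virtual source coordinates along the same axis
    w_axis, g_axis   -- hypothetical per-axis weight and gain callables
    """
    return {x_o: sum(w_axis(x_vs, x_o) * g_axis(x_vs) ** p
                     for x_vs in vs_axis)
            for x_o in object_positions}

# Toy check with a constant weight of 1 and a constant gain of 0.5:
# each tabulated value is just len(vs_axis) * 0.5**p.
table = precompute_f([0.0, 0.5, 1.0], [0.0, 0.5, 1.0],
                     lambda x_vs, x_o: 1.0, lambda x_vs: 0.5, 2.0)
```

At run time, the three per-axis lookups would be multiplied and raised to 1/p, as in the simplified form of Equation 2.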
(75) In some implementations, the audio object size contribution g_l^size may be combined with the “audio object neargain” result for the audio object position. As used herein, the “audio object neargain” is a computed gain that is based on the audio object position 615. The gain computation may be made using the same algorithm used to compute each of the virtual source gain values. According to some such implementations, a cross-fade calculation may be performed between the audio object size contribution and the audio object neargain result, e.g., as a function of audio object size. Such implementations may provide smooth panning and smooth growth of audio objects, and may allow a smooth transition between the smallest and the largest audio object sizes. In one such implementation,
g_l^total(x_o, y_o, z_o; s) = α(s) g_l^neargain(x_o, y_o, z_o; s) + β(s) g̃_l^size(x_o, y_o, z_o; s), wherein
α = cos((s/s_xfade)(π/2)) and β = sin((s/s_xfade)(π/2)), if s < s_xfade,
α = 0 and β = 1, if s ≥ s_xfade,
and wherein g̃_l^size represents the normalized version of the previously computed g_l^size. In some such implementations, s_xfade = 0.2. However, in alternative implementations, s_xfade may have other values.
(76) According to some implementations, the audio object size value may be scaled up in the larger portion of its range of possible values. In some authoring implementations, for example, a user may be exposed to audio object size values s_user ∈ [0, 1], which are mapped to a larger range [0, s_max] actually used by the algorithm, wherein s_max > 1. This mapping may ensure that when the size is set to its maximum by the user, the gains become truly independent of the object's position. According to some such implementations, such mappings may be made according to a piece-wise linear function that connects pairs of points (s_user, s_internal), wherein s_user represents a user-selected audio object size and s_internal represents the corresponding audio object size determined by the algorithm. According to some such implementations, the mapping may be made according to a piece-wise linear function that connects the points (0, 0), (0.2, 0.3), (0.5, 0.9), (0.75, 1.5) and (1, s_max). In one such implementation, s_max = 2.8.
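A sketch of the cross-fade weights and the user-size mapping described above, using the s_xfade = 0.2 value, the listed breakpoints and s_max = 2.8 from the text as defaults:

```python
import math

def crossfade_weights(s, s_xfade=0.2):
    """Alpha/beta cross-fade between the audio object neargain result
    and the (normalized) size contribution, as a function of size s."""
    if s >= s_xfade:
        return 0.0, 1.0
    t = (s / s_xfade) * (math.pi / 2)
    return math.cos(t), math.sin(t)

def map_user_size(s_user, s_max=2.8):
    """Piece-wise linear mapping from the user-facing size in [0, 1]
    to the internal scaled-up size, through the listed breakpoints."""
    pts = [(0.0, 0.0), (0.2, 0.3), (0.5, 0.9), (0.75, 1.5), (1.0, s_max)]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if s_user <= x1:
            return y0 + (y1 - y0) * (s_user - x0) / (x1 - x0)
    return s_max

alpha, beta = crossfade_weights(0.0)  # pure neargain for a point source
```

Note that α and β are the cosine and sine of the same angle, so the cross-fade preserves constant power across the transition.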
(77)
(78) For aesthetic reasons, it may be desirable to modify audio object gain calculations for audio objects that are approaching a boundary of a reproduction environment. In
(79) In the example shown in
(80)
(81) In this implementation, block 915 involves determining that an audio object area or volume, defined by the audio object position data and the audio object size data, includes an outside area or volume outside of a reproduction environment boundary. Block 915 also may involve determining what proportion of the audio object area or volume is outside the reproduction environment boundary.
(82) In block 920, a fade-out factor is determined. In this example, the fade-out factor may be based, at least in part, on the outside area. For example, the fade-out factor may be proportional to the outside area.
(83) In block 925, a set of audio object gain values may be computed for each of a plurality of output channels based, at least in part, on the associated metadata (in this example, the audio object position data and the audio object size data) and the fade-out factor. Each output channel may correspond to at least one reproduction speaker of the reproduction environment.
(84) In some implementations, the audio object gain computations may involve computing contributions from virtual sources within an audio object area or volume. The virtual sources may correspond with a plurality of virtual source locations that may be defined with reference to the reproduction environment data. The virtual source locations may or may not be spaced uniformly. For each of the virtual source locations, a virtual source gain value may be computed for each of the plurality of output channels. As described above, in some implementations these virtual source gain values may be computed and stored during a set-up process, then retrieved for use during run-time operations.
(85) In some implementations, the fade-out factor may be applied to all virtual source gain values corresponding to virtual source locations within a reproduction environment. In some implementations, g_l^size may be modified as follows:
g_l^size = [g_l^bound + (fade-out factor) × g_l^inside]^(1/p), wherein
fade-out factor = 1, if d_bound ≥ s,
fade-out factor = d_bound/s, if d_bound < s,
wherein d_bound represents the minimum distance between an audio object location and a boundary of the reproduction environment, and g_l^bound represents the contribution of virtual sources along a boundary. For example, referring to
(86) In alternative implementations, g_l^size may be modified as follows:
g_l^size = [g_l^outside + (fade-out factor) × g_l^inside]^(1/p),
wherein g_l^outside represents audio object gains based on virtual sources located outside of a reproduction environment but within an audio object area or volume. For example, referring to
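The boundary fade-out rule above can be sketched as follows. The assumption that g_l^bound and g_l^inside are the pre-exponentiated sums of Equation 2 (so that only the outer 1/p exponent is applied) is one reading of the modified formula:

```python
def fade_out_factor(d_bound, s):
    """Fade-out factor from the rule above: no attenuation while the
    object's distance to the nearest boundary (d_bound) is at least its
    size s; linear fade as the object overlaps the boundary."""
    return 1.0 if d_bound >= s else d_bound / s

def faded_size_gain(g_bound, g_inside, d_bound, s, p):
    """g_l^size = [g_l^bound + factor * g_l^inside]^(1/p). The bracketed
    terms are assumed (hypothetically) to be the pre-exponentiated sums
    of weighted virtual source gains from Equation 2."""
    return (g_bound + fade_out_factor(d_bound, s) * g_inside) ** (1.0 / p)
```

For example, an object half overlapping the boundary (d_bound = s/2) has its interior contribution halved before the outer exponent is applied.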
(87)
(88) The device 1000 includes a logic system 1010. The logic system 1010 may include a processor, such as a general purpose single- or multi-chip processor. The logic system 1010 may include a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, or combinations thereof. The logic system 1010 may be configured to control the other components of the device 1000. Although no interfaces between the components of the device 1000 are shown in
(89) The logic system 1010 may be configured to perform audio authoring and/or rendering functionality, including but not limited to the types of audio authoring and/or rendering functionality described herein. In some such implementations, the logic system 1010 may be configured to operate (at least in part) according to software stored in one or more non-transitory media. The non-transitory media may include memory associated with the logic system 1010, such as random access memory (RAM) and/or read-only memory (ROM). The non-transitory media may include memory of the memory system 1015. The memory system 1015 may include one or more suitable types of non-transitory storage media, such as flash memory, a hard drive, etc.
(90) The display system 1030 may include one or more suitable types of display, depending on the manifestation of the device 1000. For example, the display system 1030 may include a liquid crystal display, a plasma display, a bistable display, etc.
(91) The user input system 1035 may include one or more devices configured to accept input from a user. In some implementations, the user input system 1035 may include a touch screen that overlays a display of the display system 1030. The user input system 1035 may include a mouse, a track ball, a gesture detection system, a joystick, one or more GUIs and/or menus presented on the display system 1030, buttons, a keyboard, switches, etc. In some implementations, the user input system 1035 may include the microphone 1025: a user may provide voice commands for the device 1000 via the microphone 1025. The logic system may be configured for speech recognition and for controlling at least some operations of the device 1000 according to such voice commands.
(92) The power system 1040 may include one or more suitable energy storage devices, such as a nickel-cadmium battery or a lithium-ion battery. The power system 1040 may be configured to receive power from an electrical outlet.
(93)
(94) The system 1100 may, for example, include an existing authoring system, such as a Pro Tools™ system, running a metadata creation tool (i.e., a panner as described herein) as a plugin. The panner could also run on a standalone system (e.g., a PC or a mixing console) connected to the rendering tool 1110, or could run on the same physical device as the rendering tool 1110. In the latter case, the panner and renderer could use a local connection, e.g., through shared memory. The panner GUI could also be provided on a tablet device, a laptop, etc. The rendering tool 1110 may comprise a rendering system that includes a sound processor that is configured for executing rendering methods like the ones described in
(95)
(96) Various modifications to the implementations described in this disclosure may be readily apparent to those having ordinary skill in the art. The general principles defined herein may be applied to other implementations without departing from the spirit or scope of this disclosure. Thus, the claims are not intended to be limited to the implementations shown herein, but are to be accorded the widest scope consistent with this disclosure, the principles and the novel features disclosed herein.