Spatial sound reproduction using multichannel loudspeaker systems
11302339 · 2022-04-12
Assignee
Inventors
- Mikko-Ville Laitinen (Espoo, FI)
- Juha Tapio Vilkamo (Helsinki, FI)
- Tapani Pihlajakuja (Vantaa, FI)
- Antti Johannes Eronen (Tampere, FI)
CPC classification
- H04S2400/15 (Electricity)
- H04S2420/03 (Electricity)
- H04S2400/11 (Electricity)
- H04S7/30 (Electricity)
- G10L19/02 (Physics)
- H04S2420/11 (Electricity)
- H04S7/00 (Electricity)
- H04R2205/024 (Electricity)
International classification
- G10L19/02 (Physics)
- H04S7/00 (Electricity)
- H03M7/30 (Electricity)
Abstract
An apparatus for spatial audio signal decoding associated with a plurality of speaker nodes (201, 203, 205, 207, 209) placed within a three dimensional space, the apparatus comprising at least one processor and at least one memory including a computer program code. The at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to determine a non-overlapping virtual surface arrangement (400), the virtual surface arrangement (400) comprising a plurality of virtual surfaces (421, 423, 431, 433) with corners positioned at at least three speaker nodes of the plurality of speaker nodes (201, 203, 205, 207, 209) and sides connecting pairs of corners configured to be non-intersecting with at least one defined virtual plane within the three dimensional space. The apparatus is further caused to generate gains for the speaker nodes based on the determined virtual surface arrangement and apply the gains to at least one audio signal, the at least one audio signal to be positioned within the three dimensional space.
Claims
1. An apparatus for spatial audio signal decoding associated with a plurality of speaker nodes placed within a three dimensional space, the apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: determine a non-overlapping virtual surface arrangement, wherein the non-overlapping virtual surface arrangement comprises a plurality of virtual surfaces with corners positioned at at least three speaker nodes of the plurality of speaker nodes and sides connecting pairs of corners configured to be non-intersecting with at least one defined virtual plane within the three dimensional space, wherein the non-overlapping virtual surface arrangement is determined by the apparatus being configured to: divide the plurality of speaker nodes into parts, wherein at least one part comprises speaker nodes located to one side of, or on, the defined virtual plane, and at least one further part comprises speaker nodes located to the other side of, or on, the defined virtual plane; determine virtual surface edges for the at least one part and virtual surface edges for the at least one further part; and combine the virtual surface edges for the at least one part and the virtual surface edges for the at least one further part to generate the plurality of virtual surfaces and therefore determine the non-overlapping virtual surface arrangement; generate gains for the plurality of speaker nodes based on the non-overlapping virtual surface arrangement; and apply the gains to at least one audio signal, the at least one audio signal to be positioned within the three dimensional space, wherein the apparatus, when dividing the speaker nodes into parts, is configured to divide the speaker nodes into an additional part comprising speaker nodes located on or close to the defined virtual plane, and wherein the apparatus, when determining the virtual surface edges for the at least one part, is further configured to virtually connect speaker nodes for the at least one part comprising speaker nodes located on or close to the defined virtual plane.
2. The apparatus as claimed in claim 1, wherein the apparatus caused to determine the virtual surface edges is further caused to: determine candidate virtual surface edges based on selecting pairs of speaker nodes from the plurality of speaker nodes; and process the candidate virtual surface edges to determine the virtual surface edges.
3. The apparatus as claimed in claim 2, wherein the apparatus caused to process the candidate virtual surface edges to determine the virtual surface edges is further caused to: select one of the candidate virtual surface edges until all of the candidate virtual surface edges are selected; discard the one of the candidate virtual surface edges when the one of the candidate virtual surface edges crosses the at least one defined virtual plane; discard the one of the candidate virtual surface edges when the one of the candidate virtual surface edges has any speaker node substantially behind, with respect to a defined observation point, the one of the candidate virtual surface edges; and determine whether the one of the candidate virtual surface edges intersects with at least one previous candidate virtual surface edge and discard the at least one previous candidate virtual surface edge when the previous candidate virtual surface edge was longer and otherwise discard the one of the candidate virtual surface edges.
4. The apparatus as claimed in claim 3, wherein the apparatus caused to determine whether the one of the candidate virtual surface edges intersects with at least one previous candidate virtual surface edge and discard the at least one previous candidate virtual surface edge when the previous candidate virtual surface edge was longer and otherwise discard the one of the candidate virtual surface edges is caused to apply at least one weighting to a length of one or the other of the candidate virtual surface edges and/or the previous candidate virtual surface edge prior to a comparison.
5. The apparatus as claimed in claim 1, further caused to add at least one virtual speaker node at a defined location.
6. The apparatus as claimed in claim 5, wherein the apparatus caused to add at least one virtual speaker node at a defined location is caused to perform at least one of: add a virtual speaker node directly at the top of a sphere defined by the speaker nodes; add a virtual speaker node directly at the bottom of a sphere defined by the speaker nodes; and add at least one virtual speaker node where speaker nodes are determined to be sparse.
7. The apparatus as claimed in claim 1, further caused to move at least one speaker node to the at least one defined virtual plane when the at least one speaker node is within a defined threshold distance from the at least one defined virtual plane.
8. The apparatus as claimed in claim 1, further caused to remove at least one speaker node when the at least one speaker node is within a defined threshold distance of at least one further speaker node.
9. The apparatus as claimed in claim 1, further caused to merge at least two speaker nodes when the at least two speaker nodes are within a defined threshold distance of each other.
10. The apparatus as claimed in claim 1, wherein the at least one defined virtual plane is at least one of: a horizontal plane with an elevation of zero; a horizontal plane linking at least two or more speaker nodes; a plane linking at least two or more speaker nodes; and a plane defined by analysis of the at least one audio signal.
11. The apparatus as claimed in claim 1, wherein the speaker nodes comprise at least one of: a physical loudspeaker; a virtual loudspeaker; and a binaural filter node.
12. A method comprising: determining a non-overlapping virtual surface arrangement in a three dimensional space including a plurality of speaker nodes, the non-overlapping virtual surface arrangement comprising a plurality of virtual surfaces with corners positioned at at least three speaker nodes of the plurality of speaker nodes and sides connecting pairs of corners configured to be non-intersecting with at least one defined virtual plane within the three dimensional space, wherein determining the non-overlapping virtual surface arrangement comprises: dividing the speaker nodes into parts, wherein at least one part comprises speaker nodes located to one side of, or on, the defined virtual plane, and at least one further part comprises speaker nodes located to the other side of, or on, the defined virtual plane; determining virtual surface edges for the at least one part and virtual surface edges for the at least one further part; and combining the virtual surface edges for the at least one part and the virtual surface edges for the at least one further part to generate the plurality of virtual surfaces and therefore determine the non-overlapping virtual surface arrangement; generating gains for the plurality of speaker nodes based on the non-overlapping virtual surface arrangement; and applying the gains to at least one audio signal, the at least one audio signal to be positioned within the three dimensional space, wherein dividing the speaker nodes into parts comprises dividing the speaker nodes into an additional part comprising speaker nodes located on or close to the defined virtual plane, and wherein determining the virtual surface edges for the at least one part comprises virtually connecting speaker nodes for the at least one part comprising speaker nodes located on or close to the defined virtual plane.
13. The method as claimed in claim 12, wherein determining the virtual surface edges further comprises: determining candidate virtual surface edges based on selecting pairs of speaker nodes from the plurality of speaker nodes; and processing the candidate virtual surface edges to determine the virtual surface edges.
14. The method as claimed in claim 13, wherein processing the candidate virtual surface edges to determine the virtual surface edges further comprises: selecting one of the candidate virtual surface edges until all of the candidate virtual surface edges are selected; discarding the one of the candidate virtual surface edges when the one of the candidate virtual surface edges crosses the at least one defined virtual plane; discarding the one of the candidate virtual surface edges when the one of the candidate virtual surface edges has any speaker node substantially behind, with respect to a defined observation point, the one of the candidate virtual surface edges; and determining whether the one of the candidate virtual surface edges intersects with at least one previous candidate virtual surface edge and discarding the at least one previous candidate virtual surface edge when the previous candidate virtual surface edge was longer and otherwise discarding the one of the candidate virtual surface edges.
15. The method as claimed in claim 14, wherein determining whether the one of the candidate virtual surface edges intersects with at least one previous candidate virtual surface edge and discarding the at least one previous candidate virtual surface edge when the previous candidate virtual surface edge was longer and otherwise discarding the one of the candidate virtual surface edges comprises applying at least one weighting to a length of one or the other of the candidate virtual surface edges and/or the previous candidate virtual surface edge prior to a comparison.
16. The method as claimed in claim 12, further comprising adding at least one virtual speaker node at a defined location.
17. The method as claimed in claim 16, wherein adding at least one virtual speaker node at a defined location further comprises: adding a virtual speaker node directly at a top of a sphere defined by the speaker nodes; adding a virtual speaker node directly at a bottom of the sphere defined by the speaker nodes; and adding at least one virtual speaker node where speaker nodes are determined to be sparse.
18. The method as claimed in claim 12, the method further comprising moving at least one speaker node to the at least one defined virtual plane when the at least one speaker node is within a defined threshold distance from the at least one defined virtual plane.
19. The method as claimed in claim 12, the method further comprising removing at least one speaker node when the at least one speaker node is within a defined threshold distance of at least one further speaker node.
20. A computer program product for spatial audio signal decoding associated with a plurality of speaker nodes placed within a three dimensional space, the computer program product embodied on a non-transitory computer readable medium, and comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to perform: determine a non-overlapping virtual surface arrangement, wherein the non-overlapping virtual surface arrangement comprises a plurality of virtual surfaces with corners positioned at at least three speaker nodes of the plurality of speaker nodes and sides connecting pairs of corners configured to be non-intersecting with at least one defined virtual plane within the three dimensional space, wherein the non-overlapping virtual surface arrangement is determined by the computer program code being configured to: divide the plurality of speaker nodes into parts, wherein at least one part comprises speaker nodes located to one side of, or on, the defined virtual plane, and at least one further part comprises speaker nodes located to the other side of, or on, the defined virtual plane; determine virtual surface edges for the at least one part and virtual surface edges for the at least one further part; and combine the virtual surface edges for the at least one part and the virtual surface edges for the at least one further part to generate the plurality of virtual surfaces and therefore determine the non-overlapping virtual surface arrangement; generate gains for the plurality of speaker nodes based on the non-overlapping virtual surface arrangement; and apply the gains to at least one audio signal, the at least one audio signal to be positioned within the three dimensional space, wherein the computer program code is configured, when dividing the speaker nodes into parts, to divide the speaker nodes into an additional part comprising speaker nodes located on or close to the defined virtual plane, and wherein the computer program code, when determining the virtual surface edges for the at least one part, is further configured to virtually connect speaker nodes for the at least one part comprising speaker nodes located on or close to the defined virtual plane.
Description
SUMMARY OF THE FIGURES
(1) For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
EMBODIMENTS OF THE APPLICATION
(19) The following describes in further detail suitable apparatus and possible mechanisms for the provision of adaptation of vector base amplitude panning (VBAP).
(20) As discussed previously, VBAP is based on automatically triangulating the loudspeaker setup, selecting an appropriate active triangle based on the direction (such that for a given direction three loudspeakers are selected which form a triangle within which the given direction falls), and computing gains for the three loudspeakers forming the particular triangle (or generally the particular polygon). The ‘active’ triangles may be generalized as being a virtual surface arrangement comprising virtual surfaces with corners located at loudspeaker or speaker node locations. Furthermore, although some embodiments hereafter describe the generation of virtual surfaces as triangle surfaces, the same methods and apparatus may be employed for any suitable polygon surface.
(21) In other words, some embodiments as described hereafter are caused to determine a non-overlapping arrangement of sound surfaces, the arrangement of sound surfaces comprising a plurality of sound surfaces generated from at least three speaker nodes of a plurality of speaker nodes, wherein each of the at least three speaker nodes is positioned in the three dimensional space so as to form a corner of a sound surface, where any two sides of the sound surface are connected to a corner of the sound surface such that at least one defined sound plane does not intersect with those two sides of the sound surface. A virtual surface as described hereafter in the examples may therefore be understood to be a sound surface represented within the 3D space defined by the speaker nodes.
(22) In a practical implementation, VBAP gains (for each azimuth and elevation) and the loudspeaker triplets (for each azimuth and elevation) are typically pre-formulated into a lookup table stored in the memory. A real-time system then performs the amplitude panning by finding from the memory the appropriate loudspeaker triplet for the desired panning direction, and the gains for these loudspeakers corresponding to the desired panning direction.
(23) The first stage of VBAP is division of the 3D loudspeaker setup into triangles. An example ‘active’ triangle is shown in
(25) The next stage is to formulate panning gains corresponding to the panning directions.
(26) The vector base amplitude panning refers to the method where the three unit vectors l_1, l_2, l_3 (the vector base) are assumed from the point of origin to the positions of the three loudspeakers forming the triangle where the panning direction falls in as shown in
(27) The panning gains for the three loudspeakers are determined such that these three unit vectors are weighted such that their weighted sum vector points towards the desired amplitude panning direction. This can be solved as follows. A column unit vector p is formulated pointing towards the desired amplitude panning direction, and a vector g containing the amplitude panning gains can be solved by a matrix multiplication
(28) g = L^(-1) p, where L = [l_1 l_2 l_3],
where ^(-1) denotes the matrix inverse. After formulating the gains g, their overall level is normalized such that for the final gains the energy sum g^T g = 1.
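The gain computation above can be sketched in Python. This is a minimal illustration of the standard VBAP formula, not the patented implementation; the function name and the Cramer's-rule solve are illustrative choices.

```python
import math

def vbap_gains(l1, l2, l3, p):
    """Solve L g = p, where the columns of L are the unit vectors
    l1, l2, l3 toward the triangle's loudspeakers and p points
    toward the desired panning direction; then normalize the gains
    so that the energy sum g^T g = 1."""
    def det3(a, b, c):
        # Scalar triple product a . (b x c) = determinant of the
        # matrix whose columns are a, b, c.
        return (a[0] * (b[1] * c[2] - b[2] * c[1])
                - a[1] * (b[0] * c[2] - b[2] * c[0])
                + a[2] * (b[0] * c[1] - b[1] * c[0]))

    d = det3(l1, l2, l3)
    g = (det3(p, l2, l3) / d,   # Cramer's rule: replace one column
         det3(l1, p, l3) / d,   # of L with p at a time.
         det3(l1, l2, p) / d)
    norm = math.sqrt(sum(x * x for x in g))
    return tuple(x / norm for x in g)
```

For example, with loudspeakers on the coordinate axes and a panning direction between two of them, only those two loudspeakers receive non-zero gain.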
(29) In order to perform the amplitude panning, VBAP first needs to triangulate the 3D loudspeaker setup. There is no single solution to the generation of the triangulation, and the loudspeaker setup can be triangulated in many ways. In typical VBAP, the solution is to try to find triangles of minimal size (no loudspeakers inside the triangles, and sides of as equal length as possible). In a general case, this is a valid approach, as it treats auditory objects in any direction equally and tries to minimize the distances to the loudspeakers that are being used to create the auditory object in that direction.
(30) Another computationally fast method for the triangulation is to generate a convex hull as a function of the data points determined by the loudspeaker angles. This is also a generic approach that treats all directions and data points equally.
(31) Perceptual spatial accuracy of amplitude panning, including VBAP, is known to be more robust when the panning is performed with loudspeakers being substantially on the horizontal axis than the vertical axis. The reason for this is that the binaural cues for elevation perception relate mostly to the sound spectrum (which is not well reproduced with amplitude panning), while the binaural cues for azimuth perception relate mostly to the inter-aural level and time differences (which are better reproduced with amplitude panning).
(32) The embodiments as discussed hereafter therefore perform the triangulation such that the differing performance along the horizontal and vertical axes is also taken into account. Additionally, as a majority of the auditory objects are typically placed on the horizontal plane (or near it), the placement of triangle edges on the horizontal axis is expected to increase reproduction accuracy. The plane is a virtual plane and may be localized (in other words, not infinite or semi-infinite, but bounded).
(33) Typically, the loudspeaker setups are not equally distributed. Instead, there are typically more loudspeakers in the horizontal plane. For example the common 7.1 setup with only horizontal loudspeakers (and low-frequency effects (LFE)) is often used as a basis, and only a few loudspeakers with elevation are added to that setup. A common example of that approach is the 7.1.4 which contains 4 loudspeakers above the horizontal plane. In the following examples, full 3D setups are formed (loudspeakers both above and below the horizontal plane) by extending the common 7.1.4 setup by adding 4 loudspeakers also below the horizontal plane, resulting in the following positions for the loudspeakers: Elevation 0 degrees, azimuth 0, ±30, ±90, and ±150 degrees, which may be defined as (0,0) 205, (30,0) 207, (90,0) 209, (150,0) not seen in
(34) This example loudspeaker setup is denoted as 7.1.8.
(35) With such a setup, as illustrated in
(36) This effect is shown for example in
(37) Vertical panning performs worse than horizontal panning for the two following reasons:
(38) Comb filtering. Amplitude panning refers to reproducing the same signal at (at least) two directions. When the panning occurs in the vertical axis, regardless of the horizontal rotation of the listener, the relative delays for the sounds from the two loudspeakers reaching the two ears are very similar, and both ears receive the same comb-filter effect. This emphasizes the spectral bias that the panning imposes on the reproduced sound, which in turn typically reduces the perceived audio quality. Horizontal amplitude panning, on the other hand, has on average a larger mutual variation of the comb filters at the left and right ears, thus mitigating the phenomenon. Furthermore, head shadowing reduces the relative levels of the signals reaching the left and right ears, thus also reducing the magnitude of the comb-filter effect.
(39) Localization. In amplitude panning on the vertical axis, it has been observed that different frequencies are localized at significantly different elevations. In other words, the elevation at which the sound is perceived depends on the spectral content of the signal. For substantially wide-band signals the source is perceived as spread or ambiguous over the panning arc determined by the two vertically separated loudspeakers.
(40) The embodiments thus described hereafter relate to 3D spatial audio reproduction with multi-channel loudspeaker (virtual or otherwise) setups to produce an automatic adaptation of the vector base amplitude panning (VBAP) for arbitrary loudspeaker setups. Furthermore, the embodiments describe a triangulation scheme for VBAP that avoids triangles crossing the horizontal plane and thus provides better audio quality for auditory objects placed on the horizontal plane.
(41) An example of such a triangular scheme is shown in
(42) In some embodiments the apparatus and methods further describe a triangulation which does not cross one or more horizontal loudspeaker arrangements above or below the horizontal plane. This enables optimizing the rendering for objects moving along circular trajectories above or below the horizontal plane.
(43) In some further embodiments the apparatus and methods further describe defining or designing triangulations which do not cross arbitrary planes, such as a plane oriented vertically. This enables optimal rendering for objects with arbitrary movement trajectories.
(44) The embodiments as described herein also enable making the selection of the triangulation method adaptively based on content. In such embodiments the apparatus and methods determine spatial metadata or object position metadata, determine some characteristics of movement from the metadata, and select the used triangulation accordingly.
(45) It is understood that such embodiments are applicable for triangulating physical and virtual loudspeakers. Furthermore, the following embodiments may be beneficial when triangulating binaural filters used for headphone virtualization.
(46) The auditory object placement in VBAP is performed in three stages:
(47) 1) Triangulate the loudspeaker setup.
(48) 2) Based on the direction parameter, select a triangle and compute the gains for the loudspeakers of the triangle.
(49) 3) Apply the gains to the audio signal to be positioned.
(50) Stages 1) and 2) may be performed during an offline or initialization stage, the result of which is a lookup table of gains and loudspeaker triplet indices that can be efficiently utilized in real-time processing performing stage 3).
(51) The embodiments described hereafter concern primarily the operations of stage 1.
(52) With respect to
(53) In some embodiments the example apparatus comprises a speaker determiner/pre-processor 501. The speaker determiner/pre-processor is configured to determine the loudspeaker positions. In some embodiments the loudspeaker positions are provided as an input. They may be for example in a form of unit vectors, or in polar coordinates.
(54) The operation of determining the loudspeaker positions is shown in
(55) The speaker determiner/pre-processor 501 may receive the loudspeaker positions and furthermore may for example add virtual loudspeakers at some locations. For example, in some embodiments the speaker determiner/pre-processor 501 is configured to add a virtual loudspeaker directly at the top or at the bottom of the sphere. These additional virtual loudspeakers may be added for example where the input loudspeaker setup is sparse for a defined region.
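The virtual-loudspeaker addition can be sketched as follows. The (azimuth, elevation) node format in degrees and the 45-degree sparsity threshold are illustrative assumptions, not values from the text.

```python
def add_polar_virtual_nodes(nodes, gap_deg=45.0):
    """Append a virtual node at the top (elevation +90) and/or the
    bottom (elevation -90) of the sphere when no real loudspeaker
    comes within gap_deg of that pole, i.e. when the setup is
    sparse near the pole. Nodes are (azimuth, elevation) tuples
    in degrees."""
    out = list(nodes)
    if all(el < 90.0 - gap_deg for _, el in nodes):
        out.append((0.0, 90.0))   # virtual top-of-sphere node
    if all(el > -90.0 + gap_deg for _, el in nodes):
        out.append((0.0, -90.0))  # virtual bottom-of-sphere node
    return out
```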
(56) After the process of formulating the panning gains, the virtual loudspeaker gains can be treated in several ways. In some embodiments these virtual loudspeaker panning gains are discarded. In some embodiments the virtual loudspeaker panning gains are distributed to the nearby loudspeakers. In some embodiments the panning gains relating to the top (or bottom) virtual loudspeaker are discarded if the maximum elevation (or inclination) of the actual loudspeakers is 20 degrees or less. In some embodiments a gain-distribution method is employed if the maximum elevation (or inclination) of the actual loudspeakers is more than 20 degrees. The gain distribution method may distribute the gain of the top (or bottom) virtual loudspeaker equally to all loudspeakers with a virtual connection to the top (or bottom) virtual loudspeaker.
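This post-processing of a virtual top (or bottom) loudspeaker's gain can be sketched as below. The 20-degree threshold comes from the text; the list-based gain representation, function name, and index parameters are illustrative.

```python
def resolve_virtual_top_gain(gains, virtual_idx, connected_idx, max_elev_deg):
    """Handle the gain of a top/bottom virtual loudspeaker: if the
    maximum elevation of the actual loudspeakers is 20 degrees or
    less, the virtual gain is simply discarded; otherwise it is
    distributed equally to the loudspeakers with a virtual
    connection to the virtual node (indices in connected_idx)."""
    out = list(gains)
    g = out[virtual_idx]
    out[virtual_idx] = 0.0  # the virtual node itself outputs nothing
    if max_elev_deg > 20.0 and connected_idx:
        share = g / len(connected_idx)
        for i in connected_idx:
            out[i] += share
    return out
```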
(57) In some embodiments the speaker determiner/pre-processor 501 is further configured to move the position information of one or more loudspeakers to the horizontal plane (or in general the no-cross plane). For example the speaker determiner/pre-processor 501 can be configured to determine that any loudspeakers within 5 degrees elevation (or inclination/declination) of the horizontal-plane are set to the horizontal-plane elevation. These loudspeakers may then be treated as a part of the horizontal ring.
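The snapping of near-horizontal loudspeakers onto the no-cross plane can be sketched as follows; the 5-degree threshold mirrors the example in the text, and the (azimuth, elevation) tuple format is an illustrative assumption.

```python
def snap_to_plane(nodes, threshold_deg=5.0):
    """Move any node whose elevation is within threshold_deg of the
    horizontal plane onto the plane, so it can be treated as part
    of the horizontal ring."""
    return [(az, 0.0) if abs(el) <= threshold_deg else (az, el)
            for az, el in nodes]
```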
(58) In some embodiments the speaker determiner/pre-processor 501 is configured to remove some loudspeakers at the loudspeaker setup. For example, the speaker determiner/pre-processor 501 is configured to determine whether loudspeakers are very close, e.g. less than or equal to 5 degrees apart (it may not be necessary or preferable to use both loudspeakers for the amplitude panning). In such embodiments one of the determined loudspeakers may be discarded in the triangulation. Alternatively in some embodiments the speaker determiner/pre-processor 501 is configured to merge the two loudspeakers (e.g., both reproducing the same signal).
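The removal of near-coincident loudspeakers can be sketched with a greedy pass over the nodes; the 5-degree threshold mirrors the example in the text (the text also allows merging the pair instead of discarding one).

```python
import math

def _unit(az_deg, el_deg):
    """Unit vector for an (azimuth, elevation) direction in degrees."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def drop_close_nodes(nodes, threshold_deg=5.0):
    """Discard a node when it lies within threshold_deg (great-circle
    angle) of an already-kept node."""
    kept = []
    cos_thr = math.cos(math.radians(threshold_deg))
    for n in nodes:
        u = _unit(*n)
        # Keep n only if it is farther than the threshold from all kept nodes.
        if all(sum(a * b for a, b in zip(u, _unit(*k))) < cos_thr for k in kept):
            kept.append(n)
    return kept
```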
(59) The operation of pre-processing is shown in
(60) The example apparatus in some embodiments comprises an (upper-half/lower-half/horizontal-ring) connector generator 503. The connector generator 503 may receive the output from the speaker determiner/pre-processor 501.
(61) The connector generator 503 may then be configured to divide the loudspeaker positions into three parts: Horizontal plane loudspeakers, upper half sphere (including horizontal), and bottom half sphere (including horizontal).
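The division into the three parts can be sketched as below, again assuming (azimuth, elevation) tuples in degrees; note that both hemispheres include the horizontal ring, as the text states.

```python
def divide_nodes(nodes, eps_deg=1e-9):
    """Split nodes into the three parts used by the connector
    generator: the horizontal ring, the upper half sphere
    (including the ring) and the lower half sphere (including
    the ring)."""
    ring  = [n for n in nodes if abs(n[1]) <= eps_deg]
    upper = [n for n in nodes if n[1] >= -eps_deg]
    lower = [n for n in nodes if n[1] <= eps_deg]
    return ring, upper, lower
```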
(62) The operation of selecting horizontal and above loudspeaker positions is shown in
(63) The operation of selecting horizontal ring positions is shown in
(64) The operation of selecting horizontal and below loudspeaker positions is shown in
(65) Having divided the loudspeaker positions into the three (or any suitable number of) parts, connections between pairs of loudspeaker positions may be generated for each of the parts.
(66) The connections for the horizontal plane loudspeakers are generated by connecting each loudspeaker to an adjacent loudspeaker in the horizontal plane.
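Connecting adjacent horizontal-plane loudspeakers reduces to sorting by azimuth and closing the ring; a minimal sketch (the index-pair output format is an illustrative choice):

```python
def ring_connections(ring_nodes):
    """Connect each horizontal-plane node to its azimuthal
    neighbours, closing the ring. ring_nodes are (azimuth,
    elevation) tuples; the result is a list of index pairs into
    ring_nodes."""
    order = sorted(range(len(ring_nodes)),
                   key=lambda i: ring_nodes[i][0] % 360.0)
    n = len(order)
    return [(order[i], order[(i + 1) % n]) for i in range(n)]
```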
(67) The determining of connections for the horizontal ring data points (or positions) is shown in
(68) The connections for each hemi-sphere (the horizontal and below loudspeaker positions, and the horizontal and above loudspeaker positions) may then be determined.
(69) The determining of connections for the horizontal ring and above horizontal data points (or positions) is shown in
(70) The determining of connections for the horizontal ring and below horizontal data points (or positions) is shown in
(71) An example of the operations for determining the connections for a hemi-sphere is shown in
(72) Determine a next connection as shown in
(73) Check if the connection has two horizontal plane loudspeakers as shown in
(74) If the connection has two horizontal plane loudspeakers discard connection as shown in
(75) If the connection does not have two horizontal plane loudspeakers, the connection is then checked to determine whether it has any loudspeakers behind the connection, or nearly behind the connection (e.g. within a one-degree arc). This detection can be formulated by considering a line from the origin to each of the loudspeaker positions, and checking whether any of these lines intersects or nearly intersects with the connection as shown in
(76) If the connection has any loudspeakers behind the connection, or nearly behind the connection then discard connection as shown in
(77) If the connection has no loudspeakers behind the connection, or nearly behind the connection then the connection is checked to determine whether the connection intersects with any of the previous connections. This detection can be formulated by studying the intersection of two planes. A plane is determined by two vectors from the origin to the two loudspeakers in a connection. If the plane intersection of two such planes (determined by two different connections) is within both connections then a crossing is detected. The checking of a connection to determine whether the connection intersects with any of the previous connections is shown in
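The plane-intersection test described above can be sketched as follows, assuming loudspeaker positions are given as 3D vectors from the origin (the vectors need not be normalized, since only sign tests are used). Each connection spans a plane through the origin; the two planes meet along the line n1 × n2, and a crossing is flagged when that line, in either direction, falls strictly inside both arcs.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _within_arc(x, a, b):
    """True when direction x lies strictly between a and b on their
    great circle (sign tests against the circle's normal)."""
    n = cross(a, b)
    return dot(cross(a, x), n) > 0 and dot(cross(x, b), n) > 0

def arcs_intersect(a1, b1, a2, b2):
    """Detect whether connection (a1, b1) crosses connection (a2, b2)."""
    d = cross(cross(a1, b1), cross(a2, b2))  # plane-intersection line
    if all(abs(c) < 1e-12 for c in d):
        return False  # coplanar connections: no unique intersection line
    for x in (d, tuple(-c for c in d)):
        if _within_arc(x, a1, b1) and _within_arc(x, a2, b2):
            return True
    return False
```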
(78) When no intersection is found then the connection is kept and the method jumps to step 548
(79) When an intersection is found then the new connection is checked to determine whether the new connection is longer than the prior compared connection as shown in
(80) Where the new connection is longer than the prior compared connection then discard the new connection as shown in
(81) Where the new connection is not longer than the prior compared connection then discard the prior connection as shown in
(82) When one of the prior connections intersects and is longer than the new connection, there can be other intersecting prior connections as well. Therefore, once a prior connection is discarded, the method may apply a 'check again' loop to keep checking the new connection against all remaining prior connections. This is shown in
(83) The checking is therefore stopped only when either:
(84) 1) the new connection is discarded, or
(85) 2) all connections are checked and the new connection has not been discarded (even if some of the prior connections were).
(86) The latter condition (2) is the condition under which the new connection is stored; the method then proceeds to check the next new connection.
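The keep-the-shorter loop described above can be sketched as a greedy pass; `intersects` and `length` are caller-supplied stand-ins for the geometric tests described earlier, so the sketch is independent of the coordinate representation.

```python
def prune_connections(candidates, intersects, length):
    """Greedy connection pruning: each new connection is checked
    against all kept connections; on an intersection the longer of
    the two is discarded, and checking of the new connection
    continues against the remaining kept connections until it is
    either discarded or accepted."""
    kept = []
    for new in candidates:
        discarded = False
        for old in list(kept):  # snapshot: kept may shrink mid-loop
            if intersects(new, old):
                if length(new) > length(old):
                    discarded = True  # new connection is longer: drop it
                    break
                kept.remove(old)      # prior connection was longer: drop it
        if not discarded:
            kept.append(new)
    return kept
```

Weighting (paragraphs (90) and (91)) can be folded into the `length` callable, e.g. by doubling the length of vertical or ring-crossing connections before the comparison.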
(87) The connector determiner may then be configured to determine whether all connections have been checked as shown in
(88) When all the connections have been checked then the processing ends as shown in
(89) When at least some connections remain to be checked, the next connection is selected; in other words, the method passes back to step 541.
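The connection-checking loop described above can be sketched as follows. The arc-intersection test mirrors the plane-intersection formulation in the text: each connection spans a plane through the origin, and the planes meet along a line whose direction is the cross product of the plane normals. The function names, the NumPy usage, and the minor-arc convention are illustrative assumptions, not part of the described apparatus.

```python
import numpy as np

def arc_length(a, b):
    """Great-circle distance between two loudspeaker unit vectors."""
    return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))

def within_arc(a, b, p, n, eps=1e-9):
    """True when p lies strictly inside the minor arc a-b (n = a x b)."""
    return (np.dot(np.cross(a, p), n) > eps and
            np.dot(np.cross(p, b), n) > eps)

def arcs_intersect(a, b, c, d, eps=1e-9):
    """Detect a crossing between connections a-b and c-d.

    The planes of the two connections meet along a line; a crossing is
    detected when either end of that line lies within both arcs.
    """
    n1, n2 = np.cross(a, b), np.cross(c, d)
    t = np.cross(n1, n2)                 # direction of the plane-intersection line
    norm = np.linalg.norm(t)
    if norm < eps:                       # coplanar arcs: treated as non-crossing here
        return False
    t = t / norm
    return any(within_arc(a, b, p, n1) and within_arc(c, d, p, n2)
               for p in (t, -t))         # the line pierces the sphere at +/-t

def add_connection(kept, new):
    """Check a new connection against all kept ones: of an intersecting
    pair, the longer connection is discarded, and checking continues
    until the new connection is discarded or all priors are checked."""
    survivors = []
    for i, old in enumerate(kept):
        if arcs_intersect(new[0], new[1], old[0], old[1]):
            if arc_length(*new) > arc_length(*old):
                return survivors + kept[i:]   # new connection discarded
            # the (longer) prior connection is discarded: do not append it
        else:
            survivors.append(old)
    survivors.append(new)                     # new connection stored
    return survivors
```

Each part of the divided space would run this loop over its candidate connections before the parts are combined.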
(90) In some embodiments the determination of connections may also have some weighting applied to the processing operations. For example, if some loudspeakers form a ring (e.g., a ring at a defined elevation), any connection crossing that ring could be weighted in step 545; for example, its length could be doubled for the comparison. As a result of such a procedure, the loudspeaker connections at that elevation are preferred in the triangulation and thus also in the resulting amplitude panning.
(91) In some embodiments any loudspeaker connection being more “vertical” than “horizontal” could be similarly weighted in step 545 (e.g., the length of the vertical connection could be doubled for the comparison), in order to prefer horizontal connections due to an expected better perceptual performance.
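The two weightings mentioned for the step-545 length comparison can be sketched as below. The text only states that the length could be doubled; the criterion used here for "more vertical than horizontal" (elevation span exceeding wrapped azimuth span) is one plausible reading, and the function name is an assumption.

```python
import numpy as np

def weighted_length(a, b, ring_elevation_deg=None, penalty=2.0):
    """Connection length for the comparison, with optional weightings:
    connections crossing a given elevation ring, or connections more
    vertical than horizontal, have their length doubled (penalty=2.0)
    so that in-ring / horizontal connections are preferred."""
    length = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    elev_a = np.degrees(np.arcsin(np.clip(a[2], -1.0, 1.0)))
    elev_b = np.degrees(np.arcsin(np.clip(b[2], -1.0, 1.0)))
    if ring_elevation_deg is not None:
        lo, hi = sorted((elev_a, elev_b))
        if lo < ring_elevation_deg < hi:        # connection crosses the ring
            length *= penalty
    azi_a = np.degrees(np.arctan2(a[1], a[0]))
    azi_b = np.degrees(np.arctan2(b[1], b[0]))
    d_azi = abs((azi_a - azi_b + 180.0) % 360.0 - 180.0)
    if abs(elev_a - elev_b) > d_azi:            # more vertical than horizontal
        length *= penalty
    return length
```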
(92) The example apparatus may furthermore comprise a triangle (mesh) generator (or connector combiner) 505. The triangle (mesh) generator (or connector combiner) 505 is configured to receive the determined connections from each part (of the complete space) and combine these to generate suitable triplets/triangles or more generally polygons as shown in
(93) Finding all triangles that are formed by the determined connections; and
(94) Storing only those triangles that do not have a loudspeaker within the triangle.
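The two triangle-generation steps above can be sketched as follows: triangles are formed only from determined connections, and a triangle is kept only if no other loudspeaker lies inside it. The point-in-spherical-triangle test (same side of all three edge planes) and the function names are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def inside_spherical_triangle(p, a, b, c, eps=1e-9):
    """p is inside when it lies on the same side of all three edge planes."""
    signs = [np.dot(np.cross(a, b), p),
             np.dot(np.cross(b, c), p),
             np.dot(np.cross(c, a), p)]
    return all(s > eps for s in signs) or all(s < -eps for s in signs)

def triangles_from_connections(points, connections):
    """Find all triangles whose three sides are determined connections,
    and store only those with no loudspeaker inside them.  `points` are
    unit vectors; `connections` are index pairs."""
    edge_set = {frozenset(c) for c in connections}
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        if not ({frozenset((i, j)), frozenset((j, k)), frozenset((i, k))}
                <= edge_set):
            continue                       # not all three sides are connections
        if any(inside_spherical_triangle(points[m],
                                         points[i], points[j], points[k])
               for m in range(len(points)) if m not in (i, j, k)):
            continue                       # a loudspeaker lies within the triangle
        triangles.append((i, j, k))
    return triangles
```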
(95) The example apparatus may further comprise a gain generator 507 configured to receive the output of the triangle (mesh) generator 505 and determine panning gain triplets and loudspeaker triplets. In some embodiments this may be performed as the generation of a look-up table or a real time calculation as shown in
(96) The set of panning angles can, for example, be determined based on the accuracy of direction perception of the human auditory system.
(97) The gains may then be output as shown in
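The gain generation for a loudspeaker triplet can be sketched with the standard vector base amplitude panning formulation, in which the panning direction is expressed as a linear combination of the three loudspeaker unit vectors and the weights are the gains. The coordinate convention (x front, y left, z up) and the unit-energy normalisation are assumptions for illustration.

```python
import numpy as np

def vbap_gains(azi_deg, ele_deg, triangle):
    """Solve the panning gain triplet for one direction and one
    loudspeaker triplet; returns None when the direction lies outside
    the triangle (so another triplet should be tried)."""
    azi, ele = np.radians(azi_deg), np.radians(ele_deg)
    p = np.array([np.cos(ele) * np.cos(azi),
                  np.cos(ele) * np.sin(azi),
                  np.sin(ele)])
    L = np.vstack(triangle)            # rows: loudspeaker unit vectors
    g = np.linalg.solve(L.T, p)        # p = g1*l1 + g2*l2 + g3*l3
    if np.any(g < -1e-9):              # negative gain: outside this triangle
        return None
    return g / np.linalg.norm(g)       # unit-energy normalisation
```

A look-up table is then obtained by evaluating this over the chosen grid of (azi, ele) angles and the stored triplets; alternatively the same solve can be run in real time.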
(98) In some embodiments the method as discussed in the embodiments above may alternatively not generate the connections on the horizontal (or otherwise defined) plane separately, but may instead simply not discard the horizontal plane connections in either the top or the bottom hemisphere.
(99) With respect to
(100) In some embodiments the loudspeaker positions are determined as shown in
(101) In some embodiments a converter 551 or similar is caused to convert the loudspeaker angles (polar coordinates) into points on a sphere (Cartesian coordinates) as shown in
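The conversion performed by the converter 551 can be sketched as below; the coordinate convention (x front, y left, z up, angles in degrees) is an assumption for illustration.

```python
import numpy as np

def speaker_angles_to_unit_vectors(angles_deg):
    """Convert loudspeaker (azimuth, elevation) pairs in degrees to
    points on the unit sphere (Cartesian coordinates)."""
    azi, ele = np.radians(np.asarray(angles_deg, dtype=float)).T
    return np.column_stack([np.cos(ele) * np.cos(azi),   # x: front
                            np.cos(ele) * np.sin(azi),   # y: left
                            np.sin(ele)])                # z: up
```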
(102) In some embodiments a triangle or mesh generator 553 is configured to receive the output from the converter 551 and generate all possible triangles (or triplets) between the loudspeakers (or more generally the polygon edges which link the vertices located at the loudspeakers) as in common VBAP. In other words the triangles are formed between the loudspeakers as the vertices/corners as shown in
(103) In some embodiments a triangle (or mesh) analyser 555 is configured to receive the output of the triangle or mesh generator 553. The triangle analyser 555 may be configured to determine triangles having edges that cross the horizontal plane. These may then be removed and/or replaced with triangles which do not cross the horizontal plane.
(104) In other words the operation is one of identifying and removing any triangles with edges which cross the horizontal plane as shown in
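The identification and removal of triangles with edges crossing the horizontal plane can be sketched as follows, using the elevation threshold mentioned later in the text (loudspeakers within +/- the threshold are treated as in-plane). The function names are assumptions.

```python
import numpy as np

def prune_plane_crossing_triangles(points, triangles, threshold_deg=10.0):
    """Remove triangles having an edge that crosses the horizontal
    plane.  `points` are unit vectors; `triangles` are index triplets."""
    def elev(p):
        return np.degrees(np.arcsin(np.clip(p[2], -1.0, 1.0)))

    def edge_crosses(i, j):
        e1, e2 = elev(points[i]), elev(points[j])
        # crossing: one endpoint strictly above, the other strictly below
        return ((e1 > threshold_deg and e2 < -threshold_deg) or
                (e2 > threshold_deg and e1 < -threshold_deg))

    return [t for t in triangles
            if not any(edge_crosses(i, j)
                       for i, j in ((t[0], t[1]), (t[1], t[2]), (t[0], t[2])))]
```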
(105) Furthermore in some embodiments the analysis and optimization of the triangulation is continued (i.e., removal of triangles) according to any known method. This is shown in
(106) The triplet/gain generator 557 is configured to receive the output of the triangle (mesh) analyser 555 and determine panning gain triplets and loudspeaker triplets. In some embodiments this may be performed as the generation of a look-up table or a real time calculation.
(107) The operation of formulating (generating) amplitude panning tables indicating for a set of panning (azi,ele) angles a loudspeaker triplet and the corresponding gains is shown in
(108) The operation of outputting the amplitude panning tables is shown in
(109) Although the method and apparatus in
(110) With respect to
(111) In some embodiments the loudspeaker positions are determined as shown in
(112) In some embodiments an upper/lower hemisphere triangle or mesh generator 653 is configured to receive the output from the converter 651. The upper/lower hemisphere triangle generator 653 in some embodiments comprises two separate generators or two separate functions and is configured to divide the 3D loudspeaker positions into two half-spheres, the top (upper) and the bottom (lower) hemispheres. Both hemispheres include the loudspeakers at the horizontal plane.
(113) The operation of selecting horizontal and above data points is shown in
(114) Having selected the horizontal and above data points, the next operation is adding a virtual bottom data point (in other words, at an elevation of −90 degrees) as shown in
(115) The following operation, which may be performed by the upper/lower hemisphere triangle or mesh generator 653, is the generation of edges between the datapoints (the upper sphere and the virtual bottom data point). In other words the triangles are formed between the loudspeakers and the virtual bottom data point as shown in
(116) In some embodiments an upper/lower hemisphere triangle (or mesh) analyser 655 is configured to receive the output of the upper/lower hemisphere triangle or mesh generator 653. The upper/lower hemisphere triangle analyser 655 may be configured to determine triangles having edges that include the virtual bottom data point. These may then be removed.
(117) In other words the operation is one of identifying and removing any triangles with edges that include the virtual bottom data point as shown in
(118) A similar set of operations may be performed with respect to the lower hemisphere data points.
(119) The operation of selecting horizontal and below data points is shown in
(120) Having selected the horizontal and below data points, the next operation is adding a virtual top data point (in other words, at an elevation of 90 degrees) as shown in
(121) The following operation, which may be performed by the upper/lower hemisphere triangle or mesh generator 653, is the generation of triangle/triplets between the datapoints (the lower sphere and the virtual top data point). In other words the triangles are formed between the loudspeakers and the virtual top data point as shown in
(122) In some embodiments an upper/lower hemisphere triangle (or mesh) analyser 655 is configured to receive the output of the upper/lower hemisphere triangle or mesh generator 653. The upper/lower hemisphere triangle analyser 655 may be configured to determine triangles having edges that include the virtual top data point. These may then be removed.
(123) In other words the operation is one of identifying and removing any triangles with edges that include the virtual top data point as shown in
(124) The upper/lower triangle (mesh) combiner 657 may be configured to receive the outputs of the upper/lower hemisphere triangle (mesh) analyser 655 and, having generated the upper and lower (top and bottom) hemisphere triangles, combine them to form a whole sphere. This is shown in
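The per-hemisphere steps above (select the in-plane and same-side loudspeakers, add a virtual pole on the opposite side, triangulate, then remove triangles touching the virtual pole) can be sketched as below. The use of scipy's ConvexHull as the triangulation engine is an assumption for illustration; the described apparatus does not prescribe a particular triangulation routine.

```python
import numpy as np
from scipy.spatial import ConvexHull  # assumed available

def hemisphere_triangles(points, upper=True):
    """Triangulate one hemisphere of an (N, 3) array of loudspeaker
    unit vectors; both hemispheres include the horizontal-plane
    loudspeakers, per the text."""
    side = points[:, 2] >= 0 if upper else points[:, 2] <= 0
    idx = np.flatnonzero(side)                    # in-plane points included
    pole = np.array([0.0, 0.0, -1.0 if upper else 1.0])
    pts = np.vstack([points[idx], pole])          # append the virtual pole
    virtual = len(pts) - 1
    hull = ConvexHull(pts)                        # triangulated sphere surface
    return [tuple(int(idx[v]) for v in simplex)
            for simplex in hull.simplices
            if virtual not in simplex]            # drop virtual-pole triangles
```

The combiner 657 then corresponds to concatenating `hemisphere_triangles(points, upper=True)` and `hemisphere_triangles(points, upper=False)` to cover the whole sphere.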
(125) The triplet/gain generator 659 is configured to receive the output of the upper/lower triangle (mesh) combiner 657 and determine panning gain triplets and loudspeaker triplets. In some embodiments this may be performed as the generation of a look-up table or a real time calculation.
(126) The operation of formulating (generating) amplitude panning tables indicating for each panning (azi,ele) angle a loudspeaker triplet and the corresponding gains is shown in
(127) The operation of outputting the amplitude panning tables is shown in
(128) It is noted that the loudspeakers do not need to be exactly at the horizontal plane. For example, in some embodiments the method may utilize a threshold of 10 degrees so that loudspeakers within +/−10 degree elevation are considered to be in the horizontal plane.
(129) This threshold may be a parameter within the system.
(130) An example implementation of the embodiments described above is shown in
(131) The decoder 700 is shown comprising a demuxer and decoder 701 configured to receive an input bit stream (from any origin, for example, spatial sound captured, encoded and transmitted by a smartphone). The demuxer and decoder 701 is configured to separate the bit stream into an audio signal 706 component, and spatial metadata such as a diffuseness metadata 702 component (which defines an ambient to total energy ratio) and direction metadata 704 component.
(132) The audio signals within the audio component 706 are received by a forward filter bank 703 (which may be a complex-modulated low-delay filter bank) configured to transform the audio signals into frequency bands.
(133) The frequency band audio signals may then be received by a divider 705. The divider 705 may furthermore receive the diffuseness metadata component 702 and divide the frequency band signals into a direct 710 part and an ambient 708 (or diffuse) part, for example by applying multipliers to the audio signals as a function of the ratio/diffuseness metadata in frequency bands.
(134) The ambience (or diffuse) part 708 may be received by a decorrelator 707 which is configured to decorrelate the ambience part 708 to generate a multi-channel spatially incoherent signal.
(135) A panning table generator 715, such as described above with respect to
(136) The direct part 710 may be received by an amplitude panner 709. The amplitude panner 709 may furthermore receive the direction metadata 704 and the amplitude panning tables from the panning table generator 715. The direct part 710 audio signals may then be amplitude panned in frequency bands according to the direction metadata, utilizing the amplitude panning tables generated with the present invention.
(137) A sum module 711 may be configured to receive the direct amplitude panned output from the amplitude panner 709 and the multi-channel spatially incoherent signal from the decorrelator 707 and generate a combined multi-channel signal.
(138) An inverse filter bank 713 may then be configured to receive the combined signal and generate a suitable multi-channel loudspeaker output 725.
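The divider 705 step in the decoder chain above can be sketched as follows. The text only states that multipliers are applied as a function of the ratio; the energy-preserving square-root multipliers used here are an assumption.

```python
import numpy as np

def divide_direct_ambient(band_signal, ratio):
    """Divide a frequency-band signal into direct and ambient parts
    using the ambient-to-total energy ratio from the diffuseness
    metadata; the squared gains sum to one, preserving energy."""
    ambient_gain = np.sqrt(ratio)          # ambient part keeps `ratio` of energy
    direct_gain = np.sqrt(1.0 - ratio)     # direct part keeps the remainder
    return direct_gain * band_signal, ambient_gain * band_signal
```

The direct part then feeds the amplitude panner 709 and the ambient part the decorrelator 707, as described.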
(139) The above examples and embodiments may be extended for any suitable use case or implementation where amplitude panning for 3D loudspeaker setups is required. One example is a decoding of a 3D surround sound, where the multi-channel sound involves audio object signals with directional metadata. The amplitude panning tables for the target loudspeaker setup (for example, a cinema loudspeaker system) are generated using the proposed apparatus and method embodiments, and the audio objects are reproduced at their appropriate directions (using the amplitude panning tables or formulated in real-time) according to their directional metadata as a function of time.
(140) A similar implementation or use case is within game audio, where the audio objects (e.g., sound sources at particular directions) are reproduced for 3D loudspeaker setups. In such implementations the user may be given an option to flexibly design the 3D loudspeaker configuration to the geometry of the room, and the method and apparatus as discussed in the embodiments above may be caused to produce amplitude panning gains (or a table) optimized for that setup.
(141) As discussed above, some embodiments can be extended to a triangle generation which guarantees that no triangle crosses other planes. For example, in some embodiments the amplitude panning gain generator is configured to generate gains using a triangulation which does not permit triangles to intersect planes parallel to the reference 0-elevation horizontal plane (as well as, or instead of, the 0-elevation horizontal plane itself). In these examples the embodiments follow the above methods, first pruning triangles which span across some of the horizontal planes. This ensures that content moving along elevated horizontal trajectories (where, for example, loudspeakers are located) can be reproduced with optimal spatial quality.
(142) The concept can also be extended in some embodiments to guarantee that triangles do not cross other arbitrarily set planes. This may be advantageous with certain types of content; for example, if it is known that there are several sound objects which move along trajectories on a plane. An example could be a vertically defined virtual plane, which would guarantee that objects moving along a vertical trajectory are rendered spatially accurately.
(143) In some embodiments the selection of the planes may be adaptively based on content. These embodiments are such that the triangulation operations can be designed adaptively to provide optimal reproduction quality depending on the content types.
(144) A first example of a method of content-selected planes and a suitable apparatus for implementing this method are shown in
(145) In some embodiments the content based plane generator comprises a spatial metadata input or processor 851. The spatial metadata input or processor 851 is configured to obtain or determine spatial metadata associated with the audio signals. For example the spatial metadata may comprise azimuth and elevation for time-frequency tiles, or object spatial position metadata. This is shown in
(146) In some embodiments the content based plane generator comprises a plane selector 853. The plane selector 853 is configured to obtain or determine one or more planes from a set of predefined virtual planes. There may be, for example, a plane stored for each horizontal loudspeaker ring which can be formed for the defined loudspeaker setup.
(147) The operation of obtaining or determining one or more planes from a set of predefined virtual planes is shown in
(148) In some embodiments the content based plane generator comprises a plane optimiser 855. The plane optimiser 855 may be configured to perform a quantization operation to quantize those data samples which are closer than a predefined threshold to the plane in the elevation dimension. A suitable threshold value may be, for example, five degrees.
(149) The quantization of data (spatial positions) is shown in
(150) The principle of quantizing spatial positions to the plane is depicted in
(152) The plane optimiser 855 may then be configured to count the number of data samples quantized to the current plane.
(153) The operation of counting the number of data samples quantized to the plane is shown in
(154) A plane which maximally passes through spatial positions containing frequent sound content is the best candidate to be used as the no-cross plane. The plane optimiser may thus determine whether the currently selected plane is the first selected plane or is better than the previously stored plane.
(155) The operation of determining whether the plane is the first or better plane for selection is shown in
(156) If this is the first selected plane, or more quantized samples are on this plane than on the previous plane, the parameters of the plane are stored as shown in
(157) Otherwise, the method may proceed to select a next plane (for example by indicating to the plane selector 853 to select another plane) and repeat the method for this newly selected plane. This is shown in
(158) In some embodiments the plane optimizer is configured to determine whether all planes have been processed. If there are planes still to be checked, the plane selector 853 may be caused to select another plane to check/process. When all planes have been processed, the stored, optimized plane is passed to be used in the triangulation methods described above, instead of or in addition to the horizontal plane. In some embodiments more than one content-selected plane may be determined.
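The plane-selection loop described above (quantize nearby samples to each candidate plane, count them, and keep the plane with the highest count) can be sketched as below, here over elevation values only; the function name and the flattened representation of the spatial metadata are assumptions.

```python
import numpy as np

def select_no_cross_plane(sample_elevations_deg, candidate_planes_deg,
                          threshold_deg=5.0):
    """Pick the candidate horizontal plane collecting the most quantized
    spatial positions: samples within the threshold of a plane's
    elevation are quantized to it, and the counts are compared."""
    samples = np.asarray(sample_elevations_deg, dtype=float)
    best_plane, best_count = None, -1
    for plane in candidate_planes_deg:
        count = int(np.sum(np.abs(samples - plane) <= threshold_deg))
        if count > best_count:          # first plane, or better than stored
            best_plane, best_count = plane, count
    return best_plane, best_count
```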
(159) With respect to
(160) Thus for example in the apparatus shown in
(161) The several planes may be defined for the loudspeaker configuration within the initial panning table generator 1001, which may be similar to the panning table generator shown in
(162) In some embodiments the determination of the planes may also be done by a content creator instead of an automatic mechanism. In reproduction of audio objects, the determined plane for one object may be different than the determined plane for another object.
(163) As discussed above, the examples have been explained in the context of loudspeaker playback, using either physical or virtual loudspeakers. However, the triangulation may also be applied in the context of binaural filtering, when selecting the closest binaural filters from a binaural filter database to be used for reproducing a sound object.
(164) With respect to
(165) In some embodiments the device 1400 comprises at least one processor or central processing unit 1407. The processor 1407 can be configured to execute various program codes, such as those implementing the methods described herein.
(166) In some embodiments the device 1400 comprises a memory 1411. In some embodiments the at least one processor 1407 is coupled to the memory 1411. The memory 1411 can be any suitable storage means. In some embodiments the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407. Furthermore in some embodiments the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.
(167) In some embodiments the device 1400 comprises a user interface 1405. The user interface 1405 can be coupled in some embodiments to the processor 1407. In some embodiments the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405. In some embodiments the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad. In some embodiments the user interface 1405 can enable the user to obtain information from the device 1400. For example the user interface 1405 may comprise a display configured to display information from the device 1400 to the user. The user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400. In some embodiments the user interface 1405 may be the user interface for communicating with the position determiner as described herein.
(168) In some embodiments the device 1400 comprises an input/output port 1409. The input/output port 1409 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
(169) The transceiver can communicate with further apparatus by any suitable known communications protocol. For example, in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IRDA).
(170) The transceiver input/output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code. Furthermore the device may generate a suitable downmix signal and parameter output to be transmitted to the synthesis device.
(171) In some embodiments the device 1400 may be employed as at least part of the synthesis device. As such the input/output port 1409 may be configured to receive the downmix signals and in some embodiments the parameters determined at the capture device or processing device as described herein, and generate a suitable audio signal format output by using the processor 1407 executing suitable code. The input/output port 1409 may be coupled to any suitable audio output for example to a multichannel speaker system and/or headphones or similar.
(172) In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
(173) The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
(174) The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
(175) Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
(176) Programs, such as those provided by Synopsys, Inc. of Mountain View, Calif. and Cadence Design, of San Jose, Calif. automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or “fab” for fabrication.
(177) The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.