Acoustic signal mixing apparatus and non-transitory computer readable storage medium
11356774 · 2022-06-07
Cpc classification
H04S2420/01
ELECTRICITY
H04S2400/15
ELECTRICITY
H04S7/302
ELECTRICITY
H04S3/02
ELECTRICITY
H04S3/002
ELECTRICITY
International classification
H04S7/00
ELECTRICITY
H04S3/02
ELECTRICITY
H04S3/00
ELECTRICITY
Abstract
A mixing apparatus includes a first speaker set processing unit to a P-th speaker set processing unit. A K-th speaker set processing unit (K being an integer from 1 to P) includes a mic set processing unit configured to process acoustic signals output by two microphones of a corresponding microphone set and to output a first acoustic signal and a second acoustic signal. The mic set processing unit is configured to process the acoustic signals output by the two microphones of the corresponding microphone set based on an expansion/contraction coefficient for determining an expansion/contraction rate of a sound field, a shift coefficient for determining a shift amount of the sound field, and an attenuation coefficient for determining an attenuation amount of an acoustic signal output by a microphone.
Claims
1. A mixing apparatus for outputting drive signals for respectively driving N speakers based on acoustic signals obtained by performing sound collection using a plurality of microphones, N being an integer that is 3 or more, the mixing apparatus comprising: a first speaker set processing unit to a P-th speaker set processing unit corresponding to respective speaker sets of two adjacent speakers among the N speakers, P being N−1, the first speaker set processing unit to the P-th speaker set processing unit each being configured to output a first drive signal for driving a first speaker of a corresponding speaker set, and a second drive signal for driving a second speaker of the corresponding speaker set; a compositing unit configured to composite drive signals for driving the same speaker among 2P drive signals output by the first speaker set processing unit to the P-th speaker set processing unit; and a reception unit configured to receive a user operation, wherein a K-th speaker set processing unit, K being an integer from 1 to P, includes: a mic set processing unit that is provided corresponding to each microphone set of two microphones among the plurality of microphones determined based on arrangement positions of the plurality of microphones, and is configured to process acoustic signals output by the two microphones of a corresponding microphone set and to output a first acoustic signal and a second acoustic signal; a first addition unit configured to add each of the first acoustic signals output by each of the mic set processing units corresponding to the microphone set and to output the first drive signal for driving the first speaker of a corresponding speaker set; and a second addition unit configured to add each of the second acoustic signals output by each of the mic set processing units corresponding to the microphone set and to output the second drive signal for driving the second speaker of a corresponding speaker set, wherein the mic set processing 
unit is configured to process acoustic signals output by two microphones of a corresponding microphone set based on an expansion/contraction coefficient for determining an expansion/contraction rate of a sound field, a shift coefficient for determining a shift amount of a sound field, and an attenuation coefficient for determining an attenuation amount of an acoustic signal output by a microphone, wherein the K-th speaker set processing unit further includes a K-th determination unit configured to categorize the microphone set based on the user operation and determining the expansion/contraction coefficient, shift coefficient, and attenuation coefficient to be used by the mic set processing unit based on the result of categorizing the microphone set, wherein the plurality of microphones are arranged on a predetermined line, and the two microphones of the microphone set are microphones that are adjacent to each other on the predetermined line, wherein the user operation is an operation of designating a segment on the predetermined line, wherein the K-th determination unit is configured to divide the segment into sub-segments relating to corresponding speaker sets, and when at least one microphone is included in the sub-segment, the K-th determination unit is configured to categorize a microphone set in which two microphones are included in the sub-segment into a first set, categorize a microphone set in which two microphones are not included in the sub-segment into a second set, and categorize a microphone set in which only one microphone is included in the sub-segment into a third set, and wherein when no microphone is included in the sub-segment, the K-th determination unit is configured to categorize a set of two microphones that are the closest to the two ends of the sub-segment into the third set and categorize sets other than the closest to the two ends of the sub-segment into the second set.
2. The mixing apparatus according to claim 1, wherein the K-th determination unit is configured to determine the expansion/contraction coefficient to be used by the mic set processing unit corresponding to the first set and the second set to be a value for which there is no expansion/contraction of the sound field, and determine the shift coefficient to be used by the mic set processing unit corresponding to the first set and the second set to be a value for which there is no shifting of the sound field.
3. The mixing apparatus according to claim 1, wherein the K-th determination unit is configured to determine the expansion/contraction coefficient to be used by the mic set processing unit corresponding to the third set according to a length of the sub-segment between the two microphones in the third set, and determine the shift coefficient to be used by the mic set processing unit corresponding to the third set according to a distance between a center between arrangement positions of the two microphones in the third set and a center of the sub-segment between the two microphones in the third set.
4. The mixing apparatus according to claim 1, wherein the K-th determination unit is configured to determine the attenuation coefficient of the two acoustic signals output by the two microphones in the first set and the attenuation coefficient of the two acoustic signals output by the two microphones in the third set to be a value with a smaller attenuation amount than the attenuation coefficient of the two acoustic signals output by the two microphones in the second set.
5. The mixing apparatus according to claim 4, wherein the K-th determination unit is configured to set the attenuation coefficient of the acoustic signal output by the microphone included in the sub-segment of the third set to be the same as the attenuation coefficient of the two acoustic signals output by the two microphones in the first set.
6. The mixing apparatus according to claim 4, wherein the K-th determination unit is configured to determine the attenuation coefficient of the acoustic signal output by a microphone not included in the sub-segment of the third set to be a value with a larger attenuation amount than the attenuation coefficient of the two acoustic signals output by the two microphones in the first set.
7. The mixing apparatus according to claim 6, wherein the K-th determination unit is configured to determine the attenuation coefficient of the acoustic signal output by the microphone not included in the sub-segment of the third set according to the distance between the arrangement position of the microphone and the sub-segment.
8. The mixing apparatus according to claim 4, wherein the K-th determination unit is configured to determine the attenuation coefficient of the two acoustic signals output by the two microphones in the second set to be a value at which the attenuation amount is the greatest.
9. The mixing apparatus according to claim 1, wherein the K-th determination unit is configured to determine the attenuation coefficient of the two acoustic signals output by the two microphones in the first set to be a value at which the attenuation amount is 0.
10. The mixing apparatus according to claim 1, wherein the K-th determination unit is configured to divide the segment into P sub-segments according to the arrangement interval of the N speakers, and the related sub-segment is a sub-segment divided according to the arrangement positions of the two speakers corresponding to the K-th speaker set processing unit.
11. A non-transitory computer readable storage medium including a program for causing a computer to function as the mixing apparatus according to claim 1.
12. A mixing apparatus for outputting drive signals for respectively driving N speakers based on acoustic signals obtained by performing sound collection using a plurality of microphones, N being an integer that is 3 or more, the mixing apparatus comprising: a first speaker set processor to a P-th speaker set processor corresponding to respective speaker sets of two adjacent speakers among the N speakers, P being N−1, the first speaker set processor to the P-th speaker set processor each being configured to output a first drive signal for driving a first speaker of a corresponding speaker set, and a second drive signal for driving a second speaker of the corresponding speaker set; an additional processor configured to: composite drive signals for driving the same speaker among 2P drive signals output by the first speaker set processor to the P-th speaker set processor; and receive a user operation, wherein a K-th speaker set processor, K being an integer from 1 to P, includes a mic set processor that is provided corresponding to each microphone set of two microphones among the plurality of microphones determined based on arrangement positions of the plurality of microphones, and is configured to process acoustic signals output by the two microphones of a corresponding microphone set and to output a first acoustic signal and a second acoustic signal; wherein the K-th speaker set processor is configured to: add each of the first acoustic signals output by each of the mic set processors corresponding to the microphone set and output the first drive signal for driving the first speaker of a corresponding speaker set; and add each of the second acoustic signals output by each of the mic set processors corresponding to the microphone set and output the second drive signal for driving the second speaker of a corresponding speaker set, wherein the mic set processor is configured to process acoustic signals output by two microphones of a corresponding microphone set based 
on an expansion/contraction coefficient for determining an expansion/contraction rate of a sound field, a shift coefficient for determining a shift amount of a sound field, and an attenuation coefficient for determining an attenuation amount of an acoustic signal output by a microphone, wherein the K-th speaker set processor is configured to categorize the microphone set based on the user operation and determine the expansion/contraction coefficient, shift coefficient, and attenuation coefficient to be used by the mic set processor based on the result of categorizing the microphone set, wherein the plurality of microphones are arranged on a predetermined line, and the two microphones of the microphone set are microphones that are adjacent to each other on the predetermined line, wherein the user operation is an operation of designating a segment on the predetermined line, wherein the K-th speaker set processor is configured to divide the segment into sub-segments relating to corresponding speaker sets, and when at least one microphone is included in the sub-segment, the K-th speaker set processor is configured to categorize a microphone set in which two microphones are included in the sub-segment into a first set, categorize a microphone set in which two microphones are not included in the sub-segment into a second set, and categorize a microphone set in which only one microphone is included in the sub-segment into a third set, and wherein when no microphone is included in the sub-segment, the K-th speaker set processor is configured to categorize a set of two microphones that are the closest to the two ends of the sub-segment into the third set and categorize sets other than the closest to the two ends of the sub-segment into the second set.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DESCRIPTION OF THE EMBODIMENTS
(14) Hereinafter, an exemplary embodiment of the present invention will be described with reference to the drawings. Note that the following embodiment is exemplary, and the present invention is not limited to the content of the embodiment. Also, in the following drawings, constituent elements that are not needed in the description of the embodiment are omitted.
(15)
(16)
(17)
(18) Also, the acoustic signal processing unit 11 includes speaker compositing units corresponding to the respective speakers #2 to #N−1, each of which is included in two speaker sets. Note that the speaker compositing unit corresponding to the speaker #X (X being an integer from 2 to N−1) is the X-th speaker compositing unit. The two signals for driving the speaker #X that are output by the speaker set processing units, or more specifically, the higher-number drive signal #X−1 and the lower-number drive signal #X, are input to the X-th speaker compositing unit. The X-th speaker compositing unit composites the higher-number drive signal #X−1 and the lower-number drive signal #X and outputs the resulting signal as a drive signal #X. Note that among the total of 2(N−1) signals output by the N−1 speaker set processing units, the only signals for driving the speakers #1 and #N are the lower-number drive signal #1 and the higher-number drive signal #N−1, respectively, and therefore the acoustic signal processing unit 11 outputs the lower-number drive signal #1 and the higher-number drive signal #N−1 as the drive signal #1 and the drive signal #N, respectively.
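The compositing described in paragraph (18) can be illustrated with the following minimal sketch. It is not part of the disclosed embodiment. It assumes that "compositing" means sample-wise addition, and the function name composite_drive_signals and the list-of-lists signal representation are hypothetical.

```python
def composite_drive_signals(lower, higher):
    """Combine the 2(N-1) speaker-set outputs into N per-speaker drive signals.

    lower[k] and higher[k] are the lower-number and higher-number drive
    signals of speaker set k+1 (k = 0 .. N-2), given as equal-length lists
    of samples. Compositing is ASSUMED to be sample-wise addition.
    """
    p = len(lower)                      # P = N - 1 speaker sets
    if p < 2 or len(higher) != p:
        raise ValueError("need matching outputs from at least two speaker sets")
    n = p + 1
    drive = [list(lower[0])]            # speaker #1: lower-number signal #1 only
    for x in range(1, n - 1):           # speakers #2 .. #N-1 (0-based index x)
        # speaker #X receives higher-number signal #X-1 and lower-number signal #X
        drive.append([h + l for h, l in zip(higher[x - 1], lower[x])])
    drive.append(list(higher[p - 1]))   # speaker #N: higher-number signal #N-1 only
    return drive
```

With N = 3, for example, the middle speaker receives the sum of the higher-number signal of the first speaker set and the lower-number signal of the second speaker set, while the two end speakers each receive a single signal unchanged.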
(19)
(20) As shown in
(21) Hereinafter, the processing performed by the mic set processing unit will be described. First, an acoustic signal collected by a mic A will be called an acoustic signal A, an acoustic signal collected by a mic B will be called an acoustic signal B, and it is assumed that the acoustic signal A and the acoustic signal B are input to the mic set processing unit. The mic set processing unit performs a discrete Fourier transform on the acoustic signal A and the acoustic signal B for each predetermined time segment. Hereinafter, the frequency-domain signals obtained by performing the discrete Fourier transform on the acoustic signal A and the acoustic signal B will be called a signal A and a signal B, respectively. The mic set processing unit generates a frequency-domain signal R (a right channel: corresponding to a lower number) and a frequency-domain signal L (a left channel: corresponding to a higher number) from the signal A and the signal B using the following formula (1). Note that the processing shown in formula (1) is performed for each frequency component (bin) of the signal A and the signal B. Then, the mic set processing unit performs an inverse discrete Fourier transform on the frequency-domain signal R and signal L and outputs two acoustic signals, namely an acoustic signal R and an acoustic signal L. The lower-number compositing unit adds the acoustic signals R output by the first mic set processing unit to the M-th mic set processing unit and outputs the lower-number drive signal #K. Similarly, the higher-number compositing unit adds the acoustic signals L output by the first mic set processing unit to the M-th mic set processing unit and outputs the higher-number drive signal #K.
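The frame-by-frame flow of paragraph (21) can be sketched as follows. This is an illustrative outline only. Formula (1) is not reproduced here, so the per-bin operation is passed in as a hypothetical callback per_bin. The use of non-overlapping rectangular frames, the dropping of trailing samples that do not fill a frame, and all function names are assumptions.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Naive inverse DFT; only the real part is kept (a simplification)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def process_mic_set(sig_a, sig_b, frame_len, per_bin):
    """Transform each frame, apply per_bin(k, a, b) -> (r, l) to every bin k,
    and transform back. per_bin is a stand-in for formula (1)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("acoustic signals A and B must have the same length")
    out_r, out_l = [], []
    for start in range(0, len(sig_a) - frame_len + 1, frame_len):
        spec_a = dft(sig_a[start:start + frame_len])
        spec_b = dft(sig_b[start:start + frame_len])
        bins = [per_bin(k, a, b) for k, (a, b) in enumerate(zip(spec_a, spec_b))]
        out_r.extend(idft([r for r, _ in bins]))
        out_l.extend(idft([l for _, l in bins]))
    return out_r, out_l

def add_mic_set_outputs(signals):
    """Addition over the first to M-th mic set outputs (sample-wise sum)."""
    return [sum(samples) for samples in zip(*signals)]
```

A pass-through callback such as lambda k, a, b: (a, b) returns the input signals unchanged up to rounding error, which is a convenient check that the transform pair is consistent before a real per-bin operation is substituted.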
(22)
(23) In formula (1), f is the frequency (bin) being subjected to processing, and Φ is the principal value of the argument (phase angle) of the signal A and the signal B. Accordingly, in formula (1), f and Φ are values that are determined according to the acoustic signal A and the acoustic signal B being subjected to processing. On the other hand, in formula (1), m.sub.1, m.sub.2, τ, and κ are coefficients that are determined by the coefficient determination unit and notified to the mic set processing units. Hereinafter, the technical meaning of the respective coefficients will be described.
(24) m.sub.1 and m.sub.2 are attenuation coefficients, and are values that are 0 or more and 1 or less. Note that m.sub.1 determines the attenuation amount of the signal A and m.sub.2 determines the attenuation amount of the signal B. Hereinafter, it is assumed that m.sub.1 is called the attenuation coefficient of the mic A and m.sub.2 is called the attenuation coefficient of the mic B.
(25) κ is a scaling (expansion/contraction) coefficient, and determines the range of the sound field. Note that the scaling coefficient κ is a value that is 0 or more and 2 or less. For example, it is assumed that the mic A and the mic B have been arranged as shown in
(26) On the other hand, when m.sub.1 and m.sub.2 are set to 1 and τ is set to 0, if κ is made less than 1, the range of the sound field becomes shorter than when κ is 1, as shown in
(27) τ is a shift coefficient, and has a value in a range from −x to +x. When τ=0 as described above, the matrix T has no influence on the signal A and the signal B. On the other hand, when τ is not 0, the matrix T applies phase changes with the same absolute value but opposite signs to the signal A and the signal B. Accordingly, the position of the sound field shifts in the direction of the mic A or the mic B. Note that the direction of the shift is determined according to the sign of τ, and the greater the absolute value of τ is, the greater the shift amount is.
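The behavior described in paragraph (27) can be illustrated with the following sketch. Formula (1) and the matrix T are not reproduced here, so the phase law used below, a phase of 2πfτ applied with opposite signs to the signal A and the signal B, is an assumption chosen only because it has the stated properties. The function name apply_shift is hypothetical, and which signal receives the positive sign is likewise assumed.

```python
import cmath
import math

def apply_shift(a, b, f, tau):
    """Apply equal-magnitude, opposite-sign phase changes to one bin of A and B.

    ASSUMPTION: the phase is theta = 2*pi*f*tau. The source only states that
    the two phase changes have the same absolute value and different signs.
    """
    theta = 2.0 * math.pi * f * tau
    return a * cmath.exp(1j * theta), b * cmath.exp(-1j * theta)
```

Under this assumption, tau = 0 leaves both bins unchanged, the magnitudes are never altered, and reversing the sign of tau swaps the direction of the two phase rotations.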
(28) The coefficient determination unit of the K-th speaker set processing unit determines the coefficients of the first mic set processing unit to the M-th mic set processing unit, that is, m.sub.1, m.sub.2, τ, and κ, and notifies the first mic set processing unit to the M-th mic set processing unit of the determined coefficients. Hereinafter, the way in which the coefficient determination unit of the K-th speaker set processing unit determines the coefficients of the mic set processing units will be described.
(29) Segment information indicating segments is input by a segment determination unit 12 (
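(30) The coefficient determination unit of the K-th speaker set processing unit stores mic information indicating the arrangement positions of the multiple mics, and speaker information indicating the arrangement positions of the speakers. Also, the coefficient determination unit divides the segment indicated by the segment information into N−1 sub-segments, one for each of the first speaker set to the N−1-th speaker set, and determines the sub-segment corresponding to the K-th speaker set. Specifically, letting L be the length of the segment, L.sub.k be the length of the k-th sub-segment, and D.sub.k be the arrangement interval between the two speakers of the k-th speaker set, the segment is divided such that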
L.sub.1:L.sub.2:L.sub.3: . . . :L.sub.N-1=D.sub.1:D.sub.2:D.sub.3: . . . :D.sub.N-1
L.sub.1+L.sub.2+L.sub.3+. . . +L.sub.N-1=L
are satisfied. Note that as shown in
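(31) The coefficient determination unit of the K-th speaker set processing unit categorizes each of the first mic set to the M-th mic set into a first set, a second set, or a third set based on the K-th sub-segment 64 and the arrangement positions of the mics. When at least one mic is included in the K-th sub-segment 64, a mic set in which both mics are included in the K-th sub-segment 64 is categorized into the first set, a mic set in which neither mic is included in the K-th sub-segment 64 is categorized into the second set, and a mic set in which only one mic is included in the K-th sub-segment 64 is categorized into the third set. When no mic is included in the K-th sub-segment 64, the mic set of the two mics closest to the two ends of the K-th sub-segment 64 is categorized into the third set, and the other mic sets are categorized into the second set.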
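The categorization rule recited in claim 1 can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: mic positions are modeled as sorted scalar coordinates along the predetermined line, a mic lying exactly on an end of the sub-segment is treated as included, and "closest to the two ends" is interpreted as the adjacent mic pair minimizing the summed distance to the two ends. The names categorize_mic_sets, FIRST, SECOND, and THIRD are hypothetical.

```python
FIRST, SECOND, THIRD = 1, 2, 3

def categorize_mic_sets(mic_positions, sub_lo, sub_hi):
    """Categorize each adjacent mic pair against the sub-segment [sub_lo, sub_hi].

    mic_positions: sorted 1-D coordinates of the mics along the line.
    Returns one category per mic set (pair i is mics i and i+1).
    """
    if len(mic_positions) < 2 or sub_lo > sub_hi:
        raise ValueError("need at least two mics and a valid sub-segment")
    inside = [sub_lo <= p <= sub_hi for p in mic_positions]
    num_sets = len(mic_positions) - 1
    if any(inside):
        # both mics inside -> first set, none -> second set, exactly one -> third set
        by_count = {2: FIRST, 0: SECOND, 1: THIRD}
        return [by_count[inside[i] + inside[i + 1]] for i in range(num_sets)]

    # No mic inside the sub-segment: only the pair closest to its two ends
    # is a third set (ASSUMED metric: summed distance to the two ends).
    def end_distance(i):
        return abs(mic_positions[i] - sub_lo) + abs(mic_positions[i + 1] - sub_hi)

    closest = min(range(num_sets), key=end_distance)
    return [THIRD if i == closest else SECOND for i in range(num_sets)]
```

For mics at 0, 1, 2, 3, and 4 and a sub-segment from 0.5 to 2.5, the pair (1, 2) lies fully inside, the pairs (0, 1) and (2, 3) straddle an end, and the pair (3, 4) lies outside.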
(32) Hereinafter, the way in which the coefficients to be used by the corresponding mic set processing units are determined for the first to third sets will be described. Note that hereinafter, a coefficient to be used by the mic set processing unit of a certain set will be expressed simply as “coefficient of mic set”. Also, it is assumed that, as shown in
(33) For example, for the first set, the coefficient determination unit sets τ to 0 and κ to 1, and sets the attenuation coefficients of both mics to 1. That is, expansion/contraction and shifting of the sound field are not performed, and the attenuation coefficients are set to a value at which the acoustic signals collected by the two mics are not attenuated.
(34) On the other hand, the coefficient determination unit determines the scaling coefficient κ and the shift coefficient τ of the third set such that the range of the sound field corresponds to an overlapping segment. That is, the coefficient determination unit determines the scaling coefficient κ of the third set based on the length L1 of the overlapping segment. Specifically, for example, letting L be the distance between the two mics in the third set, the scaling coefficient κ of the third set is determined so as to give an expansion/contraction rate of L1/L. Accordingly, the shorter the overlapping segment of the third set is, the shorter the range of the sound field determined by the scaling coefficient κ is. Also, the coefficient determination unit determines the shift coefficient τ of the third set such that the central position of the sound field is located at the central position of the overlapping segment. Accordingly, the coefficient determination unit determines the shift coefficient τ of the third set according to the distance between the center between the arrangement positions of the two mics and the center of the overlapping segment. Also, the coefficient determination unit sets each of the attenuation coefficients of the two mics in the third set to 1. Alternatively, the coefficient determination unit sets the attenuation coefficient of the mic included in the K-th sub-segment 64 in the third set to the same value as the attenuation coefficients of the two mics in the first set, and sets the attenuation coefficient of the mic not included in the K-th sub-segment 64 to a value with an attenuation amount greater than that of the mic included in the K-th sub-segment 64. Alternatively, the coefficient determination unit can set the attenuation coefficient of the mic not included in the K-th sub-segment 64 of the third set such that the attenuation amount increases as the length of the non-overlapping segment, that is, the maximum length L2 from the arrangement position of the mic to the K-th sub-segment 64, increases.
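The geometric quantities used for a third set in paragraph (34) can be computed as in the following sketch. It assumes that the overlapping segment is the part of the sub-segment lying between the two mics of the third set, as in claim 3, and that positions are scalar coordinates along the line. The sketch returns only the expansion/contraction rate L1/L and the signed distance between the two centers. How these map to numerical values of κ and τ depends on formula (1), which is not reproduced here, so no such mapping is attempted. The function name third_set_geometry is hypothetical.

```python
def third_set_geometry(mic_a, mic_b, sub_lo, sub_hi):
    """Return (rate, center_offset) for a third set.

    rate          = L1 / L, overlap length over mic spacing
    center_offset = center of overlapping segment minus center of the two mics
    """
    lo, hi = min(mic_a, mic_b), max(mic_a, mic_b)
    ov_lo, ov_hi = max(lo, sub_lo), min(hi, sub_hi)  # overlapping segment
    overlap = ov_hi - ov_lo                          # L1
    spacing = hi - lo                                # L
    if spacing <= 0 or overlap <= 0:
        raise ValueError("mics must be distinct and overlap the sub-segment")
    rate = overlap / spacing
    center_offset = (ov_lo + ov_hi) / 2.0 - (lo + hi) / 2.0
    return rate, center_offset
```

For mics at 2 and 3 and a sub-segment from 0.5 to 2.5, the overlapping segment runs from 2 to 2.5, giving a rate of 0.5 and a center offset of −0.25, that is, a sound field half as wide and displaced toward the mic at 2.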
(35) Furthermore, for example, the coefficient determination unit sets τ to 0 and κ to 1 for the second set, similarly to the first set. However, the attenuation coefficients of the two mics in the second set are set to values whose attenuation amounts are greater than those of the attenuation coefficients set for the mics in the first and third sets. For example, the coefficient determination unit sets the attenuation coefficients of the two mics in the second set to the value at which the attenuation amount is the greatest, that is, 0, or to a predetermined value near 0.
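The example coefficient values given for the first set and the second set can be collected in a small lookup, sketched below. The container MicSetCoefficients, the function coefficients_for, and the parameter second_set_attenuation (standing in for "0, or a predetermined value near 0") are hypothetical names. The third set is deliberately left out because its κ and τ depend on the overlapping segment and on formula (1).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MicSetCoefficients:
    m1: float     # attenuation coefficient of mic A (1 means no attenuation)
    m2: float     # attenuation coefficient of mic B
    tau: float    # shift coefficient (0 means no shift)
    kappa: float  # scaling coefficient (1 means no expansion/contraction)

def coefficients_for(category, second_set_attenuation=0.0):
    """Example coefficients for the first set (1) and the second set (2)."""
    if category == 1:
        return MicSetCoefficients(m1=1.0, m2=1.0, tau=0.0, kappa=1.0)
    if category == 2:
        if not 0.0 <= second_set_attenuation <= 1.0:
            raise ValueError("attenuation coefficients lie between 0 and 1")
        return MicSetCoefficients(m1=second_set_attenuation,
                                  m2=second_set_attenuation,
                                  tau=0.0, kappa=1.0)
    raise ValueError("third-set coefficients depend on the overlapping segment")
```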
(36) For example, as shown in
(37) In the present embodiment, the acoustic signal processing unit 11 includes the first speaker set processing unit to the N−1-th speaker set processing unit, and the first speaker set processing unit to the N−1-th speaker set processing unit output drive signals corresponding to the speaker sets for reproducing the sound field of the first sub-segment to the N−1-th sub-segment using the two speakers included in each of the first speaker set to the N−1-th speaker set. Then, the acoustic signal processing unit 11 outputs the drive signals for driving the speakers. Note that two signals for driving the same speaker among the 2 (N−1) drive signals output by the first speaker set processing unit to the N−1-th speaker set processing unit are composited. By reproducing the sound fields of the sub-segments to which the speaker sets arranged as shown in
(38) Finally, the segment determination unit 12 determines the segment based on a user operation. For example, if the user directly designates a segment, the segment determination unit 12 functions as a reception unit for receiving the operation of the user designating the segment. In this case, the segment determination unit 12 outputs the segment designated by the user to the acoustic signal processing unit 11. On the other hand, for example, if the mixing apparatus is applied to viewing of an image on a head-mounted display for VR, or to viewing of a 360-degree panorama image on a tablet, the segment determination unit 12 calculates the segment based on the range of the image viewed by the user and outputs the calculated segment to the acoustic signal processing unit 11.
(39) Note that in the present embodiment, the segment is divided into sub-segments in proportion to the arrangement intervals of the speakers, but if it is a prerequisite that the speakers are arranged at equal intervals, it is possible to use a configuration in which the segment is divided into sub-segments of equal length. In this case, the speaker information indicating the arrangement positions of the speakers is not necessary.
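The division of the segment can be sketched as follows, covering both the proportional case and the equal-interval case. The sketch assumes scalar coordinates along the line, takes each interval to be the spacing between the two speakers of one speaker set, and assumes that the sub-segments run in the same order as the speaker sets. The function names divide_segment and speaker_intervals are hypothetical.

```python
def speaker_intervals(speaker_positions):
    """Spacing between adjacent speakers, one value per speaker set."""
    return [abs(b - a) for a, b in zip(speaker_positions, speaker_positions[1:])]

def divide_segment(seg_start, seg_end, intervals):
    """Split [seg_start, seg_end] into len(intervals) sub-segments whose
    lengths are proportional to the given intervals.

    Passing equal intervals (for example [1, 1, 1]) gives sub-segments of
    equal length, in which case speaker positions are not needed.
    """
    total = sum(intervals)
    if total <= 0 or any(d <= 0 for d in intervals):
        raise ValueError("intervals must be positive")
    length = seg_end - seg_start
    bounds = [seg_start]
    acc = 0.0
    for d in intervals:
        acc += d
        bounds.append(seg_start + length * acc / total)
    return list(zip(bounds[:-1], bounds[1:]))
```

For speakers at 0, 1, and 3 and a segment from 0 to 6, the intervals are 1 and 2, so the segment is split at 2 into sub-segments of lengths 2 and 4.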
(40) Note that in the present embodiment, N speakers are arranged in numerical order along a straight line or a curved line, and (N−1) speaker sets are thus formed. However, the N speakers can also be arranged on a closed curve, for example, on the circumference of a circle, so that the N speakers form N speaker sets. In this case, in addition to the configuration shown in
(41) The mixing apparatus 10 according to the present invention can be realized using a program that causes a computer including one or more processors and a storage unit to function as the above-described mixing apparatus 10. The program can be stored in a non-transitory computer-readable storage medium or be distributed via a network. The program is stored in the storage unit and executed by the one or more processors, and thereby the functions of the units shown in
(42) The present invention is not limited to the above-described embodiments, and various changes and modifications are possible without departing from the spirit and scope of the present invention. Accordingly, the following claims are attached in order to apprise the public of the scope of the present invention.