SIGNAL ANALYSIS DEVICE, SIGNAL ANALYSIS METHOD, AND SIGNAL ANALYSIS PROGRAM
20210012790 · 2021-01-14
Assignee
Inventors
- Nobutaka ITO (Musashino-shi, Tokyo, JP)
- Tomohiro NAKATANI (Musashino-shi, Tokyo, JP)
- Shoko ARAKI (Musashino-shi, Tokyo, JP)
Cpc classification
G10L21/0308
PHYSICS
International classification
G10L21/0308
PHYSICS
Abstract
A signal analysis device (1) includes an estimation unit (10) that, when a parameter for modeling spatial characteristics of signals from N signal sources (where N is an integer equal to or larger than 2) is a spatial parameter, estimates a signal source position prior probability which is a mixture weight for modeling a prior distribution of the spatial parameter with respect to each signal source using a mixture distribution that is a linear combination of prior distributions of the spatial parameter with respect to K signal source position candidates (where K is an integer equal to or larger than 2), and which is a probability that a signal arrives from each signal source position candidate per signal source.
Claims
1. A signal analysis device, comprising: estimation circuitry configured to, when a parameter for modeling spatial characteristics of signals from N signal sources (where N is an integer equal to or larger than 2) is a spatial parameter, estimate a signal source position prior probability which is a mixture weight for modeling a prior distribution of the spatial parameter with respect to each signal source using a mixture distribution that is a linear combination of prior distributions of the spatial parameter with respect to K signal source position candidates (where K is an integer equal to or larger than 2), and which is a probability that a signal arrives from each signal source position candidate per signal source.
2. The signal analysis device according to claim 1, wherein the spatial parameter is a spatial covariance matrix, and the mixture distribution is a complex inverse Wishart mixture distribution.
3. The signal analysis device according to claim 1, wherein the estimation circuitry estimates the signal source position prior probability based on an auxiliary function method using an auxiliary function which is related to an objective function for maximizing a posterior probability of an unknown parameter, and with which a sum operation in the linear combination included in the objective function is not included in a logarithm operation.
4. The signal analysis device according to claim 1, wherein, provided that ~N is an assumed number of signal sources that is set sufficiently larger than the actual number N of signal sources, the estimation circuitry uses, for each n (where n is an integer equal to or larger than 1 and equal to or smaller than ~N), the signal source position candidate that maximizes the signal source position prior probability as an estimated value of the signal source position, performs clustering of the obtained ~N signal source positions using hierarchical clustering, and uses the number of obtained clusters as an estimated value of the actual number N of sound sources.
5. A signal analysis method executed by a signal analysis device, the signal analysis method comprising: when a parameter for modeling spatial characteristics of signals from N signal sources (where N is an integer equal to or larger than 2) is a spatial parameter, estimating a signal source position prior probability which is a mixture weight for modeling a prior distribution of the spatial parameter with respect to each signal source using a mixture distribution that is a linear combination of prior distributions of the spatial parameter with respect to K signal source position candidates (where K is an integer equal to or larger than 2), and which is a probability that a signal arrives from each signal source position candidate per signal source.
6. A signal analysis program for causing a computer to function as the signal analysis device according to claim 1.
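The source-counting procedure of claim 4 can be illustrated with a small sketch. Several choices here are assumptions, not part of the claim: the candidates are azimuths in degrees, the hierarchical clustering is single linkage cut at a hypothetical threshold `threshold_deg`, and the function name `estimate_num_sources` is illustrative. On a circle, single linkage cut at a threshold splits the sorted azimuths exactly where adjacent gaps exceed that threshold, which is what the code counts.

```python
def estimate_num_sources(pi, azimuths_deg, threshold_deg=20.0):
    """Estimate the actual number of sources from an over-specified model.

    pi:            list of ~N lists; pi[n][k] is the position prior of assumed
                   source n at candidate k (hypothetical layout)
    azimuths_deg:  azimuth in degrees of each of the K candidates (assumption)
    threshold_deg: hypothetical single-linkage cut distance in degrees
    """
    # Estimated position of each assumed source: the candidate maximizing pi[n][k].
    est = sorted(azimuths_deg[max(range(len(p)), key=lambda k: p[k])] % 360.0
                 for p in pi)
    if not est:
        return 0
    # Gaps between neighbouring estimates on the circle, including the wrap-around gap.
    gaps = [b - a for a, b in zip(est, est[1:])] + [est[0] + 360.0 - est[-1]]
    # Each gap wider than the threshold separates two single-linkage clusters;
    # zero or one wide gap leaves a single cluster.
    return max(sum(g > threshold_deg for g in gaps), 1)
```

For example, with 36 candidates at 10°, 20°, …, 360°, assumed sources peaked at 10°, 20°, 0° and 180° yield two clusters. The threshold should presumably be tuned to the candidate spacing and the array's angular resolution.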
Description
BRIEF DESCRIPTION OF DRAWINGS
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
DESCRIPTION OF EMBODIMENTS
[0045] Below, an embodiment of a signal analysis device, a signal analysis method, and a signal analysis program according to the present application will be described in detail based on the figures. The present invention is not limited by the embodiment described below. Hereinafter, for a vector, a matrix, or a scalar A, the notation ^A denotes A with "^" written immediately above it, and the notation ~A denotes A with "~" written immediately above it.
First Embodiment
[0046] First, a signal analysis device according to a first embodiment will be described. In the first embodiment, consider a situation where N sound source signals coexist (where N is an integer equal to or larger than 0) and M observation signals y_m(τ), obtained by microphones at different positions, are input to the signal analysis device (where M is an integer equal to or larger than 2, m = 1, ..., M denotes a microphone index, and τ denotes a sample point index). N denotes the true number of sound sources, and ~N denotes the assumed number of sound sources. In the first embodiment, the true number of sound sources N is assumed to be known, and the assumed number of sound sources is set to ~N = N. A sound source signal in the first embodiment may be a target signal (e.g., speech) or directional noise (e.g., music played on a TV), which is noise arriving from a specific sound source position. Diffusive noise, which is noise arriving from various sound source positions, may also be collectively regarded as one sound source signal. Examples of diffusive noise include the voices of many people speaking in a crowd or a café, footsteps in a station or an airport, and noise from air conditioning.
[0047] A configuration and processing of the first embodiment will be described using
[0048] As shown in
[0049] First, an overview of the respective units of the signal analysis device 1 will be described. The observation signal vector generation unit 11 first obtains the input observation signals y_m(τ) (step S1), and calculates observation signals y_m(t,f) in the time-frequency domain using, for example, the short-time Fourier transform (step S2). Here, t = 1, ..., T denotes a frame index, and f = 1, ..., F denotes a frequency bin index.
[0050] Next, for each time-frequency point, the observation signal vector generation unit 11 generates an observation signal vector y(t,f), the M-dimensional column vector composed of all M observation signals y_m(t,f), as indicated by expression (15) (step S3). Here, the superscript T denotes transposition.
[Formula 15]
$$y(t,f) = \bigl(y_1(t,f)\ \cdots\ y_M(t,f)\bigr)^{\mathsf{T}} \tag{15}$$
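Steps S2 and S3 can be sketched as follows. This is a minimal NumPy illustration, not the embodiment's implementation. The Hann window, frame length, and hop size are hypothetical choices. Any STFT with consistent framing across microphones would serve the same purpose.

```python
import numpy as np

def stft(x, frame_len=512, hop=128):
    """Hann-windowed short-time Fourier transform of a 1-D signal.

    Returns an array of shape (T, F) with F = frame_len // 2 + 1 frequency bins.
    Assumes len(x) >= frame_len; trailing samples that do not fill a frame are dropped.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def observation_vectors(signals, frame_len=512, hop=128):
    """Stack the STFTs of M microphone signals into observation vectors.

    signals: array of shape (M, num_samples).
    Returns Y of shape (T, F, M); Y[t, f] is y(t,f) = (y_1(t,f) ... y_M(t,f))^T.
    """
    per_mic = [stft(x, frame_len, hop) for x in signals]  # M arrays of shape (T, F)
    return np.stack(per_mic, axis=-1)
```

Storing Y with the microphone axis last makes each y(t,f) a contiguous length-M vector, which suits the per-point Gaussian computations used later.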
[0051] In the present first embodiment, it is assumed that each sound source signal arrives from one of K sound source position candidates, which are represented by indexes 1, ..., K (hereinafter, sound source position indexes). For example, suppose the sound sources are several speakers having a conversation while seated around a round table, and M microphones are placed within a small area of approximately several square centimeters at the center of the table. Suppose also that only the azimuths of the sound sources, as seen from the center of the table, are considered as sound source positions. In this case, the K azimuths θ, 2θ, ..., Kθ (θ = 360°/K), obtained by dividing 0° to 360° equally into K parts, can be used as the sound source position candidates. This example is not limiting; in general, any K predetermined points can be designated as the sound source position candidates. The sound source position candidates may also include a candidate representing diffusive noise. Diffusive noise does not arrive from a single sound source position but from many sound source positions. By also treating such diffusive noise as one sound source position candidate, one whose signal arrives from many positions, accurate estimation is possible even when diffusive noise is present.
[0052] The initializing unit calculates initial estimates of the sound source existence prior probabilities α_n(f), the sound source position prior probabilities π_kn, the spatial covariance matrixes R_n(f), and the power parameters v_n(t,f) (step S4). Here, n = 1, ..., N denotes a sound source index, and k = 1, ..., K denotes a sound source position index. For example, the initializing unit calculates these initial values based on random numbers.
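The description only states that random numbers may be used in step S4. The sketch below is therefore one possible way to draw initial values that respect the constraints each parameter must satisfy. The particular distributions and the name `initialize` are hypothetical. α_n(f) sums to 1 over n, and π_kn sums to 1 over k. R_n(f) is Hermitian positive definite, and v_n(t,f) is positive.

```python
import numpy as np

def initialize(N, K, F, T, M, seed=0):
    """Random initial values for alpha_n(f), pi_kn, R_n(f) and v_n(t,f) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Existence prior alpha: shape (F, N), nonnegative, sums to 1 over n for each f.
    alpha = rng.random((F, N)) + 1e-3
    alpha /= alpha.sum(axis=1, keepdims=True)
    # Position prior pi: shape (K, N), nonnegative, sums to 1 over k for each n.
    pi = rng.random((K, N)) + 1e-3
    pi /= pi.sum(axis=0, keepdims=True)
    # Spatial covariance R: shape (F, N, M, M); A A^H is positive semidefinite and
    # adding M * I makes it positive definite.
    A = rng.standard_normal((F, N, M, M)) + 1j * rng.standard_normal((F, N, M, M))
    R = A @ A.conj().swapaxes(-1, -2) + M * np.eye(M)
    # Power parameters v: shape (T, F, N), strictly positive.
    v = rng.random((T, F, N)) + 1e-3
    return alpha, pi, R, v
```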
[0053] The estimation unit 10 estimates sound source position prior probabilities. In the present first embodiment, spatial covariance matrixes are used as spatial parameters which are parameters for modeling the spatial characteristics of signals from the positions of N sound sources. A sound source position prior probability is a probability that a signal arrives from each sound source position candidate per sound source, and is a mixture weight for modeling a prior distribution of a spatial covariance matrix (a spatial parameter) with respect to each sound source. The foregoing prior distribution with respect to each sound source is modeled using a mixture distribution which is a linear combination of prior distributions of a spatial covariance matrix (a spatial parameter) with respect to K sound source position candidates (where K is an integer equal to or larger than 2). The estimation unit 10 includes a sound source existence posterior probability updating unit 12, a sound source position posterior probability updating unit 14, a sound source existence prior probability updating unit 15, a sound source position prior probability updating unit 16, and a spatial covariance matrix updating unit 17.
[0054] The sound source existence posterior probability updating unit 12 receives the observation signal vectors y(t,f), the sound source existence prior probabilities α_n(f), the spatial covariance matrixes R_n(f), and the power parameters v_n(t,f), and updates the sound source existence posterior probabilities γ_n(t,f) (step S5). [0055] The observation signal vectors y(t,f) are the output from the observation signal vector generation unit 11. [0056] The sound source existence prior probabilities α_n(f) are the output from the sound source existence prior probability updating unit 15; as an exception, the initial values from the initializing unit are used in the first pass through the sound source existence posterior probability updating unit 12. [0057] The spatial covariance matrixes R_n(f) are the output from the spatial covariance matrix updating unit 17; as an exception, the initial values from the initializing unit are used in the first pass through the sound source existence posterior probability updating unit 12. [0058] The power parameters v_n(t,f) are the output from the power parameter updating unit 18; as an exception, the initial values from the initializing unit are used in the first pass through the sound source existence posterior probability updating unit 12.
[0059] The storage unit 13 stores parameters of prior distributions of the spatial covariance matrixes for respective sound source position candidates k and respective frequency bins f.
[0060] The sound source position posterior probability updating unit 14 receives the parameters of the prior distributions, the sound source position prior probabilities π_kn, and the spatial covariance matrixes R_n(f), and updates the sound source position posterior probabilities ρ_kn (step S6). [0061] The parameters of the prior distributions are stored in the storage unit 13. [0062] The sound source position prior probabilities π_kn are the output from the sound source position prior probability updating unit 16; as an exception, the initial values from the initializing unit are used in the first pass through the sound source position posterior probability updating unit 14. [0063] The spatial covariance matrixes R_n(f) are the output from the spatial covariance matrix updating unit 17; as an exception, the initial values from the initializing unit are used in the first pass through the sound source position posterior probability updating unit 14.
[0064] The sound source existence prior probability updating unit 15 receives the sound source existence posterior probabilities γ_n(t,f) from the sound source existence posterior probability updating unit 12, and updates the sound source existence prior probabilities α_n(f) (step S7).
[0065] The sound source position prior probability updating unit 16 receives the sound source position posterior probabilities ρ_kn from the sound source position posterior probability updating unit 14, and updates the sound source position prior probabilities π_kn (step S8).
[0066] The spatial covariance matrix updating unit 17 receives the observation signal vectors y(t,f), the sound source existence posterior probabilities γ_n(t,f), the parameters of the prior distributions, the sound source position posterior probabilities ρ_kn, and the power parameters v_n(t,f), and updates the spatial covariance matrixes R_n(f) (step S9). [0067] The observation signal vectors y(t,f) are the output from the observation signal vector generation unit 11. [0068] The sound source existence posterior probabilities γ_n(t,f) are the output from the sound source existence posterior probability updating unit 12. [0069] The parameters of the prior distributions are stored in the storage unit 13. [0070] The sound source position posterior probabilities ρ_kn are the output from the sound source position posterior probability updating unit 14. [0071] The power parameters v_n(t,f) are the output from the power parameter updating unit 18; as an exception, the initial values from the initializing unit are used in the first pass through the spatial covariance matrix updating unit 17.
[0072] The power parameter updating unit 18 receives the observation signal vectors y(t,f) from the observation signal vector generation unit 11 and the spatial covariance matrixes R_n(f) from the spatial covariance matrix updating unit 17, and updates the power parameters v_n(t,f) (step S10).
[0073] The permutation solving unit receives the sound source existence prior probabilities α_n(f) from the sound source existence prior probability updating unit 15, the spatial covariance matrixes R_n(f) from the spatial covariance matrix updating unit 17, and the power parameters v_n(t,f) from the power parameter updating unit 18. It solves the permutation problem by updating α_n(f), R_n(f), and v_n(t,f) (step S11). Specifically, the permutation solving unit updates these parameters by relabeling the sound source index n in each frequency bin so that an evaluation value is maximized, such as the likelihood, the log-likelihood, or the auxiliary function. Let the relabeling of the sound source index n in frequency bin f be represented by a bijective function σ_f: {1, ..., N} → {1, ..., N}. Then σ_f is calculated so that the evaluation value is maximized when the sound source index n of these parameters is replaced with σ_f(n) for each frequency bin f. The parameters are then updated by using the calculated σ_f to replace their sound source index n with σ_f(n) for each frequency bin f. Instead of updating all of α_n(f), R_n(f), and v_n(t,f), the permutation solving unit may update only some of them (e.g., only R_n(f)). Processing in the permutation solving unit is optional.
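The search for σ_f in one frequency bin can be sketched as follows. The sketch assumes the chosen evaluation value decomposes additively over sources. It takes a hypothetical score matrix with one entry per (old index, new index) pair and tries every bijection. The names `best_permutation` and `apply_permutation` and the score layout are illustrative, not from the description.

```python
from itertools import permutations

def best_permutation(score):
    """Bijection sigma maximizing sum_n score[n][sigma[n]], by exhaustive search.

    score[n][m] is a hypothetical per-frequency evaluation value obtained when the
    parameters currently indexed n are relabeled as index m. Enumerating all N!
    bijections is only practical for small N.
    """
    N = len(score)
    return max(permutations(range(N)),
               key=lambda sigma: sum(score[n][sigma[n]] for n in range(N)))

def apply_permutation(params_f, sigma):
    """Relabel per-source parameters of one frequency bin: index n becomes sigma[n]."""
    relabeled = [None] * len(sigma)
    for n, m in enumerate(sigma):
        relabeled[m] = params_f[n]
    return relabeled
```

With an additive score, the same maximization is a linear assignment problem. For larger N, a Hungarian-algorithm solver such as SciPy's `linear_sum_assignment` (with `maximize=True`) would avoid the N! enumeration.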
[0074] Subsequently, the convergence determination unit determines whether convergence has been achieved (step S12). If convergence has not been achieved (step S12: No), processing returns to the sound source existence posterior probability updating unit 12 (step S5). If convergence has been achieved (step S12: Yes), processing proceeds to the sound source signal component estimation unit 19 (step S13).
[0075] The sound source signal component estimation unit 19 receives the observation signal vectors y(t,f) from the observation signal vector generation unit 11 and the sound source existence posterior probabilities γ_n(t,f) from the sound source existence posterior probability updating unit 12. It then calculates and outputs estimates ^x_n(t,f) of the sound source signal components x_n(t,f) (step S13).
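The exact estimator used in step S13 is not given in this section. One common choice that uses only y(t,f) and γ_n(t,f) is posterior-probability masking, ^x_n(t,f) = γ_n(t,f) y(t,f). The sketch below implements that assumed form. The function name is hypothetical.

```python
import numpy as np

def estimate_components(Y, gamma):
    """Assumed masking estimator: X_hat[t, f, n] = gamma_n(t,f) * y(t,f).

    Y:     observation vectors, shape (T, F, M)
    gamma: sound source existence posterior probabilities, shape (T, F, N)
    Returns X_hat with shape (T, F, N, M).
    """
    # Broadcast (T, F, N, 1) * (T, F, 1, M) -> (T, F, N, M).
    return gamma[..., :, None] * Y[..., None, :]
```

Because the posteriors sum to 1 over n at each time-frequency point, the masked components sum back to the observation, so no energy is created or lost by this choice.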
[0076] Next, the features of the first embodiment will be described in comparison with the conventional technique. As described earlier, in the conventional technique, the prior distribution p(R_n(1), ..., R_n(F)) of the spatial covariance matrixes R_n(1), ..., R_n(F) over all frequency bins is modeled by expression (16) (a restatement of expression (10)).
[0077] However, because the conventional technique assumes that the position of each sound source is known, it cannot be applied when the sound source positions are unknown.
[0078] In contrast, in the present first embodiment, the prior distribution p(R_n(1), ..., R_n(F)) of the spatial covariance matrixes R_n(1), ..., R_n(F) over all frequency bins is modeled by the complex inverse Wishart mixture distribution of expression (17).
[0079] This is a weighted average of the prior distributions for the sound source position candidates k, weighted by the probability π_kn that sound source n is at candidate k. Because the sound source positions are assumed to be unknown in the present first embodiment, π_kn is an unknown probability. However, since π_kn is a probability, it satisfies expression (18).
[0080] In this way, by using a weighted sum with the unknown probabilities π_kn, the prior distributions of the spatial covariance matrixes can be designed even when the sound source positions are unknown. Although π_kn is unknown, it too can be regarded as an unknown parameter and estimated jointly with the other unknown parameters.
[0081] In the present first embodiment, the parameters Ψ_k(f) and ν_k(f) of the complex inverse Wishart distribution for each sound source position candidate k and each frequency bin f are prepared in advance and stored in the storage unit 13. These parameters may be prepared in advance from information on the microphone arrangement, or learned in advance from actually measured data.
[0082] For example, when these parameters are prepared from information on the microphone arrangement, let r_m denote the Cartesian coordinates of microphone m. It then suffices to compute the plane-wave steering vector corresponding to each sound source position candidate k by expression (19), and to compute Ψ_k(f) and ν_k(f) by expressions (20) and (21).
[0083] Here, d_k denotes a unit vector indicating the arrival direction of a sound source signal corresponding to the k-th sound source position candidate, c denotes the speed of sound, ω_f denotes the angular frequency corresponding to frequency bin f, j defined by expression (21-1) denotes the imaginary unit, and the superscript H denotes the Hermitian transpose.
[Formula 22]
$$j = \sqrt{-1} \tag{21-1}$$
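Expressions (19) to (21) are not reproduced in this text. The sketch below therefore covers only the plane-wave steering vectors for the round-table example of [0051]. It assumes that each element has the form a_km(f) = exp(j ω_f d_k^T r_m / c). The sign of the phase depends on the STFT convention and is an assumption, as are the in-plane direction vectors and the default c = 343 m/s. Mapping the steering vectors to Ψ_k(f) and ν_k(f) is left to expressions (20) and (21). The function names are hypothetical.

```python
import numpy as np

def candidate_directions(K):
    """Unit vectors d_k for the K azimuths theta, 2*theta, ..., K*theta with theta = 360/K degrees."""
    phi = 2 * np.pi * np.arange(1, K + 1) / K
    return np.stack([np.cos(phi), np.sin(phi), np.zeros(K)], axis=1)  # (K, 3)

def steering_vectors(mic_pos, directions, omega, c=343.0):
    """Plane-wave steering vectors, assumed form a_km(f) = exp(j * omega_f * d_k^T r_m / c).

    mic_pos:    (M, 3) Cartesian microphone coordinates r_m in metres
    directions: (K, 3) unit arrival-direction vectors d_k
    omega:      (F,) angular frequencies omega_f in rad/s
    Returns an array of shape (F, K, M).
    """
    delays = directions @ mic_pos.T / c  # (K, M): d_k^T r_m / c in seconds
    return np.exp(1j * omega[:, None, None] * delays[None, :, :])
```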
[0084] The derivation of the prior distribution (expression (17)) according to the present first embodiment is now described. The sound source positions are assumed to be unknown. The sound source position index k_n, which corresponds to the position of each sound source n, is assumed to follow the unknown probability distribution of expression (22). π_kn denotes the sound source position prior probability, i.e., the probability distribution of the sound source position index for each sound source.
[Formula 23]
$$P(k_n = k \mid \pi_{1n}, \dots, \pi_{Kn}) = \pi_{kn} \tag{22}$$
[0085] Furthermore, in the present first embodiment, conditioned on the sound source position index of sound source n being k_n = k, the spatial covariance matrixes R_n(1), ..., R_n(F) of sound source n are assumed to be mutually independent, each following the probability distribution of expression (23).
[Formula 24]
$$p\bigl(R_n(f) \mid k_n = k\bigr) = \mathcal{IW}_{\mathbb{C}}\bigl(R_n(f);\, \Psi_k(f), \nu_k(f)\bigr) \tag{23}$$
[0086] Here, Ψ_k(f) denotes a parameter (scale matrix) that determines the position of the peak (mode) of the prior distribution of the spatial covariance matrix for each sound source position candidate. ν_k(f) denotes a parameter (degree of freedom) that determines the spread around that peak. Also, IW_C(R; Ψ, ν), given by expression (24), denotes the complex inverse Wishart distribution with scale matrix Ψ and degree of freedom ν.
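Expression (24) is not reproduced here. The sketch below evaluates the standard complex inverse Wishart log-density, assuming that (24) uses this parameterization:

IW_C(R; Ψ, ν) = |Ψ|^ν exp(−tr(Ψ R^{-1})) / (Γ̃_M(ν) |R|^{ν+M}),

where Γ̃_M(ν) = π^{M(M−1)/2} ∏_{i=1}^{M} Γ(ν − i + 1) is the complex multivariate gamma function. Under this form, the mode is Ψ/(ν + M), which matches the description of Ψ_k(f) as locating the peak. The function name is hypothetical.

```python
import math
import numpy as np

def log_complex_inv_wishart(R, Psi, nu):
    """log IW_C(R; Psi, nu) for M x M Hermitian positive definite R and Psi, nu > M - 1.

    Uses |Psi|^nu / (Gamma_M(nu) |R|^(nu+M)) * exp(-tr(Psi R^{-1})) with
    Gamma_M(nu) = pi^(M(M-1)/2) * prod_{i=1}^{M} Gamma(nu - i + 1).
    """
    M = R.shape[0]
    _, logdet_R = np.linalg.slogdet(R)      # log|R| (real for positive definite R)
    _, logdet_Psi = np.linalg.slogdet(Psi)  # log|Psi|
    log_gamma_M = (M * (M - 1) / 2) * math.log(math.pi) + sum(
        math.lgamma(nu - i + 1) for i in range(1, M + 1))
    trace_term = np.trace(Psi @ np.linalg.inv(R)).real
    return nu * logdet_Psi - (nu + M) * logdet_R - trace_term - log_gamma_M
```

For M = 1 this reduces to an inverse-gamma log-density with shape ν and scale Ψ, which gives a quick sanity check on the normalizing constant.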
[0087] Under the modeling of expressions (22) and (23), the probability distribution of the spatial covariance matrixes R_n(1), ..., R_n(F) of sound source n is given by expressions (25) to (28).
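As a worked sketch of this step, one can marginalize k_n out of expressions (22) and (23), using the conditional independence across frequency bins stated in [0085]. This gives the mixture form of expression (17). The grouping of steps in the original expressions (25) to (28) may differ.

$$
\begin{aligned}
p\bigl(R_n(1),\dots,R_n(F)\bigr)
&= \sum_{k=1}^{K} P(k_n = k \mid \pi_{1n},\dots,\pi_{Kn})\; p\bigl(R_n(1),\dots,R_n(F) \mid k_n = k\bigr) \\
&= \sum_{k=1}^{K} \pi_{kn} \prod_{f=1}^{F} \mathcal{IW}_{\mathbb{C}}\bigl(R_n(f);\, \Psi_k(f), \nu_k(f)\bigr).
\end{aligned}
$$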
[0088] In the present embodiment, the parameters are estimated based on the prior distribution of expression (17). The parameter estimation algorithm of the present embodiment is described below. Hereinafter, for simplicity, the complex inverse Wishart distribution IW_C is written simply as IW, omitting the subscript C. Assuming that the prior distributions of the unknown parameters other than the spatial covariance matrixes R_n(f) are uniform, the prior distribution of the parameters is given by expressions (29) and (30).
[0089] The parameters Θ of the present first embodiment consist of the sound source existence prior probabilities α_n(f), the power parameters v_n(t,f), the spatial covariance matrixes R_n(f), and the sound source position prior probabilities π_kn.
[0090] Given the parameters Θ, and assuming that the observation signal vectors y(t,f) at different time-frequency points are independent of each other, the likelihood is given by expressions (31) and (32).
[0091] Here, Y collectively denotes the observation signal vectors y(t,f) at all time-frequency points.
[0092] In the present first embodiment, the parameters Θ are estimated by maximizing their posterior probability p(Θ|Y). By Bayes' theorem, this posterior probability can be expressed by expression (33), and taking the logarithm of both sides yields expression (34).
[0093] Since ln p(Y) does not depend on the parameters Θ, maximizing the posterior probability p(Θ|Y) with respect to Θ is equivalent to maximizing expression (35) with respect to Θ. It is therefore equivalent to maximizing the objective function J(Θ) of expression (36) with respect to Θ.
[0094] Here, the symbol "=" with "c" written immediately above it indicates that both sides are equal up to an additive constant that does not depend on the parameters Θ. Also, A =: B means that B is defined as A.
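Expressions (31) to (36) are not reproduced in this text. The form below is a sketch under two assumptions. First, the likelihood is taken to be a mixture of zero-mean complex Gaussians weighted by α_n(f), one reading consistent with the existence prior and posterior probabilities, power parameters, and spatial covariance matrixes described above. Second, the prior is the mixture of expression (17). Under these assumptions the objective would read:

$$
J(\Theta) = \sum_{t=1}^{T}\sum_{f=1}^{F} \ln \sum_{n=1}^{N} \alpha_n(f)\,\mathcal{N}_{\mathbb{C}}\bigl(y(t,f);\, 0,\, v_n(t,f)R_n(f)\bigr)
+ \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_{kn} \prod_{f=1}^{F} \mathcal{IW}\bigl(R_n(f);\, \Psi_k(f), \nu_k(f)\bigr),
\qquad
\mathcal{N}_{\mathbb{C}}(y; 0, \Sigma) = \frac{\exp\bigl(-y^{\mathsf{H}}\Sigma^{-1}y\bigr)}{\pi^{M}\det\Sigma}.
$$

Both the sum over n and the sum over k sit inside a logarithm, which is the difficulty that the auxiliary function method is used to sidestep.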
[0095] The objective function J(Θ) above can be maximized by the auxiliary function method. In the auxiliary function method, the following two steps are iterated alternately, based on an auxiliary function Q(Θ, Φ). This is a function of the parameters Θ and an additional variable Φ called an auxiliary variable.
1. A step of updating the auxiliary variable Φ by maximizing the auxiliary function Q(Θ, Φ) with respect to Φ.
2. A step of updating the parameters Θ so that the auxiliary function Q(Θ, Φ) does not decrease.
[0096] The auxiliary function Q(Θ, Φ) is required to satisfy the condition of expression (37).
[0097] For arbitrary Θ, $$J(\Theta) = \max_{\Phi} Q(\Theta, \Phi) \tag{37}$$
[0098] With this auxiliary function method, the objective function J(Θ) never decreases from one iteration to the next. That is, denoting by Θ^(i) the estimated parameters obtained in the i-th iteration, expression (38) holds.
[Formula 33]
$$J\bigl(\Theta^{(i)}\bigr) \le J\bigl(\Theta^{(i+1)}\bigr) \tag{38}$$
[0099] Indeed, denoting by Φ^(i) the value of the auxiliary variable obtained in the i-th iteration, expressions (39) and (40) follow from expression (37). This is because Φ^(i+1) and Φ^(i+2) maximize Q with respect to Φ for Θ^(i) and Θ^(i+1), respectively.
[Formula 34]
$$J\bigl(\Theta^{(i)}\bigr) = Q\bigl(\Theta^{(i)}, \Phi^{(i+1)}\bigr) \tag{39}$$
$$J\bigl(\Theta^{(i+1)}\bigr) = Q\bigl(\Theta^{(i+1)}, \Phi^{(i+2)}\bigr) \tag{40}$$
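[0100] Therefore, expression (41) holds. The first inequality follows from step 2 (updating Θ does not decrease Q), and the second from step 1 (Φ^(i+2) maximizes Q(Θ^(i+1), Φ)). Combining expressions (39) to (41) yields expression (38).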
[Formula 35]
$$Q\bigl(\Theta^{(i)}, \Phi^{(i+1)}\bigr) \le Q\bigl(\Theta^{(i+1)}, \Phi^{(i+1)}\bigr) \le Q\bigl(\Theta^{(i+1)}, \Phi^{(i+2)}\bigr) \tag{41}$$
[0101] The auxiliary function method requires designing an auxiliary function Q(Θ, Φ) that satisfies expression (37). To this end, the present first embodiment uses Jensen's inequality. It is known that expression (43) holds if three conditions are met: f is a concave (upward-convex) function, w_1, ..., w_L are non-negative numbers satisfying expression (42), and x_1, ..., x_L are real numbers. Equality holds when x_1 = ... = x_L.
[0102] This is called Jensen's inequality. In particular, for f(x) = ln x, expression (44) is obtained.
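Expression (44) can be checked numerically. This sketch assumes that (44) reads ln(Σ_l w_l x_l) ≥ Σ_l w_l ln x_l for positive x_l and weights on the probability simplex. It also shows the rewriting used when building the auxiliary function, ln Σ_l a_l ≥ Σ_l w_l ln(a_l / w_l), which becomes tight when w_l is proportional to a_l. The helper names are hypothetical.

```python
import math

def jensen_gap(w, x):
    """ln(sum_l w_l x_l) - sum_l w_l ln x_l; nonnegative for simplex weights w and x_l > 0."""
    assert abs(sum(w) - 1.0) < 1e-12 and all(wl >= 0 for wl in w)
    return math.log(sum(wl * xl for wl, xl in zip(w, x))) - sum(
        wl * math.log(xl) for wl, xl in zip(w, x) if wl > 0)

def log_sum_lower_bound(a, w):
    """Jensen lower bound sum_l w_l ln(a_l / w_l) <= ln(sum_l a_l) for a_l > 0."""
    return sum(wl * math.log(al / wl) for al, wl in zip(a, w) if wl > 0)
```

For a = (1, 2, 3), uniform weights give (1/3) ln 162, which is strictly below ln 6. The weights (1/6, 2/6, 3/6), proportional to a, recover ln 6 exactly. This proportional choice is the form the equality conditions take.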
[0103] Letting γ_1(t,f), ..., γ_N(t,f) be non-negative numbers that satisfy expression (45), expressions (46) and (47) are obtained from expression (44).
[0104] Furthermore, letting ρ_1n, ..., ρ_Kn be non-negative numbers that satisfy expression (48), expressions (49) and (50) are obtained from expression (44).
[0105] Expression (51) is obtained from expression (47) and expression (50).
[0106] Therefore, denoting the right-hand side of expression (51) by Q(Θ, Φ) as in expression (52), expression (53) follows from expressions (36) and (51).
[Formula 45]
For arbitrary Θ and Φ, $$Q(\Theta, \Phi) \le J(\Theta) \tag{53}$$
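Expressions (45) to (52) are not reproduced in this text. As a sketch, assume that J(Θ) consists of two log-sums: a complex Gaussian mixture likelihood weighted by α_n(f), and the mixture prior of expression (17). Applying expression (44) to each log-sum, with weights γ_n(t,f) (summing to 1 over n) and ρ_kn (summing to 1 over k), gives an auxiliary function of the following form:

$$
Q(\Theta,\Phi) = \sum_{t=1}^{T}\sum_{f=1}^{F}\sum_{n=1}^{N} \gamma_n(t,f)\ln\frac{\alpha_n(f)\,\mathcal{N}_{\mathbb{C}}\bigl(y(t,f);0,v_n(t,f)R_n(f)\bigr)}{\gamma_n(t,f)}
+ \sum_{n=1}^{N}\sum_{k=1}^{K} \rho_{kn}\ln\frac{\pi_{kn}\prod_{f=1}^{F}\mathcal{IW}\bigl(R_n(f);\Psi_k(f),\nu_k(f)\bigr)}{\rho_{kn}}.
$$

The bound is tight when γ_n(t,f) is proportional to α_n(f) N_C(y(t,f); 0, v_n(t,f) R_n(f)) and ρ_kn is proportional to π_kn ∏_f IW(R_n(f); Ψ_k(f), ν_k(f)). In that case each logarithm inside the sums becomes the same constant, the log of the normalizer.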
[0107] Here, the auxiliary variable Φ consists of γ_n(t,f) and ρ_kn.
[0108] The conditions for equality in expression (51) are expressions (54) and (55).
[0109] This is equivalent to the following expression (56) and expression (57).
[0110] Therefore, expression (58) holds.
[Formula 48]
For arbitrary Θ, there exists Φ such that $$Q(\Theta, \Phi) = J(\Theta) \tag{58}$$
[0111] From expressions (53) and (58), it is clear that Q(Θ, Φ) of expression (52) satisfies expression (37). In this manner, an auxiliary function for the objective function J(Θ) has been designed.
[0112] In the present first embodiment, the auxiliary variable Φ and the parameters Θ are updated as follows, based on the auxiliary function Q(Θ, Φ) of expression (52). The auxiliary variable Φ is updated using expressions (56) and (57), and the parameters Θ are updated using expressions (59) to (62).
[0113] In this way, the present first embodiment does not maximize the objective function of expression (36) directly. Instead, it maximizes that function indirectly by alternately iterating two steps based on the auxiliary function Q(Θ, Φ): updating Φ by maximizing Q(Θ, Φ) with respect to Φ, and updating Θ so that Q(Θ, Φ) does not decrease. In the objective function of expression (36), the sum Σ_{k=1}^{K} over k appears inside the logarithm ln, so its derivatives with respect to the parameters are complicated. Directly maximizing it by, for example, a gradient method would therefore lead to complicated update rules. In the auxiliary function Q(Θ, Φ), by contrast, the sum over k lies outside the logarithm, and its derivatives with respect to the parameters are simple. Moreover, a gradient method requires tuning a step size that sets the parameter update amount per iteration. The auxiliary function method has no step size and therefore requires no such tuning.
[0114] γ_n(t,f) updated by expression (56) is precisely the sound source existence probability after the observation signal vector y(t,f) has been observed. Indeed, by Bayes' theorem, expression (56) can also be written as expression (63).
[0115] Hence, γ_n(t,f) is called the sound source existence posterior probability. In contrast, α_n(f) (expression (64)) is the sound source existence probability before y(t,f) is observed, and is thus called the sound source existence prior probability.
[Formula 54]
$$\alpha_n(f) = P\bigl(n(t,f) = n \mid \Theta\bigr) \tag{64}$$
[0116] Furthermore, ρ_kn updated by expression (57) is precisely the sound source position probability after the spatial covariance matrixes R_n(1), ..., R_n(F) have been given. Indeed, expression (57) can also be written as expression (65).
[0117] Hence, ρ_kn is called the sound source position posterior probability. In contrast, π_kn (expression (66)) is the sound source position probability before R_n(1), ..., R_n(F) are given, and is thus called the sound source position prior probability.
[Formula 56]
$$\pi_{kn} = P(k_n = k \mid \pi_{1n}, \dots, \pi_{Kn}) \tag{66}$$
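The two posterior updates, expressions (56) and (57), can be sketched as normalized products of prior and likelihood. The sketch assumes the complex Gaussian likelihood N_C(y(t,f); 0, v_n(t,f) R_n(f)) and a precomputed array `log_iw[k, n]` holding Σ_f log IW(R_n(f); Ψ_k(f), ν_k(f)). The function names and array layouts are hypothetical. Normalization is done in the log domain for numerical stability.

```python
import numpy as np

def log_complex_gaussian(Y, v, R):
    """log N_C(y(t,f); 0, v_n(t,f) R_n(f)) for all t, f, n.

    Y: (T, F, M), v: (T, F, N), R: (F, N, M, M) Hermitian positive definite.
    Returns an array of shape (T, F, N).
    """
    M = Y.shape[-1]
    R_inv = np.linalg.inv(R)                                          # (F, N, M, M)
    _, logdet_R = np.linalg.slogdet(R)                                # (F, N)
    quad = np.einsum('tfi,fnij,tfj->tfn', Y.conj(), R_inv, Y).real    # y^H R^{-1} y
    # log|v R| = M log v + log|R|; (v R)^{-1} = R^{-1} / v.
    return -M * np.log(np.pi) - M * np.log(v) - logdet_R[None] - quad / v

def softmax(log_w, axis):
    """Normalize exp(log_w) along axis after subtracting the maximum."""
    w = np.exp(log_w - log_w.max(axis=axis, keepdims=True))
    return w / w.sum(axis=axis, keepdims=True)

def update_posteriors(Y, alpha, pi, R, v, log_iw):
    """Assumed forms of (56) and (57); alpha: (F, N), pi: (K, N), log_iw: (K, N)."""
    gamma = softmax(np.log(alpha)[None] + log_complex_gaussian(Y, v, R), axis=-1)
    rho = softmax(np.log(pi) + log_iw, axis=0)
    return gamma, rho  # shapes (T, F, N) and (K, N)
```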
[0118] The process of expression (56) is performed in the sound source existence posterior probability updating unit 12, the process of expression (57) is performed in the sound source position posterior probability updating unit 14, the process of expression (59) is performed in the sound source existence prior probability updating unit 15, the process of expression (60) is performed in the sound source position prior probability updating unit 16, the process of expression (61) is performed in the spatial covariance matrix updating unit 17, and the process of expression (62) is performed in the power parameter updating unit 18.
[0119] The derivation of expressions (59) to (62), which give the update rules of the parameters Θ, is now described. First, the auxiliary function of expression (52) can be computed as in expressions (67) and (68), where C is a constant that does not depend on the parameters Θ.
[0120] To derive expression (59), the update rule of the sound source existence prior probabilities .sub.n(f), a Lagrange multiplier is introduced to enforce the constraint of expression (6), and the derivative of expression (69) with respect to .sub.n(f) is set to 0. This yields expression (70).
[0121] Solving expression (70) for .sub.n(f) yields expression (71).
[0122] Substituting expression (71) into the constraint of expression (6) to determine the value of the Lagrange multiplier contained in expression (71) yields expressions (72) to (74).
[0123] Therefore, the Lagrange multiplier equals T, and expression (59), the update rule of the sound source existence prior probabilities .sub.n(f), is obtained. Expression (60), the update rule of the sound source position prior probabilities .sub.kn, can be derived in the same manner, so its derivation is omitted.
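The Lagrange-multiplier argument of paragraphs [0120] to [0123] can be written compactly as follows, using the placeholder symbols α_n(f) for the sound source existence prior probability and γ_n(t,f) for its posterior (our own notation). The sketch assumes that the part of the auxiliary function that depends on α_n(f) is Σ_t Σ_n γ_n(t,f) ln α_n(f), that the constraint of expression (6) is Σ_n α_n(f) = 1, and that Σ_n γ_n(t,f) = 1 at every (t,f), which is what makes the multiplier equal T.

```latex
\begin{aligned}
\mathcal{L} &= \sum_{t=1}^{T}\sum_{n=1}^{N}\gamma_n(t,f)\ln\alpha_n(f)
  + \lambda\Bigl(1-\sum_{n=1}^{N}\alpha_n(f)\Bigr),\\
\frac{\partial\mathcal{L}}{\partial\alpha_n(f)}
  &= \frac{1}{\alpha_n(f)}\sum_{t=1}^{T}\gamma_n(t,f)-\lambda = 0
  \;\Longrightarrow\; \alpha_n(f)=\frac{1}{\lambda}\sum_{t=1}^{T}\gamma_n(t,f),\\
\sum_{n=1}^{N}\alpha_n(f)=1
  &\;\Longrightarrow\; \lambda=\sum_{t=1}^{T}\sum_{n=1}^{N}\gamma_n(t,f)=T
  \;\Longrightarrow\; \alpha_n(f)=\frac{1}{T}\sum_{t=1}^{T}\gamma_n(t,f).
\end{aligned}
```

Applying the same steps to a term Σ_k μ_kn ln β_kn with Σ_k β_kn = 1 (again placeholder symbols for the position posterior and prior) gives a multiplier of 1 and β_kn = μ_kn, that is, the prior is simply set to the posterior, since each source n has a single position variable.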
[0124] To derive expression (61), the update rule of the spatial covariance matrices R_n(f), the derivative of expression (68) with respect to R_n(f) is set to 0, which yields expression (75).
[0125] Multiplying both sides of expression (75) by R_n(f) from the left and from the right yields expression (76). Solving this for R_n(f) gives expression (61), the update rule of the spatial covariance matrices R_n(f).
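Paragraphs [0124] and [0125] can be illustrated with a sketch of the resulting closed-form update. It assumes (our assumption, not a transcription of expression (61)) that the R_n(f)-dependent part of the auxiliary function is the sum over t of γ_n(t,f) ln N_C(y(t,f); 0, v_n(t,f)R_n(f)) plus the sum over k of μ_kn ln CIW(R_n(f); Ψ_k(f), ν_k(f)), where γ and μ are placeholder names for the existence and position posteriors, and the complex inverse Wishart density is proportional to det(R)^-(ν+M) exp(-tr(Ψ R^-1)). Setting the derivative to 0 and multiplying by R_n(f) from both sides then gives R_n(f) = [Σ_k μ_kn Ψ_k(f) + Σ_t γ_n(t,f) y y^H / v_n(t,f)] / [Σ_k μ_kn (ν_k(f) + M) + Σ_t γ_n(t,f)].

```python
import numpy as np


def update_spatial_covariance(Y, gamma_n, v_n, mu_n, Psi_f, nu_f):
    """Closed-form update of R_n(f) for one source n and one frequency f (sketch).

    Y: (T, M) observation vectors y(t, f); gamma_n: (T,) existence posteriors;
    v_n: (T,) power parameters; mu_n: (K,) position posteriors;
    Psi_f: (K, M, M) inverse Wishart scale matrices; nu_f: (K,) degrees of freedom.
    """
    T, M = Y.shape
    # Data term: sum_t gamma_n(t,f) * y y^H / v_n(t,f)
    weights = gamma_n / v_n
    data = np.einsum("t,ti,tj->ij", weights, Y, Y.conj())
    # Prior term: sum_k mu_kn * Psi_k(f)
    prior = np.einsum("k,kij->ij", mu_n, Psi_f)
    denom = np.sum(mu_n * (nu_f + M)) + np.sum(gamma_n)
    R = (prior + data) / denom
    return 0.5 * (R + R.conj().T)  # remove round-off asymmetry; R stays Hermitian
```

The prior behaves like pseudo-observations: with all γ set to zero and a single candidate k, the update returns that candidate's prior mode Ψ_k(f)/(ν_k(f) + M).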
[0126] To derive expression (62), the update rule of the power parameters v_n(t,f), the derivative of expression (68) with respect to v_n(t,f) is set to 0, which yields expression (77).
[0127] Solving this for v_n(t,f) gives expression (62), the update rule of the power parameters v_n(t,f). This completes the derivation of expressions (59) to (62), the update rules of the parameters.
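For paragraph [0126], if (as assumed here) the power parameters have no prior and enter the auxiliary function only through γ_n(t,f) ln N_C(y(t,f); 0, v_n(t,f)R_n(f)), the v_n(t,f)-dependent part is γ_n(t,f)(-M ln v_n(t,f) - y^H R_n(f)^-1 y / v_n(t,f)). Its derivative vanishes at v_n(t,f) = y^H R_n(f)^-1 y / M, independently of γ_n(t,f) when it is positive. A vectorized sketch over frames, with hypothetical names:

```python
import numpy as np


def update_power(Y, R_n):
    """Power update v_n(t, f) = y^H R_n(f)^{-1} y / M for all frames t at one frequency f.

    Y: (T, M) observation vectors; R_n: (M, M) spatial covariance matrix of source n.
    """
    M = Y.shape[1]
    X = np.linalg.solve(R_n, Y.T)                     # column t holds R_n^{-1} y(t, f)
    quad = np.real(np.sum(Y.conj().T * X, axis=0))   # y^H R_n^{-1} y for each t
    return quad / M
```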
[0128] The present first embodiment models the prior distributions of the spatial covariance matrices R_n(f), which are parameters of the complex Gaussian distribution, using the complex inverse Wishart distribution. Because the complex Gaussian distribution and the complex inverse Wishart distribution are used in combination, the auxiliary function Q takes a form in which the equation obtained by setting its derivative with respect to R_n(f) to 0 can be solved for R_n(f), as described above. This is because the complex inverse Wishart distribution is a conjugate prior of the complex Gaussian distribution with respect to its covariance matrix. For conjugate priors, see Reference Literature 2: C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Effects of First Embodiment
[0129] As described above, the present first embodiment estimates a signal source position prior probability. This probability is the mixture weight used to model the prior distribution of the spatial covariance matrix of each signal source as a mixture distribution, namely a linear combination of the prior distributions of spatial covariance matrices for a plurality of signal source position candidates; it is also the probability that the signal of each signal source arrives from each signal source position candidate. Specifically, the prior distribution of the spatial covariance matrix of each signal source is modeled as in expression (17). Because this prior is a weighted sum whose weights are the sound source position prior probabilities .sub.kn, which are themselves unknown and estimated, prior distributions of spatial covariance matrices can be designed even when the positions of the respective sound sources are unknown. Therefore, the present first embodiment can perform signal source separation based on prior distributions of spatial covariance matrices even when the positions of the sound sources are unknown.
[0130] Furthermore, because the auxiliary function of expression (52) does not contain the sum over k inside the logarithm ln, differentiating it with respect to each parameter is simple, and the parameter update computations remain uncomplicated.
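The simplification described in paragraph [0130] is typically obtained from Jensen's inequality: for any weights q_k > 0 summing to 1, ln Σ_k w_k p_k ≥ Σ_k q_k ln(w_k p_k / q_k), with equality when q_k is proportional to w_k p_k. The right-hand side has no sum inside the logarithm, so its derivatives decouple over k. Assuming expression (52) is constructed in this standard way (the sketch does not reproduce it), the following numerical check illustrates the bound and its tightness; all names are our own.

```python
import numpy as np


def log_mixture(weights, log_components):
    """Exact value ln(sum_k w_k p_k), computed stably via log-sum-exp."""
    a = np.log(weights) + log_components
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))


def jensen_lower_bound(weights, log_components, q):
    """sum_k q_k [ln(w_k p_k) - ln q_k]: a lower bound with no sum inside the log."""
    return np.sum(q * (np.log(weights) + log_components - np.log(q)))


def tight_q(weights, log_components):
    """Weights q_k proportional to w_k p_k, which make the bound an equality."""
    a = np.log(weights) + log_components
    q = np.exp(a - a.max())
    return q / q.sum()
```

In an auxiliary function method, q plays the role of the posterior computed in the previous step, and the bound is then maximized over the remaining parameters.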
[0131] Moreover, the present first embodiment models the prior distributions of the spatial covariance matrices using the complex inverse Wishart distribution. By combining the complex Gaussian distribution with the complex inverse Wishart distribution in this way, the auxiliary function Q takes a form in which the equation obtained by setting its derivative with respect to R_n(f) to 0 can be solved for R_n(f).
First Modification Example of First Embodiment
[0132] Although observation signal vectors y(t,f) are used as observation data in the present first embodiment, other feature vectors or feature amounts may be used as observation data. For example, feature vectors z(t,f) that are defined by expression (78) and expression (79) based on the observation signal vectors y(t,f) may be used.
[0133] Also, feature amounts, such as phase differences and amplitude ratios between microphones and arrival time differences between or arrival directions of sound source signals, may be used as observation data.
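As a concrete illustration of the feature amounts listed in paragraph [0133], inter-microphone phase differences and amplitude ratios can be computed from the observation vectors as below. Using microphone `ref` as the reference and the small `eps` guard are our own choices, and this sketch does not implement the feature vectors z(t,f) of expressions (78) and (79).

```python
import numpy as np


def interchannel_features(Y, ref=0, eps=1e-12):
    """Phase differences and amplitude ratios of each microphone relative to a reference.

    Y: (..., M) complex observation vectors y(t, f).
    Returns (phase_diff, amp_ratio), each of shape (..., M).
    """
    Y = np.asarray(Y, dtype=complex)
    ref_ch = Y[..., ref:ref + 1]                        # keep the channel axis for broadcasting
    phase_diff = np.angle(Y * np.conj(ref_ch))          # wrapped to (-pi, pi]
    amp_ratio = np.abs(Y) / (np.abs(ref_ch) + eps)      # eps avoids division by zero
    return phase_diff, amp_ratio
```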
[0134] Also, although the present first embodiment applies the complex Gaussian mixture distribution as a mixture model to the observation signal vectors, which serve as feature vectors, various other mixture models (e.g., a Gaussian mixture distribution, a Laplace mixture distribution, a complex Watson mixture distribution, a complex Bingham mixture distribution, a complex angular central Gaussian mixture distribution, a von Mises distribution, and the like) can be used depending on the feature vectors used. Furthermore, the model applied to the observation signal vectors is not limited to a mixture model; a single distribution such as the complex Gaussian distribution may also be used.
[0135] Also, although the present first embodiment models the prior distributions of the spatial covariance matrices using the complex inverse Wishart mixture distribution, other models, such as the complex Wishart mixture distribution, may be used instead.
[0136] Also, although the present first embodiment adopts a method of maximizing the posterior probabilities of the parameters to apply a model to observation data, a model may be applied to observation data using other methods.
[0137] Also, although optimization is performed using an auxiliary function method in the present first embodiment, optimization may be performed using other methods, such as a gradient method. In this case, the sound source existence posterior probability updating unit 12 and the sound source position posterior probability updating unit 14 are not indispensable.
Second Modification Example of First Embodiment
[0138] A second modification example of the first embodiment is now described, for the case where the true number of sound sources is unknown; in this example, the true number of sound sources is estimated and sound source separation is then performed. In the present modification example, the assumed number N of sound sources is set sufficiently large that it is at least the true number of sound sources. For example, when it is known that there are at most 6 sound sources, it suffices to set the assumed number of sound sources to N = 6, even if the actual number of sound sources turns out to be, for example, 4.
[0139] For each n (where n is an integer from 1 to N), the estimation unit 10 takes the sound source position candidate k that maximizes the sound source position prior probability .sub.kn received from the sound source position prior probability updating unit 16 as the estimated position of sound source n. The signal analysis device 1 then clusters the N sound source positions obtained in this manner, for example by hierarchical clustering, and uses the number of resulting clusters as the estimate N̂ of the actual number of sound sources.
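The source-count estimation of paragraph [0139] can be sketched with SciPy's hierarchical clustering. The candidate coordinates, the choice of single linkage, and the distance threshold used to cut the dendrogram are assumptions for illustration (the text only says "for example, hierarchical clustering"), and the function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def estimate_source_count(prior, candidate_positions, threshold):
    """Estimate the number of actual sources by clustering per-source position estimates.

    prior: (K, N) position prior probabilities, column n for assumed source n.
    candidate_positions: (K, D) coordinates of the K position candidates (hypothetical).
    threshold: distance at which the dendrogram is cut (hypothetical tuning parameter).
    Returns (n_hat, labels) with labels in 0..n_hat-1 for each assumed source.
    """
    best_k = np.argmax(prior, axis=0)                                  # most probable candidate per n
    positions = np.asarray(candidate_positions, dtype=float)[best_k]   # (N, D)
    Z = linkage(positions, method="single")                            # agglomerative clustering
    raw = fcluster(Z, t=threshold, criterion="distance")
    _, labels = np.unique(raw, return_inverse=True)                    # renumber as 0..n_hat-1
    return int(labels.max()) + 1, labels
```

Single linkage merges assumed sources whose estimated positions lie within `threshold` of each other, which matches the intuition that redundant assumed sources converge to the same candidate region.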
[0140] The N̂ clusters obtained by clustering are considered to correspond one-to-one to the N̂ actual sound sources. The clustering therefore reveals which of the N̂ actual sound sources each of the N assumed sound sources n corresponds to. The estimation unit 10 uses this correspondence in the subsequent sound source separation processing.
[0141] For each of the N̂ clusters n (where n is a cluster index, an integer from 1 to N̂), the estimation unit 10 calculates the sound source existence posterior probability .sub.n(t,f) of the n-th actual sound source by summing the sound source existence posterior probabilities .sub.n(t,f) of those of the N assumed sound sources that correspond to this cluster. Then, for each time-frequency point (t,f), the estimation unit 10 determines, similarly to expression (8), that the actual sound source whose index n maximizes the sound source existence posterior probability .sub.n(t,f) of the actual sound source is producing sound at (t,f). Finally, similarly to expression (4), the estimation unit 10 performs sound source separation by setting the estimate x̂_n(t,f) of the sound source signal component of the n-th actual sound source to y(t,f) when the n-th actual sound source is determined to be producing sound at (t,f), and to 0 otherwise.
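The merging and binary masking of paragraph [0141] can be sketched as follows, assuming cluster labels numbered 0 to N̂ - 1 for the assumed sources, as produced by a clustering step like the one in paragraph [0139]. Array shapes and names are our own.

```python
import numpy as np


def separate_by_clusters(Y, posterior, labels, n_hat):
    """Merge assumed-source posteriors per cluster, then apply a binary mask.

    Y: (T, F, M) observation vectors; posterior: (N, T, F) existence posteriors of
    the N assumed sources; labels: (N,) cluster index in 0..n_hat-1 per assumed source.
    Returns X_hat: (n_hat, T, F, M) estimated source signal components.
    """
    merged = np.zeros((n_hat,) + posterior.shape[1:])
    np.add.at(merged, np.asarray(labels), posterior)       # sum posteriors within each cluster
    winner = np.argmax(merged, axis=0)                      # (T, F): dominant actual source
    mask = winner[None] == np.arange(n_hat)[:, None, None]  # (n_hat, T, F) boolean mask
    return mask[..., None] * Y[None]                        # y(t,f) where active, 0 otherwise
```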
Third Modification Example of First Embodiment
[0142] The present first embodiment may be applied not only to sound signals but also to other signals (electroencephalograms, magnetoencephalograms, wireless signals, and the like). The observation signals are not limited to those obtained by a plurality of microphones (a microphone array); they may also be signals obtained by another sensor array (a plurality of sensors), such as that of an electroencephalography device, a magnetoencephalography device, or an antenna array, consisting of time series of signals generated at spatial positions.
Fourth Modification Example of First Embodiment
[0143] An example of modeling of probability distributions of observation signal vectors y(t,f) using a complex Gaussian distribution of the following expression (80) will be described as a fourth modification example of the first embodiment. In this case, the update rules of parameters are as indicated by expression (81) to expression (86), instead of expressions (56), (57), (59), (60), (61), and (62) of the first embodiment.
[0144] A configuration and processing of the fourth modification example of the first embodiment will be described using
[0145] As shown in
[0146] Similarly to the first embodiment, the observation signal vector generation unit 11 generates observation signal vectors y(t,f) using expression (1) (step S21 to step S23).
[0147] The initializing unit calculates initial estimates of the sound source position prior probability .sub.kn, the spatial covariance matrix R_n(f), and the power parameter v_n(t,f) (step S24), where n = 1, ..., N is the sound source index and k = 1, ..., K is the sound source position candidate index. For example, the initializing unit calculates these initial values from random numbers. The initializing unit also initializes n (step S25).
[0148] The storage unit 13 stores .sub.k(f) and .sub.k(f), the parameters of the prior distributions of the spatial covariance matrices for each sound source position candidate k and each frequency bin f.
[0149] Subsequently, the signal analysis device 201 adds 1 to n (step S26), and performs processes of step S27 to step S31.
[0150] The sound source position posterior probability updating unit 212 receives the prior distribution parameters .sub.k(f) and .sub.k(f) from the storage unit 13, the sound source position prior probability .sub.kn from the sound source position prior probability updating unit 214, and the spatial covariance matrix R_n(f) from the spatial covariance matrix updating unit 217, and updates the sound source position posterior probability .sub.kn using expression (81) (step S27). As an exception, in its first processing pass, the unit 212 instead receives the initial values of the sound source position prior probability and the spatial covariance matrix from the initializing unit.
[0151] The sound source signal posterior probability updating unit 213 receives the observation signal vectors y(t,f) from the observation signal vector generation unit 11, the power parameter v_n(t,f) from the power parameter updating unit 218, and the spatial covariance matrix R_n(f) from the spatial covariance matrix updating unit 217 (in its first processing pass, it instead receives the initial values of the power parameter and the spatial covariance matrix from the initializing unit), and updates the mean .sub.n(t,f) and the covariance matrix .sub.n(t,f) of the posterior distribution of the sound source signal component x_n(t,f) using expressions (82) and (83) (step S28).
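A common way to obtain a posterior mean and covariance of x_n(t,f) is shown below. It assumes the additive model y(t,f) = Σ_n x_n(t,f) with independent x_n(t,f) ~ N_C(0, v_n(t,f)R_n(f)); whether expression (80) has exactly this form is our assumption. Under it, the posterior mean is the multichannel Wiener filter output v_nR_n(Σ_m v_mR_m)^-1 y, and the posterior covariance is v_nR_n - v_nR_n(Σ_m v_mR_m)^-1 v_nR_n.

```python
import numpy as np


def source_posterior(y, v, R):
    """Posterior mean and covariance of each x_n given y = sum_n x_n (assumed model).

    y: (M,) observation at one (t, f); v: (N,) power parameters;
    R: (N, M, M) spatial covariance matrices. Returns means (N, M), covs (N, M, M).
    """
    C = np.einsum("n,nij->nij", v, R)        # prior covariances v_n R_n
    Cy_inv = np.linalg.inv(C.sum(axis=0))    # inverse covariance of y
    means, covs = [], []
    for Cn in C:
        W = Cn @ Cy_inv                      # multichannel Wiener filter for source n
        means.append(W @ y)
        covs.append(Cn - W @ Cn)
    return np.array(means), np.array(covs)
```

Because the Wiener filters sum to the identity under this model, the posterior means of all sources add up exactly to y(t,f).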
[0152] The sound source position prior probability updating unit 214 receives the sound source position posterior probability .sub.kn from the sound source position posterior probability updating unit 212, and updates the sound source position prior probability .sub.kn using expression (84) (step S29).
[Formula 71]
.sub.kn ← .sub.kn    (84)
[0153] The spatial covariance matrix updating unit 217 receives the prior distribution parameters .sub.k(f) and .sub.k(f) from the storage unit 13, the sound source position posterior probability .sub.kn from the sound source position posterior probability updating unit 212, the posterior mean .sub.n(t,f) and covariance matrix .sub.n(t,f) from the sound source signal posterior probability updating unit 213, and the power parameter v_n(t,f) from the power parameter updating unit 218 (in its first processing pass, it instead receives the initial value of the power parameter from the initializing unit), and updates the spatial covariance matrix R_n(f) using expression (85) (step S30).
[0154] The power parameter updating unit 218 receives the spatial covariance matrix R_n(f) from the spatial covariance matrix updating unit 217 and the posterior mean .sub.n(t,f) and covariance matrix .sub.n(t,f) from the sound source signal posterior probability updating unit 213, and updates the power parameter v_n(t,f) using expression (86) (step S31).
[0155] The signal analysis device 201 then determines whether n = N (step S32). If n ≠ N (step S32: No), the signal analysis device 201 returns to step S26. If n = N (step S32: Yes), the signal analysis device 201 proceeds to the determination processing of the convergence determination unit.
[0156] The convergence determination unit determines whether convergence has been achieved (step S33). If not (step S33: No), the signal analysis device 201 returns to step S25 and continues processing. If convergence has been achieved (step S33: Yes), the sound source signal posterior probability updating unit 213 outputs the posterior means .sub.n(t,f) as the estimates x̂_n(t,f) of the sound source signal components x_n(t,f) (step S34), and processing in the signal analysis device 201 ends.
Fifth Modification Example of First Embodiment
[0157] Although the spatial characteristics of a sound source signal are modeled using a spatial covariance matrix in the first embodiment, the spatial characteristics of a sound source signal may be modeled using other parameters. A parameter for modeling the spatial characteristics of a sound source signal is referred to as a spatial parameter here.
[0158] For example, the spatial characteristics of a sound source signal may be modeled using a steering vector as a spatial parameter. In this case, the probability distribution of an observation signal vector y(t,f) can be modeled using, for example, a complex Gaussian distribution of the following expression (87).
[0159] Here, h_n(f) denotes a steering vector, which is a spatial parameter modeling the spatial characteristics of sound source signal n, and .sub.1.sup.2 is a positive regularization constant. In this case, the prior distribution of h_n(f) is given by the following expression (88), in which p denotes the complex Gaussian distribution p_G.
[0160] Here, g_k(f) and .sub.2.sup.2 are hyperparameters: g_k(f) is the steering vector for the k-th sound source position candidate, and .sub.2.sup.2 is a positive regularization constant. The parameters can then be estimated from this model in the same manner as in the first embodiment.
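Based on the description of g_k(f) and the regularization constant in paragraph [0160], the prior of expression (88) can be read as a mixture of complex Gaussians centered on the candidate steering vectors. The sketch below assumes the form p(h_n(f)) = Σ_k w_kn N_C(h_n(f); g_k(f), σ² I), where the mixture weights w_kn play the role of the sound source position prior probabilities and σ² stands for the second regularization constant; this form and the names are our assumptions.

```python
import numpy as np


def log_steering_prior(h, g, weights, sigma2):
    """ln sum_k w_k N_C(h; g_k, sigma2 * I) for one steering vector h (assumed prior form).

    h: (M,) complex steering vector; g: (K, M) candidate steering vectors g_k(f);
    weights: (K,) mixture weights summing to 1; sigma2: positive variance.
    """
    M = h.shape[0]
    sq = np.sum(np.abs(h[None, :] - g) ** 2, axis=1)        # ||h - g_k||^2 for each k
    log_comp = -M * np.log(np.pi * sigma2) - sq / sigma2    # complex Gaussian log-densities
    a = np.log(weights) + log_comp
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))                # stable log-sum-exp over k
```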
[System Configuration, Etc.]
[0161] Also, the constituent elements of devices shown are functional concepts, and need not necessarily be physically configured as shown in the figures. That is to say, a specific form of separation and integration of devices is not limited to those shown in the figures, and all or a part of devices can be configured in a functionally or physically separated or integrated manner, in arbitrary units, in accordance with various types of loads, statuses of use, and the like. Furthermore, all or an arbitrary part of processing functions implemented in devices can be realized by a CPU and a program that is analyzed and executed by this CPU, or realized as hardware using a wired logic.
[0162] Also, among processes that have been described in the present embodiment, processes that have been described as being performed automatically can also be entirely or partially performed manually, or processes that have been described as being performed manually can also be entirely or partially performed automatically using a known method. In addition, processing procedures, control procedures, specific terms, and information including various types of data and parameters presented in the foregoing text and figures can be changed arbitrarily, unless specifically stated otherwise. That is to say, the processes that have been described in relation to the foregoing learning methods and speech recognition methods are not limited to being executed chronologically in the stated order, and may be executed in parallel or individually in accordance with the processing capacity of a device that executes the processes or as necessary.
[Program]
[0163]
[0164] The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. For example, a removable storage medium such as a magnetic disk and an optical disc is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.
[0165] The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, an application program 1092, a program module 1093, and program data 1094. That is to say, a program that defines the processes of the signal analysis devices 1, 201 is implemented as the program module 1093 in which codes that can be executed by the computer 1000 are written. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing processes that are similar to the functional configurations of the signal analysis devices 1, 201 is stored in the hard disk drive 1090. Note that the hard disk drive 1090 may be replaced with an SSD (Solid State Drive).
[0166] Also, setting data used in the processes of the above-described embodiment is stored as the program data 1094 in, for example, the memory 1010 and the hard disk drive 1090. Then, the CPU 1020 reads out the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 into the RAM 1012 and executes the same as necessary.
[0167] Note that the program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090, and may be, for example, stored in a removable storage medium and read out by the CPU 1020 via the disk drive 1100 and the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (a LAN (Local Area Network), a WAN (Wide Area Network), and the like). Then, the program module 1093 and the program data 1094 may be read out from another computer by the CPU 1020 via the network interface 1070.
[0168] Although the above has explained the embodiment to which the invention made by the present inventors is applied, the present invention is not limited by a description and figures that compose a part of the disclosure of the present invention based on the present embodiment. That is to say, other embodiments, examples, operating techniques, and the like that are implemented by, for example, a person skilled in the art based on the present embodiment are all encompassed within the scope of the present invention.
REFERENCE SIGNS LIST
[0169] 1, 201, 1P Signal analysis device
[0170] 10 Estimation unit
[0171] 11, 11P Observation signal vector generation unit
[0172] 12, 12P Sound source existence posterior probability updating unit
[0173] 13, 13P Storage unit
[0174] 14, 212 Sound source position posterior probability updating unit
[0175] 15, 14P Sound source existence prior probability updating unit
[0176] 16, 214 Sound source position prior probability updating unit
[0177] 17, 217, 15P Spatial covariance matrix updating unit
[0178] 18, 218, 16P Power parameter updating unit
[0179] 19, 17P Sound source signal component estimation unit
[0180] 213 Sound source signal posterior probability updating unit