Uncertainty measure of a mixture-model based pattern classifier
10891942 · 2021-01-12
CPC classification
G06N7/01
PHYSICS
G10L15/14
PHYSICS
International classification
G10L15/14
PHYSICS
G10L15/06
PHYSICS
Abstract
There are provided mechanisms for determining an uncertainty measure of a mixture-model based parametric classifier. A method is performed by a classification device. The method includes obtaining a short-term frequency representation of a multimedia signal. The short-term frequency representation defines an input sequence. The method includes classifying the input sequence to belong to one class of at least two available classes using the parametric classifier. The parametric classifier has been trained with a training sequence. The method includes determining an uncertainty measure of the classified input sequence based on a relation between posterior probabilities of the input sequence and posterior probabilities of the training sequence.
Claims
1. A method for determining an uncertainty measure of a mixture-model based parametric classifier, the method being performed by a classification device, the method comprising: obtaining a short-term frequency representation of a multimedia signal, the short-term frequency representation defining an input sequence x; classifying the input sequence x to belong to one class .sub.m* of at least two available classes .sub.1, .sub.2 using the parametric classifier, the parametric classifier having been trained with a training sequence y; and determining an uncertainty measure of the classified input sequence x based on a relation between posterior probabilities of the input sequence x and posterior probabilities of the training sequence y; and outputting at least the class .sub.m* and the determined uncertainty measure corresponding to the class .sub.m* for transmission.
2. The method according to claim 1, wherein the uncertainty measure describes a deviation from an optimal performance of the parametric classifier.
3. The method according to claim 2, wherein the optimal performance is based on the posterior probabilities of the training sequence y.
4. The method according to claim 1, wherein the uncertainty measure is defined as the minimum of 1 and a ratio between the posterior probabilities of the input sequence x and the posterior probabilities of the training sequence y.
5. The method according to claim 1, wherein there is one posterior probability of the input sequence x for each available class .sub.1, .sub.2, and wherein the posterior probability for a given class .sub.m represents probability of said given class .sub.m for the input sequence x.
6. The method according to claim 1, wherein there is one posterior probability of the training sequence y for each available class .sub.1, .sub.2, and wherein the posterior probability for a given class .sub.m represents probability of the given class .sub.m for the training sequence y.
7. The method according to claim 1, wherein the posterior probabilities of the input sequence x represent probability of classifying the input sequence x into class .sub.m given the input sequence x, and wherein the posterior probabilities of the training sequence y represent probability of classifying the training sequence y into class .sub.m given that the training sequence y belongs to class .sub.m.
8. The method according to claim 1, wherein the posterior probabilities are determined as weighted sum of respective per-cluster posterior probabilities.
9. The method according to claim 1, wherein each of the at least two available classes .sub.1, .sub.2 is associated with its own set of clusters.
10. The method according to claim 9, wherein the training sequence y is divided into training vectors y.sub.n, and wherein the posterior probabilities of the training sequence y are based on training weight factors u.sub.m,k that represent how many of the training vectors y.sub.n are associated to each cluster .sub.m,k of each class .sub.m.
11. The method according to claim 9, wherein the input sequence x is divided into input vectors x.sub.n, and wherein the posterior probabilities of the input sequence x are based on input weight factors v.sub.m*,k that represent how many of the input vectors x.sub.n are associated to each cluster .sub.m,k of the one class .sub.m* to which the input sequence x has been classified to belong, normalized with a total length of the input sequence x.
12. The method according to claim 1, further comprising: determining, for each cluster .sub.m,k, a relation between the input weight factors and the training weight factors for the one class .sub.m* to which the input sequence x has been classified to belong; and storing the relation.
13. The method according to claim 1, wherein the parametric classifier is based on Gaussian Mixture Models, GMMs.
14. The method according to claim 1, wherein each of the at least two available classes .sub.1, .sub.2 represents one of a unique language, a unique speaker, and a unique gender.
15. The method according to claim 1, wherein the short-term frequency representation is provided by mel-frequency cepstral components, MFCCs.
16. The method according to claim 15, further comprising: extracting the MFCCs from an audio waveform of the multimedia signal.
17. The method according to claim 16, wherein the MFCCs are represented by MFCC vectors, wherein the audio waveform is composed of frames, and wherein there is one MFCC vector per frame.
18. The method according to claim 1, wherein the obtaining, the classifying, and the determining are performed in an audio mining application.
19. A classification device for determining an uncertainty measure of a mixture-model based parametric classifier, the classification device comprising processing circuitry, the processing circuitry being configured to cause the classification device to: obtain a short-term frequency representation of a multimedia signal, the short-term frequency representation defining an input sequence x; classify the input sequence x to belong to one class .sub.m* of at least two available classes .sub.1, .sub.2 using the parametric classifier, the parametric classifier having been trained with a training sequence y; and determine an uncertainty measure of the classified input sequence x based on a relation between posterior probabilities of the input sequence x and posterior probabilities of the training sequence y; and output at least the class .sub.m* and the determined uncertainty measure corresponding to the class .sub.m* for transmission.
20. A non-transitory computer storage medium storing a computer program for determining an uncertainty measure of a mixture-model based parametric classifier, the computer program comprising computer code which, when run on processing circuitry of a classification device, causes the classification device to: obtain a short-term frequency representation of a multimedia signal, the short-term frequency representation defining an input sequence x; classify the input sequence x to belong to one class .sub.m* of at least two available classes .sub.1, .sub.2 using the parametric classifier, the parametric classifier having been trained with a training sequence y; and determine an uncertainty measure of the classified input sequence x based on a relation between posterior probabilities of the input sequence x and posterior probabilities of the training sequence y; and output at least the class .sub.m* and the determined uncertainty measure corresponding to the class .sub.m* for transmission.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The inventive concept is now described, by way of example, with reference to the accompanying drawings, in which:
DETAILED DESCRIPTION
(9) The inventive concept will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the inventive concept are shown. This inventive concept may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concept to those skilled in the art. Like numbers refer to like elements throughout the description. Any step or feature illustrated by dashed lines should be regarded as optional.
(10) With reference to
(11) Reference is here made to
(12) As noted above, it could be challenging to deduce the degree of uncertainty in the classification decision based on the likelihood P(x|.sub.m) itself, because it depends on the shape of the probability density function, the particular input sequence x, etc.
(13) Another option is to use the posterior probabilities P(.sub.m|x), which represent the probability of the class being .sub.m given the input sequence x, also considering the impact of the input sequence x being of finite length. For example, the input sequence x could correspond to a length of 200 ms.
(14) An example of the impact of the finite length of the input sequence x is presented in
(15) In conclusion, if the models are complex (consisting of many clusters) and the input sequence x is short, P(.sub.m|x) will be dependent on which clusters are associated to the particular points (as defined by vectors x.sub.n) in x. Since the model is global (a GMM outputs only a weighted sum of votes from individual clusters) the classifier 110 cannot capture how the short sequence factor will impact the certainty of the classification P(.sub.m|x).
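The global weighted-sum behavior described above can be sketched as follows. This is an illustrative Python example only, not part of the claimed method; the function names and all model values (weights, means, variances) are invented for illustration, and a one-dimensional feature with equal priors is assumed:

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density (scalar-feature case for illustration)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def class_likelihood(x, weights, means, variances):
    """GMM likelihood p(x | class): a weighted sum of per-cluster 'votes'."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

def posteriors(x_seq, models, priors=None):
    """P(class_m | x) for a sequence, assuming i.i.d. frames and equal priors."""
    M = len(models)
    priors = priors or [1.0 / M] * M
    # log-likelihood of the whole sequence under each class model
    loglik = np.array([sum(np.log(class_likelihood(x, *mdl)) for x in x_seq)
                       for mdl in models])
    loglik += np.log(priors)
    loglik -= loglik.max()          # numerical stabilisation before exponentiating
    p = np.exp(loglik)
    return p / p.sum()

# Two single-feature classes with two clusters each (illustrative values only)
model_1 = ([0.5, 0.5], [5.0, 11.0], [4.0, 4.0])
model_2 = ([0.5, 0.5], [9.0, 15.0], [4.0, 4.0])
print(posteriors([4.8, 5.3, 10.9], [model_1, model_2]))
```

Note that the output posterior is a single number per class: the per-cluster associations of the individual vectors x.sub.n are summed away, which is exactly why the classifier 110 itself cannot expose the short-sequence effect.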
(16) The embodiments disclosed herein thus relate to determining an uncertainty measure of the mixture-model based parametric classifier 110. In order to obtain such an uncertainty measure there is provided a classification device 200a, 200b, a method performed by the classification device 200a, 200b, a computer program product comprising code, for example in the form of a computer program, that when run on a classification device 200a, 200b, causes the classification device 200a, 200b to perform the method.
(18) Reference is now made to
(19) S102: The classification device 200a, 200b obtains a short-term frequency representation of a multimedia signal. Examples of the short-term frequency representation and examples of the multimedia signal will be provided below. The short-term frequency representation defines an input sequence x.
(20) S104: The classification device 200a, 200b classifies the input sequence x to belong to one class .sub.m* of at least two available classes .sub.1, .sub.2. The input sequence x is classified using the parametric classifier 110. The parametric classifier 110 has been trained with a training sequence y. In this respect the parametric classifier 110 implements operations which as such are known in the art for classifying the input sequence x into one class .sub.m*, of the at least two available classes .sub.1, .sub.2.
(21) S106: The classification device 200a, 200b determines an uncertainty measure of the thus classified input sequence x. The uncertainty measure is based on a relation between posterior probabilities of the input sequence x and posterior probabilities of the training sequence y. Examples of uncertainty measures will be provided below. Step S106 can be implemented by the uncertainty module 210a, 210b.
(22) Determination of the uncertainty measure, as performed by the uncertainty module 210a, 210b, can thereby be added to a legacy classification scheme, as performed by the parametric classifier 110, without modifying the core classification determination implemented by the parametric classifier 110.
(23) The determination of the uncertainty measure thus requires the classification device 200a, 200b to obtain the posterior probabilities of the input sequence x and to obtain the posterior probabilities of the training sequence y.
(24) Embodiments relating to further details of determining the uncertainty measure of a mixture-model based parametric classifier 110 as performed by the classification device 200a, 200b will now be disclosed. Reference is now made to
(25) There could be different kinds of parametric classifiers 110. According to an embodiment the parametric classifier 110 is based on Gaussian Mixture Models (GMMs).
(26) There could be different properties that the at least two available classes .sub.1, .sub.2 represent. For example, the classification of the input sequence x can be performed to classify languages, speakers, and/or genders. Hence, according to an embodiment each of the at least two available classes .sub.1, .sub.2 represents a unique language, a unique speaker, or a unique gender.
(27) There could be different ways for the classification device 200a, 200b to obtain the short-term frequency representation of the multimedia signal as in step S102. According to an embodiment the short-term frequency representation is provided by mel-frequency cepstral components (MFCCs). In this respect, the MFCCs are coefficients that collectively make up a mel-frequency cepstrum (MFC). The MFC is a representation of the short-term power spectrum of an audio waveform of the multimedia signal, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. According to one embodiment the MFCCs are made readily available to the classification device 200a, 200b. Hence, this embodiment requires a module configured to provide the MFCCs from an audio waveform of the multimedia signal. According to another embodiment the classification device 200a, 200b receives the audio waveform and extracts the MFCCs from the audio waveform. Hence, according to an embodiment the classification device 200a, 200b is configured to perform step S102a:
(28) Step S102a: The classification device 200 extracts the MFCCs from the audio waveform of the multimedia signal. How to extract MFCCs from an audio waveform is as such known in the art and further description thereof is therefore omitted. Step S102a is performed as part of step S102.
(29) Assuming that the input sequence x is divided into input vectors x.sub.n, each input vector x.sub.n, can then correspond to a vector of MFCCs. Assuming that the audio waveform is composed of frames, there is then one vector of MFCCs per frame.
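The frame-wise MFCC extraction of steps S102/S102a can be sketched as below. This is a minimal, self-contained approximation of a standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT-II), not the exact extraction procedure of the embodiments; all parameters (16 kHz sample rate, 25 ms frames, 10 ms hop, 26 mel bands, 13 coefficients) are conventional but assumed, and refinements such as pre-emphasis and liftering are omitted:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_mels=26, n_mfcc=13):
    """Return one MFCC vector per frame of the audio waveform."""
    # 1. Split the waveform into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Short-term power spectrum per frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Triangular mel filterbank on the power spectrum
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, power.shape[1]))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # 4. DCT-II of the log filterbank energies; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1)) / (2 * n_mels))
    return logmel @ dct.T

x = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise at 16 kHz
print(mfcc(x).shape)  # one 13-dimensional MFCC vector per frame
```

Each row of the returned matrix corresponds to one input vector x.sub.n, i.e., one vector of MFCCs per frame of the audio waveform.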
(30) There could be different types of audio waveforms. According to an embodiment the audio waveform represents a speech signal.
(31) There could be different types of audio classification of which the herein disclosed methods for parametric audio classification could be a part. According to an embodiment the step S102 of obtaining, the step S104 of classifying, and the step S106 of determining are performed in an audio mining application. In this respect, when uncertainty is detected in a phone recognition in an automatic speech recognition module, a language model could be adapted to compensate for the detected uncertainty. In another example, in a video or audio mining application the length of the input sequence x could be extended until a desired level of certainty is reached.
(32) Further, since the discriminant function, as defined in Equation (6), is additive in terms of the input sequence x, i.e., g.sub.m(x.sub.1, . . . , x.sub.N)=g.sub.m(x.sub.1, . . . , x.sub.N-1)+g.sub.m(x.sub.N), delaying the decision and accommodating more data is straightforward and does not require re-computation of the past contributions.
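The additive property can be illustrated with a running accumulator. Equation (6) itself is not reproduced in this text; the sketch below assumes the common GMM log-likelihood form g.sub.m(x)=.sub.n log p(x.sub.n|.sub.m), which has the stated additive structure, and all names and model values are illustrative:

```python
import numpy as np

def per_frame_score(x_n, weights, means, variances):
    """Contribution of a single vector x_n: log of the GMM density."""
    dens = sum(w * np.exp(-0.5 * (x_n - mu) ** 2 / v) / np.sqrt(2 * np.pi * v)
               for w, mu, v in zip(weights, means, variances))
    return np.log(dens)

class RunningDiscriminant:
    """Accumulates g_m(x_1, ..., x_N) one frame at a time."""
    def __init__(self, model):
        self.model = model
        self.total = 0.0

    def update(self, x_n):
        # Only the new frame's contribution is computed; past terms are kept
        self.total += per_frame_score(x_n, *self.model)
        return self.total

model = ([0.5, 0.5], [5.0, 11.0], [4.0, 4.0])
g = RunningDiscriminant(model)
scores = [g.update(x) for x in [4.9, 5.2, 11.3]]
# The running total equals the batch score: no re-computation of past frames
batch = sum(per_frame_score(x, *model) for x in [4.9, 5.2, 11.3])
assert abs(scores[-1] - batch) < 1e-12
```

This is what makes the "extend the input sequence until a desired level of certainty is reached" strategy cheap: each additional frame costs one per-frame evaluation.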
(33) There can be different kinds of uncertainty measures. According to an embodiment the uncertainty measure describes a deviation from an optimal performance of the parametric classifier 110. According to some aspects the optimal performance is based on the posterior probabilities of the training sequence y.
(34) There can be different kinds of posterior probabilities. According to an embodiment there is one posterior probability of the input sequence x for each available class .sub.1, .sub.2. The posterior probability for a given class .sub.m then represents the probability of the given class .sub.m for the input sequence x. Further, according to an embodiment there is one posterior probability of the training sequence y for each available class .sub.1, .sub.2. The posterior probability for a given class .sub.m then represents the probability of the given class .sub.m for the training sequence y. Examples of how the posterior probabilities can be determined will be provided below.
(35) As will be further disclosed below, according to some embodiments the uncertainty measure is defined as minimum of 1 and a ratio between the posterior probabilities of the input sequence x and the posterior probabilities of the training sequence y.
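In its simplest form, this definition is a one-line function. The sketch below is illustrative (the function and argument names are invented); the numerical values are those of the worked example further below:

```python
def uncertainty_measure(p_input, p_training):
    """min(1, ratio of input posterior to training posterior).

    A value close to 1 indicates high certainty; smaller values indicate
    that the classification of x falls short of the training performance.
    """
    return min(1.0, p_input / p_training)

assert uncertainty_measure(0.58, 0.7) < 1.0   # below training performance
assert uncertainty_measure(0.75, 0.7) == 1.0  # at or above training performance
```

Clipping at 1 reflects that performing better on the input than on the training data carries no additional information about uncertainty.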
(36) In the following it is assumed that the total set of parameters {u.sub.m,k, .sub.m,k, .sub.m,k}.sub.k=1.sup.K for all M classes are already available. These parameters could be estimated by the classification device 200a, 200b implementing an Expectation Maximization algorithm. Next will be described how to determine the uncertainty measure associated with these classes.
(37) Embodiments relating to the operations performed by the uncertainty module 210a will now be disclosed with reference to the classification device 200a of
(38) According to the embodiment of
(39) The embodiment of
(40) According to a first example, if the entire training sequence y is available, the classification device 200a determines the normalization factor .sub.m.sup.opt by implementing Equation (8):
(41) .sub.m.sup.opt=P(.sub.m|y) (8)
(42) According to a second example, if the entire training sequence y is not available, or if it is computationally prohibitive for the classification device 200a to implement Equation (6), the classification device 200a is configured to determine .sub.m.sup.opt by calculating the Bhattacharyya bound on a per-cluster basis and summing over all clusters and classes. This will give the total probability of error, which could be used as an estimate of (1−P(.sub.m|y.sup.inf)), where y.sup.inf denotes a training sequence of infinite length.
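For a pair of Gaussian clusters, the Bhattacharyya bound on the Bayes error can be computed in closed form. The sketch below shows the standard formula P.sub.e sqrt(p.sub.1p.sub.2)exp(D.sub.B) for two Gaussians; the cluster parameters are invented for illustration and the summation over all clusters and classes mentioned above is left out:

```python
import numpy as np

def bhattacharyya_bound(mu1, cov1, mu2, cov2, p1=0.5, p2=0.5):
    """Upper bound on the Bayes error between two Gaussian clusters:
    P_e <= sqrt(p1*p2) * exp(-D_B), with D_B the Bhattacharyya distance."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1 = np.atleast_2d(cov1).astype(float)
    cov2 = np.atleast_2d(cov2).astype(float)
    cov = 0.5 * (cov1 + cov2)                 # averaged covariance
    diff = mu2 - mu1
    d_b = (0.125 * diff @ np.linalg.solve(cov, diff)
           + 0.5 * np.log(np.linalg.det(cov)
                          / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))))
    return np.sqrt(p1 * p2) * np.exp(-d_b)

# Well-separated clusters give a small error bound; overlapping ones a larger bound
far = bhattacharyya_bound([5.0], [[4.0]], [15.0], [[4.0]])
near = bhattacharyya_bound([9.0], [[4.0]], [11.0], [[4.0]])
assert far < near <= 0.5
```

Summing such per-cluster bounds over all clusters and classes yields the total-probability-of-error estimate referred to above.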
(43) The classification device 200a is configured to, at the classification stage, for the given input sequence x, obtain the index of the most likely class m* by implementing Equation (4) and then determine an adaptive factor .sub.m* associated with the particular input sequence x by implementing Equation (9):
(44) .sub.m*=P(.sub.m*|x) (9)
(45) The classification device 200a is then configured to determine the uncertainty measure .sub.m* regarding the classification made in step S104 that the most likely class for the input sequence x is .sub.m* by implementing Equation (10):
(46) .sub.m*=min(1, .sub.m*/.sub.m*.sup.opt) (10)
(47) Hence, according to the embodiment of
(48) Embodiments relating to the operations performed by the uncertainty module 210b will now be disclosed with reference to the classification device 200b of
(49) According to the embodiment of
(50) From Equations (6)-(7) it follows that the likelihood is roughly determined by the association of x.sub.n to the particular clusters .sub.m,k. For example, by assuming that all covariances are equal, then this association would mean that the distance defined by |x.sub.n.sub.k| is smallest, which also means that .sub.m,k(x.sub.n) is largest among all clusters. Therefore, for posterior probabilities at a given cluster .sub.m,k the notation P(.sub.m|.sub.m,k) is used, where .sub.m,k is the mean of the cluster .sub.m,k.
(51) The classification device 200b determines posterior probabilities for each cluster .sub.m,k. The classification device 200b could pre-store the determined posterior probabilities. The posterior probabilities could be determined analytically or empirically, similarly to the description above. For example, in order to perform an empirical determination of the posterior probability for a cluster with mean .sub.m,k the classification device 200b is configured to perform the following operations directly on the training sequence y. The classification device 200b stores in a variable L.sub.m,k, for each cluster, the number of training data points in the training sequence y that belong to cluster .sub.m,k. The classification device 200b stores in a variable L.sub.m,k.sup.(j) the number of those training data points that originate from class .sub.j. The posterior probability at cluster .sub.m,k is then estimated by implementing Equation (11):
(52) P(.sub.j|.sub.m,k)=L.sub.m,k.sup.(j)/L.sub.m,k (11)
(53) In this respect, a cluster .sub.m,k that is maximized by many points from a class .sub.j, j≠m, relative to the data points generated from class .sub.m does not have much discriminative power by itself; that cluster simply consists of points from different classes. At the same time, a cluster .sub.m,k that is maximized by many points from its own class .sub.m relative to the other classes .sub.j, j≠m, has a lot of discriminative power.
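The counting-based estimation described above can be sketched as follows. The helper name is hypothetical, a scalar feature with equal (and therefore ignored) covariances is assumed, and the synthetic data are invented for illustration:

```python
import numpy as np

def empirical_cluster_posteriors(points, labels, cluster_means):
    """Estimate P(class_j | cluster_k) by counting which labelled training
    points fall closest to each cluster mean (equal covariances assumed,
    so 'closest' reduces to smallest absolute distance)."""
    points = np.asarray(points, float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # nearest cluster mean for every training point
    nearest = np.argmin(
        np.abs(points[:, None] - np.asarray(cluster_means)[None, :]), axis=1)
    post = np.zeros((len(cluster_means), len(classes)))
    for k in range(len(cluster_means)):
        in_k = labels[nearest == k]          # points owned by cluster k (L_{m,k})
        for j, c in enumerate(classes):
            # fraction of cluster k's points coming from class c
            post[k, j] = np.mean(in_k == c) if len(in_k) else 1.0 / len(classes)
    return post                              # rows sum to 1

rng = np.random.default_rng(1)
pts = np.concatenate([rng.normal(5, 1, 500), rng.normal(11, 1, 500)])
lbl = np.array([1] * 500 + [2] * 500)
post = empirical_cluster_posteriors(pts, lbl, [5.0, 11.0])
assert np.allclose(post.sum(axis=1), 1.0)
assert post[0, 0] > 0.9   # the cluster at 5 is dominated by class 1: discriminative
```

A row whose mass is spread evenly over the classes corresponds to a cluster with little discriminative power; a row concentrated on one class corresponds to a highly discriminative cluster.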
(54) According to this embodiment, the classification device 200b is configured to determine the normalization factor .sub.m.sup.opt as the weighted sum of the pre-stored per-cluster posterior probabilities by implementing Equation (12):
(55) .sub.m.sup.opt=u.sub.m,1·P(.sub.m|.sub.m,1)+ . . . +u.sub.m,K·P(.sub.m|.sub.m,K) (12)
(56) The normalization factor .sub.m.sup.opt thus relates to the performance of the classifier 110 when performing classification of the training sequence y. As will be disclosed below, the normalization factor .sub.m.sup.opt is used by the classification device 200b when determining the uncertainty measure of the classification of the input sequence x.
(57) At the classification of the input sequence x the classifier 110 implements Equation (4) to obtain the index m* of the most likely class .sub.m*. The classification device 200b then determines an uncertainty measure associated with that decision.
(58) According to the embodiment of
(59) The classification device 200b is then configured to determine an adaptive factor .sub.m* associated with the particular input sequence x by implementing Equation (13):
(60) .sub.m*=v.sub.m*,1·P(.sub.m*|.sub.m*,1)+ . . . +v.sub.m*,K·P(.sub.m*|.sub.m*,K) (13)
(61) The classification device 200b is then configured to determine the uncertainty measure .sub.m* regarding the classification made in step S104 that the most likely class for the input sequence x is .sub.m* by implementing Equation (14):
(62) .sub.m*=min(1, .sub.m*/.sub.m*.sup.opt) (14)
(63) Hence, according to the embodiment of
(64) The classification device 200b is optionally configured to determine an additional indicator .sub.m* by implementing Equation (15):
(65) .sub.m*=[P(.sub.m*|.sub.m*,1) . . . P(.sub.m*|.sub.m*,K); v.sub.m*,1/u.sub.m*,1 . . . v.sub.m*,K/u.sub.m*,K] (15)
(66) The upper-most row in .sub.m* is defined by the pre-stored posterior probabilities of the most likely class, while the entries of the lower-most row provide information about the distortion in the data statistics due to a limited number of samples (caused by a classification of an input sequence x of short length). According to an embodiment the classification device 200b is configured to perform steps S108 and S110 in order to implement Equation (15):
(67) S108: The classification device 200b determines, for each cluster .sub.m,k for k=1, . . . , K, a relation .sub.m* between the input weight factors v.sub.m*,k and the training weight factors u.sub.m*,k for the class .sub.m* to which the input sequence x has been classified to belong.
(68) S110: The classification device 200b stores the thus determined relation .sub.m*.
(69) In general terms, when the uncertainty measure (either .sub.m* or .sub.m*) of classifying the input sequence x into class .sub.m* has a numerical value that is close to 1 there is a high certainty in the classification decision, and the more the uncertainty measure decreases the more uncertain the classification decision becomes.
(70) A non-limiting illustrative example of at least some of the herein disclosed embodiments will be provided next. According to the illustrative example there are two classes .sub.1, .sub.2 with two clusters each, as represented by the cluster parameters .sub.m,k representing the mean values of each cluster. Table 1 illustrates numerical values of all posterior probabilities. However, as the skilled person understands, only values for one of the classes need to be stored.
(71) TABLE 1: Example values of posterior probabilities P(.sub.m|.sub.m,k) for two classes, each class comprising two clusters.

  Classes and clusters          Posterior probabilities
  .sub.1:  .sub.1,1 = 5         P(.sub.1|.sub.1,1) = 0.85    P(.sub.2|.sub.1,1) = 0.15
           .sub.1,2 = 11        P(.sub.1|.sub.1,2) = 0.55    P(.sub.2|.sub.1,2) = 0.45
  .sub.2:  .sub.2,1 = 9         P(.sub.1|.sub.2,1) = 0.45    P(.sub.2|.sub.2,1) = 0.55
           .sub.2,2 = 15        P(.sub.1|.sub.2,2) = 0.15    P(.sub.2|.sub.2,2) = 0.85
(72) Reference is also made to
(73) To simplify notation, all variances are assumed equal, and hence excluded from the calculations. All training weight factors u.sub.m,k are also assumed to be equal, i.e., all training weight factors u.sub.m,k=0.5. Given these weights and the posterior probabilities in Table 1, the classification device 200a, 200b determines .sub.1.sup.opt=.sub.2.sup.opt=0.5·0.85+0.5·0.55=0.7 by implementing Equation (12).
(74) Assume that the input sequence x is divided into 10 input vectors x.sub.n. Assume further that the input sequence x generates a higher value for the discriminative function of the first class .sub.1 than for the second class .sub.2. That is, m*=1, according to Equation (4). Let the input sequence x have 9 out of 10 input vectors x.sub.n closer to .sub.1,2 and only one closer to .sub.1,1, which leads to v.sub.1,1=0.1 and v.sub.1,2=0.9. Then, by the classification device 200a, 200b implementing Equation (13) and Equation (14), the adaptive factor .sub.1=0.58 and the uncertainty measure .sub.1=0.8286 are obtained. Since most of the input vectors x.sub.n fall closest to .sub.1,2=11 (in a confusion region, i.e., where the classes .sub.1, .sub.2 are difficult to discriminate), the above-determined uncertainty measure 0.8286 is regarded as being significantly lower than 1.
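The numerical example above can be reproduced in a few lines. The variable names gamma_opt, gamma, and omega are labels introduced here for the normalization factor, the adaptive factor, and the uncertainty measure respectively (the symbols themselves do not survive in this text); the posterior values are taken from Table 1:

```python
# Pre-stored per-cluster posteriors for the winning class H_1 (from Table 1)
post_1 = [0.85, 0.55]   # P(H_1 | mu_{1,1}), P(H_1 | mu_{1,2})
u = [0.5, 0.5]          # training weight factors u_{1,k}
v = [0.1, 0.9]          # input weight factors v_{1,k}: 1 of 10 vectors near mu_{1,1}

gamma_opt = sum(uk * pk for uk, pk in zip(u, post_1))   # Equation (12)
gamma = sum(vk * pk for vk, pk in zip(v, post_1))       # Equation (13)
omega = min(1.0, gamma / gamma_opt)                     # Equation (14)

print(round(gamma_opt, 4), round(gamma, 4), round(omega, 4))  # 0.7 0.58 0.8286
```

Because 9 of the 10 input vectors fall in the weakly discriminative cluster .sub.1,2, the uncertainty measure drops well below 1 even though the classification decision itself is unchanged.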
(76) Particularly, the processing circuitry 310 is configured to cause the classification device 200a, 200b to perform a set of operations, or steps, S102-S110, as disclosed above. For example, the storage medium 330 may store the set of operations, and the processing circuitry 310 may be configured to retrieve the set of operations from the storage medium 330 to cause the classification device 200a, 200b to perform the set of operations. The set of operations may be provided as a set of executable instructions.
(77) Thus the processing circuitry 310 is thereby arranged to execute methods as herein disclosed. The storage medium 330 may also comprise persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory. The classification device 200a, 200b may further comprise a communications interface 320 configured for communications with another device, for example to obtain the short-term frequency representation as in step S102 and to provide the uncertainty measure as determined in step S106 as a result. As such the communications interface 320 may comprise one or more transmitters and receivers, comprising analogue and digital components. The processing circuitry 310 controls the general operation of the classification device 200a, 200b, e.g., by sending data and control signals to the communications interface 320 and the storage medium 330, by receiving data and reports from the communications interface 320, and by retrieving data and instructions from the storage medium 330. Other components, as well as the related functionality, of the classification device 200a, 200b are omitted in order not to obscure the concepts presented herein.
(79) It should also be mentioned that even though the modules correspond to parts of a computer program, they do not need to be separate modules therein, but the way in which they are implemented in software is dependent on the programming language used. Preferably, one or more or all functional modules 310a-310g may be implemented by the processing circuitry 310, possibly in cooperation with functional units 320 and/or 330. The processing circuitry 310 may thus be configured to fetch instructions from the storage medium 330 as provided by a functional module 310a-310g and to execute these instructions, thereby performing any steps as disclosed above.
(80) The classification device 200a, 200b may be provided as a standalone device or as a part of at least one further device. For example, the classification device 200a, 200b may be provided in an audio mining device. Alternatively, functionality of the classification device 200a, 200b may be distributed between at least two devices, or nodes. These at least two nodes, or devices, may either be part of the same network part or may be spread between at least two such network parts.
(81) Thus, a first portion of the instructions performed by the classification device 200a, 200b may be executed in a first device, and a second portion of the instructions performed by the classification device 200a, 200b may be executed in a second device; the herein disclosed embodiments are not limited to any particular number of devices on which the instructions performed by the classification device 200a, 200b may be executed. Hence, the methods according to the herein disclosed embodiments are suitable to be performed by a classification device 200a, 200b residing in a cloud computational environment. Therefore, although a single processing circuitry 310 is illustrated in
(83) In the example of
(84) The inventive concept has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the inventive concept, as defined by the appended patent claims.
LIST OF ABBREVIATIONS
(85) GMM Gaussian Mixture Model
pdf probability density function
x input sequence
x.sub.n n-th feature vector of the input sequence x
D dimensionality of the input vector x.sub.n
m class index (out of total M classes)
.sub.m class with index m
m* index of the best class in the log-likelihood sense
k cluster index (out of total K clusters in every class)
cluster mean
cluster covariance
|| determinant of the covariance matrix
u training weight factors (weight factors at training stage)
v input weight factors (weight factors at classification stage)
.sub.m,k k-th cluster component of the m-th class
P(.sub.m|.sub.m,k) posterior probabilities (probability of class .sub.m at cluster .sub.m,k with mean .sub.m,k)
.sub.m* uncertainty measure based on per-cluster posterior probabilities
.sub.m* uncertainty measure based on global posterior probabilities