Incremental cluster validity index-based offline clustering for machine learning

Abstract

A neural network model replaces the supervised labeling component of a supervised learning system with an incremental cluster validity index-based unsupervised labeling component. An implementation is presented combining fuzzy adaptive resonance theory predictive mapping (ARTMAP) and incremental cluster validity indices (iCVI) for unsupervised machine learning purposes, namely the iCVI-ARTMAP. An iCVI module replaces the adaptive resonance theory (ART) module B of a fuzzy ARTMAP neural network model and provides assignments of input samples to clusters (i.e., labels) at each learning iteration in accordance to any of several possible iCVI methods described. A map field incrementally builds a many-to-one mapping of the categories of ART module A to the cluster labels. At the end of each learning epoch, clusters may be merged and/or split using the iCVI, which is recomputed incrementally except for the newly cluster during a split. The iCVI-ARTMAP performs offline incremental multi-prototype-based clustering driven by the iCVI.

Claims

1. A method for machine learning, comprising: a) configuring an ART A module of a fuzzy ARTMAP neural network as a fuzzy ART module; b) configuring an offline incremental cluster validity index (iCVI) module as a second input module to the fuzzy ARTMAP neural network; c) initializing a data set partition; d) initializing the ART A module, the offline iCVI module, and a map field of the fuzzy ARTMAP neural network corresponding to the initialized data set partition, said ART A module further having a ART A vigilance parameter; e) inputting pre-processed versions of a data set into both the ART A module and the offline iCVI module; f) in the offline iCVI module, computing temporary iCVI values for an assignment of a current sample of the pre-processed versions of the data set to each of a plurality of clusters in a current data partition and generating a current sample label for the current sample as a function of the computed temporary iCVI values; g) applying the current sample label to the vigilance test of the map field, said map field having a map field vigilance parameter; h) assigning a sample to an ART A category and an associated cluster mapped via the map field when both the ART A and map field vigilance tests are simultaneously satisfied, but when the ART A vigilance parameter is satisfied but the map field vigilance parameter is not satisfied, then causing the ART A module to change its vigilance parameter; i) incrementally updating a ART A category weight vector and a corresponding map field weight vector when both the ART A and map field vigilance tests are simultaneously satisfied; j) creating a new ART A category weight vector and a new corresponding map field weight vector when no existing ART A category simultaneously satisfies the ART A and map field vigilance tests; k) incrementally updating iCVI variables and validation measures when assignments of samples of the data set to respective clusters change and adjusting assignments of the ART A categories to clusters according to the map field, thereby adjusting the input-output relationship of the neural network; and I) generating subsequent data partitions with a multi-prototype cluster representation via a categories-to-clusters mapping of the map field.

2. The method of claim 1, wherein the offline iCVI module stores and updates variables and values of a user-selected incremental cluster validity index (iCVI), generates labels representing the assignment of data samples to clusters, as well as merges and splits clusters.

3. The method of claim 1, wherein the fuzzy ART module has a counter that stores a current number of samples assigned to each category.

4. The method of claim 1, wherein a true label of each sample of a data set is unknown.

5. The method of claim 2, wherein the iCVI module is user-selected and has associated iCVI variables.

6. The method of claim 5, in which the iCVI variables include at least one matrix representative of a degree of dissimilarity.

7. The method of claim 5, in which the iCVI variables include cluster sample statistics chosen from a group including as frequencies, means, compactnesses, and covariance matrices.

8. The method of claim 1, wherein the offline iCVI module generates a one-hot encoded label indicating a best assignment of a sample to a cluster at a presentation time by incrementally recomputing a user-defined iCVI value and variables by swapping a sample between existing clusters.

9. The method of claim 1, wherein at the end of each iteration, the offline iCVI module computes temporary iCVI values for merging clusters of the current data partition of the pre-processed data set and hierarchically merges clusters with a best resulting iCVI value when such operation entails an iCVI improvement; further comprising temporarily recomputing the iCVI values and variables solely using the current variables stored within the iCVI module, which comprise less than the entirety of the data.

10. The method of claim 9, in which the iCVI module updates the iCVI variables and iCVI values with the corresponding temporarily computed iCVI variables and iCVI values upon the merging of the clusters.

11. The method of claim 1, wherein: the offline iCVI module attempts to split multi-prototype clusters when a current number of clusters is smaller than a user-predefined value; the iCVI variables of the new cluster are temporarily computed using the samples assigned to the category representing the new cluster; the iCVI variables of the split cluster are temporarily recomputed using the current variables stored within the iCVI module and iCVI variables of the new cluster; and the iCVI value is temporarily recomputed using the temporarily computed variables and the currently stored variables within the iCVI module.

12. The method of claim 11, in which the iCVI module updates the iCVI variables and iCVI values with the corresponding temporarily computed iCVI variables and iCVI values upon the splitting of the clusters.

13. The method of claim 1, wherein the map field vigilance test considers the mapping of the selected category of the ART A module and the label provided by the offline iCVI module.

14. The method of claim 1, wherein the inputs for the offline iCVI module and ART A modules undergo their own pre-processing.

15. The method of claim 1, further comprising initializing data set partitioning by a user-selected method chosen from a set of methods including random selection of input samples and a vector quantization-based clustering routine as kmeans.

16. The method of claim 1, wherein the data samples are iteratively presented until a termination criterion is met.

17. The method of claim 1, wherein labels of samples representing the current partition are stored and, at each input presentation, the corresponding label is updated according to a map field prediction.

18. The method of claim 1, further comprising pruning of ART A categories to which no samples are assigned; updating ART A categories from which samples previously assigned to them were re-assigned to other ART A categories; updating the map field to reflect merging of clusters; and updating the map field to reflect splitting of clusters.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 depicts the components of an iCVI-ARTMAP neural network.

[0010] FIG. 2 depicts a data clustering procedure using the iCVI-ARTMAP neural network shown in FIG. 1.

[0011] FIG. 3 illustrates a training procedure for the iCVI-ARTMAP network (step 203 shown in FIG. 2).

DETAILED DESCRIPTION

[0012] In general, embodiments disclosed here allow a user to select an iCVI and perform offline incremental multi-prototype-based clustering via ARTMAP. Offline clustering is also known as batch clustering. This embodiment also significantly accelerates offline clustering when cluster validity indices are computed, so it is preferred over other methods for offline clustering with cluster validity indices. The offline clustering process makes use of such iCVI for decision-making (regarding sample allocation to clusters, as well as deletion, merging and splitting of clusters) while incrementally building associative mappings between prototypes (categories) to clusters.

[0013] Clustering is an unsupervised learning task; thus, true labels of the samples of a data set are not provided to classify data into groups. The solution presented here and shown in FIG. 1, namely the iCVI-ARTMAP 100, consists of a fuzzy ARTMAP neural network as in G. A. Carpenter, et al., “Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps”, IEEE Transactions on Neural Networks, 3(5):698-713, Sep. 1992, and “A fuzzy ARTMAP nonparametric probability estimator for nonstationary pattern recognition problems”, IEEE Transactions on Neural Networks, 6(6):1330-1336, Nov. 1995, whose ART B module is replaced by an iCVI module 103. The latter is responsible for constructing labels for each input presented, as well as incrementally updating the selected iCVI throughout the unsupervised machine learning process. A map field 101, and ART A (fuzzy ART equipped with a counter that stores the current number of samples assigned to each category) 102 modules maintain most of their original dynamics as disclosed by G.A. Carpenter, super. The labels for each sample generated by the iCVI module follow the one-hot encoding standard and correspond to the optimal assignment—given the selected iCVI—of a sample. Note that the iCVI may yield different labels for the same sample at different iteration times.

[0014] At each iteration, the iCVI values are incrementally computed by swapping the presented sample between current clusters. The iCVI variables associated with the current partition are also cached (see L.E. Brito da Silva, et al., super) to speed up the neural network training. At the end of each epoch, the system attempts to merge clusters to improve the partition quality. Next, if necessary, the system attempts to split clusters to enforce the desired number of clusters. These operations are performed incrementally, however, in the split case, the parent cluster that is split has its iCVI variables updated incrementally whereas the newly created cluster has its iCVI variables computed in batch mode. The map field, which may, for example, be implemented as an associative network, allows for multi-prototype representation of the clusters. The user can seamlessly select the iCVI to match the input data characteristics and application at hand.

[0015] The different software routines and components illustrated in FIG. 1 may be embodied as computer-executable code stored in any conventional storage or memory system or combination thereof on one or more computing platforms, and which, when executed on one or more processing units, cause the processing units and other conventional hardware components to carry out the various I/O operations, storage requirements, computations, and updating routines, etc., described here.

[0016] FIG. 2 depicts a data clustering procedure using the iCVI-ARTMAP neural network (shown in FIG. 1). After the data (X) is loaded (step 200) it is duplicated. Next, each copy undergoes user-defined transformations (step 201) as a pre-processing stage. Such transformations need not be equal, and examples include min-max normalization and complement coding for the ART A module (processed data version X.sup.a) and standardization for the iCVI module (processed data version X.sup.b). Step 202 corresponds to the initialization of the map field matrix and ART A categories, creation of an initial partition based on the assignment of the data samples to the initial ART A categories, and the initialization of the iCVI value and associated variables (which may include, but are not limited to, some cluster sample statistics (such as frequencies (i.e., sample counts), means, compactnesses, covariance matrices) and/or (dis)similarity matrices) given the initial partition. In that step, the user must provide all the required iCVI-ARTMAP model parameters. The ART A categories may be initialized using random samples, kmeans++ (D. Arthur, et al., “K-Means++: The Advantages of Careful Seeding”, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA'07, page 1027-1035, USA, 2007, Society for Industrial and Applied Mathematics), or centroids from kmeans (J. B. MacQueen, et al. “Some Methods for Classification and Analysis of MultiVariate Observations”, M. Le Cam and J. Neyman, editors, Proc. fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281-297. University of California Press, 1967) clustering algorithm. If the latter initialization is chosen, then kmeans is applied to X.sup.b, the obtained centroids undergo the inverse transformation that generated X.sup.b followed by the same transformations that generated X.sup.a. Following these, some additional variables are initialized, such as the frequency count for each ART A category. Step 203 corresponds to the unsupervised machine learning task itself (i.e., clustering task), which is further detailed in FIG. 3. Step 204 corresponds to fetching the outputs of the clustering task: data partition labels and trained iCVI-ARTMAP neural network (ART A, map field and iCVI variables).

[0017] FIG. 3 depicts a training procedure for the iCVI-ARTMAP network (step 203 shown in FIG. 2). Step 300 corresponds to the presentation of the pre-processed inputs to iCVI-ARTMAP: one to ART A (input sample x.sup.a) and the other to the iCVI module (input sample x.sup.b). As previously mentioned, these two inputs correspond to the same input subjected to the selected transformations. At step 301 the iCVI module computes a temporary iCVI value for the assignment of its current input to each cluster of the current data partition. This is accomplished by using the “swap” (removal of the presented input from the cluster it is currently assigned followed by its addition to a different cluster) and “no” (the current iCVI value corresponding to the current cluster assignment of the presented input does not need to be recomputed) operations. Depending on the iCVI selected, one or more of the following variables may need to be incrementally computed (t represents the iteration):

[00001] $\begin{matrix} n_{i, new} = {\begin{matrix} n_{i} (t) + 1, & add operation \\ n_{i} (t) - 1, & remove operation \\ n_{i} (t), & no operation \end{matrix} & (EQ 1) \end{matrix}$ $\begin{matrix} μ_{i, new} = {\begin{matrix} \frac{n_{i} (t)}{n_{i, new}} μ_{i} (t) + \frac{1}{n_{i, new}} x^{b}, & add operation \\ \frac{n_{i} (t)}{n_{i, new}} μ_{i} (t) - \frac{1}{n_{i, new}} x^{b}, & remove operation \\ μ_{i} (t), & no operation \end{matrix} & (EQ 2) \end{matrix}$ $\begin{matrix} {CP}_{i, new} {\begin{matrix} C P_{i} (t) + \frac{n_{i} (t)}{n_{i, new}} {.Math. x^{b} - μ_{i} (t) .Math.}_{2}^{2}, & add operation \\ C P_{i} (t) - \frac{n_{i} (t)}{n_{i, new}} {.Math. x^{b} - μ_{i} (t) .Math.}_{2}^{2}, & remove operation \\ C P_{i} (t), & no operation \end{matrix} & (EQ 3) \end{matrix}$ $\begin{matrix} {.Math.}_{i, new} = {\begin{matrix} \frac{n_{i} (t) - 1}{n_{i} (t)} {.Math.}_{i} (t) + {\frac{1}{n_{i} (t) + 1} [x^{b} - μ_{i} (t)] [x^{b} - μ_{i} (t)]}^{T}, & add operation \\ \frac{n_{i} (t) - 1}{n_{i} (t) - 2} {.Math.}_{i} (t) - {\frac{n_{i} (t)}{[n_{i} (t) - 1] [n_{i} (t) - 2]} [x^{b} - μ_{i} (t)] [x^{b} - μ_{i} (t)]}^{T}, & remove operation \\ {.Math.}_{i} (t), & no operation \end{matrix} & (EQ 4) \end{matrix}$

where X.sup.b is the input to the iCVI module and n, μ, CP, and Σ are the frequency, mean, compactness and covariance matrix of cluster i, respectively.

[0018] As previously mentioned, the iCVI value is computed for the assignment of the input to each cluster. Note that not all the iCVI variables are recomputed, solely the ones associated with the cluster change (if any). The “add” operation in equation (EQ 3) is a more succinct formulation for the crisp version of the fuzzy compactness presented in M. Moshtaghi, et al., super, as compared to L.E. Brito da Silva, et al., super. The “add” operation in equation (EQ 4) is the classic incremental covariance update (R. O. Duda, et al., “Pattern Classification”, John Wiley & Sons, 2nd edition, 2000). The derivation of the “remove” operations in equations (EQ 3) and (EQ 4) follow the same rationale used for their “add” operation counterpart. A small term δI (I is the identity matrix) is added to the covariance to avoid numerical errors as described in L.E. Brito da Silva, et al., super, Next, a one-hot encoded label y is generated for the cluster assignment that optimizes the selected iCVI:

[00002] $\begin{matrix} y_{l} = {\begin{matrix} 1, & if l = \arg \max_{i} (T_{i}^{b}) \\ 0, & otherwise \end{matrix}, l \in {1, .Math., k} & (EQ 5) \end{matrix}$

where k is the number of clusters, T.sub.i.sup.b is the temporary iCVI value (or its negative if the iCVI is min-optimal) corresponding to assigning the current sample to cluster i, and y={right arrow over (1)}if all T.sub.i.sup.b are equal.

[0019] At step 302 the iCVI-ARTMAP neural network undergoes incremental learning using the cluster label y generated by the iCVI module and the ARTMAP's ART A input x.sup.a. Specifically, the fuzzy ARTMAP (see G.A. carpenter, et al. references, super)is used for the realization of iCVI-ARTMAP, in which the following dynamics take place: after presenting the input x.sup.a, the ART A category J that maximizes the activation function T.sub.j given by

[00003] $\begin{matrix} T_{j} = \frac{{.Math. x^{a} \land w_{j} .Math.}_{1}}{α + {.Math. w_{j} .Math.}_{1}}, α > 0, & (EQ 6) \end{matrix}$

is subjected to a vigilance test defined by

[00004] $\begin{matrix} M_{J} = \frac{{.Math. x^{a} \land w_{J} .Math.}_{1}}{{.Math. x^{a} .Math.}_{1}} \geq ρ_{a}, 0 \leq ρ_{a} \leq 1, & (EQ 7) \end{matrix}$

where T.sub.j and w.sub.j are the activation and weight vector of category j, respectively; the operator ∧represents a component-wise minimum between two vectors, ∥.Math.∥.sub.1 is the L1 norm, M.sub.J is the match function of category J with weight vector w.sub.J and p.sub.a is the vigilance parameter of ART A. If the inequality (EQ 7) is satisfied, then category J is also subject to a map field vigilance test defined by

[00005] $\begin{matrix} M_{J}^{a b} = \frac{{.Math. y \land w_{J}^{a b} .Math.}_{1}}{{.Math. y .Math.}_{1}} \geq ρ_{ab}, 0 \leq ρ_{a b} \leq 1, & (EQ 8) \end{matrix}$

where w.sub.J.sup.ab is a row vector of the map field mapping matrix W.sup.ab associated with category J. If the inequality (EQ 8) is also satisfied, then category J of ART A undergoes adaptation

w.sub.J(t+1)=(1−β)w.sub.J(t)+β[x.sup.a∧w.sub.J(t)], 0<β≤1, tm (EQ 9)

as well as the map field

w.sub.J.sup.ab(t+1)=(1−β.sub.am)w.sub.J.sup.ab(t)+β.sub.ab[y∧w.sub.J.sup.ab(t)], 0<β.sub.ab≤1, tm (EQ 10)

[0020] If category J does not satisfy both inequalities, then the match tracking procedure increases the vigilance parameter of ART A by a small value ε, the next highest ranked category is selected, and the process repeated. This loop continues until either (i) a category meets both vigilance parameters or (ii) a new one is generated. If the latter case takes place, then w.sub.new=x.sup.a and w.sub.new.sup.ab={right arrow over (1)}.

[0021] At step 303, the cluster in the map field associated with the entry of the resonant category J of ART A is defined as the current prediction

[00006] $\begin{matrix} label prediction = \arg \max_{i} (w_{J, i}^{ab}) & (EQ 11) \end{matrix}$

and used to update the current data partition. At step 304 the cluster assignment of the current input is verified (equation (EQ 11)). If a change is detected in step 304, then at step 305 the iCVI module incrementally updates the corresponding iCVI variables and its value using a “swap” operation (a subset of EQ 1 through EQ 4). Otherwise, a “no” operation takes place and the iCVI value and variables remain the same. Step 306 verifies if the ART A resonant category J for the current input x.sup.a has changed; in the affirmative case, then the number of inputs associated with the ART A category I that had been previously associated with the current input x.sup.a is verified at step 307. If category I did not become empty, then, at step 308, this category shrinks using the samples assigned to it

[00007] $\begin{matrix} w_{I} (t + 1) = \underset{x_{i}^{a} \in w_{I}}{\land} x_{i}^{a} & (EQ 12) \end{matrix}$

[0022] Otherwise, it is pruned at step 309 (associated iCVI-ARTMAP parameters are modified accordingly). The next step 310 consists of the verification of the total number of clusters in the current partition. If a cluster vanished (i.e., by not having any input assigned to it) then at step 311 the partition labels and iCVI-ARTMAP variables are adjusted accordingly. At step 312, it is verified if the current input presentation completes a full pass through the data. In the negative case then the next input is presented (step 300). In the affirmative case, then at step 313 pairs of clusters are hierarchically merged, such that at each stage, the pair i and j that yields the largest improvement of the iCVI value are merged, and the process continues until the value of the chosen iCVI worsens (i.e., when the iCVI worsens following a given merge, the cluster structure right before such merging operation is retained) or there are only 2 clusters left. The iCVI module computes the iCVI values incrementally using the “merge” operation. Depending on the chosen iCVI, one or more of the following variables might need to be recomputed incrementally:

[00008] $\begin{matrix} n_{m e rged} = n_{i} (t) + n_{j} (t) & (EQ 13) \end{matrix}$ $\begin{matrix} μ_{m e rged} = \frac{n_{i} (t)}{n_{m e rged}} μ_{i} (t) + \frac{n_{j} (t)}{n_{m e rged}} μ_{j} (t) & (EQ 14) \end{matrix}$ $\begin{matrix} {CP}_{m e rged} = C P_{i} (t) + C P_{j} (t) + \frac{n_{i} (t) n_{j} (t)}{n_{m e rged}} {.Math. μ_{j} (t) - μ_{i} (t) .Math.}_{2}^{2} & (EQ 15) \end{matrix}$ $\begin{matrix} {.Math.}_{merged} = \frac{n_{i} (t) - 1}{n_{merged} - 1} {.Math.}_{i} (t) + \frac{n_{j} (t) - 1}{n_{merged} - 1} {.Math.}_{j} (t) + {\frac{n_{i} (t) n_{j} (t)}{[n_{merged}] [n_{merged} - 1]} [μ_{j} (t) - μ_{i} (t)] [μ_{j} (t) - μ_{i} (t)]}^{T} & (EQ 16) \end{matrix}$

[0023] The partition labels and iCVI-ARTMAP variables are adjusted to reflect the changes caused by the pairwise “merge” operations (note that ART A weight vectors do not change during “merge” operations). In particular, the map field mapping matrix W.sup.ab grows as:

W.sup.ab(t+1)=[W.sup.ab(t)|C] (EQ 17)

where the entries of the column vector C are computed as

[00009] $\begin{matrix} C_{l} = {\begin{matrix} \max_{m \in {i, j}} (w_{l, m}^{a b} (t)), & if \arg \max_{m} (w_{l, m}^{a b} (t)) \in {i, j} \\ \min_{m \in {i, j}} (w_{l, m}^{a b} (t)), & otherwise \end{matrix} & (EQ 18) \end{matrix}$

and W.sub.l.sup.ab (t) is a vector from the l.sup.th row of W.sup.ab (t). Next, the i.sup.th and j.sup.th columns corresponding to the merged clusters are deleted from W.sup.ab(t+1). At step 314 the number of clusters in the partitions is verified. If the number of clusters is smaller than the user-defined value, then “split” operations is attempted at step 315, wherein the system split clusters represented by multiple prototypes (if any) to restore the desired number of clusters.

[0024] During a split operation, the category from a multi-prototype cluster (defined with respect to both the map field prediction and current partition) with the best iCVI value when comprising a cluster on its own is selected as a new cluster. Splitting is performed until the desired number of clusters is achieved or there are no more multi-prototype clusters in the system. The iCVI values are incrementally recomputed if a split operation takes place (except for the newly created cluster, for which the iCVI variables are computed in batch mode). Depending on the chosen iCVI, one or more of the following variables might need to be recomputed incrementally for the parent cluster i:

[00010] $\begin{matrix} n_{i, split} = n_{i} (t) - n_{j} (t) & (EQ 19) \end{matrix}$ $\begin{matrix} μ_{i, split} = \frac{n_{i} (t)}{n_{i, split}} μ_{i} (t) - \frac{n_{j} (t)}{n_{i, split}} μ_{j} (t) & (EQ 20) \end{matrix}$ $\begin{matrix} {CP}_{i, split} = C P_{i} (t) - C P_{j} (t) - \frac{n_{i} (t) n_{j} (t)}{n_{i, split}} {.Math. μ_{j} (t) - μ_{i} (t) .Math.}_{2}^{2} & (EQ 21) \end{matrix}$ $\begin{matrix} {.Math.}_{i, split} = \frac{n_{i} (t) - 1}{n_{i, split} - 1} {.Math.}_{i} (t) - \frac{n_{j} (t) - 1}{n_{i, split} - 1} {.Math.}_{j} (t) - {\frac{n_{i} (t) n_{j} (t)}{[n_{i, split}] [n_{i, split} - 1]} [μ_{j} (t) - μ_{i} (t)] [μ_{j} (t) - μ_{i} (t)]}^{T} & (EQ 22) \end{matrix}$

[0025] The partition labels and iCVI-ARTMAP variables are updated accordingly (note that ART A weight vectors do not change during “split” operations). Specifically, the map field mapping matrix W.sup.ab grows as:

W.sup.ab(t+1)=[W.sup.ab(t)|C] (EQ 23)

where the entries of the column vector C are computed as

[00011] $\begin{matrix} C_{l} = {\begin{matrix} \max_{j} (w_{q, j}^{a b} (t)), & if l = q \\ 0, & othe r w i s e \end{matrix} & (EQ 24) \end{matrix}$

and w.sub.q.sup.ab(t) is a vector from the q.sup.th row of W.sup.ab(t) corresponding to the category q that was selected to become a new cluster. Next, the clusters to which category q was previously mapped to according to the map field prediction (equation (EQ 11)) is assigned the smallest value of w.sub.q.sup.ab(t) in W.sup.ab(t+1), that is:

[00012] $\begin{matrix} w_{q, s}^{a b} (t + 1) = \min_{j} [w_{q, j}^{a b} (t)] & (EQ 25) \end{matrix}$ $where$ $\begin{matrix} s = \arg \max_{j} [w_{q, j}^{a b} (t)] & (EQ 26) \end{matrix}$

and w.sub.q.sup.ab(t+1) is a vector from the q.sup.th row of W.sup.ab(t+1) corresponding to category q. In case w.sub.q.sup.an(t) is a vector with identical components, then, except for the last component corresponding to the new cluster, all components w.sub.q,j.sup.ab(t+1) are equal to w.sub.q,j.sup.ab(t) minus a very small constant. Finally, if the number of clusters is equal to the value defined by the user or it is not possible to perform a split operation then step 316 takes place, in which the stopping conditions are checked to determine if the training should end. These include for instance, reaching the maximum number of epochs (one epoch corresponds to a full pass through the data), no change in ART A weight vectors (w) or the change in the iCVI value between two consecutive epochs is smaller than a tolerance parameter. If the stopping conditions are satisfied, then the training procedure stops.

[0026] This invention provides several advantages over the prior art. One improvement is that it provides an assessment of the quality of learning by a neural network incrementally, that is, “as you go”. Furthermore, it converges under certain assumptions to the answer prior art systems would reach only during post-processing. The invention thus enables faster validation of machine learning. Although some prior art proposals have considered the use of iCVIs, only this invention provides for incorporating such techniques into an offline multi-prototype-based incremental clustering algorithm. This enables incremental updates of validation measures to affect the choice of cluster assignments while those assignments are actually being made.

[0027] Although the invention improves both speed and accuracy, there are other potential benefits of the described techniques. For example, they may also be used to adjust the vigilance parameter. As another example, the invention may also maintain speed and accuracy with fewer neurons than prior art solutions. Furthermore, it may allow representation of the data with fewer clusters.

Incremental cluster validity index-based offline clustering for machine learning

Assignee

Inventors

Cpc classification

Classification Explorer

G06N3/088

PHYSICS

Classification Explorer

G06N3/082

PHYSICS

Classification Explorer

G06N3/0409

PHYSICS

Classification Explorer

G06N3/043

PHYSICS

International classification

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

G06N3/04

PHYSICS

Abstract

Claims

Description