Methods for using feature vectors and machine learning algorithms to determine discriminant functions of minimum risk quadratic classification systems

Abstract

Methods are provided for determining discriminant functions of minimum risk quadratic classification systems, wherein a discriminant function is represented by a geometric locus of a principal eigenaxis of a quadratic decision boundary. A geometric locus of a principal eigenaxis is determined by solving a system of fundamental locus equations of binary classification, subject to geometric and statistical conditions for a minimum risk quadratic classification system in statistical equilibrium. Feature vectors and machine learning algorithms are used to determine discriminant functions and ensembles of discriminant functions of minimum risk quadratic classification systems, wherein a discriminant function of a minimum risk quadratic classification system exhibits the minimum probability of error for classifying given collections of feature vectors and unknown feature vectors related to the collections.

Claims

1. A computer-implemented method of using feature vectors and machine learning algorithms to determine a discriminant function of a minimum risk quadratic classification system that classifies said feature vectors into two classes and using said discriminant function of said minimum risk quadratic classification system to classify unknown feature vectors related to said feature vectors, said method comprising: receiving an N×d data set of feature vectors within a computer system, wherein N is a number of feature vectors, d is a number of vector components in each feature vector, and each one of said N feature vectors is labeled with information that identifies which of two classes each one of said N feature vectors belongs to, and wherein each said feature vector is defined by a d-dimensional vector of numerical features, wherein said numerical features are extracted from digital signals; receiving within said computer system unknown feature vectors related to said data set; determining a kernel matrix using said data set, said determination of said kernel matrix being performed by using processors of said computer system to calculate a matrix of all possible inner products of signed reproducing kernels of said N feature vectors, wherein a reproducing kernel of a feature vector replaces said feature vector with a curve that contains first and second degree vector components, and wherein each one of said reproducing kernels of said N feature vectors has a sign of +1 or -1 that identifies which of said two classes each one of said N feature vectors belongs to, and using said processors of said computer system to calculate a regularized kernel matrix from said kernel matrix; determining scale factors of a geometric locus of signed and scaled reproducing kernels of extreme points using said regularized kernel matrix, wherein said extreme points are located within overlapping regions or near tail regions of distributions of said N feature vectors, said determination of
said scale factors being performed by using said processors of said computer system to determine a solution of a dual optimization problem, wherein said scale factors and said geometric locus satisfy a system of fundamental locus equations of binary classification, subject to geometric and statistical conditions for a minimum risk quadratic classification system in statistical equilibrium, and wherein said scale factors determine conditional densities for said extreme points and also determine critical minimum eigenenergies exhibited by scaled extreme vectors on said geometric locus, wherein said critical minimum eigenenergies determine conditional probabilities of said extreme points and also determine corresponding counter risks and risks of a minimum risk quadratic classification system, wherein said counter risks are associated with right decisions and said risks are associated with wrong decisions of said minimum risk quadratic classification system, and wherein said geometric locus determines the principal eigenaxis of the decision boundary of said minimum risk quadratic classification system, wherein said principal eigenaxis exhibits symmetrical dimensions and density, wherein said conditional probabilities and said critical minimum eigenenergies exhibited by said minimum risk quadratic classification system are symmetrically concentrated within said principal eigenaxis, and wherein counteracting and opposing components of said critical minimum eigenenergies exhibited by corresponding components of said scaled extreme vectors on said geometric locus together with corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about the geometric center of said principal eigenaxis, wherein the center of total allowed eigenenergy and minimum expected risk of said minimum risk quadratic classification system is located at the geometric center of said geometric locus, and wherein 
said geometric locus determines a primal representation of a dual locus of likelihood components and principal eigenaxis components, wherein said likelihood components and said principal eigenaxis components are symmetrically distributed over either side of the axis of said dual locus, wherein a statistical fulcrum is placed directly under the center of said dual locus, and wherein said likelihood components of said dual locus determine conditional likelihoods for said extreme points, and wherein said principal eigenaxis components of said dual locus determine an intrinsic coordinate system of geometric loci of a quadratic decision boundary and corresponding decision borders that jointly partition the decision space of said minimum risk quadratic classification system into symmetrical decision regions; determining said extreme vectors on said geometric locus using the vector of said scale factors, said determination of said extreme vectors being performed by using said processors of said computer system to identify said scale factors that exceed zero by a small threshold, and using said processors of said computer system to determine a sign vector of signs associated with said extreme vectors using said data set, and compute the average sign using said sign vector; determining a locus of risk for said minimum risk quadratic classification system using said reproducing kernels of said extreme vectors and said signed reproducing kernels of said N feature vectors and said vector of scale factors, said determination of said locus of risk being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said extreme vectors, and multiply said matrix by said vector of scale factors, and compute the average risk for said minimum risk quadratic classification system using said locus of risk; determining a discriminant locus for said 
minimum risk quadratic classification system using said geometric locus, said determination of said discriminant locus being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said unknown feature vectors, and multiply said matrix by said vector of scale factors; determining the discriminant function of said minimum risk quadratic classification system, using said average risk and said average sign and said discriminant locus, said determination of said discriminant function of said minimum risk quadratic classification system being performed by using said processors of said computer system to subtract said average risk from sum of said discriminant locus and said average sign, wherein said discriminant function of said minimum risk quadratic classification system satisfies said system of fundamental locus equations of binary classification, and wherein said discriminant function of said minimum risk quadratic classification system determines likely locations of said N feature vectors and also determines said geometric loci of said quadratic decision boundary and said corresponding decision borders that jointly partition said extreme points into said symmetrical decision regions, wherein said symmetrical decision regions span said overlapping regions or said tail regions of said distributions of said N feature vectors, and wherein said discriminant function of said minimum risk quadratic classification system satisfies said quadratic decision boundary in terms of a critical minimum eigenenergy and said minimum expected risk, wherein said counteracting and opposing components of said critical minimum eigenenergies exhibited by said corresponding components of said scaled extreme vectors on said geometric locus associated with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification 
system are symmetrically distributed over said axis of said dual locus, on equal sides of said statistical fulcrum located at said geometric center of said dual locus, wherein said counteracting and opposing components of said critical minimum eigenenergies together with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about said geometric center of said dual locus, and wherein said statistical fulcrum is located at said center of said total allowed eigenenergy and said minimum expected risk of said minimum risk quadratic classification system, wherein said minimum risk quadratic classification system satisfies a state of statistical equilibrium, wherein said total allowed eigenenergy and said expected risk of said minimum risk quadratic classification system are minimized, and wherein said minimum risk quadratic classification system exhibits the minimum probability of error for classifying said N feature vectors that belong to said two classes and said unknown feature vectors related to said data set; and determining which of said two classes said unknown feature vectors belong to using said discriminant function of said minimum risk quadratic classification system, said determination of said classes of said unknown feature vectors being performed by using said processors of said computer system to apply said discriminant function of said minimum risk quadratic classification system to said unknown feature vectors, wherein said discriminant function determines likely locations of said unknown feature vectors and identifies said decision regions related to said two classes that said unknown feature vectors are located within, wherein said discriminant function recognizes said classes of said unknown feature vectors, and wherein said minimum risk quadratic classification system decides which of said two classes said unknown feature vectors belong to and thereby classifies
said unknown feature vectors.
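
The computational steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the claimed implementation. Several choices below are assumptions that the claim does not fix: the dual optimization problem is taken to be the familiar support-vector-style dual (maximize 1ᵀψ − ½ψᵀQψ subject to ψ ≥ 0 and yᵀψ = 0), it is solved by projected gradient ascent, the regularizer adds `eps` to the diagonal, and the names `fit`, `project`, `eps`, `steps` and `tol` are hypothetical.

```python
import numpy as np

def poly2_kernel(A, B):
    # Second-order polynomial kernel (s^T x + 1)^2 for all row pairs of A and B.
    return (A @ B.T + 1.0) ** 2

def project(z, y, iters=60):
    # Euclidean projection onto {psi >= 0, y^T psi = 0} for labels y in {+1, -1}:
    # psi = max(z - lam * y, 0), with the multiplier lam found by bisection.
    bound = np.abs(z).max() + 1.0
    lo, hi = -bound, bound
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if y @ np.maximum(z - lam * y, 0.0) > 0:
            lo = lam
        else:
            hi = lam
    return np.maximum(z - 0.5 * (lo + hi) * y, 0.0)

def fit(X, y, kernel=poly2_kernel, eps=1e-3, steps=2000, tol=1e-6):
    # Kernel matrix of signed reproducing kernels, then its regularized version.
    Q = (y[:, None] * y[None, :]) * kernel(X, X)
    Q = Q + eps * np.eye(len(y))
    # Scale factors psi: projected gradient ascent on the assumed dual problem.
    step = 1.0 / np.linalg.eigvalsh(Q).max()
    psi = np.zeros(len(y))
    for _ in range(steps):
        psi = project(psi + step * (1.0 - Q @ psi), y)
    # Extreme vectors: scale factors that exceed zero by a small threshold.
    ext = psi > tol
    avg_sign = y[ext].mean()
    # Locus of risk: kernel matrix (extreme vs. all signed kernels) times psi.
    locus_of_risk = kernel(X[ext], X) @ (y * psi)
    avg_risk = locus_of_risk.mean()

    def discriminant(S):
        # Discriminant locus plus average sign minus average risk.
        return kernel(S, X) @ (y * psi) + avg_sign - avg_risk

    return discriminant, psi

# Toy data set: N = 6 labeled feature vectors with d = 2 components each.
X = np.array([[2., 2.], [3., 1.], [1., 3.], [-2., -2.], [-3., -1.], [-1., -3.]])
y = np.array([1., 1., 1., -1., -1., -1.])
D, psi = fit(X, y)
train_pred = np.sign(D(X))
unknown_pred = np.sign(D(np.array([[2.5, 2.5], [-2., -3.]])))
```

The sign of the discriminant function assigns each unknown feature vector to one of the two classes. A production system would likely replace the hand-written loop with a dedicated quadratic programming solver.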

2. The method of claim 1, wherein the reproducing kernel is a Gaussian reproducing kernel: $k_{\mathbf{x}} = \exp(-\gamma \|\mathbf{s} - \mathbf{x}\|^{2})$, with $0.01 \leq \gamma \leq 0.1$.

3. The method of claim 1, wherein the reproducing kernel is a second-order polynomial reproducing kernel: $k_{\mathbf{x}} = (\mathbf{s}^{T}\mathbf{x} + 1)^{2}$.
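
As an illustration of the two kernels named in claims 2 and 3, the sketch below evaluates them for all pairs of feature vectors and forms the matrix of inner products of signed reproducing kernels. It assumes the Gaussian kernel has the usual form exp(−γ‖x − s‖²) with γ chosen between 0.01 and 0.1. The function names and the default `gamma=0.05` are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, S, gamma=0.05):
    # exp(-gamma * ||x - s||^2) for every row x of X and every row s of S.
    sq = (X ** 2).sum(axis=1)[:, None] + (S ** 2).sum(axis=1)[None, :] - 2.0 * X @ S.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def poly2_kernel(X, S):
    # (s^T x + 1)^2 for every row x of X and every row s of S.
    return (X @ S.T + 1.0) ** 2

def signed_kernel_matrix(X, y, kernel):
    # Entry (i, j) is y_i * y_j * k(x_i, x_j), with labels y_i in {+1, -1}.
    return (y[:, None] * y[None, :]) * kernel(X, X)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
Q = signed_kernel_matrix(X, y, poly2_kernel)
```

Here `Q` is a 2×2 matrix whose off-diagonal entries are negative because the two feature vectors carry opposite signs.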

4. A computer-implemented method of using feature vectors and machine learning algorithms to determine a fused discriminant function of a fused minimum risk quadratic classification system that classifies two types of said feature vectors into two classes, wherein said types of said feature vectors have different numbers of vector components, and using said fused discriminant function of said fused minimum risk quadratic classification system to classify unknown feature vectors related to said two types of said feature vectors, said method comprising: receiving an N×d data set of feature vectors within a computer system, wherein N is a number of feature vectors, d is a number of vector components in each feature vector, and each one of said N feature vectors is labeled with information that identifies which of two classes each one of said N feature vectors belongs to, and wherein each said feature vector is defined by a d-dimensional vector of numerical features, wherein said numerical features are extracted from digital signals; receiving an N×p different data set of different feature vectors within said computer system, wherein N is a number of different feature vectors, p is a number of vector components in each different feature vector, and each one of said N different feature vectors is labeled with information that identifies which of said two classes each one of said N different feature vectors belongs to, and wherein each said different feature vector is defined by a p-dimensional vector of numerical features, wherein said numerical features are extracted from digital signals; receiving within said computer system unknown feature vectors related to said data set and unknown different feature vectors related to said different data set; determining a kernel matrix using said data set, said determination of said kernel matrix being performed by using processors of said computer system to calculate a matrix of all possible inner products of signed reproducing
kernels of said N feature vectors, wherein a reproducing kernel of a feature vector replaces said feature vector with a curve that contains first and second degree vector components, and wherein each one of said reproducing kernels of said N feature vectors has a sign of +1 or -1 that identifies which of said two classes each one of said N feature vectors belongs to, and using said processors of said computer system to calculate a regularized kernel matrix from said kernel matrix; determining a different kernel matrix using said different data set, said determination of said different kernel matrix being performed by using said processors of said computer system to calculate a matrix of all possible inner products of signed reproducing kernels of said N different feature vectors, wherein a reproducing kernel of a different feature vector replaces said different feature vector with a curve that contains first and second degree vector components, and wherein each one of said reproducing kernels of said N different feature vectors has a sign of +1 or -1 that identifies which of said two classes each one of said N different feature vectors belongs to, and using said processors of said computer system to calculate a regularized different kernel matrix from said different kernel matrix; determining a discriminant function of a minimum risk quadratic classification system using said regularized kernel matrix and said data set, said determination of said discriminant function of said minimum risk quadratic classification system comprising the steps of: determining scale factors of a geometric locus of signed and scaled reproducing kernels of extreme points using said regularized kernel matrix, wherein said extreme points are located within overlapping regions or near tail regions of distributions of said N feature vectors, said determination of said scale factors being performed by using said processors of said computer system to determine a solution of a dual optimization
problem, wherein said scale factors and said geometric locus satisfy a system of fundamental locus equations of binary classification, subject to geometric and statistical conditions for a minimum risk quadratic classification system in statistical equilibrium, and wherein said scale factors determine conditional densities for said extreme points and also determine critical minimum eigenenergies exhibited by scaled extreme vectors on said geometric locus, wherein said critical minimum eigenenergies determine conditional probabilities of said extreme points and also determine corresponding counter risks and risks of a minimum risk quadratic classification system, wherein said counter risks are associated with right decisions and said risks are associated with wrong decisions of said minimum risk quadratic classification system, and wherein said geometric locus determines the principal eigenaxis of the decision boundary of said minimum risk quadratic classification system, wherein said principal eigenaxis exhibits symmetrical dimensions and density, wherein said conditional probabilities and said critical minimum eigenenergies exhibited by said minimum risk quadratic classification system are symmetrically concentrated within said principal eigenaxis, and wherein counteracting and opposing components of said critical minimum eigenenergies exhibited by corresponding components of said scaled extreme vectors on said geometric locus together with corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about the geometric center of said principal eigenaxis, wherein the center of total allowed eigenenergy and minimum expected risk of said minimum risk quadratic classification system is located at the geometric center of said geometric locus, and wherein said geometric locus determines a primal representation of a dual locus of likelihood components and principal eigenaxis components, 
wherein said likelihood components and said principal eigenaxis components are symmetrically distributed over either side of the axis of said dual locus, wherein a statistical fulcrum is placed directly under the center of said dual locus, and wherein said likelihood components of said dual locus determine conditional likelihoods for said extreme points, and wherein said principal eigenaxis components of said dual locus determine an intrinsic coordinate system of geometric loci of a quadratic decision boundary and corresponding decision borders that jointly partition the decision space of said minimum risk quadratic classification system into symmetrical decision regions; determining said extreme vectors on said geometric locus using the vector of said scale factors, said determination of said extreme vectors being performed by using said processors of said computer system to identify said scale factors that exceed zero by a small threshold, and using said processors of said computer system to determine a sign vector of signs associated with said extreme vectors using said data set, and compute the average sign using said sign vector; determining a locus of risk for said minimum risk quadratic classification system using said reproducing kernels of said extreme vectors and said signed reproducing kernels of said N feature vectors and said vector of scale factors, said determination of said locus of risk being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said extreme vectors, and multiply said matrix by said vector of scale factors, and compute the average risk for said minimum risk quadratic classification system using said locus of risk; determining a discriminant locus for said minimum risk quadratic classification system using said geometric locus, said determination of said discriminant locus being performed by 
using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said unknown feature vectors, and multiply said matrix by said vector of scale factors; determining the discriminant function of said minimum risk quadratic classification system, using said average risk and said average sign and said discriminant locus, said determination of said discriminant function of said minimum risk quadratic classification system being performed by using said processors of said computer system to subtract said average risk from sum of said discriminant locus and said average sign, wherein said discriminant function of said minimum risk quadratic classification system satisfies said system of fundamental locus equations of binary classification, and wherein said discriminant function of said minimum risk quadratic classification system determines likely locations of said N feature vectors and also determines said geometric loci of said quadratic decision boundary and said corresponding decision borders that jointly partition said extreme points into said symmetrical decision regions, wherein said symmetrical decision regions span said overlapping regions or said tail regions of said distributions of said N feature vectors, and wherein said discriminant function of said minimum risk quadratic classification system satisfies said quadratic decision boundary in terms of a critical minimum eigenenergy and said minimum expected risk, wherein said counteracting and opposing components of said critical minimum eigenenergies exhibited by said corresponding components of said scaled extreme vectors on said geometric locus associated with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically distributed over said axis of said dual locus, on equal sides of said statistical fulcrum located at said 
geometric center of said dual locus, wherein said counteracting and opposing components of said critical minimum eigenenergies together with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about said geometric center of said dual locus, and wherein said statistical fulcrum is located at said center of said total allowed eigenenergy and said minimum expected risk of said minimum risk quadratic classification system, wherein said minimum risk quadratic classification system satisfies a state of statistical equilibrium, wherein said total allowed eigenenergy and said expected risk of said minimum risk quadratic classification system are minimized, and wherein said minimum risk quadratic classification system exhibits the minimum probability of error for classifying said N feature vectors that belong to said two classes and said unknown feature vectors related to said data set; determining a different discriminant function of a different minimum risk quadratic classification system using said regularized different kernel matrix and said different data set, said determination of said different discriminant function of said different minimum risk quadratic classification system being performed by using said processors of said computer system to perform said steps of determining said discriminant function of said minimum risk quadratic classification system, wherein said different minimum risk quadratic classification system exhibits the minimum probability of error for classifying said N different feature vectors that belong to said two classes and said unknown different feature vectors related to said different data set; determining a fused discriminant function of a fused minimum risk quadratic classification system using said discriminant function of said minimum risk quadratic classification system and said different discriminant function of said different minimum 
risk quadratic classification system, said determination of said fused discriminant function of said fused minimum risk quadratic classification system being performed by using said processors of said computer system to sum said discriminant function of said minimum risk quadratic classification system and said different discriminant function of said different minimum risk quadratic classification system; and determining which of said two classes said unknown feature vectors and said unknown different feature vectors belong to using said fused discriminant function of said fused minimum risk quadratic classification system, said determination of said classes of said unknown feature vectors and said unknown different feature vectors being performed by using said processors of said computer system to apply said fused discriminant function of said fused minimum risk quadratic classification system to said unknown feature vectors and said unknown different feature vectors, wherein said fused discriminant function determines likely locations of said unknown feature vectors and said unknown different feature vectors and identifies said decision regions related to said two classes that said unknown feature vectors and said unknown different feature vectors are located within, wherein said fused discriminant function recognizes said classes of said unknown feature vectors and said unknown different feature vectors, and wherein said fused minimum risk quadratic classification system decides which of said two classes said unknown feature vectors and said unknown different feature vectors belong to and thereby classifies said unknown feature vectors and said unknown different feature vectors.
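
The fusion step of claim 4 sums two discriminant functions that operate on feature vectors of different dimension (d and p). A minimal sketch follows. The two linear functions `D1` and `D2` are hypothetical stand-ins for discriminant functions that would already have been determined as recited above, and the name `fuse` is likewise an assumption.

```python
import numpy as np

def fuse(discriminants):
    # Returns a fused discriminant that sums the outputs of the given
    # discriminant functions, each applied to its own type of feature vector.
    def fused(views):
        return sum(D(V) for D, V in zip(discriminants, views))
    return fused

# Hypothetical stand-ins for two trained discriminant functions (d = 2, p = 3).
w1, b1 = np.array([0.5, -0.5]), 0.1
w2 = np.array([1.0, 0.0, -1.0])
D1 = lambda S: S @ w1 + b1
D2 = lambda Z: Z @ w2

F = fuse([D1, D2])
S = np.array([[1.0, 1.6]])        # unknown feature vector, d = 2
Z = np.array([[1.2, 0.0, 0.3]])   # unknown different feature vector, p = 3
scores = F([S, Z])
labels = np.sign(scores)
```

In this toy case the first discriminant alone is slightly negative and the second is positive. The fused score decides the class from their sum.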

5. The method of claim 4, wherein the reproducing kernel is a Gaussian reproducing kernel: $k_{\mathbf{x}} = \exp(-\gamma \|\mathbf{s} - \mathbf{x}\|^{2})$, with $0.01 \leq \gamma \leq 0.1$.

6. The method of claim 4, wherein the reproducing kernel is a second-order polynomial reproducing kernel: $k_{\mathbf{x}} = (\mathbf{s}^{T}\mathbf{x} + 1)^{2}$.

7. A computer-implemented method of using feature vectors and machine learning algorithms to determine a discriminant function of an M-class minimum risk quadratic classification system that classifies said feature vectors into M classes and using said discriminant function of said M-class minimum risk quadratic classification system to classify unknown feature vectors related to said feature vectors, said method comprising: receiving M N×d data sets of feature vectors within a computer system, wherein M is a number of classes, N is a number of feature vectors in each one of said M data sets, d is a number of vector components in each feature vector, and each one of said N feature vectors in each one of said M data sets belongs to the same class and is labeled with information that identifies said class, and wherein each said feature vector is defined by a d-dimensional vector of numerical features, wherein said numerical features are extracted from digital signals; receiving within said computer system unknown feature vectors related to said M data sets; determining M ensembles of M-1 discriminant functions of M-1 minimum risk quadratic classification systems using said M data sets, wherein the determination of each one of said M ensembles comprises the steps of: determining M-1 kernel matrices for a class of feature vectors using said M data sets, said determination of said M-1 kernel matrices being performed by using processors of said computer system to calculate M-1 matrices, wherein each matrix contains all possible inner products of signed reproducing kernels of feature vectors that belong to said class and one of the other M-1 classes, wherein a reproducing kernel of a feature vector replaces said feature vector with a curve that contains first and second degree vector components, and wherein said N feature vectors that belong to said class have the sign +1, and said N feature vectors that belong to said other class have the sign -1, and wherein said M-1 matrices
account for all of the other said M-1 classes, and calculating M-1 regularized kernel matrices from said M-1 kernel matrices; determining M-1 discriminant functions of M-1 minimum risk quadratic classification systems using said M-1 regularized kernel matrices and said M data sets, wherein the determination of each one of said M-1 discriminant functions of M-1 minimum risk quadratic classification systems further comprises the steps of: determining scale factors of a geometric locus of signed and scaled reproducing kernels of extreme points using one of said regularized kernel matrices, wherein said extreme points are located within overlapping regions or near tail regions of distributions of feature vectors that belong to said class and one of the other said M-1 classes, said determination of said scale factors being performed by using said processors of said computer system to determine a solution of a dual optimization problem, wherein said scale factors and said geometric locus satisfy a system of fundamental locus equations of binary classification, subject to geometric and statistical conditions for a minimum risk quadratic classification system in statistical equilibrium, and wherein said scale factors determine conditional densities for said extreme points and also determine critical minimum eigenenergies exhibited by scaled extreme vectors on said geometric locus, wherein said critical minimum eigenenergies determine conditional probabilities of said extreme points and also determine corresponding counter risks and risks of a minimum risk quadratic classification system, wherein said counter risks are associated with right decisions and said risks are associated with wrong decisions of said minimum risk quadratic classification system, and wherein said geometric locus determines the principal eigenaxis of the decision boundary of said minimum risk quadratic classification system, wherein said principal eigenaxis exhibits symmetrical dimensions and density, wherein
said conditional probabilities and said critical minimum eigenenergies exhibited by said minimum risk quadratic classification system are symmetrically concentrated within said principal eigenaxis, and wherein counteracting and opposing components of said critical minimum eigenenergies exhibited by corresponding components of said scaled extreme vectors on said geometric locus together with corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about the geometric center of said principal eigenaxis, wherein the center of total allowed eigenenergy and minimum expected risk of said minimum risk quadratic classification system is located at the geometric center of said geometric locus, and wherein said geometric locus determines a primal representation of a dual locus of likelihood components and principal eigenaxis components, wherein said likelihood components and said principal eigenaxis components are symmetrically distributed over either side of the axis of said dual locus, wherein a statistical fulcrum is placed directly under the center of said dual locus, and wherein said likelihood components of said dual locus determine conditional likelihoods for said extreme points, and wherein said principal eigenaxis components of said dual locus determine an intrinsic coordinate system of geometric loci of a quadratic decision boundary and corresponding decision borders that jointly partition the decision space of said minimum risk quadratic classification system into symmetrical decision regions; determining said extreme vectors on said geometric locus using the vector of said scale factors, said determination of said extreme vectors being performed by using said processors of said computer system to identify said scale factors that exceed zero by a small threshold, and using said processors of said computer system to determine a sign vector of signs associated with said extreme 
vectors using data set of said class and data set of said other class, and compute the average sign using said sign vector; determining a locus of risk for said minimum risk quadratic classification system using said reproducing kernels of said extreme vectors and said signed reproducing kernels of said N feature vectors and said vector of scale factors, said determination of said locus of risk being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said extreme vectors, and multiply said matrix by said vector of scale factors, and compute the average risk for said minimum risk quadratic classification system using said locus of risk; determining a discriminant locus for said minimum risk quadratic classification system using said geometric locus, said determination of said discriminant locus being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said feature vectors that belong to said class and said other class and said reproducing kernels of said unknown feature vectors, and multiply said matrix by said vector of scale factors; determining the discriminant function of said minimum risk quadratic classification system, using said average risk and said average sign and said discriminant locus, said determination of said discriminant function of said minimum risk quadratic classification system being performed by using said processors of said computer system to subtract said average risk from sum of said discriminant locus and said average sign, wherein said discriminant function of said minimum risk quadratic classification system satisfies said system of fundamental locus equations of binary classification, and wherein said discriminant function of said minimum risk quadratic classification system determines likely locations of 
said N feature vectors from said class and said N feature vectors from said other class and also determines said geometric loci of said quadratic decision boundary and said corresponding decision borders that jointly partition said extreme points into said symmetrical decision regions, wherein said symmetrical decision regions span said overlapping regions or said tail regions of said distributions of said N feature vectors that belong to said class and said N feature vectors that belong to said other class, and wherein said discriminant function of said minimum risk quadratic classification system satisfies said quadratic decision boundary in terms of a critical minimum eigenenergy and said minimum expected risk, wherein said counteracting and opposing components of said critical minimum eigenenergies exhibited by said corresponding components of said scaled extreme vectors on said geometric locus associated with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically distributed over said axis of said dual locus, on equal sides of said statistical fulcrum located at said geometric center of said dual locus, wherein said counteracting and opposing components of said critical minimum eigenenergies together with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about said geometric center of said dual locus, and wherein said statistical fulcrum is located at said center of said total allowed eigenenergy and said minimum expected risk of said minimum risk quadratic classification system, wherein said minimum risk quadratic classification system satisfies a state of statistical equilibrium, wherein said total allowed eigenenergy and said expected risk of said minimum risk quadratic classification system are minimized, and wherein said minimum risk quadratic classification system exhibits the minimum 
probability of error for classifying said N feature vectors that belong to said class and said N feature vectors that belong to said other class and said unknown feature vectors related to said data set of said class and said data set of said other class; determining a discriminant function of an M-class minimum risk quadratic classification system using said M ensembles of M−1 discriminant functions of M−1 minimum risk quadratic classification systems, said determination of said discriminant function of said M-class minimum risk quadratic classification system being performed by using said processors of said computer system to sum said M ensembles of M−1 discriminant functions of M−1 minimum risk quadratic classification systems; and determining which of said M classes said unknown feature vectors belong to using said discriminant function of said M-class minimum risk quadratic classification system, said determination of said classes of said unknown feature vectors being performed by using said processors of said computer system to apply said discriminant function of said M-class minimum risk quadratic classification system to said unknown feature vectors, wherein said discriminant function determines likely locations of said unknown feature vectors and identifies said decision regions related to said M classes that said unknown feature vectors are located within, wherein said discriminant function recognizes said classes of said unknown feature vectors, and wherein said M-class minimum risk quadratic classification system decides which of said M classes said unknown feature vectors belong to and thereby classifies said unknown feature vectors.
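
By way of non-limiting illustration, the ensemble step recited above may be sketched as follows. The sketch assumes that each pairwise discriminant function is available as a callable that returns a positive value when an unknown feature vector is assigned to the first class of the pair, that each ensemble sums the signs of its M−1 pairwise outputs, and that the class whose ensemble yields the largest sum is selected. The names `ensemble_scores`, `classify`, and `pairwise`, and the toy one-dimensional discriminants, are hypothetical and are not taken from the claims.

```python
def sign(value):
    """Return +1 for a positive discriminant output and -1 otherwise."""
    return 1 if value > 0 else -1


def ensemble_scores(pairwise, x):
    """Sum, for each class i, the signs of its M-1 pairwise discriminants.

    pairwise[i][j] is a hypothetical callable D_ij(x) that is positive when
    x is assigned to class i rather than class j (the diagonal is unused).
    """
    m = len(pairwise)
    return [
        sum(sign(pairwise[i][j](x)) for j in range(m) if j != i)
        for i in range(m)
    ]


def classify(pairwise, x):
    """Assumed decision rule: pick the class with the largest ensemble sum."""
    scores = ensemble_scores(pairwise, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy stand-ins for trained discriminants: three classes centered at 0, 5, 10.
centers = [0.0, 5.0, 10.0]
pairwise = [
    [None if i == j else
     (lambda x, ci=centers[i], cj=centers[j]: abs(x - cj) - abs(x - ci))
     for j in range(3)]
    for i in range(3)
]
print(classify(pairwise, 4.2))  # prints 1
```

In practice each entry of `pairwise` would be a trained discriminant function of a two-class system; the toy distance-based callables only make the sketch runnable.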

8. The method of claim 7, wherein the reproducing kernel is a Gaussian reproducing kernel: $k_{\mathbf{x}} = \exp(-\gamma \|\mathbf{s} - \mathbf{x}\|^{2})$, with $0.01 \le \gamma \le 0.1$.
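
By way of non-limiting illustration, a signed kernel matrix built from a Gaussian reproducing kernel may be sketched as follows. The sketch reads the kernel as exp(-gamma * ||x - s||^2) with gamma between 0.01 and 0.1, which is an assumed reading of the claim. The function names and the default value of `gamma` are hypothetical.

```python
import math


def gaussian_kernel(x, s, gamma=0.05):
    """Gaussian reproducing kernel exp(-gamma * ||x - s||^2).

    gamma is restricted to the range 0.01..0.1 (assumed reading of the claim).
    """
    if not 0.01 <= gamma <= 0.1:
        raise ValueError("gamma must lie between 0.01 and 0.1")
    squared_distance = sum((xi - si) ** 2 for xi, si in zip(x, s))
    return math.exp(-gamma * squared_distance)


def signed_kernel_matrix(vectors, signs, gamma=0.05):
    """Matrix of all inner products of signed reproducing kernels.

    Entry (i, j) is y_i * y_j * k(x_i, x_j), where each y is +1 or -1.
    """
    n = len(vectors)
    return [
        [signs[i] * signs[j] * gaussian_kernel(vectors[i], vectors[j], gamma)
         for j in range(n)]
        for i in range(n)
    ]


# Two feature vectors from opposite classes.
Q = signed_kernel_matrix([[0.0, 0.0], [1.0, 0.0]], [1, -1], gamma=0.1)
```

The diagonal entries equal 1 and the off-diagonal entries are negative here because the two vectors carry opposite signs.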

9. The method of claim 7, wherein the reproducing kernel is a second-order polynomial reproducing kernel: $k_{\mathbf{x}} = (\mathbf{s}^{T}\mathbf{x} + 1)^{2}$.
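
By way of non-limiting illustration, expanding the second-order polynomial reproducing kernel for two-dimensional vectors shows the first and second degree vector components referred to in the claims. The choice of d = 2 and the component names are assumptions made only to keep the expansion short.

```latex
\begin{aligned}
(\mathbf{s}^{T}\mathbf{x} + 1)^{2}
  &= (s_1 x_1 + s_2 x_2 + 1)^{2} \\
  &= \underbrace{s_1^{2} x_1^{2} + 2 s_1 s_2 x_1 x_2 + s_2^{2} x_2^{2}}_{\text{second degree}}
   + \underbrace{2 s_1 x_1 + 2 s_2 x_2}_{\text{first degree}}
   + 1 .
\end{aligned}
```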

10. A computer-implemented method of using feature vectors and machine learning algorithms to determine a fused discriminant function of a fused M-class minimum risk quadratic classification system that classifies two types of said feature vectors into M classes, wherein said types of said feature vectors have different numbers of vector components, and using said fused discriminant function of said fused M-class minimum risk quadratic classification system to classify unknown feature vectors related to said two types of said feature vectors, said method comprising: receiving M N×d data sets of feature vectors within a computer system, wherein M is a number of classes, N is a number of feature vectors in each one of said M data sets, d is a number of vector components in each feature vector, and each one of said N feature vectors in each one of said M data sets belongs to the same class and is labeled with information that identifies said class, and wherein each said feature vector is defined by a d-dimensional vector of numerical features, wherein said numerical features are extracted from digital signals; receiving M N×p different data sets of different feature vectors within said computer system, wherein M is said number of said classes, N is a number of different feature vectors in each one of said M different data sets, p is a number of vector components in each different feature vector, and each one of said N different feature vectors in each one of said M different data sets belongs to the same class and is labeled with information that identifies said class, and wherein each said different feature vector is defined by a p-dimensional vector of numerical features, wherein said numerical features are extracted from digital signals; receiving within said computer system unknown feature vectors related to said M data sets and unknown different feature vectors related to said M different data sets; determining M ensembles of M−1 discriminant functions of M−1 minimum
risk quadratic classification systems using said M data sets, wherein the determination of each one of said M ensembles comprises the steps of: determining M−1 kernel matrices for a class of feature vectors using said M data sets, said determination of said M−1 kernel matrices being performed by using processors of said computer system to calculate M−1 matrices, wherein each matrix contains all possible inner products of signed reproducing kernels of feature vectors that belong to said class and one of the other M−1 classes, wherein a reproducing kernel of a feature vector replaces said feature vector with a curve that contains first and second degree vector components, and wherein said N feature vectors that belong to said class have the sign +1, and said N feature vectors that belong to said other class have the sign −1, and said M−1 matrices account for all of the other said M−1 classes, and calculating M−1 regularized kernel matrices from said M−1 kernel matrices; determining M−1 discriminant functions of M−1 minimum risk quadratic classification systems using said M−1 regularized kernel matrices and said M data sets, wherein the determination of each one of said M−1 discriminant functions of M−1 minimum risk quadratic classification systems further comprises the steps of: determining scale factors of a geometric locus of signed and scaled reproducing kernels of extreme points using one of said regularized kernel matrices, wherein said extreme points are located within overlapping regions or near tail regions of distributions of feature vectors that belong to said class and one of the other said M−1 classes, said determination of said scale factors being performed by using said processors of said computer system to determine a solution of a dual optimization problem, wherein said scale factors and said geometric locus satisfy a system of fundamental locus equations of binary classification, subject to geometric and statistical conditions for a minimum risk quadratic
classification system in statistical equilibrium, and wherein said scale factors determine conditional densities for said extreme points and also determine critical minimum eigenenergies exhibited by scaled extreme vectors on said geometric locus, wherein said critical minimum eigenenergies determine conditional probabilities of said extreme points and also determine corresponding counter risks and risks of a minimum risk quadratic classification system, wherein said counter risks are associated with right decisions and said risks are associated with wrong decisions of said minimum risk quadratic classification system, and wherein said geometric locus determines the principal eigenaxis of the decision boundary of said minimum risk quadratic classification system, wherein said principal eigenaxis exhibits symmetrical dimensions and density, wherein said conditional probabilities and said critical minimum eigenenergies exhibited by said minimum risk quadratic classification system are symmetrically concentrated within said principal eigenaxis, and wherein counteracting and opposing components of said critical minimum eigenenergies exhibited by corresponding components of said scaled extreme vectors on said geometric locus together with corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about the geometric center of said principal eigenaxis, wherein the center of total allowed eigenenergy and minimum expected risk of said minimum risk quadratic classification system is located at the geometric center of said geometric locus, and wherein said geometric locus determines a primal representation of a dual locus of likelihood components and principal eigenaxis components, wherein said likelihood components and said principal eigenaxis components are symmetrically distributed over either side of the axis of said dual locus, wherein a statistical fulcrum is placed directly under the 
center of said dual locus, and wherein said likelihood components of said dual locus determine conditional likelihoods for said extreme points, and wherein said principal eigenaxis components of said dual locus determine an intrinsic coordinate system of geometric loci of a quadratic decision boundary and corresponding decision borders that jointly partition the decision space of said minimum risk quadratic classification system into symmetrical decision regions; determining said extreme vectors on said geometric locus using the vector of said scale factors, said determination of said extreme vectors being performed by using said processors of said computer system to identify said scale factors that exceed zero by a small threshold, and using said processors of said computer system to determine a sign vector of signs associated with said extreme vectors using data set of said class and data set of said other class, and compute the average sign using said sign vector; determining a locus of risk for said minimum risk quadratic classification system using said reproducing kernels of said extreme vectors and said signed reproducing kernels of said N feature vectors and said vector of scale factors, said determination of said locus of risk being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said extreme vectors, and multiply said matrix by said vector of scale factors, and compute the average risk for said minimum risk quadratic classification system using said locus of risk; determining a discriminant locus for said minimum risk quadratic classification system using said geometric locus, said determination of said discriminant locus being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said feature vectors that belong to said 
class and said other class and said reproducing kernels of said unknown feature vectors, and multiply said matrix by said vector of scale factors; determining the discriminant function of said minimum risk quadratic classification system, using said average risk and said average sign and said discriminant locus, said determination of said discriminant function of said minimum risk quadratic classification system being performed by using said processors of said computer system to subtract said average risk from sum of said discriminant locus and said average sign, wherein said discriminant function of said minimum risk quadratic classification system satisfies said system of fundamental locus equations of binary classification, and wherein said discriminant function of said minimum risk quadratic classification system determines likely locations of said N feature vectors from said class and said N feature vectors from said other class and also determines said geometric loci of said quadratic decision boundary and said corresponding decision borders that jointly partition said extreme points into said symmetrical decision regions, wherein said symmetrical decision regions span said overlapping regions or said tail regions of said distributions of said N feature vectors that belong to said class and said N feature vectors that belong to said other class, and wherein said discriminant function of said minimum risk quadratic classification system satisfies said quadratic decision boundary in terms of a critical minimum eigenenergy and said minimum expected risk, wherein said counteracting and opposing components of said critical minimum eigenenergies exhibited by said corresponding components of said scaled extreme vectors on said geometric locus associated with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically distributed over said axis of said dual locus, on equal sides of said
statistical fulcrum located at said geometric center of said dual locus, wherein said counteracting and opposing components of said critical minimum eigenenergies together with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about said geometric center of said dual locus, and wherein said statistical fulcrum is located at said center of said total allowed eigenenergy and said minimum expected risk of said minimum risk quadratic classification system, wherein said minimum risk quadratic classification system satisfies a state of statistical equilibrium, wherein said total allowed eigenenergy and said expected risk of said minimum risk quadratic classification system are minimized, and wherein said minimum risk quadratic classification system exhibits the minimum probability of error for classifying said N feature vectors that belong to said class and said N feature vectors that belong to said other class and said unknown feature vectors related to said data set of said class and said data set of said other class; determining M different ensembles of M−1 different discriminant functions of M−1 different minimum risk quadratic classification systems using said M different data sets, said determination of said M different ensembles of M−1 different discriminant functions of M−1 different minimum risk quadratic classification systems being performed by performing said steps of determining M ensembles of M−1 discriminant functions of M−1 minimum risk quadratic classification systems; determining a fused discriminant function of a fused M-class minimum risk quadratic classification system using said M ensembles of M−1 discriminant functions of M−1 minimum risk quadratic classification systems and said M different ensembles of M−1 different discriminant functions of M−1 different minimum risk quadratic classification systems, said determination of said fused discriminant function of
said fused M-class minimum risk quadratic classification system being performed by using said processors of said computer system to sum said M ensembles of M−1 discriminant functions of M−1 minimum risk quadratic classification systems and said M different ensembles of M−1 different discriminant functions of M−1 different minimum risk quadratic classification systems; and determining which of said M classes said unknown feature vectors and said unknown different feature vectors belong to using said fused discriminant function of said fused M-class minimum risk quadratic classification system, said determination of said classes of said unknown feature vectors and said unknown different feature vectors being performed by using said processors of said computer system to apply said fused discriminant function of said fused M-class minimum risk quadratic classification system to said unknown feature vectors and said unknown different feature vectors, wherein said fused discriminant function determines likely locations of said unknown feature vectors and said unknown different feature vectors and identifies said decision regions related to said M classes that said unknown feature vectors and said unknown different feature vectors are located within, wherein said fused discriminant function recognizes said classes of said unknown feature vectors and said unknown different feature vectors, and wherein said fused M-class minimum risk quadratic classification system decides which of said M classes said unknown feature vectors and said unknown different feature vectors belong to and thereby classifies said unknown feature vectors and said unknown different feature vectors.
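
By way of non-limiting illustration, the fusion step recited above may be sketched as an element-wise sum of per-class ensemble outputs obtained from the two types of feature vectors. The sketch assumes that each system has already produced one ensemble output per class for the same unknown object and that the class with the largest fused output is selected. The function names and the example scores are hypothetical.

```python
def fuse_scores(scores_d, scores_p):
    """Sum per-class ensemble outputs from the d- and p-dimensional systems."""
    if len(scores_d) != len(scores_p):
        raise ValueError("both systems must cover the same M classes")
    return [a + b for a, b in zip(scores_d, scores_p)]


def fused_class(scores_d, scores_p):
    """Assumed decision rule: the class with the largest fused output."""
    fused = fuse_scores(scores_d, scores_p)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical ensemble outputs for M = 3 classes from each feature type.
from_d_features = [0, 2, -2]
from_p_features = [-2, 2, 0]
print(fuse_scores(from_d_features, from_p_features))  # prints [-2, 4, -2]
print(fused_class(from_d_features, from_p_features))  # prints 1
```

Because the fusion operates on per-class outputs rather than on the feature vectors themselves, the two feature types may have different numbers of vector components.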

11. The method of claim 10, wherein the reproducing kernel is a Gaussian reproducing kernel: $k_{\mathbf{x}} = \exp(-\gamma \|\mathbf{s} - \mathbf{x}\|^{2})$, with $0.01 \le \gamma \le 0.1$.

12. The method of claim 10, wherein the reproducing kernel is a second-order polynomial reproducing kernel: $k_{\mathbf{x}} = (\mathbf{s}^{T}\mathbf{x} + 1)^{2}$.

13. A computer-implemented method of using feature vectors and machine learning algorithms to determine a discriminant function of a minimum risk quadratic classification system that classifies said feature vectors into two classes and using said discriminant function of said minimum risk quadratic classification system to determine a classification error rate and a measure of overlap between distributions of said feature vectors, said method comprising: receiving an N×d data set of feature vectors within a computer system, wherein N is a number of feature vectors, d is a number of vector components in each feature vector, and each one of said N feature vectors is labeled with information that identifies which of two classes each one of said N feature vectors belongs to, and wherein each said feature vector is defined by a d-dimensional vector of numerical features, wherein said numerical features are extracted from digital signals; receiving an N×d test data set of test feature vectors related to said data set within said computer system, wherein N is a number of test feature vectors, d is a number of vector components in each test feature vector, and each one of said N test feature vectors is labeled with information that identifies which of said two classes each one of said N test feature vectors belongs to; determining a kernel matrix using said data set, said determination of said kernel matrix being performed by using processors of said computer system to calculate a matrix of all possible inner products of signed reproducing kernels of said N feature vectors, wherein a reproducing kernel of a feature vector replaces said feature vector with a curve that contains first and second degree vector components, and wherein each one of said reproducing kernels of said N feature vectors has a sign of +1 or −1 that identifies which of said two classes each one of said N feature vectors belongs to, and using said processors of said computer system to calculate a
regularized kernel matrix from said kernel matrix; determining scale factors of a geometric locus of signed and scaled reproducing kernels of extreme points using said regularized kernel matrix, wherein said extreme points are located within overlapping regions or near tail regions of distributions of said N feature vectors, said determination of said scale factors being performed by using said processors of said computer system to determine a solution of a dual optimization problem, wherein said scale factors and said geometric locus satisfy a system of fundamental locus equations of binary classification, subject to geometric and statistical conditions for a minimum risk quadratic classification system in statistical equilibrium, and wherein said scale factors determine conditional densities for said extreme points and also determine critical minimum eigenenergies exhibited by scaled extreme vectors on said geometric locus, wherein said critical minimum eigenenergies determine conditional probabilities of said extreme points and also determine corresponding counter risks and risks of a minimum risk quadratic classification system, wherein said counter risks are associated with right decisions and said risks are associated with wrong decisions of said minimum risk quadratic classification system, and wherein said geometric locus determines the principal eigenaxis of the decision boundary of said minimum risk quadratic classification system, wherein said principal eigenaxis exhibits symmetrical dimensions and density, wherein said conditional probabilities and said critical minimum eigenenergies exhibited by said minimum risk quadratic classification system are symmetrically concentrated within said principal eigenaxis, and wherein counteracting and opposing components of said critical minimum eigenenergies exhibited by corresponding components of said scaled extreme vectors on said geometric locus together with said corresponding counter risks and risks exhibited 
by said minimum risk quadratic classification system are symmetrically balanced with each other about the geometric center of said principal eigenaxis, wherein the center of total allowed eigenenergy and minimum expected risk of said minimum risk quadratic classification system is located at the geometric center of said geometric locus, and wherein said geometric locus determines a primal representation of a dual locus of likelihood components and principal eigenaxis components, wherein said likelihood components and said principal eigenaxis components are symmetrically distributed over either side of the axis of said dual locus, wherein a statistical fulcrum is placed directly under the center of said dual locus, and wherein said likelihood components of said dual locus determine conditional likelihoods for said extreme points, and wherein said principal eigenaxis components of said dual locus determine an intrinsic coordinate system of geometric loci of a quadratic decision boundary and corresponding decision borders that jointly partition the decision space of said minimum risk quadratic classification system into symmetrical decision regions; determining said extreme vectors on said geometric locus using the vector of said scale factors, said determination of said extreme vectors being performed by using said processors of said computer system to identify said scale factors that exceed zero by a small threshold, and using said processors of said computer system to determine a sign vector of signs associated with said extreme vectors using said data set, and compute the average sign using said sign vector; determining a locus of risk for said minimum risk quadratic classification system using said reproducing kernels of said extreme vectors and said signed reproducing kernels of said N feature vectors and said vector of scale factors, said determination of said locus of risk being performed by using said processors of said computer system to calculate a matrix 
of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said extreme vectors, and multiply said matrix by said vector of scale factors, and compute the average risk for said minimum risk quadratic classification system using said locus of risk; determining a discriminant locus for said minimum risk quadratic classification system using said geometric locus, said determination of said discriminant locus being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said N feature vectors and said N test feature vectors, and multiply said matrix by said vector of scale factors; determining the discriminant function of said minimum risk quadratic classification system, using said average risk and said average sign and said discriminant locus, said determination of said discriminant function of said minimum risk quadratic classification system being performed by using said processors of said computer system to subtract said average risk from sum of said discriminant locus and said average sign, wherein said discriminant function of said minimum risk quadratic classification system satisfies said system of fundamental locus equations of binary classification, and wherein said discriminant function of said minimum risk quadratic classification system determines likely locations of said N feature vectors and said N test feature vectors and also determines said geometric loci of said quadratic decision boundary and said corresponding decision borders that jointly partition said extreme points into said symmetrical decision regions, wherein said symmetrical decision regions span said overlapping regions or said tail regions of said distributions of said N feature vectors, and wherein said discriminant function of said minimum risk quadratic classification system satisfies said 
quadratic decision boundary in terms of a critical minimum eigenenergy and said minimum expected risk, wherein said counteracting and opposing components of said critical minimum eigenenergies exhibited by said corresponding components of said scaled extreme vectors on said geometric locus associated with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically distributed over said axis of said dual locus, on equal sides of said statistical fulcrum located at said geometric center of said dual locus, wherein said counteracting and opposing components of said critical minimum eigenenergies together with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about said geometric center of said dual locus, and wherein said statistical fulcrum is located at said center of said total allowed eigenenergy and said minimum expected risk of said minimum risk quadratic classification system, wherein said minimum risk quadratic classification system satisfies a state of statistical equilibrium, wherein said total allowed eigenenergy and said expected risk of said minimum risk quadratic classification system are minimized, and wherein said minimum risk quadratic classification system exhibits the minimum probability of error for classifying said N feature vectors and said N test feature vectors related to said data set; determining which of said two classes said N feature vectors belong to using said discriminant function of said minimum risk quadratic classification system, said determination of said classes of said N feature vectors being performed by using said processors of said computer system to apply said discriminant function of said minimum risk quadratic classification system to said N feature vectors, wherein said discriminant function determines likely locations of said N feature vectors and identifies 
said decision regions related to said two classes that said N feature vectors are located within, wherein said discriminant function recognizes said classes of said N feature vectors, and wherein said minimum risk quadratic classification system decides which of said two classes said N feature vectors belong to and thereby classifies said N feature vectors; determining an in-sample classification error rate for said two classes of feature vectors, said determination of said error rate being performed by using said processors of said computer system to calculate the average number of wrong decisions made by said minimum risk quadratic classification system for classifying said N feature vectors; determining which of said two classes said N test feature vectors belong to using said discriminant function of said minimum risk quadratic classification system, said determination of said classes of said N test feature vectors being performed by using said processors of said computer system to apply said discriminant function of said minimum risk quadratic classification system to said N test feature vectors, wherein said discriminant function determines likely locations of said N test feature vectors and identifies said decision regions related to said two classes that said N test feature vectors are located within, wherein said discriminant function recognizes said classes of said N test feature vectors, and wherein said minimum risk quadratic classification system decides which of said two classes said N test feature vectors belong to and thereby classifies said N test feature vectors; determining an out-of-sample classification error rate for said two classes of feature vectors, said determination of said error rate being performed by using said processors of said computer system to calculate the average number of wrong decisions made by said minimum risk quadratic classification system for classifying said N test feature vectors; determining a
classification error rate for said two classes of feature vectors, said determination of said classification error rate being performed by using said processors of said computer system to average said in-sample classification error rate and said out-of-sample classification error rate; and determining a measure of overlap between distributions of feature vectors for said two classes of feature vectors using said N feature vectors and said extreme vectors, said determination of said measure of overlap being performed by using said processors of said computer system to calculate the ratio of the number of said extreme vectors to the number of said N feature vectors, wherein said ratio determines said measure of overlap.
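
By way of non-limiting illustration, the error-rate and overlap computations recited above may be sketched as follows. The sketch assumes class labels encoded as +1 and -1, takes the extreme vectors to be those whose scale factors exceed zero by a small threshold, and uses a hypothetical threshold value of 1e-6. All function names are hypothetical.

```python
def error_rate(predicted, actual):
    """Average number of wrong decisions over a set of labeled vectors."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def classification_error_rate(in_sample_rate, out_of_sample_rate):
    """Average of the in-sample and out-of-sample error rates."""
    return (in_sample_rate + out_of_sample_rate) / 2


def overlap_measure(scale_factors, threshold=1e-6):
    """Ratio of the number of extreme vectors to the number N of feature vectors.

    A vector counts as extreme when its scale factor exceeds the threshold.
    """
    n_extreme = sum(1 for psi in scale_factors if psi > threshold)
    return n_extreme / len(scale_factors)


# Hypothetical decisions and scale factors for four training and four test vectors.
in_sample = error_rate([1, -1, 1, 1], [1, 1, 1, -1])       # 2 of 4 wrong
out_of_sample = error_rate([1, -1, -1, 1], [1, -1, 1, 1])  # 1 of 4 wrong
overall = classification_error_rate(in_sample, out_of_sample)
overlap = overlap_measure([0.0, 0.3, 1e-9, 0.7])           # 2 extreme of 4
```

A larger ratio indicates that more of the feature vectors lie in overlapping or tail regions of the two distributions.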

14. The method of claim 13, wherein the reproducing kernel is a Gaussian reproducing kernel: $k_{\mathbf{x}} = \exp(-\gamma \|\mathbf{s} - \mathbf{x}\|^{2})$, with $0.01 \le \gamma \le 0.1$.

15. The method of claim 13, wherein the reproducing kernel is a second-order polynomial reproducing kernel: $k_{\mathbf{x}} = (\mathbf{s}^{T}\mathbf{x} + 1)^{2}$.
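
By way of non-limiting illustration, the two-class steps recited in claim 13 that follow the dual optimization (extreme vectors, average sign, locus of risk, average risk, and discriminant function) may be sketched as follows. The sketch assumes that the vector of scale factors `psi` has already been obtained from the dual optimization problem, which is not solved here; the values of `psi` in the example are hypothetical and are not a dual solution. A second-order polynomial reproducing kernel is used, and all function names and the threshold are hypothetical.

```python
def poly_kernel(x, s):
    """Second-order polynomial reproducing kernel (s^T x + 1)^2."""
    return (sum(xi * si for xi, si in zip(x, s)) + 1.0) ** 2


def risk_and_sign(X, y, psi, kernel=poly_kernel, threshold=1e-6):
    """Return (average risk, average sign) computed over the extreme vectors."""
    # Extreme vectors: scale factors that exceed zero by a small threshold.
    extreme = [i for i, p in enumerate(psi) if p > threshold]
    average_sign = sum(y[i] for i in extreme) / len(extreme)
    # Locus of risk: kernel products between each extreme vector and the
    # signed kernels of all N vectors, weighted by the scale factors.
    locus_of_risk = [
        sum(y[j] * psi[j] * kernel(X[i], X[j]) for j in range(len(X)))
        for i in extreme
    ]
    average_risk = sum(locus_of_risk) / len(locus_of_risk)
    return average_risk, average_sign


def discriminant(s, X, y, psi, average_risk, average_sign, kernel=poly_kernel):
    """Discriminant locus minus average risk plus average sign."""
    locus = sum(y[j] * psi[j] * kernel(s, X[j]) for j in range(len(X)))
    return locus - average_risk + average_sign


# Toy data: two one-dimensional feature vectors with hypothetical scale factors.
X, y, psi = [[0.0], [2.0]], [-1, 1], [0.5, 0.5]
avg_risk, avg_sign = risk_and_sign(X, y, psi)
d_neg = discriminant([0.0], X, y, psi, avg_risk, avg_sign)
d_pos = discriminant([2.0], X, y, psi, avg_risk, avg_sign)
```

In this toy case the discriminant is negative at the vector labeled -1 and positive at the vector labeled +1, so the sign of the output gives the class decision.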

16. A computer-implemented method of using feature vectors and machine learning algorithms to determine a discriminant function of a minimum risk quadratic classification system that classifies collections of said feature vectors into two classes and using said discriminant function of said minimum risk quadratic classification system to determine if distributions of said collections of said feature vectors are homogeneous distributions, said method comprising: receiving an N×d data set of feature vectors within a computer system, wherein N is a number of feature vectors, d is a number of vector components in each feature vector, and each one of said N feature vectors is labeled with information that identifies which of two collections each one of said N feature vectors belongs to, and wherein each said feature vector is defined by a d-dimensional vector of numerical features, wherein said numerical features are extracted from digital signals; determining a kernel matrix using said data set, said determination of said kernel matrix being performed by using processors of said computer system to calculate a matrix of all possible inner products of signed reproducing kernels of said N feature vectors, wherein a reproducing kernel of a feature vector replaces said feature vector with a curve that contains first and second degree vector components, and wherein each one of said reproducing kernels of said N feature vectors has a sign of +1 or −1 that identifies which of said two collections each one of said N feature vectors belongs to, and using said processors of said computer system to calculate a regularized kernel matrix from said kernel matrix; determining scale factors of a geometric locus of signed and scaled reproducing kernels of extreme points using said regularized kernel matrix, wherein said extreme points are located within overlapping regions or near tail regions of distributions of said N feature vectors, said determination of said scale factors being
performed by using said processors of said computer system to determine a solution of a dual optimization problem, wherein said scale factors and said geometric locus satisfy a system of fundamental locus equations of binary classification, subject to geometric and statistical conditions for a minimum risk quadratic classification system in statistical equilibrium, and wherein said scale factors determine conditional densities for said extreme points and also determine critical minimum eigenenergies exhibited by scaled extreme vectors on said geometric locus, wherein said critical minimum eigenenergies determine conditional probabilities of said extreme points and also determine corresponding counter risks and risks of a minimum risk quadratic classification system, wherein said counter risks are associated with right decisions and said risks are associated with wrong decisions of said minimum risk quadratic classification system, and wherein said geometric locus determines the principal eigenaxis of the decision boundary of said minimum risk quadratic classification system, wherein said principal eigenaxis exhibits symmetrical dimensions and density, wherein said conditional probabilities and said critical minimum eigenenergies exhibited by said minimum risk quadratic classification system are symmetrically concentrated within said principal eigenaxis, and wherein counteracting and opposing components of said critical minimum eigenenergies exhibited by corresponding components of said scaled extreme vectors on said geometric locus together with corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about the geometric center of said principal eigenaxis, wherein the center of total allowed eigenenergy and minimum expected risk of said minimum risk quadratic classification system is located at the geometric center of said geometric locus, and wherein said geometric locus 
determines a primal representation of a dual locus of likelihood components and principal eigenaxis components, wherein said likelihood components and said principal eigenaxis components are symmetrically distributed over either side of the axis of said dual locus, wherein a statistical fulcrum is placed directly under the center of said dual locus, and wherein said likelihood components of said dual locus determine conditional likelihoods for said extreme points, and wherein said principal eigenaxis components of said dual locus determine an intrinsic coordinate system of geometric loci of a quadratic decision boundary and corresponding decision borders that jointly partition the decision space of said minimum risk quadratic classification system into symmetrical decision regions; determining said extreme vectors on said geometric locus using the vector of said scale factors, said determination of said extreme vectors being performed by using said processors of said computer system to identify said scale factors that exceed zero by a small threshold, and using said processors of said computer system to determine a sign vector of signs associated with said extreme vectors using said data set, and compute the average sign using said sign vector; determining a locus of risk for said minimum risk quadratic classification system using said reproducing kernels of said extreme vectors and said signed reproducing kernels of said N feature vectors and said vector of scale factors, said determination of said locus of risk being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said extreme vectors, and multiply said matrix by said vector of scale factors, and compute the average risk for said minimum risk quadratic classification system using said locus of risk; determining a discriminant locus for said minimum risk quadratic 
classification system using said geometric locus, said determination of said discriminant locus being performed by using said processors of said computer system to calculate a matrix of inner products between said signed reproducing kernels of said N feature vectors and said reproducing kernels of said N feature vectors, and multiply said matrix by said vector of scale factors; determining the discriminant function of said minimum risk quadratic classification system, using said average risk and said average sign and said discriminant locus, said determination of said discriminant function of said minimum risk quadratic classification system being performed by using said processors of said computer system to subtract said average risk from the sum of said discriminant locus and said average sign, wherein said discriminant function of said minimum risk quadratic classification system satisfies said system of fundamental locus equations of binary classification, and wherein said discriminant function of said minimum risk quadratic classification system determines likely locations of said N feature vectors and also determines said geometric loci of said quadratic decision boundary and said corresponding decision borders that jointly partition said extreme points into said symmetrical decision regions, wherein said symmetrical decision regions span said overlapping regions or said tail regions of said distributions of said N feature vectors, and wherein said discriminant function of said minimum risk quadratic classification system satisfies said quadratic decision boundary in terms of a critical minimum eigenenergy and said minimum expected risk, wherein said counteracting and opposing components of said critical minimum eigenenergies exhibited by said corresponding components of said scaled extreme vectors on said geometric locus associated with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically 
distributed over said axis of said dual locus, on equal sides of said statistical fulcrum located at said geometric center of said dual locus, wherein said counteracting and opposing components of said critical minimum eigenenergies together with said corresponding counter risks and risks exhibited by said minimum risk quadratic classification system are symmetrically balanced with each other about said geometric center of said dual locus, and wherein said statistical fulcrum is located at said center of said total allowed eigenenergy and said minimum expected risk of said minimum risk quadratic classification system, wherein said minimum risk quadratic classification system satisfies a state of statistical equilibrium, wherein said total allowed eigenenergy and said expected risk of said minimum risk quadratic classification system are minimized, and wherein said minimum risk quadratic classification system exhibits the minimum probability of error for classifying said N feature vectors that belong to said two collections of said feature vectors; determining which of said two collections said N feature vectors belong to using said discriminant function of said minimum risk quadratic classification system, said determination of said collections of said N feature vectors being performed by using said processors of said computer system to apply said discriminant function of said minimum risk quadratic classification system to said N feature vectors, wherein said discriminant function determines likely locations of said N feature vectors and identifies said decision regions related to said two collections that said N feature vectors are located within, wherein said discriminant function recognizes said collections of said N feature vectors, and wherein said minimum risk quadratic classification system decides which of said two collections said N feature vectors belong to and thereby classifies said N feature vectors; determining an in-sample classification 
error rate for said two collections of feature vectors, said determination of said error rate being performed by using said processors of said computer system to calculate the average number of wrong decisions made by said minimum risk quadratic classification system for classifying said N feature vectors; determining a measure of overlap between said distributions of said N feature vectors for said two collections of feature vectors using said N feature vectors and said extreme vectors, said determination of said measure of overlap being performed by using said processors of said computer system to calculate the ratio of the number of said extreme vectors to the number of said N feature vectors, wherein said ratio determines said measure of overlap; and determining if said distributions of said two collections of said N feature vectors are homogenous distributions using said in-sample classification error rate and said measure of overlap, wherein said distributions of said N feature vectors are homogenous distributions if said measure of overlap has an approximate value of one and said in-sample classification error rate has an approximate value of one half.

17. The method of claim 16, wherein the reproducing kernel is a Gaussian reproducing kernel: k.sub.x=exp(−γ∥x−s∥.sup.2): 0.01≤γ≤0.1.

18. The method of claim 16, wherein the reproducing kernel is a second-order polynomial reproducing kernel: k.sub.x=(s.sup.Tx+1).sup.2.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 illustrates symmetrical decision regions of a minimum risk quadratic classification system that are delineated by a hyperbolic decision boundary and hyperbolic decision borders obtained by using the method for determining a discriminant function of a minimum risk quadratic classification system that classifies feature vectors into two classes in which distributions of two collections of feature vectors have different mean vectors and different covariance matrices and are overlapping with each other;

(2) FIG. 2 illustrates symmetrical decision regions of a minimum risk quadratic classification system that are delineated by a parabolic decision boundary and parabolic decision borders obtained by using the method for determining a discriminant function of a minimum risk quadratic classification system that classifies feature vectors into two classes in which distributions of two collections of feature vectors have different mean vectors and different covariance matrices and are overlapping with each other;

(3) FIG. 3 illustrates symmetrical decision regions of a minimum risk quadratic classification system that are delineated by a hyperbolic decision boundary and hyperbolic decision borders obtained by using the method for determining a discriminant function of a minimum risk quadratic classification system that classifies feature vectors into two classes in which distributions of two collections of feature vectors have similar mean vectors and similar covariance matrices and are completely overlapping with each other;

(4) FIG. 4 illustrates symmetrical decision regions of a minimum risk quadratic classification system that are delineated by a parabolic decision boundary and parabolic decision borders obtained by using the method for determining a discriminant function of a minimum risk quadratic classification system that classifies feature vectors into two classes in which distributions of two collections of feature vectors have different mean vectors and similar covariance matrices and are overlapping with each other;

(5) FIG. 5 illustrates symmetrical decision regions of a minimum risk quadratic classification system that are delineated by an elliptic decision boundary and elliptic decision borders obtained by using the method for determining a discriminant function of a minimum risk quadratic classification system that classifies feature vectors into two classes in which distributions of two collections of feature vectors have different mean vectors and similar covariance matrices and are not overlapping with each other;

(6) FIG. 6 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for determining a discriminant function of a minimum risk quadratic classification system that classifies feature vectors into two classes;

(7) FIG. 7 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for determining a discriminant function of an M-class minimum risk quadratic classification system that classifies feature vectors into M classes;

(8) FIG. 8 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for determining a fused discriminant function of a fused M-class minimum risk quadratic classification system that classifies two types of feature vectors into M classes;

(9) FIG. 9 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for using a discriminant function of a minimum risk quadratic classification system to determine a classification error rate and a measure of overlap between distributions of feature vectors for two classes of feature vectors;

(10) FIG. 10 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for using a discriminant function of a minimum risk quadratic classification system to determine if distributions of two collections of feature vectors are homogenous distributions;

(11) FIG. 11 illustrates hardware components that may be used to implement discriminant functions of minimum risk quadratic classification systems of the invention; and

(12) FIG. 12 illustrates regions of counter risk and regions of risk within decision regions of quadratic classification systems in which distributions of two collections of feature vectors are overlapping with each other.

DETAILED DESCRIPTION OF THE INVENTION

(13) Before describing illustrative embodiments of the invention, a detailed description of machine learning algorithms of the invention is presented along with a detailed description of the novel principal eigenaxis that determines a discriminant function of a minimum risk quadratic classification system.

(14) The method to determine a discriminant function of a minimum risk quadratic classification system that classifies feature vectors into two categories, designed in accordance with the invention, uses machine learning algorithms and labeled feature vectors to determine a geometric locus of signed and scaled reproducing kernels of extreme points for feature vectors x of dimension d belonging to either of two classes A or B, wherein the geometric locus satisfies a system of fundamental locus equations of binary classification, subject to geometric and statistical conditions for a minimum risk quadratic classification system in statistical equilibrium.

(15) The input to a machine learning algorithm of the invention is a collection of N feature vectors x.sub.i with labels y.sub.i
(x.sub.1,y.sub.1),(x.sub.2,y.sub.2), . . . ,(x.sub.N,y.sub.N)
wherein y.sub.i=+1 if x.sub.i∈A and y.sub.i=−1 if x.sub.i∈B, and wherein the N feature vectors are extracted from collections of digital signals.

(16) Denote a minimum risk quadratic classification system of the invention by

(17) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0,$
wherein A or B is the true category. The discriminant function D(s)=k.sub.sκ+κ.sub.0 of the minimum risk quadratic classification system is represented by a novel principal eigenaxis that is expressed as a dual locus of likelihood components and principal eigenaxis components and is determined by a geometric locus of signed and scaled reproducing kernels of extreme points:

(18) $\kappa = \kappa_{1} - \kappa_{2} = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}} - \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}},$
wherein k.sub.x.sub.1i* and k.sub.x.sub.2i* are reproducing kernels of respective extreme points x.sub.1i* and x.sub.2i* located within overlapping regions or near tail regions of distributions of the N feature vectors, and the preferred reproducing kernel k.sub.x is either k.sub.x=(s.sup.Tx+1).sup.2 or k.sub.x=exp(−γ∥x−s∥.sup.2): 0.01≤γ≤0.1, wherein preferred reproducing kernels k.sub.x of feature vectors x contain first degree x.sub.i and second degree x.sub.i.sup.2 point coordinates, which are necessary to delineate quadratic curves and surfaces, and wherein κ.sub.1−κ.sub.2 determines an intrinsic coordinate system of geometric loci of a quadratic decision boundary and corresponding decision borders that jointly partition the decision space of the minimum risk quadratic classification system into symmetrical decision regions, wherein

(19) $(k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}) (\kappa_{1} - \kappa_{2})$
and wherein the scale factors ψ.sub.1i* and ψ.sub.2i* determine magnitudes ∥ψ.sub.1i*k.sub.x.sub.1i*∥ and ∥ψ.sub.2i*k.sub.x.sub.2i*∥ as well as critical minimum eigenenergies ∥ψ.sub.1i*k.sub.x.sub.1i*∥.sub.min.sub.c.sup.2 and ∥ψ.sub.2i*k.sub.x.sub.2i*∥.sub.min.sub.c.sup.2 exhibited by respective principal eigenaxis components ψ.sub.1i*k.sub.x.sub.1i* and ψ.sub.2i*k.sub.x.sub.2i* on κ.sub.1−κ.sub.2, and determine conditional likelihoods for respective extreme points k.sub.x.sub.1i* and k.sub.x.sub.2i*. A machine learning algorithm of the invention uses the collection of N labeled feature vectors to find a satisfactory solution for the inequality constrained optimization problem:

(20) $\begin{matrix} \min \Psi (\kappa) = \| \kappa \|^{2} / 2 + C / 2 \sum_{i = 1}^{N} \xi_{i}^{2}, \quad \text{s.t.} \; y_{i} (k_{x_{i}} \kappa + \kappa_{0}) \geq 1 - \xi_{i}, \; i = 1, \ldots, N, & (1.1) \end{matrix}$
wherein κ is a d×1 geometric locus of signed and scaled reproducing kernels of extreme points that determines the principal eigenaxis of the decision boundary of a minimum risk quadratic classification system, wherein κ is expressed as a dual locus of likelihood components and principal eigenaxis components, and wherein k.sub.x.sub.i is a reproducing kernel for the feature vector x.sub.i, ∥κ∥.sup.2 is the total allowed eigenenergy exhibited by κ, κ.sub.0 is a functional of κ, C and ξ.sub.i are regularization parameters, and y.sub.i are class membership statistics: if x.sub.i∈A, assign y.sub.i=+1, and if x.sub.i∈B, assign y.sub.i=−1. The objective of the machine learning algorithm is to find the dual locus of likelihood components and principal eigenaxis components that minimizes the total allowed eigenenergy ∥Z|κ∥.sub.min.sub.c.sup.2 and the expected risk ℜ.sub.min(Z|∥κ∥.sub.min.sub.c.sup.2) exhibited by the minimum risk quadratic classification system

(21) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0,$
wherein the system of N inequalities:
y.sub.i(k.sub.x.sub.iκ+κ.sub.0)≥1−ξ.sub.i, i=1, . . . ,N,
is satisfied in a suitable manner, and wherein the dual locus of κ satisfies a critical minimum eigenenergy constraint:
γ(κ)=∥κ∥.sub.min.sub.c.sup.2,
wherein the total allowed eigenenergy ∥Z|κ∥.sub.min.sub.c.sup.2 exhibited by the dual locus of κ determines the minimum expected risk ℜ.sub.min(Z|∥κ∥.sub.min.sub.c.sup.2)=∥Z|κ∥.sub.min.sub.c.sup.2 and the conditional probability P(Z|κ)=∥Z|κ∥.sub.min.sub.c.sup.2 exhibited by the minimum risk quadratic classification system that classifies the collection of N feature vectors into the two classes A and B. A satisfactory solution for the primal optimization problem in Eq. (1.1) is found by using Lagrange multipliers ψ.sub.i≥0 and the Lagrangian function:

(22) $\begin{matrix} L_{\Psi (\kappa)} (\kappa, \kappa_{0}, \xi, \psi) = \| \kappa \|^{2} / 2 + C / 2 \sum_{i = 1}^{N} \xi_{i}^{2} - \sum_{i = 1}^{N} \psi_{i} \{ y_{i} (k_{x_{i}} \kappa + \kappa_{0}) - 1 + \xi_{i} \}, & (1.2) \end{matrix}$
wherein the objective function and its constraints are combined with each other, and wherein the Lagrangian function is minimized with respect to the primal variables κ and κ.sub.0 and is maximized with respect to the dual variables ψ.sub.i. The Lagrange multipliers method introduces a Wolfe dual geometric locus ψ that is symmetrically and equivalently related to the primal geometric locus κ and finds extrema for the restriction of the primal geometric locus κ to a Wolfe dual principal eigenspace.

(23) The fundamental unknowns associated with the primal optimization problem in Eq. (1.1) are the scale factors ψ.sub.i of the principal eigenaxis components

(24) $\{ \psi_{i} \frac{k_{x_{i}}}{\| k_{x_{i}} \|} \}_{i = 1}^{N}$
on the geometric locus of a Wolfe dual principal eigenaxis ψ. Each active scale factor ψ.sub.i determines a conditional density and a corresponding conditional likelihood for a reproducing kernel of an extreme point on a dual locus of likelihood components, and each active scale factor ψ.sub.i determines the magnitude and the critical minimum eigenenergy exhibited by a scaled extreme vector on a dual locus of principal eigenaxis components.

(25) The Karush-Kuhn-Tucker (KKT) conditions on the Lagrangian function L.sub.Ψ(κ) in Eq. (1.2)

(26) $\begin{matrix} \kappa - \sum_{i = 1}^{N} \psi_{i} y_{i} k_{x_{i}} = 0, \; i = 1, \ldots, N, & (1.3) \\ \sum_{i = 1}^{N} \psi_{i} y_{i} = 0, \; i = 1, \ldots, N, & (1.4) \\ C \sum_{i = 1}^{N} \xi_{i} - \sum_{i = 1}^{N} \psi_{i} = 0, \; i = 1, \ldots, N, & (1.5) \\ \psi_{i} \geq 0, \; i = 1, \ldots, N, & (1.6) \\ \psi_{i} [y_{i} (k_{x_{i}} \kappa + \kappa_{0}) - 1 + \xi_{i}] \geq 0, \; i = 1, \ldots, N, & (1.7) \end{matrix}$
determine a system of fundamental locus equations of binary classification, subject to geometric and statistical conditions for a minimum risk quadratic classification system in statistical equilibrium, that are jointly satisfied by the geometric locus of the principal eigenaxis ψ and the geometric locus of the principal eigenaxis κ.

(27) Because the primal optimization problem in Eq. (1.1) is a convex optimization problem, the inequalities in Eqs (1.6) and (1.7) must only hold for certain values of the primal and the dual variables. The KKT conditions in Eqs (1.3)-(1.7) restrict the magnitudes and the eigenenergies of the principal eigenaxis components on both ψ and κ, wherein the expected risk ℜ.sub.min(Z|∥κ∥.sub.min.sub.c.sup.2) and the total allowed eigenenergy ∥Z|κ∥.sub.min.sub.c.sup.2 exhibited by a minimum risk quadratic classification system are jointly minimized.

(28) Substituting the expressions for κ and ψ in Eqs (1.3) and (1.4) into the Lagrangian functional L.sub.Ψ(κ) of Eq. (1.2) and simplifying the resulting expression determines the Lagrangian dual problem:

(29) $\begin{matrix} \max \Xi (\psi) = \sum_{i = 1}^{N} \psi_{i} - \sum_{i, j = 1}^{N} \psi_{i} \psi_{j} y_{i} y_{j} \frac{k_{x_{i}} k_{x_{j}} + \delta_{ij} / C}{2}, & (1.8) \end{matrix}$
wherein ψ is subject to the constraints

(30) $\sum_{i = 1}^{N} \psi_{i} y_{i} = 0,$
and ψ.sub.i≥0, and wherein δ.sub.ij is the Kronecker δ, defined as unity for i=j and 0 otherwise.
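
The substitution that leads to Eq. (1.8) can be written out as a brief check. This is offered as a sketch rather than as the derivation of record. It assumes that κ denotes the primal principal eigenaxis, ψ.sub.i the scale factors (Lagrange multipliers), and ξ.sub.i the regularization parameters, that the stationarity condition behind Eq. (1.5) holds for each i separately (Cξ.sub.i=ψ.sub.i), and that y.sub.i.sup.2=1. Expanding Eq. (1.2):

$L = \frac{1}{2} \| \kappa \|^{2} + \frac{C}{2} \sum_{i = 1}^{N} \xi_{i}^{2} - \sum_{i = 1}^{N} \psi_{i} y_{i} k_{x_{i}} \kappa - \kappa_{0} \sum_{i = 1}^{N} \psi_{i} y_{i} + \sum_{i = 1}^{N} \psi_{i} - \sum_{i = 1}^{N} \psi_{i} \xi_{i} .$

With $\kappa = \sum_{j} \psi_{j} y_{j} k_{x_{j}}$ from Eq. (1.3), the third term equals $\| \kappa \|^{2}$; the fourth term vanishes by Eq. (1.4); and with $\xi_{i} = \psi_{i} / C$ the two terms in ξ.sub.i combine to $- \frac{1}{2 C} \sum_{i} \psi_{i}^{2}$. Hence

$L = \sum_{i = 1}^{N} \psi_{i} - \frac{1}{2} \sum_{i, j = 1}^{N} \psi_{i} \psi_{j} y_{i} y_{j} k_{x_{i}} k_{x_{j}} - \frac{1}{2 C} \sum_{i = 1}^{N} \psi_{i}^{2} = \sum_{i = 1}^{N} \psi_{i} - \sum_{i, j = 1}^{N} \psi_{i} \psi_{j} y_{i} y_{j} \frac{k_{x_{i}} k_{x_{j}} + \delta_{ij} / C}{2},$

which has the form of the objective in Eq. (1.8).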

(31) Equation (1.8) is a quadratic programming problem that can be written in vector notation by letting Q≜εI+{tilde over (X)}{tilde over (X)}.sup.T, wherein {tilde over (X)}≜D.sub.yX, wherein D.sub.y is an N×N diagonal matrix of training labels (class membership statistics) y.sub.i, and wherein the N×d matrix {tilde over (X)} is a matrix of labeled reproducing kernels of N feature vectors:
{tilde over (X)}=(y.sub.1k.sub.x.sub.1,y.sub.2k.sub.x.sub.2, . . . ,y.sub.Nk.sub.x.sub.N).sup.T.
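
The construction of the regularized kernel matrix can be illustrated with a minimal sketch in plain Python. It assumes that each entry of the product of the labeled kernel matrix with its transpose is evaluated as y.sub.iy.sub.j times the kernel of x.sub.i and x.sub.j, and that the diagonal regularization constant takes the small value 0.02. The function names, the default value gamma=0.05, and the toy data are hypothetical and are not taken from the specification.

```python
import math

def poly2_kernel(x, s):
    """Second-order polynomial reproducing kernel (s^T x + 1)^2."""
    return (sum(a * b for a, b in zip(x, s)) + 1.0) ** 2

def gaussian_kernel(x, s, gamma=0.05):
    """Gaussian reproducing kernel exp(-gamma * ||x - s||^2).

    gamma=0.05 is an assumed value inside the range 0.01..0.1.
    """
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, s)))

def regularized_kernel_matrix(X, y, kernel=poly2_kernel, eps=0.02):
    """Return Q with Q[i][j] = y_i * y_j * k(x_i, x_j), plus eps on the diagonal."""
    n = len(X)
    Q = [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        Q[i][i] += eps  # assumed regularization constant on the diagonal
    return Q

# Toy data (hypothetical): two vectors labeled +1 and one labeled -1.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
y = [1, 1, -1]
Q = regularized_kernel_matrix(X, y)
```

The signs y.sub.iy.sub.j make entries between vectors of opposite classes negative, and the added diagonal term keeps the matrix strictly positive definite whenever the kernel matrix itself is positive semidefinite.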

(32) The matrix version of the Lagrangian dual problem, which is also known as the Wolfe dual problem:

(33) $\begin{matrix} \max \Xi (\psi) = 1^{T} \psi - \frac{\psi^{T} Q \psi}{2} & (1.9) \end{matrix}$
is subject to the constraints ψ.sup.Ty=0 and ψ.sub.i≥0, wherein the inequalities ψ.sub.i≥0 only hold for certain values of ψ.sub.i.
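
One way to obtain a feasible approximate solution of the dual problem in Eq. (1.9) is sketched below. The solver is a simplified pairwise coordinate ascent chosen for illustration; it is an assumption of this example and is not stated in the specification, and a general-purpose quadratic programming routine could be used instead. Each update moves two scale factors along a direction that leaves the equality constraint unchanged and clips the step so that both scale factors remain nonnegative. All function names and the toy data are hypothetical.

```python
import math

def poly2_kernel(x, s):
    """Second-order polynomial kernel (s^T x + 1)^2."""
    return (sum(a * b for a, b in zip(x, s)) + 1.0) ** 2

def regularized_q(X, y, eps=0.02):
    """Q[i][j] = y_i y_j k(x_i, x_j) + eps * delta_ij (eps is an assumed value)."""
    n = len(X)
    return [[y[i] * y[j] * poly2_kernel(X[i], X[j]) + (eps if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def dual_objective(psi, Q):
    """Value of 1^T psi - psi^T Q psi / 2."""
    n = len(psi)
    quad = sum(psi[i] * Q[i][j] * psi[j] for i in range(n) for j in range(n))
    return sum(psi) - 0.5 * quad

def solve_dual_pairwise(Q, y, sweeps=500, tol=1e-12):
    """Pairwise coordinate ascent keeping psi^T y = 0 and psi >= 0."""
    n = len(y)
    psi = [0.0] * n
    for _ in range(sweeps):
        largest = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                # Gradient components of the objective: g = 1 - Q psi.
                g_i = 1.0 - sum(Q[i][k] * psi[k] for k in range(n))
                g_j = 1.0 - sum(Q[j][k] * psi[k] for k in range(n))
                # Direction y_i e_i - y_j e_j leaves psi^T y unchanged.
                slope = y[i] * g_i - y[j] * g_j
                curv = Q[i][i] + Q[j][j] - 2.0 * y[i] * y[j] * Q[i][j]
                if curv <= 0.0:
                    continue
                # Bounds on the step so both scale factors stay >= 0.
                lo, hi = -math.inf, math.inf
                if y[i] > 0:
                    lo = max(lo, -psi[i])
                else:
                    hi = min(hi, psi[i])
                if y[j] > 0:
                    hi = min(hi, psi[j])
                else:
                    lo = max(lo, -psi[j])
                t = min(max(slope / curv, lo), hi)
                psi[i] += y[i] * t
                psi[j] -= y[j] * t
                largest = max(largest, abs(t))
        if largest < tol:
            break
    return psi

# Toy data (hypothetical): three vectors per class.
X = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]
y = [1, 1, 1, -1, -1, -1]
Q = regularized_q(X, y)
psi = solve_dual_pairwise(Q, y)
```

Because every step is an exact line search along a feasible direction, the objective never decreases from its starting value of zero, and the returned scale factors satisfy both constraints up to floating-point rounding.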

(34) Because Eq. (1.9) is a convex programming problem, the theorem for convex duality guarantees an equivalence and a corresponding symmetry between the dual loci of ψ and κ. Accordingly, the geometric locus of the principal eigenaxis ψ determines a dual locus of likelihood components and principal eigenaxis components, wherein the expected risk ℜ.sub.min(Z|∥ψ∥.sub.min.sub.c.sup.2) exhibited by the dual locus of ψ is symmetrically and equivalently related to the expected risk ℜ.sub.min(Z|∥κ∥.sub.min.sub.c.sup.2) exhibited by the dual locus of κ: ℜ.sub.min(Z|∥ψ∥.sub.min.sub.c.sup.2)≡ℜ.sub.min(Z|∥κ∥.sub.min.sub.c.sup.2), and wherein the total allowed eigenenergy ∥Z|ψ∥.sub.min.sub.c.sup.2 exhibited by the dual locus of ψ is symmetrically and equivalently related to the total allowed eigenenergy ∥Z|κ∥.sub.min.sub.c.sup.2 exhibited by the dual locus of κ: ∥Z|ψ∥.sub.min.sub.c.sup.2≡∥Z|κ∥.sub.min.sub.c.sup.2.

(35) The locations and the scale factors of the principal eigenaxis components on both ψ and κ are considerably affected by the rank and the eigenspectrum of the kernel matrix Q, wherein a low rank kernel matrix Q determines an unbalanced principal eigenaxis and an irregular quadratic partition of a decision space. The kernel matrix Q has low rank, wherein d<N for a collection of N feature vectors of dimension d. These problems are solved by the following regularization method.

(36) The regularized form of Q, wherein ε<<1 and Q≜εI+{tilde over (X)}{tilde over (X)}.sup.T, ensures that Q has full rank and a complete eigenvector set, wherein Q has a complete eigenspectrum. The regularization constant C is related to the regularization parameter ε by ε=1/C. For N feature vectors of dimension d, wherein d<N, all of the regularization parameters {ξ.sub.i}.sub.i=1.sup.N in Eq. (1.1) and all of its derivatives are set equal to a very small value: ξ.sub.i=ξ<<1, e.g. ξ.sub.i=ξ=0.02. The regularization constant C is set equal to

(37) $\frac{1}{\xi} : C = \frac{1}{\xi} .$

(38) For N feature vectors of dimension d, wherein N<d, all of the regularization parameters {ξ.sub.i}.sub.i=1.sup.N in Eq. (1.1) and all of its derivatives are set equal to zero: ξ.sub.i=ξ=0.

(39) The regularization constant C is set equal to infinity: C=∞.
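
The two regularization regimes described above can be summarized in a short helper, given here as a sketch. The function name is hypothetical. The case N=d is not addressed in the text, and this example treats it like the case N<d purely as an assumption.

```python
import math

def regularization_settings(n_vectors, dim, small_value=0.02):
    """Return (xi, C) for N feature vectors of dimension d.

    d < N: xi is a very small value (0.02 in the text) and C = 1 / xi.
    N < d: xi = 0 and C is infinite.
    N == d is not covered by the text; treated like N < d here (assumption).
    """
    if dim < n_vectors:
        xi = small_value
        return xi, 1.0 / xi
    return 0.0, math.inf

xi_low_dim, c_low_dim = regularization_settings(n_vectors=100, dim=2)
xi_high_dim, c_high_dim = regularization_settings(n_vectors=5, dim=10)
```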

(40) The KKT conditions in Eqs (1.3) and (1.6) require that the geometric locus of the principal eigenaxis κ satisfy the vector expression:

(41) $\begin{matrix} \kappa = \sum_{i = 1}^{N} y_{i} \psi_{i} k_{x_{i}}, & (1.10) \end{matrix}$
wherein ψ.sub.i≥0 and reproducing kernels k.sub.x.sub.i of feature vectors x.sub.i correlated with Wolfe dual principal eigenaxis components

(42) $\psi_{i} \frac{k_{x_{i}}}{\| k_{x_{i}} \|}$
that have non-zero magnitudes ψ.sub.i>0 are termed extreme vectors.

(43) Denote the scaled extreme vectors that belong to class A and class B by ψ.sub.1i*k.sub.x.sub.1i* and ψ.sub.2i*k.sub.x.sub.2i*, respectively, wherein ψ.sub.1i* is the scale factor for the extreme vector k.sub.x.sub.1i* and ψ.sub.2i* is the scale factor for the extreme vector k.sub.x.sub.2i*. Let there be l.sub.1 scaled extreme vectors {ψ.sub.1i*k.sub.x.sub.1i*}.sub.i=1.sup.l.sup.1 that belong to class A, and let there be l.sub.2 scaled extreme vectors {ψ.sub.2i*k.sub.x.sub.2i*}.sub.i=1.sup.l.sup.2 that belong to class B. Let there be l=l.sub.1+l.sub.2 extreme vectors from class A and class B.
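
The identification of extreme vectors from a vector of scale factors can be sketched as follows. The sketch assumes that a scale factor counts as active when it exceeds zero by a small threshold; the threshold value 1e-6, the function name, and the example scale factors are hypothetical.

```python
def extreme_vector_indices(psi, y, threshold=1e-6):
    """Split the indices of active scale factors by class label.

    Returns (indices for class A with y = +1, indices for class B with y = -1).
    """
    idx_a = [i for i, (p, label) in enumerate(zip(psi, y))
             if p > threshold and label > 0]
    idx_b = [i for i, (p, label) in enumerate(zip(psi, y))
             if p > threshold and label < 0]
    return idx_a, idx_b

# Hypothetical scale factors for four labeled feature vectors.
psi = [0.0, 0.3, 1e-9, 0.3]
y = [1, 1, -1, -1]
idx_a, idx_b = extreme_vector_indices(psi, y)
l_1, l_2 = len(idx_a), len(idx_b)
l = l_1 + l_2
# Average of the signs of the extreme vectors.
average_sign = sum(y[i] for i in idx_a + idx_b) / l
```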

(44) Using Eq. (1.10), the class membership statistics and the assumptions outlined above, it follows that the geometric locus of the principal eigenaxis κ is determined by the vector difference between a pair of sides, i.e., a pair of directed line segments:

(45) $\begin{matrix} \kappa = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}} - \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}} = \kappa_{1} - \kappa_{2}, & (1.11) \end{matrix}$
wherein κ.sub.1 and κ.sub.2 denote the sides of κ, wherein the side κ.sub.1 is determined by the vector expression

(46) $\kappa_{1} = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}}$
and the side κ.sub.2 is determined by the vector expression

(47) $\kappa_{2} = \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}},$
and wherein the geometric locus of the principal eigenaxis κ is determined by the vector difference of κ.sub.1 and κ.sub.2.

(48) All of the principal eigenaxis components ψ.sub.1i*k.sub.x.sub.1i* and ψ.sub.2i*k.sub.x.sub.2i* on the dual locus of

(49) $\kappa = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}} - \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}}$
determine an intrinsic coordinate system of geometric loci of a quadratic decision boundary and corresponding decision borders. FIG. 1-FIG. 5 illustrate various geometric loci of quadratic decision boundaries and corresponding decision borders.

(50) FIG. 1 illustrates a hyperbolic decision boundary and hyperbolic decision borders, wherein distributions of two collections of feature vectors have different mean vectors and different covariance matrices, wherein the distributions are overlapping with each other.

(51) FIG. 2 illustrates a parabolic decision boundary and parabolic decision borders, wherein distributions of two collections of feature vectors have different mean vectors and different covariance matrices, wherein the distributions are overlapping with each other.

(52) FIG. 3 illustrates a hyperbolic decision boundary and hyperbolic decision borders, wherein distributions of two collections of feature vectors have similar mean vectors and similar covariance matrices, wherein the distributions are completely overlapping with each other.

(53) FIG. 4 illustrates a parabolic decision boundary and parabolic decision borders, wherein distributions of two collections of feature vectors have different mean vectors and similar covariance matrices, wherein the distributions are overlapping with each other.

(54) FIG. 5 illustrates an elliptic decision boundary and elliptic decision borders, wherein distributions of two collections of feature vectors have different mean vectors and similar covariance matrices, wherein the distributions are not overlapping with each other.

(55) The manner in which a discriminant function of the invention partitions the feature space Z=Z.sub.1+Z.sub.2 of a minimum risk quadratic classification system for a collection of N feature vectors is determined by the KKT condition in Eq. (1.7) and the KKT condition of complementary slackness.

(56) The KKT condition in Eq. (1.7) and the KKT condition of complementary slackness determine a discriminant function
D(s)=k.sub.sκ+κ.sub.0 (1.12)
that satisfies the set of constraints: D(s)=0, D(s)=+1, and D(s)=−1,
wherein D(s)=0 denotes a quadratic decision boundary that partitions the Z.sub.1 and Z.sub.2 decision regions of a minimum risk quadratic classification system

(57) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0,$
and wherein D(s)=+1 denotes the quadratic decision border for the Z.sub.1 decision region, and wherein D(s)=−1 denotes the quadratic decision border for the Z.sub.2 decision region.

(58) The KKT condition in Eq. (1.7) and the KKT condition of complementary slackness also determine the following system of locus equations that are satisfied by κ.sub.0 and κ:
y.sub.i(k.sub.x.sub.i*κ+κ.sub.0)−1+ξ.sub.i=0, i=1, . . . ,l,
wherein κ.sub.0 satisfies the functional of κ in the following manner:

(59) $\begin{matrix} \kappa_{0} = \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) - (\frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}) \kappa . & (1.13) \end{matrix}$

(60) Using Eqs (1.12) and (1.13), the discriminant function is rewritten as:

(61) $\begin{matrix} D (s) = k_{s} \kappa - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \kappa + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) . & (1.14) \end{matrix}$
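
The evaluation of the discriminant function in Eq. (1.14) can be sketched in plain Python. The sketch assumes that the inner product of a kernel with the principal eigenaxis is computed as the sum over extreme vectors of y.sub.i times the scale factor times the kernel value, and that all regularization parameters share one value (0.02). The function names, the threshold, and the hand-picked scale factors are hypothetical; the scale factors are not the result of solving the dual problem.

```python
def poly2_kernel(x, s):
    """Second-order polynomial kernel (s^T x + 1)^2."""
    return (sum(a * b for a, b in zip(x, s)) + 1.0) ** 2

def make_discriminant(X, y, psi, xi=0.02, kernel=poly2_kernel, threshold=1e-6):
    """Build D(s) = locus(s) - average_risk + average_sign, after Eq. (1.14)."""
    ext = [i for i, p in enumerate(psi) if p > threshold]
    if not ext:
        raise ValueError("no active scale factors, so no extreme vectors")
    l = len(ext)

    def locus(s):
        # Inner product of the kernel of s with the principal eigenaxis.
        return sum(y[i] * psi[i] * kernel(X[i], s) for i in ext)

    # Second term of Eq. (1.14): average of the locus over the extreme points.
    average_risk = sum(locus(X[j]) for j in ext) / l
    # Third term of Eq. (1.14): average of y_i * (1 - xi_i).
    average_sign = sum(y[j] * (1.0 - xi) for j in ext) / l

    def discriminant(s):
        return locus(s) - average_risk + average_sign

    return discriminant

# Toy example (hypothetical values): one extreme vector per class.
X = [[0.0, 0.0], [3.0, 3.0]]
y = [1, -1]
psi = [0.1, 0.1]  # hand-picked so that the sum of psi_i * y_i is zero
D = make_discriminant(X, y, psi)
label_first = 1 if D(X[0]) > 0 else -1
label_second = 1 if D(X[1]) > 0 else -1
```

In this toy case the two values D(x.sub.1) and D(x.sub.2) have equal magnitude and opposite sign, so the sign of D assigns each vector to its own class.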

(62) Using Eq. (1.14) and letting D(s)=0, the discriminant function is rewritten as

(63) $\begin{matrix} k_{s} \kappa - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \kappa + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) = 0, & (1.15) \end{matrix}$
wherein the constrained discriminant function D(s)=0 determines a quadratic decision boundary, and all of the points s on the quadratic decision boundary D(s)=0 exclusively reference the principal eigenaxis of κ.

(64) Using Eq. (1.14) and letting D(s)=+1, the discriminant function is rewritten as

(65) $\begin{matrix} k_{s} \kappa - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \kappa + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) = + 1, & (1.16) \end{matrix}$
wherein the constrained discriminant function D(s)=+1 determines a quadratic decision border, and all of the points s on the quadratic decision border D(s)=+1 exclusively reference the principal eigenaxis of κ.

(66) Using Eq. (1.14) and letting D(s)=−1, the discriminant function is rewritten as

(67) $\begin{matrix} k_{s} \kappa - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \kappa + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) = - 1, & (1.17) \end{matrix}$
wherein the constrained discriminant function D(s)=−1 determines a quadratic decision border, and all of the points s on the quadratic decision border D(s)=−1 exclusively reference the principal eigenaxis of κ.

(68) Given Eqs (1.15)-(1.17), it follows that a constrained discriminant function of the invention

(69) $\begin{matrix} D (s) = k_{s} \kappa - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \kappa + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) : \\ \begin{matrix} D (s) = 0, & D (s) = + 1, & \text{and } D (s) = - 1, \end{matrix} \end{matrix}$
determines geometric loci of a quadratic decision boundary D(s)=0 and corresponding decision borders D(s)=+1 and D(s)=−1 that jointly partition the decision space Z of a minimum risk quadratic classification system

(70) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0,$
into symmetrical decision regions Z.sub.1 and Z.sub.2: Z=Z.sub.1+Z.sub.2: Z.sub.1≅Z.sub.2, wherein balanced portions of the extreme points x.sub.1i* and x.sub.2i* from class A and class B account for right and wrong decisions of the minimum risk quadratic classification system.

(71) Therefore, the geometric locus of the principal eigenaxis κ determines an eigenaxis of symmetry

(72) $\left(k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}\right) (\kappa_{1} - \kappa_{2})$
for the decision space of a minimum risk quadratic classification system, wherein a constrained discriminant function delineates symmetrical decision regions Z.sub.1 and Z.sub.2: Z.sub.1≅Z.sub.2 for the minimum risk quadratic classification system

(73) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0,$
wherein the decision regions Z.sub.1 and Z.sub.2 are symmetrically partitioned by the quadratic decision boundary of Eq. (1.15), and wherein the span of the decision regions is regulated by the constraints on the corresponding decision borders of Eqs (1.16)-(1.17).

(74) FIG. 1-FIG. 5 illustrate various types of symmetrical decision regions for minimum risk quadratic classification systems.

(75) Substitution of the vector expressions for κ and κ.sub.0 in Eqs (1.11) and (1.13) into the expression for the discriminant function in Eq. (1.12) determines an expression for a discriminant function of a minimum risk quadratic classification system that classifies feature vectors s into two classes A and B:

(76) $\begin{matrix} D (s) = \left(k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}\right) \kappa_{1} - \left(k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}\right) \kappa_{2} + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}), & (1.18) \end{matrix}$
wherein feature vectors s belong to or are related to a collection of N feature vectors {k.sub.x.sub.i}.sub.i=1.sup.N, and wherein the average extreme vector

(77) $\frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}$
determines the average locus of the l extreme vectors {k.sub.x.sub.i*}.sub.i=1.sup.l that belong to the collection of N feature vectors {k.sub.x.sub.i}.sub.i=1.sup.N, and wherein the average sign

(78) $\frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i})$
accounts for class memberships of the principal eigenaxis components on κ.sub.1 and κ.sub.2. The average locus

(79) $\frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}$
determines the average risk ℜ for the decision space Z=Z.sub.1+Z.sub.2 of the minimum risk quadratic classification system

(80) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0,$
wherein the vector difference

(81) $k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}$
determines the distance between a feature vector s and the locus of average risk ℜ. Let s denote an unknown feature vector related to a collection of N feature vectors {x.sub.i}.sub.i=1.sup.N that are inputs to one of the machine learning algorithms of the invention, wherein each feature vector x.sub.i has a label y.sub.i, wherein y.sub.i=+1 if x.sub.i∈A and y.sub.i=−1 if x.sub.i∈B, and wherein a discriminant function of a minimum risk quadratic classification system has been determined. Now take any given unknown feature vector s.

(82) The discriminant function

(83) $D (s) = \left(k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}\right) \kappa_{1} - \left(k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}\right) \kappa_{2} + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i})$
of Eq. (1.18) determines the likely location of the unknown feature vector s, wherein the likely location of s is determined by the vector projection of

(84) $k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}$
onto the dual locus of likelihood components and principal eigenaxis components κ.sub.1−κ.sub.2:

(85) $\left\| \kappa_{1} - \kappa_{2} \right\| \left[ \left\| k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right\| \cos \theta \right],$
wherein the component of

(86) $k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}$
along the dual locus of κ.sub.1−κ.sub.2:

(87) $\operatorname{comp}_{\overrightarrow{\kappa_{1} - \kappa_{2}}} \left( \overrightarrow{k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}} \right) = \left\| k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right\| \cos \theta$
determines the signed magnitude

(88) $\left\| k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right\| \cos \theta$
along the axis of κ.sub.1−κ.sub.2, where θ is the angle between the transformed unknown feature vector

(89) $k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}}$
and κ.sub.1−κ.sub.2, and wherein the decision region that the unknown feature vector s is located within is determined by the sign of the expression:

(90) $\operatorname{sign} \left( \left\| \kappa_{1} - \kappa_{2} \right\| \left[ \left\| k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right\| \cos \theta \right] + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) \right) .$
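The projection term in the expression above is the ordinary inner product written in magnitude and angle form. The following is a minimal sketch of that identity, assuming the transformed vector and the dual locus are available as explicit finite-dimensional arrays; the names `projection_terms`, `signed_projection`, `v`, and `axis` are hypothetical and the numbers are made up for illustration.

```python
import numpy as np

def projection_terms(v, axis):
    """Return (||axis||, ||v||, cos(theta)) for two nonzero vectors."""
    norm_axis = np.linalg.norm(axis)
    norm_v = np.linalg.norm(v)
    cos_theta = float(v @ axis) / (norm_axis * norm_v)
    return norm_axis, norm_v, cos_theta

def signed_projection(v, axis):
    """||axis|| * (||v|| * cos(theta)), which equals the inner product v . axis."""
    norm_axis, norm_v, cos_theta = projection_terms(v, axis)
    return norm_axis * (norm_v * cos_theta)

# Hypothetical stand-ins: v plays the role of k_s minus the average extreme
# vector, and axis plays the role of kappa_1 - kappa_2.
v = np.array([3.0, 4.0])
axis = np.array([1.0, 0.0])
projection = signed_projection(v, axis)
```

Adding the average sign term to `projection` and taking the sign of the result gives the decision described in the text.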

(91) Therefore, the likely location of the unknown feature vector s is determined by the scalar value of

(92) $\left\| \kappa_{1} - \kappa_{2} \right\| \left\| k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right\| \cos \theta,$
along the axis of the dual locus κ.sub.1−κ.sub.2, wherein the scalar value of the expression

(93) $\left\| \kappa_{1} - \kappa_{2} \right\| \left\| k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right\| \cos \theta + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i})$
indicates the decision region Z.sub.1 or Z.sub.2 that the unknown feature vector s is located within, along with the corresponding class of s.

(94) Thus, if:

(95) $\left\| \kappa_{1} - \kappa_{2} \right\| \left[ \left\| k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right\| \cos \theta \right] + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) \geq 0,$
then the unknown feature vector s is located within region Z.sub.1 and s∈A, whereas if

(96) $\left\| \kappa_{1} - \kappa_{2} \right\| \left[ \left\| k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right\| \cos \theta \right] + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) < 0,$
then the unknown feature vector s is located within region Z.sub.2 and s∈B.

(97) The minimum risk quadratic classification system of the invention decides which of the two classes A or B that the unknown feature vector s belongs to according to the sign of +1 or −1 that is output by the signum function:

(98) $\begin{matrix} \operatorname{sign} (D (s)) \triangleq \operatorname{sign} \left( \left\| \kappa_{1} - \kappa_{2} \right\| \left[ \left\| k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right\| \cos \theta \right] + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) \right) & (1.19) \end{matrix}$
and thereby classifies the unknown feature vector s.
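The decision rule of Eq. (1.19) can be sketched in code by expanding every inner product with the principal eigenaxis into kernel evaluations. The sketch below assumes a second-degree polynomial kernel (x·z+1).sup.2 as the reproducing kernel, assumes the principal eigenaxis is the signed, scaled sum of the extreme-point kernels, and takes the scale factors and slack variables as already determined. The function names and the toy values are hypothetical, not taken from the specification.

```python
import numpy as np

def poly2_kernel(x, z):
    """Second-degree polynomial kernel (x . z + 1)^2 (an assumed kernel choice)."""
    return (np.dot(x, z) + 1.0) ** 2

def discriminant(s, extreme_x, extreme_y, scale, slack):
    """D(s) = (k_s - mean_j k_{x_j*}) kappa + mean_i y_i (1 - slack_i),
    where kappa = sum_i y_i * scale_i * k_{x_i*} is never formed explicitly."""
    extreme_x = np.asarray(extreme_x, dtype=float)
    extreme_y = np.asarray(extreme_y, dtype=float)
    scale = np.asarray(scale, dtype=float)
    slack = np.asarray(slack, dtype=float)
    weights = extreme_y * scale

    def inner_with_kappa(point):
        # <k_point, kappa> = sum_i y_i * scale_i * k(x_i*, point)
        return sum(w * poly2_kernel(x, point) for w, x in zip(weights, extreme_x))

    ks_kappa = inner_with_kappa(s)
    mean_k_kappa = np.mean([inner_with_kappa(x) for x in extreme_x])
    mean_sign = np.mean(extreme_y * (1.0 - slack))
    return ks_kappa - mean_k_kappa + mean_sign

def classify(s, extreme_x, extreme_y, scale, slack):
    """Return 'A' when D(s) >= 0 and 'B' otherwise."""
    return "A" if discriminant(s, extreme_x, extreme_y, scale, slack) >= 0 else "B"

# Hypothetical toy values: one extreme point per class, equal scale factors.
ex_x = [[1.0, 0.0], [-1.0, 0.0]]
ex_y = [1, -1]
ex_scale = [0.5, 0.5]
ex_slack = [0.0, 0.0]
label_right = classify(np.array([2.0, 0.0]), ex_x, ex_y, ex_scale, ex_slack)
label_left = classify(np.array([-2.0, 0.0]), ex_x, ex_y, ex_scale, ex_slack)
```

Working through kernel evaluations keeps the example independent of the dimension of the kernel space, at the cost of one kernel evaluation per extreme point for every classified vector.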

(99) Thus, the discriminant function of the invention in Eq. (1.18) determines likely locations of each one of the feature vectors x.sub.i that belong to a collection of N feature vectors {x.sub.i}.sub.i=1.sup.N and any given unknown feature vectors s related to the collection, wherein the feature vectors are inputs to one of the machine learning algorithms of the invention and a discriminant function of a minimum risk quadratic classification system has been determined.

(100) Further, the discriminant function identifies the decision regions Z.sub.1 and Z.sub.2 related to the two classes A and B that each one of the N feature vectors x.sub.i and the unknown feature vectors s are located within, wherein the discriminant function recognizes the classes of each one of the N feature vectors x.sub.i and each one of the unknown feature vectors s, and the minimum risk quadratic classification system of the invention in Eq. (1.19) decides which of the two classes each one of the N feature vectors x.sub.i and each one of the unknown feature vectors s belongs to and thereby classifies the collection of N feature vectors {x.sub.i}.sub.i=1.sup.N and any given unknown feature vectors s.

(101) Therefore, discriminant functions of the invention exhibit a novel and useful property, wherein, for any given collection of feature vectors that belong to two classes and are inputs to a machine learning algorithm of the invention, the discriminant function that is determined by the machine learning algorithm determines likely locations of each one of the feature vectors that belong to the given collection of feature vectors and any given unknown feature vectors related to the collection, and identifies the decision regions related to the two classes that each one of the feature vectors and each one of the unknown feature vectors are located within, wherein the discriminant function recognizes the classes of the feature vectors and the unknown feature vectors according to the signs related to the two classes.

(102) The likelihood components and the corresponding principal eigenaxis components ψ.sub.1i*k.sub.x.sub.1i* and ψ.sub.2i*k.sub.x.sub.2i* on the dual locus of κ.sub.1−κ.sub.2 are determined by the geometric and the statistical structure of the geometric locus of signed and scaled reproducing kernels of extreme points:

(103) $\kappa_{1} - \kappa_{2} = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}} - \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}},$
wherein the scale factors ψ.sub.1i* and ψ.sub.2i* of the geometric locus determine magnitudes ∥ψ.sub.1i*k.sub.x.sub.1i*∥ and ∥ψ.sub.2i*k.sub.x.sub.2i*∥ as well as critical minimum eigenenergies ∥ψ.sub.1i*k.sub.x.sub.1i*∥.sub.min.sub.c.sup.2 and ∥ψ.sub.2i*k.sub.x.sub.2i*∥.sub.min.sub.c.sup.2 exhibited by respective principal eigenaxis components ψ.sub.1i*k.sub.x.sub.1i* and ψ.sub.2i*k.sub.x.sub.2i* on the dual locus of κ.sub.1−κ.sub.2, and each scale factor ψ.sub.1i* or ψ.sub.2i* determines a conditional density and a corresponding conditional likelihood for a respective extreme point k.sub.x.sub.1i* or k.sub.x.sub.2i*.

(104) Scale factors are determined by finding a satisfactory solution for the Lagrangian dual optimization problem in Eq. (1.9), wherein finding a geometric locus of signed and scaled reproducing kernels of extreme points involves optimizing a vector-valued cost function with respect to constraints on the scaled extreme vectors on the dual loci of ψ and κ, wherein the constraints are specified by the KKT conditions in Eqs (1.3)-(1.7).

(105) The Wolfe dual geometric locus of scaled extreme points on ψ is determined by the largest eigenvector ψ.sub.max of the kernel matrix Q associated with the quadratic form ψ.sub.max.sup.TQψ.sub.max in Eq. (1.9), wherein ψ.sup.Ty=0, ψ.sub.i*>0, and wherein ψ.sub.max is the principal eigenaxis of an implicit quadratic decision boundary, associated with the constrained quadratic form ψ.sub.max.sup.TQψ.sub.max, within the Wolfe dual principal eigenspace of ψ, wherein the form of the inner product statistics contained within the kernel matrix Q determines an intrinsic coordinate system of the intrinsic quadratic decision boundary.

(106) Further, the intrinsic coordinate system of the intrinsic quadratic decision boundary of Eq. (1.9) is an inherent function of inner product statistics between feature vectors k.sub.x.sub.i and k.sub.x.sub.j, wherein reproducing kernels k.sub.x of feature vectors x contain first x.sub.i and second degree x.sub.i.sup.2 point coordinates, wherein reproducing kernels that contain first and second degree point coordinates are necessary to delineate quadratic decision boundaries and corresponding decision borders.

(107) The theorem for convex duality indicates that the principal eigenaxis of ψ satisfies a critical minimum eigenenergy constraint that is symmetrically and equivalently related to the critical minimum eigenenergy constraint on the principal eigenaxis of κ, within the Wolfe dual principal eigenspace of ψ and κ: ∥Z|ψ∥.sub.min.sub.c.sup.2≅∥Z|κ∥.sub.min.sub.c.sup.2, wherein the principal eigenaxis of ψ satisfies a critical minimum eigenenergy constraint:
max ψ.sub.max.sup.TQψ.sub.max=λ.sub.max.sub.ψ∥Z|ψ.sub.max∥.sub.min.sub.c.sup.2,
and the functional 1.sup.Tψ−ψ.sup.TQψ/2 in Eq. (1.9) is maximized by the largest eigenvector ψ.sub.max of Q, wherein the constrained quadratic form ψ.sup.TQψ/2, wherein ψ.sub.max.sup.Ty=0 and ψ.sub.i*>0, reaches its smallest possible value. It follows that the principal eigenaxis components on ψ satisfy minimum length constraints.
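The relation between the quadratic form and the largest eigenvalue can be illustrated numerically. The sketch below builds a signed kernel matrix with an assumed second-degree polynomial kernel and checks the identity between the quadratic form and the largest eigenvalue times the squared length of the eigenvector. It deliberately ignores the equality and positivity constraints and the regularization step, which in practice call for a constrained quadratic programming solver. All names and data are hypothetical.

```python
import numpy as np

def signed_kernel_matrix(X, y):
    """Q[i, j] = y_i * y_j * (x_i . x_j + 1)^2 (assumed polynomial kernel)."""
    K = (X @ X.T + 1.0) ** 2
    return np.outer(y, y) * K

def largest_eigenpair(Q):
    """Largest eigenvalue and its unit eigenvector for a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(Q)  # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]

# Hypothetical toy data: two points per class.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

Q = signed_kernel_matrix(X, y)
lam_max, eigvec_max = largest_eigenpair(Q)

# Quadratic form evaluated at the largest eigenvector.
quad_form = eigvec_max @ Q @ eigvec_max
```

Because `numpy.linalg.eigh` returns unit-length eigenvectors, `quad_form` coincides with `lam_max` here; for an eigenvector of another length the quadratic form scales with the squared length.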

(108) The principal eigenaxis components on ψ also satisfy an equilibrium constraint. The KKT condition in Eq. (1.4) requires that the magnitudes of the principal eigenaxis components on the dual locus of ψ satisfy the locus equation:

(109) $\begin{matrix} (y_{i} = 1) \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} + (y_{i} = - 1) \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} = 0 & (1.20) \end{matrix}$
wherein Eq. (1.20) determines the Wolfe dual equilibrium point:

(110) $\begin{matrix} \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} - \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} = 0 & (1.21) \end{matrix}$
of a minimum risk quadratic classification system, wherein the critical minimum eigenenergies exhibited by the principal eigenaxis of ψ are symmetrically concentrated.

(111) Given Eq. (1.21), it follows that the integrated lengths of the Wolfe dual principal eigenaxis components correlated with each class balance each other, wherein the principal eigenaxis of ψ is in statistical equilibrium:

(112) $\begin{matrix} \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} = \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} . & (1.22) \end{matrix}$
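The equilibrium condition of Eq. (1.22) is easy to check for a candidate set of scale factors. The following sketch sums the scale factors per class and compares the two totals within a tolerance; the function names, the tolerance, and the example values are hypothetical.

```python
import numpy as np

def class_scale_sums(scale, labels):
    """Sum of scale factors for class A (label +1) and class B (label -1)."""
    scale = np.asarray(scale, dtype=float)
    labels = np.asarray(labels)
    return scale[labels == 1].sum(), scale[labels == -1].sum()

def in_equilibrium(scale, labels, tol=1e-9):
    """True when the class-wise sums of scale factors balance each other."""
    sum_a, sum_b = class_scale_sums(scale, labels)
    return abs(sum_a - sum_b) <= tol

# Hypothetical scale factors for two extreme points per class.
balanced = in_equilibrium([0.2, 0.5, 0.3, 0.4], [1, 1, -1, -1])
unbalanced = in_equilibrium([0.2, 0.5, 0.3, 0.1], [1, 1, -1, -1])
```

The same check is equivalent to testing that the inner product of the scale-factor vector and the label vector is zero.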

(113) Now, each scale factor ψ.sub.1i* or ψ.sub.2i* is correlated with a respective extreme vector k.sub.x.sub.1i* or k.sub.x.sub.2i*. Therefore, let l.sub.1+l.sub.2=l, and express the principal eigenaxis of ψ in terms of l scaled, unit extreme vectors:

(114) $\begin{matrix} \psi = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|} + \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|} = \psi_{1} + \psi_{2}, & (1.23) \end{matrix}$
wherein ψ.sub.1 and ψ.sub.2 denote the sides of the dual locus of ψ, wherein the side ψ.sub.1 is determined by the vector expression

(115) $\psi_{1} = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|},$
and wherein the side ψ.sub.2 is determined by the vector expression

(116) $\psi_{2} = \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|} .$

(117) The system of locus equations in Eqs (1.20)-(1.23) demonstrates that the principal eigenaxis of ψ is determined by a geometric locus of scaled, unit extreme vectors from class A and class B, wherein all of the scaled, unit extreme vectors on ψ.sub.1 and ψ.sub.2 are symmetrically distributed over either side of the geometric locus of the principal eigenaxis ψ, wherein a statistical fulcrum is placed directly under the center of the principal eigenaxis of ψ.

(118) Using Eq. (1.22) and Eq. (1.23), it follows that the length ∥ψ.sub.1∥ of ψ.sub.1 is equal to the length ∥ψ.sub.2∥ of ψ.sub.2: ∥ψ.sub.1∥=∥ψ.sub.2∥. It also follows that the total allowed eigenenergies ∥Z|ψ.sub.1∥.sub.min.sub.c.sup.2 and ∥Z|ψ.sub.2∥.sub.min.sub.c.sup.2 exhibited by ψ.sub.1 and ψ.sub.2 are symmetrically balanced with each other about the geometric center of the principal eigenaxis of ψ: ∥Z|ψ.sub.1∥.sub.min.sub.c.sup.2=∥Z|ψ.sub.2∥.sub.min.sub.c.sup.2.

(119) The equilibrium constraint on the geometric locus of the principal eigenaxis ψ in Eq. (1.20) ensures that the critical minimum eigenenergies exhibited by all of the principal eigenaxis components on ψ.sub.1 and ψ.sub.2 are symmetrically concentrated within the principal eigenaxis of ψ:

(120) $\begin{matrix} \left\| \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|} \right\|_{\min_{c}}^{2} = \left\| \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|} \right\|_{\min_{c}}^{2} . & (1.24) \end{matrix}$

(121) Using Eq. (1.24), it follows that the principal eigenaxis of ψ satisfies a state of statistical equilibrium, wherein all of the principal eigenaxis components on ψ are equal or in correct proportions, relative to the center of ψ, wherein components of likelihood components and corresponding principal eigenaxis components of class A, along the axis of ψ.sub.1, are symmetrically balanced with components of likelihood components and corresponding principal eigenaxis components of class B, along the axis of ψ.sub.2.

(122) Therefore, the principal eigenaxis of ψ determines a point at which the critical minimum eigenenergies exhibited by all of the scaled, unit extreme vectors from class A and class B are symmetrically concentrated, wherein the total allowed eigenenergy ∥Z|ψ∥.sub.min.sub.c.sup.2 exhibited by the principal eigenaxis of ψ is minimized within the Wolfe dual principal eigenspace.

(123) The scale factors are associated with the fundamental unknowns of the constrained optimization problem in Eq. (1.1). Now, the geometric locus of the principal eigenaxis ψ can be written as

(124) $\begin{matrix} \psi_{\max} = \frac{\psi_{1}}{\lambda_{\max_{\psi}}} \begin{pmatrix} \left\| k_{x_{1}} \right\| \left\| k_{x_{1}} \right\| \cos \theta_{k_{x_{1}} k_{x_{1}}} \\ \left\| k_{x_{2}} \right\| \left\| k_{x_{1}} \right\| \cos \theta_{k_{x_{2}} k_{x_{1}}} \\ \vdots \\ - \left\| k_{x_{N}} \right\| \left\| k_{x_{1}} \right\| \cos \theta_{k_{x_{N}} k_{x_{1}}} \end{pmatrix} + \cdots + \frac{\psi_{N}}{\lambda_{\max_{\psi}}} \begin{pmatrix} - \left\| k_{x_{1}} \right\| \left\| k_{x_{N}} \right\| \cos \theta_{k_{x_{1}} k_{x_{N}}} \\ - \left\| k_{x_{2}} \right\| \left\| k_{x_{N}} \right\| \cos \theta_{k_{x_{2}} k_{x_{N}}} \\ \vdots \\ \left\| k_{x_{N}} \right\| \left\| k_{x_{N}} \right\| \cos \theta_{k_{x_{N}} k_{x_{N}}} \end{pmatrix}, & (1.25) \end{matrix}$
wherein each scale factor ψ.sub.j is correlated with scalar projections

(125) $\left\| k_{x_{j}} \right\| \cos \theta_{k_{x_{i}} k_{x_{j}}}$
of a feature vector k.sub.x.sub.j onto a collection of N signed feature vectors k.sub.x.sub.i.

(126) Further, given a kernel matrix of all possible inner products of reproducing kernels of a collection of N feature vectors {k.sub.x.sub.i}.sub.i=1.sup.N, the pointwise covariance statistic cov.sub.up(k.sub.x.sub.i) of any given feature vector k.sub.x.sub.i

(127) $\begin{matrix} \operatorname{cov}_{up} (k_{x_{i}}) = \left\| k_{x_{i}} \right\| \sum_{j = 1}^{N} \left\| k_{x_{j}} \right\| \cos \theta_{k_{x_{i}} k_{x_{j}}} & (1.26) \end{matrix}$
determines a unidirectional estimate of the joint variations between the random variables of each feature vector k.sub.x.sub.j in the collection of N feature vectors {x.sub.i}.sub.i=1.sup.N and the random variables of the feature vector k.sub.x.sub.i, along with a unidirectional estimate of the joint variations between the random variables of the mean feature vector

(128) $\sum_{j = 1}^{N} k_{x_{j}}$
and the feature vector k.sub.x.sub.i, along the axis of the feature vector k.sub.x.sub.i.
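Because each kernel matrix entry equals the product of two vector lengths and the cosine of the angle between them, the statistic of Eq. (1.26) reduces to a row sum of the unsigned kernel matrix. The sketch below computes it both ways for comparison, assuming a second-degree polynomial kernel; the function names and the toy data are hypothetical.

```python
import numpy as np

def pointwise_covariance(K):
    """Row sums of the kernel matrix: sum_j K[i, j] for each i."""
    return K.sum(axis=1)

def pointwise_covariance_explicit(K):
    """Same statistic written as ||k_i|| * sum_j ||k_j|| * cos(theta_ij)."""
    norms = np.sqrt(np.diag(K))               # ||k_i|| = sqrt(K[i, i])
    cosines = K / np.outer(norms, norms)      # cos(theta_ij)
    return norms * (cosines @ norms)

# Hypothetical toy data with an assumed kernel (x . z + 1)^2.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
K_toy = (X_toy @ X_toy.T + 1.0) ** 2

stat_rows = pointwise_covariance(K_toy)
stat_explicit = pointwise_covariance_explicit(K_toy)
```

The row-sum form avoids the division by vector lengths and is therefore the more convenient one when only the kernel matrix is available.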

(129) Let i=1:l.sub.1, where each extreme vector k.sub.x.sub.1i* is correlated with a principal eigenaxis component

(130) $\psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|}$
on ψ.sub.1. Now take the extreme vector k.sub.x.sub.1i* that is correlated with the principal eigenaxis component

(131) $\psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|} .$
Using Eqs (1.25) and (1.26), it follows that the geometric locus of the principal eigenaxis component

(132) $\psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|}$
on ψ.sub.1 is determined by the locus equation:

(133) $\begin{matrix} \psi_{1 i^{*}} = \lambda_{\max_{\psi}}^{- 1} \left\| k_{x_{1 i^{*}}} \right\| \sum_{j = 1}^{l_{1}} \psi_{1 j^{*}} \left\| k_{x_{1 j^{*}}} \right\| \cos \theta_{k_{x_{1 i^{*}}} k_{x_{1 j^{*}}}} - \lambda_{\max_{\psi}}^{- 1} \left\| k_{x_{1 i^{*}}} \right\| \sum_{j = 1}^{l_{2}} \psi_{2 j^{*}} \left\| k_{x_{2 j^{*}}} \right\| \cos \theta_{k_{x_{1 i^{*}}} k_{x_{2 j^{*}}}} & (1.27) \end{matrix}$
wherein components of likelihood components and principal eigenaxis components for class A, along the axis of the extreme vector k.sub.x.sub.1i*, are symmetrically balanced with opposing components of likelihood components and principal eigenaxis components for class B, along the axis of the extreme vector k.sub.x.sub.1i*:

(134) $\psi_{1 i^{*}} = \lambda_{\max_{\psi}}^{- 1} \left\| k_{x_{1 i^{*}}} \right\| \sum_{j = 1}^{l_{1}} \operatorname{comp}_{\overrightarrow{k_{x_{1 i^{*}}}}} \left( \overrightarrow{\psi_{1 j^{*}} k_{x_{1 j^{*}}}} \right) - \lambda_{\max_{\psi}}^{- 1} \left\| k_{x_{1 i^{*}}} \right\| \sum_{j = 1}^{l_{2}} \operatorname{comp}_{\overrightarrow{k_{x_{1 i^{*}}}}} \left( \overrightarrow{\psi_{2 j^{*}} k_{x_{2 j^{*}}}} \right),$
wherein ψ.sub.1i* determines a scale factor for the extreme vector

(135) $\frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|} .$
Accordingly, Eq. (1.27) determines a scale factor ψ.sub.1i* for a correlated extreme vector k.sub.x.sub.1i*.

(136) Let i=1:l.sub.2, where each extreme vector k.sub.x.sub.2i* is correlated with a principal eigenaxis component

(137) $\psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|}$
on ψ.sub.2. Now take the extreme vector k.sub.x.sub.2i* that is correlated with the principal eigenaxis component

(138) $\psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|} .$
Using Eqs (1.25) and (1.26), it follows that the geometric locus of the principal eigenaxis component

(139) $\psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|}$
on ψ.sub.2 is determined by the locus equation:

(140) $\begin{matrix} \psi_{2 i^{*}} = \lambda_{\max_{\psi}}^{- 1} \left\| k_{x_{2 i^{*}}} \right\| \sum_{j = 1}^{l_{2}} \psi_{2 j^{*}} \left\| k_{x_{2 j^{*}}} \right\| \cos \theta_{k_{x_{2 i^{*}}} k_{x_{2 j^{*}}}} - \lambda_{\max_{\psi}}^{- 1} \left\| k_{x_{2 i^{*}}} \right\| \sum_{j = 1}^{l_{1}} \psi_{1 j^{*}} \left\| k_{x_{1 j^{*}}} \right\| \cos \theta_{k_{x_{2 i^{*}}} k_{x_{1 j^{*}}}} & (1.28) \end{matrix}$
wherein components of likelihood components and principal eigenaxis components for class B, along the axis of the extreme vector k.sub.x.sub.2i*, are symmetrically balanced with opposing components of likelihood components and principal eigenaxis components for class A, along the axis of the extreme vector k.sub.x.sub.2i*:

(141) $\psi_{2 i^{*}} = \lambda_{\max_{\psi}}^{- 1} \left\| k_{x_{2 i^{*}}} \right\| \sum_{j = 1}^{l_{2}} \operatorname{comp}_{\overrightarrow{k_{x_{2 i^{*}}}}} \left( \overrightarrow{\psi_{2 j^{*}} k_{x_{2 j^{*}}}} \right) - \lambda_{\max_{\psi}}^{- 1} \left\| k_{x_{2 i^{*}}} \right\| \sum_{j = 1}^{l_{1}} \operatorname{comp}_{\overrightarrow{k_{x_{2 i^{*}}}}} \left( \overrightarrow{\psi_{1 j^{*}} k_{x_{1 j^{*}}}} \right),$
wherein ψ.sub.2i* determines a scale factor for the extreme vector

(142) $\frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|} .$
Accordingly, Eq. (1.28) determines a scale factor ψ.sub.2i* for a correlated extreme vector k.sub.x.sub.2i*.
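Read componentwise, the locus equations of Eqs (1.27) and (1.28) have the form of an eigenvector equation for a signed kernel matrix: each scale factor equals the inverse of the largest eigenvalue times a signed sum of kernel entries weighted by the other scale factors. The sketch below checks that reading on toy data. It assumes a second-degree polynomial kernel, uses an unconstrained eigenvector in place of the constrained solution, and all names are hypothetical.

```python
import numpy as np

def scale_factor_from_locus(Q, vec, lam, i):
    """Right-hand side of vec_i = (1 / lam) * sum_j Q[i, j] * vec_j."""
    return (Q[i] @ vec) / lam

# Hypothetical toy data: two points per class, labels +1 (A) and -1 (B).
X_pts = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y_lab = np.array([1.0, 1.0, -1.0, -1.0])

# Signed kernel matrix: same-class entries keep their sign,
# opposite-class entries are negated, as in the two sums of the locus equation.
Q_signed = np.outer(y_lab, y_lab) * (X_pts @ X_pts.T + 1.0) ** 2

eigvals, eigvecs = np.linalg.eigh(Q_signed)   # ascending eigenvalues
lam_top, vec_top = eigvals[-1], eigvecs[:, -1]

recovered = np.array([
    scale_factor_from_locus(Q_signed, vec_top, lam_top, i)
    for i in range(len(vec_top))
])
```

Each entry of `recovered` reproduces the corresponding entry of `vec_top`, which is the sense in which every scale factor is fixed by all of the others.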

(143) Given the pointwise covariance statistic in Eq. (1.26), it follows that Eq. (1.27) and Eq. (1.28) determine the manner in which the first and second order vector components of a set of l scaled extreme vectors {ψ.sub.j*k.sub.x.sub.j*}.sub.j=1.sup.l, wherein the set belongs to a collection of N feature vectors {x.sub.i}.sub.i=1.sup.N, are distributed along the axes of respective extreme vectors k.sub.x.sub.1i* or k.sub.x.sub.2i*, wherein the first and second order vector components of each scaled extreme vector ψ.sub.j*k.sub.x.sub.j* are symmetrically distributed according to: (1) a class label +1 or −1; (2) a signed magnitude

(144) $\left\| k_{x_{j^{*}}} \right\| \cos \theta_{k_{x_{1 i^{*}}} k_{x_{j^{*}}}}$
or

(145) $\left\| k_{x_{j^{*}}} \right\| \cos \theta_{k_{x_{2 i^{*}}} k_{x_{j^{*}}}};$
and (3) a symmetrically balanced distribution of l scaled extreme vectors {ψ.sub.k*k.sub.x.sub.k*}.sub.k=1.sup.l along the axis of the scaled extreme vector k.sub.x.sub.j*, wherein the symmetrically balanced distribution is specified by the scale factor ψ.sub.j*.

(146) Accordingly, the geometric locus of each principal eigenaxis component

(147) $\psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|}$
or

(148) $\psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|}$
on the geometric locus of the principal eigenaxis ψ determines the manner in which the first and second order vector components of an extreme vector k.sub.x.sub.1i* or k.sub.x.sub.2i* are symmetrically distributed over the axes of a set of l signed and scaled extreme vectors:

(149) $\left\{ \psi_{j^{*}} k_{x_{j^{*}}} \right\}_{j = 1}^{l},$
wherein each scaled extreme vector has a sign of +1 or −1.

(150) It follows that the geometric locus of each principal eigenaxis component

(151) $\psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|}$
or

(152) $\psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|}$
on the geometric locus of the principal eigenaxis ψ determines a conditional distribution of first and second degree coordinates for a correlated extreme point k.sub.x.sub.1i* or k.sub.x.sub.2i*, wherein

(153) $\psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\left\| k_{x_{1 i^{*}}} \right\|}$
determines a pointwise conditional density estimate

(154) $p \left( k_{x_{1 i^{*}}} \mid \operatorname{comp}_{\overrightarrow{\kappa}} \left( \overrightarrow{k_{x_{1 i^{*}}}} \right) \right)$
for the correlated extreme point k.sub.x.sub.1i*, wherein components of the extreme vector k.sub.x.sub.1i* are symmetrically distributed over the geometric locus of the principal eigenaxis κ:

(155) $p \left( k_{x_{1 i^{*}}} \mid \operatorname{comp}_{\overrightarrow{\kappa}} \left( \overrightarrow{k_{x_{1 i^{*}}}} \right) \right) = \lambda_{\max_{\psi}}^{- 1} \sum_{j = 1}^{l_{1}} \left\| \psi_{1 j^{*}} k_{x_{1 j^{*}}} \right\| \operatorname{comp}_{\overrightarrow{\psi_{1 j^{*}} k_{x_{1 j^{*}}}}} \left( \overrightarrow{k_{x_{1 i^{*}}}} \right) - \lambda_{\max_{\psi}}^{- 1} \sum_{j = 1}^{l_{2}} \left\| \psi_{2 j^{*}} k_{x_{2 j^{*}}} \right\| \operatorname{comp}_{\overrightarrow{\psi_{2 j^{*}} k_{x_{2 j^{*}}}}} \left( \overrightarrow{k_{x_{1 i^{*}}}} \right),$
and wherein

(156) $\psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\left\| k_{x_{2 i^{*}}} \right\|}$
determines a pointwise conditional density estimate

(157) $p \left( k_{x_{2 i^{*}}} \mid \operatorname{comp}_{\overrightarrow{- \kappa}} \left( \overrightarrow{k_{x_{2 i^{*}}}} \right) \right)$
for the correlated extreme point k.sub.x.sub.2i*, wherein components of the extreme vector k.sub.x.sub.2i* are symmetrically distributed over the axis of the geometric locus of −κ:

(158) $p \left( k_{x_{2 i^{*}}} \mid \operatorname{comp}_{\overrightarrow{- \kappa}} \left( \overrightarrow{k_{x_{2 i^{*}}}} \right) \right) = \lambda_{\max_{\psi}}^{- 1} \sum_{j = 1}^{l_{2}} \left\| \psi_{2 j^{*}} k_{x_{2 j^{*}}} \right\| \operatorname{comp}_{\overrightarrow{\psi_{2 j^{*}} k_{x_{2 j^{*}}}}} \left( \overrightarrow{k_{x_{2 i^{*}}}} \right) - \lambda_{\max_{\psi}}^{- 1} \sum_{j = 1}^{l_{1}} \left\| \psi_{1 j^{*}} k_{x_{1 j^{*}}} \right\| \operatorname{comp}_{\overrightarrow{\psi_{1 j^{*}} k_{x_{1 j^{*}}}}} \left( \overrightarrow{k_{x_{2 i^{*}}}} \right) .$

(159) Thus, each scale factor ψ.sub.1i* or ψ.sub.2i* determines a conditional density and a corresponding conditional likelihood for a correlated extreme point k.sub.x.sub.1i* or k.sub.x.sub.2i*.

(160) Therefore, conditional densities and corresponding conditional likelihoods ψ.sub.1i*k.sub.x.sub.1i* for the extreme points k.sub.x.sub.1i* are identically distributed over the principal eigenaxis components on κ.sub.1

(161) $\kappa_{1} = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}},$
wherein ψ.sub.1i*k.sub.x.sub.1i* determines a conditional density and a corresponding conditional likelihood for a correlated extreme point k.sub.x.sub.1i*, and wherein κ.sub.1 determines a parameter vector for a class-conditional probability density function p(k.sub.x.sub.1i*|κ.sub.1) for a given set {k.sub.x.sub.1i*}.sub.i=1.sup.l.sup.1 of extreme points k.sub.x.sub.1i* that belong to a collection of N feature vectors {x.sub.i}.sub.i=1.sup.N:
κ.sub.1=p(k.sub.x.sub.1i*|κ.sub.1),
wherein the area ∥ψ.sub.1i*k.sub.x.sub.1i*∥.sup.2 under a scaled extreme vector ψ.sub.1i*k.sub.x.sub.1i* determines a conditional probability that an extreme point k.sub.x.sub.1i* will be observed within a localized region of either region Z.sub.1 or region Z.sub.2 within a decision space Z, and wherein the area under the conditional density function p(k.sub.x.sub.1i*|κ.sub.1) determines the conditional probability P(k.sub.x.sub.1i*|κ.sub.1) of observing the set {k.sub.x.sub.1i*}.sub.i=1.sup.l.sup.1 of extreme points k.sub.x.sub.1i* within localized regions of the decision space Z=Z.sub.1+Z.sub.2 of a minimum risk quadratic classification system

(162) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0.$

(163) Likewise, conditional densities and corresponding conditional likelihoods ψ.sub.2i*k.sub.x.sub.2i* for the extreme points k.sub.x.sub.2i* are identically distributed over the principal eigenaxis components on κ.sub.2

(164) $\kappa_{2} = \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}},$
wherein ψ.sub.2i*k.sub.x.sub.2i* determines a conditional density and a corresponding conditional likelihood for a correlated extreme point k.sub.x.sub.2i*, and wherein κ.sub.2 determines a parameter vector for a class-conditional probability density function p(k.sub.x.sub.2i*|κ.sub.2) for a given set {k.sub.x.sub.2i*}.sub.i=1.sup.l.sup.2 of extreme points k.sub.x.sub.2i* that belong to a collection of N feature vectors {x.sub.i}.sub.i=1.sup.N:
κ.sub.2=p(k.sub.x.sub.2i*|κ.sub.2),
wherein the area ∥ψ.sub.2i*k.sub.x.sub.2i*∥.sup.2 under a scaled extreme vector ψ.sub.2i*k.sub.x.sub.2i* determines a conditional probability that an extreme point k.sub.x.sub.2i* will be observed within a localized region of either region Z.sub.1 or region Z.sub.2 within a decision space Z, and wherein the area under the conditional density function p(k.sub.x.sub.2i*|κ.sub.2) determines the conditional probability P(k.sub.x.sub.2i*|κ.sub.2) of observing the set {k.sub.x.sub.2i*}.sub.i=1.sup.l.sup.2 of extreme points k.sub.x.sub.2i* within localized regions of the decision space Z=Z.sub.1+Z.sub.2 of a minimum risk quadratic classification system

(165) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0.$

(166) The integral of a conditional density function p(k.sub.x.sub.1i*|κ.sub.1) for class A

(167) $P (k_{x_{1 i^{*}}} \mid \kappa_{1}) = \int_{Z} \left( \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}} \right) d \kappa_{1} = \int_{Z} p (k_{x_{1 i^{*}}} \mid \kappa_{1}) \, d \kappa_{1} = \int_{Z} \kappa_{1} \, d \kappa_{1} = \frac{1}{2} \left\| \kappa_{1} \right\|^{2} + C = \left\| \kappa_{1} \right\|^{2} + C_{1},$
over the decision space Z=Z.sub.1+Z.sub.2 of a minimum risk quadratic classification system, determines the conditional probability P(k.sub.x.sub.1i*|κ.sub.1) of observing a set {k.sub.x.sub.1i*}.sub.i=1.sup.l.sup.1 of extreme points k.sub.x.sub.1i* within localized regions of the decision space Z=Z.sub.1+Z.sub.2, wherein integrated conditional densities ∥ψ.sub.1i*k.sub.x.sub.1i*∥.sub.min.sub.c.sup.2 of extreme points k.sub.x.sub.1i* located within the decision region Z.sub.1 determine costs C(Z.sub.1|∥ψ.sub.1i*k.sub.x.sub.1i*∥.sub.min.sub.c.sup.2) for expected counter risks ℜ.sub.min(Z.sub.1|∥ψ.sub.1i*k.sub.x.sub.1i*∥.sub.min.sub.c.sup.2) of making correct decisions, and integrated conditional densities ∥ψ.sub.1i*k.sub.x.sub.1i*∥.sub.min.sub.c.sup.2 of extreme points k.sub.x.sub.1i* located within the decision region Z.sub.2 determine costs

(168) $C \left( Z_{2} \mid \left\| \psi_{1 i^{*}} k_{x_{1 i^{*}}} \right\|_{\min_{c}}^{2} \right)$
for expected risks

(169) $\mathfrak{R}_{\min} \left( Z_{2} \mid \left\| \psi_{1 i^{*}} k_{x_{1 i^{*}}} \right\|_{\min_{c}}^{2} \right)$
of making decision errors.

(170) Accordingly, all of the scaled extreme vectors ψ.sub.1i*k.sub.x.sub.1i* from class A possess critical minimum eigenenergies ∥ψ.sub.1i*k.sub.x.sub.1i*∥.sub.min.sub.c.sup.2 that determine costs C either for obtaining expected risks of making decision errors or for obtaining expected counter risks of making correct decisions.

(171) Therefore, the conditional probability function P(k.sub.x.sub.1i*|κ.sub.1) for class A is given by the integral
P(k.sub.x.sub.1i*|κ.sub.1)=∫.sub.Zκ.sub.1dκ.sub.1=∥Z|κ.sub.1∥.sub.min.sub.c.sup.2+C.sub.1, (1.29)
over the decision space Z=Z.sub.1+Z.sub.2 of a minimum risk quadratic classification system, wherein the integral of Eq. (1.29) has a solution in terms of the critical minimum eigenenergy ∥Z|κ.sub.1∥.sub.min.sub.c.sup.2 exhibited by κ.sub.1 and an integration constant C.sub.1.

(172) The integral of a conditional density function $p (k_{x_{2 i^{*}}} | \kappa_{2})$ for class B

(173) $P (k_{x_{2 i^{*}}} | \kappa_{2}) = \int_{Z} \left( \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}} \right) d\kappa_{2} = \int_{Z} p (k_{x_{2 i^{*}}} | \kappa_{2}) \, d\kappa_{2} = \int_{Z} \kappa_{2} \, d\kappa_{2} = \frac{1}{2} \|\kappa_{2}\|^{2} + C = \|\kappa_{2}\|^{2} + C_{2},$
over the decision space $Z = Z_{1} + Z_{2}$ of a minimum risk quadratic classification system, determines the conditional probability $P (k_{x_{2 i^{*}}} | \kappa_{2})$ of observing a set $\{k_{x_{2 i^{*}}}\}_{i = 1}^{l_{2}}$ of extreme points $k_{x_{2 i^{*}}}$ within localized regions of the decision space $Z = Z_{1} + Z_{2}$, wherein integrated conditional densities $\|\psi_{2 i^{*}} k_{x_{2 i^{*}}}\|_{\min_{c}}^{2}$ of extreme points $k_{x_{2 i^{*}}}$ located within the decision region $Z_{1}$ determine costs $C (Z_{1} | \|\psi_{2 i^{*}} k_{x_{2 i^{*}}}\|_{\min_{c}}^{2})$ for expected risks $\mathfrak{R}_{\min} (Z_{1} | \|\psi_{2 i^{*}} k_{x_{2 i^{*}}}\|_{\min_{c}}^{2})$ of making decision errors, and integrated conditional densities $\|\psi_{2 i^{*}} k_{x_{2 i^{*}}}\|_{\min_{c}}^{2}$ of extreme points $k_{x_{2 i^{*}}}$ located within the decision region $Z_{2}$ determine costs $C (Z_{2} | \|\psi_{2 i^{*}} k_{x_{2 i^{*}}}\|_{\min_{c}}^{2})$ for expected counter risks $\mathfrak{R}_{\min} (Z_{2} | \|\psi_{2 i^{*}} k_{x_{2 i^{*}}}\|_{\min_{c}}^{2})$ of making correct decisions.

(174) Accordingly, all of the scaled extreme vectors $\psi_{2 i^{*}} k_{x_{2 i^{*}}}$ from class B possess critical minimum eigenenergies $\|\psi_{2 i^{*}} k_{x_{2 i^{*}}}\|_{\min_{c}}^{2}$ that determine costs $C$ either for obtaining expected risks of making decision errors or for obtaining expected counter risks of making correct decisions.

(175) Therefore, the conditional probability function $P (k_{x_{2 i^{*}}} | \kappa_{2})$ for class B is given by the integral
$\begin{matrix} P (k_{x_{2 i^{*}}} | \kappa_{2}) = \int_{Z} \kappa_{2} \, d\kappa_{2} = \| Z | \kappa_{2} \|_{\min_{c}}^{2} + C_{2}, & (1.30) \end{matrix}$
over the decision space $Z = Z_{1} + Z_{2}$ of a minimum risk quadratic classification system, wherein the integral of Eq. (1.30) has a solution in terms of the critical minimum eigenenergy $\| Z | \kappa_{2} \|_{\min_{c}}^{2}$ exhibited by $\kappa_{2}$ and an integration constant $C_{2}$.

(176) Machine learning algorithms of the present invention find the right mix of principal eigenaxis components on the dual loci of $\psi$ and $\kappa$ by accomplishing an elegant, statistical balancing feat within the Wolfe dual principal eigenspace of $\psi$ and $\kappa$. The scale factors $\{\psi_{i^{*}}\}_{i = 1}^{l}$ of the principal eigenaxis components on $\psi$ play a fundamental role in the statistical balancing feat.

(177) Using Eq. (1.27), the integrated lengths

(178) $\sum_{i = 1}^{l_{1}} \psi_{1 i^{*}}$
of the principal eigenaxis components on $\psi_{1}$ satisfy the identity:

(179) $\begin{matrix} \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} \equiv \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} \left( \sum_{j = 1}^{l_{1}} \psi_{1 j^{*}} k_{x_{1 j^{*}}} - \sum_{j = 1}^{l_{2}} \psi_{2 j^{*}} k_{x_{2 j^{*}}} \right), & (1.31) \end{matrix}$
and, using Eq. (1.28), the integrated lengths

(180) $\sum_{i = 1}^{l_{2}} \psi_{2 i^{*}}$
of the principal eigenaxis components on $\psi_{2}$ satisfy the identity:

(181) $\begin{matrix} \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} \equiv \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} \left( \sum_{j = 1}^{l_{2}} \psi_{2 j^{*}} k_{x_{2 j^{*}}} - \sum_{j = 1}^{l_{1}} \psi_{1 j^{*}} k_{x_{1 j^{*}}} \right). & (1.32) \end{matrix}$

(182) Returning to Eq. (1.22), wherein the principal eigenaxis of $\psi$ is in statistical equilibrium, it follows that the RHS of Eq. (1.31) equals the RHS of Eq. (1.32):

(183) $\lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} \left( \sum_{j = 1}^{l_{1}} \psi_{1 j^{*}} k_{x_{1 j^{*}}} - \sum_{j = 1}^{l_{2}} \psi_{2 j^{*}} k_{x_{2 j^{*}}} \right) = \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} \left( \sum_{j = 1}^{l_{2}} \psi_{2 j^{*}} k_{x_{2 j^{*}}} - \sum_{j = 1}^{l_{1}} \psi_{1 j^{*}} k_{x_{1 j^{*}}} \right),$
wherein components of all of the extreme vectors $k_{x_{1 i^{*}}}$ and $k_{x_{2 i^{*}}}$ from class A and class B are distributed over the axes of $\kappa_{1}$ and $\kappa_{2}$ in the symmetrically balanced manner:

(184) $\begin{matrix} \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} (\kappa_{1} - \kappa_{2}) = \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} (\kappa_{2} - \kappa_{1}), & (1.33) \end{matrix}$
wherein components of extreme vectors $k_{x_{1 i^{*}}}$ along the axis of $\kappa_{2}$ oppose components of extreme vectors $k_{x_{1 i^{*}}}$ along the axis of $\kappa_{1}$, and components of extreme vectors $k_{x_{2 i^{*}}}$ along the axis of $\kappa_{1}$ oppose components of extreme vectors $k_{x_{2 i^{*}}}$ along the axis of $\kappa_{2}$.

(185) Using Eq. (1.33), it follows that components

(186) $\|k_{x_{1 i^{*}}}\| \cos \theta_{\kappa_{1} k_{x_{1 i^{*}}}}$
of extreme vectors $k_{x_{1 i^{*}}}$ along the axis of $\kappa_{1}$, wherein the axis of $\kappa_{1}$ is determined by distributions of conditional likelihoods of extreme points $k_{x_{1 i^{*}}}$, and opposing components

(187) $- \|k_{x_{1 i^{*}}}\| \cos \theta_{\kappa_{2} k_{x_{1 i^{*}}}}$
of extreme vectors $k_{x_{1 i^{*}}}$ along the axis of $\kappa_{2}$, wherein the axis of $\kappa_{2}$ is determined by distributions of conditional likelihoods of extreme points $k_{x_{2 i^{*}}}$, are symmetrically balanced with components

(188) $\|k_{x_{2 i^{*}}}\| \cos \theta_{\kappa_{2} k_{x_{2 i^{*}}}}$
of extreme vectors $k_{x_{2 i^{*}}}$ along the axis of $\kappa_{2}$, wherein the axis of $\kappa_{2}$ is determined by distributions of conditional likelihoods of extreme points $k_{x_{2 i^{*}}}$, and opposing components

(189) $- \|k_{x_{2 i^{*}}}\| \cos \theta_{\kappa_{1} k_{x_{2 i^{*}}}}$
of extreme vectors $k_{x_{2 i^{*}}}$ along the axis of $\kappa_{1}$, wherein the axis of $\kappa_{1}$ is determined by distributions of conditional likelihoods of extreme points $k_{x_{1 i^{*}}}$:

(190) $\lambda_{\max_{\psi}}^{-1} \|\kappa_{1}\| \sum_{i = 1}^{l_{1}} \operatorname{comp}_{\overrightarrow{\kappa_{1}}} (\overrightarrow{k_{x_{1 i^{*}}}}) - \lambda_{\max_{\psi}}^{-1} \|\kappa_{2}\| \sum_{i = 1}^{l_{1}} \operatorname{comp}_{\overrightarrow{\kappa_{2}}} (\overrightarrow{k_{x_{1 i^{*}}}}) = \lambda_{\max_{\psi}}^{-1} \|\kappa_{2}\| \sum_{i = 1}^{l_{2}} \operatorname{comp}_{\overrightarrow{\kappa_{2}}} (\overrightarrow{k_{x_{2 i^{*}}}}) - \lambda_{\max_{\psi}}^{-1} \|\kappa_{1}\| \sum_{i = 1}^{l_{2}} \operatorname{comp}_{\overrightarrow{\kappa_{1}}} (\overrightarrow{k_{x_{2 i^{*}}}})$
wherein counteracting and opposing components of likelihoods of extreme vectors $k_{x_{1 i^{*}}}$ associated with counter risks and risks for class A, along the axis of $\kappa$, are symmetrically balanced with counteracting and opposing components of likelihoods of extreme vectors $k_{x_{2 i^{*}}}$ associated with counter risks and risks for class B, along the axis of $-\kappa$.

(191) Now rewrite Eq. (1.33) as:

(192) $\begin{matrix} \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} \kappa_{1} + \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} \kappa_{1} = \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} \kappa_{2} + \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} \kappa_{2}, & (1.34) \end{matrix}$
wherein components of all of the extreme vectors $k_{x_{1 i^{*}}}$ and $k_{x_{2 i^{*}}}$ from class A and class B, along the axes of $\kappa_{1}$ and $\kappa_{2}$, satisfy the locus equation:

(193) $\left[ \sum_{i = 1}^{l_{1}} \operatorname{comp}_{\overrightarrow{\kappa_{1}}} (\overrightarrow{k_{x_{1 i^{*}}}}) + \sum_{i = 1}^{l_{2}} \operatorname{comp}_{\overrightarrow{\kappa_{1}}} (\overrightarrow{k_{x_{2 i^{*}}}}) \right] \lambda_{\max_{\psi}}^{-1} \|\kappa_{1}\| = \left[ \sum_{i = 1}^{l_{2}} \operatorname{comp}_{\overrightarrow{\kappa_{2}}} (\overrightarrow{k_{x_{2 i^{*}}}}) + \sum_{i = 1}^{l_{1}} \operatorname{comp}_{\overrightarrow{\kappa_{2}}} (\overrightarrow{k_{x_{1 i^{*}}}}) \right] \lambda_{\max_{\psi}}^{-1} \|\kappa_{2}\|$
wherein components of likelihoods of extreme vectors $k_{x_{1 i^{*}}}$ and $k_{x_{2 i^{*}}}$ associated with counter risks and risks for class A and class B, along the axis of $\kappa_{1}$, are symmetrically balanced with components of likelihoods of extreme vectors $k_{x_{1 i^{*}}}$ and $k_{x_{2 i^{*}}}$ associated with counter risks and risks for class A and class B, along the axis of $\kappa_{2}$. Therefore, machine learning algorithms of the invention determine scale factors $\psi_{1 i^{*}}$ and $\psi_{2 i^{*}}$ for the geometric locus of signed and scaled reproducing kernels of extreme points in Eq. (1.11)

(194) $\kappa = \kappa_{1} - \kappa_{2} = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}} - \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}}$
that satisfy suitable length constraints, wherein the principal eigenaxis of $\kappa$ and the principal eigenaxis of $\psi$ are both formed by symmetrical distributions of likelihoods of extreme vectors $k_{x_{1 i^{*}}}$ and $k_{x_{2 i^{*}}}$ from class A and class B, wherein components of likelihoods of extreme vectors $k_{x_{1 i^{*}}}$ and $k_{x_{2 i^{*}}}$ associated with counter risks and risks for class A and class B are symmetrically balanced with each other: along the axis of $\psi_{1}$ and $\psi_{2}$ of the principal eigenaxis of $\psi$ and along the axis of $\kappa_{1}$ and $\kappa_{2}$ of the principal eigenaxis of $\kappa$.

(195) Given Eqs (1.33) and (1.34), it follows that the locus equation

(196) $\begin{matrix} \lambda_{\max_{\psi}}^{-1} \left( \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} + \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} \right) \{\kappa_{1} - \kappa_{2}\} = 0 & (1.35) \end{matrix}$
determines the primal equilibrium point of a minimum risk quadratic classification system, within a Wolfe dual principal eigenspace, wherein the form of Eq. (1.35) is determined by geometric and statistical conditions that are satisfied by the dual loci of $\psi$ and $\kappa$.
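The balance expressed by Eqs (1.33) and (1.35) can be checked numerically on a toy configuration. The sketch below is illustrative only and makes several assumptions that are not taken from the specification: the reproducing kernel is assumed to be the second-degree polynomial kernel `(x.s + 1)**2`, the two extreme points and their scale factors are hand-picked, and the helper names `poly_kernel`, `kappa_dot`, and `balance_terms` are hypothetical. The common eigenvalue factor is omitted because it multiplies both sides equally.

```python
# Toy check of the balance in Eqs. (1.33) and (1.35).
# Assumptions (not from the specification): a second-degree polynomial
# kernel, two hand-picked extreme points, hand-picked scale factors.

def poly_kernel(x, s):
    """Assumed reproducing kernel (x.s + 1)^2."""
    return (sum(a * b for a, b in zip(x, s)) + 1.0) ** 2

def kappa_dot(point, class_a, psi_a, class_b, psi_b):
    """Inner product of the kernel of `point` with kappa_1 - kappa_2."""
    plus = sum(p * poly_kernel(x, point) for x, p in zip(class_a, psi_a))
    minus = sum(p * poly_kernel(x, point) for x, p in zip(class_b, psi_b))
    return plus - minus

def balance_terms(class_a, psi_a, class_b, psi_b):
    """Return both sides of Eq. (1.33) without the eigenvalue factor."""
    lhs = sum(kappa_dot(x, class_a, psi_a, class_b, psi_b) for x in class_a)
    rhs = -sum(kappa_dot(x, class_a, psi_a, class_b, psi_b) for x in class_b)
    return lhs, rhs

class_a, psi_a = [(1.0, 0.0)], [0.25]   # one class A extreme point
class_b, psi_b = [(-1.0, 0.0)], [0.25]  # one class B extreme point

lhs, rhs = balance_terms(class_a, psi_a, class_b, psi_b)
# Left-hand side of Eq. (1.35), again without the eigenvalue factor.
equilibrium_residual = lhs - rhs
print(lhs, rhs, equilibrium_residual)
```

For this symmetric configuration the two sides coincide and the residual vanishes; for arbitrary scale factors the residual is generally nonzero, which is why the balance is described as a condition that the learned scale factors must satisfy.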

(197) A discriminant function of the invention satisfies the geometric locus of a quadratic decision boundary of a minimum risk quadratic classification system in terms of the critical minimum eigenenergy $\| Z | \kappa \|_{\min_{c}}^{2}$ and the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa\|_{\min_{c}}^{2})$ exhibited by a dual locus $\kappa$, wherein the total allowed eigenenergy $\| Z | \kappa \|_{\min_{c}}^{2}$ and the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa\|_{\min_{c}}^{2})$ exhibited by the dual locus of $\kappa$ determine the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa\|_{\min_{c}}^{2})$ and the total allowed eigenenergy $\| Z | \kappa \|_{\min_{c}}^{2}$ exhibited by the minimum risk quadratic classification system.

(198) The KKT condition in Eq. (1.7) on the Lagrangian function in Eq. (1.2) and the theorem of Karush, Kuhn, and Tucker determine the manner in which a discriminant function of the invention satisfies the geometric loci of the quadratic decision boundary in Eq. (1.15) and the quadratic decision borders in Eqs (1.16) and (1.17).

(199) Accordingly, given a Wolfe dual geometric locus of scaled unit extreme vectors

(200) $\psi = \sum_{i = 1}^{l} \psi_{i^{*}} \frac{k_{x_{i^{*}}}}{\|k_{x_{i^{*}}}\|},$
wherein $\{\psi_{i^{*}} > 0\}_{i = 1}^{l}$ and $\sum_{i = 1}^{l} \psi_{i^{*}} y_{i} = 0$, it follows that the $l$ likelihood components and corresponding principal eigenaxis components $\{\psi_{i^{*}} k_{x_{i^{*}}}\}_{i = 1}^{l}$ on the dual locus of $\kappa$ satisfy the system of locus equations:

(201) $\begin{matrix} \psi_{i^{*}} [y_{i} (k_{x_{i^{*}}} \kappa + \kappa_{0}) - 1 + \xi_{i}] = 0, \; i = 1, \ldots, l, & (1.36) \end{matrix}$
within the Wolfe dual principal eigenspace of the minimum risk quadratic classification system, wherein either $\xi_{i} = \xi = 0$ or $\xi_{i} = \xi \ll 1$, e.g. $\xi_{i} = \xi = 0.02$.
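The system of locus equations in Eq. (1.36) has the form of a complementary slackness condition: each product of a scale factor with its bracketed constraint term must vanish. A minimal sketch of how such residuals could be evaluated is given below. The function name `locus_residuals` and all numeric values are hypothetical, and `scores[i]` stands for the discriminant value (kernel term plus offset) at training point `i`, which is assumed to have been computed elsewhere.

```python
def locus_residuals(psi, y, scores, xi):
    """Residuals psi_i * (y_i * score_i - 1 + xi_i) of Eq. (1.36).

    scores[i] is assumed to hold the discriminant value at point i.
    """
    return [p * (label * score - 1.0 + slack)
            for p, label, score, slack in zip(psi, y, scores, xi)]

# Hypothetical values: two extreme points lying on the decision borders
# (score = +1 and -1) and one other point with a zero scale factor.
psi = [0.25, 0.25, 0.0]
y = [+1, -1, +1]
scores = [1.0, -1.0, 3.0]
xi = [0.0, 0.0, 0.0]

residuals = locus_residuals(psi, y, scores, xi)
print(residuals)
```

A residual is zero either because the scale factor is zero or because the point satisfies its border equation up to the regularization parameter; with a small nonzero parameter such as 0.02, a border point would have a score of 0.98 rather than 1.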

(202) Take the set $\{\psi_{1 i^{*}} k_{x_{1 i^{*}}}\}_{i = 1}^{l_{1}}$ of $l_{1}$ extreme vectors that belong to class A. Using Eq. (1.36) and letting $y_{i} = +1$, it follows that the total allowed eigenenergy and the minimum expected risk exhibited by $\kappa_{1}$ are both determined by the identity

(203) $\begin{matrix} \| Z | \kappa_{1} \|_{\min_{c}}^{2} - \|\kappa_{1}\| \|\kappa_{2}\| \cos \theta_{\kappa_{1} \kappa_{2}} \equiv \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} (1 - \xi_{i} - \kappa_{0}), & (1.37) \end{matrix}$
wherein the constrained discriminant function $k_{s} \kappa + \kappa_{0} = +1$ satisfies the geometric locus of the quadratic decision border in Eq. (1.16) in terms of the critical minimum eigenenergy $\| Z | \kappa_{1} \|_{\min_{c}}^{2}$ and the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa_{1}\|_{\min_{c}}^{2})$ exhibited by $\kappa_{1}$, and wherein the eigenenergy functional $\| Z | \kappa_{1} \|_{\min_{c}}^{2} - \|\kappa_{1}\| [\|\kappa_{2}\| \cos \theta_{\kappa_{1} \kappa_{2}}]$ is equivalent to the functional

(204) $\sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} (1 - \xi_{i} - \kappa_{0})$
within the Wolfe dual principal eigenspace of the dual locus of $\kappa_{1} - \kappa_{2}$, and wherein $\kappa_{1}$ and $\psi_{1}$ are symmetrically and equivalently related to each other within the Wolfe dual principal eigenspace.

(205) Take the set $\{\psi_{2 i^{*}} k_{x_{2 i^{*}}}\}_{i = 1}^{l_{2}}$ of $l_{2}$ extreme vectors that belong to class B. Using Eq. (1.36) and letting $y_{i} = -1$, it follows that the total allowed eigenenergy and the minimum expected risk exhibited by $\kappa_{2}$ are both determined by the identity

(206) $\begin{matrix} \| Z | \kappa_{2} \|_{\min_{c}}^{2} - \|\kappa_{2}\| \|\kappa_{1}\| \cos \theta_{\kappa_{2} \kappa_{1}} \equiv \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} (1 - \xi_{i} + \kappa_{0}), & (1.38) \end{matrix}$
wherein the constrained discriminant function $k_{s} \kappa + \kappa_{0} = -1$ satisfies the geometric locus of the quadratic decision border in Eq. (1.17) in terms of the critical minimum eigenenergy $\| Z | \kappa_{2} \|_{\min_{c}}^{2}$ and the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa_{2}\|_{\min_{c}}^{2})$ exhibited by $\kappa_{2}$, and wherein the eigenenergy functional $\| Z | \kappa_{2} \|_{\min_{c}}^{2} - \|\kappa_{2}\| [\|\kappa_{1}\| \cos \theta_{\kappa_{2} \kappa_{1}}]$ is equivalent to the functional

(207) $\sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} (1 - \xi_{i} + \kappa_{0})$
within the Wolfe dual principal eigenspace of the dual locus of $\kappa_{1} - \kappa_{2}$, and wherein $\kappa_{2}$ and $\psi_{2}$ are symmetrically and equivalently related to each other within the Wolfe dual principal eigenspace.

(208) Summation over the complete system of locus equations that are satisfied by $\kappa_{1}$

(209) $\left( \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}} \right) \kappa = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} (1 - \xi_{i} - \kappa_{0})$
and by $\kappa_{2}$

(210) $\left( - \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}} \right) \kappa = \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} (1 - \xi_{i} + \kappa_{0}),$
and using the equilibrium constraint on the dual locus of $\psi$ in Eq. (1.22), wherein the principal eigenaxis of $\psi$ is in statistical equilibrium, produces the identity that determines the total allowed eigenenergy $\| Z | \kappa \|_{\min_{c}}^{2}$ and the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa\|_{\min_{c}}^{2})$ exhibited by the dual locus of $\kappa$:

(211) $\begin{matrix} (\kappa_{1} - \kappa_{2}) \kappa \equiv \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} (1 - \xi_{i} - \kappa_{0}) + \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} (1 - \xi_{i} + \kappa_{0}) \equiv \sum_{i = 1}^{l} \psi_{i^{*}} (1 - \xi_{i}), & (1.39) \end{matrix}$
wherein the constrained discriminant function $k_{s} \kappa + \kappa_{0} = 0$ satisfies the geometric locus of the quadratic decision boundary in Eq. (1.15) in terms of the critical minimum eigenenergy $\| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}$ and the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2})$ exhibited by the dual locus of $\kappa$, and wherein the eigenenergy functional $\| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}$ is equivalent to the functional:

(212) $\begin{matrix} \| Z | \kappa \|_{\min_{c}}^{2} = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\|k_{x_{1 i^{*}}}\|} (1 - \xi_{i} - \kappa_{0}) + \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\|k_{x_{2 i^{*}}}\|} (1 - \xi_{i} + \kappa_{0}) \\ \equiv \sum_{i = 1}^{l} \psi_{i^{*}} \frac{k_{x_{i^{*}}}}{\|k_{x_{i^{*}}}\|} (1 - \xi_{i}), \end{matrix}$
within the Wolfe dual principal eigenspace of the dual locus of $\kappa_{1} - \kappa_{2}$, and wherein the dual loci of $\kappa$ and $\psi$ are symmetrically and equivalently related to each other within the Wolfe dual principal eigenspace.

(213) Given Eq. (1.39), it follows that the total allowed eigenenergy $\| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}$ and the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2})$ exhibited by the dual locus of $\kappa$ are both determined by the integrated magnitudes $\psi_{i^{*}}$ of the principal eigenaxis components on the dual locus of $\psi$

(214) $(\kappa_{1} - \kappa_{2}) \kappa \equiv \sum_{i = 1}^{l} \psi_{i^{*}} (1 - \xi_{i}) \equiv \sum_{i = 1}^{l} \psi_{i^{*}} - \sum_{i = 1}^{l} \psi_{i^{*}} \xi_{i},$
wherein regularization parameters $\xi_{i} = \xi \ll 1$ determine negligible constraints on the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2})$ and the total allowed eigenenergy $\| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}$ exhibited by the dual locus of $\kappa$.
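The size of the regularization correction in the identity above can be illustrated with simple arithmetic. The sketch below splits the sum of scale factors weighted by one minus the regularization parameter into the plain sum and the correction term; the scale factors are hypothetical values, the regularization value 0.02 is the example given in the text, and the function name `eigenenergy_functional` is an assumption introduced for illustration.

```python
def eigenenergy_functional(psi, xi):
    """Return (sum psi_i, sum psi_i * xi_i, sum psi_i * (1 - xi_i))."""
    total = sum(psi)
    correction = sum(p * x for p, x in zip(psi, xi))
    return total, correction, total - correction

psi = [0.5, 0.25, 0.75]    # hypothetical scale factors
xi = [0.02, 0.02, 0.02]    # example regularization value from the text

total, correction, functional = eigenenergy_functional(psi, xi)
relative_shift = correction / total  # fraction removed by regularization
print(total, correction, functional, relative_shift)
```

When every regularization parameter takes the same small value, the relative shift equals that value, so a setting of 0.02 changes the functional by two percent.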

(215) Now, take any given collection $\{x_{i}\}_{i = 1}^{N}$ of feature vectors $x_{i}$ that are inputs to one of the machine learning algorithms of the invention, wherein each feature vector $x_{i}$ has a label $y_{i}$, wherein $y_{i} = +1$ if $x_{i} \in A$ and $y_{i} = -1$ if $x_{i} \in B$.

(216) The system of locus equations in Eqs (1.37)-(1.39) determines the manner in which a constrained discriminant function of the invention satisfies parametric, primary and secondary integral equations of binary classification over the decision space of a minimum risk quadratic classification system of the invention. The primary integral equation is devised first.

(217) Using Eq. (1.11), Eq. (1.13), Eq. (1.22) and Eqs (1.37)-(1.39), it follows that the constrained discriminant function

(218) $D (s) = k_{s} \kappa - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \kappa + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) :$ $D (s) = 0, \; D (s) = + 1, \text{ and } D (s) = - 1,$
satisfies the locus equations

(219) $\begin{matrix} \| Z | \kappa_{1} \|_{\min_{c}}^{2} - \|\kappa_{1}\| \|\kappa_{2}\| \cos \theta_{\kappa_{1} \kappa_{2}} + \delta (y) \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} \equiv \frac{1}{2} \| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}, & (1.40) \end{matrix}$
and

(220) $\begin{matrix} \| Z | \kappa_{2} \|_{\min_{c}}^{2} - \|\kappa_{2}\| \|\kappa_{1}\| \cos \theta_{\kappa_{2} \kappa_{1}} - \delta (y) \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} \equiv \frac{1}{2} \| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}, & (1.41) \end{matrix}$
over the decision regions $Z_{1}$ and $Z_{2}$ of the decision space $Z$ of the minimum risk quadratic classification system

(221) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0,$
wherein the parameters

(222) $\delta (y) \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}}$
and

(223) $- \delta (y) \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} : \; \delta (y) \triangleq \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i})$
are equalizer statistics.
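The constrained discriminant function above combines a kernel term, an average over the extreme points, and an average of the signed labels. A minimal sketch of how it could be evaluated is shown below. It assumes a second-degree polynomial reproducing kernel `(x.s + 1)**2`, which is one choice consistent with a kernel containing first and second degree components but is not confirmed here as the kernel of the invention; the extreme points, scale factors, and helper names `poly_kernel`, `make_discriminant`, and `classify` are hypothetical.

```python
def poly_kernel(x, s):
    """Assumed reproducing kernel (x.s + 1)^2."""
    return (sum(a * b for a, b in zip(x, s)) + 1.0) ** 2

def make_discriminant(extreme_points, labels, psi, xi):
    """Build D(s) from signed, scaled kernels of the extreme points."""
    l = len(extreme_points)

    def k_dot_kappa(s):
        # Kernel of s paired with the signed, scaled sum of kernels.
        return sum(y * p * poly_kernel(x, s)
                   for x, y, p in zip(extreme_points, labels, psi))

    # Offset: minus the average kernel term over the extreme points,
    # plus the average of y_i * (1 - xi_i).
    kappa_0 = (-sum(k_dot_kappa(x) for x in extreme_points) / l
               + sum(y * (1.0 - e) for y, e in zip(labels, xi)) / l)

    def discriminant(s):
        return k_dot_kappa(s) + kappa_0

    return discriminant

D = make_discriminant([(1.0, 0.0), (-1.0, 0.0)], [+1, -1],
                      [0.25, 0.25], [0.0, 0.0])

def classify(s):
    # Ties at D(s) = 0 are sent to class B here; that choice is arbitrary.
    return "A" if D(s) > 0 else "B"

print(D((1.0, 0.0)), D((-1.0, 0.0)), D((0.0, 5.0)), classify((2.0, 3.0)))
```

In this toy configuration the discriminant reduces to the first coordinate of its input, so the two extreme points sit on the borders where the function equals +1 and -1, and the vertical axis is the decision boundary.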

(224) Using Eqs (1.40) and (1.41) along with the identity in Eq. (1.31)

(225) $\sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} \equiv \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} \left( \sum_{j = 1}^{l_{1}} \psi_{1 j^{*}} k_{x_{1 j^{*}}} - \sum_{j = 1}^{l_{2}} \psi_{2 j^{*}} k_{x_{2 j^{*}}} \right),$
and the identity in Eq. (1.32)

(226) $\sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} \equiv \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} \left( \sum_{j = 1}^{l_{2}} \psi_{2 j^{*}} k_{x_{2 j^{*}}} - \sum_{j = 1}^{l_{1}} \psi_{1 j^{*}} k_{x_{1 j^{*}}} \right),$
it follows that the constrained discriminant function satisfies the locus equation over the decision regions $Z_{1}$ and $Z_{2}$ of the decision space $Z$ of the minimum risk quadratic classification system:

(227) $\begin{matrix} \| Z | \kappa_{1} \|_{\min_{c}}^{2} - \|\kappa_{1}\| \|\kappa_{2}\| \cos \theta_{\kappa_{1} \kappa_{2}} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} \kappa = \| Z | \kappa_{2} \|_{\min_{c}}^{2} - \|\kappa_{2}\| \|\kappa_{1}\| \cos \theta_{\kappa_{2} \kappa_{1}} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} \kappa, & (1.42) \end{matrix}$
wherein both the left-hand side and the right-hand side of Eq. (1.42) satisfy half the total allowed eigenenergy $\| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}$ and half the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2})$ exhibited by the minimum risk quadratic classification system

(228) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0.$

(229) Returning to the integral in Eq. (1.29):
$P (k_{x_{1 i^{*}}} | \kappa_{1}) = \int_{Z} \kappa_{1} \, d\kappa_{1} = \| Z | \kappa_{1} \|_{\min_{c}}^{2} + C_{1},$
wherein the above integral determines a conditional probability $P (k_{x_{1 i^{*}}} | \kappa_{1})$ for class A, and to the integral in Eq. (1.30)
$P (k_{x_{2 i^{*}}} | \kappa_{2}) = \int_{Z} \kappa_{2} \, d\kappa_{2} = \| Z | \kappa_{2} \|_{\min_{c}}^{2} + C_{2},$
wherein the above integral determines a conditional probability $P (k_{x_{2 i^{*}}} | \kappa_{2})$ for class B, it follows that the value for the integration constant $C_{1}$ in Eq. (1.29) is: $C_{1} = - \|\kappa_{1}\| \|\kappa_{2}\| \cos \theta_{\kappa_{1} \kappa_{2}}$, and the value for the integration constant $C_{2}$ in Eq. (1.30) is: $C_{2} = - \|\kappa_{2}\| \|\kappa_{1}\| \cos \theta_{\kappa_{2} \kappa_{1}}$.

(230) Substituting the value for $C_{1}$ into Eq. (1.29), and using Eq. (1.29) and Eq. (1.42), it follows that the conditional probability $P (k_{x_{1 i^{*}}} | \kappa_{1})$ for class A, wherein the integral of the conditional density function $p (k_{x_{1 i^{*}}} | \kappa_{1})$ for class A is given by the integral:

(231) $\begin{matrix} \begin{matrix} P (k_{x_{1 i^{*}}} | \kappa_{1}) = \int_{Z} p (k_{x_{1 i^{*}}} | \kappa_{1}) \, d\kappa_{1} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} (\kappa_{1} - \kappa_{2}) \\ = \int_{Z} \kappa_{1} \, d\kappa_{1} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} (\kappa_{1} - \kappa_{2}) \\ = \| Z | \kappa_{1} \|_{\min_{c}}^{2} - \|\kappa_{1}\| \|\kappa_{2}\| \cos \theta_{\kappa_{1} \kappa_{2}} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} (\kappa_{1} - \kappa_{2}) \\ \equiv \frac{1}{2} \| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2} \equiv \frac{1}{2} \mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2}), \end{matrix} & (1.43) \end{matrix}$
over the decision space $Z = Z_{1} + Z_{2}$ of the minimum risk quadratic classification system, is determined by half the total allowed eigenenergy

(232) $\frac{1}{2} \| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}$
and half the minimum expected risk

(233) $\frac{1}{2} \mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2})$
that is exhibited by the dual locus of $\kappa = \kappa_{1} - \kappa_{2}$.

(234) Substituting the value for $C_{2}$ into Eq. (1.30), and using Eq. (1.30) and Eq. (1.42), it follows that the conditional probability $P (k_{x_{2 i^{*}}} | \kappa_{2})$ for class B, wherein the integral of the conditional density function $p (k_{x_{2 i^{*}}} | \kappa_{2})$ for class B is given by the integral:

(235) $\begin{matrix} \begin{matrix} P (k_{x_{2 i^{*}}} | \kappa_{2}) = \int_{Z} p (k_{x_{2 i^{*}}} | \kappa_{2}) \, d\kappa_{2} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} (\kappa_{1} - \kappa_{2}) \\ = \int_{Z} \kappa_{2} \, d\kappa_{2} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} (\kappa_{1} - \kappa_{2}) \\ = \| Z | \kappa_{2} \|_{\min_{c}}^{2} - \|\kappa_{2}\| \|\kappa_{1}\| \cos \theta_{\kappa_{2} \kappa_{1}} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} (\kappa_{1} - \kappa_{2}) \\ \equiv \frac{1}{2} \| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2} \equiv \frac{1}{2} \mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2}), \end{matrix} & (1.44) \end{matrix}$
over the decision space $Z = Z_{1} + Z_{2}$ of the minimum risk quadratic classification system, is determined by half the total allowed eigenenergy

(236) $\frac{1}{2} \| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}$
and half the minimum expected risk

(237) $\frac{1}{2} \mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2})$
that is exhibited by the dual locus of $\kappa = \kappa_{1} - \kappa_{2}$.

(238) Given Eqs (1.43) and (1.44), it follows that the integral of the conditional density function $p (k_{x_{1 i^{*}}} | \kappa_{1})$ for class A and the integral of the conditional density function $p (k_{x_{2 i^{*}}} | \kappa_{2})$ for class B are both constrained to satisfy half the total allowed eigenenergy

(239) $\frac{1}{2} \| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}$
and half the minimum expected risk

(240) $\frac{1}{2} \mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2})$
that is exhibited by the minimum risk quadratic classification system

(241) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0.$

(242) Therefore, the conditional probability $P (k_{x_{1 i^{*}}} | \kappa_{1})$ of observing the set $\{k_{x_{1 i^{*}}}\}_{i = 1}^{l_{1}}$ of $l_{1}$ extreme points $k_{x_{1 i^{*}}}$ from class A within localized regions of the decision space $Z = Z_{1} + Z_{2}$ of the minimum risk quadratic classification system is equal to the conditional probability $P (k_{x_{2 i^{*}}} | \kappa_{2})$ of observing the set $\{k_{x_{2 i^{*}}}\}_{i = 1}^{l_{2}}$ of $l_{2}$ extreme points $k_{x_{2 i^{*}}}$ from class B within localized regions of the decision space $Z = Z_{1} + Z_{2}$ of the minimum risk quadratic classification system, wherein $P (k_{x_{1 i^{*}}} | \kappa_{1}) = P (k_{x_{2 i^{*}}} | \kappa_{2})$, and wherein all of the extreme points belong to the collection of feature vectors $\{x_{i}\}_{i = 1}^{N}$ that are inputs to a machine learning algorithm of the invention.

(243) Therefore, minimum risk quadratic classification systems of the invention exhibit a novel property of computer-implemented quadratic classification systems, wherein for any given collection of feature vectors {x.sub.i}.sub.i=1.sup.N that are inputs to one of the machine learning algorithms of the invention: (1) the conditional probability, (2) the minimum expected risk, and (3) the total allowed eigenenergy exhibited by a minimum risk quadratic classification system for class A is equal to (1) the conditional probability, (2) the minimum expected risk, and (3) the total allowed eigenenergy exhibited by the minimum risk quadratic classification system for class B.

(244) Using Eqs (1.43) and (1.44), it follows that the constrained discriminant function of the invention

(245) $D (s) = k_{s} \kappa - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \kappa + \frac{1}{l} \sum_{i = 1}^{l} y_{i} (1 - \xi_{i}) :$ $D (s) = 0, \; D (s) = + 1, \text{ and } D (s) = - 1,$
is the solution of the parametric, fundamental integral equation of binary classification:

(246) $\begin{matrix} f_{1} (D (s)) = \int_{Z_{1}} \kappa_{1} \, d\kappa_{1} + \int_{Z_{2}} \kappa_{1} \, d\kappa_{1} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} (\kappa_{1} - \kappa_{2}) = \int_{Z_{1}} \kappa_{2} \, d\kappa_{2} + \int_{Z_{2}} \kappa_{2} \, d\kappa_{2} + \delta (y) \lambda_{\max_{\psi}}^{-1} \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} (\kappa_{1} - \kappa_{2}), & (1.45) \end{matrix}$
over the decision space $Z = Z_{1} + Z_{2}$ of the minimum risk quadratic classification system

(247) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0$
of the invention, wherein the decision space $Z$ is spanned by symmetrical decision regions $Z_{1} + Z_{2} = Z : Z_{1} \simeq Z_{2}$, and wherein the conditional probability $P (Z_{1} | \kappa_{1})$ and the counter risk $\mathfrak{R}_{\min} (Z_{1} | \|\kappa_{1}\|_{\min_{c}}^{2})$ and the eigenenergy $\| Z_{1} | \kappa_{1} \|_{\min_{c}}^{2}$ of class A within the decision region $Z_{1}$, along with the conditional probability $P (Z_{2} | \kappa_{1})$ and the risk $\mathfrak{R}_{\min} (Z_{2} | \|\kappa_{1}\|_{\min_{c}}^{2})$ and the eigenenergy $\| Z_{2} | \kappa_{1} \|_{\min_{c}}^{2}$ of class A within the decision region $Z_{2}$, are symmetrically balanced with the conditional probability $P (Z_{1} | \kappa_{2})$ and the risk $\mathfrak{R}_{\min} (Z_{1} | \|\kappa_{2}\|_{\min_{c}}^{2})$ and the eigenenergy $\| Z_{1} | \kappa_{2} \|_{\min_{c}}^{2}$ of class B within the decision region $Z_{1}$, along with the conditional probability $P (Z_{2} | \kappa_{2})$ and the counter risk $\mathfrak{R}_{\min} (Z_{2} | \|\kappa_{2}\|_{\min_{c}}^{2})$ and the eigenenergy $\| Z_{2} | \kappa_{2} \|_{\min_{c}}^{2}$ of class B within the decision region $Z_{2}$, and wherein the conditional probability $P (Z | \kappa_{1} - \kappa_{2})$ and the minimum expected risk $\mathfrak{R}_{\min} (Z | \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2})$ and the total allowed eigenenergy $\| Z | \kappa_{1} - \kappa_{2} \|_{\min_{c}}^{2}$ exhibited by the minimum risk quadratic classification system are jointly regulated by the primal equilibrium point:

(248) $\lambda_{\max_{\psi}}^{-1} \left( \sum_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} + \sum_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} \right) \{\kappa_{1} - \kappa_{2}\} = 0$
and the Wolfe dual equilibrium point:

(249) $\sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} \frac{k_{x_{1 i^{*}}}}{\|k_{x_{1 i^{*}}}\|} - \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} \frac{k_{x_{2 i^{*}}}}{\|k_{x_{2 i^{*}}}\|} = 0$
of the integral equation $f_{1} (D (s))$.

(250) Further, the novel principal eigenaxis of the invention, which determines discriminant functions of the invention along with minimum risk quadratic classification systems of the invention, satisfies the law of cosines in the symmetrically balanced manner that is outlined below.

(251) Any given geometric locus of signed and scaled reproducing kernels of extreme points:

(252) $\kappa = \sum_{i = 1}^{l_{1}} \psi_{1 i^{*}} k_{x_{1 i^{*}}} - \sum_{i = 1}^{l_{2}} \psi_{2 i^{*}} k_{x_{2 i^{*}}} = \kappa_{1} - \kappa_{2},$
wherein the geometric locus of a principal eigenaxis $\kappa$ determines a dual locus of likelihood components and principal eigenaxis components $\kappa = \kappa_{1} - \kappa_{2}$ that represents a discriminant function $D (s) = k_{s} \kappa + \kappa_{0}$ of the invention, wherein principal eigenaxis components and corresponding likelihood components $\psi_{1 i^{*}} k_{x_{1 i^{*}}}$ and $\psi_{2 i^{*}} k_{x_{2 i^{*}}}$ on the dual locus of $\kappa_{1} - \kappa_{2}$ determine conditional densities and conditional likelihoods for respective extreme points $k_{x_{1 i^{*}}}$ and $k_{x_{2 i^{*}}}$, and wherein the geometric locus of the principal eigenaxis $\kappa$ determines an intrinsic coordinate system $\kappa_{1} - \kappa_{2}$ of a quadratic decision boundary $k_{s} \kappa + \kappa_{0} = 0$ and an eigenaxis of symmetry

(253) $\left( k_{s} - \frac{1}{l} \sum_{i = 1}^{l} k_{x_{i^{*}}} \right) (\kappa_{1} - \kappa_{2})$
for the decision space $Z_{1} + Z_{2} = Z : Z_{1} \simeq Z_{2}$ of a minimum risk quadratic classification system

(254) $k_{s} \kappa + \kappa_{0} \underset{B}{\overset{A}{\gtrless}} 0$
of the invention, satisfies the law of cosines

(255) $\begin{matrix} \|\kappa\|_{\min_{c}}^{2} = \|\kappa_{1} - \kappa_{2}\|_{\min_{c}}^{2} \\ = \|\kappa_{1}\|_{\min_{c}}^{2} + \|\kappa_{2}\|_{\min_{c}}^{2} - 2 \|\kappa_{1}\| \|\kappa_{2}\| \cos \theta_{\kappa_{1} \kappa_{2}} \end{matrix}$
in the symmetrically balanced manner:

(256) $\begin{matrix} \frac{1}{2} \|\kappa\|_{\min_{c}}^{2} = \|\kappa_{1}\|_{\min_{c}}^{2} - \|\kappa_{1}\| \|\kappa_{2}\| \cos \theta_{\kappa_{1} \kappa_{2}} \\ = \|\kappa_{2}\|_{\min_{c}}^{2} - \|\kappa_{2}\| \|\kappa_{1}\| \cos \theta_{\kappa_{2} \kappa_{1}}, \end{matrix}$
wherein $\theta$ is the angle between $\kappa_{1}$ and $\kappa_{2}$, and wherein the dual locus of likelihood components and principal eigenaxis components exhibits symmetrical dimensions and density, wherein the total allowed eigenenergy $\|\kappa_{1}\|_{\min_{c}}^{2}$ exhibited by the dual locus of components $p (k_{x_{1 i^{*}}} | \kappa_{1})$ given class A is symmetrically balanced with the total allowed eigenenergy $\|\kappa_{2}\|_{\min_{c}}^{2}$ exhibited by the dual locus of components $p (k_{x_{2 i^{*}}} | \kappa_{2})$ given class B:
$\|\kappa_{1}\|_{\min_{c}}^{2} = \|\kappa_{2}\|_{\min_{c}}^{2},$
wherein the length of side $\kappa_{1}$ equals the length of side $\kappa_{2}$
$\|\kappa_{1}\| = \|\kappa_{2}\|,$
and wherein components of likelihood components and principal eigenaxis components of class A, along the axis of $\kappa_{1}$, are symmetrically balanced with components of likelihood components and principal eigenaxis components of class B, along the axis of $\kappa_{2}$:

(257) $\|\kappa_{1}\| \sum_{i = 1}^{l_{1}} \operatorname{comp}_{\overrightarrow{\kappa_{1}}} (\overrightarrow{\psi_{1 i^{*}} k_{x_{1 i^{*}}}}) = \|\kappa_{2}\| \sum_{i = 1}^{l_{2}} \operatorname{comp}_{\overrightarrow{\kappa_{2}}} (\overrightarrow{\psi_{2 i^{*}} k_{x_{2 i^{*}}}}),$
wherein components of critical minimum eigenenergies exhibited by corresponding components of scaled extreme vectors from class A and corresponding counter risks and risks for class A, along the axis of $\kappa_{1}$, are symmetrically balanced with components of critical minimum eigenenergies exhibited by corresponding components of scaled extreme vectors from class B and corresponding counter risks and risks for class B, along the axis of $\kappa_{2}$, and wherein the opposing component of $\kappa_{2}$, along the axis of $\kappa_{1}$, is symmetrically balanced with the opposing component of $\kappa_{1}$, along the axis of $\kappa_{2}$:
$\|\kappa_{1}\| [- \|\kappa_{2}\| \cos \theta_{\kappa_{1} \kappa_{2}}] = \|\kappa_{2}\| [- \|\kappa_{1}\| \cos \theta_{\kappa_{2} \kappa_{1}}],$
wherein opposing components of likelihood components and principal eigenaxis components of class B, along the axis of $\kappa_{1}$, are symmetrically balanced with opposing components of likelihood components and principal eigenaxis components of class A, along the axis of $\kappa_{2}$:

(258) $\|\kappa_{1}\| \sum_{i = 1}^{l_{2}} - \operatorname{comp}_{\overrightarrow{\kappa_{1}}} (\overrightarrow{\psi_{2 i^{*}} k_{x_{2 i^{*}}}}) = \|\kappa_{2}\| \sum_{i = 1}^{l_{1}} - \operatorname{comp}_{\overrightarrow{\kappa_{2}}} (\overrightarrow{\psi_{1 i^{*}} k_{x_{1 i^{*}}}}),$
wherein opposing components of critical minimum eigenenergies exhibited by corresponding components of scaled extreme vectors from class B and corresponding counter risks and risks for class B, along the axis of $\kappa_{1}$, are symmetrically balanced with opposing components of critical minimum eigenenergies exhibited by corresponding components of scaled extreme vectors from class A and corresponding counter risks and risks for class A, along the axis of $\kappa_{2}$, and wherein opposing and counteracting random forces and influences of the minimum risk quadratic classification system of the invention are symmetrically balanced with each other about the geometric center of the dual locus $\kappa$:

(259) $\|\kappa_{1}\| \left( \sum_{i = 1}^{l_{1}} \operatorname{comp}_{\overrightarrow{\kappa_{1}}} (\overrightarrow{\psi_{1 i^{*}} k_{x_{1 i^{*}}}}) - \sum_{i = 1}^{l_{2}} \operatorname{comp}_{\overrightarrow{\kappa_{1}}} (\overrightarrow{\psi_{2 i^{*}} k_{x_{2 i^{*}}}}) \right) = \|\kappa_{2}\| \left( \sum_{i = 1}^{l_{2}} \operatorname{comp}_{\overrightarrow{\kappa_{2}}} (\overrightarrow{\psi_{2 i^{*}} k_{x_{2 i^{*}}}}) - \sum_{i = 1}^{l_{1}} \operatorname{comp}_{\overrightarrow{\kappa_{2}}} (\overrightarrow{\psi_{1 i^{*}} k_{x_{1 i^{*}}}}) \right),$ wherein the statistical fulcrum of the geometric locus of the principal eigenaxis $\kappa$ is located.

(260) Accordingly, counteracting and opposing components of critical minimum eigenenergies exhibited by corresponding components of all of the scaled extreme vectors on the geometric locus of the principal eigenaxis $\kappa = \kappa_1 - \kappa_2$ of the invention, along the axis of the principal eigenaxis $\kappa$, and corresponding counter risks and risks exhibited by the minimum risk quadratic classification system

(261) $k_s \kappa + \kappa_0 \underset{B}{\overset{A}{\gtrless}} 0$
of the invention, are symmetrically balanced with each other about the geometric center of the dual locus $\kappa$, wherein the statistical fulcrum of $\kappa$ is located. FIG. 12 illustrates regions of counter risk and regions of risk within the decision regions of a minimum risk quadratic classification system in which distributions of feature vectors are overlapping with each other.

(262) Now, take the previous collection {x.sub.i}.sub.i=1.sup.N of labeled feature vectors x.sub.i that are inputs to one of the machine learning algorithms of the invention, wherein each feature vector x.sub.i has a label y.sub.i, wherein y.sub.i=+1 if x.sub.i∈A and y.sub.i=−1 if x.sub.i∈B.

(263) Given that a constrained discriminant function of the invention

(264) $D(s) = \left( k_s - \frac{1}{l} \sum_{i=1}^{l} k_{x_{i^{*}}} \right) \kappa_1 - \left( k_s - \frac{1}{l} \sum_{i=1}^{l} k_{x_{i^{*}}} \right) \kappa_2 + \frac{1}{l} \sum_{i=1}^{l} y_i (1 - \xi_i): \quad D(s) = 0,\ D(s) = +1,\ \text{and}\ D(s) = -1,$
is the solution of the parametric, fundamental integral equation of binary classification in Eq. (1.45), and given that the discriminant function is represented by a dual locus of likelihood components and principal eigenaxis components $\kappa = \kappa_1 - \kappa_2$ that satisfies the law of cosines in the symmetrically balanced manner outlined above, it follows that the constrained discriminant function satisfies the parametric, secondary integral equation of binary classification:

(265) $f_{2} (D (s)) :_{Z_{1}}_{1} d_{1} -_{Z_{1}}_{2} d_{2} + (y)_{\max_{}}^{- 1} {.Math.}_{i = 1}^{l_{1}} k_{x_{1 i^{*}}} (_{1} -_{2}) =_{Z_{2}}_{2} d_{2} -_{Z_{2}}_{1} d_{1} + (y)_{\max_{}}^{- 1} {.Math.}_{i = 1}^{l_{2}} k_{x_{2 i^{*}}} (_{1} -_{2}),$
over the decision regions Z.sub.1 and Z.sub.2 of a minimum risk quadratic classification system, wherein opposing and counteracting random forces and influences of the minimum risk quadratic classification system are symmetrically balanced with each other, within the decision regions Z.sub.1 and Z.sub.2, in the following manners: (1) the eigenenergy $\|Z_1|\kappa_1\|_{\min_c}^2$ and the counter risk $\overline{\mathfrak{R}}_{\min}(Z_1|\|\kappa_1\|_{\min_c}^2)$ and the conditional probability $P(Z_1|\kappa_1)$ of class A are symmetrically balanced with the opposing eigenenergy $\|Z_1|\kappa_2\|_{\min_c}^2$ and the opposing risk $\mathfrak{R}_{\min}(Z_1|\|\kappa_2\|_{\min_c}^2)$ and the opposing conditional probability $P(Z_1|\kappa_2)$ of class B, within the decision region Z.sub.1; (2) the eigenenergy $\|Z_2|\kappa_2\|_{\min_c}^2$ and the counter risk $\overline{\mathfrak{R}}_{\min}(Z_2|\|\kappa_2\|_{\min_c}^2)$ and the conditional probability $P(Z_2|\kappa_2)$ of class B are symmetrically balanced with the opposing eigenenergy $\|Z_2|\kappa_1\|_{\min_c}^2$ and the opposing risk $\mathfrak{R}_{\min}(Z_2|\|\kappa_1\|_{\min_c}^2)$ and the opposing conditional probability $P(Z_2|\kappa_1)$ of class A, within the decision region Z.sub.2; (3) the eigenenergy $\|Z_1|\kappa_1\|_{\min_c}^2$ and the counter risk $\overline{\mathfrak{R}}_{\min}(Z_1|\|\kappa_1\|_{\min_c}^2)$ and the conditional probability $P(Z_1|\kappa_1)$ of class A, along with the opposing eigenenergy $\|Z_1|\kappa_2\|_{\min_c}^2$ and the opposing risk $\mathfrak{R}_{\min}(Z_1|\|\kappa_2\|_{\min_c}^2)$ and the opposing conditional probability $P(Z_1|\kappa_2)$ of class B, within the decision region Z.sub.1, are symmetrically balanced with the eigenenergy $\|Z_2|\kappa_2\|_{\min_c}^2$ and the counter risk $\overline{\mathfrak{R}}_{\min}(Z_2|\|\kappa_2\|_{\min_c}^2)$ and the conditional probability $P(Z_2|\kappa_2)$ of class B, along with the opposing eigenenergy $\|Z_2|\kappa_1\|_{\min_c}^2$ and the opposing risk $\mathfrak{R}_{\min}(Z_2|\|\kappa_1\|_{\min_c}^2)$ and the opposing conditional probability $P(Z_2|\kappa_1)$ of class A, within the decision region Z.sub.2, wherein the minimum risk quadratic classification system satisfies a state of statistical equilibrium, wherein the expected risk $\mathfrak{R}_{\min}(Z|\|\kappa_1 - \kappa_2\|_{\min_c}^2)$ and the total allowed eigenenergy $\|Z|\kappa_1 - \kappa_2\|_{\min_c}^2$ exhibited by the minimum risk quadratic classification system are minimized, and wherein the minimum risk quadratic classification system exhibits the minimum probability of error for classifying feature vectors that belong to or are related to the given collection {x.sub.i}.sub.i=1.sup.N of feature vectors.

(266) Therefore, minimum risk quadratic classification systems of the invention exhibit a novel and useful property, wherein for any given collection of labeled feature vectors that are inputs to a machine learning algorithm of the invention, the minimum risk quadratic classification system determined by the machine learning algorithm satisfies a state of statistical equilibrium, wherein the expected risk and the total allowed eigenenergy exhibited by the minimum risk quadratic classification system are minimized, and the minimum risk quadratic classification system exhibits the minimum probability of error for classifying the collection of feature vectors and feature vectors related to the collection into two classes.

(267) Further, discriminant functions of minimum risk quadratic classification systems of the invention exhibit a novel and useful property, wherein a discriminant function D(s) of a minimum risk quadratic classification system is determined by a linear combination of a collection of extreme vectors k.sub.x.sub.i*, a collection of signed and scaled extreme vectors $\psi_{1i^{*}} k_{x_{1i^{*}}}$ and $-\psi_{2i^{*}} k_{x_{2i^{*}}}$, a collection of signs y.sub.i=+1 or y.sub.i=−1 associated with the extreme vectors k.sub.x.sub.i*, and a collection of regularization parameters $\xi_i = \xi = 0$ or $\xi_i = \xi \ll 1$:

(268) $D(s) = \left( k_s - \frac{1}{l} \sum_{i=1}^{l} k_{x_{i^{*}}} \right) \left( \sum_{i=1}^{l_1} \psi_{1i^{*}} k_{x_{1i^{*}}} - \sum_{i=1}^{l_2} \psi_{2i^{*}} k_{x_{2i^{*}}} \right) + \frac{1}{l} \sum_{i=1}^{l} y_i (1 - \xi_i),$
wherein the collection of extreme vectors {k.sub.x.sub.i*}.sub.i=1.sup.l belong to a collection of feature vectors {x.sub.i}.sub.i=1.sup.N that are inputs to one of the machine learning algorithms of the invention, and wherein the scale factors of the extreme vectors are determined by the machine learning algorithm used to determine the discriminant function D(s) of the minimum risk quadratic classification system sign (D(s)) that classifies the collection of feature vectors {x.sub.i}.sub.i=1.sup.N into two classes:

(269) $\operatorname{sign}(D(s)) \triangleq k_s \kappa + \kappa_0 \underset{B}{\overset{A}{\gtrless}} 0,$
wherein the output of the minimum risk quadratic classification system sign (D(s)) is related to the two classes, and wherein the minimum risk quadratic classification system sign (D(s)) exhibits the minimum probability of error for classifying feature vectors that belong to or are related to the collection of feature vectors used to determine the system sign (D(s)).
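The linear combination in paragraph (268) can be illustrated with a minimal sketch. It assumes a second-order polynomial reproducing kernel (x·s + 1)^2, which is only one possible kernel choice, and it assumes that the extreme vectors, their signs, their scale factors and the regularization parameters have already been determined. The names `poly2_kernel`, `discriminant` and `classify`, and the toy values, are hypothetical and are not part of the specification.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def poly2_kernel(x: Vector, s: Vector) -> float:
    """Second-order polynomial kernel (x.s + 1)^2; an assumed kernel choice."""
    return (sum(a * b for a, b in zip(x, s)) + 1.0) ** 2


def discriminant(
    s: Vector,
    extreme: Sequence[Vector],
    signs: Sequence[int],
    psi: Sequence[float],
    xi: Sequence[float],
    kernel: Kernel = poly2_kernel,
) -> float:
    """Evaluate D(s) = (k_s - mean k_x*) kappa + mean y_i (1 - xi_i)."""

    def along_kappa(v: Vector) -> float:
        # Inner product of k_v with kappa = sum_i y_i * psi_i * k_{x_i*}.
        return sum(y * p * kernel(x, v) for x, y, p in zip(extreme, signs, psi))

    l = len(extreme)
    mean_extreme = sum(along_kappa(x) for x in extreme) / l
    offset = sum(y * (1.0 - e) for y, e in zip(signs, xi)) / l
    return along_kappa(s) - mean_extreme + offset


def classify(s, extreme, signs, psi, xi, kernel=poly2_kernel) -> int:
    """sign(D(s)): +1 for class A, -1 for class B (D(s) = 0 is mapped to +1 here)."""
    return 1 if discriminant(s, extreme, signs, psi, xi, kernel) >= 0 else -1


# Hypothetical toy values, not taken from the specification.
extreme = [(1.0, 0.0), (-1.0, 0.0)]
signs = [+1, -1]
psi = [0.5, 0.5]
xi = [0.0, 0.0]
print(discriminant((2.0, 0.0), extreme, signs, psi, xi))
print(classify((-3.0, 1.0), extreme, signs, psi, xi))
```

Mapping D(s) = 0 to class A is a convention of this sketch only; the specification treats D(s) = 0 as the decision boundary.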

(270) Therefore, discriminant functions D(s) of a minimum risk quadratic classification system sign(D(s)) provide scalable modules that can be used to determine ensembles

(271) $E = \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s))$
of discriminant functions of minimum risk quadratic classification systems, wherein an ensemble of M−1 discriminant functions of M−1 minimum risk quadratic classification systems exhibits the minimum probability of error for classifying feature vectors that belong to or are related to M given collections of feature vectors.

(272) More specifically, discriminant functions of minimum risk quadratic classification systems provide scalable modules that are used to determine a discriminant function of an M-class minimum risk quadratic classification system that classifies feature vectors into M classes, wherein the total allowed eigenenergy and the minimum expected risk that is exhibited by the M-class minimum risk quadratic classification system is determined by the total allowed eigenenergy and the minimum expected risk that is exhibited by M ensembles of M−1 discriminant functions of M−1 minimum risk quadratic classification systems

(273) $E_M = \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s)),$
wherein each minimum risk quadratic classification system sign(D.sub.ij(s)) of an ensemble

(274) $E_{c_i} = \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s))$
for a given class c.sub.i exhibits a total allowed eigenenergy and a minimum expected risk for a given collection of feature vectors, and wherein the total allowed eigenenergy and the expected risk that is exhibited by the ensemble E.sub.c.sub.i is minimum for M given collections of feature vectors, and wherein the total allowed eigenenergy and the expected risk exhibited by the M-class minimum risk quadratic classification system is minimum for the M given collections of feature vectors.

(275) It follows that discriminant functions of M-class minimum risk quadratic classification systems that are determined by machine learning algorithms of the invention exhibit the minimum probability of error for classifying feature vectors that belong to M collections of feature vectors and unknown feature vectors related to the M collections of feature vectors.

(276) It immediately follows that discriminant functions of minimum risk quadratic classification systems of the invention also provide scalable modules that are used to determine a fused discriminant function of a fused minimum risk quadratic classification system that classifies two types of feature vectors into two classes, wherein each type of feature vector has a different number of vector components. The total allowed eigenenergy and the minimum expected risk exhibited by the fused minimum risk quadratic classification system is determined by the total allowed eigenenergy and the minimum expected risk that is exhibited by an ensemble of a discriminant function of a minimum risk quadratic classification system sign (D(s)) and a different discriminant function of a different minimum risk quadratic classification system

(277) $\operatorname{sign}(\tilde{D}(s)): \quad \overline{\overline{E}}_2 = \operatorname{sign}(D(s)) + \operatorname{sign}(\tilde{D}(s)),$
wherein the total allowed eigenenergy and the expected risk exhibited by the fused minimum risk quadratic classification system is minimum for a given collection of feature vectors and a given collection of different feature vectors.

(278) Any given fused discriminant function of a fused minimum risk quadratic classification system

(279) $\overline{\overline{E}}_2 = \operatorname{sign}(D(s)) + \operatorname{sign}(\tilde{D}(s))$
that is determined by a machine learning algorithm of the invention exhibits the minimum probability of error for classifying feature vectors that belong to or are related to a collection of feature vectors as well as different feature vectors that belong to or are related to a collection of different feature vectors.

(280) Discriminant functions of minimum risk quadratic classification systems of the invention also provide scalable modules that are used to determine a fused discriminant function of a fused M-class minimum risk quadratic classification system that classifies two types of feature vectors into M classes, wherein each type of feature vector has a different number of vector components, and wherein the total allowed eigenenergy and the minimum expected risk exhibited by the fused M-class minimum risk quadratic classification system is determined by the total allowed eigenenergy and the minimum expected risk that is exhibited by M ensembles of M−1 discriminant functions of M−1 minimum risk quadratic classification systems

(281) $E_M = \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s))$
and M different ensembles of M−1 different discriminant functions of M−1 different minimum risk quadratic classification systems

(282) $\tilde{E}_M = \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(\tilde{D}_{ij}(s)):$

(283) $\overline{\overline{E}}_M = \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s)) + \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(\tilde{D}_{ij}(s)),$
and wherein the total allowed eigenenergy and the expected risk exhibited by the fused M-class minimum risk quadratic classification system is minimum for M given collections of feature vectors and M given collections of different feature vectors.

(284) Therefore, fused discriminant functions of fused M-class minimum risk quadratic classification systems that are determined by machine learning algorithms of the invention exhibit the minimum probability of error for classifying feature vectors that belong to M collections of feature vectors and unknown feature vectors related to the M collections of feature vectors as well as different feature vectors that belong to M collections of different feature vectors and unknown different feature vectors related to the M collections of different feature vectors.

(285) Further, given that discriminant functions of the invention determine likely locations of feature vectors that belong to given collections of feature vectors and any given unknown feature vectors related to a given collection, wherein a given collection of feature vectors belongs to two classes, and given that discriminant functions of the invention identify decision regions related to two classes that given collections of feature vectors and any given unknown feature vectors related to a given collection are located within, and given that discriminant functions of the invention recognize classes of feature vectors that belong to given collections of feature vectors and any given unknown feature vectors related to a given collection, wherein minimum risk quadratic classification systems of the invention decide which of two classes that given collections of feature vectors and any given unknown feature vectors related to a given collection belong to, and thereby classify given collections of feature vectors and any given unknown feature vectors related to a given collection, it follows that discriminant functions of minimum risk quadratic classification systems of the invention can be used to determine a classification error rate and a measure of overlap between distributions of feature vectors for two classes of feature vectors. Further, discriminant functions of minimum risk quadratic classification systems of the invention can be used to determine if distributions of two collections of feature vectors are homogenous distributions.

Embodiment 1

(286) The method to determine a discriminant function of a minimum risk quadratic classification system that classifies feature vectors into two classes, designed in accordance with the invention, is fully described within the detailed description of the invention. FIG. 6 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for determining a discriminant function of a minimum risk quadratic classification system that classifies feature vectors into two classes. The process of determining the discriminant function of a minimum risk quadratic classification system comprises the following steps:

(287) Receive an N×d data set of feature vectors within a computer system, wherein N is the number of feature vectors, d is the number of vector components in each feature vector, and each one of the N feature vectors is labeled with information that identifies which of the two classes each one of the N feature vectors belongs to.

(288) Receive unknown feature vectors related to the data set with the computer system.

(289) Choose a reproducing kernel and determine a kernel matrix using the data set by calculating a matrix of all possible inner products of signed reproducing kernels of the N feature vectors, wherein each one of the reproducing kernels of the N feature vectors has a sign of +1 or −1 that identifies which of the two classes each one of the N feature vectors belongs to, and calculate a regularized kernel matrix from the kernel matrix.

(290) Determine the scale factors of a geometric locus of signed and scaled reproducing kernels of extreme points by using the regularized kernel matrix to solve the dual optimization problem in Eq. (1.9).

(291) Determine the extreme vectors on the geometric locus by identifying scale factors in the vector of scale factors that exceed zero by a small threshold T, e.g., T=0.0050.

(292) Determine a sign vector of the signs associated with the extreme vectors using the data set, and compute the average sign using the sign vector.

(293) Determine a locus of risk and compute the average risk using the locus of risk.

(294) Determine a discriminant locus using the N feature vectors and feature vectors being classified to calculate a matrix of inner products between the signed reproducing kernels of the N feature vectors and the reproducing kernels of the feature vectors, and multiply the matrix by the vector of scale factors.

(295) Determine the discriminant function of the minimum risk quadratic classification system, wherein the minimum risk quadratic classification system is determined by computing the sign of the discriminant function, and classify any given unknown feature vectors.
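Several of the steps above can be sketched in code. The sketch assumes a second-order polynomial reproducing kernel, assumes that regularization adds a small constant to the diagonal of the kernel matrix, and assumes that the vector of scale factors has already been obtained from a quadratic programming solver applied to the dual problem of Eq. (1.9), which is not reproduced in this section. All function names and toy values are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def poly2_kernel(x: Vector, s: Vector) -> float:
    """Second-order polynomial kernel (x.s + 1)^2; an assumed kernel choice."""
    return (sum(a * b for a, b in zip(x, s)) + 1.0) ** 2


def signed_kernel_matrix(X: Sequence[Vector], y: Sequence[int]) -> List[List[float]]:
    """All inner products of signed reproducing kernels: y_i * y_j * k(x_i, x_j)."""
    n = len(X)
    return [[y[i] * y[j] * poly2_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]


def regularize(Q: List[List[float]], eps: float = 1e-3) -> List[List[float]]:
    """Assumed regularization: add eps to each diagonal entry."""
    return [[q + (eps if i == j else 0.0) for j, q in enumerate(row)]
            for i, row in enumerate(Q)]


def extreme_indices(psi: Sequence[float], threshold: float = 0.005) -> List[int]:
    """Indices whose scale factor exceeds zero by the small threshold T."""
    return [i for i, p in enumerate(psi) if p > threshold]


def average_sign(y: Sequence[int], idx: Sequence[int]) -> float:
    """Average of the signs associated with the extreme vectors."""
    return sum(y[i] for i in idx) / len(idx)


def discriminant_locus(X, y, psi, S) -> List[float]:
    """Signed kernel inner products with each s in S, weighted by the scale factors."""
    return [sum(y[i] * psi[i] * poly2_kernel(X[i], s) for i in range(len(X))) for s in S]


# Hypothetical toy data and scale factors (psi would come from a QP solver).
X = [(1.0, 0.0), (-1.0, 0.0), (3.0, 0.0)]
y = [+1, -1, +1]
psi = [0.5, 0.5, 0.001]

Q = regularize(signed_kernel_matrix(X, y))
idx = extreme_indices(psi)
print(idx, average_sign(y, idx))
print(discriminant_locus(X, y, psi, [(2.0, 0.0)]))
```

The third toy vector has a scale factor below T=0.005, so it is not counted as an extreme vector.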

Embodiment 2

(296) FIG. 7 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for determining a discriminant function of an M-class minimum risk quadratic classification system that classifies feature vectors into M classes.

(297) A discriminant function of an M-class minimum risk quadratic classification system that classifies feature vectors into M classes is determined by using a machine learning algorithm of the invention and M collections of N feature vectors, wherein each feature vector in a given collection belongs to the same class, to determine M ensembles of M−1 discriminant functions of M−1 minimum risk quadratic classification systems, wherein the determination of each one of the M ensembles involves using the machine learning algorithm to determine M−1 discriminant functions of M−1 minimum risk quadratic classification systems for a class c.sub.i of feature vectors, wherein the N feature vectors that belong to the class c.sub.i have the sign +1 and all of the N feature vectors that belong to all of the other M−1 classes have the sign −1:

(298) $E_{c_i} = \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s)),$
wherein the input of the machine learning algorithm for each discriminant function of a minimum risk quadratic classification system sign (D.sub.ij(s)) is the collection of N feature vectors that belongs to the class c.sub.i and a collection of N feature vectors that belongs to one of the other M−1 classes, and wherein the ensemble E.sub.c.sub.i for class c.sub.i is determined by summing the M−1 discriminant functions of the M−1 minimum risk quadratic classification systems

(299) $E_{c_i} = \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s)),$
wherein the discriminant function D.sub.ij (s) discriminates between feature vectors that belong to class i and class j, and wherein the minimum risk quadratic classification system sign(D.sub.ij(s)) decides which of the two classes i or j that a feature vector s belongs to, according to the sign of +1 or −1 that is output by the signum function sign(D.sub.ij(s)), and wherein the output of the minimum risk quadratic classification system of the ensemble E.sub.c.sub.i is determined by the sum:

(300) $\sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s)).$

(301) Therefore, the M ensembles of the M−1 discriminant functions of the M−1 minimum risk quadratic classification systems

(302) $E_M = \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s))$
determine the discriminant function of an M-class minimum risk quadratic classification system that classifies a feature vector s into the class c.sub.i associated with the ensemble E.sub.c.sub.i that has the largest positive signed output, wherein each ensemble E.sub.c.sub.i of M−1 discriminant functions of M−1 minimum risk quadratic classification systems for a given class c.sub.i of feature vectors exhibits the minimum probability of error for classifying the feature vectors that belong to the M collections of N feature vectors and unknown feature vectors related to the M collections.

(303) The discriminant function of the M-class minimum risk quadratic classification system D.sub.E.sub.M (s)

(304) $D_{E_M}(s) = \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s))$
exhibits the minimum probability of error for classifying feature vectors that belong to the M collections of N feature vectors and unknown feature vectors related to the M collections of N feature vectors, wherein the discriminant function of the M-class minimum risk quadratic classification system determines likely locations of feature vectors that belong to and are related to the M collections of N feature vectors and identifies decision regions related to the M classes that the feature vectors are located within, wherein the discriminant function recognizes the classes of the feature vectors, and wherein the M-class minimum risk quadratic classification system decides which of the M classes that the feature vectors belong to, and thereby classifies the feature vectors.
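The ensemble rule of this embodiment, in which a feature vector s is assigned to the class whose ensemble of M−1 signed outputs is largest, can be sketched as follows. The pairwise discriminants are supplied as callables; the stand-in functions used in the example are hypothetical placeholders and are not discriminant functions produced by the machine learning algorithm of the invention. The tie rule (first class in list order) is likewise an assumption of the sketch.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Discriminant = Callable[[Vector], float]


def sign(value: float) -> int:
    """Signum with the assumed convention sign(0) = +1."""
    return 1 if value >= 0 else -1


def ensemble_output(s: Vector, ci: str, classes: List[str],
                    pairwise: Dict[Tuple[str, str], Discriminant]) -> int:
    """E_ci(s): sum of sign(D_ij(s)) over the other M-1 classes."""
    return sum(sign(pairwise[(ci, cj)](s)) for cj in classes if cj != ci)


def classify_m(s: Vector, classes: List[str],
               pairwise: Dict[Tuple[str, str], Discriminant]) -> str:
    """Pick the class whose ensemble has the largest output (ties: first in list)."""
    outputs = {ci: ensemble_output(s, ci, classes, pairwise) for ci in classes}
    return max(classes, key=lambda c: outputs[c])


# Hypothetical stand-in discriminants on 1-component vectors:
# D_ij(s) > 0 when s is closer to the center of class i than to that of class j.
centers = {"a": 0.0, "b": 5.0, "c": 10.0}
classes = list(centers)


def make_pair(ci: str, cj: str) -> Discriminant:
    return lambda s: (s[0] - centers[cj]) ** 2 - (s[0] - centers[ci]) ** 2


pairwise = {(ci, cj): make_pair(ci, cj) for ci in classes for cj in classes if ci != cj}
print(classify_m((4.2,), classes, pairwise))
```

With three classes each ensemble output lies in {−2, 0, +2}, so the winning class is the one that prevails in both of its pairwise decisions.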

Embodiment 3

(305) A fused discriminant function of a fused minimum risk quadratic classification system that classifies two types of feature vectors into two classes, wherein the types of feature vectors have different numbers of vector components, is determined by using a machine learning algorithm of the invention and a collection of N feature vectors and a collection of N different feature vectors to determine an ensemble of a discriminant function of a minimum risk quadratic classification system

(306) $sign (D (s))$
and a different discriminant function of a different minimum risk quadratic classification system

(307) $\operatorname{sign}(\tilde{D}(s)): \quad \overline{\overline{E}}_2 = \operatorname{sign}(D(s)) + \operatorname{sign}(\tilde{D}(s)),$
wherein the discriminant function and the different discriminant function are both determined by the process that is described in EMBODIMENT 1.

(308) The fused discriminant function of the fused minimum risk quadratic classification system

(309) $\overline{\overline{D}}_{E_2}(s) = \operatorname{sign}(D(s)) + \operatorname{sign}(\tilde{D}(s))$
exhibits the minimum probability of error for classifying the feature vectors that belong to the collection of N feature vectors and unknown feature vectors related to the collection of N feature vectors as well as the different feature vectors that belong to the collection of N different feature vectors and unknown different feature vectors related to the collection of N different feature vectors, wherein the fused discriminant function determines likely locations of feature vectors that belong to and are related to the collection of N feature vectors as well as different feature vectors that belong to and are related to the collection of N different feature vectors and identifies decision regions related to the two classes that the feature vectors and the different feature vectors are located within, wherein the fused discriminant function recognizes the classes of the feature vectors and the different feature vectors, and wherein the fused minimum risk quadratic classification system decides which of the two classes that the feature vectors and the different feature vectors belong to, and thereby classifies the feature vectors and the different feature vectors.
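A minimal sketch of the fused output follows, assuming that each of the two discriminant functions is evaluated on its own type of feature vector. The two stand-in discriminants are hypothetical. The specification does not state how a fused output of zero (the two systems disagree) is resolved, so the sketch reports it as undecided rather than choosing a class.

```python
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
Discriminant = Callable[[Vector], float]


def sign(value: float) -> int:
    """Signum with the assumed convention sign(0) = +1."""
    return 1 if value >= 0 else -1


def fused_output(D: Discriminant, D_other: Discriminant,
                 s: Vector, s_other: Vector) -> int:
    """sign(D(s)) + sign(D_other(s_other)); takes a value in {-2, 0, +2}."""
    return sign(D(s)) + sign(D_other(s_other))


def fused_decision(D: Discriminant, D_other: Discriminant,
                   s: Vector, s_other: Vector) -> Optional[int]:
    """+1 or -1 when both systems agree, None when they disagree (assumed handling)."""
    total = fused_output(D, D_other, s, s_other)
    if total == 0:
        return None
    return 1 if total > 0 else -1


# Hypothetical stand-ins: one discriminant on 1-component vectors,
# a different one on 3-component vectors.
D = lambda s: s[0] ** 2 - 1.0
D_other = lambda v: v[0] + v[1] + v[2]

print(fused_output(D, D_other, (2.0,), (1.0, 1.0, 1.0)))
print(fused_decision(D, D_other, (0.5,), (1.0, 1.0, 1.0)))
```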

Embodiment 4

(310) FIG. 8 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for determining a fused discriminant function of a fused M-class minimum risk quadratic classification system that classifies two types of feature vectors into M classes, wherein the types of feature vectors have different numbers of vector components.

(311) A fused discriminant function of a fused M-class minimum risk quadratic classification system that classifies two types of feature vectors into M classes is determined by using a machine learning algorithm of the invention and M collections of N feature vectors to determine M ensembles of M−1 discriminant functions of M−1 minimum risk quadratic classification systems

(312) $E_M = \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s))$
as well as M collections of N different feature vectors to determine M different ensembles of M−1 different discriminant functions of M−1 different minimum risk quadratic classification systems

(313) $\tilde{E}_M = \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(\tilde{D}_{ij}(s)),$
wherein the M ensembles and the M different ensembles are both determined by the process that is described in EMBODIMENT 2.

(314) The fused discriminant function of the fused M-class minimum risk quadratic classification system D.sub.E.sub.M(s)

(315) $\overline{\overline{D}}_{E_M}(s) = E_M + \tilde{E}_M = \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(D_{ij}(s)) + \sum_{i=1}^{M} \sum_{j=1}^{M-1} \operatorname{sign}(\tilde{D}_{ij}(s))$
exhibits the minimum probability of error for classifying feature vectors that belong to the M collections of N feature vectors and unknown feature vectors related to the M collections of N feature vectors as well as different feature vectors that belong to the M collections of N different feature vectors and unknown different feature vectors related to the M collections of N different feature vectors, wherein the fused discriminant function determines likely locations of feature vectors that belong to and are related to the M collections of N feature vectors as well as different feature vectors that belong to and are related to the M collections of N different feature vectors and identifies decision regions related to the M classes that the feature vectors and the different feature vectors are located within, wherein the fused discriminant function recognizes the classes of the feature vectors and the different feature vectors, and wherein the fused M-class minimum risk quadratic classification system decides which of the M classes that the feature vectors and the different feature vectors belong to, and thereby classifies the feature vectors and the different feature vectors.

Embodiment 5

(316) FIG. 9 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for using a discriminant function of a minimum risk quadratic classification system to determine a classification error rate and a measure of overlap between distributions of feature vectors for two classes of feature vectors.

(317) The process of using a discriminant function of a minimum risk quadratic classification system to determine a classification error rate and a measure of overlap between distributions of feature vectors for two classes of feature vectors involves the following steps:

(318) Receive an N×d data set of feature vectors within a computer system, wherein N is the number of feature vectors, d is the number of vector components in each feature vector, and each one of the N feature vectors is labeled with information that identifies which of the two classes each one of the N feature vectors belongs to.

(319) Receive an N×d test data set of test feature vectors related to the data set within the computer system, wherein N is a number of test feature vectors, d is a number of vector components in each test feature vector, and each one of the N test feature vectors is labeled with information that identifies which of the two classes each one of the N test feature vectors belongs to.

(320) Determine the discriminant function of the minimum risk quadratic classification system by performing the steps outlined in EMBODIMENT 1.

(321) Use the minimum risk quadratic classification system to classify the N feature vectors.

(322) Determine an in-sample classification error rate for the two classes of feature vectors by calculating the average number of wrong decisions of the minimum risk quadratic classification system for classifying the N feature vectors.

(323) Use the minimum risk quadratic classification system to classify the N test feature vectors.

(324) Determine an out-of-sample classification error rate for the two classes of test feature vectors by calculating the average number of wrong decisions of the minimum risk quadratic classification system for classifying the N test feature vectors.

(325) Determine the classification error rate for the two classes of feature vectors by averaging the in-sample classification error rate and the out-of-sample classification error rate.

(326) Determine a measure of overlap between distributions of feature vectors for the two categories of feature vectors using the N feature vectors and the extreme vectors that have been identified, by calculating the ratio of the number of the extreme vectors to the number of the N feature vectors, wherein the ratio determines the measure of overlap.
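The three quantities computed in the steps above can be sketched as follows, assuming that the predicted and true labels are available as sequences of +1 and −1 and that the number of extreme vectors has been obtained as in EMBODIMENT 1. The function names and the toy numbers are hypothetical.

```python
from typing import Sequence


def error_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Average number of wrong decisions over the classified feature vectors."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def classification_error_rate(in_sample: float, out_of_sample: float) -> float:
    """Average of the in-sample and out-of-sample classification error rates."""
    return (in_sample + out_of_sample) / 2.0


def overlap_measure(num_extreme: int, num_vectors: int) -> float:
    """Ratio of the number of extreme vectors to the number N of feature vectors."""
    return num_extreme / num_vectors


# Hypothetical toy labels and counts.
in_err = error_rate([1, -1, 1, 1], [1, 1, 1, -1])
out_err = error_rate([1, -1, -1, 1], [1, -1, 1, 1])
print(in_err, out_err, classification_error_rate(in_err, out_err))
print(overlap_measure(3, 12))
```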

Embodiment 6

(327) FIG. 10 is a flow diagram of programmed instructions executed by the processor of FIG. 11 to implement the method for using a discriminant function of a minimum risk quadratic classification system to determine if distributions of two collections of feature vectors are homogenous distributions. The process of using a discriminant function of a minimum risk quadratic classification system to determine if distributions of two collections of feature vectors are homogenous distributions involves the following steps:

(328) Receive an N×d data set of feature vectors within a computer system, wherein N is the number of feature vectors, d is the number of vector components in each feature vector, and each one of the N feature vectors is labeled with information that identifies which of the two collections each one of the N feature vectors belongs to.

(329) Determine the discriminant function of the minimum risk quadratic classification system by performing the steps outlined in EMBODIMENT 1.

(330) Use the minimum risk quadratic classification system to classify the N feature vectors.

(331) Determine an in-sample classification error rate for the two collections of feature vectors by calculating the average number of wrong decisions of the minimum risk quadratic classification system for classifying the N feature vectors.

(332) Determine a measure of overlap between distributions of feature vectors for the two collections of feature vectors using the N feature vectors and the extreme vectors that have been identified, by calculating the ratio of the number of the extreme vectors to the number of the N feature vectors, wherein the ratio determines the measure of overlap.

(333) Determine if the distributions of the two collections of the N feature vectors are homogenous distributions by using the in-sample classification error rate and the measure of overlap, wherein the distributions of the two collections of the N feature vectors are homogenous distributions if the measure of overlap has an approximate value of one and the in-sample classification error rate has an approximate value of one half.
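The decision rule in step (333) can be sketched as a pair of approximate comparisons. The text says only that the two values are "approximate", so the tolerance `tol` and the function name below are assumptions chosen for illustration.

```python
import math


def is_homogeneous(in_sample_error_rate: float,
                   overlap: float,
                   tol: float = 0.05) -> bool:
    """Return True when overlap is close to 1 and the error rate is close to 1/2.

    `tol` is an assumed absolute tolerance; the source does not fix a value.
    """
    overlap_near_one = math.isclose(overlap, 1.0, abs_tol=tol)
    error_near_half = math.isclose(in_sample_error_rate, 0.5, abs_tol=tol)
    return overlap_near_one and error_near_half


# Hypothetical values, for illustration only.
same_source = is_homogeneous(in_sample_error_rate=0.49, overlap=0.98)
well_separated = is_homogeneous(in_sample_error_rate=0.10, overlap=0.30)
```

Both conditions must hold together: when nearly every feature vector is an extreme vector and the classifier does no better than chance, the two collections are treated as drawn from homogenous distributions.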

(334) Machine learning algorithms of the invention involve solving certain variants of the inequality constrained optimization problem that is used by support vector machines, wherein regularization parameters and reproducing kernels have been defined.

(335) Software for machine learning algorithms of the invention can be obtained by using any of the software packages that solve quadratic programming problems, or via LIBSVM (A Library for Support Vector Machines), SVMlight (an implementation of SVMs in C) or MATLAB SVM toolboxes.
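As a rough illustration of the kind of input a quadratic programming solver would receive, the sketch below builds a matrix of inner products of signed reproducing kernels and then regularizes it. The specific kernel (x·s + 1)², the diagonal regularization term `epsilon`, and all function names are assumptions made for this example; the solver call itself is omitted because it depends on the chosen package.

```python
from typing import List, Sequence

Vector = Sequence[float]


def poly2_kernel(x: Vector, s: Vector) -> float:
    """Assumed second-order polynomial kernel (x . s + 1)^2."""
    return (sum(a * b for a, b in zip(x, s)) + 1.0) ** 2


def signed_kernel_matrix(X: Sequence[Vector], y: Sequence[int]) -> List[List[float]]:
    """Matrix with entries y_i * y_j * k(x_i, x_j), where each y_i is +1 or -1."""
    n = len(X)
    return [[y[i] * y[j] * poly2_kernel(X[i], X[j]) for j in range(n)]
            for i in range(n)]


def regularize(Q: List[List[float]], epsilon: float = 1e-3) -> List[List[float]]:
    """Add an assumed small constant epsilon to each diagonal entry of Q."""
    return [[v + (epsilon if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(Q)]


# Hypothetical two-dimensional feature vectors with class signs.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [1, -1, 1]

Q = signed_kernel_matrix(X, y)
Q_reg = regularize(Q, epsilon=0.5)
```

A solver from any of the packages named above would then be asked to find the non-negative scale factors for the dual problem defined on `Q_reg`.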

(336) The machine learning methods of the invention disclosed herein may be readily utilized in a wide variety of applications, wherein feature vectors have been extracted from outputs of sensors that include, but are not limited to, radar and hyperspectral or multispectral images, biometrics, digital communication signals, text, images, digital waveforms, etc.

(337) More specifically, the applications include, for example and without limitation, general pattern recognition (including image recognition, waveform recognition, object detection, spectrum identification, and speech and handwriting recognition), data classification (including text, image, and waveform categorization), bioinformatics (including automated diagnosis systems, biological modeling, and bio imaging classification), etc.

(338) One skilled in the art will recognize that any suitable computer system may be used to execute the machine learning methods disclosed herein. The computer system may include, without limitation, a mainframe computer system, a workstation, a personal computer system, a personal digital assistant, or other device or apparatus having at least one processor that executes instructions from a memory medium.

(339) The computer system may further include a display device or monitor for displaying operations associated with the learning machine and one or more memory mediums on which computer programs or software components may be stored. In addition, the memory medium may be entirely or partially located in one or more associated computers or computer systems which connect to the computer system over a network, such as the Internet.

(340) The machine learning methods described herein may also be executed in hardware, a combination of software and hardware, or in other suitable executable implementations. The learning machine methods implemented in software may be executed by the processor of the computer system or the processor or processors of the one or more associated computer systems connected to the computer system.

(341) While the invention herein disclosed has been described by means of specific embodiments, numerous modifications and variations could be made by those skilled in the art without departing from the scope of the invention set forth in the claims.

CPC classification: G06N20/00 (Physics); G06F18/254 (Physics); A61B5/7267 (Human Necessities); G06F17/18 (Physics); A61B5/7264 (Human Necessities); G06F18/2415 (Physics); G06V10/764 (Physics); G06N20/10 (Physics); G06F17/16 (Physics); G06N20/20 (Physics); G06F18/24155 (Physics); G06F18/2453 (Physics); G06F18/2193 (Physics); G06F17/11 (Physics)

International classification: G06K9/62 (Physics); G06F17/11 (Physics); A61B5/00 (Human Necessities); G06F17/18 (Physics); G06F17/16 (Physics); G06N20/00 (Physics)
