Sequential minimal optimization algorithm for learning using partially available privileged information

11531851 · 2022-12-20

Abstract

Computational algorithms integrate and analyze data to consider multiple interdependent, heterogeneous sources and forms of patient data, and using a classification model, provide new learning paradigms, including privileged learning and learning with uncertain clinical data, to determine patient status for conditions such as acute respiratory distress syndrome (ARDS) or non-ARDS.

Claims

1. A computer-implemented method of developing a classifier model for identifying a medical condition of a subject, the method comprising: receiving training data for a plurality of subjects, where the training data includes sensed physiological data, imaging data, demographic data, and/or other clinical data, where at least some of the training data is for subjects having the medical condition and at least some of the training data is for subjects not having the medical condition, wherein the training data includes privileged information where at least a portion of the privileged information is label uncertain privileged information; identifying the label uncertain privileged information; applying a penalty parameter to each label uncertain privileged information, wherein the penalty parameter establishes a soft-margin decisional boundary for the label uncertain privileged information; performing an optimization on (i) the label uncertain privileged information having the penalty parameters, (ii) the privileged information in the training data that has certain labels, and (iii) non-privileged information in training data, to develop a mapping to an outcome determination of the medical condition; and generating a classifier model based on the mapping to classify testing data as corresponding to the presence of the medical condition or to the absence of the medical condition.

2. The computer-implemented method of claim 1, further comprising: applying the training data to a support vector machine (SVM) framework configured to map training data to a vector space and generate a decision hyperplane classifier model for identifying the presence of the medical condition and for identifying the absence of the medical condition.

3. The computer-implemented method of claim 2, further comprising: configuring the support vector machine (SVM) framework to map privileged information in the training data and non-privileged information in the training data to the vector space for generating the decision hyperplane classifier model.

4. The computer-implemented method of claim 2, wherein the training data is expressed as:
(x_1, x*_1, y_1, π_1), …, (x_m, x*_m, y_m, π_m), (x_{m+1}, y_{m+1}, π_{m+1}), (x_{m+2}, y_{m+2}, π_{m+2}), …, (x_n, y_n, π_n),
x_i ∈ X, x*_i ∈ X*, y_i ∈ {−1, 1}, π_i ∈ ℝ.

5. The computer-implemented method of claim 4, further comprising: applying the penalty parameter by performing a correction function for identifying hyperplane parameters; and generating the classifier model by simultaneous optimization of the following expression:

min_{w,b,w*,b*,ξ} ½‖w‖₂² + (γ/2)‖w*‖₂² + C Σ_{i=m+1}^n π_i ξ_i + C* Σ_{i=1}^m (w*·z*_i + b*)   (2)

s.t. ∀ 1 ≤ i ≤ m: y_i(w·z_i + b) ≥ 1 − (w*·z*_i + b*)
∀ 1 ≤ i ≤ m: w*·z*_i + b* ≥ 0
∀ m+1 ≤ i ≤ n: y_i(w·z_i + b) ≥ 1 − ξ_i
∀ m+1 ≤ i ≤ n: ξ_i ≥ 0.

6. The computer-implemented method of claim 4, further comprising: applying the penalty parameter by performing a correction function for identifying hyperplane parameters; and generating the classifier model by simultaneous optimization of the following expression:

min_{w,b,ξ,w*,b*,ξ*} ½‖w‖₂² + (γ/2)‖w*‖₂² + C Σ_{i=m+1}^n π_i ξ_i + ρC* Σ_{i=1}^m π_i ξ*_i + C* Σ_{i=1}^m (w*·z*_i + b*)   (3)

s.t. ∀ 1 ≤ i ≤ m: y_i(w·z_i + b) ≥ 1 − (w*·z*_i + b*) − ξ*_i
∀ 1 ≤ i ≤ m: w*·z*_i + b* ≥ 0
∀ 1 ≤ i ≤ m: ξ*_i ≥ 0
∀ m+1 ≤ i ≤ n: y_i(w·z_i + b) ≥ 1 − ξ_i
∀ m+1 ≤ i ≤ n: ξ_i ≥ 0.

7. The computer-implemented method of claim 4, further comprising: applying the penalty parameter by performing a correction function for identifying hyperplane parameters; and generating the classifier model by simultaneous optimization of the following expression:

min_{w,b,ξ,w*,b*,ξ*} ½‖w‖₂² + (γ/2)‖w*‖₂² + C Σ_{i=m+1}^n π_i ξ_i + ρC* Σ_{i=1}^m π_i ξ*_i + C* Σ_{i=1}^m y_i(w*·z*_i + b*)   (4)

s.t. ∀ 1 ≤ i ≤ m: y_i(w·z_i + b) ≥ 1 − y_i(w*·z*_i + b*) − ξ*_i
∀ 1 ≤ i ≤ m: y_i(w*·z*_i + b*) ≥ 0
∀ 1 ≤ i ≤ m: ξ*_i ≥ 0
∀ m+1 ≤ i ≤ n: y_i(w·z_i + b) ≥ 1 − ξ_i
∀ m+1 ≤ i ≤ n: ξ_i ≥ 0.

8. The computer-implemented method of claim 1, further comprising performing the optimization by applying a sequential minimal optimization to (i) the label uncertain privileged information having the penalty parameters, (ii) the privileged information in the training data that has certain labels, and (iii) non-privileged information in training data, to develop a mapping to an outcome determination of the medical condition.

9. The computer-implemented method of claim 1, wherein the medical condition is acute respiratory distress syndrome.

10. The computer-implemented method of claim 8, wherein the physiological data comprises continuous electrocardiogram (ECG) and/or photoplethysmography (PPG) data.

11. The computer-implemented method of claim 8, wherein the image data comprises: chest x-ray image data.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) For a more complete understanding of the disclosure, reference should be made to the following detailed description and accompanying drawing figures, in which like reference numerals identify like elements in the figures, and in which:

(2) FIG. 1A illustrates an enhanced learning using privileged information system, in accordance with an example herein.

(3) FIG. 1B is a block diagram of an example implementation of the enhanced learning using privileged information framework of FIG. 1A, in accordance with an example.

(4) FIG. 2 illustrates a schematic of a conventional machine learning model, implemented as a support vector machine.

(5) FIG. 3 illustrates a schematic of an enhanced machine learning model, in accordance with the present techniques.

(6) FIG. 4 illustrates a schematic of the machine learning model of FIG. 2 optimized in accordance with an example implementation of the techniques herein, in accordance with an example.

(7) FIG. 5 illustrates a schematic of the machine learning model of FIG. 3 optimized in accordance with an example implementation of the techniques herein, in accordance with an example.

(8) FIG. 6 illustrates an example ARDS analysis system implementing the framework of FIG. 5, in accordance with an example.

(9) FIG. 7 illustrates an example earnings forecasting system implementing the framework of FIG. 5, in accordance with an example.

DETAILED DESCRIPTION

(10) The present techniques include enhanced machine learning systems implemented with a learning using privileged information configuration. The supervised learning system is able to access, analyze, and use privileged information that is available for training data but not available for test data. The supervised learning systems herein are able to learn where privileged information is known for all of the training data or for only part of the training data. The latter example, learning using partially available privileged information, may be implemented in numerous ways. In some examples, a supervised learning system implements modified support vector machine (SVM) techniques to implement such learning. In some examples, a supervised learning system combines a sequential minimal optimization (SMO) process with the SVM to implement such learning.

(11) FIG. 1A illustrates an example learning using privileged information system 100, in accordance with examples herein. A computing device 102 implementing an enhanced learning using privileged information framework is coupled to a network 104 for receiving various types of data, for training one or more classifier models, and for executing the trained classifier models on testing data, e.g., during treatment of a patient. The network 104 may be a wireless or wired network communicating with physiological sensors 103 and medical imagers 105 generating real-time data on a patient. The network 104 may also be connected to databases, including a medical images database 106, which may be used for specific applications such as ARDS detection, and privileged information databases 108, such as electronic health records databases and cognitive health databases. These databases may include partially privileged information as well. The network 104 may be further connected to uncertain labeled information databases 110, which may include privileged information with uncertain labeling. As further shown, the network 104 may be connected to a network accessible server 112 that may store data from any of the foregoing connected devices or other data that may be used for training and/or testing using the computing device.

(12) The computing device 102 includes one or more processing units 114, one or more optional graphics processing units 116, a local database 118, random access memory 120, a network interface 122, and Input/Output (I/O) interfaces 124 connecting the computing device 102 to a display 126 and user input device 128.

(13) The various enhanced learning using privileged information techniques described herein, including as applied to the SVM, are achieved using the enhanced learning using privileged information framework 130, which generates one or more trained classifier models 132, also stored in the computing device 102.

(14) FIG. 1B illustrates an example of the enhanced learning using privileged information framework 130. In the illustrated example, framework 200 is implemented as a learning using label uncertainty partially available privileged information framework. The framework includes a machine learning framework 202 that may store any of a variety of types of machine learning techniques. In the examples described below, this framework is an SVM, although other machine learning techniques may be used. A features extractor 204 receives the data over the network and extracts features from that data, such as physiological features, waveforms, image features, etc. The features extractor may further extract demographic information and other EHR data to be used in constructing a classifier model. A label uncertainty information extractor 206 identifies particular types of data, specifically data that has uncertain labeling. This data may be privileged information or partially privileged information, for example. An uncertainty label correction function 208 takes the identified label uncertainty information and applies a correction function to minimize the weighting of such information during training of the classifier model. The correction function may still maintain the extracted information but seeks to optimize that information for use in machine learning. An optimizer 210 is used to optimize the correction function process, for example in an iterative process, such that the label uncertain information is appropriately weighted over successive iterations of the process.

(15) FIG. 2 illustrates the schematic of a conventional machine learning model 300, such as an SVM, that may be used in a clinical decision support system (CDSS) to assist in monitoring for and diagnosing a medical condition of a subject. Initial medical data is obtained from the subject at the time of care. For example, diagnostic data such as demographics, medical history, medications, and laboratory results may be obtained. That medical data may be characterized by having labels that are created by a clinician who is a domain expert for the medical condition under consideration, and features that may include all available medical data, a subset of the medical data deemed salient by a domain expert, summary statistics of the medical data, or some combination thereof. In the illustrated example, the CDSS is implemented using a machine learning model 302 that applies one or more trained decision rules designed to assess the medical condition. That assessment may be a binary diagnostic assessment, such as the subject has a condition or does not have a condition, or it may be a probabilistic assessment of the likelihood that the subject has the condition. In an example that uses an SVM as the machine learning model, the decision rule may be the decision function f(z) = w·z + b.

(16) In the classical SVM algorithm, given a set of training data
(x_1, y_1), …, (x_n, y_n), with x_i ∈ X, y_i ∈ {−1, 1},
the SVM first maps each training data vector x ∈ X into a vector (space) z ∈ Z, and then constructs the optimal separating hyperplane by learning the decision rule f(z) = w·z + b, where the hyperplane parameters w and b are the solution of

(17) min_{w,b,ξ} ½‖w‖₂² + C Σ_{i=1}^n ξ_i
s.t. ∀ 1 ≤ i ≤ n: y_i(w·z_i + b) ≥ 1 − ξ_i and ∀ 1 ≤ i ≤ n: ξ_i ≥ 0,
where C > 0 is a hyperparameter.
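As a rough illustrative sketch (not the SMO-based solver described herein), the soft-margin objective above can be minimized in its equivalent hinge-loss form ½‖w‖₂² + C Σ max(0, 1 − y_i(w·z_i + b)) by plain subgradient descent; all function names and the toy data below are hypothetical:

```python
# Illustrative only: subgradient descent on the soft-margin SVM objective
# 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w.z_i + b))).
# This is a didactic stand-in, not the SMO solver described herein.

def train_linear_svm(data, C=1.0, lr=0.01, epochs=2000):
    """data: list of (z, y) with z a tuple of floats and y in {-1, +1}."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # subgradient of the 0.5*||w||^2 regularizer
        grad_b = 0.0
        for z, y in data:
            margin = y * (sum(wi * zi for wi, zi in zip(w, z)) + b)
            if margin < 1:  # hinge term is active for this sample
                for k in range(dim):
                    grad_w[k] -= C * y * z[k]
                grad_b -= C * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, z):
    """Sign of the decision function f(z) = w.z + b."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b >= 0 else -1

# Toy, linearly separable data.
train = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((-2.0, -2.0), -1), ((-3.0, -1.0), -1)]
w, b = train_linear_svm(train)
```

In practice the dual formulation below is solved instead, since it admits kernels; this primal sketch only illustrates what the objective trades off.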

(18) In some examples herein, this conventional SVM is extended to include learning using privileged information. In a learning using privileged information (LUPI) paradigm, in addition to the standard training data x_i ∈ X, y_i ∈ {−1, 1}, the machine learning model of FIG. 2 is provided with privileged information. This privileged information, x* ∈ X*, may only be available for training data and not for testing data collected from the subject at the time of care. In such examples, the present techniques rely upon a triplet of data, e.g., (x_i, x*_i, y_i), for training. With such a modification to the SVM, in addition to mapping data vector x ∈ X into vector (space) z ∈ Z, the modified SVM is configured to map privileged information x* ∈ X* into vector (space) z* ∈ Z*, and then the slack variable ξ_i of the SVM may be replaced by the correcting function φ(z*) = w*·z* + b*.

(19) Generally speaking, privileged information is any information that is readily available during the training phase of machine learning but not necessarily available during the testing phase, with the potential to improve model performance. Privileged information can be periodic data, such as the timed polling of a measured condition or the scheduled release of financial reports; structural information, such as the topological structure of a protein; effectively static data, such as clinical/demographic information or tax rates; time-series data, such as physiological waveforms or equity pricing; image data; and outcome data, including expert-adjudicated patient data or corporate profits. Privileged information can be measured data or coded data, obtained from locally connected medical equipment, databases from remote connected sources, market research companies, and governmental agencies.

(20) In an example of learning using privileged information, the assumption is that privileged information is available for all of the training data. However, this is not always the case. In many practical applications, privileged information is only available for a fraction of the training data.

(21) Therefore, in some examples, the supervised learning systems herein are configured to learn using partially available privileged information (also termed herein "LUPAPI") techniques, in which the training data is provided as m triplets (x_i, x*_i, y_i) of samples with privileged information and n−m pairs (x_i, y_i) of samples without privileged information. These LUPAPI techniques may be applied to the SVM using an SMO-based approach, for example.

(22) In some examples, supervised learning systems herein are configured to compensate for another condition in training, label uncertainty, that is, a lack of confidence in the labels of the training data. Label uncertainty is very significant in medical data, since there is often a lack of consensus among clinicians in their diagnoses. In some examples, supervised learning systems herein are configured to incorporate the label uncertainty into the LUPAPI techniques by allowing parameters to vary across training samples based on label confidence. For example, a slack variable (or correcting function) may be used that permits misclassification with a penalty parameter to establish soft-margin decision boundaries. In this way, data with high label confidence can be given more weight and influence on the decision boundary than data with lower label confidence. We term supervised learning systems with this modification as configured with learning using label uncertainty partially available privileged information ("LULUPAPI").

(23) FIG. 3 illustrates an example supervised learning system having a LULUPAPI configuration 400. Collected data includes labels and features, as with conventional systems such as that of FIG. 2. However, additional data for a subject is collected, including privileged information and label uncertainty information. All of this data is provided to an enhanced machine learning model 402 that generates a decision model, which in the illustrated example is the hyperplane parameters, w, b, of an SVM decision model.

(24) In some example implementations of the LULUPAPI approaches herein, the supervised learning system compensates for label uncertainty using training samples of the form:
(x_1, x*_1, y_1, π_1), …, (x_m, x*_m, y_m, π_m), (x_{m+1}, y_{m+1}, π_{m+1}), (x_{m+2}, y_{m+2}, π_{m+2}), …, (x_n, y_n, π_n),
x_i ∈ X, x*_i ∈ X*, y_i ∈ {−1, 1}, π_i ∈ ℝ,
where π_i is a quantitative measure of uncertainty in the labels (see FIG. 3).
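For concreteness, one (hypothetical) way to represent such quadruples in code, with the privileged features x* optional and a per-sample confidence weight π, is:

```python
# Hypothetical sketch of the (x, x*, y, pi) training quadruple used by
# LULUPAPI: x* (privileged features) is present for only the first m
# samples, and pi weights each label by its confidence.
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Sample:
    x: Sequence[float]                        # standard features, available at test time
    y: int                                    # label in {-1, +1}
    pi: float                                 # label-confidence weight, e.g. in (0, 1]
    x_star: Optional[Sequence[float]] = None  # privileged features (training only)

def split_by_privilege(samples):
    """Partition into the m privileged triplet-style samples and the
    n-m ordinary pair-style samples."""
    with_pi = [s for s in samples if s.x_star is not None]
    without_pi = [s for s in samples if s.x_star is None]
    return with_pi, without_pi

samples = [
    Sample(x=(0.2, 1.1), y=1, pi=1.0, x_star=(3.4,)),
    Sample(x=(0.9, -0.3), y=-1, pi=0.4),
]
priv, plain = split_by_privilege(samples)
```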

(25) In various examples, a supervised learning system is implemented using any of a number of different LULUPAPI configurations.

(26) In a first LULUPAPI configuration, a supervised learning system applies a correcting function for data with privileged information and a slack variable for data without privileged information. In such examples, the supervised learning system determines the decision rule and the correcting-function hyperplane parameters through simultaneous optimization of:

(27) min_{w,b,w*,b*,ξ} ½‖w‖₂² + (γ/2)‖w*‖₂² + C Σ_{i=m+1}^n π_i ξ_i + C* Σ_{i=1}^m (w*·z*_i + b*)   (2)
s.t. ∀ 1 ≤ i ≤ m: y_i(w·z_i + b) ≥ 1 − (w*·z*_i + b*)
∀ 1 ≤ i ≤ m: w*·z*_i + b* ≥ 0
∀ m+1 ≤ i ≤ n: y_i(w·z_i + b) ≥ 1 − ξ_i
∀ m+1 ≤ i ≤ n: ξ_i ≥ 0
where C > 0, C* > 0, and γ > 0 are hyperparameters. In this formulation, the term γ‖w*‖₂² is intended to restrict the VC-dimension of the function space. The slack variables and the label uncertainty may only be used for training data without privileged information. During testing, labels may be determined by the decision rule.

(28) In another example LULUPAPI configuration, a supervised learning system may, for data with privileged information, replace the slack variables with a smooth correcting function φ(z*) = w*·z* + b*, or the supervised learning system may use a mixture of slacks ξ′_i = (w*·z*_i + b*) + ρξ*_i for 1 ≤ i ≤ m and ρ ∈ ℝ, resulting in another formulation of LULUPAPI expressed as:

(29) min_{w,b,ξ,w*,b*,ξ*} ½‖w‖₂² + (γ/2)‖w*‖₂² + C Σ_{i=m+1}^n π_i ξ_i + ρC* Σ_{i=1}^m π_i ξ*_i + C* Σ_{i=1}^m (w*·z*_i + b*)   (3)
s.t. ∀ 1 ≤ i ≤ m: y_i(w·z_i + b) ≥ 1 − (w*·z*_i + b*) − ξ*_i
∀ 1 ≤ i ≤ m: w*·z*_i + b* ≥ 0
∀ 1 ≤ i ≤ m: ξ*_i ≥ 0
∀ m+1 ≤ i ≤ n: y_i(w·z_i + b) ≥ 1 − ξ_i
∀ m+1 ≤ i ≤ n: ξ_i ≥ 0
Supervised learning systems applying this formulation may have interesting properties, such as performance that is lower bounded by SVM performance.

(30) In yet another example LULUPAPI configuration, a supervised learning system can achieve a better transfer of the knowledge obtained in the privileged information space to the decision space by considering the mixture of slacks ξ′_i = y_i(w*·z*_i + b*) + ρξ*_i for 1 ≤ i ≤ m and ρ ∈ ℝ, resulting in a third formulation of LULUPAPI expressed as:

(31) min_{w,b,ξ,w*,b*,ξ*} ½‖w‖₂² + (γ/2)‖w*‖₂² + C Σ_{i=m+1}^n π_i ξ_i + ρC* Σ_{i=1}^m π_i ξ*_i + C* Σ_{i=1}^m y_i(w*·z*_i + b*)   (4)
s.t. ∀ 1 ≤ i ≤ m: y_i(w·z_i + b) ≥ 1 − y_i(w*·z*_i + b*) − ξ*_i
∀ 1 ≤ i ≤ m: y_i(w*·z*_i + b*) ≥ 0
∀ 1 ≤ i ≤ m: ξ*_i ≥ 0
∀ m+1 ≤ i ≤ n: y_i(w·z_i + b) ≥ 1 − ξ_i
∀ m+1 ≤ i ≤ n: ξ_i ≥ 0

(32) Any of these LULUPAPI formulations may be implemented in the enhanced machine learning models described herein, for example.

(33) Any of these LULUPAPI formulations may be optimized by the enhanced machine learning model to generate the decisional rule/model for use in assessing test data. For example, by Lagrange multipliers, the dual optimization of the second LULUPAPI formulation (3) can be written as:

(34) max_{α,β} D(α,β) = Σ_{i=1}^n α_i − ½ Σ_{i,j=1}^n α_i α_j y_i y_j K_{i,j} − (1/2γ) Σ_{i,j=1}^m (α_i + β_i − C*)(α_j + β_j − C*) K*_{i,j}   (5)
s.t. Σ_{i=1}^n y_i α_i = 0   (6)
Σ_{i=1}^m (α_i + β_i − C*) = 0   (7)
∀ m+1 ≤ i ≤ n: 0 ≤ α_i ≤ π_i C   (8)
∀ 1 ≤ i ≤ m: 0 ≤ α_i ≤ ρπ_i C*, β_i ≥ 0   (9)
where K_{i,j} ≜ K(z_i, z_j) and K*_{i,j} ≜ K*(z*_i, z*_j) are the kernels in the decision space and the correcting space, respectively, with the decision function expressed as:

(35) f(z) = w·z + b = Σ_{i=1}^n y_i α_i K(z_i, z) + b   (10)
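Given the dual variables, evaluating Equation (10) is a direct kernel expansion. The sketch below is illustrative; the RBF kernel choice and all names are assumptions, not specified by the description above:

```python
# Illustrative evaluation of the decision function of Equation (10):
# f(z) = sum_i y_i * alpha_i * K(z_i, z) + b.
# The RBF kernel and the tiny handcrafted model are hypothetical.
import math

def rbf_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def decision_function(z, support, kernel=rbf_kernel):
    """support: dict with 'z' (training vectors), 'y' (labels in {-1, +1}),
    'alpha' (dual variables), and 'b' (offset)."""
    total = support["b"]
    for z_i, y_i, a_i in zip(support["z"], support["y"], support["alpha"]):
        total += y_i * a_i * kernel(z_i, z)
    return total

# Tiny handcrafted model: two support vectors with symmetric alphas.
model = {"z": [(0.0, 0.0), (2.0, 2.0)], "y": [-1, 1], "alpha": [1.0, 1.0], "b": 0.0}
score = decision_function((2.0, 2.0), model)  # positive: near the +1 support vector
```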

(36) In an example, the LULUPAPI formulation is optimized using an SVM dual problem optimization process, such as a sequential minimal optimization (SMO) algorithm. FIG. 4 illustrates an example SMO optimizer implementation 500 of the system of FIG. 2, that is, of a conventional SVM supervised learning system. Data including labels and features are provided to an optimized SVM model that includes a recursive SMO optimizer and a resulting decision function.

(37) As shown in the illustrated example, an iterative optimization process may be used to optimize the LULUPAPI formulation. The problem of Equation (5) can be considered an instance of the following general form:

(38) max_θ D(θ), θ ∈ Θ,
where Θ ⊂ ℝ^k is a convex compact set defined by linear equalities and inequalities and D: ℝ^k → ℝ is a concave quadratic function. At each iteration, a feasible direction generator determines the sets I_i of maximally sparse feasible directions such that all the constraints (e.g., Equations 6, 7, 8, and 9) are satisfied. With the maximally sparse feasible directions determined from the label data, an optimizer receives feature data and searches over these feasible directions, choosing the one that optimizes a cost function; the resulting point is then used as the starting point for the next iteration of the algorithm.

(39) For example, in an implementation of the recursive SMO optimizer, the supervised learning system starts the process by introducing the maximally sparse feasible directions. The cost function in Equation (5) has n+m variables, i.e., {α_i}_{i=1}^n and {β_i}_{i=1}^m. These can be combined into a single (n+m)-variable vector by concatenating the α and β variables: θ ≜ (α, β)ᵀ. Thus, each maximally sparse feasible direction is in ℝ^{n+m}. From this, it can be verified that, in an example, Equation (5) has nine sets of such directions, as follows:
I_1 ≜ {u_s | s = (s_1, s_2), n+1 ≤ s_1, s_2 ≤ n+m, s_1 ≠ s_2; u_s1 = 1, u_s2 = −1, θ_s2 > 0, and u_i = 0 for all i ∉ s}.
I_2 ≜ {u_s | s = (s_1, s_2), 1 ≤ s_1, s_2 ≤ m, s_1 ≠ s_2, y_s1 = y_s2; u_s1 = 1, θ_s1 < ρC*π_s1, u_s2 = −1, θ_s2 > 0, and u_i = 0 for all i ∉ s}.
I_3 ≜ {u_s | s = (s_1, s_2), m+1 ≤ s_1, s_2 ≤ n, s_1 ≠ s_2, y_s1 = y_s2; u_s1 = 1, θ_s1 < Cπ_s1, u_s2 = −1, θ_s2 > 0, and u_i = 0 for all i ∉ s}.
I_4 ≜ {u_s | s = (s_1, s_2), m+1 ≤ s_1, s_2 ≤ n, s_1 ≠ s_2, y_s1 ≠ y_s2, u_i = 0 for all i ∉ s; u_s1 = u_s2 = 1, θ_s1 < Cπ_s1, θ_s2 < Cπ_s2, or u_s1 = u_s2 = −1, θ_s1 > 0, θ_s2 > 0}.
I_5 ≜ {u_s | s = (s_1, s_2, s_3), 1 ≤ s_1, s_2 ≤ m, n+1 ≤ s_3 ≤ n+m, s_1 ≠ s_2, y_s1 ≠ y_s2, u_i = 0 for all i ∉ s; u_s1 = u_s2 = 1, θ_s1 < ρC*π_s1, θ_s2 < ρC*π_s2, u_s3 = −2, θ_s3 > 0, or u_s1 = u_s2 = −1, θ_s1 > 0, θ_s2 > 0, u_s3 = 2}.
I_6 ≜ {u_s | s = (s_1, s_2, s_3), 1 ≤ s_1 ≤ m, m+1 ≤ s_2 ≤ n, n+1 ≤ s_3 ≤ n+m, y_s1 = y_s2, u_i = 0 for all i ∉ s; u_s1 = 1, θ_s1 < ρC*π_s1, u_s2 = −1, θ_s2 > 0, u_s3 = −1, θ_s3 > 0, or u_s1 = −1, θ_s1 > 0, u_s2 = 1, θ_s2 < Cπ_s2, u_s3 = 1}.
I_7 ≜ {u_s | s = (s_1, s_2, s_3), 1 ≤ s_1 ≤ m, m+1 ≤ s_2 ≤ n, n+1 ≤ s_3 ≤ n+m, y_s1 ≠ y_s2, u_i = 0 for all i ∉ s; u_s1 = u_s2 = 1, θ_s1 < ρC*π_s1, θ_s2 < Cπ_s2, u_s3 = −1, θ_s3 > 0, or u_s1 = u_s2 = −1, θ_s1 > 0, θ_s2 > 0, u_s3 = 1}.
I_8 ≜ {u_s | s = (s_1, s_2, s_3), 1 ≤ s_1, s_2 ≤ m, m+1 ≤ s_3 ≤ n, s_1 ≠ s_2, y_s1 ≠ y_s2, y_s3 = y_s2, u_i = 0 for all i ∉ s; u_s1 = 1, θ_s1 < ρC*π_s1, u_s2 = −1, θ_s2 > 0, u_s3 = 2, θ_s3 < Cπ_s3, or u_s1 = −1, θ_s1 > 0, u_s2 = 1, θ_s2 < ρC*π_s2, u_s3 = −2, θ_s3 > 0}.
I_9 ≜ {u_s | s = (s_1, s_2, s_3), 1 ≤ s_1, s_2 ≤ m, m+1 ≤ s_3 ≤ n, s_1 ≠ s_2, y_s1 ≠ y_s2, y_s3 = y_s1, u_i = 0 for all i ∉ s; u_s1 = 1, θ_s1 < ρC*π_s1, u_s2 = −1, θ_s2 > 0, u_s3 = −2, θ_s3 > 0, or u_s1 = −1, θ_s1 > 0, u_s2 = 1, θ_s2 < ρC*π_s2, u_s3 = 2, θ_s3 < Cπ_s3}.

(40) It can be verified that a move from any feasible point θ_old in the direction of any u_s ∈ ∪_i I_i satisfies the constraints corresponding to the dual SVM problems. In this example, the optimization process finds θ = θ_old + λ*(s)u_s, where u_s and λ*(s) maximize the corresponding cost function D(θ_old + λu_s) such that the constraints are satisfied.
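Because D is concave and quadratic, its restriction D(θ_old + λu_s) is a concave parabola in λ, so λ*(s) has a closed form clipped to the feasible interval. A generic sketch of that one-dimensional step (illustrative names, not the exact routine described above):

```python
# Illustrative one-dimensional step of an SMO-style update:
# maximize g(lam) = d_slope*lam + 0.5*d_curv*lam^2 over [0, lam_max],
# where d_curv <= 0 because D is concave along any direction.

def line_search_concave(d_slope, d_curv, lam_max):
    """d_slope: gradient of D at theta_old dotted with u_s.
    d_curv: u_s^T H u_s for the (negative semidefinite) Hessian H.
    lam_max: largest step keeping theta_old + lam*u_s feasible."""
    if d_slope <= 0:
        return 0.0          # moving along u_s does not improve D
    if d_curv == 0:
        return lam_max      # linear increase: go to the boundary
    lam_star = -d_slope / d_curv   # unconstrained maximizer (> 0 here)
    return min(lam_star, lam_max)

def step(theta, u, lam):
    """theta_new = theta_old + lam * u, with the maximally sparse
    direction u given as a {index: value} dict of nonzero entries."""
    out = list(theta)
    for i, ui in u.items():
        out[i] += lam * ui
    return out
```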

(41) FIG. 5 illustrates an example supervised learning system 600 in accordance with the foregoing example, having an optimized LULUPAPI formulation, with similar features to that of FIG. 4. The supervised learning system of FIG. 5 is configured to optimize with privileged information and label uncertainty data. The privileged information is provided to the optimization process. The label uncertainty data is provided to a margin weight generator that converts measures of label uncertainty into weights that are used to determine the best margins within the optimization process. In the example depicted in FIG. 6, clinicians can report their level of confidence in their diagnosis, denoted by l_i, on a 1-8 scale, in which 1 is not ARDS with high confidence and 8 is ARDS with high confidence. A margin weight generator π_i = (|l_i − p_1| − p_2)·p_3 + p_4, with p_1 = 4.5, p_2 = 3, p_3 = 0.2, and p_4 = 0.9, is then employed to map the l_i's from the range 1-8 into the range 0.4-1, such that high-confidence cases l_i = 1 and l_i = 8 are mapped to π_i = 1 and low-confidence cases l_i = 4 and l_i = 5 are mapped to π_i = 0.4.
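The confidence-to-weight mapping above is a simple affine transform of |l_i − p_1|; a direct transcription in code (function name is illustrative):

```python
# Margin weight generator: pi_i = (|l_i - p1| - p2)*p3 + p4.
# Maps clinician confidence scores 1-8 to weights in [0.4, 1.0]:
# extreme (confident) scores get weight 1.0, middle (uncertain) scores 0.4.

def margin_weight(l, p1=4.5, p2=3.0, p3=0.2, p4=0.9):
    return (abs(l - p1) - p2) * p3 + p4

weights = {l: round(margin_weight(l), 2) for l in range(1, 9)}
# l = 1 and l = 8 (high confidence) map to 1.0; l = 4 and l = 5 map to 0.4
```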

(42) In various examples herein, supervised learning systems can provide numerous advantages over conventional systems. These include the ability to train on data with partially available privileged information. Supervised learning systems may be configured into different models (at least three shown above) for such operation, and they may further implement SMO algorithms for each of these models. Further, supervised learning systems can incorporate label uncertainty and integrate it with privileged information. Further still, such systems make more maximally sparse feasible directions available: considering training samples both with and without privileged information adds flexibility in searching for a maximally sparse feasible direction. In configurations like that of FIG. 4 (a plain SVM model), there are only two possible maximally sparse feasible directions, while in configurations like that of FIG. 5 (LULUPAPI formulations incorporating privileged information and label uncertainty data), there are many more (nine in the illustrated example). In practice, this flexibility helps the recursive SMO-style optimizer converge faster in configurations like that of FIG. 5 than in configurations like that of FIG. 4.

(43) FIG. 6 illustrates an example ARDS analysis system 700 implementing a supervised learning system applying the techniques herein. The ARDS analysis system first receives physiological data, such as electrocardiography (ECG) data and photoplethysmography (PPG) data, taken for a patient. Initial signal processing is performed, such as wavelet signal processing that transforms the received signals into waveform features. This initial signal processing may further include signal smoothing, amplitude normalization, band-pass filtering, and other preprocessing steps. Furthermore, while wavelet signal processing is illustrated, other transformation processes may be used to identify constituent features.

(44) The identified feature data for the patient is provided to a first-stage machine learning ARDS classifier configured to implement a trained model for classifying ARDS based on received physiological data, in particular waveform features obtained from the physiological data. In the illustrated example, the first-stage classifier is trained on received clinical data, such as the physiological data, as well as privileged information such as electronic health record (EHR) data and cognitive data. FIG. 4 illustrates an example supervised learning system configuration for implementing the first-stage classifier in such configurations. In some examples, the first-stage classifier applies a trained model classifying ARDS based on received clinical data, privileged information, and, in some examples, uncertain labeled privileged information. FIG. 5 illustrates an example supervised learning system configuration for such configurations.

(45) The ARDS analysis system is also configured to analyze medical image data, in this example chest radiographs. Initial image processing is performed on the medical image data to identify imaging features, such as summary statistics derived from pixel-intensity histograms (including mean, variance, skewness, and kurtosis), for example using noise filtering, edge detection, and segmentation processes. These imaging features are combined with the waveform features and privileged information, including uncertain label data, to develop an updated machine learning ARDS classifier, for example implementing the configurations of FIG. 5.
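The histogram summary statistics mentioned above (mean, variance, skewness, kurtosis) can be computed directly from pixel-intensity counts; a minimal illustrative sketch (function name and toy histogram are hypothetical):

```python
# Illustrative summary statistics of a pixel-intensity histogram,
# where counts[v] is the number of pixels with intensity value v.

def histogram_moments(counts):
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    var = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    std = var ** 0.5
    if std == 0:
        return mean, var, 0.0, 0.0
    skew = sum(c * ((v - mean) / std) ** 3 for v, c in counts.items()) / n
    kurt = sum(c * ((v - mean) / std) ** 4 for v, c in counts.items()) / n
    return mean, var, skew, kurt

# A symmetric toy histogram has zero skewness.
m, v, s, k = histogram_moments({0: 1, 128: 2, 256: 1})
```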

(46) In an example of initial wavelet signal processing for filtering and feature extraction from ECG and PPG signals, the ARDS system of FIG. 6 uses a signal processing method based on a packet-based extension of the Dual Tree Complex Wavelet Transform (DTCWT), called the M-band Dual Tree Complex Wavelet Packet Transform (DTCWPT). To construct the packet form of the M-band DTCWPT, in an example, the wavelet signal processor repeatedly decomposes each of the sub-bands using low-pass/high-pass filter banks, so that the response of each branch of the second wavelet packet filter bank is the discrete Hilbert transform of the corresponding branch of the first wavelet packet filter bank. The use of M-band decomposition in the packet transform provides a dictionary of bases over which one can search for an optimal representation of a given signal. Moreover, an advantage of the wavelet packet approach to DTCWT (i.e., the M-band DTCWPT) is its ability to automatically select wavelets that suitably match the frequencies present in the signal.
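The M-band DTCWPT itself requires dual analytic filter-bank trees, but the wavelet packet idea it builds on, recursively splitting every sub-band with low-pass/high-pass filters, can be illustrated with a plain Haar packet transform. This toy sketch is a simplification and is not the DTCWPT:

```python
# Toy wavelet packet decomposition with orthonormal Haar filters.
# Simplified illustration of the packet idea only (NOT the M-band DTCWPT):
# every sub-band, low-pass and high-pass alike, is split again at each level.

def haar_split(x):
    """One low-pass/high-pass Haar analysis step (orthonormal)."""
    s = 0.5 ** 0.5
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def wavelet_packet(x, levels):
    """Full packet tree: returns the 2**levels leaf sub-bands."""
    bands = [list(x)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands

signal = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
leaves = wavelet_packet(signal, 2)  # four sub-bands of length 2
```

Because the Haar filters are orthonormal, the signal energy is preserved across the packet tree, which is one property that makes sub-band energies usable as waveform features.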

(47) After feature selection, the wavelet signal processor selected the top features (mainly wavelet-based features) to develop models to assess and predict subjects' levels of hemodynamic instability (modeled using lower body negative pressure, a human model of hypovolemia). In an example, we used the wavelet signal processor to extract an average of 12 feature windows at a window size of 120 beats for a patient sample. We compared those against traditional heart rate variability (HRV) features (i.e., standard features based on the power of the signal in multiple frequency bands); our features described in this study were used separately for developing independent models. The classification models developed with our identified features (i.e., the machine learning ARDS classifier) outperformed typical HRV features in terms of area under the curve (AUC) and accuracy across all window sizes in a receiver operating characteristic (ROC) analysis. Moreover, in an example, the classification model for our identified features (120-beat window) outperformed the best model for the typical HRV features (180-beat window) in terms of AUC and accuracy by 0.05 and 5.37%, respectively.
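The windowed feature extraction and the AUC comparison described above can be sketched as follows; the per-window features (mean and standard deviation) and the rank-statistic AUC are simplified, illustrative stand-ins for the wavelet-based features and the full ROC analysis:

```python
import numpy as np

def window_features(signal, window=120):
    """Split a beat series into non-overlapping windows and compute a
    simple feature vector per window (here: mean and std of each window)."""
    n = len(signal) // window
    wins = np.asarray(signal, dtype=float)[: n * window].reshape(n, window)
    return np.column_stack([wins.mean(axis=1), wins.std(axis=1)])

def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic:
    the fraction of (positive, negative) pairs ranked correctly, with
    ties counting half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

beats = np.arange(360.0)                     # toy beat series
wfeats = window_features(beats)              # 3 windows of 120 beats
perfect = auc(np.array([0.9, 0.8, 0.2, 0.1]), np.array([1, 1, 0, 0]))
```

Computing the AUC for two competing feature sets over the same windows yields the kind of head-to-head comparison reported above.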

(48) As shown, the present techniques provide novel computational methodologies capable of synthesizing and integrating longitudinal electronic health data streams for real-time and continuous health status monitoring and early detection of disease. In an example, these techniques are used to address the problem of monitoring patients with lung disease to detect the presence of ARDS at early stages. The techniques combine the capabilities of machine learning, privileged learning, and learning from uncertain data in an optimized manner. While examples are described herein in reference to diagnosis of ARDS, the proposed techniques may be used across healthcare settings, allowing for better characterization of patient health status in both in-hospital and in-home settings via portable electronic monitoring devices.

(49) FIG. 7 illustrates an example earnings forecasting system 800 implementing a supervised learning system implementing techniques herein. The earnings forecasting system can be used to forecast whether a particular company will achieve a positive or negative net income in the next quarter. In the training phase, the system utilizes two datasets X and X* as input into the machine learning model. Dataset X can include any past financial data, current market index values, and current stock valuations related to the company or its associated industry. Data contained in X should be available during both the training of the model and its use in production, i.e., for forecasting net income given current market conditions. Dataset X* contains privileged information in the form of Bureau of Labor Statistics (BLS) statistics, such as the Employment Cost Index, and the most recent financial quarterly report, which will not be available until the next quarter. In this example, the quarterly report and BLS statistics available at the end of Q1 2019 are used as privileged information to train an earnings forecasting system, the Q2 model, for use throughout Q2 2019. The Q2 model will only use the information available within dataset X to generate its forecasts. Once the Q2 2019 financial quarterly reports and new BLS statistics become available, a new model is trained to forecast net income for Q3 2019, the Q3 model. This process can be repeated for any subsequent quarter.
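The rolling retraining schedule described above can be sketched as follows; `train` and `predict` are hypothetical placeholders for any learner that consumes the privileged dataset X* at training time only and forecasts from X alone:

```python
def rolling_forecast(quarters, train, predict):
    """For each quarter, fit a model on all fully reported past quarters
    (standard features X plus privileged features X*, which exist only for
    completed quarters), then forecast the coming quarter using X alone."""
    forecasts = {}
    for i in range(1, len(quarters)):
        history = quarters[:i]                  # quarters with X and X* known
        model = train(history)                  # training may use privileged X*
        forecasts[quarters[i]] = predict(model, quarters[i])  # uses X only
    return forecasts

# Illustrative stand-ins: "training" simply counts the history quarters,
# and "predicting" tags the target quarter with that count.
quarters = ["Q1-2019", "Q2-2019", "Q3-2019"]
forecasts = rolling_forecast(quarters, train=len, predict=lambda m, q: f"{m}:{q}")
```

Each pass of the loop corresponds to one model in the figure: the Q2 model is trained once Q1 data (including its privileged report) is complete, the Q3 model once Q2 data is complete, and so on.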

(50) Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

(51) Additionally, certain embodiments are described herein as including logic or a number of routines, subroutines, applications, or instructions. These may constitute either software (e.g., code embodied on a non-transitory, machine-readable medium) or hardware. In hardware, the routines, etc., are tangible units capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.

(52) In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.

(53) Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.

(54) Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).

(55) The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.

(56) Similarly, the methods or routines described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of locations.

(58) Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.

(59) As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.

(60) Some embodiments may be described using the expressions “coupled” and “connected” along with their derivatives. For example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, yet still cooperate or interact with each other. The embodiments are not limited in this context.

(61) Those skilled in the art will recognize that a wide variety of modifications, alterations, and combinations can be made with respect to the above described embodiments without departing from the scope of the invention, and that such modifications, alterations, and combinations are to be viewed as being within the ambit of the inventive concept.

(62) While the present invention has been described with reference to specific examples, which are intended to be illustrative only and not to be limiting of the invention, it will be apparent to those of ordinary skill in the art that changes, additions and/or deletions may be made to the disclosed embodiments without departing from the spirit and scope of the invention.

(63) The foregoing description is given for clearness of understanding; and no unnecessary limitations should be understood therefrom, as modifications within the scope of the invention may be apparent to those having ordinary skill in the art.