Systems and Methods for Predicting Medications to Prescribe to a Patient Based on Machine Learning

20210358588 · 2021-11-18

Inventors

Cpc classification

International classification

Abstract

A system for predicting medications to prescribe to a patient includes a text encoding module and a medication prediction module. The text encoding module is configured to obtain a clinical-information vector from clinical information of the patient. The medication prediction module configured to apply a machine-learned medication-prediction algorithm to the clinical-information vector to select a subset of medications to prescribe to the patient. The machine-learned medication-prediction algorithm is designed with a diversity-promoting regularization model, and is configured to simultaneously consider correlations among different medications and dependencies between patient information and medications when selecting a subset of medications to prescribe to the patient.

Claims

1. A method of predicting medications to prescribe to a patient, the method comprising: obtaining a clinical-information vector from clinical information of the patient; and applying a machine-learned medication-prediction algorithm to the clinical-information vector to select a subset of medications to prescribe to the patient.

2. The method of claim 1, wherein applying a medication-prediction algorithm to the clinical-information vector comprises, for each medication in a set of medications: determining a score function representing a dependency between the patient's clinical information and the medication; and including the medication in the subset of medications if the score function satisfies a criterion.

3. The method of claim 2, wherein the medications included in the subset of medications to prescribe to the patient are selected from a first set of medications i and a second set of medications j, and determining a score function comprises obtaining a measure of correlation between a medication in the first set of medication and a medication in the second set of medications.

4. The method of claim 3, wherein determining a score function comprises implementing a determinantal point process to obtain the measure of correlation.

5. The method of claim 4, wherein determining a score function further comprises implementing a deep conditional determinantal point process to obtain a measure of dependency between a clinical condition and a pair of correlated medications.

6. The method of claim 4, wherein determining a score function further comprises implementing a relation-regularized deep conditional determinantal point process to obtain a measure of dependency between a clinical condition and a pair of correlated medications, wherein the correlation among medications accounts for synergistic and antagonistic interactions.

7. The method of claim 4, wherein determining a score function further comprises implementing a diversity-promoting regularization model.

8. The method of claim 3, wherein obtaining a measure of correlation comprises calculating a distance metric using a projection matrix, where the row vectors of the projection matrix project representation vectors of a first set of medications i and a second set of medications j into a lower-dimensional latent space.

9. The method of claim 3, wherein obtaining a measure of correlation comprises calculating uncorrelation between representation vectors of a first set of medications i and a second set of medications j.

10. The method of claim 9, wherein uncorrelation is calculated using eigenvalues of component matrices composed from representation vectors of a first set of medications i and a second set of medications j, wherein uniformity among the eigenvalues measures uncorrelation between components.

11. The method of claim 10, wherein eigenvalues are promoted to be uniform in order to promote evenness between components.

12. The method of claim 10, further including normalizing the eigenvalues into a probability simplex and encouraging the discrete distribution parameterized by the normalized eigenvalues to have small Kullback-Leibler (KL) divergence with the uniform distribution.

13. The method of claim 12, further comprising calculating a distance metric based on similarity between the normalized eigenvalues between the representation vectors of a first set of medications i and a second set of medications j.

14. The method of claim 3, wherein the vectors are encouraged to be near-orthogonal to promote diversity between the components.

15. A system predicting medications to prescribe to a patient, the system comprising: a text-encoding module configured to obtain a clinical-information vector from clinical information of the patient; and a medication prediction module configured to apply a machine-learned medication-prediction algorithm to the clinical-information vector to select a subset of medications to prescribe to the patient.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] Various aspects of apparatuses and methods will now be presented in the detailed description by way of example, and not by way of limitation, with reference to the accompanying drawings, wherein:

[0022] FIG. 1 is a block diagram of a system for predicting medications to prescribe to a patient using a machine-learned algorithm.

[0023] FIG. 2 is a block diagram of a design for a first version of the machine-learned algorithm of FIG. 1.

[0024] FIG. 3 is a block diagram of a design for a second version of the machine-learned algorithm of FIG. 1.

[0025] FIG. 4 is a detailed block diagram of a design for the second version of the machine-learned algorithm of FIG. 3 that includes a medication correlation module and a medication note dependency module.

[0026] FIG. 5 is a detailed block diagram of a design for the medication correlation module of FIG. 4.

[0027] FIG. 6 is a block diagram of a computing device that embodies the system of FIG. 1.

DETAILED DESCRIPTION

[0028] Disclosed herein is a medication prediction system for predicting medications to prescribe to a patient, including for example, medications that a caregiver may prescribe when a patient is discharged from a care facility. The prediction of discharge medications may be formulated as a subset selection problem. Given clinical information for a patient and information on a plurality K of candidate medications Y={1, . . . . , K}, the system disclosed herein predicts a subset S.Math.Y of medications that are most likely to be prescribed to the patient as discharge. Some of the concepts and features described herein are included in Diversity-promoting and Large-scale Machine Learning for Healthcare, a thesis submitted by Pengtao Xie in August 2018 to the Machine Learning Department, School of Computer Science, Carnegie Mellon University, which is hereby incorporated by reference in its entirety.

[0029] The predictive function of the system results in the identification of a subset S of medications, referred to herein as “predicted medications,” where each medication in the set is highly relevant to the patient's clinical condition x. In one configuration, the predictive function of the system also results in the identification of a subset S of medications, where the relations among the medications in the subset, including co-occurrence, adversarial interaction, and synergistic interaction, are accounted for to eliminate clinically-inconsistent medications.

[0030] The medication prediction system is configured to receive inputs of clinical information of a patient and to generate an output predicting one or more medications to be prescribed to the patient. The medication prediction system includes one or more machine-learned algorithms that model dependencies between patient information and medications, and that model correlations among medications. One or more of these models may involve a diversity-promoting distance metric learning model. For example, the machine-learned algorithm that correlates medications may be configured to learn representations of the medication records of numerous medications, compute similarities of the representations in a latent space, and generate a score that indicates similarities among the medications. The use of a diversity-promoting distance metric learning model is beneficial in that such models measure similarity in a way that ensures the capture of both frequently prescribed and infrequently prescribed medications with similar characteristics.

[0031] With reference to FIG. 1, in one configuration, a medication prediction system 100 includes a text encoding module 102 and a medication prediction module 104. The text encoding module 102 is configured to receive a subject patient's record of clinical information 106 and to produce a representation of the clinical information as a clinical-information vector x 108. The medication prediction module 104 receives the clinical information as the clinical-information vector x 108 and applies a previously-trained machine-learned algorithm 110 to the vector x and outputs a list of predicted medications 112.

[0032] Regarding the text encoding module 102, it is configured to extract information from the clinical record, and derive the clinical-information vector x 108 from the extracted information. The record of clinical information 106 may be hand written admission notes or printed records identifying one or more conditions of the patient. The conditions or extracted information may include, for example, one or more of the patient's current medication, vital signs, symptoms, laboratory results, past medical history, family history, social history, and allergies.

[0033] With reference to FIG. 2, in one embodiment, a machine-learned algorithm 110a for use in the medication prediction module 104 is previously trained in accordance with a training set K of medication i vectors a.sub.i 202, where {a.sub.i}.sub.i=1.sup.K and a training set of clinical information vectors x 204 to associate clinical information with medications. For example, the machine-learned algorithm 110 may define score functions g(a.sub.i, x) that measure the dependency between a piece of clinical information x and a medication i.

[0034] The training set K of medication i vectors a.sub.i, 202 and the training set of clinical information vectors x 204 used to train the machine-learned algorithm 110, are obtained using a text encoding module 206. In one embodiment, the text encoding module 206 is configured with the same architecture and parameters as the text encoding module 102 in FIG. 1 that is used to derive a subject patient's clinical-information vector x 108.

[0035] The text encoding module 206 is configured to receive a medication record 208 for each medication i in the set of K medications, and to produce a representation of the medical information as a medication i vector a.sub.i 202. The medication records 208 may be in the form of professional medical articles that describe various aspects of a medication, including its clinical applications, e.g., what conditions/diseases the medication can treat, and its side effects, dosage, and so on. The text encoding module 206 is configured to extract information from the medication records it receives, and derive the medication i vector a.sub.i 202 from the extracted information.

[0036] Similarly, the text encoding module 206 is configured to receive a clinical information records 210, and to produce a representation of the clinical information as a clinical information vector x 204. The clinical information records 210 may be hand written admission notes or printed records identifying one or more conditions of the patient. The conditions may include, for example, one or more of the patient's current medication, vital signs, symptoms, laboratory results, past medical history, family history, social history, and allergies.

[0037] Returning to FIG. 1, with the trained machine-learned algorithm 110a at hand, the medication prediction module 104 applies the algorithm to the subject patient's clinical-information vector x 108 to identify a subset S 112 of the set of K medications for prescribing to the patient. To this end, the trained machine-learned algorithm 110a derives a score function g (a.sub.i, x) that provides a measure of the dependency between the subject patient's clinical information x and the medication i in the set of K medications. The score function of each identified medication may be processed against a threshold score to determine whether the medication is included in the set of predicted medications 112 to be output by the medication prediction module 104.

[0038] With reference to FIG. 3, in another embodiment, a machine-learned algorithm 110b for use in the medication prediction module 104 is previously trained in accordance with a training set K of medication i vectors a.sub.i, 302, where {a.sub.i}.sub.i=1.sup.K, a training set of medication j vectors 304, and a training set of clinical information vectors x 306 to associate clinical information with medications. For example, the machine-learned algorithm 110 may define score functions k(a.sub.i, a.sub.j|x) that measure the correlation between different medications i and j and the dependency between a piece of clinical information x and correlated medications i and j.

[0039] The training set K of medication i vectors a.sub.i, 302, the training set of medication j vectors 304, and the training set of clinical information vectors x 306 used to train the machine-learned algorithm 110, are obtained using a text encoding module 308. In one embodiment, the text encoding module 308 is configured with the same architecture and parameters as the text encoding module 102 in FIG. 1.

[0040] The text encoding module 308 is configured to receive a medication record 310 for each medication i in the set of K medications, and to produce a representation of the medical information as a medication i vector a.sub.i 302. Likewise, the text encoding module 308 is configured to receive a medication record 312 for each medication j, and to produce a representation of the medical information as a medication j vector a.sub.j 304. The medication records 310, 312 may be in the form of professional medical articles that describe various aspects of a medication, including its clinical applications, e.g., what conditions/diseases the medication can treat, and its side effects, dosage, and so on. Similarly, the text encoding module 308 is configured to receive a clinical information records 314, and to produce a representation of the clinical information as a clinical information vector x 306. The clinical information records 314 may be hand written admission notes or printed records identifying one or more conditions of the patient. The conditions may include, for example, one or more of the patient's current medication, vital signs, symptoms, laboratory results, past medical history, family history, social history, and allergies.

[0041] Returning to FIG. 1, with the trained machine-learned algorithm 110b at hand, the medication prediction module 104 applies the algorithm to the subject patient's clinical-information vector x 108 to identify a subset S 112 of the set of K medications for prescribing to the patient. To this end, the trained machine-learned algorithm 110b derives a score function k (a.sub.i, a.sub.j|x) that provides a measure of the dependency between the subject patient's clinical information x and correlated medications i and j. The score function of each identified medication may be processed against a threshold score to determine whether the medication is included in the set of predicted medications 112 to be output by the medication prediction module 104.

[0042] With reference to FIG. 4, the machine-learned algorithm 110b of FIG. 3 includes a medication correlation module 402 and a medication note dependency module 406. Collectively, these modules 402, 406 present a machine-learned algorithm that is designed to simultaneously model correlations among medications i and j and model dependencies between patient information and medications.

[0043] In one embodiment, the medication correlation module 402 implements a determinantal point process (DPP) that captures the correlations among the medications and outputs scalar measures 404 indicating the correlation of a medication i and a medication j. For example, given a set of medications each represented with a vector a.sub.i or a.sub.j, the DPP computes a kernel matrix L∈R.sup.K×K, where L.sub.ij=k(a.sub.i, a.sub.j) and k(.Math.,.Math.) is a kernel function to defines a probability distribution over subsets. Then the probability over a subset of items indexed by S.Math.{1,, . . . , K} can be defined as:

[00001] $\begin{matrix} p (S) = \frac{\det (L_{s})}{\det (L + I)} & (Eq . 1) \end{matrix}$

[0044] where [0045] L.sub.s≡[L.sub.ij].sub.i,j∈S denotes the restriction of L to the entries indexed by elements of S and [0046] det(.Math.) denotes the determinant of a matrix and I is an identity matrix.

[0047] The determinant det(.Math.) enables the DPP to capture the high-order relations among items represented by the vectors a.sub.i and a.sub.j. A DPP is applied to capture the correlation among medications: given the representations of K medications {a.sub.i}.sub.i=1.sup.K, the kernel matrix L is computed and probability defined over medication-subset according to Eq. 1.

[0048] The medication note dependency module 406 incorporates a score function g(a.sub.i, x), which is derived as described above, into a kernel function in the DPP. On top of the kernel function k(a.sub.i, a.sub.j) measuring the correlation between medication i and j, a new kernel is defined that is conditioned on the clinical information input x that is included in the score function g(a.sub.i, x):

{tilde over (k)}(a.sub.i, a.sub.j|x)=g(a.sub.i, x)k(a.sub.i, a.sub.j)g(a.sub.i, x) (Eq. 2)

[0049] Under this conditional kernel parameterized by deep networks, a deep conditional DPP is obtained:

[00002] $\begin{matrix} p (S | x) = \frac{\det (L_{s} (x))}{\det (L (x) + I)} & (Eq . 3) \end{matrix}$

[0050] where L.sub.ij(x)={tilde over (k)}(a.sub.i, a.sub.j|x).

[0051] The deep conditional DPP is trained into the machine-learned algorithm 110b using historical information collected across a diverse patient population. Given training data {(d.sub.n, S.sub.n)}.sub.n=1.sup.N where d.sub.n is a clinical information input record and S.sub.n is the subset of medications prescribed at discharge time, the parameters Θ of the deep conditional DPP, mainly the weight and bias parameters in DNNs, are learned by maximizing the data likelihood

max.sub.ΘL({(d.sub.n, S.sub.n)}.sub.n=1.sup.N)=Π.sub.n=1.sup.Np(S.sub.n|d.sub.n; Θ) (Eq. 4)

[0052] Medical professionals have accumulated rich knowledge regarding the interactions between medications. These interactions largely affect the usage of medications. Specifically, two types of interactions are considered: antagonism and synergy. The antagonism interaction indicates that when used together, two medications may bring in a negative medical effect. Medications with antagonism interactions should be prohibited from being used together. The synergy interaction suggests that two medications are frequently used simultaneously to treat a disease. Their co-occurrence would bring in a positive medical effect and should be encouraged.

[0053] In another embodiment, a relational regularization model is designed and trained into the machine-learned algorithm 110b such that medications with synergy interaction are encouraged to be co-selected and those with antagonism interaction are penalized for co-selection. The relational regularization model is designed according to the property of DPP, which assigns larger probability mass p(S) over a medication subset S where the medications are more mutually “different”. The “difference” between a medication i and a medication j, as presented by a respective medication i vector a.sub.i and a medication j vector a.sub.j is measured by the kernel function k(a.sub.i, a.sub.j): the smaller k(a.sub.i, a.sub.j) is, the more different the vectors a.sub.i and a.sub.j are, hence, the more different the medications are. To encourage medication i and j to be simultaneously selected into S, the relational regularization model encourages k(a.sub.i, a.sub.j) to be small to increase p(S).

[0054] To discourage simultaneous selection, k(a.sub.i, a.sub.j)is preferred to be large to decrease p(S). Denoting M and C the set of medication pairs possessing antagonistic and synergistic interactions respectively, the following relational regularization deep conditional DPP problem is defined

max.sub.ΘL({(d.sub.n, S.sub.n)}.sub.n=1.sup.N)+λ(−Σ.sub.(i,j) ∈Mk(a.sub.i, a.sub.j)+Σ.sub.(i,j)∈Ck(a.sub.i, a.sub.j)) (6.5)

[0055] In the second term of the objective function, a medication pair (i, j) with synergistic interaction is encouraged to have smaller k (a.sub.i, a.sub.j), while those with an antagonistic interaction are encouraged to have larger k(a.sub.i, a.sub.j).

[0056] A machine-learned algorithm 110b designed and trained in accordance with the foregoing relational regularization deep conditional DPP performs well on frequent medications but less well on infrequent medications. To address this, in an alternate configuration, a diversity-promoting regularization may be applied to the medication correlation aspect of the machine-learned algorithm 110b to improve the algorithm's performance with respect to infrequent medications.

[0057] In this embodiment, medication i vectors a.sub.i 302 and medication j vectors 304 are input to the distance metric learning sub-module 502 to learn a distance metric. The distance metric is featured by a projection matrix where the row vectors of this matrix project the representation vectors of the medications into a lower-dimensional latent space. For example, “diversity” may be characterized by considering two factors: uncorrelation and evenness. In this way, uncorrelation may be a measure of how uncorrelated components are. That is, less correlation is equivalent to more diversity. Additionally, for evenness in latent space modeling, components may play substantially equally important roles and no one component dominating, such that each component contributes significantly in data modeling.

[0058] In some embodiments, uncorrelation among components may be characterized from a statistical perspective by treating components as random variables and measuring their covariance which is proportional to their correlation. In one embodiment, A∈ custom-character .sup.d×m denotes the component matrix whose k-th column is the parameter vector a.sub.k of component k. In some embodiments, a row view of A: may be used where each component is treated as a random variable and each row vector ã.sub.i.sup.T is a sample drawn from the random vector formed by the m components. Further

[00003] $μ = \frac{1}{d} {.Math.}_{i = 1}^{d} {\tilde{a}}_{i} = \frac{1}{d} A^{⊤} 1$

may be set as the sample mean, where the elements of 1∈ custom-character .sup.d are all 1. An empirical covariance matrix may then be computed with the components as

[00004] $G = \frac{1}{d} {.Math.}_{i = 1}^{d} ({\tilde{a}}_{i} - μ) {({\tilde{a}}_{i} - μ)}^{⊤} = \frac{1}{d} A^{⊤} A - (\frac{1}{d} A^{⊤} 1) {(\frac{1}{d} A^{⊤} 1)}^{⊤} .$

By imposing the constraint A.sup.T1=0, therefore

[00005] $G = \frac{1}{d} A^{⊤} A .$

Suppose A is a full rank matrix and m>d, then G is a full-rank matrix with rank m.

[0059] For the next step, the eigenvalues of G play important roles in characterizing the uncorrelation and evenness of components. Let G=Σ.sub.k=1.sup.mλ.sub.ku.sub.ku.sub.k.sup.T be the eigendecomposition where λ.sub.k is an eigenvalue and u.sub.k is the associated eigenvector. In principle component analysis, an eigenvector u.sub.k of the covariance matrix G represents a principal direction of the data points and the associated eigenvalue λ.sub.k tells the variability of points along that direction. The larger λ.sub.k is, the more spread out the points along the direction u.sub.k. When the eigenvectors (principal directions) are not aligned with the coordinate axis, the level of disparity among eigenvalues indicates the level of correlation among the m components (random variables). The more different the eigenvalues are, the higher the correlation is. Considering this, the uniformity among eigenvalues of G can be utilized to measure how uncorrelated the components are.

[0060] Secondly, the eigenvalues are related with the other factor of diversity: evenness. When the eigenvectors are aligned with the coordinate axis, the components are uncorrelated. In this case, evenness is used to measure diversity. In this example, each component is assigned an importance score. Since the eigenvectors are in parallel to the coordinate axis, the eigenvalues reflect the variance of components. Analogous to principle component analysis which posits that random variables with larger variance are more important, the present embodiment may use variance to measure importance. According to the evenness criteria, the components are more diverse if their importance scores match, which motivates us to encourage the eigenvalues to be uniform.

[0061] To sum up, the eigenvalues are encouraged to be even in both cases: (1) when the eigenvectors are not aligned with the coordinate axis, they are preferred to be even to reduce the correlation of components; (2) when the eigenvectors are aligned with the coordinate axis, they are encouraged to be even such that different components contribute equally in modeling data.

[0062] The similarity (or distance) of medications is then defined in the latent space. Further, row vectors of the projection matrix are encouraged to be diverse by a diversity-promotion sub-module. In some embodiments, by promoting diversity the row vectors evenly spread out and to represent both frequent medications and infrequent medications. By doing this, the similarity of infrequent medications can be better measured as the distance learning model counters skew toward frequent medications. The learned distance metric determined by the distance metric learning sub-module 502 and the medication i vectors a.sub.i 302 and medication j vectors 304 are input to the similarity calculation sub-module 504 to calculate the similarity score between the two medications.

[0063] In some embodiments, the distance metric learning sub-module 502 learns a distance metric. It takes representation vectors of two medications as inputs and produces a distance metric that can be utilized to measure the similarity of two medications. The distance metric between two medications is defined in the following way: given the representations of the medication's medication record, a linear projection matrix is utilized to project these representations into a latent space; then the squared Euclidean distance between the latent representations is measured. The distance metric learning sub-module learns this distance metric (specifically, the linear projection matrix) by encouraging the distance between similar medications to be as small as possible, and encouraging the distance between dissimilar medications to be separated by a margin.

[0064] A diversity-promotion sub-module 506 is utilized to control the row vectors of the distance matrix in the distance metric learning sub-module 502, such that these vectors are diverse. In this embodiment, by promoting diversity the row vectors spread out and give frequent medications and infrequent medications a fair treatment. In this way, the similarity among infrequent medications and frequent medications can be better measured. Diversity is measured using near-orthogonality: vectors that are close to being orthogonal are more diverse. To encourage near-orthogonality, the diversity-promotion sub-module 506 computes the Gram matrix of the row vectors, then encourages the Gram matrix to be close to an identity matrix where the closeness is measured using Bregman matrix divergence.

[0065] The similarity calculation sub-module 504 calculates the similarity of two medications. It takes the representation vectors, i.e., the medication i vectors a.sub.i 302 and medication j vectors 304, as input and produces a score that indicates the similarity of the two medications. At the core of this sub-module 504 is a distance matrix (learned by the DML sub-module) where the row vectors of this matrix project the representation vectors of the medications into a lower-dimensional latent space. The similarity of medications is then measured in the latent space.

[0066] In some embodiments, to promote uniformity among eigenvalues, as a general approach, eigenvalues may be normalized into a probability simplex and then the discrete distribution parameterized by the normalized eigenvalues may be encouraged to have small Kullback-Leibler (KL) divergence with the uniform distribution. Given the eigenvalues {λ.sub.k}.sub.k=1.sup.m, they are then normalized into a probability simplex

[00006] ${\hat{λ}}_{k} = \frac{{\hat{λ}}_{k}}{{.Math.}_{j = 1}^{m} {\hat{λ}}_{k}}$

based on which a distribution is defined on a discrete random variable X=1, . . . , m where p(X=k)={circumflex over (λ)}.sub.k.

[0067] In addition, to ensure the eigenvalues are strictly positive, A.sup.TA may be set to be positive definite. To encourage {λ.sub.k}.sub.k=1.sup.m to be uniform, the distribution p(X) is set be “close” to a uniform distribution

[00007] $q (X = k) = \frac{1}{m},$

where the “closeness” is measured using KL divergence

[00008] $K L (p || q) : {.Math.}_{k = 1}^{m} {\hat{λ}}_{k} \log \frac{{\hat{λ}}_{k}}{1 / m} = \frac{{.Math.}_{k = 1}^{m} λ_{k} \log λ_{k}}{{.Math.}_{j = 1}^{m} λ_{j}} - \log {.Math.}_{j = 1}^{m} λ_{j} + \log m .$

In this equation, Σ.sub.k=1.sup.mλ.sub.klog λ.sub.k is equivalent to

[00009] $tr ((\frac{1}{d} A^{⊤} A) \log (\frac{1}{d} A^{⊤} A)),$

where log(.Math.) denotes matrix logarithm. To show this, note that

[00010] $\log (\frac{1}{d} A^{⊤} A) = {.Math.}_{k = 1}^{m} \log (λ_{k}) u_{k} u_{k}^{⊤},$

according to the property of matrix logarithm. Then,

[00011] $tr ((\frac{1}{d} A^{⊤} A) \log (\frac{1}{d} A^{⊤} A))$

is equal to tr((Σ.sub.k=1.sup.mλ.sub.ku.sub.ku.sub.k.sup.T)(Σ.sub.k=1.sup.mlog(λ)k u.sub.ku.sub.k.sup.T)) which equals to Σ.sub.k=1.sup.mλ.sub.klog λ.sub.k. According to the property of trace,

[00012] $tr (\frac{1}{d} A^{⊤} A) = {.Math.}_{k = 1}^{m} λ_{k} .$

Then the KL divergence can be turned into a diversity-promoting uniform eigenvalue regularizer (UER):

[00013] $\frac{t r ((\frac{1}{d} A^{⊤} A) \log (\frac{1}{d} A^{⊤} A))}{tr (\frac{1}{d} A^{⊤} A)} - \log tr (\frac{1}{d} A^{⊤} A),$

subject to A.sup.TA>0 and A.sup.T1=0.

[0068] UER then may be applied to promote diversity. For example, let custom-character (A) denote the objective function of an ML model, then a UE-regularized ML problem can be defined as

[00014] $\min_{A} ℒ (A) + λ (\frac{tr ((\frac{1}{d} A^{⊤} A) \log (\frac{1}{d} A^{⊤} A))}{tr (\frac{1}{d} A^{⊤} A)} - \log tr (\frac{1}{d} A^{⊤} A))$

subject to A.sup.TA >0 and A.sup.T 1=0, where λ is the regularization parameter.

[0069] Uniform eigenvalue regularizers may then be applied to promote diversity in a specific model: distance metric learning (DML). Given data pairs either labeled as “similar” or “dissimilar”, distance metric learning aims to learn a distance metric under which similar pairs would be placed close to each other and dissimilar pairs are separated apart. The learned distance can benefit a wide range of tasks, including retrieval, clustering and classification. The distance metric may be defined as between x, y ∈ custom-character .sup.d as ∥A.sup.Tx−A.sup.Ty∥.sub.2.sup.2 where A∈.sup.d×m is a parameter matrix whose column vectors are components. A uniform eigenvalue regularized DML (UE-DML) problem can then be formulated as:

[00015] $\min_{A} {.Math.}_{(x_{,} y) \in} �� {.Math. A^{⊤} x - A^{⊤} y .Math.}_{2}^{2} + {.Math.}_{(x, y) \in^{��}} \max (0, 1 - {.Math. A^{⊤} x - A^{⊤} y .Math.}_{2}^{2}) + λ (\frac{t r ((\frac{1}{d} A^{⊤} A) \log (\frac{1}{d} A^{⊤} A))}{t r (\frac{1}{d} A^{⊤} A)} - \log tr (\frac{1}{d} A^{⊤} A))$

subject to A.sup.TA >0 and A.sup.T1=0, where custom-character and are the set of similar and dissimilar pairs respectively. The first and second term in the objective function encourage similar pairs to have small distances and dissimilar pairs to have large distances respectively.

[0070] The UE regularizer is nonconvex and is difficult to be convexified. As a result, the UE-regularized ML problems are nonconvex where achieving the global optimal is NP-hard. In this section, diversity-promoting regularizers are designed that make convex relaxation easier. Nonconvex regularizers are defined based on Bregman matrix divergence, then discuss how to convexify them.

[0071] Diversity may also be defined as near-orthogonality, wherein component vectors are determined to be more diverse if they are closer to being orthogonal. To encourage near orthogonality between two vectors a.sub.i and a.sub.j, one way is to make their inner product a.sub.i T a.sub.j close to zero and their custom-character .sub.2 norm ∥a.sub.i∥.sub.2, ∥a.sub.j∥.sub.2 close to one. For a set of vectors {a.sub.i}.sub.i=1.sup.m, near orthogonality can be achieved in the following manner by computing the Gram matrix G where G.sub.ij =a.sub.i.sup.Ta.sub.j, then encouraging G to be close to an identity matrix. Off the diagonal of G and I are a.sub.i.sup.Ta.sub.j and zero respectively. On the diagonal of G and I are ∥a.sub.i∥.sub.2.sup.2 and one respectively. Making G close to I effectively encourages a.sub.i.sup.Ta.sub.j to be close to zero and ∥a.sub.i∥.sub.2 close to one, which therefore encourages a.sub.i and a.sub.j to get close to orthogonal.

[0072] A Bregman matrix divergence (BMD) may be used to measure “closeness” between two matrices. Let custom-character .sup.n denote real symmetric n×n matrices. Given a strictly convex, differentiable function ϕ:.sup.n.fwdarw., the BMD is defined as D.sub.ϕ(X,Y)=ϕ(X)−ϕ(Y)−tr((∇ϕ(Y)).sup.T(X−Y)), where tr(A) denotes the trace of matrix A. Different choices of ϕ(X) lead to different divergences. When ϕ(X)=∥X∥.sub.F.sup.2, BMD is specialized to the squared Frobenius norm (SFN) ∥X−Y∥.sub.F.sup.2. If ϕ(X)=tr(X log X−X), where log X denotes the matrix logarithm of X, the divergence becomes D.sub.vN(X, Y)=tr(X log X−X log Y−X+Y), which is von Neumann divergence (VND). If ϕ(X)=−log detX where det (X) denotes the determinant of X, the log-determinant divergence (LDD) D.sub.lD(X,Y)=tr(XY.sup.−1)−log det(XY.sup.−1)−n.

[0073] To encourage near-orthogonality among components, the BMD between the Gram matrix AA.sup.T and an identity matrix I may be small, which results in a family of BMD regularizers: Ω.sub.ϕ(A)=D.sub.ϕ(AA.sup.T, I). Ω.sub.ϕ(A) can be specialized to different instances, according to the choices of D.sub.ϕ(.Math.,.Math.). Under SFN, Ω.sub.ϕ(A) becomes Ω.sub.Fro(A)=∥AA.sup.T−I∥.sub.F.sup.2. Under VND, Ω.sub.ϕ(A) becomes Ω_vN (A)=tr(AA{circumflex over ( )} T)−AA{circumflex over ( )} T)+m. Under LDD, Ω.sub.ϕ(A) becomes Ω.sub.ld (A)=tr(AA.sup.T)−log det(AA.sup.T)−m.

[0074] Applying these regularizers to distance metric learning (DML), the following BMD-regularized DML (BMD-DML) problem is defined as:

[00016] $\min_{A} \frac{1}{| �� |} {.Math.}_{(x, y) \in ��} {.Math. Ax - Ay .Math.}_{2}^{2} + \frac{1}{| �� |} {.Math.}_{(x, y) \in ��} \max (0, 1 - .Math. Ax - Ay {||}_{2}^{2}) + λ Ω_{ϕ (A)}$

which is nonconves.

[0075] FIG. 6 is a block diagram of a computing device 600 that embodies the medication prediction system of FIG. 1. The computing device 600 is specially configured to execute instructions related to the medication prediction processing described above, including the application of machine-learned algorithms to clinical information records. Computers capable of being specially configured to execute such instructions may be in the form of a laptop, desktop, workstation, or other appropriate computer capable of connecting to the medication prediction system 100 of FIG. 1.

[0076] The computing device 600 includes a central processing unit (CPU) 602 that implements the various modules of the medication prediction system 100 described above with reference to FIG. 1, and a computer readable media 604 that includes program instructions that enable the CPU to implement the modules of the medication prediction system 100. The computing device 600 also includes a user interface 608 and a display 610, and an interface bus 612 that interconnects all components of the computing device.

[0077] Computer readable media 604 suitable for storing medication prediction system processing instructions include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, flash memory devices, magnetic disks, magneto optical disks and CD ROM and DVD-ROM disks. In operation, the CPU 602 executes the medication prediction system processing instructions stored in the computer readable media 604 to thereby perform the functions of the text encoding module 102 and the medication prediction module 104.

[0078] The user interface 608, which may be a keyboard or a mouse, and the display 610 allow for a clinician to interface with the computing device 600 and the components of the medication prediction system 100. For example, a clinician seeking to obtain a set of medications to prescribe to a subject patient, may input a record or a number of input records of a subject patient for processing. The clinician may then initiate execution of the medication prediction system processing instructions stored in the computer readable media 604 through the user interface 608, and await a display of the predicted medications.

[0079] While various embodiments have been described above, they have been presented by way of example only, and not by way of limitation. Likewise, the various diagrams may depict an example architectural or other configuration for the disclosure, which is done to aid in understanding the features and functionality that can be included in the disclosure. The disclosure is not restricted to the illustrated example architectures or configurations, but can be implemented using a variety of alternative architectures and configurations.

[0080] In this document, the terms “module” and “engine” as used herein, refers to software, firmware, hardware, and any combination of these elements for performing the associated functions described herein. Additionally, for purpose of discussion, the various modules are described as discrete modules; however, as would be apparent to one of ordinary skill in the art, two or more modules may be combined to form a single module that performs the associated functions according embodiments of the invention.

[0081] In this document, the terms “computer program product”, “computer-readable medium”, and the like, may be used generally to refer to media such as, memory storage devices, or storage unit. These, and other forms of computer-readable media, may be involved in storing one or more instructions for use by processor to cause the processor to perform specified operations. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system.

[0082] Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. As examples of the foregoing: the term “including” should be read as meaning “including, without limitation” or the like; the term “example” is used to provide exemplary instances of the item in discussion, not an exhaustive or limiting list thereof; and adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known”, and terms of similar meaning, should not be construed as limiting the item described to a given time period, or to an item available as of a given time. But instead these terms should be read to encompass conventional, traditional, normal, or standard technologies that may be available, known now, or at any time in the future.

[0083] Additionally, memory or other storage, as well as communication components, may be employed in embodiments of the invention. It will be appreciated that, for clarity purposes, the above description has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processing logic elements or domains may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processing logic elements or controllers may be performed by the same processing logic element or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.

[0084] Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by, for example, a single unit or processing logic element. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined. The inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also, the inclusion of a feature in one category of claims does not imply a limitation to this category, but rather the feature may be equally applicable to other claim categories, as appropriate.

[0085] The various aspects of this disclosure are provided to enable one of ordinary skill in the art to practice the present invention. Various modifications to exemplary embodiments presented throughout this disclosure will be readily apparent to those skilled in the art. Thus, the claims are not intended to be limited to the various aspects of this disclosure, but are to be accorded the full scope consistent with the language of the claims. All structural and functional equivalents to the various components of the exemplary embodiments described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed under the provisions of 35 U.S.C. § 112, sixth paragraph, unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.”

Systems and Methods for Predicting Medications to Prescribe to a Patient Based on Machine Learning

Inventors

Cpc classification

Classification Explorer

G16B40/00

PHYSICS

Classification Explorer

G06N3/044

PHYSICS

Classification Explorer

G06V30/274

PHYSICS

Classification Explorer

G06N20/00

PHYSICS

Classification Explorer

Y02A90/10

GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

G16H50/20

PHYSICS

Classification Explorer

G06F18/2136

PHYSICS

Classification Explorer

G06F40/284

PHYSICS

Classification Explorer

G06N3/045

PHYSICS

Classification Explorer

G16H10/60

PHYSICS

Classification Explorer

G16H20/10

PHYSICS

Classification Explorer

G16H70/60

PHYSICS

Classification Explorer

H04L67/104

ELECTRICITY

Classification Explorer

G06V10/82

PHYSICS

Classification Explorer

G16B50/00

PHYSICS

Classification Explorer

G06F18/2431

PHYSICS

Classification Explorer

G06T7/0012

PHYSICS

Classification Explorer

G16H50/70

PHYSICS

Classification Explorer

G16H30/40

PHYSICS

Classification Explorer

G16H15/00

PHYSICS

Classification Explorer

G06T2207/20084

PHYSICS

Classification Explorer

G06T2207/20081

PHYSICS

Classification Explorer

G06F16/36

PHYSICS

Classification Explorer

G06N5/022

PHYSICS

Classification Explorer

G06V2201/03