Method of predicting chromatographic elution order of compounds
11651213 · 2023-05-16
Assignee
- Pukyong National University Industry-University Cooperation Foundation (Busan, KR)
- Medical University of Gdansk (Gdansk, PL)
Inventors
Cpc classification
G16C20/30
PHYSICS
G01N30/8693
PHYSICS
International classification
G16C20/30
PHYSICS
Abstract
Disclosed is a method for predicting an elution order of compounds in a mixture. The method includes (a) building a quantitative structure-retention relationship (QSRR) model and (b) predicting a chromatographic elution order of the compounds in the mixture on the basis of the QSRR model using mathematical programming. The mathematical programming is a non-linear programming technique in which a predicted elution order of the compounds is used as a constraint or a multi-objective optimization (MOO) in which a retention time prediction error and an elution order prediction error are used as objective functions. With the use of the method of the present disclosure, it is possible to optimize separation of complex mixtures in reversed-phase chromatography by enabling identification of accurate positions of individual compounds that provides higher certainty in identifying a given compound, e.g., during an “omics” analysis (proteomics, metabolomics, etc.).
Claims
1. A computer-implemented method of predicting a chromatographic elution order of compounds in a mixture, the method comprising: generating, by a computer processor, a quantitative structure-retention relationship (QSRR) model; receiving, by the computer processor, mixture data representing one or more mixtures of a plurality of compounds; computer-modeling, by the computer processor, the QSRR model on the mixture data using artificial neural networks (ANN) to generate a non-linear programming (NLP) model comprising one or more inequality constraints; predicting, by the computer processor, a chromatographic elution order of the compounds in the mixture data using the NLP model, the predicting comprising obtaining a low retention time prediction error and an elution order prediction error, the one or more inequality constraints associated with the elution order prediction error; detecting, by the computer processor, positions of the compounds in the mixture data based on the predicted chromatographic elution order; and identifying, by the computer processor, one or more of the compounds based on the detected positions of the compounds, wherein the one or more inequality constraints comprise a positive relaxation parameter and a molecular descriptor associated with the elution order prediction error.
2. The method according to claim 1, wherein, in the predicting, the chromatographic elution order of the compounds in the mixture is predicted by the computer processor executing instructions comprising the non-linear programming (I) under the inequality constraints (II):
3. The method according to claim 2, wherein the molecular descriptor comprises dipole moment (μ), excess charge of an atom that is most negatively charged (δ.sub.min), solvent-accessible surface area (SASA), sum of retention times of respective 20 naturally occurring amino acids (Sum.sub.AA), Van der Waals volume (vDW.sub.vol.), computerized octanol-water coefficient (c log P), or any combination thereof.
4. A computer-implemented method of predicting a chromatographic elution order of compounds in a mixture, the method comprising: generating, by a computer processor, a quantitative structure-retention relationship (QSRR) model, wherein the QSRR model comprises a molecular descriptor; receiving, by the computer processor, mixture data representing one or more mixtures of a plurality of compounds; computer-modeling, by the computer processor, the QSRR model on the mixture data to generate a linear model; predicting, by the computer processor, a chromatographic elution order of the compounds in the mixture data using the linear model by performing multi-objective optimization (MOO) on the basis of an objective function representing a retention time prediction error represented by Formula (III) and an elution order prediction error represented by Formula (IV):
5. The method according to claim 4, wherein in the predicting, the MOO selects a Pareto optimal solution, selecting the Pareto optimal solution comprising: selecting a knee point which is an optimal compromise between the retention time prediction error and the elution order prediction error from a Pareto front including the Pareto solutions; moving to the next Pareto solution to reduce the elution order prediction error; verifying the solution using an applicability domain; and repeating the knee point selection and the moving until an increase in the retention time prediction error reaches a first predetermined threshold or an outlier in the applicability domain exceeds a second predetermined threshold.
6. The method of claim 4, wherein the linear model is represented by the following formula:
t.sub.R,j=a.sub.1x.sub.j,1+a.sub.2x.sub.j,2+ . . . +a.sub.nx.sub.j,n wherein: t.sub.R,j are retention times of respective compounds j sorted in ascending order, x.sub.j,i (i=1, . . . , n) are the molecular descriptors of respective compounds j, and a.sub.i (i=1, . . . , n) are regression coefficients.
7. The method according to claim 6, wherein the molecular descriptor comprises dipole moment (μ), excess charge of an atom that is most negatively charged (δ.sub.min), solvent-accessible surface area (SASA), sum of retention times of respective 20 naturally occurring amino acids (Sum.sub.AA), Van der Waals volume (vDW.sub.vol.), computerized octanol-water coefficient (c log P), or any combination thereof.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
DETAILED DESCRIPTION
(7) In describing embodiments of the present disclosure, well-known functions or constructions will not be described in detail when they may obscure the gist of the present invention.
(8) Embodiments in accordance with the concept of the present invention can undergo various changes to have various forms, and only some specific embodiments are illustrated in the drawings and described in detail in the present disclosure. While specific embodiments of the present disclosure are described herein below, they are only for illustrative purposes and should not be construed to limit the scope of the present disclosure. The present disclosure should be construed to cover not only the specific embodiments but also cover all modifications, equivalents, and substitutions that fall within the concept and technical spirit of the present disclosure.
(9) The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “includes”, or “has” when used in the present disclosure specify the presence of stated features, regions, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, regions, integers, steps, operations, elements, components and/or combinations thereof.
(10) The present disclosure describes a method for solving problems with prediction of elution order through QSRR-based mathematical programming in which a retention time error and an elution order error are defined as parts of an objective function. For a linear model, the gist of the present disclosure is a general QSRR model defined by Formula 1.
t.sub.R,j=a.sub.1x.sub.j,1+a.sub.2x.sub.j,2+ . . . +a.sub.nx.sub.j,n (1)
(11) In Formula 1, t.sub.R,j are the retention times of compounds j arranged in ascending order, x.sub.j,i (i=1, . . . , n) are the molecular descriptors of compounds j, and a.sub.i (i=1, . . . , n) are regression coefficients; t.sub.R,j and x.sub.j,i are mean-centered (adjusted so that their mean is 0). The a.sub.i can be obtained through multiple linear regression (MLR). When the QSRR is non-linear, a non-linear modeling technique such as artificial neural networks (ANN) may be used.
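The mean-centered MLR fit of Formula 1 can be sketched as follows. This is a minimal illustration, not the implementation of the present disclosure: the function names fit_centered_mlr and predict are hypothetical, and the descriptor values and coefficients are synthetic.

```python
import numpy as np

def fit_centered_mlr(X, t_r):
    """Fit t_R = a_1 x_1 + ... + a_n x_n on mean-centered data (no intercept)."""
    X = np.asarray(X, dtype=float)
    t_r = np.asarray(t_r, dtype=float)
    x_mean = X.mean(axis=0)
    t_mean = t_r.mean()
    # Centering both sides removes the intercept term, as Formula 1 assumes.
    a, *_ = np.linalg.lstsq(X - x_mean, t_r - t_mean, rcond=None)
    return a, x_mean, t_mean

def predict(X, a, x_mean, t_mean):
    """Predict retention times in the original (uncentered) units."""
    return (np.asarray(X, dtype=float) - x_mean) @ a + t_mean

# Synthetic, made-up data: 6 compounds, n = 3 descriptors.
X = np.array([[1.0, 0.2, 3.0],
              [2.0, 0.1, 2.5],
              [3.0, 0.4, 4.0],
              [4.0, 0.3, 1.0],
              [5.0, 0.6, 2.0],
              [6.0, 0.5, 3.5]])
true_a = np.array([1.5, -2.0, 0.5])   # made-up regression coefficients
t_r = X @ true_a + 4.0                # noise-free synthetic retention times

a, x_mean, t_mean = fit_centered_mlr(X, t_r)
t_hat = predict(X, a, x_mean, t_mean)
```

Because the synthetic retention times are noise-free, the fitted a reproduces true_a; with measured data the fit would only approximate the underlying relation.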
(12) (1) Constrained Non-Linear Programming (NLP) Using Predicted Elution Order as Constraint
(13) The QSRR problem can be expressed as the mathematical programming problem of Formula 2.
(14)
(15) In Formula 2, {circumflex over (t)}.sub.R,j=f(x), where the modeled QSRR relation f(x) is a function with the a.sub.i as parameters. For example, as in the embodiment described below, when n=3 and MLR is used as the relation model, Formula 1 can be expressed as Formula 3.
(16)
(17) That is, a typical QSRR problem reduces to a non-linear programming problem. The elution order is then easily predicted by arranging the predicted retention times in ascending order. However, as described above, although the QSRR model obtained from Formula 2 can predict the retention times of many peaks with adequate accuracy (i.e., within a tolerable prediction error range), the elution order predicted with that model is often inaccurate. This follows from the regression equation: it minimizes only the retention time error, so two closely eluting compounds can exchange positions even when both retention time errors are small.
(18) In terms of mathematical programming, the problems mentioned above can seemingly be solved by introducing the following inequality constraints:
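The point made above, that small retention time errors can still produce a wrong elution order, can be illustrated with a short sketch. The helper name elution_order and all retention times are made up for illustration.

```python
import numpy as np

def elution_order(retention_times):
    """Return 1-based elution positions: 1 = first compound to elute."""
    t = np.asarray(retention_times, dtype=float)
    order = np.empty(len(t), dtype=int)
    # argsort lists compound indices by ascending retention time;
    # inverting that permutation gives each compound's position.
    order[np.argsort(t, kind="stable")] = np.arange(1, len(t) + 1)
    return order

t_obs = np.array([2.10, 2.25, 5.00, 7.40])    # made-up measured times (min)
t_pred = np.array([2.22, 2.15, 5.05, 7.30])   # made-up predictions (min)

order_obs = elution_order(t_obs)
order_pred = elution_order(t_pred)
# Indices of compounds whose predicted position differs from the observed one.
swapped = np.flatnonzero(order_obs != order_pred)
```

Here every prediction lies within about 0.12 min of the measured value, yet the first two compounds exchange positions in the predicted order.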
(19)
(20) For m compounds, the inequality constraints are represented as a vector-matrix notation
(21) However, numerical experiments have shown that the above constraints are so restrictive that even simple mixtures give poor results. Therefore, the relaxed inequality constraints shown below are used to obtain a low retention time prediction error and a low elution order prediction error simultaneously.
(22)
(23) Here, α.sub.j is a positive relaxation parameter, and ā is a decision vector composed of a.sub.1, a.sub.2, a.sub.3 and α.sub.j (j=1, 2, . . . , m−1). The inequality constraints for m compounds can be expressed in vector-matrix notation as shown below.
(24)
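The constraint formulas themselves are not reproduced in this text, so the following sketch rests on assumed forms. The strict constraint is assumed to be {circumflex over (t)}.sub.R,j ≤ {circumflex over (t)}.sub.R,j+1 for compounds sorted by measured retention time, and the relaxed constraint is assumed to be {circumflex over (t)}.sub.R,j − {circumflex over (t)}.sub.R,j+1 − α.sub.j ≤ 0. All function names and data are hypothetical, and the sketch only builds and checks the constraint matrices; it does not solve the optimization problem.

```python
import numpy as np

def difference_matrix(m):
    """(m-1) x m matrix D with (D @ t)[j] = t[j] - t[j+1]."""
    D = np.zeros((m - 1, m))
    idx = np.arange(m - 1)
    D[idx, idx] = 1.0
    D[idx, idx + 1] = -1.0
    return D

def strict_constraints(X):
    """Assumed strict form: (D X) a <= 0, i.e. t_hat[j] <= t_hat[j+1]."""
    return difference_matrix(X.shape[0]) @ X

def relaxed_constraints(X):
    """Assumed relaxed form: [D X | -I] a_bar <= 0 with a_bar = (a, alpha)."""
    G = strict_constraints(X)
    return np.hstack([G, -np.eye(G.shape[0])])

# Made-up centered descriptors for m = 4 compounds (n = 3),
# rows sorted by measured retention time.
X = np.array([[-1.0, 0.5, 0.0],
              [-0.5, -0.5, 1.0],
              [0.5, 0.5, -1.0],
              [1.0, -0.5, 0.0]])
a = np.array([1.0, 0.2, 0.7])                # made-up coefficients

violation = strict_constraints(X) @ a        # positive entry = order violated
# Positive slack just large enough to absorb each violation.
alpha = np.maximum(violation, 0.0) + 1e-3
a_bar = np.concatenate([a, alpha])           # decision vector (a, alpha)
relaxed_ok = bool(np.all(relaxed_constraints(X) @ a_bar <= 0.0))
```

With these numbers the second and third compounds are predicted in the wrong order, so the strict constraints are infeasible for this a, while the relaxed constraints are satisfied once α.sub.j absorbs the violation. In an actual optimization the α.sub.j would be decision variables chosen by the solver rather than set by hand.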
(25) (2) Multi-Objective Optimization Using Retention Time Prediction Error and Elution Order Prediction Error as Objective Function (MOO)
(26) A multi-objective optimization (MOO) problem is an optimization problem with multiple objective functions. Its general form is given by Formula 7.
min(g.sub.1(α.sub.1), g.sub.2(α.sub.2), . . . , g.sub.k(α.sub.k)) subject to: α.sub.i∈A (7)
(27) In Formula 7, the integer k (≥2) is the number of objective functions g, and the set A is the feasible set of decision vectors α. In multi-objective optimization, there is normally no single solution that minimizes all objective functions simultaneously. Attention is therefore paid to Pareto optimal solutions, i.e., solutions for which no objective function can be improved without degrading at least one other objective function.
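Pareto optimality as defined above can be sketched with a simple non-dominated filter. The function name pareto_front and the candidate objective values are made up; every objective is assumed to be minimized.

```python
def pareto_front(points):
    """Return the non-dominated points when every objective is minimized."""
    def dominates(p, q):
        # p dominates q if p is no worse in every objective
        # and strictly better in at least one.
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))

    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front)

# Made-up candidates: (retention time error, elution order error).
candidates = [(8.0, 50.0), (9.0, 44.0), (10.0, 43.0),
              (10.5, 47.0), (12.0, 42.5), (9.5, 44.0)]
front = pareto_front(candidates)
```

In this example (10.5, 47.0) and (9.5, 44.0) are both dominated by (9.0, 44.0), so four candidates remain on the front.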
(28) In the present disclosure, two objective functions are used, one representing the error of the retention time prediction and the other representing the error of the elution order prediction. The Pareto optimal solution is then selected according to the following procedure: (1) selecting the knee point which is the best compromise between the retention time prediction error and the elution order prediction error from the Pareto front consisting of the Pareto solutions; (2) moving to the next Pareto solution to reduce the elution order prediction error; (3) validating the solution using the applicability domain; and (4) repeating (2) and (3) until an increase in the retention time prediction error reaches a first predetermined threshold or until an outlier in the applicability domain exceeds a second predetermined threshold. This is conceptually illustrated in
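Steps (1) to (4) can be sketched as follows under several assumptions that the text does not fix. The knee point is taken as the front point farthest from the straight line joining the two extreme solutions, which is one common heuristic; the applicability domain check is reduced to a caller-supplied outlier count; and the thresholds and front values are made up. The names knee_index, walk_from_knee, and count_outliers are hypothetical.

```python
import numpy as np

def knee_index(front):
    """Index of the point farthest from the chord joining the two extremes.

    `front` holds (rt_error, order_error) rows sorted by rt_error.
    The distance-to-chord rule is an assumed knee definition.
    """
    f = np.asarray(front, dtype=float)
    span = f.max(axis=0) - f.min(axis=0)
    g = (f - f.min(axis=0)) / np.where(span > 0, span, 1.0)  # scale to [0, 1]
    chord = g[-1] - g[0]
    rel = g - g[0]
    # Perpendicular distance to the chord via the 2-D cross product.
    dist = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0])
    return int(np.argmax(dist / np.linalg.norm(chord)))

def walk_from_knee(front, count_outliers, max_rt_increase, max_outliers):
    """Start at the knee, then keep moving toward lower order error."""
    f = np.asarray(front, dtype=float)
    best = knee = knee_index(f)
    for i in range(knee + 1, len(f)):        # next solution: lower order error
        if f[i, 0] - f[knee, 0] >= max_rt_increase:
            break                            # first threshold reached
        if count_outliers(i) > max_outliers:
            break                            # second threshold exceeded
        best = i
    return knee, best

# Made-up Pareto front and made-up outlier counts per solution.
front = [(8.0, 50.0), (8.5, 44.0), (10.0, 42.5), (11.0, 42.0), (13.0, 41.0)]
outliers = [0, 0, 1, 1, 3]
knee, best = walk_from_knee(front, lambda i: outliers[i],
                            max_rt_increase=2.0, max_outliers=1)
```

With these values the knee is the second solution and the walk stops at the third, because the fourth would raise the retention time error by 2.5, beyond the assumed threshold of 2.0.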
(29) Hereinafter, the present disclosure will be described in more detail with reference to Examples.
(30) The examples presented herein are merely illustrative of the present disclosure and are not intended to limit the scope of the present disclosure.
Example
(31) The following two examples demonstrate the applicability of the present disclosure: (i) CS1 which is a mixture of 62 organic compounds and (ii) CS2 which is a mixture of 98 synthetic peptides. Analysis for the first example CS1 was performed using a Supelcosil LC column with a gradient time of 10 minutes at 35° C. Analysis for the second example CS2 was performed using seven chromatographic columns (i.e., Xterra, Licrospher, PRP, Discovery RP-Amide C-16, Licrospher CN, Discovery HS F5-3 and Chromolith) at different gradient settings and temperatures. Chromatographic analysis data were obtained from the references.
(32) The molecular descriptors used in each example for QSRR relation modeling are listed in Table 1 below.
(33) TABLE 1. Molecular descriptors used in Examples (CS1 and CS2)
Example | Molecular descriptor | Explanation
CS1 | μ | dipole moment
CS1 | δ.sub.min | excess charge of the most negatively charged atom
CS1 | SASA | solvent-accessible surface area
CS2 | Sum.sub.AA | sum of retention times of respective 20 naturally occurring amino acids
CS2 | νDW.sub.vol. | Van der Waals volume
CS2 | clogP | computerized octanol-water coefficient
(34) In both examples, a linear model was considered as the specific form of the QSRR relation model, and the coefficients of a control MLR model were calculated using the least-squares method for comparison. The non-linear programming problem with the relaxed constraints of Formula 5 was solved using the interior-point method. The multi-objective optimization problem of Formula 7 was solved using a genetic algorithm. In both methods, the coefficients of the control MLR model were used as initial values in the optimization.
(35) For the multi-objective optimization, the percentage root mean square error (%RMSE) of the retention time was used as an objective function representing a retention time prediction error.
(36)
(37) Where t.sub.R and {circumflex over (t)}.sub.R respectively represent the retention time measured through the analytical experiment and the retention time predicted by the model and m represents the number of mixtures measured for each column. The elution order prediction can be performed after sorting the retention times predicted by the QSRR model in ascending order, and %RMSE was used as the objective function representing the accuracy of the elution order.
(38)
(39) Where order.sub.obs. and order.sub.pred. respectively represent the elution order determined through the analysis and the elution order determined from the predicted retention time.
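The two objective functions can be sketched as follows. The %RMSE formulas are not reproduced in this text, so the normalization used here (RMSE divided by the mean observed value, times 100) is an assumption and may differ from the formulas of the present disclosure. The function names and retention times are made up.

```python
import numpy as np

def pct_rmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed normalization)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return 100.0 * np.sqrt(np.mean((obs - pred) ** 2)) / np.mean(obs)

def ranks(values):
    """1-based position of each value after an ascending sort."""
    v = np.asarray(values, dtype=float)
    r = np.empty(len(v), dtype=int)
    r[np.argsort(v, kind="stable")] = np.arange(1, len(v) + 1)
    return r

def objectives(t_obs, t_pred):
    """Return (%RMSE of retention time, %RMSE of elution order)."""
    return pct_rmse(t_obs, t_pred), pct_rmse(ranks(t_obs), ranks(t_pred))

t_obs = [2.0, 4.0, 6.0, 8.0]     # made-up measured retention times (min)
t_pred = [2.0, 6.0, 4.0, 8.0]    # made-up predictions with one swapped pair
g_rt, g_order = objectives(t_obs, t_pred)
```

The second objective is computed on elution positions rather than on times, so it responds only to changes in the predicted order and not to the size of the retention time errors.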
(40) When both of the methods NLP and MOO used a linear QSRR model (MLR), the accuracy of the elution order prediction was significantly increased (see
(41) Of the seven RP-LC columns used for both methods in both examples,
(42) The optimal solutions obtained according to the procedure of finding the Pareto optimal solution while starting from the knee point are shown in
(43) TABLE 2. % RMSE at knee point and optimal point
Example | Point | % RMSE(t.sub.R) | % RMSE(order)
CS1 | Knee point | 8.67 | 43.7
CS1 | Optimal point | 9.33 | 42.0
CS2 | Knee point | 11.6 | 19.8
CS2 | Optimal point | 12.1 | 18.1
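The trade-off recorded in Table 2 can be read off with simple arithmetic, sketched below. The values are copied from Table 2; the helper name trade_off is hypothetical.

```python
# Values copied from Table 2: (% RMSE(t_R), % RMSE(order)).
table2 = {
    "CS1": {"knee": (8.67, 43.7), "optimal": (9.33, 42.0)},
    "CS2": {"knee": (11.6, 19.8), "optimal": (12.1, 18.1)},
}

def trade_off(knee, optimal):
    """Return (increase in RT error, decrease in order error) in % RMSE points."""
    return round(optimal[0] - knee[0], 2), round(knee[1] - optimal[1], 2)

deltas = {name: trade_off(v["knee"], v["optimal"]) for name, v in table2.items()}
```

For CS1, moving from the knee point to the optimal point raises % RMSE(t.sub.R) by 0.66 and lowers % RMSE(order) by 1.7; for CS2 the corresponding changes are 0.5 and 1.7.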
(44) While exemplary embodiments of the present disclosure have been described with reference to the accompanying drawings, those skilled in the art will appreciate that the present disclosure can be implemented in other different forms without departing from the technical spirit or essential characteristics of the exemplary embodiments. Therefore, it is noted that the exemplary embodiments described above are only for illustrative purposes and are not restrictive in all aspects.