Biomarkers for Detecting Secondary Liver Cancer

Abstract

The invention relates to a method for typing a subject for the presence or absence of a secondary liver cancer, comprising the steps of: measuring in a sample comprising peptides from a subject a peptide level for (i) a peptide comprising the amino acid sequence of SEQ ID NO:4 or a peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:4; and/or (ii) a peptide comprising the amino acid sequence of SEQ ID NO:1 or a peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:1; and typing said subject for the presence or absence of said secondary liver cancer on the basis of the measured peptide level.

Claims

1. A method for typing a subject for the presence or absence of a secondary liver cancer, comprising the steps of measuring in a sample comprising peptides from a subject a peptide level for (i) a peptide comprising the amino acid sequence of SEQ ID NO:4 or a peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:4; and/or (ii) a peptide comprising the amino acid sequence of SEQ ID NO:1 or a peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:1; and typing said subject for the presence or absence of said secondary liver cancer on the basis of the measured peptide level.

2. The method according to claim 1, further comprising the steps of comparing said measured peptide level to a reference peptide level for (i) said peptide comprising the amino acid sequence of SEQ ID NO:4 or said peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:4; and/or (ii) said peptide comprising the amino acid sequence of SEQ ID NO:1 or a peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:1; and typing said subject for the presence or absence of a secondary liver cancer on the basis of the comparison of the measured peptide level and the reference peptide level.

3. The method according to claim 1, wherein the subject is a subject suffering from, or a subject having suffered from, a primary cancer.

4. The method according to claim 1, wherein said subject is a subject that suffered from a primary cancer and in which the primary cancer was surgically resected.

5. The method according to claim 1, wherein said sample comprising peptides from a subject is a bodily fluid sample from said subject.

6. The method according to claim 1, wherein said sample comprising peptides is a sample comprising collagen natural occurring peptides (NOPs).

7. The method according to claim 2, wherein said reference peptide level is measured in a sample comprising peptides from a reference subject not suffering from, or a reference subject not having suffered from, cancer.

8. The method according to claim 7, wherein said subject, or said sample, is typed as having a secondary liver cancer when said peptide level is increased as compared to said reference peptide level.

9. The method according to claim 1, further comprising the steps of measuring in a sample comprising proteins from said subject a carcinoembryonic antigen (CEA) protein level; typing said subject for the presence or absence of a secondary liver cancer on the basis of the measured peptide level and the measured CEA protein level.

10. The method according to claim 9, wherein said proteins from said subject are proteins from a blood sample of said subject.

11. The method according to claim 9, further comprising the steps of comparing said measured protein level to a reference CEA protein level; and typing said subject for the presence or absence of a secondary liver cancer on the basis of the (i) comparison of the measured peptide level and the reference peptide level and (ii) comparison of the measured CEA protein level and the reference CEA protein level.

12. The method according to claim 11, wherein said reference CEA protein level is measured in a sample comprising proteins from a reference subject not suffering from, or a reference subject not having suffered from, cancer.

13. The method according to claim 12, wherein said subject, or said sample, is typed as having a secondary liver cancer when (i) said peptide level is increased as compared to said reference peptide level and (ii) said CEA protein level is increased as compared to said reference CEA protein level.

14. (canceled)

15. The method according to claim 1 wherein said secondary liver cancer is colorectal liver metastases (CRLM).

16. A peptide comprising the amino acid sequence of SEQ ID NO:4 or a peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:4, or a peptide comprising the amino acid sequence of SEQ ID NO:1 or a peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:1.

17. A peptide according to claim 16, wherein said peptide comprises the amino acid sequence of SEQ ID NO:4 or SEQ ID NO:1.

18. A standard-of-care therapeutic agent against a secondary liver cancer for use in treating a subject typed as having a secondary liver cancer according to the method of claim 1.

19. A method for treating a subject suffering from a secondary liver cancer, comprising the step of performing a method according to claim 1; administering a therapeutically effective amount of a standard-of-care therapeutic agent against secondary liver cancer when said subject is typed as having a secondary liver cancer.

20. A method for measuring a peptide level, comprising the step of: optionally, providing a sample comprising peptides from a subject; measuring in a sample comprising peptides from a subject a peptide level for (i) a peptide comprising the amino acid sequence of SEQ ID NO:4 or a peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:4, and/or (ii) a peptide comprising the amino acid sequence of SEQ ID NO:1 or a peptide comprising an amino acid sequence that has at least 90% sequence identity to the amino acid sequence of SEQ ID NO:1.

21. The method according to claim 1, wherein the subject is a subject suffering from, or a subject having suffered from, a primary colorectal cancer.

22. The method according to claim 1, wherein said sample comprising peptides from a subject is a urine sample from said subject.

23. A method for treating a subject suffering from a secondary liver cancer, comprising the step of performing a method according to claim 1; administering a therapeutically effective amount of a standard-of-care therapeutic agent against secondary liver cancer when said subject is typed as having colorectal liver metastases.

Description

FIGURE LEGENDS

[0092] FIG. 1. Flowchart study

[0093] FIG. 1 shows a flowchart of the samples used in this study, showing discovery cohort 1 and validation cohort 2.

[0094] FIG. 2. Optimized collision energies NOPs

[0095] FIG. 2 lists, among other characteristics, the optimized collision energies of NOPs AGP, GPP and GND.

[0096] FIG. 3. Scatterplot optimal LRM (GND+CEA) and old LRM (AGP+CEA)

[0097] The scatter plot shows prediction of CRLM using the new combination of biomarkers (GND+CEA; optimal LRM) (left half of scatterplot) and a known combination of biomarkers (AGP+CEA; old LRM) (right half of scatterplot). The striped line represents the optimal cut-off for each model.

SEQUENCE LISTING

SEQ ID NO: 1: Hydroxylated GND peptide
GNDGARGSDGQPGPP(—OH)GP(—OH)P(—OH)GTAGFP(—OH)GSP(—OH)GAK(—OH)GEVGP

SEQ ID NO: 2: Hydroxylated GPP peptide
GPPGEAGK(—OH)P(—OH)GEQGVP(—OH)GDLGAP(—OH)GP

SEQ ID NO: 3: Hydroxylated AGP peptide
AGPP(—OH)GEAGKP(—OH)GEQGVP(—OH)GDLGAP(—OH)GP

SEQ ID NO: 4: Hydroxylated GER peptide
GERGSP(—OH)GGP(—OH)GAAGFP(—OH)GARGLP(—OH)GPP(—OH)GSNGNPGPP(—OH)GP(—OH)

[0098] In the SEQ ID NOs, (—OH) indicates that the preceding amino acid residue is hydroxylated. As an example, P(—OH)G means that P is hydroxylated.

EXAMPLES

Example 1

[0099] Materials and Methods

[0100] Experimental Design and Statistical Rationale

[0101] This study was approved by the Erasmus MC ethics review board (MEC-2008-062) and was performed according to the Declaration of Helsinki. Urine samples of healthy kidney donors (controls) and CRLM patients were measured alternately with mass spectrometry.

[0102] The identification of new collagen NOPs in urine was based on the identification of all NOPs in urine (discovery set 1: controls, n=40; CRLM, n=40). In a previous study, a sample size of 25 samples per group proved sufficient to identify peptide based markers in bottom-up proteomics in tissue (Van Huizen et al., J Biol Chem. 294:281-9 (2018)). However, in urine the observed differences in NOP levels are smaller. The mean and standard deviations (SD) used for the power analysis (alpha=0.05, beta=0.20) were calculated from the overall data of log-transformed significant upregulated collagen peptides in urine samples of five CRLM patients and five control patients (control mean=6.76, CRLM mean=6.98, SDpooled=0.75). The power analysis resulted in a sample size of 40 samples per group.

[0103] Targeted analysis on NOPs of interest was performed on discovery set 1 and on an additional urine sample set (discovery set 2: control, n=60; CRLM, n=60). The discovery sets 1 and 2 as used herein for discovery are described in Lalmahomed et al., Am J Cancer Res., 6:321-30 (2016). Validation was performed on independently collected urine samples (control, n=12; CRLM, n=10) (Broker et al., Plos One; 8:e70918 (2013)). A flow chart of the samples used is shown in FIG. 1.

[0104] Bottom-up proteomics was used to identify new NOPs. Assessment of the number of significant NOPs by chance in the bottom-up proteomics data was determined by permutation testing.

[0105] The three most significant NOPs associated with the three most abundant collagen alpha chains that are also more strongly upregulated in CRLM tissue than in healthy liver tissue were selected (Van Huizen et al., J Biol Chem. 294:281-9 (2018)). As bottom-up proteomics is a semi-quantitative technique, a targeted quantitative mass spectrometry method (parallel reaction monitoring, PRM) was developed to validate these findings.

[0106] The developed PRM method conforms to tier 3 (of 3 levels) of analytical assay validation (Carr et al., Mol Cell Proteomics; 13:907-17 (2014)), which implies that the assay is a targeted discovery assay. The PRM method was applied to both the full discovery set and the validation set. To determine the best model, a logistic regression model (LRM) was fit, containing the NOPs (referred to by their three-letter codes) and CEA. The optimal LRM was fit by backward elimination of predictors from the LRM that contained all molecular markers (AGP, GND, GPP, and CEA). The optimal model was validated on the validation set. The statistical analysis was performed on discovery set 2 and on the combined discovery sets 1 and 2 (full discovery set). Combining discovery set 1 with discovery set 2 created a dependent data set, because discovery set 1 had already been used for bottom-up proteomics; however, combining the sets yields higher statistical power. After measuring all samples and optimizing the LRM, we selected three samples with low, medium, and high levels of the predictors present in the optimal-LRM. Because SIL peptides were not available for all predictors, these three samples were processed five times to get an estimate of the reproducibility.

[0107] Chemicals

[0108] Ultra-high pressure liquid chromatography grade solvents were obtained from Biosolve (Valkenswaard, the Netherlands). A stable isotope labeled (SIL) peptide was obtained for AGPP(—OH)GEAGK(SIL)P(—OH)GEQGVP(—OH)GDLGAP(—OH)GP from Pepscan (Lelystad, the Netherlands); the lysine is labeled with ¹³C₆¹⁵N₂. This SIL peptide was characterized using HPLC-UV and ESI-MS. Other peptides are GPPGEAGK(—OH)P(—OH)GEQGVP(—OH)GDLGAP(—OH)GP and GNDGARGSDGQPGPP(—OH)GP(—OH)P(—OH)GTAGFP(—OH)GSP(—OH)GAK(—OH)GEVGP.

[0109] These three urine NOPs will be abbreviated by the first three amino acids (AGP, GPP, and GND, respectively).

[0110] All other chemicals were obtained from Sigma-Aldrich (Zwijndrecht, the Netherlands).

[0111] Sample Selection

[0112] Samples of the cohorts described in the studies of Lalmahomed et al., Am J Cancer Res., 6:321-30 (2016) and Broker et al., Plos One; 8:e70918 (2013) were reanalyzed (FIG. 1). After collection, samples of cohorts 1 and 2 were stored at −80° C. in polypropylene tubes. One CRLM sample was excluded from the validation set of the current study because the corresponding CEA value was not known. As CEA levels for the validation set of the Broker et al., 2013 study are not known, this set of samples was excluded.

[0113] Age and BMI differences between controls and CRLM patients were calculated with a t-test, and differences in gender and serum creatinine levels above 115 μM/L with a chi-square test. A p-value below 0.05/4=0.0125 (Bonferroni correction to correct for multiple testing) was considered as significant.

[0114] Sample Preparation

[0115] NOPs for bottom-up proteomics and targeted mass spectrometry were isolated from urine as described by Lalmahomed et al., Am J Cancer Res., 6:321-30 (2016). In brief, NOPs were separated from small molecules, salts, and proteins with a mRP C-18 Hi-Recovery Protein Column (4.6×50 mm) (Agilent, Amstelveen, the Netherlands) installed in an Ultimate 300 LC system (Dionex, Amsterdam, the Netherlands) equipped with an online-fractionator. After separation, the NOP fraction was collected, dried, reconstituted, and analyzed with mass spectrometry.

[0116] Bottom-Up Proteomics

[0117] For the identification of NOPs we applied a standard bottom-up LC-MS/MS method as described by Van Huizen et al., J Biol Chem. 294:281-9 (2018). In short, an Ultimate 3000 nano RSLC system (Thermo Fisher Scientific, Germering, Germany) was coupled online to an Orbitrap Fusion Lumos Tribrid Mass Spectrometer (Thermo Fisher Scientific, San Jose, Calif., USA). Injected samples were trapped and washed on a trap column (C18 PepMap, 300 μm ID×5 mm, 2 μm particle size, 100 Å pore size; Thermo Fisher Scientific, the Netherlands). After washing, the trap column was switched in line with an analytical column (PepMap C18, 75 μm ID×250 mm, 2 μm particle size, 100 Å pore size; Thermo Fisher Scientific, the Netherlands) for peptide separation prior to mass spectrometry analysis. We deviated from the protocol described by Van Huizen et al., 2018 in that, prior to mass spectrometry analysis, the samples were not analyzed on a test HPLC system, nor was the injection volume normalized. For every sample, a volume of 2 μL was injected. Bottom-up proteomics data was uploaded to the PRIDE archive (PXD013533).

[0118] Analysis of Bottom-Up Data

[0119] MGF peak list files were extracted from raw files by ProteoWizard (v3.0.9166). MGF peak list files were searched using the Mascot search engine (v2.3.2, Matrix Science Inc., London, UK) and the UniProt/SwissProt database (20194 entries). The following settings were used for the database search: enzyme was set to open because we analyzed NOPs; the mass tolerance was set to 10 ppm for peptide mass and 0.5 Da for fragment mass. Hydroxylation of proline and lysine and oxidation of methionine (+16 Da) were selected as variable modifications; no fixed modifications were added. MASCOT identifications were imported into Scaffold (v4.6.2, Portland, Oreg., USA). In Scaffold, protein confidence levels were set to 1% false discovery rate (FDR), at least 2 peptides per protein, and a 1% FDR at peptide level. FDRs were estimated by inclusion of a decoy database search generated by MASCOT. Raw files were aligned and combined with the identification list exported from Scaffold in Progenesis QI (v4, Nonlinear Dynamics, Newcastle-upon-Tyne, United Kingdom), followed by exporting the normalized abundance to Excel 2010 (Microsoft, Redmond, Wash., USA). Duplicate feature intensities were summed. Data was further processed with Excel, GraphPad Prism (v5.01, La Jolla, Calif., USA), and R (v3.3.1, Vienna, Austria). Prior to log10 transformation, a value of ‘10’ was added so that missing values could be included in further data analysis. After log10 transformation the data was assumed to be normally distributed. Differences in NOP levels between control and CRLM were tested with an unequal-variance independent-samples t-test. P-values below 0.05 were considered significant.
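The transformation and test described above can be illustrated with a minimal sketch. It assumes that missing values are recorded as 0, uses hypothetical abundance values, and computes only Welch's t statistic and the Welch-Satterthwaite degrees of freedom (the p-value lookup is left out). The study itself used Excel, GraphPad Prism and R; the function names below are illustrative only.

```python
import math
from statistics import mean, variance

OFFSET = 10.0  # value added before the log10 transformation, as stated in the text


def log10_with_offset(abundances):
    """Add the offset so that missing values (assumed to be stored as 0) survive log10."""
    return [math.log10(a + OFFSET) for a in abundances]


def welch_t(group_a, group_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) / na  # squared standard error of group A
    vb = variance(group_b) / nb  # squared standard error of group B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical normalized abundances for one NOP (0 = missing value)
control = log10_with_offset([0, 90, 990, 9990])
crlm = log10_with_offset([990, 9990, 99990, 999990])
t_stat, dof = welch_t(crlm, control)
```

Adding the offset before taking the logarithm maps a missing value to log10(10) = 1 instead of an undefined value, which is why such features can remain in the comparison.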

[0120] Only collagen alpha chains were taken into account, which also were found to differ between CRLM tissue and normal liver tissue (Van Huizen et al., J Biol Chem. 294:281-9 (2018)). A NOP molecular panel was constructed consisting of nine NOPs, i.e., the three most significant NOPs from the top three most abundant collagen alpha chains. In addition to these nine NOPs, the earlier reported NOP named AGP (Lalmahomed et al., Am J Cancer Res., 6:321-30 (2016); Broker et al., Plos One; 8:e70918 (2013)), was included in the targeted mass spectrometry method.

[0121] Permutation testing was performed according to the R-script published as supplemental file by Van Huizen et al., J Biol Chem. 294:281-9 (2018). In short, the data was randomly divided in two groups at the peptide level; significant differences between the two groups were determined using the Wilcoxon signed-rank test. Significant differences (p-value <0.05) were summed per permutation and the log10 was taken. The distribution of the log10 summed significant p-values was assumed to be normal. The difference was assumed to be significant if the true dataset value was greater than the average value of the permutation test plus twice the SD (p<0.05).
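The decision rule at the end of this paragraph can be sketched as follows. This is not the published R-script; it only illustrates the comparison of the true count of significant NOPs with the mean plus twice the SD of the log10 counts from permuted data. The permuted counts are hypothetical, and skipping permutations with zero significant NOPs is an assumption made here because log10(0) is undefined.

```python
import math
from statistics import mean, stdev


def exceeds_permutation_null(true_count, permuted_counts):
    """Compare log10(true count) with mean + 2*SD of the log10 permuted counts."""
    # Assumption: permutations with zero significant NOPs are skipped.
    logs = [math.log10(c) for c in permuted_counts if c > 0]
    threshold = mean(logs) + 2 * stdev(logs)
    return math.log10(true_count) > threshold, threshold


# Hypothetical numbers of significant NOPs found in five random permutations
permuted = [70, 80, 90, 85, 75]
# 406 is the number of significantly different NOPs reported for discovery set 1
significant, threshold = exceeds_permutation_null(406, permuted)
```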

[0122] Targeted Mass Spectrometry Analysis

[0123] Targeted mass spectrometry measurements were performed on the same nanoLC-ESI-Orbitrap Lumos Fusion as used for the bottom-up proteomics. To measure the samples, a PRM method with optimized collision energies was developed. NOPs for which no optimal collision energy could be determined or which had too low signal intensities for identification were excluded. A table listing, among other characteristics, the optimized collision energies is available in FIG. 2.

[0124] The full discovery set and the validation set were measured at different times to increase validity. The data sets were aligned to the discovery set 1 using the mean values of the control groups of the other data sets. This was only necessary for NOPs for which no SIL peptides were generated. Targeted mass spectrometry data was uploaded to the PRIDE archive (PXD013705).

[0125] Analysis of Targeted Data

[0126] Raw files produced by the mass spectrometer were imported into Skyline (MacLean et al., Bioinformatics; 26:966-8 (2010)). Per peptide, we selected a maximum of five transitions with a high intensity and no obvious interference from neighboring peaks. The GND and GPP peptide peak areas from Skyline were used; for AGP, a ratio with the SIL peptide was used.
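As a rough illustration of the AGP quantification, the sketch below computes a ratio between the endogenous peptide and its SIL counterpart. It assumes that the peak areas of the selected transitions are summed per peptide and that the same transitions are used for both forms; neither detail is specified in the text, and the peak areas shown are hypothetical.

```python
MAX_TRANSITIONS = 5  # at most five transitions were selected per peptide


def sil_ratio(endogenous_areas, sil_areas):
    """Ratio of summed endogenous peak areas to summed SIL peak areas."""
    n = len(endogenous_areas)
    if n != len(sil_areas) or not 1 <= n <= MAX_TRANSITIONS:
        raise ValueError("expected 1 to 5 matching transitions per peptide form")
    return sum(endogenous_areas) / sum(sil_areas)


# Hypothetical transition peak areas for AGP and its SIL-labeled counterpart
ratio = sil_ratio([2.0e6, 1.5e6, 0.5e6], [4.0e6, 3.0e6, 1.0e6])
```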

[0127] Logistic Regression Model

[0128] Statistical analyses were performed in R (version 3.3.1, Vienna, Austria) (R Core Team, R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org/. (2016)). The predictor selection was applied separately on discovery set 2 (independent data set) and the full discovery set (dependent data set). If the two data sets (full discovery set and discovery set 2) did not yield a different predictor selection, then the analysis was performed with the full discovery set to prevent a loss of power. To select relevant predictors to fit the optimal logistic regression model, a significance level of 0.05 was used. The critical p-value was Bonferroni corrected for the number of predictors or comparisons tested.

[0129] The current molecular panel consists of AGP and CEA, which was extended with the newly identified NOPs (GPP and GND). To fit a new model with the molecular markers, these markers were tested for any relationship with patient characteristics, for individual significance, and for multicollinearity. A relationship with the patient characteristics ‘age’, ‘gender’, ‘BMI’, and ‘serum creatinine >115 μM/L’ was determined by fitting a linear model that predicts an individual molecular marker per patient characteristic and the predictor ‘group (healthy/sick)’. Molecular markers that were significantly correlated with a patient characteristic were excluded from further analysis. All remaining individual predictors were tested for significance by fitting an LRM with the individual predictors. Significance of an individual predictor was based on Wald statistics. The selected significant predictors were assessed for multicollinearity by calculating the variance inflation factor (VIF). Multicollinearity was assumed to be present with a VIF above 10; if necessary, predictors were discarded to prevent multicollinearity.

[0130] The selected predictors were fit into a combined LRM (full-LRM). The optimal LRM (optimal-LRM) was formed by backward elimination of non-significant predictors from the full-LRM.

[0131] The relation between the molecular markers in the optimal-LRM and the size of the largest tumor, as well as the number of tumors, was tested by fitting a linear model. Significance of an individual predictor was based on Wald statistics.

[0132] The Cook's distance test was used to inspect the data for outliers and/or leverage points. The threshold for a point to be suspected of being an outlier/leverage point was calculated with the formula 4/(n−k−1), whereby n=number of samples, k=number of predictors. Outliers and/or leverage points identified by manual inspection of the samples were removed from the data set.
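The threshold formula above can be worked through for the full discovery set. The sketch assumes n=200 samples (discovery sets 1 and 2 combined) and k=2 predictors (GND and CEA); the Cook's distances in the example are hypothetical.

```python
def cooks_distance_threshold(n_samples, n_predictors):
    """Threshold 4/(n - k - 1) above which a point is inspected manually."""
    return 4 / (n_samples - n_predictors - 1)


# Full discovery set: 200 samples; optimal-LRM: 2 predictors (GND, CEA)
threshold = cooks_distance_threshold(200, 2)  # 4/197, roughly 0.0203

# Hypothetical Cook's distances; indices above the threshold are flagged
distances = [0.001, 0.05, 0.015, 0.03]
flagged = [i for i, d in enumerate(distances) if d > threshold]
```

Flagged points are only candidates: per the text, a point is removed after manual inspection, not automatically.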

[0133] Our previous logistic regression model (old-LRM) contained AGP and CEA. Prior to comparison of the old-LRM and the optimal-LRM, a Pearson correlation between the previously measured and the re-measured AGP values was calculated. To select the LRM with the highest predictive power, the performance of the optimal-LRM was compared to that of the old-LRM. The predictive power was compared with the ‘anova’ function if the models were nested; otherwise DeLong's test was used to compare AUCs.
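For readers unfamiliar with the AUC that is compared here, the following sketch computes it as the fraction of (CRLM, control) pairs in which the CRLM sample receives the higher model score, with ties counted as one half. It does not implement DeLong's test, which additionally estimates the variance of the AUC difference. The scores are hypothetical.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0  # positive sample ranked above negative sample
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical model probabilities for CRLM patients and controls
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```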

[0134] Results

[0135] Patient Characteristics

[0136] Table 1 provides an overview of the basic patient characteristics. Age and gender were significantly different between the controls and CRLM patients. A serum creatinine level above 115 μM/L, indicating renal impairment, was measured in four patients.

TABLE 1. Patient characteristics

| Parameter | Full discovery: Control | Full discovery: CRLM | Full discovery: p-value | Validation: Control | Validation: CRLM | Validation: p-value |
| --- | --- | --- | --- | --- | --- | --- |
| Age (years) | 52 [43-63] | 64 [57-70] | <0.001 | 59 [54-65] | 66 [65-74] | 0.02 |
| Male gender, No. (%) | 32 (32%) | 68 (68%) | <0.001 | 0 (0%) | 0 (0%) | — |
| BMI | 26 [23-28] | 26 [25-28] | 0.13 | 26 [23-28] | 25 [22-28] | 0.27 |
| No. of lesions | — | 2 [1-4] | — | — | 2 [1-2]¹ | — |
| Size of largest lesion (cm) | — | 2.7 [1.8-4] | — | — | 3.5 [3.0-4.5]¹ | — |
| Serum creatinine > 115 μM/L | 0 (0%) | 4 (4%) | 0.24 | 0 (0%) | 1 (10%) | 0.93 |

Data are presented as a median with the interquartile range (25th-75th percentile) or n (%).
¹ For one patient the no. of lesions and the size of the largest lesion were not available.

[0137] Bottom-Up Mass Spectrometry

[0138] A total of 1683 NOPs were identified in discovery set 1, belonging to 175 proteins. The three most common proteins are collagen alpha-1(I) (n=183 NOPs), collagen alpha-1(III) (n=157 NOPs), and uromodulin (n=84 NOPs). Four hundred and fifty-three NOPs (27%) belong to 13 collagen alpha chains (Table 2). Four hundred and six NOPs (24%) were significantly different between control and CRLM, of which 118 belong to collagen (Table 2).

TABLE 2. Number of NOPs identified per collagen alpha chain.

| Protein^I | Peptides | Up^II (p < 0.05 & FC > 1) | Down^II (p < 0.05 & FC < 1) |
| --- | --- | --- | --- |
| Collagen alpha-1(I) chain^III | 183 | 30 | 10 |
| Collagen alpha-2(I) chain^III | 54 | 8 | 2 |
| Collagen alpha-1(II) chain | 4 | 0 | 0 |
| Collagen alpha-1(III) chain^III | 157 | 56 | 1 |
| Collagen alpha-1(IV) chain | 3 | 1 | 0 |
| Collagen alpha-5(IV) chain | 4 | 1 | 0 |
| Collagen alpha-1(V) chain | 2 | 1 | 0 |
| Collagen alpha-2(V) chain | 8 | 3 | 0 |
| Collagen alpha-1(VI) chain | 3 | 0 | 0 |
| Collagen alpha-1(X) chain | 3 | 0 | 0 |
| Collagen alpha-1(XV) chain | 11 | 0 | 0 |
| Collagen alpha-1(XVIII) chain | 20 | 1 | 4 |
| Collagen alpha-1(XXII) chain | 1 | 0 | 0 |

^I Proteins marked in bold are shown to be upregulated in CRLM compared to normal liver tissue.
^II The number of NOPs up- or downregulated with a p-value below 0.05 and a fold change (FC) above or below 1.
^III The three most abundant collagen alpha chains.

[0139] Targeted Mass Spectrometry

[0140] The urine NOP panel was constructed by including AGP (Lalmahomed et al., Am J Cancer Res., 6:321-30 (2016); Broker et al., Plos One; 8:e70918 (2013)) and the three most significantly different NOPs of the three most abundant collagen alpha chains. Optimal collision energy could not be determined for seven urine NOPs. The three remaining urine NOPs were AGPP(—OH)GEAGKP(—OH)GEQGVP(—OH)GDLGAP(—OH)GP, GPPGEAGK(—OH)P(—OH)GEQGVP(—OH)GDLGAP(—OH)GP, and GNDGARGSDGQPGPP(—OH)GP(—OH)P(—OH)GTAGFP(—OH)GSP(—OH)GAK(—OH)GEVGP. While AGP and GPP originate from collagen alpha chain 1(I), GND originates from collagen alpha chain 1(III).

[0141] Logistic Regression Model

[0142] The predictor selection process was applied to discovery set 2 and the full discovery set. Prior to fitting the full-LRM, the molecular markers (AGP, GPP, GND, and CEA) were tested for a linear relationship with any of the patient characteristics (age, gender, BMI, and serum creatinine levels). Significant linear relationships were not found. The individual molecular markers were also tested for individual significance by fitting an LRM per marker. The results are shown in Table 3. Individually, all molecular markers proved to be significant and were included in the full-LRM. There was no multicollinearity present between the molecular markers. Therefore all molecular markers were included in the full-LRM in the full discovery set and in discovery set 2. In the full discovery set, no significant linear relationship was present among the molecular markers, nor between any of the molecular markers and the size of the largest tumor or the number of tumors.

[0143] The optimal-LRM was formed with backwards elimination of the non-significant predictors. For both data sets, this resulted in a model containing GND and CEA (Optimal-LRM). The predictor selection was irrespective of the use of the full discovery set or discovery set 2. The remaining analyses were, therefore, performed solely with the full discovery set to prevent a loss of statistical power. The formula to predict the probability of an individual of having CRLM is shown in formula 1. The OR with 95% CI for GND is 21 [8.5-60] and for CEA 32 [10-129].

Formula 1:

$\text{Probability of having secondary liver cancer} = \frac{1}{1 + e^{-(-24.1476 + 3.0365 \cdot \mathrm{GND} + 3.4647 \cdot \mathrm{CEA})}}$

[0144] The Cook's distance was calculated to ensure that this formula is not heavily influenced by outliers/leverage points. Fourteen data points were above the threshold and were manually inspected. None appeared to be a wrong measurement, and therefore none was removed.

[0145] On re-measuring, AGP values from the old-LRM and the new optimal-LRM data sets were highly correlated (correlation=0.89, p-value <2.2*10^−6). The linear relationship between the old AGP values (AGP_old) and the current AGP values is: AGP=0.9+1.57*AGP_old. The AUCs of the old-LRM and the optimal-LRM were compared using DeLong's test. The old-LRM had an AUC of 0.8824, which is significantly different from the optimal-LRM AUC of 0.9256 (p-value=0.032). A scatter plot containing the values of the optimal-LRM and old-LRM is available in FIG. 3.

[0146] Based on the ROC curve for values calculated by the optimal-LRM, a cut-off value of 0.439 was chosen. This cut-off value results in 86% sensitivity and 84% specificity in the full discovery set, and in 92% sensitivity and 90% specificity in the validation set (Table 4).
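As an illustration of how Formula 1 and the cut-off can be applied, the sketch below computes the predicted probability from a GND and a CEA value and types a subject as CRLM when the probability reaches the cut-off. The coefficients and the cut-off are taken from the text; the function names are hypothetical, the input values are assumed to be on the same scale as those reported in Table 4, and treating a probability exactly equal to the cut-off as positive is an assumption.

```python
import math

# Coefficients of Formula 1 (optimal-LRM) and the ROC-based cut-off from the text
INTERCEPT = -24.1476
COEF_GND = 3.0365
COEF_CEA = 3.4647
CUTOFF = 0.439


def crlm_probability(gnd, cea):
    """Predicted probability of secondary liver cancer according to Formula 1."""
    linear = INTERCEPT + COEF_GND * gnd + COEF_CEA * cea
    return 1.0 / (1.0 + math.exp(-linear))


def typed_as_crlm(gnd, cea, cutoff=CUTOFF):
    """Assumption: a probability at or above the cut-off is typed as positive."""
    return crlm_probability(gnd, cea) >= cutoff


# Group-level values from Table 4 (full discovery set), used as example inputs
p_control = crlm_probability(6.9, 0.3)  # roughly 0.10
p_crlm = crlm_probability(7.7, 1.0)     # roughly 0.94
```

With these example inputs the control-group values fall well below the cut-off and the CRLM-group values well above it, consistent with the direction of the reported effect.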

[0147] To estimate the reproducibility of the sample processing with respect to the GND values, we measured three samples, selected in the lower, middle, and higher range of all measured values, five times. The following samples were measured, with in brackets the log10 of the area and % CV: VMS-248 (low, 6.1±1.4%), VMS-253 (middle, 6.9±1.8%), and VMS-163 (high, 7.6±1.4%).

TABLE 3. Predictor selection, significant predictors are marked in bold

| Predictor | Univariable (Disc 2): OR^I | Univariable (Disc 2): p-value | Multivariable (Disc 2): OR^I | Multivariable (Disc 2): p-value | Univariable (Full disc): OR^I | Univariable (Full disc): p-value | Multivariable (Full disc): OR^I | Multivariable (Full disc): p-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AGP | 1.7 [1.4-2.2] | 4.3 * 10^−5 | 1.4 [1.0-2.0] | 0.11 | 1.6 [1.4-2.0] | 3.1 * 10^−7 | 1.2 [0.9-1.5] | 0.27 |
| GPP | 4.0 [1.7-11.3] | 0.0039 | —^III | | 7.1 [3.1-18] | 1.3 * 10^−5 | 0.7 [0.2-2.4] | 0.56 |
| GND | 211 [8.0-67] | 1.5 * 10^−8 | 12 [3.4-46] | 1.2 * 10^−4 | 20 [9.4-48] | 3.4 * 10^−13 | 19 [6.1-66] | 1.3 * 10^−6 |
| CEA | 12.7 [5.1-38] | 6.8 * 10^−7 | 20 [5.4-104] | 6.2 * 10^−5 | 21 [8.8-56] | 1.3 * 10^−10 | 34 [11-146] | 8.3 * 10^−8 |

^I The odds ratio (OR) is presented as the OR and the 95% confidence interval.
^II The critical p-value used is 0.05/15 = 3.3 * 10^−3.
^III This molecular marker was individually not significant, and therefore not selected for the full-LRM.

TABLE 4. Overview of the GND and CEA values and the obtained sensitivity and specificity.

| Set | Group | GND^I | CEA^I | Optimal-LRM: Sensitivity | Optimal-LRM: Specificity | Old-LRM: Sensitivity | Old-LRM: Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Discovery | Control | 6.9 [6.7-7.2] | 0.3 [0.1-0.5] | 86% | 84% | 80% | 79% |
| Discovery | CRLM | 7.7 [7.4-8.0] | 1.0 [0.6-1.4] | | | | |
| Validation | Control | 6.9 [6.7-7.1] | 0.2 [0.0-0.5] | 92% | 90% | 66% | 90% |
| Validation | CRLM | 7.5 [7.3-7.6] | 1.5 [0.6-2.1] | | | | |

^I Mean [1st quartile-3rd quartile]
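Sensitivity and specificity as reported in Table 4 follow from comparing model probabilities with the cut-off. A minimal sketch is given below; the probabilities and labels are hypothetical, and counting a probability equal to the cut-off as positive is an assumption.

```python
def sensitivity_specificity(probabilities, has_crlm, cutoff=0.439):
    """Return (sensitivity, specificity) for the given cut-off."""
    tp = fn = tn = fp = 0
    for p, truth in zip(probabilities, has_crlm):
        predicted = p >= cutoff  # assumption: equality counts as positive
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs: three CRLM patients followed by three controls
probs = [0.9, 0.6, 0.3, 0.2, 0.5, 0.1]
labels = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(probs, labels)
```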

Example 2

[0148] This Example is a supplement to Example 1.

[0149] Materials and Methods

[0150] The procedure as described in Example 1 was used to identify and test a further natural occurring peptide (NOP), also referred to as “GER”, with amino acid sequence GERGSP(−4Hyp)GGP(−4Hyp)GAAGFP(−4Hyp)GARGLP(−4Hyp)GPP(−4Hyp)GSNGNPGPP(−4Hyp)GP(−4Hyp) in urine. “P(−4Hyp)” means that the amino acid proline (P) is modified into 4-hydroxyproline. This natural occurring peptide (NOP) originates from collagen alpha-1(III) (COL3A1, protein code uniprot/Swissprot=P02461). A short description of this procedure is summarized below.

[0151] For the discovery of the novel NOP GER we had a large sample set of healthy control urine (n=100) and urine from patients suffering from CRLM (n=100) available. The sample set was split with a ratio of 40:60. Identification of NOP GER as a novel marker for CRLM was based on the analysis of 40 control and 40 CRLM urine samples using an unbiased semi-quantitative proteomics approach. Further validation of the value of NOP GER as such was performed by using a targeted quantitative mass spectrometry method on the full sample set (control n=100, CRLM n=100). NOP GND formed together with serum carcinoembryonic antigen (CEA) a panel of markers that were fit in a logistic regression model (LRM-GND). In LRM-GND, NOP GND was replaced with NOP GER (LRM-GER). First, it was tested whether NOP GER had a significant contribution to the model based on the Wald statistics (p-value ≤0.05) and the 95% confidence interval (CI) of the odds ratio (not overlapping with 1). Second, the predictive power of LRM-GND and LRM-GER was compared by comparing the area under the curve (AUC) of the ROC curve using DeLong's test; a p-value below 0.05 was considered significant.

[0152] Results

[0153] In the same manner as NOP GND, the NOP GER was identified as a biomarker for secondary liver cancer. Further, in Table 5, the results of the LRM-GER are displayed showing the significance of NOP GER in the model. NOP GER has a significant contribution to the model with a p-value of 3.60*10^−7, which is confirmed by the 95% CI of the odds ratio that does not overlap with 1.

TABLE 5. Significance of natural occurring peptide GER in a logistic regression model to predict secondary liver cancer.

| Parameter | Coefficient | P-value | Odds ratio | CI 2.5% | CI 97.5% |
| --- | --- | --- | --- | --- | --- |
| Intercept | −20.62 | 2.5e-09 | 0.00 | 0.00 | 0.00 |
| CEA | 3.05 | 1.5e-08 | 21.20 | 8.00 | 67.80 |
| COL3A1_GER | 2.49 | 2.3e-08 | 12.10 | 5.30 | 31.10 |

CI: confidence interval; CEA: carcinoembryonic antigen; COL3A1: collagen alpha-1(III)

[0154] The exemplary formula (Formula 2) that was used to calculate the chance for a patient of having secondary liver cancer based on LRM-GER was:

$\text{Probability of having secondary liver cancer} = \frac{1}{1 + e^{-(-20.62 + 3.05 \cdot \mathrm{CEA} + 2.49 \cdot \mathrm{GER})}}$
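The relation between Formula 2 and Table 5 can be checked with a short sketch: in a logistic regression model the odds ratio of a predictor is the exponential of its coefficient. The coefficients are taken from the text; the function names and the example input values are hypothetical, and the scale of the inputs is assumed to match the one used to fit the model.

```python
import math

# Coefficients of LRM-GER as given in Formula 2 and Table 5
INTERCEPT = -20.62
COEF_CEA = 3.05
COEF_GER = 2.49


def probability_lrm_ger(cea, ger):
    """Predicted probability of secondary liver cancer according to Formula 2."""
    linear = INTERCEPT + COEF_CEA * cea + COEF_GER * ger
    return 1.0 / (1.0 + math.exp(-linear))


def odds_ratio(coefficient):
    """Odds ratio per unit increase of a predictor in a logistic model."""
    return math.exp(coefficient)


or_cea = odds_ratio(COEF_CEA)  # about 21.1; Table 5 lists 21.20
or_ger = odds_ratio(COEF_GER)  # about 12.1; Table 5 lists 12.10

# Hypothetical input values, for illustration only
p_example = probability_lrm_ger(cea=1.0, ger=7.0)
```

The small differences from the tabulated odds ratios are consistent with the coefficients having been rounded to two decimals.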

[0155] The predictive power of LRM-GND and LRM-GER was compared based on the AUC of the ROC curve. LRM-GER had an AUC of 0.9079 and LRM-GND an AUC of 0.9256. The AUCs are not significantly different (p=0.28), indicating that both models have a similar predictive power. Similar to the NOP GND, the combination of NOP GER and serum CEA has a significantly higher predictive power than serum CEA by itself and is similar to NOP GND in combination with CEA. The standard for urine concentration correction is the level of creatinine in urine. Adding urine creatinine levels to the models does not negatively influence the predictive power of either NOP GND or NOP GER (data not shown).

CPC classification: C07K14/78; G01N33/57419; G01N2800/7028; G01N33/6842; G01N33/57438

International classification: G01N33/574; G01N33/68