THROMBOEMBOLIC DISEASE

20220380849 · 2022-12-01

Abstract

The invention relates to a method for more appropriate thromboembolic event risk assessment based on the presence of different genetic variants. The invention also relates to a method for determining the risk of suffering a thromboembolic disease by combining the absence or presence of one or more polymorphic markers in a sample from the subject with conventional risk factors for thromboembolism, as well as to computer-implemented means for carrying out said method.

Claims

1. A method for the thromboembolic event risk assessment in a subject comprising the steps of determining in a sample isolated from said subject the presence of the following polymorphisms or a SNP in linkage disequilibrium with one of said polymorphisms: Serpin A10 (protein Z inhibitor) Arg67Stop (rs2232698), Serpin C1 (antithrombin) Ala384Ser (Cambridge II), factor XII C46T (rs1801020), factor XIII Val34Leu (rs5985), Factor II (prothrombin) G20210A (rs1799963), factor V Leiden Arg506Gln (rs6025), factor V Cambridge Arg306Thr, factor V Hong Kong Arg306Gly, ABO blood group rs8176719, ABO blood group rs7853989, ABO blood group rs8176749 or ABO blood group rs8176743, and ABO blood group rs8176750, which is indicative of a risk of having a thromboembolic event.

2. A method for the diagnosis of being developing or suffering a thromboembolic disease or event in a subject comprising the steps of determining in a sample isolated from said subject the presence of the following polymorphisms or a SNP in linkage disequilibrium with one of said polymorphisms: Serpin A10 (protein Z inhibitor) Arg67Stop (rs2232698), Serpin C1 (antithrombin) Ala384Ser (Cambridge II), factor XII C46T (rs1801020), factor XIII Val34Leu (rs5985), Factor II (prothrombin) G20210A (rs1799963), factor V Leiden Arg506Gln (rs6025), factor V Cambridge Arg306Thr, factor V Hong Kong Arg306Gly, ABO blood group rs8176719, ABO blood group rs7853989, ABO blood group rs8176743 or ABO blood group rs8176749, and ABO blood group rs8176750, which is indicative of being developing or suffering a thromboembolic disease or event.

3. A method as defined in claim 1 wherein the thromboembolic disease is selected from the group of fatal or non-fatal myocardial infarction, stroke, transient ischemic attacks, peripheral arterial disease, deep vein thrombosis, pulmonary embolism or a combination thereof.

4. A method for identifying a subject in need of anticoagulant and/or antithrombotic therapy or in need of prophylactic antithrombotic and/or anticoagulant therapy comprising the steps of determining in a sample isolated from said subject the presence in at least one allele of the following polymorphisms or a SNP in linkage disequilibrium with one of said polymorphisms: Serpin A10 (protein Z inhibitor) Arg67Stop (rs2232698), Serpin C1 (antithrombin) Ala384Ser (Cambridge II), factor XII C46T (rs1801020), factor XIII Val34Leu (rs5985), Factor II (prothrombin) G20210A (rs1799963), factor V Leiden Arg506Gln (rs6025), factor V Cambridge Arg306Thr, factor V Hong Kong Arg306Gly, ABO blood group rs8176719, ABO blood group rs7853989, ABO blood group rs8176743 or rs8176749, and ABO blood group rs8176750, which is indicative of having a decreased response to an antithrombotic and/or anticoagulant therapy or of being in need of early and aggressive antithrombotic and/or anticoagulant therapy or in need of prophylactic antithrombotic and/or anticoagulant treatment.

5. A method as defined in claim 1 further comprising determining one or more cardiovascular disease or disorder risk factors selected from the group consisting of age, race, sex, body mass index, smoking status, systolic blood pressure, diastolic blood pressure, hospitalization, plaster cast immobilization, surgery, trauma, oral contraceptives or hormone therapy, pregnancy, prolonged travel (≥2 hours), collagen vascular diseases, heart failure, malignancy, medications, myeloproliferative disorders, nephrotic syndrome, recurrent pregnancy loss, abdominal obesity, diabetes mellitus, low density lipoprotein (LDL)-cholesterol level, high density lipoprotein (HDL)-cholesterol level, cholesterol level, triglyceride levels, and family history of thromboembolic event.

6. The method according to claim 1 wherein the sample is an oral tissue sample, scraping, or wash or a biological fluid sample, preferably saliva, urine or blood.

7. The method according to claim 1 wherein the presence or absence of the polynucleotide is identified by amplifying or failing to amplify an amplification product from the sample, wherein the amplification product is preferably digested with a restriction enzyme before analysis and/or wherein the SNP is identified by hybridizing the nucleic acid sample with a primer labeled with a detectable moiety.

8. A method for the indication of the need for a preventive therapy or a treatment with an antithrombotic and/or anticoagulant therapy wherein the patient is selected for said therapy based on the presence in a sample isolated from said subject of the following polymorphisms or a SNP in linkage disequilibrium with one of said polymorphisms: Serpin A10 (protein Z inhibitor) Arg67Stop (rs2232698), Serpin C1 (antithrombin) Ala384Ser (Cambridge II), factor XII C46T (rs1801020), factor XIII Val34Leu (rs5985), Factor II (prothrombin) G20210A (rs1799963), factor V Leiden Arg506Gln (rs6025), factor V Cambridge Arg306Thr, factor V Hong Kong Arg306Gly, ABO blood group rs8176719, ABO blood group rs7853989, ABO blood group rs8176743 or rs8176749, and ABO blood group rs8176750.

9. (canceled)

10. A method of determining the probability of an individual of presenting a thromboembolism disease or event based on the presence of 1 to P classical risk factors and 1 to J polymorphisms selected from the group of the following polymorphisms or a SNP in linkage disequilibrium with one of said polymorphisms: Serpin A10 (protein Z inhibitor) Arg67Stop (rs2232698), Serpin C1 (antithrombin) Ala384Ser (Cambridge II), factor XII C46T (rs1801020), factor XIII Val34Leu (rs5985), Factor II (prothrombin) G20210A (rs1799963), factor V Leiden Arg506Gln (rs6025), factor V Cambridge Arg306Thr, factor V Hong Kong Arg306Gly, ABO blood group rs8176719, ABO blood group rs7853989, ABO blood group rs8176743 or rs8176749, and ABO blood group rs8176750 using the formula:
\[
\mathrm{Probability}(Y=1 \mid x_1,\ldots,x_n) = \frac{1}{1+\exp\left(\beta_0+\beta_1 x_1+\cdots+\beta_n x_n+\beta_{f\cdot g}\,x_f x_g+\cdots+\beta_{h\cdot i}\,x_h x_i\right)},
\]
wherein:
Probability(Y=1|x_1, ..., x_n) = probability of presenting a thrombosis in a particular individual with concrete and measurable characteristics in a number of variables 1, ..., n, wherein said probability can range between 0 and 1;
exp = exponential function with natural base;
β_0 = coefficient that defines the risk (the probability) of thrombosis not related to the variables 1 to n, wherein said coefficient can take a value from −∞ to +∞ and is calculated as the natural logarithm of the incidence of venous thrombosis in the population;
β_1 = regression coefficient that expresses the risk (higher or lower) of presenting thrombosis associated with the value/presence of the predictor variable x_1, wherein said coefficient can take a value from −∞ to +∞;
x_1 = value taken by the predictor variable x_1 in an individual, wherein the range of possible values depends on the variable;
β_n = regression coefficient that expresses the risk (higher or lower) of presenting thrombosis associated with the value/presence of the predictor variable x_n, wherein said coefficient can take a value from −∞ to +∞;
x_n = value taken by the predictor variable x_n in an individual, wherein the range of possible values depends on the variable,
wherein the model includes the effect of the combination of some variables in terms of interaction or modification of the effect, wherein the effect size (regression coefficient) of a single variable (x_f) can be β_f, but if this variable is present in combination with another variable (x_g) the effect size may vary (increase or decrease), and therefore, to consider the effect size of the variable x_f, not only β_f but also a second regression coefficient β_{f·g} is considered, by adding β_f and β_{f·g},
wherein:
β_{f·g} = regression coefficient that expresses the risk (higher or lower) of presenting thrombosis associated with the combined presence of the predictor variables x_f and x_g, wherein said coefficient can take a value from −∞ to +∞;
x_f = value taken by the predictor variable x_f in an individual, wherein the range of possible values depends on the variable;
x_g = value taken by the predictor variable x_g in an individual, wherein the range of possible values depends on the variable;
β_{h·i} = regression coefficient that expresses the risk (higher or lower) of presenting thrombosis associated with the combined presence of the predictor variables x_h and x_i, wherein said coefficient can take a value from −∞ to +∞;
x_h = value taken by the predictor variable x_h in an individual, wherein the range of possible values depends on the variable;
x_i = value taken by the predictor variable x_i in an individual, wherein the range of possible values depends on the variable;
wherein, if the patient does not present any mutation or genetic variant of risk but he/she presents a positive family history of venous thrombosis, this variable is included in the model, wherein the regression coefficient of this variable is 1.185, with a range of possible values from 0.200 to 2.500.

11. A computer program or a computer-readable media containing means for carrying out a method as defined in claim 1.

12. A kit comprising reagents for detecting the presence of the following polymorphisms or a SNP in linkage disequilibrium with one of said polymorphisms: Serpin A10 (protein Z inhibitor) Arg67Stop (rs2232698), Serpin C1 (antithrombin) Ala384Ser (Cambridge II), factor XII C46T (rs1801020), factor XIII Val34Leu (rs5985), Factor II (prothrombin) G20210A (rs1799963), factor V Leiden Arg506Gln (rs6025), factor V Cambridge Arg306Thr, factor V Hong Kong Arg306Gly, ABO blood group rs8176719, ABO blood group rs7853989, ABO blood group rs8176743 or rs8176749, and ABO blood group rs8176750.

13. A kit as defined in claim 12 which comprises one or more primer pairs specific for the amplification of nucleic acid sequences comprising at least Serpin A10 (protein Z inhibitor) Arg67Stop (rs2232698), Serpin C1 (antithrombin) Ala384Ser (Cambridge II), factor XII C46T (rs1801020), factor XIII Val34Leu (rs5985), Factor II (prothrombin) G20210A (rs1799963), factor V Leiden Arg506Gln (rs6025), factor V Cambridge Arg306Thr, factor V Hong Kong Arg306Gly, ABO blood group rs8176719, ABO blood group rs7853989, ABO blood group rs8176743 or rs8176749, and ABO blood group rs8176750, or a SNP in linkage disequilibrium with one of said polymorphisms.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0147] FIG. 1A shows that in the absence of a DNA target, a HairLoop™ is held in the closed state.

[0148] FIG. 1B shows that when a target binds perfectly (no mismatch) to its HairLoop™, the greater stability of the probe-target duplex forces the stem to unwind, resulting in an opening of the HairLoop™.

DETAILED DESCRIPTION OF THE INVENTION

[0149] The authors of the present invention have solved the problems identified above in the methods currently used to calculate the risk that a subject will develop a thromboembolic event, as this term has been defined above.

[0150] The authors of the present invention have identified a series of genetic variants which are associated with a risk of presenting a thromboembolic event. These genetic variants show predictive and diagnostic value.

[0151] Method for Solving the Limitations of the Methods for Predicting the Risk of Developing a Thromboembolic Event or for Diagnosing a Thromboembolic Event.

[0152] The present application solves the above-described limitations of the methods currently used to calculate thromboembolic event risk and/or to diagnose a thromboembolic event. It uses a particular combination of genetic markers (as described above), selected and evaluated by the inventors after a complex and genuine analysis of thousands of possible markers. Of the different possible ways to construct a genetic risk score (GRS), the inventors selected the one that provided the best results. To calculate the genetic risk score, the cumulative number of risk alleles present in each individual is counted across the SNPs listed in Table 3. For each variant studied, an individual can carry 0, 1 or 2 risk alleles. Summing the risk alleles across the set of selected variants (n=12) gives each individual a score ranging from 0 to 24. The inventors have generated new algorithms for thromboembolic risk estimation.

TABLE 3

Gene | SNP (draft name) | Reference ID of the variant | rs | Informative allele | Other allele
FXII | 46C > T | FXII, 46C > T | 1801020 | T | C
ABO blood group (A1 allele) | 261delG | ABO blood group | 8176719 | G | delG
ABO blood group (A1 allele) | 526C > G | ABO blood group | 7853989 | C | G
ABO blood group (A1 allele) | 703G > A | ABO blood group | 8176743 | G | A
ABO blood group (A1 allele) | (in LD with 703G > A) | ABO blood group | 8176749 | C | T
ABO blood group (A1 allele) | 1059 delC | ABO blood group | 8176750 | C | delC
SERPIN A10 | 728C > T | Serpin A10, Arg67Stop | 2232698 | T | C
SERPINC1 | SerpinC1, Ala384Ser (Cambridge II) | SerpinC1, Ala384Ser (Cambridge II) | 121909548 | A | C
coagulation factor FV | FV Leiden (1746G > A) | FV, R506Q (F5 Leiden) | 6025 | T | C
coagulation factor FV | FV Cambridge (1146G > C) | FV, R306T (F5 Cambridge) | 118203906 | C | G
coagulation factor FV | FV Hong Kong (1145A > G) | FV, R306G (F5 Hong Kong) | 118203905 | G | A
coagulation factor XIII (A1 polypeptide) | V34L (226G > T) | FXIII, Val34Leu | 5985 | C | A
coagulation factor II | G20210A | Prothrombin, G20210A | 1799963 | A | G

[0153] The list of polymorphisms which are used in this method of the present invention is given in Table 3.
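The allele-counting score described in paragraph [0152] can be illustrated with a minimal Python sketch. It assumes that the "informative allele" column of Table 3 is the risk allele to be counted, that rs8176743 stands for the rs8176743/rs8176749 pair, and that genotypes arrive as a pair of allele strings per rs number; the names `RISK_ALLELES` and `genetic_risk_score` are hypothetical and do not come from the application.

```python
# Informative alleles taken from Table 3, keyed by rs number.
# Assumption: the informative allele is the risk allele that is counted.
RISK_ALLELES = {
    "rs1801020": "T",    # FXII 46C > T
    "rs8176719": "G",    # ABO 261delG
    "rs7853989": "C",    # ABO 526C > G
    "rs8176743": "G",    # ABO 703G > A (rs8176749 is in LD with it)
    "rs8176750": "C",    # ABO 1059delC
    "rs2232698": "T",    # Serpin A10 Arg67Stop
    "rs121909548": "A",  # Serpin C1 Ala384Ser (Cambridge II)
    "rs6025": "T",       # FV Leiden
    "rs118203906": "C",  # FV Cambridge
    "rs118203905": "G",  # FV Hong Kong
    "rs5985": "C",       # FXIII Val34Leu
    "rs1799963": "A",    # Prothrombin G20210A
}


def genetic_risk_score(genotypes):
    """Count risk alleles over the 12 variants (result is 0 to 24).

    genotypes maps each rs number to a pair of alleles, e.g. ("T", "C").
    """
    score = 0
    for rs, risk_allele in RISK_ALLELES.items():
        if rs not in genotypes:
            raise KeyError(f"missing genotype for {rs}")
        alleles = genotypes[rs]
        if len(alleles) != 2:
            raise ValueError(f"{rs}: expected two alleles")
        # Each variant contributes 0, 1 or 2 risk alleles.
        score += sum(1 for allele in alleles if allele == risk_allele)
    return score


# Hypothetical individual: no risk alleles except a heterozygous FV Leiden.
example = {rs: ("-", "-") for rs in RISK_ALLELES}
example["rs6025"] = ("T", "C")
print(genetic_risk_score(example))
```

The sketch treats every variant with equal weight, which matches the simple count described in the text; any weighting scheme would be a separate design choice.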

[0154] In embodiments of the invention, the detection of one or more SNPs in strong linkage disequilibrium with any or all of the recited polymorphisms can also be used in place of or in addition to detecting the specifically recited polymorphism.

[0155] In population genetics, linkage disequilibrium (LD) is the non-random association of alleles at different loci in a given population. Loci are said to be in LD when the frequency of association of their different alleles is higher than would be expected if the loci were independent and associated randomly. Measures of LD include the correlation coefficient (r²) and the coefficient of LD (D). These measures are not always convenient because their range of possible values depends on the frequencies of the alleles they refer to. This makes it difficult to compare the level of LD between pairs of alleles with very different frequencies. Thus, when comparing SNPs with very different allele frequencies, both the r² and D values might be low, and that does not exclude LD.

[0156] An alternative measure that takes into account the allele frequency (the minor allele frequency or MAF) of the SNPs to be compared is the normalized D, or D′. D′ is therefore a more meaningful and easier measure to use, especially when comparing SNPs with very different MAFs. For example, two SNPs in total LD but with very different MAFs (for instance, 0.5 or 50% for SNP A and 0.01 or 1% for SNP B) would have a D′ value of 1.0 but an r² value of about 0.01. Thus, the SNPs are in LD, and the low r² value merely reflects that the B allele is rare, so the vast majority of the time the common A allele is not found with it; this is not because the loci are not in disequilibrium, but only because the B allele is rare.

[0157] SNPs in LD can be substituted without affecting the magnitude of the association between a GRS and the presence of thromboembolism.

[0158] Herein, a strong linkage disequilibrium may be defined by the r² value. Linkage disequilibrium is a characterization of the haplotype distribution at a pair of loci. It describes an association between a pair of chromosomal loci in a population. The r² value is considered particularly suitable to describe linkage disequilibrium.

[0159] The r² measure of linkage disequilibrium is defined as

\[
r^{2}(p_a, p_b, p_{ab}) = \frac{(p_{ab} - p_a p_b)^{2}}{p_a (1 - p_a)\, p_b (1 - p_b)}, \qquad (1)
\]

[0160] where p_ab is the frequency of haplotypes having allele a at locus 1 and allele b at locus 2, and p_a and p_b are the frequencies of allele a at locus 1 and of allele b at locus 2, respectively (Hill & Robertson, 1968). As the square of a correlation coefficient, r²(p_a, p_b, p_ab) can range from 0 to 1 as p_a, p_b and p_ab vary.

[0161] (“Hill & Robertson, 1968” is Theor Appl Genetics 1968; 38:226-231).

[0162] A strong linkage disequilibrium is one with an r² value of more than 0.7, preferably more than 0.8, more preferably more than 0.9, including e.g. r² values of 1.
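The two LD measures discussed above can be compared numerically with a short Python sketch. It computes r² from equation (1) and D′ using Lewontin's usual normalization (D divided by its maximum attainable magnitude given the allele frequencies); that normalization is an assumption, since the text does not spell out a formula for D′. The function names are hypothetical.

```python
def ld_measures(p_a, p_b, p_ab):
    """Return (r2, d_prime) for allele frequencies p_a, p_b and
    haplotype frequency p_ab (all strictly between 0 and 1)."""
    d = p_ab - p_a * p_b  # coefficient of LD, D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # Largest |D| compatible with the allele frequencies (assumed
    # Lewontin normalization; not stated in the text).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


def is_strong_ld(r2, threshold=0.7):
    """Strong LD as defined in [0162]: r2 above the chosen threshold."""
    return r2 > threshold


# Example from [0156]: MAF 0.5 for SNP A, 0.01 for SNP B, and the rare
# B allele always found on the A haplotype (p_ab = 0.01).
r2, d_prime = ld_measures(0.5, 0.01, 0.01)
print(round(r2, 4), round(d_prime, 4), is_strong_ld(r2))
```

For this input D′ is at its maximum while r² stays near 0.01, which is the situation paragraph [0156] describes: the pair would not pass an r²-based threshold even though the loci are in complete disequilibrium.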

[0163] For example, SNPs rs8176743 and rs8176749 in the ABO gene are in complete linkage disequilibrium (LD), as both the r² and D′ values are 1 or very close to 1 in all studied populations for which information is available. The lowest r² value is 0.937620 and the lowest D′ value is 0.999999.

[0164] When prediction models are used, as for instance, for making treatment decisions, predictive risks may be categorized by using risk cutoff thresholds.
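As an illustration of how a predicted risk might be computed and then categorized, the following Python sketch evaluates a logistic model with main effects and pairwise interaction terms of the kind recited in claim 10, and maps the result onto risk categories. Several points are assumptions and not taken from the application: the exponent is given the conventional negative sign of a logistic model (the formula as printed shows no minus sign), the population incidence of 0.001 and the cutoffs of 0.01 and 0.05 are purely hypothetical, and the function names are invented for the example. Only the family-history coefficient of 1.185 comes from the claims.

```python
import math


def thrombosis_probability(beta0, main_effects, interactions=()):
    """Logistic probability for a linear predictor with interactions.

    main_effects: iterable of (beta, x) pairs.
    interactions: iterable of (beta_fg, x_f, x_g) triples.
    Assumes the standard logistic form 1 / (1 + exp(-z)).
    """
    z = beta0
    z += sum(beta * x for beta, x in main_effects)
    z += sum(beta * x_f * x_g for beta, x_f, x_g in interactions)
    return 1.0 / (1.0 + math.exp(-z))


def risk_category(probability, cutoffs=(0.01, 0.05)):
    """Map a probability onto categories using hypothetical cutoffs."""
    low, high = cutoffs
    if probability < low:
        return "low"
    if probability < high:
        return "moderate"
    return "high"


# beta0 is the natural logarithm of the population incidence
# (0.001 here is a made-up value for illustration only).
beta0 = math.log(0.001)
baseline = thrombosis_probability(beta0, [])
# Positive family history (x = 1) with the coefficient 1.185 of claim 10.
with_family_history = thrombosis_probability(beta0, [(1.185, 1)])
print(risk_category(baseline), risk_category(with_family_history))
```

Cutoffs of this kind would in practice be chosen for the treatment decision at hand; the values used here carry no clinical meaning.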

[0165] Those skilled in the art will readily recognize that the analysis of the nucleotides present according to the method of the invention in an individual's nucleic acid can be done by any method or technique capable of determining the nucleotides present in a polymorphic site. As is evident to those skilled in the art, the nucleotides present in the polymorphic markers can be determined from either nucleic acid strand or from both strands.

[0166] Once a biological sample from a subject has been obtained (e.g., a bodily fluid, such as urine, saliva, plasma, serum, or a tissue sample, such as a buccal tissue sample or a buccal cell), detection of a sequence variation or allelic variant SNP is typically undertaken. Virtually any method known to the skilled artisan can be employed. Perhaps the most direct method is to actually determine the sequence of either genomic DNA or cDNA and compare these sequences to the known SNP alleles of the gene. This can be a fairly expensive and time-consuming process. Nevertheless, this technology is quite common and is well known.

[0167] Any of a variety of methods that exist for detecting sequence variations may be used in the methods of the invention. The particular method used is not important in the estimation of cardiovascular risk or treatment selection.

[0168] Other commercially available methods exist for high-throughput SNP identification that do not use direct sequencing technologies, for example, Illumina's VeraCode Technology, TaqMan® SNP Genotyping Chemistry and KASPar SNP Genotyping Chemistry.

[0169] A variation on the direct sequence determination method is the Gene Chip™ method available from Affymetrix. Alternatively, robust and less expensive ways of detecting DNA sequence variation are also commercially available. For example, Perkin Elmer adapted its TaqMan Assay™ to detect sequence variation. Orchid BioSciences has a method called SNP-IT™ (SNP-Identification Technology) that uses primer extension with labeled nucleotide analogs to determine which nucleotide occurs at the position immediately 3′ of an oligonucleotide probe; the extended base is then identified using direct fluorescence, an indirect colorimetric assay, mass spectrometry, or fluorescence polarization. Sequenom uses a hybridization capture technology plus MALDI-TOF (Matrix Assisted Laser Desorption/Ionization Time-of-Flight mass spectrometry) to detect SNP genotypes with its MassARRAY™ system. Promega provides the READIT™ SNP/Genotyping System (U.S. Pat. No. 6,159,693). In this method, DNA or RNA probes are hybridized to target nucleic acid sequences. Probes that are complementary to the target sequence at each base are depolymerized with a proprietary mixture of enzymes, while probes which differ from the target at the interrogation position remain intact. The method uses pyrophosphorylation chemistry in combination with luciferase detection to provide a highly sensitive and adaptable SNP scoring system. Third Wave Technologies has the Invader OS™ method that uses proprietary Cleavase® enzymes, which recognize and cut only the specific structure formed during the Invader process. Invader OS relies on linear amplification of the signal generated by the Invader process, rather than on exponential amplification of the target. The Invader OS assay does not utilize PCR in any part of the assay. In addition, there are a number of forensic DNA testing labs and many research labs that use gene-specific PCR, followed by restriction endonuclease digestion and gel electrophoresis (or other size separation technology) to detect restriction fragment length polymorphisms (RFLPs).

[0170] In various embodiments of any of the above aspects, the presence or absence of the SNPs is identified by amplifying or failing to amplify an amplification product from the sample. Polynucleotide amplifications are typically template-dependent. Such amplifications generally rely on the existence of a template strand to make additional copies of the template. Primers are short nucleic acids that are capable of priming the synthesis of a nascent nucleic acid in a template-dependent process, which hybridize to the template strand. Typically, primers are from ten to thirty base pairs in length, but longer sequences can be employed. Primers may be provided in double-stranded and/or single-stranded form, although the single-stranded form generally is preferred. Often, pairs of primers are designed to selectively hybridize to distinct regions of a template nucleic acid, and are contacted with the template DNA under conditions that permit selective hybridization. Depending upon the desired application, high stringency hybridization conditions may be selected that will only allow hybridization to sequences that are completely complementary to the primers. In other embodiments, hybridization may occur under reduced stringency to allow for amplification of nucleic acids containing one or more mismatches with the primer sequences. Once hybridized, the template-primer complex is contacted with one or more enzymes that facilitate template-dependent nucleic acid synthesis. Multiple rounds of amplification, also referred to as “cycles,” are conducted until a sufficient amount of amplification product is produced.

[0171] Polymerase Chain Reaction

[0172] A number of template dependent processes are available to amplify the oligonucleotide sequences present in a given template sample. One of the best known amplification methods is the polymerase chain reaction. In PCR, pairs of primers that selectively hybridize to nucleic acids are used under conditions that permit selective hybridization. The term “primer”, as used herein, encompasses any nucleic acid that is capable of priming the synthesis of a nascent nucleic acid in a template-dependent process. Primers may be provided in double-stranded or single-stranded form, although the single-stranded form is preferred. Primers are used in any one of a number of template dependent processes to amplify the target gene sequences present in a given template sample. One of the best known amplification methods is PCR, which is described in detail in U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,800,159, each incorporated herein by reference. In PCR, two primer sequences are prepared which are complementary to regions on opposite complementary strands of the target-gene(s) sequence. The primers will hybridize to form a nucleic-acid:primer complex if the target-gene(s) sequence is present in a sample. An excess of deoxyribonucleoside triphosphates is added to a reaction mixture along with a DNA polymerase, e.g. Taq polymerase that facilitates template-dependent nucleic acid synthesis. If the target-gene(s) sequence:primer complex has been formed, the polymerase will cause the primers to be extended along the target-gene(s) sequence by adding on nucleotides. By raising and lowering the temperature of the reaction mixture, the extended primers will dissociate from the target-gene(s) to form reaction products, excess primers will bind to the target-gene(s) and to the reaction products and the process is repeated. These multiple rounds of amplification, referred to as “cycles”, are conducted until a sufficient amount of amplification product is produced.
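The effect of repeating amplification cycles until enough product is available can be illustrated with a small Python sketch. It relies on the common idealized model in which each cycle multiplies the number of target copies by (1 + E), where E is the per-cycle efficiency between 0 and 1; this model, the function names and the numbers in the usage lines are assumptions for illustration and are not part of the application.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of PCR cycles.

    Assumes every cycle multiplies the copies by (1 + efficiency);
    efficiency = 1.0 means perfect doubling.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies, target_copies, efficiency=1.0):
    """Smallest number of cycles that reaches target_copies."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    copies = float(initial_copies)
    cycles = 0
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


# Hypothetical numbers: 100 template copies, a target of 1e9 copies.
print(cycles_needed(100, 1e9))
print(cycles_needed(100, 1e9, efficiency=0.9))
```

Real reactions plateau as reagents are consumed, so the model only describes the early, exponential phase.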

[0173] The amplification product may be digested with a restriction enzyme before analysis. In still other embodiments of any of the above aspects, the presence or absence of the SNP is identified by hybridizing the nucleic acid sample with a primer labeled with a detectable moiety. In other embodiments of any of the above aspects, the detectable moiety is detected in an enzymatic assay, radioassay, immunoassay, or by detecting fluorescence. In other embodiments of any of the above aspects, the primer is labeled with a detectable dye (e.g., SYBR Green I, YO-PRO-1, thiazole orange, HEX, PicoGreen, EDANS, fluorescein, FAM, or TET). In other embodiments of any of the above aspects, the primers are located on a chip. In other embodiments of any of the above aspects, the primers for amplification are specific for said SNPs.

[0174] Another method for amplification is the ligase chain reaction (“LCR”). LCR differs from PCR because it amplifies the probe molecule rather than producing an amplicon through polymerization of nucleotides. In LCR, two complementary probe pairs are prepared, and in the presence of a target sequence, each pair will bind to opposite complementary strands of the target such that they abut. In the presence of a ligase, the two probe pairs will link to form a single unit. By temperature cycling, as in PCR, bound ligated units dissociate from the target and then serve as “target sequences” for ligation of excess probe pairs. U.S. Pat. No. 4,883,750, incorporated herein by reference, describes a method similar to LCR for binding probe pairs to a target sequence.

[0175] Isothermal Amplification

[0176] An isothermal amplification method, in which restriction endonucleases and ligases are used to achieve the amplification of target molecules that contain nucleotide 5′-[α-thio]-triphosphates in one strand of a restriction site, also may be useful in the amplification of nucleic acids in the present invention. In one embodiment, a loop-mediated isothermal amplification (LAMP) method is used for single nucleotide polymorphism (SNP) typing.

[0177] Strand Displacement Amplification

[0178] Strand Displacement Amplification (SDA) is another method of carrying out isothermal amplification of nucleic acids which involves multiple rounds of strand displacement and synthesis, i.e., nick translation. A similar method, called Repair Chain Reaction (RCR), involves annealing several probes throughout a region targeted for amplification, followed by a repair reaction in which only two of the four bases are present. The other two bases can be added as biotinylated derivatives for easy detection.

[0179] Transcription-Based Amplification

[0180] Other nucleic acid amplification procedures include transcription-based amplification systems, including nucleic acid sequence based amplification. In nucleic acid sequence based amplification, the nucleic acids are prepared for amplification by standard phenol/chloroform extraction, heat denaturation of a clinical sample, treatment with lysis buffer and minispin columns for isolation of DNA and RNA, or guanidinium chloride extraction of RNA. These amplification techniques involve annealing a primer which has target-specific sequences. Following polymerization, DNA/RNA hybrids are digested with RNase H while double-stranded DNA molecules are heat denatured again. In either case the single-stranded DNA is made fully double-stranded by addition of a second target-specific primer, followed by polymerization. The double-stranded DNA molecules are then multiply transcribed by a polymerase such as T7 or SP6. In an isothermal cyclic reaction, the RNAs are reverse transcribed into double-stranded DNA, and transcribed once again with a polymerase such as T7 or SP6. The resulting products, whether truncated or complete, indicate target-specific sequences.

[0181] Other amplification methods may be used in accordance with the present invention. In one embodiment, “modified” primers are used in a PCR-like, template and enzyme dependent synthesis. The primers may be modified by labelling with a capture moiety (e.g., biotin) and/or a detector moiety (e.g., enzyme). In the presence of a target sequence, the probe binds and is cleaved catalytically. After cleavage, the target sequence is released intact to be bound by excess probe. Cleavage of the labelled probe signals the presence of the target sequence. In another approach, a nucleic acid amplification process involves cyclically synthesizing single-stranded RNA (“ssRNA”), ssDNA, and double-stranded DNA (dsDNA), which may be used in accordance with the present invention. The ssRNA is a first template for a first primer oligonucleotide, which is elongated by reverse transcriptase (RNA-dependent DNA polymerase). The RNA is then removed from the resulting DNA:RNA duplex by the action of ribonuclease H (RNase H, an RNase specific for RNA in duplex with either DNA or RNA). The resultant ssDNA is a second template for a second primer, which also includes the sequences of an RNA polymerase promoter (exemplified by T7 RNA polymerase) 5′ to its homology to the template. This primer is then extended by DNA polymerase (exemplified by the large “Klenow” fragment of E. coli DNA polymerase I), resulting in a double-stranded DNA (“dsDNA”) molecule, having a sequence identical to that of the original RNA between the primers and having additionally, at one end, a promoter sequence. This promoter sequence can be used by the appropriate RNA polymerase to make many RNA copies of the DNA. These copies can then re-enter the cycle leading to very swift amplification. With proper choice of enzymes, this amplification can be done isothermally without addition of enzymes at each cycle. 
Because of the cyclical nature of this process, the starting sequence can be chosen to be in the form of either DNA or RNA.

[0182] Methods for Nucleic Acid Separation

[0183] It may be desirable to separate nucleic acid products from other materials, such as template and excess primer. In one embodiment, amplification products are separated by agarose, agarose-acrylamide or polyacrylamide gel electrophoresis using standard methods (Sambrook et al., 1989, see infra). Separated amplification products may be cut out and eluted from the gel for further manipulation. Using low melting point agarose gels, the separated band may be removed by heating the gel, followed by extraction of the nucleic acid. Separation of nucleic acids may also be effected by chromatographic techniques known in the art. There are many kinds of chromatography which may be used in the practice of the present invention, including adsorption, partition, ion-exchange, hydroxylapatite, molecular sieve, reverse-phase, column, paper, thin-layer, and gas chromatography as well as HPLC. In certain embodiments, the amplification products are visualized. A typical visualization method involves staining of a gel with ethidium bromide and visualization of bands under UV light. Alternatively, if the amplification products are integrally labeled with radio- or fluorometrically-labeled nucleotides, the separated amplification products can be exposed to X-ray film or visualized with light exhibiting the appropriate excitatory spectra.

[0184] Alternatively, the presence of the polymorphic positions according to the methods of the invention can be determined by hybridisation or lack of hybridisation with a suitable nucleic acid probe specific for a polymorphic nucleic acid but not with the non-mutated nucleic acid. By “hybridize” is meant to pair to form a double-stranded molecule between complementary polynucleotide sequences, or portions thereof, under various conditions of stringency. For example, stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate, preferably less than about 500 mM NaCl and 50 mM trisodium citrate, and more preferably less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, while high stringency hybridization can be obtained in the presence of at least about 35% formamide, and more preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C., more preferably of at least about 37° C., and most preferably of at least about 42° C. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS), and the inclusion or exclusion of carrier DNA, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed. In a preferred embodiment, hybridization will occur at 30° C. in 750 mM NaCl, 75 mM trisodium citrate, and 1% SDS. In a more preferred embodiment, hybridization will occur at 37° C. in 500 mM NaCl, 50 mM trisodium citrate, 1% SDS, 35% formamide, and 100 μg/ml denatured salmon sperm DNA (ssDNA). In a most preferred embodiment, hybridization will occur at 42° C. in 250 mM NaCl, 25 mM trisodium citrate, 1% SDS, 50% formamide, and 200 μg/ml ssDNA. Useful variations on these conditions will be readily apparent to those skilled in the art.

[0185] For most applications, washing steps that follow hybridization will also vary in stringency. Wash stringency conditions can be defined by salt concentration and by temperature. As above, wash stringency can be increased by decreasing salt concentration or by increasing temperature. For example, stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate. Stringent temperature conditions for the wash steps will ordinarily include a temperature of at least about 25° C., more preferably of at least about 42° C., and even more preferably of at least about 68° C. In a preferred embodiment, wash steps will occur at 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 42° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. In a more preferred embodiment, wash steps will occur at 68° C. in 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Additional variations on these conditions will be readily apparent to those skilled in the art. Hybridization techniques are well known to those skilled in the art and are described, for example, in Benton and Davis (Science 196: 180, 1977); Grunstein and Hogness (Proc. Natl. Acad. Sci., USA 72:3961, 1975); Ausubel et al. (Current Protocols in Molecular Biology, Wiley Interscience, New York, 2001); Berger and Kimmel (Guide to Molecular Cloning Techniques, 1987, Academic Press, New York); and Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1989.

[0186] Nucleic acid molecules useful for hybridisation in the methods of the invention include any nucleic acid molecule which exhibits substantial identity so as to be able to specifically hybridise with the target nucleic acids. Polynucleotides having “substantial identity” to an endogenous sequence are typically capable of hybridizing with at least one strand of a double-stranded nucleic acid molecule. By “substantially identical” is meant a polypeptide or nucleic acid molecule exhibiting at least 50% identity to a reference amino acid sequence or nucleic acid sequence. Preferably, such a sequence is at least 60%, more preferably 80% or 85%, and more preferably 90%, 95% or even 99% identical at the amino acid level or nucleic acid to the sequence used for comparison. Sequence identity is typically measured using sequence analysis software (for example, Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705, BLAST, BESTFIT, GAP, or PILEUP/PRETTYBOX programs). Such software matches identical or similar sequences by assigning degrees of homology to various substitutions, deletions, and/or other modifications. Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. In an exemplary approach to determining the degree of identity, a BLAST program may be used, with a probability score between e^-3 and e^-100 indicating a closely related sequence.

[0187] A detection system may be used to measure the absence, presence, and amount of hybridization for all of the distinct sequences simultaneously. Preferably, a scanner is used to determine the levels and patterns of fluorescence.

[0188] Another method for detecting sequence variations is based on the amplification by PCR of specific human targets and the subsequent detection of their genotype by hybridization to specific HairLoop™ probes spotted on a microarray.

[0189] HairLoop™ is a stem-loop, single-stranded DNA molecule consisting of a probe sequence embedded between complementary sequences that form a hairpin stem. The stem is attached to the microarray surface by only one of its strands. In the absence of a DNA target, the HairLoop™ is held in the closed state (FIG. 1a). When the target binds perfectly (no mismatch) to its HairLoop™, the greater stability of the probe-target duplex forces the stem to unwind, resulting in an opening of the HairLoop™ (FIG. 1b). Due to these unique structural and thermodynamic properties, HairLoop™ offer several advantages over linear probes, one of which is their increased specificity differentiating between two DNA target sequences that differ by as little as a single nucleotide.

[0190] HairLoop™ act like switches that are normally closed, or “off”. Binding to fluorescent DNA target induces conformational changes that open the structure and as a result after washing, the fluorescence is visible, or “on”.

[0191] One HairLoop™ is designed to be specific to one given allele. Thus, assessment of a point mutation for a bi-allelic marker requires two HairLoop™; one for the wild-type allele, and one for the mutant allele. The specific sequences for the detection of the polymorphisms described in table 3 using the HairLoop technology are given in table 4.

[0192] In addition to these sequences, the sequence surrounding the polymorphism of rs8176749 is GAGCACCTTGGTGGGTTTGTGGCGCAGCAGGTACTTGTTCAGGTGGCTCTCGT (SEQ ID NO: 25). Here the bold underlined C residue is the allele of risk, whereas a T in this position has a neutral effect. This is at position 133255801 on chromosome 9 (GRCh38).

TABLE 4

SEQ ID NO  Variant name                        Sequence comprising the polymorphism  Allele    dbSNP accession number  Chrom.  Position in chrom.  Strand
1          FXII, 46 C > T                      GGACGGATGCCATGA                       Risk      rs1801020               5       176836532           +
2                                              GGACGGACGCCATGA                       Non-risk
3          ABO, 261delG                        CTCGTGGTGACCCCT                       Risk      rs8176719               9       136132908           -
4                                              CTCGTGGT ACCCCTT                      Non-risk
5          ABO, 526C > G                       GGAGGTGCGCGCCT                        Risk      rs7853989               9       136131592           +
6                                              GGAGGTGGGCGCCT                        Non-risk
7          ABO, 703G > A                       TGCACCCCGGCTTCTAC                     Risk      rs8176743               9       136131415           +
8                                              TGCACCCCAGCTTCTAC                     Non-risk
9          ABO, 1059delC                       TCCGGAACCCGTGAGC                      Risk      rs8176750               9       136131059           +
10                                             TCCGGAA_CCGTGAGCG                     Non-risk
11         SERPINA10, 728 C > T                CCTGCTGTGAAAGATCT                     Risk      rs2232698               14      94756669            +
12                                             CCTGCTGCGAAAGATCT                     Non-risk
13         SERPINC1, Ala384Ser (Cambridge II)  CGGTACTTGAAGCTGCTT                    Risk      NA                      1       173873176           -
14                                             CGGTACTTGCAGCTGCTT                    Non-risk
15         FV, R506Q (FV Leiden)               ATTCCTTGCCTGTCC                       Risk      rs6025                  1       169519049           -
16                                             ATTCCTCGCCTGTCC                       Non-risk
17         FV, R306T (FV Cambridge)            GAAAACCACGAATCTTAAG                   Risk      NA                      1       169524537           +
18                                             GAAAACCAGGAATCTTAAG                   Non-risk
19         FV, R306G (FV Hong Kong)            AAGAAAACCGGGAATCTTA                   Risk      NA                      1       169524536           +
20                                             AAGAAAACCAGGAATCTTA                   Non-risk
21         FXIII, Val34Leu                     GGGCACCACGCCCTGA                      Risk      rs5985                  6       6318795             -
22                                             GGGCACCAAGCCCTGA                      Non-risk
23         Prothrombin, G20210A                TCTCAGCAAGCCTCAAT                     Risk      rs1799963               11      46761055            +
24                                             TCTCAGCGAGCCTCAAT                     Non-risk
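The one-probe-per-allele logic can be illustrated in silico. The following is a minimal sketch only, using the two FXII 46 C > T sequences of Table 4 (SEQ ID NO: 1 and 2) and treating an exact substring match as a stand-in for perfect-match hybridization. The names `PROBES`, `alleles_detected` and `call_genotype` are hypothetical, the sketch looks at one strand only, and it is not the microarray read-out itself.

```python
# Allele-specific sequences for FXII 46 C > T (rs1801020), copied from Table 4.
PROBES = {
    "risk": "GGACGGATGCCATGA",      # SEQ ID NO: 1
    "non-risk": "GGACGGACGCCATGA",  # SEQ ID NO: 2
}


def alleles_detected(amplicon: str) -> set[str]:
    """Return the alleles whose sequence occurs exactly in the amplicon.

    Assumption: an exact substring match stands in for a perfect-match
    hybridization event; mismatched sequences are simply not detected.
    """
    amplicon = amplicon.upper()
    return {allele for allele, seq in PROBES.items() if seq in amplicon}


def call_genotype(amplicons: list[str]) -> str:
    """Combine the alleles seen across a subject's amplicons into a label."""
    seen: set[str] = set()
    for amplicon in amplicons:
        seen |= alleles_detected(amplicon)
    if seen == {"risk", "non-risk"}:
        return "heterozygous"
    if seen == {"risk"}:
        return "homozygous risk"
    if seen == {"non-risk"}:
        return "homozygous non-risk"
    return "no call"


# Usage with made-up flanking bases around the Table 4 sequences.
print(call_genotype(["AAGGACGGATGCCATGATT", "CCGGACGGACGCCATGAGG"]))
```

A bi-allelic marker needs both entries of `PROBES`, which mirrors the requirement of two HairLoop™ probes per point mutation.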

[0193] Method to Establish in a More Appropriate Way the Risk Status.

[0194] Another object of the present invention is the development of an algorithm to estimate the risk of developing and/or suffering a thromboembolic event. The algorithm is shown as function 1.

[0195] Function 1

[0196] Estimating the Risk of Thrombosis.

[0197] The individual estimation of the risk of thrombosis is based on a logistic regression model. The aim of this model is to calculate the probability that a person has of presenting venous thrombosis according to his/her genetic, sociodemographic and clinical characteristics. To calculate this probability we use the following equation:


$$
\mathrm{Probability}(Y=1 \mid x_1, \ldots, x_n) = \frac{1}{1 + \exp\left(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \beta_{f \cdot g}\, x_f x_g + \cdots + \beta_{h \cdot i}\, x_h x_i\right)},
$$

wherein:

[0198] Probability (Y=1|x_1, ..., x_n) = probability of presenting a thrombosis in a particular individual with concrete and measurable characteristics in a number of variables 1, ..., n. This probability can range between 0 and 1;

[0199] exp = exponential function (natural base);

[0200] β_0 = coefficient that defines the risk (the probability) of thrombosis not related to the variables 1 to n. This coefficient can take a value from −∞ to +∞ and is calculated as the natural logarithm of the incidence of venous thrombosis in the population;

[0201] β_1 = regression coefficient that expresses the risk (higher or lower) of presenting thrombosis associated with the value/presence of the predictor variable x_1. This coefficient can take a value from −∞ to +∞;

[0202] x_1 = value taken by the predictor variable x_1 in an individual. The range of possible values depends on the variable;

[0203] β_n = regression coefficient that expresses the risk (higher or lower) of presenting thrombosis associated with the value/presence of the predictor variable x_n. This coefficient can take a value from −∞ to +∞;

[0204] x_n = value taken by the predictor variable x_n in an individual. The range of possible values depends on the variable.

[0205] In addition, the model includes the effect of combinations of some variables, in terms of interaction or modification of effect. That is, the effect size (regression coefficient) of a single variable (x_f) may be β_f, but if this variable is present in combination with another variable (x_g) the effect size may vary (increase or decrease). To obtain the effect size of the variable x_f we therefore have to consider not only β_f but also a second regression coefficient, β_{f·g}, by adding β_f and β_{f·g}. Thus:

[0206] β_{f·g} = regression coefficient that expresses the risk (higher or lower) of presenting thrombosis associated with the combined presence of the predictor variables x_f and x_g. This coefficient can take a value from −∞ to +∞;

[0207] x_f = value taken by the predictor variable x_f in an individual. The range of possible values depends on the variable;

[0208] x_g = value taken by the predictor variable x_g in an individual. The range of possible values depends on the variable;

[0209] β_{h·i} = regression coefficient that expresses the risk (higher or lower) of presenting thrombosis associated with the combined presence of the predictor variables x_h and x_i. This coefficient can take a value from −∞ to +∞;

[0210] x_h = value taken by the predictor variable x_h in an individual. The range of possible values depends on the variable;

[0211] x_i = value taken by the predictor variable x_i in an individual. The range of possible values depends on the variable.
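How the main-effect and interaction terms combine can be sketched in a few lines. This is a minimal illustration and not the implementation of function 1: it assumes the conventional logistic form 1/(1 + exp(−z)), under which a positive coefficient raises the probability, as the coefficient definitions describe. That sign convention, the function name `thrombosis_probability`, the variable names "a" and "b", and every numeric value in the usage lines are assumptions made for the example.

```python
import math


def thrombosis_probability(beta0, main_effects, interactions, x):
    """Evaluate a logistic model with pairwise interaction terms.

    beta0        -- intercept
    main_effects -- dict: variable name -> regression coefficient
    interactions -- dict: (name_f, name_g) -> interaction coefficient
    x            -- dict: variable name -> value in the individual
                    (variables missing from x are treated as 0)

    Assumption: the conventional logistic form 1 / (1 + exp(-z)) is used.
    """
    z = beta0
    for name, beta in main_effects.items():
        z += beta * x.get(name, 0.0)
    for (name_f, name_g), beta in interactions.items():
        # The interaction term contributes only when both variables are nonzero.
        z += beta * x.get(name_f, 0.0) * x.get(name_g, 0.0)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only to exercise the function.
main = {"a": 1.0, "b": 0.5}
inter = {("a", "b"): 0.25}
baseline = thrombosis_probability(-3.0, main, inter, {})
exposed = thrombosis_probability(-3.0, main, inter, {"a": 1, "b": 1})
print(baseline < exposed)
```

When both variables are present, the linear predictor picks up β_a + β_b + β_{a·b}, which is the addition of main and interaction coefficients described in paragraph [0205].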

[0212] If the patient does not present any mutation or genetic variant of risk but he/she presents a positive family history of venous thrombosis, we will include this variable in the model. The regression coefficient of this variable is 1.185 with a range of possible values from 0.200 to 2.500.

[0213] The variables included in the model and the regression coefficients of each of these variables are shown in Table 5.

TABLE 5

Variable                                Risk exposure         Regression coefficient  Coefficient lower limit  Coefficient upper limit

Clinical
Age < 55 y                              No                    0
Age 55-64 y                             Yes                   0.811                   0.100                    3.000
Age 65-74 y                             Yes                   1.409                   0.100                    3.000
Age 75-84 y                             Yes                   1.681                   0.100                    3.000
Age > 84 y                              Yes                   2.534                   0.500                    6.000
Male                                    Yes                   0.336                   0.050                    1.500
Diabetes                                Yes                   0.351                   0.050                    1.500
Smoking                                 Yes                   0.166                   0.050                    1.500
Body mass index < 25 kg/m²              No                    0                       0                        0
Body mass index 25-29.9 kg/m²           Yes                   0.412                   0.050                    1.500
Body mass index ≥ 30 kg/m²              Yes                   0.820                   0.100                    3.000
Use of oral contraceptives              Yes                   1.131                   0.100                    3.000
Pregnancy                               Yes                   1.435                   0.100                    3.000
Family history of thrombosis *          Yes                   1.185                   0.100                    3.000

Genetic
Factor V Leiden heterozygote            AG                    0.993                   0.100                    3.000
Factor V Leiden homozygote              AA                    2.890                   0.500                    6.000
Prothrombin                             AG                    0.293                   0.050                    1.500
Serpin A10                              TG                    1.358                   0.100                    3.000
Factor XII                              TC/TT                 1.633                   0.100                    3.000
Factor XIII                             GT/GG                 0.198                   0.050                    1.500
Serpin C1                               TG                    2.277                   0.500                    6.000
ABO (A1 allele)                         (see table 6)         0.956                   0.100                    3.000

Interactions
Factor V Leiden · Prothrombin           AG · AG               1.114                   0.100                    3.000
Factor V Leiden · ABO                   AG · (see table 6)    0.599                   0.100                    3.000
Factor V Leiden · Oral contraceptives   AG · Yes              0.028                   0.005                    1.000
Factor V Leiden · Pregnancy             AG · Yes              1.191                   0.100                    3.000
Prothrombin · Oral contraceptives       AG · Yes              0.542                   0.100                    3.000
Prothrombin · Pregnancy                 AG · Yes              1.673                   0.100                    3.000
Prothrombin · BMI ≥ 30 kg/m²            AG · Yes              0.772                   0.100                    3.000
Oral contraceptives · BMI ≥ 30 kg/m²    Yes · Yes             1.218                   0.100                    3.000

* Only included in the model if the patient does not present any mutation or genetic variant of risk but he/she presents a positive family history of venous thrombosis.
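As a worked illustration of how the entries of Table 5 add up for one individual, the sketch below sums the point estimates for a hypothetical Factor V Leiden heterozygote who uses oral contraceptives and has a body mass index ≥ 30 kg/m². Only the coefficient values come from Table 5. The dictionary keys, the function name `linear_predictor` and the example profile are assumptions, the intercept β_0 is left out because no value is given for it, and reading exp(sum) as an odds multiplier relative to the reference categories assumes the conventional logistic form.

```python
import math

# Point estimates copied from Table 5 (only a subset is listed here).
MAIN = {
    "age_55_64": 0.811,
    "male": 0.336,
    "bmi_30_plus": 0.820,
    "oral_contraceptives": 1.131,
    "pregnancy": 1.435,
    "fvl_heterozygote": 0.993,
    "prothrombin_AG": 0.293,
}
INTERACTIONS = {
    ("fvl_heterozygote", "prothrombin_AG"): 1.114,
    ("fvl_heterozygote", "oral_contraceptives"): 0.028,
    ("fvl_heterozygote", "pregnancy"): 1.191,
    ("prothrombin_AG", "oral_contraceptives"): 0.542,
    ("oral_contraceptives", "bmi_30_plus"): 1.218,
}


def linear_predictor(present: set[str]) -> float:
    """Sum main-effect and interaction coefficients for the factors present.

    The intercept is not included. An unknown factor name raises KeyError.
    """
    total = sum(MAIN[name] for name in present)
    total += sum(
        beta
        for (name_f, name_g), beta in INTERACTIONS.items()
        if name_f in present and name_g in present
    )
    return total


profile = {"fvl_heterozygote", "oral_contraceptives", "bmi_30_plus"}
score = linear_predictor(profile)  # 0.993 + 1.131 + 0.820 + 0.028 + 1.218
odds_multiplier = math.exp(score)  # relative to the reference categories
print(round(score, 3))
```

For this profile the three main effects and the two applicable interaction terms sum to 4.190.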

TABLE 6
Definition of A1 allele

The subject carries at least one A1 allele at ABO locus if any of the following combinations is present:

             Genotypes
Combination  rs8176719   rs7853989   rs8176743*  or rs8176749*  rs8176750
1            GG          CC          GG          CC             CC
2            GG          CC          GG          CC             CdelC
3            GG          CG          GA          CT             CC
4            GdelG       CC          GG          CC             CC
5            GG          CG          GG          CC             CC

* Only one of those two polymorphisms is used to calculate the A1 allele at ABO locus.
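The look-up defined by Table 6 can be sketched as follows. The genotype combinations are copied from the table; the function name `carries_a1`, its argument layout, and the choice to prefer rs8176743 when both alternative markers are supplied are assumptions of this sketch. Genotypes must be spelled exactly as in the table (for example "CG", not "GC").

```python
# Genotype combinations copied from Table 6. Each tuple is
# (rs8176719, rs7853989, rs8176743, rs8176749, rs8176750).
A1_COMBINATIONS = [
    ("GG", "CC", "GG", "CC", "CC"),
    ("GG", "CC", "GG", "CC", "CdelC"),
    ("GG", "CG", "GA", "CT", "CC"),
    ("GdelG", "CC", "GG", "CC", "CC"),
    ("GG", "CG", "GG", "CC", "CC"),
]


def carries_a1(rs8176719, rs7853989, rs8176750, rs8176743=None, rs8176749=None):
    """Return True if the genotypes match any Table 6 combination.

    Exactly one of rs8176743 / rs8176749 is needed. Assumption: if both
    are given, rs8176743 is used and rs8176749 is ignored.
    """
    if rs8176743 is None and rs8176749 is None:
        raise ValueError("one of rs8176743 or rs8176749 is required")
    for c719, c989, c743, c749, c750 in A1_COMBINATIONS:
        if (rs8176719, rs7853989, rs8176750) != (c719, c989, c750):
            continue
        if rs8176743 is not None:
            if rs8176743 == c743:
                return True
        elif rs8176749 == c749:
            return True
    return False


# Usage: combination 3 of Table 6, typed through rs8176749.
print(carries_a1("GG", "CG", "CC", rs8176749="CT"))
```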

[0214] Surprisingly, the combination of SNP markers included in the present invention and set forth in table 3, used with the function described in function 1, has proved capable of establishing the risk of developing a thromboembolic disease or event with higher accuracy than that obtained using the methods currently in use or published functions that include genetic information.

[0215] Surprisingly, the combination of SNP markers included in the present invention and set forth in table 3, used with the function described in function 1, has proved capable of assisting in the diagnosis of a thromboembolic disease or event with higher accuracy than that obtained using the methods currently in use or published functions that include genetic information.

[0216] By the use of the functions described, a personalized risk is obtained for the development of a thromboembolic event, in particular fatal and non-fatal myocardial infarction, stroke, transient ischemic attack, peripheral arteriopathy, deep vein thrombosis, pulmonary embolism or a combination thereof.

Example 1

[0217] Introduction. Thromboembolic disease has an important genetic component. In addition to the classic FV Leiden (FVL) and prothrombin G20210A (PT), new genetic variants associated with this pathology have been identified. The aim of this study was to determine whether a set of genetic variants selected by us (genetic profile) improves the ability of FVL and PT to predict the presence of thrombosis.

[0218] Methods. We included two studies of cases (thrombosis) and controls: MARTHA (1,150 cases/801 controls), designed to evaluate the association of FVL and PT with other risk factors, and a study in a Spanish population, PE (249 cases/248 controls). The genetic profile analyzed was: FVL, PT, ABO (A1 allele), C46T (F12), A384S (SERPINC1), R67X (SERPINA10). The association between genetic variants and thrombosis was calculated using the OR adjusted for age and sex. The predictive ability was calculated using the c statistic (AUC-ROC) and the reclassification (NRI, IDI) observed when using FVL and PT alone or when using the genetic profile.
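For orientation, the c statistic can be computed as the proportion of case-control pairs in which the case receives the higher predicted risk, with ties counted as one half. The sketch below shows that general definition only. It is not the analysis code of the study, the function name `c_statistic` is hypothetical, and the scores in the usage line are made up.

```python
def c_statistic(case_scores, control_scores):
    """Proportion of case-control pairs ranked correctly (ties count 0.5).

    This pairwise definition is equivalent to the area under the ROC curve.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Made-up predicted risks for two cases and two controls.
print(c_statistic([0.8, 0.4], [0.6, 0.2]))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls.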

[0219] Results.

TABLE 7
Association between variants and thrombosis [OR (95% CI)] and the proportion of FVL and PT carriers compared with carriers of the genetic profile (only cases).

Study   FVL             PT              A1              C46T            A384S           R67X
MARTHA  2.3 (1.8-2.8)   0.9 (0.7-1.1)   1.8 (1.2-2.7)   0.9 (0.6-1.4)   0.9 (0.2-3.7)   2.3 (1.2-4.6)
PE      7.2 (2.8-18.9)  2.8 (1.2-6.8)   2.62 (1.8-3.8)  3.1 (1.1-8.8)   4.1 (0.5-36.9)  2.5 (0.8-8.1)

Cases
Study   FVL and PT carriers  Genetic profile carriers
MARTHA  50.4                 87.5
PE      19.7                 71.5

TABLE 8
c statistic and reclassification, NRI (net reclassification improvement) and IDI (integrated discrimination improvement), comparing the use of the genetic profile to FVL and PT.

Model            c statistic, MARTHA  c statistic, PE      NRI, MARTHA          NRI, PE              IDI, MARTHA          IDI, PE
FVL + PT         0.54 (0.51-0.57)     0.58 (0.56-0.62)     Ref.                 Ref.                 Ref.                 Ref.
FVL + PT + rest  0.58 (0.55-0.61)     0.69 (0.64-0.73)     5.3 (−1.1-11.6)      23.4 (11.1-35.6)     1 (0.3-1.8)          5.9 (3.71-7.88)
P-value          <0.001               <0.001               >0.05                <0.001               <0.05                <0.001

[0220] Discussion. We demonstrate that the selected genetic profile significantly improves the prediction of the risk of thrombosis, identifying a genetic risk of presenting a thromboembolic event in 51.6% of people who had a thromboembolic event and who, through analysis of FVL and PT, were not at genetic risk. The genetic profile in clinical practice will improve the diagnosis, prevention and treatment of thromboembolic disease.