METHODS AND MEANS FOR AMPLIFICATION-BASED QUANTIFICATION OF NUCLEIC ACIDS
20230366016 · 2023-11-16
Cpc classification
C12Q1/6809
CHEMISTRY; METALLURGY
C12Q2545/107
CHEMISTRY; METALLURGY
C12Q2537/143
CHEMISTRY; METALLURGY
C12Q1/6848
CHEMISTRY; METALLURGY
C12Q1/6883
CHEMISTRY; METALLURGY
Abstract
The invention relates to simplified means of using biological predictive relationships, in some instances reducing the determination of complex gene networks and relative expression patterns to a single reading.
Claims
1. A method of amplifying at least a first and at least a second target polynucleotide in a sample, wherein the method comprises: providing: a) a sample potentially comprising the at least a first and at least a second target polynucleotides; b) a first tuned competitor polynucleotide and a second tuned competitor polynucleotide; c) at least a first primer, wherein at least the first primer is capable of hybridising to: a first target polynucleotide in the sample; and the first tuned competitor polynucleotide; and initiating a primer extension reaction such that the first and second target polynucleotides (if present in the sample) and the first tuned competitor polynucleotide and the second tuned competitor polynucleotide are amplified, wherein amplification results in a first target product, a second target product, a first tuned competitor product and a second tuned competitor product.
2. The method according to claim 1 wherein the method comprises providing a second primer, optionally wherein: a) the second primer is capable of hybridising to the first target polynucleotide, wherein the first and second primers hybridise on opposite strands of the target so as to result in the production of the first target product, optionally a first target polymerase chain reaction (PCR) product; b) the second primer is capable of hybridising to the first tuned competitor polynucleotide, wherein the first and second primers hybridise on opposite strands of the first tuned competitor polynucleotide so as to result in the production of the first tuned competitor product, optionally a first tuned competitor PCR product; c) the second primer is: i) capable of hybridising to the first tuned competitor polynucleotide, wherein the first and second primers hybridise on opposite strands of the first tuned competitor polynucleotide so as to result in the production of the first tuned competitor product, optionally a first tuned competitor PCR product; and ii) capable of hybridising to the second tuned competitor polynucleotide and initiating a primer extension reaction such that the second tuned competitor polynucleotide is amplified so as to result in the production of the second tuned competitor product, optionally in combination with a further primer wherein the second and further primers hybridise on opposite strands of the second tuned competitor polynucleotide so as to result in the production of the second tuned competitor product, optionally a first target polymerase chain reaction (PCR) product, optionally wherein the second primer is not capable of hybridising to the first target polynucleotide; and/or d) the second primer is: i) capable of hybridising to the first target polynucleotide, wherein the first and second primers hybridise on opposite strands of the target so as to result in the production of the first target product, optionally a first target polymerase chain reaction (PCR) product; and ii) not capable of hybridising to the first or second tuned competitor polynucleotide; and wherein the method comprises a third primer capable of hybridising to the first and to the second tuned competitor polynucleotide.
3. The method of any one of claims 1 or 2 wherein the amplification kinetics of the first target polynucleotide are not the same as the amplification kinetics of the first tuned competitor polynucleotide, or are not substantially similar to the amplification kinetics of the first tuned competitor polynucleotide.
4. The method according to any one of claims 1-3 wherein the number of target product polynucleotides generated is different to the number of tuned competitor product polynucleotides generated, when the initial number of target polynucleotides and the number of tuned competitor polynucleotides prior to primer extension is the same or is substantially the same.
5. The method according to any one of claims 1-4 wherein: the sequence of the first target polynucleotide to be amplified and the sequence of the at least first tuned competitor polynucleotide, and the sequence of the second target polynucleotide to be amplified and the sequence of the at least second tuned competitor polynucleotide, are selected so as to result in a final amount of first target amplification product and second target amplification product that varies with the initial concentration of the first target polynucleotide and the second target polynucleotide in a way that approximates, reproduces or matches the predictive relationship of the target to one or more states.
6. The method according to any one of claims 1-5 wherein the rate of amplification of the first target polynucleotide and the rate of amplification of the second target polynucleotide match a pre-defined weighting.
7. The method according to any one of claims 1-6 wherein the sequence of the first tuned competitor polynucleotide to be amplified shares less than 95%, 90%, 88%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35% or 30% sequence identity with the sequence of the first target polynucleotide to be amplified; and optionally wherein the sequence of the second tuned competitor polynucleotide to be amplified shares less than 95%, 90%, 88%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, 50%, 45%, 40%, 35% or 30% sequence identity with the sequence of the second target polynucleotide to be amplified.
8. The method according to any one of claims 1-7 wherein: the first tuned competitor product is: i) at least 5 nucleotides shorter than the first target product, optionally at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290 or at least 330 nucleotides shorter than the first target product; or ii) at least 5 nucleotides longer than the first target product, optionally at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290 or at least 330 nucleotides longer than the first target product; and/or the second tuned competitor product is: i) at least 5 nucleotides shorter than the second target product, optionally at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290 or at least 330 nucleotides shorter than the second target product; or ii) at least 5 nucleotides longer than the second target product, optionally at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290 or at least 330 nucleotides longer than the second target product.
9. The method according to any one of claims 1-8 wherein the one or more target products, optionally one or more target PCR products; and the one or more tuned competitor products, optionally one or more competitor polynucleotide PCR products are detected, optionally wherein the first target amplification product, the second target amplification product, the first tuned competitor amplification product and the second tuned competitor amplification product are detected.
10. The method according to any one of claims 1-9 wherein the method comprises providing one or more probe groups, wherein each probe group comprises at least one probe polynucleotide labelled with a first label and at least one probe polynucleotide labelled with a second label, and wherein the first and the second label are different.
11. The method according to claim 10 wherein the at least one probe labelled with the first label is capable of hybridising to the first target product; and the at least one probe labelled with a second label is capable of hybridising to the first tuned competitor product.
12. The method according to any of claims 10 or 11 wherein the at least one probe labelled with the first label is capable of hybridising to the first tuned competitor product; and the at least one probe labelled with the second label is capable of hybridising to the second tuned competitor product; and optionally wherein neither probe is capable of hybridising to the first target product.
13. The method according to any of claims 10-12 wherein within a single probe group there are: at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or at least 100 different probes each labelled with the first label; and/or at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or at least 100 different probes each labelled with the second label.
14. The method according to any one of claims 10-13 wherein the method comprises providing at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or at least 100 different probe groups, optionally wherein no particular label, optionally a fluorophore, is used in more than one probe group.
15. The method according to any one of claims 10-14 wherein the only labels present on the probes are the first label and the second label.
16. The method according to any one of claims 10-15 wherein the first and second label are fluorophores, optionally wherein each probe comprises a quencher; and/or wherein the first label is FAM and the second label is HEX; or wherein the first label is HEX and the second label is FAM.
17. The method according to any one of claims 10-16 wherein i) the at least one probe that is capable of hybridising to the first target product; and the at least one probe that is capable of hybridising to the first tuned competitor product are labelled with different labels; and/or ii) the at least one probe that is capable of hybridising to the first tuned competitor product; and the at least one probe that is capable of hybridising to the second tuned competitor product are labelled with different labels.
18. The method according to any of claims 10-17 wherein each probe that is capable of hybridising to a target polynucleotide product that is associated with a positive predictive relationship of a particular state is labelled with the first label, and the corresponding probe that is capable of hybridising to the tuned competitor polynucleotide product is labelled with the second label; and/or each probe that is capable of hybridising to a target polynucleotide product that is associated with a negative predictive relationship of the particular state is labelled with the second label, and the corresponding probe that is capable of hybridising to the tuned competitor polynucleotide product is labelled with the first label.
19. The method according to any of claims 10-18 wherein, following amplification, the amount of product detected by the first probe and the amount of product detected by the second probe are determined.
20. The method according to claim 19 wherein the relative amounts of product detected by each probe are compared to a standard curve to determine the relative probability of one or more states.
21. The method according to any of claims 1-20 wherein the method comprises detecting the relative abundance of all amplification products by taking a single reading of all fluorophores used.
22. The method according to any one of claims 1-21 wherein the method is for the amplification of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or at least 100 target polynucleotides.
23. The method according to any one of claims 1-22 wherein the method comprises amplification of two tuned competitor polynucleotides, wherein the method comprises: amplification of a first tuned competitor polynucleotide with at least one primer that is capable of hybridising to the first target polynucleotide; and amplification of a second tuned competitor polynucleotide with at least one primer that is capable of hybridising to the second target polynucleotide.
24. The method of any of claims 1-23 wherein the polynucleotides are amplified using the polymerase chain reaction (PCR) or recombinase polymerase amplification (RPA).
25. A method of: converting the predictive relationship, decision surface or differential target oligonucleotide pattern, optionally a differential gene regulation signature provided by the relative abundance of at least two oligonucleotides or the presence or absence of at least two mutations, in a sample into a single value; translating the relative abundance of at least two oligonucleotides, for example the relative expression of at least two genes, or presence or absence of at least two mutations, in a sample into the relative probability of a particular state; detecting the relative abundance of at least three oligonucleotides, for example the relative expression of at least three genes, or presence or absence of at least three mutations, in a sample using only two fluorophore labelled probes; combining the relative abundance of at least two oligonucleotides, for example the relative expression of at least two genes, or presence or absence of at least two mutations, in a sample into a single value wherein the method comprises the method of amplifying at least a first and at least a second target polynucleotide in a sample according to any of claims 1-24.
26. The method of any of claims 1-25 wherein the method is for the diagnosis and/or prognosis of a disease or condition in a subject.
27. A method of diagnosis or prognosis of a disease or condition in a subject wherein the method comprises the method of any one of claims 1-25.
28. The method according to claim 27 wherein the subject is diagnosed as having a disease or condition or prognosis of developing a disease or condition when the relative amounts of the first label and the second label indicate diagnosis or prognosis of disease or condition.
29. The method of any of claims 26-28, wherein: a) the disease or condition is selected from: human tuberculosis, human tuberculosis with HIV co-infection, human tuberculosis without HIV co-infection, cancer (optionally prostate or breast cancer), sepsis, bloodstream candidiasis, bovine tuberculosis and bovine mastitis; optionally wherein the disease is tuberculosis, optionally wherein: the predictive relationship, decision surface or differential target oligonucleotide pattern, optionally a differential gene regulation signature, is identified from the white blood cells of the subject; and/or the degree of differential regulation of GBP6, ARG1 and TMCC1 contributes to an overall probability of having tuberculosis as compared to having some “other disease”, optionally wherein the gene expression signature is upregulation of GBP6 and downregulation of ARG1 and TMCC1, compared to the levels of these genes in patients not having tuberculosis.
30. The method of any of claims 26-29, wherein the disease is cancer, optionally prostate or breast cancer, optionally prostate cancer.
31. The method according to any of claims 26-30 wherein diagnosis of the disease or condition requires the assessment of the relative expression levels of at least two genes, optionally requires the assessment of the relative expression levels of at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or at least 100 genes.
32. A composition comprising one of, at least two of, or all of: a) at least one tuned competitor polynucleotide as defined in any one of claims 1-26; b) at least one primer as defined in any one of claims 1-26, optionally at least two primers as defined in any one of claims 1-26; c) one or more probe groups, wherein each probe group comprises at least one probe polynucleotide labelled with a first label and at least one probe polynucleotide labelled with a second label, optionally as defined in any of claims 10-26.
33. A tuned competitor polynucleotide as defined in any one of claims 1-26.
34. A kit for carrying out the method of any one of claims 1-31, wherein the kit comprises one or more of: a) One or more tuned competitor polynucleotides as defined by claims 1-26; b) One or more primers, optionally as defined in any one of claims 1-26; c) A first probe group as defined in any one of claims 10-26; d) Suitable buffers; e) Instructions for use, optionally wherein the kit comprises at least 2, 3, 4, 5, 6, 7, 8, 9 or at least 10 different tuned competitor polynucleotides and/or at least 2, 3, 4, 5, 6, 7, 8, 9 or at least 10 different probe groups.
35. The kit according to claim 34 wherein the kit comprises: a) i) One or more tuned competitor polynucleotides as defined by claims 1-26, optionally at least two tuned competitor polynucleotides as defined by claims 1-26; and ii) One or more primers, optionally as defined in any one of claims 1-26; or b) i) One or more tuned competitor polynucleotides as defined by claims 1-26, optionally at least two tuned competitor polynucleotides as defined by claims 1-26; and ii) A first probe group as defined in any one of claims 10-26; or c) i) One or more primers, optionally as defined in any one of claims 1-26; and ii) A first probe group as defined in any one of claims 10-26; or d) i) One or more tuned competitor polynucleotides as defined by claims 1-26, optionally at least two tuned competitor polynucleotides as defined by claims 1-26; ii) One or more primers, optionally as defined in any one of claims 1-26; and iii) A first probe group as defined in any one of claims 10-26.
36. A method of tuning a first competitor polynucleotide that competes for hybridisation of at least a first primer with a first target polynucleotide and which results in a discrimination in amplification of a first target product and a first tuned competitor product that translates a predictive relationship, decision surface, or differential target oligonucleotide pattern into a relative abundance of the first target polynucleotide amplification product and wherein: a) the first competitor polynucleotide is designed to have different amplification kinetics to the target polynucleotide; b) a different proportion of target polynucleotides are amplified compared to the proportion of tuned competitor polynucleotides that are amplified; c) amplification of the first target polynucleotide matches the predictive relationship of the target polynucleotide to a particular state; and/or d) the rate of amplification of the first target polynucleotide and optionally the rate of amplification of a second target polynucleotide matches a pre-defined weighting, the method comprising optimising the sequence of the tuned competitor polynucleotide and/or length of tuned competitor amplification product with respect to the sequence of the first target product and/or length of the first target product.
37. The method according to claim 36 wherein: a second primer is used in said amplification that is capable of hybridising to the first target polynucleotide so that the first target product is produced by primer extension from two primers, optionally produced by PCR; a third primer is used in said amplification that is capable of hybridising to the first tuned competitor polynucleotide so that the first tuned competitor product is produced by primer extension from two primers, optionally produced by PCR; optionally wherein the second and the third primer have the same sequence.
38. The method according to claim 36 or 37 wherein said method is a method for tuning at least two or more test tuned competitor polynucleotides that results in a discrimination in amplification of a first target product and a first tuned competitor product, and in a discrimination in amplification of a second target product and a second tuned competitor product that translates a predictive relationship, decision surface, or differential target oligonucleotide pattern into a relative abundance of the first target polynucleotide amplification product and second target polynucleotide amplification product, and optionally wherein: a) the first competitor polynucleotide is designed to have different amplification kinetics to the target polynucleotide; b) a different proportion of target polynucleotides are amplified compared to the proportion of tuned competitor polynucleotides that are amplified; c) amplification of the first target polynucleotide matches the predictive relationship of the target to a particular state; and/or d) the rate of amplification of the first target polynucleotide and optionally the rate of amplification of a second target polynucleotide matches a pre-defined weighting, and selecting the tuned competitor that results in the most preferred amplification of the first target polynucleotide.
39. A method of determining the transcriptional state of a system wherein the method comprises a method of amplification according to any of the preceding claims.
40. A method of determining whether a system is in state A or in state B wherein the method comprises a method of amplification according to any of the preceding claims.
41. A method of simultaneous competitive amplification of at least two target polynucleotides in a sample wherein the method comprises providing: a) a sample comprising polynucleotides; b) a first and a second tuned competitor polynucleotide; c) a first primer set, wherein the primer set comprises two primers capable of hybridising on opposite strands of a first target polynucleotide and the first competitive polynucleotide, so as to allow production of a first target amplification product and a first competitive amplification product; d) a second primer set, wherein the primer set comprises two primers capable of hybridising on opposite strands of a second target polynucleotide and the second competitive polynucleotide, so as to allow production of a second target product and a second competitive product; e) a first probe group, wherein the first probe group comprises a first labelled target probe capable of hybridising to the first target amplification product and a first labelled competitor probe capable of hybridising to the first competitive amplification product; f) a second probe group, wherein the second probe group comprises a second labelled target probe capable of hybridising to the second target amplification product and a second labelled competitor probe capable of hybridising to the second competitive amplification product; and wherein: i) the first labelled target probe and the second labelled target probe are labelled with the same first label, and the first labelled competitor probe and the second labelled competitor probe are labelled with the same second label; or ii) the first labelled target probe and the second labelled competitor probe are labelled with the same first label, and the first labelled competitor probe and the second labelled target probe are labelled with the same second label; and allowing the first and second primer sets to hybridise to the target and competitive polynucleotides.
42. The method according to claim 41 wherein the method comprises providing a further 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 primer sets and corresponding probe groups.
43. The method according to claim 42 wherein the method further comprises simultaneously detecting the amount of the first label and the second label following multiplexed amplification.
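Claims 1, 4, 18 and 41 describe targets and tuned competitors that share primers, amplify with different kinetics, and have their products pooled into two labels so that one reading summarises several targets. A toy deterministic simulation can make this concrete. It is a sketch under stated assumptions, not the disclosed method. Each target and its competitor are assumed to draw on one limiting primer pool that slows amplification as it is consumed. The class `CompetitorPair`, the functions `amplify_pair` and `two_label_readout`, and every efficiency and copy number are hypothetical. The panel only borrows the gene names and directions of regulation from claim 29.

```python
from dataclasses import dataclass


@dataclass
class CompetitorPair:
    """A target and its tuned competitor sharing one primer pair (hypothetical values)."""
    name: str
    positive: bool            # True if a higher target level favours the state of interest
    competitor_copies: float  # tuned competitor copies spiked into the reaction
    e_target: float           # per-cycle amplification efficiency of the target (0-1)
    e_competitor: float       # per-cycle efficiency of the tuned competitor (0-1)
    primer_copies: float      # limiting primer pool shared by target and competitor


def amplify_pair(target0, comp0, e_t, e_c, primer0, cycles=40):
    """Toy deterministic model: both templates draw on the same limiting primer pool."""
    t, c, p = float(target0), float(comp0), float(primer0)
    for _ in range(cycles):
        f = p / primer0                      # primer depletion slows both templates equally
        dt, dc = e_t * t * f, e_c * c * f    # new strands made this cycle
        used = dt + dc
        if used > p:                         # cannot make more strands than primers left
            scale = p / used
            dt, dc, used = dt * scale, dc * scale, p
        t, c, p = t + dt, c + dc, p - used
    return t, c


def two_label_readout(pairs, target_copies, cycles=40):
    """Pool products by label as in claim 18 and return the first-label fraction."""
    first = second = 0.0                     # e.g. FAM and HEX signal, in copies
    for pair in pairs:
        t, c = amplify_pair(target_copies.get(pair.name, 0.0), pair.competitor_copies,
                            pair.e_target, pair.e_competitor, pair.primer_copies, cycles)
        if pair.positive:                    # target -> first label, competitor -> second
            first, second = first + t, second + c
        else:                                # target -> second label, competitor -> first
            first, second = first + c, second + t
    return first / (first + second)


# Hypothetical panel loosely following claim 29 (GBP6 up, ARG1/TMCC1 down in TB);
# every number below is illustrative, not a measured or disclosed value.
panel = [
    CompetitorPair("GBP6", True, 1e4, 0.95, 0.80, 1e11),
    CompetitorPair("ARG1", False, 1e4, 0.95, 0.85, 1e11),
    CompetitorPair("TMCC1", False, 1e4, 0.95, 0.85, 1e11),
]
tb_like = {"GBP6": 1e5, "ARG1": 1e3, "TMCC1": 1e3}
other_like = {"GBP6": 1e3, "ARG1": 1e5, "TMCC1": 1e5}

tb_score = two_label_readout(panel, tb_like)
other_score = two_label_readout(panel, other_like)
print(f"first-label fraction: TB-like {tb_score:.2f}, other-disease-like {other_score:.2f}")
```

In this toy model the unequal target and competitor efficiencies play the role of the "tuning": equal starting copies give unequal products, as in claim 4. The label assignment of claim 18 lets an up-regulated target and a down-regulated target push the same first-label fraction in the same direction, so a single FAM/HEX reading summarises the whole panel.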
Description
FIGURE LEGENDS
[0738] Illustration of how regression enables tuning of the competitors to achieve a given target amplification rate r. A) A regression surface (far left) is generated, for example through Gaussian Process regression, that relates the two competitor design parameters, length (BP, in nucleotides) and GC content (in percent), to the observed amplification rate, along with the uncertainty in that relationship. Here, observed points (i.e., competitor sequences which have been designed and experimentally tested) are denoted by circles shaded by amplification rate. Filled contours represent the expected amplification rate at each point as determined by the regression algorithm, and dashed lines represent iso-uncertainty contours (the square root of the variance returned by the regressor), indicated as a multiple of the standard deviation of all r values observed thus far. From this regression surface, a metric such as Expected Improvement can be calculated to identify a new design likely to display the desired target amplification rate. Shown here are the Expected Improvement surfaces for different target rates, with lighter shades indicating a higher likelihood of achieving the goal. B) The regression surface and Expected Improvement surfaces, shown here for a target amplification rate of 1.0, change as new sequences are tested and added to the model. In this way, the practitioner can iteratively tune the competitor sequences to achieve the desired amplification rate: i) regression is performed on the data obtained thus far, ii) a new design is proposed which has a high likelihood of achieving the desired rate, iii) a new sequence based on this design is obtained and experimentally tested, iv) if the observed behavior is suboptimal, the regression surface is updated to incorporate these data, and v) yet another design is proposed.
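The tuning loop in [0738] can be sketched in plain Python (standard library only) under stated assumptions. The legend does not specify the kernel, its length scales, the noise level, the candidate grid or the exact acquisition formula, so all of these are hypothetical here, as are the observations. "Improvement" is taken to mean a reduction in the best observed distance |r - target|. That is one possible target-seeking variant of Expected Improvement, not necessarily the one used in this work. The names `GaussianProcess`, `ei_to_target` and `propose` are illustrative.

```python
import math


def rbf(a, b, scales, variance=1.0):
    """Squared-exponential kernel over (length_nt, gc_percent) design vectors."""
    d2 = sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales))
    return variance * math.exp(-0.5 * d2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution for L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Back substitution for L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GaussianProcess:
    """GP regression on r values centred on their mean, with a small diagonal noise term."""

    def __init__(self, X, y, scales, noise=1e-4):
        self.X, self.scales = X, scales
        self.offset = sum(y) / len(y)
        K = [[rbf(a, b, scales) + (noise if i == j else 0.0) for j, b in enumerate(X)]
             for i, a in enumerate(X)]
        self.L = cholesky(K)
        centred = [v - self.offset for v in y]
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, centred))

    def predict(self, x):
        """Posterior mean and standard deviation of r at design x."""
        k = [rbf(x, a, self.scales) for a in self.X]
        mean = self.offset + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)
        var = max(rbf(x, x, self.scales) - sum(vi * vi for vi in v), 1e-12)
        return mean, math.sqrt(var)


def ei_to_target(mean, sd, target, best_gap, bins=400):
    """E[max(0, best_gap - |R - target|)] for R ~ N(mean, sd^2), integrated binwise."""
    if best_gap <= 0:
        return 0.0

    def cdf(r):
        return 0.5 * (1.0 + math.erf((r - mean) / (sd * math.sqrt(2.0))))

    width = 2.0 * best_gap / bins
    total = 0.0
    for i in range(bins):
        lo = target - best_gap + i * width
        gain = best_gap - abs(lo + 0.5 * width - target)   # improvement at bin midpoint
        total += gain * (cdf(lo + width) - cdf(lo))         # times probability mass in bin
    return total


def propose(gp, observed_r, target, lengths, gcs):
    """Grid-search the (length, GC) design with the highest expected improvement."""
    best_gap = min(abs(r - target) for r in observed_r)
    scored = []
    for length in lengths:
        for gc in gcs:
            mean, sd = gp.predict((length, gc))
            scored.append((ei_to_target(mean, sd, target, best_gap), length, gc, mean, sd))
    return max(scored)


# Made-up observations ((length_nt, gc_percent), r), for illustration only.
observed = [((60, 40), 1.05), ((90, 50), 0.92), ((120, 45), 0.81),
            ((150, 60), 0.66), ((200, 55), 0.58)]
gp = GaussianProcess([x for x, _ in observed], [r for _, r in observed], scales=(40.0, 10.0))
ei, length, gc, mean, sd = propose(gp, [r for _, r in observed], target=0.75,
                                   lengths=range(50, 251, 10), gcs=range(30, 71, 5))
print(f"next design: {length} nt, {gc}% GC (r = {mean:.2f} +/- {sd:.2f}, EI = {ei:.3f})")
```

Steps i) to v) of the legend map onto refitting `GaussianProcess` after each experiment and calling `propose` again. The EI integral uses normal CDF differences per bin rather than density sampling, so it stays stable when the predicted standard deviation is very small. A practical implementation would more likely use an established GP library.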
[0740] Shown here are the real-time fluorescence traces for competitive amplification reactions between each synthetic amplicon shown in
[0742] List of examined sequences, design characteristics, and observed amplification parameters used in this work, any of which may be used as components of any CAN. Each sequence listed here was amplified using conventional PCR techniques, and the resulting fluorescence curves were analyzed as described in this work. The measured parameters F0_lg, K, r, and m are those that appear in equations 2 and 3, and tau and rho appear in equations 4 and 5.
SEQUENCE INFORMATION
[0743] SEQ ID NOs: 1-80 are as set out in
TABLE-US-00003 TABLE 1 SEQ ID NO Sequence 81 GCTATTGCTGGGATTTTGAGG 82 CGCCAAGTCCAGAACCATAG 83 GGAGAAAAGCCACATGAATGC 84 TGCAGAAACACTACCTGGTAC 85 GCAAGAACCAAGACCCTCAG 86 TCTCTGATCGGTCCCTTTACTC 87 AGTCAGTGTCAATATCCAAGCG 88 CATTTGCTTCAACAGTGACTACG 89 TCCCCATAATCCTTCACATCAC 90 CTGGAGAGAAACCATACCAATG 91 CCAAGTTCACCCAGTTTGTG 92 CAGTGCCTTGTCTGGAGAAT 93 TAATGTATGTCGGCGGTGTATC 94 TAGAGAGGTTACCAGAGCGTTGCC 95 AGCTGTGAGACGAAGGCTTCATGC 96 AGTTTCTCAAGCAGACCAGCCTTTCTC 97 CCAGAGTTCCCAGACGATTCCCA 98 AGTCAGTGTCAATATCCAAGCGCAAATAAAACACAAAACCCCAACTCAAACAAACCACACACCACCAAC CCACCCTCCCTCTACTCCTCTTTCTCTTCTTTTCTGGCAACGCTCTGGTAACCTCTCTAACTCTGATACAC CGCCGACATACATTA 99 AGTCAGTGTCAATATCCAAGCGCAAATAACCAACAAACAACCCAACCACCCCACCTCCCACTCTCCCTCC TTCTACTTCTCTTCTTGGCAACGCTCTGGTAACCTCTCTAATCACGATACACCGCCGACATACATTA 100 AGTCAGTGTCAATATCCAAGCGCGAAAAGAGTGAAGATAGTACGTGATTATGGGTCGGGTCCTGGGCT TTCTTACTTCTGCTATGATTTGTACTTTTACGCATGAAGCCTTCGTCTCACAGCTAGTTCGATACACCGCC GACATACATTA 101 AGTCAGTGTCAATATCCAAGCGTAAGGCCCACCAACATAACCACCCAAAAGATCAAGATTAGTGTGACG TACCTACCCTGAAATGACAGCCGCCTAGCATGAAGCCTTCGTCTCACAGCTGATGAGATACACCGCCGA CATACATTA 102 AGTCAGTGTCAATATCCAAGCGTAATCAATCTCTCCTACCATCTCCCCTCCTCCCACCTCACCCTCAACC CACAACACACAAACCCCAACCTAACATAAACTCACTGGCAACGCTCTGGTAACCTCTCTAAAACTGATAC ACCGCCGACATACATTA 103 CGCCAAGTCCAGAACCATAGAAATACAGAAAGAAGAGCCCCGGAATAAGACAAGCCAGATGAACACCA ATACGACACACTAAAACATCAAACACGGGCAACGCTCTGGTAACCTCTCTATACTTGATACACCGCCGA CATACATTA 104 CGCCAAGTCCAGAACCATAGAACAACACCAACAAACCACACACCCCACCACTCATCTCCCTTCTTCCTCT TTCTCTCCTATTTCCTTTACTTTTGCATGAAGCCTTCGTCTCACAGCTCTAAAGATACACCGCCGACATAC ATTA 105 CGCCAAGTCCAGAACCATAGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGAAGTCCCCATAATCCT TCACATCAC 106 CGCCAAGTCCAGAACCATAGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGAAGTCCCCATAATCCT TCACATCAC 107 CGCCAAGTCCAGAACCATAGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGAAGTCCCCATAATCCT TCACATCAC 108 CGCCAAGTCCAGAACCATAGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGAAGTCCCCATAATCCT TCACATCAC 109 CGCCAAGTCCAGAACCATAGATCTGTATCCCAAGTGTTCAGACCTTCATATTGCATGAAGCCTTCGTCTC 
ACAGCTATTGATAGTTCCGATTGCAACTTGACGTCTAGTCCCCATAATCCTTCACATCAC 110 CGCCAAGTCCAGAACCATAGCAACAGAAAAGAACACGAACAACCAAAACCCACAATAAACACACCTACA ACACCCAACCCCACCTCACCCCGCATGAAGCCTTCGTCTCACAGCTAACAAGATACACCGCCGACATAC ATTA 111 CGCCAAGTCCAGAACCATAGCCAAAACCAAACACCAACCACAACCTACCCCATCTCTCCCTCTCTTTTCT CCTTTTATTTCCTGCATGAAGCCTTCGTCTCACAGCTAAACAGATACACCGCCGACATACATTA 112 CGCCAAGTCCAGAACCATAGCCCGCCCCGCCCTGGCAACGCTCTGGTAACCTCTCTAGCCCGCCCCGC CTCCCCATAATCCTTCACATCAC 113 CGCCAAGTCCAGAACCATAGCCCGCCCCGCCCTGGCAACGCTCTGGTAACCTCTCTAGCCCGCCCCGC CTCCCCATAATCCTTCACATCAC 114 CGCCAAGTCCAGAACCATAGCCCGCCCCGCCCTGGCAACGCTCTGGTAACCTCTCTAGCCCGCCCCGC CTCCCCATAATCCTTCACATCAC 115 CGCCAAGTCCAGAACCATAGCCCGCCCCGCCCTGGCAACGCTCTGGTAACCTCTCTAGCCCGCCCCGC CTCCCCATAATCCTTCACATCAC 116 CGCCAAGTCCAGAACCATAGCGGGCGCGCCGCGCGCGACGCGCGTCCCGTCCGGCAACGCTCTGGTA ACCTCTCTAACCCGCACGCCGGCGACCGCGCGCCCGGTCCCCATAATCCTTCACATCAC 117 CGCCAAGTCCAGAACCATAGCGGGCGCGCCGCGCGCGACGCGCGTCCCGTCCGGCAACGCTCTGGTA ACCTCTCTAACCCGCACGCCGGCGACCGCGCGCCCGGTCCCCATAATCCTTCACATCAC 118 CGCCAAGTCCAGAACCATAGCGGGCGCGCCGCGCGCGACGCGCGTCCCGTCCGGCAACGCTCTGGTA ACCTCTCTAACCCGCACGCCGGCGACCGCGCGCCCGGTCCCCATAATCCTTCACATCAC 119 CGCCAAGTCCAGAACCATAGCGGGCGCGCCGCGCGCGACGCGCGTCCCGTCCGGCAACGCTCTGGTA ACCTCTCTAACCCGCACGCCGGCGACCGCGCGCCCGGTCCCCATAATCCTTCACATCAC 120 CGCCAAGTCCAGAACCATAGCGGTAATTACTGTTAGACTGGTGGGTATAAACTTCGTTATTTGGATTGG AATTGTTGAGCCCTACCTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACC TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGCGATG AAATTTCTGTAGAATCAACTAAATTTTCCGTTCAACGGATCCTTCCCCATAATCCTTCACATCAC 121 CGCCAAGTCCAGAACCATAGCGGTAATTACTGTTAGACTGGTGGGTATAAACTTCGTTATTTGGATTGG AATTGTTGAGCCCTACCTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACC TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGCGATG AAATTTCTGTAGAATCAACTAAATTTTCCGTTCAACGGATCCTTCCCCATAATCCTTCACATCAC 122 CGCCAAGTCCAGAACCATAGCGGTAATTACTGTTAGACTGGTGGGTATAAACTTCGTTATTTGGATTGG AATTGTTGAGCCCTACCTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACC 
TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGCGATG AAATTTCTGTAGAATCAACTAAATTTTCCGTTCAACGGATCCTTCCCCATAATCCTTCACATCAC 123 CGCCAAGTCCAGAACCATAGCGGTAATTACTGTTAGACTGGTGGGTATAAACTTCGTTATTTGGATTGG AATTGTTGAGCCCTACCTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACC TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGCGATG AAATTTCTGTAGAATCAACTAAATTTTCCGTTCAACGGATCCTTCCCCATAATCCTTCACATCAC 124 CGCCAAGTCCAGAACCATAGGCATGTCGGCTCGGTCTGTCTCTTTCCCCTCATCTCTCGGTACACTCCTT ACCTCGCCCACCCCGGCAACGCTCTGGTAACCTCTCTATCTGGCCTGTCACGAATCACTGTCCATCTAA CCTCGGAGCCCTGTCACGCGGCGGACTTGGAGATCCCCATAATCCTTCACATCAC 125 CGCCAAGTCCAGAACCATAGGCATGTCGGCTCGGTCTGTCTCTTTCCCCTCATCTCTCGGTACACTCCTT ACCTCGCCCACCCCGGCAACGCTCTGGTAACCTCTCTATCTGGCCTGTCACGAATCACTGTCCATCTAA CCTCGGAGCCCTGTCACGCGGCGGACTTGGAGATCCCCATAATCCTTCACATCAC 126 CGCCAAGTCCAGAACCATAGGCATGTCGGCTCGGTCTGTCTCTTTCCCCTCATCTCTCGGTACACTCCTT ACCTCGCCCACCCCGGCAACGCTCTGGTAACCTCTCTATCTGGCCTGTCACGAATCACTGTCCATCTAA CCTCGGAGCCCTGTCACGCGGCGGACTTGGAGATCCCCATAATCCTTCACATCAC 127 CGCCAAGTCCAGAACCATAGGCATGTCGGCTCGGTCTGTCTCTTTCCCCTCATCTCTCGGTACACTCCTT ACCTCGCCCACCCCGGCAACGCTCTGGTAACCTCTCTATCTGGCCTGTCACGAATCACTGTCCATCTAA CCTCGGAGCCCTGTCACGCGGCGGACTTGGAGATCCCCATAATCCTTCACATCAC 128 CGCCAAGTCCAGAACCATAGGGATTATTGGAGCTCCTTTCTCAAAGGGACAGCCACGAGGAGGGGTGG AAGAAGGCCCTACAGTATTGAGAAAGGCTGGTCTGCTTGAGAAACTTAAAGAACAAGAGTGTGATGTGA AGGATTATGGGGA 129 CGCCAAGTCCAGAACCATAGGGATTATTGGAGCTCCTTTCTCAAAGGGACAGCCACGAGGAGGGGTGG AAGAAGGCCCTACAGTATTGAGAAAGGCTGGTCTGCTTGAGAAACTTAAAGAACAAGAGTGTGATGTGA AGGATTATGGGGA 130 CGCCAAGTCCAGAACCATAGGGATTATTGGAGCTCCTTTCTCAAAGGGACAGCCACGAGGAGGGGTGG AAGAAGGCCCTACAGTATTGGCAACGCTCTGGTAACCTCTCTAAATTAAAGAACAAGAGTGTGATGTGA AGGATTATGGGGA 131 CGCCAAGTCCAGAACCATAGGGATTATTGGAGCTCCTTTCTCAAAGGGACAGCCACGAGGAGGGGTGG AAGAAGGCCCTACAGTATTGGCAACGCTCTGGTAACCTCTCTAAATTAAAGAACAAGAGTGTGATGTGA AGGATTATGGGGA 132 CGCCAAGTCCAGAACCATAGGGCAACGCTCTGGTAACCTCTCTAAATTAAGTGATGTGAAGGATTATGG GGA 133 
CGCCAAGTCCAGAACCATAGGGCAACGCTCTGGTAACCTCTCTAAATTAAGTGATGTGAAGGATTATGG GGA 134 CGCCAAGTCCAGAACCATAGTAATTATTATAGCTAATTTCTCAAATTTACAGAAACGAATAGAAGTTTAA GAATTAAATACAGTATTGGCAACGCTCTGGTAACCTCTCTAAATTAAATAACAATAATGTGATGTGAAGG ATTATGGGGA 135 CGCCAAGTCCAGAACCATAGTAATTATTATAGCTAATTTCTCAAATTTACAGAAACGAATAGAAGTTTAA GAATTAAATACAGTATTGGCAACGCTCTGGTAACCTCTCTAAATTAAATAACAATAATGTGATGTGAAGG ATTATGGGGA 136 CGCCAAGTCCAGAACCATAGTACAAAGCACGATCGAGAACAGGGCAGGTAGATTGAACGAGATGGGGA ATGATGGACGGATAAATGGGACTGGCAACGCTCTGGTAACCTCTCTAACATTGATACACCGCCGACATA CATTA 137 CGCCAAGTCCAGAACCATAGTAGCAATATTGAATTCTAGATTATACGAGGCAACGCTCTGGTAACCTCT CTATTATTTAAGCTATCATACTCTAGTGTTTTCCCCATAATCCTTCACATCAC 138 CGCCAAGTCCAGAACCATAGTAGCAATATTGAATTCTAGATTATACGAGGCAACGCTCTGGTAACCTCT CTATTATTTAAGCTATCATACTCTAGTGTTTTCCCCATAATCCTTCACATCAC 139 CGCCAAGTCCAGAACCATAGTAGCAATATTGAATTCTAGATTATACGAGGCAACGCTCTGGTAACCTCT CTATTATTTAAGCTATCATACTCTAGTGTTTTCCCCATAATCCTTCACATCAC 140 CGCCAAGTCCAGAACCATAGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACCTCTCT AAATTGATAGTTCCGATTGCAACTTGACGTTCCCCATAATCCTTCACATCAC 141 CGCCAAGTCCAGAACCATAGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACCTCTCT AAATTGATAGTTCCGATTGCAACTTGACGTTCCCCATAATCCTTCACATCAC 142 CGCCAAGTCCAGAACCATAGTCATAGTTCTATATACATTGTCATGACACGAATTGGCTGAAAACGGTTGA TAGAAGATATTGACTATATTCTCTTCCGCTGTATCCCGTTTCTTTTGGAAATTGACCGTATTATGGTCACC ATCAGCCTAAGTGATCTCTGGACCGTCGAGAGACCCCATTGACTTGGTTCTTCGGTTTGATGCACTCAT GTAAAATGTAGTCTCAATCAATACCATCCATTTCTAGCATACGGGTGAGCATGAAGCCTTCGTCTCACAG CTCCGGTACAGGTAATCGAGAGAACACTAAAACAGTCCGACATGAGATTCATTAAAACCTATTTTCACCA ATCGGTAGAACGGTTATGCGCAAAATATTTTCGGGGTCCACAGTGCACCTATGTAATCTGTAACATGAA GTTGTACGAAAATAGAGAACCCACCCAGCTTATCTAGGAAATTGATCTCTTCGATTTAAGGATGTGTCGA CACGTATCATGCCAAGTGATCAGAGGCGTAATCCCCATAATCCTTCACATCAC 143 CGCCAAGTCCAGAACCATAGTCTGTATCCCAAGTGTTCAGAGCTTCATATTGGCAACGCTCTGGTAACC TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGGTGATGTGAAGGATTATGGGGA 144 CGCCAAGTCCAGAACCATAGTCTGTATCCCAAGTGTTCAGAGCTTCATATTGGCAACGCTCTGGTAACC 
TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGGTGATGTGAAGGATTATGGGGA 145 CGCCAAGTCCAGAACCATAGTGAATATTTTATTCCCTAATTTTATTATTATGTTCTAAAAGGTATTTAAAT ACTTTTCATTAATGGCAACGCTCTGGTAACCTCTCTATCATAAATATTTTAAATACTAGAATCTTATTTTAT TCTTTAGAATAGTAGAAATTTAATTAAATTCCCCATAATCCTTCACATCAC 146 CGCCAAGTCCAGAACCATAGTGAATATTTTATTCCCTAATTTTATTATTATGTTCTAAAAGGTATTTAAAT ACTTTTCATTAATGGCAACGCTCTGGTAACCTCTCTATCATAAATATTTTAAATACTAGAATCTTATTTTAT TCTTTAGAATAGTAGAAATTTAATTAAATTCCCCATAATCCTTCACATCAC 147 CGCCAAGTCCAGAACCATAGTGAATATTTTATTCCCTAATTTTATTATTATGTTCTAAAAGGTATTTAAAT ACTTTTCATTAATGGCAACGCTCTGGTAACCTCTCTATCATAAATATTTTAAATACTAGAATCTTATTTTAT TCTTTAGAATAGTAGAAATTTAATTAAATTCCCCATAATCCTTCACATCAC 148 CGCCAAGTCCAGAACCATAGTGAATATTTTATTCCCTAATTTTATTATTATGTTCTAAAAGGTATTTAAAT ACTTTTCATTAATGGCAACGCTCTGGTAACCTCTCTATCATAAATATTTTAAATACTAGAATCTTATTTTAT TCTTTAGAATAGTAGAAATTTAATTAAATTCCCCATAATCCTTCACATCAC 149 CGCCAAGTCCAGAACCATAGTGCCGAGGGTCCAGGTCGAGACTCCATCCCGAGGCGTGTGTCCCCATG GCCGTCCTCCAGGCTAGTACTGTGCCCCGTCGCCGTCGCACAAGGCCGGTCGATCGTGGTGGCTGTCA GGCGGGGTGGCAACGCTCTGGTAACCTCTCTACGGCGTAGTAGTTCGTGCCCCTCCCCTTGCGACCTC CCGCTACCACCCGTCACTCCCCGGTAAGAGGCTCTCACGGACGGCAGAGTCGGTCGCGCGCTCCGGAT GTGGTCCCCTCCCAGTCCTCTCCCCATAATCCTTCACATCAC 150 CGCCAAGTCCAGAACCATAGTGCCGAGGGTCCAGGTCGAGACTCCATCCCGAGGCGTGTGTCCCCATG GCCGTCCTCCAGGCTAGTACTGTGCCCCGTCGCCGTCGCACAAGGCCGGTCGATCGTGGTGGCTGTCA GGCGGGGTGGCAACGCTCTGGTAACCTCTCTACGGCGTAGTAGTTCGTGCCCCTCCCCTTGCGACCTC CCGCTACCACCCGTCACTCCCCGGTAAGAGGCTCTCACGGACGGCAGAGTCGGTCGCGCGCTCCGGAT GTGGTCCCCTCCCAGTCCTCTCCCCATAATCCTTCACATCAC 151 CGCCAAGTCCAGAACCATAGTGCCGAGGGTCCAGGTCGAGACTCCATCCCGAGGCGTGTGTCCCCATG GCCGTCCTCCAGGCTAGTACTGTGCCCCGTCGCCGTCGCACAAGGCCGGTCGATCGTGGTGGCTGTCA GGCGGGGTGGCAACGCTCTGGTAACCTCTCTACGGCGTAGTAGTTCGTGCCCCTCCCCTTGCGACCTC CCGCTACCACCCGTCACTCCCCGGTAAGAGGCTCTCACGGACGGCAGAGTCGGTCGCGCGCTCCGGAT GTGGTCCCCTCCCAGTCCTCTCCCCATAATCCTTCACATCAC 152 CGCCAAGTCCAGAACCATAGTGCCGAGGGTCCAGGTCGAGACTCCATCCCGAGGCGTGTGTCCCCATG GCCGTCCTCCAGGCTAGTACTGTGCCCCGTCGCCGTCGCACAAGGCCGGTCGATCGTGGTGGCTGTCA 
GGCGGGGTGGCAACGCTCTGGTAACCTCTCTACGGCGTAGTAGTTCGTGCCCCTCCCCTTGCGACCTC CCGCTACCACCCGTCACTCCCCGGTAAGAGGCTCTCACGGACGGCAGAGTCGGTCGCGCGCTCCGGAT GTGGTCCCCTCCCAGTCCTCTCCCCATAATCCTTCACATCAC 153 CGCCAAGTCCAGAACCATAGTGTAATATTAACAAGTAATAAAGAAATATATAGCATGAAGCCTTCGTCTC ACAGCTTTTATTCAATTTAATGATTACCTTTATTATCTTCCCCATAATCCTTCACATCAC 154 GCAAGAACCAAGACCCTCAGAAACACCAACCCAACAACACAAACCCGCAACCTAAAACCACCACAACTC CCTCCTCGCATGAAGCCTTCGTCTCACAGCTCCCCACCCCTTAATTTCCGCACCTATT 155 GCAAGAACCAAGACCCTCAGACACGACTCCCCGCCACAACCACACAATCCACTACCTGCCCACATCCTA ACCCTACCCTTCCTGCATGAAGCCTTCGTCTCACAGCTCTAGTCCCCTTAATTTCCGCACCTATT 156 GCAAGAACCAAGACCCTCAGACCAAACGCAACAACACAGACACCACAACTACCACTCACCCCAACTCCA ACCGCATGAAGCCTTCGTCTCACAGCTCTCCGCCCCTTAATTTCCGCACCTATT 157 GCAAGAACCAAGACCCTCAGACCAACAACCGCCAACTACAACGACACCAGAGCACACCCATATACATCA CCCCTTCCCCTATTTCTCTTCCGCTCCTTTCTTTCCGTCTGTTTCCCGCTGCTTTTCTGTCTCGCCCTAAT CCACCAAACCCGCCCACTCCAATATCCTACCTTCTTCACCTTGCCTGTACCGATGACTTTGCCCGAATAA TCTACTCTCCTAACCTGCACCCGACTCAACTCCTCATCTATCCCAACGCCGTCACTTCCTCCATACCTCTA CCATCCAACCCCACGACCCACCTACACAGATACCCAAATCCGCATGAAGCCTTCGTCTCACAGCTTATGT CAGTGCCTTGTCTGGAGAAT 158 GCAAGAACCAAGACCCTCAGACCGCCGCCCACCCCTCCCCGCATGAAGCCTTCGTCTCACAGCTCGCG TCAGTGCCTTGTCTGGAGAAT 159 GCAAGAACCAAGACCCTCAGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGAAGATTCTCCAGACAA GGCACTG 160 GCAAGAACCAAGACCCTCAGATCCATTACCCAGATTGAGCTATTTACGACGACAACACATCCACATTCTA CCTGACCCACTACCGCGCATGAAGCCTTCGTCTCACAGCTTCGATCCCCTTAATTTCCGCACCTATT 161 GCAAGAACCAAGACCCTCAGATCTGTATCCCAAGTGTTCAGACCTTCATATTGCATGAAGCCTTCGTCTC ACAGCTATTGATAGTTCCGATTGCAACTTGACGTCTAGCAGTGCCTTGTCTGGAGAAT 162 GCAAGAACCAAGACCCTCAGATCTGTATCCCAAGTGTTCAGACCTTCATATTGCATGAAGCCTTCGTCTC ACAGCTATTGATAGTTCCGATTGCAACTTGACGTCTAGCAGTGCCTTGTCTGGAGAAT 163 GCAAGAACCAAGACCCTCAGATCTGTATCCCAAGTGTTCAGACCTTCATATTGCATGAAGCCTTCGTCTC ACAGCTATTGATAGTTCCGATTGCAACTTGACGTCTAGCAGTGCCTTGTCTGGAGAAT 164 GCAAGAACCAAGACCCTCAGCACCACCATCCCCACCTCCCACTCTACTCCACGCCTCAATTCCGACTAC CACTACGCCATTTCCCCTCTTCCATTCACTGTCCTTTCTCTCCTTATCCTGCTCCTCTGTCTCTTTTATTCT 
TTCCTTCCCTTTATCTCCCGTTACTTGCACTTTACCTATCCGAACCCACACATACCCCTGCCAAAACCCCA ACCTAAAACGAACACCCAAACAAAGCCACAATACAACACACCAACATAACAACCCGCACTCCCTAATATC ACCTTGCCCTCCTACTAACCTCATCATCTACCCGTCCGCTCTAACACTAATCACACTTACATCTGCCCGC CCCTTACCCTAGAAAACTCGCATGAAGCCTTCGTCTCACAGCTATTTTCAGTGCCTTGTCTGGAGAAT 165 GCAAGAACCAAGACCCTCAGCCACCCTAAATCTCCGCACAGGCATTCACGACGATATACGGAAACAGCA CAAGTGGCACGCGGGAAGGTCATCAGTTACAGTCATGGTCAGGGTTAGTAGGTTGGGTAGGAGGGAAA TTGGACAGATTAACGAGGGCAGATCAGAGAAACGTGCATACTCTACTCCACACAACTTCCGACGCTTAG ATAACCACGCAACCCCGAATTTACTACAATAACTCTCCTTTCACCTAGCCATTCCTCCCCTATTCAGTCCT AGTCGCTAAAGTTCCCATCCCCGCATAGTTGAGTGTTGTTGCATGAAGCCTTCGTCTCACAGCTCGCGT CAGTGCCTTGTCTGGAGAAT 166 GCAAGAACCAAGACCCTCAGCCCAAACACAACACCACCAACCACACCCGCCCTCCCACTTCCCTTCTCC TTTCCCCTATCTTACCCTACGCATGAAGCCTTCGTCTCACAGCTCCCCACCCCTTAATTTCCGCACCTATT 167 GCAAGAACCAAGACCCTCAGCCCATACCCCACCCCTCCACTCCTCCTTCCTTTATTTTCGTTTCTCTGTTT TGTATTTGTTGCATGAAGCCTTCGTCTCACAGCTCTTTTCCCCTTAATTTCCGCACCTATT 168 GCAAGAACCAAGACCCTCAGCCCATCCCAGAAACAAGTTACGCGACAGTGAGGAGAGAGCCAAGTATA AGTAAGCAGATCCGTCCATTCAAGCGTCAGAGTCCCGTGCCATTGTTCCCTTCCTATACCCTTGCCACTA CTTTCTCGCTCCCATATTTCTACAGGTGCATCGTACTTCTTTATGCCGCGTTACTGTTCACTCTTTTCCTT AGGCTAGATCGGAACTCGCAACAAAACTAATCACAAACGGGCAAAGGGGATACGGACACTGGAATAAG ACTACACGCCGACTTGATGAAAGCTACTCCACACGACACAACCTCCTAAACCGACCACCGCCACCAACA CACCATCACCCAACCACTCAAAATCCCTACCCGTACCTGAGAGTAAAACCAGCGCCAAATCGACCTCAA CCCACCTAACACCCCTATCCATACCGTAAAGCCCTCCGCATGAAGCCTTCGTCTCACAGCTCGGGTCAG TGCCTTGTCTGGAGAAT 169 GCAAGAACCAAGACCCTCAGCCCCGACACAAAATAAAACCACACCAAACACCCAACAACCCCACATCCC ACCACCTCCCTACCCACTACCACTCCTCTCTAAACCCGCATGAAGCCTTCGTCTCACAGCTCTTTTCCCC TTAATTTCCGCACCTATT 170 GCAAGAACCAAGACCCTCAGCCCCGCCCGTAACACTCAGACCTAACTAAACCGAGCACCACACAACCC GCATGAAGCCTTCGTCTCACAGCTCCCATCCCCTTAATTTCCGCACCTATT 171 GCAAGAACCAAGACCCTCAGCCCGCCCCGCCCTGGCAACGCTCTGGTAACCTCTCTAGCCCGCCCCGC CATTCTCCAGACAAGGCACTG 172 GCAAGAACCAAGACCCTCAGCGGGCGCGCCGCGCGCGACGCGCGTCCCGTCCGGCAACGCTCTGGTA ACCTCTCTAACCCGCACGCCGGCGACCGCGCGCCCGGATTCTCCAGACAAGGCACTG 173 
GCAAGAACCAAGACCCTCAGCGGTAATTACTGTTAGACTGGTGGGTATAAACTTCGTTATTTGGATTGG AATTGTTGAGCCCTACCTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACC TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGCGATG AAATTTCTGTAGAATCAACTAAATTTTCCGTTCAACGGATCCTATTCTCCAGACAAGGCACTG 174 GCAAGAACCAAGACCCTCAGGCATGAAGCCTTCGTCTCACAGCTCGTGATCAGTGCCTTGTCTGGAGAA T 175 GCAAGAACCAAGACCCTCAGGCATGAAGCCTTCGTCTCACAGCTCGTGATCAGTGCCTTGTCTGGAGAA T 176 GCAAGAACCAAGACCCTCAGGCATGTCGGCTCGGTCTGTCTCTTTCCCCTCATCTCTCGGTACACTCCTT ACCTCGCCCACCCCGGCAACGCTCTGGTAACCTCTCTATCTGGCCTGTCACGAATCACTGTCCATCTAA CCTCGGAGCCCTGTCACGCGGCGGACTTGGAGAATTCTCCAGACAAGGCACTG 177 GCAAGAACCAAGACCCTCAGGGAGGGAATCACAGTCACGCATGAAGCCTTCGTCTCACAGCTCGTGAC TTATGTAGAGGCCATCAACAGTGGAGCAGTGCCTTGTCTGGAGAAT 178 GCAAGAACCAAGACCCTCAGGGAGGGAATCACAGTCACGCATGAAGCCTTCGTCTCACAGCTCGTGAC TTATGTAGAGGCCATCAACAGTGGAGCAGTGCCTTGTCTGGAGAAT 179 GCAAGAACCAAGACCCTCAGGGAGGGAATCACAGTCACTGGGAATCGTCTGGGAACTCTGGCAGTGAC TTATGTAGAGGCCATCAACAGTGGAGCAGTGCCTTGTCTGGAGAAT 180 GCAAGAACCAAGACCCTCAGGGAGGGAATCACAGTCACTGGGAATCGTCTGGGAACTCTGGCAGTGAC TTATGTAGAGGCCATCAACAGTGGAGCAGTGCCTTGTCTGGAGAAT 181 GCAAGAACCAAGACCCTCAGGGAGGGAATCACAGTCACTGGGAATCGTCTGGGAACTCTGGCAGTGAC TTATGTAGAGGCCATCAACAGTGGAGCAGTGCCTTGTCTGGAGAAT 182 GCAAGAACCAAGACCCTCAGTACCGTTCGCATCGCCACCTTCACCTCCACTCCCTCCTTCCACACCCGT CTGCACCCCTCGAAGTCTCTGCGCTACTCTATCCCGGTCTGTGCGTTTTACCTCGTCCTCCCCTATGTGT TCCTGATCCCCGCGCATGAAGCCTTCGTCTCACAGCTCATTACAGTGCCTTGTCTGGAGAAT 183 GCAAGAACCAAGACCCTCAGTAGCAATATTGAATTCTAGATTATACGAGGCAACGCTCTGGTAACCTCT CTATTATTTAAGCTATCATACTCTAGTGTTTATTCTCCAGACAAGGCACTG 184 GCAAGAACCAAGACCCTCAGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACCTCTCT AAATTGATAGTTCCGATTGCAACTTGACGTATTCTCCAGACAAGGCACTG 185 GCAAGAACCAAGACCCTCAGTGAATATTTTATTCCCTAATTTTATTATTATGTTCTAAAAGGTATTTAAAT ACTTTTCATTAATGGCAACGCTCTGGTAACCTCTCTATCATAAATATTTTAAATACTAGAATCTTATTTTAT TCTTTAGAATAGTAGAAATTTAATTAAATATTCTCCAGACAAGGCACTG 186 GCAAGAACCAAGACCCTCAGTGCCGAGGGTCCAGGTCGAGACTCCATCCCGAGGCGTGTGTCCCCATG 
GCCGTCCTCCAGGCTAGTACTGTGCCCCGTCGCCGTCGCACAAGGCCGGTCGATCGTGGTGGCTGTCA GGCGGGGTGGCAACGCTCTGGTAACCTCTCTACGGCGTAGTAGTTCGTGCCCCTCCCCTTGCGACCTC CCGCTACCACCCGTCACTCCCCGGTAAGAGGCTCTCACGGACGGCAGAGTCGGTCGCGCGCTCCGGAT GTGGTCCCCTCCCAGTCCTCATTCTCCAGACAAGGCACTG 187 GCTATTGCTGGGATTTTGAGGAACGTAGCAATATTGAATTCTAGATTATACGAGGCAACGCTCTGGTAA CCTCTCTATTATTTAAGCTATCATACTCTAGTGTTTCGTTCGTAGTCACTGTTGAAGCAAATG 188 GCTATTGCTGGGATTTTGAGGAACGTAGCAATATTGAATTCTAGATTATACGAGGCAACGCTCTGGTAA CCTCTCTATTATTTAAGCTATCATACTCTAGTGTTTCGTTCGTAGTCACTGTTGAAGCAAATG 189 GCTATTGCTGGGATTTTGAGGAAGATCTGTTCATGCGTTCGTTATTTGGATTGGAATTGTTGAGCCCTAC CTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACCTCTCTAAATTGATAGTT CCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGCGATGAAATTTCGTCCGAACA AGTTTCAACTTCGTAGTCACTGTTGAAGCAAATG 190 GCTATTGCTGGGATTTTGAGGAAGATCTGTTCATGCGTTCGTTATTTGGATTGGAATTGTTGAGCCCTAC CTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACCTCTCTAAATTGATAGTT CCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGCGATGAAATTTCGTCCGAACA AGTTTCAACTTCGTAGTCACTGTTGAAGCAAATG 191 GCTATTGCTGGGATTTTGAGGACCACGGTAATTACTGTTAGACTGGTGGGTATAAACTTCGTTATTTGGA TTGGAATTGTTGAGCCCTACCTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGT AACCTCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGC GATGAAATTTCTGTAGAATCAACTAAATTTTCCGTTCAACGGATCCTCTGCCGTAGTCACTGTTGAAGCA AATG 192 GCTATTGCTGGGATTTTGAGGACCACGGTAATTACTGTTAGACTGGTGGGTATAAACTTCGTTATTTGGA TTGGAATTGTTGAGCCCTACCTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGT AACCTCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGC GATGAAATTTCTGTAGAATCAACTAAATTTTCCGTTCAACGGATCCTCTGCCGTAGTCACTGTTGAAGCA AATG 193 GCTATTGCTGGGATTTTGAGGAGCTTTTCCTAAAAGGATTGTACACCTTAGAAGTGCTTAAGGAAGAGT GATGAAGATAGGCATGAAGCCTTCGTCTCACAGCTGCATGCGTAGTCACTGTTGAAGCAAATG 194 GCTATTGCTGGGATTTTGAGGAGCTTTTCCTAAAAGGATTGTACACCTTAGAAGTGCTTAAGGAAGAGT GATGAAGATAGGCATGAAGCCTTCGTCTCACAGCTGCATGCGTAGTCACTGTTGAAGCAAATG 195 GCTATTGCTGGGATTTTGAGGAGGCAACGCTCTGGTAACCTCTCTAATATGCGTAGTCACTGTTGAAGC AAATG 196 
GCTATTGCTGGGATTTTGAGGAGGCAACGCTCTGGTAACCTCTCTAATATGCGTAGTCACTGTTGAAGC AAATG 197 GCTATTGCTGGGATTTTGAGGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGAAGCATTTGCTTCAAC AGTGACTACG 198 GCTATTGCTGGGATTTTGAGGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGAAGCATTTGCTTCAAC AGTGACTACG 199 GCTATTGCTGGGATTTTGAGGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGATTCCAGCGTAGTCA CTGTTGAAGCAAATG 200 GCTATTGCTGGGATTTTGAGGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGATTCCAGCGTAGTCA CTGTTGAAGCAAATG 201 GCTATTGCTGGGATTTTGAGGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGTTCCGATACGTGCAA CTTGTCTCGTAGTCACTGTTGAAGCAAATG 202 GCTATTGCTGGGATTTTGAGGAGGCAACGCTCTGGTAACCTCTCTAATTGATAGTTCCGATACGTGCAA CTTGTCTCGTAGTCACTGTTGAAGCAAATG 203 GCTATTGCTGGGATTTTGAGGATATGTTCCAGTAGACGCGCAACAGGGCTTCTACGGTTCGCCGGTTAT TGACTTACTGCACGTTGGGGAGCGGCTTGAATTGAGTCCCAGGCCCGAGTCCGTACCGATGCTCTTAG GCGAGCCACGTTTCTGGACCCACCCCGTGCTACCTATGGCCGTTCTTCGTATCTGTCTCTTAGCGCGCC TCAACTATGGTGTCCTCGCCTAGTAGAGCTCCGTAGACGTCCACCCCTTCGCAGGCAACGCTCTGGTAA CCTCTCTACCCGGGAAGGGATTACAGGCTCGATTCCAGTCGCAGATGACACCGCTGTTCTACTCGGCAC CTGACTACCTACCAGATGGGCCCGCAACACGTCGTGCACCCGCGGAACCGGTTAAAGAACGTTAGTTC CCTGGCCTTGGAGCCTAAACAAACTTACTGAGCCGCACCTTCCGAGTCTCGCTGTACTGTGATCCCCGC TTCCCTGGTACTAGAGGGCAAATCCGACTGGCTATACCGACGTAGTCACTGTTGAAGCAAATG 204 GCTATTGCTGGGATTTTGAGGATATGTTCCAGTAGACGCGCAACAGGGCTTCTACGGTTCGCCGGTTAT TGACTTACTGCACGTTGGGGAGCGGCTTGAATTGAGTCCCAGGCCCGAGTCCGTACCGATGCTCTTAG GCGAGCCACGTTTCTGGACCCACCCCGTGCTACCTATGGCCGTTCTTCGTATCTGTCTCTTAGCGCGCC TCAACTATGGTGTCCTCGCCTAGTAGAGCTCCGTAGACGTCCACCCCTTCGCAGGCAACGCTCTGGTAA CCTCTCTACCCGGGAAGGGATTACAGGCTCGATTCCAGTCGCAGATGACACCGCTGTTCTACTCGGCAC CTGACTACCTACCAGATGGGCCCGCAACACGTCGTGCACCCGCGGAACCGGTTAAAGAACGTTAGTTC CCTGGCCTTGGAGCCTAAACAAACTTACTGAGCCGCACCTTCCGAGTCTCGCTGTACTGTGATCCCCGC TTCCCTGGTACTAGAGGGCAAATCCGACTGGCTATACCGACGTAGTCACTGTTGAAGCAAATG 205 GCTATTGCTGGGATTTTGAGGATCTGTATCCCAAGTGTTCAGACCTTCATATTGCATGAAGCCTTCGTCT CACAGCTATTGATAGTTCCGATTGCAACTTGACGTCTAGCATTTGCTTCAACAGTGACTACG 206 GCTATTGCTGGGATTTTGAGGCCCGCCCCGCCCTGGCAACGCTCTGGTAACCTCTCTAGCCCGCCCCG CCCATTTGCTTCAACAGTGACTACG 207 
GCTATTGCTGGGATTTTGAGGCCCGCCCCGCCCTGGCAACGCTCTGGTAACCTCTCTAGCCCGCCCCG CCCATTTGCTTCAACAGTGACTACG 208 GCTATTGCTGGGATTTTGAGGCCCGCCCCGCCCTGGCAACGCTCTGGTAACCTCTCTAGCCCGCCCCGT CCCGTAGTCACTGTTGAAGCAAATG 209 GCTATTGCTGGGATTTTGAGGCCCGCCCCGCCCTGGCAACGCTCTGGTAACCTCTCTAGCCCGCCCCGT CCCGTAGTCACTGTTGAAGCAAATG 210 GCTATTGCTGGGATTTTGAGGCGGGCGCGCCGCGCGCGACGCGCGTCCCGTCCGGCAACGCTCTGGT AACCTCTCTAACCCGCACGCCGGCGACCGCGCGCCCGGCATTTGCTTCAACAGTGACTACG 211 GCTATTGCTGGGATTTTGAGGCGGGCGCGCCGCGCGCGACGCGCGTCCCGTCCGGCAACGCTCTGGT AACCTCTCTAACCCGCACGCCGGCGACCGCGCGCCCGGCATTTGCTTCAACAGTGACTACG 212 GCTATTGCTGGGATTTTGAGGCGGGCGCGCCGCGCGCGACGCGCGTCCCGTCCGGCAACGCTCTGGT AACCTCTCTAACCCGCACGCCGGCGCCGCGCGCCCGAGCGCACGTAGTCACTGTTGAAGCAAATG 213 GCTATTGCTGGGATTTTGAGGCGGGCGCGCCGCGCGCGACGCGCGTCCCGTCCGGCAACGCTCTGGT AACCTCTCTAACCCGCACGCCGGCGCCGCGCGCCCGAGCGCACGTAGTCACTGTTGAAGCAAATG 214 GCTATTGCTGGGATTTTGAGGCGGTAATTACTGTTAGACTGGTGGGTATAAACTTCGTTATTTGGATTGG AATTGTTGAGCCCTACCTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACC TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGCGATG AAATTTCTGTAGAATCAACTAAATTTTCCGTTCAACGGATCCTCATTTGCTTCAACAGTGACTACG 215 GCTATTGCTGGGATTTTGAGGCGGTAATTACTGTTAGACTGGTGGGTATAAACTTCGTTATTTGGATTGG AATTGTTGAGCCCTACCTGACTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACC TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCCCGTATAAATAGCCGGTCTAAACAGCGATG AAATTTCTGTAGAATCAACTAAATTTTCCGTTCAACGGATCCTCATTTGCTTCAACAGTGACTACG 216 GCTATTGCTGGGATTTTGAGGCGTGTTGTTTCGATTTAACTTGTCCATGTGTCTCTGCTGCTTTCTTCCTT TCCACTTCACTACTCTTATTCGGGCAACGCTCTGGTAACCTCTCTAATTCTCGTAGTCACTGTTGAAGCA AATG 217 GCTATTGCTGGGATTTTGAGGCGTTATTTGGATTGGAATTGTTGAGCCCTACCTGACTCTGTATCCCAAG TGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACCTCTCTAAATTGATAGTTCCGATTGCAACTTGACG TCTAGCCGATAAATAGCCGGTCTAAACAGCGATGAAATTTCCGTAGTCACTGTTGAAGCAAATG 218 GCTATTGCTGGGATTTTGAGGCGTTATTTGGATTGGAATTGTTGAGCCCTACCTGACTCTGTATCCCAAG TGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACCTCTCTAAATTGATAGTTCCGATTGCAACTTGACG TCTAGCCGATAAATAGCCGGTCTAAACAGCGATGAAATTTCCGTAGTCACTGTTGAAGCAAATG 219 
GCTATTGCTGGGATTTTGAGGCTGGAATTTGTGCTCTTAGGTCGTGGGGCTGCTGTTAAGTCGCTCGCT ATCTAAAGTTCAGTCAAGGATGGCAACGCTCTGGTAACCTCTCTAGAAATCGTAGTCACTGTTGAAGCA AATG 220 GCTATTGCTGGGATTTTGAGGGACCTCGACCGCTGGCAACGCTCTGGTAACCTCTCTATCCTCCCCTCT CCCGTAGTCACTGTTGAAGCAAATG 221 GCTATTGCTGGGATTTTGAGGGACCTCGACCGCTGGCAACGCTCTGGTAACCTCTCTATCCTCCCCTCT CCCGTAGTCACTGTTGAAGCAAATG 222 GCTATTGCTGGGATTTTGAGGGCATGTCGGCTCGGTCTGTCTCTTTCCCCTCATCTCTCGGTACACTCCT TACCTCGCCCACCCCGGCAACGCTCTGGTAACCTCTCTATCTGGCCTGTCACGAATCACTGTCCATCTAA CCTCGGAGCCCTGTCACGCGGCGGACTTGGAGACATTTGCTTCAACAGTGACTACG 223 GCTATTGCTGGGATTTTGAGGGCATGTCGGCTCGGTCTGTCTCTTTCCCCTCATCTCTCGGTACACTCCT TACCTCGCCCACCCCGGCAACGCTCTGGTAACCTCTCTATCTGGCCTGTCACGAATCACTGTCCATCTAA CCTCGGAGCCCTGTCACGCGGCGGACTTGGAGACATTTGCTTCAACAGTGACTACG 224 GCTATTGCTGGGATTTTGAGGGCCCGCGCGCCGGCGGCGGCGCGGTGGCCGGCGGCAACGCTCTGGT AACCTCTCTAGGCGGCGGCGCCACCGCGCGGGGGGGCGGGCCCGTAGTCACTGTTGAAGCAAATG 225 GCTATTGCTGGGATTTTGAGGGCCCGCGCGCCGGCGGCGGCGCGGTGGCCGGCGGCAACGCTCTGGT AACCTCTCTAGGCGGCGGCGCCACCGCGCGGGCGGGCGGGCCCGTAGTCACTGTTGAAGCAAATG 226 GCTATTGCTGGGATTTTGAGGGCCGTGCCGAGGGTCCAGGTCGAGACTCCATCCCGAGGCGTGTGTCC CCATGGCCGTCCTCCAGGCTAGTACTGTGCCCCGTCGCCGTCGCACAAGGCCGGTCGATCGTGGTGGC TGTCAGGCGGGGTGGCAACGCTCTGGTAACCTCTCTACGGCGTAGTAGTTCGTGCCCCTCCCCTTGCG ACCTCCCGCTACCACCCGTCACTCCCCGGTAAGAGGCTCTCACGGACGGCAGAGTCGGTCGCGCGCTC CGGATGTGGTCCCCTCCCAGTCCTCCCCGCGTAGTCACTGTTGAAGCAAATG 227 GCTATTGCTGGGATTTTGAGGGCCGTGCCGAGGGTCCAGGTCGAGACTCCATCCCGAGGCGTGTGTCC CCATGGCCGTCCTCCAGGCTAGTACTGTGCCCCGTCGCCGTCGCACAAGGCCGGTCGATCGTGGTGGC TGTCAGGCGGGGTGGCAACGCTCTGGTAACCTCTCTACGGCGTAGTAGTTCGTGCCCCTCCCCTTGCG ACCTCCCGCTACCACCCGTCACTCCCCGGTAAGAGGCTCTCACGGACGGCAGAGTCGGTCGCGCGCTC CGGATGTGGTCCCCTCCCAGTCCTCCCCGCGTAGTCACTGTTGAAGCAAATG 228 GCTATTGCTGGGATTTTGAGGGCGCGGCGGTGGAGCGCTCGCGGTGGTGCGCTGGCAACGCTCTGGT AACCTCTCTATGGCGCGTGGCCACGCTCCCGCGCGACGGCCGCGTAGTCACTGTTGAAGCAAATG 229 GCTATTGCTGGGATTTTGAGGGCGCGGCGGTGGAGCGCTCGCGGTGGTGCGCTGGCAACGCTCTGGT AACCTCTCTATGGCGCGTGGCCACGCTCCCGCGCGACGGCCGCGTAGTCACTGTTGAAGCAAATG 230 
GCTATTGCTGGGATTTTGAGGGTAAACAGAGCGGAATCACAAATATTTATGCCTACCAAACCGATTTCTC AAAAGTAAAACAAAGTACGTCTCATTAATACTGTGGTGTAAGTATTATCAAAATAAAATAGTGTAACTGT ATGTATGTTGGCAACGCTCTGGTAACCTCTCTAATAAATTGATAAATTACACTGAGTTTGCATAGGAATC GTTATATATCAAAGTATGTTTTCTGACTACTATCAAACGCGCAAGTTACTTACTCTAAAAGTATTTGAGTT TAAGCCATTAGTCACCGATACGTAGTCACTGTTGAAGCAAATG 231 GCTATTGCTGGGATTTTGAGGTAGCAATATTGAATTCTAGATTATACGAGGCAACGCTCTGGTAACCTCT CTATTATTTAAGCTATCATACTCTAGTGTTTCATTTGCTTCAACAGTGACTACG 232 GCTATTGCTGGGATTTTGAGGTAGCAATATTGAATTCTAGATTATACGAGGCAACGCTCTGGTAACCTCT CTATTATTTAAGCTATCATACTCTAGTGTTTCATTTGCTTCAACAGTGACTACG 233 GCTATTGCTGGGATTTTGAGGTATATAAATAAATGGCAACGCTCTGGTAACCTCTCTAAATAAATAAAAT ACGTAGTCACTGTTGAAGCAAATG 234 GCTATTGCTGGGATTTTGAGGTATATAAATAAATGGCAACGCTCTGGTAACCTCTCTAAATAAATAAAAT ACGTAGTCACTGTTGAAGCAAATG 235 GCTATTGCTGGGATTTTGAGGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACCTCTC TAAATTGATAGTTCCGATTGCAACTTGACGTCATTTGCTTCAACAGTGACTACG 236 GCTATTGCTGGGATTTTGAGGTATTATTATTTTAAATTAATATTATAATTTTAACTTTTATTGATTATATATT AGTCATTATATATAAAGGCAACGCTCTGGTAACCTCTCTATCTTAGTTTTATTAATATAAAATTTATATAAT AATATTTATTAAATAAATTCTATTATATTATTGATTCGTAGTCACTGTTGAAGCAAATG 237 GCTATTGCTGGGATTTTGAGGTATTATTATTTTAAATTAATATTATAATTTTAACTTTTATTGATTATATATT AGTCATTATATATAAAGGCAACGCTCTGGTAACCTCTCTATCTTAGTTTTATTAATATAAAATTTATATAAT AATATTTATTAAATAAATTCTATTATATTATTGATTCGTAGTCACTGTTGAAGCAAATG 238 GCTATTGCTGGGATTTTGAGGTCATAGTTCTATATACATTGTCATGACACGAATTGGCTGAAAACGGTTG ATAGAAGATATTGACTATATTCTCTTCCGCTGTATCCCGTTTCTTTTGGAAATTGACCGTATTATGGTCAC CATCAGCCTAAGTGATCTCTGGACCGTCGAGAGACCCCATTGACTTGGTTCTTCGGTTTGATGCACTCA TGTAAAATGTAGTCTCAATCAATACCATCCATTTCTAGCATACGGGTGAGCATGAAGCCTTCGTCTCACA GCTCCGGTACAGGTAATCGAGAGAACACTAAAACAGTCCGACATGAGATTCATTAAAACCTATTTTCACC AATCGGTAGAACGGTTATGCGCAAAATATTTTCGGGGTCCACAGTGCACCTATGTAATCTGTAACATGA AGTTGTACGAAAATAGAGAACCCACCCAGCTTATCTAGGAAATTGATCTCTTCGATTTAAGGATGTGTCG ACACGTATCATGCCAAGTGATCAGAGGCGTAACATTTGCTTCAACAGTGACTACG 239 GCTATTGCTGGGATTTTGAGGTCATAGTTCTATATACATTGTCATGACACGAATTGGCTGAAAACGGTTG 
ATAGAAGATATTGACTATATTCTCTTCCGCTGTATCCCGTTTCTTTTGGAAATTGACCGTATTATGGTCAC CATCAGCCTAAGTGATCTCTGGACCGTCGAGAGACCCCATTGACTTGGTTCTTCGGTTTGATGCACTCA TGTAAAATGTAGTCTCAATCAATACCATCCATTTCTAGCATACGGGTGAGCATGAAGCCTTCGTCTCACA GCTCCGGTACAGGTAATCGAGAGAACACTAAAACAGTCCGACATGAGATTCATTAAAACCTATTTTCACC AATCGGTAGAACGGTTATGCGCAAAATATTTTCGGGGTCCACAGTGCACCTATGTAATCTGTAACATGA AGTTGTACGAAAATAGAGAACCCACCCAGCTTATCTAGGAAATTGATCTCTTCGATTTAAGGATGTGTCG ACACGTATCATGCCAAGTGATCAGAGGCGTAACGTAGTCACTGTTGAAGCAAATG 240 GCTATTGCTGGGATTTTGAGGTCATAGTTCTATATACATTGTCATGACACGAATTGGCTGAAAACGGTTG ATAGAAGATATTGACTATATTCTCTTCCGCTGTATCCCGTTTCTTTTGGAAATTGACCGTATTATGGTCAC CATCAGCCTAAGTGATCTCTGGACCGTCGAGAGACCCCATTGACTTGGTTCTTCGGTTTGATGCACTCA TGTAAAATGTAGTCTCAATCAATACCATCCATTTCTAGCATACGGGTGAGGCAACGCTCTGGTAACCTCT CTACCGGTACAGGTAATCGAGAGAACACTAAAACAGTCCGACATGAGATTCATTAAAACCTATTTTCACC AATCGGTAGAACGGTTATGCGCAAAATATTTTCGGGGTCCACAGTGCACCTATGTAATCTGTAACATGA AGTTGTACGAAAATAGAGAACCCACCCAGCTTATCTAGGAAATTGATCTCTTCGATTTAAGGATGTGTCG ACACGTATCATGCCAAGTGATCAGAGGCGTAACGTAGTCACTGTTGAAGCAAATG 241 GCTATTGCTGGGATTTTGAGGTCATAGTTCTATATACATTGTCATGACACGAATTGGCTGAAAACGGTTG ATAGAAGATATTGACTATATTCTCTTCCGCTGTATCCCGTTTCTTTTGGAAATTGACCGTATTATGGTCAC CATCAGCCTAAGTGATCTCTGGACCGTCGAGAGACCCCATTGACTTGGTTCTTCGGTTTGATGCACTCA TGTAAAATGTAGTCTCAATCAATACCATCCATTTCTAGCATACGGGTGAGGCAACGCTCTGGTAACCTCT CTACCGGTACAGGTAATCGAGAGAACACTAAAACAGTCCGACATGAGATTCATTAAAACCTATTTTCACC AATCGGTAGAACGGTTATGCGCAAAATATTTTCGGGGTCCACAGTGCACCTATGTAATCTGTAACATGA AGTTGTACGAAAATAGAGAACCCACCCAGCTTATCTAGGAAATTGATCTCTTCGATTTAAGGATGTGTCG ACACGTATCATGCCAAGTGATCAGAGGCGTAACGTAGTCACTGTTGAAGCAAATG 242 GCTATTGCTGGGATTTTGAGGTCGCTCGCCCCTACTTACACCACCCCTCCCCGTAGTCACTGTTGAAGC AAATG 243 GCTATTGCTGGGATTTTGAGGTCTGGCATGTCGGCTCGGTCTGTCTCTTTCCCCTCATCTCTCGGTACAC TCCTTACCTCGCCCACCCCGGCAACGCTCTGGTAACCTCTCTATCTGGCCTGTCACGAATCACTGTCCAT CTAACCTCGGAGCCCTGTCACGCGGCGGACTTGGAGACTGTCGTAGTCACTGTTGAAGCAAATG 244 GCTATTGCTGGGATTTTGAGGTCTGGCATGTCGGCTCGGTCTGTCTCTTTCCCCTCATCTCTCGGTACAC 
TCCTTACCTCGCCCACCCCGGCAACGCTCTGGTAACCTCTCTATCTGGCCTGTCACGAATCACTGTCCAT CTAACCTCGGAGCCCTGTCACGCGGCGGACTTGGAGACTGTCGTAGTCACTGTTGAAGCAAATG 245 GCTATTGCTGGGATTTTGAGGTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACC TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCGTAGTCACTGTTGAAGCAAATG 246 GCTATTGCTGGGATTTTGAGGTCTGTATCCCAAGTGTTCTCTGCTTCATATTGGCAACGCTCTGGTAACC TCTCTAAATTGATAGTTCCGATTGCAACTTGACGTCTAGCGTAGTCACTGTTGAAGCAAATG 247 GCTATTGCTGGGATTTTGAGGTGAATATTTTATTCCCTAATTTTATTATTATGTTCTAAAAGGTATTTAAAT ACTTTTCATTAATGGCAACGCTCTGGTAACCTCTCTATCATAAATATTTTAAATACTAGAATCTTATTTTAT TCTTTAGAATAGTAGAAATTTAATTAAATCATTTGCTTCAACAGTGACTACG 248 GCTATTGCTGGGATTTTGAGGTGAATATTTTATTCCCTAATTTTATTATTATGTTCTAAAAGGTATTTAAAT ACTTTTCATTAATGGCAACGCTCTGGTAACCTCTCTATCATAAATATTTTAAATACTAGAATCTTATTTTAT TCTTTAGAATAGTAGAAATTTAATTAAATCATTTGCTTCAACAGTGACTACG 249 GCTATTGCTGGGATTTTGAGGTGCCGAGGGTCCAGGTCGAGACTCCATCCCGAGGCGTGTGTCCCCAT GGCCGTCCTCCAGGCTAGTACTGTGCCCCGTCGCCGTCGCACAAGGCCGGTCGATCGTGGTGGCTGTC AGGCGGGGTGGCAACGCTCTGGTAACCTCTCTACGGCGTAGTAGTTCGTGCCCCTCCCCTTGCGACCT CCCGCTACCACCCGTCACTCCCCGGTAAGAGGCTCTCACGGACGGCAGAGTCGGTCGCGCGCTCCGGA TGTGGTCCCCTCCCAGTCCTCCATTTGCTTCAACAGTGACTACG 250 GCTATTGCTGGGATTTTGAGGTGCCGAGGGTCCAGGTCGAGACTCCATCCCGAGGCGTGTGTCCCCAT GGCCGTCCTCCAGGCTAGTACTGTGCCCCGTCGCCGTCGCACAAGGCCGGTCGATCGTGGTGGCTGTC AGGCGGGGTGGCAACGCTCTGGTAACCTCTCTACGGCGTAGTAGTTCGTGCCCCTCCCCTTGCGACCT CCCGCTACCACCCGTCACTCCCCGGTAAGAGGCTCTCACGGACGGCAGAGTCGGTCGCGCGCTCCGGA TGTGGTCCCCTCCCAGTCCTCCATTTGCTTCAACAGTGACTACG 251 GCTATTGCTGGGATTTTGAGGTGCGTCGATGCTGTGTGAGGTGAAGACCTAGAGGCAACGCTCTGGTA ACCTCTCTACACGCTTAGCAACGCTGCATGTCGAGTCTCCACGTAGTCACTGTTGAAGCAAATG 252 GCTATTGCTGGGATTTTGAGGTGCGTCGATGCTGTGTGAGGTGAAGACCTAGAGGCAACGCTCTGGTA ACCTCTCTACACGCTTAGCAACGCTGCATGTCGAGTCTCCACGTAGTCACTGTTGAAGCAAATG 253 GCTATTGCTGGGATTTTGAGGTGCGTCGCGGCTGTGGGAGGTGCGGACCTAGAGGCAACGCTCTGGTA ACCTCTCTACACGCTTAGCGCCGCTGCCTGTCGACCGTCCACGTAGTCACTGTTGAAGCAAATG 254 GCTATTGCTGGGATTTTGAGGTGCGTCGCGGCTGTGGGAGGTGCGGACCTAGAGGCAACGCTCTGGTA 
ACCTCTCTACACGCTTAGCGCCGCTGCCTGTCGACCGTCCACGTAGTCACTGTTGAAGCAAATG 255 GCTATTGCTGGGATTTTGAGGTGGGGCTGGCAGGGGCGGGTGGGGAGGAGGGCGGGGTGGGGTCGG GGCCAAGGGGAGCGGGGAGCGGCGGCAACGCTCTGGTAACCTCTCTAGCCCGTCCGTGCCGTCCGCC GCCTGGGAGCCTCGCTCGGGGACAGCCGGGACTGGGGACGCGGGCCGCCGTAGTCACTGTTGAAGCA AATG 256 GCTATTGCTGGGATTTTGAGGTGGGGCTGGCAGGGGCGGGTGGGGAGGAGGGCGGGGTGGGGTCGG GGCCAAGGGGAGCGGGGAGCGGCGGCAACGCTCTGGTAACCTCTCTAGCCCGTCCGTGCCGTCCGCC GCCTGGGAGCCTCGCTCGGGGACAGCCGGGACTGGGGACGCGGGCCGCCGTAGTCACTGTTGAAGCA AATG 257 GCTATTGCTGGGATTTTGAGGTGGTAGATGGCGTTTTGTTTCAGGAGTTTATCATTACCGACTTAAAGCT AACAACGAAACTTATGAAATGGATCTTAGGCAACGCTCTGGTAACCTCTCTAATCGCCGTAGTCACTGTT GAAGCAAATG 258 GCTATTGCTGGGATTTTGAGGTGTAATATTAACAAGTAATAAAGAAATATATAGCATGAAGCCTTCGTCT CACAGCTTTTATTCAATTTAATGATTACCTTTATTATCTCATTTGCTTCAACAGTGACTACG 259 GCTATTGCTGGGATTTTGAGGTGTAATATTAACAAGTAATAAAGAAATATATAGCATGAAGCCTTCGTCT CACAGCTTTTATTCAATTTAATGATTACCTTTATTATCTCGTAGTCACTGTTGAAGCAAATG 260 GCTATTGCTGGGATTTTGAGGTGTAATATTAACAAGTAATAAAGAAATATATAGGCAACGCTCTGGTAAC CTCTCTATTTATTCAATTTAATGATTACCTTTATTATCTCGTAGTCACTGTTGAAGCAAATG 261 GCTATTGCTGGGATTTTGAGGTGTAATATTAACAAGTAATAAAGAAATATATAGGCAACGCTCTGGTAAC CTCTCTATTTATTCAATTTAATGATTACCTTTATTATCTCGTAGTCACTGTTGAAGCAAATG 262 GCTATTGCTGGGATTTTGAGGTTGCAACTTGACGTCTCGTAGTCACTGTTGAAGCAAATG 263 GCTATTGCTGGGATTTTGAGGTTTAATAAATAAATTAAATATTATATAAATTAGGCAACGCTCTGGTAACC TCTCTATATTATATTAAATTATTAAATTAATAATTATACGTAGTCACTGTTGAAGCAAATG 264 GCTATTGCTGGGATTTTGAGGTTTAATAAATAAATTAAATATTATATAAATTAGGCAACGCTCTGGTAACC TCTCTATATTATATTAAATTATTAAATTAATAATTATACGTAGTCACTGTTGAAGCAAATG 265 GCTATTGCTGGGATTTTGAGGTTTCTGAATATTTTATTCCCTAATTTTATTATTATGTTCTAAAAGGTATTT AAATACTTTTCATTAATGGCAACGCTCTGGTAACCTCTCTATCATAAATATTTTAAATACTAGAATCTTATT TTATTCTTTAGAATAGTAGAAATTTAATTAAATGCACCGTAGTCACTGTTGAAGCAAATG 266 GCTATTGCTGGGATTTTGAGGTTTCTGAATATTTTATTCCCTAATTTTATTATTATGTTCTAAAAGGTATTT AAATACTTTTCATTAATGGCAACGCTCTGGTAACCTCTCTATCATAAATATTTTAAATACTAGAATCTTATT TTATTCTTTAGAATAGTAGAAATTTAATTAAATGCACCGTAGTCACTGTTGAAGCAAATG 267 
GCTATTGCTGGGATTTTGAGGTTTGTTTTCGTTTCTTTCTCTCTTTCTTATCGTAGTCACTGTTGAAGCAA ATG 268 GGAGAAAAGCCACATGAATGCAAAACACCAGCAATCTCAAGACCCACCTATAAACATTGGTATGGTTTC TCTCCAG 269 GGAGAAAAGCCACATGAATGCAAAACACCAGCAATCTCAAGACCCACCTATAAACTGGAGAGAAACCAT ACCAATG 270 GGAGAAAAGCCACATGAATGCAAAAGAAGAAATAAGATAAAATACAACAATAATCAAAGACACAAAACA AACATAAACACCAGCAATCTCAAGACCCACCTAAGAACATTGGTATGGTTTCTCTCCAG 271 GGAGAAAAGCCACATGAATGCAAAAGAAGAAATAAGATAAAATACAACAATAATCAAAGACACAAAACA AACATAAACACCAGCAATCTCAAGACCCACCTAAGAACTGGAGAGAAACCATACCAATG 272 GGAGAAAAGCCACATGAATGCAAAATAAATACTAAACAAAACTAACAACACAAACACCAGCAATCTCAA GACCCACCTATAAACATTGGTATGGTTTCTCTCCAG 273 GGAGAAAAGCCACATGAATGCAAAATAAATACTAAACAAAACTAACAACACAAACACCAGCAATCTCAA GACCCACCTATAAACTGGAGAGAAACCATACCAATG 274 GGAGAAAAGCCACATGAATGCATATAATATACTACAATTATTAAAATATGCATGAAGCCTTCGTCTCACA GCTAAGTTCTGGAGAGAAACCATACCAATG 275 TCTCTGATCGGTCCCTTTACTCGCCTCCCTACTCTTCATTCTATTCTCCTTCTCGTTCTTGTTTCTTCTTTT GTCTCTTTGCTTCCCTCGTATCTGTTCCTTTCCCGTCTCCCCATTCCCCGCCCCACTACCCAACACCCAC CAATCAACCAAAACCTACAACCCATCCACACACCACCTCACTAACTCCTACCTCGCTCCTCTACACTTCA CTGGCAACGCTCTGGTAACCTCTCTATCTCACCCCTTAATTTCCGCACCTATT 276 TCTCTGATCGGTCCCTTTACTCTCCCAACCCCTCCCTCGCCCATCCCCACTCCGCTCGCTTCCCCTGGCC CTGTCCGCCTCCACCCGTCGTCCTCATCCAGCCGCAAGTTGGCAACGCTCTGGTAACCTCTCTACGCCG CCCCTTAATTTCCGCACCTATT 277 TCTCTGATCGGTCCCTTTACTCTCCCTCGCCTCCTTCCCACCCTCTTCCTCACTCACCCCACTTTTCTATC TACTTCACTGGCAACGCTCTGGTAACCTCTCTACCCCTCCCCTTAATTTCCGCACCTATT 278 TCTCTGATCGGTCCCTTTACTCTGCCTTTTCTCCTTTCTTTCCTTCCTCATCCACTTCCACCCACCTCACTC ACCCTAACCCCGCCCTCCCAACCATCACCAACACCCCTCAAACCTACCTCCTCCGCTCCCCACACTCTCC CTACTCAACTCTACACATGGCAACGCTCTGGTAACCTCTCTATCTCGCCCCTTAATTTCCGCACCTATT 279 TCTCTGATCGGTCCCTTTACTCTTCTGTCCTTCCTCCTGTATTCGCTTATCTTCCACTTTCCAATTTAACGA TATGACGAGTTTATTCCTGCTTGAGTCTAGTTCCGTTTCAAATACCCCTGCGCCCTTCTTTGTCTTACTTG TTCGGTTCACTTGCTCCTCTACTTCACGGTCTCTTTAACTCAGGCAACGCTCTGGTAACCTCTCTATCACT CCCCTTAATTTCCGCACCTATT 280 TGCAGAAACACTACCTGGTACAAAACACCAGCAATCTCAAGACCCACCTATAAACACAAACTGGGTGAA CTTGG 281 
TGCAGAAACACTACCTGGTACAAAAGAAGAAATAAGATAAAATACAACAATAATCAAAGACACAAAACAA ACATAAACACCAGCAATCTCAAGACCCACCTAAGAACACAAACTGGGTGAACTTGG 282 TGCAGAAACACTACCTGGTACAAAATAAATACTAAACAAAACTAACAACACAAACACCAGCAATCTCAAG ACCCACCTATAAACACAAACTGGGTGAACTTGG 283 CGTAGTCACTGTTGAAGCAAATG 284 GTGATGTGAAGGATTATGGGGA 285 CATTGGTATGGTTTCTCTCCAG 286 ATTCTCCAGACAAGGCACTG 287 AATAGGTGCGGAAATTAAGGGG
[0744] The invention will now be described further by the following non-limiting Examples.
Examples
[0745] The core technology is a system of at least three natural target or synthetic competitor polynucleotides that share primers and are amplified together in a nucleic acid amplification reaction to evaluate a given combination of one or more sequences of interest. As the sequences are replicated, they compete for these shared primers, and this competition gives the resulting readout its characteristic behaviour. For example, take a set of natural gene transcripts, each paired with an engineered synthetic competitor (
[0746] The “direct” competitive amplification network (CAN) described above, comprising multiple pairs of natural and synthetic targets in which the members of each pair compete for both primers, is the simplest embodiment of this invention. The same competition principle, however, applies to more complex networks. For example, a natural target could share one of its primers with a first synthetic target, which in turn shares its other primer with a second synthetic target, forming an “indirect” CAN (
[0747] Direct Competitive PCR
[0748] In competitive PCR, a competitor polynucleotide (REF) is included as a reference alongside the target (denoted in the figures as WT)(
[0749] When the target and the competitor are amplified in the same PCR, they compete for the primers. Because primers are consumed each time a target or competitor strand is replicated, amplification of both sequences stops once the primer pool is exhausted. The quantity of each amplification product at the end of the reaction therefore depends on the relative starting quantities of the target and the competitor. This is reflected in the resulting fluorescent signal (
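The primer-exhaustion argument above can be illustrated with a toy end-point model. This is a minimal sketch under stated assumptions, not the simulator used in this work: the target (WT) and competitor (REF) amplify with identical efficiency, primers are the only limiting reagent, each newly synthesised strand consumes one primer, and when the remaining primers cannot support a full cycle they are shared in proportion to template abundance. The function name `competitive_pcr` and its parameters are hypothetical.

```python
def competitive_pcr(wt0: float, ref0: float, primers: float,
                    efficiency: float = 1.0, cycles: int = 40):
    """Toy model: return end-point (wt, ref) copy numbers.

    Assumptions (illustrative only): equal efficiencies for target and
    competitor, one primer consumed per new strand, and proportional
    sharing of the primer pool once it becomes limiting.
    """
    wt, ref, primers = float(wt0), float(ref0), float(primers)
    for _ in range(cycles):
        demand = efficiency * (wt + ref)  # primers needed for a full cycle
        if primers <= 0.0 or demand <= 0.0:
            break  # primer pool exhausted: amplification of both stops
        # If supply is short, every template copy gets the same fraction
        # of a full cycle, so the WT:REF ratio is preserved.
        scale = min(1.0, primers / demand)
        new_wt = efficiency * wt * scale
        new_ref = efficiency * ref * scale
        wt += new_wt
        ref += new_ref
        primers -= new_wt + new_ref
    return wt, ref
```

For example, `competitive_pcr(1e3, 1e4, primers=1e12)` ends with roughly 10^12 copies in total, split about 1:10 between WT and REF, i.e. in the same ratio as the starting quantities. In this simplified model a probe readout of the two products therefore reports the target amount relative to the known competitor amount; real reactions with unequal efficiencies would deviate from this.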
[0750]
[0751] Direct Competitive Amplification Networks
[0752] A single pair of competing targets is hardly a “network”, and a single gene target does not reflect the complexity of gene expression signatures. However, multiple competitive pairs can be combined in the same reaction, each pair reporting on a different RNA transcript through its HEX and FAM signals. Each competitive pair reports on how close the given gene is to its individual set point, and these signals simply stack on top of one another. The result is an aggregate measure of how close all of the genes are to their respective set points. Regardless of the number of genes under investigation, the difference between the total HEX intensity and the total FAM intensity integrates the information from the whole system. To illustrate why this is useful, let's look at how such a network can be used to diagnose tuberculosis by mimicking the statistical technique of logistic regression.
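The stacking of signals can be sketched as a simple sum. This sketch assumes, hypothetically, that within each pair the end-point products split that pair's primer pool in proportion to the starting amounts, that the target product is read in HEX and the competitor product in FAM, and that pairs do not interact. The names `pair_signal` and `network_signal` are illustrative only.

```python
def pair_signal(target: float, competitor: float, primers: float) -> float:
    """HEX minus FAM for one pair (assumed: target in HEX, competitor in FAM).

    Under proportional sharing, each product's end-point amount is the
    pair's primer pool times its share of the starting material.
    """
    total = target + competitor
    if total <= 0.0:
        return 0.0
    return primers * (target - competitor) / total


def network_signal(pairs) -> float:
    """Total HEX minus total FAM: the per-pair signals simply add up."""
    return sum(pair_signal(t, c, p) for t, c, p in pairs)
```

A pair whose target sits exactly at its competitor's set point contributes zero, and pairs above and below their set points can cancel. In this toy model, swapping which sequence carries which label would flip the sign of that pair's contribution.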
[0753] Case Study: Diagnosing Tuberculosis with a Direct CAN
[0754] More people die each year from tuberculosis than from any other infectious disease: in 2018 there were 10 million new cases and 1.5 million deaths. Tuberculosis is especially prevalent (and deadly) among people also infected with HIV, a population that is particularly difficult to diagnose with current TB tests. A gene expression signature in human white blood cells that can be used to diagnose TB has been identified (1, 2).
[0755] (1) Kaforou, M.; Wright, V. J.; Oni, T.; French, N.; Anderson, S. T.; Bangani, N.; Banwell, C. M.; Brent, A. J.; Crampin, A. C.; Dockrell, H. M.; Eley, B.; Heyderman, R. S.; Hibberd, M. L.; Kern, F.; Langford, P. R.; Ling, L.; Mendelson, M.; Ottenhoff, T. H.; Zgambo, F.; Wilkinson, R. J.; Coin, L. J.; Levin, M. Detection of Tuberculosis in HIV-Infected and -Uninfected African Adults Using Whole Blood RNA Expression Signatures: A Case-Control Study. PLOS Medicine 2013, 10 (10), e1001538. https://doi.org/10.1371/journal.pmed.1001538.
[0756] (2) Gliddon, H. D.; Kaforou, M.; Alikian, M.; Habgood-Coote, D.; Zhou, C.; Oni, T.; Anderson, S. T.; Brent, A. J.; Crampin, A. C.; Eley, B.; Kern, F.; Langford, P. R.; Ottenhoff, T. H. M.; Hibberd, M. L.; French, N.; Wright, V. J.; Dockrell, H. M.; Coin, L. J.; Wilkinson, R. J.; Levin, M.; Consortium, on behalf of the I. Identification of Reduced Host Transcriptomic Signatures for Tuberculosis and Digital PCR-Based Validation and Quantification. bioRxiv 2019, 583674. https://doi.org/10.1101/583674.
[0757] Crucially, this test performs equally well in patients with and without HIV. However, the technology used to identify this signature—microarrays—is too cumbersome and expensive for use in the rural, poor regions of the world where such a test is needed most. A direct Competitive Amplification Network can evaluate the gene expression signature and translate the test to a rapid, inexpensive, and easy-to-use format.
[0758] Diagnosing with Statistics: Logistic Regression
[0759] To understand how we can use a CAN to diagnose TB, we first need to understand the statistical technique we are trying to mimic: logistic regression. Logistic regression models the probability of being in one group (infected with tuberculosis) rather than another (having some other disease, OD) by looking at the individual contributions of various determining factors (expression levels of various genes). It assumes that the log-odds (the logarithm of the ratio between the probabilities of belonging to each group) is given by a linear weighted sum of these factors:
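For reference, the textbook form of a binary logistic regression model is shown below in generic notation. The symbols are assumptions and need not match those of the original equation or its S terms: β₀ is an intercept, βᵢ is the weight of the i-th determining factor xᵢ (for example, a log-transformed expression level of gene i), and P(TB) is the modelled probability of TB rather than OD.

```latex
\ln\frac{P(\mathrm{TB})}{1 - P(\mathrm{TB})} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i
```

Because the right-hand side is a sum, each term βᵢxᵢ is a separate, additive contribution of one gene to the overall log-odds.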
[0760] We can look at the contribution of individual genes to the overall classifier by finding the marginal log-odds for each (
[0761] To diagnose a patient based on logistic regression, we just add up the contribution of each individual gene. For example, a patient may have 10³ copies of GBP6, contributing a marginal log-odds of +0.25. The same patient might have 10⁴ copies each of ARG1 and TMCC1, contributing −0.5 and −0.2, respectively. The overall log-odds of this patient having TB would be 0.25 − 0.5 − 0.2 = −0.45, corresponding to odds of about 0.64 (a probability of about 0.39), so we can conclude that this patient is more likely not to have TB than to have it. Repeating this for every patient (
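The arithmetic of this worked example can be written out as a short sketch. It uses only the three marginal contributions quoted above and assumes, as the example does, that any intercept is already absorbed into those per-gene values. The final step inverts log-odds = ln(p / (1 − p)) to recover a probability.

```python
import math

# Marginal log-odds contributions quoted in the worked example.
contributions = {"GBP6": 0.25, "ARG1": -0.5, "TMCC1": -0.2}

def total_log_odds(marginals):
    """The overall log-odds is the sum of the per-gene marginal contributions."""
    return sum(marginals.values())

def probability_from_log_odds(log_odds):
    """Invert log-odds = ln(p / (1 - p)) to recover p."""
    return 1.0 / (1.0 + math.exp(-log_odds))

lo = total_log_odds(contributions)
p = probability_from_log_odds(lo)
print(f"log-odds = {lo:.2f}, P(TB vs OD) = {p:.2f}")  # log-odds = -0.45, P(TB vs OD) = 0.39
```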
[0762] Mimicking the Statistics with a Direct CAN
[0763] We can use a direct CAN to recapitulate this statistical inference on a molecular level by designing a competitor for each of our three gene transcripts (
[0764] In order to choose an appropriate target region and design the synthetic target sequence, we use the results of logistic regression as an “objective function”: our goal is to find a pair of sequences that, when amplified together, give an input-output response curve that approximates this objective. Thus, for each target, we try to approximate a line whose slope is derived from the equation above (the respective S term) and which crosses zero at the mean concentration of that target observed in our dataset. Using simulation, we can predict the behaviour of any two sequences amplified together, and so we can use standard curve-fitting algorithms known in the art to find the optimal parameters. In this case, those are the parameters that produce a response curve that matches the line specified above as closely as possible within the range of target concentrations observed in our dataset, and then flattens as quickly as possible outside that range (See
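One way to express this objective is sketched below under stated assumptions. Concentrations are taken on a log10 scale. The desired response is the line of slope S through zero at the mean, held flat (clamped) outside the observed range, and a candidate curve is scored by mean squared mismatch. The `surrogate_response` tanh curve is a hypothetical stand-in for the amplification simulation referred to in the text, and the numbers are illustrative rather than taken from the dataset.

```python
import math

def target_line(log_conc, slope, mean_log, lo_log, hi_log):
    """Desired response: slope S through zero at the mean log10 concentration,
    clamped (flat) outside the observed range [lo_log, hi_log]."""
    clamped = min(max(log_conc, lo_log), hi_log)
    return slope * (clamped - mean_log)

def surrogate_response(log_conc, amplitude, midpoint, width):
    """Hypothetical stand-in for a simulated two-colour response curve.
    A real design would call the amplification simulation here instead."""
    return amplitude * math.tanh((log_conc - midpoint) / width)

def objective(params, slope, mean_log, lo_log, hi_log, grid):
    """Mean squared mismatch between a candidate curve and the desired response."""
    amplitude, midpoint, width = params
    squared = [
        (surrogate_response(u, amplitude, midpoint, width)
         - target_line(u, slope, mean_log, lo_log, hi_log)) ** 2
        for u in grid
    ]
    return sum(squared) / len(squared)

# Illustrative numbers only: S = 0.25 per log10 unit, mean 10^3 copies,
# observed range 10^2 to 10^4 copies, evaluated from 10^1 to 10^7 copies.
grid = [1.0 + 0.25 * i for i in range(25)]
good = objective((0.25, 3.0, 1.0), 0.25, 3.0, 2.0, 4.0, grid)
poor = objective((1.0, 5.0, 0.2), 0.25, 3.0, 2.0, 4.0, grid)
print(good < poor)  # True
```

Any curve-fitting routine can then minimise `objective` over the simulation's free parameters.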
[0765] Once suitable parameters are found, we then need to select sequences which exhibit them. Using the equations described above in the section “Testing and predicting competitor amplification behavior”, we can predict the combinations of length and GC content which provide these parameters. Note that our simulations do not include the drift term (m) or plateau term (K) found in our regression equations. This is because the simulations represent ideal behavior, and these two parameters describe deviations from that ideal. Thus, in choosing optimal length and GC content, we would seek to minimize drift and maximize the plateau, so that we select sequences as close to the ideal as possible.
[0766] It is likely that multiple sets of parameters could give nearly-optimal curves. It may be preferable to identify a suitable target sequence a priori (due to external constraints), measure its amplification parameters, and then use the curve-fitting algorithm to select only those competitor amplification parameters that produce a nearly-optimal response when simulated together with the measured parameters. The simulation of the amplification behavior is described above; supplied with suitable equations for simulation, the skilled person would be able to apply any of several optimization techniques and algorithms, including Gradient Descent, Stochastic Gradient Descent, and Quasi-Newton optimization, among others.
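The "fix the target, fit only the competitor" strategy can be sketched with plain gradient descent, using central-difference gradients and a backtracking step. This is a swapped-in, generic technique and not the authors' procedure. The `simulate` function is a hypothetical toy in which target and competitor share one finite primer pool and each cycle's yield shrinks with the fraction of primer left. It is not the simulation or the drift/plateau parametrisation described elsewhere in this document. The target efficiency (0.95) plays the role of a measured parameter, while the competitor's starting copy number and efficiency are the free parameters. All numbers are illustrative.

```python
def simulate(log_target0, log_comp0, eff_t, eff_c, primers=1e10, cycles=35):
    """Toy competitive PCR (assumption): one shared, finite primer pool.
    Returns the normalised signal (target - competitor) / (target + competitor)."""
    t, c, p = 10.0 ** log_target0, 10.0 ** log_comp0, primers
    for _ in range(cycles):
        frac = p / primers  # fraction of primer still available
        dt, dc = t * eff_t * frac, c * eff_c * frac
        if dt + dc > p:  # never consume more primer than remains
            scale = p / (dt + dc)
            dt, dc = dt * scale, dc * scale
        t, c, p = t + dt, c + dc, max(0.0, p - dt - dc)
    return (t - c) / (t + c)

def loss(params, eff_t, slope, mean_log, lo_log, hi_log, grid):
    """Mean squared error against the clamped target line."""
    log_comp0, eff_c = params
    total = 0.0
    for u in grid:
        target = slope * (min(max(u, lo_log), hi_log) - mean_log)
        total += (simulate(u, log_comp0, eff_t, eff_c) - target) ** 2
    return total / len(grid)

def gradient_descent(loss_fn, params, project, step=0.5, iters=40, h=1e-4):
    """Gradient descent with central-difference gradients; the step is halved
    until the projected candidate does not increase the loss."""
    current = loss_fn(params)
    for _ in range(iters):
        grad = []
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += h
            down[i] -= h
            grad.append((loss_fn(up) - loss_fn(down)) / (2 * h))
        trial = step
        while trial > 1e-6:
            candidate = project([x - trial * g for x, g in zip(params, grad)])
            value = loss_fn(candidate)
            if value <= current:
                params, current = candidate, value
                break
            trial /= 2
    return params, current

GRID = [1.0 + 0.5 * i for i in range(13)]  # log10 target copies, 1 to 7

def design_objective(prm):
    # Measured target efficiency 0.95; slope 0.5, zero at 10^3, range 10^2-10^4.
    return loss(prm, 0.95, 0.5, 3.0, 2.0, 4.0, GRID)

def keep_valid(prm):
    # Keep the competitor efficiency within a plausible (0, 1] band.
    return [prm[0], min(max(prm[1], 0.05), 1.0)]

start = [2.0, 0.9]  # log10 competitor copies, competitor efficiency
initial_loss = design_objective(start)
best, final_loss = gradient_descent(design_objective, start, keep_valid)
print(best, initial_loss, final_loss)
```

A quasi-Newton method would usually converge in fewer iterations. The explicit loop is used here only to keep the sketch dependency-free.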
[0767] Limitations of Direct CANs
[0768] The direct networks presented above have two main drawbacks. First, they quickly become expensive for larger gene signatures, since at least one, if not two, probes need to be designed for each transcript targeted. Economies of scale for synthetic DNA are quite favourable at production scale, but at a development scale each fluorescently-labelled probe costs ~£200 (for context, each primer costs ~£2 and each synthetic target ~£20). For gene signatures with 20–50 targets, iterating on sequence designs becomes prohibitively expensive. Second, direct CANs are somewhat limited in the response curves they can attain. To address these issues, indirect CANs provide similar functionality at a more or less fixed cost regardless of the number of genes under investigation. Indirect competition also opens the possibility of higher-order networks capable of complex, non-linear analysis of multiple targets simultaneously. Finally, redundant targeting allows additional flexibility for all CAN architectures.
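The cost argument can be made concrete with the development-scale prices quoted above. The per-transcript bill of materials used here (one synthetic competitor, a primer pair, and one or two probes per transcript) is an assumption for illustration, not a stated design rule.

```python
# Indicative development-scale prices quoted in the text (GBP).
PROBE_COST = 200
PRIMER_COST = 2
SYNTHETIC_TARGET_COST = 20

def direct_can_cost(n_transcripts, probes_per_transcript=2, primers_per_transcript=2):
    """Rough cost of one design iteration of a direct CAN, assuming each
    transcript needs its own probes, primer pair and synthetic competitor."""
    per_transcript = (probes_per_transcript * PROBE_COST
                      + primers_per_transcript * PRIMER_COST
                      + SYNTHETIC_TARGET_COST)
    return n_transcripts * per_transcript

for n in (20, 50):
    # Columns: targets, cost with one probe each, cost with two probes each.
    print(n, direct_can_cost(n, probes_per_transcript=1), direct_can_cost(n))
# 20 4480 8480
# 50 11200 21200
```

Under these assumptions, the probes account for most of the cost of each design iteration.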
[0769] Indirect Competitive PCR
[0770] Instead of direct competition between a probed target and a probed competitor, an unprobed target can simply mediate the competition between competitor polynucleotides. Because both primers are necessary for exponential amplification of a given target, replication can be arrested by depletion of only one primer. So, we can design a synthetic target, REFH, that shares one primer with a natural sequence, WT, and its second primer with a second synthetic target, REFF (
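The primer-sharing chain can be illustrated with a toy simulation. The primer names P1 to P4, the yield rule (each template's per-cycle yield scales with the remaining fraction of its scarcer primer), and all copy numbers are hypothetical modelling assumptions. In this model, WT uses P1 + P2, REFH uses P2 + P3, and REFF uses P3 + P4. More WT depletes P2 sooner, which stalls REFH and leaves P3 for REFF, so the REFF/REFH ratio rises with WT even though WT itself carries no probe.

```python
def simulate_indirect(wt0, refh0=1e3, reff0=1e3, primer0=1e12, eff=0.95, cycles=40):
    """Toy model of indirect competition (primer names are assumptions).
    Returns final (REFH, REFF) copy numbers."""
    uses = {"WT": ("P1", "P2"), "REFH": ("P2", "P3"), "REFF": ("P3", "P4")}
    copies = {"WT": wt0, "REFH": refh0, "REFF": reff0}
    primers = {name: primer0 for name in ("P1", "P2", "P3", "P4")}
    for _ in range(cycles):
        # Proposed yield, limited by the scarcer of each template's primers.
        new = {t: copies[t] * eff * min(primers[p] / primer0 for p in uses[t])
               for t in uses}
        # Total demand on each primer, and how far it must be scaled back.
        demand = {p: sum(new[t] for t in uses if p in uses[t]) for p in primers}
        limit = {p: min(1.0, primers[p] / demand[p]) if demand[p] > 0 else 1.0
                 for p in primers}
        for t, pair in uses.items():
            made = new[t] * min(limit[p] for p in pair)
            copies[t] += made
            for p in pair:
                primers[p] = max(0.0, primers[p] - made)
    return copies["REFH"], copies["REFF"]

for wt in (0.0, 1e4, 1e6):
    refh, reff = simulate_indirect(wt)
    print(f"WT = {wt:.0e}: REFF/REFH = {reff / refh:.3g}")
```

Without WT the two synthetic targets stay balanced, and the more WT is present, the further the balance tips towards REFF.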
[0771] Case Study: Diagnosing Cancer with an Indirect CAN
[0772] A promising avenue for early cancer diagnosis, or for monitoring cancer treatment, is the detection of tumour-derived DNA in the bloodstream (circulating tumour DNA, ctDNA): chromosomal fragments shed by tumour cells as they die. ctDNA is distinguishable from the ordinary milieu of cell-free DNA (cfDNA) through specific mutations, such as single nucleotide polymorphisms (SNPs) or insertion-deletions (indels). By detecting known pathogenic mutations, we may be able to diagnose someone before the tumour shows up on a scan. We can also look for ctDNA during or after treatment, to see whether the patient is responding or whether the cancer has come back. The difficulty is that these variants are present at much lower concentrations than the corresponding natural sequence. Furthermore, a single base change is hard to differentiate using ordinary PCR (indels are easier, so we focus on SNPs with the understanding that whatever works for SNPs will work even better for indels). While in some cases specific mutations can inform treatment decisions (namely susceptibility or resistance to targeted treatments), in general only the total ctDNA burden is needed, and any of numerous mutations can act as a proxy for that total, making this a good application for CANs.
[0773] To use CANs for ctDNA detection, we will adapt Blocker Displacement Amplification (Wu et al., 2017), a published approach for preferentially amplifying variant alleles over the corresponding wild-type (
[0774] Higher-Order Competitive Networks
[0775] The flexibility of the indirect CAN allows incorporation of multiple natural targets in a single closed network, enabling non-linear analysis of target combinations. For example,
[0776] The CANs shown above are limited in their response to a given target; the output is always monotonic or at least unimodal with regard to the target concentration. However, we can further exploit the additive nature of fluorescent signals by redundantly targeting a single sequence. Gene transcripts are typically several thousand nucleotides long, while only 50–300 nucleotides are needed for a PCR target. Accordingly, we can design independent CANs, each targeting a different region of the same sequence. Their outputs will stack, producing powerful emergent behaviour. From a mathematical point of view, the individual networks become a library of “basis functions” from which, in principle, any response relationship can be built, limited only by the number of target regions available within a given sequence.
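The basis-function view can be illustrated by summing a few signed, monotonic curves. Each sub-network's signal is modelled as a hypothetical sigmoid in log10 target concentration, and the sign (+1 or −1) is assumed to stand for the dye in which that sub-network reports. Four such terms, each standing for a CAN aimed at a different region of the same transcript, already give a response with two separate peaks, which no single monotonic or unimodal network can produce.

```python
import math

def sigmoid_basis(log_conc, midpoint, steepness=8.0):
    """Hypothetical stand-in for one sub-network's signal versus log10 target
    concentration: a monotonic sigmoid centred on `midpoint`."""
    return 1.0 / (1.0 + math.exp(-steepness * (log_conc - midpoint)))

def stacked_response(log_conc, terms):
    """Sum signed basis functions; the sign stands for the reporting dye."""
    return sum(sign * sigmoid_basis(log_conc, mid) for sign, mid in terms)

# Four sub-networks targeting different regions of the same transcript.
terms = [(+1, 1.0), (-1, 2.0), (+1, 3.0), (-1, 4.0)]
grid = [i / 100 for i in range(501)]  # log10 concentration from 0 to 5
values = [stacked_response(u, terms) for u in grid]
peaks = [grid[i] for i in range(1, len(values) - 1)
         if values[i - 1] < values[i] > values[i + 1]]
print(peaks)
```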
[0777] Case Study: Dilution-Agnostic Comparator with a Redundant CAN
[0778] Biosensing faces a bit of a paradox: variation in the concentration of a biomolecule is used to infer disease state, yet there are many non-biological reasons a sample could vary in the concentration of targets. The patient could be more or less hydrated than expected, the sample volume could be inaccurate, or simple statistics could lead to variation in the number of cells obtained. A classic approach to accommodate these uncertainties is the use of an internal standard, something innate to the sample that shouldn't vary with disease condition. For analysis of RNA, this internal standard is typically a “housekeeping” gene, a transcript so fundamental to the growth of a cell (controlling the cytoskeleton or cell membrane metabolism, for example) that its concentration reflects only the number of cells analysed rather than their state. The concentration of the gene transcripts of real interest can be compared to that of the housekeeping gene(s) to produce a more reliable measure of their deviation from normality. Typically, such comparisons rely on either separate PCR reactions performed in parallel or multiple probes within a single reaction; in either case, this becomes very time-, resource-, and sample-intensive if, say, 16 genes of interest and 5 housekeeping genes are needed, with extensive post-processing required. Redundant targeting of indirect CANs offers a way to perform this calculation explicitly, on the molecular level, so the reported signal reflects the relative concentrations of two genes regardless of their absolute concentrations (
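The property such a comparator is meant to have can be stated as a small function. The output depends only on log10(gene / housekeeping), so diluting both inputs by the same factor leaves it unchanged. This is an illustrative target behaviour under assumed parameters (`set_log_ratio`, `steepness`), not a model of the molecular mechanism that achieves it.

```python
import math

def comparator_signal(gene_copies, housekeeping_copies, set_log_ratio=0.0, steepness=2.0):
    """Illustrative target behaviour of a dilution-agnostic comparator:
    the output depends solely on log10(gene / housekeeping)."""
    log_ratio = math.log10(gene_copies / housekeeping_copies)
    return math.tanh(steepness * (log_ratio - set_log_ratio))

for dilution in (1, 10, 100):
    # Same 5:1 ratio at three different absolute concentrations.
    print(dilution, comparator_signal(5e4 / dilution, 1e4 / dilution))
```

All three dilutions report the same value, while reversing the ratio flips the sign of the output.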
[0779] Further Applications
[0780] Two and a half decades of gene expression analysis have identified dozens or even hundreds of potentially diagnostic expression signatures. RT-PCR, Nanostring, and RNA-seq analyses have similarly produced useful insight. In addition to the signatures described above, the following reports present promising candidates for adaptation of the CAN platform:
[0781] Sepsis antibiotic decision model, 11 Genes
[0782] Breast cancer chemotherapy decision, 70 Genes
[0783] Breast cancer diagnosis, 21 Genes
[0784] Bloodstream candidiasis, 40 Genes
[0785] Bovine Tuberculosis, 15 Genes
[0786] Bovine Mastitis, 15 Genes
[0787] The CAN platform could also solve a problem in bioprocessing, the industrial use of synthetic cells to produce a product such as a drug or to break down a material such as petrochemicals or greenhouse gases. This involves coordinating several synthetic and natural gene systems and may involve more than one population of engineered cells grown simultaneously. Currently, system performance is verified through RNA-seq or microarrays, which are expensive and time-consuming. Alternatively, engineers include genes that produce a “reporter” alongside the desired product. However, doing so consumes raw materials that could otherwise be used to produce the desired compound, while putting greater stress and uncertainty on the engineered cells. The CAN architecture would provide a way to get a snapshot of the transcriptional activity of all relevant genes simultaneously. A CAN could be designed to produce one colour if all genes are operating within a pre-specified window, and a different colour if any gene is above or below that window.
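The intended logic of such a window detector can be sketched as follows. A soft "inside the window" indicator is built from the difference of two sigmoids, and the out-of-window deficits are summed across genes, mirroring how stacked signals add. A single threshold then picks the colour. The gene names, windows, sigmoid shape and threshold are all hypothetical choices for illustration.

```python
import math

def in_window(log_conc, low, high, steepness=10.0):
    """Soft indicator, close to 1 inside [low, high] and close to 0 outside,
    built as the difference of two sigmoids (assumed shape)."""
    rise = 1.0 / (1.0 + math.exp(-steepness * (log_conc - low)))
    fall = 1.0 / (1.0 + math.exp(-steepness * (log_conc - high)))
    return rise - fall

def report_colour(profile, windows, threshold=0.5):
    """Return 'colour A' if every gene is inside its window, else 'colour B'.
    `profile` maps gene names to log10 expression; `windows` to (low, high)."""
    out_of_window = sum(1.0 - in_window(profile[g], *windows[g]) for g in windows)
    return "colour A" if out_of_window < threshold else "colour B"

windows = {"geneA": (2.0, 4.0), "geneB": (3.0, 5.0)}  # hypothetical genes
print(report_colour({"geneA": 3.0, "geneB": 4.0}, windows))  # colour A
print(report_colour({"geneA": 3.0, "geneB": 6.0}, windows))  # colour B
```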
CONCLUSION
[0788] Competitive Amplification Networks offer the potential to perform powerful calculations on a molecular level, explicitly performing analyte pattern recognition within a biosensor architecture. By leveraging the ubiquitous DNA amplification technology PCR, the CAN platform is fast, inexpensive, and, above all, easy to use. The data-driven nature of the technology is both its strength and its weakness: an adequate dataset is all that's necessary to design and test a CAN, but acquiring a sufficiently robust dataset may be a lengthy challenge. Fortunately, extensive literature exists on the topic, much of it with open-access data. The results here only begin to describe the potential of the technology; more work is needed to establish rules and algorithms for network design, target sequence selection, and experimental validation. Because the technology is at an early stage, creating a CAN is still a very manual process, but the whole process could be simplified by integrating modelling and automated instrumentation to iterate on the cycle of i) design a network for an application, ii) select competitor and primer sequences, iii) robotically assemble the competitors from building-block oligos, iv) run an appropriate number of reactions, v) compare the results against the predicted response, and vi) adjust the network or sequence design. Such a closed-loop development system will allow rapid deployment of the CAN platform for a wide range of biosensing applications.