Molecular hybridization probes for complex sequence capture and analysis
11505819 · 2022-11-22
Cpc classification
C12Q2525/161 · C12Q1/6818 · C12Q1/6811 · C12Q2563/131 · C12Q1/6883 (CHEMISTRY; METALLURGY)
International classification
C12Q1/6818 · C12Q1/6811 (CHEMISTRY; METALLURGY)
Abstract
The present disclosure describes hybridization probes modularly constructed from several oligonucleotides with a pattern of designed complementary interactions, allowing the probes to sequence-specifically capture or analyze nucleic acid target sequences that are long and/or complex.
Claims
1. A nucleic acid hybridization probe for sequence-selective binding of a target nucleic acid sequence, said target nucleic acid sequence comprising in 5′ to 3′ order a first region and a second region, said nucleic acid hybridization probe comprising: a first Complement Oligonucleotide comprising in 5′ to 3′ order a fourth region and a fifth region, the fifth region being complementary in sequence to the first region; a second Complement Oligonucleotide comprising in 5′ to 3′ order a seventh region and an eighth region, the seventh region being complementary to the second region, and the eighth region being complementary to the fourth region; and a first Protector Oligonucleotide comprising a ninth region, the ninth region being either (a) complementary to the fifth region and homologous to the first region or (b) complementary to the seventh region and homologous to the second region.
2. The nucleic acid hybridization probe of claim 1, wherein the target nucleic acid sequence further comprises a third region to the 5′ end of the first region, and wherein the first Complement Oligonucleotide further comprises a sixth region to the 3′ end of the fifth region, the sixth region being complementary to the third region.
3. The nucleic acid hybridization probe of claim 1, wherein the target nucleic acid sequence further comprises a third region to the 3′ end of the second region, and wherein the second Complement Oligonucleotide further comprises a sixth region to the 5′ end of the seventh region, the sixth region being complementary to the third region.
4. The nucleic acid hybridization probe of claim 1, wherein the ninth region is complementary to the fifth region and homologous to the first region, and wherein the nucleic acid hybridization probe further comprises a second Protector Oligonucleotide comprising a tenth region, the tenth region being complementary to the seventh region and homologous to the second region.
5. The nucleic acid hybridization probe of claim 4, wherein the first Protector Oligonucleotide further comprises an eleventh region, and wherein the second Protector Oligonucleotide further comprises a twelfth region, the twelfth region being complementary to the eleventh region.
6. The nucleic acid hybridization probe of claim 1, wherein the first Complement Oligonucleotide or second Complement Oligonucleotide is functionalized with a moiety for capture or detection.
7. The nucleic acid hybridization probe of claim 1, wherein the first Complement Oligonucleotide further comprises a sixteenth region, and wherein the nucleic acid hybridization probe further comprises a first Universal Oligonucleotide, the first Universal Oligonucleotide comprising a fifteenth region, the fifteenth region being complementary to the sixteenth region.
8. The nucleic acid hybridization probe of claim 7, wherein the first Universal Oligonucleotide is functionalized with a moiety for capture or detection.
9. The nucleic acid hybridization probe of claim 1, wherein the first Protector Oligonucleotide further comprises a fourteenth region, and wherein the nucleic acid hybridization probe further comprises a second Universal Oligonucleotide, the second Universal Oligonucleotide comprising a thirteenth region, the thirteenth region being complementary to the fourteenth region.
10. The nucleic acid hybridization probe of claim 9, wherein the second Universal Oligonucleotide is functionalized with a moiety for capture or detection.
11. The nucleic acid hybridization probe of claim 7, wherein the first Protector Oligonucleotide further comprises a fourteenth region, and wherein the nucleic acid hybridization probe further comprises a second Universal Oligonucleotide, the second Universal Oligonucleotide comprising a thirteenth region, the thirteenth region being complementary to the fourteenth region.
12. The nucleic acid hybridization probe of claim 11, wherein the first Universal Oligonucleotide and/or the second Universal Oligonucleotide are functionalized with a moiety for capture or detection.
13. The nucleic acid hybridization probe of claim 1, wherein the target nucleic acid sequence contains a trinucleotide repeat sequence, and wherein the first Complement Oligonucleotide and the second Complement Oligonucleotide collectively span a repeat threshold, wherein the target efficiently binds to the nucleic acid hybridization probe only when the target nucleic acid sequence's trinucleotide repeat number is equal to or in excess of the repeat threshold.
14. A solution comprising the nucleic acid hybridization probe of claim 1, wherein the concentration of the first Protector Oligonucleotide is between 1.01 times and 10,000 times the concentration of the first Complement Oligonucleotide.
15. A solution comprising the nucleic acid hybridization probe of claim 1, wherein the concentration of the second Complement Oligonucleotide is between 0.1 times and 10 times the concentration of the first Complement Oligonucleotide.
16. A process of formulating the nucleic acid hybridization probe of claim 1, comprising: selecting a first Complement Oligonucleotide from a pool of first Complement Oligonucleotide candidates; selecting a second Complement Oligonucleotide from a pool of second Complement Oligonucleotide candidates; selecting a first Protector Oligonucleotide from a pool of first Protector Oligonucleotide candidates; combining the first Complement Oligonucleotide, the second Complement Oligonucleotide, and the first Protector Oligonucleotide in aqueous solution; reacting the solution in temperature and buffer conditions conducive to DNA hybridization.
17. A process of formulating multiple nucleic acid hybridization probes, comprising: formulating a first nucleic acid hybridization probe through the process of claim 16 to yield a first nucleic acid hybridization probe solution; formulating a second nucleic acid hybridization probe through the process of claim 16 to yield a second nucleic acid hybridization probe solution, wherein the fourth region of the second nucleic acid hybridization probe is identical in sequence to the fourth region of the first nucleic acid hybridization probe, and wherein the eighth region of the second nucleic acid hybridization probe is identical in sequence to the eighth region of the first nucleic acid hybridization probe; and combining the first nucleic acid hybridization probe solution and the second nucleic acid hybridization probe solution in temperature and buffer conditions that are not conducive to dissociation of the hybridized duplex formed by the fourth and eighth regions of each of the first nucleic acid hybridization probe and the second nucleic acid hybridization probe.
18. A method for detecting a DNA or RNA target nucleic acid sequence from a sample, wherein the target nucleic acid sequence comprises in 5′ to 3′ order a first region and a second region, the method comprising: contacting the sample with a nucleic acid hybridization probe, wherein the nucleic acid hybridization probe comprises: a first Complement Oligonucleotide comprising: in 5′ to 3′ order a fourth region and a fifth region, wherein the fifth region is complementary in sequence to the first region; a second Complement Oligonucleotide comprising: in 5′ to 3′ order a seventh region and an eighth region, wherein the seventh region is complementary in sequence to the second region, and wherein the eighth region is complementary to the fourth region; and a first Protector Oligonucleotide comprising a ninth region, wherein the ninth region is either (a) complementary to the fifth region and homologous to the first region or (b) complementary to the seventh region and homologous to the second region.
19. A method for quantifying triplet repeat numbers in a DNA or RNA target nucleic acid sequence from a sample known to comprise a DNA or RNA target nucleic acid sequence, the method comprising: dividing the sample into at least three aliquots; contacting each aliquot with a nucleic acid hybridization probe of claim 13 wherein the nucleic acid hybridization probe for each aliquot has a different repeat threshold; observing the binding yield for each aliquot; determining or constraining target sequence numbers from relative binding yields of the aliquots to the different probes.
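The region relationships recited in claims 1 and 4 can be made concrete with a short sketch. Assuming exact Watson-Crick complementarity (the claims also cover partial complementarity), DNA-only sequences, and a freely chosen arm sequence for the fourth region, the component strands follow directly from regions 1 and 2 of the target. The names `design_probe`, `ModularProbe`, and `arm4`, and the toy sequences, are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass

_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


@dataclass
class ModularProbe:
    complement1: str  # fourth region + fifth region, 5'->3'
    complement2: str  # seventh region + eighth region, 5'->3'
    protector1: str   # ninth region, option (a): homologous to region 1
    protector2: str   # tenth region (claim 4): homologous to region 2


def design_probe(region1: str, region2: str, arm4: str) -> ModularProbe:
    """Build fully complementary component strands for a target 5'-region1-region2-3'.

    arm4 is a hypothetical arm sequence for the fourth region; the eighth region
    is taken as its reverse complement so that the two arms hybridize.
    """
    fifth = revcomp(region1)    # complementary to region 1
    seventh = revcomp(region2)  # complementary to region 2
    eighth = revcomp(arm4)      # complementary to the fourth region
    return ModularProbe(
        complement1=arm4.upper() + fifth,
        complement2=seventh + eighth,
        protector1=region1.upper(),  # complementary to fifth, homologous to region 1
        protector2=region2.upper(),  # complementary to seventh, homologous to region 2
    )


if __name__ == "__main__":
    print(design_probe(region1="ACGTTGCA", region2="GGATCCTA", arm4="TTTCCC"))
```

Because the eighth region is generated as the reverse complement of the fourth, the two Complement Oligonucleotides stay joined by their arms no matter which target regions they carry.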
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the disclosure is not limited to the specific methods and instrumentalities disclosed herein.
DETAILED DESCRIPTION
(101) The present disclosure will now be described more fully with reference to the accompanying drawings, in which some, but not all, embodiments of the subject matter of the disclosure are shown. Indeed, the subject matter may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements.
(102) Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art.
(103) The modular probe is designed for detection or capture of a target nucleic acid sequence whose sequence is at least partially known. The target sequence is divided conceptually into several regions, a region being a run of contiguous nucleotides that acts as a unit in hybridization or dissociation. In most of the present disclosure, we consider the target as comprising three regions, labeled in 5′ to 3′ order as regions 1, 2, and 3. Note that the regions may or may not directly adjoin one another; the dashed line between regions 1 and 2 in
(104) The most general instance of the modular probe comprises two Complement Oligonucleotides and a Protector Oligonucleotide (
(105) In some embodiments, Complement Oligonucleotides of the nucleic acid hybridization probes of the present disclosure can include from any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 105, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540 and 550 to any one of 560, 550, 540, 530, 520, 510, 500, 490, 480, 470, 460, 450, 440, 430, 420, 410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 45, 40, 35, 30, 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2 and 1 nucleotides. In some embodiments the Complement Oligonucleotides of the nucleic acid hybridization probes of the present disclosure can include more than 500 nucleotides. In some embodiments, the portion of the Complement Oligonucleotides complementary to a portion of the target nucleic acid sequence can include from any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 105, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540 and 550 to any one of 560, 550, 540, 530, 520, 510, 500, 490, 480, 470, 460, 450, 440, 430, 420, 410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 45, 40, 35, 30, 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2 and 1 nucleotides. In some embodiments, the portion of the Complement Oligonucleotides complementary to a portion of the target nucleic acid sequence can include more than 500 nucleotides. 
In some embodiments, any portion of a Complement Oligonucleotide that is complementary to a portion of another Complement Oligonucleotide can include from any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90 to any one of 100, 90, 80, 70, 60, 50, 45, 40, 35, 30, 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2 and 1 nucleotides. In some embodiments, the portion of the target sequence that is complementary to a portion of the nucleic acid hybridization probe that does not correspond to a complementary Protector Oligonucleotide comprises from any one of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, and 45 to any one of 50, 45, 40, 35, 30, 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2 and 1 nucleotides. In some embodiments, the portion of the target sequence that is complementary to a portion of the nucleic acid hybridization probe that does not correspond to a complementary Protector Oligonucleotide comprises more than 50 nucleotides. For example, the toehold region of the nucleic acid hybridization probe, such as region 6 in
(106) Conditionally Fluorescent Modular Probe.
(107) Many formulation protocols for preparing the modular probe from its component oligonucleotides (i.e. Complement Oligonucleotides, Protector Oligonucleotides, and Universal Oligonucleotides) are possible (examples are shown in
(109) Modular Probes with Additional Segments. The modular probes, as embodied thus far in
(110) In some embodiments, the nucleic acid hybridization probe or M-probe can include two or more segments. In some embodiments, the nucleic acid hybridization probe or M-probe can include three segments. In some embodiments, the nucleic acid hybridization probe or M-probe can include four segments. In some embodiments, the nucleic acid hybridization probe or M-probe can include 5, 6, 7, 8, 9, 10, or more than 10 segments.
(111) The probe in
(113) Quantitating Triplet Repeats. Several diseases are caused or characterized by an abnormal number of triplet repeats; examples include Huntington's disease (an excessive number of CAG repeats), Friedreich's ataxia (GAA repeats), myotonic dystrophy (CTG repeats), and fragile X syndrome (CGG repeats). Biologically, these repeats induce slipped-strand mispairing during DNA replication; slipped-strand mispairing likewise complicates or precludes many conventional DNA analysis techniques, such as Sanger sequencing, quantitative PCR, and next-generation sequencing.
(114) Here, we designed several modular probes against the HTT (huntingtin) gene sequence, each targeting a threshold number of CAG repeats (6, 9, 12, 15, 18, 21, 24, or 27) as well as the 3′ neighboring sequence. For example, a 12-repeat probe is designed to hybridize to any target sequence bearing 12 or more CAG repeats, together with the 8 nt downstream of the CAG repeats.
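The threshold logic behind these probes (and behind claims 13 and 19) can be sketched with an idealized all-or-none model: a probe with threshold k binds only when the target carries at least k repeats, so a panel of thresholds brackets the repeat number. Real binding yields are graded rather than binary, so this is a simplification; the function names and the `NNNNNNNN` flank below are hypothetical placeholders, not the HTT sequence.

```python
from typing import Dict, Optional, Sequence, Tuple

# Repeat thresholds of the CAG probes described above.
THRESHOLDS = (6, 9, 12, 15, 18, 21, 24, 27)


def expected_binding(repeats: int, thresholds: Sequence[int] = THRESHOLDS) -> Dict[int, bool]:
    """Idealized all-or-none model: a probe binds iff repeats >= its threshold."""
    return {k: repeats >= k for k in thresholds}


def constrain_repeats(bound: Dict[int, bool]) -> Tuple[int, Optional[int]]:
    """Bracket the repeat number from per-probe binding calls.

    lower is the largest threshold whose probe bound (0 if none bound); upper is
    one less than the smallest threshold whose probe did not bind (None if all bound).
    """
    hits = [k for k, did_bind in bound.items() if did_bind]
    misses = [k for k, did_bind in bound.items() if not did_bind]
    lower = max(hits) if hits else 0
    upper = min(misses) - 1 if misses else None
    if upper is not None and upper < lower:
        raise ValueError("binding pattern is inconsistent with a threshold model")
    return lower, upper


def probe_target(threshold: int, flank_3prime: str) -> str:
    """Subsequence a threshold probe is designed against: k CAG repeats plus the 3' flank."""
    return "CAG" * threshold + flank_3prime


if __name__ == "__main__":
    print(constrain_repeats(expected_binding(17)))  # (15, 17)
    print(len(probe_target(12, "NNNNNNNN")))  # 44
```

For example, a target with 17 repeats would bind the 6-, 9-, 12-, and 15-repeat probes but not the 18-repeat probe, constraining the repeat number to 15 to 17.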
(115) Combinatorial Probe Formulation. Different versions of each Segment of the modular probe can be synthesized and constructed that bind different target subsequences (
(116) For a 2-Segment probe with 3 versions for each Segment, probes can be formulated to target 9 different sequences, as depicted in
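The counting argument can be illustrated by enumerating every combination of pre-formulated Segment versions. This sketch assumes, as in claim 17, that all versions of a Segment share the same arm sequences, so any version of one Segment can be joined to any version of another; the segment and version labels are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def combinatorial_probes(versions: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Enumerate every probe formulable by picking one version per Segment.

    versions maps a Segment name to the labels of its pre-formulated versions.
    Shared arm sequences are assumed, so every choice of versions is compatible.
    """
    segments = list(versions)  # dicts preserve insertion order (Python 3.7+)
    return list(product(*(versions[s] for s in segments)))


if __name__ == "__main__":
    library = {
        "s1": ["s1-v1", "s1-v2", "s1-v3"],  # hypothetical versions of Segment 1
        "s2": ["s2-v1", "s2-v2", "s2-v3"],  # hypothetical versions of Segment 2
    }
    probes = combinatorial_probes(library)
    print(len(probes))  # 9
    print(probes[0])    # ('s1-v1', 's2-v1')
```

With v versions of each of n Segments, v × n Segment formulations yield v^n distinct probes; for the 2-Segment, 3-version case that is 6 formulations for 9 probes.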
(117) Modular Probe Structure Variations. In addition to the embodiments shown previously in
(118) The probe depicted in
(119) Language Exactness. Unless explicitly stated otherwise, “complementary” in this document means “partially or fully complementary”. Two sequences are defined to be “partially complementary” when over 80% of the aligned nucleotides of one sequence are complementary to the corresponding nucleotides of the other sequence.
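The 80% criterion can be checked mechanically. The sketch below assumes a gap-free, end-to-end antiparallel alignment of two equal-length DNA sequences, which the definition does not itself specify; `complementary_fraction` and `is_partially_complementary` are hypothetical names.

```python
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_fraction(top: str, bottom: str) -> float:
    """Fraction of aligned positions that form Watson-Crick pairs.

    Both sequences are written 5'->3' and aligned antiparallel, so top[i]
    pairs with bottom[-1 - i]. A gap-free alignment of equal lengths is assumed.
    """
    if not top or len(top) != len(bottom):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(_PAIR[a] == b for a, b in zip(top.upper(), reversed(bottom.upper())))
    return matches / len(top)


def is_partially_complementary(top: str, bottom: str) -> bool:
    """Over 80% of aligned nucleotides complementary (full complementarity also passes)."""
    return complementary_fraction(top, bottom) > 0.8


if __name__ == "__main__":
    print(complementary_fraction("ACGTACGTAC", "GTACGTACGT"))  # 1.0
    print(is_partially_complementary("ACGTACGTAC", "ATACGTACGT"))  # True (9 of 10)
```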
EXAMPLES
(120) The present invention is demonstrated in the following examples. It is understood that the following methods apply, that the examples are for illustrative purposes only, and that the invention is not intended to be limited thereto.
(121) Methods
(122) Oligo synthesis and storage conditions. Oligonucleotides used in this study were purchased from Integrated DNA Technologies (IDT). Depending on oligo length, modifications, and sequence, each oligo was ordered either with standard desalting or with post-synthesis PAGE or HPLC purification. All oligos were sequence verified by IDT via mass spectrometry; purified oligos and gBlock gene fragments were also subjected to size verification by capillary electrophoresis. The sequence and purification method of each oligo can be found in Tables 8-21. With the exception of Ultramer oligos and gBlock gene fragments, all oligos were pre-suspended by IDT in Tris-EDTA (TE) buffer (pH 8.0) at roughly 100 μM; stock solutions were stored at 4 °C until use.
(123) VDJ recombination sequence selection and hybridization target design. Sequences of human T-cell receptor β variable (V), diversity (D), and joining (J) germline-encoded genes were downloaded from the IMGT/Gene-DB database (http://www.imgt.org/genedb/). In total, there are 48 functional TRBV genes (ORFs and pseudogenes excluded), 2 functional TRBD genes, and 13 functional TRBJ genes. As a proof of concept, we designed 48 VDJ recombination targets composed of the last 35 nt of 8 TRBV genes, 48 biologically occurring sequences of the region between V and J, and the first 35 nt of 6 TRBJ genes. Then, based on the distribution of deletion numbers observed in biology, the 3′ end of each V sequence was deleted by 0 to 7 nucleotides, and the 5′ end of each J sequence was deleted by 0 to 10 nucleotides. A detailed description of the VDJ recombination target design is provided in text accompanying
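The target construction described above can be sketched as follows. It assumes that the deletions are applied within the 35-nt V and J windows and that all sequences are written 5′ to 3′; the function name `vdj_target` and the toy sequences are hypothetical.

```python
def vdj_target(v_gene: str, j_gene: str, middle: str, v_del: int, j_del: int,
               window: int = 35) -> str:
    """Assemble a synthetic V-(D)-J target, all sequences 5'->3'.

    Takes the last `window` nt of the V gene and the first `window` nt of the
    J gene, trims v_del nt from the 3' end of the V window and j_del nt from the
    5' end of the J window, and joins them around the between-V-J sequence.
    """
    if not (0 <= v_del <= 7 and 0 <= j_del <= 10):
        raise ValueError("deletions outside the 0-7 (V) / 0-10 (J) design ranges")
    if len(v_gene) < window or len(j_gene) < window:
        raise ValueError("gene shorter than the targeting window")
    v_part = v_gene[-window:][: window - v_del]  # 3' deletion of V
    j_part = j_gene[:window][j_del:]             # 5' deletion of J
    return v_part + middle + j_part


if __name__ == "__main__":
    toy_v = "G" * 5 + "A" * 35  # hypothetical V gene
    toy_j = "T" * 35 + "C" * 5  # hypothetical J gene
    print(len(vdj_target(toy_v, toy_j, "CCC", v_del=2, j_del=3)))  # 68
```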
(124) M-Probe formulation and strand stoichiometry. For all probes targeting non-repetitive sequences, 1 μM M-Probe stock solutions were formulated by mixing together all the component strands in a specified ratio, to minimize the formation of multi-stranded complexes that poison the reaction (
(125) For all probes targeting repetitive sequences, 1 μM M-Probe stock solutions were formulated in two steps: individual segments were formulated separately and then combined, to avoid probe malformation. To formulate n=1 probes (e.g. the MP-12 shown in
(126) Protocol for time-based fluorescence measurement. Time-based fluorescence traces shown in
(127) For data acquisition, the excitation and emission wavelengths were set to 582 nm and 600 nm, respectively, to generate an optimal fluorescence signal for the ROX fluorophore in our buffer. Slit sizes were set at 4 nm for both excitation and emission, and the integration time was 10 seconds (per cuvette) with a 60-second integration interval. Reaction temperature during fluorescence measurement was controlled by an external water bath (Thermo Fisher Scientific). Experimental data were exported to a text file, which was subsequently imported and plotted using MATLAB scripts. Time t=0 corresponds to the first data point acquired after addition of the target solutions.
(128) TABLE 1. Component strand stoichiometric ratios of the M-Probes used in each experiment. Ratios are ordered counterclockwise from the lower universal strand to the upper universal strand.
Internal FIG. | Experiment | n (Segments) | Stoichiometric ratio
1 | Basic Validation | 1 | 1:1.1:1.1:2.1:2.1:2.1
2 | Programmed Sequence Variation Tolerance | 1 | 1:1.1:1.1:6.1:6.1:6.1
3 | Combinatorial Construction | 1 | 1:1.1:1.1:2.1:2.1:2.1
4 | Long M-Probe 1 | 3 | 1:1.2:1.2:1.2:1.2:2.2:2.2:2.2:2.2:2.2
4 | Long M-Probe 2 | 3 | 1:1.2:1.2:1.2:1.2:2.2:2.2:2.2:2.2:2.2
5 | CAG Repeat Detection MP-6, MP-9 | 0 | 1:1.1:2.1:2.1
5 | CAG Repeat Detection MP-12, MP-15 | 1 | 1:1.1:1.1:2.1:2.1:2.1
5 | CAG Repeat Detection MP-18, MP-21 | 2 | 1:1.1:1.1:1.1:2.1:2.1:2.1:2.1
5 | CAG Repeat Detection MP-24, MP-27 | 3 | 1:1.1:1.1:1.1:1.1:2.1:2.1:2.1:2.1:2.1
5 | CAG Repeat Capture MP-9 | 0 | 1:1.2:2.2:2.5
5 | CAG Repeat Capture MP-27 | 3 | 1:1.2:1.4:1.5:1.6:2.6:2.7:2.8:3.0:3.4
5 | CAG Repeat Capture MP-33 | 1 | 1:1.2:1.4:2.4:2.6:3.0
5 | CAG Repeat Capture MP-35 | 1 | 1:1.2:1.4:2.4:2.6:3.0
5 | CAG Repeat Capture MP-36 | 1 | 1:1.2:1.4:2.4:2.6:3.0
5 | CAG Repeat Capture MP-37 | 1 | 1:1.2:1.4:2.4:2.6:3.0
5 | CAG Repeat Capture MP-39 | 1 | 1:1.2:1.4:2.4:2.6:3.0
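The Table 1 ratios translate into pipetting volumes through C1V1 = C2V2. The sketch below assumes that the first strand in each ratio (the lower universal strand) is the limiting strand, set to the 1 μM probe concentration, that every strand stock is at the roughly 100 μM concentration mentioned above, and a hypothetical 100 μL final volume; the helper names are illustrative only.

```python
from typing import List


def parse_ratio(ratio: str) -> List[float]:
    """Parse a Table 1 ratio string such as '1:1.1:1.1:2.1:2.1:2.1'."""
    return [float(x) for x in ratio.split(":")]


def strand_volumes_ul(ratio: str, probe_um: float = 1.0, stock_um: float = 100.0,
                      final_ul: float = 100.0) -> List[float]:
    """Volume (uL) of each strand stock, in the order listed in the ratio.

    The first entry (lower universal strand) is taken as the limiting strand at
    probe_um; every other strand is scaled by its ratio. Uses C1*V1 = C2*V2.
    """
    return [probe_um * r * final_ul / stock_um for r in parse_ratio(ratio)]


if __name__ == "__main__":
    vols = strand_volumes_ul("1:1.1:1.1:2.1:2.1:2.1")  # basic validation probe
    print([round(v, 2) for v in vols])
    print(round(100.0 - sum(vols), 2), "uL of buffer to reach 100 uL")
```

For the basic validation probe this gives 1.0, 1.1, 1.1, 2.1, 2.1, and 2.1 μL of the six stocks, with the remaining 90.5 μL made up with buffer.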
(129) Protocol for equilibrium fluorescence measurement. Equilibrium fluorescence signal shown in
(130) After incubation, 10 μL of each M-Probe and target mixture was pipetted into 96-well PCR plates (Thermo Fisher Scientific), which were subsequently sealed. Following a 30 min incubation step at 37 °C in a PCR machine, 30 consecutive data points (30 seconds per data point) were collected from each well.
(131) Experimental data were collected and exported as Excel files, and subsequently analyzed and plotted using MATLAB scripts. The analysis included a correction of the fluorescence signal for position biases. A detailed description of the data analysis procedure can be found in
(132) Protocol for asymmetric PCR. Asymmetric PCR was applied to generate hybridization targets used in
(133) We prepared our PCR reaction mix by combining 10× PCR buffer (with Mg²⁺, Sigma-Aldrich), dNTP mix (prepared from dATP, dTTP, dCTP, and dGTP stocks, Sigma-Aldrich), forward primer, reverse primer, Taq polymerase (Sigma-Aldrich), template solution, and Milli-Q H₂O. The total reaction volume was 50 μL in a 0.7 mL Eppendorf PCR tube, as shown in Table S0-2. The tubes containing the reaction mixtures were placed into one of three Eppendorf MasterCycler Personal thermocyclers and amplified following the PCR program listed in Table 3.
(134) Protocol for selective capture of long triplet repeats. CAG repeats in the HTT gene of 7 genomic DNA samples (NA18537, NA18524, NA20245, NA20248, NA20208, NA20209, and NA20210) were first amplified using a 5-cycle PCR procedure (TABLE 4, TABLE 5). All genomic DNA samples were purchased from Coriell as reference templates for validating our technology. NA20245, NA20248, NA20208, NA20209, and NA20210 have known CAG repeat lengths, while NA18537 and NA18524 have unknown CAG repeat lengths. gDNA samples were first quantitated with a Nanodrop 2000c spectrophotometer (Thermo Fisher Scientific). Various amounts of template solution were then used to prepare the PCR mixtures.
(135) TABLE 2. Asymmetric PCR reaction mixture formulation.
Reagent | Working stock concentration | Final concentration | Add-in volume (μL)
10× PCR buffer | 10× | 1× | 5
dNTP (each) | 2.5 mM | 200 μM | 4
Primer mix | fP 5 μM, rP 500 nM | fP 1 μM, rP 100 nM | 10
Taq polymerase | 0.5 Unit/μL | 0.1 Unit/μL | 10
Template | 100 ng/μL | 10 ng/μL | 5
H₂O | - | - | 16
Total volume | | | 50
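The add-in volumes in Table 2 follow from C1V1 = C2V2 for a 50 μL reaction, with water making up the remainder. The sketch below recomputes them as a consistency check; it takes the primer-mix volume from the forward primer (the reverse primer requires the same 10 μL), and the dictionary keys are only labels.

```python
def add_volume_ul(stock: float, final: float, total_ul: float) -> float:
    """C1*V1 = C2*V2 solved for V1; stock and final must use the same units."""
    return final * total_ul / stock


TOTAL_UL = 50.0

# Table 2 rows: (working stock, final concentration), same units within a row.
TABLE2 = {
    "10x PCR buffer": (10.0, 1.0),   # fold concentration
    "dNTP (each)": (2500.0, 200.0),  # uM (2.5 mM stock)
    "primer mix": (5.0, 1.0),        # uM, forward primer
    "Taq polymerase": (0.5, 0.1),    # Unit/uL
    "template": (100.0, 10.0),       # ng/uL
}

volumes = {name: add_volume_ul(s, f, TOTAL_UL) for name, (s, f) in TABLE2.items()}
water_ul = TOTAL_UL - sum(volumes.values())

# The reverse primer (500 nM -> 100 nM) needs the same volume, so one mix works.
assert abs(add_volume_ul(500.0, 100.0, TOTAL_UL) - volumes["primer mix"]) < 1e-9

if __name__ == "__main__":
    for name, vol in volumes.items():
        print(f"{name}: {vol:.1f} uL")
    print(f"H2O: {water_ul:.1f} uL")
```

The 1 μM to 100 nM forward-to-reverse primer ratio is what makes the reaction asymmetric: once the limiting reverse primer is consumed, further cycles extend mainly the forward primer, so single-stranded product accumulates.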
(136) TABLE 3. Thermocycler asymmetric PCR program.
Step | Temperature | Duration
1. Initial denaturation | 95 °C | 3 min
2. Denaturation | 95 °C | 15 s
3. Annealing | 60 °C | 2 min
Repeat steps 2 to 3 for 70 cycles.
6. Hold | 10 °C | -
(137) TABLE 4. 5-cycle PCR reaction mix formulation.
Reagent | Working stock concentration | Final concentration | Add-in volume (μL)
10× PCR buffer | 10× | 1× | 10
dNTP (each) | 2 mM | 200 μM | 10
Primer mix | fP 0.5 μM, rP 0.5 μM | fP 0.2 μM, rP 0.2 μM | 40
Taq polymerase | 0.5 Unit/μL | 0.1 Unit/μL | 20
gDNA samples | Varies | 7.5 ng/μL | Varies
1× TE | - | - | Varies
Total volume | | | 100
(138) TABLE 5. 5-cycle PCR program.
Step | Temperature | Duration
1. Initial denaturation | 95 °C | 4 min
2. Denaturation | 95 °C | 30 s
3. Annealing | 60 °C | 2 min
Repeat steps 2 to 3 for 5 cycles.
4. Final extension | 72 °C | 2 min
5. Hold | 10 °C | -
(139) After PCR, the 100 μL reaction product samples were column purified, and each was eluted in 90 μL Milli-Q water. 15 μL of the elution product was denatured at 95 °C for 10 min and then mixed with 15 μL 2× PBS and 15 μL of pre-annealed 600 μM probe solution in 1× PBS containing one of the following capture probes: MP-9, MP-27, MP-33, MP-35, MP-36, MP-37, or MP-39. This formed a 45 μL hybridization reaction mixture (probe final concentration 200 μM). Here, the universal strands of the M-Probes are not functionalized with fluorophore and quencher. Instead, the 5′ end of the lower universal strand is functionalized with a biotin moiety, so that DNA molecules bound to the probe can subsequently be separated using streptavidin-functionalized magnetic beads. The mixtures were allowed to react overnight (12 to 18 hours) at 37 °C.
(140) Before using beads to capture bound DNA, 10 μL of Dynabeads MyOne Streptavidin T1 magnetic bead solution was aliquoted for each reaction, washed three times in 1× PBS, and resuspended in 65 μL 1× PBS. Then, 45 μL of each incubated sample was transferred into a tube containing prepared beads. After thorough mixing, the tubes were incubated at 37 °C for 1 hour with constant shaking (450 rpm). The supernatant containing unbound DNA was removed, and strands captured on the bead surface were subsequently released by incubating the beads in 25 μL Milli-Q water at 95 °C for 10 minutes. The eluted solutions were then quantified by qPCR using the program shown in Table 6. qPCR was performed in triplicate on a Bio-Rad CFX96 machine.
(141) TABLE 6. qPCR program.
Step | Temperature | Duration
1. Initial denaturation | 95 °C | 3 min
2. Denaturation | 95 °C | 5 s
3. Annealing | 60 °C | 1 min
Repeat steps 2 to 3 for 40 cycles.
(142) M-Probe Design Principle
(143) M-Probe reaction mechanism. Conceptually, the M-Probe can be thought of as a multi-stranded equivalent of the toehold probe, in which the probe and protector sequences are distributed across multiple oligonucleotides connected by arms. The upper strands collectively form the protector, and the bottom strands collectively form the probe. Upon hybridization with the target, the protector complex (the upper strands) will dissociate from the probe complex (the bottom strands) through strand displacement.
(144) The mechanism of the hybridization reaction between an M-Probe and its target is illustrated in
(145) As depicted in
(146) Design of M-Probe thermodynamics. The standard free energy of the hybridization reaction between the M-Probe and its intended target can be calculated based on literature parameters, and is illustrated in
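$$\Delta G^\circ_{\mathrm{rxn}} = \Delta G^\circ_{\mathrm{Toe}} + \Delta G^\circ_{\mathrm{ML3}} - \Delta G^\circ_{\mathrm{NH1}} - \Delta G^\circ_{\mathrm{NH2}} - \Delta G^\circ_{\mathrm{ML1}} - \Delta G^\circ_{\mathrm{ML2}} - \Delta G^\circ_{\mathrm{label}}$$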
(147) The ΔG°_Toe term denotes the standard free energy of binding of the toehold; ΔG°_NH1 and ΔG°_NH2 are the standard free energies of the non-homologous regions; ΔG°_ML1, ΔG°_ML2, and ΔG°_ML3 are the estimated standard free energies of the multiloops formed at the junctions of different hybridized regions; and ΔG°_label is the estimated standard free energy difference between the fluorophore in close proximity to the quencher and the free fluorophore in solution. The standard free energies of hybridization between regions are calculated based on the nearest-neighbor model.
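As an illustration of how such region energies can be estimated, the sketch below sums nearest-neighbor stacking free energies for a perfectly matched DNA duplex. The parameter values are the unified ΔG°37 set (1 M Na+) as we recall it from SantaLucia (1998) and should be checked against the original table; the parameter set, salt correction, and treatment of region boundaries used in this work are not specified here. Mismatches, dangling ends, and the symmetry correction are omitted, and for a region embedded in a longer duplex one would typically sum only the stacks rather than add the two initiation terms.

```python
# Unified nearest-neighbor dG37 values (kcal/mol) as recalled from SantaLucia (1998);
# each key is a 5'->3' dinucleotide on the top strand. Verify before relying on them.
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}
INIT_GC = 0.98  # initiation penalty per terminal G-C pair
INIT_AT = 1.03  # initiation penalty per terminal A-T pair


def duplex_dg37(seq: str) -> float:
    """dG37 (kcal/mol) of seq paired with its exact complement.

    No mismatch, dangling-end, salt, or self-complementary symmetry corrections.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two nucleotides")
    dg = sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))  # stacking terms
    for end in (seq[0], seq[-1]):  # one initiation term per duplex end
        dg += INIT_GC if end in "GC" else INIT_AT
    return dg


if __name__ == "__main__":
    print(round(duplex_dg37("ACCTG"), 2))  # -4.0
```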
(148) The vertical arm sequences are designed to be orthogonal to each other and are unlikely to bind to the human genome, because they are selected from a sequence library with low homology to human DNA (e.g. ERCC external RNA controls). The vertical arms remain hybridized throughout the reaction with a target, so the calculation of ΔG°_rxn does not explicitly consider these regions.
(149) M-Probe formulation stoichiometry. We typically formulate the M-probe using a stoichiometric ratio of component strands such that the quantity of each individual strand increases in a counterclockwise fashion from the lower-left corner (the fluorophore-labeled uC strand, see
(150) M-Probe formulation yield. To show the efficiency and yield of M-Probe preparation, we prepared a basic validation probe (used in
(151) Stability of the M-probes. To evaluate the stability of the M-Probe, we performed a basic probe validation experiment (same as that shown in
(152) Comparison of M-Probe to toehold probe and X-Probe. Toehold probes, X-Probes (an n=0 M-Probe), and M-Probes all follow the same design principles and exhibit high sequence specificity when the reaction ΔG° is approximately 0. To study whether there are systematic differences between these three implementations, we tested one of each design against the same synthetic target DNA sequence (
(153) The three designs qualitatively produced similar results, but the M-Probe and X-Probe exhibited a higher background signal in the absence of added target. The higher background is likely because the multiloop near the fluorophore and quencher increases the probability that the fluorophore and quencher are separated by a distance greater than the Förster radius of the fluorophore. Additionally, the toehold probe exhibited lower fluorescence from the single-nucleotide-variant targets than the M-Probe and X-Probe; this is likely because the ΔG°_m term was underestimated for the M-Probe and the X-Probe, resulting in a reaction ΔG° significantly more negative than calculated and hence lower specificity. We believe that optimizing the M-Probe and X-Probe sequences to shorten the toehold would correct this difference from the toehold probe.
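The link between a near-zero reaction ΔG° and specificity can be illustrated with a simple two-state equilibrium model of X + PC ⇌ XC + P, where X is the target, PC the probe-protector complex, XC the probe-target product, and P the released protector. The sketch assumes ideal behavior and a hypothetical +3 kcal/mol penalty for a single-nucleotide mismatch; the concentrations are illustrative, not the values used in these experiments, and `equilibrium_yield` is a hypothetical helper.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def equilibrium_yield(dg_rxn: float, target: float, probe: float, protector: float,
                      temp_c: float = 37.0) -> float:
    """Fraction of target bound at equilibrium for X + PC <=> XC + P.

    dg_rxn is the standard reaction free energy (kcal/mol); target, probe (PC),
    and initial free protector share one concentration unit (e.g. nM). Solves
    y*(P0 + y) = K*(X0 - y)*(PC0 - y) for y = [XC] by bisection; the difference
    of the two sides increases monotonically on [0, min(X0, PC0)].
    """
    k = math.exp(-dg_rxn / (R_KCAL * (temp_c + 273.15)))
    lo, hi = 0.0, min(target, probe)

    def f(y: float) -> float:
        return y * (protector + y) - k * (target - y) * (probe - y)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / target


if __name__ == "__main__":
    for dg in (0.0, 3.0):  # matched target vs. hypothetical mismatch penalty
        print(dg, round(equilibrium_yield(dg, target=10, probe=100, protector=100), 3))
```

With ΔG° = 0 the matched target reaches an intermediate yield that drops sharply with a few kcal/mol of mismatch penalty, whereas with a strongly negative ΔG° both the matched and the mismatched target approach complete binding and discrimination is lost.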
(154) M-Probe universal segment design. The universal segment u of the M-Probe is typically functionalized with one or more chemical moieties to facilitate detection or enrichment of the targets of interest. The two oligonucleotides that comprise the universal segment may optionally possess a horizontal region of complementarity, in addition to the required arm sequences that connect them to segment s1. The X-Probe, a special case of the M-Probe with n=0, was used to study the effects of the length of the complementary region in the universal segment.
Example 2
(156) VDJ Target Design.
(157) We designed synthetic VDJ hybridization targets based on published VDJ combination sequences of T lymphocytes from peripheral blood. A total of 33,664 CDR3 clonotype sequences, assembled from sequencing data of pooled peripheral T-cell mRNA from 380 males and 170 females, were analyzed, and the 22,704 sequences with unambiguous assignments of both V and J were used in further analysis.
(158) To determine the number of deletions near the 3′ end of the V region and the 5′ end of the J region, V and J sequences observed in CDR3 clonotypes were compared with the corresponding germline-encoded V and J gene sequences downloaded from IMGT/Gene-DB (http://www.imgt.org/genedb/). In this dataset, deletions at the 3′ end of the V segment can be up to 13 bases, and deletions at the 5′ end of the J segment can be up to 25 bases (
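A simple way to count such deletions is to measure how much of the germline sequence survives intact at each junction. The sketch below assumes the observed sequence is already anchored to the germline gene (at its 5′ end for V, at its 3′ end for J) and uses exact matching; because non-templated bases can coincidentally match the germline, it may underestimate deletions. `v_deletion` and `j_deletion` are hypothetical functions, not the authors' analysis code.

```python
def v_deletion(germline_v: str, observed: str) -> int:
    """3' deletions of V: germline length minus the longest shared prefix.

    observed (5'->3') is assumed to start at the same position as germline_v.
    """
    kept = 0
    for g, o in zip(germline_v, observed):
        if g != o:
            break
        kept += 1
    return len(germline_v) - kept


def j_deletion(germline_j: str, observed: str) -> int:
    """5' deletions of J: germline length minus the longest shared suffix.

    observed (5'->3') is assumed to end at the same position as germline_j.
    """
    kept = 0
    for g, o in zip(reversed(germline_j), reversed(observed)):
        if g != o:
            break
        kept += 1
    return len(germline_j) - kept


if __name__ == "__main__":
    print(v_deletion("ACGTACGT", "ACGTAGGGTTT"))  # 3
    print(j_deletion("TTGGCCAA", "GGGACCAA"))     # 4
```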
(159) According to IMGT/Gene-DB, the 2 germline-encoded functional TRBD genes are very short, 12 and 16 nt, respectively. Thus, after substantial base deletions and insertions, the origin of the D segment is often unidentifiable. In our analysis, we treated the non-templated bases and the remaining D gene sequence together as ‘sequence between V-J’, without distinguishing the two. The results show that the length of the sequence between V and J ranges from 0 to 44 bases (
(160) We designed 48 VDJ recombination targets based on 8 arbitrarily chosen TRBV genes, 48 biologically occurring sequences between V-J, and 6 arbitrarily chosen TRBJ genes. Therefore, 6 targets were assigned to each V segment, and 8 targets were assigned to each J segment. The distribution of biologically observed deletions within the chosen V and J segments is shown in
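The combinatorial bookkeeping above can be checked with a short Python sketch. It pairs every TRBV module with every TRBJ module, using the V and J labels that appear in the oligonucleotide tables of Example 7; treating the design as a full-factorial pairing is our reading of those tables rather than a separately stated rule. It also counts how many target-specific module pairs (8 s1 pairs plus 6 t pairs) suffice to assemble all 48 probes.

```python
from collections import Counter
from itertools import product

# TRBV and TRBJ module labels as they appear in the oligonucleotide tables.
V_GENES = ["V2", "V3-1", "V4-1", "V5-1", "V6-1", "V10-1", "V11-1", "V12-5"]
J_GENES = ["J1-1", "J1-2", "J1-3", "J2-1", "J2-2", "J2-3"]


def design_targets(v_genes, j_genes):
    """Full-factorial pairing: one V/J target (and one probe) per combination."""
    return [f"{v}/{j}" for v, j in product(v_genes, j_genes)]


targets = design_targets(V_GENES, J_GENES)
per_v = Counter(t.split("/")[0] for t in targets)  # targets per V segment
per_j = Counter(t.split("/")[1] for t in targets)  # targets per J segment
modules = len(V_GENES) + len(J_GENES)  # distinct s1 and t module pairs to synthesize

print(len(targets), set(per_v.values()), set(per_j.values()), modules)
```

Because each s1 module is reused across 6 probes and each t module across 8, the number of target-specific oligo pairs grows with the sum, not the product, of the V and J panel sizes.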
Example 3
(161) VDJ Probe Design
(162) Germline-encoded D genes are very short, and D gene usage in mature T cells is often unidentifiable due to substantial base deletions and random insertions. Consequently, we designed n=1 M-Probes whose s1 and t segments target only V and J germline gene subsequences that are unlikely to be deleted during VDJ recombination. When the matching target DNA sequence binds to the M-Probe, a bulge forms at the junction between the s1 and t segments. The bulge includes all the bases of the remaining D region, as well as random deletions and non-templated insertions at the V-D and D-J junctions. The targeting region of the VDJ probes was designed to cover only the sequence from the 35th base from the 3′ end of V to the 35th base from the 5′ end of J, because sequences upstream and downstream of CDR3 (5′ of V and 3′ of J) are usually conserved and therefore not informative in this context.
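(163)
$$\Delta G^\circ_{\mathrm{rxn}} = \Delta G^\circ_{\mathrm{Toe}} + \Delta G^\circ_{\mathrm{Bulge}} - \Delta G^\circ_{\mathrm{NH}} - \Delta G^\circ_{\mathrm{ML1}} - \Delta G^\circ_{\mathrm{label}} \approx -9.5 + (+8.0) - (-3.0) - (+4.0) - (-1.5) \approx -1.0\ \mathrm{kcal/mol}$$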
(164) We estimated the free energy of the bulge (including the multiloop penalty) in the product to be roughly +8.0 kcal/mol, and that of the fluorophore-quencher interaction to be −1.5 kcal/mol. The standard model of DNA hybridization indicates a logarithmic dependence of energy on bulge length, so ΔG° values should not deviate greatly between different target sequences binding the same M-Probe, except when a target has significant secondary structure. We then designed the toehold and non-homologous regions so that the overall reaction energy is slightly more negative than 0 kcal/mol. This way, probes maintain good specificity against mutations in the V and J segments while also tolerating the larger bulge domains formed at the junction. As a result, although some bulge sequences are over 30 nt long, fluorescence response curves showed that these targets still react with M-Probes reasonably quickly (
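The free-energy bookkeeping above can be reproduced with a minimal Python sketch. `reaction_free_energy` sums the five estimates with the signs used in the ΔG°_rxn equation. `bulge_penalty` illustrates the logarithmic length dependence of bulge energy; its coefficient (a 2.44·RT·ln(n/n_ref) extrapolation) and the 15 nt reference bulge are assumptions chosen for illustration, not values from this disclosure.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def reaction_free_energy(dg_toe, dg_bulge, dg_nh, dg_ml1, dg_label):
    """Sum the dG_rxn terms (kcal/mol) with the signs used in the equation above."""
    return dg_toe + dg_bulge - dg_nh - dg_ml1 - dg_label


def bulge_penalty(length, ref_length, ref_dg, coeff=2.44, temp_k=310.15):
    """Extrapolate a bulge/loop penalty logarithmically from a reference length.

    ASSUMPTION: coeff=2.44 mirrors a common nearest-neighbor loop
    extrapolation; check it against the parameter set actually in use.
    """
    return ref_dg + coeff * R_KCAL * temp_k * math.log(length / ref_length)


dg = reaction_free_energy(-9.5, 8.0, -3.0, 4.0, -1.5)
print(f"dG_rxn ~ {dg:.1f} kcal/mol")

# Doubling a hypothetical 15 nt bulge to 30 nt shifts the penalty by only ~1 kcal/mol.
shift = bulge_penalty(30, 15, 8.0) - bulge_penalty(15, 15, 8.0)
print(f"penalty shift for a 15 -> 30 nt bulge: {shift:.2f} kcal/mol")
```

The small shift for a twofold longer bulge is the quantitative reason a single M-Probe design can tolerate junction bulges of widely varying length.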
Example 4
(166) End-Point Fluorescence Measurement for VDJ M-Probe Reactions.
(167) In Example 3, we showed kinetic traces of VDJ M-Probe hybridization to their targets. To enable higher-throughput collection of end-point data for a large number of target-probe combinations, we used the Applied Biosystems QuantStudio 7 Flex Real-Time PCR System to measure the fluorescence of products after hybridization. Note that no polymerase enzyme was added; the instrument was used solely for temperature control and fluorescence measurement. Each of the 96 well positions exhibits a slight bias in fluorescence level, so we performed calibration experiments to correct for these systematic position biases before experimental analysis.
(169) To correct for position dependence, the average fluorescence of the entire plate is used as the reference. For each well, we performed linear regression of the reference fluorescence values against the raw fluorescence values at the four calibration concentrations, and then applied the best-fit slope and intercept to linearly transform that well's fluorescence into the equivalent reference fluorescence. Position correction of well A1 is shown as an example in
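A minimal sketch of this position correction, assuming each well's raw readings at the calibration concentrations are available as a list. The helper names and the two-well plate are hypothetical; the regression follows the procedure described above (plate average as reference, one least-squares line per well mapping raw to reference).

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def calibrate_plate(raw):
    """raw maps well ID -> raw fluorescence at each calibration concentration.

    The reference at each concentration is the plate-wide average; each well
    gets its own (slope, intercept) mapping raw readings onto that reference.
    """
    wells = list(raw)
    n_conc = len(raw[wells[0]])
    reference = [sum(raw[w][k] for w in wells) / len(wells) for k in range(n_conc)]
    return {w: fit_line(raw[w], reference) for w in wells}


def correct(value, params):
    """Convert a raw well reading into the equivalent reference fluorescence."""
    slope, intercept = params
    return slope * value + intercept


# Hypothetical two-well plate: A1 reads 20% high, B1 reads 20% low.
raw = {"A1": [120.0, 240.0, 360.0, 480.0], "B1": [80.0, 160.0, 240.0, 320.0]}
params = calibrate_plate(raw)
print(correct(360.0, params["A1"]), correct(240.0, params["B1"]))
```

Both corrected readings land on the plate reference of about 300 for the third calibration concentration, which is the behavior the linear transform is meant to provide.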
(170) Initial Experiments. Before conducting experiments of all 48 VDJ M-Probes, we first conducted a smaller-scale test on 8 M-Probes and their corresponding 8 target sequences. Every pairwise interaction between probe and target was studied, for a total of 64 reactions (
Example 5
(171) Long Targets
(172) Longer target DNA sequences are more prone to forming significant secondary structure, which may interfere with the intended hybridization to the M-Probe for both thermodynamic and kinetic reasons. For this reason, when working with genomic DNA samples we first perform PCR amplification to generate shorter amplicons, which are then hybridized to the M-Probes. Even among amplicons, however, significant secondary structure may exist for some target sequences.
(173) To demonstrate M-Probe's capability to probe long sequences, we designed respective M-Probes targeting 99, 160, 218, 430, and 560 nt (
(175) Shown in
(177) The design and performance of an M-Probe targeting a 430 nt sequence flanking SNP rs7648926 are shown in
(178) M-Probes with n≥1 have multiple target-specific segments (including t), and can circumvent oligonucleotide synthesis limitations to probe longer continuous target sequences. For example, given an oligonucleotide synthesis limit of L nucleotides (L=100 for a standard oligo, L=200 for an IDT Ultramer oligo), each of the n internal s segments can probe (L−2A) nucleotides (where A is the length of the arm sequence), and the terminal t segment can probe (L−A) nucleotides. An M-Probe with n internal segments can thus probe a maximum length L_M of
$$L_M = n(L - 2A) + (L - A) = (n+1)\cdot L - (2n+1)\cdot A$$
continuous nucleotides. The above equation makes clear that the M-Probe benefits from shorter arm lengths A. The minimum length of A for stable formation of the M-Probe depends on the arm sequence, temperature, and buffer salinity; at 37-45 °C in 1× PBS, A=22 is sufficient for stability for most arm sequences. For L=180 and A=22, an n=2 M-Probe can probe up to 430 nt, and an n=3 M-Probe can probe up to 566 nt.
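The length budget can be checked with a short Python sketch. `max_probe_length` implements L_M = n(L−2A) + (L−A); `min_internal_segments` is a hypothetical helper that finds the smallest n covering a given target length, under the same assumption that every oligo uses its full synthesis length L.

```python
def max_probe_length(n, L, A):
    """Maximum continuous target length L_M for an M-Probe with n internal segments.

    Each internal s segment carries two arms (L - 2A target nucleotides);
    the terminal t segment carries one arm (L - A target nucleotides).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if L <= 2 * A:
        raise ValueError("oligo length must exceed two arm lengths")
    return n * (L - 2 * A) + (L - A)  # equals (n + 1) * L - (2 * n + 1) * A


def min_internal_segments(target_len, L, A):
    """Smallest n whose maximum probed length covers target_len."""
    n = 0
    while max_probe_length(n, L, A) < target_len:
        n += 1
    return n


for n in range(1, 4):
    print(n, max_probe_length(n, L=180, A=22))
print(min_internal_segments(560, L=180, A=22))
```

Each additional internal segment adds L − 2A = 136 nt at L=180 and A=22, so the 560 nt target needs n=3.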
(179) M-Probes retain their high sequence selectivity even when binding long DNA targets.
Example 6
(180) Trinucleotide Repeat Profiling.
(181) DNA trinucleotide repeat expansion profiling is difficult for standard molecular analysis technologies.
(182) Conditionally fluorescent M-Probe design and formulation. Each M-Probe indicates whether a DNA sample contains an HTT gene with a triplet repeat number equal to or exceeding the designed number. A series of M-Probes with different triplet repeat thresholds can thus provide precise information on the triplet repeat number.
(183) We studied whether the stoichiometric ratio of component strands has a significant effect on M-Probe performance (
(184) M-Probes for profiling CGG and GAA triplet repeats. We also designed M-Probes targeting the CGG repeat region of the FMR1 gene (
(185) Control fluorescence experiments on synthetic triplet repeat samples.
(186) To analyze HTT triplet repeats in human genomic DNA samples, the HTT repeat region was PCR amplified and the amplicons were used as hybridization targets.
(188) Selective capture of high-repeat HTT genes from genomic DNA using biotin-functionalized M-Probes. To apply M-Probes to profiling the triplet repeat number of HTT in genomic DNA samples, biotin-functionalized M-Probes are used to selectively bind DNA in which HTT exceeds the threshold number of triplet repeats. To demonstrate that our approach can precisely determine the repeat number in a genomic DNA sample, we designed HTT probes with 33, 35, 36, 37, and 39 CAG repeats (schematic shown in
(189) TABLE-US-00007 TABLE 7 Summary of qPCR results for biotin-functionalized M-Probe hybrid-capture products. Triplet repeat numbers for the NA18537 and NA18524 samples (*) are not reported in publicly available databases. 1 × TE denotes the negative control experiment in which M-Probes are hybridized to a blank sample with no gDNA. 'Bare Beads' denotes the negative control experiment in which M-Probes are not used, to characterize the amount of non-specific capture of genomic DNA by magnetic beads. The last row shows the negative control using only the primers but no sample (primer dimer Ct).
Sample | Repeat | Probe | Ct1 | Ct2 | Ct3 | Ct Mean | Ct S.D. | ΔCt
NA18537* | 12-21 | HTT9 | 25.23 | 25.34 | 25.13 | 25.2 | 0.1 | 5.7
NA18537* | 12-21 | HTT27 | 30.61 | 31.14 | 31.06 | 30.9 | 0.3 |
NA18524* | 12-21 | HTT9 | 25.22 | 25.13 | 25.23 | 25.2 | 0.1 | 5.4
NA18524* | 12-21 | HTT27 | 30.39 | 30.88 | 30.49 | 30.6 | 0.3 |
NA20245 | 15/15 | HTT9 | 25.52 | 25.38 | 25.46 | 25.5 | 0.1 | 5.5
NA20245 | 15/15 | HTT27 | 31.10 | 31.17 | 30.60 | 31.0 | 0.3 |
NA20248 | 17/36 | HTT9 | 25.18 | 25.01 | 25.08 | 25.1 | 0.1 | 0.5
NA20248 | 17/36 | HTT27 | 25.70 | 25.66 | 25.40 | 25.6 | 0.2 |
NA20248 | 17/36 | HTT33 | 26.62 | 26.39 | 26.36 | 26.5 | 0.1 | 5.3
NA20248 | 17/36 | HTT35 | 26.58 | 26.17 | 26.18 | 26.3 | 0.2 |
NA20248 | 17/36 | HTT36 | 26.33 | 26.34 | 26.25 | 26.3 | 0.0 |
NA20248 | 17/36 | HTT37 | 31.52 | 31.80 | 31.61 | 31.6 | 0.1 |
NA20248 | 17/36 | HTT39 | 31.33 | 32.31 | 31.63 | 31.8 | 0.5 |
NA20208 | 35/45 | HTT9 | 24.21 | 24.21 | 24.18 | 24.2 | 0.0 | 1.5
NA20208 | 35/45 | HTT27 | 25.81 | 25.65 | 25.54 | 25.7 | 0.1 |
NA20209 | 45/46 | HTT9 | 25.28 | 25.13 | 25.11 | 25.2 | 0.1 | 0.1
NA20209 | 45/46 | HTT27 | 25.41 | 25.28 | 25.25 | 25.3 | 0.1 |
NA20210 | 17/75 | HTT9 | 23.73 | 23.52 | 23.38 | 23.5 | 0.2 | 0.6
NA20210 | 17/75 | HTT27 | 24.19 | 24.05 | 24.08 | 24.1 | 0.1 |
1 × TE | N/A | HTT9 | 31.40 | 31.26 | 31.95 | 31.5 | 0.4 | —
1 × TE | N/A | HTT27 | 31.76 | 30.34 | 32.23 | 31.4 | 1.0 |
1 × TE | N/A | HTT39 | 32.35 | 32.03 | 32.07 | 32.2 | 0.2 |
NA20210 | 17/75 | Bare Beads | 32.81 | 32.24 | 30.21 | 31.8 | 1.4 | —
— | — | — | 33.18 | 31.99 | 32.10 | 32.4 | 0.7 | —
(192) To apply M-Probes to profiling triplet repeat number in HTT in genomic DNA samples, biotin-functionalized M-Probes are used to selectively bind DNA with HTT exceeding the threshold number of triplet repeats (
(193) Amplification of HTT genes with fewer than the threshold repeat number (number of triplets in the M-Probe) shows significantly higher cycle threshold (Ct) than the HTT genes exceeding the threshold repeat number. By designing two different M-Probes, one targeting 9 repeats and one targeting 27 repeats, we can control for sample variability, and determine potential disease status through the difference in the observed Ct values (ΔCt). Small (<2) ΔCt values indicate that at least one of the two HTT gene copies exceeds 27 repeats, and large (>5) ΔCt values indicate the opposite. Residual amplification of the low-repeat number HTT genes is likely due to nonspecific binding of genomic DNA to the magnetic beads (data not shown).
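The ΔCt interpretation above can be sketched in a few lines of Python, using triplicate Ct values from Table 7. The cutoffs (<2 and >5 cycles) come from the text; the function names and the "indeterminate" label for values in between are our assumptions.

```python
from statistics import mean


def delta_ct(ct_probe9, ct_probe27):
    """dCt = mean Ct with the 27-repeat probe minus mean Ct with the 9-repeat probe."""
    return mean(ct_probe27) - mean(ct_probe9)


def call_expansion(dct, small=2.0, large=5.0):
    """Interpret dCt with the cutoffs described in the text."""
    if dct < small:
        return "at least one allele at or above 27 repeats"
    if dct > large:
        return "both alleles below 27 repeats"
    return "indeterminate"  # assumption: no call between the two cutoffs


# Triplicate Ct values (HTT9 probe, HTT27 probe) taken from Table 7.
SAMPLES = {
    "NA20245": ([25.52, 25.38, 25.46], [31.10, 31.17, 30.60]),
    "NA20248": ([25.18, 25.01, 25.08], [25.70, 25.66, 25.40]),
    "NA20208": ([24.21, 24.21, 24.18], [25.81, 25.65, 25.54]),
}

results = {}
for name, (c9, c27) in SAMPLES.items():
    d = delta_ct(c9, c27)
    results[name] = (round(d, 1), call_expansion(d))
    print(name, *results[name])
```

Using the 9-repeat probe as an in-sample reference means the call depends on a Ct difference rather than an absolute Ct, which absorbs sample-to-sample variation in input DNA.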
(195) More precise quantitation of the HTT triplet repeat number can be achieved by extending the method to include more M-Probes with different triplet repeat thresholds. To demonstrate this, we designed 5 different M-Probes targeting 33, 35, 36, 37, and 39 CAG repeats and applied them to the NA20248 genomic DNA sample. The experimental Ct values for the M-Probes targeting 37 and 39 repeats were more than 5 cycles higher than those for the M-Probes targeting 33, 35, and 36 repeats, correctly indicating that the sample has one HTT gene copy with exactly 36 CAG repeats (
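One way to turn a threshold series into a repeat-number call is sketched below, assuming a probe "captures" the sample when its mean Ct lies within 5 cycles of the lowest Ct in the series. The choice of reference and the consistency check are illustrative assumptions; the mean Ct values are those reported for NA20248 in Table 7.

```python
def bracket_repeat_number(ct_by_threshold, gap=5.0):
    """Bracket the longest allele's CAG count from a series of threshold probes.

    A probe counts as having captured the sample if its Ct lies within `gap`
    cycles of the lowest Ct in the series; otherwise it missed.
    Returns (lower, upper): repeat number >= lower and < upper (upper may be None).
    """
    ref = min(ct_by_threshold.values())
    captured = sorted(t for t, ct in ct_by_threshold.items() if ct - ref <= gap)
    missed = sorted(t for t, ct in ct_by_threshold.items() if ct - ref > gap)
    lower = captured[-1]  # never empty: the lowest-Ct probe is always captured
    upper = missed[0] if missed else None
    if upper is not None and lower > upper:
        raise ValueError("inconsistent series: a higher threshold captured, a lower one missed")
    return lower, upper


# Mean Ct values for NA20248 (Table 7), keyed by the probe's CAG threshold.
NA20248 = {33: 26.5, 35: 26.3, 36: 26.3, 37: 31.6, 39: 31.8}
print(bracket_repeat_number(NA20248))
```

When the bracket is (k, k+1), as here with (36, 37), the longest allele is called at exactly k repeats; wider gaps between thresholds give only a range.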
(196) In addition to the hybrid-capture workflow presented here, an alternative approach to profiling triplet repeats with M-Probes is to amplify the HTT gene to above nanomolar concentrations and then react the amplicons directly with conditionally fluorescent M-Probes. The advantage of this second approach is that the solid-phase separation steps are avoided, reducing total hands-on time. The disadvantage is that open-tube steps on high-concentration amplicons are likely to cause laboratory contamination, which is undesirable in diagnostic settings. Both approaches can reliably detect repeat expansion with single-repeat resolution within a small range of expansion (e.g., 27-40 for Huntington's disease), which is difficult to achieve with previously reported methods. Budworth, H., & McMurray, C. T. Problems and solutions for the analysis of somatic CAG repeat expansion and their relationship to Huntington's disease toxicity. Rare Dis, 4: e1131885 (2016); Jama, M., Millson, A., Miller, C. E., & Lyon, E. Triplet repeat primed PCR simplifies testing for Huntington disease. J Mol Diagn, 15: 255-262 (2016); Bonifazi, E., et al. Use of RNA fluorescence in situ hybridization in the prenatal molecular diagnosis of myotonic dystrophy type I. Clin Chem, 52: 319-322 (2006); Kern, A., & Seitz, O. Template-directed ligation on repetitive DNA sequences: a chemical method to probe the length of Huntington DNA. Chem Sci, 6: 724-728 (2015). Larger ranges of expansion can also be profiled by using M-Probes with more and/or longer segments.
Example 7
(197) List of Oligonucleotide Sequences.
(198) Oligonucleotide sequences used for all experiments are listed here. For each M-Probe, the top oligos with sequence homologous to the target sequence are referred to as P (protector) sequences, and the bottom oligos with sequence complementary to the target sequence are referred to as C (complement) sequences. Each strand includes in its name/label: the figure in which it is used, the segment to which it belongs, and additional descriptors as necessary. For example,
(199) M-Probe Proof-of-Concept Experiments (
(200) TABLE-US-00008 TABLE 8 Oligonucleotide sequences used for FIGS. 11A-11C and FIGS. 16-22C.
Species | Sequence
FIG. 11-uP | /5IAbRQ/ GTGCGAACAGGTACATTTGCT (SEQ ID NO: 2)
FIG. 11-uC | CGTCCTTGTTAAATCGTGGATAGTAGAC TTCGCAC /3Rox N/ (SEQ ID NO: 1)
FIG. 11-Target | ACGCAGCTAATGCCCTCAGACAGCTTTGACGTATGTGTTTCTC (SEQ ID NO: 47)
FIG. 11-Variant (12G > t) | ACGCAGCTAATtCCCTCAGACAGCTTTGACGTATGTGTTTCTC (SEQ ID NO: 48)
FIG. 11-Variant (31G > a) | ACGCAGCTAATGCCCTCAGACAGCTTTGACaTATGTGTTTCTC (SEQ ID NO: 49)
FIG. 11-s1P | AAGGACGAGCAAATGTACCTGACT ACGCAGCTAATGCCCT CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 50)
FIG. 11-s1C | AGTAACAGACGGAAATTGTGC AGGGCATTAGCTGCGT AGTGTCTACTATCCACGATTTAAC (SEQ ID NO: 51)
FIG. 11-tP | TGATGCGAAGACTCTATCACG CAGACAGCTTTGACGTA (SEQ ID NO: 52)
FIG. 11-tC | GAGAAACACATACGTCAAAGCTGTCTG GCACAATTTCCGTCTGTTACT (SEQ ID NO: 53)
FIG. 22b-P | /5IAbRQ/ GTGCG ACGCAGCTAATGCCCTCAGACAGCTTTGACG (SEQ ID NO: 54)
FIG. 22b-C | GAGAAACACATACGTCAAAGCTGTCTGAGGGCAT TAGCTGCGT CGCAC /3ROX N/ (SEQ ID NO: 55)
FIG. 22c-XP | AAGGACGAGCAAATGTACCTGACT ACGCAGCTAATGCCCTCAGACAGCTTTGACGTA (SEQ ID NO: 56)
FIG. 22c-XC | GAGAAACACATACGTCAAAGCTGTCTGAGGGCAT TAGCTGCGT AGTGTCTACTATCCACGATTTAAC (SEQ ID NO: 57)
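To illustrate the P/C naming convention, a minimal Python sketch can check that each Protector domain in Table 8 is homologous to the FIG. 11 target and each Complement domain is complementary to it. We assume that the spaces in the Table 8 sequences mark domain boundaries and copy only the target-specific domains; `check_segment` is a hypothetical helper, not part of any described workflow.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def check_segment(target, p_domain, c_domain):
    """Return (P domain homologous to target, C domain complementary to target)."""
    t = target.upper()
    return p_domain.upper() in t, revcomp(c_domain).upper() in t


TARGET = "ACGCAGCTAATGCCCTCAGACAGCTTTGACGTATGTGTTTCTC"   # FIG. 11-Target
VARIANT = "ACGCAGCTAATtCCCTCAGACAGCTTTGACGTATGTGTTTCTC"  # FIG. 11-Variant (12G > t)

# Target-specific domains copied from the s1 and t oligos of Table 8.
S1_P, S1_C = "ACGCAGCTAATGCCCT", "AGGGCATTAGCTGCGT"
T_P, T_C = "CAGACAGCTTTGACGTA", "GAGAAACACATACGTCAAAGCTGTCTG"

print(check_segment(TARGET, S1_P, S1_C))   # s1 on the perfect-match target
print(check_segment(VARIANT, S1_P, S1_C))  # s1 on the single-nucleotide variant
print(check_segment(TARGET, T_P, T_C))     # t on the perfect-match target

# Target bases bound by tC but not covered by tP (single-stranded in the probe).
covered_by_c = revcomp(T_C)
toehold = covered_by_c[len(T_P):] if covered_by_c.startswith(T_P) else None
print(toehold)
```

The variant breaks exact containment for both s1 domains, which is the sequence-level difference the probe energetics are designed to discriminate; the last line recovers the 10 nt of target sequence that tC binds beyond tP.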
(201) TABLE-US-00009 TABLE 9 Sequences for non-homologous distribution experiments as shown in FIGS. 23A-23D.
Species | Sequence
FIG. 23-Target | ATGTCAAGATCACAGATTTTGGGCGGGCCA (SEQ ID NO: 58)
FIG. 23-Variant | ATGTCAAGATCACAGATTTTGGGCtGGCCA (SEQ ID NO: 59)
FIG. 23a-uP | /5IAbRQ/ GTGCGAA CAGGTACATTTGCTCGT (SEQ ID NO: 2)
FIG. 23a-uC | CCTTGTTAAATCGTGGATAGTAGAC TTCGCAC /3Rox N/ (SEQ ID NO: 1)
FIG. 23b-uP | /5IAbRQ/ GTGCG CAGGTACATTTGCTCGTCC (SEQ ID NO: 60)
FIG. 23b-uC | TTGTTAAATCGTGGATAGTAGAC CGCAC /3Rox N/ (SEQ ID NO: 61)
FIG. 23c-uP | /5IAbRQ/ GTG CAGGTACATTTGCTCGTCCTT (SEQ ID NO: 62)
FIG. 23c-uC | GTTAAATCGTGGATAGTAGAC CAC /3Rox N/ (SEQ ID NO: 63)
FIG. 23d-uP | /5IAbRQ/ CAGGTACATTTGCTCGTCCTT (SEQ ID NO: 64)
FIG. 23d-uC | GTTAAATCGTGGATAGTAGAC /3Rox N/ (SEQ ID NO: 65)
FIG. 23a-tP | AAGGACGAGCAAATGTACCTG CA GTCAAGATCACAGATTTTGG (SEQ ID NO: 66)
FIG. 23a-tC | GCCCGCCCAAAATCTGTGATCTTGAC TG GTCTACTATCCACGATTTAAC (SEQ ID NO: 67)
FIG. 23b-tP | AAGGACGAGCAAATGTACCTG AACA GTCAAGATCACAGATTTTGG (SEQ ID NO: 68)
FIG. 23b-tC | GCCCGCCCAAAATCTGTGATCTTGAC TGTT GTCTACTATCCACGATTTAAC (SEQ ID NO: 69)
FIG. 23c-tP | AAGGACGAGCAAATGTACCTG GCAACA GTCAAGATCACAGATTTTGG (SEQ ID NO: 70)
FIG. 23c-tC | GCCCGCCCAAAATCTGTGATCTTGAC TGTTGC GTCTACTATCCACGATTTAAC (SEQ ID NO: 71)
FIG. 23d-tP | AAGGACGAGCAAATGTACCTG GTGCGAA GTCAAGATCACAGATTTTGG (SEQ ID NO: 72)
FIG. 23d-tC | GCCCGCCCAAAATCTGTGATCTTGAC TTCGCAC GTCTACTATCCACGATTTAAC (SEQ ID NO: 73)
(202) Sequence variation tolerance at M-Probe junctions (
(203) TABLE-US-00010 TABLE 10 Oligonucleotide sequences used for constructing the M-Probe used in FIGS. 12A-12D. /5IABkFQ/ represents an Iowa Black FQ quencher modification at the 5′ end of the oligo. Arm regions are shown in uppercase.
Species | Sequence
FIG. 12-uC | GTTAAATCGTGGATAGTAGAC /3Rox N/ (SEQ ID NO: 65)
FIG. 12-uP | /5IABkFQ/ CAGGTACATTTGCTCGTCCTT (SEQ ID NO: 64)
FIG. 12-s1P | AAGGACGAGCAAATGTACCTGCAGTAcacgactcagctgtgtatttttgtgctagtggCGTGATAGAGTCTTCGCATCA (SEQ ID NO: 74)
FIG. 12-s1C | AGTAACAGACGGAAATTGTGCccactagcacaaaaatacacagctgagtcgtgTACTGGTCTACTATCCACGATTTAAC (SEQ ID NO: 75)
FIG. 12-tP | TGATGCGAAGACTCTATCACGggaaacaccatatattttgga (SEQ ID NO: 76)
FIG. 12-tC | aacttccctctccaaaatatatggtgtttccGCACAATTTCCGTCTGTTACT (SEQ ID NO: 77)
(204) TABLE-US-00011 TABLE 11 Oligonucleotide sequences used as targets and variants for FIGS. 12A-12D experiments. Underscore (_) indicates deletion, and ellipsis ( . . . ) indicates that the sequence is continued.
Species | Type | Sequence
FIG. 12-Target | Perfect Match | GACTCAGCTGTGTATTTTTGTGCTAGTGG aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 78)
FIG. 12-Variant-s1-G > c | 1 nt mutation | GACTCAcCTGTGTATTTTTGTGCTAGTGG aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 79)
FIG. 12-Variant-s1-GC > ct | 2 nt mutation | GACTCActTGTGTATTTTTGTGCTAGTGG aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 80)
FIG. 12-Variant-t-G > t | 1 nt mutation | GACTCAGCTGTGTATTTTTGTGCTAGTGG aac . . . GGAAACACCATATATTTTGGAGAGtGAAGTT (SEQ ID NO: 81)
FIG. 12-Variant-t-GG > ac | 2 nt mutation | GACTCAGCTGTGTATTTTTGTGCTAGTGG aac . . . GGAAACACCATATATTTTGGAGAGacAAGTT (SEQ ID NO: 82)
FIG. 12-Variant-s1-GC > GaC | 1 nt insertion | GACTCAGCTGTGTATTTTTGTGaCTAGTGG aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 83)
FIG. 12-Variant-s1-GC > GgatC | 3 nt insertion | GACTCAGCTGTGTATTTTTGTGgatCTAGTGG aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 84)
FIG. 12-Variant-t-CC > CtC | 1 nt insertion | GACTCAGCTGTGTATTTTTGTGCTAGTGG aac . . . GGAAACACtCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 85)
FIG. 12-Variant-t-CC > CagtC | 3 nt insertion | GACTCAGCTGTGTATTTTTGTGCTAGTGG aac . . . GGAAACACagtCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 86)
FIG. 12-Variant-s1-TGC > TC | 1 nt deletion | GACTCAGCTGTGTATTTTTGT_CTAGTGG aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 87)
FIG. 12-Variant-s1-GTGCT > GT | 3 nt deletion | GACTCAGCTGTGTATTTTTG_TAGTGG aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 88)
FIG. 12-Variant-t-GGA > GA | 1 nt deletion | GACTCAGCTGTGTATTTTTGTGCTAGTGG aac . . . GGAAACACCATATATTTTG_AGAGGGAAGTT (SEQ ID NO: 89)
FIG. 12-Variant-t-TGGAG > TG | 3 nt deletion | GACTCAGCTGTGTATTTTTGTGCTAGTGG aac . . . GGAAACACCATATATTTT_GAGGGAAGTT (SEQ ID NO: 90)
FIG. 12-Tolerated-s1t-GAACG > GG | 3 nt deletion | GACTCAGCTGTGTATTTTTGTGCTAGTGG . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 91)
FIG. 12-Tolerated-s1t-AACG > AG | 2 nt deletion | GACTCAGCTGTGTATTTTTGTGCTAGTGG a . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 92)
FIG. 12-Tolerated-s1t-ACG > AG | 1 nt deletion | GACTCAGCTGTGTATTTTTGTGCTAGTGG aa . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 93)
FIG. 12-Tolerated-s1t-GaacG > GgtataG | 2 nt ins. + 3 nt mut. | GACTCAGCTGTGTATTTTTGTGCTAGTGG gtata . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 94)
FIG. 12-Tolerated-s1t-GaacG > GaatgtaacG | 5 nt ins. | GACTCAGCTGTGTATTTTTGTGCTAGTGG aatgt aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 95)
FIG. 12-Tolerated-s1t-GaacG > GatattaaacG | 6 nt ins. | GACTCAGCTGTGTATTTTTGTGCTAGTGG atatta aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 96)
FIG. 12-Tolerated-s1t-GaacG > GaatatgtaacG | 7 nt ins. | GACTCAGCTGTGTATTTTTGTGCTAGTGG aatatgt aac . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 97)
FIG. 12-Tolerated-s1t-GaacG > GgaaG | 3 nt mutation | GACTCAGCTGTGTATTTTTGTGCTAGTGG gaa . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 98)
FIG. 12-Tolerated-s1t-GaacG > GgtcG | 3 nt mutation | GACTCAGCTGTGTATTTTTGTGCTAGTGG gtc . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 99)
FIG. 12-Tolerated-s1t-GaacG > GacgG | 3 nt mutation | GACTCAGCTGTGTATTTTTGTGCTAGTGG acg . . . GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 100)
(205) VDJ Recombination Detection via M-Probes Constructed by Combinatorial Modules (
(206) TABLE-US-00012 TABLE 12 Oligonucleotide sequences used to construct M-Probes for experiments in FIGS. 13A-13D and FIGS. 24-32B.
Species | Sequence
FIG. 13-uP | GTTAAATCGTGGATAGTAGAC /3Rox N/ (SEQ ID NO: 65)
FIG. 13-uC | /5IABkFQ/ CAGGTACATTTGCTCGTCCTT (SEQ ID NO: 64)
FIG. 13-s1P-V2 | AAGGACGAGCAAATGTACCTGCAGTA gactcagccatgtacttctgtgccagca CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 104)
FIG. 13-s1P-V3-1 | AAGGACGAGCAAATGTACCTGCAGTA gactctgctgtgtatttctgtgccagcagcc CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 105)
FIG. 13-s1P-V4-1 | AAGGACGAGCAAATGTACCTGCAGTA gactcagccctgtatctctgcgccagcagcc CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 106)
FIG. 13-s1P-V5-1 | AAGGACGAGCAAATGTACCTGCAGTA ggactcggccctttatctttgcgccagcag CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 107)
FIG. 13-s1P-V6-1 | AAGGACGAGCAAATGTACCTGCAGTA cagacatctgtgtacttctgtgccagca CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 108)
FIG. 13-s1P-V10-1 | AAGGACGAGCAAATGTACCTGCAGTA cagacatctgtatatttctgcgccagcag CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 109)
FIG. 13-s1P-V11-1 | AAGGACGAGCAAATGTACCTGCAGTA gactcggccatgtatctctgtgccagcagc CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 110)
FIG. 13-s1P-V12-5 | AAGGACGAGCAAATGTACCTGCAGTA gactcagctgtgtatttttgtgctagtgg CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 111)
FIG. 13-s1C-V2 | AGTAACAGACGGAAATTGTGC tgctggcacagaagtacatggctgagtc TACTGGTCTACTATCCACGATTTAAC (SEQ ID NO: 112)
FIG. 13-s1C-V3-1 | AGTAACAGACGGAAATTGTGC ggctgctggcacagaaatacacagcagagtc TACTGGTCTACTATCCACGATTTAAC (SEQ ID NO: 113)
FIG. 13-s1C-V4-1 | AGTAACAGACGGAAATTGTGC ggctgctggcgcagagatacagggctgagtc TACTGGTCTACTATCCACGATTTAAC (SEQ ID NO: 114)
FIG. 13-s1C-V5-1 | AGTAACAGACGGAAATTGTGC ctgctggcgcaaagataaagggccgagtcc TACTGGTCTACTATCCACGATTTAAC (SEQ ID NO: 115)
FIG. 13-s1C-V6-1 | AGTAACAGACGGAAATTGTGC tgctggcacagaagtacacagatgtctg TACTGGTCTACTATCCACGATTTAAC (SEQ ID NO: 116)
FIG. 13-s1C-V10-1 | AGTAACAGACGGAAATTGTGC ctgctggcgcagaaatatacagatgtctg TACTGGTCTACTATCCACGATTTAAC (SEQ ID NO: 117)
FIG. 13-s1C-V11-1 | AGTAACAGACGGAAATTGTGC gctgctggcacagagatacatggccgagtc TACTGGTCTACTATCCACGATTTAAC (SEQ ID NO: 118)
FIG. 13-s1C-V12-5 | AGTAACAGACGGAAATTGTGC ccactagcacaaaaatacacagctgagtc TACTGGTCTACTATCCACGATTTAAC (SEQ ID NO: 119)
FIG. 13-tP-J1-1 | TGATGCGAAGACTCTATCACG tgaagctttctttggacaag (SEQ ID NO: 120)
FIG. 13-tP-J1-2 | TGATGCGAAGACTCTATCACG ggctacaccttcggttcgg (SEQ ID NO: 121)
FIG. 13-tP-J1-3 | TGATGCGAAGACTCTATCACG ggaaacaccatatattttgga (SEQ ID NO: 76)
FIG. 13-tP-J2-1 | TGATGCGAAGACTCTATCACG gagcagttatcgggc (SEQ ID NO: 122)
FIG. 13-tP-J2-2 | TGATGCGAAGACTCTATCACG cggggagctgttttttgg (SEQ ID NO: 123)
FIG. 13-tP-J2-3 | TGATGCGAAGACTCTATCACG gatacgcagtattttggcccag (SEQ ID NO: 124)
FIG. 13-tC-J1-1 | tctggtgccttgtccaaagaaagcttca GCACAATTTCCGTCTGTTACT (SEQ ID NO: 125)
FIG. 13-tC-J1-2 | cctggtccccgaaccgaaggtgtagcc GCACAATTTCCGTCTGTTACT (SEQ ID NO: 126)
FIG. 13-tC-J1-3 | aacttccctctccaaaatatatggtgtttcc GCACAATTTCCGTCTGTTACT (SEQ ID NO: 77)
FIG. 13-tC-J2-1 | gtgtccctggcccgaagaactgctc GCACAATTTCCGTCTGTTACT (SEQ ID NO: 127)
FIG. 13-tC-J2-2 | agagccttctccaaaaaacagctccccg GCACAATTTCCGTCTGTTACT (SEQ ID NO: 128)
FIG. 13-tC-J2-3 | cgggtgcctgggccaaaatactgcgtatc GCACAATTTCCGTCTGTTACT (SEQ ID NO: 129)
(207) TABLE-US-00013 TABLE 13 Oligonucleotide sequences used as targets for experiments in FIGS. 13A-13D and FIGS. 24-32B.
Species | Sequence
FIG. 13-Target-V2/J1-1 | GACTCAGCCATGTACTTCTGTGCCAGCAgataggctccaatgagcagttca . . . TGAAGCTTTCTTTGGACAAGGCACCAGA (SEQ ID NO: 130)
FIG. 13-Target-V2/J1-2 | GACTCAGCCATGTACTTCTGTGCCAGCAGTGAttgcgggaggttggagatacgcagtc . . . GGCTACACCTTCGGTTCGGGGACCAGG (SEQ ID NO: 131)
FIG. 13-Target-V2/J1-3 | GACTCAGCCATGTACTTCTGTGCCAGCAGTGAAgttatgggacacctggt . . . CTCTGGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 132)
FIG. 13-Target-V2/J2-1 | GACTCAGCCATGTACTTCTGTGCCAGCAGTGAAGCacagggatcg . . . CAATGAGCAGTTCTTCGGGCCAGGGACAC (SEQ …)
FIG. 13-Target-V2/J2-2 | GACTCAGCCATGTACTTCTGTGCCAGCAGTGAAGgctacttagcgtc . . . ACCGGGGAGCTGTTTTTTGGAGAAGGCTCT (SEQ ID NO: 134)
FIG. 13-Target-V2/J2-3 | GACTCAGCCATGTACTTCTGTGCCAGCAGtgtgggacag . . . ACAGATACGCAGTATTTTGGCCCAGGCACCCG (SEQ ID NO: 135)
FIG. 13-Target-V3-1/J1-1 | GACTCTGCTGTGTATTTCTGTGCCAGCAGCCAAGggactagcggtta . . . ACACTGAAGCTTTCTTTGGACAAGGCACCAGA (SEQ ID NO: 136)
FIG. 13-Target-V3-1/J1-2 | GACTCTGCTGTGTATTTCTGTGCCAGCAGCCAcacgggacagggtc . . . CTATGGCTACACCTTCGGTTCGGGGACCAGG (SEQ ID NO: 137)
FIG. 13-Target-V3-1/J1-3 | GACTCTGCTGTGTATTTCTGTGCCAGCAGCCAAGAgggagggctagcgaggg . . . CTCTGGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 138)
FIG. 13-Target-V3-1/J2-1 | GACTCTGCTGTGTATTTCTGTGCCAGCAGCCAAGAggga . . . GAGCAGTTCTTCGGGCCAGGGACAC (SEQ ID NO: 139)
FIG. 13-Target-V3-1/J2-2 | GACTCTGCTGTGTATTTCTGTGCCAGCAGCCAAGtcgtatcaa . . . ACCGGGGAGCTGTTTTTTGGAGAAGGCTCT (SEQ ID NO: 140)
FIG. 13-Target-V3-1/J2-3 | GACTCTGCTGTGTATTTCTGTGCCAGCAGCCAAtttggtctagcgggata . . . CACAGATACGCAGTATTTTGGCCCAGGCACCCG (SEQ ID NO: 141)
FIG. 13-Target-V4-1/J1-1 | GACTCAGCCCTGTATCTCTGCGCCAGCAGCCaggacagttg . . . GAACACTGAAGCTTTCTTTGGACAAGGCACCAGA (SEQ ID NO: 142)
FIG. 13-Target-V4-1/J1-2 | GACTCAGCCCTGTATCTCTGCGCCAGCAGCCAAGAcgaggacagtaa . . . TGGCTACACCTTCGGTTCGGGGACCAGG (SEQ ID NO: 143)
FIG. 13-Target-V4-1/J1-3 | GACTCAGCCCTGTATCTCTGCGCCAGCAGCCAAGAgactagcgggaata . . . TGGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 144)
FIG. 13-Target-V4-1/J2-1 | GACTCAGCCCTGTATCTCTGCGCCAGCAGCCAAGgcccgggaaagaggt . . . CAATGAGCAGTTCTTCGGGCCAGGGACAC (SEQ ID NO: 145)
FIG. 13-Target-V4-1/J2-2 | GACTCAGCCCTGTATCTCTGCGCCAGCAGCCAAttacggtgg . . . CACCGGGGAGCTGTTTTTTGGAGAAGGCTCT (SEQ ID NO: 146)
FIG. 13-Target-V4-1/J2-3 | GACTCAGCCCTGTATCTCTGCGCCAGCAGCCgggactacgtc . . . AGCACAGATACGCAGTATTTTGGCCCAGGCACCG (SEQ ID NO: 147)
FIG. 13-Target-V5-1/J1-1 | GGACTCGGCCCTTTATCTTTGCGCCAGCAGCTTGGacgggacaggta . . . GAACACTGAAGCTTTCTTTGGACAAGGCACCAGA (SEQ ID NO: 148)
FIG. 13-Target-V5-1/J1-2 | GGACTCGGCCCTTTATCTTTGCGCCAGCAGCTTgcagggtgg . . . ACTATGGCTACACCTTCGGTTCGGGGACCAGG (SEQ ID NO: 149)
FIG. 13-Target-V5-1/J1-3 | GGACTCGGCCCTTTATCTTTGCGCCAGCAGCccgtacaggcttcctaagata . . . CTGGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 150)
FIG. 13-Target-V5-1/J2-1 | GGACTCGGCCCTTTATCTTTGCGCCAGCAGCTTGGccttt . . . CCTACAATGAGCAGTTCTTCGGGCCAGGGACAC (SEQ ID NO: 151)
FIG. 13-Target-V5-1/J2-2 | GGACTCGGCCCTTTATCTTTGCGCCAGCAGCTTGtggacagggaggtatcc . . . CACCGGGGAGCTGTTTTTTGGAGAAGGCTCT (SEQ ID NO: 152)
FIG. 13-Target-V5-1/J2-3 | GGACTCGGCCCTTTATCTTTGCGCCAGCAGCtccatcta . . . CACAGATACGCAGTATTTTGGCCCAGGCACCCG (SEQ ID NO: 153)
FIG. 13-Target-V6-1/J1-1 | CAGACATCTGTGTACTTCTGTGCCAGCAGTGaccatcagactgg . . . GAACACTGAAGCTTTCTTTGGACAAGGCACCAGA (SEQ ID NO: 154)
FIG. 13-Target-V6-1/J1-2 | CAGACATCTGTGTACTTCTGTGCCAGCAccaagggacagg . . . AACTATGGCTACACCTTCGGTTCGGGGACCAGG (SEQ ID NO: 155)
FIG. 13-Target-V6-1/J1-3 | CAGACATCTGTGTACTTCTGTGCCAGCAGtctcataacgaattgg . . . CTCTGGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 156)
FIG. 13-Target-V6-1/J2-1 | CAGACATCTGTGTACTTCTGTGCCAGCAGTGAAGacagggaatcagccccagcc . . . AATGAGCAGTTCTTCGGGCCAGGGACAC (SEQ ID NO: 157)
FIG. 13-Target-V6-1/J2-2 | CAGACATCTGTGTACTTCTGTGCCAGCAGTGAAGCggtcggacagggct . . . CCGGGGAGCTGTTTTTTGGAGAAGGCTCT (SEQ ID NO: 158)
FIG. 13-Target-V6-1/J2-3 | CAGACATCTGTGTACTTCTGTGCCAGCAGTGAAGagacagcgaaa . . . CAGATACGCAGTATTTTGGCCCAGGCACCCG (SEQ ID NO: 159)
(208) TABLE-US-00014 TABLE 14 Oligonucleotide sequences used as targets for experiments in FIGS. 13A-13D and FIGS. 24-32B.
Species | Sequence
FIG. 13-Target-V10-1/J1-1 | CAGACATCTGTATATTTCTGCGCCAGCAGTGAGgaatacccgggaa ... AACACTGAAGCTTTCTTTGGACAAGGCACCAGA (SEQ ID NO: 160)
FIG. 13-Target-V10-1/J1-2 | CAGACATCTGTATATTTCTGCGCCAGCAGTGAgactcggacagtctg ... CTATGGCTACACCTTCGGTTCGGGGACCAGG (SEQ ID NO: 161)
FIG. 13-Target-V10-1/J1-3 | CAGACATCTGTATATTTCTGCGCCAGCAGTGAGTCgtcgacagttccaa ... CTCTGGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 162)
FIG. 13-Target-V10-1/J2-1 | CAGACATCTGTATATTTCTGCGCCAGCAGgagggacagggatttgtgg ... CTCCTACAATGAGCAGTTCTTCGGGCCAGGGACAC (SEQ ID NO: 163)
FIG. 13-Target-V10-1/J2-2 | CAGACATCTGTATATTTCTGCGCCAGCAGTGAGcggcaat ... GAACACCGGGGAGCTGTTTTTTGGAGAAGGCTCT (SEQ ID NO: 164)
FIG. 13-Target-V10-1/J2-3 | CAGACATCTGTATATTTCTGCGCCAGCAGTgggagggaaac ... CAGATACGCAGTATTTTGGCCCAGGCACCCG (SEQ ID NO: 165)
FIG. 13-Target-V11-1/J1-1 | GACTCGGCCATGTATCTCTGTGCCAGCAGCTTccgggaccg ... TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGA (SEQ ID NO: 166)
FIG. 13-Target-V11-1/J1-2 | GACTCGGCCATGTATCTCTGTGCCAGCAGCtccggacagggcccccctatggctacc ... TATGGCTACACCTTCGGTTCGGGGACCAGG (SEQ ID NO: 167)
FIG. 13-Target-V11-1/J1-3 | GACTCGGCCATGTATCTCTGTGCCAGCAGCttcctgtaagcgggagtta ... GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 168)
FIG. 13-Target-V11-1/J2-1 | GACTCGGCCATGTATCTCTGTGCCAGCAGCtcgcaggccgggagggcccag ... CTACAATGAGCAGTTCTTCGGGCCAGGGACAC (SEQ ID NO: 169)
FIG. 13-Target-V11-1/J2-2 | GACTCGGCCATGTATCTCTGTGCCAGCAGCTTAGacctaaaaacagggaccgacgg ... GAACACCGGGGAGCTGTTTTTTGGAGAAGGCTCT (SEQ ID NO: 170)
FIG. 13-Target-V11-1/J2-3 | GACTCGGCCATGTATCTCTGTGCCAGCAGCTTAGatctgggcggactcttgga ... GATACGCAGTATTTTGGCCCAGGCACCCG (SEQ ID NO: 171)
FIG. 13-Target-V12-5/J1-1 | GACTCAGCTGTGTATTTTTGTGCTAGTGGTTTgggctccgtctatggctacaa ... ACTGAAGCTTTCTTTGGACAAGGCACCAGA (SEQ ID NO: 172)
FIG. 13-Target-V12-5/J1-2 | GACTCAGCTGTGTATTTTTGTGCTAGTGGTTTGcacaccgcaaccggcggtctag ... CTATGGCTACACCTTCGGTTCGGGGACCAGG (SEQ ID NO: 173)
FIG. 13-Target-V12-5/J1-3 | GACTCAGCTGTGTATTTTTGTGCTAGTGGTgtgattcttga ... GGAAACACCATATATTTTGGAGAGGGAAGTT (SEQ ID NO: 174)
FIG. 13-Target-V12-5/J2-1 | GACTCAGCTGTGTATTTTTGTGCTAGTGGTTTGGTtcctcgacagggacggga ... ACAATGAGCAGTTCTTCGGGCCAGGGACAC (SEQ ID NO: 175)
FIG. 13-Target-V12-5/J2-2 | GACTCAGCTGTGTATTTTTGTGCTAGTGGTTTGGgagactagcgggtct ... ACACCGGGGAGCTGTTTTTTGGAGAAGGCTCT (SEQ ID NO: 176)
FIG. 13-Target-V12-5/J2-3 | GACTCAGCTGTGTATTTTTGTGCTAGTGGTTTGGagggtc ... AGCACAGATACGCAGTATTTTGGCCCAGGCACCCG (SEQ ID NO: 177)
(209) Probing and Detection of Long Targets with M-Probes (
(210) TABLE-US-00015 TABLE 15 Oligonucleotide sequences used for constructing M-Probes used in FIGS. 33A-33D.
Probe | Species | Sequence
FIG. 33ab-99nt | s1P | AAGGACGAGCAAATGTACCTGCA . . . atgactgaatataaacttgtggtagttggagctggtggc gtaggcaag . . . CGTGATAGAGTCTTCGCAT (SEQ ID NO: 3)
FIG. 33ab-99nt | s1C | TGAACGACGGAAATTGTGC . . . cttgcctacgccaccagctccaactaccacaagtttat attcagtcat . . . TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 4)
FIG. 33ab-99nt | tP | ATGCGAAGACTCTATCACG . . . gtgccttgacgatacagctaattcagaatcattttgtg (SEQ ID NO: 5)
FIG. 33ab-99nt | tC | atcatattcgtccacaaaatgattctgaattagctgtatcgtcaagg cac . . . GCACAATTTCCGTCGTTCA (SEQ ID NO: 6)
FIG. 33ab-160nt | s1P | AAGGACGAGCAAATGTACCTGCA . . . atgactgaatataaacttgtggtagttggagctggtggc gtaggcaag . . . CGTGATAGAGTCTTCGCAT (SEQ ID NO: 3)
FIG. 33ab-160nt | s1C | TGAACGACGGAAATTGTGC . . . cttgcctacgccaccagctccaactaccacaagtttat attcagtcat . . . TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 4)
FIG. 33ab-160nt | s2P | ATGCGAAGACTCTATCACG . . . gtgccttgacgatacagctaattcagaatcattttgtgga cgaatatgat . . . GGCTGAACGTAACTCCTCG (SEQ ID NO: 7)
FIG. 33ab-160nt | s2C | GCTATCTTCAACCTTCTGG . . . atcatattcgtccacaaaatgattctgaattagctgtatcg tcaaggcac . . . GCACAATTTCCGTCGTTCA (SEQ ID NO: 8)
FIG. 33ab-160nt | tP | CGAGGAGTTACGTTCAGCC . . . caacaatagaggattcctacaggaagcaagtagtaattgatggag (SEQ ID NO: 9)
FIG. 33ab-160nt | tC | ccaagagacaggtttctccatcaattactacttgcttcctgtagga atcctctattgttg . . . CCAGAAGGTTGAAGATAGC (SEQ ID NO: 10)
FIG. 33cd-218nt | s1P | AAGGACGAGCAAATGTACCTGCA . . . gtacatgaggactggggagggctttctttgtgtatttgccataaa taatactaaat . . . catttgaagatattcaccattatagagaacaaattaaaagagtta aggactctgaagat CGTGATAGAGTCTTCGCATCA (SEQ ID NO: 178)
FIG. 33cd-218nt | s1C | AGTAACAGACGGAAATTGTGC . . . atcttcagagtccttaactcttttaatttgttctctataatggtgaa tatcttcaaa . . . tgatttagtattatttatggcaaatacacaaagaaagccctcccca gtcctcatgtac . . . TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 179)
FIG. 33cd-218nt | tP | TGATGCGAAGACTCTATCACG . . . gtacctatggtcctagtaggaaataaatgtgatttgc cttctagaa . . . cagtagacacaaaacaggctcaggacttagcaaga agttatg (SEQ ID NO: 180)
FIG. 33cd-218nt | tC | caataaaaggaattccataacttcttgctaagtcctgagcctgtttt gtgtc . . . tactgttctagaaggcaaatcacatttatttcctactagga ccataggtac . . . GCACAATTTCCGTCTGTTACT (SEQ ID NO: 181)
(211) TABLE-US-00016 TABLE 16 Primers used to generate amplicons as targets for reaction with M-Probes in FIGS. 33A-33D.
Amplicon | Type | Sequence
FIG. 33ab Primers | forward Primer (fP) | TATAAGGCCTGCTGAAAATGACT (SEQ ID NO: 42)
FIG. 33ab Primers | reverse Primer (rP) | ATCCAAGAGACAGGTTTCTCCA (SEQ ID NO: 43)
FIG. 33cd Primers | fP1 | TCAAGAGGAGTACAGTGCAATG (SEQ ID NO: 182)
FIG. 33cd Primers | fP2 | TGAGGGACCAGTACATGACTGG (SEQ ID NO: 183)
FIG. 33cd Primers | fP3 | GACCAGTACATGAGGGAGGGCTT (SEQ ID NO: 184)
FIG. 33cd Primers | rP | AACACCCTGTCTTGTCTTTGC (SEQ ID NO: 185)
(212) TABLE-US-00017 TABLE 17 gBlock sequence serving as template for PCR amplification. Sequences in uppercase are the flanking intron sequences.
Species | Sequence
FIG. 33 template (KRAS cDNA gBlock) | TATAAGGCCTGCTGAAA . . . atgactgaatataaacttgtggtagttggagctggtggcgtaggcaagagtgccttgacg . . . atacagctaattcagaatcattttgtggacgaatatgatccaacaatagaggattcctac . . . aggaagcaagtagtaattgatggagaaacctgtctcttggatattctcgacacagcaggt . . . caagaggagtacagtgcaatgagggaccagtacatgaggactggggagggctttctttgt . . . gtatttgccataaataatactaaatcatttgaagatattcaccattatagagaacaaatt . . . aaaagagttaaggactctgaagatgtacctatggtcctagtaggaaataaatgtgatttg . . . ccttctagaacagtagacacaaaacaggctcaggacttagcaagaagttatggaattcct . . . tttattgaaacatcagcaaagacaagacagggtgttgatgatgccttctatacattagtt . . . cgagaaattcgaaaacataaagaaaagatgagcaaagatggtaaaaagaagaaaaagaag . . . tcaaagacaaagtgtgtaattatgtaa . . . ATACAATTTGTACTTTTTTCTTAAGGCATACTAGTACAAGTGG (SEQ ID NO: 41)
(213) TABLE-US-00018 TABLE 18 Universal P and C sequences used for FIGS. 14A-14B and FIGS. 36A-36B.
FIG. 14-uP: /5IAbRQ/ GTGCGAA CAGGTACATTTGCTCGTCCTT (SEQ ID NO: 2)
FIG. 14-uC: GTTGACAATCGTGGATAGTAGAC TTCGCAC /3Rox_N/ (SEQ ID NO: 186)
(214) TABLE-US-00019 TABLE 19 Primers used to generate amplicons as targets for reaction with M-Probes in FIGS. 14A-14B and FIGS. 36A-36B.
FIG. 36 primers, fP: CCTATTTCTCCTCAGCTCAAAACC (SEQ ID NO: 187)
FIG. 36 primers, rP: ATAGTCAACTTAAGGACTAAATAAATGATCTAATG (SEQ ID NO: 188)
FIG. 14 primers, fP1: AAGGTCAGGGTCTCTGTTAGG (SEQ ID NO: 189)
FIG. 14 primers, rP1: AGTGGTTAGAGACAATATGACATCG (SEQ ID NO: 190)
FIG. 14 primers, fP2: CTTCACCTATCCTGCAACCTTT (SEQ ID NO: 191)
FIG. 14 primers, rP2: TTCTAATCTGTCTAAATTACCTAACGCT (SEQ ID NO: 192)
(215) TABLE-US-00020 TABLE 20 Oligonucleotide sequences used for constructing M-Probes used in FIGS. 36A-36B. Probe Species Sequence FIG. 36-430nt s1P AAGGACGAGCAAATGTACCTGCA ... tttctcctcagctcaaaacccttcagtggcactccgttttattggtgtcaaagccaaag tcctttcaatggtctac ... aaaacactgtttggccaggccaccaaataccttgctagtttcttctagttctattc ... CGTGATAGAGTCTTCGCATCAG (SEQ ID NO: 193) s1C ACTGAACGACGGAAATTGTGC ... gaatagaactagaagaaactagcaaggtatttggtggcctggccaaacagtgttttgta gaccattgaaaggactt ... tggctttgacaccaataaaacggagtgccactgaagggttttgagctgaggagaaa ... TGGTCTACTATCCACGATTGTCAAC (SEQ ID NO: 194) s2P CTGATGCGAAGACTCTATCACG ... tctctcacttggctccagtcacactgacctccccgccattccttcagtgcatgggaata tcccaccttcagaccat ... tgctccaattcttctcattttgggaatgttctttacccagataatagcttgactaactcc ttct ... GGCTGAACGTAACTCCTCTTTG (SEQ ID NO: 195) s2C GTGCTACTCTTCAACCTTCTGG ... agaaggagttagtcaagctattatctgggtaaagaacattcccaaaatgagaagaattgg agcaatggtctgaagg ... tgggatattcccatgcactgaaggaatggcggggaggtcagtgtgactggagccaagtg agaga ... GCACAATTTCCGTCGTTCAGT (SEQ ID NO: 196) tP CAAAGAGGAGTTACGTTCAGCC... tttatgtctgacttggctcaacagtttaatctcaatgagacttaccctgaccaccct atttca ... tagttccaacctggattccagcattcctaatccccttactctgcacgacttcttttttt tcccatggtactcaccac (SEQ ID NO: 197) tC tgatctaatgagttagaggtggtgagtaccatgggaaaaaaaagaagtcgtgcagagtaaggg gattaggaatgct... ggaatccaggttggaactatgaaatagggtggtcagggtaagtctcattgagattaaactgttg agccaagtcagacataaa ... CCAGAAGGTTGAAGAGTAGCAC (SEQ ID NO: 198)
(216) TABLE-US-00021 TABLE 21 Oligonucleotide sequences used for constructing M-Probes used in FIGS. 14A-14B. Probe Species Sequence FIG. 14- s1P AAGGACGAGCAAATGTACCTGCA ... 560nt-1 ctgttaggaaagcaaaatttccccagatattctcagcagttttctgcttgtgcttccatgtct agagctgtctctagttcc ... tggaagttcctagcttcaagcatgtctaagaaagacttcatttgagtaccttgctacctta ... CGTGATAGAGTCTTCGCATCAG (SEQ ID NO: 199) s1C ACTGAACGACGGAAATTGTGC ... taaggtagcaaggtactcaaatgaagtctttcttagacatgcttgaagctaggaacttc caggaactagagaca ... gctctagacatggaagcacaagcagaaaactgctgagaatatctggggaaattttgc tttcctaacag ... TGGTCTACTATCCACGATTGTCAAC (SEQ ID NO: 200) s2P CTGATGCGAAGACTCTATCACG ... tagtcttccctagcttaataattttttctgtacctaatgatttcagagtgagatg gtgaggtgatcatg ... ggcaaaattattagtctttctgagttctcttattccttttatatcattgaatgtt cttttttgtg ... GGCTGAACGTAACTCCTCTTTG (SEQ ID NO: 201) s2C GTGCTACTCTTCAACCTTCTGG ... cacaaaaaagaacattcaatgatataaaaggaataagagaactcagaaagacta ataattttgcccatgat ... cacctcaccatctcactctgaaatcattaggtacagaaaaaattattaagctaggg aagacta ... GCACAATTTCCGTCGTTCAGT (SEQ ID NO: 202) s3P CAAAGAGGAGTTACGTTCAGCC ... gctattgttaggattagtgtttcaatgtgaatggcagattgaagcttcagagtgctttca ctcatcttcagttgtttct ... ccgagttgccttgagagagagaaagaggtagttttagccctattttgtaggtatagtaat agtga... CGTTCTACCTCAGGTGTTCGT (SEQ ID NO: 203) s3C TTTCTGATGCACTTAGAGTGAGC ... tcactattactatacctacaaaatagggctaaaactacctctttctctctctcaaggcaact cggagaaacaactgaa ... gatgagtgaaagcactctgaagcttcaatctgccattcacattgaaacactaatcctaaca atagc ... CCAGAAGGTTGAAGAGTAGCAC (SEQ ID NO: 204) tP ACGAACACCTGAGGTAGAACG... ttcccttttcctttgtgtctattcgaatcctaccattttattccctatgtttc tgttgcctgtcctc... acatttggtccttctcaggatatggcatgctttccatatttcccagtaaaa atcccag (SEQ ID NO: 205) tC tgacatcggaaaggctgggatttttactgggaaatatggaaagcatgccatatcctgagaagga ccaaatgtgaggacaggc... aacagaaacatagggaataaaatggtaggattcgaaagagacacaa aggaaaagggaa ... GCTCACTCTAAGTGCATCAGAAA (SEQ ID NO: 206) FIG. 14- s1P AAGGACGAGCAAATGTACCTGCA ... 560nt-2 acctatcctgcaacctttccacatactcttccctcaacctggaagactcctcctgt tctttacctggataatt... 
cttacatagccttccattctcaactcaaatggtgttacttcaaagatgcctttgct cattacc... CGTGATAGAGTCTTCGCATCAG (SEQ ID NO: 207) s1C ACTGAACGACGGAAATTGTGC ... ggtaatgagcaaaggcatctttgaagtaacaccatttgagttgagaatggaagg ctatgtaagaattatcc... aggtaaagaacaggaggagtcttccaggttgagggaagagtatgtggaaaggtt gcaggataggt... TGGTCTACTATCCACGATTGTCAAC (SEQ ID NO: 208) s2P CTGATGCGAAGACTCTATCACG ... aaacgtatattaggcccctctcttacttatttatacttcctttgtaagcagcgacatgg ctcttttgctcaccctg ... gtaagcctagtgcccagtatatcatctgacacacaattggtggtcaactgttgatt ... GGCTGAACGTAACTCCTCTTTG (SEQ ID NO: 209) s2C GTGCTACTCTTCAACCTTCTGG ... aatcaacagttgaccaccaattgtgtgtcagatgatatactgggcactaggcttac cagggtgagcaaaag ... agccatgtcgctgcttacaaaggaagtataaataagtaagagaggggcctaatat acgttt ... GCACAATTTCCGTCGTTCAGT (SEQ ID NO: 210) s3P CAAAGAGGAGTTACGTTCAGCC ... catgagtgaattttattggttactgttgatcgccagtgaaataagtgcttagaaaca cttataggctgaatag ... gaagaattaaacaaatgaatgactagataataggtacgtgggagtcacagggatt gacatcttattt ... CGTTCTACCTCAGGTGTTCGT (SEQ ID NO: 211) s3C TTTCTGATGCACTTAGAGTGAGC ... aaataagatgtcaatccctgtgactcccacgtacctattatctagtcattcatttgtt taattcttcctattcag ... cctataagtgtttctaagcacttatttcactggcgatcaacagtaaccaataaaattc actcatg ... CCAGAAGGTTGAAGAGTAGCAC (SEQ ID NO: 212) tP ACGAACACCTGAGGTAGAACG... tattcagttttgcctacattggctcttttcttacaaatgtcctgatgcctattga gtatatatccataag... gtttctttgagttttctggaagaaatggctgttgttgatgttgtttttagcagc tcttttgactcgac (SEQ ID NO: 213) tC ctggtgtgaggatggtcgagtcaaaagagctgctaaaaacaacatcaacaacagccatttcttc cagaaaactcaaagaa ... accttatggatatatactcaataggcatcaggacatttgtaagaaaagagccaatgt aggcaaaactgaata ... GCTCACTCTAAGTGCATCAGAAA (SEQ ID NO: 214)
(217) TABLE-US-00022 TABLE 22 Amplicon sequence serving as target for 430nt M-Probe (chr3.1772231-1772689, GRCh37.p13 Primary Assembly). Sequences in uppercase are probing sequences.
MP-430: cctaTTTCTCCTCAGCTCAAAACCCTTCAGTGGCACTCCATTTTATTGGTGTCAAAGCCAAAGTCCTTTC ... AATGGTCTACAAAACACTGTTTGGCCAGGCCACCAAATACCTTGCTAGTTTCTTCTAGTTCTATTCTCTC... TCACTTGGCTCCAGTCACACTGACCTCCCCGCCATTCCTTCAGTGCATGGGAATATCCCACCTTCAGACC... ATTGCTCCAATTCTTCTCATTTTGGGAATGTTCTTTACCCAGATAATAGCTTGACTAACTCCTTCTTTTA ... TGTCTGACTTGGCTCAACAGTTTAATCTCAATGAGACTTACCCTGACCACCCTATTTCATAGTTCCAACC ... TGGATTCCAGCATTCCTAATCCCCTTACTCTGCACGACTTCTTTTTTTTCCCATGGTACTCACCACCTCT ... AACTCATTAGATCAtttatttagtccttaagttgactat (SEQ ID NO: 215)
(218) TABLE-US-00023 TABLE 23 Amplicon sequences serving as targets for 560nt M-Probe-1 (chr9.1095713-1096302, GRCh37.p13 Primary Assembly) and M-Probe-2 (chr13.22569207-22569796, GRCh37.p13 Primary Assembly). Sequences in uppercase are probing sequences.
MP-560-1: aaggtcagggtctCTGTTAGGAAAGCAAAATTTCCCCAGATATTCTCAGCAGTTTTCTGCTTGTGCTTCC ... ATGTCTAGAGCTGTCTCTAGTTCCTGGAAGTTCCTAGCTTCAAGCATGTCTAAGAAAGACTTCATTTGAG ... TACCTTGCTACCTTATAGTCTTCCCTAGCTTAATAATTTTTTCTGTACCTAATGATTTCAGAGTGAGATG ... GTGAGGTGATCATGGGCAAAATTATTAGTCTTTCTGAGTTCTCTTATTCCTTTTATATCATTGAATGTTC ... TTTTTTGTGGCTATTGTTAGGATTAGTGTTTCAATGTGAATGGCAGATTGAAGCTTCAGAGTGCTTTCAC ... TCATCTTCAGTTGTTTCTCCGAGTTGCCTTGAGAGAGAGAAAGAGGTAGTTTTAGCCCTATTTTGTAGGT ... ATAGTAATAGTGATTCCCTTTTCCTTTGTGTCTCTTTCGAATCCTACCATTTTATTCCCTATGTTTCTGT ... TGCCTGTCCTCACATTTGGTCCTTCTCAGGATATGGCATGCTTTCCATATTTCCCAGTAAAAATCCCAGC ... CTTTCCGATGTCAtattgtctctaaccact (SEQ ID NO: 216)
MP-560-2: cttcACCTATCCTGCAACCTTTCCACATACTCTTCCCTCAACCTGGAAGACTCCTCCTGTTCTTTACCTG ... GATAATTCTTACATAGCCTTCCATTCTCAACTCAAATGGTGTTACTTCAAAGATGCCTTTGCTCATTACC ... AAACGTATATTAGGCCCCTCTCTTACTTATTTATACTTCCTTTGTAAGCAGCGACATGGCTCTTTTGCTC ... ACCCTGGTAAGCCTAGTGCCCAGTATATCATCTGACACACAATTGGTGGTCAACTGTTGATTCATGAGTG ... AATTTTATTGGTTACTGTTGATCGCCAGTGAAATAAGTGCTTAGAAACACTTATAGGCTGAATAGGAAGA ... ATTAAACAAATGAATGACTAGATAATAGGTACGTGGGAGTCACAGGGATTGACATCTTATTTTATTCAGT ... TTTGCCTACATTGGCTCTTTTCTTACAAATGTCCTGATGCCTATTGAGTATATATCCATAAGGTTTCTTT ... GAGTTTTCTGGAAGAAATGGCTGTTGTTGATGTTGTTTTTAGCAGCTCTTTTGACTCGACCATCCTCACA ... CCAGcgttaggtaatttagacagattagaa (SEQ ID NO: 217)
(219) Detection of Repetitive Sequences
(220) TABLE-US-00024 TABLE 24 Oligonucleotide sequences used as synthetic Targets for FIGS. 15A-15C.
FIG. 15abc-Target-6r: CAGCAGCAGCAGCAGCAG CAACAGCC (SEQ ID NO: 11)
FIG. 15abc-Target-9r: CAGCAGCAGCAGCAGCAGCAGCAGCAG CAACAGCC (SEQ ID NO: 12)
FIG. 15abc-Target-12r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CAACAGCC (SEQ ID NO: 13)
FIG. 15abc-Target-15r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CAACAGCC (SEQ ID NO: 14)
FIG. 15abc-Target-18r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CAACAGCC (SEQ ID NO: 15)
FIG. 15abc-Target-21r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG ... CAACAGCCg (SEQ ID NO: 18)
FIG. 15abc-Target-24r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG ... CAGCAGCAG CAACAGCCg (SEQ ID NO: 219)
FIG. 15abc-Target-27r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG ... CAGCAGCAGCAGCAGCAG CAACAGCCg (SEQ ID NO: 220)
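Each synthetic target in Table 24 is named by the length of its CAG repeat tract (e.g., Target-6r). The sketch below is one minimal way to check such a count from a sequence, assuming the repeat is a run of uninterrupted, in-frame copies of a non-degenerate unit (CAG, CGG, GAA); an interrupting codon such as the CAA in the CAACAGCC flank ends the run. The function name is hypothetical.

```python
import re


def longest_repeat_run(seq: str, unit: str = "CAG") -> int:
    """Largest number of consecutive copies of `unit` in `seq` (case-insensitive).

    Assumes a non-degenerate unit such as CAG, CGG, or GAA; any interrupting
    base or codon (e.g., the CAA in the CAACAGCC flank) ends a run.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+", re.IGNORECASE)
    return max((len(m.group(0)) // len(unit) for m in pattern.finditer(seq)), default=0)


# Synthetic HTT targets from the sequence listing (repeat tract + CAACAGCC flank).
HTT_T6 = "CAGCAGCAGCAGCAGCAGCAACAGCC"           # SEQ ID NO: 11
HTT_T9 = "CAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCC"  # SEQ ID NO: 12

for name, target in [("HTT-T6", HTT_T6), ("HTT-T9", HTT_T9)]:
    print(name, longest_repeat_run(target))
```

This prints `HTT-T6 6` and `HTT-T9 9`, matching the target names. The same call with `unit="GAA"` or `unit="CCG"` applies to the FXN and FMR1 targets in Tables 30 and 31, where a CCT interruption in the FMR1 targets would split the run.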
(221) TABLE-US-00025 TABLE 25 Oligonucleotide sequences used to construct M-Probes for FIGS. 15B-15F.
FIG. 15-sP1: AAGGACGAGCAAATGTACCTGCA cagcagcagcagcagcag CGTGATAGAGTCTTCGCAT (SEQ ID NO: 23)
FIG. 15-sP2: ATGCGAAGACTCTATCACG agcagcagcagcagcagcag GGCTGAACGTAACTCCTCG (SEQ ID NO: 29)
FIG. 15-sP3: CGAGGAGTTACGTTCAGCC agcagcagcagcagcag CGTTCTACCTCAGGTGTTC (SEQ ID NO: 35)
FIG. 15-sC1: TGAACGACGGAAATTGTGC ctgctgctgctgctgctg TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 24)
FIG. 15-sC2: GCTATCTTCAACCTTCTGG ctgctgctgctgctgctgct GCACAATTTCCGTCGTTCA (SEQ ID NO: 30)
FIG. 15-sC3: CTGATGCACTTAGAGTGAGC ctgctgctgctgctgct CCAGAAGGTTGAAGATAGC (SEQ ID NO: 36)
FIG. 15-tP-6r: AAGGACGAGCAAATGTACCTGCA cagcagcagcagcagcagc (SEQ ID NO: 19)
FIG. 15-tP-9r: AAGGACGAGCAAATGTACCTGCA cagcagcagcagcagcagcagcagcagc (SEQ ID NO: 21)
FIG. 15-tP-12r: ATGCGAAGACTCTATCACG agcagcagcagcagcagc (SEQ ID NO: 25)
FIG. 15-tP-15r: ATGCGAAGACTCTATCACG agcagcagcagcagcagcagcagcagc (SEQ ID NO: 27)
FIG. 15-tP-18r: CGAGGAGTTACGTTCAGCC agcagcagcagcagc (SEQ ID NO: 31)
FIG. 15-tP-21r: CGAGGAGTTACGTTCAGCC agcagcagcagcagcagcagcagc (SEQ ID NO: 33)
FIG. 15-tP-24r: GAACACCTGAGGTAGAACG agcagcagcagcagc (SEQ ID NO: 221)
FIG. 15-tP-27r: GAACACCTGAGGTAGAACG agcagcagcagcagcagcagcagc (SEQ ID NO: 222)
FIG. 15-tC-6r: ggctgttgctgctgctgctgctgctg TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 20)
FIG. 15-tC-9r: ggctgttgctgctgctgctgctgctgctgctgctg TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 22)
FIG. 15-tC-12r: ggctgttgctgctgctgctgctgct GCACAATTTCCGTCGTTCA (SEQ ID NO: 26)
FIG. 15-tC-15r: ggctgttgctgctgctgctgctgctgctgctgct GCACAATTTCCGTCGTTCA (SEQ ID NO: 28)
FIG. 15-tC-18r: ggctgttgctgctgctgctgct CCAGAAGGTTGAAGATAGC (SEQ ID NO: 32)
FIG. 15-tC-21r: ggctgttgctgctgctgctgctgctgctgct CCAGAAGGTTGAAGATAGC (SEQ ID NO: 34)
FIG. 15-tC-24r: ggctgttgctgctgctgctgct GCTCACTCTAAGTGCATCAG (SEQ ID NO: 38)
FIG. 15-tC-27r: ggctgttgctgctgctgctgctgctgctgct GCTCACTCTAAGTGCATCAG (SEQ ID NO: 40)
(222) TABLE-US-00026 TABLE 26 Universal segment oligos used for hybrid-capture experiments shown in FIGS. 15D-15G.
FIG. 15defg-uP: aGTGCGAA CAGGTACATTTGCTCGTCCTT (SEQ ID NO: 223)
FIG. 15defg-uC: /5Biosg/ ttttttt GTTAAATCGTGGATAGTAGA CTTCGCACt (SEQ ID NO: 224)
(223) TABLE-US-00027 TABLE 27 Oligonucleotide sequences used to construct M-Probes for FIG. 15G.
FIG. 15g-sP1: AAGGACGAGCAAATGTACCTGCA cagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcag ... CGTGATAGAGTCTTCGCAT (SEQ ID NO: 225)
FIG. 15g-sC1: TGAACGACGGAAATTGTGC ctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctg ... TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 226)
FIG. 15g-tP-33r: ATGCGAAGACTCTATCACGAG cagcagcagcagcagcagcagcagcagcagcagcagcagcagc (SEQ ID NO: 227)
FIG. 15g-tP-35r: ATGCGAAGACTCTATCACGAG cagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagc (SEQ ID NO: 228)
FIG. 15g-tP-36r: ATGCGAAGACTCTATCACGAG cagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagc (SEQ ID NO: 229)
FIG. 15g-tP-37r: ATGCGAAGACTCTATCACGAG cagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagc (SEQ ID NO: 230)
FIG. 15g-tP-39r: ATGCGAAGACTCTATCACGAG cagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagc (SEQ ID NO: 231)
FIG. 15g-tC-33r: ggctgttgctgctgctgctgctgctgctgctgctgctgctgctgctgctgct GCACAATTTCCGTCGTTCA (SEQ ID NO: 232)
FIG. 15g-tC-35r: ggctgttgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgct GCACAATTTCCGTCGTTCA (SEQ ID NO: 233)
FIG. 15g-tC-36r: ggctgttgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgct GCACAATTTCCGTCGTTCA (SEQ ID NO: 234)
FIG. 15g-tC-37r: ggctgttgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgct GCACAATTTCCGTCGTTCA (SEQ ID NO: 235)
FIG. 15g-tC-39r: ggctgttgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgctgct GCACAATTTCCGTCGTTCA (SEQ ID NO: 236)
(224) TABLE-US-00028 TABLE 28 Ultramer synthetic sequence with 26 CAG repeats used for the Sanger sequencing experiment in FIG. 38.
FIG. 38 Ultramer with 26 CAG repeats: TGGAAAAGCTGATGAAGGCCTTCGAGTCCCTCAAGTCCTTCCAGCAG ... CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG ... CAGCAGCAGCAGCAGCAGCAGCAACAGCC ... GCCACCGCCGCCGCCGCCGCCGCCGCCTCCTCAGCTTCCTCAG (SEQ ID NO: 237)
Sequencing Primer (forward): TCGAGTCCCTCAAGTCCTTC (SEQ ID NO: 45)
(225) TABLE-US-00029 TABLE 29 Oligonucleotide sequences used in FIGS. 40A-40B to test the formulation protocol.
FIG. 40-T-15r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CAACAGCC (SEQ ID NO: 238)
FIG. 40-T-18r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CAACAGCC (SEQ ID NO: 15)
FIG. 40-T-19r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CAACAGCC (SEQ ID NO: 239)
FIG. 40-T-20r: CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CAACAGCC (SEQ ID NO: 240)
FIG. 40-s1P: AAGGACGAGCAAATGTACCTGCA cagcagcagcagcagcag CGTGATAGAGTCTTCGCAT (SEQ ID NO: 23)
FIG. 40-s1C: TGAACGACGGAAATTGTGC ctgctgctgctgctgctg TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 24)
FIG. 40-s2P: ATGCGAAGACTCTATCACG cagcagcagcagcagcagcag GGCTGAACGTAACTCCTCG (SEQ ID NO: 241)
FIG. 40-s2C: GCTATCTTCAACCTTCTGG ctgctgctgctgctgctgctg GCACAATTTCCGTCGTTCA (SEQ ID NO: 242)
FIG. 40-tP: CGAGGAGTTACGTTCAGCC cagcagcagcagcagcag (SEQ ID NO: 243)
FIG. 40-tC: ggctgttgctgctgctgctgctgctg CCAGAAGGTTGAAGATAGC (SEQ ID NO: 244)
(226) TABLE-US-00030 TABLE 30 Oligonucleotide sequences used in FIG. 41A to examine CGG repeats in the FMR1 gene.
FIG. 41a-T-19r: CCGCCGCCGCCGCCGCCGCCGCCGCCGCCTCCGCCGCCGCCGCCGCCGCCGCCGCCGCCGC GCTGCCG (SEQ ID NO: 245)
FIG. 41a-T-16r: CCGCCGCCGCCGCCGCCGCCTCCGCCGCCGCCGCCGCCGCCGCCGCCGCCGC GCTGCCG (SEQ ID NO: 246)
FIG. 41a-sP1: AAGGACGAGCAAATGTACCTGCA ccgccgccgccgccgccg CGTGATAGAGTCTTCGCAT (SEQ ID NO: 247)
FIG. 41a-sC1: TGAACGACGGAAATTGTGC cggcggcggcggcggcgg TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 248)
FIG. 41a-sP2: ATGCGAAGACTCTATCACG ccgccgccgcctccgccgccg GGCTGAACGTAACTCCTCG (SEQ ID NO: 249)
FIG. 41a-sC2: GCTATCTTCAACCTTCTGG cggcggcggaggcggcggcgg GCACAATTTCCGTCGTTCA (SEQ ID NO: 250)
FIG. 41a-tP: CGAGGAGTTACGTTCAGCC ccgccgccgccgccgccgccgc (SEQ ID NO: 251)
FIG. 41a-tC: cggcagcgcggcggcggcggcggcggcgg CCAGAAGGTTGAAGATAGC (SEQ ID NO: 252)
(227) TABLE-US-00031 TABLE 31 Oligonucleotide sequences used in FIG. 41B to examine GAA repeats in the FXN gene.
FIG. 41b-T-19r: GAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAA AATAAAG (SEQ ID NO: 253)
FIG. 41b-T-16r: GAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAA AATAAAG (SEQ ID NO: 254)
FIG. 41b-sP1: AAGGACGAGCAAATGTACCTGCA gaagaagaagaagaagaa CGTGATAGAGTCTTCGCAT (SEQ ID NO: 255)
FIG. 41b-sC1: TGAACGACGGAAATTGTGC ttcttcttcttcttcttc TGGTCTACTATCCACGATTTAAC (SEQ ID NO: 256)
FIG. 41b-sP2: ATGCGAAGACTCTATCACG gaagaagaagaagaagaagaa GGCTGAACGTAACTCCTCG (SEQ ID NO: 257)
FIG. 41b-sC2: GCTATCTTCAACCTTCTGG ttcttcttcttcttcttcttc GCACAATTTCCGTCGTTCA (SEQ ID NO: 258)
FIG. 41b-tP: CGAGGAGTTACGTTCAGCC gaagaagaagaa (SEQ ID NO: 259)
FIG. 41b-tC: ctttattttcttcttcttcttcttc CCAGAAGGTTGAAGATAGC (SEQ ID NO: 260)
Example 8
(228) M-Probe Design and Validation.
(229) Each segment consists of two oligonucleotides hybridized to each other via a horizontal region; in the s and t segments, the sequences of these horizontal regions are target-specific. Throughout this disclosure, the lower (Complement) oligonucleotides have sequences complementary to subsequences of the target, and the upper (Protector) oligonucleotides have sequences identical to subsequences of the target. Different segments are hybridized to each other via two vertical “arms” whose sequences are independent of the target. For efficient formulation, every arm has a unique sequence, designed in silico to be orthogonal to all other arms and unlikely to bind to the human genome.
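The in silico arm-design procedure itself is not described in this section. As a hypothetical first-pass screen for the two criteria above, the sketch checks that each intended arm pair is an exact reverse complement and flags any non-partner pair of arms that shares a complementary k-mer (possible crosstalk). A k-mer screen is a crude stand-in for thermodynamic modeling and for a genome-wide off-target search; the function names and the choice of k are assumptions.

```python
from __future__ import annotations

from itertools import combinations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    s = seq.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def cross_complementary(a: str, b: str, k: int) -> bool:
    """True if some length-k stretch of `a` could pair perfectly with `b`."""
    return bool(kmers(a, k) & kmers(reverse_complement(b), k))


def audit_arms(arms: dict[str, str], partners: list[tuple[str, str]],
               k: int) -> list[tuple[str, str, str]]:
    """Flag intended partner arms that are not exact reverse complements, and
    non-partner arms that share a complementary k-mer (potential crosstalk)."""
    problems = []
    intended = {frozenset(p) for p in partners}
    for a, b in partners:
        if arms[b].upper() != reverse_complement(arms[a]).upper():
            problems.append(("not_complementary", a, b))
    for a, b in combinations(sorted(arms), 2):
        if frozenset((a, b)) not in intended and cross_complementary(arms[a], arms[b], k):
            problems.append(("crosstalk", a, b))
    return problems


# Arm pair taken from Tables 15 and 25: 3' arm of s1P and 5' arm of s2P.
arms = {"s1P_3prime": "CGTGATAGAGTCTTCGCAT", "s2P_5prime": "ATGCGAAGACTCTATCACG"}
print(audit_arms(arms, [("s1P_3prime", "s2P_5prime")], k=6))
```

This prints `[]`: the two arms listed in the tables are exact reverse complements of each other, as the vertical-arm hybridization requires. In a full design, every arm in the library would be audited together so that all non-partner combinations are screened.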
(230) Following the hybridization reaction with the target sequence, the upper M-Probe oligos are released as a multi-stranded complex.
Example 9
(231) Programmed Sequence Variation Tolerance. One technical challenge for many hybridization-based enrichment and detection methods is tolerating single-nucleotide polymorphisms (SNPs) at known locations. Inherited SNPs are common in the human genome; the literature reports roughly 1 SNP per 1,000 nt in an average individual (International HapMap Consortium, et al., A second generation human haplotype map of over 3.1 million SNPs, Nature 449, 851-861 (2007)). Many SNPs are intronic or synonymous and leave the protein sequence unchanged, but they can still interfere with hybridization-based detection or enrichment when they lie close to clinically or scientifically important sequence variants. For example, rs1050171 is a synonymous SNP in the EGFR gene (c.2361G>A) with a 43% allele frequency in the human population; it lies 8 nucleotides from the c.2369C>T (T790M) mutation, which confers resistance to the cancer drug erlotinib. The 1000 Genomes and dbSNP databases provide sequence, position, and frequency information for SNPs with allele frequencies of 0.5% or higher in the human genome (1000 Genomes Project Consortium, A global reference for human genetic variation, Nature 526, 68-74 (2015); Sherry, S. T., et al., dbSNP: the NCBI database of genetic variation, Nucleic Acids Res. 29, 308-311 (2001)).
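To make the EGFR example concrete, the sketch below lists known SNPs that fall within a window around a mutation of interest, using the positions and allele frequency quoted above. The window half-width `flank` stands in for a probe footprint and is a hypothetical parameter; the default `min_af` of 0.005 mirrors the 0.5% database threshold in the text. How the M-Probe actually tolerates such SNPs is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    position: int       # coordinate on the same axis as `center` (here: EGFR c. numbering)
    allele_freq: float


def variants_near(variants: list[Variant], center: int, flank: int,
                  min_af: float = 0.005) -> list[tuple[Variant, int]]:
    """Return (variant, offset-from-center) for every variant within +/- `flank` nt
    of `center` whose allele frequency is at least `min_af`, nearest first."""
    hits = [(v, v.position - center) for v in variants
            if abs(v.position - center) <= flank and v.allele_freq >= min_af]
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Values from the text: T790M is EGFR c.2369C>T; rs1050171 is c.2361G>A at 43% allele frequency.
T790M_POS = 2369
known = [Variant("rs1050171", 2361, 0.43)]

print(variants_near(known, T790M_POS, flank=10))  # rs1050171 at offset -8
print(variants_near(known, T790M_POS, flank=5))   # empty: outside a +/-5 nt window
```

Any probe whose target-binding region spans both c.2361 and c.2369 would return rs1050171 here, which is the situation in which a sequence-variation-tolerant design becomes necessary.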
Example 10
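(233) Combinatorial M-Probe Formation. Another feature of the modular construction of the M-Probe is that multiple different internal segments can be combined combinatorially to generate many different M-Probes against different target sequences. For an M-Probe with $n$ internal segments, the number of unique M-Probes that can be formed scales with the product of the numbers of segment instances: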
$$\text{Num. unique M-Probes} = m_t \cdot \prod_{i=1}^{n} m_i$$
where $m_t$ is the number of instances of the terminal segment $t$ and $m_i$ is the number of instances of internal segment $s_i$. The number of oligonucleotides used to construct these M-Probes, in contrast, scales with the sum of the $m_i$:
$$\text{Num. oligonucleotides} = 2 \cdot \left(1 + m_t + \sum_{i=1}^{n} m_i\right)$$
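The two counting formulas can be evaluated directly. The sketch transcribes them as written (two oligonucleotides per segment instance); reading the constant 1 as a single shared segment common to every M-Probe is our interpretation and is not stated here. The first example uses the TCRβ design described in this Example ($n = 1$, $m_1 = 8$, $m_t = 6$); the second is a hypothetical larger design.

```python
from __future__ import annotations

from math import prod
from typing import Sequence


def num_unique_mprobes(m_t: int, m_internal: Sequence[int]) -> int:
    """m_t times the product of the m_i: one instance chosen for t and for each s_i."""
    return m_t * prod(m_internal)


def num_oligonucleotides(m_t: int, m_internal: Sequence[int]) -> int:
    """2 * (1 + m_t + sum of the m_i); the constant 1 is read here (an
    interpretation) as one shared segment used by every M-Probe."""
    return 2 * (1 + m_t + sum(m_internal))


# TCRβ design: n = 1 internal segment with m_1 = 8 instances, m_t = 6.
print(num_unique_mprobes(6, [8]), num_oligonucleotides(6, [8]))  # 48 30
# Hypothetical design: n = 3 internal segments with 10 instances each, m_t = 10.
print(num_unique_mprobes(10, [10, 10, 10]), num_oligonucleotides(10, [10, 10, 10]))  # 10000 82
```

The second line illustrates the point of combinatorial formulation: the probe count grows multiplicatively (10,000) while the oligonucleotide count grows only additively (82).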
(234) For large values of $n$ and $m_i$, combinatorial formulation substantially reduces the number of oligonucleotides needed to detect or enrich sequences. In human T cells, the TCRβ gene undergoes VDJ recombination, in which one V, one D, and one J gene segment are selected from 48 V, 2 D, and 13 J gene segments, respectively.
(235) Because the D gene region is short and highly variable in sequence, we treated the entire D region as variable and designed the M-Probes with $n=1$, with the $s_1$ segment corresponding to the V region and the $t$ segment corresponding to the J region. The bulge formed when an M-Probe binds its intended target varies in length between 8 and 32 nt. We designed $m_1=8$ and $m_t=6$ different instances of the $s_1$ and $t$ segments, respectively, allowing detection of 48 combinatorially recombined VDJ sequences.
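The combinatorial assembly behind the 48 VDJ probes can be sketched as a Cartesian product: each M-Probe takes one instance of every internal segment plus one instance of the terminal segment. The labels `V1`..`V8` and `J1`..`J6` below are hypothetical placeholders for the 8 $s_1$ and 6 $t$ instances, not actual gene names.

```python
from __future__ import annotations

from itertools import product
from typing import Sequence


def assemble_mprobes(internal: Sequence[Sequence[str]],
                     terminal: Sequence[str]) -> list[tuple[str, ...]]:
    """Every M-Probe = one instance of each internal segment s_1..s_n plus one instance of t."""
    return [combo + (t,) for combo in product(*internal) for t in terminal]


# Placeholder labels (hypothetical): 8 s_1 instances for V regions, 6 t instances for J regions.
s1_instances = [f"V{i}" for i in range(1, 9)]
t_instances = [f"J{j}" for j in range(1, 7)]

probes = assemble_mprobes([s1_instances], t_instances)
print(len(probes), probes[0], probes[-1])  # 48 ('V1', 'J1') ('V8', 'J6')
```

Only 14 segment instances are synthesized, yet every one of the 48 V/J pairings is available as a distinct probe.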
(236) Of the 624 hybridization reactions characterized experimentally, every off-target reaction generated less than 0.6 a.u. of fluorescence, while 43 (90%) of the 48 on-target reactions generated more than 0.6 a.u. of fluorescence and 30 (63%) generated more than 1.2 a.u.
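The reported fractions follow from simple threshold counts. The sketch below shows the two checks, assuming per-reaction fluorescence values are available as plain lists (the raw values are not given in this section). It then reproduces the percentages from the reported counts, taking the 48 designed VDJ combinations as the on-target reactions.

```python
from __future__ import annotations

from typing import Iterable


def count_above(values: Iterable[float], threshold: float) -> tuple[int, int]:
    """(number of values strictly above `threshold`, total number of values)."""
    vals = list(values)
    return sum(v > threshold for v in vals), len(vals)


def no_false_positives(off_target: Iterable[float], threshold: float) -> bool:
    """True if every off-target signal is below the threshold."""
    return all(v < threshold for v in off_target)


# Counts reported in the text, with 48 on-target reactions.
for hits, threshold in [(43, 0.6), (30, 1.2)]:
    print(f"> {threshold} a.u.: {hits}/48 = {hits / 48:.1%}")
# > 0.6 a.u.: 43/48 = 89.6%
# > 1.2 a.u.: 30/48 = 62.5%
```

The text rounds these to 90% and 63%.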
(237) Therefore, the present invention is well adapted to attain the ends and advantages mentioned as well as those that are inherent therein. While numerous changes may be made by those skilled in the art, such changes are encompassed within the spirit of this invention as illustrated, in part, by the appended claims.
(238) The foregoing description of specific embodiments of the present disclosure has been presented for purposes of illustration and description. The exemplary embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, and thereby to enable others skilled in the art to best utilize the subject matter and various embodiments with various modifications as are suited to the particular use contemplated. Different features and disclosures of the various embodiments within the present disclosure may be combined within the scope of the present disclosure.
(239) TABLE-US-00032 SEQUENCE LISTING SEQ ID No. Seq Name Sequence 1 XP-F GTTAAATCGTGGATAGTAGAC TTCGCAC /3Rox_N/ 2 XP-Q /5IAbRQ/ GTGCGAA CAGGTACATTTGCTCGTCCTT 3 KRAS-1-Pss1 AAG GAC GAG CAA ATG TAC CTG CAA TGA CTG AAT ATA AAC TTG TGG TAG TTG GAG CTG GTG GCG TAG GCA AGC GTG ATA GAG TCT TCG CAT 4 KRAS-1-Css1 TGA ACG ACG GAA ATT GTG CCT TGC CTA CGC CAC CAG CTC CAA CTA CCA CAA GTT TAT ATT CAG TCA TTG GTC TAC TAT CCA CGA TTT AAC 5 KRAS-1-Pend-2AP ATG CGA AGA CTC TAT CAC GGT GCC TTG ACG ATA CAG CTA ATT CAG AAT CAT TTT GTG 6 KRAS-1-Cend- ATC ATA TTC GTC CAC AAA ATG ATT CTG AAT TAG 2AP CTG TAT CGT CAA GGC ACG CAC AAT TTC CGT CGT TCA 7 KRAS-1-Pss2 ATG CGA AGA CTC TAT CAC GGT GCC TTG ACG ATA CAG CTA ATT CAG AAT CAT TTT GTG GAC GAA TAT GAT GGC TGA ACG TAA CTC CTC G 8 KRAS-1-Css2 GCT ATC TTC AAC CTT CTG GAT CAT ATT CGT CCA CAA AAT GAT TCT GAA TTA GCT GTA TCG TCA AGG CAC GCA CAA TTT CCG TCG TTC A 9 KRAS-1-Pend- CGA GGA GTT ACG TTC AGC CCA ACA ATA GAG GAT AP3-t1 TCC TAC AGG AAG CAA GTA GTA ATT GAT GGA G 10 KRAS-1-Cend- CCA AGA GAC AGG TTT CTC CAT CAA TTA CTA CTT AP3 GCT TCC TGT AGG AAT CCT CTA TTG TTG CCA GAA GGT TGA AGA TAG C 11 HTT-T6 CAG CAG CAG CAG CAG CAG CAA CAG CC 12 HTT-T9 CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CC 13 HTT-T12 CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CC 14 HTT-T15 CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CC 15 HTT-T18 CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CC 16 HTT-T21 CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CC 17 HTT-T24 CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CC 18 HTT-T27 CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CC 19 HTT-6s-Pend-ex AAG GAC GAG CAA ATG TAC CTG CAC AGC AGC AGC AGC AGC AGC 20 HTT-6s-Cend GGC TGT TGC TGC TGC TGC TGC TGC TGT GGT CTA CTA TCC ACG ATT TAA C 21 
HTT-9s-Pend-ex AAG GAC GAG CAA ATG TAC CTG CAC AGC AGC AGC AGC AGC AGC AGC AGC AGC 22 HTT-9s-Cend GGC TGT TGC TGC TGC TGC TGC TGC TGC TGC TGC TGT GGT CTA CTA TCC ACG ATT TAA C 23 HTT-s-Pss1 AAG GAC GAG CAA ATG TAC CTG CAC AGC AGC AGC AGC AGC AGC GTG ATA GAG TCT TCG CAT 24 HTT-s-Css1 TGA ACG ACG GAA ATT GTG CCT GCT GCT GCT GCT GCT GTG GTC TAC TAT CCA CGA TTT AAC 25 HTT-12s-Pend-ex ATG CGA AGA CTC TAT CAC GAG CAG CAG CAG CAG CAG C 26 HTT-12s-Cend GGC TGT TGC TGC TGC TGC TGC TGC TGC ACA ATT TCC GTC GTT CA 27 HTT-15s-Pend-ex ATG CGA AGA CTC TAT CAC GAG CAG CAG CAG CAG CAG CAG CAG CAG C 28 HTT-15s-Cend GGC TGT TGC TGC TGC TGC TGC TGC TGC TGC TGC TGC ACA ATT TCC GTC GTT CA 29 HTT-s-Pss2 ATG CGA AGA CTC TAT CAC GAG CAG CAG CAG CAG CAG CAG GGC TGA ACG TAA CTC CTC G 30 HTT-s-Css2 GCT ATC TTC AAC CTT CTG GCT GCT GCT GCT GCT GCT GCT GCA CAA TTT CCG TCG TTC A 31 HTT-18s-Pend-ex CGA GGA GTT ACG TTC AGC CAG CAG CAG CAG CAG C 32 HTT-18s-Cend GGC TGT TGC TGC TGC TGC TGC TCC AGA AGG TTG AAG ATA GC 33 HTT-21s-Pend-ex CGA GGA GTT ACG TTC AGC CAG CAG CAG CAG CAG CAG CAG CAG C 34 HTT-21s-Cend GGC TGT TGC TGC TGC TGC TGC TGC TGC TGC TCC AGA AGG TTG AAG ATA GC 35 HTT-s-Pss3 CGA GGA GTT ACG TTC AGC CAG CAG CAG CAG CAG CAG CGT TCT ACC TCA GGT GTT C 36 HTT-s-Css3 CTG ATG CAC TTA GAG TGA GCC TGC TGC TGC TGC TGC TCC AGA AGG TTG AAG ATA GC 37 HTT-24s-Pend GAA CAC CTG AGG TAG AAC GAG CAG CAG CAG CAG 38 HTT-24s-Cend GGC TGT TGC TGC TGC TGC TGC TGC TCA CTC TAA GTG CAT CAG 39 HTT-27s-Pend GAA CAC CTG AGG TAG AAC GAG CAG CAG CAG CAG CAG CAG CAG 40 HTT-27s-Cend GGC TGT TGC TGC TGC TGC TGC TGC TGC TGC TGC TCA CTC TAA GTG CAT CAG 41 KRAScDNA- TATAAGGCCTGCTGAAAATGACTGAATATAAACTTGTG gBlock GTAGTTGGAGCTGGTGGCGTAGGCAAGAGTGCCTTGAC GATACAGCTAATTCAGAATCATTTTGTGGACGAATATG ATCCAACAATAGAGGATTCCTACAGGAAGCAAGTAGTA ATTGATGGAGAAACCTGTCTCTTGGATATTCTCGACACA GCAGGTCAAGAGGAGTACAGTGCAATGAGGGACCAGT ACATGAGGACTGGGGAGGGCTTTCTTTGTGTATTTGCCA TAAATAATACTAAATCATTTGAAGATATTCACCATTATA GAGAACAAATTAAAAGAGTTAAGGACTCTGAAGATGTA 
CCTATGGTCCTAGTAGGAAATAAATGTGATTTGCCTTCT AGAACAGTAGACACAAAACAGGCTCAGGACTTAGCAA GAAGTTATGGAATTCCTTTTATTGAAACATCAGCAAAG ACAAGACAGGGTGTTGATGATGCCTTCTATACATTAGTT CGAGAAATTCGAAAACATAAAGAAAAGATGAGCAAAG ATGGTAAAAAGAAGAAAAAGAAGTCAAAGACAAAGTG TGTAATTATGTAAATACAATTTGTACTTTTTTCTTAAGG CATACTAGTACAAGTGG 42 KRAS-full-fP1 TATAAGGCCTGCTGAAAATGACT 43 KRAS-AP3-rP ATCCAAGAGACAGGTTTCTCCA 44 HTTU27 TGGAAAAGCTGATGAAGGCCTTCGAGTCCCTCAAGTCC TTCCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCA GCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CAGCAGCAGCAACAGCCGCCACCGCCGCCGCCGCCGCC GCCGCCTCCTCAGCTTCCTCAG 45 HTT-FPnew1 TCGAGTCCCTCAAGTCCTTC 46 HTT-rpn4 GGTGGCGGCTGTTG