Methods and Reagents for Molecular Barcoding
20230017673 · 2023-01-19
Inventors
Cpc classification
C12N15/1065
CHEMISTRY; METALLURGY
C12Q1/6806
CHEMISTRY; METALLURGY
International classification
Abstract
Methods and reagents for preparing nucleic acid samples for sequencing are provided. The samples include formalin-fixed paraffin-embedded (FFPE) samples. The methods comprise contacting a nucleic acid sample with a multimeric barcoding reagent comprising barcode regions linked together and appending barcode sequences to nucleic acid sequences of a target nucleic acid molecule. Methods are also provided that additionally use in-vitro transposition, coupling sequences and/or primer-extension to append barcode sequences to nucleic acid sequences of a target nucleic acid molecule.
Claims
1-12. (canceled)
13. A method of producing a library of multimeric barcoding reagents, wherein the method comprises: (a) contacting at least 1000 donor multimeric barcoding reagents with at least 1000 supports, wherein each said support comprises a bead that is 10 nanometers to 100 microns in diameter, wherein the donor multimeric barcoding reagents each comprise: (i) a multimeric barcode molecule comprising first and second barcode molecules comprised within a nucleic acid molecule, wherein each of the barcode molecules comprises a nucleic acid sequence comprising a barcode region; and (ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule and wherein the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule; and (b) appending the first and second barcoded oligonucleotides of each donor multimeric barcoding reagent to a different support from among the at least 1000 supports to form a library of multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises the first and second barcoded oligonucleotides of a donor multimeric barcoding reagent linked together by the support to which they are appended; wherein the library of multimeric barcoding reagents comprises at least 1000 different multimeric barcoding reagents and wherein the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different to the barcode regions of the barcoded oligonucleotides of at least 999 other multimeric barcoding reagents in the library.
14. The method of claim 13, wherein the method comprises: (a) synthesising the first barcoded oligonucleotide and the second barcoded oligonucleotide from a multimeric barcode molecule as a template to form each donor multimeric barcoding reagent, wherein each multimeric barcode molecule comprises first and second barcode molecules linked together, wherein each of the barcode molecules comprises a nucleic acid sequence comprising a barcode region, and wherein the first barcoded oligonucleotide comprises a sequence complementary to all or part of the first barcode region and the second barcoded oligonucleotide comprises a sequence complementary to all or part of the second barcode region; (b) contacting donor multimeric barcoding reagents with the supports; and (c) appending the first and second barcoded oligonucleotides of each donor multimeric barcoding reagent to a support to form the library of multimeric barcoding reagents.
15. The method of claim 14, wherein a priming region is located at the 3' end of each barcode region of each barcode molecule, wherein step (a) is performed by a primer-extension reaction wherein an extension primer at least partially complementary to said priming region is annealed to each priming region, and then extended in a primer-extension reaction by a polymerase.
16. The method of claim 13, wherein (i) an appending moiety is comprised within or attached to the first and second barcoded oligonucleotides, (ii) an appending moiety is comprised within or attached to a multimeric barcode molecule and/or (iii) an appending moiety is comprised within or attached to an extension primer.
17. The method of claim 13, wherein an appending moiety comprises: (i) a hapten molecule; (ii) a reactive chemical group, optionally wherein the reactive chemical group comprises a primary amine, an azide group, or an alkyne group; (iii) a nucleic acid sequence; and/or (iv) a linker region.
18. The method of claim 13, wherein the support comprises a magnetic bead or a superparamagnetic bead.
19. The method of claim 13, wherein a step of appending comprises a process of binding an appending moiety to an appending site.
20. The method of claim 19, wherein the process of binding comprises a covalent binding event or a non-covalent binding event.
21. The method of claim 13, wherein the method further comprises separating the multimeric barcode molecules from the first and second barcoded oligonucleotides.
22. The method of claim 21, wherein the separation step is performed after the barcoded oligonucleotides have been appended to the supports.
23. The method of claim 21, wherein the separation step is performed with a thermal denaturation step, and wherein the regions of annealing between the multimeric barcode molecules and the barcoded oligonucleotides are denatured.
24. The method of claim 13, wherein the method comprises dividing each multimeric barcoding reagent, wherein at least first and second fragments are produced for each multimeric barcoding reagent, and wherein said first fragment comprises a first barcode region of said multimeric barcoding reagent and said second fragment comprises a second barcode region of said multimeric barcoding reagent.
25. The method of claim 24, wherein the first and second barcode regions of each multimeric barcoding reagent are comprised within a multimeric barcode molecule, and wherein a recognition sequence for a restriction endonuclease is comprised within the multimeric barcode molecule, and wherein the multimeric barcoding reagent is divided by a process comprising cleavage by the restriction endonuclease.
26. The method of claim 24, wherein each multimeric barcoding reagent is comprised of at least a first barcode region and a second barcode region comprised within a multimeric barcode molecule, and wherein each multimeric barcode molecule comprises at least one uracil nucleotide, and wherein the dividing step comprises excising at least one uracil base by a uracil DNA glycosylase enzyme, and wherein the dividing step produces said first and second fragments for each multimeric barcoding reagent.
27. The method of claim 24, wherein the dividing step comprises (i) a step of acoustic shearing or a step of sonication, and/or (ii) digestion with a deoxyribonuclease enzyme.
28. The method of claim 24, wherein each multimeric barcoding reagent is divided into at least 2, at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 50,000, or at least 100,000 fragments.
29. The method of claim 13, wherein the method comprises producing a library of at least 10.sup.4, at least 10.sup.5, at least 10.sup.6, at least 10.sup.7, at least 10.sup.8 or at least 10.sup.9 different multimeric barcoding reagents.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0726] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0727] The invention, together with further objects and advantages thereof, may best be understood by making reference to the description taken together with the accompanying drawings, in which:
[0728]
[0729]
[0730]
[0731]
[0732]
[0733]
[0734]
[0735]
[0736]
[0737]
[0738]
[0739]
[0740]
[0741]
[0742]
[0743]
[0744]
[0745]
[0746]
[0747]
[0748]
EXAMPLES
Materials and Methods
Method 1 - Synthesis of a Library of Nucleic Acid Barcode Molecules
Synthesis of Double-Stranded Sub-Barcode Molecule Library
[0749] In a PCR tube, 10 microliters of 10 micromolar BC_MX3 (an equimolar mixture of all sequences in SEQ ID NO: 18 to 269) were added to 10 microliters of 10 micromolar BC_ADD_TP1 (SEQ ID NO: 1), plus 10 microliters of 10X CutSmart Buffer (New England Biolabs) plus 1.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen) plus 68 microliters H.sub.2O, to a final volume of 99 microliters. The PCR tube was placed on a thermal cycler and incubated at 75° C. for 5 minutes, then slowly annealed to 4° C., then held at 4° C., then placed on ice. 1.0 microliter of Klenow polymerase fragment (New England Biolabs; at 5 U/uL) was added to the solution and mixed. The PCR tube was again placed on a thermal cycler and incubated at 25° C. for 15 minutes, then held at 4° C. The solution was then purified with a purification column (Nucleotide Removal Kit; Qiagen), eluted in 50 microliters H.sub.2O, and quantitated spectrophotometrically.
Synthesis of Double-Stranded Downstream Adapter Molecule
[0750] In a PCR tube, 0.5 microliters of 100 micromolar BC_ANC_TP1 (SEQ ID NO: 2) were added to 0.5 microliters of 100 micromolar BC_ANC_BT1 (SEQ ID NO: 3), plus 20 microliters of 10X CutSmart Buffer (New England Biolabs) plus 178 microliters H.sub.2O, to final volume of 200 microliters. The PCR tube was placed on a thermal cycler and incubated at 95° C. for 5 minutes, then slowly annealed to 4° C., then held at 4° C., then placed on ice, then stored at -20° C.
Ligation of Double-Stranded Sub-Barcode Molecule Library to Double-Stranded Downstream Adapter Molecule
[0751] In a 1.5 milliliter Eppendorf tube, 1.0 microliter of Double-Stranded Downstream Adapter Molecule solution was added to 2.5 microliters of Double-Stranded Sub-Barcode Molecule Library, plus 2.0 microliters of 10X T4 DNA Ligase buffer, and 13.5 microliters H.sub.2O to final volume of 19 microliters. 1.0 microliter of T4 DNA Ligase (New England Biolabs; high concentration) was added to the solution and mixed. The tube was incubated at room temperature for 60 minutes, then purified with 1.8X volume (34 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 40 microliters H.sub.2O.
PCR Amplification of Ligated Library
[0752] In a PCR tube, 2.0 microliters of Ligated Library were added to 2.0 microliters of 50 micromolar BC_FWD_PR1 (SEQ ID NO: 4), plus 2.0 microliters of 50 micromolar BC_REV_PR1 (SEQ ID NO: 5), plus 10 microliters of 10X Taq PCR Buffer (Qiagen) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen) plus 81.5 microliters H.sub.2O, plus 0.5 microliters Qiagen Taq Polymerase (at 5 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 15 cycles of: 95° C. for 30 seconds, then 59° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C. The solution was then purified with 1.8X volume (180 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 50 microliters H.sub.2O.
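The final concentrations implied by the volumes in this PCR set-up follow from the dilution relation C1 x V1 = C2 x V2. The following is a minimal sketch of that arithmetic, not part of the described method; the function name `final_concentration` is introduced here for illustration only, and the calculation assumes the stated final volume of 100 microliters.

```python
def final_concentration(stock_conc, stock_volume_ul, final_volume_ul):
    """Return the diluted concentration, in the same unit as stock_conc.

    Applies C1 * V1 = C2 * V2, solved for C2.
    """
    if final_volume_ul <= 0:
        raise ValueError("final volume must be positive")
    return stock_conc * stock_volume_ul / final_volume_ul


FINAL_VOLUME_UL = 100.0  # final reaction volume stated in paragraph [0752]

# Each primer: 2.0 microliters of a 50 micromolar stock
primer_um = final_concentration(50.0, 2.0, FINAL_VOLUME_UL)

# dNTP mix: 2.0 microliters of a 10 millimolar stock
dntp_mm = final_concentration(10.0, 2.0, FINAL_VOLUME_UL)

# 10X Taq PCR Buffer: 10 microliters
buffer_x = final_concentration(10.0, 10.0, FINAL_VOLUME_UL)

print(f"primer: {primer_um} uM each, dNTP: {dntp_mm} mM, buffer: {buffer_x}X")
```

Under these assumptions each primer is at about 1 micromolar, the nucleotide mix at about 0.2 millimolar, and the buffer at 1X in the assembled reaction.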
Uracil Glycosylase Enzyme Digestion
[0753] To an Eppendorf tube were added 15 microliters of the eluted PCR amplification product, 1.0 microliters H.sub.2O, 2.0 microliters 10X CutSmart Buffer (New England Biolabs), and 2.0 microliters of USER enzyme solution (New England Biolabs), and the solution was mixed. The tube was incubated at 37° C. for 60 minutes, then the solution was purified with 1.8X volume (34 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 34 microliters H.sub.2O.
MlyI Restriction Enzyme Cleavage
[0754] To the eluate from the previous (glycosylase digestion) step, 4.0 microliters 10X CutSmart Buffer (New England Biolabs) and 2.0 microliters of MlyI enzyme (New England Biolabs, at 5 U/uL) were added and mixed. The tube was incubated at 37° C. for 60 minutes, then the solution was purified with 1.8X volume (72 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 40 microliters H.sub.2O.
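The bead volume quoted for this clean-up is the stated bead-to-sample ratio multiplied by the assembled reaction volume. The sketch below reproduces that arithmetic for the restriction digest above; the helper name `bead_volume_ul` is hypothetical and is used here only for illustration.

```python
def bead_volume_ul(sample_volume_ul, ratio):
    """Return the bead suspension volume for a given bead-to-sample ratio."""
    if sample_volume_ul < 0 or ratio < 0:
        raise ValueError("volume and ratio must be non-negative")
    return sample_volume_ul * ratio


# Components of the digest in paragraph [0754], in microliters:
# eluate from the glycosylase step, 10X buffer, restriction enzyme
digest_volume_ul = 34.0 + 4.0 + 2.0

beads_ul = bead_volume_ul(digest_volume_ul, 1.8)
print(f"sample: {digest_volume_ul} uL, beads at 1.8X: {beads_ul:.0f} uL")
```

A 40 microliter digest at a 1.8X ratio gives the 72 microliters of beads stated in the text.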
Ligation of Sub-Barcode Library to MlyI-Cleaved Solution
[0755] In a 1.5 milliliter Eppendorf tube, 10 microliters of MlyI-Cleaved Solution were added to 2.5 microliters of Double-Stranded Sub-Barcode Molecule Library, plus 2.0 microliters of 10X T4 DNA Ligase buffer, and 4.5 microliters H.sub.2O to final volume of 19 microliters. 1.0 microliter of T4 DNA Ligase (New England Biolabs; high concentration) was added to the solution and mixed. The tube was incubated at room temperature for 60 minutes, then purified with 1.8X volume (34 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 40 microliters H.sub.2O.
Repeating Cycles of Sub-Barcode Addition
[0756] The experimental steps of: 1) Ligation of Sub-Barcode Library to MlyI-Cleaved Solution, 2) PCR Amplification of Ligated Library, 3) Uracil Glycosylase Enzyme Digestion, and 4) MlyI Restriction Enzyme Cleavage were repeated, in sequence, for a total of five cycles.
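Because each ligation round appends one sub-barcode drawn from the BC_MX3 pool (SEQ ID NO: 18 to 269, that is, 252 sequences), the number of distinct concatenated barcodes that can in principle arise grows as a power of the pool size. The sketch below illustrates this upper bound only. It assumes every listed sequence is a distinct sub-barcode and that each position is filled independently and uniformly, which a real library will not achieve; the number of positions is left as a parameter because it depends on how the initial ligation and the repeated cycles are counted. The function names are hypothetical.

```python
def count_sequences(first_seq_id, last_seq_id):
    """Number of sequences in an inclusive SEQ ID NO range."""
    return last_seq_id - first_seq_id + 1


def max_barcode_diversity(pool_size, positions):
    """Upper bound on distinct barcodes built from `positions` sub-barcodes.

    Assumes independent, uniform choice from the pool at every position.
    """
    if positions < 0:
        raise ValueError("positions must be non-negative")
    return pool_size ** positions


POOL_SIZE = count_sequences(18, 269)  # 252 sequences in the BC_MX3 mixture

for positions in range(1, 7):
    bound = max_barcode_diversity(POOL_SIZE, positions)
    print(f"{positions} sub-barcode position(s): at most {bound:.3e} combinations")
```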
Synthesis of Double-Stranded Upstream Adapter Molecule
[0757] In a PCR tube, 1.0 microliters of 100 micromolar BC_USO_TP1 (SEQ ID NO: 6) were added to 1.0 microliters of 100 micromolar BC_USO_BT1 (SEQ ID NO: 7), plus 20 microliters of 10X CutSmart Buffer (New England Biolabs) plus 178 microliters H.sub.2O, to final volume of 200 microliters. The PCR tube was placed on a thermal cycler and incubated at 95° C. for 60 seconds, then slowly annealed to 4° C., then held at 4° C., then placed on ice, then stored at -20° C.
Ligation of Double-Stranded Upstream Adapter Molecule
[0758] In a 1.5 milliliter Eppendorf tube, 3.0 microliters of Upstream Adapter solution were added to 10.0 microliters of final (after the fifth cycle) MlyI-Cleaved solution, plus 2.0 microliters of 10X T4 DNA Ligase buffer, and 5.0 microliters H.sub.2O to final volume of 19 microliters. 1.0 microliter of T4 DNA Ligase (New England Biolabs; high concentration) was added to the solution and mixed. The tube was incubated at room temperature for 60 minutes, then purified with 1.8X volume (34 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 40 microliters H.sub.2O.
PCR Amplification of Upstream Adapter-Ligated Library
[0759] In a PCR tube, 6.0 microliters of Upstream Adapter-Ligated Library were added to 1.0 microliters of 100 micromolar BC_CS_PCR_FWD1 (SEQ ID NO: 8), plus 1.0 microliters of 100 micromolar BC_CS_PCR_REV1 (SEQ ID NO: 9), plus 10 microliters of 10X Taq PCR Buffer (Qiagen) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen) plus 73.5 microliters H.sub.2O, plus 0.5 microliters Qiagen Taq Polymerase (at 5 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 15 cycles of: 95° C. for 30 seconds, then 61° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C. The solution, containing a library of amplified nucleic acid barcode molecules, was then purified with 1.8X volume (180 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions). The library of amplified nucleic acid barcode molecules was then eluted in 40 microliters H.sub.2O.
[0760] The library of amplified nucleic acid barcode molecules synthesised by the method described above was then used to assemble a library of multimeric barcode molecules as described below.
Method 2 - Assembly of a Library of Multimeric Barcode Molecules
[0761] A library of multimeric barcode molecules was assembled using the library of nucleic acid barcode molecules synthesised according to the methods of Method 1.
Primer-Extension With Forward Termination Primer and Forward Splinting Primer
[0762] In a PCR tube, 5.0 microliters of the library of amplified nucleic acid barcode molecules were added to 1.0 microliters of 100 micromolar CS_SPLT_FWD1 (SEQ ID NO: 10), plus 1.0 microliters of 5 micromolar CS_TERM_FWD1 (SEQ ID NO: 11), plus 10 microliters of 10X Thermopol Buffer (NEB) plus 2.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen) plus 80.0 microliters H.sub.2O, plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 1 cycle of: 95° C. for 30 seconds, then 53° C. for 30 seconds, then 72° C. for 60 seconds, then 1 cycle of: 95° C. for 30 seconds, then 50° C. for 30 seconds, then 72° C. for 60 seconds, then held at 4° C. The solution was then purified with a PCR purification column (Qiagen), and eluted in 85.0 microliters H.sub.2O.
Primer-Extension With Reverse Termination Primer and Reverse Splinting Primer
[0763] In a PCR tube, the 85.0 microliters of forward-extension primer-extension products were added to 1.0 microliters of 100 micromolar CS_SPLT_REV1 (SEQ ID NO: 12), plus 1.0 microliters of 5 micromolar CS_TERM_REV1 (SEQ ID NO: 13), plus 10 microliters of 10X Thermopol Buffer (NEB) plus 2.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 1 cycle of: 95° C. for 30 seconds, then 53° C. for 30 seconds, then 72° C. for 60 seconds, then 1 cycle of: 95° C. for 30 seconds, then 50° C. for 30 seconds, then 72° C. for 60 seconds, then held at 4° C. The solution was then purified with a PCR purification column (Qiagen), and eluted in 43.0 microliters H.sub.2O.
Linking Primer-Extension Products With Overlap-Extension PCR
[0764] In a PCR tube were added the 43.0 microliters of reverse-extension primer-extension products, plus 5.0 microliters of 10X Thermopol Buffer (NEB) plus 1.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) to final volume of 50 microliters. The PCR tube was placed on a thermal cycler and amplified for 5 cycles of: 95° C. for 30 seconds, then 60° C. for 60 seconds, then 72° C. for 2 minutes; then 5 cycles of: 95° C. for 30 seconds, then 60° C. for 60 seconds, then 72° C. for 5 minutes; then 5 cycles of: 95° C. for 30 seconds, then 60° C. for 60 seconds, then 72° C. for 10 minutes; then held at 4° C. The solution was then purified with 0.8X volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 40 microliters H.sub.2O.
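The three-stage overlap-extension program above, in which the 72° C. extension lengthens from 2 to 5 to 10 minutes, can be written down as a small data structure to make the schedule explicit. The sketch below is illustrative only: the class names `Step` and `Stage` are hypothetical, and the computed total counts programmed hold times only, ignoring ramping between temperatures and the final 4° C. hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # programmed hold time


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple

    def hold_seconds(self):
        """Total programmed hold time for all cycles of this stage."""
        return self.cycles * sum(step.seconds for step in self.steps)


# Program as stated in paragraph [0764]
OVERLAP_EXTENSION = (
    Stage(5, (Step(95, 30), Step(60, 60), Step(72, 2 * 60))),
    Stage(5, (Step(95, 30), Step(60, 60), Step(72, 5 * 60))),
    Stage(5, (Step(95, 30), Step(60, 60), Step(72, 10 * 60))),
)

total_cycles = sum(stage.cycles for stage in OVERLAP_EXTENSION)
total_seconds = sum(stage.hold_seconds() for stage in OVERLAP_EXTENSION)
print(f"{total_cycles} cycles, {total_seconds / 60:.1f} min of programmed holds")
```

The progressively longer extension steps are consistent with the products growing in length as barcode molecules are linked together over successive cycles.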
Amplification of Overlap-Extension Products
[0765] In a PCR tube were added 2.0 microliters of Overlap-Extension PCR solution, plus 1.0 microliters of 100 micromolar CS_PCR_FWD1 (SEQ ID NO: 14), plus 1.0 microliters of 100 micromolar CS_PCR_REV1 (SEQ ID NO: 15), plus 10 microliters of 10X Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL), plus 83.0 microliters H.sub.2O to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 15 cycles of: 95° C. for 30 seconds, then 58° C. for 30 seconds, then 72° C. for 10 minutes; then held at 4° C. The solution was then purified with 0.8X volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 50 microliters H.sub.2O, and quantitated spectrophotometrically.
Gel-Based Size Selection of Amplified Overlap-Extension Products
[0766] Approximately 250 nanograms of Amplified Overlap-Extension Products were loaded and run on a 0.9% agarose gel, and then stained and visualised with ethidium bromide. A band corresponding to 1000 nucleotide size (plus and minus 100 nucleotides) was excised and purified with a gel extraction column (Gel Extraction Kit, Qiagen) and eluted in 50 microliters H.sub.2O.
Amplification of Overlap-Extension Products
[0767] In a PCR tube were added 10.0 microliters of Gel-Size-Selected solution, plus 1.0 microliters of 100 micromolar CS_PCR_FWD1 (SEQ ID NO: 14), plus 1.0 microliters of 100 micromolar CS_PCR_REV1 (SEQ ID NO: 15), plus 10 microliters of 10X Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) plus 75.0 microliters H.sub.2O to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 15 cycles of: 95° C. for 30 seconds, then 58° C. for 30 seconds, then 72° C. for 4 minutes; then held at 4° C. The solution was then purified with 0.8X volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 50 microliters H.sub.2O, and quantitated spectrophotometrically.
Selection and Amplification of Quantitatively Known Number of Multimeric Barcode Molecules
[0768] Amplified gel-extracted solution was diluted to a concentration of 1 picogram per microliter, and then to a PCR tube was added 2.0 microliters of this diluted solution (approximately 2 million individual molecules), plus 0.1 microliters of 100 micromolar CS_PCR_FWD1 (SEQ ID NO: 14), plus 0.1 microliters of 100 micromolar CS_PCR_REV1 (SEQ ID NO: 15), plus 1.0 microliter 10X Thermopol Buffer (NEB) plus 0.2 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 0.1 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) plus 6.5 microliters H.sub.2O to final volume of 10 microliters. The PCR tube was placed on a thermal cycler and amplified for 11 cycles of: 95° C. for 30 seconds, then 57° C. for 30 seconds, then 72° C. for 4 minutes; then held at 4° C.
[0769] To the PCR tube was added 1.0 microliters of 100 micromolar CS_PCR_FWD1 (SEQ ID NO: 14), plus 1.0 microliters of 100 micromolar CS_PCR_REV1 (SEQ ID NO: 15), plus 9.0 microliters of 10X Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) plus 76.0 microliters H.sub.2O to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 10 cycles of: 95° C. for 30 seconds, then 57° C. for 30 seconds, then 72° C. for 4 minutes; then held at 4° C. The solution was then purified with 0.8X volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 50 microliters H.sub.2O, and quantitated spectrophotometrically.
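The figure of approximately 2 million molecules quoted for 2.0 microliters of the 1 picogram per microliter dilution can be checked from the input mass and the fragment length. The sketch below is a rough estimate under stated assumptions: an average mass of 650 grams per mole per base pair of double-stranded DNA (a common approximation, not a value given in the text) and a fragment length of 1000 base pairs, taken from the size-selected band. The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23     # molecules per mole
G_PER_MOL_PER_BP = 650.0     # assumed average mass of one double-stranded base pair


def molecules_from_mass(mass_pg, length_bp):
    """Estimate the number of double-stranded DNA molecules in a given mass."""
    if length_bp <= 0:
        raise ValueError("length must be positive")
    grams = mass_pg * 1e-12
    moles = grams / (length_bp * G_PER_MOL_PER_BP)
    return moles * AVOGADRO


# 2.0 microliters at 1 picogram per microliter, fragments of about 1000 base pairs
input_mass_pg = 2.0 * 1.0
copies = molecules_from_mass(input_mass_pg, 1000)
print(f"estimated input: {copies:.2e} molecules")
```

Under these assumptions the estimate comes to a little under 2 million molecules, in line with the approximate figure stated in paragraph [0768].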
Method 3: Production of Single-Stranded Multimeric Barcode Molecules by In Vitro Transcription and cDNA Synthesis
[0770] This method describes a series of steps to produce single-stranded DNA strands, to which oligonucleotides may be annealed and then barcoded along. The method begins with an overlap-extension PCR amplification reaction in which a promoter site for the T7 RNA Polymerase is appended to the 5' end of a library of multimeric barcode molecules. Four identical reactions are performed in parallel and then merged to increase the amount and concentration of this product available. In each of four identical PCR tubes, approximately 500 picograms of size-selected and PCR-amplified multimeric barcode molecules (as produced in the ‘Selection and Amplification of Quantitatively Known Number of Multimeric Barcode Molecules’ step of Method 2) were mixed with 2.0 microliters of 100 micromolar CS_PCR_FWD1_T7 (SEQ ID NO. 270) and 2.0 microliters of 100 micromolar CS_PCR_REV4 (SEQ ID NO. 271), plus 20.0 microliters of 10X Thermopol PCR buffer, plus 4.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, and 2.0 microliters Vent Exo Minus polymerase (at 5 units per microliter) plus water to a total volume of 200 microliters. Each PCR tube was placed on a thermal cycler and amplified for 22 cycles of: 95° C. for 60 seconds, then 60° C. for 30 seconds, then 72° C. for 3 minutes; then held at 4° C. The solution from all four reactions was then purified with a gel extraction column (Gel Extraction Kit, Qiagen) and eluted in 52 microliters H.sub.2O.
[0771] Fifty microliters of the eluate was mixed with 10 microliters 10X NEBuffer 2 (NEB), plus 0.5 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, and 1.0 microliters Vent Exo Minus polymerase (at 5 units per microliter) plus water to a total volume of 100 microliters. The reaction was incubated for 15 minutes at room temperature, then purified with 0.8X volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 40 microliters H.sub.2O, and quantitated spectrophotometrically.
[0772] A transcription step is then performed, in which the library of PCR-amplified templates containing a T7 RNA Polymerase promoter site (as produced in the preceding step) is used as a template for T7 RNA polymerase. This comprises an amplification step to produce a large amount of RNA-based nucleic acid corresponding to the library of multimeric barcode molecules (since each input PCR molecule can serve as a template to produce a large number of cognate RNA molecules). In the subsequent step, these RNA molecules are then reverse transcribed to create the desired, single-stranded multimeric barcode molecules. Ten (10) microliters of the eluate was mixed with 20 microliters 5X Transcription Buffer (Promega), plus 2.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 10 microliters of 0.1 millimolar DTT, plus 4.0 microliters SuperAseIn (Ambion), and 4.0 microliters Promega T7 RNA Polymerase (at 20 units per microliter) plus water to a total volume of 100 microliters. The reaction was incubated 4 hours at 37° C., then purified with an RNeasy Mini Kit (Qiagen), and eluted in 50 microliters H.sub.2O, and added to 6.0 microliters SuperAseIn (Ambion).
[0773] The RNA solution produced in the preceding in vitro transcription step is then reverse transcribed (using a primer specific to the 3' ends of the RNA molecules) and then digested with RNAse H to create single-stranded DNA molecules corresponding to multimeric barcode molecules, to which oligonucleotides may be annealed and then barcoded along. In two identical replicate tubes, 23.5 microliters of the eluate was mixed with 5.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 3.0 microliters SuperAseIn (Ambion), and 10.0 microliters of 2.0 micromolar CS_PCR_REV1 (SEQ ID NO. 272) plus water to final volume of 73.5 microliters. The reaction was incubated on a thermal cycler at 65° C. for 5 minutes, then 50° C. for 60 seconds; then held at 4° C. To the tube was added 20 microliters 5X Reverse Transcription buffer (Invitrogen), plus 5.0 microliters of 0.1 millimolar DTT, and 1.75 microliters Superscript III Reverse Transcriptase (Invitrogen). The reaction was incubated at 55° C. for 45 minutes, then 60° C. for 5 minutes; then 70° C. for 15 minutes, then held at 4° C., then purified with a PCR Cleanup column (Qiagen) and eluted in 40 microliters H.sub.2O.
[0774] Sixty microliters of the eluate was mixed with 7.0 microliters 10X RNAse H Buffer (Promega), plus 4.0 microliters RNAse H (Promega). The reaction was incubated 12 hours at 37° C., then 95° C. for 10 minutes, then held at 4° C., then purified with 0.7X volume (49 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.
Method 4: Production of Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides
[0775] This method describes steps to produce multimeric barcoding reagents from single-stranded multimeric barcode molecules (as produced in Method 3) and appropriate extension primers and adapter oligonucleotides.
[0776] In a PCR tube, approximately 45 nanograms of single-stranded RNAse H-digested multimeric barcode molecules (as produced in the last step of Method 3) were mixed with 0.25 microliters of 10 micromolar DS_ST_05 (SEQ ID NO. 273, an adapter oligonucleotide) and 0.25 microliters of 10 micromolar US_PCR_Prm_Only_03 (SEQ ID NO. 274, an extension primer), plus 5.0 microliters of 5X Isothermal extension/ligation buffer, plus water to final volume of 19.7 microliters. In order to anneal the adapter oligonucleotides and extension primers to the multimeric barcode molecules, in a thermal cycler, the tube was incubated at 98° C. for 60 seconds, then slowly annealed to 55° C., then held at 55° C. for 60 seconds, then slowly annealed to 50° C., then held at 50° C. for 60 seconds, then slowly annealed to 20° C. at 0.1° C./sec, then held at 4° C. To the tube was added 0.3 microliters (0.625 U) Phusion Polymerase (NEB; 2 U/uL); 2.5 microliters (100 U) Taq DNA Ligase (NEB; 40 U/uL); and 2.5 microliters 100 millimolar DTT. In order to extend the extension primer(s) across the adjacent barcode region(s) of each multimeric barcode molecule, and then to ligate this extension product to the phosphorylated 5' end of the adapter oligonucleotide annealed downstream thereof, the tube was then incubated at 50° C. for 3 minutes, then held at 4° C. The reaction was then purified with a PCR Cleanup column (Qiagen) and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.
Method 5: Production of Synthetic DNA Templates of Known Sequence
[0777] This method describes a technique to produce synthetic DNA templates with a large number of tandemly-repeated, co-linear molecular sequence identifiers, by circularizing and then tandemly amplifying (with a processive, strand-displacing polymerase) oligonucleotides containing said molecular sequence identifiers. This reagent may then be used to evaluate and measure the multimeric barcoding reagents described herein.
[0778] In a PCR tube were added 0.4 microliters of 1.0 micromolar Syn_Temp_01 (SEQ ID NO. 275) and 0.4 microliters of 1.0 micromolar ST_Splint_02 (SEQ ID NO. 276) and 10.0 microliters of 10X NEB CutSmart buffer. On a thermal cycler, the tube was incubated at 95° C. for 60 seconds, then held at 75° C. for 5 minutes, then slowly annealed to 20° C., then held at 20° C. for 60 seconds, then held at 4° C. To circularize the molecules through an intramolecular ligation reaction, 10.0 microliters ribo-ATP and 5.0 microliters T4 DNA Ligase (NEB; High Concentration) were then added to the tube. The tube was then incubated at room temperature for 30 minutes, then at 65° C. for 10 minutes, then slowly annealed to 20° C., then held at 20° C. for 60 seconds, then held at 4° C. To each tube was then added 10X NEB CutSmart buffer, 4.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, and 1.5 microliters of diluted phi29 DNA Polymerase (NEB; Diluted 1:20 in 1X CutSmart buffer) plus water to a total volume of 200 microliters. The reaction was incubated at 30° C. for 5 minutes, then held at 4° C., then purified with 0.7X volume (140 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.
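The product of copying a circularized oligonucleotide with a strand-displacing polymerase can be pictured as a single strand carrying tandem repeats of the complement of the circle. The sketch below is a simplified illustration of that idea only. The circle sequence is a made-up placeholder and not Syn_Temp_01 (SEQ ID NO. 275), the function names are hypothetical, and the model ignores where priming starts on the circle, so the phase of the repeat unit is arbitrary.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def rolling_circle_product(circle, repeats):
    """Model the displaced strand as tandem copies of the circle's reverse complement."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return reverse_complement(circle) * repeats


circle = "ACGTTGCAAG"   # hypothetical placeholder for a circularized template
product = rolling_circle_product(circle, 3)

print(product)
print(len(product) // len(circle), "co-linear copies of the repeat unit")
```

Because every repeat on one product strand derives from the same circle, any molecular sequence identifier contained in the circle recurs along the whole strand, which is what allows the template to be used to evaluate the barcoding reagents.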
Method 6: Barcoding Synthetic DNA Templates of Known Sequence With Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides
[0779] In a PCR tube were added 10.0 microliters 5X Phusion HF buffer (NEB), plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 2.0 microliters (10 nanograms) 5.0 nanogram/microliter Synthetic DNA Templates of Known Sequence (as produced by Method 5), plus water to final volume of 42.5 microliters. The tube was then incubated at 98° C. for 60 seconds, then held at 20° C. To the tube was added 5.0 microliters of 5.0 picogram/microliter Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides (as produced by Method 4). The reaction was then incubated at 70° C. for 60 seconds, then slowly annealed to 60° C., then 60° C. for five minutes, then slowly annealed to 55° C., then 55° C. for five minutes, then slowly annealed to 50° C., then 50° C. for five minutes, then held at 4° C. To the reaction was added 0.5 microliters of Phusion Polymerase (NEB), plus 2.0 microliters 10 uM SynTemp_PE2_B1_Short1 (SEQ ID NO. 277, a primer that is complementary to part of the extension products produced by annealing and extending the multimeric barcoding reagents created by Method 4 along the synthetic DNA templates created by Method 5, and that serves as a primer for the primer-extension and then PCR reactions described in this method). Of this reaction, a volume of 5.0 microliters was added to a new PCR tube, which was then incubated for 30 seconds at 55° C., 30 seconds 60° C., and 30 seconds 72° C., then followed by 10 cycles of: 98° C. then 65° C. then 72° C. for 30 seconds each, then held at 4° C. To each tube was then added 9.0 microliters 5X Phusion buffer, plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 1.75 microliters 10 uM SynTemp_PE2_B1_Short1 (SEQ ID NO. 277), plus 1.75 microliters 10 uM US_PCR_Prm_Only_02 (SEQ ID NO. 278, a primer partially complementary to the extension primer employed to generate the multimeric barcoding reagents as per Method 4, and serving as the ‘forward’ primer in this PCR amplification reaction), plus 0.5 microliters Phusion Polymerase (NEB), plus water to final volume of 50 microliters. The PCR tube was placed on a thermal cycler and amplified for 24 cycles of: 98° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C., then purified with 1.2X volume (60 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.
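The volumes stated in Method 6 can be checked with simple arithmetic. The following is a minimal illustrative sketch, not part of the original protocol; the variable and function names are hypothetical, and the only inputs are the volumes and concentrations given in the paragraph above.

```python
# Volumes in microliters, copied from the Method 6 text above.
first_mix = {
    "5X Phusion HF buffer": 10.0,
    "10 mM dNTP mix": 1.0,
    "synthetic DNA templates": 2.0,
}
pre_anneal_volume = 42.5  # stated volume after water is added

# Water needed to reach the stated 42.5 microliters.
water = pre_anneal_volume - sum(first_mix.values())

# Components added after the initial denaturation step.
later_additions = {
    "multimeric barcoding reagents": 5.0,
    "Phusion polymerase": 0.5,
    "SynTemp_PE2_B1_Short1 primer": 2.0,
}
extension_volume = pre_anneal_volume + sum(later_additions.values())

# Template mass: 2.0 microliters at 5.0 nanogram/microliter.
template_mass_ng = 2.0 * 5.0


def spri_bead_volume(sample_ul, ratio):
    """Bead volume for a cleanup at the given bead-to-sample ratio."""
    return sample_ul * ratio


# Final PCR volume is 50 microliters; cleanup is at 1.2X.
bead_volume = spri_bead_volume(50.0, 1.2)

print(water, extension_volume, template_mass_ng, bead_volume)
```

Under these inputs the primer-extension reaction comes to 50 microliters, the template mass matches the stated 10 nanograms, and the 1.2X cleanup matches the stated 60 microliters of beads.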
[0780] The resulting library was then barcoded for sample identification by a PCR-based method, amplified, and sequenced by standard methods using a 150-cycle, mid-output NextSeq flowcell (Illumina), and demultiplexed informatically for further analysis.
Method 7: Barcoding Synthetic DNA Templates of Known Sequence With Multimeric Barcoding Reagents and Separate Adapter Oligonucleotides
[0781] To anneal and extend adapter oligonucleotides along the synthetic DNA templates, in a PCR tube were added 10.0 microliters 5X Phusion HF buffer (NEB), plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 5.0 microliters (25 nanograms) 5.0 nanogram/microliter Synthetic DNA Templates of Known Sequence (as produced by Method 5), plus 0.25 microliters of 10 micromolar DS_ST_05 (SEQ ID NO. 273, an adapter oligonucleotide), plus water to final volume of 49.7 microliters. On a thermal cycler, the tube was incubated at 98° C. for 2 minutes, then 63° C. for 1 minute, then slowly annealed to 60° C. then held at 60° C. for 1 minute, then slowly annealed to 57° C. then held at 57° C. for 1 minute, then slowly annealed to 54° C. then held at 54° C. for 1 minute, then slowly annealed to 50° C. then held at 50° C. for 1 minute, then slowly annealed to 45° C. then held at 45° C. for 1 minute, then slowly annealed to 40° C. then held at 40° C. for 1 minute, then held at 4° C. To the tube was added 0.3 microliters Phusion Polymerase (NEB), and the reaction was incubated at 45° C. for 20 seconds, then 50° C. for 20 seconds, then 55° C. for 20 seconds, then 60° C. for 20 seconds, then 72° C. for 20 seconds, then held at 4° C.; the reaction was then purified with 0.8X volume (40 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.
[0782] In order to anneal adapter oligonucleotides (annealed and extended along the synthetic DNA templates as in the previous step) to multimeric barcode molecules, and then to anneal and then extend extension primer(s) across the adjacent barcode region(s) of each multimeric barcode molecule, and then to ligate this extension product to the phosphorylated 5' end of the adapter oligonucleotide annealed downstream thereof, to a PCR tube was added 10 microliters of the eluate from the previous step (containing the synthetic DNA templates along which the adapter oligonucleotides have been annealed and extended), plus 3.0 microliters of a 50.0 nanomolar solution of RNAse H-digested multimeric barcode molecules (as produced in the last step of Method 3), plus 6.0 microliters of 5X Isothermal extension/ligation buffer, plus water to final volume of 26.6 microliters. On a thermal cycler, the tube was incubated at 70° C. for 60 seconds, then slowly annealed to 60° C., then held at 60° C. for 5 minutes, then slowly annealed to 55° C. then held at 55° C. for 5 minutes, then slowly annealed to 50° C. at 0.1° C./sec then held at 50° C. for 30 minutes, then held at 4° C. To the tube was added 0.6 microliters 10 uM US_PCR_Prm_Only_02 (SEQ ID NO: 278, an extension primer), and the reaction was incubated at 50° C. for 10 minutes, then held at 4° C. To the tube was added 0.3 microliters (0.625 U) Phusion Polymerase (NEB; 2 U/uL), 2.5 microliters (100 U) Taq DNA Ligase (NEB; 40 U/uL), and 2.5 microliters 100 millimolar DTT. The tube was then incubated at 50° C. for 5 minutes, then held at 4° C. The reaction was then purified with 0.7X volume (21 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.
[0783] To a new PCR tube were added 25.0 microliters of the eluate, plus 10.0 microliters 5X Phusion HF buffer (NEB), plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 2.0 microliters 10 uM SynTemp_PE2_B1_Short1 (SEQ ID NO: 277; a primer that is complementary to part of the extension products produced by the above steps; serves as a primer for the primer-extension and then PCR reactions described here), plus 0.5 uL Phusion Polymerase (NEB), plus water to final volume of 49.7 microliters. Of this reaction, a volume of 5.0 microliters was added to a new PCR tube, which was then incubated for 30 seconds at 55° C., 30 seconds 60° C., and 30 seconds 72° C., then followed by 10 cycles of: 98° C. then 65° C. then 72° C. for 30 seconds each, then held at 4° C. To each tube was then added 9.0 microliters 5X Phusion buffer, plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 1.75 microliters 10 uM SynTemp_PE2_B1_Short1 (SEQ ID NO: 277), plus 1.75 microliters 10 uM US_PCR_Prm_Only_02 (SEQ ID NO: 278), plus 0.5 microliters Phusion Polymerase (NEB), plus water to final volume of 50 microliters. The PCR tube was placed on a thermal cycler and amplified for 24 cycles of: 98° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C., then purified with 1.2X volume (60 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.
[0784] The resulting library was then barcoded for sample identification by a PCR-based method, amplified, and sequenced by standard methods using a 150-cycle, mid-output NextSeq flowcell (Illumina), and demultiplexed informatically for further analysis.
Method 9: Barcoding Genomic DNA Loci With Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides
[0785] This method describes a framework for barcoding targets within specific genomic loci (e.g. barcoding a number of exons within a specific gene) using multimeric barcoding reagents that contain barcoded oligonucleotides. First, a solution of Multimeric Barcode Molecules was produced by In Vitro Transcription and cDNA Synthesis (as described in Method 3). Then, solutions of multimeric barcoding reagents containing barcoded oligonucleotides were produced as described in Method 4, with a modification made such that instead of using an adapter oligonucleotide targeting a synthetic DNA template (i.e. DS_ST_05, SEQ ID NO: 273, as used in Method 4), adapter oligonucleotides targeting the specific genomic loci were included at that step. Specifically, a solution of multimeric barcoding reagents containing appropriate barcoded oligonucleotides was produced individually for each of three different human genes: BRCA1 (containing 7 adapter oligonucleotides, SEQ ID NOs 279-285), HLA-A (containing 3 adapter oligonucleotides, SEQ ID NOs 286-288), and DQB1 (containing 2 adapter oligonucleotides, SEQ ID NOs 289-290). The process of Method 4 was conducted for each of these three solutions as described above. These three solutions were then merged together, in equal volume, and diluted to a final, total concentration of all barcoded oligonucleotides of approximately 50 nanomolar.
[0786] In a PCR tube were added 2.0 microliters 5X Phusion HF buffer (NEB), plus 1.0 microliter of 100 nanogram/microliter human genomic DNA (NA12878 from Coriell Institute), to a final volume of 9.0 microliters. In certain variant versions of this protocol, the multimeric barcoding reagents (containing barcoded oligonucleotides) were also added at this step, prior to the high-temperature 98° C. incubation. The reaction was incubated at 98° C. for 120 seconds, then held at 4° C. To the tube was added 1.0 microliters of the above 50 nanomolar solution of multimeric barcoding reagents, and then the reaction was incubated for 1 hour at 55° C., then 1 hour at 50° C., then 1 hour at 45° C., then held at 4° C. (Note that for certain samples, this last annealing process was extended to occur overnight, for a total of approximately 4 hours per temperature step).
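To put the annealing reaction above in quantitative terms, the following sketch converts the stated concentrations and volumes into molecule counts. It is an illustrative calculation only. The haploid genome size (3.1e9 base pairs) and the average mass per base pair (650 grams per mole) are approximations assumed here and do not come from the source, so the genome-copy figure is a rough estimate.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

# Stated in the text: 1.0 microliter of a 50 nanomolar solution of
# barcoded oligonucleotides, added to a 9.0 microliter reaction.
reagent_conc_molar = 50e-9
reagent_volume_l = 1.0e-6
reaction_volume_l = 10.0e-6  # 9.0 + 1.0 microliters

oligo_moles = reagent_conc_molar * reagent_volume_l
oligo_molecules = oligo_moles * AVOGADRO
final_oligo_conc_nm = oligo_moles / reaction_volume_l * 1e9

# Stated in the text: 1.0 microliter of 100 nanogram/microliter genomic DNA.
gdna_mass_g = 1.0 * 100e-9

# ASSUMPTIONS (not from the source): approximate haploid human genome
# size and approximate molar mass of one double-stranded base pair.
HAPLOID_GENOME_BP = 3.1e9
GRAMS_PER_MOLE_PER_BP = 650.0

haploid_genome_mass_g = HAPLOID_GENOME_BP * GRAMS_PER_MOLE_PER_BP / AVOGADRO
genome_copies = gdna_mass_g / haploid_genome_mass_g
oligos_per_genome_copy = oligo_molecules / genome_copies

print(f"{oligo_moles * 1e15:.0f} fmol barcoded oligonucleotides")
print(f"{oligo_molecules:.2e} oligonucleotide molecules")
print(f"{final_oligo_conc_nm:.1f} nM final oligonucleotide concentration")
print(f"~{genome_copies:.0f} haploid genome copies (approximate)")
```

Under these assumptions the barcoded oligonucleotides are present in very large molar excess over any single genomic target locus.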
[0787] In order to add a reverse universal priming sequence to each amplicon sequence (and thus to enable subsequent amplification of the entire library at once, using just one forward and one reverse amplification primer), the reaction was diluted 1:100, and 1.0 microliter of the resulting solution was added in a new PCR tube to 20.0 microliters 5X Phusion HF buffer (NEB), plus 2.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 1.0 microliters of a reverse-primer mixture (equimolar concentration of SEQ ID NOs 291-303, each primer at 5 micromolar concentration), plus 1.0 uL Phusion Polymerase (NEB), plus water to final volume of 100 microliters. The reaction was incubated at 53° C. for 30 seconds, 72° C. for 45 seconds, 98° C. for 90 seconds, then 68° C. for 30 seconds, then 64° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C. The reaction was then purified with 0.8X volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.
[0788] The resulting library was then barcoded for sample identification by a PCR-based method, amplified, and sequenced by standard methods using a 150-cycle, mid-output NextSeq flowcell (Illumina), and demultiplexed informatically for further analysis.
Method 10 - Sequencing the Library of Multimeric Barcode Molecules
Preparing Amplified Selected Molecules For Assessment With High-Throughput Sequencing
[0789] To a PCR tube was added 1.0 microliters of the amplified selected molecule solution, plus 1.0 microliters of 100 micromolar CS_SQ_AMP_REV1 (SEQ ID NO: 16), plus 1.0 microliters of 100 micromolar US_PCR_Prm_Only_02 (SEQ ID NO: 17), plus 10 microliters of 10X Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) plus 84.0 microliters H.sub.2O to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 3 cycles of: 95° C. for 30 seconds, then 56° C. for 30 seconds, then 72° C. for 3 minutes; then held at 4° C. The solution was then purified with 0.8X volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 85 microliters H.sub.2O.
[0790] This solution was then added to a new PCR tube, plus 1.0 microliters of 100 micromolar Illumina_PE1, plus 1.0 microliters of 100 micromolar Illumina_PE2, plus 10 microliters of 10X Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 4 cycles of: 95° C. for 30 seconds, then 64° C. for 30 seconds, then 72° C. for 3 minutes; then 18 cycles of: 95° C. for 30 seconds, then 67° C. for 30 seconds, then 72° C. for 3 minutes; then held at 4° C. The solution was then purified with 0.8X volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer’s instructions), and eluted in 40 microliters H.sub.2O.
[0791] High-throughput Illumina sequencing was then performed on this sample using a MiSeq sequencer with paired-end, 250-cycle V2 sequencing chemistry.
Method 11 - Assessment of Multimeric Nature of Barcodes Annealed and Extended Along Single Synthetic Template DNA Molecules
[0792] A library of barcoded synthetic DNA templates was created using a solution of multimeric barcoding reagents produced according to a protocol as described generally in Method 3 and Method 4, and using a solution of synthetic DNA templates as described in Method 5, and using a laboratory protocol as described in Method 6; the resulting library was then barcoded for sample identification by a PCR-based method, amplified, and sequenced by standard methods using a 150-cycle, mid-output NextSeq flowcell (Illumina), and demultiplexed informatically for further analysis. The DNA sequencing results from this method were then compared informatically with data produced from Method 10 to assess the degree of overlap between the multimeric barcoding of synthetic DNA templates and the arrangement of said barcodes on individual multimeric barcoding reagents (the results are shown in
Results
Structure and Expected Sequence Content of Each Sequence Multimeric Barcoding Reagent Molecule
[0793] The library of multimeric barcode molecules synthesised as described in Methods 1 to 3 was prepared for high-throughput sequencing, wherein each molecule sequenced includes a contiguous span of a specific multimeric barcode molecule (including one or more barcode sequences, and one or more associated upstream adapter sequences and/or downstream adapter sequences), all co-linear within the sequenced molecule. This library was then sequenced with paired-end 250 nucleotide reads on a MiSeq sequencer (Illumina) as described. This yielded approximately 13.5 million total molecules sequenced from the library, sequenced once from each end, for a total of approximately 27 million sequence reads.
[0794] Each forward read is expected to start with a six nucleotide sequence, corresponding to the 3' end of the upstream adapter: TGACCT. This is followed by the first barcode sequence within the molecule (expected to be 20 nt long).
[0795] This barcode is then followed by an ‘intra-barcode sequence’ (in this case sequenced in the ‘forward’ direction), which is 82 nucleotides long and includes both the downstream adapter sequence and the upstream adapter sequence in series: ATACCTGACTGCTCGTCAGTTGAGCGAATTCCGTATGGTGGTACACACCTACACTACTCGGACGCTCTTCCGATCTTGACCT (SEQ ID NO: 304)
[0796] Within the 250 nucleotide forward read, this will then be followed by a second barcode, another intra-barcode sequence, and then a third barcode, and then a fraction of another intra-barcode sequence.
[0797] Each reverse read is expected to start with a sequence corresponding to the downstream adapter sequence: GCTCAACTGACGAGCAGTCAGGTAT (SEQ ID NO: 305). This is then followed by the first barcode coming in from the opposite end of the molecule (also 20 nucleotides long, but sequenced from the opposite strand of the molecule and thus in the inverse orientation to those sequenced by the forward read). This barcode is then followed by the ‘intra-barcode sequence’, but in the inverse orientation (as it is on the opposite strand):
[0798] AGGTCAAGATCGGAAGAGCGTCCGAGTAGTGTAGGTGTGTACCACCATACGGAATTCGCTCAACTGACGAGCAGTCAGGTAT (SEQ ID NO: 306). Likewise, within the 250 nucleotide reverse read, this will then be followed by a second barcode, another intra-barcode sequence, and then a third barcode, and then a fraction of another intra-barcode sequence.
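The expected read structure described above can be checked directly from the listed sequences. The sketch below is illustrative and not part of the original analysis. It confirms that SEQ ID NO: 306 is the reverse complement of SEQ ID NO: 304, and works out how many nucleotides of a 250 nucleotide read remain after three 20-nucleotide barcodes and two full intra-barcode sequences. The helper names are hypothetical.

```python
# Sequences copied from paragraphs [0794] to [0798] above.
UPSTREAM_ADAPTER_3P = "TGACCT"
INTRA_BARCODE_FWD = (  # SEQ ID NO: 304
    "ATACCTGACTGCTCGTCAGTTGAGCGAATTCCGTATGGTGGTACACACCTACACTACTCGGA"
    "CGCTCTTCCGATCTTGACCT"
)
DOWNSTREAM_ADAPTER_REV = "GCTCAACTGACGAGCAGTCAGGTAT"  # SEQ ID NO: 305
INTRA_BARCODE_REV = (  # SEQ ID NO: 306
    "AGGTCAAGATCGGAAGAGCGTCCGAGTAGTGTAGGTGTGTACCACCATACGGAATTCGCTC"
    "AACTGACGAGCAGTCAGGTAT"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


READ_LENGTH = 250
BARCODE_LENGTH = 20


def leftover(read_start_length, n_barcodes=3, n_intra=2):
    """Nucleotides left in a read after the leading adapter sequence,
    n_barcodes barcodes and n_intra complete intra-barcode sequences."""
    used = (read_start_length
            + n_barcodes * BARCODE_LENGTH
            + n_intra * len(INTRA_BARCODE_FWD))
    return READ_LENGTH - used


forward_leftover = leftover(len(UPSTREAM_ADAPTER_3P))
reverse_leftover = leftover(len(DOWNSTREAM_ADAPTER_REV))

print(len(INTRA_BARCODE_FWD))
print(reverse_complement(INTRA_BARCODE_FWD) == INTRA_BARCODE_REV)
print(forward_leftover, reverse_leftover)
```

The leftover values are consistent with the statement that each read ends with only a fraction of a further intra-barcode sequence.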
Sequence Extraction and Analysis
[0799] With scripting in Python, each barcode was isolated together with its flanking upstream-adapter and downstream-adapter sequences, each individual barcode sequence of each barcode molecule was then isolated, and all barcode sequences that were sequenced within the same molecule were annotated as belonging to the same multimeric barcode molecule in the library of multimeric barcode molecules. A simple analysis script (Networkx; Python) was employed to determine overall multimeric barcode molecule barcode groups, by examining overlap of barcode-barcode pairs across different sequenced molecules. Several metrics were derived from these data, including barcode length, sequence content, and the size and complexity of the multimeric barcode molecules across the library of multimeric barcode molecules.
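The grouping step described above amounts to finding connected components in a graph whose nodes are barcodes and whose edges join barcodes observed within the same sequenced molecule. The original analysis used Networkx; the sketch below instead uses a small union-find structure from the standard library so that it runs without dependencies. It is an illustration of the idea under that substitution, not the script actually used, and the function name and toy barcodes are hypothetical.

```python
def group_barcodes(reads):
    """Group barcodes into putative multimeric barcode molecules.

    `reads` is a list of lists; each inner list holds the barcodes
    found within one sequenced molecule. Barcodes that share a read,
    directly or through a chain of reads, end up in the same group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for barcodes in reads:
        for barcode in barcodes:
            find(barcode)  # register singletons too
        for other in barcodes[1:]:
            union(barcodes[0], other)

    groups = {}
    for barcode in list(parent):
        groups.setdefault(find(barcode), set()).add(barcode)
    return sorted(sorted(group) for group in groups.values())


# Toy example with 4-nucleotide placeholder barcodes.
toy_reads = [
    ["AAAA", "CCCC", "GGGG"],  # three barcodes seen in one molecule
    ["GGGG", "TTTT"],          # overlaps the first read via GGGG
    ["ACGT", "TGCA"],          # an unrelated molecule
]
toy_groups = group_barcodes(toy_reads)
print(toy_groups)
```

With Networkx available, the same grouping could be obtained by adding one edge per barcode pair to a graph and taking its connected components.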
Number of Nucleotides Within Each Barcode Sequence
[0800] Each individual barcode sequence from each barcode molecule contained within each Illumina-sequenced molecule was isolated, and the total length of each such barcode was determined by counting the number of nucleotides between the upstream adapter molecule sequence and the downstream adapter molecule sequence. The results are shown in
[0801] The overwhelming majority of barcodes are 20 nucleotides long, which corresponds to five additions of our four-nucleotide-long sub-barcode molecules from our double-stranded sub-barcode library. This is thus the expected and desired result, and indicates that each ‘cycle’ of: Ligation of Sub-Barcode Library to MlyI-Cleaved Solution, PCR Amplification of the Ligated Library, Uracil Glycosylase Enzyme Digestion, and MlyI Restriction Enzyme Cleavage, was successful and able to efficiently add new four-nucleotide sub-barcode molecules at each cycle, and then was successfully able to amplify and carry these molecules forward through the protocol for continued further processing, including through the five total cycles of sub-barcode addition, to make the final, upstream-adapter-ligated libraries.
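The barcode-length measurement described above (counting nucleotides between the upstream adapter and the downstream adapter) can be sketched with a regular expression. This is an illustrative example, not the original script. The adapter strings are taken from the forward-read description in the preceding section; the toy barcodes and function name are hypothetical.

```python
import re
from collections import Counter

# From the forward-read description: the 3' end of the upstream adapter,
# and the start of the downstream adapter as read in the forward
# direction (the first 25 nucleotides of SEQ ID NO: 304).
UPSTREAM_END = "TGACCT"
DOWNSTREAM_START = "ATACCTGACTGCTCGTCAGTTGAGC"
INTRA_BARCODE_FWD = (  # SEQ ID NO: 304
    "ATACCTGACTGCTCGTCAGTTGAGCGAATTCCGTATGGTGGTACACACCTACACTACTCGGA"
    "CGCTCTTCCGATCTTGACCT"
)

# Non-greedy capture of whatever lies between the two adapters.
BARCODE_RE = re.compile(f"{UPSTREAM_END}([ACGT]+?){DOWNSTREAM_START}")

# Five additions of four-nucleotide sub-barcodes.
EXPECTED_LENGTH = 5 * 4


def barcode_lengths(read):
    """Lengths of all barcodes found between adapter pairs in a read."""
    return [len(barcode) for barcode in BARCODE_RE.findall(read)]


# Toy forward read: two full-length barcodes and one 16-nucleotide
# barcode (as if one sub-barcode addition had failed).
toy_read = (
    UPSTREAM_END + "ACGT" * 5 + INTRA_BARCODE_FWD
    + "TTGCA" * 4 + INTRA_BARCODE_FWD
    + "ACGTACGTACGTACGG" + DOWNSTREAM_START
)
toy_lengths = barcode_lengths(toy_read)
length_histogram = Counter(toy_lengths)
print(toy_lengths, length_histogram)
```

Exact string matching is used here for simplicity; real reads would likely need some tolerance for sequencing errors in the adapter sequences.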
[0802] We also used this sequence analysis method to quantitate the total number of unique barcodes in total, across all sequenced multimeric barcode molecules: this amounted to 19,953,626 total unique barcodes, which is essentially identical to the 20 million barcodes that would be expected, given that we synthesised 2 million multimeric barcode molecules, each with approximately 10 individual barcode molecules.
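The comparison between observed and expected barcode counts in the paragraph above is simple arithmetic, shown here as a short illustrative sketch. The 4**20 figure is only a theoretical upper bound for 20-nucleotide sequences and is an addition made here for context; the actual barcode space depends on the sub-barcode library, whose size is not stated in this section.

```python
# Figures stated in the text.
synthesised_molecules = 2_000_000
barcodes_per_molecule = 10  # approximate, as stated
observed_unique_barcodes = 19_953_626

expected_barcodes = synthesised_molecules * barcodes_per_molecule
observed_fraction = observed_unique_barcodes / expected_barcodes

# Upper bound on distinct 20-nucleotide sequences (context only;
# not a claim about the real sub-barcode library).
max_20mer_sequences = 4 ** 20

print(expected_barcodes)
print(f"{observed_fraction:.4f}")
print(max_20mer_sequences)
```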
[0803] Together, these data and analyses thus show that the method of creating complex, combinatoric barcodes from sub-barcode sequences is effective and useful for the purpose of synthesising multimeric barcode molecules.
Total Number of Unique Barcode Molecules in Each Multimeric Barcode Molecule
[0804]
[0805] This figure shows that the majority of multimeric barcode molecules sequenced within our reaction have two or more unique barcodes contained therein, thus showing that, through our Overlap-Extension PCR linking process, we are able to link together multiple barcode molecules into multimeric barcode molecules. Whilst we would expect to see more multimeric barcode molecules exhibiting closer to the expected number of barcode molecules (10), we expect that this observed effect is due to insufficiently high sequencing depth, and that with a greater number of sequenced molecules, we would be able to observe a greater fraction of the true links between individual barcode molecules. These data nonetheless suggest that the fundamental synthesis procedure we describe here is efficacious for the intended purpose.
Representative Multimeric Barcode Molecules
[0806]
[0807] This figure illustrates our multimeric barcode molecule synthesis procedure: that we are able to construct barcode molecules from sub-barcode molecule libraries, that we are able to link multiple barcode molecules with an overlap-extension PCR reaction, that we are able to isolate a quantitatively known number of individual multimeric barcode molecules, and that we are able to amplify these and subject them to downstream analysis and use.
Barcoding Synthetic DNA Templates of Known Sequence With (i) Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides, and (ii) Multimeric Barcoding Reagents and Separate Adapter Oligonucleotides
Sequence Extraction and Analysis
[0808] With scripting in Python and implemented in an Amazon Web Services (AWS) framework, for each sequence read following sample-demultiplexing, each barcode region from the given multimeric barcode reagent was isolated from its flanking upstream-adapter and downstream-adapter sequence. Likewise, each molecular sequence identifier region from the given synthetic DNA template molecule was isolated from its flanking upstream and downstream sequences. This process was repeated for each molecule in the sample library; a single filtering step was performed in which individual barcodes and molecular sequence identifiers that were present in only a single read (thus likely to represent either sequencing error or error from the enzymatic sample-preparation process) were censored from the data. For each molecular sequence identifier, the total number of unique (i.e. with different sequences) barcode regions found associated therewith within single sequence reads was quantitated. A histogram plot was then created to visualize the distribution of this number across all molecular sequence identifiers found in the library.
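The filtering and counting steps described above can be sketched as follows. This is a minimal illustration under one reading of the text, namely that a read is dropped when either its barcode or its molecular sequence identifier occurs in only one read. The function names and toy data are hypothetical and do not reproduce the original scripts.

```python
from collections import Counter, defaultdict


def barcodes_per_identifier(read_pairs):
    """Count unique barcodes linked to each molecular sequence identifier.

    `read_pairs` is a list of (barcode, identifier) tuples, one per read.
    Barcodes and identifiers seen in only a single read are censored.
    """
    barcode_reads = Counter(barcode for barcode, _ in read_pairs)
    identifier_reads = Counter(identifier for _, identifier in read_pairs)

    linked = defaultdict(set)
    for barcode, identifier in read_pairs:
        if barcode_reads[barcode] > 1 and identifier_reads[identifier] > 1:
            linked[identifier].add(barcode)
    return {identifier: len(barcodes) for identifier, barcodes in linked.items()}


def histogram(counts):
    """Map 'number of unique barcodes' to 'number of identifiers'."""
    return Counter(counts.values())


# Toy data: B3 occurs in a single read and is therefore censored.
toy_pairs = [
    ("B1", "M1"), ("B1", "M1"), ("B2", "M1"),
    ("B2", "M2"), ("B3", "M2"), ("B2", "M2"),
]
toy_counts = barcodes_per_identifier(toy_pairs)
toy_histogram = histogram(toy_counts)
print(toy_counts, toy_histogram)
```

The resulting histogram corresponds to the distribution that the text describes plotting across all molecular sequence identifiers.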
Discussion
[0809]
[0810]
[0811] Together, these two figures show that this framework for multimeric molecular barcoding is an effective one, and furthermore that the framework can be configured in different methodologic ways.
[0812] To analyse whether, and the extent to which, individual multimeric barcoding reagents successfully label two or more sub-sequences of the same synthetic DNA template, the groups of different barcodes on each individual multimeric barcoding reagent in the library (as predicted from the Networkx analysis described in the preceding paragraph and as illustrated in
[0813] The data from this analysis is shown in
Barcoding Genomic DNA Loci With Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides
Sequence Extraction and Analysis
[0814] As with other analyses, scripting was composed in Python and implemented in an Amazon Web Services (AWS) framework. For each sequence read following sample-demultiplexing, each barcode region from the given multimeric barcode reagent was isolated from its flanking upstream-adapter and downstream-adapter sequence and recorded independently for further analysis. Likewise, each sequence to the 3' end of the downstream region (representing sequence containing the barcoded oligonucleotide, and any sequences that the oligonucleotide had primed along during the experimental protocol) was isolated for further analysis. Each downstream sequence of each read was analysed for the presence of expected adapter oligonucleotide sequences (i.e. from the primers corresponding to one of the three genes to which the oligonucleotides were directed) and relevant additional downstream sequences. Each read was then recorded as being either ‘on-target’ (with sequence corresponding to one of the expected, targeted sequences) or ‘off-target’. Furthermore, for each of the targeted regions, the total number of unique multimeric barcodes (i.e. with identical but duplicate barcodes merged into a single-copy representation) was calculated. A schematic of each expected sequence read, and the constituent components thereof, is shown in
Discussion
[0815]
[0816] As seen in the figure, the majority of reads across all samples are on-target; however there is seen a large range in the number of unique barcode molecules observed for each amplicon target. These trends across different amplicons seem to be consistent across the different experimental conditions, and could be due to different priming (or mis-priming) efficiencies of the different oligonucleotides, or different amplification efficiencies, or different mapping efficiencies, plus potential other factors acting independently or in combination. Furthermore, it is clear that the samples that were annealed for longer have a larger number of barcodes observed, likely due to more complete overall annealing of the multimeric reagents to their cognate genomic targets. And furthermore, the samples where the barcoded oligonucleotides were first denatured from the barcode molecules show lower overall numbers of unique barcodes, perhaps owing to an avidity effect wherein fully assembled barcode molecules can more effectively anneal clusters of primers to nearby genomic targets at the same locus. In any case, taken together, this figure illustrates the capacity of multimeric reagents to label genomic DNA molecules, across a large number of molecules simultaneously, and to do so whether the barcoded oligonucleotides remain bound on the multimeric barcoding reagents or whether they have been denatured therefrom and thus potentially able to diffuse more readily in solution.
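The on-target classification and per-target unique-barcode counts discussed in this section can be sketched as below. This is an illustration only: the target sequences are short hypothetical placeholders, not the adapter oligonucleotides of SEQ ID NOs 279-290, and exact prefix matching is assumed as the simplest possible criterion for a read being on-target.

```python
from collections import defaultdict


def summarise_reads(reads, targets):
    """Classify reads as on- or off-target and count unique barcodes.

    `reads` is a list of (barcode, downstream_sequence) tuples.
    `targets` maps a target name to the sequence expected at the start
    of the downstream region for that target (placeholders here).
    """
    unique_barcodes = defaultdict(set)
    on_target = 0
    off_target = 0
    for barcode, downstream in reads:
        hit = next(
            (name for name, seq in targets.items() if downstream.startswith(seq)),
            None,
        )
        if hit is None:
            off_target += 1
        else:
            on_target += 1
            unique_barcodes[hit].add(barcode)  # duplicates merge in the set
    counts = {name: len(barcodes) for name, barcodes in unique_barcodes.items()}
    return on_target, off_target, counts


# Hypothetical placeholder targets and reads.
toy_targets = {"target_A": "ACGTACGT", "target_B": "TTGGCCAA"}
toy_reads = [
    ("BC1", "ACGTACGTGGG"),
    ("BC1", "ACGTACGTGGG"),  # duplicate barcode, counted once
    ("BC2", "ACGTACGTGGA"),
    ("BC3", "TTGGCCAATT"),
    ("BC4", "GGGGGGGG"),     # matches no target
]
toy_on, toy_off, toy_counts = summarise_reads(toy_reads, toy_targets)
print(toy_on, toy_off, toy_counts)
```

A production analysis would more plausibly align the downstream sequence to the reference amplicons rather than rely on exact prefixes.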
Experimental Method for Barcoding Fragments of Genomic DNA From FFPE Samples with Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides
[0817] An FFPE specimen (e.g. an FFPE specimen from a tissue biopsy of a cancer patient) is cut into 5 sections 5 microns to 10 microns thick, and then the paraffin is depleted from the sample with a xylene-dissolution step as described in the manufacturer’s instructions of the QIAamp DNA FFPE Tissue Kit (Qiagen), then centrifuged to pellet the de-paraffinised sample, and then washed with 100% ethanol to remove residual xylene, and then centrifuged again, the ethanol is aspirated, and any remaining ethanol is allowed to evaporate as per manufacturer’s instructions. The sample is then resuspended in 180 microliters Buffer ATL and digested with 5.0 microliter Proteinase K at 50° C. for 15 minutes. The sample is then pelleted at top speed on a desktop microcentrifuge for 5 minutes, and the solution is aspirated. The pellet is then resuspended in NEBNext Ultra II End Prep Reaction Buffer (New England Biolabs; e.g. 20.0 microliters of 1.1X Reaction Buffer) and fragments of genomic DNA from the sample are blunted and A-Tailed with
[0818] the NEBNext Ultra II End Prep Enzyme Mix (New England Biolabs) at 20° C. for 30 minutes, followed by heat-inactivation of the enzyme mix at 65° C. for 30 minutes (as per the manufacturer’s instructions).
[0819] A library of at least 10 million different multimeric barcoding reagents is then added to and mixed with the sample solution, wherein each multimeric barcoding reagent is a contiguous multimeric barcode molecule made of 10-30 individual barcode molecules, with each barcode molecule comprising a barcode region with a different sequence from the other barcode molecules on that multimeric barcoding reagent, and with a barcoded oligonucleotide annealed to each barcode molecule, wherein each barcoded oligonucleotide comprises the full forward (read 1) Illumina sequencing primer and adapter sequence on its 5' end, and a double-stranded target region with a single 3' thymine overhang nucleotide (i.e. a T-overhang, able to ligate to a 3'-A overhang DNA molecule from an A-Tailing reaction) at its 3' end.
[0820] An appropriate volume of ‘NEBNext Ultra II Ligation Master Mix’, and ‘NEBNext Ligation Enhancer’ (New England Biolabs) (as per manufacturer’s instructions) are then added to the solution, and then the samples are incubated at 20° C. for 120 on a thermal cycler with the heated lid turned off. An appropriate volume of ‘NEBNext Adapter’ (New England Biolabs) is then added to the sample, and the solutions are mixed gently; the solutions are then incubated at 20° C. for 60 minutes on a thermal cycler with the heated lid turned off. To each tube is then added ‘NEBNext USER Enzyme’, and the solutions are mixed gently; the solutions are then incubated at 20° C. for 20 minutes, then at 37° C. for 30 minutes on a thermal cycler with a heated lid set to 50° C., and then held at 4° C. To the sample is then added 180 microliters Buffer ATL and 20.0 microliter Proteinase K, and proteinase-digested at 56° C. for 120 minutes using manufacturer’s instructions from the QIAamp DNA FFPE Tissue Kit (Qiagen). The sample is then incubated at 90° C. for 60 minutes, and added to 200 microliters Buffer AL (Qiagen), and bound to and washed with QIAamp MinElute columns and eluted in 100 microliters Buffer ATE (all from QIAamp DNA FFPE Tissue Kit; Qiagen). The entire resulting eluted sample is then amplified with PCR using the standard Illumina PCR primers to yield 100-1000 ng total DNA following cleanup with 1.2X-volume Ampure XP SPRI beads (Agencourt; as per manufacturer's instructions), and then sequenced to an appropriate depth (e.g. at least 100 million total reads) on a standard Illumina Sequencer. Raw sequences may then be quality-trimmed and length-trimmed, constant adapter/primer sequences are trimmed away, and the genomic DNA sequences and barcode sequences from each retained sequence read are isolated informatically. Linked sequences are determined by detecting genomic DNA sequences that are appended to different barcode sequences from the same set of barcode sequences (i.e. from the same multimeric barcoding reagent).
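The final linking step described above can be sketched as a lookup followed by grouping. The sketch assumes that a mapping from each barcode sequence to its multimeric barcoding reagent is already available (for example from prior sequencing of the reagent library); that mapping, the function name, and the toy data below are hypothetical illustrations rather than part of the described method.

```python
from collections import defaultdict


def linked_sequences(reads, barcode_to_reagent):
    """Group genomic sequences that share a multimeric barcoding reagent.

    `reads` is a list of (barcode, genomic_sequence) tuples.
    `barcode_to_reagent` maps each known barcode to a reagent identifier.
    Sequences are reported as linked only when at least two different
    barcodes from the same reagent were observed.
    """
    barcodes_seen = defaultdict(set)
    sequences_seen = defaultdict(set)
    for barcode, genomic_sequence in reads:
        reagent = barcode_to_reagent.get(barcode)
        if reagent is None:
            continue  # barcode not assigned to any known reagent
        barcodes_seen[reagent].add(barcode)
        sequences_seen[reagent].add(genomic_sequence)
    return {
        reagent: sorted(sequences_seen[reagent])
        for reagent, barcodes in barcodes_seen.items()
        if len(barcodes) >= 2
    }


# Toy data: genomic sequences are shown as placeholder labels.
toy_map = {"AAAA": "R1", "CCCC": "R1", "GGGG": "R2"}
toy_reads = [
    ("AAAA", "fragment_1"),
    ("CCCC", "fragment_2"),  # different barcode, same reagent as above
    ("GGGG", "fragment_3"),  # only one barcode seen for reagent R2
    ("TTTT", "fragment_4"),  # barcode not in the map
]
toy_links = linked_sequences(toy_reads, toy_map)
print(toy_links)
```

In this toy case only the two fragments carrying different barcodes from reagent R1 are reported as linked.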