Target enrichment
11555185 · 2023-01-17
Assignee
Inventors
- Cynthia Hendrickson (Wenham, MA, US)
- Sarah Bowman (Ipswich, MA, US)
- Amy Emerman (Ipswich, MA, US)
- Kruti Patel (Ipswich, MA, US)
Cpc classification
C40B50/18
CHEMISTRY; METALLURGY
C12N15/1058
CHEMISTRY; METALLURGY
C40B70/00
CHEMISTRY; METALLURGY
C12Q2525/155
CHEMISTRY; METALLURGY
C12N15/1065
CHEMISTRY; METALLURGY
C12Q1/6806
CHEMISTRY; METALLURGY
C12N15/1093
CHEMISTRY; METALLURGY
International classification
C12N15/10
CHEMISTRY; METALLURGY
C40B50/18
CHEMISTRY; METALLURGY
Abstract
The present disclosure provides, among other things, a way to amplify and sequence target sequences in a low-input sample. In some embodiments, the method comprises ligating a double-stranded adaptor onto a population of fragments to produce tagged fragments, and linearly amplifying the tagged fragments.
Claims
1. A method for enriching for target sequences in biological samples, each sample characterized by a genome, the method comprising: (a) obtaining duplex polynucleotide fragments from the genomes of each of the biological samples; (b) ligating a first adaptor to the fragments from each sample to produce ligated polynucleotide fragments wherein each sample is in a separate reaction mix and wherein the first adaptor comprises: a 5′ top strand that comprises from 5′ to 3′, a leader sequence, a sample tag, and a sequence that is complementary to a 3′ bottom strand, the 3′ bottom strand comprising at least one modified nucleotide and not the sample tag nor the leader sequence, wherein at least some of the polynucleotide fragments in each sample comprise a target sequence; (c) pooling the ligated polynucleotide fragments into a single reaction mix wherein each sample of the multiple samples is tagged with a different sample tag; (d) hybridizing an oligonucleotide having an affinity binding domain to the 3′ end of the target sequence on each strand of the pooled polynucleotide fragments and immobilizing the hybridized oligonucleotide on a substrate; (e) removing any 3′ non-target single stranded overhang sequences to form a double-stranded end of the polynucleotide fragment; (f) ligating a second adaptor, optionally having an index sequence, to the 3′ double-stranded end of the polynucleotide fragment, wherein: (i) the second adaptor has a duplex at its 5′ end and a 3′ single strand overhang with a terminal 3′-5′ exonuclease blocking moiety on its 3′ end; and (ii) the duplex 5′ end has a top strand and a bottom strand where the bottom strand has at least one modified nucleotide; (g) removing the bottom strand of the second adaptor by enzymatic degradation at the modified nucleotides to form a single stranded polynucleotide immobilized on a substrate by the hybridized oligonucleotide; (h) removing immobilized polynucleotides that do not contain target sequences using a 
3′-5′ double-stranded exonuclease; and (i) obtaining the enriched target sequences.
2. The method according to claim 1, further comprising: introducing the index sequence during library amplification.
3. The method according to claim 2, further comprising pooling the polynucleotides having the index sequence with other polynucleotides having different index sequences.
4. The method according to claim 3, wherein step (i) further comprises sequencing the enriched target sequences in a single sequencing reaction to determine the genotype of the biological samples.
5. The method according to claim 1, wherein step (e) further comprises using a 3′-5′ single stranded exonuclease or a plurality of 3′-5′ single stranded exonucleases to remove the 3′ non target single stranded region.
6. The method according to claim 1, wherein the 5′ top strand in (b) further comprises a unique molecule identifier (UMI).
7. The method according to claim 1, wherein the at least one modified nucleotide in (b) and (f) are deoxyuridine and enzyme degradation in (e) and (g) is achieved using UDG.
8. The method according to claim 1, wherein (g) further comprises amplifying the immobilized polynucleotides using a primer containing an index sequence.
9. The method according to claim 1, wherein step (i) further comprises sequencing the enriched target sequences in the biological samples in a single sequencing step.
10. The method according to claim 9 further comprising obtaining genotypes from the sequencing data.
11. The method according to claim 1, wherein the duplex polynucleotide fragments comprise DNA.
12. The method according to claim 1, wherein the affinity binding domain is biotin.
13. A method of making an enriched population of polynucleotides comprising target sequences from biological samples, each sample characterized by a genome, the method comprising: (a) A-tailing duplex polynucleotide fragments from the genomes of the biological samples to produce A-tailed polynucleotide fragments; (b) ligating a first adaptor to the A-tailed polynucleotide fragments from each sample to produce ligated polynucleotide fragments wherein each sample is in a separate reaction mix and wherein the first adaptor comprises: a 5′ top strand that comprises from 5′ to 3′, a leader sequence, a sample tag unique to each separate reaction mix, and a sequence that is complementary to a 3′ bottom strand, the 3′ bottom strand comprising at least one modified nucleotide and not the sample tag nor the leader sequence, wherein at least some of the polynucleotide fragments in each sample comprise a target sequence; (c) pooling the ligated polynucleotide fragments into a single reaction mix to form pooled polynucleotide fragments; (d) hybridizing an oligonucleotide having an affinity binding domain to the 3′ end of the target sequence on each strand of the pooled polynucleotide fragments to form duplexes, each duplex comprising one of the pooled polynucleotide fragments, and immobilizing the duplexes on a substrate to form immobilized duplexes; (e) removing from the immobilized duplexes any 3′ non-target single stranded overhanging sequences to form a polynucleotide-oligonucleotide duplex having a double-stranded end; (f) ligating a second adaptor, optionally having an index sequence, to the double-stranded end of the polynucleotide-oligonucleotide duplexes to form second adapter ligation products, wherein: (i) the second adaptor has a duplex at its 5′ end and a 3′ single strand overhang with a terminal 3′-5′ exonuclease blocking moiety on its 3′ end; and (ii) the duplex 5′ end has a top strand and a bottom strand where the bottom strand has at least one modified 
nucleotide; (g) enzymatically degrading the modified nucleotides of the second adapter ligation products to form a single stranded polynucleotide immobilized on a substrate by the hybridized oligonucleotide; and (h) removing immobilized polynucleotides that do not contain target sequences using a 3′-5′ double-stranded exonuclease to form the enriched population of single stranded polynucleotides comprising target sequences.
14. The method according to claim 13 further comprising contacting the single stranded polynucleotides with one or more primers to amplify the enriched target sequences.
15. The method according to claim 13, further comprising: introducing the index sequence during library amplification.
16. The method according to claim 13, wherein step (e) further comprises using a 3′-5′ single stranded exonuclease or a plurality of 3′-5′ single stranded exonucleases to remove the 3′ non target single stranded region.
17. The method according to claim 13, wherein the 5′ top strand in (b) further comprises a unique molecule identifier (UMI).
18. The method according to claim 13, wherein the at least one modified nucleotide in (b) and (f) are deoxyuridine and enzyme degradation in (e) and (g) is achieved using UDG.
19. The method according to claim 13, wherein (g) further comprises amplifying the immobilized polynucleotides using a primer containing an index sequence.
20. The method according to claim 13, further comprising pooling the polynucleotides having the index sequence with other polynucleotides having different index sequences.
21. The method according to claim 20, wherein step (h) further comprises sequencing in a single sequencing reaction the enriched population of single stranded polynucleotides comprising target sequences to determine the genotype of the biological samples.
22. The method according to claim 13, wherein step (h) further comprises sequencing in a single sequencing step the enriched population of single stranded polynucleotides comprising target sequences.
23. The method according to claim 22 further comprising obtaining genotypes from the sequencing data.
24. The method according to claim 13, wherein the affinity binding domain is biotin.
Description
BRIEF DESCRIPTION OF THE FIGURES
(1) The skilled artisan will understand that the drawings, described below, are for illustration purposes only.
(2) The drawings are not intended to limit the scope of the present teachings in any way.
DETAILED DESCRIPTION
(14) Before the various embodiments are described, it is to be understood that the teachings of this disclosure are not limited to the particular embodiments described, and as such can, of course, vary. It is also to be understood that the description of particular embodiments is not intended to be limiting, since the scope of the present teachings will be limited only by the appended claims.
(15) The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described in any way. While the present teachings are described in conjunction with various embodiments, it is not intended that the present teachings be limited to such embodiments. On the contrary, the present teachings encompass various alternatives, modifications, and equivalents, as will be appreciated by those of skill in the art.
(16) As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which can be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present teachings. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.
(17) Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Still, certain elements may be defined for the sake of clarity and ease of reference.
(18) Terms and symbols of nucleic acid chemistry, biochemistry, genetics, and molecular biology used herein follow those of standard treatises and texts in the field, e.g. Kornberg and Baker, DNA Replication, Second Edition (W.H. Freeman, New York, 1992); Lehninger, Biochemistry, Second Edition (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, Second Edition (Wiley-Liss, New York, 1999); Eckstein, editor, Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, New York, 1991); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); and the like.
(19) Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
(20) It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. For example, the term “a primer” refers to one or more primers, i.e., a single primer and multiple primers. It is further noted that the claims can be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements or use of a “negative” limitation.
(21) As used herein, the term “linear amplification” is intended to refer to an amplification reaction in which the amount of product increases linearly, not exponentially, over time.
(22) The term “strand” as used herein refers to a nucleic acid made up of nucleotides linked together by covalent bonds.
(23) In its double-stranded form, DNA has two complementary strands of nucleic acid referred to herein as the “top” and “bottom” strands, where the top strand is, by convention, the strand that is oriented in the 5′ to 3′ direction.
(24) If a double-stranded adaptor contains a top strand and a bottom strand, the different strands can be formed from different oligonucleotide molecules (as exemplified in
(25) The term “unique molecule identifier” (UMI) refers to a random unique sequence of at least 6 nucleotides (6N). Longer random unique sequences may be used, for example, 2-15 nucleotides, 6-12 nucleotides, or 8-12 nucleotides. The adaptors at each 3′ end of a single molecule in steps 1 and 2 of
(26) The terms “sample identifier” and “sample tag” are used interchangeably and refer to a molecular barcode that identifies the sample source of a population of polynucleotide fragments. Accordingly, the adaptors ligated to each strand in a duplex will have the same sample identifier as will other polynucleotide fragments in the population (Tag-1 in
(27) The terms “index” and “index sequence” are used interchangeably. A single index sequence is used to label a multiplexed mixture of polynucleotides from a plurality of samples. The term “high sensitivity” for sequencing reads refers to the detection of rare variants that may occur in genomes. For example, in cancer biopsies, only a small percentage, e.g., 0.1%, of a population of polynucleotides from a human sample may contain the sequence variant of interest (e.g., SNPs). Therefore, a method that has a high sensitivity is necessary to detect these rare events. The methods involving linear amplification described herein and exemplified in
(28) The term “sample” is used herein to refer to the source of a population of polynucleotide fragments. Depending on its context, a sample may be a single cell, a tissue or an individual biological entity such as a plant, animal or microbe.
(29) The term “population of polynucleotides” refers to more than one polynucleotide. A population of polynucleotides may be derived from part or all of: a genomic DNA, organelle DNA, cDNA, or mRNA library.
(30) The term “polynucleotide” refers to a DNA or an RNA. This molecule may be naturally occurring and derived from a genome (DNA) of a virus or other life form, or cytoplasm or nucleus (RNA) or may be synthetic. Polynucleotides may include an entire genome, gene, fragment of DNA or library of fragments. Polynucleotides may include ribosomal RNAs (rRNAs), messenger RNAs (mRNAs), silencing RNAs (siRNAs), small nuclear RNAs (snRNAs), microRNAs (miRNAs), short interfering RNAs (siRNAs) or long non-coding RNAs (lncRNAs).
(31) The term “polynucleotide fragments” refers to products of polynucleotide cleavage or fragmentation.
(32) The term “target sequence” refers to a piece of the polynucleotide fragment that contains a locus of interest. This may be because the target sequence contains sequences or mutations that when determined by sequencing can be diagnostic for e.g. disease, phenotype or genotype. Examples of target sequences include exons, introns, regulatory sequences, single nucleotide polymorphisms (SNPs), gene fusions, copy number variations, and indels. Analysis of target sequences may also be used to determine heterozygosity and homozygosity.
(33) The present disclosure relates generally to compositions, methods of use, and kits for obtaining sequencing data from polynucleotide samples and detecting variants that may be correlated with disease or with heredity. Examples are provided herein for linear amplification of polynucleotide samples, providing the opportunity to distinguish sequences for positive and negative strands of a duplex DNA sample. Examples are also provided for multiplex analysis of polynucleotide samples. Target enrichment methods may comprise linear amplification without multiplexing, multiplexing without linear amplification, or linear amplification and multiplexing, in each case, prior to hybridization and affinity capture. Linear amplification may provide or improve accuracy and coverage when processing low abundance samples.
(34) Methods disclosed here may produce products that when sequenced, result in as much as 90% or more of reads on target, have very high coverage uniformity, and/or display minimal GC bias. Target-specific probes may be selected to capture a single gene or many targets in a multiplex workflow.
(35) Certain principles of embodiments of the present method are shown in
(36) Sample Preparation
(37) The first step of the method as described in
(38) The sample used in present embodiments can contain genomic DNA from virtually any organism, including, but not limited to, plants, animals (e.g., reptiles, mammals, insects, worms, fish, etc.), tissue samples, bacteria, fungi (e.g., yeast), phage, viruses, cadaveric tissue, archaeological/ancient samples, etc. In certain embodiments, the genomic DNA used in the method may be derived from a mammal, wherein in certain embodiments, the mammal is a human.
(39) In exemplary embodiments, the sample may contain genomic DNA from a mammalian cell, such as, a human, mouse, rat, or monkey cell. The sample may be made from cultured cells or cells of a clinical sample, e.g., a tissue biopsy, scrape or lavage or cells of a forensic sample (i.e., cells of a sample collected at a crime scene).
(40) In particular embodiments, the nucleic acid sample may be obtained from a biological sample such as cells, tissues, bodily fluids, and stool. Bodily fluids of interest include but are not limited to, blood, serum, plasma, saliva, mucous, phlegm, cerebral spinal fluid, pleural fluid, tears, lactal duct fluid, lymph, sputum, synovial fluid, urine, amniotic fluid, and semen. In particular embodiments, a sample may be obtained from a subject, e.g., a human.
(41) In some embodiments, the sample comprises fragments of human genomic DNA. In some embodiments, the sample may be obtained from a cancer patient. In some embodiments, the sample may be made by extracting fragmented DNA from a patient sample, e.g., a formalin-fixed paraffin embedded tissue sample. In some embodiments, the patient sample may be a sample of cell-free “circulating” DNA from a bodily fluid, e.g., peripheral blood of a subject (e.g., a cancer patient). The DNA fragments used in the initial step of the method should be non-amplified DNA that has not been denatured beforehand.
(42) The DNA in the initial sample may be made by extracting genomic DNA from a biological sample, and then fragmenting it. The fragmenting may be done mechanically (e.g., by sonication, nebulization, or shearing) or enzymatically using a double-stranded DNA “dsDNA” Fragmentase® enzyme (New England Biolabs, Ipswich Mass.) or other single-stranded or double-stranded nucleases or nickases. In other embodiments, the DNA in the initial sample may already be fragmented (e.g., as is the case for FFPE samples and circulating cell-free DNA (cfDNA), e.g., ctDNA). The fragments in the initial sample may have a median size that is below 1 kb (e.g., in the range of 50 bp to 500 bp, or 80 bp to 400 bp), although fragments having a median size outside of this range may be used. In this method the ends of fragmented DNA may be polished and A-tailed prior to ligation to the adaptor.
(43) In some embodiments, the amount of DNA in a sample may be limiting. For example, the initial sample of fragmented DNA may contain less than 200 ng of fragmented human DNA, e.g., 10 pg to 200 ng, 100 pg to 200 ng, 1 ng to 200 ng or 5 ng to 50 ng, or less than 10,000 haploid genome equivalents (e.g., less than 5,000, less than 1,000, less than 500, less than 100 or less than 10), depending on the genome, although amounts outside of these ranges may be used.
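As a rough illustration of how the mass ranges above relate to haploid genome equivalents, the following sketch converts a DNA mass into an approximate copy number. The haploid human genome size (3.1e9 bp) and the average mass of a base pair (650 g/mol) are assumptions that are not stated in this disclosure, and the function name is hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average molar mass of one base pair
HUMAN_HAPLOID_BP = 3.1e9     # assumed haploid human genome size in bp

def haploid_genome_equivalents(mass_ng, genome_bp=HUMAN_HAPLOID_BP):
    """Approximate number of haploid genome copies in mass_ng of dsDNA."""
    grams_per_genome = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return (mass_ng * 1e-9) / grams_per_genome

# Masses taken from the ranges recited above (10 pg = 0.01 ng).
for ng in (0.01, 1, 5, 50, 200):
    copies = haploid_genome_equivalents(ng)
    print(f"{ng:>6} ng ~ {copies:,.0f} haploid genome equivalents")
```

Under these assumptions one haploid human genome weighs about 3.3 pg, so 10,000 haploid genome equivalents correspond to roughly 33 ng of DNA.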
(44) In some embodiments, the nucleic acid sequences may be fragmented to a desired size for example, an average size of 150 bp-200 bp or 200 bp-300 bp or 300 bp-400 bp or 400 bp-500 bp or 500 bp-600 bp or 600 bp-700 bp, although sizes outside of these ranges may be used. As illustrated as step 1 in
(45) Adaptor Ligation
(46) Next, the method may comprise ligating a double-stranded adaptor onto the population of fragments to produce tagged fragments. The double-stranded 5′ adaptor can be composed of two oligonucleotides that are hybridized together (as exemplified in
(47) After ligation, if necessary, the nucleic acid fragments ligated to adaptors can be purified from the ligation reaction mixture, e.g., using magnetic beads.
(48) Linear Amplification
(49) In some embodiments of the method (in accordance with step 3 of the method shown in
(50) In embodiments in which the top strand of the double-stranded adaptor comprises one or more modified nucleotides, prior to the linear amplification, the tagged fragments may be treated with an enzyme, e.g., a glycosylase, to excise the bases of the modified nucleotides prior to thermocycling, resulting in cleavage and removal of the DNA containing these modified nucleotides.
(51) In these embodiments, the modified nucleotide may be deoxyuridine and the enzyme may be UDG, although other modified nucleotide/enzyme combinations can be used. In the example shown in
(52) The linear amplification may be done by combining the linear polynucleotide fragment ligation products with polymerase, dNTPs, a linear amplification primer and optionally UDG to produce a reaction mix, and thermocycling the reaction mix. The reaction mix should be thermocycled at least once (e.g., at least 5 times, or at least 10 times or at least 20 times) to produce a number of copies of each DNA fragment that are ligated to the bottom strand of the adaptor and where the copy number corresponds to the cycle number. The products of this reaction have a copy of the molecular barcode of the bottom strand of the adaptor at the 5′ end and therefore can be referred to as 5′-tagged amplification products. This reaction can be done using NEBNext Ultra II Q5® Master Mix (a master mix containing Q5 DNA polymerase; New England Biolabs, Ipswich, Mass.), although other polymerases can be used.
(53) This linear amplification step can be implemented as follows. After an initial denaturation step (e.g., at 98° C. for 30 seconds), the reaction can be temperature cycled at least once (e.g., at least 5 times, at least 10 times, at least 15 times or at least 20 times) in the following way: a temperature above 90° C. (e.g., 98° C.) for at least 5 seconds, a temperature of below 60° C. (e.g., 55° C.) for at least 5 seconds, and a temperature in the range of 65° C. to 80° C., e.g., 70° C. to 75° C. for at least 10 seconds. At the first temperature, e.g., 98° C., the DNA fragments denature. At the next temperature, e.g., 55° C., the linear amplification primer anneals to the 3′ end of the adaptor sequence. At 72° C., the polymerase (e.g., Q5 polymerase) extends the linear amplification primer. Other thermocycling conditions are known and can be readily used in this step. The product is a DNA sequence fragment containing an adaptor sequence at the 5′ end, i.e., a 5′ tagged amplification product.
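The temperature and time limits recited above can be written down as a small check on a proposed cycle. This is only a sketch: the class and function names are hypothetical, and the hold times in the example cycle (10, 15 and 20 seconds) are illustrative assumptions that satisfy the stated minimums.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int

def check_linear_amp_cycle(denature, anneal, extend):
    """Return a list of violated limits for one cycle (empty if none)."""
    problems = []
    if not (denature.temp_c > 90 and denature.seconds >= 5):
        problems.append("denaturation should be above 90 C for at least 5 s")
    if not (anneal.temp_c < 60 and anneal.seconds >= 5):
        problems.append("annealing should be below 60 C for at least 5 s")
    if not (65 <= extend.temp_c <= 80 and extend.seconds >= 10):
        problems.append("extension should be 65-80 C for at least 10 s")
    return problems

# Example cycle using the temperatures given in the text (98, 55, 72 C).
cycle = (Step("denature", 98, 10), Step("anneal", 55, 15), Step("extend", 72, 20))
print(check_linear_amp_cycle(*cycle))
```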
(54) In some embodiments, the mixture is incubated at 37° C. for 10 minutes (which is suitable for Antarctic thermolabile UDG) and then thermocycled.
(55) In one example, after the UDG treatment the reaction can be heated and cooled any number of times, e.g., once, twice, at least 5 times, at least 10 times or up to 20 times to produce up to 20 5′-tagged amplification product molecules, where each molecule is a copy of a single DNA fragment that is ligated to a bottom strand of the adaptor.
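The relationship between cycle number and copy number can be sketched numerically. The model below is an idealization and an assumption of this example: with a single primer, only the original adaptor-ligated strand is copied in each cycle, whereas in conventional two-primer PCR the products also serve as templates. The function names and the efficiency parameter are hypothetical.

```python
def linear_copies(cycles, templates=1, efficiency=1.0):
    """Copies made when only the original templates are copied each cycle."""
    return templates * cycles * efficiency

def exponential_copies(cycles, templates=1, efficiency=1.0):
    """Total molecules when products also act as templates (two-primer PCR)."""
    return templates * (1 + efficiency) ** cycles

# Compare both models at a few cycle numbers.
for n in (1, 5, 10, 20):
    print(n, linear_copies(n), exponential_copies(n))
```

At 20 cycles and full efficiency the linear model gives 20 copies per template, consistent with the figure of up to 20 product molecules stated above, while the exponential model gives about a million.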
(56) In some embodiments, the polymerase used in this step of the method should have a low error rate. In some embodiments, the polymerase may be a proofreading DNA polymerase, which typically has 3′ to 5′ exonuclease activity. Examples of non-proofreading thermostable polymerases (i.e., thermostable polymerases that do not have a 3′ to 5′ exonuclease activity) include, but are not limited to, Taq and Tth. Examples of proofreading thermostable polymerases include, but are not limited to, Pfu (Agilent Technologies, Santa Clara, Calif.), Pwo (Roche, Basel, Switzerland), Tgo (Roche, Basel, Switzerland), VENT® (New England Biolabs, Ipswich, Mass.), DEEP VENT® (New England Biolabs, Ipswich, Mass.), KOD HiFi (Novagen, Madison, Wis.), PFX50™ (Invitrogen, Waltham, Mass.), HERCULASE II™ (Agilent Technologies, Santa Clara, Calif.), PLATINUM PFX™ (Life Technologies, Waltham, Mass.) and ProofStart™ (Qiagen, Hilden, Germany). These polymerases, on average, produce 4× to 8× fewer errors than Taq polymerase. Further examples of proofreading thermostable polymerases include, but are not limited to, PHUSION® (Thermo Fisher Scientific, Waltham, Mass.), PFUULTRA™ (Agilent Technologies, Santa Clara, Calif.), PFUULTRA™ II (Agilent Technologies, Santa Clara, Calif.), IPROOF™ (Bio-Rad, Hercules, Calif.), Q5 polymerase, and KAPAHIFI™ (Kapa Biosystems, Wilmington, Mass.). These polymerases, on average, produce at least 20× fewer errors than Taq polymerase and can be readily employed herein. In some embodiments, it is envisaged that isothermal amplification methods might be used instead of thermocycling where such methods are capable of utilizing a single primer binding site.
Examples of amplification methods include ligase chain reaction (LCR), strand displacement amplification (SDA), transcription mediated amplification (TMA), self-sustained sequence replication (3SR), Qβ replicase based amplification, rolling circle amplification, nucleic acid sequence-based amplification (NASBA), repair chain reaction (RCR), boomerang DNA amplification (BDA) and helicase dependent amplification (HDA).
(57) Hybridization
(58) In some embodiments of the method (in accordance with step 4 of the method shown in
(59) In some embodiments the complexes can be bound to a solid support via a capture group (e.g., biotin) on the target-specific oligonucleotides. This step enriches for 5′ tagged amplification products that comprise a target sequence. For example, if the target-specific oligonucleotide is biotinylated, the complexes can be enriched by binding to a support comprised of streptavidin beads. In some embodiments, magnetic beads coated in streptavidin can be added to the reaction mix after hybridization of the 5′ tagged amplification products to the target-specific oligonucleotides. The magnetic beads can be isolated by magnetism and then washed, thereby enriching for complexes that comprise the 5′ tagged amplification products. An alternative to biotin includes a SNAP-tag® (New England Biolabs, Ipswich, Mass.) that is a protein that reacts with a benzylguanine and may be modified to bind to an affinity capture domain.
(60) The solid support may include a matrix formed from the affinity capture domain or coated with the affinity capture domain. A solid support may be, for example, a bead including a magnetic bead, a column, a porous matrix, or a flat surface formed from for example, plastic or paper.
(61) Production of Blunt Ends
(62) In some embodiments of the method (in accordance with step 5 of the method shown in
(63) Ligation of a 3′ Adaptor
(64) In some embodiments of the method (as illustrated in step 6 of the method shown in
(65) Sample Clean-Up and Amplification
(66) In some embodiments of the method (as illustrated in step 7 of the method shown in
(67) The 5′ and 3′ tagged strands can be amplified by PCR, e.g., using a first primer that hybridizes to the 3′ adaptor sequence and a second primer that hybridizes to the complement of the primer in the linear amplification products, to produce PCR products. In an example, the magnetic beads can be washed with a buffer and resuspended in a PCR mixture containing water, a PCR master mix and amplification primers. The following PCR cycling conditions are used: 98° C. for 30 seconds followed by 18 cycles of 98° C. for 10 seconds, 62° C. for 15 seconds and 72° C. for 20 seconds. At the end of the 18 cycles, the PCR mixture is incubated at 72° C. for 5 minutes. The PCR products obtained from the target sequences are then quantified and sequenced using conventional methods.
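For planning purposes, the total programmed hold time of the PCR conditions above can be added up as follows. This is a minimal sketch with a hypothetical function name. It counts only the programmed holds, and the actual run time on an instrument will be longer because temperature ramping is not included.

```python
def pcr_hold_seconds(initial, cycle_steps, n_cycles, final):
    """Sum of programmed hold times in seconds, excluding ramp times."""
    return initial + n_cycles * sum(cycle_steps) + final

# 98 C for 30 s; 18 cycles of 10 s, 15 s and 20 s; final 72 C for 5 min.
total = pcr_hold_seconds(initial=30, cycle_steps=(10, 15, 20), n_cycles=18, final=300)
print(total, "s =", total / 60, "min")
```

With the conditions given in the text, the holds sum to 1140 seconds, i.e., 19 minutes.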
(68) Sequencing
(69) The sequencing step may be done using any convenient next generation sequencing method and may result in at least 10,000, at least 50,000, at least 100,000, at least 500,000, at least 1M, at least 10M, at least 100M or at least 1B sequence reads. In some cases, the reads are paired-end reads. As would be apparent, the primers used for amplification may be compatible with use in any next generation sequencing platform in which primer extension is used, e.g., Illumina's reversible terminator method, Roche's pyrosequencing method (454), Life Technologies' sequencing by ligation (the SOLiD platform), Life Technologies' Ion Torrent platform or Pacific Biosciences' fluorescent base-cleavage method. Examples of such methods are described in the following references: Margulies et al. (Nature 2005 437: 376-80); Ronaghi et al. (Analytical Biochemistry 1996 242: 84-9); Shendure (Science 2005 309: 1728); Imelfort et al. (Brief Bioinform. 2009 10:609-18); Fox et al. (Methods Mol Biol. 2009; 553:79-108); Appleby et al. (Methods Mol Biol. 2009; 513:19-39); English (PLoS One. 2012 7: e47768); and Morozova (Genomics. 2008 92:255-64), which are incorporated by reference for the general descriptions of the methods and the particular steps of the methods, including all starting products, reagents, and final products for each of the steps. The sequence reads may be analyzed computationally to identify sequence variations in the sample, such as point mutations, indels, deletions, insertions and rearrangements.
(70) Multiplexing
(71) Advantages of multiplexing include (a) the ability to analyse a large number of samples in one sequencing reaction while maintaining a means to track the source of each polynucleotide and the sample from which it came, and (b) increased efficiency and reduced cost of the workflow used to enrich targets and sequence samples, because samples are pooled. Multiplexing as described herein can involve pooling two, tens, hundreds or thousands of samples. A linear amplification step can be omitted for genotyping samples, where low sensitivity of detection is sufficient. Multiplexing in the absence of linear amplification can be used in any application where a low sensitivity screen for variants is desirable such as marker genotyping for molecular breeding programs (e.g., plants, livestock, and fishery breeding programs), human sample identification, and mouse-tail genotyping.
(72) In some embodiments, the adaptor ligated onto the fragments in the initial step of the method may have a sample identifier which in
(73) As shown, after all the samples have been ligated to adaptors that have a sample identifier and optionally a UMI sequence, the samples may be pooled in a single vessel and may progress through the rest of the steps shown in
(74) As described above, the amplification products may be sequenced by any convenient method to produce sequence reads that comprise the sequence, at least part of the target sequence and a sample identifier or complements thereof. During analysis, the sequence reads may be assigned to a sample on the basis of the sample identifier that is in the sequence read. This method may be implemented in a high-throughput way. As few as 1 and as many as 96 samples, or as many as 384 or more samples, each having different sample identifiers, may be pooled together where the pool is labeled with a single index on the 3′ adaptor. These pooled samples, each with a single index, can then be pooled into larger pools containing multiple index sequences for analysis in a single sequencing reaction. A single sequencing reaction may include a multiplex enriched preparation of 3′ adaptor and 5′ adaptor ligated polynucleotide target sequences from one or more samples, 2 or more samples, 3 or more samples, 5 or more samples, 10 or more samples, 50 or more samples, 100 or more samples, 500 or more samples, 1000 or more samples, 5000 or more samples, up to and including about 10,000 or more samples where these samples may be obtained from the same or different sources. For example, the samples may be seeds of a plant, and the sources may be different plants. In this example, the original individual polynucleotide fragments containing a target sequence can be tracked by a UMI, each seed from which the polynucleotides came can be tracked by a sample identifier, and each plant from which the seeds came can be tracked by an index sequence. This is further illustrated in
(75) The foregoing example may include further multiplexing in connection with the hybridization. Hybridization reactions may be performed with single or pairs of target isolation probes or they may be multiplexed by performing each reaction with many hybridization probes (e.g., 3 or more, 5 or more, 10 or more, 100 or more, or 1,000 or more target isolation probes).
(76) The ability to perform sequencing on multiplexes of multiplexed samples means that as many as several thousand or more samples can be analysed in a single sequencing run enabling a rapid and cost-effective analysis.
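The two-level multiplexing described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the figure of 96 index pools is an assumed example rather than a value from the disclosure (the 96 samples per pool is taken from the text).

```python
from itertools import product


def run_capacity(samples_per_pool: int, index_pools: int) -> int:
    """Samples distinguishable in one sequencing run: one sample identifier
    per sample within a pool, and one 3' adaptor index per pool."""
    return samples_per_pool * index_pools


def assign_labels(samples, sample_ids, pool_indexes):
    """Give each sample a unique (pool index, sample identifier) pair.

    Pools are filled in order. Raises ValueError if there are more samples
    than available pairs.
    """
    pairs = list(product(pool_indexes, sample_ids))
    if len(samples) > len(pairs):
        raise ValueError("not enough index/identifier combinations")
    return dict(zip(samples, pairs))


# 96 sample identifiers per pool (from the text) combined with an
# assumed 96 pool indexes.
print(run_capacity(96, 96))
```

Under these assumptions, a read is attributed to a sample only by the combination of its index and its sample identifier, so the same set of sample identifiers can be reused in every pool.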
(77) Kits
(78) The present disclosure relates to kits for performing methods described herein. A kit, for example, may include any system for delivering materials or reagents for carrying out a method described herein. In some embodiments, kits can include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., probes, enzymes, adaptors, primers, reaction reagents, reaction vessels and/or surfaces in appropriate containers) and/or supporting materials (e.g., written instructions for performing the assay, handling instructions) from one location to another. For example, in some embodiments kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials. Such contents may be delivered to the intended recipient together or separately. For example, a first container may contain an enzyme for use in an assay, while a second container may contain adaptors. A kit alone or in combination may be formulated for selecting and enriching target templates from a nucleic acid sample containing non-target and target sequences. A kit may include one or more adaptors as described herein, primers, exonucleases, ligase, polymerase(s), buffers, and nucleotides. A kit may further comprise one or more buffer solutions and standard solutions for the creation of a DNA library. These components may be present in a single reaction vessel or multiple tubes and may be packaged separately or together.
(79) Automated Work Flows
(80) Methods disclosed herein may be performed with at least some automation. Systems for processing multiple samples in parallel may be adapted for use with the disclosed methods, for example, systems for processing samples in racks of tubes, in multi-well plates, in droplets on surfaces, and/or through microfluidics (including variations that use pressure, electrical potential, acoustic forces, and/or other forces to manipulate fluids and contact materials). Methods disclosed herein may be performed, for example, using an Echo® 525 Liquid Handler (Labcyte, Inc., San Jose, Calif.) or by means of microfluidic devices or a lab on a chip (Aqua Drop, Sharp). For the methods shown in the
(81) It may be desirable in agricultural research to analyze a particular single nucleotide polymorphism (SNP) profile from a single plant or multiple plants in a single sequencing reaction. Automated multiplexing may include assessing multiple target genomic regions of interest from multiple plants. This can be achieved in a platform that utilizes, for example, 96-well dishes, where 5′ adaptor ligation, hybridization, capture, enrichment and 3′ adaptor addition are performed in individual wells of 96-well plates. After 5′ adaptor ligation of polynucleotide fragments from a single sample (part of a plant) is achieved, polynucleotide fragments from multiple samples (multiple plant parts) from all 96 wells can be combined into a single well in a second 96-well plate for capture, enrichment and 3′ adaptor ligation. Multiplexed samples from a plurality of wells in the second plate (multiple plants) may then be pooled for sequencing.
(82) All patents and publications, including all sequences disclosed within such patents and publications, referred to herein including U.S. Provisional Application No. 62/781,762 filed December 19, 2018, are expressly incorporated by reference.
EXAMPLES
(83) Aspects of the present teachings can be further understood in light of the following examples of linear amplification without multiplexing (Example 1) and multiplexing without linear amplification (Example 2), which should not be construed as limiting the scope of the present teachings in any way.
(84) Any reagents used herein that are not otherwise associated with a vendor were obtained from New England Biolabs, Ipswich, Mass.
Example 1
An Enriched Library of Target Sequences from Human DNA Using Linear Amplification
(85) Human gDNA was added to NEBNext Ultra II FS reaction buffer and NEBNext Ultra II FS enzyme mix according to manufacturer's instructions (New England Biolabs, Ipswich, Mass.). NEBNext Ultra II FS enzyme mix contains enzymes that perform DNA fragmentation, end repair, and A-tailing. The mixture was cooled to 4° C., and a double-stranded adaptor (the first adaptor) was added, with the bottom strand being
(86) TABLE-US-00001 (SEQ ID NO: 1) 5'GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTNNNNNNNNNNNN AGGCTATAGTGTAGATCTCGGTGGTCGCCGTATCATT 3'.
(87) A 12N (a random sequence of 12 nucleotides) UMI and a sample tag (the 8 bold underlined letters) were present in the adaptor. The top strand was complementary to nucleotides 1-32 of SEQ ID NO:1 (the portion on the 5′ side of the tag) and contained several deoxyuridine residues and a 3′ T overhang. NEBNext Ultra II Ligation Master Mix was added to the reaction mixture and incubated for 15 minutes at 20° C. (step 2 of
(88) DNA fragments in water were added to NEBNext Ultra II Q5 Master Mix, 2 μl Antarctic Thermolabile UDG, and linear amplification primer, with sequence 5′ AATGATACGGCGACCACC 3′ (SEQ ID NO:2). The reaction was incubated at 37° C. for 10 minutes, 98° C. for 30 seconds, and then subjected to 20 cycles of 98° C. for 10 seconds, 55° C. for 10 seconds, and 72° C. for 20 seconds, then a final incubation at 72° C. for 2 minutes (step 3 of
(89) This reaction was transferred to a hybridization mix (see NEBNext Ultra II Q5 PCR mix) that contained target isolation probes, each comprising a target-specific oligonucleotide (bait) and an affinity binding domain (namely, biotin), and incubated at 95° C. for 10 minutes, then 58° C. for 90 minutes (step 4 of
(90) The beads were resuspended in a 3′-5′ single stranded exonuclease buffer with enzyme and incubated for 5 minutes at 37° C., and 5 minutes at 25° C. (step 5 of
(91) The magnetic beads were then washed and resuspended in a 1× NEBNext Q5 Hot Start HiFi PCR Master Mix containing NEBNext Direct® PCR primers (New England Biolabs, Ipswich, Mass.) for PCR amplification.
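The relationships among the sequences given in this example can be checked with a short Python sketch. It assumes only what is stated above: SEQ ID NO:1 is the bottom strand with a 12N UMI, SEQ ID NO:2 is the linear amplification primer, and the top strand is complementary to nucleotides 1-32 of SEQ ID NO:1 with a 3′ T overhang. The variable names are illustrative, and because the positions of the deoxyuridine residues are not given, the top strand is written with plain T throughout.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Bottom strand of the first adaptor (SEQ ID NO:1), written 5' to 3'.
SEQ_ID_NO_1 = (
    "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
    + "N" * 12  # the 12N UMI
    + "AGGCTATAGTGTAGATCTCGGTGGTCGCCGTATCATT"
)
# Linear amplification primer (SEQ ID NO:2).
SEQ_ID_NO_2 = "AATGATACGGCGACCACC"

# Location and length of the UMI within the bottom strand (0-based start).
umi_start = SEQ_ID_NO_1.index("N")
umi_length = SEQ_ID_NO_1.count("N")

# The primer can anneal wherever its reverse complement occurs.
primer_site = reverse_complement(SEQ_ID_NO_2)
primer_binds_3prime_end = SEQ_ID_NO_1.endswith(primer_site)

# Top strand: complement of nucleotides 1-32 plus the 3' T overhang.
# dU positions are not specified in the text, so T is used as a placeholder.
top_strand_plain = reverse_complement(SEQ_ID_NO_1[:32]) + "T"

print(umi_start, umi_length, primer_binds_3prime_end)
print(top_strand_plain)
```

On this reading, the primer is complementary to the last 18 nucleotides of the bottom strand, so each extension would copy the sample tag and UMI before reaching the ligated fragment.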
(92) The PCR products obtained from the target sequences were analyzed on an Agilent TapeStation and then sequenced using conventional methods. The Agilent TapeStation performs capillary electrophoresis and determines size and concentration of DNA fragments. Example results obtained from the TapeStation are shown in
(93) The table below shows example sequencing metrics for a low input target enrichment library made by the method described above. These data were produced using 50 ng input and a 30 kb panel with paired-end reads of 75 bp each.
(94) TABLE-US-00002
Low input library                                 Sample 1
PF reads                                          20,003,888
% Aligned                                         98.4%
% Inserts On Target                               90.9%
Mean Target Coverage (post duplicate filtering)   10917.6
Median Insert Size (bp)                           164
(95) The following definitions explain the entries in the first column of this table:
(96) Pass Filter (PF) Reads: The number of passing filter reads, including all reads marked as duplicates, identified as adaptor sequences, etc.;
(97) % Aligned: The percentage of passing filter reads that were aligned at any quality and for at least one base, to the reference genome;
(98) % Inserts On Target: The percentage of aligned inserts or templates (or, in the case of single-end sequencing, reads) that have at least one base overlapping a target (post de-duplication);
(99) Mean Target Coverage: The mean coverage in de-duplicated bases of all targets deemed to have received non-zero coverage, where that is defined as any target with at least one base covered to 2×; and
(100) Median Insert Size: The median of the calculated insert size from all read-pairs that have both ends mapped to the same chromosome (post-deduplication).
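The definitions above translate into simple calculations. The following Python sketch is one possible reading of them, not the pipeline used to produce the table: the function names are hypothetical, and the choice to average over every base of each retained target (rather than averaging per-target means) is an assumption.

```python
from statistics import median


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage (0.0 when whole is zero)."""
    return 100.0 * part / whole if whole else 0.0


def mean_target_coverage(per_target_depths: list[list[int]]) -> float:
    """Mean de-duplicated depth over targets with non-zero coverage.

    Each inner list holds the per-base depth of one target. A target is
    kept only if at least one of its bases is covered to 2x or more.
    """
    kept = [depths for depths in per_target_depths if depths and max(depths) >= 2]
    bases = [depth for depths in kept for depth in depths]
    return sum(bases) / len(bases) if bases else 0.0


def median_insert_size(insert_sizes: list[int]) -> float:
    """Median insert size over read pairs mapped to the same chromosome."""
    return median(insert_sizes)


# Toy numbers for illustration only (not data from the example).
print(percent(984, 1000))                               # % Aligned
print(percent(9, 10))                                   # % Inserts On Target
print(mean_target_coverage([[0, 1, 1], [2, 4, 6], [0, 0, 0]]))
print(median_insert_size([150, 164, 170]))
```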
(101) In this example, deduplication was achieved by duplicate filtering. The data shown in the above table demonstrate that the present method is capable of generating at least 10,000× mean target coverage with a high percentage (e.g., over 90%) of on-target inserts.
(102) An explanation of duplication and deduplication is provided by Marx, Nature Methods 2017, 14, 473-476. Deduplication tools are offered by the Babraham Institute UK, 10x Genomics and Joint Genome Institute (Dedupe).
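As a minimal sketch of how a UMI can support duplicate filtering, the Python below treats reads that share a chromosome, alignment start and UMI as copies of one original molecule and keeps the copy with the highest mean quality. This is an assumed, simplified scheme for illustration; it is not the algorithm of any tool named above, and the `Read` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    start: int            # alignment start position
    umi: str              # UMI sequence taken from the adaptor
    mean_quality: float


def deduplicate(reads):
    """Keep one read per (chrom, start, UMI) group, preferring higher quality.

    Groups are returned in the order their first member was seen.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.start, read.umi)
        if key not in best or read.mean_quality > best[key].mean_quality:
            best[key] = read
    return list(best.values())


reads = [
    Read("r1", "chr1", 100, "ACGTACGTACGT", 30.0),
    Read("r2", "chr1", 100, "ACGTACGTACGT", 35.0),  # duplicate of r1
    Read("r3", "chr1", 100, "TTTTACGTACGT", 32.0),  # same position, new UMI
]
print([r.name for r in deduplicate(reads)])
```

Reads that start at the same position but carry different UMIs are retained, which is the case a position-only filter would collapse.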
Example 2
An Enriched Library of Target Sequences from 96 Tomato DNA Samples Following Pooling
(103) Materials and Methods
(104) The materials and methods from Example 1 were used up to and including the step of adaptor ligation and sample tagging (step 2 in
(105) Tomato gDNA was analyzed. The double-stranded adaptor (first adaptor) had the following sequence, with the plus strand being 5′
(106) TABLE-US-00003 (SEQ ID NO: 3) AATGATACGGCGACCACCGAGATCTACACCGAATACGNNNNNNNNN NNNACACTCTTTCCCTACACGACGCTCTTCCGATCT 3'.
(107) After step 2 of
(108) Purified DNA fragments were hybridized to target isolation probes, each comprising a target-specific oligonucleotide linked to biotin as described in Example 1. As in Example 1, one target-specific oligonucleotide was designed to bind the 5′ end and the other was designed to bind the 3′ end of the target region. The remaining steps were the same as in Example 1.
(109) Results
(110) The PCR products obtained from the target sequences were analyzed on an Agilent TapeStation and then sequenced using conventional methods. The TapeStation performs capillary electrophoresis and determines size and concentration of DNA fragments. Exemplary results obtained from the TapeStation are shown in
(111) The table below shows exemplary sequencing metrics for a target enrichment library made by the method described above. In this example, 96 tomato samples were fragmented independently, tagged with a unique sample tag in the adaptor, then pooled together in a single hybridization (i.e. the samples were multiplexed). The samples could be discriminated from one another in the analysis of the sequencing data by using the unique sample tags to find the sequencing reads that correspond to each sample (i.e. the samples were de-multiplexed).
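De-multiplexing of this kind can be sketched as a lookup of each read's sample tag. The Python below is an illustration under stated assumptions: the tag is taken to sit at a fixed offset in the read, only exact matches are assigned, and the tag sequences, offset and function name are hypothetical rather than taken from this example.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_start, tag_length):
    """Group read sequences by the sample whose tag they carry.

    reads: iterable of read sequences (strings).
    tag_to_sample: dict mapping tag sequence to sample name.
    Reads whose tag is not recognised are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag = read[tag_start:tag_start + tag_length]
        by_sample[tag_to_sample.get(tag, "unassigned")].append(read)
    return dict(by_sample)


# Hypothetical 8-nt tags located at the start of each read.
tags = {"AGGCTATA": "tomato_01", "CGAATACG": "tomato_02"}
reads = [
    "AGGCTATAGGGTTTCCC",
    "CGAATACGAAACCCGGG",
    "AGGCTATATTTTGGGAA",
    "TTTTTTTTACGTACGTA",
]
groups = demultiplex(reads, tags, tag_start=0, tag_length=8)
print({sample: len(seqs) for sample, seqs in groups.items()})
```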
(112) This representative data from a single sample was produced using 25 ng tomato DNA input and a panel covering 2323 genomic markers (single nucleotide polymorphisms, or SNPs) with paired end reads of 75 bp each. Additional data from all 96 samples in this example may be seen in
(113) TABLE-US-00004
Single sample from a 96 plex hybridization        Sample 1
PF reads                                          1,614,772
% Aligned                                         99.25%
% Inserts On Target                               84.02%
Mean Target Coverage (post duplicate filtering)   65.58
Median Insert Size (bp)                           116
(114) This data shown in the above table and in
Example 3
Analysis of a SNP in Tomato Plants
(115) A single leaf punch from each of 96 individual tomato plants (e.g., all of a single variety of interest) was placed in each well of a 96-well plate for subsequent DNA extraction. Following DNA extraction, fragmentation, and adaptor ligation and enrichment as described in Example 2 (steps 1-3), the 5′ adaptor ligated fragments of the samples in the 96 wells were pooled in a single hybridization mix containing oligonucleotides with affinity binding domains. Steps 4-7 in