Next Generation Sequencing

20230144391 · 2023-05-11

    Abstract

    An improved method for Next Generation Sequencing which relies on the presence of the same distinct unique molecular identifier (UMI) located at each end of a linear nucleic acid molecule so that sequence reads of approximately 2 kb or longer are obtained, and which allows generation of a genomic map without the need of a reference sequence.

    Claims

    1. A method for generating a next generation sequencing library of any nucleic acid product comprising (a) isolating genomic DNA (gDNA), cDNA, or RNA from an organism; (b) fragmenting the gDNA, cDNA, or RNA; (c) isolating the gDNA, cDNA, or RNA fragments of step (b) and optionally conducting end-repair and/or A-tailing on each gDNA, cDNA, or RNA fragment; (d) obtaining a collection of arbitrary and unique molecular identifier (UMI) adapters, each adapter comprising a different UMI flanked by at least one deoxyuracil (dU) with the form of dU/UMI/dU; (e) attaching a different dU/UMI/dU adapter to each end of the gDNA, cDNA, or RNA fragment of step (c) to form a collection, wherein both ends of each gDNA, cDNA, or RNA fragment contain a dU/UMI/dU adapter to form dU/UMI/dU/gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA/dU/UMI/dU fragments and wherein each end of the fragment of step (c) contains a different dU/UMI/dU adapter; (f) fragmenting the collection of dU/UMI/dU/gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA/dU/UMI/dU fragments into smaller fragments to form dU/UMI/dU/gDNA and gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA and cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA and RNA/dU/UMI/dU fragments; (g) optionally isolating the dU/UMI/dU/gDNA and gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA and cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA and RNA/dU/UMI/dU fragments; (h) self-ligating each of the fragments of step (f) to form circular molecules; (i) contacting the self-ligated circular molecules of step (h) with a USER enzyme to create a linear gDNA, cDNA, or RNA, wherein each end of the USER treated gDNA, cDNA, or RNA has the same UMI at each end; (j) conducting end-repair on the USER treated linear gDNA, cDNA, or RNA of step (i); (k) ligating sequencing adapters comprising sequencing primers and index sequences to each end of the linear gDNA, cDNA, or RNA of step (j); (l) preparing a collection of primer extensions comprising a primer and a sequencing primer tail; (m) performing PCR extension using random primers with sequencing adapter tails on each gDNA, cDNA, or RNA of step (k) using the primer extensions of step (l); (n) generating a population of PCR fragments with different lengths, wherein each PCR fragment contains a UMI sequence; and (o) isolating the PCR products of step (n) for sequencing.

    2. The method according to claim 1, wherein the fragments of step (b) are approximately 5-8 kb in length.

    3. The method according to, wherein the dU/UMI/dU/gDNA and gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA and cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA and RNA/dU/UMI/dU fragments of step (f) are 2-4 kb in length.

    4. The method according to claim 1, wherein the self-ligation of step (h) does not produce hetero-concatemers.

    5. A method for generating a genomic map of an organism without reliance on a reference sequence comprising (a) preparing a next generation sequencing library of PCR products according to claim 1, wherein gDNA is used and the primer extension collection of step (l) is prepared without reliance on a reference sequence using arbitrary N primer extensions comprising a sequencing primer tail; (b) conducting next generation sequencing of the PCR products of step (o) to obtain gDNA sequences; and (c) aligning the gDNA sequences of the PCR products using the UMI sequences as starting points for de novo sequence assembly.

    6. A method for generating a next generation sequencing library of any nucleic acid product comprising (a) isolating genomic DNA (gDNA), cDNA, or RNA from an organism; (b) fragmenting the gDNA, cDNA, or RNA; (c) isolating the gDNA, cDNA, or RNA fragments of step (b) and optionally conducting end-repair and/or A-tailing on each gDNA, cDNA, or RNA fragment; (d) obtaining a collection of arbitrary and unique molecular identifier (UMI) adapters, each adapter comprising a different UMI flanked by at least one deoxyuracil (dU) with the form of dU/UMI/dU; (e) attaching a different dU/UMI/dU adapter to each end of the gDNA, cDNA, or RNA fragment of step (c) to form a collection, wherein both ends of each gDNA, cDNA, or RNA fragment contain a dU/UMI/dU adapter to form dU/UMI/dU/gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA/dU/UMI/dU fragments, wherein each end of the fragment of step (c) contains a different dU/UMI/dU adapter; (f) fragmenting the collection of dU/UMI/dU/gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA/dU/UMI/dU fragments into smaller fragments to form dU/UMI/dU/gDNA and gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA and cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA and RNA/dU/UMI/dU fragments; (g) optionally isolating the dU/UMI/dU/gDNA and gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA and cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA and RNA/dU/UMI/dU fragments; (h) self-ligating each of the fragments of step (f) to form a collection of circular molecules; (i) fragmenting the collection of circular molecules of step (h) to create a collection of linear gDNA, cDNA, or RNA molecules, wherein the dU/UMI/dU is internal to the two ends of each linear molecule; (j) contacting the collection of linear molecules of step (i) with a USER enzyme to cleave the dU/UMI/dU sequence and to create two linear gDNA, cDNA, or RNA fragments from each linear molecule in the collection; (k) conducting end-repair on the USER treated linear gDNA, cDNA, or RNA fragment of step (j); (l) ligating sequencing adapters comprising sequencing primers and, optionally, index sequences to each end of the linear gDNA, cDNA, or RNA of step (k); (m) preparing a collection of primer extensions comprising a sequencing primer tail; (n) performing PCR extension on each gDNA, cDNA, or RNA of step (l) using the sequencing primers of step (m); (o) generating a population of PCR fragments with different lengths, wherein each PCR fragment contains a UMI sequence; and (p) isolating the PCR products of step (o) for sequencing.

    7. The method according to claim 6, wherein the fragments of step (b) are approximately 5-8 kb in length.

    8. The method according to claim 6, wherein the dU/UMI/dU/gDNA and gDNA/dU/UMI/dU fragments, dU/UMI/dU/cDNA and cDNA/dU/UMI/dU fragments, or dU/UMI/dU/RNA and RNA/dU/UMI/dU fragments of step (f) are 2-4 kb in length.

    9. The method according to claim 6, wherein the self-ligation of step (h) does not produce hetero-concatemers.

    10. The method according to claim 6, wherein the sequences of the sequencing primers of step (m) correspond to sequences from a reference genome.

    11. A method for generating a genomic map of an organism comprising (a) preparing a next generation sequencing library of PCR products according to claim 6; (b) conducting next generation sequencing of the PCR products of step (o) to obtain gDNA sequences; (c) aligning the gDNA sequences of the PCR products using the UMI sequences as starting points for sequence assembly; and (d) comparing the alignments of step (c) to a reference genome.

    12. A method for generating a virtual karyotype for an organism comprising (a) generating a genomic map for the organism according to claim 11; (b) comparing the genomic map from step (a) to a reference genomic map of the organism's chromosomes; (c) identifying the genomic sequences associated with each chromosome; and (d) identifying any chromosomal abnormalities in the genomic map of step (a) as compared to the reference genomic map.

    13. The method according to claim 12, wherein the chromosomal abnormality is a translocation.

    14. The method according to claim 12, wherein the chromosomal abnormality is a sequence insertion and/or deletion.

    15. The method according to claim 12, wherein the chromosomal abnormality is a sequence variation and/or a sequence copy number variation (CNV).

    16. The method according to claim 12, wherein the chromosomal abnormality is a chromosomal rearrangement.

    17. A method of generating a next generation targeted sequencing library for a nucleic acid product of interest comprising (a) isolating genomic DNA (gDNA), cDNA, or RNA from an organism; (b) fragmenting the gDNA, cDNA, or RNA; (c) isolating the gDNA, cDNA, or RNA fragments of step (b); (d) performing PCR extension on the fragments of step (c) using a collection of forward primers and a collection of reverse primers, wherein the forward primer comprises a sequence specific to the nucleic acid product of interest joined to a random unique molecular identifier (UMI) joined to a sequencing adapter tail and the reverse primer comprises a sequence specific to the nucleic acid product of interest; (e) performing PCR primer extension using random primers joined to the forward primer sequencing adapter tail; and (f) isolating the primer extensions of step (e) for sequencing.

    18. The method according to claim 17, wherein the collection of reverse primers consists of identical sequences.

    19. The method according to claim 17, wherein the collection of reverse primers consists of different sequences.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0028] FIG. 1—Flowchart for generating a linear nucleic acid molecule having the same UMI at each end using genomic DNA (gDNA), cDNA, or RNA, overlapping fragments to form a library suitable for amplification with random N primer extension for de novo assembly of genome sequence prior to NGS. Solid lines represent a single strand of gDNA, cDNA, or RNA; dotted lines represent newly replicated DNA; right-angled arrow represents random N sequencing primer; “nt” refers to nucleotide, “n” refers to a whole number greater than 0, such as 1-40, 4-30, 6-20, and all integers there between; “dU” refers to deoxyuridine; “UMI” refers to unique molecular identifier; “USER enzyme” refers to Uracil-Specific Excision Reagent; “PCR” refers to Polymerase Chain Reaction; “NGS” refers to Next Generation Sequencing.

    [0029] FIG. 2—Flowchart for generating a linear nucleic acid molecule having a single centrally located UMI using genomic DNA (gDNA) or cDNA for NGS library production. Solid lines represent a single strand of DNA, cDNA, or RNA; “nt” refers to nucleotide, “n” refers to a whole number greater than 0, such as 1-40, 4-30, 6-20, and all integers there between; “dU” refers to deoxyuridine; “UMI” refers to unique molecular identifier; “USER enzyme” refers to Uracil-Specific Excision Reagent; “PCR” refers to Polymerase Chain Reaction; “NGS” refers to Next Generation Sequencing.

    [0030] FIG. 3—Example of adapters having the form ntₙdUₙ-UMI-dUₙntₙ. “nt” refers to nucleotide, “n” refers to a whole number greater than 0, such as 1-40, 4-30, 6-20, and all integers there between; “dU” refers to deoxyuridine; “UMI” refers to unique molecular identifier; “N” refers to Adenine, Guanine, Cytosine, Thymine, Uracil, or modifications thereof.

    [0031] FIG. 4—Schematic of how sequencing using a fixed primer at one end and random primers annealing to various segments of the insert results in a longer DNA fragment read and how all reads come from the same DNA fragment.

    [0032] FIG. 5—Evidence of circularization. Lane 1: negative control (no DNA) pre-circularization; Lane 2, Lambda gDNA pre-circularization; Lane 3: negative control post-circularization; Lane 4: Lambda gDNA post-circularization.

    [0033] FIG. 6—Schematic of an Integrative Genomics Viewer (IGV) output showing paired reads aligned to a lambda genome and UMIs being properly incorporated/attached. UMIs are indicated by black bars.

    [0034] FIG. 7—Example of insert sizes and the extension sizes obtained from one end of the insert. Insert size is determined by subtracting the value in 5p from the value in 3p while extension size is determined by subtracting the value in End_1 from the value in End_2 for each paired read. Insert sizes from 500-1100 base pairs are extended by up to 129 base pairs, permitting a sequence read of about 1200 bp. Here, the extension comes from a second random primer binding to the same DNA fragment but in a different site and extends the sequence coverage. This Figure provides evidence that the extension of the sequence to create a long-read works on different size DNA fragments.

    [0035] FIG. 8—Schematic showing the frequency of the four bases in the random UMI sequences. Each horizontal grid line is percent measured in increments of 5 and each vertical grid line is length measured in increments of 40 bp. Each of the four bases in the UMI Random Sequence portion of the read is present at about 25%, indicating that the UMIs are completely random and there is no bias.

    [0036] FIG. 9—Graph showing uniform coverage of the sequences for a 50 kb Lambda genome from two different libraries. Each horizontal grid line is the depth measured in increments of 25 and each vertical grid line is the position measured in increments of 10,000 bp. A: Library F4S; B: Library F4L.

    [0037] FIG. 10—Insert size distribution based on sequencing results for two different libraries showing inserts of up to 2 kb. A: Library F4S; B: Library F4L.

    [0038] FIG. 11—Schematic showing that using DNA sequence reads of up to 2 kb or longer permits generation of a large contig and de novo genome assembly by removing the need for a reference genome alignment.

    [0039] FIG. 12—Schematics of virtual karyotyping. A: Depiction of karyotyping bands assigned to a single chromosome, showing the p arm, centromere, and q arm on the left and using arrowheads on the right to identify specific karyotype bands. B: Depiction of a translocation between chromosome 4 and chromosome 20 where the translocated regions are shown within brackets. C: Depiction of a translocation between the q arm of chromosome 6 and the p arm of chromosome 7. The arrow pointing to the q arm of the chromosome identified as der(6) indicates the juncture point of the translocation with the p arm of chromosome 7 and provides the sequence at the juncture point. Similarly, the arrow pointing to the p arm of the chromosome identified as der(7) indicates the juncture point of the translocation with the q arm of chromosome 6 and provides the sequence at the juncture point.

    [0040] FIG. 13—Schematic showing PCR amplification of 16S rDNA where a 16S rDNA reverse primer is used at one end of the 16S rDNA nucleic acid and a primer comprising ILLUMINA® Adapter and Sequencing Primer joined to a random UMI joined to a 16S rDNA forward primer is used at the other end.

    [0041] FIG. 14—Schematic of a random N primer extension of a 16S rDNA PCR product.

    DETAILED DESCRIPTION

    Definitions

    [0042] Unless defined otherwise, all technical and scientific terms used herein generally have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Techniques and procedures that are common in the field of molecular genetics and nucleic acid chemistry are generally performed according to conventional methods and can be found in various general references and/or laboratory manuals, such as Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, second edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (which is incorporated herein by reference).

    [0043] “Adapter,” as used herein, refers to a short, chemically synthesized, single-stranded or double-stranded oligonucleotide that can be ligated to the ends of a DNA, cDNA, or RNA molecule.

    [0044] As used herein, “amplification reaction” refers to any in vitro means for multiplying one or more copies of a target sequence of nucleic acid in a linear or exponential manner. Examples of amplification reactions are polymerase chain reaction (PCR; U.S. Pat. Nos. 4,683,195 and 4,683,202), DNA ligase chain reaction, QBeta RNA replicase and RNA transcription-based amplification reactions involving T7, T3, or SP6 primed RNA polymerization, transcription amplification system (TAS), nucleic acid sequence based amplification (NASBA), isothermal amplification reactions, etc.

    [0045] The articles “a” and “an” are used herein to refer to one or more than one (i.e. to at least one) of the grammatical object of the article. For example, “an element” means one element or more than one element.

    [0046] As used herein, “amplifying” refers to a step of submitting a reaction solution to conditions permitting amplification of a polynucleotide. Components of the reaction solution include, for example, primers, a polynucleotide template, polymerase, nucleotides, and other needed reagents. Amplifying can result in a linear or an exponential increase in the target polynucleotide.

    [0047] As used herein, two nucleic acid sequences “complement” one another or are “complementary” to one another if they base pair with one another at each position.

    [0048] As used herein, the term “contig” is an abbreviation for the term “contiguous” and refers to a set of overlapping DNA segments derived from a single source of genetic material that together represent a defined region of the genome from which they were derived and which provide the complete DNA sequence for that region. In bottom-up sequencing projects, a contig refers to overlapping sequence data (reads); in top-down sequencing projects, a contig refers to the overlapping clones that form a physical map of the genome.

    [0049] As used herein, the phrase “contig map” refers to a map depicting the relative order of a linked library of small overlapping clones representing a complete chromosome segment.

    [0050] As used herein, two nucleic acid sequences “correspond” to one another if they are both complementary to the same nucleic acid sequence.

    [0051] “Hybridization” or “hybridizes,” as used herein, refers to the act or process of forming a double stranded nucleic acid molecule from two polynucleotides that are relatively complementary to one another. In some cases, hybridization can occur between two polynucleotides that have less than 100% complementarity.

    [0052] As used herein, “molecular index” or “index” refers to a short sequence tag that is ligated to all polynucleotides originating from the same sample. Molecular indices are typically at least 4 nucleotides in length, such as at least 6, 8, 10, or 12 nucleotides in length. The length of the molecular index determines how many unique samples can be differentiated. For example, a 1 nucleotide index can differentiate at most 4 different samples, a 4 nucleotide index can differentiate at most 4^4 or 256 samples, a 6 nucleotide index can differentiate at most 4096 different samples, and an 8 nucleotide index can differentiate at most 65,536 different samples.
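
    The capacities quoted above follow from 4^n for an index of n nucleotides drawn from four bases. A minimal Python sketch of that arithmetic is given here for illustration; the function name `max_samples` is an assumption of this sketch and does not come from the disclosure.

```python
def max_samples(index_length, alphabet_size=4):
    """Upper bound on the number of samples an index of this length can tell apart."""
    if index_length < 0:
        raise ValueError("index_length must be non-negative")
    # Each position can hold any of `alphabet_size` bases independently.
    return alphabet_size ** index_length


if __name__ == "__main__":
    for n in (1, 4, 6, 8):
        print(f"{n}-nt index: at most {max_samples(n)} samples")
```

    The bound is a theoretical maximum; in practice fewer indices are usable because closely related sequences are usually avoided.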

    [0053] As used herein, “nucleic acid” and “nucleic acid molecule” are used interchangeably and mean a polymeric form of nucleotides of any length that are DNA, cDNA, or RNA in single-stranded, double-stranded, linear, and/or circular form. The terms “nucleic acid” and “polynucleotide” are also used interchangeably herein. Non-limiting examples of polynucleotides are coding or non-coding regions of a gene or gene fragment, a locus or loci defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), ribozymes, cDNA, synthetic polynucleotides, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. A polynucleotide may be modified, such as by conjugation with a labeling component or interrupted by non-nucleotide components.

    [0054] “Nucleotide,” as used herein, is a molecule containing a nitrogenous base, a five-carbon sugar, and a phosphate group. Nucleotides are typically referred to by the name of their nitrogenous base: Adenine, Cytosine, Guanine, Thymine, and Uracil. Nucleotides can be naturally occurring or can be modified. Modifications include, but are not limited to, those providing chemical groups that incorporate additional charge, polarizability, hydrogen bonding, electrostatic interaction, and points of attachment. Examples of modifications are phosphodiester group modifications (e.g. replacement with phosphonate or alkylphosphotriesters; Durand et al. (1989) Nuc Acid Res), pentose sugar modifications (e.g. dideoxynucleotide triphosphates; Sanger et al. (1977) PNAS 74(12):5463-5467), purine and pyrimidine modifications (e.g. cross-coupling reactions; Liang and Wnuk (2015) 20(3):4874-4901), base-pairing alterations (e.g. isobases; Chawla et al. (2015) Nuc Acid Res 43(14):6714-6729), and peptide nucleic acids (see Menchise et al. (2003) PNAS 100(21): 12021-12026). Modifications can also include addition of a fluorophore or other moieties.

    [0055] As used herein, “oligonucleotide” refers to a polynucleotide that contains a relatively small number of nucleotides; that is, a polynucleotide having a short length. For example, the length of the oligonucleotide can be at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, or 50 nucleotides. Oligonucleotides are typically referred to by their length, followed by “-mer,” such as hexamer, 12-mer, 25-mer, etc.

    [0056] “Polymerase” refers to an enzyme that performs template-directed synthesis of polynucleotides and encompasses both the full length polymerase polypeptide and a fragment of the polymerase polypeptide containing a domain having polymerase activity. Examples of DNA polymerases include those isolated or derived from Thermus flavus, Thermus aquaticus, Pyrococcus woesei, Thermus ubiquitus, Thermus thermophilus, Thermococcus litoralis, and Thermotoga maritima, among others. Examples of RNA polymerases include those isolated or derived from T3 bacteriophage, T7 bacteriophage, and SP6 bacteriophage, among others.

    [0057] As used herein, “polymerase chain reaction” and “PCR” refer to a method of amplifying a target nucleic acid sequence in a geometric progression. PCR is well known in the art and, for DNA molecules, typically comprises either two step cycles (having a denaturation step followed by a hybridization/elongation step) or three step cycles (having a denaturation step followed by a hybridization step followed by an elongation step). For RNA PCR, the RNA is first reverse transcribed into cDNA by reverse transcriptase and the cDNA is then used as the template for the PCR reaction.

    [0058] “Primer,” as used herein, refers to a polynucleotide sequence that hybridizes to a sequence on a target nucleic acid molecule and serves as a point of initiation of nucleic acid synthesis. Primers can be any length, but are typically less than 50 nucleotides in length, such as 10-30 nucleotides. Primers can be designed to complement a known sequence or can be a random sequence of nucleotides, typically known as N-random primers. In some cases, primers can include one or more modified or non-naturally occurring nucleotides.

    [0059] “Reference genome,” as used herein, refers to a digital nucleic acid sequence database that has been assembled as a representative example of the set of genes in one idealized individual organism of a species. Reference genomes are typically assembled from a number of individual donors and do not accurately represent the set of genes of any single individual organism. Reference genomes are used as a guide on which new genomes are built.

    [0060] As used herein, “sequence read” refers to an inferred sequence of nucleotides corresponding to all or part of a single polynucleotide fragment. “Read length” is the number of nucleotides sequenced and the sequence read can begin at any point along the length of the target polynucleotide fragment.

    [0061] “Sequencing depth” and “read depth” are used interchangeably and describe the number of times that a given nucleotide in the sample population has been read in an experiment. The individual reads are bioinformatically overlapped or “tiled” to generate longer contiguous sequences that can make up meaningful data/increased accuracy for an RNA population or genomic sequence.
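
    As an illustration of the definition above, the depth at each position can be counted from the intervals covered by aligned reads. The following Python sketch assumes reads are supplied as half-open `(start, end)` coordinates on a sequence of known length; the function name and the input format are assumptions of this sketch, not part of the disclosure.

```python
def depth_per_position(read_intervals, reference_length):
    """Count how many reads cover each position of a sequence.

    read_intervals: iterable of (start, end) pairs, 0-based, end exclusive.
    reference_length: length of the sequence the reads were placed on.
    """
    depth = [0] * reference_length
    for start, end in read_intervals:
        # Clip each read to the sequence so out-of-range coordinates are ignored.
        for pos in range(max(start, 0), min(end, reference_length)):
            depth[pos] += 1
    return depth


if __name__ == "__main__":
    # Two overlapping reads on a 6-base sequence.
    print(depth_per_position([(0, 4), (2, 6)], 6))
```

    Positions covered by both reads in the example have a depth of 2, and the remaining positions have a depth of 1.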

    [0062] A “template,” as used herein, refers to a polynucleotide sequence that comprises the polynucleotide to be amplified, flanked by at least one primer hybridization site. In some cases, a target template comprises the target polynucleotide sequence flanked by a hybridization site for a “forward” primer and a “reverse” primer.

    [0063] As used herein, “Tm” refers to the melting temperature of two polynucleotides, i.e., the temperature at which 50% of the polynucleotide molecules are bound to one another and 50% are not bound.

    [0064] “UMI,” as used herein, refers to a unique molecular identifier that can be attached to at least one end of a polynucleotide and acts as a molecular tag that allows identification of the polynucleotide to which it is attached in a population of polynucleotides having different UMIs.

    [0065] As used herein, “USER” and “USER enzyme” refer to uracil-specific excision reagent, which cleaves at a deoxyuracil (dU), creating a single nucleotide gap at each location of dU and resulting in a polynucleotide fragment flanked with at least one single-stranded extension that allows seamless and directional assembly of customized molecules.

    DISCLOSURE

    [0066] One aspect of the disclosure provided herein is a method of NGS library production as shown in FIG. 1. Here, in Step 1, genomic DNA (gDNA), cDNA, or RNA is isolated. This can be accomplished using a method presented in a laboratory manual, such as Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, or by using a commercially available kit, such as those available from Thermo Fisher Scientific (Carlsbad, Calif.), Bio-Rad (Hercules, Calif.), Qiagen (Germantown, Md.), Promega (Madison, Wis.), Zymo Research (Irvine, Calif.), Agilent (La Jolla, Calif.), or Roche Life Science (Penzberg, Germany), to name but a few.

    [0067] Next, the isolated gDNA, cDNA, or RNA is fragmented. If RNA is the starting material, because of its size no fragmentation is needed, although RNA nucleic acids are first converted to cDNA using methods standard in the art. When needed, fragmentation can be accomplished by sonication, treatment with dsDNA FRAGMENTASE® (available from NEB, Ipswich, Mass.), restriction enzyme treatment, manual shearing, etc. Suitable sonicator devices are commercially available from COVARIS® (Woburn, Mass.), Diagenode (Denville, N.J.), Qsonica (Newtown, Conn.), Fisher Scientific (Carlsbad, Calif.), Thomas Scientific (Swedesboro, N.J.), and PRO Scientific (Oxford, Conn.), to name but a few. Restriction enzymes are widely available from laboratory reagent supply stores, such as NEB (Ipswich, Mass.), Promega (Madison, Wis.), Thermo Fisher Scientific (Carlsbad, Calif.), Bio-Rad (Hercules, Calif.), and Zymo Research (Irvine, Calif.), to name but a few. Fragments resulting from treatment with restriction enzymes that cleave asymmetrically, leaving single stranded overhangs (“sticky ends”), are ready for Step 2. However, when fragmentation is conducted via a method other than treatment with asymmetrically cleaving restriction enzymes, the polynucleotide fragments must be submitted to end-repair and/or A-tailing. Protocols for end-repair and A-tailing are widely available, such as in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, and kits to accomplish these treatments are also available, for example from NEB (Ipswich, Mass.), Roche (South San Francisco, Calif.), and Thermo Fisher Scientific (Carlsbad, Calif.).

    [0068] The fragmentation results in polynucleotide fragments having an average size in the range of about 2 kb to about 50 kb, such as an average size of at least 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 11 kb, 12 kb, 13 kb, 14 kb, 15 kb, 20 kb, 25 kb, 30 kb, 35 kb, 40 kb, or 50 kb.

    [0069] Once the polynucleotide fragments having single strand overhangs are generated, adapters are attached via ligation to each end of the polynucleotide fragment. The population of polynucleotides is combined with a population of double stranded adapters, each adapter comprising, in 5′ to 3′ order, (a) a short oligonucleotide containing at least one dU nucleotide, (b) a unique UMI, and (c) either a restriction enzyme overhang compatible with the restriction enzyme overhang generated in Step 1 via fragmentation or a T overhang compatible with the A-tail present on the fragment.

    [0070] In this step, each end of the polynucleotide fragments has a different unique UMI. Examples of suitable adapters are NEBNEXT® adapters (NEB, Ipswich, Mass.) and Roche KAPA adapters (South San Francisco, Calif.). Such dU containing oligonucleotides can be synthesized by companies such as Synbio Technologies (Monmouth Junction, N.J.) and then, if needed, attached to a collection of unique UMIs having either sticky ends or T-tails. Alternatively, custom adapters containing (a)-(c) are synthesized together generating a collection of unique UMIs such that no attachment is needed, for example by Integrated DNA Technologies (Coralville, Iowa). The resulting adapter-polynucleotide-adapter molecules have blunt ends or have a T overhang to be ligated to the A overhang produced after fragmentation and repair of the nucleic acids. In some cases the UMI sequence contains a variable number of random nucleotides (i.e., “N,” any one of which can be A, C, G, T, or U or modifications thereof), such as 1-40, 4-30, or 6-20. As an example, the UMI may have the sequence shown in FIG. 3 with 11 random nucleotides:

    TABLE-US-00001
    (SEQ ID NO: 1)    5′P-UCUNNNNNNNNNNNACAT-3′OH
    (SEQ ID NO: 2) OH 3′-TAGANNNNNNNNNNNUGU-5′P
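
    The adapter template above can be instantiated for a concrete UMI by replacing the 11 N positions with random bases and pairing them on the second strand. The following Python sketch is illustrative only: the helper names (`random_umi`, `build_adapter`, `adapter_pool`) are hypothetical, dU is written as U exactly as in the listing, and the choice to draw N from A, C, G, and T is an assumption of this sketch.

```python
import random

# Flanking bases copied from SEQ ID NO: 1 (written 5'->3') and
# SEQ ID NO: 2 (written 3'->5'); only the N positions vary.
TOP_LEFT, TOP_RIGHT = "UCU", "ACAT"
BOTTOM_LEFT, BOTTOM_RIGHT = "TAGA", "UGU"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_umi(length=11, rng=None):
    """Draw one random UMI of the given length (assumed alphabet: A, C, G, T)."""
    rng = rng or random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_adapter(umi):
    """Return (top strand 5'->3', bottom strand 3'->5') for one UMI."""
    if set(umi) - set("ACGT"):
        raise ValueError("UMI must contain only A, C, G, T")
    top = TOP_LEFT + umi + TOP_RIGHT
    # The bottom strand is written 3'->5', so it is the position-wise complement.
    bottom = BOTTOM_LEFT + umi.translate(COMPLEMENT) + BOTTOM_RIGHT
    return top, bottom


def adapter_pool(count, length=11, seed=None):
    """Build `count` adapters with distinct UMIs."""
    if count > 4 ** length:
        raise ValueError("not enough distinct UMIs of this length")
    rng = random.Random(seed)
    umis = set()
    while len(umis) < count:
        umis.add(random_umi(length, rng))
    return [build_adapter(u) for u in sorted(umis)]


if __name__ == "__main__":
    for top, bottom in adapter_pool(3, seed=1):
        print(top, bottom)
```

    With 11 random positions there are 4^11 (4,194,304) possible UMIs, so a pool of that design can label a large number of fragment ends before UMIs begin to repeat.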

    [0071] After attachment of the adapters, Step 3 is performed. Here, a second fragmentation is performed, this time using the polynucleotides generated in Step 2 which have adapters with unique UMIs. Any of the fragmentation procedures discussed above can be used. If fragmentation occurs by sonication/shearing/dsDNA Fragmentase, blunt ends are ensured by repairing using standard methods. If fragmentation occurs via cleavage with one or more asymmetrically cleaving restriction enzymes, the enzymes are selected to leave sticky ends that will allow ligation to those present on the adapter. In some instances, the fragmented polynucleotides having adapters with unique UMIs are isolated and in other instances no further isolation is conducted.

    [0072] The sample containing the fragmented polynucleotides having adapters with unique UMIs is then diluted to prevent hetero-concatemer formation and the polynucleotides are self-ligated/circularized. Linear DNA remaining in the self-ligation reaction is removed using exonuclease, such as exonuclease V, exonuclease VI, etc. Each self-ligated, circular polynucleotide has one of the two adapters having unique UMIs that were attached in Step 2.

    [0073] Step 4 involves USER enzyme treatment of the self-ligated, circular polynucleotide. Here, the USER enzyme creates a single nucleotide gap at each location of dU and generates a linear polynucleotide having one strand of the UMI attached to each end of the linear polynucleotide. End repair is then conducted resulting in a fully double stranded polynucleotide having an identical UMI at each of its ends (see FIG. 1).
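
    The bookkeeping of Steps 2 through 4 can be followed with a simple string model. The Python sketch below is a simplification under stated assumptions: sequences are single strings, strand chemistry, dU positions, and orientation are ignored, the second fragmentation cuts each molecule exactly once, and all function names are hypothetical. It only shows why each product ends up with the same UMI at both ends.

```python
def attach_adapters(insert, umi_a, umi_b):
    """Step 2: place a different UMI adapter on each end of one fragment."""
    return (umi_a, insert, umi_b)


def second_fragmentation(tagged, cut):
    """Step 3: break the tagged fragment once inside the insert.

    Each resulting piece keeps exactly one adapter and has one free insert end.
    """
    umi_a, insert, umi_b = tagged
    if not 0 < cut < len(insert):
        raise ValueError("cut must fall inside the insert")
    return [(umi_a, insert[:cut]), (umi_b, insert[cut:])]


def circularize_open_repair(piece):
    """Steps 3-4: self-ligation joins the free insert end to the adapter.

    Opening the circle at the UMI leaves one UMI strand on each end, and
    end-repair fills both in. This is modeled as the same UMI appearing
    at both ends of the linear product.
    """
    umi, body = piece
    return umi + body + umi


def same_umi_library(insert, umi_a, umi_b, cut):
    """Run the simplified Steps 2-4 for one starting fragment."""
    tagged = attach_adapters(insert, umi_a, umi_b)
    return [circularize_open_repair(p) for p in second_fragmentation(tagged, cut)]


if __name__ == "__main__":
    for product in same_umi_library("AAAACCCCGGGGTTTT", "ACGTACGTACG", "TTGCATTGCAT", 8):
        print(product)
```

    In this model the two halves of one starting fragment carry different UMIs, while each half carries a single UMI repeated at both of its ends.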

    [0074] Importantly, because the adapters added in Step 2 represent a population containing numerous unique UMIs, the resulting collection of polynucleotides provides the ability for de novo genome assembly, virtual karyotyping, identification of indels, variants, copy number variations (CNV), translocations and/or chromosomal rearrangements.

    [0075] That is, when genomic fragments were first sequenced using Sanger sequencing, de novo assembly was used for all genomes, including humans. However, short read NGS sequences can only be used for identification of a targeted fusion, which means it can only detect fusions for a certain number of genes and cannot detect any chromosomal inversions/translocations nor identify the exact fusion site and fusion sequence.

    [0076] Specifically, using 2 kb or longer reads of DNA sequences generated by the disclosed methodology, DNA segments are assembled from sequences of the same DNA origin and are used to create a de novo assembly of the chromosome contigs. Each chromosome is assigned its correct number (e.g., in humans 1-22, X, or Y). This is done by comparing the sequences present in each of the assembled chromosomes to existing karyotyping banks and assigning karyotyping bands to each assembled chromosome based on its sequence (see FIG. 12A). Any discrepancy between the normal/expected banding pattern and the observed/identified banding is noted/flagged by the software. Chromosomes having bands coming from other chromosomes or being moved within the same chromosome are identified as a translocation or an inversion (see FIG. 12B). Because the sequence on each chromosome comes from a 2 kb or longer DNA segment, the software can pinpoint the fusion sequence of the translocated or inverted regions (see FIG. 12C). This is not possible using existing technologies such as the ILLUMINA® mate pair sequencing, which allows distant sequences to be identified, yet does not reliably and unequivocally identify the entire junction sequence due to the presence of gaps, especially in the repeat regions, and the reliance on a reference mandated by short read NGS.
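
    The band comparison described above can be sketched in a few lines of Python. This is only a toy illustration: the band labels, the chromosome names, and the function `flag_band_discrepancies` are hypothetical, and the sketch does not represent the software referred to in this disclosure. It assumes each assembled chromosome has already been reduced to an ordered list of band labels.

```python
from typing import Dict, List


def flag_band_discrepancies(
    chrom: str,
    observed: List[str],
    expected: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Compare the observed band order of one assembled chromosome
    with the expected order and report the discrepancies."""
    home = expected[chrom]
    # Bands that belong to a different chromosome: translocation candidates.
    foreign = [band for band in observed if band not in home]
    # Bands of this chromosome, in the order in which they were observed.
    own = [band for band in observed if band in home]
    ranks = [home.index(band) for band in own]
    # A band whose expected position precedes that of the band observed
    # just before it indicates a rearrangement such as an inversion.
    out_of_order = [
        band for prev, cur, band in zip(ranks, ranks[1:], own[1:]) if cur < prev
    ]
    return {
        "translocation_candidates": foreign,
        "rearranged_within_chromosome": out_of_order,
    }


if __name__ == "__main__":
    # Hypothetical band labels used only for this example.
    expected = {"chrA": ["A1", "A2", "A3", "A4"], "chrB": ["B1", "B2", "B3"]}
    print(flag_band_discrepancies("chrA", ["A1", "A2", "B2", "A3", "A4"], expected))
    print(flag_band_discrepancies("chrA", ["A1", "A3", "A2", "A4"], expected))
```

    A real implementation would work from sequence alignments to annotated band coordinates rather than from pre-assigned labels.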

    [0077] Step 5 involves ligation of a second adapter containing a sequencing primer site and/or a sequencing primer site and molecular index to each end of the polynucleotide of Step 4 (see FIG. 1). Such adapters are available from Integrated DNA Technologies (Coralville, Iowa), among other companies. Each polynucleotide in the collection has, in 5′ to 3′ order, the following: (a) a sequencing primer site, (b) optionally a molecular index, (c) a unique UMI, (d) a polynucleotide fragment from Step 3, (e) the same unique UMI as (c), (f) optionally the molecular index of (b), and (g) a sequencing primer site. In some instances, the sequencing primer sites of (a) and (g) are identical, in other instances they are different. Suitable sequencing primer sites are those recognized by P5 and P7.

    [0078] In some aspects, the resulting nucleic acid fragment contains, in 5′ to 3′ order, P7 sequence, a P7 end molecular index, a read 1 and/or gene specific primer site, the target nucleic acid, a read 2 primer site, read 2 sequencing primer site, a P5 end molecular index, and a P5 sequence.

    [0079] The polynucleotide population generated in Step 5 is then subjected to an initial PCR amplification using primers corresponding to the sequencing primer site(s) or using a random N primer extension approach (shown in FIG. 1). Random N primer extension kits are commonly available from NEB (Ipswich, Mass.), Thermo Fisher Scientific (Carlsbad, Calif.), Stratagene (La Jolla, Calif.), and Agilent (Santa Clara, Calif.), to name but a few, and the extension conducted according to the manufacturer's directions.

    [0080] Alternatively, the procedure shown in FIG. 2 is followed. Here, after performing Steps 1-3 the self-ligated circular DNA or RNA molecule is fragmented. Any of the fragmentation procedures discussed above can be used. If fragmentation occurs by sonication/shearing/dsDNA Fragmentase, blunt ends are ensured by repairing using standard methods. If fragmentation occurs via cleavage with an asymmetric enzyme, one or more is selected that has sticky ends that will allow ligation to those present on the adapter. It is only after this fragmentation that USER enzyme treatment, fill-in, and end repair/A-tailing are conducted. ILLUMINA® adapters are then attached before conducting a final PCR step.

    [0081] In all cases, flow cell binding sequences are added to each resulting nucleic acid fragment prior to the final PCR step.

    [0082] This completes the NGS library; however, the collection is then typically subjected to a final PCR amplification to ensure sufficient representation of fragments in the library population.

    [0083] Depending on the purpose for the NGS library, NGS sequencing is conducted using primers directed to known polynucleotide sequences in an organism's DNA or RNA (see FIG. 3) or is conducted using random N primer extension (see FIG. 1). When using random N primer extension, the primer contains a sequencing primer tail which does not rely on a reference sequence. Such random N primers are available from NEB (Ipswich, Mass.) and Thermo Fisher Scientific (Carlsbad, Calif.), among others. A typical sequence read from the NGS library disclosed above is at least about 1500 base pairs, which is significantly longer than what is normally achieved using standard NGS library preparations. FIG. 4 illustrates how using a fixed primer and random primers on the other end results in a longer DNA fragment and indicates that all sequences come from the same DNA fragment.

    [0084] In addition, because each member has a different, unique UMI at each end of the linear polynucleotide fragment as shown in FIG. 1, when using random N primer extensions there is no need for a reference sequence. Since the initial step of the NGS library preparation was fragmentation of the starting collection of polynucleotides followed by size selection, the resulting original fragments themselves contained overlap. That fact, in combination with the second fragmentation and duplication of the UMI associated with the fragment resulting from the second fragmentation allows alignment of sequences without reliance on a reference sequence.

    [0085] As a simplified example, consider a 3 kb polynucleotide having UMI#1 at each end. Sequencing with random N primers will generate a series of at least ˜1500 bp reads from each of the two strands of DNA, each read originating at a different point in the 3 kb fragment. Similarly, a 3 kb polynucleotide having UMI#2 at each end that is sequenced with random N primers will generate a series of at least ˜1500 bp reads from each of the two strands of that DNA. This then allows aligning/overlapping of the sequences using UMI#1 and UMI#2 as the starting points to generate the full sequence of the fragment having UMI#1 and the full sequence of the fragment having UMI#2. Consequently, between the sequence overlap of the original size selected polynucleotides, the overlap in the sequence reads of the final NGS library collection and the presence of UMI#1 and UMI#2 in the sequence reads, one can determine the exact 5′ to 3′ sequence located on the original gDNA, cDNA, RNA polynucleotide as shown in FIG. 11.
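
    The simplified example above can be mimicked with a toy Python sketch that groups reads by UMI and then greedily merges overlapping reads within each group. The sketch assumes error-free reads from a single strand, exact suffix/prefix overlaps, and very short toy sequences; the function names and the `min_len` parameter are hypothetical and do not describe the bioinformatic pipeline actually used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_reads(reads: List[str], min_len: int = 4) -> List[str]:
    """Greedily merge the pair of sequences with the largest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


def assemble_by_umi(tagged_reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (umi, read) pairs by UMI and assemble each group independently."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for umi, read in tagged_reads:
        groups[umi].append(read)
    return {umi: merge_reads(reads) for umi, reads in groups.items()}


if __name__ == "__main__":
    frag1 = "ATGCGTACGTTAGCCGATAG"  # toy stand-in for the UMI#1 fragment
    frag2 = "GGCATTCAGGACTTAACCGT"  # toy stand-in for the UMI#2 fragment
    reads = [("UMI#1", frag1[0:12]), ("UMI#2", frag2[6:16]), ("UMI#1", frag1[10:20]),
             ("UMI#2", frag2[0:12]), ("UMI#1", frag1[6:16]), ("UMI#2", frag2[10:20])]
    print(assemble_by_umi(reads))
```

    Because reads are only ever merged with reads carrying the same UMI, the two fragments are reconstructed separately even though their reads are interleaved in the input.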

    [0086] In addition to generalized genomic sequencing, the technology presented herein can be used for targeted sequencing. Here, the method uses the random UMI tagging on one end of the target sequence and a random primer extension on the other end. This allows accurate identification of the sequence, especially in situations where, for example, sequences from different organisms have high homology and/or have long runs of homopolymers and/or high G-C content.

    [0087] FIG. 13 illustrates how targeted sequence can work using bacterial 16S rDNA as an example. PCR amplification is conducted using a 16S specific primer for one end of the molecule and at the other end of the molecule a primer comprising an ILLUMINA® adapter and sequencing primer joined to a random UMI joined to a 16S rDNA specific primer. As shown in FIG. 14, the amplified products are then sequenced with random primers having sequencing adapter tails. The segments are then aligned using these features in a manner essentially identical to that shown in FIG. 4. As indicated by the size distribution shown in FIGS. 10A and B, the 16S rDNA ˜1500-1600 bp sequence could be completely covered.

    Examples

    [0088] The following examples are provided by way of illustration only and not by way of limitation. A variety of non-critical parameters can be changed or modified to yield essentially the same or similar results.

    A. Library Production for De Novo Genomic Sequence Assembly (FIG. 1)

    Step 1—Fragmentation

    [0089] Up to 30 μg/g-TUBE (Covaris, Boston, England) of non-degraded, fully solubilized DNA in DI water or TE (10 mM Tris, 0.1 mM EDTA, pH 8.0) or 10 mM Tris-Cl, pH 8.5 having a starting size larger than 45 kbp is vortexed for 10 s and pre-warmed at room temperature (20° C. to 30° C.). The g-TUBE containing the sample is centrifuged for 30 seconds at 13,300 rpm (16,276 rcf(g)) to drain the sample from the upper chamber of the g-TUBE and transfer it to the bottom of the tube. Tubes are inverted and the centrifugation step repeated using the same time and speed. Sheared DNA is recovered, reapplied to the upper chamber of the same g-TUBE, the centrifugation and inversion repeated twice, and the sheared DNA again recovered. DNA is concentrated using a Zymo DNA concentration kit (Zymo; Irvine, Calif.). The average size of the fragmented gDNA is measured by loading 2 μl of the sample on a 1% agarose gel. Samples having an average size of ˜6 kb are selected.

    Step 2—Adapter Attachment

    [0090] 1 μg of fragmented DNA is brought to a 50 μl volume using EB buffer (10 mM Tris-Cl, pH 8.5) prior to adding 3 μl of NEBNEXT® ULTRA™ II end prep enzyme mix (NEB, Ipswich, Mass.) and 7 μl of NEBNEXT® ULTRA™ II end prep reaction buffer (NEB, Ipswich, Mass.). Sample is incubated for 30 minutes at 20° C. followed by a 30 minute incubation at 65° C.

    [0091] The molarity of the fragmented DNA is calculated using the following values: DNA length of ˜6 kb, 1 μg=0.25 pmol; dU-UMI is 18 bp at 100 μM/0.2 μg=16.83 pmol. A 10-100 fold molar excess of adapters is used. 2.5 μl of IDT: dU-UMI+T (Integrated DNA Technologies, Coralville, Iowa) overhang adapters are mixed with 1 μl of NEBNEXT® ligation enhancer (NEB, Ipswich, Mass.) and 30 μl of NEBNEXT® ULTRA™ II Ligation master mix (NEB, Ipswich, Mass.) prior to incubating for 15 minutes at 20° C.
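
    The two molar quantities quoted above can be reproduced with a short Python sketch. It assumes an average mass of 660 g/mol per base pair of double-stranded DNA, a commonly used approximation that is not stated in the text; the function names are likewise illustrative only.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (assumed approximation)


def dsdna_pmol(mass_ug: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA in micrograms to picomoles."""
    # micrograms -> grams is 1e-6 and moles -> picomoles is 1e12, net factor 1e6
    return mass_ug * 1e6 / (length_bp * AVG_BP_MASS)


def fold_excess(adapter_pmol: float, fragment_pmol: float) -> float:
    """Molar ratio of adapter to fragment."""
    return adapter_pmol / fragment_pmol


if __name__ == "__main__":
    fragments = dsdna_pmol(1.0, 6000)  # about 0.25 pmol for 1 ug of 6 kb DNA
    adapters = dsdna_pmol(0.2, 18)     # about 16.8 pmol for 0.2 ug of 18 bp adapter
    print(round(fragments, 2), round(adapters, 1))
    print(round(fold_excess(adapters, fragments)))  # about 67 fold
```

    Under this assumption, 0.2 μg of the 18 bp adapter corresponds to roughly a 67 fold molar excess over 1 μg of ˜6 kb fragments, which lies inside the stated 10-100 fold range.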

    SPRI Beads Purification

    [0092] SPRI beads 1:0.7×(65 μl of SPRI) are added to the sample and incubated for 5 minutes at room temperature. After placing on the magnet, the sample is further incubated another 5 minutes before adding 200 μl of 80% ethanol and incubating 30 seconds to wash. Sample is again placed on the magnet and excess ethanol removed. Sample is air dried for no more than 2 minutes, resuspended in 18 μl EB buffer (10 mM Tris-Cl, pH 8.5), incubated for 2 minutes, and again placed on the magnet to recover 16 μl. Sample can then be stored at 4° C. for up to 72 hours or at −20° C. for longer periods.
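
    The bead volume quoted above follows from the reaction volume and the bead ratio, as the following Python sketch shows. It assumes that "1:0.7×" denotes the sample-to-bead volume ratio and that the sample volume is the sum of the components listed for the end prep and ligation reactions; the function name is hypothetical.

```python
def spri_bead_volume(sample_ul: float, ratio: float) -> float:
    """Volume of SPRI beads (in microliters) for a given sample volume and bead ratio."""
    return sample_ul * ratio


# Component volumes (in microliters) listed for the end prep and ligation steps:
# DNA in EB buffer, end prep enzyme mix, end prep buffer, adapters, enhancer, ligation mix.
LIGATION_REACTION_UL = sum([50, 3, 7, 2.5, 1, 30])  # 93.5

if __name__ == "__main__":
    print(LIGATION_REACTION_UL)
    print(round(spri_bead_volume(LIGATION_REACTION_UL, 0.7)))  # 65
```

    The same relationship gives 48 μl of beads for a 60 μl sample at a 1:0.8× ratio.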

    Step 3—dsDNA Fragmentase Reaction

    [0093] A digestion using NEBNEXT® dsDNA Fragmentase (NEB, Ipswich, Mass.) is conducted by combining 5 ng-3 μg of prepared DNA in 16 μl, 2 μl of NEBNEXT® dsDNA Fragmentase reaction buffer (NEB, Ipswich, Mass.) and 2 μl of NEBNEXT® dsDNA Fragmentase. The reaction is incubated for 5 minutes on ice prior to a 4 minute incubation at 37° C. 5 μl of 0.5 M EDTA is added to stop the reaction and a 5 μl sample electrophoresed on a 1% agarose gel to assess fragment size. Samples having an average size of 3 kb are selected and the volume adjusted to 50 μl with water.

    [0094] The digested DNA is purified using the SPRI procedure described above.

    End Repair

    [0095] 3 μl of T4 PNK/T4 polymerase is added to the 50 μl sample of purified DNA along with 7 μl of reaction buffer (Thermo Fisher, Carlsbad, Calif.; NEB, Ipswich, Mass.) and incubated for 30 minutes at 20° C. SPRI purification is conducted essentially as set forth above, but using SPRI beads 1:0.8×(48 μl of SPRI).

    Self-Circularization

    [0096] 50 μl of DNA, 5 μl of T4 DNA ligase high concentration (cat#M0202T; NEB, Ipswich, Mass.), 140 μl of T4 DNA ligase buffer (NEB, Ipswich, Mass.) and 460 μl of H₂O are mixed and incubated at 16° C. for 4 hours (or overnight at 4° C.) with interval shaking. The reaction is heat inactivated at 65° C. for 10 minutes.

    Purification

    [0097] Purification is conducted using SPRI beads 1:0.7× by mixing 710 μl of DNA, 50 μl of SPRI beads and 450 μl of buffer (20% PEG 8000 w/v, 2.5 M NaCl, 10 mM Tris-HCl, 1 mM EDTA, 0.05% Tween 20 v/v, pH 8.0). The sample is resuspended in 18 μl EB buffer (10 mM Tris-Cl, pH 8.5), incubated for 2 minutes, and again placed on the magnet to recover 16 μl.

    Step 4— USER+Fill-in

    [0098] 2 μl of CutSmart 10× buffer (NEB, Ipswich, Mass.) and 2 μl of USER enzyme (NEB, Ipswich, Mass.) are added and the sample incubated at 37° C. for 30 minutes. 2 μl of Antarctic Phosphatase Reaction Buffer (10×) and 1 μl of Antarctic Phosphatase (NEB, Ipswich, Mass.) are then added and the reaction incubated for another 30 minutes at 37° C. The reaction is heat inactivated by incubating at 80° C. for 5 minutes. 3 μl of NEBNEXT® ULTRA™ II End Prep Enzyme Mix (NEB, Ipswich, Mass.) is then added, the sample incubated for 30 minutes at 20° C., and then incubated for 30 minutes at 65° C.

    Step 5— Adapter Ligation

    [0099] A ligation is performed by adding 2.5 μl of a 15 μM fully double-stranded adapter designed as a truncated P7 adapter, for example:

    TABLE-US-00002
    (SEQ ID NO: 3) 5′-GGGGGGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′
    (SEQ ID NO: 4) 3′-AAAAACTGACCTCAAGTCTGCACACGAGAAGGCTAG/5Phos/-5′
    These can be ordered from IDT or ILLUMINA®. 1 μl of Enhancer and 30 μl of Ligation Mix (all from NEB, Ipswich, Mass.) are added to the sample. The solution is incubated for 15 minutes at 20° C. Purification is conducted using SPRI beads 1:0.9×essentially as described above. The purified sample is resuspended in 20 μl EB buffer and 18.5 μl of purified product recovered.

    Step 6— Random Primer Extension

    [0100] 1 μl of 25 μM truncated-P5 random primer and 2.5 μl of SD polymerase 10×buffer (Boca Scientific, Dedham, Mass.) are added to the sample, incubated at 98° C. for 2 minutes and immediately placed on ice for 3 minutes. 2 μl of 10 mM dNTP mix and 1 μl of SD DNA Polymerase are added and the sample is subjected to several cycles of incubation for 3 minutes at 92° C.; incubation for 5 minutes at 16° C.; a 0.1° C./second ramp to 68° C.; and extension at 68° C. for 5 minutes. This produces different sized molecules from the 3 kb fragment from both ends, all ending with the random UMI associated with that fragment, as illustrated in FIG. 4, which could be 2 kb or longer. The genome map is similarly generated from each continuous sequence or contig associated with each random UMI, as illustrated in FIG. 11.

    [0101] The reaction volume is increased to 50 μl with EB buffer and two rounds of SPRI-bead purification are conducted, essentially as described above, using 100 μl of SPRI-beads. The products are eluted with 22 μl of EB buffer and 20 μl of purified product obtained.

    Barcoding PCR

    [0102] 25 μl of Q5® Hot Start High-Fidelity 2×Master Mix, 2.5 μl of Universal Primer and 2.5 μl of barcoded primer (NEBNEXT® multiplex oligo kit, Ipswich, Mass.) are added to 20 μl of the purified product. PCR is conducted by first incubating for 30 seconds at 98° C.; and then conducting at least 5 cycles using the following cycling times: 98° C. for 10 seconds; 65° C. for 30 seconds; 72° C. for 60 seconds. A final extension is performed for 5 minutes at 72° C. prior to holding at 4° C.

    [0103] SPRI-bead purification is conducted using 1:0.9×beads essentially as described above. Elution is conducted with 17 μl EB buffer and 15-16 μl of purified product is obtained. Fragment size is assessed on a TapeStation (HS-D1000 or D5000 Assay; Agilent, San Diego, Calif.) and quantitated using the Qubit assay (dsDNA BR or dsDNA HS; Thermo Fisher Scientific, Carlsbad, Calif.).

    MISEQ™ Sequencing

    [0104] Sequencing was conducted using the MISEQ™ System (ILLUMINA®, San Diego, Calif.) according to the manufacturer's instructions. Loading concentrations were 6-20 pM for Kit v3 and 6-10 pM for Kit v2 (300 cycle kit). Samples were 4-6 ng/μl (˜10 nM).

    Contig Formation

    [0105] The individual reads are bioinformatically overlapped to generate longer contiguous sequences or contigs where each continuous sequence or contig is associated with a specific random UMI as illustrated in FIG. 4. The genome map is similarly generated from each UMI continuous sequence or contig, as illustrated in FIG. 11.

    B. Library Preparation for Genomic Sequence Assembly Using Reference Sequences (FIG. 2)

    [0106] Steps 1-3 are conducted as shown above.

    Linear Digestion of Self-Circularized DNA

    [0107] 5 μl of Thermolabile exonuclease I and 5 μl of exonuclease V (RecBCD) (NEB, Ipswich, Mass.; Thermo Fisher, Carlsbad, Calif.) are added to the self-circularized sample and incubated at 37° C. for 2 hours with interval shaking. Inactivation of the reaction is accomplished by incubating at 80° C. for 20 minutes. Purification is conducted using SPRI beads 1:0.7× by mixing 710 μl of DNA, 50 μl of SPRI beads and 450 μl of buffer (20% PEG 8000 w/v, 2.5 M NaCl, 10 mM Tris-HCl, 1 mM EDTA, 0.05% Tween 20 v/v, pH 8.0) according to the manufacturer's directions. The isolated linear DNA is resuspended in 18 μl of EB (10 mM Tris-Cl, pH 8.5) and 16 μl recovered.

    Further Fragmentation

    [0108] An additional/optional fragmentation step is conducted using an enzymatic method as described above or via a sonicator. If performing an enzymatic digestion, purification is required, such as using SPRI beads 1:0.7× as described above, but is not necessary for sonication, which is done in a minimum volume of 50 μl. The goal is an average size of 1 kb. After the further fragmentation procedure, DNA size is assessed on a 1.2% agarose gel.

    USER Treatment, End Repair and A-tailing

    [0109] 7 μl of NEBNEXT® ULTRA™ II End Prep Reaction Buffer (NEB, Ipswich, Mass.) and 3 μl of USER enzyme (NEB, Ipswich, Mass.) are added to 50 μl DNA prior to incubation at 37° C. for 30 minutes. The reaction is heat inactivated by incubation for 5 minutes at 60° C. 3 μl of NEBNEXT® ULTRA™ II End Prep Enzyme mix (NEB, Ipswich, Mass.) is added, incubated at 20° C. for 30 minutes and then incubated for a further 30 minutes at 65° C.

    [0110] Alternatively, 7 μl of NEBNEXT® ULTRA™ II End Prep Reaction Buffer (NEB, Ipswich, Mass.) and 3 μl of USER enzyme (NEB, Ipswich, Mass.) are added to 50 μl DNA prior to incubation at 37° C. for 30 minutes. 3 μl of Klenow Fragment (3′-5′ exo-; NEB, Ipswich, Mass.) is added to the solution and incubated for 30 minutes at 37° C. prior to the addition of 3 μl of NEBNEXT® ULTRA™ II End Prep Enzyme mix (NEB, Ipswich, Mass.). The reaction is then incubated at 20° C. for 30 minutes and then incubated for a further 30 minutes at 65° C.

    ILLUMINA® Adapter Ligation

    [0111] A ligation is performed by adding 2.5 μl of 15 μM Adapter (stem loop) from NEB Next Multiplex Oligos for ILLUMINA® along with 1 μl of Enhancer and 30 μl of Ligation Mix (all from NEB, Ipswich, Mass.) to the sample generated from USER treatment+End Repair and A-tailing. The solution is incubated for 15 minutes at 20° C. 3 μl of USER enzyme is then added and incubated for a further 15 minutes at 37° C. Purification is conducted using SPRI beads 1:0.85×(80 μl of SPRI) essentially as described above. The purified sample is resuspended in 25 μl EB buffer and 22.5 μl recovered. This is the sample used in the final PCR reaction.

    Final PCR

    [0112] If desired, sample barcodes can be introduced, such as when a truncated adapter like the NEBNEXT® Multiplex Oligos for ILLUMINA® (96 Unique Dual Index Primer Pairs) (NEB, Ipswich, Mass.) was used during the adapter ligation. Reaction composition and reaction conditions vary depending on the DNA polymerase used, but are conducted according to the manufacturer's instructions.

    [0113] PCR amplification is conducted using the following PCR cycling conditions:

    TABLE-US-00003
    Cycle Step            Temperature  Time        Cycles
    Initial Denaturation  98° C.       30 seconds  1
    Denaturation          98° C.       10 seconds  3-15*
    Annealing/Extension   65° C.       75 seconds  3-15*
    Final Extension       65° C.       5 minutes   1
    Hold                  4° C.        ∞

    *The number of PCR cycles is chosen based on input amount and sample type. That is, the number of cycles is high enough to provide sufficient library fragments for a successful sequencing run, yet low enough to avoid PCR artifacts and over-cycling.

    [0114] The amplified sample is purified using SPRI beads 1:0.9×(45 μl of SPRI) essentially as described above. The purified sample is resuspended in 25 μl of EB and 22.5 μl recovered. For quality control purposes, the NGS library can be evaluated using Qubit concentration measurements (ThermoFisher Scientific, Carlsbad, Calif.), TapeStation (HS-D100 or D1000 Assay; Agilent, San Diego, Calif.) or, optionally, qPCR for a more accurate quantification.

    MISEQ™ Sequencing

    [0115] Sequencing was conducted using the MISEQ™ System (ILLUMINA®, San Diego, Calif.) according to the manufacturer's instructions. Loading concentrations were 6-20 pM for Kit v3 and 6-10 pM for Kit v2 (300 cycle kit). Samples were 4-6 ng/μl (˜10 nM).

    C. Lambda Library Generation

    Fragmentation

    [0116] Up to 30 μg/g-TUBE (Covaris, Boston, England) of non-degraded, fully solubilized DNA in DI water or TE (10 mM Tris, 0.1 mM EDTA, pH 8.0) or 10 mM Tris-Cl, pH 8.5 having a starting size larger than 45 kbp is pre-warmed at room temperature (20° C. to 30° C.). The g-TUBE containing the sample is centrifuged for 30 seconds at 13,300 rpm (16,276 rcf(g)). Tubes are inverted and the centrifugation step repeated. Sheared DNA is recovered, reapplied to the upper chamber of the same g-TUBE, the centrifugation and inversion repeated twice, and the sheared DNA again recovered. Optionally, DNA is concentrated using a Zymo DNA concentration kit (Zymo; Irvine, Calif.). Final DNA concentration is measured using a Qubit fluorometer (Thermo Fisher Scientific, Carlsbad, Calif.). Alternatively, the average size of the fragmented gDNA is measured by loading 2 μl of the sample on a 1% agarose gel. Samples having an average size of ˜6 kb are selected.

    End Repair/dA-Tailing

    [0117] End repair/dA-tailing is accomplished with an NEBNEXT® ULTRA™ II End Repair kit (NEB, Ipswich, Mass.) after 1-2 μg of fragmented DNA is diluted to a final volume of 50 μl with EB buffer (10 mM Tris-Cl, pH 8.5). The final volume is 60 μl.

    Ligation of dU-UMI Adapters

    [0118] The molarity of the fragmented DNA and dU-UMI is calculated. DNA length of ˜6 kb, 1 μg=0.25 pmol; 10 fold excess dU-UMI is 2.5 pmol; 100 fold excess dU-UMI is 25 pmol. A 10-100 fold molar excess of adapters is used. An appropriate amount (1-10 μl) of dU-UMI+T (Integrated DNA Technologies, Coralville, Iowa) overhang adapters is added to the 60 μl end repair/A-tailing mixture and then ligated using 1 μl Enhancer and 30 μl Ligation Mix from an NEBNEXT® ULTRA™ II Ligation kit (NEB, Ipswich, Mass.) prior to incubating for 15 minutes at 20° C.

    SPRI Beads Purification

    [0119] SPRI beads 1:0.8×(e.g. Beckman Coulter, Indianapolis, Ind.; Biocompare, South San Francisco, Calif.) were used according to the manufacturer's instructions and DNA eluted with 18 μl EB buffer for a sample recovery volume of 16-18 μl. Sample can then be stored at 4° C. for up to 72 hours or at −20° C. for longer periods.

    dsDNA Fragmentase Reaction

    [0120] A 16 μl sample of the SPRI purified DNA is digested using NEBNEXT® dsDNA Fragmentase (NEB, Ipswich, Mass.) according to the manufacturer's instructions but incubating at 37° C. for only 3.5-4 minutes. A 5 μl sample is optionally electrophoresed on a 1% agarose gel to assess fragment size. Samples having an average size of 3 kb are selected and the volume adjusted to 50 μl with water.

    [0121] The 50 μl digested DNA is purified using an SPRI bead ratio of 1:0.7×according to the manufacturer's instructions and eluted in 52 μl EB buffer to recover a volume of 50-52 μl.

    End Repair

    [0122] End repair is accomplished by adding the following components to 50 μl of the purified digested DNA: 6 μl CutSmart buffer (NEB, Ipswich, Mass.), 0.6 μl dNTPs (100 μM), 6 μl ATP (1 mM), 1 μl T4 polymerase, and 1 μl T4 PNK. Sample is incubated for 30-60 minutes at 20° C. SPRI purification is conducted using an SPRI bead ratio of 1:1× according to the manufacturer's instructions and eluted in 52 μl for a recovery volume of 50-52 μl.

    Self-Circularization

    [0123] 150 ng of DNA, 15 μl ligase buffer and 3 μl Thermo T4 ligase (5 U/μl; Thermo Fisher, Carlsbad, Calif.), 7.5 μl PEG 4000 (5%), and water are mixed for a final volume of 150 μl. The sample is incubated at 16-20° C. overnight and heat inactivated at 65° C. for 10 minutes. Any remaining linear DNA is removed by adding 2 μl Exonuclease I and 2 μl Exonuclease V prior to incubating at 37° C. for 1 hour. The exonucleases are inactivated by incubating at 75° C. for 20 minutes prior to SPRI purification using a bead ratio of 1:0.7× and eluting in 52 μl EB buffer for a recovery volume of 50-52 μl. To confirm that circularization occurred, additional sequences, used as primers, were added to the end of the dU-UMI adapters such that only the circularized DNA fragments could be amplified in a PCR reaction. The smear present in Lane 4 of FIG. 5 indicates the presence of circularized DNA.

    USER+Fill-in

    [0124] 7 μl of NEBNEXT® ULTRA™ End Prep Reaction Buffer (NEB, Ipswich, Mass.) and 3 μl of USER enzyme (NEB, Ipswich, Mass.) are added to the self-circularized sample and incubated at 37° C. for 30 minutes prior to a 60° C. incubation for 5 minutes. 3 μl of NEBNEXT® ULTRA™ II End Prep Enzyme Mix (NEB, Ipswich, Mass.) is then added, the sample incubated for 30 minutes at 20° C., and then incubated for 30 minutes at 65° C.

    Adapter Ligation

    [0125] 2.5 μl of 15 μM P5 end duplex adapter compatible with T/A ligation is added to the USER+Fill-in sample along with 1 μl Enhancer and 30 μl Ligation Mix from an NEBNEXT® ULTRA™ II Ligation kit (NEB, Ipswich, Mass.) prior to incubating for 15 minutes at 20° C. Sample is SPRI purified using a bead ratio of 1:0.85× according to manufacturer's instructions and eluted in 26 μl EB buffer for a recovery volume of 24-26 μl. A 2 μl sample is quantitated using a Qubit Fluorometer (Thermo Fisher, Carlsbad, Calif.).

    Polymerase Mediated Primer Extension

    [0126] The following primers were used in the polymerase mediated primer extension reactions: P7 extension V1 (5′→3′) GGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNN (SEQ ID NO:5) and P7 extension V2 (5′→3′) GACTGGAGTTCAGACGTGTGCTCTTCCGATCNNNNNNNNNNNN (SEQ ID NO:6). 50 ng of DNA was used for each 10 μl reaction along with 2 μl dNTP mix, 1 μl SD DNA polymerase and 1 μl 10× enzyme buffer (Boca Scientific, Dedham, Mass.), and 2 μl of either the V1 or V2 primer. The sample is subjected to four cycles of incubation for 3 minutes at 92° C.; incubation for 5 minutes at 16° C.; and a 0.1° C./second ramp to 68° C. The sample was then extended at 68° C. for 15 minutes prior to holding at 4° C. This produces different sized molecules from the 3 kb fragment, all ending with the random UMI associated with that fragment, as illustrated in FIG. 4. The genome map is similarly generated from each continuous sequence or contig associated with each random UMI, as illustrated in FIG. 11.

    [0127] The reaction was SPRI purified using a bead ratio of 1:0.9×according to manufacturer's instructions and eluted in 26 μl for a recovery volume of 24-26 μl. DNA concentration was measured using 2 μl of sample on a Qubit Fluorometer (Thermo Fisher, Carlsbad, Calif.) or, optionally, via electrophoresis.

    MISEQ™ Sequencing

    [0128] Sequencing was conducted using the MISEQ™ System (ILLUMINA®, San Diego, Calif.) according to the manufacturer's instructions. Loading concentrations were 6-20 pM for Kit v3 and 6-10 pM for Kit v2 (300 cycle kit). Samples were 4-6 ng/μl (˜10 nM).

    [0129] Examples of the results obtained are shown in FIGS. 6-10.

    Contig Formation

    [0130] The individual reads are bioinformatically overlapped to generate longer contiguous sequences or contigs where each continuous sequence or contig is associated with a specific random UMI as illustrated in FIG. 4. Each UMI contig is then bioinformatically overlapped to create a genome map or is compared to existing lambda genome sequences.

    D. Targeted Sequencing of 16S Bacterial rDNA

    [0131] PCR Amplification (Option 1)

    [0132] Up to 100 ng/μl of non-degraded, fully solubilized DNA extracted from samples containing microbiomes or different bacterial species in DI water or TE (10 mM Tris, 0.1 mM EDTA, pH 8.0) or 10 mM Tris-Cl, pH 8.5 is used for amplification of 16S rDNA. Universal 16S rRNA bacterial primers 27F (5′-AGAGTTTGATCCTGGCTCAG-3′ (SEQ ID NO:7)) and 1392R (5′-GGTTACCTTGTTACGACTT-3′ (SEQ ID NO:8)) or 8F (5′-TGGAGAGTTTGATCCTGGCTCAG-3′ (SEQ ID NO:9)) and 533R (5′-TACCGCGGCTGCTGGCAC-3′ (SEQ ID NO:10)), or a different set of primers designed to amplify the bacterial 16S rDNA universally, are used to amplify this gene in a PCR reaction (Wang et al. (2018) AMB Expr 8:182). The forward or the reverse primer (only one primer) has a random UMI and the ILLUMINA® sequencing adapter sequence added to it. The PCR program is as follows: 95° C. for 5 min, 26 cycles at 95° C. for 60 s, 55° C. for 30 s, and 72° C. for 90 s, with a final extension of 72° C. for 10 min.
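
    For planning purposes, the PCR program above can be written down as data and its total programmed hold time computed, as in the following Python sketch. The representation of each step as a (temperature, seconds) pair and the function name are assumptions made for illustration; ramp times between temperatures are ignored, so actual run times on a thermocycler will be longer.

```python
from typing import List, Tuple

Step = Tuple[float, int]  # (temperature in degrees C, hold time in seconds)


def program_seconds(initial: Step, cycle: List[Step], n_cycles: int, final: Step) -> int:
    """Total programmed hold time, ignoring ramping between temperatures."""
    per_cycle = sum(seconds for _, seconds in cycle)
    return initial[1] + n_cycles * per_cycle + final[1]


# Values taken from the PCR program described above.
INITIAL: Step = (95.0, 5 * 60)
CYCLE: List[Step] = [(95.0, 60), (55.0, 30), (72.0, 90)]
N_CYCLES = 26
FINAL: Step = (72.0, 10 * 60)

if __name__ == "__main__":
    total = program_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(total, "s =", total / 60, "min")  # 5580 s = 93.0 min
```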

    [0133] Sample is SPRI purified using a bead ratio of 1:0.85×according to manufacturer's instructions and eluted in 26 μl EB buffer for a recovery volume of 24-26 μl. A 2 μl sample is quantitated using a Qubit Fluorometer (Thermo Fisher, Carlsbad, Calif.).

    Adapter Ligation

    [0134] 2.5 μl of 15 μM P5 and P3 end duplex adapter compatible with T/A ligation is added to the PCR product along with 1 μl Enhancer and 30 μl Ligation Mix from an NEBNEXT® ULTRA™ II Ligation kit (NEB, Ipswich, Mass.) prior to incubating for 15 minutes at 20° C. Sample is SPRI purified using a bead ratio of 1:0.85× according to manufacturer's instructions and eluted in 26 μl EB buffer for a recovery volume of 24-26 μl. A 2 μl sample is quantitated using a Qubit Fluorometer (Thermo Fisher, Carlsbad, Calif.).

    Polymerase Mediated Primer Extension

    [0135] The following primers were used in the polymerase mediated primer extension reactions: P7 extension V1 (5′→3′) GGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNN (SEQ ID NO: 5) and P7 extension V2 (5′→3′) GACTGGAGTTCAGACGTGTGCTCTTCCGATCNNNNNNNNNNNN (SEQ ID NO:6). 50 ng of DNA is used for each 10 μl reaction along with 2 μl dNTP mix, 1 μl SD DNA polymerase and 1 μl 10× enzyme buffer (Boca Scientific, Dedham, Mass.), and 2 μl of either the V1 or V2 primer. The sample is subjected to four cycles of incubation for 3 minutes at 92° C.; incubation for 5 minutes at 16° C.; and a 0.1° C./second ramp to 68° C. The sample is then extended at 68° C. for 15 minutes prior to holding at 4° C. This produces different sized molecules from the 16S rDNA fragment, all ending with the random UMI associated with that fragment, as illustrated in FIG. 14.

    [0136] The reaction is SPRI purified using a bead ratio of 1:0.9×according to manufacturer's instructions and eluted in 26 μl for a recovery volume of 24-26 μl. DNA concentration was measured using 2 μl of sample on a Qubit Fluorometer (Thermo Fisher, Carlsbad, Calif.) or, optionally, via electrophoresis.

    MISEQ™ Sequencing

    [0137] Sequencing is conducted using the MISEQ™ System (ILLUMINA®, San Diego, Calif.) according to the manufacturer's instructions. Loading concentrations are 6-20 pM for Kit v3 and 6-10 pM for Kit v2 (300 cycle kit). Samples are 4-6 ng/μl (˜10 nM).
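
    The relationship between the mass concentration (ng/μl) and the molar concentration (nM) quoted above depends on the mean library fragment length, which is not stated here. The following minimal sketch uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the 600 bp length in the example is an assumed value chosen for illustration, and the function names are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA mass concentration to a molar concentration in nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)


def dilution_factor(stock_nM: float, target_pM: float) -> float:
    """Fold dilution needed to bring a stock in nM down to a loading target in pM."""
    return stock_nM * 1000.0 / target_pM


if __name__ == "__main__":
    stock = ng_per_ul_to_nM(4.0, 600)   # 600 bp is an assumed mean length
    print(f"{stock:.1f} nM stock")
    print(f"{dilution_factor(stock, 10.0):.0f}-fold dilution to reach 10 pM")
```

    With an assumed mean length of 600 bp, a 4 ng/μl sample works out to roughly 10 nM, which is in line with the figure given above; a different fragment length shifts the result proportionally.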

    Contig Formation

    [0138] The individual reads are bioinformatically overlapped to generate longer contiguous sequences or contigs, where each contiguous sequence or contig is associated with a specific random UMI. Contiguous sequences or contigs can then be compared to known 16S rDNA sequences for identification of bacterial species or variants, or can be used as an identification tool for new species.
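
    The grouping and overlapping described above can be sketched as follows. This is a simplified illustration, not the specific bioinformatic pipeline used: it assumes the UMI has already been extracted from each read, that reads are error-free and on the same strand, and it uses a naive greedy suffix-prefix merge. All function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_reads(reads, min_len=4):
    """Greedy merge: repeatedly join the pair of sequences with the largest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(seqs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; remaining sequences stay separate
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)
    return seqs


def contigs_by_umi(tagged_reads, min_len=4):
    """Group (umi, read) pairs by UMI, then assemble each group independently."""
    groups = defaultdict(list)
    for umi, seq in tagged_reads:
        groups[umi].append(seq)
    return {umi: merge_reads(seqs, min_len) for umi, seqs in groups.items()}


if __name__ == "__main__":
    toy = [
        ("AAAC", "ACGTACGGA"),
        ("AAAC", "CGGATTCA"),
        ("AAAC", "TTCAGGCT"),
        ("GGTT", "TTTTGGGG"),
    ]
    for umi, contigs in contigs_by_umi(toy).items():
        print(umi, contigs)
```

    Because assembly is performed separately within each UMI group, reads from different original molecules are never merged with one another, even when their sequences are nearly identical.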

    [0139] PCR Amplification (Option 2)

    [0140] Up to 100 ng/μl of non-degraded, fully solubilized DNA extracted from samples containing microbiomes or different bacterial species in DI water or TE (10 mM Tris, 0.1 mM EDTA, pH 8.0) or 10 mM Tris-Cl, pH 8.5 is used for 16S rDNA amplification. Universal 16S rRNA bacterial primers 27F (5′-AGAGTTTGATCCTGGCTCAG-3′ (SEQ ID NO:7)) and 1392R (5′-GGTTACCTTGTTACGACTT-3′ (SEQ ID NO:8)) or 8F (5′-TGGAGAGTTTGATCCTGGCTCAG-3′ (SEQ ID NO:9)) and 533R (5′-TACCGCGGCTGCTGGCAC-3′ (SEQ ID NO:10)), or a different set of primers designed to amplify the bacterial 16S rDNA universally, are used to amplify this gene in a PCR reaction (Wang et al. (2018) AMB Expr 8:182). Only one of the two primers, either the forward or the reverse primer, has a random UMI added to it. The PCR program is as follows: 95° C. for 5 min, 26 cycles at 95° C. for 60 s, 55° C. for 30 s, and 72° C. for 90 s, with a final extension of 72° C. for 10 min.
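
    For testing downstream analysis software, a UMI-tagged primer of the kind described above can be simulated in silico. The sketch below is only a simulation aid: in practice the random UMI is introduced during oligonucleotide synthesis. The 12-base UMI length and the placement of the UMI at the 5′ end of the 27F primer are assumptions made for illustration and are not specified in this paragraph; the function names are hypothetical.

```python
import random

PRIMER_27F = "AGAGTTTGATCCTGGCTCAG"  # SEQ ID NO:7, as given in the text


def random_umi(length: int, rng: random.Random) -> str:
    """Draw a random DNA sequence of the requested length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def umi_tagged_primer(primer: str, umi_length: int = 12, rng=None):
    """Return (umi, tagged_primer) with the UMI prepended to the primer (assumed layout)."""
    rng = rng or random.Random()
    umi = random_umi(umi_length, rng)
    return umi, umi + primer


def umi_space(umi_length: int) -> int:
    """Number of distinct UMIs available at a given length."""
    return 4 ** umi_length


if __name__ == "__main__":
    umi, tagged = umi_tagged_primer(PRIMER_27F, rng=random.Random(0))
    print(umi, tagged)
    print(umi_space(12), "possible 12-base UMIs")
```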

    [0141] The sample is SPRI purified using a bead ratio of 1:0.85× according to the manufacturer's instructions and eluted in 26 μl EB buffer for a recovery volume of 24-26 μl. A 2 μl sample is quantitated using a Qubit Fluorometer (Thermo Fisher, Carlsbad, Calif.).

    Polymerase Mediated Primer Extension

    [0142] The following primers are used in the polymerase mediated primer extension reactions: P7 extension V1 (5′→3′) GGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNNNNNNNN (SEQ ID NO:5) and P7 extension V2 (5′→3′) GACTGGAGTTCAGACGTGTGCTCTTCCGATCNNNNNNNNNNNN (SEQ ID NO:6). 50 ng of DNA is used for each 10 μl reaction along with 2 μl dNTP mix, 1 μl SD DNA polymerase and 1 μl 10× enzyme buffer (Boca Scientific, Dedham, Mass.), and 2 μl of either the V1 or V2 primer. The sample is subjected to four cycles of incubation for 3 minutes at 92° C.; incubation for 5 minutes at 16° C.; and a 0.1° C./second ramp to 68° C. The sample is then extended at 68° C. for 15 minutes prior to holding at 4° C. This produces differently sized molecules from the 16S rDNA fragment, all ending with the random UMI associated with that fragment, as illustrated in FIG. 14.

    [0143] The reaction is SPRI purified using a bead ratio of 1:0.9× according to the manufacturer's instructions and eluted in 26 μl for a recovery volume of 24-26 μl. DNA concentration is measured using 2 μl of sample on a Qubit Fluorometer (Thermo Fisher, Carlsbad, Calif.) or, optionally, via electrophoresis.

    MISEQ™ Sequencing

    [0144] Sequencing is conducted using the MISEQ™ System (ILLUMINA®, San Diego, Calif.) according to the manufacturer's instructions. Loading concentrations are 6-20 pM for Kit v3 and 6-10 pM for Kit v2 (300 cycle kit). Samples are 4-6 ng/μl (˜10 nM).

    Contig Formation

    [0145] The individual reads are bioinformatically overlapped to generate longer contiguous sequences or contigs, where each contiguous sequence or contig is associated with a specific random UMI. Contiguous sequences or contigs can then be compared to known 16S rDNA sequences for identification of bacterial species or variants, or can be used as an identification tool for new species.
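
    The comparison of a contig against known 16S rDNA sequences can be illustrated with a toy similarity search. This minimal sketch scores shared k-mers (Jaccard similarity) rather than performing a true alignment, and it treats a contig with no reference above a threshold as a candidate for a new species. The reference names, sequences, k-mer size, and threshold are all hypothetical values chosen for illustration; an actual analysis would typically rely on an alignment tool and a curated 16S database.

```python
def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 4) -> float:
    """Fraction of k-mers shared between two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_match(contig: str, references: dict, k: int = 4, min_score: float = 0.5):
    """Return (name, score) of the closest reference, or (None, score) if below min_score."""
    score, name = max((jaccard(contig, ref, k), name) for name, ref in references.items())
    return (name, score) if score >= min_score else (None, score)


if __name__ == "__main__":
    # Toy references; real 16S rDNA sequences are far longer.
    refs = {
        "taxon_A": "ACGTACGGATTCAGGCTA",
        "taxon_B": "TTTTGGGGCCCCAAAATT",
    }
    print(best_match("ACGTACGGATTCAGGCT", refs))    # close to taxon_A
    print(best_match("GAGAGAGAGAGACTCTCT", refs))   # no reference above threshold
```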