Method for High-Throughput, Ultra Long-Read DNA Sequencing
20170275683 · 2017-09-28
Inventors
Cpc classification
C12Q2537/165
CHEMISTRY; METALLURGY
C12Q2523/303
CHEMISTRY; METALLURGY
C12Q1/6806
CHEMISTRY; METALLURGY
International classification
Abstract
The Invention is a method for ascertaining extremely long DNA sequence reads (kilobases or megabases) from polony-type DNA sequencers. Polony-type DNA sequencers (e.g., Illumina, Roche, and Life Technologies sequencers) typically give read lengths of only about 500 bp. The Invention can extend those read lengths by orders of magnitude.
Claims
1. A method for ascertaining very long regions (kilobases or tens of kilobases or more) of possibly non-contiguous DNA sequence originating on a single long molecule of DNA comprising the use of: (i) a polony-type DNA sequencer (as defined above); (ii) DNA combing or other method for stretching DNA molecules upon a solid support or substrate; (iii) a support or substrate, such as a modified flow cell, that binds DNA; (iv) a procedure for fragmenting and amplifying DNA molecules in situ (e.g. http://www.illumina.com/products/nextera_xt_dna_library_prep_kit.html, the “Nextera” method from Illumina); (v) flow cell imaging as used on polony sequencers; and (vi) software for using spatial, geometric or directional information from images of the flow cell, and in some cases known genomic sequences, to deconvolute polonies and reconstruct long sequences.
Description
BRIEF DESCRIPTION OF THE DRAWINGS:
[0009]
[0010]
[0011]
[0012]
DETAILED DESCRIPTION OF THE INVENTION:
[0013] Although there are many embodiments of the invention, the most obvious is the embodiment on an Illumina flow cell, a planar piece of modified glass with attached oligonucleotides. The description below refers to this Illumina flow cell embodiment (
[0014] None of the individual steps below are entirely novel. DNA combing (step 1) (
[0015] 1. The procedure begins with long (tens of kilobases to megabases) DNA molecules. These are applied to the flow cell in solution and stretched over the flow cell by some embodiment of DNA combing (
[0016] 2. For optimum results, the flow cells used in this procedure would have their surfaces chemically modified to increase DNA binding and capture. A large literature exists on chemical modifications useful for this purpose, as such binding and capture reactions have been used for the construction of microarrays. For example, the flow cell surface could be chemically modified using reactive groups such as aldehyde groups, amino groups, ester groups, epoxide groups, methacrylate groups, and many others (http://www.arrayit.com/Products/Microarray Slides/microarray slides.html; Lee et al., 2012, “Rapid and Facile Microwave-Assisted Surface Chemistry for Functionalized Microarray Slides”, Adv. Funct. Mater. 22(4):872-878; Kwiat et al., 2012, “Non-covalent monolayer-piercing anchoring of lipophilic nucleic acids: preparation, characterization, and sensing applications”, J. Am. Chem. Soc. 134(1):280-92).
[0017] 3. The stretched DNA molecules would be fragmented in situ, then amplified in situ (
[0018] 4. Sequencing of each polony will occur as in a normal Illumina sequencing reaction.
[0019] 5. The sequence of DNA in each polony will be obtained using imaging and imaging software as in a normal Illumina sequencing reaction.
[0020] 6. Custom, novel software would deconvolute the molecules on the flow cell, determining which belong to the same, original long molecule. Note that the flow cells will contain a very high density of polonies, and (unlike the drawings,
[0021] In the easy case, the genomic sequence of the DNA being sequenced is already known (this would be true if, for instance, sequencing were being done to determine haplotype). In this case, the algorithm would focus on the sequence in a particular polony, and look for other polonies “in line” (
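The known-genome case described above can be illustrated with a minimal sketch. It is not the software of the Invention. It assumes that each polony read has already been aligned to the reference, that the combing direction is known as an angle, and that polony centres are available as image coordinates. The `Polony` record, the function `in_line_group`, and the `max_offset` tolerance are hypothetical names introduced only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Polony:
    pid: str
    x: float      # polony centre in flow-cell image coordinates (units assumed)
    y: float
    chrom: str    # reference hit of the polony's read (assumed pre-aligned)
    pos: int


def in_line_group(seed, polonies, angle_deg, max_offset=1.0):
    """Return the seed plus polonies that lie on a line through the seed
    (at angle_deg) and whose reference order agrees with their spatial order."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    hits = []
    for p in polonies:
        if p.pid == seed.pid or p.chrom != seed.chrom:
            continue
        dx, dy = p.x - seed.x, p.y - seed.y
        along = dx * ux + dy * uy           # signed distance along the line
        across = abs(dx * uy - dy * ux)     # perpendicular distance from the line
        if across <= max_offset and along != 0 and p.pos != seed.pos:
            # True when "further along the line" also means "further along the genome"
            agrees = (along > 0) == (p.pos > seed.pos)
            hits.append((along, agrees, p))
    # The molecule may lie in either orientation; keep the majority orientation.
    n_agree = sum(1 for _, a, _ in hits if a)
    forward = n_agree >= len(hits) - n_agree
    kept = [(along, p) for along, a, p in hits if a == forward]
    kept.append((0.0, seed))
    kept.sort(key=lambda t: t[0])
    return [p for _, p in kept]
```

Polonies on the wrong chromosome, off the line, or out of genomic order relative to the majority are dropped. A practical implementation would also need to estimate the line direction from the image and tolerate curved or broken molecules, which this sketch does not attempt.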
[0022] In the hard case, an organism with a novel genomic sequence would be under study. In this case, sequence information from a related organism could be used as above, since gene orders are often similar between organisms (synteny). But even without synteny, deconvolution can be done de novo using high sequence depth, i.e., sequencing each region of the genome multiple times, such as 100 times (referred to as “100× coverage” or “100× depth”). In such a case, an algorithm would focus on a sequence from a particular polony, then find all polonies on the flow cell with at least a portion of the same sequence (for 100× coverage, there would be about 100 such polonies), then look at all “in line” sequences for all 100 polonies, and finally find in line sequences shared, and in order, by the 100 lines of polonies (
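The de novo case described above can be sketched in simplified form. This is an illustration under stated assumptions, not the algorithm of the Invention. Reads are treated as matching only when identical (real data would need overlap detection and reverse-complement handling), the combing direction is assumed known, and the sketch only counts how many lines contain each neighbouring read, without checking that the reads occur in the same order. The function `shared_in_line_reads` and its parameters are hypothetical.

```python
import math
from collections import Counter


def shared_in_line_reads(seed_read, polonies, angle_deg=0.0,
                         max_offset=1.0, min_support=2):
    """polonies: iterable of (x, y, read) tuples.

    For every polony whose read equals seed_read, collect the reads of the
    polonies lying on the line through it at angle_deg, then return the reads
    seen on at least min_support of those lines, with their line counts.
    """
    polonies = list(polonies)
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    anchors = [(x, y) for x, y, read in polonies if read == seed_read]
    support = Counter()
    for ax, ay in anchors:
        seen = set()  # count each read at most once per line
        for x, y, read in polonies:
            if read == seed_read:
                continue
            dx, dy = x - ax, y - ay
            if abs(dx * uy - dy * ux) <= max_offset:  # perpendicular distance
                seen.add(read)
        support.update(seen)
    return {read: n for read, n in support.items() if n >= min_support}
```

Reads that appear beside the seed sequence on many independent lines are likely to be true genomic neighbours, while reads that appear on only one line are more likely to come from an unrelated molecule that happens to cross it. Enforcing a consistent order across lines would be a natural next step.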
[0023] Note that step 3 (fragmentation, capture by the flow cell, and sequencing) (