Gene Synthesis by Self-Assembly of Small Oligonucleotide Building Blocks
20210171994 · 2021-06-10
Inventors
- Morten Lorentz Pedersen (Copenhagen NV, DK)
- Gitte Laurette Pedersen (Montauk, NY, US)
- Tanya Sharlene Kanigan (Charlotte, VT, US)
Cpc classification
C12N15/1031
CHEMISTRY; METALLURGY
C12P19/34
CHEMISTRY; METALLURGY
International classification
C12P19/34
CHEMISTRY; METALLURGY
C12N15/10
CHEMISTRY; METALLURGY
Abstract
The invention provides a process for synthesizing genes and other long double stranded polynucleotides by assembling very short oligonucleotides into partly double stranded polynucleotides, and then connecting these partly double stranded polynucleotide subassemblies with linkers comprised of very short oligonucleotides. In one embodiment, the correct order of the polynucleotide subassemblies is coded in overhangs present at each end of the partly double stranded polynucleotide subassemblies. Linkers having a sequence complementary to the combined overhangs connect adjacent subassemblies, which are then ligated together. In one preferred embodiment the oligos are six bases long, for which there are only 4096 different possible sequence permutations. A complete library of oligos of this size and scale can be cost-effectively synthesized and quality controlled, avoiding the typical errors and yield issues associated with phosphoramidite synthesis of longer oligos. Furthermore, the limited oligo library size supports development of a laboratory-scale gene synthesis machine.
Claims
1. A method for synthesizing a double stranded polynucleotide molecule having a predefined sequence, the method comprising the steps of: i) providing at least three single stranded oligonucleotides comprising complementary nucleotide sequence parts, ii) contacting the at least three single stranded oligonucleotides provided in step i) with each other, and iii) creating at least one phosphodiester bond between any adjacent nucleotide in a self-assembled set of single stranded oligonucleotides from step ii) to create a double stranded polynucleotide of higher molecular weight than each of the single stranded oligonucleotides provided in step i).
2. The method of claim 1, further comprising the steps of: i) providing at least two double stranded polynucleotide molecules having a predefined sequence produced using the steps i) through iii) of claim 1, ii) contacting the at least two double stranded polynucleotides provided in step i) with each other, and iii) creating at least one phosphodiester bond between any adjacent nucleotide in a self-assembled set of double stranded polynucleotides from step ii) to create a double stranded polynucleotide of higher molecular weight than each of the double stranded oligonucleotides provided in step i).
3. The method of claim 1, further comprising the steps of: i) providing at least two double stranded polynucleotide molecules having a predefined sequence produced using the steps i) through iii) of claim 1, ii) providing at least one single stranded oligonucleotide comprising complementary nucleotide sequence parts to overhangs at the ends of the at least two double stranded polynucleotide molecules provided in step i), iii) contacting the at least two double stranded polynucleotides provided in step i) with the at least one single stranded oligonucleotide provided in step ii), and iv) creating at least one phosphodiester bond between any adjacent nucleotide in a self-assembled set of double stranded polynucleotides from step iii) to create a double stranded polynucleotide of higher molecular weight than each of the double stranded oligonucleotides provided in step i).
4. A method for synthesizing a double stranded polynucleotide molecule having a predefined sequence, the method comprising the steps of: i) providing at least two double stranded polynucleotide molecules having a predefined sequence, ii) providing at least one single stranded oligonucleotide comprising complementary nucleotide sequence parts to overhangs at the ends of the at least two double stranded polynucleotide molecules provided in step i), iii) contacting the at least two double stranded polynucleotides provided in step i) with the at least one single stranded oligonucleotide provided in step ii), and iv) creating at least one phosphodiester bond between any adjacent nucleotide in a self-assembled set of double stranded polynucleotides from step iii) to create a double stranded polynucleotide of higher molecular weight than each of the individual double stranded oligonucleotides provided in step i).
5. The method of claim 1, wherein the creation of at least one phosphodiester bond is catalyzed by a ligase enzyme.
6. The method of claim 1, wherein the creation of at least one phosphodiester bond is substituted by combining, in a polymerase chain reaction, the individual oligonucleotides and polynucleotides into at least one double stranded polynucleotide of higher molecular weight than each of the oligonucleotides/polynucleotides that went into the reaction.
7. The method of claim 2, wherein each of the at least two double stranded polynucleotide molecules provided in step i) comprises no more than one 3′ overhang and no more than one 5′ overhang.
8. The method of claim 2, wherein at least one of the at least two double stranded polynucleotide molecules provided in step i) is treated with a phosphatase prior to step iii).
9. The method of claim 2, wherein one of the at least two double stranded polynucleotides provided in step i) is attached to a solid support.
10. The method of claim 1, wherein the double stranded polynucleotide is assembled by an automated process or a semi-automated process.
11. The method of claim 1 wherein the at least three single stranded oligonucleotides with complementary nucleotide sequence parts provided in step i) all have the same length.
12. The method of claim 4 wherein at least one of the two double stranded polynucleotides provided in step i) is derived from a double stranded polynucleotide library extracted from at least one biological source.
13. The method of claim 4 wherein at least one of the two double stranded polynucleotides provided in step i) is derived from a synthetic double stranded polynucleotide library.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
GLOSSARY
[0035] ‘Building blocks’ shall refer to nucleotides that can be assembled to larger molecules, which can be either final products or building blocks themselves.
[0036] ‘Cap’ shall refer to a partly double stranded polynucleotide molecule having only one single stranded overhang at one end comprising 1 or more bases; this molecule may function as a ‘cap’ in an assembly of multiple oligo-/polynucleotide building blocks in terms of comprising the last polynucleotide building block added to the assembly. A ‘cap’ always comprises only one nucleic acid zip code as its overhang. A ‘cap’ may also comprise one or more functional sequences within its double stranded part including, but not limited to: a spacer sequence and a biotin linker to link the seed to a magnetic bead; a release site (see definition below); a PCR primer site; a label; and/or a polynucleotide sequence that will be part of the final product.
[0037] ‘Nucleic acid zip codes’ or ‘zip code’ shall refer to a unique short single stranded nucleic acid sequence that is complementary to another zip code, and is thereby used to direct assembly of oligo-/polynucleotide building blocks in a particular order through a complementary overlapping sequence.
[0038] ‘Oligonucleotides’ and ‘oligos’ shall refer to single stranded nucleic acids that are generally shorter than 50, 100, 150, or 200 bases in length. Commonly made in the laboratory by solid-phase chemical synthesis, these small bits of nucleic acids can be manufactured with any user-specified sequence.
[0039] ‘Overhang’ shall refer to the part of partly double stranded oligo-/polynucleotides that is single stranded.
[0040] ‘Polynucleotides’ shall refer to single or double stranded nucleic acids that are generally longer than 50, 100, 150, or 200 bases in length.
[0041] ‘Release site’ shall refer to a chemical feature within a polynucleotide seed or cap molecule that enables the final product to be released from the seed or cap. The release site can be, for example, a recognition site for a restriction/nicking endonuclease, or one or more uracil residues.
[0042] ‘Seed’ shall refer to a partly double stranded polynucleotide molecule having only one single stranded overhang at one end comprising 1 or more bases; this molecule may function as a ‘seed’ in an assembly of multiple oligo-/polynucleotide building blocks in terms of comprising the first polynucleotide building block added to the assembly. A ‘seed’ always comprises one nucleic acid zip code as its overhang. A ‘seed’ may also comprise one or more functional sequences within its double stranded part including, but not limited to: a spacer sequence and a biotin linker to link the seed to a magnetic bead; a release site (see definition above); a PCR primer site; a label; and/or a polynucleotide sequence that will be part of the final product.
[0043] ‘Single stranded tag’ shall refer to consecutive nucleotides linked together and forming a single stranded oligonucleotide. The number of nucleotides may range typically from about 2 to 20 but can also be more than 20 nucleotides, including tags of more than e.g. 200 nucleotides. For the purposes of this patent, a single stranded nucleotide tag can be obtained from genetic material present in a biological sample and can also be obtained from synthetic oligonucleotides.
[0044] ‘Subassembly’ shall refer to a nucleic acid molecule assembled from a set of oligonucleotide building blocks.
[0045] ‘Tag library’ shall refer to a plurality of at least one single stranded tag.
[0046] ‘Wobble zip’ shall refer to part of the zip code sequence that contains all possible permutations of such sequence code or a subset of all possible permutations of such sequence code.
DETAILED DESCRIPTION OF THE INVENTION
[0047] The following descriptions relate to preferred embodiments of the invention and involve assembling large, even gene-length, double stranded polynucleotides using single stranded oligonucleotides of preferably six bases (i.e. hexamers) together with partly double stranded polynucleotide molecules having three base overhangs; however, the preferred embodiments of the invention are not limited to any one length of overhang and single stranded oligonucleotides having lengths up to more than 20 bases and overhangs up to more than 10 bases can be applied.
[0048] In one preferred embodiment of the invention, the oligonucleotides are all six bases long and the overhangs are three nucleotides long. The oligonucleotides are used to connect the double stranded polynucleotides to one another and to the seed through complementary sets of nucleotide bases, here referred to as molecular zip codes. Each 3-nucleotide sequence provides one of 64 (4^3) possible molecular zip codes, whereas the use of a six-nucleotide linker provides for up to 4096 (4^6) different polynucleotide pairings. Larger numbers of pairings are possible with longer oligo linkers and complementary overhangs.
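The counts given above can be checked by direct enumeration. The following is a minimal sketch in Python; the function name all_sequences is illustrative only and does not appear in the specification.

```python
from itertools import product

BASES = "ACGT"


def all_sequences(length):
    """Return every DNA sequence of the given length over A, C, G and T."""
    return ["".join(p) for p in product(BASES, repeat=length)]


# Three-base overhangs (zip codes) and six-base linkers, as in the
# preferred embodiment described above.
zip_codes = all_sequences(3)
linkers = all_sequences(6)

print(len(zip_codes))  # 4**3 possible zip codes
print(len(linkers))    # 4**6 possible hexamer linkers
```

Because the full hexamer set is small enough to hold in a list, the same enumeration could serve as the catalogue of a pre-synthesized library.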
[0049] The invention enables more than one building block to be assembled at the same time because the correct order of assembly is coded into the overhangs. This simplifies the polynucleotide manufacturing process and dramatically increases the synthesis speed because all possible permutations of the single stranded oligonucleotides can be pre-ordered. As such, this invention supports development of a whole gene synthesizing machine that can produce any possible sequence of a polynucleotide from a limited set of standardized building blocks (e.g. all the 4096 permutations of single stranded hexamers).
[0050]
[0051]
[0052] In
[0053] Alternatively, partly double stranded polynucleotides can be connected together in a particular order using single stranded oligonucleotide “linkers” to bridge adjacent overhangs.
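By way of illustration, the sequence of a linker that bridges two adjacent overhangs can be derived as the reverse complement of the combined overhangs. The sketch below assumes, as a convention chosen here and not stated in the specification, that both overhangs lie on the same strand and are written 5′ to 3′; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def bridging_linker(left_overhang, right_overhang):
    """Return the linker (5'->3') that anneals across two adjacent overhangs.

    Assumption: left_overhang ends the first subassembly and right_overhang
    starts the second, both on the same strand and both written 5'->3'.
    """
    return reverse_complement(left_overhang + right_overhang)


# Two three-base overhangs are bridged by one six-base linker.
print(bridging_linker("ACG", "TTA"))
```

With three-base overhangs the linker is always a hexamer, so it can be drawn from a complete hexamer library rather than synthesized to order.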
[0054] The product of the assembly, which may comprise one or more subassemblies or one or more final constructs, may be isolated from the reaction by PCR, clonal selection and other methods well known in the art. Under certain conditions, such as those in which ligation is not strict or when ambiguous linkers are present (e.g. palindromes), side products may be produced. These unintended polynucleotides are unlikely to have the same length as the desired product. Thus size selection, e.g. using gel electrophoresis, may be an additional means of isolating the desired product from these side-products, if any.
[0055] Another means of separating the intended product and side product(s) is by selective capture of the overhangs. Alternate assemblies of a given set of oligos and/or partially double stranded polynucleotides are unlikely to possess the same sets of overhangs. Thus the product can be isolated by (1) capturing the intended product on a surface-bound capture molecule having a three base overhang (or simply three bases of single stranded DNA on a spacer attached to a surface) complementary to the first overhang on the intended product, then (2) capturing the intended product on a surface-bound polynucleotide having three complementary bases available for capture of the second overhang on the intended product, and (3) releasing products that are captured by steps 1 and 2 into solution by methods known in the art. It may or may not be desirable to release the polynucleotides in step 1 before proceeding to step 2. For example, the intended product, if sufficiently long, can be captured on a surface or matrix displaying capture sequences complementary to both overhangs. Nucleotide analogs and/or ligation can be used to increase the efficiency and stringency of the capture conditions, followed by release of the product (or subassembly) from the surface or matrix using methods described in this application or otherwise known in the art.
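The two-step capture can be pictured as two successive filters on the overhangs of each candidate product. The following sketch is a simplified model under stated assumptions: each product is represented only by its two overhangs written 5′ to 3′, a probe captures an overhang when it is the exact reverse complement, and hybridization kinetics and stringency are ignored. The data layout and names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def captured_by(overhang, probe):
    """True if the probe is exactly complementary to the overhang."""
    return probe == reverse_complement(overhang)


def two_step_capture(products, first_probe, second_probe):
    """Keep only products retained by both capture steps, in order."""
    step_1 = [p for p in products
              if captured_by(p["first_overhang"], first_probe)]
    return [p for p in step_1
            if captured_by(p["second_overhang"], second_probe)]


products = [
    {"name": "intended", "first_overhang": "ACG", "second_overhang": "TTA"},
    {"name": "side product", "first_overhang": "ACG", "second_overhang": "GGC"},
]
kept = two_step_capture(products, first_probe="CGT", second_probe="TAA")
print([p["name"] for p in kept])
```

In this model a side product that shares only one overhang with the intended product passes the first step but is lost at the second.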
[0056] Oligos and/or polynucleotides can also assemble on a partially double stranded polynucleotide that has only one overhang. We shall refer to such a molecule as a ‘seed’ when its overhang comprises the first zip code for a growing assembly. In one embodiment of the invention, the seed is comprised of a partly double stranded polynucleotide spacer molecule having a single stranded 3-base overhang (ZIP1′) at one end. This molecule can be bound to the surface of a solid support such as a paramagnetic bead at its double stranded end, such that the single stranded portion is free to bind with any purely single stranded or single stranded part of a partly double stranded oligo/poly-nucleotide molecule in solution having a complementary 3-base sequence (ZIP1). The double stranded portion of this seed may contain a release site, such as a recognition/restriction site for a restriction/nicking endonuclease, or it may contain uracil residues; either of which can be used for release of the double stranded polynucleotide product from the solid support. This double stranded polynucleotide sequence may, optionally, include a PCR primer-binding site to be used to amplify the product sequence.
[0057]
[0058] In these drawings the assemblies are depicted with the minimum number of oligos and polynucleotides to illustrate the concept; however, much larger numbers of oligonucleotides and/or polynucleotides can be assembled using methods enabled by this invention. Furthermore, these methods can be used to assemble oligonucleotides and polynucleotides derived from different biological sources and synthesized by different methods known in the art. In one preferred embodiment the partly double stranded polynucleotides are synthesized by means of the oligonucleotide self-assembly process described in this invention. In another preferred embodiment these polynucleotides are isolated from double stranded DNA derived from a biological source using restriction endonucleases and other cleavage agents known in the art. In particular, U.S. Pat. No. 6,958,217 teaches that single stranded oligonucleotide tags of fixed uniform length can be isolated from biological samples using the combined action of Type IIS restriction and nicking enzymes. This patent also provides a means for creating a library of polynucleotides having fixed length overhangs, which are the byproducts of the tag isolation process.
[0059]
[0060] These overhangs and oligonucleotide linkers, which together comprise the zip codes, determine the desired order of the oligo- and polynucleotide building blocks. In one preferred embodiment all of the zip codes are unique such that the polynucleotides can be assembled in a single predetermined order to form a single product. In another embodiment one or more zip codes are repeated and/or degenerated such that the polynucleotides are combined in at least two ways to purposefully synthesize at least two distinct polynucleotide products (i.e., for gene shuffling and codon optimization applications).
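The way unique zip codes fix a single assembly order can be sketched as a lookup that follows each overhang to its linker and on to the next building block. The code below is an illustrative model only; the block names, the data layout and the convention that all overhangs are written 5′ to 3′ on one strand are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def assembly_order(blocks, linkers, zip_len=3):
    """Return block names in assembly order, from seed to cap.

    blocks  : dict name -> (left_zip, right_zip); None marks an end without
              an overhang (seed: left_zip None, cap: right_zip None).
    linkers : iterable of linker sequences, each 2 * zip_len bases long.
    Raises ValueError if a zip code is reused or the chain is broken.
    """
    # Each linker pairs one right-hand zip code with one left-hand zip code.
    joins = {}
    for linker in linkers:
        top = reverse_complement(linker)
        right_zip, left_zip = top[:zip_len], top[zip_len:]
        if right_zip in joins or left_zip in joins.values():
            raise ValueError(f"linker {linker} reuses a zip code")
        joins[right_zip] = left_zip

    by_left, rights, seeds = {}, set(), []
    for name, (left, right) in blocks.items():
        if left is None:
            seeds.append(name)
        elif left in by_left:
            raise ValueError(f"left zip code {left} appears on two blocks")
        else:
            by_left[left] = name
        if right is not None:
            if right in rights:
                raise ValueError(f"right zip code {right} appears on two blocks")
            rights.add(right)
    if len(seeds) != 1:
        raise ValueError("expected exactly one seed")

    # Walk from the seed, one linker at a time, until the cap is reached.
    order = [seeds[0]]
    while blocks[order[-1]][1] is not None:
        next_name = by_left.get(joins.get(blocks[order[-1]][1]))
        if next_name is None or next_name in order:
            raise ValueError(f"no unique continuation after {order[-1]}")
        order.append(next_name)
    return order


blocks = {"seed": (None, "ACG"), "middle": ("TTA", "GGC"), "cap": ("CAT", None)}
print(assembly_order(blocks, ["ATGGCC", "TAACGT"]))
```

The order in which the linkers are supplied does not matter; a repeated zip code is reported as an error here, whereas the shuffling embodiment described above would use such repeats deliberately.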
[0061]
[0062] Another embodiment of the present invention provides a means for introducing a frameshift into the synthesized gene. In this embodiment the oligonucleotide linker is at least one base longer than the combined length of its two zip codes. The extra base or bases create a gap in the other strand of the resulting oligo/polynucleotide assembly that can subsequently be closed by e.g. a DNA polymerase.
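As a sketch of this embodiment, a frameshift linker can be modelled as the reverse complement of the two zip codes with the extra base or bases placed between them. The function names and the same-strand 5′ to 3′ convention are assumptions for illustration; the gap-filling step is modelled simply as copying the unpaired middle of the linker.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def frameshift_linker(left_zip, right_zip, inserted):
    """Linker (5'->3') whose middle templates extra bases between two zip codes."""
    return reverse_complement(left_zip + inserted + right_zip)


def gap_fill(linker, zip_len=3):
    """Bases a polymerase would add opposite the linker's unpaired middle."""
    return reverse_complement(linker[zip_len:len(linker) - zip_len])


def shifts_reading_frame(inserted):
    """An insertion shifts the frame unless its length is a multiple of three."""
    return len(inserted) % 3 != 0


linker = frameshift_linker("ACG", "TTA", "C")
print(linker, gap_fill(linker), shifts_reading_frame("C"))
```

Here a seven-base linker pairs with three bases on either side and leaves one base unpaired, which defines the single-base gap to be filled in the opposite strand.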
[0063] The invention also enables genes and other large polynucleotides to be synthesized by dividing the gene sequence into subassemblies comprised of pools of overlapping hexamers. If each pool of hexamers is chosen such that it can only be assembled in a single configuration (i.e., it forms an unambiguous assembly), side reactions can be minimized or eliminated; whereas combining all hexamer pools together in a single assembly process would result in multiple products. The resulting subassemblies are subsequently ligated together using their three-base overhangs in combination with connecting oligo hexamers to form the final product. This strategy enables multiple starting points for the synthesis of the gene and it is compatible with use of laboratory robotics. A flowchart showing a process for selecting pools of short oligonucleotide building blocks of e.g. six bases is depicted in
[0064] A similar strategy is also possible with building blocks longer or shorter than six bases and it is very easy to automate. However, building blocks of six bases are preferred because they are long enough to create a three-base overhang suitable for ligation and yet also short enough to pre-order all sequence permutations. Furthermore, six is an even number that permits creation of overhangs having a uniform number of bases.
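One possible way to divide a sequence into pools of hexamers is sketched below. It is not the selection process of the flowchart referred to above; it uses a simplified, conservative rule assumed for this example, namely that a pool is closed as soon as the next hexamer would reuse a three-base half-site already present in the pool in either orientation. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_into_pools(sequence, word=6):
    """Greedily split a top-strand sequence into pools of consecutive hexamers.

    Note: a hexamer whose own two halves collide with each other is not
    detected by this simple check.
    """
    if len(sequence) % word:
        raise ValueError("sequence length must be a multiple of the word size")
    half = word // 2
    pools, current, seen = [], [], set()
    for start in range(0, len(sequence), word):
        hexamer = sequence[start:start + word]
        halves = {hexamer[:half], hexamer[half:]}
        halves |= {reverse_complement(h) for h in halves}
        if current and halves & seen:
            # A half-site would repeat, so start a new pool here.
            pools.append(current)
            current, seen = [], set()
        current.append(hexamer)
        seen |= halves
    if current:
        pools.append(current)
    return pools


def bottom_strand_hexamers(pool, word=6):
    """Opposite-strand hexamers, offset by half a word so that each one
    bridges two neighbouring top-strand hexamers."""
    top = "".join(pool)
    half = word // 2
    return [reverse_complement(top[i:i + word])
            for i in range(half, len(top) - word + 1, word)]


pools = split_into_pools("ACGTTAGGCCATACGAAA")
print(pools)
print(bottom_strand_hexamers(pools[0]))
```

Because the opposite-strand hexamers are offset by three bases, each pool in this model assembles into a subassembly with a three-base overhang at both ends of its top strand, which is the uniform overhang length that an even word size permits.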