Gene Synthesis by Self-Assembly of Small Oligonucleotide Building Blocks

20210171994 · 2021-06-10

    Abstract

    The invention provides a process for synthesizing genes and other long double stranded polynucleotides by assembling very short oligonucleotides into partly double stranded polynucleotides, and then connecting these partly double stranded polynucleotide subassemblies with linkers comprised of very short oligonucleotides. In one embodiment, the correct order of the polynucleotide subassemblies is coded in overhangs present at each end of the partly double stranded polynucleotide subassemblies. Linkers having a sequence complementary to the combined overhangs connect adjacent subassemblies, which are then ligated together. In one preferred embodiment the oligos are six bases long, for which there are only 4096 different possible sequence permutations. A complete library of oligos of this size and scale can be cost-effectively synthesized and quality controlled, avoiding the typical errors and yield issues associated with phosphoramidite synthesis of longer oligos. Furthermore, the limited oligo library size supports development of a laboratory-scale gene synthesis machine.

    Claims

    1. A method for synthesizing a double stranded polynucleotide molecule having a predefined sequence, the method comprising the steps of: i) providing at least three single stranded oligonucleotides comprising complementary nucleotide sequence parts, ii) contacting the at least three single stranded oligonucleotides provided in step i) with each other, and iii) creating at least one phosphodiester bond between any adjacent nucleotides in a self-assembled set of single stranded oligonucleotides from step ii) to create a double stranded polynucleotide of higher molecular weight than each of the single stranded oligonucleotides provided in step i).

    2. The method of claim 1, further comprising the steps of: i) providing at least two double stranded polynucleotide molecules having a predefined sequence produced using the steps i) through iii) of claim 1, ii) contacting the at least two double stranded polynucleotides provided in step i) with each other, and iii) creating at least one phosphodiester bond between any adjacent nucleotides in a self-assembled set of double stranded polynucleotides from step ii) to create a double stranded polynucleotide of higher molecular weight than each of the double stranded polynucleotides provided in step i).

    3. The method of claim 1, further comprising the steps of: i) providing at least two double stranded polynucleotide molecules having a predefined sequence produced using the steps i) through iii) of claim 1, ii) providing at least one single stranded oligonucleotide comprising complementary nucleotide sequence parts to overhangs at the ends of the at least two double stranded polynucleotide molecules provided in step i), iii) contacting the at least two double stranded polynucleotides provided in step i) with the at least one single stranded oligonucleotide provided in step ii), and iv) creating at least one phosphodiester bond between any adjacent nucleotides in a self-assembled set of double stranded polynucleotides from step iii) to create a double stranded polynucleotide of higher molecular weight than each of the double stranded polynucleotides provided in step i).

    4. A method for synthesizing a double stranded polynucleotide molecule having a predefined sequence, the method comprising the steps of: i) providing at least two double stranded polynucleotide molecules having a predefined sequence, ii) providing at least one single stranded oligonucleotide comprising complementary nucleotide sequence parts to overhangs at the ends of the at least two double stranded polynucleotide molecules provided in step i), iii) contacting the at least two double stranded polynucleotides provided in step i) with the at least one single stranded oligonucleotide provided in step ii), and iv) creating at least one phosphodiester bond between any adjacent nucleotides in a self-assembled set of double stranded polynucleotides from step iii) to create a double stranded polynucleotide of higher molecular weight than each of the individual double stranded polynucleotides provided in step i).

    5. The method of claim 1, wherein the creation of at least one phosphodiester bond is catalyzed by a ligase enzyme.

    6. The method of claim 1, wherein the creation of at least one phosphodiester bond is replaced by combining, in a polymerase chain reaction, the individual oligonucleotides and polynucleotides into at least one double stranded polynucleotide of higher molecular weight than each of the oligonucleotides/polynucleotides that went into the reaction.

    7. The method of claim 2, wherein each of the at least two double stranded polynucleotide molecules provided in step i) comprises no more than one 3′ overhang and no more than one 5′ overhang.

    8. The method of claim 2, wherein at least one of the at least two double stranded polynucleotide molecules provided in step i) is treated with a phosphatase prior to step iii).

    9. The method of claim 2, wherein one of the at least two double stranded polynucleotides provided in step i) is attached to a solid support.

    10. The method of claim 1, wherein the double stranded polynucleotide is assembled by an automated process or a semi-automated process.

    11. The method of claim 1, wherein the at least three single stranded oligonucleotides with complementary nucleotide sequence parts provided in step i) all have the same length.

    12. The method of claim 4, wherein at least one of the two double stranded polynucleotides provided in step i) is derived from a double stranded polynucleotide library extracted from at least one biological source.

    13. The method of claim 4, wherein at least one of the two double stranded polynucleotides provided in step i) is derived from a synthetic double stranded polynucleotide library.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0026] FIG. 1 depicts parallel assembly of three oligonucleotides.

    [0027] FIG. 2 depicts assembly of two partly double stranded polynucleotides by connecting either two 3′ overhangs or two 5′ overhangs on two separate polynucleotides.

    [0028] FIG. 3 depicts assembly of two partly double stranded polynucleotides using an oligonucleotide linker to connect one 3′ overhang and one 5′ overhang on two separate polynucleotides.

    [0029] FIG. 4 depicts parallel assembly of two oligonucleotides onto a partly double stranded seed.

    [0030] FIG. 5 depicts sequential assembly of oligos onto a seed in combination with a partly double stranded cap molecule.

    [0031] FIG. 6 depicts assembly of multiple partly double stranded polynucleotides and multiple oligonucleotide linkers derived from multiple processes.

    [0032] FIG. 7 depicts a simple gene shuffling application.

    [0033] FIG. 8 contains a flowchart describing the algorithm for determining which oligonucleotides can be assembled together to form sets of subassemblies that can be linked together in only one order (i.e., the subassemblies form a non-ambiguous assembly) for the purpose of synthesizing a particular gene sequence.

    [0034] FIG. 9 depicts processes described in the flowchart of FIG. 8 being applied to a particular DNA sequence.

    GLOSSARY

    [0035] ‘Building blocks’ shall refer to nucleotides that can be assembled into larger molecules, which can be either final products or building blocks themselves.

    [0036] ‘Cap’ shall refer to a partly double stranded polynucleotide molecule having only one single stranded overhang at one end comprising 1 or more bases; this molecule may function as a ‘cap’ in an assembly of multiple oligo-/polynucleotide building blocks in terms of comprising the last polynucleotide building block added to the assembly. A ‘cap’ always comprises only one nucleic acid zip code as its overhang. A ‘cap’ may also comprise one or more functional sequences within its double stranded part including, but not limited to: a spacer sequence and a biotin linker to link the cap to a magnetic bead; a release site (see definition below); a PCR primer site; a label; and/or a polynucleotide sequence that will be part of the final product.

    [0037] ‘Nucleic acid zip code’ or ‘zip code’ shall refer to a unique short single stranded nucleic acid sequence that is complementary to another zip code and is thereby used to direct the assembly of oligo-/polynucleotide building blocks in a particular order through a complementary overlapping sequence.

    [0038] ‘Oligonucleotides’ and ‘oligos’ shall refer to single stranded nucleic acids that are generally shorter than 50, 100, 150, or 200 bases in length. Commonly made in the laboratory by solid-phase chemical synthesis, these small bits of nucleic acids can be manufactured with any user-specified sequence.

    [0039] ‘Overhang’ shall refer to the part of partly double stranded oligo-/polynucleotides that is single stranded.

    [0040] ‘Polynucleotides’ shall refer to single or double stranded nucleic acids that are generally longer than 50, 100, 150, or 200 bases in length.

    [0041] ‘Release site’ shall refer to a chemical feature within a polynucleotide seed or cap molecule that enables the final product to be released from the seed or cap. The release site can be, for example, a recognition site for a restriction/nicking endonuclease, or one or more uracil residues.

    [0042] ‘Seed’ shall refer to a partly double stranded polynucleotide molecule having only one single stranded overhang at one end comprising 1 or more bases; this molecule may function as a ‘seed’ in an assembly of multiple oligo-/polynucleotide building blocks in terms of comprising the first polynucleotide building block added to the assembly. A ‘seed’ always comprises one nucleic acid zip code as its overhang. A ‘seed’ may also comprise one or more functional sequences within its double stranded part including, but not limited to: a spacer sequence and a biotin linker to link the seed to a magnetic bead; a release site (see definition above); a PCR primer site; a label; and/or a polynucleotide sequence that will be part of the final product.

    [0043] ‘Single stranded tag’ shall refer to consecutive nucleotides linked together and forming a single stranded oligonucleotide. The number of nucleotides may range typically from about 2 to 20 but can also be more than 20 nucleotides, including tags of more than e.g. 200 nucleotides. For the purposes of this patent, a single stranded nucleotide tag can be obtained from genetic material present in a biological sample and can also be obtained from synthetic oligonucleotides.

    [0044] ‘Subassembly’ shall refer to a nucleic acid molecule assembled from a set of oligonucleotide building blocks.

    [0045] ‘Tag library’ shall refer to a plurality of at least one single stranded tag.

    [0046] ‘Wobble zip’ shall refer to part of the zip code sequence that contains all possible permutations of such sequence code or a subset of all possible permutations of such sequence code.

    DETAILED DESCRIPTION OF THE INVENTION

    [0047] The following descriptions relate to preferred embodiments of the invention and involve assembling large, even gene-length, double stranded polynucleotides using single stranded oligonucleotides of preferably six bases (i.e. hexamers) together with partly double stranded polynucleotide molecules having three-base overhangs. However, the preferred embodiments of the invention are not limited to any one overhang length, and single stranded oligonucleotides longer than 20 bases and overhangs longer than 10 bases can also be applied.

    [0048] In one preferred embodiment of the invention, the oligonucleotides are all six bases long and the overhangs are three nucleotides long. The oligonucleotides are used to connect the double stranded polynucleotides to one another and to the seed through complementary sets of nucleotide bases, here referred to as molecular zip codes. Each 3-nucleotide sequence provides one of 64 (4^3) possible molecular zip codes, whereas a six-nucleotide linker, which spans two such zip codes, provides for up to 4096 (4^6 = 64 × 64) different polynucleotide pairings. Larger numbers of pairings are possible with longer oligo linkers and complementary overhangs.
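    The counting in [0048] can be checked with a short Python sketch. It enumerates every 3-base zip code and every hexamer and confirms that each ordered pair of zip codes corresponds to exactly one hexamer, which is why a six-base linker can encode up to 4096 pairings. The helper names revcomp and all_kmers are illustrative only and are not part of the disclosure.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def all_kmers(k: int) -> list[str]:
    """Every possible k-base sequence (4**k of them)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


zip_codes = all_kmers(3)   # 64 possible 3-base zip codes
hexamers = all_kmers(6)    # 4096 possible hexamers

# A hexamer linker is two 3-base zip codes side by side, so the set of all
# ordered zip-code pairs is exactly the set of all hexamers.
pairs = {a + b for a in zip_codes for b in zip_codes}
print(len(zip_codes), len(hexamers), pairs == set(hexamers))  # 64 4096 True
```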

    [0049] The invention enables more than one building block to be assembled at the same time because the correct order of assembly is coded into the overhangs. This simplifies the polynucleotide manufacturing process and dramatically increases the synthesis speed because all possible permutations of the single stranded oligonucleotides can be pre-ordered. As such, this invention supports development of a whole-gene synthesizing machine that can produce any possible polynucleotide sequence from a limited set of standardized building blocks (e.g. all 4096 permutations of single stranded hexamers).

    [0050] FIGS. 1 to 3 illustrate three types of oligonucleotide assembly reactions used in one preferred embodiment of this invention.

    [0051] FIG. 1 depicts the assembly of three hexamers into a double-stranded polynucleotide having one 3′ and one 5′ overhang. In one preferred embodiment the oligonucleotides can only be assembled in the order specified by their consecutive overlapping bases, here referred to as a nucleic acid ‘zip code’. In one preferred embodiment, after the oligos anneal together in solution, a phosphodiester bond is formed between adjacent oligos using a ligation reaction to create a continuous strand hybridized to its complementary strand. Suitable conditions for ligation must be established to ensure that only oligos that exactly match the single stranded overhang available for hybridization are added to the growing chain. Ligation conditions, e.g. the choice of ligase, buffer composition, and reaction temperature, can be chosen and optimized using methods known in the art. The product of this simple assembly is a partly double stranded polynucleotide having one 3′ overhang and one 5′ overhang on the lower strand. A similar process can be applied to create a partly double stranded polynucleotide having one 3′ overhang and one 5′ overhang on the upper strand.
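    One way to model the FIG. 1 geometry is sketched below in Python. It assumes a convention not stated in the text: the target is given as the upper strand written 5′→3′, lower-strand hexamers pair with positions 0, 6, 12, and so on, and upper-strand hexamers are offset by three bases, so each oligo shares one 3-base zip code with each neighbour on the opposite strand. For a 12-base target this yields the three hexamers of FIG. 1 and a lower strand with a 3′ overhang at its left end and a 5′ overhang at its right end. The functions tile and anneals are hypothetical helpers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile(target: str, k: int = 6) -> tuple[list[str], list[str]]:
    """Split a target (upper strand, 5'->3', length a multiple of k) into k-mers.

    Lower-strand oligos pair with target[0:k], target[k:2k], ...
    Upper-strand oligos are offset by k // 2 and stop short of both ends,
    leaving a k // 2 base overhang on the lower strand at each end.
    All oligos are returned written 5'->3'.
    """
    h = k // 2
    lower = [revcomp(target[i:i + k]) for i in range(0, len(target), k)]
    upper = [target[i:i + k] for i in range(h, len(target) - k, k)]
    return upper, lower


def anneals(upper: list[str], lower: list[str], k: int = 6) -> bool:
    """Check that each upper oligo's two halves (its two zip codes) pair with
    the overlapping halves of the lower oligos on either side."""
    h = k // 2
    return all(
        lower[j][:h] == revcomp(u[:h]) and lower[j + 1][h:] == revcomp(u[h:])
        for j, u in enumerate(upper)
    )


upper, lower = tile("ATGGCAAGTCTG")
print(upper, lower, anneals(upper, lower))
# ['GCAAGT'] ['TGCCAT', 'CAGACT'] True
print("3' overhang:", lower[0][3:], "5' overhang:", lower[-1][:3])
# 3' overhang: CAT 5' overhang: CAG
```

    The same helpers tile longer targets: an 18-base target gives three lower-strand and two upper-strand hexamers, still with one 3-base overhang at each end of the lower strand.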

    [0052] In FIG. 2 two partly double stranded polynucleotides derived from the assembly of oligonucleotides depicted in FIG. 1 are assembled together through the complementary zip codes that comprise their 3′ overhangs. After ligation creates phosphodiester bonds between adjacent strand ends, the result is a larger, partly double stranded polynucleotide. This molecule may be the intended end product, or it may serve as a building block for further assembly reactions.
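    The pairing rule behind FIG. 2 can be expressed compactly. In the Python sketch below, overhangs are written 5′→3′ and, as an assumption, the two facing overhangs are of the same type (two 3′ or two 5′ overhangs); they can then anneal only if one is the exact reverse complement of the other. The fragment names and overhang sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def can_join(ovh_a: str, ovh_b: str) -> bool:
    """Facing overhangs of the same type (both 3' or both 5'), each written
    5'->3', anneal end-to-end only if they are exact reverse complements."""
    return len(ovh_a) == len(ovh_b) and ovh_b == revcomp(ovh_a)


def partners(fragments: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """fragments maps name -> (left_overhang, right_overhang).
    Returns (left_fragment, right_fragment) pairs whose facing overhangs match."""
    return [
        (a, b)
        for a, (_, right_a) in fragments.items()
        for b, (left_b, _) in fragments.items()
        if a != b and can_join(right_a, left_b)
    ]


fragments = {"P1": ("TTT", "GCA"), "P2": ("TGC", "AAG")}
print(partners(fragments))  # [('P1', 'P2')]
```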

    [0053] Alternatively, partly double stranded polynucleotides can be connected together in a particular order using single stranded oligonucleotide “linkers” to bridge adjacent overhangs. FIG. 3 shows how a single stranded hexamer linker connects the 5′ overhang on the lower strand from a first polynucleotide with a 3′ overhang on the lower strand from a second polynucleotide. After the molecules anneal together they can be ligated to form the new, larger double stranded polynucleotide.
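    The linker sequence for the FIG. 3 arrangement follows directly from the two overhangs it bridges. In this Python sketch both lower-strand overhangs are written 5′→3′, and the linker, which fills the 6-base gap on the upper strand, is taken to be the reverse complement of the first overhang followed by the reverse complement of the second. The 24-base target and its split into two 12-base subassemblies are hypothetical, chosen so that the expected linker can be read directly off the target.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def linker(ovh5_first: str, ovh3_second: str) -> str:
    """Hexamer linker (written 5'->3') that pairs with the lower-strand 5'
    overhang of the first fragment and the lower-strand 3' overhang of the
    second fragment (FIG. 3 layout)."""
    return revcomp(ovh5_first) + revcomp(ovh3_second)


target = "ATGGCAAGTCTGGAACTTCGCTAA"     # hypothetical upper strand, 5'->3'
ovh5_first = revcomp(target[9:12])     # right end of subassembly target[0:12]
ovh3_second = revcomp(target[12:15])   # left end of subassembly target[12:24]
print(ovh5_first, ovh3_second, linker(ovh5_first, ovh3_second))
# CAG TTC CTGGAA
```

    As expected, the linker equals the stretch of upper strand (target[9:15]) that neither subassembly provides.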

    [0054] The product of the assembly, which may comprise one or more subassemblies or one or more final constructs, may be isolated from the reaction by PCR, clonal selection and other methods well known in the art. Under certain conditions, such as those in which ligation is not strict or when ambiguous linkers are present (e.g. palindromes), side products may be produced. These unintended polynucleotides are unlikely to have the same length as the desired product. Thus size selection, e.g. using gel electrophoresis, may be an additional means of isolating the desired product from these side-products, if any.

    [0055] Another means of separating the intended product and side product(s) is by selective capture of the overhangs. Alternate assemblies of a given set of oligos and/or partially double stranded polynucleotides are unlikely to possess the same sets of overhangs. Thus the product can be isolated by (1) capturing the intended product on a surface-bound capture molecule having a three-base overhang (or simply three bases of single stranded DNA on a spacer attached to a surface) complementary to the first overhang on the intended product, then (2) capturing the intended product on a surface-bound polynucleotide having three bases available for capture that are complementary to the second overhang on the intended product, and (3) releasing products that are captured by steps 1 and 2 into solution by methods known in the art. It may or may not be desirable to release the polynucleotides captured in step 1 before proceeding to step 2. For example, the intended product, if sufficiently long, can be captured on a surface or matrix displaying capture sequences complementary to both overhangs. Nucleotide analogs and/or ligation can be used to increase the efficiency and stringency of the capture conditions, followed by release of the product (or subassembly) from the surface or matrix using methods described in this application or otherwise known in the art.
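    The two-step capture in [0055], optionally combined with the size selection of [0054], amounts to filtering products on their pair of overhangs and their length. The Python sketch below assumes each assembly product can be summarised by its two overhangs (written 5′→3′) and its length, and that a surface-bound probe retains a product only when the probe bases are the exact reverse complement of an overhang. The Product class, the capture function and the example products are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Product:
    name: str
    left_overhang: str   # written 5'->3'
    right_overhang: str  # written 5'->3'
    length: int          # duplex length in base pairs


def binds(probe: str, overhang: str) -> bool:
    """A surface-bound probe captures an overhang it exactly complements."""
    return overhang == revcomp(probe)


def capture(products, intended_left, intended_right, expected_length: Optional[int] = None):
    probe1 = revcomp(intended_left)   # step (1): probe for the first overhang
    probe2 = revcomp(intended_right)  # step (2): probe for the second overhang
    kept = [p for p in products if binds(probe1, p.left_overhang)]
    kept = [p for p in kept if binds(probe2, p.right_overhang)]
    if expected_length is not None:   # optional size selection
        kept = [p for p in kept if p.length == expected_length]
    return kept


products = [
    Product("intended", "CAT", "CAG", 24),
    Product("side_product_1", "CAT", "GTC", 18),
    Product("side_product_2", "AAA", "CAG", 24),
    Product("longer_side_product", "CAT", "CAG", 48),
]
print([p.name for p in capture(products, "CAT", "CAG")])
# ['intended', 'longer_side_product']
print([p.name for p in capture(products, "CAT", "CAG", expected_length=24)])
# ['intended']
```

    The hypothetical longer side product shows why the two methods complement each other: it shares both overhangs with the intended product and is removed only by size selection.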

    [0056] Oligos and/or polynucleotides can also assemble on a partially double stranded polynucleotide that has only one overhang. We shall refer to such a molecule as a ‘seed’ when its overhang comprises the first zip code for a growing assembly. In one embodiment of the invention, the seed is comprised of a partly double stranded polynucleotide spacer molecule having a single stranded 3-base overhang (ZIP1′) at one end. This molecule can be bound to the surface of a solid support such as a paramagnetic bead at its double stranded end, such that the single stranded portion is free to bind any purely single stranded molecule, or the single stranded part of a partly double stranded oligo-/polynucleotide molecule, in solution having a complementary 3-base sequence (ZIP1). The double stranded portion of this seed may contain a release site, such as a recognition/restriction site for a restriction/nicking endonuclease, or it may contain uracil residues; either can be used to release the double stranded polynucleotide product from the solid support. This double stranded polynucleotide sequence may, optionally, include a PCR primer-binding site to be used to amplify the product sequence.

    [0057] FIGS. 4 and 5 depict two embodiments of the polynucleotide assembly process wherein multiple overlapping oligonucleotides self-assemble on a seed to create a double stranded polynucleotide. In one preferred embodiment the oligonucleotide building blocks are all present together in a single pot mixture, self-assemble onto the seed in a parallel fashion, and are then subsequently ligated together (FIG. 4). In a separate preferred embodiment, subsets of oligos are added to the reaction mixture one at a time in a step-wise fashion (FIG. 5). Also depicted in FIG. 5 is the inclusion of a ‘cap’ polynucleotide as a building block that can terminate a growing oligonucleotide chain because it does not provide a second overhang for additional assembly. A single stranded oligonucleotide can, alternatively, terminate a polynucleotide assembly if one of its two zip codes does not complement any other zip code present in the reaction mixture.
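    The step-wise growth of FIG. 5 can be simulated by repeatedly choosing the building block whose zip code pairs with the overhang currently exposed on the growing chain. The Python sketch below uses abstract labels in which ZIPn pairs with ZIPn′ (written ZIPn' in the code), represents each building block by the zip code it presents to the chain and the overhang it leaves exposed, and treats a block that leaves no overhang as a cap. The block names and labels are hypothetical. The simulation also stops when no block pairs with the exposed overhang, mirroring a terminating single stranded oligo, and raises an error if more than one block could pair.

```python
from typing import Optional


def partner(zip_code: str) -> str:
    """ZIPn pairs with ZIPn' and vice versa (abstract labels, not sequences)."""
    return zip_code[:-1] if zip_code.endswith("'") else zip_code + "'"


def grow(seed_overhang: str,
         blocks: dict[str, tuple[str, Optional[str]]],
         max_steps: int = 1000) -> list[str]:
    """Add blocks to the seed one at a time.

    blocks maps name -> (zip code that pairs with the exposed overhang,
    overhang left exposed afterwards, or None for a cap).
    Returns the block names in the order they were added.
    """
    order: list[str] = []
    exposed: Optional[str] = seed_overhang
    for _ in range(max_steps):
        if exposed is None:
            break  # a cap leaves no overhang, so the chain is finished
        matches = [n for n, (zip_in, _) in blocks.items() if zip_in == partner(exposed)]
        if not matches:
            break  # nothing pairs with the exposed overhang: assembly stops
        if len(matches) > 1:
            raise ValueError(f"ambiguous overhang {exposed}: {matches}")
        order.append(matches[0])
        exposed = blocks[matches[0]][1]
    return order


blocks = {"oligo1": ("ZIP1", "ZIP2'"), "oligo2": ("ZIP2", "ZIP3'"), "cap": ("ZIP3", None)}
print(grow("ZIP1'", blocks))  # ['oligo1', 'oligo2', 'cap']
```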

    [0058] In these drawings the assemblies are depicted with the minimum number of oligos and polynucleotides to illustrate the concept; however, much larger numbers of oligonucleotides and/or polynucleotides can be assembled using methods enabled by this invention. Furthermore, these methods can be used to assemble oligonucleotides and polynucleotides derived from different biological sources and synthesized by different methods known in the art. In one preferred embodiment the partly double stranded polynucleotides are synthesized by means of the oligonucleotide self-assembly process described in this invention. In another preferred embodiment these polynucleotides are isolated from double stranded DNA derived from a biological source using restriction endonucleases and other cleavage agents known in the art. In particular, U.S. Pat. No. 6,958,217 teaches that single stranded oligonucleotide tags of fixed uniform length can be isolated from biological samples using the combined action of Type IIS restriction and nicking enzymes. This patent also provides a means for creating a library of polynucleotides having fixed length overhangs, which are the byproducts of the tag isolation process.

    [0059] FIG. 6 illustrates the versatility of the method enabled by this invention by depicting a double stranded polynucleotide sequence assembled from building blocks that derive from a variety of sources and processes. These include synthetic and non-synthetic polynucleotides, subassemblies of synthetic and non-synthetic oligonucleotides, and random permutations of synthetic oligos. All of the building blocks have single stranded overhangs that can be connected directly (as shown in FIG. 2) or through an oligonucleotide linker (as shown in FIG. 3).

    [0060] These overhangs and oligonucleotide linkers, which together comprise the zip codes, determine the desired order of the oligo- and polynucleotide building blocks. In one preferred embodiment all of the zip codes are unique such that the polynucleotides can be assembled in a single predetermined order to form a single product. In another embodiment one or more zip codes are repeated and/or degenerate such that the polynucleotides are combined in at least two ways to purposefully synthesize at least two distinct polynucleotide products (e.g., for gene shuffling and codon optimization applications).

    [0061] FIG. 7 contains a representation of a simple gene shuffling application. Three polynucleotide sequences are shuffled between three positions by including alternative oligonucleotide linkers in the reaction. The figure depicts three possible products, shown as surface-bound assemblies prior to ligation. The assembly at the top is comprised of the seed displaying overhang ZIP1′; three double stranded polynucleotides (A, B, and C), each having two 3-base overhangs on the lower strand; and three oligonucleotide linkers (ZIP1-ZIP2, ZIP3-ZIP4, and ZIP5-ZIP6). These components assemble into the unique structure by virtue of their overlapping complementary sequences. Two alternate polynucleotide sequences are created by including additional oligos (ZIP1-ZIP6, ZIP7-ZIP4, ZIP5-ZIP2, ZIP1-ZIP4, ZIP7-ZIP2) in the reaction mixture. A given set of oligo-/polynucleotide building blocks can also be shuffled by including at least one linker for which one of the two zip codes has been replaced by a wobble zip that can join one specific building block to any other building block (for example, ZIP5-NNN, where N = A, C, G, or T).
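    The effect of adding alternative linkers can be explored by enumerating every order in which the building blocks can be chained from the seed. In the Python sketch below, labels are abstract: an overhang labelled ZIPn pairs with the linker half labelled ZIPn, and a linker half written NNN stands for a wobble zip that pairs with any overhang. The overhang assignments (A: ZIP2/ZIP3, B: ZIP4/ZIP5, C: ZIP6/ZIP7) and the linker sets are assumptions made for illustration, not read from FIG. 7, and each polynucleotide is used at most once per product.

```python
def products(seed_zip, blocks, linkers, n_positions):
    """Enumerate block orders that can be assembled from the seed.

    blocks:  dict name -> (left_zip, right_zip) overhang labels
    linkers: list of (zip_a, zip_b); zip_a pairs with the exposed overhang,
             zip_b with the next block's left overhang; "NNN" pairs with any.
    Only chains that fill exactly n_positions positions are reported.
    """
    def fits(linker_half, overhang):
        return linker_half == "NNN" or linker_half == overhang

    found = set()

    def extend(exposed, order):
        if len(order) == n_positions:
            found.add(tuple(order))
            return
        for zip_a, zip_b in linkers:
            if not fits(zip_a, exposed):
                continue
            for name, (left, right) in blocks.items():
                if name not in order and fits(zip_b, left):
                    extend(right, order + [name])

    extend(seed_zip, [])
    return sorted(found)


blocks = {"A": ("ZIP2", "ZIP3"), "B": ("ZIP4", "ZIP5"), "C": ("ZIP6", "ZIP7")}
unique = [("ZIP1", "ZIP2"), ("ZIP3", "ZIP4"), ("ZIP5", "ZIP6")]
print(products("ZIP1", blocks, unique, 3))
# [('A', 'B', 'C')]
shuffled = unique + [("ZIP1", "ZIP6"), ("ZIP7", "ZIP4"), ("ZIP5", "ZIP2")]
print(products("ZIP1", blocks, shuffled, 3))
# [('A', 'B', 'C'), ('C', 'B', 'A')]
wobble = [("ZIP1", "NNN"), ("ZIP3", "ZIP4"), ("ZIP5", "ZIP6"), ("ZIP7", "ZIP2")]
print(products("ZIP1", blocks, wobble, 3))
# [('A', 'B', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B')]
```

    With unique zip codes exactly one order is possible; each alternative or wobble linker opens further orders, which is the behaviour [0060] relies on for shuffling.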

    [0062] Another embodiment of the present invention provides a means for introducing a frameshift into the synthesized gene. In this embodiment the oligonucleotide linker is at least one base longer than the combined length of its two zip codes. The extra base or bases create a gap in the other strand of the resulting oligo/polynucleotide assembly that can subsequently be closed by e.g. a DNA polymerase.
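    The frameshift embodiment can be illustrated with a toy coding sequence. In the Python sketch below, a linker carrying one base between its two 3-base zip codes leaves a single unpaired base (the gap later closed, e.g., by a DNA polymerase), and that extra base shifts every downstream codon by one position. The sequences and the helpers frameshift_linker, gap_length and codons are hypothetical.

```python
def frameshift_linker(zip_a: str, zip_b: str, extra: str) -> str:
    """Linker written 5'->3' with extra bases between its two zip codes."""
    return zip_a + extra + zip_b


def gap_length(linker: str, zip_len: int = 3) -> int:
    """Bases in the linker without a partner on the other strand."""
    return len(linker) - 2 * zip_len


def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, dropping any trailing bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


upstream, downstream = "ATG", "ACCTAA"          # upper-strand parts of the two fragments
normal = frameshift_linker("GCT", "GAA", "")    # ordinary hexamer linker
shifted = frameshift_linker("GCT", "GAA", "G")  # linker with one extra base
print(gap_length(normal), gap_length(shifted))  # 0 1
print(codons(upstream + normal + downstream))   # ['ATG', 'GCT', 'GAA', 'ACC', 'TAA']
print(codons(upstream + shifted + downstream))  # ['ATG', 'GCT', 'GGA', 'AAC', 'CTA']
```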

    [0063] The invention also enables genes and other large polynucleotides to be synthesized by dividing the gene sequence into subassemblies comprised of pools of overlapping hexamers. If each pool of hexamers is chosen such that it can only be assembled in a single configuration (i.e., it forms an unambiguous assembly), side reactions can be minimized or eliminated, whereas combining all hexamer pools together in a single assembly process would result in multiple products. The resulting subassemblies are subsequently ligated together using their three-base overhangs in combination with connecting oligo hexamers to form the final product. This strategy enables multiple starting points for the synthesis of the gene and is compatible with the use of laboratory robotics. A flowchart showing a process for selecting pools of short oligonucleotide building blocks of, e.g., six bases is depicted in FIG. 8. Accompanying this flowchart is a figure (FIG. 9) depicting the different in silico operations taking place on the target sequence.
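    FIG. 8 itself is not reproduced here, so the following Python sketch is not the algorithm of the flowchart; it is a simple greedy alternative under stated assumptions. It uses the hexamer tiling of FIG. 1 (lower-strand hexamers starting every six bases, upper-strand hexamers offset by three) and treats a pool as unambiguous when no 3-base window within it, including its two terminal overhangs, repeats or equals the reverse complement of another window. Consecutive six-base blocks are added to the current pool until that rule would be broken, at which point a new pool is started; the upper-strand hexamer spanning each pool boundary serves as the connecting oligo. The function names are hypothetical, and the sketch does not check for ambiguity between pools during the final linking step.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def windows(seq: str, h: int = 3) -> list[str]:
    """Consecutive h-base windows: the zip codes and the terminal overhangs."""
    return [seq[i:i + h] for i in range(0, len(seq), h)]


def unambiguous(ws: list[str]) -> bool:
    """No window may equal another window or another window's reverse
    complement. (An odd-length window is never its own reverse complement.)"""
    seen: set[str] = set()
    for w in ws:
        for s in (w, revcomp(w)):
            if s in seen:
                return False
            seen.add(s)
    return True


def make_pools(target: str, k: int = 6) -> list[tuple[int, int]]:
    """Greedily group consecutive k-base blocks of target into unambiguous pools.
    Assumes len(target) is a multiple of k. Returns (start, end) slices."""
    h = k // 2
    pools, start = [], 0
    for end in range(2 * k, len(target) + 1, k):
        if not unambiguous(windows(target[start:end], h)):
            pools.append((start, end - k))  # close the pool before this block
            start = end - k
    pools.append((start, len(target)))
    return pools


target = "ATGGCAAGTCTGATGGCA"  # hypothetical; the first six bases recur at the end
pools = make_pools(target)
connectors = [target[end - 3:end + 3] for _, end in pools[:-1]]
print(pools, connectors)  # [(0, 12), (12, 18)] ['CTGATG']
```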

    [0064] A similar strategy is also possible with building blocks longer or shorter than six bases, and it lends itself readily to automation. However, building blocks of six bases are preferred because they are long enough to create a three-base overhang suitable for ligation and yet short enough to pre-order all sequence permutations. Furthermore, six is an even number, which permits each hexamer to be divided into two zip codes of equal length so that all overhangs have a uniform number of bases.
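    The trade-off described in [0064] can be tabulated directly: a library of all k-base oligos has 4^k members, and dividing each oligo into two equal halves gives overhangs of k/2 bases drawn from 4^(k/2) possible zip codes. The short Python sketch below prints these numbers for a few even lengths; the function name library_table is illustrative only, and the arithmetic says nothing about ligation efficiency, which the text discusses only qualitatively.

```python
def library_table(lengths):
    """(oligo length, library size, overhang length, number of zip codes)."""
    return [(k, 4 ** k, k // 2, 4 ** (k // 2)) for k in lengths]


for k, size, ovh, zips in library_table((4, 6, 8, 10)):
    print(f"{k}-mers: {size:,} oligos, {ovh}-base overhangs, {zips} zip codes")
# 4-mers: 256 oligos, 2-base overhangs, 16 zip codes
# 6-mers: 4,096 oligos, 3-base overhangs, 64 zip codes
# 8-mers: 65,536 oligos, 4-base overhangs, 256 zip codes
# 10-mers: 1,048,576 oligos, 5-base overhangs, 1024 zip codes
```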
