System for continuous mutagenesis in vivo to facilitate directed evolution
10760071 · 2020-09-01
Assignee
Inventors
Cpc classification
C12N15/1024
CHEMISTRY; METALLURGY
C40B10/00
CHEMISTRY; METALLURGY
C12N15/70
CHEMISTRY; METALLURGY
C12P19/34
CHEMISTRY; METALLURGY
International classification
C12N5/00
CHEMISTRY; METALLURGY
C12N9/12
CHEMISTRY; METALLURGY
C12P21/06
CHEMISTRY; METALLURGY
C12P19/34
CHEMISTRY; METALLURGY
C12N15/70
CHEMISTRY; METALLURGY
C40B10/00
CHEMISTRY; METALLURGY
Abstract
A system for continuous mutagenesis to facilitate directed evolution. The system includes DNA polymerases carrying the novel K54E point mutation together with other point mutations, including I709N, A759R and D424A (herein called K54E_LF Pol I), and methods of their use to produce and detect lines in which mutagenesis is continuous and does not exhibit the usual decline in mutagenesis with sequential cloning.
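The point-mutation labels used here (K54E, I709N, A759R, D424A) follow the common convention of wild-type residue, 1-based position, replacement residue. The following is a minimal sketch of how such labels could be parsed and applied to a protein sequence. The names `PointMutation`, `parse_mutation` and `apply_mutations` are hypothetical helpers introduced for illustration, and the short test sequence is made up; it is not any sequence from this specification.

```python
import re
from typing import Iterable, NamedTuple


class PointMutation(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # 1-based residue position
    mutant: str     # one-letter code of the replacement residue


# Label format assumed here: one letter, digits, one letter (e.g. "K54E").
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> PointMutation:
    """Split a label such as 'K54E' into residue, position, replacement."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return PointMutation(wild_type, int(position), mutant)


def apply_mutations(sequence: str, labels: Iterable[str]) -> str:
    """Return a copy of `sequence` with each labelled substitution applied.

    Raises ValueError if the sequence does not carry the expected
    wild-type residue at a labelled position.
    """
    residues = list(sequence)
    for label in labels:
        mutation = parse_mutation(label)
        if not 1 <= mutation.position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        index = mutation.position - 1
        if residues[index] != mutation.wild_type:
            raise ValueError(
                f"{label}: found {residues[index]} at position {mutation.position}"
            )
        residues[index] = mutation.mutant
    return "".join(residues)
```

Checking the wild-type residue before substituting guards against numbering offsets between a label and the sequence it is applied to.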
Claims
1. A mutant DNA polymerase with at least 90% homology to SEQ ID NO: 2, said polymerase comprising a mutation at position K54, and further comprising mutations at positions I709, A759, and D424, provided that the polymerase displays continuous mutagenesis equivalent to that of a polymerase of SEQ ID NO: 15, wherein the mutation at position I709 is an I709N mutation, wherein the mutation at position A759 is an A759R mutation, and wherein the mutation at position D424 is a D424A mutation.
2. The mutant DNA polymerase of claim 1, comprising SEQ ID NO: 15.
3. The mutant DNA polymerase of claim 1, consisting of SEQ ID NO: 15.
4. A polynucleotide that encodes the mutant DNA polymerase of claim 1.
5. A plasmid comprising the polynucleotide of claim 4 operably linked to a first promoter.
6. An E. coli cell comprising the plasmid of claim 5.
7. A method of performing directed evolution of a gene of interest, the method comprising: transforming the E. coli cell of claim 6 with a reporter plasmid, wherein the reporter plasmid comprises a polynucleotide that encodes an unmutated gene of interest, thereby performing continuous mutagenesis on the gene of interest and producing an E. coli cell comprising a reporter plasmid that includes a mutant form of the gene of interest, purifying the reporter plasmid that includes the mutant form of the gene of interest from the E. coli cell, thereby creating a purified mutated reporter plasmid; transforming an E. coli cell with the purified mutated reporter plasmid; and determining the activity of the mutant form of the gene of interest relative to the unmutated gene of interest.
8. The method of claim 7, wherein the gene of interest comprises a fluorescent protein.
9. The method of claim 8, wherein the gene of interest comprises GFP.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
DETAILED DESCRIPTION
(6) The following publications may be of use in understanding the background to, and in supporting, the present invention, and are all incorporated by reference for all purposes: Camps et al., PNAS Aug. 19, 2003, vol. 100 no. 17, 9727-9732; Alexander et al. (incl. Camps), Methods Mol. Biol. 2014; 1179: 31-44, doi:10.1007/978-1-4939-1053-3; Labrou, Current Protein and Peptide Science, 2010, 11, 91-100.
(7) An exemplary embodiment of the method of the invention is as follows. Note that when specific cell lines, temperatures, volumes, times and apparatus (etc.) are mentioned, any reasonable variation may be used.
(8) Transformation of Error-Prone Polymerase
(9) 1. Transform the mutagenic plasmid containing K54E-LF Pol I (the novel mutant) into electrocompetent JS200 (SC18 polA12 recA718 uvr355) cells, which are temperature-sensitive for Pol I function. Any other suitable competent cell line may be used, as will be within the understanding of one skilled in the art.
(10) 2. Recover the cells at 30° C. (any other suitable temperature may be used, such as 12-40° C., 15-30° C., 20-15° C., etc.), plate them on nutrient media plates, e.g., LB containing appropriate selection (examples include the antibiotics kanamycin, carbenicillin, zeocin or tetracycline, functional complementation, or an addiction system), and incubate overnight at 37° C. (or thereabouts). In certain embodiments the temperature may vary between 35° C. and 42° C.
(11) 3. Pick a colony and make bench-top competent cells by washing them twice in 10% glycerol in a 1.5 mL Eppendorf tube (or by a similar method).
(12) Mutagenesis:
(13) 4. Transform the ColE1 reporter plasmid containing GFP and the gene of interest into JS200_K54E cells and plate onto pre-warmed (37° C.) nutrient media plates, e.g., LB plates, containing selection antibiotics. Any other suitable reporter plasmid with a suitable fluorescent, immunological, radiological, etc. reporter may be used, as will be within the understanding of one skilled in the art.
(14)
(15) 5. Incubate the Petri dish(es) overnight (or for various times between about 8 hrs. and 60 hrs.) at 37° C. At this temperature endogenous Pol I is inactive, so the primary replication activity comes from K54E-LF Pol I. In various embodiments the temperature may be between 35° C. and 42° C.
(16) 6. Wash the plate with about 2 mL of media such as LB broth, use a volume of the plate wash (for example 5 µL) to inoculate some amount of selective media (for example 5 mL), and keep passaging some amount of saturated culture (for example 5 µL) into fresh media.
(17) 7. Take the rest of the plate wash and miniprep (or similar procedure) and also miniprep (or similar procedure) each of the saturated cultures.
(18) 8. Readout: Take the minipreps and transform them into TOP10 cells or any strain of E. coli bearing a WT allele of Pol I (or any other suitable strain), so that the plasmids are separated individually and expressed in high copy number.
(19) 9. Plate at different concentrations so as to get about 500 colonies per plate (informative range is between 100 and 2500 colonies).
(20) 10. Observe under UV light and grade the mutation index, which reflects the diversity of the fluorescent signal, according to the following key:
0 = no evidence of mutagenesis
1 = rare dark colonies
2 = some dark and superbright colonies
3 = about 10% dark colonies, frequent superbrights
4 = 10-30% darks, many superbrights
5 = 50% darks, many superbrights
6 = majority darks
7 = almost all darks
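As an illustration of how the grading key in step 10 might be applied to colony counts, the following minimal sketch maps the fraction of dark colonies on a plate to a mutation index from 0 to 7. The numeric cut-offs in `_UPPER_BOUNDS` are assumptions chosen to approximate the verbal key (which, for example, does not say how to grade 30-50% darks); the sketch ignores superbright colonies entirely, and the function name `mutation_index` is hypothetical.

```python
from bisect import bisect_left

# Assumed upper bound (inclusive) on the dark-colony fraction for grades 0..6.
# Anything above the last bound is graded 7. These cut-offs are illustrative
# only; the key itself is qualitative.
_UPPER_BOUNDS = [0.0, 0.01, 0.05, 0.10, 0.30, 0.50, 0.90]


def mutation_index(dark_colonies: int, total_colonies: int) -> int:
    """Grade a plate from 0 (no dark colonies) to 7 (almost all dark)."""
    if total_colonies <= 0:
        raise ValueError("total_colonies must be positive")
    if not 0 <= dark_colonies <= total_colonies:
        raise ValueError("dark_colonies must be between 0 and total_colonies")
    fraction = dark_colonies / total_colonies
    # bisect_left returns the number of bounds strictly below the fraction,
    # which is the index of the first grade whose bound is not exceeded.
    return bisect_left(_UPPER_BOUNDS, fraction)
```

For example, under these assumed cut-offs a plate with 100 dark colonies out of 500 (20%) would be graded 4.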
(21) The results are shown in
(22)
(23)
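The volumes given as examples in steps 6 and 9 of the protocol above imply some simple arithmetic, sketched below under stated assumptions. Passaging 5 µL of saturated culture into 5 mL of fresh media is roughly a 1:1000 dilution, so if each culture regrows to the same density (an assumption) each passage corresponds to about ten doublings. The plating helper assumes the user has a rough estimate of colony-forming units per microliter, which the protocol does not provide. All function names are hypothetical.

```python
import math

TARGET_COLONIES = 500            # "about 500 colonies per plate" (step 9)
INFORMATIVE_RANGE = (100, 2500)  # informative colony counts (step 9)


def dilution_factor(transfer_ul: float, fresh_media_ml: float) -> float:
    """Fold dilution when transfer_ul is added to fresh_media_ml of media."""
    total_ul = fresh_media_ml * 1000.0 + transfer_ul
    return total_ul / transfer_ul


def generations_per_passage(transfer_ul: float, fresh_media_ml: float) -> float:
    """Doublings needed to regrow to the starting density after dilution.

    Assumes every passage is grown back to the same saturation density.
    """
    return math.log2(dilution_factor(transfer_ul, fresh_media_ml))


def is_informative(colonies: int) -> bool:
    """True if a plate's colony count falls in the informative range."""
    low, high = INFORMATIVE_RANGE
    return low <= colonies <= high


def fold_dilution_for_target(cfu_per_ul: float, plated_ul: float,
                             target: int = TARGET_COLONIES) -> float:
    """Fold dilution that brings the expected count down to the target.

    cfu_per_ul is an estimate supplied by the user; returns 1.0 when the
    undiluted sample is already at or below the target.
    """
    expected = cfu_per_ul * plated_ul
    return max(1.0, expected / target)
```

Plating several dilutions around the value returned by `fold_dilution_for_target` mirrors the instruction to plate at different concentrations, since the estimate of colony-forming units is usually rough.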
ADVANTAGES OF THE INVENTION
(24) This invention allows evolution to be performed in real time, i.e. it allows a selection to be performed without first preparing a library, because mutations are introduced into the cells' DNA at the same time as the cells undergo selection. A significant advantage is that there is no need for iteration, which greatly facilitates the exploration of long trajectories in sequence space. Iterative mutagenesis and selection (the way directed evolution is currently done) is labor intensive, and each round of mutagenesis inactivates part of the library. Continuous mutagenesis coupled to a strong selection avoids this by ensuring that all mutants present have a minimum of activity. The other critical advantage is that this invention is scalable, i.e. the number of mutants that can be explored is limited only by the size of the culture, not by ligation or amplification steps. Commercial advantages include the fact that this method is much cheaper than current methods, because there is no need for cloning, and that it is highly scalable.
(25) Disadvantages include restricted genetic diversity. The main disadvantage is that, depending on the strength of selection, libraries can experience severe bottlenecks, restricting exploration of sequence space. Clonal interference, where a number of clones with similar moderate increases in fitness compete against each other, also restricts genetic diversity. Finally, only trajectories where each mutation increases fitness are effectively explored. In our system, fitness valleys (i.e. combinations of mutations with a negative impact on fitness) represent barriers to exploration that other systems overcome by increasing the mutation load. Another disadvantage is untargeted mutagenesis. Mutagenesis is not targeted to the ORF of the gene of interest or to specific areas within that gene, so quantitative effects on activity can be due to changes in the regulation of gene expression or in plasmid copy number. Qualitative effects (gain of function), however, should be largely insensitive to increased gene expression. Our method is therefore especially well suited to the evolution of new genetic activities rather than to the modulation of existing ones.
GENERAL INTERPRETATION OF THE DISCLOSURE
(26) In this specification, reference is made to particular features of the invention. It is to be understood that the disclosure of the invention in this specification includes all appropriate combinations of such particular features. For example, where a particular feature is disclosed in the context of a particular embodiment or a particular claim, that feature can also be used, to the extent appropriate, in the context of other particular embodiments and claims, and in the invention generally. The embodiments disclosed in this specification are exemplary and do not limit the invention. Other embodiments can be utilized and changes can be made. As used in this specification, the singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a part" includes a plurality of such parts, and so forth. The term "comprises" and grammatical equivalents thereof are used in this specification to mean that, in addition to the features specifically identified, other features are optionally present. The term "consisting essentially of" and grammatical equivalents thereof is used herein to mean that, in addition to the features specifically identified, other features may be present which do not materially alter the claimed invention. The term "at least" followed by a number is used herein to denote the start of a range beginning with that number (which may be a range having an upper limit or no upper limit, depending on the variable being defined). For example, "at least 1" means 1 or more than 1, and "at least 80%" means 80% or more than 80%. The term "at most" followed by a number is used herein to denote the end of a range ending with that number (which may be a range having 1 or 0 as its lower limit, or a range having no lower limit, depending upon the variable being defined). For example, "at most 4" means 4 or less than 4, and "at most 40%" means 40% or less than 40%.
When, in this specification, a range is given as "(a first number) to (a second number)" or "(a first number)-(a second number)," this means a range whose lower limit is the first number and whose upper limit is the second number. Where reference is made in this specification to a method comprising two or more defined steps, the defined steps can be carried out in any order or simultaneously, and the method can optionally include one or more other steps which are carried out before any of the defined steps, between two of the defined steps, or after all the defined steps. Where reference is made herein to "first" and "second" features, this is generally done for identification purposes; unless the context requires otherwise, the first and second features can be the same or different, and reference to a first feature does not mean that a second feature is necessarily present (though it may be present). Where reference is made herein to "a" or "an" feature, this includes the possibility that there are two or more such features (except where the context excludes that possibility).
(27) The phrases "nucleic acid" and "nucleic acid sequence" refer to a nucleotide, oligonucleotide, polynucleotide, or any fragment thereof. These phrases also refer to DNA or RNA of genomic or synthetic origin which may be single stranded or double stranded and may represent the sense or the antisense strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material.
(28) "Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with a second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.
(29) A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 40% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May 7, 1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 95% or at least 98% or greater sequence identity over a certain defined length. A variant may be described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic" variant. A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides generally will have significant amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies by one nucleotide base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.
(30) A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May 7, 1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% or greater sequence identity over a certain defined length of one of the polypeptides.
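To make the percent-identity thresholds in the two preceding definitions concrete, the following is a minimal sketch of an identity calculation over two sequences that have already been aligned to equal length, with "-" marking gaps. It is a simplification and not the BLAST 2 Sequences procedure named above: it performs no alignment itself and skips gap columns rather than counting them. The function names and the example strings in the usage note are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over the gap-free aligned columns.

    Both inputs must already be aligned to the same length, using '-'
    for gaps. Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold_percent: float = 40.0) -> bool:
    """True if identity is at least the threshold (40% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

For instance, `percent_identity("MKTA", "META")` compares four columns, three of which match, and so returns 75.0.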