SELECTIVE DEGRADATION OF PROTEINS
20220298503 · 2022-09-22
Inventors
Cpc classification
C12Y304/21026
CHEMISTRY; METALLURGY
C40B30/06
CHEMISTRY; METALLURGY
C12N15/1055
CHEMISTRY; METALLURGY
C12P21/06
CHEMISTRY; METALLURGY
C12N15/625
CHEMISTRY; METALLURGY
C12N15/1086
CHEMISTRY; METALLURGY
C07K2319/80
CHEMISTRY; METALLURGY
C12N15/1093
CHEMISTRY; METALLURGY
C07K2319/95
CHEMISTRY; METALLURGY
International classification
C12N15/10
CHEMISTRY; METALLURGY
C12N9/50
CHEMISTRY; METALLURGY
Abstract
The present disclosure provides methods to identify peptides and small-molecule moieties that are able to functionally bridge an interaction between a target protein and an E3 ubiquitin ligase to mediate degradation of the target protein. Some moieties can degrade specific target variants but not others. The moieties create a neosubstrate for an E3 ligase of interest. The described methods enable the generation of compounds that selectively degrade specific targets within cells, with implications for drug development for pathological conditions. The disclosure also describes the generation of modified peptides using post-translational modification enzymes, such as N-methyltransferases, prolyl oligopeptidases, lactamases, hydroxylases, and dehydratases, along with methods of using the same.
Claims
1-73. (canceled)
74. A host cell configured to express: an E3 ubiquitin ligase; a first fusion protein comprising a first test protein, a first DNA-binding moiety, and a first gene-activating moiety; and a negative selection agent, wherein the expression of the negative selection agent is under control of a promoter DNA sequence specific for the first DNA-binding moiety.
75. The host cell of claim 74, wherein the host cell further comprises: a second fusion protein comprising a second DNA-binding moiety, a second test protein, and a second gene-activating moiety; and a positive selection reporter, wherein the expression of the positive selection reporter is under control of a second promoter DNA sequence specific for the second DNA-binding moiety.
76. The host cell of claim 153, wherein the non-natural polypeptide comprises an N-terminal sequence for peptide stabilization.
77. The host cell of claim 153, wherein the polypeptide is an encoded product of an mRNA, wherein the mRNA comprises a 3′UTR.
78. The host cell of claim 77, wherein the mRNA is an encoded product of a DNA molecule, wherein the DNA molecule is delivered into the host cell exogenously.
79. The host cell of claim 74, wherein the host cell is a eukaryotic cell or a prokaryotic cell.
80. The host cell of claim 74, wherein the host cell is from a plant, animal, fungus, or bacterium.
81. The host cell of claim 80, wherein the host cell is from a fungus.
82. The host cell of claim 81, wherein the host cell is a haploid yeast cell.
83. The host cell of claim 81, wherein the host cell is a diploid yeast cell.
84. (canceled)
85. The host cell of claim 74, wherein the host cell has a mutant background enabling increased uptake of small molecules.
86-115. (canceled)
116. A method for producing cyclic peptides, the method comprising: recombinantly expressing a prolyl oligopeptidase; and contacting the prolyl oligopeptidase with a linear peptide such that the linear peptide is converted to a cyclic peptide; wherein the active site of the prolyl oligopeptidase does not have a tryptophan residue at a position corresponding to amino acid position 603 or an asparagine residue at a position corresponding to amino acid position 563 of SEQ ID NO: 55.
117-136. (canceled)
137. The host cell of claim 153, wherein the polypeptide is processed into a cyclic or bicyclic peptide in the host cell.
138. The host cell of claim 153, wherein the polypeptide is a product of post-translational modification.
139. The host cell of claim 138, wherein the post-translational modification includes cyclization.
140. The host cell of claim 139, wherein the cyclization comprises a reaction with a prolyl endopeptidase.
141. The host cell of claim 140, wherein the prolyl endopeptidase has at least 80%, 85%, 90%, 92%, 95%, 97% or 99% sequence identity to one of SEQ ID NOs: 42-58.
142. The host cell of claim 138, wherein the post-translational modification includes bi-cyclization.
143. The host cell of claim 142, wherein the bicyclization comprises a reaction with a hydroxylase and a dehydratase.
144. The host cell of claim 143, wherein the hydroxylase has at least 80%, 85%, 90%, 92%, 95%, 97% or 99% sequence identity to SEQ ID NO: 123.
145. The host cell of claim 143, wherein the dehydratase has at least 80%, 85%, 90%, 92%, 95%, 97% or 99% sequence identity to one of SEQ ID NOs: 124-127.
146. The host cell of claim 142, wherein the bicyclization comprises formation of a tryptathionine bridge.
147. The host cell of claim 138, wherein the post-translational modification includes methylation.
148. The host cell of claim 147, wherein the methylation comprises a reaction with an N-methyltransferase.
149. The host cell of claim 148, wherein the N-methyltransferase has at least 80%, 85%, 90%, 92%, 95%, 97% or 99% sequence identity to one of SEQ ID NOs: 61-116.
150. The host cell of claim 74, wherein the host cell comprises more than one copy of a gene encoding a negative selection agent whose expression is activated by a promoter DNA sequence specific for the first DNA-binding moiety.
151. The host cell of claim 74, wherein the host cell comprises a DNA sequence encoding the first fusion protein, a DNA sequence encoding the E3 ubiquitin ligase, and a DNA sequence encoding the negative selection agent.
152. The host cell of claim 74, wherein the negative selection agent is a ribosomally encoded xenobiotic agent, a ribosomally encoded poison, a ribosomally encoded endogenous or exogenous gene that results in severe growth defects upon mild overexpression, a ribosomally encoded recombinase that excises an essential gene for viability, a limiting factor involved in the synthesis of a toxic secondary metabolite, a growth inhibitory sequence, or any combination thereof.
153. The host cell of claim 74, wherein the host cell is configured to express a non-natural polypeptide, wherein the non-natural polypeptide modulates an interaction between the first fusion protein and the E3 ubiquitin ligase in a manner that leads to accelerated degradation of the first fusion protein.
154. The host cell of claim 153, wherein the non-natural polypeptide is a cyclic peptide produced by the method of claim 116.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0054] Features of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings of which:
DETAILED DESCRIPTION
[0069] The present disclosure provides a unified one-hybrid system, for use in eukaryotes or prokaryotes, in which the same bait expression plasmid is used in both organismal contexts. Additionally, an extensive series of leucine zipper fusion proteins of known affinities can be generated to compare the efficiency of interaction detection using such systems. The yeast system can produce a quantitative readout over a dynamic range. In addition, modified expression vectors disclosed herein can be used to express a protein of interest in both eukaryotes and prokaryotes.
[0070] The present disclosure also provides a system for delivering molecules across the cell membrane. The cell membrane presents a major challenge in drug discovery, especially for biologics such as peptides, proteins, and nucleic acids. One potential strategy to circumvent the membrane barrier and deliver biologics into cells is to attach them to “cell penetrating peptides” (CPPs). Despite three decades of investigation, the fundamental basis of CPP activity remains elusive. CPPs that enter cells via endocytosis must generally escape from endocytic vesicles to reach the cytosol. Unfortunately, the endosomal membrane has proven to be a significant barrier to cytoplasmic delivery of these CPPs, such that often only a negligible fraction of the peptides escapes into the cell interior. What is thus needed are new scaffolds and structures that give peptides highly efficient intrinsic cell-penetrating ability across various cell types. Several naturally occurring polyketides and peptides exhibit remarkable cell permeability (e.g., cyclosporine and amanitins). These peptides are characterized by specific modifications (e.g., backbone N-methylation and cyclization or bicyclization) that can play a crucial role in their cell membrane permeability. The compositions and methods disclosed herein enable the general use of similar modifications to generate compositions that may be of high therapeutic value and that may be capable of degrading proteins with high selectivity.
Definitions
[0071] As used herein, “reporter gene” refers to a gene whose expression can be assayed. Such genes include, for example, LacZ, β-glucuronidase (GUS), amino acid biosynthetic genes, the yeast LEU2, HIS3, LYS2, or URA3 genes, nucleic acid biosynthetic genes, the chloramphenicol acetyltransferase (CAT) gene used in mammalian reporter assays, the green fluorescent protein (GFP) gene, or any surface antigen gene for which specific antibodies are available. Reporter genes can be used for positive selection, negative selection, or both.
[0072] An “allele” refers to a DNA sequence of a gene, including naturally occurring and pathogenic variants of the gene. Expression of different alleles may lead to different protein variants.
[0073] A “promoter” is a DNA sequence located proximal to the start of transcription at the 5′ end of an operably linked transcribed sequence. The promoter can contain one or more regulatory elements or modules, which interact in modulating transcription of the operably linked gene. Promoters can be switchable or constitutive. Switchable promoters allow for reversible induction or repression of operably linked target genes upon administration of an agent. Examples of switchable promoters include but are not limited to the LexA operator and the alcohol dehydrogenase I (alcA) gene promoter. Examples of constitutive promoters include the human beta-actin gene promoter.
[0074] “Operably linked” describes two macromolecular elements arranged such that modulating the activity of the first element induces an effect on the second element. In this manner, modulation of the activity of a promoter element can be used to alter or regulate the expression of an operably-linked coding sequence. For example, the transcription of a coding sequence that is operably-linked to a promoter element can be induced by factors that activate the promoter's activity; transcription of a coding sequence that is operably-linked to a promoter element can be inhibited by factors that repress the promoter's activity. Thus, a promoter region is operably-linked to the coding sequence of a protein if transcription of that coding sequence is influenced by the activity of the promoter.
[0075] “In frame” as used herein throughout, refers to the proper positioning of a desired sequence of nucleotides within a DNA fragment or coding sequence operably linked to a promoter sequence, thereby permitting transcription and/or translation.
[0076] “Fusion construct” refers to recombinant genes that encode fusion proteins.
[0077] A “fusion protein” is a hybrid protein, i.e., a protein that has been constructed to contain domains from at least two different proteins. Fusion proteins described herein can be hybrid proteins that possess both (1) a transcriptional regulatory domain from a transcriptional regulatory protein or a DNA-binding domain from a DNA-binding protein and (2) a heterologous protein to be assayed for interaction status. The protein that is the source of the transcriptional regulatory domain may be different from the protein that is the source of the DNA-binding domain. In other words, the two domains may be heterologous to each other.
[0078] A transcriptional regulatory domain of a bait fusion protein can either activate or repress transcription of target genes, depending on the biological activity of the domain. Bait proteins of the disclosure may also be part of a fusion protein where a protein of interest is operably linked to a DNA binding moiety and a transcriptional activation domain.
[0079] “Bridging interaction” refers to an interaction between a first protein and a second protein that occurs only when one or both of the first protein and the second protein interact with a molecule, such as a peptide or small molecule from a library. In some cases, the bridging interaction between the first protein and the second protein is direct, while in other cases the bridging interaction between the first protein and the second protein is indirect. In some cases, the interaction leads to an activity of one protein being exerted on a second protein, such as ubiquitination and subsequent degradation.
[0080] “Expression” is the process by which the information encoded within a gene is revealed. If the gene encodes a protein, expression involves transcription of the DNA into mRNA, processing of the mRNA (if necessary) into a mature mRNA product, and translation of the mature mRNA into protein.
[0081] As used herein, a “cloning vehicle” is any entity that is capable of delivering a nucleic acid sequence into a host cell for cloning purposes. Examples of cloning vehicles include plasmids or phage genomes. A plasmid that can replicate autonomously in the host cell is especially desired. Alternatively, a nucleic acid molecule that can insert (integrate) into the host cell's chromosomal DNA is useful, especially a molecule that inserts into the host cell's chromosomal DNA in a stable manner, that is, a manner that allows such molecule to be inherited by daughter cells.
[0082] Cloning vehicles are often characterized by one or a small number of endonuclease recognition sites at which such DNA sequences may be cut in a determinable fashion without loss of an essential biological function of the vehicle, and into which DNA may be spliced in order to bring about its replication and cloning.
[0083] The cloning vehicle can further contain a marker suitable for use in the identification of cells transformed with the cloning vehicle. For example, a marker gene can be a gene that confers resistance to a specific antibiotic on a host cell.
[0084] The word “vector” can be used interchangeably with “cloning vehicle.”
[0085] As used herein, an “expression vehicle” is a vehicle or vector similar to the cloning vehicle that is especially designed to provide an environment that allows the expression of the cloned gene after transformation into the host. One manner of providing such an environment is to include transcriptional and translational regulatory sequences on such expression vehicles, such transcriptional and translational regulatory sequences being capable of being operably linked to the cloned gene. Another manner of providing such an environment is to provide a cloning site or sites on such vehicle, wherein a desired cloned gene and a desired expression regulatory element can be cloned.
[0086] In an expression vehicle, the gene to be cloned is usually operably-linked to certain control sequences such as promoter sequences. Expression control sequences will vary depending on whether the vector is designed to express the operably-linked gene in a prokaryotic or eukaryotic host and can additionally contain transcriptional elements such as enhancer elements, termination sequences, tissue-specificity elements, or translational initiation and termination sites.
[0087] A “host” refers to any organism that is the recipient of a cloning or expression vehicle. The host may be a bacterial cell, a yeast cell or a cultured animal cell such as a mammalian or insect cell. The yeast host may be Saccharomyces cerevisiae.
[0088] A “host cell” as described herein can be a bacterial, fungal, or mammalian cell or a cell from an insect or plant. Examples of bacterial host cells are E. coli and B. subtilis. Examples of fungal cells are S. cerevisiae and S. pombe. Non-limiting examples of mammalian cells are immortalized mammalian cell lines, such as HEK293, A549, HeLa, or CHO cells, or isolated patient primary tissue cells that have been genetically immortalized (such as by transfection with hTERT). Non-limiting examples of plants are Nicotiana tabacum and Physcomitrella patens. A non-limiting example of an insect cell is an Sf9 (Spodoptera frugiperda) cell.
[0089] A “DNA-binding domain (DBD),” or a “DNA-binding moiety” is a moiety that is capable of directing specific polypeptide binding to a particular DNA sequence (i.e., a “protein binding site”). These proteins can be homodimers or monomers that bind DNA in a sequence specific manner. Exemplary DNA-binding domains of the disclosure include LexA, cI, glucocorticoid receptor binding domains, and the Ume6 domain.
[0090] A “gene activating moiety” or “activation domain” (“AD”) is a moiety that is capable of inducing (albeit in many instances weakly inducing) the expression of a gene to whose control region it is bound; one example is an activation domain from a transcription factor. As used herein, “weakly” means below the level of activation effected by GAL4 activation region II, and preferably at or below the level of activation effected by the B42 activation domain. Levels of activation can be measured using any downstream reporter gene system by comparing, in parallel assays, the level of expression stimulated by the GAL4 region II polypeptide with the level of expression stimulated by the polypeptide to be tested.
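The comparison described above can be expressed as a short calculation. The sketch below is a minimal illustration, assuming background-subtracted reporter readouts in the same arbitrary units from parallel assays; the function names and label strings are hypothetical and not part of the disclosure.

```python
# Hypothetical helper: classify an activation domain (AD) as "weak" per the
# definition above. Inputs are assumed to be background-subtracted reporter
# readouts, in the same units, from parallel assays.


def relative_activation(test_signal: float, gal4_region_ii_signal: float) -> float:
    """Activation by the test AD relative to GAL4 activation region II."""
    if gal4_region_ii_signal <= 0:
        raise ValueError("GAL4 region II reference signal must be positive")
    return test_signal / gal4_region_ii_signal


def classify_activation(test_signal: float, gal4_signal: float, b42_signal: float) -> str:
    """Return 'not weak', 'weak', or 'weak (preferred range)'."""
    if relative_activation(test_signal, gal4_signal) >= 1.0:
        return "not weak"                # at or above GAL4 region II
    if test_signal <= b42_signal:
        return "weak (preferred range)"  # at or below the B42 level
    return "weak"                        # below GAL4 region II but above B42
```

The two thresholds come directly from the definition; any numeric readouts passed in are illustrative only.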
[0091] The term “sequence identity” as used herein in the context of amino acid sequences is defined as the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in a selected sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN, ALIGN-2 or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full-length of the sequences being compared.
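As a minimal sketch of the percent-identity definition, the following Python scores a pairwise alignment that has already been produced by an alignment tool such as BLAST; the alignment step itself is not reimplemented. Using the number of residues in the candidate sequence (gaps excluded) as the denominator is one reading of the definition and an assumption, and the function names are hypothetical. The 80% default mirrors the lowest threshold recited in the claims.

```python
def percent_identity(candidate_aln: str, selected_aln: str, gap: str = "-") -> float:
    """Percent of candidate residues identical to the aligned selected residue.

    Both strings are rows of one pairwise alignment (same length, gaps as '-').
    Conservative substitutions are not counted as identities.
    """
    if len(candidate_aln) != len(selected_aln):
        raise ValueError("aligned sequences must have the same length")
    candidate_residues = sum(1 for c in candidate_aln if c != gap)
    if candidate_residues == 0:
        raise ValueError("candidate row contains no residues")
    # A column counts only if the candidate has a residue there and it matches exactly.
    identical = sum(
        1
        for c, s in zip(candidate_aln.upper(), selected_aln.upper())
        if c != gap and c == s
    )
    return 100.0 * identical / candidate_residues


def meets_identity_threshold(candidate_aln: str, selected_aln: str,
                             threshold: float = 80.0) -> bool:
    """True if the candidate reaches the given percent-identity threshold."""
    return percent_identity(candidate_aln, selected_aln) >= threshold
```

For example, the aligned pair `MKT-LLV` / `MKTALIV` has 5 identities over 6 candidate residues (about 83.3%), so it passes an 80% threshold but not an 85% one.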
Screening for Functional Degraders of a Target Protein
[0092] Selective protein degradation is a distinct approach to drug discovery. The ability to selectively degrade an aberrant protein or one of its isoforms offers a controlled way to target certain pathologies, such as cancer. Compounds that accomplish selective degradation by bridging the target to an E3 ubiquitin ligase act catalytically and are not required at stoichiometric levels, making them attractive drug candidates. However, identifying compounds that selectively bridge a protein of interest to an E3 ubiquitin ligase does not guarantee functional degradation of that protein, and screening directly for compounds that functionally degrade a target by forming a transient ternary complex between the compound, the target, and an E3 ligase is difficult. Current approaches rely on identifying compounds that bind the target and chemically linking them to another moiety that binds a specific E3 ligase; this yields large molecules and limits the approach to targets with known small-molecule binders.
[0093] Methods and systems of the disclosure can involve the intracellular selection of peptide-based selective degraders. Stated differently, various systems described herein can be used to screen for molecules that selectively lead to the degradation of a target protein by creating a functional interaction between the target protein and an E3 ubiquitin ligase, or directly with the proteasome. A model organism, for example Saccharomyces cerevisiae, can be used to coexpress a target of interest with a specific E3 ubiquitin ligase and a test DNA molecule comprising a DNA sequence that encodes a randomized peptide library. This can allow for the unbiased selection of peptides that lead to functional degradation of the target of interest using selection mechanisms (e.g., stringent viability-readout selection mechanisms). The method can involve a variant of the yeast one-hybrid system in which a transcription factor, reconstituted by the test protein fused to a DNA-binding domain (DBD) and a transcription activation domain (AD), is degraded by the proteasome or by a specific E3 ubiquitin ligase through a peptide-mediated interaction (see
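One common way to encode a randomized peptide library is with degenerate NNK codons (N = any base, K = G or T), which cover all 20 amino acids while allowing only one stop codon (TAG). The disclosure does not specify a codon scheme, so NNK, the ATG start codon, the library size, and all names below are assumptions for illustration only; the sketch simply generates and translates such library members in silico.

```python
import random
from typing import List

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order at positions 1, 2, 3.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def random_nnk_codon(rng: random.Random) -> str:
    """N = A/C/G/T, K = G/T; TAG is the only stop codon in this set."""
    return rng.choice("ACGT") + rng.choice("ACGT") + rng.choice("GT")


def make_library(n_members: int, n_codons: int, seed: int = 0) -> List[str]:
    """Hypothetical library: an ATG start followed by n_codons random NNK codons."""
    rng = random.Random(seed)
    return [
        "ATG" + "".join(random_nnk_codon(rng) for _ in range(n_codons))
        for _ in range(n_members)
    ]


def translate(dna: str) -> str:
    """Translate complete codons only; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    for member in make_library(n_members=3, n_codons=8, seed=1):
        print(member, translate(member))
```

In a real screen the library would be cloned into the test DNA molecule rather than generated in software; the sketch only shows what the encoded sequence space looks like.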
[0094] Methods and systems of the disclosure can use the reconstitution of a transcription factor mediated by a test protein fused to an AD, for example, VP16, NF-κB AD, VP64AD, BP64 AD, B42 acidic activation domain (B42AD), or p65 transactivation domain (p65AD) and a DBD, for example, LexA, cI, Gli-1, YY1, glucocorticoid receptor binding domain, or Ume6 domain. Similarly, the test protein can comprise an AD and bind to DNA through another binding partner.
[0095] Methods and systems of the disclosure can also use two different proteins, or two variants of one protein, fused to different DBDs and ADs. The system can identify compounds that bridge one of the proteins to an E3 ligase, leading to its degradation, while preserving an active version of the other test protein. For example, one component of a complex can be degraded without affecting the integrity of the rest of the complex (see
[0096] Expression of the fusion protein comprising the protein of interest can direct RNA polymerase to a specific genomic site and allow for the expression of a genetic element. The genetic element can be, for example, a gene that encodes a protein that enables an organism to grow on selection media. The selection media can be specific to, for example, ADE2, URA3, TRP1, KANR, or NATR, and will lack the corresponding essential component (adenine, uracil, or tryptophan) or include the corresponding drug (G418 or NAT). Markers that report when a protein is no longer present (for example, when the protein is degraded by an exogenous compound) are referred to as counter-selection markers, such as the URA3 gene. Such markers can be weak or leaky, i.e., easily masked by the selection of mutants that escape the selection, and this leakiness can lead to a high false-positive rate.
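The marker-to-medium correspondence described above can be captured as a small lookup table. This is a hypothetical sketch that covers only positive selection for the markers named in the preceding paragraph; counter-selection conditions are not modeled, and the names are illustrative.

```python
# Hypothetical lookup: selection condition for each marker named above.
SELECTION_CONDITIONS = {
    "ADE2": ("dropout", "adenine"),
    "URA3": ("dropout", "uracil"),
    "TRP1": ("dropout", "tryptophan"),
    "KANR": ("drug", "G418"),
    "NATR": ("drug", "nourseothricin (NAT)"),
}


def describe_medium(markers):
    """Describe a medium that selects for cells expressing all given markers."""
    omitted, drugs = [], []
    for marker in markers:
        kind, component = SELECTION_CONDITIONS[marker]
        # Dropout markers require omitting a nutrient; resistance markers require a drug.
        (omitted if kind == "dropout" else drugs).append(component)
    parts = []
    if omitted:
        parts.append("lacking " + " and ".join(sorted(omitted)))
    if drugs:
        parts.append("containing " + " and ".join(sorted(drugs)))
    return "medium " + ", ".join(parts) if parts else "non-selective medium"
```

For instance, `describe_medium(["URA3", "KANR"])` describes a medium lacking uracil and containing G418.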
[0097] Methods and systems of the disclosure can combine a strong negative selection marker with the intracellular production and stabilization of short peptides or macrocycles to screen for mediators of bridging interactions between a target protein and an E3 ubiquitin ligase. An inducible one-hybrid approach can be employed, which can drive the expression of any one or a combination of several cytotoxic reporters (death agents) as well as positive selection markers. Inducing the expression of a combination of cytotoxic reporters in a one-hybrid system can have a multiplicative effect in lowering the false-positive rate of the assay, because all of the cytotoxic reporters must simultaneously be “leaky” for an induced cell to survive.
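The multiplicative effect can be illustrated with simple arithmetic. Assuming, as a simplification not stated in the disclosure, that each cytotoxic reporter fails independently, the probability that an induced cell escapes all reporters is the product of the individual escape frequencies. The frequencies and cell numbers below are hypothetical.

```python
import math


def combined_escape_frequency(per_reporter_escape):
    """Probability that an induced cell escapes every cytotoxic reporter,
    assuming each reporter is 'leaky' independently of the others."""
    for f in per_reporter_escape:
        if not 0.0 <= f <= 1.0:
            raise ValueError("escape frequencies must lie in [0, 1]")
    return math.prod(per_reporter_escape)


def expected_escapers(n_induced_cells, per_reporter_escape):
    """Expected number of false-positive survivors among induced cells."""
    return n_induced_cells * combined_escape_frequency(per_reporter_escape)


# Hypothetical numbers: 1e8 induced cells, each reporter leaky at a frequency of 1e-4.
print(expected_escapers(1e8, [1e-4]))              # one reporter
print(expected_escapers(1e8, [1e-4, 1e-4, 1e-4]))  # three reporters
```

With these assumed values, a single reporter would be expected to let through about 10^4 false-positive cells, whereas three reporters would reduce the expectation to about 10^-4. Real reporters may not fail independently, so this is an upper bound on the benefit.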
[0098] Disclosed herein, in certain embodiments, is a method for identifying a molecule that can selectively bridge an interaction between a first test protein and an E3 ubiquitin ligase to mediate functional degradation of the first test protein in a host cell. A second test protein may be used as a positive control, such that, while the molecule mediates degradation of the first test protein, it does not affect expression of the second test protein. The method may comprise expressing in the host cell a first fusion protein comprising the first test protein, a DNA-binding moiety, and a gene-activating moiety, and an E3 ubiquitin ligase or a fragment thereof (or, in some cases, the E3 ubiquitin ligase and its associated machinery); and delivering a molecule from a library to the host cell. The host cell may comprise a promoter sequence controlling expression of a death agent. The promoter may be specific to the DNA-binding moiety in the first fusion protein, such that in the absence of the molecule, expression of the death agent is activated. When the molecule is present, the first test protein may be degraded by the E3 ubiquitin ligase, so that the death agent is no longer expressed and the host cell survives.
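The selection logic of this method can be summarized as a Boolean model. The sketch below is a simplification that treats each reporter as fully on or off (ignoring leakiness) and optionally includes the second, control fusion protein driving a positive selection reporter; the function and parameter names are hypothetical.

```python
from typing import Optional


def cell_survives(first_protein_degraded: bool,
                  second_protein_intact: Optional[bool] = None) -> bool:
    """Simplified survival logic for the one-hybrid degradation screen.

    first_protein_degraded: the first fusion protein (DBD + test protein + AD)
        has been removed via the E3 ubiquitin ligase.
    second_protein_intact: None if no second (control) fusion protein is used;
        otherwise whether it still drives its positive selection reporter.
    """
    # An intact first fusion protein activates the death-agent promoter.
    death_agent_expressed = not first_protein_degraded
    if death_agent_expressed:
        return False
    if second_protein_intact is None:
        return True
    # The positive selection reporter is required for growth.
    return second_protein_intact
```

Under this model, a molecule that degrades both test proteins is rejected by the positive reporter, which is how the second protein acts as a control.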
[0102] In some embodiments, a screen to identify a peptide or small molecule that can mediate the degradation of a target protein may involve testing the peptide or small molecule against a population of host cells in which different cells in the population express different E3 ligases. The host cells can then be transformed with or otherwise subjected to a candidate peptide/small molecule from a library. In such cases, each of the host cells may comprise the same target protein and/or death agent. Surviving cells may be sequenced to identify the E3 ligase that successfully interacts with the peptide/small molecule. In another example, each well of an assay may comprise a plurality of different host cells, in which different host cells express different E3 ligases. A peptide/small molecule from a library may then be transformed or otherwise introduced into each well for the identification of a peptide/small molecule that successfully interacts with the target protein and leads to cell survival.
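For the pooled format, one way to identify the E3 ligase associated with survival is to compare how often each ligase is read out among surviving cells versus the input population. The pseudocounted frequency ratio used below is an assumed scoring scheme, not one specified by the disclosure, and the example counts are hypothetical.

```python
from collections import Counter


def rank_e3_ligases(input_ids, survivor_ids, pseudocount=1.0):
    """Rank E3 ligases by enrichment among surviving cells.

    input_ids / survivor_ids: E3 ligase identities read out (e.g., by
    sequencing) from cells before and after selection.
    Enrichment = pseudocounted frequency in survivors / frequency in input.
    """
    before, after = Counter(input_ids), Counter(survivor_ids)
    ligases = sorted(set(before) | set(after))
    k = len(ligases)
    n_before, n_after = sum(before.values()), sum(after.values())
    scores = {}
    for e3 in ligases:
        f_after = (after[e3] + pseudocount) / (n_after + pseudocount * k)
        f_before = (before[e3] + pseudocount) / (n_before + pseudocount * k)
        scores[e3] = f_after / f_before
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical readout: equal input of two ligases, survivors dominated by VHL.
pool = ["VHL"] * 50 + ["MDM2"] * 50
survivors = ["VHL"] * 9 + ["MDM2"] * 1
print(rank_e3_ligases(pool, survivors))
```

The pseudocount keeps ligases that are absent from one population from producing division by zero or infinite scores.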
[0103] Examples of targets for degradation are oncogenic proteins such as K-Ras oncogenic alleles, Cyclin D family, Cyclin E family, c-MYC, EGFR, HER2, PDGFR, VEGF and beta-catenin, or oncogenic variants such as IDH1(R132H, R132S, R132C, R132G, and R132L) or IDH2(R140Q, R172K).
[0104] E3 ubiquitin ligases that can be used in the system include, but are not limited to, multisubunit E3 ligases of the cullin families (CRL1, CRL2, CRL3, CRL4, CRL5, and CRL7) and single-subunit E3 ligases of the RING, RING-Between-RING (RBR), and HECT families, including, but not limited to, Cereblon, Skp2, MDM2, FBXW7, DCAF1, DCAF15, VHL, AFF4, AMFR, ANAPC11, ANKIB1, AREL1, ARIH1, ARIH2, BARD1, BFAR, BIRC2, BIRC3, BIRC7, BIRC8, BMI1, BRAP, BRCA1, CBL, CBLB, CBLC, CBLL1, CCDC36, CCNB1IP1, CGRRF1, CHFR, CNOT4, CUL9, CYHR1, DCST1, DTX1, DTX2, DTX3, DTX3L, DTX4, DZIP3, E4F1, FANCL, G2E3, HACE1, HECTD1, HECTD2, HECTD3, HECTD4, HECW1, HECW2, HERC1, HERC2, HERC3, HERC4, HERC5, HERC6, HLTF, HUWE1, IRF2BP1, IRF2BP2, IRF2BPL, Itch, KCMF1, KMT2C, KMT2D, LNX1, LNX2, LONRF1, LONRF2, LONRF3, LRSAM1, LTN1, MAEA, MAP3K1, MARCH1, MARCH10, MARCH11, MARCH2, MARCH3, MARCH4, MARCH5, MARCH6, MARCH7, MARCH8, MARCH9, Mdm2, MDM4, MECOM, MEX3A, MEX3B, MEX3C, MEX3D, MGRN1, MIB1, MIB2, MID1, MID2, MKRN1, MKRN2, MKRN3, MKRN4P, MNAT1, MSL2, MUL1, MYCBP2, MYLIP, NEDD4, NEDD4L, NEURL1, NEURL1B, NEURL3, NFX1, NFXL1, NHLRC1, NOSIP, NSMCE1, PARK2, PCGF1, PCGF2, PCGF3, PCGF5, PCGF6, PDZRN3, PDZRN4, PELI1, PELI2, PELI3, PEX10, PEX12, PEX2, PHF7, PHRF1, PJA1, PJA2, PLAG1, PLAGL1, PML, PPIL2, PRPF19, RAD18, RAG1, RAPSN, RBBP6, RBCK1, RBX1, RC3H1, RC3H2, RCHY1, RFFL, RFPL1, RFPL2, RFPL3, RFPL4A, RFPL4AL1, RFPL4B, RFWD2, RFWD3, RING1, RLF, RLIM, RMND5A, RMND5B, RNF10, RNF103, RNF11, RNF111, RNF112, RNF113A, RNF113B, RNF114, RNF115, RNF121, RNF122, RNF123, RNF125, RNF126, RNF128, RNF13, RNF130, RNF133, RNF135, RNF138, RNF139, RNF14, RNF141, RNF144A, RNF144B, RNF145, RNF146, RNF148, RNF149, RNF150, RNF151, RNF152, RNF157, RNF165, RNF166, RNF167, RNF168, RNF169, RNF17, RNF170, RNF175, RNF180, RNF181, RNF182, RNF183, RNF185, RNF186, RNF187, RNF19A, RNF19B, RNF2, RNF20, RNF207, RNF208, RNF212, RNF212B, RNF213, RNF214, RNF215, RNF216, RNF217, RNF219, RNF220, RNF222, RNF223, RNF224, RNF225, RNF24, RNF25, RNF26, RNF31, RNF32, RNF34, RNF38, RNF39, RNF4, RNF40, RNF41, RNF43, RNF44, RNF5, RNF6, RNF7, RNF8, RNFT1, RNFT2, RSPRY1, SCAF11, SH3RF1, SH3RF2, SH3RF3, SHPRH, SIAH1, SIAH2, SIAH3, SMURF1, SMURF2, STUB1, SYVN1, TMEM129, Topors, TRAF2, TRAF3, TRAF4, TRAF5, TRAF6, TRAF7, TRAIP, TRIM10, TRIM11, TRIM13, TRIM15, TRIM17, TRIM2, TRIM21, TRIM22, TRIM23, TRIM24, TRIM25, TRIM26, TRIM27, TRIM28, TRIM3, TRIM31, TRIM32, TRIM33, TRIM34, TRIM35, TRIM36, TRIM37, TRIM38, TRIM39, TRIM4, TRIM40, TRIM41, TRIM42, TRIM43, TRIM43B, TRIM45, TRIM46, TRIM47, TRIM48, TRIM49, TRIM49B, TRIM49C, TRIM49D1, TRIM5, TRIM50, TRIM51, TRIM52, TRIM54, TRIM55, TRIM56, TRIM58, TRIM59, TRIM6, TRIM60, TRIM61, TRIM62, TRIM63, TRIM64, TRIM64B, TRIM64C, TRIM65, TRIM67, TRIM68, TRIM69, TRIM7, TRIM71, TRIM72, TRIM73, TRIM74, TRIM75P, TRIM77, TRIM8, TRIM9, TRIML1, TRIML2, TRIP12, TTC3, UBE3A, UBE3B, UBE3C, UBE3D, UBE4A, UBE4B, UBOX5, UBR1, UBR2, UBR3, UBR4, UBR5, UBR7, UHRF1, UHRF2, UNK, UNKL, VPS11, VPS18, VPS41, VPS8, WDR59, WDSUB1, WWP1, WWP2, XIAP, ZBTB12, ZFP91, ZFPL1, ZNF280A, ZNF341, ZNF511, ZNF521, ZNF598, ZNF645, ZNRF1, ZNRF2, ZNRF3, ZNRF4, Zswim2, and ZXDC.
Expression of Proteins in the Host Cells
[0105] One or more plasmid constructs may be used to express different proteins in the host cell. The number of plasmids used may depend on the host cell and on whether any constructs are already integrated into the host cell, among other conditions.
[0106] In some cases, a method of identifying a molecule that elicits degradation of a first test protein may use components such as: an E3 ubiquitin ligase; a molecule from a library of molecules; and a first fusion protein comprising a first test protein, a first DNA-binding moiety, and a gene-activating moiety. The method may also use a promoter driving the expression of a death agent, wherein the promoter sequence is specific for the first DNA-binding moiety. In addition, in some cases, the method may also use a second fusion protein comprising a second DNA-binding moiety, a second test protein, and a gene-activating moiety, along with a promoter driving the expression of a positive or negative marker, wherein the promoter sequence is specific for the second DNA-binding moiety.
[0107] The proteins and nucleic acid sequences mentioned above may be provided to the host cell in the form of plasmids. In some cases, the nucleic acid sequences encoding the proteins, and the constructs comprising the promoters and the death agents or positive/negative markers, may be integrated into the host cell. In some cases, the molecule from the library of molecules is a small molecule or compound and does not need to be encoded on a plasmid.
[0108] For instance, in one example, the first fusion protein may be provided on one plasmid (Plasmid 1), the E3 ubiquitin ligase on a separate plasmid (Plasmid 2), and the DNA-encoded molecule from the library on a further plasmid (Plasmid 3). All three plasmids may be transfected into a plurality of host cells. In cases where a second fusion protein is also used, it may be provided on Plasmid 1 or on a separate plasmid (Plasmid 4). The expression constructs may also be combined into one or two plasmids to reduce the number of plasmids to be transfected. Additionally, the constructs comprising the promoters driving the death agent or the positive/negative selection markers may also be provided on the plasmids or, alternatively, integrated into the host cell.
[0109] In another instance, the first fusion protein may be genetically integrated into the host cell, whereas Plasmids 2 and 3, encoding the E3 ubiquitin ligase and the molecule from the library of molecules, are transfected into the host cell. In this example, the second fusion protein may also be integrated into the host cell or, in some cases, be provided on a plasmid.
[0110] In yet another instance, the first fusion protein and the E3 ubiquitin ligase are both integrated into the host cell, and the molecule from the library of molecules is transfected into the host cell in the form of a plasmid. As above, the second fusion protein may be integrated into the host cell or provided in plasmid form. The constructs comprising the promoters driving the death agent or the positive/negative selection markers may likewise be provided in the plasmids or, alternatively, integrated into the host cell.
[0111] In another instance, the first fusion protein may be transfected on a plasmid into, or integrated into, the host cell, but an endogenous E3 ubiquitin ligase is used, in which case integration or transfection of a plasmid containing the E3 ligase may not be needed.
[0112] In some cases, the nucleic acid sequences for the fusion protein(s), the E3 ubiquitin ligase, and the promoter driving the death agent (and the promoter driving the positive/negative selection marker, if used) may all be integrated into the host cell. In this case, only a single plasmid encoding the molecule from the library of molecules needs to be transfected into the host cell.
[0113] In some embodiments, the host cell or cells disclosed herein comprise a plasmid vector. The plasmid can contain, for example, two restriction sites that enable the insertion of sequences encoding the two proteins that constitute the bait and the E3 ligase of interest. The bait protein of interest can be the product of an oncogene (such as the Cyclin E family, the Cyclin D family, c-MYC, EGFR, HER2, K-Ras, PDGFR, Raf kinase, and VEGF). The bait protein of interest can also be an effector of an inflammatory response (such as IL-17RA, IL-17RB, IL-17RC, IL-17RD, IL-17RE, Act1 (CIKS), and IL-23R).
[0114] A plasmid can be configured to express the two proteins that constitute the bait and the E3 ligase of interest, together with an additional factor, for example, a variant of the bait protein. For instance, the targeted variants can be mutant KRAS (G12D, G12V, G12C, G12S, G13D, Q61K, Q61L, etc.), and the control variant can be WT KRAS. The additional factor can also be another protein bound to the bait protein, or another target of the E3 ligase.
[0115] In some embodiments, the host cell disclosed herein comprises a plasmid in which a DNA sequence encoding a first polypeptide is inserted in frame with Gal4-DBD and in frame with VP64-AD, and a DNA sequence encoding a second polypeptide comprising an E3 ubiquitin ligase.
[0116] In some embodiments, the first test protein is a variant of KRAS, and the E3 ubiquitin ligase is VHL.
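Keeping an insert in frame amounts to two checks on its coding sequence: the length is a multiple of three, and there is no in-frame stop codon that would truncate the fusion before the downstream domain. A minimal Python sketch, assuming the insert is given 5′ to 3′ as coding sequence only and sits between the two fusion partners; `is_in_frame_insert` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame_insert(insert: str) -> bool:
    """Return True if `insert` preserves the reading frame of a fusion.

    Assumes `insert` is coding sequence only (5'->3'), placed between two
    fusion partners, so translation starts upstream of it.
    """
    seq = insert.upper()
    if len(seq) % 3 != 0:
        return False  # a frameshift would garble the downstream domain
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    # Any in-frame stop would truncate the fusion before the downstream domain.
    return not any(codon in STOP_CODONS for codon in codons)


print(is_in_frame_insert("ATGGCTGAA"))  # True: Met-Ala-Glu, 9 nt
print(is_in_frame_insert("ATGGCTGA"))   # False: 8 nt shifts the frame
print(is_in_frame_insert("ATGTAAGCT"))  # False: internal TAA stop
```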
[0117] A plasmid can encode the fusion of an activation domain (or another gene-activating moiety) and a DBD to each protein, driven either by a strong promoter and terminator (such as ADH1) or by an inducible promoter (such as GAL1). Other exemplary activation domains include those of VP16 and B42AD. In some embodiments, the DNA-binding moiety is derived from LexA, TetR, LacI, Gli-1, YY1, the glucocorticoid receptor, or the Ume6 domain, and the gene-activating moiety is derived from Gal4, B42, VP64, NF-κB AD, Dof1, BP64, or p65. Each protein fusion can be tagged for subsequent biochemical experiments with, for example, a FLAG, HA, MYC, or His tag. The plasmid can also include bacterial selection and propagation markers (e.g., ori and AmpR) and yeast replication and selection markers (e.g., TRP1 and CEN or 2-micron). The plasmid may contain multiple bait proteins fused to different DBDs and ADs. The plasmid can also be integrated into the genome at a specified locus.
[0118] Disclosed herein, in certain embodiments, is a library of plasmid vectors, each plasmid vector comprising: a DNA sequence encoding a different peptide sequence operably linked to a first switchable promoter; a DNA sequence encoding a death agent under control of a second switchable promoter; and a DNA sequence encoding a positive selection reporter under control of a third switchable promoter.
Expression of Selection Markers
[0119] Positive Selection Markers
[0120] Efficient expression of a test protein fused to a DBD and an activation domain can direct RNA polymerase to a specific genomic site and allow expression of a protein that enables an organism to grow on selection media. The selection media can be specific to, for example, ADE2, URA3, TRP1, KanR, or NatR, and can lack the essential component (Ade, Ura, Trp) or can include a drug (G418, NAT). A plasmid can encode one or more positive selection markers that enable an organism to grow on selection media.
Negative Selection Markers
[0121] An inducible one-hybrid approach can be employed to drive the expression of any one or a combination of several cytotoxic reporters (death agents), as well as positive selection markers. A method of the disclosure involving induced expression of a combination of cytotoxic reporters in a one-hybrid system can have a multiplicative effect in lowering the false-positive rate of the one-hybrid assay, as all of the cytotoxic reporters must simultaneously be “leaky” for an induced cell to survive. The cytotoxic reporters can comprise, or contain domains of, various polypeptides, for example those shown in Table 1.
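The multiplicative effect can be made concrete. Assuming each cytotoxic reporter independently fails to kill an induced cell with some probability (the leak rates below are illustrative values, not measured data), the chance that a cell escapes all of them is the product of the individual leak rates. A minimal Python sketch; `escape_rate` is a hypothetical name.

```python
import math


def escape_rate(leak_probabilities):
    """Probability that an induced cell survives because every cytotoxic
    reporter is 'leaky' (fails to kill) at the same time.

    Assumes the reporters fail independently of one another.
    """
    return math.prod(leak_probabilities)


single = escape_rate([1e-3])            # one reporter, assumed 1-in-1,000 leak
triple = escape_rate([1e-3, 1e-3, 1e-3])  # three independent reporters
print(single, triple)
```

With three reporters each leaking at an assumed rate of 1 in 1,000, the escape rate drops from about 1e-3 to about 1e-9. Correlated failures, for example a mutation that disables a shared DNA-binding fusion driving all reporters, would erode this gain, so the independence assumption is the key caveat.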
TABLE-US-00001 TABLE 1 Amino acid sequences of exemplary toxins Cholera toxin SEQ ID MVKIIFVFFIFLSSFSYANDDKLYRADSRPPDEIKQSGGLMPRGQSEYFDRGTQMNIN (CtxA) NO.: 1 LYDHARGTQTGFVRHDDGYVSTSISLRSAHLVGQTILSGHSTYYIYVIATAPNMFNV NDVLGAYSPHPDEQEVSALGGIPYSQIYGWYRVHFGVLDEQLHRNRGYRDRYYSNL DIAPAADGYGLAGFPPEHRAWREEPWIHHAPPGCGNAPRSSMSNTCDEKTQSLGVK FLDEYQSKVKRQIFSGYQSDIDTHNRIKDEL SpvB toxin SEQ ID MLILNGFSSATLALITPPFLPKGGKALSQSGPDGLASITLPLPISAERGFAPALALHYSS (Salmonella NO.: 2 GGGNGPFGVGWSCATMSIARRTSHGVPQYNDSDEFLGPDGEVLVQTLSTGDAPNPV enterica) TCFAYGDVSFPQSYTVTRYQPRTESSFYRLEYWVGNSNGDDFWLLHDSNGILHLLG KTAAARLSDPQAASHTAQWLVEESVTPAGEHIYYSYLAENGDNVDLNGNEAGRDR SAMRYLSKVQYGNATPAADLYLWTSATPAVQWLFTLVFDYGERGVDPQVPPAFTA QNSWLARQDPFSLYNYGFEIRLHRLCRQVLMFHHFPDELGEADTLVSRLLLEYDENP ILTQLCAARTLAYEGDGYRRAPVNNMMPPPPPPPPPMMGGNSSRPKSKWAIVEESK QIQALRYYSAQGYSVINKYLRGDDYPETQAKETLLSRDYLSTNEPSDEEFKNAMSV YINDIAEGLSSLPETDHRVVYRGLKLDKPALSDVLKEYTTIGNIIIDKAFMSTSPDKA WINDTILNIYLEKGHKGRILGDVAHFKGEAEMLFPPNTKLKIESIVNCGSQDFASQLS KLRLSDDATADTNRIKRIINMRVLNS CARDS toxin SEQ ID MSENLYFQGHMPNPVREVYRVDLRSPEEIFEHGESTLGDVRNFFEHILSTNFGRSYFI (Mycoplasma NO.: 3 STSETPTAAIRFEGSWLREYVPEHPRRAYLYEIRADQHFYNARATGENLLDLMRQRQ pneumoniae) VVEDSGDREMAQMGIRALRTSFAYQREWFTDGPIAAANVRSAWLVDAVPVEPGHA HHPAGRVVETTRINEPEMHNPHYQELQTQANDQPWLPTPGIATPVHLSIPQAASVAD VSEGTSASLSFACPDWSPPSSNGENPLDKCIAEKIDNYNLQSLPQYASSVKELEDTPV YLRGIKTQKTFMLQADPQNNNVELVEVNPKQKSSFPQTIFFWDVYQRICLKDLTGA QISLSLTAFTTQYAGQLKVHLSVSAVNAVNQKWKMTPQDIAITQFRVSSELLGQTEN GLEWNTKSGGSQHDLYVCPLKNPPSDLEELQIIVDECTTHAQFVTMRAASTFFVDVQ LGWYWRGYYYTPQLSGWSYQMKTPDGQIFYDLKTSKIFFVQDNQNVFFLHNKLNK QTGYSWDWVEWLKHDMNEDKDENFKWYESRDDLTIPSVEGLNFRHIRCYADNQQ LKVIISGSRWGGWYSTYDKVESNVEDKILVKDGFDRF SpyA Toxin SEQ ID MLKKRYQLAMILLLSCFSLVWQTEGLVELFVCEHYERAVCEGTPAYFTFSDQKGAE (Streptococcus NO.: 4 TLIKKRWGKGLVYPRAEQEAMAAYTCQQAGPINTSLDKAKGKLSQLTPELRDQVA pyogenes) QLDAATHRLVIPWNIVVYRYVYETFLRDIGVSHADLTSYYRNHQFNPHILCKIKLGT RYTKHSFMSTTALKNGAMTHRPVEVRICVKKGAKAAFVEPYSAVPSEVELLFPRGC QLEVVGAYVSQDHKKLHIEAYFKGSL HopU1 SEQ ID 
MNINRQLPVSGSERLLTPDVGVSRQACSERHYSTGQDRHDFYRFAARLHVDAQCFG (Pseudomonas NO.: 5 LSIDDLMDKFSDKHFRAEHPEYRDVYPEECSAIYMETAQDYSSHLVRGEIGTPLYRE syringae) VNNYLRLQHENSGREAEIDNHDEKLSPHIKMLSSALNRLMDVAAFRGTVYRGIRGD LDTIARLYHLFDTGGRYVEPAFMSTTRIKDSAQVFEPGTPNNIAFQISLKRGADISGSS QAPSEEEIMLPMMSEFVIEHASALSEGKHLFVLSQI Chelt toxin SEQ ID MKTIISLIFINIFPLFVSAHNGNFYRADSRSPNEIKDLGGLYPRGYYDFFERGTPMSISL NO.: 6 YDHARGAPSGNTRYDDGEVSTTTDIDSAHEIGQNILSGYTEYYIYLIAPAPNLLDVNA VLGRYSPHPQENEYSALGGIPWTQVIGWYVVNNGVLDRNIHRNRQFRADLFNNLSP ALPSESYQFAGFEPEHPAWRQEPWINFAPPGCGRNVRLTKHINQQDCSNSQEELVYK KLQDLRTQFKVDKKLKLVNKTSSNNIIFPNHDFIREWVDLDGNGDLSYCGFTVDSD GSRKRIVCAHNNGNFTYSSINISLSDYGWPKGQRFIDANGDGLVDYCRVQYVWTHL YCSLSLPGQYFSLDKDAGYLDAGYNNSRAWAKVIGTNKYSFCRLTSNGYICTDIDSY STAFKDDDQGWADSRYWMDIDGNGGDDYCRLVYNWTHLRCNLQGKDGLWKRVE SKYLDGGYPSLRFKIKMTSNKDNYCRIVRNHRVMECAYVSDNGEFHNYSLNMPFSL YNKNDIQFIDIDGDNRDDICRYNSAPNTMECYLNQDKSFSQNKLVLYLSAKPISSLGS GSSKIIRTFNSEKNSSAYCYNAGYGTLRCDEFVIY Certhrax toxin SEQ ID MKEIIRNLVRLDVRSDVDENSKKTQELVEKLPHEVLELYKNVGGEIYITDKRLTQHE NO.: 7 ELSDSSHKDMFIVSSEGKSFPLREHFVFAKGGKEPSLIIHAEDYASHLSSVEVYYELG KAIIRDTFPLNQKELGNPKFINAINEVNQQKEGKGVNAKADEDGRDLLFGKELKKNL EHGQLVDLDLISGNLSEFQHVFAKSFALYYEPHYKEALKSYAPALFNYMLELDQMR FKEISDDVKEKNKNVLDFKWYTRKAESWGVQTFKNWKENLTISEKDIITGYTGSKY DPINEYLRKYDGEIIPNIGGDLDKKSKKALEKIENQIKNLDAALQKSKITENLIVYRRV SELQFGKKYEDYNLRQNGIINEEKVMELESNFKGQTFIQHNYMSTSLVQDPHQSYSN DRYPILLEITIPEGVHGAYIADMSEYPGQYEMLINRGYTEKYDKESIVKPTREEDKGK EYLKVNLSIYLGNLNREK EFV toxin SEQ ID MSQLNKWQKELQALQKANYQETDNQLFNVYRQSLIDIKKRLKVYTENAESLSFSTR NO.: 8 LEVERLFSVADEINAILQLNSPKVEKTIKGYSAKQAEQGYYGLWYTLEQSQNIALSM PLINHDYIMNLVNAPVAGKRLSKRLYKYRDELAQNVTNNIITGLFEGKSYAEIARWI NEETEASYKQALRIARTEAGRTQSVTTQKGYEEAKELGINIKKKWLATIDKHTRRTH QELDGKEVDVDEEFTIRGHSAKGPRMFGVASEDVNCRCTTIEVVDGISPELRKDNES KEMSEFKSYDEWYADRIRQNESKPKPNFTELDFFGQSDLQDDSDKWVAGLKPEQV NAMKDYTSDAFAKMNKILRNEKYNPREKPYLVNIIQNLDDAISKFKLKHDIITYRGV SANEYDAILNGNVEKEEKSTSINKKVAEDFLNFTSANKDGRVVKFLIPKGTQGAYIG TNSSMKKESEFLLNRNLKYTVEIVDNILEVTILG ExoT SEQ ID 
MBIQSSQQNPSFVAELSQAVAGRLGQVEARQVATPREAQQLAQRQEAPKGEGLLSR NO.: 9 LGAALARPFVAIIEWLGKLLGSRAHAATQAPLSRQDAPPAASLSAAEIKQMMLQKA LPLTLGGLGKASELATLTAERLAKDHTRLASGDGALRSLATALVGIRDGSLIEASRT QAARLLEQSVGGIALQQWGTAGGAASQHVLSASPEQLREIAVQLHAVMDKVALLR HAVESEVKGEPVDKALADGLVEHFGLEAEQYLGEHPDGPYSDAEVMALGLYTNGE YQHLNRSLRQGRELDAGQALIDRGMSAAFEKSGPAEQVVKTFRGTQGRDAFEAVK EGQVGHDAGYLSTSRDPSVARSFAGLGTITTLFGRSGIDVSEISIEGDEQEILYDKGTD MRVLLSAKDGQGVTRRVLEEATLGERSGHSEGLLDALDLATGTDRSGKPQEQDLRL RMRGLDLA CdtB SEQ ID MKKIICLELSFNLAFANLENFNVGTWNLQGSSAATESKWSVSVRQLVSGANPLDILM NO.: 10 IQEAGTLPRTATPTGRHVQQGGTPIDEYEWNLGTLSRPDRVFIYYSRVDVGANRVNL AIVSRMQAEEVIVLPPPTTVSRPIIGIRNGNDAFFNIHALANGGTDVGAIITAVDAHFA NMPQVNWMIAGDFNRDPSTITSTVDRELANRIRVVFPTSATQASGGTLDYAITGNSN RQQTYTPPLLAAILMLASLRSHIVSDHFPVNERKF Diptheria SEQ ID MSRKLFASILIGALLGIGAPPSAHAGADDVVDSSKSFVMENFSSYHGTKPGYVDSIQ toxin NO.: 11 KGIQKPKSGTQGNYDDDWKGFYSTDNKYDAAGYSVDNENPLSGKAGGVVKVTYP GLTKVLALKVDNAETIKKELGLSLTEPLMEQVGTEEFIKRFGDGASRVVLSLPFAEG SSSVEYINNWEQAKALSVELEINFETRGKRGQDAMYEYMAQACAGNRVRRSVGSS LSCINLDWDVIRDKTKTKIESLKEHGPIKNKMSESPNKTVSEEKAKQYLEEFHQTALE HPELSELKTVTGTNPVFAGANYAAWAVNVAQVIDSETADNLEKTTAALSILPGIGSV MGIADGAVHHNTEEIVAQSIALSSLMVAQAIPLVGELVDIGFAAYNFVESIINLFQVV HNSYNRPAYSPGHKTQPFLHDGYAVSWNTVEDSIIRTGFQGESGHDIKITAENTPLPI AGVLLPTIPGKLDVNKSKTHISVNGRKIRMRCRAIDGDVTFCRPKSPVYVGNGVHAN LHVAFHRSSSEKIHSNEISSDSIGVLGYQKTVDHTKVNSKLSLFFEIKS ExoU/VipB SEQ ID MKLAEIMTKSRKLKRNLLEISKTEAGQYSVSAPEHKGLVLSGGGAKGISYLGMIQAL NO.: 12 QERGKIKNLTHVSGASAGAMTASILAVGMDIKDIKKLIEGLDITKLLDNSGVGFRAR GDRERNILDVIYMMQMKKHLESVQQPIPPEQQMNYGILKQKIALYEDKLSRAGIVIN NVDDIINLTKSVKDLEKLDKALNSIPTELKGAKGEQLENPRLTLGDLGRLRELLPEEN KHLIKNLSVVVTNQTKHELERYSEDTTPQQSIAQVVQWSGAHPVLFVPGRNAKGEY IADGGILDNMPEIEGLDREEVLCVKAEAGTAFEDRVNKAKQSAMEAISWFKARMDS LVEATIGGKWLHATSSVLNREKVYYNIDNMIYINTGEVTTTNTSPTPEQRARAVKNG YDQTMQLLDSHKQTFDHPLMAILYIGHDKLKDALIDEKSEKEIFEASAHAQAILHLQ EQIVKEMNDGDYSSVQNYLDQIEDILTVDAKMDDIQKEKAFALCIKQVNFLSEGKLE TYLNKVEAEAKAAAEPSWATKILNLLWAPIEWVVSLFKGPAQDFKVEVQPEPVKVS 
TSENQETVSNQKDINPAVEYRKHAEVRREHTDPSPSLQEKERVGLSTTFGGH HopPtoE SEQ ID MNRVSGSSSATWQAVNDLVEQVSERTTLSTTGYQTAMGRLNKPEKSDADALMTM NO.: 13 RRAQQYTDSAKRTYISETLMNLADLQQRKIYRTNSGNLRGAIEMTPTQLTDCVQKC REEGFSNCDIQALEIGLHLRHKLGISDFTIYSNRKLSHNYVVIHPSNAFPKGAIVDSWT GQGVVELDEKTRLKEKHREENYAVNANMHEWIERYGQAHVID HopPtoF SEQ ID MGNICGTSGSRHVYSPSHTQRITSAPSTSTHVGGDTLTSITIQLSHSQREQFLNMHDP NO.: 14 MRVMGLDHDTELFRTTDSRYIKNDKLAGNPQSMASILMHEELRPNRFASHTGAQPH EARAYVPKRIKATDLGVPSLNVMTGSLARDGIRAYDHMSDNQVSVKMRLGDFLER GGKVYADASSVADDGETSQALIVTLPKGQKVPVERV HopPtoG SEQ ID MQ1KNSHLYSASRMVQNTENASPKMEVTNAIAKNNEPAALSATQTAKTHEGDSKG NO.: 15 QSSNNSKLPFRAMRYAAYLAGSAYLYDKTANNFFLSTTSLHDGKGGFTSDARLNDA QDKARKRYQNNHSSTLENKNSLLSPLRLCGENQFLTMIDYRAATKIYLSDLVDTEQ AHTSILKNIMCLKGELTNEEAIKKLNPEKTPKDYDLTNSEAYISKNKYSLTGVKNEET GSTGYTSRSITKPFVEKGLKHFIKATHGEKALTPKQCMETLDNLLRKSITLNSDSQFA AGQALLVFRQVYAGEDAWGDAERVILKSHYNRGTVLQDEADKIELSRPFSEQDLAK NMFKRNTSIAGPVLYHAYIYIQEKIFKLPPDKIEDLKHKSMADLKNLPLTHVKLSNSG VGFEDASGLGDSFTALNATSCVNHARIMSGEPPLSKDDVVILIGCLNAVYDNSSGIR HSLREIARGCFVGAGFTVQDGDDFYKQICKNASKQFYNG VopF SEQ ID MFKISVSQQANVMSTSDTAQRSSLKISIKSICNKSLSKKLHTLAEKCRRFSQELKEHT NO.: 16 ASKKQIVEQATTTVRESSLTKSDSELGSSRSLLTSDVLSSSSSHEDLTAVNLEDNDSV FVTIESSSELIVKQDGSIPPAPPLPGNIPPAPPLPSAGNIPTAPGLPKQKATTESVAQTSD NRSKLMEEIRQGVKLRATPKSSSTEKSASDPHSKLMKELINHGAKLKKVSTSDIPVPP PLPAAFASKPTDGRSALLSEIAGFSKDRLRKAGSSETLNVSQPTVAESSIPEAYDLLLS DEMFNLSPKLSETELNTLADSLADYLFKAADIDWMQVIAEQTKGSTQATSLKSQLE QAPEYVKAFCDEILKFPDCYKSADVASPESPKAGPSSVIDVALKRLQAGRNRLFSTID AKGTNELKKGEAILESAINAARSVMTAEQKSALLSSNVKSATEKVESELPCMEGFAE QNGKAAFNALRLAFYSSIQSGDTAQQDIARFMKENLATGFSGYSYLGLTSRVAQLE AQLAALTTK YopJ SEQ ID MIGPISQINISGGLSEKETSSLISNEELKNIITQLETDISDGSWFHKNYSRMDVEVMPAL NO.: 17 VIQANNKYPEMNLNLVTSPLDLSIEIKNVIENGVRSSRFIINMGEGGIHFSVIDYKHIN GKTSLILFEPANENSMGPAMLAIRTKTAIERYQLPDCHFSMVEMDIQRSSSECGIFSFA LAKKLYIERDSLLKIHEDNIKGILSDGENPLPHDKLDPYLPVTFYKHTQGKKRLNEYL NTNPQGVGTVVNKKNETIVNRFDNNKSIVDGKELSVSVHKKRIAEYKTLLKV AvrPtoB SEQ ID MAGINGAGPSGAYFVGHTDPEPASGGAHGSSSGASSSNSPRLPAPPDAPASQARDRR NO.: 
18 EMLLRARPLSRQTREWVAQGMPPTAEAGVPIRPQESAEAAAPQARAEERHTPEADA AASHVRTEGGRTPQALAGTSPRHTGAVPHANRIVQQLVDAGADLAGINTMIDNAM RRHAIALPSRTVQSILIEHFPHLLAGELISGSELATAFRAALRREVRQQEASAPPRTAA RSSVRTPERSTVPPTSTESSSGSNQRTLLGRFAGLMTPNQRRPSSASNASASQRPVDR SPPRVNQVPTGANRVVMRNHGNNEADAALQGLAQQGVDMEDLRAALERHILHRR PIPMDIAYALQGVGIAPSIDTGESLMENPLMNLSVALHRALGPRPARAQAPRPAVPV APATVSRRPDSARATRLQVIPAREDYENNVAYGVRLLSLNPGAGVRETVAAFVNNR YERQAVVADIRAALNLSKQFNKLRTVSKADAASNKPGFKDLADHPDDATQCLFGEE LSLTSSVQQVIGLAGKATDMSESYSREANKDLVFMDMKKLAQFLAGKPEHPMTRET LNAENIAKYAFRIVP SdbA SEQ ID MHKKYNYYSLEKEKKTFWQHILDILKAPERLPGWVVSFFLARNITHVALNPNNIPQQ NO.: 19 RLIHLTKTSNRPEDDIVVINFKKRPPHKWENDTLIKIANTIAALPFVTPRLRTRLHYDN ENDINHVNKLLAEIDALVQGKSKQKYCKGRAFDWSKIHLKGLEFLDPKMRGYVYE QLHEKYGYVSYTTKRKPNIEFFTLKTPDGSELDSVQVTGEDEEKKPMGERKFIITCIA RDQNFINWIKDLNYTAKNLGATAISFNYRGVDYSRGLVWTENNLVDDILAQVQRLI SLGADPKNICLDGMCIGGAVATIAAAKLHEKGMKVKLNNERSFTSLSSLVFGFIVPE LQTANWWSPLTYGRFLLAGVVYALLTPLIWLAGWPVDVTKAWNRIPAQDKMYSV VRDKDNGLYDGVIHDHFCSIASLVDSQINSILYKLSTDQPLTEEEKQILCDDQFSHHF KPSQSVLKNPKYKGPHFISRQDLVAELGHREEYTNHDYFLDRLREKFQLDRATRPV ALAEDGEKDIDGISSQLSNNKERPLIIASSGGTGHISATHGHNDLQSKTDNVVITQHH AELYKNKPFSITSVLIRIGVWFTSLPILEDILKGVMRFIGYPVLPSSSIFWDQMSKIQQS ETKKENGIETGRTRPYVDMLLDIYPEGYEYTAFNNATHLTSSIEDIQTMISFKGHVEE DNRNIVYQNILQRLMHAAKQNTPYTRLISTQALSLGAICDAVKYYNTVFLPVYNAE RGTSYQPIAIDQYMTDLPSLGCIHFMNNLEELTSEQRQLMEIHAVNMSEPFKEAHFG KEQGFKAVHNIDPRNNPMIRNAFKDPSLTKYLDKTQSFDLHENVYKKEKQNALPVL NGKEKITIKPHAKIASIMIGSLAANASADYAKYLLNQGYEHIFLFGGLNDSIAARIDQI INSYPAPTRDEIRKKIILLGNQSDVEMAPIMTRSNCVVIRGGGLSVMEQMAMPIMDD KIVLLHEIEDNEEGPLTSGLSWEDGNSDKLIEYLSEKGAYAKKTSPGLCSGHLHEAEK SFEKKYHGQLKSTETKKKVDLTIPQQETYSLKKEWDRKTGYTESGHIL SHQHRFFNT IPEVREPFCSKEDLHHNELSSQSLVSVSAG SidG SEQ ID MSRSKDEVLEANDSLFGITVQTWGTNDRPSNGMMNFADQQFFGGDVGHASINMKL NO.: 20 PVTDKTKQWIEKYCYSQTYDQFKKVKGNEDKTYEEYLKTAKRLIPVELKTQVTRKA QYDSNGNLVTTHEKAYEQIYFDIDWSWWPGRLQNTEDDMVWEREGKHFEYDEKW KEYLQPEQRVHRGKLGSRKMDYAPTSIIIIQRDIPTSELEKITRDHKEITIEEKLNVVK LLQSKIDEMPHTKMSPSMELMFKNLGINVEKLLDETKDNGVDPTNLEAMREYLTNR 
LTERKLELETELSEAKKEVDSTQVKNKVEDVYYDFEYKLNQVRKKMEEVNSQLEK MDSLLHKLEGNTSGPIPYTAEIDELMSVLPFLKEELELENGTLSPKSIENLIDHIDELK NELASKQEKKNERNLNLIKKYEELCEQYKDDEEGLEEALWEEGIDVEEVNSAKKDIS KPAPEIQKLTDLQEQLRNHKESGVKLSSELEETLNSSVKMWKTKIDSPCQVISESSVK ALVSKINSTRPELVKEKEQLPEQEESLSKEAKKAQEELIKIQEFSQFYSENSSAYMVIG LPPHHQVSLPLAVNGKRGLHPEAMLKKMHELVAGPEKKEFNLHTNNCSLTSIEVLS AGAQHDPLLHSIMGTRALGFFGTPQQVLENAKLTSKTINEGKKSNIFTPLVTASPLDR ALGYAMSIYMDPEASKAKQNAGLALGVLVGLAKTPGHIGSLLNPKQGFNDILNTLN LVYSRNSTGLKVGLTLMALPAMIVLAPLAAIQKGVEVIAETIAKPFKLIANLFKQKPE STDEITVSVGSKKVAEKEGSYSNTALAGLVNSKIKSKIDENTITVEFQKSPQKMIEEIE SQLKENPGKVVVLSEKAHNAVLKFVSKSDDEALKQKFYDCCNQSVARSQKFAPKT RDEIDELVEEVTSTDKTELTTSPRQEPSMSSTIDEEENIDSEHQIETGTESTMRI VpdA SEQ ID MKTKQEVSQQDKLKDSKSSTPLQTKETWFISDALNITFDPYDFSISVTEQAPMPYRIV NO.: 21 FSGGGSRILAHIGALDELTRHGLKFTEFSGSSAGAMVAAFAYLGYNCSEIKQIISWFN EDKLLDSPLIFNFNNIKQIFNKGGLSSAKLMRQAANYVILKKVMDIISDEKFKTRFAK FQNFLEENIYRCPENITFQTLARIKEICPECELGEKLFITGTNLSTQKHEVFSIDTTPSM ALADAIIISANLPIAFERICYQGNVYSDGGISNNLPAHCFSEKGHKTTFLKHKDDVDFS VLALQFDNGLEENALYSQNPIPKWSWLSNTFYSLITGHPNVTENWYEDLQILRRHAH QSILIKTPTIALTNLTISQDTKKALVESGRTAAKTYLELHEFYTDDYGNIRHNECLHE KFQKPEELLDYCVLHSHFELLKKIKQAISCSQYLEKGYKHYLCELCDNLLPPQLKCP NEGSGTEQPEIKLEKDTIICEKNNNSGLTFSMTFFGVPSPLVKTLNQDSPELKIKLFTG LYPILIQNWQNLCPVSGISGILNSIRMSFVEISSTDTCIKTLIDKLNEIEIGHFLIFVFKAA LKNYDKHDFILLLKNLKHLHHSIELIRNKPFHSDDRFYGQWSFEGHDPKRILEFIKSD DISGLMTILEDKKALPNNKPN Lpg0969 SEQ ID MVSLEHIQKLISECRKLGKDGLDNGTNGLIPELEIDVVPPSAFLGVGNNPAIFVNSKT NO.: 22 YKLMRTTHEKWVENKTIVFKSYLLSQPAIKIIGAIVHETGHAFNVAAKIPNTEANACI FEIEVLMRLFQVKSPLLLGCTELDMQSYFKSRLTDYNKCVKDCQCLAEMVEFITHQF KLDEVSISEKENQIPLLSISNKWPGLFAKKQIAPDMDKLLTSPVTITPEVKILFYQLVK EHFHSPETEIKLDI Lpg1978 SEQ ID MYKIYSYLGWRIDMKTENLPQAGQEAQIDKKIHFIWVGHEVIPQKNIQVVSEWAEKN NO.: 23 PGYETHWVDKKIAPAKELDLFILDMKSKGITVKDINEEGVCRDSIRHELDQESPNYG MVSDMLRLNILAAEGGIYLDSDILCSAPFPDEIYAPFGFLLSPWSQGANNTLCNDIILC SKGNQIIQQLADAIEQSYIARDSFEFTHEYASMKETKGERIAKTLGVTGPGFLFHQLK KMGILNDKSEMEAIHWELQDQRYLIDGSVKEPDYFYVPQNNTNDASWVPSIKRPGIE 
NMSFQERLENAVQLIAFDIQKTGLFNLDHYANELKVKQNSWCIAAETSPELKPDSYL LIRPRDKTGEWTLYYVDEDKKLNPVTLPVIKGAIKLSEVSDPLRKFHTLLSQVSDPV NPTAHELKQIGRALIELKPRQDEWHCKNKWSGAEEIAQELWQRITSNETLRAQIKQC FTQFESLKPRVAELGLTRASGAGTEVEAHESTVKEQEIISQNTVGEEGTKEKNSVQL ASENSSDEKIKTAHDLIDEIIQDVIQLDGKLGLLGGNTRQLEDGRVINIPNGAAMIFDD YKKYKQGELTAESALESMIKIAKLSNQLNRHTFFNQRQPETGQFYKKVAAIDLQTTI AAEYDNNHGLRI YopE SEQ ID MKISSFISTSLPLPTSVSGSSSVGEMSGRSVSQQTSDQYANNLAGRTESPQGSSLASRII NO.: 24 ERLSSVAHSVIGFIQRMFSEGSHKPVVTPAPTPAQMPSPTSFSDSIKQLAAETLPKYM QQLNSLDAEMLQKNHDQFATGSGPLRGSITQCQGLMQFCGGELQAEASAILNTPVC GIPFSQWGTIGGAASAYVASGVDLTQAANEIKGLAQQMQKLLSLM SptP SEQ ID MLKYEERKLNNLTLSSFSKVGVSNDARLYIAKENTDKAYVAPEKFSSKVLTWLGK NO.: 25 MPLFKNTEVVQKHTENIRVQDQKILQTFLHALTEKYGETAVNDALLMSRINMNKPL TQRLAVQITECVKAADEGFINLIKSKDNVGVRNAALVIKGGDTKVAEKNNDVGAES KQPLLDIALKGLKRTLPQLEQMDGNSLRENFQEMASGNGPLRSLMTNLQNLNKIPE AKQLNDYVTTLTNIQVGVARFSQWGTCGGEVERWVDKASTHELTQAVKKIHVIAK ELKNVTAELEKIEAGAPMPQTMSGPTLGLARFAVSSIPINQQTQVKLSDGMPVPVNT LTFDGKPVALAGSYPKNTPDALEAHMKMLLEKECSCLVVLTSEDQMQAKQLPPYF RGSYTFGEVHTNSQKVSSASQGEAIDQYNMQLSCGEKRYTIPVLHVKNWPDHQPLP STDQLEYLADRVKNSNQNGAPGRSSSDKHLPMIHCLGGVGRTGTMAAALVLKDNP HSNLEQVRADFRDSRNNRMLEDASQFVQLKAMQAQLLMTTAS SopE2 SEQ ID MTNITLSTQHYRIHRSDVEPVKEKTTEKDIFAKSITAVRNSFISLSTSLSDRFSLHQQT NO.: 26 DIPTTHFHRGNASEGRAVLTSKTVKDFMLQKLNSLDIKGNASKDPAYARQTCEAILS AVYSNNKDQCCKLLISKGVSITPFLKEIGEAAQNAGLPGEIKNGVFTPGGAGANPFV VPLIASASIKYPHMFINHNQQVSFKAYAEKIVMKEVTPLFNKGTMPTPQQFQLTIENI ANKYLQNAS SopB/SigD SEQ ID MQIQSFYHSASLKTQEAFKSLQKTLYNGMQILSGQGKAPAKAPDARPEIIVLREPGA NO.: 27 TWGNYLQHQKASNHSLHNLYNLQRDLLTVAATVLGKQDPVLTSMANQMELAKVK ADRPATKQEEAAAKALKKNLIELIAARTQQQDGLPAKEAHRFAAVAFRDAQVKQL NNQPWQTIKNTLTHNGHHYTNTQLPAAEMKIGAKDIFPSAYEGKGVCSWDTKNIHH ANNLWMSTVSVHEDGKDKTLFCGIRHGVLSPYHEKDPLLRHVGAENKAKEVLTAA LFSKPELLNKALAGEAVSLKLVSVGLLTASNIFGKEGTMVEDQMRAWQSLTQPGK MIHLKIRNKDGDLQTVKIKPDVAAFNVGVNELALKLGFGLKASDSYNAEALHQLLG NDLRPEARPGGWVGEWLAQYPDNYEVVNTLARQIKDIWKNNQHHKDGGEPYKLA QRLAMLAHEIDAVPAWNCKSGKDRTGMMDSEIKREIISLHQTHMLSAPGSLPDSGG 
QKIFQKVLLNSGNLEIQKQNTGGAGNKVMKNLSPEVLNLSYQKRVGDENIWQSVK GISSLITS SipA SEQ ID MVTSVRTQPPVIMPGMQTEIKTQATNLAANLSAVRESATTTLSGEIKGPQLEDFPALI NO.: 28 KQASLDALFKCGKDAEALKEVFTNSNNVAGKKAIMEFAGLFRSALNATSDSPEAKT LLMKVGAEYTAQIIKDGLKEKSAFGPWLPETKKAEAKLENLEKQLLDIIKNNTGGEL SKLSTNLVMQEVMPYIASCIEHNFGCTLDPLTRSNLTHLVDKAAAKAVEALDMCHQ KLTQEQGTSVGREARHLEMQTLIPLLLRNVFAQIPADKLPDPKIPEPAAGPVPDGGK KAEPTGINININIDSSNHSVDNSKHINNSRSHVDNSQRHIDNSNHDNSRKTIDNSRTFI DNSQRNGESHHSTNSSNVSHSHSRVDSTTHQTETAHSASTGAIDHGIAGKIDVTAHA TAEAVTNASSESKDGKVVTSEKGTTGETTSFDEVDGVTSKSIIGKPVQATVHGVDDN KQQSQTAEIVNVKPLASQLAGVENVKTDTLQSDTTVITGNKAGTTDNDNSQTDKTG PFSGLKFKQNSFLSTVPSVTNMIISMHFDARETFLGVIRKALEPDTSTPFPVRRAFDGL RAEILPNDTIKSAALKAQCSDIDKHPELKAKMETLKEVITHHPQKEKLAEIALQFARE AGLTRLKGETDYVLSNVLDGLIGDGSWRAGPAYESYLNKPGVDRVITTVDGLHMQ R YpkA SEQ ID MKSVKIMGTMPPSISLAKAHERISQHWQNPVGELNIGGKRYRIIDNQVLRLNPHSGF NO.: 29 SLFREGVGKIFSGKMFNFSIARNLTDTLHAAQKTTSQELRSDIPNALSNLFGAKPQTE LPLGWKGEPLSGAPDLEGMRVAETDKFAEGESHISIIETKDKQRLVAKIERSIAEGHL FAELEAYKHIYKTAGKHPNLANVHGMAVVPYGNRKEEALLMDEVDGWRCSDTLR TLADSWKQGKINSEAYWGTIKFIAHRLLDVTNHLAKAGVVHNDIKPGNVVFDRASG EPVVIDLGLHSRSGEQPKGFTESFKAPELGVGNLGASEKSDVFLVVSTLLHCIEGFEK NPEIKPNQGLRFITSEPAHVMDENGYPIHRPGIAGVETAYTRFITDILGVSADSRPDSN EARLHEFLSDGTIDEESAKQILKDTLTGEMSPLSTDVRRITPKKLRELSDLLRTHLSSA ATKQLDMGGVLSDLDTMLVALDKAEREGGVDKDQLKSFNSLILKTYRVIEDYVKG REGDTKNSSTEVSPYHRSNFMLSIVEPSLQRIQKHLDQTHSFSDIGSLVRAHKHLETL LEVLVTLSQQGQPVSSETYGFLNRLAEAKITLSQQLNTLQQQQESAKAQLSILINRSG SWADVARQSLQRFDSTRPVVKFGTEQYTAIHRQMMAAHAAITLQEVSEFTDDMRN FTVDSIPLLIQLGRSSLMDEHLVEQREKLRELTTIAERLNRLEREWM YopM SEQ ID MFINPRNVSNTFLQEPLRHSSNLTEMPVEAENVKSKTEYYNAWSEWERNAPPGNGE NO.: 30 QREMAVSRLRDCLDRQAHELELNNLGLSSLPELPPHLESLVASCNSLTELPELPQSLK SLQVENNNLKALPDLPPSLKKLHVRENDLTDLPELPQSLESLRVDNNNLKALSDLPP SLEYLTASSNKLEELPELQNLPFLAAIYADNNLLETLPDLPPSLKKLHVRENDLTDLP ELPQSLESLQVDNNNLKALSDLPPSLEYLTASSNKLEELPELQNLPFLAAIYADNNLL ETLPDLPPHLEILVASYNSLTELPELPQSLKSLRVDNNNLKALSDLPPSLEYLTASSNK LEELPELQNLPFLAAIYADNNLLETLPDLPPSLKKLHVRENDLTDLPELPQSLTFLDVS 
DNNISGLSELPPNLYYLDASSNEIRSLCDLPPSLVDLNVKSNQLSELPALPPHLERLIA SFNYLAEVPELPQNLKQLHVEQNALREFPDIPESLEELEMDSERVVDPYEFAHETTD KLEDDVFE Amatoxin SEQ ID MSDINATRLPIWGIGCNPCVGDDVTTLLTRGEALC NO.: 31 Phallacidin SEQ ID MSDINATRLPAWLVDCPCVGDDVNRLLTRGESLC NO.: 32 Killer toxin SEQ ID MIKPERSILTILIGILCLLAYVLANGEPHDGDNEWSSYCSDQGFRRSDDGLVTTPDVG KP1 NO.: 33 QESIGKNSINGSELVDYLQCLKVRLNGQKQVVSNDGWLLLLVQEPSVNVTQKAMSE CNYNVSSGHKAGSYIQVTNTPADYKVISRRGSYEGDQLPEDVKPYFGVQKTSDYRPI SKRINPNLTLRQLAYNFAALNMCSLWCNSCISRSCPYYIAELTVHVNNIHHGTVWLH HFCRNASPQGGNLYSTLTISHKDTAYYVGTGWWKVRSTAATTNDVAGDWYPASW NQYWCGPHY Killer toxin SEQ ID MLIFSVLMYLGLLLAGASALPNGLSPRNNAFCAGFGLSCKWECWCTAHGTGNELR KP6 NO.: 34 YATAAGCGDHLSKSYYDARAGHCLFSDDLRNQFYSHCSSLNNNMSCRSLSKRTIQD SATDTVDLGAELHRDDPPPTASDIGKRGKRPRPVMCQCVDTTNGGVRLDAVTRAA CSIDSFIDGYYTEKDGFCRAKYSWDLFTSGQFYQACLRYSHAGTNCQPDPQYE Killer Toxin SEQ ID MTKPTQVLVRSVSILFFITLLIILVVALNDVAGPAETAPVSLLPREAPWYDKIWEVKD K1 NO.: 35 WLLQRATDGNWGKSITWGSEVASDAGVVIEGINVCKNCVGERKDDISTDCGKQTLA LLVSIFVAVTSGHHLIWGGNRPVSQSDPNGATVARRDISTVADGDIPLDFSALNDILN EHGISILPANASQYVKRSDTAEHTTSFVVTNNYTSLHTDLIHHGNGTYTTFTTPHIPA VAKRYVYPMCEHGIKASYCMALNDAMVSANGNLYGLAEKLFSEDEGQWETNYYK LYWSTGQWIMSMKFIEESIDNANNDFEGCDTGH Killer Toxin SEQ ID MGHLAILFSHAVLNIATAVASSDSIYLKGHRVGQDIDSLYRVYDNGTMYPVTFNEW K28 (KHR) NO.: 36 LNDLTGMNDLATNNATILKRDSSDVSCVTETCQYVDYHVDDEGVITIDISTYRIPVE WDSGSAGNASYGVSKRDTKYETECKKKICGINVSGFCNAYDFAVHAFDEGGSVYNP VSGITDRIKEATKRDKTECLGYELDHVRIDPAVDWSISISTWKQGSANCDTQASADS LKCAAQKALESEHNHQKTAFCIHLDNGGSFNLDIRLISELSFSKYNPWALPCPKYKG SNSWQVVSDCFQ Killer Toxin SEQ ID MPRFAIIFALLIAYSLFLSTLFTGSIPDRANTVTSNAPCQVVIWDWIRTRRICNCCSRLC K28 (KHS) NO.: 37 YSLLGRSNLSRTAKRGVCTIAGAVLATAAVIVAAVLVGKSSGSATKRGLTKTISVLN HTIPFTDHILNGQTLSNGTGSNEVTIGFSGYAVHATIKRASTTDIISWVIPESMEPTLAR VASYVSSSSINLAAVPDTGGNASALSFQNAVQEFATSWVSMTYDQSYGDLRNVAN DEGGEEILILMRKRSYRISFQVIETGSTALLLRTRRVVSQLITMTYLVTVQARVGIQIG DIFQHYGGIDNYVMTSISVLRTLEDKAFHENKLLIVREPPNKSNQDANQSYRLRPFSA NDLIQNLKSVDIGFLAFCSFEDKYAHYPEIEVIMKITIFISKGNLWSHYVIQARYVRKR 
VMKVRGQMPGGLLTNMESLLNIVSTPNLNISEFHIQTHSMSQSKPMYFQKQCYSSQ NNIIYIYNSIHITCGAVYVIVHDVRTPSVFVLIELRNCKPLKNSWCETTKTSPRDTKIK KNEYNETVCRRAGALLDGRVRTIRFLMMRTHWSRVKGVSCNTANRLSRFCNHVVS YYPSQNATHILLPTSLRAESLEQQYTTRPLSSSNNRFCCLKSIFINNCKKACESPSLVS CNLQQTAELLMVYYLYICEACYVSRNHDLLSKQCMSTVRAVYVARMRLPKFRSTFP CMPRLCWLVNGVVVV Anthrax SEQ ID MHVKEKEKNKDENKRKDEERNKTQEEHLKEEVIKHIVKIEVKGEEAVKKEAAEKLL lethal factor NO.: 38 EKVPSDVLEMYKAIGGKIYIVDGDITKHISLEALSEDKKKIKDIYGKDALLHEHYVY endopeptidase AKEGYEPVLVIQSSEDYVENTEKALNVYYEIGKILSRDILSKINQPYQKFLDVLNTIK NASDSDGQDLLFTNQLKEHPTDFSVEFLEQNSNEVQEVFAKAFAYYIEPQHRDVLQL YAPEAFNYMDKFNEQEINLSLEELKDQRMLSRYEKWEKIKQHYQHWSDSLSEEGRG LLKKLQIPIEPKKDDIIHSLSQEEKELLKRIQIDSSDFLSTEEKEFLKKLQIDIRDSLSEEE KELLNRIQVDSSNPLSEKEKEFLKKLKLDIQPYDINQRLQDTGGLIDSPSINLDVRKQ YKRDIQNIDALLHQSIGSTLYNKIYLYENMNINNLTATLGADLVDSTDNTKINRGIFN EFKKNFKYSISSNYMIVDINERPALDNERLKWRIQLSPDTRAGYLENGKLILQRNIGL EIKDVQIIKQSEKEYIRIDAKVVPKSKIDTKIQEAQLNINQEWNKALGLPKYTKLITFN VHNRYASNIVESAYLILNEWKNNIQSDLIKKVTNYLVDGNGRFVFTDITLPNIAEQYT HQDEIYEQVHSKGLYVPESRSILLHGPSKGVELRNDSEGFIHEFGHAVDDYAGYLLD KNQSDLVTNSKKFIDIFKEEGSNLTSYGRTNEAEFFAEAFRLMHSTDHAERLKVQKN APKTFQFINDQIKFIINS Shiga Toxin SEQ ID MKCILLKWVLCLLLGFSSVSYSREFTIDFSTQQSYVSSLNSIRTEISTPLEHISQGTTSV NO.: 39 SVINHTPPGSYFAVDIRGLDVYQARFDHLRLIIEQNNLYVAGFVNTATNTFYRFSDFA HISVPGVTTVSMTTDSSYTTLQRVAALERSGMQISRHSLVSSYLALMEFSGNTMTRD ASRAVLRFVTVTAEALRFRQIQREFRQALSETAPVYTMTPGDVDLTLNWGRISNVLP EYRGEDGVRVGRISFNNISAILGTVAVILNCHHQGARSVRAVNEESQPECQITGDRPV IKINNTLWESNTAAAFLNRKSQSLYTTGE Saporin Toxin SEQ ID MKSWIMLVVTWLIILQTTVTAVIIYELNLQGTTKAQYSTFLKQLRDDIKDPNLHYGG NO.: 40 TNLPVIKRPVGPPKFLRVNLKASTGTVSLAVQRSNLYVAAYLAKNNNKQFRAYYFK GFQITTNQLNNLFPEATGVSNQQELGYGESYPQIQNAAGVTRQQAGLGIKKLAESMT KVNGVARVEKDEALFLLIVVQMVGEAARFKYIENLVLNNFDTAKEVEPVPDRVIILE NNWGLLSRAAKTANNGVFQTPLVLTSYAVPGVEWRVTTVAEVEIGIFLNVDNNGLP SIIYNNIISGAFGDTY Ricin Toxin SEQ ID MYAVATWLCFGSTSGWSFTLEDNNIFPKQYPIINFTTAGATVQSYTNFIRAVRGRLT NO.: 41 TGADVRHDIPVLPNRVGLPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSA 
YFFHPDNQEDAEAITHLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEA ISALYYYSTGGTQLPTLARSFIICIQMISEAARFQYIEGEMRTRIRYNRRSAPDPSVITL ENSWGRLSTAIQESNQGAFASPIQLQRRNGSKFSVYDVSILIPHALMVYRCAPPPSSQ FSLLIRPVVPNFNADVCMDPEPIVRIVGRNGLCVDVRDGRFHNGNAIQLWPCKSNTD ANQLWTLKRDNTIRSNGKCLTTYGYSPGVYVMIYDCNTAATDATRWQIWDNGTIIN PRSSLVLAATSGNSGTTLTVQTNIYAVSQGWLPTNNTQPFVTTIVGLYGLCLQANSG QVWIEDCSSEKAEQQWALYADGSIRPQQNRDNCLTSDSNIRETVVKILSCGPASSGQ RWMFKNDGTILNLYSGLVLDVRRSDPSLKQIILYPLHGDPNQIWLPLF
[0122] In some embodiments, the death agent is an overexpressed product of a genetic element selected from DNA or RNA. In some embodiments, the genetic element is a Growth Inhibitory (GIN) sequence, such as GIN11.
[0123] In some embodiments, the death agent is a ribosomally encoded xenobiotic agent, a ribosomally encoded poison, a ribosomally encoded endogenous or exogenous gene that results in severe growth defects upon mild overexpression, a ribosomally encoded recombinase that excises a gene essential for viability, a limiting factor involved in the synthesis of a toxic secondary metabolite, or any combination thereof. In some embodiments, the ribosomally encoded death agent is Cholera toxin, SpvB toxin, CARDS toxin, SpyA Toxin, HopU1, Chelt toxin, Certhrax toxin, EFV toxin, ExoT, CdtB, Diphtheria toxin, ExoU/VipB, HopPtoE, HopPtoF, HopPtoG, VopF, YopJ, AvrPtoB, SdbA, SidG, VpdA, Lpg0969, Lpg1978, YopE, SptP, SopE2, SopB/SigD, SipA, YpkA, YopM, Amatoxin, Phallacidin, Killer toxin KP1, Killer toxin KP6, Killer Toxin K1, Killer Toxin K28 (KHR), Killer Toxin K28 (KHS), Anthrax lethal factor endopeptidase, Shiga Toxin, Saporin Toxin, Ricin Toxin, or any combination thereof. The cytotoxic reporter or death agent may be a protein with a sequence selected from SEQ ID NOs: 1-41. The cytotoxic reporter may also be a variant of a naturally occurring cytotoxic reporter. Such a variant can have at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to any of SEQ ID NOs: 1-41.
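The text does not state how percent sequence identity is computed, and several conventions are in use. The following minimal Python sketch assumes the two sequences have already been aligned elsewhere (for example, by a global pairwise alignment tool), uses '-' for gaps, and reports identity relative to the ungapped length of the reference SEQ ID sequence; `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity of two sequences taken from one pairwise alignment.

    Assumed convention: both strings have equal length with '-' for gaps,
    and identity is relative to the ungapped length of the reference.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_ref)
        if q == r and r != "-"  # identical residue, not a shared gap
    )
    ref_length = sum(1 for r in aligned_ref if r != "-")
    return 100.0 * matches / ref_length


# Hypothetical alignment against the first 10 residues of SEQ ID NO: 31,
# with one substitution (A->S) and one terminal gap.
ref = "MSDINATRLP"
query = "MSDINSTRL-"
print(percent_identity(query, ref))  # 80.0
```

Under this convention the example variant would meet the 70%, 75%, and 80% thresholds but not 85%.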
[0124] Along with one or more positive selection markers, a plasmid can also include one or more negative selection markers under the control of a different DNA binding sequence to enable binary selection. The plasmid can encode one or more of the negative selection markers in Table 1, driven by a promoter that depends on the DBD present in the bait protein integration plasmid through its DNA Binding Sequence (DBS): for example, the LexAop sequence (DBS), which can be bound by LexA (DBD). In some embodiments, to ensure repression of the ‘death agents,’ the plasmid can include a silencing construct such as a TetR'-Tup11 fusion driven by a strong promoter (such as ADH1) to bind the DBD and silence transcription in the presence of doxycycline. The plasmid can also comprise bacterial selection and propagation markers (e.g., ori and AmpR) and yeast replication and selection markers (e.g., LEU2 and CEN or 2-micron).
[0125] Disclosed herein, in certain embodiments, is a library of plasmid vectors, each plasmid vector comprising a DNA sequence encoding a different peptide sequence operably linked to a first switchable promoter; a DNA sequence encoding a death agent under control of a second switchable promoter; and a DNA sequence encoding a positive selection reporter under control of a third switchable promoter. Plasmids comprising a promoter driving different E3 ubiquitin ligases may also be included in the library of vectors.
Addition or Expression of Modulators
[0126] A molecule from a library that can selectively bridge a bait protein of interest and a specific E3 ubiquitin ligase leading to the bait protein degradation can be screened by use of positive and/or negative selection markers in a host cell.
[0127] In some embodiments, the molecule is a small molecule. In some embodiments, the small molecule is a peptidomimetic. The host cell can be made permeable to small molecules, for example by deletion of genes encoding drug efflux pumps, such as PDR5. Genes encoding transcription factors that induce expression of efflux pumps, such as PDR1 and PDR3, can likewise be deleted; the efflux pump genes include, but are not restricted to, the 12 genes described for 12geneΔ0HSR (Chinen, 2011). The host cell could be further permeabilized to small molecules by interfering with the synthesis and deposition of ergosterol in the plasma membrane, such as by deleting ERG2, ERG3, and/or ERG6 or by placing their expression under a regulatable promoter.
[0128] In other embodiments, the molecule is a peptide, macrocycle or protein. In some embodiments, the peptide or protein is derived from a naturally occurring protein product. In another embodiment, the peptide or protein is a synthesized protein product. In other embodiments, the peptide or protein is a product of recombinant genes.
[0129] In some embodiments, the molecule is introduced into the host cell exogenously. In other embodiments, the molecule is the expression product of test DNA inserted into the host cell, wherein the test DNA comprises DNA sequences that encode a polypeptide. DNA-encoded libraries can be formed by delivering a plurality of test DNA molecules into host cells. In some embodiments, the peptide sequences of the polypeptides in the library are random. In some embodiments, the different peptide sequences are pre-enriched for binding to a target.
[0130] To screen for peptides that selectively facilitate the degradation of a protein of interest, peptides from a randomized peptide library can be applied to, or expressed within, the host cell. A plasmid can further be used to express a randomized peptide library (such as randomized NNK 60-mer sequences). The plasmid can include a restriction site for integration of a randomized peptide library driven by a strong promoter (such as the ADH1 promoter) or an inducible promoter (such as the GAL1 promoter).
[0131] In some embodiments, the peptides of the randomized peptide library are about 60-mers. In some embodiments, they are from about 5-mers to 20-mers. In some embodiments, they are shorter than 15-mers.
[0132] The library can also begin with a fixed sequence of, for example, Methionine-Valine-Asparagine (MVN) for N-terminal stabilization and/or another combination of long-half-life N-end residues (see, e.g., Varshavsky, Proc. Natl. Acad. Sci. USA 93:12142-12149 (1996)) to maximize the half-life of the peptide, and end with the 3′UTR of a short protein (such as sORF1). The peptide can also be tagged with a protein tag such as Myc. In some embodiments, the N-terminal residues of the peptide comprise Met, Gly, Ala, Ser, Thr, Val, or Pro, or any combination thereof, to minimize proteolysis.
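The peptide layout described above can be sketched at the amino-acid level. This minimal Python sketch assumes the Myc tag (the c-Myc epitope EQKLISEEDL) is placed at the C-terminus and treats "N-terminal residues" as the first two residues, both illustrative choices not stated in the text; the 3′UTR is a nucleotide-level feature and is not modeled. `design_peptide` and `has_stabilizing_n_terminus` are hypothetical names.

```python
# Residues listed in the text for the peptide N-terminus.
STABILIZING_RESIDUES = set("MGASTVP")
FIXED_PREFIX = "MVN"      # fixed N-terminal start given in the text
MYC_TAG = "EQKLISEEDL"    # c-Myc epitope; C-terminal placement is an assumption


def design_peptide(variable_region: str, add_myc_tag: bool = True) -> str:
    """Assemble one library peptide: fixed MVN start, variable core, optional Myc tag."""
    peptide = FIXED_PREFIX + variable_region.upper()
    if add_myc_tag:
        peptide += MYC_TAG
    return peptide


def has_stabilizing_n_terminus(peptide: str, n_residues: int = 2) -> bool:
    """Check that the first n_residues are all in the stabilizing set.

    Checking two residues (initiator Met plus the next one) is an
    illustrative choice, not a rule stated in the text.
    """
    head = peptide[:n_residues]
    return len(head) == n_residues and all(aa in STABILIZING_RESIDUES for aa in head)


print(design_peptide("wyrkq"))                 # MVNWYRKQEQKLISEEDL
print(has_stabilizing_n_terminus("MVNWYRKQ"))  # True
print(has_stabilizing_n_terminus("MRNWYRKQ"))  # False: R is not in the set
```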
[0133] The plurality of different short peptide sequences can be randomly generated by any method (e.g. NNK or NNN nucleotide randomization). The plurality of different short peptide sequences can also be preselected, either by previous experiments selecting for binding to a target, or from existing data sets in the scientific literature that have reported rationally-designed peptide libraries.
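The practical difference between NNK and NNN randomization can be checked by enumerating the degenerate codons against the standard genetic code. NNK (N = A/C/G/T, K = G/T) gives 32 codons covering all 20 amino acids with a single stop codon (TAG), whereas NNN gives 64 codons with three stops. A minimal Python sketch; the helper names are hypothetical, and the 20-codon region length is only an example.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
IUPAC = {"N": "ACGT", "K": "GT", "A": "A", "C": "C", "G": "G", "T": "T"}


def degenerate_codons(pattern: str) -> list:
    """Expand a degenerate codon such as 'NNK' into all concrete codons."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in pattern))]


def stop_fraction(pattern: str) -> float:
    """Fraction of codons in the degenerate pattern that are stop codons."""
    codons = degenerate_codons(pattern)
    return sum(CODON_TABLE[c] == "*" for c in codons) / len(codons)


def random_library_member(n_codons: int, pattern: str = "NNK", rng=random) -> str:
    """Draw one random coding sequence of n_codons degenerate codons."""
    codons = degenerate_codons(pattern)
    return "".join(rng.choice(codons) for _ in range(n_codons))


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


for pattern in ("NNK", "NNN"):
    f = stop_fraction(pattern)
    # Probability that a 20-codon random region contains no stop codon.
    print(pattern, len(degenerate_codons(pattern)), f, (1 - f) ** 20)

print(translate(random_library_member(15, rng=random.Random(0))))
```

For a 20-codon region, roughly 53% of NNK members are stop-free, compared with roughly 38% for NNN, which is one reason NNK is often preferred for peptide libraries.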
[0134] In some embodiments, the library comprises polypeptides of about 60 or fewer amino acids in length. In another embodiment, the library comprises polypeptides of about 30 or fewer amino acids in length. In another embodiment, the library comprises polypeptides of about 20 or fewer amino acids in length.
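Peptide length determines how much of sequence space a library can sample. This short arithmetic sketch assumes all 20 canonical amino acids are possible at each position. The library size of 10^8 is a hypothetical figure, not one given in this disclosure.

```python
def sequence_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    if length < 0 or alphabet_size < 1:
        raise ValueError("length must be >= 0 and alphabet_size >= 1")
    return alphabet_size ** length


def max_fraction_sampled(library_size: int, length: int) -> float:
    """Upper bound on the fraction of peptide space a library can cover."""
    return min(1.0, library_size / sequence_space(length))


if __name__ == "__main__":
    library_size = 10**8  # hypothetical number of library members
    for length in (5, 15, 20, 30, 60):
        print(length, f"{sequence_space(length):.2e}",
              f"{max_fraction_sampled(library_size, length):.2e}")
```

Only very short lengths can in principle be covered exhaustively. For example, there are 20^5 = 3.2 × 10^6 possible 5-mers. Libraries of 20 or more residues sample a vanishingly small fraction of their sequence space.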
Modification of Facilitating Peptides
[0135] The peptide that leads to the selective degradation of the target can also be a product of post-translational modification. The post-translational modification can include any one or a combination of cleavage, cyclization, bicyclization, methylation, halogenation, glycosylation, acylation, phosphorylation, and acetylation. In some embodiments, the methylation is catalyzed by an N-methyltransferase. In some embodiments, the post-translational modification is performed by naturally occurring enzymes. In some embodiments, the post-translational modification is performed by synthetic enzymes. In some embodiments, the synthetic enzymes are chimeric.
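Each modification listed above shifts a peptide's mass by a characteristic amount, which is why mass spectrometry is commonly used to confirm that a modification occurred. The following minimal sketch uses standard monoisotopic masses. It covers only some of the listed modifications: glycosylation is omitted because its mass depends on the glycan, and halogenation is represented by a single H-to-Cl substitution. The sketch does not represent any detection workflow of this disclosure, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 canonical amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

# Monoisotopic mass shifts (Da) for a subset of the listed modifications.
MOD_SHIFT = {
    "methylation": 14.01565,       # +CH2 (e.g., backbone N-methylation)
    "acetylation": 42.010565,      # +C2H2O
    "phosphorylation": 79.966331,  # +HPO3
    "hydroxylation": 15.994915,    # +O
    "chlorination": 33.961028,     # H replaced by 35Cl
}


def peptide_mass(seq: str, mods=(), cyclic: bool = False) -> float:
    """Monoisotopic mass of a peptide with optional modifications.

    A linear peptide carries one extra water (terminal H and OH); head-to-tail
    (N-to-C) cyclization removes that water, so a cyclic peptide is just the
    sum of its residue masses plus modification shifts.
    """
    mass = sum(RESIDUE_MASS[aa] for aa in seq)
    if not cyclic:
        mass += WATER
    mass += sum(MOD_SHIFT[m] for m in mods)
    return mass


if __name__ == "__main__":
    core = "GVIGFAG"  # hypothetical core sequence
    print(f"linear: {peptide_mass(core):.4f}")
    print(f"cyclic: {peptide_mass(core, cyclic=True):.4f}")
    print(f"cyclic, 2x N-methyl: {peptide_mass(core, ['methylation'] * 2, cyclic=True):.4f}")
```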
[0136] The peptide can be a ribosomally synthesized and post-translationally modified peptide (RiPP), in which the core peptide is flanked by a prepropeptide sequence comprising a leader peptide and recognition sequences that signal the recruitment of maturation, cleavage, and/or modifying enzymes, such as excision or cyclization enzymes. Examples include lanthipeptide maturation enzymes from Lactococcus lactis (LanB, LanC, LanM, LanP); patellamide biosynthesis factors from cyanobacteria (PatD, PatG); butelase 1 from Clitoria ternatea; and POPB from Galerina marginata, Lentinula edodes, Omphalotus olearius, Dendrothele bispora, Amanita bisporigera, or other species. In some embodiments, the cyclization or bicyclization enzymes are synthetic chimeras.
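The RiPP arrangement above can be modeled with strings. The leader and recognition sequences flank the variable core, processing enzymes excise the core, and N-to-C cyclization joins its ends. In this minimal sketch, string slicing stands in for the enzymatic steps; no specific enzyme's cleavage rules are modeled. The function names and flanking sequences are hypothetical. Canonicalizing rotations is useful because head-to-tail macrocycles that differ only in where the ring is "opened" are the same molecule. Modifications such as N-methylation are ignored here.

```python
def excise_core(precursor_seq: str, leader: str, recognition: str) -> str:
    """Return the residues between the leader and the recognition sequence.

    String slicing stands in for the enzymatic cleavage steps.
    """
    if not precursor_seq.startswith(leader):
        raise ValueError("precursor does not begin with the leader peptide")
    end = precursor_seq.find(recognition, len(leader))
    if end == -1:
        raise ValueError("recognition sequence not found after the leader")
    return precursor_seq[len(leader):end]


def canonical_cycle(core: str) -> str:
    """Lexicographically smallest rotation of a head-to-tail cyclic peptide.

    Rotations describe the same macrocycle; reversal does not, because the
    peptide backbone has a direction (N to C).
    """
    if not core:
        raise ValueError("core must not be empty")
    return min(core[i:] + core[:i] for i in range(len(core)))


if __name__ == "__main__":
    leader, recognition = "MKLGG", "PSAYD"  # hypothetical flanking sequences
    for precursor in ("MKLGGVGAILPSAYD", "MKLGGAILVGPSAYD"):
        core = excise_core(precursor, leader, recognition)
        print(core, "->", canonical_cycle(core))
```

In this example, the two precursors carry different linear cores that close into the same macrocycle, so a screen could treat them as one hit.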
[0137] In one example, the variable peptide library region is embedded within the primary sequence of a modifying enzyme (e.g., a homolog of the omphalotin N-methyltransferase from Dendrothele bispora, Marasmius fiardii, Lentinula edodes, Fomitiporia mediterranea, Omphalotus olearius, or another species) and contains random residues, some of which may be post-translationally decorated by additional modifications such as hydroxylation, halogenation, glycosylation, acylation, phosphorylation, methylation, or acetylation. This diversified variable region is excised and modified to form N-to-C cyclized, optionally backbone N-methylated, macrocycles through the action of a prolyl endopeptidase of the PopB family and N-methyltransferases of the omphalotin methyltransferase family. An exemplary list of prolyl endopeptidases is shown in Table 2. The prolyl endopeptidase may be a protein having a sequence selected from SEQ ID NOs: 42-58, and may be encoded by a nucleotide sequence selected from SEQ ID NOs: 59 and 60. The prolyl endopeptidase may also be a variant of a naturally occurring prolyl endopeptidase; such a variant can have at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to any of SEQ ID NOs: 42-58. An exemplary list of N-methyltransferases is shown in Table 3. The methyltransferase may be a protein having a sequence selected from SEQ ID NOs: 61-116, and may be encoded by a nucleotide sequence selected from SEQ ID NOs: 117 and 118. The methyltransferase may also be a variant of a naturally occurring methyltransferase; such a variant can have at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to any of SEQ ID NOs: 61-116.
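Whether a candidate enzyme falls within the stated identity ranges depends on how identity is computed. This minimal sketch makes three illustrative assumptions, none of which is the scoring method specified in this disclosure:

- a simple global (Needleman-Wunsch) alignment;
- unit match, mismatch, and gap scores;
- identity expressed as identical aligned positions divided by the reference length.

Production analyses typically use substitution matrices such as BLOSUM62 with affine gap penalties. All names are hypothetical.

```python
THRESHOLDS = (70, 75, 80, 85, 90, 95, 96, 97, 98, 99)


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment with linear gap penalties."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned positions as a percentage of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    qa, ra = global_align(query, reference)
    matches = sum(1 for x, y in zip(qa, ra) if x == y and x != "-")
    return 100.0 * matches / len(reference)


def highest_threshold_met(identity: float):
    """Highest listed identity threshold satisfied, or None if below 70%."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    ref, variant = "MKTAYIAK", "MKTAYLAK"  # hypothetical toy sequences
    pid = percent_identity(variant, ref)
    print(global_align(variant, ref), pid, highest_threshold_met(pid))
```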
TABLE-US-00002 TABLE 2 Amino acid and nucleotide sequences of prolyl endopeptidase type cyclizing enzymes Galerina SEQ ID NO.: 42 MSSVTWAPGNYPSTRRSDHVDTYQSASKGEVPVPDPYQWLEESTDEVDK marginata CBS KDR68475.1 WTTAQADLAQSYLDQNADIQKLAEKFRASRNYAKFSAPTLLDDGHWYWF 339.88 hypothetical YNRGLQSQSVLYRSKEPALPDFSKGDDNVGDVFFDPNVLAADGSAGMVLC protein KFSPDGKFFAYAVSHLGGDYSTIYVRSTSSPLSQASVAQGVDGRLSDEVKW GALMADRAFT_78538 FKFSTIIWTKDSKGFLYQRYPARERHEGTRSDRNAMMCYHKVGTTQEEDII VYQDNEHPEWIYGADTSEDGKYLYLYQFKDTSKKNLLWVAELDEDGVKS GIHWRKVVNEYAADYNIITNHGSLVYIKTNLNAPQYKVITIDLSKDEPEIRD FIPEEKDAKLAQVNCANEEYFVAIYKRNVKDEIYLYSKAGVQLTRLAPDFV GAASIANRQKQTHFFLTLSGFNTPGTIARYDFTAPETQRFSILRTTKVNELDP DDFESTQVWYESKDGTKIPMFIVRHKSTKFDGTAAAIQYGYGGFATSADPF FSPIILTFLQTYGAIFAVPSIRGGGEFGEEWHKGGRRETKVNTFDDFIAAAQF LVKNKYAAPGKVAINGASNGGLLVMGSIVRAPEGTFGAAVPEGGVADLLK FHKFTGGQAWISEYGNPSIPEEFDYIYPLSPVHNVRTDKVMPATLITVNIGD GRVVPMHSFKFIATLQHNVPQNPHPLLIKIDKSWLGHGMGKPTDKNVKDA ADKWGFIARALGLELKTVE Amanita SEQ ID NO.: 43 MPPTPWAPHSYPPTRRSDHVDVYQSASRGEVPVPDPYQWLEENSNEVDEW bisporigera ADN19205.1 TTAQTAFTQGYLDKNADRQKLEEKFRASKDYVKFSAPTLLDSGHWYWFY prolyl NSGVQSQAVLYRSKKPVLPDFQRGTRKVGEVYFDPNVLSADGTAIMGTCR oligopeptidase FSPSGEYFAYAVSHLGVDYFTIYVRPTSSSLSQAPEAEGGDGRLSDGVKWC KFTTITWTKDSKGFLYQRYPARESLVAKDRDKDAMVCYHRVGTTQLEDII VQQDKENPDWTYGTDASEDGKYIYLVVYKDASKQNLLWVAEFDKDGVK PEIPWRKVINEFGADYHVITNHGSLIYVKTNVNAPQYKVVTIDLSTGEPEIR DFIPEQKDAKLTQVKCVNKGYFVAIYKRNVKDEIYLYSKAGDQLSRLASDF IGVASITNREKQPHSFLTFSGFNTPGTISRYDFTAPDTQRLSILRTTKLNGLN ADDFESTQVWYKSKDGTKVPMFIVRHKSTKFDGTAPAIQNGYGGFAITAD PFFSPIMLTFMQTYGAILAVPNIRGGGEFGGEWHKAGRRETKGNTFDDFIA AAQFLVKNKYAAPGKVAITGASNGGFLVCGSVVRAPEGTFGAAVSEGGVA DLLKFNKFTGGMAWTSEYGNPFIKEDFDFVQALSPVHNVPKDRVLPATLL MTNAGDDRVVPMHSLKFVANLQYNVPQNPHPLLIRVDKSWLGHGFGKTT DKHTKDAADKWSFVAQSLGLEWKTVD Hypsizygus SEQ ID NO.: 44 MAISPTPWTPNTYPPTRRSSHVDIYKSATRGEVRVADPYQWLEENTEETDK marmoreus] KYQ30898.1 WTTAQEEFTRSYLDKNTDRQRLEDAFRTSTDYAKFSSPTLYEDGRWYWFY Prolyl NSGLQPQPLIYRSKGKTLPDFSQDDNVVGEVFFDPNLLSDDGTAALSIYDFS endopeptidase 
DCGKYFAYGISFSGSDFSTIYVRSTESPLAKKNSGSTDDDRLSDEIKHVKFS AVTWTKDSKGFFYQRYPAHENAKEGIETGGDVDAMIYYHVIGTSQSEDILV HSDKSNPEWMWSIDITEDGKYLILYTMKDSSRKNLMWIAELSKNEIGPNIQ WNKIIDVFDAEYHLITNDGPILYVKTNADAPQYKLVTMDISGDKDISRDLIP EDKNANLVQVDCVNRDTFAVIYKRNVKDEIYLYSKTGIQLSRLASDFVGA ASISSREKQPHFFVTMTGFSTPGTVARYDFGAPEEQRWSIYRSVKVNGLNP DDFESKQVWYESKDGTKIPMFIVRHKATKFDGTAPAIQYGYGGFSISINPFF SPTILTFLQTYGAVLAVPNIRGGAEFGEDWHKAGTREKKGNVFDDFVAAT QYLVKNKYAGEGKVAINGGSNGGLLVGACINRAPEGTFGAAVAEVGVMD LLKFSKFTIGKAWTSDYGDPDDPKDFDFICPLSPLHNIPTDRVLPPTMLLTA DHDDRVVPMHSFKHAATLQYTLPHNPHPLVIRIDKKAGHGAGKSTEKRIKE SADKWGFVAQSLGLVWQEPA Conocybe apala SEQ ID NO.: 45 MPPSTPNEYPPTRRSDDVLTYRSEKNGEVVVPDPYQWLEHNTEETDKWTT ACQ65797.1 AQAAFTRAHLDKNPKRNALEEAFTAANDYAKFSAPQLHDDGRWYWYYN prolyl TGLQAQTCLWRTRDDTIPDFSKQLDEDVGEIFFDPNALSKDGTAALSTYRF oligopeptidase SRDGKYFAYAIAQSGSDFNTIYVRPTDSPLTKRDESGRDPSRLADEVKFVKF SGITWAPNSEGFFYQRYPHIDGATLEEGGIATRRDLHAMVYYHRVGTPQSE DILIHRDPANPEWMFGVNVTDNGEYIELYISKDSSRKNMLWVANFAMNKI GEQFQWRKVINDFAAEYDVITNHGPVYYFRTDDGAPKHKILSINIDTNERK LLVPESEDAALFSTVCVNKNYMALIYKRNVKDEVHLYTLEGKPVRRLAED FVGACTISGKEKQPWFFVTMSGETSPSTVGRYNFQIPEEENRWSIFRAAKIK NLNPNDFEASQVWYKSKDGTNVPMFIVRHKSTQFDGTAPALQYGYGGFSI SIDPFFSASILTFLKVYGAILVVPSIRGGNEFGEEWHRGGMKQNKVNCEDDF IAATNHLVEHKYAAPGKVAINGGSNGGLLVAACINRAPEGTFGAAIAEVGV HDMLKFHKFTIGKAWTSDYGNPDDPHDFDYIYPISPVHNVPTDKILPPTLLL TADHDDRVVPMHTFKLAATLQHTLPHNPHPLLLRVDKKAGHGAGKPLQL KIREQADKWGFVAQSFQLVWRDGV Amanita SEQ ID NO.: 46 MPPTPWAPHSYPPTRRSDHVDVYQSASRGEVPVPDPYQWLEENSNEVDEW bisporigera GenBank TTAQTAFTQGYLDKNADRQKLEEKFRASKDYVKFSAPTLLDSGHWYWFY HQ225841.1 NSGVQSQAVLYRSKKPVLPDFQRGTRKVGEVYFDPNVLSADGTAIMGTCR POPB FSPSGEYFAYAVSHLGVDYFTIYVRPTSSSLSQAPEAEGGDGRLSDGVKWC KFTTITWTKDSKGFLYQRYPARESLVAKDRDKDAMVCYHRVGTTQLEDIIV QQDKENPDWTYGTDASEDGKYIYLVVYKDASKQNLLWVAEFDKDGVKPE IPWRKVINEFGADYHVITNHGSLIYVKTNVNAPQYKVVTIDLSTGEPEIRDFI PEQKDAKLTQVKCVNKGYFVAIYKRNVKDEIYLYSKAGDQLSRLASDFIGV ASITNREKQPHSFLTFSGFNTPGTISRYDFTAPDTQRLSILRTTKLNGLNADD FESTQVWYKSKDGTKVPMFIVRHKSTKFDGTAPAIQNGYGGFAITADPFFSP 
IMLTEMQTYGAILAVPNIRGGGEFGGEWHKAGRRETKGNTFDDFIAAAQFL VKNKYAAPGKVAITGASNGGELVCGSVVRAPEGTFGAAVSEGGVADLLKF NKFTGGMAWTSEYGNPFIKEDFDFVQALSPVHNVPKDRVLPATLLMTNAG DDRVVPMHSLKFVANLQYNVPQNPHPLLIRVDKSWLGHGFGKTTDKHTK DAADKWSFVAQSLGLEWKTVD Lentinula SEQ ID NO.: 47 MFSATQESPTMSVPQWDPYPPVSRDETSAITYQSKLCGSVTVRDPYSALEV edodes GenBank PFDDSEETKAFVHAQRKFARTYLDEIPDRETWLQTLKESWNYRRFTVPKRE GAW09065.1 SDGYTYFEYNDGLQSQMSLRRVKVSEEDTILTESGPGGELFFDPNLLSLDG The DOE Joint NAALTGSMNISPCGKYWAYGVSEHGSDWMTTYVRKTSSPHMPSQEKGKD Genome PGRMDDVIRYSRFFIVYWSSDSKGFFYSRYPPEDDEGKGNTPAQNCMVYY Institute (JGI) HRLGEKQEKDTLVYEDPEHPFWLWALQLSPSGRYALLTASRDASHTQLAK 011197; IADIGTSDIQNGIQWLTIHDQWQARFVIIGDDDSTIYFMTNLEAKNYLVATL LENED_011197) DIRHSEAGVKTLVAENPDALLISASILSTDKLVLVYLHNARHEIHVHDLNTG KPIRQIFDNLIGQFSLSGRRDDNDMFVFHSGFTSPGTIYRFRLNEDSNKGTLF RAVQVPGLNLSDFTTESVFYPSKDGTPIHMFITRLKDTPVDGTAPVYIYGYG GFALAMLPTFSVSTLLFCKIYRAMYVVPNIRGGSEFGESWHREGMLDKKQ NVFDDFNAATKWLVANKYANKYNVAIRGGSNGGVLTTACANQAPELYRC VITIGGIIDMLRFPKFTFGALWRSEYGDPEDPEDFDFIYKYSPYHNIPSGDVV LPAMLFFTAAYDDRVSPLHSFKHVAALQYNFPNGPNPVLMRIDLNTGHFA GKSTQKMLEETADEYRCDLLCCNLQL Omphalotacae SEQ ID NO.: 48 MSFPGWGPYPPVERDETSAITYSSKLHGSVTVRDPYSQLEVPFEDSEETKAF olearis The DOE Joint VHSQRKFARTYLDENPDREAWLETLKKSWNYRRFSALKPESDGHYYFEYN Genome DGLQSQLSLYRVRMGEEDTVLTESGPGGELFFNPNLLSLDGNAALTGFVMS Institute (JGI) PCGNYWAYGVSEHGSDWMSIYVRKTSSPHLPSQERGKDPGRMNDKIRHV 2090; RFFIVSWTSDSKGFFYSRYPPEDDEGKGNAPAMNCMVYYHRIGEDQESDV OMPOL1_2090 LVHEDPEHPFWISSVQLTPSGRYILFAASRDASHTQLVKIADLHENDIGTNM KWKNLHDPWEARFTIVGDEGSKIYFMTNLKAKNYKVATFDANHPDEGLT TLIAEDPNAFLVSASIHAQDKLLLVYLRNASHEIHIRDLTTGKPLGRIFEDLL GQFMVSGRRQDNDIFVLFSSFLSPGTVYRYTFGEEKGHSSLFRAISIPGLNLD DFMTESVEYPSKDGTSVHMFITRPKDVLLDGTSPVLQYGYGGFSLAMLPTF SLSTLLFCKIYRAIYAIPNIRGGSEYGESWHREGMLDKKQNVEDDFNAATE WLIANKYASKDRIAIRGGSNGGVLTTACANQAPGLYRCVITIEGIIDMLRFP KFTFGASWRSEYGDPEDPEDFDFIFKYSPYHNIPPPGDTIMPAMLFFTAAYD DRVSPLHTFKHVAALQHNFPKGPNPCLMRIDLNSGHFAGKSTQEMLEETA DEYRLKVQ Gymnopus SEQ ID NO.: 49 MSMSLLGVYPPVKRDEASAITYQSKLHGSVIVHDPYSALEIPSNDSLETKAF 
fusipes VLSQGKFSRAYLDEIPTRKNWLKILKSNWSYRRFSALKRESDNHFYFEYND GLQPQSSIYRVKVGEEDSILTESGPGGELFFDPNLLSLDGVAALTGAAMSPS GKYWAYGVSEHGNNSMTIYVRKTSSPHQPSQEKGTDPGRMNDVLQHIRM LFVSWTRDSKGFFYQRYPPEKNEGNGNAPGQNCKIYYHYIGTEQDSDILIH EDPDHPDWFSYVQLSPSGQYVLLLINRDSSLNYLAKIADLSVNDIGTHIQW KNLHDSWNHFTMIGNDYSVIYFKTNLDAQNYKVATIDFLQPEMGFTTLVK ENPNSVLVEAKIFREDKLVLLYQQNASHQIHIYDLKSGAWLQQIFKNLTGFI TTVPNGRAEDEMFFLYNDFITPGTIYQYKFDDESDKGLVFRAIQIDGLNLDD FVTESKFYPSKDGTSVHMFITRPKDVLIDGTAAVYMYGYGGFSISVLPTFSIS TLLFCKIYRAMYVVPNIRGGSEFGESWHREGMLDKKQNGHDDFHAAAEW LIANKYAKKDCVAIRGGSSGGILTTACANQAPELYRCVITIEGIIDMLKFPKF TFGALLRSEYGDPEDPEAFDYIYKYSPYHNIPLGDVVMPPMLFFNAGYDDR VPPLHTFKHVAALQHRFPKGPNPILMRMDLSSGHYAGKSVQKMIEETADE YSFIGKSMGLTMQVRAK Lentinula SEQ ID NO.: 50 MSVPQWVSYPPVSRDATSAITYQSKLRGSVTVRDPYSALEVPFDDSEETKA novae- FVHAQRKFARTYLDEIPDR zelandiae ETWLQTLKESWNYRRFTVPKRESDGYTYFEYNDGLQSQMSLRRVKVSEED TILTESGPGGELFFDPNLLSLDGNAALTGSMMSPCGKYWAYGVSEHGSDW MTTYVRKTSSPHMPSQEKGKDPGRMDDVVRYSRFFIVYWSSDSKGFFYSR YPPEDDEGKGNAPAQNCMVYYHRLGERQEKDTLVYEDPEHPFWLWALQL SPSGRYALLTASRDASHTQLAKIADIGTSDIQNGIQWLTIHDQWQARFVIIG DDDSTIYFMTNLEAKNYLVATLDIRHSEAGVKTLVAENPDALLISASILSTD KLVLVYLHNARHEIHVHDLNTGKPIRQIFDNLIGQFSLSGRRDDNDMFIFHS GFTSPGTIYRFRLNEDSNKGTLFRAIQVPGLNLNDFTTESVFYPSKDGTPIHM FITRLKDTPVDGTAPVYIYGYGGFALAMLPTFSVSTLLFCKIYRAMYVVPNI RGGSEFGESWHREGMLDKKQNVFDDFNAATKWLVANKYANKYNVAIRG GSNGGVLTTACANQAPELYRCVITIGGIIDMLRFPKFTFGALWRSEYGDPED PEDFDFIYKYSPYHNIPSGDVVLPAMLFFTAAYDDRVSPLHSFKHVAALQY YFPNGPNPVLMRIDLNTGHFAGKSTQKMLEETADEYSFIGKSMGLVMCVQ NEHASKQWSCVVT Lentinula SEQ ID NO.: 51 MSIPRWGPYPPVRRDETSAITYQSKLHGSVTVPDPYSALEVPYNDDEESEIK raphanica TFVSEQRKFARTYLDENPDRERWLQVLKESWNYERFTVPKRESDGHTYFE YNDGLQSQMTLRRVKTGQEDTILTESGPGGELFFDPNMISLDGNAALTGSM MSPCGKYWAYGVSEHGSDWMTIYVRETSSPHQPSQEKGKDTGRMDDVVH SSRFFIVYWTSDSKGFFYSRYPPEDDEGKGNSPAKNCMVYYHRLGEKQED DALIYEDPEHPFWLWAVQLSPSGRFALLTASRDASHTQMAKIADLSSGDVR NGVNWLTIHDKWEARFLIIGDDDSKIYFLTNLEAVNYKVVTLDTRCPEAGT NTLVPENPDALLISASIVSADKLALVYLQNAKHDIYIHDLSTGKPTRRLFED 
LIGQFALSGRREDNDMFVFYSGFTSPGTIYRYKFDEEDNNGTLFRAMRVPG LDLDKFTTESVFYPSKDGTKVHMFITRLKNTLVDGTAPVYMYGYGGFALA MLPTFSVSTLLFCKTYRAMYVVPNIRGGSEFGESWHREGMLDKKQNVFDD FNAAAEWLIANKYAKSNCVAIRGGSNGGVLTTACTNQAPELFRCVVTIGGI IDMLRFPKFTFGALWCSEYGDPDDPEAFDYIYKYSPYHNIPSGKVVIPAMIF FTAAYDDRVSPLHTFKHVAALQYNFPTGPNPIMMRIDLNTGHYAGKSTQK MLEETADEYSFIGRSMELTMHTQNHWSCVTS Lentinula SEQ ID NO.: 52 MSVPQWVPYPPVSRDDTSAITYQSKLRGSVTVRDPYSALEVPFDDSEETKA lateritia FVHAQRKFARMYLDEIPDR ETWLQTLKESWNYRRFTVPKRESDGYTYFEYNDGLQSQMSLRRVKVSEED TILTESGPGGELFFDPNLLSLDGNAALTGSMMSPCGKYWAYGVSEHGSDW MTTYVRKTSSPHMPSQEKGKDPGRMDDVIRYSRFFIVYWSSDSKGFFYSRY PPEDDEGKGNTPAQNCMVYYHRLGEKQEKDTLVYEDPEHPFWLWALQLS PSGRYALLTASRDASHTQLAKIADIGTSDIQNGIQWLTIHDQWQARFVIIGD DDSTIYFMTNLEAKNYLVATLDIRHSEAGVKTLVAENPDALLISASILSTDK LVLVYLHNARHEIHVHDLNTGKPIRQIFDNLIGQFSLSGRRDDNDMFVFHS GFTSPGTIYRFRLNEDSNKGTLFRAIQVPGLNLNDFTTESVFYPSKDGTPIHM FITRLKDTPVDGTAPVYIYGYGGFALAMLPTFSVSTLLFCKIYRAMYVVPNI RGGSEFGESWHREGMLDKKQNVFDDFNAATKWLVANKYANKYNVAIRG GSNGGVLTTACANQAPELYRCVITIGGIIDMLRFPKFTFGALWRSEYGDPED PEDFDFIYKYSPYHNIPSGDVVLPAMLFFTAAYDDRVSPLHSFKHVAALQY YFPNGPNPVLMRIDLNTGHFAGKSTQKMLEETADEYSFIGKSMGLVMCVQ NEHASKQWSCVVT Dendrothele SEQ ID NO.: 53 MSVPQWGPYLPVDRDETSAITYRTKLHGSVTVPDPYSGLEAPLDESAKTKA bispora FVHSQRKFARTYLDENPDK EVWLETLKQSWNYKRFTVPRHESDDHIYFEYNDGLQSQLSLHRVKVGDED TILTESGPGGELFFDPNMISLDGNASLTGFIMSPCGKYWAYGVSEHGSDWM TIYVRETSSPHVPSQERGKDPGRMDDEVRHSRFFIVSWTGDSKGFFYSKYPP EENEGKGNAPAKNCIVYYHRLGEKQENDTLVHKDSGHPFWLWSLQTTPSG RYALLAASRDASHTQLAKIADIHDNDIGASMKWINLHDSWEARFSIIGDDD SKIYFMTNLQAPNYKVAIFDACHPSPDADLTTLVAEDPNALLIAASIHAKDK LALVYLRDARHEIHVHDLVTGRLLRRILGDLVGQFMVTGRRADNDMFIFY SGFTSPGTVYRYKFDDERDTCSLFRAIRIPGLDLDKFVTESVFYPSKDGTSIH MFITRPKDVLLDGTAPVLQYGYGGFALAMLPTFSVSTLLFCKIYRAMYVVP NIRGGSEYGESWHRAGMLGNKQNVFDDLNAATEWLVANKYANKDRVAI RGGSNGGVLTTACANQAPGLYRCVITIGGIIDMLRFPKFTFGALWCSEYGD PEDPEAFDFIYKYSPYHNIPSGETVMPAMLFFTAAYDDRVSPLHTFKHVAA LQHSFPHGPNPILMRVDMNSGHYAGKSTQKMLEETADEYSFIGKSMGLTM QVENKSDSNRWSCVVN Dendrothele SEQ ID NO.: 54 
MPVPGWGSYPPFDRDETSAITYQSKLRGSVTVYDPYSALEVPSNDSEETKA bispora FILEQNKFSRAYLDANPDRQTWLETLKKSWHYRRFTTPTRESDDHFYFLYN DGLLAQSPVYRVKVDDVDSILTESGPGGELFFDPNLLSLDGVATLTGTAMS PCGKYWAYAISEHGNDWMTIYVRKTSSPHHPSQERGKDPGRMDDVIQHCR IFFVSWT DDSKGFFYSKWPPDENQGNGNAPGVDCKIYYHRIAVFLSEDPEHPGWFWN VEVSPSGQYALLLGTRDASLNQLVKLADLHTSDIETGIQWTTLHDSWQARF SIIGNDNSLIYFRTNLEAENHRVAAFNVHHPQAGFTTLVPGSLDSVLLDAKL YGINKLVLVYQHLAKHEIYLHDIETGRRLRQIFTDLAGKMTISGRRADHEM FVLYSD FISPGTLYRQLLNRYKFDKDTDKGLLFRTIKVDALNLDDFVTESEFYPSKDG TLVHMFITHPKDVFTDGTAPVLMYGYGGFGAPMFPNFSISNLLFCNIYRGIG GSEFGESWHREGMLEKKQNVFDDFRAAAEWLVTNKYARKGGVAIRGGSN GGIMTTACSNQAPELYGCVITIAGLQDMLRYTKFTFGDLLRSEYGNPENPE DFDYIY KYSPYHNIPLKEVTMPPMLFLQSDYDDRVSPLHTYKHVAALQHRFPKGPNP IILRIDLDSGHYAGKSTMRLIEETADEYRWDLDSSSSSCYYI Gypsophila SEQ ID NO.: 55 MATSGFSKPLHYPPVRRDETVVDDYFGVKVADPYRWLEDPNSEETKEFVD vaccaria NQEKLANSVLEECELIDKFKQKIIDFVNFPRCGVPFRRANKYFHFYNSGLQA QNVFQMQDDLDGKPEVLYDPNLREGGRSGLSLYSVSEDAKYFAFGIHSGL TEWVTIKILKTEDRSYLPDTLEWVKFSPAIWTHDNKGFFYCPYPPLKEGED HMTRSAV NQEARYHFLGTDQSEDILLWRDLENPAHHLKCQITDDGKYFLLYILDGCDD ANKVYCLDLTKLPNGLESFRGREDSAPFMKLIDSFDASYTAIANDGSVFTF QTNKDAPRKKLVRVDLNNPSVWTDLVPESKKDLLESAHAVNENQLILRYL SDVKHVLEIRDLESGALQHRLPIDIGSVDGITARRRDSVVFFKFTSILTPGIVY QCDL KNDPTQLKIFRESVVPDFDRSEFEVKQVFVPSKDGTKIPIFIAARKGISLDGS HPCEMHGYGGFGINMMPTFSASRIVFLKHLGGVFCLANIRGGGEYGEEWH KAGFRDKKQNVFDDFISAAEYLISSGYTKARRVAIEGGSNGGLLVAACINQ RPDLFGCAEANCGVMDMLRFHKFTLGYLWTGDYGCSDKEEEFKWLIKYS PIHNVRR PWEQPGNEETQYPATMILTADHDDRVVPLHSFKLLATMQHVLCTSLEDSP QKNPIIARIQRKAAHYGRATMTQIAEVADRYGFMAKALEAPWID Fragment of SEQ ID NO.: 56 MSMSLLGVYPPVKRDEASAITYQSKLHGSVIVHDPYSALEIPSNDSLETKAF Gymnopus VLSQGKFSRAYLDEIPTRKNWLKILKSNWSYRRFSALKRESDNHFYFEYND fusipes GLQPQSSIYRVKVGEEDSILTESGPGGELFFDPNLLSLDGVAALTGAAMSPS prolyl- GKYWAYGVSEHGNNSMTIYVRKTSSPHQPSQEKGTDPGRMNDVLQHIRM oligopeptidase LFVSWTRDSKGFFYQRYPPEKNEGNGNAPGQNCKIYYHYIGTEQDSDILIH EDPDHPDWFSYVQLSPSGQYVLLLINRDSSLNYLAKIADLSVNDIGTHIQW KNLHDSWNHFTMIGNDYSVIYFKTNLDAQNYKVATIDFLQPEMGFTTLVK 
ENPNSVLVEAKIFREDKLVLLYQQNASHQIHIYDLKSGAWLQQIFKNLTGFI TTVPNGRAEDEMFFLYNDFITPGTIYQYKFDDESDKGLVFRAIQIDGLNLDD FVTESKFYPSKDGTSVHMFITRPKDVLIDGTAAVYMYGYGGFSISVLPTFSIS TLLFCKIYRAMYVVPNIRGGSEFGESWHREGMLDKKQNGHDDFHAAAEW LIANKYAKKDCVAIRGGSSGGILTTACANQAPELYRCVITIEGIIDMLKFPKF TFGALLRSEYGDPEDPEAFDYIYKYSPYHNIPLGDVVMPPMLFFNAGYDDR VPPLHTFKHVAALQHRFPKGPNPILMRMDLSSGHYAGKSVQKMIEETADE YSFIGKSMGLTMQVRAKPSNNRWSCVVT Lentinula SEQ ID NO.: 57 MSVPQWDPYPPVSRDETSAITYQSKLCGSVTVRDPYSALEVPFDDSEETKA edodes FVHAQRKFARTYLDEIPDR ETWLQTLKESWNYRRFTVPKRESDGYTYFEYNDGLQSQMSLRRVKVSEED TILTESGPGGELFFDPNLLSLDGNAALTGSMMSPCGKYWAYGVSEHGSDW MTTYVRKTSSPHMPSQEKGKDPGRMDDVIRYSRFFIVYWSSDSKGFFYSRY PPEDDEGKGNTPAQNCMVYYHRLGEKQEKDTLVYEDPEHPFWLWALQLS PSGRYALLTASRDASHTQLAKIADIGTSDIQNGIQWLTIHDQWQARFVIIGD DDSTIYFMTNLEAKNYLVATLDIRHSEAGVKTLVAENPDALLISASILSTDK LVLVYLHNARHEIHVHDLNTGKPIRQIFDNLIGQFSLSGRRDDNDMFVFHS GFTSPGTIYRFRLNEDSNKGTLFRAVQVPGLNLSDFTTESVFYPSKDGTPIH MFITRLKDTPVDGTAPVYIYGYGGFALAMLPTFSVSTLLFCKIYRAMYVVP NIRGGSEFGESWHREGMLDKKQNVFDDFNAATKWLVANKYANKYNVAIR GGSNGGVLTTACANQAPELYRCVITIGGIIDMLRFPKFTFGALWRSEYGDPE DPEDFDFIYKYSPYHNIPSGDVVLPAMLFFTAAYDDRVSPLHSFKHVAALQ YNFPNGPNPVLMRIDLNTGHFAGKSTQKMLEETADEYSFIGKSMGLVMCA QNEHASKQWSCVVT Omphalotus SEQ ID NO.: 58 MSFPGWGPYPPVERDETSAITYSSKLHGSVTVRDPYSQLEVPFEDSEETKAF olearis VHSQRKFARTYLDENPDR EAWLETLKKSWNYRRFSALKPESDGHYYFEYNDGLQSQLSLYRVRMGEE DTVLTESGPGGELFFNPNLLSLDGNAALTGFVMSPCGNYWAYGVSEHGSD WMSIYVRKTSSPHLPSQERGKDPGRMNDKIRHVRFFIVSWTSDSKGFFYSR YPPEDDEGKGNAPAMNCMVYYHRIGEDQESDVLVHEDPEHPFWISSVQLT PSGRYILFAASRDASHTQLVKIADLHENDIGTNMKWKNLHDPWEARFTIVG DEGSKIYFMTNLKAKNYKVATFDANHPDEGLTTLIAEDPNAFLVSASIHAQ DKLLLVYLRNASHEIHIRDLTTGKPLGRIFEDLLGQFMVSGRRQDNDIFVLF SSFLSPGTVYRYTFGEEKGHSSLFRAISIPGLNLDDFMTESVFYPSKDGTSVH MFITRPKDVLLDGTSPVLQYGYGGFSLAMLPTFSLSTLLFCKIYRAIYAIPNI RGGSEYGESWHREGMLDKKQNVFDDFNAATEWLIANKYASKDRIAIRGGS NGGVLTTACANQAPGLYRCVITIEGIIDMLRFPKFTFGASWRSEYGDPEDPE DFDFIFKYSPYHNIPPPGDTIMPAMLFFTAAYDDRVSPLHTFKHVAALQHNF PKGPNPCLMRIDLNSGHFAGKSTQEMLEETADEYSFIGKSMGLTMQTQGSV DSSRWSCVTV Oligoprolyl- SEQ 
ID NO.: 59 1 aagcacacca ctgataatta tgcttcagat agagtgaagc ctagtgagag acaaaatctt peptidase 61 tcagactgct cttaaaaggc tgaatttcag aacaaccgaa acgttgatcg atcggctgaa enzyme 121 atggtaaccg atcaccattt cggtagtact gaagtggttg aactgtctta aaatgcttca DNA sequence 181 cccagaccga agtataatta tcagcggtgt gagagacatt acaggattga caggacttta from Gymnopus 241 ttttgaaagt aggctttttc gattccgcct aataaatcat acaaggccca tgctgaattt fusipes 301 gaccaatcac ataacagtgc gttgtattga aatttgacga tcctatctac ttggtgtcga 361 gctgccggtg tccaaatgaa cgaggttgtc agaatactgc cgatttcaat gcttatggaa 421 cgcactgtac aaggaagctg gcaatagaaa ccatgccgtc aatcctagtt caatggtatc 481 tttcacagtt cctgttgcat atgcccagtt ttattaattt ctgtcactca tgaccaataa 541 ccgtgtcttg tatgaagata acgtggcgaa aatctatatt ccttataaga aacaaacccc 601 ttcgtccgct acgtgtcttt gaaacccaca ctatgtccat gtcactttta ggtgtatatc 661 ctcctgttaa acgggacgaa gcctcagcca ttacctacca aagcaaactt cacggttctg 721 tcattgtgca tgatccatac agcgcgcttg aaataccttc taatgatagt ttggagacaa 781 aggtttgcga cacaatcctg ccatgtaaaa aatcgagaca ttcagtattt caggcatttg 841 tcctctcaca aggtaaattt tcacgggctt acttggatga aattccgaca aggtaagaga 901 attttcaaac aatgaacaac ttaaatctat ttcatatcag gaaaaattgg ttgaaaatat 961 taaagagtaa ctggagttac cggcggtttt ctgccttgaa gcgtgaaagt gacaaccatt 1021 tctatttcga atataatgat ggccttcaac cccagtcatc catttatcgg gtgaaggttg 1081 gtgaagagga ttccatcctt actgaatctg gacctggggg tgaattgttt tttgatccca 1141 atttgctttc attggatggg gttgctgcac ttactggtgc tgcgatgagt ccttctggga 1201 agtactgggc atatggtgta tctgaacatg ttgatctttt tccactcaag ctatactaat 1261 tattgaccta ataaatattg aacagggaaa caattcaatg acaatttatg ttcgaaaaac 1321 ttcatcacca catcaaccat ctcaagaaaa gggaacagat cccggacgga tgaatgatgt 1381 tctccaacac attcgcatgc tctttgtgtc ctggacaaga gatagcaaag gtttgaatac 1441 acagagagtg cttaagctgg aatatttcat catttatacc ttcaaaggtt tcttctacca 1501 aagatatcca ccagagaaaa atgaaggaaa tgggaatgca ccagggcaga attgcaaggt 1561 gaaactatct gacatcattg agtgcatgtg ccctctgaag catgttgtag atatattatc 1621 actacattgg gacagaacag gatagtgaca 
tccttattca gtaaggatag tgcatttctt 1681 gaagccaggc caaactcaaa tcatccttca gtgaggaccc tgatcatccg gactggttct 1741 catatgtaca gctctcccca aggtaaaatg gtctcacact gcaaagattc ctgattaata 1801 tcataccatg tagtggtcaa tatgtcctgc tactcataaa tgtatgtact tgaatttcta 1861 ctatccattg tactctgatt gtggattaaa cagcgtgatt caagtttaaa ttacctcgcc 1921 aagattgctg atttatctgt caatgatatt gggacccata tccaatggaa gaatttgcat 1981 gattcttgga accatttcac aatgttaaga gcttcatgag tttcttcata tactatgaac 2041 tgatctattt caattacata ctcaactgat aggattggga atgactactc tgtcatctat 2101 ttcaaaacaa atctggatgc acagaactac aaagttgcaa caatcgactt tcttcaacca 2161 gagatgggct tcacaactct ggtcaaggaa aatcccaatt cagtccttgt ggaggccaaa 2221 atattcagag aagacaagct tgtgcttttg taccagcaga atgctagcca tcaaatacac 2281 atttatgatc tcaagagtgg cgcatggctt caacaaatct tcaagaatct aactggattc 2341 ataactacag ttccaaatgg gcgcgctgaa gatgagatgt tttttctcta caatgacttt 2401 attacacctg ggacaatata tcagtcagtg tttaccatat atcggtggtc catcattttc 2461 agctgacaga cacggaacag atataaattt gatgatgaaa gtgacaaggg cttggtgttc 2521 cgtgccatcc aaatcgatgg actcaaccta gatgatttcg tgacagaatc agtaagtaaa 2581 tataactata ttcaactttg gggcactccg taactgaggt gttcagaagt tttacccatc 2641 caaggatgga acttcgtaat ttctctcgtt aactttgata cgtcaacttc tggttgacaa 2701 aaaaacatag ggttcacatg ttcatcaccc gcccgaagga tgtactcatc gacggaactg 2761 ccgcagtcta tatgtatggc tatggtggct tctcaatctc agtgcttccg acgttctcca 2821 tctcaaccct gctattttgc aaaatttacc gggcaatgta tgtcgtgcct aacatacggt 2881 aagggtattt ttggacaact ttgaagtcca tttacttacc tggctgccaa tttagcggag 2941 gttcggagtt tggagaatca tggcaccggg aggtgagtct atgtcaatgt gcacacaatt 3001 tacaagcttt actcaaccat gtctttcagg gaatgttgga caaaaaacag aatggacatg 3061 atgacttcca tgcagctgct gaatggctca tcgcaaataa gtacgccaaa aaggattgtg 3121 ttgccattcg cggggggtcc agcggaggtg cggagtccaa gaactgcttt tgtagccaga 3181 ttgaactttt tcacagggat tttgactacc gcatgtgcaa atcaagcacc cgaactctac 3241 cgctgtgtaa ttaccattga aggcataatt gacatgctca aagtttgtag tttgtgaatc 3301 acctttacat caaaatctca ctcatttgta tgccctcagt 
ttcccaagtt cacgtttggt 3361 gctttgttgc gttcggaata tggcgatgta tgtattcaat ttatcatttc tgaattgaat 3421 gagtctgaca gacctactta gcccgaggac ccagaagctt ttgactacat ctacaagtta 3481 gctttctcat ccttccacag tcatccgctc agacctaacc atgtagatac tcgccttatc 3541 ataacattcc gttgggtgat gtagtcatgc caccgatgct attcttcaat gcgggatatg 3601 atgaccgcgt tcctcctcta cacagtaagc caagtgtttg attccttcaa gaccaagcta 3661 accccctaac aagccttcaa gcatgttgct gcactacaac atagatttcc taaaggcccg 3721 aatccaattc tcatgcgcat ggacctaagt tcagggcatt atgctggcaa ggtttgtatt 3781 tcactctcca agacatgctc tttgcaaaat ttattcttgt agagtgtaca aaagatgatt 3841 gaggaaactg cagatgaata caggtgtggt caatgggtct tattacatgc atcattttct 3901 aactgatttg ggtctactag cttcattggg aagtctatgg ggcttactat gcaagtcaga 3961 aaaccat ctaataaccg ttggtcctgt gtagtgactt ga Oligoprolyl- SEQ ID NO.: 60 1 atgtccatgt cacttttagg tgtatatcct cctgttaaac gggacgaagc ctcagccatt peptidase 61 acctaccaaa gcaaacttca cggttctgtc attgtgcatg atccatacag cgcgcttgaa enzyme 121 ataccttcta atgatagttt ggagacaaag gcatttgtcc tctcacaagg taaattttca cDNA sequence 181 cgggcttact tggatgaaat tccgacaagg aaaaattggt tgaaaatatt aaagagtaac from Gymnopus 241 tggagttacc ggcggttttc tgccttgaag cgtgaaagtg acaaccattt ctatttcgaa fusipes 301 tataatgatg gccttcaacc ccagtcatcc atttatcggg tgaaggttgg tgaagaggat sequence. 
361 tccatcctta ctgaatctgg acctgggggt gaattgtttt ttgatcccaa tttgctttca Underlined are 421 ttggatgggg ttgctgcact tactggtgct gcgatgagtc cttctgggaa gtactgggca positions where 481 tatggtgtat ctgaacatgg aaacaattca atgacaattt atgttcgaaa aacttcatca SNPs are 541 ccacatcaac catctcaaga aaagggaaca gatcccggac ggatgaatga tgttctccaa present, as 601 cacattcgca tgctctttgt gtcctggaca agagatagca aaggtttctt ctaccaaaga well as their 661 tatccaccag agaaaaatga aggaaatggg aatgcaccag ggcagaattg caagatatat potential 721 tatcactaca ttgggacaga acaggatagt gacatcctta ttcatgagga ccctgatcat codons 781 ccggactggt tctcatatgt acagctctcc ccaagtggtc aatatgtcct gctactcata 841 aatcgtgatt caagttt(a/t)aa ttacctcgcc aagattgctg atttatctgt caatgatatt 901 gggacccata tccaatggaa gaatttgcat gattcttgga accatttcac aatgattggg 961 aatgactact ctgtcatcta tttcaaaaca aatctggatg cacagaacta caaagttgca 1021 acaatcgact ttcttcaacc agagatgggc ttcacaactc tggtcaagga aaatcccaat 1081 tcagtccttg tggaggccaa aatattcaga gaagacaagc ttgtgctttt gtaccagcag 1141 aatgctagcc atcaaataca catttatgat ctcaagagtg gcg(c/a)atggct tcaacaaatc 1201 ttcaagaatc taactggatt cataactaca gttccaaatg ggcgcgctga agatgagatg 1261 ttttttctct acaatgactt tattacacct gggacaatat atcaatataa atttgatgat 1321 gaaagtgaca agggcttggt gttccgtgcc atccaaatcg atggactcaa cctagatgat 1381 ttcgtgacag aatcaaagtt ttacccatcc aaggatggaa cttcggttca catgttcatc 1441 acccgcccga aggatgtact catcgacgga actgccgcag tctatatgta tggctatggt 1501 ggcttctcaa tctcagtgct tccgacgttc tccatctcaa ccctgctatt ttgcaaaatt 1561 taccgggcaa tgtatgtcgt gcctaacata cgcggaggtt cggagtttgg agaatcatggp 1621 caccgggagg gaatgttgga caaaaaacag aatggacatg atgacttcca tgcagctgct 1681 gaatggctca tcgcaaataa gtacgccaaa aaggattgtg ttgccattcg cggggggtcc 1741 agcggaggga ttttgactac cgcatgtgca aatcaagcac ccgaactcta ccgctgtgta 1801 attaccattg aaggcataat tgacatgctc aaatttccca agttcacgtt tggtgctttg 1861 ttgcgttcgg aatatggcga tcccgaggac ccagaagctt ttgactacat ctacaaatac 1921 tcgccttatc ataacattcc gttgggtgat gtagtcatgc caccgatgct attcttcaat 
1981 gcgggatatg atgaccgcgt tcctcctcta cacaccttca agcatgttgc tgcactacaa 2041 catagatttc ctaaaggccc gaatccaatt ctcatgcgca tggacctaag ttcagggcat 2101 tatgctggca agagtgtaca aaagatgatt gaggaaactg cagatgaata cagcttcatt 2161 gggaagtcta tggggcttac tatgcaagtc agagcaaaac catctaataa ccgttggtcc 2221 tgtgtagtga cttga
TABLE-US-00003 TABLE 3 Exemplary amino acid and nucleotide sequences of N-methyltransferases Anomoporia SEQ ID NO.: MSSPAVETKVPASPDVTAEVIPAPPSSHRPLPFGLRPGKLVIVGSGIGSIGQFTL bombycina 61 SAVAHIEQADRVFFVVADPATEAFIYSKNKNSVDLYKFYDDKKPRMDTYIQ The DOE MAEVMLRELRKGYSVVGVIYGHPGVFVTPSHRAISIARDEGYSAKMLPGVS Joint AEDNLFADIGIDPSRPGCLTYEATDLLLRNRTLVPSSHLVLFQVGCIGLSDFRF Genome KGFDNINFDVLLDRLEQVYGPDHAVIHYMAAVLPQSTTTIDRYTIKELRDPVI Institute KKRITAISTFYLPPKALSPLHEESAAKLGLMKAGYKILDGAQAPYPPFPWAGP (JGI) NVPIGIAYGRRELAAVAKLDSHVPPANYKPLRASNAMKSTMIKLATDPKAF 1346513 AQYSRNPALLANSTPGLTTPERKALQTGSQGLVRSVMKTSPEDVAKQFVQA ELRDPTLAKQYSQECYDQTGNTDGIAVISAWLKSKGYDTTPTAINDAWADM QANSLDVYQSTYNTMVDGKSGPAITIKSGVVYIGNTVVKKFAFSKSVLTWSS TDGNPSSATLSFVVLTDDDGQPLPANSYIGPQFTGFYWTSGAKPAAANTLGR NGAFPSGGGGGSGGGGGSSSQGADISTWVDSYQTYVVTTAGSWKDEDILKI DDDTAHTITYGPLKIVKYSLSNDTVSWSATDGNPFNAVIFFKVNKPTKANPT AGNQFVGKKWLPSDPAPAAVNWTGLIGSTADPKGTAAANATASMWKSIGI NLGVAVSAMVLGTAVIKAIGAAWDKGSAAWKAAKAAADKAKKDAEAAE KDSAVDDEKFADEEPPDLEELPIPDADPLVDVTDVDVTDVDVTDVDVTDVD VTDVDVTDVDVTDVDVTDVDVTDVDVVDVLDVVVI Armillaria SEQ ID NO.: MPANKGTLTIAGSGIASIGHITLETLSYIQGADKVYYVITDPATEAFIQDKSEG gallica 62 DCFDLTVYYDKNKIRYETYVQMCEVMLRDVRADYNVVGVFYGHPGVFVSP The DOE SHRAIAIARDEGYRARMLPGVSAEDYMFSDLGFDPAVPGCMTQEATAMLNH Joint NKKLDPSIHNIIWQVGAVGIDTMVFDNRKFHLLVDRLEEDFGPDHRVVNYIG Genome AVLPQSTTVMDEFTIGDLRKEDVVKQFTTVSTFYVPPRTRAPVDQEAMQKF Institute GPSDAPLAHTVRHLYPPSKWAGTQTSVVPAYGPCERAAVDRIADYTPPPDH (JGI) MILRASPAIRQFMTDLALNPGLRDRYKADPVAVLDATPDLSTQEKFALSFDK 1000654 PGPVYTVMRATPAAIASGQEPTFDDIAGATESASPPLFVIT Armillaria SEQ ID NO.: MPANKKGTLTIAGSGIASIGHITLETLSYIQEADKVYYAITDPATEAFIQDKSE gallica 63 GDCFDLTVYYDKNKIRYETYVQMCEVMLRDVRADYNVVGVFYGHPGVFV The DOE SPSHRAIAIARDEGYRARMLPGVSAEDYMFSDLGFDPAVPGCMTQEATAML Joint NHNKKLDPSIHNIIWQVGAVGIDTMVFDNRKFHLLVDRLEEDFGPDHRVVN Genome YIGAVLPQSTTVMDEFTIGDLRKEDVVKQFTTVSTFYVPPRTRAPVDQEAMQ Institute KFGPSDAPLVYPPSKWAGTQTFVVPAYGPCERAAVDRIADYTPPPDHMILRA (JGI) 622643 SPAIRQFMTDLALNPGLRDRYKADPVAVLDATPDLSTQEKFALSFDKPGPVY 
IVMRATPAAIASGQEPTFDDIAGATESASPPLFIIVQVPA Arthrobotrys SEQ ID NO.: MSEGGKLILVGTGVRSLCQLTLEAIDEIERADVIYYAVRDATTEGFIKKRNKE oligospora 64 AIDLYQYFINDEEIPEADIYIQIAEVMLAATRKGRRVVGAFFGHPGLFMSPNR The DOE RALAIAQAEGYTAKILPGVSVDDCLLADLGVDPSFIGCLTCEARDFMIHDHL Joint GLTSRHVIMYEVGYLGFYGDDSKTDYFEYFVNRLEEIYGNEHSLVNYTAAIS Genome PLMQPVINTLTIGDLRKPEVRKQITSASTLYFPPKEILKLNKFGCDLLDQGITN Institute KEQFQHAIFPGQPLYQLIGKALPHEAYSEHAQQVIAGLHRRKISPRYPLYRAS (JGI)4309 AAMQSTMEDIYLKNEVRKEYLISPTSFTLRVVPGLKEMEKIALASGNYSQID GAMKSGDLDQLTTGAIEIGNYKVILYSGYAIGYERATFAIADFTNFSFFNIY Armillaria SEQ ID NO.: MPANKKGTLTIAGSGIASIGHITLETLSYIQEADKVYYAITDPATEAFIHDKSK ostoyae 65 GDCFDLSVYYDKNKNRYETYVQMCEVMLRDVRADYNVLGVFYGHPGVFV The DOE SPSHRAIAIARDEGYRARMLPGVSAEDYMFSDLGFDPAVPGCMTQEATAMLI Joint HNKKLDPLIHNIIWQVGSVGVDTMVFDNRKFHLLVDRLEEDFGLDHKVVHY Genome IGAVLPQSTTVMDEFTIGDLRKEDVVKQFTTMSTFYVPPRTPAPVDQEAMQK Institute FRSLDAPLARTVHLYPPSKWAGTQTSVVPAYGPYERAAVDRIADYTPPPDH (JGI) 252778 MILRASPAIRQFMMDLALNPGLRDRYKADPVAVLDATPDLSTQEKFALSFD KPGPVYTVMRATPAAIASGQEPTFDGIAGAAKPASFPGVAPLIIISV Apodospora SEQ ID NO.: MAAEHATPSPVETHFGRTVPAMGRRPGKLVMVGSGIKSISHMTLETVSHIEQ peruviana 66 ADKVFYCVADPGTELFVKSKAKWSFDLYTLYDNDKNRYITYVQMAELCLQ The DOE AARDGFFSVGVFYGHPGVFVSPSHRAIGIAKREGIEAYMLPGISAEDCLFADL Joint GVDPSFTGCQTYEATDLLLRDRPISPYSHLIVWQVGVVGDTGFNFGGFTQTK Genome FQVLVDRLEEVYGSDHRLIHYFASTLSHGPAHIEPLRISDLRKPEVEKRMNGI Institute STFYVPQIGKSAHNPKTAERLGLRVDSKTPDRSFGHLIGPAISYNTLETRAVQ (JGI) 642771 ALKTHKPSPSYRKNRLPTSTLPVLTALATSPKAVAHFKRNTTQFLDAFPDMA THVKKVLQTGSPGLLRLLSLNSSADVAAKFVQAEFRDSTLASKYAAVLKEN NGDPDGETNIIKFLQDQGYDTTPEDVSTAYLSAISVDLNTYAGYYASTFTNG GVGPNILIQNGAVTVDDTVIKNPVYAQSLLQWSIKDGNAFNAKLTFRILTDD DGKPLAPGAYIGPQFYGTYWKSEEPSTPNIQGKTGTAPIKPVNPVTPVTPTPL DTFTGNFVAYKADATTGKWSEDGTFVVSDPAGSTVPTAVYKGKTLNNYQY SGNETLTWSSTDGNDSNGSISFFINKTATSTNPTLGAQATGRVWAPAEAMPA KVNFFMSLGQSANPSTQSVPSQSASEWKSVGINVGVGLATMLLGTAIIEAIK WRIKLKANPTDPEINQGVKDSSEKVSQSSEQQEAVQKSSVESDASGSADVQP SDIPVPDAPVTTTTDTTTTDTTTTDTTTTDTTTTDTTTTDTTTTDTTTTTDTTT TTDVTTDVTTDVDVVVDVDVIVIL 
Bjerkandera SEQ ID NO.: MSTTTSNNAGSLTIAGSGIASVAHITLETLSHIREADKVFYIVCDPATEAFIHD adusta 67 NAKAEAVDLTVYYDTNKARYDSYVQMAEVMLQDVRGGKDVLGIFYGHPG The DOE VFVSPSHRALAIARSEGYKAKMLPGVSAEDYLFADLEFDPSVHGCATFEATE Joint LLLREKPLNTTMHNIIWQVGAVGVDDMVFTNSKLHVLVDRLEKDFGPEHQV Genome VHYIGAVLPGSRTVMDTFTVADLCKDDVVKQFNPSSTLYIPPRSLAANSSDIA Institute ASLGAKPDHPLVDPTLFPPLRWTKSTSPEAPAYGPLEQAAVAELANHKVPSQ (JGI) 128644 HKVLAASPAMRTLVAELNVALRKKLAADPKAFAGGREGLTEVEKLAVGTG NVGTMGAVMRALPGGEQSTDMVTSPASIEQQSRREAFFLIVLIVSTRILH Cercospora SEQ ID NO.: MPSQTSIWNHIDELTRHDVFPSTEAGKGELVVVGTGIASIRQMTVEALDYIQR beticola 68 ADKVFYATLDAVTETFIKHHAPSAEDLYQYYDTEKNRVTTYVQMAEVILSS GenBank VRKGKLTVAVFYGHPGVFVTPSHRAIYIARHEGYKAQMLPGVSAEDCLYAD XP_023455951.1 LGIDPASSGCSMYEASFLLNEPNRLDSRHHLIIWQVGCVGKEAMIFDNKEIYK LADYLEAEYGPDHPVIAYLAAIQPFHDSKMDKMTVQDLRDQDKVQNIPITA GTTLYVPPKKLPANPPAYKDMAIGYQLALTSAFRISHPDLDVVETYTQEEKS WCEELASWSPPKSYNANAAPPVLRRIAVKLALLHHRLHGNVALSDVANAIT TAEPSLTDEEANLLRQFVGHLDFMFKKERPPQSVTTSIINNTIVPPIVTQLNIIR KDGSIMKGVKKPSLYVY Ceratobasidium SEQ ID NO.: MASITTGRDTTKSGSLIIAGSGISSVAHLTLETVSHLKNADNVFYLVGDPVTE sp. 69 AFIQENNKSTTNLVAHYATSKHRYQTYVEMAEVMLREVRAGHSVFGIFYGH (anastomosis The DOE PGVLTTPAHRALTLARQEGYEARMLPGVSSVDYMFADLELEPGQHGCMIHE group I, AG-I) Joint ATDLLARDRRLDPSVHNIILQPSRVGSATLEKEASKFQLLVDRLVRDFGPDH Genome KIVHYSGAVLPQSSSAMVVFVIENLRNEQLANQIRSTSILYIPPRDIVPVHPDA Institute AAALKLPDMLGLLSTSVQWVGPRFIETADYGPVERKFVDQLERQVIPEGQQS (JGI) 486605 LRASTAMRKFMINLALDPNGLKEYKESPSAVAAGVPGLTDRERSALAIASEG PIFVVMSRTDDEEPTEEQLMEADRNGARIVDSCTMCTLGGGRNS Ceratobasidium SEQ ID NO.: MTTPSDTNKKGTLTIAGSGIASIRHITLETLSYIKESDKIYYLVADPATEAFIIE sp. 
70 NANGSCVSLYGLYGIDKIRYDTYVQMSEVLLRDVRAGFDVLGIFYGHPGVF (anastomosis The DOE VSPTQRAMSIALEEGFQARMLPGVSAEDYLFADLRVDPCMFGCAAYEATEL group I, AG-I) Joint LYRKRRLNPTMQNIIWQVGKRFTIIKLTSPDTQNSKFGLLVDHLEEDYGPDH Genome KVVHYIGAVLPQATTVIQPYTISELRKPEVASQIRACSTFYIPPRDEILPDASMS Institute ERLGLDAPISHLLGGRYPRPAWSVSGFKTAPAYGPREKHLVAELNVRGIPEP (JGI) 594340 DMVLFASQPMRKFMADLALKPRLRDSYRSNPQVIVDAVKGLTSLENMALK LNRVTAITRVMSVNPTALILGIEPTETDLAIDPYMDNGDPKIVVSG Cerrena SEQ ID NO.: MATQKSGSLTIAGSGIASIGHITLETLSYIEQADKVYYAVADPATEAFIQDKS unicolor 71 KVECFDLTVYYDKDKIRFETYIQMSEVMLRDVRAGHSVLGIFYGHPGVFVCP The DOE SHRAIAIALSEGYKARMLPGISAEDYMFSDIGFDPALPGCTTQEATHLLLHNK Joint KLDPSMIHNIIWQVGGVGADTMNFDNRQFHQLVDCLERDFGSSHKVVHYIG Genome AVMPQSTTIMDEFSIADLRKEEVVKQFTTWSTFYIPPRDAAPVDEGIMQSLGL Institute SSNDMQYTMYPPSSTMRLGIRSPNLDVYGRAGRAAIEKLDHHTPAARHQVL (JGI) 312586 RASPAIRKFMEDLALKSDLRDRYKADPHTVLDAIPGLTSQEKIALGFGKPGP VYKVMRATGRETADGQEHVPHDLTTTDEPGAPVLLLLLLQTT Cerrena SEQ ID NO.: MATTKTGSLTIAGSGIASVAHITLEVLSYLQEADKIYYAIVDPVTEAFIQDKSK unicolor 72 GRCFDLRVYYDKDKMRSETYVQMSEVMLRDVRSGYNVLAIFYGHPGVFVC The DOE PTHRAISIARSEGYTAKMLPGVSAEDYMFSDIGFDPAVPGCMTQEATSLLIYN Joint KQLDPSVHNIIWQVGSVGVDNMVGDNKQFHLLVDHLERDEGSIHKVIHYVG Genome AIMPQSATVMDEYTISDLRKEDVVKKFTTTSTLYIPPREIAPVDQRIMQALEF Institute SGNGDRYMALSQLRGVHARNSGLCAYGPAEQAAVDKLDHHTPPDDYEVLR (JGI) 361677 ASPAIRRFTEDLALKPDLRSRYKEDPLSVLDAIPGLTSQEKFALSFDKPGPVY KVMRATPAAIAAGQEHSLDEIAGSADSESPGALATTIVVIVHI Cladosporium SEQ ID NO.: MPSQSIWSHIAELTRGGPVPKDVPHKGELVVVGTGIASLRQLTVEALDYIQR fulvum 73 ADVVFYATLDAVTEAFIKQHAKAAENLYQYYDTEKNRNATYTQMAETILAS The DOE VRKGNMTVAVFYGHPGVFVTPSHRAIYIARQEGYKAKMLPGVSAEDCLYA Joint DLDIDPASSGCSMYEASFLLLEPDRLDSRHHLIIWQVGCVGKEAMVFDNKEL Genome YKLADYLEAEYGPKHPAIAYLAAIQPFNDSKMDHMTVEDLRDPEKVRSIPIN Institute AGTTLYVPPKKLPANPQAYKDIEIGYKLGLTSAFRISHPELDVAETYSEIEKG (JGI) 186945 WCEELVSWTPPKSYIPNAATPALRRIAIKLALLHHRLHGSMSLEDIANAATA AEPSLTTDESDLLKQSVGFLDSMFNKERPPQSVTTSIVRSVVPPIVTQLNIIRK DGTVMMGDGKPSIYVF Chalara SEQ ID NO.: 
MATSSSFQQLPRGSLTIVGSGFRSIIQFTTEALMHIEAAEKLYYCVLDAATRG longipes 74 FIKAKNSNSVDLYECYSNTKPRYETYIQMTEAMLRSVRDGLKATVVLYGHP The DOE GVFIHPSHRAIAIARSEGYDAWMLLGISVEDYLFADLLIDPSNPGTQTVEATEI Joint LLKERPLLTSSHVIIYQVGCIGNFTFNFSGIKNDKFDALVDRLIQEYGPDHPLV Genome NYQAAISPLSEASIGRHIVSDLRKAEVQESVTGASTFYIPPKTVLQVTPQGAK Institute LVSESDELPTYLSKDVPVFPPFPFNQSLAPIAPAYSSAERKAIEELDNHITPLEY (JGI) 462219 RKYNASSAMQKTVESISFSLDTIKKFRESPSAFASSIEELEPHEIDALSTGSGER IDAAMQGNAAVNPNAAWLITFAIIFGK Coprinopsis SEQ ID NO.: MDATANPKAGQLTIVGSGIASINHMTLQAVACIETADVVCYVVADGATEAFI marcescibilis 75 RKKNENCIDLYPLYSETKERTDTYIQMAEFMLNHVRAGKNVVGVFYGHPGV The DOE FVCPTHRAIYIARNEGYRAVMLPGLSAEDCLYADLGIDPSTVGCITYEATDM Joint LVYNRPLNSSSHLVLYQVGIVGKADFKFAYDPKENHHFGKLIDRLELEYGPD Genome HTVVHYIAPIEPTEEPVMERFTIGQLKLKENSDKIATISTFYLPPKAPSAKVSL Institute NREFLRSLNIADSRDPMTPFPWNPTAAPYGEREKKVILELESHVPPPGYRPLK (JGI) 670214 KNSGLAQALEKLSLDTRALAAWKTDRKAYADSVSGLTDDERDALASGKHA QLSGALKEGGVPMNHAQLTFFFIISNL Coprinellus SEQ ID NO.: MIGASLAKKGQLTIVGSGIASISHLTLQAVSAIENADIVCYVVADGATEAFIR micaceus 76 KKNPNSLDLYHLYGEDKQRTDTYIQMAEFMLIRVRQGQNVVGVFYGHPGV The DOE FVCPTHRALYIARSEGYKARMLPGLSAEDCLFADLGIDPSSVGCVTYEATDL Joint LVFKRPINPASHLVLYQVGIVGKSNFKFDYTSDENIHFTKLLDRLEEAYGPEH Genome SVTHYIAPLFPTEDPIAEEYTIAQLRLPEIRDKIHTISTFYVPPKTSESLIYDEVL Institute LASLGVTHKPSVPYPWNPEATPYGPREKKAIELLAEHEPPKGYRPLKERSGL (JGI) LAVLEKLCLEPLEMKKYNEDRQAYADGLKGLTENEKEALVKGDHRTLAGA 1707844 LKVGDTPTNPAALVFTFIITRLD Cystostereum SEQ ID NO.: MPAPRKGTLTIAGSGIASIGHITLETLSHIQGADKIHYAVTDPATEAFILEKSK murrayi 77 DSSSCFDLGIYYDKNKMRYETYVQMCEVMLRDVRGGHNVLGIFYGHPGVF The DOE VSPTHRAIALARDEGYTAKMLPGISAEDYMFSDLGFDPAFPGCMTQEATILL Joint VRGRKLDPSVHNIIWQVGGVGVDTMVFDNANFYILVDRLEEDLGPDHKVV Genome HYIGAVLPQSTAVIDEFTVAGLRKEEVVKQITTVSTFYLPPRTLLHADQDMV Institute QKLGLSDSLGKRAVHVYPRTKWINAESPSPPAYGPFERAAVDRLADHTIPSN (JGI) HLFLRGSQALRQLMTDLALQPTLRARYVADPTSVLDDVTGMSAEETFALTL 1185527 RHPAPVFKVMRATGEAIANGVPTLGEIAESANSSIAGSSCALIGFFVVVLEI Coprinellus SEQ ID NO.: 
MPSTTRGSLTLAGAGVTSIGHLTLQTVSAIENADIVCYILNDPVTEAFIIKKNP pellucidus 78 NVYDLYQLYDDGKPRIETYHQMVEVLMSKVRSGQDVVGLFTGHPGVVNTP The DOE AAQAFKIARQEGYTARMLPGITTNDALLADVVADPALGGAMAYEATDFLN Joint NNRVLHPEMNVFIQQVGVVGNKHFNFMEMRSSLLDKLIDRLEETYGGEKEII Genome HYIAPMLPIDKPVMQKMTVSDLKKPEYKAKIVPSSTFYITPNEQLSSVLDSTE Institute GKKLHREAMSALANHTHGKNYAPMKENLALTEALERLALEPKSLEAYRSDP (JGI) 554111 QSYVNENGRGLTEEERKALVTGRGIRELLSDGPVAAHRIAPLALV Dendrothele SEQ ID NO.: MPVRIPSPQKEAGSLTIVGTGIESIGQITLQAISHIETASKVFYCVVDPATEAFI bispora 79 RTKNKNCFDLYPYYDNGKHRMDTYIQMAEVMLKEVRNGLDVVGVEYGHP The DOE GVFVSPSHRALAIAESEGYKARMLPGVSAEDCLFADLRIDPSHPGCMTYEAS Joint DFLIRERPVNIHSHLVLWQVGCVGVADFNSGGFKNTKFDVLVDRLEQEYGA Genome DHPVVHYMASILPYEDPVTDKFTVSQFRDPQIAKRICGISTFYIPPKETKDSNV Institute EAMIHRLQLLPSGKGVLKETGRYPSNKWAPSGSFHDVDPYGPRELAAVTKLK (JGI) 758933 SHTIPEHYQPLATSKAMTDVMTKLALDPRVLSEYKASRQDFVHSVPGLTPNE KNALVKGEIAAIRCGMKNIPISEKQWELRDGLVTKFIVVPIWVSIDDTTGNLE Dendrothele SEQ ID NO.: MESSTQTKPGSLIVVGTGIESIGQMTLQALSYIEAASKVFYCVIDPATEAFILT bispora 80 KNKNCVDLYQYYDNGKSRMDTYTQMAELMLKEVRNGLDVVGVFYGHPG The DOE VFVNPSHRALAIARSEGYQARMLPGVSAEDCLFADLCIDPSNPGCLTYEASD Joint FLIRERPVNVHSHLILFQVGCVGIADFNFSGFDNSKFTILVDRLEQEYGPDHT Genome VVHYIAAMMPHQDPVTDKFTIGQLREPEIAKRVGGVSTFYIPPKARKDINTDI Institute IRLLEFLPAGKVPDKHTQIYPPNQWEPDVPTLPPYGQNEQAAITRLEAHAPPE (JGI) 765759 EYQPLATSKAMTDVMTKLALDPKALAEYKADHRAFAQSVPDLTPQERAAL ELGDSWAIRCAMKNMPSSLLEAASQSVEEASMNGFPWVIVTGIVGVIGSVVS SA Fomitiporia SEQ ID NO.: MATSTETTEKKGSLTIAGTGIASIKHITLETLSYIKEAEKVYYLVADPATEAFI mediterranea 81 QDNASGTCFNLHVFYDTNKHRYDSYVQMAEVMLLDVRAGHSVLGIFYGHP The DOE GVFVSPSHRAIAIAREEGFKAHMLPGISAEDYMFADIGFDPATHGCVSYEATE Joint LLVRDKPLLPSSHNIIWQVGAIGANAMVFDNGKFNILVDRLEQVFGPDHKVV Genome HYIGAVLPQSTSTIEAYTISDLRKGDVVEKFSTTSTLYVPPSVEARLSGIMVRE Institute LGLEDSGFHTKSSQSRTLWAGPVTSSAPAYGPQERIVIAQIDKDVIPDSHQILQ (JGI) 25792 ASDAMKKTMANLALNPKLSEEYYASPSTVVEKVTGLSEQEKKALILCSAGAI HMVMAATQTNIAQGHQWSAEELEAAGTPHPALALLVVIICLI Fomitiporia SEQ ID NO.: 
MAATTETMKKGSLTIAGSGIASIKHMTLETVSHIKEAEKVYYIVTDPATEAYI mediterranea 82 KDNAVGACFDLRVFYDTNKPRYESYVQMSEVMLRDVRVGHSVLGIFYGHP The DOE GVFVSPSHRAIAIAKEEGFQARMLPGISAEDYLFADIGFDPAAHGCMSYEATE Joint LLVRNKPLNTSTHNIIWQVGALGAEAMVFDNAKFSLLVDRLEQDYGSDHKV Genome VHYIGAILPQADPTVEAYIVADLRKEDVVKQFNAISTLYIPPRVAGKFLDDM Institute AKKLGIADSAAYLKNHYPQAPYTGPEFATDPAYGPREKAVIDQIDNHAAPEG (JGI) 30904 HTVLHASDALKKLNTDLALSPKFLEEYKENPMPILEAMDGLTNEEKAALMQ NPLGATHELMWATPDEIANGRALPVVNFMAYGGYGGYYGGGCRPCPCCVV TDRWSSGGSNKCNMVNNLNV Fomitiporia SEQ ID NO.: MAATTETTKKGSLTIAGSGIASIKHMTLETVSHIKEVEKVYYIVSDPATEAYI mediterranea 83 KDNAVGTCFDLRVFYDTNKPRYESDVQMSEVMLRDVRAGHSVLGIFYGHP The DOE GVFVSPSHRAIAIAKEEGFQARMLPGISAEDYLFADIGFDPAVHGCMSYEATE Joint LLVRNKPLNTSTYNIIWQVGALGAEAMVFDNAKFSLLVDRLERDYGSDHKV Genome VHYIGAILPQADSTIEAHTVSDLRKEDIVKQFNAISTLYIPPRVAGKFLDDMV Institute EKLGIADPATFLKNHYTQPPYSGPEFATDPAYGPREKAVIDQIDNHAAPEGH (JGI) 162487 TVLHATDALKKLNTDLALSPKFLKEYKENPMPILEAMDGLTDEEQAALMQN PLGATHELMWATPDEIANGRVLPVVNFCFLGGNRRGYRRGYQAVNYGGSY NTYIINNF Fomitiporia SEQ ID NO.: MATSTETAQKKGSLTIAGTGIASIKHITLETLSYIKEAEKVYYLVADPATEAFI mediterranea 84 HDNASGTCFNLHVFYDTNKLRYDSYVQMAEVMLRDVRAGNSVLGLFYGHP The DOE GVFVSPSHRAIAVAREEGFKAQTLPGISAEDYMFADIGFDPASHGCVSYEAT Joint DLLARDKPLLPSSHNIIWQVGAIGANAMVFDNGKFNVLVDRLERDFGPNHK Genome VVHYIGAVLPQSTSKVEQYTVADLRKDYVVKTFTTTSTLYVPPCVDAGISNI Institute MARELGLEDSTGLRTRGNQPLPLKTGPAISLASVYGSHERTTIAQIDKGVTPD (JGI) 117392 TLQILQASDAMKKLMADLALKPKLLEKYRGNPSVVIDEVTGLAPQEKAALT LCSAGAIYMVMAASQIDIAKGRQWSTEELKTAADVSAPVILVLSQYNTVH Gyromitra SEQ ID NO.: MSVQPQSSAKKGGLVVVGSGIRSVSQLTLEAVMHIEKADTVLYCVCDPSTE esculenta 85 GFIKRKNKNAIDIYGYYSDLKERPDAFVQMAEVILREVRKGINVVAVFYGHP The DOE GIFVHPSRRALAIAKKEGYAARMLPGISAEDCLFADLLVNPSFPGAQLVEASD Joint IVYRARPLATSCHVVIFQAACFGHWKYNFTAFENGKFDHLVNRLQKDYGPD Genome HPIVSYMAAVSPLEDPVINRHTISDLYKADVKKEITPNCTLYIPPKDLLPISPA Institute GELIILGHQAGPDETPKFPPLPHIHYLAPEEETYGPQETSAVAALEKGAISADY (JGI) 514041 RPYCASPAMQKVTESLSLDPEVLKTYRESPQAFAESIPGLEAREVKALASGSP VKIHDSMWVEGKSEVRW 
Gymnopilus SEQ ID NO.: MATPIATTTNTPTKAGSLTIAGSGIASVGHITLETLAYIKESHKVFYLVCDPVT junonius 86 EAFIQENGKGPCINLSIYYDSQKSRYDSYLQMCEVMLRDVRNGLDVLGVFY The DOE GHPGVFVSPSHRAIALAREEGFNAKMLAGVSAEDCLFADLEFDPASFGCMTC Joint EASELLIRNRPLNPYIHNVIWQVGSVGVTDMTFPILIDRLEKDFGPNH Genome TVIHYVGRVIPQSVSKIETFTIADLRKEEVMNHFDAISTLYVPPRDISPVDPTM Institute AEKLGPSGTRVEPIEAFRPSLKWSAQNDKRSYAYNPYESDVVAQLDNYVTP (JGI) EGHRILQGSPAMKKFLITLATSPQLLQAYRENPSAIVDTVEGLNEQEKYGLKL 1778734 GSEGAVYALMSRPTGDIAREKELTNDEIANNHGAPYAFVSAVIIAAIICAL Gymnopus SEQ ID NO.: MQSSTQKQAGSLTIVGSGIESISQITLQSLSHIEAASKVFYCVVDPATEAYLLA fusipes 87 KNKNCVDLYQYYDNGKPRMDTYIQMAEVMLREVRNGLDIVGVFYGHPGV FVNPSQRAIAIAKSEGYQARMLPGISAEDCLFADLGIDPCNPGCVSYEASDFLI RERPVNVSSHFILWQVGCIGVADFTFVKFNNSKFGVLLDRLEHEYGADHTV VHYIAAVLPYENPVIDKLTISQLRDTEVAKRVSGISTFYIPPKELKDPSMDIMR RLELLAADQVPDKQWHFYPTNQWAPSAPNVVPYGPIEQAAIVQLGSHTIPEQ FQPIATSKAMTDILTKLALDPKMLTEYKADRRAFAQSALELTVNERDALEM GTFWALRCAMKKMPSSFMDEVDANNLPVVAVVGVAVGAVAVTVVVSLND LTDSVN Hydnomerulius SEQ ID NO.: MPVPTTTNKNGSLTIAGSGIASIRHMTLETLSAIKSADKVYYTVCDPATEAFI pinastri 88 QDNATGSCSDLTVYYDKEKSRYDTYVQMCEVMLREVRAGHNVLGVFYGH The DOE PGVFVSPSHRAIAIARAEGYKAEMLAGVSAEDYMFADLGFDPAAHGCVTYE Joint ATEMLLRKKQLNPATHNIIWQVGGVGVSNMIFDNARFHLLVDRLEDTFGPD Genome HQVVHYIGAVLPLSVKTMETYTIADLRKEDVVAQFNPTSTLYIPPRDVSPND Institute PEVAQQLSSFEAVVRSKYPPPGWTTSEPSSALAYGPRERDAIAQLDSHVAPD (JGI) 28991 SHKVLRASSAIRRLMADLALSPELLATYRKDPQAVVAATEGLTVQEKAALS LNKAGAIYGVMKATPYDIANNRSLSVADMGAINEPAALTTMINIHVTHV Lentinula SEQ ID NO.: METPTLNKSGSLTIVGTGIESIGQMTLQTLSYIEAADKVFYCVIDPATEAFILT edodes 89 KNKDCVDLYQYYDNGKSRMDTYTQMSEVMLREVRKGLDVVGVFYGHPGV The DOE FVNPSLRALAIAKSEGFKARMLPGVSAEDCLYADLCIDPSNPGCLTYEASDFL Joint IRERPTNIYSHFILFQVGCVGIADFNFTGFENSKFGILVDRLEKEYGAEHPVVH Genome YIAAMLPHEDPVTDQWTIGQLREPEFYKRVGGVSTFYIPPKERKEINVDIIREL Institute KFLPEGKVPDTRTQIYPPNQWEPEVPTVPAYGSNEHAAIAQLDTHTPPEQYQ (JGI) PLATSKAMTDVMTKLALDPKALAEYKADHRAFAQSVPDLTANERTALEIGD 1040599 SWAFRCAMKEMPISLLDNAKQSMEEASEQGFPWIIVVGVVGVVGSVVSSA Lentinula SEQ ID NO.: 
METPTLNKSGSLTIVGTGIESIGQMTLQTLSYIEAADKVFYCVIDPATEAFILT lateritia 90 KNKDCVDLYQYYDNGKSRMDTYTQMSEVMLREVRKGLEVVGVFYGHPGV The DOE FVNPSLRALAIAKSEGYKARMLPGVSAEDCLYADLCIDPSNPGCLTYEASDF Joint LIRERPTNIYSHFILFQVGCVGIADFNFTGFENSKFGILVDRLEKEYGADHPVV Genome HYIAAMLPHEDPVTDQWTIGQLREPEFYKRVGGVSTFYIPPKERKEINVDIIR Institute ELKFLPEGKVPDTRTQIYPPNQWEPEVPTVPAYGSNEHAAIAQLDAHSAPEQ (JGI) 755966 YQPLATSKAMTDVMTKLALDPKALAEYKADHRAFAQSVPDLTANERTALEI GDSWAFRCAMKEMPVSLLDNAKQSMEEASEQGFPWIIVVGVVGVVGSVVS SA Lentinula SEQ ID NO.: MESSTQTKTGSLIIVGTGIESIGQMTLQTLSYIEAADRVFYCVIDPATEAFILTK raphanica 91 NKNCVDLYQYYDNGKTRMDTYTQMSEVMLREVRKGLKVVGVFYGHPGVF The DOE VNPSLRALAIAKSEGFKARMLPGVSAEDCLYADLCIDPSNPGCLTYEASDFLI Joint RERPANIYSHFILFQVGCVGIADFSFTGFDNSKEGVLVDRLEKEYGGDHPVV Genome HYIAAMLPHEEPVTDKFTIAQLREPEVYKRVGGVSTFYIPPKERKEINADIIHQ Institute LKFLPEGKVPDKRTQIFPPNQWEPEVPTLPAYGPNDYATIALIDSHTPPEQYQ (JGI) 642948 PLATSKAMTDVMIKLALDPQALEEYKADHRAFAQSIPDLTTHERIALEMGDS WAFRCAMKDMPQSLLERAQQNMEESAQHGFPWIIVVGVVGVVGSVVSSA Mycosphaerella SEQ ID NO.: MASSSVWSYIDHLTQEDDISSSCGDAGDKKGELVVVGTGIASLRQMTVEAL eumusae 92 DYIQRADMVFYVVLDAMTECFIQTHAKKHHDLYQYYDKNKPRNASYVQM GenBank AELMVQSVRDGNLTVAVYYGHPGVFVFPTHRAIHIAREEGYKAKMLPGVSA KXT02930.1 EDCLYADLGIDPGTTGCSMFEATYLLNEPDRLDPRNHVIIWQPGCVGKSTMV FDNSEIHELADYLEKTYGPEYPIIAYLAAVRPFNDPQIDKLMVKDLRDLEKLK AIPFNAATTLYIPPKTLPVVPQDMEDPIELQLARNSAFRMSHPEMNLVDNYT KQDKQWVEDLKHFVPPNDYKRMTASTAMRRAAIKLALLHHRLHGVLPREL IADRALSKSGLTPNEAESLRVMIDNLDLFLREGVERPPAVNGVSVIVFALLIIR NEDQRVNLHGGKMGWKRSVVVN Marasmius SEQ ID NO.: MTFNDKKGSLTIAGSGIASIRHITLETLSHIERADKVYYLVADPATEAFIQDKS fiardii 93 KGDYVDLAIYYDKDKNRYESYVQMSEVILNDVRAGYNVLGVFYGHPGVFV The DOE SPSHRTVAIARDEGYRVNMLPGVSAQDYMFSDIGFDPAIPGCTIQEASTILFL Joint DKRLDPTVHNIIGQVGCVGVGTMAFDNRQFHLLVDHLEKDFGPEHKVVHYI Genome GAVLPQSATVKDEFKIADLRKDDVVKQISTISTFYIPPRQVTPVPKEVAEKLG Institute FHPLPTLPISTRIYPFLGSKASSSSTSFYEPFERNAVDRLQNHLPPLDYNTLRAS (JGI) 958901 PAVRQFMTDLALRPDVLNLYQADPMVLVDEIPGLTPSEKSALRSGDPGPVYE LMRSNFTREKSTQMGAIVFVSI Mycena SEQ ID NO.: 
MALKKPGSLTIAGSGIASIGHITLETLALIKEADKIFYAVTDPATECYIQENSR rosella 94 GDHFDLTTFYDTNKKRYESYVQMSEVMLRDVRAGRNVLGIFYGHPGVFVA The DOE PSHRAIAIAREEGFQAKMLPGISAEDYMFADLGFDPSTYGCMTQEATELLVR Joint NKKLDPSIHNIIWQVGSVGVDTMVFDNGKFHLLVERLEKDFGLDHKIQHYIG Genome AILPQSVTVKDTFAIRDLRKEEVLKQFTTTSTFYVPPRTPAPIDPKAVQALGLP Institute ATVTKGAQDWTGFQSVSPAYGPDEMRAVAALDSFVPSQEKAVVHASRAMQ (JGI) 934645 SLMVDLALRPALLEQYKADPVAFANTRNGLTAQEKFALGLKKPGPIFVVMR QLPSAIASGQEPSQEEIARADDATAFIIIYIVQG Mycena SEQ ID NO.: MALNKPGSLTIAGSGIASIGHITLETLALIKEADKIFYAVTDPATECYIQENSR rosella 95 GDHFDLTTFYDTNKKRYESYVQMSEVMLREVRAGRNVLGIFYGHPGVFVAP The DOE SHRAIAIAREEGFQAKMLPGISAEDYMFADLGFDPSTQGCMTQEATELLVRN Joint KKLDPSVHNIIWQVGSVGVDTMVFDNGKFHLLVERLEKDFGLDHKIQHYIG Genome AILPQSVTVKDAFAIRDLRKEEVLKQFTTTSTFYIPPRAPAPIDAKVLQALGLP Institute PPAQATKDRTGYGPLEKQAVAALDSFIPSQEKQVVHASPAMQSLMADLALR (JGI) PALFEQYKADPVGFANTRNLNGLTAQEKFALGFNKSGPIFAVMRHLPSAIAS 1200894 GQERSQEEIAHAADDKELLALVVVIVQ Omphalotus SEQ ID NO.: METSTQTKAGSLTIVGTGIESIGQMTLQALSYIEAAAKVFYCVIDPATEAFILT olearius 96 KNKNCVDLYQYYDNGKSRLNTYTQMSELMVREVRKGLDVVGVFYGHPGV The DOE FVNPSHRALAIAKSEGYRARMLPGVSAEDCLFADLCIDPSNPGCLTYEASDFL Joint IRDRPVSIHSHLVLFQVGCVGIADFNFTGFDNNKFGVLVDRLEQEYGAEHPV Genome VHYIAAMMPHQDPVTDKYTVAQLREPEIAKRVGGVSTFYIPPKARKASNLDI Institute IRRLELLPAGQVPDKKARIYPANQWEPDVPEVEPYRPSDQAAIAQLADHAPP (JGI) 2087 EQYQPLATSKAMSDVMTKLALDPKALADYKADHRAFAQSVPDLTPQERAA LELGDSWAIRCAMKNMPSSLLDAARESGEEASQNGFPWVIVVGVIGVIGSV MSTE Phlebiopsis SEQ ID NO.: MSSASSDSNTGSLTIAGSGIASVRHMTLETLAHVQEADIVFYVVADPVTEAYI gigantea 97 KKNARGPCKDLEVLFDKDKVRYDTYVQMAETMLNAVREGQKVLGIFYGHP The DOE GVFVSPSRRALSIARKEGYQAKMLPGISSEDYMFADLEFDPAVHGCCAYEAT Joint QLLLREVSLDTAMSNIIWQVGGVGVSKIDFENSKVKLLVDRLEKDFGPDHH Genome VVHYIGAVLPQSATVQDVLKISDLRKEEIVAQFNSCSTLYVPPLTHANKFSGN Institute MVKQLFGQDVTEVSSALCPTPKWAAGSHLGDVVEYGPREKAAVDALVEHT (JGI) 54959 VPADYRVLGGSLAFQQFMIDLALRPAIQANYKENPRALVDATKGLTTVEQA ALLLRQPGAVFGVMKLRASEVANEQGHPVAPASLDHVAFTAPSPASLDHVA FSAPNPASLDHVAFIAPTPASLDHVAFSAPTPASLDHVSFGTPTSASLDHVAF 
EAPVPASLDHVAFAAPVPASLDHVAFAAPTPASLDHVAFAAPTPASLDHVAF AVPVPASLDHIAFSVPTPASLDHVAFAVPVPDHVAGIPCM Phlebiopsis SEQ ID NO.: MSHDATTTKRGSLTIAGSGIASVAHITLETVAYLAEADSVFYIVADPVTEAFI gigantea 98 HKNAKVPCQDLHVFYDKDKSRYDTYVQMAETMLNSVRAGEKVLGIFYGHP The DOE GVFVSPSRRALAIAREEGYEAKMLPGVSAEDYMFADLEFDPATHGCCAYEA Joint THILLKNIPLDTSINNIIWQVGGVGVTKIDFENSKFKFLVDRLEKDFGLDHKV Genome VHYIGAVLPQSATVKEVYTISDLRKPEVATQFNACSTLYVPPRKGAADPFPA Institute HVVEQLLGTTTSKVVDALYPVAQWDLGNNLPAVPAYGPYEQKVVAAMGD (JGI) 80884 HTTPDDYRALAGSPAMQQFMAELALRPTLQAKYRASPQAVVDATPGLTDLE RAALLLNAAGPVLAVMKPRAGEVMTVDKLKESVTPSAAYLFIFIVIAAAAHI LV Pseudocercospora SEQ ID NO.: MASTVWSYFDQLTRDDDFGSCEDACSKQGELVVVGTGIASLRQMTVEALD musae 99 YIQRADMVFYVVLDAMTEAFIQTHAKKHHDLYQYYDKNKPRSASYIQMAE GenBank LMVQSVRDGNLTVAVYYGHPGVFVFPTHRAIHIAREEGFKAKMLPGVSAED KXS93410.1 CLYADLGIDPGSTGCSMFEATYLLNEPDRLDPRNHVIIWQPGCVGKSAMVFD NSEIHELADYLEKTYGAEYPVIAYLAAVRPFNDPQIDKLMVKDLRDLEKLRA IPFNAATTLYIPPKTLPAVPQDIANPIEVQLARNSAFRLSHPEMNLVDMYTKQ DKQWCDDLKHFVPPNDYKPMTATPAMRRLAIKLALLHHRLHGALPTELIAS KALSKSELSSSEAESLRLMIKNLDLFLREGVERPPAVNGVSVIVFALLIIRSED QRVGFDGKMEWKRSVVVN Porodaedalea SEQ ID NO.: MPVSTTTTKNGTLVIAGSGIASIAHITLETLSHIKESDRVYYIVGDPATEAFIQD chrysoloma 100 NASGTCFDLTIFYDTNKVRYDSYVQMCEVMLRDVRAGHTVLGVEYGHPGV The DOE FVSPSHRAIAIARDEGYKARMLPGVSAEDYLFADLGFDPATHGCTSYEATDL Joint LVRNKPLNASTHNIIWQVGGVGVGTMVFDNAKFHLLVDRLEKDFGPSHTVV Genome HYIGAVLPQSITTMDKLTIADLRKDAVVKQFNPTSTFYIPPRDISLPLDTMAK Institute KLGMDDASARPVSLYPPSRWTGTKFTTAPAYGPREKDVIAKIDTYAAPKDH (JGI) 797528 KILHASRSMKKLMTDLALNPKLLEKYRANTKAVVEATEGLSAQEKAALNM DLAGPVHAVMKATPSDITDGREMSVDAVASATEPSAALILLLV Rhizopogon SEQ ID NO.: MITSNSSNGSNSTKCGTLTIAGSGIASVAHITLETLSYIKESEKIFYLVCDPVTE vinicolor 101 AYIQDNTTADCFDLSVFYGKNKGRHDSYIQMCEVMLKAVRAGHDVLGVFY The DOE GHPGVFVSPSHRAIAVARQEGYKAKMLPGISAEDYMFADLEFDPSLSGCKTC Joint EATEILLRDKPLDPSIQNIIWQVGSVGVVDMEFEKSKFQLLVDRLEKDFGPGH Genome KVVHYIGAVLPQSTTTMDTFTIADLRKEDVAKQFGTISTLYVPPRDEGHVNP Institute SMAEAFGTPAGPARLNDSVKWVGPKLSIVSANGPHQRDVIAQIDTHIAPEGH (JGI) 805340 
KKLHASAAMKKFMTDLALRPKFLDEYKLNPVAVVESAQGLSNLEQFGLKF ARGGPVDALMKATESDIASGRQLTEEEIAKGNGPPGAAATVLLLGALIITLSL NFS Rhizopogon SEQ ID NO.: MSTKRGTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFIYDNSTA vinicolor 102 DCFDLSVEYDKTKGRYDSYIQMCEVMLKAVRAGHDVLGVFYGHPGVFVSP The DOE SHRAIAVARQEGYKAKMLPGISAEDYMFADLEFDPSVSGCKTCEATEILLRD Joint KPLDPTIQNIIWQVGSVGVVDMEFSKSKFQLLVDRLEKDFGPDHKVVHYIGA Genome VLPQSTTTMDTFTIADLRKEDVAKQFGTISTLYIPPRDEGHVNLSMAKVFGGP Institute GASVKLNDSIKWAGPKLNIVSANDPHERDVIAQVDTHVAPEGHKKLRVSAA (JGI) 749423 MKKFMTDLALKPKFLEEYKLDPVAVVESAEGLSNLERFGLKFARSGPADAL MKATESDIASGRQLTEEEIAQGTGPVGLQTALALLVLLGLGVAIVTRPDD Rhizopogon SEQ ID NO.: MTTSNSSNGTKRGTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFI vinicolor 103 HDNSTADCFDLSVFYDKNKGRYDSYIQMCEVMLKDVRAGHHVLGVFYGHP The DOE GVFVSPSHRAIAVARQEGYNAKMLPGISAEDYMFADLEFDPSLYGCKTCEAT Joint EILLRDKPLDPSIHNIIWQVGSVGVVDMEFSKSKFHLLVDRLEKDFGLEHKV Genome VHYIGAVLPQSATTMDTFTIADLRKEDVAKQFGTISTLYIPPRDERPFNPRMA Institute EAFGSPAAPAMPISSVKWAGPKLNIPPVYGPHERDVIAQIDTHVAPEGHKKL (JGI) 700323 HTSAAMKKFMTDLAMKPKLLEEYKRDPVAVVEAAEALSDLEKFGLKFARV GPADVLMKATESDIASGRQLTEEEIAKANGPQGLGTIILVWHTVHGIA Rhizopogon SEQ ID NO.: MTTDIKRGTLTIAGSGIACIAHITLETLSYIKESDKLFYLVCDPVTEAFIQDNAT vinicolor 104 GGCFDLSVFYDKNKSRYDSYIQMCEVMLKAVRVGYDVLGVFYGHPGVFVS The DOE PSHRAIAVAREEGYKARMLPGISAEDYLFADLEFDPSLHGCNTYEATELLLR Joint GKPLDPLIHNIIWQVGSVGVIDMEFEKSKFHLLVDRLENDEGPDHKVVHYIG Genome AVLPQSTTTMDTFTISDLRKEDVAKQFGTISTLYVPLRDEALVNPIMAEAFGR Institute TAAPVTMNSSVKWAGPKLNIVSAYGPHERSVIAQIDTHVAPEGHKKLHTSTA (JGI) 769711 MNKFMTDLALKPKFLEEYKLDPAAVVESAEGLSNMEKFGLKVAKAGAAHI LMKATESDIASGRQLTEDEIARADGPEGLAVVVIVLVATVALLALLV Rhizopogon SEQ ID NO.: MTTGTERGTLTIAGSGIACVAHITLETLSYIKESDKLFYLVCDPVTEAFIQDNA vinicolor 105 TGDCFDLSVFYDKNKSRYDSYIQMCEVMLKAVRAGHHVLGVFYGHPGVLV The DOE SPSYRAIAVAREEGYKARMLPGISAEDYLFADLEFDPCFPSGCNTYEATELLL Joint RDRSLDPSIHNIIWQVGSVGVTDMEFEKSKLNLLVDRLENDFGPDHKVVHYI Genome GAVLPQSTTTMDTFAVSDLHKEDVAKQFGTISTLYIPPRDEAPVSSNMMEVL Institute NRPPVPNMPPPSVMWVAPKLNISSAYTPHERDVIAQIDTHVAPEGYKKLHTS 
(JGI) 854502 AAMKKFMTDLALKPKFVEEYMLDPVAVIESAEGLSDVEKFALKVAKGGAA NILMKATESEIASGRHLTEDEISNAVGPLGLSATVVLVVAEAVVIMAMAVLV Rhizopogon SEQ ID NO.: MTTGTERGTLTIAGSGIACVAHITLQMLSYIKESDKLFYLVCDPVTEAFIQDN vinicolor 106 ATGDCFDLSVFYDKNKSRHDSYIQMCEIMLRAVRADHHVLGVFYGHPGIFV The DOE SPSYRAMAVAREEGYKAKMLPGISTEDYLFADLEFDPCLPGCNTYEATELLL Joint RDRSLDPSIHNIIWQVGSVGVIDIQFEKSKFHLLVDRLEKDFGPDHKVVHYIG Genome AVLPQSTTTMDTFTISDLRKEDVAKQFGTISTLYIPPRDKPLAHPGMAEAIGS Institute LTAPAKLYSPVKWAGPKLNIVSPYSPYERDVIARIDTHVAPEGHKKLYTSAA (JGI) 710394 MKKFMTDLALKPKLLEEYMLDPVAVVESADGLSDVEKFGLKLAKDGVANI LMMATESDIASGRHLAEDEIAKAKGPLGLLTVVLVIVGSSLVVHRLT Rhizopogon SEQ ID NO.: MTTSNSSDGTKRGTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFI vinicolor 107 HDNSTADCFDLSVFYDKNKGRYDSYIQMCEVMLKAVRAGHDVLGVFYGHP The DOE GVFVSPSHRAIAVARQEGYKAKMLPGISAEDYMFADLEFDPSLYGCKTCEAT Joint EILLRDKPLDPTIQNIIWQVGSVGVVDMEFSKSKFHLLVDRLEKDFGPDHKV Genome VHYIGAVLPQSATIMDTFTIADLRKEDVAKQFGTISTLYIPPRDERPVHSGMA Institute EAFGSPGAAVKPNTSIKWAGPKLNIVSACGPHEPDVIAQIDTHVTPEGYKKL (JGI) 777202 HASVSMKKFMTDLALKPKFLEEYKLDPVAVVEAAEGLSDLEKFGLKFARDG PADTLMKATESDIASGRQLTEEEVANGNGPLGLQTVVVVWLTTKIVSPEL Rhizopogon SEQ ID NO.: MTTDTKRGTLTIAGSGIASIAHITLETLSYIKESDKLFYLVCDPVTEAFIQDNA vinicolor 108 TGDFFDLSVFYDKNKSRYDSYIQMCEIMLRAVRAGHSVLGIFYGHPGVFVSP The DOE SHRAIAVAREEGYKARMLPGVSAEDYMFADLEFDPSQSTCNTYEATELLLR Joint DRPLDPAIQNIIWQVGSVGVVDMEFEKSKFHLLVDRLEQDFGPDHKVVHYIG Genome AVLPQSTTTMDIFTISDLRKENVAKQFGTISTLYIPPRDEGPVSSSMTQAFDFK Institute AGAMVYSPVKWAGPKLNIVSALSPYERDVISQIDTHVAPEGYKILHTSAAMN (JGI) 777713 KFMTDLSLKPKFLEEYKLYPEAVVESAEGLSNLEKFGLKEGSDGAVYILMKA TESDIASGRQLTEDEIAKAHKSVGEPTVLVILPTVIVVLIGRE Sanghuangporus SEQ ID NO.: MAGSQKGTLTIAGSGIASIGHITLETLSYIQEADKIHYAVADPATEAFILDKSK baumii 109 DSSHCFDLTVYYDTNKMRYETYVQMCEVMLRDVRGGYNVLGIFYGHPGVF GenBank VSPSHRAIARDEGYIAKMLPGVSAEDYMESDIGFDPAVPGCMSQEATGLL OCB86575.1 VCKKKLDPSIHNIIWQVGSVGVDTMNREFHILVDRLEEDFGLDHKVVHYIGA VLPQSTTVMDEFTIADLRKEEVVKQITTTSTFYLPPRSMAHIDQDMLQKLRLS LSPVEHVMHVYPRSKWASAESPNPPAYGPIEREAVSHLTNHTIPNDHQFLRG 
SRPLRQLMVDLALQPGLRNRYKADPASVLDAIPGMSAEEKFALTLNHAAPIF KVMRASRADGEAPTLDEIAGTVNPSLACPAIVVCFVGIMVIVIAL Serendipita SEQ ID NO.: MASSTHPKRGSLTIAGTGIATLAHMTLETVSHIKEADKVYYIVTDPVTQAFIE vermifera ssp. 110 ENAKGPTFDLSVYYDADKYRYTSYVQMAEVMLNAVREGCNVLGLFYGHP bescii The DOE GIFVSPSHRALAIAREEGYEARMLPGVSAEDYMFADLGLDPALPGCVCYEAT Joint NFLIRNKPLNPATHNILWQVGAVGITAMDFENSKFSLLVDRLERDLGPNHKV Genome VHYVGAVLPQSATIMETYTIAELRKPEVIKRISTTSSTFYIPPRDSEAIDYDMV Institute ARLGIPPEKYRKIPSYPPNQWAGPNYTSTPAYGPEEKAAVSQLANHVVPNSY (JGI) 781716 KTLHASPAMKKVMIDLATDRSLYKKYEANRDAFVDAVKGLTELEKVALKM GTDGSVYKVMSATQADIELGKEPSIEELEEGRGRLLLVVITAAVVV Thanatephorus SEQ ID NO.: MATFTEDNHPKRGSLIIAGSGIASVAHFTLETVSHLKNADKVFYLVNDPVTE cucumeris 111 AFIQENNPDTFDLVTFYSETKPRYHSYVEMAEIMLKEVRAGHKVLGIFYGHP The DOE GVFVHPSRRALFIARQENYEARMLPGISSEDYMFADLELDPAEFGCMTCEAT Joint ELIARNRPLNTSVHNIIWQAGIVGVSTLEYQESKFQLLVDRLERDFGPEHKVV Genome HYVGAIRMTPQAQSAMVVYSIQELRNPAVANFINSGSTLYVPPRLRDVPRVD Institute PDSATALGLPPVTTGFLSASPTWVGSRFVTPSSYGDLENNIVAQMNENRSRS (JGI) 718597 RITEPSPAMKGLMIKLAQELKLQEEYKKDPAKVAADTPDLKEIERRALSYGL DNTIRAVMSHRGSSSGPTEEQLKEISWEGSTIKHVTASSIAQ Trypethelium SEQ ID NO.: MAPSTSDRSKLPVAGYRPGRLVMVGSGIKSIAHLTLEAIGHIEQADKVFFVV eluteriae 112 ADMTTAAFIHSRNANAVDMYNLYDIGKPRYHTYVQMAERMLREVRNGFY The DOE VVGVFYGHPGIFVNPSHRAIAIARQEGHQAFMLPGISAEACLFADVGIDPSTS Joint GCQTIEATDLLLRNRPINTGSHLIIFQVGIVGDSGFHPQGFKNTKLHVLLEKLT Genome EVYGSGHRLVHYIAPSMATVEPTIDFLTLGALKKSRNARRVTGISTFYIPPKH Institute DVQPSPSAAKKLGLKVQQGAKSRNFGRLTMPEDPYGPRERVAIDELDKHKD (JGI) 416528 PAWYKRVRASQPMFDLLYRLGSDPRAAAKFKANPDKFLIPYDSDLTQTERA ALLTRRSFPVRQALQPSADDVANQVVQRLFRDPSFATQWASTLKKNKSDPN GEQNIIAWLKQQGYDTTPEAVDSAYLQALNVDLDIYDSAYATSFSGGSTGPL IVILNGKVTVAGVEIKNPIYSQSILSWGTTDGNEYNAQLFLRVLTNDDGKPLP QNAYVGPQLYGYYWSPNSVKPTKPNINGKVGQPSPSNGSDPVQPTPLSKFA ATYNTYIAGATGKYAADSQLVVANPEPNTTVTYKGIVIKKWTYANESLSWL ATDGNAQNVAIRFFINTSSTSSDPTLGPQFLGTTWAQGQNPPSKSNEFGQIGQ SADPDTTANILTKANTWIQFGLNLVNGIAAMLICHAIMSLFKARNAEAANPS PENQQAEQQAEQDANDAINEQEAIQDNAADQGGNEEVDPNDLDPDEAGEP 
NANADADADADADADADADADADADADAEADADAEADADAEADADAEA DADAEADADAEADADADIDIDIDADVVDIIL Trichophaea SEQ ID NO.: MTQGSLFIVGSGIRSIAQLTLEALMHIENADKVFYVVCDPVTEGFIKEKNPNA hybrida 113 VDLYEYYSNTKLRNETYIQMAEIMLREVRSGLRVVGVFYGHPGNFVSPTRR The DOE ALAIARDEGYVAKMLPGISADDCLFADLLIDPCYPGLQTVEATDVLVRNRPL Joint QTTSHVVIYQVGVICKSGFDFYSIENDKFDHFVTRLQEDYGPNHPVVNYVAA Genome VSPLAEPTIQRHTISELFKDSVKASISGVSTFYIPPKELLPLTAAGEKLILDLNT Institute DKAAVQVKTYPPLPYCPLSTGQQAYGAYEKSVIEKIKNHTTPAGYKPYQTSR (JGI) 914024 AMHKALERLYLDPETVKKYRRDPEGFAAEFEGLKENEAEALRSGNPDSCAS LGAAVLHAVAVWIAC Talaromyces SEQ ID NO.: MSTSEHHRPASHGFRPGKLVIVGSGIRSISQFTLEAVAHIEHADKVFYCVADP islandicus 114 GTDAFIERHNKNAVDLYNLYGDGKPRHQTYTQMAEVILQEVRKGFSVVGVF GenBank YGHPGVFVNPAHRAVSIAASEGYEATMLPGVSAEDCLYADLLIDPSRPGCQT CRG85870.1 LEATDVLLRKRPIAKDCHVIIFQVGAVGDLGFNFKGFKNTKFEILVQHLLEVY GPDHSVVHYIASQLTFAAPIRDRYAIQDLVKPEVAKRITGISTFYLPPKDLLQP DEVAAKSLGLVSRPTTTASFGPYAPDQPYGPRELAAIKALKAHKDPANYNK TRASPALYQALESLALNPKDVLKFRSSREKFIARIDGLTKPEQKALRFASTGLI RQVLKSSAKDIATKFVQDEFRNPTLATQYAQILKENRNKTDGIDKITEWLKA QGYDTTPEAIGEAYKQELSRNLDSYDGKYTTNVDGKPGPQLLLQKGTVLVD GVKIPNWSYSSSQLSWTVEDGNPSSAMLHFQLLTNDTGKPLPPGSYIGPQFY GLYWRKGSSKPTGNNTVGKVGEVPPPDPITPVKPTPISAWLDTYQTYLKSSS GTWDKAGELAITGDETNPTVTYKGKQIQKYSYQNETISWSSADGNPNNALS FYFNKNPTQKNPAPGNQFSGKYWESGQAPPTAANLFGQIGSSSSPGTAANDA MTAAQWKTIGINLGVGILTFVLGDFTLKAINALIKWVRNPTKENRDALDQA NDDAGEAEAQQEAVEAEGADLNPGGDIVDAGDVPAQAAEAAEAAEAAEV AEVAEVAEAAEAAEAAEAAEAAEVAEVAEVAEVAEVAEVAEVVDVVEVII Wilcoxina SEQ ID NO.: MPQGSLTIVGSGIRSIAQLTLEAIMHIENADKVFYVVCDPATEGFIKQKNPNA mikolae 115 VDLYEYYSNTKLRNETYIQMAEIMLREVRSGLRVVGVFYGHPGNFVSPTRR The DOE ALAIAQDEGYVAKMLPGISADDCLFADLLIDPCYPGLQTVEATDVLVRDRPL Joint QITSHVVIYQVGVICKSGFDFTSIENDKFDHFVNRLQQDYGPSHPVINYVAAV Genome SPLAEPTIQRYTISDLFKDSVKACISGVSTFYLPPKELLPITDVGEKLILDLGTD Institute KAALQVKTYPPLPYCPLSTGQQPYGPYEKAVIERIKDHTTPADYRPYNTSQA (JGI) 650847 MYKALERLYLDPEAVKKYRRDPEGFAAAFEGLKENEAQALKSGNPDSSASL GHVRHPV Lentinula SEQ ID NO.: METPTLNNSGSLTIVGTGIESIGQMTLQTLSYIEAADKVFYCVIDPATEAFILT novae- 116 
KNKDCVDLYQYYDNGKSRMDTYTQMSEVMLREVRKGLDVVGVFYGHPGV zelandiae FVNPSLRALAIAKSEGYKARMLPGVSAEDCLYADLCIDPSNPGCLTYEASDF LIRERPTNIYSHFILFQVGCVGIADFNFTGFENSKFGILVDRLEKEYGADHPVV HYIAAMLPHEEPVTDQWTIGQLREPEFYKRVGGVSTFYIPPKERKEINVDIIRE LKFLPEGKVPDTRTQIYPPNQWEPEVPTVPAYGSNEHAAIAQLDAHSAPEQY QPLATSKAMTDVMTKLALDPKALAEYKADHRAFAQSVPDLTANERTALEIG DSWAFRCAMKEMPVSLLDNAKQSMEEASEQGFPWIIVVGVVGVVGSVVSS A Partial SEQ ID NO.: 1 gactgcgtcg acttgtatca gtattacgac aatggcaaat ccagaatggc tacttacacc methyltransferase 117 61 caaatgtcag aggtaagctc cgtacacttc aacagttgcc aggacccgat gctgacatat enzyme 121 gcgtagctca tggtcaggga agtccgcaag ggcctcgatg tcgtgggcgt cttctatgga DNA 181 cacccgggag tgttcgtgaa cccttctcac cgagctctgg ctatcgccag gagtgagggc sequence 241 taccgagcga ggatgctccc aggcgtgtct gcggaagatt gcctcttcgc cgacttgtgc Gymnopus 301 attgatcctt cgaacccggg ttgcttgacc tacgaagcat cggatttcct gatcagggat fusipes 361 cgtccggtca gcatccacag tcacttggtc ctgttccaag tcggttgtgt tggtattgca 421 gacttcacat ttgtaagatt caatgtaagc attcagtatt gcccaagatt ttgtgtctaa 481 aatgttacct ggttcagaat tcaaaatttg gggtacttct cgaccggctc gagcacgaat 541 atggcgctga tcatacagtt gtgcactata tcgcagccat gctgccttac gagaatccag 601 tgattgacaa actcaccatc agccagctcc gtgacaccga gatcgcgaag cgcgtgagtg 661 gtatatcgac cttctatatc cctccaaagg agctaaagga cccgagcatg gatatcatgc 721 gccgcctaga acttttggct gttgaccaag ttccagataa gcaatggcac ttctacccaa 781 caaaccagtg ggcaccatct gcacccaacg tagttcctta tggaccaaga gaacaagccg 841 ccattgtcca gttgggcagt cacaccattc cagagcaatt tcagcctatt gctacttcca 901 aagctatgac tgacatcttg acaaagctgg ctttggaccc caagatgctc actgagtaca 961 aggctgaccg tcgtgccttt gctcaatctg cgctggagtt gacagtcaat gagagagat Partial SEQ ID NO.: 1 gactgcgtcg acttgtatca gtattacgac aatggcaaat ccagaatggc tacttacacc methyltransferase 118 61 caaatgtcag agctcatggt cagggaagtc cgcaagggcc tcgatgtcgt gggcgtcttc enzyme 121 tatggacacc cgggagtgtt cgtgaaccct tctcaccgag ctctggctat cgccaggagt cDNA 181 gagggctacc gagcgaggat gctcccaggc gtgtctgcgg aagattgcct cttcgccgac sequence 241 
ttgtgcattg atccttcgaa cccgggttgc ttgacctacg aagcatcgga tttcctgatc Gymnopus 301 agggatcgtc cggtcagcat ccacagtcac ttggtcctgt tccaagtcgg ttgtgttggt fusipes 361 attgcagact tcacatttgt aagattcaat aattcaaaat ttggggtact tctcgaccgg 421 ctcgagcacg aatatggcgc tgatcataca gttgtgcact atatcgcagc catgctgcct 481 tacgagaatc cagtgattga caaactcacc atcagccagc tccgtgacac cgagatcgcg 541 aagcgcgtga gtggtatatc gaccttctat atccctccaa aggagctaaa ggacccgagc 601 atggatatca tgcgccgcct agaacttttg gctgttgacc aagttccaga taagcaatgg 661 cacttctacc caacaaacca gtgggcacca tctgcaccca acgtagttcc ttatggacca 721 agagaacaag ccgccattgt ccagttgggc agtcacacca ttccagagca atttcagcct 781 attgctactt ccaaagctat gactgacatc ttgacaaagc tggctttgga ccccaagatg 841 ctcactgagt acaaggctga ccgtcgtgcc tttgctcaat ctgcgctgga gttgacagtc 901 aatgagagag at
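SEQ ID NOs: 117 and 118 list the same partial Gymnopus fusipes methyltransferase sequence as genomic DNA and as cDNA, respectively. The genomic listing contains stretches that are absent from the cDNA and look like spliceosomal introns. For example, nucleotides 73 to 126 of SEQ ID NO: 117 (54 nt, beginning with GT and ending with AG) do not appear in SEQ ID NO: 118. The following minimal Python sketch, which is not a method from this disclosure, locates such an intron. It assumes that the compared fragments differ by exactly one insertion, and it uses the canonical GT...AG splice-site rule to choose among equivalent boundary placements. The function name `find_single_intron` is hypothetical.

```python
from typing import Optional, Tuple

# Nucleotides 61-150 of SEQ ID NO: 117 (genomic) and 61-96 of SEQ ID NO: 118 (cDNA).
GENOMIC_117_NT61_150 = (
    "caaatgtcag" "aggtaagctc" "cgtacacttc" "aacagttgcc" "aggacccgat"
    "gctgacatat" "gcgtagctca" "tggtcaggga" "agtccgcaag"
)
CDNA_118_NT61_96 = "caaatgtcag" "agctcatggt" "cagggaagtc" "cgcaag"


def find_single_intron(genomic: str, cdna: str) -> Optional[Tuple[int, int]]:
    """Return 0-based (start, end) of one intron in `genomic`, end exclusive.

    Assumes genomic == cdna[:i] + intron + cdna[i:] for some split point i.
    Among all such split points, the first whose intron starts with GT and
    ends with AG is returned; None if no placement satisfies the rule.
    """
    g, c = genomic.lower(), cdna.lower()
    intron_len = len(g) - len(c)
    if intron_len <= 0:
        return None
    for i in range(len(c) + 1):
        # Exonic sequence must match on both sides of the candidate intron.
        if g[:i] == c[:i] and g[i + intron_len:] == c[i:]:
            intron = g[i:i + intron_len]
            if intron.startswith("gt") and intron.endswith("ag"):
                return i, i + intron_len
    return None


if __name__ == "__main__":
    start, end = find_single_intron(GENOMIC_117_NT61_150, CDNA_118_NT61_96)
    # Convert fragment offsets back to SEQ ID NO: 117 numbering (fragment starts at nt 61).
    print(f"intron at nt {61 + start}-{60 + end} ({end - start} nt)")
```

On these fragments the scan finds three split points where the exons line up (offsets 10, 11, and 12). Only offset 12 gives a GT...AG intron, corresponding to nucleotides 73 to 126 of SEQ ID NO: 117. The brute-force scan keeps the sketch short. For long sequences, computing the common prefix and suffix first would avoid quadratic work.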
[0138] Gymnopeptide A (GymA) and Gymnopeptide B (GymB) are two related, multiply N-methylated cyclic octadecapeptides isolated from the spindleshank mushroom Gymnopus fusipes (G. fusipes; also known as Collybia fusipes). GymA and GymB differ at a single position (serine in GymA vs. threonine in GymB). Several aggressive adherent cancer cell lines (e.g., HeLa, A431, T47D, MCF7, and MDA-MB-231) are hypersensitive to both GymA and GymB, with IC50 values in the low nanomolar range.
[0139] Surprisingly, G. fusipes does not use a nonribosomal peptide synthetase (NRPS) to synthesize these peptide macrocycles. Instead, its genome contains a single gene with a nucleic acid sequence that encodes the 18 amino acids of GymB. The 18-amino-acid sequence lies at the C-terminus of an open reading frame that encodes a putative S-adenosylmethionine (SAM)-dependent methyltransferase. Hereinafter, the gene encoding the methyltransferase followed by the GymB peptide sequence cassette is referred to as the gymnopeptide precursor gene, GymMAB.
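The precursor arrangement described above is a methyltransferase open reading frame whose C-terminal 18 residues form the GymB cassette. The following short Python sketch illustrates it. It translates in-frame coding DNA with the standard genetic code and then splits off a C-terminal core. It assumes spliced, in-frame input. `split_precursor` and its `follower_len` parameter are hypothetical; `follower_len` is an assumption for precursors that carry residues after the core. As a check against the sequence listing, the first 60 nt of SEQ ID NO: 118 translate to DCVDLYQYYDNGKSRMATYT.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate in-frame DNA codon by codon, stopping at the first stop codon.

    Spaces (as in the sequence listing) are ignored; a trailing partial codon
    is dropped. Ambiguous bases such as N are not handled (KeyError).
    """
    seq = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def split_precursor(protein: str, core_len: int = 18,
                    follower_len: int = 0) -> Tuple[str, str, str]:
    """Split a precursor into (enzyme_region, core, follower).

    core_len=18 mirrors the GymB cassette length; follower_len is a
    hypothetical allowance for residues that follow the core.
    """
    if core_len + follower_len > len(protein):
        raise ValueError("precursor is shorter than core plus follower")
    end = len(protein) - follower_len
    return protein[:end - core_len], protein[end - core_len:end], protein[end:]


if __name__ == "__main__":
    start_118 = "gactgcgtcg acttgtatca gtattacgac aatggcaaat ccagaatggc tacttacacc"
    print(translate(start_118))
```

The codon table is built from the conventional TCAG ordering rather than typed out codon by codon. This keeps the 64 assignments compact and easy to check against published tables.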
[0140] The GymMAB gene is present in a cluster that also includes another open reading frame encoding a prolyl-oligopeptidase (GymP), which cleaves and cyclizes the methylated gymnopeptide cassettes. These enzymes bear only weak resemblance to the prolyl-oligopeptidase PopB proteins of G. marginata and Amanita species and to the omphalotin-producing enzymes of O. olearius. They form a distinct family of RiPPs/RiPP-processing enzymes whose unique structural and functional features allow them to accommodate the relatively large 18-mer macrocycle.
[0141] Furthermore, careful examination of several Gymnopus species that are closely related to G. fusipes failed to detect any genes encoding orthologs of, or other genes related to, the aforementioned enzymes identified in G. fusipes. The species examined include Gymnopus earle, Gymnopus dryophilus, Gymnopus ocior, Gymnopus acervatus, Gymnopus luxurians, Gymnopus androsaceus (also known as Marasmius androsaceus or Setulipes androsaceus), Micromphale foetidum, Micromphale perforans, Marasmius fiardii, Rhodocollybia maculata, and Rhodocollybia butyracea. In contrast, the biosynthetic gene cluster for the enzymes involved in producing the omphalotins is present in a wide group of closely related species, such as Omphalotus olivascens. It is also present in Lentinula species, including Lentinula edodes, Lentinula aciculospora, Lentinula raphanica, Lentinula novae-zelandiae, Lentinula boryana, and Lentinula lateritia. Thus, the identified gene cluster appears to have been horizontally transferred.
[0142] Enzymes such as the methyltransferase and prolyloligopeptidase isolated from species such as G. fusipes can be used to generate methylated macrocycles. The methylated macrocycles may be screened using the methods described herein. The enzymes can be integrated into host cells and used to generate DNA-encoded libraries of RiPPs. The enzymes can also be used to manufacture specific macrocycles of interest at scale in heterologous prokaryotic or eukaryotic expression systems. Uses of the enzymes in heterologous expression systems may include, but are not limited to, reverse Y2H systems as described in PCT/US2018/061292 (published as WO 2019/099678) and U.S. application Ser. No. 15/683,586 (published as US20170368132A1), which are hereby incorporated by reference in their entireties (and in particular with respect to the reverse hybrid and related yeast systems disclosed therein).
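The prolyloligopeptidase joins the ends of each cassette head to tail. As a result, two library members whose linear cores are rotations of one another (for example ABCD and CDAB) would be expected to yield the same ring, provided they carry the same modifications at corresponding residues. The following minimal Python sketch collapses such duplicates when tracking a DNA-encoded macrocycle library. It represents each residue as a string token, for example "NMe-V" for N-methyl-valine. This token notation is a hypothetical convention, not one from this disclosure, and the function names are likewise illustrative.

```python
from typing import Iterable, List, Sequence, Tuple


def canonical_cycle(residues: Sequence[str]) -> Tuple[str, ...]:
    """Return the lexicographically smallest rotation of a cyclic sequence.

    Head-to-tail macrocycles that are rotations of each other share this key,
    regardless of where the linear cassette was opened.
    """
    n = len(residues)
    if n == 0:
        return ()
    tokens = tuple(residues)
    return min(tokens[i:] + tokens[:i] for i in range(n))


def deduplicate(library: Iterable[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Keep one canonical entry per set of rotation-equivalent macrocycles."""
    seen = set()
    unique = []
    for member in library:
        key = canonical_cycle(member)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    a = ["NMe-V", "G", "P", "L"]
    b = ["P", "L", "NMe-V", "G"]  # rotation of a: same ring
    c = ["V", "G", "P", "L"]      # unmethylated valine: different ring
    print(deduplicate([a, b, c]))
```

Reversed sequences are deliberately not merged. A peptide backbone has a direction, so ABCD and DCBA are different rings. Taking the minimum over all rotations costs time quadratic in ring length, which is negligible for 18-mers. Booth's algorithm is a linear-time alternative for longer rings.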
[0143] The macrocycles generated using the methods described herein may be used as drugs, for example for the treatment of various diseases or conditions. They may also be used to modulate a protein-protein interaction between a first and a second protein, including to disrupt that interaction.
[0144] Disclosed herein, in certain embodiments, is a method of detecting or degrading a target protein. The detection or degradation is mediated by a molecule that links a first target or test protein to a second target protein in a host cell. The method comprises: expressing in the host cell a first fusion protein comprising the first test protein, and a second protein; delivering a first molecule to the host cell; modifying the first molecule in the host cell via a modifying enzyme, such as a prolyloligopeptidase and/or a methyltransferase; and allowing the first molecule to bridge the interaction between the first test protein and the second protein. The first molecule is a product of an encoded DNA sequence and comprises a randomized polypeptide library and one or more modifying enzymes, and the one or more modifying enzymes modify the randomized polypeptide library.
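The method above relies on a randomized polypeptide library that is the product of an encoded DNA sequence. The following minimal Python sketch shows what such an encoding could look like. It draws random 18-residue cores uniformly from the 20 canonical amino acids and encodes each residue with one fixed codon of the standard genetic code. The codon assignments, the uniform sampling, and the function names are hypothetical simplifications; the disclosure does not specify a library design.

```python
import random
from typing import List, Tuple

# One arbitrary (hypothetical) codon per amino acid, valid in the standard code.
CODON_FOR = {
    "A": "GCT", "C": "TGT", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGT", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAT", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "TCT", "T": "ACT", "V": "GTT", "W": "TGG", "Y": "TAT",
}
RESIDUE_FOR = {codon: aa for aa, codon in CODON_FOR.items()}
AMINO_ACIDS = sorted(CODON_FOR)


def random_library(size: int, core_len: int = 18,
                   seed: int = 0) -> List[Tuple[str, str]]:
    """Return (peptide, dna) pairs; the seed makes the library reproducible."""
    rng = random.Random(seed)
    library = []
    for _ in range(size):
        peptide = "".join(rng.choice(AMINO_ACIDS) for _ in range(core_len))
        dna = "".join(CODON_FOR[aa] for aa in peptide)
        library.append((peptide, dna))
    return library


def decode(dna: str) -> str:
    """Read a library DNA cassette back into its peptide sequence."""
    return "".join(RESIDUE_FOR[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    for peptide, dna in random_library(3, seed=42):
        print(peptide, dna)
```

Fixing one codon per residue makes decoding a simple table lookup and lets each library be regenerated from its seed. A practical design might instead weigh host codon usage or use degenerate codons.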
[0145] The prolyloligopeptidases described herein may be capable of macrocyclizing relatively large peptides. They may be capable of macrocyclizing peptides comprising at least 5, at least 7, at least 10, at least 15, at least 18, at least 20, or at least 25 amino acids. They may be capable of macrocyclizing peptides comprising at most 7, at most 10, at most 15, at most 18, at most 20, or at most 25 amino acids.
[0146] The tryptophan at position 603 appears to be highly conserved in related prolyloligopeptidases that cannot macrocyclize relatively large peptides. Similarly, the asparagine at position 563, adjacent to the active-site serine at position 562, is conserved in these same prolyloligopeptidases. “Position 603” and “Position 563”, as used herein, refer respectively to the position of the active-site tryptophan and to the position of the asparagine adjacent to the active-site serine in the prolyloligopeptidase of SEQ ID NO: 55. They also refer to the corresponding amino acids in other prolyloligopeptidases. In other words, position 603 or position 563 of a prolyloligopeptidase that differs from SEQ ID NO: 55 is not necessarily the 603rd or 563rd amino acid of that protein. Rather, it is the position that aligns with position 603 or 563 of SEQ ID NO: 55 when the prolyloligopeptidase is aligned with it, regardless of the distance of that amino acid from the N-terminus of the protein. Without being bound by theory, mutating these highly conserved tryptophan and asparagine residues to other amino acids, such as leucine and serine, respectively, may be key to the structural flexibility that allows the enzyme to accommodate larger peptides such as the 18-mer gymnopeptides. Additionally, substituting the tryptophan at position 603 with another residue such as leucine may play an important role in expanding the cleavage-site specificity of the oligopeptidase. Instead of being limited to small secondary-amine residues such as proline or sarcosine (N-methyl-glycine), the enzyme may then cleave at secondary-amine residues with bulkier side chains, such as N-methyl-valine, N-methyl-isoleucine, or N-methyl-leucine. Consistent with this premise, the N-terminal cut site of the Gymnopeptide A/B precursor protein is at a proline residue, whereas the C-terminal cut site is at an N-methyl-valine residue. Prolyloligopeptidases belong to the family of serine proteases.
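The alignment-based numbering convention above can be made concrete in a short program. The following minimal Python sketch assumes that a pairwise alignment between SEQ ID NO: 55 and a query prolyloligopeptidase has already been produced by some aligner and is supplied as two equal-length gapped strings. It maps reference positions (such as 481, 561, 562, 563, and 603) to query positions and checks whether the conserved W603/N563 pair has been replaced. All function names are hypothetical, and the demo uses synthetic sequences, not SEQ ID NO: 55.

```python
from typing import Dict, Iterable, Optional

GAP = "-"


def map_position(ref_aln: str, query_aln: str, ref_pos: int) -> Optional[int]:
    """Return the 1-based query position aligned to 1-based reference `ref_pos`.

    Returns None when that reference residue faces a gap in the query.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    if ref_pos < 1:
        raise ValueError("positions are 1-based")
    ref_i = query_i = 0
    for r, q in zip(ref_aln, query_aln):
        if q != GAP:
            query_i += 1
        if r != GAP:
            ref_i += 1
            if ref_i == ref_pos:
                return query_i if q != GAP else None
    raise ValueError("ref_pos lies beyond the end of the reference")


def residues_at(ref_aln: str, query_aln: str,
                positions: Iterable[int]) -> Dict[int, Optional[str]]:
    """Query residue found at each reference position (None for a gap)."""
    query = query_aln.replace(GAP, "")
    result: Dict[int, Optional[str]] = {}
    for pos in positions:
        q_pos = map_position(ref_aln, query_aln, pos)
        result[pos] = None if q_pos is None else query[q_pos - 1]
    return result


def lacks_conserved_w603_n563(residues: Dict[int, Optional[str]]) -> bool:
    """True if positions 603 and 563 both hold residues other than the
    conserved W and N (for example the leucine/serine pair described above)."""
    r603, r563 = residues.get(603), residues.get(563)
    return r603 not in (None, "W") and r563 not in (None, "N")


if __name__ == "__main__":
    # Synthetic reference: G561, S562, N563 and W603 at the numbered positions.
    ref = "A" * 560 + "GSN" + "A" * 39 + "W"
    query = "A" * 560 + "GSS" + "A" * 39 + "L"
    # One extra N-terminal residue in the query shifts its numbering by one.
    ref_aln, query_aln = GAP + ref, "M" + query
    found = residues_at(ref_aln, query_aln, [561, 562, 563, 603])
    print(found, map_position(ref_aln, query_aln, 603))
    print(lacks_conserved_w603_n563(found))
```

In the synthetic demo, a one-residue N-terminal extension in the query shifts the numbering, so reference position 603 maps to query residue 604. This is the situation the alignment-based definition in the text is meant to handle.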
The mechanism of action of serine peptidases involves an acyl-enzyme intermediate. Both the formation and the decomposition of the acyl-enzyme proceed through a negatively charged tetrahedral intermediate that is stabilized by the oxyanion binding site, which donates two hydrogen bonds to the oxyanion. In prolyloligopeptidases, one of these hydrogen bonds is formed between the oxyanion and the main-chain amide group of asparagine 563, which is directly adjacent to the catalytic serine, serine 562. The second hydrogen bond, which is distinctive of this type of serine peptidase, is provided by the hydroxyl group of tyrosine 481 (position 481 of SEQ ID NO: 55). In the chymotrypsin-type members of the serine protease family, the two hydrogen bonds are instead contributed by the main-chain amide groups of the catalytic serine residue and of a glycine residue at the −2 position relative to the catalytic serine. Substituting the highly conserved asparagine at position 563 with serine places serine 563 and glycine 561 (position 561 of SEQ ID NO: 55) in positions identical to those of the active-site serine and glycine hydrogen-bond donors of chymotrypsin-type proteases. This substitution may therefore play an important role in enabling the enzyme to toggle between two different active-site serines for the two cleavage events. For example, serine 562 could be the active-site residue for the N-terminal, proline-directed cut, with the two hydrogen bonds to the oxyanion contributed by the main-chain amide of serine 563 and the hydroxyl group of tyrosine 481, while serine 563 could be the active-site residue for the second, N-methyl-valine-directed cut, with the two hydrogen bonds to the oxyanion contributed by the main-chain amides of serine 563 and glycine 561, or vice versa.
The combination of a wider catalytic pocket, resulting from the substitution of tryptophan 603 with leucine, and a toggle-switchable active-site serine, resulting from the substitution of asparagine 563 with serine, makes this new oligopeptidase particularly suited to recognizing a wide variety of secondary amine residues with bulky side chains at the cleavage site and to forming larger macrocycles than any previously characterized member of the family. Shown in
[0147] In some cases, the tryptophan residue in the active site of a prolyloligopeptidase, which corresponds to the conserved tryptophan at position 603 of SEQ ID NO: 55, may be replaced with a different amino acid residue. For instance, in some cases the tryptophan residue in the active site of a prolyloligopeptidase may be replaced with a leucine residue. In some cases, the prolyloligopeptidases used herein do not comprise a tryptophan residue at the 603 position in the active site of the enzyme, wherein the position 603 corresponds to the active site of SEQ ID NO: 55.
[0148] In some cases, the asparagine residue in the active site of a prolyloligopeptidase, which corresponds to the conserved asparagine at position 563 of SEQ ID NO: 55, may be replaced with a different amino acid residue. For instance, in some cases the asparagine residue in the active site of a prolyloligopeptidase may be replaced with a serine residue. In some cases, the prolyloligopeptidases used herein do not comprise an asparagine residue at the 563 position in the active site of the enzyme, wherein the position 563 corresponds to the active site of SEQ ID NO: 55.
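Because positions 603 and 563 are defined by alignment to SEQ ID NO: 55 rather than by counting from the N-terminus, the mapping can be sketched in a few lines. The Python below is a minimal illustration that assumes a pairwise alignment has already been produced by some aligner and is supplied as two equal-length gapped strings with '-' for gaps; the function name aligned_residue and the toy sequences are hypothetical and do not correspond to SEQ ID NO: 55 or any other sequence disclosed herein.

```python
from typing import Optional, Tuple

GAP = "-"


def aligned_residue(ref_aln: str, query_aln: str, ref_pos: int) -> Tuple[Optional[int], Optional[str]]:
    """Return (query residue number, query residue) in the alignment column that
    holds reference residue number ref_pos (1-based). Returns (None, None) when
    the query has a gap in that column."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0    # reference residues seen so far (gaps are not counted)
    query_count = 0  # query residues seen so far (gaps are not counted)
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char != GAP:
            ref_count += 1
        if query_char != GAP:
            query_count += 1
        if ref_char != GAP and ref_count == ref_pos:
            if query_char == GAP:
                return None, None
            return query_count, query_char
    raise ValueError(f"reference has fewer than {ref_pos} residues")


# Toy alignment: the query carries a two-residue N-terminal extension, so the
# residue aligned to reference position 3 is query residue 5.
ref = "--ASWNS"
query = "GGASLSS"
print(aligned_residue(ref, query, 3))  # (5, 'L'): W in the reference, L in the query
print(aligned_residue(ref, query, 4))  # (6, 'S'): N in the reference, S in the query
```

In practice the reference string would be the aligned SEQ ID NO: 55 and ref_pos would be 603 or 563; a returned residue other than W or N, respectively, would indicate a substitution of the kind described in paragraphs [0147] and [0148].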
[0149] In other embodiments, the cyclization comprises reaction with a beta-lactamase. A variable region is excised and end-to-end cyclized by the actions of an N-methyltransferase and a beta-lactamase family member. Table 4 shows exemplary amino acid sequences of lactamases and related enzymes used to process the cyclic peptides. The lactamase may be a protein with a sequence selected from SEQ ID NOs: 119-120. The lactamase may be a variant (e.g., a non-natural variant) of a naturally occurring lactamase. Such a variant can have at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 119-120. In some embodiments, some of the randomized residues are subsequently isomerized from the L- to the D-configuration, or their side chains are decorated with additional modifications such as hydroxylation, halogenation, glycosylation, acylation, phosphorylation, methylation, and acetylation.
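The identity thresholds above presuppose a definition of percent sequence identity, which the text does not spell out. As a hedged sketch, the Python below assumes one common convention (identical aligned positions divided by the ungapped length of the reference SEQ ID NO, times 100) and a precomputed alignment given as gapped strings; other conventions, such as dividing by the number of aligned columns, give different values. The helper names are hypothetical.

```python
GAP = "-"
# Identity thresholds recited in the text, in percent.
THRESHOLDS = (70, 75, 80, 85, 90, 95, 96, 97, 98, 99)


def percent_identity(ref_aln: str, query_aln: str) -> float:
    """Identical aligned positions divided by the ungapped length of the
    reference, times 100 (one common convention; others exist)."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_length = sum(1 for r in ref_aln if r != GAP)
    if ref_length == 0:
        raise ValueError("reference contains no residues")
    matches = sum(1 for r, q in zip(ref_aln, query_aln) if r != GAP and r == q)
    return 100.0 * matches / ref_length


def highest_threshold_met(identity: float):
    """Largest recited threshold that the identity meets, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


pid = percent_identity("MAKVFGLV", "MAK--GLV")  # 6 identical of 8 reference residues
print(pid, highest_threshold_met(pid))  # 75.0 75
```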
TABLE 4
Amino acid sequences of the N-methyltransferase and beta-lactamase processed cyclic peptides

SEQ ID NO.: 119. Rhizopogon vinicolor, GenBank OAX32863.1, hypothetical protein beta-lactamase (transpeptidase):
MAKVFGLVLGFLSQTFTYPSQVWFSPVGANNGQVITPELSNSIQETLDVWNI
TGLSVAIIPKSGEPEYHSWGDRTEDGESVTQDTLFHMASVSKAFCVSALGIL
MDDFEHGRNVTPLPPALTEFNWHTSIQDLLPGEWQLMDEWASRKANMKDI
LSHVSGLPRHDFAFGPYESPKEAVSRLRYLRPAFELREQWSYNNQMFMVAG
HIVETYSGKTYTSFVEDRIFTPLGMSSSTFSPAKAAKTGKFTQGWTSSGRLLP
ELFPEDMVMLMAGAGGVISSAVDMSKWVALWLNKGVYDNVTVIPSSVYG
NASQSYAVSISTPVDSEHSIQGYGLGWFQNSYLGHNVVYHSGSIPGLSMLVS
FLPDDDVGFVVFANGGDKAAPVMNISNSILDAALHLRSGPAPPIMPEKKAVT
SPSEDIVNLELPLEEFSGTYTDPGYGTFTFCSPSSSSSYCQQVMTDFTAVDSVH
PSAPSPLQLLAAWPRMGSSHIRAVHQSGNKFLLLCTALFPEGYGRDSTPFETA
EIGTPGATAEFVVEDGKVVGFGLFGLVDQVTERERTQTTVKDRAEVWFDKV

SEQ ID NO.: 120. Rhizopogon vinicolor, GenBank OAX34183.1, hypothetical protein beta-lactamase (transpeptidase):
MIMAKVFGLVLGFLSQTFTYPSQIRLSPVGVNNGQVITPELSNSIQETLDVWN
ITGLSVAIIPKSGEPEYHSWGDRTEDGESVTQDTLFHMASVSKAFCVSALGIL
MDDFEHGRNVTPLPPALTEFNWHTSIQDLLPGEWQLMDEWASRKANVKDIL
SHVSGLPSHHFAFGPYESPKEVVSRLRYLRPAFELREQWSYNNQMFTVAGHI
VETYSGKTYTSFVEDRIFTPLGMFSSTFSPAKAVKTGKFTQGWTSSGRLLPEF
FQEDMIMPMAGPGGVISSAVDMSKWVALWLNKGVHDNVTIIPSSVYGNAS
QSYAVSISTPVDSEHSILGYGLGWFRNSYLGHDVVYHSGSIPGLSTLVSFLPD
DDVGFVVFANGDNKAAPVMNISNRIIDAALHLRSGPAPPIMPEKKAVTSPSE
DIVNLELPLEEFSGTYTDPGYGTFTFCSPSSSSPYCQQVIANFTTVDSVRPSAP
SSLQLLAAWPRVGSSHIRTVHQSGNKFMLLPTALFPEGYGRDSTPFETAEIGT
RGAPVEFVVEDGRVVGFGLFGLVGQVTERERTQTTVKDRAGVWFDKV

SEQ ID NO.: 121. Rhizopogon vinicolor, GenBank OX32862.1, hypothetical N-methyltransferase:
MSTKRGTLTIAGSGIASVGHITLGTLSYIKESDKIFYLVCDPVTEAFIYDNSTA
DCFDLSVFYDKTKGRYDSYIQMCEVMLKAVRAGHDVLGVFYGHPGVFVSP
SHRAIAVARQEGYKAKMLPGISAEDYMFADLEFDPSVSGCKTCEATEILLRD
KPLDPTIQNIIWQVGSVGVVDMEFSKSKFQLLVDRLEKDFGPDHKVVHYIGA
VLPQSTTTMDTFTIADLRKEDVAKQFGTISTLYIPPRDEGHVNLSMAKVFGGP
GASVKLNDSIKWAGPKLNIVSANDPHERDVIAQVDTHVAPEGHKKLRVSAA
MKKFMTDLALKPKFLEEYKLDPVAVVESAEGLSNLERFGLKFARSGPADAL
MKATESDIASGRQLTEEEIAQGTGPVGLQTALALLVLLGLGVAIVTRPDD

SEQ ID NO.: 122. Rhizopogon vinicolor, GenBank OAX34185.1, hypothetical protein FAD/NAD(P)-dependent oxidoreductase, D-amino acid dehydrogenase:
MTSDNLQPEVISANWLKSLEAASSTGDTASFVSHFLPDGWFRDMLCFTWNF
RTLSGQEKIHGFISEVVDGQSRLSYSHLHDFKLDDHSVNAPSPFKLPGPPDIEG
VQGAFTFSITKPAAYGRGFFRLTQDVHGNWKALTLFTNMQDLVGHEESSAD
EYDPHEKANPTVVIVIKVGGGQSGLICAARLGKLGIRALVIDKNARVGDIWR
QRYAEALPSFAVLSRQETQVPEPYAAYSQISKLLPYPSNFPKYLPKGKLANFL
ESYAINQELCIWLSSTVSPSPVYDSFSARWTVEVEHENRKVILHPKHLVLATG
HGRPRIPTWNGMDDFQGTLYHSDFHRDAEKFRGKCVVVIGAGNASGDICED
FVAQGAAEVTIVQRSATCVVSSATADAFVFKLPFSDKTPIEELDFRHNSMPLA
FVLQLMKSGGTQHMKAHDKEHHEGLRKAGFNLTWEPSPGSGEVGLLGFVF
ERAGSGTMIDTGFGKLIVEGTVKVKQGQNISHFDKEGITFKDGSKLPADVIV
AATGNELTMDAIRAVLGDTIAEQLPPKVWGLDAEGELNQMYRPSGHPGLW
FAVGSLGMTRFCSKHLGLQILAQEVGIA
[0150] In some embodiments, the cyclization comprises reaction with a prolyl endopeptidase, an N-methyltransferase, and a hydroxylase. In some embodiments, the bicyclization comprises further modification of the indicated anchored residues on the cyclized peptide to form an internal tryptathionine bridge. The first step may involve hydroxylation of the 2-position of the indole ring of the tryptophan residue by a hydroxylase belonging to the cytochrome P450 family of oxygenases. An example of such a hydroxylase is shown in TABLE 5. The hydroxylase may be a protein having the sequence of SEQ ID NO: 123. The hydroxylase may be a variant (e.g., a non-natural variant) of a naturally occurring hydroxylase. Such a variant can have at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 123.
TABLE 5
Amino acid sequences of a hydroxylase

SEQ ID NO.: 123. Galerina marginata CBS 339.88, KDR84981.1, hypothetical protein GALMADRAFT_260690:
MGKMAYHTVLDDIALYLLGSAALVIFYRSFFYPYFLSGRRLAPGPTKGELSK
ELKQFNNEINVHFLRHMVKEYGPIFRLVGAPMIPGPGLVVCTPTAQQRILVSN
SINYGQPRLAFFRWVTGGLFTLPEREHRGMRKILDPVFSFRNLISTTGVYYNT
VQSLITIFRSKIDGENGAKDGDVILVYEWLARLAIDNVSEAILGFKLDTLHDP
NNELITTLDELSRIPTAAFELLVRVPGFLRLVTFDSVRHSTLWQRRVPGRLGV
FFTFMRCLSTIRKNALAIKATILQEDSANRDLNVISVLQHMQSSDETANADIA
GNILMLWMSGRATIATRISWLLWLLAKDQQCQQQLRDEIAPLFSRDPRPDYR
SLDKLQWLDSVIMESIRLFLFGPNIRVALNDDYIDGVFVPKGTVVVIPLDLFT
RGDIWGEDPDQFKPARWLDSTKRYKISPPFLSFLTGPHRCIAKGMAIMQTKIV
IASLIANFEFKPAYEGQHVEGNPSIIGHGMPLHVKPIRPS
[0151] Step 2 may involve formation of a tryptathionine bridge between the 2-hydroxyl position of the tryptophan and the thiol group of the cysteine residue. This condensation reaction is catalyzed by a novel family of dehydratases. Examples of the dehydratases are shown in TABLE 6. The dehydratase may be a protein with a sequence selected from SEQ ID NOs: 124-127. The dehydratase may be a variant (e.g., a non-natural variant) of a naturally occurring dehydratase. Such a variant can have at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to any one of SEQ ID NOs: 124-127.
TABLE 6
Amino acid sequences of dehydratases

SEQ ID NO.: 124. Galerina marginata CBS 339.88, KDR80488.1, hypothetical protein GALMADRAFT_136963:
MPYVPDPKYFEHREQSSGATLYYCLVCRDGRERQPHHIKTHEASQAHRTAL
SVFDSQAESSSQQTHGNPTQPGYFDPVIDDAVRALLVSGSGDPHQPLYPAGH
PNVYGEPNFTDSRRRTSPVTGIDWDQFEAQEDTHAVPSAQDQLRADICQATL
DWLNDDISDDDEREPSEVDSVDSDAESDREPIPDDQPRKRARTNRDNPISED
WYPWQDKITCTLDILMHLPRSVFSRKQLDLFLWLLRVNNVDDVPTGKSMK
MLNKILQGMCGIETIAYEGKLGHNYHVNNIAQILAQELCNPKVGPHIYFYPE
DSGDNLAEARQAARWLHELRPEETTPMIHLPSGDYYIYEPAMLSNRSFCIPFR
WFTRNGKFHARAWSLETGVVDNTLGWIVHKENEVEISEDDLLKDFTRFSSD
CEAYNVPHPSRILGVSCADSGNLLPWNHTNPVLGNRWRQLAKGHRTLCLPL
WMYCDDTSGNTSKKWNEHNSFLFTLAGLPREHTAKEYNIHFLCTSNLAPPL
EMMDGVVSQIEAAQQNGIWAWDCVRKEPVLIFPTILALLGDNPMHSEFACHI
GLRGKFFCRTCWVKGSDAQDDANIVTPGLHETPENSPAPSPAPSPAPSPAPSP
APSPALSMAPQSQPPTPSEPSMQVPAPPSTAAPTKARGKKKETMSAMLNRIT
AFIKPGRLRNKSETQKTLQNFKEQAQTIGAKTKLKTARTETGIKDTVQEFFFE
KLFSSYKNKRGPQAKQEALDQAVNQLPSDITSPVWRLKGLDPHQDTPVEILH
VVLLGFIKYFWRDLVQNQINDDQKQTLIQRLNSFDVTGLGITQLGGETLVNY
AGSLTGRDFRAVAQVAPFVIYDMVPADVFDAWLALSKLVPLVWQPYIENV
AQYLTTLEHEIHVFLLRTARWTTGWFNKSKFHIILHLPSHIRRFGPAILFATEA
FESFNAVIRAKSVHSNRQAPSRDIALAFAQGNRIRHLLSGGHFLSADTHMVV
DPDQPQLGQYERLARGRWRSVGPGPGHLVSAEPILPSYLGIPPQSTTSSAGLC
KRTKTPPQTFLQTLTGLKLPNVSRPGARELWQTCSEVYLLNDDKCLIGHHVI
VQRQSEQASFVSPPFIARIGEILQKVGSANHAHDKPDGILVQTLKSSEVADKF
QMPRLVPQNEWSFVPLADILCTVNAQHDCDRNGCTASGFRYVYQERIQTND
QRPVVEHVNQPEDFILNTAQMRDALHLQKFRIRSRSLDEQTIIHESVARTINQ
RKAQDNSSSGTGGAGVSGRGRGRGRGRGGGVEGPSTSRGRGGGIEGRGASS
SSGNGRGRGRGARSAQSVPF

SEQ ID NO.: 125. Galerina marginata CBS 339.88, KDR74877.1, hypothetical protein GALMADRAFT_99137:
MPRKKPAPECFETDEASKMIRCLICKENDTVQQGTWIKHGSASQHIETNAHK
LAVARREQLLQVQQEEERRLQEIYGGNTIPLSGNAQLYPTYPRANMYGNQD
AVDTDMDNQNSPPQAYMLCDADIPDLGIKPIERPDPSQERERLRQQVEQLLL
QAEHEDEFGSPDDPDDLTSTNIAQAFADLDLEEMLDEEEVFDYFNQVSPEHD
YYPYPNKTTMLLDILDNLPRLRMSSNQLRLILWLLKQTGVSNVPSFSGFRNM
QTHLRNMCGTTPKQHVSSLGNIFYSNNIGESVMRDFANPEVAKHLHLYPEET
EGPISEVWQAERWKEFAPSELTPMFSQGHRQFFIDEVAQLQDGQYVIPRNWV
MRKGKLTSDCHIVTVNPVRFSKLHGSLVLVLKQCFQSGWTLLSETQIFHADD
FQFNYFDVVSRIRGPISWSEGTEVPAMPNNLRELAGDDDLVVIMVPLWCDD
VSGNKSKQYNKHINVYMANSNIPGRLLQQEYFVRFVSTSPNATSPEQFSALK
DQINETQKKPIQCYNAHTNKKTRAILRVPGLPADNPQQSEESCHMGGNANC
KCRKCHVGGPHEKKESNEGYHEHYLTGIKRSAEETRLELEKQIKLAMYGVE
KPINETQTNTGTKDKVAQHWIDILLAKSRELKSANPSRSVEEIAQELQTWFDE
QPGDKINPLLSIAGLDPTQDTPVEILHTILLGIVKYAWHHLHSNWTEAEQNLF
TVRLQSTDIDGLSVPPIRVAYMMQYRNGLIGKHFKTLMQTLPFHVHGTVSD
AQFKLVKAIGELGSVLWVHEIGDMEKYLSDLEILIGNVLDAFAEIDPSTAMY
ARFIYEPMPVPSKIIVKLKLHMLPHLIEDIKRFGPAIRNSTEVFECFNAIFRLCSI
LSNHQAASRDIALKFASMDRLKHMLSGGYWLSEVEEGKFEWIRAGENVRNI
LQSEPTIQRHLGWAPSAKFQSGRKRTPPTSWENTKASQFMDSEETAAIGFPNP
RLLSWRKGVTTTAQSGDRCSTGSWVVARNHKVCYILASHYCSIAKNDQGES
CIGRIHEIIGPDEKSASSTGIITLECFQLGKEHHPDFGLPTLQRPQADLPKYILK
AWQDPLFIFSAHHDCHTASCQATALQPQLQERQLTSRMNKLIAHNDSDHFII
NLYGLHNAILLREFLPRELTAPQPLHQDRKAFHYEVAAKLRVQQAEKRAKT
NARRKATRAANKAKQVERQKQNPDHEQESEQEMDERPNSENGSDIELGGD
DDIEVETRRKRRRN

SEQ ID NO.: 126. Hypsizygus marmoreus, KYQ37095.1, hypothetical protein Hypma_08924:
MGRRAEELPAYVELSEDGTLVRCNLCLMHNRLDYSKEWIQRKGWRSHKGS
GIHDRSEAKQRVLDDAAMDLQEPASAEVEVVTFNDILIINAPKTPTGNMQSE
EQAMWDHFDAGSFTLEAGEDPNHSSQRLYQDLARKADAYGAWDGTEALPE
YRDLDDVSQFLDEDEEEDLLSEILRGLGLEEEHEDSSDRNPAEELNSPWYPY
GSKLMFLLDTIDNLPRLRISGAMMRVFLWLLREVGVRQVPSFDKLRKIQRKL
REGSGVPTVHWMSPKGNAYSFNDPAVIVANDWASPITRPHLRRYPVIPKDG
VITEVYHAEKWHREINRHFLTPMYDDGFRHYFIDELAQLKDGRYAVPVRWL
EDVDGRIVADAWRVELEDDNRATIIDTATVRIHSQELALNFEEIIESNLMPEW
SDTTTEAGHPSRMPNPDRALAEGDPIYTSFIDIFGDDVSGNRSKSWNKHWNM
YISHRNLPRKLLHQQYHTHFVSTSTFASIPEQFVGVKEAIESTHSKPVKVRDA
DTGKQIRLKIYCNCGPGDNPSQSETSGHIGGNGNYPCRKCHTGGTQKSKETD
EGFYKMFTAGEARSSKETLAEVKSQVEAACTGVAKTVADAQSDTGVKDAY
TQYWIDAIIEKARAMQKENPGMPTTTIQATLIKWVYDHEEAIYNSFLTLDGF
DASRDTPVEILHTILLGIVKYLWHRSHTSWNAAQKKIYSTRLQGTNTQGLSIH
HIRANYIMQYANSLIGRQLKTLAQVNVFHVYDLVDPLRFLFTKATGELCALL
WFTEIRDLEEYLSDVDIAAANVLDIAAVIDPSKIVSKIKYHLLSHLREDIIRFGP
LVGVATEVFECFNAVFRYCSILSNHLAPSRDIAYKLAAQETMKHFLSGGWW
HVKDSVDLQGNPKWVQPGPSVRTFMASNPVLHTLCGWTRNNDSTPGTVKS
EPRKRGPDKQTLLPLVRLAWLETQGSRALNNTSPNNETQWQRCKYVIAETQ
DQCNVGSWVFARSPLLENIPIPGRIVEILQDTSASPSAFVVIDVFQVSATRDEV
FGMPVLLRRFNECCLHVIPASSVIFDFNAQHDCRYAKCEATGEQPLIQERVPS
GVTENFVVHKAIDRYLINIHALHNAHLIRATLPRDLTAPIPYAPNREAHHSAI
AAELRSAQDTKRAKTAAKTAANAAAKKAEAALKDTTSGPAAKRRRVDDEG
SGEEDNRDVDMVSV

SEQ ID NO.: 127. Galerina marginata CBS 339.88, KDR73903.1, hypothetical protein GALMADRAFT_141673:
MAKGRKLNNPLPDFIEISNDGLQVRCTLCLAARQHNGSGWIKRGSVSNHLK
SDNHTNSLEAHEMKKSAEKAEGRSVQEEIAMEEGMDFVILSSKIQPEITAPAR
APRRSNEEQEMWDRYTLGGEVFDAGVDHTLVEAEERKRLEREATDFDLWH
GADFLPEEDPNDGELLLDELEQDDILSELLRNAHLNAPDAADVLTEEPRAAA
DPRICDAWSPYESKMMFLLDTLDNLPRLRISNSLMNVFLWILREGGARDVPS
LYHLRQVQTTLRKSTGVPTTQHKSPKGNVYSMNDPRTLVAMDWANPVICD
HIRRYPVIPRNGVISEVYHAQKWRKDVDPHTLSPMYDAGNCHYYIDEVARL
KNGTFIIPVRWLEDEDRNVCADAYVVQFDDQFIASVVDGETIIVQASDLQNN
FLDLKDMGLLPTWGNQTIESGHPARMPNPDRALAEGDPLYTSWIDVFGDDV
SGNRSKNWNKHWNIYISHRNLPRKLLQQEFHTHFVSTSPVASVTEQFHGIKQ
VIELTHKSPVKVRHGTSGAQIRFKINVNCGPGDNPAQSEVCGHIGVNGNKLC
RKCHTGGTHEVKESDEGFNSLFEPGDARSAQEIVADVESQVQLACLGIAQHV
QNQQTKNGIKDAYTQYWIDYLINRARTLRKEQPRRTTADIQSELLVWVQEH
KDEIYNPFLKLDGFDAAVDTPVEILHTILLGIVKYLWHGSHTSWTAIQKQTYS
VRLQSTDTSGLSIHAIRANYIMQYANSLIGRQFKTIAQVNVFHVYDLVDTTQF
LLTKAVGELTALLWIPEIANMEEYLLDVEAAAANVLDLFALIDPSKMTNKLK
LHLLVHLKADILRFGPLVGVATETFECFNAIFRFCSIYSNHLAPSRDIAFQLAS
QEVLKYRLTGGWWPASDGEWKRPGPSVRNFIHDHPTLQALLGWTKEEKLV
NGSFRLEPLKRDASQKIESRKHLPWLQTQGAKAVNSSEDNDSKWTACRFAV
ANSGDKCSVGSWVFATSPFNSNQSVTGRIVEVLAESEGKRAVVVLDIFEVCS
TRHKIFGMPMLARRHEEPVYAVIASTNIEFLYNVQHDCPLAKCTASGKQPLI
QERVESGLFKTYIEHKPIERFVINTHAFHNAHRLRAVLQRSLVVPIPLYPPEIR
KTKHAEFAHNLQATQKVKLEARAAQKAKEIITPADKTDSTIPKKRTRSEMET
ETDDTAIATQADVFFNAQGCP
[0152] Step 3 describes S-oxygenation of the tryptathionine sulfur by a flavin monooxygenase enzyme, which converts it to the sulfinyl (sulfoxide) form. An example of such a monooxygenase is shown in TABLE 7. Step 4 describes potential further modification steps, such as hydroxylation of side chains on the peptide, for example hydroxylation of position 6 of the indole ring of the tryptathionine-forming tryptophan residue by a P450 family monooxygenase. The monooxygenase may be a protein having the sequence of SEQ ID NO: 128. The monooxygenase may be a variant (e.g., a non-natural variant) of a naturally occurring monooxygenase. Such a variant can have at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NO: 128.
TABLE 7
Amino acid sequences of monooxygenases

SEQ ID NO.: 128. Galerina marginata CBS 339.88, KDR68385.1, hypothetical protein GALMADRAFT_104945:
MVQIKRLLLGFLSSPSQTPLESNHGPVPSKSIAVVGAGSAGLAMLRTLVELEA
FSRNNWEVVLYEERESVGGIWLPDNNDVFPPEIPKTPLYPLLRTNTPVPSMTY
PGFPFPPSTPLYPRHDHVEAYHLRYARRHNLLDFIKFDTMVEKAFWNGTPEE
GYWNLTLSSKEGRMRYKTFDHLVVATGNNHIPHIPVWKGQEDWLASPANH
SRKIIHSVYYRGPEAFSNQTVLIVGNGGSGRDAATQILGYASQTFMSIRRSYG
PVDDGVIVKPDISHFTEAGVVFVDGTILDPDVILLGTGYEMQKPLLSEGGELS
FDPTAKDNSSVRGTLVTNGHYIFPLHRHIFSLSPRYPPNALAFIGLLSFIASCPS
DIAQSLFAAHAILDPSILPPRHLLLEELASYEDKARRQGLDPYLKGPIMLNNTS
NDYQDELVEYLKQKNAIPDDGKKFVEEWRREILAYHYLQRGWSRIEKLGM
GPAWTEGVKTEAQWFDLMTRVNEWQKNWETENGIAFRVDLDLTG
[0153] The sequences flanking the encoded random peptide library can be modified by using the N-terminal and C-terminal flanks of the MSDIN family genes (toxin preproprotein sequences) identified in the genomes of Amanita bisporigera and Amanita phalloides.
[0154] The enzymes can additionally be targeted to a specific cellular compartment to increase peptide synthesis efficiency and increase yield for peptide production purposes.
[0155] Disclosed herein, in certain embodiments, is a method of detecting degradation of a target protein mediated by a molecule that links a target or test protein to an E3 ubiquitin ligase in a host cell, the method comprising: expressing in the host cell an E3 ubiquitin ligase and a first fusion protein comprising the first test protein; delivering a first molecule to the host cell; modifying the first molecule in the host cell with a modifying enzyme; and allowing the first molecule to bridge the interaction between the first test protein and the E3 ubiquitin ligase, wherein the first molecule is the product of an encoded DNA sequence, wherein the first molecule comprises a randomized polypeptide library and one or more modifying enzymes, and wherein the one or more modifying enzymes modify the randomized polypeptide library.
Host Cells
[0156] In some embodiments, the host cell is a eukaryote or a prokaryote. In some embodiments, the host cell is from an animal, a plant, a fungus, or a bacterium. In some embodiments, the fungus is Aspergillus or Pichia pastoris. In some embodiments, the host cell is a haploid yeast cell. In other embodiments, the host cell is a diploid yeast cell. In some embodiments, the diploid yeast cell is produced by mating a first host cell comprising DNA sequences encoding the first chimeric gene, the second chimeric gene, and the third chimeric gene to a second host cell comprising DNA sequences encoding the death agent, the positive selection reporter, and the mRNA comprising a nucleotide sequence encoding a polypeptide. In some embodiments, the plant is Nicotiana tabacum or Physcomitrella patens. In some embodiments, the host cell is an Sf9 (Spodoptera frugiperda) insect cell.
[0157] Disclosed herein, in certain embodiments, is a host cell configured to express: a first fusion protein comprising a first test protein, a first DNA-binding moiety, and a first gene-activating moiety; an E3 ubiquitin ligase; a death agent, wherein expression of the death agent is under control of a promoter DNA sequence specific for the first DNA-binding moiety; and a polypeptide of 60 or fewer amino acids, wherein the polypeptide modulates an interaction between the first test protein and the E3 ubiquitin ligase that leads to accelerated degradation of the first test protein.
[0158] In some embodiments, the host cell may also comprise a second fusion protein, comprising a second DNA-binding moiety, a second test protein, and a second gene-activation moiety; and a positive selection reporter, wherein the expression of the positive reporter is under control of a second promoter DNA sequence specific for the second DNA-binding moiety.
[0159] The host cell may have a mutant background enabling uptake of small molecules. In some cases, the host cell has a mutant background enabling increased transformation efficiency.
[0160] Disclosed herein, in certain embodiments, is a host cell comprising a plasmid vector wherein a DNA sequence encoding a first polypeptide is inserted in frame with Gal4-DBD and VP64-AD, and a DNA sequence encoding a second polypeptide is inserted in frame with LexA-DBD and VP64-AD, and wherein a further DNA sequence encodes an E3 ubiquitin ligase.
[0161] Disclosed herein, in certain embodiments, is a kit comprising the described plasmids, transfectable host cells compatible with the plasmids, or any combination thereof. In some embodiments, the provided host cells are already transfected with components of the plasmids. In some embodiments, the kit includes selectable agents for use with host cells transfected with the plasmids. In some embodiments, a library of variants of either plasmid is provided, wherein more than a single pair of bait proteins or E3 ubiquitin ligases is provided. Such a library can be used, for example, to screen for agents with selective protein targeting. In some embodiments, a library of variants of the polypeptide plasmid is provided, wherein a plurality of different short test polypeptide sequences for screening is provided. The plurality of different short peptide sequences can be randomly generated by any method (e.g., NNK or NNN nucleotide randomization). The plurality of different short peptide sequences can also be preselected, either from previous experiments selecting for binding to a target or from existing data sets in the scientific literature that report rationally designed peptide libraries.
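NNK and NNN randomization differ in how many codons, and how many stop codons, each randomized position can carry. The following Python sketch enumerates both schemes against the standard genetic code; the names expand_codon and amino_acid_counts are illustrative and are not part of any described kit.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# IUPAC degenerate nucleotide codes used below (N = any base, K = G or T).
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand_codon(pattern: str) -> list:
    """All concrete codons matching a degenerate codon such as 'NNK'."""
    return ["".join(bases) for bases in product(*(DEGENERATE.get(ch, ch) for ch in pattern))]


def amino_acid_counts(pattern: str) -> Counter:
    """How many codons of the degenerate pattern encode each amino acid ('*' = stop)."""
    return Counter(CODON_TABLE[codon] for codon in expand_codon(pattern))


for scheme in ("NNK", "NNN"):
    counts = amino_acid_counts(scheme)
    n_amino_acids = len([aa for aa in counts if aa != "*"])
    print(scheme, sum(counts.values()), "codons,", n_amino_acids, "amino acids,", counts["*"], "stop")
# NNK 32 codons, 20 amino acids, 1 stop
# NNN 64 codons, 20 amino acids, 3 stop
```

Under the standard code, NNK halves the codon count while still covering all 20 amino acids and reduces the stop codons per position from three (TAA, TAG, TGA) to one (TAG), which is why it is a common choice for randomized peptide libraries.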
[0162] The host cell can additionally be made permeable to small molecules, for example by deletion of genes encoding drug efflux pumps, such as PDR5. Genes encoding transcription factors, such as PDR1 and PDR3, that induce expression of efflux pumps, including but not restricted to the 12 genes described by 12geneΔ0HSR (Chinen, 2011), can likewise be deleted. The host cell could be further permeabilized to small molecules by interfering with the synthesis and deposition of ergosterol in the plasma membrane, for example by deleting ERG2, ERG3, and/or ERG6 or by placing their expression under a regulatable promoter.
[0163] The host cell can additionally carry mutations that enable more efficient transformation with vectors and/or more efficient uptake of small molecules.
[0164] The mentioned plasmids can be used in various permutations. In some embodiments, integration of the plasmids into the genome of the host cell is followed by transformation of a library with randomly encoded peptides using, for example, NNK or NNN codons.
[0165] In some embodiments, to perform a screen to identify a peptide that can mediate the degradation of a target protein, the host cell is propagated in selection media to ensure the presence of the required plasmids and expression of a non-target protein (e.g., on media selective for the positive selection marker for yeast, or in media containing antibiotic for human or bacterial cells). This host cell can then be transformed with the peptide library plasmid and immediately transferred to selection media that ensure all components are present (i.e., media selective for both plasmid markers for yeast, or containing antibiotics for bacterial or mammalian cells) and that induce expression of any inducible component, such as the target protein that activates expression of the death agent (e.g., with Gal or doxycycline).
[0166] In other embodiments, the plasmids are used as a ‘plug and play platform’ utilizing the yeast mating type system, in which one or more (or two or more) plasmids (or the genetic elements therein) are introduced into the same cell by cell fusion, or by cell fusion followed by meiosis, instead of by transfection. This cell fusion involves two different yeast host cells bearing different genetic elements. In this embodiment, yeast host cell 1 is either MATa or MATalpha and carries an integration of the target protein and E3 ubiquitin ligase plasmid. The yeast host cell 1 strain can be propagated on positive selection media to ensure the proteins are present. Yeast host cell 2 is of the opposite mating type and carries (or has integrated) the randomized peptide library and ‘death agent’ (e.g., cytotoxic reporter) plasmid. Yeast host cell 2 can be generated via large-batch, high-efficiency transformation protocols that ensure high library diversity within the cell culture. Aliquots of this library batch can then be frozen to maintain consistency. In this embodiment, the strains are mated in batch to yield a diploid strain that carries all the markers: the target protein, E3 ligase, positive selection marker, ‘death agents’, and peptide. This batch culture can then be propagated on solid medium that selects for all the system components (i.e., media lacking both positive selection markers) and induces expression of any inducible component (e.g., with Gal).
[0167] Surviving colonies from limiting dilution experiments performed on host cells bearing both the target protein and E3 ligase constructs and the library/cytotoxic constructs (introduced either by transfection or by mating) can constitute colonies in which a specific target protein has been degraded by a peptide and therefore no longer triggers the death cascade driven by the encoded ‘death agents’ (e.g., cytotoxic reporters), while expression of a bait variant protein driving a positive selection marker is maintained. The peptide sequence can be obtained by DNA sequencing the peptide-encoding region of the plasmid in each surviving colony.
[0168] To ensure that survival is due to degradation of the target rather than to stochastic chance or faulty gene expression, an inducible promoter can be used to shut off production of either the E3 ligase or the peptide and thereby confirm specificity. In some embodiments, cell survival is observed only on media containing galactose, on which all the components are expressed, and no survival is observed on media without galactose, on which expression of the peptide is lost.
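The galactose on/off control amounts to a simple decision rule over replica-plated colonies. The sketch below encodes it in Python; the function classify_colony and the colony data are hypothetical, and the rule assumes, as stated above, that omitting galactose abolishes peptide expression.

```python
def classify_colony(grows_with_gal: bool, grows_without_gal: bool) -> str:
    """Interpret the galactose on/off control (hypothetical helper).

    With galactose, all components (including the peptide) are expressed;
    without galactose, peptide expression is lost.
    """
    if not grows_with_gal:
        return "not a hit"            # no survival even with all components expressed
    if grows_without_gal:
        return "peptide-independent"  # survives without the peptide: chance or faulty expression
    return "confirmed"                # survival requires the induced peptide


# Hypothetical replica-plating results: colony -> (growth on +Gal, growth on -Gal)
results = {"c1": (True, False), "c2": (True, True), "c3": (False, False)}
for colony, (plus_gal, minus_gal) in results.items():
    print(colony, classify_colony(plus_gal, minus_gal))
```

Only colonies classified as "confirmed" would be carried forward to plasmid rescue and re-transformation.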
[0169] The plasmids can also be isolated and re-transformed into a fresh host cell to confirm specificity. Biochemical fractionation of the viable host cells containing the target, E3 ligase, peptide, positive selection marker, and ‘death agent’, followed by pull-down experiments using encoded tags that are part of the fusion constructs (e.g., Myc-tag, HA-tag, His-tag), can confirm an interaction between the peptide sequence and either the target protein or the E3 ligase. This approach is also helpful for structure-activity relationship (SAR) studies to determine the binding interface.
[0170] The peptides to be used in the screening assay can be derived from a complex library that involves post-translational modifying enzymes. The modified peptides can be analyzed by methods such as mass spectrometry, in addition to being sequenced to identify the primary sequence. The peptides can also be tested for inherent membrane permeability by reapplying them to the host cells exogenously (e.g., from a lysate) and observing reporter inactivation or activation.
[0171] Once enough surviving host cell colonies are sequenced, highly conserved sequence patterns can emerge and can be readily identified using a multiple-sequence alignment. Any such pattern can be used to ‘anchor’ residues within the library peptide insert sequence and permute the variable residues to generate diversity and achieve tighter binding. In some embodiments, this can also be done using an algorithm developed for pattern recognition and library design. Upon convergence, the disrupting peptide pattern, as identified through sequencing, can be used to define a peptide disruptor sequence. Convergence is defined by the lack of retrieval of any new sequences in the last iteration relative to the penultimate one.
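The anchoring and convergence steps can be sketched as follows. This Python is a minimal illustration that assumes the surviving-colony peptides have already been aligned to a common length (no alignment algorithm is implemented here) and uses an arbitrary 80% conservation cutoff; the function names, the cutoff, and the example peptides are hypothetical and are not the pattern-recognition algorithm referred to above.

```python
from collections import Counter


def conserved_positions(peptides, min_fraction=0.8):
    """Map 1-based position -> residue for columns whose most common residue
    occurs in at least min_fraction of the (pre-aligned, equal-length) hits."""
    if not peptides:
        return {}
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("expects equal-length, pre-aligned peptide sequences")
    anchors = {}
    for i in range(length):
        residue, count = Counter(p[i] for p in peptides).most_common(1)[0]
        if count / len(peptides) >= min_fraction:
            anchors[i + 1] = residue
    return anchors


def anchored_template(anchors, length, variable="X"):
    """Template for the next library round: anchored residues fixed, others variable."""
    return "".join(anchors.get(i + 1, variable) for i in range(length))


def has_converged(previous_round, latest_round):
    """Converged when the latest round retrieves no sequence absent from the previous one."""
    return set(latest_round) <= set(previous_round)


hits = ["ACWLG", "ACWMG", "VCWLG", "ACWLA"]  # hypothetical surviving-colony peptides
anchors = conserved_positions(hits)
print(anchors, anchored_template(anchors, 5))  # {2: 'C', 3: 'W'} XCWXX
print(has_converged(["XCWAG", "XCWLG"], ["XCWLG"]))  # True
```

The subset test in has_converged mirrors the stated definition: the last iteration yields nothing new relative to the penultimate one.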
[0172] In some embodiments, a peptide library may be generated and/or used for screening as described herein. The peptides in the generated library may be peptides having drug-like properties. The peptides to be used in the screening assay can be derived from a process that involves enzymes that modify peptides post-translationally. For instance, to generate libraries of peptides that have an N-methylated backbone or a macrocyclic structure, a methyltransferase (such as those described in Table 3) may be used along with a prolyloligopeptidase (such as those described in Table 2) as shown in
[0173] In an alternative approach, to generate libraries of drug-like N-methylated and/or macrocyclic peptides (e.g., for use in a system designed to identify “bridging” peptides or peptides that inhibit a protein-protein interaction), a methyltransferase (such as those described in Table 3) may be used, with a protease cleavage site (such as a TEV protease site) inserted upstream of the diversified core peptide sequence as shown in
EXAMPLES
Example 1: Method for Identifying a Molecule that Leads to Selective Protein Degradation.
[0174] This example describes a system that uses two variants of one protein, fused to different DBDs, to identify a facilitator of variant-specific degradation. An integration plasmid is used to integrate into Saccharomyces cerevisiae the genes encoding the proteins of interest and an E3 ligase. The plasmid encodes a fusion of an AD (VP64) and a DBD (Gal4) with KRas(G12D), another fusion of an AD (VP64) and a DBD (LexA) with wild-type KRas, and the E3 ubiquitin ligase Cereblon (CRL4-CRBN). The fusion proteins are tagged with FLAG, MYC, or HA. The plasmid further includes yeast replication and selection markers (TRP1 and CEN) and sites for integration into the genome at a specified locus.
[0175] Saccharomyces cerevisiae is co-transformed with a selection and library plasmid for expression of a randomized peptide library of NNK-encoded 20-mer sequences. Expression from the selection plasmid is driven by a strong promoter, ADH1. The selection and library plasmid also comprises a sequence that encodes a HIS tag.
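For the NNK 20-mer library described here, two quantities help put library size in context: the fraction of inserts expected to be free of stop codons and the theoretical sequence space. The Python below is a back-of-the-envelope sketch assuming every codon of the degenerate scheme is equally likely at each position and positions are independent; among the 32 NNK codons only TAG is a stop, versus TAA, TAG, and TGA among the 64 NNN codons. The function name is hypothetical.

```python
def stop_free_fraction(length: int, stop_codons: int, total_codons: int) -> float:
    """Probability that a randomized insert of `length` codons contains no stop
    codon, assuming every codon of the degenerate scheme is equally likely."""
    return ((total_codons - stop_codons) / total_codons) ** length


PEPTIDE_LENGTH = 20
nnk = stop_free_fraction(PEPTIDE_LENGTH, stop_codons=1, total_codons=32)  # TAG only
nnn = stop_free_fraction(PEPTIDE_LENGTH, stop_codons=3, total_codons=64)  # TAA, TAG, TGA
sequence_space = 20 ** PEPTIDE_LENGTH  # distinct 20-mers over the 20 standard amino acids

print(f"NNK: {nnk:.1%} stop-free, NNN: {nnn:.1%} stop-free")  # NNK: 53.0% stop-free, NNN: 38.3% stop-free
print(f"{sequence_space:.2e} possible 20-mers")  # 1.05e+26 possible 20-mers
```

Under these assumptions about 53% of NNK 20-mer inserts are expected to read through without a premature stop, compared with about 38% for NNN, and the roughly 10^26 possible 20-mers far exceed any transformable library, so each screen samples only a small fraction of the sequence space.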
[0176] The selection and library plasmid additionally comprises a LexAop sequence, which induces expression of the ‘death agents’ (cytotoxic reporters) when bound by a functional transcription factor formed by the Gal4-KRas(G12D)-VP64 fusion protein. The selection and library plasmid also contains a positive selection marker, ADE2, which is under the control of the LexA-KRas-VP64 fusion protein, so that the positive selection marker is expressed when that fusion protein is expressed. The plasmid further includes yeast replication and selection markers (TRP1 and CEN).
[0177] The screen is performed by mating the strains in batch to produce a diploid strain that carries all the markers, the target protein, the E3 ligase, the positive selection marker, the death agents, and the peptide. This batch culture is then propagated on solid medium that selects for all the system components (medium lacking two nutritional components) and induces expression of any inducible component with Gal.
[0178] Surviving colonies consist of cells in which KRas(G12D) has been degraded, with the degradation facilitated by a peptide bridging KRas(G12D) to Cereblon, so that it can no longer trigger the death cascade induced by the encoded death agents. The same cells also express wild-type KRas, which is not targeted and drives positive selection to enable survival.
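The selection logic of this example can be summarized as a small truth table. The Python below is a simplified model, assuming that the KRas(G12D) bait alone controls the death agents and the wild-type KRas bait alone controls ADE2, and ignoring promoter leakiness and partial degradation; the function name survives is hypothetical.

```python
from itertools import product


def survives(mutant_degraded: bool, wild_type_degraded: bool) -> bool:
    """Simplified readout logic (a model, not the actual genetic circuit).

    The KRas(G12D) bait drives the death agents, so the cell escapes death only
    if that bait is degraded; the wild-type KRas bait drives ADE2, so the cell
    also needs that bait to remain intact.
    """
    death_agent_expressed = not mutant_degraded
    positive_marker_expressed = not wild_type_degraded
    return positive_marker_expressed and not death_agent_expressed


for mutant_degraded, wild_type_degraded in product((False, True), repeat=2):
    print("G12D degraded:", mutant_degraded,
          "| WT degraded:", wild_type_degraded,
          "| survives:", survives(mutant_degraded, wild_type_degraded))
```

Only the selective case (KRas(G12D) degraded, wild-type KRas intact) yields survival; a peptide that degrades both variants is eliminated because ADE2 expression is lost.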
[0179] The peptide sequence that is able to selectively degrade KRas(G12D) is obtained by DNA sequencing the peptide-encoding region of the selection and library plasmid in each surviving colony.
[0180] To confirm specificity, the inducible promoter is used to shut off production of the E3 ligase. The plasmid is then isolated and re-transformed into a fresh parental strain as a further check of specificity.
[0181] Biochemical fractionation of the viable strain containing the target, E3 ligase, peptide, selection marker, and death agent is followed by pull-down experiments using the encoded tags to confirm an interaction between the peptide sequence and either the target protein or the E3 ligase.
[0182] An alternative example can be made by swapping LexA and Gal4. In another alternative example, the fusion proteins in either construct are driven by an inducible promoter, GAL1, instead of the ADH1 promoter. In another example, the yeast 2µ element is included in the target and E3 ligase integration plasmid and in the selection and library plasmid instead of CEN. Similarly, the yeast selection marker LEU2 can be used instead in another example. In yet another example, the N-terminus of the peptide translated from the selection and library plasmid can instead be glycine, alanine, serine, threonine, valine, or proline. In other examples, the genetic reporter in the confirmation plasmid is HIS3 or URA3 in place of ADE2. Either haploid mating type of Saccharomyces cerevisiae can be used as the background strain in alternative examples. In other examples, the library of peptides can be expressed from scaffolds that enable post-translational modifications. In other examples, background strains also express enzymes for the cyclization and methylation of peptides, such as lanthipeptide maturation enzymes from Lactococcus lactis (LanB, LanC, LanM, LanP), patellamide biosynthesis factors from cyanobacteria (PatD, PatG), butelase 1 from Clitoria ternatea, and GmPOPB from Galerina marginata or other species.
Example 2: Negative Readout for Degradation of a Target Protein
[0183] In this example, the target bait is operationally linked to a positive selection marker that enables growth in the absence of an essential nutrient (schematic as shown in
Example 3: Positive Readout for Degradation of a Target Protein
[0184] In this example, the target bait was operationally linked to a ‘Death Agent’ negative selection marker that prevents cell growth when expressed as also described in
[0185] In another example, cells expressing a heterologous E3 ligase, in this case TIR1, were assayed for survival to discover bridging agents that can bridge TIR1 to the bait and lead to degradation, thereby enabling cell growth (schematic shown in
[0186] In yet another example, cells expressing a heterologous E3 ligase, in this case COI1b, were assayed for survival to discover bridging agents that can bridge COI1b to the bait and lead to degradation, thereby enabling cell growth (as shown in