Methods for making high intensity sweeteners
11060124 · 2021-07-13
Assignee
Inventors
- Andrew P. Patron (San Marcos, CA)
- Chris Edano Noriega (San Diego, CA, US)
- Rama R. Manam (San Diego, CA, US)
- Justin Colquitt (San Diego, CA, US)
- Nathan Faber (San Diego, CA, US)
- Helge Zieler (Del Mar, CA)
- Justin Stege (San Diego, CA, US)
Cpc classification
C12P19/18
CHEMISTRY; METALLURGY
A23V2002/00
HUMAN NECESSITIES
C12P19/56
CHEMISTRY; METALLURGY
C12Y106/02004
CHEMISTRY; METALLURGY
International classification
C07J17/00
CHEMISTRY; METALLURGY
Abstract
Provided herein are methods of making mogroside compounds, e.g., Compound 1, compositions (for example, host cells) for making the mogroside compounds, the mogroside compounds made by the methods disclosed herein, and compositions (for example, cell lysates) and recombinant cells comprising the mogroside compounds (e.g., Compound 1). Also provided herein are novel cucurbitadienol synthases and the use thereof.
Claims
1. A method of producing Compound 1 having the structure of: ##STR00061## the method comprising: contacting mogroside IIIE with an enzyme capable of catalyzing a reaction for the production of Compound 1 from the mogroside IIIE, wherein the enzyme comprises an amino acid sequence having at least 90% sequence identity to any one of the sequences set forth in SEQ ID NOs: 2 and 103 having dextransucrase activity, or wherein the enzyme having dextransucrase activity is encoded by a gene comprising a nucleic acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 104 and 105.
2. The method of claim 1, wherein the mogroside IIIE is contacted with a recombinant host cell that comprises the gene comprising the nucleic acid sequence having at least 90% sequence identity to any one of SEQ ID NOs: 104 and 105 encoding the enzyme having dextransucrase activity.
3. The method of claim 2, wherein the mogroside IIIE is present in and/or produced by the recombinant host cell.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
Definitions
(32) Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this disclosure belongs. All patents, applications, published applications, and other publications are incorporated by reference in their entirety. In the event that there is a plurality of definitions for a term herein, those in this section prevail unless stated otherwise.
(33) Solvate refers to the compound formed by the interaction of a solvent and a compound described herein or salt thereof. Suitable solvates are physiologically acceptable solvates including hydrates.
(34) A sweetener, sweet flavoring agent, sweet flavor entity, sweet compound, or sweet tasting compound, as used herein refers to a compound or physiologically acceptable salt thereof that elicits a detectable sweet flavor in a subject.
(35) As used herein, the term operably linked is used to describe the connection between regulatory elements and a gene or its coding region. Typically, gene expression is placed under the control of one or more regulatory elements, for example, without limitation, constitutive or inducible promoters, tissue-specific regulatory elements, and enhancers. A gene or coding region is said to be operably linked to or operatively linked to or operably associated with the regulatory elements, meaning that the gene or coding region is controlled or influenced by the regulatory element. For instance, a promoter is operably linked to a coding sequence if the promoter effects transcription or expression of the coding sequence.
(36) The terms regulatory element and expression control element are used interchangeably and refer to nucleic acid molecules that can influence the expression of an operably linked coding sequence in a particular host organism. These terms are used broadly to cover all elements that promote or regulate transcription, including promoters, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5′ and 3′ untranslated regions (UTRs), transcriptional start sites, termination sequences, polyadenylation sequences, introns, core elements required for basic interaction of RNA polymerase and transcription factors, upstream elements, enhancers, response elements (see, e.g., Lewin, Genes V (Oxford University Press, Oxford) pages 847-873), and any combination thereof. Exemplary regulatory elements in prokaryotes include promoters, operator sequences, and ribosome binding sites. Regulatory elements that are used in eukaryotic cells can include, without limitation, transcriptional and translational control sequences, such as promoters, enhancers, splicing signals, polyadenylation signals, terminators, protein degradation signals, internal ribosome-entry elements (IRES), 2A sequences, and the like, that provide for and/or regulate expression of a coding sequence and/or production of an encoded polypeptide in a host cell. In some embodiments herein, the recombinant cell described herein comprises genes operably linked to regulatory elements.
(37) As used herein, 2A sequences or elements refer to small peptides introduced as a linker between two proteins, allowing autonomous intraribosomal self-processing of polyproteins (See e.g., de Felipe. Genetic Vaccines and Ther. 2:13 (2004); deFelipe et al. Traffic 5:616-626 (2004)). These short peptides allow co-expression of multiple proteins from a single vector. Many 2A elements are known in the art. Examples of 2A sequences that can be used in the methods and system disclosed herein, without limitation, include 2A sequences from the foot-and-mouth disease virus (F2A), equine rhinitis A virus (E2A), Thosea asigna virus (T2A), and porcine teschovirus-1 (P2A) as described in U.S. Patent Publication No. 20070116690.
(38) As used herein, the term promoter is a nucleotide sequence that permits binding of RNA polymerase and directs the transcription of a gene. Typically, a promoter is located in the 5′ non-coding region of a gene, proximal to the transcriptional start site of the gene. Sequence elements within promoters that function in the initiation of transcription are often characterized by consensus nucleotide sequences. Examples of promoters include, but are not limited to, promoters from bacteria, yeast, plants, viruses, and mammals (including humans). A promoter can be inducible, repressible, and/or constitutive. Inducible promoters initiate increased levels of transcription from DNA under their control in response to some change in culture conditions, such as a change in temperature.
(39) As used herein, the term enhancer refers to a type of regulatory element that can increase the efficiency of transcription, regardless of the distance or orientation of the enhancer relative to the start site of transcription.
(40) As used herein, the term transgene refers to any nucleotide or DNA sequence that is integrated into one or more chromosomes of a target cell by human intervention. In some embodiments, the transgene comprises a polynucleotide that encodes a protein of interest. The protein-encoding polynucleotide is generally operatively linked to other sequences that are useful for obtaining the desired expression of the gene of interest, such as transcriptional regulatory sequences. In some embodiments, the transgene can additionally comprise a nucleic acid or other molecule(s) that is used to mark the chromosome where it has integrated.
(41) Percent (%) sequence identity with respect to polynucleotide or polypeptide sequences is used herein as the percentage of bases or amino acid residues in a candidate sequence that are identical with the bases or amino acid residues in another sequence, after aligning the two sequences. Gaps can be introduced into the sequence alignment, if necessary, to achieve the maximum percent sequence identity. Conservative substitutions are not considered as part of the sequence identity. Alignment for purposes of determining percent (%) sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer methods and programs such as BLAST, BLAST-2, ALIGN, FASTA (available in the Genetics Computing Group (GCG) package, from Madison, Wis., USA), or Megalign (DNASTAR). Those of skill in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
(42) For instance, percent (%) amino acid sequence identity values may be obtained by using the WU-BLAST-2 computer program described in, for example, Altschul et al., Methods in Enzymology, 1996, 266:460-480. Many search parameters in the WU-BLAST-2 computer program can be adjusted by those skilled in the art. For example, some of the adjustable parameters can be set with the following values: overlap span=1, overlap fraction=0.125, word threshold (T)=11, and scoring matrix=BLOSUM62. When WU-BLAST-2 is used, a % amino acid sequence identity value is determined by dividing (a) the number of matching identical amino acid residues between the amino acid sequence of a first protein of interest and the amino acid sequence of a second protein of interest as determined by WU-BLAST-2 by (b) the total number of amino acid residues of the first protein of interest.
(43) Percent amino acid sequence identity may also be determined using the sequence comparison program NCBI-BLAST2 described in, for example, Altschul et al., Nucleic Acids Res., 1997, 25:3389-3402. The NCBI-BLAST2 sequence comparison program may be downloaded from http://www.ncbi.nlm.nih.gov or otherwise obtained from the National Institutes of Health, Bethesda, Md. NCBI-BLAST2 uses several adjustable search parameters. The default values for some of those adjustable search parameters are, for example, unmask=yes, strand=all, expected occurrences=10, minimum low complexity length=15/5, multi-pass e-value=0.01, constant for multi-pass=25, drop-off for final gapped alignment=25, and scoring matrix=BLOSUM62.
(44) In situations where NCBI-BLAST2 is used for amino acid sequence comparisons, the % amino acid sequence identity of a given amino acid sequence A to, with, or against a given amino acid sequence B (which can alternatively be phrased as a given amino acid sequence A that has or comprises a certain % amino acid sequence identity to, with, or against a given amino acid sequence B) is calculated as follows:
100 times the fraction X/Y
where X is the number of amino acid residues scored as identical matches by the sequence alignment program NCBI-BLAST2 in that program's alignment of A and B, and where Y is the total number of amino acid residues in B. It will be appreciated that where the length of amino acid sequence A is not equal to the length of amino acid sequence B, the % amino acid sequence identity of A to B will not equal the % amino acid sequence identity of B to A.
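The X/Y calculation above can be illustrated with a minimal Python sketch. It assumes the alignment has already been produced by some alignment program and is supplied as two gapped strings of equal length, with "-" marking a gap; the function name and the example sequences are hypothetical and are not taken from this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of A to B: 100 * X / Y.

    X = aligned positions where A and B carry the same residue (gaps excluded).
    Y = total number of residues in B (gap characters in B are not residues).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    x = sum(1 for ra, rb in zip(aligned_a, aligned_b) if ra == rb and ra != "-")
    y = sum(1 for rb in aligned_b if rb != "-")
    if y == 0:
        raise ValueError("sequence B contains no residues")
    return 100.0 * x / y


# Hypothetical alignment: A has 5 residues, B has 6 residues.
a = "MKT-LV"
b = "MKTALV"
print(percent_identity(a, b))  # identity of A to B (denominator: residues in B)
print(percent_identity(b, a))  # identity of B to A (denominator: residues in A)
```

Because the denominator is the residue count of the second sequence, swapping the arguments changes the result whenever the two sequences differ in length, which is the asymmetry noted above.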
(45) As used herein, isolated means that the indicated compound has been separated from its natural milieu, such that one or more other compounds or biological agents present with the compound in its natural state are no longer present.
(46) As used herein, purified means that the indicated compound is present at a higher amount relative to other compounds typically found with the indicated compound (e.g., in its natural environment). In some embodiments, the relative amount of a purified compound is increased by greater than 1%, 5%, 10%, 20%, 30%, 40%, 50%, 80%, 90%, 100%, 120%, 150%, 200%, 300%, 400%, or 1000%. In some embodiments, a purified compound is present at a weight percent level greater than 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.5% relative to other compounds combined with the compound. In some embodiments, Compound 1 produced from the embodiments herein is present at a weight percent level greater than 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.5% relative to other compounds combined with the compound after production.
(47) Purification as described herein, can refer to the methods for extracting Compound 1 from the cell lysate and/or the supernatant, wherein the cell is excreting the product of Compound 1. Lysate as described herein, comprises the cellular content of a cell after disruption of the cell wall and cell membranes and can include proteins, sugars, and mogrosides, for example. Purification can involve ammonium sulfate precipitation to remove proteins, salting to remove proteins, hydrophobic separation (HPLC), and use of an affinity column. In view of the products produced by the methods herein, affinity media is contemplated for the removal of specific mogrosides with an adsorbent resin.
(48) HPLC as described herein is a form of liquid chromatography that can be used to separate compounds that are dissolved in solution. Without being limiting, HPLC instruments can comprise a reservoir of mobile phase, a pump, an injector, a separation column, and a detector. Compounds can then be separated by injecting a sample mixture onto the column. The different components in the mixture can pass through the column at different rates due to differences in their partitioning behavior between the mobile liquid phase and the stationary phase. There are several columns that can be used. Without being limiting, the columns can be normal phase columns, reverse phase columns, size exclusion columns, and ion exchange columns.
(49) Also contemplated is the use of solid phase extraction and fractionation, which is useful for desalting proteins and sugar samples. Other methods can include the use of HPLC, liquid chromatography for analyzing samples, and liquid-liquid extraction, described in Auréa Andrade-Eiroa et al. (TrAC Trends in Analytical Chemistry, Volume 80, June 2016, Pages 641-654; incorporated by reference in its entirety herein).
(50) Solid phase extraction (SPE) for purification, as described herein, refers to a sample preparation process in which compounds that are dissolved or suspended in a liquid mixture are separated from other compounds in the mixture according to their physical and chemical properties. For example, analytical laboratories can use solid phase extraction to concentrate and purify samples for analysis. Solid phase extraction can also be used to isolate analytes of interest from a wide variety of matrices, including urine, blood, water, beverages, soil, and animal tissue, for example. In the embodiments herein, Compound 1 that is in cell lysate or in the cell media can be purified by solid phase extraction.
(51) SPE uses the affinity of solutes dissolved or suspended in a liquid (known as the mobile phase) for a solid through which the sample is passed (known as the stationary phase) to separate a mixture into desired and undesired components. SPE can also be used and applied directly in gas-solid phase and liquid-solid phase, or indirectly to solid samples by using, e.g., thermodesorption with subsequent chromatographic analysis. This can result in either the desired analytes of interest or undesired impurities in the sample being retained on the stationary phase. The portion that passes through the stationary phase can be collected or discarded, depending on whether it contains the desired analytes or undesired impurities. If the portion retained on the stationary phase includes the desired analytes, they can then be removed from the stationary phase for collection in an additional step, in which the stationary phase is rinsed with an appropriate eluent.
(52) Ways that the solid phase extraction can be performed are not limited. Without being limiting, the procedures may include: normal phase SPE, reversed phase SPE, ion exchange SPE, anion exchange SPE, cation exchange SPE, and solid-phase microextraction. Solid phase extraction is described in Sajid et al. and Plotka-Wasylka J et al. (Anal Chim Acta. 2017 May 1; 965:36-53; Crit Rev Anal Chem. 2017 Apr. 11:1-11; each incorporated by reference in its entirety).
(53) In some embodiments, Compound 1 that is produced by the cell is purified by solid phase extraction. In some embodiments, the purity of Compound 1, for example as purified by solid phase extraction, is 70%, 80%, 90%, or 100%, or any level of purity in a range defined by any two of the aforementioned values.
(54) Fermentation as described herein, refers broadly to the bulk growth of host cells in a host medium to produce a specific product. In the embodiments herein, the final product produced is Compound 1. This can also include methods that occur with or without air and can be carried out in an anaerobic environment, for example. The whole cells (recombinant host cells) may be in fermentation broth or in a reaction buffer.
(55) Compound 1 and intermediate mogroside compounds for the production of Compound 1 can be isolated by collection of the intermediate mogroside compounds and Compound 1 from the recombinant cell lysate or from the supernatant. The lysate can be obtained after harvesting the cells and subjecting the cells to lysis by shear force (French press cell or sonication) or by detergent treatment. The lysate can then be filtered and treated with ammonium sulfate to remove proteins, and fractionated on a C18 HPLC column (5×10 cm Atlantis prep T3 OBD column, 5 µm, Waters) by injections using an A/B gradient (A=water, B=acetonitrile) of 10→30% B over 30 minutes, with a 95% B wash, followed by re-equilibration at 1% (total run time=42 minutes). The runs can be collected in tared tubes (12 fractions/plate, 3 plates per run) at 30 mL/fraction. The lysate can also be centrifuged to remove solids and particulate matter.
(56) Plates can then be dried in the Genevac HT12/HT24. The desired compound is expected to be eluted in Fraction 21 along with other isomers. The pooled fractions can be further fractionated in 47 runs on a fluoro-phenyl HPLC column (3×10 cm, XSelect fluoro-phenyl OBD column, 5 µm, Waters) using an A/B gradient (A=water, B=acetonitrile) of 15→30% B over 35 minutes, with a 95% B wash, followed by re-equilibration at 15% (total run time=45 minutes). Each run can be collected in 12 tared tubes (12 fractions/plate, 1 plate per run) at 30 mL/fraction. Fractions containing the desired peak with the desired purity can be pooled based on UPLC analysis and dried under reduced pressure to give a whitish powdery solid. The pure compound can be re-suspended/dissolved in 10 mL of water and lyophilized to obtain at least a 95% purity.
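As a rough aid to reading the gradient descriptions above, the following Python sketch computes the mobile phase composition during a linear ramp and the total volume collected per run. It models only the linear ramp (for example, 10 to 30% B over 30 minutes, or 15 to 30% B over 35 minutes); the durations of the 95% B wash and of re-equilibration are not stated separately above, so they are not modeled. The function names are hypothetical.

```python
def percent_b(t_min: float, start_b: float, end_b: float, ramp_min: float) -> float:
    """Percent of solvent B at time t_min within a linear gradient ramp."""
    if ramp_min <= 0:
        raise ValueError("ramp duration must be positive")
    if not 0 <= t_min <= ramp_min:
        raise ValueError("time is outside the linear ramp; wash steps are not modeled")
    return start_b + (end_b - start_b) * (t_min / ramp_min)


def collected_volume_ml(fractions_per_plate: int, plates_per_run: int,
                        ml_per_fraction: float) -> float:
    """Total volume collected in one run, from the fraction layout."""
    return fractions_per_plate * plates_per_run * ml_per_fraction


# Midpoint of the second (fluoro-phenyl) ramp: 15 -> 30% B over 35 minutes.
print(percent_b(17.5, 15, 30, 35))
# First (C18) run: 12 fractions/plate, 3 plates per run, 30 mL/fraction.
print(collected_volume_ml(12, 3, 30))
# Second run: 12 fractions/plate, 1 plate per run, 30 mL/fraction.
print(collected_volume_ml(12, 1, 30))
```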
(57) As used herein, a glycosidic bond refers to a covalent bond connecting two furanose and/or pyranose groups together. Generally, a glycosidic bond is the bond between the anomeric carbon of one furanose or pyranose moiety and an oxygen of another furanose or pyranose moiety. Glycosidic bonds are named using the numbering of the connected carbon atoms, and the alpha/beta orientation. α- and β-glycosidic bonds are distinguished based on the relative stereochemistry of the anomeric position and the stereocenter furthest from C1 in the ring. For example, sucrose is a disaccharide composed of one molecule of glucose and one molecule of fructose connected through an alpha 1-2 glycosidic bond, as shown below.
(58) ##STR00008##
(59) An example of a beta 1-4 glycosidic bond can be found in cellulose:
(60) ##STR00009##
(61) As used in the specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "an aromatic compound" includes mixtures of aromatic compounds.
(62) Often, ranges are expressed herein as from about one particular value, and/or to about another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent about, it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.
(63) Codon optimization as described herein, refers to the design process of altering codons to codons known to increase maximum protein expression efficiency. In some alternatives, codon optimization for expression in a cell is described, wherein codon optimization can be performed by using algorithms that are known to those skilled in the art so as to create synthetic genetic transcripts optimized for high mRNA and protein yield in humans. Codons can be optimized for protein expression in a bacterial cell, mammalian cell, yeast cell, insect cell, or plant cell, for example. Programs containing algorithms for codon optimization in humans are readily available. Such programs can include, for example, OptimumGene or GeneGPS algorithms. Additionally codon optimized sequences can be obtained commercially, for example, from Integrated DNA Technologies. In some of the embodiments herein, a recombinant cell for the production of Compound 1 comprises genes encoding enzymes for synthesis, wherein the genes are codon optimized for expression. In some embodiments, the genes are codon optimized for expression in bacterial, yeast, fungal or insect cells.
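The basic idea of codon optimization can be sketched in Python as a back-translation that always picks one preferred codon per amino acid. This is only a simplified illustration: the preferred-codon table below is a hypothetical placeholder rather than measured usage data for any host, and it does not represent the OptimumGene or GeneGPS algorithms, which are not described here.

```python
# Hypothetical preferred-codon table (illustrative only). Each codon does encode
# the listed amino acid under the standard genetic code, but the choice of which
# synonymous codon is "preferred" is an assumption made for this sketch.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}


def back_translate(protein: str) -> str:
    """Return a DNA coding sequence using one preferred codon per residue."""
    codons = []
    for residue in protein.upper():
        if residue not in PREFERRED_CODON:
            raise ValueError(f"unsupported residue: {residue!r}")
        codons.append(PREFERRED_CODON[residue])
    return "".join(codons)


print(back_translate("MKW*"))
```

A practical workflow would replace the fixed table with host-specific codon usage frequencies and would then screen the result for secondary structure and GC content.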
(64) As used herein, the terms nucleic acid, nucleic acid molecule, and polynucleotide are interchangeable and refer to any nucleic acid, whether composed of phosphodiester linkages or modified linkages such as phosphotriester, phosphoramidate, siloxane, carbonate, carboxymethylester, acetamidate, carbamate, thioether, bridged phosphoramidate, bridged methylene phosphonate, phosphorothioate, methylphosphonate, phosphorodithioate, bridged phosphorothioate or sultone linkages, and combinations of such linkages. The terms nucleic acid and polynucleotide also specifically include nucleic acids composed of bases other than the five biologically occurring bases (adenine, guanine, thymine, cytosine and uracil).
(65) Non-limiting examples of polynucleotides include deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), oligonucleotides, fragments generated by the polymerase chain reaction (PCR), and fragments generated by any of ligation, scission, endonuclease action, and exonuclease action. Nucleic acid molecules can be composed of monomers that are naturally-occurring nucleotides (such as DNA and RNA), or analogs of naturally-occurring nucleotides (e.g., enantiomeric forms of naturally-occurring nucleotides), or a combination of both. Modified nucleotides can have alterations in sugar moieties and/or in pyrimidine or purine base moieties. Sugar modifications include, for example, replacement of one or more hydroxyl groups with halogens, alkyl groups, amines, and azido groups, or sugars can be functionalized as ethers or esters. Moreover, the entire sugar moiety can be replaced with sterically and electronically similar structures, such as aza-sugars and carbocyclic sugar analogs. Examples of modifications in a base moiety include alkylated purines and pyrimidines, acylated purines or pyrimidines, or other well-known heterocyclic substitutes. Nucleic acid monomers can be linked by phosphodiester bonds or analogs of such linkages. Analogs of phosphodiester linkages include phosphorothioate, phosphorodithioate, phosphoroselenoate, phosphorodiselenoate, phosphoroanilothioate, phosphoranilidate, phosphoramidate, and the like. The term nucleic acid molecule also includes so-called peptide nucleic acids, which comprise naturally-occurring or modified nucleic acid bases attached to a polyamide backbone. Nucleic acids can be either single stranded or double stranded. In some alternatives, a nucleic acid sequence encoding a fusion protein is provided. In some alternatives, the nucleic acid is RNA or DNA. In some embodiments, the nucleic acid comprises any one of SEQ ID NOs: 1-1023.
(66) Coding for or encoding, as used herein, refers to the property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other macromolecules such as a defined sequence of amino acids. Thus, a gene codes for a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system. In some embodiments herein, a recombinant cell is provided, wherein the recombinant cell comprises genes encoding enzymes such as dextransucrase, UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, dextranases, and/or UGT. In some embodiments, the transglucosidases comprise an amino acid sequence set forth by any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the CGTases are encoded by or have the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 147 and 154. In some embodiments, the genes encoding the enzymes such as dextransucrase, UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, dextranases, and/or UGT are codon optimized for expression in the host cell. A nucleic acid sequence coding for a polypeptide includes all nucleotide sequences that are degenerate versions of each other and that code for the same amino acid sequence.
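The notion of degenerate nucleotide sequences that code for the same amino acid sequence can be shown with a short Python sketch. It builds the standard genetic code from the conventional T, C, A, G ordering and translates two different DNA sequences; the example sequences are hypothetical and are unrelated to the SEQ ID NOs of this disclosure.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (length divisible by 3) into protein."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two hypothetical sequences that differ at one codon (AAA vs AAG, both lysine).
seq_1 = "ATGAAATGG"
seq_2 = "ATGAAGTGG"
print(translate(seq_1), translate(seq_2))
print(seq_1 != seq_2 and translate(seq_1) == translate(seq_2))
```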
(67) Optimization can also be performed to reduce the occurrence of secondary structure in a polynucleotide. In some alternatives of the method, optimization of the sequences in the vector can also be performed to reduce the total GC/AT ratio. Strict codon optimization can lead to unwanted secondary structure or an undesirably high GC content that leads to secondary structure. As such, the secondary structures affect transcriptional efficiency. Programs such as GeneOptimizer can be used after codon usage optimization, for secondary structure avoidance and GC content optimization. These additional programs can be used for further optimization and troubleshooting after an initial codon optimization to limit secondary structures that can occur after the first round of optimization. Alternative programs for optimization are readily available. In some alternatives of the method, the vector comprises sequences that are optimized for secondary structure avoidance and/or the sequences are optimized to reduce the total GC/AT ratio and/or the sequences are optimized for expression in a bacterial or yeast cell.
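A simple way to monitor the GC content and GC/AT ratio mentioned above is sketched below in Python. The function names are hypothetical; this is a plain base count, not the GeneOptimizer program, and it does not predict secondary structure.

```python
def gc_fraction(dna: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return sum(1 for base in dna if base in "GC") / len(dna)


def gc_at_ratio(dna: str) -> float:
    """Ratio of G+C bases to A+T bases in the sequence."""
    dna = dna.upper()
    gc = sum(1 for base in dna if base in "GC")
    at = sum(1 for base in dna if base in "AT")
    if at == 0:
        raise ValueError("sequence contains no A or T bases")
    return gc / at


# Hypothetical sequences: compare candidates after an initial codon optimization.
print(gc_fraction("GGCCAATT"))
print(gc_at_ratio("GGGCAT"))
```

Comparing these values across candidate sequences gives a quick first check before running a dedicated secondary-structure tool.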
(68) A vector, expression vector, or construct is a nucleic acid that is used to introduce heterologous nucleic acids into a cell and that has regulatory elements to provide expression of the heterologous nucleic acids in the cell. Vectors include but are not limited to plasmids, minicircles, yeast, and viral genomes. In some alternatives, the vectors are plasmids, minicircles, yeast, or genomes. In some alternatives, the vector is for protein expression in a bacterial system, such as E. coli. In some alternatives, the vector is for protein expression in a yeast system. In some embodiments, the vector for expression is a viral vector. In some embodiments, the vector is a recombinant vector comprising promoter sequences for upregulation of expression of the genes. Regulatory elements can refer to the nucleic acid that has nucleotide sequences that can influence the transcription or translation initiation and rate, stability and mobility of a transcription or translation product.
(69) Recombinant host or recombinant host cell as described herein is a host, the genome of which has been augmented by at least one incorporated DNA sequence. Said incorporated DNA sequence may be a heterologous nucleic acid encoding one or more polypeptides. Such DNA sequences include but are not limited to genes that are not naturally present, DNA sequences that are not normally transcribed into RNA or translated into a protein (expressed), and other genes or DNA sequences which one desires to introduce into the nonrecombinant host. In some embodiments, the recombinant host cell is used to prevent expression problems such as codon-bias. There are commercial hosts for expression of proteins, for example, BL21-CodonPlus cells, tRNA-Supplemented Host Strains for Expression of Heterologous Genes, Rosetta (DE3) competent strains for enhancing expression of proteins, and commercial yeast expression systems in the genera Saccharomyces, Pichia, Kluyveromyces, Hansenula and Yarrowia.
(70) The recombinant host may be a commercially available cell such as Rosetta cells for expression of enzymes that may have rare codons.
(71) In some embodiments, the recombinant cell comprises a recombinant gene for the production of cytochrome P450 polypeptide comprising the amino acid sequence of any one of CYP533, CYP937, CYP1798, CYP1994, CYP2048, CYP2740, CYP3404, CYP3968, CYP4112, CYP4149, CYP4491, CYP5491, CYP6479, CYP7604, CYP8224, CYP8728, CYP10020, and CYP10285. In some embodiments, the P450 polypeptide is encoded in genes comprising any one of the sequences set forth in SEQ ID Nos: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891.
(72) In some embodiments, the P450 enzyme is aided by at least one CYP activator, such as CPR4497. In some embodiments, the recombinant host cell further comprises a gene encoding CPR4497, wherein the gene comprises a nucleic acid sequence set forth in SEQ ID NO: 112. In some embodiments, the recombinant host cell further comprises a gene encoding CPR4497, wherein the amino acid sequence of CPR4497 is set forth in SEQ ID NO: 113.
(73) In some embodiments, wherein the recombinant host cell is a yeast cell, the recombinant cell has a deletion of the EXG1 gene and/or the EXG2 gene to reduce glucanase activity, which may otherwise lead to deglucosylation of mogrosides.
(74) The type of host cell can vary. For example, the host cell can be selected from a group consisting of Agaricus, Aspergillus, Bacillus, Candida, Corynebacterium, Escherichia, Fusarium/Gibberella, Kluyveromyces, Laetiporus, Lentinus, Phaffia, Phanerochaete, Pichia, Physcomitrella, Rhodotorula, Saccharomyces, Schizosaccharomyces, Sphaceloma, Xanthophyllomyces, Yarrowia, Lentinus tigrinus, Laetiporus sulphureus, Phanerochaete chrysosporium, Pichia pastoris, Physcomitrella patens, Rhodotorula glutinis, Rhodotorula mucilaginosa, Phaffia rhodozyma, Xanthophyllomyces dendrorhous, Fusarium fujikuroi/Gibberella fujikuroi, Candida utilis, Yarrowia lipolytica, Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus, Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, Metarhizium, Aspergillus, Yarrowia, Lipomyces, Aspergillus nidulans, Yarrowia lipolytica, Rhodosporidium toruloides, Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, Rhodosporidium, and Microbotryomycetes, Gibberella fujikuroi, Kluyveromyces lactis, Schizosaccharomyces pombe, Aspergillus niger, Saccharomyces cerevisiae, Escherichia coli, Rhodobacter sphaeroides, and Rhodobacter capsulatus. Methods to enhance product yield have been described, for example, in S. cerevisiae. Methods are known for making recombinant microorganisms.
(75) Methods to prepare recombinant host cells from Aspergillus spp. are described in WO2014086842, incorporated by reference in its entirety herein. Nucleotide sequences of the genomes can be obtained through publicly available gene data libraries and can allow for rational design and modification of the pathways to enhance and improve product yield.
(76) Culture media, as described herein, can be a nutrient-rich broth for the growth and maintenance of cells during their production phase. A yeast culture for maintaining and propagating various strains can require specific formulations of complex media for use in cloning and protein expression, as can be appreciated by those of skill in the art. Commercially available culture media can be used, for example from ThermoFisher. The media can be YPD broth or can have a yeast nitrogen base. Yeast can be grown in YPD or synthetic media at 30° C.
(77) Lysogeny broth (LB) is typically used for bacterial cells. The bacterial cells used for production of the enzymes and mogrosides can have antibiotic resistance to prevent the growth of other cells in the culture media and contamination. The cells can have antibiotic resistance gene cassettes conferring resistance to antibiotics such as chloramphenicol, penicillin, kanamycin and ampicillin, for example.
(78) As described herein, a fusion protein is a protein created through the joining of two or more nucleic acid sequences that originally coded for a portion or entire amino acid sequence of separate proteins. For example, a fusion protein can contain a functional protein (e.g., an enzyme (including, but not limited to, cucurbitadienol synthase)) and one or more fusion domains. A fusion domain, as described herein, can be a full length or a portion/fragment of a protein (e.g., a functional protein including but not limited to, an enzyme, a transcription factor, a toxin, and a translation factor). The location of the one or more fusion domains in the fusion protein can vary. For example, the one or more fusion domains can be at the N- and/or C-terminal regions (e.g., N- and/or C-termini) of the fusion protein. The one or more fusion domains can also be at the central region of the fusion protein. The fusion domain is not required to be located at the terminus of the fusion protein. A fusion domain can be selected so as to confer a desired property. For example, a fusion domain may affect (e.g., increase or decrease) the enzymatic activity of an enzyme that it is fused to, or affect (e.g., increase or decrease) the stability of a protein that it is fused to. A fusion domain may be a multimerizing (e.g., dimerizing and tetramerizing) domain and/or a functional domain. In some embodiments, the fusion domain may enhance or decrease the multimerization of the protein that it is fused to. As a non-limiting example, a fusion protein can contain a full length protein A and a fusion domain fused to the N-terminal region and/or C-terminal region of the full length protein A. In some examples, a fusion protein contains a partial sequence of protein A and a fusion domain fused to the N-terminal region and/or C-terminal region (e.g., the N-terminus and C-terminus) of the partial sequence of protein A. 
The fusion domain can be, for example, a portion or the entire sequence of protein A, or a portion or the entire sequence of a protein different from protein A. In some embodiments, one or more of the enzymes suitable for use in the methods, systems and compositions disclosed herein can be a fusion protein. In some embodiments, the fusion protein is encoded by a nucleic acid sequence having at least 70%, 80%, 90%, 95%, or 99% sequence identity to one of the nucleic acid sequences listed in Table 1. In some embodiments, the fusion protein comprises an amino acid sequence having at least 70%, 80%, 90%, 95%, or 99% sequence identity to one of the amino acid sequences listed in Table 1. In some embodiments, the fusion protein comprises an amino acid protein sequence having at least 80%, 90%, 95%, or 99% sequence identity to one of the amino acid sequences listed in Table 1, and a fusion domain at N-, C-, or both terminal regions of the fusion protein. In some embodiments, the fusion protein comprises one of the amino acid protein sequences listed in Table 1, and a fusion domain located at N-, C-, or both terminal regions of the fusion protein.
(79) The length of the fusion domain can vary and can be, for example, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, or 1500 amino acids, or a range between any two of these numbers. In some embodiments, the fusion domain is about 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, or 50 amino acids in length, or a range between any two of these numbers. In some embodiments, the fusion domain is a substantial portion or the entire sequence of a functional protein (for example, an enzyme, a transcription factor, or a translation factor). In some embodiments, the fusion protein is a protein having cucurbitadienol synthase activity.
(80) Optimizing cell growth and protein expression techniques in culture media are also contemplated. For growth in culture media, cells such as yeast can be sensitive to low pH (Narendranath et al., Appl Environ Microbiol. 2005 May; 71(5): 2239-2243; incorporated by reference in its entirety). During growth, yeast must maintain a constant intracellular pH. There are many enzymes functioning within the yeast cell during growth and metabolism. Each enzyme works best at its optimal pH, which is acidic because of the acidophilic nature of the yeast itself. When the extracellular pH deviates from the optimal level, the yeast cell needs to invest energy to either pump in or pump out hydrogen ions in order to maintain the optimal intracellular pH. As such, media containing buffers to control the pH would be optimal. Alternatively, the cells can also be transferred into new media if the monitored pH is high.
(81) Growth optimization of bacterial and yeast cells can also be achieved by the addition of nutrients and supplements into the culture media. Alternatively, the cultures can be grown in a fermenter designed for control of temperature, pH, and aeration rates. Dissolved oxygen and nitrogen can be flowed into the media as necessary.
(82) The term "operably linked" as used herein refers to functional linkage between a regulatory sequence and a heterologous nucleic acid sequence resulting in expression of the latter.
(83) "Mogrosides" and "mogroside compounds" are used interchangeably herein and refer to a family of triterpene glycosides. Non-limiting examples of mogrosides include Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III, which have been identified from the fruits of Siraitia grosvenorii (Swingle) and are responsible for the sweetness of the fruits. In the embodiments herein, mogroside intermediates can be used in the in vivo, ex vivo, or in vitro production of Compound 1 having the structure of:
(84) ##STR00010##
(85) In some embodiments, a recombinant cell for producing Compound 1 further produces mogrosides and comprises genes encoding enzymes for the production of mogrosides. Recombinant cells capable of the production of mogrosides are further described in WO2014086842, incorporated by reference in its entirety herein. In some embodiments, the recombinant cell is grown in a medium to allow expression of the enzymes and production of Compound 1 and mogroside intermediates. In some embodiments, Compound 1 is obtained by lysing the cell with shear force (e.g., French press or sonication) or by detergent lysing methods. In some embodiments, the growth media is supplemented with precursor molecules such as mogrol to boost production of Compound 1.
(86) "Promoter" as used herein refers to a nucleotide sequence that directs the transcription of a structural gene. In some alternatives, a promoter is located in the 5′ non-coding region of a gene, proximal to the transcriptional start site of a structural gene. Sequence elements within promoters that function in the initiation of transcription are often characterized by consensus nucleotide sequences. Without being limiting, these promoter elements can include RNA polymerase binding sites, TATA sequences, CAAT sequences, differentiation-specific elements (DSEs; McGehee et al., Mol. Endocrinol. 7:551 (1993); hereby expressly incorporated by reference in its entirety), cyclic AMP response elements (CREs), serum response elements (SREs; Treisman et al., Seminars in Cancer Biol. 1:47 (1990); incorporated by reference in its entirety), glucocorticoid response elements (GREs), and binding sites for other transcription factors, such as CRE/ATF (O'Reilly et al., J. Biol. Chem. 267:19938 (1992); incorporated by reference in its entirety), AP2 (Ye et al., J. Biol. Chem. 269:25728 (1994); incorporated by reference in its entirety), SP1, cAMP response element binding protein (CREB; Loeken et al., Gene Expr. 3:253 (1993); hereby expressly incorporated by reference in its entirety) and octamer factors (see, in general, Watson et al., eds., Molecular Biology of the Gene, 4th ed. (The Benjamin/Cummings Publishing Company, Inc. 1987; incorporated by reference in its entirety), and Lemaigre and Rousseau, Biochem. J. 303:1 (1994); incorporated by reference in its entirety). As used herein, a promoter can be constitutively active, repressible or inducible. If a promoter is an inducible promoter, then the rate of transcription increases in response to an inducing agent. In contrast, the rate of transcription is not regulated by an inducing agent if the promoter is a constitutive promoter.
(87) A ribosome skip sequence as described herein refers to a sequence that during translation, forces the ribosome to skip the ribosome skip sequence and translate the region after the ribosome skip sequence without formation of a peptide bond. Several viruses, for example, have ribosome skip sequences that allow sequential translation of several proteins on a single nucleic acid without having the proteins linked via a peptide bond. As described herein, this is the linker sequence. In some alternatives of the nucleic acids provided herein, the nucleic acids comprise a ribosome skip sequence between the sequences for the genes for the enzymes described herein, such that the proteins are co-expressed and not linked by a peptide bond. In some alternatives, the ribosome skip sequence is a P2A, T2A, E2A or F2A sequence. In some alternatives, the ribosome skip sequence is a T2A sequence.
(88) Compound 1
(89) As disclosed herein, Compound 1 is a compound having the structure of:
(90) ##STR00011##
or a salt thereof.
(91) Compound 1 is a high-intensity sweetener that can be used in a wide variety of products in which a sweet taste is desired. Compound 1 provides a low-calorie advantage over other sweeteners such as sucrose or fructose.
(92) In some embodiments, Compound 1 is in an isolated and purified form. In some embodiments, Compound 1 is present in a composition in which Compound 1 is substantially purified.
(93) In some embodiments, Compound 1 or a salt thereof is isolated and in solid form. In some embodiments, the solid form is amorphous. In some embodiments, the solid form is crystalline. In some embodiments, the compound is in the form of a lyophile. In some embodiments, Compound 1 is isolated and within a buffer.
(94) The skilled artisan will recognize that some structures described herein may be resonance forms or tautomers of compounds that may be fairly represented by other chemical structures, even when kinetically, the artisan recognizes that such structures may represent only a very small portion of a sample of such compound(s). Such compounds are considered within the scope of the structures depicted, though such resonance forms or tautomers are not represented herein.
(95) Isotopes may be present in Compound 1. Each chemical element as represented in a compound structure may include any isotope of said element. For example, in a compound structure a hydrogen atom may be explicitly disclosed or understood to be present in the compound. At any position of the compound that a hydrogen atom may be present, the hydrogen atom can be any isotope of hydrogen, including but not limited to hydrogen-1 (protium) and hydrogen-2 (deuterium). Thus, reference herein to a compound encompasses all potential isotopic forms unless the context clearly dictates otherwise. In some embodiments, compounds described herein are enriched in one or more isotopes relative to the natural prevalence of such isotopes. In some embodiments, the compounds described herein are enriched in deuterium. In some embodiments, greater than 0.0312% of hydrogen atoms in the compounds described herein are deuterium. In some embodiments, greater than 0.05%, 0.08%, or 0.1% of hydrogen atoms in the compounds described herein are deuterium.
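The deuterium thresholds recited above can be illustrated with a short calculation. The following Python sketch is illustrative only and is not part of the disclosed methods; the function names and the example atom counts are hypothetical assumptions introduced here, and only the threshold values (0.0312%, 0.05%, 0.08%, 0.1%) are taken from the text.

```python
# Illustrative sketch only: function names and example counts are hypothetical.
# Only the threshold percentages come from the description above.

THRESHOLDS_PERCENT = (0.0312, 0.05, 0.08, 0.1)


def deuterium_percent(n_deuterium: int, n_hydrogen_total: int) -> float:
    """Return the percentage of hydrogen atoms that are deuterium."""
    if n_hydrogen_total <= 0:
        raise ValueError("total hydrogen count must be positive")
    if not 0 <= n_deuterium <= n_hydrogen_total:
        raise ValueError("deuterium count must lie between 0 and the total")
    return 100.0 * n_deuterium / n_hydrogen_total


def thresholds_exceeded(percent: float) -> list:
    """List the recited thresholds that the given percentage is greater than."""
    return [t for t in THRESHOLDS_PERCENT if percent > t]


if __name__ == "__main__":
    # Hypothetical sample: 6 deuterium atoms per 10,000 hydrogen atoms (0.06%).
    sample = deuterium_percent(6, 10_000)
    print(sample, thresholds_exceeded(sample))
```

In this hypothetical sample, 0.06% of the hydrogen atoms are deuterium, which is greater than the 0.0312% and 0.05% thresholds but not greater than the 0.08% or 0.1% thresholds.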
(96) In some embodiments, Compound 1 is capable of forming acid and/or base salts by virtue of the presence of amino and/or carboxyl groups or groups similar thereto.
(97) In some embodiments, Compound 1 is substantially isolated. In some embodiments, Compound 1 is substantially purified. In some embodiments, the compound is in the form of a lyophile. In some embodiments, the compound is crystalline. In some embodiments, the compound is amorphous.
(98) Production Compositions
(99) In some embodiments, the production composition contains none, or less than a certain amount, of undesirable compounds. In some embodiments, the composition contains, or does not contain, one or more isomers of Mogroside I, Mogroside II, and Mogroside III. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of all isomers of Mogroside I, Mogroside II, and Mogroside III. In some embodiments, the composition contains, or does not contain, one or more of Mogroside IIIE, 11-oxo-Mogroside IIIE, Mogroside IIIA2, Mogroside IE, Mogroside IIE, and 11-oxo-mogrol. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of one or more of Mogroside IIIE, 11-oxo-Mogroside IIIE, Mogroside IIIA2, Mogroside IE, Mogroside IIE, and 11-oxo-mogrol. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of Mogroside IIIE. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of 11-oxo-Mogroside IIIE. In some embodiments, the composition contains a weight percent of less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm of 11-oxo-mogrol.
(100) In some embodiments, the production composition is in solid form, which may be crystalline or amorphous. In some embodiments, the composition is in particulate form. The solid form of the composition may be produced using any suitable technique, including but not limited to re-crystallization, filtration, solvent evaporation, grinding, milling, spray drying, spray agglomeration, fluid bed agglomeration, wet or dry granulation, and combinations thereof. In some embodiments, a flowable particulate composition is provided to facilitate use in further food manufacturing processes. In some such embodiments, a particle size between 50 μm and 300 μm, between 80 μm and 200 μm, or between 80 μm and 150 μm is generated.
(101) Some embodiments provide a production composition comprising Compound 1 that is in solution form. For example, in some embodiments a solution produced by one of the production processes described herein is used without further purification. In some embodiments, the concentration of Compound 1 in the solution is greater than 300 ppm, 500 ppm, 800 ppm, 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, or 20% by weight. In some embodiments, the concentration of all isomers of Mogroside I, Mogroside II, and Mogroside III is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, or 100 ppm. In some embodiments, the concentration of one or more of Mogroside IIIE, 11-oxo-Mogroside IIIE, Mogroside IIIA2, Mogroside IE, Mogroside IIE, and 11-oxo-mogrol in the production composition is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, 100 ppm, 50 ppm, 30 ppm, 20 ppm, 10 ppm, 5 ppm, 1 ppm, or 0.1 ppm. In some embodiments, the concentration of Mogroside IIIE is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, 100 ppm, 50 ppm, 30 ppm, 20 ppm, 10 ppm, 5 ppm, 1 ppm, or 0.1 ppm. In some embodiments, the concentration of 11-oxo-Mogroside IIIE is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, 100 ppm, 50 ppm, 30 ppm, 20 ppm, 10 ppm, 5 ppm, 1 ppm, or 0.1 ppm. In some embodiments, the concentration of 11-oxo-mogrol is less than 5%, 3%, 2%, 1%, 0.5%, 0.3%, 0.1%, 800 ppm, 500 ppm, 200 ppm, 100 ppm, 50 ppm, 30 ppm, 20 ppm, 10 ppm, 5 ppm, 1 ppm, or 0.1 ppm.
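Because the limits above mix weight percent and ppm, it can help to express them on a single scale. The following Python sketch is illustrative only; it assumes that ppm is expressed by weight (so that 1% by weight equals 10,000 ppm), and the function names and example masses are hypothetical and not taken from the disclosure.

```python
# Illustrative sketch only: assumes ppm by weight; names and masses are hypothetical.

PPM_PER_PERCENT = 10_000  # 1% by weight corresponds to 10,000 ppm by weight


def percent_to_ppm(percent: float) -> float:
    """Convert a weight percent to ppm by weight."""
    return percent * PPM_PER_PERCENT


def ppm_from_masses(component_mass_mg: float, total_mass_mg: float) -> float:
    """Compute the concentration of a component in ppm by weight."""
    if total_mass_mg <= 0:
        raise ValueError("total mass must be positive")
    return 1e6 * component_mass_mg / total_mass_mg


def is_below_limit(component_mass_mg: float, total_mass_mg: float,
                   limit_ppm: float) -> bool:
    """Return True if the component concentration is less than the limit."""
    return ppm_from_masses(component_mass_mg, total_mass_mg) < limit_ppm


if __name__ == "__main__":
    # Hypothetical composition: 3 mg of Mogroside IIIE in 10,000 mg total.
    print(ppm_from_masses(3, 10_000))       # concentration in ppm by weight
    print(is_below_limit(3, 10_000, 500))   # compared with a 500 ppm limit
    print(is_below_limit(3, 10_000, 200))   # compared with a 200 ppm limit
    print(percent_to_ppm(0.5))              # a 0.5% limit expressed in ppm
```

In this hypothetical composition the Mogroside IIIE concentration is 300 ppm, which is below a 500 ppm limit but not below a 200 ppm limit.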
(102) Methods of Producing Compound 1 and Intermediate Mogroside Compounds
(103) In some embodiments, Compound 1 is produced by contact of various starting and/or intermediate compounds with one or more enzymes. The contact can be in vivo (e.g., in a recombinant cell) or in vitro. The starting and intermediate compounds for producing Compound 1 can include, but are not limited to, Mogroside V, Mogroside IIE, Mogroside III.sub.E, Siamenoside I, Mogroside VI isomer, Mogroside II.sub.A, Mogroside IV.sub.E, or Mogroside IV.sub.A.
(104) In some embodiments, Compound 1 as disclosed herein is produced in recombinant host cells in vivo as described herein or by modification of these methods. Ways of modifying the methodology include, among others, temperature, solvent, reagents etc., known to those skilled in the art. The methods shown and described herein are illustrative only and are not intended, nor are they to be construed, to limit the scope of the claims in any manner whatsoever. Those skilled in the art will be able to recognize modifications of the disclosed methods and to devise alternate routes based on the disclosures herein; all such modifications and alternate routes are within the scope of the claims.
(105) In some embodiments, Compound 1 disclosed herein is obtained by purification and/or isolation from a recombinant bacterial cell, yeast cell, plant cell, or insect cell. In some embodiments, the recombinant cell is from Siraitia grosvenorii. In some such embodiments, an extract obtained from Siraitia grosvenorii may be fractionated using a suitable purification technique. In some embodiments, the extract is fractionated using HPLC and the appropriate fraction is collected to obtain the desired compound in isolated and purified form.
(106) In some embodiments, Compound 1 is produced by enzymatic modification of a compound isolated from Siraitia grosvenorii. For example, in some embodiments, Compound 1 isolated from Siraitia grosvenorii is contacted with one or more enzymes to obtain the desired compounds. The contact can be in vivo (e.g., in a recombinant cell) or in vitro. The starting and intermediate compounds for producing Compound 1 can include, but are not limited to, Mogroside V, Mogroside IIE, Mogroside III.sub.E, Siamenoside I, Mogroside VI isomer, Mogroside II.sub.A, Mogroside IV.sub.E, or Mogroside IV.sub.A. One or more of these compounds can be made in vivo. Enzymes suitable for use to generate compounds described herein can include, but are not limited to, a pectinase, a β-galactosidase (e.g., Aromase), a cellulase (e.g., Celluclast), a cyclomaltodextrin glucanotransferase (e.g., Toruzyme), an invertase, a glucosyltransferase (e.g., UGT76G1), a dextransucrase, a lactase, an arabanase, a xylanase, a hemicellulase, an amylase, or a combination thereof. In some embodiments, the enzyme is a Toruzyme comprising an amino acid sequence set forth in any one of SEQ ID NOs: 89-94.
(107) Some embodiments provide a method of making Compound 1,
(108) ##STR00012##
wherein the method comprises fractionating an extract of Siraitia grosvenorii on an HPLC column and collecting an eluted fraction comprising Compound 1.
(109) Some embodiments provide a method of making Compound 1,
(110) ##STR00013##
wherein the method comprises treating Mogroside III.sub.E with the glucose transferase enzyme UGT76G1. In some embodiments, UGT76G1 is encoded by a sequence set forth in SEQ ID NO: 440. In some embodiments, UGT76G1 comprises an amino acid sequence set forth in SEQ ID NO: 439.
(111) Various mogroside compounds can be used as intermediate compounds for producing Compound 1. A non-limiting example of such mogroside compounds is Compound 3 having the structure of:
(112) ##STR00014##
In some embodiments, a method for producing Compound 3 comprises contacting mogroside III.sub.E with a cell (e.g., a recombinant host cell) that expresses one or more cyclomaltodextrin glucanotransferases. In some embodiments, the cyclomaltodextrin glucanotransferase comprises an amino acid sequence set forth in SEQ ID NO: 95.
(113) Various mogroside compounds can be used as intermediate compounds for producing Compound 1. One non-limiting example of such mogroside compounds is Compound 12 having the structure of:
(114) ##STR00015##
In some embodiments, a method for producing Compound 12 comprises contacting mogroside VI with a cell (e.g., a recombinant host cell) that expresses one or more invertases.
(115) Various mogroside compounds can be used as intermediate compounds for producing Compound 1. One non-limiting example of such mogroside compounds is Compound 5 having the structure of:
(116) ##STR00016##
(117) In some embodiments, a method for producing Compound 5 comprises contacting mogroside III.sub.E with a cell (e.g., a recombinant host cell) that expresses one or more cyclomaltodextrin glucanotransferases. In some embodiments, the method is performed in the presence of starch.
(118) Various mogroside compounds can be used as intermediate compounds for producing Compound 1. One non-limiting example of such mogroside compounds is Compound 4 having the structure of:
(119) ##STR00017##
(120) In some embodiments, a method for producing Compound 4 comprises contacting mogroside III.sub.E with a cell (e.g., a recombinant host cell) that expresses one or more cyclomaltodextrin glucanotransferases. In some embodiments, the method is performed in the presence of starch.
(121) Hydrolysis of Hyper-Glycosylated Mogrosides
(122) In some embodiments, one or more hyper-glycosylated mogrosides are hydrolyzed to Mogroside III.sub.E by contact with one or more hydrolase enzymes. In some embodiments, the hyper-glycosylated mogrosides are selected from a mogroside IV, a mogroside V, a mogroside VI, and combinations thereof. In some embodiments, the hyper-glycosylated mogrosides are selected from Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside IV.sub.E, and combinations thereof.
(123) It has been surprisingly discovered that Compound 1 displays tolerance to hydrolysis by certain hydrolyzing enzymes, even though such enzymes display capabilities of hydrolyzing hyper-glycosylated mogrosides to Mogroside IIIE. The alpha-linked glycoside present in Compound 1 provides a unique advantage over other mogrosides (e.g., beta-linked glycosides) due to its tolerance to hydrolysis. In some embodiments, during microbial production of Compound 1, the microbial host will hydrolyze unwanted beta-linked mogrosides back to Mogroside IIIE. This will improve the purity of Compound 1 due to the following: 1) Reduction of unwanted Mogroside VI, Mogroside V, and Mogroside IV levels, 2) The hydrolysis will increase the amount of Mogroside IIIE available to be used as a precursor for production of Compound 1.
(124)
(125) In some embodiments, the hydrolase is a β-glucan hydrolase. In some embodiments, the hydrolase is EXG1. The EXG1 protein can comprise an amino acid sequence having at least 70%, 80%, 90%, 95%, 98%, 99%, or more sequence identity to SEQ ID NO: 1013 or 1014. In some embodiments, the EXG1 protein comprises, or consists of, an amino acid sequence set forth in SEQ ID NO: 1013 or 1014. In some embodiments, the hydrolase is EXG2. The EXG2 protein can comprise an amino acid sequence having at least 70%, 80%, 90%, 95%, 98%, 99%, or more sequence identity to SEQ ID NO: 1023. In some embodiments, the EXG2 protein comprises, or consists of, an amino acid sequence set forth in SEQ ID NO: 1023. The hydrolase can be, for example, any one of the hydrolases disclosed herein.
(126) Production of Compound 1 from Mogroside IIIE
(127) Compound 1 can be produced from Mogroside IIIE by contact with one or more enzymes capable of converting Mogroside IIIE to Compound 1. In some embodiments, the enzyme capable of catalyzing production of Compound 1 is one or more of UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(128) In some embodiments, the enzyme capable of catalyzing the production of Compound 1 is a CGTase. In some embodiments, the CGTase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the CGTase comprises, or consists of, the amino acid sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the enzyme capable of catalyzing the production of Compound 1 is a dextransucrase. In some embodiments, the dextransucrase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the dextransucrase comprises, or consists of, an amino acid sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the dextransucrase is encoded by a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 104, 105, 157, 158, and 895. In some embodiments, the dextransucrase is encoded by a nucleic acid sequence comprising, or consisting of, any one of SEQ ID NOs: 104, 105, 157, 158, and 895. In some embodiments, the enzyme capable of catalyzing the production of Compound 1 is a transglucosidase. In some embodiments, the transglucosidase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the transglucosidase comprises, or consists of, an amino acid sequence of any one of SEQ ID NOs: 163-292 and 723. 
The percent sequence identity can be determined with ClustalW software or by BLAST searches (ncbi.nih.gov). These programs can be used to determine conservation between protein homologues.
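As a rough illustration of what a percent sequence identity figure means, the following Python sketch counts identical residues across two sequences that have already been aligned (with "-" marking gaps). It is a minimal sketch under stated assumptions and not the procedure used by ClustalW or BLAST, which perform the alignment themselves and may use a different denominator. The function name, the gap symbol, and the toy peptide strings are hypothetical and do not correspond to any SEQ ID NO herein.

```python
# Illustrative sketch only: computes identity over pre-aligned sequences.
# The toy peptides below are hypothetical and are not sequences from this document.


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns (excluding gap-gap columns) that are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no alignable columns")
    # A column matches only if both residues are equal and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    reference = "MKTAYIAKQR"   # hypothetical reference peptide
    candidate = "MKTAYLAKQR"   # one substitution relative to the reference
    identity = percent_identity(reference, candidate)
    print(identity, identity >= 90.0)  # compare with an "at least 90%" criterion
```

Here nine of ten aligned positions are identical, so the toy candidate would satisfy an "at least 90%" criterion under this particular definition. Other conventions, such as dividing by the length of the shorter sequence or by the reference length, can give different values for the same pair.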
(129) In some embodiments, the enzyme capable of catalyzing the production of Compound 1 is a uridine diphosphate-glucosyl transferase (UGT). The UGT can comprise, for example, an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 4-9, 15-19, 125, 126, 128, 129, 293-307, 407, 409, 411, 413, 439, 441, and 444. In some embodiments, the UGT comprises, or consists of, the amino acid sequence of any one of SEQ ID NOs: 4-9, 10-14, 125, 126, 128, 129, 293-304, 306, 407, 409, 411, 413, 439, 441, and 444. In some embodiments, the UGT is encoded by a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), UGT10391 (SEQ ID NO: 14), and SEQ ID NOs: 116-124, 127, 130, 408, 410, 412, 414, 440, 442, 443, and 445. In some embodiments, the UGT is encoded by a nucleic acid sequence comprising, or consisting of, any one of the nucleic acid sequences of UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), UGT10391 (SEQ ID NO: 14), and SEQ ID NOs: 116-124, 127, 130, 408, 410, 412, 414, 440, 442, 443, and 445. In some embodiments, the enzyme can be UGT98 or UGT SK98. For example, as described herein, a recombinant host cell capable of producing Compound 1 can comprise a third gene encoding UGT98 and/or UGT SK98. In some embodiments, the UGT98 or UGT SK98 comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 9 or 16. 
In some embodiments, the UGT comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5), UGT85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), and UGT11789 (SEQ ID NO: 19). In some embodiments, the UGT comprises, or consists of, an amino acid sequence of any one of UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5), UGT85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), and UGT11789 (SEQ ID NO: 19). In some embodiments, the UGT is encoded by a nucleic acid sequence having at least 70%, 80%, 90%, 95%, 98%, 99% or more sequence identity to UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13) or UGT10391 (SEQ ID NO: 14). In some embodiments, the UGT is encoded by a nucleic acid sequence comprising, or consisting of, any one of the sequences of UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), and UGT10391 (SEQ ID NO: 14). As disclosed herein, the enzyme capable of catalyzing the production of Compound 1 can comprise an amino acid sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to any one of the UGT enzymes disclosed herein. Furthermore, a recombinant host cell capable of producing Compound 1 can comprise an enzyme comprising, or consisting of, a sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to any one of the UGT enzymes disclosed herein. 
In some embodiments, the recombinant host cell comprises an enzyme comprising, or consisting of a sequence of any one of the UGT enzymes disclosed herein.
(130) In some embodiments, the method of producing Compound 1 comprises treating Mogroside III.sub.E with the glucose transferase enzyme UGT76G1, for example the UGT76G1 of SEQ ID NO: 439 and the UGT76G1 encoded by the nucleic acid sequence of SEQ ID NO: 440.
(131) Enzymes for the Production of Mogroside Compounds and Compound 1
(132) As described herein, the UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases can comprise the amino acid sequences described in the table of sequences herein and can also be encoded by the nucleic acid sequences described in the table of sequences. Additionally, the enzymes can also include functional homologues with at least 70% sequence identity to the amino acid sequences described in the table of sequences. Percent sequence identity can be determined with ClustalW software or by BLAST searches (ncbi.nih.gov). These programs can be used to determine conservation between protein homologues.
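The percent-identity thresholds recited above can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned (for example by ClustalW or BLAST) and are supplied as equal-length strings with "-" marking gaps. The function names and the example peptides are hypothetical and are not taken from the table of sequences. The denominator used here (aligned columns, excluding columns where both sequences have a gap) is one common convention; alignment programs may report identity over a different length.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences carry a gap are skipped; a gap opposite
    a residue counts as a compared column that does not match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """True if the pair reaches the stated percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned peptides: 7 identical columns out of 9 compared.
query = "MKT-AYIAK"
reference = "MKTLAYLAK"
print(round(percent_identity(query, reference), 2))
print(meets_threshold(query, reference, 70.0))
```

Under this convention the hypothetical pair would qualify as a homologue at the 70% level but not at the 80% level.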
(133) In some embodiments, the transglucosidase comprises an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the CGTase comprises, or consists of, an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 1, 3, 78-101, and 154. In some embodiments, the transglucosidase comprises an amino acid sequence, or is encoded by a nucleic acid sequence, having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 163-290 and 723.
(134) The methods herein also include incorporating genes into the recombinant cells for producing intermediates such as pyruvate, acetyl-CoA, citrate, and other TCA (citric acid cycle) intermediates. The intermediates can be further used to produce mogroside compounds for producing Compound 1. Methods for increasing squalene content are described in Gruchattka et al. and Rodriguez et al. (PLoS One. 2015 Dec. 23; 10(12; Microb Cell Fact. 2016 Mar. 3; 15:48; incorporated by reference in their entireties herein).
(135) Expression of enzymes to produce oxidosqualene and diepoxysqualene is further contemplated. Enzymes for producing oxidosqualene and diepoxysqualene can be used together with squalene synthase and/or squalene epoxidase to boost squalene synthesis. For example, Su et al. describe the gene encoding SgSQS, a 417 amino acid protein from Siraitia grosvenorii for squalene synthase (Biotechnol Lett. 2017 Mar. 28; incorporated by reference in its entirety herein). Genetically engineering the recombinant cell for expression of HMG CoA reductase is also useful for squalene synthesis (Appl Environ Microbiol. 1997 September; 63(9):3341-4.; Front Plant Sci. 2011 Jun. 30; 2:25; FEBS J. 2008 April; 275(8):1852-9.; all incorporated by reference in their entireties herein). In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme comprising a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 898 or 900. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme encoded by a nucleic acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 897 or 899.
(136) Expression of enzymes to produce cucurbitadienol/epoxycucurbitadienol is also contemplated. Examples of cucurbitadienol synthases from C. pepo, S. grosvenorii, C. sativus, C. melo, C. moschata, and C. maxima are contemplated for engineering into the recombinant cells by a vector for expression. Oxidosqualene cyclases for triterpene biosynthesis are also contemplated for expression in the recombinant cell, which would lead to the cyclization of an acyclic substrate into various polycyclic triterpenes which can also be used as intermediates for the production of Compound 1 (Org Biomol Chem. 2015 Jul. 14; 13(26):7331-6; incorporated by reference in its entirety herein).
(137) Expression of enzymes that display epoxide hydrolase activities to make hydroxy-cucurbitadienols is also contemplated. In some embodiments herein, the recombinant cells for the production of Compound 1 further comprise genes that encode enzymes that display epoxide hydrolase activities to make hydroxy-cucurbitadienols. Such enzymes are provided in Itkin et al., which is incorporated by reference in its entirety herein. The enzymes described in Itkin et al. are provided in Table 1, table of sequences, provided herein. Itkin et al. also describes enzymes for making key mogrosides, UGS families, glycosyltransferases and hydrolases that can be genetically modified for reverse reactions such as glycosylations.
(138) The expression of enzymes in recombinant cells that hydrolyze mogroside compounds to produce mogrol is also contemplated. These enzymes can include proteins of the CAZY family, UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, transglucosidases, pectinases, dextranases, and yeast and fungal hydrolyzing enzymes. Such enzymes can be used, for example, for hydrolyzing Mogroside V to Mogroside IIIE, in which Mogroside IIIE can be further processed to produce Compound 1, for example in vivo. In some embodiments, fungal lactases comprise an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 678-722.
(139) In some embodiments, a mogrol precursor such as squalene or oxidosqualene, mogrol or mogroside is produced. The mogrol precursor can be used as a precursor in the production of Compound 1. Squalene can be produced from farnesyl pyrophosphate using a squalene synthase, and oxidosqualene can be produced from squalene using a squalene epoxidase. The squalene synthase can be, for example, squalene synthase from Gynostemma pentaphyllum (protein accession number C4P9M2), a Cucurbitaceae family plant. The squalene synthase can also comprise a squalene synthase from Arabidopsis thaliana (protein accession number C4P9M3), Brassica napus, Citrus macrophylla, Euphorbia tirucalli (protein accession number B9WZW7), Glycine max, Glycyrrhiza glabra (protein accession number Q42760, Q42761), Glycyrrhiza uralensis (protein accession number D6QX40, D6QX41, D6QX42, D6QX43, D6QX44, D6QX45, D6QX47, D6QX39, D6QX55, D6QX38, D6QX53, D6QX37, D6QX35, B5AID5, B5AID4, B5AID3, C7EDDO, C6KE07, C6KE08, C7EDC9), Lotus japonicus (protein accession number Q84LE3), Medicago truncatula (protein accession number Q8GSL6), Pisum sativum, or Ricinus communis (protein accession number B9RHC3). Various squalene synthases have been described in WO 2016/050890, the content of which is incorporated herein by reference in its entirety.
(140) Recombinant Host Cells
(141) Any one of the enzymes disclosed herein can be produced in vitro, ex vivo, or in vivo. For example, a nucleic acid sequence encoding the enzyme (including but not limited to any one of UGTs, CGTases, glycotransferases, dextransucrases, cellulases, beta-glucosidases, amylases, transglucosidases, pectinases, dextranases, cytochrome P450, epoxide hydrolases, cucurbitadienol synthases, squalene epoxidases, squalene synthases, hydrolases, and oxidosqualene cyclases) can be introduced into a host recombinant cell, for example in the form of an expression vector containing the coding nucleic acid sequence, in vivo. The expression vectors can be introduced into the host cell by, for example, standard transformation techniques (e.g., heat transformation) or by transfection. The expression systems can produce the enzymes for mogroside and Compound 1 production, in order to produce Compound 1 in the cell in vivo. Useful expression systems include, but are not limited to, bacterial, yeast and insect cell systems. For example, insect cell systems can be infected with a recombinant virus expression system for expression of the enzymes of interest. In some embodiments, the genes are codon optimized for expression in a particular cell. In some embodiments, the genes are operably linked to a promoter to drive transcription and translation of the enzyme protein. As described herein, a codon optimized sequence can be obtained, and the optimized sequence can then be engineered into a vector for transforming a recombinant host cell.
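The codon optimization mentioned above can be sketched, under stated assumptions, as a simple back-translation in which each amino acid is written with a single preferred codon. The PREFERRED_CODON table below is purely illustrative and hypothetical; it is not the codon usage of any particular host and is not taken from this disclosure. Practical codon optimization typically also weighs codon frequencies, GC content, and secondary structure, none of which is modeled here. The standard genetic code table is used only to confirm that the optimized DNA still encodes the same protein.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preferred-codon table (one codon per amino acid); illustrative only.
PREFERRED_CODON = {
    "A": "GCT", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATC",
    "L": "TTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAC", "V": "GTT",
}
STOP_CODON = "TAA"

# Sanity check: every preferred codon must encode its amino acid.
assert all(STANDARD_CODE[c] == aa for aa, c in PREFERRED_CODON.items())


def back_translate(protein: str) -> str:
    """Write a protein as DNA using one preferred codon per residue, plus a stop."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper()) + STOP_CODON


def translate(dna: str) -> str:
    """Translate DNA with the standard code, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = STANDARD_CODE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


peptide = "MKTAYIAK"  # hypothetical peptide, not a sequence from this disclosure
optimized = back_translate(peptide)
print(optimized)
print(translate(optimized) == peptide)
```

The round-trip check reflects the point that codon optimization changes the nucleic acid sequence without changing the encoded amino acid sequence.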
(142) Expression vectors can further comprise transcription or translation regulatory sequences, coding sequences for transcription or translation factors, or various promoters (e.g., GPD1 promoters) and/or enhancers, to promote transcription of a gene of interest in yeast cells.
(143) The recombinant cells as described herein are, in some embodiments, genetically modified to produce Compound 1 in vivo. Additionally, a cell can be fed a mogrol precursor or mogroside precursor during cell growth or after cell growth to boost rate of the production of a particular intermediate for the pathway for producing Compound 1 in vivo. The cell can be in suspension or immobilized. The cell can be in fermentation broth or in a reaction buffer. In some embodiments, a permeabilizing agent is used for transfer of a mogrol precursor or mogroside precursor into a cell. In some embodiments, a mogrol precursor or mogroside precursor can be provided in a purified form or as part of a composition or an extract.
(144) The recombinant host cell can be, for example, a plant, bivalve, fish, fungus, bacteria or mammalian cell. For example, the plant can be selected from Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. The fungus can be selected from Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, Metarhizium, Aspergillus, Yarrowia, and Lipomyces. In some embodiments, the fungus is Aspergillus nidulans, Yarrowia lipolytica, or Rhodosporidium toruloides. In some embodiments, the recombinant host cell is a yeast cell. In some embodiments, the yeast is selected from Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, Rhodosporidium, Yarrowia, and Microbotryomycetes. In some embodiments, the bacteria is selected from Frankia, Actinobacteria, Streptomyces, and Enterococcus. In some embodiments, the bacteria is Enterococcus faecalis.
(145) In some embodiments, the recombinant genes are codon optimized for expression in a bacterial, mammalian, plant, fungal or insect cell. In some embodiments, one or more of the genes comprises a functional mutation to increase activity of the encoded enzyme. In some embodiments, cultivating the recombinant host cell comprises monitoring the cultivating conditions for pH, dissolved oxygen level, nitrogen level, or a combination thereof.
(146) Producing Mogrol from Squalene
(147) Some embodiments of the method of producing Compound 1 comprise producing an intermediate for use in the production of Compound 1. The compound having the structure of:
(148) ##STR00018##
is produced in vivo in a recombinant host. In some embodiments, the compound is in the recombinant host cell, is secreted into the medium in which the recombinant cell is growing, or both. In some embodiments, the recombinant cell further produces intermediates such as mogroside compounds in vivo. The recombinant cell can be grown in a culture medium, under conditions in which the genes disclosed herein are expressed. Some embodiments of methods of growing the cell are described herein.
(149) In some embodiments, the intermediate is, or comprises, at least one of squalene, oxidosqualene, cucurbitadienol, mogrol and mogrosides. In some embodiments, the mogroside is Mogroside IIE. As described herein, mogrosides are a family of glycosides that can be naturally isolated from a plant or a fruit, for example. As contemplated herein, the mogrosides can be produced by a recombinant host cell.
(150) In some alternatives of the methods described herein, the recombinant host cell comprises a polynucleotide or a sequence comprising one or more of the following:
(151) a gene encoding squalene epoxidase;
(152) a gene encoding cucurbitadienol synthase;
(153) a gene encoding cytochrome P450;
(154) a gene encoding cytochrome P450 reductase; and
(155) a gene encoding epoxide hydrolase.
(156) In some embodiments, the squalene epoxidase comprises a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 54. In some embodiments, the squalene epoxidase comprises a sequence from Arabidopsis thaliana (the protein accession numbers: Q9SM02, 065403, 065402, 065404, 081000, or Q9T064), Brassica napus (protein accession number 065727, 065726), Euphorbia tirucalli (protein accession number A7VJN1), Medicago truncatula (protein accession number Q8GSM8, Q8GSM9), Pisum sativum, or Ricinus communis (protein accession number B9R6VO, B9S7W5, B9S6Y2, B9TOY3, B9S7TO, B9SX91), or a functional homologue of any of the aforementioned sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith. In some embodiments, the squalene epoxidase comprises, or consists of, an amino acid sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 50-56, 60, 61, 334 or 335.
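Protein accession numbers such as those recited above can be checked for well-formedness before being used to retrieve sequences. The sketch below assumes the accessions are UniProtKB accessions and uses the accession pattern published by UniProt. The helper names are hypothetical. A failed check indicates only that a string does not have the expected format, as can happen when the letter O and the digit 0 are confused in transcription. The check does not establish which character was intended or whether a well-formed accession exists in the database.

```python
import re

# UniProtKB accession format as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_wellformed_accession(accession: str) -> bool:
    """True if the whole string matches the UniProtKB accession pattern."""
    return UNIPROT_ACCESSION.fullmatch(accession) is not None


def flag_malformed(accessions):
    """Return the accessions that do not match the expected format."""
    return [a for a in accessions if not is_wellformed_accession(a)]


# Strings as printed in the paragraph above.
printed = ["Q9SM02", "065403", "Q9T064", "A7VJN1", "B9R6VO", "B9S7W5"]
print(flag_malformed(printed))
```

Strings flagged in this way would warrant a manual check against the original sequence listing before any lookup is attempted.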
(157) In some embodiments, the cell comprises genes encoding ERG7 (lanosterol synthase). In some embodiments, lanosterol synthase comprises a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 111. In some embodiments, the P450 polypeptide is encoded by genes comprising a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48. In some embodiments, the sequences can be separated by ribosome skip sequences to produce separated proteins.
(158) In some embodiments, the recombinant host cell comprises a gene encoding a polypeptide having cucurbitadienol synthase activity. In some embodiments, the polypeptide having cucurbitadienol synthase activity comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the polypeptide having cucurbitadienol synthase activity comprises a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NO: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904 and 906. In some embodiments, the polypeptide having cucurbitadienol synthase activity comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the polypeptide having cucurbitadienol synthase activity is codon optimized. In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74.
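The sequence identifier lists recited above mix single numbers with ranges. A minimal sketch of expanding such a list into individual identifiers is shown below; the function name is hypothetical, and the sketch assumes that identifiers are positive integers separated by commas and/or the word "and", with ranges written as "low-high".

```python
def expand_seq_ids(spec: str) -> list[int]:
    """Expand a list such as '70-73, 75-77, 319' into individual integers."""
    ids = []
    for part in spec.replace(" and ", ",").split(","):
        part = part.strip()
        if not part:
            continue  # empty piece left behind by ', and'
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"descending range: {part}")
            ids.extend(range(low, high + 1))
        else:
            ids.append(int(part))
    return ids


synthase_ids = expand_seq_ids(
    "70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, "
    "902, 904, and 906"
)
print(len(synthase_ids))
print(74 in synthase_ids)
```

An expanded list makes it straightforward to test whether a given identifier, for example SEQ ID NO: 74, falls inside or outside a recited group.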
(159) In some embodiments, the polypeptide having cucurbitadienol synthase activity is a fusion polypeptide comprising a fusion domain fused to a cucurbitadienol synthase. The fusion domain can be fused to, for example, N-terminus or C-terminus of a cucurbitadienol synthase. The fusion domain can be located, for example, at the N-terminal region or the C-terminal region of the fusion polypeptide. The length of the fusion domain can vary. For example, the fusion domain can be, or be about, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 25, 30, 40, 50, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, or a range between any two of these numbers amino acids long. In some embodiments, the fusion domain is 3 to 1000 amino acids long. In some embodiments, the fusion domain is 5 to 50 amino acids long. In some embodiments, the fusion domain comprises a substantial portion or the entire sequence of a functional protein. In some embodiments, the fusion domain comprises a portion or the entire sequence of a yeast protein. For example, the fusion polypeptide having cucurbitadienol synthase activity can comprise an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 851, 854, 856, 1024, 859, 862, 865, 867, 915, 920, 924, 928, 932, 936, 940, 944, 948, 952, 956, 959, 964, 967, 971, 975, 979, 983, 987, 991, 995, 999, 1003, 1007, and 1011. In some embodiments, the fusion polypeptide comprises, or consists of, an amino acid sequence set forth in any one of SEQ ID NOs: 851, 854, 856, 1024, 859, 862, 865, 867, 915, 920, 924, 928, 932, 936, 940, 944, 948, 952, 956, 959, 964, 967, 971, 975, 979, 983, 987, 991, 995, 999, 1003, 1007, and 1011. 
In some embodiments, the fusion domain of the fusion polypeptide comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 866, 870, 917, 921, 925, 929, 933, 937, 941, 945, 949, 953, 957, 961, 968, 972, 976, 980, 984, 988, 992, 996, 1000, 1004, 1008, and 1012. In some embodiments, the fusion domain of the fusion polypeptide comprises, or consists of, an amino acid sequence set forth in any one of SEQ ID NOs: 866, 870, 917, 921, 925, 929, 933, 937, 941, 945, 949, 953, 957, 961, 968, 972, 976, 980, 984, 988, 992, 996, 1000, 1004, 1008, and 1012. In some embodiments, the cucurbitadienol synthase fused with the fusion domain comprises an amino acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327, 329-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase fused with the fusion domain comprises, or consists of, an amino acid sequence set forth in any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327, 329-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase fused with the fusion domain is encoded by a gene comprising a nucleic acid sequence having, or having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905. In some embodiments, the cucurbitadienol synthase fused with the fusion domain is encoded by a gene comprising, or consists of, a nucleic acid sequence set forth in any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905. Disclosed herein include a recombinant nucleic acid molecule comprising a nucleic acid sequence encoding a fusion polypeptide having cucurbitadienol synthase activity. 
Also disclosed include a recombinant cell comprising a fusion polypeptide having cucurbitadienol synthase activity or a recombinant nucleic acid molecule encoding the fusion polypeptide.
(160) The fusion polypeptides having cucurbitadienol synthase activity disclosed herein can be used to catalyze enzymatic reactions as cucurbitadienol synthases. For example, a substrate for cucurbitadienol synthase can be contacted with one or more of these fusion polypeptides to produce reaction products. Non-limiting examples of the reaction product include cucurbitadienol, 24,25-epoxy cucurbitadienol, and any combination thereof. Non-limiting examples of the substrate for cucurbitadienol synthase include 2,3-oxidosqualene, dioxidosqualene, diepoxysqualene, and any combination thereof. In some embodiments, the substrate can be contacted with a recombinant host cell which comprises a nucleic acid sequence encoding one or more fusion polypeptides having cucurbitadienol synthase activity. The substrate can be provided to the recombinant host cells, present in the recombinant host cell, produced by the recombinant host cell, or any combination thereof.
(161) In some embodiments, the cytochrome P450 is a CYP5491. In some embodiments, the cytochrome P450 comprises an amino acid sequence having, or having at least, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence set forth in SEQ ID NO: 44 and/or SEQ ID NO: 74. In some embodiments, the P450 reductase polypeptide comprises an amino acid sequence having, or having at least, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 46. In some embodiments, the P450 polypeptide is encoded by a gene comprising a sequence having, or having at least, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, 100% or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891.
(162) In some embodiments, the epoxide hydrolase comprises a sequence having, or having at least, 70%, 80%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 38 or 40. In some embodiments, the epoxide hydrolase comprises, or consists of, the sequence set forth in SEQ ID NO: 38 or 40.
(163) Some Methods of Producing Squalene for Mogrol Production
(164) Squalene is a natural 30-carbon organic molecule that can be produced in plants and animals and is a biochemical precursor to the family of steroids. Additionally, squalene can be used as a precursor in mogrol synthesis in vivo in a host recombinant cell. Oxidation (via squalene monooxygenase) of one of the terminal double bonds of squalene yields 2,3-squalene oxide, which undergoes enzyme-catalyzed cyclization to afford lanosterol, which is then elaborated into cholesterol and other steroids. As described in Gruchattka et al. (In Vivo Validation of In Silico Predicted Metabolic Engineering Strategies in Yeast: Disruption of α-Ketoglutarate Dehydrogenase and Expression of ATP-Citrate Lyase for Terpenoid Production. PLOS ONE Dec. 23, 2015; incorporated by reference in its entirety herein), synthesis of squalene can begin from precursors supplied by glycolysis. Squalene production in turn can be increased by the overexpression of ATP-citrate lyase. Some embodiments disclosed herein include enzymes for producing squalene and/or boosting the production of squalene in recombinant host cells, for example recombinant yeast cells. ATP citrate lyase can also mediate acetyl CoA synthesis, which can be used for squalene and mevalonate production, as was seen in the yeast S. cerevisiae (Rodriguez et al. ATP citrate lyase mediated cytosolic acetyl-CoA biosynthesis increases mevalonate production in Saccharomyces cerevisiae. Microb Cell Fact. 2016; 15: 48; incorporated by reference in its entirety). One example of the gene encoding an enzyme for mediating the acetyl CoA synthesis is set forth in SEQ ID NO: 130. In some embodiments herein, the recombinant cell comprises sequences for mediating acetyl CoA synthesis.
(165) Some embodiments disclosed herein provide methods for producing Compound 1 having the structure of:
(166) ##STR00019##
(167) In some embodiments, the methods further comprise producing intermediates in the pathway for the production of Compound 1 in vivo. In some embodiments, the recombinant host cell that produces Compound 1 comprises at least one enzyme capable of converting dioxidosqualene to 24,25 epoxy cucurbitadienol, converting oxidosqualene to cucurbitadienol, catalyzing the hydroxylation of 24,25 epoxy cucurbitadienol to 11-hydroxy-24,25 epoxy cucurbitadienol, catalyzing the hydroxylation of cucurbitadienol to 11-hydroxy-cucurbitadienol, catalyzing the epoxidation of cucurbitadienol to 24,25 epoxy cucurbitadienol, catalyzing the epoxidation of 11-hydroxy-cucurbitadienol to produce 11-hydroxy-24,25 epoxy cucurbitadienol, catalyzing the conversion of 11-hydroxy-cucurbitadienol to 11-hydroxy-24,25 epoxy cucurbitadienol, catalyzing the conversion of 11-hydroxy-24,25 epoxy cucurbitadienol to produce mogrol, and/or catalyzing the glycosylation of a mogroside precursor to produce a mogroside compound. In some embodiments, the enzyme for glycosylation is encoded by a sequence set forth in any one of SEQ ID NOs: 121, 122, 123, and 124.
(168) In some embodiments, the enzyme for catalyzing the hydroxylation of 24,25 epoxy cucurbitadienol to form 11-hydroxy-24,25 epoxy cucurbitadienol is CYP5491. In some embodiments, the CYP5491 comprises a sequence set forth in SEQ ID NO: 49. In some embodiments, the squalene epoxidase comprises a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of SEQ ID NO: 54.
(169) In some embodiments, the enzyme capable of epoxidation of 11-hydroxycucurbitadienol comprises an amino acid sequence set forth in SEQ ID NO: 74.
(170) In some alternatives, the recombinant cell comprises genes for expression of enzymes capable of converting dioxidosqualene to 24,25 epoxy cucurbitadienol, converting oxidosqualene to cucurbitadienol, hydroxylation of 24,25 epoxy cucurbitadienol to 11-hydroxy-24,25 epoxy cucurbitadienol, hydroxylation of cucurbitadienol to produce 11-hydroxy-cucurbitadienol, epoxidation of cucurbitadienol to produce 24,25 epoxy cucurbitadienol, and/or epoxidation of 11-hydroxycucurbitadienol to produce 11-hydroxy-24,25 epoxy cucurbitadienol. In these embodiments herein, the intermediates and mogrosides are produced in vivo.
(171) In some embodiments, a method of producing Compound 1 further comprises producing one or more of mogroside compounds and intermediates, such as oxidosqualene, dioxidosqualene, cucurbitadienol, 24,25 epoxy cucurbitadienol, 11-hydroxy-cucurbitadienol, 11-hydroxy 24,25 epoxy cucurbitadienol, mogrol, and mogroside compounds.
(172) Methods for the Production of Mogroside Compounds
(173) Described herein include methods of producing a mogroside compound, for example, one of the mogroside compounds described in WO2014086842 (incorporated by reference in its entirety herein). The mogroside compound can be used as an intermediate by a cell to further produce Compound 1 disclosed herein.
(174) Recombinant hosts such as microorganisms, plant cells, or plants can be used to express polypeptides useful for the biosynthesis of mogrol (the triterpene core) and various mogrol glycosides (mogrosides).
(175) In some embodiments, the production method can comprise one or more of the following steps in any order:
(176) (1) enhancing levels of oxido-squalene
(177) (2) enhancing levels of dioxido-squalene
(178) (3) Oxido-squalene → cucurbitadienol
(179) (4) Dioxido-squalene → 24,25 epoxy cucurbitadienol
(180) (5) Cucurbitadienol → 11-hydroxy-cucurbitadienol
(181) (6) 24,25 epoxy cucurbitadienol → 11-hydroxy-24,25 epoxy cucurbitadienol
(182) (7) 11-hydroxy-cucurbitadienol → mogrol
(183) (8) 11-hydroxy-24,25 epoxy cucurbitadienol → mogrol
(184) (9) mogrol → various mogroside compounds.
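Conversion steps (3) through (9) listed above can be sketched as a small directed graph, which makes the two parallel routes to mogrol explicit. The data structure and the function name are hypothetical conveniences. Only the conversions named in the list are encoded; steps (1) and (2), which enhance precursor levels rather than convert one compound to another, are not represented.

```python
# Each key is a substrate; each value lists the products named in steps (3)-(9).
PATHWAY = {
    "oxido-squalene": ["cucurbitadienol"],
    "dioxido-squalene": ["24,25 epoxy cucurbitadienol"],
    "cucurbitadienol": ["11-hydroxy-cucurbitadienol"],
    "24,25 epoxy cucurbitadienol": ["11-hydroxy-24,25 epoxy cucurbitadienol"],
    "11-hydroxy-cucurbitadienol": ["mogrol"],
    "11-hydroxy-24,25 epoxy cucurbitadienol": ["mogrol"],
    "mogrol": ["mogroside compounds"],
}


def routes(start, goal, graph=PATHWAY):
    """Return every route from start to goal as a list of compound names.

    Depth-first enumeration; it terminates because the listed steps
    contain no cycles.
    """
    if start == goal:
        return [[goal]]
    found = []
    for product_name in graph.get(start, []):
        for tail in routes(product_name, goal, graph):
            found.append([start] + tail)
    return found


for precursor in ("oxido-squalene", "dioxido-squalene"):
    for route in routes(precursor, "mogrol"):
        print(" -> ".join(route))
```

In this representation each precursor reaches mogrol by its own three-step route, and both routes converge at mogrol before glycosylation to mogroside compounds.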
(185) In the embodiments herein, the oxido-squalene, dioxido-squalene, cucurbitadienol, 24,25 epoxy cucurbitadienol or mogrol may be also produced by the recombinant cell. The method can include growing the recombinant microorganism in a culture medium under conditions in which one or more of the enzymes catalyzing step(s) of the methods of the invention, e.g. synthases, hydrolases, CYP450s and/or UGTs are expressed. The recombinant microorganism may be grown in a fed batch or continuous process. Typically, the recombinant microorganism is grown in a fermenter at a defined temperature(s) for a desired period of time in order to increase the yield of Compound 1.
(186) In some embodiments, mogroside compounds can be produced using whole cells that are fed raw materials that contain precursor molecules to increase the yield of Compound 1. The raw materials may be fed during cell growth or after cell growth. The whole cells may be in suspension or immobilized. The whole cells may be in fermentation broth or in a reaction buffer.
(187) In some embodiments, the recombinant host cell can comprise heterologous nucleic acid(s) encoding an enzyme or mixture of enzymes capable of catalyzing the conversion of oxido-squalene to cucurbitadienol, cucurbitadienol to 11-hydroxy-cucurbitadienol, 11-hydroxy-cucurbitadienol to mogrol, and/or mogrol to mogroside. In some embodiments, the cell can further comprise heterologous nucleic acid(s) encoding an enzyme or mixture of enzymes capable of catalyzing the conversion of dioxido-squalene to 24,25 epoxy cucurbitadienol, 24,25 epoxy cucurbitadienol to 11-hydroxy-24,25 epoxy cucurbitadienol, 11-hydroxy-24,25 epoxy cucurbitadienol to mogrol, and/or mogrol to mogroside.
(188) The host cell can comprise a recombinant gene encoding a cucurbitadienol synthase and/or a recombinant gene encoding a cytochrome P450 polypeptide.
(189) In some embodiments, the cell comprises a protein having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906 (cucurbitadienol synthase).
(190) In some embodiments, the conversion of Oxido-squalene to cucurbitadienol is catalyzed by cucurbitadienol synthase of any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906, or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith. In some embodiments, the cucurbitadienol synthase polypeptide comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the cucurbitadienol synthase polypeptide is codon optimized. In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74.
(191) In some embodiments, the conversion of cucurbitadienol to 11-hydroxy-cucurbitadienol is catalyzed by CYP5491 of SEQ ID NO: 49 or a functional homologue thereof sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith.
(192) In some embodiments, the conversion of 11-hydroxy-cucurbitadienol to mogrol is catalyzed by a polypeptide selected from the group consisting of Epoxide hydrolase 1 of SEQ ID NO: 29, Epoxide hydrolase 2 of SEQ ID NO: 30 and functional homologues of the aforementioned sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith. In some embodiments, the genes encoding epoxide hydrolase 1 and epoxide hydrolase 2 are codon optimized for expression. In some embodiments, the codon optimized genes for epoxide hydrolase comprise a nucleic acid sequence set forth in SEQ ID NO: 114 or 115.
(193) In some embodiments, the epoxide hydrolase comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 21-28 (Itkin et al, incorporated by reference in its entirety herein).
(194) In some embodiments, the conversion of mogrol to mogroside is catalyzed in the host recombinant cell by one or more UGTs selected from the group consisting of UGT1576 of SEQ ID NO: 15, UGT98 of SEQ ID NO: 9, UGT SK98 of SEQ ID NO: 68 and functional homologues of the aforementioned sharing at least 70%, such as at least 80%, for example at least 90%, such as at least 95%, for example at least 98% sequence identity therewith.
(195) In some embodiments, the host recombinant cell comprises a recombinant gene encoding a cytochrome P450 polypeptide, wherein the cytochrome P450 polypeptide is encoded by any one of the sequences in SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891.
(196) In some embodiments, the host recombinant cell comprises a recombinant gene encoding a squalene epoxidase polypeptide comprising the sequence in SEQ ID NO: 50.
(197) In some embodiments, the host recombinant cell comprises a recombinant gene encoding cucurbitadienol synthase polypeptide of any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase polypeptide comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the cucurbitadienol synthase polypeptide is codon optimized. In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74.
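SEQ ID NO lists such as the one above mix single numbers with ranges, which makes membership checks error-prone by eye. The following Python sketch expands such a list into explicit identifiers. The helper name `expand_seq_ids` is hypothetical; the input string is the cucurbitadienol synthase list recited in this paragraph.

```python
import re
from typing import List


def expand_seq_ids(text: str) -> List[int]:
    """Expand a list like '70-73, 75-77, 319, and 906' into sorted integers."""
    ids = set()
    # Each match is either a single number or a 'start-end' range.
    for start, end in re.findall(r"(\d+)(?:\s*-\s*(\d+))?", text):
        low = int(start)
        high = int(end) if end else low
        if high < low:
            # A reversed range usually signals a transcription slip.
            raise ValueError(f"reversed range: {start}-{end}")
        ids.update(range(low, high + 1))
    return sorted(ids)


cucurbitadienol_synthase_ids = expand_seq_ids(
    "70-73, 75-77, 319, 321, 323, 325, 327-333, "
    "420, 422, 424, 426, 446, 902, 904, and 906"
)
```

Expanded this way, the list covers 26 amino acid sequences and excludes SEQ ID NO: 74, which the text identifies as a codon optimized nucleic acid sequence rather than a polypeptide.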
(198) Production of Mogroside Compounds from Mogrol
(199) In some embodiments, the method of producing Compound 1 comprises contacting mogroside IIIE with a first enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, the method is performed in vivo, wherein a recombinant cell comprises a gene encoding the first enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, the cell further comprises a gene encoding an enzyme capable of catalyzing production of mogroside IE1 from mogrol. In some embodiments, the enzyme comprises a sequence set forth in any one of SEQ ID NOs: 4-8.
(200) In some embodiments, the cell further comprises enzymes to convert mogroside IE to mogroside IV, mogroside V, 11-oxo-mogroside V, and siamenoside I. In some embodiments, the enzymes for converting mogroside IIE to mogroside IV, mogroside V, 11-oxo-mogroside V, and siamenoside I are encoded by genes that comprise the nucleic acid sequences set forth in SEQ ID NOs: 9-14 and 116-120. In some embodiments, the method of producing Compound 1 comprises treating Mogroside III.sub.E with the glucose transferase enzyme UGT76G1.
(201) In some embodiments, the method comprises fractionating lysate from a recombinant cell on an HPLC column and collecting an eluted fraction comprising Compound 1.
(202) In some embodiments, the method comprises contacting mogroside IIIE with a first enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, contacting mogroside IIIE with the first enzyme comprises contacting mogroside IIIE with a recombinant host cell that comprises a first gene encoding the first enzyme. In some embodiments, the first gene is heterologous to the recombinant host cell. In some embodiments, the mogroside IIIE contacts the first enzyme in a recombinant host cell that comprises a first polynucleotide encoding the first enzyme. In some embodiments, the mogroside IIIE is present in the recombinant host cell. In some embodiments, the mogroside IIIE is produced by the recombinant host cell. In some embodiments, the method comprises cultivating the recombinant host cell in a culture medium under conditions in which the first enzyme is expressed. In some embodiments, the first enzyme is one or more of UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the first enzyme is a CGTase. In some embodiments, the CGTase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148 and 154. In some embodiments, the transglucosidases are encoded by any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the CGTase comprises, or consists of, a sequence set forth in any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the first enzyme is a dextransucrase. 
In some embodiments, the dextransucrase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the DexT comprises an amino acid sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the DexT is encoded by a nucleic acid sequence set forth in SEQ ID NO: 104 or 105. In some embodiments, the dextransucrase comprises an amino acid sequence of SEQ ID NO: 2 or 106-110. In some embodiments, the first enzyme is a transglucosidase. In some embodiments, the transglucosidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 163-291 and 723. In some embodiments, the transglucosidase comprises an amino acid sequence of any one of SEQ ID NOs: 163-291 and 723. In some embodiments, the transglucosidases are encoded by any one of SEQ ID NOs: 163-291 and 723. In some embodiments, the transglucosidase comprises an amino acid sequence set forth by any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the genes encode a CGTase comprising any one of the sequences set forth in SEQ ID NOs: 1, 3, 78-101, and 154.
(203) In some embodiments, the method comprises contacting Mogroside IIA with the recombinant host cell to produce mogroside IIIE, wherein the recombinant host cell further comprises a second gene encoding a second enzyme capable of catalyzing production of Mogroside IIIE from Mogroside IIA. In some embodiments, the mogroside IIA is produced by the recombinant host cell. In some embodiments, the second enzyme is one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the second enzyme is a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the transglucosidase comprises an amino acid sequence set forth by any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the genes encode a CGTase comprising an amino acid sequence set forth in any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the UGT is UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5, 444 or 445), UGT85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9 or 407), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), or UGT11789 (SEQ ID NO: 19) or any one of SEQ ID NOs: 4, 5, 7-9, 15-19, 125, 126, 128, 129, 293-304, 306, 307, 407, 439, 441, and 444. In some embodiments, the UGT is encoded by a gene set forth in UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13) or UGT10391 (SEQ ID NO: 14).
(204) In some embodiments, the method comprises contacting mogrol with the recombinant host cell to produce mogroside IIIE, wherein the recombinant host cell further comprises one or more genes encoding one or more enzymes capable of catalyzing production of mogroside IIIE from mogrol. In some embodiments, the mogrol is produced by the recombinant host cell. In some embodiments, the one or more enzymes comprises one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the second enzyme is a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the UGT is UGT73C3, UGT73C6, UGT85C2, UGT73C5, UGT73E1, UGT98, UGT1495, UGT1817, UGT5914, UGT8468, UGT10391, UGT1576, UGT SK98, UGT430, UGT1697, or UGT11789.
(205) In some embodiments, the method comprises contacting a mogroside compound with the recombinant host cell to produce mogroside IIIE, wherein the recombinant host cell further comprises one or more genes encoding one or more enzymes capable of catalyzing production of mogroside IIIE from the mogroside compound, wherein the mogroside compound is one or more of mogroside IA1, mogroside IE1, mogroside IIA1, mogroside IIE, mogroside IIIA1, mogroside IIIA2, mogroside III, mogroside IV, mogroside IVA, mogroside V, or siamenoside. In some embodiments, the mogroside compound is produced by the recombinant host cell. In some embodiments, the one or more enzymes comprises one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the transglucosidase comprises an amino acid sequence set forth by any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the genes encode a CGTase comprising an amino acid sequence set forth in any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the method comprises contacting Mogroside IA1 with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding UGT98 or UGT SK98. In some embodiments, the UGT98 or UGT SK98 enzyme comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 9, 407 or 16. In some embodiments, the contacting results in production of Mogroside IIA in the cell. In some embodiments, the one or more enzymes comprises an amino acid sequence set forth by any one of SEQ ID NOs: 1, 3, 78-101, 106-109, 147, 154, 163-303, 405, 411, 354-405, 447-723, 770, 776, and 782.
(206) In some embodiments, the method further comprises contacting 11-hydroxy-24,25 epoxy cucurbitadienol with the recombinant host cell, wherein the recombinant host cell further comprises a third gene encoding an epoxide hydrolase. In some embodiments, the 11-hydroxy-24,25 epoxy cucurbitadienol is produced by the recombinant host cell. In some embodiments, the method further comprises contacting 11-hydroxy-cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a fourth gene encoding a cytochrome P450 or an epoxide hydrolase. In some embodiments, the P450 polypeptide is encoded in genes comprising the sequence set forth in any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the 11-hydroxy-cucurbitadienol is produced by the recombinant host cell.
(207) In some embodiments, the method further comprises contacting 3, 24, 25 trihydroxy cucurbitadienol with the recombinant host cell, wherein the recombinant host cell further comprises a fifth gene encoding a cytochrome P450. In some embodiments, the P450 polypeptide is encoded in genes comprising the sequence set forth in any one of SEQ ID NOs: 31-48, 316 and 318. In some embodiments, the 3, 24, 25 trihydroxy cucurbitadienol is produced by the recombinant host cell. In some embodiments, the contacting results in production of Mogrol in the recombinant host cell. In some embodiments, the cytochrome P450 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 20, 308 or 315. In some embodiments, the P450 polypeptide is encoded in genes comprising the sequence set forth in any one SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the epoxide hydrolase comprises an amino acid sequence having at least 70% of sequence identity to any one of SEQ ID NOs: 21-30 and 309-314.
(208) In some embodiments, the method further comprises contacting cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding cytochrome P450. In some embodiments, the contacting results in production of 11-hydroxy cucurbitadienol. In some embodiments, the 11-hydroxy cucurbitadienol is produced in cells comprising a gene encoding CYP87D18 or SgCPR protein. In some embodiments, CYP87D18 or SgCPR comprises a sequence set forth in SEQ ID NO: 315, 872 or 874. In some embodiments, the CYP87D18 or SgCPR is encoded by SEQ ID NO: 316, 871 or 873. In some embodiments, the cucurbitadienol is produced by the recombinant host cell. In some embodiments, the gene encoding cytochrome P450 comprises a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the cytochrome P450 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the P450 polypeptide is encoded in genes comprising the sequence set forth in any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891.
(209) In some embodiments, the method further comprises contacting 2,3-oxidosqualene with the recombinant host cell, wherein the recombinant host cell comprises a seventh gene encoding cucurbitadienol synthase. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence set forth in any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904 or 906. In some embodiments, the cucurbitadienol synthase is encoded by any one sequence set forth in SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903 and 905. In some embodiments, the contacting results in production of cucurbitadienol. In some embodiments, the 2,3-oxidosqualene is produced by the recombinant host cell. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme comprising a sequence set forth in SEQ ID NO: 898 or 900. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme encoded by a nucleic acid sequence set forth in SEQ ID NO: 897 or 899.
(210) In some embodiments, the cucurbitadienol synthase is encoded by a gene comprising a sequence set forth in SEQ ID NO: 74. In some embodiments, the cucurbitadienol synthase is encoded by a gene comprising a nucleic acid sequence set forth in any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905. In some embodiments, 11-hydroxy cucurbitadienol is produced by the cell. In some embodiments, 11-OH cucurbitadienol is expressed in cells comprising a gene encoding CYP87D18 or SgCPR protein. In some embodiments, CYP87D18 or SgCPR comprises a sequence set forth in SEQ ID NO: 315, 872 or 874. In some embodiments, the CYP87D18 or SgCPR is encoded by SEQ ID NO: 316, 871 or 873. In some embodiments, the cucurbitadienol synthase polypeptide comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the cucurbitadienol synthase polypeptide is codon optimized. In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least 70% sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906 (which include, for example, cucurbitadienol synthases from C. pepo, S grosvenorii, C sativus, C melo, C moschata, and C maxim). In some embodiments, the cucurbitadienol synthase polypeptide comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the cucurbitadienol synthase polypeptide is codon optimized. In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74. 
In some embodiments, the cucurbitadienol synthase comprises the amino acid sequence of a polypeptide from Lotus japonicus (BAE53431), Populus trichocarpa (XP_002310905), Actaea racemosa (ADC84219), Betula platyphylla (BAB83085), Glycyrrhiza glabra (BAA76902), Vitis vinifera (XP_002264289), Centella asiatica (AAS01524), Panax ginseng (BAA33460), or Betula platyphylla (BAB83086), as described in WO 2016/050890, incorporated by reference in its entirety herein.
(211) In some embodiments, the method comprises contacting squalene with the recombinant host cell, wherein the recombinant host cell comprises an eighth gene encoding a squalene epoxidase. In some embodiments, the contacting results in production of 2, 3-oxidosqualene. In some embodiments, the squalene is produced by the recombinant host cell. In some embodiments, the squalene epoxidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 50-56, 60, 61, 334 or 335.
(212) In some embodiments, the method comprises contacting farnesyl pyrophosphate with the recombinant host cell, wherein the recombinant host cell comprises a ninth gene encoding a squalene synthase. In some embodiments, the contacting results in production of squalene. In some embodiments, the farnesyl pyrophosphate is produced by the recombinant host cell. In some embodiments, the squalene synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 69 or 336.
(213) In some embodiments, the method further comprises contacting geranyl-PP with the recombinant host cell, wherein the recombinant host cell comprises a tenth gene encoding farnesyl-PP synthase. In some embodiments, the contacting results in production of farnesyl-PP. In some embodiments, the geranyl-PP is produced by the recombinant host cell. In some embodiments, the farnesyl-PP synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 338. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is a CMV, EF1a, SV40, PGK1, human beta actin, CAG, GAL1, GAL10, TEF, GDS, ADH1, CaMV35S, Ubi, T7, T7lac, Sp6, araBAD, trp, Lac, Ptac, pL promoter, or a combination thereof. In some embodiments, the promoter is an inducible, repressible, or constitutive promoter. In some embodiments, production of one or more of pyruvate, acetyl-CoA, citrate, and TCA cycle intermediates has been upregulated in the recombinant host cell. In some embodiments, cytosolic localization has been upregulated in the recombinant host cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene comprises at least one sequence encoding a 2A self-cleaving peptide. As used herein, the terms the first, the second, the third, the fourth, the fifth, the sixth, the seventh, the eighth, the ninth, the tenth, and the like do not imply a particular order and/or a requirement for presence of the earlier number. For example, the recombinant host cell described herein can comprise the first gene and the third gene, but not the second gene. 
As another example, the recombinant host cell can comprise the first gene, the fifth gene, and the tenth gene, but not the second gene, the third gene, the fourth gene, the sixth gene, the seventh gene, the eighth gene, and the ninth gene.
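Because a host cell may carry any subset of these genes, the substrate that has to be supplied to the cell depends on which enzymatic steps are present. The Python sketch below illustrates this under a simplifying assumption: the pathway is treated as a single linear chain assembled from the conversions described in the preceding paragraphs, with one enzyme class per step. The labels (including "first enzyme" and "UGT") and the helper `required_feed` are hypothetical shorthand, and the actual pathways described herein can branch or use alternative enzymes.

```python
from typing import Optional, Set

# (substrate, product, enzyme class) in biosynthetic order.
# Simplified linear view; an illustrative assumption, not a complete map.
PATHWAY = [
    ("geranyl-PP", "farnesyl-PP", "farnesyl-PP synthase"),
    ("farnesyl-PP", "squalene", "squalene synthase"),
    ("squalene", "2,3-oxidosqualene", "squalene epoxidase"),
    ("2,3-oxidosqualene", "cucurbitadienol", "cucurbitadienol synthase"),
    ("cucurbitadienol", "11-hydroxy-cucurbitadienol", "cytochrome P450"),
    ("11-hydroxy-cucurbitadienol", "mogrol", "epoxide hydrolase"),
    ("mogrol", "mogroside IIIE", "UGT"),
    ("mogroside IIIE", "Compound 1", "first enzyme"),
]


def required_feed(enzymes_present: Set[str]) -> Optional[str]:
    """Return the earliest substrate the cell can carry through to Compound 1.

    Walks backward from Compound 1 and stops at the first missing enzyme.
    Returns None if the final step itself is missing.
    """
    feed = None
    for substrate, _product, enzyme in reversed(PATHWAY):
        if enzyme not in enzymes_present:
            break
        feed = substrate
    return feed


# A cell with only the final-step enzyme must be given mogroside IIIE.
feed_minimal = required_feed({"first enzyme"})
# Adding a UGT step moves the required feed back to mogrol.
feed_with_ugt = required_feed({"first enzyme", "UGT"})
```

Note that a cell carrying the final-step enzyme and an epoxide hydrolase but no UGT would still need mogroside IIIE supplied in this model, which mirrors the statement that the presence of one numbered gene does not require the presence of another.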
(214) The recombinant host cell can be, for example, a plant, bivalve, fish, fungus, bacteria or mammalian cell. For example, the plant is selected from Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. In some embodiments, the fungus is selected from Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, Metarhizium, Aspergillus, Yarrowia, and Lipomyces. In some embodiments, the fungus is Aspergillus nidulans, Yarrowia lipolytica, or Rhodosporidium toruloides. In some embodiments, the recombinant host cell is a yeast cell. In some embodiments, the yeast is selected from Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, Rhodosporidium, and Microboryomycetes. In some embodiments, the bacteria is selected from Frankia, Actinobacteria, Streptomyces, and Enterococcus. In some embodiments, the bacteria is Enterococcus faecalis. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth genes has been codon optimized for expression in a bacterial, mammalian, plant, fungal or insect cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth genes comprises a functional mutation to increase the activity of the encoded enzyme. 
In some embodiments, cultivating the recombinant host cell comprises monitoring the cultivating for pH, dissolved oxygen level, nitrogen level, or a combination thereof of the cultivating conditions. In some embodiments, the method comprises isolating Compound 1. In some embodiments, isolating Compound 1 comprises lysing the recombinant host cell. In some embodiments, isolating Compound 1 comprises isolating Compound 1 from the culture medium. In some embodiments, the method comprises purifying Compound 1. In some embodiments, purifying Compound 1 comprises HPLC, solid phase extraction or a combination thereof. In some embodiments, the purifying comprises harvesting the recombinant cells, saving the supernatant and lysing the cells. In some embodiments, the lysing comprises subjecting the cells to shear force or detergent washes thereby obtaining a lysate. In some embodiments, the shear force is from a sonication method, french pressurized cells, or beads. In some embodiments, the lysate is subjected to filtering and purification steps. In some embodiments, the lysate is filtered and purified by solid phase extraction.
(215) In some embodiments, a compound having the structure of Compound 1,
(216) ##STR00020##
is provided, wherein the compound is produced by the method of any one of the alternative methods provided herein.
(217) In some embodiments, a cell lysate comprising Compound 1 having the structure:
(218) ##STR00021##
is provided.
(219) In some embodiments, a recombinant cell comprising: Compound 1 having the structure:
(220) ##STR00022##
is provided, and a gene encoding an enzyme capable of catalyzing production of Compound 1 from mogroside IIIE. In some embodiments, the gene is a heterologous gene to the recombinant cell.
(221) In some embodiments, a recombinant cell comprising a first gene encoding a first enzyme capable of catalyzing production of Compound 1 having the structure:
(222) ##STR00023##
from mogroside IIIE is provided. In some embodiments, the first enzyme comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 1, 3, 78-101, 148, or 154 (CGTase). In some embodiments, the first enzyme comprises the amino acid sequence of SEQ ID NOs: 1, 3, 78-101, 148, or 154 (CGTase). In some embodiments, the first enzyme is a dextransucrase. In some embodiments, the dextransucrase comprises, or consists of, an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 2, 103, 106-110, 156, and 896. In some embodiments, the dextransucrase comprises, or consists of, the amino acid sequence of SEQ ID NO: 2, 103, 104, or 105. In some embodiments, the dextransucrase comprises, or consists of, the amino acid sequence of any one of SEQ ID NO: 2, 103-110 and 156-162 and 896. In some embodiments, the DexT comprises a nucleic acid sequence set forth in SEQ ID NO: 104 or 105. In some embodiments, the first enzyme is a transglucosidase. In some embodiments, the transglucosidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of SEQ ID NO: 201 or SEQ ID NO: 291. In some embodiments, the recombinant cell further comprises a second gene encoding a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the UGT comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 4, 5, 6, 7, 8, 9, 15, 16, 17, 18, and 19. In some embodiments, UGT comprises, or consists of, the amino acid sequence of any one of SEQ ID NOs: 4, 5, 6, 7, 8, 9, 15, 16, 17, and 18. 
In some embodiments, the UGT is encoded by a sequence set forth in UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13), or UGT10391 (SEQ ID NO: 14). In some embodiments, the cell comprises a third gene encoding UGT98 or UGT SK98. In some embodiments, the UGT98 or UGT SK98 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 9, 407 or 16. In some embodiments, the cell comprises a fourth gene encoding an epoxide hydrolase. In some embodiments, the epoxide hydrolase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 21-30 and 309-314. In some embodiments, the cell comprises a fifth sequence encoding P450. In some embodiments, the P450 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NOs: 20, 49, 308, 315 or 317. In some embodiments, P450 is encoded by a gene comprising a sequence set forth in any one of SEQ ID NOs: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, further comprises a sixth sequence encoding cucurbitadienol synthase. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase polypeptide comprises a C-terminal portion comprising the sequence set forth in SEQ ID NO: 73. In some embodiments, the gene encoding the cucurbitadienol synthase polypeptide is codon optimized. 
In some embodiments, the codon optimized gene comprises the nucleic acid sequence set forth in SEQ ID NO: 74. In some embodiments, the cell further comprises a seventh gene encoding a squalene epoxidase. In some embodiments, the squalene epoxidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 50-56, 60, 61, 334, and 335. In some embodiments, the cell further comprises an eighth gene encoding a squalene synthase. In some embodiments, the eighth gene comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 69 or SEQ ID NO: 336. In some embodiments, the cell further comprises a ninth gene encoding a farnesyl-PP synthase. In some embodiments, the farnesyl-PP synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 338. In some embodiments, the cell is a mammalian, bacterial, fungal, or insect cell. In some embodiments, the cell is a yeast cell. Non-limiting examples of the yeast include Candida, Sacccharaomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, and Microboryomycetes. In some embodiments, the plant is selected from the group consisting of Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. 
In some embodiments, the fungus is Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, or Metarhizium.
(223) In some embodiments, the cell comprises a sequence of an enzyme set forth in any one of SEQ ID NO: 897, 899, 909, 911, 913, 418, 421, 423, 425, 427, 871, 873, 901, 903 or 905. In some embodiments, the enzyme comprises a sequence set forth in or is encoded by a sequence in SEQ ID NO: 420, 422, 424, 426, 446, 872, 874-896, 898, 900, 902, 904, 906, 908, 910, 912, and 951-1012.
(224) In some embodiments, DNA can be obtained through gene synthesis, for example through GenScript or IDT. DNA can be cloned through standard molecular biology techniques into an overexpression vector such as pQE1, pGEX-4t3, pDest-17, a pET series vector, or pFASTBAC. E. coli host strains (e.g., Top10 or BL21 series +/- codon plus) can be used to produce enzyme, using 1 mM IPTG for induction at an OD600 of 1. E. coli strains can be propagated at 37 °C, 250 rpm and switched to room temperature or 30 °C (150 rpm) during induction. When indicated, some enzymes can also be expressed in SF9 insect cell lines using pFASTBAC and optimized MOI. Crude extract containing enzymes can be generated through sonication and used for the reactions described herein. All UDP-glycosyltransferase reactions contain sucrose synthase, which can be obtained from A. thaliana via gene synthesis and expressed in E. coli.
(225) Hydrolysis of Hyper-Glycosylated Mogrosides to Produce Compound 1
(226) In some embodiments, hyper-glycosylated mogrosides can be hydrolyzed to produce Compound 1. Non-limiting examples of hyper-glycosylated mogrosides include Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III. Enzymes capable of catalyzing the hydrolysis process to produce Compound 1 can be, for example, CGTases (e.g., those that display hydrolysis in the absence of starch), cellulases, β-glucosidases, transglucosidases, amylases, pectinases, dextranases, and fungal lactases. The amino acid sequences of some of these enzymes and the nucleic acid sequences encoding some of these enzymes can be found in Table 1.
(227) In some embodiments, Compound 1 displays tolerance to hydrolytic enzymes in the recombinant cell, wherein the hydrolytic enzymes are capable of hydrolyzing Mogroside VI, Mogroside V, and Mogroside IV to Mogroside IIIE. The alpha-linked glycoside present in Compound 1 provides a unique advantage over other Mogrosides (beta-linked glycosides) due to its tolerance to hydrolysis. During microbial production of Compound 1, the recombinant host cells (e.g., microbial host cells) can hydrolyze unwanted beta-linked Mogrosides back to Mogroside IIIE. Without being bound by any particular theory, it is believed that the hydrolysis by the host cells can improve the purity of Compound 1 due to: 1) a reduction of unwanted Mogroside VI, Mogroside V, and Mogroside IV levels, and/or 2) an increase in the amount of Mogroside IIIE available to be used as a precursor for production of Compound 1.
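The two effects described above can be followed with simple mole bookkeeping. The Python sketch below is illustrative only: the pool sizes are made-up numbers in arbitrary molar units, the hydrolyzed fraction is a free parameter, and the model assumes that each hydrolyzed molecule of Mogroside VI, V, or IV yields one molecule of Mogroside IIIE while Compound 1 is left untouched. None of these values are measurements from this disclosure.

```python
from typing import Dict

# Beta-linked mogrosides that the text says can be hydrolyzed to Mogroside IIIE.
HYDROLYZABLE = ("Mogroside VI", "Mogroside V", "Mogroside IV")


def hydrolyze_to_iiie(pools: Dict[str, float], fraction: float) -> Dict[str, float]:
    """Move `fraction` of each hydrolyzable pool into Mogroside IIIE.

    Compound 1 is not modified, reflecting its stated tolerance to hydrolysis.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    out = dict(pools)
    for name in HYDROLYZABLE:
        moved = out.get(name, 0.0) * fraction
        out[name] = out.get(name, 0.0) - moved
        out["Mogroside IIIE"] = out.get("Mogroside IIIE", 0.0) + moved
    return out


# Hypothetical starting pools (arbitrary molar units).
before = {
    "Compound 1": 40.0,
    "Mogroside IIIE": 10.0,
    "Mogroside V": 30.0,
    "Mogroside IV": 20.0,
}
after = hydrolyze_to_iiie(before, fraction=0.5)
```

In this toy case the unwanted Mogroside V and IV pools are halved and the Mogroside IIIE precursor pool grows from 10 to 35 units, while the Compound 1 pool and the total amount of mogroside are unchanged.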
(228) Purification of Mogroside Compounds
(229) Some embodiments comprise isolating mogroside compounds, for example Compound 1. In some embodiments, isolating Compound 1 comprises lysing the recombinant host cell. In some embodiments, isolating Compound 1 comprises isolating Compound 1 from the culture medium. In some embodiments, the method further comprises purifying Compound 1. In some embodiments, purifying Compound 1 comprises HPLC, solid phase extraction or a combination thereof. In some embodiments, the purifying comprises harvesting the recombinant cells, saving the supernatant and lysing the cells. In some embodiments, the lysing comprises subjecting the cells to shear force or detergent washes thereby obtaining a lysate. In some embodiments, the shear force is from a sonication method, French pressurized cells, or beads. In some embodiments, the lysate is subjected to filtering and purification steps. In some embodiments, the lysate is filtered and purified by solid phase extraction. The lysate can then be filtered and treated with ammonium sulfate to remove proteins, and fractionated on a C18 HPLC column (5×10 cm Atlantis prep T3 OBD column, 5 μm, Waters) by injections using an A/B gradient (A=water, B=acetonitrile) of 10→30% B over 30 minutes, with a 95% B wash, followed by re-equilibration at 1% (total run time=42 minutes). The runs can be collected in tared tubes (12 fractions/plate, 3 plates per run) at 30 mL/fraction. The lysate can also be centrifuged to remove solids and particulate matter. Plates can then be dried in the Genevac HT12/HT24. The desired compound is expected to be eluted in Fraction 21 along with other isomers. The pooled fractions can be further fractionated in 47 runs on a fluoro-phenyl HPLC column (3×10 cm Xselect fluoro-phenyl OBD column, 5 μm, Waters) using an A/B gradient (A=water, B=acetonitrile) of 15→30% B over 35 minutes, with a 95% B wash, followed by re-equilibration at 15% (total run time=45 minutes).
Each run was collected in 12 tared tubes (12 fractions/plate, 1 plate per run) at 30 mL/fraction. Fractions containing the desired peak with the desired purity can be pooled based on UPLC analysis and dried under reduced pressure to give a whitish powdery solid. The pure compound can be re-suspended/dissolved in 10 mL of water and lyophilized to obtain at least a 95% purity.
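The gradient and fraction-collection figures above lend themselves to a short arithmetic check. The following is a minimal sketch, assuming a strictly linear ramp between the stated start and end percentages of solvent B. The durations of the 95% B wash and of the re-equilibration are not given in the text, so only the ramp is modeled, and the function name is hypothetical.

```python
def percent_b(t_min, start_b, end_b, ramp_min):
    """%B at time t_min during the linear ramp of an A/B gradient."""
    if not 0 <= t_min <= ramp_min:
        raise ValueError("time is outside the linear ramp")
    return start_b + (end_b - start_b) * t_min / ramp_min


# First (C18) fractionation: 10 to 30% B over 30 minutes.
c18_mid = percent_b(15, 10, 30, 30)   # composition halfway through the ramp
c18_slope = (30 - 10) / 30            # %B per minute

# Second (fluoro-phenyl) fractionation: 15 to 30% B over 35 minutes.
fp_slope = (30 - 15) / 35             # %B per minute

# Collected volume per run, from the fraction counts stated in the text.
c18_volume_ml = 12 * 3 * 30           # 12 fractions/plate, 3 plates, 30 mL each
fp_volume_ml = 12 * 1 * 30            # 12 fractions/plate, 1 plate, 30 mL each

print(c18_mid, round(c18_slope, 3), round(fp_slope, 3))
print(c18_volume_ml, fp_volume_ml)
```

The shallower slope of the second gradient is consistent with its use as a polishing step for separating closely eluting isomers.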
(230) For purification of Compound 1, in some embodiments, the compound can be purified by solid phase extraction, which may remove the need for HPLC. Compound 1 can be purified, for example, to or to about 70%, 80%, 90%, 95%, 98%, 99%, or 100% purity or any level of purity within a range described by any two aforementioned values. In some embodiments, Compound 1 that is purified by solid phase extraction is, or is substantially, identical to the HPLC purified material. In some embodiments, the method comprises fractionating lysate from a recombinant cell on an HPLC column and collecting an eluted fraction comprising Compound 1.
(231) Fermentation
(232) Host cells can be fermented as described herein for the production of Compound 1. This can also include methods that occur with or without air and can be carried out in an anaerobic environment, for example. The whole cells (e.g., recombinant host cells) may be in fermentation broth or in a reaction buffer.
(233) Monk fruit (Siraitia grosvenorii) extract can also be used to contact the cells in order to produce Compound 1. In some embodiments, a method of producing Compound 1 is provided. The method can comprise contacting monk fruit extract with a first enzyme capable of catalyzing production of Compound 1 from a mogroside such as Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III. In some embodiments, the contacting comprises contacting the monk fruit extract with a recombinant host cell that comprises a first gene encoding the first enzyme. In some embodiments, the first gene is heterologous to the recombinant host cell. In some embodiments, the monk fruit extract contacts the first enzyme in a recombinant host cell that comprises a first polynucleotide encoding the first enzyme. In some embodiments, mogroside IIIE is in the monk fruit extract. In some embodiments, mogroside IIIE is also produced by the recombinant host cell. In some embodiments, the method further comprises cultivating the recombinant host cell in a culture medium under conditions in which the first enzyme is expressed. In some embodiments, the first enzyme is one or more of UDP glycosyltransferases, cyclomaltodextrin glucanotransferases (CGTases), glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the first enzyme is a CGTase.
For example, the CGTase can comprise an amino acid sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to the sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the CGTase comprises the amino acid sequence of any one of SEQ ID NOs: 1, 3, 78-101, 148, and 154. In some embodiments, the CGTase comprises the amino acid sequence of any one of SEQ ID NOs: 78-101. In some embodiments, the first enzyme is a dextransucrase. In some embodiments, the dextransucrase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of the sequences set forth in SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the dextransucrase comprises an amino acid sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156 and 896. In some embodiments, the first enzyme is a transglucosidase. In some embodiments, the transglucosidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to the sequence of any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the transglucosidase comprises an amino acid sequence of any one of SEQ ID NOs: 163-290 and 723. In some embodiments, the first enzyme is a beta-glucosidase. In some embodiments, the beta-glucosidase comprises an amino acid sequence set forth in SEQ ID NO: 292, or an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 292. In some embodiments, the monk fruit extract comprises Mogroside IIA and the recombinant host cell comprises a second gene encoding a second enzyme capable of catalyzing production of Mogroside IIIE from Mogroside IIA.
In some embodiments, mogroside IIA is also produced by the recombinant host cell. In some embodiments, the second enzyme is one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the second enzyme is a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the UGT is UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5, 444, or 445), 85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9 or 407), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), UGT11789 (SEQ ID NO: 19), or comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to UGT73C3 (SEQ ID NO: 4), UGT73C6 (SEQ ID NO: 5, 444 or 445), 85C2 (SEQ ID NO: 6), UGT73C5 (SEQ ID NO: 7), UGT73E1 (SEQ ID NO: 8), UGT98 (SEQ ID NO: 9 or 407), UGT1576 (SEQ ID NO: 15), UGT SK98 (SEQ ID NO: 16), UGT430 (SEQ ID NO: 17), UGT1697 (SEQ ID NO: 18), or UGT11789 (SEQ ID NO: 19). In some embodiments, the UGT is encoded by a gene set forth in UGT1495 (SEQ ID NO: 10), UGT1817 (SEQ ID NO: 11), UGT5914 (SEQ ID NO: 12), UGT8468 (SEQ ID NO: 13) or UGT10391 (SEQ ID NO: 14). In some embodiments, the monk fruit extract comprises mogrol. In some embodiments, the method further comprises contacting the mogrol of the monk fruit extract with the recombinant host cell, wherein the recombinant host cell further comprises one or more genes encoding one or more enzymes capable of catalyzing production of Mogroside IIIE from mogrol. In some embodiments, mogrol is also produced by the recombinant host cell. In some embodiments, the one or more enzymes comprises one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
In some embodiments, the second enzyme is a uridine diphosphate-glucosyl transferase (UGT). In some embodiments, the UGT is UGT73C3, UGT73C6, 85C2, UGT73C5, UGT73E1, UGT98, UGT1495, UGT1817, UGT5914, UGT8468, UGT10391, UGT1576, UGT SK98, UGT430, UGT1697, or UGT11789, or comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to those UGTs. In some embodiments, the method further comprises contacting the monk fruit extract with the recombinant host cell to produce mogroside IIIE, wherein the recombinant host cell further comprises one or more genes encoding one or more enzymes capable of catalyzing production of Mogroside IIIE from the mogroside compound, wherein the mogroside compound is one or more of mogroside IA1, mogroside IE1, mogroside IIA1, mogroside IIE, mogroside IIA, mogroside IIIA1, mogroside IIIA2, mogroside III, mogroside IV, mogroside IVA, mogroside V, or siamenoside. In some embodiments, a mogroside compound is also produced by the recombinant host cell. In some embodiments, the one or more enzymes comprises one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. In some embodiments, the mogroside compound is Mogroside IIE. In some embodiments, the one or more enzymes comprises an amino acid sequence set forth in any one of SEQ ID NOs: 293-303. In some embodiments, the mogroside compound is Mogroside IIA or Mogroside IIE, and contacting the monk fruit extract with the recombinant cell expressing the one or more enzymes produces Mogroside IIIA, Mogroside IVE and Mogroside V. In some embodiments, the one or more enzymes comprise an amino acid sequence set forth in SEQ ID NO: 304. In some embodiments, the one or more enzymes is encoded by a sequence set forth in SEQ ID NO: 305. In some embodiments, the monk fruit extract comprises Mogroside IA1.
In some embodiments, the method further comprises contacting the monk fruit extract with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding UGT98 or UGT SK98. In some embodiments, the UGT98 or UGT SK98 enzyme comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 9, 407, 16 or 306. In some embodiments, the UGT98 is encoded by a sequence set forth in SEQ ID NO: 307. In some embodiments, the contacting results in production of Mogroside IIA in the cell. In some embodiments, the monk fruit extract comprises 11-hydroxy-24,25 epoxy cucurbitadienol. In some embodiments, the method further comprises contacting monk fruit extract with the recombinant host cell, wherein the recombinant host cell further comprises a third gene encoding an epoxide hydrolase. In some embodiments, the 11-hydroxy-24,25 epoxy cucurbitadienol is also produced by the recombinant host cell. In some embodiments, the method further comprises contacting monk fruit extract with the recombinant host cell, wherein the recombinant host cell comprises a fourth gene encoding a cytochrome P450 or an epoxide hydrolase. In some embodiments, the 11-hydroxy-cucurbitadienol is also produced by the recombinant host cell. In some embodiments, the monk fruit extract comprises 3, 24, 25 trihydroxy cucurbitadienol. In some embodiments, the method further comprises contacting monk fruit extract with the recombinant host cell, wherein the recombinant host cell further comprises a fifth gene encoding a cytochrome P450. In some embodiments, the 3, 24, 25 trihydroxy cucurbitadienol is also produced by the recombinant host cell. In some embodiments, the contacting with monk fruit extract results in production of Mogrol in the recombinant host cell.
In some embodiments, the cytochrome P450 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 20 or 308. In some embodiments, the epoxide hydrolase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 21-30 and 309-314. In some embodiments, the monk fruit extract comprises cucurbitadienol. In some embodiments, the method further comprises contacting cucurbitadienol with the recombinant host cell, wherein the recombinant host cell comprises a gene encoding cytochrome P450. In some embodiments, the contacting results in production of 11-hydroxy cucurbitadienol. In some embodiments, the 11-hydroxy cucurbitadienol is expressed in cells comprising a gene encoding CYP87D18 or SgCPR protein. In some embodiments, CYP87D18 or SgCPR comprises a sequence set forth in SEQ ID NO: 315, 872 or 874. In some embodiments, the CYP87D18 or SgCPR is encoded by SEQ ID NO: 316, 871 or 873. In some embodiments, the cucurbitadienol is also produced by the recombinant host cell. In some embodiments, the gene encoding cytochrome P450 comprises a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID Nos: 31-48, 316, 431, 871, 873, 875, 877, 879, 881, 883, 885, 887, and 891. In some embodiments, the cytochrome P450 comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 20 or 49. In some embodiments, the monk fruit extract comprises 2, 3-oxidosqualene. 
In some embodiments, the method further comprises contacting 2, 3-oxidosqualene of the monk fruit extract with the recombinant host cell, wherein the recombinant host cell comprises a seventh gene encoding cucurbitadienol synthase. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904 or 906. In some embodiments, the cucurbitadienol synthase is encoded by a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, or 905. In some embodiments, the monk fruit extract comprises mogroside intermediates such as Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside IA, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III. In some embodiments, the method further comprises contacting a mogroside intermediate with the recombinant host cell, wherein the recombinant host cell comprises a seventh gene encoding cucurbitadienol synthase. In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, or 906.
In some embodiments, the cucurbitadienol synthase is encoded by a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, or 905. In some embodiments, the contacting results in production of cucurbitadienol. In some embodiments, the 2,3-oxidosqualene and diepoxysqualene is also produced by the recombinant host cell. In some embodiments, the 2, 3-oxidosqualene or diepoxysqualene is produced by an enzyme comprising a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 898 or 900, or comprising a sequence set forth in SEQ ID NO: 898 or 900. In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme encoded by a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 897 or 899; or encoded by a nucleic acid set forth in SEQ ID NO: 897 or 899.
(234) In some embodiments, the cucurbitadienol synthase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 70-73, 75-77, 319, 321, 323, 325, 327-333, 420, 422, 424, 426, 446, 902, 904, and 906. In some embodiments, the cucurbitadienol synthase is a cucurbitadienol synthase from C. pepo, S grosvenorii, C sativus, C melo, C moschata, or C maxim. In some embodiments, the cucurbitadienol synthase is encoded by a gene comprising a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903, and 905, or comprising a nucleic acid sequence set forth in any one of SEQ ID NOs: 74, 320, 322, 324, 326, 328, 418, 421, 423, 425, 427, 897, 899, 901, 903 and 905. In some embodiments, 11-OH cucurbitadienol is produced by the cell. In some embodiments, 11-OH cucurbitadienol is expressed in cells comprising a gene encoding CYP87D18 or SgCPR. In some embodiments, CYP87D18 or SgCPR comprises a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 315, 872, or 874, or a sequence set forth in SEQ ID NO: 315, 872 or 874. In some embodiments, the CYP87D18 or SgCPR is encoded by a sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 316, 871 or 873, or a sequence set forth in SEQ ID NO: 316, 871 or 873. In some embodiments, the monk fruit extract comprises squalene. 
In some embodiments, the 2,3-oxidosqualene or diepoxysqualene is produced by an enzyme comprising a sequence having at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, or more sequence identity to SEQ ID NO: 898 or 900, or a sequence set forth in SEQ ID NO: 898 or 900. In some embodiments, the 2, 3-oxidosqualene or diepoxysqualene is produced by an enzyme encoded by a nucleic acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to SEQ ID NO: 897 or 899, or a sequence set forth in SEQ ID NO: 897 or 899. In some embodiments, the method further comprises contacting squalene with the recombinant host cell, wherein the recombinant host cell comprises an eighth gene encoding a squalene epoxidase. In some embodiments, the contacting results in production of 2,3-oxidosqualene. In some embodiments, the squalene is also produced by the recombinant host cell. In some embodiments, the squalene epoxidase comprises an amino acid sequence having at least, 70%, 80%, 85%, 90%, 95%, 98%, 99%, 100%, or a range between any two of these numbers, sequence identity to any one of SEQ ID NOs: 50-56, 60, 61, 334 or 335. In some embodiments, the squalene epoxidase is encoded by a nucleic acid sequence set forth in SEQ ID NO: 335. In some embodiments, the monk fruit extract comprises farnesyl pyrophosphate. In some embodiments, the method further comprises contacting farnesyl pyrophosphate with the recombinant host cell, wherein the recombinant host cell comprises a ninth gene encoding a squalene synthase. In some embodiments, the contacting results in production of squalene. In some embodiments, the farnesyl pyrophosphate is also produced by the recombinant host cell. In some embodiments, the squalene synthase comprises an amino acid sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to any one of SEQ ID NOs: 69 and 336.
In some embodiments, the squalene synthase is encoded by a sequence comprising the nucleic acid sequence set forth in SEQ ID NO: 337. In some embodiments, the monk fruit extract comprises geranyl-PP. In some embodiments, the method further comprises contacting geranyl-PP with the recombinant host cell, wherein the recombinant host cell comprises a tenth gene encoding farnesyl-PP synthase. In some embodiments, the contacting results in production of farnesyl-PP. In some embodiments, the geranyl-PP is also produced by the recombinant host cell. In some embodiments, the farnesyl-PP synthase comprises an amino acid sequence having at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, at least 99%, or more sequence identity to SEQ ID NO: 338. In some embodiments, the farnesyl-PP synthase is encoded by a nucleic acid sequence set forth in SEQ ID NO: 339. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is a CMV, EF1a, SV40, PGK1, human beta actin, CAG, GAL1, GAL10, TEF, GDS, ADH1, CaMV35S, Ubi, T7, T7lac, Sp6, araBAD, trp, lac, Ptac, pL promoter, or a combination thereof. In some embodiments, the promoter is an inducible, repressible, or constitutive promoter. In some embodiments, production of one or more of pyruvate, acetyl-CoA, citrate, and TCA cycle intermediates has been upregulated in the recombinant host cell. In some embodiments, cytosolic localization has been upregulated in the recombinant host cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene comprises at least one sequence encoding a 2A self-cleaving peptide. In some embodiments, the recombinant host cell is a plant, bivalve, fish, fungus, bacteria or mammalian cell.
In some embodiments, the plant is selected from Siraitia, Momordica, Gynostemma, Cucurbita, Cucumis, Arabidopsis, Artemisia, Stevia, Panax, Withania, Euphorbia, Medicago, Chlorophytum, Eleutherococcus, Aralia, Morus, Medicago, Betula, Astragalus, Jatropha, Camellia, Hypholoma, Aspergillus, Solanum, Huperzia, Pseudostellaria, Corchorus, Hedera, Marchantia, and Morus. In some embodiments, the fungus is selected from Trichophyton, Sanghuangporus, Taiwanofungus, Moniliophthora, Marssonina, Diplodia, Lentinula, Xanthophyllomyces, Pochonia, Colletotrichum, Diaporthe, Histoplasma, Coccidioides, Histoplasma, Sanghuangporus, Aureobasidium, Pochonia, Penicillium, Sporothrix, Metarhizium, Aspergillus, Yarrowia, and Lipomyces. In some embodiments, the fungus is Aspergillus nidulans, Yarrowia lipolytica, or Rhodosporidium toruloides. In some embodiments, the recombinant host cell is a yeast cell. In some embodiments, the yeast is selected from Candida, Saccharomyces, Saccharomycotina, Taphrinomycotina, Schizosaccharomycetes, Komagataella, Basidiomycota, Agaricomycotina, Tremellomycetes, Pucciniomycotina, Aureobasidium, Coniochaeta, Rhodosporidium, and Microbotryomycetes. In some embodiments, the bacteria is selected from the group consisting of Frankia, Actinobacteria, Streptomyces, and Enterococcus. In some embodiments, the bacteria is Enterococcus faecalis. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth gene has been codon optimized for expression in a bacterial, mammalian, plant, fungal or insect cell. In some embodiments, one or more of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth genes comprises a functional mutation to increase activity of the encoded enzyme. In some embodiments, cultivating the recombinant host cell comprises monitoring the cultivating for pH, dissolved oxygen level, nitrogen level, or a combination thereof of the cultivating conditions.
In some embodiments, the method comprises isolating Compound 1. In some embodiments, isolating Compound 1 comprises lysing the recombinant host cell. In some embodiments, isolating Compound 1 comprises isolating Compound 1 from the culture medium. In some embodiments, the method further comprises purifying Compound 1. In some embodiments, purifying Compound 1 comprises HPLC, solid phase extraction or a combination thereof. In some embodiments, the purifying further comprises harvesting the recombinant cells, saving the supernatant and lysing the cells. In some embodiments, the lysing comprises subjecting the cells to shear force or detergent washes thereby obtaining a lysate. In some embodiments, the shear force is from a sonication method, french pressurized cells, or beads. In some embodiments, the lysate is subjected to filtering and purification steps. In some embodiments, the lysate is filtered and purified by solid phase extraction. In some embodiments, the method further comprises second or third additions of monk fruit extract to the growth media of the recombinant host cells. Additionally the method can be performed by contacting the monk fruit extract with the recombinant cell lysate, wherein the recombinant cell lysate comprises the expressed enzymes listed herein.
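The sequence identity thresholds recited throughout this section (e.g., at least 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100%) can be illustrated with a minimal sketch. It assumes that two sequences have already been aligned to equal length with "-" marking gaps, and it counts identical residues over all alignment columns, so a gap counts as a mismatch. This is one common convention among several, not a definition taken from this disclosure. The function names are hypothetical, and the two peptide fragments are made-up toy strings, not sequences from the sequence listing.

```python
THRESHOLDS = (70, 80, 85, 90, 95, 98, 99, 100)


def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences carry a gap; they hold no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column matches only when both residues are identical and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def thresholds_met(identity):
    """Return the recited identity thresholds that a value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Toy example: 9 identical columns out of 11.
example = percent_identity("MKT-AYIAKQR", "MKTLAYIGKQR")
print(round(example, 2), thresholds_met(example))
```

In practice the alignment itself would come from a dedicated alignment tool, and the reported identity depends on that tool's scoring parameters and on how gaps and end overhangs are counted.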
(235) In general, compounds as disclosed and described herein, individually or in combination, can be provided in a composition, such as, e.g., an ingestible composition. In one embodiment, compounds as disclosed and described herein, individually or in combination, can provide a sweet flavor to an ingestible composition. In other embodiments, the compounds disclosed and described herein, individually or in combination, can act as a sweet flavor enhancer to enhance the sweetness of another sweetener. In other embodiments, the compounds disclosed herein impart a more sugar-like temporal profile and/or flavor profile to a sweetener composition by combining one or more of the compounds as disclosed and described herein with one or more other sweeteners in the sweetener composition. In another embodiment, compounds as disclosed and described herein, individually or in combination, can increase or enhance the sweet taste of a composition by contacting the composition thereof with the compounds as disclosed and described herein to form a modified composition. In another embodiment, compounds as disclosed and described herein, individually or in combination, can be in a composition that modulates the sweet receptors and/or their ligands expressed in the body other than in the taste buds.
(236) As used herein, an "ingestible composition" includes any composition that, either alone or together with another substance, is suitable to be taken by mouth whether intended for consumption or not. The ingestible composition includes both "food or beverage products" and "non-edible products". By "food or beverage products", it is meant any edible product intended for consumption by humans or animals, including solids, semi-solids, or liquids (e.g., beverages), and includes functional food products (e.g., any fresh or processed food claimed to have health-promoting and/or disease-preventing properties beyond the basic nutritional function of supplying nutrients). The term "non-food or beverage products" or "noncomestible composition" includes any product or composition that can be taken into the mouth by humans or animals for purposes other than consumption or as food or beverage. For example, the non-food or beverage product or noncomestible composition includes supplements, nutraceuticals, pharmaceutical and over the counter medications, oral care products such as dentifrices and mouthwashes, and chewing gum.
(237) Compositions Comprising Mogroside Compounds
(238) Also disclosed herein are compositions, e.g., ingestible compositions, comprising one or more of the mogroside compounds disclosed herein, including but not limited to Compound 1 and the compounds shown in
(239) An ingestibly acceptable ingredient is a substance that is suitable to be taken by mouth and can be combined with a compound described herein to form an ingestible composition. The ingestibly acceptable ingredient may be in any form depending on the intended use of a product, e.g., solid, semi-solid, liquid, paste, gel, lotion, cream, foamy material, suspension, solution, or any combinations thereof (such as a liquid containing solid contents). The ingestibly acceptable ingredient may be artificial or natural. Ingestibly acceptable ingredients include many common food ingredients, such as water at neutral, acidic, or basic pH, fruit or vegetable juices, vinegar, marinades, beer, wine, natural water/fat emulsions such as milk or condensed milk, edible oils and shortenings, fatty acids and their alkyl esters, low molecular weight oligomers of propylene glycol, glyceryl esters of fatty acids, and dispersions or emulsions of such hydrophobic substances in aqueous media, salts such as sodium chloride, wheat flours, solvents such as ethanol, solid edible diluents such as vegetable powders or flours, or other liquid vehicles; dispersion or suspension aids; surface active agents; isotonic agents; thickening or emulsifying agents; preservatives; solid binders; lubricants and the like.
(240) Additional ingestibly acceptable ingredients include acids, including but not limited to citric acid, phosphoric acid, ascorbic acid, sodium acid sulfate, lactic acid, or tartaric acid; bitter ingredients, including, for example, caffeine, quinine, green tea, catechins, polyphenols, green robusta coffee extract, green coffee extract, whey protein isolate, or potassium chloride; coloring agents, including, for example, caramel color, Red #40, Yellow #5, Yellow #6, Blue #1, Red #3, purple carrot, black carrot juice, purple sweet potato, vegetable juice, fruit juice, beta carotene, turmeric curcumin, or titanium dioxide; preservatives, including, for example, sodium benzoate, potassium benzoate, potassium sorbate, sodium metabisulfite, sorbic acid, or benzoic acid; antioxidants, including, for example, ascorbic acid, calcium disodium EDTA, alpha tocopherols, mixed tocopherols, rosemary extract, grape seed extract, resveratrol, or sodium hexametaphosphate; vitamins or functional ingredients, including, for example, resveratrol, Co-Q10, omega 3 fatty acids, theanine, choline chloride (citicoline), fibersol, inulin (chicory root), taurine, panax ginseng extract, guarana extract, ginger extract, L-phenylalanine, L-carnitine, L-tartrate, D-glucuronolactone, inositol, bioflavonoids, Echinacea, ginkgo biloba, yerba mate, flax seed oil, garcinia cambogia rind extract, white tea extract, ribose, milk thistle extract, grape seed extract, pyridoxine HCl (vitamin B6), cyanocobalamin (vitamin B12), niacinamide (vitamin B3), biotin, calcium lactate, calcium pantothenate (pantothenic acid), calcium phosphate, calcium carbonate, chromium chloride, chromium polynicotinate, cupric sulfate, folic acid, ferric pyrophosphate, iron, magnesium lactate, magnesium carbonate, magnesium sulfate, monopotassium phosphate, monosodium phosphate, phosphorus, potassium iodide, potassium phosphate, riboflavin, sodium sulfate, sodium gluconate, sodium polyphosphate, sodium bicarbonate, thiamine mononitrate, vitamin D3, vitamin A palmitate, zinc gluconate, zinc lactate, or zinc sulphate; clouding agents, including, for example, ester gum, brominated vegetable oil (BVO), or sucrose acetate isobutyrate (SAIB); buffers, including, for example, sodium citrate, potassium citrate, or salt; flavors, including, for example, propylene glycol, ethyl alcohol, glycerine, gum Arabic (gum acacia), maltodextrin, modified corn starch, dextrose, natural flavor, natural flavor with other natural flavors (natural flavor WONF), natural and artificial flavors, artificial flavor, silicon dioxide, magnesium carbonate, or tricalcium phosphate; and stabilizers, including, for example, pectin, xanthan gum, carboxymethylcellulose (CMC), polysorbate 60, polysorbate 80, medium chain triglycerides, cellulose gel, cellulose gum, sodium caseinate, modified food starch, gum Arabic (gum acacia), or carrageenan.
EXAMPLES
(241) Some aspects of the embodiments discussed above are disclosed in further detail in the following examples, which are not in any way intended to limit the scope of the present disclosure.
Example 1: Production of Siamenoside I
(242) ##STR00024##
(243) As disclosed herein, siamenoside I can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, siamenoside I may be hydrolyzed to produce mogroside III.sub.E, which can then be used to produce Compound 1. For example, a method for producing siamenoside I can comprise: contacting mogrol with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a recombinant cell expressing pectinase from Aspergillus aculeatus can be used.
(244) As another example, the method for producing siamenoside I can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a pectinase from Aspergillus aculeatus can be used.
Example 2: Production of Mogroside IV.SUB.E
(245) ##STR00025##
(246) As disclosed herein, Mogroside IV.sub.E can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Mogroside IV.sub.E produced from mogroside V can then be used to produce Compound 1. For example, a method for producing Mogroside IV.sub.E can comprise: contacting mogroside V with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. As another example, the recombinant cell can comprise a gene encoding pectinase. The pectinase can be encoded by a gene from Aspergillus aculeatus.
(247) As another example, the method for producing Mogroside IV.sub.E can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a pectinase from Aspergillus aculeatus can be used.
Example 3: Production of Mogroside III.SUB.E
(248) ##STR00026##
(249) As disclosed herein, Mogroside III.sub.E can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Mogroside II.sub.A may be glycosylated to produce mogroside III.sub.E, which can then be used to produce Compound 1.
(250) As another example, the method for producing Mogroside III.sub.E can comprise: contacting one or more of Mogroside V, Mogroside II.sub.A, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a pectinase from Aspergillus aculeatus can be encoded by a gene within the recombinant host cell.
Example 4: Production of Mogroside IV.SUB.A
(251) ##STR00027##
(252) As disclosed herein, Mogroside IV.sub.A can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Mogroside IV.sub.A produced from mogroside V can then be used to produce Compound 1.
(253) For example, a method for producing Mogroside IV.sub.A can comprise: contacting Mogroside V with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can also be a β-galactosidase from Aspergillus oryzae, for example.
(254) As another example, the method for producing Mogroside IV.sub.A can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a β-galactosidase from Aspergillus oryzae can be used in the method.
Example 5: Production of Mogroside II.SUB.A
(255) ##STR00028##
(256) As disclosed herein, Mogroside II.sub.A can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, a method for producing Mogroside II.sub.A can comprise: contacting Mogroside I.sub.A1 with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(257) As another example, the method for producing Mogroside II.sub.A can comprise: contacting one or more of Mogroside I.sub.A1, Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Celluclast can also be used.
Example 6: Production of Mogroside III.SUB.A1 from Aromase
(258) ##STR00029##
(259) As disclosed herein, Mogroside III.sub.A1 can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Mogroside III.sub.A1 can be an intermediate to produce mogroside IV.sub.A, which can then be used as an intermediate to produce Compound 1. For example, a method for producing Mogroside III.sub.A1 can comprise contacting Siamenoside I with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can also be Aromase, for example. As another example, the method for producing Mogroside III.sub.A1 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
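By way of non-limiting illustration only, the starting material and the example enzyme named in each of Examples 1-6 can be collected into a simple lookup structure. The sketch below is an assumption-laden summary and not part of the disclosed methods: the names `Route`, `INTERMEDIATE_ROUTES`, and `routes_from` are hypothetical, the `_E`/`_A` suffixes stand in for the `.sub.` subscripts, each pairing of a starting material with an enzyme simply restates what the corresponding example mentions, and "beta-galactosidase" assumes that the enzyme from Aspergillus oryzae in Example 4 is a β-galactosidase.

```python
from typing import Dict, List, NamedTuple


class Route(NamedTuple):
    """Hypothetical record: one starting material and one example enzyme."""
    starting_material: str
    example_enzyme: str


# Keys are the intermediates of Examples 1-6; values restate what each
# example names. This is an illustrative summary, not an exhaustive list.
INTERMEDIATE_ROUTES: Dict[str, Route] = {
    "Siamenoside I": Route("Mogrol", "pectinase from Aspergillus aculeatus"),
    "Mogroside IV_E": Route("Mogroside V", "pectinase from Aspergillus aculeatus"),
    "Mogroside III_E": Route("Mogroside II_A", "pectinase from Aspergillus aculeatus"),
    # Assumption: the Aspergillus oryzae enzyme is a beta-galactosidase.
    "Mogroside IV_A": Route("Mogroside V", "beta-galactosidase from Aspergillus oryzae"),
    "Mogroside II_A": Route("Mogroside I_A1", "Celluclast"),
    "Mogroside III_A1": Route("Siamenoside I", "Aromase"),
}


def routes_from(starting_material: str) -> List[str]:
    """Return, in sorted order, the intermediates listed for a starting material."""
    return sorted(
        product
        for product, route in INTERMEDIATE_ROUTES.items()
        if route.starting_material == starting_material
    )


if __name__ == "__main__":
    print(routes_from("Mogroside V"))
    print(INTERMEDIATE_ROUTES["Mogroside III_A1"].example_enzyme)
```

A table keyed by product keeps each example's pairing in one place; the broader lists of alternative starting materials and enzyme classes recited in each example are intentionally not encoded here.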
Example 7: Production of Compound 3
(260) ##STR00030##
(261) As disclosed herein, Compound 3 can be an intermediate mogroside compound that is produced with Compound 1 disclosed herein. For example, a method for producing Compound 3 can comprise: contacting mogrol with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be cyclomaltodextrin glucanotransferase from Bacillus licheniformis and/or Toruzyme.
(262) As another example, the method for producing Compound 3 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase enzyme can be used.
Example 8: Production of Compound 4
(263) ##STR00031##
(264) As disclosed herein, Compound 4 can be produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 4 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(265) As another example, the method for producing Compound 4 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be cyclomaltodextrin glucanotransferase from Bacillus licheniformis and/or Toruzyme, for example.
Example 9: Production of Compound 5
(266) ##STR00032##
(267) Compound 5 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 5 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(268) As another example, the method for producing Compound 5 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.
Example 10: Production of Compound 6
(269) ##STR00033##
(270) As disclosed herein, Compound 6 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 6 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(271) As another example, the method for producing Compound 6 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.
Example 11: Production of Compound 7
(272) ##STR00034##
(273) As disclosed herein, Compound 7 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 7 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(274) As another example, the method for producing Compound 7 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.
Example 12: Production of Compound 8
(275) ##STR00035##
(276) As disclosed herein, Compound 8 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 8 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(277) As another example, the method for producing Compound 8 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.
Example 13: Production of Compound 9
(278) ##STR00036##
(279) As disclosed herein, Compound 9 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 9 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(280) As another example, the method for producing Compound 9 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.
Example 14: Production of Compound 10
(281) ##STR00037##
(282) As disclosed herein, Compound 10 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 10 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(283) As another example, the method for producing Compound 10 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.
Example 15: Production of Compound 11
(284) ##STR00038##
(285) As disclosed herein, Compound 11 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, a method for producing Compound 11 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E or 11-oxo-Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(286) As another example, the method for producing Compound 11 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a CGTase from Bacillus licheniformis or Toruzyme can be used.
Example 16: Production of Compound 12
(287) ##STR00039##
(288) As disclosed herein, Compound 12 can be an intermediate mogroside compound that can be used in the production of Compound 1 disclosed herein. For example, a method for producing Compound 12 can also lead to the production of Compound 1; the method can comprise contacting a Mogroside VI isomer with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, invertases, and dextranases. The enzyme can be an invertase enzyme from baker's yeast, for example.
(289) As another example, the method for producing Compound 12 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, a Mogroside VI isomer, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, invertases, and dextranases.
Example 17: Production of Compound 13
(290) ##STR00040##
(291) As disclosed herein, Compound 13 can be an intermediate mogroside produced during the production of Compound 1 disclosed herein. For example, the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme expressed can also be Celluclast, for example.
(292) As another example, the method for producing Compound 13 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Celluclast can be used.
Example 18: Production of Compound 14
(293) ##STR00041##
(294) As disclosed herein, Compound 14 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme expressed can also be Celluclast, for example.
(295) As another example, the method for producing Compound 14 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Celluclast can be used. The method can also require the presence of a sugar, such as -lactose, for example.
Example 19: Production of Compound 15
(296) ##STR00042##
(297) As disclosed herein, Compound 15 can be an intermediate mogroside compound that can be used for the production of Compound 1 disclosed herein. For example, the method can comprise contacting mogroside II.sub.A with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(298) As another example, the method for producing Compound 15 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Toruzyme can be used.
Example 20: Production of Compound 16
(299) ##STR00043##
(300) As disclosed herein, Compound 16 can be an intermediate mogroside compound that can be used for the production of Compound 1 disclosed herein. For example, the method can comprise contacting mogroside II.sub.A with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases.
(301) As another example, the method for producing Compound 16 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, Toruzyme can be used.
(302) The enzyme can be Toruzyme, for example. The recombinant cell can further comprise a gene encoding a cyclomaltodextrin glucanotransferase (e.g., Toruzyme), an invertase, or a glucosyltransferase (e.g., UGT76G1), for example.
Example 21: Production of Compound 17
(303) ##STR00044##
(304) As disclosed herein, Compound 17 can be an intermediate mogroside compound for the production of Compound 1 disclosed herein. For example, Compound 17 may be hydrolyzed to produce mogroside III.sub.E, which can then be used to produce Compound 1. For example, a method for producing Compound 17 can comprise: contacting Siamenoside I with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, transglucosidases, sucrose synthases, pectinases, and dextranases. For example, a recombinant cell expressing a UDP glycosyltransferase can be used.
(305) As another example, the method for producing Compound 17 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. For example, a UDP glycosyltransferase can be used.
Example 22: Production of Compound 18
(306) ##STR00045##
(307) As disclosed herein, Compound 18 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. For example, Compound 18 may be hydrolyzed to produce mogroside III.sub.E, which can then be used to produce Compound 1. For example, a method for producing Compound 18 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1, for example.
(308) As another example, the method for producing Compound 18 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example.
Example 23: Production of Compound 19
(309) ##STR00046##
(310) As disclosed herein, Compound 19 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 19 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 19 can also lead to the production of Compound 1; the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1, for example.
(311) As another example, the method for producing Compound 19 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, β-glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example. The enzyme can also be sucrose synthase Sus1, for example.
Example 24: Production of Compound 20
(312) ##STR00047##
(313) As disclosed herein, Compound 20 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 20 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 20 can also lead to the production of Compound 1, the method can comprise contacting Mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1 for example.
(314) As another example, the method for producing Compound 20 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example. The enzyme can also be sucrose synthase Sus1, for example. The enzyme can be sucrose synthase Sus1 and UGT76G1, for example.
Example 25: Production of Compound 21
(315) ##STR00048##
(316) As disclosed herein, Compound 21 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 21 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 21 can also lead to the production of Compound 1, the method can comprise contacting Mogroside IV.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1 for example.
(317) As another example, the method for producing Compound 21 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example. The enzyme can also be sucrose synthase Sus1, for example. The enzymes can be sucrose synthase Sus1 and UGT76G1, for example.
Example 26: Production of Compound 22
(318) ##STR00049##
(319) As disclosed herein, Compound 22 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 22 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 22 can also lead to the production of Compound 1, the method can comprise contacting Mogroside IVA with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzymes can be Sus1 and UGT76G1 for example.
(320) As another example, the method for producing Compound 22 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be UGT76G1, for example. The enzyme can also be sucrose synthase Sus1, for example. The enzymes can be sucrose synthase Sus1 and UGT76G1, for example.
Example 27: Production of Compound 23
(321) ##STR00050##
(322) As disclosed herein, Compound 23 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 23 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Compound 23 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IV.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase, for example.
(323) As another example, the method for producing Compound 23 can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase, for example, which will hydrolyze the hyperglycosylated mogroside IV.sub.E isomers to the desired mogroside V isomer.
Examples 28 and 29: Production of Mogroside II.sub.A1 and Mogroside II.sub.A2 from Fungal Lactase
(324) ##STR00051##
(325) As disclosed herein, Mogroside II.sub.A1 and Mogroside II.sub.A2 can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Mogroside II.sub.A1 and Mogroside II.sub.A2 can be further hydrolyzed to produce Compound 1, for example. For example, a method for producing Mogroside II.sub.A1 and Mogroside II.sub.A2 can also lead to the production of Compound 1; the method can comprise contacting Mogroside IV.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be a lactase from a fungus, for example.
(326) As another example, the method for producing Mogroside II.sub.A1 and Mogroside II.sub.A2 can include: contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases.
Example 30: Production of Mogroside I.sub.A from Viscozyme
(327) ##STR00052##
(328) As disclosed herein, Mogroside I.sub.A can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Mogroside I.sub.A can be further hydrolyzed to produce Compound 1, for example. A method for producing Mogroside I.sub.A can also lead to the production of Compound 1; the method can comprise contacting Mogroside II.sub.A with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be Viscozyme, for example.
(329) As another example, the method for producing Mogroside I.sub.A can comprise: contacting one or more of Mogroside V, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be Viscozyme, for example.
Example 31: Production of Compound 24
(330) ##STR00053##
(331) As disclosed herein, Compound 24 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 24 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compound 24 can also lead to the production of Compound 1; the method can comprise contacting mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.
(332) As another example, the method for producing Compound 24 can comprise: contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, ISO-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.
Example 32: Production of Compound 25
(333) ##STR00054##
(334) As disclosed herein, Compound 25 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 25 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compound 25 can also lead to the production of Compound 1; the method can comprise contacting mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.
(335) As another example, the method for producing Compound 25 can comprise: contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, ISO-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.
Example 33: Production of Compound 26
(336) ##STR00055##
(337) As disclosed herein, Compound 26 can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Compound 26 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compound 26 can also lead to the production of Compound 1; the method can comprise contacting mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.
(338) As another example, the method for producing Compound 26 can include: contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, ISO-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be dextransucrase DexT, for example.
Examples 34 and 35: Production of Mogrol and Mogroside I.sub.E from Pectinase
(339) ##STR00056##
(340) As disclosed herein, Mogrol and Mogroside I.sub.E can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Mogrol can be used as a substrate for producing Mogroside I.sub.A1, which is further hydrolyzed to form Compound 1, and Mogroside I.sub.E can be further hydrolyzed to produce Compound 1, for example. A method for producing Mogrol and Mogroside I.sub.E can also lead to the production of Compound 1; the method can comprise contacting mogroside V with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be a pectinase enzyme from Aspergillus aculeatus, for example.
(341) As another example, the method for producing Mogrol and Mogroside I.sub.E can comprise: contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases.
Example 36: Production of Mogroside IIE
(342) ##STR00057##
(343) As disclosed herein, Mogroside I.sub.E can be an intermediate mogroside compound produced during the production of Compound 1 disclosed herein. Mogroside I.sub.E can be further hydrolyzed to produce Compound 1, for example. A method for producing Mogroside I.sub.E can also lead to the production of Compound 1; the method can comprise contacting mogroside V with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be a pectinase enzyme from Aspergillus aculeatus, for example.
(344) As another example, the method for producing Mogroside I.sub.E can comprise: contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, ISO-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases.
Examples 37 and 38: Production of Compounds 32 and 33
(345) ##STR00058##
(346) As disclosed herein, Compounds 32 and 33 can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Compounds 32 and 33 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compounds 32 and 33 can also lead to the production of Compound 1, the method can comprise contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, ISO-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be pectinase enzyme from Aspergillus aculeatus, for example.
(347) As another example, the method for producing Compound 32 and 33 can comprise: contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases.
Examples 39 and 40: Production of Compounds 34 and 35
(348) ##STR00059##
(349) As disclosed herein, Compounds 34 and 35 can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Compounds 34 and 35 can be further hydrolyzed to produce Compound 1, for example. A method for producing Compounds 34 and 35 can also lead to the production of Compound 1; the method can comprise contacting mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be Celluclast, for example.
(350) As another example, the method for producing Compounds 34 and 35 can comprise: contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases.
Examples 41 and 42: Production of Mogroside III.sub.A2 and Mogroside III
(351) ##STR00060##
(352) As disclosed herein, Mogroside III.sub.A2 and Mogroside III can be intermediate mogroside compounds produced during the production of Compound 1 disclosed herein. Mogroside III.sub.A2 and Mogroside III can be further hydrolyzed to produce Compound 1, for example.
(353) For example, Mogroside III.sub.A2 and Mogroside III can also be contacted with a UGT to form Mogroside IVA, another mogroside compound that can be used to make Mogroside IIIE, which is further hydrolyzed to form Compound 1.
(354) A method for producing Mogroside III.sub.A2 and Mogroside III can also lead to the production of Compound 1; the method can comprise contacting mogroside III.sub.E with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases. The enzyme can be Celluclast, for example.
(355) As another example, the method for producing Mogroside III.sub.A2 and Mogroside III can comprise: contacting one or more of Mogroside V, Mogroside IV.sub.E, Siamenoside I, Mogroside IV.sub.E, Iso-mogroside V, Mogroside III.sub.E, 11-Deoxy-mogroside V, 11-Oxo-mogroside V, Mogroside VI, Mogroside IV.sub.A, Mogroside II.sub.A, Mogroside II.sub.A1, Mogroside II.sub.A2, Mogroside I.sub.A, 11-oxo-Mogroside VI, 11-oxo-Mogroside III.sub.E, 11-oxo-Mogroside IV.sub.E, Mogroside I.sub.E, Mogrol, 11-oxo-mogrol, Mogroside II.sub.E, Mogroside III.sub.A2, and Mogroside III with a recombinant host cell expressing one or more of UDP glycosyltransferases, CGTases, glycotransferases, dextransucrases, cellulases, -glucosidases, amylases, transglucosidases, pectinases, and dextranases.
Example 43: Use of CGT-SL Enzyme to Produce Compound 1
(356) In a 1 ml reaction volume, 5 mg of Mogroside III.sub.E, 50 mg of soluble starch, 0.1 M NaOAc pH 5.0, 125 µl of CGT-SL enzyme (from Geobacillus thermophilus), and water were mixed with a stir bar and incubated at 50° C. Time point samples were taken for HPLC.
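The recipe in this example can be restated as final concentrations. The following is a minimal sketch of that arithmetic only, assuming the stated 1 ml is the final volume after water is added; the function and variable names are illustrative and are not part of the disclosure.

```python
# Illustrative bookkeeping for the 1 ml CGT-SL reaction described above.
# Assumption: 1 ml is the final reaction volume after water is added.

def percent_w_v(mass_mg: float, volume_ml: float) -> float:
    """Weight/volume percent: grams of solute per 100 ml of solution."""
    return (mass_mg / 1000.0) / volume_ml * 100.0


def percent_v_v(part_ul: float, volume_ml: float) -> float:
    """Volume/volume percent: ml of component per 100 ml of solution."""
    return (part_ul / 1000.0) / volume_ml * 100.0


REACTION_ML = 1.0

substrate_mg_per_ml = 5.0 / REACTION_ML        # Mogroside IIIE
starch_pct = percent_w_v(50.0, REACTION_ML)    # soluble starch (glucosyl donor)
enzyme_pct = percent_v_v(125.0, REACTION_ML)   # CGT-SL enzyme preparation

print(f"Mogroside IIIE: {substrate_mg_per_ml:.1f} mg/ml")
print(f"Soluble starch: {starch_pct:.1f}% w/v")
print(f"Enzyme:         {enzyme_pct:.1f}% v/v")
```

On these figures the starch donor is present at ten times the mass of the mogroside acceptor.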
(357) HPLC Data: Mass spec of Compound 1 production as shown in
Example 44: Cloning: Gene Encoding for Dextransucrase Enzyme was PCR Amplified from Leuconostoc citreum ATCC11449 and Cloned into pET23a
(358) Growth conditions: BL21 Codon Plus RIL strain was grown in 2YT at 37° C., 250 rpm until an OD600 of 1. Lactose (10 mM) was added for induction, and the culture was incubated at room temperature, 150 rpm overnight. Crude extract used for the reaction was obtained either by sonication or osmotic shock.
(359) In some embodiments, the dextransucrase comprises, or consists of, an amino acid sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156, and 896. In some embodiments, the DexT can comprise an amino acid sequence set forth in SEQ ID NO: 103. In some embodiments, the DexT is encoded by a nucleic acid sequence set forth in SEQ ID NO: 104 or 105.
Example 45: Reaction of Mogroside III.sub.E with S. mutans Clarke ATCC25175 Dextransucrase to Produce Compound 1
(360) Growth conditions: The strain indicated above was grown anaerobically with glucose supplementation as indicated in Wenham, Hennessey and Cole (1979) to stimulate dextransucrase production. 5 mg/ml Mogroside III.sub.E was added to the growth media. Time point samples were taken for HPLC. HPLC Data is presented as mass spec of Compound 1 production in
Example 46: Reaction of Mogroside IIIE with CGTase
(361) In a 1 ml reaction volume, 5 mg of Mogroside IIIE, 50 mg of soluble starch, 0.1 M NaOAc pH 5.0, 125 µl of enzyme, and water were mixed with a stir bar and incubated at 50° C. Time point samples were taken for HPLC. The enzyme used was CGTase. The production of Compound 1 is seen in the HPLC data and mass spectroscopy data as shown in
Example 47: Reaction of Mogroside IIIE with Celluclast
(362) Celluclast xylosylation was performed on Mogroside IIIE with Celluclast from the native host, Trichoderma reesei.
(363) Reaction conditions: 5 mg of Mogroside IIIE, 100 mg xylan, and 50 µl Celluclast were mixed in a total volume of 1 ml with 0.1 M sodium acetate pH 5.0 and incubated at 50° C. with stirring. Time point samples were taken for HPLC.
(364) Xylosylated product is highlighted in
Example 48: Glycosyltransferases (Maltotriosyl Transferase) (Native Host: Geobacillus sp. APC9669)
(365) In this example, glycosyltransferase AGY15763.1 (Amano Enzyme U.S.A. Co., Ltd., Elgin, Ill.; SEQ ID NO: 434, see Table 1) was used. 20 ml d water, 0.6 ml 0.5 M MES pH 6.5, 6 g soluble starch, 150 mg Mogroside IIIE, and 3 ml enzyme were added to a 40 ml flat-bottom screw cap vial. The vial was sealed with a black cap, incubated at 30° C., and stirred at 500 rpm using a magnetic bar. Three more identical reactions were set up, for a total of 600 mg Mogroside IIIE used as starting material. The reaction was stopped after 24 hours. Insoluble starch was removed by centrifugation (4000 rpm for 5 min, Eppendorf). The supernatant was heated to 80° C. for 30 minutes with stirring (500 rpm), followed by centrifugation (4000 rpm for 10 min, Eppendorf). The supernatant was filtered through a 250 ml, 0.22 micron PES filter and checked by LC-MS (Sweet Naturals 2016-Enzymatic_2016Q4_A.SPL, line 1254) to obtain HPLC data
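The quantities in this example can be checked with simple arithmetic. The sketch below is illustrative only: it sums the liquid volumes per vial, derives the final MES and substrate concentrations, and totals the substrate across the four vials. It assumes the 6 g of starch does not change the liquid volume, which is a simplification, and all names are hypothetical.

```python
# Illustrative arithmetic for the four-vial maltotriosyl transferase reaction.
# Assumption: the 6 g of starch is ignored when computing liquid volume.

VIAL_LIQUIDS_ML = {"water": 20.0, "mes_stock": 0.6, "enzyme": 3.0}
MES_STOCK_M = 0.5            # 0.5 M MES pH 6.5 stock
SUBSTRATE_MG_PER_VIAL = 150.0
N_VIALS = 4                  # one reaction plus three identical repeats


def final_molarity(stock_m: float, stock_ml: float, total_ml: float) -> float:
    """Dilution rule C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock_m * stock_ml / total_ml


total_ml = sum(VIAL_LIQUIDS_ML.values())
mes_mM = 1000.0 * final_molarity(MES_STOCK_M, VIAL_LIQUIDS_ML["mes_stock"], total_ml)
substrate_mg_per_ml = SUBSTRATE_MG_PER_VIAL / total_ml
total_substrate_mg = SUBSTRATE_MG_PER_VIAL * N_VIALS

print(f"Liquid volume per vial: {total_ml:.1f} ml")
print(f"Final MES:              {mes_mM:.1f} mM")
print(f"Substrate per vial:     {substrate_mg_per_ml:.2f} mg/ml")
print(f"Total substrate:        {total_substrate_mg:.0f} mg")
```

The total of 600 mg matches the amount of starting material stated for the four reactions.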
(366) The AGY15763.1 protein (SEQ ID NO: 434) can be encoded by the native gDNA (SEQ ID NO: 437) or a codon-optimized (for E. coli) DNA sequence (SEQ ID NO: 438).
(367) An example of an additional glycosyltransferase expected to perform similarly is the UGT76G1 protein from Stevia rebaudiana (SEQ ID NO: 439), which can be expressed in E. coli. The native coding sequence for UGT76G1 (SEQ ID NO: 439) is provided in SEQ ID NO: 440.
Example 49: UDP-Glycosyltransferases UGT73C5 in the Presence of Mogrol
(368) Mogrol was reacted with UDP-glycosyltransferases, which produced Mogroside I and Mogroside II. 1 mg/ml of Mogrol was reacted with 200 µl crude extract containing UGT73C5 (A. thaliana) (334), 2 µl crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, and 0.5 mg/ml spectinomycin in 0.1 M Tris-HCl pH 7.0, incubated at 30° C. Samples were taken after 2 days for HPLC. The reaction converted Mogrol to Mogroside I and Mogroside II as shown in
(369) The protein sequence of UGT73C5 is shown in SEQ ID NO: 441, the native DNA coding sequence for UGT73C5 (SEQ ID NO: 441) is shown in SEQ ID NO: 442, and the UGT73C5 coding sequence (Codon optimized for E. coli) is shown in SEQ ID NO: 443.
Example 50: UDP-Glycosyltransferases (UGT73C6) in the Presence of Mogrol to Produce Mogroside I
(370) Reaction conditions: 1 mg/ml of Mogrol was reacted with 200 µl crude extract containing UGT73C6, 2 µl crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, and 0.5 mg/ml spectinomycin in 0.1 M Tris-HCl pH 7.5, incubated at 30° C. Samples were taken after 2 days for HPLC. The reaction product was Mogroside I from Mogrol. As shown in the HPLC data and Mass spectroscopy data of
(371) The protein and gDNA sequence encoding A. thaliana UGT73C6 is shown in SEQ ID NO: 444 and SEQ ID NO: 445, respectively.
Example 51: UDP-Glycosyltransferases (338) in the Presence of Mogrol to Produce Mogroside I, Mogroside IIA and Two Different Mogroside III Products
(372) Reaction conditions: 1 mg/ml of Mogrol or Mogroside IIA was reacted with 200 µl crude extract containing 338, 2 µl crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, and 0.5 mg/ml spectinomycin in 0.1 M Tris-HCl pH 8.5, incubated at 30° C. Samples were taken after 2 days for HPLC.
(373) Reaction of Mogrol with Bacillus sp. UDP-glycosyltransferase (338) (described in Pandey et al., 2014; incorporated by reference in its entirety herein) led to the reaction products Mogroside I, Mogroside IIA, and 2 different Mogroside III products.
(374) The protein and gDNA sequence encoding UGT 338 is provided in SEQ ID NO: 405 and SEQ ID NO: 406, respectively.
Example 52: UDP-Glycosyltransferases (301 (UGT98)) in the Presence of Mogroside IIIE to Produce Siamenoside I and Mogroside V
(375) Reaction conditions: 1 mg/ml of Mogroside IIIE was reacted with 200 µl crude extract containing 301, 2 µl crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, and 0.5 mg/ml spectinomycin in 0.1 M Tris-HCl pH 7.0, incubated at 30° C. Samples were taken after 2 days for HPLC and mass spec analysis. The reaction products from Mogroside IIIE were Siamenoside I and Mogroside V as shown in
(376) The protein and gDNA sequence encoding S. grosvenorii 301 UGT98 is provided in SEQ ID NO: 407 and SEQ ID NO: 408, respectively.
Example 53: UDP-Glycosyltransferases (339) in the Presence of Mogrol, Siamenoside I or Compound 1 to Produce Mogroside I from Mogrol, Isomogroside V from Siamenoside I and Compound 1 Derivative from Compound 1
(377) Reaction conditions: 1 mg/ml of Mogrol, Siamenoside I or Compound 1 was reacted with 200 µl crude extract containing 339 (described in Itkin et al., incorporated by reference in its entirety herein), 2 µl crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, and 0.5 mg/ml spectinomycin in 0.1 M Tris-HCl pH 7.0, incubated at 30° C. Samples were taken after 2 days for HPLC.
(378) Mogrol led to Mogroside I, Siamenoside I led to Isomogroside V, and Compound 1 led to a Compound 1 derivative (
(379) The protein and DNA sequence encoding S. grosvenorii UGT339 is provided in SEQ ID NO: 409 and SEQ ID NO: 410, respectively.
Example 54: UDP-Glycosyltransferases (330) in the Presence of Mogroside IIA, Mogroside IIE, Mogroside IIIE, Mogroside IVA, or Mogroside IVE to Produce Mogroside IIIA, Mogroside IVE, and Mogroside V
(380) As described herein, the use of UDP-glycosyltransferase (330) as described in Noguchi et al. 2008 (incorporated by reference in its entirety herein) led to the reaction products Mogroside IIIA, Mogroside IVE, and Mogroside V. The native host is Sesamum indicum, and the production host was SF9. For the reaction, 1 mg/ml of Mogroside IIA, Mogroside IIE, Mogroside IIIE, Mogroside IVA, or Mogroside IVE was reacted with 200 µl crude extract containing 330, 2 µl crude extract containing sucrose synthase, 5 mM UDP, 1× M221 protease inhibitor, 200 mM sucrose, and 0.5 mg/ml spectinomycin in 0.1 M Tris-HCl pH 7.0, incubated at 30° C. Samples were taken after 2 days for HPLC.
(381) The reaction led to surprising products such as Mogroside IIIA, Mogroside IVE, and Mogroside V. As shown in
(382) The protein and gDNA sequence encoding the S. grosvenorii UGT330 protein is provided in SEQ ID NO: 411 and SEQ ID NO: 412, respectively.
Example 55: UDP-Glycosyltransferases (328) (Described in Itkin et al.) in the Presence of Mogroside IIA, Mogroside IIE, Mogroside IIIE, Mogroside IVA, or Mogroside IVE to Produce Mogroside IIIA, Mogroside IVE, and Mogroside V
(383) Reaction conditions: 1 mg/ml of Mogroside III.sub.E was reacted with 200 ul crude extract containing 330, 2 ul crude extract containing sucrose synthase, 5 mM UDP, 1x M221 protease inhibitor, 200 mM sucrose, 0.5 mg/ml spectinomycin, in 0.1M Tris-HCl pH7.0, incubated at 30 C. Samples were taken after 2 days for HPLC.
(384) The reaction products were Mogroside IVE and Mogroside V. As shown in
(385) The AtSus1 sucrose synthase protein sequence and the gDNA sequence encoding the AtSus1 protein are provided in SEQ ID NOs: 415 and 416, respectively.
Example 56: Mogrol Production in Yeast
(386) DNA was obtained through gene synthesis from either GenScript or IDT. For some of the cucurbitadienol synthases, cDNA or genomic DNA was obtained from 10-60 day old seedlings, followed by PCR amplification using specific and degenerate primers. DNA was cloned through standard molecular biology techniques into one of the following overexpression vectors: pESC-Ura, pESC-His, or pESC-LEU. Saccharomyces cerevisiae strain YHR072 (heterozygous for erg7) was purchased from GE Dharmacon. Plasmids (pESC vectors) containing Mogrol synthesis genes were transformed or co-transformed using Zymo Yeast Transformation Kit II. Strains were grown at 30 C, 220 rpm in standard media (YPD or SC) containing the appropriate selection, with 2% glucose or, for induction of heterologous genes, 2% galactose. When indicated, the lanosterol synthase inhibitor Ro 48-8071 (Cayman Chemicals) was added (50 ug/ml). Samples of mogrol and precursors produced by the yeast were prepared after 2 days of induction by lysis (Yeast Buster), ethyl acetate extraction, drying, and resuspension in methanol. Samples were analyzed by HPLC.
(387) Production of cucurbitadienol was catalyzed by the S. grosvenorii cucurbitadienol synthase SgCbQ in growth conditions with no inhibitor.
(388) Production of cucurbitadienol is shown in the HPLC and mass spectroscopy data which show mass peaks for the indicated product (
(389) Cpep2 was also used for the production of cucurbitadienol in yeast. As shown in
(390) Cucurbita pepo (Jack O' Lantern) Cpep4 was also used in the production of cucurbitadienol under growth conditions with no inhibitor. Production of cucurbitadienol is shown in the mass spectral data shown in
(391) A putative cucurbitadienol synthase protein sequence representing Cmax was obtained from native host Cucurbita maxima. The deduced coding DNA sequence will be used for gene synthesis and expression. The cucurbitadienol synthase protein sequence and the DNA sequence encoding it are shown below:
(392) Proteins and DNA coding sequences below were obtained through alignment of genomic DNA PCR product sequence with known cucurbitadienol synthase sequences available through public databases (Pubmed). It is expected that any one of these Cmax proteins may be used in the methods, systems, compositions (e.g., host cells) disclosed herein to produce Compound 1. A non-limiting exemplary Cmax protein is Cmax1 (protein) (SEQ ID NO: 424) encoded by Cmax1 (DNA) (SEQ ID NO: 425).
(393) A putative cucurbitadienol synthase protein sequence representing Cmos1 was obtained from native host Cucurbita moschata. The deduced coding DNA sequence is used for gene synthesis and expression. Protein(s) and DNA coding sequence(s) shown below were obtained through alignment of genomic DNA PCR product sequence with known cucurbitadienol synthase sequences available through public databases (Pubmed). Any one of these Cmos proteins may be used in the methods, systems, compositions (e.g., host cells) disclosed herein to produce Compound 1. A non-limiting exemplary Cmos1 protein is Cmos1 (protein) (SEQ ID NO: 426) encoded by Cmos1 (DNA) (SEQ ID NO: 427).
Example 57: Production of Dihydroxycucurbitadienol in Yeast (Cucurbitadienol Synthase & Epoxide Hydrolase)
(394) The production of dihydroxycucurbitadienol in yeast was considered using cucurbitadienol synthase & epoxide hydrolase. The native host for these enzymes is S. grosvenorii.
(395) Growth conditions: SgCbQ was co-expressed with an epoxide hydrolase (EPH) in the presence of lanosterol synthase inhibitor.
(396) Possible dihydroxycucurbitadienol product is shown in
(397) The EPH protein sequence and a DNA sequence encoding the EPH protein (codon optimized for S. cerevisiae) are provided in SEQ ID NOs: 428 and 429, respectively.
Example 58: Production of Mogrol from Cucurbitadienol Synthase, Epoxide Hydrolase, Cytochrome P450 and Cytochrome P450 Reductase
(398) Four enzymes, namely cucurbitadienol synthase, epoxide hydrolase, cytochrome P450, and cytochrome P450 reductase, are co-expressed in S. cerevisiae. For the growth conditions, SgCbQ, EPH, CYP87D18 and AtCPR (cytochrome P450 reductase from A. thaliana) are co-expressed in the presence of lanosterol synthase inhibitor. Production of mogrol by S. cerevisiae is expected. The protein and DNA sequences for CYP87D18 and AtCPR are: CYP87D18 (protein) (SEQ ID NO: 430), and CYP87D18 (DNA) (SEQ ID NO: 431); and AtCPR (protein) (SEQ ID NO: 432), and AtCPR (DNA) (SEQ ID NO: 433).
Example 59: Compound 1 is Tolerant to Microbial Hydrolysis
(399) Yeast strains Saccharomyces cerevisiae, Yarrowia lipolytica and Candida bombicola, were incubated in YPD supplemented with 1 mg/ml Mogroside V or Compound 1. After 3 days, supernatants were analyzed by HPLC.
(400) As shown in the HPLC data, epoxide hydrolase hydrolyzed Mogroside V to Mogroside IIIE. No hydrolysis products were observed with Compound 1 (
Example 60: Streptococcus mutans Clarke ATCC 25175 Dextransucrase
(401) Streptococcus mutans Clarke can be grown anaerobically with glucose supplementation. An example of growth conditions can be found in Wenham, Henessey and Cole (1979), in which the method is used to stimulate dextransucrase production. Mogroside IIIE was added to the growth media at 5 mg/ml. Time point samples can be taken for HPLC to monitor production. Sequences for various dextransucrases can be found in Table 1, which includes protein sequences for dextransucrases and nucleic acid sequences that encode dextransucrases (for example, SEQ ID NOs: 157-162). In some embodiments, the dextransucrase comprises, or consists of, an amino acid sequence of any one of SEQ ID NOs: 2, 103, 106-110, 156, and 896. In some embodiments, the DexT can comprise an amino acid sequence set forth in SEQ ID NO: 103. In some embodiments, the DexT comprises a nucleic acid sequence set forth in SEQ ID NO: 104 or 105. In some embodiments herein, the recombinant cell encodes a protein comprising the sequence set forth in any one of SEQ ID NO: 156-162 and/or comprises a nucleic acid encoding dextransucrase comprising a nucleic acid sequence set forth in any one of SEQ ID NOs: 157-162. This example is used to produce Compound 1.
Example 61: 90% Pure Compound 1 Production Procedure and Sensory Evaluation
(402) A fraction containing the mixture of 3 -mogroside isomers is obtained by treating mogroside III.sub.E (MIII.sub.E) in a dextransucrase/dextranase enzyme reaction followed by SPE fractionation. Based on UPLC analysis, this mixture has 3 isomers, 11-oxo-Compound 1, Compound 1 and mogroside V isomer, in a 5:90:5 ratio, respectively. These 3 isomers are characterized from the purification of a different fraction/source by LC-MS and 1D and 2D NMR spectra, and by comparison with closely related isomers in the mogroside series reported in the literature. This sample is further evaluated in a sensory test by comparing it with a pure Compound 1 sample using a triangle test.
(403) Enzyme Reaction and Purification Procedure
(404) 100 mL of pH 5.5 1M sodium acetate buffer, 200 g sucrose, 100 mL dextransucrase DexT (1 mg/ml crude extract, pET23a, BL21-Codon Plus-RIL, grown in 2YT), 12.5 g of Mogroside III.sub.E and 600 mL water were added to a 2.8 L shake flask, and the flask was shaken at 30 C., 200 rpm. The progress of the reaction was monitored periodically by LC-MS. After 72 hours, the reaction was treated with 2.5 mL of dextranase (Amano) and shaking was continued at 30 C. After 24 hours, the reaction mixture was quenched by heating at 80 C. and centrifuged at 5000 rpm for 5 minutes. The supernatant was filtered, loaded directly onto a 400 g C18 SPE column, and fractionated using a MeOH:H.sub.2O 5/25/50/75/100 step-gradient. Each step in the gradient was collected in 6 jars, with 225 mL in each jar. The desired products eluted in the second jar of the 75% MeOH fraction (SPE 75_2), which was dried under reduced pressure. The residue was re-suspended/dissolved in 7 mL of H.sub.2O, frozen, and lyophilized for 3 days to give 1.45 g of a white solid.
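The fractionation scheme in paragraph (404) can be sketched in a few lines of Python for bookkeeping purposes. This is only an illustration under stated assumptions: the label pattern "SPE <percent>_<jar>" is extrapolated from the single label "SPE 75_2" given above, and the mass recovery is a simple ratio of isolated solid to Mogroside III.sub.E input, not a molar yield.

```python
# Step-gradient bookkeeping for the C18 SPE fractionation described above.
METHANOL_STEPS_PERCENT = [5, 25, 50, 75, 100]  # MeOH:H2O steps from the text
JARS_PER_STEP = 6                               # from the text
JAR_VOLUME_ML = 225                             # from the text


def fraction_labels():
    """Return labels in elution order; the naming pattern is an assumption."""
    return [
        f"SPE {percent}_{jar}"
        for percent in METHANOL_STEPS_PERCENT
        for jar in range(1, JARS_PER_STEP + 1)
    ]


labels = fraction_labels()
total_elution_volume_ml = len(labels) * JAR_VOLUME_ML

# Position of the product-containing jar and the volume eluted before it.
product_index = labels.index("SPE 75_2")
volume_before_product_ml = product_index * JAR_VOLUME_ML

# Mass of isolated white solid relative to the Mogroside IIIE charged.
mass_recovery_percent = 1.45 / 12.5 * 100

print(len(labels), total_elution_volume_ml, volume_before_product_ml)
print(f"mass recovery: {mass_recovery_percent:.1f}%")
```

Because Compound 1 carries more glucose units than Mogroside III.sub.E, the mass ratio printed here should not be read as a conversion efficiency.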
(405) As per the UPLC analysis (
(406) Sensory Evaluation
(407) Triangle testing for pure Compound 1 vs. 90% pure Compound 1 was performed on Nov. 10, 2016. Two different compositions: (1) LSB+175 ppm pure Compound 1 (standard) and (2) LSB+175 ppm 90% pure Compound 1 were tested. All samples of compositions were made with Low Sodium Buffer (LSB) pH 7.1 and contain 0% ethanol.
(408) Conclusions: Panelists found that composition (1) LSB+175 ppm pure Compound 1 (standard) was not significantly different from composition (2) LSB+175 ppm 90% pure Compound 1 (test) (p>0.05). Some of the sensory and analytical results are shown in Tables 2-4.
(409) TABLE 2. Frequency of panelists that correctly selected the different sample. n = 38 (19 panelists x 2 reps).
Samples | Total
Incorrect | 24
Correct | 14
Total | 38
Correct Sample Selected (p-value) | 0.381
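The p-value in Table 2 can be cross-checked with a short calculation. The patent does not state which statistical test was used; the sketch below assumes the usual analysis for a triangle test, namely a one-sided exact binomial test in which a panelist who cannot perceive a difference picks the odd sample with probability 1/3.

```python
from math import comb


def triangle_test_p_value(n_trials, n_correct, p_guess=1.0 / 3.0):
    """One-sided exact binomial p-value: P(X >= n_correct) under pure guessing."""
    return sum(
        comb(n_trials, k) * p_guess**k * (1.0 - p_guess) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Counts taken from Table 2: 14 correct selections out of 38 evaluations.
p_value = triangle_test_p_value(n_trials=38, n_correct=14)
print(f"p = {p_value:.3f}")
```

Under these assumptions the result comes out near 0.38, in line with the tabulated value and well above the 0.05 threshold cited in the conclusions.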
(410) TABLE 3. Analytical results: test day.
Theoretical # (uM) | Observed (uM)
175 ppm (155.51 uM) pure_compound 1 (standard) | 132.20 ± 1.54 (n = 2)
175 ppm (155.56 uM) 90%_pure_compound 1 (test) | 157.62 ± 0.63 (n = 2)
(411) TABLE 4. Analytical results: the day before the testing day.
Theoretical # (uM) | Observed (uM)
175 ppm (155.51 uM) pure_compound 1 (standard) | 134.48 ± 7.31 (n = 2)
175 ppm (155.56 uM) 90%_pure_compound 1 (test) | 140.69 ± 4.34 (n = 2)
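The relationship between the ppm and micromolar figures in Tables 3 and 4 can be illustrated as follows. This is a sketch under two assumptions that are not stated in the text: that ppm means mg/L in the dilute aqueous buffer, and that the molar mass is about 1125.3 g/mol, a value back-calculated here from the tabulated pair 175 ppm = 155.51 uM.

```python
ASSUMED_MOLAR_MASS_G_PER_MOL = 1125.3  # assumption: back-calculated from the table


def ppm_to_micromolar(ppm, molar_mass_g_per_mol):
    """Convert mg/L (taken as ppm in dilute buffer) to umol/L."""
    return ppm / molar_mass_g_per_mol * 1000.0


def percent_of_theoretical(observed_um, theoretical_um):
    """Express a measured concentration as a percentage of the target."""
    return observed_um / theoretical_um * 100.0


theoretical_um = ppm_to_micromolar(175.0, ASSUMED_MOLAR_MASS_G_PER_MOL)

# Observed test-day mean for the standard sample, taken from Table 3.
standard_recovery = percent_of_theoretical(132.20, theoretical_um)

print(f"theoretical: {theoretical_um:.2f} uM")
print(f"standard, test day: {standard_recovery:.1f}% of theoretical")
```

The same helper can be applied to the other rows of Tables 3 and 4 to compare the standard and test samples on a common scale.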
Example 62: Gene Expression in Recombinant Yeast Cells
(412) DNA was obtained through gene synthesis from GenScript, IDT, or Genewiz. For some of the cucurbitadienol synthases, cDNA or genomic DNA was obtained from 10-60 day old seedlings, followed by PCR amplification using specific and degenerate primers. DNA was cloned through standard molecular biology techniques or through yeast gap repair cloning (Joska et al., 2014) into one of the following overexpression vectors: pESC-Ura, pESC-His, or pESC-LEU. Gene expression was regulated by one of the following promoters: Gal1, Gal10, Tef1, or GDS. Yeast transformation was performed using Zymo Yeast Transformation Kit II. Yeast strains were grown in standard media (YPD or SC) containing the appropriate selection, with 2% glucose or, for induction of heterologous genes, 2% galactose. Yeast strains were grown in shake flasks or 96 well plates at 30 C., 140-250 rpm. When indicated, the lanosterol synthase inhibitor Ro 48-8071 (Cayman Chemicals) was added (50 ug/ml). Samples of mogrol and precursors produced by the yeast were prepared by lysis (Yeast Buster), ethyl acetate extraction, drying, and resuspension in methanol. Samples were analyzed by the LCMS methods described below using an A/B gradient (A=H.sub.2O, B=acetonitrile):
(413) For analyzing diepoxysqualene, the LCMS method included the use of a C18 2.1 x 50 mm column, 5% B for 1.5 min, gradient 5% to 95% B for 5.5 min, 95% B for 6 min, 100% B for 3 min, 5% B for 1.5 min, all at a flow rate of 0.3 ml/min.
(414) For analyzing cucurbitadienol, the first LCMS method included the use of a C4 2.1 x 100 mm column, gradient 1 to 95% B for 6 minutes, at a flow rate of 0.55 ml/min; and the second LCMS method included the use of a Waters Acquity UPLC Protein BEH C4 2.1 x 100 mm, 1.7 um column, with guard, 62 to 67% B for 2 min, 100% B for 1 min, at a flow rate of 0.9 ml/min.
(415) For analyzing 11-OH cucurbitadienol, the LCMS method included the use of a C8 2.1 x 100 mm column, gradient 60 to 90% B for 6 minutes, at a flow rate of 0.55 ml/min.
(416) For analyzing Mogrol, the LCMS method included the use of a C8 2.1 x 100 mm column, gradient 50 to 90% B for 6 minutes, at a flow rate of 0.55 ml/min.
(417) For analyzing Mogroside III.sub.E & Compound 1, the LCMS method included the use of a Fluoro-phenyl 2.1 x 100 mm column, gradient 15 to 30% B for 6 minutes, at a flow rate of 0.55 ml/min.
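For readers setting up comparable methods, the single-segment gradients above can be represented as simple records. The sketch below assumes the gradients are linear between the stated start and end compositions; the text gives only the endpoints, duration and flow rate, so the linear shape and the class name are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearGradient:
    """One gradient segment: %B endpoints, duration and flow rate."""

    start_percent_b: float
    end_percent_b: float
    duration_min: float
    flow_ml_per_min: float

    def percent_b(self, t_min):
        """Mobile-phase composition (%B) at time t, assuming a linear ramp."""
        if not 0.0 <= t_min <= self.duration_min:
            raise ValueError("time is outside the gradient segment")
        fraction = t_min / self.duration_min
        return self.start_percent_b + fraction * (self.end_percent_b - self.start_percent_b)

    def solvent_b_volume_ml(self):
        """Volume of solvent B consumed over the segment (mean %B x flow x time)."""
        mean_fraction_b = (self.start_percent_b + self.end_percent_b) / 2.0 / 100.0
        return mean_fraction_b * self.flow_ml_per_min * self.duration_min


# Mogrol method from paragraph (416): 50 to 90% B over 6 min at 0.55 ml/min.
mogrol_gradient = LinearGradient(50.0, 90.0, 6.0, 0.55)
midpoint_percent_b = mogrol_gradient.percent_b(3.0)
acetonitrile_ml = mogrol_gradient.solvent_b_volume_ml()

print(midpoint_percent_b, round(acetonitrile_ml, 2))
```

The multi-step diepoxysqualene method in paragraph (413) could be modeled as a list of such segments, with isocratic holds entered as segments whose start and end compositions are equal.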
Example 63
(418) Step 1. Boosting Oxidosqualene Availability
(419) Saccharomyces cerevisiae strain YHR072 (heterozygous for lanosterol synthase erg7) was purchased from GE Dharmacon. Expression of the active erg7 gene was reduced by replacing its promoter with that of cup1 (Peng et al., 2015). A truncated yeast HMG-CoA reductase (tHMG-CoA) under the control of the GDS promoter and yeast squalene epoxidase (erg1) under the control of the Tef1 promoter were integrated into the genome. Oxidosqualene boost was monitored by the production of diepoxysqualene as shown in the HPLC and UV absorbance (
(420) In some embodiments, tHMG-CoA enzyme is used for the production of diepoxysqualene.
(421) Genes encoding putative squalene epoxidases in S. grosvenorii (Itkin et al., 2016) were selected to test for boosting oxidosqualene/diepoxysqualene production. The amino acid sequences and coding sequences of 3 squalene epoxidases can be found in Table 1 (SEQ ID NO: 50-56, 60, 61, 334 or 335). Additional sequences for squalene epoxidases suitable for use in the methods, systems and compositions disclosed herein for producing oxidosqualene and/or diepoxysqualene, and for boosting the production of oxidosqualene and/or diepoxysqualene, include: SQE1 (protein) SEQ ID NO: 908, SQE1 (DNA) SEQ ID NO: 909; SQE2 (protein) SEQ ID NO: 910, SQE2 (DNA) SEQ ID NO: 911; SQE3 (protein) SEQ ID NO: 912, and SQE3 (DNA) SEQ ID NO: 913.
(422) Step 2. Cucurbitadienol Production
(423) Cucurbitadienol Synthase Enzymes
(424) Plasmids containing the S. grosvenorii cucurbitadienol synthase gene (SgCbQ) were transformed into the yeast strain with oxidosqualene boost. Strains were grown 1-3 days at 30 C, 150-250 rpm. Production of cucurbitadienol is shown in the HPLC and mass spectroscopy data which show mass peaks for the indicated product (
(425) Converting Other Oxidosqualene Cyclases into a Cucurbitadienol Synthase
(426) Plasmids containing modified oxidosqualene cyclase genes were transformed into the yeast strain with oxidosqualene boost. Strains were grown 1-3 days at 30 C, 150-250 rpm.
(427) The protein PSX Y118L from the native host Pisum sativum was also used for the production of cucurbitadienol in yeast.
(428) The oxidosqualene cyclase from Dictyostelium sp. was also used for the production of cucurbitadienol in yeast. As shown in
(429) Improving Cucurbitadienol Synthase Activities
(430) The gene encoding a cucurbitadienol synthase from Cucumis melo was codon optimized (SEQ ID NO: 907) and used as a starting point for generating a library of modifications. Modifications, consisting of fusion peptides at the N-terminus (i.e., 5') or C-terminus (i.e., 3') end of the enzyme, were introduced through standard molecular biology techniques. Plasmid libraries of modified cucurbitadienol synthase genes were transformed into a yeast strain with oxidosqualene boost. Enzyme activities were measured as ratios of peak heights or areas of the 409 and 427 positive mass fragments at the expected retention times for cucurbitadienol vs. an internal standard, using LCMS method 2 described above. Enzyme performance was scored as the average % activity relative to the average activity of the parent enzyme (n=8). Step 1 sequences of the enzymes and the sequences that encode the enzymes can be found in SEQ ID NOs: 951-1012. Step 1 sequences also include the fusions SS2c-G10, SS2e-A7b, SS2d-G11, SS2e-A7a, SS4d-G5, SS4d-C7, SS3b-D8, and SS2c-A10a as described in Table 1.
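The scoring described in paragraph (430) can be sketched as below. The text does not say how the 409 and 427 fragment signals are combined, so summing their peak areas is an assumption made for illustration, as are the function names and the replicate values, which are placeholders rather than measured data.

```python
from statistics import mean


def normalized_signal(area_409, area_427, internal_standard_area):
    """Product signal relative to the internal standard.

    Summing the two fragment areas is an assumption, not stated in the text.
    """
    return (area_409 + area_427) / internal_standard_area


def percent_activity(variant_signals, parent_signals):
    """Average variant signal as a percentage of the average parent signal."""
    return 100.0 * mean(variant_signals) / mean(parent_signals)


# Placeholder replicates (n = 8 each); these numbers are illustrative only.
parent = [normalized_signal(50.0, 50.0, 100.0) for _ in range(8)]
variant = [normalized_signal(70.0, 50.0, 100.0) for _ in range(8)]

activity = percent_activity(variant, parent)
print(f"{activity:.0f}% of parent activity")
```

A score above 100% under this scheme corresponds to a fusion that outperforms the parent enzyme, which is how the percentages in the fusion-protein activity table of Example 67 can be read.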
(431) Step 3. Production of 11-OH Cucurbitadienol
(432) CYP87D18 (CYP450, S. grosvenorii) and SgCPR (CYP450 reductase, S. grosvenorii) were expressed in S. cerevisiae strain producing cucurbitadienol. 11-OH cucurbitadienol (i.e., 11-hydroxy cucurbitadienol) was observed using HPLC and mass spectroscopy data (
(433) Additional CYP450s from S. grosvenorii and Glycyrrhiza (CYP88D6) were expressed in S. cerevisiae strain producing cucurbitadienol. Protein sequences and DNA coding sequences for the enzymes are provided in SEQ ID NOs: 875-890.
(434) Step 4. Production of Mogrol
(435) CYP1798 (CYP450 enzyme, S. grosvenorii) and EPH2A (epoxide hydrolase, S. grosvenorii) were expressed in S. cerevisiae strain producing 11-OH cucurbitadienol. Mogrol was observed using HPLC and mass spectroscopy data (
(436) Epoxidation of Cucurbitadienol and/or 11-OH Cucurbitadienol
(437) Additional CYP450s and SQEs from S. grosvenorii and Glycyrrhiza (CYP88D6) were also expressed in S. cerevisiae strain producing cucurbitadienol or 11-OH cucurbitadienol to test for epoxidation.
(438) For SQEs, protein and DNA coding sequences for the enzymes are provided in SEQ ID NOs: 882-888. For CYP450s, protein and DNA coding sequences for the enzymes are provided in SEQ ID NOs: 875-890.
(439) Step 7: Production of Compound 1 from Mogroside IIIE in S. cerevisiae.
(440) An S. cerevisiae strain expressing a truncated dextransucrase (tDexT) was incubated in YPD (30 C, 250 rpm) containing 7 mg/ml Mogroside V for 1-2 days, resulting in hydrolysis to Mogroside IIIE. The S. cerevisiae cells were harvested, lysed, and then mixed back with the YPD supernatant containing Mogroside IIIE. To initiate the dextransucrase reaction, sucrose was added to a final concentration of 200 g/L, followed by incubation at 30 C, 250 rpm for 2 days. Production of Compound 1 was observed using HPLC (
Example 64
(441) S. cerevisiae or Y. lipolytica was grown in the presence of Mogroside V to allow the hydrolytic enzymes in the yeast to generate Mogroside IIIE. After 1 or 2 days, the cells were lysed and analyzed by HPLC to determine the mogroside content. After 1 day of incubation, S. cerevisiae produced a mixture of Mogroside V, Mogrosides IV, and Mogroside IIIE. After 2 days of incubation, substantially all of the mogrosides were converted to Mogroside IIIE as shown in
(442) Similarly, after 2 days of incubation Y. lipolytica produced mostly Mogroside IIIE (shown in
Example 65
(443) S. cerevisiae or Y. lipolytica was grown in the presence of Compound 1. Unlike with other mogrosides (see Example 64), no products of Compound 1 hydrolysis were observed as shown in
Example 66
(444) S. cerevisiae was modified to overexpress a dextransucrase (DexT). This modified strain was grown in the presence of a mogroside mixture to allow the hydrolytic enzymes in S. cerevisiae to generate Mogroside IIIE. After 2 days of incubation, the cells were lysed to release the DexT enzyme and supplemented with sucrose. After 24 hours, significant amounts of Compound 1 were produced (shown in
Example 67: Generation of Fusion Proteins Having Cucurbitadienol Synthase Activity
(445) A collection or library of S. cerevisiae in-frame fusion polynucleotides for a cucurbitadienol synthase gene (DNA coding sequence provided in SEQ ID NO: 907, and protein sequence provided in SEQ ID NO: 902) was prepared. The in-frame fusion polynucleotides were cloned into a yeast vector molecule to generate fusion proteins.
(446) Various fusion proteins were generated and tested for cucurbitadienol synthase activities. The testing results for some of the fusion proteins generated in this example are shown in Table 2.
(447) TABLE 2. Cucurbitadienol synthase activities for the fusion proteins.
SEQ ID NO for fusion protein | Activity (as compared to the parent) | SEQ ID NO for fusion protein | Activity (as compared to the parent)
1024 | 166% | 851 | 142%
854 | 135% | 856 | 123%
859 | 105% | 862 | 102%
865 | 125% | 867 | 145%
915 | 124% | 920 | 124%
924 | 121% | 928 | 117%
932 | 128% | 936 | 126%
940 | 109% | 944 | 107%
948 | 102% | 952 | 90%
956 | 85% | 959 | 46%
964 | 74% | 967 | 72%
971 | 89% | 975 | 35%
979 | 96% | 983 | 80%
987 | 111% | 991 | 114%
995 | 124% | 999 | 103%
1003 | 118% | 1007 | 97%
Example 68: UDP-Glycosyltransferases (311 Enzyme, SEQ ID NOs: 436-438) in the Presence of Mogroside IIIE, Mogroside IVE or Mogroside IVA to Produce Mogroside IV and Mogroside V Isomers
(448) Reaction conditions: To a 50 ml Falcon tube with 17 ml water, 3 ml of pH 7.0 1M Tris-HCl, 0.12 g UDP (Carbosynth), 3 g sucrose, 300 ul of protease inhibitor 100x M221, 150 ul of Kanamycin (50 mg/ml), 1.185 ml sucrose synthase Sus1 (1 mg/ml crude extract), 150 mg of starting Mogrosides, and 6 ml 311 enzyme (1 mg/ml crude extract) were added and incubated at 30 C., 150 rpm. The progress of the reaction was monitored periodically by LC-MS. After 3 days, the reaction was stopped by heating to 80 C. for 30 minutes with stirring (500 rpm). The reaction was then centrifuged (4000 rpm for 10 min, Eppendorf) and the supernatant was filtered through a 50 ml, 0.22 micron PVDF filter. The reaction products identified are depicted in
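The nominal final concentrations implied by the recipe in paragraph (448) can be estimated as follows. This is a rough sketch that assumes the total volume is simply the sum of the liquid components and that the dissolved solids (UDP, sucrose, mogrosides) add no volume; neither assumption comes from the text.

```python
# Liquid components of the reaction, in ml, as listed in paragraph (448).
liquid_volumes_ml = {
    "water": 17.0,
    "tris_hcl_1M": 3.0,
    "protease_inhibitor": 0.300,
    "kanamycin_50mg_per_ml": 0.150,
    "sus1_crude_extract": 1.185,
    "enzyme_311_crude_extract": 6.0,
}

# Assumption: solids contribute no volume.
total_volume_ml = sum(liquid_volumes_ml.values())

tris_mM = liquid_volumes_ml["tris_hcl_1M"] * 1000.0 / total_volume_ml
sucrose_g_per_l = 3.0 / total_volume_ml * 1000.0
udp_mg_per_ml = 120.0 / total_volume_ml
mogroside_mg_per_ml = 150.0 / total_volume_ml
kanamycin_mg_per_ml = liquid_volumes_ml["kanamycin_50mg_per_ml"] * 50.0 / total_volume_ml

print(f"total volume: {total_volume_ml:.3f} ml")
print(f"Tris-HCl: {tris_mM:.0f} mM, sucrose: {sucrose_g_per_l:.0f} g/L")
print(f"UDP: {udp_mg_per_ml:.2f} mg/ml, mogrosides: {mogroside_mg_per_ml:.2f} mg/ml")
print(f"kanamycin: {kanamycin_mg_per_ml:.2f} mg/ml")
```

On these assumptions the buffer strength comes out close to the 0.1 M Tris-HCl used in the smaller-scale reactions of the earlier examples, while the substrate loading is several times the 1 mg/ml used there.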
(449) TABLE-US-00005 TABLE1 SomeproteinandDNAsequencesdisclosedherein SEQ Protein/DNA ID Description Protein/DNASequence NO Reference Cyclomaltodextrin MKEKDKLKVNRNNVNFSKDIIYQIVTDRFHNGCPSYNPKGGLYDESRKNKKKYFGGDWIGIIEK 1 g1ucanotransferase LNTNYFTELGVTSLWISQPVENIFTPINDLVGSTSYHGYWARDFKRTNPFFGTFGDFQTLITTA (CGTase;Bacillus) HAKDIKIIMDFAPNHTSPALHDDATYAENGRLYDNGLLLGGYDNDYNHYFHHNGGTDFEEYEDG VYRNLFDLADLNHQNIAIDLYFKEAIKLWLDQGIDGIRVDAVKHMSYGWQKSWLNSIYNYRPVF IFGEWYINPNEYDHRNVHFANNSGMSLLDFSFAHKVREVFRDGMDSMHGLHKMIEETYQIYNDV NNLVTFIDNHDMDRFHINGQSKRRIEQSLVFLLTSRGIPSVYYGTEQYMVGNGDPNNRGQMESF DVNTDNFKIIQSLSSLRSLNYALPYGNTKERYITNDIYVYERYFGSDVVLIALNRNLTEGYEIK DVKTILPSRKYKDILDGLLDGEAIRVENNNIDSLWLGPGSGQVWHHKGVNSIPLIGTVGHKMTT VGQIICIEGCGFTSKKGSVLFEEKEAEVVSWSHTSIKVKVPAVNDGKYEITVVTDTGTRSNIYK HIEVLNTKQVCIRFVIENGYEIPESEVFIMGNTYSLGNMNPCKAVGPFFNQIMYQFPTGYFDIS VPADTLLEFKFIRKINNTLLIEGGENHKYRTPSFGTGEVVVKWQTAEKTILVES DexTprotein MPANAPDKQSVTNAPVVPPKHDTDQQDDSLEKQQVLEPSVNSNIPKKQTNQQLAVVTAPANSAP 2 QTKTTAEISAGTELDTMPNVKHVDGKVYFYGDDGQPKKNFTTIIDGKPYYFDKDTGALSNNDKQ YVSELFSIGNKHNAVYNTSSDNFTQLEGHLTASSWYRPKDILKNGKRWAPSTVTDFRPLLMAWW PDKSTQVTYLNYMKDQGLLSGTHHFSDNENMRTLTAAAMQAQVNIEKKIGQLGNTDWLKTAMTQ YIDAQPNWNIDSEAKGDDHLQGGALLYTNSDMSPKANSDYRKLSRTPKNQKGQIADKYKQGGFE LLLANDVDNSNPVVQAEQLNWLHYMMNIGSILQNDDQANFDGYRVDAVDNVDADLLQIAGEYAK AAYGVDKNDARANQHLSILEDWGDEDPDYVKAHGNQQITMDFPLHLAIKYALNMPNDKRSGLEP TREHSLVKRITDDKENVAQPNYSFIRAHDSEVQTIIADIIKDKINPASTGLDSTVTLDQIKQAF DIYNADELKADKVYTPYNIPASYALLLTNKDTIPRVYYGDMFTDDGQYMAKQSPYYQAIDALLK ARIKYAAGGQTMKMNYFPDEQSVMTSVRYGKGAMTASDSGNQETRYQGIGLVVNNRPDLKLSDK DEVKMDMGAAHKNQDYRPVLLTTKSGLKVYSTDANAPVVRTDANGQLTFKADMVYGVNDPQVSG YIAAWVPVGASENQDARTKSETTQSTDGSVYHSNAALDSQVIYEGFSNFQDFPTTPDEFTNIKI AQNVNLFKDWGITSFEMAPQYRASSDKSFLDAIVQNGYAFTDRYDIGYNTPTKYGTADNLLDAL RALHGQGIQAINDWVPDQIYNLPDEQLVTAIRTDGSGDHTYGSVIDHTLYASKTVGGGIYQQQY GGAFLEQLKTQYPQLFQQKQISTDQPMNPDIQIKSWEAKYFNGSNIQGRGAWYVLKDWGTQQYF NVSDAQTFLPKQLLGEKAKTGFVTRGKETSFYSTSGYQAKSAFICDNGNWYYFDDKGKMVVGNQ 
VINGINYYFLPNGIELQDAYLVHDGMYYYYNNIGKQLHNTYYQDKQKNFHYFFEDGHMAQGIVT IIQSDGTPVTQYFDENGKQQKGVAVKGSDGHLHYFDGASGNMLFKSWGRLADGSWLYVDEKGNA VTGKQTINNQTVYFNDDGRQIKNNFKELADGSWLYLNNKGVAVTGEQIINGQTLYFGNDGRQFK GTTHINATGESRYYDPDSGNMITDRFERVGDNQWAYFGYDGVAVTGDRIIKGQKLYFNQNGIQM KGHLRLENGIMRYYDADTGELVRNRFVLLSDGSWVYFGQDGVPVTGVQVINGQTLYFDADGRQV KGQQRVIGNQRYWMDKDNGEMKKITYAAALEHHHHHH CGTaseCGT-SL MKRWLSVVLSMSLVFSAFFLVSDTQKVTVEAAGNLNKVNFTSDIVYQIVVDRFVDGNTSNNPSG 3 SLFSSGCTNLRKYCGGDWQGIINKINDGYLTEMGVTAIWISQPVENVFAVMNDADGSTSYHGYW ARDFKKTNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLIG GYTNDTNSYFHHNGGTTFSNLEDGIYRNLFDLADFNHQNQFIDKYLKDAIKLWLDMGIDGIRMD AVKHMPFGWQKSFMDEVYDYRPVFTFGEWFLSENEVDSNNHFFANESGMSLLDFRFGQKLRQVL RNNSDDWYGFNQMIQDTASAYDEVIDQVTFIDNHDMDRFMADEGDPRKVDIALAVLLTSRGVPN IYYGTEQYMTGNGDPNNRKMMTSFNKNTRAYQVIQKLSSLRRSNPALSYGDTEQRWINSDVYIY ERQFGKDVVLVAVNRSLSKSYSITGLFTALPSGTYTDQLGALLDGNTIQVGSNGAVNAFNLGPG EVGVWTYSAAESVPIIGHIGPMMGQVGHKLTIDGEGFGTNVGTVKFGNTVASVVSWSNNQITVT VPNIPAGKYNITVQTSGGQVSAAYDNFEVLTNDQVSVRFVVNNANTNWGENIYLVGNVHELGNW NTSKAIGPLFNQVIYSYPTWYVDVSVPEGKTIEFKFIKKDGSGNVIWESGSNHVYTTPTSTTGT VNVNWQY UGT73C3protein MATEKTHQFHPSLHFVLFPFMAQGHMIPMIDIARLLAQRGVTITIVTTPHNAARFKNVLNRAIE 4 SEQIDNO:21 SGLAINILHVKFPYQEFGLPEGKENIDSLDSTELMVPFFKAVNLLEDPVMKLMEEMKPRPSCLI in SDWCLPYTSIIAKNFNIPKIVFHGMGCFNLLCMHVLRRNLEILENVKSDEEYFLVPSFPDRVEF W02016050890 TKLQLPVKANASGDWKEIMDEMVKAEYTSYGVIVNTFQELEPPYVKDYKEAMDGKVWSIGPVSL (whichis CNKAGADKAERGSKAAIDQDECLQWLDSKEEGSVLYVCLGSICNLPLSQLKELGLGLEESRRSF incorporated IWVIRGSEKYKELFEWMLESGFEERIKERGLLIKGWAPQVLILSHPSVGGFLTHCGWNSTLEGI byreference TSGIPLITWPLFGDQFCNQKLVVQVLKAGVSAGVEEVMKWGEEDKIGVLVDKEGVKKAVEELMG inIts DSDDAKERRRRVKELGELAHKAVEKGGSSHSNITLLLQDIMQLAQFKN entirety) UGT73C6protein MAFEKNNEPFPLHFVLFPFMAQGHMIPMVDIARLLAQRGVLITIVTTPHNAARFKNVLNRAIES 5 SEQIDNO:23 GLPINLVQVKFPYQEAGLQEGQENMDLLTTMEQITSFFKAVNLLKEPVQNLIEEMSPRPSCLIS in DMCLSYTSEIAKKFKIPKILFHGMGCFCLLCVNVLRKNREILDNLKSDKEYFIVPYFPDRVEFT W02016050890 
RPQVPVETYVPAGWKEILEDMVEADKTSYGVIVNSFQELEPAYAKDFKEARSGKAWTIGPVSLC NKVGVDKAERGNKSDIDQDECLEWLDSKEPGSVLYVCLGSICNLPLSQLLELGLGLEESQRPFI WVIRGWEKYKELVEWFSESGFEDRIQDRGLLIKGWSPQMLILSHPSVGGFLTHCGWNSTLEGIT AGLPMLTWPLFADQFCNEKLVVQILKVGVSAEVKEVMKWGEEEKIGVLVDKEGVKKAVEELMGE SDDAKERRRRAKELGESAHKAVEEGGSSHSNITFLLQDIMQLAQSNN UGT85C2sequence MDAMATTEKKPHVIFIPFPAQSHIKAMLKLAQLLHHKGLQITFVNTDFIHNQFLESSGPHCLDG 6 SEQIDNO:25 APGFRFETIPDGVSHSPEASIPIRESLLRSIETNFLDRFIDLVTKLPDPPTCIISDGFLSVFTI in DAAKKLGIPVMMYWTLAACGFMGFYHIHSLIEKGFAPLKDASYLTNGYLDTVIDWVPGMEGIRL W02016050890 KDFPLDWSTDLNDKVLMFTTEAPQRSHKVSHHIFHTFDELEPSIIKTLSLRYNHIYTIGPLQLL LDQIPEEKKQTGITSLHGYSLVKEEPECFQWLQSKEPNSVVYVNFGSTTVMSLEDMTEFGWGLA NSNHYFLWIIRSNLVIGENAVLPPELEEHIKKRGFIASWCSQEKVLKHPSVGGFLTHCGWGSTI ESLSAGVPMICWPYSWDQLTNCRYICKEWEVGLEMGTKVKRDEVKRLVQELMGEGGHKMRNKAK DWKEKARIAIAPNGSSSLNIDKMVKEITVLARN UGT73C5protein MVSETTKSSPLHFVLFPFMAQGHMIPMVDIARLLAQRGVIITIVTTPHNAARFKNVLNRAIESG 7 SEQIDNO:22 LPINLVQVKFPYLEAGLQEGQENIDSLDTMERMIPFFKAVNFLEEPVQKLIEEMNPRPSCLISD in FCLPYTSKIAKKFNIPKILFHGMGCFCLLCMHVLRKNREILDNLKSDKELFTVPDFPDRVEFTR W02016050890 TQVPVETYVPAGDWKDIFDGMVEANETSYGVIVNSFQELEPAYAKDYKEVRSGKAWTIGPVSLC NKVGADKAERGNKSDIDQDECLKWLDSKKHGSVLYVCLGSICNLPLSQLKELGLGLEESQRPFI WVIRGWEKYKELVEWFSESGFEDRIQDRGLLIKGWSPQMLILSHPSVGGFLTHCGWNSTLEGIT AGLPLLTWPLFADQFCNEKLVVEVLKAGVRSGVEQPMKWGEEEKIGVLVDKEGVKKAVEELMGE SDDAKERRRRAKELGDSAHKAVEEGGSSHSNISFLLQDIMELAEPNN UGT73E1protein MSPKMVAPPTNLHFVLFPLMAQGHLVPMVDIARILAQRGATVTIITTPYHANRVRPVISRAIAT 8 SEQIDNO:24 NLKIQLLELQLRSTEAGLPEGCESFDQLPSFEYWKNISTAIDLLQQPAEDLLRELSPPPDCIIS in DFLFPWTTDVARRLNIPRLVFNGPGCFYLLCIHVAITSNILGENEPVSSNTERVVLPGLPDRIE W02016050890 VTKLQIVGSSRPANVDEMGSWLRAVEAEKASFGIVVNTFEELEPEYVEEYKTVKDKKMWCIGPV SLCNKTGPDLAERGNKAAITEHNCLKWLDERKLGSVLYVCLGSLARISAAQAIELGLGLESINR PFIWCVRNETDELKTWFLDGFEERVRDRGLIVHGWAPQVLILSHPTIGGFLTHCGWNSTIESIT AGVPMITWPFFADQFLNEAFIVEVLKIGVRIGVERACLFGEEDKVGVLVKKEDVKKAVECLMDE DEDGDQRRKRVIELAKMAKIAMAEGGSSYENVSSLIRDVTETVRAPH UGT98protein 
MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI 9 SEQIDNO:53 QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA in PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK W02016050890 IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE EAGVGVEAKRDSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEIS LLRKKAPCSI UGT1495gene ATGCTTCCATGGCTGGCTCACGGCCATGTCTCCCCTTTCTTCGAGCTCGCCAAGTTGCTCGCCG 10 SEQIDNO:27 sequence CTAGAAACTTCCACATATTCTTCTGCTCCACCGCCGTAAACCTCCGCTCCGTCGAACCAAAACT in CTCTCAGAAGCTCTCCTCCCACGTGGAGCTGGTGGAGCTCAACCTACCGCCCTCGCCGGAGCTC W02016050890 CCTCCGCACCGCCACACCACCGCCGGCCTTCCACCGCACCTCATGTTCTCGCTCAAGCGAGCTT TCGACATGGCCGCTCCCGCCTTCGCCGCCATCCTCCGCGACCTGAACCCGGACTTGCTCATCTA CGACTTCCTGCAGCCGTGGGCGGCGGCGGAGGCTCTGTCGGCGGATATTCCGGCCGTGATGTTC AAAAGCACGGGTGCGCTCATGGCGGCCATGGTCGCGTACGAGCTGACGTTTCCGAACTCTGATT TTTTCTCGCTTTTCCCTGAGATTCGTCTCTCCGAGTGCGAGATTAAACAGCTGAAGAACTTGTT TCAATGTTCTGTGAATGATGCGAAAGACAAGCAAAGGATTAAGGGATGTTATGAGAGATCTTGC GGCATGATTTTGGTGAAATCTTTCAGAGAAATCGAAGGCAAATATATTGATTTTCTCTCTACTC TGCTGGGCAAGAAGGTTGTTCCAGTTGGTCCACTTGTTCAACAAACAGAAGACGACGTCGTATC AGGAAGTTTTGACGAATGGCTAAATGGAAAAGATAGATCGTCTTCCATACTCGTGTCTTTCGGA AGCGAGTTCTACCTGTCCAGAGAAGACATGGAAGAGATCGCGCATGGCTTAGAGCTGAGCCAGG TGAACTTCATATGGGTCGTCAGGTTTCCGGCGGGAGGAGAGAGAAACACGACAAAGGTGGAAGA AGAACTGCCAAAAGGGTTTCTAGAGAGAGTTAGAGAGAGAGGGATGGTGGTGGAGGGCTGGGCG CCGCAGGCTCAGATCTTGAAACATCCAAGCGTCGGCGGATTCCTCAGCCACTGCGGGTGGAGCT CCGTCGTGGAGAGCATGAAATTCGGCGTTCCGATCATCGCCATGCCGATGCACCTCGACCAGCC GCTGAATTCCCGGCTGGTCGAGCGGCTCGGCGTCGGCGTAGTGGTGGAGAGAGACGGCCGCCTC CGGGGAGAGGTGGAGAGAGTTGTCAGAGAGGTGGTGGTGGAGAAAAGTGGAGAGAGAGTGAGGA AGAAGGTGGAGGAGTTTGCAGAGATCATGAAGAAGAAAAAAGACAATGAAGAGATGGACGTAGT CGTGGAAGAGTTGGTGACGCTCTGCAGGAAGAAGAAGAAGGAGGAGGATTTACAGAGTAATTAT TGGTGCAGAACCGCCATTGATGACCATTGTTCTGAAGTCGTGAAGATTGAAGATGCTGCAGCAG 
CCGACGAGGAGCCTCTTTGCAAATAA UGT1817gene ATGGCTGTCACTTACAGCCTGCACATAGCAATGTACCCTTGGTTTGCTTTCGGCCACTTGACTC 11 SEQIDNO:28 sequence CATTTCTCCAAGTCTCCAACAAGCTTGCCAAGGAAGGCCACAAAATCTCCTTCTTCATCCCAAC in GAAAACGCTAACCAAATTGCAGCCTTTCAATCTCTTTCCAGATCTCATTACCTTTGTCCCCATC W02016050890 ACTGTTCCTCATGTTGATGGTCTCCCTCTTGGAGCTGAGACTACTGCTGATGTTTCTCACCCTT CACAGCTCAGTCTCATCATGACTGCTATGGATTGCACCCAACCCGAAATCGAGTGTCTTCTTCG AGACATAAAACCTGATGCCATCTTCTTCGATTTCGCGCACTGGGTGCCAAAATTGGCATGTGGA TTGGGCATTAAGTCGATTGATTACAGTGTCTGTTCTGCAGTATCAATTGGTTATGTTTTGCCCC TATTAAGGAAAGTTTGTGGACAAGATTTATTAACTGAAGATGATTTTATGCAGCCATCTCCTGG CTACCCGAGTTCCACCATCAATCTTCAAGCTCATGAGGCTCGATATTTTGCATCTCTGAGCCGC TGGAGGTTTGGCAGTGATGTCCCTTTCTTTAGTCGCCATCTTACTGCACTTAATGAATGCAATG CTTTAGCATTCAGGTCATGTAGGGAGATTGAAGGGCCTTTTATAGACTATCCAGAAAGTGAATT AAAAAAGCCTGTGTTGCTTTCCGGAGCAGTGGATCTACAACCGCCAACCACAACTGTAGAAGAA AGATGGGCAAAATGGCTATCAGGGTTCAACACCGACTCGGTCGTATATTGTGCATTTGGAAGTG AGTGTACCTTAGCAAAAGACCAATTCCAAGAACTGCTGTTGGGTTTTGAGCTTTCAAATATGCC ATTCTTTGCTGCACTTAAACCACCTTTTGGTGTTGACTCGGTTGAAGCAGCCTTGCCTGAAGGT TTTGAACAGAGAGTTCAGGGAAGAGGGGTGGTCTATGGGGGATGGGTCCAACAGCAGCTCATTT TGGAGCACCCATCAATTGGATGCTTTGTTACACATTGTGGATCAGGCTCCTTATCAGAGGCGTT AGTGAAGAAGTGTCAATTAGTGTTGTTACCTCGTATCGGTGACCACTTTTTCCGAGCAAGAATG TTGAGCAATTATTTGAAAGTTGGTGTGGAGGTAGAGAAAGGAGAAGGAGATGGATCTTTTACAA AGGAAAGTGTGTGGAAGGCAGTGAAGACAGTGATGGATGAAGAGAATGAAACTGGGAAAGAGTT CAGAGCGAACCGTGCCAAGATAAGAGAGCTATTGCTCGACGAAGATCTCGAGGAGTCTTATATC AACAATTTCATCCACAGCCTGCATACTTTGAATGCATGA UGT5914gene ATGGAAGCTAAGAACTGCAAAAAGGTTCTGATGTTCCCATGGCTGGCGCATGGTCACATATCAC 12 SEQIDNO:30 sequence CATTTGTAGAGCTGGCCAAGAAGCTCACAGACAACAACTTCGCCGTTTTTCTATGTTCTTCCCC in TGCAAATCTTCAAAACGTCAAGCCAAAACTCCCCCATCACTACTCTGATTCCATTGAACTCGTG W02016050890 GAGCTCAACCTTCCATCGTCGCCGGAGCTTCCCCCTCATATGCACACCACCAATGGCCTCCCTT TGCATTTAGTTCCCACCCTCGTTGACGCCTTGGACATGGCCGCTCCGCACTTCTCCGCCATTTT ACAGGAACTGAATCCAGATTTTCTCATATTCGACATCTTCCAACCCTGGGCGGCTGAAATCGCT TCCTCCTTCGGCGTTCCTGCTATTTTGTTGCTTATCGTTGGATCTGCTATAACCGCTTTAGGGG 
TTCATTTTGTCCGGAGCTCCGGTACGGAATTCCCCTTTCCCGAGCTTACTAAATCATTCAAGAA GGAGGACGACCGAAAACCTCCAGGAGATTCCGGCAACGATAGAGGAAAACGGCTATTCAAATGT CTGCTGGACCTGGAACATTCTTCAGAGACTATTTTGGTGAACAGTTTTACAGAGATAGAGGGCA AATATATGGACTATCTCTCGGTCTTACTGAAGAAGAAGATCCTTCCGATTGGTCCTTTGGTTCA GAAAATTGGCTCCGATGACGATGAATCGGGAATCCTCCGGTGGCTTGACAAGAAGAAACCGAAT TCAACTGTGTACGTTTCGTTCGGGAGTGAGTACTATTTGAGCAAAGAAGACATAGCAGAGCTTG CGCATGGTCTGGAAATCAGCGGCGTCAATTTCATCTGGATTGTTCGGTTTCCAAAGGGAGAGAA AATCGCCATTGAAGAGGCATTACCAGATGAATTTCTTGAAAGAGTCGGAGAGAGAGGCGTCGTC GTTGATGGATGGGCGCCGCAGATGAAAATATTAGGGCATTCGAGCGTCGGCGGGTTTCTGTCTC ACTGCGGATGGAACTCTGTGCTGGAGAGTCTGGTGCTCGGCGTGCCGATCATATCCCTGCCGAT ACACCTCGAACAGCCGTGGAACGCCTTGGTAGCGGAGCACGTCGGCGTTTGTGTGAGGGCGAAG AGAGACGACGGAGGAAATCTTCAAAGAGAGTTGGTGGCGGAGGCCATTAAAGAAGTGGTGGTTG AGGAAACAGGAGCGGAACTGAGAAGCAAAGCAAGAGTAATTAGTGAAATCTTGAAAAATAAAGA AGCTGAAACAATACAAGATTTGGTGGCTGAGCTTCACCGGCTTTCTGACGCAAGAAGAGCTTGT TGA UGT8468(gene ATGGAAAAAAATCTTCACATAGTGATGCTTCCATGGTCGGCGTTCGGCCATCTCATACCATTTT 13 SEQIDNO:31 sequence) TTCACCTCTCCATAGCCTTAGCCAAAGCCAAAGTTTATATCTCCTTCGTCTCCACTCCAAGAAA in TATTCAGAGACTYCCCCAAATCCCGCCGGACTTAGCTTCTTTCATAGATTTGGTGGCCATTCCC W02016050890 TTGCCGAGACTCGACGACGATCTGTTGCTAGAATCTGCAGAGGCCACTTCTGATATTCCGATCG ACAAGATTCAGTATTTGAAGCGAGCCGTCGACCTCCTCCGCCACCCCTTCAAGAAGTTTGTCGC CGAACAATCGCCGGACTGGGTCGTCGTTGATTTTCATGCTTATTGGGCCGGCGAGATCTACCAG GAGTTTCAAGTTCCCGTCGCCTACTTCTGTATTTTCTCGGCCATCTGTTTGCTTTATCTTGGAC CTCCAGACGTGTATTCGAAGGATCCTCAGATCATGGCACGAATATCTCCCGTTACCATGACGGT GCCGCCGGAGTGGGTCGGTTTTCCGTCCGCCGTAGCCTACAACTTGCATGAGGCGACGGTCATG TACTCTGCTCTCTATGAAACAAATGGGTCTGGAATAAGCGACTGCGAGAGGATTCGCCGGCTCG TCCTTTCCTGTCAAGCCGTGGCCATTCGAAGCTGCGAGGAGATTGAAGGCGAATACCTTAGGTT ATGTAAGAAACTGATTCCACCGCAGGGGATTGCCGTCGGCTTGCTTCCGCCGGAAAAGCCACCA AAATCAGATCACGAGCTCATCAAATGGCTTGACGAGCAAAAGCTCCGATTCGTCGTGTACGTGA CATTCGGCAGCGAATGCAACCTGACGAAGGACCAAGTTCACGAGATAGCCCACGGGCTGGAACT GTCGGAGCTGCCATTTTTATGGGCACTGAGGAAACCCAGCTGGGCAGCTGAGGAAGACGATGGG 
CTGCCGTCTGGGTTTCGTGAGAGAACGTCCGGGAGAGGGGTGGTGAGCATGGAGTGGGTGCCGC AGTTGGAGATTCTGGCGCACCAGGCCATCGGCGTCTCTTTAGTTCACGGGGGCTGGGGCTCTAT TATCGAGTCGCTACAAGCTGGGCACTGTCTGGTTGTGCTGCCGTTTATCATCGACCAGCCGCTG AACTCAAAGCTTTTGGTGGAGAAAGGGATGGCGCTTGAGATCAGAAGGAACGGTTCTGATGGAT GGTTTAGTAGAGAAGACATCGCCGGAACTTTGAGAGAAGCTATGCGGTCGTCTGAGGAAGGCGG GCAGCTGAGGAGCCGTGCAAAAGAGGCGGCGGCCATCGTTGGAGATGAGAAGCTGCAGTGGGAA CAATACTTCGGCGCGTTCGTACAGTTTCTGAGGGACAAGTCTTGA UGT10391(gene ATGTCCGAGGAGAAAGGCAGAGGGCACAGCTCGTCGACGGAGAGACACACTGCTGCCGCCATGA 14 SEQIDNO:32 sequence) ACGCCGAGAAACGAAGCACCAAAATCTTGATGCTCCCATGGCTGGCTCACGGCCACATATCTCC in ATACTTCGAGCTCGCCAAGAGGCTCACCAAGAAAAACTGCCACGTTTACTTGTGTTCTTCGCCT W02016050890 GTAAATCTCCAAGGCATCAAGCCGAAACTCTCTGAAAATTACTCTTCCTCCATTGAACTTGTGG AGCTTCATCTTCCATCTCTCCCCGACCTTCCTCCCCATATGCACACGACCAAAGGCATCCCTCT ACATCTACAATCCACCCTCATCAAAGCCTTCGACATGGCCGCCCCTGATTTTTCCGACCTGTTG CAGAAACTCGAGCCGGATCTCGTCATTTCCGATCTCTTCCAGCCATGGGCAGTTCAATTAGCGT CGTCTCGGAACATTCCCGTCGTCAATTTCGTTGTCACCGGAGTCGCTGTTCTTAGTCGTTTGGC TCACGTGTTTTGCAACTCCGTTAAGGAATTCCCTTTCCCGGAACTCGATCTAACCGACCATTGG ATCTCCAAGAGCCGCCGCAAAACGTCCGACGAATTAGGTCGCGAGTGCGCGATGCGATTTTTCA ACTGCATGAAACAATCTTCAAACATCACTCTAGCCAACACTTTCCCCGAGTTCGAAGAAAAATA CATCGATTATCTCTCTTCCTCGTTTAAGAAAAAGATTCTTCCGGTTGCTCCTCTAGTTCCTGAA ATCGACGCAGACGACGAGAAATCGGAAATTATCGAGTGGCTTGACAAGAAGAAACCGAAATCGA CTGTTTACGTTTCGTTTGGGAGTGAGTATTATCTGACGAAAGAAGACAGGGAAGAGCTCGCCCA TGGCTTAGAAAAGAGCGGCGTGAATTTCATCTGGGTTATTAGGTTTCCAAAGGGCGAGAAGATC ACCATTGAAGAGGCTTTACCAGAAGGATTTCTCGAGAGAGTAGGGGACAGGGGAGTGATTATCG ACGGGTGGGCGCCGCAGTTGAAAATATTGAGGCATTCAAGCGTGGGCGGGTTCGTGTGCCACTG CGGGTGGAACTCTGTGGTGGAGAGCGTGGTGTTTGGGGTGCCGATCATAGCCTTGCCGATGCAG CTCGATCAGCCATGGCATGCGAAGGTGGCGGAGGACGGCGGCGTCTGTGCGGAGGCGAAGAGAG ACGTTGAAGGGAGCGTTCAGAGAGAAGAGGTGGCGAAGGCCATTAAAGAGGTGGTGTTTGAGAA GAAGGGGGGGGTTCTGAGTGGAAAAGCAAGAGAGATCAGCGAGGCCTTGAGAAAGAGGGAAGGG GAAATCATAGAGGAATTGGTTGCTGAGTTTCACCAGCTCTGTGAAGCTTGA UGT1576protein MASPRHTPHFLLFPFMAQGHMIPMIDLARLLAQRGVIITIITTPHNAARYHSVLARAIDSGLHI 15 
SEQ ID NO: 48 HVLQLQFPCKEGGLPEGCENVDLLPSLASIPRFYRAASDLLYEPSEKLFEELIPRPTCIISDMC in LPWTMRIALKYHVPRLVFYSLSCFFLLCMRSLKNNLALISSKSDSEFVTFSDLPDPVEFLKSEL WO2016050890 PKSTDEDLVKFSYEMGEADRQSYGVILNLFEEMEPKYLAEYEKERESPERVWCVGPVSLCNDNK LDKAERGNKASIDEYKCIRWLDGQQPSSVVYVSLGSLCNLVTAQIIELGLGLEASKKPFIWVIR RGNITEELQKWLVEYDFEEKIKGRGLVILGWAPQVLILSHPAIGCFLTHCGWNSSIEGISAGVP MVTWPLFADQVFNEKLIVQILRIGVSVGTETTMNWGEEEEKGVVVKREKVREAIEIVMDGDERE ERRERCKELAETAKRAIEEGGSSHRNLTMLIEDIIHGGGLSYEKGSCR UGTSK98 protein MDAQRGHTTTILMLPWVGYGHLLPFLELAKSLSRRKLFHIYFCSTSVSLDAIKPKLPPSISSDD 16 SEQ ID NO: 50 SIQLVELRLPSSPELPPHLHTTNGLPSHLMPALHQAFVMAAQHFQVILQTLAPHLLIYDILQPW in APQVASSLNIPAINFSTTGASMLSRTLHPTHYPSSKFPISEFVLHNHWRAMYTTADGALTEEGH WO2016050890 KIEETLANCLHTSCGVVLVNSFRELETKYIDYLSVLLNKKVVPVGPLVYEPNQEGEDEGYSSIK NWLDKKEPSSTVFVSFGTEYFPSKEEMEEIAYGLELSEVNFIWVLRFPQGDSTSTIEDALPKGF LERAGERAMVVKGWAPQAKILKHWSTGGLVSHCGWNSMMEGMMFGVPIIAVPMHLDQPFNAGLL EEAGVGVEAKRGSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEI SLLRKKAPCSI UGT430 protein MEQAHDLLHVLLFPYPAKGHIKPFLCLAELLCNAGLNVTFLNTDYNHRRLHNLHLLAACFPSLH 17 SEQ ID NO: 62 FESISDGLQPDQPRDILDPKFYISICQVTKPLFRELLLSYKRTSSVQTGRPPITCVITDVIFRF in PIDVAEELDIPVFSFCTFSARFMFLYFWIPKLIEDGQLPYPNGNINQKLYGVAPEAEGLLRCKD WO2016050890 LPGHWAFADELKDDQLNFVDQTTASLRSSGLILNTFDDLEAPFLGRLSTIFKKIYAVGPIHALL NSHHCGLWKEDHSCLAWLDSRAARSVVFVSFGSLVKITSRQLMEFWHGLLNSGTSFLFVLRSDV VEGDGEKQVVKEIYETKAEGKWLVVGWAPQEKVLAHEAVGGFLTHSGWNSILESIAAGVPMISC PKIGDQSSNCTWISKVWKIGLEMEDQYDRATVEAMVRSIMKHEGEKIQKTIAELAKRAKYKVSK DGTSYRNLEILIEDIKKIKPN UGT1697 protein MVQPRVLLFPFPALGHVKPFLSLAELLSDAGIDVVFLSTEYNHRRISNTEALASRFPTLHFETI 18 SEQ ID NO: 68 PDGLPPNESRALADGPLYFSMREGTKPRFRQLIQSLNDGRWPITCIITDIMLSSPIEVAEEFGI in PVIAFCPCSARYLSIHFFIPKLVEEGQIPYADDDPIGEIQGVPLFEGLLRRNHLPGSWSDKSAD WO2016050890 ISFSHGLINQTLAAGRASALILNTFDELEAPFLTHLSSIFNKIYTIGPLHALSKSRLGDSSSSA SALSGFWKEDRACMSWLDCQPPRSVVFVSFGSTMKMKADELREFWYGLVSSGKPFLCVLRSDVV SGGEAAELIEQMAEEEGAGGKLGMVVEWAAQEKVLSHPAVGGFLTHCGWNSTVESIAAGVPMMC WPILGDQPSNATWIDRVWKIGVERNNREWDRLTVEKMVRALMEGQKRVEIQRSMEKLSKLANEK
VVRGGLSFDNLEVLVEDIKKLKPYKF UGT11789 protein MDAKEESLKVFMLPWLAHGHISPYLELAKRLAKRKFLVYFCSTPVNLEAIKPKLSKSYSDSIQL 19 SEQ ID NO: 72 MEVPLESTPELPPHYHTAKGLPPHLMPKLMNAFKMVAPNLESILKTLNPDLLIVDILLPWMLPL in ASSLKIPMVFFTIFGAMAISFMIYNRTVSNELPFPEFELHECWKSKCPYLFKDQAESQSFLEYL WO2016050890 DQSSGVILIKTSREIEAKYVDFLTSSFTKKVVTTGPLVQQPSSGEDEKQYSDIIEWLDKKEPLS TVLVSFGSEYYLSKEEMEEIAYGLESASEVNFIWIVRFPMGQETEVEAALPEGFIQRAGERGKV VEGWAPQAKILAHPSTGGHVSHNGWSSIVECLMSGVPVIGAPMQLDGPIVARLVEEIGVGLEIK RDEEGRITRGEVADAIKTVAVGKTGEDFRRKAKKISSILKMKDEEEVDTLAMELVRLCQMKRGQ ESQD CYP1798 protein MEMSSSVAATISIWMVVVCIVGVGWRVVNWVWLRPKKLEKRLREQGLAGNSYRLLFGDLKERAA 20 SEQ ID NO: 74 MEEQANSKPINFSHDIGPRVFPSMYKTIQNYGKNSYMWLGPYPRVHIMDPQQLKTVFTLVYDIQ in KPNLNPLIKFLLDGIVTHEGEKWAKHRKIINPAFHLEKLKDMIPAFFHSCNEIVNEWERLISKE WO2016050890 GSCELDVMPYLQNLAADAISRTAFGSSYEEGKMIFQLLKELTDLVVKVAFGVYIPGWRFLPTKS NNKMKEINRKIKSLLLGIINKRQKAMEEGEAGQSDLLGILMESNSNEIQGEGNNKEDGMSIEDV IEECKVFYIGGQETTARLLIWTMILLSSHTEWQERARTEVLKVFGNKKPDFDGLSRLKVVTMIL NEVLRLYPPASMLTRIIQKETRVGKLTLPAGVILIMPIILIHRDHDLWGEDANEFKPERFSKGV SKAAKVQPAFFPFGWGPRICMGQNFAMIEAKMALSLILQRFSFELSSSYVHAPTVVFTTQPQHG AHIVLRKL EPH1 epoxide MEKIEHSTIATNGINMHVASAGSGPAVLFLHGFPELWYSWRHQLLYLSSLGYRAIAPDLRGFGD 21 Disclosed in hydrolase TDAPPSPSSYTAHHIVGDLVGLLDQLGVDQVFLVGDWGAMMAWYFCLFRPDRVKALVNLSVHFT the suppl. of PRNPAISPLDGFRLMLGDDFYVCKFQEPGVAEADFGSVDTATMFKKFLTMRDPRPPIIPNGFRS Itkin et al. 
LATPEALPSWLTEEDIDYFAAKFAKTGFTGGFNYYRAIDLTWELTAPWSGSEIKVPTKFIVGDL Proc Natl DLVYHFPGVKEYIHGGGFKKDVPFLEEVVVMEGAAHFINQEKADEINSLIYDFIKQF Acad Sci USA 2016,22; 113(47):E7619-E7628) EPH2 epoxide MEKIEHTTISTNGINMHVASIGSGPAVLFLHGFPELWYSWRHQLLFLSSMGYRAIAPDLRGFGD 22 Disclosed in hydrolase TDAPPSPSSYTAHHIVGDLVGLLDQLGIDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVHF the LRRHPSIKFVDGFRALLGDDFYFCQFQEPGVAEADFGSVDVATMLKKFLTMRDPRPPMIPKEKG supplement of FRALETPDPLPAWLTEEDIDYFAGKFRKTGFTGGFNYYRAFNLTWELTAPWSGSEIKVAAKFIV Itkin et al GDLDLVYHFPGAKEYIHGGGFKKDVPLLEEVVVVDGAAHFINQERPAEISSLIYDFIKKF EPH3 epoxide MDQIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD 23 Disclosed in hydrolase TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWGAIIAWYFCLFRPDRIKALVNLSVQF the IPRNPAIPFIEGFRTAFGDDFYMCRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG supplement of FRAIPPPENLPSWLTEEDINYYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV Itkin et al GDSDLTYHFPGAKEYIHNGGFKKDVPLLEEVVVVKDACHFINQERPQEINAHIHDFINKF EPH4 epoxide MENIEHTTVQTNGIKMHVAAIGTGPPVLLLHGFPELWYSWRHQLLYLSSAGYRAIAPDLRGYGD 24 Disclosed in hydrolase TDAPPSPSSYTALHIVGDLVGLLDVLGIEKVFLIGHDWGAIIAWYFCLFRPDRIKALVNLSVQF the FPRNPTTPFVKGFRAVLGDQFYMVRFQEPGKAEEEFASVDIREFFKNVLSNRDPQAPYLPNEVK supplement of FEGVPPPALAPWLTPEDIDVYADKFAETGFTGGLNYYRAFDRTWELTAPWTGARIGVPVKFIVG Itkin et al DLDLTYHFPGAQKYIHGEGFKKAVPGLEEVVVMEDTSHFINQERPHEINSHIHDFFSKFC EPH5 epoxide MEKESEIHSIRHTTVSVNGINMHVAEKGEGPLVLFIHGFPELWYSWRHQILDLASLGYRAVAPD 25 Disclosed in hydrolase LRGYGDSDAPPSASSYTSFHIVGDLIALLDAIVGVEEKVFVVAHDWGAIIAWYLCLYRPDRIKA the LVNLSVAFIRRNPKGKPVEWIRALYGDDHYMCRCQEPGEIEGEFAEIGTERVLTQFLTYHSPKP supplement of LMLPKGKAFGHPLDTPIPLPPWLSHQDIEYYASKFDKKGFTGPVNYYRNLDRNWELNAPFTRAQ Itkin et al VKVPVKFIVGDLDLTYHSFGTKEYIHSGEMKKDVPFLQEVVVMEGVGHFIQSEKPHEISDHIYQ FIKKF EPH6 epoxide MEKIEHTIITTNGINMHVASIGTGPAVLFLHGFPELWYSWRHQLLSFSSLGYRAIAPDLRGYGD 26 Disclosed in hydrolase SDAPPSPSSYTVFHIVGDLVGLLDQLGIDQVFLVGHDWGASIAWYFSLLRPDRIKALVNLSVQY the FPRNPARNTVEALRALFGDDYYVCRFQEPGEMEEDFASIDTAVIFKIFLSSRDPRPPCIPKAVG supplement of 
FRAFPVPDSLPSWLSEEDISYYASKFSKKGFTGGLNYYRALALNWELTAPWTGTQIKVPTKFIV Itkin et al GDLDLTYHIPGSKEYIHKGGFERDVPSLEEVVVIEGAAHFVNQERPEEISKHIYDFIKKF EPH7 epoxide MDAIEHRTVSVNGINMHVAEKGEGPVVLLLHGFPELWYSWRHQILALSSLGYRAVAPDLRGYGD 27 Disclosed in hydrolase TDAPGSISSYTCFHIVGLVALVESLGVDRVFVVAHDWGAMIANCLCLFRPEMVKAFVCLSVPFR the QRNPKMKPVQSMRAFFGDDYYICRFQNPGEIEEEMAQVGAREVLRGILTSRRPGPPILPKGQAF supplement of RARPGASTALPSWLSEKDLSFFASKYDQKGFTGPLNYYRAMDLNWELTASWTGVQVKVPVKYIV Itkin et al GDVDMVFTTPGVKEYVNGGGFKKDVPFLQEVVIMEGVGHFINQEKPEEISSHIHDFISRF EPH8 epoxide MDQIQHKFIDIRGLKLHIAEIGTGSPAVVFLHGFPEIWYSWRHQMVAAAAVGYRAISPDLRGYG 28 Disclosed in hydrolase FSDPHPQPQNASFDDFVEDTLAILDFLHIPKAFLVGKDFGSWPVYLFSLVHPTRVAGIVSLGVP the FLPPNPKRYRDLPEGFYIFRWKESGRAEADFGRFDVKTVLRRIYTLFSRSEIPIAEKDQEIMDM supplement of VDESTPPPPWLTDEDLAAYATAYEHSGFESALQVPYRRRHQELGMSNPRVDVPVLLIIGGKDYF Itkin et al LKFPGIEDYIKSEKMREIVPDLEVADLADGTHFMQEQFPAQVNHLLISFLGKRNT EH1 epoxide MDAIEHRTVSVNGINMHVAEKGEGPVVLLLHGFPELWYSWRHQILALSSLGYRAVAPDLRGYGD 29 SEQ ID NO: 38 hydrolase 1 TDAPGSISSYTCFHIVGDLVALVESLGMDRVFVVAHDWGAMIANCLCLFRPEMVKAFVCLSVPF in RQRNPKMKPVQSMRAFFGDDYYICRFQNPGEIEEEMAQVGAREVLRGILTSRRPGPPILPKGQA WO2016050890 FRARPGASTALPSWLSEKDLSFFASKYDQKGFTGPLNYYRAMDLNWELTASWTGVQVKVPVKYI VGDVDMVFTTPGVKEYVNGGGFKKDVPFLQEVVIMEGVGHFINQEKPEEISSHIHDFISKF EH2 epoxide MDEIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD 30 SEQ ID NO: 40 hydrolase TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWGAIIAWYFCLFRPDRIKALVNLSVQF in IPRNPAIPFIEGFRTAFGDDFYICRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG WO2016050890 FRAIPPPENLPSWLTEEDINFYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV GDSDLTYHFPGAKEYIHNGGFKRDVPLLEEVVVVKDACHFINQERPQEINAHIHDFINKF CYP533 gene ATGGAACTCTTCTCTACCAAAACTGCAGCCGAGATCATCGCTGTTGTCTTGTTTTTCTACGCTC 31 SEQ ID NO: 3 (coding sequence) TCATCCGGCTATTATCTGGAAGATTCAGCTCTCAACAGAAGAGACTGCCACCTGAAGCCGGTGG in CGCCTGGCCACTGATCGGCCATCTCCATCTCCTAGGTGGGTCGGAACCTGCACATAAAACCTTG WO2016050890 GCGAACATGGCGGACGCCTACGGACCAGTTTTTACGTTGAAACTGGGCATGCATACAGCTTTGG
TTATGAGCAGTTGGGAAATAGCGAGAGAGTGCTTTACTAAAAACGACAGAATCTTTGCCTCCCG CCCCATAGTCACTGCCTCAAAGCTTCTCACCTATAACCATACCATGTTTGGGTTCAGCCAATAT GGTCCATTCTGGCGCCATATGCGCAAAATAGCCACGCTTCAACTCCTCTCAAACCACCGCCTCG AGCAGCTCCAACACATCAGAATATCGGAGGTCCAGACTTCGATTAAGAAACTGTACGAGTTGTG GGTCAACAGCAGAAATAATGGAGGCGAGAAAGTGTTGGTGGAGATGAAGACGTGGTTCGGAGGC ATAACCTTGAACACCATATTCAGGATGGTGGTCGGAAAGCGATTCTCGACTGCTTTCGAAGGCA GTGGTGGCGAACGGTATCGGAAGGCGTTGAGGGATTCTCTTGAATGGTTTGGGGCATTCGTTCC GTCAGATTCATTCCCGTTTTTAAGATGGTTGGATTTGGGAGGATATGAGAAGGCGATGAAGAAG ACGGCGAGTGTGCTGGACGAGGTGCTTGATAAATGGCTCAAAGAGCATCAGCAGAGGAGAAACT CCGGTGAACTGGAGACGGAGGAGCACGACTTCATGCACGTGATGCTGTCTATTGTTAAGGATGA TGAAGAACTATCCGGCTACGATGCCGATACAGTCACAAAAGCTACATGTTTGAATTTAATAGTT GGTGGATTCGACACTACACAAGTAACTATGACATGGGCTCTTTCTTTGCTTCTCAACAATGAAG AGGTATTAAAAAAGGCCCAACTTGAACTAGACGAACAAGTTGGAAGAGAGAGGTTTGTGGAAGA GTCCGATGTTAAAAATCTGTTATATCTCCAGGCCATCGTGAAGGAAACTTTGCGTTTGTACCCT TCAGCGCCAATCTCGACATTTCATGAGGCCATGGAAGATTGCACTGTTTCTGGCTACCACATCT TTTCAGGGACGCGTTTGATGGTGAATCTTCAAAAGCTTCAAAGAGATCCACTTGCATGGGAGGA TCCATGTGACTTTCGACCGGAGAGATTTCTGACAACTCATAAGGATTTCGATCTTAGAGGACAT AGTCCTCAATTGATACCATTTGGGAGTGGTCGAAGAATATGCCCTGGCATCTCGTTTGCCATTC AAGTTTTGCATCTTACGCTTGCAAATCTACTTCATGGGTTTGACATTGGAAGGCCATCTCATGA ACCAATCGATATGCAGGAGAGTAAAGGACTAACGAGTATTAAAACAACTCCACTTGAGGTTGTT TTAGCTCCACGCCTTGCTGCTCAAGTTTATGAGTGA CYP937 gene (coding ATGCCGATCGCAGAICAGTCTCTGATTTGTTTGGTCGCCCACTCTTCTTTGCACTATATG 32 SEQ ID NO: 4 sequence) ATTGGTTCTTAGAGCATGGATCTGTTTATAAACTTGCCTTTGGACCAAAAGCCTTTGTTGTTGT in ATCAGATCCCATTGTGGCAAGATATATTCTTCGAGAAAATGCATTTGGTTATGACAAGGGAGTG WO2016050890 CTTGCTGATAIITTAGAACCGATAAIGGGTAAAGGACTAATACCACTGATCCTTGGCACTTGGA AGCAGAGGAACGATTATTGCTCCAGATTCCATGCCTTTACTTGAAGCTTATGACCAAATAAGTA ATTTGCCAATTGTTCAGAACGATCAATATTGAAATTGGAGAAGCTTCTAGGAGAAGGTGAACTA CAGGAGAATAAAACCATTGAGTTGGATATGGAAGCAGAGTTTTCAAGTTTGGCTCTTGATATCA TTGGACTCGGTGTTTTCAACTATGATTTTGGTTCTGTAACCAAAGAATCTCCGGTGATTAAGGC TGTATATGGGACTCTITTTGAAGCAGAGCATAGATCGACTTTCTATATCCCATATTGGAAAGTA
CCTTTGGCAAGGTGGATAGTCCCAAGGCAGCGTAAATTCCATGGTGACCTTAAGGTTATTAATG AGTGTCTTGAEGGCCTAATACGCAACGCAAGAGAAACCCGAGACTTGAAACGGATGTTGAAATT GCAGCAAAGGGACTACTTAAATCTCAAGGATATCAGTCTTTTGCGTTICTZAGTTGATATGCGG GGAGCTGATGTTGATGATCGCCATTAGGGACGATCTGATGACGATGCTATTCATGCTGGCCATG AAACAACTGCTGCTGTGCTTAGCTTACATCTTTTTTTTGCTTCACAAAATCCTTCAAAAATGAA AAAAGCGCAAGCAAGATTGATTATCTTGCATGGAGCCAACTTTTGAATCATACGAATCGTTAAA GCATTGAAGTACATCAGACTTATCGTTGCAGAGACTCTTCGTTTGTTTCCTCAGCCTCCATTGC TGATAAGACGAGCTCTCAAATCAGATATATTACCAGGAGGATACAATGGTGACAAAACTGGATA TGCAATTCCTGCAGGGACTGACATCTTCATCTCTGTTTACAATCTCCACAGATCTCCCTACTTC TGGGATAATCCTCAAGAATTTGAACCAGAGAGATTTCAAGTAAAGAGGGCAAGCGAGGGAATTG AAGGATGGGATGGTTTCGACCCATCTAGAAGCCCCGGAGCTCTATACCCGAATGAGATTGTAGC AGACTTTTCCTTCTTACCATTTGGTGGAGGCCCTAGAAAATGTGTGGGAGATCAATTTGCTCTA ATGGAGTCAACTATAGCATTGGCCATGTTACTGCAGAAGTTTGATGTGGAGCTAAAAGGAAGTC CAGAATCTGTAGAACTAGTTACTGGAGCCACAATACATACCAAAAGTGGGTTGTGGTGCAPACT GAGAAGAAGATCACAAGTAAACTGA CYP1798 ATGGAAATGTCCTCAAGTGTCGCAGCCACAATCAGTATCTGGATGGTCGTCGTATGTATCGTAG 33 SEQIDNO:5 gene(coding GTGTAGGTTGGAGAGTCGTAAATTGGGTTTGGTTGAGACCAAAGAAATTGGAAAAGAGATTGAG in sequence,codon AGAACAAGGTTEGGCGGGIAATCCTTACAGATTGITGCTCGTGACTTGAAGGATGCGAGCTGCA W02016050890 optimized) ATGGAAGAACAAGCAAATTCAAAGCCTATAAACTTCTCCCATGACATCGGTCCAAGAGCTTTCC CTTCAATGTACAAGACCATCCAAAACTACGGTAAAAACTCCTACATGTGCTTAGGTCCATACCC TAGAGTCCACATCATGGATCCACAACAATTGAAGACCGCTTCTACTTTGGTCTACGACATTCAA AAGCCAAATTTGAACCCGGTTGATTAAATTCTTOTTAGATCGGCGTTACACATGAAGGGTGAAA AGTGGGCCTAACCACAGAAACATTATTAACCCAGGTTCCATTGGGAAAAGGTGAAGGATATGAT ACCTGGCTTCTTTCACTCATCTAATGAAATCCTCAACGAATAAAGATTGATTGCTACAAAAGAA GGTTGCTGCGAATTGGATGGCAATCCCGTATTCACAAAATTEGCCGGTGACGCCATTACAAGAA CCGCTTTTGTTCTTCATACGAAGAAGAAAGATTGATTGCTAGATCTTCCAATTGTTGAAGGAAT TTTGCTTCTCAAGCTAGCTTTTGCTCTTTATATTCCACCTTOGAGATICTTGCCTACAAAGAGT AACAATGAAGGAAATTAATAGAAAAATCAAGGCITTGITGTGGGCTATCATICAAGATGCATTG GACAAAAGGCAATGGAAGAAGGCGAAOCCGGTCAATCTGATTTOTTGGGIATATTAATOGAAAG TAATACTAACGAAATCCAAGOTGAAGGTAATAACAAGGAAGAIGGCATGTCTATTGAAGACGTC 
ATCGAAGAGTGTAAGGTATATTATATAGGAGGTCAAGAAACTACAGCAAGATTATTGATCTGGA CTATGATATTTTGTCCAGTCGAATATAGAATGGCAAGAAGAGCCAGAACCGAAGACTTGAAGGT ATTTGTAATAAGAAACCAGATTTCGACGGTTTGTCPAGATTGAAGCTAGATACTATTGATCTTG AACGAAGTTGTAAGATTTACCCACCTCCTGCCATGCCTGACAAGPATCATCCAAAAGGAAACAA GAGTTGCTAAACCTAACCGTGCCAGCAGTCTTATCTTGATAATGCCTATCATCTTGATACATAG AGATCACGACTTGTGGGGTGAAGATCTAACGAGTTAAACCAGAAAGAATCAGTAAAGCTTCTTG TCTAGGCACAGCAAAGTCCAACCAGCCTTTTCCCTTTTGGTTGGCCTCGTACCTATTTGCATGG GACAAAACTTCGCTATGATCGAAGCTAAGATGGCATTGAGTTTGATCTIGCAAAGATTTGCTIT CGAATAGTCTICATCCTACGTTCATGCACCAACTCTCGACTICACTACACAACCACAACACGGT GCCCACATCGTATTGAGAAAGTTATGA CYP1994gene ATGGAACCACAACCAAGTGCGAATTCAACTGGAATCACAGCCTAAGCACCGTCCTATCGGTG 34 SEQIDNO:6 (codingsequence) TCATTGCCATTATTTTCTTCCGTTTTCTCGTCAAAAGAGTCACGGCCCGGTGAGCGAAAGGG in TCCGAAGCCGCCAAAAGTAGCCGGAGGGTGGCCTCTAATTGGCCACCTCCCTCTCCTCTCGAGGA W02016050890 CCTGAACTGCCCCATGTCAAACTGGGTGGGTTGCCTGATAAATATGGTCCAATCTTCTCGATCC GGCTGGGTGTCCACTCCGCCGTCGTGATAAACAGTTGGGAGGCGGCGAAACAGTTATTAACCAA CCATGACGTCGCCGTCTCTTCCCGCCCCCAAATGCTCGGCGGAAAACTCCTGGGCTACAACTAC GCCOTGTiiGGETTCGGACCCTACGGCTOTTACTOGCGCAACATGCGCAAGATAACCACGOAAG AGCTECTATCCAATAGCAGAATCCAOCTCCTAAGAGACGTTCGAGCGTCAGAAGTGAACCAAGG CATAAAAGAGCTCTACCAGCACTGGAAAGAAAGAAGAGACGGTCACGACCAAGCCTTGGEGGAA CATAAAAGAGCTCTACCAGCACTGGAAAGAAAGAAGAGACGGTCACGACCAAGCCTTGGTGGAA TCTTTGGAGCTGCAGCAACGGTAGACGAGGAAGAGGCGCGACGGAGCCATAAAGCATTGAAGGA GTTGTTACATTATATGGGGCTTTTTCTACTGGGTGATGCTGTTCCATATCTAGGATGGTTGGAC GTCGGCGGCCATGTGAAGGCGATGAAGAAAACTTCAAAAGAATTGGACCG7ATGTTAACACAGT GGTTGGAGGAGCACAAGAAGGAAGGACCCAAGAAAGATCATAAAGACTTCATGGACGTGATGCT TTCAGTTCTCAATGAAACATCCGATGTTCTTTCAGATAAGACCCATGGCTTCGATGCTGATACC ATCATCAAAGCTACATGTATGACGATGGTTTTAGGAGGGAGTGATACGACGGCGGTGGTTGTGA TATGGGCAATCTCGCTGCTGCTGAATAATCGCCCTGCGTTGAGAAAAGTGCAAGAAGAACTGGA AGCCCATATCGGCCGAGACAGAGAACTGGAGGAATCGGATCTCGGTAAGCTAGTGTATTTGCAG GCAGTCGTGAAGGAGACATTGCGGCTGTACGGAGCCGGAGGCCTTTTCTTTCGTGAAACCACAG AGGATGTCACCATCGACGGATTCCATGTCGAGAAAGGGACATGGCTGTTCGTGAACGTGGGGAA 
GATCCACAGAGATGGGAAGGTGTGGCCGGAGCCAACGGAGTTCAAACCGGAGAGGTTTCTGACG ACCCACAAAGATTTTGATCTGAAGGGCCAGCGGTTTGAGCTCATCCCTTTCGGGGGAGGAAGAA GATCGTGCCCTGGAATGTCTTTTGGSCTCCAAATGCTACAGCTTATTTTGGGTAAACTGCTTCA GGCTTTTGATATATCGACGCCGGGGGACGCCGCCGTTGATATGACCGGATCCATTGGACTGACG AACATGAAAGCCACTCCATTGGAAGTGCTCATCACCCCGCGCTTGCCTCTTTCGCTTTACGATT GA CYP2048gene ATGGAGACTCTTCTTCTTCATCTTCAATCGTTATTTCATCCAATTTCCTTCACTGGTTTCGTTG 35 SEQIDNO:7 (codingsequence) TCCTCTTTAGCTTCCTGTTCCTGCTCCAGAAATGGTTACTGACACGTCCAAACTCTTCATCAGA in AGCCTCACCCCCTTCTCCACCAAAGCTTCCCATCTTCGGACACCTTCTAAACCTGGGTCTGCAT W02016050890 CCCCACATCACCCTCGGAGCCTACGCTCGCCGCTATGGCCCTCTCTTCCTCCTCCACTTCGGCA GCAAGCCCACCATCGTCGTCTCTTCTGCCGAAATCGCTCGCGATATCATGAAGACCCACGACCT CGTCTTCGCCAACCGTCCTAAATCAAGCATCAGCGAAAAGATTCTTTACGGCTCCAAAGATTTA GCCGCATCTCCTTACGGCGAATACTGGAGGCAGATGAAAAGCGTTGGCGTGCTTCATCTTTTGA GCAACAAAAGGGTTCAATCCTTTCGCTCTGTCAGAGAAGAAGAAGTCGAACTGATGATCCAGAA GATCCAACAGAACCCCCTATCAGTTAATTTAAGCGAAATATTCTCTGGACTGACGAACGACATA GTTTGCAGGGTGGCTTTAGGGAGAAAGTATGGCGTGGGAGAAGACGGAAAGAAGTTCCGGTCTC TTCTGCTGGAGTTTGGGGAAGTATTGGGAAGTTTCAGTACGAGAGACTTCATCCCGTGGCTGGG TTGGATTGATCGTATCAGTGGGCTGGACGCCAAAGCCGAGAGGGTAGCCAAAGAGCTCGATGCT TTCTTTGACAGAGTGATCGAAGATCACATCCATCTAAACAAGAGAGAGAATAATCCCGATGAGC AGAAGGACTTGGTGGATGTGCTGCTTTGTGTACAGAGAGAAGACTCCATCGGGTTTCCCCTTGA GATGGATAGCATAAAAGCTTTAATCTTGGACATGTTTGCTGCAGGCACAGACACGACATACACG GTGTTGGAGTGGGCAATGTCCCAACTGTTGAGACACCCAGAAGCGATGAAGAAACTGCAGAGGG AGGTCAGAGAAATAGCAGGTGAGAAAGAACACGTAAGTGAGGATGATTTAGAAAAGATGCATTA CTTGAAGGCAGTAATCAAAGAAACGCTGCGGCTACACCCACCAATCCCACTCCTCGTCCCCAGA GAATCAACCCAAGACATCAGGTTGAGGGGGTACGATATCAGAGGCGGCACCCGGGTTATGATCA ATGCATGGGCCATCGGAAGA CYP2740gene ATGTCGATGAGTAGTGAAATTGAAAGCCTCTGGGTTTTCGCGCTGGCTTCTAAATGCTCTGCTT 36 SEQIDNO:8 (codingsequence) TAACTAAAGAAAACATCCTCTGGTCTTTACTCTTCTTTTTCCTAATCTGGGTTTCTGTTTCCAT in TCTCCACTGGGCCCATCCGGGCGGCCCGGCTTGGGGCCGCTACTGGTGGCGCCGCCGCCGCAGC W02016050890 AATTCCACCGCCGCTGCTATTCCCGGCCCGAGAGGCCTCCCCCTCGTCGGCAGCATGGGCTTGA 
TGGCCGACTTGGCCCACCACCGGATTGCCGCCGTGGCTGACTCCTTAAACGCCACCCGCCTCAT GGCCTTTTCGCTCGGCGACACTCGCGTGATCGTCACATGCAACCCCGACGTCGCCAAAGAGATT CTCAACAGCTCCCTCTTCGCCGACCGCCCCGTTAAGGAGTCCGCTTACTCCTTGATGTTCAACC GCGCCATTGGGTTCGCCCCCTATGGCCTTTACTGGCGGACCCTCCGCCGCATCGCTTCCCACCA CCTCTTCTGCCCCAAGCAAATCAAGTCCTCCCAGTCCCAGCGCCGCCAAATCGCTTCCCAAATG GTCGCAATGTTCGCAAACCGCGATGCCACACAGAGCCTCTGCGTTCGCGACTCTCTCAAGCGGG CTTCTCTCAACAACATGATGGGCTCTGTTTTCGGCCGAGTTTACGACCTCTCTGACTCGGCTAA CAATGACGTCCAAGAACTCCAGAGCCTCGTCGACGAAGGCTACGACTTGC7GGGCCTCCTCAAC TGGTCCGACCATCTCCCATGGCTCGCCGACTTCGACTCTCAGAAAATCCGGTTCAGATGCTCCC GACTCGTCCCCAAGGTGAACCACTTCGTCGGCCGGATCATCGCCGAACACCGCGCCAAATCCGA CAACCAAGTCCTAGATTICGTCGACGTTTTGCTCTCTCTCCAAGAAGCCGACAAACTCTCTGAC TCCGATATGATCGCCGTTCTTTGGGAAATGATTTTTCGTGGGACGGACACGGTGGCAGTTTTAA TCGAGTGGATACTGGCCAGGATGGTACTTCACAACGATATCCAAAGGAAAGTTCAAGAGGAGCT AGATAACGTGGTTGGGAGTACACGCGCCGTCGCGGAATCCGACATTCCGTCGCTGGTGTATCTA ACGGCTGTGGTTAAGGAAGTTCTGAGGTTACATCCGCCGGGCCCACTCCTGTCGTGGGCCCGCC TAGCCATCACTGATACAATCATCGATGGGCATCACGTGCCCCGGGGGACCACCGCTATGGTTAA CATGTGGTCGATAGCGCGGGACCCACAGGTCTGGTCGGACCCACTCGAATTTATGCCCCAGAGG TTTGTGTCCGACCCCGGTGACGTGGAGTTCTCGGTCATGGGTTCGGATCTCCGGCTGGCTCCGT TCGGGTCGGGCAGAAGGACCTGCCCCGGGAAGGCCTTCGCCTGGACAACTGTCACCTTCTGGGT GGCCACGCTTTTACACGACTTCAAATGGTCGCCGTCCGATCAAAACGACGCCGTCGACTTGTCG GAGGTCCTCAAGCTCTCCTGCGAGATGGCCAATCCCCTCACCGTTAAAGTACACCCAAGGCGCA GTTTAAGCTTTTAA CYP3404gene ATGGATGGTTTTCTTCCAACAGTGGCGGCGAGCGTGCCTGTGGGAGTGGGTGCAATATTGTTCA 37 SEQIDNO:9 (codingsequence) CGGCGTTGTGCGTCGTCGTGGGAGGGGTTTTGGTTTATTTCTATGGACCTTACTGGGGAGTGAG in AAGGGTGCCTGGTCCACCAGCTATTCCACTGGTCGGACATCTTCCCTTGCTGGCTAAGTACGGC W02016050890 CCAGACGTTTTCTCTGTCCTTGCCACCCAATATGGCCCTATCTTCAGGTTCCATATGGGTAGGC AGCCATTGATAATTATAGCAGACCCTGAGCTTTGTAAAGAAGCTGGTATTAAGAAATTCAAGGA CATCCCAAATAGAAGTGTCCCTTCTCCAATATCAGCTTCCCCTCTTCATCAGAAGGGTCTTTTC IICACAAGGGATGCAAGATGGTCGACAATGCGGAACACGATATTATCGGTCTATCAGTCCTCCC ATCTAGCGAGACTAATACCTACTATGCAATCAATCATTGAAACTGCAACTCAAAATCTCCATTC 
CTCTGTCCAGGAAGACAICCCTTTCTCCAATCTCTCCCTCAAATTGACCACCGATGTGATTGGA ACAGCAGCCTTCGGTGTCAACTTTGGGCTCTCTAATCCACAGGCAACCAAAAGTTGTGCTACCA ACGGCCAAGACAACAAAAATGACGAAGTTTCAGACTTCATCAATCAACACATCTACTCCACAAC GCAGCTCAAGATGGATTTATCAGGTTCCTTCTCAATCATACTTGGACTGCTTGTCCCTATACTC CAAGAACCATTTAGACAAGTCCTAAAGAGAATACCATTCACCATGGACTGGAAAGTGGACCGGA CAAATCAGAAATTAAGTGGTCGGCTTAATGAGATTGTGGAGAAGAGAATGAAGTGTAACGATCA AGGTTCAAAAGACTTCTTATCGCICATTTTGAGAGCAAGAGAGTCAGAGACAGTATCAAGGAAT GTCTTCACTCCAGACTACATCAGTGCAGTTACGIATGAACACCTACTTGCIGGGTCGGCTACCA CGGCGTTTACGTTGTCTTCTATTGTATATTTAGTTGCTGGGCATCCAGAAGTCGAGAAGAAGTT GCTAGAAGAGATTGACAACTTTGGTCCATCCGArCAGATACCAACAGCTAATGATCTTCATCAG AAGTTTCCATATCTTGATCAGGTGATTAAAGAGGCTATGAGGTTCTACACTGTTTCCCCTCTAG TAGCCAGAGAAACAGCTAAAGATGTGGAGATTGGTGGATATCTTCTTCCAAAGGGGACATGGGT ITGGTTAGCACTTGGAGTTCTTGCCAAGGATCCAAAGAACTTTCCAGAACCAGATAAATTCAAA CCAGAGAGGTTTGATCCAAATGAAGAAGAGGAGAAACAAAGGCATCCTTATGCTTTAATCCCCT TTGGAATTGGTCCTCGAGCATGCATIGGTAAAAAATTCGCCCTTCAGGAGTTGAAGCTCTCGTT GATTCATTTGTACAGGAAGTTTGTATTTCGGCAT CYP3968gene ATGGAAATCATTTTATCATATCTCAACAGCTCCATAGCTGGACTCTTCCTCTTGCTTCTCTTCT 38 SEQIDNO:10 (codingsequence) CGTTTTTTGTTTTGAAAAAGGCTAGAACCTGTAAACGCAGACAGCCTCCTGAAGCAGCCGGCGG in ATGGCCGATCATCGGCCACCTGAGACTGCTCGGGGGTTCGCAACTTCCCCATGAAACCTTGGGA W02016050890 GCCATGGCCGACAAGTATGGACCAATCTTCAGCATCCGAGTTGGTGTCCACCCATCTCTTGTTA TAAGCAGTTGGGAAGTGGCTAAAGAGTGCTACACCACCCTCGACTCAGTTGTCTCTTCTCGTCC CAAGAGTTTGGGTGGAAAGTTGTTGGGCTACAACTTCGCCGCTTTTGGGTTCAGGCCTTATGAT TCCTTTTACCGGAGTATCCGCAAAACCATAGCCTCCGAGGTGCTGTCGAACCGCCGTCTGGAGT TGCAGAGACACATTCGAGTTTCTGAGGTGAAGAGATCGGTGAAGGAGCTTTACAATCTGTGGAC GCAGAGAGAGGAAGGCTCAGACCACATACTTATTGATGCGGATGAATGGATTGGTAATATTAAT TTGAACGTGATTCTGATGATGGTTTGTGGGAAGCGGTTTCTTGGCGGTTCTGCCAGCGATGAGA AGGAGATGAGGCGGTGTCTCAAAGTCTCGAGAGATTTCTTCGATTTGACAGGGCAGTTTACGGT GGGAGATGCCATTCCTTTCCTGCGATGGCTGGATTTGGGTGGATATGCGAAGGCGATGAAGAAA ACTGCAAAAGAAATGGACTGTCTCGTTGAGGAATGGCTGGAAGAACACCGCCGGAAGAGAGACT CCGGCGCCACCGACGGTGAACGTGACTTCATGGATGTGATGCTTTCGATTCTTGAAGAGATGGA 
CCTTGCTGGCTACGACGCTGACACAGTCAACAAAGCCACATGCCTGAGCATTATTTCTGGGGGA ATCGATACTATAACGCTAACTCTGACATGGGCGATCTCGTTATTGCTGAACAATCGAGAGGCAC TGCGAAGGGTTCAAGAGGAGGTGGACATCCATGTCGGAAACAAAAGGCTTGTGGATGAATCAGA CTTGAGCAAGCTGGTGTATCTCCAAGCCGTCGTGAAAGAGACATTAAGGTTGTACCCAGCAGGG CCGCTGTCGGGAGCTCGAGAGTTCAGTCGGGACTGCACGGTCGGAGGGTATGACGTGGCCGCCG GCACACGGCTCATCACAAACCTTTGGAAGATACAGACGGACCCTCGGGTGTGGCCGGAGCCACT TGAGTTCAGGCCGGAGAGGTTTCTGAGCAGCCACCAGCAGTTGGATGTGAAGGGCCAGAACTTT GAACTGGCCCCATTTGGTTGTGGAAGAAGAGTGTGCCCTGGGGCGGGGCTTGGGGTTCAGATGA CGCAGTTGGTGCTGGCGAGTCTGATTCATTCGGTGGAACTTGGAACTCGCTCCGATGAAGCGGT GGACATGGCTGCTAAGTTTGGACTCACAATGTACAGAGCCACCCCTCTTCAGGCTCTCGTCAAG CCACGCCTCCAAGCCGGTGCTTATTCATGA CYP4112gene ATGGGTGTATTGTCCATTTTATTATTCAGATATTCCGTCAAGAAGAAGCCATTAAGATGCGGTC 39 SEQIDNO:11 (codingsequence) ACGATCAAAGAAGTACCACAGATAGTCCACCTGGTTCAAGAGGTTTGCCATTGATAGGTGAAAC in TTTGCAATTCATGGCTGCTATTAATTCTTTGAACGGTGTATACGATTTCGTTAGAATAAGATGT W02016050890 TTGAGATACGGTAGATGCTTTAAGACAAGAATCTTCGGTGAAACCCATGTTTTTGTCTCAACTA CAGAATCCGCTAAGTTGATCTTGAAGGATGGTGGTGAAAAATTCACCAAAAAGTACATCAGATC AATCGCTGAATTGGTTGGTGACAGAAGTTTGTTATGTGCATCTCATTTGCAACACAAGAGATTG AGAGGTTTGTTGACTAATTTGTTTTCTGCCACATTCTTGGCTTCTTTCGTAACTCAATTCGATG AACAAATCGTTGAAGCTTTTAGATCATGGGAATCCGGTAGTACCATAATCGTTTTGAACGAAGC ATTGAAGATCACTTGTAAGGCCATGTGCAAAATGGTCATGTCCTTAGAAAGAGAAAACGAATTG GAAGCTTTGCAAAAGGAATTGGGTCATGTTTGTGAAGCTATGTTGGCATTTCCATGCAGATTCC CTGGTACAAGATTTCACAATGGTTTGAAGGCAAGAAGAAGAATCATTAAAGTTGTCGAAATGGC CATTAGAGAAAGAAGAAGATCTGAAGCTCCTAGAGAAGATTTCTTGCAAAGATTGTTGACAGAA GAAAAGGAAGAAGAAGACGGTGGTGGTGTTTTAAGTGATGCCGAAATTGGTGACAACATATTGA CAATGATGATCGCAGGTCAAGATACCACTGCCTCTGCTATTACCTGGATGGTCAAGTTTTTGGA AGAAAACCAAGATGTATTGCAAAACTTAAGAGACGAACAATTCGAAATCATGGGTAAACAAGAA GGTTGTGGTTCATGCTTCTTGACATTAGAAGATTTGGGTAATATGTCCTATGGTGCAAAAGTAG TTAAGGAATCATTGAGATTAGCCTCCGTCGTACCATGGTTTCCTAGATTGGTTTTACAAGATTC TTTGATCCAAGGTTACAAAATTAAAAAGGGTTGGAACGTCAACATAGACGTAAGATCTTTACAT TCAGATCCATCCTTGTATAATGACCCAACAAAGTTTAACCCTAGTAGATTCGATGACGAAGCTA 
AACCTTACTCATTTTTGGCATTCGGTATGGGTGGTAGACAATGTTTGGGTATGAACATGGCAAA GGCCATGATGTTGGTTTTCTTGCACAGATTGGTCACCTCATTCAGATGGAAGGTTATAGATTCC GACTCTTCAATCGAAAAATGGGCTTTGTTCTCTAAGTTGAAGTCAGGTTGCCCTATCGTAGTTA CCCACATCGGTTCCTAA CYP4149gene ATGGATTTCTACTGGATCTGTGTTCTTCTGCTTTGCTTCGCATGGTTTTCCATTTTATCCCTTC 40 SEQIDNO:12 (codingsequence) ACTCGAGAACAAACAGCAGCGGCACTTCCAAACTTCCTCCCGGACCGAAACCCTTGCCGATCAT in CGGAAGCCTTTTGGCTCTCGGCCACGAGCCCCACAAGTCTTTGGCTAATCTCGCTAAATCTCAT W02016050890 GGCCCTCTTATGACCTTAAAGCTCGGCCAAATCACCACCGTCGTAGTTTCCTCCGCTGCCATGG CTAAGCAAGTTCTCCAAACGCACGACCAGTTTCTGTCCAGCAGGACCGTTCCAGACGCAATGAC CTCTCACAACCACGATGCTTTCGCACTCCCATGGATTCCGGTTTCACCCCTCTGGCGAAACCTT CGACGAATATGCAACAACCAGTTGTTTGCCGGCAAGATTCTCGACGCCAACGAGAATCTCCGGC GAACCAAAGTGGCCGAGCTCGTATCCGATATCTCGAGAAGTGCATTGAAAGGTGAGATGGTGGA TTTTGGAAACGTGGTGTTCGTCACTTCGCTCAATCTGCTTTCCAATACGATTTTCTCGGTGGAT TTCTTCGACCCAAATTCTGAAATTGGGAAAGAGTTCAGGCACGCAGTACGAGGCCTCATGGAAG AAGCTGCCAAACCAAATTTGGGGGATTATTTCCCTCTGCTGAAGAAGATAGATCTTCAAGGAAT AAAGAGGAGACAGACCACTTACTTCGATCGGGTTTTTAATGTTTTGGAGCACATGATCGACCAG CGTCTTCAGCAGCAGAAGACGACGTCTGGTTCTACCTCCAACAACAACAACGACTTACTGCACT ACCTTCTCAACCTCAGCAACGAAAATAGCGACATGAAATTGGGGAAACTTGAGCTGAAACACTT CTTATTGGTGCTATTCGTCGCTGGGACTGAAACGAGTTCTGCAACACTGCAATGGGCAATGGCA GAACTACTAAGAAACCCAGAAAAGTTAGCAAAAGCTCAAGCGGAGACCAGGCGGGTGATTGGGA AAGGGAACCCAATTGAAGAATCAGACATTTCGAGGCTGCCTTATCTGCAAGCAGTGGTGAAAGA AACTTTCAGATTGCACACACCAGCGCCATTTCTACTGCCGCGCAAAGCACTACAGGACGTGGAA ATTGCAGGTTTCACAGTCCCAAAGGACGCTCAGGTACTGGTAAATTTATGGGCTATGAGCAGAG ATTCAAGCATCTGGGAGAACCCAGAGTGGTTCGAGCCAGAAAGGTTTTTGGAGTCGGAGCTGGA CGTTAGAGGGAGAGATTTTGAGCTGATCCCGTTCGGCGGTGGGCGGAGGATTTGCCCCGGTCTG CCGTTGGCGATGAGAATGTTGCATTTGATTTTGGGTTCTCTCATCCACTTCTTTGATTGGAAGC TTGAAGATGGGTGTCGGCCGGAAGACGTGAAAATGGACGAAAAGCTTGGCCTCACTCTGGAGTT GGCTTTTCCCCTCACAGCCTTGCCTGTCCTTGTCTAA CYP4491gene ATGTCCTCCTGCGGTGGTCCAACTCCTTTGAATGTTATCGGTATCTTATTACAATCAGAATCCT 41 SEQIDNO:13 (codingsequence) CCAGAGCCTGCAACTCAGACGAAAACTCAAGAATTTTGAGAGATTTCGTAACAAGAGAAGTTAA in 
CGCTTTCTTATGGTTGTCCTTGATCACTATCACAGCAGTTTTGATCAGTAAAGTTGTCGGTTTG W02016050890 TTTAGATTGTGGTCTAAGGCAAAGCAATTGAGAGGTCCACCTTGTCCATCATTCTACGGTCATT CTAAGATCATCTCAAGACAAAATTTGACTGATTTGTTATATGACTCCCACAAAAAGTACGGTCC AGTAGTTAAATTGTGGTTAGGTCCTATGCAATTGTTAGTCTCCGTAAAGGAACCAAGTTTGTTG AAGGAAATATTGGTTAAAGCTGAGGATAAGTTGCCTTTAACAGGTAGAGCCTTTAGATTGGCTT TCGGTAGATCTTCATTATTTGCATCCAGTTTCGAAAAGGTTCAAAACAGAAGACAAAGATTGGC CGAAAAGTTGAATAAGATCGCATTCCAAAGAGCCAACATCATTCCAGAAAAGGCCGTAGCTTGT TTCATGGGTAGAGTTCAAGATTTGATGATAGAAGAATCTGTCGACTGTAATAAGGTTTCTCAAC ATTTGGCTTTTACTTTGTTAGGTTGCACATTGTTTGGTGACGCCTTCTTAGGTTGGTCTAAGGC TACAATCTATGAAGAATTGTTGATGATGATCGCTAAGGACGCATCCTTTTGGGCTAGTTATAGA GTTACCCCAATCTGGAAGCAAGGTTTCTGGAGATACCAAAGATTGTGTATGAAGTTGAAGTGCT TGACTCAAGATATCGTTCAACAATACAGAAAGCATTACAAGTTGTTTTCTCACTCACAAAACCA AAACTTACACAACGAAACCAAGTCAACTGGTGTTGAAGTCGCTTTTGATATTCCACCTTGTCCT GCTGCAGACGTTAGAAATTCTTGCTTTTTCTACGGTTTGAACGATCATGTTAACCCAAACGAAG AACCTTGTGGTAATATTATGGGTGTCATGTTTCACGGTTGCTTGACTACAACCTCTTTGATCGC ATCAATCTTGGAAAGATTGGCCACTAACCCAGAAATCCAAGAAAAGATTAATTCTGAATTGAAC TTAGTTCAAAAGGGTCCAGTCAAGGATCATAGAAAGAATGTTGACAACATGCCTTTGTTATTGG CAACAATCTATGAATCAGCTAGATTATTGCCAGCAGGTCCTTTATTGCAAAGATGTCCTTTGAA GCAAGATTTGGTTTTGAAAACAGGTATCACCATTCCAGCTGGTACCTTGGTCGTAGTTCCTATT AAATTGGTTCAAATGGATGACTCTTCATGGGGTTCAGATGCCAATGAGTTTAATCCATACAGAT TCTTGTCCATGGCTTGTAATGGTATTGACATGATACAAAGAACCCCTTTAGCTGGTGAAAACAT TGGTGACCAAGGTGAAGGTTCATTTGTCTTGAATGACCCAATTGGTAACGTAGGTTTCTTACCT TTTGGTTTCGGTGCAAGAGCCTGCGTTGGTCAAAAGTTTATAATCCAAGGTGTCGCTACTTTGT TCGCAAGTTTGTTGGCCCATTACGAAATTAAATTGCAATCCGAGAGTAAGAATGATTCTAAACC ATCCAGTAACACCTCTGCCAGTCAAATCGTCCCAAACTCAAAAATCGTATTCGTAAGAAGAAAC TCATAA CYP5491gene ATGTGGACTGTCGTGCTCGGTTTGGCGACGCTGITTGTCGCCIACTACATCCATTGGATTAACA 42 SEQIDNO:14 (codingsequence) AATGGAGAGATTCCAAGITCAACGGAGTTCTGCCGCCGGGCACCATGGGTTTGCCGCTCATCGG in AGAGACGATTCAACTGAGICGACCCAGTGACTCCCTCGACGTTCACCCITTCATCCAGAAAAAA W02016050890 GTTGAAAGATACGGGCCGATCTTCAAAACATGTCTGGCCGGAAGGCCGGTGGTGGTGTCGGCGG 
ACGCAGAGTTCAACAACTACATAATGCTGCAGGAAGGAAGAGCAGTGGAAATGTGGTATTTGGA TACGCTCTCCAAATTTTTCGGCCTCGACACCGAGTGGCTCAAAGCTCTGGGCCTCATCCACAAG TACATCAGAAGCATTACTCTCAATCACTTCGGCGCCGAGGCCCTGCGGGAGAGATTTCTTCCTT TTATTGAAGCATCCTCCATGGAAGCCClrCACICCTGGTCTACTCAACCTAGCGTCGAAGTCAA AAATGCCTCCGCTCTCATGGTTTXTAGGACCTCGGTGAATAAGATGTTCGGTGAG6ATGCGAAG AAGCTATCGGGAAATATCCCTGGGAAGTTCACGAAGCTTCTAGGAGGATTTCTGAGTTTACCAC TGAATTTTCCCGGCACCACCTACCACAAATGCTTGAAGGATAIGAAGGAAATCCAGAAGAAGCT AAGAGAGGTTGTAGACGATAGATTGGCTAATGTGGGGCCTGATGTGGAAGATTTCTTGGGGCAA GCCCTTAAAGATAAGGAATCAGAGAAGTTCATTTCAGAGGAGTTCATCATCCAACTGTTGTTTT CTATCAGTTTTGCTAGCTTTGAGTCCATCTCCACCACTCTTACTTTGATTCTCAAGCTCCTTGA TGAACACCCAGAAGTAGTGAAAGAGTTGGAAGCTGAACACGAGGCGATTCGAAAAGCTAGAGCA GATCCAGATGGACCAATTACTTGGGAAGAATACAAATCCATGACTTTTACATTACAAGTCATCA ATGAAACCCTAAGGTTGGGGAGTGTCACACCTGCCTTGTTGAGGAAAACAGTTAAAGATCTTCA AGTAAAAGGATACATAATCCCGGAAGGATGGACAATAATGCTTGTCACCGCTTCACGTCACAGA GACCCAAAAGTCTATAAGGACCCTCATATCTTCAATCC.ATGGCGTTGGAAGGACTTGGACTCA TTACCATCCAAAAGAACTTCATGCCTTTTGGGGGAGGCTTAAGGCATTGTGCTGGTGCTGAGTA CTCTAAAGTCTACTTGTGCACCTTCXTGCACATCCTCTGTACCAAATACCGATGGACCAAACTT GGGGGAGGAAGGATTGCAAGAGCTCATATATTGAGTTTTGAAGATGGGTTACATGTGAAGTTCA CACCCAAGGAATGA CYP6479gene ATGAAGATGAAGATGGAATCCATGCGCACCTCCCTGGATATCTCCGACCATGACATACTTCCAA 43 SEQIDNO:15 (codingsequence) GGGTTTATCCTCATGTTCACCTATGGATCAACAAATATGGGAAAAACTTCATTCAGTGGAATGG in CAACGTAGCTCAGTTGATTGTTTCGGATCCTGACACGATCAAGGAGATACTCCAAAACCGAGAA W02016050890 CAAGCTGTTCCCAAAATAGATCTCAGCGGAGATGCACGGAGGATATTCGGGAATGGGCTTTCGA CTTCTGACGGTGAAAAATGGGCTAAGGCTCGAAGAATCGCTGATTACGCTTTCCACGGGGATCT CCTAAGAAATATGGGGCCAACCATGGTTTCCTGTGCTGAGGCAATGGTGGAAAAGTGGAAGCAT CATCAAGGCAAAGAGCTTGATTTGTTCGAAGAGTTTAAGGTGCTCACTTCAGATATCATTGCAC ATACAGCCTTTGGAAGCAGTTATTTGGAAGGGAAAGTTATTTTTCAGACTCTAAGTAAGCTGAG CATGATATTATTTAAGAATCAGTTCAAACGAAGGATTCCTGTTATCAGCAAGTTCTTCAGATCA AAGGATGCGAGGGAGGGAGAGGAGCTGGAAAGAAGGTTGAAAAATTCCATAATTTCAATAATGG AAAAGAGAGAAGAGAAGGTGATAAGTGGTGAAGCAGATAACTATGGTAATGATTTTCTTGGATT 
ACTTTTGAAGGCAAAGAATGAGCCTGACCAGAGGCAGAGGATTTCTGTTGATGATGTAGTGGAT GAATGCAAAACAGTTTACTTCGCTGGGCAAGAAACTACAAGTGTTTTGCTTGCTTGGACCGCCT TTCTTTTAGCAACTCATGAGCATTGGCAAGAAGAAGCAAGAAAGGAAGTGCTGAATATGTTTGG CAACAAGAATCCAACTTTAGAAGGCATCACAAAATTAAAGATTATGAGCATGATCATCAAGGAA TCTCTAAGATTATATCCTCCAGCCCCGCCCATGTCAAGGAAGGTTAAAAAGGAAGTCAGATTGG GGAAGCTGGTTCTCCCCCCCAACATTCAAGTAAGCATCTCAACTATTGCAGTTCATCATGATAC TGCAATATGGGGTGAAGATGCCCATGTATTCAAACCAGAAAGATTTTCTGAAGGAACAGCTAAA GATATCCCATCAGCTGCATACATCCCATTTGGCTTTGGTCCTCGAAACTGCATCGGCAATATCT TGGCCATCAACGAAACTAAGATTGCACTGTCGATGATTCTACAACGATTTTCTTTCACCATCTC CCCGGCCTACGTCCACGCACCTTTCCAGTTCCTCACTATCTGCCCCCAACACGGGGTTCAGGTA AAGCTTCAGTCCCTATTAAGTGAAAGGTGA CYP7604gene ATGGAAGCTGAATTTGGTGCCGGTGCTACTATGGTATTATCCGTTGTCGCAATCGTCTTCTTTT 44 SEQIDNO:16 (codingsequence) TCACATTTTTACACTTGTTTGAATCTTTCTTTTTGAAGCCAGATAGATTGAGATCTAAGTTGAG in AAAGCAAGGTATTGGTGGTCCATCTCCTTCATTTTTGTTGGGTAATTTGTCAGAAATTAAATCC W02016050890 ATCAGAGCTTTGTCTTCACAAGCTAAGAACGCAGAAGATGCCTCTGCTGGTGGTGGTGGTGGTT CCGCCAGTATAGCTCATGGTTGGACTTCAAATTTGTTTCCTCACTTAGAACAATGGAGAAACAG ATATGGTCCAATTTTCGTATACTCCAGTGGTACAATCCAAATCTTGTGTATCACAGAAATGGAA ACCGTTAAGGAAATCTCTTTGTCAACCTCCTTGAGTTTAGGTAAACCTGCTCATTTGTCTAAGG ATAGAGGTCCATTGTTAGGTTTGGGTATCTTAGCCTCTTCAGGTCCTATTTGGGTTCACCAAAG AAAGATCATCGCTCCACAATTGTATTTGGATAAAGTAAAGGGTATGACCTCATTGATGGTTGAA AGTGCAAATTCTATGTTAAGATCCTGGGAAACTAAAGTTGAAAATCATGGTGGTCAAGCCGAAA TTAACGTCGATGGTGACTTGAGAGCATTAAGTGCCGATATCATTTCTAAGGCTTGCTTTGGTTC AAACTATTCCGAAGGTGAAGAAATTTTCTTGAAGTTGAGAGCATTGCAAGTTGTCATGAGTAAG GGTTCTATTGGTATACCTGGTTTTAGATACATACCAACTAAAAATAACAGAGAAATGTGGAAGT TGGAAAAGGAAATCGAATCAATGATCTTGAAGGTTGCCAACGAAAGAACACAACATTCCAGTCA CGAACAAGATTTGTTGCAAATGATTTTGGAAGGTGCAAAGTCTTTGGGTGAAGACAATAAGAGT ATGAACATATCAAGAGACAAGTTTATTGTTGACAATTGTAAGAACATCTATTTCGCTGGTCATG AAACTACAGCTATAACCGCATCTTGGTGCTTGATGTTGTTAGCTGCACACCCTGATTGGCAAGC AAGAGCCAGATCTGAAGTTTTACAATGTTGCGATGACAGACCAATCGATGCAGACACAGTCAAA AATATGAAGACCTTGACTATGGTAATTCAAGAAACTTTGAGATTGTACCCACCTGCTGTATTCG 
TTACAAGACAAGCATTAGAAGATATCAGATTCAAAAACATCACAATACCAAAGGGTATGAACTT TCATATACCAATCCCTATGTTGCAACAAGACTTCCACTTATGGGGTCCTGATGCTTGTTCATTT GACCCACAAAGATTCTCCAATGGTGTCTTAGGTGCATGCAAAAACCCACAAGCCTATATGCCTT TTGGTGTTGGTCCAAGAGTCTGTGCCGGTCAACATTTCGCTATGATCGAATTGAAAGTCATCGT ATCATTGGTTTTGTCCAGATTCGAATTTTCTTTGTCACCTTCCTACAAGCATTCACCAGCCTTC AGATTAGTTGTCGAACCAGAAAACGGTGTCATATTGCATGTCAGAAAGTTGTGA CYP8224gene ATGGAAGTGGATATCAATATCTTCACCGTCTTTTCCTTCGTATTATGCACAGTCTTCCTCTTCT 45 SEQIDNO:17 (codingsequence) TTCTATCCTTCTTGATCCTCCTCCTCCTCCGAACGCTCGCCGGAAAATCCATAACGAGCTCCGA in GTACACGCCAGTGTACGGCACCGTCTACGGTCAGGCTTTCTATTTCAACAACCTGTACGATCAT W02016050890 CTAACGGAGGTGGCCAAGAGACATCGAACCTTCCGGCTGCTTGCGCCGGCATACAGCGAGATAT ACACGACCGATCCGAGAAACATCGAGCATATGTTGAAGACGAAATTCGATAAGTATTCGAAAGG AAGCAAGGATCAAGAAATCGTTGGGGATCTGTTTGGAGAGGGGATATTTGCAGTCGATGGAGAT AAGTGGAAGCAGCAGAGGAAGCTGGCTAGCTATGAATTCTCGACGAGGATTCTTAGGGATTTTA GCTGCTCGGTTTTCAGACGAAGTGCTGCTAAACTTGTTGGAGTTGTTTCGGAGTTTTCCAGCAT GGGTCGGGTTTTTGATATCCAGGATTTGCTAATGCGGTGCGCTTTGGACTCCATTTTCAAAGTG GGGTTCGGGGTTGATTTGAATTGCTTGGAGGAATCAAGCAAAGAAGGGAGCGATTTCATGAAAG CCTTCGATGATTCTAGCGCTCAGATTTTTTGGCGCTATATCGATCCCTTCTGGAAATTGAAGAG ATTGCTTAACATCGGTTCCGAAGCTTCGTTTAGGAACAACATAAAAACCATAGATGCTTTTGTG CACCAGTTGATCAGAGACAAGAGAAAATTGCTTCAGCAACCGAATCACAAGAATGACAAAGAGG ACATACTTTGGAGGTTTCTGATGGAAAGTGAGAAGGATCCAACAAGAATGAATGATCAATATCT AAGGGATATAGTCCTCAATTTCATGTTGGCTGGCAAAGATTCAAGTGGAGGAACTCTGTCCTGG TTCTTCTACATGCTATGCAAGAACCCTTTAATACAGGAAAAAGTTGCAGAAGAAGTGAGGCAAA TTGTTGCGTTTGAAGGGGAAGAAGTTGACATCAATTTGTTCATACAAAACTTAACTGATTCAGC TCTTGACAAAATGCATTATCTTCATGCAGCATTGACCGAGACTCTGAGGCTATATCCTGCAGTC CCTTTGGATGGAAGGACTGCAGAAATAGATGACATTCTTCCTGATGGCTATAAACTAAGAAAAG GGGATGGAGTATACTACATGGCCTATTCCATGGGCAGGATGTCCTCCCTTTGGGGAGAAGATGC TGAAGATTTTAAACCCGAAAGATGGCTTGAAAGTGGAACTTTTCAACCCGAATCACCTTTCAAA TTCATCGCTTTTCATGCGGGTCCTCGAATGTGTTTGGGAAAAGAGTTTGCTTATCGACAAATGA AGATAGTATCTGCTGCTTTGCTTCAATTTTTTCGATTCAAAGTAGCTGATACAACGAGGAATGT GACTTATAGGATCATGCTTACCCTTCACATTGATGGAGGTCTCCCTCTTCTTGCAATTCCGAGA 
ATTAGAAAATTTACCTAA CYP8728gene TTGGATAGTGGAGTTAAAAGAGTGAAACGGCTAGTTGAAGAGAAACGGCGAGCAGAATTGTCTG 46 SEQIDNO:18 sequence CCCGGATTGCCTCTGGAGAATTCACAGTCGAAAAAGCTGGTTTTCCATCTGTATTGAGGAGTGG in CTTATCAAAGATGGGTGTTCCCAGTGAGATTCTGGACATATTATTTGGTTTCGTTGATGCTCAA W02016050890 GAAGAATATCCCAAGATTCCCGAAGCAAAAGGATCAGTAAATGCAATTCGTAGTGAGGCCTTCT TCATACCTCTCTATGAGCTTTATCTCACATATGGTGGAATATTTAGGTTGACTTTTGGGCCAAA GTCATTCTTGATAGTTTCTGATCCTTCCATTGCTAAACATATACTGAAGGATAATCCGAGGAAT TATTCTAAGGGTATCTTAGCTGAAATTCTAGAGTTTGTCATGGGGAAGGGACTTATACCAGCTG ACGAGAAGATATGGCGTGTACGAAGGCGGGCTATAGTCCCATCTTTGCATCTGAAGTATGTAGG TGCTATGATTAATCTTTTTGGAGAAGCTGCAGATAGGCTTTGCAAGAAGCTAGATGCTGCAGCA TCTGATGGGGTTGATGTGGAAATGGAGTCCCTGTTCTCCCGTTTGACTTTAGATATCATTGGCA AGGCAGTTTTTAACTATGACTTTGATTCACTTACAAATGACACTGGCATAGTTGAGGCTGTTTA CACTGTGCTAAGAGAAGCAGAGGATCGCAGTGTTGCACCAATTCCAGTATGGGAAATTCCAATT TGGAAGGATATTTCACCACGGCAAAAAAAGGTCTCTAAAGCCCTCAAATTGATCAACGACACCC TCGATCAACTAATTGCTATATGCAAGAGGATGGTTGATGAGGAGGAGCTGCAGTTTCATGAGGA ATACATGAATGAGCAAGATCCAAGCATCCTTCATTTCCTTTTGGCATCAGGAGATGATGTTTCA AGCAAGCAGCTTCGTGATGACTTGATGACTATGCTTATAGCTGGGCATGAAACATCTGCTGCAG TTTTAACATGGACCTTTTATCTTCTTTCCAAGGAGCCGAGGATCATGTCCAAGCTCCAGGAGGA GGTTGATTCAGTCCTTGGGGATCGGTTTCCAACTATTGAAGATATGAAGAACCTCAAATATGCC ACACGAATAATTAACGAATCCTTGAGGCTTTACCCACAGCCACCAGTTTTAATACGTCGATCTC TTGACAATGATATGCTCGGGAAGTACCCCATTAAAAAGGGTGAGGACATATTCATTTCTGTTTG GAACTTGCATCGCAGTCCAAAACTCTGGGATGATGCGGATAAATTTAATCCTGAAAGGTGGCCT CTGGATGGACCCAATCCAAATGAGACAAATCAAAATTTCAGATATTTACCTTTTGGTGGCGGAC CACGGAAATGTGTGGGAGACATGTTTGCTTCGTACGAGACTGTTGTAGCACTTGCAATGCTTGT TCGGCGATTTGACTTCCAAATGGCACTTGGAGCACCTCCTGTAAAAATGACAACTGGAGCTACA ATTCACACAACAGATGGATTGAAAATGACAGTTACACGAAGAATGAGACCTCCAATCATACCCA CATTAGAGATGCCTGCAGTGGTCGTTGACTCGTCTGTCGTGGACTCGTCCGTCGCCATTTTGAA AGAAGAAACACAAATTGGTTAG DNAsequence CAGTTCCTCTCCTGGTCCTCCCAGTTTGGCAAGAGGTTCATCTTCTGGAATGGGATCGAGCCCA 47 SEQIDNO:19 encodingCYP10020 GAATGTGCCTCACCGAGACCGATTTGATCAAAGAGCTTCTCTCTAAGTACAGCGCCGTCTCCGG in 
TAAGTCATGGCTTCAGCAACAGGGCTCCAAGCACTTCATCGGCCGCGGTCTCTTAATGGCCAAC W02016050890 GGCCAAAACTGGTACCACCAGCGTCACATCGTCGCGCCGGCCTTCATGGGAGACAGACTCAAGA GTTACGCCGGGTACATGGTGGAATGCACAAAGGAGATGCTTCAGTCAATTGAAAACGAGGTCAA CTCGGGGCGATCCGAGTTCGAAATCGGTGAGTATATGACCAGACTCACCGCCGATATAATATCA CGAACCGAGTTCGAAAGCAGCTACGAAAAGGGAAAGCAAATTTTCCATTTGCTCACCGTTTTAC AGCATCTCTGCGCTCAGGCGAGCCGCCACCTCTGCCTTCCTGGAAGCCGGTTTTTTCCGAGTAA ATACAACAGAGAGATAAAGGCATTGAAGACGAAGGTGGAGGGGTTGTTAATGGAGATAATACAG AGCAGAAGAGACTGTGTGGAGGTGGGGAGGAGCAGTTCGTATGGAAATGATCTGTTGGGAATGT TGCTGAATGAGATGCAGAAGAAGAAAGATGGGAATGGGTTGAGCTTGAATTTGCAGATTATAAT GGATGAATGCAAGACCTTCTTCTTCGCCGGCCATGAAACCACTGCTCTTTTGCTCACTTGGACT GTAATGTTATTGGCCAGCAACCCTTCTTGGCAACACAAGGTTCGAGCCGAAGTTATGGCCGTCT GCAATGGAGGAACTCTCTCTCTTGAACATCTCTCCAAGCTCTCTCTGTTGAGTATGGTGATAAA TGAATCGTTGAGGCTATACCCGCCAGCAAGTATTCTTCCAAGAATGGCATTTGAAGATATAAAG CTGGGAGATCTTGAGATCCCAAAAGGGCTGTCGATATGGATCCCAGTGCTTGCAATTCACCACA GTGAAGAGCTATGGGGCAAAGATGCAAATGAGTTCAACCCAGAAAGATTTGCAAATTCAAAAGC CTTCACTTCGGGGAGATTCATTCCCTTTGCTTCTGGCCCTCGCAACTGCGTTGGCCAATCATTT GCTCTCATGGAAACCAAGATCATTTTGGCTATGCTCATCTCCAAGTTTTCCTTCACCATCTCTG ACAATTATCGCCATGCACCCGTGGTCGTCCTCACTATAAAACCCAAATACGGAGTCCAAGTTTG CTTGAAGCCTTTCAATTAA DNAsequence ATGGAAGACACCTTCCTACTCTATCCTTCCCTCTCTCTTCTCTTTCTTCTTTTTGCTTTCAAGC 48 SEQIDNO:20 encodingCYP10285 TCATCCGTCGATCCGGAGGAGTTCGCAGGAACTTACCGCCGAGTCCGCCCTCTCTTCCGGTTAT in CGGCCACCTCCATCTCTTGAAAAAGCCACTCCACCGGACTTTCCAGAAACTTTCCGCCAAATAT W02016050890 GGTCCTGTTATGTCCCTCCGCCTCGGGTCTCGCCTCGCAGTCATTGTATCGTCGTCGTCGGCGG TGGACGAGTGTTTCACTAAAAACGACGTCGTGCTCGCCAACCGTCCTCGTTTGCTAATTGGCAA ACACCTCGGCTACAACTACACTACCATGGTTGGGGCTCCCTACGGCGACCACTGGCGTAGCCTC CGCCGCATCGGTGCCCTCGAAATCTTCTCTTCATCTCGCCTCAACAAATTCGCCGACATCCGAA GGGATGAAGTAGAGGGATTGCTTCGCAAACTCTCACGCAATTCGCTCCATCAATTCTCGAAAGT GGAAGTTCAATCGGCCTTGTCGGAGCTGACGTTCAACATCTCGATGAGAATGGCGGCAGGGAAA CGGTATTACGGAGATGACGTGACGGACGAGGAAGAGGCGAGAAAGTTCAGAGAGTTAATTAAAC AGATAGTGGCGCTGGGCGGAGTATCAAATCCAGGGGATTTCGTCCCGATTCTGAATTGGATTCC 
GAACGGTTTCGAGAGGAAGTTGATCGAGTGTGGGAAGAAGACGGATGCGTTCTTGCAGGGGCTG ATCGAGGACCACCGGAGAAAGAAGGAAGAGGGTAGGAACACGATGATCGATCACCTGCTCTCTC TGCAAGAATCGGAGCCTGCTCACTACGGAGACCAAATAATCAAAGGATTTATACTGGTGTTACT GACGGCGGGGACCGATACATCGGCCGTGACAATGGAGTGGGCGCTATCTCATCTCCTGAACAAT CCTGAAGTGCTAAAGAAGGCAAGAGATGAGGTCGACACTGAAATTGGACAAGAACGACTTGTCG AAGAATCAGACGTAGTATCTAAGTTACCCTATCTTCAAGGGATCATCTCCGAGACTCTCCGGCT GAATCCCGCCGCTCCGATGTTGTTGCCCCATTACGCCTCGGACGACTGCACGATATGTGGATAC GACGTGCCACGTGACACAATCGTAATGGTCAATGCATGGGCCATACATAGGGATCCAAACGAAT GGGAGGAGCCCACGTGTTTCAGACCAGAACGATATGAAAAGTCGTCGTCGGAAGCGGAGGTACA CAAGTCGGTGAGTTTCGGGGTGGGAAGGCGAGCTTGTCCTGGGTCTGGCATGGCGCAGAGGGTG ATGGGCTTGACTTTGGCGGCACTGGTTCAGTGCTTCGAGTGGGAGAGAGTTGGAGAAGAAGAAG TGGACATGAACGAAGGCTCAGGTGCCACAATGCCCAAGATGGTGCCATTGGAGGCCATGTGCAG AGCTCGTCCCATCGTCCACAACCTTCTTTACTGA CYP5491protein MWTVVLGLATLFVAYYIHWINKWRDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLDVHPFIQKK 49 SEQIDNO:44 VERYGPIFKTCLAGRPVVVSADAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIHK in YIRSITLNHFGAEALRERFLPFIEASSMEALHSWSTQPSVEVKNASALMVFRTSVNKMFGEDAK W02016050890 KLSGNIPGKFTKLLGGFLSLPLNFPGTTYHKCLKDMKEIQKKLREVVDDRLANVGPDVEDFLGQ ALKDKESEKFISEEFIIQLLFSISFASFESISTTLTLILKLLDEHPEVVKELEAEHEAIRKARA DPDGPITWEEYKSMTFTLQVINETLRLGSVTPALLRKTVKDLQVKGYIIPEGWTIMLVTASRHR DPKVYKDPHIFNPWRWKDLDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILCTKYRWTKL GGGRIARAHILSFEDGLHVKFTPKE Squaleneepoxidase MSAVNVAPELINADNTITYDAIVIGAGVIGPCVATGLARKGKKVLIVERDWAMPDRIVGELMQP 50 SEQIDNO:54 (S.cerevisiae) GGVRALRSLGMIQSINNIEAYPVTGYTVFFNGEQVDIPYPYKADIPKVEKLKDLVKDGNDKVLE in DSTIHIKDYEDDERERGVAFVHGRFLNNLRNITAQEPNVTRVQGNCIEILKDEKNEVVGAKVDI W02016050890 DGRGKVEFKAHLTFICDGIFSRFRKELHPDHVPTVGSSFVGMSLFNAKNPAPMHGHVILGSDHM PILVYQISPEETRILCAYNSPKVPADIKSWMIKDVQPFIPKSLRPSFDEAVSQGKFRAMPNSYL PARQNDVTGMCVIGDALNMRHPLTGGGMTVGLHDVVLLIKKIGDLDFSDREKVLDELLDYHFER KSYDSVINVLSVALYSLFAADSDNLKALQKGCFKYFQRGGDCVNKPVEFLSGVLPKPLQLTRVF FAVAFYTIYLNMEERGFLGLPMALLEGIMILITAIRVFTPFLFGELIG Squaleneepoxidase MVDQFSLAFIFASVLGAVAFYYLFLRNRIFRVSREPRRESLKNIATTNGECKSSYSDGDIIIVG 51 
SEQIDNO:88 (Gynostemma AGVAGSALAYTLGKDGRRVHVIERDLTEPDRTVGELLQPGGYLKLTELGLEDCVNEIDAQRVYG in pentaphyllum) YALFKDGKDTKLSYPLEKFHSDVSGRSFHNGRFIQRMREKAATLPNVRLEQGTVTSLLEENGII W02016050890 KGVQYKSKTGQEMTAYAPLTIVCDGCFSNLRRSLCNPKVDVPSCFVALVLENCELPHANYGHVI LADPSPILFYPISSTEVRCLVDVPGQKVPSISNGEMANYLKSVVAPQIPPQIYDALRSCYDKGN IRTMPNRSMPADPYPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRDLLKPLRDLHDAPILS NYLEAFYTLRKPVASTINTLAGALYKVFCASPDQARREMRQACFDYLSLGGVFSNGPVSLLSGL NPRPLSLVLHFFAVAIYGVGRLLIPFPSPRRVWIGARLISGASGIIFPIIKAEGVRQIFFPATL PAYYRAPPLVRGR Squaieneepoxidase MESQLWNWILPLLISSLLISFVAFYGFFVKPKRNGLRHDRKTVSTVTSDVGSVNITGDTVADVI 52 SEQIDNO:89 1(Arabidopsis VVGAGVAGSALAYTLGKDKRRVHVIERDLSEPDRIVGELLQPGGYLKLLELGIEDCVEEIDAQR in thaliana) VYGYALFKNGKRIRLAYPLEKFHEDVSGRSFHNGRFIQRMREKAASLPNVQLEQGTVLSLLEEN W02016050890 GTIKGVRYKNKAGEEQTAFAALTIVCDGCFSNLRRSLCNPQVEVPSCFVGLVLENCNLPYANHG HVVLADPSPILMYPISSTEVRCLVDVPGQKVPSIANGEMKNYLKTVVAPQMPHEVYDSFIAAVD KGNIKSMPNRSMPASPYPTPGALLMGDAFNMRHPLTGGGMTVALADIVVLRNLLRPLRDLSDGA SLCKYLESFYTLRKPVAATINTLANALYQVFCSSENEARNEMREACFDYLGLGGMCTSGPVSLL SGLNPRPLTLVCHFFAVAVYGVIRLLIPFPSPKRIWLGAKLISGASGIIFPIIKAEGVRQMFFP ATVPAYYYKAPTVGETKCS Squaleneepoxidase MTYAWLWTLLAFVLTWMVFHLIKMKKAATGDLEAEAEARRDGATDVIIVGAGVAGASLAYALAK 53 SEQIDNO:90 4(Arabidopsis DGRRVHVIERDLKEPQRFMGELMQAGGRFMLAQLGLEDCLEDIDAQEAKSLAIYKDGKHATLPF in thaliana) PDDKSFPHEPVGRLLRNGRLVQRLRQKAASLSNVQLEEGTVKSLIEEEGVVKGVTYKNSAGEEI W02016050890 TAFAPLTVVCDGCYSNLRRSLVDNTEEVLSYMVGYVTKNSRLEDPHSLHLIFSKPLVCVIYQIT SDEVRCVAEVPADSIPSISNGEMSTFLKKSMAPQIPETGNLREIFLKGIEEGLPEIKSTATKSM SSRLCDKRGVIVLGDAFNMRHPIIASGMMVALSDICILRNLLKPLPNLSNTKKVSDLVKSFYII RKPMSATVNTLASIFSQVLVATTDEAREGMRQGCFNYLARGDFKTRGLMTILGGMNPHPLTLVL HLVAITLTSMGHLLSPFPSPRRFWHSLRILAWALQMLGAHLVDEGFKEMLIPTNAAAYRRNYIA TTTV Squaieneepoxidase MAFTHVCLWTLVAFVLTWTVFYLTNMKKKATDLADTVAEDQKDGAADVIIVGAGVGGSALAYAL 54 SEQIDNO:91 6(Arabidopsis AKDGRRVHVIERDMREPERMMGEFMQPGGRLMLSKLGLQDCLEDIDAQKATGLAVYKDGKEADA in thaliana) PFPVDNNNFSYEPSARSFHNGRFVQQLRRKAFSLSNVRLEEGTVKSLLEEKGVVKGVTYKNKEG W02016050890 
EETTALAPLTVVCDGCYSNLRRSLNDDNNAEIMSYIVGYISKNCRLEEPEKLHLILSKPSFTMV YQISSTDVRCGFEVLPENFPSIANGEMSTFMKNTIVPQVPPKLRKIFLKGIDEGAHIKVVPAKR MTSTLSKKKGVIVLGDAFNMRHPVVASGMMVLLSDILILRRLLQPLSNLGDANKVSEVINSFYD IRKPMSATVNTLGNAFSQVLIGSTDEAKEAMRQGVYDYLCSGGFRTSGMMALLGGMNPRPLSLV YHLCAITLSSIGQLLSPFPSPLRIWHSLKLFGLAMKMLVPNLKAEGVSQMLFPANAAAYHKSYM AATTL Squaleneepoxidase MAFTNVCLWTLLAFMLTWTVFYVTNRGKKATQLADAVVEEREDGATDVIIVGAGVGGSALAYAL 55 SEQIDNO:92 5(Arabidopsis AKDGRRVHVIERDLREPERIMGEFMQPGGRLMLSKLGLEDCLEGIDAQKATGMTVYKDGKEAVA in thaliana) SFPVDNNNFPFDPSARSFHNGRFVQRLRQKASSLPNVRLEEGTVKSLIEEKGVIKGVTYKNSAG W02016050890 EETTALAPLTVVCDGCYSNLRRSLNDNNAEVLSYQVGFISKNCQLEEPEKLKLIMSKPSFTMLY QISSTDVRCVFEVLPNNIPSISNGEMATFVKNTIAPQVPLKLRKIFLKGIDEGEHIKAMPTKKM TATLSEKKGVILLGDAFNMRHPAIASGMMVLLSDILILRRLLQPLSNLGNAQKISQVIKSFYDI RKPMSATVNTLGNAFSQVLVASTDEAKEAMRQGCYDYLSSGGFRTSGMMALLGGMNPRPISLIY HLCAITLSSIGHLLSPFPSPLRIWHSLRLFGLAMKMLVPHLKAEGVSQMLFPVNAAAYSKSYMA ATAL Squaleneepoxidase MKPFVIRNLERFQSTLRSSLLYTNHRIPSSRYSLSTRRFTTGATYIRRWKATAAULKLSAVNST 56 SEQIDNO:93 2(Arabidopsis VMMKPAKIALDQFIASLFTFLLLYILRRSSNKNKKNRGLVVS0NDTVSKNLETEVDSGTDVIIV in thaliana) GAGVAGSALAHTLGKEGRRVHVIERDFSEQDRIVGELLQPGGYLKLIELGLEDCVKKIDAQRVL W02016050890 GYVLFKDGKHTKLAYPLETFDSDVAHNGRFVQRMREKALSNVRLEQGTVTSLLEEHGT IKGVRIRTKEGNEFRSFAFLTIVCDGCFSNLRRSLCKPKVDVPSTFVGLVLENCELPFANHGHV VLGDPSPILMYPISSSEVRCLVDVPGLPPIANGEMAKYLVAPQVPTKVREAFITKVEKG NIRTMPNRSMPADPIPTPGALLLGDAFNMRHPLTGGGMTVALADIVVLRDLLRPIRNLNDKEAL SKYIESFYTLRKPVASTINTLAD.ALYKV7LASSDEARTEMREACFDYLSLGGVFSSGPVALLSG LNPRPLSLVLHFFAVAIYAVCRLMLPFPSTESFWLGARIISSASSIIFPIIKAEGVRQMFFPRT IPAIYRAPP Squaleneepoxidase MAPTIFVDHCILTTTFVASLFAFLLLYVLRRRSKTIHGSVNVRNGTLTVKSGTDVDIIIVGAGV 57 SEQIDNO:94 3(Arabidopsis AGAALAHTLGKEGRRVUVIERDLTEPDRIVGELLQPGGYLKLIELGLEDCVKDIDAQRVLGYAL in thaliana) FKDGKHTKLSYPLDQFDSDVAGRSFHNGRFVQRMRSKASLLPNVRMEQGTVTSLVEENGIIKGV W02016050890 QYKTKDGQELKSFAPLTIVCDGCFSNLRRSLCKPKVEVPSNFVGLVLENCELPFPNHGHWLGD PSPILFYPISSSEVRCLVDVPGSKLPSVASGEMAHHLKTMVAPQVPPQIRBAFISAVEKGNIRT 
MPNRSMPADPIHTPGALLLGDAFNMRHLTGGGMTVALSDIVILRDLLNPLVDLTNKESLSKYI ESFYTLRKPVASTINTLAGALYKVFLADDARSEMRRACFDYLSLGGVCSSGVALLSGLNPR PMSLVLKFFAVAIFGVGRLLVPLPSVKRLWLGARLISSASGIIFPIIKAEGVRQMFFPRTIPAI YRAPPTPSSSSPQ
Squalene monooxygenase 1,1 (Brassica napus) | 58 | SEQ ID NO: 95 in WO2016050890
MDLAFPHVCLWTLLAFVLTWTVFYVNNRRKKVAHLPDAATEVRRDGDADVIIVGAGVGGSALAY ALAKDGRRVIIVIERDMREPVRMMGEFMQPGGRLLLSKLGLEDCLEGIDEQIATGLAVYKDGQKA LVSFPEDNDFPYEPTGRAFYNGRFVQRLRQKASSLPTVOLEEGTVKSLIEEKGVIKGVTYKNSA GEETTAFAPLTVVCDGCYSNLRRSVNDNNAEVISYQVGYVSKNCQLEDPEHLKLIMSKPSTTML YQISSTDVRCVMEIFTGNIPSISNGEMAVYLKNTMAPOVPPELRKIFLKGIDEGAOIKAMPTKR MEATLSEKQGVIVLGDAFNMRHPAIASGMMVVLSDILILRRLLQPLRNLSDANKVSEVIKSFYV IRKPMSATVNTLGNAFSQVIIASTDEAKEAMRQGCFDYLSSGRTSGMMALLGGMNPRPLSLI FHLCGITLSSIGQLLSPYPSPLGIWHSLRLYGAEGVSQMLSPAYPAAYRKSYMTATAL
Squalene monooxygenase 1,2 (Brassica napus) | 59 | SEQ ID NO: 96 in WO2016050890
MDMAFVEVCLRMLLVFVLSWTIFHVNNRKKKKATKLADLATEERKEGGPDVIIVGAGVGGSALA YALAKDGRRVIIVIERDMREPVRMMGEFMQPGGRLMLSKLGLQDCLEEIDAQKSTGIRLFKDGKE TVACFPVDTNFTYEPSGRFFHNGRFVQRLROKASSLPNVRLEEGTVRSLIEEKGVVKGVTYKNS SGEETTSFAPLTVVCDGCHSNLRRSLNDNNAEVTAYEIGYISRNCRLEQPDKLITLIMAKPSFAM LYQVSSTDVRCNFELLSKNLPSVSNGEMTSFVRNSIAPQVPLKLRKTFLDEGSHIKITQAK RIPATLSRKKGATIVLGDAFNMRHPVIA3GMMVLLSDILILSRLLKPLGNLGDENKVSEVMKSFY ALRKPMSATVNTLGNSFWQVLIASTDEAKEAMRQGCFDYLSSGGFRTSGLMALIGGMNPRPLSL FYJILFVISLSSIGQLLSPFPTPLRVWHSLRLLDLSLKMLVPHLKAEGIGQMLSPTNAAAYRKSY MAATVV
Squalene epoxidase (Euphorbia tirucalli) | 60 | SEQ ID NO: 97 in WO2016050890
MEVIFDTYIFGTFFASLCAFLLLFILRPKVKKMGKIREISSINTQNDTAITPPKGSGTDVIIVG AGVAGAALACTLGKDGRRVEVIERDLKEPDRIVGELLQLKLVELGLQDCVEEIDAQRIVG YALFMDGNNTKLSYPLEKFDAEVSGKSFHNGRFIQRMREKAASLNVULEQGTVTSLLEENGTI KGVQYKTKDGQEHKAYAPLTVVCDGCFSNLRRSLCKPKVDVPSHFVGLVLENCDLPFANHGHVI LADPSPILFYPISSTEVRCLVDVPGQKLPSIASMAILKTMVAKQIPPVLHDAFVSAIDKGN IRTMFNRSMPADPLPTPGALLMGDAFNMREPLTGGGNIVALADIVIARDLLKPLRDLNDAFALA KYLESFYTLRKPVASTINTLAGALYKVFSASPDEARKEMRQACFDYLSLGGECAMGPVSLLSGL NPSPLTLVLHFFGVAIYGVGRLLIPFPTPKGMWIGARIISSASGIIFPIIKAEGVRQVFFPATV PAIYRNPPVNGKSVEVPKS
Squaleneepoxidase MTDPYGFGWITCTLITLAALYNFLFSRKNHSDSUTINITTATGECRSFNPNGDVDIIIVGAGV 61 SEQIDNO:98 (Medicago AGSALAYTLGRRVLIIERDLNEPDRIVGELLQPGGYLKLIELGLDDCVEKIDAQKVFGYAL in truhcatula) FKDGHTRLSYPLEKFHSDIARSFHNGRFILRMRAASLPWLEQGTVTSLLEENGTIKGV W02016050890 QYKTKDAQEFSACAPLTIVCDGCFSNLRRSLCNPKVEVPSCFVGLVLENCELPCADHGHVILGD PSPVLFYPISSTEIRCLVDVPGOKVPSISNGEMAKYLKTVVAPQVPPELHAAFIAAVDKGHIRT MPNRSMPADPYPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPLRDLNDASSLCKYL ESFYTLRKPVASTINTLAGALYKVFCASPDPARKEMROACFDYLSLGGLFSEGPVSLLSGLNPC PLSLVLHFFAVAIYGVGRLLLPFTSPKRLWIGIRLIASASGIILPIIKAEGIRQMFFTATVPAY YRAPPDA Squalene MDLYNIGWILSSVLSLFALYNLIFAGKKNYDVNEKVNOREDSVTSTDAGEIKSDKLNGDADVII 62 SEQIDNO:99 monooxygenase VGAGIAGAALAHTLGKDGRRVHIIERDLSEPDRIVGELLQPGGYLKLVELGLQDCVDNIDAORV in (Medicago FGYALFKDGKIITRLSYPLEKFHSDVSGRSFHNGRFIQRMREHAASLPNVNMEQGTVISLLEEKG W02016050890 truncatula) TIKGVOYKNKDOQAL7LAYAPLTIVCDOCFSNLRRSLONPKVDNITSCFVGLILENCELPCANHGH VILGDPSPILFYPISSTEIROLVDVPOTKVPSISNGDMTKYLKTTVAPOVPPELYDAFIAAVDK GNIRTMPNRSMPADPRPTPGAVLMGDAFNMRHPLIGGGMTVALSDIVVLRNLLKPMRDLNDAPT LCHYLESFYILRKPVASTINTLAGALYKVFSASPDEARKEMRQACFDYLSLGGLFSEOPISLLS OLNPRPLSLVLHFFAVAVFOVORLLLPYPSPKRVNIGARLLSGASGIILPIIKAEGIROMFFPA TVPAYYRAPPVNAF Squalene MADNYLLGWILCSIIGIZOLYYMVYLVVKREEEDNNRKALLQARSDSAKTMSAVSQNGEORSDN 63 SEQIDNO: monooxygenase PADADIIIVGAGVAGSALANTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIELGLEDCV 100in (Ricinuscommunis) EEIDAQRVFOYALFMDGKIITQLSYPLEKFHSDVAGRSFHNGRFIQRMREHASSIPNVRLEQGTV W02016050890 TSLIEEKGIIRGVVYKTKIGEELTAFAPLTIVCDOCFSNLRRSLONPKVDVPSCFVGLVLEDCK LPYQYHONVVLADPSPILFWISSIEVRCLVDVPOQKVPSISNGEMAKYLKNVVAPWIPPEIYD SFVAAVDKGNIRTMPNRSMFASPYPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRELLKPL RDLHDAPTLCRYLESFYTPVASTINTLAGALTKVFCASSDEARNEMRQACFDYLSLGGVFS TGPISLLSGLNPRPLSLVVHFFAVA+32GVGRLLLPFPSPKRVWVGARLISGASGIIFPIIAEG VRQMFFETATVPAYYRAPPVECN Squalene MEYKLAVAGITASLWALFMLCSLKRKKNITRASFNNYTDETLKSSSKEICQPEIVASPDIIIVG 64 SEQIDNO: monooxygenase AGVAGAALAYALGEDGRQVEVIERDLSEPDRIVGELLO_PLKLIELGLEDCVEKIDAWYFG 101in (Ricinuscommunis) 
YAIFKDGKSTKLSYPLDGFUNVSGRSFHNGRFIQRMREKATSLPNLILQQ+32TSLVEKKGTV W02016050890 KGVNYRTRNOQEMTAYAPLTIVCDOCFSNLRRSLCNPKVEIPSOFVALVLENCDLPYANHONVI LADPSPILFYPISSTEVROLVDIPOQKVPSISNGELAQYLKSTVAKQIPSELHDAFISAIEKON IRTMFNRSMPASPHPTPGALLVGDATE7NM.REPLTOGGNIVALSDIVLLRNLLRPLENLNDASVLC KYLESFYILRKPMASTINTLAGALYKVFSASTDRARSEMRQACFDYLSLGGVFSNGPIALLSGL NPRPLNLVLHFFAVAVYGVGRLILPFPSPKSIWDGVKLISGASSVIFPIMKAEGIGQIFFPITK PPNHKSOTW Squalene MGVSREENARDEKCHYYENGISLSEKSMSTDIIIVGAGVAGSALAYTLGKDGRRVHVIERDLSL 65 SEQIDNO: monooxygenase QDRIVGELLQPGGYLKLIELGLEDCVEEIDAQQVFGYALYKNGRSTKLSYPLESFDSDVSGRSF 102in (Ricinuscommunis) HNGRFIQRMREKAASLPNVRLEEGTVTSLLEVKGTIKGVQYKTKNGEELTASAPLTIVCDGCFS W02016050890 NLRRSLCNPKVDIPSCFVALILENSGQKLPSISNGDMANYLKSVVAPQIPPVLSEAFISAIEKG KIRTMPNRSMPAAPHPTPGALLLGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPLHDLTDASAL CEYLKSFYSLRKPVASTINTLAGALYKVFSASHDPARNEMRQACFDYLSLGGVFSNGPIALLSG LNPRPLSLVAHFFAVAIYGVGRLIFPLPSAKGMWMGARMIKVASGIIFPIIRAEGVQHMFFSKT LSAFSRSQTS Squalene MEYQYFVGGIIASALLFVLVCRLAGKRQRRALRDTVDRDEISQNSENGISQSEKNMNTDIIIVG 66 SEQIDNO: monooxygenase AGVAGSTLAYTLGKDGRRVRVIERDLSLQDRIVGELLQPGGYLKLIELGLEDCVEEIDALQVFG 103in (Ricinuscommunis) YALYKNGRSTKLSYPLDSFDSDVSGRSFHNGRFIQRMREKAASLPNVRMEGGTVTSLLEVKGTI W02016050890 KGVQYKNKNGEELIACAPLTIVCDGCFSNLRRSLCNSKVDIPFCFVALILENCELPYPNHGHVI LADPSPILFYRISISEIRCLVDIPAGQKLPSISNGEMANYLKSVVAPQIPPELSNAFLSAIEKG KIRTMPKRSMPAAPHPTPGALLLGDAFNMRHPLTGGVMTVALSDIVVLRSLLRPLHDLTDASAL CEYLKSFYSLRKPMVSTINTLAGALYRVFSASQDPARDEMRQACFDYLSLGGVFSNGPIALLSG LNPRPLSLIVHFFAVAVYGVGRLIFPLPSAKRMWMQE Sgualene MEYQYLMGGGIMTLLFVLSYRLKRETRASVENARDEVLQNSENGISQSEKAMNTDIKLLLEQIV 67 SEQIDNO: monooxygenase QKIAMLNSIRLEEGTVTSLLEVKRDIKGVQYKTKNGEELTACAPLTIVSHGCFSNLRLHVTPST 104in (Ricinuscommunis) SKFKSFIGLEVDIPSSFAALILGNCELPFPNHGHVILADPSSILFYRISSSEICCLVDVPAGQK W02016050890 LPSISNGEMANYLKSVVAHQAFKVGLAY Squalene MSPISIQLPPRPQLYRSLISSLSLSTYKQPPSPPSFSLTIANSPPQPQPQATVSSKTRTITRLS 68 SEQIDNO: monooxygenase NSSNRVNLLQAEQHPQEPSSDLSYSSSPPHCVSGGYNIKLMEVGTDNYAVIIILGTFFASLFAF 105in 
(Ricinuscommunis) VFLSILRYNFKNKNKAKIHDETTLKTQNDNVRLPDNGSGNDVIIVGAGVAGAALAYTLGKDGRR W02016050890 VHVIERDLTEPDRIVGELLQPGGYLKLIELGLEDCVQEIDAQRVLGYALFKDGKNTRLSYPLEK FHADVAGRSFHNGRFIQRMREKAASLPNVKLEQGTVTSLLEENGTIKGVQYKTKDGQEIRAYAP LTIVCDGCFSNLRRSLCNPKVDVPSCFVGLVLENCQLPFANHGHVVLADPSPILFYPISSTEVR CLVDVPGQKVPSIANGEMAKYLKNVVAPQIPPVLHDAFISAIDKGNIRTMPNRSMPADPHPTPG ALLMGDAFNMRHPLTGGGMTVALSDIVVLRDLLKPLRDLNDATSLTKYLESFYTLRKPVASTIN TLAGALYKVFSASPDQARKEMRQACFDYLSLGGIFSSGPVALLSGLNPRPLSLVMHFFAVAIYG VGRLLLPFPSPKSVWIGARLISSASGIIFPIIKAEGVRQMFFPATIPAIYRPPPVKDTSDDEQK SR ERG9protein(S. MGKLLOLALHPVEMKAALKLKFCRTPLFSIYDQSTSPYLLMCFELLNLTSRSFAAVIRELHPEL 69 SEQIDNO:87 cerevisiae) RNCVTLFYLILRALDTIEDDMSIEHDLKIDLLRHFHZKLLLIKWSFDGKAPDVKDRAVLTDFES in ILIEFHKLKPEYQEVIKEITEKMGNGMADYILDENYNLNGLCTVHDYDVYCHYVAGLVGDGLTR W02016050890 LIVIAKNESLYSNEULYSMGL.DIQKTNIIRDYNEDLVDGRSITWPKEIWSQYAPQLKDITMKP ENEQLGLDCINHLVLNALSHVIDVLTYLAGIHEQSTFQYCAIPQVMAIATLALVFNNREVLHGN VKIRKGTTCYLILKSRTLPGCVEIFDYYLRDTKSKIAVQDPNFLKLNIQISKIEQFMEEMYQDK LPPNVIKPNETPIFLKVKERSRYDDELVPTQQEEEYKFNMVLSIILSVLLGFYYIYTLHRA Cucurbitadienoi MWRLKVGAESVGENDEKWLKSISNHLGRQVWEFCPDAGTQQQLLQVHKARKAFHDDRFHRKQSS 70 SEQIDNO:43 synthase(S DLFITIQYGKEVENGGKTAGVKLKEGEEVRKEAVESSLERALSFYSSIQTSDGNWASDLGGPMF in grosvenorli) LLPGLVIALYVTGVLNSVLSKHHRQEMCRYVYNHQNEDGGWGLHIEGPSTMFGSALNYVALRLL W02016050890 GEDANAGAMPKARAWILDHGGATGITSWGKLWLSVLGVYEWSGNNPLPPEFWLFPYFLPFHPGR MWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYAVPYHEIDWNKSRNTCAKEDLYYPHPKM QDILWGSLHHVYEPLFTRWPAKRLREKALQTAMQHIHYEDENTRYICLGPVNKVLNLLCCWVED PYSDAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSIQAIVSTKLVDNYGPTLRKAHDFV KSSQIQQDCPGDPNVWYRHIHKGAWPFSTRDHGWLISDCTAEGLKAALMLSKLPSETVGESLER NRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATMEALTLF KKLHPGHRTKEIDTAIVRAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCLA IRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPTPLH RAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE Cucurbitadienol MWRLKVGAESVGEEDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ 71 Disclosedin 
synthase SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLERALGFYSAVQTRDGNWASDLGGP Takaseetal. (UniProtKB-Q6BE24) LFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR (OrgBiomol LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP Chem.2015 FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY Jul PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC 14;13(26):733 CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK 1-6)whichis AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPLSTRDHGWLISDCTAEGLKASLMLSKLPSTMVG incorporated EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME byreference ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY inits NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD entirety PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE Cucurbitadienol MWRLKVGAESVGEEDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ 72 SEQIDNO:1 synthase(C.pepo) SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLERALGFYSAVQTRDGNWASDLGGP of LFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR W02014/086842 LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP whichis FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY incorporated PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC byreference CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK inits AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPLSTRDHGWLISDCTAEGLKASLMLSKLPSTMVG entirety. 
EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE
C-terminal portion of S. grosvenorii cucurbitadienol synthase | 73 | SEQ ID NO: 2 in WO2014/086842
LEPNRLCDAVNVILSLQNDNGGFASTELTRSYPWLELINPAFTFGDIVTDYPYVECTSATMFAL TLFKKLHPGHRTKEIDTAIVRAANFLENMORTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNN CLAIRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPT PLHRAARLLINSQLENGDFPQQEIMGVFNKMNJCMITYAAYRNIFPIWALGEYCHRVLTE
Codon optimized cucurbitadienol synthase gene from Siraitia grosvenorii | 74 | SEQ ID NO: 42 in WO2014/086842
ATGTGGAGATTGAAAGTAGGTGCTGAATCCGTAGGTGAAAACGACGAAAAGTGGTTGAAAAGTA TAAGTAATCATTTGGGTAGACAAGTCTGGGAATTTTGTCCAGATGCAGGTACACAACAACAATT GTTGCAAGTACATAAGGCTAGAAAGGCATTTCATGATGACAGATTCCACAGAAAGCAATCTTCA GATTTGTTCATCACCATCCAATACGGCAAGGAAGTAGAAAACGGTGGCAAGACTGCTGGTGTTA AATTGAAGGAAGGTGAAGAAGTTAGAAAAGAAGCAGTTGAATCCAGTTTGGAAAGAGCCTTGTC TTTCTACTCTTCAATCCAAACCTCTGATGGTAATTGGGCATCAGACTTGGGTGGTCCAATGTTC TTGTTACCTGGTTTGGTCATTGCCTTGTACGTAACTGGTGTTTTGAACTCTGTATTGTCAAAGC ATCACAGACAAGAAATGTGTAGATACGTTTACAACCATCAAAACGAAGATGGTGGTTGGGGTTT GCACATTGAAGGTCCATCCACTATGTTTGGTAGTGCATTGAATTATGTCGCCTTAAGATTGTTA GGTGAAGATGCAAACGCCGGTGCTATGCCTAAGGCAAGAGCCTGGATATTAGACCATGGTGGTG CTACTGGTATCACATCCTGGGGTAAATTGTGGTTAAGTGTCTTAGGTGTATATGAATGGTCTGG TAATAACCCATTGCCACCTGAATTTTGGTTGTTCCCTTACTTTTTACCATTCCATCCTGGTAGA ATGTGGTGTCACTGCAGAATGGTTTACTTGCCAATGTCTTACTTGTACGGCAAGAGATTCGTTG GTCCAATAACACCTATCGTCTTGTCATTGAGAAAGGAATTGTACGCAGTTCCTTACCATGAAAT CGATTGGAACAAGTCCAGAAACACCTGTGCTAAGGAAGATTTGTATTACCCACACCCTAAAATG CAAGACATTTTGTGGGGTAGTTTACATCACGTTTACGAACCATTATTTACTAGATGGCCTGCTA AAAGATTGAGAGAAAAGGCATTACAAACAGCCATGCAACATATCCACTACGAAGATGAAAACAC CAGATACATCTGCTTGGGTCCAGTTAACAAGGTCTTGAACTTGTTGTGTTGCTGGGTTGAAGAT CCTTATTCTGACGCTTTCAAGTTGCATTTGCAAAGAGTACACGATTACTTGTGGGTTGCAGAAG
ACGGTATGAAAATGCAAGGTTACAATGGTTCACAATTGTGGGATACAGCTTTTTCCATTCAAGC AATAGTCAGTACTAAGTTGGTAGATAACTACGGTCCAACATTAAGAAAAGCTCATGACTTCGTA AAGTCCAGTCAAATACAACAAGATTGTCCAGGTGACCCTAATGTTTGGTATAGACATATCCACA AAGGTGCATGGCCATTTTCTACCAGAGATCATGGTTGGTTGATTTCAGACTGTACTGCTGAAGG TTTGAAGGCTGCATTGATGTTGTCTAAGTTGCCATCAGAAACTGTTGGTGAATCCTTGGAAAGA AATAGATTATGCGATGCCGTTAACGTCTTGTTGAGTTTGCAAAACGACAACGGTGGTTTCGCTT CTTACGAATTGACTAGATCATACCCATGGTTGGAATTAATTAATCCTGCTGAAACATTCGGTGA TATCGTCATTGACTATCCATACGTAGAATGTACCTCCGCTACTATGGAAGCATTGACCTTGTTC AAGAAGTTGCATCCTGGTCACAGAACAAAGGAAATCGATACCGCAATTGTTAGAGCCGCTAATT TCTTGGAAAACATGCAAAGAACAGACGGTTCTTGGTATGGTTGTTGGGGTGTTTGCTTTACCTA CGCTGGTTGGTTCGGTATTAAAGGTTTAGTCGCAGCCGGTAGAACATACAATAACTGTTTGGCC ATAAGAAAAGCTTGCGATTTCTTGTTATCTAAGGAATTACCAGGTGGTGGTTGGGGTGAATCCT ACTTGAGTTGTCAAAACAAGGTTTACACTAATTTGGAAGGCAACAGACCTCATTTAGTTAACAC AGCCTGGGTCTTGATGGCTTTAATCGAAGCCGGTCAAGCTGAAAGAGATCCAACTCCTTTGCAT AGAGCTGCAAGATTGTTGATCAACTCACAATTGGAAAACGGTGATTTTCCACAACAAGAAATCA TGGGTGTTTTCAACAAGAACTGCATGATAACATATGCCGCTTACAGAAACATTTTTCCTATATG GGCTTTGGGTGAATACTGCCACAGAGTCTTGACCGAATAA Cycloartenol MWKLKIAEGGNPWLRSTNSHVGRQVWEFDPKLGSPQDLAEIETARNNFHDNRFSHKHSSDLLMR 75 Disclosedin synthase[Lotus IQFSKENPIGEVLPKVKVKDVEDVTEEAVVTTLRRAISFHSTLQSHDGHWPGDYGGPMFLMPDL W02014/086842 japonicus] VITLSITGALNAVLTDEHRKEMCRYLYNHQNKDGGWGLHIEGPSTMFGSVLNYVTLRLLGEGPN GenBankAccession DGQGDMEKARDWILGHGGATYITSWGKMWLSVLGVFEWSGNNPLPPEIWLLPYALPFHPGRMWC No.BAE53431.1 HCRMVYLPMSYLYGKRFVGPITPTILSLRKELFTIPYHDIDWNQARNLCAKEDLYYPHPLVQDI LWASLHKVVEPVLMQWPGKKLREKAINSVMEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS EAFKLHLPRIYDYLWIAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIEEYGPTLRKAHTFIKNS QVLEDCPGDLNKWYRHISKGAWPFSTADHGWPISDCTAEGLKAILSLSKIAPDIVGEPLDAKRL YDAVNVILSLQNEDGGLATYELTRSYSWLELINPAETFGDIVIDYPYVECTSAAIQALTSFRKL YPGHRREEIQHSIEKAAAFIEKIQSSDGSWYGSWGVCFTYGTWFGVKGLIAAGKSFSNCSSIRK ACEFLLSKQLPSGGWGESYLSCQNKVYSNLEGNRPHAVNTGWAMLALIEAEQAKRDPTPLHRAA LYLINSQMENGDFPQQEIMGVFNKNCMITYAAYRSIFPIWALGEYRCRVLQAR Hypothetical 
MWKLTIGAESVHDNGQSSSWLKSVNNHLGRQVWEFCPQLGSPDELLQLQNVRLSFQAQRFDKKH 76 Disclosedin protein SADLLMRFQFEKENPCVNLPQIKVKDDEDVTEEAVTTTLRRAVNFYRKIQAHDGHWPGDYGGPM W02014/086842 POPTR_0007s15200g FLLPGLIITLSITGALNAVLSKEHQREMCRYLYNHQNRDGGWGLHIEGPSTMFGTCLNYVTLRL [Populus LGEGAEGGDGEMEKGRKWILDHGGATEITSWGKMWLSVLGVHEWSGNNPLPPEVWLCPYLLPMH trichocarpa] PGRMWCHCRMVYLPMSYLYGKRFVGPITPTIQSLRKEIYTVPYHEVDWNTARNTCAKEDLYYPH PLVQDILWASLHYAYEPILTRWPLNRLREKALHKVMQHIHYEDENTQYICIGPVNKVLNMLCCW VEDPHSEAFKLHLPRVFDYLWIAEDGMKMQGYNGSQLWDTAFAVQAIVSTNLAEEYSGTLRKAH KYLKDSQVLEDCPGDLNFWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKLPTEMVGDP LGVERLRDAVNVILSLQNADGGFATYELTRSYQWLELINPAETFGDIVIDYPYVECTSAAIQAL ASFKKLYPGHRREEIDNCIAEAANFIEKIQATDGSWYGSWGVCFTYAGWFGIKGLVAAGMTYNS SSSIRKACDYMLSKELAGGGWGESYLSCQNKVYTNLKDDRPHIVNTGWAMLALIEAGQAERDPI PLHRAARVLINSQMENGDFPQEEIMGVFNKNCMISY SAYRNIFPIWALGEYRCQVLQAL putative2,3 MWKLKIAEGEDPWLRSVNNHVGRQVWEFDRNLGTPEELIEVEKAREDFSNHKFEKKHSSDLLMR 77 Disclosedin oxidosdualene LQLAKENPCSIDLPRVQVKDTEEVTEEAVTTTLRRGLSFYSTIQGHDGHWPGDYGGPLFLMPGL W02014/086842 cyclase[Actaea VIALSVTGALNAVLSSEHQRETRRYIYNHQNEDGGWGLHIEGSSTMFITTLNYVTLRLLGEGAD racemosa] DGEGAMEKARKWILNHGSATATTSWGKMWLSVLGVFEWSGNNPLPPEMWLLPYCLPFHPGRMWC HCRMVYLPMSYLYGKRFVGPITPTIESLRKELYSVPYHEIDWNQARNLCAKEDLYYPHPLVQDI LWTSLHYGVEPILTRWPANKLREKSLLTTMQHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS EAFKLHIPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAVQAIISTNLFEDYAPTLRKAHKYIKDS QVLDDCPGDLNFWYRHISKGAWPFSTADHGWPISDCTAEGLKAALLLSKIPSKSVGDPINAKQL YDAVNVILSLQNGDGGFATYELTRSYPWLELINPAETFGDIVIDYPYVECTAAAIQALTSFKKL YPGHRREDIENCVEKAVKFLKEIQAPDGSWYGSWGVCFTYGIWFGIKGLVAAGETFTNSSSIRK ACDFLLSKELDSGGWGESYLSCQNKVYTNLKGNRPHLVNTGWAMSALIDAGQAERDPKPLHRAA RVLINSQMDNGDFPQEEIMGVFNRNCMISYSAYRNIFPIWALGEYRCRVLKAP CGTase MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD 78 AAA22298.1 AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW ARDFKQTNDAFGDFADFQNLIDTLTLITSRSDRLRPQPHVSGRAGTNPGFAENGALYDNGSLLG AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD 
AVKQYPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE LGTWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG VGTVTVDWQN CGTase MKYLLPTAAAGLLLLAAQPAMAMDIGINSDPSPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNN 79 3WMS_A PAGDAFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSY HGYWARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGALYDNG SLLGAYSNDTAGLFHHNGGTDFSTIEDGIYKNLIDLADINHNNNAMDAYFKSAIDLWLGMGVDG IRFDAVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQE VREVFRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTS RGVPAIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNN DVLIIERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNF TLAAGGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSW EDTQIKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTMRFLVNQANTNYGTNVYLVG NAAELGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTT PASSVGTVTVDWQNLE CGTase SPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGDAFSGDRSNLKLYFGGDWQGIIDKINDG 80 4JCL_A YLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYWARDFKQTNDAFGDFADFQNLIDTAHAH NIKVVIDFAPNHTSPADRDNPGFAENGGMYDNGSLLGAYSNDTAGLFHHNGGTDFSTIEDGIYK NLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFDAVKHMPFGWQKSFVSSIYGGDHPVFTF GEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREVFRDKTETMKDLYEVLASTESQYDYINN MVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVPAIYYGTEQYMTGDGDPNNRAMMTSFNT GTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLIIERKFGSSAALVAINRNSSAAYPISGL LSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAAGGTAVWQYTAPETSPAIGNVGPTMGQP GNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQIKAVIPKVAAGKTGVSVKTSSGTASNT FKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAELGSWDPNKAIGPMYNQVIAKYPSWYYD VSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASGVGTVTVDWQN CGTase 
MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD 81 WP_036618292.1 AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLLG AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG VGTVTVDWQN CGTase MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD 82 P04830.2 AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGGMYDNGSLLG AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG VGTVTVDWQN CGTase MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD 83 AAC04359.1 AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLLG AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI 
IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG VGTVTVDWQN CGTase MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD 84 CAA41773.1 AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGGMYDNGSLLG AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG VGTVTVDWQN CGTase SPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGDAFSGDRSNLKLYFGGDWQGIIDKINDG 85 AGT21379.1 YLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYWARDFKQTNDAFGDFADFQNLIDTAHAH NIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLLGAYSNDTAGLFHHNGGTDFSTIEDGIYK NLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFDAVKHMPFGWQKSFVSSIYGGDHPVFTF GEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREVFRDKTETMKDLYEVLASTESQYDYINN MVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVPAIYYGTEQYMTGDGDPNNRAMMTSFNT GTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLIIERKFGSSAALVAINRNSSAAYPISGL LSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAAGGTAVWQYTAPETSPAIGNVGPTMGQP GNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQIKAVIPKVAAGKTGVSVKTSSGTASNT FKSFNVLTGDQVTMRFLVNQANTNYGTNVYLVGNAAELGSWDPNKAIGPMYNQVIAKYPSWYYD VSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASSVGTVTVDWQN CGTase SPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGDAFSGDRSNLKLYFGGDWQGIIDKINDG 86 AGT95840.1 YLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYWARDFKQTNDAFGDFADFQNLIDTAHAH NIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLPGAYSNDTAGLFHHNGGTDFSTIEDGIYK 
NLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFDAVKHMPFGWQKSFVSSIYGGDHPVFTF GEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREVFRDKTETMKDLYEVLASTESQYDYINN MVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVPAIYYGTEQYMTGDGDPNNRAMMTSFNT GTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLIIERKFGSSAALVAINRNSSAAYPISGL LSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAXGXTAVWQYTAPETSPAIGDVGPTMGQP GNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQIKAVIPKVAAGKTGVSVKTSSGTASNT FKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAELDSWDPNKAIGPMYNQVIAKYPSWYYD VSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASGVGTVTADWQN CGTase MKKQVKWLTSVSMSVGIALGAALPVWASPDTSVNNKLNFSTDTVYQIVTDRFVDGNSANNPTGA 87 P31835.1 AFSSDHSNLKLYFGGDWQGITNKINDGYLTGMGITALWISQPVENITAVINYSGVNNTAYHGYW PRDFKKTNAAFGSFTDFSNLIAAAHSHNIKVVMDFAPNHTNPASSTDPSFAENGALYNNGTLLG KYSNDTAGLFHHNGGTDFSTTESGIYKNLYDLADINQNNNTIDSYLKESIQLWLNLGVDGIRFD AVKHMPQGWQKSYVSSIYSSANPVFTFGEWFLGPDEMTQDNINFANQSGMHLLDFAFAQEIREV FRDKSETMTDLNSVISSTGSSYNYINNMVTFIDNHDMDRFQQAGASTRPTEQALAVTLTSRGVP AIYYGTEQYMTGNGDPNNRGMMTGFDTNKTAYKVIKALAPLRKSNPALAYGSTTQRWVNSDVYV YERKFGSNVALVAVNRSSTTAYPISGALTALPNGTYTDVLGGLLNGNSITVNGGTVSNFTLAAG GTAVWQYTTTESSPIIGNVGPTMGKPGNTITIDGRGFGTTKNKVTFGTTAVTGANIVSWEDTEI KVKVPNVAAGNTAVTVTNAAGTTSAAFNNFNVLTADQVTVRFKVNNATTALGQNVYLTGNVAEL GNWTAANAIGPMYNQVEASYPTWYFDVSVPANTALQFKFIKVNGSTVTWEGGNNHTFTSPSSGV ATVTVDWQN CGTase MKSRYKRLTSLALSLSMALGISLPAWASPDTSVDNKVNFSTDVIYQIVTDRFADGDRTNNPAGD 88 KFM94552.1 AFSGDRSNLKLYFGGDWQGIIDKINDGYLTGMGVTALWISQPVENITSVIKYSGVNNTSYHGYW ARDFKQTNDAFGDFADFQNLIDTAHAHNIKVVIDFAPNHTSPADRDNPGFAENGALYDNGSLLG AYSNDTAGLFHHNGGTDFSTIEDGIYKNLYDLADINHNNNAMDAYFKSAIDLWLGMGVDGIRFD AVKHMPFGWQKSFVSSIYGGDHPVFTFGEWYLGADQTDGDNIKFANESGMNLLDFEYAQEVREV FRDKTETMKDLYEVLASTESQYDYINNMVTFIDNHDMDRFQVAGSGTRATEQALALTLTSRGVP AIYYGTEQYMTGDGDPNNRAMMTSFNTGTTAYKVIQALAPLRKSNPAIAYGTTTERWVNNDVLI IERKFGSSAALVAINRNSSAAYPISGLLSSLPAGTYSDVLNGLLNGNSITVGSGGAVTNFTLAA GGTAVWQYTAPETSPAIGNVGPTMGQPGNIVTIDGRGFGGTAGTVYFGTTAVTGSGIVSWEDTQ IKAVIPKVAAGKTGVSVKTSSGTASNTFKSFNVLTGDQVTVRFLVNQANTNYGTNVYLVGNAAE LGSWDPNKAIGPMYNQVIAKYPSWYYDVSVPAGTKLDFKFIKKGGGTVTWEGGGNHTYTTPASG VGTVTVDWQN 
Toruzyme MKKTLKLLSILLITIALLFSTIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPSNNPTGD 89 AJE25826.1 LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGELL GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKAAIKLWLDMGIDGIRM DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMNLLDFRFAQKVRQV FRDNTDTMYGLDSMIQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA IYYGTEQYMTGNGDPYNRAMMTSFNTNTTAYNVIKKLAPLRKSNPAIAYGTQKQRWVNNDVYIY ERQFGNNVALIAINRNLSTSYNITGLYTALPAGTYSDVLGGLLNGNSITVSSNGSVTSFTLAPG AVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK VPALTPGKYNITLKTASEVTSNSYNNINVLTGNQVCVRFVVNNATTVWGENVYLTGNVAELGNW DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV IVNWQN Toruzyme MRKNVKLFAAIILFFSLLLTSCGSKDTSSNITPKSDVIYQVMIDRFYNGDKSNDDPKISKGMFD 90 KH062967.1 PTYTNWRMYWGGDLKGLTEKIPYIKGMGVTAIWISPVVDNINKPAIYNGEINAPYHGYWARDFK RVEEHFGSWEDFDNFVKTAHANGIKVILDFAPNHTSPADKNNPDFAENGALYDDGNLLGTYSND VNKLFHHNGGITNWNNLKDLQDKNLFDLADLDQSNPIVDKYLKDSIKLWFSHGIDGVRLDAVKH MPMEWVKSFADTIYGVNKDAILFGEWMLNGPTDPLYGYNIQFANTSGFSVLDFMLNSAIKDVFE KGYGFDRLNDTIEETNKDYDNPYKLVTFVDNHDMPRFLSVNDDKDKLHEAIAFIMTSRGIPAIY YGTEQYLHNDTNGGNDPYNRPMMEKFDENTTAYVLIRELSNLRKATQALQYGKTVSRYVSNDVY IYERQYGKDIVVVAINKGEETTVKNIETSLRKGKYSDYLKGLLKGGNLKVERGNSENDILSITL PKDSVSIWTNVKVK Toruzyme MKKTLKLLSILLITIALLFSSIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPNNNPTGD 91 KH061869.1 LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGVLL GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKAAIKLWLDMGIDGIRM DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMSLLDFRFAQKVRQV FRDNTDTMYGLDSMLQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA IYYGTEQYMTGNGDPYNRAMMTSFDTTTTAYNVIKKLAPLRKSNPAIAYGTQKQRWINNDVYIY ERQFGNNVALVAINRNLSTSYYITGLYTALPAGTYSDVLGGLLNGNNISVASDGSVTPFTLAPG EVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK VPALTPGKYNVTLKTASGVTSNSYNNINVLTGNQVCVRFVVNNASTVWGENVYLTGNVAELGSW 
DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV IVNWQN Toruzyme MKKTLKLLSILLITIALLFSSIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPSNNPTGD 92 KH061665.1 LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGVLL GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKVAIKLWLNMGIDGIRM DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMSLLDFRFAQKVRQV FRDNTDTMYGLDSMLQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA IYYGTEQYMTGNGDPYNRAMMTSFDTTTTAYNVIKKLAPLRKSNPAIAYGTQKQRWINNDVYIY ERQFGNNVALVAINRNLSTSYYITGLYTALPAGTYSDVLGGLLNGNNISVASDGSVTPFTLAPG EVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK VPALTPGKYNVTLKTASGVTSNSYNNINVLTGNQVCVRFVVNNASTVWGENVYLTGNVAELGSW DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV IVNWQN Toruzyme MKKTLKLLSILLITIALLFSSIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPNNNPTGD 93 WP_042834654.1 LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGVLL GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKAAIKLWLDMGIDGIRM DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMSLLDFRFAQKVRQV FRDNTDTMYGLDSMLQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA IYYGTEQYMTGNGDPYNRAMMTSFDTTTTAYNVIKKLAPLRKSNPAIAYGTQKQRWINNDVYIY ERQFGNNVALVAINRNLSTSYYITGLYTALPAGTYSDVLGGLLNGNNISVASDGSVTPFTLAPG EVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK VPALTPGKYNVTLKTASGVTSNSYNNINVLTGNQVCVRFVVNNASTVWGENVYLTGNVAELGSW DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV IVNWQN Toruzyme MKKTLKLLSILLITIALLFSSIPSVPAAPDTSVSNVVNYSTDVIYQIVTDRFLDGNPSNNPTGD 94 WP_042834464.1 LYDPTHTSLKKYFGGDWQGIINKINDGYLTGMGITAIWISQPVENIYAVLPDSTFGGSTSYHGY WARDFKKTNPFFGSFTDFQNLIATAHAHNIKVIIDFAPNHTSPASETDPTYGENGRLYDNGVLL GGYTNDTNGYFHHYGGTNFSSYEDGIYRNLFDLADLDQQNNTIDSYLKVAIKLWLNMGIDGIRM DAVKHMAFGWQKNFMDSILSYRPVFTFGEWYLGTNEVDPNNTYFANESGMSLLDFRFAQKVRQV FRDNTDTMYGLDSMLQSTAADYNFINDMVTFIDNHDMDRFYTGGSTRPVEQALAFTLTSRGVPA 
IYYGTEQYMTGNGDPYNRAMMTSFDTTTTAYNVIKKLAPLRKSNPAIAYGTQKQRWINNDVYIY ERQFGNNVALVAINRNLSTSYYITGLYTALPAGTYSDVLGGLLNGNNISVASDGSVTPFTLAPG EVAVWQYVSTTNPPLIGHVGPTMTKAGQTITIDGRGFGTTAGQVLFGTTPATIVSWEDTEVKVK VPALTPGKYNVTLKTASGVTSNSYNNINVLTGNQVCVRFVVNNASTVWGENVYLTGNVAELGSW DTSKAIGPMFNQVVYQYPTWYYDVSVPAGTTIEFKFIKKNGSTVTWEGGYNHVYTTPTSGTATV IVNWQN Cyclomaltodextrin MRRWLSLVLSMSFVFSAIFIVSDTQKVTVEAAGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSG 95 glucanotransferase ALFSSGCTNLRKYCGGDWQGIINKINDGYLTDMGVTAIWISQPVENVFSVMNDASGSASYHGYW [Geobacillus ARDFKKPNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLLG stearothermophilus] GYTNDANMYFHHNGGTTFSSLEDGIYRNLFDLADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMD GenBank: AVKHMPFGWQKSLMDEIDNYRPVFTFGEWFLSENEVDANNHYFANESGMSLLDFRFGQKLRQVL CAA41770.1 RNNSDNWYGFNQMIQDTASAYDEVLDQVTFIDNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPN IYYGTEQYMTGNGDPNNRKMMSSFNKNTRAYQVIQKLSSLRRNNPALAYGDTEQRWINGDVYVY ERQFGKDVVLVAVNRSSSSNYSITGLFTALPAGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPG EVGVWAYSATESTPIIGHVGPMMGQVGHQVTIDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVA VPNVSPGKYNITVQSSSGQTSAAYDNFEVLTNDQVSVRFVVNNATTNLGQNIYIVGNVYELGNW DTSKAIGPMFNQVVYSYPTWYIDVSVPEGKTIEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGK IIVDWQN cyclomaltodextrin MRRWLSLVLSMSFVFSAIFIVSDTQKVTVEAAGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSG 96 glucanotransferase ALFSSGCTNLRKYCGGDWQGIINKINDGYLTDMGVTAIWISQPVENVFSVMNDASGSASYHGYW [Geobacillus ARDFKKPNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLLG stearothermophilus] GYTNDANMYFHHNGGTTFSSLEDGIYRNLFDLADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMD GenBank: AVKHMPFGWQKSLMDEIDNYRPVFTFGEWFLSENEVDANNHYFANESGMSLLDFRFGQKLRQVL CAA41771.1 RNNSDNWYGFNQMIQDTASAYDEVLDQVTFIDNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPN IYYGTEQYMTGNGDPNNRKMMSSFNKNTRAYQVIQKLSSLRRNNPALAYGDTEQRWINGDVYVY ERQFGKDVVLVAVNRSSSSNYSITGLFTALPAGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPG EVGVWAYSATESTPIIGHVGPMMGQVGHQVTIDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVA VPNVSPGKYNITVQSSSGQTSAAYDNFEVLTNDQVSVRFVVNNATTNLGQNIYIVGNVYELGNW DTSKAIGPMFNQVVYSYPTWYIDVSVPEGKTIEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGK IIVDWQN 
MRRWLSLVLSMSFVFSAIFIVSDTQKVTVEAAGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSG 97 ALFSSGCTNLRKYCGGDWQGIINKINDGYLTDMGVTAIWISQPVENVFSVMNDASGSASYHGYW cyclomaltodextrin ARDFKKPNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLLG glucanotransferase GYTNDANMYFHHNGGTTFSSLEDGIYRNLFDLADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMD [Geobacillus AVKHMPFGWQKSLMDEIDNYRPVFTFGEWFLSENEVDANNHYFANESGMSLLDFRFGQKLRQVL stearothermophilus] RNNSDNWYGFNQMIQDTASAYDEVLDQVTFIDNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPN GenBank: IYYGTEQYMTGNGDPNNRKMMSSFNKNTRAYQVIQKLSSLRRNNPALAYGDTEQRWINGDVYVY CAA41772.1 ERQFGKDVVLVAVNRSSSSNYSITGLFTALPAGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPG EVGVWAYSATESTPIIGHVGPMMGQVGHQVTIDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVA VPNVSPGKYNITVQSSSGQTSAAYDNFEVLTNDQVSVRFVVNNATTNLGQNIYIVGNVYELGNW DTSKAIGPMFNQVVYSYPTWYIDVSVPEGKTIEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGK IIVDWQN ChainA, AGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSGALFSSGCTNLRKYCGGDWQGIINKINDGYLT 98 Cyclodextrin DMGVTAIWISQPVENVFSVMNDASGSASYHGYWARDFKKPNPFFGTLSDFQRLVDAAHAKGIKV Glucanotransferase IIDFAPNHTSPASETNPSYMENGRLYDNGTLLGGYTNDANMYFHHNGGTTFSSLEDGIYRNLFD (E.C.2.4.1.19; LADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMDAVKHMPFGWQKSLMDEIDNYRPVFTFGEWFL CGTase) SENEVDANNHYFANESGMSLLDFRFGQKLRQVLRNNSDNWYGFNQMIQDTASAYDEVLDQVTFI PDB:1CYG_A DNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPNIYYGTEQYMTGNGDPNNRKMMSSFNKNTRAY QVIQKLSSLRRNNPALAYGDTEQRWINGDVYVYERQFGKDVVLVAVNRSSSSNYSITGLFTALP AGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPGEVGVWAYSATESTPIIGHVGPMMGQVGHQVT IDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVAVPNVSPGKYNITVQSSSGQTSAAYDNFEVLT NDQVSVRFVVNNATTNLGQNIYIVGNVYELGNWDTSKAIGPMFNQVVYSYPTWYIDVSVPEGKT IEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGKIIVDWQN Cyclomaltodextrin MRRWLSLVLSMSFVFSAIFIVSDTQKVTVEAAGNLNKVNFTSDVVYQIVVDRFVDGNTSNNPSG 99 glucanotransferase ALFSSGCTNLRKYCGGDWQGIINKINDGYLTDMGVTAIWISQPVENVFSVMNDASGSASYHGYW (alsoknownas ARDFKKPNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLLG Cyclodextrtn- GYTNDANMYFHHNGGTTFSSLEDGIYRNLFDLADLNHQNPVIDRYLKDAVKMWIDMGIDGIRMD glycosyltransferas AVKHMPFGWQKSLMDEIDNYRPVFTFGEWFLSENEVDANNHYFANESGMSLLDFRFGQKLRQVL 
e;CGTase) RNNSDNWYGFNQMIQDTASAYDEVLDQVTFIDNHDMDRFMIDGGDPRKVDMALAVLLTSRGVPN UniProtKB/Swiss- IYYGTEQYMTGNGDPNNRKMMSSFNKNTRAYQVIQKLSSLRRNNPALAYGDTEQRWINGDVYVY Prot:P31797.1 ERQFGKDVVLVAVNRSSSSNYSITGLFTALPAGTYTDQLGGLLDGNTIQVGSNGSVNAFDLGPG EVGVWAYSATESTPIIGHVGPMMGQVGHQVTIDGEGFGTNTGTVKFGTTAANVVSWSNNQIVVA VPNVSPGKYNITVQSSSGQTSAAYDNFEVLTNDQVSVRFVVNNATTNLGQNIYIVGNVYELGNW DTSKAIGPMFNQVVYSYPTWYIDVSVPEGKTIEFKFIKKDSQGNVTWESGSNHVYTTPTNTTGK IIVDWQN hypothetical MSRNGAVTPDWQFTVEVQEGETITYKYVKGGSWDQEGLADHTREDDNDDDVSYYGYGAIGTDLK 100 protein VTVHNEGNNTMIVQDRILRWIDMPVVIEEVQKQGSQVTIKGNAIKNGVLTINGERVPIDGRMAF AA906_05840 SYTFTPASHQKEVSIHIEPSAESKTAIFNNDGGAIAKNTKDYVLNLETKQLREGKLTTPPSNGD [Geobacillus SPESDWPGSETPSHDGGATPGNGTSPGSGGPSDGTSPGGSVPPGGTAPPGNEAPPSRPPQKPSP stearothermophilus] SKPKEKPRKPTTPPGQVKKVYWDGVELKKGQIGRLTVQKPINLWKRTKDGRLVFVRILQPGEVY RVYGYDVRFGGQYAVGGGYYVTDIDTHIRYETPSKEKLKLVNGE Maltodextrin MSRNGAVTPDWQFTVEVQEGETITYKYVKGGSWDQEGLADHTREDDNDDDVSYYGYGAIGTDLK 101 glucosidase VTVHNEGNNTMIVQDRILRWIDMPVVIEEVQKQGSQVTIKGNAIKNGVLTINGERVPIDGRMAF [Geobacillus SYTFTPASHQKEVSIHIEPSAESKTAIFNNDGGAIAKNTKDYVLNLETKQLREGKLTTPPSNGD stearothermophilus] SPESDWPGSETPSHDGGATPGNGTSPGSGGPSDGTSPGGSVPPGGTAPPGNGAPPSAPPQKPSP GenBank: SKPKEKPRKPTTPPSQVKKVYWDGVELKKGQIGRLTVQKPINLWKRAKDGRLVFVRILQPGEVY KYD32676.1 RVYGYDARFGGQYAVGGGYYVTDIDTHIRYETPSKEKLKLVNGE Beta-glucosid MAMQLRSLLLCVLLLLLGFALADTNAAARIHPPVVCANLSRANFDTLVPGFVFGAATASYQVEG 102 (Almonds) AANLDGRGPSIWDTFTHKHPEKIADGSNGDVAIDQYHRYKEDVAIMKDMGLESYRFSISWSRVL PNGTLSGGINKKGIEYYNNLINELLHNGIEPLVTLFHWDVPQTLEDEYGGFLSNRIVNDFEEYA ELCFKKFGDRVKHWTTLNEPYTFSSHGYAKGTHAPGRCSAWYNQTCFGGDSATEPYLVTHNLLL AHAAAVKLYKTKYQAYQKGVIGITVVTPWFEPASEAKEDIDAVFRALDFIYGWFMDPLTRGDYP QSMRSLVGERLPNFTKKESKSLSGSFDYIGINYYSARYASASKNYSGHPSYLNDVNVDVKTELN GVPIGPQAASSWLYFYPKGLYDLLCYTKEKYNDPIIYITENGVDEFNQPNPKLSLCQLLDDSNR IYYYYHHLCYLQAAIKEGVKVKGYFAWSLLDNFEWDNGYTVRFGINYVDYDNGLKRHSKHSTHW FKSFLKKSSRNTKKIRRCGNNNTSATKFVF DexT protein MPANAPDKQSVTNAPVVPPKHDTDQQDDSLEKQQVLEPSVNSNIPKKQTNQQLAVVTAPANSAP 103
[Leuconostoc QTKTTAEISAGTELDTMPNVKHVDGKVYFYGDDGQPKKNFTTIIDGKPYYFDKDTGALSNNDKQ citreum] YVSELFSIGNKHNAVYNTSSDNFTQLEGHLTASSWYRPKDILKNGKRWAPSTVTDFRPLLMAWW PDKSTQVTYLNYMKDQGLLSGTHHFSDNENMRTLTAAAMQAQVNIEKKIGQLGNTDWLKTAMTQ YIDAQPNWNIDSEAKGDDHLQGGALLYTNSDMSPKANSDYRKLSRTPKNQKGQIADKYKQGGFE LLLANDVDNSNPVVQAEQLNWLHYMMNIGSILQNDDQANFDGYRVDAVDNVDADLLQIAGEYAK AAYGVDKNDARANQHLSILEDWGDEDPDYVKAHGNQQITMDFPLHLAIKYALNMPNDKRSGLEP TREHSLVKRITDDKENVAQPNYSFIRAHDSEVQTIIADIIKDKINPASTGLDSTVTLDQIKQAF DIYNADELKADKVYTPYNIPASYALLLTNKDTIPRVYYGDMFTDDGQYMAKQSPYYQAIDALLK ARIKYAAGGQTMKMNYFPDEQSVMTSVRYGKGAMTASDSGNQETRYQGIGLVVNNRPDLKLSDK DEVKMDMGAAHKNQDYRPVLLTTKSGLKVYSTDANAPVVRTDANGQLTFKADMVYGVNDPQVSG YIAAWVPVGASENQDARTKSETTQSTDGSVYHSNAALDSQVIYEGFSNFQDFPTTPDEFTNIKI AQNVNLFKDWGITSFEMAPQYRASSDKSFLDAIVQNGYAFTDRYDIGYNTPTKYGTADNLLDAL RALHGQGIQAINDWVPDQIYNLPDEQLVTAIRTDGSGDHTYGSVIDHTLYASKTVGGGIYQQQY GGAFLEQLKTQYPQLFQQKQISTDQPMNPDIQIKSWEAKYFNGSNIQGRGAWYVLKDWGTQQYF NVSDAQTFLPKQLLGEKAKTGFVTRGKETSFYSTSGYQAKSAFICDNGNWYYFDDKGKMVVGNQ VINGINYYFLPNGIELQDAYLVHDGMYYYYNNIGKQLHNTYYQDKQKNFHYFFEDGHMAQGIVT IIQSDGTPVTQYFDENGKQQKGVAVKGSDGHLHYFDGASGNMLFKSWGRLADGSWLYVDEKGNA VTGKQTINNQTVYFNDDGRQIKNNFKELADGSWLYLNNKGVAVTGEQIINGQTLYFGNDGRQFK GTTHINATGESRYYDPDSGNMITDRFERVGDNQWAYFGYDGVAVTGDRIIKGQKLYFNQNGIQM KGHLRLENGIMRYYDADTGELVRNRFVLLSDGSWVYFGQDGVPVTGVQVINGQTLYFDADGRQV KGQQRVIGNQRYWMDKDNGEMKKITYAAALEHHHHHH DexT gene sequence ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG 104 ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG
CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA 
CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA AAAAAATAACATACGCGGCCGCACTCGAGCACCACCACCACCACCACTGA DexTgene(DNA ATGCAAAACGGCGAAGTGTGTCAGCGTAAAAAACTGTACAAGTCAGGGAAGATATTAGTTACAG 105 sequencecloned CAAGTATTTTTGCTGTTATGGGTTTTGGTACTGCCATGTCACAAGCAAACGCGAGCAGTAGTGA intopET23a) TAATGATAGCAAAACACAAACTATTTCAAAAATAGTAAAAAGTAAAGTCGAACCGGCAACTGTT CAACCAGCGAAACCAGCGGAACCTACTAATAAAATAGTTGACCAAGCAGATATGCATACGGTCA 
GCGGGCAAAACAGCGTGCCACCAGTAGTGACTAATCAATCCAATTAACAGGCTGCAAAACCAAC TACACCTGTTACCGATGTCACAGATACGCATAAAATCGAAGCAAACAACGTCCCTGCTGATGTT ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT 
CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA 
TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA AAAAAATAACATACGCGGCCGCACTCGAGCACCACCACCACCACCACTGA Dextransucrase MEKKVRFKLRKVKKRWVTVSVASAVVTLTSLSGSLVKADSTDDRQQAVTESQASLVTTSEAAKE 106 (alsoknownas6- TLTATDTSTATSATSQPTATVTDNVSTTNQSTNTTANTANFDVKPTTTSEQSKTDNSDKIIATS glucosyltransferase) KAVNRLTATGKFVPANNNTAHSRTVTDKIVPIKPKIGKLKQPSSLSQDDIAALGNVKNIRKVNG UniProtKB/Swiss- KYYYYKEDGTLQKNYALNINGKTFFFDETGALSNNTLPSKKGNITNNDNTNSFAQYNQVYSTDA Prot:P13470.2 ANFEHVDHYLTAESWYRPKYILKDGKTWTQSTEKDFRPLLMTWWPDQETQRQYVNYMNAQLGIH QTYNTATSPLQLNLAAQTIQTKIEEKITAEKNTNWLRQTISAFVKTQSAWNSDSEKPFDDHLQK GALLYSNNSKLTSQANSNYRILNRTPTNQTGKKDPRYTADRTIGGYEFLLANDVDNSNPVVQAE QLNWLHFLMNFGNIYANDPDANFDSIRVDAVDNVDADLLQIAGDYLKAAKGIHKNDKAANDHLS ILEAWSYNDTPYLHDDGDNMINMDNRLRLSLLYSLAKPLNQRSGMNPLITNSLVNRTDDNAETA AVPSYSFIRAHDSEVQDLIRNIIRAEINPNVVGYSFTMEEIKKAFEIYNKDLLATEKKYTHYNT ALSYALLLTNKSSVPRVYYGDMFTDDGQYMAHKTINYEAIETLLKARIKYVSGGQAMRNQQVGN SEIITSVRYGKGALKATDTGDRTTRTSGVAVIEGNNPSLRLKASDRVVVNMGAAHKNQAYRPLL LTTDNGIKAYHSDQEAAGLVRYTNDRGELIFTAADIKGYANPQVSGYLGVWVPVGAAADQDVRV AASTAPSTDGKSVHQNAALDSRVMFEGFSNFQAFATKKEEYTNVVIAKNVDKFAEWGVTDFEMA PQYVSSTDGSFLDSVIQNGYAFTDRYDLGISKPNKYGTADDLVKAIKALHSKGIKVMADWVPDQ MYALPEKEVVTATRVDKYGTPVAGSQIKNTLYVVDGKSSGKDQQAKYGGAFLEELQAKYPELFA RKQISTGVPMDPSVKIKQWSAKYFNGTNILGRGAGYVLKDQATNTYFSLVSDNTFLPKSLVNPN HGTSSSVTGLVFDGKGYVYYSTSGNQAKNAFISLGNNWYYFDNNGYMVTGAQSINGANYYFLSN GIQLRNAIYDNGNKVLSYYGNDGRRYENGYYLFGQQWRYFQNGIMAVGLTRIHGAVQYFDASGF QAKGQFITTADGKLRYFDRDSGNQISNRFVRNSKGEWFLFDHNGVAVTGTVTFNGQRLYFKPNG VQAKGEFIRDADGHLRYYDPNSGNEVRNRFVRNSKGEWFLFDHNGIAVTGTRVVNGQRLYFKSN GVQAKGELITERKGRIKYYDPNSGNEVRNRYVRTSSGNWYYFGNDGYALIGWHVVEGRRVYFDE 
NGVYRYASHDQRNHWDYDYRRDFGRGSSSAVRFRHSRNGFFDNFFRF Dextransucrase MDKKVRYKLRKVKKRWVTVSVASAVMTLTTLSGGLVKADSNESKSQISNDSNTSVVTANEESNV 107 (alsoknownas TTEVTSKQEAASSQTNHTVTTISSSTSVVNPKEVVSNPYTVGETASNGEKLQNQTTTVDKTSEA Sucrose6- AANNISKQTTEADTDVIDDSNAANLQILEKLPNVKEIDGKYYYYDNNGKVRTNFTLIADGKILH glucosyltransferase) FDETGAYTDTSIDTVNKDIVTTRSNLYKKYNQVYDRSAQSFEHVDHYLTAESWYRPKYILKDGK UniProtKB/Swiss- TWTQSTEKDFRPLLMTWWPSQETQRQYVNYMNAQLGINKTYDDTSNQLQLNIAAATIQAKIEAK Prot:P08987.3 ITTLKNTDWLRQTISAFVKTQSAWNSDSEKPFDDHLQNGAVLYDNEGKLTPYANSNYRILNRTP TNQTGKKDPRYTADNTIGGYEFLLANDVDNSNPVVQAEQLNWLHFLMNFGNIYANDPDANFDSI RVDAVDNVDADLLQIAGDYLKAAKGIHKNDKAANDHLSILEAWSDNDTPYLHDDGDNMINMDNK LRLSLLFSLAKPLNQRSGMNPLITNSLVNRTDDNAETAAVPSYSFIRAHDSEVQDLIRDIIKAE INPNVVGYSFTMEEIKKAFEIYNKDLLATEKKYTHYNTALSYALLLTNKSSVPRVYYGDMFTDD GQYMAHKTINYEAIETLLKARIKYVSGGQAMRNQQVGNSEIITSVRYGKGALKATDTGDRTTRT SGVAVIEGNNPSLRLKASDRVVVNMGAAHKNQAYRPLLLTTDNGIKAYHSDQEAAGLVRYTNDR GELIFTAADIKGYANPQVSGYLGVWVPVGAAADQDVRVAASTAPSTDGKSVHQNAALDSRVMFE GFSNFQAFATKKEEYTNVVIAKNVDKFAEWGVTDFEMAPQYVSSTDGSFLDSVIQNGYAFTDRY DLGISKPNKYGTADDLVKAIKALHSKGIKVMADWVPDQMYAFPEKEVVTATRVDKFGKPVEGSQ IKSVLYVADSKSSGKDQQAKYGGAFLEELQAKYPELFARKQISTGVPMDPSVKIKQWSAKYFNG TNILGRGAGYVLKDQATNTYFNISDNKEINFLPKTLLNQDSQVGFSYDGKGYVYYSTSGYQAKN TFISEGDKWYYFDNNGYMVTGAQSINGVNYYFLSNGLQLRDAILKNEDGTYAYYGNDGRRYENG YYQFMSGVWRHFNNGEMSVGLTVIDGQVQYFDEMGYQAKGKFVTTADGKIRYFDKQSGNMYRNR FIENEEGKWLYLGEDGAAVTGSQTINGQHLYFRANGVQVKGEFVTDRYGRISYYDSNSGDQIRN RFVRNAQGQWFYFDNNGYAVTGARTINGQHLYFRANGVQVKGEFVTDRHGRISYYDGNSGDQIR NRFVRNAQGQWFYFDNNGYAVTGARTINGQHLYFRANGVQVKGEFVTDRYGRISYYDSNSGDQI RNRFVRNAQGQWFYFDNNGYAVTGARTINGQHLYFRANGVQVKGEFVTDRYGRISYYDANSGER VRIN Glucosyltransferas METKRRYKMYKVKKHWVTIAVASGLITLGTTTLGSSVSAETEQQTSDKVVTQKSEDDKAASESS 108 e-S(alsoknownas QTDAPKTKQAQTEQTQAQSQANVADTSTSITKETPSQNITTQANSDDKTVTNTKSEEAQTSEER GTF-S, TKQAEEAQATASSQALTQAKAELTKQRQTAAQENKNPVDLAAIPNVKQIDGKYYYIGSDGQPKK Dextransucrase, NFALTVNNKVLYFDKNTGALTDTSQYQFKQGLTKLNNDYTPHNQIVNFENTSLETIDNYVTADS Sucrose6- 
WYRPKDILKNGKTWTASSESDLRPLLMSWWPDKQTQIAYLNYMNQQGLGTGENYTADSSQESLN glucosyltransferase) LAAQTVQVKIETKISQTQQTQWLRDIINSFVKTQPNWNSQTESDTSAGEKDHLQGGALLYSNSD UniProtKB/Swiss- KTAYANSDYRLLNRTPTSQTGKPKYFEDNSSGGYDFLLANDIDNSNPVVQAEQLNWLHYLMNYG Prot:P49331.3 SIVANDPEANFDGVRVDAVDNVNADLLQIASDYLKAHYGVDKSEKNAINHLSILEAWSDNDPQY NKDTKGAQLPIDNKLRLSLLYALTRPLEKDASNKNEIRSGLEPVITNSLNNRSAEGKNSERMAN YIFIRAHDSEVQTVIAKIIKAQINPKTDGLTFTLDELKQAFKIYNEDMRQAKKKYTQSNIPTAY ALMLSNKDSITRLYYGDMYSDDGQYMATKSPYYDAIDTLLKARIKYAAGGQDMKITYVEGDKSH MDWDYTGVLTSVRYGTGANEATDQGSEATKTQGMAVITSNNPSLKLNQNDKVIVNMGTAHKNQE YRPLLLTTKDGLTSYTSDAAAKSLYRKTNDKGELVFDASDIQGYLNPQVSGYLAVWVPVGASDN QDVRVAASNKANATGQVYESSSALDSQLIYEGFSNFQDFVTKDSDYTNKKIAQNVQLFKSWGVT SFEMAPQYVSSEDGSFLDSIIQNGYAFEDRYDLAMSKNNKYGSQQDMINAVKALHKSGIQVIAD WVPDQIYNLPGKEVVTATRVNDYGEYRKDSEIKNTLYAANTKSNGKDYQAKYGGAFLSELAAKY PSIFNRTQISNGKKIDPSEKITAWKAKYFNGTNILGRGVGYVLKDNASDKYFELKGNQTYLPKQ MTNKEASTGFVNDGNGMTFYSTSGYQAKNSFVQDAKGNWYYFDNNGHMVYGLQHLNGEVQYFLS NGVQLRESFLENADGSKNYFGHLGNRYSNGYYSFDNDSKWRYFDASGVMAVGLKTINGNTQYFD QDGYQVKGAWITGSDGKKRYFDDGSGNMAVNRFANDKNGDWYYLNSDGIALVGVQTINGKTYYF GQDGKQIKGKIITDNGKLKYFLANSGELARNIFATDSQNNWYYFGSDGVAVTGSQTIAGKKLYF ASDGKQVKGSFVTYNGKVHYYHADSGELQVNRFEADKDGNWYYLDSNGEALTGSQRINGQRVFF TREGKQVKGDVAYDERGLLRYYDKNSGNMVYNKVVTLANGRRIGIDRWGIARYY Dextransucrase(EC METKRRYKMHKVKKHWVTVAVASGLITLGTTTLGSSVSAETEQQTSDKVVTQKSEDDKAASESS 109 2.4.1.5)precursor QTDAPKTKQAQTEQTQAQSQANVADTSTSITKETPSQNITTQANSDDKTVTNTKSEEAQTSEER Streptococcus TKQSEEAQTTASSQALTQAKAELTKQRQTAAQENKNPVDLAAIPNVKQIDGKYYYIGSDGQPKK mutans NFALTVNNKVLYFDKNTGALTDTSQYQFKQGLTKLNNDYTPHNQIVNFENTSLETIDNYVTADS PIR:A45866 WYRPKDILKNGKTWTASSESDLRPLLMSWWPDKQTQIAYLNYMNQQGLGTGENYTADSSQESLN LAAQTVQVKIETKISQTQQTQWLRDIINSFVKTQPNWNSQTESDTSAGEKDHLQGGALLYSNSD KTAYANSDYRLLNRTPTSQTGKPKYFEDNSSGGYDFLLANDIDNSNPVVQAEQLNWLHYLMNYG SIVANDPEANFDGVRVDAVDNVNADLLQIASDYLKAHYGVDKSEKNAINHLSILEAWSDNDPQY NKDTKGAQLPIDNKLRLSLLYALTRPLEKDASNKNEIRSGLEPVITNSLNNRSAEGKNSERMAN YIFIRAHDSEVQTVIAKIIKAQINPKTDGLTFTLDELKQAFKIYNEDMRQAKKKYTQSNIPTAY 
ALMLSNKDSITRLYYGDMYSDDGQYMATKSPYYDAIDTLLKARIKYAAGGQDMKITYVEGDKSH MDWDYTGVLTSVRYGTGANEATDQGSEATKTQGMAVITSNNPSLKLNQNDKVIVNMGAAHKNQE YRPLLLTTKDGLTSYTSDAAAKSLYRKTNDKGELVFDASDIQGYLNPQVSGYLAVWVPVGASDN QDVRVAASNKANATGQVYESSSALDSQLIYEGFSNFQDFVTKDSDYTNKKIAQNVQLFKSWGVT SFEMAPQYVSSEDGSFLDSIIQNGYAFEDRYDLAMSKNNKYGSQQDMINAVKALHKSGIQVIAD WVPDQIYNLPGKEVVTATRVNDYGEYRKDSEIKNTLYAANTKSNGKDYQAKYGGAFLSELAAKY PSIFNRTQISNGKKIDPSEKITAWKAKYFNGTNILGRGVGYVLKDNASDKYFELKGNQTYLPKQ MTNKEASTGFVNDGNGMTFYSTSGYQAKNSFVQDAKGNWYYFDNNGHMVYGLQQLNGEVQYFLS NGVQLRESFLENADGSKNYFGHLGNRYSNGYYSFDNDSKWRYFDASGVMAVGLKTINGNTQYFD QDGYQVKGAWITGSDGKKRYFDDGSGNMAVNRFANDKNGDWYYLNSDGIALVGVQTINGKTYYF GQDGKQIKGKIITDNGKLKYFLANSGELARNIFATDSQNNWYYFGSDGVAVTGSQTIAGKKLYF ASDGKQVKGSFVTYNGKVHYYHADSGELQVNRFEADKDGNWYYLDSNGEALTGSQRINDQRVFF TREGKQVKGDVAYDERGLLRYYD Dextranase MFSAVLLGWLLFQPTVGHAIRQRAGNHTVCNSQLCTWWHDNGEINTASMVQLGNVRQSHKYLVQ 110 VSIAGVNDFYDSFAYESIPRNGRGRIYSPWDPPNSDTLGSDVDDGITIETSAGINMAWSQFEYS TGVDVKILTRDGSRLPDPSGVKIRPTAISYDIRSSSDGGIVIRVPHDPNGRRFSVEFDNDLYTY RSDGSRYVSSGGSIVGVEPRNALVIFASPFLPDNMVPRIDGPDTKVMTPGPINQGDWGSSGILY FPPGVYWMNSNQQGQTPKIGENHIRLHPNTYWAYLAPGAYVKGAIEYSTKSDFYATGHGVLSGE HYVYQANPATYYQALKSDATSLRMWWHNNLGGGQTWYCQGPTINAPPFNTMDFHGSSDITTRIS DYKQVGAFFFQTDGPQMYPNSQVHDVFYHVNDDAIKTYYSGVTVTRATIWKAHNDPIIQMGWDT RDVTGVTLQDLYIIHTRYIKSETYVPSAIIGASPFYMPGRSVDPAKSISMTISNLVCEGLCPAL MRITPLQNYRDFRIQNVAFPDGLQANSIGTGKSIVPASSGLKFGVAISNWTVGGEQVTMSNFQS DSLGQLDIDVSYWGQWVIR Lanosterol MTEFYSDTIGLPKTDPRLWRLRTDELGRESWEYLTPQQAANDPPSTFTQWLLQDPKFPQPHPER 111 SEQIDNO:55 synthase[S. 
NKHSPDFSAFDACHNGASFFKLLQEPDSGIFPCQYKGPMFMTIGYVAVNYIAGIEIPEHERIEL inWO cerevisiae] IRYIVNTAHPVDGGWGLHSVDKSTVFGTVLNYVILRLLGLPKDHPVCAKARSTLLRLGGAIGSP 2016/050890 HWGKIWLSALNLYKWEGVNPAPPETWLLPYSLPMHPGRWWVHTRGVYIPVSYLSLVKFSCPMTP LLEELRNEIYTKPFDKINFSKNRNTVCGVDLYYPHSTTLNIANSLVVFYEKYLRNRFIYSLSKK KVYDLIKTELQNTDSLCIAPVNQAFCALVTLIEEGVDSEAFQRLQYRFKDALFHGPQGMTIMGT NGVQTWDCAFAIQYFFVAGLAERPEFYNTIVSAYKFLCHAQFDTECVPGSYRDKRKGAWGFSTK TQGYTVADCTAEAIKAIIMVKNSPVFSEVHHMISSERLFEGIDVLLNLQNIGSFEYGSFATYEK IKAPLAMETLNPAEVFGNIMVEYPYVECTDSSVLGLTYFHKYFDYRKEEIRTRIRIAIEFIKKS QLPDGSWYGSWGICFTYAGMFALEALHTVGETYENSSTVRKGCDFLVSKQMKDGGWGESMKSSE LHSYVDSEKSLVVQTAWALIALLFAEYPNKEVIDRGIDLLKNRQEESGEWKFESVEGVFNHSCA IEYPSYRFLFPIKALGMYSRAYETHTL DNAsequence ATGAAGGTCTCTCCATTTGAGTTCATGTCGGCAATAATTAAGGGCAGGATGGACCCGTCCAATT 112 SEQIDNO:45 encodingS. CTTCATTTGAGTCGACTGGCGAGGTTGCCTCAGTTATTTTCGAGAACCGTGAGCTGGTTGCGAT ofWO grosvenorii CTTAACCACCTCGATCGCCGTCATGATTGGCTGCTTCGTTGTTCTCATGTGGCGAAGAGCCGGC 2016/050890 CPR4497 AGTCGGAAAGTTAAGAACGTGGAGCTACCTAAGCCGTTGATTGTGCACGAGCCGGAGCCCGAAG TTGAAGACGGCAAGAAGAAGGTTTCAATCTTCTTCGGTACACAGACAGGCACCGCCGAAGGATT TGCAAAGGCTCTAGCTGACGAGGCGAAAGCACGATACGAGAAGGCCACATTTAGAGTTGTTGAT TTGGATGATTATGCAGCTGATGACGATCAGTATGAAGAGAAGTTGAAGAACGAGTCTTTCGCTG TCTTCTTATTGGCAACGTATGGCGATGGAGAGCCCACTGATAATGCCGCAAGATTCTATAAATG GTTCGCGGAGGGGAAAGAGAGAGGGGAGTGGCTTCAGAACCTTCATTATGCGGTCTTTGGCCTT GGCAACCGACAGTACGAGCATTTTAATAAGATTGCAAAGGTGGCAGATGAGCTGCTTGAGGCAC AGGGAGGCAACCGCCTTGTTAAAGTTGGTCTTGGAGATGACGATCAGTGCATAGAGGATGACTT CAGTGCCTGGAGAGAATCATTGTGGCCTGAGTTGGATATGTTGCTTCGAGATGAGGATGATGCA ACAACAGTGACCACCCCTTACACAGCTGCCGTATTAGAATATCGAGTTGTATTCCATGATTCTG CAGATGTAGCTGCTGAGGACAAGAGCTGGATCAATGCAAACGGTCATGCTGTACATGATGCTCA GCATCCCTTCAGATCTAATGTGGTTGTGAGGAAGGAGCTCCATACGTCCGCATCTGATCGCTCC TGTAGTCATCTAGAATTTAATATTTCTGGGTCTGCACTCAATTATGAAACAGGGGATCATGTCG GTGTTTACTGTGAAAACTTAACTGAGACTGTGGACGAGGCACTAAACTTATTGGGTTTGTCTCC TGAAACGTATTTCTCCATATATACTGATAACGAGGATGGCACTCCACTTGGTGGAAGCTCTTTA 
CCACCTCCTTTTCCATCCTGCACCCTCAGAACAGCATTGACTCGATATGCAGATCTCTTGAATT CACCCAAGAAGTCAGCTTTGCTTGCATTAGCAGCACATGCTTCAAATCCAGTAGAGGCTGACCG ATTAAGATATCTTGCATCACCTGCCGGGAAGGATGAATACGCCCAGTCTGTGATTGGTAGCCAG AAAAGCCTTCTTGAGGTCATGGCTGAATTTCCTTCTGCCAAGCCCCCACTTGGTGTCTTCTTCG CAGCTGTTGCACCGCGCTTGCAGCCTCGATTCTACTCCATATCATCATCTCCAAGGATGGCTCC ATCTAGAATTCATGTTACTTGTGCTTTAGTCTATGACAAAATGCCAACAGGACGTATTCATAAA GGAGTGTGCTCAACTTGGATGAAGAATTCTGTGCCCATGGAGAAAAGCCATGAATGCAGTTGGG CTCCAATTTTCGTGAGACAATCAAACTTCAAGCTTCCTGCAGAGAGTAAAGTGCCCATTATCAT GGTTGGTCCTGGAACTGGATTGGCTCCTTTCAGAGGTTTCTTACAGGAAAGATTAGCTTTGAAG GAATCTGGAGTAGAATTGGGGCCTTCCATATTGTTCTTTGGATGCAGAAACCGTAGGATGGATT ACATATACGAGGATGAGCTGAACAACTTTGTTGAGACTGGTGCTCTCTCTGAGTTGGTTATTGC CTTCTCACGCGAAGGGCCAACTAAGGAATATGTGCAGCATAAAATGGCAGAGAAGGCTTCGGAT ATCTGGAATTTGATATCAGAAGGGGCTTACTTATATGTATGTGGTGATGCAAAGGGCATGGCTA AGGATGTCCACCGAACTCTCCATACTATCATGCAAGAGCAGGGATCTCTTGACAGCTCAAAAGC TGAGAGCATGGTGAAGAATCTGCAAATGAATGGAAGGTATCTGCGTGATGTCTGGTGA CPR4497protein MKVSPFEFMSAIIKGRMDPSNSSFESTGEVASVIFENRELVAILTTSIAVMIGCFVVLMWRRAG 113 SEQIDNO:46 [S.grosvenorii] SRKVKNVELPKPLIVHEPEPEVEDGKKKVSIFFGTQTGTAEGFAKALADEAKARYEKATFRVVD ofWO LDDYAADDDQYEEKLKNESFAVFLLATYGDGEPTDNAARFYKWFAEGKERGEWLQNLHYAVFGL 2016/050890 GNRQYEHFNKIAKVADELLEAQGGNRLVKVGLGDDDQCIEDDFSAWRESLWPELDMLLRDEDDA TTVTTPYTAAVLEYRVVFHDSADVAAEDKSWINANGHAVHDAQHPFRSNVVVRKELHTSASDRS CSHLEFNISGSALNYETGDHVGVYCENLTETVDEALNLLGLSPETYFSIYTDNEDGTPLGGSSL PPPFPSCTLRTALTRYADLLNSPKKSALLALAAHASNPVEADRLRYLASPAGKDEYAQSVIGSQ KSLLEVMAEFPSAKPPLGVFFAAVAPRLQPRFYSISSSPRMAPSRIHVTCALVYDKMPTGRIHK GVCSTWMKNSVPMEKSHECSWAPIFVRQSNFKLPAESKVPIIMVGPGTGLAPFRGFLQERLALK ESGVELGPSILFFGCRNRRMDYIYEDELNNFVETGALSELVIAFSREGPTKEYVQHKMAEKASD IWNLISEGAYLYVCGDAKGMAKDVHRTLHTIMQEQGSLDSSKAESMVKNLQMNGRYLRDVW Codonoptimized ATGGACGCGATTGAACATAGAACCGTAAGTGTTAATGGTATCAATATGCATGTGGCAGAAAAGG 114 SEQIDNO:37 codingsequenceof GAGAGGGACCTGTCGTGTTGTTGCTTCATGGTTTCCCAGAATTGTGGTACAGTTGGAGACATCA ofWO Epoxidehydrolase 
AATATTGGCTCTTTCCTCTTTAGGTTACAGAGCTGTCGCACCAGACTTACGAGGCTACGGGGAT 2016/050890 lfromS ACAGATGCCCCAGGGTCAATTTCATCATACACATGCTTTCACATCGTAGGAGATCTCGTGGCTC grosvenorii TAGTTGAGTCTCTGGGTATGGACAGGGTTTTTGTTGTAGCCCACGATTGGGGTGCCATGATCGC TTGGTGTTTGTGTCTGTTTAGACCTGAAATGGTTAAAGCTTTTGTTTGTCTCTCCGTCCCATTC AGACAGAGAAACCCTAAGATGAAACCAGTTCAAAGTATGAGAGCCTTTTTCGGCGATGATTACT ATATTTGCAGATTTCAAAATCCTGGGGAAATCGAAGAGGAGATGGCTCAAGTGGGTGCAAGGGA AGTCTTAAGAGGAATTCTAACATCTCGTCGTCCTGGACCACCAATCTTACCAAAAGGGCAAGCT TTTAGAGCAAGACCAGGAGCATCCACTGCATTGCCATCTTGGCTATCTGAAAAAGATCTGTCAT TTTTCGCTTCTAAGTATGATCAAAAGGGCTTTACAGGCCCACTAAACTACTACAGAGCCATGGA TCTTAATTGGGAATTGACTGCGTCATGGACTGGTGTCCAAGTTAAAGTACCTGTCAAATACATC GTGGGTGACGTTGACATGGTTTTTACGACTCCTGGTGTAAAGGAATATGTCAACGGCGGTGGTT TCAAAAAGGACGTTCCATTTTTACAGGAAGTGGTAATCATGGAAGGCGTTGGTCATTTCATTAA TCAGGAAAAACCTGAGGAGATTTCATCTCATATACACGATTTCATAAGCAAATTCTAA Codonoptimized ATGGATGAAATCGAACATATTACCATCAATACAAATGGAATCAAAATGCATATTGCGTCAGTCG 115 SEQIDNO:39 codingsequenceof GCACAGGACCAGTTGTTCTCTTGCTACACGGCTTTCCAGAATTATGGTACTCTTGGAGACACCA ofWO S.grosvenorii ACTACTTTACCTGTCCTCCGTTGGGTACAGAGCAATAGCTCCAGATTTGAGAGGCTATGGCGAT 2016/050890 Epoxidehydrolase ACTGACAGTCCAGCTAGTCCTACCTCTTATACTGCTCTTCATATTGTAGGTGACCTGGTCGGCG 2 CATTAGACGAATTGGGAATAGAAAAGGTCTTTTTAGTGGGTCATGACTGGGGTGCTATTATCGC ATGGTACTTTTGTTTGTTTAGACCAGATAGAATTAAAGCACTTGTGAATTTGTCTGTCCAGTTT ATCCCACGTAACCCAGCAATACCTTTTATAGAAGGTTTCAGAACAGCTTTTGGTGATGACTTCT ACATTTGTAGATTTCAAGTACCTGGGGAAGCTGAAGAGGATTTCGCGTCTATCGATACTGCTCA ATTGTTTAAAACTTCATTATGCAATAGAAGCTCAGCCCCTCCTTGTTTGCCTAAAGAGATTGGT TTTAGGGCTATCCCACCACCAGAAAATCTGCCATCTTGGCTCACAGAGGAAGATATCAACTTCT ACGCAGCCAAGTTTAAACAAACTGGTTTTACTGGTGCCCTTAACTATTATAGAGCATTCGACTT GACATGGGAATTAACAGCCCCATGGACAGGAGCCCAGATCCAAGTTCCTGTAAAGTTCATAGTT GGTGATTCAGATCTCACGTACCATTTCCCTGGTGCTAAGGAATACATCCACAACGGAGGGTTTA AAAGAGATGTGCCACTATTAGAGGAAGTTGTTGTGGTAAAAGATGCCTGCCACTTCATTAACCA AGAGCGACCACAAGAGATTAATGCTCATATTCATGACTTCATCAATAAGTTCTAA UGT3494 
ATGGCGGATCGGAAAGAGAGCGTTGTGATGTTCCCGTTCATGGGGCAGGGCCATATCATCCCTT 116 SEQIDNO:29 [S.grosvenorii] TTCTAGCTTTGGCCCTCCAGATTGAGCACAGAAACAGAAACTACGCCATATACTTGGTAAATAC ofWO TCCTCTCAACGTTAAGAAAATGAGATCTTCTCTCCCTCCAGATTGA 2016/050890 FragmentofS. TTCTGCTCCACGCCTGTAAATTTGGAAGCCATTAAACCAAAGCTTTCCAAAAGCTACTCTGATT 117 SEQIDNO:33 grosvenorii CGATCCAACTAATGGAGGTTCCTCTCGAATCGACGCCGGAGCTTCCTCCTCACTATCATACAGC ofWO UGT11789gene CAAAGGCCTTCCGCCGCATTTAATGCCCAAACTCATGAATGCCTTTAAAATGGTTGCTCCCAAT 2016/050890 sequence CTCGAATCGATCCTAAAAACCCTAAACCCAGATCTGCTCATCGTCGACATTCTCCTTCCATGGA TGCTTCCACTCGCTTCATCGCTCAAAATTCCGATGGTTTTCTTCACTATTTTCGGTGCCATGGC CATCTCCTTTATGATTTATAATCGAACCGTCTCGAACGAGCTTCCATTTCCAGAATTTGAACTT CACGAGTGCTGGAAATCGAAGTGCCCCTATTTGTTCAAGGACCAAGCGGAAAGTCAATCGTTCT TAGAATACTTGGATCAATCTTCAGGCGTAATTTTGATCAAAACTTCCAGAGAGATTGAGGCTAA GTATGTAGACTTTCTCACTTCGTCGTTTACGAAGAAGGTTGTGACCACCGGTCCCCTGGTTCAG CAACCTTCTTCCGGCGAAGACGAGAAGCAGTACTCCGATATCATCGAATGGCTAGACAAGAAGG AGCCGTTATCGACGGTGCTCGTTTCGTTTGGGAGCGAGTATTATCTGTCAAAGGAAGAGATGGA AGAAATCGCCTACGGGCTGGAGAGCGCCAGCGAGGTGAATTTCATCTGGATTGTTAGGTTTCCG ATGGGACAGGAAACGGAGGTCGAGGCGGCGCTGCCGGAGGGGTTCATCCAGAGGGCAGGAGAGA GAGGGAAAGTGGTCGAGGGCTGGGCTCCGCAGGCGAAAATATTGGCGCATCCGAGCACCGGCGG CCATGTGAGCCACAACGGGTGGAGCTCGATTGTGGAGTGCTTGATGTCCGGTGTACCGGTGATC GGCGCGCCGATGCAACTTGACGGGCCAATCGTCGCAAGGCTGGTGGAGGAGATCGGCGTGGGTT TGGAAATCAAGAGAGATGAGGAAGGGAGAATCACGAGGGGCGAAGTTGCCGATGCAATCAAGAC GGTGGCGGTGGGCAAAACCGGGGAAGATTTTAGAAGGAAAGCAAAAAAAATCAGCAGCATTTTG AAGATGAAAGATGAAGAAGAGGTTGACACTTTGGCAATGGAATTAGTGAGGTTATGCCAAATGA AAAGAGGGCAGGAGTCTCAGGACTAA UGT11999gene TCCCGGTCAACGGTAGAGGACTTCACGGAGCTTCGAGAGTGGATGCCTTCTGGATCGAACATGG 118 SEQIDNO:34 sequence TCTACCGGTACCACGAGATTAAAAAATCCTTAGATGGAGCAACCGGCAACGAATCGGGGACGTC ofWO [S.grosvenorii] TGATTCGGTCCGATTCGGAATTGTGATTGAGGAGAGTGTTGCTGTGGCTGTAAGAAGCTCCCCT 2016/050890 GAACTGGAACCGGAATGGTTCGATTTGCTCGCGAAGCTTTACCAGAAGCCAGTTGTTCCGGTAG GATTTCTACCTCCAGTAATTGAAGATGCGGAAGAATTGAGCAGCGATATCAAGGAATGGTTAGA 
CAAACAGAGCTCAAACTCGGTCCTTTACGTCGCATTCGGGACCGAGGCGACTCTGAGTCAAGAT GACGTCACTGAGTTAGCCATGGGGCTTGAGCAATCTGGGATACCATTTTTCTGGGTACTGAGAA CCTCACCTCGGGACGAGTCAGACATGTTACCGGCCGGGTTCAAGGAGCGAGTCGAAGGTCGAGG AAGTGTTCACGTGGGATGGGTCTCGCAGGTGAAGATACTGAGTCACGACTCGGTTGGCGGTTGT TTGACACACTGTGGATGGAACTCGATCATAGAGGGGCTCGGATTCGGGCGCGTTATGGTATTGT TTCCAGTCGTGAACGACCAGGGATTGAACGCTAGATTGTTGGGGGAGAAGAAGCTCGGGATAGA GATAGAAAGGGACGAGCGAGATGGATCGTTCACACGCGACTCGGTGTCGGAATCGGTGAGGTCG GCAATGGCGGAAAGTTCAGGCGAGGCCTTGAGAGTGAGGGCCAGGGAAATGAAGGGGTTGTTTG GAAACGGAGATGAGAACGAGCATCAACTGAACAAGTTTGTACAATTTCTCGAGGCAAACAGGAA TAGGCAGTCCGAGTAA PartialUGT13679 CTGCTGCCGATTCCGCTGCCGAAACCGGCCGCCGATCTCTTGCCGGAAGGTGCAGAGGCGACGG 119 SEQIDNO:35 genesequence TGGATATTCCGTCCGACAAGATTCCGTATCTGAAATTGGCCCTCGATCTCGCCGAGCAGCCGTT ofWO [Siraitia TCGGAAGTTCGTCGTTGATCGTCCGCCGGATTGGATGATCGTCGATTTTAATGCTACTTGGGTC 2016/050890 grosvenorii] TGCGATATTTCTCGGGAGCTTCAAATCCCAATCGTTTTCTTTCGTGTTCTTTCGCCTGGATTTC TTGCTTTCTTTGCGCATGTTCTTGGGAGTGGTCTGCCGCTGTCGGAGATCGAAAGCCTGATGAC TCCGCCGGTGATCGACGGGTCGACGGTGGCGTACCGCCGGCATGAAGCTGCCGTTATTTGTGCT GGGTTTTTTGAGAAGAACGCTTCTGGTATGAGTGATCGCGATCGGGTAACCAAAATTCTCTCTG CCAGTCAAGCAATCGCAGTTCGTTCTTGCTACGAATTTGACGTTGAGTATTTGAAATTGTACGA GAAATATTGTGGAAAAAGAGTGATTCCTCTAGGGTTTCTCCCTCCAGAAAAGCCCCAAAAGTCC GAGTTCGCCGCCGATTCGCCATGGAAACCGACCTTCGAGTGGCTTGACAAACAAAAGCCCCGAT CAGTGGTGTTCGTCGGATTCGGCAGCGAATGCAAACTCACGAAAGATGATGTTTACGAGATAGC GCGCGGGGTGGAGCTGTCGGAGCTGCCATTTTTGTGGGCTCTGAGAAAACCGATCTGGGCGGCG GCGGACGATTCCGACGCTCTGCCTGCCGGATTCCTCGAGCGGACGGCGGAGAGAGGGATTGTGA GCATGGGGTGGGCGCCGCAGATGGAGATTTTAACGCACCCGTCGATTGGCGGCTCTCTGTTTCA CGCCGGGTGGGGATCCGCCATTGAAGCTCTGCAATTCGGGCATTGCCTTGTTCTGTTGCCATTC ATCGTGGATCAGCCACTGAATGCAAGGCTTCTGGTGGAGAAGGGTGTTGCAGTCGAAGTTGGAA GAAAGGAAGACGGGTCTTTTAGTGGAGAAGACATAGCTAAAGCTCTGAGAGAAGCTATGGTTTC AGAAGAAGGTGAGCAGATGAGGAGGCAAGCGAGAAAG Partialsequence ATGGAAAACGACGGCGTTTTGCACGTGGTGGTATTCCCATGGCTAGCCTTGGGTCATCTCATTC 120 SEQIDNO:36 ofS.grosvenorii 
CTTTCGCTCGACTCGCCACCTGCTTAGCCCACAAGGGTCTCAGGGTTTCGTTCGTATCAACCAC ofWO UGT15423 AAGGAACCTGAGCAGAATTCCCAAAATACCCCCACATCTCTCCTCCTCCGTCAACCTCGTCGGC 2016/050890 TTTCCTCTGCCCCACGTCGACGGCCTTCCGGACGCCGCCGAGGCTTCCTCCGACGTGCCTTACA ACAAGCAACAGTTACTGAAGAAGGCCTTCGACTCTCTGGAATCACCGCTCGCCGATTTGCTTCG TGATTTGAATCCCGATTGGATTATCTACGATTACGCCTCTCATTGGCTTCCGCAGCTCGCGGCG GAGCTCCGTATCTCGTCTGTTTTCTTCAGCCTCTTCACCGCGGCGTTTCTTGCTTTTCTTGGCC CACCGTCGGCGTTGTCCGGCGACGGCAGTTCCCGGTGA UGT1576gene ATGGCTTCTCCTCGCCACACTCCTCACTTTCTGCTCTTCCCTTTCATGGCTCAAGGCCACATGA 121 SEQIDNO:47 sequence[S. TCCCCATGATTGACCTTGCCAGGCTTCTGGCTCAGCGAGGAGTTATCATCACTATTATCACCAC ofWO grosvenorii] GCCCCACAATGCTGCTCGCTACCACTCTGTTCTIATGCTCGCGCCATCGATTCTGGGTTACACATC 2016050890 CATGTCCTCCAACTGCAGTTTCCATGTAAGGAAGGTGGGCTGCCAGAAGGGTGCGAGAATGTGG ACTTGCTACCTTCACTTGCTTCCATACCCAGATTCTACAGAGCAGCAAGTGATCTCCTTTACGA ACCATCTGAAAAACTGTTTGAGGAACTCATCCCCCGGCCGACCTGCATAATCTCCGATATGTGC CTGCCCTGGACCATGCGAATTGCTCTGAAATATCACGTCCCAAGGCTCGTTTTCTACAGTTTGA GCTGCTTCTTTCTTCTCTGTATGCGGAGTTTAAAAAACAATCTAGCGCTTATAAGCTCCAAGTC TGATTCTGAGTTCGTAAOTTTCTCTGACTTGCCTGATCCAGTCGAGTTTCTCAAGTCGGAGCTA CCTAAATCCACCGATGAAGACTTGGTGAAGTTTAGTTATGAAATGGGGGAGGCCGATCGGCAGT CATACGGCGTTATTTTAAATCTATTTGAGGAGATGGAACCAAAGTATCTTGCAGAATATGAAAA GGAAAGAGAATCGCCGGAAAGAGTCTGGTGCGTCGGCCCAGTTTCGCTTTGCAACGACAACAAA CTCGACAAAGCTGAAAGAGGCAACAAAGCCTCCATCGACGAATACAAATGCATCAGGTGGCTCG ACGGGCAGCAGCCATCTTCGGTGGTTTACGTCTCTTTAGGAAGCTTGTGCAATCTGGTGACGGC GCAGATCATAGAGCTGGGTTTGGGTTTGGAGGCATCAAAGAAACCCTTCATTTGGGTCATAAGA AGAGGAAACATAACAGAGGAGTTACAGAAATGGCTTGTGGAGTACXATTTCGAGGAGAAAATTA AAGGGAGAGGGCTGGTGIUTCTTGGCTGGGCICCCCAAGTTCTGATACTGTCACACCCTGCAAT CGGATGCTTTTTGACGCACTGCGGTTGGAACTCAAGCATC6AAGGGATATCGGCCGGCGTGCCA ATGGTCACCTGGCCGCTTTTTGCGATCAAGTCTTCAACGAGAAGCTAATTGCTACAAATACTCA GAATCGGCGTAAGTGTAGGCACGGAAACTACTATGAACTGGGGAGAGGAAGAGGAGAAAGGGGT GGTTGTGAAGAGAGAGAAAGTGAGGGAAGCCATAGAAATAGTGATGGATGGAGATGAGAGAGAA GAGAGGAGAGAGAGATGCAAAGAGCTTGCTGAAACGGCGAAGAGAGCTATAGAAGAAGGGGGCT 
CGTCTCACCGGAACCTCACGATGTTGATTGAAGATATAATTCATGGAGGAGGTTTGAGTTATGA GAAAGGAAGTTGTCGCTGA UGTSK98gene ATGGATGCCCAGCGAGGTCACACCACCACCATTTTGATGCTTCCATGGGTCGGCTACGGCCATC 122 SEQIDNO:49 sequence[S. TCTTGCCTTTCCTCGAGCTGGCCAAAAGCCTCTCCAGGAGGAAATTATTCCACATCTACTTCTG ofWO21589 grosvenorii] TTCAACGTCTGTTAGCCTCGACGCCATTAAACCAAAGCTTCCTCCTTCTATCTCTTCTGATGAT TCCATCCAACTTGTGGAACTTCGTCTCCCTTCTTCTCCTGAGTTACCTCCTCATCTTCACACAA CCAACGGCCTTCCCTCTCACCTCATGCCCGCTCTCCACCAAGCCTTCGTCATGGCCGCCCAACA CTTTCAGGTCATTTTACAAACACTTGCCCCGCATCTCCTCATTTATGACATTCTCCAACCTTGG GCTCCTCAAGTGGCTTCATCCCTCAACATTCCAGCCATCAACTTCAGTACTACCGGAGCTTCAA TGCTTTCTCGAACGCTTCACCCTACTCACTACCCAAGTTCTAAATTCCCAATCTCAGAGTTTGT TCTTCACAATCACTGGAGAGCCATGTACACCACCGCCGATGGGGCTCTTACAGAAGAAGGCCAC AAAATTGAAGAAACACTTGCGAATTGCTTGCATACTTCTTGCGGGGTAGTTTTGGTCAATAGTT TCAGAGAGCTTGAGACGAAA6TATATCGATTATCTCTCTGTTCTCTTGAACAAGAAAGTTGTTC CGGTCGGTCCTTTGGTTTACGAACCGAATCAAGAAGGGGAAGATGAAGGTTATTCAAGCATCAA AAATTGGCTTGACAAAAAGGAACCGTCCTCAACCGTCTTCGTTTCATTTGGAACCGAATACTTC CCGTCAAAGGAAGAAATGGAAGAGATAGCGTATGGGTTAGAGCTGAGCGAGGTTAATTTCATCT GGGTCCTTAGATTTCCTCAAGGAGACAGCACCAGCACCATTGAAGACGCCTTGCCGAAGGGGTT T CTGGAGAGAGCGGGAGAGAGGGCGATGGTGGTGAAGGGTTGGGCTCCTCAGGCGAAGATACTGA AGCATTGGAGCACAGGGGGGCTTGTGAGTCACTGTGGATGGAACTCGATGATGGAGGGCATGAT GTTTGGCGTACCCATAATAGCGGTCCCGATGCATCTGGACCAGCCCTTTAACGCCGGACTCTTG GAAGAAGCTGGCGTCGGCGTGGAAGCCAAGCGAGGTTCGGACGGCAAAATTCAAAGAGAAGAAG TTGCAAAGTCGATCAAAGAAGTGGTGATTGAGAAAACCAGGGAAGACGTGAGGAAGAAAGCAAG AGAAATGGGTGAGATTTTGAGGAGTAAAGGAGATGAGAAAATTGATGAGTTGGTGGCTGAAATT TCTCTTTTGCGCAAAAAGGCTCCATGTTCAATTTAA SgrosvenoriiUG98 ATGGATGCCCAGCGAGGTCACACCACAACCATTTTGATGTTTCCATGGCTCGGCTATGGCCATC 123 SEQIDNO:51 genesequence TTTCGGCTTTCCTAGAGTTGGCCAAAAGCCTCTCAAGGAGGAACTTCCATATCTACTTCTGTTC ofWO AACCTCTGTTAACCTCGACGCCATTAAACCAAAGCTTCCTTCTTCTTCCTCTTCTGATTCCATC 2001606050089 CAACTTGTGGAACTTTGTCTTCCATCTTCTCCTGATCAGCTCCCTCCTCATCTTCACACAACCA 00 ACGCCCTCCCCCCTCACCTCATGCCCACTCTCCACCAAGCCTTCTCCATGGCTGCCCAACACTT 
TGCTGCCATTTTACACACACTTGCTCCGCATCTCCTCATTTACGACTCTTTCCAACCTTGGGCT CCTCAACTAGCTTCATCCCTCAACATTCCAGCCATCAACTTCAATACTACGGGAGCTTCAGTCC TGACCCGAATGCTTCACGCTACTCACTACCCAAGTTCTAAATTCCCAATTTCAGAGTTTGTTCT CCACGATTATTGGAAAGCCATGTACAGCGCCGCCGGTGGGGCTGTTACAAAAAAAGACCACAAA ATTGGAGAAACACTTGCGAATTGCTTGCATGCTTCTTGTAGTGTAATTCTAATCAATAGTTTCA GAGAGCTCGAGGAGAAATATATGGATTATCTCTCCGTTCTCTTGAACAAGAAAGTTGTTCCGGT TGGTCCTTTGGTTTACGAACCGAATCAAGACGGGGAAGATGAAGGTTATTCAAGCATCAAAAAT TGGCTTGACAAAAAGGAACCGTCCTCCACCGTCTTCGTTTCATTTGGAAGCGAATACTTCCCGT CAAAGGAAGAAATGGAAGAGATAGCCCATGGGTTAGAGGCGAGCGAGGTTCATTTCATCTGGGT CGTTAGGTTTCCTCAAGGAGACAACACCAGCGCCATTGAAGATGCCTTGCCGAAGGGGTTTCTG GAGAGGGTGGGAGAGAGAGGGATGGTGGTGAAGGGTTGGGCTCCTCAGGCGAAGATACTGAAGC ATTGGAGCACAGGGGGATTCGTGAGCCACTGTGGATGGAACTCGGTGATGGAAAGCATGATGTT TGGCGTTCCCATAATAGGGGTTCCGATGCATCTGGACCAGCCCTTTAACGCCGGACTCGCGGAA GAAGCTGGCGTCGGCGTGGAAGCCAAGCGAGATTCGGACGGCAAAATTCAAAGAGAAGAAGTTG CAAAGTCGATCAAAGAAGTGGTGATTGAGAAAACCAGGGAAGACGTGAGGAAGAAAGCAAGAGA AATGGGTGAGATTTTGAGGAGTAAAGGAGATGAGAAAATTGATGAGTTGGTGGCTGAAATTTCT CTTTTGCGCAAAAAGGCTCCATGTTCAATTTAA Codonoptimized CATTTGTCTGCTTTTTTGGAATTGGCCAAGTCCTTGTCTAGAAGAAACTTCCATATCTACTTTT 124 SEQIDNO:52 codingsequence GCTCCACCTCCGTTAATTTGGATGCTATTAAGCCAAAGTTGCCATCCTCTTCATCCTCCGATTC ofWO forUGT5K98 TATTCAATTGGTTGAATTGTGCTTGCCATCTTCCCCAGATCAATTGCCACCACACTTGCATACA 2016050890 ACTAATGCTTTACCACCACATTTGATGCCAACATTGCATCAAGCTTTTTCTATGGCTGCTCAAC ATTTTGCTGCTATCTTGCATACTTGGCTCCTCATTTGTTGATCTACGATTCTTTTCAACCATGG GCTCCACAATTGGCTTCATCTTTGAATATTCCAGCCATCAACTTCAACACTACTGGTGCTTCAG TTTTGACCAGAATGTTGCATGCTACTCATTACCCA UGTproteinalso MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI 125 SeqIDNO:6 referredtoas QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA ofWO UG194A9,A9or PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK 2016038617 UGT94-289-1 IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL 
ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE EAGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKMDEMVAAIS LFLKI UGTprotienis MENIEHTTVQTNGIKMHVAAIGTGPPVLLLHGFPELWYSWRHQLLYLSSAGYRAIAPDLRGYGD 126 SeqIDNO:18 alsoreferredto TDAPPSPSSYTALHIVGDLVGLLDVLGIEKVFLIGHDWGAIIAWYFCLFRPDRIKALVNLSVQF in asEH1,EPH1and FPRNPTTPFVKGFRAVLGDQFYMVRFQEPGKAEEEFASVDIREFFKNVLSNRDPQAPYLPNEVK W02016038617 contig73966. FEGVPPPALAPWLTPEDIDVYADKFAETGFTGGLNYYRAFDRTWELTAPWTGARIGVPVKFIVG DLDLTYHFPGAQKYIHGEGFKKAVPGLEEVVVMEDTSHFINQERPHEINSHIHDFFSKFC UCT85E5gene ATGGTGCAACCTCGGGTACTGCTGTTTCCTTTCCCGGCACTGGGCCACGTGAAGCCCTTCTTAT 127 SeqIDNO:33 codingsequence CACTGGCGGAGCTGCTTTCCGACGCCGGCATAGACGTCGTCTTCCTCAGCACCGAGTATAACCA of CCGTCGGATCTCCAACACTGAAGCCCTAGCCTCCCGCTTCCCGACGCTTCATTTCGAAACTATA W02016038617 CCGGATGGCCTGCCGCCTAATGAGTCGCGCGCTCTTGCCGACGGCCCACTGTATTTCTCCATGC GTGAGGGAACTAAACCGAGATTCCGGCAACTGATTCAATCTCTTAACGACGGTCGTTGGCCCAT CACCTGTATTATCACTGACATCATGTTATCTTCTCCGATTGAAGTAGCGGAAGAATTTGGGATT CCAGTAATTGCCTTCTGCCCCTGCAGTGCTCGCTACTTATCGATTCACTTTTTTATACCGAAGC TCGTTGAGGAAGGTCAAATTCCATACGCAGATGACGATCCGATTGGAGAGATCCAGGGGGTGCC CTTGTTCGAAGGTCTTTTGCGACGGAATCATTTGCCTGGTTCTTGGTCTGATAAATCTGCAGAT ATATCTTTCTCGCATGGCTTGATTAATCAGACCCTTGCAGCTGGTCGAGCCTCGGCTCTTATAC TCAACACCTTCGACGAGCTCGAAGCTCCATTTCTGACCCATCTCTCTTCCATTTTCAACAAAAT CTACACCATTGGACCCCTCCATGCTCTGTCCAAATCAAGGCTCGGCGACTCCTCCTCCTCCGCT TCTGCCCTCTCCGGATTCTGGAAAGAGGATAGAGCCTGCATGTCCTGGCTCGACTGTCAGCCGC CGAGATCTGTGGTTTTCGTCAGTTTCGGGAGTACGATGAAGATGAAAGCCGATGAATTGAGAGA GTTCTGGTATGGGTTGGTGAGCAGCGGGAAACCGTTCCTCTGCGTGTTGAGATCCGACGTTGTT TCCGGCGGAGAAGCGGCGGAATTGATCGAACAGATGGCGGAGGAGGAGGGAGCTGGAGGGAAGC TGGGAATGGTAGTGGAGTGGGCAGCGCAAGAGAAGGTCCTGAGCCACCCTGCCGTCGGTGGGTT TTTGACGCACTGCGGGTGGAAC TCAACGGTGGAAAGCATTGCCGCGGGAGTTCCGATGATGTGCTGGCCGATTCTCGGCGACCAAC CCAGCAACGCCACTTGGATCGACAGAGTGTGGAAAATTGGGGTTGAAAGGAACAATCGTGAATG GGACAGGTTGACGGTGGAGAAGATGGTGAGAGCATTGATGGAAGGCCAAAAGAGAGTGGAGATT CAGAGATCAATGGAGAAGCTTTCAAAGTTGGCAAATGAGAAGGTTGTCAGGGGTGGGTTGTCTT 
TTGATAACTTGGAAGTTCTCGTTGAAGACATCAAAAAATTGAAACCATATAAATTTTAA UGTprotein MVQPRVLLFPFPALGHVKPFLSLAELLSDAGIDVVFLSTEYNHRRISNTEALASRFPTLHFETI 128 SeqIDNO:34 PDGLPPNESRALADGPLYFSMREGTKPRFRQLIQSLNDGRWPITCIITDIMLSSPIEVAEEFGI of PVIAFCPCSARYLSIHFFIPKLVEEGQIPYADDDPIGEIQGVPLFEGLLRRNHLPGSWSDKSAD W02016038617 ISFSHGLINQTLAAGRASALILNTFDELEAPFLTHLSSIFNKIYTIGPLHALSKSRLGDSSSSA SALSGFWKEDRACMSWLDCQPPRSVVFVSFGSTMKMKADELREFWYGLVSSGKPFLCVLRSDVV SGGEAAELIEQMAEEEGAGGKLGMVVEWAAQEKVLSHPAVGGFLTHCGWNSTVESIAAGVPMMC WPILGDQPSNATWIDRVWKIGVERNNREWDRLTVE UGTprotein MDAAQQGDTTTILMLPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSFSDSI 129 SeqIDNO:38 referredtoas QFVELHLPSSPEFPPHLHTTNGLPPTLMPALHQAFSMAAQHFESILQTLAPHLLIYDSLQPWAP of UGT94C9andUGT94- RVASSLKIPAINFNTTGVFVISQGLHPIHYPHSKFPFSEFVLHNHWKAMYSTADGASTERTRKR W02016038617 289-3 GEAFLYCLHASCSVILINSFRELEGKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKNW LDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVNFIWVVRFPQGDNTSGIEDALPKGFLE RAGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHVDQPFNAGLVEE AGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKFDEMVAEISL LLKI Codingsequence TCACAACGTTTAGCTTTGACAGATGGTATCGCCGTGACAGACGGGGTCAAAGAAGTTTGGCTTT 130 forenzymesfor CATAAAAAGAGCTAGACATTCAAGCAAAAACTTGAATGTCTAGCTCTTTTGTTGATTGAATCGG acetylCoA GGGATTTAAATACTTAGTTTCGATAAGAGCGAACGGTATTATTAATAGCAGAAATACTATATTT synthesis TAATTCATCTTCTAACGTTTGATCAATATCTGTGTCTAAAGTTTCTGCAAACATGGCTTCATAT TCAGCGATAGAAAGTTCTGTCCGATTATCTAGCAGTGCTAAATGAGTTTCTTTTTGTAAATGAT TTTGATAACCAGCTACTAATTCACCAGTGAAAAATTCAGCGACAGCACCAGAACCATAACTGAA TAACCCAATTTGATTGCCTGCGGTTAAAGTCGTTGCATTTTCTAAAAGGGAAATGAGTCCCAGA TAAAGTGAACCCGTATACAAGTTTCCTACGCGACGACTATAGATGATGCTTTCTTCATAACGGG CTAAAATTCGTTCCTGTTCTGCTTCAGTTTGGTCGGAGATTTTTGCTAATAAGGCTTTTTTGCC CATTTTTGTGTAAGGAATATGGAACGCTAAAGCATCATAATCTGCAAAATCAAGACCGGTTCTT TTTTTATGTTCATCCCAGACTTGGGCAAAAGATTGGATGTAGGTTTCGTTTGACAAAGGACCAT CGACCATAGGATACGGATGGCCTGTTGGACGCCAAAAGTCATAGATATCTTGCGTCAGCATCAC ATTATCCTCTTTTAAAGCCAAGATGCGCGGTTCACTAGCAACTAACATTGCAACCGCCCCAGCT 
CCTTGTGTAGGCTCACCGCCAGAATTTAATCCATATTTTGCAATATCTGCTGCTACAACCAAGA CTTTTTTATCTGGATGTAAGGCTACGTGATTCTTAGCTAACTGTAAGCCTGCTGTTGCTCCGTA ACAAGCTTCCTTGATTTCGAAAGAGCGAGCGAAAGGTTGAATCCCCATTAAACGATGTAAGACA ACTGCGGCCGCTTTTGACTCATCGATACTGGACTCAGTCCCGACAATCACCATATCAATGGCCT CTTTATCTTCTTTGGTCAAGATCGCTTCTGCGGCATTGGCTGCAAATGTCACAATATCTTGGCT GATTGGGTTCACCGCCATTTGGTCTTGCCCAATACCAATATGAAATTTTCCAGGGTCTACATTT CTGGCTTCAGCCAGTGCCGTCATATCAATATAATAAGGGGGCACAAAAAAACTAATTTTATCAA TCCCAATTGTCATTTCTTTAACTCCTTTACGATAAATAGATTCATTATATAAAATAGCACGAAA TGAACCAAAATGGGGAATTTTTGTATTAACTTCATAGATTATTAAAAAATATCTTATAAGTCTG TTAACATTCAGTAATTGGCACTTGTGATTCTGGGATTTTATGATATATTTCAAGATGGAGGTGC ATTTAGTTGAAAACAGTAGTTATTATTGATGCATTACGAACACCAATTGGAAAATATAAAGGCA GCTTAAGTCAAGTAAGTGCCGTAGACTTAGGAACACATGTTACAACACAACTTTTAAAAAGACA TTCCACTATTTCTGAAGAAATTGATCAAGTAATCTTTGGAAATGTTTTACAAGCTGGAAATGGC CAAAATCCCGCACGACAAATAGCAATAAACAGCGGTTTATCTCATGAAATTCCCGCAATGACAG TTAATGAGGTCTGCGGATCAGGAATGAAGGCCGTTATTTTGGCGAAACAATTGATTCAATTAGG AGAAGCGGAAGTTTTAATTGCTGGCGGGATTGAGAATATGTCCCAAGCACCTAAATTACAACGA TTTAATTACGAAACAGAAAGCTATGATGCGCCTTTTTCTAGTATGATGTACGATGGGTTAACGG ATGCCTTTAGTGGTCAAGCAATGGGCTTAACTGCTGAAAATGTGGCCGAAAAGTATCATGTAAC TAGAGAAGAGCAAGATCAATTTTCTGTACATTCACAATTAAAAGCAGCTCAAGCACAAGCAGAA GGGATATTCGCTGACGAAATAGCCCCATTAGAAGTATCAGGAACGCTTGTGGAGAAAGATGAAG GGATTCGCCCTAATTCGAGCGTTGAGAAGCTAGGAACGCTTAAAACAGTTTTTAAAGAAGACGG TACTGTAACAGCAGGGAATGCATCAACCATTAATGATGGGGCTTCTGCTTTGATTATTGCTTCA CAAGAATATGCCGAAGCACACGGTCTTCCTTATTTAGCTATTATTCGAGACAGTGTGGAAGTCG GTATTGATCCAGCCTATATGGGAATTTCGCCGATTAAAGCCATTCAAAAACTGTTAGCGCGCAA TCAACTTACTACGGAAGAAATTGATCTGTATGAAATCAACGAAGCATTTGCAGCAACTTCAATC GTGGTCCAAAGAGAACTGGCTTTACCAGAGGAAAAGGTCAACATTTATGGTGGCGGTATTTCAT TAGGTCATGCGATTGGTGCCACAGGTGCTCGTTTATTAACGAGTTTAAGTTATCAATTAAATCA AAAAGAAAAGAAATATGGAGTGGCTTCTTTATGTATCGGCGGTGGCTTAGGACTCGCTATGCTA CTAGAGAGACCTCAGCAAAAAAAAAACAGCCGATTTATCAAATGAGTCCTGAGGAACGCCTGGC TTCTCTTCTTAATGAAGGCCAGATTTCTGCTGATACAAAAAAAGAATTTGAAAATACGGCTTTA 
TCTTCGCAGATTGCCAATCATATGATTGAAAATCAAATCAGTGAAACAGAAGTGCCGATGGGCG TTGGCTTACATTTAACAGTGGACGAAACTGATTATTTGGTACCAATGGCGACAGAAGAGCCCTC AGTGATTGCGGCTTTGAGTAATGGTGCAAAAATAGCACAAGGATTTAAAACAGTGAATCAACAA CGTTTAATGCGTGGACAAATCGTTTTTTACGATGTTGCAGACGCCGAGTCATTGATTGATGAAC TACAAGTAAGAGAAACGGAAATTTTTCAACAAGCAGAGTTAAGTTATCCATCTATCGTTAAACG CGGCGGCGGCTTAAGAGATTTGCAATATCGTGCTTTTGATGAATCATTTGTATCTGTCGACTTT TTAGTAGATGTTAAGGATGCAATGGGGGCAAATATCGTTAACGCTATGTTGGAAGGTGTGGCCG AGTTGTTCCGTGAATGGTTTGCGGAGCAAAAGATTTTATTCAGTATTTTAAGTAATTATGCCAC GGAGTCGGTTGTTACGATGAAAACGGCTATTCCAGTTTCACGTTTAAGTAAGGGGAGCAATGGC CGGGAAATTGCTGAAAAAATTGTTTTAGCTTCACGCTATGCTTCATTAGATCCTTATCGGGCAG TCACGCATAACAAAGGGATCATGAATGGCATTGAAGCTGTCGTTTTAGCTACAGGAAATGATAC ACGCGCTGTTAGCGCTTCTTGTCATGCTTTTGCGGTGAAGGAAGGTCGCTACCAAGGTTTGACT AGTTGGACGCTGGATGGCGAACAACTAATTGGTGAAATTTCAGTTCCGCTTGCGTTAGCCACGG TTGGCGGTGCCACAAAAGTCTTACCTAAATCTCAAGCAGCTGCTGATTTGTTAGCAGTGACGGA TGCAAAAGAACTAAGTCGAGTAGTAGCGGCTGTTGGTTTGGCACAAAATTTAGCGGCGTTACGG GCCTTAGTCTCTGAAGGAATTCAAAAAGGACACATGGCTCTACAAGCACGTTCTTTAGCGATGA CGGTCGGAGCTACTGGTAAAGAAGTTGAGGCAGTCGCTCAACAATTAAAACGTCAAAAAACGAT GAACCAAGACCGAGCCTTGGCTATTTTAAATGATTTAAGAAAACAATAAAAAAACAGTTCAGCA GAAATTATTCTGCTGAACTGTTTTTTTTCACATTAGGTAGCCGTTTCAGGCCACGAATTGGTTT TACTTTTAAGACATCTAAGAAGAAAGTGAA CGT-SL MKRWLSVVLSMSLVFSAFFLVSDTQKVTVEAAGNLNKVNFTSDIVYQIVVDRFVDGNTSNNPSG 148 glucotransferases SLFSSGCTNLRKYCGGDWQGIINKINDGYLTEMGVTAIWISQPVENVFAVMNDADGSTSYHGYW AAD00555.1 ARDFKKTNPFFGTLSDFQRLVDAAHAKGIKVIIDFAPNHTSPASETNPSYMENGRLYDNGTLIG GYTNDTNSYFHHNGGTTFSNLEDGIYRNLFDLADFNHQNQFIDKYLKDAIKLWLDMGIDGIRMD AVKHMPFGWQKSFMDEVYDYRPVFTFGEWFLSENEVDSNNHFFANESGMSLLDFRFGQKLRQVL RNNSDDWYGFNQMIQDTASAYDEVIDQVTFIDNHDMDRFMADEGDPRKVDIALAVLLTSRGVPN IYYGTEQYMTGNGDPNNRKMMTSFNKNTRAYQVIQKLSSLRRSNPALSYGDTEQRWINSDVYIY ERQFGKDVVLVAVNRSLSKSYSITGLFTALPSGTYTDQLGALLDGNTIQVGSNGAVNAFNLGPG EVGVWTYSAAESVPIIGHIGPMMGQVGHKLTIDGEGFGTNVGTVKFGNTVASVVSWSNNQITVT VPNIPAGKYNITVQTSGGQVSAAYDNFEVLTNDQVSVRFVVNNANTNWGENIYLVGNVHELGNW 
NTSKAIGPLFNQVIYSYPTWYVDVSVPEGKTIEFKFIKKDGSGNVIWESGSNHVYTTPTSTTGT VNVNWQY CGT-SL MSRNGAVTPDWQFTVEVQEGETITYKYVKGGSWDQEGLADHTREDDNDDDVSYYGYGAIGTDLK 154 glucotransferases VTVHNEGNNTMIVQDRILRWIDMPVVIEEVQKQGSQVTIKGNAIKNGVLTINGERVPIDGRMAF KMY60644.1 SYTFTPASHQKEVSIHIEPSAESKTAIFNNDGGAIAKNTKDYVLNLETKQLREGKLTTPPSNGD SPESDWPGSETPSHDGGATPGNGTSPGSGGPSDGTSPGGSVPPGGTAPPGNEAPPSRPPQKPSP SKPKEKPRKPTTPPGQVKKVYWDGVELKKGQIGRLTVQKPINLWKRTKDGRLVFVRILQPGEVY RVYGYDVRFGGQYAVGGGYYVTDIDTHIRYETPSKEKLKLVNGE DexTprotein MPANAPDKQSVTNAPVVPPKHDTDQQDDSLEKQQVLEPSVNSNIPKKQTNQQLAVVTAPANSAP 156 [Leuconostoc QTKTTAEISAGTELDTMPNVKHVDGKVYFYGDDGQPKKNFTTIIDGKPYYFDKDTGALSNNDKQ citreum] YVSELFSIGNKHNAVYNTSSDNFTQLEGHLTASSWYRPKDILKNGKRWAPSTVTDFRPLLMAWW PDKSTQVTYLNYMKDQGLLSGTHHFSDNENMRTLTAAAMQAQVNIEKKIGQLGNTDWLKTAMTQ YIDAQPNWNIDSEAKGDDHLQGGALLYTNSDMSPKANSDYRKLSRTPKNQKGQIADKYKQGGFE LLLANDVDNSNPVVQAEQLNWLHYMMNIGSILQNDDQANFDGYRVDAVDNVDADLLQIAGEYAK AAYGVDKNDARANQHLSILEDWGDEDPDYVKAHGNQQITMDFPLHLAIKYALNMPNDKRSGLEP TREHSLVKRITDDKENVAQPNYSFIRAHDSEVQTIIADIIKDKINPASTGLDSTVTLDQIKQAF DIYNADELKADKVYTPYNIPASYALLLTNKDTIPRVYYGDMFTDDGQYMAKQSPYYQAIDALLK ARIKYAAGGQTMKMNYFPDEQSVMTSVRYGKGAMTASDSGNQETRYQGIGLVVNNRPDLKLSDK DEVKMDMGAAHKNQDYRPVLLTTKSGLKVYSTDANAPVVRTDANGQLTFKADMVYGVNDPQVSG YIAAWVPVGASENQDARTKSETTQSTDGSVYHSNAALDSQVIYEGFSNFQDFPTTPDEFTNIKI AQNVNLFKDWGITSFEMAPQYRASSDKSFLDAIVQNGYAFTDRYDIGYNTPTKYGTADNLLDAL RALHGQGIQAINDWVPDQIYNLPDEQLVTAIRTDGSGDHTYGSVIDHTLYASKTVGGGIYQQQY GGAFLEQLKTQYPQLFQQKQISTDQPMNPDIQIKSWEAKYFNGSNIQGRGAWYVLKDWGTQQYF NVSDAQTFLPKQLLGEKAKTGFVTRGKETSFYSTSGYQAKSAFICDNGNWYYFDDKGKMVVGNQ VINGINYYFLPNGIELQDAYLVHDGMYYYYNNIGKQLHNTYYQDKQKNFHYFFEDGHMAQGIVT IIQSDGTPVTQYFDENGKQQKGVAVKGSDGHLHYFDGASGNMLFKSWGRLADGSWLYVDEKGNA VTGKQTINNQTVYFNDDGRQIKNNFKELADGSWLYLNNKGVAVTGEQIINGQTLYFGNDGRQFK GTTHINATGESRYYDPDSGNMITDRFERVGDNQWAYFGYDGVAVTGDRIIKGQKLYFNQNGIQM KGHLRLENGIMRYYDADTGELVRNRFVLLSDGSWVYFGQDGVPVTGVQVINGQTLYFDADGRQV KGQQRVIGNQRYWMDKDNGEMKKITYAAALE DexTgene(coding ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG 
157 sequence) ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA 
CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG 
TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA AAAAAATAACATACGCGGCCGCACTCGAGCACCACCACCACCACCACTGA DexTgene(coding ATGCAAAACGGCGAAGTGTGTCAGCGTAAAAAACTGTACAAGTCAGGGAAGATATTAGTTACAG 158 sequenceisc1oned CAAGTATTTTTGCTGTTATGGGTTTTGGTACTGCCATGTCACAAGCAAACGCGAGCAGTAGTGA intopET23a) TAATGATAGCAAAACACAAACTATTTCAAAAATAGTAAAAAGTAAAGTCGAACCGGCAACTGTT CAACCAGCGAAACCAGCGGAACCTACTAATAAAATAGTTGACCAAGCAGATATGCATACGGTCA GCGGGCAAAACAGCGTGCCACCAGTAGTGACTAATCAATCCAATTAACAGGCTGCAAAACCAAC TACACCTGTTACCGATGTCACAGATACGCATAAAATCGAAGCAAACAACGTCCCTGCTGATGTT ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC 
CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA 
AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA AAAAAATAACATACGCGGCCGCACTCGAGCACCACCACCACCACCACTGA Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 163 CAA25303.1GI:2343 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR EYTVPQACGTSTATVTDTWR GlucoamylaseG1 ATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVASPSTDNPTYFYTRDSGLVLKTLVDL 164 1008149A FRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGLGEPKFNVDETAYTGSWGRPQRDGPAL GI:224027 RATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQYWNQTGYDLWEEVNGSSFFTIAVQHR ALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSFILANFDSSRSGKDANTLLGSIHTFDP 
EAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLSDSEAVAVGRYPEDTYYNGNPWFLCTL AAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAATGTYSSSSSTYSSIVDAVKTFADGFVS IVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLTANNRRNSVVPASWGETSASSVPGTCA ATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVTSTSKTTATASKTSTSTSSTSCTTPTA VAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAGESFEYK FIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase SLLAPSQPQFXIPASAAVGAQLIANIDDPQAADAQSVCPGYKASKVQHNSRGFTASLQLAGRPC 165 NVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTXASWYFLSENLVPRPKASLXASVSQSDLFV SWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFVTALPEEYNLYGLGEHITQFRLQRNA XLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQ Transglucosidase SQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQYLTSTVGLPAM 166 AAB23581.1 QQYNTLGFHQCRWGYNXWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDNDQHRFSYSEGD GI:257187 EFLSKLHESGRYYVPIVDAALYIPNPEXASDAYATYDRGAADDVFLKNPDGSLYIGAVWPGYTV FPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGXLTLNPAHPSFLLPGEP GDIIYDYPEAFXITXATEAASAXAGASXQAAATATTXXXXVSYLRTTPXPGVRNVEHPPYVINH DQEGHDLSVHAVSPXATHVDGVEEYDVHGLYGHQGLXATYQGLLEVWSHKRRPFIIGRSTFAGS GKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNRWMQLSAFFPF YRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMRALSWEFPNDP TLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKPGVXTTISAPL GHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSXGTASGQLYLDDGEXIYPXATLHVDF TASRSSLRSSAQGRWKERNPLAMVTVLGVNKEPSAVTLNGQAVFPGSVTYXSTSQVLFVGGLQX LTKGGAWAENWVLEW Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 167 CAA25219.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT 
STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP 168 BAA23616.1 GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS QVLFVGGLQNLTKGGAWAENWVLEW Transglucosidase MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP 169 P56526.1 GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR 
ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS QVLFVGGLQNLTKGGAWAENWVLEW Transglucosidase MSFRSLLALSGLVCTGLANVISKRATWDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 170 AAP04499.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSFI LANFDSSRSAKDANTLLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDATG TYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLTA NNRRNVVPSASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVTS TSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSA DKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase SSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWDTSDGIALSADKYTSSNPLWYVTVT 171 AAM18050.2 LPAGESFEYKFIRIESDDSVEWESDPNREYTVPQVCGESTATVTDTWR Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 172 AAT67041.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFRQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSGDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 173 P69328.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ 
YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP 174 BAF37801.1 GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS QVLFVGGLQNLTKGGAWAENWVLEW Transglucosidase MLYAEDNKLIFRFDDHLLWIQPWGENALRVRATKLASMPTEDWALSSKVTSIEPTISIEEHKDS 175 CAK37022.1 SITNGKIKATVSQRGKITIYNQKGEKLLEEYARNRRDLKDPKCSALEVEARELRPILGGDFHLT MRFESLDPKEKIYGMGQYQQPFLNLKGVDIELAHRNSQASVPFALSSLGYGFLWNNPAIGRAVL GTNTMSFEAYSTKVLDYWVVAGDSPAEIEEAYSKVTGYVPMMPEYGLGFWQCKLRYWNQEQLLD VAREYKRRNIPLDLIVVDFFHWKHQGEWSFDPEFWPDPDAMIKELQSLNVELMVSVWPTVENAS TNYPEMLEKGLLIRHDRGLRVSMQCNGDITHFDATNPSARAYVWSKAKQNYYDKGIKVFWLDEA EPEYSVYDFDLYRYHAGSNLQIGNIFPKEYARGFYEGMESAGQTNIVNLLRCAWAGSQKYGALV 
WSGDIASSWSSFRNQLAAGLNMGLAGIPWWTTDIGGFHGGDPSDPAFRELFTRWFQWGAFCPVM RLHGDREPKPENRPTDSGSDNEIWSYGEEVYEICKKYIGIREELRDYTRGLMKEAHEKGTPVMR TLFYEFPADKKAWDVETEHLFGSKYLVVPVFEAGKRSVEVYLPAGASWKVWGQEDVIHEGGKEI QVDCPIETMPVFVRV Transglucosidase MSSPQQVYLLPLKDDGSPDVPGGYIYLPAPTNPPYLLRFVIEGSSSICREGALWVNIPEKGESF 176 CAK37087.1 NRSAFRSFSLSPDFNKNIQIDVPITSAGSFAFYVTFSPLPEFSVLSTPTPEPTRTPTHYIDVSP KLTLRGQDLPLNALSIYSVISKFMGQYPKEWEKHLNGISQRNYNMVHFTPLMKRGASNSPYSIF DQLQFDDAVFPNGEDDVARLISKMENEYGLLSLTDVVWNHTAHNSKWLEEHPEAGYSVETAPWL EAALELDTALLKFGQDLQNLGLPTEFQTVDELMKVMNVMRDKVIAGIRLWEFYAIDVKSDTHKI LDKWKTSKDIDLTDTNWAQLNLQDYKNWTLKQQATFIRDHAIPTSKQVLGRFSRAVDLQFGAAI LTALFGPHNPSTSDTSIVEESLSKILDEVNLPFYEEYDGDVSEIMNQVFNRIKYLRIDDHGPKL GAVTAQSPLIETYFTRLPLNDVTKKHKKEALALVNNGWIWNADALRDNAGPDSRAYLRREVIVW GDCVKLRYGSCRDDNPFLWDFMTDYTRLMAKYFSGFRIDNCHSTPLVVAEYLLDEARKVRPNLT VFAELFTGSEEADYIFVKRLGINALIREAMQAWSTGELSRLVHRHGGRPIGSFDLDLPSSGSSH AIASSGLDSGKEKVVHIRPTPVQALFMDCTHDNEMPAQKRTAKDTLPNGALVAMCASAIGSVIG YDEVYPRLVDLVHEHRLYFSEFSEAPETGLNSLEGGIGGIKKLLNELHTKMGIEGYDETHIHHD GEYITVHRVHPRTRKGVFLIAHTAFPGQDSRSVLAPTHLVGTQVKHIGTWLLEVDTSQTTKERI QADKSYLRGLPSQVKTFEGTKIEESGKDTIISVLNSFVAGSIALFETSMPSVEHASGLDNYITE GVDHAFSDLSLVDLNFALYRCEAEERDSSKGQDGAYDIPGHGPLVYAGLQGWWSVLENIIKYNE LGHPLCDHLRNGQWALDYIVARLEKLSHKEEHPALGRPAAWLQEKFQAVRQLPSFLLPRYFAII VQVAYNAAWKRGIQLLGPHIQKGQEFIHQLGMVSVQQTGYVNSASLWPTKKVPSLAAGLPHFAV DWARCWGRDVFISLRGLLLCTGRFEDAKEHITAFASVLKHGMIPNLLSSGKLPRYNSRDSVWFF LQSIQDYTEMAPDGLEILDHKVPRRFLPYDDVWFPFDDPRAYSQQSTISEIIQEVFQRHAQGLS FREYNAGPDLDMQMTQDGFQIDVKVDWETGLIFGGSQYNCGTWQDKMGESAKAGNKGVPGTPRD GAAIEITGLVYSALTWVAKLHERGIYKHDGVDIGGGKSISFEDWASRIRANFERCYYVPLQPKD DGQYDIDANIINRRGIYKDLYRSGKPYEDYQLRSNFPIAMTVAPDLFTASKALAALALADEVLV GPVGMATLDPSDLNYRPNYNNSEDSTDFATAKGRNYHQGPEWVWQRGYFLRAFLHFDLARRTTP AERTETYQQITRRLEGCKRALRESPWKGLTELTNKNGAYCADSSPTQAWSAGCLLDLYYDASRH SQS Transglucosidase MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS 177 CAK43781.1 VTYDFGINVAGIVSVDVASASSDSAFIGVTFTESSMWISSEACDATQDAGLDTPLWFAVGQGAG 
LYTVEKKYNRGAFRYMTVVSNTTATVSLNSVKINYTASPTQDLRAYTGYFHSNDELLNRIWYAG AYTLQLCSIDPTTGDALVGLGVITSSETISLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD MSIALESVAVSTEDLYSVRTALESLYALQKPDGRLPYAGKPFFDTVSFTYHLHSLVGAASYYQY TGDRAWLTRYWGQYKKGVQWALSSVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS LAQTLNDNAPIRNWTTTAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN QSAIVSESLAARWGPYGAPAPEAGATVSPFIGGFELQAHYQAGQPDRALDLLRLQWGFMLDDPR MTNSTFIEGYSTDGSLAYAPYTNTPRVSHAHGWATGPTSALTIYTAGLRVTGPAGATWLYKPQP GNLTQVEAGFSTRLGSFASSFSRSGGRYQELSFSTPNGTTGSVELGDVSGQLVSDRGVKVQLVG GKASGLQGGKWKLSNN Transglucosidase MSSDSQLSRSHFLAPPTVIPAPSYIASSAAAQIITADQEFNAADFVADDEGHDSSASALVTPEA 178 CAK37133.1 LSSLNAFLDNILFNILAAAKSTQLVKIRPAVAEVLKPRLAKEMVSAADDELSEYLGGPEDEQLE FRSGQTSIGEFDLVRSWKLTRLRCMVYTRLGDMEEDDEEEYINQEIIGEDGGGLRRLASHVGHI TPAASIFLTSIIEHMGEQALIIAGEIARSRLSANLEDEDDLAGTGANRASMDRLVVEDHDMERL ALNPTLGRLWRTWRKRVRGSNLSRAVSRESLRNRQSLVFGPGSRKSSAITIDEISPRTASSRSV NEPLPETEDEVDPASVPLPMSEHDIQEIEIPCFLPELDTGDIQTMQAVVAHKVRPHSLMVLTLP SPRSPSSNGNSPITPRLVNIKSPRHVRSRSLPNTAPADEQPSEVEQPAERTSPTPSEERRRLET MYEHDEDDERHGEAATKPEAVEQNEPVVPSAGQGAAATPSTSVASVEVAMSDASSTPVSSPSLS DRDYPETDEVEKHERVEKAQLAPGVETAPGPLAPRTQGVVDSTPAQPTAAADQDASKAADCDQS TPEDSTPPTVPSSAEDTAVEKASRPVSTSGESAISDSSRSLPGKRGSSVPGVQHQYGRSSPGIA SVSSGVERAAVQRLPARPSTSVASSVYSKSRRSGSFSSSREKRPVTAGSTTSQVSSKLKGLIGR PADTGSLRLRTSSEVSRVSTRESAYDDTSGLDELIRSEETIHFTLTPRSMREMELPDSPRWRAQ QASTDPTDLPKSVEPIPDDMSRSRHSTTSSKSTVDLPPVPKYIQSKPKSIEIPTTGLQQKPAVG QARDAKHSMESTRDFANFLKSTGPNTPTTPATVDGSPAKSSRLRRLSDATEISKKLSRPASSTV SVANSARSGPRLEARSAVAPRGDQTSDLIDFIREGPPTAGAHRIPRTVAPFRDTMDSDELQAIE PGRTAKGAPSVASTQSVAETSLVSVGSRTGLLESTSRTSTPTALAKETKTTFAAPVSVSDDHRP PRTRRRVPDPYAIDLDDDDELDELLEEPKPKRDEESLIDFLRNVPPPEPTPPQQPLAATANSRR GSASVKARLRRNTASEKTLMAKPSKTSLHQQPDNYMGGASNYTVKVGMERNAGAMNGAYDLKTP SVRQTETSALADFLKNTGPPEPPVTKAPAATKSKDSGFSRLFMRRKKVEA Transglucosidase MAALVQTIPQQSGTVSVLQTRPSSSSGTFTTSSQPGQQQNPRNSTMSWNPYNNSGSSGNYRVGH 179 CAK37219.1 QVVAPYAFTSTPNPSNPTNMQSRQSLSPHLRPEHRTSSAPSVPQGSASPANVGVNSRFAHPAAG 
SVSTSSSNSSVHSYMSKDDSAIPTRQIRTDAPLRPLSTVNLPSPSSSNFMNISSPTVARPSPDR YRRGNRRPENAAGAQPASTQPNGPAPARSATLATDDSSLHMSTPGLAGVSLDAPRRPGHVRVPS ADDTTRADKPQTELAKRYRRRSWGNIDNAGLINMQLHLPTSSPTPTAGGHDYFDQSMRPRSAQS HREVQGSIPSAHSSTSSVRDAGHSESASSSKSGPKTDDSKRPNKPSPLSQPVEVDAKPPTPKAP QPSTTPQPAPTESLATQRLAEITKGDPKRPGKSRLRRAFSFGSASELLKASSQNKREAMATERA RRELLQEELGPEQAAIAEQQEASGLGESIYSSHHQGRIFNSSTDNLSVSSTASSASIMLRKMGK GMKRSTRSLVGLFRPKSVVSTSSTDGVMIEPMAPQLSVVNIEAERKSTAVTSDSQDHALGSSLF SKVETDAANAVSHEDGALDKSRKSIVGGDRERAEVLAAVRKGILKKTYSDPANQTYVLKSSENL NSNDSPHSSVPSTPEDQTRSGNRRSDAVKIAGEDDYLSEGRFQTSESKSAPITPQAMMPKSLVF SPRIQFHETWPSGEYDRRGDIATCNRLTPLLAQQIKEELNNFKMVRNLLPLLPST Transglucosidase MFVYCSNCSFALLFLMLLSLLSFTASRTLAFTTSITQDGLLCFPSALEFLLRTEITVPYWAPSG 180 CAK37226.1 SILRPTAALHAYDCSVCLDPPSKIQEARKSVLMSANALSEIIDIPVSDGGFIHGVIRFYARGDH LRWLQPPTRDAKFLAPDPYLHSLMIESWRQTLGEMHFWTRAGYLFDVVLAEVKRSEPDNYEFLN WSGTNYMPTCPYYYVSMSPMVQPVVRSAQGSVDAAQRSSQISQDPSNLVQTPLHSPEFSDDSTR SSFNTAQTASSCQSFSSPVSQSPCHEDVIQQNVQTSQVSLPFVRMDPSVTLDDPFVIEPISAEG SWSMQDHIADMKRQFRLPGPMVRNASPSFDSPTPSTTERISAREINRRRDSEKPYDPTPLANDT SASCDDETWSMEDASEKDASESSFKDAVEAHSDSASSTATGPVVASKNDDDQSLPKCNTNDTQP TTCTTVNPSLLMFESSHKTYPTIEPSYEVAASRPRSLSPVQNLENELQVGSIGGKDAEDAGSLS FNDEMKEGSEMDLFSASLDQYTAEQLASRQLTRTPELEESNPNEYGLGFGFQHNLFDGFDFFLP EDQSELPLESNMIM Transglucosidase MLGSLLLLLPLVGAAVIGPRANSQSCPGYKASNVQKQARSLTADLTLAGTPCNSYGKDLEDLKL 181 CAK37273.1 LVEYQTDERLHVMIYDADEEVYQVPESVLPRVGSDEDSEDSVLEFDYVEEPFSFTISKGDEVLF DSSASPLVFQSQYVNLRTWLPDDPYVYGLGEHSDPMRLPTYNYTRTLWNRDAYGTPNNTNLYGS HPVYYDHRGKSGTYGVFLLNSNGMDIKINQTTDGKQYLEYNLLGGVLDFYFFYGEDPKQASMEY SKIVGLPAMQSYWTFGVCPPPPNPITVRVVVYNYSQAKIPLETMWTDIDYMDKRRVFTLDPQRF PLEKMRELVTYLHNHDQHYIVMVDPAVSVSNNTAYITGVRDDVFLHNQNGSLYEGAVWPGVTVF PDWFNEGTQDYWTAQFQQFFDPKSGVDIDALWIDMNEASNFCPYPCLDPAAYAISADLPPAAPP VRPSSPIPLPGFPADFQPSSKRSVKRAQGDKGKKVGLPNRNLTDPPYTIRNAAGVLSMSTIETD LIHAGEGYAEYDTHNLYGTRLVMSSASRTAMQARRPDVRPLVITRSTFAGAGAHVGHWLGDNFS DWVHYRISIAQILSFASMFQIPMVGADVCGFGSNTTEELCARWASLGAFYTFYRNHNELGDISQ 
EFYRWPTVAESARKAIDIRYKLLDYIYTALHRQSQTGEPFLQPQFYLYPEDSNTFANDRQFFYG DALLVSPVLNEGSTSVDAYFPDDIFYDWYTGAVVRGHGENITLSNINITHIPLHIRGGNIIPVR TSSGMTTTEVRKQGFELIIAPDLDDTASGSLYLDDGDSLNPSSVTELEFTYSKGELHVKGTFGQ KAVPKVEKCTLLGKSARTFKGFALDAPVNFKLK Transglucosidase MSLSFSSDVALNATEAAVFLSERDVAGQIPINFVTTSAVSLRAACFGDNIYDRDAAGRCISNLL 182 CAK96369.1 VVGYRRFLVDLYWSSDQRDWMFCPLSLSPDVPVVTVSSISPASSTTTTATSGITATTTATTTTT TTSETIKATVTAVARSSGSVLYELGPYRCSLDFDLSDLINVFRGFFQAYSSELTVFTRYISLNL HAAGSATSPDEPASTVTGSQLPTSSEFVSYQFDEHLSSYIYTPSSLASERANLNQSWYQVEDGY KPITEYFTIHEEPNGDQSTPDGWPCVKYLQLAQEKRLLIDYGTIDSQLQDYNFSYMSDVIFPPN YLTSTVSVSLDSDGSVDTGCFYDSGATTVSQANNSWAISDYIPIPEGLSENSTIAAMSLVASNL TACGLTPALNNTLFNQTADTHPQPYTDISLSSSWAWSIGQPANADSSSASFSATDRCAVIDLTN SGHWRAINCSQVRYAACRVGNNPFTWQLSPTPYTFRDAYDHGCPENTSMAVPRTGLENTYLYQY LLTRTDVLDPTSAIPNKTKVWLNMNCIDVESCWVTGGPDQECPYASDPQQLERRTVLVAAIAGI VICIIAALTLFVKCNANRRNSRRNKRVIKGWEYEGVPS Transglucosidase MADSKPKPSSIPPWQQSNNASNTDNSSESTSSPTSDDTSRSTLIEQASKFLEDESIRDAPTDRK 183 CAK96386.1 VSFLQSKGLREDEINSLLGISATSTASDTTEEEKAASPDTTTPSSTEPAPAPEPTDNASASSNQ STPSSSITTPTPSPSTTTPKTNNTRDVPPIITYPEFLSTPTKPPPLVTLRSVLYTLYGAAGISA SFYGASEYLIKPMLSNLTSARQELASTATSNLQKLNEKLEQNVSVIPESLKNKTANVENDSSST DTESITSDPTELFHRDVATQTATSDFAATYNNSNKTGTDKDTPADPTAAVTDHLKRLESIRSQL RECSDTEKESGTLESGMRTRLNELHHYLDGLIYSKPGFNPLSGYGMYSTPGIDSGSGAATGVGK GEEDAIANFRAEIRGVKGALLSARNFPAGRGGRIGGVAGSIPTGLMRMNRVVNGIGSARPKKER YKHSPTTFRYYQ Transglucosidase MGVGDYVHSKEAGQPRPRTTEVSNQSRQAVAAQARIDVPPTNLVAPVPLPINKSIPLEHYSTPA 184 CAK47557.1 FSEQMPQAPAENGVHRDMFDTDVEGIDESTIAATSVMGAEDAPLQFQLRPATVPQYQEAAPVVD ERPLHPSRLPRRAYDGKWYENFGDKAMKSAGFDSEDADDASQLTSMAGDDERSDTTEDANYARR YRSSTEEPLSKRLQSFWSASRRSYKNPEPQAYPEPSKTAASAAPPLLRQSTSDARLSKQALPNR KVTLPRSMTATPRTRFSPPKPSLLEQLDITPTRRTSGPRPQPGKEPGITSTTTHQHHRHNSDDN HLFNTSRDSLPPLSTFDMTNIDDLDVDDDNDPINDPFARRNSVQRIVSDPDFQPNKSTITSSSS QNKRRNLESDYPPEILRQKSFKDLQSEPFDHTPTATASAPVKTTTPAPTPGPNATSDEKMDFLM NSEDKDRRDFFSTLTMNEWEDYGDLLIDQFSDALSKMKDLRHARRKTAALFEAEIARRNEVVEE QSADLTRKLEEMRSGGAEVLKGRTP Transglucosidase 
MQAIEQAGSIFTGWISSCLFCLSGRGDDESFHHQQAMKQKGVEREMRVCHTQPHLVPPMNLTDY 185 CAK47704.1 DDLPSPSSQPRVSSLQSWVVEGRTRASRASNRASMSLKRKSTAPVRISGPSEFRRVSMFLTELD EYRPLELSFNTPGNRLPDLPRFEDFPLDHDRKQVISRPPRALSSTEMGRISQPRTHRPSSSFQL ARKPVGSGSRRSSLPTQEQLQLLEKNTPITSPLIPHFSQRSSAVTGLTANAVPSTTPRLDLSGG STSLARNELHRDTHEAPTSTVPRTPTKPSLQDRPLPSIPTEEDSPSSGSTYHPPTTPSESRPPT TPSENPNQTPTRSGRVTQWLFQTPNKPNFLFSNPSKISDKGPFRIRSRTLSGSTLASTTTNITG GHKTTPSLASGTTVAPTMQASSTESRNFDLPLGSPFSPKQTFPTVTEEHTYPTIHEGEQQQHQE PEVFDDMLTQYYEYRHSAVGLAF Transglucosidase MKIFILLAIWLLASLGYATSFVGNMAEVYHEITDVLRGPHHAAHFAKLNGGKPKPDKPKGVGKT 186 CAK44239.1 LDDVVTLTVCKTDVSLAPTVGTFSLPTIVTAATLAPTVVVTGVVPTEISIGLPTEFAPEIQTGV LTGESHSTDTTVVPTNSVVSGLSSFLTGSQTVSEITGSTENHTITTEINASAVTSTFAHTTFPT ANAGNDHEGMASSAVALLVALIFSLVRI Transglucosidase MRTRSQQASPGGFVSLDENAPRRTRSAKNAAQQEPATSEQPPTRSKSQRAPKKTTTTTTTKKST 187 CAK44326.1 AKAQTRKATTTKRTTRQSTRKTDQPVSNEDAQTTHTEEDFATDNTTAEKNTPREVPETVDPTPA SSENENPDRESLPHVSHPFMVPQPPKESQEIDCFDGPDPRGIGAASCLKSLIDELSSVGSPLSE RSKTPSWTSEDGTEAALAPRPAQESGATNVETSTTTERVSLPTPAAEERTVEPPAEPTVAEPRG VAIVTSSEQFNTVSSASAAGSGGVVVVEDERVGALIASFARLSLDDLAPRSSNEAAATLMESTT ASFGEPVEPAHVTRRSLRASRQEWILGWAQQVPSTGYFHPITGQLVEGPAASVELVGDPASNRG PPTRRIGVRDYILRRRRREVQVMSPLQEEPQSSPGPGSRAVALNAVPPRGIRAQRAQNTKRLTK PPVTRKRARTESSDEEAPGPQTPAANKRRNLGPPGSTPYRPATRPRSLTANITPYSERLRRRAA EKDGRIHSTSLRVSQLLAQQEADRRRQAAESSAPPCSELPRTTFDFSLDDAHETSQGQEQSQSL QEQSSTPQPPATPERQSGWNIRGLLNSVPRTFTRILPSFRRTPEPTQVQAPPEPSSERISRTQP PQSSSISQSQAQNSRRSSEEPPQKRRRKSWSLFAQPFDRSLYLGDIPKKDSATSSSAPLESRPV AKLSAEATTPQESATSDAKKDVAAEGEDSRGREIEEQKQKKRKRSPSPDVIPNPPGCSYGLDLD YFCYSSESEDEQEPPLPRTEPNKFGRLTRTAVRGALRSERHSSKKVRFDASPEDTPSKLRLRAR ATDPYRGRHFIGMGNDSEIATPDSPTPAPHAADESSSRRPGFVPNVQGTFQLDYDAFSDDSDSS GASASANVSASAPIPAPSSATVTQASISESVPSTESRQTPRQAAPAPSTPAKIDEEALARARSQ AEKYKPKTPSGLRTASRYSSPMTATPDTVSAPVIAPAITPTPSTSQTAPASAPEPEQQTTEDFG DDEFAREAQWLYENCPSGDLNDLVWPQPITYEEEGFSPEVIDLVNEIWDPSTVDYAYTNIWTPG LDAFKRELETGASEAAQA Transglucosidase MAKSASQIHRAWWKECSVYQIWPASYKDSNDDGIGDIPGIISKLDYIKNIGVDIVWLCPSYKSP 188 
CAK47737.1 QVDMGYDIADYYSIADEYGTVADVEKLIQGCHERGMKLLMDLVVNHTSDQNEWFKQSRSSKDNK YRNWYVWKPARYDEQGNRHPPNNWVSHFQGSAWEWDEHTGEYYLHLYATEQPDLNWEHPPVRKA VHDIMRFWLDKGADGFRMDVINFISKDQRFPDAPVKDPRTPWQWGDKYYANGPRLHEYLQDLGK ILKEYDAFSVGEMPFVRDTEEVLRAVRYDRNEINMIFNFEHVDIDHGTYDKFEPGSWKLTDLKA FFETWQKFMYNNDGWNALYWENHDQPRSIDRYAQAKEEFRTEAGKMLATVLALQSGTPFVYQGQ EIGMRNVPVEWDMNEYKDIDCLNHWHRLLKHRPDDIEAQKSARQEYQKKSRDNGRTPVQWSSAP NGGFTGPNAKPWMSVSPDYVRFNAEAQVNDPNSIYHYWAAVLGLRKKYLDIFVYGDYDLVDKDS QEIFAYARQYENKKALVLTNWTEKTLEWDATTNGVKGVKDVLLNSYESAEAAKGRFSGQKWSLR PYEAVVLLVEA Transglucosidase MAYYEPQGWQAPAARQASWEQPAPPSRSGSSSVSQRDEIPAFSSQFDEVDRAIDNLVKSGKLWA 189 CAK47819.1 APRRDSMPMMMGRPYPDYDPRMVNSMSQRHHSISEFDSRMHPSPNVQGFYASQRFQGRPNEVEQ MMQAKRRMAAQRERELRNYHQEQQYNRSLLAEMSGNKSDRSLSPAAMSEESRRELLARQHRALY GNDSPAFFPPAGLADDGTRSESQAGGTPTSSTGVRGASPRNVDPFGLAQTPVQAGADSLGQTAA SAASLQSPSRANSTSSPSSAINPVFGKYDSADQPVTSTSSPGGADSPSSRQAPSKSMAGPIGSV GPIGTRPLPQPHAGQVSNPALNKRSTTPLPSPLGFGFTPGDAASDRSVPSVSTAPTTAAATASV KDTSGGVGLGWGNGSGVWGSKNGLGVQASVWG Transglucosidase MLSKMQLAQLAAFAMTLATSEAAYQGFNYGNKFSDESSKFQADFEAEFKAAKNLVGTSGFTSAR 190 CAK49181.1 LYTMIQAYSTSDVIEAIPAAIAQDTSLLLGLWASGGGMDNEITALKTAISQYGEELGKLVVGIS VGSEDLYRNSVEGAEADAGVGVNPDELVEYIKEVRSVIAGTALADVSIGHVDTWDSWTNSSNSA VVEAVDWLGFDGYPFFQSSMANSIDNAKTLFEESVAKTKAVAGDKEVWITETGWPVSGDSQGDA VASIANAKTFWDEVGCPLFGNVNTWWYILQDASPTTPNPSFGIVGSTLSTTPLFDLSCKNSTTS SSSAVVSAAASSAAGSKAVGSSQASSGAAAWATSASGSAKPTFTVGRPGVNGTVFGNGTYPLRP SGSASARPSAGAISSGSGSSSSGSGSSGSTGTSATSGQSSSSGSSAAAGSSSPAAFSGASTLSG SLFGAVVAVFMTLAAL Transglucosidase MPCVQAAAETDKSFVQIANADIEELIKQLTLDEKVALLTGDDFWHTVPIPRLGIPSIRLSDGPN 191 CAK49185.1 GVRGTRFFGSVPAACLPCGTAIGATFDRNLAVQVGHLLAAEAKAKGAHVILGPTINIQRGPLGG RGFESFSEDPLLSGIIAGHYCKGLKEDNIVATLKHFVCNDQEHERMAVNSILTDRALREIYLLP FMIAIALGKPEAIMTAYNKVNGLHASESPALLQGILREEWGWEGLLMSDWFGTYSTSEAIHAGL DLEMPGPTRWRGGALTHAITANKIPMATVNARVRAVLRLVQQASRSGIPERALELQLNRAEDRQ LLRKIASEAVVLLKNDDNILPLDKTKKIAVIGPNSKIATYCGGGSAALNPYEAVTPFEGISNSA SGGVEFAQGIYGHQNLPLLGKRLRTQDGLTGFTLRIFNDPPTVANRVPLEERHETDSMVFFLDY 
NHPKLQPVWFADAEGYFVPEESGMYDFGLCVQGTGKLFVDGKLLVNNANVQRPGPSFLGSGTME ERGTLELTAGRQYKVHVQWGCAKTSTFKVPGVVDFGHGGFRFGACRQLSPHTGIEEAVQLAASV DQVVLVAGLSAEWESEGEDRTSMGLPPHTDELISRVLEVNPDTVVVLQSGTPVEMPWIQNAKAV LHAWYGGNETGNGLADVIFGDVNPSGKLPLTFPRHVKNNPTYFNHRSEGGRVLYGEDVYVGYRF YDEIEIDPLFPFGHGLSYTTFELSGLSFERDSNSLHAICTLRNTGSRAGAEVIQLYVAPVSPPI KRPQKELKEFRKVWLEPGAEDVVQIPLDLVRATSFWDEKSSSWCSHSGTYRIMLGTSSRGAFLE SPIELSETTFWSGL Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 192 CAK38411.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MDPNTSDRLKRLQMENLGTARREYISTADDRRHKKARLEDIQAIRMTNVPGSAAQRMATVNQGR 193 CAK47899.1 LEDWANVHKTIGDTEDLENLDSLLDGQSHRLQLSAIIRESGGQKYIGSQRRSGSAPATRSLHPV GRGGGVIGTRGRGTRQSSLPPSNPTPRGHVTQPPKRSHGDAALDDNDFYRTAREGNASRKEEAM RAQNRRSSVRRPTSSTTRPRRPVSQVDYSSMLSQPQSFLAAARSLVSARTTPAAPTPASQISRD GGRSEASRKSSSPMDTTDRPTVQKTQQEPKPQMAEPPKPFTRPVVQLPAIPPRCTAVQQESTTK ADEPLPVLPSSAVLDATSQDQGSASMSLGSTEDGQSISGIPESQTGTQEKVNSDLSTAAAPVPD IKDTSPKAATDVKEAILLDFSYTPPEQSIHGQSPAPTEVLTPSLEDLRGLDFKQDIHPKFPTRR RVDFDMSSDKREASTNVHDLMPTKQYDKAEASEDLHRQINMLCELLQSTSLSGEHRESLKQCKT ALEGKLHGAYDSTGKRTQTQGDPFLGKPVLETLEAAAEPDEQPQSTISGLGIQNVSMDNFTPNK AETIVEPDTQATPSVGEMIMPTSVENARLAMASPSPSSRLNVTAPPFVPKTPFRAQSNSFSSDS NATCVPETPCPHRRVSMPEGHIIGDHLLPGRRRETISSGTEPLAAKQPPANEVTEPRFKFSIPP KISRKLTIKTPVREGFGKEETPGPVAPRLASGNIPKPAPKPSAALQQSVHAPKAKPSSVLGGLE SSRYASPSSNKPFR Transglucosidase MPPSTVFAYWRREHRRSSASPVSPSLQPTSKAPVTSNPPQLPGLSSTRPNNLTALGASSVESSS 194 
CAK38738.1 PQVPNNPHEDYYDATKKTIAVSVAPANAPGSSSANLAIPSSSSDSHTRPLSISDEQDVTTTTSQ SNYSQSSIAPPRSDQSDGDSPKPSSPFRLSLGKSLLNSHNLTSDHYNKRSSTPGLSSSGHFRFR TSPDISPGDRMALSHKDKDKEYKYEGAGNRRSADRDGSSEQAHHKSGRTRLHLLNPMSLLARRR SSNLASLRTEDTRVGARNIVPAIPDDYDPRIRGNIVHDFSAPRPRRNLSTAPVLMHDVNNQSSS ADVTYNGTGNFAHGNDQSAQSGEQRKRHTQYSPVFREHFEDDQKVLQVESKAYLQSSLLTAQTN AENDPHTLPVFARKLPSKIPEQEVSPEVPSDQTTLKQDSQHSPPNNSRELAQEDTDTIEVIPHQ PSGLPKHLKSNASRFSFDMNGVESSAQEKLLEEKHKEKEAARRAKARMEGTSFSDGEDDFDEDL LDDMDDLEEKIPGVNVDADEDDDFSGFSGPGNALNKPWLAPELSPIIASPLPTGSTNSQNVQEL AQGPLAGISAPLPVSDPAVSDVTTNFQALSVATIAPNNAPQVAMGSHPPAPQPIEDDDDLYFDD GEFGDLSTEDMGEKFDESIFDDETSHLYERKPVVQQPVPAPVPPPDNGTGSTNPLDVTAEHDEF TPEPDYDGGLRHVPSMASDYRKGSIRVYGQTRESLANLGSAKAQGGVLSEHNLEAFHNALAKAA SEAAASDRFGREASISEQSLGQESTAQTMDTPSGLVSDDSRLSQTVDMAAFEEVFEDFSYDDND DALFDDPIIAAANAEALENDDEGFYGQEFGFYAQAHGGCNGELTNGGYFGPRGVEGVNRSFSSR GKFREPSLTPITERSEWSTRNSVISLTAHGAAHSNPIASPGLAQLVDLGAMDDEMSLSALMKLR RGAWGGSNGSLRSSSGSPPLLHSTSNRASFISDASPTVYTAPPDAFGGSATESPIRESDKFRWS LNNTEQRVGQSAAGEREP Transglucosidase MLVEPLIRTDWPVWACKPHPHLVGPEAVAKNRNRSALQPSLAPSQSLVVLAQSNLPPAFQPSAL 195 CAK38790.1 SHGSVFGWPCILPWRSSGNAGDEPPSGPYSYSGWPTPLTSSNQPSPSRREHAVQPPPLTTSLGG HQFQGLGLALGSGYSSTPLSSTSLSSPFTQGQSPAVGSPGGAAIGSSPMASRQYNVPYNPQDWG PVGSGSMNAGQATYTPPNSMLRIVSQPRSTGPHSDVSLSPPPPPYSPPSQQHQRENVSQNTSSM GSTSPSISSSYNGAVRAGVDAPSEYRQRRLPRTRPLSMFVGSESSHNRRVSLPPPPPLPPGLSS RSSSQNRSETYREPASVMAGPGPHIVVSPDNLHSTQLSDDSNMLEPTQPFDTDRPPAARRAVSA GPAVNSASSSRAHSQSGARSPPGTSWEPGMPLPPPPPGPPPATRSQSVNGLSDSSSSRNSQGPV RGGRARPPPVLGTSLDSIPPTPAGWVDETIDVKPRTERQPLTIDTATTSNTNGPESLESSRASH NPNSGGLFRSPAIKDPNAKGIRERRIERRNRQSQVLDSLSAVSMSSNPWAEALEQLKPSNLVLG ESSVDTDNGRNPASAKAAPLSTRSISSDGPQITSRSRASSGGLFSNRSCSTPKPEPSPQAPTSN SRFAQTPPFSPGTERSSAFPKRTSPALPPKALPTPPLQSGSETTPSRPGSKEERPVSHILHLPN EPVTTVSPLAPRRVSAQQNPSLDSVIKRDDDYVRNAIQRHREFIEKEAGTMDEKEALRLFADFI ISESQIRRERYAKVWDLDSFDVESVRRKLFVSPPKTAPVPQTSQVVPGPSSRRASNPTAPKLDI PQVRPESAWWNNYQPCLSPIASLGLSNDEMSSRGRAPSRWWESKTGSSSEGGERRVQRSKRETK 
YMGLSRGALLWEESQGSSDTGNAGTSNGGNQYAAYGPDEYPPEKVGWHEEPALEDYSNNVRLGS SRRFEEVQRMDVSRLITLPPPYPRHYPAVNNSHPDLVTYRTLVRSITDLSEIKTARQRHQTEMD GLFQDHQARVREGRRQFKANIQSQIQQGSITFAEAAEAEAALIVEENRLERDLIKGGLDTYQES VFKPMRAILADRIDRATACIDELRGRLFDDARSETPDQTQEEGDEKPELLEKLTQLKWLFEARE QLHREVFDLISDRDEKYRAVVLLPYKQASNEDKVRETNEFFVKDALDRRVDYEANALARLESFL DVIEGNVARGVEIQLSAFWDIAPSLSELVQQIPEKLRGFTVQIPANEYEENPSYRAHPLQYLYT LVSHAEKSSYQYIESQINLFCLLHEVKSAVMRASCNLMEAERIRLGESEGKVQQEMQETRTDEE RTLTSDLKDKVATVEGQWAEALGSGIQRLRERVKEQLMVEDGWEDLEQLEQA Transglucosidase MVTQSSLLDRVWLYTHKRSPILLALHPPSQSSLISSEGHLEEQTEPTVQSRRYIPNTIMNTNMH 196 CAK38810.1 PQSLDSVRSGEEAKSENEDSNTRSGAISLIRARTISRSSTPRDTQEGCSSEQADDSQQAVSIPR IVVMEPPGDSKAKRNKMKKNKYKKKKLLLGDSESETSGSQGSGLNGKPTAESNTSNDTSSHMKE IEDLSHSAWMPAKGLNSGGCANKEKPMSGSNVSERCTVDSDKKRDLIESLSKLDIKGKQRVWQV NAVTGNDTLEEINTQRGPQPVDRARRTVLAPSYATVLSGKRASIEGPSSMPNNSDSLLPTTMEF PKLTDKTSAGQDSSPEARSGHAKLSPIPEISGEFAGDNSNDIPLSQTDLETAGSSSLDNPISTP VSTVSTGWSSTALQISPQTTEPSSEPSSSKAVSHRHATSLHHAHPLPPTPPSSTHSHSLSTANA TITNAQGTLSAQKPEGFFWQLDSHGFPCAKAHCEKRCNLWDGATVICPRCGPYSEVRYCSRAHL LEDIKPHWLYCGQMVFQHPCRETSIPRRVRAGPPLIPCLHHYDMPERHRQAVHFNMNAREGDYF IFTDWLDFVTAGLPGDKTAIRCSNRIMYVVKFDDAAEKDRFRRVLAACLFMTIELPDLTDYLYR LIRDKLRLANAPNHIEPSLRYQFLQEFNVTIQERITGSRHACETDWDGRNRRNCQDPVCRAEYR RLLGSVGGRGYSRMIETLESTYWILRAARTTHPSVKDAMKRMMGEGYAEVAEEDRRAFRRGDGW DGAGSGDMEIEGFNEGDE Transglucosidase MLATPMTPQASHPSSSNMVCSLASTTTTTSSSSSSSSSSSSSSATQQTTISSRPKLTLQTTSLP 197 CAK38817.1 RTFGTSSTGLSLSIAAGTASPTVRNTFKNAYEVTGPSSATASPSSKHPSNLRFSKPSSPFTTHN PYQLPLGVKSILRNSPLEPTCRRRAGSVATTGPNGGPSARRVFFPAKKQVSYRNPLEEEIQTVH YTARHSDLHDDPEPALEPQSQPQQPEVTSSDEDSDSNASGCPSDTSTSEDEPETGLGKTTSSPI KRKKRKHSNAERQVRAVALMDGIAGPSNPDSLTPQTPRRKRAKRRCEWRWTLGPLENRDKLLHP VQDETGPTSSASQPETIPHESETETPSSDPPLSSASTTLYHSSPSSSVSSDVETENDEWQTHTT HELECAHADQ Transglucosidase MAFWGVAEREVIERAVALEWADAAQVDERKESPNIRGVLSAGPSQPSRGDASEIKPGFGFSSAL 198 CAK38846.1 LWGAIFGAFGWTRVLRPVGRIPTRDSCSDRSDGTSWKRYLDLTLLSLDEPPTKGTKELEGQRKS QRARETKWALGSRGEKWALPELIILDD Transglucosidase 
MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP 199 CAK44692.1 GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS QVLFVGGLQNLTKGGAWAENWVLEW Transglucosidase MLANSLVVLAAIVASILNPVLGAPALDVGVTEPQAEPKYVFAHFMVGIVENYQLEDWITDMKAA 200 CAK44966.1 QAIGIDAFALNCASIDKYTPTQLALAYQAAQQVNFKVFISFDFAYWSNGDTGKITAYMQQYANH PAQMQYRGGAVVSTFVGDSFNWSPVKQATSHPIHAVPNLQDPAAASSNSQRGADGAFSWYAWPT DGGNSIIKGPMTTIWDDRYIKDLAGTTYMAPVSPWFATHFNSKNWVFICENLPTLRWEQMLSLK PSLVEIISWNDYGESHYIGPYSANHSDDGSSKWANGMPHDAWRDLYKPYIAAYKSGDSKPTIPQ EGLVYWYRPTPKGVNCPEDNMPAPNGFQMLSDSIFVATMLSSPATLTVTSGSLGPVKVDVPAGI VTTNVTMGIGAQTFQISRNGQVILSGKGGLDVADRSKYYNFNVFVGSVMGSSAAGNASRMLLLL HTTLLKVLLSGDKQVNVCSTTGSKGVICHLLIPDQVTSLIIPKFLQHIVQGKKYN Transglucosidase MKRLMYLLVVLLLSYVVCALPYDDHGKKRDLGPLSDLPGGDVIVWVDQAGNALANNVVGGGNSD 201 CAK47997.1 PTATADNSPTTLPPILSTLDGDLDLSPAVPLPASTNLPKTGNYRRFGISYSPYNNDGSCKSQDQ VDEDLDKLAQYGFVRIYGVDCDQTNKVTKAARQRNLKVFAGVFDLQNFPSSLDYITGAANGDWS VFHTINIGNELVNDGKNSAADVVNAVNTARSKLRAAGYQGPVVTVDAFSVMIQHPELCQASDYC AANCHAFFDNNNTPDKAGQYVKDQANKVSKAARGKKTLISESGWPHNGQPNGKAVPSSLNQQKA IASLQQTFTGEDELVLFTAFDDLWKQDSSGTFGAEKFWGIQKH Transglucosidase 
MPHEERVSSHVRQLLQSLTLEEKVALLAGKNMWETVNIDRLHIPSLKMTDGPAGVRGSKWTYGS 202 CAK4847.1 LTTWIPCGISLAATFDPAMVEQVGSVLGQEARRKGCQVLLAPTMNLSRSPLGGRNFESYGEDPY LVGVIATAMIRGIQAHGVGACMKHFILNDTETRRFNVDQTIDERTLREVYMKPFTMVLNDPAST PWTAMVSYPKINGLHADISPHILPRLLRQELQFDRLVMSDWGGLNSTAESLRATTDLEMPGPAV RRGERLLAAIRAGEVEVAAHVDPSVRRFLQLLERTGLLGDATKSAEHSEAATDDPIFHRIARDA AQSGLVLLKNDKGILPLKPTTLQRVAIIGPNACQPTAGGAGSAAVNPFYVSTPESCLRDVLHAA NSELQVSYEPGIPSSLRPPLLGKLLTVPDGSRKGWQVSFFEGHALEGPVVASSMWDDSLIYLFS DGDVPAVLDDRPYSYRATGVVTPQESGRYTWSLANTGKAKLFVNDELLIDNTEWTGLTGGFLGC SSADKTASVYLEAGRAYQLRVDNVVTLPVVEAFDNTLFPRISGVRVGLALEQDEPEMLAQAVAA ARQADVAVVVVGHNKDSEGEGGDRATMQLPGRTDELVAAVCAANPNTVVVVQSASAVAMPWVDA ASGLVMAWYQGQENGHALAAALLGDCDFSGRLPITFPRRLKDHGSHAWFPGEAAQDRNTFGEGV RVGYRHFDAQGIPALWPFGFGLSYTRFQLTNIRVCGRVEGRSPESQPVLIQARVCNVGGRDGQE VVQVYVAPSAGIREAGEMSFPKTLGGFCKISVPAGDSREVSIPIRGSELSWYDARVAQWRLDAG KYACWVGRSSSHIDAELEIEVAEGEDTRQGTLE Transglucosidase MQVLRCGIHGFHEVLGIDVDEIRFYWTIETDDKHASQLAYRVVLSTDEAAVQGDAIVESKLAWD 203 CAK39248.1 SGRVMSNEQRNIICKPDNGFQSTCSYYWRVTLWDQSERPHHSAVNHFFTAYPRSHLLPPYSMNQ TYMPHTSLIFRSWFEDEPNRWKAVWIGDGGDKPIYLRKAFDLAQPPARAIMFASGLGHFNMTVN GSPASDHRLDPGWTNYHRRVQFTAYDVTAQLQTGANVLGAHLGNGFYAGDKGEDRFFWPMYEDN TYVRYGNELCFFSELHLFYPDGSHTTMISDPSWRVRRSATSLANIYASENHDRRQYPTGWDTPD FDDADWAFAKPLTGPRGHIYYQTQPPVVLHETFQPVKITEPRPGIVCYDLGQNASTMVRVEVEG PRGSEIIVRYSETIQEDGTVLMPDPLFKEYETGVFSRIHLAGTGAPETWEPDFSFTSARYIQVE GVSLDGSDGRPVIRSVVGRHISSAARRLGTMQTDKEDVNQLLSALSWTFSSNLFSYHTDCPQIE KFGWLEVTHLLAPATQYVRDMEALYTKILDDILDTQEPSGLVPTMAPEIRYMCGPLHDTITWGC AVCLLPDILREYYGSTHVIAKVFPAAVRYMEYMRTKERRGGLIEHGLGDWGRGIAFGNNQANIE TAIYYRCLQCVAMMARELGEMQKAKEFEQWAARIYAVYNRHLLVTDDASRPYAYYTSLDNYPAR DRDAIAQAMALQFGLVPEQHRKDVMAAFLDDVADGRIRAGEIGLRFLFNTLADAKRPDLVLQMA RQEEHPSYMRFLRRGETTLLEFWQDECRSKCHDMLGTIYEWFYAAVLGLKPTGPAYRTFVVDPP YNAEFKHVKGSVDCPYGTIAVEFTRNEQGQAVVNVRVPFGTTAIVKLPRSGKSSAYCREGEESR AVDGGEVSLSHGVYSIIEG Transglucosidase MPSTYLGALATLAVFPCLGQARSTWPLGSGLELSYQASQHQISIHQDNQTIFSTIPGQPFLSAS 204 CAK39259.1 
AGKDQFVEDSGNFNITNVNQARCRGQNITQLAGIPRSDSVKNQVAVRGYLLDCGGEDIAYGMNF WVPRRFSDRVAFEASVDSEANASVPVDRLYLTFASHALEDFYGLGAQASFASMKNRSIPIFSRE QGVGRGDQPYTAIEDSQGFFSGGDQYTTYTAIPQYVSSDGRVFYLDENDTAYAVFDFQRSDAVT VRYDSLSVHGHLMQADTMLDAITMLTEYTGRMPTLPEWVDHGALLGIQGGQEKVNRIVKQGFEH DCPVAGVWLQDWSGTHLQSAPYGNMNISRLWWNWESDTSLYPTWAEFVQTLREQHGVRTLAYVN PFLANVSSKSDGYRRNLFLEASQHRYMVQNTTTNSTAIISSGKGIDAGILDLTNEDTRAWFADV LRTQVWSANISGCMWDFGEYTPITPDTSLANISTSAFFYHNQYPRDWAAYQRSVAAEMPLFHEM VTFHRSASMGANRHMNLFWVGDQATLWTRNDGIKSVVTIQGQMGISGYAHSHSDIGGYTTVFEP PTTSNSSGAIPRSAELLGRWGELGAVSSAVFRSHEGNVPSVNAQFYSNSTTYAYFAYNARLFRS LGPYRRRILNTESQRRGWPLLRMPVLYHPEDLRARQISYESFFLGRDLYVAPVLDEGHKSVEVY FPGHSANRTYTHVWTGQTYRAGQTAKVSAPFGKPAVFLVDGASSPELDVFLDFVRKENGTVLYA Transglucosidase MAGVNRSFSYSRGDDALLRDDEREISPLRSAEDGLYSTSYGDVSPLSAGVQAQNRPFDRGLVSV 205 CAK39383.1 PEGQTLERHMTSTPGMDNLGPASVGGGISDVPVRNLPAERDFNTTGSDNPYIPAPPDGDIYPSS EAVRYRDSYSSHTGLGAGAPFAEHSTPGTTPSQRSFFDSPYQGVDAGPYQRHSAYSSHDYPLVI NPDDIADDGDDGFPVHPKGAADYRSNANVPGTGVAGAAAAGGFLGKFRALFKREEPSPFYDSDI GGGLGGAEKAQGGRHIIGGGSRKRGWIVGLILAAVIVAAIVGGAVGGILGHQEHDGDTSSSSSS SSSSGTGSGGSDKGDGLLDKDSDEIKALMNNKNLHKVFPGVDYTPWGVQYPLCLQYPPSQNNVT RDLAVLTQLTNTIRLYGTDCNQTEMVLEAIDRLQLTNMKLWLGVWIDTNTTTTDRQISQLYKIV ENANDTSIFKGAIVGNEALYRAGSDVASAETNLIGYINDVKDHFKDKNIDLPVGTSDLGDNWNA QLVSAADFVMSNIHPFFGGVEIDDAASWTWTFWQTHDTPLTAGTNKQQIISEVGWPTGGGNDCG SDNKCQNDKQGAVAGIDELNQFLSEWVCQALDNGTEYFWFEAFDEPWKVQYNTPGQEWEDKWGL MDSARNLKPGVKIPDCGGKTIT Transglucosidase MSRNFHPVNTNFPSTTTLTADPDIIPSPEDNRNNYTSTQSYFCRQDVSTNPTFNSNDQALLDNC 206 CAK96650.1 SNIDVSQLVTEANCFWGSRSSTLPKNEGYPSVEEYPSTYLMGNHGHGYADENMHETSAPYGPCD HLQVAGAGMDMEMRMTGNVMGKVDETYEANQAAMLAALGMTHLDEGVSHAADDDAKTSSSVTVD DDPEWKEWKKWKEREANGGVRLPPEERMNQADVAAWLGMDSKEEHGVFQTQEEDDYDEDETISY VSDDDMMEMEEGMEDDEVVVNVVEGPSRQVLISSQTGAHAEYDTASFGCDFEEQEEQEDSEEQE NEERWGGDGQPPPSLFPRFYSDNAETGLIIELSSADSDAEETMSDPSPSSLSSSSSSSPSETPL QPHLQEAYALPWTTSPSNTSTITSTTSTPTDTSPVPTPFQPDPHPHPHPHPTTQHYLPPPSSQT TPPFTLLTDLHTHLTHTPQHRASHLRNLASEISNTMYFVNSHTISGDFSAEDAAPVDRVMRIVR 
EGELKMRYQERYERQKGKLVKRERRVAEREDRVSKREEMVSRREMGVAGWWEVWRERGREVWEG VEGEMMGTMEGNGYRRNGMDTLQRTVRVTREVVRRMDRIVDEGEVDGDGDVEGGVEVRGAYKID YNCDDDIVGSTSYRIEICNPPQRSSAS Transglucosidase MSSTPDDKPQRATAAQLANRKIKEVRRRRPNSAAPSAPAPSFGGGPFASIDPNTVSSTPASSQT 207 CAK40060.1 ASNGFTFGQSQNQSFPGASSAPSQNGGTPFAFGSGGGGSSSSSFNFSSSFGGTSSASNPFASMN TTTSDQSSKPASFSGFQGSLFNIPPGGNQSPAQQPLPSGSIFGTSSQSNASTGGLFGASTNNGP SASGAATPASGSIFGQNNAAAAASTPSTNLFGQSSVKKPSPFGQSSAFSGDDSMQTSPDAKGSG SQQKPSIFSSAPPAQPSFAGAGSTSLFGASASSAAETPSKPVFDFKATTPSTSLFGGAATPTPS ASTPAPAAVTTAAPASSSLFGAASPAKPSTPFQNPFQSSNLFQTTPSTSQKPDEEKKEEPKPAD SQPKSPFQFSASTTTPGSLFAKSDAPASPAPAAPSTGLFQTSSTKNLFEPKPPATADAEQTKAP ANPFGGLFAKPATPSKPAGEQQPLPSSSTPFQGLFSKPSTSNDAAKTSEPEKQATPGPMSFAPS SGGFSQTSNLFSPKPAASPAPAAETQATAASTSAAPVDSPIKVNGANSSPSVFTNGNTAPSTFG QMQTPKLAGKTTDPKTSEDAEMLYRMRTLNECFQRELAKLDPSSQNFDAAVQFYMRVRATLGAS VGSKRKASGEAEDAVATKKARPFGIPSEKADTPKENSTTPAVSAQSSTPFKGFGTSQASPASSK RKSIDEGDDNSPAKRVNGDSSTANIFAQSFSKSKTEAEKPEPSVVKPSTPESTKPALFSTTPTT APAKPLFSLSDSASKGSASTSLFSSSMSSATSTSAASGGNAPKNPFVLKPTSSEGSSTGSAGGT DFFAQFKARSFEDAEKEKEKRKAEDFDSEEEDEAEWERRDAEEQRKKQEQFGTQTQKRAKFVPG KGFVFEDESNESPAKKAEDSSSTTPGAGTIFSSQNNTAVKSNNIFGHLSATPSEAEDNDNDADD TEEASTPGDESDDAAENAVAADKKAESADSSAKEPEAGGRSLFDRVQYGEDGKPKRQGEEEPKG NVSTLFGSSNFSSSFNTPTSLTPASSGESNLAAPKPATTNLFGAPSTTSSIFGTPLSGSGNSTP SIFNAAQNATKSTGDNTWKPDSPIKFASDSASASSSKPDSGSATPALEAPKPFSNLFGAPPSLT KSSTSKDAQPSLGFTFGTPGQSSPSVFAPSTLTSAAPSRSTTPGGASDTGAEESGDGDGAESLP QLDLTRGGAGEENEDLVAESRARAMKHTTGTGWESQGVGFLRVLKDRTTSRGRIVVRADPSGKV ILNTRLMKEIRYSVAKNSVQFLVPQSEGPPQMWALRVKTNADAERLCKSMEETKN Transglucosidase MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS 208 CAK40395.1 SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ 
LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV LDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF Transglucosidase MPLRPPASPLSLETISPSPPDLDSDPLIASDDDLDDDDRAARDQRIEKLAQAYCHGTPLFILSA 209 CAK96888.1 SLRGPFENGWANPWKKDRRTTGGGHVGIHSEHSERPIIPETIPQKRPLYRESLGISRSKSAVPL SDPTYSSKKRESQGGTAGEPSSKRPRDSRGRTSSSNNTPKPIALHKRTIETSGHSDALTPFQHT EQSWLKRDRAEINFRKVDPPTSPTTTVSSRHREGGHYTIQVPGTDYRVTTKNILRARTDSRDGT TFDSHVPDSVKAQLTSRHPGVRPETEDHENSICVLSSTSHLSKFEFRRRKRSADPEHGTTSPVP MQLENEQVSHQTTVVEDSRVSSQPAPVPPNTTSMQAIEVDMQVNEDNSHHPNVSERSGTVDHQG NSQSRKVSSTTTAEGVNISDKYPSAQRVSVNPALAENVTSLQTISAVKPNSECDNDTIPDLHFN TQAALLHAQKSFQNDLESQVPNPGETHNQPSSPANDITPFHRMNTSRIGKYSRAIHPGTAHMPM STQCIIDAVTPFTFSTEKKARSRFISPQMPSSSRIDRGTATPNTGSPLSSESEDEDEDEDPTIL PHKSPAAQQQTPDDTQEGSALPMALSHSHPTTVQDGQGVAPGPDSFNLSQAIADAGSWLQQSFD INHEIQQCRSSAKSRPSSSAGISRSASGTILGHSTLDSVLLC Transglucosidase MPGHSRSRDRLSPSSELDDADPVYSPSVYQREHYYNNDSLFDSADDDYTRTPRNVYSYETHDEY 210 CAK45960.1 HDDDDDDDDVHEHDHDHEYDDKFEEPWVPLRAQVEGDQWREGFETAIPKEEDVTQAKEYQYQMS GALGDDGPPPLPSDALGRGKGKKRLDRETRRQRRKERLAAFFKHKNGSASAGLVSGDALAKLLG SQDGDEDCLSHLGTERADSMSQKNLEGGRQRKLPVLSEEPMMLRPFPAVAPTGQTQGRVVSGAQ LEEGGPGMEMRHRGGGGPPAEGLLQKEGDWDGSTKGSSTSARPSFWKRYHKTFIFFAILIVLAA IAIPVGIIEARRLHGTSGGDNSSNSNLKGISRDSIPAYARGTYLDPFTWYDTTDFNVTFTNATV GGLSIMGLNSTWNDSAQANENVPPLNEKFPYGSQPIRGVNLGGWLSIEPFIVPSLFDTYTSSEG IIDEWTLSEKLGDSAASVIEKHYATFITEQDFADIRDAGLDHVRIQFSYWAIKTYDGDPYVPKI AWRYLLRAIEYCRKYGLRVNLDPHGIPGSQNGWNHSGRQGTIGWLNGTDGELNRQRSLEMHDQL SQFFAQDRYKNVVTIYGLVNEPLMLSLPVEKVLNWTTEATNLVQKNGIKAWVTVHDGFLNLDKW 
DKMLKTRPSNMMLDTHQYTVFNTGEIVLNHTRRVELICESWYSMIQQINITSTGWGPTICGEWS QADTDCAQYVNNVGRGTRWEGTFSLTDSTQYCPTASEGTCSCTQANAVPGVYSEGYKTFLQTYA EAQMSAFESAMGWFYWTWATESAAQWSYRTAWKNGYMPKKAYSPSFKCGDTIPSFGNLPEYY Transglucosidase STSYGGTRTPDSSSTDVSRPSDLRTGPATRAGSGLTPSLDPSSRPLASRPANRDRIPPPPPKSH 211 CAK40856.1 HGKRIAPSPGVTPSLTQTTPGKATNRFSFHGSPSEPSYSPRPPQSGSDYFSAKPKDEPPSTEQS TESLRRSQSQHKRPPTPPLSRRHSQMRRSKTTMSKVNPLRLSIHVAQASSAASSSSSPPPSPSG WSLNPARTRESRTGSTPSEEPMHTATSTLRPEPSAAAPVSPTETSQSTSTGSSTKRTSLYNPLP PPPPPRRSRGSSNHSIDSSGQSLRSGKPADETTAAPAAPAAQDEFVPHPSNAHDILADLSRLQK EVDDLRGHYESRKASQ Transglucosidase MYISKVLLVTCAAFAPFASAAVQAKPTDTPVPVSSTHVASSPLAPTPTPVSPSPLHTASSSVII 212 CAK40944.1 SSSSSSVRFHPSSSASPSHMASSSRRISSSAISSSAIASSSASFTRSYITKASARPTTTTSTDA DSKSNSNSGSDSESATAAAASATHSGAAAPAVQLSGGMAAGVLAAAGFIML Transglucosidase MPRSTVDQTSSAEAPYSGPRKLVLCLDGTGNQFMGFERDSNIVKIYQMLEKNTPGQFHYYQPGI 213 CAK41060.1 GTYVEGQSSSSGLLRYPRKLQSNIITTIDQGVGTTFESHVLAAYRFIMRYYSPGDHIYIFGFSR GAYTARFLAEMIHELGLLSQGNEEMIHFAWETFSNFQQARGKTDRTAKDEALISYMKKFNTTFC RPQVQIHFLGLFDCVNSVGQFEIPFHRKSHQYLVSPAARHIRHAVSIHERRLKFKPALVLLDKT KPVDLKEVWFAGNHGDVGGGWSLAPGQFHLLSDTPLNWMLQEVLHLENSESKLSFHTLNVADVV ERENAFPGKEEPGTTAYDVRKRTNQPHDMLMFNRGATFLMVIFWWILEILPLFTRLELEHGKWV PRQWPPNMGAPRDIPEDAVIHQSVHEMVRAGILDPKSIPPRGGNNSHLPSTARITGAWKAMRKN QEKQISSLLQKKPAGALRKEFDGKAD Transglucosidase MDPANEYCGLEDYGLVGDMHTCALVSKNGSVDSMCWPVFDSPSIFCRILDKEKGGHFSITPDRR 214 CAK41144.1 LKNPLSKQRYRPYTNMLETRWIHEEGVMNILDYFPIAKPKPHVACRSGMVRKAECVRGEMEIEI ELFPAFNYARDSHVAQQSSASDDAIQVYHFQAESQNLVVSVLGDRGDISGDDSDLSIEFELSDR PGHLGPGLVGKVTLKEGQSITMLLHDQESITCNVEDLAPYLQQIERTTGDFWSDWTSKCTFRGH YREQVERSLLVLKLLTYKPTGAIVAAPTFSLPEHIGGSRNWDYRYSWVRDAAFTVYVFLKNGYP EEAESYINFIFERIFPPMDKNPKPGEPFLPIMITIHGEREIPEMELEHLEGYRGSRPVRIGNGA ATHIQLDIYGELMDSIYLYNKHAADISYDQWRAIRRMIDFVIQIRHQPDQSIWEVRGPPQNFVY SKIMLWVALDRGLRLAEKRSNLPCPDRARWMHERDALYDEIMTKGYNSEKGFFCMSYENQDAMD AAVLIAPLVFFVAPNDPRLLSTIQKITEVPAKGGLSVANMVSRYDTGKVDDGVGGNEGAFLMVT FWLVEAMMRQLRKTATSQFDSILSFANHLGMFSEEVATSGEQIGNMPQAFSHLACQYMFNVIVK 
RKSLETCEYHIALDVEKRILQICYNEDISIFNKWLSISGFTYRQSFTISQLVVV Transglucosidase MPIKLPKGFARRKSSSNALEEVQNPTQSSFRVFERPTGDKKSFSDGNLVAKRLSEGQPLDSPSE 215 CAK46428.1 DDNNIFALHNSQPARHQYEPPLSPDEYLTPFAHPDGPQPESQSPHTRNLYDIPIPPLSGAIRAA GRTFSFGGRFSKASAPTPPPQPSTPGPSRSRGMTTSTSSTATPPKLPDTELRIGKIDDDFQNMF GDIGKRYYGSKDASLDQPVDLDSSGPSSRPDALPRKDERVSRPTPIDTDRSREVEPSPYSWDSR HSEEGLLTTLDSPNEQPATQPYQRNIDPVSVGDRRKSIPLPGTTPLATTSHRSLAKPRTTADKG LRRSGVYSNRRDSVPVEDEDAKLIMESLYSKRSSQVPFMADHGASDAENDGPLFDQPGANSSHP DRRESIQKDSLSSPVLSDHLDPSIAAHARLAAQYEKAQPVTVSSTNKVMTPSQFEHYRQQQELR RSNSDASKSENSAESDYDDDDEVEKDREAERQ RRKQEAHLSVYRQQMMKVTGQESPAPAMRTELDQASKSAPNLLQPGTTLGSGKSSDGDDDEEIP LGILAAHGFPNRNRPPSRLAPSSSIPNLRASFQQPYLSSPSPASVAERDPNNRGSLPVFARNLP RDPYFGASLVNQSNRESLALGGGASVHGGPSPALPPGGLVGVIATEERARAMRRGSPNTQAMYD YQGGMPVPPVHPRGIPRPYTMMSLSSPNAGGVQPTISATEQAQIELSQQMTQMVQMQMQWMQQM IHMQGGQSTQLMPPGGPPPTLGANLNARPSSMPSAGNMNNPHAGYSGDQRTLSMLDPNVSSRLN SAAMPHVSGGLRPSTPAGQGYAPSIAPSERSNVGLAPRYRPVSTIQPDLGNVGSPSIPKSWSDE NRKSSLSAAVPAAPQASQMSHRPMPSNSKPIRASKLNVGADQDDEDDDEGWAEMMKKRENKRNN WKMKKETSSFGDLLNAVH Transglucosidase MYGSQSGHQSSAPPQPEWRLPSSTQQPASSRHHPPQPSWRASSPPPPPPPPRPTTTTTSSSSPF 216 CAK41498.1 NPTVYGQISNPPSTVNNYPGTSVSPVSAVAGSETTSWGVKYNRHQLHAQSPPPLPPRPSSTAQS PQAQSPVVSPLDPNKPLPAAPGWATQSADNTSYQQWPSNPPYAPQQSVSASSLQPPPPPPAIST GYQSSSAQQSNPWQQPPAVPPPPYSGVPLGQYQDSSVQQPATLPSNQQITGHNAPNPAIPQAPS PKPSTGLHYESQPLPGPPQAPVAATLSPTHGTPPVVPPPVPPKTSPITVPTSASVLATGGPSDW EHLSPIPGSIDDLGAFGSRPQDGSSSEPLSQASQSRPPNIGEPVRKDESVSPITPPNNAPQMTS QTEGPLASQTVVRGNPHQPVRMGSTGSVSSDISTSETPESIDGIIEAWNRPISSQPSAEQNPQS SASGVRAGILPPSRKQSPIGTPTPRQESIIPR KQVRSGSSSVESSSVTNGPTTDKRTILPAFVPLDPYDDLDPWSKSSLERYVAMLRKEAVADSDA ERYNIFTAFMAKETKLREILFNIEPESTRVGENPKVSSRQPTPILRASTSVSNDDTESGLIPVE TEGGHVVSTTDDADSEDGSYSPGGRPILPRIQTPGATKLQRSASHTVSNKYNTDHVAHATSSRA TSVPPSMLGDARHEHALPPLTTNPPQPIYIPFRYTEGPQRGSDVLVFDRPAYQAYSDLRQASAE SGRVMSNAPAPTPGERPDSAVPSRRNEHDETFIGLIREKSVAYRKRAPRKTSSPPPLPAALRHG KPASPVDDLRSMASSPLSKQSESSWNMTTRKDLENYSSDFSYIREAVKSWEISSKSRREQLDKE 
RIHRQEVSEKRIDALFNGKEIGYADINLLEEEFRQKEARAQLDEERQELDKFVAEVFEPLDQRL KEEIAALQALYEAALAQLDHENGRTKSATTDR YNLSHTMRTVNEIYRKLELRYQKRLEIALDRERRRKKAERRPLVFMGDSVALKGVDQEFDQMEK RNILEAARERDHRANRLMDSFDDAIMHGLGENQSLLDEVAAKVAKVDTATIRSSGLPESEVEQL LKSVYNLIESLRKDSESILHNFNMADSVLNDADYSVSVAEARYSDADADVFRRLDDEKRKEDTK IQTDLKTKLESIRSGPANIVTSINGLLESLGKPPIIDQTGPSSQMPADTPASVSQHLPAEIAPQ KPQEDPEHQERLRKALENAKRRNAARVNTEISRP Transglucosidase MCNKSNYSSPKWWKESVVYQVYPASFNCGKSTTNTNGWGDVTGIIEKVPYLESLGVDISQTSRE 217 CAK41767.1 QCLTSLSLVYTSPQVDMGYDIADYESIDPRYGTLADVDLLIKTLKDHDMKLMMDLVVNHTSDQH SWFVESANSKDSPKRDWYIWRPAKGFDEAGNPVPPNNWAQILGDTLSAWTWHAETQEFYLTLHT SAQAELNWENPDVVTAVYDVMEFWLRRGICGFRMDVINFISKDQSFPDAPIIDPASKYQPGEQF YTNGPRFHEFMHGIYDNVLSKYDTITVGETPYVTDMKEIIKTVGSTAKELNMAFNFDHMEIEDI KTKGESKWSLRDWKLTELKGILSGWQKRMREWDGWNAIFLECHDQARSVSRYTNDSDEFRDRGA KLLALLETTLGGTIFLYQGQEIGMRNFPVEWDPDTEYKDIESVNFWKKSKELHPVGSEGLAQAR TLLQKKARDHARTPMQWSADPHAGFTVPDATPWMRVNDDYGTVNVEAQMSFPWEMKGELSVWQY WQQALQRRKLHKGAFVYGDFEDLDYHNELVFAYSRTSADGKETWLVAMNWTTDAVEWTVPSGIH VTRWVSSTLQTAPLMAGQSTVTLRALEGVVGCCS Transglucosidase MPLFNAKTLLAGLCAASIVSPSLGLPQSSHPGSSAVANTQGRASSESHPSWTLESDSTAATVST 218 CAK41979.1 SNTPLYSPSSSATASLGATQGSLTDDHDGASKSSTRSYGVTTISYSSAPVNNPQSAHDASASPS TPSGPHSHTTTLSSSSAGVPAQSQTTSSSRSHSVVSKETSTRSPSSLFTPSASTETPTNPSHTP STYTISTHSFSSSEHASSESATSFHAVSTSKHTHTHTPTSSTSSSNTPTRSSALTQHETSTSSS TPTRSHTHTASTPASSKANTSSSIKTHTTHSHTEDTTSSSASHTPTSSKSSSSSAEDVSSSPAS HSPTPSTHSITTTTNTDTSQSASITSGPSTTPNSTITTTSSTTTSSVDVYAIIKHLYKLVKDTY PVIKKWKEDPKSVKASDLIKPLKRVIPVADDVLDVLGAPSSLSSSSGDSESVLDSCSSGGGLLG DLIGIASCISSTADEAVSILGSSSDSDSSDESTLSSYFDAFETEGSSLSAVGVTATGSTASSTG TTTTTSSGDTSSKSTKTATSTNTDSDTSTQSTKTKTSTKSTTTSDSSSSAGSGGHSASASASPT STKSSTSTKESSTSTKTKTKSNTESSSKAASSSAAASKTDSSSSSAKSTSTDSTSTKKTTSTKS TATSSSAPLSASSSAHTSSIATTNTTSTSTNSKSSTSTDTTTIIIHTHSGTASGTTTHHTSTVT PSPSRNQTATTLVTTTSSYTPPLCYNHADPDNGAGNVCICTRSNGDYTTLSELPSGSGCSYTSI PTPTTTSTTKTTSTKTTSDPPFTVTELNSDVIVCATSTLSYFSTFTYTQCAGSSSTIYTAPTPT PTAQVVIAYLSDVYSFWSFFTPDIGSSIDFCNDAEAGELEASGSIKVIDPPYPDGTKELDFEIH 
DMKDCVYKGTSDEPGTFTCPDLAKTVDCESYGENKVHDCYGALSDGGVVYEEGIDCASIGSVEV Transglucosidase MHLSKISAILTPVLNAAAVLSSQAPADDLSVLSSEVARANNQSLLWGPYKPNLYFGVRPRIPNS 219 CAL00956.1 LFAGLMWAKVDNYATAQQNFRHTCEQNEGMAGYGWDEYDIRKGGRQTIHDAGNSLDLTIDFVKV PGGQHGGSWAARVKGVPRGDADPDQPTSVLFYAGLEGLGNLGVEGEPEDPRGFTGDVKLGGFTT DLGDFSIDVTSGPESNEYPEHGHPTYDEKPLDRTLVSSLTMHPEQLWQTKVIMFTQMKKEVDEM VEKYGSENPPPPYQLFTIKNEPGDGNMHLVQKVFKGSFEFDILFSSASSPQPMTSELLTEQISS ASLEFSERFESVHPPQAPFDTAEYTEFSKSMLSNLVGGIGFFHGTDIVDRSAAPEYDEENEGFW EETAEARGRAQPILEGPKDLFTCVPSRPFFPRGFLWDEGFHLIPVIDWDTDLALEIVKSWLSLM DEDGWIAREQILGSEARSKVPPEFTIQYPHYANPPTLFIILEAFIDKLDAKKNASMQTYADSGV TGNLRSIFVDQPELGEAFIRSIYPLLKKHYYWYRSTQKGDIKSYDREAYSTREAYRWRGRSIQH ILTSGLDDYPRPQPPHPGELHVDLMSWMGMMTRALRRIAVTIGETEDAEVFKTYETAIERNIDD LHWDDDARTYCDATIDEYEEHVHVCHKGYISIFPFLTGMLGPDSPRLKAILDLIGDPEELWSDY GIRSLSKKDQFYGTAENYWRSPIWVNINYLVLKNLYDIAIVSGPHKEQARELYSNLRKNLVENV FQEWKKTGFAWEQYNPETGSGQRTQHFTGWTSMVVKMMSMPDLPASEQKGHDEL Transglucosidase MEVMDMPKRNSPVSQPTAVAMASTSPHPEKKASDAPYIVDDDFPGDDDDDDDVSISPISERAPP 220 CAL00976.1 WSGTRWARFFPELSSHFSLASPTNSTNPPFPQPLTKGPPHIDGPSQQPERRSKGPSSLSSEDVA DNRSSSYTSRSSLTSQGSEATSPVHKLVDSLHIKSPTKAGVFDESKFAHQIPPPFPSRSSIAQS KDKPLPQEPPIELTPLSIRHKTPQIPDRPGYLSRLDPPPRSKKHASHHHPTLSQACTDLERTLA GLAEQQHSPAQLSPRSPLQILDGPLQISRGNMDMVATRPAPRPPASVHDNRQIHKAKSREDMKQ TKKLLKNKPSFSFTVPAFGRKLSRVHHRSTSNTSSKSEPESYRASVLHQPAVAELGDSEVAELQ GSSVIGFRERPSSAGGEKELRMRLPRLQTKEMGAPGRKRDNIHSTHEGPEQLRRPNGRARGASV GEKYFVSYSKLDGMPVHSTRQHQPTSQTSCMVYELEGGSTQPPAELQGDTTSPIDVVPVRISVG VSSVGAMPGTLPDRVILTVLEHITSLDDLFNVAVSRKDFYRVFKMHELKLIRIAVFAMSAPAWE LREMSPPWDTEWHFVLDPDAPVPEYTPSCYLKRYAEDIYTLAHLKSLILARCGTFLRPETIRGL SGTDDIRAAEVDDAFWRVWTFCRLFGSGKGREGDIAGQLDWLRGGEVARNRRVSEFTSIADPYD ANSVLFEPPTGFGDGNNGGLSKEQLLDMTEIWTCLGVLLQPMHEEWAYYILTLGLSAVLVLGSI HPYDNTTAVFQRAHSMGLTNWEASDTGASRSSFLREAVSKACQPRGSSTSQASMRSSGFSSQPS GSHDVSQTSNVRGEREPSPDFHRRRQAAYSAQLRIQRQQQPSPPNPMLAEERPISHYATIMSRL EGLPPAPQPPMSVSRIEIPPTTHSYMTNVSYMQPLQSVTPVYYPPQVRDPVDHAIDIMVRELGF 
GEEDAKWALKITDSGEGINVNAAISLLTRERKTHEQSSRGFSLRKRKSFLSSVINSPESRHSGW KWA Transglucosidase MEPLRRSQSSRSMRRSHHSSQSTEPFDPELARFQATTAASRAMLRSKCSYDVLGGPSKMAVPQR 221 CAK42352.1 QHRPAGAALNATNAPVEDVDLRRSVLDKTSDLSPPAGLPSIREFGRLDAGIATLPSSYRRLRKT RSMFTNWQRSSHVPRGLSSPGCPTHNILTRREPQDVLRAPGTLRRSMSFFRGDTQNSDSLRYAR GQDVAIEMARSHYQQPEIYPTELRKSSLTVPKSRPFKKTLRSVVSDAESASVSPAIQRSTNVIS YGKARSLSSSLKKGLKKVLELSRPSSARISLGKSSSNDQQRIQGSPSTISAKHSDPLNSGVEGN ALTHSPDTEGTVVYTGVKKSESSESLATSRSRVTSWADSTIANTVITYRADDHSSLSVIDEHDS SCLKPSSSEDVSLTTCRTPKPNCTIDSQRLYSALMKRIDGNKTENASKEIVLGHVREHRAIPTP VSSMYTRRSRKTIRLIASDESLQSPGSYTTADVGTVTPCEPAQRQAQRTHGQKHLQDIRFGSAN LASSKHTIKEDSRDETGNMAMGRSQSPEEDEDSPSMYSRSTGGTSPKTTDPKMGESDPEVANEP GVATIYASQRAIYSSPKRNADQGPEAMQRPSADWQQWVRTQMERIEYLTPTRRHYREDAQIQDE TADLAYRTPSRDRRGFWSGSPDEDLRTTCKVTARNNFSRPFSRSSSVRTTVIAPKEQADTLVPP PPPSDSTPKVLSSSSGRSLFINQVETRPDGMALSPVPALLNNRYRAPESPTPRRDATDKARWRA GGRRYGRQPSRLLPEAQDSKASQIRSSRVPQENRRLTDENVRLENGYQEVASKDSQLQNMYSPI SSKRMVEMFLESRRRRMGTEMSDGAPSKDDGTKLSSDSVYDRHHIESLFYSLAPMYETPTN Transglucosidase MANIIWLALVLVALSIHVQAKDVFAHFILANAENFTQTHWTRDISAAKAAQIDAFALNTGYGAA 222 CAK42453.1 NTDQLLTDAFTVAAAHDFKLFLSLDYSGDGHWPPDQVLKVLQGYANHTAYYRVDNKHPLVSTFE GYEALADWSTIKEKLPNIYFMPEWSVRTPQELASEDAVDGLLSWSAWPYGTTPMNTSTDEQYIS ALKAKDKPYIMPVSPWFYTDMVRYHKNWVWQGDGLWHTRWKQVLDLQPQFVEILTWNDFGESHY IGPLHENELGIFSFGQAPFNYASGMVHDAWREFLPYVVGEYKNGSGKGVIDKEGVVVWYRVTPA WACKAGLTTGNSVTQGQQTMPPGQVLKDEVFFQALLEDTADVEVSIGGGENKSVGWTDTPSGGS SGGGRGLYFGSVPMDNRTGEVVVTLSRNGKFVAQMIGEKITTQCPDKLTNWNAWVGTAMSNVSN ASTSRASLSEENGAASVRVGGGRGMDMWMGALWMVVVVGIRADRSIPTKWACGSQINEEVLIQR RERLPLVEGDRVYLDSSSKAFRRRRRTCTR Transglucosidase MKVPADHALLLSSLLLAPSVGASTCQEPINHPGEPFSFVQPLNTSILTPYGGSPPVFPSPETKG 223 CAK42457.1 KGGWEKAMAQAKNWVSQLTVEEKAWMATGQPGPCVGNILPIPRLNFTGLCLQNGPQCIQQGDYS SVFVSGVSAAASWDRKLLYDRGYAMATEHKGKGTHVVLGPIGGPLGRSPYDGRTWEGFAADPYL TGVCMEETILGIQDAGVQANAKHFIANEQETQRNPTYAPDANATTYIQDSVSSNLDDRTLHEIY MWPFANAARARVASFMCSYNRVNGSHSCQNSYLLNHLLKTELGFQGYVMSDWGATHSGVASAES 
GMDMTMPGGFTVYGELWTEGSYFGKNLTEAINNGTITTDRIDDMIVRIMTPYFWLGQDKNYPSV DASVGPLNVDSPPDTWLYDWKFTGPSNRDVRGNNSAMIREHGAASTVLLKNERNALPLRKPRNI VIVGNDAGSDTQGPSTQTDFEYGVLANAGGSGTCRFSYLSTPQDAITTRARQYGGRVQTWLNNT LITEKSMPELWNPEQPDVCLVFLKSWSEENVDRTYLTLDWNGNAVVEAVAKYCNNTVVVTHSAG VNVLPFADHPNVTAILAAHYPGEEAGNAIADLLYGDANPSAKLPYVIAYNESDYNAPLTTAVAT NGTYDWQSWFDEELEVGYRYFDAHNIPVRYEFGFGLSYTTYNLTKLVAAKPVASNLTALPEQRA VQPGGNPALWDTVYTLTAQVSNTGSVDGYAIPQLYVGFPDTAPAGTPPSQLRGFDKIWLEAGET KKVTFELMRRDVSYWDVTAQDWRIPAGEFTFKAGFSSRDFHANATATFFRK Transglucosidase MHLRRIFVLTVLSYVTALPSDINLGVALRGCDVEACDMECRMAGSIGGNCGGNPALNLLGLPLL 224 CAK42741.1 STNTPNSSSAGPVTLAATETLTDVESTTTTKTTTDRESVTATETTTDIESTTATQTVTDTHLLT VTKSITEKQPTTATQTATDTKFLRTTQTINNTLTATQTTTDIESLRVTKTKNNTITATQTTTDI EPTTATQTINNTLTATQTTTDTESYTAMEITTATASFTTTQTTTDTESITATQNITDTQTIHHT ESLTATRTITDTDSVTATATPTTVTDTQTSISTTTATQTATPTPEVGACFCCTEQVRLPYELNG NCDNIPVSNSTDGCPSGDDPKRNHLLCCDSSGYCTQLS Transglucosidase MGSYTFTWPYNANEVFVTGTFDDWGKTVKLDRVGDVFEKEVPLPVTDEKVHYKFVVDGIWTTDN 225 CAK46804.1 RAPEEDDGSSNINNVLYPDQILKDSTTPLLNGTAAMAGVTPGSTTAALAAGVPKESSSKHGQNG YYPTISSAAPGSTTAALGQDVPLEQRANVPGSFPVTPASEADKFSVNPIPASSGAGNPIKLNPG EKVPDSSTFNTNTISSTARTDRAGYEQGTSGGFPGSPAYDASAFAIPPVSKNMIPESSLPMGEN QGATEPTYTIQSAAPTSTTAGLAAAVPLESQRQTSSGAPTRDVPDVVRQSMSEAHRDPEAATNK EAVDEKKEMEEELRRKVPVDNSTGAPAPTTVAGLGTSSGLGFTAGAAPSTNLGPSTGLDVATGM GTTTGLDSVSGPTAAQSFQKETTSGLPAHDVPDVVKQSISEAHKDPEAAGVEEAVGEKREVEEE LQQKVPVSNQSGTPAPVITAATSETAPGSGAE PASERAPRATGGGPASAQISPRATTPTDGPTVTTGVATSKAPEESGPGASGREETTEIPTKPAA GATGASATKTVDSGVESGIAPEDTTSAPTAGATGASATKTADPTETSGAPTSGASKPAESAPTN NAAATSKPATNNAAGAATNGKEEKKKKGFFSRLKEKLKSV Transglucosidase MARVDFWHTASIPRLNIPALRMSDGPNGVRGTRFFNGIPAACFPCATALGATWDAHLLHEVGQL 226 CAK97412.1 MGDESIAKGSHIVLGPTINIQRSPLGGRGFESFAEDGVLSGILAGNYCKGLQEKGVAATLKHFV CNDQEHERLAVSSIVTMRALREIYLLPFQLAMRICPTACVMTAYNKVNGTHVSENKELITDILR KEWNWDGLVMSDWFGTYTTSDAINAGLDLEMPGKTRWRGSALAHAVSSNKVAEFVLDDRVRNIL NLVNWVEPLGIPEHAPEKALNRPQDRDLLRRAAAESVVLMKNEDNILPLRKDKPILVIGPNAQI 
AAYCGGGSASLDPYYTVSPFEGVTAKATSEVQFSQGVYSHKELPLLGPLLKTQDGKPGFTFRVY NEPPSHKDRTLVDELHLLRSSGFLMDYINPKIHSFTFFVDMEGYFTPTESGVYDFGVTVVGTGR LLIDNETVVDNTKNQRQGTAFFGNATVEERGSKHLNAGQTYKVVLEFGSAPTSDLDTRGIVVFG PGGFRFGAARQVSQEELISNAVSQASQASQVIIFAGLTSEWETEGNDREHMDLPPGTDEMISRV LDANPDNTVVCLQSGTPVTMPWVHKAKALVHAWFGGNECGNGIADVLFGDVNPSAKLPVTFPVR LQDNPSYLNFRSERGRVLYGEDVYVGYRYYEKTNVKPLYPFGHGLSYTTFSRSDLKITTSPEKS TLTDGEPITATVQVKNTGTVAGAEIVQLWVLPPKTEVNRPVRELKGFTKVFLQPGEEKQVEIVV EKKLATSWWDEQRGKWASEKGTYGVSVTGTGEEELSGEFGVERTRYWVGL Transglucosidase MAVRRSARLRSRQATEPEAPADPVVTDNNAPCDTNNHNNSENTSEIDTTMARLGKQPERLPPVV 227 CAK43189.1 EHEEPADAAKDVPVQRSRKKTKTETASKRRSKVEAKEPVAEVTPVIAESQSNTDTTEPAVAETE KKPTPKAVPEPAVAETEKNPTPKALPETASKLSTPKKSIPTLKGTPVHRNTPVHRSTPVHKSTP TRTPSSTLVRPSHQEMHPSKVRQSTTKQADSGLILGFKPIKKDAEGKVIKDTLADNTPTKAKAS PAPYYGTPAFEFKFSCESQLSDEAKKLMETVREDAAKIKAQMTLEPDQNRAEAADRKIVQPKGK ASRFSDVHMAEFKKMDSIAGHASAFRATPGRFQPVVKTLKRTNSKARLDESDRNSPSPSKIARP SPAIVAPASNKRVKHDKADDASTRRPTAASPPKPVQPRPRSTVRSSLMTPTRSSAARASSVTAR PPRTSMIPSLVRSPAAKPADVPRTPQTEFNPRLKSNLPTLGNLKSILRRHQPLFSKDPSKIAAG THVAAPDFTSNLLFGSRGTTEEPAQTPSPKKRVEFTPSVKARHEEVMFSPSPSKVPVASPSRTT SDVVYPTLPVLTPEQNRVSAKSPAQATTPTIRHVRPSDVHANPLPEVAGVPHGIGHKKRTRESG EDTKTNDLPEVAGVPHGIGQKKRNRAALEDETDTENVPPVDLTADARSAKRMKMTSPSPLKAPT LSARKVAAPSPTKAATPSPTKPRSHTPLRSATTSRMSTPGISTPASVRARNRGVLSVSRLNMLA QPKNRG Transglucosidase MSLRWRKKTHWPPSPCVEDEVVSLSRELHGLSQIREMPGLEGVCSRGSVDQYPVLVDVFSYSSY 228 CAK43257.1 EETTVIYEDFSRDSSSEDNVGPPTPVDEKQDPMLYLVGDDQAVSLSAPLATREQSQDPSKTPAD NEQGSTTRGRPRADTRAQRDSPSKKDTAQSSRDASRASNIRTPAVTQSKSTPSLPRRFGSVKHG RSADALTTKSGYQSDSATVKSKAKPETVDKSAAPQTDKKSPTGLTVAERLEEKIRQRQELRAKE SSGDAPKTPSPPSTDQPSVPVGRAAEPSITPAATAAPKSTAPKTRPRSTSTPKEPTNDHRVDGP EASSSAAAPALQLPPRPGLASKPAGRSVSSNDAKSTAQLPARRAVSFLDDVPQRSSSLPRTPED IPEPPPLPRRRSSSQDAVRHRSSSQDAVRQTSPKRPFFLPPCPRSTPIAGYQDWHTVKGLPHLN ICPSCMKQMRKSEFRDHFVLASPRSRGEKIRCSMSEPWTRLAWMQTLKKQLDHLELLHQITRPP LSIKPCPGRIITEQHWYRIVDPETNMYLPQFNVCSACVRNLRVLMPQHRDTFKRSSTKQERACD 
FLTDSPRFVRYIDYLDIAANRADQENMLRPDVTEFLSYARRKVVLRDCRRDRRILSTWHYMPQL PELTVCEDCYDDVVWPLVRAKQPIARKFSTSMRLLPGDGPSRCREASCQLYSPRMRAKFAEAVQ SNDLMYLKQVALRRRDAEQRYRDREEELLEDASRGYDVEGEMRRNVEEWKRNE Transglucosidase MSLLPVILVTFFLVFCCAAGPIAPFASKRDLESLSPSPSLTTPYVHVASSSVDDDDDNEINALT 229 CAK97469.1 VVPVTPSSLPQTASSSSTTIEPALSSSAAFIQESSIVAATTSSLSESSSAVTHSFAPSTSSDST VNTLSQTLTSTTTTTTTTSSPTTLGEPSSAPFSPLSSAVRSSHTTSSSSHIMHITTPSTTLSES SQIVTPSIIIPGGPMEASSSHAPGTATSSHITHETSSSAVRVSHSSSAAAEMSKSSPSHQGTLN SSSRLLHSSTAALLPSSTAPESAPETSSTRTSETTTSTSLTIGVIVPLPEASTSPSTMLEMSSS TSQSSESITTTADDTSSASSTKVLSTPTESETTTPTSHTASPFIGVTIPTTTKTTQPADPADIT TTTTTPTSEAEDATTTSAPTPVVVLVTPEGSTTVIGTSSFIAPNPDITSTSTSTSSLTTTIEPT PTTSTTEPPTTLHATSTTIQTVYVVITDTPTPEPTWDSTTIATAIITVYDKSTSETVPTTTTTS RAGEEMESTTTTPLDTEQTSTQSTEDEIPTSTPIADTETATAIITSYPSTSTLSGETGDVIRIV PVTPTGPITVTVTVTEKERETVTKTETVTERVTETESVTT Transglucosidase MPQKEFVPKTYQESSTGAQSSSSVHLRSSPEERSFDFSFEPIRENLFRVTFSSQDHPLPPYPSV 230 CAK97480.1 KPATSLDGVHVSATGGSNQKTIEVGDVTASVEWSNTPVVSLSWKGTEKPLYRDLPLRSYVADS TGIAHYTEHDRDCLHVGLGEKRAPMDLTGRHFQLSATDSFGYDVYNTDPLYKHIPLLIKASPDG CVAIFSTTHGRGTWSVGSEVDGLWGHFKVYRQDYGGLEQYLIVGKTLKDVVRSYAELVGLPILV PRWAYGYISGGYKYTMLDDPPAHEALMEFADKLEEHGIPCSAHQMSSGYSIAETEPKVRNVFTW NKYRFPNPEEWIAKYHGRGIRLLSNIKPFLLASHPDFQKLIDGNGFFKDPESSKPGYMRLWSAG GATGGDGCHIDFSSAVAFKWWYDGVQSLKRAGIDAMWNDNNEYTLPDDDWKLALDEPTVSDAVK KGVENSVGQWGRAMHTELMGKASHDALLNIEPNHRPFVLTRSATAGTMRYAASTWSGDNVTSWE GMKGANALSLSAGISLLQCCGHDIGGFEGPQPSPELLLRWIQLGIHSPRFAINCFKTSPGNSSV GDVIEPWMYPEITPLVRDTIKRRYEILPYIYSLGLESHLTASPPQRWVGWGYESDPEVWTKALK SGDEQFWFGDTIMVGGVYEPGVSVAKLYLPRKANDQFDFGYVNMNEPYNYLASGQWVEVPSEWR KSIPLLARIGGAIPVGKPVHTRVPGDDTPASVAVKEVDDYRGVEIFPPLGSSHGQVFSTTWFED DGISLEARISEYTVTYSSTEEKVIVGFSRDEKSGFVPAWTDLDIILHNGDERRVVSDIGKTVEY KGKGSRGRVVYTLKN Transglucosidase MNEGRLAHPQFNQYSFKAGASTVQAEAAPALNYEDASTHNAAKNATKRSGRKGDQTYTYSIPPE 231 CAK47332.1 LAEAARLVAEASPQPVPTDYGVDISLVVSKYRKYDNNDTNVPKQKYVEPNGLDGYVHTGQPEDS PEIHTELKKRATTDFWLTQMGDSGSSPYAPDGYKVWRNVRDYGAKGDGITDDTAAINKAISDGG 
RCGAECGSSTIYPAFVYFPAGKYLVSSPIIQYYNTEFYGNPFDYPTILAASSFVGLGVITSDVY TGDDTEWYINQNNFLRSIRNFKMDITRTDPNAYVCAIHWQVAQGTSLENIEFYMMQDGLTTQQG IYMENGSGGFLTNLTFVGGNFGYVYAFSQRCTPLSDLPSGHTLATQFTSTSLTFMNCKTALQVH WDWAWTMQDVVVENCTNGIVIVGGAGGPKSTGQSVGSLILVDAVIAHTQTGIVTTLLAENSTSF LLQGVVFIEVDTAILDSAQGKTLMAGGSNVPVFSWGFGRVVTTGAESTFYNGQDIPRTNRSVPL TTIGYIEPNFYLRRRPTYRDIGMSQVINVKDWGAAGDGKTDDTAVLNSILDRAANMSSIVFFPY GVYIIRDTLRVPVNSRIMGQVWSQIMATGPKFQDEQNPHIAVQVGQVGDRGIVEIQSLMFTVSG PTAGAVLMEWNVHQVIQGSAGMWDSHFRVGGATGSQLQADECPKGSGVVLPACKAASLLLHLTS QSSAYLENIWLWVADHDLDLQDQAQIDVYSARGLLVESQGPTWLYGTASEHNVLYQYQVSQARD LYMGMIQTESPYFQNVPPAPSPFSPGLFPNDPTFSDCDSDSQTCPVSWALRIIDSTSVYSMGAG IYSWFSAYSQDCLDTESCQQHAVGISQSTNTWLYNLVTKGIAEMVTPTNEHPTLSADNVNGFMS SILAWVRLANTTIGARKFPGFQLYQPKWLDGLTDTCKTALSQKILCHPYLEMKFSNPGIGQYID NNTLADEVCDQGCGESLQMWTTNVANSCLNQTIDDTDPVAAGGYIYAGYNLTCLRDPHTKKYCP DVLSHFTIVDSVRSMTLAEMCSYCFTTSLEMRQASPYAAYTDVDKDALETVNAECGLSGPTDLH KPLYTEDEVDRPICMSGITHTTSEGDTCDLLAYKYHVASAVIQLANPMLVNDCSELIPGRQLCM PLSCDTQYTLQDNDTCLSIEWAQPIGFGEVRRYNPWLNVDCTNLQTTRQVHGSVLCLSPQGGSH NVTGTGSPCPGISDGYTNVVQYAPTNSTIAKGTTCYCGKWYTVQQGDSCATICIKQGIPSSLFL AVNPSLSTSDCDTSLQVGYTYCVGPDTHWDDTDNFWGEFACEAY Transglucosidase CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG 232 1ACZ_A ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWX Transglucosidase MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS 233 ACF60497.1 SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP 
FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV LDKNGQADGSLYVDDGETFDYKRGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 234 CAY05387.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFRQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSGDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MSFRSLLALSGLVCTGLANVISKRATWDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 235 CAY05391.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSFI LANFDSSRSAKDANTLLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDATG TYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLTA NNRRNVVPSASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVTS TSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSA DKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG 236 1ACO_A ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS 237 CAS97680.1 SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER 
YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV LDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF Transglucosidase CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG 239 1KUL_A ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG 240 1KUM_A ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILSNIGADGAWVSGADSGIVVAS 241 AD032576.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNEDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTPLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 242 ADX86749.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL 
GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 243 AEE60909.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase CTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALSADKYTSSDPLWYVTVTLPAG 244 AFJ52556.1 ESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS 245 CCO73840.1 SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL 
VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV LDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF Transglucosidase MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP 246 BAM72725.1 GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS QVLFVGGLQNLTKGGAWAENWVLEW Transglucosidase MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS 247 AGN929631 VTYDFGINVAGIVSVDVASASSESAFIGVTFTESSMWISNEACDATQDAGLDTPLWFAVGQGAG VYSVGKKYTRGAFRYMTVVSNTTATVSLNSVKINYTASPIQDLRAYTGYFHSSDELLNRIWYAG AYTLQLCSIDPTTGDALVGLGAITSSETITLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD MSIALESVAVSTEDLYSVRTALESLYALQKADGQLPYAGKPFYDTVSFTYHLHSLVGAASYYQY TGDRAWLTRYWGQYKKGVQWALSGVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS LAQSLNDNAPIRNWTATAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN QSAIISESLAARWGPYGAPAPEAGATVSPFIG 
GFELQAHYQAGQPDRALDLLRLQWGFMLDDPRMTNSTFIEGYSTDGSLVYAPYTNRPRVSHAHG WSTGPTSALTIYTAGLRVTGPAGATWLYKPQPGNLTQVEAGFSTRLGSFASSFSRSGGRYQELS FTTPNGTTGSVELGDVSGQLVSEGGVKVQLVGGKASGLQGGKWRLNV Transglucosidase MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVYP 248 AIY23066.1 GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYHNFDND QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTSGSTVMR ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP GVTTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGAASGQLYLDDGESI YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS QVLFVGGLQNLTKGGAWAENWVLEW Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 249 AIY23067.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAASSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MDPANEYCGLEDYGLVGDMHTCALVSKNGSVDSMCWPVFDSPSIFCRILDKEKGGHFSITPDRR 
250 GAQ47522.1 LKNPLSKQRYRPYTNMLETRWIHEEGVVNILDYFPIAKPKPHVVEKGLPQWCRCYQNKGSAQQE ACRSGMVRKAECVRGEMEIEIELFPAFNYARDSHIAQQSSASDDGIQVYHFQSESQNLVVSVLG DKGDISEDDSDLSIEFELSDRPGHLGPGLIGKVTLKEGQSVTMLLHDQESITCNAQDLAPYLQQ IERTTGDFWSDWTSKCTFRGHYREQVERSLLVLKLLTYKPTGAIVAAPTFSLPEHIGGSRNWDY RYSWVRDAAFTVYVFLKNGYPEEAESYINFIFERIFPPMDKNPKPGEPFLPIMITIHGEREIPE MELDHLEGYRGSRPVRIGNGAATHIQLDIYGELMDSIYLYNKHAADISYDQWRAIRRMIDFVIQ IRHQPDQSIWEVRGPPQNFVYSKIMLWVALDRGVRLAEKRSNLPCPDRARWMHERDALYDEIMT KGYNAEKGFFCMSYENQDAMDAAVLIAPLVFFVAPNDPRLLSTIQKITDVPAKGGLSVANMVSR YDTGKVDDGVGGNEGAFLMVTFWLVEAMMRAAKSKAYLPHDPFFQQLRKTATSQFDSILSFANH LGMFSEEVATSGEQIGNMPQAFSHLACVSAAMNLGGGGDR Transglucosidase MSFRSLLALSGLVCSGLASVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 251 GAQ47133.1 PSTDNPDYFYTWTRDSGLVIKTLVDLFRNGDTDLLSTIENYISSQAIVQGISNPSGDLSSGGLG EPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLLTGQDNGYTSAATEIVWPLVRNDLSY VAQYWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPQILCYLQSFWT GEYILANFDSSRSGKDTNTLLGSIHTFDPEAGCDDSTFQPCSPRALANHKEVVDSFRSIYTLND GLSDSEAVAVGRYPEDSYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEITDVSLDFFQALYSD AATGTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSLSEQYDKSDGDELSARDLTWSYAA LLTANNRRNSVVPPSWGETSASSVPGTCAATSASGTYSSVTVTSWPSIVATGGTTTTATTTGSG SVTSTSKTTTTASKTSTTTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWDTSDGI ALSADKYTSSNPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGESTATVTD TWR Transglucosidase MAKSASQIHRAWWKECSVYQIWPASYKDSNDDGIGDIPGIISKLDYIKNIGVDIVWLCPSYKSP 252 GAQ46031.1 QVDMGYDIADYYSIADEYGTVADVEKLIQGCHERGMKLLMDLVVNHTSDQNEWFKQSRSSKDNE YRNWYVWKPARYDEQGNRHPPNNWVSHFQGSAWEWDEHTQEYYLHLYAVEQPDLNWEHPPVRKA VHDIMRFWLDKGADGFRMDVINFVSKDQRFPDAPVKDPRTPWQWGDKYYANGPRLHEYLADLGK ILKEYDAFSVGEMPFVRDTEEVLRAVRYDRNEINMIFNFEHVDIDHGTYDKFEPGSWKLTDLKA FFETWQKFMYNNDGWNALYWENHDQPRSIDRYAQAKEEFRTEAGKMLATVLALQSGTPFVYQGQ EIGMRNVPIEWDMNEYKDIDCLNHWHRLLKHRPDDIEAQKSARQEYQKKSRDNGRTPVQWSSAP NGGFTGPNAKPWMSVNPDYVRFNAEAQVNDPNSIYHYWAAVLGLRKKYLDIFVYGDYDLVDKDS QEIFAYSRQYEDQKALVLTNWTENTLEWDATANGVKGVKDVVLNSYESAEAAKGRFSGQKWSLR PYEAVVLLVEA Transglucosidase 
MRCHRLLSGVLAFLPLSVAQSCWRNTTCSGPTDSAFSGPWEKNIFAPSSRTLNPEKLFLITQPD 253 GAQ44395.1 KTEDYIPFALHGNGSLVVYDFGKEVGGIVSVNFSSTGSGALGVAFTEAKNYIGEWSDSSNGGFK GPDGALYGNFTEAGSHYYVMPDKSLRGGFRYLTLFLITSDNSTIHIEDVSLEIGFQPTWSNLKA YQGYFHSNDDLLNKIWYTGAYTLQTNEVPTDTGRQIPAMAEGWANNCTLGPGDTIIVDGAKRDR AVWPGDMGIAVPSAFVSLGDLDSVKNALQVMYDTQDNSTGAFDESGPPLSQKDSDTYHMWTMVG TYNYMLYTNDSDFLEQNWEGYQKAMDYIYGKVTYPSGLLNVTGTRDWARWQQGYNNSEAQMILY HTLNTGAELATWAGDSGDLSSTWTSRAEKLRQAINEYCWDESYGAFKDNATDTTLHPQDANSMA LLFGVVDADRAASISERLTDNWTPIGAVAPELPENISPFISSFEIQGHLTVGQPQRALELIRRS WGWYYNNANGTQSTVIEGYLQNGTFGYRGSRGYYYDTAYVSHSHGWSSGPTSALTNYIVGITVT SPLGATWRIAPQFVDLQSAEGGFTTSLGKFQAGWSKTDKGYTLDFTVPHGTQGNLTLPFVGTAK PSIKIDGTEITRGVQYANSTATVTVSGGGTYKVVVQ Transglucosidase MRFRDGMWLVDPSKSLQYAEDIYSINASPDNRSLNLLCPTRHIFSRGNTLNLSTLHINLESHFD 254 GAQ43928.1 GVISLEVQHWLGARKGTPDFELFPDGEGPKLSEERIGISKSERGTTLKSGALSVTVSPDQHDFS IRFHSSDDYDWEVTSLLNRSVGLAYDPPISNGKQVEDLVQGQSGSRKHYIFTQTELDIGESVHG LGERFGPFNRLGQHVEIWNEDGGTSSDQAYKNVSFWMSSKGYGVFIDTPEKVDLEIGSERCCRV QTSVEGQRLKWYIIYGPSPKEVLTKYSVLTGRAPMVPAWSFGLWLTTSFTTNYDEATVTDFLQQ MSDRSIPVEVFHYDSFWMRAFHWCDFVFSPDHFPDPKGSIARIKHAGLTNKVCVWINPYLGQAS PVFLEAAEKGYLLKRTNGDVWQWDLWQTGMGLVDFTNPEAVRWYEGCLERLFDVGIESIKTDFG ERIPTKGVKWHDESVDPARMHNYYAFIYNKIVYNALTRRYGDGQAVLFARSACAGVQRFPLCWG GDCESTPAALAESVRGGLSIGLSSFSFWSCDIGGFEGTPPPWIYKRWVAFGLLCSHSRLHGSDS YRVPWLIDNDDAGPQGSTAVLRTFVRLKRRLMPYLYTQAVQSTRMGWPLSLRATALEFPHDPTA WAACDRQFFVGENLLVAPVFTEHGDVEFYLPEGQWTSLWDEKKVVSGPGWRREKHGFGTLPIYV REGAVIVMGKEQGEGGFAYDWCEAPEVRLYQTKQGDCATVVDASGKEVGTLTVQDDGSLKGLEC FRGDVTVRRIE Transglucosidase MDFFQTFWSSSPLSKIPSSSFNQTFMCTMCPKSNYTTPKWWKEAVVYQVYPASFNCGKPTSKTN 255 GAQ43980.1 GWGDVTGIIDKVPYLKSLGVDIVWLSPIYTSPHVDMGYDIADYKSIDPRYGTLADVDLLIKALR NHDMKLMMDLVVNHTSDQHSWFVESASSKYSPKRDWYIWRPAKGFDDDGNPVPPNNWAQILGDA LSAWTWHEETREFYLTLHTSAQVELNWENPEVVAAVYDVMEFWLRRGICGFRMDVINLISKDQS FPDAPIIDPTSKYQPGEQFYTNGPRFHEFMHGIYDNVLSKYDTITVGETPYVTDIEEIIKTVGS TAKELNMAFNFDHMEIEDVKTKGDSKWSLRDWKLTELKGILSGWQKRMKKWDGWNAIFLECHDQ 
ARSVSRYTIDSDEFRERGAKLLALLETTLGGTIFLYQGQEIGMRNFPLEWDPDIEYKDVESVNF LNKSKELHPVGTEGLAKARTLLQKKARDHARTPMQWSAAPHAGFTVPDATPWMRVNDDYETVNV ETQMSFPWQSKGELSVWQYWQQAIQHRKLYKNAFVYGGFEDLDYHNEKVFAYLRTSADGNDSWL VAMNWTTSAVEWTVPSDIHVTRWVSSTLQTAPPVASKTVITLRAFEGVLGCCN Transglucosidase MAGTRPMSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDASAQGPSWTS 256 GAQ43844.1 PYELDSSSIQFKDGQLHGTILKSVSANEKVKLPLVVSFLESGAARVVVDEEKRLNGEIQLRHDS KARKERYNEAEKWVLVGGLELSKTATLKPETETGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQ THVQLNNKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPG YKHVFGIPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELNSPMTLYGAIPFMQAHRKDSTV GVFWLNAAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGE LTGYTQLPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLT FPDPISMEEQLDESERKLVVIIDPHIKNQDKYSISQEMTSKDLATKNKDGEIYDGWCWPGSSHW IDTFNPAAIKWWISLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHN VHGITLVNATYDALLERKKGEVRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNN GIAGFPFAGADVGGFFHNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQA IRLRYQLLPAWYTAFHEASVNGMPIVRPQYYAHPTDEAGFAIDDQLYLGSTGLLAKPVVSEEAT TADIYLADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDP YTLVVVLDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMA NVRVERVVVVDPPKEWQGKTSVTVIEDGASAASTAPMQYHSQSDGKAAYAVVKNPNVGIGKTWR IEF Transglucosidase MLLHLLAYAALSSVVTAASLQPRLQDGLALTPQMGWNTYNHYSCSPNETIVRSNAQALVDLGLS 257 GAQ42954.1 SLGYRYVTTDCGWTVADRLPDGSLTWNETLFPQGFPAMGDFLHDLGLLFGVYQDSGILLCGSPP NETGSLYHEAQDARTFASWNVDSLKYDNCYSDAATNYPNVNYAPSTSPEPRFANMSHALLQQNR TILFQICEWGISFPAGWAPALGHSWRIGNDIIPAWRTIFRIINQAAPQTDFAGPGQWPDLDMLE VGNNIFSLPEEQTHFSLWAILKSPLIIGAALKDELTAINDASLAVLKQKDVVAFNQDALGKSAS LRRRWTEEGYEVWSGPLSNGRTVAAVINWRNESRDLTLDLPDIGLQHAGTVKNIWDGTTAQNVV TSYTATVAGHGTMLLELQNTTAVGVYPRDVFGESSGQTTTFENIYAVTTSAKYTVSVYFSQPAS SAETISIGSNANQSIISVQVPASSTLVSANIPLTAGSSNTVTINTSIPIDAIHITAPNGTYYPC TNFTLAGSTTLTTCGSGYCQPVGSKIGYISPSGTAKATISATTSGSKYLEIDWINNEIAFDSSW GWGSNSRNLTVTVNSEEPVRIEVPLSGRHSELFGPGLGWWDTATLGLLTSGWKEGLNEVIVGNV 
GGDEGFQSYGADFEILRSNFEKMKVSLTLYAAALQLADAAVVQKRTVDVAELEHYWSYGRSEPV YPTPETSGSGDWEEAFTKAKSLVAQMTNDEKNNITYGYTSTTNGCSGMSGGVPRLGYPGMCLQD AASGVRGTDMVNAYASGLHIGASWNRDLAYEHAHYMGAEFKRKGANVALGPVVGPLGRMARGGR NWEGYSNDPYLSGSLVQNTIRGLQESVIACVKHFIGNEQETNRNTPQLLEDSYNQSVSSNIDDK TIHELYLWPFQDAVKAGAGAVMCSYNRINNSYGCQNSKNLNGLLKGELGFQGFVVSDWNAQQSG IASAAAGLDMVMPDSVYWENGNLSLAVRNGSLSSTRLDDMATRIVAAWYKYAELEDPGFGMPIS LLEPHDPVDARDPASKATILQEAIEGHVLVKNTDNALPLKEPKFLSLFGYDAIAAQRNTMDDLS WSLWTMGLDNTLSYPNGTAVDPSHLKYMFLSSTNPSENGPGVSLNGTMISGGGSGASTPSYIDA PFDAFQRQAYEDNTFLAWDFASQSPVVNPASEACLVFINEAAAEGWDRPYVADPYSDTLVENVA SQCNNTMVIIHNAGIRLVDRWVDNPNVTAVIYGHLPGQDSGRALVEIMYGKQSPSGRLPYTVAK NASDYGALLSPVVPEGTKDLYYPQDNFTEGVYIDYKAFEQKNITPRYEFGYGLTYSTFDYSGLK ISIHTGVNTDYLPPNSTIEEGGIPALWDVVATVTCSVANTGSVAAAEVAQLYLGIPGGPAKVLR GFEKKLIQPGHHTKVQFDLTRRDLSSWDVVNQAWVLQKGDYSVYVGKSVLDTQLTGTLTI Transglucosidase MAVSASSPEPLGANIDERTPLNSSAQHATPSANTPDYSSITKGLTSDVHSSHGSDEEQPLINPP 258 GAQ42198.1 ESPGKDVTALTSISTVIGVLLLGEFISNADATLVMAATGRISSEFNRLRDASWLSTAYTLGLCA AQPMYGKLSDIYGRKPLLLWAYFLFGVGCVISGIGPDMATVILGRAISGIGGAGTMAMGSIIIT DIVPRRDVAHWRAYINIAMTLGRSAGGPVGGWLTDTIGWRWSFIIQGPLAAVAALLVVWKLKLV HPVTEKSIRRVDFLGTFLLATGIITITVIMDQAGQSFAWASLSTAILSTLSLSAFVAFVLVELY VAPEPIFELRMLRKPNVTPSYLIGSLQITAQVGMMFSVPLYFQVTSKASATVAGGHLVPAVIGN TLGGLIAGAFIRRTGQFKVLLILAGLVASVAYLLLFLRWNGHTGFWESLYIIPGGMGTGFCSAA AFVSMTAFLMPQEVAMATGGYFLLFSFAMTAGVTVTNSLLGTVFKRQMEQHLTGPGAKKIIERA LSDTSYINGLQGHVRDVVVKGYVAGLRYTYPMRVSISSLALSVYLFGKLALGLSAAEWRTQSIY FLLTDRFGRADNSTTATCDTGDQIYCGGSWQGIINHLDYIQGMGFTAIWISPITEQLPQDTSDG EAYHGYWQQKIYDVNSNFGTADDLKSLSDALHARGMYLMVDVVPNHMGYAGNGNDVDYSVFDPF DSSSYFHPYCLITDWDNLTMVQDCWEGDTIVSLPDLNTTETAVRTIWYDWVADLVSNYSVDGLR IDSVLEVEPDFFPGYQEAAGVYCVGEVDNGNPALDCPYQDYLDGVLNYPIYWQLLYAFESSSGS ISDLYNMIKSVASDCSDPTLLGNFIENHDNPRFASYTSDYSQAKNVLSYIFLSDGIPIVYAGEE QHYSGGDVPYNREATWLSGYDTSAELYTWIATTNAIRKLAISADSDYITYANDPIYTDSNTIAM RKGTSGSQVITVLSNKGSSGSSYTLSLSGSGYTSGTELIEAYTCTSVTVDSNGDIPVPMASGLP RVLLPASVVDSSSLCGGSGSTTTTAATSTSTSKATSSTTTTTAITTTSSSCTATSTTLPITFEE 
LVTTTYGEEIYLSGSISQLGEWDTSDAVKLSADDYTSSNPEWYVTVSLPVGTTFEYKFIKVEED GSVTWESDPNREYTVPECGSGETVVDTWR Transglucosidase MPLTYLGALAMLTALPSLGQARSTWPLGSGLELSYQASQHQISIHQDNQTIFSTLPGQPFLSAG 259 GAQ39994.1 AGKDQIVEDSGNFNITNVAQARCQGQNITQLAGIPRRDSVKNQVAVRGYLLDCGGEDIAYAMNF WVPKTLSDRVAFEATVDSDANASVPVERLYLTFASHAREDFYGLGAQASFASMKNRSIPIFSRE QGVGRGDQPYTAIEDSQGFFSGGDQYTTYTAIPQYVSSDGRVFYLDENDTAYAVFDFQRPDAVT VRYDSITVHGHLMQADNMLDAITMLTEYTGRMPALPEWVDHGALLGIQGGQEKVNRIVKQGFEH DCPVAGVWLQDWSGTHLQSAPYGNMNISRLWWNWESDTSLYPTWAEFVQALREQHGVRTLAYVN PFLADVSSKSDGYRRNLFQEASKHRYMVQNTTTNSTAIISSGKGIDAGILDLTNEETRAWFADV LRTQVWSANISGCMWDFGEYTPITADTSLANISTSAFFYHNQYPRDWAAYQRSVAAEMPLFHEM VTFHRSASMGANRHMNLFWVGDQATLWTPNDGIKSVVTIQGQMGISGYAHSHSDIGGYTTVFEP PTTSNSSGAIPRSAELLGRWGELGAVSSAVFRSHEGNVPSVNAQFYSNSTTYAYFAYNARMFRS LGPYRRRILNTESQRRGWPLLRMPVLYHPEDLRARQISYESFFLGRDLYVAPVLDEGRKSVEVY FPGHSANRTYTHVWSGQTYRGGQTAQVSAPFGKPAVFVVDGASSPELDVFLDFVRKENGTVLRA Transglucosidase MVKLTDLLARAWLVPLAYGASQSRLSTTTSSQPQFTIPASADVGAQLIANIDDPQAANAQSVCP 260 GAQ38166.1 GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVDSLTLSVEYQDSDRLNIQILPTHVDSTNASWY FLSENLVPRPKASLNASVSDSDFSVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYEDQFIEFV TALPEEYNLYGLGEHITQFRLQRDANLTIYPSDDGTPIDKNIYGQHPFYLDTRYYKGDRQNGSY VPVKSSETDASQKYISLSHGVFLRNSHGLEILLRPQKLIWRTLGGGIDLTFYSGPNPADVTRQY LTSTVGLPAMQQYSTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND QNRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY IGAVWPGYTVFPDWHHPKAVEFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA HPPFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATSTSTSVSYLRTTPTPGVRN VEHPPYVINHDQEGHDLSVHAVSPNATHIDGVEEYDVHGLYGHQGLNATYHGLLEVWSHERRPF IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFTGNSDEELCNR WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR ALSWEFPNDPTLAAVETQFMVGPAIMVIPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP GVNTTISAPLGHIPVYVRGGNILPMQEPALTTRGARQTPWALLAALGSNGTASGQLYLDDGESI YPNATLRVGFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGKTVSPGSITYNSTS QVLFVGGLQNLTNGGAWAENWVLEW Transglucosidase 
MLGSLLFLLPLVGAAVIGPRAGSQSCPGYKASNVQKSARSLTADLTLAGAPCNSYGKDVEDLKL 261 GAQ36312.1 LVEYQTDERLHVMIYDADEEVYQVPESVLPRVGSDKDSQDSVLEFDYVEEPFSFTISKGDEVLF DSSASTLIFQSQYVRLRTWLPDDPYVYGLGEHSDPMRLPTYNYTRTLWNRDAYGTPNNTNLYGS HPVYYDHRGKSGTHGVFLLNSNGMDIKINQTTDGKQYLEYNLLGGVLDFYFFYGEDPKQASMEY SKIVGLPAMQSYWTFGFHQCRYGYRDVYELAEVVYNYSQAKIPLETMWTDIDYMDKRRVFTLDP QRFPLEKMRELVTYLHNHDQHYIVMVDPAVSVSNNSAYLTGVRDNVFLHNQNGSLYEGAVWPGV TVFPDWFNEDTQDYWTAQFQQFFDPKSGVDIDALWIDMNEASNFCPYPCLDPAAFAISDDLPPA APPVRPSSPIPLPGFPADFQPSSKRSVKRAQGDKGKKVGLPNRNLTDPPYTIRNAAGVLSMSTI ETDLIHAGEGYAEYDTHNLYGTMMSSASRTAMQARRPDVRPLVITRSTFAGAGAHVGHWLGDNL SDWVHYRISIAQILSFASMFQIPMVGADVCGFGSNTTEELCGRWASLGAFYTFYRNHNELGDIP QEFYRWPTVAESARKAIDIRYRLLDYIYTALHRQSQTGEPFLQPQFYLYPEDSNTFANDRQFFY GDALLVSPVLNEGSTSVDAYFPDDIFYDWYTGAVVRGHGENITLSNINITHIPLHIRGGNIIPV RMSSGMTTTEVRKQGFELIIAPDLDGTASGSLYLDDGDSLNPSSVTELEFTYSNGELHVQGTFG QKAVPKVEKCTLLGKSARTFKGFALDAPVNLKLK Transglucosidase MSSPQQVYLLPLKDDGSPDVPGGYLYLPSPTDPPYLLRFVIEGSSSICREGALWVNIPEKGESF 262 GAQ33831.1 NRSAFRSFSLSPDFNKNIQIDIPITSAGSFAFYVTFSPLPEFSVLSTPTPEPTRTPTHYIDVSP KLTLRGQDLPLNALSIYSVISKFMGQYPKDWEKHLNGISQRNYNMVHFTPLMKRGASNSPYSIF DQLQFDDAVFPNGEDDVARLVSKMEDEYGLLSLTDVVWNHTAHNSKWLEEHPEAGYSVETAPWL EAALELDTALLKFGQELSTLGLPTEFHTVDELMEVMNAMRDKVISGIRLWEFYAIDVKADTQRI LDQWKTSKDLNLTDKKWAQLNLSDYKNWTLKQQATFIREYAIPTSKQVLGRFSRAVDLHFGAAI LTALFGPHDSPTSDTNTVEESLSKILDEVNLPFYEEYDGDVSEIMNQVFNRIKYLRIDDHGPKL GAVTAQSPLIETYFTRLPLNDVTKKHKKGALALVNNGWIWNADALRDNAGPDSRAYLRREVIVW GDCVKLRYGSCRDDNPFLWDFMTDYTRLMAKYFSGFRIDNCHSTPLVVAEYLLDEARKVRPNLT VFAELFTGSEEADYIFVKRLGINALIREAMQAWSTGELSRLVHRHGGRPIGSFDLDLPSSGSSH AIASSGLDSGKEKVAHIRPTPVQALFMDCTHDNEMPAQKRTAKDTLPNGALVAMCASAIGSVIG YDEVYPRLVDLVHEHRLYFSEFSEAPETGLNSLEGGIGGIKKLLNDLHTRMGVEEYDETHIHHD GEYITVHRVHPRTRKGVFLIAHTAFSGQDGKSVLAPTHLVGTHVKHIGTWLLEVDASQTTKERI QTDKSYLRGLPSQVKTFEGTKIEESGKDTIISVLDSFVAGSIALFETSMPSVEHASGLDNYITE GVDHAFSDLSLVDLNFALYRCEAEERDSSKGQDGVYDIPGHGPLVYAGLQGWWSVLENIIKYNE LGHPLCDHLRNGQWALDYIVARLEKLGHTDEHTALGRPAAWLQEKFQAVRQLPSFLLPRYFAII 
VQVAYNAAWKRGIQLLGPHIQNGQEFIHQLGMVSVQQTGYVNSASLWPTKKVPSLAAGLPHFAV DWARCWGRDVFISLRGLLLCTSRFEDAKEHITAFASVLKHGMIPNLLSSGKLPRYNSRDSVWFF LQSIQDYTKMAPDGLRLLDHNVPRRFLPYDDVWFPYDDPRAYSQHSTISEIIQEVLQRHAQGLS FREYNAGPDLDMQMTQEGFQIDVKVDWETGLIFGGSQYNCGTWQDKMGESAKAGNKGVPGTPRD GAAIEITGLVYSALTWVAELHERGLYKHDGVDIGDGKSISFKEWASRIQANFERCYYVPLQPKD DGQYDIDANIINRRGIYKDLYRSGKPYEDYQLRSNFPIAMTVAPDLFTSSKALAALALADEVLV GPVGMATLDPSDLNYRPNYNNSEDSTDFATAKGRNYHQGPEWVWQRGYFLRAFLHFDLARRTTP AERTETYQQITRRLEGCKRALRESPWKGLTELTNKNGAHCADSNICFWSVMSLTESAAAAMLTA AVVVDSPIDVHEPRWDDQTRGRDLYTQLVISLLVGLSAFFSFCVLRPKWTELYAARRRQRCAAS YLPELPDSFFGWIPVLYRITDEQVLESAGLDAFVFLTFLKFAIRFLSAIFFFALVIILPTHYKN TGKSGVPGWDDDDDETFDGDKDKKKIISDPNYLWMYVIFTYIFTGLAVYMLIQETNKVIRTRQK YLGSQTSTTDRTIRLSGIPPDLGTEEKIKDFMEGLKVGKVESVTLCRDWRELDHLIDERLKLLR NLERAWTRHLGYKRVKASPNALTLMHQQPRGSSIVSDGESERIQLLSEGGRDHVTDYAHKRPTV RIWYGPFKLRYKNIDAIDYYEEKLRRLDEKIQVARQKEYPPTEVAFVTMESIAASQMVVQAILD PHPMQLLARLAPAPADVVWKNTYLPRSRRMMQSWFITVVIGFLTVFWSVLLIPVAYLLEYETLH KVFPQLADALARNPLAKSLVQTGLPTLVLSLLTVAVPYLYNWLSNQQGMMSRGDIELSVISKTF FFSFFNLFLVFTVFGTATTFYGFWENLRDAFKDATTIAFALAKTLENFAPFYINFLCLQGIGLF PFRLLEFGSVAMYPINFLAAKTPRDYAELSTPPTFSYGYSIPQTVLSLIICVVYSVFPSSWLIC LFGLIYFTIGKFIYKYQLLYAMDHQQHSTGRAWPMICSRILMGLIVFQLAMIGVLALRRAITRS LLIVPLLMATVWFSYFFARTYEPLMKFIALKSIDRERPGGGDISPSPSSTFSPPSGLDRDSFPI RIGGQELGLRLRKYVNPSLILPLHDAWLPGRTMVPELQGELEHRNSENNAADESV Transglucosidase MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS 263 GAQ33901.1 VTYDFGINVAGIVSVDVASASSESAFIGVTFTESSMWISNEACDATQDAGLDTPLWFAVGQGAG VYSVGKKYTRGAFRYMTVVSNTTATVSLNSVKINYTASPIQDLRAYTGYFHSSDELLNRIWYAG AYTLQLCSIDPTTGDALVGLGVITSSETITLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD MSIALESVAVSTEDLYSVRTALESLYALQKADGQLPYAGKPFYDTVSFTYHLHSLVGAASYYQY TGDRAWLTRYWGQYKKGVQWALSSVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS LAQSLNDNAPIRNWTATAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN QSAIISESLAARWGPYGAPAPEAGATVSPFIGGFELQAHYQAGQPDRALDLLRLQWGFMLDDPR MTNSTFIEGYSTDGSLVYAPYTNRPRVSHAHGWSTGPTSALTIYTAGLRVTGPAGATWLYKPQP 
GNLTQVEAGFSTRLGSFASSFSRSGGRYQELSFTTPNGTTGSVELGDVSGQLVSEGGVKVQLVG GKASGLQGGKWRLNV Transglucosidase MRWHKLLPGVLALLPLSVAQSCWRNTTCSGPTESAFSGPWEKNIFAPSSRTVNPEKLFLITQPD 264 EHA19108.1 KTEEYSPFALHGNGSLVVYDFGKEVGGIVSVNFSSTGSGALGVAFTEAKNWIGEWSDSSNGGFK GPDGALYGNFTEAGSHYYVMPDKSLRGGFRYLTLFLITSDNSTIQIEDVNLEIGFQPTWSNLKA YQGYFHSNDDLLNKIWYTGAYTLQTNEVPTDTGRQIPAMAVGWANNCTLGPGDTIIVDGAKRDR AVWPGDMGIAVPSAFVSLGDLDSVKNALQVMYDTQNNSTGAFDESGPPLSQKDSDTYHMWTMVG TYNYMLFTNDSDFLERNWEGYQKAMDYIYGKVTYPSGLLNVTGTRDWARWQQGYNNSEAQMILY HTLNTGAELATWAGDSGDLSSTWTSRAEKLRQAINEYCWDDSYGAFKDNATDTTLHPQDANSMA LLFGVVDADRAASISERLTDNWTPIGAVAPELPENISPFISSFEIQGHLTVGQPQRALELIRRS WGWYYNNANGTQSTVIEGYLQNGTFGYRSDRGYYYDTAYVSHSHGWSSGPTSALTNYIVGISVT SPLGATWRIAPQFVDLQSAEGGFTTSLGKFQAGWSKTDKGYTLDFTVPHGTQGNLTLPFVSAAK PSIKIDGTEISRGVQYANSTATVTVSGGGTYKVEVQ Transglucosidase MPQKEFVPKTYQESSTGAQSSSSVHLRSSPEERSFDFSFEPIRENLFRVTFSSQDHPLPPYPSV 265 EHA19157.1 TKPATSLDGVHVSATGGSNQKTIEVGDVTASVEWSNTPVVSLSWKGTEKPLYRDLPLRSYVADS TGIAHYTEHDRDCLHVGLGEKRAPMDLTGRHFQLSATDSFGYDVYNTDPLYKHIPLLIKASPDG CVAIFSTTHGRGTWSVGSEVDGLWGHFKVYRQDYGGLEQYLIVGKTLKDVVRSYAELVGLPILV PRWAYGYISGGYKYTMLDDPPAHEALMEFADKLEEHGIPCSAHQMSSGYSIAETEPKVRNVFTW NKYRFPNPEEWIAKYHGRGIRLLSNIKPFLLASHPDFQKLIDGNGFFKDPESSKPGYMRLWSAG GATGGDGCHIDFSSAVAFKWWYDGVQSLKRAGIDAMWNDNNEYTLPDDDWKLALDEPTVSDAVK KGVENSVGQWGRAMHTELMGKASHDALLNIEPNHRPFVLTRSATAGTMRYAASTWSGDNVTSWE GMKGANALSLSAGISLLQCCGHDIGGFEGPQPSPELLLRWIQLGIHSPRFAINCFKTSPGNSSV GDVIEPWMYPEITPLVRDTIKRRYEILPYIYSLGLESHLTASPPQRWVGWGYESDPEVWTKALK SGDEQFWFGDTIMVGGVYEPGVSVAKLYLPRKANDQFDFGYVNMNEPYNYLASGQWVEVPSEWR KSIPLLARIGGAIPVGKPVHTRVPGDDTPASVAVKEVDDYRGVEIFPPLGSSHGQVFSTTWFED DGISLEARISEYTVTYSSTEEKVIVGFSRDEKSGFVPAWTDLDIILHNGDERRVVSDIGKTVEY KGKGSRGRVVYTLKN Transglucosidase MRLSTSSLLLSVSLLGKLALGLSAAEWRTQSIYFLLTDRFGRTDNSTTATCNTGDQIYCGGSWQ 266 EHA19519.1 GIINHLDYIQGMGFTAIWISPITEQLPQDTADGEAYHGYWQQKIYDVNSNFGTADDLKSLSDAL HARGMYLMVDVVPNHMGYAGNGNDVDYSVFDPFDSSSYFHPYCLITDWDNLTMVQDCWEGDTIV SLPDLNTTETAVRTIWYDWVADLVSNYSVDGLRIDSVLEVEPDFFPGYQEAAGVYCVGEVDNGN 
PALDCPYQEYLDGVLNYPIYWQLLYAFESSSGSISDLYNMIKSVASDCSDPTLLGNFIENHDNP RFASYTSDYSQAKNVLSYIFLSDGIPIVYAGEEQHYSGGKNDAFYTDSNTIAMRKGTSGSQVIT VLSNKGSSGSSYTLTLSGSGYTSGTKLIEAYTCTSVTVDSSGDIPVPMASGLPRVLLPASVVDS SSLCGGSGSNSSTTTTTTATSSSTATSKSASTSSTSTACTATSTSLAVTFEELVTTTYGEEIYL SGSISQLGDWDTSDAVKMSADDYTSSNPEWSVTVTLPVGTTFEYKFIKVESDGTVTWESDPNRE YTVPECGSGETVVDTWR Transglucosidase MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP 267 EHA20839.1 GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS QVLFVGGLQNLTKGGAWAENWVLEW Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 268 EHA21384.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS 
ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MAKSASQIHRAWWKECSVYQIWPASYKDSNDDGIGDIPGIISKLDYIKNIGVDIVWLCPSYKSP 269 EHA23512.1 QVDMGYDIADYYSIADEYGTVADVEKLIQGCHERGMKLLMDLVVNHTSDQNEWFKQSRSSKDNE YRNWYVWKPARYDEQGNRHPPNNWVSHFQGSAWEWDEHTGEYYLHLYATEQPDLNWEHPPVRKA VHDIMRFWLDKGADGFRMDVINFISKDQRFPDAPVKDPRTPWQWGDKYYANGPRLHEYLQDLGK ILKEYDAFSVGEMPFVRDTEEVLRAVRYDRNEINMIFNFEHVDIDHGTYDKFEPGSWKLTDLKA FFETWQKFMYNNDGWNALYWENHDQPRSIDRYAQAKEEFRTEAGKMLATVLALQSGTPFVYQGQ EIGMRNVPVEWDMNEYKDIDCLNHWHRLLKHRPDDIEAQKSARQEYQKKSRDNGRTPVQWSSAP NGGFTGPNAKPWMSVNPDYVRFNAEAQVNDPNSIYHYWAAVLGLRKKYLDIFVYGDYDLVDKDS QEVFAYARQFENQKALVLTNWTEKTLEWDATANGVKGIKDVLLNSYESAEAAKERFTGQKWSLR PYEAVVLLVEA Transglucosidase MPSTYLGALATLAVFPCLGQARSTWPLGSGLELSYQASQHQISIHQDNQTIFSTIPGQPFLSAS 270 EHA23680.1 AGKDQFVEDSGNFNITNVNQARCRGQNITQLAGIPRSDSVKNQVAVRGYLLDCGGEDIAYGMNF WVPRRFSDRVAFEATVDSEANASVPVDRLYLTFASHALEDFYGLGAQASFASMKNRSIPIFSRE QGVGRGDQPYTAIEDSQGFFSGGDQYTTYTAIPQYVSSDGRVFYLDENDTAYAVFDFQRSDAVT VRYDSLSVHGHLMQADTMLDAITMLTEYTGRMPTLPEWVDHGALLGIQGGQEKVNRIVKQGFEH DCPVAGVWLQDWSGTHLQSAPYGNMNISRLWWNWESDTSLYPTWAEFVQTLREQHGVRTLAYVN PFLANVSSKSDGYRRNLFLEASQHRYMVQNTTTNSTAIISSGKGIDAGILDLTNEDTRAWFADV LRTQVWSANISGCMWDFGEYTPITPDTSLANISTSAFFYHNQYPRDWAAYQRSVAAEMPLFHEM VTFHRSASMGANRHMNLFWVGDQATLWTRNDGIKSVVTIQGQMGISGYAHSHSDIGGYTTVFEP PTTSNSSGAIPRSAELLGRWGELGAVSSAVFRSHEGNVPSVNAQFYSNSTTYAYFAYNARLFRS LGPYRRRILNTESQRRGWPLLRMPVLYHPEDLRARQISYESFFLGRDLYVAPVLDEGHKSVDVY FPGHGANRTYTHVWTGQTYRAGQTAKVSAPFGKPAVFLVNGASSPELDVFLNFVRKENGTVLHA Transglucosidase MDPANEYCGLEDYGLVGDMHTCALVSKNGSVDSMCWPVFDSPSIFCRILDKEKGGHFSITPDRR 271 EHA25759.1 LKNPLSKQRYRPYTNMLETRWIHEEGVMNILDYFPIAKPKPHVVEKGLPQWCRCYQNKGSAQYQ ACRSGMVRKAECVRGEMEIEIELFPAFNYARDSHVAQQSSASDDAIQVYHFQAESQNLVVSVLG DKGDISEDDSDLSIEFELSDRPGHLGPGLVGKVILKEGQSITMLLHDQESITCDVDDLAPYLQQ IERTTGDFWSDWTSKCTFRGHYREQVERSLLVLKLLTYKPTGAIVAAPTFSLPEHIGGSRNWDY RYSWVRDAAFTVYVFLKNGYPEEAESYINFIFERIFPPMDKNPKPGEPFLPIMITIHGEREIPE MELDHLEGYRGSRPVRIGNGAATHIQLDIYGELMDSIYLYNKHAADISYDQWRAIRRMIDFVIQ 
IRHQPDQSIWEVRGPPQNFVYSKIMLWVALDRGLRLAEKRSNLPCPDRARWMHERDALYDEIMT KGYNAEKGFFCMSYENQDAMDAAVLIAPLVFFVAPNDPRLLSTIQRITEVPAKGGLSVANMVSR YDTGKVDDGVGGNEGAFLMVTFWLVEAMMRAARSKSYLPHDPFFQQLRKTATSQFDSILSFANH LGMFSEEVATSGEQIGNMPQAFSHLACVSAAMNLGGGGDR MSSPQQVYLLPLKDDGSPDVPGGYIYLPAPTNPPYLLRFVIEGSSSICREGALWVNIPEKGESF Transglucosidase NRSAFRSFSLSPDFNKNIQIDVPITSAGSFAFYVTFSPLPEFSVISTPTPEPTRTPTHYIDVSP 272 EHA26514.1 KLTLRGQDLPLNALSIYSVISKFMGQYPKEWEKHLNGISQRNYNMVHFTPLMKRGASNSPYSIF DQLQFDDAVFPNGEDDVARLISKMENEYGLLSLTDVVWNHTAHNSKWLEEHPEAGYSVETAPWL EAALELDTALLKFGQDLQNLGLPTEFQTVDELMKVMNVMRDKVIAGIRLWEFYAIDVKSDTHKI LDKWKTSKDIDLTDTNWAQLNLQDYKNWTLKQQATFIRDHAIPTSKQVLDRFSRAVDLQFGAAI LTALFGPHNPSTSDTSIVEESLSKILDEVNLPFYEEYDGDVSEIMNQVFNRIKYLRIDDHGPKL GAVTAQSPLIETYFTRLPLNDVTKKHKKEALALVNNGWIWNADALRDNAGPDSRAYLRREVIVW GDCVKLRYGSCRDDNPFLWDFMTDYTRLMAKYFSGFRIDNCHSTPLVVAEYLLDEARKVRPNLT VFAELFTGSEEADYIFVKRLGINALIREAMQAWSTGELSRLVHRHGGRPIGSFDLDLPSSGSSH AIASSGLDSGKEKVVHIRPTPVQALFMDCTHDNEMPAQKRTAKDTLPNGALVAMCASAIGSVIG YDEVYPRLVDLVHEHRLYFSEFSEAPETGLNSLEGGIGGIKKLLNELHTKMGIEGYDETHIHHD GEYITVHRVHPRTRKGVFLIAHTAFPGQDSRSVLAPTHLVGTQVKHIGTWLLEVDTSQTTKERI QADKSYLRGLPSQVKTFEGTKIEESGKDTIISVLNSFVAGSIALFETSMPSVEHASGLDNYITE GVDHAFSDLSLVDLNFALYRCEAEERDSSKGQDGAYDIPGHGPLVYAGLQGWWSVLENIIKYNE LGHPLCDHLRNGQWALDYIVARLEKLSHKEEHPALGRPAAWLQEKFQAVRQLPSFLLPRYFAII VQVAYNAAWKRGIQLLGSHIQKGQEFIHQLGMVSVQQTGYVNSASLWPTKKVPSLAAGLPHFAV DWARCWGRDVFISLRGLLLCTGRFEDAKEHITAFASVLKHGMIPNLLSSGKLPRYNSRDSVWFF LQSIQDYTEMAPDGLEILDHKVPRRFLPYDDVWFPFDDPRAYSQQSTISEIIQEVFQRHAQGLS FREYNAGPDLDMQMTQDGFQIDVKVDWETGLIFGGSQYNCGTWQDKMGESAKAGNKGVPGTPRD GAAIEITGLVYSALTWVAKLHERGIYKHDGVDIGGGKSISFEDWASRIRANFERCYYVPLQPKD DGQYDIDANIINRRGIYKDLYRSGKPYEDYQLRSNFPIAMTVAPDLFTASKALAALALADEVLV GPVGMATLDPSDLNYRPNYNNSEDSTDFATAKGRNYHQGPEWVWQRGYFLRAFLHFDLARRTTP AERTETYQQITRRLEGCKRALRESPWKGLTELTNKNGAYCADSSPTQAWSAGCLLDLYYDASRH SQS Transglucosidase MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS 273 EHA26552.1 VTYDFGINVAGIVSVDVASASSESAFIGVTFTESSMWISSEACDATQDAGLDTPLWFAVGQGAG 
LYTVEKKYNRGAFRYMTVVSNTTATVSLNSVKINYTASPTQDLRAYTGYFHSNDELLNRIWYAG AYTLQLCSIDPTTGDALVGLGVITSSETISLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD MSIALESVAVSTEDLYSVRTALESLYALQKPDGRLPYAGKPFFDTVSFTYHLHSLVGAASYYQY TGDRAWLTRYWGQYKKGVQWALSSVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS LAQTLNDNAPIRNWTTTAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN QSAIVSESLAARWGPYGAPAPEAGATVSPFIGGFELQAHYQAGQPDRALDLLRLQWGFMLDDPR MTNSTFIEGYSTDGSLAYAPYTNTPRVSHAHGWATGPTSALTIYTAGLRVTGPAGATWLYKPQP GNLTQVEAGFSTRLGSFASSFSRSGGRYQELSFSTPNGTTGSVELGDVSGQLVSDRGVKVQLVG GKASGLQGGKWKLSNN Transglucosidase MLGSLLLLLPLVGAAVIGPRANSQSCPGYKASNVQKQARSLTADLTLAGTPCNSYGKDLEDLKL 274 EHA26885.1 LVEYQTDERLHVMIYDADEEVYQVPESVLPRVGSDEDSEDSVLEFDYVEEPFSFTISKGDEVLF DSSASPLVFQSQYVNLRTWLPDDPYVYGLGEHSDPMRLPTYNYTRTLWNRDAYGTPNNTNLYGS HPVYYDHRGKSGTYGVFLLNSNGMDIKINQTTDGKQYLEYNLLGGVLDFYFFYGEDPKQASMEY SKIVGLPAMQSYWTFGFHQCRYGYRDVYELAEVVYNYSQAKIPLETMWTDIDYMDKRRVFTLDP QRFPLEKMRELVTYLHNHDQHYIVMVDPAVSVSNNTAYITGVRDDVFLHNQNGSLYEGAVWPGV TVFPDWFNEGTQDYWTAQFQQFFDPKSGVDIDALWIDMNEASNFCPYPCLDPAAYAISADLPPA APPVRPSSPIPLPGFPADFQPSSKRSVKRAQGDKGKKVGLPNRNLTDPPYTIRNAAGVLSMSTI ETDLIHAGEGYAEYDTHNLYGTTHIPMVGADVCGFGSNTTEELCARWASLGAFYTFYRNHNELG DISQEFYRWPTVAESARKAIDIRYKLLDYIYTALHRQSQSGEPFLQPQFYLYPEDSNTFANDRQ FFYGDALLVSPVLNEGSTSVDAYFPDDIFYDWYTGAVVRGHGENITLSNINITHIPLHIRGGNI IPVRTSSGMTTTEVRKQGFELIIAPDLDDTASGSLYLDDGDSLNPSSVTELEFTYSKGELHVKG TFGQKAVPKVEKCTLLGKSART Transglucosidase MCHKSNYSSPKWWKESVVYQVYPASFNCGKSTTTTNGWGDVTGIIEKVPYLKSLGVDIVWLSPI 275 EHA27488.1 YTSPQVDMGYDIADYKSIDPRYGTLADVDLLIKSLKDHDMRLMMDLVVNHTSDQHSWFVESASS KDSPKRDWYIWRPAKGFDEAGNPVPPNNWAQILGDTLSAWTWHEETQEFYLTLHTSAQAELNWE NPDVVTAVYDVMEFWLRRGICGFRMDVINFISKDQSFPDAPIIDPASKYQPGEQFYTNGPRFHE FMHGIYDNVLSKYDTITVGETPYVTDMKEIIKTVGSTAKELNMAFNFDHMEIEDIKTKGESKWS LRDWKLTELKGILSGWQKRMREWDGWNAIFLECHDQARSVSRYTNDSDEFRDRGAKLLALLETT LGGTIFLYQGQEIGMRNFPVEWGPDTEYKDIESVNFWKKSKELHPVGSEGLAQARTLLQKKARD HARTPMQWSADPHAGFTVPDATPWMRVNDDYRTVNVEAQMSFPWEMKGELSVWQYWQQALQRRK LHKGAFVYGDFEDLDYHNESVFAYSRTSADGKETWLPVPSWSLPKKRTLS Transglucosidase
MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS 276 EHA28539.1 SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV LDKNGQADGSLYVDDGETFDYKRGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF Transglucosidase MWSSWLLSALLATEALAVPYEEYILAPSSRDLAPASVRQVNGSVTNAAALTGAGGQATFNGVSS 277 XP_001389086.1 VTYDFGINVAGIVSVDVASASSDSAFIGVTFTESSMWISSEACDATQDAGLDTPLWFAVGQGAG LYTVEKKYNRGAFRYMTVVSNTTATVSLNSVKINYTASPTQDLRAYTGYFHSNDELLNRIWYAG AYTLQLCSIDPTTGDALVGLGVITSSETISLPQTDKWWTNYTITNGSSTLTDGAKRDRLVWPGD MSIALESVAVSTEDLYSVRTALESLYALQKPDGRLPYAGKPFFDTVSFTYHLHSLVGAASYYQY TGDRAWLTRYWGQYKKGVQWALSSVDSTGLANITASADWLRFGMGAHNIEANAILYYVLNDAIS LAQTLNDNAPIRNWTTTAARIKTVANELLWDDKNGLYTDNETTTLHPQDGNSWAVKANLTLSAN QSAIVSESLAARWGPYGAPAPEAGATVSPFIGGFELQAHYQAGQPDRALDLLRLQWGFMLDDPR MTNSTFIEGYSTDGSLAYAPYTNTPRVSHAHGWATGPTSALTIYTAGLRVTGPAGATWLYKPQP GNLTQVEAGFSTRLGSFASSFSRSGGRYQELSFSTPNGTTGSVELGDVSGQLVSDRGVKVQLVG GKASGLQGGKWKLSNN Transglucosidase MAKSASQIHRAWWKECSVYQIWPASYKDSNDDGIGDIPGIISKLDYIKNIGVDIVWLCPSYKSP 278 XP_001400455.1 QVDMGYDIADYYSIADEYGTVADVEKLIQGCHERGMKLLMDLVVNHTSDQNEWFKQSRSSKDNK YRNWYVWKPARYDEQGNRHPPNNWVSHFQGSAWEWDEHTGEYYLHLYATEQPDLNWEHPPVRKA VHDIMRFWLDKGADGFRMDVINFISKDQRFPDAPVKDPRTPWQWGDKYYANGPRLHEYLQDLGK
ILKEYDAFSVGEMPFVRDTEEVLRAVRYDRNEINMIFNFEHVDIDHGTYDKFEPGSWKLTDLKA FFETWQKFMYNNDGWNALYWENHDQPRSIDRYAQAKEEFRTEAGKMLATVLALQSGTPFVYQGQ EIGMRNVPVEWDMNEYKDIDCLNHWHRLLKHRPDDIEAQKSARQEYQKKSRDNGRTPVQWSSAP NGGFTGPNAKPWMSVSPDYVRFNAEAQVNDPNSIYHYWAAVLGLRKKYLDIFVYGDYDLVDKDS QEIFAYARQYENKKALVLTNWTEKTLEWDATTNGVKGVKDVLLNSYESAEAAKGRFSGQKWSLR PYEAVVLLVEA Transglucosidase MSFRSLLALSGLVCTGLANVISKRATLDSWLSNEATVARTAILNNIGADGAWVSGADSGIVVAS 279 XP_001390530.1 PSTDNPDYFYTWTRDSGLVLKTLVDLFRNGDTSLLSTIENYISAQAIVQGISNPSGDLSSGAGL GEPKFNVDETAYTGSWGRPQRDGPALRATAMIGFGQWLLDNGYTSTATDIVWPLVRNDLSYVAQ YWNQTGYDLWEEVNGSSFFTIAVQHRALVEGSAFATAVGSSCSWCDSQAPEILCYLQSFWTGSF ILANFDSSRSGKDANTLLGSIHTFDPEAACDDSTFQPCSPRALANHKEVVDSFRSIYTLNDGLS DSEAVAVGRYPEDTYYNGNPWFLCTLAAAEQLYDALYQWDKQGSLEVTDVSLDFFKALYSDAAT GTYSSSSSTYSSIVDAVKTFADGFVSIVETHAASNGSMSEQYDKSDGEQLSARDLTWSYAALLT ANNRRNSVVPASWGETSASSVPGTCAATSAIGTYSSVTVTSWPSIVATGGTTTTATPTGSGSVT STSKTTATASKTSTSTSSTSCTTPTAVAVTFDLTATTTYGENIYLVGSISQLGDWETSDGIALS ADKYTSSDPLWYVTVTLPAGESFEYKFIRIESDDSVEWESDPNREYTVPQACGTSTATVTDTWR Transglucosidase MSNRWTLLLSLVILLGCLVIPGVTVKHENFKTCSQSGFCKRNRAFADDAAAQGSSWASPYELDS 280 XP_001393899.1 SSIQFKDGQLHGTILKSVSPNEKVKLPLVVSFLESGAARVVVDEEKRMNGDIQLRHDSKARKER YNEAEKWVLVGGLELSKTATLRPETESGFTRVLYGPDNQFEAVIRHAPFSADFKRDGQTHVQLN NKGYLNMEHWRPKVEVEGEGEQQTQEDESTWWDESFGGNTDTKPRGPESVGLDITFPGYKHVFG IPEHADSLSLKETRGGEGNHEEPYRMYNADVFEYELSSPMTLYGAIPFMQAHRKDSTVGVFWLN AAETWVDIVKSTSSPNPLALGVGATTDTQSHWFSESGQLDVFVFLGPTPQEISKTYGELTGYTQ LPQHFAIAYHQCRWNYITDEDVKEVDRNFDKYQIPYDVIWLDIEYTDDRKYFTWDPLSFPDPIS MEEQLDESERKLVVIIDPHIKNQDKYSIVQEMKSKDLATKNKDGEIYDGWCWPGSSHWIDTFNP AAIKWWVSLFKFDKFKGTLSNVFIWNDMNEPSVFNGPETTMPKDNLHHGNWEHRDIHNVHGITL VNATYDALLERKKGEIRRPFILTRSYYAGAQRMSAMWTGDNQATWEHLAASIPMVLNNGIAGFP FAGADVGGFFQNPSKELLTRWYQAGIWYPFFRAHAHIDTRRREPYLIAEPHRSIISQAIRLRYQ LLPAWYTAFHEASVNGMPIVRPQYYAHPWDEAGFAIDDQLYLGSTGLLAKPVVSEEATTADIYL ADDEKYYDYFDYTVYQGAGKRHTVPAPMETVPLLMQGGHVIPRKDRPRRSSALMRWDPYTLVVV LDKNGQADGSLYVDDGETFDYERGAYIHRRFRFQESALVSEDVGTKGPKTAEYLKTMANVRVER 
VVVVDPPKEWQGKTSVTVIEDGASAASTASMQYHSQPDGKAAYAVVKNPNVGIGKTWRIEF Transglucosidase MPQKEFVPKTYQESSTGAQSSSSVHLRSSPEERSFDFSFEPIRENLFRVTFSSQDHPLPPYPSV 281 XP_001399012.1 TKPATSLDGVHVSATGGSNQKTIEVGDVTASVEWSNTPVVSLSWKGTEKPLYRDLPLRSYVADS TGIAHYTEHDRDCLHVGLGEKRAPMDLTGRHFQLSATDSFGYDVYNTDPLYKHIPLLIKASPDG CVAIFSTTHGRGTWSVGSEVDGLWGHFKVYRQDYGGLEQYLIVGKTLKDVVRSYAELVGLPILV PRWAYGYISGGYKYTMLDDPPAHEALMEFADKLEEHGIPCSAHQMSSGYSIAETEPKVRNVFTW NKYRFPNPEEWIAKYHGRGIRLLSNIKPFLLASHPDFQKLIDGNGFFKDPESSKPGYMRLWSAG GATGGDGCHIDFSSAVAFKWWYDGVQSLKRAGIDAMWNDNNEYTLPDDDWKLALDEPTVSDAVK KGVENSVGQWGRAMHTELMGKASHDALLNIEPNHRPFVLTRSATAGTMRYAASTWSGDNVTSWE GMKGANALSLSAGISLLQCCGHDIGGFEGPQPSPELLLRWIQLGIHSPRFAINCFKTSPGNSSV GDVIEPWMYPEITPLVRDTIKRRYEILPYIYSLGLESHLTASPPQRWVGWGYESDPEVWTKALK SGDEQFWFGDTIMVGGVYEPGVSVAKLYLPRKANDQFDFGYVNMNEPYNYLASGQWVEVPSEWR KSIPLLARIGGAIPVGKPVHTRVPGDDTPASVAVKEVDDYRGVEIFPPLGSSHGQVFSTTWFED DGISLEARISEYTVTYSSTEEKVIVGFSRDEKSGFVPAWTDLDIILHNGDERRVVSDIGKTVEY KGKGSRGRVVYTLKN Transglucosidase MVKLTHLLARAWLVPLAYGASQSLLSTTAPSQPQFTIPASADVGAQLIANIDDPQAADAQSVCP 282 XP_001402053.1 GYKASKVQHNSRGFTASLQLAGRPCNVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTNASWY FLSENLVPRPKASLNASVSQSDLFVSWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFV TALPEEYNLYGLGEHITQFRLQRNANLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQNGSY IPVKSSEADASQDYISLSHGVFLRNSHGLEILLRSQKLIWRTLGGGIDLTFYSGPAPADVTRQY LTSTVGLPAMQQYNTLGFHQCRWGYNNWSDLADVVANFEKFEIPLEYIWTDIDYMHGYRNFDND QHRFSYSEGDEFLSKLHESGRYYVPIVDAALYIPNPENASDAYATYDRGAADDVFLKNPDGSLY IGAVWPGYTVFPDWHHPKAVDFWANELVIWSKKVAFDGVWYDMSEVSSFCVGSCGTGNLTLNPA HPSFLLPGEPGDIIYDYPEAFNITNATEAASASAGASSQAAATATTTSTSVSYLRTTPTPGVRN VEHPPYVINHDQEGHDLSVHAVSPNATHVDGVEEYDVHGLYGHQGLNATYQGLLEVWSHKRRPF IIGRSTFAGSGKWAGHWGGDNYSKWWSMYYSISQALSFSLFGIPMFGADTCGFNGNSDEELCNR WMQLSAFFPFYRNHNELSTIPQEPYRWASVIEATKSAMRIRYAILPYFYTLFDLAHTTGSTVMR ALSWEFPNDPTLAAVETQFMVGPAIMVVPVLEPLVNTVKGVFPGVGHGEVWYDWYTQAAVDAKP GVNTTISAPLGHIPVYVRGGNILPMQEPALTTREARQTPWALLAALGSNGTASGQLYLDDGESI YPNATLHVDFTASRSSLRSSAQGRWKERNPLANVTVLGVNKEPSAVTLNGQAVFPGSVTYNSTS QVLFVGGLQNLTKGGAWAENWVLEW
Transglucosidase MFVESAKKALLALSLLAASAQAVPRVRRQGASSSFDYKSQIVRGVNLGGWLVTEPWITPSLYDS 283 A2RAR6.1 TGGGAVDEWTLCQILGKDEAQAKLSSHWSSFITQSDFDRMAQAGLNHVRIPIGYWAVAPIDGEP YVSGQIDYLDQAVTWARAAGLKVLVDLHGAPGSQNGFDNSGHRGPIQWQQGDTVNQTMTAFDAL ARRYAQSDTVTAIEAVNEPNIPGGVNEDGLKNYYYGALADVQRLNPSTTLFMSDGFQPVESWNG FMQGSNVVMDTHHYQVFDTGLLSMSIDDHVKTACSLATQHTMQSDKPVVVGEWTGALTDCAKYL NGVGNAARYDGTYMSTTKYGDCTGKSTGSVADFSADEKANTRRYIEAQLEAYEMKSGWLFWTWK TEGAPGWDMQDLLANQLFPTSPTDRQYPHQCS Transglucosidase MPGHSRSRDRLSPSSELDDADPVYSPSVYQREHYYNNDSLFDSADDDYTRTPRNVYSYETHDEY 284 A2QX52.1 HDDDDDDDDVHEHDHDHEYDDKFEEPWVPLRAQVEGDQWREGFETAIPKEEDVTQAKEYQYQMS GALGDDGPPPLPSDALGRGKGKKRLDRETRRQRRKERLAAFFKHKNGSASAGLVSGDALAKLLG SQDGDEDCLSHLGTERADSMSQKNLEGGRQRKLPVLSEEPMMLRPFPAVAPTGQTQGRVVSGAQ LEEGGPGMEMRHRGGGGPPAEGLLQKEGDWDGSTKGSSTSARPSFWKRYHKTFIFFAILIVLAA IAIPVGIIEARRLHGTSGGDNSSNSNLKGISRDSIPAYARGTYLDPFTWYDTTDFNVTFTNATV GGLSIMGLNSTWNDSAQANENVPPLNEKFPYGSQPIRGVNLGGWLSIEPFIVPSLFDTYTSSEG IIDEWTLSEKLGDSAASVIEKHYATFITEQDFADIRDAGLDHVRIQFSYWAIKTYDGDPYVPKI AWRYLLRAIEYCRKYGLRVNLDPHGIPGSQNGWNHSGRQGTIGWLNGTDGELNRQRSLEMHDQL SQFFAQDRYKNVVTIYGLVNEPLMLSLPVEKVLNWTTEATNLVQKNGIKAWVTVHDGFLNLDKW DKMLKTRPSNMMLDTHQYTVFNTGEIVLNHTRRVELICESWYSMIQQINITSTGWGPTICGEWS QADTDCAQYVNNVGRGTRWEGTFSLTDSTQYCPTASEGTCSCTQANAVPGVYSEGYKTFLQTYA EAQMSAFESAMGWFYWTWATESAAQWSYRTAWKNGYMPKKAYSPSFKCGDTIPSFGNLPEYY Transglucosidase MSSPQQVYLLPLKDDGSPDVPGGYIYLPAPTNPPYLLRFVIEGSSSICREGALWVNIPEKGESF 285 XP_001389036.2 NRSAFRSFSLSPDFNKNIQIDVPITSAGSFAFYVTFSPLPEFSVLSTPTPEPTRTPTHYIDVSP KLTLRGQDLPLNALSIYSVISKFMGQYPKEWEKHLNGISQRNYNMVHFTPLMKRGASNSPYSIF DQLQFDDAVFPNGEDDVARLISKMENEYGLLSLTDVVWNHTAHNSKWLEEHPEAGYSVETAPWL EAALELDTALLKFGQDLQNLGLPTEFQTVDELMKVMNVMRDKVIAGIRLWEFYAIDVKSDTHKI LDKWKTSKDIDLTDTNWAQLNLQDYKNWTLKQQATFIRDHAIPTSKQVLGRFSRAVDLQFGAAI LTALFGPHNPSTSDTSIVEESLSKILDEVNLPFYEEYDGDVSEIMNQVFNRIKYLRIDDHGPKL GAVTAQSPLIETYFTRLPLNDVTKKHKKEALALVNNGWIWNADALRDNAGPDSRAYLRREVIVW GDCVKLRYGSCRDDNPFLWDFMTDYTRLMAKYFSGFRIDNCHSTPLVVAEYLLDEARKVRPNLT
VFAELFTGSEEADYIFVKRLGINALIREAMQAWSTGELSRLVHRHGGRPIGSFDLDLPSSGSSH AIASSGLDSGKEKVVHIRPTPVQALFMDCTHDNEMPAQKRTAKDTLPNGALVAMCASAIGSVIG YDEVYPRLVDLVHEHRLYFSEFSEAPETGLNSLEGGIGGIKKLLNELHTKMGIEGYDETHIHHD GEYITVHRVHPRTRKGVFLIAHTAFPGQDSRSVLAPTHLVGTQVKHIGTWLLEVDTSQTTKERI QADKSYLRGLPSQVKTFEGTKIEESGKDTIISVLNSFVAGSIALFETSMPSVEHASGLDNYITE GVDHAFSDLSLVDLNFALYRCEAEERDSSKGQDGAYDIPGHGPLVYAGLQGWWSVLENIIKYNE LGHPLCDHLRNGQWALDYIVARLEKLSHKEEHPALGRPAAWLQEKFQAVRQLPSFLLPRYFAII VQVAYNAAWKRGIQLLGPHIQKGQEFIHQLGMVSVQQTGYVNSASLWPTKKVPSLAAGLPHFAV DWARCWGRDVFISLRGLLLCTGRFEDAKEHITAFASVLKHGMIPNLLSSGKLPRYNSRDSVWFF LQSIQDYTEMAPDGLEILDHKVPRRFLPYDDVWFPFDDPRAYSQQSTISEIIQEVFQRHAQGLS FREYNAGPDLDMQMTQDGFQIDVKVDWETGLIFGGSQYNCGTWQDKMGESAKAGNKGVPGTPRD GAAIEITGLVYSALTWVAKLHERGIYKHDGVDIGGGKSISFEDWASRIRANFERCYYVPLQPKD DGQYDIDANIINRRGIYKDLYRSGKPYEDYQLRSNFPIAMTVAPDLFTASKALAALALADEVLV GPVGMATLDPSDLNYRPNYNNSEDSTDFATAKGRNYHQGPEWVWQRGYFLRAFLHFDLARRTTP AERTETYQQITRRLEGCKRALRESPWKGLTELTNKNGAYCADSSPTQAWSAGCLLDLYYDASRH SQMRIWYGPFKLRYKNIDAIDYYEEKLRRLDEKIQVARQKEYPPTEVAFVTMESIAASQMVVQA ILDPHPMQLLARLAPAPADVVWKNTYLPRSRRMMQSWFITVVIGFLTVFWSVLLIPVAYLLEYE TLHKVFPQLADALARNPLAKSLVQTGLPTLVLSLLTVAVPYLYNWLSNQQGMMSRGDIELSVIS KTFFFSFFNLFLVFTVFGTATTFYGFWENLRDAFKDATTIAFALAKTLENFAPFYINFLCLQGI GLFPFRLLEFGSVAMYPINFLAAKTPRDYAELSTPPTFSYGYSIPQTVLSLIICVVYSVFPSSW LICLFGLIYFTIGKFIYKYQLLYAMDHQQHSTGRAWPMICSRILMGLMVFQLAMIGVLALRRAI TRSLLIVPLLMATVWFSYFFARTYEPLMKFIALKSIDRERPGGGDISPSPSSTFSPPSGLDRDS FPIRIGGQELGLRLRKYVNPSLILPLHDAWLPGRTMVPELQGELEHRNPGNNAADESV Transglucosidase MLGSLLLLLPLVGAAVIGPRANSQSCPGYKASNVQKQARSLTADLTLAGTPCNSYGKDLEDLKL 286 XP_001389510.2 LVEYQTDERLHVMIYDADEEVYQVPESVLPRVGSDEDSEDSVLEFDYVEEPFSFTISKGDEVLF DSSASPLVFQSQYVNLRTWLPDDPYVYGLGEHSDPMRLPTYNYTRTLWNRDAYGTPNNTNLYGS HPVYYDHRGKSGTYGVFLLNSNGMDIKINQTTDGKQYLEYNLLGGVLDFYFFYGEDPKQASMEY SKIVGLPAMQSYWTFGFHQCRYGYRDVYELAEVVYNYSQAKIPLETMWTDIDYMDKRRVFTLDP QRFPLEKMRELVTYLHNHDQHYIVMVDPAVSVSNNTAYITGVRDDVFLHNQNGSLYEGAVWPGV TVFPDWFNEGTQDYWTAQFQQFFDPKSGVDIDALWIDMNEASNFCPYPCLDPAAYAISADLPPA
APPVRPSSPIPLPGFPADFQPSSKRSVKRAQGDKGKKVGLPNRNLTDPPYTIRNAAGVLSMSTI ETDLIHAGEGYAEYDTHNLYGTMMSSASRTAMQARRPDVRPLVITRSTFAGAGAHVGHWLGDNF SDWVHYRISIAQILSFASMFQIPMVGADVCGFGSNTTEELCARWASLGAFYTFYRNHNELGDIS QEFYRWPTVAESARKAIDIRYKLLDYIYTALHRQSQTGEPFLQPQFYLYPEDSNTFANDRQFFY GDALLVSPVLNEGSTSVDAYFPDDIFYDWYTGAVVRGHGENITLSNINITHIPLHIRGGNIIPV RTSSGMTTTEVRKQGFELIIAPDLDDTASGSLYLDDGDSLNPSSVTELEFTYSKGELHVKGTFG QKAVPKVEKCTLLGKSARTFKGFALDAPVNFKLK Transglucosidase MPSTYLGALATLAVFPCLGQARSTWPLGSGLELSYQASQHQISIHQDNQTIFSTIPGQPFLSAS 287 XP_001391128.2 AGKDQFVEDSGNFNITNVNQARCRGQNITQLAGIPRSDSVKNQVAVRGYLLDCGGEDIAYGMNF WVPRRFSDRVAFEASVDSEANASFASMKNRSIPIFSREQGVGRGDQPYTAIEDSQGFFSGGDQY TTYTAIPQYVSSDGRVFYLDENDTAYAVFDFQRSDAVTVRYDSLSVHGHLMQADTMLDAITMLT EYTGRMPTLPEWVDHGALLGIQGGQEKVNRIVKQGFEHDCPVAGVWLQDWSGTHLQSAPYGNMN ISRLWWNWESDTSLYPTWAEFVQTLREQHGVRTLAYVNPFLANVSSKSDGYRRNLFLEASQHRY MVQNTTTNSTAIISSGKGIDAGILDLTNEDTRAWFADVLRTQVWSANISGCMWDFGEYTPITPD TSLANISTSAFFYHNQYPRDWAAYQRSVAAEMPLFHEMVTFHRSASMGANRHMNLFWVGDQATL WTRNDGIKSVVTIQGQMGISGYAHSHSDIGGYTTVFEPPTTSNSSGAIPRSAELLGRWGELGAV SSAVFRSHEGNVPSVNAQFYSNSTTYAYFAYNARLFRSLGPYRRRILNTESQRRGWPLLRMPVL YHPEDLRARQISYESFFLGRDLYVAPVLDEGHKSVEVYFPGHSANRTYTHVWTGQTYRAGQTAK VSAPFGKPAVFLVDGASSPELDVFLDFVRKENGTVLYA Transglucosidase MDPANEYCGLEDYGLVGDMHTCALVSKNGSVDSMCWPVFDSPSIFCRILDKEKGGHFSITPDRR 288 XP_001395384.2 LKNPLSKQRYRPYTNMLETRWIHEEGVMNILDYFPIAKPKPHVVEKGLPQWCRCYQNKGSAQYQ ACRSGMVRKAECVRGEMEIEIELFPAFNYARDSHVAQQSSASDDAIQVYHFQAESQNLVVSVLG DRGDISGDDSDLSIEFELSDRPGHLGPGLVGKVTLKEGQSITMLLHDQESITCNVEDLAPYLQQ IERTTGDFWSDWTSKCTFRGHYREQVERSLLVLKLLTYKPTGAIVAAPTFSLPEHIGGSRNWDY RYSWVRDAAFTVYVFLKNGYPEEAESYINFIFERIFPPMDKNPKPGEPFLPIMITIHGEREIPE MELEHLEGYRGSRPVRIGNGAATHIQLDIYGELMDSIYLYNKHAADISYDQWRAIRRMIDFVIQ IRHQPDQSIWEVRGPPQNFVYSKIMLWVALDRGLRLAEKRSNLPCPDRARWMHERDALYDEIMT KGYNSEKGFFCMSYENQDAMDAAVLIAPLVFFVAPNDPRLLSTIQKITEVPAKGGLSVANMVSR YDTGKVDDGVGGNEGAFLMVTFWLVEAMMRAARSKAYLPHDPFFQQLRKTATSQFDSILSFANH LGMFSEEVATSGEQIGNMPQAFSHLACVSAAMNLGGGGDR Transglucosidase
MCNKSNYSSPKWWKESVVYQVYPASFNCGKSTTNTNGWGDVTGIIEKVPYLESLGVDIVWLSPI 289 XP_001396506.2 YTSPQVDMGYDIADYESIDPRYGTLADVDLLIKTLKDHDMKLMMDLVVNHTSDQHSWFVESANS KDSPKRDWYIWRPAKGFDEAGNPVPPNNWAQILGDTLSAWTWHAETQEFYLTLHTSAQAELNWE NPDVVTAVYDVMEFWLRRGICGFRMDVINFISKDQSFPDAPIIDPASKYQPGEQFYTNGPRFHE FMHGIYDNVLSKYDTITVGETPYVTDMKEIIKTVGSTAKELNMAFNFDHMEIEDIKTKGESKWS LRDWKLTELKGILSGWQKRMREWDGWNAIFLECHDQARSVSRYTNDSDEFRDRGAKLLALLETT LGGTIFLYQGQEIGMRNFPVEWDPDTEYKDIESVNFWKKSKELHPVGSEGLAQARTLLQKKARD HARTPMQWSADPHAGFTVPDATPWMRVNDDYGTVNVEAQMSFPWEMKGELSVWQYWQQALQRRK LHKGAFVYGDFEDLDYHNELVFAYSRTSADGKETWLVAMNWTTDAVEWTVPSGIHVTRWVSSTL QTAPLMAGQSTVTLRALEGVVGCCS Transglucosidase MGLCVGWRWILLCVVMGAAVCGTDKTATMRWHKLLPGVLALLPLSVAQSCWRNTTCSGPTESAF 290 XP_001398938.2 SGPWEKNIFAPSSRTVNPEKLFLITQPDKTEEYSPFALHGNGSLVVYDFGKEVGGIVSVNFSST GSGALGVAFTEAKNWIGEWSDSSNGGFKGPDGALYGNFTEAGSHYYVMPDKSLRGGFRYLTLFL ITSDNSTIQIEDVNLEIGFQPTWSNLKAYQGYFHSNDDLLNKIWYTGAYTLQTNEVPTDTGRQI PAMAVGWANNCTLGPGDTIIVDGAKRDRAVWPGDMGIAVPSAFVSLGDLDSVKNALQVMYDTQN NSTGAFDESGPPLSQKDSDTYHMWTMVGTYNYMLFTNDSDFLERNWEGYQKAMDYIYGKVTYPS GLLNVTGTRDWARWQQGYNNSEAQMILYHTLNTGAELATWAGDSGDLSSTWTSRAEKLRQAINE YCWDDSYGAFKDNATDTTLHPQDANSMALLFGVVDADRAASISERLTDNWTPIGAVAPELPENI SPFISSFEIQGHLTVGQPQRALELIRRSWGWYYNNANGTQSTVIEGYLQNGTFGYRSDRGYYYD TAYVSHSHGWSSGPTSALTNYIVGISVTSPLGATWRIAPQFVDLQSAEGGFTTSLGKFQAGWSK TDKGYTLDFTVPHGTQGNLTLPFVSAAKPSIKIDGTEISRGVQYANSTATVTVSGGGTYKVEVQ Beta-glucosidase MAMQLRSLLLCVLLLLLGFALADTNAAARIHPPVVCANLSRANFDTLVPGFVFGAATASYQVEG 292 Protein ID: AANLDGRGPSIWDTFTHKHPEKIADGSNGDVAIDQYHRYKEDVAIMKDMGLESYRFSISWSRVL H9ZGE3|H9ZGE3 PNGTLSGGINKKGIEYYNNLINELLHNGIEPLVTLFHWDVPQTLEDEYGGFLSNRIVNDFEEYA ELCFKKFGDRVKHWTTLNEPYTFSSHGYAKGTHAPGRCSAWYNQTCFGGDSATEPYLVTHNLLL AHAAAVKLYKTKYQAYQKGVIGITVVTPWFEPASEAKEDIDAVFRALDFIYGWFMDPLTRGDYP QSMRSLVGERLPNFTKKESKSLSGSFDYIGINYYSARYASASKNYSGHPSYLNDVNVDVKTELN GVPIGPQAASSWLYFYPKGLYDLLCYTKEKYNDPIIYITENGVDEFNQPNPKLSLCQLLDDSNR IYYYYHHLCYLQAAIKEGVKVKGYFAWSLLDNFEWDNGYTVRFGINYVDYDNGLKRHSKHSTHW FKSFLKKSSRNTKKIRRCGNNNTSATKFVF UGT73-251_5
MDSPPQKPHFLLFPFMAQGHMIPMIDLAKLLAQRGAIITVVTTPHNAARYHSVLARAIDSGLHI 293 Disclosed in HVLQLQFPCNEGGLPEGCENFDLLPSLGSASTFFRATFLLYEPSEKVFEELIPRPTCIISDMCL Itkin et al., PWTVRLAQKYHVPRLVFYSLSCFFLLCMRSLKNNQALISSKSDSELVTFSDLPDPVEFLKSQLP 2016, and WO KSNDEEMAKFGYEIGEADRQSHGVIVNVFEEMEPKYLAEYRKERESPEKVWCVGPVSLCNDNKL 2016/038617 DKAQRGNKASIDERECIEWLDGQQPSSVVYVSLGSLCNLVTAQLIELGLGLEASNKPFIWVIRK GNITEELQKWLVEYDFEEKTKGRGLVILGWAPQVLILSHPAIGCFLTHCGWNSSIEGISAGMPM ITWPLFADQVFNEKLIVEILRIGVSVGMETAMHWGEEEEKGVVVKREKVREAIERAMDGDEREE RRERCKELAEMAKRAVEEGGSSHRNLTLLTEDILVNGGGQERMDDADDFPTIVN UGT73-251-6 MDSPPHRPHFLLFPFMAQGHMIPMIDLAKLLAQRGAIVTILTTPHNAARTHSVLARAIDSGLQI 294 Disclosed in RVRPLQFPCKEAGLPEGCENLDLLPSLGSASTFFRATCLLYDPSEKLFEELSPRPTCIISDMCL Itkin et al., PWTIRLAQKYHVPRLVFYSLSCFFLLCMRSLKNNPALISSKSDSEFVTFSDLPDPVEFLKSELP 2016, and WO KSTDEDLVKFSYEMGEADRKSYGVILNIFEEMEPKYLAEYGNERESPEKVWCVGPVSLCNDNKL 2016/038617 DKAQRGNKASIDERECIKWLGGQQPSSVVYASLGSLCNLVTAQFIELGLGLEASNKPFIWVIRK GNITEELQKWLVEYDFEEKTKGRGLVILGWAPQVLILSHPSIGCFLTHCGWNSSIEGISAGVPM VTWPLFSDQVFNEKLIVQILRIGVSVGAETAMNWGEEEEKGVVVKREKVREAIERMMDGDEREE RRERCKELAETAKRAIEEGGSSHRNLTLLIEDIGTSLRRL UGT73-327-2 MGSAGVELKVAFLPFAAPGHMIPLMNIARLFAMHGADVTFITTPATASRFQNVVDSDLRRGHKI 295 Disclosed in KLHTFQLPSAEAGLPPGVESFNECTSKEMTEKLFGAFEMLNGDIEQFLKGAKVDCIVSDTILVW Itkin et al., TLDAAARLGIPRIAFRSSGFFSECIHHSLRCHKPHKKVGSDTEPFIFPGLPHKIEITRLNIPQW 2016, and WO YSEEGYIQHIEKMKEMDKKSYAVLLNTFYELEADYVEYFESVIGLKTWIVGPVSLWANEGGGKN 2016/038617 DSRTENNNAELMEWLDSKQPNSVLYVSFGSMTKFPSAQVLEIAHGLEDSGCHFIWVVRKMNESE AADEEFPEGFEERVRESKRGLIIRDWAPQELILNHAAVGGFVTHCGWNSILESVCAGRPIIAWP LSAEQFFNEKFVTRVLKVGVSIGVRKWWGSTSSETLDVVKRDRIAEAVARLMGDDREVVEMRDG VRELSHAAKRAIKEGGSSHSTLLSLIHELKTMKFKRQSSNVDG UGT74-345-2 MDETTVNGGRRASDVVVFAFPRHGHMSPMLQFSKRLVSKGLRVTFLITTSATESLRLNLPPSSS 296 Itkin et al., LDLQVISDVPESNDIATLEGYLRSFKATVSKTLADFIDGIGNPPKFIVYDSVMPWVQEVARGRG 2016, and WO LDAAPFFTQSSAVNHILNHVYGGSLSIPAPENTAVSLPSMPVLQAEDLPAFPDDPEVVMNFMTS 2016/038617 QFSNFQDAKWIFFNTFDQLECKVVNWMADRWPIKTVGPTIPSAYLDDGRLEDDRAFGLNLLKPE
DGKNTRQWQWLDSKDTASVLYISFGSLAILQEEQVKELAYFLKDTNLSFLWVLRDSELQKLPHN FVQETSHRGLVVNWCSQLQVLSHRAVSCFVTHCGWNSTLEALSLGVPMVAIPQWVDQTTNAKFV ADVWRVGVRVKKKDERIVTKEELEASIRQVVQGEGRNEFKHNAIKWKKLAKEAVDEGGSSDKNI EEFVKTIA UGT75-281-2 MMRNHHFLLVCFPSQGYINPSLQLARRLISLGVNVTFATTVLAGRRMKNKTHQTATTPGLSFAT 297 Itkin et al., FSDGFDDETLKPNGDLTHYFSELRRCGSESLTHLITSAANEGRPITFVIYSLLLSWAADIASTY 2016, and WO DIPSALFFAQPATVLALYFYYFHGYGDTICSKLQDPSSYIELPGLPLLTSQDMPSFFSPSGPHA 2016/038617 FILPPMREQAEFLGRQSQPKVLVNTFDALEADALRAIDKLKMLAIGPLIPSALLGGNDSSDASF CGDLFQVSSEDYIEWLNSKPDSSVVYISVGSICVLSDEQEDELVHALLNSGHTFLWVKRSKENN EGVKQETDEEKLKKLEEQGKMVSWCRQVEVLKHPALGCFLTHCGWNSTIESLVSGLPVVAFPQQ IDQATNAKLIEDVWKTGVRVKANTEGIVEREEIRRCLDLVMGSRDGQKEEIERNAKKWKELARQ AIGEGGSSDSNLKTFLWEIDLEI UGT85-269-4 MAEQAHDLLHVLLFPFPAEGHIKPFLCLAELLCNAGFHVTFLNTDYNHRRLHNLHLLAARFPSL 298 Itkin et al., HFESISDGLPPDQPRDILDPKFFISICQVTKPLFRELLLSYKRISSVQTGRPPITCVITDVIFR 2016, and WO FPIDVAEELDIPVFSFCTFSARFMFLYFWIPKLIEDGQLPYPNGNINQKLYGVAPEAEGLLRCK 2016/038617 DLPGHWAFADELKDDQLNFVDQTTASSRSSGLILNTFDDLEAPFLGRLSTIFKKIYAVGPIHSL LNSHHCGLWKEDHSCLAWLDSRAAKSVVFVSFGSLVKITSRQLMEFWHGLLNSGKSFLFVLRSD VVEGDDEKQVVKEIYETKAEGKWLVVGWAPQEKVLAHEAVGGFLTHSGWNSILESIAAGVPMIS CPKIGDQSSNCTWISKVWKIGLEMEDRYDRVSVETMVRSIMEQEGEKMQKTIAELAKQAKYKVS KDGTSYQNLECLIQDIKKLNQIEGFINNPNFSDLLRV UGT85-269-1 MVQPRVLLFPFPALGHVKPFLSLAELLSDAGIDVVFLSTEYNHRRISNTEALASRFPTLHFETI 300 Itkin et al., PDGLPPNESRALADGPLYFSMREGTKPRFRQLIQSLNDGRWPITCIITDIMLSSPIEVAEEFGI 2016, and WO PVIAFCPCSARYLSIHFFIPKLVEEGQIPYADDDPIGEIQGVPLFEGLLRRNHLPGSWSDKSAD 2016/038617 ISFSHGLINQTLAAGRASALILNTFDELEAPFLTHLSSIFNKIYTIGPLHALSKSRLGDSSSSA SALSGFWKEDRACMSWLDCQPPRSVVFVSFGSTMKMKADELREFWYGLVSSGKPFLCVLRSDVV SGGEAAELIEQMAEEEGAGGKLGMVVEWAAQEKVLSHPAVGGFLTHCGWNSTVESIAAGVPMMC WPILGDQPSNATWIDRVWKIGVERNNREWDRLTVE UGT94-289-1 MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI 301 Itkin et al., QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA 2016, and WO PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK 2016/038617
IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE EAGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKMDEMVAAIS LFLKI UGT94-289-2 MDAQQGHTTTILMLPWVGYGHLLPFLELAKSLSRRKLFHIYFCSTSVSLDAIKPKLPPSISSDD 302 Itkinetal., SIQLVELRLPSSPELPPHLHTTNGLPSHLMPALHQAFVMAAQHFQVILQTLAPHLLIYDILQPW 2016,andWO APQVASSLNIPAINFSTTGASMLSRTLHPTHYPSSKFPISEFVLHNHWRAMYTTADGALTEEGH 2016/038617 KIEETLANCLHTSCGVVLVNSFRELETKYIDYLSVLLNKKVVPVGPLVYEPNQEGEDEGYSSIK NWLDKKEPSSTVFVSFGTEYFPSKEEMEEIAYGLELSEVNFIWVLRFPQGDSTSTIEDALPKGF LERAGERAMVVKGWAPQAKILKHWSTGGLVSHCGWNSMMEGMMFGVPIIAVPMHLDQPFNAGLV EEAGVGVEAKRDSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEI SLLRKKAPCSI UGT94-289-3 MDAAQQGDTTTILMLPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSFSDSI 303 Itkinetal., QFVELHLPSSPEFPPHLHTTNGLPPTLMPALHQAFSMAAQHFESILQTLAPHLLIYDSLQPWAP 2016,andWO RVASSLKIPAINFNTTGVFVISQGLHPIHYPHSKFPFSEFVLHNHWKAMYSTADGASTERTRKR 2016/038617 GEAFLYCLHASCSVILINSFRELEGKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKNW LDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVNFIWVVRFPQGDNTSGIEDALPKGFLE RAGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHVDQPFNAGLVEE AGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKFDEMVAEISL LLKI UDP- MDTRKRSIRILMFPWLAHGHISAFLELAKSLAKRNFVIYICSSQVNLNSISKNMSSKDSISVKL 304 Disclosedin glycotransferase VELHIPTTILPPPYHTTNGLPPHLMSTLKRALDSARPAFSTLLQTLKPDLVLYDFLQSWASEEA Noguchiet (330) ESQNIPAMVFLSTGAAAISFIMYHWFETRPEEYPFPAIYFREHEYDNFCRFKSSDSGTSDQLRV al., SDCVKRSHDLVLIKTFRELEGQYVDFLSDLTRKRFVPVGPLVQEVGCDMENEGNDIIEWLDGKD 2008)(Plant RRSTVFSSFGSEYFLSANEIEEIAYGLELSGLNFIWVVRFPHGDEKIKIEEKLPEGFLERVEGR J.2008 GLVVEGWAQQRRILSHPSVGGFLSHCGWSSVMEGVYSGVPIIAVPMHLDQPFNARLVEAVGFGE May;54(3):415 EVVRSRQGNLDRGEVARVVKKLVMGKSGEGLRRRVEELSEKMREKGEEEIDSLVEELVTVVRRR -27) ERSNLKSENSMKKLNVMDDGE UDP- ATGGATACAAGAAAGAGAAGCATCAGGATTCTAATGTTCCCATGGCTTGCTCATGGCCATATCT 305 Disclosedin glycotransferase 
CAGCATTCCTCGAGCTGGCGAAGTCACTTGCCAAAAGAAACTTCGTCATTTACATTTGTTCTTC Noguchiet (330) ACAAGTAAATCTAAATTCCATCAGCAAGAACATGTCATCAAAAGACTCCATTTCCGTAAAACTT al.,2008 GTTGAGCTTCACATTCCCACCACCATACTTCCCCCTCCTTACCACACCACCAATGGCCTCCCAC CCCACCTCATGTCCACCCTCAAGAGAGCCCTCGACAGTGCCCGGCCCGCCTTCTCCACCCTCCT CCAAACCCTCAAGCCCGACTTGGTTTTATACGATTTCCTCCAGTCGTGGGCCTCGGAGGAGGCC GAGTCGCAGAATATACCAGCCATGGTGTTTCTGAGTACCGGAGCTGCAGCGATTTCTTTTATTA TGTACCATTGGTTTGAGACCAGACCGGAGGAGTACCCTTTTCCGGCTATATACTTCCGGGAACA CGAGTATGATAACTTCTGCCGTTTTAAGTCTTCCGACAGCGGTACTAGTGATCAATTGAGAGTC AGCGATTGCGTTAAACGGTCGCACGATTTGGTTCTGATCAAGACATTCCGTGAACTGGAAGGAC AATACGTAGATTTTCTCTCCGACTTGACTCGGAAGAGATTCGTACCAGTTGGCCCCCTTGTTCA GGAGGTAGGTTGTGATATGGAGAATGAAGGAAATGACATCATCGAATGGCTCGACGGGAAAGAC CGTCGTTCGACGGTTTTCTCCTCATTCGGGAGCGAGTACTTCTTGTCTGCCAATGAGATCGAAG AGATAGCTTATGGGCTGGAGCTAAGCGGGCTTAACTTCATCTGGGTTGTTAGGTTTCCTCATGG CGACGAGAAAATCAAGATTGAGGAGAAACTGCCGGAAGGGTTTCTTGAGAGAGTGGAAGGAAGA GGGTTGGTGGTGGAGGGATGGGCACAGCAGAGGAGAATATTGTCACATCCGAGTGTTGGAGGGT TTTTGAGCCACTGTGGGTGGAGTTCTGTGATGGAAGGGGTGTATTCCGGTGTGCCGATTATTGC CGTGCCGATGCATCTTGACCAGCCGTTCAATGCTAGGTTGGTGGAGGCGGTGGGGTTTGGGGAG GAGGTGGTGAGGAGTAGACAAGGAAATCTTGACAGAGGAGAGGTGGCGAGGGTGGTGAAGAAGC TGGTTATGGGGAAAAGTGGGGAGGGGTTACGGCGGAGGGTGGAGGAGTTGAGTGAGAAGATGAG AGAGAAAGGGGAGGAGGAGATTGATTCACTGGTGGAGGAATTGGTGACGGTGGTTAGGAGGAGA GAGAGATCGAATCTCAAGTCTGAGAATTCTATGAAGAAATTGAATGTGATGGATGATGGAGAAT AG UGT98protein[S. MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI 306 grosvenorii] QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE EAGVGVEAKRDSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEIS LLRKKAPCSIAAALEHHHHHH UGT98gene[S. 
CTCGAATTCATGGATGCCCAGCGAGGTCACACCACAACCATTTTGATGTTTCCATGGCTCGGCT 307 grosvenorii] ATGGCCATCTTTCGGCTTTCCTAGAGTTGGCCAAAAGCCTCTCAAGGAGGAACTTCCATATCTA CTTCTGTTCAACCTCTGTTAACCTCGACGCCATTAAACCAAAGCTTCCTTCTTCTTCCTCTTCT GATTCCATCCAACTTGTGGAACTTTGTCTTCCATCTTCTCCTGATCAGCTCCCTCCTCATCTTC ACACAACCAACGCCCTCCCCCCTCACCTCATGCCCACTCTCCACCAAGCCTTCTCCATGGCTGC CCAACACTTTGCTGCCATTTTACACACACTTGCTCCGCATCTCCTCATTTACGACTCTTTCCAA CCTTGGGCTCCTCAACTAGCTTCATCCCTCAACATTCCAGCCATCAACTTCAATACTACGGGAG CTTCAGTCCTGACCCGAATGCTTCACGCTACTCACTACCCAAGTTCTAAATTCCCAATTTCAGA GTTTGTTCTCCACGATTATTGGAAAGCCATGTACAGCGCCGCCGGTGGGGCTGTTACAAAAAAA GACCACAAAATTGGAGAAACACTTGCGAATTGCTTGCATGCTTCTTGTAGTGTAATTCTAATCA ATAGTTTCAGAGAGCTCGAGGAGAAATATATGGATTATCTCTCCGTTCTCTTGAACAAGAAAGT TGTTCCGGTTGGTCCTTTGGTTTACGAACCGAATCAAGACGGGGAAGATGAAGGTTATTCAAGC ATCAAAAATTGGCTTGACAAAAAGGAACCGTCCTCCACCGTCTTCGTTTCATTTGGAAGCGAAT ACTTCCCGTCAAAGGAAGAAATGGAAGAGATAGCCCATGGGTTAGAGGCGAGCGAGGTTCATTT CATCTGGGTCGTTAGGTTTCCTCAAGGAGACAACACCAGCGCCATTGAAGATGCCTTGCCGAAG GGGTTTCTGGAGAGGGTGGGAGAGAGAGGGATGGTGGTGAAGGGTTGGGCTCCCCAGGCGAAGA TACTGAAGCATTGGAGCACAGGGGGATTCGTGAGCCACTGTGGATGGAACTCGGTGATGGAAAG CATGATGTTTGGCGTTCCCATAATAGGGGTTCCGATGCATCTGGACCAGCCCTTTAACGCCGGA CTCGCGGAAGAAGCTGGCGTCGGCGTGGAAGCCAAGCGAGATTCGGACGGCAAAATTCAAAGAG AAGAAGTTGCAAAGTCGATCAAAGAAGTGGTGATTGAGAAAACCAGGGAAGACGTGAGGAAGAA AGCAAGAGAAATGGGTGAGATTTTGAGGAGTAAAGGAGATGAGAAAATTGATGAGTTGGTGGCT GAAATTTCTCTTTTGCGCAAAAAGGCCCCATGTTCAATTGCGGCCGCACTCGAGCACCACCACC ACCACCACTGA CYP1798 MEMSSSVAATISIWMVVVCIVGVGWRVVNWVWLRPKKLEKRLREQGLAGNSYRLLFGDLKERAA 308 MEEQANSKPINFSHDIGPRVFPSMYKTIQNYGKNSYMWLGPYPRVHIMDPQQLKTVFTLVYDIQ KPNLNPLIKFLLDGIVTHEGEKWAKHRKIINPAFHLEKLKDMIPAFFHSCNEIVNEWERLISKE GSCELDVMPYLQNLAADAISRTAFGSSYEEGKMIFQLLKELTDLVVKVAFGVYIPGWRFLPTKS NNKMKEINRKIKSLLLGIINKRQKANMEEGEAGQSDLLGILMESNSNEIQGEGNNKEDGMSIED VIEECKVFYIGGQETTARLLIWTMILLSSHTEWQERARTEVLKVFGNKKPDFDGLSRLKVVTMI LNEVLRLYPPASMLTRIIQKETRVGKLTLPAGVILIMPIILIHRDHDLWGEDANEFKPERFSKG VSKAAKVQPAFFPFGWGRICMGQNFAMIEAKMALSLILQRFSFELSSSYVHAPTVVFTTQPQHG AHIVLRKL 
Epoxidehydrolase MDAIEHRTVSVNGINSHVAEKGEGPVVLLLHGFPELWYSWRHQILALSSLGYRAVAPDLRGYGD 309 TDAPGSISSYTCFHIVGDLVALVESLGMDRVFVVAHDWGAMIAWCLCLFRPEMVKAFVCLSVPF RQRNPKMKPVQSMRAFFGDDYYICRFQNPGEIEEEMAQVGAREVLRGILTSRRPGPPILPKGQA FRARPGASTALPSWLSEKDLSFFASKYDQKGFTGPLNYYRAMDLNWELTASWTGVQVKVPVKYI VGDVDMVFTTPGVKEYVNGGGFKKDVPFLQEVVIMEGVGHFINQEKQEISSHIMDFISKF Epoxidehydrolase MDEIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD 310 TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWGAIIAWYFCLFRPDRIKALVNLSVQF IPRNPAIPFIEGFRTAFGDDFYICRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG FRAIPPPENLPSWLTEEDINFYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV GDSDLTYHFPGAKEYIHNGGFKRDVPLLEEVVVVKDACHFFNQERPQEINAHIHDFINKF Epoxidehydrolase MENIEHTTVQTNGIKMHVAAIGTGPPVLLLHGFPELWYSWRHQLLYLSSAGYRAIAPDLRGYGD 311 TDAPPSPSSYTALHIVGDLVGLLDVLGIEKVFLIGHDWGAIIAWYFCLFRPDRIKALVNLSVQF FPRNPTTPFVKGFRAVLGDQFYMVRFQEPGKAEEEFASVDIREFFKNVLSNRDPQAPYLPNEVK FEGVPPPALAPWLTPEDIDVYADKFAETGFTGGLNYYRAFDRTWELTAPWTGARIGVPVKFIVG DLDLTYHFPGAQKYIHGEGFKKAVPGLEEVVVMEDTSHFINQERPHEINSHIHDFFSKFC Epoxidehydrolase MDQIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD 312 TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWAAIIAWYFCLFRPDRIKALVNLSVQF IPRNPAIPFIEGFRTAFGDDFYMCRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG FRAIPPPENLPSWLTEEDINYYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV GDSDLTYHFPGAKEYIHNGGFKKDVPLLEEVVVVKDACHFINQERPQEINAHIHDFINKF Epoxidehydrolase MEKIEHSTIATNGINMHVASAGSGPAVLFLHGFPELWYSWRHQLLYLSSLGYRAIAPDLRGFGD 313 TDAPPSPSSYTAHHIVGDLVGLLDQLGVDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVHF TPRNPAISPLDGFRLMLGDDFYVCKFQEPGVAEADFGSVDTATMFKKFLTMRDPRPPIIPNGFR SLATPEALPSWLTEEDIDYFAAKFAKTGFTGGFNYYRAIDLTWELTAPWSGSEIKVPTKFIVGD LDLVYHFPGVKEYIHGGGFKKDVPFLEEVVVMEGAAHFINQEKADEINSLIYDFIKQF Epoxidehydrolase MEKIEHTTISTNGINMHVASIGSGPAVLFLHGFPELWYSWRHQLLFLSSMGYRAIAPDLRGFGD 314 TDAPPSPSSYTAHHIVGDLVGLLDQLGIDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVHF LRRHPSIKFVDGFRALLGDDFYFCQFQEPGVAEADFGSVDVATMLKKFLTMRDPRPPMIPKEKG FRALETPDPLPAWLTEEDIDYFAGKFRKTGFTGGFNYYRAFNLTWELTAPWSGSEIKVAAKFIV 
GDLDLVYHFPGAKEYIHGGGFKKDVPLLEEVVVVDGAAHFINQERPAEISSLIYDFIKKF CYP87D18 MWTVVLGLATLFVAYYIHWINKWRDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLDVHPFIQKK 315 VERYGPIFKTCLAGRPVVVSADAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIHK YIRSITLNHFGAEALRERFLPFIEASSMEALHSWSTQPSVEVKNASALMVFRTSVNKMFGEDAK KLSGNIPGKFTKLLGGFLSLPLNFPGTTYHKCLKDMKEIQKKLREVVDDRLANVGPDVEDFLGQ ALKDKESEKFISEEFIIQLLFSISFASFESISTTLTLILKLLDEHPEVVKELEAEHEAIRKARA DPDGPITWEEYKSMTFTLQVINETLRLGSVTPALLRKTVKDLQVKGYIIPEGWTIMLVTASRHR DPKVYKDPHIFNPWRWKDLDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILCTKYRWTKL GGGRIARAHILSFEDGLHVKFTPKE CYP87D18gene ATGTGGACTGTCGTGCTCGGTTTGGCGACGCTGTTTGTCGCCTACTACATCCATTGGATTAACA 316 sequence AATGGAGAGATTCCAAGTTCAACGGAGTTCTGCCGCCGGGCACCATGGGTTTGCCGCTCATCGG AGAGACGATTCAACTGAGTCGACCCAGTGACTCCCTCGACGTTCACCCTTTCATCCAGAAAAAA GTTGAAAGATACGGGCCGATCTTCAAAACATGTCTGGCCGGAAGGCCGGTGGTGGTGTCGGCGG ACGCAGAGTTCAACAACTACATAATGCTGCAGGAAGGAAGAGCAGTGGAAATGTGGTATTTGGA TACGCTCTCCAAATTTTTCGGCCTCGACACCGAGTGGCTCAAAGCTCTGGGCCTCATCCACAAG TACATCAGAAGCATTACTCTCAATCACTTCGGCGCCGAGGCCCTGCGGGAGAGATTTCTTCCTT TTATTGAAGCATCCTCCATGGAAGCCCTTCACTCCTGGTCTACTCAACCTAGCGTCGAAGTCAA AAATGCCTCCGCTCTCATGGTTTTTAGGACCTCGGTGAATAAGATGTTCGGTGAGGATGCGAAG AAGCTATCGGGAAATATCCCTGGGAAGTTCACGAAGCTTCTAGGAGGATTTCTCAGTTTACCAC TGAATTTTCCCGGCACCACCTACCACAAATGCTTGAAGGATATGAAGGAAATCCAGAAGAAGCT AAGAGAGGTTGTAGACGATAGATTGGCTAATGTGGGCCCTGATGTGGAAGATTTCTTGGGGCAA GCCCTTAAAGATAAGGAATCAGAGAAGTTCATTTCAGAGGAGTTCATCATCCAACTGTTGTTTT CTATCAGTTTTGCTAGCTTTGAGTCCATCTCCACCACTCTTACTTTGATTCTCAAGCTCCTTGA TGAACACCCAGAAGTAGTGAAAGAGTTGGAAGCTGAACACGAGGCGATTCGAAAAGCTAGAGCA GATCCAGATGGACCAATTACTTGGGAAGAATACAAATCCATGACTTTTACATTACAAGTCATCA ATGAAACCCTAAGGTTGGGGAGTGTCACACCTGCCTTGTTGAGGAAAACAGTTAAAGATCTTCA AGTAAAAGGATACATAATCCCGGAAGGATGGACAATAATGCTTGTCACCGCTTCACGTCACAGA GACCCAAAAGTCTATAAGGACCCTCATATCTTCAATCCATGGCGTTGGAAGGACTTGGACTCAA TTACCATCCAAAAGAACTTCATGCCTTTTGGGGGAGGCTTAAGGCATTGTGCTGGTGCTGAGTA CTCTAAAGTCTACTTGTGCACCTTCTTGCACATCCTCTGTACCAAATACCGATGGACCAAACTT 
GGGGGAGGAAGGATTGCAAGAGCTCATATATTGAGTTTTGAAGATGGGTTACATGTGAAGTTCA CACCCAAGGAATGA AtCPRprotein MTSALYASDLFKQLKSIMGTDSLSDDVVLVIATTSLALVAGFVVLLWKKTTADRSGELKPLMIP 317 KSLMAKDEDDDLDLGSGKTRVSIFFGTQTGTAEGFAKALSEEIKARYEKAAVKDDYAADDDQYE EKLKKETLAFFCVATYGDGEPTDNAARFYKWFTEENERDIKLQQLAYGVFALGNRQYEHFNKIG IVLDEELCKKGAKRLIEVGLGDDDQSIEDDFNAWKESLWSELDKLLKDEDDKSVATPYTAVIPE YRVVTHDPRFTTQKSMESNVANGNTTIDIHHPCRVDVAVQKELHTHESDRSCIHLEFDISRTGI TYETGDHVGVYAENHVEIVEEAGKLLGHSLDLVFSIHADKEDGSPLESAVPPPFPGPCTLGTGL ARYADLLNPPRKSALVALAAYATEPSEAEKLKHLTSPDGKDEYSQWIVASQRSLLEVMAAFPSA KPPLGVFFAAIAPRLQPRYYSISSSPRLAPSRVHVTSALVYGPTPTGRIHKGVCSTWMKNAVPA EKSHECSGAPIFIRASNFKLPSNPSTPIVMVGPGTGLAPFRGFLQERMALKEDGEELGSSLLFF GCRNRQMDFIYEDELNNFVDQGVISELIMAFSREGAQKEYVQHKMMEKAAQVWDLIKEEGYLYV CGDAKGMARDVHRTLHTIVQEQEGVSSSEAEAIVKKLQTEGRYLRDVW. AtCPRgene ATGACTTCTGCTTTGTATGCTTCCGATTTGTTTAAGCAGCTCAAGTCAATTATGGGGACAGATT 318 sequence CGTTATCCGACGATGTTGTACTTGTGATTGCAACGACGTCTTTGGCACTAGTAGCTGGATTTGT GGTGTTGTTATGGAAGAAAACGACGGCGGATCGGAGCGGGGAGCTGAAGCCTTTGATGATCCCT AAGTCTCTTATGGCTAAGGACGAGGATGATGATTTGGATTTGGGATCCGGGAAGACTAGAGTCT CTATCTTCTTCGGTACGCAGACTGGAACAGCTGAGGGATTTGCTAAGGCATTATCCGAAGAAAT CAAAGCGAGATATGAAAAAGCAGCAGTCAAAGATGACTATGCTGCCGATGATGACCAGTATGAA GAGAAATTGAAGAAGGAAACTTTGGCATTTTTCTGTGTTGCTACTTATGGAGATGGAGAGCCTA CTGACAATGCTGCCAGATTTTACAAATGGTTTACGGAGGAAAATGAACGGGATATAAAGCTTCA ACAACTAGCATATGGTGTGTTTGCTCTTGGTAATCGCCAATATGAACATTTTAATAAGATCGGG ATAGTTCTTGATGAAGAGTTATGTAAGAAAGGTGCAAAGCGTCTTATTGAAGTCGGTCTAGGAG ATGATGATCAGAGCATTGAGGATGATTTTAATGCCTGGAAAGAATCACTATGGTCTGAGCTAGA CAAGCTCCTCAAAGACGAGGATGATAAAAGTGTGGCAACTCCTTATACAGCTGTTATTCCTGAA TACCGGGTGGTGACTCATGATCCTCGGTTTACAACTCAAAAATCAATGGAATCAAATGTGGCCA ATGGAAATACTACTATTGACATTCATCATCCCTGCAGAGTTGATGTTGCTGTGCAGAAGGAGCT TCACACACATGAATCTGATCGGTCTTGCATTCATCTCGAGTTCGACATATCCAGGACGGGTATT ACATATGAAACAGGTGACCATGTAGGTGTATATGCTGAAAATCATGTTGAAATAGTTGAAGAAG CTGGAAAATTGCTTGGCCACTCTTTAGATTTAGTATTTTCCATACATGCTGACAAGGAAGATGG CTCCCCATTGGAAAGCGCAGTGCCGCCTCCTTTCCCTGGTCCATGCACACTTGGGACTGGTTTG 
GCAAGATACGCAGACCTTTTGAACCCTCCTCGAAAGTCTGCGTTAGTTGCCTTGGCGGCCTATG CCACTGAACCAAGTGAAGCCGAGAAACTTAAGCACCTGACATCACCTGATGGAAAGGATGAGTA CTCACAATGGATTGTTGCAAGTCAGAGAAGTCTTTTAGAGGTGATGGCTGCTTTTCCATCTGCA AAACCCCCACTAGGTGTATTTTTTGCTGCAATAGCTCCTCGTCTACAACCTCGTTACTACTCCA TCTCATCCTCGCCAAGATTGGCGCCAAGTAGAGTTCATGTTACATCCGCACTAGTATATGGTCC AACTCCTACTGGTAGAATCCACAAGGGTGTGTGTTCTACGTGGATGAAGAATGCAGTTCCTGCG GAGAAAAGTCATGAATGTAGTGGAGCCCCAATCTTTATTCGAGCATCTAATTTCAAGTTACCAT CCAACCCTTCAACTCCAATCGTTATGGTGGGACCTGGGACTGGGCTGGCACCTTTTAGAGGTTT TCTGCAGGAAAGGATGGCACTAAAAGAAGATGGAGAAGAACTAGGTTCATCTTTGCTCTTCTTT GGGTGTAGAAATCGACAGATGGACTTTATATACGAGGATGAGCTCAATAATTTTGTTGATCAAG GCGTAATATCTGAGCTCATCATGGCATTCTCCCGTGAAGGAGCTCAGAAGGAGTATGTTCAACA TAAGATGATGGAGAAGGCAGCACAAGTTTGGGATCTAATAAAGGAAGAAGGATATCTCTATGTA TGCGGTGATGCTAAGGGCATGGCGAGGGACGTCCACCGAACTCTACACACCATTGTTCAGGAGC AGGAAGGTGTGAGTTCGTCAGAGGCAGAGGCTATAGTTAAGAAACTTCAAACCGAAGGAAGATA CCTCAGAGATGTCTGGTGA cucurbitadienol MWRLKVGAESVGENDEKWLKSISNHLGRQVWEFCPDAGTQQQLLQVHKARKAFHDDRFHRKQSS 319 synthase[S. 
DLFITIQYGKEVENGGKTAGVKLKEGEEVRKEAVESSLERALSFYSSIQTSDGNWASDLGGPMF grosvenorii] LLPGLVIALYVTGVLNSVLSKHHRQEMCRYVYNHQNEDGGWGLHIEGPSTMFGSALNYVALRLL Seq59,SgCbQ GEDANAGAMPKARAWILDHGGATGITSWGKLWLSVLGVYEWSGNNPLPPEFWLFPYFLPFHPGR protein MWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYAVPYHEIDWNKSRNTCAKEDLYYPHPKM QDILWGSLHHVYEPLFTRWPAKRLREKALQTAMQHIHYEDENTRYICLGPVNKVLNLLCCWVED PYSDAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSIQAIVSTKLVDNYGPTLRKAHDFV KSSQIQQDCPGDPNVWYRHIHKGAWPFSTRDHGWLISDCTAEGLKAALMLSKLPSETVGESLER NRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATMEALTLF KKLHPGHRTKEIDTAIVRAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCLA IRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPTPLH RAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE cucurbitadienol ATGTGGAGGTTAAAGGTCGGAGCAGAAAGCGTTGGGGAGAATGATGAGAAATGGTTGAAGAGCA 320 synthaseSgCbQ TAAGCAATCACTTGGGACGCCAGGTGTGGGAGTTCTGTCCGGATGCCGGCACCCAACAACAGCT genesequence CTTGCAAGTCCACAAAGCTCGTAAAGCTTTCCACGATGACCGTTTCCACCGAAAGCAATCTTCC GATCTCTTTATCACTATTCAGTATGGAAAGGAAGTAGAAAATGGTGGAAAGACAGCGGGAGTGA AATTGAAAGAAGGGGAAGAGGTGAGGAAAGAGGCAGTAGAGAGTAGCTTAGAGAGGGCATTAAG TTTCTACTCAAGCATCCAGACAAGCGATGGGAACTGGGCTTCGGATCTTGGGGGGCCCATGTTT TTACTTCCGGGTCTGGTGATTGCCCTCTACGTTACAGGCGTCTTGAATTCTGTTTTATCCAAGC ACCACCGGCAAGAGATGTGCAGATATGTTTACAATCACCAGAATGAAGATGGGGGGTGGGGTCT CCACATCGAGGGCCCAAGCACCATGTTTGGTTCCGCACTGAATTATGTTGCACTCAGGCTGCTT GGAGAAGACGCCAACGCCGGGGCAATGCCAAAAGCACGTGCTTGGATCTTGGACCACGGTGGCG CCACCGGAATCACTTCCTGGGGCAAATTGTGGCTTTCTGTACTTGGAGTCTACGAATGGAGTGG CAATAATCCTCTTCCACCCGAATTTTGGTTATTTCCTTACTTCCTACCATTTCATCCAGGAAGA ATGTGGTGCCATTGTCGAATGGTTTATCTACCAATGTCATACTTATATGGAAAGAGATTTGTTG GGCCAATCACACCCATAGTTCTGTCTCTCAGAAAAGAACTCTACGCAGTTCCATATCATGAAAT AGACTGGAATAAATCTCGCAATACATGTGCAAAGGAGGATCTGTACTATCCACATCCCAAGATG CAAGATATTCTGTGGGGATCTCTCCACCACGTGTATGAGCCCTTGTTTACTCGTTGGCCTGCCA AACGCCTGAGAGAAAAGGCTTTGCAGACTGCAATGCAACATATTCACTATGAAGATGAGAATAC CCGATATATATGCCTTGGCCCTGTCAACAAGGTACTCAATCTGCTTTGTTGTTGGGTTGAAGAT 
CCCTACTCCGACGCCTTCAAACTTCATCTTCAACGAGTCCATGACTATCTCTGGGTTGCTGAAG ATGGCATGAAAATGCAGGGTTATAATGGGAGCCAGTTGTGGGACACTGCTTTCTCCATCCAAGC AATCGTATCCACCAAACTTGTAGACAACTATGGCCCAACCTTAAGAAAGGCACACGACTTCGTT AAAAGTTCTCAGATTCAGCAGGACTGTCCTGGGGATCCTAATGTTTGGTACCGTCACATTCATA AAGGTGCATGGCCATTTTCAACTCGAGATCATGGATGGCTCATCTCTGACTGTACAGCAGAGGG ATTAAAGGCTGCTTTGATGTTATCCAAACTTCCATCCGAAACAGTTGGGGAATCATTAGAACGG AATCGCCTTTGCGATGCTGTAAACGTTCTCCTTTCTTTGCAAAACGATAATGGTGGCTTTGCAT CATATGAGTTGACAAGATCATACCCTTGGTTGGAGTTGATCAACCCCGCAGAAACGTTTGGAGA TATTGTCATTGATTATCCGTATGTGGAGTGCACCTCAGCCACAATGGAAGCACTGACGTTGTTT AAGAAATTACATCCCGGCCATAGGACCAAAGAAATTGATACTGCTATTGTCAGGGCGGCCAACT TCCTTGAAAATATGCAAAGGACGGATGGCTCTTGGTATGGATGTTGGGGGGTTTGCTTCACGTA TGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGGACATATAATAATTGCCTTGCC ATTCGCAAGGCTTGCGATTTTTTACTATCTAAAGAGCTGCCCGGCGGTGGATGGGGAGAGAGTT ACCTTTCATGTCAGAATAAGGTATACACAAATCTTGAAGGAAACAGACCGCACCTGGTTAACAC GGCCTGGGTTTTAATGGCCCTCATAGAAGCTGGCCAGGCTGAGAGAGACCCAACACCATTGCAT CGTGCAGCAAGGTTGTTAATCAATTCCCAGTTGGAGAATGGTGATTTCCCCCAACAGGAGATCA TGGGAGTCTTTAATAAAAATTGCATGATCACATATGCTGCATACCGAAACATTTTTCCCATTTG GGCTCTTGGAGAGTATTGCCATCGGGTTTTGACTGAATAA cucurbitadienol MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCAADAAAVTPHQLLQIQNARNHFHRNRFHRK 321 synthaseCpep2 QSSDLFLAIQYEKEIAKGGKGKEAVKVKEGEEVGKEAVKSTLERALSFYTAVQTSDGNWASDLG protein GPMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYIYNHQNEDGGWGLHIEGTSTMFGSALNYVA LRLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYS LPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDL YYPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNM LCCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTL RKAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTM VGEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAAT MEALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGR TYNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGE RDPAPLHRGARLVMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE. 
cucurbitadienol ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG 322 synthaseCpep2 TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGCCGACGCCGCCGCCGTCACTCC genesequence TCACCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCGAAGGGCGGAAAAGGGA AAGAGGCGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGA GAGGGCACTAAGTTTCTACACAGCCGTGCAGACGAGCGATGGGAATTGGGCCTCGGATCTTGGA GGGCCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAG TTTTGTCCAAGCACCACCGCGTAGAGATGTGCAGATATATTTACAATCACCAGAATGAAGATGG AGGGTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTATGTTGCA CTTAGGCTGCTTGGAGAAGACGCCGATGGCGGAGACGATGGTGCAATGACAAAAGCACGTGCTT GGATCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTGCT TGGAGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGC CTACCATTTCATCCAGGACGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACT TATATGGGAAGAGATTTGTTGGCCCAATCACTCCCAAAGTTCTTTCTCTAAGACAAGAGCTCTA CACGGTTCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTA TACTATCCACATCCCAAGATGCAAGACATACTATGGGGATCTATCTACCATGTATATGAGCCAT TGTTCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATAT TCACTATGAAGATGAAAATAGTCGCTATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATG CTTTGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATG ACTATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGTTACAATGGCAGCCAGTTGTGGGA CACTGCTTTCTCCATCCAAGCCATTGTAGCTACCAAACTTGTAGACAGCTATGCCCCAACTTTA AGAAAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATG TTTGGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCAT CTCTGACTGCACGGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATG GTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAA ATGATAACGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAA CCCAGCAGAAACATTCGGAGACATTGTCATCGACTATCCGTATGTGGAGTGCACCGCAGCAACA ATGGAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAG CTATTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAAAGGGCGGATGGCTCTTGGTATGGGTG TTGGGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATCAAGGGATTGGTGGCTGCAGGAAGA 
ACATATAATAGCTGCCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCG GCGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAA CAAGCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCCGGCCAGGGTGAG AGAGACCCAGCACCATTGCACCGTGGAGCAAGGTTGGTAATGAATTCTCAACTGGAGAATGGTG ATTTCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATA CCGAAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA cucurbitadienol MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCAADAAAVTPHQLLQIQNARNHFHRNRFHRK 323 synthaseCpep4 QSSDLFLAIQYEKEIAKGGKGKEAVKVKEGEEVGKEAVKSTLERALSFYTAVQTSDGNWASDLG protein GPMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYIYNHQNEDGGWGLHIEGTSTMFGSALNYVA LRLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFLLLPYS LPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDL YYPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNM LCCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTL RKAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTM VGEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYSYVECTAAT MEALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGR TYNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGE RDPAPLHRAARLVMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE. 
cucurbitadienol ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG 324 synthaseCpep4 TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGCCGACGCCGCCGCCGTCACTCC genesequence TCACCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCGAAGGGCGGAAAAGGGA AAGAGGCGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGA GAGGGCACTAAGTTTCTACACAGCCGTGCAGACGAGCGATGGGAATTGGGCCTCGGATCTTGGA GGGCCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAG TTTTGTCCAAGCACCACCGCGTAGAGATGTGCAGATATATTTACAATCACCAGAATGAAGATGG AGGGTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTATGTTGCA CTTAGGCTGCTTGGAGAAGACGCCGATGGCGGAGACGATGGTGCAATGACAAAAGCACGTGCTT GGATCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTGCT TGGAGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTTGCTTCTCCCTTACAGC CTACCATTTCATCCAGGACGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACT TATATGGGAAGAGATCTGTTCGCCCAATCACTCCCAAAGTTCTTTCTCTAAGACAAGAGCTCTA CACGGTTCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTA TACTATCCACATCCCAAGATGCAAGACATACTATGGGGATCTATCTACCATGTATATGAGCCAT TGTTCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATAT TCACTATGAAGATGAAAATAGTCGCTATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATG CTTTGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATG ACTATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGTTACAATGGCAGCCAGTTGTGGGA CACTGCTTTCTCCATCCAAGCCATTGTAGCTACCAAACTTGTAGACAGCTATGCCCCAACTTTA AGAAAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATG TTTGGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCAT CTCTGACTGCACGGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATG GTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAA ATGATAACGGCGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAA CCCAGCAGAAACATTCGGAGACATTGTCATCGACTATTCGTATGTGGAGTGCACCGCAGCAACA ATGGAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAG CTATTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAAAGGGCGGATGGCTCTTGGTATGGGTG TTGGGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGA 
ACATATAATAGCTGTCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCG GCGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAA CAAGCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAG AGAGACCCAGCACCATTGCACCGTGCAGCAAGGTTGGTAATGAATTCTCAACTGGAGAATGGCG ATTTCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATA CCGAAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA cucurbitadienol MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ 325 synthaseCmaxl SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLESALGFYSAVQTSDGNWASDLGGP protein MFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSAMVG EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE cucurbitadienol ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGGAGGATGAGAAATGGGTGAAGAGCG 326 synthasegene TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGACGCCGCCGCCGACACTCCTCA CCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCACAATCGTTTCCACCGGAAGCAG TCTTCCGATCTCTTTCTGGCTATTCAATATGAAAAGGAAATAGCAAAGGGCGCAAAAGGTGGAG CGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGAAAGGGC ACTCGGTTTCTACTCGGCCGTGCAGACAAGAGATGGGAATTGGGCCTCGGATCTTGGAGGGCCC TTGTTTTTACTTCCGGGTCTCGTGATTGCCCTTCATGTCACAGGCGTCTTGAATTCAGTTTTGT CCAAGCACCACCGCGTAGAGATGTGCAGATATCTTTACAATCACCAGAATGAAGATGGAGGGTG GGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTGAATTACGTTGCACTAAGG CTGCTTGGAGAAGACGCCGATGGCGGAGACGGTGGCGCAATGACAAAAGCACGTGCTTGGATCT TGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTACTTGGAGT 
GTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGCCTACCA TTTCATCCAGGAAGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCAATGTCTTACTTATATG GGAAGAGATTTGTTGGGCCAATCACTCCCAAAGTTCTTTCTCTAAGGCAAGAGCTCTACACAAT TCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTGTACTAT CCACATCCCAAGATGCAAGACATTCTATGGGGATCCATCTACCATGTATATGAGCCATTGTTCA CTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAGCTGCAATGAAACATATTCACTA TGAAGATGAAAATAGTCGATATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATGCTTTGT TGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATGACTATC TCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGCTACAATGGCAGCCAGTTGTGGGACACTGC TTTCTCCATCCAAGCCATCGTAGCCACCAAACTTGTAGACAGCTATGCCCCAACTTTAAGAAAA GCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATGTTTGGT TCCGTCATATTCATAAAGGTGCTTGGCCACTTTCGACACGAGATCATGGATGGCTCATCTCCGA CTGTACAGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATGGTTGGG GAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAAATGATA ATGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAACCCAGC TGAAACATTCGGAGACATTGTCATTGACTATCCGTATGTGGAGTGCACCGCAGCAACAATGGAA GCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAGCTATTG GCAAGGCAGCCAACTTCCTTGAGAAAATGCAGAGGGCGGATGGCTCTTGGTACGGGTGTTGGGG GGTTTGTTTTACGTATGCGGGTTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGAACATAT AATAGCTGCCTTGCCATTCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCGGCGGTG GATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGGAACAAGCC ACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAGAGAGAC CCAGCACCATTGCACCGTGCAGCAAGGTTGCTAATGAATTCCCAATTGGAGAATGGCGATTTCG TGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATACCGAAA CATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA cucurbitadienol MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAAAATPRQLLQIQNARNHFHRNRFHRK 327 synthaseCmos1 QSSDLFLAIQYEKEIAEGGKGGAVKVKEEEEVGKEAVKSTLERALSFYSAVQTSDGNWASDLGG protein PMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVAL RLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSL PFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDLY 
YPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNML CCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLR KAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSAMV GEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATM EALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRT YNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGER DPAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE cucurbitadienol ATGTGGAGGTTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG 328 synthaseCmos1 TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGACGCCGCCGCCGCCGCCACTCC genesequence TCGCCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCAGAGGGCGGAAAAGGTG GAGCGGTGAAAGTGAAAGAAGAGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGAAAG GGCACTAAGTTTCTACTCAGCCGTGCAGACAAGCGATGGGAATTGGGCCTCGGATCTTGGAGGG CCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAGTTT TGTCCAAGCACCACCGCGTAGAGATGTGCAGATATCTTTACAATCACCAGAATGAAGATGGAGG GTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTACGTTGCACTA AGGCTGCTTGGAGAAGACGCGGATGGCGGAGACGATGGCGCAATGACAAAAGCACGTGCTTGGA TCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAGTTGTGGCTGTCCGTGCTTGG AGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGCCTA CCATTTCATCCAGGAAGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACTTAT ATGGGAAGAGATTTGTTGGGCCAATCACTCCCAAAGTTCTATCGCTAAGACAAGAGCTTTACAC GGTTCCTTATCATGAAATAGACTGGAACAAATCCCGCAATACATGTGCAAAGGAGGATCTATAC TATCCACATCCCAAGATGCAAGACATTCTATGGGGATCCATCTACCATGTGTATGAGCCATTGT TCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATATTCA CTATGAAGATGAAAATAGTCGATATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATGCTT TGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATGACT ATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGCTACAATGGCAGCCAGTTGTGGGACAC TGCTTTCTCCATCCAAGCCATCGTAGCCACCAAACTTGTAGACAGCTATGCCCCAACTTTAAGA AAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATGTTT GGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCATCTC 
CGACTGTACAGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCGCAATGGTT GGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAAATG ATAATGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAACCC AGCAGAAACATTCGGAGACATTGTCATCGACTATCCGTATGTGGAGTGCACCGCAGCAACAATG GAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAGCTA TTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAGAGGGCGGATGGCTCTTGGTATGGGTGTTG GGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGAACA TATAATAGCTGCCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCGGCG GTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAACAA GCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAGAGA GACCCAGCACCATTGCACCGTGCAGCAAGGTTGCTAATGAATTCCCAATTGGAGAATGGCGATT TCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATACCG AAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTGACTGAAT cucurbitadienol MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 329 synthase[Cucumis NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD melo] GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDM. 
cucurbitadienol MWRLKVGAESVGEKEEKWLKSISNHLGRQVWEFCADQPTASPNHLQQIDNARKHFRNNRFHRKQ 330 synthase SSDLFLAIQNEKEIANGTKGGGIKVKEEEDVRKETVKNTVERALSFYSAIQTNDGNWASDLGGP [Citrullus MFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR colocynthis] LLGEDADGGEGGAMTKARGWILDRGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYCLP FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNKSRNTCAKEDLYY PHPKMQDILWGSIYHLYEPLFTRWPGKRLREKALQMAMKHIHYEDENSRYICLGPVNKVLNMLC CWVEDPYSDAFKFHLQRVPDYLWIAEDGMRMQGYNGSQLWDTAFSVQAIISTKLIDSFGTTLKK AHDFVKDSQIQQDFPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVG EPLEKSRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATME ALTLFKKLHPGHRTKEIDTAVAKAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTY STCVAIRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERD PAPLHRAARLLINSQLENGDFPQEEIMGVFNKNCMITYAAYRNIFPIWALGEYFHRVLTE. cucurbitadienol MWRLKVGAESVGEEDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ 331 synthase SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLERALGFYSAVQTRDGNWASDLGGP [Cucurbitapepo] LFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPLSTRDHGWLISDCTAEGLKASLMLSKLPSTMVG EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE cucurbitadienol MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCAENDDDDDDEAVIHVVANSSKHLLQQQRRQ 332 synthase[Cucumis SSFENARKQFRNNRFHRKQSSDLFLTIQYEKEIARNGAKNGGNTKVKEGEDVKKEAVNNTLERA sativa] LSFYSAIQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGW GLHIEGSSTMFGSALNYVALRLLGEDANGGECGAMTKARSWILERGGATAITSWGKLWLSVLGV 
YEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITHMVLSLRKELYTI PYHEIDWNRSRNTCAQEDLYYPHPKMQDILWGSIYHVYEPLFNGWPGRRLREKAMKIAMEHIHY EDENSRYIYLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTA FSIQAILSTKLIDTFGSTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISD CTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPA ETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAALAKAANFLENMQRTDGSWYGCWG VCFTYAGWFGIKGLVAAGRTYNNCVAIRKACHFLLSKELPGGGWGESYLSCQNKVYTNLEGNRP HLVNTAWVLMALIEAGQGERDPAPLHRAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRN IFPIWALGEYSHRVLTE cucurbitadienol DGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYLYNHQNEDGGWGLHIEGTSTM 333 synthase FGSALNYVALRLLGEDADGGEGGAMTKARSWILDRGGATAITSWGKLWLSVLGVYEWSGNNPLP [Citrullus PEFWLLPYCLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRS lanatus](partial) RNTCAKEDLYYPHPKMQDILWGSIYHLYEPLFTRWPGKRLREKALQMAMKHIHYEDENSRYICL GPVNKVLNMLCCWVEDPYSDAFKFHLQRVPDYLWVAEDGMRMQGYNGSQLWDTAFSVQAIISTK LIDSFGTTLKKAHDFVKDSQIQQDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASL MLSKLPSEIVGEPLEKSRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDY PYVECTSATMEALTLFKKLHPGRRTKEIDIAVARAANFLENMQRTDGSWYGCWGVCFTYAGWFG IKGLVAAGRTYNSCVAIRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLM ALIEAGQAERDPAPLHRAARLLINSQLENGDFPQEEIMGVFNKNCMITYAAYRNIFPIWALGEY FHRVLTE Squalene MSAVNVAPELINADNTITYDAIVIGAGVIGPCVATGLARKGKKVLIVERDWAMPDRIVGELMQP 334 epoxidase/ GGVRALRSLGMIQSINNIEAYPVTGYTVFFNGEQVDIPYPYKADIPKVEKLKDLVKDGNDKVLE squalene DSTIHIKDYEDDERERGVAFVHGRFLNNLRNITAQEPNVTRVQGNCIEILKDEKNEVVGAKVDI monooxidase DGRGKVEFKAHLTFICDGIFSRFRKELHPDHVPTVGSSFVGMSLFNAKNPAPMHGHVILGSDHM PILVYQISPEETRILCAYNSPKVPADIKSWMIKDVQPFIPKSLRPSFDEAVSQGKFRAMPNSYL PARQNDVTGMCVIGDALNMRHPLTGGGMTVGLHDVVLLIKKIGDLDFSDREKVLDELLDYHFER KSYDSVINVLSVALYSLFAADSDNLKALQKGCFKYFQRGGDCVNKPVEFLSGVLPKPLQLTRVF FAVAFYTIYLNMEERGFLGLPMALLEGIMILITAIRVFTPFLFGELIG Squalene ATGTCTGCTGTTAACGTTGCACCTGAATTGATTAATGCCGACAACACAATTACCTACGATGCGA 335 epoxidase/ TTGTCATCGGTGCTGGTGTTATCGGTCCATGTGTTGCTACTGGTCTAGCAAGAAAGGGTAAGAA squalene 
AGTTCTTATCGTAGAACGTGACTGGGCTATGCCTGATAGAATTGTTGGTGAATTGATGCAACCA monooxidasegene GGTGGTGTTAGAGCATTGAGAAGTCTGGGTATGATTCAATCTATCAACAACATCGAAGCATATC sequence CTGTTACCGGTTATACCGTCTTTTTCAACGGCGAACAAGTTGATATTCCATACCCTTACAAGGC CGATATCCCTAAAGTTGAAAAATTGAAGGACTTGGTCAAAGATGGTAATGACAAGGTCTTGGAA GACAGCACTATTCACATCAAGGATTACGAAGATGATGAAAGAGAAAGGGGTGTTGCTTTTGTTC ATGGTAGATTCTTGAACAACTTGAGAAACATTACTGCTCAAGAGCCAAATGTTACTAGAGTGCA AGGTAACTGTATTGAGATATTGAAGGATGAAAAGAATGAGGTTGTTGGTGCCAAGGTTGACATT GATGGCCGTGGCAAGGTGGAATTCAAAGCCCACTTGACATTTATCTGTGACGGTATCTTTTCAC GTTTCAGAAAGGAATTGCACCCAGACCATGTTCCAACTGTCGGTTCTTCGTTTGTCGGTATGTC TTTGTTCAATGCTAAGAATCCTGCTCCTATGCACGGTCACGTTATTCTTGGTAGTGATCATATG CCAATCTTGGTTTACCAAATCAGTCCAGAAGAAACAAGAATCCTTTGTGCTTACAACTCTCCAA AGGTCCCAGCTGATATCAAGAGTTGGATGATTAAGGATGTCCAACCTTTCATTCCAAAGAGTCT ACGTCCTTCATTTGATGAAGCCGTCAGCCAAGGTAAATTTAGAGCTATGCCAAACTCCTACTTG CCAGCTAGACAAAACGACGTCACTGGTATGTGTGTTATCGGTGACGCTCTAAATATGAGACATC CATTGACTGGTGGTGGTATGACTGTCGGTTTGCATGATGTTGTCTTGTTGATTAAGAAAATAGG TGACCTAGACTTCAGCGACCGTGAAAAGGTTTTGGATGAATTACTAGACTACCATTTCGAAAGA AAGAGTTACGATTCCGTTATTAACGTTTTGTCAGTGGCTTTGTATTCTTTGTTCGCTGCTGACA GCGATAACTTGAAGGCATTACAAAAAGGTTGTTTCAAATATTTCCAAAGAGGTGGCGATTGTGT CAACAAACCCGTTGAATTTCTGTCTGGTGTCTTGCCAAAGCCTTTGCAATTGACCAGGGTTTTC TTCGCTGTCGCTTTTTACACCATTTACTTGAACATGGAAGAACGTGGTTTCTTGGGATTACCAA TGGCTTTATTGGAAGGTATTATGATTTTGATCACAGCTATTAGAGTATTCACCCCATTTTTGTT TGGTGAGTTGATTGGTTAA Squalenesynthase MGKLLQLALHPVEMKAALKLKFCRTPLFSIYDQSTSPYLLHCFELLNLTSRSFAAVIRELHPEL 336 Erg9 RNCVTLFYLILRALDTIEDDMSIEHDLKIDLLRHFHEKLLLTKWSFDGNAPDVKDRAVLTDFES ILIEFHKLKPEYQEVIKEITEKMGNGMADYILDENYNLNGLQTVHDYDVYCHYVAGLVGDGLTR LIVIAKFANESLYSNEQLYESMGLFLQKTNIIRDYNEDLVDGRSFWPKEIWSQYAPQLKDFMKP ENEQLGLDCINHLVLNALSHVIDVLTYLAGIHEQSTFQFCAIPQVMAIATLALVFNNREVLHGN VKIRKGTTCYLILKSRTLRGCVEIFDYYLRDIKSKLAVQDPNFLKLNIQISKIEQFMEEMYQDK LPPNVKPNETPIFLKVKERSRYDDELVPTQQEEEYKFNMVLSIILSVLLGFYYIYTLHRA Squalenesynthase ATGGGAAAGCTATTACAATTGGCATTGCATCCGGTCGAGATGAAGGCAGCTTTGAAGCTGAAGT 337 Erg9genesequence 
TTTGCAGAACACCGCTATTCTCCATCTATGATCAGTCCACGTCTCCATATCTCTTGCACTGTTT CGAACTGTTGAACTTGACCTCCAGATCGTTTGCTGCTGTGATCAGAGAGCTGCATCCAGAATTG AGAAACTGTGTTACTCTCTTTTATTTGATTTTAAGGGCTTTGGATACCATCGAAGACGATATGT CCATCGAACACGATTTGAAAATTGACTTGTTGCGTCACTTCCACGAGAAATTGTTGTTAACTAA ATGGAGTTTCGACGGAAATGCCCCCGATGTGAAGGACAGAGCCGTTTTGACAGATTTCGAATCG ATTCTTATTGAATTCCACAAATTGAAACCAGAATATCAAGAAGTCATCAAGGAGATCACCGAGA AAATGGGTAATGGTATGGCCGACTACATCTTAGATGAAAATTACAACTTGAATGGGTTGCAAAC CGTCCACGACTACGACGTGTACTGTCACTACGTAGCTGGTTTGGTCGGTGATGGTTTGACCCGT TTGATTGTCATTGCCAAGTTTGCCAACGAATCTTTGTATTCTAATGAGCAATTGTATGAAAGCA TGGGTCTTTTCCTACAAAAAACCAACATCATCAGAGATTACAATGAAGATTTGGTCGATGGTAG ATCCTTCTGGCCCAAGGAAATCTGGTCACAATACGCTCCTCAGTTGAAGGACTTCATGAAACCT GAAAACGAACAACTGGGGTTGGACTGTATAAACCACCTCGTCTTAAACGCATTGAGTCATGTTA TCGATGTGTTGACTTATTTGGCCGGTATCCACGAGCAATCCACTTTCCAATTTTGTGCCATTCC CCAAGTTATGGCCATTGCAACCTTGGCTTTGGTATTCAACAACCGTGAAGTGCTACATGGCAAT GTAAAGATTCGTAAGGGTACTACCTGCTATTTAATTTTGAAATCAAGGACTTTGCGTGGCTGTG TCGAGATTTTTGACTATTACTTACGTGATATCAAATCTAAATTGGCTGTGCAAGATCCAAATTT CTTAAAATTGAACATTCAAATCTCCAAGATCGAACAGTTTATGGAAGAAATGTACCAGGATAAA TTACCTCCTAACGTGAAGCCAAATGAAACTCCAATTTTCTTGAAAGTTAAAGAAAGATCCAGAT ACGATGATGAATTGGTTCCAACCCAACAAGAAGAAGAGTACAAGTTCAATATGGTTTTATCTAT CATCTTGTCCGTTCTTCTTGGGTTTTATTATATATACACTTTACACAGAGCGTGA FarnesylPP MASEKEIRRERFLNVFPKLVEELNASLLAYGMPKEACDWYAHSLNYNTPGGKLNRGLSVV61DT 338 synthase YAILSNKTVEQLGQEEYEKVAILGWCIELLQAYFLVADDMMDKSITRRGQPCWYKVPEVGEIAI NDAFMLEAAIYKLLKSHFRNEKYYIDITELFHEVTFQTELGQLMDLITAPEDKVDLSKFSLKKH SFIVTFKTAYYSFYLPVALAMYVAGITDEKDLKQARDVLIPLGEYFQIQDDYLDCFGTPEQIGK IGTDIQDNKCSWVINKALELASAEQRKTLDENYGKKDSVAEAKCKKIFNDLKIEQLYHEYEESI AKDLKAKISQVDESRGFKADVLTAFLNKVYKRSK FarnesylPP ATGGCTTCAGAAAAAGAAATTAGGAGAGAGAGATTCTTGAACGTTTTCCCTAAATTAGTAGAGG 339 synthasegene AATTGAACGCATCGCTTTTGGCTTACGGTATGCCTAAGGAAGCATGTGACTGGTATGCCCACTC sequence ATTGAACTACAACACTCCAGGCGGTAAGCTAAATAGAGGTTTGTCCGTTGTGGACACGTATGCT ATTCTCTCCAACAAGACCGTTGAACAATTGGGGCAAGAAGAATACGAAAAGGTTGCCATTCTAG 
GTTGGTGCATTGAGTTGTTGCAGGCTTACTTCTTGGTCGCCGATGATATGATGGACAAGTCCAT TACCAGAAGAGGCCAACCATGTTGGTACAAGGTTCCTGAAGTTGGGGAAATTGCCATCAATGAC GCATTCATGTTAGAGGCTGCTATCTACAAGCTTTTGAAATCTCACTTCAGAAACGAAAAATACT ACATAGATATCACCGAATTGTTCCATGAGGTCACCTTCCAAACCGAATTGGGCCAATTGATGGA CTTAATCACTGCACCTGAAGACAAAGTCGACTTGAGTAAGTTCTCCCTAAAGAAGCACTCCTTC ATAGTTACTTTCAAGACTGCTTACTATTCTTTCTACTTGCCTGTCGCATTGGCCATGTACGTTG CCGGTATCACGGATGAAAAGGATTTGAAACAAGCCAGAGATGTCTTGATTCCATTGGGTGAATA CTTCCAAATTCAAGATGACTACTTAGACTGCTTCGGTACCCCAGAACAGATCGGTAAGATCGGT ACAGATATCCAAGATAACAAATGTTCTTGGGTAATCAACAAGGCATTGGAACTTGCTTCCGCAG AACAAAGAAAGACTTTAGACGAAAATTACGGTAAGAAGGACTCAGTCGCAGAAGCCAAATGCAA AAAGATTTTCAATGACTTGAAAATTGAACAGCTATACCACGAATATGAAGAGTCTATTGCCAAG GATTTGAAGGCCAAAATTTCTCAGGTCGATGAGTCTCGTGGCTTCAAAGCTGATGTCTTAACTG CGTTCTTGAACAAAGTTTACAAGAGAAGCAAATAG cycloartenol MWKLKVAEGGTPWLRTLNNHVGRQVWEFDPHSGSPQDLDDIETARRNFHDNRFTHKHSDDLLMR 340 synthase LQFAKENPMNEVLPKVKVKDVEDVTEEAVATTLRRGLNFYSTIQSHDGHWPGDYGGPMFLMPGL VITLSVTGALNAVLTDEHRKEMRRYLYNHQNKDGGWGLHIEGPSTMFGSVLCYVTLRLLGEGPN DGEGDMERGRDWILEHGGATYITSWGKMWLSVLGVFEWSGNNPMPPEWLLPYALPVHPGRMWCH CRMVYLPMSYLYGKRFVGPITPTVLSLRKELFTVPYHDIDWNQARNLCAKEDLYYPHPLVQDIL WATLHKFVEPVFMNWPGKKLREKAIKTAIEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNSE AFKLHLPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIDEFGPTLKKAHAFIKNSQ VSEDCPGDLSKWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKIAPEIVGEPLDSKRLY DAVNVILSLQNENGGLATYELTRSYTWLEIINPAETFGDIVIDCPYVECTSAAIQALATFGKLY PGHRREEIQCCIEKAVAFIEKIQASDGSWYGSWGVCFTYGTWFGIKGLIAAGKNFSNCLSIRKA CEFLLSKQLPSGGWAESYLSCQNKVYSNLEGNRSHVVNTGWAMLALIEAEQAKRDPTPLHRAAV CLINSQLENGDFPQEEIMGVFNKNCMITYAAYRCIFPIWALGEYRRVLQAC oxidosqualene MWKLKVAEGGTPWLRTLNNHVGRQVWEFDPHSGSPQDLDDIETARRNFHDNRFTHKHSDDLLMR 341 PMID: cyclases LQFAKENPMNEVLPKVKVKDVEDVTEEAVATTLRRGLNFYSTIQSHDGHWPGDLGGPMFLMPGL 26058429 VITLSVTGALNAVLTDEHRKEMRRYLYNHQNKDGGWGLHIEGPSTMFGSVLCYVTLRLLGEGPN Takase et DGEGDMERGRDWILEHGGATYITSWGKMWLSVLGVFEWSGNNPMPPEIWLLPYALPVHPGRMWC al., 2015 HCRMVYLPMSYLYGKRFVGPITPTVLSLRKELFTVPYHDIDWNQARNLCAKEDLYYPHPLVQDI
LWATLHKFVEPVFMNWPGKKLREKAIKTAIEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS EAFKLHLPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIDEFGPTLKKAHAFIKNS QVSEDCPGDLSKWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKIAPEIVGEPLDSKRL YDAVNVILSLQNENGGLATYELTRSYTWLEIINPAETFGDIVIDCPYVECTSAAIQALATFGKL YPGHRREEIQCCIEKAVAFIEKIQASDGSWYGSWGVCFTYGTWFGIKGLIAAGKNFSNCLSIRK ACEFLLSKQLPSGGWAESYLSCQNKVYSNLEGNRSHVVNTGWAMLALIEAEQAKRDPTPLHRAA VCLINSQLENGDFPQEEIMGVFNKNCMITYAAYRCIFPIWALGEYRRVLQAC oxidosqualene ATGTGGAGGCTAACAATAGGTGAGGGCGGCGGTCCGTGGCTGAAGTCGAACAATGGCTTCCTTG 342 cylcasesgene GCCGCCAAGTGTGGGAGTACGACGCCGATGCCGGCACGCCGGAAGAGCGTGCCGAGGTTGAGAG sequence GGTGCGTGCGGAATTCACAAAGAACAGGTTCCAGAGGAAGGAGTCACAGGACCTTCTTCTACGC TTGCAGTACGCAAAAGACAACCCTCTTCCGGCGAATATTCCGACAGAAGCCAAGCTTGAAAAGA GTACAGAGGTCACTCACGAGACTATCTACGAATCATTGATGCGAGCTTTACATCAATATTCCTC TCTACAAGCAGACGATGGGCATTGGCCTGGTGATTACAGTGGGATTCTCTTCATTATGCCTATC ATTATATTCTCTTTATATGTTACTAGATCACTTGACACCTTTTTATCTCCGGAACATCGTCATG AGATATGTCGCTACATTTACAATCAACAGAATGAAGATGGTGGTTGGGGAAAAATGGTTCTTGG CCCAAGTACCATGTTTGGATCGTGTATGAATTATGCAACCTTAATGATTCTTGGCGAGAAGCGA AATGGTGATCATAAGGATGCATTGGAAAAAGGGCGTTCTTGGATTTTATCTCATGGAACTGCAA CTGCAATACCACAGTGGGGAAAAATATGGTTGTCGATAATTGGCGTTTACGAATGGTCAGGAAA CAATCCTATTATACCTGAATTGTGGTTGGTTCCACATTTTCTTCCGATTCACCCAGGTCGTTTT TGGTGTTTTACCCGGTTGATATACATGTCAATGGCATATCTCTATGGTAAGAAATTTGTTGGGC CTATTAGTCCTACAATATTAGCTCTGCGACAAGACCTCTATAGTATACCTTACTGCAACATTAA TTGGGACAAGGCGCGTGATTATTGTGCAAAGGAGGACCTTCATTACCCACGCTCACGGGCACAA GATCTTATATCTGGTTGCCTAACGAAAATTGTGGAGCCAATTTTGAATTGGTGGCCAGCAAACA AGCTAAGAGATAGAGCTTTAACTAACCTCATGGAGCATATCCATTATGACGACGAATCAACCAA ATATGTGGGCATTTGCCCTATTAACAAGGCATTGAACATGATTTGTTGTTGGGTAGAAAACCCA AATTCGCCTGAATTCCAACAACATCTTCCACGATTCCATGACTATTTGTGGATGGCGGAGGATG GAATGAAGGCACAGGTATATGATGGATGTCATAGCTGGGAACTAGCGTTCATAATTCATGCCTA TTGTTCCACGGATCTTACTAGCGAGTTTATCCCGACTCTAAAAAAGGCGCACGAGTTCATGAAG AACTCACAGGTTCTTTTCAACCACCCAAATCATGAAAGCTATTATCGCCACAGATCAAAAGGCT CATGGACCCTTTCAAGTGTAGATAATGGTTGGTCTGTATCTGATTGTACTGCGGAAGCTGTTAA 
GGCATTGCTACTATTATCAAAGATATCCGCTGACCTTGTTGGCGATCCAATAAAACAAGACAGG TTGTATGATGCCATTGATTGCATCCTATCTTTCATGAATACAGATGGAACATTTTCTACCTACG AATGCAAACGGACATTCGCTTGGTTAGAGGTTCTCAACCCTTCTGAGAGTTTTCGGAACATTGT CGTGGACTATCCATCTGTTGAATGCACATCATCTGTGGTTGATGCTCTCATATTATTTAAAGAG ACGAATCCACGATATCGAAGAGCAGAGATAGATAAATGCATTGAAGAAGCTGTTGTATTTATTG AGAACAGTCAAAATAAGGATGGTTCATGGTATGGCTCATGGGGTATATGTTTCGCATATGGATG CATGTTTGCAGTAAGGGCGTTGGTTGCTACAGGAAAAACCTACGACAATTGTGCTTCTATCAGG AAATCATGCAAATTTGTCTTATCAAAGCAACAAACAACAGGTGGATGGGGTGAAGACTATCTTT CTAGTGACAATGGGGAATATATTGATAGCGGTAGGCCTAATGCTGTGACCACCTCATGGGCAAT GTTGGCTTTAATTTATGCTGGACAGGTTGAACGTGACCCAGTACCACTGTATAATGCTGCAAGA CAGCTAATGAATATGCAGCTAGAAACAGGTGACTTCCCCCAACAGGAACACATGGGTTGCTTCA ACTCCTCCTTGAACTTCAACTACGCCAACTACCGCAATCTATACCCGATTATGGCTCTTGGGGA ACTTCGCCGTCGACTTCTTGCGATTAAGAGCTGA cycloartenol MWKLKVAEGGTPWLRTLNNHVGRQVWEFDPHSGSPQDLDDIETARRNFHDNRFTHKHSDDLLMR 343 PMID: synthase LQFAKENPMNEVLPKVKVKDVEDVTEEAVATTLRRGLNFYSTIQSHDGHWPGDLGGPMFLMPGL 26058429 VITLSVTGALNAVLTDEHRKEMRRYLYNHQNKDGGWGLHIEGPSTMFGSVLCYVTLRLLGEGPN Takaseel DGEGDMERGRDWILEHGGATYITSWGKMWLSVLGVFEWSGNNPMPPEIWLLPYALPVHPGRMWC al.,2015 HCRMVYLPMSYLYGKRFVGPITPTVLSLRKELFTVPYHDIDWNQARNLCAKEDLYYPHPLVQDI LWATLHKFVEPVFMNWPGKKLREKAIKTAIEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS EAFKLHLPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIDEFGPTLKKAHAFIKNS QVSEDCPGDLSKWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKIAPEIVGEPLDSKRL YDAVNVILSLQNENGGLATYELTRSYTWLEIINPAETFGDIVIDCPYVECTSAAIQALATFGKL YPGHRREEIQCCIEKAVAFIEKIQASDGSWYGSWGVCFTYGTWFGIKGLIAAGKNFSNCLSIRK ACEFLLSKQLPSGGWAESYLSCQNKVYSNLEGNRSHVVNTGWAMLALIEAEQAKRDPTPLHRAA VCLINSQLENGDFPQEEIMGVFNKNCMITYAAYRCIFPIWALGEYRRVLQAC beta-amyrin MWRLTIGEGGGPWLKSNNGFLGRQVWEYDADAGTPEERAEVERRAEFTKNRFQRKESQDLLLRL 344 synthase QYAKDNPLPANIPTEAKLEKSTEVTHETIYESLMRALHQYSSLQADDGHWPGDYSGILFIMPII IFSLYVTRSLDTFLSPEHRHEICRYIYNQQNEDGGWGKMVLGPSTMFGSCMNYATLMILGKRNG DHKDALEKGRSWILSHGTATAIPQWGKIWLSIIGVYEWSGNNPIIPELWLVPHFLPIHPGRFWC FTRLIYMSMAYLYGKKFVGPISPTILALRQDLYSIPYCNINWDKARDYCAKEDLHYPRSRAQDL 
ISGCLTKIVEPILNWWPANKLRDRALTNLMEHIHYDDESTKYVGICPINKALNMICCWVENPNS PEFQQHLPRFHDYLWMAEDGMKAQVYDGCHSWELAFIIHAYCSTDLTSEFIPTLKKAHEFMKNS QVLFNHPNHESYYRHRSKGSWTLSSVDNGWSVSDCTAEAVKALLLLSKISADLVGDPIKQDRLY DAIDCILSFMNTDGTFSTYECKRTFAWLEVLNPSESFRNIVVDYPSVECTSSVVDALILFKETN PRYRRAEIDKCIEEAVVFIENSQNKDGSWYGSWGICFAYGCMFAVRALVATGKTYDNCASIRKS CKFVLSKQQTTGGWGEDYLSSDNGEYIDSGRPNAVTTSWAMLALIYAGQVERDPVPLYNAARQL MNMQLETGDFPQQEHMGCFNSSLNFNYANYRNLYPIMALGELRRRLLAIKS beta-amyrin ATGTGGAGGCTAACAATAGGTGAGGGCGGCGGTCCGTGGCTGAAGTCGAACAATGGCTTCCTTG 345 synthase gene GCCGCCAAGTGTGGGAGTACGACGCCGATGCCGGCACGCCGGAAGAGCGTGCCGAGGTTGAGAG sequence GGTGCGTGCGGAATTCACAAAGAACAGGTTCCAGAGGAAGGAGTCACAGGACCTTCTTCTACGC TTGCAGTACGCAAAAGACAACCCTCTTCCGGCGAATATTCCGACAGAAGCCAAGCTTGAAAAGA GTACAGAGGTCACTCACGAGACTATCTACGAATCATTGATGCGAGCTTTACATCAATATTCCTC TCTACAAGCAGACGATGGGCATTGGCCTGGTGATTACAGTGGGATTCTCTTCATTATGCCTATC ATTATATTCTCTTTATATGTTACTAGATCACTTGACACCTTTTTATCTCCGGAACATCGTCATG AGATATGTCGCTACATTTACAATCAACAGAATGAAGATGGTGGTTGGGGAAAAATGGTTCTTGG CCCAAGTACCATGTTTGGATCGTGTATGAATTATGCAACCTTAATGATTCTTGGCGAGAAGCGA AATGGTGATCATAAGGATGCATTGGAAAAAGGGCGTTCTTGGATTTTATCTCATGGAACTGCAA CTGCAATACCACAGTGGGGAAAAATATGGTTGTCGATAATTGGCGTTTACGAATGGTCAGGAAA CAATCCTATTATACCTGAATTGTGGTTGGTTCCACATTTTCTTCCGATTCACCCAGGTCGTTTT TGGTGTTTTACCCGGTTGATATACATGTCAATGGCATATCTCTATGGTAAGAAATTTGTTGGGC CTATTAGTCCTACAATATTAGCTCTGCGACAAGACCTCTATAGTATACCTTACTGCAACATTAA TTGGGACAAGGCGCGTGATTATTGTGCAAAGGAGGACCTTCATTACCCACGCTCACGGGCACAA GATCTTATATCTGGTTGCCTAACGAAAATTGTGGAGCCAATTTTGAATTGGTGGCCAGCAAACA AGCTAAGAGATAGAGCTTTAACTAACCTCATGGAGCATATCCATTATGACGACGAATCAACCAA ATATGTGGGCATTTGCCCTATTAACAAGGCATTGAACATGATTTGTTGTTGGGTAGAAAACCCA AATTCGCCTGAATTCCAACAACATCTTCCACGATTCCATGACTATTTGTGGATGGCGGAGGATG GAATGAAGGCACAGGTATATGATGGATGTCATAGCTGGGAACTAGCGTTCATAATTCATGCCTA TTGTTCCACGGATCTTACTAGCGAGTTTATCCCGACTCTAAAAAAGGCGCACGAGTTCATGAAG AACTCACAGGTTCTTTTCAACCACCCAAATCATGAAAGCTATTATCGCCACAGATCAAAAGGCT CATGGACCCTTTCAAGTGTAGATAATGGTTGGTCTGTATCTGATTGTACTGCGGAAGCTGTTAA
GGCATTGCTACTATTATCAAAGATATCCGCTGACCTTGTTGGCGATCCAATAAAACAAGACAGG TTGTATGATGCCATTGATTGCATCTATCTTTCATGAATACAGATGGAACATTTTCTACCTACGA ATGCAAACGGACATTCGCTTGGTTAGAGGTTCTCAACCCTTCTGAGAGTTTTCGGAACATTGTC GTGGACTATCCATCTGTTGAATGCACATCATCTGTGGTTGATGCTCTCATATTATTTAAAGAGA CGAATCCACGATATCGAAGAGCAGAGATAGATAAATGCATTGAAGAAGCTGTTGTATTTATTGA GAACAGTCAAAATAAGGATGGTTCATGGTATGGCTCATGGGGTATATGTTTCGCATATGGATGC ATGTTTGCAGTAAGGGCGTTGGTTGCTACAGGAAAAACCTACGACAATTGTGCTTCTATCAGGA AATCATGCAAATTTGTCTTATCAAAGCAACAAACAACAGGTGGATGGGGTGAAGACTATCTTTC TAGTGACAATGGGGAATATATTGATAGCGGTAGGCCTAATGCTGTGACCACCTCATGGGCAATG TTGGCTTTAATTTATGCTGGACAGGTTGAACGTGACCCAGTACCACTGTATAATGCTGCAAGAC AGCTAATGAATATGCAGCTAGAAACAGGTGACTTCCCCCAACAGGAACACATGGGTTGCTTCAA CTCCTCCTTGAACTTCAACTACGCCAACTACCGCAATCTATACCCGATTATGGCTCTTGGGGAA CTTCGCCGTCGACTTCTTGCGATTAAGAGCTGA Modified sequence MWRLTIGEGGGPWLKSNNGFLGRQVWEYDADAGTPEERAEVERVRAEFTKNRFQRKESQDLLLR 346 PMID: of B-amyrin LQYAKDNPLPANIPTEAKLEKSTEVTHETIYESLMRALHQYSSLQADDGHWPGDYSGILFIMPI 27412861 synthase from IIFSLYVTRSLDTFLSPEHRHEICRYIYNQQNEDGGWGKMVLGPSTMFGSCMNYATLMILGEKR (Salmon et Avena strigosa NGDHKDALEKGRSWILSHGTATAIPQWGKIWLSIIGVYEWSGNNPIIPELWLVPHFLPIHPGRF al., 2016) (AJ311789), which WCFTRLIYMSMAYLYGKKFVGPISPTILALRQDLYSIPYCNINWDKARDYCAKEDLHYPRSRAQ reacts DLISGCLTKIVEPILNWWPANKLRDRALTNLMEHIHYDDESTKYVGICPINKALNMICCWVENP preferentially NSPEFQQHLPRFHDYLWMAEDGMKAQVYDGCHSWELAFIIHAYCSTDLTSEFIPTLKKAHEFMK with NSQVLFNHPNHESYYRHRSKGSWTLSSVDNGWSVSDCTAEAVKALLLLSKISADLVGDPIKQDR diepoxysqualene LYDAIDCILSFMNTDGTFSTYECKRTFAWLEVLNPSESFRNIVVDYPSVECTSSVVDALILFKE TNPRYRRAEIDKCIEEAVVFIENSQNKDGSWYGSWGICFAYGCMFAVRALVATGKTYDNCASIR KSCKFVLSKQQTTGGWGEDYLSSDNGEYIDSGRPNAVTTSWAMLALIYAGQVERDPVPLYNAAR QLMNMQLETGDFPQQEHMGCFNSFLNFNYANYRNLYPIMALGELRRRLLAIKS Modified sequence MWKLKIGKGNGEDPHLFSSNNFVGRQTWKFDHKAGSPEERAAVEEARRGFLDNRFRVKGCSDLL 347 PMID: of from WRMQFLREKKFEQGIPQLKATNIEEITYETTTNALRRGVRYFTALQASDGHWPGEITGPLFFLP 27412861 Arabidopsis PLIFCLYITGHLEEVFDAEHRKEMLRHIYCHQNEDGGWGLHIESKSVMFCTVLNYICLRMLGEN (Salmon et
thaliana(AtLup1, PEQDACKRARQWILDRGGVIFIPSWGKFWLSILGVYDWSGTNPTPPELLMLPSFLPIHPGKILC al.,2016) Q9C5M3.1),which YSRMVSIPMSYLYGKRFVGPITPLILLLREELYLEPYEEINWKKSRRLYAKEDMYYAHPLVQDL reacts LSDTLQNFVEPLLTRWPLNKLVREKALQLTMKHIHYEDENSHYITIGCVEKVLCMLACWVENPN preferentially GDYFKKHLARIPDYMWVAEDGMKMQSFGCQLWDTGFAIQALLASNLPDETDDALKRGHNYIKAS with QVRENPSGDFRSMYRHISKGAWTFSDRDHGWQVSDCTAEALKCCLLLSMMSADIVGQKIDDEQL diepoxysqualene YDSVNLLLSLQSGNGGVNAWEPSRAYKWLELLNPTEFMANTMVEREFVECTSSVIQALDLFRKL YPDHRKKEINRSIEKAVQFIQDNQTPDGSWYGNWGVCFIYATWFALGGLAAAGETYNDCLAMRN GVHFLLTTQRDDGGWGESYLSCSEQRYIPSEGERSNLVQTSWAMMALIHTGQAERDLIPLHRAA KLIINSQLENGDFPQQEIVGAFMNFCMLHYATYRNTFPLWALAEYRKVVFIVN XP_001396506.2 MCNKSNYSSPKWWKESVVYQVYPASFNCGKSTTNTNGWGDVTGIIEKVPYLESLGVDIVWLSPI 352 YTSPQVDMGYDIADYESIDPRYGTLADVDLLIKTLKDHDMKLMMDLVVNHTSDQHSWFVESANS KDSPKRDWYIWRPAKGFDEAGNPVPPNNWAQILGDTLSAWTWHAETQEFYLTLHTSAQAELNWE NPDVVTAVYDVMEFWLRRGICGFRMDVINFISKDQSFPDAPIIDPASKYQPGEQFYTNGPRFHE FMHGIYDNVLSKYDTITVGETPYVTDMKEIIKTVGSTAKELNMAFNFDHMEIEDIKTKGESKWS LRDWKLTELKGILSGWQKRMREWDGWNAIFLECHDQARSVSRYTNDSDEFRDRGAKLLALLETT LGGTIFLYQGQEIGMRNFPVEWDPDTEYKDIESVNFWKKSKELHPVGSEGLAQARTLLQKKARD HARTPMQWSADPHAGFTVPDATPWMRVNDDYGTVNVEAQMSFPWEMKGELSVWQYWQQALQRRK LHKGAFVYGDFEDLDYHNELVFAYSRTSADGKETWLVAMNWTTDAVEWTVPSGIHVTRWVSSTL QTAPLMAGQSTVTLRALEGVVGCCS beta-glucosidase MTSFHDGVKLSTVTCVLSGLVALGSAGPTAASANAQVAAAAAAQAWVPDGYYVPPYYPAPYGGW 354 [Trichodermareesei] VEDWQESYTKAKALVDSMTLAEKTNITAGTGIYMGERCAGNTGSAFRVSFPQLCLNDSPAGVRH GenBank:BAP5915.1 ADNVTAFPDGITVGATFDKALMYKRGVAIGKENRGKGVNVWLGPTVGPIGRKPKGGRNWEGFGA DPVLQAVGARETIKGVQEQGVIATIKHFIGNEQEMYRMYNPFQYAYSSNIDDRTLHEVYAWPFA EGIRAGVGAVMMAYNAVNGTACSQHPYLMSALLKDEMGFQGFIMTDWLAHMSGVASAIAGLDMD MPGDVQIPFFGGSYWMYELTRSALNGSVPMDRINDAATRIAAAWYKMGQDKGFPATNFDTNSRA AFNPLYPAALPLSPFGITNEFVPVQDDHDVIARQISQEAITLLKNDGDILPLSPSQHLKVFGTD AQKNPDGINSCTDRNCNKGTLGQGWGSGTVDYPYLDDPISAITAEADNVTFYNTDKFPSVGEVS DSDVAIVFVNSDAGENTYTVEGNHGDRDKSGLYAWHDGDKLVQDAASKFSNVIVVIHTVGPLIL EKWIDLPSVKAVLVAHLPGQEAGKSLTNVLFGHASPCGHLPYSITKEEDDLPKSVTTLIDSEFL 
NQPQDTYTEGLYIDYRWLNKNKTKPRYAFGHGLSYTNFTFKAASIKQVARLSAYPPARPAKGST PDFAQSIPSASEAVAPSGFGKIPRYIYSWLSQGDANRAISDGKTGKYPYPDGYSTTQKPGARAG GGEGGNPALWDVAYSLTVTVQNTGDEYAGKASVQAYLQFPDDIDYDTPIIQLRDFEKTKELKPG ETTTVTLTLTRKDVSVWDVVAQDWKVPAVDGGYKVWIGDASDSLSIVCHTDTLECETGVVGPV Beta-glucosidase MMGFDVEDVLSQLSQNEKIALLSGIDFWHTYPIPKYNVPSVRLTDGPNGIRGTKFFAGIPAACL 355 [Trichodermareesei] PCGTALASTWDKQLLKKAGKLLGDECIAKGAHCWLGPTINTPRSPLGGRGFESFSEDPYLSGIL BAP59014.1 AASMILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTS GI:690966588 YNKVNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIE SALQARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENS ILPLPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQML PVIDAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPT ATGIWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFG SANTTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDR PHMDLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFG NVNPSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYAT FKLPDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVY LEAGEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL beta-glucosidase MLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRTAEDIALL 356 reesel KSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWDLPEGLHQ [Trichodermareesei] RYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQSTSEPWTV AHK23047.1 GHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEFFTAWFAD GI:588294532 PIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASADDTVGNVD VLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESDLPKEKIL EDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYENGQKRFP KKSAKSLKPLFDELIAAA beta-glucosidase MLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRTAEDIALL 357 [Trichodermareesei] KSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWDLPEGLHQ BAA74959.1 RYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQSTSEPWTV GI:4249562 
GHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEFFTAWFAD PIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASADDTVGNVD VLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESDLPKEKIL EDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYENGQKRFP KKSAKSLKPLFDELIAAA beta-glucosidase MLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRTAEDIALL 358 [Trichoderma KSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWDLPEGLHQ reeseiRUTC-3 RYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQSTSEPWTV ETS5552.1 GHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEFFTAWFAD GI:572282538 PIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASADDTVGNVD VLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESDLPKEKIL EDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYENGQKRFP KKSAKSLKPLFDELIAAA ChainD,Crystal MHHHHHHMLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRT 359 StructureOfBeta- AEDIALLKSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWD Glucosidase2From LPEGLHQRYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQS FungusTrichoderma TSEPWTVGHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEF ReeseiInComplex FTAWFADPIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASAD WithTris DTVGNVDVLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESD 3AHY_D LPKEKILEDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYE GI:303324838 NGQKRFPKKSAKSLKPLFDELIAAA ChainC,Crystal MHHHHHHMLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRT 360 StructureOfBeta- AEDIALLKSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWD Glucosidase2From LPEGLHQRYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQS FungusTrichoderma TSEPWTVGHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEF ReeseiInComplex FTAWFADPIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASAD WithTris DTVGNVDVLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESD 3AHY_CGI:33324837 LPKEKILEDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYE 
NGQKRFPKKSAKSLKPLFDELIAAA ChainB,Crystal MHHHHHHMLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRT 361 StructureOfBeta- AEDIALLKSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWD GlucosIdase2From LPEGLHQRYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQS FungusTrichoderma TSEPWTVGHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEF ReeseiInComplex FTAWFADPIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASAD WithTris DTVGNVDVLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESD 3AHY_B LPKEKILEDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYE GI:303324836 NGQKRFPKKSAKSLKPLFDELIAAA ChainA,Crystal MHHHHHHMLPKDFQWGFATAAYQIEGAVDQDGRGPSIWDTFCAQPGKIADGSSGVTACDSYNRT 362 StructureOfBeta- AEDIALLKSLGAKSYRFSISWSRIIPEGGRGDAVNQAGIDHYVKFVDDLLDAGITPFITLFHWD Glucosidase2From LPEGLHQRYGGLLNRTEFPLDFENYARVMFRALPKVRNWITFNEPLCSAIPGYGSGTFAPGRQS FungusTrichoderma TSEPWTVGHNILVAHGRAVKAYRDDFKPASGDGQIGIVLNGDFTYPWDAADPADKEAAERRLEF ReeseInComplex FTAWFADPIYLGDYPASMRKQLGDRLPTFTPEERALVHGSNDFYGMNHYTSNYIRHRSSPASAD WithTris DTVGNVDVLFTNKQGNCIGPETQSPWLRPCAAGFRDFLVWISKRYGYPPIYVTENGTSIKGESD 3AHY_A LPKEKILEDDFRVKYYNEYIRAMVTAVELDGVNVKGYFAWSLMDNFEWADGYVTRFGVTYVDYE GI:303324835 NGQKRFPKKSAKSLKPLFDELIAAA beta-glucosidase- MRLKHWKTAAFAAASIVSQVEAGFWNFGRDTSSSTRPPTKDQFIESLISKLTLEDLVLQLHLMF 363 likeprotein ADDIVGAASHNELYDQTMHLSPKSPIGTIHDWYPMNKSYFNVLQKLQLDNSHVKIPMMLVEECL [Trichodermareesei] HGVGSFKQSIFPQNIAMAASFDTDIVYRVGRAIGTEARSIGIHGCFSPVLDLAQDPRWGRVQED reesei FGEDKILTSHIGSAYSSGLSKNKTWSDPDAVFPIMKHFAAHGAAQAGHNTAPFTGLGPRQIKQD BAP59016.1 LLVPFKANYDLGGARGVMMAYNEIDGVPSCVNPMLYEVLDDWGYDGIVIGDDTAMRNLLTQHRV GI:690966592 TTSEADTLQQWYNAGGQIDFYDFDLDSKINITKALVANGTVPLKTLQSHVRKILGVKWDLGLFE NPYIPEHIDPLAVVASHQDVALEAAHKSIILLKNDNRTLPLSSPKKIALIGPFADTINLGDYSG ALGQYPAKYTQTLREGVLRHANKSGHTVRTSWGTNSWEYNNQYVIPGYLLSTNGKPGGLKATYY AHTNFTSPKATRVEVPAQDWGLYPPPGLSSNNFSAVWEGELESPTDLDVNGWIGLAIGPNSTSK LYVDGKLISSKGYSGSGNLLGTIEGYAWTQANSTLPPQGGVEFTFKKNAKHHVRIEFQSWNNYK 
KTANVNSVNSQLIFWWNLVSPNGKALDQAVSIAKDSDVVILAVGAAWNSDGESGDRGTLGLAPS extracellularbeta QDELAREVFALGKPVVLVLEGGRPSAIPDHYGNSSAVLSTFFGGQAGGQAIADVLFGDFNPGAR 364 glucosidase VPITVPWSVGQIPAYYNYKPSARAAQYLDIPSEPIYPFGYGLSYTTFSTSSPTASVSGSSKRSS 1713235AGI:227874 VDAQTSQSFGSGDWITFSVTVKNTGSVAGSYVAQVYLLGRVSTITQPVKQLVGFQRVYLEAGQK KTANIQLEVDRYLKIINRKDEWELEKGSYTFALLEHGGSNADTSKNVTLQCVG MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD QLGFPGYVMTDWDAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA ChainA,Crystal VVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVSGVGWNGGPCVGNTSPASKISYPSLCLQDGPL 365 StructureOfA GVRYSTGSTAFTPGVQAASTWDVNLIRERGQFIGEEVKASGIHVILGPVAGPLGKTPQGGRNWE Glycoside GFGVDPYLTGIAMGQTINGIQSVGVQATAKHYILNEQELNRETISSNPDDRTLHELYTWPFADA HydrolaseFamily3 VQANVASVMCSYNKVNTTWACEDQYTLQTVLKDQLGFPGYVMTDWNAQHTTVQSANSGLDMSMP Beta-glucosidase, GTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMVTRILAAWYLTGQDQAGYPSFNISRNVQGNHK Bg11FromHypocrea TNVRAIARDGIVLLKNDANILPLKKPASIAVVGSAAIIGNHARNSPSCNDKGCDDGALGMGWGS Jecorina GAVNYPYFVAPYDAINTRASSQGTQVTLSNTDNTSSGASAARGKDVAIVFITADSGEGYITVEG 3ZZ1_A NAGDRNNLDPWHNGNALVQAVAGANSNVIVVVHSVGAIILEQILALPQVKAVVWAGLPSQESGN GI:429544273 ALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVSGGSDSFSEGLFIDYKHFDDANITPRYEFGYG LSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSDLFQNVATVTVDIANSGQVTGAEVAQLYITYP SSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRRRDLSYWDTASQKWVVPSGSFGISVGASSRDI RLTSTLSVA ChainA,Crystal 
VVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVSGVGWNGGPCVGNTSPASKISYPSLCLQDGPL 366 StructureOfA GVRYSTGSTAFTPGVQAASTWDVNLIRERGQFIGEEVKASGIHVILGPVAGPLGKTPQGGRNWE Glycoside GFGVDPYLTGIAMGQTINGIQSVGVQATAKHYILNEQELNRETISSNPDDRTLHELYTWPFADA HydrolaseFamily3 VQANVASVMCSYNKVNTTWACEDQYTLQTVLKDQLGFPGYVMTDWNAQHTTVQSANSGLDMSMP Beta-glucosidase, GTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMVTRILAAWYLTGQDQAGYPSFNISRNVQGNHK Bg11fromHypocrea TNVRAIARDGIVLLKNDANILPLKKPASIAVVGSAAIIGNHARNSPSCNDKGCDDGALGMGWGS Jecorina GAVNYPYFVAPYDAINTRASSQGTQVTLSNTDNTSSGASAARGKDVAIVFITADSGEGYITVEG 3ZYZ_A NAGDRNNLDPWHNGNALVQAVAGANSNVIVVVHSVGAIILEQILALPQVKAVVWAGLPSQESGN GI:429544272 ALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVSGGSDSFSEGLFIDYKHFDDANITPRYEFGYG LSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSDLFQNVATVTVDIANSGQVTGAEVAQLYITYP SSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRRRDLSYWDTASQKWVVPSGSFGISVGASSRDI RLTSTLSVA ChainB,Crystal AVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVSGVGWNGGPCVGNTSPASKISYPSLCLQDGP 367 StructureOfBeta- LGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFIGEEVKASGIHVILGPVAGPLGKTPQGGRNW d-glucoside EGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYILNEQELNRETISSNPDDRTLHELYTWPFAD glucohydrolase AVQANVASVMCSYNKVNTTWACEDQYTLQTVLKDQLGFPGYVMTDWNAQHTTVQSANSGLDMSM fromTrichoderma PGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMVTRILAAWYLTGQDQAGYPSFNISRNVQGNH Reesei KTNVRAIARDGIVLLKNDANILPLKKPASIAVVGSAAIIGNHARNSPSCNDKGCDDGALGMGWG 4I8D_B SGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDNTSSGASAARGKDVAIVFITADSGEGYITVE GI:430801090 GNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVHSVGAIILEQILALPQVKAVVWAGLPSQESG NALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVSGGSDSFSEGLFIDYKHFDDANITPRYEFGY GLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSDLFQNVATVTVDIANSGQVTGAEVAQLYITY PSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRRRDLSYWDTASQKWVVPSGSFGISVGASSRD IRLTSTLSVA ChainA,Crystal AVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVSGVGWNGGPCVGNTSPASKISYPSLCLQDGP 368 StructureOfBeta- LGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFIGEEVKASGIHVILGPVAGPLGKTPQGGRNW d-glucoside EGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYILNEQELNRETISSNPDDRTLHELYTWPFAD Glucohydrolase AVQANVASVMCSYNKVNTTWACEDQYTLQTVLKDQLGFPGYVMTDWNAQHTTVQSANSGLDMSM 
fromTrichoderma PGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMVTRILAAWYLTGQDQAGYPSFNISRNVQGNH Reesei KTNVRAIARDGIVLLKNDANILPLKKPASIAVVGSAAIIGNHARNSPSCNDKGCDDGALGMGWG 4I8D_A SGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDNTSSGASAARGKDVAIVFITADSGEGYITVE GI:430801089 GNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVHSVGAIILEQILALPQVKAVVWAGLPSQESG NALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVSGGSDSFSEGLFIDYKHFDDANITPRYEFGY GLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSDLFQNVATVTVDIANSGQVTGAEVAQLYITY PSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRRRDLSYWDTASQKWVVPSGSFGISVGASSRD IRLTSTLSVA Cel3dprotein MILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTSYNK 369 [Trichoderma VNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIESAL reesei] QARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENSILP AAP57759.1 LPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQMLPVI GI:31747172 DAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPTATG IWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFGSAN TTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDRPHM DLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFGNVN PSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYATFKL PDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVYLEA GEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL Cel3cprotein MADIDVEAILKKLTLAEKVDLLAGIDFWHTKALPKHGVPSLRFTDGPNGVRGTKFFNGVPAACF 370 [Trichoderma PCGTSLGSTFNQTLLEEAGKMMGKEAIAKSAHVILGPTINMQRSPLGGRGFESIGEDPFLAGLG reesei] AAALIRGIQSTGVQATIKHFLCNDQEDRRMMVQSIVTERALREIYALPFQIAVRDSQPGAFMTA AAP57756.1 YNGINGVSCSENPKYLDGMLRKEWGWDGLIMSDWYGTYSTTEAVVAGLDLEMPGPPRFRGETLK GI:31747168 FNVSNGKPFIHVIDQRAREVLQFVKKCAASGVTENGPETTVNNTPETAALLRKVGNEGIVLLKN ENNVLPLSKKKKTLIVGPNAKQATYHGGGSAALRAYYAVTPFDGLSKQLETPPSYTVGAYTTVP PILGEQCLTPDGAPGMRWRVFNEPPGTPNRQHIDELFFTKTDMHLVDYYHPKAADTWYADMEGT YTADEDCTYELGLVVCGTAKAYVDDQLVVDNATKQVPGDAFFGSATREETGRINLVKGNTYKFK IEFGSAPTYTLKGDTIVPGHGSLRVGGCKVIDDQAEIEKSVALAKEHDQVIICAGLNADWETEG ADRASMKLPGVLDQLIADVAAANPNTVVVMQTGTPEEMPWLDATPAVIQAWYGGNETGNSIADV 
VFGDYNPSGKLSLSFPKRLQDNPAFLNFRTEAGRTLYGEDVYVGYRYYEFADKDVNFPFGHGLS YTTFAFSNLSVSHKDGKLSVSLSVKNTGSVPGAQVAQLYVKPLQAAKINRPVKELKGFAKVELQ PGETKAVTIEEQEKYVAAYFDEERDQWCVEKGDYEVIVSDSSAAKDGVALRGKFTVGETYWWSG V Cel3b protein MKTLSVFAAALLAAVAEANPYPPPHSNQAYSPPFYPSPWMDPSAPGWEQAYAQAKEFVSGLTLL 371 [Trichoderma EKVNLTTGVGWMGEKCVGNVGTVPRLGMRSLCMQDGPLGLRFNTYNSAFSVGLTAAASWSRHLW reesei] VDRGTALGSEAKGKGVDVLLGPVAGPLGRNPNGGRNVEGFGSDPYLAGLALADTVTGIQNAGTI AAP57755.1 ACAKHFLLNEQEHFRQVGEANGYGYPITEALSSNVDDKTIHEVYGWPFQDAVKAGVGSFMCSYN GI:31747166 QVNNSYACQNSKLINGLLKEEYGFQGFVMSDWQAQHTGVASAVAGLDMTMPGDTAFNTGASYFG SNLTLAVLNGTVPEWRIDDMVMRIMAPFFKVGKTVDSLIDTNFDSWTNGEYGYVQAAVNENWEK VNYGVDVRANHANHIREVGAKGTVIFKNNGILPLKKPKFLTVIGEDAGGNPAGPNGCGDRGCDD GTLAMEWGSGTTNFPYLVTPDAALQSQALQDGTRYESILSNYAISQTQALVSQPDAIAIVFANS DSGEGYINVDGNEGDRKNLTLWKNGDDLIKTVAAVNPKTIVVIHSTGPVILKDYANHPNISAIL WAGAPGQESGNSLVDILYGKQSPGRTPFTWGPSLESYGVSVMTTPNNGNGAPQDNFNEGAFIDY RYFDKVAPGKPRSSDKAPTYEFGFGLSWSTFKFSNLHIQKNNVGPMSPPNGKTIAAPSLGSFSK NLKDYGFPKNVRRIKEFIYPYLSTTTSGKEASGDAHYGQTAKEFLPAGALDGSPQPRSAASGEP GGNRQLYDILYTVTATITNTGSVMDDAVPQLYLSHGGPNEPPKVLRGFDRIERIAPGQSVTFKA DLTRRDLSNWDTKKQQWVITDYPKTVYVGSSSRDLPLSARLP beta-D-glucoside MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS 372 glucohydrolase GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI [Trichoderma GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI reesei] LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD AAA18473.1 QLGFPGYVMTDWNAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV GI:493580 TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA putative beta-
MTSFHDGVKLSTVTYGYYVPPYYPAPYGGWVEDWQESYTKAKALVDSMTLAEKTNITAGTGIYM 373 glucosidase1 GEWRCAGNTGSAFRVSFPQLCLNDSPAGVRHADNVTAFPDGITVGATFDKALMYKRGVAIGKEN precursor RGKGVNVWLGPTVGPIGRKPKGGRNWEGFGADPVLQAVGARETIKGVQEQGVIATIKHFIGNEQ [Trichoderma EMYRMYNPFQYAYSSNIDDRTLHEVYAWPFAEGIRAGVGAVMMAYNAVNGTACSQHPYLMSALL reeseiRUTC-30] KDEMGFQGFIMTDWLAHMSGVASAIAGLDMDMPGDVQIPFFGGSYWMYELTRSALNGSVPMDRI ET502983.1 NDAATRIAAAWYKMGQDKGFPATNFDTNSRAAFNPLYPAALPLSPFGITNEFVPVQDDHDVIAR GI:572279864 QISQEAITLLKNDGDILPLSPSQHLKVFGTDAQKNPDGINSCTDRNCNKGTLGQGWGSGTVDYP YLDDPISAITAEADNVTFYNTDKFPSVGEVSDSDVAIVFVNSDAGENTYTVEGNHGDRDKSGLY AWHDGDKLVQDAASKFSNVIVVIHTVGPLILEKWIDLPSVKAVLVAHLPGQEAGKSLTNVLFGH ASPCGHLPYSITKEEDDLPKSVTTLIDSEFLNQPQDTYTEGLYIDYRWLNKNKTKPRYAFGHGL SYTNFTFKAASIKQVARLSAYPPARPAKGSTPDFAQSIPSASEAVAPSGFGKIPRYIYSWLSQG DANRAISDGKTGKYPYPDGYSTTQKPGARAGGGEGGNPALWDVAYSLTVTVQNTGDEYAGKASV QAYLQFPDDIDYDTPIIQLRDFEKTKELKPGETTTVTLTLTRKDVSVWDVVAQDWKVPAVDGGY KVWIGDASDSLSIVCHTDTLECETGVVGPV putativebeta- MVAVKQIALLAGLAHWADAAEKVITNDTHFYGQSPPVYPSPEMTGGNEWEAAYQKAKAFVGQLT 374 glucosidase LEEKVNLTAGVPPNTTCSGVIPAIERLKFPGMCLSDAGNGLRNTDFVSGFPSGIHVGASWSKDL [Trichoderma AFRRAVAMGAEFRKKGVNVLLGPVVGPAGRTVRGGRNWEGFSVDPWLAGVLVSETVSGIQEQGV reeseiRUTC-30] ITSTKHYILNEQETHRMPEANVSAVSSNIDDKTMHEYYLWPFQDAVRAGSGNIMCSYQRINNSY ET501786.1 GCSNSKTLNGLLKTELGFQGFVVSDWSAQHAGVASAEAGMDMAMPGPAEFWGEHLVEAVKNGSL GI:572278616 PESRITDMATRIIATWYQFDQDNGIPKPGIGMPSNVLDSHEIVDARDPAAVPVLLNGAIEGHVL VKNTKNTLPLKKPRKLSLFGYSATTPDFFSPSRDEQLSDSWIFGKEAYNSNYLSPDGFATFGRN GTTFGGCGSGAITPALAISPFEALKWRAAQDGTATFNNFLSDKPDVDPTSDACIVFGNAYACEG NDRPAIQDDYTDDLIKAVASQCNKTIVVLHNAGIRLVDGFVDHPNITAVIFAHLPGQESGPALT SLLYGETSPSGRLPYTVAKNDTDYGVVLDPAQATGEFAYFPQADFKEGVYLDYRYFDKEGIEPR YEFGFGLSYTTFAYLNLSVDHVSGANTYPWPGGPIVSGGQTDLWDAIATVSVDIRNTGSVASYE VAQLYIGIPGAPAKQLRGFEKPFLRPNESQSVTFHLTRRDLSVWSVERQKWQLQQGTYKIYVGS SSRRLHVNGTLDI predictedprotein, HLRSHTVESPNSIVKRGTCAFPTDDPNLVAVTPDAENAGWAMSPDQPCKPGHYCPIACKPGMVM 375 partial AQWSPDSSYSYPSSMDGGLYCDEDGEVHKPFPNKPYCVEGTGAVVAVNKCGEPMSWCQTVLPGN 
[Trichoderma EAMLIPTLVEDQATIAVPDTSYWCETAAHYYINPPGSSVADCVWGVSSKPVGNWSPYVAGANTD reeseiQM6a] GDGNTFVKLGWNPIWQDSALKSTLPSFGVKIECPDGGCNGLPCEISPNSDGSVDSKESAVGAGN EGR51923.1 AAFCVVTVPKGGVANIVAYNVDGSSGGSDSDSDSDSGSSSSAAPSSTAHGLKAGGFAALAEKPT GI:340521689 STTAAPSSTEVSTTAAASTTEVASTTAAAESTTTAAESTAAETTDASATATTKAAHSTTGGKAS STARARPSVNPGMFHENGTSPHQTTAAPSGPSATQADSAPVTTTTKKGEAGRQQGSTAFAGLIV AFVAAACFL glycoside MVAVKQIALLAGLAHWADAAEKVITNDTHFYGQSPPVYPSPEMTGGNEWEAAYQKAKAFVGQLT 376 hydrolasefamily3 LEEKVNLTAGVPPNTTCSGVIPAIERLKFPGMCLSDAGNGLRNTDFVSGFPSGIHVGASWSKDL protien AFRRAVAMGAEFRKKGVNVLLGPVVGPAGRTVRGGRNWEGFSVDPWLAGVLVSETVSGIQEQGV [Trichoderma ITSTKHYILNEQETHRMPEANVSAVSSNIDDKTMHEYYLWPFQDAVRAGSGNIMCSYQRINNSY reeseiQM6a] GCSNSKTLNGLLKTELGFQGFVVSDWSAQHAGVASAEAGMDMAMPGPAEFWGEHLVEAVKNGSL EGR50829.1 PESRITDMATRIIATWYQFDQDNGIPKPGIGMPSNVLDSHEIVDARDPAAVPVLLNGAIEGHVL GI:340520593 VKNTKNTLPLKKPRKLSLFGYSATTPDFFSPSRDEQLSDSWIFGKEAYNSNYLSPDGFATFGRN GTTFGGCGSGAITPALAISPFEALKWRAAQDGTATFNNFLSDKPDVDPTSDACIVFGNAYACEG NDRPAIQDDYTDDLIKAVASQCNKTIVVLHNAGIRLVDGFVDHPNITAVIFAHLPGQESGPALT SLLYGETSPSGRLPYTVAKNDTDYGVVLDPAQATGEFAYFPQADFKEGVYLDYRYFDKEGIEPR YEFGFGLSYTTFAYLNLSVDHVSGANTYPWPGGPIVSGGQTDLWDAIATVSVDIRNTGSVASYE VAQLYIGIPGAPAKQLRGFEKPFLRPNESQSVTFHLTRRDLSVWSVERQKWQLQQGTYKIYVGS SSRRLHVNGTLDI cellwallprotein MLACITRATLPTVVAATPSHHHHHAHRHAKKHAAARVEKRAPDVVTEVVVGATATVFELDGKIV 377 [Trichoderma DAATAKAGLAEGEYIIVGETTPTFVPPPPPPPATSSAAPLRAQFVEEPISSPAAPTTTSAPPPP reeseiQM6a] PTTTAQATTSSAPPPPKTSKPAQSSPSSGATGLDADFPSGKISCKTFPSEYGAVALDWLGTGGW EGR50785.1 SGLQFVPNYSPDAQSISDIITGIAGQTCSKGAMCSYACPPGYQKTQWPKAQGATLQSIGGLYCN GI:340520549 EDGFLELTRPDHPKLCEAGAGGVTIKNDLDDSVCTCRTDYPGIESMVIPACTSAGETIELTNPD ETDYYVWDGKTTSAQYYVNKKGYAVEDACVWNSPLDPRGAGNWSPINIGTGKTADGITWLSIFE NLPTSSAKLDFNIEITGDVNSKCSYIDGAWTGGDKGCTTAMPSGGKAVIRYF glycoside MILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTSYNK 378 hydrolasefamily3 VNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIESAL protein 
QARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENSILP [Trichoderma LPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQMLPVI reeseiQM6a] DAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPTATG EGR49878.1 IWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFGSAN GI:340519640 TTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDRPHM DLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFGNVN PSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYATFKL PDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVYLEA GEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL glycoside MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS 379 hydrolasefamily3 GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI protein GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI [Trichoderma LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD reeseiQM6a] QLGFPGYVMTDWNAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV EGR49703.1 TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG GI:340519465 SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA glycoside MTLAEKTNITAGTGIYMGERCAGNTGSAFRVSFPQLCLNDSPAGVRHADNVTAFPDGITVGATF 380 hydrolasefamily3 DKALMYKRGVAIGKENRGKGVNVWLGPTVGPIGRKPKGGRNWEGFGADPVLQAVGARETIKGVQ protein EQGVIATIKHFIGNEQEMYRMYNPFQYAYSSNIDDRTLHEVYAWPFAEGIRAGVGAVMMAYNAV [Trichoderma NGTACSQHPYLMSALLKDEMGFQGFIMTDWLAHMSGVASAIAGLDMDMPGDVQIPFFGGSYWMY reeseiQM6a] ELTRSALNGSVPMDRINDAATRIAAAWYKMGQDKGFPATNFDTNSRAAFNPLYPAALPLSPFGI 814aaprotein TNEFVPVQDDHDVIARQISQEAITLLKNDGDILPLSPSQHLKVFGTDAQKNPDGINSCTDRNCN EGR49559.1 
KGTLGQGWGSGTVDYPYLDDPISAITAEADNVTFYNTDKFPSVGEVSDSDVAIVFVNSDAGENT GI:340519320 YTVEGNHGDRDKSGLYAWHDGDKLVQDAASKFSNVIVVIHTVGPLILEKWIDLPSVKAVLVAHL PGQEAGKSLTNVLFGHASPCGHLPYSITKEEDDLPKSVTTLIDSEFLNQPQDTYTEGLYIDYRW LNKNKTKPRYAFGHGLSYTNFTFKAASIKQVARLSAYPPARPAKGSTPDFAQSIPSASEAVAPS GFGKIPRYIYSWLSQGDANRAISDGKTGKYPYPDGYSTTQKPGARAGGGEGGNPALWDVAYSLT VTVQNTGDEYAGKASVQAYLQFPDDIDYDTPIIQLRDFEKTKELKPGETTTVTLTLTRKDVSVW DVVAQDWKVPAVDGGYKVWIGDASDSLSIVCHTDTLECETGVVGPV glycosidehydrolase MKTLSVFAAALLAAVAEANPYPPPHSNQAYSPPFYPSPWMDPSAPGWEQAYAQAKEFVSGLTLL 381 family3protein EKVNLTTGVGWMGEKCVGNVGTVPRLGMRSLCMQDGPLGLRFNTYNSAFSVGLTAAASWSRHLW [Trichoderma VDRGTALGSEAKGKGVDVLLGPVAGPLGRNPNGGRNVEGFGSDPYLAGLALADTVTGIQNAGTI reeseiQM6a] ACAKHFLLNEQEHFRQVGEANGYGYPITEALSSNVDDKTIHEVYGWPFQDAVKAGVGSIMCSYN EGR48517.1 QVNNSYACQNSKLINGLLKEEYGFQGFVMSDWQAQHTGVASAVAGLDMTMPGDTAFNTGASYFG GI:340518276 SNLTLAVLNGTVPEWRIDDMVMRIMAPFFKVGKTVDSLIDTNFDSWTNGEYGYVQAAVNENWEK VNYGVDVRANHANHIREVGAKGTVIFKNNGILPLKKPKFLTVIGEDAGGNPAGPNGCGDRGCDD GTLAMEWGSGTTNFPYLVTPDAALQSQALQDGTRYESILSNYAISQTQALVSQPDAIAIVFANS DSGEGYINVDGNEGDRKNLTLWKNGDDLIKTVAAVNPKTIVVIHSTGPVILKDYANHPNISAIL WAGAPGQESGNSLVDILYGKQSPGRTPFTWGPSLESYGVSVMTTPNNGNGAPQDNFNEGAFIDY RYFDKVAPGKPRSSDKAPTYEFGFGLSWSTFKFSNLHIQKNNVGPMSPPNGKTIAAPSLGSFSK NLKDYGFPKNVRRIKEFIYPYLSTTTSGKEASGDAHYGQTAKEFLPAGALDGSPQPRSAASGEP GGNRQLYDILYTVTATITNTGSVMDDAVPQLYLSHGGPNEPPKVLRGFDRIERIAPGQSVTFKA DLTRRDLSNWDTKKQQWVITDYPKTVYVGSSSRDLPLSARLP glycosidehydrolase MANSIGGSSADKFDLDPLWQNLDWAIGQMMLMGWDGTQVTPQIRSLIEDHHLGSIILTAKNLKS 382 family3protein AHHTALLVQELQMIAKNSGHPQPLLIAVDQENGGVNSLFDEDFVCQFPSAMAIAATGSLELSYE [Trichoderma VNKATATEISACGVNLMLGPVLDVLNNARYQVIGVRASGDDPQEVSQYGLAALSGIRDAGVASC reeseiQM6a] GKHFPSYGNLDFLGSNLDVPIITQTLEELSLSALVPFRNAIASGKLDAMFIGGCGISNPSMNVS EGR47352.1 HACLSDQVVDDLLRNELGFKGVAISECLEMEALSQDLGVQNGVVMAVEAGCDIVLLCRAYDVQL GI:340517106 EAIKGLKLGYENGIITKERIFTSLKRIFHLKSTCTSWAKALNPPGINLLSQIRPSHLALSRRAY DDSITIVRDKEKLLPLSLSMHPGEELLLLTPLVKPLPASSLTKSLLESKNDPSLVSTEHDRWNH 
QIRERSAIMSGEGVFREFGKTLARYRNEKLLHTSYTANGVRPVHENLINRASCIIIFTADANRN LYQAGFTKHVDMMCSMLRSRGQKKQLIVVAVSSPYDFAMDKSIGTYICTYDFTENAMAALVRAL VGDSNPVGTMPGTLRKSKKVLKSRQHWLVEEFDSSRDRKGLNDLIRAVHRASDQDFRYLQTATA DTFLLANQNIKETHFVVRNSSTQALYGFAATYFVQNVGILGALIVDPTKRNMSIGRSLHRRAIK SLTQQRGIKKVQLGSCFPALFLGIPLDIEVTTTKEWFSNSGWDTQFPRRLTNMVIQDLSAWYAP EGLSQSIQRANISFDLIYGVESGDTVMHHVRTHANPEVLELYRTALEESKACGIVRAKDAAGNL LGTIIICRPNSPLARYVPPLVSLGQDIGGLLAPIVPPAPLSTLVLQGLALMGVRQNKGHKATKS VLSWVVDDAYEPLVAMGFDVLQAFEEITNSPETFQT carbohydrate MLPRRMRKSRCCIAVLAVIAIIVMLLAAAGAFGYKKLKITPLDGKSPPWYPTPKGGSVRQWADS 383 esterasefamily4 YQKAAEMVARMTLPEKVNITTGTGWSMGLAVGNTAPALLVGFPALALQDGPLGIRFADNATALP protein AGVTVGATWNRHLMYEHGRVHALEARGKGINALLGPCVGPLGRMPAGGRNWEGFGADPYLQGVA [Trichoderma GYETIKGIQDQGVMATIKHFVANEQEHFRQAWEWVLPNALSSNIDDRTMHEIYAWPFGDAVKAG reeseiQM6a] VASVMCSYNMVNNSYACGNSKLLNGILKDELGFQGFVMSDWLAQRSGVGSALAGLDMSMPGDGL EGR46266.1 RWQDGNSLWGPNLSRAVLNGSLPLERLNDMVVRIVAAWYQLGQDDEKLFDRKGPNFSSWTNDRM GI:340516015 GVTAPASSSPQEKVVVNQFVNVQANHSILARQIAAEGTVLLKNEGVLPLSVDGLLGGGGGSNST KREGQVRIGIFGEDAGPGKGPNYCEDRSCNQGTLASGWGSGAVEFPYLVSPIEALRKKFNKDKV KLTEHLKNDELDTGVIKSQDICMVFINSDSGEGYRAWEGVRGDRNDLKPQKGGVGLVTHVGLNC GNGSGTTIVVLHSVGPVVVDPWIDMPGIKALISANLPGQESGNALASILFGEENPSGKLPYTVG KSLSDYGPGGQVMYLPNGAVPQQDFSEGLYIDYRHFDKFNIEPRYEFGFGLSYTNFDYKNLKIT ETKPRSPLPDERPAAEVEPPSFDTTIPQAEEALFPSGIRRLKKYVYPYIESVKDIKEGQYSYPD GYDKEQPLSGAGGGEGGNPSLWDSHVIVSVEVTNTGKLGGKAVPQLYLSYPASETVDFPVRVLR GFDKVYIGKGETKTVEFSLTRRDLSYWDVERQNWVIPEGEYTFAVGESSRDLRVSGTW glycosidehydrolase QLFAVGFHGREINQEITTLIRDYGVGAIVLFKRNENGLVTRISPPIASQQPGPMTLGAAGSLEY 384 family3protein, AYEVAKATAEMLRYFGINMNYAPVGDVNSEPLNPVIGVRSPSDQAETVSKFAAACTKGLREHKV partial VPCIKHFPGHGDTAVDSHYGLPVINKSRADMEKLELIPFRDAVADNIEMVMTAHISLPQLAKDG [Trichoderma LPATLSPDTIGILRNEWKYEGVIMTDCLEMDGIRATYGTVEGSLMAFQAGVDNVMICHTFDVQA reeseiQM6a] AAVDYICGAIESGKLSQERVDQSLERLRKLKERYTNWDIALHAEPPEALEAINERGEKLARQVY EGR44807.1 ADATTLVRAQEGLLPLKATAKIAFVSPGPDVPIGGAVDSGTLPTRVPWIADTFGEQIRKRAPEM GI:340514546 
SDVRFTSSNLTEEQWEQIDEADVVILATRNARESKYQKELGLEVAKRRGSRPLISIATCAPYDF LDDEEIRTYIAVYEPTVEAFSAAVDILFGDAQPRGKLPVAH glycosidehydrolase MADIDVEAILKKLTLAEKVDLLAGIDFWHTKALPKHGVPSLRFTDGPNGVRGTKFFNGVPAACF 385 family3protein PCGTSLGSTFNQTLLEEAGKMMGKEAIAKSAHVILGPTINMQRSPLGGRGFESIGEDPFLAGLG [Trichoderma AAALIRGIQSTGVQATIKHFLCNDQEDRRMMVQSIVTERALREIYALPFQIAVRDSQPGAFMTA reeseiQM6a] YNGINGVSCSENPKYLDGMLRKEWGWDGLIMSDWYGTYSTTEAVVAGLDLEMPGPPRFRGETLK EGR44527.1 FNVSNGKPFIHVIDQRAREVLQFVKKCAASGVTENGPETTVNNTPETAALLRKVGNEGIVLLKN GI:340514262 ENNVLPLSKKKKTLIVGPNAKQATYHGGGSAALRAYYAVTPFDGLSKQLETPPSYTVGAYTHRF LPILGEQCLTPDGAPGMRWRVFNEPPGTPNRQHIDELFFTKTDMHLVDYYHPKAADTWYADMEG TYTADEDCTYELGLVVCGTAKAYVDDQLVVDNATKQVPGDAFFGSATREETGRINLVKGNTYKF KIEFGSAPTYTLKGDTIVPGHGSLRVGGCKVIDDQAEIEKSVALAKEHDQVIICAGLNADWETE GADRASMKLPGVLDQLIADVAAANPNTVVVMQTGTPEEMPWLDATPAVIQAWYGGNETGNSIAD VVFGDYNPSGKLSLSFPKRLQDNPAFLNFRTEAGRTLYGEDVYVGYRYYEFADKDVNFPFGHGL SYTTFAFSNLSVSHKDGKLSVSLSVKNTGSVPGAQVAQLYVKPLQAAKINRPVKELKGFAKVEL QPGETKAVTIEEQEKYVAAYFDEERDQWCVEKGDYEVIVSDSSAAKDGVALRGKFTVGETYWWS GV glycoside MADIDVEAILKKLTLAEKVDLLAGIDFWHTKALPKHGVPSLRFTDGPNGVRGTKFFNGVPAACF 386 hydrolasefamily3 PCGTSLGSTFNQTLLEEAGKMMGKEAIAKSAHVILGPTINMQRSPLGGRGFESIGEDPFLAGLG protein AAALIRGIQSTGVQATIKHFLCNDQEDRRMMVQSIVTERALREIYALPFQIAVRDSQPGAFMTA [Trichoderma YNGINGVSCSENPKYLDGMLRKEWGWDGLIMSDWYGTYSTTEAVVAGLDLEMPGPPRFRGETLK reeseiQM6a] FNVSNGKPFIHVIDQRAREVLQFVKKCAASGVTENGPETTVNNTPETAALLRKVGNEGIVLLKN XP_006969529.1 ENNVLPLSKKKKTLIVGPNAKQATYHGGGSAALRAYYAVTPFDGLSKQLETPPSYTVGAYTHRF GI:589115013 LPILGEQCLTPDGAPGMRWRVFNEPPGTPNRQHIDELFFTKTDMHLVDYYHPKAADTWYADMEG TYTADEDCTYELGLVVCGTAKAYVDDQLVVDNATKQVPGDAFFGSATREETGRINLVKGNTYKF KIEFGSAPTYTLKGDTIVPGHGSLRVGGCKVIDDQAEIEKSVALAKEHDQVIICAGLNADWETE GADRASMKLPGVLDQLIADVAAANPNTVVVMQTGTPEEMPWLDATPAVIQAWYGGNETGNSIAD VVFGDYNPSGKLSLSFPKRLQDNPAFLNFRTEAGRTLYGEDVYVGYRYYEFADKDVNFPFGHGL SYTTFAFSNLSVSHKDGKLSVSLSVKNTGSVPGAQVAQLYVKPLQAAKINRPVKELKGFAKVEL QPGETKAVTIEEQEKYVAAYFDEERDQWCVEKGDYEVIVSDSSAAKDGVALRGKFTVGETYWWS GV glycoside 
QLFAVGFHGREINQEITTLIRDYGVGAIVLFKRNENGLVTRISPPIASQQPGPMTLGAAGSLEY 387 hydrolasefamily3 AYEVAKATAEMLRYFGINMNYAPVGDVNSEPLNPVIGVRSPSDQAETVSKFAAACTKGLREHKV protein,partial VPCIKHFPGHGDTAVDSHYGLPVINKSRADMEKLELIPFRDAVADNIEMVMTAHISLPQLAKDG [Trichoderma LPATLSPDTIGILRNEWKYEGVIMTDCLEMDGIRATYGTVEGSLMAFQAGVDNVMICHTFDVQA reeseiQM6a] AAVDYICGAIESGKLSQERVDQSLERLRKLKERYTNWDIALHAEPPEALEAINERGEKLARQVY XP_006969215.1 ADATTLVRAQEGLLPLKATAKIAFVSPGPDVPIGGAVDSGTLPTRVPWIADTFGEQIRKRAPEM GI:589114385 SDVRFTSSNLTEEQWEQIDEADVVILATRNARESKYQKELGLEVAKRRGSRPLISIATCAPYDF LDDEEIRTYIAVYEPTVEAFSAAVDILFGDAQPRGKLPVAH carbohydrate MLPRRMRKSRCCIAVLAVIAIIVMLLAAAGAFGYKKLKITPLDGKSPPWYPTPKGGSVRQWADS 388 esterasefamily4 YQKAAEMVARMTLPEKVNITTGTGWSMGLAVGNTAPALLVGFPALALQDGPLGIRFADNATALP protein AGVTVGATWNRHLMYEHGRVHALEARGKGINALLGPCVGPLGRMPAGGRNWEGFGADPYLQGVA [Trichoderma GYETIKGIQDQGVMATIKHFVANEQEHFRQAWEWVLPNALSSNIDDRTMHEIYAWPFGDAVKAG reeseiQM6a] VASVMCSYNMVNNSYACGNSKLLNGILKDELGFQGFVMSDWLAQRSGVGSALAGLDMSMPGDGL XP_006967911.1 RWQDGNSLWGPNLSRAVLNGSLPLERLNDMVVRIVAAWYQLGQDDEKLFDRKGPNFSSWTNDRM GI:589111777 GVTAPASSSPQEKVVVNQFVNVQANHSILARQIAAEGTVLLKNEGVLPLSVDGLLGGGGGSNST KREGQVRIGIFGEDAGPGKGPNYCEDRSCNQGTLASGWGSGAVEFPYLVSPIEALRKKFNKDKV KLTEHLKNDELDTGVIKSQDICMVFINSDSGEGYRAWEGVRGDRNDLKPQKGGVGLVTHVGLNC GNGSGTTIVVLHSVGPVVVDPWIDMPGIKALISANLPGQESGNALASILFGEENPSGKLPYTVG KSLSDYGPGGQVMYLPNGAVPQQDFSEGLYIDYRHFDKFNIEPRYEFGFGLSYTNFDYKNLKIT ETKPRSPLPDERPAAEVEPPSFDTTIPQAEEALFPSGIRRLKKYVYPYIESVKDIKEGQYSYPD GYDKEQPLSGAGGGEGGNPSLWDSHVIVSVEVTNTGKLGGKAKGETKTVEFSLTRRDLSYWDVE RQNWVIPEGEYTFAVGESSRDLRVSGTW glycoside MANSIGGSSADKFDLDPLWQNLDWAIGQMMLMGWDGTQVTPQIRSLIEDHHLGSIILTAKNLKS 389 hydrolasefamily3 AHHTALLVQELQMIAKNSGHPQPLLIAVDQENGGVNSLFDEDFVCQFPSAMAIAATGSLELSYE protein VNKATATEISACGVNLMLGPVLDVLNNARYQVIGVRASGDDPQEVSQYGLAALSGIRDAGVASC [Trichoderma GKHFPSYGNLDFLGSNLDVPIITQTLEELSLSALVPFRNAIASGKLDAMFIGGCGISNPSMNVS reeseiQM6a] HACLSDQVVDDLLRNELGFKGVAISECLEMEALSQDLGVQNGVVMAVEAGCDIVLLCRAYDVQL XP_006966911.1 
EAIKGLKLGYENGIITKERIFTSLKRIFHLKSTCTSWAKALNPPGINLLSQIRPSHLALSRRAY GI:589109777 DDSITIVRDKEKLLPLSLSMHPGEELLLLTPLVKPLPASSLTKSLLESKNDPSLVSTEHDRWNH QIRERSAIMSGEGVFREFGKTLARYRNEKLLHTSYTANGVRPVHENLINRASCIIIFTADANRN LYQAGFTKHVDMMCSMLRSRGQKKQLIVVAVSSPYDFAMDKSIGTYICTYDFTENAMAALVRAL VGDSNPVGTMPGTLRKSKKVLKSRQHWLVEEFDSSRDRKGLNDLIRAVHRASDQDFRYLQTATA DTFLLANQNIKETHFVVRNSSTQALYGFAATYFVQNVGILGALIVDPTKRNMSIGRSLHRRAIK SLTQQRGIKKVQLGSCFPALFLGIPLDIEVTTTKEWFSNSGWDTQFPRRLTNMVIQDLSAWYAP EGLSQSIQRANISFDLIYGVESGDTVMHHVRTHANPEVLELYRTALEESKACGIVRAKDAAGNL LGTIIICRPNSPLARYVPPLVSLGQDIGGLLAPIVPPAPLSTLVLQGLALMGVRQNKGHKATKS VLSWVVDDAYEPLVAMGFDVLQAFEEITNSPETFQT glycoside MKTLSVFAAALLAAVAEANPYPPPHSNQAYSPPFYPSPWMDPSAPGWEQAYAQAKEFVSGLTLL 390 hydrolasefamily3 EKVNLTTGVGWMGEKCVGNVGTVPRLGMRSLCMQDGPLGLRFNTYNSAFSVGLTAAASWSRHLW protein VDRGTALGSEAKGKGVDVLLGPVAGPLGRNPNGGRNVEGFGSDPYLAGLALADTVTGIQNAGTI [Trichoderma ACAKHFLLNEQEHFRQVGEANGYGYPITEALSSNVDDKTIHEVYGWPFQDAVKAGVGSIMCSYN reeseiQM6a QVNNSYACQNSKLINGLLKEEYGFQGFVMSDWQAQHTGVASAVAGLDMTMPGDTAFNTGASYFG XP_006965281.1 SNLTLAVLNGTVPEWRIDDMVMRIMAPFFKVGKTVDSLIDTNFDSWTNGEYGYVQAAVNENWEK GI:589106517 VNYGVDVRANHANHIREVGAKGTVIFKNNGILPLKKPKFLTVIGEDAGGNPAGPNGCGDRGCDD GTLAMEWGSGTTNFPYLVTPDAALQSQALQDGTRYESILSNYAISQTQALVSQPDAIAIVFANS DSGEGYINVDGNEGDRKNLTLWKNGDDLIKTVAAVNPKTIVVIHSTGPVILKDYANHPNISAIL WAGAPGQESGNSLVDILYGKQSPGRTPFTWGPSLESYGVSVMTTPNNGNGAPQDNFNEGAFIDY RYFDKVAPGKPRSSDKAPTYEFGFGLSWSTFKFSNLHIQKNNVGPMSPPNGKTIAAPSLGSFSK NLKDYGFPKNVRRIKEFIYPYLSTTTSGKEASGDAHYGQTAKEFLPAGALDGSPQPRSAASGEP GGNRQLYDILYTVTATITNTGSVMDDAVPQLYLSHGGPNEPPKVLRGFDRIERIAPGQSVTFKA DLTRRDLSNWDTKKQQWVITDYPKTVYVGSSSRDLPLSARLP glycoside MTLAEKTNITAGTGIYMGERCAGNTGSAFRVSFPQLCLNDSPAGVRHADNVTAFPDGITVGATF 391 hydrolasefamily3 DKALMYKRGVAIGKENRGKGVNVWLGPTVGPIGRKPKGGRNWEGFGADPVLQAVGARETIKGVQ protein EQGVIATIKHFIGNEQEMYRMYNPFQYAYSSNIDDRTLHEVYAWPFAEGIRAGVGAVMMAYNAV [Trichoderma NGTACSQHPYLMSALLKDEMGFQGFIMTDWLAHMSGVASAIAGLDMDMPGDVQIPFFGGSYWMY reeseiQM6a 
ELTRSALNGSVPMDRINDAATRIAAAWYKMGQDKGFPATNFDTNSRAAFNPLYPAALPLSPFGI XP_006964430.1 TNEFVPVQDDHDVIARQISQEAITLLKNDGDILPLSPSQHLKVFGTDAQKNPDGINSCTDRNCN GI:589104815 KGTLGQGWGSGTVDYPYLDDPISAITAEADNVTFYNTDKFPSVGEVSDSDVAIVFVNSDAGENT YTVEGNHGDRDKSGLYAWHDGDKLVQDAASKFSNVIVVIHTVGPLILEKWIDLPSVKAVLVAHL PGQEAGKSLTNVLFGHASPCGHLPYSITKEEDDLPKSVTTLIDSEFLNQPQDTYTEGLYIDYRW LNKNKTKPRYAFGHGLSYTNFTFKAASIKQVARLSAYPPARPAKGSTPDFAQSIPSASEAVAPS GFGKIPRYIYSWLSQGDANRAISDGKTGKYPYPDGYSTTQKPGARAGGGEGGNPALWDVAYSLT VTVQNTGDEYAGKASVQAYLQFPDDIDYDTPIIQLRDFEKTKELKPGETTTVTLTLTRKDVSVW DVVAQDWKVPAVDGGYKVWIGDASDSLSIVCHTDTLECETGVVGPV glycoside MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS 392 hydrolasefamily3 GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI protein GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI [Trichoderma LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD reeseiQM6a QLGFPGYVMTDWNAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV XP_006964076.1 TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG GI:589104107 SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA glycoside MILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTSYNK 393 hydrolasefamily3 VNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIESAL protein QARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENSILP [Trichoderma LPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQMLPVI reeseiQM6a DAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPTATG XP_006964050.1 IWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFGSAN GI:589104055 TTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDRPHM 
DLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFGNVN PSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYATFKL PDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVYLEA GEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL glycoside MVAVKQIALLAGLAHWADAAEKVITNDTHFYGQSPPVYPSPEMTGGNEWEAAYQKAKAFVGQLT 394 hydrolasefamily3 LEEKVNLTAGVPPNTTCSGVIPAIERLKFPGMCLSDAGNGLRNTDFVSGFPSGIHVGASWSKDL protein AFRRAVAMGAEFRKKGVNVLLGPVVGPAGRTVRGGRNWEGFSVDPWLAGVLVSETVSGIQEQGV [Trichoderma ITSTKHYILNEQETHRMPEANVSAVSSNIDDKTMHEYYLWPFQDAVRAGSGNIMCSYQRINNSY reeseiQM6a GCSNSKTLNGLLKTELGFQGFVVSDWSAQHAGVASAEAGMDMAMPGPAEFWGEHLVEAVKNGSL XP_006963375.1 PESRITDMATRIIATWYQFDQDNGIPKPGIGMPSNVLDSHEIVDARDPAAVPVLLNGAIEGHVL GI:589102705 VKNTKNTLPLKKPRKLSLFGYSATTPDFFSPSRDEQLSDSWIFGKEAYNSNYLSPDGFATFGRN GTTFGGCGSGAITPALAISPFEALKWRAAQDGTATFNNFLSDKPDVDPTSDACIVFGNAYACEG NDRPAIQDDYTDDLIKAVASQCNKTIVVLHNAGIRLVDGFVDHPNITAVIFAHLPGQESGPALT SLLYGETSPSGRLPYTVAKNDTDYGVVLDPAQATGEFAYFPQADFKEGVYLDYRYFDKEGIEPR YEFGFGLSYTTFAYLNLSVDHVSGANTYPWPGGPIVSGGQTDLWDAIATVSVDIRNTGSVASYE VAQLYIGIPGAPAKQLRGFEKPFLRPNESQSVTFHLTRRDLSVWSVERQKWQLQQGTYKIYVGS SSRRLHVNGTLDI cellwallprotein MLACITRATLPTVVAATPSHHHHHAHRHAKKHAAARVEKRAPDVVTEVVVGATATVFELDGKIV 395 [Trichoderma DAATAKAGLAEGEYIIVGETTPTFVPPPPPPPATSSAAPLRAQFVEEPISSPAAPTTTSAPPPP reeseiQM6a PTTTAQATTSSAPPPPKTSKPAQSSPSSGATGLDADFPSGKISCKTFPSEYGAVALDWLGTGGW XP_006963339.1 SGLQFVPNYSPDAQSISDIITGIAGQTCSKGAMCSYACPPGYQKTQWPKAQGATLQSIGGLYCN GI:589102633 EDGFLELTRPDHPKLCEAGAGGVTIKNDLDDSVCTCRTDYPGIESMVIPACTSAGETIELTNPD ETDYYVWDGKTTSAQYYVNKKGYAVEDACVWNSPLDPRGAGNWSPINIGTGKTADGITWLSIFE NLPTSSAKLDFNIEITGDVNSKCSYIDGAWTGGDKGCTTAMPSGGKAVIRYF predictedprotein HLRSHTVESPNSIVKRGTCAFPTDDPNLVAVTPDAENAGWAMSPDQPCKPGHYCPIACKPGMVM 396 partial AQWSPDSSYSYPSSMDGGLYCDEDGEVHKPFPNKPYCVEGTGAVVAVNKCGEPMSWCQTVLPGN [Trichoderma EAMLIPTLVEDQATIAVPDTSYWCETAAHYYINPPGSSVADCVWGVSSKPVGNWSPYVAGANTD reeseiQM6a GDGNTFVKLGWNPIWQDSALKSTLPSFGVKIECPDGGCNGLPCEISPNSDGSVDSKESAVGAGN XP_006962014.1 
AAFCVVTVPKGGVANIVAYNVDGSSGGSDSDSDSDSGSSSSAAPSSTAHGLKAGGFAALAEKPT GI:589099983 STTAAPSSTEVSTTAAASTTEVASTTAAAESTTTAAESTAAETTDASATATTKAAHSTTGGKAS STARARPSVNPGMFHENGTSPHQTTAAPSGPSATQADSAPVTTTTKKGEAGRQQGSTAFAGLIV FVAAACFL hypothetical MKVTDVQAALASAVVLLSLPAGSVASSHKRFHQLPNKKHTHLRSHTVESPNSIVKRGTCAFPTD 397 protein DPNLVAVTPDAENAGWAMSPDQPCKPGHYCPIACKPGMVMAQWSPDSSYSYPSSMDGGLYCDED M419DRAFT_70331 GEVHKPFPNKPYCVEGTGAVVAVNKCGEPMSWCQTVLPGNEAMLIPTLVEDQATIAVPDTSYWC [Trichoderma ETAAHYYINPPGSSVADCVWGVSSKPVGNWSPYVAGANTDGDGNTFVKLGWNPIWQDSALKSTL reeseiRUTC-30 PSFGVKIECPDGGCNGLPCEISPNSDGSVDSKESAVGAGNAAFCVVTVPKGGVANIVAYNKPTS ETS05514.1 TTAAPSSTEVSTTAAASTTEVASTTAAAESTTTAAESTAAETTDASATATTKAAHSTTGGKASS GI:572282500 TARARPSVNPGMFHENGTSPHQTTAAPSGPSATQADSAPVTTTTKKGEAGRQQGSTAFAGLIVA FVAAACFL beta-D-glucoside MRYRTAAALALATGPFARADSHSTSGASAEAVVPPAGTPWGTAYDKAKAALAKLNLQDKVGIVS 398 glucohydrolaseI GVGWNGGPCVGNTSPASKISYPSLCLQDGPLGVRYSTGSTAFTPGVQAASTWDVNLIRERGQFI [Trichoderma GEEVKASGIHVILGPVAGPLGKTPQGGRNWEGFGVDPYLTGIAMGQTINGIQSVGVQATAKHYI reeseiRUTC-30 LNEQELNRETISSNPDDRTLHELYTWPFADAVQANVASVMCSYNKVNTTWACEDQYTLQTVLKD ETS03194.1 QLGFPGYVMTDWNAQHTTVQSANSGLDMSMPGTDFNGNNRLWGPALTNAVNSNQVPTSRVDDMV GI:572280097 TRILAAWYLTGQDQAGYPSFNISRNVQGNHKTNVRAIARDGIVLLKNDANILPLKKPASIAVVG SAAIIGNHARNSPSCNDKGCDDGALGMGWGSGAVNYPYFVAPYDAINTRASSQGTQVTLSNTDN TSSGASAARGKDVAIVFITADSGEGYITVEGNAGDRNNLDPWHNGNALVQAVAGANSNVIVVVH SVGAIILEQILALPQVKAVVWAGLPSQESGNALVDVLWGDVSPSGKLVYTIAKSPNDYNTRIVS GGSDSFSEGLFIDYKHFDDANITPRYEFGYGLSYTKFNYSRLSVLSTAKSGPATGAVVPGGPSD LFQNVATVTVDIANSGQVTGAEVAQLYITYPSSAPRTPPKQLRGFAKLNLTPGQSGTATFNIRR RDLSYWDTASQKWVVPSGSFGISVGASSRDIRLTSTLSVA hypotheticalprotei MILGCESTGVISAVKHFVANDQEHERRAVDCLITQRALREVYLRPFQIVARDARPGALMTSYNK 399 nM419DRAFT_122639 VNGKHVADSAEFLQGILRTEWNWDPLIVSDWYGTYTTIDAIKAGLDLEMPGVSRYRGKYIESAL [Trichoderma QARLLKQSTIDERARRVLRFAQKASHLKVSEVEQGRDFPEDRVLNRQICGSSIVLLKNENSILP reeseiRUTC-30 LPKSVKKVALVGSHVRLPAISGGGSASLVPYYAISLYDAVSEVLAGATITHEVGAYAHQMLPVI ETS03170.1 
DAMISNAVIHFYNDPIDVKDRKLLGSENVSSTSFQLMDYNNIPTLNKAMFWGTLVGEFIPTATG GI:572280073 IWEFGLSVFGTADLYIDNELVIENTTHQTRGTAFFGKGTTEKVATRRMVAGSTYKLRLEFGSAN TTKMETTGVVNFGGGAVHLGACLKVDPQEMIARAVKAAADADYTIICTGLSGEWESEGFDRPHM DLPPGVDTMISQVLDAAPNAVVVNQSGTPVTMSWAHKAKAIVQAWYGGNETGHGISDVLFGNVN PSGKLSLSWPVDVKHNPAYLNYASVGGRVLYGEDVYVGYKFYDKTEREVLFPFGHGLSYATFKL PDSTVRTVPETFHPDQPTVAIVKIKNTSSVPGAQVLQLYISAPNSPTHRPVKELHGFEKVYLEA SUN-domain- GEEKEVQIPIDQYATSFWDEIESMWKSERGIYDVLVGFSSQEISGKGKLIVPETRFWMGL 400 containingprotein MLACITRATLPTVVAATPSHHHHHAHRHAKKHAAARVEKRAPDVVTEVVVGATATVFELDGKIV [Trichoderma DAATAKAGLAEGEYIIVGETTPTFVPPPPPPPATSSAAPLRAQFVEEPISSPAAPTTTSAPPPP reeseiRUTC-30 PTTTAQATTSSAPPPPKTSKPAQSSPSSGATGLDADFPSGKISCKTFPSEYGAVALDWLGTGGW ETS01671.1 SGLQFVPNYSPDAQSISDIITGIAGQTCSKGAMCSYACPPGYQKTQWPKAQGATLQSIGGLYCN GI:572278501 EDGFLELTRPDHPKLCEAGAGGVTIKNDLDDSVCTCRTDYPGIESMVIPACTSAGETIELTNPD ETDYYVWDGKTTSAQYYVNKKGYAVEDACVWNSPLDPRGAGNWSPINIGTGKTADGITWLSIFE NLPTSSAKLDFNIEITGDVNSKCSYIDGAWTGGDKGCTTAMPSGGKAVIRYF hypotheticalprotei MKTLSVFAAALLAAVAEANPYPPPHSNQAYSPPFYPSPWMDPSAPGWEQAYAQAKEFVSGLTLL nM419DRAFT_25095 EKVNLTTGVGWMGEKCVGNVGTVPRLGMRSLCMQDGPLGLRFNTYNSAFSVGLTAAASWSRHLW 401 [Trichoderma VDRGTALGSEAKGKGVDVLLGPVAGPLGRNPNGGRNVEGFGSDPYLAGLALADTVTGIQNAGTI reeseiRUTC-30 ACAKHFLLNEQEHFRQVGEANGYGYPITEALSSNVDDKTIHEVYGWPFQDAVKAGVGSIMCSYN ET01349.1 QVNNSYACQNSKLINGLLKEEYGFQGFVMSDWQAQHTGVASAVAGLDMTMPGDTAFNTGASYFG GI:572278157 SNLTLAVLNGTVPEWRIDDMVMRIMAPFFKVGKTVDSLIDTNFDSWTNGEYGYVQAAVNENWEK VNYGVDVRANHANHIREVGAKGTVIFKNNGILPLKKPKFLTVIGEDAGGNPAGPNGCGDRGCDD GTLAMEWGSGTTNFPYLVTPDAALQSQALQDGTRYESILSNYAISQTQALVSQPDAIAIVFANS DSGEGYINVDGNEGDRKNLTLWKNGDDLIKTVAAVNPKTIVVIHSTGPVILKDYANHPNISAIL WAGAPGQESGNSLVDILYGKQSPGRTPFTWGPSLESYGVSVMTTPNNGNGAPQDNFNEGAFIDY RYFDKVAPGKPRSSDKAPTYEFGFGLSWSTFKFSNLHIQKNNVGPMSPPNGKTIAAPSLGSFSK NLKDYGFPKNVRRIKEFIYPYLSTTTSGKEASGDAHYGQTAKEFLPAGALDGSPQPRSAASGEP GGNRQLYDILYTVTATITNTGSVMDDAVPQLYLSHGGPNEPPKVLRGFDRIERIAPGQSVTFKA DLTRRDLSNWDTKKQQWVITDYPKTVYVGSSSRDLPLSARLP beta-N- 
MANSIGGSSADKFDLDPLWQNLDWAIGQMMLMGWDGTQVTPQIRSLIEDHHLGSIILTAKNLKS 402 acetylglucosaminid AHHTALLVQELQMIAKNSGHPQPLLIAVDQENGGVNSLFDEDFVCQFPSAMAIAATGSLELSYE ase[Trichoderma VNKATATEISACGVNLMLGPVLDVLNNARYQVIGVRASGDDPQEVSQYGLAALSGIRDAGVASC reeseiRUTC-30 GKHFPSYGNLDFLGSNLDVPIITQTLEELSLSALVPFRNAIASGKLDAMFIGGCGISNPSMNVS ET00749.1 HACLSDQVVDDLLRNELGFKGVAISECLEMEALSQDLGVQNGVVMAVEAGCDIVLLCRAYDVQL GI:572277491 EAIKGLKLGYENGIITKERIFTSLKRIFHLKSTCTSWAKALNPPGINLLSQIRPSHLALSRRAY DDSITIVRDKEKLLPLSLSMHPGEELLLLTPLVKPLPASSLTKSLLESKNDPSLVSTEHDRWNH QIRERSAIMSGEGVFREFGKTLARYRNEKLLHTSYTANGVRPVHENLINRASCIIIFTADANRN LYQAGFTKHVDMMCSMLRSRGQKKQLIVVAVSSPYDFAMDKSIGTYICTYDFTENAMAALVRAL VGDSNPVGTMPGTLRKSKKVLKSRQHWLVEEFDSSRDRKGLNDLIRAVHRASDQDFRYLQTATA DTFLLANQNIKETHFVVRNSSTQALYGFAATYFVQNVGILGALIVDPTKRNMSIGRSLHRRAIK SLTQQRGIKKVQLGSCFPALFLGIPLDIEVTTTKEWFSNSGWDTQFPRRLTNMVIQDLSAWYAP EGLSQSIQRANISFDLIYGVESGDTVMHHVRTHANPEVLELYRTALEESKACGIVRAKDAAGNL LGTIIICRPNSPLARYVPPLVSLGQDIGGLLAPIVPPAPLSTLVLQGLALMGVRQNKGHKATKS VLSWVVDDAYEPLVAMGFDVLQAFEEITNSPETFQT hypothetical MLPRRMRKSRCCIAVLAVIAIIVMLLAAAGAFGYKKLKITPLDGKSPPWYPTPKGGSVRQWADS 403 protein YQKAAEMVARMTLPEKVNITTGTGWSMGLAVGNTAPALLVGFPALALQDGPLGIRFADNATALP M419DRAFT_86704 AGVTVGATWNRHLMYEHGRVHALEARGKGINALLGPCVGPLGRMPAGGRNWEGFGADPYLQGVA [Trichoderma GYETIKGIQDQGVMATIKHFVANEQEHFRQAWEWVLPNALSSNIDDRTMHEIYAWPFGDAVKAG reeseiRUTC-30 VASVMCSYNMVNNSYACGNSKLLNGILKDELGFQGFVMSDWLAQRSGVGSALAGLDMSMPGDGL ETR99336.1 RWQDGNSLWGPNLSRAVLNGSLPLERLNDMVVRIVAAWYQLGQDDEKLFDRKGPNFSSWTNDRM GI:572275968 GVTAPASSSPQEKVVVNQFVNVQANHSILARQIAAEGTVLLKNEGVLPLSVDGLLGGGGGSNST KREGQVRIGIFGEDAGPGKGPNYCEDRSCNQGTLASGWGSGAVEFPYLVSPIEALRKKFNKDKV KLTEHLKNDELDTGVIKSQDICMVFINSDSGEGYRAWEGVRGDRNDLKPQKGGVGLVTHVGLNC GNGSGTTIVVLHSVGPVVVDPWIDMPGIKALISANLPGQESGNALASILFGEENPSGKLPYTVG KSLSDYGPGGQVMYLPNGAVPQQDFSEGLYIDYRHFDKFNIEPRYEFGFGLSYTNFDYKNLKIT ETKPRSPLPDERPAAEVEPPSFDTTIPQAEEALFPSGIRRLKKYVYPYIESVKDIKEGQYSYPD GYDKEQPLSGAGGGEGGNPSLWDSHVIVSVEVTNTGKLGGKAVPQLYLSYPASETVDFPVRVLR 
GFDKVYIGKGETKTVEFSLTRRDLSYWDVERQNWVIPEGEYTFAVGESSRDLRVSGTW
putative beta-N-acetylglucosaminidase [Trichoderma reesei RUTC-30] ETR97676.1 GI:572274120 (SEQ ID NO: 404): MPSPEQRRKIGQLFAVGFHGREINQEITTLIRDYGVGAIVLFKRNVLDAAQLQALCLGLQKIAR DAGHNQPLFIGIDQENGLVTRISPPIASQQPGPMTLGAAGSLEYAYEVAKATAEMLRYFGINMN YAPVGDVNSEPLNPVIGVRSPSDQAETVSKFAAACTKGLREHKVVPCIKHFPGHGDTAVDSHYG LPVINKSRADMEKLELIPFRDAVADNIEMVMTAHISLPQLAKDGLPATLSPDTIGILRNEWKYE GVIMTDCLEMDGIRATYGTVEGSLMAFQAGVDNVMICHTFDVQAAAVDYICGAIESGKLSQERV DQSLERLRKLKERYTNWDIALHAEPPEALEAINERGEKLARQVYADATTLVRAQEGLLPLKATA KIAFVSPGPDVPIGGAVDSGTLPTRVPWIADTFGEQIRKRAPEMSDVRFTSSNLTEEQWEQIDE ADVVILATRNARESKYQKELGLEVAKRRGSRPLISIATCAPYDFLDDEEIRTYIAVYEPTVEAF SAAVDILFGDAQPRGKLPVAH
UDP-glycotransferase (338), Pandey et al., 2014 (SEQ ID NO: 405): MGHKHIAIFNIPAHGHINPTLALTASLVKRGYRVTYPVTDEFVKAVEETGAEPLNYRSTLNIDP QQIRELMKNKKDMSQAPLMFIKEMEEVLPQLEALYENDKPDLILFDFMAMAGKLLAEKFGIEAV RLCSTYAQNEHFTFRSISEEFKIELTPEQEDALKNSNLPSFNFEDMFEPAKLNIVFMPRAFQPY GETFDERFSFVGPSLAKRKFQEKETPIISDSGRPVMLISLGTAFNAWPEFYHMCIEAFRDTKWQ VIMAVGTTIDPESFDDIPENFSIHQRVPQLEILKKAELFITHGGMNSTMEGLNAGVPLVAVPQM PEQEITARRVEELGLGKHLQPEDTTAASLREAVSQTDGDPHVLKRIQDMQKHIKQAGGAEKAAD EIEAFLAPAGVK
UDP- ATGGGACATAAACATATCGCGATTTTTAATATTCCGGCTCACGGCCATATTAATCCAACGCTAG 406 Pandeyet glycotransferase CTTTAACGGCAAGCCTTGTCAAACGCGGTTATCGGGTAACATATCCGGTGACGGATGAGTTTGT al.,2014 338(gDNA,native) GAAGGCTGTTGAGGAAACTGGGGCAGAGCCGCTCAACTACCGCTCAACTTTAAATATCGATCCG CAGCAAATTCGGGAGCTGATGAAAAATAAAAAAGATATGTCGCAGGCTCCGCTGATGTTTATCA AAGAAATGGAGGAGGTTCTTCCTCAGCTTGAAGCGCTCTATGAGAATGACAAGCCAGACCTTAT CCTTTTTGACTTTATGGCCATGGCGGGAAAACTGCTGGCTGAGAAGTTTGGAATAGAGGCGGTC CGCCTTTGTTCTACATATGCACAGAACGAACATTTTACATTCAGATCCATTTCTGAAGAGTTTA AGATCGAGCTGACGCCTGAGCAAGAGGATGCTTTGAAAAATTCGAATCTTCCGTCATTTAACTT TGAGGATATGTTCGAGCCTGCAAAATTGAACATTGTCTTTATGCCTCGTGCTTTTCAGCCTTAC GGCGAAACGTTTGATGAGCGGTTCTCTTTTGTTGGTCCTTCTCTTGCCAAACGCAAGTTTCAGG AAAAAGAAACGCCGATTATTTCGGACAGCGGCCGTCCTGTCATGCTGATATCTTTAGGGACGGC GTTCAATGCCTGGCCGGAATTTTATCATATGTGCATAGAAGCATTCAGGGACACGAAGTGGCAG GTTATCATGGCTGTTGGCACGACAATCGATCCTGAAAGCTTTGATGACATACCTGAGAACTTTT CGATTCATCAGCGCGTTCCTCAGCTGGAGATCCTGAAGAAAGCGGAGCTGTTCATCACCCATGG GGGTATGAACAGTACGATGGAAGGGTTGAATGCCGGTGTACCGCTCGTTGCCGTTCCGCAAATG CCTGAACAGGAAATCACTGCCCGCCGCGTCGAAGAGCTTGGGCTTGGCAAGCATTTGCAGCCGG AAGACACAACAGCAGCTTCACTGCGGGAAGCCGTCTCTCAGACGGATGGTGACCCGCATGTCCT GAAACGGATACAGGACATGCAAAAGCACATTAAACAAGCCGGAGGGGCCGAGAAAGCCGCAGAT GAAATTGAGGCATTTTTAGCACCCGCAGGAGTAAAATAA 301UGT98 MDAQRGHTTTILMFPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSSSSDSI 407 QLVELCLPSSPDQLPPHLHTTNALPPHLMPTLHQAFSMAAQHFAAILHTLAPHLLIYDSFQPWA PQLASSLNIPAINFNTTGASVLTRMLHATHYPSSKFPISEFVLHDYWKAMYSAAGGAVTKKDHK IGETLANCLHASCSVILINSFRELEEKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKN WLDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVHFIWVVRFPQGDNTSAIEDALPKGFL ERVGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHLDQPFNAGLAE EAGVGVEAKRDSDGKIQREEVAKSIKEVVIEKTREDVRKKAREMGEILRSKGDEKIDELVAEIS LLRKKAPCSIAAALEHHHHHH 301UGT98(gDNA, CTCGAATTCATGGATGCCCAGCGAGGTCACACCACAACCATTTTGATGTTTCCATGGCTCGGCT 408 native) ATGGCCATCTTTCGGCTTTCCTAGAGTTGGCCAAAAGCCTCTCAAGGAGGAACTTCCATATCTA CTTCTGTTCAACCTCTGTTAACCTCGACGCCATTAAACCAAAGCTTCCTTCTTCTTCCTCTTCT 
GATTCCATCCAACTTGTGGAACTTTGTCTTCCATCTTCTCCTGATCAGCTCCCTCCTCATCTTC ACACAACCAACGCCCTCCCCCCTCACCTCATGCCCACTCTCCACCAAGCCTTCTCCATGGCTGC CCAACACTTTGCTGCCATTTTACACACACTTGCTCCGCATCTCCTCATTTACGACTCTTTCCAA CCTTGGGCTCCTCAACTAGCTTCATCCCTCAACATTCCAGCCATCAACTTCAATACTACGGGAG CTTCAGTCCTGACCCGAATGCTTCACGCTACTCACTACCCAAGTTCTAAATTCCCAATTTCAGA GTTTGTTCTCCACGATTATTGGAAAGCCATGTACAGCGCCGCCGGTGGGGCTGTTACAAAAAAA GACCACAAAATTGGAGAAACACTTGCGAATTGCTTGCATGCTTCTTGTAGTGTAATTCTAATCA ATAGTTTCAGAGAGCTCGAGGAGAAATATATGGATTATCTCTCCGTTCTCTTGAACAAGAAAGT TGTTCCGGTTGGTCCTTTGGTTTACGAACCGAATCAAGACGGGGAAGATGAAGGTTATTCAAGC ATCAAAAATTGGCTTGACAAAAAGGAACCGTCCTCCACCGTCTTCGTTTCATTTGGAAGCGAAT ACTTCCCGTCAAAGGAAGAAATGGAAGAGATAGCCCATGGGTTAGAGGCGAGCGAGGTTCATTT CATCTGGGTCGTTAGGTTTCCTCAAGGAGACAACACCAGCGCCATTGAAGATGCCTTGCCGAAG GGGTTTCTGGAGAGGGTGGGAGAGAGAGGGATGGTGGTGAAGGGTTGGGCTCCCCAGGCGAAGA TACTGAAGCATTGGAGCACAGGGGGATTCGTGAGCCACTGTGGATGGAACTCGGTGATGGAAAG CATGATGTTTGGCGTTCCCATAATAGGGGTTCCGATGCATCTGGACCAGCCCTTTAACGCCGGA CTCGCGGAAGAAGCTGGCGTCGGCGTGGAAGCCAAGCGAGATTCGGACGGCAAAATTCAAAGAG AAGAAGTTGCAAAGTCGATCAAAGAAGTGGTGATTGAGAAAACCAGGGAAGACGTGAGGAAGAA AGCAAGAGAAATGGGTGAGATTTTGAGGAGTAAAGGAGATGAGAAAATTGATGAGTTGGTGGCT GAAATTTCTCTTTTGCGCAAAAAGGCCCCATGTTCAATTGCGGCCGCACTCGAGCACCACCACC ACCACCACTGA UDP- MVQPRVLLFPFPALGHVKPFLSLAELLSDAGIDVVFLSTEYNHRRISNTEALASRFPTLHFETI 409 glycosy1transferas PDGLPPNESRALADGPLYFSMREGTKPRFRQLIQSLNDGRWPITCIITDIMLSSPIEVAEEFGI es(339) PVIAFCPCSARYLSIHFFIPKLVEEGQIPYADDDPIGEIQGVPLFEGLLRRNHLPGSWSDKSAD ISFSHGLINQTLAAGRASALILNTFDELEAPFLTHLSSIFNKIYTIGPLHALSKSRLGDSSSSA SALSGFWKEDRACMSWLDCQPPRSVVFVSFGSTMKMKADELREFWYGLVSSGKPFLCVLRSDVV SGGEAAELIEQMAEEEGAGGKLGMVVEWAAQEKVLSHPAVGGFLTHCGWNSTVESIAAGVPMMC WPILGDQPSNATWIDRVWKIGVERNNREWDRLTVEKMVRALMEGQKRVEIQRSMEKLSKLANEK VVRGGLSFDNLEVLVEDIKKLKPYKF UDP- ATGGTGCAACCTCGGGTACTGCTGTTTCCTTTCCCGGCACTGGGCCACGTGAAGCCCTTCTTAT 410 glycosy1transferas CACTGGCGGAGCTGCTTTCCGACGCCGGCATAGACGTCGTCTTCCTCAGCACCGAGTATAACCA es(339)(gDNA) 
CCGTCGGATCTCCAACACTGAAGCCCTAGCCTCCCGCTTCCCGACGCTTCATTTCGAAACTATA CCGGATGGCCTGCCGCCTAATGAGTCGCGCGCTCTTGCCGACGGCCCACTGTATTTCTCCATGC GTGAGGGAACTAAACCGAGATTCCGGCAACTGATTCAATCTCTTAACGACGGTCGTTGGCCCAT CACCTGTATTATCACTGACATCATGTTATCTTCTCCGATTGAAGTAGCGGAAGAATTTGGGATT CCAGTAATTGCCTTCTGCCCCTGCAGTGCTCGCTACTTATCGATTCACTTTTTTATACCGAAGC TCGTTGAGGAAGGTCAAATTCCATACGCAGATGACGATCCGATTGGAGAGATCCAGGGGGTGCC CTTGTTCGAAGGTCTTTTGCGACGGAATCATTTGCCTGGTTCTTGGTCTGATAAATCTGCAGAT ATATCTTTCTCGCATGGCTTGATTAATCAGACCCTTGCAGCTGGTCGAGCCTCGGCTCTTATAC TCAACACCTTCGACGAGCTCGAAGCTCCATTTCTGACCCATCTCTCTTCCATTTTCAACAAAAT CTACACCATTGGACCCCTCCATGCTCTGTCCAAATCAAGGCTCGGCGACTCCTCCTCCTCCGCT TCTGCCCTCTCCGGATTCTGGAAAGAGGATAGAGCCTGCATGTCCTGGCTCGACTGTCAGCCGC CGAGATCTGTGGTTTTCGTCAGTTTCGGGAGTACGATGAAGATGAAAGCCGATGAATTGAGAGA GTTCTGGTATGGGTTGGTGAGCAGCGGGAAACCGTTCCTCTGCGTGTTGAGATCCGACGTTGTT TCCGGCGGAGAAGCGGCGGAATTGATCGAACAGATGGCGGAGGAGGAGGGAGCTGGAGGGAAGC TGGGAATGGTAGTGGAGTGGGCAGCGCAAGAGAAGGTCCTGAGCCACCCTGCCGTCGGTGGGTT TTTGACGCACTGCGGGTGGAACTCAACGGTGGAAAGCATTGCCGCGGGAGTTCCGATGATGTGC TGGCCGATTCTCGGCGACCAACCCAGCAACGCCACTTGGATCGACAGAGTGTGGAAAATTGGGG TTGAAAGGAACAATCGTGAATGGGACAGGTTGACGGTGGAGAAGATGGTGAGAGCATTGATGGA AGGCCAAAAGAGAGTGGAGATTCAGAGATCAATGGAGAAGCTTTCAAAGTTGGCAAATGAGAAG GTTGTCAGGGGTGGGTTGTCTTTTGATAACTTGGAAGTTCTCGTTGAAGACATCAAAAAATTGA AACCATATAAATTTTAA UDP- MDTRKRSIRILMFPWLAHGHISAFLELAKSLAKRNFVIYICSSQVNLNSISKNMSSKDSISVKL 411 glycosyltransferas VELHIPTTILPPPYHTTNGLPPHLMSTLKRALDSARPAFSTLLQTLKPDLVLYDFLQSWASEEA es(330) ESQNIPAMVFLSTGAAAISFIMYHWFETRPEEYPFPAIYFREHEYDNFCRFKSSDSGTSDQLRV (protein) SDCVKRSHDLVLIKTFRELEGQYVDFLSDLTRKRFVPVGPLVQEVGCDMENEGNDIIEWLDGKD RRSTVFSSFGSEYFLSANEIEEIAYGLELSGLNFIWVVRFPHGDEKIKIEEKLPEGFLERVEGR GLVVEGWAQQRRILSHPSVGGFLSHCGWSSVMEGVYSGVPIIAVPMHLDQPFNARLVEAVGFGE EVVRSRQGNLDRGEVARVVKKLVMGKSGEGLRRRVEELSEKMREKGEEEIDSLVEELVTVVRRR ERSNLKSENSMKKLNVMDDGE UDP- ATGGATACAAGAAAGAGAAGCATCAGGATTCTAATGTTCCCATGGCTTGCTCATGGCCATATCT 412 glycosyltransferas CAGCATTCCTCGAGCTGGCGAAGTCACTTGCCAAAAGAAACTTCGTCATTTACATTTGTTCTTC 
es(330)(gDNA, ACAAGTAAATCTAAATTCCATCAGCAAGAACATGTCATCAAAAGACTCCATTTCCGTAAAACTT native) GTTGAGCTTCACATTCCCACCACCATACTTCCCCCTCCTTACCACACCACCAATGGCCTCCCAC CCCACCTCATGTCCACCCTCAAGAGAGCCCTCGACAGTGCCCGGCCCGCCTTCTCCACCCTCCT CCAAACCCTCAAGCCCGACTTGGTTTTATACGATTTCCTCCAGTCGTGGGCCTCGGAGGAGGCC GAGTCGCAGAATATACCAGCCATGGTGTTTCTGAGTACCGGAGCTGCAGCGATTTCTTTTATTA TGTACCATTGGTTTGAGACCAGACCGGAGGAGTACCCTTTTCCGGCTATATACTTCCGGGAACA CGAGTATGATAACTTCTGCCGTTTTAAGTCTTCCGACAGCGGTACTAGTGATCAATTGAGAGTC AGCGATTGCGTTAAACGGTCGCACGATTTGGTTCTGATCAAGACATTCCGTGAACTGGAAGGAC AATACGTAGATTTTCTCTCCGACTTGACTCGGAAGAGATTCGTACCAGTTGGCCCCCTTGTTCA GGAGGTAGGTTGTGATATGGAGAATGAAGGAAATGACATCATCGAATGGCTCGACGGGAAAGAC CGTCGTTCGACGGTTTTCTCCTCATTCGGGAGCGAGTACTTCTTGTCTGCCAATGAGATCGAAG AGATAGCTTATGGGCTGGAGCTAAGCGGGCTTAACTTCATCTGGGTTGTTAGGTTTCCTCATGG CGACGAGAAAATCAAGATTGAGGAGAAACTGCCGGAAGGGTTTCTTGAGAGAGTGGAAGGAAGA GGGTTGGTGGTGGAGGGATGGGCACAGCAGAGGAGAATATTGTCACATCCGAGTGTTGGAGGGT TTTTGAGCCACTGTGGGTGGAGTTCTGTGATGGAAGGGGTGTATTCCGGTGTGCCGATTATTGC CGTGCCGATGCATCTTGACCAGCCGTTCAATGCTAGGTTGGTGGAGGCGGTGGGGTTTGGGGAG GAGGTGGTGAGGAGTAGACAAGGAAATCTTGACAGAGGAGAGGTGGCGAGGGTGGTGAAGAAGC TGGTTATGGGGAAAAGTGGGGAGGGGTTACGGCGGAGGGTGGAGGAGTTGAGTGAGAAGATGAG AGAGAAAGGGGAGGAGGAGATTGATTCACTGGTGGAGGAATTGGTGACGGTGGTTAGGAGGAGA GAGAGATCGAATCTCAAGTCTGAGAATTCTATGAAGAAATTGAATGTGATGGATGATGGAGAAT AG UDP- MDAAQQGDTTTILMLPWLGYGHLSAFLELAKSLSRRNFHIYFCSTSVNLDAIKPKLPSSFSDSI 413 glycosyltransferas QFVELHLPSSPEFPPHLHTTNGLPPTLMPALHQAFSMAAQHFESILQTLAPHLLIYDSLQPWAP es(328)described RVASSLKIPAINFNTTGVFVISQGXHPIHYPHSKFPFSEFVLHNHWKAMYSTADGASTERTRKR inItkinetal. 
GEAFLYCLHASCSVILINSFRELEGKYMDYLSVLLNKKVVPVGPLVYEPNQDGEDEGYSSIKNW (protein) LDKKEPSSTVFVSFGSEYFPSKEEMEEIAHGLEASEVNFIWVVRFPQGDNTSGIEDALPKGFLE RAGERGMVVKGWAPQAKILKHWSTGGFVSHCGWNSVMESMMFGVPIIGVPMHVDQPFNAGLVEE AGVGVEAKRDPDGKIQRDEVAKLIKEVVVEKTREDVRKKAREMSEILRSKGEEKFDEMVAEISL LLKI UDP- ATGGATGCTGCCCAACAAGGTGACACCACAACCATTTTGATGCTTCCATGGCTCGGCTATGGCC 414 glycosyltransferas ATCTTTCAGCTTTTCTCGAGCTGGCCAAAAGCCTCTCAAGGAGGAACTTCCATATCTACTTCTG es(328(gDNA, TTCAACCTCTGTTAATCTTGACGCCATTAAACCAAAGCTTCCTTCTTCTTTCTCTGATTCCATT native) CAATTTGTGGAGCTCCATCTCCCTTCTTCTCCTGAGTTCCCTCCTCATCTTCACACAACCAACG GCCTTCCCCCTACCCTCATGCCCGCTCTCCACCAAGCCTTCTCCATGGCTGCCCAGCACTTTGA GTCCATTTTACAAACACTTGCCCCGCACCTTCTCATTTATGACTCTCTTCAACCTTGGGCTCCT CGGGTAGCTTCATCCCTCAAAATTCCGGCCATCAACTTCAATACCACGGGAGTTTTCGTCATTT CTCAAGGGYTTCACCCTATTCACTACCCACATTCTAAATTCCCATTCTCAGAGTTCGTTCTTCA CAATCATTGGAAAGCCATGTACTCCACTGCCGATGGAGCTTCTACCGAAAGAACCCGCAAACGT GGAGAAGCGTTTCTGTATTGCTTGCATGCTTCTTGTAGTGTAATTCTAATCAATAGTTTCAGAG AGCTCGAGGGGAAATATATGGATTATCTCTCTGTTCTCTTGAACAAGAAAGTTGTTCCGGTTGG TCCTTTGGTTTACGAACCGAATCAAGACGGGGAAGATGAAGGTTATTCAAGCATCAAAAATTGG CTTGACAAAAAGGAACCGTCCTCCACCGTCTTCGTGTCATTTGGAAGCGAATACTTCCCGTCAA AGGAAGAAATGGAAGAGATAGCCCATGGGTTAGAGGCGAGCGAGGTTAATTTCATCTGGGTCGT TAGGTTTCCTCAAGGAGACAACACCAGCGGCATTGAAGATGCCTTGCCGAAGGGTTTTCTGGAG AGGGCGGGAGAGAGAGGGATGGTGGTGAAGGGTTGGGCTCCTCAGGCGAAGATACTGAAGCATT GGAGCACAGGGGGATTCGTGAGCCACTGTGGATGGAACTCGGTGATGGAGAGCATGATGTTTGG CGTTCCCATAATAGGGGTTCCGATGCATGTGGACCAGCCCTTTAACGCCGGACTCGTGGAAGAA GCTGGCGTCGGCGTGGAGGCCAAGCGAGATCCAGACGGCAAAATTCAAAGAGACGAAGTTGCAA AGTTGATCAAAGAAGTGGTGGTTGAGAAAACCAGAGAAGATGTGCGGAAGAAAGCAAGAGAAAT GAGTGAGATTTTGAGGAGCAAGGGAGAGGAGAAGTTTGATGAGATGGTCGCTGAAATTTCTCTC TTGCTTAAAATATGA AtSus1protein MKHHHHHHQLHAGAHAAAGTMANAERMITRVHSQRERLNETLVSERNEVLALLSRVEAKGKGIL 415 QQNQIIAEFEALPEQTRKKLEGGPFFDLLKSTQEAIVLPPWVALAVRPRPGVWEYLRVNLHALV VEELQPAEFLHFKEELVDGVKNGNFTLELDFEPFNASIPRPTLHKYIGNGVDFLNRHLSAKLFH DKESLLPLLKFLRLHSHQGKNLMLSEKIQNLNTLQHTLRKAEEYLAELKSETLYEEFEAKFEEI 
GLERGWGDNAERVLDMIRLLLDLLEAPDPCTLETFLGRVPMVFNVVILSPHGYFAQDNVLGYPD TGGQVVYILDQVRALEIEMLQRIKQQGLNIKPRILILTRLLPDAVGTTCGERLERVYDSEYCDI LRVPFRTEKGIVRKWISRFEVWPYLETYTEDAAVELSKELNGKPDLIIGNYSDGNLVASLLAHK LGVTQCTIAHALEKTKYPDSDIYWKKLDDKYHFSCQFTADIFAMNHTDFIITSTFQEIAGSKET VGQYESHTAFTLPGLYRVVHGIDVFDPKFNIVSPGADMSIYFPYTEEKRRLTKFHSEIEELLYS DVENKEHLCVLKDKKKPILFTMARLDRVKNLSGLVEWYGKNTRLRELANLVVVGGDRRKESKDN EEKAEMKKMYDLIEEYKLNGQFRWISSQMDRVRNGELYRYICDTKGAFVQPALYEAFGLTVVEA MTCGLPTFATCKGGPAEIIVHGKSGFHIDPYHGDQAADTLADFFTKCKEDPSHWDEISKGGLQR IEEKYTWQIYSQRLLTLTGVYGFWKHVSNLDRLEARRYLEMFYALKYRPLAQAVPLAQDD AtSus1(gDNA) ATGAAACATCACCATCACCATCACCAGCTGCATGCGGGAGCTCATGCGGCCGCGGGTACCATGG 416 CAAACGCTGAACGTATGATAACGCGCGTCCACAGCCAACGTGAGCGTTTGAACGAAACGCTTGT TTCTGAGAGAAACGAAGTCCTTGCCTTGCTTTCCAGGGTTGAAGCCAAAGGTAAAGGTATTTTA CAACAAAACCAGATCATTGCTGAATTCGAAGCTTTGCCTGAACAAACCCGGAAGAAACTTGAAG GTGGTCCTTTCTTTGACCTTCTCAAATCCACTCAGGAAGCAATTGTGTTGCCACCATGGGTTGC TCTAGCTGTGAGGCCAAGGCCTGGTGTTTGGGAATACTTACGAGTCAATCTCCATGCTCTTGTC GTTGAAGAACTCCAACCTGCTGAGTTTCTTCATTTCAAGGAAGAACTCGTTGATGGAGTTAAGA ATGGTAATTTCACTCTTGAGCTTGATTTCGAGCCATTCAATGCGTCTATCCCTCGTCCAACACT CCACAAATACATTGGAAATGGTGTTGACTTCCTTAACCGTCATTTATCGGCTAAGCTCTTCCAT GACAAGGAGAGTTTGCTTCCATTGCTTAAGTTCCTTCGTCTTCACAGCCACCAGGGCAAGAACC TGATGTTGAGCGAGAAGATTCAGAACCTCAACACTCTGCAACACACCTTGAGGAAAGCAGAAGA GTATCTAGCAGAGCTTAAGTCCGAAACACTGTATGAAGAGTTTGAGGCCAAGTTTGAGGAGATT GGTCTTGAGAGGGGATGGGGAGACAATGCAGAGCGTGTCCTTGACATGATACGTCTTCTTTTGG ACCTTCTTGAGGCGCCTGATCCTTGCACTCTTGAGACTTTTCTTGGAAGAGTACCAATGGTGTT CAACGTTGTGATCCTCTCTCCACATGGTTACTTTGCTCAGGACAATGTTCTTGGTTACCCTGAC ACTGGTGGACAGGTTGTTTACATTCTTGATCAAGTTCGTGCTCTGGAGATAGAGATGCTTCAAC GTATTAAGCAACAAGGACTCAACATTAAACCAAGGATTCTCATTCTAACTCGACTTCTACCTGA TGCGGTAGGAACTACATGCGGTGAACGTCTCGAGAGAGTTTATGATTCTGAGTACTGTGATATT CTTCGTGTGCCCTTCAGAACAGAGAAGGGTATTGTTCGCAAATGGATCTCAAGGTTCGAAGTCT GGCCATATCTAGAGACTTACACCGAGGATGCTGCGGTTGAGCTATCGAAAGAATTGAATGGCAA GCCTGACCTTATCATTGGTAACTACAGTGATGGAAATCTTGTTGCTTCTTTATTGGCTCACAAA 
CTTGGTGTCACTCAGTGTACCATTGCTCATGCTCTTGAGAAAACAAAGTACCCGGATTCTGATA TCTACTGGAAGAAGCTTGACGACAAGTACCATTTCTCATGCCAGTTCACTGCGGATATTTTCGC AATGAACCACACTGATTTCATCATCACTAGTACTTTCCAAGAAATTGCTGGAAGCAAAGAAACT GTTGGGCAGTATGAAAGCCACACAGCCTTTACTCTTCCCGGATTGTATCGAGTTGTTCACGGGA TTGATGTGTTTGATCCCAAGTTCAACATTGTCTCTCCTGGTGCTGATATGAGCATCTACTTCCC TTACACAGAGGAGAAGCGTAGATTGACTAAGTTCCACTCTGAGATCGAGGAGCTCCTCTACAGC GATGTTGAGAACAAAGAGCACTTATGTGTGCTCAAGGACAAGAAGAAGCCGATTCTCTTCACAA TGGCTAGGCTTGATCGTGTCAAGAACTTGTCAGGTCTTGTTGAGTGGTACGGGAAGAACACCCG CTTGCGTGAGCTAGCTAACTTGGTTGTTGTTGGAGGAGACAGGAGGAAAGAGTCAAAGGACAAT GAAGAGAAAGCAGAGATGAAGAAAATGTATGATCTCATTGAGGAATACAAGCTAAACGGTCAGT TCAGGTGGATCTCCTCTCAGATGGACCGGGTAAGGAACGGTGAGCTGTACCGGTACATCTGTGA CACCAAGGGTGCTTTTGTCCAACCTGCATTATATGAAGCCTTTGGGTTAACTGTTGTGGAGGCT ATGACTTGTGGTTTACCGACTTTCGCCACTTGCAAAGGTGGTCCAGCTGAGATCATTGTGCACG GTAAATCGGGTTTCCACATTGACCCTTACCATGGTGATCAGGCTGCTGATACTCTTGCTGATTT CTTCACCAAGTGTAAGGAGGATCCATCTCACTGGGATGAGATCTCAAAAGGAGGGCTTCAGAGG ATTGAGGAGAAATACACTTGGCAAATCTATTCACAGAGGCTCTTGACATTGACTGGTGTGTATG GATTCTGGAAGCATGTCTCGAACCTTGACCGTCTTGAGGCTCGCCGTTACCTTGAAATGTTCTA TGCATTGAAGTATCGCCCATTGGCTCAGGCTGTTCCTCTTGCACAAGATGATTGA SgCbQprotein MWRLKVGAESVGENDEKWLKSISNHLGRQVWEFCPDAGTQQQLLQVHKARKAFHDDRFHRKQSS 417 DLFITIQYGKEVENGGKTAGVKLKEGEEVRKEAVESSLERALSFYSSIQTSDGNWASDLGGPMF LLPGLVIALYVTGVLNSVLSKHHRQEMCRYVYNHQNEDGGWGLHIEGPSTMFGSALNYVALRLL GEDANAGAMPKARAWILDHGGATGITSWGKLWLSVLGVYEWSGNNPLPPEFWLFPYFLPFHPGR MWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYAVPYHEIDWNKSRNTCAKEDLYYPHPKM QDILWGSLHHVYEPLFTRWPAKRLREKALQTAMQHIHYEDENTRYICLGPVNKVLNLLCCWVED PYSDAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSIQAIVSTKLVDNYGPTLRKAHDFV KSSQIQQDCPGDPNVWYRHIHKGAWPFSTRDHGWLISDCTAEGLKAALMLSKLPSETVGESLER NRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATMEALTLF KKLHPGHRTKEIDTAIVRAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCLA IRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPTPLH RAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE SgCbQ(gDNA) 
ATGTGGAGGTTAAAGGTCGGAGCAGAAAGCGTTGGGGAGAATGATGAGAAATGGTTGAAGAGCA 418 TAAGCAATCACTTGGGACGCCAGGTGTGGGAGTTCTGTCCGGATGCCGGCACCCAACAACAGCT CTTGCAAGTCCACAAAGCTCGTAAAGCTTTCCACGATGACCGTTTCCACCGAAAGCAATCTTCC GATCTCTTTATCACTATTCAGTATGGAAAGGAAGTAGAAAATGGTGGAAAGACAGCGGGAGTGA AATTGAAAGAAGGGGAAGAGGTGAGGAAAGAGGCAGTAGAGAGTAGCTTAGAGAGGGCATTAAG TTTCTACTCAAGCATCCAGACAAGCGATGGGAACTGGGCTTCGGATCTTGGGGGGCCCATGTTT TTACTTCCGGGTCTGGTGATTGCCCTCTACGTTACAGGCGTCTTGAATTCTGTTTTATCCAAGC ACCACCGGCAAGAGATGTGCAGATATGTTTACAATCACCAGAATGAAGATGGGGGGTGGGGTCT CCACATCGAGGGCCCAAGCACCATGTTTGGTTCCGCACTGAATTATGTTGCACTCAGGCTGCTT GGAGAAGACGCCAACGCCGGGGCAATGCCAAAAGCACGTGCTTGGATCTTGGACCACGGTGGCG CCACCGGAATCACTTCCTGGGGCAAATTGTGGCTTTCTGTACTTGGAGTCTACGAATGGAGTGG CAATAATCCTCTTCCACCCGAATTTTGGTTATTTCCTTACTTCCTACCATTTCATCCAGGAAGA ATGTGGTGCCATTGTCGAATGGTTTATCTACCAATGTCATACTTATATGGAAAGAGATTTGTTG GGCCAATCACACCCATAGTTCTGTCTCTCAGAAAAGAACTCTACGCAGTTCCATATCATGAAAT AGACTGGAATAAATCTCGCAATACATGTGCAAAGGAGGATCTGTACTATCCACATCCCAAGATG CAAGATATTCTGTGGGGATCTCTCCACCACGTGTATGAGCCCTTGTTTACTCGTTGGCCTGCCA AACGCCTGAGAGAAAAGGCTTTGCAGACTGCAATGCAACATATTCACTATGAAGATGAGAATAC CCGATATATATGCCTTGGCCCTGTCAACAAGGTACTCAATCTGCTTTGTTGTTGGGTTGAAGAT CCCTACTCCGACGCCTTCAAACTTCATCTTCAACGAGTCCATGACTATCTCTGGGTTGCTGAAG ATGGCATGAAAATGCAGGGTTATAATGGGAGCCAGTTGTGGGACACTGCTTTCTCCATCCAAGC AATCGTATCCACCAAACTTGTAGACAACTATGGCCCAACCTTAAGAAAGGCACACGACTTCGTT AAAAGTTCTCAGATTCAGCAGGACTGTCCTGGGGATCCTAATGTTTGGTACCGTCACATTCATA AAGGTGCATGGCCATTTTCAACTCGAGATCATGGATGGCTCATCTCTGACTGTACAGCAGAGGG ATTAAAGGCTGCTTTGATGTTATCCAAACTTCCATCCGAAACAGTTGGGGAATCATTAGAACGG AATCGCCTTTGCGATGCTGTAAACGTTCTCCTTTCTTTGCAAAACGATAATGGTGGCTTTGCAT CATATGAGTTGACAAGATCATACCCTTGGTTGGAGTTGATCAACCCCGCAGAAACGTTTGGAGA TATTGTCATTGATTATCCGTATGTGGAGTGCACCTCAGCCACAATGGAAGCACTGACGTTGTTT AAGAAATTACATCCCGGCCATAGGACCAAAGAAATTGATACTGCTATTGTCAGGGCGGCCAACT TCCTTGAAAATATGCAAAGGACGGATGGCTCTTGGTATGGATGTTGGGGGGTTTGCTTCACGTA TGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGGACATATAATAATTGCCTTGCC 
ATTCGCAAGGCTTGCGATTTTTTACTATCTAAAGAGCTGCCCGGCGGTGGATGGGGAGAGAGTT ACCTTTCATGTCAGAATAAGGTATACACAAATCTTGAAGGAAACAGACCGCACCTGGTTAACAC GGCCTGGGTTTTAATGGCCCTCATAGAAGCTGGCCAGGCTGAGAGAGACCCAACACCATTGCAT CGTGCAGCAAGGTTGTTAATCAATTCCCAGTTGGAGAATGGTGATTTCCCCCAACAGGAGATCA TGGGAGTCTTTAATAAAAATTGCATGATCACATATGCTGCATACCGAAACATTTTTCCCATTTG GGCTCTTGGAGAGTATTGCCATCGGGTTTTGACTGAATAA ATGTGGAGGTTAAAGGTCGGAGCAGAAAGCGTTGGGGAGAATGATGAGAAATGGTTGAAGAGCA 419 TAAGCAATCACTTGGGACGCCAGGTGTGGGAGTTCTGTCCGGATGCCGGCACCCAACAACAGCT CTTGCAAGTCCACAAAGCTCGTAAAGCTTTCCACGATGACCGTTTCCACCGAAAGCAATCTTCC GATCTCTTTATCACTATTCAGTATGGAAAGGAAGTAGAAAATGGTGGAAAGACAGCGGGAGTGA AATTGAAAGAAGGGGAAGAGGTGAGGAAAGAGGCAGTAGAGAGTAGCTTAGAGAGGGCATTAAG TTTCTACTCAAGCATCCAGACAAGCGATGGGAACTGGGCTTCGGATCTTGGGGGGCCCATGTTT TTACTTCCGGGTCTGGTGATTGCCCTCTACGTTACAGGCGTCTTGAATTCTGTTTTATCCAAGC ACCACCGGCAAGAGATGTGCAGATATGTTTACAATCACCAGAATGAAGATGGGGGGTGGGGTCT CCACATCGAGGGCCCAAGCACCATGTTTGGTTCCGCACTGAATTATGTTGCACTCAGGCTGCTT GGAGAAGACGCCAACGCCGGGGCAATGCCAAAAGCACGTGCTTGGATCTTGGACCACGGTGGCG CCACCGGAATCACTTCCTGGGGCAAATTGTGGCTTTCTGTACTTGGAGTCTACGAATGGAGTGG CAATAATCCTCTTCCACCCGAATTTTGGTTATTTCCTTACTTCCTACCATTTCATCCAGGAAGA ATGTGGTGCCATTGTCGAATGGTTTATCTACCAATGTCATACTTATATGGAAAGAGATTTGTTG GGCCAATCACACCCATAGTTCTGTCTCTCAGAAAAGAACTCTACGCAGTTCCATATCATGAAAT AGACTGGAATAAATCTCGCAATACATGTGCAAAGGAGGATCTGTACTATCCACATCCCAAGATG CAAGATATTCTGTGGGGATCTCTCCACCACGTGTATGAGCCCTTGTTTACTCGTTGGCCTGCCA AACGCCTGAGAGAAAAGGCTTTGCAGACTGCAATGCAACATATTCACTATGAAGATGAGAATAC CCGATATATATGCCTTGGCCCTGTCAACAAGGTACTCAATCTGCTTTGTTGTTGGGTTGAAGAT CCCTACTCCGACGCCTTCAAACTTCATCTTCAACGAGTCCATGACTATCTCTGGGTTGCTGAAG ATGGCATGAAAATGCAGGGTTATAATGGGAGCCAGTTGTGGGACACTGCTTTCTCCATCCAAGC AATCGTATCCACCAAACTTGTAGACAACTATGGCCCAACCTTAAGAAAGGCACACGACTTCGTT AAAAGTTCTCAGATTCAGCAGGACTGTCCTGGGGATCCTAATGTTTGGTACCGTCACATTCATA AAGGTGCATGGCCATTTTCAACTCGAGATCATGGATGGCTCATCTCTGACTGTACAGCAGAGGG ATTAAAGGCTGCTTTGATGTTATCCAAACTTCCATCCGAAACAGTTGGGGAATCATTAGAACGG AATCGCCTTTGCGATGCTGTAAACGTTCTCCTTTCTTTGCAAAACGATAATGGTGGCTTTGCAT 
CATATGAGTTGACAAGATCATACCCTTGGTTGGAGTTGATCAACCCCGCAGAAACGTTTGGAGA TATTGTCATTGATTATCCGTATGTGGAGTGCACCTCAGCCACAATGGAAGCACTGACGTTGTTT AAGAAATTACATCCCGGCCATAGGACCAAAGAAATTGATACTGCTATTGTCAGGGCGGCCAACT TCCTTGAAAATATGCAAAGGACGGATGGCTCTTGGTATGGATGTTGGGGGGTTTGCTTCACGTA TGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGGACATATAATAATTGCCTTGCC ATTCGCAAGGCTTGCGATTTTTTACTATCTAAAGAGCTGCCCGGCGGTGGATGGGGAGAGAGTT ACCTTTCATGTCAGAATAAGGTATACACAAATCTTGAAGGAAACAGACCGCACCTGGTTAACAC GGCCTGGGTTTTAATGGCCCTCATAGAAGCTGGCCAGGCTGAGAGAGACCCAACACCATTGCAT CGTGCAGCAAGGTTGTTAATCAATTCCCAGTTGGAGAATGGTGATTTCCCCCAACAGGAGATCA TGGGAGTCTTTAATAAAAATTGCATGATCACATATGCTGCATACCGAAACATTTTTCCCATTTG GGCTCTTGGAGAGTATTGCCATCGGGTTTTGACTGAATAA Cpep2protein MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCAADAAAVTPHQLLQIQNARNHFHRNRFHRK 420 QSSDLFLAIQYEKEIAKGGKGKEAVKVKEGEEVGKEAVKSTLERALSFYTAVQTSDGNWASDLG GPMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYIYNHQNEDGGWGLHIEGTSTMFGSALNYVA LRLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYS LPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDL YYPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNM LCCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTL RKAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTM VGEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAAT MEALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGR TYNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGE RDPAPLHRGARLVMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE Cpep2gene ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG 421 sequence TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGCCGACGCCGCCGCCGTCACTCC TCACCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCGAAGGGCGGAAAAGGGA AAGAGGCGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGA GAGGGCACTAAGTTTCTACACAGCCGTGCAGACGAGCGATGGGAATTGGGCCTCGGATCTTGGA GGGCCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAG 
TTTTGTCCAAGCACCACCGCGTAGAGATGTGCAGATATATTTACAATCACCAGAATGAAGATGG AGGGTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTATGTTGCA CTTAGGCTGCTTGGAGAAGACGCCGATGGCGGAGACGATGGTGCAATGACAAAAGCACGTGCTT GGATCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTGCT TGGAGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGC CTACCATTTCATCCAGGACGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACT TATATGGGAAGAGATTTGTTGGCCCAATCACTCCCAAAGTTCTTTCTCTAAGACAAGAGCTCTA CACGGTTCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTA TACTATCCACATCCCAAGATGCAAGACATACTATGGGGATCTATCTACCATGTATATGAGCCAT TGTTCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATAT TCACTATGAAGATGAAAATAGTCGCTATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATG CTTTGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATG ACTATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGTTACAATGGCAGCCAGTTGTGGGA CACTGCTTTCTCCATCCAAGCCATTGTAGCTACCAAACTTGTAGACAGCTATGCCCCAACTTTA AGAAAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATG TTTGGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCAT CTCTGACTGCACGGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATG GTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAA ATGATAACGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAA CCCAGCAGAAACATTCGGAGACATTGTCATCGACTATCCGTATGTGGAGTGCACCGCAGCAACA ATGGAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAG CTATTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAAAGGGCGGATGGCTCTTGGTATGGGTG TTGGGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATCAAGGGATTGGTGGCTGCAGGAAGA ACATATAATAGCTGCCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCG GCGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAA CAAGCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCCGGCCAGGGTGAG AGAGACCCAGCACCATTGCACCGTGGAGCAAGGTTGGTAATGAATTCTCAACTGGAGAATGGTG ATTTCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATA CCGAAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA Cpep4protein MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCAADAAAVTPHQLLQIQNARNHFHRNRFHRK 422
QSSDLFLAIQYEKEIAKGGKGKEAVKVKEGEEVGKEAVKSTLERALSFYTAVQTSDGNWASDLG GPMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYIYNHQNEDGGWGLHIEGTSTMFGSALNYVA LRLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFLLLPYS LPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDL YYPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNM LCCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTL RKAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSTM VGEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYSYVECTAAT MEALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGR TYNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGE RDPAPLHRAARLVMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE. Cpep4gene ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG 423 sequence TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGCCGACGCCGCCGCCGTCACTCC TCACCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCGAAGGGCGGAAAAGGGA AAGAGGCGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGA GAGGGCACTAAGTTTCTACACAGCCGTGCAGACGAGCGATGGGAATTGGGCCTCGGATCTTGGA GGGCCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAG TTTTGTCCAAGCACCACCGCGTAGAGATGTGCAGATATATTTACAATCACCAGAATGAAGATGG AGGGTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTATGTTGCA CTTAGGCTGCTTGGAGAAGACGCCGATGGCGGAGACGATGGTGCAATGACAAAAGCACGTGCTT GGATCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTGCT TGGAGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTTGCTTCTCCCTTACAGC CTACCATTTCATCCAGGACGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACT TATATGGGAAGAGATCTGTTCGCCCAATCACTCCCAAAGTTCTTTCTCTAAGACAAGAGCTCTA CACGGTTCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTA TACTATCCACATCCCAAGATGCAAGACATACTATGGGGATCTATCTACCATGTATATGAGCCAT TGTTCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATGAAACATAT TCACTATGAAGATGAAAATAGTCGCTATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATG CTTTGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATG 
ACTATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGTTACAATGGCAGCCAGTTGTGGGA CACTGCTTTCTCCATCCAAGCCATTGTAGCTACCAAACTTGTAGACAGCTATGCCCCAACTTTA AGAAAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATG TTTGGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCAT CTCTGACTGCACGGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATG GTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAA ATGATAACGGCGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAA CCCAGCAGAAACATTCGGAGACATTGTCATCGACTATTCGTATGTGGAGTGCACCGCAGCAACA ATGGAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAG CTATTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAAAGGGCGGATGGCTCTTGGTATGGGTG TTGGGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGA ACATATAATAGCTGTCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCG GCGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAA CAAGCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAG AGAGACCCAGCACCATTGCACCGTGCAGCAAGGTTGGTAATGAATTCTCAACTGGAGAATGGCG ATTTCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATA CCGAAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA Cmax1protein MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAADTPHQLLQIQNARNHFHHNRFHRKQ 424 SSDLFLAIQYEKEIAKGAKGGAVKVKEGEEVGKEAVKSTLESALGFYSAVQTSDGNWASDLGGP MFLLPGLVIALHVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVALR LLGEDADGGDGGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLP FHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTIPYHEIDWNKSRNTCAKEDLYY PHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQAAMKHIHYEDENSRYICLGPVNKVLNMLC CWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLRK AHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSAMVG EPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATME ALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRTY NSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGERD PAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE Cmax1gene ATGTGGAGGCTGAAGGTGGGAGCAGAGAGCGTTGGGGAGGAGGATGAGAAATGGGTGAAGAGCG 425 sequence 
TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGACGCCGCCGCCGACACTCCTCA CCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCACAATCGTTTCCACCGGAAGCAG TCTTCCGATCTCTTTCTGGCTATTCAATATGAAAAGGAAATAGCAAAGGGCGCAAAAGGTGGAG CGGTGAAAGTGAAAGAAGGGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGAAAGGGC ACTCGGTTTCTACTCGGCCGTGCAGACAAGAGATGGGAATTGGGCCTCGGATCTTGGAGGGCCC TTGTTTTTACTTCCGGGTCTCGTGATTGCCCTTCATGTCACAGGCGTCTTGAATTCAGTTTTGT CCAAGCACCACCGCGTAGAGATGTGCAGATATCTTTACAATCACCAGAATGAAGATGGAGGGTG GGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTGAATTACGTTGCACTAAGG CTGCTTGGAGAAGACGCCGATGGCGGAGACGGTGGCGCAATGACAAAAGCACGTGCTTGGATCT TGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAATTGTGGCTGTCCGTACTTGGAGT GTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGCCTACCA TTTCATCCAGGAAGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCAATGTCTTACTTATATG GGAAGAGATTTGTTGGGCCAATCACTCCCAAAGTTCTTTCTCTAAGGCAAGAGCTCTACACAAT TCCTTATCATGAAATAGACTGGAATAAATCCCGCAATACATGTGCAAAGGAGGATCTGTACTAT CCACATCCCAAGATGCAAGACATTCTATGGGGATCCATCTACCATGTATATGAGCCATTGTTCA CTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAGCTGCAATGAAACATATTCACTA TGAAGATGAAAATAGTCGATATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATGCTTTGT TGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATGACTATC TCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGCTACAATGGCAGCCAGTTGTGGGACACTGC TTTCTCCATCCAAGCCATCGTAGCCACCAAACTTGTAGACAGCTATGCCCCAACTTTAAGAAAA GCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATGTTTGGT TCCGTCATATTCATAAAGGTGCTTGGCCACTTTCGACACGAGATCATGGATGGCTCATCTCCGA CTGTACAGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCACAATGGTTGGG GAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAAATGATA ATGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAACCCAGC TGAAACATTCGGAGACATTGTCATTGACTATCCGTATGTGGAGTGCACCGCAGCAACAATGGAA GCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAGCTATTG GCAAGGCAGCCAACTTCCTTGAGAAAATGCAGAGGGCGGATGGCTCTTGGTACGGGTGTTGGGG GGTTTGTTTTACGTATGCGGGTTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGAACATAT AATAGCTGCCTTGCCATTCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCGGCGGTG 
GATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGGAACAAGCC ACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAGAGAGAC CCAGCACCATTGCACCGTGCAGCAAGGTTGCTAATGAATTCCCAATTGGAGAATGGCGATTTCG TGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATACCGAAA CATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTTACTGAATGA Cmos1protein MWRLKVGAESVGEKDEKWVKSVSNHLGRQVWEFCADAAAAATPRQLLQIQNARNHFHRNRFHRK 426 QSSDLFLAIQYEKEIAEGGKGGAVKVKEEEEVGKEAVKSTLERALSFYSAVQTSDGNWASDLGG PMFLLPGLVIALYVTGVLNSVLSKHHRVEMCRYLYNHQNEDGGWGLHIEGTSTMFGSALNYVAL RLLGEDADGGDDGAMTKARAWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSL PFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPKVLSLRQELYTVPYHEIDWNKSRNTCAKEDLY YPHPKMQDILWGSIYHVYEPLFTRWPGKRLREKALQTAMKHIHYEDENSRYICLGPVNKVLNML CCWVEDPYSDAFKLHLQRVHDYLWVAEDGMRMQGYNGSQLWDTAFSIQAIVATKLVDSYAPTLR KAHDFVKDSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSAMV GEPLEKNRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTAATM EALTLFKKLHPGHRTKEIDTAIGKAANFLEKMQRADGSWYGCWGVCFTYAGWFGIKGLVAAGRT YNSCLAIRKACEFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVLMALIEAGQGER DPAPLHRAARLLMNSQLENGDFVQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE Cmos1gene ATGTGGAGGTTGAAGGTGGGAGCAGAGAGCGTTGGGGAGAAGGATGAGAAATGGGTGAAGAGCG 427 sequence TAAGCAATCACTTGGGCCGCCAAGTTTGGGAGTTCTGTGCCGACGCCGCCGCCGCCGCCACTCC TCGCCAGTTACTACAAATTCAGAATGCTCGCAACCACTTCCATCGCAATCGTTTCCACCGGAAG CAGTCTTCCGATCTCTTTCTCGCTATTCAGTATGAAAAGGAAATAGCAGAGGGCGGAAAAGGTG GAGCGGTGAAAGTGAAAGAAGAGGAGGAGGTGGGGAAAGAGGCGGTGAAGAGTACGTTAGAAAG GGCACTAAGTTTCTACTCAGCCGTGCAGACAAGCGATGGGAATTGGGCCTCGGATCTTGGAGGG CCCATGTTTTTACTTCCGGGTCTCGTGATTGCCCTTTATGTCACAGGCGTGTTGAATTCAGTTT TGTCCAAGCACCACCGCGTAGAGATGTGCAGATATCTTTACAATCACCAGAATGAAGATGGAGG GTGGGGTCTACATATTGAGGGCACAAGCACCATGTTTGGTTCGGCACTCAATTACGTTGCACTA AGGCTGCTTGGAGAAGACGCGGATGGCGGAGACGATGGCGCAATGACAAAAGCACGTGCTTGGA TCTTGGAGCGCGGCGGCGCCACTGCGATCACTTCGTGGGGAAAGTTGTGGCTGTCCGTGCTTGG AGTGTACGAATGGAGTGGCAACAACCCTCTTCCGCCTGAGTTTTGGCTTCTCCCTTACAGCCTA CCATTTCATCCAGGAAGAATGTGGTGCCATTGTCGAATGGTTTATCTTCCCATGTCTTACTTAT 
ATGGGAAGAGATTTGTTGGGCCAATCACTCCCAAAGTTCTATCGCTAAGACAAGAGCTTTACAC GGTTCCTTATCATGAAATAGACTGGAACAAATCCCGCAATACATGTGCAAAGGAGGATCTATAC TATCCACATCCCAAGATGCAAGACATTCTATGGGGATCCATCTACCATGTGTATGAGCCATTGT TCACTCGTTGGCCTGGGAAACGCCTGAGGGAAAAGGCTTTACAAACTGCAATCAAACATATTCA CTATGAAGATGAAAATAGTCGATATATATGTCTTGGCCCAGTCAACAAGGTACTCAACATGCTT TGTTGTTGGGTTGAAGATCCCTACTCAGACGCCTTCAAACTTCACCTTCAACGCGTCCATGACT ATCTCTGGGTTGCTGAAGATGGCATGAGAATGCAGGGCTACAATGGCAGCCAGTTGTGGGACAC TGCTTTCTCCATCCAAGCCATCGTAGCCACCAAACTTGTAGACAGCTATGCCCCAACTTTAAGA AAAGCACATGACTTTGTTAAGGATTCTCAGATCCAGGAGGACTGTCCTGGGGATCCTAATGTTT GGTTCCGTCATATTCATAAAGGTGCTTGGCCATTTTCGACTCGAGATCATGGATGGCTCATCTC CGACTGTACAGCTGAGGGATTGAAGGCTTCTTTGATGTTATCCAAACTTCCATCCGCAATGGTT GGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTGTTAATGTTCTCCTTTCTTTGCAAAATG ATAATGGTGGATTTGCATCATACGAGTTGACGAGATCATACCCTTGGTTGGAGTTGATCAACCC AGCAGAAACATTCGGAGACATTGTCATCGACTATCCGTATGTGGAGTGCACCGCAGCAACAATG GAAGCACTGACGTTATTTAAGAAGCTACATCCAGGCCATAGGACCAAAGAGATTGACACAGCTA TTGGCAAGGCAGCCAACTTCCTTGAGAAAATGCAGAGGGCGGATGGCTCTTGGTATGGGTGTTG GGGGGTTTGTTTCACGTATGCGGGGTGGTTTGGCATAAAGGGATTGGTGGCTGCAGGAAGAACA TATAATAGCTGCCTTGCCATCCGCAAGGCTTGTGAGTTTCTGCTATCTAAAGAGCTGCCCGGCG GTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAAGGTGTACACCAATCTTGAGGGAAACAA GCCACACTTGGTTAACACTGCCTGGGTTTTAATGGCTCTCATTGAAGCTGGCCAGGGTGAGAGA GACCCAGCACCATTGCACCGTGCACCAAGGTTGCTAATGAATTCCCAATTGGAGAATGGCGATT TCGTGCAACAGGAGATCATGGGAGTGTTCAATAAGAACTGCATGATCACATATGCTGCATACCG AAACATCTTCCCCATTTGGGCGCTTGGAGAGTATTGCCATCGGGTTCTGACTGAAT EPHprotein MEKIEHSTIATNGINMHVASAGSGPAVLFLHGFPELWYSWRHQLLYLSSLGYRAIAPDLRGFGD 428 TDAPPSPSSYTAHHIVGDLVGLLDQLGVDQVFLVGHDWGAMMAWYFCLFRPDRVKALVNLSVHF TPRNPAISPLDGFRLMLGDDFYVCKFQEPGVAEADFGSVDTATMFKKFLTMRDPRPPIIPNGFR SLATPEALPSWLTEEDIDYFAAKFAKTGFTGGFNYYRAIDLTWELTAPWSGSEIKVPTKFIVGD LDLVYHFPGVKEYIHGGGFKKDVPFLEEVVVMEGAAHFINQEKADEINSLIYDFIKQF.
EPHgenesequence ATGGAGAAGATTGAACACTCTACTATCGCTACTAATGGTATCAATATGCACGTTGCCTCTGCTG 429 (codonoptimized, GTTCTGGTCCAGCTGTTTTGTTTTTGCACGGTTTCCCAGAATTATGGTATTCCTGGAGACACCA Ecoli) ATTGTTGTACTTGTCTTCTTTGGGTTACAGAGCTATTGCTCCAGATTTGAGAGGTTTCGGTGAC ACCGATGCTCCACCATCTCCATCCTCCTACACCGCCCACCACATCGTTGGTGATTTGGTCGGTT TGTTGGATCAATTAGGTGTCGATCAAGTCTTTTTGGTTGGTCATGATTGGGGTGCTATGATGGC CTGGTACTTCTGTTTGTTCCGTCCAGACAGAGTCAAGGCCTTAGTTAATTTATCTGTCCACTTC ACCCCACGTAACCCAGCTATCTCTCCATTAGATGGTTTCCGTTTGATGTTGGGTGATGATTTCT ACGTTTGTAAGTTTCAAGAACCAGGTGTCGCTGAAGCCGATTTCGGTTCTGTTGATACTGCCAC TATGTTTAAAAAGTTCTTGACCATGAGAGATCCACGTCCACCTATTATTCCAAACGGTTTCAGA TCCTTGGCCACCCCAGAAGCTTTGCCATCCTGGTTGACTGAAGAGGATATCGATTACTTTGCTG CCAAATTCGCTAAGACTGGTTTTACTGGTGGTTTCAACTACTACAGAGCTATCGACTTGACCTG GGAGTTGACTGCTCCATGGTCCGGTTCTGAAATCAAGGTTCCAACTAAGTTTATTGTTGGTGAC TTAGACTTGGTTTACCATTTCCCAGGTGTTAAGGAATACATTCACGGTGGTGGTTTCAAGAAGG ACGTTCCATTCTTGGAAGAAGTTGTCGTCATGGAAGGTGCTGCTCATTTTATCAACCAAGAAAA AGCTGACGAAATTAATTCTTTGATCTATGACTTCATTAAACAATTCTAG CYP87D18protein MWTVVLGLATLFVAYYIHWINKWRDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLDVHPFIQKK 430 VERYGPIFKTCLAGRPVVVSADAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIHK YIRSITLNHFGAEALRERFLPFIEASSMEALHSWSTQPSVEVKNASALMVFRTSVNKMFGEDAK KLSGNIPGKFTKLLGGFLSLPLNFPGTTYHKCLKDMKEIQKKLREVVDDRLANVGPDVEDFLGQ ALKDKESEKFISEEFIIQLLFSISFASFESISTTLTLILKLLDEHPEVVKELEAEHEAIRKARA DPDGPITWEEYKSMTFTLQVINETLRLGSVTPALLRKTVKDLQVKGYIIPEGWTIMLVTASRHR DPKVYKDPHIFNPWRWKDLDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILCTKYRWTKL GGGRIARAHILSFEDGLHVKFTPKE. 
CYP87D18gene ATGTGGACTGTCGTGCTCGGTTTGGCGACGCTGTTTGTCGCCTACTACATCCATTGGATTAACA 431 sequence AATGGAGAGATTCCAAGTTCAACGGAGTTCTGCCGCCGGGCACCATGGGTTTGCCGCTCATCGG AGAGACGATTCAACTGAGTCGACCCAGTGACTCCCTCGACGTTCACCCTTTCATCCAGAAAAAA GTTGAAAGATACGGGCCGATCTTCAAAACATGTCTGGCCGGAAGGCCGGTGGTGGTGTCGGCGG ACGCAGAGTTCAACAACTACATAATGCTGCAGGAAGGAAGAGCAGTGGAAATGTGGTATTTGGA TACGCTCTCCAAATTTTTCGGCCTCGACACCGAGTGGCTCAAAGCTCTGGGCCTCATCCACAAG TACATCAGAAGCATTACTCTCAATCACTTCGGCGCCGAGGCCCTGCGGGAGAGATTTCTTCCTT TTATTGAAGCATCCTCCATGGAAGCCCTTCACTCCTGGTCTACTCAACCTAGCGTCGAAGTCAA AAATGCCTCCGCTCTCATGGTTTTTAGGACCTCGGTGAATAAGATGTTCGGTGAGGATGCGAAG AAGCTATCGGGAAATATCCCTGGGAAGTTCACGAAGCTTCTAGGAGGATTTCTCAGTTTACCAC TGAATTTTCCCGGCACCACCTACCACAAATGCTTGAAGGATATGAAGGAAATCCAGAAGAAGCT AAGAGAGGTTGTAGACGATAGATTGGCTAATGTGGGCCCTGATGTGGAAGATTTCTTGGGGCAA GCCCTTAAAGATAAGGAATCAGAGAAGTTCATTTCAGAGGAGTTCATCATCCAACTGTTGTTTT CTATCAGTTTTGCTAGCTTTGAGTCCATCTCCACCACTCTTACTTTGATTCTCAAGCTCCTTGA TGAACACCCAGAAGTAGTGAAAGAGTTGGAAGCTGAACACGAGGCGATTCGAAAAGCTAGAGCA GATCCAGATGGACCAATTACTTGGGAAGAATACAAATCCATGACTTTTACATTACAAGTCATCA ATGAAACCCTAAGGTTGGGGAGTGTCACACCTGCCTTGTTGAGGAAAACAGTTAAAGATCTTCA AGTAAAAGGATACATAATCCCGGAAGGATGGACAATAATGCTTGTCACCGCTTCACGTCACAGA GACCCAAAAGTCTATAAGGACCCTCATATCTTCAATCCATGGCGTTGGAAGGACTTGGACTCAA TTACCATCCAAAAGAACTTCATGCCTTTTGGGGGAGGCTTAAGGCATTGTGCTGGTGCTGAGTA CTCTAAAGTCTACTTGTGCACCTTCTTGCACATCCTCTGTACCAAATACCGATGGACCAAACTT GGGGGAGGAAGGATTGCAAGAGCTCATATATTGAGTTTTGAAGATGGGTTACATGTGAAGTTCA CACCCAAGGAATGA AtCPRprotein MTSALYASDLFKQLKSIMGTDSLSDDVVLVIATTSLALVAGFVVLLWKKTTADRSGELKPLMIP 432 KSLMAKDEDDDLDLGSGKTRVSIFFGTQTGTAEGFAKALSEEIKARYEKAAVKDDYAADDDQYE EKLKKETLAFFCVATYGDGEPTDNAARFYKWFTEENERDIKLQQLAYGVFALGNRQYEHFNKIG IVLDEELCKKGAKRLIEVGLGDDDQSIEDDFNAWKESLWSELDKLLKDEDDKSVATPYTAVIPE YRVVTHDPRFTTQKSMESNVANGNTTIDIHHPCRVDVAVQKELHTHESDRSCIHLEFDISRTGI TYETGDHVGVYAENHVEIVEEAGKLLGHSLDLVFSIHADKEDGSPLESAVPPPFPGPCTLGTGL ARYADLLNPPRKSALVALAAYATEPSEAEKLKHLTSPDGKDEYSQWIVASQRSLLEVMAAFPSA 
KPPLGVFFAAIAPRLQPRYYSISSSPRLAPSRVHVTSALVYGPTPTGRIHKGVCSTWMKNAVPA EKSHECSGAPIFIRASNFKLPSNPSTPIVMVGPGTGLAPFRGFLQERMALKEDGEELGSSLLFF GCRNRQMDFIYEDELNNFVDQGVISELIMAFSREGAQKEYVQHKMMEKAAQVWDLIKEEGYLYV CGDAKGMARDVHRTLHTIVQEQEGVSSSEAEAIVKKLQTEGRYLRDVW AtCPRgene ATGACTTCTGCTTTGTATGCTTCCGATTTGTTTAAGCAGCTCAAGTCAATTATGGGGACAGATT 433 sequence CGTTATCCGACGATGTTGTACTTGTGATTGCAACGACGTCTTTGGCACTAGTAGCTGGATTTGT GGTGTTGTTATGGAAGAAAACGACGGCGGATCGGAGCGGGGAGCTGAAGCCTTTGATGATCCCT AAGTCTCTTATGGCTAAGGACGAGGATGATGATTTGGATTTGGGATCCGGGAAGACTAGAGTCT CTATCTTCTTCGGTACGCAGACTGGAACAGCTGAGGGATTTGCTAAGGCATTATCCGAAGAAAT CAAAGCGAGATATGAAAAAGCAGCAGTCAAAGATGACTATGCTGCCGATGATGACCAGTATGAA GAGAAATTGAAGAAGGAAACTTTGGCATTTTTCTGTGTTGCTACTTATGGAGATGGAGAGCCTA CTGACAATGCTGCCAGATTTTACAAATGGTTTACGGAGGAAAATGAACGGGATATAAAGCTTCA ACAACTAGCATATGGTGTGTTTGCTCTTGGTAATCGCCAATATGAACATTTTAATAAGATCGGG ATAGTTCTTGATGAAGAGTTATGTAAGAAAGGTGCAAAGCGTCTTATTGAAGTCGGTCTAGGAG ATGATGATCAGAGCATTGAGGATGATTTTAATGCCTGGAAAGAATCACTATGGTCTGAGCTAGA CAAGCTCCTCAAAGACGAGGATGATAAAAGTGTGGCAACTCCTTATACAGCTGTTATTCCTGAA TACCGGGTGGTGACTCATGATCCTCGGTTTACAACTCAAAAATCAATGGAATCAAATGTGGCCA ATGGAAATACTACTATTGACATTCATCATCCCTGCAGAGTTGATGTTGCTGTGCAGAAGGAGCT TCACACACATGAATCTGATCGGTCTTGCATTCATCTCGAGTTCGACATATCCAGGACGGGTATT ACATATGAAACAGGTGACCATGTAGGTGTATATGCTGAAAATCATGTTGAAATAGTTGAAGAAG CTGGAAAATTGCTTGGCCACTCTTTAGATTTAGTATTTTCCATACATGCTGACAAGGAAGATGG CTCCCCATTGGAAAGCGCAGTGCCGCCTCCTTTCCCTGGTCCATGCACACTTGGGACTGGTTTG GCAAGATACGCAGACCTTTTGAACCCTCCTCGAAAGTCTGCGTTAGTTGCCTTGGCGGCCTATG CCACTGAACCAAGTGAAGCCGAGAAACTTAAGCACCTGACATCACCTGATGGAAAGGATGAGTA CTCACAATGGATTGTTGCAAGTCAGAGAAGTCTTTTAGAGGTGATGGCTGCTTTTCCATCTGCA AAACCCCCACTAGGTGTATTTTTTGCTGCAATAGCTCCTCGTCTACAACCTCGTTACTACTCCA TCTCATCCTCGCCAAGATTGGCGCCAAGTAGAGTTCATGTTACATCCGCACTAGTATATGGTCC AACTCCTACTGGTAGAATCCACAAGGGTGTGTGTTCTACGTGGATGAAGAATGCAGTTCCTGCG GAGAAAAGTCATGAATGTAGTGGAGCCCCAATCTTTATTCGAGCATCTAATTTCAAGTTACCAT CCAACCCTTCAACTCCAATCGTTATGGTGGGACCTGGGACTGGGCTGGCACCTTTTAGAGGTTT 
TCTGCAGGAAAGGATGGCACTAAAAGAAGATGGAGAAGAACTAGGTTCATCTTTGCTCTTCTTT GGGTGTAGAAATCGACAGATGGACTTTATATACGAGGATGAGCTCAATAATTTTGTTGATCAAG GCGTAATATCTGAGCTCATCATGGCATTCTCCCGTGAAGGAGCTCAGAAGGAGTATGTTCAACA TAAGATGATGGAGAAGGCAGCACAAGTTTGGGATCTAATAAAGGAAGAAGGATATCTCTATGTA TGCGGTGATGCTAAGGGCATGGCGAGGGACGTCCACCGAACTCTACACACCATTGTTCAGGAGC AGGAAGGTGTGAGTTCGTCAGAGGCAGAGGCTATAGTTAAGAAACTTCAAACCGAAGGAAGATA CCTCAGAGATGTCTGGTGA AGY15763.1protein MWKVPKFIKQSYLVFLLALLLYSSFGFSFSRTEATTSTGALGPVTPKDTIYQIVTDRFFDGDPS 434 NNKPPGFDPTLFDDPDGNNQGNGKDLKLYQGGDFQGIIDKIPYLKNMGITAVWISAPYENRDTV IEDYQSDGSINRWTSFHGYHARNYFATNKHFGTMKDFIRLRDALHQNGIKLVIDFVSNHSSRWQ NPTLNFAPEDGKLYEPDKDANGNYVFDANGEPADYNGDGKVENLLADPHNDVNGFFHGLGDRGN DTSRFGYRYKDLGSLADYSQENALVVEHLEKAAKFWKSKGIDGFRHDATLHMNPAFVKGFKDAI DSDAGGPVTHFGEFFIGRPDPKYDEYRTFPERTGVNNLDFEYFRAATNAFGNFSETMSSFGDMM IKTSNDYIYENQTVTFLDNHDVTRFRYIQPNDKPYHAALAVLMTSRGIPNIYYGTEQYLMPSDS SDIAGRMFMQTSTNFDENTTAYKVIQKLSNLRKNNEAIAYGTTEILYSTNDVLVFKRQFYDKQV IVAVNRQPDQTFTIPELDTTLPVGTYSDVLGGLLYGSSMSVNNVNGQNKISSFTLSGGEVNVWS YNPSLGTLTPRIGDVISTMGRPGNTVYIYGTGLGGSVTVKFGSTVATVVSNSDQMIEAIVPNTN PGIQNITVTKGSVTSDPFRYEVLSGDQVQVIFHVNATTNWGENIYVVGNIPELGSWDPNQSSEA MLNPNYPEWFLPVSVPKGATFEFKFIKKDNNGNVIWESRSNRVFTAPNSSTGTIDTPLYFWDN AGY15764.1protein TTSTGALGPVTPKDTIYQIVTDRFFDGDPSNNKPPGFDPTLFDDPDGNNQGNGKDLKLYQGGDF 435 QGIIDKIPYLKNMGITAVWISAPYENRDTVIEDYQSDGSINRWTSFHGYHARNYFATNKHFGTM KDFIRLRDALHQNGIKLVIDFVSNHSSRWQNPTLNFAPEDGKLYEPDKDANGNYVFDANGEPAD YNGDGKVENLLADPHNDVNGFFHGLGDRGNDTSRFGYRYKDLGSLADYSQENALVVEHLEKAAK FWKSKGIDGFRHDATLHMNPAFVKGFKDAIDSDAGGPVTHFGEFFIGRPDPKYDEYRTFPERTG VNNLDFEYFRAATNAFGNFSETMSSFGDMMIKTSNDYIYENQTVTFLDNHDVTRFRYIQPNDKP YHAALAVLMTSRGIPNIYYGTEQYLMPSDSSDIAGRMFMQTSTNFDENTTAYKVIQKLSNLRKN NEAIAYGTTEILYSTNDVLVFKRQFYDKQVIVAVNRQPDQTFTIPELDTTLPVGTYSDVLGGLL YGSSMSVNNVNGQNKISSFTLSGGEVNVWSYNPSLGTLTPRIGDVISTMGRPGNTVYIYGTGLG GSVTVKFGSTVATVVSNSDQMIEAIVPNTNPGIQNITVTKGSVTSDPFRYEVLSGDQVQVIFHV NATTNWGENIYVVGNIPELGSWDPNQSSEAMLNPNYPEWFLPVSVPKGATFEFKFIKKDNNGNV IWESRSNRVFTAPNSSTGTIDTPLYFWDN Glycosyltransferas 
MDSGYSSSYAAAAGMHVVICPWLAFGHLLPCLDLAQRLASRGHRVSFVSTPRNISRLPPVRPAL 436 e(311) APLVAFVALPLPRVEGLPDGAESTNDVPHDRPDMVELHRRAFDGLAAPFSEFLGTACADWVIVD VFHHWAAAAALEHKVPCAMMLLGSAHMIASIADRRLERAETESPAAAGQGRPAAAPTFEVARMK LIRTKGSSGMSLAERFSLTLSRSSLVVGRSCVEFEPETVPLLSTLRGKPITFLGLMPPLHEGRR EDGEDATVRWLDAQPAKSVVYVALGSEVPLGVEKVHELALGLELAGTRFLWALRKPTGVSDADL LPAGFEERTRGRGVVATRWVPQMSILAHAAVGAFLTHCGWNSTIEGLMFGHPLIMLPIFGDQGP NARLIEAKNAGLQVARNDGDGSFDREGVAAAIRAVAVEEESSKVFQAKAKKLQEIVADMACHER YIDGFIQQLRSYKDAAALE Glycosyltransferas ATGGACTCCGGCTACTCCTCCTCCTACGCCGCCGCCGCCGGGATGCACGTCGTGATCTGCCCGT 437 e(311)(gDNA GGCTCGCCTTCGGCCACCTGCTCCCGTGCCTCGACCTCGCCCAGCGCCTCGCGTCGCGGGGCCA native) CCGCGTGTCGTTCGTCTCCACGCCGCGGAACATATCCCGCCTCCCGCCGGTGCGCCCCGCGCTC GCGCCGCTCGTCGCCTTCGTGGCGCTGCCGCTCCCGCGCGTCGAGGGGCTCCCCGACGGCGCCG AGTCCACCAACGACGTCCCCCACGACAGGCCGGACATGGTCGAGCTCCACCGGAGGGCCTTCGA CGGGCTCGCCGCGCCCTTCTCGGAGTTCTTGGGCACCGCGTGCGCCGACTGGGTCATCGTCGAC GTCTTCCACCACTGGGCCGCAGCCGCCGCTCTCGAGCACAAGGTGCCATGTGCAATGATGTTGT TGGGCTCTGCACATATGATCGCTTCCATAGCAGACAGACGGCTCGAGCGCGCGGAGACAGAGTC GCCTGCGGCTGCCGGGCAGGGACGCCCAGCGGCGGCGCCAACGTTCGAGGTGGCGAGGATGAAG TTGATACGAACCAAAGGCTCATCGGGAATGTCCCTCGCCGAGCGCTTCTCCTTGACGCTCTCGA GGAGCAGCCTCGTCGTCGGGCGGAGCTGCGTGGAGTTCGAGCCGGAGACCGTCCCGCTCCTGTC GACGCTCCGCGGTAAGCCTATTACCTTCCTTGGCCTTATGCCGCCGTTGCATGAAGGCCGCCGC GAGGACGGCGAGGATGCCACCGTCCGCTGGCTCGACGCGCAGCCGGCCAAGTCCGTCGTGTACG TCGCGCTAGGCAGCGAGGTGCCACTGGGAGTGGAGAAGGTCCACGAGCTCGCGCTCGGGCTGGA GCTCGCCGGGACGCGCTTCCTCTGGGCTCTTAGGAAGCCCACTGGCGTCTCCGACGCCGACCTC CTCCCCGCCGGCTTCGAGGAGCGCACGCGCGGCCGCGGCGTCGTGGCGACGAGATGGGTTCCTC AGATGAGCATACTGGCGCACGCCGCCGTGGGCGCGTTCCTGACCCACTGCGGCTGGAACTCGAC CATCGAGGGGCTCATGTTCGGCCACCCGCTTATCATGCTGCCGATCTTCGGCGACCAGGGACCG AACGCGCGGCTAATCGAGGCGAAGAACGCCGGATTGCAGGTGGCAAGAAACGACGGCGATGGAT CGTTCGACCGAGAAGGCGTCGCGGCGGCGATTCGTGCAGTCGCGGTGGAGGAAGAAAGCAGCAA AGTGTTTCAAGCCAAAGCCAAGAAGCTGCAGGAGATCGTCGCGGACATGGCCTGCCATGAGAGG TACATCGACGGATTCATTCAGCAATTGAGATCTTACAAGGATTGA Glycosyltransferas 
ATGGATAGCGGTTATAGCAGCAGCTATGCAGCAGCAGCCGGTATGCATGTTGTTATTTGTCCGT 438 e(311)(gDNA GGCTGGCATTTGGTCATCTGCTGCCGTGTCTGGATCTGGCACAGCGTCTGGCAAGCCGTGGTCA codonoptimized, TCGTGTTAGCTTTGTTAGCACACCGCGTAATATTAGCCGTCTGCCTCCGGTTCGTCCGGCACTG E.coli) GCACCGCTGGTTGCATTTGTTGCACTGCCGCTGCCTCGTGTTGAAGGTCTGCCGGATGGTGCAG AAAGCACCAATGATGTTCCGCATGATCGTCCGGATATGGTTGAACTGCATCGTCGTGCATTTGA TGGTCTGGCAGCACCGTTTAGCGAATTTCTGGGCACCGCATGTGCAGATTGGGTTATTGTTGAT GTTTTTCATCATTGGGCAGCCGCAGCAGCACTGGAACATAAAGTTCCGTGTGCAATGATGCTGC TGGGTAGCGCACATATGATTGCAAGCATTGCAGATCGTCGTCTGGAACGTGCAGAAACCGAAAG TCCTGCGGCAGCAGGTCAGGGTCGTCCTGCAGCCGCACCGACCTTTGAAGTTGCACGTATGAAA CTGATTCGTACCAAAGGTAGCAGCGGTATGAGCCTGGCAGAACGTTTTAGTCTGACCCTGAGCC GTAGCAGCCTGGTTGTTGGTCGTAGCTGTGTTGAATTTGAACCGGAAACCGTTCCGCTGCTGAG CACCCTGCGTGGTAAACCGATTACCTTTCTGGGTCTGATGCCTCCGCTGCATGAAGGTCGTCGC GAAGATGGTGAAGATGCAACCGTTCGTTGGCTGGATGCACAGCCTGCAAAAAGCGTTGTTTATG TTGCCCTGGGTAGTGAAGTTCCGCTGGGTGTTGAAAAAGTGCATGAACTGGCACTGGGTTTAGA ACTGGCAGGCACCCGTTTTCTGTGGGCACTGCGTAAACCGACCGGTGTTAGTGATGCCGATCTG CTTCCGGCAGGTTTTGAAGAACGTACCCGTGGTCGTGGTGTTGTTGCAACCCGTTGGGTTCCGC AGATGAGCATTCTGGCACATGCAGCAGTGGGTGCATTTCTGACCCATTGTGGTTGGAATAGCAC CATTGAAGGCCTGATGTTTGGCCATCCGCTGATTATGCTGCCGATTTTTGGTGATCAGGGTCCG AATGCACGTCTGATTGAAGCAAAAAATGCAGGTCTGCAGGTTGCCCGTAATGATGGTGATGGTA GCTTTGATCGTGAAGGTGTTGCAGCAGCCATTCGTGCAGTTGCAGTTGAAGAAGAAAGCAGCAA AGTTTTTCAGGCCAAAGCCAAAAAACTGCAAGAAATTGTTGCAGATATGGCCTGCCATGAACGT TATATTGATGGTTTTATTCAGCAGCTGCGTAGCTACAAAGAT UGT76G1protein MENKTETTVRRRRRIILFPVPFQGHINPILQLANVLYSKGFSITIFHTNFNKPKTSNYPHFTFR 439 FILDNDPQDERISNLPTHGPLAGMRIPIINEHGADELRRELELLMLASEEDEEVSCLITDALWY FAQSVADSLNLRRLVLMTSSLFNFHAHVSLPQFDELGYLDPDDKTRLEEQASGFPMLKVKDIKS AYSNWQILKEILGKMIKQTKASSGVIWNSFKELEESELETVIREIPAPSFLIPLPKHLTASSSS LLDHDRTVFQWLDQQPPSSVLYVSFGSTSEVDEKDFLEIARGLVDSKQSFLWVVRPGFVKGSTW VEPLPDGFLGERGRIVKWVPQQEVLAHGAIGAFWTHSGWNSTLESVCEGVPMIFSDFGLDQPLN ARYMSDVLKVGVYLENGWERGEIANAIRRVMVDEEGEYIRQNARVLKQKADVSLMKGGSSYESL ESLVSYISSL UGT76G1gene 
ATGGAAAATAAAACGGAGACCACCGTTCGCCGGCGCCGGAGAATAATATTATTCCCGGTACCAT 440 sequence TTCAAGGCCACATTAACCCAATTCTTCAGCTAGCCAATGTGTTGTACTCTAAAGGATTCAGTAT CACCATCTTTCACACCAACTTCAACAAACCCAAAACATCTAATTACCCTCACTTCACTTTCAGA TTCATCCTCGACAACGACCCACAAGACGAACGCATTTCCAATCTACCGACTCATGGTCCGCTCG CTGGTATGCGGATTCCGATTATCAACGAACACGGAGCTGACGAATTACGACGCGAACTGGAACT GTTGATGTTAGCTTCTGAAGAAGATGAAGAGGTATCGTGTTTAATCACGGATGCTCTTTGGTAC TTCGCGCAATCTGTTGCTGACAGTCTTAACCTCCGACGGCTTGTTTTGATGACAAGCAGCTTGT TTAATTTTCATGCACATGTTTCACTTCCTCAGTTTGATGAGCTTGGTTACCTCGATCCTGATGA CAAAACCCGTTTGGAAGAACAAGCGAGTGGGTTTCCTATGCTAAAAGTGAAAGACATCAAGTCT GCGTATTCGAACTGGCAAATACTCAAAGAGATATTAGGGAAGATGATAAAACAAACAAAAGCAT CTTCAGGAGTCATCTGGAACTCATTTAAGGAACTCGAAGAGTCTGAGCTCGAAACTGTTATCCG TGAGATCCCGGCTCCAAGTTTCTTGATACCACTCCCCAAGCATTTGACAGCCTCTTCCAGCAGC TTACTAGACCACGATCGAACCGTTTTTCAATGGTTAGACCAACAACCGCCAAGTTCGGTACTGT ATGTTAGTTTTGGTAGTACTAGTGAAGTGGATGAGAAAGATTTCTTGGAAATAGCTCGTGGGTT GGTTGATAGCAAGCAGTCGTTTTTATGGGTGGTTCGACCTGGGTTTGTCAAGGGTTCGACGTGG GTCGAACCGTTGCCAGATGGGTTCTTGGGTGAAAGAGGACGTATTGTGAAATGGGTTCCACAGC AAGAAGTGCTAGCTCATGGAGCAATAGGCGCATTCTGGACTCATAGCGGATGGAACTCTACGTT GGAAAGCGTTTGTGAAGGTGTTCCTATGATTTTCTCGGATTTTGGGCTCGATCAACCGTTGAAT GCTAGATACATGAGTGATGTTTTGAAGGTAGGGGTGTATTTGGAAAATGGGTGGGAAAGAGGAG AGATAGCAAATGCAATAAGAAGAGTTATGGTGGATGAAGAAGGAGAATACATTAGACAGAATGC AAGAGTTTTGAAACAAAAGGCAGATGTTTCTTTGATGAAGGGTGGTTCGTCTTACGAATCATTA GAGTCTCTAGTTTCTTACATTTCATCGTTGTAA UGT73C5protein MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLPYYIDGDVK 441 LTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLKVDFLSKLPE MLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKLVCFKKRIEAIPQIDKY LKSSKYIAWPLQGWQATFGGGDHPPKSDLVPRGSVSETTKSSPLHFVLFPFMAQGHMIPMVDIA RLLAQRGVIITIVTTPHNAARFKNVLNRAIESGLPINLVQVKFPYLEAGLQEGQENIDSLDTME RMIPFFKAVNFLEEPVQKLIEEMNPRPSCLISDFCLPYTSKIAKKFNIPKILFHGMGCFCLLCM HVLRKNREILDNLKSDKELFTVPDFPDRVEFTRTQVPVETYVPAGDWKDIFDGMVEANETSYGV IVNSFQELEPAYAKDYKEVRSGKAWTIGPVSLCNKVGADKAERGNKSDIDQDECLKWLDSKKHG 
SVLYVCLGSICNLPLSQLKELGLGLEESQRPFIWVIRGWEKYKELVEWFSESGFEDRIQDRGLL IKGWSPQMLILSHPSVGGFLTHCGWNSTLEGITAGLPLLTWPLFADQFCNEKLVVEVLKAGVRS GVEQPMKWGEEEKIGVLVDKEGVKKAVEELMGESDDAKERRRRAKELGDSAHKAVEEGGSSHSN ISFLLQDIMELAEPNNAAAS UGT73C5gene ATGTCCCCTATACTAGGTTATTGGAAAATTAAGGGCCTTGTGCAACCCACTCGACTTCTTTTGG 442 sequence AATATCTTGAAGAAAAATATGAAGAGCATTTGTATGAGCGCGATGAAGGTGATAAATGGCGAAA CAAAAAGTTTGAATTGGGTTTGGAGTTTCCCAATCTTCCTTATTATATTGATGGTGATGTTAAA TTAACACAGTCTATGGCCATCATACGTTATATAGCTGACAAGCACAACATGTTGGGTGGTTGTC CAAAAGAGCGTGCAGAGATTTCAATGCTTGAAGGAGCGGTTTTGGATATTAGATACGGTGTTTC GAGAATTGCATATAGTAAAGACTTTGAAACTCTCAAAGTTGATTTTCTTAGCAAGCTACCTGAA ATGCTGAAAATGTTCGAAGATCGTTTATGTCATAAAACATATTTAAATGGTGATCATGTAACCC ATCCTGACTTCATGTTGTATGACGCTCTTGATGTTGTTTTATACATGGACCCAATGTGCCTGGA TGCGTTCCCAAAATTAGTTTGTTTTAAAAAACGTATTGAAGCTATCCCACAAATTGATAAGTAC TTGAAATCCAGCAAGTATATAGCATGGCCTTTGCAGGGCTGGCAAGCCACGTTTGGTGGTGGCG ACCATCCTCCAAAATCGGATCTGGTTCCGCGTGGATCCGTTTCCGAAACAACCAAATCTTCTCC ACTTCACTTTGTTCTCTTCCCTTTCATGGCTCAAGGCCACATGATTCCCATGGTTGATATTGCA AGGCTCTTGGCTCAGCGTGGTGTGATCATAACAATTGTCACGACGCCTCACAATGCAGCGAGGT TCAAGAATGTCCTAAACCGTGCCATTGAGTCTGGCTTGCCCATCAACTTAGTGCAAGTCAAGTT TCCATATCTAGAAGCTGGTTTGCAAGAAGGACAAGAGAATATCGATTCTCTTGACACAATGGAG CGGATGATACCTTTCTTTAAAGCGGTTAACTTTCTCGAAGAACCAGTCCAGAAGCTCATTGAAG AGATGAACCCTCGACCAAGCTGTCTAATTTCTGATTTTTGTTTGCCTTATACAAGCAAAATCGC CAAGAAGTTCAATATCCCAAAGATCCTCTTCCATGGCATGGGTTGCTTTTGTCTTCTGTGTATG CATGTTTTACGCAAGAACCGTGAGATCTTGGACAATTTAAAGTCAGATAAGGAGCTTTTCACTG TTCCTGATTTTCCTGATAGAGTTGAATTCACAAGAACGCAAGTTCCGGTAGAAACATATGTTCC AGCTGGAGACTGGAAAGATATCTTTGATGGTATGGTAGAAGCGAATGAGACATCTTATGGTGTG ATCGTCAACTCATTTCAAGAGCTCGAGCCTGCTTATGCCAAAGACTACAAGGAGGTAAGGTCCG GTAAAGCATGGACCATTGGACCCGTTTCCTTGTGCAACAAGGTAGGAGCCGACAAAGCAGAGAG GGGAAACAAATCAGACATTGATCAAGATGAGTGCCTTAAATGGCTCGATTCTAAGAAACATGGC TCGGTGCTTTACGTTTGTCTTGGAAGTATCTGTAATCTTCCTTTGTCTCAACTCAAGGAGCTGG GACTAGGCCTAGAGGAATCCCAAAGACCTTTCATTTGGGTCATAAGAGGTTGGGAGAAGTACAA AGAGTTAGTTGAGTGGTTCTCGGAAAGCGGCTTTGAAGATAGAATCCAAGATAGAGGACTTCTC 
ATCAAAGGATGGTCCCCTCAAATGCTTATCCTTTCACATCCATCAGTTGGAGGGTTCCTAACAC ACTGTGGTTGGAACTCGACTCTTGAGGGGATAACTGCTGGTCTACCGCTACTTACATGGCCGCT ATTCGCAGACCAATTCTGCAATGAGAAATTGGTCGTTGAGGTACTAAAAGCCGGTGTAAGATCC GGGGTTGAACAGCCTATGAAATGGGGAGAAGAGGAGAAAATAGGAGTGTTGGTGGATAAAGAAG GAGTGAAGAAGGCAGTGGAAGAATTAATGGGTGAGAGTGATGATGCAAAAGAGAGAAGAAGAAG AGCCAAAGAGCTTGGAGATTCAGCTCACAAGGCTGTGGAAGAAGGAGGCTCTTCTCATTCTAAC ATCTCTTTCTTGCTACAAGACATAATGGAACTGGCAGAACCCAATAATGCGGCCGCATCGTGA UGT73C5gene ATGAGGCATGGATCCGTTAGCGAAACCACCAAAAGCAGTCCGCTGCATTTTGTTCTGTTTCCGT 443 sequence(Codon TTATGGCACAGGGTCATATGATTCCGATGGTTGATATTGCACGTCTGCTGGCACAGCGTGGTGT optimized,E. GATTATTACCATTGTTACCACACCGCATAATGCAGCACGCTTTAAAAACGTTCTGAATCGTGCA coli) ATTGAAAGCGGTCTGCCGATTAATCTGGTTCAGGTTAAATTTCCGTATCTGGAAGCAGGTCTGC AAGAAGGTCAAGAAAATATTGATAGCCTGGATACCATGGAACGCATGATTCCGTTTTTCAAAGC CGTGAATTTTCTGGAAGAACCGGTGCAGAAACTGATCGAAGAAATGAATCCGCGTCCGAGCTGT CTGATTAGCGATTTTTGTCTGCCGTATACCAGCAAAATCGCCAAAAAATTCAACATCCCGAAAA TCCTGTTTCATGGTATGGGTTGTTTTTGcctgctgtgtatgcatgttcTGCGTAAAAATCGTGA AATCCTGGATAACCTGAAAAGCGATAAAGAACTGTTTACCGTTCCGGATTTTCCGGATCGTGTG GAATTTACCCGTACACAGGTTCCGGTTGAAACCTATGTTCCGGCAGGCGATTGGAAAGATATTT TTGATGGTATGGTGGAAGCCAACGAAACCAGCTATGGTGTTATTGTGAATAGCTTTCAAGAACT GGAACCGGCATATGCGAAAGATTACAAAGAAGTTCGTAGCGGTAAAGCATGGACCATTGGTCCG GTTAGCCTGTGTAATAAAGTTGGTGCAGATAAAGCAGAACGCGGTAATAAAAGTGATATCGATC AGGATGAATGCCTGAAATGGCTGGATAGCAAAAAACATGGTAGCGTTCTGTATGTTTGTCTGGG TAGCATTTGCAATCTGCCGCTGAGCCAGCTGAAAGAATTAGGTCTGGGTTTAGAAGAAAGCCAG CGTCCGTTTATTTGGGTTATTCGTGGTTGGGAGAAATACAAAGAACTGGTTGAATGGTTTAGCG AAAGCGGTTTTGAAGATCGTATTCAGGATCGTGGCCTGCTGATTAAAGGTTGGAGTCCGCAGAT GCTGATTCTGAGCCATCCGAGCGTTGGTGGCTTTCTGACCCATTGTGGTTGGAATAGCACCCTG GAAGGTATTACAGCTGGCCTGCCGCTGCTGACCTGGCCTCTGTTTGCAGATCAGTTTTGTAATG AAAAACTGGTGGTGGAAGTTCTGAAAGCCGGTGTGCGTAGCGGTGTTGAACAGCCGATGAAATG GGGTGAAGAAGAAAAAATTGGCGTCCTGGTTGATAAAGAAGGTGTTAAAAAAGCCGTGGAAGAA CTGATGGGTGAAAGTGATGATGCAAAAGAACGTCGTCGTCGTGCAAAAGAGCTGGGCGATAGCG CACATAAAGCAGTTGAAGAAGGTGGTAGCAGCCATAGCAATATTAGCTTTCTGCTGCAGGATAT 
TATGGAACTGGCAGAACCGAATAACTAAGCGGCCGCTGAA UGT73C6protein MAFEKNNEPFPLHFVLFPFMAQGHMIPMVDIARLLAQRGVLITIVTTPHNAARFKNVLNRAIES 444 GLPINLVQVKFPYQEAGLQEGQENMDLLTTMEQITSFFKAVNLLKEPVQNLIEEMSPRPSCLIS DMCLSYTSEIAKKFKIPKILFHGMGCFCLLCVNVLRKNREILDNLKSDKEYFIVPYFPDRVEFT RPQVPVETYVPAGWKEILEDMVEADKTSYGVIVNSFQELEPAYAKDFKEARSGKAWTIGPVSLC NKVGVDKAERGNKSDIDQDECLEWLDSKEPGSVLYVCLGSICNLPLSQLLELGLGLEESQRPFI WVIRGWEKYKELVEWFSESGFEDRIQDRGLLIKGWSPQMLILSHPSVGGFLTHCGWNSTLEGIT AGLPMLTWPLFADQFCNEKLVVQILKVGVSAEVKEVMKWGEEEKIGVLVDKEGVKKAVEELMGE SDDAKERRRRAKELGESAHKAVEEGGSSHSNITFLLQDIMQLAQSNN UGT73C6(gDNA, ATGGCTTTCGAAAAAAACAACGAACCTTTTCCTCTTCACTTTGTTCTCTTCCCTTTCATGGCTC 445 native) AAGGCCACATGATTCCCATGGTTGATATTGCAAGGCTCTTGGCTCAGCGAGGTGTGCTTATAAC AATTGTCACGACGCCTCACAATGCAGCAAGGTTCAAGAATGTCCTAAACCGTGCCATTGAGTCT GGTTTGCCCATCAACCTAGTGCAAGTCAAGTTTCCATATCAAGAAGCTGGTCTGCAAGAAGGAC AAGAAAATATGGATTTGCTTACCACGATGGAGCAGATAACATCTTTCTTTAAAGCGGTTAACTT ACTCAAAGAACCAGTCCAGAACCTTATTGAAGAGATGAGCCCGCGACCAAGCTGTCTAATCTCT GATATGTGTTTGTCGTATACAAGCGAAATCGCCAAGAAGTTCAAAATACCAAAGATCCTCTTCC ATGGCATGGGTTGCTTTTGTCTTCTGTGTGTTAACGTTCTGCGCAAGAACCGTGAGATCTTGGA CAATTTAAAGTCTGATAAGGAGTACTTCATTGTTCCTTATTTTCCTGATAGAGTTGAATTCACA AGACCTCAAGTTCCGGTGGAAACATATGTTCCTGCAGGCTGGAAAGAGATCTTGGAGGATATGG TAGAAGCGGATAAGACATCTTATGGTGTTATAGTCAACTCATTTCAAGAGCTCGAACCTGCGTA TGCCAAAGACTTCAAGGAGGCAAGGTCTGGTAAAGCATGGACCATTGGACCTGTTTCCTTGTGC AACAAGGTAGGAGTAGACAAAGCAGAGAGGGGAAACAAATCAGATATTGATCAAGATGAGTGCC TTGAATGGCTCGATTCTAAGGAACCGGGATCTGTGCTCTACGTTTGCCTTGGAAGTATTTGTAA TCTTCCTCTGTCTCAGCTCCTTGAGCTGGGACTAGGCCTAGAGGAATCCCAAAGACCTTTCATC TGGGTCATAAGAGGTTGGGAGAAATACAAAGAGTTAGTTGAGTGGTTCTCGGAAAGCGGCTTTG AAGATAGAATCCAAGATAGAGGACTTCTCATCAAAGGATGGTCCCCTCAAATGCTTATCCTTTC ACATCCTTCTGTTGGAGGGTTCTTAACGCACTGCGGATGGAACTCGACTCTTGAGGGGATAACT GCTGGTCTACCAATGCTTACATGGCCACTATTTGCAGACCAATTCTGCAACGAGAAACTGGTCG TACAAATACTAAAAGTCGGTGTAAGTGCCGAGGTTAAAGAGGTCATGAAATGGGGAGAAGAAGA GAAGATAGGAGTGTTGGTGGATAAAGAAGGAGTGAAGAAGGCAGTGGAAGAACTAATGGGTGAG 
AGTGATGATGCAAAAGAGAGAAGAAGAAGAGCCAAAGAGCTTGGAGAATCAGCTCACAAGGCTG TGGAAGAAGGAGGCTCCTCTCATTCTAATATCACTTTCTTGCTACAAGACATAATGCAACTAGC ACAGTCCAATAAT SgCbQprotein MWRLKVGAESVGENDEKWLKSISNHLGRQVWEFCPDAGTQQQLLQVHKARKAFHDDRFHRKQSS 446 DLFITIQYGKEVENGGKTAGVKLKEGEEVRKEAVESSLERALSFYSSIQTSDGNWASDLGGPMF LLPGLVIALYVTGVLNSVLSKHHRQEMCRYVYNHQNEDGGWGLHIEGPSTMFGSALNYVALRLL GEDANAGAMPKARAWILDHGGATGITSWGKLWLSVLGVYEWSGNNPLPPEFWLFPYFLPFHPGR MWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYAVPYHEIDWNKSRNTCAKEDLYYPHPKM QDILWGSLHHVYEPLFTRWPAKRLREKALQTAMQHIHYEDENTRYICLGPVNKVLNLLCCWVED PYSDAFKLHLQRVHDYLWVAEDGMKMQGYNGSQLWDTAFSIQAIVSTKLVDNYGPTLRKAHDFV KSSQIQQDCPGDPNVWYRHIHKGAWPFSTRDHGWLISDCTAEGLKAALMLSKLPSETVGESLER NRLCDAVNVLLSLQNDNGGFASYELTRSYPWLELINPAETFGDIVIDYPYVECTSATMEALTLF KKLHPGHRTKEIDTAIVRAANFLENMQRTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCLA IRKACDFLLSKELPGGGWGESYLSCQNKVYTNLEGNRPHLVNTAWVLMALIEAGQAERDPTPLH RAARLLINSQLENGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYCHRVLTE glycoside MNKSVAPLLLAASILYGGAAAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT 447 hydrolasefamily5 TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS protein NNYPDGIGQMQHFVNDDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV [Trichoderma DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV reeseiQM6a] TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH GenBank:EGR512.1 AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK glycoside MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG 448 hydrolasefamily FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVAECSGSCTTVN 61protein[T. 
KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM reeseiQM6a] QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG EGR50392.1 GI:340520155 glycoside SAHTTFTTLFIDKKNQGDGTCVRMPYDDKTATNPVKPITSSDMACGRNGGDPVPFICSAKKGSL 449 hydrolasefamily LTFEFRLWPDAQQPGSIDPGHLGPCAVYLKKVDNMFSDSAAGGGWFKIWEDGYDSKTQKWCVDR 61protein, LVKNNGLLSVRLPRGLPAGYYIVRPEILALHWAAHRDDPQFYLGCAQIFVDSDVRGPLEIPRRQ partial QATIPGYVNAKTPGLTFDIYQDKLPPYPMPGPKVYIPPAKGNKPNQDLNAGRLVQTDGLIPKDC [Trichoderma LIKKANWCGRPVEPYSSARMCWRAVNDCYAQSKKCRESSPPIGLTNCDRWSDHCGKMDALCEQE reeseiQM6a] KYKGPP EGR49821.1 GI:340519583 glycoside MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH 450 hydrolasefamily7 DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS protein[T.reesei PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY QM6a] CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK EGR48251.1 SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA GI:340518009 YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC QYSNDYYSQCL glycoside MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT 451 hydrolasefamily AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG 45protein[T. 
YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS reeseiQM6a] PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP EGR47058.1 GI:340516811 glycoside MRATSLLAAALAVAGDALAGKIKYLGVAIPGIDFGCDIDGSCPTDTSSVPLLSYKGGDGAGQMK 452 hydrolasefamily5 HFAEDDGLNVFRISATWQFVLNNTVDGKLDELNWGSYNKVVNACLETGAYCMIDMHNFARYNGG 453 protein[T.reesei IIGQGGVSDDIFVDLWVQIAKYYEDNDKIIFGLMNEPHDLDIEIWAQTCQKVVTAIRKAGATSQ QM6a] MILLPGTNFASVETYVSTGSAEALGKITNPDGSTDLLYFDVHKYLDINNSGSHAECTTDNVDAF EGR44174.1 NDFADWLRQNKRQAIISETGASMEPSCMTAFCAQNKAISENSDVYIGFVGWGAGSFDTSYILTL GI:340513898 TPLGKPGNYTDNKLMNECILDQFTLDEKYRPTPTSISTAAEETATATATSDGDAPSTTKPIFRE ETASPTPNAVTKPSPDTSDSSDDDKDSAASMSAQGLTGTVLFTVAALGYMLVAF glycoside MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT 454 hydrolasefamily AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG 45protein[T. YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS reeseiQM6a] PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP XP_006967072.1 GI:589110099 glycoside MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH 455 hydrolasefamily7 DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS protein[T.reesei PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY QM6a] CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK XP_006965674.1 SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA GI:589107303 YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC QYSNDYYSQCL glycoside SAHTTFTTLFIDKKNQGDGTCVRMPYDDKTATNPVKPITSSDMACGRNGGDPVPFICSAKKGSL 456 hydrolasefamily LTFEFRLWPDAQQPGSIDPGHLGPCAVYLKKVDNMFSDSAAGGGWFKIWEDGYDSKTQKWCVDR 61protein, LVKNNGLLSVRLPRGLPAGYYIVRPEILALHWAAHRDDPQFYLGCAQIFVDSDVRGPLEIPRRQ partial[T.reesei QATIPGYVNAKTPGLTFDIYQDKLPPYPMPGPKVYIPPAKGNKPNQDLNAGRLVQTDGLIPKDC QM6a] LIKKANWCGRPVEPYSSARMCWRAVNDCYAQSKKCRESSPPIGLTNCDRWSDHCGKMDALCEQE XP_006964038.1 KYKGPP GI:589104031 glycoside 
MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG 457 hydrolasefamily FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVAECSGSCTTVN 61protein[T. KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM reeseiQM6a] QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG XP_006963879.1 GI:589103713 glycoside MNKSVAPLLLAASILYGGAAAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT 458 hydrolasefamily5 TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS protein[T.reesei NNYPDGIGQMQHFVNDDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV QM6a] DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV XP_006962583.1 TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH GI:589101121 AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK glycoside MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPPIVVGWTAADLDNGF 459 hydrolasefamily VSPDAYQNPDIICHKNATNAKGHASVKAGDTILFQWVPVPWPHPGPIVDYLANCNGDCETVDKT 61protein[T. TLEFFKIDGVGLLSGGDPGTWASDVLISNNNTWVVKIPDNLAPGNYVLRHEIIALHSAGQANGA reeseiQM6a] QNYPQCFNIAVSGSGSLQPSGVLGTDLYHATDPGVLINIYTSPLNYIIPGPTVVSGLPTSVAQG XP_006961567.1 SSAATATASATVPGGGSGPTSRTTTTARTTQASSRPSSTPPATTSAPAGGPTQTLYGQCGGSGY GI:589099089 SGPTRCAPPATCSTLNPYYAQCLN Endoglucanase-7; MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG 460 alsoknownas FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVVECSGSCTTVN Cellulase-61B KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM (Ce161B),Endo-1, QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG 4-beta-glucanase (EGVII); EndoglucanaseVII; Endoglucanase-61B; Q7Z9M7.3 GI:43314396 xylanase MVSFTSLLAASPPSRASCRPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPG 461 [Trichoderma GQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGTY reesei] NPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAWA CAA49293.1 QQGLTLGTMDYQIVAVEGYFSSGSASITVS GI:396564 xylanase[T. 
MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG 462 reesei] QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY CAA49294.1 IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN GI:396566 HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN beta-xylanase MVSFTSLLAGVAAISGVLAAPAAEVEPVAVEKRQTIQPGTGYNNGYFHSYWNDGHGGVTYTNGP 463 precursor GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVGNFGT [Trichoderma YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW reesei] AQQGLTLGTMDYQIVAVEGYFSSGSASITVS AAB5278.1 GI:78816 ChainA, XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 464 Structural SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI ComparisonOfTwo IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS MajorEndo-1,4- Beta-Xylanases FromTrichodrema Reesei 1XYP_AGI:112721 ChainB, XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 465 Structural SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI ComparisonOfTwo IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS MajorEndo-1,4- Beta-Xylanases FromTrichodrema Reesei 1XYP_BGI:1127211 ChainA, XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 466 Structural SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI ComparisonOfTwo IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS MajorEndo-1,4- Beta-Xylanases FromTrichodrema Reesei 1XYO_AGI:1127212 ChainB, XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 467 Structural SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI ComparisonOfTwo IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS MajorEndo-1,4- Beta-Xylanases FromTrichodrema Reesei 1XYO_BGI:1127213 ChainA, XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 468 Structural SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI ComparisonOfTwo IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS 
Major Endo-1,4-Beta-Xylanases From Trichoderma Reesei 1ENX_A GI:1127272
SEQ ID NO: 469. Chain B, Structural Comparison Of Two Major Endo-1,4-Beta-Xylanases From Trichoderma Reesei; 1ENX_B; GI:1127273
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 470. Chain A, Endo-1,4-beta-xylanase II Complex With 4,5-epoxypentyl-beta-D-xyloside; 1RED_A; GI:1942592
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 471. Chain B, Endo-1,4-beta-xylanase II Complex With 4,5-epoxypentyl-beta-D-xyloside; 1RED_B; GI:1942593
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 472. Chain A, Endo-1,4-Beta-Xylanase II Complex With 3,4-Epoxybutyl-Beta-D-Xyloside; 1REE_A; GI:1942594
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 473. Chain B, Endo-1,4-Beta-Xylanase II Complex With 3,4-Epoxybutyl-Beta-D-Xyloside; 1REE_B; GI:1942595
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 474. Chain A, Endo-1,4-Beta-Xylanase II Complex With 2,3-Epoxypropyl-Beta-D-Xyloside; 1REF_A; GI:1942596
XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
ChainB,Endo-1,4- XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 475 Beta-XylanaseIi SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI
ComplexWith2,3- IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS Epoxypropyl-Beta- D-Xyloside 1REF_BGI:1942597 xylanaseIII[T. reesei] BAA89465.2 GI:7328936 xylanase,partial SYLSVYGWXTDPLIEYYIVESYGDYNPGSGGTYKGTVTSDGSVYDIYTATRTNAASIQGTATFT 476 [T.reesei] QYWSVRR AAG01167.1 GI:9858850 Transcription MDLRQACDRCHDKKLRCPRISGSPCCSRCAKANVACVFSPPSRPFRPHEPLNHSHEHSHSHSHN 477 factorACEII[T. HNGVGVSFDWLDLMSLEQQQEQQQGQPQHPPPPVQTLSERLAALLCALDRMLQAVPSSLDMHHV reesei] SRQQLREYADTVGTGFDLQSTLDSLLHHAQDLASLYSEAVPASFNKRTTAAEADALCAVPDCVH AAK69383.1 QDRTSLHTTPLPKLDHALLNLVMACHIRLLDVMDTLAEHGRMCAFMVATLPPDYDPKFAVPEIR GI:14581734 VGTFVAPTDTAASMLLSVVVELQTVLVARVKDLVAMVDQVKDDARAAREAKVVRLQCGILLERA ESTLGEWSRFKDGLVSARLLK xylanaseregulator MLSNPLRRYSAYPDISSASFDPNYHGSQSHLHSINVNTFGNSHPYPMQHLAQHAELSSSRMIRA 478 1,partial SPVQPKQRQGSLIAARKNSTGTAGPIRRRISRACDQCNQLRTKCDGLHPCAHCIEFGLGCEYVR [Trichoderma ERKKRGKASRKDIAAQQAAAAAAQHSGQVQDGPEDQHRKLSRQQSESSRGSAELAQPAHDPPHG reesei] HIEGSVSSFSDNGLSQHAAMGGMDGLEDHHGHVGVDPALGRTQLEASSAMGLGAYGEVHPGYES AAO33577.1 PGMNGHVMVPPSYGAQTTMAGYSGISYAAQAPSPATYSSDGNFRLTGHIHDYPLANGSSPSWGV GI:28194501 SLASPSLRYHVLRPVLLDVRNIYPVSLACDQMDMYFSSSSSAQMRPMSPYVEGFVFRKRSFLHP TDPRRCQPALLASMLWVAAQTSEASFLTSLPSARSKVCQKLLELTVGLLQPLIHTGTNSPSPKT SPVVGAAALGVLGVAMPGSLNMDSLAGETGAFGAIGSLDDVIAYVRLATVVSASEYKGASLRWW GAAWSLARELKLGRELPPGNPPANQEDGEGLSEDVDEHDLNRNNTRLGRKRSAKSDAITEEERE ERRRAWWLVYIVDRHLALCYNRPLFLLDSECSDLYHPMDDIKWQAGKFRSHDAGNSSINIDSSM TDEFGDSPRAARGAHYECRGRSIFGYFLSLMTILGEIVDVHHAKSHPRFGVGFRSARDWDEQVA EITRHLDMYEESLKRFVAKHLPLSSKDKEQHEMHDSGAVTDMQSPLSVRTNASSRMTESEIQAS IVVAYSTHVMHVLHILLADKWDPINLLDDDDLWISSEGFVTATSHAVSAAEAISQILEFDPGLE FMPFFFGIYLLQGSFLLLLIADKLQAEASPSVIKACETIVRAHEACVVTLSTEYQRNFSKVMRS ALALIRGRVPEDLAEQQQRRRELLALYRWTGNGTGLAL Transcription MDLRQACDRCHDKKLRCPRISGSPCCSRCAKANVACVFSPPSRPFRPHEPLNHSHEHSHSHSHN 479 factorACEII HNGVGVSFDWLDLMSLEQQQEQQQGQPQHPPPPVQTLSERLAALLCALDRMLQAVPSSLDMHHV protein SRQQLREYADTVGTGFDLQSTLDSLLHHAQDLASLYSEAVPASFNKRTTAAEADALCAVPDCVH Q96WN6.1 
QDRTSLHTTPLPKLDHALLNLVMACHIRLLDVMDTLAEHGRMCAFMVATLPPDYDPKFAVPEIR GI:50400614 VGTFVAPTDTAASMLLSVVVELQTVLVARVKDLVAMVDQVKDDARAAREAKVVRLQCGILLERA ESTLGEWSRFKDGLVSARLLK ChainA,Structure TIQPGTGXNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS 480 OfVi1-Xylanase YNPNGNSXLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII 2D97_A GTATFXQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGXFSSGSASITVS GI:112490431 ChainA,Structure TIQPGTGXNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS 481 OfVi1(ExtraKiI2 YNPNGNSXLSVYGWSRNPLIEYYIVENFGTXNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII ADDED)-Xylanase GTATFXQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDXQIVAVEGXFSSGSASITVS 2D98_A, GI:112490433 ChainA,Xylanase XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 482 IiFromTricoderma SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI ReeseiAt100k IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS 2DFB_A GI:112490475 ChainA,Xylanase XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 483 IiFromT.Reesei SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI At293k IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS 190aaprotein 2DFC_A GI:112490477 TPA_inf:chitinase MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL 484 18-12[T.reesei] NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE DAA05860.1 AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN GI:126032263 KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV DALK TPA_inf:chitinase MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL 485 18-12[T.reesei] NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE DAA05860.1 AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN GI:126032263 KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV DALK 
TPA_inf:chitinase MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR 486 18-13[T.reesei] FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT DAA5861.1 TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL GI:126032265 FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN DNDCSDPYSCKNGVCSN TPA_inf:chitinase MFFTKAVGGLGLLASLASSAPNPIARRQAPGAQNVVYWGQNGGGTVENNDLSAYCTPTSGIDII 487 18-14[T.reesei] VLSFLYQWGQGSSALGGTIGQSCGITTSGEPQNCDALTAAITKCKTAGVKIILSLGGASAFSSF DAA05862.1 QTADQAAQAGQYLWNAYGGGSGVTRPLGNNVMDGFDLDIESNPGTNENYAALVSALRSNFASDP GI:126032267 SRQYVISGAPQCPLPEPNMGVIIQNAQFDYLWVQFYNNNEYPGDPCSLGLPGDAPFNFNNWTTF IQSTPSKDAKVFVGVPAAPLAANGAPSGEVYYATPSQLADIVNDVKSNPAFGGIMMWSAGFSDT NVNDGCNYAQEAKNILLTGSPCSSGPVSVSRPPVSSPTITSSPPGTSPAPPSQTGSVPQWGQCG GNGYTGPTQCVAPFKCVATSEWWSQCE TPA_inf:chitinase MLSRTLLTALGLTTIAAAAPSQTVKTRQAPGGQNAVYWGATNNENDNLSTYCTASSGIDIVILS 488 18-16[T.reesei] FLDIYGATGNFPSGNMGNSCYVGTNGVPQLCDDLASSIATCQAAGIKVIISLGGAASSYSLQSQ DAA05864.1 SQAVAIGQYLWNAYGNSGNTTVQRPFGNVFVNGFDFDIELNAGSQYYQYLISTLRSNFANDPKN GI:126032269 TYYITGAPQCPIPEPNMGEIISTSQFDYLWVQFYNNNPVCSLGLPGDAPFNFNDWVSFISTTPS KNAKLFVGAPASTLGANGNAGGAKYYATPEQLAGIVNSVKSSPFFGGIMLWDAGYSDSNVNNGC NYAQEAKNILLTGTACGGESSPPPSTTTTAVPPPASSTPSNPSGGSVPQWGQCGGDGYTGPTQC VAPYKCVATSEWWSSCQ TPA_inf:chitinase MVSASAGLAAVGLLNGYWGQYTTTEGLRPHCDSGVDSITLGFVNGAPDASGYPSLNFGPNCWAE 489 18-18[T.reesei] SYPGNLGLPSKLLSHCMSLQSDIPYCRSKGVKVILSIGGVYNALTSNYFVGDNGTATDFATFLY DAA05866.1 NAFGPYNASYTGPRPFDDITTGLPTSVDGFDFDIEADFPNGPYIKMIETFRSLDSSMLITGAPQ GI:0126032275 CPTNPQYFVMKDMIQQAAFDKLFIQFYNNPVCDAIPGNTAGDKFNYDDWEAVIAGSAKSKSAKL YIGLPAIQEPNESGYIDPIAMKNLVCQYKDRPHFGGLSLWDLSRGLVNNINGTSFNQWALDALQ YGCNPIPTTTTTTSTVSSTTAASSTTASSTTASTTKASSTSKASSTSKASSTSKASSTSKASST SKASSTSKASSTSKASTTSKASTTSKVSTTSKASSTSKASSSTKASTTSKASSTSKASTTSKAS TTSKASTTSKASTTSKASTTSKASSTSKASSSTKASTTSKASSTSKASTTSKASTTSKASTTSK 
ASTTSKASTTSKASSTSKASSSTKASTTSKASSTSKASTTSKASTTSKASTTSKASTTSKASTT SKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKVSTTSKASTTSKASTTSKAS STSKVSTTSKASTTSKVSAKATTSTKASTTVKPSTTSKASTTSKASTTSKASTTSKASTTSKAS TTSKASTTSKASTTSKAATTSVKPTSKTSTSSKPNVSASSSNVGRDATSLVEASTSTSAAVLYP TTTSRWSNSTITRSSSLTTPIVSDPASLTTSVVYTTSVHTVTKCPAYVTDCPAGGYVTTETIPL YTTVCPISEATQTAAPTVTTEAPQPWTTSTVYTTRVYTITSCAPGVVDCPANQVTTETIPWYTT VCPVTATATPVGPGSVVFPQNTEVGQPSLVGPVVEAAYPTASSSLQTLVKPATSVGVPQGSPAG SSVAPGSSSKPTAPAGPPSYPTGGSGNASPSGSWSGVPVGPSSVPGIPEANAASVMSASLFGLV IVMAAQVFVL ChainA, ASINYDQNYQTGGQVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLS 490 Structural VYGWSTNPLVEYYIMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVR ComparisonOfTwo NSPRTSGTVTVQNHFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN MajorEndo-1,4- Beta-Xylanases FromT.Reesei 1XYN_A GI:157834272 xylanase,partial QTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 491 [Trichoderma SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI reesei] IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS ACB38137.1 GI:170786291 ChainA,Xylanase XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 492 IiFromT.Reesei SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI Cocrystallized IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS WithTrIs- Dipicolinate Europium 3LGR_A GI:319443539 ChainA,Crystal TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS 493 StructuresOf YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII MutantEndo-1,4- GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVQGYFSSGSASITVS xylanaseIi ComplexedWith Substrate(1.15A) AndProducts (1.6A) 4HK8_A GI:572153255 ChainA,Crystal IQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGHFVGGKGWQPGTKNKVINFSGSY 494 StructuresOf NPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIG MutantEndo-beta- TATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS 1,4-xylanaseIi ComplexedWith Substrate(1.15A) AndProducts (1.6A) 
4HK9_A GI:572153256 ChainA,Crystal TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGHFVGGKGWQPGTKNKVINFSGS 495 StructuresOf YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII MutantEndo-beta- GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS 1,4-xylanaseIi ComplexedWith Substrate(1.15A) AndProducts (1.6A) 4HKL_A GI:572153257 ChainA,Crystal TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINNPLI 496 StructuresOf EYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGS MutantEndo-beta- VNTANHFNAWAQQGLTLGTMDYQIVAVQGYFSSGSASITVS 1,4-xylanaseIi (e177p)InApo Form 4HKO_A GI:572153258 ChainA,Crystal XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 497 StructuresOf SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI MutantEndo-beta- IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS 1,4-xylanaseIi ComplexedWith SubstrateAnd Products 4HKW_A GI:572153259 ChainA,JointX- XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 498 ray/neutron SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI StructureOf IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS TrichodermaReesei XylanaseIiIn ComplexWithMes AtPh5.7 4S2D_A GI:929984639 ChainA,JointX- XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 499 ray/neutron SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI StructureOfT. IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS ReeseiXylanaseiIi AtPhi4.4 4S2F_A GI:929984640 ChainA,JointX- XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 500 ray/neutron SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI StructureOfT. IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS ReeseiXylanaseIi AtPh5.8 4S2G_A GI:929984641 ChainA,JointX- XTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 501 ray/neutron SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI StructureOfT. 
IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS ReeseiXylanaseIi AtPh8.5 4S2H_A GI:929984642
SEQ ID NO: 502. Chain A, X-ray Structure Analysis Of Xylanase-N44d; 4XQ4_A; GI:929984784
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGDFVGGKGWQPGTKNKVINFSGS YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 503. Chain B, X-ray Structure Analysis Of Xylanase-N44d; 4XQ4_B; GI:929984785
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGDFVGGKGWQPGTKNKVINFSGS YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 504. Chain A, X-ray Structure Analysis Of Xylanase-wt At pH 4.0; 4XQD_A; GI:929984786
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 505. Chain B, X-ray Structure Analysis Of Xylanase-wt At pH 4.0; 4XQD_B; GI:929984787
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGS YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 506. Chain A, X-ray Structure Analysis Of Xylanase-n44e With Mes At pH 6.0; 4XQW_A; GI:929984788
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGEFVGGKGWQPGTKNKVINFSGS YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 507. Chain A, Neutron And X-ray Structure Analysis Of Xylanase: N44d At pH 6; 4XPV_A; GI:931139811
TIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGDFVGGKGWQPGTKNKVINFSGS YNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSII GTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS
SEQ ID NO: 508. truncated xylanase 5 [T. reesei]; ANW825841; GI:1048222282
MVSFSSLVVALVGIASSWALWRNPSTQT
SEQ ID NO: 509. truncated xylanase 5 [T. reesei]; ANW825851; GI:1048222284
MVSFSSLVVALVGIASSWALWRNPSTQT
xylanase[T.
MVSFSSLVVALVGIASSWAAPLEESPNANITERGPSNFVLGGHNAVRRAAINYNQDYTTGGDVV 510 reesei] YTHSNTGFAVNWSYPNDFVVGVGWNPGGSAPINFSGNFGVGSGVGLLSVYGWSTNPLVEYYVVE ANX99792.1 DNFGFSSGGTVKGSVTSDGSSYTIWENTRVNEPSIVGTATFNQYISIRNSKRSSGTVTVANHFN GI:1049178838 AWKSLGMNLGTMNYQVIAVEGWGGQGGVQQTVSN xylanase[T. MVSFSSLVVALVGIASSLAAPLEESLNANITERGPNNFVLGGHNAVRRAAINYNQDYTTGGDVV 511 reesei] YTHSNTGFAVNWSYPNDFVVGVGWNPGGSAPINFSGNFGVGSGVGLLSVYGWSTNPLVEYYVVE ANX99793.1 DNFGFSSGGTVKGSVTSDGSSYTIWENTRVNEPSIVGTATFNQYISIRNSKRSSGTVTIANHFN GI:1049178840 AWKSLGMNLGTLNYQVIAVEGWGGQGGVQQTVSN xylanase[T. MVSFSSLVVALVGIASSLAAPLEESLNANITERGPNNFVLGGHNAVRRAAINYNQDYTTGGDVV 512 reesei] YTHSNTGFAVNWSYPNDFVVGVGWNPGGSAPINFSGNFGVGSGVGLLSVYGWSTNPLVEYYVVE ANX99794.1 DNFGFSSGGTVKGSVTSDGSSYTIWENTRVNEPSIVGTATFNQYISIRNSKRSSGTVTIANHFN GI:1049178842 AWKSLGMNLGTLNYQVIAVEGWGGQGGVQQTVSN xylanase[T. MVSFSSLVVALVGIASSLAAPLEESLNANITERGPNNFVLGGHNAVRRAAINYNQDYTTGGDVV 513 reesei] YTHSNTGFAVNWSYPNDFVVGVGWNPGGSAPINFSGNFGVGSGVGLLSVYGWSTNPLVEYYVVE ANX99795.1 DNFGFSSGGTVKGSVTSDGSSYTIWENTRVNEPSIVGTATFNQYISIRNSKRSSGTVTIANHFN GI:1049178844 AWKSLGMNLGTLNYQVIAVEGWGGQGGVQQTVSN xylanase2, QTIQPGTGYNNGYCYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWCPGTKNKVINFSG 514 partial[T. SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI reesei] IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS APU51339.1 GI:1130479396 xylanase2, QTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 515 partial[T. 
SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI reesei] IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS APU51340.1 GI:1130479398 ChainA,Microed QTIQPGTGYNNGYFYSYWNDGHGGVTYTNGPGGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSG 516 StructureOf SYNPNGNSYLSVYGWSRNPLIEYYIVENFGTYNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSI XylanaseAt2.3A IGTATFYQYWSVRRNHRSSGSVNTANHFNAWAQQGLTLGTMDYQIVAVEGYFSSGSASITVS Resolution 5K7P_A GI:1175128641 glycoside MFFTKAVGGLGLLASLASSAPNPIARRQAPGAQNVVYWGQNGGGTVENNDLSAYCTPTSGIDII 517 hydrolasefamily VLSFLYQWGQGSSALGGTIGQSCGITTSGEPQNCDALTAAITKCKTAGVKIILSLGGASAFSSF 18protein, QTADQAAQAGQYLWNAYGGGSGVTRPLGNNVMDGFDLDIESNPGTNENYAALVSALRSNFASDP chitinase[T. SRQYVISGAPQCPLPEPNMGVIIQNAQFDYLWVQFYNNNEYPGDPCSLGLPGDAPFNFNNWTTF reesei]QM6a IQSTPSKDAKVFVGVPAAPLAANGAPSGEVYYATPSQLADIVNDVKSNPAFGGIMMWSAGFSDT EGR44650.1 NVNDGCNYAQEAKNILLTGSPCSSGPVSVSRPPVSSPTITSSPPGTSPAPPSQTGSVPQWGQCG GI:340514387 GNGYTGPTQCVAPFKCVATSEWWSQCE glycoside MKSSISVVLALLGHSAAWSYATKSQYRANIKINARQTYQTMIGGGCSGAFGIACQQFGSSGLSP 518 hydrolasefamily5 ENQQKVTQILFDENIGGLSIVRNDIGSSPGTTILPTCPATPQDKFDYVWDGSDNCQFNLTKTAL protein[T.reesei] KYNPNLYVYADAWSAPGCMKTVGTENLGGQICGVRGTDCKHDWRQAYADYLVQYVRFYKEEGID QM6a ISLLGAWNEPDFNPFTYESMLSDGYQAKDFLEVLYPTLKKAFPKVDVSCCDATGARQERNILYE EGR44819.1 LQQAGGERYFDIATWHNYQSNPERPFNAGGKPNIQTEWADGTGPWNSTWDYSGQLAEGLQWALY GI:340514558 MHNAFVNSDTSGYTHWWCAQNTNGDNALIRLDRDSYEVSARLWAFAQYFRFARPGSVRIGATSD VENVYVTAYVNKNGTVAIPVINAAHFPYDLTIDLEGIKKRKLSEYLTDNSHNVTLQSRYKVSGS SLKVTVEPRAMKTFWLEPQSTFAVI glycoside MVSFTSLLAGVAAISGVLAAPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGP 519 hydrolasefamily GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGT 11[T.reesei] YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW QM6a AQQGLTLGTMDYQIVAVEGYFSSGSASITVS EGR45030.1 GI:340514771 glycoside MLSRTLLTALGLTTIAAAAPSQTVKTRQAPGGQNAVYWGATNNENDNLSTYCTASSGIDIVILS 520 hydrolasefamily FLDIYGATGNFPSGNMGNSCYVGTNGVPQLCDDLASSIATCQAAGIKVIISLGGAASSYSLQSQ 18,chitinase[T. 
SQAVAIGQYLWNAYGNSGNTTVQRPFGNVFVNGFDFDIELNAGSQYYQYLISTLRSNFANDPKN reeseiQM6a] TYYITGAPQCPIPEPNMGEIISTSQFDYLWVQFYNNNPVCSLGLPGDAPFNFNDWVSFISTTPS EGR45486.1 KNAKLFVGAPASTLGANGNAGGAKYYATPEQLAGIVNSVKSSPFFGGIMLWDAGYSDSNVNNGC GI:340515230 NYAQEAKNILLTGTACGGESSPPPSTTTTAVPPPASSTPSNPSGGSVPQWGQCGGDGYTGPTQC VAPYKCVATSEWWSSCQ xylanaseregulator MLSNPLRRYSAYPDISSASFDPNYHGSQSHLHSINVNTFGNSHPYPMQHLAQHAELSSSRMIRA 521 1[T.reeseiQM6a] SPVQPKQRQGSLIAARKNSTGTAGPIRRRISRACDQCNQLRTKCDGLHPCAHCIEFGLGCEYVR EGR48040.1 ERKKRGKASRKDIAAQQAAAAAAQHSGQVQDGPEDQHRKLSRQQSESSRGSAELAQPAHDPPHG GI:340517797 HIEGSVSSFSDNGLSQHAAMGGMDGLEDHHGHVGVDPALGRTQLEASSAMGLGAYGEHPGYESP GMNGHVMVPPSYGAQTTMAGYSGISYAAQAPSPATYSSDGNFRLTGHIHDYPLANGSSPSWGQS DLRYPVLEPLLPHLGNILPVSLACDLIDLYFSSSSSAQMHPMSPYVLGFVFRKRSFLHPTNPRR CQPALLASMLWVAAQTSEASFLTSLPSARSKVCQKLLELTVGLLQPLIHTGTNSPSPKTSPVVG AAALGVLGVAMPGSLNMDSLAGETGAFGAIGSLDDVITYVHLATVVSASEYKGASLRWWGAAWS LARELKLGRELPPGNPPANQEDGEGLSEDVDEHDLNRNNTRFVTEEEREERRRAWWLVYIVDRH LALCYNRPLFLLDSECSDLYHPMDDIKWQAGKFRSHDAGNSSINIDSSMTDEFGDSPRAARGAH YECRGRSIFGYFLSLMTILGEIVDVHHAKSHPRFGVGFRSARDWDEQVAEITRHLDMYEESLKR FVAKHLPLSSKDKEQHEMHDSGAVTDMQSPLSVRTNASSRMTESEIQASIVVAYSTHVMHVLHI LLADKWDPINLLDDDDLWISSEGFVTATSHAVSAAEAISQILEFDPGLEFMPFFYGVYLLQGSF LLLLIADKLQAEASPSVIKACETIVRAHEACVVTLSTEYQRNFSKVMRSALALIRGRVPEDLAE QQQRRRELLALYRWTGNGTGLAL predictedprotein, LRILPVGDSITYGFLSDQDGGDGNGYRLQLRQHLSKDRVVFAGTETSGNMTDGYYLIVSSSLHR 522 partial[T.reesei QAAWNGKTIQYISDHVTPSLEQRPNIILLHAGTNDMNPNGAISREGHDPVAASERLGSLVDKMT QM6a] TLCPDAVILVAMIIGTCNDEQAPQTKVFQSLIPNVVAPRLESGKHVLAVDFSTFPLDKLRDCIH EGR49987.1 PTNEGYHLLGYYWYDFIAQIPRDWITAPVGEDPQRPEEQNLAMRLETDL GI:340519749 Transcription MSFSNPRRRTPVTRPGTDCEHGLSLKTTMTLRKGATFHSPTSPSASSAAGDFVPPTLTRSQSAF 523 factor[T.reesei DDVVDASRRRIAMTLNDIDEALSKASLSDKSPRPKPLRDTSLPVPRGFLEPPVVDPAMNKQEPE QM6a] RRVLRPRSVRRTRNHASDSGIGSSVVSTNDKAGAADSTKKPQASALTRSAASSTTAMLPSLSHR EGR51484.1 AVNRIREHTLRPLLEKPTLKEFEPIVLDVPRRIRSKEIICLRDLEKTLIFMAPEKAKSAALYLD GI:340521249 
FCLTSVRCIQATVEYLTDREQVRPGDRPYTNGYFIDLKEQIYQYGKQLAAIKEKGSLADDMDID PSDEVRLYGGVAENGRPAELIRVKKDGTAYSMATGKIVDMTESPTPLKRSLSEQREDEEEIMRS MARRKKNATPEELAPKKCREPGCTKEFKRPCDLTKHEKTHSRPWKCPIPTCKYHEYGWPTEKEM DRHINDKHSDAPAMYECLFKPCPYKSKRESNCKQHMEKAHGWTYVRTKTNGKKAPSQNGSTAQQ TPPLANVSTPSSTPSYSVPTPPQDQVMSTDFPMYPADDDWLATYGAQPNTIDAMDLGLENLSPA SAASSYEQYPPYQNGSTFIINDEDIYAAHVQIPAQLPTPEQVYTKMMPQQMPVYHVQQEPCTTV PILGEPQFSPNAQQNAVLYTPTSLREVDEGFDESYAADGADFQLFPATVDKTDVFQSLFTDMPS ANLGFSQTTQPDIFNQIDWSNLDYQGFQE Glycoside MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG 524 hydrolasefamily TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH 10protein[T. TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL reeseiQM6a] LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS EGR52056.1 GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW GI:340521822 RASTNPLLFDANFNPKPAYNSIVGILQ Glycoside MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR 525 hydrolasefamily FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT 18protein, LSSDAQGQQLADTVWNLFLGGTANPTLRPFGSVKLDGVDLDNETGNPTGYLAMAQRFKSNFAKD chitinase[T. TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL reeseiQM6a] FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG EGR52465.1 TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN GI:340522232 DNDCSDPYSCKNGVCSN Glycoside MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL 526 hydrolasefamily NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE 18proetin[T. AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN reeseiQM6a] KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK EGR52759.1 TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV GI:340522526 DALK Glycoside MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG 527 hydrolasefamily QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY 11protein[T. 
IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN reeseiQM6a] HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN EGR52985.1 GI:340522752 Glycoside MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL 528 hydrolasefamily NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE 18protein[T. AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN reeseiQM6a] KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK XP_006961069.1 TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV GI:589098093 DALK Glycoside MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR 529 hydrolasefamily FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT 18protein, LSSDAQGQQLADTVWNLFLGGTANPTLRPFGSVKLDGVDLDNETGNPTGYLAMAQRFKSNFAKD chitinase[T. TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL reeseiQM6a] FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG XP_006961376.1 TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN GI:589098707 DNDCSDPYSCKNGVCSN Glycoside MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG 530 hydrolasefamily QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY 11protein[T. IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN reeseiQM6a] HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN XP_006961811.1 GI:589099577 Glycoside MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG 531 hydrolasefamily TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH 10[T.reesei TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL QM6a] LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS XP_006962419.1 GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW GI:589100793 RASTNPLLFDANFNPKPAYNSIVGILQ Transcription MSFSNPRRRTPVTRPGTDCEHGLSLKTTMTLRKGATFHSPTSPSASSAAGDFVPPTLTRSQSAF 532 factorprotein[T. 
DDVVDASRRRIAMTLNDIDEALSKASLSDKSPRPKPLRDTSLPVPRGFLEPPVVDPAMNKQEPE reeseiQM6a] RRVLRPRSVRRTRNHASDSGIGSSVVSTNDKAGAADSTKKPQASALTRSAASSTTAMLPSLSHR XP_006962963.1 AVNRIREHTLRPLLEKPTLKEFEPIVLDVPRRIRSKEIICLRDLEKTLIFMAPEKAKSAALYLD GI:589101881 FCLTSVRCIQATVEYLTDREQVRPGDRPYTNGYFIDLKEQIYQYGKQLAAIKEKGSLADDMDID PSDEVRLYGGVAENGRPAELIRVKKDGTAYSMATGKIVDMTESPTPLKRSLSEQREDEEEIMRS MARRKKNATPEELAPKKCREPGCTKEFKRPCDLTKHEKTHSRPWKCPIPTCKYHEYGWPTEKEM DRHINDKHSDAPAMYECLFKPCPYKSKRESNCKQHMEKAHGWTYVRTKTNGKKAPSQNGSTAQQ TPPLANVSTPSSTPSYSVPTPPQDQVMSTDFPMYPADDDWLATYGAQPNTIDAMDLGLENLSPA SAASSYEQYPPYQNGSTFIINDEDIYAAHVQIPAQLPTPEQVYTKMMPQQMPVYHVQQEPCTTV PILGEPQFSPNAQQNAVLYTPTSLREVDEGFDESYAADGADFQLFPATVDKTDVFQSLFTDMPS ANLGFSQTTQPDIFNQIDWSNLDYQGFQE Predictedprotein, LRILPVGDSITYGFLSDQDGGDGNGYRLQLRQHLSKDRVVFAGTETSGNMTDGYYLIVSSSLHR 533 partial[T.reesei QAAWNGKTIQYISDHVTPSLEQRPNIILLHAGTNDMNPNGAISREGHDPVAASERLGSLVDKMT QM6a] TLCPDAVILVAMIIGTCNDEQAPQTKVFQSLIPNVVAPRLESGKHVLAVDFSTFPLDKLRDCIH XP_006964048.1 PTNEGYHLLGYYWYDFIAQIPRDWITAPVGEDPQRPEEQNLAMRLETDL GI:589104051 Xylanaseregulator MLSNPLRRYSAYPDISSASFDPNYHGSQSHLHSINVNTFGNSHPYPMQHLAQHAELSSSRMIRA 534 1protein[T. 
SPVQPKQRQGSLIAARKNSTGTAGPIRRRISRACDQCNQLRTKCDGLHPCAHCIEFGLGCEYVR reeseiQM6a] ERKKRGKASRKDIAAQQAAAAAAQHSGQVQDGPEDQHRKLSRQQSESSRGSAELAQPAHDPPHG XP_006966092.1 HIEGSVSSFSDNGLSQHAAMGGMDGLEDHHGHVGVDPALGRTQLEASSAMGLGAYGEVHPGYES GI:589108139 PGMNGHVMVPPSYGAQTTMAGYSGISYAAQAPSPATYSSDGNFRLTGHIHDYPLANGSSPSWGQ SDLRYPVLEPLLPHLGNILPVSLACDLIDLYFSSSSSAQMHPMSPYVLGFVFRKRSFLHPTNPR RCQPALLASMLWVAAQTSEASFLTSLPSARSKVCQKLLELTVGLLQPLIHTGTNSPSPKTSPVV GAAALGVLGVAMPGSLNMDSLAGETGAFGAIGSLDDVITYVHLATVVSASEYKGASLRWWGAAW SLARELKLGRELPPGNPPANQEDGEGLSEDVDEHDLNRNNTRFVTEEEREERRRAWWLVYIVDR HLALCYNRPLFLLDSECSDLYHPMDDIKWQAGKFRSHDAGNSSINIDSSMTDEFGDSPRAARGA HYECRGRSIFGYFLSLMTILGEIVDVHHAKSHPRFGVGFRSARDWDEQVAEITRHLDMYEESLK RFVAKHLPLSSKDKEQHEMHDSGAVTDMQSPLSVRTNASSRMTESEIQASIVVAYSTHVMHVLH ILLADKWDPINLLDDDDLWISSEGFVTATSHAVSAAEAISQILEFDPGLEFMPFFYGVYLLQGS FLLLLIADKLQAEASPSVIKACETIVRAHEACVVTLSTEYQRNFSKVMRSALALIRGRVPEDLA EQQQRRRELLALYRWTGNGTGLAL Glycoside MLSRTLLTALGLTTIAAAAPSQTVKTRQAPGGQNAVYWGATNNENDNLSTYCTASSGIDIVILS 535 hydrolasefamily FLDIYGATGNFPSGNMGNSCYVGTNGVPQLCDDLASSIATCQAAGIKVIISLGGAASSYSLQSQ 18protein, SQAVAIGQYLWNAYGNSGNTTVQRPFGNVFVNGFDFDIELNAGSQYYQYLISTLRSNFANDPKN chitinase[T. 
TYYITGAPQCPIPEPNMGEIISTSQFDYLWVQFYNNNPVCSLGLPGDAPFNFNDWVSFISTTPS reeseiQM6a] KNAKLFVGAPASTLGANGNAGGAKYYATPEQLAGIVNSVKSSPFFGGIMLWDAGYSDSNVNNGC XP_006968673.1 NYAQEAKNILLTGTACGGESSPPPSTTTTAVPPPASSTPSNPSGGSVPQWGQCGGDGYTGPTQC GI:589113301 VAPYKCVATSEWWSSCQ Glycoside MVSFTSLLAGVAAISGVLAAPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGP 536 hydrolasefamily GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGT 11[T.reesei QM6a] YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW XP_006968947.1 AQQGLTLGTMDYQIVAVEGYFSSGSASITVS GI:589113849 Glycoside MKSSISVVLALLGHSAAWSYATKSQYRANIKINARQTYQTMIGGGCSGAFGIACQQFGSSGLSP 537 hydrolasefamily5 ENQQKVTQILFDENIGGLSIVRNDIGSSPGTTILPTCPATPQDKFDYVWDGSDNCQFNLTKTAL protein[T.reesei KYNPNLYVYADAWSAPGCMKTVGTENLGGQICGVRGTDCKHDWRQAYADYLVQYVRFYKEEGID QM6a] ISLLGAWNEPDFNPFTYESMLSDGYQAKDFLEVLYPTLKKAFPKVDVSCCDATGARQERNILYE XP_006969226.1 LQQAGGERYFDIATWHNYQSNPERPFNAGGKPNIQTEWADGTGPWNSTWDYSGQLAEGLQWALY GI:589114407 MHNAFVNSDTSGYTHWWCAQNTNGDNALIRLDRDSYEVSARLWAFAQYFRFARPGSVRIGATSD VENVYVTAYVNKNGTVAIPVINAAHFPYDLTIDLEGIKKRKLSEYLTDNSHNVTLQSRYKVSGS SLKVTVEPRAMKTFWLEPQSTFAVI Glycoside MFFTKAVGGLGLLASLASSAPNPIARRQAPGAQNVVYWGQNGGGTVENNDLSAYCTPTSGIDII 538 hydrolasefamily VLSFLYQWGQGSSALGGTIGQSCGITTSGEPQNCDALTAAITKCKTAGVKIILSLGGASAFSSF 18protein, QTADQAAQAGQYLWNAYGGGSGVTRPLGNNVMDGFDLDIESNPGTNENYAALVSALRSNFASDP chitinase[T. SRQYVISGAPQCPLPEPNMGVIIQNAQFDYLWVQFYNNNEYPGDPCSLGLPGDAPFNFNNWTTF reeseiQM6a] IQSTPSKDAKVFVGVPAAPLAANGAPSGEVYYATPSQLADIVNDVKSNPAFGGIMMWSAGFSDT XP_0069693971. 
NVNDGCNYAQEAKNILLTGSPCSSGPVSVSRPPVSSPTITSSPPGTSPAPPSQTGSVPQWGQCG GI:589114749 GNGYTGPTQCVAPFKCVATSEWWSQCE ChainA,Crystal XASQSIDQLIKRKGKLYFGTATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWG 539 StructureOfAn DADYLVNFAQQNGKSIRGHTLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDV Endo-beta-1,4- VNEIFNEDGTLRSSVFSRLLGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVS xylanase KWISQGVPIDGIGSQSHLSGGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACL (glycoside SVSKCVGITVWGISDKDSWRASTNPLLFDANFNPKPAYNSIVGILQ HydrolaseFamily 10/gh10)Enzyme FromT.Reesei 4XVO_A GI:756143139 Endo-1,4-beta- MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG 540 xylanase1(also QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY knownasEX1; IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN Xylanase1;1,4- HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN beta-D-xylan xylanohydrolase1; Acidicendo-beta- 1,4-xylanase) GOR947.1 GI:1042851765 Endo-1,4-beta- MVSFTSLLAGVAAISGVLAAPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGP 541 xylanase2(also GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGT knownasXylanase YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW 2;1,4-beta-D- AQQGLTLGTMDYQIVAVEGYFSSGSASITVS xylan xylanohydrolase2; Alkalineendo- beta-1,4-xylanase) GRUP7.1 GI:142851766 Endo-1,4-beta- MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG 542 xylanase3(also TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH knownasXylanase TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL 3;1,4-beta-D- LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS xylan GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW xylanohydrolase3) RASTNPLLFDANFNPKPAYNSIVGILQ GORA32.1 GI:1042851767 Endo-1,4-beta- MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG 543 xylanase1(also QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY knownasEX1; IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN 
Xylanase1;1,4- HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN beta-D-xylan xylanohydrolase1; Acidicendo-beta- 1,4-xylanase) P36218.1GI:549460 Hypothetical MFFTKAVGGLGLLASLASSAPNPIARRQAPGAQNVVYWGQNGGGTVENNDLSAYCTPTSGIDII 544 protein VLSFLYQWGQGSSALGGTIGQSCGITTSGEPQNCDALTAAITKCKTAGVKIILSLGGASAFSSF M419DRAFT_104468 QTADQAAQAGQYLWNAYGGGSGVTRPLGNNVMDGFDLDIESNPGTNENYAALVSALRSNFASDP [T.reeseiRUT SRQYVISGAPQCPLPEPNMGVIIQNAQFDYLWVQFYNNNEYPGDPCSLGLPGDAPFNFNNWTTF C-30]ETR97430.1 IQSTPSKDAKVFVGVPAAPLAANGAPSGEVYYATPSQLADIVNDVKSNPAFGGIMMWSAGFSDT GI:572273844 NVNDGCNYAQEAKNILLTGSPCSSGPVSVSRPPVSSPTITSSPPGTSPAPPSQTGSVPQWGQCG GNGYTGPTQCVAPFKCVATSEWWSQCE Endobeta-1,4- MVAFSSLICALTSIASTLAMPTGLEPESSVNVTERGMYDFVLGAHNDHRRRASINYDQNYQTGG 545 xylanaseisotype2 QVSYSPSNTGFSVNWNTQDDFVVGVGWTTGSSAPINFGGSFSVNSGTGLLSVYGWSTNPLVEYY [T.reeseiRUT IMEDNHNYPAQGTVKGTVTSDGATYTIWENTRVNEPSIQGTATFNQYISVRNSPRTSGTVTVQN C-30]ETR98398.1 HFNAWASLGLHLGQMNYQVVAVEGWGGSGSASQSVSN GI:572274931 Hypothetical MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR 546 protein FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT M419DRAFT_114979 LSSDAQGQQLADTVWNLFLGGTANPTLRPFGSVKLDGVDLDNETGNPTGYLAMAQRFKSNFAKD [T.reeseiRUT TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL C-30]ETR98463.1 FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG GI:572274996 TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN DNDCSDPYSCKNGVCSN Hypothetical MLSRTLLTALGLTTIAAAAPSQTVKTRQAPGGQNAVYWGATNNENDNLSTYCTASSGIDIVILS 547 protein FLDIYGATGNFPSGNMGNSCYVGTNGVPQLCDDLASSIATCQAAGIKVIISLGGAASSYSLQSQ M419DRAFT_133349 SQAVAIGQYLWNAYGNSGNTTVQRPFGNVFVNGFDFDIELNAGSQYYQYLISTLRSNFANDPKN [T.reeseiRUT TYYITGAPQCPIPEPNMGEIISTSQFDYLWVQFYNNNPVCSLGLPGDAPFNFNDWVSFISTTPS C-30]ETR98658.1 KNAKLFVGAPASTLGANGNAGGAKYYATPEQLAGIVNSVKSSPFFGGIMLWDAGYSDSNVNNGC GI:572275208 NYAQEAKNILLTGTACGGESSPPPSTTTTAVPPPASSTPSNPSGGSVPQWGQCGGDGYTGPTQC VAPYKCVATSEWWSSCQ Glycoside 
MVSASAGLAAVGLLNGYWGQYTTTEGLRPHCDSGVDSITLGFVNGAPDASGYPSLNFGPNCWAE 548 hydrolase[T. SYPGNLGLPSKLLSHCMSLQSDIPYCRSKGVKVILSIGGVYNALTSNYFVGDNGTATDFATFLY reeseiRUTC-30] NAFGPYNASYTGPRPFDDITTGLPTSVDGFDFDIEADFPNGPYIKMIETFRSLDSSMLITGAPQ ETS00190.1 CPTNPQYFVMKDMIQQAAFDKLFIQFYNNPVCDAIPGNTAGDKFNYDDWEAVIAGSAKSKSAKL GI:572276883 YIGLPAIQEPNESGYIDPIAMKNLVCQYKDRPHFGGLSLWDLSRGLVNNINGTSFNQWALDALQ YGCNPIPTTTTTTSTASSTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKA STTSKASTTSKASTTSKASTTSKASTTSKVSTTSKASTTSKASTTSKASSTSKVSTTSKASTTS KVSAKATTSTKASTTVKPSTTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTSKASTTS KAATTSVKPTSKTSTSSKPNVSASSSNVGRDATSLVEASTSTSAAVLYPTTTSRWSNSTITRSS SLTTPIVSDPASLTTSVVYTTSVHTVTKCPAYVTDCPAGGYVTTETIPLYTTVCPISEATQTAA PTVTTEAPQPWTTSTVYTTRVYTITSCAPGVVDCPANQVTTETIPWYTTVCPVTATATPVGPGS VVFPQNTEVGQPSLVGPVVEAAYPTASSSLQTLVKPATSVGVPQGSPAGSSVAPGSSSKPTAPA GPPSYPTGGSGNASPSGSWSGVPVGPSSVPGIPEANAASVMSASLFGLVIVMAAQVFVL Xylanaseregulator MLSNPLRRYSAYPDISSASFDPNYHGSQSHLHSINVNTFGNSHPYPMQHLAQHAELSSSRMIRA 549 [T.reeseiRUT SPVQPKQRQGSLIAARKNSTGTAGPIRRRISRACDQCNQLRTKCDGLHPCAHCIEFGLGCEYVR C-30] ERKKRGKASRKDIAAQQAAAAAAQHSGQVQDGPEDQHRKLSRQQSESSRGSAELAQPAHDPPHG ET502023.1 HIEGSVSSFSDNGLSQHAAMGGMDGLEDHHGHVGVDPALGRTQLEASSAMGLGAYGEVHPGYES GI:572278872 PGMNGHVMVPPSYGAQTTMAGYSGISYAAQAPSPATYSSDGNFRLTGHIHDYPLANGSSPSWGQ SDLRYPVLEPLLPHLGNILPVSLACDLIDLYFSSSSSAQMHPMSPYVLGFVFRKRSFLHPTNPR RCQPALLASMLWVAAQTSEASFLTSLPSARSKVCQKLLELTVGLLQPLIHTGTNSPSPKTSPVV GAAALGVLGVAMPGSLNMDSLAGETGAFGAIGSLDDVITYVHLATVVSASEYKGASLRWWGAAW SLARELKLGRELPPGNPPANQEDGEGLSEDVDEHDLNRNNTRFVTEEEREERRRAWWLVYIVDR HLALCYNRPLFLLDSECSDLYHPMDDIKWQAGKFRSHDAGNSSINIDSSMTDEFGDSPRAARGA HYECRGRSIFGYFLSLMTILGEIVDVHHAKSHPRFGVGFRSARDWDEQVAEITRHLDMYEESLK RFVAKHLPLSSKDKEQHEMHDSGAVTDMQSPLSVRTNASSRMTESEIQASIVVAYSTHVMHVLH ILLADKWDPINLLDDDDLWISSEGFVTATSHAVSAAEAISQILEFDPGLEFMPFFYGVYLLQGS FLLLLIADKLQAEASPSVIKACETIVRAHEACVVTLSTEYQRNFSKVMRSALALIRGRVPEDLA EQQQRRRELLALYRWTGNGTGLAL SGNHhydrolase[T. 
MLLVQVRPSSSPAIDLIRGTELRILPVGDSITYGFLSDQDGGDGNGYRLQLRQHLSKDRVVFAG 550 reeseiRUTC-30] TETSGNMTDGYYAAWNGKTIQYISDHVTPSLEQRPNIILLHAGTNDMNPNGAISREGHDPVAAS ET503411.1 ERLGSLVDKMTTLCPDAVILVAMIIGTCNDEQAPQTKVFQSLIPNVVAPRLESGKHVLAVDFST GI:572280314 FPLDKLRDCIHPTNEGYHLLGYYWYDFIAQIPRDWITAPVGEDPQRPEEQNLAMRLETDLLLLG LLGLLVVLMYA XylanaseIII[T. MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG 551 reeseiRUTC-30] TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH ETS05245.1 TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL GI:572282231 LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW RASTNPLLFDANFNPKPAYNSIVGILQ Hypothetical MPSLTALAGLLALVPSALAGWNPDSKQNIAVYWGQNSANSQSTQQRLSFYCNDDNINVIEIAFL 552 protein NGINPPMTNFANAGDRCTPFSDNPWLLSCPEIEADIKTCQANGKTILLSLGGDTYSQGGWASPE M419DRAFT_94061 AAQDAAAQVWAMFGPVQSDSSAPRPFGDAVVDGFDFDFESTTNNLVAFGAQLRTLSDAAATDSN [T.reeseiRUTC- KKFYLAAAPQCFFPDAAVGPLINAVPMDWIQIQFYNNPCGVSAYTPGSEQQNNYNYQTWEDWAK 30] TSPNPNVKLLVGIPAGPNAGHGYVSDAQLKSVFEYSKKFDTFAGAMMWDMSQLYQNSGFEDQVV ETS6436.1 DALK GI:572283462 Endo-1,4-beta- MVSFTSLLAGVAAISGVLAAPAAEVESVAVEKRQTIQPGTGYNNGYFYSYWNDGHGGVTYTNGP 553 xylanase2(also GGQFSVNWSNSGNFVGGKGWQPGTKNKVINFSGSYNPNGNSYLSVYGWSRNPLIEYYIVENFGT knownasEX2; YNPSTGATKLGEVTSDGSVYDIYRTQRVNQPSIIGTATFYQYWSVRRNHRSSGSVNTANHFNAW Xylanase2;1,4- AQQGLTLGTMDYQIVAVEGYFSSGSASITVS beta-D-xylan xylanohydrolase2; Alkalineendo- beta-1,4-xylanase P36217.2 GI:1042782319 Endo-1,4-beta- MKANVILCLLAPLVLPTETIHLDPELLRANLTERTADLWDRQASQSIDQLIKRKGKLYFG 554 xylanase3(also TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH knownasXylanase; TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL 1,4-beta-D-xylan LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS xylanohydrolase3) GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW A0A024SIB3.1 RASTNPLLFDANFNPKPAYNSIVGILQ GI:1042851768 TPA_Inf:chitinase 
MFFSKALAAAGLLATAAYAAPTMEKRAAGGKLVVYWGAEDDSTTLANVCADSSYDIVNLAFLSR 580 18-13[T.reesei] FFAGGGYPELSLSTLGGPSAAQRAAGATNLQDGTSLIPAIQACQAAGKLVILSMGGAVDFSAVT DAA5861.1 LSSDAQGQQLADTVWNLFLGGTANPTLRPFGSVKLDGVDLDNETGNPTGYLAMAQRFKSNFAKD GI:1232265 TSKKYYLTAAPQCPFPDASEPLNVCQLADYIWVQFYNNGNCNIAQSGFNNAVKNWSKSIGNATL FIGALASGADGDQGYVSASSLLSAYQGVSALNLPNIGGIMLWEAQLAVKNGNFQKTVKAGIASG TTPPPPPPTGGCSWAGHCAGASCSTDNDCSDDLTCNGGVCGTAGSTPAPTCSWEGHCLGASCGN DNDCSDPYSCKNGVCSN GI:572280314 673 674 675 676 94RecName: MKANVILCLLAPLVAALPTETIHLDPELAALRANLTERTADLWDRQASQSIDQLIKRKGKLYFG 677 Full=Endo-1,4- TATDRGLLQREKNAAIIQADLGQVTPENSMKWQSLENNQGQLNWGDADYLVNFAQQNGKSIRGH beta-xylanase3; TLIWHSQLPAWVNNINNADTLRQVIRTHVSTVVGRYKGKIRAWDVVNEIFNEDGTLRSSVFSRL Short=Xylanase3; LGEEFVSIAFRAARDADPSARLYINDYNLDRANYGKVNGLKTYVSKWISQGVPIDGIGSQSHLS AltName:Full=1,4- GGGGSGTLGALQQLATVPVTELAITELDIQGAPTTDYTQVVQACLSVSKCVGITVWGISDKDSW beta-D-xylan RASTNPLLFDANFNPKPAYNSIVGILQ xylanohydrolase3; Flags:Precursor 347aaprotein Beta-galactosidase MKLQSILSCWAILVAQIWATTDGLTDLVAWDPYSLTVNGNRLFVYSGEFHYPRLPVPEMWLDVF 678 [Aspergillus QKMRAHGFNAVSLYFFWDYHSPINGTYDFETGAHNIQRLFDYAQEAGIYIIARAGPYCNAEFNG niger] GGLALYLSDGSGGELRTSDATYHQAWTPWIERIGKIIAENSITNGGPVILNQIENELQETTHSA A0V94178.1 SNTLVEYMEQIEEAFRAAGVDVPFTSNEKGQRSRSWSTDYEDVGGAVNVYGLDSYPGGLSCTNP GI:1078570522 STGFSVLRNYYQWFQNTSYTQPEYLPEFEGGWFSAWGADSFYDQCTSELSPQFADVYYKNIIGQ RVTLQNLYMLYGGTNWGHLAAPVVYTSYDYSAPLRETRQIRDKLSQTKLVGLFTRVSSGLLGVE MEGNGTSYTSTTSAYTWVLRNPNTTAGFYVVQQDTTSSQTDITFSLNVNTSAGAFTLPNINLQG RQSKVISTDYPLGHSTLLYVSTDIATYGTFGDTDVVVLYARSGQEVSFSFKNTTKLTFEEYGDS VNLTSSSGNRTITSYTYTQGSGTSVVKFSNGAIFYLVETETAFRFWAPPTTTDPYVTAEQQIFV LGPYLVRNVSISGSVVDLVGDNDNATTVEVFAGSSAKAVKWNGKEITVTKTDYGSLVGSIGGAD SSSITIPSLTGWKVRDSLPEIQSSYDDSKWTVCNKTTTLSPVDPLSLPVLFASDYGYYTGIKIY RGRFDGTNVTGANLTAQGGLAFGWNVWLNGDLVASLPGDADETSSNAAIDFSNHTLKQTDNLLT VVIDYTGHDETSTGDGVENPRGLLGATLNGGSFTSWKIQGNAGGAAGAYELDPVRAPMNEGGLL AERQGWHLPGYKAKSSDGWTDGSPLDGLNKSGVAFYLTTFTLDLPKNYDVPLGIQFTSPSTVDP 
VRIQLFINGYQYGKYVPYLGPQTTFPIPPGIINNRDKNTIGLSLWAQTDAGAKLENIELISYGA YESGFDAGNGTGFDLNGAKLGYQPEWTEARAKYT Beta-galactosidase MTRITKLCVLLLSSIGLLAAAQNQTETGWPLHDDGLTTDVQWDHYSFKVHGERIFVFSGEFHYW 679 [Aspergillus RIPVPGLWRDILEKIKAAGFTTFAFYSSWAWHAPNNHTVDFSTGARDITPIFELAKELGMYIIV niger] RPGPYINAEASAGGFPLWLTTGDYGTLRNNDSRYTEAWKPYFEKMTEITSRYQITNGHNTFCYQ A0V94179.1 IENEYGDQWLSDPSERVPNETAIAYMELLESSARENGILVPFTANDPNMNAMAWSRDWSNAGGN GI:1078570524 VDVVGLDSYPSCWTCDVSQCTSTNGEYVAYQVVEYYDYFLDFSPTMPSFMPEFQGGSYNPWAGP EGGCGDDTGVDFVNLFYRWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSIS SKYYETKLLSLFTRSARDLTMTDLIGNGTQYTNNTAVKAYELRNPTTNAGFYVTLHEDSTVGTN EAFNLRVNTSAGNLIVPRRGGSIRLNGHQSKIIVTDFTFGSETLLYSTAEVLTYAVIDKKPTLV LWVPTGESGEFAVKGAKSGSVVSKCQSCPAINFHQQGGNLIVGFTQFQGMSVVQIDNDIRVVLL DRTAAYKFWAPALTEDPLVPEDEAVLIQGPYLVRSASLEKSTLAIKGDSINETAVEIFAPENVK TITWNGKQLKTSKSSYGSLKATIAAPASIQLPAFTSWKVNDSLPERLPTYDASGPAWVDANHMT TANPSKPATLPVLYADEYGFHNGVRLWRGYFNGTASGVFLNVQGGSAFGFSAYLNGHFLGSYLG NASIEQANQTFLFPNNITHPTTQNTLLVIHDDTGHDETTGALNPRGILEARLLPSDTTNNSTSP EFTHWRIAGTAGGESNLDPVRGAWNEDGLYAERVGWHLPGFDDSTWSSASSSLSFTGATVKFFR TTIPLDIPRGLDVSISFVLGTPDNAPNAYRAQLFVNGYQYGRFNPYIGNQVVFPVPVGVLDYTG ENTIGVAVWAQTEDGAGITVDWKVNYVADSSLDVSGLETGELRPGWSAERLKFA Beta-galactosidase MKTSFLLAIGLAVEACLGLVSAPNYVRQINATDSSLQDIVTWDEYSIRVRGERILLLLGEFHPF 680 [Aspergillus RLPCPGLWLDVFQKVRALGFSAVSFYVDWALLEGERGSIRADGVFALEEFFQAATEAGLYLTAR niger] PGPYINAEVSGGGFPGWLKRVQGRLKTTDQGYLDAITPYMQAIGRIIAKAQITNGGPVILFQPE A0V94180.1 NEYTACVQDEGYTQVSNYSMPDINSSCLQKEYMAYVEEQYRKAGIVVPFIVNDADPMGNFAPGT GI:1078570526 GVGAVDIYSFDDYPLQWSTAPSNPSNWSSLISPLLSYNETVHEEQSPTTPFSISEFQGGVPDAW GGVGIETSAAYIGPEFERIFYKINYGFRAAIQNLYMIFGGTNWGNLGHSGGYTSYDVGAAIAED RQVIREKYSELKLQSNFLQASSAYLETHSDNGSYGIYTDATSLAVTRLAGNPTNFYVVRHGELT SRESTSYKLRVNTSAGNLAIPQLSGSLSLHGRDSKIHLVDYNVGNVSLIYSTAELFTWKQAGSK SVVVLYGGEDELHEFAVPANKGKPTSIEGDGLQVQQINSTTVIQWAVQPSRRVVHFSDTLEVHL LWRNEAYNYWVLDLPVPGAIGRHVSRSHTNRSVIVKAGYLLRTAEIIGTSLYLTGDINTTTTIE LISAPQPVTSILFNKNRIPTTITSPGRLTGTLTYHKPNISLPDLTTLDWYYLNTLPEVHDPTYD 
DHLWTPCTHTTTANPRNLTTPTSLYASDYGYNGGTLLYRGTFTATGNETSLYLLTEGGYAYGHS IWLNNTFLASWPGNPAFLLSNQTITFPSPLTPGTTYKLTILIDHLGNDENFPANGEFMKDPRGI LDYTLHGRDDKSAISWKMTGNFGGESYADLSRGPLNEGALFAERKGYHLPGAPTEQWTKRSPFD GLPEDERPGVGFFATKFDLQIPDGYDVPISVVFENSTMAGDGSGPARFRSELFVNGWQFGKYVN HIGPQLSYPVPEGILNYNGSNYLALTIWAMDEKSFKLDGLRLQANAVVQSGYRKPSLVKGEVYK ERVDSY LactaseB MTLQCKLESACSSTPHNAMVSVLQQDQWAGEPAEQQPHLSAVAAMGRDNECTMFEPGLSGHLLR 681 [Aspergillus GGHEATQVRNMVIEILF luchuensis] GAT22890.1 GI:1002328951 LactaseB MTRITKLCALLLSSTGLLAAAQNQTETGWPLYDDGLTTDIQWDHYSFKVHGERIFVFSGEFHYW 682 [Aspergillus RIPVPGLWRDILEKIKAAGFTTFSIYSSWAWHAPNNHTVDFSTGARDITPIFELAKELGMYIIV luchuensis] RPGPYINAEASAGGFPLWLTTGDYGTLRNNDSRYTAAWKPYFEKMTEITSRYQVTNGHNTFCYQ GAT26827.1 IENEYGDQWLSDPSERVPNETAIAYMELLESSARENGILVPFTANDPNMNAMAWSRDWSNAGGN GI:1002325961 VDVVGLDSYPSCWTCDVSQCTSTNGEYVAYQVVEYYDYFLEFSPTMPSFMPEFQGGSYNPWAGP EGGCGDDTGVDFVNLFYRWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSIS SKYYETKLLSLFTRSARDLTMTDLIGNGTQYTNNTAVKAYELRNPTTNAGFYVTLHEDSTVGTN EAFSLRVNTSAGNLIVPRLGGSIRLNGHQSKIIVTDFTFGSETLLYSTAEVLTYAVLDKKPTLV LWVPTGESGEFAVKGAKSGSVVSKCQDCSAINFHQQGGNLVVGFTQAQGMSIVQIDNDIRVILL DRTAAYEFWAPALTEDPLVPEDEAVLIQGPYLVRSASLEKSTLAIKGDSINETAVEIFAPNDVK TVTWNGKQLKTSKSSYGSLKATIAAPVSIQLPAFTSWKVNDSLPERLPTYDASGLAWVDANHMT TANPSKPATLPVLCADEYGFHNGVRLWRGYFNGTASGVFLNVQGGSAFGFSAYLNGQFLGSYLG NASIEQANQTFVFPTNITHPTTQNTLLIIHDDTGHDETTGALNPRGILEARLLPSTTTDNTASP EFTHWRLAGTAGGESNLDPVRGAWNEDGLYAERVGWHLPGFDDSTWPSVSSSSLSFTGATVKFF RTTIPLNIPRGLDVSISFVLGTPDNAPNTYRAQLFVNGYQYGRFNPYIGNQVVFPVPVGVLDYS GENTIGVAVWAQTEDGAAITVDWKVNYVADSSLDVAGLETAGLRPGWSVERLKFA PutativeLactaseB MPFFMPEFQGGSYNPWDGPEGGCTEDTGAEFANLFYRWNIAQRVTAMSLYMMYGGTNWGGLAAP 683 [Aspergillus VTATSYDYSAPISEDRSIGSKYYETKLLALFTRCAKDLTMTDRIDNGTQYTTNAAISATELRNP calidoustus] ETNAAFYVTNHLDTTLGTDESFKLHVDTSEGALTIPKHGGAIRLNGHQSKIIVTDFRLGRETLL CEN62581.1 YSTAEVLTYAVFDKKPTLVLWVPAGESGEFAIKGAKRGSATTCSDCSPVEFHRSKESLTVSFTQ GI:972234022 ADGISIVQLDNGVRVLLLDRPSAYTFWAPALTDDPLVPETESVFVSGPYLVRSAKLSGSTLALR 
GDSNGKTAIEVFAPKKVNKITWNGRRIKVTKTRYGSLKASLASAPSIELPALDGWKVSDSLPER LPAYDDSGAAWVDADHMTTPNPHKPATLPVLYADEYGFHNGVRLWRGYFNSSASGVFLNIQGGA AFGWSAYLNGHFLDSYLGDASTNQANGTLSFPDDTLNTDGTPNVLLVIHDDTGHDQTTGVLNPR GILEARLLPLDTESDTEAPEFTHWRVAGTAGGESDLDPVRGVYNEDGLFAERVGWHLPGFDDDD WPAANNSLSFTGATVKFFRTVIPPLDIPQGVDVSISFVFSASSGGNSSSSSSSTGGNTRAFRAQ LFVNGYQYGRFNPYVGNQIVYPVPPGILDYNGENTIGVAVWAQTEAGASLELDWRVNYVVDSSL DVANLDVGGLRPEWEEERLSFA Beta-plucosidase, MNVNMFKAGDDILQDVDQSCKDRLPAVEELPLPPTFTWGTATAAYQVEGGAFQDGKGKSIWDTF 684 lactase THLDPSRTNGENGDIACDHYNRMAEDVVLMASYGVDVYRFSIAWARILPLGGRGDPINEKGIAF phlorizinhydrolase YNNLIDCLLEHNIEPVVTLYHWDVPQGLYDRYGAFLDTTEFRADFEHFARLCFSRFGDRVKRWI [Aspergillus TFNEPYIIAIFGHHSGVLAPGRSSATGGDSRTEPWRVGHTIILAHTAAVQAYATDFQPTQKGDI oryzae100-8] SIVLNGHYYEPWDAGSEEHWLAAQRRLEFYIGWFGDPIFLGKDYPAPMRAQLGSRLPEFTSEEL KDE76127.1 DLLRRSAPINSFYGMNHYTTKYARALPDPPAEDDCTGNVEEGPTNSEGKTMGPLSGMSWLRVTP GI:635504017 AGFRKLLNWVWDRYRRPIVVTENGCPCPGESQMTKEQALDDQFRIRYFGLYLDAISRAIYDDGV KVEGYYVWSLMDNFEWSAGYGPRYGITHVDFTTLVRTPKQSAKYLHHSFNKRRATSLR Beta-glucosidase, MGSTSTSTLPPDFLWGFATASYQIEGAVNEDGRGPSIWDTFCKIPGKIAGGANGDVACDSYHRT 685 lactase HEDIALLKACGAKAYRFSLSWSRIIPLGGRNDPINEKGLQYYIKFVDDLHAAGITPLVTLFHWD phlorizinhydrolase LPDELDKRYGGLLNKEEFVADFAHYARIVFKAFGSKVKHWITFNEPWCSSVLGYNVGQFAPGRT [Aspergillus SDRSKSPVGDSSRECWIVGHSLLVAHGAAVKIYRDEFKASDGGEIGITLNGDWAEPWDPENPAD oryzae3.042] VEACDRKIEFAISWFADPIYHGKYPDSMVKQLGDRLPKWTPEDIALVHGSNDFYGMNHYCANFI EIT76661.1 KAKTGEADPNDTAGNLEILLQNRKGEWVGPETQSPWLRPSAIGFRKLLKWLSERYNYPKIYVTE GI:391867415 NGTSLKGENDLPLEQLLQDDFRTQYFRDYIGAMADAYTLDGVNVRAYMAWSLMDNFEWAEGYET RFGVTYVDYENNQKRIPKQSAKAIGEIFDQYIEKA Beta-glucostdase, MNVNMFKAGDDILQDVDQSCKDRLPAVEELPLPPTFTWGTATAAYQVEGGAFQDGKGKSIWDTF 686 lactase THLDPSRTNGENGDIACDHYNRMAEDVVLMASYGVDVYRFSIAWARILPLGGRGDPINEKGIAF phlortzinhydrolase YNNLIDCLLEHNIEPVVTLYHWDVPQGLYDRYGAFLDTTEFRADFEHFARLCFSRFGDRVKRWI [Aspergillus TFNEPYIIAIFGHHSGVLAPGRSSATGGDSRTEPWRVGHTIILAHTAAVQAYATDFQPTQKGDI oryzae3.42] 
SIVLNGHYYEPWDAGSEEHWLAAQRRLEFYIGWFGDPIFLGKDYPAPMRAQLGSRLPEFTSEEL EIT82651.1 DLLRRSAPINSFYGMNHYTTKYARALPDPPAEDDCTGNVEEGPTNSEGKTMGPLSGMSWLRVTP GI:391873626 AGFRKLLNWVWDRYRRPIVVTENGCPCPGESQMTKEQALDDQFRIRYFGLYLDAISRAIYDDGV KVEGYYVWSLMDNFEWSAGYGPRYGITHVDFTTLVRTPKQSAKYLHHSFNKRRATSLR Beta-glucosidase, MNVNMFKAGDDILQDVDQSCKDRLPAVEELPLPPTFTWGTATAAYQVEGGAFQDGKGKSIWDTF 687 lactase THLDPSRTNGENGDIACDHYNRMAEDVVLMASYGVDVYRFSIAWARILPLGGRGDPINEKGIAF phlortzinhydrolase YNNLIDCLLEHNIEPVVTLYHWDVPQGLYDRYGAFLDTTEFRADFEHFARLCFSRFGDRVKRWI [Aspergillus TFNEPYIIAIFGHHSGVLAPGRSSATGGDSRTEPWRVGHTIILAHTAAVQAYATDFQPTQKGDI oryzae3.042] SIVLNGHYYEPWDAGSEEHWLAAQRRLEFYIGWFGDPIFLGKDYPAPMRAQLGSRLPEFTSEEL EIT82651.1 DLLRRSAPINSFYGMNHYTTKYARALPDPPAEDDCTGNVEEGPTNSEGKTMGPLSGMSWLRVTP GI:391873626 AGFRKLLNWVWDRYRRPIVVTENGCPCPGESQMTKEQALDDQFRIRYFGLYLDAISRAIYDDGV KVEGYYVWSLMDNFEWSAGYGPRYGITHVDFTTLVRTPKQSAKYLHHSFNKRRATSLR LactaseB MTRITKLCALLLSSTGLLAAAQNQTETGWPLYDDGLTTDIQWDHYSFKVHVPGLWRDILEKIKA 688 [Aspergillus AGFTTFSIYSSWAWHAPNNHTVDFSTGARDITPIFELAKELGMYIIVRPGPYINAEASAGGFPL kawachiiIFO4308] WLTTGDYGTLRNNDSRYTAAWKPYFEKMTEITSRYQVTNGHNTFCYQIENEYGDQWLSDP GAA82087.1 PNETAIAYMELLESSARENGILVPFTANDPNMNAMAWSRDWSNAGGNVDVVGLDSYPSCWTCDV GI:358365465 SQCTSTNGEYVAYQVVEYYDYFLEFSPTMPSFMPEFQGGSYNPWAGPEGGCGDDTGVDFVNLFY RWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSISSKYYETKLLSLFTRSAR DLTMTDLIGNGTQYTNNTAVKAYELRNPTTNAGFYVTLHEDSTVGTNEAFSLRVNTSAGNLIVP RLGGSIRLNGHQSKIIVTDFTFGSETLLYSTAEVLTYAVLDKKPTLVLWVPTGESGEFAVKGAK SGSVVSKCQDCSAINFHQQGGNLVVGFTQAQGMSIVQIDNDIRVILLDRTAAYEFWAPALTEDP LVPEDEAVLIQGPYLVRSASLEKSTLAIKGDSINETAVEIFAPNDVKTVTWNGKQLKTSKSSYG SLKATIAAPVSIQLPAFTSWKVNDSLPERLPTYDASGLAWVDANHMTTANPSKPATLPVLYADE YGFHNGVRLWRGYFNGTASGVFLNVQGGSAFGFSAYLNGQFLGSYLGNASIEQANQTFVFPTNI THPTTQNTLLIIHDDTGHDETTGALNPRGILEARLLPSTTTDNTASPEFTHWRLAGTAGGESNL DPVRGAWNEDGLYAERVGWHLPGFDDSTWPSVSSSSLSFTGATVKFFRTTIPLNIPRGLDVSIS FVLGTPDNAPNTYRAQLFVNGYQYGRFNPYIGNQVVFPVPVGVLDYSGENTIGVAVWAQTEDGA AITVDWKVNYVADSSLDVAGLETAGLRPGWSVERLKFA Probablebeta- 
MRILSLLFLLLLGFLAGNRVVSATDHGKTTDVTWDRYSLSVKGERLFVFSGEFHYQRLPVPEMW 689 galactosidaseC LDVFQKLRANGFNAISVYFFWGYHSASEGEFDFETGAHNIQRLFDYAKEAGIYVIARAGPYCNA (LactaseC) ETTAGGYALWAANGQMGNERTSDDAYYAKWRPWILEVGKIIAANQITNGGPVILNQHENELQET AlCE56.1 SYEADNTLVVYMKQIARVFQEAGIVVPSSHNEKGMRAVSWSTDHHDVGGAVNIYGLDSYPGGLS GI:00680864 CTNPSSGFNLVRTYYQWFQNSSYTQPEYLPEFEGGWFQPWGGHDYDTCATELSPEFADVYYKNN IGSRVTLQNIYMVFGGTNWGHSAAPVVYTSYDYSAPLRETREIRDKLKQTKLIGLFTRVSSDLL KTHMEGNGTGYTSDSSIYTWALHNPDTNAGFYVLAHKTSSSRSVTEFSLNVTTSAGAISIPDIQ LDGRQSKIIVTDYQFGKSSALLYSSAEVLTYANLDVDVLVLYLNVGQKGLFVFKDERSKLSFQT YGNTNVTASVSSHGTQYIYTQAEGVTAVKFSNGVLAYLLDKESAWNFFAPPTTSNPQVAPDEHI LVQGPYLVRGVTINHDTVEIIGDNANTTSLEVYAGNLRVKVVKWNGKAIKSRRTAYGSLVGRAP GAEDARISPPSLDSWSAQDTLPDIQPDYDDSRWTVCNKTASVNAVPLLSLPVLYSGDYGYHAGT KVYRGRFDGRNVTGANVTVQNGVASGWAAWLNGQFVGGVAGAIDLAVTSAVLSFNSSLLHDRDN VLTVVTDYTGHDQNSVRPKGTQNPRGILGATLIGGGKFTSWRIQGNAGGEKNIDPVRGPINEGG LYGERMGWHLPGYKAPRSAAKSSPLDGISGAEGRFYTTTFTLKLDRDLDVPIGLQLGAPAGTQA VVQVFMNGYQFGHYLPHIGPQSLFPFPPGVINNRGENTLAISMWALTDAGAKLDQVELVAYGKY RSGFDFNQDWGYLQPQWKDNRRQYA Probablebeta- MAHIYRLLLLLLSNLWFSAAAQNQSETEWPLHDNGLSKVVQWDHYSFQVNGQRIFIFSGEFHYW 690 galactosidaseB RIPVPELWRDILEKVKATGFTAFAFYSSWAYHAPNNRTVDFSTGARDITPIFELAKELGMYMIV (LactaseB) RPGPYVNAEASAGGFPLWLTTGEYGSLRNDDPRYTAAWTPYFANMSQITSKYQVTDGHNTLVYQ A1D199.1 IENEYGQQWIGDPKDRNPNKTAVAYMELLEASALENGITVPLTSNDPNMNSKSWGSDWSNAGGN GI:00680896 VDVAGLDSYPSCWTCDVSQCTSTNGEYVPYKVIDYYDYFQEVQPTLPSFMPEFQGGSYNPWAGP EGGCPQDTGAEFANLFYRWNIGQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSAPISEDRSIG AKYSETKLLALFTRTAKDLTMTEAIGNGTQYTTNTAVRAFELRNPQTNAGFYVTFHNDTTVGGN QAFKLHVNTSVGALTVPKNEGVIQLNGHQSKIIVTDFTLGKRTLLYSTAEVLTYAVFENRPTLV LWVPTGESGEFAIKGTKSGKVENGDGCSGINFKREKDYLVVNFSQAKGLSVLRLDNGVRVVLLD KAAAYRFWAPALTDDPIVQETETVLVHGPYLVRSASVSKSTLALRGDSVEKTTLEIFAPHSVRK ITWNGKEVKTSQTPYGSLKATLAAPPTIKLPALTSWRSNDSLPERLPSYDDSGPAWIEANHMTT SNPSPPATLPVLYADEYGFHNGVRLWRGYFNGSASGVFLNIQGGSAFGWSAWLNGHFLDSHLGT ATTSQANKTLTFSSSILNPTENVLLIVHDDTGHDQTTGALNPRGIIEARLLSNDTSSPAPGFTQ 
WRIAGTAGGESNLDPIRGVFNEDGLFAERMGWHLPGFDDSAWTPENSTTSASSALSFTGATVRF FRTVVPLDIPAGLDVSISFVLSTPSAAPKGYRAQLFVNGYQYGRYNPHIGNQVVFPVPPGILDY QGDNTIGLAVWAQTEEGAGIQVDWKVNYVADSSLSVAGFGKGLRPGWTEERLKFA Probablebeta- MKLLSVCAVALLAAQAAGASIKHKLNGFTIMEHSDPAKRELLQKYVTWDEKSLFVNGERIMIFS 691 galactosidaseA GEVHPFRLPVPSLWLDVFQKIKALGFNCVSFYVDWALLEGKPGKYRAEGNFALEPFFDAAKQAG (LactaseA) IYLLARPGPYINAEASGGGFPGWLQRVNGTLRTSDPAYLKATDNYIAHVAATVAKGQITNGGPV A1D1Z9.1 ILYQPENEYSGACCNATFPDGDYMQYVIDQARNAGIVVPLINNDAWTGGHNAPGTGKGEVDIYG GI:300680858 HDSYPLGFDCGHPSVWPKGNLPTTFRTDHLRESPTTPYSLIEFQAGSFDPWGGPGFAACAALVN HEFERVFYKNDLSFGAAILNLYMTFGGTNWGNLGHPGGYTSYDYGSPLTESRNVTREKYSELKL IGNFVKASPSYLLATPGNLTTSGYADTADLTVTPLLGNGTGSYFVVRHTDYTSQASTPYKLSLP TSAGRLTVPQLGGTLTLNGRDSKVHVVDYNVAGTNILYSTAEVFTWKKFGDSKVLVLYGGPGEH HELAVSLKSDVQVVEGSNSEFTSKKVEDVVVVAWDVSASRRIVQIGDLKIFLLDRNSAYNYWVP QLDKDDSSTGYSSEKTTASSIIVKAGYLVRTAYTKGSGLYLTADFNATTPVEVIGAPSNVRNLY INGEKTQFKTDKNGIWSTGVKYSAPKIKLPSMKDLDWKYLDTLPEVQSTYDDSAWPAADLDTTP NTLRPLTMPKSLHSSDYGFHTGYLIYRGHFVADGSETTFDVRTQGGSAFGSSVWLNEAFLGSWT GLNANADYNSTYRLPQVEKGKNYVLTVVIDTMGLNENWVVGTDEMKNPRGILSYKLSGRDASAI TWKLTGNLGGEDYQDKIRGPLNEGGLYAERQGFHQPEPPSKKWKSASPLDGLSKPGIGFYTAQF DLDIPSGWDVPLYFNFGNSTKSAYRVQLYVNGYQYGKFVSNIGPQTSFPVPQGILNYQGTNWVA LTLWALESDGAKLDDFELVNTTPVMTALSKIRPSKQPNYRQRKGAY Probablebeta- MKFLLRRFIALAAASSVVAAPSVSHLSLQDAANRRELLQDLVTWDQHSLFVRGERLMIFSGEFH 692 galactosidaseE PFRLPVPGLWFDVFQKITSLGFNAVSFYTDWGLMEGNPGHVVTDGIWSLDEFFTAASEAGIYLI (LactaseE) ARPGPYINAETSAGGIPGWVLRLKGIIRSNSEDYLRATDTYMATLGKIIAKAQITNGGPVILVQ AlDJ58.1 PENEYTTWPNVSESEFPTTMNKEVMAYAEKQLRDAGVVVPTVVNDNKNLGYFAPGTGLGETDLY GI:300680873 GIDAYPMRYDCGNPYVWPTYRFPRDWQHTHRNHSPTTPFAIMEFQGGSGDGWGGVTEDGCAILV NNEAVRVVYKNNYGFGVGVFNIYMTYGGTNWGNLGYHGGYTSYDYGAAITEDRQIWREKYSEEK LQANFLKVSPAYLTATPGNGVNGSYTGNKDIAVTPLFGNGTTTNFYLVRHADFTSTGSVQYQLS VSTSVGNVTIPQLGGSLSLNGRDSKFHVTDYDVGEFNLIYSSAEIFTWAKGDNKKRVLVLYGGA GELHEFALPKHLPRPTVVDGSDVKMAKKGSAWVVQWEVTAQRRVLRAGKLEIHLLWRNDAYQHW VLELPAKQPIANYSSPSKETVLVKGGYLLRSACITNNKLHLTGDVNATTPLEVISAPKRFDGIV 
FNGQSLKSTRSKIGNLAATVRYQPPAISLPDLKRLDWKYLDSLPEISPDYSDEGNMSLTNTYTN NTRKFTGPTCLYADDYGYHGGSLIYRGHFKANGDESWVFLNTSGGVGFANSVWLNQTFLGSWTG SGNNMTYPRNISLPHELSPGKPYVFTVVIDHMGQDEEAPGTDAIKFPRGILDYALSGHEVSDLK WKMTGNLGGEQYQDSTRGPLNEGAMYAERRGYHLPNPPTSSWKSSSPINDGLTGAGIGFYATSF SLDLPEGYDIPLSFLFNNSASDARSGTSYRCQLFVNGYQFGKYVNDLGPQTNFPVPEGILNYNG VNYVAVSLWALEPQGALVGGLELVASTPILSAYRKPVPAPQPGWKPRRGAY Probablebeta- MRIFSFLFLLLLGILTGQGLVSGTDNGKTTDVTWDKYSLSVKGQRLFVFSGEFHYQRLPVPELW 693 galactosidaseC LDVFQKLRANGFNAISVYFFWSFHSASEGEFDFENGAHDIQRLFDYAKEAGLYVIARAGPYCNA (LactaseC) ETSAGGFALWAANGQMGNERTSDEAYYEKWRPWILEVGKIIAKNQITNGGPVILNQHENELTET A1DM65.1 TYDPNHTLVVYMKQIAQVFEEAGIVVPSSHNEKGMRGVSWSTDYHNVGGAVNIYGLDSYPGGLS GI:300680868 CTNPNSGFRLVRTYYQWFQNYSSTQPSYMPEFEGGWFQPWGGSFYDTCATELSPEFPDVYYKNN IGSRVTLHSIYMTYGGTNWGHSAAPVVYTSYDYAAPLRETREIRDKLKQTKLIGLFTRVSTDLL KTYMEGNGTGYTSDSSIYTWSLRNPDTNAGFYVLAHSTSSARDVTTFSLNATTSAGAISIPDIE LNGRQSKIIVTDYNFGTNSTLLFSSAEVLTYANLDVNVLVFYLNVGQKGTFALKDEPKLAFQTY GNSNVTTSESSYGTQYSYTQGEGVTAVKFSNGVLAYLLDKESAWNFFAPPTTSSPQVAPNEHIL VQGPYLVRGASINHGTVEITGDNANTTSIEVYTGNSQVKKVKWNGKTIETRKTAYGSLIGTVPG AEDVKIRLPSLDSWKAQDTLPEIQPDYDDSTWTVCNKTTSVNAIAPLSLPVLYSGDYGYHAGTK VYRGRFDGRNVTGANVTVQNGAAAGWAAWVNGQYAGGSAGSPSLAATSAVLTFNGLSLKDRDNV LTVVTDYTGHDQNSVRPKGTQNPRGILGATLTGGGNFTSWRIQGNAGGEKNIDPVRGPMNEGGL YGERMGWHLPGYKVPKSASKSSPLDGVSGAEGRFYTTTFKLKLDKDLDVPIGLQLGAPEGTKAV VQVFMNGYQFGHYLPHTGPQSLFPFPPGVINNRGENTLAISMWALTDAGAKLDKVELVAYGKYR SGFDFNQDWGYLQPGWKDRSQYA Probablebeta- MTRITKLCVLLLSSIGLLAAAQNQTETGWPLHDDGLTTDVQWDHYSFKVHGERIFVFSGEFHYW 694 galactosidaseB RIPVPGLWRDILEKIKAAGFTTFAFYSSWAWHAPNNHTVDFSTGARDITPIFELAKELGMYIIV (LactaseB) RPGPYINAEASAGGFPLWLTTGDYGTLRNNDSRYTEAWKPYFEKMTEITSRYQITNGHNTFCYQ A2QA64.2GI: IENEYGDQWLSDPSERVPNETAIAYMELLESSARENGILVPFTANDPNMNAMAWSRDWSNAGGN 300681011 VDVVGLDSYPSCWTCDVSQCTSTNGEYVAYQVVEYYDYFLDFSPTMPSFMPEFQGGSYNPWAGP EGGCGDDTGVDFVNLFYRWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSIS SKYYETKLLSLFTRSARDLTMTDLIGNGTQYTNNTAVKAYELRNPTTNAGFYVTLHEDSTVGTN 
EAFSLRVNTSAGNLIVPRLGGSIRLNGHQSKIIVTDFTFGSETLLYSTAEVLTYAVIDKKPTLV LWVPTDESGEFAVKGAKSGSVVSKCQSCPAINFHQQGGNLIVGFTQSQGMSVVQIDNDIRVVLL DRTAAYKFWAPALTEDPLVPEDEAVVLIQGPYLVRSASLEKSTLAIKGDSINETAVEIFAPENV KTITWNGKQLKTSKSSYGSLKATIAAPASIQLPAFTSWKVNDSLPERLPTYDASGPAWVDANHM TTANPSKPATLPVLYADEYGFHNGVRLWRGYFNGTASGVFLNVQGGSAFGFSAYLNGHFLGSYL GNASIEQANQTFLFPNNITHPTTQNTLLVIHDDTGHDETTGALNPRGILEARLLPSDTTNNSTS PEFTHWRIAGTAGGESNLDPVRGAWNEDGLYAERVGWHLPGFDDSTWSSVSSSSSLSFTGATVK FFRTTIPLDIPRGLDVSISFVLGTPDNAPNAYRAQLFVNGYQYGRFNPYIGNQVVFPVPVGVLD YTGENTIGVAVWAQTEDGAGITVDWKVNYVADSSLDVSGLETGELRPGWSAERLKFA Probablebeta- MKLSSACAIALLAAQAAGASIKHRINGFTLTEHSDPAKRELLQKYVTWDDKSLFINGERIMIFS 695 galactosidaseA GEFHPFRLPVKELQLDIFQKVKALGFNCVSFYVDWALVEGKPGEYRADGIFDLEPFFDAASEAG LactaseA IYLLARPGPYINAESSGGGFPGWLQRVNGTLRSSDKAYLDATDNYVSHVAATIAKYQITNGGPI A2QAN3.1 ILYQPENEYTSGCCGVEFPDPVYMQYVEDQARNAGVVIPLINNDASASGNNAPGTGKGAVDIYG GI:300680857 HDSYPLGFDCANPTVWPSGDLPTNFRTLHLEQSPTTPYAIVEFQGGSYDPWGGPGFAACSELLN NEFERVFYKNDFSFQIAIMNLYMIFGGTNWGNLGYPNGYTSYDYGSAVTESRNITREKYSELKL LGNFAKVSPGYLTASPGNLTTSGYADTTDLTVTPLLGNSTGSFFVVRHSDYSSEESTSYKLRLP TSAGSVTIPQLGGTLTLNGRDSKIHVTDYNVSGTNIIYSTAEVFTWKKFADGKVLVLYGGAGEH HELAISTKSNVTVIEGSESGISSKQTSSSVVVGWDVSTTRRIIQVGDLKILLLDRNSAYNYWVP QLATDGTSPGFSTPEKVASSIIVKAGYLVRTAYLKGSGLYLTADFNATTSVEVIGVPSTAKNLF INGDKTSHTVDKNGINSATVDYNAPDISLPSLKDLDWKYVDTLPEIQSSYDDSLWPAADLKQTK NTLRSLTTPTSLYSSDYGFHTGYLLYRGHFTATGNESTFAIDTQGGSAFGSSVWLNGTYLGSWT GLYANSDYNATYNLPQLQAGKTYVITVVIDNMGLEENWTVGEDLMKTPRGILNFLLAGRPSSAI SWKLTGNLGGEDYEDKVRGPLNEGGLYAERQGFHQPEPPSQNWKSSSPLEGLSEAGIGFYSASF DLDLPKGWDVPLFLNIGNSTTPSPYRVQVYVNGYQYAKYISNIGPQTSFPVPEGILNYRGTNWL AVTLWALDSAGGKLESLELSYTTPVLTALGEVESVDQPKYKKRKGAY Probablebeta- MKLQSILSCWAILVAQIWATTDGLTDLVAWDPYSLTVNGNRLFVYSGEFHYPRLPVPEMWLDVF 696 galactosidaseC QKMRAHGFNAVSLYFFWDYHSPINGTYDFETGAHNIQRLFDYAQEAGIYIIARAGPYCNAEFNG (LactaseC) GGLALYLSDGSGGELRTSDATYHQAWTPWIERIGKIIADNSITNGGPVILNQIENELQETTHSA A2QL84.1 SNTLVEYMEQIEEAFRAAGVDVPFTSNEKGQRSRSWSTDYEDVGGAVNVYGLDSYPGGLSCTNP GI:300680867 
STGFSVLRNYYQWFQNTSYTQPEYLPEFEGGWFSAWGADSFYDQCTSELSPQFADVYYKNNIGQ RVTLQNLYMLYGGTNWGHLAAPVVYTSYDYSAPLRETRQIRDKLSQTKLVGLFTRVSSGLLGVE MEGNGTSYTSTTSAYTWVLRNPNTTAGFYVVQQDTTSSQTDITFSLNVNTSAGAFTLPNINLQG RQSKVISTDYPLGHSTLLYVSTDIATYGTFGDTDVVVLYARSGQVVSFAFKNTTKLTFEEYGDS VNLTSSSGNRTITSYTYTQGSGTSVVKFSNGAIFYLVETETAFRFWAPPTTTDPYVTAEQQIFV LGPYLVRNVSISGSVVDLVGDNDNATTVEVFAGSPAKAVKWNGKEITVTKTDYGSLVGSIGGAD SSSITIPSLTGWKVRDSLPEIQSSYDDSKWTVCNKTTTLSPVDPLSLPVLFASDYGYYTGIKIY RGRFDGTNVTGANLTAQGGLAFGWNVWLNGDLVASLPGDADETSSNAAIDFSNHTLKQTDNLLT VVIDYTGHDETSTGDGVENPRGLLGATLNGGSFTSWKIQGNAGGAAGAYELDPVRAPMNEGGLL AERQGWHLPGYKAKSSDGWTDGSPLDGLNKSGVAFYLTTFTLDLPKKYDVPLGIQFTSPSTVDP VRIQLFINGYQYGKYVPYLGPQTTFPIPPGIINNRDKNTIGLSLWAQTDAGAKLENIELISYGA YESGFDAGNGTGFDLNGAKLGYQPEWTEARAKYT Probablebeta- MKLLSVCAIALLAAQAAGASIKHMLNGFTLMEHSDPAKRELLQKYVTWDEKSLFVNGERIMIFS 697 galactosidaseA GEVHPFRLPVPSLWLDVFQKIKALGFNCVSFYVDWALLEGKPGEYRAEGNFALEPFFDVAKQAG (LactaseA) IYLLARPGPYINAEASGGGFPGWLQRVNGTLRTSDPAYLKATDNYIAHVAATIAKGQITNGGPV BXMP7.2 ILYQPENEYSGACCDATFPDGDYMQYVIDQARNAGIVVPLINNDAWTGGHNAPGTGKGEVDIYG GI:300681017 HDSYPLGFDCGHPSVWPKGNLPTTFRTDHLKQSPTTPYSLIEFQAGSFDPWGGPGFAACAALVN HEFERVFYKNDLSFGAAILNLYMTFGGTNWGNLGHPGGYTSYDYGSPLTESRNVTREKYSELKL IGNFVKASPSYLLATPGNLTTSGYADTADLTVTPLLGNGTGSYFVVRHTDYTSQASTPYKLSLP TSAGRLTVPQLGGTLTLNGRDSKIHVVDYNVAGTNIIYSTAEVFTWKNFGDSKVLILYGGPGEH HELAVSFKSDVQVVEGSNSEFKSKKVGDVAVVAWDVSPSRRIVQIGDLKIFLLDRNSVYNYWVP QLDKDDSSTGYSSEKTTASSIIVKAGYLVRTAYTKGSGLYLTADFNATTPVEVIGAPSNVRNLY INGEKTQFKTDKNGIWSTEVKYSAPKIKLPSMKDLDWKYLDTLQEVQSTYDDSAWPAADLDTTP NTLRPLTTPKSLYSSDYGFHTGYLIYRGHFVADGSETTFDVRTQGGSAFGSSVWLNESFLGSWT GLNANADYNSTYKLPQVEQGKNYVLTILIDTMGLNENWVVGTDEMKNPRGILSYKLSGRDASAI TWKLTGNLGGEDYQDKIRGPLNEGGLYAERQGFHQPQPPSQKWKSASPLDGLSKPGIGFYTAQF DLDIPSGWDVPLYFNFGNSTKSAYRVQLYVNGYQYGKFVSNIGPQTSFPVPQGILNYQGTNWVA LTLWALESDGAKLDDFELVNTTPVMTALSKIRPSKQPNYRQRKGAY Probablebeta- MAHIYRLLLLLLSNLWFSTAAQNQSETEWPLHDNGLSKVVQWDHYSFQVNGQRIFIFSGEFHYW 698 galactosidaseB RIPVPELWRDILEKVKATGFTAFAFYSSWAYHAPNNSTVDFSTGARDITPIFELAKELGMYMIV 
(LactaseB) RPGPYVNAEASAGGFPLWLMTGEYGSLRNDDPRYTAAWTPYFANMSQITSKYQVTDGHNTLVYQ BOXNY2.1 IENEYGQQWIGDPKNRNPNKTAVAYMELLEASARENGITVPLTSNDPNMNSKSWGSDWSNAGGN GI:300680860 VDVAGLDSYPSCWTCDVSQCTSTNGEYVPYKVIDYYDYFQEVQPTLPSFMPEFQGGSYNPWAGP EGGCPQDTSAEFANLFYRWNIGQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSAPISEDRSIG AKYSETKLLALFTRTAKDLTMTEAIGNGTQYTTNTAVRAFELRNPQTNAGFYVTFHTDTTVGGN QAFKLHVNTSVGALTVPKNEGLIQLNGHQSKIIVTDFTLGKRTLLYSTAEVLTYAVFENRPTLV LWVPTGESGEFAIKGAKSGKVENGDGCSGIKFKREKDYLVVNFSQAKGLSVLRLDNGVRVVLLD KAAAYRFWAPALTDDPNVQETETVLVHGPYLVRSASISKTTLALRGDSVEKTTLEIFAPHSVRK ITWNGKEVQTSHTPYGSLKATLAAPPDIKLPALTSWRSNDSLPERLPSYDDSGPAWIEANHMTT SNPSPPATFPVLYADEYGFHNGVRLWRGYFNGSASGVFLNIQGGSAFGWSAWLNGHFLDSHLGT ATTSQANKTLTFPSSILNPTENVLLIVHDDTGHDQTTGALNPRGILEARLLSNDTSSPPPEFTH WRLAGTAGGESNLDPIRGVFNEDGLFAERMGWHLPGFDDSAWTSENSATSASSALSFTGATVRF FRSVVPLNIPAGLDVSISFVLSTPTAAPKGYRAQLFVNGYQYGRYNPHIGNQVVFPVPPGILDY QGDNTIGLAVWAQTEEGAGIQVDWKVNYVADSSLSVAGFGKGLRPGWTEERLKFA Probablebeta- MKSLLKRLIALAAAYSVAAAPSFSHHSSQDAANKRELLQDLVTWDQHSLFVRGERLMIFSGEFH 699 galactosidaseE PFRLPVPGLWFDVFQKIKSLGFNAVSFYTDWGLMEGNPGHVVTDGIWSLDEFFTAAREAGLYLI (LactaseE) ARPGPYINAETSAGGIPGWVLRRKGIIRSNSEDYLRATDTYMATLGKIIAKAQITNGGPVILVQ BOXXE7.1 PENEYTTWPNVSESEFPTTMNQEVMAYAEKQLRDAGVVVPTVVNDNKNLGYFAPGTGLGETDLY GI:300680872 GIDAYPMRYDCGNPYVWPTYRFPRDWQHEHRNHSPTTPFAIMEFQGGSGDGWGGVTEDGCAILV NNEAVRVVYKNNYGFGVRVFNIYMTYGGTNWGNLGYYGGYTSYDYGAAITEDRQIWREKYSEEK LQANFLKVSPAYLTSTPGNGVNGSYTGNKDITVTPLFGNGTTTNLYLVRHADFTSTGSAQYNLS ISTSVGNVTIPQLGGSLSLNGRDSKFHITDYDVGGFNLIYSSAEVFTWAKGDNKKRVLVLYGGA GELHEFALPKHLPRPTVVEGSYVKIAKQGSAWVVQWEVAAQRRVLRAGKLEIHLLWRNDAYQHW VLELPAKQPIANYSSPSKETVIVKGGYLLRSAWITDNDLHLTGDVNVTTPLEVISAPKRFDGIV FNGQSLKSTRSKIGNLAATVHYQPPAISLPDLKRLDWKYIDSLPEISTEYNDEGWTPLTNTYTN NTREFTGPTCLYADDYGYHGGSLIYRGHFTANGDESWVFLNTSGGVGFANSVWLNQTFLGSWTG SGRNMTYPRNISLPHELSPGEPYVFTVVIDHMGQDEEAPGTDAIKFPRGILDYALSGHELSDLR WKMTGNLGGEQYQDLTRGPLNEGAMYAERQGYHLPSPPTSSWKSSNPIKEGLTGAGIGFYATSF SLDLPEGYDIPLSFRFNNSASAARSGTSYRCQLFVNGYQFGKYVNDLGPQTKFPVPEGILNYNG 
VNYVAVSLWALESQGALIGGLDLVASTPILSGYRKPAPAPQPGWKPRRGAY Probablebeta- MRIFSFLFLLLLGILTGQGLVSGTDNGKTTDVTWDKYSLSVKGQRLFVFSGEFHYQRLPVPELW 700 galactosidaseC LDVFQKLRANGFNAISVYFFWSFHSASEGEFDFENGAHDIQRLFDYAKEAGLYVIARAGPYCNA (LactaseC) ETSAGGFALWAANGQMGNERTSDEAYYEKWRPWILEVGKIIAKNQITNGGPVILNQHENELVET B0Y752.1 TYDPNHTLVVYMKQIAQVFEEAGIVVPSSHNEKGMRGVSWSTDYHNVGGAVNIYGLDSYPGGLS GI:300680865 CTNPNSGFNLVRTYHQWFQNYSFTQPSYLPEFEGGWFQPWGGSFYDTCATELSPEFPDVYYKNN IGSRVTLHSIYMTYGGTNWGHSAAPVVYTSYDYAAPLRETREIRDKLKQTKLIGLFTRVSKDLL KTYMEGNGTGYTSDSSIYTWSLRNPDTNAGFYVLAHSTSSTRDVTTFTLNVTTSAGAISIPDIE LNGRQSKIIVTDYNFGTNSTLLFSSAEVLTYANLDVNVLVFYLNVGQKGTFVFKDEPKLAFQTY GNSNLTTSESSYGTQYSYTQGKGVTAVKFSNGVLAYFLDKESAWNFFAPPTTSSPQVAPNEHIL VQGPYLVRGASVNHGTVEITGDNANTTSIEVYTGNSQVKKIKWNGKTIETRKTAYGSLIGTAPG AEDVKIQLPSLDSWKAQDTLPEIQPDYDDSKWTVCNKTTSVNAIAPLSLPVLYSGDYGYHAGTK VYRGRFDGRNVTGANVTVQNGAAAGWAAWVNGQYAGGSAGSPNLAATSAVLTFNSSSLKDQDNV LTVVTDYTGHDQNSVRPKGTQNPRGILGATLIGGGNFTSWRIQGNAGGEKNIDPVRGPMNEGGL YGERMGWHLPGYKVPKSASKSSPLDGVSGAEGRFYTTTFKLKLDKDLDVPIGLQLGAPEGTKAV VQVFMNGYQFGHYLPHTGPQSLFPFPPGVINNRGENTLAISMWALTDAGAKLDKVELVAYGKYR SGFDFNQDWGYLQPGWKDRSQYA Probablebeta- MRLLSFIYLVWLALLTGTPQVSATDNGKTSDVAWDKYSLSVKGERLFVFSGEFHYQRLPVPELW 701 galactosidaseC LDVFQKLRANGFNTISVYFFWSYHSASEDVFDFTTGAHDIQRLFDYAKQAGLYVIARAGPYCNA (LactaseC) ETSAGGFALWAANGQMGSERTSDEAYYKKWKPWILEVGKIIAANQITNGGPVILNQHENELQET B8N2I5.1 TYDSNDTKVIYMEQVAKAFEEAGVVVPSSHNEKGMRTVSWSTDYKNVGGAVNVYGLDSYPGSLS GI:300680866 CANPNSGFNLLRTYYQWFQNYSYTQPEYLAEFEGGWFQPWGGSFYDSCASELSPEFADVYYKNN IGSRVTLHNIYMTFGGTNWGHSAAPVVYTSYDYGSPLRETREIRDKLKQTKLLGLFTRVSKDLL KTYMEGNGTSYTSDDSIYTWALRNPDSDAGFYVVAHNTSSSREVTTFSLNITTSAGALTIPDIE LDGRQSKIIVTDYSIGSESSLLYSSAEVLTYATLDVDVLVFYLNAGQKGAFVFKDAPADLKYQT YGNSNLSALETSQGTQYSYTQGEGVTAVKFSNGVLVYLLDKETAWNFFAPPTVSSPTVAPNEHI LVFGPYLVRGASIKHDTVEIVGDNSNSTSIEIYTGDEHVKKVSWNGNLIDTRATAYGSLIGTVP GAEDIEISLPSLSSWKAQDTLPEISPDYDDSRWTICNKTTSVNSVAPLSLPVLYSGDYGYHTGT KIYRGRFDGQNATGANVTVQNGVAAGWAAWLNGAYVGGFSGDPDKVASWEVLKFNHSSLRSRDN 
VLTIITDYTGHDQNSQKPIGTQNPRGIMGATLIGGGNFTLWRIQGNAGGEKNIDPVRGPMNEGG LYGERMGWHLPGYQVPESALDSSPLEGVSGAEGRFYTTSFQLDLEEDLDVPIGLQLSAPAGTEA VVQIFMNGYQFGHYLPHIGPQSLFPFPPGVIYNRGQNSLAISMWALTDAGARLEQVELKAYAKY RSGFDFNRDWTYLQPGWKDRTEYA Probablebeta- MKLLSVAAVALLAAQAAGASIKHRLNGFTILEHPDPAKRDLLQDIVTWDDKSLFINGERIMLFS 702 galactosidaseA GEVHPFRLPVPSLWLDIFHKIRALGFNCVSFYIDWALLEGKPGDYRAEGIFALEPFFDAAKEAG (LactaseA) IYLIARPGSYINAEVSGGGFPGWLQRVNGTLRSSDEPFLKATDNYIANAAAAVAKAQITNGGPV B8N6V7.1 ILYQPENEYSGGCCGVKYPDADYMQYVMDQARKADIVVPFISNDASPSGHNAPGSGTGAVDIYG GI:300680889 HDSYPLGFDCANPSVWPEGKLPDNFRTLHLEQSPSTPYSLLEFQAGAFDPWGGPGFEKCYALVN HEFSRVFYRNDLSFGVSTFNLYMTFGGTNWGNLGHPGGYTSYDYGSPITETRNVTREKYSDIKL LANFVKASPSYLTATPRNLTTGVYTDTSDLAVTPLIGDSPGSFFVVRHTDYSSQESTSYKLKLP TSAGNLTIPQLEGTLSLNGRDSKIHVVDYNVSGTNIIYSTAEVFTWKKFDGNKVLVLYGGPKEH HELAIASKSNVTIIEGSDSGIVSTRKGSSVIIGWDVSSTRRIVQVGDLRVFLLDRNSAYNYWVP ELPTEGTSPGFSTSKTTASSIIVKAGYLLRGAHLDGADLHLTADFNATTPIEVIGAPTGAKNLF VNGEKASHTVDKNGIWSSEVKYAAPEIKLPGLKDLDWKYLDTLPEIKSSYDDSAWVSADLPKTK NTHRPLDTPTSLYSSDYGFHTGYLIYRGHFVANGKESEFFIRTQGGSAFGSSVWLNETYLGSWT GADYAMDGNSTYKLSQLESGKNYVITVVIDNLGLDENWTVGEETMKNPRGILSYKLSGQDASAI TWKLTGNLGGEDYQDKVRGPLNEGGLYAERQGFHQPQPPSESWESGSPLEGLSKPGIGFYTAQF DLDLPKGWDVPLYFNFGNNTQAARAQLYVNGYQYGKFTGNVGPQTSFPVPEGILNYRGTNYVAL SLWALESDGAKLGSFELSYTTPVLTGYGNVESPEQPKYEQRKGAY Probablebeta- MLISKTVLSGLALGASFVGVSAQQNSTRWPLHDNGLTDTVEWDHYSFLINGQRHFVFSGEFHYW 703 galactosidaseB RIPVPELWRDLLEKIKAAGFTAFSIYNHWGYHSPKPGVLDFENGAHNFTSIMTLAKEIGLYMII (LactaseB) RPGPYVNAEANAGGLPLWTTTGAYGKLRDNDPRYLEALTPYWANISKIIAPHLITNGGNVILYQ B8NKI4.2 IENEYAEQWLDEETHEPNTSGQEYMQYLEDVARENGIDAPLIHNLPNMNGHSWSKDLSNATGNV GI:68115 DVIGVDSYPTCWTCNVSECASTNGEYIPYKTLIYYDYFKELSPTQPSFMPEFQGGSYNPWGGPQ GGCPDDLGPDFANLFYRNLISQRVSAISLYMLYGGTNWGWHASTDVATSYDYSSPISENRKLIE KYYETKVLTQFTKIAQDLSKVDRLGNSTKYSSNPAVSVAELRNPDTGAAFYVTQHEYTPSGTVE KFTVKVNTSEGALTIPQYGSQITLNGHQSKIIVTDFKFGSKTLLYSTAEVLTYAVIDGKEVLAL WVPTGESGEFTVKGVNSAKFADKGRTANIEIHPGANNVTVSFMQRSGMSLVELGDGTRIVLLDR 
SAAHVFWSTPLNNDPAEAGNNTVLVHGPYLVRSAKLEGCDLKLTGDIQNSTEVSIFAPKSVCSV NWNGKKTSVKSAKGGVITTTLGGDAKFELPTISGWKSADSLPEIAKDYSATSKAWVVATKTNSS NPTPPAPNNPVLYVDENDIHVGNHIYRATFPSTDEPPTDVYLNITGGRAFSYSVWLNSDFIGSW LGTATTEQNDQTFSFSNATLSTDEDNILVVVMDNSAHDLRDGALNPRGITNATLIGPGSYSFTE WKLAGNAGFEDHLDPVRAPLNEGSLYAERVGIHLPGYEFDEAEEVSSNSTSLTVPGAGIRVFRT VVPLSVPQGLDVSISFRLTAPSNVTFTSAEGYTNQLRALLFVNGYQYGRFNPYIGHQIDFPVPP GVLDYNGDNTIAVTVWSQSVDGAEIKVDWNVDYVHETSFDMNFDGAYLRPGWIEERREYA Probablebeta- MARFPQLLFLLLASIGLLSAAQNHSDSEWPLHDNGLSTVVQWDHYSFHVHGQRIFVFSGEFHYW 704 galactosidaseB RIPVPGLWRDILEKIKAAGFTAFAFYSSWGYHAPNNHTVDFSTGARDITPIYELAKELGMYIIV (LactaseB) RPGPYVNAEASAGGYPLWVTTGAYGSLRNDDARYTAAWKPYFAKMSEITSQYQVTDGHNTFCYQ Q0CMF3.2 IENEYGQQWIGDPVDRNPNQTAVAYMELLEASARENGIVVPLTANDPNMNTKSWGSDWSHAGGN GI:300681013 VDVVGLDSYPSCWTCDVTQCTSTNGEYVPYKVMQYYDYFQEVQPTMPGFMPEFQGGSYNPWAGP EGGCPGDTGVDFANLFYRWNIAQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSSPISEDRSIG SKYYETKLLALFTRSATDLTMTDRIGNGTHYTNNPAVAAYELRNPVTNGAFYVTIHADSTVGTD ESFRLNVNTSAGALTVPSKGSIRLNGHQSKIIVTDFRFGPSHTLLYSTAEVLTHAVMDKKATLV LWVPTGESGEFAVKGAKSGKVERCPQCSNATFTRKKDVLVVNFTQAGGMSVLQLNNGVRVVLLD RAAAYKFWAPPLTDDPFAPETDLVLVQGPYLVRSASLSGSTLALRGDSANETALEVFASKKVHT VTWNGKRIKTSRSSYGSLTASLAAPPAVSLPALSSAQWKSQDSLPERLPSYDDSGPAWVDANHM TTQNPRTPDTLPVLYADEYGFHNGIRLWRGSFTDAASGVYLNVQGGAAFGWSAYLNGHFLGSHL GTATTSQANKTLLFPAGTLRKNTTNTILVIHDDTGHDQTTGALNPRGILAARLLAPSDSSTAPN FTQWRVAGTAGGESDLDPVRGVYNEDGLFAERMGWHLPGFDDADWPANNSTTTRGAQVSLSVTG ATVRFFRAVVPLHLPRGVDASISFMLGTPAGASTAYRAQLFVNGYQYGRFYPHIGNQVVYPVPA GVLDYDGENTIGVAVWAQSEAGAEMSLDWRVNYVADSSLDAVRVAAEGALRPGWSEERLQYA Probablebeta- MLISKTVLSGLALGASFVGVSAQQNSTRWPLHDNGLTDTVEWDHYSFLINGQRHFVFSGEFHYW 705 galactosidaseB\ RIPVPELWRDLLEKIKAAGFTAFSIYNHWGYHSPKPGVLDFENGAHNFTSIMTLAKEIGLYMII (LactaseB) RPGPYVNAEANAGGLPLWTTTGAYGKLRDNDPRYLEALTPYWANISKIIAPHLITNDGNVILYQ Q2U6P1.2 IENEYAEQWLDEETHEPNTSGQEYMQYLEDVARENGIDAPLIHNLPNMNGHSWSKDLSNATGNV GI:300681012 DVIGVDSYPTCWTCNVSECASTNGEYIPYLKLTYPQISYFKELSPTQPSFMPEFQGGSYNPWGG 
PQGGCPDDLGPDFANLFYRNLISQRVSAISLYMLYGGTNWGWHASTDVATSYDYSSPISENRKL IEKYYETKVLTQFTKIAQDLSKVDRLGNSTKYSSNPAVSVAELRNPDTGAAFYVTQHEYTPSGT VEKFTVKVNTSEGALTIPQYGSQITLNGHQSKIIVTDFKFGSKTLLYSTAEVLTYAVIDGKEVL ALWVPTGESGEFTVKGVNSAKFADKGRTANIEIHPGTNNVTVSFMQRSGMSLVELGDGTRIVLL DRSAAHVFWSTPLNNDPAEAGNNTVLVHGPYLVRSAKLEGCDLKLTGDIQNSTEVSIFAPKSVC SVNWNGKKTSVKSAKGGVITTTLGGDAKFELPTISGWKSADSLPEIAKDYSATSKAWVVATKTN SSNPTPPAPNNPVLYVDENDIHVGNHIYRATFPSTDEPPTDVYLNITGGRAFGYSVWLNSDFIG SWLGTATTEQNDQTFSFSNATLSTDEDNILVVVMDNSAHDLRDGALNPRGITNATLIGPGSYSF TEWKLAGNAGFEDHLDPVRAPLNEGSLYAERVGIHLPGYEFDEAEEVSSNSTSLTVPGAGIRVF RTVVPLSVPQGLDVSISFRLTAPSNVTFTSAEGYTNQLRALLFVNGYQYGRFNPYIGHQIDFPV PPGVLDYNGDNTIAVTVWSQSVDGAEIKVDWNVDYVHETSFDMNFDGAYLRPGWIEERREYA Probablebeta- MKLLSVAAVALLAAQAAGASIKHRLNGFTILEHPDPAKRDLLQDIVTWDDKSLFINGERIMLFS 706 galactosidaseA GEVHPFRLPVPSLWLDIFHKIRALGFNCVSFYIDWALLEGKPGDYRAEGIFALEPFFDAAKEAG (LactaseA) IYLIARPGSYINAEVSGGGFPGWLQRVNGTLRSSDEPFLKATDNYIANAAAAVAKAQITNGGPV Q2UCU3.1 ILYQPENEYSGGCCGVKYPDADYMQYVMDQARKADIVVPFISNDASPSGHNAPGSGTGAVDIYG GI:121801672 HDSYPLGFDCANPSVWPEGKLPDNFRTLHLEQSPSTPYSLLEFQAGAFDPWGGPGFEKCYALVN HEFSRVFYRNDLSFGVSTFNLYMTFGGTNWGNLGHPGGYTSYDYGSPITETRNVTREKYSDIKL LANFVKASPSYLTATPRNLTTGVYTDTSDLAVTPLIGDSPGSFFVVRHTDYSSQESTSYKLKLP TSAGNLTIPQLEGTLSLNGRDSKIHVVDYNVSGTNIIYSTAEVFTWKKFDGNKVLVLYGGPKEH HELAIASKSNVTIIEGSDSGIVSTRKGSSVIIGWDVSSTRRIVQVGDLRVFLLDRNSAYNYWVP ELPTEGTSPGFSTSKTTASSIIVKAGYLLRGAHLDGADLHLTADFNATTPIEVIGAPTGAKNLF VNGEKASHTVDKNGIWSSEVKYAAPEIKLPGLKDLDWKYLDTLPEIKSSYDDSAWVSADLPKTK NTHRPLDTPTSLYSSDYGFHTGYLIYRGHFVANGKESEFFIRTQGGSAFGSSVWLNETYLGSWT GADYAMDGNSTYKLSQLESGKNYVITVVIDNLGLDENWTVGEETMKNPRGILSYKLSGQDASAI TWKLTGNLGGEDYQDKVRGPLNEGGLYAERQGFHQPQPPSESWESGSPLEGLSKPGIGFYTAQF DLDLPKGWDVPLYFNFGNNTQAARAQLYVNGYQYGKFTGNVGPQTSFPVPEGILNYRGTNYVAL SLWALESDGAKLGSFELSYTTPVLTGYGNVESPEQPKYEQRKGAY Probablebeta- MRLLSFIYLVWLALLTGTPQVSATDNGKTSDVAWDKYSLSVKGERLFVFSGEFHYQRLPVPELW 707 galactosidaseC LDVFQKLRANGFNTISVYFFWSYHSASEDVFDFTTGAHDIQRLFDYAKQAGLYVIARAGPYCNA (LactaseC) 
ETSAGGFALWAANGQMGSERTSDEAYYKKWKPWILEVGKIIAANQITNGGPVILNQHENELQET Q2UMD5.1 TYDSNDTKVIYMEQVAKAFEEAGVVVPSSHNEKGMRTVSWSTDYKNVGGAVNVYGLDSYPGSLS GI:121804415 CANPNSGFNLLRTYYQWFQNYSYTQPEYLAEFEGGWFQPWGGSFYDSCASELSPEFADVYYKNN IGSRVTLHNIYMTFGGTNWGHSAAPVVYTSYDYGSPLRETREIRDKLKQTKLLGLFTRVSKDLL KTYMEGNGTSYTSDDSIYTWALRNPDSDAGFYVVAHNTSSSREVTTFSLNITTSAGAMTIPDIE LDGRQSKIIVTDYSIGSESSLLYSSAEVLTYATLDVDVLVFYLNAGQKGAFVFKDAPADLKYQT YGNSNLSALETSQGTQYSYTQGEGVTAVKFSNGVLVYLLDKETAWNFFAPPTVSSPTVAPNEHI LVFGPYLVRGASIKHDTVEIVGDNSNSTSIEIYTGDEHVKKVSWNGNLIDTRATAYGSLIGTVP GAEDIEISLPSLSSWKAQDTLPEISPDYDDSRWTICNKTTSVNSVAPLSLPVLYSGDYGYHTGT KIYRGRFDGQNATGANVTVQNGVAAGWAAWLNGAYVGGFSGDPDKVASWEVLKFNHSSLRSRDN VLTIITDYTGHDQNSQKPIGTQNPRGIMGATLIGGGNFTLWRIQGNAGGEKNIDPVRGPMNEGG LYGERMGWHLPGYQVPESALDSSPLEGVSGAEGRFYTTSFQLDLEEDLDVPIGLQLSAPAGTEA VVQIFMNGYQFGHYLPHIGPQSLFPFPPGVIKNRGQNSLAISMWALTDAGARLEQVELKAYAKY RSGFDFNRDWTYLQPGWKDRTEYA Probablebeta- MKSLLKRLIALAAAYSVAAAPSFSHHSSQDAANKRELLQDLVTWDQHSLFVRGERLMIFSGEFH 708 galactosidaseE PFRLPVPGLWFDVFQKIKSLGFNAVSFYTDWGLMEGNPGHVVTDGIWSLDEFFTAAREAGLYLI (LactaseE) ARPGPYINAETSAGGIPGWVLRRKGIIRSNSEDYLRATDTYMATLGKIIAKAQITNGGPVILVQ Q4WG05.1 PENEYTTWPNVSESEFPTTMNQEVMAYAEKQLRDAGVVVPTVVNDNKNLGYFAPGTGLGETDLY GI:74668464 GIDAYPMRYDCGNPYVWPTYRFPRDWQHEHRNHSPTTPFAIMEFQGGSGDGWGGVTEDGCAILV NNEAVRVVYKNNYGFGVRVFNIYMTYGGTNWGNLGYYGGYTSYDYGAAITEDRQIWREKYSEEK LQANFLKVSPAYLTSTPGNGVNGSYTGNKDITVTPLFGNGTTTNLYLVRHADFTSTGSAQYNLS ISTSVGNVTIPQLGGSLSLNGRDSKFHITDYDVGGFNLIYSSAEVFTWAKGDNKKRVLVLYGGA GELHEFALPKHLPRPTVVEGSYVKIAKQGSAWVVQWEVAAQRRVLRAGKLEIHLLWRNDAYQHW VLELPAKQPIANYSSPSKETVIVKGGYLLRSAWITDNDLHLTGDVNVTTPLEVISAPKRFDGIV FNGQSLKSTRSKIGNLAATVHYQPPAISLPDLKRLDWKYIDSLPEISTEYNDEGWTPLTNTYTN NTREFTGPTCLYADDYGYHGGSLIYRGHFTANGDESWVFLNTSGGVGFANSVWLNQTFLGSWTG SGRNMTYPRNISLPHELSPGEPYVFTVVIDHMGQDEEAPGTDAIKFPRGILDYALSGHELSDLR WKMTGNLGGEQYQDLTRGPLNEGAMYAERQGYHLPSPPTSSWKSSNPIKEGLTGAGIGFYATSF SLDLPEGYDIPLSFRFNNSASAARSGTSYRCQLFVNGYQFGKYVNDLGPQTKFPVPEGILNYNG Probablebeta- 
MRIFSFLFLLLLGILTGQGLVSGTDNGKTTDVTWDKYSLSVKGQRLFVFSGEFHYQRLPVPELW 709 galactosidaseC LDVFQKLRANGFNAISVYFFWSFHSASEGEFDFENGAHDIQRLFDYAKEAGLYVIARAGPYCNA (LactaseC) ETSAGGFALWAANGQMGNERTSDEAYYEKWRPWILEVGKIIAKNQITNGGPVILNQHENELVET Q4WNE4.1 TYDPNHTLVVYMKQIAQVFEEAGIVVPSSHNEKGMRGVSWSTDYHNVGGAVNIYGLDSYPGGLS GI:74671041 CTNPNSGFNLVRTYHQWFQNYSFTQPSYLPEFEGGWFQPWGGSFYDTCATELSPEFPDVYYKNN IGSRVTLHSIYMTYGGTNWGHSAAPVVYTSYDYAAPLRETREIRDKLKQTKLIGLFTRVSKDLL KTYMEGNGTGYTSDSSIYTWSLRNPDTNAGFYVLAHSTSSTRDVTTFTLNVTTSAGAISIPDIE LNGRQSKIIVTDYNFGTNSTLLFSSAEVLTYANLDVNVLVFYLNVGQKGTFVFKDEPKLAFQTY GNSNLTTSESSYGTQYSYTQGKGVTAVKFSNGVLAYFLDKESAWNFFAPPTTSSPQVAPNEHIL VQGPYLVRGASVNHGTVEITGDNANTTSIEVYTGNSQVKKIKWNGKTIETRKTAYGSLIGTAPG AEDVKIQLPSLDSWKAQDTLPEIQPDYDDSKWTVCNKTTSVNAIAPLSLPVLYSGDYGYHAGTK VYRGRFDGRNVTGANVTVQNGAAAGWAAWVNGQYAGGSAGSPNLAATSAVLTFNSSSLKDQDNV LTVVTDYTGHDQNSVRPKGTQNPRGILGATLIGGGNFTSWRIQGNAGGEKNIDPVRGPMNEGGL YGERMGWHLPGYKVPKSASKSSPLDGVSGAEGRFYTTTFKLKLDKDLDVPIGLQLGAPEGTKAV VQVFMNGYQFGHYLPHTGPQSLFPFPPGVINNRGENTLAISMWALTDAGAKLDKVELVAYGKYR SGFDFNQDWGYLQPGWKDRSQYA Probablebeta- MAHIYRLLLLLLSNLWFSTAAQNQSETEWPLHDNGLSKVVQWDHYSFQVNGQRIFIFSGEFHYW 710 galactosidaseB RIPVPELWRDILEKVKATGFTAFAFYSSWAYHAPNNSTVDFSTGARDITPIFELAKELGMYMIV (LactaseB) RPGPYVNAEASAGGFPLWLMTGEYGSLRNDDPRYTAAWTPYFANMSQITSKYQVTDGHNTLVYQ Q4WRD3.1 IENEYGQQWIGDPKNRNPNKTAVAYMELLEASARENGITVPLTSNDPNMNSKSWGSDWSNAGGN GI:74672078 VDVAGLDSYPSCWTCDVSQCTSTNGEYVPYKVIDYYDYFQEVQPTLPSFMPEFQGGSYNPWAGP EGGCPQDTSAEFANLFYRWNIGQRVTAMSLYMLYGGTNWGAIAAPVTATSYDYSAPISEDRSIG AKYSETKLLALFTRTAKDLTMTEAIGNGTQYTTNTAVRAFELRNPQTNAGFYVTFHTDTTVGGN QAFKLHVNTSVGALTVPKNEGLIQLNGHQSKIIVTDFTLGKRTLLYSTAEVLTYAVFENRPTLV LWVPTGESGEFAIKGAKSGKVENGDGCSGIKFKREKDYLVVNFSQAKGLSVLRLDNGVRVVLLD KAAAYRFWAPALTDDPNVQETETVLVHGPYLVRSASISKTTLALRGDSVEKTTLEIFAPHSVRK ITWNGKEVQTSHTPYGSLKATLAAPPDIKLPALTSWRSNDSLPERLPSYDDSGPAWIEANHMTT SNPSPPATFPVLYADEYGFHNGVRLWRGYFNGSASGVFLNIQGGSAFGWSAWLNGHFLDSHLGT ATTSQANKTLTFPSSILNPTENVLLIVHDDTGHDQTTGALNPRGILEARLLSNDTSSPPPEFTH 
WRLAGTAGGESNLDPIRGVFNEDGLFAERMGWHLPGFDDSAWTSENSATSASSALSFTGATVRF FRSVVPLNIPAGLDVSISFVLSTPTAAPKGYRAQLFVNGYQYGRYNPHIGNQVVFPVPPGILDY QGDNTIGLAVWAQTEEGAGIQVDWKVNYVADSSLSVAGFGKGLRPGWTEERLKFA Probablebeta- MKLLSVCAIALLAAQAAGASIKHMLNGFTLMEHSDPAKRELLQKYVTWDEKSLFVNGERIMIFS 711 galactosidaseA GEVHPFRLPVPSLWLDVFQKIKALGFNCVSFYVDWALLEGKPGEYRAEGNFALEPFFDVAKQAG (LactaseA) IYLLARPGPYINAEASGGGFPGWLQRVNGTLRTSDPAYLKATDNYIAHVAATIAKGQITNGGPV Q4WS33.2 ILYQPENEYSGACCDATFPDGDYMQYVIDQARNAGIVVPLINNDAWTGGHNAPGTGKGEVDIYG GI:300681010 HDSYPLGFDCGHPSVWPKGNLPTTFRTDHLKQSPTTPYSLIEFQAGSFDPWGGPGFAACAALVN HEFERVFYKNDLSFGAAILNLYMTFGGTNWGNLGHPGGYTSYDYGSPLTESRNVTREKYSELKL IGNFVKASPSYLLATPGNLTTSGYADTADLTVTPLLGNGTGSYFVVRHTDYTSQASTPYKLSLP TSAGRLTVPQLGGTLTLNGRDSKIHVVDYNVAGTNIIYSTAEVFTWKNFGDSKVLILYGGPGEH HELAVSLKSDVQVVEGSNSEFKSKKVGDVVVVAWDVSPSRRIVQIGDLKIFLLDRNSVYNYWVP QLDKDDSSTGYSSEKTTASSIIVKAGYLVRTAYTKGSGLYLTADFNATTPVEVIGAPSNVRNLY INGEKTQFKTDKNGIWSTEVKYSAPKIKLPSMKDLDWKYLDTLQEVQSTYDDSAWPAADLDTTP NTLRPLTTPKSLYSSDYGFHTGYLIYRGHFVADGSETTFDVRTQGGSAFGSSVWLNESFLGSWT GLNANADYNSTYKLPQVEQGKNYVLTILIDTMGLNENWVVGTDEMKNPRGILSYKLSGRDASAI TWKLTGNLGGEDYQDKIRGPLNEGGLYAERQGFHQPQPPSQKWKSASPLDGLSKPGIGFYTAQF DLDIPSGWDVPLYFNFGNSTKSAYRVQLYVNGYQYGKFVSNIGPQTSFPVPQGILNYQGTNWVA LTLWALESDGAKLDDFELVNTTPVMTALSKIRPSKQPNYRQRKGAY Probablebeta- MKLSSACAIALLAAQAAGASIKHRINGFTLTEHSDPAKRELLQKYVTWDDKSLFINGERIMIFS 712 galactosidaseA GEFHPFRLPVKELQLDIFQKVKALGFNCVSFYVDWALVEGEPGEYRADGIFDLEPFFDAASEAG (LactaseA) IYLLARPGPYINAESSGGGFPGWLQRVNGTLRSSDKAYLDATDNYVSHVAATIAKYQITNGGPI Q4ZHV7.1 ILYQPENEYTSGCCGVEFPDPVYMQYVEDQARNAGVVIPLINNDASASGNNAPGTGKGAVDIYG GI:74645200 HDSYPLGFDCANPTVWPSGDLPTNFRTLHLEQSPTTPYAIVEFQGGSYDPWGGPGFAACSELLN NEFERVSYKNDFSFQIAIMNLYMIFGGTNWGNLGYPNGYTSYDYGSAVTESRNITREKYSELKL LGNFAKVSPGYLTASPGNLTTSGYADTTDLTVTPLLGNSTGSFFVVRHSDYSSEESTSYKLRLP TSASSVTIPQLGGTLTLNGRDSKIHVTDYNVSGTNIIYSTAEVFTWKKFADGKVLVLYGGAGEH HELAISTKSNVTVIEGSESGISSKQTSSSVVVGWDVSTTRRIIQVGDLKILLLDRNSAYNYWVP QLATDGTSPGFSTPEKVASSIIVKAGYLVRTAYLKGSGLYLTADFNATTSVEVIGVPSTAKNLF 
INGDKTSHTVDKNGINSATVDYNAPDISLPSLKDLDWKYVDTLPEIQSSYDDSLWPAADLKQTK NTLRSLTTPTSLYSSDYGFHTGYLLYRGHFTATGNESTFAIDTQGGSAFGSSVWLNGTYLGSWT GLYANSDYNATYNLPQLQAGKTYVITVVINNMGLEENWTVGEDLMKTPRGILNFLLAGRPSSAI SWKLTGNLGGEDYEDKVRGPLNEGGLYAERQGFHQPEPPSQDWKSSSPLEGLSEAGIGFYSASF DLDLPKGWDVPLFLNIGNSTTPSPYRVQVYVNGYQYAKYISNIGPQTSFPVPEGILNYRGTNWL AVTLWALDSAGGKLESLELSYTTPVLTALGEVESVDQPKYKKRKGAY Probablebeta- MATAFWLLLFLLGSLHVLTAAQNSSQSEWPIHDNGLSKVVQWDHYSFYINGQRIFLFSGEFHYW 713 galactosidaseB RIPVPALWRDILEKIKAIGFTGFAFYSSWAYHAPNNQTVDFSTGARDITPIYDLAKELGMYIIV (LactaseB) RPGPYVNAEASAGGFPLWLTTGAYGSTRNDDPRYTAAWEPYFAEVSEITSKYQVTDGHYTLCYQ Q5BEQ0.2 IENEYGQQWIGDPRDRNPNQTAIAYMELLQASARENGITVPLTGNDPNMNTKSWGSDWSDAGGN GI:300681009 LDTVGLDSYPSCWSCDVSVCTGTNGEYVPYKVLDYYDYFQEVQPTMPFFMPEFQGGSYNPWDGP EGGCTEDTGADFANLFYRWNIGQRVSAMSLYMMFGGTNWGGIAAPVTASSYDYSAPISEDRSIG SKYYETKLLALFTRCAKDLTMTDRLGNGTQYTDNEAVIASELRNPDTNAAFYVTTHLDTTVGTD ESFKLHVNTSKGALTIPRHGGTIRLNGHHSKIIVTDFNFGSETLLYSTAEVLTYAVFDRKPTLV LWVPTGESGEFAIKGAKSGSVAKCSGCSNIKFHRDSGSLTVAFTQGEGISVLQLDNGVRVVLLD RQKAYTFWAPALTDNPLVPEGESVLVSGPYLVRTARLARSTLTLRGDSKGETLEIFAPRKIKKV TWNGKAVEATRTSYGSLKAILAKPPSVELPTLNGWKYSDSLPERFPTYDDSGAAWVEIDANHMT TPNPNKPATLPVLYADEYGFHNGVRLWRGYFNSSASGVYLNIQGGAAFGWSAWLNGHFLGSHLG SASIQQANGTLDFPANTLNTEGTPNVLLVVHDDTGHDQTTGVLNPRGILEARLLSEASDNNDDD SPGFTHWRVAGTAGGESDLDPVRGVYNEDGLYAERVGWHLPGFDDSKWATVNGTSLSFTGATVR FFRTVIPPLSIPENTDVSISFVFSTPNVNNTSAGNTSAFRAQLFVNGYQYGRYNPYVGNQVVYP VPPGILDYNGENTIGVAVWAQTEAGARLNLDWRVNYVLGSSLDAGRLDLSFVAIAYVYIFECLQ L Probablebeta- MRLLPVWTAALLAAQAAGVALTHKLNGFTITEHPDAEKRELLQKYVTWDDKSLFINGERIMIFG 471 galactosidaseA AEIHPWRLPVPSLWRDILQKVKALGFNCVSFYVDWALLEGKPGEYRAEGSFAWEPFFDAASDLG (LactaseA) IYLIARPGPYINAEASGGGFPGWLQRLNGTIRSSDQSYLDATENYVSHIGGLIAKYQITNGGPV Q5BFC4.2 ILYQPDNEYSGGCCGQEFPNPDYFQYVIDQARRAGIVVPTISNDAWPGGHNAPGTGKGEVDIYG GI:300681016 HDNYPLGFDCANPDVWPEGNLPTDYRDLHLEISPSTPYALVEYQVGAFDPWGGPGFEQCAALTG YEFERVFHKNTFSFGVGILSLYMTFGGTNWGNLGHPGGYTSYDYGSPIKETREITREKYSELKL 
LGNFIKSSPGYLLATPGKLTNTTYTNTADLTVTPLLGNGTGSFFVLRHSDYSSQASTPYKLRLP TSAGQLTIPQLGGSLVLNGRDSKVHLVDYDVAGTKILYSTAEVFTWKKFHDGKVLVLYGGPGEH HELAVSSKAKVKVVEGLGSGISSKQIRGAVVVAWDVEPARRIVQIGDLKIFLLDRNSAYNYWVP QLGTETSIPYATEKAVAASVIVKAGYLVRTAYVKGRDLHLTADFNATTPVEVIGAPKTAENLFI NGKKAHHTVDKNGIWSTEVGYSPPKIVLPVLEDLKWKSIDTLPEIQPSYDDSPWPDANLPTKNT IYPLRTPTSLYASDYGFHTGYLLFRGHFTANGRESNFSIQTQGGQAFGSSVWLSGTYLGSWTGD NDYQDYNATYTLPSLKAGKEYVFTVVVDNMGLNENWIVGQDEMKKPRGILNYELSGHEASDITW KLTGNFGGEDYVDKVRGPLNEGGLYAERHGYHQPYPPTKSKDWKSSTPLTGLSKPGISFYTASF DLDIKSGWDVPIYFEFGNSTTPAPAYRVQLYVNGWQYGKYVNNIGPQTRFPVPEGILNYKGTNW VAVTLWALEGSGAKLDSFKLVHGIPVRTALDVEGVELPRYQSRKGVY Lactase,partial SIKHRINGFTLTEHSDPAKRELLQKYVTWDDKSLFINGERIMIFSGEFHPFRLPVKELQLDIFQ 715 [Aspergillus KVKALGFNCVSFYVDWALVEGKPGEYGADGIFDLEPFFDAASEAGIYLLARPGPYINAESSGGG niger] FPGWLQRVNGTLRTSDKAYLEATDNYVSHIAATIAKYQITNGGPIILYQPENEYTGGCCGVEFP ABL07484.1 DPVYMQYVEDQARNAGVVIPLINNDASASGHNAPGTGEGAVDIYGHDSYPLGFDCANPTVWPSG GI:118582212 DLPTNFRTLHLVQSPTTPYAIVEFQGGSYDPWGGPGFAACSELLNNEFERVFYKNDFSFQIAIM NLYMIFGGTNWGNLGYPNGYTSYDYGSAVTESRNITREKYSELKLLGNFAKVSPGYLTASPGNL TTSGYADTTDLTVTPLLGNSTGSFFVVRHSDYSSEDSTSYKLRLPTSAGTVTIPQLGGTLTLNG RDSKIHVTDYNVSGTNIIYSTAEVFTWKKFADGKVLVLYGGAGEHHELAISTKSNVTVIEGSES GISSKQTSSSVIVGWDVSTTRRIIQVGDLKVLLLDRNSAYNYWVPQLATDGTSPGFSTSETVAS SIIVKAGYLVRTAYLKGSGLYLTADFNATTSVEVIGVPSTAKNLFINGDKTSHTVDKNGINSAT VEYNAPDISLPSLKDLDWKYVDTLPEIQSSYDDSLWPAADLKQTKNTLRSLTTPTSLYSSDYGF HTGYLLYRGHFTATGNESTFSIDTQGGSAFGSSVWLNGTYLGSWTGLYVNSDYNATYKLPQLQA GKSYVITVVIDNMGLEENWTVGEDLMKTPRGILNFLLAGRPGSAISWKLTGNLGGEDYEDKVRG PLNEGGLYAERQGFHQPEPPSGNWKSSSPLEGLSEAGIGFYSAKFDLDLPKGWDVPLFLNIGNS TTPSPYRVQVYVNGYQYAKYISNNGPQTSFPVPEGILNYRGTNWLAVTLWALDSAGGKLESLEL SYTTPVLTALGEVESVDQPKYKKRKGAY Unnamedprotein MGSTSTSTLPPDFLWGFATASYQIEGAVNEDGRGPSIWDTFCKIPGKIAGGANGDVACDSYHRT 716 product HEDIALLKACGAKAYRFSLSWSRIIPLGGRNDPINEKGLQYYIKFVDDLHAAGITPLVTLFHWD [Aspergillus LPDELDKRYGGLLNKEEFVADFAHYARIVFKAFGSKVKHWITFNEPWCSSVLGYNVGQFAPGRT oryzaeRIB40] 
SDRSKSPVGDSSRECWIVGHSLLVAHGAAVKIYRDEFKASDGGEIGITLNGDWAEPWDPENPAD BAE57671.1 VEACDRKIEFAISWFADPIYHGKYPDSMVKQLGDRLPKWTPEDIALVHGSNDFYGMNHYCANFI GI:83767532 KAKTGEADPNDTAGNLEILLQNKKGEWVGPETQSPWLRPSAIGFRKLLKWLSERYNYPKIYVTE NGTSLKGENDLPLEQLLQDDFRTQYFRDYIGAMADAYTLDGVNVRAYMAWSLME Unnamedprotein MARVRLKLPADFIWGVSSSSWQIEGGLQLEGRGPSVLDTIGNVLSPEAADRSDANVANMHYFMY 717 product EQDIARLAAAGIPYYSFSLSWPRIVPFGVAGSPVNTQGLDHYDDLINTCIKYGVTPIVTLNHVD [Aspergillus APTAVQADLDSLPEHFLYYAKIVMTRYADRVPYWVTFNEPNIGVGTLFQKYQDLTSALIAHADV oryzaeRIB40] YDWYKNTLGGTGKITMKFANNLAMPLDTQDSSHIAAASRYQDILLGIMSNPLFLGKQYPDAAID BAE62705.1 TVDMMQPLTDDQIKHIHGKIDFWSFDPYTAQYASPLPQGTEACASNSSDPFWPTCVILSNVQAN GI:83772577 GWLMGQASNAYAYLAPQYVRQQLGYIWNTFRPSGILIAEYGFNPFLESNRTLDAQRYDLERTLY YQDFLTETLKAIHEDNVNVIGALAWSIADNNEFGSYEEQYGLQTVNRTNGKFTRTYKRSLFDYV DFFHRHVQSA Unnamedprotein MNVNMFKAGDDILQDVDQSCKDRLPAVEELPLPPSFTWGTATAAYQVEGGAFQDGKGKSIWDTF 718 product THLDPSRTNGENGDIACDHYNRMAEDVVLMASYGVDVYRFSIAWARILPLGGRGDPINEKGIAF [Aspergillus YNNLIDCLLEHNIEPVVTLYHWDVPQGLYDRYGAFLDTTEFRADFEHFARLCFSRFGDRVKRWI oryzaeRIB40] TFNEPYIISIFGHHSGVLAPGRSSATGGDSRTEPWRVGHTIILAHTAAVQAYATDFQPTQKGDI BAE63197.1 SIVLNGHYYEPWDAGSEEHRLAAQRRLEFYIGWFGDPIFLGKDYPAPMRAQLGSRLPEFTSEEL GI:83773069 DLLRRSAPINSFYGMNHYTTKYARALPDPPAEDDCTGNVEEGPTNSEGKTMGPLSGMSWLRVTP AGFRKLLNWVWDRYRRPIVVTENGCPCPGESQMTKEQALDDQFRIRYFGLYLDAISRAIYDDGV KVEGYYVWSLMDNFEWSAGYGPRYGITHVDFTTLVRTPKQSAKYLHHSFNKRRATSLR Beta-galactosidase MTLSAVPDYENQHILQRNRLKPRAYFLPATSISLNGRWDFHYAASPVSAPEPTWSKGTKNATAE 719 [Aspergillus PRRDSNQFSSDGADSKTAWAPITVPGHWQLQGYGRPHYTNVIYPFPVCPPFVPTENPTGTYRRT fumigatusAf293] FHVPAEWDASSQLRLRFDGVDSAYHVWVNGVPIGYSQGSRNPAEFDVSQVVDRDGANELFVRVY XP_753202.1 QWSDGSYIEDQDQWWLSGIFRDVTLLAFPGQARIEDFFVRTALDKDYVDATLRLSVDLALATAA GI:70996895 IVQVTLSNPSTGSTLQTEKYSLGEKQDKLEAELSVSNPNKWTAETPNLYNLCIALYVDGAKDPV QTINHRVGFRQVEIKNGNITVNGVPVMFRGVNRHDHHPRFGRAVPLSFLREDLLIMKRHNVNAL RCSHYPSHPRLYELCDELGLWVMDEADLECHGFYDAIARPLDIPESMDYEERKKLTFGQAAQFT 
TNNPEWKEAYVDRMAQMVQRDKNHSCIVIWSLGNEAFYGSNHQAMYDYVKQVDPSRPVHYEGDM EAKTVDMYSYMYPSLERLVGFATAEGDEFKKPIVLCEYAHAMGNAPGGLEEYMEAFRTHRRLQG GWVWEWANHGLWDEKKGWYGYGGDFGDTPHDGNFVLDGLLFSDHTPTPGITELKKAYAPVRVWP GEDGTLVVANDYNFVGLEGLQASYKIEVLGDSGRIIATGIIELPPIPAGQNGTIKLPSAPATAI PGEVWLTISFLQKGETAWAGNNYEVAWYQQCLKSSSPRFSLAVPAEALTHSSTKTSHRISGASF SLEFSRETGSLYAWTAGGLSLLDQSSSTGAISPGFWRPPTDNDMSHDLLEWRRFGLDTLTSQLR KMHVVQHTPTSVEVTTETYISAPILGWGFFASTSYTISGNGALTVNVHLKPHGPMPADLPRLGL DVLLADELDNTSWFGLGPGEAYPDKKRAQKVGIYNAATAELHTPYEVPQEGGNRMDTRWLRVHD SRGWGLRVTRVKDESDKQPTELFQWLATRYSPEAIEAAKHAPELVPEKRIRLRLDVESCGVGTG ACGPRTLDKYRVKCEERKFGFTLQPVLAELC Beta- MTLSAVPDYENQHILQRNRLKPRAYFLPATSISLNGRWDFHYAASPVSAPEPTWSKGTKNATAE 720 galactosidase, PRRDSNQFSSDGADSKTAWAPITVPGHWQLQGYGRPHYTNVIYPFPVCPPFVPTENPTGTYRRT putative FHVPAEWDASSQLRLRFDGVDSAYHVWVNGVPIGYSQGSRNPAEFDVSQVVDRDGANELFVRVY [Aspergillus QWSDGSYIEDQDQWWLSGIFRDVTLLAFPGQARIEDFFVRTALDKDYVDATLRLSVDLALATAA fumigatusAf293] IVQVTLSNPSTGSTLQTEKYSLGEKQDKLEAELSVSNPNKWTAETPNLYNLCIALYVDGAKDPV EAL91164.1 QTINHRVGFRQVEIKNGNITVNGVPVMFRGVNRHDHHPRFGRAVPLSFLREDLLIMKRHNVNAL GI:66850838 RCSHYPSHPRLYELCDELGLWVMDEADLECHGFYDAIARPLDIPESMDYEERKKLTFGQAAQFT TNNPEWKEAYVDRMAQMVQRDKNHSCIVIWSLGNEAFYGSNHQAMYDYVKQVDPSRPVHYEGDM EAKTVDMYSYMYPSLERLVGFATAEGDEFKKPIVLCEYAHAMGNAPGGLEEYMEAFRTHRRLQG GWVWEWANHGLWDEKKGWYGYGGDFGDTPHDGNFVLDGLLFSDHTPTPGITELKKAYAPVRVWP GEDGTLVVANDYNFVGLEGLQASYKIEVLGDSGRIIATGIIELPPIPAGQNGTIKLPSAPATAI PGEVWLTISFLQKGETAWAGNNYEVAWYQQCLKSSSPRFSLAVPAEALTHSSTKTSHRISGASF SLEFSRETGSLYAWTAGGLSLLDQSSSTGAISPGFWRPPTDNDMSHDLLEWRRFGLDTLTSQLR KMHVVQHTPTSVEVTTETYISAPILGWGFFASTSYTISGNGALTVNVHLKPHGPMPADLPRLGL DVLLADELDNTSWFGLGPGEAYPDKKRAQKVGIYNAATAELHTPYEVPQEGGNRMDTRWLRVHD SRGWGLRVTRVKDESDKQPTELFQWLATRYSPEAIEAAKHAPELVPEKRIRLRLDVESCGVGTG ACGPRTLDKYRVKCEERKFGFTLQPVLAELC Beta-galactosidase MKLLSVAAVALLAAQAAGASIKHRLNGFTILEHPDPAKRDLLQDIVTWDDKSLFINGERIMLFS 721 [Aspergillus GEVHPFRLPVPSLWLDIFHKIRALGFNCVSFYIDWALLEGKPGDYRAEGIFALEPFFDAAKEAG candidus] 
IYLIARPGSYINAEVSGGGFPGWLQRVNGTLRSSDEPFLKATDNYIANAAAAVAKAQITNGGPV CAD24293.1 ILYQPENEYSGGCCGVKYPDADYMQYVMDQARKADIVVPFISNDASPSGHNAPGSGTGAVDIYG GI:18958133 HDSYPLGFDCANPSVWPEGKLPDNFRTLHLEQSPSTPYSLLEFQAGAFDPWGGPGFEKCYALVN HEFSRVFYRNDLSFGVSTFNLYMTFGGTNWGNLGHPGGYTSYDYGSPITETRNVTREKYSDIKL LANFVKASPSYLTATPRNLTTGVYTDTSDLAVTPLMGDSPGSFFVVRHTDYSSQESTSYKLKLP TSAGNLTIPQLEGTLSLNGRDSKIHVVDYNVSGTNIIYSTAEVFTWKKFDGNKVLVLYGGPKEH HELAIASKSNVTIIEGSDSGIVSTRKGSSVIIGWDVSSTRRIVQVGDLRVFLLDRNSAYNYWVP ELPTEGTSPGFSTSKTTASSIIVKAGYLLRGAHLDGADLHLTADFNATTPIEVIGAPTGAKNLF VNGEKASHTVDKNGIWSSEVKYAAPEIKLPGLKDLDWKYLDTLPEIKSSYDDSAWVSADLPKTK NTHRPLDTPTSLYSSDYGFHTGYLIYRGHFVANGKESEFFIRTQGGSAFGSSVWLNETYLGSWT GADYAMDGNSTYKLSQLESGKNYVITVVIDNLGLDENWTVGEETMKNPRGILSYKLSGQDASAI TWKLTGNLGGEDYQDKVRGPLNEGGLYAERQGFHQPQPPSESWESGSPLEGLSKPGIGFYTAQF DLDLPKGWDVPLYFNFGNNTQAARAQLYVNGYQYGKFTGNVGPQTSFPVPEGILNYRGTNYVAL SLWALESDGAKLGSFELSYTTPVLTGYGDVESPEQPKYEQRKGAY Beta-galactosidase MKLSSACAIALLAAQAAGASIKHRINGFTLTEHSDPAKRELLQKYVTWDDKSLFINGERIMIFS 722 (Lactase-N; GEFHPFRLPVKELQLDIFQKVKALGFNCVSFYVDWALVEGKPGEYRADGIFDLEPFFDAASEAG Lactase; IYLLARPGPYINAESSGGGFPGWLQRVNGTLRSSDKAYLDATDNYVSHVAATIAKYQITNGGPI Tilactase) ILYQPENEYTSGCSGVEFPDPVYMQYVEDQARNAGVVIPLINNDASASGNNAPGTGKGAVDIYG P29853.2 HDSYPLGFDCANPTVWPSGDLPTNFRTLHLEQSPTTPYAIVEFQGGSYDPWGGPGFAACSELLN GI:461623 NEFERVFYKNDFSFQIAIMNLYMIFGGTNWGNLGYPNGYTSYDYGSAVTESRNITREKYSELKL LGNFAKVSPGYLTASPGNLTTSGYADTTDLTVTPLLGNSTGSFFVVRHSDYSSEESTSYKLRLP TSAGSVTIPQLGGTLTLNGRDSKIHVTDHNVSGTNIIYSTAEVFTWKKFADGKVLVLYGGAGEH HELAISTKSNVTVIEGSESGISSKQTSSSVVVGWDVSTTRRIIQVGDLKILLLDRNSAYNYWVP QLATDGTSPGFSTPEKVASSIIVKAGYLVRTAYLKGSGLYLTADFNATTSVEVIGVPSTAKNLF INGDKTSHTVDKNGIWSATVDYNAPDISLPSLKDLDWKYVDTLPEIQSSYDDSLWPAADLKQTK NTLRSLTTPTSLYSSDYGFHTGYLLYRGHFTATGNESTFAIDTQGGSAFGSSVWLNGTYLGSWT GLYANSDYNATYNLPQLQAGKTYVITVVIDNMGLEENWTVGEDLMKSPRGISTSCLPDGQAAPI SWKLTGNLGGEDYEDKVRGPLNEGGLYAERQGFHQPEPPSQNWKSSSPLEGLSEAGIGFYSASF DLDLPKDGMSHCSSTSVTALRHPRTACRSTSTDIVCEIHKQHRTSDQLPCPRGNPELSRNELVG 
GDPVALDSAGGKLESLELSYTTPVLTALGEVESVDQPKYKKRKGAY Alpha-glucosidase SLLAPSQPQFXIPASAAVGAQLIANIDDPQAADAQSVCPGYKASKVQHNSRGFTASLQLAGRPC 723 P1subunit,ANPP1 NVYGTDVESLTLSVEYQDSDRLNIQILPTHVDSTXASWYFLSENLVPRPKASLXASVSQSDLFV subunit SWSNEPSFNFKVIRKATGDALFSTEGTVLVYENQFIEFVTALPEEYNLYGLGEHITQFRLQRNA Aspergillus niger XLTIYPSDDGTPIDQNLYGQHPFYLDTRYYKGDRQ AAB2358.1 GI:257186 (transglucosidase) Celluclast MADIDVEAILKKLTLAEKVDLLAGIDFWHTKALPKHGVPSLRFTDGPNGVRGTKFFNGVPAACF 724 hypothetical PCGTSLGSTFNQTLLEEAGKMMGKEAIAKSAHVILGPTINMQRSPLGGRGFESIGEDPFLAGLG protein AAALIRGIQSTGVQATIKHFLCNDQEDRRMMVQSIVTERALREIYALPFQIAVRDSQPGAFMTA M419DRAFT_125268 YNGINGVSCSENPKYLDGMLRKEWGWDGLIMSDWYGTYSTTEAVVAGLDLEMPGPPRFRGETLK [T.reeseiRUTC-3] FNVSNGKPFIHVIDQRAREVLQFVKKCAASGVTENGPETTVNNTPETAALLRKVGNEGIVLLKN ETR97394.1 ENNVLPLSKKKKTLIVGPNAKQATYHGGGSAALRAYYAVTPFDGLSKQLETPPSYTVGAYTHRF GI:57227381 LPILGEQCLTPDGAPGMRWRVFNEPPGTPNRQHIDELFFTKTDMHLVDYYHPKAADTWYADMEG TYTADEDCTYELGLVVCGTAKAYVDDQLVVDNATKQVPGDAFFGSATREETGRINLVKGNTYKF KIEFGSAPTYTLKGDTIVPGHGSLRVGGCKVIDDQAEIEKSVALAKEHDQVIICAGLNADWETE GADRASMKLPGVLDQLIADVAAANPNTVVVMQTGTPEEMPWLDATPAVIQAWYGGNETGNSIAD VVFGDYNPSGKLSLSFPKRLQDNPAFLNFRTEAGRTLYGEDVYVGYRYYEFADKDVNFPFGHGL SYTTFAFSNLSVSHKDGKLSVSLSVKNTGSVPGAQVAQLYVKPLQAAKINRPVKELKGFAKVEL QPGETKAVTIEEQEKYVAAYFDEERDQWCVEKGDYEVIVSDSSAAKDGVALRGKFTVGETYWWS GV Velvet complex MPSLIPPIVSASSASNSAALDHLYHHQPPPRLPLGAVPQSPIQSQAPPPPHLHPPSHHFQLHPG 725 subunit 2 HGHHQQPHHERDHRLPPPVASYSAHSHHLQHDPLPQRLESSQPGHPGAAEHRDHPQHALDEPSR GRS98.2 SHDPYPSMATGALVHSESQQPASASLLLPISNVEEATGRRYHLDVVQQPRRARMCGFGDKDRRP GI:1881915 ITPPPCVRLIIIDVATGKEIDCNDIDHSMFVLNVDLWNEDGTREVNLVRSSTSSSPSVSSTVTY PYGSISVGESSHTYGQSAHPPSREAPYSVSQTASYAPEYQTQPTYSQGSSAYPSNGTYGPPQQY FPQHQAYRTETGPPGAMQTTVGGFRGYAQDQNALTKMAVVGGQPQGMFTRNLIGSLAASAFRLA DTSEHLGIWFVLQDLSVRTEGPFRLRFSFVNVGPLAGQNGAKVNTGRAPILASCFSEVFNVYSA KKFPGVCESTPLSKTFAAQGIKIPIRKDANLKGGDGEDDYGD alpha-L- MLSNARIIAAGCIAAGSLVAAGPCDIYSSGGTPCVAAHSTTRALFSAYTGPLYQVKRGSDGATT 726 arabinofuranosidas 
AISPLSSGVANAAAQDAFCAGTTCLITIIYDQSGRGNHLTQAPPGGFSGPESNGYDNLASAIGA e[T.reesei] PVTLNGQKAYGVFVSPGTGYRNNAASGTAKGDAAEGMYAVLDGTHYNGACCFDYGNAETNSRDT CAA93243.1 GNGHMEAIYFGDSTVWGTGSGKGPWIMADLENGLFSGSSPGNNAGDPSISYRFVTAAIKGQPNQ GI:158814 WAIRGGNAASGSLSTFYSGARPQVSGYNPMSKEGAIILGIGGDNSNGAQGTFYEGVMTSGYPSD ATENSVQANIVAARYAVAPLTSGPALTVGSSISLRATTACCTTRYIAHSGSTVNTQVVSSSSAT ALKQQASWTVRAGLANNACFSFESRDTSGSYIRHSNFGLVLNANDGSKLFAEDATFCTQAGING QGSSIRSWSYPTRYFRHYNNTLYIASNGGVHVFDATAAFNDDVSFVVSGGFA Beta-xylosidase MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD 727 [Trichoderma SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW reesei] ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG CAA93248.1 EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY GI:2791278 YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK LEFELVGEEVTIENWPLEEQQIKDATPDA Unnamedprotein MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD 728 product[T. 
SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW reesei] ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG CAW52645.1 EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY GI:219752323 YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK LEFELVGEEVTIENWPLEEQQIKDATPDA Unnamedprotein MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD 729 product[T. SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW reesei] ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG CBC2392.1 EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY GI:257341433 YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK LEFELVGEEVTIENWPLEEQQIKDATPDA ChainA,The XNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCDSSAGYVERAQALISLFTLEE 73 StructureOf LILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEWATSFPMPILTTAALNRTLIH HypocreaJecorina QIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPGEDAFFLSSAYTYEYITGIQG Beta-xy1osidase GVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEYYTPQFLAAARYAKSRSLMCA Xy13a(bx11) 
YNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFNPHDYASNQSSAAASSLRAGT 5A7M_A DIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFDKKNQYRSLGWKDVVKTDAWN GI:152244671 ISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGNYYGPAPYLISPLEAAKKAGY HVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQEGADRTDIAWPGNQLDLIKQL SEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVALFDILSGKRAPAGRLVTTQY PAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFYTTFKETLASHPKSLKFNTSS ILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSNAGPAPYPNKWLVGFDRLADI KPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVKLEFELVGEEVTIENWPLE ChainB,The XNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCDSSAGYVERAQALISLFTLEE 731 StructureOf LILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEWATSFPMPILTTAALNRTLIH HypocreaJecorina QIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPGEDAFFLSSAYTYEYITGIQG Beta-xylosidase GVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEYYTPQFLAAARYAKSRSLMCA Xyl3a(bx11) YNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFNPHDYASNQSSAAASSLRAGT 5A7M_B DIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFDKKNQYRSLGWKDVVKTDAWN GI:152244672 ISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGNYYGPAPYLISPLEAAKKAGY HVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQEGADRTDIAWPGNQLDLIKQL SEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVALFDILSGKRAPAGRLVTTQY PAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFYTTFKETLASHPKSLKFNTSS ILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSNAGPAPYPNKWLVGFDRLADI KPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVKLEFELVGEEVTIENWPLE ChainA,The XNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCDSSAGYVERAQALISLFTLEE 732 StructureOf LILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEWATSFPMPILTTAALNRTLIH HypocreaJecorina QIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPGEDAFFLSSAYTYEYITGIQG Beta-xylosidase GVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEYYTPQFLAAARYAKSRSLMCA Xyl3a(Bx11)In YNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFNPHDYASNQSSAAASSLRAGT ComplexWith4- DIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFDKKNQYRSLGWKDVVKTDAWN thioxylobiose ISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGNYYGPAPYLISPLEAAKKAGY 5AE6_A 
HVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQEGADRTDIAWPGNQLDLIKQL GI:169428461 SEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVALFDILSGKRAPAGRLVTTQY PAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFYTTFKETLASHPKSLKFNTSS ILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSNAGPAPYPNKWLVGFDRLADI KPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVKLEFELVGEEVTIENWPLEE ChainB,The XNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCDSSAGYVERAQALISLFTLEE 733 StructureOf LILNTQNSGPGVPRLGLPNYQVWNEAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNG HypocreaJecorina FRSPLWGRGQETPGEDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRL Beta-xylosidase GFDAIITQQDLSEYYTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWG Xyl3a(Bx11)In YVSSDCDAVYNVFNPHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSV ComplexWith4- TRLYANLVRLGYFDKKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIAL thioxylobiose IGPWANATTQMQGNYYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDA 5AE6_B IIYLGGIDNTIEQEGADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNS GI:169428462 LVWGGYPGQSGGVALFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWY TGKPVYEFGSGLFYTTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKT ESPYTAMLFVRTSNAGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYP GKYELALNTDESVKLEFELVGEEVTIENWPLEE Glycoside MPLIRNPILPGFNADPSIVRVGSDYYIATSTFEWYPGVQIHHSTDLANWELAVRPLSRRSQLDL 734 hydrolasefamily RGEPDSCGVWAPCLTHDGDKFWLVYTDVKRKDGSFKDTHNYIVTAPRIEGPWSDPVYANSSGFD 43[Trichoderma PSLFHDDDGRKWLVNMVQDHRARPRTFAGIALQEFDPAQGKLVGTRKVVFHGSELGLVEGPHLY reeseiQM6a] KRNGWYYLLTAEGGTGYTHAATLARSRSIWGPYELHPQQHILTSKDHPFAALQRAGHADIVETA EGR49145.1 DGKTYLVHLAGRPIGQKRRCVLGRETALQEAYWGEDDWLYVKNGPVPSLDVEVPGVRDEEAYWK GI:3451895 EKRYEFHDGLHKDFQWLRTPEPERLFAIEDGKLVLTGRESIGSWFEQSLVARRQTHFSFDAETV IDFSPEDERQFAGLTLYYSRYNFFYLAVSAHSDGRREVQILRSEASWPNGKLEDVGANACYVRI PQQGRVKLAATIRGERLQFYYALVAEGGEEQEELQRIGPVLDASIVSDECGGHQAHGSFTGSFV GVACSDVNGTEKRAVFDYFVYRPAHDSTDRYSVSMEGIQRV Glycoside MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD 735 hydrolasefamily3 
SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW [T.reeseiQM6a] ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG EGR4972.1 EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY GI:34519464 YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK LEFELVGEEVTIENWPLEEQQIKDATPDA Glycoside MRVNVPLHALQIAARSVAAAICKSSSASGRSLRGGKIDQASRINIYSISNPSPNPPLTPSFPDC 736 hydrolasefamily3 TRDPLCSNDVCDTTKSIAERAAAIVKPMTLNEKVANVGSSASGSARLGLPAYQWQNEALHGVAG [T.reeseiQM6a] STGVQFQSPLGANFSAATSFPMPILLSAAFDDALVKSVATAISTEARAFANYGFAGLDFWTPNI EGR586.1 NPFRDPRWGRGMETPGEDAFRIQGYVLALVDGLQGGIDPDFYRTLSTCKHFAAYDIENGRTANN GI:34519849 LSPTQQDMADYYLPMFETCVRDAKVASIMCAYNAVDGVPACADSYLLQDVLRDTYGFTEDFNYV VSDCDAVENVFDPHHYAANLTQAAAMSINAGTDLDCGSSYNVLNASVQAGLTTEATLDKSLIRL YSALVKVGYFDQPAEYNSLGWGNVNTTQSQALAHDAATEGMTLLKNDGTLPLSRTLSNVAVIGP WANVTTQMQGNYAGTAPLLVNPLSVFQQKWRNVKYAQGTAINSQDTSGFNAALSAASSSDVIVY LGGIDISVENEGFDRSSITWPGNQLNLISQLANLGKPLVIVQFGGGQIDDSALLSNSKVNSILW AGYPGQDGGNAIFDVLTGANPPAGRLPVTQYPANYVNNNNIQDMNLRPSNGIPGRTYAWYTGTP VLPFGYGLHYTNFSLSFQSTKTAGSDIATLVNNAGSNKDLATFATIVVNVKNTGGKANLASDYV GLLFLKSTNAGPAPHPNKQLAAYGRVRNVGVGATQQLTLTVNLGSLARADTNGDRWIYPGAYTL ILDVNGPLTFNFTLTGTATKISTLPSRS Glycoside MRVNVPLHALQIAARSVAAAICKSSSASGRSLRGGKIDQASRINIYSISNPSPNPPLTPSFPDC 737 hydrolasefamily3 TRDPLCSNDVCDTTKSIAERAAAIVKPMTLNEKVANVGSSASGSARLGLPAYQWQNEALHGVAG [T.reeseiQM6a] STGVQFQSPLGANFSAATSFPMPILLSAAFDDALVKSVATAISTEARAFANYGFAGLDFWTPNI XP_6963621.1 NPFRDPRWGRGMETPGEDAFRIQGYVLALVDGLQGGIDPDFYRTLSTCKHFAAYDIENGRTANN GI:58913197 
LSPTQQDMADYYLPMFETCVRDAKVASIMCAYNAVDGVPACADSYLLQDVLRDTYGFTEDFNYV VSDCDAVENVFDPHHYAANLTQAAAMSINAGTDLDCGSSYNVLNASVQAGLTTEATLDKSLIRL YSALVKVGYFDQPAEYNSLGWGNVNTTQSQALAHDAATEGMTLLKNDGTLPLSRTLSNVAVIGP WANVTTQMQGNYAGTAPLLVNPLSVFQQKWRNVKYAQGTAINSQDTSGFNAALSAASSSDVIVY LGGIDISVENEGFDRSSITWPGNQLNLISQLANLGKPLVIVQFGGGQIDDSALLSNSKVNSILW AGYPGQDGGNAIFDVLTGANPPAGRLPVTQYPANYVNNNNIQDMNLRPSNGIPGRTYAWYTGTP VLPFGYGLHYTNFSLSFQSTKTAGSDIATLVNNAGSNKDLATFATIVVNVKNTGGKANLASDYV GLLFLKSTNAGPAPHPNKQLAAYGRVRNVGVGATQQLTLTVNLGSLARADTNGDRWIYPGAYTL ILDVNGPLTFNFTLTGTATKISTLPSRS glycoside MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD 738 hydrolasefamily3 SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW [T.reeseiQM6a] ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG XP_696475.1 EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY GI:5891415 YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK LEFELVGEEVTIENWPLEEQQIKDATPDA Glycoside MPLIRNPILPGFNADPSIVRVGSDYYIATSTFEWYPGVQIHHSTDLANWELAVRPLSRRSQLDL 739 hydrolasefamily RGEPDSCGVWAPCLTHDGDKFWLVYTDVKRKDGSFKDTHNYIVTAPRIEGPWSDPVYANSSGFD 43[T.reeseiQM6a] PSLFHDDDGRKWLVNMVQDHRARPRTFAGIALQEFDPAQGKLVGTRKVVFHGSELGLVEGPHLY XP_6964816.1 KRNGWYYLLTAEGGTGYTHAATLARSRSIWGPYELHPQQHILTSKDHPFAALQRAGHADIVETA GI:58915587 DGKTYLVHLAGRPIGQKRRCVLGRETALQEAYWGEDDWLYVKNGPVPSLDVEVPGVRDEEAYWK EKRYEFHDGLHKDFQWLRTPEPERLFAIEDGKLVLTGRESIGSWFEQSLVARRQTHFSFDAETV IDFSPEDERQFAGLTLYYSRYNFFYLAVSAHSDGRREVQILRSEASWPNGKLEDVGANACYVRI 
PQQGRVKLAATIRGERLQFYYALVAEGGEEQEELQRIGPVLDASIVSDECGGHQAHGSFTGSFV GVACSDVNGTEKRAVFDYFVYRPAHDSTDRYSVSMEGIQRV Family43 MPLIRNPILPGFNADPSIVRVGSDYYIATSTFEWYPGVQIHHSTDLANWELAVRPLSRRSQLDL 740 glycoside RGEPDSCGVWAPCLTHDGDKFWLVYTDVKRKDGSFKDTHNYIVTAPRIEGPWSDPVYANSSGFD hydrolase[T. PSLFHDDDGRKWLVNMVQDHRARPRTFAGIALQEFDPAQGKLVGTRKVVFHGSELGLVEGPHLY reeseiRUTC-3] KRNGWYYLLTAEGGTGYTHAATLARSRSIWGPYELHPQQHILTSKDHPFAALQRAGHADIVETA ETS2497.1 DGKTYLVHLAGRPIGQKRRCVLGRETALQEAYWGEDDWLYVKNGPVPSLDVEVPGVRDEEAYWK GI:572279375 EKRYEFHDGLHKDFQWLRTPEPERLFAIEDGKLVLTGRESIGSWFEQSLVARRQTHFSFDAETV IDFSPEDERQFAGLTLYYSRYNFFYLAVSAHSDGRREVQILRSEASWPNGKLEDVGANACYVRI PQQGRVKLAATIRGERLQFYYALVAEGGEEQEELQRIGPVLDASIVSDECGGHQAHGSFTGSFV GVACSDVNGTEKRAVFDYFVYRPAHDSTDRYSVSMEGIQRV Beta-xylostdase MVNNAALLAALSALLPTALAQNNQTYANYSAQGQPDLYPETLATLTLSFPDCEHGPLKNNLVCD 741 [T.reeseiRUTC-3] SSAGYVERAQALISLFTLEELILNTQNSGPGVPRLGLPNYQVWNEALHGLDRANFATKGGQFEW ET3193.1 ATSFPMPILTTAALNRTLIHQIADIISTQARAFSNSGRYGLDVYAPNVNGFRSPLWGRGQETPG GI:5722896 EDAFFLSSAYTYEYITGIQGGVDPEHLKVAATVKHFAGYDLENWNNQSRLGFDAIITQQDLSEY YTPQFLAAARYAKSRSLMCAYNSVNGVPSCANSFFLQTLLRESWGFPEWGYVSSDCDAVYNVFN PHDYASNQSSAAASSLRAGTDIDCGQTYPWHLNESFVAGEVSRGEIERSVTRLYANLVRLGYFD KKNQYRSLGWKDVVKTDAWNISYEAAVEGIVLLKNDGTLPLSKKVRSIALIGPWANATTQMQGN YYGPAPYLISPLEAAKKAGYHVNFELGTEIAGNSTTGFAKAIAAAKKSDAIIYLGGIDNTIEQE GADRTDIAWPGNQLDLIKQLSEVGKPLVVLQMGGGQVDSSSLKSNKKVNSLVWGGYPGQSGGVA LFDILSGKRAPAGRLVTTQYPAEYVHQFPQNDMNLRPDGKSNPGQTYIWYTGKPVYEFGSGLFY TTFKETLASHPKSLKFNTSSILSAPHPGYTYSEQIPVFTFEANIKNSGKTESPYTAMLFVRTSN AGPAPYPNKWLVGFDRLADIKPGHSSKLSIPIPVSALARVDSHGNRIVYPGKYELALNTDESVK LEFELVGEEVTIENWPLEEQQIKDATPDA Glycoside MALFHLAQARTCLPPYQAQTTYQGCYHDPNSPRDLAGPMLTVGNLNSPQYCANICGAAGYQYSG 742 hydrolase[T. 
VEFTIQCFCGHRIESTSVKADESQCSSPCPADSSKVCGGGNMINIYSISNPSPNPPLTPSFPDC reeseiRUTC-3] TRDPLCSNDVCDTTKSIAERAAAIVKPMTLNEKVANVGSSASGSARLGLPAYQWQNEALHGVAG ETS3636.1 STGVQFQSPLGANFSAATSFPMPILLSAAFDDALVKSVATAISTEARAFANYGFAGLDFWTPNI GI:57228576 NPFRDPRWGRGMETPGEDAFRIQGYVLALVDGLQGGIDPDFYRTLSTCKHFAAYDIENGRTANN LSPTQQDMADYYLPMFETCVRDAKVASIMCAYNAVDGVPACADSYLLQDVLRDTYGFTEDFNYV VSDCDAVENVFDPHHYAANLTQAAAMSINAGTDLDCGSSYNVLNASVQAGLTTEATLDKSLIRL YSALVKVGYFDQPAEYNSLGWGNVNTTQSQALAHDAATEGMTLLKNDGTLPLSRTLSNVAVIGP WANVTTQMQGNYAGTAPLLVNPLSVFQQKWRNVKYAQGTAINSQDTSGFNAALSAASSSDVIVY LGGIDISVENEGFDRSSITWPGNQLNLISQLANLGKPLVIVQFGGGQIDDSALLSNSKVNSILW AGYPGQDGGNAIFDVLTGANPPAGRLPVTQYPANYVNNNNIQDMNLRPSNGIPGRTYAWYTGTP VLPFGYGLHYTNFSLSFQSTKTAGSDIATLVNNAGSNKDLATFATIVVNVKNTGGKANLASDYV GLLFLKSTNAGPAPHPNKQLAAYGRVRNVGVGATQQLTLTVNLGSLARADTNGDRWIYPGAYTL ILDVNGPLTFNFTLTGTATKISTLPSRS ChainB,The XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 743 Three-Dimensional ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS CrystalStructure FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW OfTheCatalytic EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC CoreOf DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS Cellobtohydrolase YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN IFromT.Reesei ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG 10EL_BGI:89287 ChainA,The XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 744 Three-Dimenstional ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS CrystalStructure FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW OfTheCatalytic EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC CoreOf DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS Cellobtohydrolase YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN IFromT.Reesei ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG 1CEL_AGI:89286 ChainA,Three- 
SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME 745 Dimensional QTLADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS StructureOf DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA Cellobtohydrolase NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA FromT.Reesei NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG 3CBH_A TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL GI:157836775 ChainA, TQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 746 DeterminationOf TheThree- Dimensional StructureOfThe C-TerminalDomain Of Cellobtohydrolase IFromT.Reesei. 2CBH_A GI:157834734 ChainA,Three- TQSHYGQCGGIGYSGPTVCASGTTCQVLNPYASQCL 747 Dimensional StructuresOf ThreeEngineered Cellulose-Binding DomainsOf Cellobiohydrolase IFromT.Reesei, Nmr,19Structures lAZK_AGI:15783159 ChainA,Three- TQSHYGQCGGIGYSGPTVCASGTTCQVLNPAYSQCL 748 Dimensional StructuresOf ThreeEngineered Cellulose-Binding DomainsOf Cellobiohydrolase IFromT.Reesei, Nmr,18Structures lAZJ_AGI:15783158 ChainA,Three- TQSHAGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 749 Dimensional StructuresOf ThreeEngineered Cellulose-Binding DomainsOf Cellobiohydrolase IFromT.Reesei, Nmr,14Structures lAZH_AGI:15783156 ChainA,Three- TQSHAGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 750 Dimensional StructuresOf ThreeEngineered Cellulose-Binding DomainsOf Cellobiohydrolase IFromT.Reesei, Nmr,23Structures 1AZ6_AGI:15783153 CellobIohydrolase MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG 751 II[T.Reesei] AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY AAA3421.1 ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV GI:17541 YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA PQAGAWFQAYFVQLLTNANPSFL CellobIohydro1ase GSASYXGNPFVGVSPWANAYYAXEVXXLAIPXLTGAMA 752 
IIcoreprotein, CBHIIcp=3.2.1.91 reesei, PeptidePartIa1, 38aa AAB3868.1GI:5528 ChainA, TQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL 753 DeterminationOf TheThree- Dimensiona1 StructureOfThe C-Termina1Domain Of Cellobiohydro1ase IFromT.Reesei. 1CBH_AGI:15783535 cellobIohydro1ase MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG 754 II[T.Reesei] AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY AAG3998.1 ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV GI:11692747 YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGRLLANHGWSNAFFITDQGRSGKQPTG QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA PQAGAWFQAYFVQLLTNANPSFL ChainA, XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 755 Cellobiohydro1ase ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Ce17a(E223s, FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW A224h,L225v, EPSSNNANTGIGGHGSCCSEMDIWEANSISSHVAPHPCTTVGQEICEGDGCGGTYSDNRYGGTC T226a,D262g) DPDGCGWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS Mutant YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN 1EGN_A ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG GI:14277711 ChainB, XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 756 Cellobtohydro1ase ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Ce17aWithLoop FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW De1etion245-252 EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGGTCDPDGCDWN AndBoundNon- PYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGSYSGNELND Hydrolysable DYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGA Ce11otetraose VRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG 1Q2E_BGI:39654596 ChainA, XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 757 Cellobiohydro1ase 
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Ce17aWithLoop FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW De1etion245-252 EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGGTCDPDGCDWN AndBoundNon- PYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGSYSGNELND Hydrolysable DYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGA Ce11otetraose VRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG 1Q2E_AGI:39654595 ChainA, XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 758 Cellobiohydro1ase ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Ce17aWith FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Disu1phideBridge EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGCGCGGTYSCNRYGGTC AddedAcrossExo- DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS LoopByMutations YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN D241cAndD249c ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG 1Q2B_AGI:39654594 Glycostde MRATSLLAAALAVAGDALAGKIKYLGVAIPGIDFGCDIDGSCPTDTSSVPLLSYKGGDGAGQMK 759 hydro1asefamily5 HFAEDDGLNVFRISATWQFVLNNTVDGKLDELNWGSYNKVVNACLETGAYCMIDMHNFARYNGG [T.reeseiQM6a] IIGQGGVSDDIFVDLWVQIAKYYEDNDKIIFGLMNEPHDLDIEIWAQTCQKVVTAIRKAGATSQ XP_6969897.1 MILLPGTNFASVETYVSTGSAEALGKITNPDGSTDLLYFDVHKYLDINNSGSHAECTTDNVDAF GI:589115749 NDFADWLRQNKRQAIISETGASMEPSCMTAFCAQNKAISENSDVYIGFVGWGAGSFDTSYILTL TPLGKPGNYTDNKLMNECILDQFTLDEKYRPTPTSISTAAEETATATATSDGDAPSTTKPIFRE ETASPTPNAVTKPSPDTSDSSDDDKDSAASMSAQGLTGTVLFTVAALGYMLVAF Glycostde MNKSVAPLLLAASILYGGAAAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT 760 hydro1asefamily5 TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS [T.reeseiQM6a] NNYPDGIGQMQHFVNDDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV EGR512.1 DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV GI:3452785 TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG 
AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK Glycoside MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPPIVVGWTAADLDNGF 761 hydrolase family VSPDAYQNPDIICHKNATNAKGHASVKAGDTILFQWVPVPWPHPGPIVDYLANCNGDCETVDKT 61 reesei TLEFFKIDGVGLLSGGDPGTWASDVLISNNNTWVVKIPDNLAPGNYVLRHEIIALHSAGQANGA QM6a1 QNYPQCFNIAVSGSGSLQPSGVLGTDLYHATDPGVLINIYTSPLNYIIPGPTVVSGLPTSVAQG EGR52697.1 SSAATATASATVPGGGSGPTSRTTTTARTTQASSRPSSTPPATTSAPAGGPTQTLYGQCGGSGY GI:34522464 SGPTRCAPPATCSTLNPYYAQCLN Endo-1,4-beta- MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT 762 glucanase V [T. AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG reesei RUTC-3] YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS ETR998.1 PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP GI:572276454 Chain A, The MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG 763 Structure Of A FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVVECSGSCTTVN Glycoside KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM Hydrolase Family QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG 61 Member, Cel61b From The Hypocrea Jecorina. 2VTC_A GI:19844312 Chain B, The MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG 764 Structure Of A FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVVECSGSCTTVN Glycoside KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM Hydrolase Family QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG 61 Member, Cel61b From The Hypocrea Jecorina.
2VTC_B GI:198443121 Endog1ucanaseVIII MRATSLLAAALAVAGDALAGKIKYLGVAIPGIDFGCDIDGSCPTDTSSVPLLSYKGGDGAGQMK 765 [T.reesei HFAEDDGLNVFRISATWQFVLNNTVDGKLDELNWGSYNKVVNACLETGAYCMIDMHNFARYNGG RUTC-3] IIGQGGVSDDIFVDLWVQIAKYYEDNDKIIFGLMNEPHDLDIEIWAQTCQKVVTAIRKAGATSQ ETR9685.1 MILLPGTNFASVETYVSTGSAEALGKITNPDGSTDLLYFDVHKYLDINNSGSHAECTTDNVDAF GI:572273122 NDFADWLRQNKRQAIISETGASMEPSCMTAFCAQNKAISENSDVYIGFVGWGAGSFDTSYILTL TPLGKPGNYTDNKLMNECILDQFTLDEKYRPTPTSISTAAEETATATATSDGDAPSTTKPIFRE ETASPTPNAVTKPSPDTSDSSDDDKDSAASMSAQGLTGTVLFTVAALGYMLVAF Putative MKLWIGLLLLGLACRASAHTTFTTLFIDKKNQGDGTCVRMPYDDKTATNPVKPITSSDMACGRN 766 endoglucanase[T. GGDPVPFICSAKKGSLLTFEFRLWPDAQQPGSIDPGHLGPCAVYLKKVDNMFSDSAAGGGWFKI reeseiRUTC-3] WEDGYDSKTQKWCVDRLVKNNGLLSVRLPRGLPAGYYIVRPEILALHWAAHRDDPQFYLGCAQI ETS3449.1 FVDSDVRGPLEIPRRQQATIPGYVNAKTPGLTFDIYQDKLPPYPMPGPKVYIPPAKGNKPNQDL GI:57228352 NAGRLVQTDGLIPKDCLIKKANWCGRPVEPYSSARMCWRAVNDCYAQSKKCRESSPPIGLTNCD RWSDHCGKMDALCEQEKYKGPPKFTEKEYVVPAPGKLPEMWNDIFERLEQNGTSTKFF EndoglucanaseVII MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG 767 [T.reesei FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVAECSGSCTTVN RUTC-3] KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM ETS3833.1 QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG GI:57228773 EndoglucanaseIII MNKSVAPLLLAASILYGGAAAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT 768 [T.reeseiRUTC-3] TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS ETS4885.1 NNYPDGIGQMQHFVNDDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV GI:572281861 DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK Putative MKLLSIASLLSLVATAQAHMEVSWPPVFRSKYNPRVPGNLINYDMTSPLNADGSNYPCKGYQVD 769 endoglucanase[T. 
VGRPEGAPGVTWRAGGTYNLTVAGSATHSGGSCQASLSYDRGRTWVVVHSWIGGCPLTPTWDFT reeseiRUTC-3] LPNDTPPGEALFAWTWFNRIGNREMYMNCGAVTIRPSGRAARSPADSIYNRPAQFVANVNNGCA ETS538.1 TLEGADVLFPSPGPDTDFDSDRTAAPVGKCGASSRRTRPVRA GI:572282294 Endoglucanase-5 MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT 770 (CellulaseV; AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG Endo-1,4-beta- YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS glucanaseV;EGV; PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP EndoglucanaseV; P43317.1GI:117136 ChainA,Active- XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 771 SiteMutantE212q ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS DeterminedAtPh FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW 6.WithNoLigand EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC BoundInThe DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS ActiveSite YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN 2CEL_AGI:194214 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainB,Active- XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 772 SiteMutantE212q ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS DeterminedAtPh FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW 6.WithNoLigand EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC BoundInThe DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS ActiveSite YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN 2CEL_BGI:194215 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Active- XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 773 SiteMutantE212q ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS DeterminedAtPh FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW 6.WIthCellobiose EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC BoundInThe 
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS ActiveSite YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN 3CEL_A ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG GI:157836779 ChainA,Active- XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 774 siteMutantD214n ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS DeterminedAtPh FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW 6.WithNoLigand EPSSNNANTGIGGHGSCCSEMNIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC BoundInThe DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS ActiveSite YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN 4CEL_AGI:1941941 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainB,Active- XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 775 siteMutantD214n ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS DeterminedAtPh FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW 6.WithNoLigand EPSSNNANTGIGGHGSCCSEMNIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC BoundInThe DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS ActiveSite YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN 4CEL_BGI:1941942 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG Endoglucanase-4 MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPPIVVGWTAADLDNGF 776 (CellulaseIV; VSPDAYQNPDIICHKNATNAKGHASVKAGDTILFQWVPVPWPHPGPIVDYLANCNGDCETVDKT Cellulase-61A; TLEFFKIDGVGLLSGGDPGTWASDVLISNNNTWVVKIPDNLAPGNYVLRHEIIALHSAGQANGA Ce161A;Endo-1,4- QNYPQCFNIAVSGSGSLQPSGVLGTDLYHATDPGVLINIYTSPLNYIIPGPTVVSGLPTSVAQG beta-glucanaseIV; SSAATATASATVPGGGSGPTSRTTTTARTTQASSRPSSTPPATTSAPAGGPTQTLYGQCGGSGY EGIV; SGPTRCAPPATCSTLNPYYAQCLN EndoglucanaseIV; Endoglucanase-61A) 01445.1 GI:21263647 EndoglucanaseI MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH 777 precursor[T. 
DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS reeseiRUTC-3] PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY ETS775.1 CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK GI:57228411 SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC QYSNDYYSQCL ChainA,Cbh1 XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 778 (E217p)InComplex ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS WithCellohexaose FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW AndCellobiose EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC 7CEL_A DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS GI:157837135 YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Cbh1 XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 779 (E212p) ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Cellotetraose FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Complex EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC 5CEL_AGI:1578372 DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Cbh1 XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 780 (E212p) ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Cellopentaose FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Complex EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC 6CEL_AGI:15783787 DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG EndoglucanaseEG- 
MNKSVAPLLLAASILYGGAVAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT 781 II(EGLII; TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS Cellulase;Endo- NNYPDGIGQMQHFVNEDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV 1,4-beta- DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV glucanase( TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH P7982.1GI:121794 AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG AGSFDSTYVLTETPTSSGNSWTDTSLVSSCLARK EndoglucanaseEG- MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH 782 1;Cellulase; DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS Endo-1,4-beta- PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY glucanase; CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK P7981.1GI:121788 SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC QYSNDYYSQCL endo-1,4-beta- MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT 783 glucanaseV(EGV) AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG [T.reesei] YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS CAA838461 PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP GI:485864 beta-1,4-glucanase MKFLQVLPALIPAALAQTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADW 784 [T.reesei] QWSGGQNNVKSYQNSQIAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYS ABV71388.1 GDYELMIWLGKYGDIGPIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFF GI:15777972 NYLRDNKGYNAAGQYVLSYQFGTEPFTGSGTLNVASWTASIN Cellbiohydrolase MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG 785 II[T.reesei] AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY ADC83999.1 ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV GI:289152138 YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT 
PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG QQQWGDWCNVTGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA PQAGAWFQAYFVQLLTNANPSFL Endo-beta-1,4- MKFLQVLPALIPAALAQTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADW 786 glucanase QWSGGQNNVKSYQNSQIAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYS [T.reesei] GDYELMIWLGKYGDIGPIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFF BAA214.1 NYLRDNKGYNAAGQYVLSYQFGTEPFTGSGTLNVASWTASIN GI:2116583 ChainA,Crystal MGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGSNNYPDGIGQMQHFVNEDGMTIFRLPV 787 StructureOfCel5a GWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIVDIHNYARWNGGIIGQGGPTNAQFTSL Eg2)From WSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVVTAIRNAGATSQFISLPGNDWQSAGAF HypocreaJecorina ISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTHAECTTNNIDGAFSPLATWLRQNNRQA (T.Reesei) ILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWGAGSFDSTYVLTETPTSSGNSWTDTSL 3QR3_AGI:39981273 VSSCLARKGGSGSGHHHHHH ChainB,Crystal MGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGSNNYPDGIGQMQHFVNEDGMTIFRLPV 788 StructureOfCel5a GWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIVDIHNYARWNGGIIGQGGPTNAQFTSL (Eg2)From WSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVVTAIRNAGATSQFISLPGNDWQSAGAF HypocreaJecorina ISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTHAECTTNNIDGAFSPLATWLRQNNRQA (T.Reesei) ILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWGAGSFDSTYVLTETPTSSGNSWTDTSL 3QR3_BGI:39981274 VSSCLARKGGSGSGHHHHHH Ce174a[T.reesei] MKVSRVLALVLGAVIPAHAAFSWKNVKLGGGGGFVPGIIFHPKTKGVAYARTDIGGLYRLNADD 789 AAP57752.1 SWTAVTDGIADNAGWHNWGIDAVALDPQDDQKVYAAVGMYTNSWDPSNGAIIRSSDRGATWSFT GI:317471 NLPFKVGGNMPGRGAGERLAVDPANSNIIYFGARSGNGLWKSTDGGVTFSKVSSFTATGTYIPD PSDSNGYNSDKQGLMWVTFDSTSSTTGGATSRIFVGTADNITASVYVSTNAGSTWSAVPGQPGK YFPHKAKLQPAEKALYLTYSDGTGPYDGTLGSVWRYDIAGGTWKDITPVSGSDLYFGFGGLGLD LQKPGTLVVASLNSWWPDAQLFRSTDSGTTWSPIWAWASYPTETYYYSISTPKAPWIKNNFIDV TSESPSDGLIKRLGWMIESLEIDPTDSNHWLYGTGMTIFGGHDLTNWDTRHNVSIQSLADGIEE FSVQDLASAPGGSELLAAVGDDNGFTFASRNDLGTSPQTVWATPTWATSTSVDYAGNSVKSVVR 
VGNTAGTQQVAISSDGGATWSIDYAADTSMNGGTVAYSADGDTILWSTASSGVQRSQFQGSFAS VSSLPAGAVIASDKKTNSVFYAGSGSTFYVSKDTGSSFTRGPKLGSAGTIRDIAAHPTTAGTLY VSTDVGIFRSTDSGTTFGQVSTALTNTYQIALGVGSGSNWNLYAFGTGPSGARLYASGDSGASW TDIQGSQGFGSIDSTKVAGSGSTAGQVYVGTNGRGVFYAQGTVGGGTGGTSSSTKQSSSSTSSA SSSTTLRSSVVSTTRASTVTSSRTSSAAGPTGSGVAGHYAQCGGIGWTGPTQCVAPYVCQKQND YYYQCV Ce161b[T.reesei] MKSCAILAALGCLAGSVLGHGQVQNFTINGQYNQGFILDYYYQKQNTGHFPNVAGWYAEDLDLG 790 AAP57753.1 FISPDQYTTPDIVCHKNAAPGAISATAAAGSNIVFQWGPGVWPHPYGPIVTYVVECSGSCTTVN GI:31747162 KNNLRWVKIQEAGINYNTQVWAQQDLINQGNKWTVKIPSSLRPGNYVFRHELLAAHGASSANGM QNYPQCVNIAVTGSGTKALPAGTPATQLYKPTDPGILFNPYTTITSYTIPGPALWQG Ce15b[T.reesei] MRATSLLAAALAVAGDALAGKIKYLGVAIPGIDFGCDIDGSCPTDTSSVPLLSYKGGDGAGQMK 791 AAP57754.1 HFAEDDGLNVFRISATWQFVLNNTVDGKLDELNWGSYNKVVNACLETGAYCMIDMHNFARYNGG GI:31747164 IIGQGGVSDDIFVDLWVQIAKYYEDNDKIIFGLMNEPHDLDIEIWAQTCQKVVTAIRKAGATSQ MILLPGTNFASVETYVSTGSAEALGKITNPDGSTDLLYFDVHKYLDINNSGSHAECTTDNVDAF NDFADWLRQNKRQAIISETGASMEPSCMTAFCAQNKAISENSDVYIGFVGWGAGSFDTSYILTL TPLGKPGNYTDNKLMNECILDQFTLDEKYRPTPTSISTAAEETATATATSDGDAPSTTKPIFRE ETASPTPNAVTKPSPDTSDSSDDDKDSAASMSAQGLTGTVLFTVAALGYMLVAF EndoglucanaseI MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH 792 132152A DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS GI:22541 PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC QYSNDYYSQCL EndoglucanaseIV MIQKLSNLLVTALAVATGVVGHGHINDIVINGVWYQAYDPTTFPYESNPPIVVGWTAADLDNGF 793 [Trichoderma VSPDAYQNPDIICHKNATNAKGHASVKAGDTILFQWVPVPWPHPGPIVDYLANCNGDCETVDKT reesei] TLEFFKIDGVGLLSGGDPGTWASDVLISNNNTWVVKIPDNLAPGNYVLRHEIIALHSAGQANGA CAA71999.1 QNYPQCFNIAVSGSGSLQPSGVLGTDLYHATDPGVLINIYTSPLNYIIPGPTVVSGLPTSVAQG GI:2315274 
SSAATATASATVPGGGSGPTSRTTTTARTTQASSRPSSTPPATTSAPAGGPTQTLYGQCGGSGY SGPTRCAPPATCSTLNPYYAQCLN EndoglucanaseI MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH 794 [Trichoderma DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS reesei] PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY ADM8177.1 CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFSPYGSGYK GI:3329711 SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDVPSAQPGGDTISSCPSASA YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC QYSNDYYSQCL EndoglucanaseII MNKSVAPLLLAASILYGGAVAQQTVWGQCGGIGNSGPTNCAPGSACSTLNPYYAQCIPGATTIT 795 precursor TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS [Trichoderma NNYPDGIGQMQHFVNEDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV reesei] DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV AAA34213.1 TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH GI:17549 AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG AGSFDSTYVLTETPTSSGNSWTDTSLVSSCLARK EndoglucanaseII MNKSVAPLLLAASILYGGAVAQQTVWGQCGGIGWSGPTNCAPGSACSTLNPYYAQCIPGATTIT 796 [Trichoderma TSTRPPSGPTTTTRATSTSSSTPPTSSGVRFAGVNIAGFDFGCTTDGTCVTSKVYPPLKNFTGS reesei] NNYPDGIGQMQHFVNEDGMTIFRLPVGWQYLVNNNLGGNLDSTSISKYDQLVQGCLSLGAYCIV ABA64553.1 DIHNYARWNGGIIGQGGPTNAQFTSLWSQLASKYASQSRVWFGIMNEPHDVNINTWAATVQEVV GI:77176916 TAIRNAGATSQFISLPGNDWQSAGAFISDGSAAALSQVTNPDGSTTNLIFDVHKYLDSDNSGTH AECTTNNIDGAFSPLATWLRQNNRQAILTETGGGNVQSCIQDMCQQIQYLNQNSDVYLGYVGWG AGSFDSTYVLTETPTGSGNSWTDTSLVSSCLARK ChainA,TheX-Ray XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ 797 CrystalStructure IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG OfT.Reesei PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV Family12 LSYQFGTEPFTGSGTLNVASWTASIN Endoglucanase3, Cel12a,At1.9A Resolution 1H8V_AGI:14278359 ChainB,TheX-Ray 
XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ 798 CrystalStructure IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG OfT.Reesei PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV Family12 LSYQFGTEPFTGSGTLNVASWTASIN Endoglucanase3, Cel12a,At1.9A Resolution 1H8V_BGI:142783 ChainC,TheX-Ray XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ 799 CrystalStructure IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG OfT.Reesei PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV Family12 LSYQFGTEPFTGSGTLNVASWTASIN Endoglucanase3, Cel12a,At1.9A Resolution 1H8V_CGI:14278361 ChainD,TheX-Ray XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ 800 CrystalStructure IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG OfT.Reesei PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV Family12 LSYQFGTEPFTGSGTLNVASWTASIN Endoglucanase3, Cel12a,At1.9A Resolution 1H8V_DGI:14278362 ChainE,TheX-Ray XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ 801 CrystalStructure IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG OfTheT.Reesei PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV Family12 LSYQFGTEPFTGSGTLNVASWTASIN Endoglucanase3, Cel12a,At1.9A Resolution 1H8V_EGI:14278363 ChainF,TheX-Ray XTSCDQWATFTGNGYTVSNNLWGASAGSGFGCVTAVSLSGGASWHADWQWSGGQNNVKSYQNSQ 802 CrystalStructure IAIPQKRTVNSISSMPTTASWSYSGSNIRANVAYDLFTAANPNHVTYSGDYELMIWLGKYGDIG OfTheT.Reesei PIGSSQGTVNVGGQSWTLYYGYNGAMQVYSFVAQTNTTNYSGDVKNFFNYLRDNKGYNAAGQYV Family12 LSYQFGTEPFTGSGTLNVASWTASIN Endoglucanase3, Cel12a,At1.9A Resolution 1H8V_FGI:14278364 EndoglucanaseI MAPSVTLPLTTAILAIARLVAAQQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMH 803 precursor DANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVS [T.reesei] PRLYLLDSDGEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGY AAA34212.1 CDAQCPVQTWRNGTLNTSHQGFCCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYK GI:17547 
SYYGPGDTVDTSKTFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASA YGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNI RWGDIGSTTNSTAPPPPPASSTTFSTTRRSSTTSSSPSCTQTHWGQCGGIGYSGCKTCTSGTTC QYSNDYYSQCL ChainA,Solution SCTQTHWGQCGGIGYSGCKTCTSGTTCQYSNDYYSQCL 804 StructureOfThe Cellulose-binding DomainOf EndoglucanaseI FromT.ReeseiAnd ItsInteraction WithCello- oligosaccharides 4BMF_AGI:5743 ChainA, XQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMHDANYNSCTVNGGVNTTLCPDEA 805 EndoglucanaseI TCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVSPRLYLLDSDGEYVMLKLNGQEL FromTrichoderma SFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGYCDAQCPVQTWRNGTLNTSHQGF Reesei CCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYKSYYGPGDTVDTSKTFTIITQFN lEGl_AGI:239236 TDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASAYGGLATMGKALSSGMVLVFSIW NDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNIRWGDIGSTT ChainC, XQPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMHDANYNSCTVNGGVNTTLCPDEA 806 EndoglucanaseI TCGKNCFIEGVDYAASGVTTSGSSLTMNQYMPSSSGGYSSVSPRLYLLDSDGEYVMLKLNGQEL FromTrichoderma SFDVDLSALPCGENGSLYLSQMDENGGANQYNTAGANYGSGYCDAQCPVQTWRNGTLNTSHQGF Reesei CCNEMDILEGNSRANALTPHSCTATACDSAGCGFNPYGSGYKSYYGPGDTVDTSKTFTIITQFN lEGl_CGI:239237 TDNGSPSGNLVSITRKYQQNGVDIPSAQPGGDTISSCPSASAYGGLATMGKALSSGMVLVFSIW NDNSQYMNWLDSGNAGPCSSTEGNPSNILANNPNTHVVFSNIRWGDIGSTT Endoglucanase MRSFSLLGSLSLLTSLSWALPTEGVISKLEGRQSGSSWFLPNIDHTTGAVRGYVPNLFNSAGQQ 807 [Trichoderma NFTYPVYKTVASGDSAGFVNALYSDGPSGGQRDNCYLAGEPRVIYLPPGTYTVSSTIFFDTDTV reeseiRUTC-3] IIGDAANPPTIKAAAGFNGDYLIVGGQGDGDSHPCGGSGGETHFSVMIKNVILDTTANAGSSGF ETS6856.1 TALSWAVAQNCALVNVKINMPQGVHTGMLVSGGSTISISDVSFNFGNIGLHWNGHQQGQIKGMT GI:572283882 FTDCTNGIFIDSGFTISIFAPTCNTVGRCIVLNSGNAWVAVIDGQSINSGDFFTSNVGFPNFML ENISKDTTNSNMVVVGGNVKVGGSTSLGTYVYGNTRGANPVYQTNPTSQPVNRPAALAPGGRYP VINAPQYADKTVANVVNLKDPNQNGGHTLQGDGFTDDTAALQGALNTAASQGKIAYLPFGIYIV KSTITIPPGTELYGEAWSTISGSGSAFSSETNPTPVVQIGATPGQKGVAHVQDIRFTVNEALPG AILLRINMAGNNPGDVAVFNSLNTIGGTRDTSISCSSESNCRAAYLGLHLAAGSSAYIDNFWSW VADHATDQSGKGTRTAVKGGVLVEATAGTWLTGLGSEHNWLYQLSFHNAANVFISLFQSETNYN 
QGNNGAPLPGTPFDATSIDPNFSWCSGGDTVCRMGLAQYYTGSNSNIFHYAAGSWNFIGLTKVN QGLMNFIQSTISNAHLYGFTSGPNTGETMRLPNGVEFGNGGNDGYGGSWGTLIANIASQS Endoglucanase, MKATLVLGSLIVGAVSAYKATTTRYYDGQEGACGCGSSSGAFPWQLGIGNGVYTAAGSQALFDT 808 partial AGASWCGAGCGKCYQLTSTGQAPCSSCGTGGAAGQSIIVMVTNLCPNNGNAQWCPVVGGTNQYG [T.reesei] YSYHFDIMAQNEIFGDNVVVDFEPIACPGQAASDWGTCLCVGQQETDPTPVLGNDTGSTPPGSS AHK2346.1 PPATSSSPPSGGGQQTLYGQCGGAGWTGPTTCQAPGTCKVQNQWYSQCLP GI:58829453 ChainA,Hypocrea XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 809 JecorinaCel7a ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS E212qMutantIn FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW ComplexWithP- EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC nitrophenyl DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS Cellobioside YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN 4UWT_A ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG GI:922664681 ChainA,0- XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 810 nitropenyl ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS CellobiosideAsAn FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW ActiveSiteProbe EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC ForFamily7 DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQDGVTFQQPNAELGS Cellobiohydrolases YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN 4VZ_AGI:931139719 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Hypocrea XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 811 JecorinaCel7a ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS (wildType)Soaked FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW WithXylopentaose. 
EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC 4D5Q_A DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS GI:783282859 YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Cbh1In XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 812 ComplexWithS- ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS propranolol FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW 1DY4_AGI:1284415 EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG glycoside MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG 813 hydrolasefamily6 AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY [T.reeseiQM6a1] ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT XP_696258.1 PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL GI:58911115 RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA PQAGAWFQAYFVQLLTNANPSFL glycoside MYRKLAVISAFLATARAQSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSS 814 hydro1asefamily7 TNCYDGNTWSSTLCPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLM [T.reeseiQM6a] ASDTTYQEFTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQ XP_6969224.1 CPRDLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE GI:58911443 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRY YVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLW DDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGN PSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQ CL glycoside MYRKLAVISAFLATARAQSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSS 815 
hydro1asefamily7 TNCYDGNTWSSTLCPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLM [T.reeseiQM6a] ASDTTYQEFTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQ EGR44817.1 CPRDLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE GI:34514556 GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRY YVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLW DDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGN PSGGNPPGGNPPGTTTTRRPATTTGSSPGPTQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQ CL glycoside MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG 816 hydro1asefamily6 AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY [T.reeseiQM6a] ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV EGR5117.1 YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT GI:3452782 PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA PQAGAWFQAYFVQLLTNANPSFL ChainA,Hypocrea XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 817 Jecorina ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Cellobiohydro1ase FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Ce17aE212qSoaked EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC WithXy1otriose. DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS 4D5I_A YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN GI:783282849 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChanA,Hypocrea XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 818 Jecorina ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Cellobiohydrolase FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Cel7aE217qSoaked EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC WithXylopentaose. 
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS 4D5P_A YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN GI:783282856 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Hypocrea XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 819 JecorinaCe17aIn ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS ComplexWith(R)- FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Dihydroxy- EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC Phenanthrenolol DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS 2V3I_AGI:19356563 YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Hypocrea XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 820 JecorinaCe17aIn ETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS ComplexWith(S)- FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Dihydroxy- EPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC Phenanthrenolol DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS 2V3R_AGI:19356564 YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Michaelis XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 821 ComplexOf ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS HypocreaJecorina FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Cel7aE217qMutant EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC WithCellononaose DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS SpanningThe YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN ActiveSite ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG 4C4C_AGI:57215318 ChainA,Covalent XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 822 Glycosyl-enzyme 
ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS IntermediateOf FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW HypocreaJecorina EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC Ce17aE217qMutant DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS TrappedUsingDnp- YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN 2-deoxy-2-fluoro- ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG cellotrioside 4C4D_A GI:572153181 ChainA,Ce16a SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME 823 D175aMutant QTLADIRTANKNGGNYAGQFVVYDLPDRACAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS 1HGW_A DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA GI:1865599 NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL ChainB,Ce16a SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME 824 D175aMutant QTLADIRTANKNGGNYAGQFVVYDLPDRACAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS 1HGW_B DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA GI:1865591 NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL CainA,Ce16a SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME 825 D221aMutant QTLADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS 1HGY_A DIRTLLVIEPASLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA GI:18655911 NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL ChainB,Ce16a SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME 826 D221aMutant QTLADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS 1HGY_B DIRTLLVIEPASLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA GI:18655912 
NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL Cellobiohydro1ase, ESACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 827 betag1ucan ETCAKNCCLDGAAYTSAYSSZPGGGGGVVIFFKNVGARLYLMASDTTYQEFTLLGNEFSFDVDV 13195A SQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGWEPSSN GI:223874 NANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTCDPDGC DWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQDGVTFQQPNAELGSYSGNE LNDDYCTAEEAEFGGSSFSDKGGLTQFXXATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSST PGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSGGNPPGGNPPGTTTTTTTSS SZPPPGAHRRYGQCGGIGYSGPTVCASGTTCQVLNPYYSQCL ChainA,Cel6a SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME 828 (y169f)WithA QTLADIRTANKNGGNYAGQFVVFDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS Non-hydrolysable DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA Cellotetraose NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA 1QJW_AGI:6137482 NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL ChainB,Cel6a SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME 829 (y169f)WithA QTLADIRTANKNGGNYAGQFVVFDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS Non-hydrolysable DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA Cellotetraose NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA 1QJW_BGI:6137483 NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL ChainA,Cel6aIn TATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQT 830 ComplexWIthM- LADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDI iodobenzylBeta-d- RTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQ glucopyranosyl- DPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANH Beta(1,4)-d- 
GWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTS xylopyranoside DSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL 1QK_AGI:6137484 ChainB,Cel6aIn TATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQT 831 ComplexWithM- LADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDI iodobenzylBeta-d- RTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQ glucopyranosyl- DPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANH Beta(1,4)-d- GWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTS xylopyranoside DSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL 1QK_BGI:6137485 ChainA,WildType TATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQT 832 Ce16aWithANon- LADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDI hydrolysable RTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQ Cellotetraose DPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANH 1QK2_AGI:6137486 GWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTS DSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL ChainB,WildType TATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQT 833 Ce16aWithANon- LADIRTANKNGGNYAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDI hydrolysable RTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQ Cellotetraose DPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANH 1QK2_BGI:6137487 GWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTS DSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL Exoglucanase2 MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG 834 (alsoknownas AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY 1,4-beta_ ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV cellobiohydrolase; YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT exocellobiohydrola PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL seII;CBHII; RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG 
ExoglucanaseII QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA P7987.1GI:121855 PQAGAWFQAYFVQLLTNANPSFL Exoglucanase1 MYRKLAVISAFLATARAQSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSS 835 (alsoknownas TNCYDGNTWSSTLCPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIGFVTQSAQKNVGARLYLM 1,4-beta- ASDTTYQEFTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQ cellobIohydrolase; CPRDLKFINGQANVEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTPHPCTTVGQEICE ExocellobIohydrola GDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRY seI;CBHI; YVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLW AltName: DDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGN Full=Exoglucanase PSGGNPPGGNRGTTTTRRPATTTGSSPGPTQSHYGQCGGIGYSGPTVCASGTTCQVLNPYYSQC I L P62694.1GI:542144 Unnamedprotein MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG 836 product AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY [T.reesei] ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV CAV28333.1 YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT GI:21829938 PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA PQAGAWFQAYFVQLLTNANPSFL CellobIohydrolase MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG 837 II AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY 134188A ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV GI:225475 YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTG QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA PQAGAWFQAYFVQLLTNANPSFL Unnamedprotein MIVGILTTLATLATLAASVPLEERQACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG 838 [Trichoderma 
AASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGNPFVGVTPWANAYY reesei] ASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGNYAGQFVV AAA72922.1 YDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGT GI:17543 PKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRAL RGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGRLLANHGWSNAFFITDQGRSGKQPTG QQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALPDALQPA AQAGAWFQAYFVQLLTNANPSFL ChainA,Hypocrea XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 839 Jecortna ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Cellobtohydro1ase FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Ce17aE217qSoaked EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC WithXy1otrtose. DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS 4D5J_A YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN GI:783282851 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Hypocrea XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 840 Jecortna ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Cellobiohydro1ase FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Ce17aE212qSoaked EPSSNNANTGIGGHGSCCSQMDIWEANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC WithXy1opentaose. DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS 4D5O_A YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN GI:783282853 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA,Hypocrea XSACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGNTWSSTLCPDN 841 Jecortna ETCAKNCCLDGAAYASTYGVTTSGNSLSIDFVTQSAQKNVGARLYLMASDTTYQEFTLLGNEFS Cellobtohydro1ase FDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGW Ce17aE217qSoaked EPSSNNANTGIGGHGSCCSEMDIWQANSISEALTPHPCTTVGQEICEGDGCGGTYSDNRYGGTC WithXy1otetraose. 
DPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQFETSGAINRYYVQNGVTFQQPNAELGS 4D5V_A YSGNELNDDYCTAEEAEFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTN GI:783282861 ETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFGPIGSTGNPSG ChainA, SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME 842 Cellobiohydrolase QTLADIRTANKNGGNYAGQFVVFDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS Ii,Catalytic DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA Domain,Mutant NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA Y169f NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG 1CB2_AGI182776 TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL ChainB, SGTATYSGNPFVGVTPWANAYYASEVSSLAIPSLTGAMATAAAAVAKVPSFMWLDTLDKTPLME 843 Cellobiohydrolase QTLADIRTANKNGGNYAGQFVVFDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYS Ii,Catalytic DIRTLLVIEPDSLANLVTNLGTPKCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPA Domain,Mutant NQDPAAQLFANVYKNASSPRALRGLATNVANYNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLA Y169f NHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGIRPSANTGDSLLDSFVWVKPGGECDG 1CB2_BGI:182777 TSDSSAPRFDSHCALPDALQPAPQAGAWFQAYFVQLLTNANPSFL SS2c-G10fusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 846 protein NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMQPSTATAAPKEKTSSEKKDNYIIKGVFWDPACVIA SS2c-G10fusion 
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 848 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC 
AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGCAACCAT CTACCGCTACCGCCGCTCCAAAAGAAAAGACCAGCAGTGAAAAGAAGGACAACTATATTATCAA AGGTGTCTTCTGGGACCCAGCATGTGTTATTGCTTAG SS2c-G10fusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 1024 protein NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMQPSTATAAPKEKTSSEKKDNYIIKGVFWDPACVIA- SS2e-A7bfusion ATGTCTGCATCTACTACAAGTTTAGAGGAATATCAAAAAACTTTCCTTGAACTGGGATTAGAAT 849 protein,coding GCAAAGCACTAAGATTTGGGTCATTCAAGCTGAATTCAGGCAGGCAGTCGCCATATTTTTTCAA sequence TCTTAGTTTGTTCAATTCTGGAAAGCTGTTGGCAAACCTTGCCACCGCGTATGCAACTGCTATC ATTCAATCGGAGCTTAAATTCGATGTTATTTTCGGACCTGCTTACAAAGGGATCCCTTTGGCTG CTATTGTATGCGTTAAACTAGCAGAAATCGGGGGCACTAAATTTCAAGGTATTCAATATGCTTT TAATAGAAAGAAAGTTAAAGACCACGGCGAAGGTGGTATTATTGTTGGAGCATCGCTTGAAGAC AAGAGGGTGTTGATTATCGACGATGTCATGACTGCAGGAACTGCAATCAATGAAGCATTTGAGA TAATCAGTATTGCTCAAGGTAGGGTAGTGGGTTGTATTGTTGCTTTAGATAGGCAAGAAGTGAT 
TCATGAATCTGATCCGGAAAGAACAAGTGCTACCCAATCTGTTTCAAAGAGATACAACGTTCCT GTGCTAAGTATTGTATCACTGACTCAAGTGGTACAATTTATGGGAAATAGACTATCACCAGAGC AAAAATCAGCGATTGAAAACTACCGTAAGGCCTATGGTATATGA SS2e-A7bfusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 850 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA SS2e-A7bfusion AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG 851 protein TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT 
TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTCTGCAT CTACTACAAGTTTAGAGGAATATCAAAAAACTTTCCTTGAACTGGGATTAGAATGCAAAGCACT AAGATTTGGGTCATTCAAGCTGAATTCAGGCAGGCAGTCGCCATATTTTTTCAATCTTAGTTTG TTCAATTCTGGAAAGCTGTTGGCAAACCTTGCCACCGCGTATGCAACTGCTATCATTCAATCGG AGCTTAAATTCGATGTTATTTTCGGACCTGCTTACAAAGGGATCCCTTTGGCTGCTATTGTATG CGTTAAACTAGCAGAAATCGGGGGCACTAAATTTCAAGGTATTCAATATGCTTTTAATAGAAAG AAAGTTAAAGACCACGGCGAAGGTGGTATTATTGTTGGAGCATCGCTTGAAGACAAGAGGGTGT TGATTATCGACGATGTCATGACTGCAGGAACTGCAATCAATGAAGCATTTGAGATAATCAGTAT TGCTCAAGGTAGGGTAGTGGGTTGTATTGTTGCTTTAGATAGGCAAGAAGTGATTCATGAATCT GATCCGGAAAGAACAAGTGCTACCCAATCTGTTTCAAAGAGATACAACGTTCCTGTGCTAAGTA TTGTATCACTGACTCAAGTGGTACAATTTATGGGAAATAGACTATCACCAGAGCAAAAATCAGC GATTGAAAACTACCGTAAGGCCTATGGTATATGA MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI 
KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMSASTTSLEEYQKTFLELGLECKALRFGSFKLNSGRQSPYFFNLSL FNSGKLLANLATAYATAIIQSELKFDVIFGPAYKGIPLAAIVCVKLAEIGGTKFQGIQYAFNRK KVKDHGEGGIIVGASLEDKRVLIIDDVMTAGTAINEAFEIISIAQGRVVGCIVALDRQEVIHES DPERTSATQSVSKRYNVPVLSIVSLTQVVQFMGNRLSPEQKSAIENYRKAYGI- SS2d-G11fusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 853 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG 
TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGACCGAAT TAGATTATCAAGGAACTGCTGAGGCGGCTTCTACCTCGTATAGTCGAAATCAAACGGACCTTAA GCCGTTTCCTTCTGCAGGCAGTGCATCTTCATCAATTAAAACGACGGAACCTGTGAAAGATCAT AGAAGAAGGCGTTCTTCCAGCATAATTTCACATGTGGAACCGGAGACTTTTGAAGATGAAAATG ACCAGCAACTTCTACCAAATATGAATGCTACTTGGGTAGACCAACGCGGCGCTTGGATTATTCA TGTGGTCATTATCATACTGCTGAAACTATTTTATAATTTATTTCCTGGTGTTACCACAGAATGG TCGTGGACTCTGACTAATATGACATATGTTATTGGGTCCTATGTCATGTTCCATCTGATTAAGG GTACCCCTTTCGATTTCAATGGTGGTGCTTATGACAACTTGACGATGTGGGAACAAATTGACGA CGAGACTTTATATACTCCTTCAAGAAAATTTTTGATTAGTGTCCCGATCGCCCTATTCTTAGTT AGTACTCATTATGCTCACTATGATTTGAAATTGTTTTCATGGAATTGTTTTTTGACAACCTTTG GTGCTGTTGTCCCAAAGTTACCTGTTACTCATAGATTAAGGATTTCTATCCCAGGTATCACAGG TCGCGCCCAAATTAGTTGA SS2d-G11fusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 854 protein NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL 
IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMTELDYQGTAEAASTSYSRNQTDLKPFPSAGSASSSIKTTEPVKDH RRRRSSSIISHVEPETFEDENDQQLLPNMNATWVDQRGAWIIHVVIIILLKLFYNLFPGVTTEW SWTLTNMTYVIGSYVMFHLIKGTPFDFNGGAYDNLTMWEQIDDETLYTPSRKFLISVPIALFLV STHYAHYDLKLFSWNCFLTTFGAVVPKLPVTHRLRISIPGITGRAQIS SS2e-A7afusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 855 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG 
ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGCCAAGAG TAGCTATCATCATTTACACACTATATGGTCACGTTGCTGCCACCGCAGAGGCAGAAAAGAAGGG AATTGAAGCCGCTGGAGGCTCTGCAGACATTTATCAAGTCGAGGAAACGTTGTCTCCAGAAGTT GTTAAGGCGCTTGGCGGTGCTCCAAAGCCAGATTACCCAATTGCCACTCAAGATACGTTGACAG AATATGATGCCTTTTTGTTTGGTATTCCAACTAGATTTGGTAACTTCCCTGCTCAATGGAAGGC TTTCTGGGACCGTACCGGTGGGTTGTGGGCTAAGGGTGCTTTGCATGGTAAGGTCGCTGGTTGT TTCGTCTCCACCGGAACTGGTGGTGGTAATGAAGCCACAATTATGAACTCTTTGTCTACTTTGG CTCATCACGGTATCATTTTTGTCCCATTGGGTTACAAGAATGTTTTCGCTGAATTGACCAATAT GGATGAAGTTCACGGTGGTTCACCATGGGGTGCGGGTACCATTGCAGGCAGTGACGGTTCAAGA TCTCCTTCCGCCTTGGAATTACAAGTACACGAAATTCAAGGCAAGACTTTCTACGAAACCGTTG CAAAGTTTTGA SS2e-A7afusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 856 protein NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR 
NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMPRVAIIIYTLYGHVAATAEAEKKGIEAAGGSADIYQVEETLSPEV VKALGGAPKPDYPIATQDTLTEYDAFLFGIPTRFGNFPAQWKAFWDRTGGLWAKGALHGKVAGC FVSTGTGGGNEATIMNSLSTLAHHGIIFVPLGYKNVFAELTNMDEVHGGSPWGAGTIAGSDGSR SPSALELQVHEIQGKTFYETVAKF- SS4d-G5fusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 858 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA 
AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGGGAAAGA ACGTTTTGTTGCTAGGATCTGGTTTTGTTGCACAACCTGTTATCGACACATTGGCTGCTAATGA TGACATCAATGTCACTGTCGCATGTAGAACATTAGCCAATGCGCAAGCATTGGCCAAGCCCTCT GGATCCAAGGCTATTTCATTGGATGTTACCGATGACAGTGCCTTAGACAAAGTTCTGGCTGATA ACGATGTTGTCATCTCTTTGATTCCATACACCTTCCATCCAAATGTGGTAAAGAGCGCCATCAG AACAAAGACCGATGTCGTCACTTCCTCTTACATCTCACCTGCCTTAAGAGAATTGGAACCAGAA ATCGTAAAGGCAGGTATTACAGTTATGAACGAAATTGGGTTGGATCCAGGTATCGACCACTTGT ATGCGGTCAAGACTATTGATGAAGTTCACAGAGCTGGTGGTAAGCTAAAGTCATTCTTGTCATA CTGTGGTGGTTTACCAGCTCCTGAAGACTCTGATAATCCATTAGGATACAAATTTTCATGGTCC TCCAGAGGTGTGCTACTGGCTTTAAGAAACTCTGCTAAATACTGGAAAGACGGAAAGATTGAAA CTGTTTCTTCCGAAGACTTAATGGCCACTGCTAAGCCTTACTTCATCTACCCAGGTTATGCATT CGTTTGCTACCCAAATAGAGACTCTACCCTTTTCAAGGATCTTTATCATATTCCAGAAGCCGAA ACGGTCATTAGAGGTACTTTGAGATATCAAGGTTTCCCAGAATTTGTTAAGGCTTTAGTTGACA 
TGGGTATGTTGAAGGATGATGCTAACGAAATCTTCAGCAAGCCAATTGCCTGGAACGAAGCACT AAAACAATATTTAGGTGCCAAGTCTACTTCTAAAGAAGATTTGATTGCTTCCATTGACTCAAAG GCTACTTGGAAAGATGATGAAGATAGAGAAAGAATCCTTTCCGGGTTTGCTTGGTTAGGCTTGT TCTCTGACGCAAAGATCACACCAAGAGGTAATGCTTTAGACACTCTATGTGCACGTTTAGAAGA ACTAATGCAATATGAAGACAATGAAAGAGATATGGTTGTACTACAACACAAATTCGGTATTGAA TGGGCTGATGGAACTACCGAAACAAGAACATCCACTTTAGTTGACTATGGTAAGGTTGGTGGTT ACAGTTCTATGGCCGCTACTGTTGGTTATCCAGTTGCCATTGCAACGAAATTCGTCTTAGATGG TACAATCAAGGGACCAGGCTTACTAGCGCCATACTCACCAGAGATTAATGATCCAATCATGAAA GAACTAAAGGACAAGTACGGCATCTATCTAAAGGAAAAGACAGTGGCTTAA SS4d-G5fusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 859 protein NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMGKNVLLLGSGFVAQPVIDTLAANDDINVTVACRTLANAQALAKPS GSKAISLDVTDDSALDKVLADNDVVISLIPYTFHPNVVKSAIRTKTDVVTSSYISPALRELEPE IVKAGITVMNEIGLDPGIDHLYAVKTIDEVHRAGGKLKSFLSYCGGLPAPEDSDNPLGYKFSWS SRGVLLALRNSAKYWKDGKIETVSSEDLMATAKPYFIYPGYAFVCYPNRDSTLFKDLYHIPEAE TVIRGTLRYQGFPEFVKALVDMGMLKDDANEIFSKPIAWNEALKQYLGAKSTSKEDLIASIDSK ATWKDDEDRERILSGFAWLGLFSDAKITPRGNALDTLCARLEELMQYEDNERDMVVLQHKFGIE WADGTTETRTSTLVDYGKVGGYSSMAATVGYPVAIATKFVLDGTIKGPGLLAPYSPEINDPIMK ELKDKYGIYLKEKTVA- SS4d-C7fusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 861 protein,coding 
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT 
TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTCACGTC TTCCTCTAAAGCAGTTCTTAGCGGATAACCCCAAAAAAGTTCTTGTTCTTGACGGTGGTCAAGG AACAGAACTGGAAAACAGAGGTATCAAAGTTGCAAATCCCGTGTGGTCTACTATTCCATTTATT AGCGAATCATTTTGGTCTGATGAGTCATCTGCTAACAGAAAAATTGTCAAAGAAATGTTCAACG ATTTCTTGAATGCTGGCGCAGAAATATTGATGACTACAACATACCAAACGAGTTATAAATCAGT TTCTGAAAACACCCCAATCAGAACTTTATCCGAGTACAATAACCTTTTAAACAGGATTGTCGAT TTTTCTCGTAATTGTATTGGCGAAGACAAATATTTGATTGGCTGTATTGGCCCATGGGGTGCTC ATATTTGTCGTGAGTTTACAGGCGACTATGGTGCTGAGCCAGAAAATATTGATTTCTACCAATA CTTCAAGCCTCAGTTGGAGAATTTCAATAAAAATGACAAATTGGATTTGATTGGGTTTGAAACC ATTCCTAACATCCATGAACTGAAAGCTATCTTATCTTGGGATGAGAGTATCCTGTCTAGACCCT TCTATATCGGGTTGTCTGTGCATGAGCACGGTGTCTTGAGAGACGGCACTACCATGGAAGAAAT CGCACAAGTTATTAAGGACTTGGGCGACAAAATAAATCCTAACTTCTCGTTCTTAGGAATCAAC TGCGTCAGCTTCAACCAATCACCCGACATTCTTGAGTCTCTACATCAAGCACTACCAAATATGG CCTTGCTTGCTTATCCAAACAGTGGTGAAGTTTATGATACTGAAAAGAAGATATGGTTGCCAAA TAGCGATAAGCTGAACAGTTGGGATACGGTTGTTAAACAGTACATTAGCAGCGGTGCCCGTATC ATTGGTGGTTGTTGCAGAACAAGTCCAAAAGACATCCAAGAGATTTCTGCAGCCGTCAAGAAAT ACACGTAA SS4d-C7fusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 862 protein NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS 
YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMSRLPLKQFLADNPKKVLVLDGGQGTELENRGIKVANPVWSTIPFI SESFWSDESSANRKIVKEMFNDFLNAGAEILMTTTYQTSYKSVSENTPIRTLSEYNNLLNRIVD FSRNCIGEDKYLIGCIGPWGAHICREFTGDYGAEPENIDFYQYFKPQLENFNKNDKLDLIGFET IPNIHELKAILSWDESILSRPFYIGLSVHEHGVLRDGTTMEEIAQVIKDLGDKINPNFSFLGIN CVSFNQSPDILESLHQALPNMALLAYPNSGEVYDTEKKIWLPNSDKLNSWDTVVKQYISSGARI IGGCCRTSPKDIQEISAAVKKYT- SS3b-D8fusion ATGTCGCAAGAGTTCGAGACACCGGCGGTTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAG 864 protein,coding GAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAG sequence CATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGAC GATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCC GTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGA GATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAG AAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTG ACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCT ATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTAC ATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGT TTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCA TGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCC TGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCAC CCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAG AATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATA GTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCA GAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGG CAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAG GCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTG GACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTT CAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAG 
GGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAAT TGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCA AGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTT AGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGA TGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGC AGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGG TCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACA GTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGG GCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAG AAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCA TCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAA CTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAAC AAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGG CCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCT AATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAG AACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACT CCCATCGTGTCTTGGATATGTGA SS3b-D8fusion MSQEFETPAVGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDD 865 protein DEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVK KEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRY IYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITS WGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPI VLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREK AMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQ GYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPF STRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTR SYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQ KTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQN KVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNK NCMITYAAYRNIFPIWALGEYSHRVLDM- SS3b-D8fusion MSQEFETPAV 866 protein,fusion domain SS2c-A10a ATGAATTCGAATGAAGACATCATACCTGAACTATAA 
867 SS2c-A10afusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 869 protein NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMNSNEDIIPEL- SS2c-A10afusion MNSNEDIIPEL 870 protein,fusion domain Pathwaystep3 ATGTGGACAGTTGTGTTGGGACTTGCTACCTTGTTTGTTGCCTATTATATTCATTGGATCAACA 871 sequenceA AGTGGAGAGATTCCAAGTTCAATGGTGTTCTACCTCCTGGAACTATGGGGCTACCATTGATAGG CYP87D18DNA AGAGACAATTCAGTTGTCAAGACCATCTGACAGTTTGGATGTGCATCCCTTTATCCAGAAGAAA (codonoptimized) GTCGAACGTTATGGTCCGATATTTAAAACCTGTTTGGCAGGCAGACCAGTTGTTGTTTCAGCGG ATGCAGAGTTCAATAATTACATTATGTTACAAGAAGGTAGAGCTGTAGAAATGTGGTATTTGGA CACACTGTCTAAATTCTTCGGGTTGGATACAGAGTGGTTAAAAGCCTTAGGCTTAATCCACAAG TACATAAGATCCATTACCCTAAACCATTTTGGTGCTGAAGCATTGAGAGAAAGATTCTTGCCAT TTATAGAGGCATCGTCTATGGAAGCGTTACATTCTTGGTCCACTCAACCCAGTGTGGAGGTCAA GAATGCAAGTGCTTTGATGGTATTCAGAACGTCTGTAAACAAAATGTTTGGAGAAGATGCTAAG AAATTATCAGGAAATATTCCAGGTAAATTCACAAAGCTGCTGGGTGGCTTTCTATCTCTACCGT TAAATTTTCCCGGCACTACTTATCACAAGTGCTTAAAAGACATGAAAGAAATCCAGAAGAAATT ACGTGAAGTTGTAGATGATAGACTTGCCAATGTTGGGCCAGATGTTGAGGACTTTCTAGGGCAA GCGTTGAAAGACAAAGAATCCGAGAAATTCATAAGCGAAGAATTTATCATCCAATTGCTATTTT CAATAAGCTTTGCTTCGTTCGAATCGATCAGCACGACGTTGACATTGATTTTGAAGCTACTTGA CGAACATCCTGAGGTTGTAAAGGAATTAGAAGCCGAACATGAAGCTATCAGAAAAGCTAGAGCT GATCCAGATGGTCCAATTACCTGGGAAGAATACAAATCTATGACCTTCACACTTCAAGTCATAA 
ACGAAACACTTAGGTTAGGCTCAGTGACTCCTGCCTTATTGAGGAAAACTGTTAAAGATCTGCA AGTCAAGGGTTACATTATTCCTGAAGGATGGACTATAATGTTGGTAACTGCATCTAGGCATCGT GATCCAAAGGTCTACAAAGATCCGCACATATTCAATCCTTGGAGATGGAAAGACCTGGACTCAA TTACCATTCAAAAGAACTTTATGCCATTCGGTGGTGGTTTAAGGCATTGTGCAGGAGCTGAATA CTCCAAAGTGTATCTGTGTACTTTTCTTCACATTCTTTGCACAAAATATAGGTGGACGAAGTTA GGTGGCGGTAGAATTGCAAGAGCCCATATTTTAAGTTTTGAGGATGGTTTGCACGTCAAGTTTA CTCCTAAAGAGTAA Pathwaystep3 MWTVVLGLATLFVAYYIHWINKWRDSKFNGVLPPGTMGLPLIGETIQLSRPSDSLDVHPFIQKK 872 sequenceB VERYGPIFKTCLAGRPVVVSADAEFNNYIMLQEGRAVEMWYLDTLSKFFGLDTEWLKALGLIHK CYP87D18Protein YIRSITLNHFGAEALRERFLPFIEASSMEALHSWSTQPSVEVKNASALMVFRTSVNKMFGEDAK KLSGNIPGKFTKLLGGFLSLPLNFPGTTYHKCLKDMKEIQKKLREVVDDRLANVGPDVEDFLGQ ALKDKESEKFISEEFIIQLLFSISFASFESISTTLTLILKLLDEHPEVVKELEAEHEAIRKARA DPDGPITWEEYKSMTFTLQVINETLRLGSVTPALLRKTVKDLQVKGYIIPEGWTIMLVTASRHR DPKVYKDPHIFNPWRWKDLDSITIQKNFMPFGGGLRHCAGAEYSKVYLCTFLHILCTKYRWTKL GGGRIARAHILSFEDGLHVKFTPKE Pathwaystep3 ATGAAGGTCAGTCCATTCGAATTCATGTCCGCTATTATCAAGGGTAGAATGGACCCATCTAACT 873 sequenceC CCTCATTTGAATCTACTGGTGAAGTTGCCTCCGTTATCTTTGAAAACAGAGAATTGGTTGCCAT SgCPRDNA(codon CTTGACCACTTCTATTGCTGTTATGATTGGTTGCTTCGTTGTCTTGATGTGGAGAAGAGCTGGT optimized) TCTAGAAAGGTTAAGAATGTCGAATTGCCAAAGCCATTGATTGTCCATGAACCAGAACCTGAAG TTGAAGATGGTAAGAAGAAGGTTTCCATCTTCTTCGGTACTCAAACTGGTACTGCTGAAGGTTT TGCTAAGGCTTTGGCTGATGAAGCTAAAGCTAGATACGAAAAGGCTACCTTCAGAGTTGTTGAT TTGGATGATTATGCTGCCGATGATGACCAATACGAAGAAAAATTGAAGAACGAATCCTTCGCCG TTTTCTTGTTGGCTACTTATGGTGATGGTGAACCTACTGATAATGCTGCTAGATTTTACAAGTG GTTCGCCGAAGGTAAAGAAAGAGGTGAATGGTTGCAAAACTTGCACTATGCTGTTTTTGGTTTG GGTAACAGACAATACGAACACTTCAACAAGATTGCTAAGGTTGCCGACGAATTATTGGAAGCTC AAGGTGGTAATAGATTGGTTAAGGTTGGTTTAGGTGATGACGATCAATGCATCGAAGATGATTT TTCTGCTTGGAGAGAATCTTTGTGGCCAGAATTGGATATGTTGTTGAGAGATGAAGATGATGCT ACTACTGTTACTACTCCATATACTGCTGCTGTCTTGGAATACAGAGTTGTCTTTCATGATTCTG CTGATGTTGCTGCTGAAGATAAGTCTTGGATTAACGCTAATGGTCATGCTGTTCATGATGCTCA ACATCCATTCAGATCTAACGTTGTCGTCAGAAAAGAATTGCATACTTCTGCCTCTGATAGATCC 
TGTTCTCATTTGGAATTCAACATTTCCGGTTCCGCTTTGAATTACGAAACTGGTGATCATGTTG GTGTCTACTGTGAAAACTTGACTGAAACTGTTGATGAAGCCTTGAACTTGTTGGGTTTGTCTCC AGAAACTTACTTCTCTATCTACACCGATAACGAAGATGGTACTCCATTGGGTGGTTCTTCATTG CCACCACCATTTCCATCATGTACTTTGAGAACTGCTTTGACCAGATACGCTGATTTGTTGAACT CTCCAAAAAAGTCTGCTTTGTTGGCTTTAGCTGCTCATGCTTCTAATCCAGTTGAAGCTGATAG ATTGAGATACTTGGCTTCTCCAGCTGGTAAAGATGAATATGCCCAATCTGTTATCGGTTCCCAA AAGTCTTTGTTGGAAGTTATGGCTGAATTCCCATCTGCTAAACCACCATTAGGTGTTTTTTTTG CTGCTGTTGCTCCAAGATTGCAACCTAGATTCTACTCCATTTCATCCTCTCCAAGAATGGCTCC ATCTAGAATCCATGTTACTTGTGCTTTGGTTTACGATAAGATGCCAACTGGTAGAATTCATAAG GGTGTTTGTTCTACCTGGATGAAGAATTCTGTTCCAATGGAAAAGTCCCATGAATGTTCTTGGG CTCCAATTTTCGTTAGACAATCCAATTTTAAGTTGCCAGCCGAATCCAAGGTTCCAATTATCAT GGTTGGTCCAGGTACTGGTTTGGCTCCTTTTAGAGGTTTTTTACAAGAAAGATTGGCCTTGAAA GAATCCGGTGTTGAATTGGGTCCATCCATTTTGTTTTTCGGTTGCAGAAACAGAAGAATGGATT ACATCTACGAAGATGAATTGAACAACTTCGTTGAAACCGGTGCTTTGTCCGAATTGGTTATTGC TTTTTCTAGAGAAGGTCCTACCAAAGAATACGTCCAACATAAGATGGCTGAAAAGGCTTCTGAT ATCTGGAACTTGATTTCTGAAGGTGCTTACTTGTACGTTTGTGGTGATGCTAAAGGTATGGCTA AGGATGTTCATAGAACCTTGCATACCATCATGCAAGAACAAGGTTCTTTGGATTCTTCCAAAGC TGAATCCATGGTCAAGAACTTGCAAATGAATGGTAGATACTTAAGAGATGTTTGGTAA Pathwaystep3 MKVSPFEFMSAIIKGRMDPSNSSFESTGEVASVIFENRELVAILTTSIAVMIGCFVVLMWRRAG 874 sequenceD SRKVKNVELPKPLIVHEPEPEVEDGKKKVSIFFGTQTGTAEGFAKALADEAKARYEKATFRVVD SgCPRProtein LDDYAADDDQYEEKLKNESFAVFLLATYGDGEPTDNAARFYKWFAEGKERGEWLQNLHYAVFGL GNRQYEHFNKIAKVADELLEAQGGNRLVKVGLGDDDQCIEDDFSAWRESLWPELDMLLRDEDDA TTVTTPYTAAVLEYRVVFHDSADVAAEDKSWINANGHAVHDAQHPFRSNVVVRKELHTSASDRS CSHLEFNISGSALNYETGDHVGVYCENLTETVDEALNLLGLSPETYFSIYTDNEDGTPLGGSSL PPPFPSCTLRTALTRYADLLNSPKKSALLALAAHASNPVEADRLRYLASPAGKDEYAQSVIGSQ KSLLEVMAEFPSAKPPLGVFFAAVAPRLQPRFYSISSSPRMAPSRIHVTCALVYDKMPTGRIHK GVCSTWMKNSVPMEKSHECSWAPIFVRQSNFKLPAESKVPIIMVGPGTGLAPFRGFLQERLALK ESGVELGPSILFFGCRNRRMDYIYEDELNNFVETGALSELVIAFSREGPTKEYVQHKMAEKASD IWNLISEGAYLYVCGDAKGMAKDVHRTLHTIMQEQGSLDSSKAESMVKNLQMNGRYLRDVW Pathwaystep3 ATGGAACCTGAAAACAAGTTCTTCAATGTTGGGTTATTGATCGTAGTTACGTTGGTTTTGGCTA 875 
sequenceE AACTAATTTCTGCGGTCATTAATTCCAGGTCTAAGAAGAGAGTACCTCCAACCGTCAAAGGTTT CYP51G1(codon TCCACTTGTAGGTGGCTTGGTTAGATTTCTTAAAGGGCCAATTGTGATGTTGAGAGAAGAATAT optimized) CCCAAACATGGATCCGTATTCACTCTGAATTTACTACATAAGAAGATTACCTTTCTGATTGGAC CAGAAGTTTCTGCACATTTCTTTAAGGCTTCAGAGAGTGATTTATCACAGCAAGAAGTCTACCA ATTTAACGTGCCCACTTTTGGTCCGGGCGTTGTTTTCGATGTCGACTACTCGGTAAGGCAAGAA CAATTCAGATTCTTTACCGAAGCATTGAGAGTTACAAAACTGAAGGGCTATGTTGACCAAATGG TGAAAGAAGCAGAAGATTACTTTTCAAAATGGGGTGATTCAGGAGAGGTTGATCTAAAATGCGA ACTTGAACACTTGATCATATTAACCGCATCTAGATGTTTGTTGGGAAGAGAAGTTCGTGACCAG TTATTTGCTGATGTAAGTGCCCTATTTCATGACTTGGATAACGGTATGCTGCCAATATCCGTGA TGTTCCCATACTTGCCTATACCCGCTCATAGGAGAAGAGATCAAGCGAGATCAAAATTGGCTGA TATCTTTGTCAACATCATATCCTCTCGTAAATGTACTGGCACTTCTGAAAATGACATGTTACAA TGCTTTATAAACTCTAAATACAAAGATGGCAGACCAACTACTGATTCTGAAATCACAGGGTTAT TGATAGCCGCATTATTCGCTGGGCAACATACGAGCTCGATTACTAGCACATGGACAGGCGCATA TTTGTTATGTCACAAAGAGTATATGAGTGCCGTTCTTGAAGAGCAGCAGAAACAAATGGAGAAG CATGGTGACGAAATTGATCACGATATTCTATCCGAAATGGACAATTTGTACCGTTGCATCAAAG AAGCCCTAAGACTACATCCACCCTTGATTATGCTTATGAGGTCGAGTCATACCGATTTTAGCGT TACGACAAGAGAAGGAAAAGAGTATGATATTCCGAAGGGACATATTATAGCCACAAGTCCAGCT TTCGCAAATCGTTTACCTCACGTGTATAAAGACCCTGACAGATTTGATCCAGATAGGTTTGCTC CAGGTAGAGATGAGGATAAGGCTGCTGGACCTTTCTCCTACATATCATTTGGTGGTGGTAGACA CGGTTGTTTAGGTGAACCTTTTGCGTATTTACAAATCAAGGCAATCTGGTCACACTTACTGAGA AATTTTGAGTTAGAGTTGATTAGTCCTTTCCCGGAAATTGACTGGAATGCCATGGTTGTGGGTG TCAAGGGTAAAGTGATGGTCAGGTATAAGAGAAGAAAGCTTAGCGTATCTTAG Pathwaystep3 MEPENKFFNVGLLIVVTLVLAKLISAVINSRSKKRVPPTVKGFPLVGGLVRFLKGPIVMLREEY 876 sequenceF PKHGSVFTLNLLHKKITFLIGPEVSAHFFKASESDLSQQEVYQFNVPTFGPGVVFDVDYSVRQE CYP51G1Protein QFRFFTEALRVTKLKGYVDQMVKEAEDYFSKWGDSGEVDLKCELEHLIILTASRCLLGREVRDQ LFADVSALFHDLDNGMLPISVMFPYLPIPAHRRRDQARSKLADIFVNIISSRKCTGTSENDMLQ CFINSKYKDGRPTTDSEITGLLIAALFAGQHTSSITSTWTGAYLLCHKEYMSAVLEEQQKQMEK HGDEIDHDILSEMDNLYRCIKEALRLHPPLIMLMRSSHTDFSVTTREGKEYDIPKGHIIATSPA FANRLPHVYKDPDRFDPDRFAPGRDEDKAAGPFSYISFGGGRHGCLGEPFAYLQIKAIWSHLLR NFELELISPFPEIDWNAMVVGVKGKVMVRYKRRKLSVS. 
Pathwaystep3 ATGTTATCGTTGGCCATTTGGGTTTCACTTTTGTTCTTGTTGTCATCATTGCTTCTTTTAAAGA 877 sequenceG CGAAGAAGAAAGTTGCTCCACAAAAGAAGAAGAAGCAATTTCCACCTGGACCTCCCAAACTACC CYP71B97(codon ATTGTTAGGCCATCTGCACTTATTGGGTTCTTTGCCTCATTGCTCCTTATGTGAACTGTCTAGA optimized) AAATATGGTCCTGTCATGTTGTTAAAATTAGGCTCAGTACCTACCGTAGTCATATCTAGCGCTG CAGCCGCTAGAGAGGTGTTGAAAGTACACGATCTAGCATGTTGCTCTCGTCCGAGATTGGCTGC TTCCGGTAGATTCTCGTACAATTTTCTGGATCTGAACTTAAGCCCATATGGTGAGAGATGGAGA GAACTGAGGAAAATTTGCGTATTGGTTTTGCTGAGTGCTAGACGTGTTCAGAGCTTCCAACAGA TAAGAGAAGAAGAGGTGGGATTATTACTTAAATCCATTAGTCAAGTTTCCAGTAGTGCCACTCC AGTTGATCTATCTGAGAAATCCTATTCTTTGACAGCTAACATTATCACTAGAATCGCGTTTGGG AAGTCATTCAGAGGTGGCGAATTAGACAATGAAAACTTTCAACAAGTCATCCACAGAGCATCGA TTGCCTTAGGTTCCTTTTCTGTGACAAACTTCTTTCCTTCAGTAGGGTGGATTATCGACAGATT AACCGGTGTACATGGCAGATTGGAGAAGAGTTTTGCTGAATTAGACACCTTCTTTCAGCATATC ATTGATGATCGTATCAATTTTGTCGCAACAAGCCAAACCGAAGAAAACATTATAGACGTACTAT TGAAAATGGAAAGAGAACGTTCAAAATTTGATGTCCTACAACTGAATAGGGACTGCATAAAAGC CTTGATAATGGATATATTTCTTGCCGGTGTAGATACTGGAGCAGGGACAATTGTGTGGGCATTG ACTGAATTGGTGAGAAATCCCAGAGTGATGAAGAAGTTGCAAGACGAAATAAGGTCGTGTGTGA AAGAGGATCAAGTCAAGGAACGTGATTTAGAGAAACTTCAGTACTTAAAGATGGTCGTTAAAGA AGTTTTAAGATTGCATGCTCCAGTTCCTTTGTTATTGCCGAGAGAGACAATGTCTCATTTCAAA CTAAATGGTTATGACATTGATCCGAAAACTCACTTGCATGTCAATGTTTGGGCGATTGGTAGGG ACCCAGATTCTTGGTCTGATCCAGAAGAATTCTTCCCAGAAAGATTCGCAGGATCAAGTATTGA TTACAAAGGACATAATTTTGAATTGCTGCCATTTGGTGGTGGCAGAAGGATCTGTCCCGGTATG AACATGGGGACAGTTGCGGTTGAACTTGCACTAACGAACCTATTACTTTGTTTTGATTGGACTC TACCTGATGGCATGAAAGAGGAAGATGTTGACATGGAAGAAGATGGTGGACTTGCTATTGCTAA GAAATCTCCCCTAAAATTAGTTCCAGTTAGGTGTCTTAATTAG Pathwaystep3 MLSLAIWVSLLFLLSSLLLLKTKKKVAPQKKKKQFPPGPPKLPLLGHLHLLGSLPHCSLCELSR 878 sequenceH KYGPVMLLKLGSVPTVVISSAAAAREVLKVHDLACCSRPRLAASGRFSYNFLDLNLSPYGERWR CYP71B97Protein ELRKICVLVLLSARRVQSFQQIREEEVGLLLKSISQVSSSATPVDLSEKSYSLTANIITRIAFG KSFRGGELDNENFQQVIHRASIALGSFSVTNFFPSVGWIIDRLTGVHGRLEKSFAELDTFFQHI IDDRINFVATSQTEENIIDVLLKMERERSKFDVLQLNRDCIKALIMDIFLAGVDTGAGTIVWAL 
TELVRNPRVMKKLQDEIRSCVKEDQVKERDLEKLQYLKMVVKEVLRLHAPVPLLLPRETMSHFK LNGYDIDPKTHLHVNVWAIGRDPDSWSDPEEFFPERFAGSSIDYKGHNFELLPFGGGRRICPGM NMGTVAVELALTNLLLCFDWTLPDGMKEEDVDMEEDGGLAIAKKSPLKLVPVRCLN. Pathwaystep3 ATGGATTTGCTTTTGTTGGAAAAGACGTTGTTGGGTCTATTTATCGCTGTCGTATTGGCAATAG 879 sequenceI CCATTAGCAAATTAAGGGGTAAAAGGTTTAAACTGCCACCAGGTCCGTTACCTGTCCCTATCTT CYP73A152(codon TGGCAACTGGTTACAGGTTGGTGATGATTTGAACCACAGAAATCTAACGGGTTTAGCCAAGAAA optimized) TTTGGGGATATTTTCTTGTTAAGAATGGGCCAAAGAAACTTAGTGGTAGTTTCATCTCCTGAAC TTGCCAAAGAAGTGCTTCATACACAAGGAGTGGAGTTTGGATCTAGAACAAGAAATGTAGTGTT CGACATATTTACCGGAAAAGGTCAAGATATGGTTTTCACAGTATATGGTGAACATTGGCGTAAA ATGCGTAGAATAATGACTGTACCATTCTTCACCAACAAGGTTGTCCAACAATATAGGCATGGAT GGGAAGCAGAAGCAGCTAGCGTTGTTGAAGATGTGAAGAAGAATCCGGAATCTGCTACTACTGG TATTGTGTTACGTCGTAGACTTCAATTGATGATGTACAATAACATGTATCGTATAATGTTTGAC AGAAGATTTGAGTCCGAGGATGATCCCCTATTTCACAAATTGAGAGCACTGAATGGTGAGAGAT CTAGGTTGGCTCAATCGTTCGAGTACAACTATGGAGACTTCATCCCTATTTTAAGACCTTTCTT GAGAGGCTATTTGAAAATTTGCAAGGAAGTCAAGGACACTAGGTTACAGTTGTTTAAAGACTAC TTTGTTGAAGAAAGAAAGAAATTGGCGAACGTGAAAACTACCACAAATGAGGGCTTAAAATGTG CGATCGATCACATTCTGGACGCACAACAGAAAGGTGAAATCAATGAAGATAACGTTTTATACAT TGTTGAGAATATTAATGTAGCTGCCATTGAAACTACGTTGTGGTCGATAGAATGGGGAATTGCA GAGCTTGTCAATCATCCTGAAATCCAAAGAAAGCTGAGAAATGAGATGGATACAGTCTTAGGCT CAGGTGTTCCTATCACTGAACCAGATACACATAAGTTGCCCTATTTACAAGCTGTCATAAAAGA AACTCTTAGACTTAGAATGGCTATACCCTTGCTAGTTCCACATATGAATCTACATGATGCCAAA CTGGGTGGTTACGACATTCCAGCAGAATCCAAGATTCTAGTAAACGCTTGGTGGTTAGCCAATA ATCCAGCTAATTGGAAGAATCCAGAAGAATTCAGACCAGAGAGATTCTTGGAAGAAGAATCCAA AGTTGAAGCTAATGGGAACGACTTTAGATATTTACCGTTCGGTGTAGGAAGAAGGAGTTGTCCA GGGATAATTTTAGCGCTACCTATCCTAGCTATCACCATAGGCAGACTGGTTCAGAACTTTGAAT TGTTACCTCCACCAGGGCAAAGTAAGCTGGATACAAGTGAGAAGGGTGGTCAGTTTTCATTGCA TATTCTTAAACACTCAACCATTGTCGTTAAACCCAGGGCATTTTAG Pathwaystep3 MDLLLLEKTLLGLFIAVVLAIAISKLRGKRFKLPPGPLPVPIFGNWLQVGDDLNHRNLTGLAKK 880 sequenceJ FGDIFLLRMGQRNLVVVSSPELAKEVLHTQGVEFGSRTRNVVFDIFTGKGQDMVFTVYGEHWRK CYP73A152Protein 
MRRIMTVPFFTNKVVQQYRHGWEAEAASVVEDVKKNPESATTGIVLRRRLQLMMYNNMYRIMFD RRFESEDDPLFHKLRALNGERSRLAQSFEYNYGDFIPILRPFLRGYLKICKEVKDTRLQLFKDY FVEERKKLANVKTTTNEGLKCAIDHILDAQQKGEINEDNVLYIVENINVAAIETTLWSIEWGIA ELVNHPEIQRKLRNEMDTVLGSGVPITEPDTHKLPYLQAVIKETLRLRMAIPLLVPHMNLHDAK LGGYDIPAESKILVNAWWLANNPANWKNPEEFRPERFLEEESKVEANGNDFRYLPFGVGRRSCP GIILALPILAITIGRLVQNFELLPPPGQSKLDTSEKGGQFSLHILKHSTIVVKPRAF. Pathwaystep3 ATGTTAAAAGATCCCTTTTGCTTTCCCTTTCTACCTCTGTTGAGTTTGGCTGTTCTTCTGTTCT 881 sequenceK TACTATTGAGAAGGATCTGCTCTAAATCTAAGCCTAGACCTTTGCCTCCGGGTCCTACTCCATG CYP80C13(codon GCCTGTGGTCGGAAATCTATTGCAAATAGGCACAAATCCCCATATTTCGATCACTCAATTTTCT optimized) CAAACTTACGGTCCGTTGATTTCCTTGCGTTTGGGAACTAGCTTATTGGTCGTTGCATCGTCAC CAGCTGCTGCTACTGCCGTTCTTAGAACACATGATAGATTACTTAGTGCGAGATATATGTTCCA GACGATTCCTGACAAACGTAAACATGCCCAATTGTCCTTATCTACATCGCCATTCTGCGATGAC CATTGGAAGTCATTGAGAAGCATTTGTAGAGCAAACTTATTCACGTCCAAGGCTATAGAGTCAC AAGGAGGTCTTAGAAGAAGAAAGATGAAAGAGATGGTGGAATTTCTACAATCCAAACAAGGTAC GGTTGTAGGTGTTAGGGACTTAGTGTTTACCACCGTTTTCAACATCTTATCCAACTTGGTGTTC TCAAGAGACTTAGTTGGCTATGTAGGTGAAGGTTTCAATGGGATTAAGTCATCTTTTCACCGTT CTATGAAATTAGGGTTAACACCTAATCTGGCAGACTTTTATCCAATACTGGAAGGGTTCGATCT TCAAGGACTACAGAAGAAGGCTGTACTATATAACAAAGGAGTTGATTCTACATGGGAAATCCTA GTCAAAGAAAGGAGAGAATTACACAGGAACAACTTGGTAGTTTCACCGAATGACTTCTTGGATG TTTTGATACAGAATCAATTCAGTGATGATCAGATCAACTACTTGATTACCGAGGTTCTAACAGC TGGTATTGATACAACCACTTCTACCGTTGAATGGGCTATGGCGGAACTGTTAAAGAATAAGGAT TTAACTGAGAAAGTCAGGGTCGAATTGGAAAGAGAGATGAAAATCAAGGAAAATGCGATTGATG AGAGTCAGATTAGTCAATTTCAGTTTCTTCAACAGTGTGTCAAAGAAACTTTGAGACTTTATCC ACCAGTGCCATTTCTGTTACCAAGACTAGCACCAGAACCTTGTGAAGTGATGGGTTACAGTATT CCGAAAGATACCTCGATATTTGTTAACGCATGGGGCATTGGTAGAGATCCATCTATATGGGAGG AACCCTCAGCATTCAAACCAGAAAGATTTGTCAATTCAGACTTAGACTTTAAAGCCTATGATTA CAGATTCTTGCCTTTTGGTGGAGGCAGAAGATCTTGTCCAGGCCTTTTGATGACAACTGTACAA GTACCATTGATAATTGCCACGTTAATCCACAATTTTGACTGGAGCCTACCTAATGGCGGTGATT TGGCCCAATTGGATTTAAGCGGTCAAATGGGTGTATCCTTACAAAAGGAAAAGCCACTGTTGCT TATTCCCAGGAAACGTACTTAG Pathwaystep3 
MLKDPFCFPFLPLLSLAVLLFLLLRRICSKSKPRPLPPGPTPWPVVGNLLQIGTNPHISITQFS 882 sequenceL QTYGPLISLRLGTSLLVVASSPAAATAVLRTHDRLLSARYMFQTIPDKRKHAQLSLSTSPFCDD CYP80C13Protein HWKSLRSICRANLFTSKAIESQGGLRRRKMKEMVEFLQSKQGTVVGVRDLVFTTVFNILSNLVF SRDLVGYVGEGFNGIKSSFHRSMKLGLTPNLADFYPILEGFDLQGLQKKAVLYNKGVDSTWEIL VKERRELHRNNLVVSPNDFLDVLIQNQFSDDQINYLITEVLTAGIDTTTSTVEWAMAELLKNKD LTEKVRVELEREMKIKENAIDESQISQFQFLQQCVKETLRLYPPVPFLLPRLAPEPCEVMGYSI PKDTSIFVNAWGIGRDPSIWEEPSAFKPERFVNSDLDFKAYDYRFLPFGGGRRSCPGLLMTTVQ VPLIIATLIHNFDWSLPNGGDLAQLDLSGQMGVSLQKEKPLLLIPRKRT Pathwaystep3 ATGGAAGCTCCCTCGTGGGTGTCTTATGCCGCAGCTTGGGTTGCAACATTGGCTCTATTGTTAC 883 sequenceM TTAGTAGGCGTTTGAGAAGAAGAAAATTGAATTTGCCACCTGGACCTAAACCCTGGCCATTAAT CYP92A127(codon TGGCAATTTAAACCTAATAGGTTCTTTACCGCATCAATCCATCCATCAATTGTCCCAAAAGTAT optimized) GGCCCAATAATGCACTTGAGATTTGGATCATTTCCTGTTGTAGTTGGCAGTTCTGTGGATATGG CCAAGATCTTCTTGAAAACTCAGGATCTAACCTTCGTTTCACGTCCAAAGACAGCAGCTGGCAA ATACACCACTTACAATTATAGCAATATAACGTGGTCACAATATGGTCCTTATTGGAGACAAGCG AGGAAAATGTGTTTGATGGAATTGTTCTCTGCTAGAAGATTGGACAGTTATGAATACATTAGGA AAGAAGAGATGAATGCCTTGCTTAAGGAAATTTGCAAAAGTTCGGGAAAAGTCATCAAACTAAA GGACTACCTATCTACAGTTTCCTTGAACGTGATAAGCAGGATGGTCTTAGGGAAGAAATACACT GACGAGTCAGAAGATGCAATCGTTAGTCCAGACGAATTTAAGAAAATGCTTGACGAATTGTTTC TTCTATCTGGTGTATTGAACATCGGTGATTCGATACCGTGGATTGATTTCTTAGATCTACAGGG TTACGTGAAACGTATGAAAGCTTTGTCCAAGAAATTCGACAGATTTCTGGAGCATGTTTTAGAC GAGCATAATGAGAGAAGAAAAGGTGTCAAAGATTATGTAGCTAAAGACATGGTCGATGTACTGT TACAACTGGCAGATGATCCGGATCTTGAGGTGAAATTGGAACGTCACGGTGTTAAGGCGTTCAC ACAAGACTTAATAGCCGGTGGTACAGAATCTTCCGCTGTCACTGTAGAATGGGCAATGAGCGAA CTTCTAAAGAAACCAGAGATGTTCGAAAAGGCCTCTGAAGAGTTAGATAGAGTGATTGGTAGGG AAAGATGGGTTGAGGAAAAGGATATCGCGAATTTACCCTATATTGACGCAATTGCTAAAGAAAC CATGAGGTTACATCCTGTGGCACCAATGTTGGTACCTAGATTATGCAGAGAAGATTGTCAGATT GCTGGCTACGATATAGCAAAGGGCACTAGAGTTCTTGTCAACGTTTGGACAATTGGAAGAGATC CAACTGTTTGGGAAAATCCGGATGAATTTAACCCAGAAAGATTTCTTGGGAAATCAATTGATGT CAAAGGGCAAGACTTTGAGTTGTTACCCTTTGGAAGTGGTAGAAGAATGTGTCCTGGATATTCA 
CTGGGTTTAAAAGTTATTCAGTCATCACTAGCCAACTTATTGCATGGGTTTTCCTGGAAGCTGG CTGGTGATACCAAGAAAGAAGATTTGAATATGGAAGAAGTATTCGGTTTAAGCACGCCAAAGAA GTTTCCTTTGGATGCTGTTGCCGAACCAAGACTGCCTCCACACCTGTATTCTATGTAG Pathwaystep3 MEAPSWVSYAAAWVATLALLLLSRRLRRRKLNLPPGPKPWPLIGNLNLIGSLPHQSIHQLSQKY 884 sequenceN GPIMHLRFGSFPVVVGSSVDMAKIFLKTQDLTFVSRPKTAAGKYTTYNYSNITWSQYGPYWRQA CYP92A127Protein RKMCLMELFSARRLDSYEYIRKEEMNALLKEICKSSGKVIKLKDYLSTVSLNVISRMVLGKKYT DESEDAIVSPDEFKKMLDELFLLSGVLNIGDSIPWIDFLDLQGYVKRMKALSKKFDRFLEHVLD EHNERRKGVKDYVAKDMVDVLLQLADDPDLEVKLERHGVKAFTQDLIAGGTESSAVTVEWAMSE LLKKPEMFEKASEELDRVIGRERWVEEKDIANLPYIDAIAKETMRLHPVAPMLVPRLCREDCQI AGYDIAKGTRVLVNVWTIGRDPTVWENPDEFNPERFLGKSIDVKGQDFELLPFGSGRRMCPGYS LGLKVIQSSLANLLHGFSWKLAGDTKKEDLNMEEVFGLSTPKKFPLDAVAEPRLPPHLYSM. Pathwaystep3 ATGGAGGCACCACCGTGGGTTTCATATGCAGCTGCGTGGGTAGCAACATTGGCTCTGTTACTTC 885 sequenceO TGTCTAGACATTTGCGTAGAAGAAAATTGAATTTACCACCTGGTCCAAAGCCTTGGCCTCTAAT CYP92A129(codon TGGCAATCTGAACTTGATAGGATCGCTACCACATCAATCCATACATCAATTGAGTCAGAAATAT optimized) GGCCCAATTATGCAGTTAAGATTTGGTTCTTTTCCCGTTGTTGTTGGTTCAAGCGTAGATATGG CCAAAATTTTCCTGAAAACACACGATCTTACGTTTGTGAGCAGACCGAAAACTGCTGCAGGCAA ATACACCACGTATAACTGTTCCAATATAACTTGGTCGCAATATGGTCCGTATTGGAGACAAGCC AGGAAAATGTGTTTGATGGAGCTGTTTAGCGCTAGACGTCTGGATTCATACGAATACATCAGAA AAGAGGAAATGAATGCACTATTGAAGGAGATTTGCAAAAGTAGTGGGAAAGTAATCAAACTTAA AGACTATTTGTCTACTGTCTCGCTTAATGTCATCAGTAGAATGGTGCTAGGAAAGAAGTACACC GATGAGTCTGAAGATGCCATTGTTTCTCCCGATGAATTTAAGAAAATGTTGGATGAATTGTTTC TACTGGGCGGTGTTTTGAACATCGGTGATTCCATACCTTGGATCGACTTCTTAGATCTTCAAGG ATATGTCAAGAGAATGAAGGCTTTATCAAAGAAATTTGATCGTTTTCTAGAACACGTACTAGAT GAACACAACGAGCGTAGAAAAGGTGTGAAGGATTATGTTGCTAAGGACATGGTCGATGTGTTAT TGCAATTGGCTGACGATCCAGACTTGGAAGTCAGGTTAGAGAGGCATGGTGTTAAGGCGTTTAC CCAAGACTTGATTGCAGGAGGAACAGAATCATCCGCAGTAACAGTAGAATGGGCCATGTCTGAA TTGTTAAAGAAGCCCGAAATGTTCGAGAAAGCCTCAGAAGAGCTAGACAGAGTGATTGGTAGGG AAAGATGGGTTGAAGAGAAAGACATAGCCAATTTACCGTATATAGACGCCATCGCTAAAGAAAC CATGAGATTGCATCCAGTCGCACCTATGCTAGTTCCACGTTTATGCAGAGAAGATTGTCAGATT 
GCTGGATACGATATTGCTAAGGGTACTAGAGTCTTGGTGAACGTTTGGACAATTGGTAGGGATC CTACTGTATGGGAAAATCCTGATGAATTCAATCCCGAAAGATTCTTAGGGAAATCCATCGATGT CAAAGGTCAAGACTTCGAATTATTGCCATTCGGATCAGGCAGAAGAATGTGTCCAGGGTACTCC TTAGGCTTAAAGGTTATACAGAGTAGCTTAGCAAATCTTTTGCATGGTTTCTCTTGGAGACTTG CTGGGGACGTTAAGAAAGAAGATTTAAACATGGAAGAAGTGTTTGGTCTTTCTACTCCCAAGAA ATTTCCATTGGATGCGGTTGCTGAACCTAGGTTACCACCTCACTTGTACTCTATTTAG Pathwaystep3 MEAPPWVSYAAAWVATLALLLLSRHLRRRKLNLPPGPKPWPLIGNLNLIGSLPHQSIHQLSQKY 886 sequenceP) GPIMQLRFGSFPVVVGSSVDMAKIFLKTHDLTFVSRPKTAAGKYTTYNCSNITWSQYGPYWRQA CYP92A129Protein RKMCLMELFSARRLDSYEYIRKEEMNALLKEICKSSGKVIKLKDYLSTVSLNVISRMVLGKKYT DESEDAIVSPDEFKKMLDELFLLGGVLNIGDSIPWIDFLDLQGYVKRMKALSKKFDRFLEHVLD EHNERRKGVKDYVAKDMVDVLLQLADDPDLEVRLERHGVKAFTQDLIAGGTESSAVTVEWAMSE LLKKPEMFEKASEELDRVIGRERWVEEKDIANLPYIDAIAKETMRLHPVAPMLVPRLCREDCQI AGYDIAKGTRVLVNVWTIGRDPTVWENPDEFNPERFLGKSIDVKGQDFELLPFGSGRRMCPGYS LGLKVIQSSLANLLHGFSWRLAGDVKKEDLNMEEVFGLSTPKKFPLDAVAEPRLPPHLYSI. Pathwaystep3 ATGGAAATGTCATCATGTGTAGCCGCTACGATTAGCATCTGGATGGTGGTTGTTTGTATTGTGG 887 sequenceQ GTGTTGGATGGAGAGTGGTAAATTGGGTTTGGCTAAGACCCAAGAAATTGGAGAAAAGGTTAAG CYP92A458(codon GGAACAAGGCTTGGCAGGGAACTCTTACAGATTGTTATTTGGTGACCTTAAAGAACGTGCAGCA optimized) ATGGCTGAACAAGCCAATTCAAAACCGATTAATTTTAGTCACGACATTGGTCCAAGAGTTTTCC CAAGTATGTACAAAACCATTCAGAATTATGGGAAGAATTCCTACATGTGGTTAGGTCCCTATCC AAGAGTGCATATAATGGATCCTCAACAGCTGAAAACCGTCTTTACATTGGTTTATGACATTCAA AAGCCGAATCTGAATCCACTGGTCAAATTCTTGTTAGATGGGATTGTCACTCATGAAGGAGAAA AGTGGGCAAAGCATAGAAAGATCATTAATCCAGCTTTTCACCTTGAAAAGTTGAAGGACATGAT TCCTGCCTTCTTTCACTCTTGCAATGAGATAGTTAATGAGTGGGAAAGACTAATTTCGAAGGAG GGTTCCTGTGAACTTGATGTTATGCCTTACTTGCAGAACTTAGCTGCTGATGCTATATCCAGAA CAGCGTTTGGTTCTAGCTATGAAGAGGGTAAAATGATATTCCAATTACTTAAGGAATTGACTGA TTTGGTCGTAAAAGTAGCGTTTGGTGTGTATATCCCTGGTTGGAGATTCTTACCAACCAAATCA AACAACAAAATGAAAGAAATCAACAGGAAAATCAAATCTCTGCTATTAGGAATCATTAACAAAC GTCAGAAAGCAATGGAAGAAGGCGAAGCTGGTCAATCTGATTTGTTAGGCATACTAATGGAATC GAATTCCAACGAAATTCAAGGAGAAGGAAACAATAAGGAGGACGGTATGTCTATAGAAGATGTA 
ATCGAGGAATGCAAGGTTTTCTATATAGGTGGACAAGAGACTACAGCCAGACTATTAATTTGGA CAATGATACTTTTAAGTTCACATACGGAATGGCAAGAGAGAGCAAGGACTGAAGTCTTGAAAGT CTTTGGCAATAAGAAGCCTGATTTTGATGGCTTGAACAGATTGAAAATCGTTAGTGAAATTCTA TAG Pathwaystep3 MEMSSCVAATISIWMVVVCIVGVGWRVVNWVWLRPKKLEKRLREQGLAGNSYRLLFGDLKERAA 888 sequenceR MAEQANSKPINFSHDIGPRVFPSMYKTIQNYGKNSYMWLGPYPRVHIMDPQQLKTVFTLVYDIQ CYP92A458Protein KPNLNPLVKFLLDGIVTHEGEKWAKHRKIINPAFHLEKLKDMIPAFFHSCNEIVNEWERLISKE GSCELDVMPYLQNLAADAISRTAFGSSYEEGKMIFQLLKELTDLVVKVAFGVYIPGWRFLPTKS NNKMKEINRKIKSLLLGIINKRQKAMEEGEAGQSDLLGILMESNSNEIQGEGNNKEDGMSIEDV IEECKVFYIGGQETTARLLIWTMILLSSHTEWQERARTEVLKVFGNKKPDFDGLNRLKIVSEIL Pathwaystep3 MEVHWVCMCAATLLVCYIFGSKFVRNLNGWYYDVKLRRKEHPLPPGDMGWPLMGNLLSFIKDFS 889 sequenceS SGHPDSFINNLVLKYGRSGIYKTHLFGNPSIIVCEPQMCRRVLTDDVNFKLGYPKSIKELARCR CYP88D6Protein PMIDVSNAEHRLFRRLITSPIVGHKALAMYLERLEEIVINSLEELSSMKHPVELLKEMKKVSFK AIVHVFMGSSNQDIIKKIGSSFTDLYNGMFSIPINVPGFTFHKALEARKKLAKIVQPVVDERRL MIENGQQEGDQRKDLIDILLEVKDENGRKLEDEDISDLLIGLLFAGHESTATSLMWSITYLTQH PHILKKAKEEQEEIMRTRLSSQKQLSFKEIKQMVYLSQVIDETLRCANIAFATFREATADVNIN GYIIPKGWRVLIWARAIHMDSEYYPNPEEFNPSRWDDYNAKAGTFLPFGAGSRLCPGADLAKLE ISIFLHYFLLNYRLERVNPECHVTSLPVSKPTDNCLAKVMKVSCA. 
Pathwaystep3 ATGGAAGTACATTGGGTTTGCATGTGCGCTGCCACTTTGTTGGTATGCTACATTTTTGGAAGCA 890 sequenceT AGTTTGTGAGGAATTTGAATGGGTGGTATTATGATGTAAAACTAAGAAGGAAAGAACACCCACT CYP88D6(codon ACCCCCAGGTGACATGGGATGGCCTCTTATGGGCAATCTATTGTCCTTCATCAAAGATTTCTCA optimized) TCGGGTCACCCTGATTCATTCATCAACAACCTTGTTCTCAAATATGGACGAAGTGGTATCTACA AGACTCACTTGTTTGGGAATCCAAGCATCATTGTTTGCGAGCCTCAGATGTGTAGGCGAGTTCT CACTGATGATGTGAACTTTAAGCTTGGTTATCCAAAATCTATCAAAGAGTTGGCACGATGTAGA CCCATGATTGATGTCTCTAATGCGGAACATAGGCTTTTTCGACGCCTCATTACTTCCCCAATCG TGGGTCACAAGGCGCTAGCAATGTACCTAGAACGTCTTGAGGAAATTGTGATCAATTCGTTGGA AGAATTGTCCAGCATGAAGCACCCCGTTGAGCTCTTGAAAGAGATGAAGAAGGTTTCCTTTAAA GCCATTGTCCACGTTTTCATGGGCTCTTCCAATCAGGACATCATTAAAAAAATTGGAAGTTCGT TTACTGATTTGTACAATGGCATGTTCTCTATCCCCATTAACGTACCTGGTTTTACATTCCACAA AGCACTCGAGGCACGTAAGAAGCTAGCCAAAATAGTTCAACCCGTTGTGGATGAAAGGCGGTTG ATGATAGAAAATGGTCAACAAGAAGGGGACCAAAGAAAAGATCTTATTGATATTCTTTTGGAAG TCAAAGATGAGAATGGACGAAAATTGGAGGACGAGGATATTAGCGATTTATTAATAGGGCTTTT GTTTGCTGGCCATGAAAGTACAGCAACCAGTTTAATGTGGTCAATTACATATCTTACACAGCAT CCCCATATCTTGAAAAAGGCTAAGGAAGAGCAGGAAGAAATAATGAGGACAAGATTGTCCTCGC AGAAACAATTAAGTTTTAAGGAAATTAAACAAATGGTTTATCTTTCTCAGGTAATTGATGAAAC TTTACGATGTGCCAATATTGCCTTTGCAACTTTTCGAGAGGCAACTGCTGATGTGAACATCAAT GGTTATATCATACCAAAGGGATGGAGAGTGCTAATTTGGGCAAGAGCCATTCATATGGATTCTG AATATTACCCAAATCCAGAAGAATTTAATCCATCGAGATGGGATGATTACAATGCCAAAGCAGG AACCTTCCTTCCTTTTGGAGCAGGAAGTAGACTTTGTCCTGGAGCCGACTTGGCGAAACTTGAA ATTTCCATATTTCTTCATTATTTCCTCCTTAATTACAGGTTGGAGCGAGTAAATCCAGAATGTC ATGTTACCAGCTTACCAGTATCTAAGCCCACAGACAATTGCCTCGCTAAGGTGATGAAGGTCTC ATGTGCTTAG Pathwaystep3 ATGGAAATGTCCTCTTCTGTTGCTGCCACCATTTCTATTTGGATGGTTGTTGTATGTATCGTTG 891 sequenceU GTGTTGGTTGGAGAGTTGTTAATTGGGTTTGGTTAAGACCAAAGAAGTTGGAAAAGAGATTGAG CYP1798(codon AGAACAAGGTTTGGCTGGTAACTCTTACAGATTGTTGTTCGGTGACTTGAAAGAAAGAGCTGCT optimized) ATGGAAGAACAAGCTAACTCTAAGCCAATCAACTTCTCCCATGATATTGGTCCAAGAGTTTTCC CATCTATGTACAAGACCATTCAAAACTACGGTAAGAACTCCTATATGTGGTTGGGTCCATACCC AAGAGTTCATATTATGGATCCACAACAATTGAAAACCGTCTTTACCTTGGTTTACGACATCCAA 
AAGCCAAACTTGAACCCATTGATCAAGTTCTTGTTGGATGGTATTGTCACCCATGAAGGTGAAA AATGGGCTAAACATAGAAAGATTATCAACCCAGCCTTCCACTTGGAAAAGTTGAAAGATATGAT TCCAGCCTTCTTCCACTCTTGCAACGAAATAGTTAATGAATGGGAAAGATTGATCTCCAAAGAA GGTTCTTGCGAATTGGATGTTATGCCATACTTGCAAAATTTGGCTGCTGATGCTATTTCTAGAA CTGCTTTTGGTTCCTCTTACGAAGAAGGTAAGATGATCTTCCAATTATTGAAAGAATTGACCGA CTTGGTTGTTAAGGTTGCTTTCGGTGTTTACATTCCAGGTTGGAGATTTTTGCCAACTAAGTCC AACAACAAGATGAAGGAAATCAACAGAAAGATCAAGTCTTTGTTGTTAGGTATCATCAACAAGA GACAAAAGGCCATGGAAGAAGGTGAAGCTGGTCAATCTGATTTGTTGGGTATTTTGATGGAATC CAACTCCAACGAAATTCAAGGTGAAGGTAACAACAAAGAAGATGGTATGTCCATCGAAGATGTT ATCGAAGAATGCAAGGTTTTCTACATCGGTGGTCAAGAAACTACCGCCAGATTATTGATTTGGA CCATGATCTTGTTGAGTTCCCATACTGAATGGCAAGAAAGAGCAAGAACTGAAGTCTTGAAGGT TTTCGGTAACAAAAAGCCAGATTTCGACGGTTTGTCTAGATTGAAGGTTGTCACCATGATTTTG AACGAAGTTTTGAGATTATACCCACCAGCTTCTATGTTGACCAGAATCATTCAAAAAGAAACCA GAGTCGGTAAGTTGACTTTGCCAGCTGGTGTTATTTTGATCATGCCAATCATCTTGATCCACAG AGATCATGATTTGTGGGGTGAAGATGCTAATGAATTCAAGCCAGAAAGATTCTCCAAGGGTGTT TCTAAAGCTGCTAAAGTTCAACCAGCTTTCTTTCCATTTGGTTGGGGTCCAAGAATATGTATGG GTCAAAATTTCGCTATGATCGAAGCTAAGATGGCCTTGTCTTTGATCTTGCAAAGATTTTCCTT CGAATTGTCCTCCTCATATGTTCATGCTCCAACTGTTGTTTTCACCACTCAACCACAACATGGT GCTCATATCGTTTTGAGAAAGTTGTAA Pathwaystep3 MEMSSSVAATISIWMVVVCIVGVGWRVVNWVWLRPKKLEKRLREQGLAGNSYRLLFGDLKERAA 892 sequenceV MEEQANSKPINFSHDIGPRVFPSMYKTIQNYGKNSYMWLGPYPRVHIMDPQQLKTVFTLVYDIQ CYP1798Protein KPNLNPLIKFLLDGIVTHEGEKWAKHRKIINPAFHLEKLKDMIPAFFHSCNEIVNEWERLISKE GSCELDVMPYLQNLAADAISRTAFGSSYEEGKMIFQLLKELTDLVVKVAFGVYIPGWRFLPTKS NNKMKEINRKIKSLLLGIINKRQKAMEEGEAGQSDLLGILMESNSNEIQGEGNNKEDGMSIEDV IEECKVFYIGGQETTARLLIWTMILLSSHTEWQERARTEVLKVFGNKKPDFDGLSRLKVVTMIL NEVLRLYPPASMLTRIIQKETRVGKLTLPAGVILIMPIILIHRDHDLWGEDANEFKPERFSKGV SKAAKVQPAFFPFGWGPRICMGQNFAMIEAKMALSLILQRFSFELSSSYVHAPTVVFTTQPQHG AHIVLRKL. 
Pathwaystep3 ATGGATGAAATCGAACATATTACCATCAATACAAATGGAATCAAAATGCATATTGCGTCAGTCG 893 sequenceW GCACAGGACCAGTTGTTCTCTTGCTACACGGCTTTCCAGAATTATGGTACTCTTGGAGACACCA EPH2A(codon ACTACTTTACCTGTCCTCCGTTGGGTACAGAGCAATAGCTCCAGATTTGAGAGGCTATGGCGAT optimized) ACTGACAGTCCAGCTAGTCCTACCTCTTATACTGCTCTTCATATTGTAGGTGACCTGGTCGGCG CATTAGACGAATTGGGAATAGAAAAGGTCTTTTTAGTGGGTCATGACTGGGGTGCTATTATCGC ATGGTACTTTTGTTTGTTTAGACCAGATAGAATTAAAGCACTTGTGAATTTGTCTGTCCAGTTT ATCCCACGTAACCCAGCAATACCTTTTATAGAAGGTTTCAGAACAGCTTTTGGTGATGACTTCT ACATTTGTAGATTTCAAGTACCTGGGGAAGCTGAAGAGGATTTCGCGTCTATCGATACTGCTCA ATTGTTTAAAACTTCATTATGCAATAGAAGCTCAGCCCCTCCTTGTTTGCCTAAAGAGATTGGT TTTAGGGCTATCCCACCACCAGAAAATCTGCCATCTTGGCTCACAGAGGAAGATATCAACTTCT ACGCAGCCAAGTTTAAACAAACTGGTTTTACTGGTGCCCTTAACTATTATAGAGCATTCGACTT GACATGGGAATTAACAGCCCCATGGACAGGAGCCCAGATCCAAGTTCCTGTAAAGTTCATAGTT GGTGATTCAGATCTCACGTACCATTTCCCTGGTGCTAAGGAATACATCCACAACGGAGGGTTTA AAAGAGATGTGCCACTATTAGAGGAAGTTGTTGTGGTAAAAGATGCCTGCCACTTCATTAACCA AGAGCGACCACAAGAGATTAATGCTCATATTCATGACTTCATCAATAAGTTCTAA Pathwaystep3 MDEIEHITINTNGIKMHIASVGTGPVVLLLHGFPELWYSWRHQLLYLSSVGYRAIAPDLRGYGD 894 sequenceX TDSPASPTSYTALHIVGDLVGALDELGIEKVFLVGHDWGAIIAWYFCLFRPDRIKALVNLSVQF EPH2AProtein IPRNPAIPFIEGFRTAFGDDFYICRFQVPGEAEEDFASIDTAQLFKTSLCNRSSAPPCLPKEIG FRAIPPPENLPSWLTEEDINFYAAKFKQTGFTGALNYYRAFDLTWELTAPWTGAQIQVPVKFIV GDSDLTYHFPGAKEYIHNGGFKRDVPLLEEVVVVKDACHFINQERPQEINAHIHDFINKF Pathwaystep3 ATGCCAGCAAATGCCCCAGATAAACAATCAGTGACTAATGCACCAGTAGTGCCGCCAAAGCATG 895 sequenceY ATACGGACCAGCAGGACGATTCACTAGAAAAACAGCAAGTATTAGAACCGAGCGTAAATAGTAA tDexTDNA(native TATACCAAAAAAGCAGACAAATCAACAGTTAGCGGTTGTTACAGCACCAGCAAATTCAGCACCT DNAsequence) CAAACCAAAACAACAGCAGAAATTTCTGCTGGTACAGAGTTAGACACGATGCCTAATGTTAAGC ATGTAGATGGCAAAGTTTATTTTTATGGAGATGATGGCCAACCAAAAAAGAATTTTACTACTAT TATAGATGGTAAACCTTACTACTTTGATAAAGATACAGGGGCACTATCTAATAACGATAAGCAA TATGTATCGGAATTATTCAGTATTGGCAATAAACATAACGCCGTCTATAACACATCATCAGATA ATTTTACGCAATTAGAAGGACATCTGACGGCAAGTAGTTGGTATCGTCCAAAAGATATTTTGAA 
AAATGGTAAACGTTGGGCACCTTCAACAGTGACTGATTTCAGACCATTATTGATGGCCTGGTGG CCGGATAAGAGTACGCAAGTCACTTATCTGAATTACATGAAAGATCAGGGCCTCTTGTCTGGTA CTCATCACTTTTCCGATAATGAAAATATGCGGACCTTAACGGCAGCTGCCATGCAGGCACAGGT AAACATTGAGAAAAAAATTGGGCAACTTGGCAATACGGATTGGTTGAAAACGGCGATGACGCAA TACATTGATGCCCAGCCCAATTGGAATATTGACAGTGAGGCGAAAGGAGATGATCATCTACAAG GTGGTGCACTACTTTATACAAATAGTGATATGTCGCCAAAGGCCAATTCTGATTATCGTAAGCT GAGCCGTACGCCTAAAAATCAAAAAGGTCAAATTGCTGATAAATATAAGCAAGGTGGGTTTGAA TTATTACTAGCAAACGATGTCGATAATTCTAATCCAGTTGTGCAAGCAGAACAACTTAATTGGT TACATTATATGATGAATATCGGTAGTATTTTACAAAATGATGACCAAGCTAATTTTGATGGTTA CCGTGTTGATGCTGTCGATAATGTGGACGCTGACTTACTACAGATTGCTGGTGAATATGCTAAG GCTGCCTATGGTGTTGACAAAAATGACGCGAGAGCGAATCAACATTTATCAATTTTGGAAGACT GGGGAGATGAAGATCCAGACTATGTCAAAGCACATGGCAACCAGCAAATTACAATGGATTTCCC CTTGCATTTAGCGATTAAATACGCGCTCAACATGCCTAATGATAAGCGGAGTGGCCTTGAGCCA ACCCGTGAACACAGTTTAGTCAAACGAATTACAGATGATAAAGAAAATGTTGCACAACCAAATT ATTCATTTATCCGAGCTCATGACAGTGAAGTACAAACGATTATTGCTGATATTATTAAAGATAA AATCAACCCGGCGTCAACAGGGCTAGATTCAACAGTGACTTTGGATCAAATTAAGCAGGCTTTT GACATCTATAATGCTGATGAATTGAAAGCAGATAAAGTTTACACACCTTACAATATTCCAGCAT CATACGCTTTGTTATTGACTAATAAAGACACAATTCCACGTGTTTATTATGGGGATATGTTCAC GGATGATGGCCAATACATGGCTAAACAATCACCTTACTATCAAGCGATTGATGCGTTGTTGAAA GCTCGTATCAAGTATGCTGCTGGTGGTCAAACCATGAAAATGAACTATTTTCCAGATGAACAAT CTGTTATGACATCAGTTCGTTATGGTAAGGGTGCAATGACGGCAAGTGACTCTGGTAACCAAGA GACACGCTATCAAGGTATTGGACTTGTTGTCAACAATCGCCCAGATTTGAAACTATCTGACAAA GATGAAGTCAAAATGGATATGGGTGCGGCACATAAAAACCAAGATTATCGCCCAGTTTTGTTGA CGACAAAATCAGGATTAAAAGTCTACAGCACTGATGCAAATGCACCTGTCGTTCGAACTGACGC CAATGGCCAATTAACTTTTAAGGCAGACATGGTATATGGTGTAAACGACCCACAAGTGTCAGGG TACATTGCGGCTTGGGTACCAGTAGGGGCTTCAGAAAATCAAGATGCTCGAACGAAAAGTGAAA CAACGCAGTCAACTGACGGGAGTGTTTATCATTCTAATGCAGCGTTAGATTCGCAAGTCATTTA TGAAGGCTTTTCAAATTTTCAAGACTTTCCAACAACACCCGATGAGTTTACGAACATTAAAATT GCTCAAAATGTTAACTTATTTAAGGATTGGGGTATTACTAGCTTTGAAATGGCGCCACAATATC GCGCCAGCTCAGATAAAAGTTTCTTAGATGCTATCGTACAAAATGGTTATGCATTTACAGATCG 
ATATGATATTGGTTACAACACACCAACAAAGTATGGGACAGCAGATAATTTGTTAGATGCTTTA CGTGCATTGCATGGTCAGGGTATTCAAGCGATTAACGACTGGGTACCAGATCAAATTTATAATC TACCCGATGAACAGTTAGTCACGGCTATTCGAACAGACGGTTCAGGTGATCATACTTATGGTTC AGTTATTGACCATACTTTGTATGCATCAAAGACAGTTGGCGGGGGCATTTATCAGCAACAATAT GGTGGGGCCTTCTTGGAACAATTAAAAACACAGTACCCGCAACTTTTCCAGCAAAAACAGATTT CCACAGATCAGCCAATGAACCCAGATATTCAAATTAAGTCATGGGAAGCCAAGTATTTCAACGG TTCGAACATTCAGGGGCGTGGGGCTTGGTATGTTTTGAAGGACTGGGGCACACAACAGTATTTT AATGTGTCAGATGCGCAGACCTTCCTTCCAAAGCAATTATTGGGTGAAAAGGCCAAAACTGGTT TTGTTACGCGTGGTAAGGAGACTTCATTCTATTCCACTAGTGGCTATCAAGCAAAATCTGCCTT TATTTGTGATAACGGTAATTGGTACTACTTTGATGACAAAGGGAAAATGGTTGTTGGAAACCAA GTTATCAATGGCATCAATTATTACTTTTTACCGAATGGTATCGAATTACAAGATGCCTATCTAG TACATGATGGTATGTACTATTATTATAATAATATTGGCAAGCAACTGCACAACACATATTACCA AGATAAACAAAAAAATTTCCATTACTTCTTTGAAGATGGGCACATGGCACAGGGTATTGTCACC ATCATTCAAAGTGATGGCACCCCAGTCACACAGTACTTTGATGAGAATGGTAAGCAACAAAAAG GCGTGGCGGTCAAAGGATCAGATGGTCATTTGCATTACTTTGACGGTGCGTCAGGGAATATGCT CTTTAAATCATGGGGTAGACTAGCAGATGGCTCTTGGCTATATGTAGACGAGAAAGGTAATGCG GTTACAGGCAAACAAACCATTAATAATCAAACGGTTTACTTTAATGATGATGGTCGTCAAATCA AAAATAACTTTAAAGAATTAGCAGATGGTTCTTGGCTTTATCTTAACAATAAAGGTGTTGCAGT AACAGGAGAGCAAATAATTAATGGGCAGACACTTTATTTTGGTAACGATGGTCGTCAATTTAAA GGGACAACACATATAAATGCTACTGGTGAAAGCCGTTACTATGACCCAGACTCAGGTAATATGA TAACTGATCGTTTTGAACGTGTTGGTGATAATCAATGGGCTTATTTTGGTTATGATGGTGTTGC AGTAACAGGGGACCGAATCATTAAAGGGCAAAAACTCTATTTCAACCAAAATGGTATCCAAATG AAAGGCCACTTACGTCTTGAAAATGGTATCATGCGTTATTACGATGCTGATACTGGCGAATTAG TTCGTAATCGATTTGTATTGCTATCTGATGGTTCATGGGTTTACTTTGGCCAAGATGGCGTACC CGTAACTGGCGTGCAAGTGATTAATGGCCAAACATTATATTTTGACGCAGATGGTAGGCAAGTC AAAGGGCAGCAACGTGTAATCGGCAATCAACGCTATTGGATGGATAAAGACAATGGTGAAATGA AAAAAATAACATACTAG Pathwaystep3 MPANAPDKQSVTNAPVVPPKHDTDQQDDSLEKQQVLEPSVNSNIPKKQTNQQLAVVTAPANSAP 896 sequenceZ QTKTTAEISAGTELDTMPNVKHVDGKVYFYGDDGQPKKNFTTIIDGKPYYFDKDTGALSNNDKQ tDexTProtein YVSELFSIGNKHNAVYNTSSDNFTQLEGHLTASSWYRPKDILKNGKRWAPSTVTDFRPLLMAWW 
PDKSTQVTYLNYMKDQGLLSGTHHFSDNENMRTLTAAAMQAQVNIEKKIGQLGNTDWLKTAMTQ YIDAQPNWNIDSEAKGDDHLQGGALLYTNSDMSPKANSDYRKLSRTPKNQKGQIADKYKQGGFE LLLANDVDNSNPVVQAEQLNWLHYMMNIGSILQNDDQANFDGYRVDAVDNVDADLLQIAGEYAK AAYGVDKNDARANQHLSILEDWGDEDPDYVKAHGNQQITMDFPLHLAIKYALNMPNDKRSGLEP TREHSLVKRITDDKENVAQPNYSFIRAHDSEVQTIIADIIKDKINPASTGLDSTVTLDQIKQAF DIYNADELKADKVYTPYNIPASYALLLTNKDTIPRVYYGDMFTDDGQYMAKQSPYYQAIDALLK ARIKYAAGGQTMKMNYFPDEQSVMTSVRYGKGAMTASDSGNQETRYQGIGLVVNNRPDLKLSDK DEVKMDMGAAHKNQDYRPVLLTTKSGLKVYSTDANAPVVRTDANGQLTFKADMVYGVNDPQVSG YIAAWVPVGASENQDARTKSETTQSTDGSVYHSNAALDSQVIYEGFSNFQDFPTTPDEFTNIKI AQNVNLFKDWGITSFEMAPQYRASSDKSFLDAIVQNGYAFTDRYDIGYNTPTKYGTADNLLDAL RALHGQGIQAINDWVPDQIYNLPDEQLVTAIRTDGSGDHTYGSVIDHTLYASKTVGGGIYQQQY GGAFLEQLKTQYPQLFQQKQISTDQPMNPDIQIKSWEAKYFNGSNIQGRGAWYVLKDWGTQQYF NVSDAQTFLPKQLLGEKAKTGFVTRGKETSFYSTSGYQAKSAFICDNGNWYYFDDKGKMVVGNQ VINGINYYFLPNGIELQDAYLVHDGMYYYYNNIGKQLHNTYYQDKQKNFHYFFEDGHMAQGIVT IIQSDGTPVTQYFDENGKQQKGVAVKGSDGHLHYFDGASGNMLFKSWGRLADGSWLYVDEKGNA VTGKQTINNQTVYFNDDGRQIKNNFKELADGSWLYLNNKGVAVTGEQIINGQTLYFGNDGRQFK GTTHINATGESRYYDPDSGNMITDRFERVGDNQWAYFGYDGVAVTGDRIIKGQKLYFNQNGIQM KGHLRLENGIMRYYDADTGELVRNRFVLLSDGSWVYFGQDGVPVTGVQVINGQTLYFDADGRQV KGQQRVIGNQRYWMDKDNGEMKKITY Pathway1sequence ATGGCAGCTGACCAATTGGTGAAAACTGAAGTCACCAAGAAGTCTTTTACTGCTCCTGTACAAA 897 idA AGGCTTCTACACCAGTTTTAACCAATAAAACAGTCATTTCTGGATCGAAAGTCAAAAGTTTATC tHMG-CoADNA ATCTGCGCAATCGAGCTCATCAGGACCTTCATCATCTAGTGAGGAAGATGATTCCCGCGATATT GAAAGCTTGGATAAGAAAATACGTCCTTTAGAAGAATTAGAAGCATTATTAAGTAGTGGAAATA CAAAACAATTGAAGAACAAAGAGGTCGCTGCCTTGGTTATTCACGGTAAGTTACCTTTGTACGC TTTGGAGAAAAAATTAGGTGATACTACGAGAGCGGTTGCGGTACGTAGGAAGGCTCTTTCAATT TTGGCAGAAGCTCCTGTATTAGCATCTGATCGTTTACCATATAAAAATTATGACTACGACCGCG TATTTGGCGCTTGTTGTGAAAATGTTATAGGTTACATGCCTTTGCCCGTTGGTGTTATAGGCCC CTTGGTTATCGATGGTACATCTTATCATATACCAATGGCAACTACAGAGGGTTGTTTGGTAGCT TCTGCCATGCGTGGCTGTAAGGCAATCAATGCTGGCGGTGGTGCAACAACTGTTTTAACTAAGG ATGGTATGACAAGAGGCCCAGTAGTCCGTTTCCCAACTTTGAAAAGATCTGGTGCCTGTAAGAT 
ATGGTTAGACTCAGAAGAGGGACAAAACGCAATTAAAAAAGCTTTTAACTCTACATCAAGATTT GCACGTCTGCAACATATTCAAACTTGTCTAGCAGGAGATTTACTCTTCATGAGATTTAGAACAA CTACTGGTGACGCAATGGGTATGAATATGATTTCTAAAGGTGTCGAATACTCATTAAAGCAAAT GGTAGAAGAGTATGGCTGGGAAGATATGGAGGTTGTCTCCGTTTCTGGTAACTACTGTACCGAC AAAAAACCAGCTGCCATCAACTGGATCGAAGGTCGTGGTAAGAGTGTCGTCGCAGAAGCTACTA TTCCTGGTGATGTTGTCAGAAAAGTGTTAAAAAGTGATGTTTCCGCATTGGTTGAGTTGAACAT TGCTAAGAATTTGGTTGGATCTGCAATGGCTGGGTCTGTTGGTGGATTTAACGCACATGCAGCT AATTTAGTGACAGCTGTTTTCTTGGCATTAGGACAAGATCCTGCACAAAATGTTGAAAGTTCCA ACTGTATAACATTGATGAAAGAAGTGGACGGTGATTTGAGAATTTCCGTATCCATGCCATCCAT CGAAGTAGGTACCATCGGTGGTGGTACTGTTCTAGAACCACAAGGTGCCATGTTGGACTTATTA GGTGTAAGAGGCCCGCATGCTACCGCTCCTGGTACCAACGCACGTCAATTAGCAAGAATAGTTG CCTGTGCCGTCTTGGCAGGTGAATTATCCTTATGTGCTGCCCTAGCAGCCGGCCATTTGGTTCA AAGTCATATGACCCACAACAGGAAACCTGCTGAACCAACAAAACCTAACAATTTGGACGCCACT GATATAAATCGTTTGAAAGATGGGTCCGTCACCTGCATTAAATCCTAA Pathway1sequence MAADQLVKTEVTKKSFTAPVQKASTPVLTNKTVISGSKVKSLSSAQSSSSGPSSSSEEDDSRDI 898 idB ESLDKKIRPLEELEALLSSGNTKQLKNKEVAALVIHGKLPLYALEKKLGDTTRAVAVRRKALSI tHMG-CoAProtein LAEAPVLASDRLPYKNYDYDRVFGACCENVIGYMPLPVGVIGPLVIDGTSYHIPMATTEGCLVA SAMRGCKAINAGGGATTVLTKDGMTRGPVVRFPTLKRSGACKIWLDSEEGQNAIKKAFNSTSRF ARLQHIQTCLAGDLLFMRFRTTTGDAMGMNMISKGVEYSLKQMVEEYGWEDMEVVSVSGNYCTD KKPAAINWIEGRGKSVVAEATIPGDVVRKVLKSDVSALVELNIAKNLVGSAMAGSVGGFNAHAA NLVTAVFLALGQDPAQNVESSNCITLMKEVDGDLRISVSMPSIEVGTIGGGTVLEPQGAMLDLL GVRGPHATAPGTNARQLARIVACAVLAGELSLCAALAAGHLVQSHMTHNRKPAEPTKPNNLDAT DINRLKDGSVTCIKS Pathway1sequence ATGTCTGCTGTTAACGTTGCACCTGAATTGATTAATGCCGACAACACAATTACCTACGATGCGA 899 idC TTGTCATCGGTGCTGGTGTTATCGGTCCATGTGTTGCTACTGGTCTAGCAAGAAAGGGTAAGAA erg1DNA AGTTCTTATCGTAGAACGTGACTGGGCTATGCCTGATAGAATTGTTGGTGAATTGATGCAACCA GGTGGTGTTAGAGCATTGAGAAGTCTGGGTATGATTCAATCTATCAACAACATCGAAGCATATC CTGTTACCGGTTATACCGTCTTTTTCAACGGCGAACAAGTTGATATTCCATACCCTTACAAGGC CGATATCCCTAAAGTTGAAAAATTGAAGGACTTGGTCAAAGATGGTAATGACAAGGTCTTGGAA GACAGCACTATTCACATCAAGGATTACGAAGATGATGAAAGAGAAAGGGGTGTTGCTTTTGTTC 
ATGGTAGATTCTTGAACAACTTGAGAAACATTACTGCTCAAGAGCCAAATGTTACTAGAGTGCA AGGTAACTGTATTGAGATATTGAAGGATGAAAAGAATGAGGTTGTTGGTGCCAAGGTTGACATT GATGGCCGTGGCAAGGTGGAATTCAAAGCCCACTTGACATTTATCTGTGACGGTATCTTTTCAC GTTTCAGAAAGGAATTGCACCCAGACCATGTTCCAACTGTCGGTTCTTCGTTTGTCGGTATGTC TTTGTTCAATGCTAAGAATCCTGCTCCTATGCACGGTCACGTTATTCTTGGTAGTGATCATATG CCAATCTTGGTTTACCAAATCAGTCCAGAAGAAACAAGAATCCTTTGTGCTTACAACTCTCCAA AGGTCCCAGCTGATATCAAGAGTTGGATGATTAAGGATGTCCAACCTTTCATTCCAAAGAGTCT ACGTCCTTCATTTGATGAAGCCGTCAGCCAAGGTAAATTTAGAGCTATGCCAAACTCCTACTTG CCAGCTAGACAAAACGACGTCACTGGTATGTGTGTTATCGGTGACGCTCTAAATATGAGACATC CATTGACTGGTGGTGGTATGACTGTCGGTTTGCATGATGTTGTCTTGTTGATTAAGAAAATAGG TGACCTAGACTTCAGCGACCGTGAAAAGGTTTTGGATGAATTACTAGACTACCATTTCGAAAGA AAGAGTTACGATTCCGTTATTAACGTTTTGTCAGTGGCTTTGTATTCTTTGTTCGCTGCTGACA GCGATAACTTGAAGGCATTACAAAAAGGTTGTTTCAAATATTTCCAAAGAGGTGGCGATTGTGT CAACAAACCCGTTGAATTTCTGTCTGGTGTCTTGCCAAAGCCTTTGCAATTGACCAGGGTTTTC TTCGCTGTCGCTTTTTACACCATTTACTTGAACATGGAAGAACGTGGTTTCTTGGGATTACCAA TGGCTTTATTGGAAGGTATTATGATTTTGATCACAGCTATTAGAGTATTCACCCCATTTTTGTT TGGTGAGTTGATTGGTTAA Pathway1sequence MSAVNVAPELINADNTITYDAIVIGAGVIGPCVATGLARKGKKVLIVERDWAMPDRIVGELMQP 900 idD GGVRALRSLGMIQSINNIEAYPVTGYTVFFNGEQVDIPYPYKADIPKVEKLKDLVKDGNDKVLE erg1protein DSTIHIKDYEDDERERGVAFVHGRFLNNLRNITAQEPNVTRVQGNCIEILKDEKNEVVGAKVDI DGRGKVEFKAHLTFICDGIFSRFRKELHPDHVPTVGSSFVGMSLFNAKNPAPMHGHVILGSDHM PILVYQISPEETRILCAYNSPKVPADIKSWMIKDVQPFIPKSLRPSFDEAVSQGKFRAMPNSYL PARQNDVTGMCVIGDALNMRHPLTGGGMTVGLHDVVLLIKKIGDLDFSDREKVLDELLDYHFER KSYDSVINVLSVALYSLFAADSDNLKALQKGCFKYFQRGGDCVNKPVEFLSGVLPKPLQLTRVF FAVAFYTIYLNMEERGFLGLPMALLEGIMILITAIRVFTPFLFGELIG Pathway2sequence ATGTGGAGATTAAAAGTGGGAAAAGAGAGTGTTGGGGAAAAAGAAGAGAAATGGATTAAGAGTA 901 idE TAAGCAATCACTTGGGACGTCAAGTTTGGGAATTTTGCAGTGGTGAAAATGAAAATGATGATGA CmeloDNA TGAAGCCATTGCTGTTGCTAATAATTCTGCTTCAAAGTTCGAGAATGCCAGGAATCACTTTCGT AATAATCGTTTCCATCGCAAGCAATCTTCCGACCTCTTTCTTGCCATTCAGTGTGAAAAGGAAA TAATAAGAAACGGTGCAAAAAATGAAGGAACCACCAAAGTAAAAGAAGGGGAAGATGTGAAGAA 
AGAAGCAGTGAAGAATACATTAGAAAGAGCATTAAGTTTCTATTCGGCTGTTCAAACAAGCGAT GGGAATTGGGCTTCGGATCTTGGCGGGCCTATGTTTTTACTACCGGGTTTAGTGATTGCTCTAT ATGTCACTGGAGTCTTGAATTCTGTTCTGTCCAAGCACCATCGCCAAGAAATGTGTAGATATAT TTACAATCATCAGAATGAAGATGGGGGATGGGGTTTGCACATTGAAGGTTCGAGCACGATGTTT GGTTCGGCACTGAATTATGTTGCACTGAGACTGCTTGGAGAGGCTGCCGATGGCGGAGAGCACG GCGCAATGACAAAAGCTCGAAGTTGGATCTTGGAGCGTGGTGGAGCTACCGCAATCACTTCTTG GGGAAAATTGTGGCTGTCAGTACTTGGAGTCTATGAATGGAGTGGCAACAATCCTCTCCCACCT GAATTTTGGTTACTCCCATATAGCCTACCATTTCATCCTGGAAGAATGTGGTGCCATTGTCGAA TGGTTTATCTACCAATGTCGTACTTATATGGAAAGAGATTTGTTGGGCCAATCACACCCATAGT TTTATCTCTAAGAAAAGAGCTTTACACAATTCCATATCATGAAATTGATTGGAATAGATCTCGC AATACATGTGCAAAGGAGGATTTGTACTATCCACATCCGAAGATGCAAGATATTTTATGGGGAT CGATATACCACGTGTATGAGCCATTGTTTAGTGGTTGGCCAGGGAAAAGGTTGAGGGAAAAGGC AATGAAAATTGCAATGGAACATATACATTATGAAGATGAAAATAGTCGATATATATGTCTTGGT CCTGTCAATAAAGTACTTAATATGCTTTGTTGTTGGGTTGAAGATCCTTATTCAGATGCCTTCA AATTTCATCTACAAAGAATCCCTGACTATCTTTGGCTTGCTGAAGATGGCATGAGAATGCAGGG TTACAATGGGAGTCAATTGTGGGACACTGCTTTCTCTATTCAAGCAATTATATCCACCAAACTT ATAGACACCTTTGGCCCAACCTTAAGAAAAGCACATCATTTTGTTAAACACTCTCAGATCCAGG AGGACTGTCCTGGTGATCCTAACGTTTGGTTCCGTCACATTCATAAAGGTGCTTGGCCTTTTTC AACTCGAGATCATGGTTGGCTCATCTCTGACTGTACGGCCGAGGGACTAAAGGCTTCTTTGATG TTATCCAAACTTCCATCCAAAATAGTTGGGGAGCCATTAGAAAAGAATCGCCTTTGTGATGCTG TTAATGTTCTCCTTTCTTTACAAAACGAAAATGGTGGATTTGCATCATACGAGTTGACAAGATC ATACCCTTGGTTGGAGTTGATCAACCCTGCAGAAACATTTGGAGATATCGTCATCGATTATTCG TATGTGGAGTGCACCTCAGCGACAATGGAAGCATTGGCATTGTTTAAGAAGTTACATCCAGGGC ATAGGACCAAAGAGATTGATGCTGCTATTGCCAAGGCCGCCAACTTTCTTGAAAATATGCAAAA GACTGATGGCTCTTGGTATGGATGTTGGGGGGTATGCTTCACATATGCAGGGTGGTTTGGGATA AAGGGATTGGTTGCTGCAGGAAGAACATATAATAACTGTGTTGCAATTCGTAAGGCTTGTAATT TTCTTTTATCTAAAGAGTTACCTGGTGGTGGATGGGGGGAGAGTTACCTTTCATGTCAGAATAA GGTCTACACCAATCTTGAAGGAAACAAACCACACTTGGTTAATACTGCTTGGGTAATGATGGCT CTCATTGAAGCTGGCCAGGGTGAGAGAGACCCAGCCCCATTGCATCGTGCAGCAAGATTATTAA TCAATTCTCAATTGGAGAGTGGTGATTTTCCCCAACAGGAGATCATGGGAGTGTTTAATAAAAA 
CTGTATGATTACATATGCTGCATACCGAAACATTTTTCCCATTTGGGCTCTTGGAGAGTATTCC CATAGAGTTTTGGATATGTAA Pathway2sequence MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 902 idF NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD CmeloProtein GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDM Pathway2sequence ATGTGGAAACTTAAAGTTGCTGAGGGTGGCACTCCATGGTTAAGAACCCTAAACAATCACGTGG 903 idG GTAGACAGGTTTGGGAGTTTGACCCACATTCTGGTTCTCCTCAAGACTTGGACGATATTGAGAC PSXY118LDNA AGCAAGAAGAAATTTCCATGACAATCGTTTCACTCATAAACACTCAGACGACTTACTTATGAGA (codonoptimized) TTGCAGTTTGCCAAAGAAAACCCCATGAATGAAGTACTGCCTAAGGTTAAGGTTAAAGACGTTG AAGATGTCACAGAAGAAGCAGTTGCTACCACTCTAAGAAGAGGCTTGAACTTCTACAGCACCAT ACAATCCCACGATGGTCATTGGCCCGGTGATTTGGGTGGTCCTATGTTCTTGATGCCTGGTTTA GTTATCACTTTGTCCGTTACTGGGGCTCTTAACGCTGTTTTAACCGATGAACATAGAAAAGAGA TGAGAAGATACTTATACAATCACCAAAACAAAGATGGAGGCTGGGGCTTGCATATTGAAGGTCC TAGTACGATGTTTGGTTCAGTGTTATGCTATGTTACCTTGAGACTATTGGGTGAAGGGCCAAAT GATGGTGAGGGTGACATGGAGAGAGGAAGAGATTGGATCCTAGAACATGGTGGAGCAACATATA TAACCTCTTGGGGCAAAATGTGGTTATCTGTATTGGGCGTGTTTGAATGGTCAGGGAACAATCC AATGCCACCAGAAATTTGGTTGTTGCCTTATGCTCTTCCAGTTCATCCAGGAAGAATGTGGTGT CATTGTAGGATGGTTTACTTACCGATGTCGTACTTATACGGAAAACGTTTTGTCGGTCCTATTA CACCGACCGTGCTTAGTCTTAGGAAAGAGCTATTTACAGTACCGTATCATGATATAGACTGGAA CCAAGCAAGAAATTTATGTGCCAAAGAAGATTTATATTACCCTCATCCACTAGTGCAGGATATA 
TTATGGGCTACACTTCACAAGTTTGTCGAACCCGTCTTTATGAATTGGCCTGGTAAGAAGCTAA GGGAAAAGGCGATCAAAACAGCAATTGAGCACATTCATTATGAGGATGAGAATACTAGGTATAT CTGCATTGGGCCCGTCAACAAAGTGTTGAATATGCTGTGTTGTTGGGTGGAAGATCCTAATTCC GAAGCTTTCAAACTGCATTTGCCGAGAATTTATGATTACCTATGGGTAGCTGAAGATGGCATGA AAATGCAAGGTTATAACGGATCGCAATTGTGGGATACAGCATTTGCTGCACAAGCCATTATTAG CACAAATCTAATTGACGAATTCGGACCCACGTTAAAGAAGGCGCACGCCTTCATTAAGAATAGT CAAGTATCCGAAGATTGTCCTGGTGATCTGAGCAAATGGTACAGACACATCTCAAAAGGTGCTT GGCCATTTTCTACTGCCGATCATGGCTGGCCAATTAGCGACTGTACTGCGGAAGGGCTTAAGGC AGTATTGTTATTATCGAAGATAGCACCTGAGATTGTTGGAGAACCATTGGATTCCAAGCGTTTG TATGATGCAGTTAATGTAATTCTGTCACTGCAGAACGAAAATGGAGGTTTGGCGACTTACGAAT TGACTAGATCATATACGTGGCTGGAAATAATCAACCCTGCCGAAACGTTTGGTGACATAGTCAT AGATTGTCCATATGTTGAATGCACAAGTGCTGCCATTCAGGCTCTAGCAACTTTTGGTAAATTG TATCCAGGTCATCGTCGTGAAGAAATACAATGTTGCATAGAGAAAGCCGTTGCCTTCATCGAGA AGATTCAAGCTTCTGATGGTTCTTGGTATGGATCATGGGGCGTCTGTTTTACCTACGGGACGTG GTTTGGTATCAAGGGTTTGATTGCTGCAGGGAAGAATTTCTCCAATTGCTTAAGTATAAGGAAA GCGTGTGAGTTCTTACTGTCTAAACAATTGCCAAGTGGTGGATGGGCCGAATCTTACTTGTCTT GTCAGAACAAAGTGTACTCTAACTTAGAAGGAAATAGGTCGCACGTCGTTAATACAGGATGGGC TATGCTTGCATTGATTGAAGCAGAGCAAGCTAAGAGAGATCCAACTCCACTACATAGAGCAGCC GTATGCTTAATCAACTCACAACTTGAAAATGGCGACTTTCCGCAAGAAGAAATCATGGGCGTAT TCAATAAGAACTGTATGATAACTTACGCTGCGTATAGGTGCATCTTTCCCATTTGGGCTTTGGG TGAATATAGAAGAGTCTTACAAGCTTGCTAG Pathway2sequence MWKLKVAEGGTPWLRTLNNHVGRQVWEFDPHSGSPQDLDDIETARRNFHDNRFTHKHSDDLLMR 904 idH LQFAKENPMNEVLPKVKVKDVEDVTEEAVATTLRRGLNFYSTIQSHDGHWPGDLGGPMFLMPGL PSXY118LProtein VITLSVTGALNAVLTDEHRKEMRRYLYNHQNKDGGWGLHIEGPSTMFGSVLCYVTLRLLGEGPN DGEGDMERGRDWILEHGGATYITSWGKMWLSVLGVFEWSGNNPMPPEIWLLPYALPVHPGRMWC HCRMVYLPMSYLYGKRFVGPITPTVLSLRKELFTVPYHDIDWNQARNLCAKEDLYYPHPLVQDI LWATLHKFVEPVFMNWPGKKLREKAIKTAIEHIHYEDENTRYICIGPVNKVLNMLCCWVEDPNS EAFKLHLPRIYDYLWVAEDGMKMQGYNGSQLWDTAFAAQAIISTNLIDEFGPTLKKAHAFIKNS QVSEDCPGDLSKWYRHISKGAWPFSTADHGWPISDCTAEGLKAVLLLSKIAPEIVGEPLDSKRL YDAVNVILSLQNENGGLATYELTRSYTWLEIINPAETFGDIVIDCPYVECTSAAIQALATFGKL 
YPGHRREEIQCCIEKAVAFIEKIQASDGSWYGSWGVCFTYGTWFGIKGLIAAGKNFSNCLSIRK ACEFLLSKQLPSGGWAESYLSCQNKVYSNLEGNRSHVVNTGWAMLALIEAEQAKRDPTPLHRAA VCLINSQLENGDFPQEEIMGVFNKNCMITYAAYRCIFPIWALGEYRRVLQAC Pathway2sequence ATGACCACGACAAACTGGTCCCTAAAGGTAGACAGAGGGCGTCAAACTTGGGAATACTCTCAAG 905 idI AAAAGAAGGAGGCCACTGATGTGGACATCCATTTGCTACGACTGAAGGAACCCGGCACACATTG DdCASY80LDNA CCCTGAAGGTTGTGATCTGAATCGCGCTAAAACTCCCCAACAAGCGATTAAGAAAGCATTTCAG (codonoptimized) TACTTCTCCAAAGTCCAAACAGAAGATGGTCATTGGGCTGGAGATTTGGGTGGGCCAATGTTCT TGTTACCCGGTTTGGTGATAACATGCTACGTTACTGGCTATCAATTGCCAGAATCCACTCAAAG GGAAATTATAAGGTATCTGTTCAATAGACAGAATCCGGTTGATGGTGGCTGGGGTTTGCATATA GAGGCCCACTCTGATATATTTGGAACTACGTTACAATATGTATCATTGAGATTACTTGGAGTTC CAGCCGACCATCCATCTGTTGTAAAGGCAAGAACCTTCTTATTACAGAATGGTGGAGCAACCGG TATTCCTTCATGGGGTAAATTCTGGTTGGCCACGTTGAATGCATACGACTGGAACGGGTTGAAT CCAATTCCTATTGAATTTTGGCTGTTACCCTACAACTTACCCATTGCTCCTGGTAGGTGGTGGT GTCACTGTCGGATGGTCTATCTCCCAATGTCTTATATCTACGCTAAGAAAACAACTGGTCCACT AACAGATTTGGTCAAGGATCTGAGGAGAGAAATCTATTGTCAAGAGTACGAAAAGATTAACTGG TCTGAACAAAGAAACAATATTTCGAAATTAGACATGTACTACGAGCATACATCTCTTTTAAATG TTATAAACGGATCATTGAATGCTTACGAGAAAGTTCATTCCAAATGGCTTAGGGATAAAGCCAT TGACTATACCTTTGACCATATACGCTATGAAGATGAGCAGACGAAATACATTGACATAGGTCCA GTCAATAAGACCGTCAATATGTTATGCGTTTGGGATAGAGAAGGCAAATCTCCTGCGTTTTACA AACATGCCGATCGACTTAAAGATTATCTATGGTTATCTTTCGATGGGATGAAAATGCAAGGCTA TAACGGTTCTCAATTGTGGGACACTGCTTTTACGATCCAAGCATTCATGGAATCTGGGATTGCC AATCAATTCCAGGATTGTATGAAATTAGCTGGTCACTATTTGGACATCTCCCAGGTACCAGAAG ATGCCAGAGATATGAAGCACTACCACAGACACTATTCGAAGGGTGCATGGCCTTTTAGTACCGT TGACCATGGATGGCCAATTTCAGATTGCACAGCAGAAGGTATCAAGTCAGCGCTTGCTCTCAGA TCTTTGCCTTTTATCGAACCAATATCCTTAGATAGAATTGCTGATGGCATTAATGTTCTATTAA CCTTGCAAAATGGGGATGGTGGATGGGCATCGTACGAGAACACAAGAGGACCGAAATGGCTGGA AAAGTTTAACCCTTCCGAAGTTTTCCAGAATATAATGATTGACTATAGCTATGTGGAATGTAGT GCTGCTTGTATTCAAGCTATGAGTGCGTTTCGTAAACATGCACCTAATCATCCAAGAATTAAGG AAATCAACAGATCTATTGCACGTGGAGTGAAATTTATCAAGAGCATTCAACGTCAGGATGGTTC ATGGCTGGGCAGTTGGGGAATTTGTTTTACCTACGGTACTTGGTTTGGCATAGAGGGCTTAGTA 
GCATCTGGTGAGCCTCTAACATCGCCATCGATCGTGAAGGCTTGCAAGTTTCTTGCGTCAAAAC AACGTGCAGATGGTGGTTGGGGAGAAAGCTTTAAAAGCAATGTGACTAAAGAATATGTTCAACA CGAAACTTCACAAGTAGTCAATACTGGTTGGGCTCTACTCAGTCTAATGAGTGCTAAATATCCG GACAGAGAGTGCATAGAGAGAGGTATCAAATTCTTAATACAGAGGCAATATCCGAACGGTGATT TTCCACAGGAATCCATTATTGGCGTTTTCAATTTTAACTGTATGATCTCATATTCAAACTATAA GAACATATTCCCTCTTTGGGCCTTGAGTAGGTATAATCAATTGTACCTTAAAAGCAAAATCTGA Pathway2sequence MTTTNWSLKVDRGRQTWEYSQEKKEATDVDIHLLRLKEPGTHCPEGCDLNRAKTPQQAIKKAFQ 906 idJ YFSKVQTEDGHWAGDLGGPMFLLPGLVITCYVTGYQLPESTQREIIRYLFNRQNPVDGGWGLHI DdCASY80Lprotein EAHSDIFGTTLQYVSLRLLGVPADHPSVVKARTFLLQNGGATGIPSWGKFWLATLNAYDWNGLN PIPIEFWLLPYNLPIAPGRWWCHCRMVYLPMSYIYAKKTTGPLTDLVKDLRREIYCQEYEKINW SEQRNNISKLDMYYEHTSLLNVINGSLNAYEKVHSKWLRDKAIDYTFDHIRYEDEQTKYIDIGP VNKTVNMLCVWDREGKSPAFYKHADRLKDYLWLSFDGMKMQGYNGSQLWDTAFTIQAFMESGIA NQFQDCMKLAGHYLDISQVPEDARDMKHYHRHYSKGAWPFSTVDHGWPISDCTAEGIKSALALR SLPFIEPISLDRIADGINVLLTLQNGDGGWASYENTRGPKWLEKFNPSEVFQNIMIDYSYVECS AACIQAMSAFRKHAPNHPRIKEINRSIARGVKFIKSIQRQDGSWLGSWGICFTYGTWFGIEGLV ASGEPLTSPSIVKACKFLASKQRADGGWGESFKSNVTKEYVQHETSQVVNTGWALLSLMSAKYP DRECIERGIKFLIQRQYPNGDFPQESIIGVFNFNCMISYSNYKNIFPLWALSRYNQLYLKSKI Pathway2sequence ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 907 idK TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA CmeloDNA(codon TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT optimized) AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC 
GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGTGA Pathway1sequence ATGGTCGATCAATGTGCGTTAGGCTGGATATTAGCTAGTGCACTAGGATTGGTTATCGCTCTAT 908 idL GCTTCTTCGTTGCACCAAGAAGAAACCACAGAGGTGTTGATAGCAAAGAAAGGGATGAGTGTGT SQE1DNA(codon GCAGTCTGCAGCCACAACTAAGGGCGAATGCAGATTTAACGATAGAGATGTGGATGTTATTGTG optimized) GTTGGTGCAGGAGTAGCTGGTTCGGCATTAGCCCATACACTTGGTAAAGACGGTAGAAGAGTTC ATGTCATTGAGAGGGATTTGACTGAACCAGACAGGATTGTTGGTGAACTTCTACAACCAGGAGG 
CTATTTGAAGTTGATTGAGTTAGGCTTACAAGACTGTGTGGAAGAAATAGATGCACAAAGAGTT TACGGGTATGCTTTGTTTAAAGATGGTAAGAACACCAGACTATCTTATCCACTTGAAAATTTTC ACTCAGATGTCTCCGGTAGAAGCTTTCACAACGGTAGATTCATTCAGAGGATGAGAGAAAAGGC TGCTTCGCTGCCAAATGTAAGATTGGAACAAGGGACGGTTACTAGTCTACTGGAAGAGAAAGGG ACGATCAAAGGAGTTCAGTATAAGTCCAAGAATGGAGAGGAGAAAACCGCGTATGCGCCTTTAA CGATAGTGTGTGATGGCTGTTTCTCTAACTTACGTAGATCATTATGTAATCCCATGGTTGACGT TCCGAGCTACTTTGTTGGTCTTGTGTTAGAAAATTGCGAACTGCCATTTGCCAATCATGGACAT GTTATCCTTGGTGATCCATCCCCAATCTTATTCTATCAGATCTCAAGAACCGAAATTAGGTGTT TGGTCGATGTACCCGGTCAAAAGGTCCCTTCAATTGCCAATGGCGAAATGGAGAAATATTTAAA GACTGTTGTAGCTCCACAAGTACCACCTCAGATTTACGACAGTTTTATAGCCGCCATTGACAAA GGGAATATCAGAACTATGCCTAATAGGTCTATGCCTGCAGCTCCCCATCCAACTCCAGGTGCGT TACTGATGGGCGATGCATTCAACATGAGACATCCTCTAACAGGAGGTGGCATGACAGTAGCACT GTCTGACATTGTGGTCTTGAGAAACTTGTTAAAACCGTTAAAAGACTTGTCTGACGCCTCTACT TTGTGCAAATACTTGGAATCCTTTTATACCCTTCGTAAACCAGTAGCTAGCACAATCAACACCT TAGCTGGAGCCTTGTACAAAGTCTTTTGCGCATCACCGGATCAAGCGAGAAAGGAAATGAGACA AGCTTGTTTTGATTACCTAAGTCTGGGAGGTATTTTCTCGAATGGTCCTGTCTCATTGTTGTCA GGGTTGAATCCCAGACCTTTATCCTTGGTATTGCACTTCTTCGCTGTCGCAATTTATGGTGTTG GTCGTTTGCTTCTACCTTTTCCAAGTGTTAAGGGTATATGGATTGGTGCAAGGTTGATCTACTC TGCCTCTGGTATAATATTTCCCATAATTAGAGCTGAAGGCGTTCGTCAAATGTTCTTTCCTGCT ACAGTGCCCGCTTACTATCGTTCCCCACCTGTATTTAAACCGATAGTGTAG Pathway1sequence MVDQCALGWILASALGLVIALCFFVAPRRNHRGVDSKERDECVQSAATTKGECRFNDRDVDVIV 909 idM VGAGVAGSALAHTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLIELGLQDCVEEIDAQRV SQE1Protein YGYALFKDGKNTRLSYPLENFHSDVSGRSFHNGRFIQRMREKAASLPNVRLEQGTVTSLLEEKG TIKGVQYKSKNGEEKTAYAPLTIVCDGCFSNLRRSLCNPMVDVPSYFVGLVLENCELPFANHGH VILGDPSPILFYQISRTEIRCLVDVPGQKVPSIANGEMEKYLKTVVAPQVPPQIYDSFIAAIDK GNIRTMPNRSMPAAPHPTPGALLMGDAFNMRHPLTGGGMTVALSDIVVLRNLLKPLKDLSDAST LCKYLESFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSNGPVSLLS GLNPRPLSLVLHFFAVAIYGVGRLLLPFPSVKGIWIGARLIYSASGIIFPIIRAEGVRQMFFPA TVPAYYRSPPVFKPIV Pathway1sequence ATGGTCGATCAATGCGCGTTAGGCTGGATATTAGCTTCCGTCCTAGGAGCTGCAGCGTTGTATT 910 idN 
TCTTGTTTGGTAGAAAGAATGGTGGTGTGTCTAATGAAAGAAGGCATGAAAGTATTAAGAACAT SQE2DNA(codon TGCAACTACCAATGGTGAGTATAAGTCAAGTAACTCCGATGGTGACATCATCATTGTTGGTGCT optimized) GGCGTTGCTGGATCTGCTTTGGCCTATACGCTAGGTAAAGATGGGAGAAGAGTGCATGTCATTG AAAGGGATTTGACAGAACCAGACCGTATAGTAGGTGAATTGTTACAACCAGGAGGGTATCTAAA ACTGACAGAGTTGGGTTTAGAAGATTGTGTGGATGATATAGATGCTCAACGTGTTTATGGGTAT GCATTATTCAAAGACGGTAAAGATACCAGATTGTCCTATCCCTTGGAAAAGTTTCACTCTGACG TCGCAGGCAGATCCTTTCATAATGGCAGATTCATTCAGCGTATGAGAGAGAAAGCTGCTTCATT GCCTAAAGTGAGCCTAGAGCAAGGGACTGTAACGTCACTGTTGGAGGAAAACGGAATAATCAAA GGGGTACAGTATAAAACTAAGACTGGTCAAGAGATGACTGCATATGCTCCTTTAACAATCGTCT GTGACGGCTGCTTTTCGAACCTTCGTAGAAGCTTGTGCAACCCAAAAGTCGATGTTCCCTCATG TTTTGTGGGATTAGTTCTAGAAAATTGCGATTTGCCTTACGCCAATCACGGACATGTGATCTTG GCTGATCCGTCACCTATTCTGTTCTACAGAATATCTAGTACCGAAATCAGGTGTTTGGTTGATG TTCCAGGTCAGAAAGTGCCTTCTATCAGTAATGGCGAAATGGCCAACTACTTGAAGAATGTTGT TGCACCTCAGATTCCAAGCCAACTTTACGACTCTTTTGTTGCAGCCATTGACAAGGGAAACATA AGAACAATGCCGAATAGATCTATGCCAGCAGATCCATATCCAACACCCGGTGCGCTGCTAATGG GTGATGCCTTTAACATGAGACATCCTCTAACAGGTGGTGGTATGACAGTCGCTTTATCGGATGT TGTCGTATTAAGAGACTTACTGAAACCACTTAGAGACTTGAATGATGCACCTACCTTGAGCAAG TATTTAGAAGCCTTTTACACTCTGCGTAAGCCTGTTGCTTCTACCATAAACACGTTAGCAGGAG CATTGTACAAGGTATTCTGTGCTTCTCCTGATCAAGCGAGAAAGGAAATGAGACAAGCCTGTTT TGACTACCTTTCACTTGGTGGCATATTCAGTAATGGACCAGTATCCTTATTGTCAGGTCTTAAT CCAAGGCCCATTTCCCTTGTTTTACACTTCTTTGCAGTGGCTATCTATGGTGTTGGAAGGCTAT TAATACCGTTTCCATCACCGAAAAGGGTATGGATTGGTGCTAGAATTATTTCGGGCGCGAGTGC AATTATTTTCCCCATTATCAAAGCTGAAGGCGTCAGACAAATGTTCTTTCCAGCCACTGTAGCT GCCTACTACAGAGCCCCAAGAGTTGTTAAAGGTAGGTAG Pathway1sequence MVDQCALGWILASVLGAAALYFLFGRKNGGVSNERRHESIKNIATTNGEYKSSNSDGDIIIVGA 911 idO GVAGSALAYTLGKDGRRVHVIERDLTEPDRIVGELLQPGGYLKLTELGLEDCVDDIDAQRVYGY SQE2Protein ALFKDGKDTRLSYPLEKFHSDVAGRSFHNGRFIQRMREKAASLPKVSLEQGTVTSLLEENGIIK GVQYKTKTGQEMTAYAPLTIVCDGCFSNLRRSLCNPKVDVPSCFVGLVLENCDLPYANHGHVIL ADPSPILFYRISSTEIRCLVDVPGQKVPSISNGEMANYLKNVVAPQIPSQLYDSFVAAIDKGNI RTMPNRSMPADPYPTPGALLMGDAFNMRHPLTGGGMTVALSDVVVLRDLLKPLRDLNDAPTLSK 
YLEAFYTLRKPVASTINTLAGALYKVFCASPDQARKEMRQACFDYLSLGGIFSNGPVSLLSGLN PRPISLVLHFFAVAIYGVGRLLIPFPSPKRVWIGARIISGASAIIFPIIKAEGVRQMFFPATVA AYYRAPRVVKGR Pathway1sequence ATGGAATTCCAATCGGAACCCTTGTTTGGGGTTCTGTTGGCTAGTCTTTTAGCGCTGGTTTTCT 912 idP TCTTTACTTTGAGAGATGGTACCAAGAACAAGAAAACCACAACTGGGTCATCTGTGGATCTGAA SQE3DNA(codon ACGTACTGACGCTGTCCTACAAATGTCTCCCGAAAACGATGCTAGAAGGCAGGAAATCATAGGG optimized) GATTCAGACGTGATTGTAGTAGGTGCAGGAGTTGCAGGAGCTGCATTAGCCTATACGTTGGGCA AAGATGGTAGAAAAGTTCACGTAATTGAAAGAGACTTGACAGAGCCAGATAGAATTGTAGGTGA ACTATTACAACCTGGTGGCTACTTGAAGCTAGTGGAGTTGGGTCTTGAAGATAGTGTTAAAGGT ATTGACGCTCAACAAGTCTTTGGATATGCGTTGTATAAGGACGGTAAACACACAAGACTTACGT ATCCTTTGGAAAAGTTCGACTCAACTGTATCAGGCAGATCCTTCCATAATGGCAGATTCATCCA AAGATTAAGGGAATCTGTGAGACTAGAACAAGGAACTGTTACCAGCATCTTAGAAGAGGATGGA ACAGTTAAAGGTGTTCAGTATAAGACGAAAATTGGAGAGGAGTTTACAGCTTATGCACCATTGA CAATCGTCTGTGATGGCGGGTTTAGTAACTTGAGAAGAAATTTATGCAAACCACAAATCGACAT TCCCTCGTGTTTTGTGGGATTAGTTTTGGAAAACTGCAAACTTCCCTTCGAGAATCATGGCCAT GTAGTACTGGCAGATCCGTCACCTATTCTGTTATACCCGATTAGTTCAACGGAAATTCGTTGTT TGGTTGACATTCCAGGTCAGAAAGTGCCCTCAGTAGCCAATGGCGAAATGGCCAGATACTTAAA GACTGTTGTCGCTCCGCAAGTTCCACCTGAACTACATGCTGCCTTTATAGCGGCTATAGAGAAA GGTAATATCAAGAGCACAACTAACAGATCTATGCCAGCAGCACCTCACCCAACACCTGGCGCCC TGTTGCTAGGTGATGCATTCAATATGAGACATCCCTTAACCGGTGGTGGTATGACTGTTGCCTT AGCGGACATTGTTGTGCTTAGAGATTTGTTGCGTCCTCTTGCTAATCTAAAGGATGCTGATGCC TTGTGTCACTATCTAGAGTCCTTTTACACCCTTCGTAAACCTGTCGCATCCACCATAAACACAT TAGCTGGCGCATTATACAAGGTCTTTTGTGCCTCTCCAGATTCTGCTAGAAAGGAAATGAGGGA AGCATGTTTTGATTACCTGAGTTTAGGTGGTGTCTTTTCGTCTGGACCTGTAGCTTTGTTATCC GGTTTGAATCCAAGACCTTTGTCCTTATTTTGCCATTTCTTTGCAGTGGCCATATATGGAGTTT CTAGGTTGCTTATACCATTCCCAAGCCCAATGAGGATTTGGATTGGTGTTAGATTAATCACTGT TGCGGCCGGTATAATATTTCCGATTATCAAAGCTGAAGGGGTCAGACAGATGTTCTTTCCTGCT ACTGTCCCAGCTTATTACAGGGCACCACCAATGTAG Pathway1sequence MEFQSEPLFGVLLASLLALVFFFTLRDGTKNKKTTTGSSVDLKRTDAVLQMSPENDARRQEIIG 913 idQ DSDVIVVGAGVAGAALAYTLGKDGRKVHVIERDLTEPDRIVGELLQPGGYLKLVELGLEDSVKG SQE3Protein 
IDAQQVFGYALYKDGKHTRLTYPLEKFDSTVSGRSFHNGRFIQRLRESVRLEQGTVTSILEEDG TVKGVQYKTKIGEEFTAYAPLTIVCDGGFSNLRRNLCKPQIDIPSCFVGLVLENCKLPFENHGH VVLADPSPILLYPISSTEIRCLVDIPGQKVPSVANGEMARYLKTVVAPQVPPELHAAFIAAIEK GNIKSTTNRSMPAAPHPTPGALLLGDAFNMRHPLTGGGMTVALADIVVLRDLLRPLANLKDADA LCHYLESFYTLRKPVASTINTLAGALYKVFCASPDSARKEMREACFDYLSLGGVFSSGPVALLS GLNPRPLSLFCHFFAVAIYGVSRLLIPFPSPMRIWIGVRLITVAAGIIFPIIKAEGVRQMFFPA TVPAYYRAPPM SS3e-E7fusion ATGTCCGAAAATCACGTTCCTGCCGTTGTCAAAACGCGTGGAAGTGCAGCTCCTGGAAGTGGAA 915 protein,coding GTGGTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTG sequence, GATTAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAA cucurbitadienol AACGACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAA synthase ATCACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATG CGAGAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAA GATGTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTAC AGACCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGT TATTGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATG TGTCGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTT CTACTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGG CGGTGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCA ATAACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATC CATTGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTG TCATTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATC ACTCCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGA ATAGATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATAT CCTATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTG AGGGAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACA TATGCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTC TGATGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATG AGAATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCT CAACGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAG 
TCAGATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCT TGGCCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAG CTTCACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTT ATGTGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAA TTAACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAA TTGACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTT GCACCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAG AATATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTT GGTTTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAA GGCTTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGT TGCCAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGG TCATGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGC CAGATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTG TTTAATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAG GTGAATACTCCCATCGTGTCTTGGATATGTGA SS3e-E7fusion MSENHVPAVVKTRGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENE 916 protein, NDDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGE cucurbitadienol DVKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEM synthase CRYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATA ITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPI TPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRL REKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGM RMQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGA WPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYE LTRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLE NMQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLS CQNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGV FNKNCMITYAAYRNIFPIWALGEYSHRVLDM- SS3e-E7fusion MSENHVPAVVKTR 917 protein, fusiondomain SS3d-G5fusion ATGACGACCCAGCAAGAGGAGCTCGATGTTGGAGACAGTGAGGGAAGTGCAGCTCCTGGAAGTG 919 protein,coding 
GAAGTGGTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAA sequence, GTGGATTAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAAT cucurbitadienol GAAAACGACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAA synthase GAAATCACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACA ATGCGAGAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGT GAAGATGTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTG TACAGACCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCT AGTTATTGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAA ATGTGTCGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCT CTTCTACTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGA TGGCGGTGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACA GCAATAACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACA ATCCATTGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTG GTGTCATTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCA ATCACTCCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATT GGAATAGATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGA TATCCTATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGA TTGAGGGAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGT ACATATGCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTA CTCTGATGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGA ATGAGAATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAA TCTCAACGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCA TAGTCAGATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGA GCTTGGCCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGA AAGCTTCACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCG TTTATGTGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTAT GAATTAACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCG TAATTGACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAA GTTGCACCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTG GAGAATATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTG 
GTTGGTTTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAG AAAGGCTTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTA AGTTGCCAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCAT GGGTCATGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGC TGCCAGATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGT GTGTTTAATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTC TAGGTGAATACTCCCATCGTGTCTTGGATATGTGA SS3d-G5fusion MTTQQEELDVGDSEGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGEN 920 protein, ENDDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEG cucurbitadienol EDVKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQE synthase MCRYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGAT AITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGP ITPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKR LREKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDG MRMQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKG AWPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASY ELTRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFL ENMQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYL SCQNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMG VFNKNCMITYAAYRNIFPIWALGEYSHRVLDM- SS3d-G5fusion MTTQQEELDVGDSE 921 protein,fusion domain SS3c-G8fusion ATGGAGGACGGTAAACAGGCCATCAGCGAGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAG 923 protein,coding GAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAG sequence CATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGAC GATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCC GTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGA GATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAG AAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTG ACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCT ATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTAC 
ATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGT TTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCA TGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCC TGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCAC CCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAG AATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATA GTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCA GAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGG CAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAG GCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTG GACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTT CAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAG GGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAAT TGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCA AGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTT AGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGA TGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGC AGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGG TCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACA GTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGG GCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAG AAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCA TCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAA CTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAAC AAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGG CCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCT AATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAG AACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACT CCCATCGTGTCTTGGATATGTGA SS3c-G8fusion MEDGKQAISEGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDD 924 protein, 
DEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVK cucurbitadienol KEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRY synthase IYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITS WGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPI VLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREK AMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQ GYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPF STRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTR SYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQ KTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQN KVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNK NCMITYAAYRNIFPIWALGEYSHRVLDM- SS3c-G8fusion MEDGKQAISE 925 protein, fusiondomain SS3c-E5fusion ATGACGATCGGTGATAAGCTGAAAAAGAAGCTTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTT 927 protein,coding CAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAA sequence AAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGAC GACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACT TCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAA AGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTT AAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCT CTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGC GCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGT TACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTA TGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGA GCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACT TCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGC CACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTG TAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCA ATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGAT CCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATG 
GGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAA AAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCC TTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGC TTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATG CAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGA AATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGAT TCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCT TTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCAC TGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGA TGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACT AGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACT ACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCC TGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATG CAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTG GCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTG TAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAA AACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGA TGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATT GCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAAT AAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAAT ACTCCCATCGTGTCTTGGATATGTGA SS3c-E5fusion MTIGDKLKKKLGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEND 928 protein, DDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDV cucurbitadienol KKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCR synthase YIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAIT SWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITP IVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLRE KAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRM QGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWP FSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELT 
RSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENM QKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQ NKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFN KNCMITYAAYRNIFPIWALGEYSHRVLDM- SS3c-E5protein, MTIGDKLKKKL 929 fusiondomain SS2c-E2fusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 931 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG 
TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGATGTTGG AGCCGTCACCCTAA SS2c-E2fusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 932 protein, NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD cucurbitadienol GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF synthase GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMMLEPSP- SS2c-E2protein, MMLEPSP 933 fusiondomain SS2c-A10bfusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 935 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA 
TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA 
TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGAATTCGA ATGAAGACATCATACCTGAACTATAA SS2c-A10bfusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 936 protein, NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD cucurbitadienol GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF synthase GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMNSNEDIIPEL- SS2c-A10bprotein, MNSNEDIIPE 937 fusiondomain SS3b-C1fusion ATGTGGAACAAAACCAAAAAAACACAAGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAA 939 protein,coding TGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCAT sequence TAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGAT GAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTA ATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGAT CATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAA GAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACG GTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATA CGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATC TATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTG GGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGG TGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGG
GGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCG AATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAAT GGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTT TTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAA ACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAG TATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCC ATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGAC CCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAA GTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGT TATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGA TTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGA GGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGC ACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGT TATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGT CAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCC TATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTT ATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCA TAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAA ACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCA AAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTT CCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAA GTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCT TGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAAT CAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAAC TGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCC ATCGTGTCTTGGATATGTGA SS3b-C1fusion MWNKTKKTQGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDD 940 protein, EAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKK cucurbitadienol EAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYI synthase YNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSW 
GKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIV LSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKA MKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQG YNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFS TRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRS YPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQK TDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNK VYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKN CMITYAAYRNIFPIWALGEYSHRVLDM- SS3b-C1protein, MWNKTKKTQ 941 fusiondomain SS3b-B10fusion ATGGCCAAAGAAGATACTGTAAAACTAAAAAGGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTT 943 protein,coding CAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAA sequence AAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGAC GACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACT TCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAA AGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTT AAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCT CTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGC GCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGT TACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTA TGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGA GCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACT TCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGC CACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTG TAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCA ATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGAT CCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATG GGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAA AAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCC TTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGC TTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATG 
CAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGA AATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGAT TCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCT TTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCAC TGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGA TGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACT AGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACT ACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCC TGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATG CAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTG GCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTG TAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAA AACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGA TGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATT GCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAAT AAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAAT ACTCCCATCGTGTCTTGGATATGTGA SS3b-B10fusion MAKEDTVKLKRGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEND 944 protein, DDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDV cucurbitadienol KKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCR synthase YIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAIT SWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITP IVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLRE KAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRM QGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWP FSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELT RSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENM QKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQ NKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFN KNCMITYAAYRNIFPIWALGEYSHRVLDM- SS3b-B10protein, MAKEDTVKLKR 945 fusiondomain SS3a-D8fusion
ATGTCATTTCAAATTGAAACGGTTCGTACTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAG 947 protein,coding GAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAG sequence CATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGAC GATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCC GTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGA GATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAG AAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTG ACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCT ATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTAC ATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGT TTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCA TGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCC TGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCAC CCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAG AATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATA GTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCA GAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGG CAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAG GCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTG GACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTT CAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAG GGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAAT TGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCA AGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTT AGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGA TGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGC AGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGG TCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACA GTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGG GCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAG 
AAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCA TCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAA CTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAAC AAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGG CCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCT AATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAG AACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACT CCCATCGTGTCTTGGATATGTGA SS3a-D8fusion MSFQIETVRTGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDD 948 protein, DEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVK cucurbitadienol KEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRY synthase IYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITS WGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPI VLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREK AMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQ GYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPF STRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTR SYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQ KTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQN KVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNK NCMITYAAYRNIFPIWALGEYSHRVLDM SS3a-D8protein, MSFQIETVRT 949 fusiondomain SS3a-A2fusion ATGACCGGCTTGAATGGAGATGCTGACAGCGATCTACTAGGAAGTGCAGCTCCTGGAAGTGGAA 951 protein(5' GTGGTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTG fusion),coding GATTAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAA sequence AACGACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAA ATCACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATG CGAGAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAA GATGTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTAC AGACCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGT
TATTGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATG TGTCGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTT CTACTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGG CGGTGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCA ATAACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATC CATTGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTG TCATTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATC ACTCCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGA ATAGATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATAT CCTATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTG AGGGAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACA TATGCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTC TGATGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATG AGAATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCT CAACGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAG TCAGATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCT TGGCCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAG CTTCACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTT ATGTGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAA TTAACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAA TTGACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTT GCACCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAG AATATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTT GGTTTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAA GGCTTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGT TGCCAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGG TCATGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGC CAGATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTG TTTAATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAG GTGAATACTCCCATCGTGTCTTGGATATGTGA SS3a-A2fusion MTGLNGDADSDLLGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENE 
952 protein(5' NDDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGE fusion), DVKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEM cucurbitadienol CRYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATA synthase ITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPI TPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRL REKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGM RMQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGA WPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYE LTRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLE NMQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLS CQNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGV FNKNCMITYAAYRNIFPIWALGEYSHRVLDM- SS3a-A2protein MTGLNGDADSDLL (5'fusion), 953 fusiondomain SS3f-A8fusion ATGGCAAGTAACCAGCTCGAGCCCCTGCAAACTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTT 955 protein(5' CAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAA fusion),coding AAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGAC sequence GACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACT TCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAA AGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTT AAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCT CTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGC GCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGT TACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTA TGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGA GCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACT TCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGC CACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTG TAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCA ATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGAT 
CCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATG GGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAA AAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCC TTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGC TTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATG CAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGA AATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGAT TCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCT TTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCAC TGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGA TGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACT AGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACT ACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCC TGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATG CAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTG GCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTG TAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAA AACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGA TGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATT GCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAAT AAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAAT ACTCCCATCGTGTCTTGGATATGTGA SS3f-A8fusion MASNQLEPLQTGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEND 956 protein(5' DDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDV fusion), KKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCR cucurbitadienol YIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAIT synthase SWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITP IVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLRE KAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRM QGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWP 
FSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELT RSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENM QKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQ NKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFN KNCMITYAAYRNIFPIWALGEYSHRVLDM- SS3f-A8protein MASNQLEPLQT 957 (5'fusion), fusiondomain SS4b-B8bfusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 959 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG
TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGAATTTAG ATTTAGATCAAGATTCAGACTAG SS4b-B8bfusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 960 protein, NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD cucurbitadienol GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF synthase GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMNLDLDQDSD- SS4b-B8bfusion MNLDLDQDSD 961 protein,fusion domain SS4b-B8afusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 963 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence
TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA 
AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGATAAAAC ATATAGTTTCGCCATTCAGGACGAATTTTGTTGGCATCAGCAAGTCCGTGCTGTCAAGGATGAT TCATCACAAGGTTACAATCATAGGTTCTGGCCCCGCTGCCCACACCGCTGCTATATACTTGGCA AGAGCAGAGATGAAGCCCACATTATATGAGGGAATGATGGCCAACGGAATTGCTGCTGGTGGCC AATTGACAACAACCACCGATATCGAAAATTTCCCAGGGTTTCCTGAATCGTTGAGTGGCAGTGA ACTGATGGAGAGGATGAGGAAACAATCTGCCAAGTTTGGCACTAACATAATTATCGAGACTGTC TCTAAAGTCGATTTATCTTCAAAACCATTCAGATTATGGACCGAATTTAATGAGGATGCAGAGC CTGTGACCACTGATGCTATAATCTTGGCCACGGGTGCTTCCGCTAAGAGAATGCATTTACCAGG GGAGGAAACCTACTGGCAGCAGGGAATATCTGCCTGTGCTGTATGTGATGGTGCAGTCCCTATC TTTAGAAACAAGCCATTGGCCGTTATTGGTGGTGGTGACTCTGCGTGTGAGGAAGCGGAATTTC TTACGAAGTATGCGTCGAAAGTATATATATTAGTAAGAAAGGATCATTTTCGTGCATCTGTAAT AATGCAGAGACGAATTGAGAAAAATCCAAACATCATTGTTTTGTTCAACACAGTTGCATTAGAA GCTAAGGGTGATGGTAAGTTATTGAATATGTTGAGAATTAAGAATACTAAAAGTAATGTGGAGA ACGATTTAGAAGTAAATGGACTATTTTACGCAATAGGTCACAGCCCTGCCACAGATATAGTTAA AGGACAAGTAGATGAAGAAGAGACGGGGTATATAAAAACTGTGCCTGGATCGTCTCTGACTTCT GTGCCAGGTTTTTTTGCTGCAGGTGACGTTCAGGACTCTAGGTATAGACAAGCAGTTACTTCTG CTGGTTCCGGATGCATTGCTGCTTTGGATGCAGAACGGTACCTAAGTGCCCAAGAGTAA SS4b-B8afusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 964 protein, NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD cucurbitadienol GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF synthase GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS 
YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMIKHIVSPFRTNFVGISKSVLSRMIHHKVTIIGSGPAAHTAAIYLA RAEMKPTLYEGMMANGIAAGGQLTTTTDIENFPGFPESLSGSELMERMRKQSAKFGTNIIIETV SKVDLSSKPFRLWTEFNEDAEPVTTDAIILATGASAKRMHLPGEETYWQQGISACAVCDGAVPI FRNKPLAVIGGGDSACEEAEFLTKYASKVYILVRKDHFRASVIMQRRIEKNPNIIVLFNTVALE AKGDGKLLNMLRIKNTKSNVENDLEVNGLFYAIGHSPATDIVKGQVDEEETGYIKTVPGSSLTS VPGFFAAGDVQDSRYRQAVTSAGSGCIAALDAERYLSAQE- SS4b-D4fusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 966 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG 
ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGGCCGTAC AGAACCATATCTTGCCTCTAA SS4b-D4fusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 967 protein, NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD cucurbitadienol GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF synthase GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMAVQNHILPLTRVM- SS4b-D4fusion MAVQNHILPLTRVM protein,fusion 968 domain SS4c-C4afusion 
ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 970 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC 
AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTCTGCTT CAACTCATTCGCCTGAATAACCGTGTCTGA SS4c-C4afusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 971 protein, NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD cucurbitadienol GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF synthase GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMSASTHSPE-PCL SS4c-C4afusion MSASTHSPE 972 protein,fusion domain SS4c-C4bfusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 974 protein,coding TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT
CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGGGTACTA GCATAGTAAATCTAAACCAAAAGATTGAACTGCCCCCAATCCAGGTCTTATTCGAGTCACTTAA 
CCGAGAAAATGAAACAAAACCCCACTTCGAGGAACGCAGGTTATATCAACCTAATCCTTCATTT GTTCCTAGAACAAATATAGCAGTTGGTAGCCCAGTTAACCCGGTTCCAGTATCATCCCCTGTTT TTTTCATTGGTCCTTCTCCACAGAGAAGCATTCAGAATCACAACGCTATTATGACTCAAAACAT ACGTCAGTATCCAGTTATATATAATAATAACCGAGAAGTTATATCTACGGGTGAGAGAAATTAC ATAATAACTGTAGGGGGGCCTCCGGTAACTTCTTCGCAGCCCGAGTATGAGCATATCTCAACTC CCAATTTCTATCAAGAGCAGAGACTGGCACAACCTCATCCGGTAAATGAGAGTATGATGATAGG TGGTTATACAAATCCTCAGCCTATTAGCATTTCCCGAGGTAAAATGCTATCCGGCAACATAAGT ACGAACTCGGTCCGCGGATCTAATAATGGATATTCCGCAAAAGAAAAAAAACATAAGGCACATG GTAAGAGGTCCAATTTACCAAAGGCCACCGTTTCAATTCTAAACAAATGGTTACATGAGCACGT AAACAACCCTTACCCAACCGTGCAGGAAAAAAGAGAACTGCTCGCGAAAACTGGTCTAACTAAA CTTCAAATTTCCAATTGGTTCATTAATGCTAGGAGAAGAAAAATATTTTCTGGCCAGAATGACG CAAATAATTTCAGAAGAAAATTCAGTTCTTCTACAAATTTAGCTAAGTTCTGA SS4c-C4bfusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 975 protein, NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD cucurbitadienol GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF synthase GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMGTSIVNLNQKIELPPIQVLFESLNRENETKPHFEERRLYQPNPSF VPRTNIAVGSPVNPVPVSSPVFFIGPSPQRSIQNHNAIMTQNIRQYPVIYNNNREVISTGERNY IITVGGPPVTSSQPEYEHISTPNFYQEQRLAQPHPVNESMMIGGYTNPQPISISRGKMLSGNIS TNSVRGSNNGYSAKEKKHKAHGKRSNLPKATVSILNKWLHEHVNNPYPTVQEKRELLAKTGLTK LQISNWFINARRRKIFSGQNDANNFRRKFSSSTNLAKF- SS4e-B2fusion ATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCA 978 protein,coding 
TTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGA sequence TGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGT AATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGA TCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAA AGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGAC GGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTAT ACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACAT CTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTT GGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATG GTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTG GGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCC GAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAA TGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGT TTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGA AACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCA GTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGC CATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGA CCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCA AGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGG TTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTG ATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAG AGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAG CACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATG TTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAG TCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTC CTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGT TATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGC ATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAA AACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATC AAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACT 
TCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAA AGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCC TTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAA TCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAA CTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCC CATCGTGTCTTGGATATGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGGACCAGC CAAGGACCATTTAAGTAA SS4e-B2fusion MWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEAIAVANNSASKFENARNHFR 979 protein, NNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSD cucurbitadienol GNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMF synthase GSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGKLWLSVLGVYEWSGNNPLPP EFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSR NTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMKIAMEHIHYEDENSRYICLG PVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKL IDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLM LSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYPWLELINPAETFGDIVIDYS YVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGI KGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMA LIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYS HRVLDMGSAAPGSGSGSGMDQPRTI-V SS4e-B2fusion MDQPRTI 980 protein,fusion domain SS5a-E8fusion ATGACTACGGATAACGCAGCAGCATATTCAGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAG 982 protein,coding GAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAG sequence CATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGAC GATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCC GTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGA GATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAG AAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTG ACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCT ATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTAC 
ATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGT TTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCA TGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCC TGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCAC CCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAG AATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATA GTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCA GAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGG CAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAG GCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTG GACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTT CAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAG GGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAAT TGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCA AGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTT AGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGA TGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGC AGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGG TCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACA GTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGG GCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAG AAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCA TCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAA CTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAAC AAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGG CCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCT AATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAG AACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACT CCCATCGTGTCTTGGATATGTGA SS5a-E8fusion MTTDNAAAYSGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDD 983 protein, 
DEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVK cucurbitadienol KEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRY synthase IYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITS WGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPI VLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREK AMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQ GYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPF STRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTR SYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQ KTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQN KVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNK NCMITYAAYRNIFPIWALGEYSHRVLDM- SS5a-E8fusion MTTDNAAAYS 984 protein,fusion domain SS5d-E7fusion ATGGCTCTCGGTAATGAGATAGCCAACTTGCAAGAGGGAAGTGCAGCTCCTGGAAGTGGAAGTG 986 protein,coding GTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGAT sequence TAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAAC GACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATC ACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGA GAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGAT GTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGA CCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTAT TGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGT CGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTA CTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGG TGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATA ACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCAT TGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCA TTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACT CCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATA GATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCT 
ATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGG GAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATAT GCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGA TGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGA ATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAA CGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCA GATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGG CCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTT CACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATG TGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTA ACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTG ACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCA CCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAAT ATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGT TTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGC TTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGC CAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCA TGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAG ATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTT AATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTG AATACTCCCATCGTGTCTTGGATATGTGA SS5d-E7fusion MALGNEIANLQEGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEN 987 protein, DDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGED cucurbitadienol VKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMC synthase RYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAI TSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPIT PIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLR EKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMR MQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAW PFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYEL 
TRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLEN MQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSC QNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVF NKNCMITYAAYRNIFPIWALGEYSHRVLDM- SS5d-E7fusion MALGNEIANLQE 988 protein,fusion domain SS5d-G5fusion ATGCCAGTCTCTGTAATAACCACGTCAACACAGCCACATGTGAAGGAGCCTGTGGAAGAAGAGA 990 protein,coding GTGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGA sequence ATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCATTAGTAACCATTTGGGCAGACAAGTC TGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGATGAAGCAATTGCTGTAGCTAACAATT CAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTAATAACAGATTCCATAGGAAGCAATC TTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGATCATTAGGAATGGTGCTAAGAATGAA GGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAAGAAGCCGTAAAAAATACACTAGAAA GAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACGGTAATTGGGCATCAGACTTGGGAGG ACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATACGTTACAGGCGTTCTTAACAGTGTG TTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATCTATAATCACCAAAACGAAGATGGAG GGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTGGGTCTGCACTGAATTACGTTGCGTT AAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGGTGCGATGACCAAAGCAAGGTCCTGG ATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAG GAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCGAATTTTGGTTACTTCCATATAGCCT TCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAATGGTTTATCTGCCAATGTCGTATCTT TATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTTTTATCGTTGAGAAAAGAGTTATATA CCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAAACACGTGTGCTAAGGAGGACTTATA TTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAGTATCTACCATGTCTACGAACCGCTA TTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCCATGAAAATTGCAATGGAACATATCC ATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGACCCGTGAACAAAGTTCTAAATATGCT GTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAAGTTTCACTTGCAGAGAATTCCTGAC TATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGTTATAATGGTTCACAGCTTTGGGACA CTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGATTGATACGTTTGGTCCGACATTACG TAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGAGGATTGTCCAGGTGATCCAAACGTA TGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGCACTAGAGATCATGGTTGGCTTATAT 
CGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGTTATCTAAGTTGCCTTCTAAAATTGT GGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGTCAATGTTTTGTTATCATTGCAAAAC GAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCCTATCCCTGGTTAGAGTTGATTAACC CTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTTATGTGGAATGCACTAGTGCGACGAT GGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCATAGAACCAAAGAAATTGATGCCGCT ATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAAACAGACGGGTCTTGGTATGGTTGTT GGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAAC ATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGC GGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAAGTTTATACCAATCTGGAGGGAAACA AGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCTTGATTGAAGCAGGACAAGGGGAAAG AGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAATCAATAGTCAATTGGAGTCAGGTGAC TTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAACTGTATGATAACATATGCCGCATACA GAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCCATCGTGTCTTGGATATGTGA SS5d-G5fusion MPVSVITTSTQPHVKEPVEEESGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQV 991 protein, WEFCSGENENDDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNE cucurbitadienol GTTKVKEGEDVKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSV synthase LSKHHRQEMCRYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSW ILERGGATAITSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYL YGKRFVGPITPIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPL FSGWPGKRLREKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPD YLWLAEDGMRMQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNV WFRHIHKGAWPFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQN ENGGFASYELTRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAA IAKAANFLENMQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPG GGWGESYLSCQNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGD FPQQEIMGVFNKNCMITYAAYRNIFPIWALGEYSHRVLDM- SS5d-G5 MPVSVITTSTQPHVKEPVEEES 992 protein,fusion domain SS5d-G7fusion ATGTCATCGAGCAAGAAAATCACCAGTGTCAAACAAGGAAGTGCAGCTCCTGGAAGTGGAAGTG 994 protein,coding GTTCAGGAATGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGAT sequence 
TAAAAGCATTAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAAC GACGACGATGAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATC ACTTCCGTAATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGA GAAAGAGATCATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGAT GTTAAGAAAGAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGA CCTCTGACGGTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTAT TGCGCTATACGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGT CGTTACATCTATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTA CTATGTTTGGGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGG TGAGCATGGTGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATA ACTTCCTGGGGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCAT TGCCACCCGAATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCA TTGTAGAATGGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACT CCAATAGTTTTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATA GATCCAGAAACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCT ATGGGGCAGTATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGG GAAAAGGCCATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATAT GCCTTGGACCCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGA TGCTTTCAAGTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGA ATGCAGGGTTATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAA CGAAATTGATTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCA GATTCAAGAGGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGG CCTTTTAGCACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTT CACTGATGTTATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATG TGATGCAGTCAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTA ACTAGGTCCTATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTG ACTACAGTTATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCA CCCTGGGCATAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAAT ATGCAGAAAACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGT TTGGCATCAAAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGC 
TTGTAACTTCCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGC CAAAACAAAGTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCA TGATGGCCTTGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAG ATTGCTAATCAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTT AATAAGAACTGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTG AATACTCCCATCGTGTCTTGGATATGTGA SS5d-G7fusion MSSSKKITSVKQGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENEN 995 protein, DDDEAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGED cucurbitadienol VKKEAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMC synthase RYIYNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAI TSWGKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPIT PIVLSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLR EKAMKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMR MQGYNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAW PFSTRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYEL TRSYPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLEN MQKTDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSC QNKVYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVF NKNCMITYAAYRNIFPIWALGEYSHRVLDM- SS5d-G7fusion MSSSKKITSVKQ protein,fusion 996 domain SS5e-C10fusion ATGCGAGGCTTGACACCTAAGGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGTGGA 998 protein,coding GACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCATTAGTAA sequence CCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGATGAAGCA ATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTAATAACA GATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGATCATTAG GAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAAGAAGCC GTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACGGTAATT GGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATACGTTAC AGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATCTATAAT CACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTGGGTCTG 
CACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGGTGCGAT GACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGGGGCAAA CTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCGAATTTT GGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAATGGTTTA TCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTTTTATCG TTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAAACACGT GTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAGTATCTA CCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCCATGAAA ATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGACCCGTGA ACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAAGTTTCA CTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGTTATAAT GGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGATTGATA CGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGAGGATTG TCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGCACTAGA GATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGTTATCTA AGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGTCAATGT TTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCCTATCCC TGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTTATGTGG AATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCATAGAAC CAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAAACAGAC GGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCAAAGGCT TAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTTCCTGTT GTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAAGTTTAT ACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCTTGATTG AAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAATCAATAG TCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAACTGTATG ATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCCATCGTG TCTTGGATATGTGA SS5e-C10fusion MRGLTPKGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDEA 999 protein, IAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKEA cucurbitadienol 
VKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIYN synthase HQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWGK LWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVLS LRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAMK IAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGYN GSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFSTR DHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSYP WLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKTD GSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKVY TNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNCM ITYAAYRNIFPIWALGEYSHRVLDM- SS5e-C10fusion MRGLTPK 1000 protein,fusion domain SS5e-G8fusion ATGGCAAACCAAATAGCAAATCAAGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGT 1002 protein,coding GGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCATTAG sequence TAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGATGAA GCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTAATA ACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGATCAT TAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAAGAA GCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACGGTA ATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATACGT TACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATCTAT AATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTGGGT CTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGGTGC GATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGGGGC AAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCGAAT TTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAATGGT TTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTTTTA TCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAAACA CGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAGTAT CTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCCATG 
AAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGACCCG TGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAAGTT TCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGTTAT AATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGATTG ATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGAGGA TTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGCACT AGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGTTAT CTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGTCAA TGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCCTAT CCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTTATG TGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCATAG AACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAAACA GACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCAAAG GCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTTCCT GTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAAGTT TATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCTTGA TTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAATCAA TAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAACTGT ATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCCATC GTGTCTTGGATATGTGA SS5e-G8fusion MANQIANQGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDE 1003 protein, AIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKE cucurbitadienol AVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIY synthase NHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWG KLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVL SLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAM KIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGY NGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFST RDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSY PWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKT 
DGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKV YTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNC MITYAAYRNIFPIWALGEYSHRVLDM SS5e-G8fusion MANQIANQ 1004 protein,fusion domain SS5f-E11fusion ATGGTTGACGCTAGGGGTAGCAACGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAATGT 1006 protein,coding GGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCATTAG sequence TAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGATGAA GCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTAATA ACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGATCAT TAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAAGAA GCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACGGTA ATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATACGT TACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATCTAT AATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTGGGT CTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGGTGC GATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGGGGC AAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCGAAT TTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAATGGT TTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTTTTA TCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAAACA CGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAGTAT CTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCCATG AAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGACCCG TGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAAGTT TCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGTTAT AATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGATTG ATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGAGGA TTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGCACT AGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGTTAT CTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGTCAA 
TGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCCTAT CCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTTATG TGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCATAG AACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAAACA GACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCAAAG GCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTTCCT GTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAAGTT TATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCTTGA TTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAATCAA TAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAACTGT ATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCCATC GTGTCTTGGATATGTGA SS5f-E11fusion MVDARGSNGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDDE 1007 protein, AIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKKE cucurbitadienol AVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYIY synthase NHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSWG KLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIVL SLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKAM KIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQGY NGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFST RDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRSY PWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQKT DGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNKV YTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKNC MITYAAYRNIFPIWALGEYSHRVLDM- SS5f-E11fusion MVDARGSN 1008 protein,fusion domain SS5f-F8fusion ATGCACGGCAAAGAGTTGGCTGGGCTAGGAAGTGCAGCTCCTGGAAGTGGAAGTGGTTCAGGAA 1010 protein,coding TGTGGAGACTGAAAGTAGGCAAAGAATCTGTTGGCGAAAAGGAAGAAAAGTGGATTAAAAGCAT sequence TAGTAACCATTTGGGCAGACAAGTCTGGGAGTTTTGCTCTGGTGAAAATGAAAACGACGACGAT GAAGCAATTGCTGTAGCTAACAATTCAGCCTCAAAATTTGAAAATGCAAGAAATCACTTCCGTA 
ATAACAGATTCCATAGGAAGCAATCTTCCGACTTATTCTTGGCTATACAATGCGAGAAAGAGAT CATTAGGAATGGTGCTAAGAATGAAGGTACTACCAAAGTGAAAGAAGGTGAAGATGTTAAGAAA GAAGCCGTAAAAAATACACTAGAAAGAGCATTGTCGTTCTATTCTGCTGTACAGACCTCTGACG GTAATTGGGCATCAGACTTGGGAGGACCTATGTTCCTTTTACCAGGGCTAGTTATTGCGCTATA CGTTACAGGCGTTCTTAACAGTGTGTTGTCAAAACATCACAGGCAAGAAATGTGTCGTTACATC TATAATCACCAAAACGAAGATGGAGGGTGGGGTTTACATATAGAAGGCTCTTCTACTATGTTTG GGTCTGCACTGAATTACGTTGCGTTAAGGTTACTAGGAGAAGCGGCAGATGGCGGTGAGCATGG TGCGATGACCAAAGCAAGGTCCTGGATATTGGAAAGAGGAGGTGCCACAGCAATAACTTCCTGG GGCAAACTTTGGCTTTCCGTATTAGGAGTGTATGAATGGTCGGGAAACAATCCATTGCCACCCG AATTTTGGTTACTTCCATATAGCCTTCCCTTTCATCCAGGGAGAATGTGGTGTCATTGTAGAAT GGTTTATCTGCCAATGTCGTATCTTTATGGTAAGAGATTTGTTGGTCCAATCACTCCAATAGTT TTATCGTTGAGAAAAGAGTTATATACCATTCCGTACCACGAGATTGATTGGAATAGATCCAGAA ACACGTGTGCTAAGGAGGACTTATATTACCCTCACCCTAAGATGCAAGATATCCTATGGGGCAG TATCTACCATGTCTACGAACCGCTATTTTCAGGTTGGCCAGGTAAGAGATTGAGGGAAAAGGCC ATGAAAATTGCAATGGAACATATCCATTACGAAGATGAGAATTCCAGGTACATATGCCTTGGAC CCGTGAACAAAGTTCTAAATATGCTGTGTTGCTGGGTTGAGGATCCTTACTCTGATGCTTTCAA GTTTCACTTGCAGAGAATTCCTGACTATCTATGGTTAGCTGAAGATGGAATGAGAATGCAGGGT TATAATGGTTCACAGCTTTGGGACACTGCATTCAGCATACAAGCGATAATCTCAACGAAATTGA TTGATACGTTTGGTCCGACATTACGTAAAGCACACCATTTCGTCAAGCATAGTCAGATTCAAGA GGATTGTCCAGGTGATCCAAACGTATGGTTCAGACACATTCATAAAGGAGCTTGGCCTTTTAGC ACTAGAGATCATGGTTGGCTTATATCGGATTGCACAGCTGAGGGATTGAAAGCTTCACTGATGT TATCTAAGTTGCCTTCTAAAATTGTGGGTGAACCCTTGGAGAAGAACCGTTTATGTGATGCAGT CAATGTTTTGTTATCATTGCAAAACGAAAATGGTGGGTTCGCTTCTTATGAATTAACTAGGTCC TATCCCTGGTTAGAGTTGATTAACCCTGCCGAAACATTTGGTGATATCGTAATTGACTACAGTT ATGTGGAATGCACTAGTGCGACGATGGAAGCCTTAGCCTTGTTTAAGAAGTTGCACCCTGGGCA TAGAACCAAAGAAATTGATGCCGCTATTGCTAAAGCCGCAAATTTTCTGGAGAATATGCAGAAA ACAGACGGGTCTTGGTATGGTTGTTGGGGTGTCTGTTTTACTTATGCTGGTTGGTTTGGCATCA AAGGCTTAGTAGCTGCTGGTAGAACATACAATAATTGTGTTGCCATTAGAAAGGCTTGTAACTT CCTGTTGTCAAAAGAATTGCCTGGCGGTGGTTGGGGCGAAAGCTACCTAAGTTGCCAAAACAAA GTTTATACCAATCTGGAGGGAAACAAGCCACACTTAGTCAATACTGCATGGGTCATGATGGCCT 
TGATTGAAGCAGGACAAGGGGAAAGAGATCCAGCTCCGTTGCATAGAGCTGCCAGATTGCTAAT CAATAGTCAATTGGAGTCAGGTGACTTTCCACAACAAGAAATCATGGGTGTGTTTAATAAGAAC TGTATGATAACATATGCCGCATACAGAAACATATTCCCTATATGGGCTCTAGGTGAATACTCCC ATCGTGTCTTGGATATGTGA SS5f-F8fusion MHGKELAGLGSAAPGSGSGSGMWRLKVGKESVGEKEEKWIKSISNHLGRQVWEFCSGENENDDD 1011 protein, EAIAVANNSASKFENARNHFRNNRFHRKQSSDLFLAIQCEKEIIRNGAKNEGTTKVKEGEDVKK cucurbitadienol EAVKNTLERALSFYSAVQTSDGNWASDLGGPMFLLPGLVIALYVTGVLNSVLSKHHRQEMCRYI synthase YNHQNEDGGWGLHIEGSSTMFGSALNYVALRLLGEAADGGEHGAMTKARSWILERGGATAITSW GKLWLSVLGVYEWSGNNPLPPEFWLLPYSLPFHPGRMWCHCRMVYLPMSYLYGKRFVGPITPIV LSLRKELYTIPYHEIDWNRSRNTCAKEDLYYPHPKMQDILWGSIYHVYEPLFSGWPGKRLREKA MKIAMEHIHYEDENSRYICLGPVNKVLNMLCCWVEDPYSDAFKFHLQRIPDYLWLAEDGMRMQG YNGSQLWDTAFSIQAIISTKLIDTFGPTLRKAHHFVKHSQIQEDCPGDPNVWFRHIHKGAWPFS TRDHGWLISDCTAEGLKASLMLSKLPSKIVGEPLEKNRLCDAVNVLLSLQNENGGFASYELTRS YPWLELINPAETFGDIVIDYSYVECTSATMEALALFKKLHPGHRTKEIDAAIAKAANFLENMQK TDGSWYGCWGVCFTYAGWFGIKGLVAAGRTYNNCVAIRKACNFLLSKELPGGGWGESYLSCQNK VYTNLEGNKPHLVNTAWVMMALIEAGQGERDPAPLHRAARLLINSQLESGDFPQQEIMGVFNKN CMITYAAYRNIFPIWALGEYSHRVLDM- SS5f-F8fusion MHGKELAGL 1012 protein,fusion domain EXG1YLR300W MLSLKTLLCTLLTVSSVLATPVPARDPSSIQFVHEENKKRYYDYDHGSLGEPIRGVNIGGWLLL 1013 Saccharemyces EPYITPSLFEAFRTNDDNDEGIPVDEYHFCQYLGKDLAKSRLQSHWSTFYQEQDFANIASQGFN cerevisiae LVRIPIGYWAFQTLDDDPYVSGLQESYLDQAIGWARNNSLKVWVDLHGAAGSQNGFDNSGLRDS YKFLEDSNLAVTTNVLNYILKKYSAEEYLDTVIGIELINEPLGPVLDMDKMKNDYLAPAYEYLR NNIKSDQVIIIHDAFQPYNYWDDFMTENDGYWGVTIDHHHYQVFASDQLERSIDEHIKVACEWG TGVLNESHWTVCGEFAAALTDCTKWLNSVGFGARYDGSWVNGDQTSSYIGSCANNDDIAYWSDE RKENTRRYVEAQLDAFEMRGGWIIWCYKTESSLEWDAQRLMFNGLFPQPLTDRKYPNQCGTISN EXG1 MKLTKLVALAGAALASPIQLVPREGSFLGFNYGSEKVHGVNLGGWFVLEPFITPSLFEAFGNND 1014 Yarrowia ANVPVDEYHYTAWLGKEEAEKRLTDHWNTWITEYDIKAIAENYKLNLVRIPIGYWAFSLLPNDP lipolytica YVQGQEAYLDRALGWCRKYGVKAWVDVHGVPGSQNGFDNSGLRDHWDWPNADNVQHSINVINYI AGKYGAPEYNDIVVGIELVNEPLGPAIGMEVIEKYFQEGFWTVRHAGSDTAVVIHDAFQEKNYF NNFMTTEQGFWNVVLDHHQYQVFSPGELARNIDQHIAEVCNVGRQASTEYHWRIFGEWSAALTD 
CTHWLNGVGKGPRLDGSFPGSYYQRSCQGRGDIQTWSEQDKQESRRYVEAQLDAWEHGGDGWIY WTYKTENALEWDFRRLVDNGIFPFPYWDRQFPNQCGF Glugan1,4,alpha MPRLSYALCALSLGHAAIAAPQLSARATGSLDSWLGTETTVALNGILANIGADGAYAKSAKPGI 1015 glucosidase IIASPSTSEPDYYYTWTRDAALVTKVLVDLFRNGNLGLQKVITEYVNSQAYLQTVSNPSGGLAS GGLAEPKYNVDMTAFTGAWGRPQRDGPALRATALIDFGNWLIDNGYSSYAVNNIWPIVRNDLSY VSQYWSQSGFDLWEEVNSMSFFTVAVQHRALVEGSTFAKRVGASCSWCDSQAPQILCYMQSFWT GSYINANTGGGRSGKDANTVLASIHTFDPEAGCDDTTFQPCSPRALANHKVYTDSFRSVYAINS GIPQGAAVSAGRYPEDVYYNGNPWFLTTLAAAEQLYDAIYQWKKIGSISITSTSLAFFKDIYSS AAVGTYASSTSTFTDIINAVKTYADGYVSIVQAHAMNNGSLSEQFDKSSGLSLSARDLTWSYAA FLTANMRRNGVVPAPWGAASANSVPSSCSMGSATGTYSTATATSWPSTLTSGSPGSTTTVGTTT STTSGTAAETACATPTAVAVTFNEIATTTYGENVYIVGSISELGNWDTSKAVALSASKYTSSNN LWYVSVTLPAGTTFEYKYIRKESDGSIVWESDPNRSYTVPAACGVSTATENDTWQ polygalacturonase MHLNTTLLVSLALGAASVLASPAPPAITAPPTAEEIAKRATTCTFSGSNGASSASKSKTSCSTI 1016 1[Aspergillus VLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSGPLISVSGSDLTITGASGHSINGDGS aculeatus] RWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQVFSVAGSDYLTLKDITIDNSDGDDN CAE46193.1 GGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIYFSGGYCSGGHGLSIGSVGGRSDNTV GI:34366090 KNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITLTSIAKYGIVVQQNYGDTSSTPTTGV polygalacturonase PITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVSVSGGKTSSKCTNVPSGASC 1017 2[Aspergillus MHSFQLLGLAAVGSVVSAAPTASRVSDLVKKSSSTCTFTSASEASETSSSCSNVVLSNIEVPAG aculeatus] ETLDLSDAADGATITFEGTTSFGYEEWDGPLIRFGGKQLTITQSDGAVIDGDGSRWWDSEGTNG CAE46194.1 GKTKPKFMYVHDVEDSTIKGLQIKNTPVQAISVQATNVYLTDITIDNSDGDDNGGHNTDGFDIS GI:34366092 ESTGVYISGATVKNQDDCIAINSGENILFTGGTCSGGHGLSIGSVGGRDDNTVKNVTISDSTVT DSANGVRIKTIYGDTGDVSEITYSNIQLSGITDYGIVIEQDYENGSPTGTPSTGVPITDVTVDG VTGSIEDDAVQVYILCGDGSCSDWTWSGVDITGGETSSDCENVPSGASC polygalacturonase MVRQLALACGLLAAVAVQAAPAEPAHPMVTEAPDASLLHKRATTCTFSGSEGASKVSKSKTACS 1018 3[Aspergillus TIYLSALAVPSGTTLDLKDLNDGTHVIFEGETTFGYEEWEGPLVSVSGTDITVEGASGAVLNGD aculeatus] GSRWWDGEGGNGGKTKPKFFAAHDLTSSTIKSIYIENSPVQVFSIDGATDLTLTDITIDNTDGD CAE46195.1 TDDLAANTDGFDIGESTDITITGAKVYNQDDCVAINSGENIYFSASVCSGGHGLSIGSVGGRDD 
GI:34366094 NTVKNVTFYDVNVLKSQQAIRIKAIYGDTGSISDITYHEIAFSDATDYGIVIEQNYDDTSKTPT TGVPITDFTLENVIGTCADDDCTEVYIACGSGACSDWSWSSVSVTGGKVSSKCLNVPSGISCDL ChainA,Crystal ATTCTFSGSNGASSASKSKTSCSTIVLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSG 1019 StructureOf PLISVSGSDLTITGASGHSINGDGSRWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQ Polygalacturonase VFSVAGSDYLTLKDITIDNSDGDDNGGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIY FromAspergillus FSGGYCSGGHGLSIGSVGGRSDNTVKNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITL AculeatusAtPh4.5 TSIAKYGIVVQQNYGDTSSTPTTGVPITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVS 1IB4_AGI:15988280 VSGGKTSSKCTNVPSGASC ChainB,Crystal ATTCTFSGSNGASSASKSKTSCSTIVLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSG 1020 StructureOf PLISVSGSDLTITGASGHSINGDGSRWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQ Polygalacturonase VFSVAGSDYLTLKDITIDNSDGDDNGGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIY FromAspergillus FSGGYCSGGHGLSIGSVGGRSDNTVKNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITL aculeatusAtPh4.5 TSIAKYGIVVQQNYGDTSSTPTTGVPITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVS 1IB4_BGI:15988281 VSGGKTSSKCTNVPSGASC ChainA, ATTCTFSGSNGASSASKSKTSCSTIVLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSG 1021 Polygalacturonase PLISVSGSDLTITGASGHSINGDGSRWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQ FromAspergillus VFSVAGSDYLTLKDITIDNSDGDDNGGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIY Aculeatus FSGGYCSGGHGLSIGSVGGRSDNTVKNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITL 1IA5_AGI:15988279 TSIAKYGIVVQQNYGDTSSTPTTGVPITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVS VSGGKTSSKCTNVPSGASC polygalacturonase MHLNTTLLVSLALGAASVLASPAPPAITAPPTAEEIAKRATTCTFSGSNGASSASKSKTSCSTI 1022 precursor VLSNVAVPSGTTLDLTKLNDGTHVIFSGETTFGYKEWSGPLISVSGSDLTITGASGHSINGDGS [Aspergillus aculeatus] RWWDGEGGNGGKTKPKFFAAHSLTNSVISGLKIVNSPVQVFSVAGSDYLTLKDITIDNSDGDDN 378aaprotein GGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGENIYFSGGYCSGGHGLSIGSVGGRSDNTV AAC23565.1 KNVTFVDSTIINSDNGVRIKTNIDTTGSVSDVTYKDITLTSIAKYGIVVQQNYGDTSSTPTTGV GI:3220207 PITDFVLDNVHGSVVSSGTNILISCGSGSCSDWTWTDVSVSGGKTSSKCTNVPSGASC EXG2YDR261C 
MPLKSFFFSAFLVLCLSKFTQGVGTTEKEESLSPLELNILQNKFASYYANDTITVKGITIGGWL 1023 Saccharomyces VTEPYITPSLYRNATSLAKQQNSSSNISIVDEFTLCKTLGYNTSLTLLDNHFKTWITEDDFEQI cerevisiae KTNGFNLVRIPIGYWAWKQNTDKNLYIDNITFNDPYVSDGLQLKYLNNALEWAQKYELNVWLDL HGAPGSQNGFDNSGERILYGDLGWLRLNNTKELTLAIWRDMFQTFLNKGDKSPVVGIQIVNEPL GGKIDVSDITEMYYEAFDLLKKNQNSSDNTTFVIHDGFQGIGHWNLELNPTYQNVSHHYFNLTG ANYSSQDILVDHHHYEVFTDAQLAETQFARIENIINYGDSIHKELSFHPAVVGEWSGAITDCAT WLNGVGVGARYDGSYYNTTLFTTNDKPVGTCISQNSLADWTQDYRDRVRQFIEAQLATYSSKTT GWIFWNWKTEDAVEWDYLKLKEANLFPSPFDNYTYFKADGSIEEKFSSSLSAQAFPRTTSSVLS STTTSRKSKNAAISNKLTTSQLLPIKNMSLTWKASVCALAITIAALCASL
(450) While the invention has been described with reference to specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. This includes embodiments that do not provide all of the benefits and features set forth herein. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, or process step or steps to the objective, spirit, and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto. Accordingly, the scope of the invention is defined only by reference to the appended claims.