Ketoacyl ACP synthase genes and uses thereof
09969990 · 2018-05-15
Assignee
Inventors
- David Davis (San Bruno, CA)
- George Rudenko (Mountain View, CA)
- Aravind Somanchi (Redwood City, CA)
- Jason Casolari (Palo Alto, CA)
- Scott Franklin (Woodside, CA)
- Aren Ewing (Brisbane, CA)
Cpc classification
C12P7/6463
CHEMISTRY; METALLURGY
C12Y203/01041
CHEMISTRY; METALLURGY
C12N9/1029
CHEMISTRY; METALLURGY
C12N15/82
CHEMISTRY; METALLURGY
International classification
C12P7/64
CHEMISTRY; METALLURGY
Abstract
The present invention relates to beta-ketoacyl ACP synthase genes of the KASI/KASIV type and proteins encoded by these genes. The genes can be included in nucleic acid constructs, vectors or host cells. Expression of the gene products can alter the fatty acid profile of host cells. The KAS genes can be combined with a FATA or FATB thioesterase gene to create a cell that produces an increased amount of C8-C16 fatty acids. Suitable host cells include plastidic cells of plants or microalgae. Oleaginous microalga host cells with the new genes are disclosed.
Claims
1. A recombinant polynucleotide having at least 95% sequence identity or equivalent sequence by virtue of the degeneracy of the genetic code to SEQ ID NO: 31 encoding a β-keto-acyl ACP synthase IV (KASIV) protein, or the complement of the polynucleotide, wherein said protein has KASIV activity.
2. A transformation vector comprising the polynucleotide of claim 1.
3. The vector of claim 2, comprising promoter and 3′UTR sequences in operable linkage to the polynucleotide, and optionally a flanking sequence for homologous recombination.
4. A host cell comprising the vector of claim 2.
5. The host cell of claim 4, wherein the host cell is a plastidic oleaginous cell having a type II fatty acid biosynthesis pathway.
6. The host cell of claim 5, wherein the host cell is a microalga.
7. The host cell of claim 6, wherein the host cell is of Trebouxiophyceae, and optionally of the genus Chlorella or Prototheca.
8. The host cell of claim 7, wherein the microalga is of the species Prototheca moriformis.
9. A host cell comprising a recombinant polynucleotide having at least 95% sequence identity or equivalent sequence by virtue of the degeneracy of the genetic code to SEQ ID NO: 31, encoding a β-keto-acyl ACP synthase IV (KASIV) protein or the complement of the polynucleotide, wherein said protein has KASIV activity.
10. The host cell of claim 9 further comprising a polynucleotide encoding a fatty acyl-ACP thioesterase B (FATB) wherein the thioesterase has at least 90% amino acid sequence identity to SEQ ID NO: 1 or SEQ ID NO: 57.
11. The host cell of claim 9, wherein the host cell produces a cell oil characterized by a fatty acid profile with (i) at least 7, 8, 9, 10, 11, 12, 13, or 14 area % C8:0, (ii) at least 10, 15, 20, 25, 30, or 35 area % for the sum of C8:0 and C10:0, or (iii) a C8/C10 ratio in the range of 2.2-2.5, 2.5-3.0, or 3.0-3.4.
12. The host cell of claim 9, wherein the host cell is a plastidic oleaginous cell having a type II fatty acid biosynthesis pathway.
13. The host cell of claim 12, wherein the host cell is a microalga.
14. The host cell of claim 13, wherein the host cell is of Trebouxiophyceae, and optionally of the genus Chlorella or Prototheca.
15. The host cell of claim 14, wherein the microalga is of the species Prototheca moriformis.
16. The host cell of claim 9, wherein one or more of the polynucleotides is codon-optimized for expression in the host cell such that the polynucleotide's coding sequence contains the most or second most preferred codon for at least 60% of the codons of the coding sequence such that the codon-optimized sequence is more efficiently translated in the host cell relative to a non-optimized sequence.
17. The host cell of claim 16, wherein the coding sequence contains the most preferred codon for at least 80% of the codons of the coding sequence.
18. A recombinant polynucleotide having at least 85% sequence identity or equivalent sequence by virtue of the degeneracy of the genetic code to SEQ ID NO: 28 encoding a 3-keto-acyl ACP synthase IV (KASIV) protein, or the complement of the polynucleotide, wherein said protein has KASIV activity.
19. A transformation vector comprising the polynucleotide of claim 18.
20. The vector of claim 19, comprising promoter and 3′UTR sequences in operable linkage to the polynucleotide, and optionally a flanking sequence for homologous recombination.
21. A host cell comprising the vector of claim 19.
22. The host cell of claim 21, wherein the host cell is a plastidic oleaginous cell having a type II fatty acid biosynthesis pathway.
23. The host cell of claim 22, wherein the host cell is a microalga.
24. The host cell of claim 23, wherein the host cell is of Trebouxiophyceae, and optionally of the genus Chlorella or Prototheca.
25. The host cell of claim 24, wherein the microalga is of the species Prototheca moriformis.
26. A host cell comprising a recombinant polynucleotide having at least 85% sequence identity or equivalent sequence by virtue of the degeneracy of the genetic code to SEQ ID NO: 28, encoding a β-keto-acyl ACP synthase IV (KASIV) protein or the complement of the polynucleotide, wherein said protein has KASIV activity.
27. The host cell of claim 26 further comprising a polynucleotide encoding a fatty acyl-ACP thioesterase B (FATB), wherein the thioesterase has at least 90% amino acid sequence identity to SEQ ID NO: 1 or SEQ ID NO: 57.
28. The host cell of claim 26, wherein the host cell produces a cell oil characterized by a fatty acid profile with (i) at least 7, 8, 9, 10, 11, 12, 13, or 14 area % C8:0, (ii) at least 10, 15, 20, 25, 30, or 35 area % for the sum of C8:0 and C10:0, or (iii) a C8/C10 ratio in the range of 2.2-2.5, 2.5-3.0, or 3.0-3.4.
29. The host cell of claim 26, wherein the host cell is a plastidic oleaginous cell having a type II fatty acid biosynthesis pathway.
30. The host cell of claim 29, wherein the host cell is a microalga.
31. The host cell of claim 30, wherein the host cell is of Trebouxiophyceae, and optionally of the genus Chlorella or Prototheca.
32. The host cell of claim 31, wherein the microalga is of the species Prototheca moriformis.
33. The host cell of claim 26, wherein one or more of the polynucleotides is codon-optimized for expression in the host cell such that the polynucleotide's coding sequence contains the most or second most preferred codon for at least 60% of the codons of the coding sequence such that the codon-optimized sequence is more efficiently translated in the host cell relative to a non-optimized sequence.
34. The host cell of claim 33, wherein the coding sequence contains the most preferred codon for at least 80% of the codons of the coding sequence.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
(2) As used with respect to nucleic acids, the term isolated refers to a nucleic acid that is free of at least one other component that is typically present with the naturally occurring nucleic acid. Thus, a naturally occurring nucleic acid is isolated if it has been purified away from at least one other component that occurs naturally with the nucleic acid.
(3) A cell oil or cell fat shall mean a predominantly triglyceride oil obtained from an organism, where the oil has not undergone blending with another natural or synthetic oil, or fractionation so as to substantially alter the fatty acid profile of the triglyceride. In connection with an oil comprising triglycerides of a particular regiospecificity, the cell oil or cell fat has not been subjected to interesterification or other synthetic process to obtain that regiospecific triglyceride profile, rather the regiospecificity is produced naturally, by a cell or population of cells. For a cell oil or cell fat produced by a cell, the sterol profile of oil is generally determined by the sterols produced by the cell, not by artificial reconstitution of the oil by adding sterols in order to mimic the cell oil. In connection with a cell oil or cell fat, and as used generally throughout the present disclosure, the terms oil and fat are used interchangeably, except where otherwise noted. Thus, an oil or a fat can be liquid, solid, or partially solid at room temperature, depending on the makeup of the substance and other conditions. Here, the term fractionation means removing material from the oil in a way that changes its fatty acid profile relative to the profile produced by the organism, however accomplished. The terms cell oil and cell fat encompass such oils obtained from an organism, where the oil has undergone minimal processing, including refining, bleaching and/or degumming, which does not substantially change its triglyceride profile. A cell oil can also be a noninteresterified cell oil, which means that the cell oil has not undergone a process in which fatty acids have been redistributed in their acyl linkages to glycerol and remain essentially in the same configuration as when recovered from the organism.
(4) Exogenous gene shall mean a nucleic acid that codes for the expression of an RNA and/or protein that has been introduced into a cell (e.g. by transformation/transfection), and is also referred to as a transgene. A cell comprising an exogenous gene may be referred to as a recombinant cell, into which additional exogenous gene(s) may be introduced. The exogenous gene may be from a different species (and so heterologous), or from the same species (and so homologous), relative to the cell being transformed. Thus, an exogenous gene can include a homologous gene that occupies a different location in the genome of the cell or is under different control, relative to the endogenous copy of the gene. An exogenous gene may be present in more than one copy in the cell. An exogenous gene may be maintained in a cell, for example, as an insertion into the genome (nuclear or plastid) or as an episomal molecule.
(5) Fatty acids shall mean free fatty acids, fatty acid salts, or fatty acyl moieties in a glycerolipid. It will be understood that fatty acyl groups of glycerolipids can be described in terms of the carboxylic acid or anion of a carboxylic acid that is produced when the triglyceride is hydrolyzed or saponified.
(6) Microalgae are microbial organisms that contain a chloroplast or other plastid, and optionally that are capable of performing photosynthesis, or a prokaryotic microbial organism capable of performing photosynthesis. Microalgae include obligate photoautotrophs, which cannot metabolize a fixed carbon source as energy, as well as heterotrophs, which can live solely off of a fixed carbon source. Microalgae include unicellular organisms that separate from sister cells shortly after cell division, such as Chlamydomonas, as well as microbes such as, for example, Volvox, which is a simple multicellular photosynthetic microbe of two distinct cell types. Microalgae include cells such as Chlorella, Dunaliella, and Prototheca. Microalgae also include other microbial photosynthetic organisms that exhibit cell-cell adhesion, such as Agmenellum, Anabaena, and Pyrobotrys. Microalgae also include obligate heterotrophic microorganisms that have lost the ability to perform photosynthesis, such as certain dinoflagellate algae species and species of the genus Prototheca.
(7) An oleaginous cell is a cell capable of producing at least 20% lipid by dry cell weight, naturally or through recombinant or classical strain improvement. An oleaginous microbe or oleaginous microorganism is a microbe, including a microalga that is oleaginous.
(8) The term percent sequence identity, in the context of two or more amino acid or nucleic acid sequences, refers to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence, as measured using a sequence comparison algorithm or by visual inspection. For sequence comparison to determine percent nucleotide or amino acid identity, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters. Optimal alignment of sequences for comparison can be conducted using the NCBI BLAST software (ncbi.nlm.nih.gov/BLAST/) set to default parameters. For example, to compare two nucleic acid sequences, one may use blastn with the BLAST 2 Sequences tool Version 2.0.12 (Apr. 21, 2000) set at the following default parameters: Matrix: BLOSUM62; Reward for match: 1; Penalty for mismatch: -2; Open Gap: 5 and Extension Gap: 2 penalties; Gap x drop-off: 50; Expect: 10; Word Size: 11; Filter: on. For a pairwise comparison of two amino acid sequences, one may use the BLAST 2 Sequences tool Version 2.0.12 (Apr. 21, 2000) with blastp set, for example, at the following default parameters: Matrix: BLOSUM62; Open Gap: 11 and Extension Gap: 1 penalties; Gap x drop-off: 50; Expect: 10; Word Size: 3; Filter: on.
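Once an alignment has been produced (for example by BLAST), the identity figure itself is simple arithmetic. The following is a minimal sketch and not the BLAST algorithm: it assumes the two sequences have already been aligned to equal length with "-" marking gaps. The function name percent_identity and the choice of denominator (the number of aligned columns) are illustrative assumptions; other conventions for the denominator exist.

```python
def percent_identity(aligned_ref: str, aligned_test: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns in which both sequences carry a gap
    are ignored. The denominator is the number of remaining aligned
    columns (an assumption made for this sketch).
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    columns = 0
    for r, t in zip(aligned_ref.upper(), aligned_test.upper()):
        if r == "-" and t == "-":
            continue  # column carries no residue in either sequence
        columns += 1
        if r == t:
            matches += 1  # identical residue (a gap never equals a residue)

    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy nucleotide alignment: one gap in the reference, nine identical columns of ten.
    print(percent_identity("ATGGCC-TGA", "ATGGCCATGA"))
```

A test sequence would satisfy an "at least 95% sequence identity" limitation under this convention when the returned value is 95.0 or greater.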
(9) Where multiple sequence identities are given for a strain having a pair of exogenous genes, this encompasses all combinations of sequence identities. For example, coexpression of a first gene encoding a first protein having at least 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% identity with gene A and a second gene encoding a second protein having at least 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% identity with gene B shall be understood to encompass (i) at least 85% identity with gene A and at least 85% identity with gene B, (ii) at least 85% identity with gene A and at least 99% identity with gene B, (iii) at least 92% identity with gene A and at least 95% identity with gene B, and all other combinations.
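The phrase "all combinations" can be made concrete by enumerating the Cartesian product of the two threshold lists. This is only an illustrative sketch; the names IDENTITY_THRESHOLDS and threshold_combinations are assumptions introduced here.

```python
from itertools import product

# Minimum percent-identity values recited for each gene of the pair.
IDENTITY_THRESHOLDS = (85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)


def threshold_combinations(thresholds=IDENTITY_THRESHOLDS):
    """Return every (gene A minimum, gene B minimum) pairing."""
    return list(product(thresholds, repeat=2))


if __name__ == "__main__":
    combos = threshold_combinations()
    print(len(combos))          # number of pairings
    print((92, 95) in combos)   # example (iii) from the text
```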
(10) In connection with a cell oil, a profile is the distribution of particular species of triglycerides or fatty acyl groups within the oil. A fatty acid profile is the distribution of fatty acyl groups in the triglycerides of the oil without reference to attachment to a glycerol backbone. Fatty acid profiles are typically determined by conversion to a fatty acid methyl ester (FAME), followed by gas chromatography (GC) analysis with flame ionization detection (FID). The fatty acid profile can be expressed as one or more percent of a fatty acid in the total fatty acid signal determined from the area under the curve for that fatty acid. FAME-GC-FID measurements approximate weight percentages of the fatty acids.
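The area-percent calculation described above can be sketched as follows. The peak areas in the example are hypothetical numbers chosen for easy arithmetic, not measured data, and the function names are assumptions. The second function derives the three quantities of the kind recited in the claims: area % C8:0, the sum of C8:0 and C10:0, and the C8/C10 ratio.

```python
from typing import Dict, Optional


def area_percent_profile(peak_areas: Dict[str, float]) -> Dict[str, float]:
    """Convert integrated FAME-GC-FID peak areas to area percent of total signal."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {fatty_acid: 100.0 * area / total for fatty_acid, area in peak_areas.items()}


def c8_c10_summary(profile: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Summarize C8:0, C8:0 + C10:0, and the C8/C10 ratio from an area-percent profile."""
    c8 = profile.get("C8:0", 0.0)
    c10 = profile.get("C10:0", 0.0)
    ratio = c8 / c10 if c10 > 0 else None  # ratio is undefined without a C10:0 peak
    return {"C8:0": c8, "C8:0+C10:0": c8 + c10, "C8/C10": ratio}


if __name__ == "__main__":
    # Hypothetical peak areas (arbitrary detector units).
    areas = {"C8:0": 120.0, "C10:0": 40.0, "C16:0": 240.0, "C18:1": 400.0}
    profile = area_percent_profile(areas)
    print(profile)
    print(c8_c10_summary(profile))
```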
(11) As used herein, an oil is said to be enriched in one or more particular fatty acids if there is at least a 10% increase in the mass of that fatty acid in the oil relative to the non-enriched oil. For example, in the case of a cell expressing a heterologous FatB gene described herein, the oil produced by the cell is said to be enriched in, e.g., C8 and C16 fatty acids if the mass of these fatty acids in the oil is at least 10% greater than in oil produced by a cell of the same type that does not express the heterologous FatB gene (e.g., wild type oil).
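The 10% criterion can be written as a one-line comparison. The sketch below assumes that both masses are expressed on the same basis (for example, mass of the fatty acid per unit mass of oil); the function name is_enriched and the example values are hypothetical.

```python
def is_enriched(mass_in_test_oil: float, mass_in_control_oil: float,
                min_increase: float = 0.10) -> bool:
    """True if the fatty acid mass is at least `min_increase` (10% by default)
    greater in the test oil than in the control (non-enriched) oil.

    Both masses must be on the same basis (an assumption of this sketch).
    """
    if mass_in_control_oil <= 0:
        raise ValueError("control mass must be positive")
    relative_increase = (mass_in_test_oil - mass_in_control_oil) / mass_in_control_oil
    return relative_increase >= min_increase


if __name__ == "__main__":
    # Hypothetical masses of one fatty acid per 100 g of oil.
    print(is_enriched(12.0, 10.0))   # 20% increase
    print(is_enriched(10.5, 10.0))   # 5% increase
```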
(12) Recombinant is a cell, nucleic acid, protein or vector that has been modified due to the introduction of an exogenous nucleic acid or the alteration of a native nucleic acid. Thus, e.g., recombinant (host) cells can express genes that are not found within the native (non-recombinant) form of the cell or express native genes differently than those genes are expressed by a non-recombinant cell. Recombinant cells can, without limitation, include recombinant nucleic acids that encode a gene product or suppression elements such as mutations, knockouts, antisense, interfering RNA (RNAi) or dsRNA that reduce the levels of active gene product in a cell. A recombinant nucleic acid is a nucleic acid originally formed in vitro, in general, by the manipulation of nucleic acid, e.g., using polymerases, ligases, exonucleases, and endonucleases, using chemical synthesis, or otherwise is in a form not normally found in nature. Recombinant nucleic acids may be produced, for example, to place two or more nucleic acids in operable linkage. Thus, an isolated nucleic acid and an expression vector formed in vitro by ligating DNA molecules that are not normally joined in nature are both considered recombinant for the purposes of this invention. Recombinant nucleic acids can also be produced in other ways; e.g., using chemical DNA synthesis. Once a recombinant nucleic acid is made and introduced into a host cell or organism, it may replicate using the in vivo cellular machinery of the host cell; however, such nucleic acids, once produced recombinantly, although subsequently replicated intracellularly, are still considered recombinant for purposes of this invention. Similarly, a recombinant protein is a protein made using recombinant techniques, i.e., through the expression of a recombinant nucleic acid.
(13) A KAS I-like gene or enzyme shall mean either a KAS I or KAS IV gene or enzyme.
(14) Embodiments of the present invention relate to the use of KASI-like genes isolated from plants or other organisms, which can be expressed in a transgenic host cell in order to alter the fatty acid profile of a cell oil produced by the host cell. Although the microalga Prototheca moriformis was used to screen the genes for ability to alter the fatty acid profile, the genes discovered are useful in a wide variety of host cells for which genetic transformation techniques are known. For example, the genes can be expressed in bacteria, cyanobacteria, other eukaryotic microalgae, or higher plants. The genes can be expressed in higher plants according to the methods disclosed in U.S. Pat. No. 7,301,070, U.S. Pat. No. 6,348,642, U.S. Pat. No. 6,660,849, and U.S. Pat. No. 6,770,465. We have found that KASI-like transgenes can be used alone or in combination with a FatB transgene (encoding an active acyl-ACP thioesterase) to boost the levels of mid-chain fatty acids (e.g., capric, caprylic, lauric, myristic or palmitic acids) in the fatty acid profile of the cell oil. Combining an exogenous KASI-like gene with an exogenous FATA or FATB gene in a host cell can give levels of mid-chain fatty acids and/or long-chain fatty acids (e.g., stearic or oleic) greater than either exogenous gene alone. The fatty acids of the cell oil can be further converted to triglycerides, fatty aldehydes, fatty alcohols and other oleochemicals either synthetically or biosynthetically.
(15) In specific embodiments, triglycerides are produced by a host cell expressing a novel KASI-like gene (from a novel cDNA and/or under control of a heterologous promoter). A cell oil can be recovered from the host cell. Typically, the cell oil comprises mainly triglycerides and sterols. The cell oil can be refined, degummed, bleached and/or deodorized. The oil, in its unprocessed or processed form, can be used for foods, chemicals, fuels, cosmetics, plastics, and other uses. In other embodiments, the KASI-like gene may not be novel, but the expression of the gene in a microalga is novel.
(16) The KAS genes can be used in a variety of genetic constructs including plasmids or other vectors for expression or recombination in a host cell. The genes can be codon optimized for expression in a target host cell. The genes can be included in an expression cassette that includes a promoter (e.g., a heterologous promoter) and downstream regulatory element. The vector can include flanking sequences for homologous recombination. For example, the vector can cause insertion into a chromosome of the host cell, where it can be stably expressed. The proteins produced by the genes can be used in vivo or in purified form. In an embodiment, an expression cassette comprises a homologous promoter, a CDS operable to express a KASI-like enzyme of Table 1 and a 3′UTR. The 3′UTR can comprise a polyadenylation site.
(17) As described in the examples below, novel KAS genes were discovered from cDNA produced from plant seed mRNA transcripts. Accordingly the gene sequences are non-natural because they lack introns that are present in the plant genes and mRNA transcripts of the genes prior to mRNA splicing. Accordingly, the invention comprises an isolated non-natural KASI-like gene of Table 1. Further departure from the natural gene is in the use of heterologous regulatory elements and expression in host cells for which such genes do not occur in nature.
(18) For example, the gene can be prepared in an expression vector comprising an operably linked promoter and 5′UTR. Where a plastidic cell is used as the host, a suitably active plastid targeting peptide (also referred to below as a transit peptide) can be fused to the KASI-like gene, as in the examples below. The disclosed genes comprise a hydrophobic N-terminal plastid targeting sequence, which can be replaced with an alternative targeting sequence and varied in length. Varying the plastid targeting peptide can improve cellular localization and enzyme activity for a given host-cell type. Thus, the invention contemplates deletions and fusion proteins in order to optimize enzyme activity in a given host cell. For example, a transit peptide from the host or related species may be used instead of that of the newly discovered plant genes described here. Additional terminal or internal deletions may be made so long as the enzymatic activity is retained. The targeting peptide can be cleaved by the host cell to produce a mature KASI-like protein that lacks the targeting peptide.
(19) A selectable marker gene may be included in the vector to assist in isolating a transformed cell. Examples of selectable markers useful in microalgae include sucrose invertase, alpha galactosidase (for selection on melibiose) and antibiotic resistance genes.
(20) The gene sequences disclosed can also be used to prepare antisense, or inhibitory RNA (e.g., RNAi or hairpin RNA) to inhibit complementary genes in a plant or other organism. For example, armed with the knowledge of a gene sequence of Table 1, one can engineer a plant with the same or similar KASI-like gene to express an RNAi construct, gene knockout, point mutation, or the like, and thereby reduce the KASI or KASIV activity of the plant's seed. As a result, the plant can produce an oil with an altered fatty acid profile in which the mean chain length is decreased or increased, depending on the presence of other fatty acid synthesis genes.
(21) KASI-like genes/proteins found to be useful in producing desired fatty acid profiles in a cell are summarized below in Table 1, and related proteins discovered from transcript sequencing (as in Examples 1-2) are shown in Table 1a. Nucleic acids or proteins having the sequence of SEQ ID NOS: 2-18, 59, 62-72, 21-37 or 39-55 can be used to alter the fatty acid profile of a recombinant cell. Variant nucleic acids can also be used; e.g., variants having at least 70, 80, 85, 90, 95, 96, 97, 98, or 99% sequence identity to SEQ ID NOS: 21-37 or 39-55. Codon optimization of the genes for a variety of host organisms is contemplated, as is the use of gene fragments. Preferred codons for Prototheca strains and for Chlorella protothecoides are shown below in Tables 2 and 3, respectively. Codon usage for Cuphea wrightii is shown in Table 4. Codon usage for Arabidopsis is shown in Table 5; for example, the most preferred codon for each amino acid can be selected. Codon tables for other organisms including microalgae and higher plants are known in the art. In some embodiments, the first and/or second most preferred Prototheca codons are employed for codon optimization. In specific embodiments, the novel amino acid sequences contained in the sequence listings below are converted into nucleic acid sequences according to the most preferred codon usage in Prototheca, Chlorella, Cuphea wrightii, or Arabidopsis as set forth in Tables 2 through 5 or nucleic acid sequences having at least 70, 80, 85, 90, 95, 96, 97, 98, or 99% sequence identity to these derived nucleic acid sequences. For example, the KASI-like gene can be codon optimized for Prototheca moriformis by substituting most preferred codons according to Table 2 for at least 10, 20, 30, 40, 50, 60, 70, 80, or 90% of all codons. Likewise, the KASI-like gene can be codon optimized for Chlorella protothecoides by substituting most-preferred codons according to Table 3 for at least 10, 20, 30, 40, 50, 60, 70, 80, or 90% of all codons. Alternatively, the KASI-like gene can be codon optimized for Chlorella protothecoides or Prototheca moriformis by substituting first or second most-preferred codons according to Table 2 or 3 for at least 10, 20, 30, 40, 50, 60, 70, 80, or 90% of all codons. Codon-optimized genes are non-naturally occurring because they are optimized for expression in a host organism.
(22) In certain embodiments, percent sequence identity for variants of the nucleic acids or proteins discussed above can be calculated by using the full-length nucleic acid sequence (e.g., one of SEQ ID NOS: 21-37 or 39-55) or full-length amino acid sequence (e.g., one of SEQ ID NOS: 2-18) as the reference sequence and comparing the full-length test sequence to this reference sequence. For fragments, percent sequence identity for variants of nucleic acid or protein fragments can be calculated over the entire length of the fragment. In certain embodiments, there is a nucleic acid or protein fragment having at least 70, 80, 85, 90, 95, 96, 97, 98, or 99% sequence identity to one of SEQ ID NOS: 21-37, 39-55 or 2-18.
(23) Optionally, the plastidic targeting peptide can be swapped with another peptide that functions to traffic the KASI-like enzyme to a fatty acid synthesizing plastid of a plastidic host cell. Accordingly, in various embodiments of the invention, a transgene or transgenic host cell comprises a nucleotide or corresponding peptidic fusion of a plastid targeting sequence and an enzyme-domain sequence (the sequence remaining after deletion of the transit peptide), where the mature protein has at least 70, 80, 85, 90, 95, 96, 97, 98, or 99% sequence identity to a mature protein sequence listed in Table 1 or Table 1a. Plastid transit/targeting peptides are underlined in the accompanying informal sequence listing. Examples of targeting peptides include those of Table 1 and others known in the art, especially in connection with the targeting of KAS I, KAS II, KAS III, FATA, FATB and SAD (stearoyl-ACP desaturase) gene products to chloroplasts or other plastids of plants and microalgae. See examples of Chlorophyta given in PCT publications WO2010/063032, WO2011/150411, WO2012/106560, and WO2013/158938. Optionally, the KASI-like genes encode 450, 475 or 500 amino acids or more (with or without the transit peptide), or about 555 residues (with the transit peptide), in contrast to known truncated sequences.
(24) TABLE-US-00001 TABLE 1 KASI-like genes: The expression cassette used to test the genes in combination with a FATB transgene is given in SEQ ID NO: 38 (i.e., substituting the Cpal KASIV coding sequence of SEQ ID NO: 38 with various other coding sequences of Table 1), except that the Cuphea hookeriana KASIV was tested using the expression cassette of SEQ ID NO: 61. See Examples 1-4.
Columns (entries are SEQ ID NOs): (A) amino acid sequence; (B) nucleotide coding sequence (from cDNA produced from seed mRNA, not codon-optimized); (C) Prototheca moriformis codon-optimized nucleotide sequence.
Species                    Gene Name            (A)   (B)      (C)
Cuphea palustris           KASIV                2     21       39
Cinnamomum camphora        KASIV                3     22       40
Cinnamomum camphora        KASI                 4     23       41
Umbellularia californica   KASI                 5     24       42
U. californica             KASIV                6     25       43
Cuphea wrightii            KASAI                7     26       44
Cuphea avigera             KASIVb               8     27       45
Cuphea paucipetala         KASIVb               9     28       46
C. ignea                   KASIVb               10    29       47
Cuphea procumbens          KASIV                11    30       48
C. paucipetala             KASIVa               12    31       49
Cuphea painteri            KASIV                13    32       50
C. avigera                 KASIVa               14    33       51
C. ignea                   KASIVa               15    34       52
C. avigera                 KASIa                16    35       53
C. pulcherrima             KASI                 17    36       54
C. avigera                 mitochondrial KAS    18    37       55
Cuphea hookeriana          KASIV                59    60, 61
(25) TABLE-US-00002 TABLE 1a Additional proteins encoded by cDNA discovered from transcript profiling of seeds. Coding sequences can be derived from codon tables for various host cells.
Species                  Gene Name                              Amino Acid Sequence
Various                  KASIV (Clade 1) consensus sequence     69, 71
Various                  KASIV (Clade 2) consensus sequence     70, 72
Cuphea aequipetala       KASIV                                  62
Cuphea glassostoma       KASIV                                  63
Cuphea hookeriana        KASIV                                  64
Cuphea glassostoma       KASIV                                  65
Cuphea carthagenesis     KASIV                                  66, 67
C. pulcherrima           KASIV                                  68
(26) TABLE-US-00003 TABLE 2 Codon usage in Prototheca strains.
Ala    GCG 345 (0.36)   GCA 66 (0.07)   GCT 101 (0.11)   GCC 442 (0.46)
Cys    TGT 12 (0.10)    TGC 105 (0.90)
Asp    GAT 43 (0.12)    GAC 316 (0.88)
Glu    GAG 377 (0.96)   GAA 14 (0.04)
Phe    TTT 89 (0.29)    TTC 216 (0.71)
Gly    GGG 92 (0.12)    GGA 56 (0.07)   GGT 76 (0.10)    GGC 559 (0.71)
His    CAT 42 (0.21)    CAC 154 (0.79)
Ile    ATA 4 (0.01)     ATT 30 (0.08)   ATC 338 (0.91)
Lys    AAG 284 (0.98)   AAA 7 (0.02)
Leu    TTG 26 (0.04)    TTA 3 (0.00)    CTG 447 (0.61)   CTA 20 (0.03)    CTT 45 (0.06)   CTC 190 (0.26)
Met    ATG 191 (1.00)
Asn    AAT 8 (0.04)     AAC 201 (0.96)
Pro    CCG 161 (0.29)   CCA 49 (0.09)   CCT 71 (0.13)    CCC 267 (0.49)
Gln    CAG 226 (0.82)   CAA 48 (0.18)
Arg    AGG 33 (0.06)    AGA 14 (0.02)   CGG 102 (0.18)   CGA 49 (0.08)    CGT 51 (0.09)   CGC 331 (0.57)
Ser    AGT 16 (0.03)    AGC 123 (0.22)  TCG 152 (0.28)   TCA 31 (0.06)    TCT 55 (0.10)   TCC 173 (0.31)
Thr    ACG 184 (0.38)   ACA 24 (0.05)   ACT 21 (0.05)    ACC 249 (0.52)
Val    GTG 308 (0.50)   GTA 9 (0.01)    GTT 35 (0.06)    GTC 262 (0.43)
Trp    TGG 107 (1.00)
Tyr    TAT 10 (0.05)    TAC 180 (0.95)
Stop   TGA/TAG/TAA
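The limitation that a coding sequence "contains the most or second most preferred codon" for a given fraction of its codons can be checked mechanically from the Table 2 counts. The sketch below ranks the codons for each amino acid by count and reports the fraction of sense codons at or above a chosen rank. The function names, the exclusion of stop codons from the denominator, and the treatment of the single-codon amino acids Met and Trp as "most preferred" are assumptions of this sketch; the example sequences are toy inputs, not sequences from the listing.

```python
# Codon counts copied from Table 2 (Prototheca strains).
PROTOTHECA_CODON_COUNTS = {
    "Ala": {"GCG": 345, "GCA": 66, "GCT": 101, "GCC": 442},
    "Cys": {"TGT": 12, "TGC": 105},
    "Asp": {"GAT": 43, "GAC": 316},
    "Glu": {"GAG": 377, "GAA": 14},
    "Phe": {"TTT": 89, "TTC": 216},
    "Gly": {"GGG": 92, "GGA": 56, "GGT": 76, "GGC": 559},
    "His": {"CAT": 42, "CAC": 154},
    "Ile": {"ATA": 4, "ATT": 30, "ATC": 338},
    "Lys": {"AAG": 284, "AAA": 7},
    "Leu": {"TTG": 26, "TTA": 3, "CTG": 447, "CTA": 20, "CTT": 45, "CTC": 190},
    "Met": {"ATG": 191},
    "Asn": {"AAT": 8, "AAC": 201},
    "Pro": {"CCG": 161, "CCA": 49, "CCT": 71, "CCC": 267},
    "Gln": {"CAG": 226, "CAA": 48},
    "Arg": {"AGG": 33, "AGA": 14, "CGG": 102, "CGA": 49, "CGT": 51, "CGC": 331},
    "Ser": {"AGT": 16, "AGC": 123, "TCG": 152, "TCA": 31, "TCT": 55, "TCC": 173},
    "Thr": {"ACG": 184, "ACA": 24, "ACT": 21, "ACC": 249},
    "Val": {"GTG": 308, "GTA": 9, "GTT": 35, "GTC": 262},
    "Trp": {"TGG": 107},
    "Tyr": {"TAT": 10, "TAC": 180},
}
STOP_CODONS = {"TGA", "TAG", "TAA"}


def codon_ranks(counts=PROTOTHECA_CODON_COUNTS):
    """Map each codon to its preference rank (1 = most used) within its amino acid."""
    ranks = {}
    for codons in counts.values():
        ordered = sorted(codons, key=codons.get, reverse=True)
        for rank, codon in enumerate(ordered, start=1):
            ranks[codon] = rank
    return ranks


def fraction_preferred(cds: str, max_rank: int = 2, ranks=None) -> float:
    """Fraction of sense codons in `cds` whose rank is <= max_rank."""
    ranks = ranks or codon_ranks()
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    sense = [c for c in codons if c not in STOP_CODONS]
    if not sense:
        raise ValueError("no sense codons found")
    unknown = [c for c in sense if c not in ranks]
    if unknown:
        raise ValueError(f"unrecognized codons: {unknown}")
    return sum(1 for c in sense if ranks[c] <= max_rank) / len(sense)


if __name__ == "__main__":
    toy_cds = "ATGGCAGCGTTATGA"  # Met-Ala-Ala-Leu-stop, toy example
    print(fraction_preferred(toy_cds, max_rank=2))
    print(fraction_preferred(toy_cds, max_rank=1))
```

Calling fraction_preferred with max_rank=2 corresponds to the "most or second most preferred codon" test, and max_rank=1 to the "most preferred codon" test.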
(27) TABLE-US-00004 TABLE 3 Preferred codon usage in Chlorella protothecoides.
TTC (Phe)   TGG (Trp)   CTG (Leu)    GAC (Asp)   GCC (Ala)   GAG (Glu)   TAC (Tyr)
CCC (Pro)   CAG (Gln)   TCC (Ser)    AAC (Asn)   TGC (Cys)   CAC (His)   ATC (Ile)
ATG (Met)   GGC (Gly)   TGA (Stop)   CGC (Arg)   ACC (Thr)   AAG (Lys)   GTG (Val)
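Converting an amino acid sequence into a nucleic acid sequence "according to the most preferred codon usage" amounts to a table lookup per residue. The following sketch uses the Table 3 codons for Chlorella protothecoides; the function name back_translate, the add_stop parameter, and the three-residue example peptide are hypothetical and are not taken from the sequence listing.

```python
# Most preferred codon per amino acid (one-letter code), copied from Table 3.
CHLORELLA_PREFERRED = {
    "F": "TTC", "W": "TGG", "L": "CTG", "D": "GAC", "A": "GCC", "E": "GAG",
    "Y": "TAC", "P": "CCC", "Q": "CAG", "S": "TCC", "N": "AAC", "C": "TGC",
    "H": "CAC", "I": "ATC", "M": "ATG", "G": "GGC", "R": "CGC", "T": "ACC",
    "K": "AAG", "V": "GTG",
}
PREFERRED_STOP = "TGA"  # stop codon listed in Table 3


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Back-translate a protein sequence using only the most preferred codons."""
    codons = []
    for residue in protein.upper():
        if residue not in CHLORELLA_PREFERRED:
            raise ValueError(f"unsupported residue: {residue!r}")
        codons.append(CHLORELLA_PREFERRED[residue])
    if add_stop:
        codons.append(PREFERRED_STOP)
    return "".join(codons)


if __name__ == "__main__":
    # Toy peptide Met-Lys-Val.
    print(back_translate("MKV"))
```

A sequence produced this way uses the most preferred codon at every position; partial optimization (for example, at least 60% of codons) would substitute preferred codons at only a subset of positions.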
(28) TABLE-US-00005 TABLE 4 Codon usage for Cuphea wrightii (codon, amino acid, frequency, per thousand, number)
UUU F 0.48 19.5 (52)   UCU S 0.21 19.5 (52)   UAU Y 0.45 6.4 (17)    UGU C 0.41 10.5 (28)
UUC F 0.52 21.3 (57)   UCC S 0.26 23.6 (63)   UAC Y 0.55 7.9 (21)    UGC C 0.59 15.0 (40)
UUA L 0.07 5.2 (14)    UCA S 0.18 16.8 (45)   UAA * 0.33 0.7 (2)     UGA * 0.33 0.7 (2)
UUG L 0.19 14.6 (39)   UCG S 0.11 9.7 (26)    UAG * 0.33 0.7 (2)     UGG W 1.00 15.4 (41)
CUU L 0.27 21.0 (56)   CCU P 0.48 21.7 (58)   CAU H 0.60 11.2 (30)   CGU R 0.09 5.6 (15)
CUC L 0.22 17.2 (46)   CCC P 0.16 7.1 (19)    CAC H 0.40 7.5 (20)    CGC R 0.13 7.9 (21)
CUA L 0.13 10.1 (27)   CCA P 0.21 9.7 (26)    CAA Q 0.31 8.6 (23)    CGA R 0.11 6.7 (18)
CUG L 0.12 9.7 (26)    CCG P 0.16 7.1 (19)    CAG Q 0.69 19.5 (52)   CGG R 0.16 9.4 (25)
AUU I 0.44 22.8 (61)   ACU T 0.33 16.8 (45)   AAU N 0.66 31.4 (84)   AGU S 0.18 16.1 (43)
AUC I 0.29 15.4 (41)   ACC T 0.27 13.9 (37)   AAC N 0.34 16.5 (44)   AGC S 0.07 6.0 (16)
AUA I 0.27 13.9 (37)   ACA T 0.26 13.5 (36)   AAA K 0.42 21.0 (56)   AGA R 0.24 14.2 (38)
AUG M 1.00 28.1 (75)   ACG T 0.14 7.1 (19)    AAG K 0.58 29.2 (78)   AGG R 0.27 16.1 (43)
GUU V 0.28 19.8 (53)   GCU A 0.35 31.4 (84)   GAU D 0.63 35.9 (96)   GGU G 0.29 26.6 (71)
GUC V 0.21 15.0 (40)   GCC A 0.20 18.0 (48)   GAC D 0.37 21.0 (56)   GGC G 0.20 18.0 (48)
GUA V 0.14 10.1 (27)   GCA A 0.33 29.6 (79)   GAA E 0.41 18.3 (49)   GGA G 0.35 31.4 (84)
GUG V 0.36 25.1 (67)   GCG A 0.11 9.7 (26)    GAG E 0.59 26.2 (70)   GGG G 0.16 14.2 (38)
(29) TABLE-US-00006 TABLE 5 Codon usage for Arabidopsis (codon, amino acid, frequency, per thousand)
UUU F 0.51 21.8   UCU S 0.28 25.2   UAU Y 0.52 14.6   UGU C 0.60 10.5
UUC F 0.49 20.7   UCC S 0.13 11.2   UAC Y 0.48 13.7   UGC C 0.40 7.2
UUA L 0.14 12.7   UCA S 0.20 18.3   UAA * 0.36 0.9    UGA * 0.44 1.2
UUG L 0.22 20.9   UCG S 0.10 9.3    UAG * 0.20 0.5    UGG W 1.00 12.5
CUU L 0.26 24.1   CCU P 0.38 18.7   CAU H 0.61 13.8   CGU R 0.17 9.0
CUC L 0.17 16.1   CCC P 0.11 5.3    CAC H 0.39 8.7    CGC R 0.07 3.8
CUA L 0.11 9.9    CCA P 0.33 16.1   CAA Q 0.56 19.4   CGA R 0.12 6.3
CUG L 0.11 9.8    CCG P 0.18 8.6    CAG Q 0.44 15.2   CGG R 0.09 4.9
AUU I 0.41 21.5   ACU T 0.34 17.5   AAU N 0.52 22.3   AGU S 0.16 14.0
AUC I 0.35 18.5   ACC T 0.20 10.3   AAC N 0.48 20.9   AGC S 0.13 11.3
AUA I 0.24 12.6   ACA T 0.31 15.7   AAA K 0.49 30.8   AGA R 0.35 19.0
AUG M 1.00 24.5   ACG T 0.15 7.7    AAG K 0.51 32.7   AGG R 0.20 11.0
GUU V 0.40 27.2   GCU A 0.43 28.3   GAU D 0.68 36.6   GGU G 0.34 22.2
GUC V 0.19 12.8   GCC A 0.16 10.3   GAC D 0.32 17.2   GGC G 0.14 9.2
GUA V 0.15 9.9    GCA A 0.27 17.5   GAA E 0.52 34.3   GGA G 0.37 24.2
GUG V 0.26 17.4   GCG A 0.14 9.0    GAG E 0.48 32.2   GGG G 0.16 10.2
Gene Combinations
(30) In an embodiment, a gene/gene-product of Table 1 is co-expressed in a host cell with an exogenous FATA or FATB acyl-ACP thioesterase gene. In a specific embodiment, the FATB gene product has at least 85, 90, 91, 92, 93, 94, 95, 95.5, 96, 96.5, 97, 97.5, 98, 98.5 or 99% amino acid sequence identity to the Cuphea palustris FATB2 (Cpal FATB2, accession AAC49180, SEQ ID NO: 1) or C. hookeriana FATB2 (Ch FATB2, accession U39834, SEQ ID NO: 57) or a fragment thereof. Optionally, the FATB gene product has at least 85, 90, 91, 92, 93, 94, 95, 95.5, 96, 96.5, 97, 97.5, 98, 98.5 or 99% amino acid sequence identity to the non-transit-peptide domain of Cuphea palustris FATB2 (Cpal FATB2, accession AAC49180, SEQ ID NO: 1) or C. hookeriana FATB2 (Ch FATB2, accession U39834, SEQ ID NO: 57).
(31) FATA genes encode enzymes that preferentially, but not exclusively, hydrolyze long-chain fatty acids with highest activity towards C18:1. FATB genes encode a group of enzymes with more heterogeneous substrate specificities that generally show higher activity toward saturated fatty acids; nevertheless, a number of FATB enzymes show high activity towards C18:0 and C18:1. FATA and FATB enzymes terminate the synthesis of fatty acids by hydrolyzing the thioester bond between the acyl moiety and the acyl carrier protein (ACP).
(32) In an embodiment, a host cell is transformed to express both a FATA or FATB and a KASI-like transgene. The host cell produces a cell oil. Together, the FATA or FATB and KASI-like genes are expressed to produce their respective gene products and thereby alter the fatty acid profile of the cell oil. The two genes function either additively or synergistically with respect to control strains lacking one of the two genes. Optionally, the host cell is oleaginous and can be an oleaginous eukaryotic microalga such as those described above or below. The fatty acid profile of the cell oil can be enriched (relative to an appropriate control) in C14:0 (myristic), C8:0, C10:0 or a combination of C8/C10.
(33) In an embodiment, the fatty acid profile of the cell is enriched in C14:0 fatty acids. In this embodiment, the FATB gene expresses an acyl-ACP thioesterase enzyme having at least 85, 90, 91, 92, 93, 94, 95, 95.5, 96, 96.5, 97, 97.5, 98, 98.5 or 99% amino acid sequence identity to the enzyme of SEQ ID NO: 1. The co-expressed KASI-like gene encodes a beta-ketoacyl ACP synthase having at least 85, 90, 91, 92, 93, 94, 95, 95.5, 96, 96.5, 97, 97.5, 98, 98.5 or 99% amino acid sequence identity to the enzyme of SEQ ID NO: 2. Alternately, the co-expressed KASI-like gene encodes a beta-ketoacyl ACP synthase having at least 85, 90, 91, 92, 93, 94, 95, 95.5, 96, 96.5, 97, 97.5, 98, 98.5 or 99% amino acid sequence identity to the enzyme of SEQ ID NO: 7. Optionally, the cell oil has a fatty acid profile characterized by at least 10%, 20%, 30%, 40%, 50% or at least 55% C14:0 (area % by FAME-GC-FID).
(34) In another embodiment, the fatty acid profile of the cell is enriched in C8:0 and/or C10:0 fatty acids. In this embodiment, the FATB gene expresses an acyl-ACP thioesterase enzyme having at least 85, 90, 91, 92, 93, 94, 95, 95.5, 96, 96.5, 97, 97.5, 98, 98.5 or 99% amino acid sequence identity to the enzyme of SEQ ID NO: 57. The co-expressed KASI-like gene encodes a beta-ketoacyl ACP synthase having at least 85, 90, 91, 92, 93, 94, 95, 95.5, 96, 96.5, 97, 97.5, 98, 98.5 or 99% amino acid sequence identity to an enzyme of one of SEQ ID NOs: 2, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 37. In a related embodiment, the co-expressed KASI-like gene encodes a beta-ketoacyl ACP synthase having at least 85, 90, 91, 92, 93, 94, 95, 95.5, 96, 96.5, 97, 97.5, 98, 98.5 or 99% amino acid sequence identity to an enzyme of one of SEQ ID NOs: 2, 8, 11, 12, 13, 14, or 15. Optionally, the cell oil has a fatty acid profile characterized by at least 7, 8, 9, 10, 11, 12, 13, or 14 area % C8:0 (by FAME-GC-FID). Optionally, the cell oil has a fatty acid profile characterized by at least 10, 15, 20, 25, 30, or 35 area % for the sum of C8:0 and C10:0 fatty acids (by FAME-GC-FID). Optionally, the C10/C8 ratio of the cell oil is in the range of 2.2-2.5, 2.5-3.0, or 3.0-3.4.
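The percent-identity thresholds recited above can be illustrated with a minimal Python sketch that scores two sequences that have already been aligned. The function names, the treatment of gaps (a residue aligned to a gap counts as a mismatch, gap-gap columns are skipped) and the toy sequences are assumptions of this example; alignment tools differ in how they define the denominator, and the definition that governs the claims is given elsewhere in the specification.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns whose residues are identical.

    Columns where both sequences have a gap are skipped; a residue aligned
    to a gap is counted as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_threshold(aligned_a, aligned_b, threshold=85.0):
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy, pre-aligned peptides (hypothetical, for illustration only).
identity = percent_identity("MASAAFTMSA", "MASAAFTMSG")
print(identity, meets_threshold("MASAAFTMSA", "MASAAFTMSG", 85.0))
```

In practice the alignment itself would come from a dedicated tool; this sketch covers only the final counting step.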
(35) Optionally, the oils produced by these methods can have a sterol profile in accord with those described below.
(36) Host Cells
(37) The host cell can be a single cell (e.g., microalga, bacteria, yeast) or part of a multicellular organism such as a plant or fungus. Methods for expressing KASI-like genes in a plant are given in U.S. Pat. No. 7,301,070, U.S. Pat. No. 6,348,642, U.S. Pat. No. 6,660,849, and U.S. Pat. No. 6,770,465, or can be accomplished using other techniques generally known in plant biotechnology. Engineering of eukaryotic oleaginous microbes including eukaryotic microalgae (e.g., of Chlorophyta) is disclosed in WO2010/063032, WO2011/150411, and WO2012/106560 and in the examples below.
(38) Examples of oleaginous host cells include plant cells and microbial cells having a type II fatty acid biosynthetic pathway, including plastidic oleaginous cells such as those of oleaginous algae. Specific examples of microalgal cells include heterotrophic or obligate heterotrophic eukaryotic microalgae of the phylum Chlorophyta, the class Trebouxiophyceae, the order Chlorellales, or the family Chlorellaceae. Examples of eukaryotic oleaginous microalgae host cells are provided in Published PCT Patent Applications WO2008/151149, WO2010/063032, WO2011/150410, and WO2011/150411, including species of Chlorella and Prototheca, a genus comprising obligate heterotrophs. The oleaginous cells can be, for example, capable of producing 25, 30, 40, 50, 60, 70, 80, 85, or about 90% oil by cell weight, ±5%. Optionally, the oils produced can be low in DHA or EPA fatty acids. For example, the oils can comprise less than 5%, 2%, or 1% DHA and/or EPA. The above-mentioned publications also disclose methods for cultivating such cells and extracting oil, especially from microalgal cells; such methods are applicable to the cells disclosed herein and are incorporated by reference for these teachings. When microalgal cells are used they can be cultivated autotrophically (unless an obligate heterotroph) or in the dark using a sugar (e.g., glucose, fructose and/or sucrose). When cultivated heterotrophically, the cells and cell oil can comprise less than 200 ppm, 20 ppm, or 2 ppm of color-generating impurities or of chlorophyll. In any of the embodiments described herein, the cells can be heterotrophic cells comprising an exogenous invertase gene so as to allow the cells to produce oil from a sucrose feedstock. Alternately, or in addition, the cells can metabolize xylose from cellulosic feedstocks. For example, the cells can be genetically engineered to express one or more xylose metabolism genes such as those encoding an active xylose transporter, a xylulose-5-phosphate transporter, a xylose isomerase, a xylulokinase, a xylitol dehydrogenase and a xylose reductase. See WO2012/154626, GENETICALLY ENGINEERED MICROORGANISMS THAT METABOLIZE XYLOSE, published Nov. 15, 2012. The cells can be cultivated on a depolymerized cellulosic feedstock such as acid or enzyme hydrolyzed bagasse, sugar beet pulp, corn stover, wood chips, sawdust or switchgrass. Optionally, the cells can be cultivated on a depolymerized cellulosic feedstock comprising glucose and at least 5, 10, 20, 30 or 40% xylose, while producing at least 20% lipid by dry weight. Optionally, the lipid comprises triglycerides having a fatty acid profile characterized by at least 10, 15 or 20% C12:0.
(39) Optionally, the host cell comprises 23S rRNA having at least 65, 70, 75, 80, 85, 90 or 95% nucleotide sequence identity to SEQ ID NO: 58.
(40) Oils and Related Products
(41) The oleaginous cells express one or more exogenous genes encoding fatty acid biosynthesis enzymes. As a result, some embodiments feature cell oils that were not obtainable from a non-plant or non-seed oil, or not obtainable at all.
(42) The oleaginous cells produce a storage oil, which is primarily triacylglyceride and may be stored in storage bodies of the cell. A raw oil may be obtained from the cells by disrupting the cells and isolating the oil. WO2008/151149, WO2010/063032, WO2011/150410, and WO2011/1504 disclose heterotrophic cultivation and oil isolation techniques. For example, oil may be obtained by cultivating, drying and pressing the cells. The cell oils produced may be refined, bleached and deodorized (RBD) as known in the seed-oil art or as described in WO2010/120939. The refining step may comprise degumming. The raw, refined, or RBD oils may be used in a variety of food, chemical, and industrial products or processes. After recovery of the oil, a valuable residual biomass remains. Uses for the residual biomass include the production of paper, plastics, absorbents and adsorbents, and use as animal feed, for human nutrition, or as fertilizer.
(43) Where a fatty acid profile of a triglyceride (also referred to as a triacylglyceride or TAG) cell oil is given here, it will be understood that this refers to a nonfractionated sample of the storage oil extracted from the cell analyzed under conditions in which phospholipids have been removed or with an analysis method that is substantially insensitive to the fatty acids of the phospholipids (e.g. using chromatography and mass spectrometry). The oil may be subjected to an RBD process to remove phospholipids, free fatty acids and odors yet have only minor or negligible changes to the fatty acid profile of the triglycerides in the oil. Because the cells are oleaginous, in some cases the storage oil will constitute the bulk of all the TAGs in the cell.
(44) The stable carbon isotope value δ13C is an expression of the ratio of 13C/12C relative to a standard (e.g. PDB, carbonate of fossil skeleton of Belemnite americana from the Peedee formation of South Carolina). The stable carbon isotope value δ13C (‰) of the oils can be related to the δ13C value of the feedstock used. In some embodiments, the oils are derived from oleaginous organisms heterotrophically grown on sugar derived from a C4 plant such as corn or sugarcane. In some embodiments the δ13C (‰) of the oil is from −10 to −17‰ or from −13 to −16‰.
(45) The oils produced according to the above methods in some cases are made using a microalgal host cell. As described above, the microalga can be, without limitation, a eukaryotic microalga falling in the classification of Chlorophyta, Trebouxiophyceae, Chlorellales, Chlorellaceae, or Chlorophyceae. It has been found that oils from microalgae of Trebouxiophyceae can be distinguished from vegetable oils based on their sterol profiles. Oil produced by Chlorella protothecoides (a close relative of Prototheca moriformis) was found to contain sterols that appeared to be brassicasterol, ergosterol, campesterol, stigmasterol, and beta-sitosterol, when detected by GC-MS. However, it is believed that all sterols produced by Chlorella have C24β stereochemistry. Thus, it is believed that the molecules detected as campesterol, stigmasterol, and beta-sitosterol are actually 22,23-dihydrobrassicasterol, poriferasterol and clionasterol, respectively. Thus, the oils produced by the microalgae described above can be distinguished from plant oils by the presence of sterols with C24β stereochemistry and the absence of C24α stereochemistry in the sterols present. For example, the oils produced may contain 22,23-dihydrobrassicasterol while lacking campesterol; contain clionasterol while lacking beta-sitosterol; and/or contain poriferasterol while lacking stigmasterol. Alternately, or in addition, the oils may contain significant amounts of Δ7-poriferasterol.
(46) In one embodiment, the oils provided herein are not vegetable oils. Vegetable oils are oils extracted from plants and plant seeds. Vegetable oils can be distinguished from the non-plant oils provided herein on the basis of their oil content. A variety of methods for analyzing the oil content can be employed to determine the source of the oil or whether adulteration of an oil provided herein with an oil of a different (e.g. plant) origin has occurred. The determination can be made on the basis of one or a combination of the analytical methods. These tests include but are not limited to analysis of one or more of free fatty acids, fatty acid profile, total triacylglycerol content, diacylglycerol content, peroxide values, spectroscopic properties (e.g. UV absorption), sterol profile, sterol degradation products, antioxidants (e.g. tocopherols), pigments (e.g. chlorophyll), δ13C values and sensory analysis (e.g. taste, odor, and mouth feel). Many such tests have been standardized for commercial oils, such as the Codex Alimentarius standards for edible fats and oils.
(47) Sterol profile analysis is a particularly well-known method for determining the biological source of organic matter. Campesterol, β-sitosterol, and stigmasterol are common plant sterols, with β-sitosterol being a principal plant sterol. For example, β-sitosterol was found to be in greatest abundance in an analysis of certain seed oils, approximately 64% in corn, 29% in rapeseed, 64% in sunflower, 74% in cottonseed, 26% in soybean, and 79% in olive oil (Gul et al. J. Cell and Molecular Biology 5:71-79, 2006).
(48) Samples of oil isolated from Prototheca moriformis strain UTEX1435 were separately clarified (CL), refined and bleached (RB), or refined, bleached and deodorized (RBD) and were tested for sterol content according to the procedure described in JAOCS vol. 60, no. 8, August 1983. Results of the analysis are shown below (units in mg/100 g) in Table 6.
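The relationship between a measured 13C/12C ratio and the reported isotope value can be sketched as follows, using the conventional per mil definition: the sample ratio divided by the standard ratio, minus one, times 1000. The numerical value used for the PDB standard ratio and the sample ratio are assumptions introduced for this example and are not taken from the text.

```python
# Commonly cited 13C/12C ratio for the PDB standard (assumed for this sketch).
R_PDB = 0.0112372


def delta13c(r_sample, r_standard=R_PDB):
    """Return the stable carbon isotope value in per mil.

    delta = (R_sample / R_standard - 1) * 1000
    """
    return (r_sample / r_standard - 1.0) * 1000.0


# Hypothetical sample ratio, chosen only to show the arithmetic.
sample_delta = delta13c(0.0110800)
print(round(sample_delta, 2))
```

A sample whose ratio equals that of the standard gives a value of zero; samples depleted in 13C relative to the standard give negative values.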
(49) TABLE-US-00007 TABLE 6 Sterols in microalgal oil.
Sterol | Crude | Clarified | Refined & bleached | Refined, bleached, & deodorized
1 Ergosterol | 384 (56%) | 398 (55%) | 293 (50%) | 302 (50%)
2 5,22-cholestadien-24-methyl-3-ol (Brassicasterol) | 14.6 (2.1%) | 18.8 (2.6%) | 14 (2.4%) | 15.2 (2.5%)
3 24-methylcholest-5-en-3-ol (Campesterol or 22,23-dihydrobrassicasterol) | 10.7 (1.6%) | 11.9 (1.6%) | 10.9 (1.8%) | 10.8 (1.8%)
4 5,22-cholestadien-24-ethyl-3-ol (Stigmasterol or poriferasterol) | 57.7 (8.4%) | 59.2 (8.2%) | 46.8 (7.9%) | 49.9 (8.3%)
5 24-ethylcholest-5-en-3-ol (β-Sitosterol or clionasterol) | 9.64 (1.4%) | 9.92 (1.4%) | 9.26 (1.6%) | 10.2 (1.7%)
6 Other sterols | 209 | 221 | 216 | 213
Total sterols | 685.64 | 718.82 | 589.96 | 601.1
(50) These results show three striking features. First, ergosterol was found to be the most abundant of all the sterols, accounting for about 50% or more of the total sterols. The amount of ergosterol is greater than that of campesterol, beta-sitosterol, and stigmasterol combined. Ergosterol is a sterol commonly found in fungi and not commonly found in plants, and its presence, particularly in significant amounts, serves as a useful marker for non-plant oils. Secondly, the oil was found to contain brassicasterol. With the exception of rapeseed oil, brassicasterol is not commonly found in plant-based oils. Thirdly, less than 2% beta-sitosterol was found to be present. Beta-sitosterol is a prominent plant sterol not commonly found in microalgae, and its presence, particularly in significant amounts, serves as a useful marker for oils of plant origin. In summary, Prototheca moriformis strain UTEX1435 has been found to contain significant amounts of ergosterol and only trace amounts of beta-sitosterol as a percentage of total sterol content. Accordingly, the ratio of ergosterol:beta-sitosterol, alone or in combination with the presence of brassicasterol, can be used to distinguish this oil from plant oils.
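The percentages and the ergosterol:beta-sitosterol ratio discussed above can be recomputed from the Table 6 values with a short Python sketch. The dictionary layout and function names are assumptions of this example; the numbers are the mg/100 g values reported in the crude and RBD columns of Table 6.

```python
# Values in mg/100 g copied from the "Crude" and RBD columns of Table 6.
TABLE6 = {
    "crude": {"ergosterol": 384.0, "brassicasterol": 14.6,
              "beta_sitosterol_or_clionasterol": 9.64, "total": 685.64},
    "rbd": {"ergosterol": 302.0, "brassicasterol": 15.2,
            "beta_sitosterol_or_clionasterol": 10.2, "total": 601.1},
}


def percent_of_total(sample, sterol):
    """Sterol amount as a percentage of total sterols in the sample."""
    return 100.0 * sample[sterol] / sample["total"]


def ergosterol_to_sitosterol(sample):
    """Ratio of ergosterol to the 24-ethylcholest-5-en-3-ol fraction."""
    return sample["ergosterol"] / sample["beta_sitosterol_or_clionasterol"]


for name, sample in TABLE6.items():
    print(name,
          round(percent_of_total(sample, "ergosterol"), 1),
          round(ergosterol_to_sitosterol(sample), 1))
```

For both columns the ergosterol fraction is roughly half of the total sterols and the ratio is well above 20:1, consistent with the discussion of Table 6.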
(51) In some embodiments, the oil content of an oil provided herein contains, as a percentage of total sterols, less than 20%, 15%, 10%, 5%, 4%, 3%, 2%, or 1% beta-sitosterol. In other embodiments the oil is free from beta-sitosterol.
(52) In some embodiments, the oil is free from one or more of beta-sitosterol, campesterol, or stigmasterol. In some embodiments the oil is free from beta-sitosterol, campesterol, and stigmasterol. In some embodiments the oil is free from campesterol. In some embodiments the oil is free from stigmasterol.
(53) In some embodiments, the oil content of an oil provided herein comprises, as a percentage of total sterols, less than 20%, 15%, 10%, 5%, 4%, 3%, 2%, or 1% 24-ethylcholest-5-en-3-ol. In some embodiments, the 24-ethylcholest-5-en-3-ol is clionasterol. In some embodiments, the oil content of an oil provided herein comprises, as a percentage of total sterols, at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% clionasterol.
(54) In some embodiments, the oil content of an oil provided herein contains, as a percentage of total sterols, less than 20%, 15%, 10%, 5%, 4%, 3%, 2%, or 1% 24-methylcholest-5-en-3-ol. In some embodiments, the 24-methylcholest-5-en-3-ol is 22, 23-dihydrobrassicasterol. In some embodiments, the oil content of an oil provided herein comprises, as a percentage of total sterols, at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% 22,23-dihydrobrassicasterol.
(55) In some embodiments, the oil content of an oil provided herein contains, as a percentage of total sterols, less than 20%, 15%, 10%, 5%, 4%, 3%, 2%, or 1% 5,22-cholestadien-24-ethyl-3-ol. In some embodiments, the 5, 22-cholestadien-24-ethyl-3-ol is poriferasterol. In some embodiments, the oil content of an oil provided herein comprises, as a percentage of total sterols, at least 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10% poriferasterol.
(56) In some embodiments, the oil content of an oil provided herein contains ergosterol or brassicasterol or a combination of the two. In some embodiments, the oil content contains, as a percentage of total sterols, at least 5%, 10%, 20%, 25%, 35%, 40%, 45%, 50%, 55%, 60%, or 65% ergosterol. In some embodiments, the oil content contains, as a percentage of total sterols, at least 25% ergosterol. In some embodiments, the oil content contains, as a percentage of total sterols, at least 40% ergosterol. In some embodiments, the oil content contains, as a percentage of total sterols, at least 5%, 10%, 20%, 25%, 35%, 40%, 45%, 50%, 55%, 60%, or 65% of a combination of ergosterol and brassicasterol.
(57) In some embodiments, the oil content contains, as a percentage of total sterols, at least 1%, 2%, 3%, 4% or 5% brassicasterol. In some embodiments, the oil content contains, as a percentage of total sterols less than 10%, 9%, 8%, 7%, 6%, or 5% brassicasterol.
(58) In some embodiments the ratio of ergosterol to brassicasterol is at least 5:1, 10:1, 15:1, or 20:1.
(59) In some embodiments, the oil content contains, as a percentage of total sterols, at least 5%, 10%, 20%, 25%, 35%, 40%, 45%, 50%, 55%, 60%, or 65% ergosterol and less than 20%, 15%, 10%, 5%, 4%, 3%, 2%, or 1% beta-sitosterol. In some embodiments, the oil content contains, as a percentage of total sterols, at least 25% ergosterol and less than 5% beta-sitosterol. In some embodiments, the oil content further comprises brassicasterol. For any of the oils or cell-oils disclosed in this application, the oil can have the sterol profile of any column of Table 6, above, with a sterol-by-sterol variation of 30%, 20%, 10% or less.
(60) Sterols contain from 27 to 29 carbon atoms (C27 to C29) and are found in all eukaryotes. Animals exclusively make C27 sterols as they lack the ability to further modify the C27 sterols to produce C28 and C29 sterols. Plants, however, are able to synthesize C28 and C29 sterols, and C28/C29 plant sterols are often referred to as phytosterols. The sterol profile of a given plant is high in C29 sterols, and the primary sterols in plants are typically the C29 sterols beta-sitosterol and stigmasterol. In contrast, the sterol profile of non-plant organisms contains greater percentages of C27 and C28 sterols. For example, the sterols in fungi and in many microalgae are principally C28 sterols. The sterol profile, and particularly the striking predominance of C29 sterols over C28 sterols in plants, has been exploited for determining the proportion of plant and marine matter in soil samples (Huang, Wen-Yen, Meinschein W. G., Sterols as ecological indicators; Geochimica et Cosmochimica Acta. Vol 43. pp 739-745).
(61) In some embodiments the primary sterols in the microalgal oils provided herein are sterols other than beta-sitosterol and stigmasterol. In some embodiments of the microalgal oils, C29 sterols make up less than 50%, 40%, 30%, 20%, 10%, or 5% by weight of the total sterol content.
(62) In some embodiments the microalgal oils provided herein contain C28 sterols in excess of C29 sterols. In some embodiments of the microalgal oils, C28 sterols make up greater than 50%, 60%, 70%, 80%, 90%, or 95% by weight of the total sterol content. In some embodiments the C28 sterol is ergosterol. In some embodiments the C28 sterol is brassicasterol.
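As a rough illustration of the C28 versus C29 comparison, the sketch below groups the named sterols of the crude column of Table 6 by carbon number. The carbon-number assignments (ergosterol, brassicasterol and the 24-methyl sterol as C28; the two 24-ethyl sterols as C29) follow standard sterol chemistry and are supplied by this example, not by the table. The "Other sterols" row is left unassigned, so both percentages are lower bounds.

```python
# Crude-oil values from Table 6 (mg/100 g), grouped by carbon number.
# The grouping is an assumption of this sketch based on standard sterol
# chemistry; "Other sterols" (209 mg/100 g) is not assigned to a group.
CRUDE_TOTAL = 685.64
C28_STEROLS = {
    "ergosterol": 384.0,
    "brassicasterol": 14.6,
    "24-methylcholest-5-en-3-ol": 10.7,
}
C29_STEROLS = {
    "5,22-cholestadien-24-ethyl-3-ol": 57.7,
    "24-ethylcholest-5-en-3-ol": 9.64,
}


def group_percent(group, total):
    """Summed amount of a sterol group as a percentage of total sterols."""
    return 100.0 * sum(group.values()) / total


c28_pct = group_percent(C28_STEROLS, CRUDE_TOTAL)
c29_pct = group_percent(C29_STEROLS, CRUDE_TOTAL)
print(round(c28_pct, 1), round(c29_pct, 1), c28_pct > c29_pct)
```

On these figures the identified C28 sterols exceed the identified C29 sterols several-fold in the crude oil.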
(63) In embodiments of the present invention, oleaginous cells expressing one or more of the genes of Table 1 can produce an oil with at least 20, 40, 60 or 70% of C8, C10, C12, C14 or C16 fatty acids. In a specific embodiment, the level of myristate (C14:0) in the oil is greater than 30%.
(64) Thus, in embodiments of the invention, there is a process for producing an oil, triglyceride, fatty acid, or derivative of any of these, comprising transforming a cell with any of the nucleic acids discussed herein. In another embodiment, the transformed cell is cultivated to produce an oil and, optionally, the oil is extracted. Oil extracted in this way can be used to produce food, oleochemicals or other products.
(65) The oils discussed above, alone or in combination, are useful in the production of foods, fuels and chemicals (including plastics, foams, films, detergents, soaps, etc.). The oils, triglycerides, and fatty acids from the oils may be subjected to C-H activation, hydroamino methylation, methoxy-carbonation, ozonolysis, enzymatic transformations, epoxidation, methylation, dimerization, thiolation, metathesis, hydro-alkylation, lactonization, or other chemical processes.
(66) After extracting the oil, a residual biomass may be left, which may have use as a fuel, as an animal feed, or as an ingredient in paper, plastic, or other product. For example, residual biomass from heterotrophic algae can be used in such products.
EXAMPLES
Example 1: Screening KAS Genes in Combination with Cuphea palustris FATB2 Acyl-ACP Thioesterase
(67) A Prototheca moriformis strain expressing codon optimized Cuphea palustris (Cpal) FATB2 was constructed as described in WO2013/158938, example 53 (p. 231). The amino acid sequence of the Cpal FATB2 gene is given in SEQ ID NO: 1. This strain (S6336) produced a cell oil characterized by a fatty acid profile having about 38% myristic acid (C14:0).
(68) Six KASI-like genes were cloned from seed oil genomes. Total RNA was extracted from dried mature seeds using a liquid-nitrogen-chilled mortar and pestle to break open the seed walls. RNA was then precipitated with an 8M urea, 3M LiCl solution followed by a phenol-chloroform extraction. A cDNA library was generated with oligo dT primers using the purified RNA and subjected to Next Generation sequencing. The novel KAS genes were identified from the assembled transcriptome using BLAST with known KAS genes as bait. The identified KAS gene sequences were codon optimized for expression in Prototheca and synthesized for incorporation into an expression cassette.
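The step of identifying candidate KAS sequences from BLAST results can be sketched as a filter over tabular BLAST output. The sketch assumes the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); the e-value cutoff, the hit limit, the function name and the three sample rows are hypothetical and are not taken from the example above.

```python
import csv
import io

# Column names of the standard 12-column tabular BLAST output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(tabular_text, max_evalue=1e-20, limit=50):
    """Keep hits at or below max_evalue, best bit score first."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    hits = [row for row in reader if float(row["evalue"]) <= max_evalue]
    hits.sort(key=lambda row: float(row["bitscore"]), reverse=True)
    return hits[:limit]


# Hypothetical rows: a known KAS "bait" queried against assembled contigs.
SAMPLE = "\n".join([
    "\t".join(["known_KASIV", "contig_0107", "71.2", "510", "147", "2",
               "5", "514", "3", "512", "1e-90", "610"]),
    "\t".join(["known_KASIV", "contig_0042", "88.5", "520", "60", "0",
               "1", "520", "10", "529", "1e-150", "900"]),
    "\t".join(["known_KASIV", "contig_0311", "35.0", "80", "52", "1",
               "100", "179", "40", "119", "3e-05", "48.1"]),
])

hits = top_hits(SAMPLE)
print([h["sseqid"] for h in hits])
```

The weak third hit is dropped by the e-value cutoff, and the remaining contigs are ordered by bit score.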
(69) To test the impact on myristate accumulation, S6336 was transformed with a linearized plasmid designed for homologous recombination at the pLOOP locus and to express the KASI-like genes with coexpression of a selection marker (see WO2013/158938). The vector is described in SEQ ID NO: 38; the remaining codon optimized KAS genes were substituted into the KAS CDS segment of this vector prior to transformation. As shown in Table 7, increases in C14:0 levels in extracted cell oil were observed with the expression of the C. camphora KASIV (D3147), C. camphora KASI (D3148), U. californica KASI (D3150) or U. californica KASVI (D3152) genes in S6336. Even greater increases in C14:0 levels resulted from expression of the C. palustris KASIV (D3145) or C. wrightii KASAI (D3153) genes, with some individual lines producing >50% or >55% C14:0. The C14 production far exceeded the negligible amount found in the wild-type oil (see Table 7a).
(70) TABLE-US-00008 TABLE 7 KAS genes that effect an increase in C14 fatty acids in eukaryotic microalgal oil.
Gene (transformant ID) | SEQ ID NOs: | C14:0 (area %, mean of 4 transformants) | Highest C14:0 observed
C. camphora KASIV | 3, 22, 40 | 38.0 | 40.3
C. camphora KASI | 4, 23, 41 | 33.8 | 39.3
U. californica KASI | 5, 24, 42 | 37.4 | 42.3
U. californica KASVI | 6, 25, 43 | 38.4 | 41.6
C. palustris KASIV | 2, 21, 39 | 45.4 | 58.4
C. wrightii KASAI | 7, 26, 44 | 43.2 | 53.6
(71) TABLE-US-00009 TABLE 7a Fatty acid profile of wild-type Prototheca moriformis oil (area %).
C8:0 | C10:0 | C12:0 | C14:0 | C16:0 | C18:0 | C18:1 | C18:2 | C18:3
0 | 0 | 0 | 2 | 38 | 4 | 48 | 5 | 1
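The comparison in Example 1 can be made explicit with a small Python sketch that ranks the Table 7 genes by mean C14:0 and reports the change relative to the parental strain. The parental value of 38.0 is taken from the "about 38%" figure given for strain S6336, so the computed differences are approximate; the variable and function names are assumptions of this example.

```python
# Approximate C14:0 area % of the parental strain S6336 ("about 38%").
PARENT_C14 = 38.0

# Table 7: gene -> (mean C14:0 of 4 transformants, highest C14:0 observed).
TABLE7 = {
    "C. camphora KASIV": (38.0, 40.3),
    "C. camphora KASI": (33.8, 39.3),
    "U. californica KASI": (37.4, 42.3),
    "U. californica KASVI": (38.4, 41.6),
    "C. palustris KASIV": (45.4, 58.4),
    "C. wrightii KASAI": (43.2, 53.6),
}


def mean_gain(gene):
    """Mean C14:0 minus the approximate parental level, in area % points."""
    return round(TABLE7[gene][0] - PARENT_C14, 1)


best = max(TABLE7, key=lambda gene: TABLE7[gene][0])
above_50 = sorted(g for g, (_, highest) in TABLE7.items() if highest > 50.0)
print(best, mean_gain(best), above_50)
```

Only the C. palustris and C. wrightii genes yielded individual lines above 50% C14:0, which matches the narrative of Example 1.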
Example 2: Screening KAS Genes in Combination with Cuphea hookeriana FATB Acyl-ACP Thioesterase
(72) P. moriformis strains were constructed that express ChFATB2 acyl-ACP thioesterase together with a KAS gene selected from ten KASI, one KASIII and one mitochondrial KAS. The KAS genes were cloned from seed oil genomes, codon optimized and introduced into Prototheca as described in Example 1. The KAS genes were fused to an HA epitope tag at the C-terminus of each KAS to allow confirmation of protein expression.
(73) TABLE-US-00010 TABLE 8 Mean C8:0-C10:0 fatty acid profiles derived from transformation of FATB2-expressing microalgal strain with KASI-like genes isolated from seed oil genomes.
KAS Gene | SEQ ID NOS: (amino acid, CDS, codon optimized CDS) | C8:0 (mean area %) | C10:0 (mean area %) | Sum C8:0 + C10:0 | C10/C8 ratio
C. avigera KASIa | 16, 35, 53 | 8.0 | 21.4 | 29.3 | 2.7
C. pulcherrima KASI | 17, 36, 54 | 7.7 | 20.3 | 28.0 | 2.6
C. avigera Mitochondrial KAS | NL, 37, 55 | 7.8 | 20.4 | 28.2 | 2.6
C. avigera KAS III | 19, NL, 56 | 9.5 | 22.8 | 32.3 | 2.4
C. paucipetala KASIVb | 9, 28, 46 | 7.9 | 22.5 | 30.3 | 2.9
C. ignea KASIVb | 10, 29, 47 | 6.6 | 18.7 | 25.4 | 2.8
C. painteri KASIV | 13, 32, 50 | 9.0 | 22.4 | 31.4 | 2.5
C. palustris KASIVa | 2, 21, 38 | 8.6 | 21.6 | 30.4 | 2.5
C. avigera KASIVb | 8, 27, 45 | 11.0 | 23.8 | 34.8 | 2.2
C. procumbens KASIV | 11, 30, 48 | 8.2 | 25.8 | 34.0 | 3.2
C. paucipetala KASIVa | 12, 31, 49 | 8.8 | 29.9 | 39.4 | 3.4
C. ignea KASIVa | 15, 34, 52 | 8.6 | 25.8 | 34.4 | 3.0
C. avigera KASIVa | 14, 33, 51 | 10.0 | 23.0 | 32.9 | 2.3
C. hookeriana KASIV | 59, NL, 61 | 14.5 | 27.81 | 42.6 | 3.0
(74) The parental strain is a stable microalgal strain expressing the C. hookeriana FATB2 under the control of the pH5-compatible PmUAPA1 promoter. The parental strain accumulates 27.8% C8:0-C10:0 with a C10/C8 ratio of 2.6. All transformants are derived from integrations of the KASI transgenes at the pLOOP locus of the parental strain. Means are calculated from at least 19 individual transformants for each KAS transgene (NL=not listed).
(75) As can be seen from Table 8, expression of the following KAS genes significantly increased C8:0-C10:0 levels: C. avigera KASIVb (D3287), C. procumbens KASIV (D3290), C. paucipetala KASIVa (D3291), C. avigera KASIVa (D3293), and C. ignea KASIVa (D3294). Importantly, expression of the C. avigera KASIVb (D3287) augmented the accumulation of both C8:0 and C10:0 fatty acids, while only C10:0 levels were increased upon expression of D3290, D3291, D3293 and D3294. In some cases the sum of C8:0 and C10:0 fatty acids in the fatty acid profile was at least 30%, or at least 35% (area % by FAME-GC-FID). The midchain production far exceeded the negligible amount found in the wild-type oil (see Table 7a).
(76) The mean C10/C8 ratios of Table 8 ranged from 2.2 to 3.4. The sum of mean C8 and C10 ranged from 25.4 to 39.4.
(77) The highest C8:0 producing strain found was D3287, which combined C. avigera KASIVb with C. hookeriana FATB2. The mean was 11.0% C8:0 with a range of 12.4 to 14.8. Thus, a cell oil with a fatty acid profile of greater than 14% C8 was produced. Furthermore, the C10/C8 ratio was less than 2.5.
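The derived columns of Table 8 can be illustrated by recomputing the C8:0 + C10:0 sum and the C10/C8 ratio from the mean values for three of the genes and comparing the sum with the parental strain's 27.8%. The function and variable names are assumptions of this sketch. Sums recomputed from the rounded means can differ slightly from the Sum column printed in Table 8.

```python
# Parental strain: C8:0 + C10:0 = 27.8 area % (see the text of Example 2).
PARENT_SUM = 27.8

# Mean (C8:0, C10:0) area % for three genes, copied from Table 8.
MEANS = {
    "C. avigera KASIVb": (11.0, 23.8),
    "C. paucipetala KASIVa": (8.8, 29.9),
    "C. ignea KASIVb": (6.6, 18.7),
}


def summarize(c8, c10):
    """Return the C8:0 + C10:0 sum and the C10/C8 ratio, one decimal each."""
    return {"sum": round(c8 + c10, 1), "c10_c8": round(c10 / c8, 1)}


summary = {gene: summarize(*values) for gene, values in MEANS.items()}
improved = [gene for gene, s in summary.items() if s["sum"] > PARENT_SUM]
print(summary)
print(improved)
```

Of the three genes shown, the two KASIV genes from C. avigera and C. paucipetala raise the midchain sum above the parental level, while C. ignea KASIVb does not.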
Example 3: Identification of KAS Clades and Consensus Sequences
(78) The newly identified sequences of KASI-like genes were compared to those in the ThYme database of thioester-active enzymes maintained by Iowa State University (enzyme.cbirc.iastate.edu) using the BLAST algorithm, and the top hits were extracted. The top 50 BLAST hits were downloaded and a multiple alignment was created using the ClustalW alignment algorithm and a phylogenetic tree (
(79) Two new clades were identified, Clade 1 and Clade 2, characterized by consensus SEQ ID NO: 69 and SEQ ID NO: 70, which include transit peptides. The clades can also be characterized by the sequences of the mature consensus proteins SEQ ID NO: 71 and SEQ ID NO: 72, respectively. The KAS genes of Clade 1 are associated with production of elevated C8 and C10 fatty acids based on transformations in P. moriformis in combination with a FATB acyl-ACP thioesterase as in Example 2. The KAS genes of Clade 2 are associated with production of elevated C10 fatty acids based on transformations in P. moriformis in combination with a FATB acyl-ACP thioesterase as in Example 2.
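One common way to derive a consensus sequence from a multiple alignment is a column-by-column majority vote, sketched below in Python. This is an illustration only: the text does not state how the consensus sequences of SEQ ID NOs: 69-72 were generated, and the function name and the short toy alignment are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_sequences):
    """Return the most common character in each alignment column.

    Ties are resolved by first occurrence (Counter ordering); a production
    tool would normally report ambiguity instead.
    """
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned_sequences):
        residue, _count = Counter(column).most_common(1)[0]
        consensus.append(residue)
    return "".join(consensus)


# Toy alignment (hypothetical) with one variant position in two sequences.
toy_alignment = ["MASAAF", "MASGAF", "MTSAAF"]
consensus = majority_consensus(toy_alignment)
print(consensus)
```

Applied to the members of a clade, the same vote yields one consensus string per clade, which can then be compared with candidate sequences.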
(80) Although the above discussion discloses various exemplary embodiments of the invention, it should be apparent that those skilled in the art can make various modifications that will achieve some of the advantages of the invention without departing from the true scope of the invention.
(81) TABLE-US-00011 SEQUENCE LISTING
SEQ ID NO: 1 Cuphea palustris FATB2 amino acid sequence (Genbank Accession No. AAC49180.1)
MVAAAASAAFFSVATPRTNISPSSLSVPFKPKSNHNGGFQVKANASAHPKANGSAVS LKSGSLETQEDKTSSSSPPPRTFINQLPVWSMLLSAVTTVFGVAEKQWPMLDRKSKR PDMLVEPLGVDRIVYDGVSFRQSFSIRSYEIGADRTASIETLMNMFQETSLNHCKIIGL LNDGFGRTPEMCKRDLIWVVTKMQIEVNRYPTWGDTIEVNTWVSASGKHGMGRD WLISDCHTGEILIRATSVWAMMNQKTRRLSKIPYEVRQEIEPQFVDSAPVIVDDRKFH KLDLKTGDSICNGLTPRWTDLDVNQHVNNVKYIGWILQSVPTEVFETQELCGLTLEY RRECGRDSVLESVTAMDPSKEGDRSLYQHLLRLEDGADIVKGRTEWRPKNAGAKG AILTGKTSNGNSIS
SEQ ID NO: 2 Amino acid sequence of the C. palustris KASIV (D3145 and D3295, pSZ4312). The algal transit peptide is underlined.
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTFQCLVTSYIDPCNQFSSSASL SFLGDNGFASLFGSKPERSNRGHRRLGRASHSGEAMAVALEPAQEVATKKKPLVKQ RRVVVTGMGVVTPLGHEPDVYYNNLLDGVSGISEIEAFDCTQFPTRIAGEIKSFSTDG WVAPKLSKRMDKFMLYLLTAGKKALADGGITDDVMKELDKRKCGVLIGSGLGGM KLFSDSIEALRISYKKMNPFCVPFATTNMGSAMLAMDLGWMGPNYSISTACATSNFC ILNSANHIVRGEADMMLCGGSDAVIIPIGLGGFVACRALSQRNNDPTKASRPWDSNR DGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPEGAGVILCI EKALAQAGVSREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHL IGAAGGVEAVTVVQAIRTGWIHPNLNLEDPDKAVDAKVLVGPKKERLNVKVGLSNS FGFGGHNSSILFAPYN
SEQ ID NO: 3 Amino acid sequence of the C. camphora KASIV (D3147, pSZ4338).
MAMMAGSCSNLVIGNRELGGNGPSLLHYNGLRPLENIQTASAVKKPNGLFASSTAR KSKAVRAMVLPTVTAPKREKDPKKRIVITGMGLVSVFGNDIDTFYSKLLEGESGIGPI DRFDASSFSVRFAGQIHNFSSKGYIDGKNDRRLDDCWRYCLVAGRRALEDANLGPE VLEKMDRSRIGVLIGTGMGGLSAFSNGVESLIQKGYKKITPFFIPYSITNMGSALLAID TGVMGPNYSISTACATANYCFHAAANHIRRGEAEIMVTGGTEAAVSATGVGGFIACR ALSHRNDEPQTASRPWDKDRDGFVMGEGAGVLVMESLHHARKRGANIIAEYLGGA VTCDAHHMTDPRADGLGVSSCITKSLEDAGVSPEEVNYVNAHATSTLAGDLAEVNA IKKVFKDTSEMKMNGTKSMIGHCLGAAGGLEAIATIKAINTGWLHPTINQFNIEPAVT IDTVPNVKKKHDIHVGISNSFGFGGHNSVVVFAPFMP
SEQ ID NO: 4 Amino acid sequence of the C. camphora KASI (D3148, pSZ4339).
MQILQTPSSSSSSLRMSSMESLSLTPKSLPLKTLLPLRPRPKNLSRRKSQNPRPISSSSSP ERETDPKKRVVITGMGLVSVFGNDVDAYYDRLLSGESGIAPIDRFDASKFPTRFAGQI RGFTSDGYIDGKNDRRLDDCLRYCIVSGKKALENAGLGPHLMDGKIDKERAGVLVG TGMGGLTVFSNGVQTLHEKGYRKMTPFFIPYAITNMGSALLAIELGEMGPNYSISTAC ATSNYCFYAAANHIRRGEADLMLAGGTEAAIIPIGLGGFVACRALSQRNDDPQTASR PWDKDRDGFVMGEGAGVLVMESLEHAMKRDAPIIAEYLGGAVNCDAYHMTDPRA DGLGVSTCIERSLEDAGVAPEEVNYINAHATSTLAGDLAEVNAIKKVFTNTSEIKINA TKSMIGHCLGAAGGLEAIATIKAINTGWLHPSINQFNPEPSVEFDTVANKKQQHEVN VAISNSFGFGGHNSVVVFSAFKP
SEQ ID NO: 5 Amino acid sequence of the U. californica KASI (D3150, pSZ4341).
MESLSLTPKSLPLKTLLPFRPRPKNLSRRKSQNPKPISSSSSPERETDPKKRVVITGMGL VSVFGNDVDAYYDRLLSGESGIAPIDRFDASKFPTRFAGQIRGFTSDGYIDGKNDRRL DDCLRYCIVSGKKALENAGLGPDLMDGKIDKERAGVLVGTGMGGLTVFSNGVQTL HEKGYRKMTPFFIPYAITNMGSALLAIDLGFMGPNYSISTACATSNYCFYAAANHIRR GEADVMLAGGTEAAIIPIGLGGFVACRALSQRNDDPQTASRPWDKDRDGFVMGEGA GVLVMESLEHAMKRDAPIIAEYLGGAVNCDAYHMTDPRADGLGVSTCIERSLEDAG VAPEEVNYINAHATSTLAGDLAEVNAIKKVFTNTSEIKINATKSMIGHCLGAAGGLE AIATIKAINTGWLHPSINQFNPEPSVEFDTVANKKQQHEVNVAISNSFGFGGHNSVVV FSAFKP
SEQ ID NO: 6 Amino acid sequence of the U. californica KASIV (D3152, pSZ4343).
MTQTLICPSSMETLSLTKQSHFRLRLPTPPHIRRGGGHRHPPPFISASAAPRRETDPKK RVVITGMGLVSVFGTNVDVYYDRLLAGESGVGTIDRFDASMFPTRFGGQIRRFTSEG YIDGKNDRRLDDYLRYCLVSGKKAIESAGFDLHNITNKIDKERAGILVGSGMGGLKV FSDGVESLIEKGYRKISPFFIPYMIPNMGSALLGIDLGFMGPNYSISTACATSNYCIYAA ANHIRQGDADLMVAGGTEAPIIPIGLGGFVACRALSTRNDDPQTASRPWDIDRDGFV MGEGAGILVLESLEHAMKRDAPILAEYLGGAVNCDAHHMTDPRADGLGVSTCIESS LEDAGVAAEEVNYINAHATSTPTGDLAEMKAIKNVFRNTSEIKINATKSMIGHCLGA SGGLEAIATLKAITTGWLHPTINQFNPEPSVDFDTVAKKKKQHEVNVAISNSFGFGGH NSVLVFSAFKP
SEQ ID NO: 7 Amino acid sequence of the C. wrightii KASAI (D3153, pSZ4379). The algal transit peptide is underlined.
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRYVFQCLVASCIDPCDQYRSSASL SFLGDNGFASLFGSKPFMSNRGHRRLRRASHSGEAMAVALQPAQEAGTKKKPVIKQ RRVVVTGMGVVTPLGHEPDVFYNNLLDGVSGISEIETFDCTQFPTRIAGEIKSFSTDG WVAPKLSKRMDKFMLYLLTAGKKALADGGITDEVMKELDKRKCGVLIGSGMGGM KVFNDAIEALRVSYKKMNPFCVPFATTNMGSAMLAMDLGWMGPNYSISTACATSN FCILNAANHIIRGEADMMLCGGSDAVIIPIGLGGFVACRALSQRNSDPTKASRPWDSN RDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPEGAGVIL CIEKALAQAGVSKEDVNYINAHATSTSAGDIKEYQALARCFGQNSELRVNSTKSMIG HLLGAAGGVEAVTVVQAIRTGWIHPNLNLEDPDKAVDAKLLVGPKKERLNVKVGL SNSFGFGGHNSSILFAPCNV
SEQ ID NO: 8 Amino acid sequence of the C. avigera KASIVb (D3287, pSZ4453).
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTFQCYIGDNGFGSKPPRSNRG HLRLGRTSHSGEVMAVAMQSAQEVSTKEKPATKQRRVVVTGMGVVTALGHDPDV YYNNLLDGVSGISEIENFDCSQLPTRIAGEIKSFSADGWVAPKFSRRMDKFMLYILTA GKKALVDGGITEDVMKELDKRKCGVLIGSGLGGMKVFSESIEALRTSYKKISPECVPF STTNMGSAILAMDLGWMGPNYSISTACATSNFCILNAANHITKGEADMMLCGGSDS VILPIGMGGFVACRALSQRNNDPTKASRPWDSNRDGFVMGEGAGVLLLEELEHAKK RGATIYAEFLGGSFTCDAYHMTEPHPEGAGVILCIEKALAQSGVSREDVNYINAHATS TPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHLLGGAGGVEAVTVVQAIRTGWIHP NINLDDPDEGVDAKLLVGPKKEKLKVKVGLSNSFGFGGHNSSILFAPCN
SEQ ID NO: 9 Amino acid sequence of the C. paucipetala KASIVb (D3288, pSZ4454).
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTFQCLGDIGFASLIGSKPPRSN RNHRRLGRTSHSGEVMAVAMQPAHEASTKNKPVTKQRRVVVTGMGVATPLGHDP DVYYNNLLDGVSGISQIENFDCTQFPTRIAGEIKSFSTEGYVIPKFAKRMDKFMLYLL TAGKKALEDGGITEDVMKELDKRKCGVLIGSGMGGMKIINDSIAALNVSYKKMTPF CVPFSTTNMGSAMLAIDLGWMGPNYSISTACATSNYCILNAANHIVRGEADMMLCG GSDAVIIPVGLGGFVACRALSQRNNDPTKASRPWDSNRDGFVMGEGAGVLLLEELE HAKKRGATIYAEFLGGSFTCDAYHMTEPHPDGAGVILCIEKALAQSGVSREDVNYIN AHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHLLGAAGGVEAVTVVQAIR TGWIHPNINLENPDEAVDAKLLVGPKKEKLKVKVGLSNSFGFGGHNSSILFAPYN
SEQ ID NO: 10 Amino acid sequence of the C. ignea KASIVb (D3289, pSZ4455). The algal transit peptide is underlined.
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTSQCLVTSYIDPCNKYCSSASL SFLGDNGFASLFGSKPERSNRGHRRLGRASHSGEAMAVALQPAQEVTTKKKPVIKQR RVVVTGMGVVTPLGHEPDVYYNNLLDGVSGISEIETFDCTQFPTRIAGEIKSFSTDGW VAPKLSKRMDKFMLYLLTAGKKALADGGITDDVMKELDKRKCGVLIGSGMGGMK LENDSIEALRISYKKMNPFCVPFATTNMGSAMLAMDLGWMGPNYSISTACATSNFCI LNASNHIVRGEADMMLCGGSDSVTVPLGVGGFVACRALSQRNNDPTKASRPWDSN RDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTSDAYHMTEPHPEGAGVILC IEKALAQSGVSREDVNYINAHATSTPAGDIKEYQALARCFGQNSELRVNSTKSMIGH LLGAAGGVEAVAVIQAIRTGWIHPNINLEDPDEAVDPKLLVGPKKEKLKVKVALSNS FGFGGHNSSILFAPCN
SEQ ID NO: 11 Amino acid sequence of the C. procumbens KASIV (D3290, pSZ4456). The algal transit peptide is underlined.
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTFQCLVTSHNDPCNQYCSSAS LSFLGDNGFGSKPFRSNRGHRRLGRASHSGEAMAVALQPAQEVATKKKPAMKQRR VVVTGMGVVTPLGHEPDVYYNNLLDGVSGISEIETFDCTQFPTRIAGEIKSFSTDGWV APKLSKRMDKFMLYLLTAGKKALADGGITDDVMKELDKRKCGVLIGSGMGGMKLF NDSIEALRVSYKKMNPFCVPFATTNMGSAMLAMDLGWMGPNYSISTACATSNFCIL NAANHIVRGEADMMLCGGSDAVIIPIGLGGFVACRALSQRNNDPTKASRPWDSNRD GFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPEGAGVILCIE KALAQSGVSREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHLL GAAGGVEAVTVIQAIRTGWIHPNLNLEDPDKAVDAKFLVGPKKERLNVKVGLSNSF GFGGHNSSILFAPCN
SEQ ID NO: 12 Amino acid sequence of the C. paucipetala KASIVa (D3291, pSZ4457). The algal transit peptide is underlined.
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTFQCLVNSHIDPCNQNVSSAS LSFLGDNGFGSNPFRSNRGHRRLGRASHSGEAMAVALQPAQEVATKKKPAIKQRRV VVTGMGVVTPLGHEPDVFYNNLLDGVSGISEIETFDCTQFPTRIAGEIKSFSTDGWVA PKLSKRMDKFMLYLLTAGKKALADAGITEDVMKELDKRKCGVLIGSGMGGMKLFN DSIEALRVSYKKMNPFCVPFATTNMGSAMLAMDLGWMGPNYSISTACATSNFCILN AANHIIRGEADMMLCGGSDAVIIPIGLGGFVACRALSQRNSDPTKASRPWDSNRDGF VMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPDGAGVILCIEKA LAQSGVSREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHLLGA AGGVEAVTVIQAIRTGWIHPNLNLEDPDEAVDAKFLVGPKKERLNVKVGLSNSFGFG GHNSSILFAPYN
SEQ ID NO: 13 Amino acid sequence of the C. painteri KASIV (D3292, pSZ4458). The algal transit peptide is underlined.
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTPQCLDPCNQHCFLGDNGFAS LIGSKPPRSNLGHLRLGRTSHSGEVMAVAQEVSTNKKHATKQRRVVVTGMGVVTPL GHDPDVYYNNLLEGVSGISEIENFDCSQLPTRIAGEIKSFSTDGLVAPKLSKRMDKFM LYILTAGKKALADGGITEDVMKELDKRKCGVLIGSGLGGMKVFSDSVEALRISYKKI SPFCVPFSTTNMGSAMLAMDLGWMGPNYSISTACATSNFCILNAANHITKGEADMM LCGGSDAAILPIGMGGFVACRALSQRNNDPTKASRPWDSNRDGFVMGEGAGVLLLE ELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPDGAGVILCIEKALAQSGVSREEVN YINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHLLGGAGGVEAVTVVQ AIRTGWIHPNINLEDPDKGVDAKLLVGPKKEKLKVKVGLSNSFGFGGHNSSILFAPCN
SEQ ID NO: 14 Amino acid sequence of the C. avigera KASIVa (D3293, pSZ4459). The algal transit peptide is underlined.
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTFQCLVTSYNDPCEQYRSSAS LSFLGDNGFASLFGSKPFRSNRGHRRLGRASHSGEAMAVALQPAQEVGTKKKPVIKQ RRVVVTGMGVVTPLGHEPDVYYNNLLDGVSGISEIETFDCTQFPTRIAGEIKSFSTDG WVAPKLSKRMDKFMLYLLTAGKKALADGGITDDVMKELDKRKCGVLIGSGLGGM KVFSESIEALRTSYKKISPFCVPFSTTNMGSAILAMDLGWMGPNYSISTACATSNFCIL NAANHITKGEADMMLCGGSDSVILPIGMGGFVACRALSQRNNDPTKASRPWDSNRD GFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPEGAGVILCIE KALAQSGVSREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHLL GGAGGVEAVTVVQAIRTGWIHPNINLDDPDEGVDAKLLVGPKKEKLKVKVGLSNSF GFGGHNSSILFAPCN
SEQ ID NO: 15 Amino acid sequence of the C. ignea KASIVa (D3294, pSZ4460). The algal transit peptide is underlined.
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTSQCLVTSYIDPCNKYCSSASL SFLGDNGFASLFGSKPFRSNRGHRRLGRASHSGEAMAVALQPAQEVTTKKKPVIKQR RVVVTGMGVVTPLGHEPDVYYNNLLDGVSGISEIETFDCTQFPTRIAGEIKSFSTDGW VAPKLSKRMDKFMLYLLTAGKKALADGGITDDVMKELDKRKCGVLIGSGMGGMK LFNDSIEALRISYKKMNPFCVPFATTNMGSAMLAMDLGWMGPNYSISTACATSNFCI LNASNHIVRGEADMMLCGGSDAVIIPIGLGGFVACRALSQRNNDPTKASRPWDSNRD GFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPEGAGVILCIE KALAQAGVSKEDVNYINAHATSTPAGDIKEYQALAQCFGQNSELRVNSTKSMIGHL LGAAGGVEAVTVVQAIRTGWIHPNLNLEDPDKAVDAKLLVGPKKERLNVKVGLSNS FGFGGHNSSILFAPYN
SEQ ID NO: 16 Amino acid sequence of the C. avigera KASIa (D3342, pSZ4511).
MQSLHSPALRASPLDPLRLKSSANGPSSTAAFRPLRRATLPNIRAASPTVSAPKRETDP KKRVVITGMGLVSVFGSDVDAYYEKLLSGESGISLIDRFDASKFPTRFGGQIRGFNAT GYIDGKNDRRLDDCLRYCIVAGKKALENSDLGGDSLSKIDKERAGVLVGTGMGGLT VFSDGVQNLIEKGHRKISPFFIPYAITNMGSALLAIDLGLMGPNYSISTACATSNYCFY AAANHIRRGEADLMIAGGTEAAIIPIGLGGFVACRALSQRNDDPQTASRPWDKDRDG FVMGEGAGVLVMESLEHAMKRGAPIIAEYLGGAVNCDAYHMTDPRADGLGVSSCIE SSLEDAGVSPEEVNYINAHATSTLAGDLAEINAIKKVFKNTKDIKINATKSMIGHCLG ASGGLEAIATIKGITTGWLHPSINQFNPEPSVEFDTVANKKQQHEVNVAISNSFGFGG HNSVVAFSAFKP
SEQ ID NO: 17 Amino acid sequence of the C. pulcherrima KASI (D3343, pSZ4512).
MHSLQSPSLRASPLDPFRPKSSTVRPLHRASIPNVRAASPTVSAPKRETDPKKRVVITG MGLVSVFGSDVDAYYDKLLSGESGIGPIDRFDASKFPTRFGGQIRGFNSMGYIDGKN DRRLDDCLRYCIVAGKKSLEDADLGADRLSKIDKERAGVLVGTGMGGLTVFSDGVQ SLIEKGHRKITPFFIPYAITNMGSALLAIELGLMGPNYSISTACATSNYCFHAAANHIRR GEADLMIAGGTEAAIIPIGLGGFVACRALSQRNDDPQTASRPWDKDRDGFVMGEGA GVLVLESLEHAMKRGAPIIAEYLGGAINCDAYHMTDPRADGLGVSSCIESSLEDAGV SPEEVNYINAHATSTLAGDLAEINAIKKVFKNTKDIKINATKSMIGHCLGASGGLEAI ATIKGINTGWLHPSINQFNPEPSVEFDTVANKKQQHEVNVAISNSFGFGGHNSVVAFS AFKP
SEQ ID NO: 18 Amino acid sequence of the C. avigera mitochondrial KAS (D3344, pSZ4513).
MVFLPWRKMLCPSQYRFLRPLSSSTTFDPRRVVVTGLGMVTPLGCGVNTTWKQLIE GKCGIRAISLEDLKMDAFDIDTQAYVFDQLTSKVAATVPTGVNPGEFNEDLWFNQKE HRAIARFIAYALCAADEALKDANWEPTEPEEREMTGVSIGGGTGSISDVLDAGRMIC EKKLRRLSPFFIPRILINMASGHVSMKYGFQGPNHAAVTACATGAHSIGDAARMIQF GDADVMVAGGTESSIDALSIAGFCRSRALTTKYNSCPQEASRPFDTDRDGFVIGEGSG VLVLEELDHARKRGAKMYAEFCGYGMSGDAHHITQPHSDGRGAILAMTRALKQSN LHPDQVDYVNAHATSTSLGDAIEAKAIKTVFSDHAMSGSLALSSTKGAIGHLLGAAG AVEAIFSILAIKNGLAPLTLNVARPDPVFTERFVPLTASKEMHVRAALSNSFGFGGTN TTLLFTSPPQN
SEQ ID NO: 19 Amino acid sequence of the C. avigera KASIII (D3345, pSZ4514).
MANAYGFVGSSVPTVGRAAQFQQMGSGFCSVDFISKRVFCCSAVQGADKPASGDSR AEYRTPRLVSRGCKLIGSGSAIPTLQVSNDDLAKIVDTNDEWISVRTGIRNRRVLTGK DSLTNLATEAARKALEMAQVDAEDVDMVLMCTSTPEDLFGSAPQIQKALGCKKNPL SYDITAACSGFVLGLVSAACHIRGGGFNNVLVIGADSLSRYVDWTDRGTCILFGDAA GAVLVQSCDAEEDGLFAFDLHSDGDGQRHLRAVITENETDHAVGTNGSVSDFPPRRS SYSCIQMNGKEVFRFACRSVPQSIELALGKAGLNGSNIDWLLLHQANQRIIDAVATRL EVPQERVISNLANYGNTSAASIPLALDEAVRGGKVKPGHLIATAGFGAGLTWGSAIV RWG SEQIDNO:20 HAEpitopeTAGaminoacidsequence TMYPYDVPDYA SEQIDNO:21 C.palustrisKASIVCDS ATGGCGGCCGCCGCTTCCATGGTTGCGTCCCCACTCTGTACGTGGCTCGTAGCCG CTTGCATGTCCACTTCCTTCGACAACGACCCACGTTCCCCGTCCATCAAGCGTCTC CCCCGCCGGAGGAGGACTCTCTCCCAATCCTCCCTCCGCGGCGGATCCACCTTCC AATGCCTCGTCACCTCATACATCGACCCTTGCAATCAGTTCTCCTCCTCCGCCTCC CTTAGCTTCCTCGGGGATAACGGATTCGCATCCCTTTTCGGATCCAAGCCTTTCCG GTCCAATCGCGGCCACCGGAGGCTCGGCCGTGCTTCCCATTCCGGGGAGGCCATG GCCGTGGCTTTGGAACCTGCACAGGAAGTCGCCACGAAGAAGAAACCTCTTGTC AAGCAAAGGCGAGTAGTTGTTACAGGAATGGGCGTGGTGACTCCTCTAGGCCAT GAACCTGATGTTTACTACAACAATCTCCTAGATGGAGTAAGCGGCATAAGTGAG ATAGAGGCCTTCGACTGCACTCAGTTTCCCACGAGAATTGCCGGAGAGATCAAGT CTTTTTCCACAGATGGATGGGTGGCCCCAAAGCTCTCCAAGAGGATGGACAAGTT CATGCTTTACTTGTTGACTGCTGGCAAGAAAGCATTAGCGGATGGTGGAATCACC GATGATGTGATGAAAGAGCTTGATAAAAGAAAGTGTGGAGTTCTCATTGGCTCC GGATTGGGCGGCATGAAGCTGTTCAGTGATTCCATTGAAGCTCTGAGGATTTCAT ATAAGAAGATGAATCCCTTTTGTGTACCTTTTGCTACTACAAATATGGGATCAGC TATGCTTGCAATGGACTTGGGATGGATGGGTCCTAACTACTCGATATCAACTGCC TGTGCTACAAGTAATTTCTGTATACTGAATTCTGCAAATCACATAGTCAGAGGCG AAGCTGACATGATGCTTTGTGGTGGCTCGGATGCGGTCATTATACCTATTGGTTT GGGAGGTTTTGTGGCGTGCCGAGCTTTGTCACAGAGGAATAATGACCCTACCAA AGCTTCGAGACCATGGGACAGTAATCGTGATGGATTTGTAATGGGCGAAGGAGC TGGAGTGTTACTTCTCGAGGAGTTAGAGCATGCAAAGAAAAGAGGTGCCACCAT TTATGCGGAATTTTTAGGGGGCAGTTTCACTTGCGATGCCTACCATATGACCGAG CCTCACCCTGAAGGTGCTGGAGTGATCCTCTGCATAGAGAAGGCCTTGGCTCAGG CCGGAGTCTCTAGAGAAGACGTAAATTACATAAATGCGCATGCAACTTCCACTCC TGCTGGAGATATCAAGGAATACCAAGCTCTCGCACACTGCTTCGGCCAAAACAG TGAGCTGAGAGTGAATTCCACTAAATCGATGATCGGTCATCTTATTGGAGCAGCT GGTGGTGTAGAAGCAGTTACCGTAGTTCAGGCGATAAGGACTGGGTGGATCCAT 
CCAAATCTTAATTTGGAGGACCCGGACAAAGCCGTGGATGCAAAAGTGCTCGTA GGACCTAAGAAGGAGAGACTAAATGTCAAGGTCGGTTTGTCCAATTCATTTGGGT TCGGTGGTCATAACTCGTCCATACTCTTCGCCCCTTACAATTAG SEQIDNO:22 C.camphoraKASIVCDS ATGGCAATGATGGCAGGTTCTTGTTCCAATTTGGTGATTGGAAACAGAGAATTGG GTGGGAATGGGCCTTCTTTGCTTCACTACAATGGCCTCAGACCATTGGAAAATAT TCAAACAGCCTCAGCTGTGAAAAAGCCAAATGGGTTATTTGCATCTTCTACAGCT CGAAAATCCAAAGCTGTCAGAGCCATGGTATTGCCCACTGTAACAGCTCCAAAA CGCGAAAAAGATCCCAAGAAGCGGATTGTAATAACAGGAATGGGCCTGGTTTCC GTCTTTGGAAATGACATTGATACATTTTATAGTAAACTACTGGAAGGAGAGAGCG GGATTGGCCCAATCGACAGATTTGATGCTTCTTCCTTCTCAGTGAGATTTGCTGGT CAGATTCACAATTTCTCATCCAAAGGATACATTGATGGGAAGAATGATCGTCGGC TAGATGACTGCTGGAGGTATTGCCTTGTGGCTGGAAGAAGAGCCCTTGAAGATG CCAATCTTGGACCAGAGGTATTGGAAAAAATGGACCGATCTCGAATAGGGGTGC TGATAGGGACAGGAATGGGTGGGTTGTCAGCCTTTAGCAATGGAGTTGAGTCTCT GATCCAGAAGGGCTACAAGAAAATCACTCCATTTTTTATTCCTTACTCCATCACC AATATGGGCTCTGCTCTTTTAGCAATCGACACGGGCGTAATGGGACCAAACTACT CCATTTCAACAGCATGTGCAACCGCAAACTATTGCTTCCATGCTGCTGCAAATCA TATAAGAAGGGGTGAAGCTGAAATCATGGTGACTGGAGGGACAGAGGCAGCAG TCTCAGCTACTGGAGTTGGCGGATTCATAGCATGTAGAGCCTTATCGCACAGGAA TGATGAGCCCCAGACGGCCTCGAGACCATGGGATAAAGATCGGGATGGTTTCGT CATGGGCGAAGGCGCTGGTGTGCTGGTGATGGAGAGCTTGCATCATGCAAGAAA GAGAGGAGCAAACATAATTGCAGAGTATTTAGGAGGAGCAGTAACATGTGATGC ACATCACATGACAGATCCTCGAGCTGATGGTCTCGGGGTTTCTTCTTGCATAACC AAGAGCTTAGAAGATGCAGGAGTCTCCCCAGAAGAGGTGAACTATGTGAATGCT CATGCAACATCAACACTTGCAGGAGATTTAGCAGAGGTTAATGCCATAAAGAAG GTCTTCAAGGACACATCTGAAATGAAAATGAATGGAACTAAGTCAATGATTGGA CACTGTCTTGGAGCAGCTGGTGGATTAGAAGCCATTGCGACCATCAAAGCTATCA ATACTGGCTGGCTACATCCAACCATCAATCAATTTAACATAGAACCAGCGGTAAC TATCGACACGGTCCCAAATGTGAAGAAAAAGCATGATATCCATGTTGGCATCTCT AACTCATTTGGCTTTGGTGGGCACAACTCGGTGGTCGTTTTTGCTCCCTTCATGCC ATGA SEQIDNO:23 C.camphoraKASICDS ATGCAAATCCTCCAAACCCCATCATCATCATCGTCTTCTCTCCGCATGTCGTCCAT GGAATCTCTCTCTCTCACCCCTAAATCTCTCCCTCTCAAAACCCTTCTTCCCCTTC GTCCTCGCCCTAAAAACCTCTCCAGACGCAAATCCCAAAACCCTAGACCCATCTC CTCCTCTTCCTCCCCCGAGAGAGAGACGGATCCCAAGAAGCGAGTCGTCATCACC 
GGGATGGGCCTCGTCTCCGTCTTCGGCAACGATGTCGATGCCTACTACGACCGCC TCCTCTCGGGAGAGAGCGGCATCGCCCCCATCGATCGCTTCGACGCCTCCAAGTT CCCCACCAGATTCGCCGGTCAGATCCGAGGGTTCACCTCCGACGGCTACATTGAC GGGAAGAACGACCGCCGGTTAGACGATTGTCTCAGATACTGTATTGTTAGTGGG AAGAAGGCGCTCGAGAATGCCGGCCTCGGACCCCATCTCATGGACGGAAAGATT GACAAGGAGAGAGCTGGTGTGCTTGTCGGGACAGGCATGGGTGGTCTTACAGTT TTCTCTAATGGGGTCCAGACTCTACATGAGAAAGGTTACAGGAAAATGACTCCGT TTTTCATCCCTTATGCCATAACAAACATGGGTTCTGCCTTGCTTGCAATTGAACTT GGTTTTATGGGCCCAAACTATTCTATCTCAACTGCATGTGCTACCTCCAATTATTG CTTTTATGCTGCTGCTAACCATATACGGAGAGGTGAGGCTGATCTGATGCTTGCT GGTGGAACTGAAGCTGCAATTATTCCTATTGGATTAGGAGGCTTTGTTGCATGTA GAGCTTTATCACAGAGAAATGATGACCCCCAGACAGCTTCAAGACCATGGGACA AAGATCGAGACGGTTTTGTTATGGGTGAAGGTGCTGGAGTATTGGTAATGGAGA GCTTGGAGCATGCTATGAAACGTGATGCACCAATTATTGCTGAGTATTTAGGAGG TGCAGTGAACTGTGATGCGTATCATATGACGGATCCTAGAGCTGATGGGCTCGGG GTTTCAACATGCATAGAAAGAAGTCTTGAAGATGCTGGTGTGGCACCTGAAGAG GTTAACTACATAAATGCACATGCAACTTCCACTCTTGCAGGAGACCTGGCTGAGG TGAATGCGATCAAAAAGGTTTTTACAAACACTTCAGAGATCAAAATCAATGCAA CCAAGTCTATGATAGGGCACTGCCTTGGAGCGGCCGGGGGGTTAGAAGCCATTG CCACAATCAAAGCAATAAATACTGGTTGGCTGCACCCTTCTATAAACCAATTTAA TCCAGAGCCCTCTGTTGAGTTTGACACTGTAGCAAATAAAAAGCAGCAGCATGA AGTGAATGTTGCCATTTCCAACTCTTTCGGGTTTGGCGGACACAACTCAGTCGTG GTGTTTTCGGCATTCAAGCCTTGA SEQIDNO:24 UmbellulariacalifornicaKASICDS ATGGAATCTCTCTCTCTCACCCCTAAATCTCTCCCTCTCAAAACCCTTCTTCCCTTT CGTCCTCGCCCTAAAAACCTCTCCAGACGCAAATCCCAAAACCCTAAACCCATCT CCTCCTCTTCCTCCCCGGAGAGAGAGACGGATCCCAAGAAGCGAGTCGTCATCAC CGGGATGGGCCTCGTCTCCGTCTTCGGCAACGACGTCGATGCCTACTACGACCGC CTCCTCTCCGGAGAGAGCGGCATCGCCCCCATCGATCGCTTCGACGCCTCCAAGT TCCCCACCAGATTCGCCGGTCAGATCCGAGGGTTCACCTCCGACGGCTACATTGA CGGGAAGAACGACCGCCGGTTAGACGATTGTCTCAGATACTGTATCGTTAGTGG GAAGAAGGCGCTCGAGAATGCCGGCCTCGGACCCGATCTCATGGACGGAAAGAT TGACAAGGAGCGAGCTGGTGTGCTTGTCGGGACAGGCATGGGTGGTCTTACAGT TTTCTCTAATGGGGTTCAGACTCTCCATGAGAAAGGTTACAGGAAAATGACTCCG TTTTTCATCCCTTATGCCATAACAAACATGGGTTCTGCCTTGCTTGCAATTGACCT TGGTTTTATGGGCCCAAACTATTCTATCTCAACTGCATGTGCTACCTCCAATTATT 
GCTTTTATGCTGCTGCTAACCATATACGGAGAGGTGAGGCTGATGTGATGCTTGC TGGTGGAACTGAAGCTGCAATTATTCCTATTGGCTTAGGAGGCTTTGTTGCATGT AGAGCTTTATCACAGCGAAATGATGACCCCCAGACAGCTTCAAGACCATGGGAC AAAGATCGAGACGGTTTTGTTATGGGTGAAGGTGCTGGAGTATTGGTAATGGAG AGCTTGGAGCATGCTATGAAACGTGATGCACCAATTATTGCTGAGTATTTAGGAG GTGCAGTGAACTGTGATGCGTATCATATGACGGATCCTAGAGCTGATGGGCTCGG GGTTTCAACATGCATAGAAAGAAGTCTTGAAGATGCTGGTGTGGCACCTGAAGA GGTTAACTACATAAATGCACATGCAACTTCCACACTTGCAGGTGACCTGGCCGAG GTGAATGCCATCAAAAAGGTTTTTACAAACACTTCAGAGATCAAAATCAATGCA ACCAAGTCTATGATAGGGCACTGCCTTGGAGCGGCCGGGGGTTTAGAAGCCATT GCCACAATCAAAGCAATAAATACTGGTTGGCTGCACCCTTCTATAAACCAATTTA ATCCAGAGCCCTCTGTTGAGTTTGACACTGTAGCAAATAAAAAGCAGCAGCATG AAGTGAATGTTGCCATTTCCAACTCTTTCGGGTTTGGTGGACACAACTCGGTCGT GGTGTTTTCGGCATTCAAGCCTTGA SEQIDNO:25 UmbellulariacalifornicaKASIVCDS ATGACGCAAACCCTCATCTGCCCATCCTCCATGGAAACCCTCTCTCTTACCAAAC AATCCCATTTCAGACTCAGGCTACCCACTCCTCCTCACATCAGACGCGGCGGCGG CCATCGCCATCCTCCTCCCTTCATCTCCGCCTCCGCCGCCCCTAGGAGAGAGACC GATCCGAAGAAGAGAGTCGTCATCACGGGAATGGGCCTCGTCTCCGTCTTCGGC ACCAACGTCGATGTCTACTACGATCGCCTCCTCGCCGGCGAGAGCGGCGTTGGCA CTATCGATCGCTTCGACGCGTCGATGTTCCCGACGAGATTCGGCGGCCAGATCCG GAGGTTCACGTCGGAGGGGTACATCGACGGGAAGAACGACCGGCGGCTGGATGA CTACCTCCGGTACTGCCTCGTCAGCGGGAAGAAGGCGATCGAGAGTGCTGGCTTC GATCTCCATAACATCACCAACAAGATTGACAAGGAGCGAGCTGGGATACTTGTT GGGTCAGGCATGGGCGGTCTTAAAGTTTTCTCTGATGGTGTTGAGTCTCTTATCG AGAAAGGTTACAGGAAAATAAGTCCATTTTTCATCCCTTATATGATACCAAACAT GGGTTCTGCTTTGCTTGGAATTGACCTTGGTTTCATGGGACCAAACTACTCAATTT CAACTGCTTGTGCTACGTCAAATTATTGCATTTATGCTGCTGCAAATCATATCCGA CAAGGTGATGCCGACCTAATGGTTGCTGGTGGAACTGAGGCTCCAATTATTCCAA TTGGCTTAGGGGGCTTTGTAGCATGTAGAGCTTTGTCAACAAGAAATGATGATCC CCAGACAGCTTCAAGGCCATGGGACATAGACCGAGATGGTTTTGTTATGGGCGA AGGAGCTGGAATATTGGTATTGGAGAGCTTGGAACATGCAATGAAACGTGATGC ACCAATTCTTGCTGAGTATTTAGGAGGTGCAGTTAACTGTGATGCTCATCATATG ACAGATCCTCGAGCTGATGGGCTTGGGGTTTCAACATGCATTGAAAGCAGTCTTG AAGATGCCGGCGTGGCAGCAGAAGAGGTTAACTATATAAATGCACACGCGACTT CAACACCTACAGGTGACCTGGCTGAGATGAAGGCTATAAAAAATGTATTTAGGA 
ACACTTCTGAGATCAAAATCAATGCAACCAAGTCTATGATTGGGCATTGCCTTGG AGCGTCTGGGGGGCTAGAAGCCATTGCCACATTGAAAGCGATTACAACTGGTTG GCTTCATCCAACTATAAACCAATTTAATCCAGAGCCTTCTGTTGACTTTGATACG GTGGCAAAGAAAAAGAAGCAGCATGAAGTTAATGTTGCCATTTCAAACTCTTTTG GATTCGGAGGACACAACTCAGTGTTGGTGTTTTCGGCATTCAAGCCTTGA SEQIDNO:26 C.wrightiiKASAICDS(D3153,pSZ4379) atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggtacgtattccagtgcctggtggccagctgcatcgacccctgcgaccagtaccgcagcagcgccagcctgagctt cctgggcgacaacggcttcgccagcctgttcggcagcaagcccttcatgagcaaccgcggccaccgccgcctgcgccgcgccagc cacagcggcgaggccatggccgtggccctgcagcccgcccaggaggccggcaccaagaagaagcccgtgatcaagcagcgcc gcgtggtggtgaccggcatgggcgtggtgacccccctgggccacgagcccgacgtgttctacaacaacctgctggacggcgtgag cggcatcagcgagatcgagaccttcgactgcacccagttccccacccgcatcgccggcgagatcaagagcttcagcaccgacggct gggtggcccccaagctgagcaagcgcatggacaagttcatgctgtacctgctgaccgccggcaagaaggccctggccgacggcgg catcaccgacgaggtgatgaaggagctggacaagcgcaagtgcggcgtgctgatcggcagcggcatgggcggcatgaaggtgttc aacgacgccatcgaggccctgcgcgtgagctacaagaagatgaaccccttctgcgtgcccttcgccaccaccaacatgggcagcgc catgctggccatggacctgggctggatgggccccaactacagcatcagcaccgcctgcgccaccagcaacttctgcatcctgaacgc cgccaaccacatcatccgcggcgaggccgacatgatgctgtgcggcggcagcgacgccgtgatcatccccatcggcctgggcggc ttcgtggcctgccgcgccctgagccagcgcaacagcgaccccaccaaggccagccgcccctgggacagcaaccgcgacggcttc gtgatgggcgagggcgccggcgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccaccatctacgccgagttc ctgggcggcagcttcacctgcgacgcctaccacatgaccgagccccaccccgagggcgccggcgtgatcctgtgcatcgagaagg ccctggcccaggccggcgtgagcaaggaggacgtgaactacatcaacgcccacgccaccagcaccagcgccggcgacatcaag gagtaccaggccctggcccgctgcttcggccagaacagcgagctgcgcgtgaacagcaccaagagcatgatcggccacctgctgg gcgccgccggcggcgtggaggccgtgaccgtggtgcaggccatccgcaccggctggattcaccccaacctgaacctggaggacc ccgacaaggccgtggacgccaagctgctggtgggccccaagaaggagcgcctgaacgtgaaggtgggcctgagcaacagcttcg gcttcggcggccacaacagcagcatcctgttcgccccctgcaacgtgtga SEQIDNO:27 C.avigeraKASIVbCDS 
ATGGCGGCCGCTTCTTGCATGGCTGCGTCCCCTTTCTGTACGTCGCTCGTGGCTGC ATGCATGTCGACTTCATCCGACAACGACCCATGTCCCCTTTCCCGCCGCGGATCC ACCTTCCAATGCTACATCGGGGATAACGGATTCGGATCGAAGCCTCCCCGTTCAA ATCGTGGCCACCTGAGGCTCGGCCGCACTTCACATTCCGGAGAGGTGATGGCTGT GGCTATGCAATCTGCACAAGAAGTCTCCACAAAGGAGAAACCTGCTACCAAGCA AAGGCGAGTTGTTGTCACGGGTATGGGTGTGGTGACTGCTCTAGGCCATGACCCC GATGTTTACTACAACAATCTCCTAGACGGAGTAAGCGGCATAAGCGAGATAGAA AACTTTGACTGTTCTCAGCTTCCCACGAGAATTGCCGGAGAGATCAAGTCTTTTT CTGCAGATGGGTGGGTGGCCCCGAAGTTCTCCAGGAGGATGGACAAGTTTATGC TTTACATTCTGACTGCAGGCAAGAAAGCATTAGTAGATGGTGGAATCACTGAAG ATGTGATGAAAGAGCTCGATAAAAGAAAGTGTGGAGTTCTCATTGGCTCCGGAT TGGGCGGTATGAAGGTATTTAGCGAGTCCATTGAAGCTCTGAGGACTTCATATAA GAAGATCAGTCCCTTTTGTGTACCTTTTTCTACCACGAATATGGGATCCGCTATTC TTGCAATGGACTTGGGATGGATGGGCCCTAACTATTCGATATCGACTGCCTGTGC AACAAGTAACTTCTGTATACTGAATGCTGCGAACCACATAACCAAAGGCGAAGC AGACATGATGCTTTGTGGTGGCTCGGATTCGGTCATTTTACCTATTGGTATGGGA GGTTTCGTAGCATGCCGAGCTTTGTCACAGAGGAATAATGACCCTACCAAAGCTT CGAGACCATGGGACAGTAATCGTGATGGATTTGTGATGGGAGAAGGTGCTGGAG TTTTACTTCTCGAGGAGTTAGAGCATGCAAAGAAAAGAGGCGCAACCATTTATGC GGAATTTCTTGGTGGGAGTTTCACTTGCGATGCCTACCACATGACCGAGCCTCAC CCTGAAGGAGCTGGAGTGATCCTCTGCATAGAGAAGGCCTTGGCTCAGTCCGGA GTCTCGAGGGAAGACGTAAATTACATAAATGCGCATGCAACTTCCACTCCCGCTG GAGATATCAAAGAATACCAAGCTCTCGCCCACTGTTTCGGCCAAAACAGTGAGTT AAGAGTGAATTCCACCAAGTCGATGATCGGTCACCTTCTTGGAGGAGCCGGTGG CGTAGAAGCAGTTACAGTCGTTCAGGCAATAAGGACTGGATGGATCCATCCAAA TATTAATTTGGACGACCCGGACGAAGGCGTGGATGCAAAACTGCTCGTCGGCCCT AAGAAGGAGAAACTGAAGGTCAAGGTCGGTTTGTCCAATTCATTCGGGTTCGGC GGCCATAACTCATCCATACTCTTTGCCCCATGCAATTAG SEQIDNO:28 C.paucipetalaKASIVbCDS ATGGCGGCCGCTTCATCAATGGTTGCCTCCCCATTCTCTACGTCCCTCGTAGCCGC CTGCATGTCCACTTCATTCGACAACGACCCACGTTCCCTTTCCCACAACCGCATCC GCCTCCGCGGATCCACCTTCCAATGCCTCGGGGATATCGGATTCGCTTCCCTCAT CGGATCCAAGCCTCCGCGTTCAAATCGCAACCACCGGAGGCTCGGCCGCACTTCC CATTCCGGGGAGGTCATGGCTGTGGCTATGCAACCTGCACATGAAGCTTCCACAA AGAATAAACCTGTTACCAAGCAAAGGCGAGTAGTTGTGACAGGTATGGGCGTGG CGACTCCTCTAGGCCATGACCCCGATGTTTACTACAACAATCTCCTAGACGGAGT 
AAGTGGCATAAGTCAGATAGAGAACTTCGACTGCACTCAGTTTCCCACGAGAATT GCCGGAGAGATCAAGTCTTTCTCCACAGAAGGGTATGTGATCCCGAAGTTCGCCA AGAGGATGGACAAGTTCATGCTTTACTTGCTGACTGCAGGCAAGAAAGCATTAG AAGATGGTGGAATCACTGAAGATGTGATGAAAGAGCTCGATAAAAGAAAGTGTG GAGTTCTCATTGGCTCCGGAATGGGCGGTATGAAGATAATCAACGATTCCATTGC AGCTCTGAATGTTTCATATAAGAAGATGACTCCCTTTTGTGTACCCTTTTCCACCA CAAATATGGGATCCGCTATGCTTGCGATAGACTTGGGATGGATGGGCCCGAACT ATTCGATATCAACTGCCTGTGCAACAAGTAACTACTGTATACTGAATGCTGCGAA CCACATAGTCAGAGGCGAAGCAGATATGATGCTTTGTGGTGGCTCGGATGCGGT CATTATACCTGTTGGTTTGGGAGGTTTCGTAGCATGCCGAGCTTTGTCACAGAGG AACAATGACCCTACCAAAGCTTCGAGACCTTGGGACAGTAACCGTGATGGATTT GTGATGGGAGAAGGAGCCGGAGTGTTACTTCTCGAGGAGTTAGAGCATGCAAAG AAAAGAGGTGCAACCATTTATGCGGAATTTCTAGGTGGGAGTTTCACTTGCGATG CCTACCACATGACCGAGCCTCACCCTGATGGAGCTGGAGTGATCCTCTGCATAGA GAAGGCTTTGGCACAGTCCGGAGTCTCGAGGGAAGACGTCAATTACATAAATGC GCATGCAACTTCTACTCCTGCTGGAGATATCAAGGAATACCAAGCTCTCGCCCAC TGTTTCGGCCAAAACAGTGAGTTAAGAGTGAATTCCACCAAATCGATGATCGGTC ACCTTCTTGGAGCTGCTGGTGGCGTAGAAGCAGTTACAGTAGTTCAGGCAATAAG GACTGGGTGGATCCATCCAAATATTAATTTGGAAAACCCGGACGAAGCTGTGGA TGCAAAATTGCTCGTCGGCCCTAAGAAGGAGAAACTGAAGGTCAAGGTCGGTTT GTCCAATTCATTTGGGTTCGGTGGGCATAACTCATCCATACTCTTCGCCCCTTACA ATTAG SEQIDNO:29 C.igneaKASIVbCDS ATGGCGGCGGCCGCTTCCATGTTTACGTCCCCACTCTGTACGTGGCTCGTAGCCT CTTGCATGTCGACTTCCTTCGACAACGACCCACGTTCGCCGTCCGTCAAGCGTCT CCCCCGCCGGAGGAGGATTCTCTCCCAATGCTCCCTCCGCGGATCCACCTCCCAA TGCCTCGTCACCTCATACATCGACCCTTGCAATAAGTACTGCTCCTCCGCCTCCCT TAGCTTCCTCGGGGATAACGGATTCGCATCCCTTTTCGGATCTAAGCCATTCCGG TCCAATCGCGGCCACCGGAGGCTCGGCCGTGCTTCCCATTCCGGGGAGGCCATGG CTGTGGCTCTGCAACCTGCACAGGAAGTCACCACGAAGAAGAAACCTGTGATCA AGCAAAGGCGAGTAGTTGTTACAGGAATGGGCGTGGTGACTCCTCTAGGCCATG AACCTGATGTTTACTACAACAATCTCCTAGATGGAGTAAGCGGCATAAGTGAGAT AGAGACCTTCGACTGCACTCAGTTTCCCACGAGAATCGCCGGAGAGATCAAGTCT TTTTCCACAGATGGGTGGGTGGCCCCAAAGCTCTCCAAGAGGATGGACAAGTTC ATGCTTTACTTGTTGACTGCTGGCAAGAAAGCATTAGCAGATGGTGGAATCACCG ATGATGTGATGAAAGAGCTTGATAAAAGAAAGTGTGGGGTTCTCATTGGCTCTG GAATGGGCGGCATGAAGTTGTTCAACGATTCCATTGAAGCTCTGAGGATTTCATA 
TAAAAAGATGAATCCCTTTTGTGTACCTTTTGCTACCACAAATATGGGATCAGCT ATGCTTGCAATGGACTTGGGATGGATGGGTCCTAACTACTCGATATCAACTGCCT GTGCAACAAGTAATTTCTGTATACTGAATGCTTCAAACCACATAGTCAGAGGCGA AGCTGACATGATGCTTTGTGGTGGCTCGGATTCTGTCACTGTACCTTTAGGTGTG GGAGGTTTCGTAGCATGCCGAGCTTTGTCACAGAGGAATAATGACCCTACCAAA GCTTCGAGACCTTGGGACAGTAATCGGGATGGATTTGTGATGGGAGAAGGAGCT GGAGTGTTACTTCTTGAGGAGTTAGAGCATGCAAAGAAAAGAGGTGCAACCATT TATGCGGAATTTCTCGGTGGGAGCTTTACTTCTGATGCCTACCACATGACCGAGC CTCACCCCGAAGGAGCTGGAGTGATTCTCTGCATTGAGAAGGCCTTGGCTCAGTC CGGAGTCTCGAGGGAAGACGTGAATTATATAAATGCGCATGCAACTTCCACTCCT GCTGGTGATATAAAGGAATACCAAGCTCTCGCCCGCTGTTTCGGCCAAAACAGTG AGTTAAGAGTGAATTCCACCAAATCGATGATCGGTCACCTTCTTGGAGCAGCTGG TGGCGTAGAAGCAGTTGCAGTAATTCAGGCAATAAGGACTGGATGGATCCATCC AAATATTAATTTGGAAGACCCCGACGAAGCCGTGGATCCAAAATTGCTCGTCGG CCCTAAGAAGGAGAAACTGAAGGTCAAGGTAGCTTTGTCCAATTCATTCGGGTTC GGCGGGCATAACTCATCCATACTCTTTGCCCCTTGCAATTAG SEQIDNO:30 C.procumbensKASIVCDS ATGGCGGCGGCGCCCTCTTCCCCACTCTGTACGTGGCTCGTAGCCGCTTGCATGT CCACTTCCTTCGACAACAACCCACGTTCGCCCTCCATCAAGCGTCTCCCCCGCCG GAGGAGGGTTCTCTCCCAATGCTCCCTCCGTGGATCCACCTTCCAATGCCTCGTC ACCTCACACAACGACCCTTGCAATCAGTACTGCTCCTCCGCCTCCCTTAGCTTCCT CGGGGATAACGGATTCGGATCCAAGCCATTCCGGTCCAATCGCGGCCACCGGAG GCTCGGCCGTGCTTCGCATTCCGGGGAGGCCATGGCTGTGGCCTTGCAACCTGCA CAGGAAGTCGCCACGAAGAAGAAACCTGCTATGAAGCAAAGGCGAGTAGTTGTT ACAGGAATGGGCGTGGTGACTCCTCTGGGCCATGAACCTGATGTTTACTACAACA ATCTCCTAGATGGAGTAAGCGGCATAAGTGAGATAGAGACCTTCGACTGCACTC AGTTTCCCACGAGAATCGCCGGAGAGATCAAGTCTTTTTCCACAGATGGATGGGT GGCCCCAAAGCTCTCCAAGAGGATGGACAAGTTCATGCTTTACTTGTTGACTGCT GGCAAGAAAGCATTAGCAGATGGTGGAATCACTGATGATGTGATGAAAGAGCTT GATAAAAGAAAGTGTGGAGTTCTCATTGGCTCTGGAATGGGCGGCATGAAGTTG TTCAACGATTCCATTGAAGCTCTGAGAGTTTCATATAAGAAGATGAATCCCTTTT GTGTACCTTTTGCTACCACAAATATGGGATCAGCTATGCTTGCAATGGACTTGGG ATGGATGGGTCCTAACTACTCGATATCAACTGCCTGTGCAACAAGTAATTTCTGT ATACTGAATGCTGCAAACCACATAGTCAGAGGCGAAGCTGACATGATGCTTTGT GGTGGCTCGGATGCGGTCATTATACCTATTGGTTTGGGAGGTTTTGTGGCGTGCC GAGCTTTGTCACAGAGGAATAATGACCCTACCAAGGCTTCGAGACCATGGGATA 
GTAATCGTGATGGATTTGTAATGGGCGAAGGAGCTGGAGTGTTACTTCTCGAGGA GTTAGAGCATGCAAAGAAAAGAGGTGCAACCATTTATGCGGAATTTTTAGGGGG CAGTTTCACTTGCGATGCCTACCATATGACCGAGCCTCACCCTGAAGGAGCTGGA GTGATCCTCTGCATAGAGAAGGCCTTGGCTCAGTCCGGAGTCTCTAGAGAAGAC GTAAATTACATAAATGCGCATGCAACTTCCACTCCTGCTGGAGATATCAAAGAAT ACCAAGCTCTCGCCCACTGTTTCGGCCAAAACAGTGAGCTGAGAGTGAATTCCAC TAAATCGATGATCGGTCATCTTCTTGGAGCAGCTGGTGGTGTAGAAGCAGTTACC GTAATTCAGGCGATAAGGACTGGGTGGATCCATCCAAATCTTAATTTGGAAGACC CGGACAAAGCCGTGGATGCAAAATTTCTCGTGGGACCTAAGAAGGAGAGACTGA ATGTCAAGGTCGGTTTGTCCAATTCATTTGGGTTCGGGGGGCATAACTCATCCAT ACTCTTTGCCCCTTGCAATTAG SEQIDNO:31 C.paucipetalaKASIVaCDS ATGGCGGCGGCGGCCTCTTCCCCACTCTGCACATGGCTCGTAGCCGCTTGCATGT CCACTTCATTCGACAACAACCCACGTTCGCCCTCCATCAAGCGTCTCCCCCGCCG GAGGAGGGTTCTCTCCCAATGCTCCCTCCGCGGATCCACCTTCCAATGCCTCGTC AACTCACACATCGACCCTTGCAATCAGAACGTCTCCTCCGCCTCCCTTAGCTTCCT CGGGGATAACGGATTCGGATCCAATCCATTCCGGTCCAATCGCGGCCACCGGAG GCTCGGCCGGGCTTCCCATTCCGGGGAGGCCATGGCTGTTGCTCTGCAACCTGCA CAGGAAGTCGCCACGAAGAAGAAACCTGCTATCAAGCAAAGGCGAGTAGTTGTT ACAGGAATGGGCGTGGTGACTCCTCTAGGCCATGAGCCTGATGTTTTCTACAACA ATCTCCTAGATGGAGTAAGCGGCATAAGTGAGATAGAGACCTTCGACTGCACTC AGTTTCCCACGAGAATTGCCGGAGAGATCAAGTCTTTTTCCACAGATGGGTGGGT GGCCCCAAAGCTCTCCAAGAGGATGGACAAGTTCATGCTTTACTTGTTGACTGCT GGCAAGAAAGCATTAGCAGATGCTGGAATTACCGAGGATGTGATGAAAGAGCTT GATAAAAGAAAGTGTGGAGTTCTCATTGGCTCCGGAATGGGCGGCATGAAGTTG TTCAACGATTCCATTGAAGCTCTGAGGGTTTCATATAAGAAGATGAATCCCTTTT GTGTACCTTTTGCTACCACAAATATGGGATCAGCTATGCTTGCAATGGACTTGGG ATGGATGGGTCCTAACTACTCGATATCGACTGCCTGTGCAACAAGTAATTTCTGT ATACTGAATGCTGCAAACCACATAATCAGAGGCGAAGCTGACATGATGCTTTGT GGTGGTTCGGATGCGGTCATTATACCTATTGGTTTGGGAGGTTTTGTGGCGTGCC GAGCTTTGTCACAGAGGAATAGTGACCCTACCAAAGCTTCGAGACCATGGGATA GTAATCGTGATGGATTTGTAATGGGCGAAGGAGCTGGAGTGTTACTTCTCGAGGA GTTAGAGCATGCAAAGAAAAGAGGTGCAACCATTTATGCGGAATTTTTAGGGGG CAGCTTCACTTGCGATGCCTACCACATGACCGAGCCTCACCCTGATGGAGCTGGA GTGATCCTCTGCATAGAGAAGGCTTTGGCACAGTCCGGAGTCTCGAGGGAAGAC GTCAATTACATAAATGCGCATGCAACTTCTACTCCTGCTGGAGATATCAAGGAAT 
ACCAAGCTCTCGCCCACTGTTTCGGCCAAAACAGTGAGCTGAGAGTGAATTCCAC TAAATCGATGATCGGTCATCTTCTTGGTGCAGCTGGTGGTGTAGAAGCTGTTACT GTAATTCAGGCGATAAGGACTGGGTGGATTCATCCAAATCTTAATTTGGAAGACC CGGACGAAGCCGTGGATGCAAAATTTCTCGTGGGACCTAAGAAGGAGAGATTGA ATGTCAAGGTCGGTTTGTCCAATTCATTTGGGTTCGGTGGGCATAACTCATCCAT ACTCTTCGCCCCTTACAATTAG SEQIDNO:32 C.painteriKASIVCDS ATGGCGGCCTCCTCTTGCATGGTTGCGTCCCCGTTCTGTACGTGGCTCGTATCCGC ATGCATGTCTACTTCATTCGACAACGACCCACGTTCCCTTTCCCACAAGCGGCTC CGCCTCTCCCGTCGCCGGAGGCCTCTCTCCTCTCATTGCTCCCTCCGCGGATCCAC TCCCCAATGCCTCGACCCTTGCAATCAGCACTGCTTCCTCGGGGATAACGGATTC GCTTCCCTCATCGGATCCAAGCCTCCCCGTTCCAATCTCGGCCACCTGAGGCTCG GCCGCACTTCCCATTCCGGGGAGGTCATGGCTGTGGCACAGGAAGTCTCCACAA ATAAGAAACATGCTACCAAGCAAAGGCGAGTAGTTGTGACAGGTATGGGCGTGG TGACTCCTCTAGGCCATGACCCCGATGTTTACTACAACAATCTCCTAGAAGGAGT AAGTGGCATCAGTGAGATAGAGAACTTCGACTGCTCTCAGCTTCCCACGAGAATT GCCGGAGAGATCAAGTCTTTTTCCACAGATGGGTTGGTGGCCCCGAAGCTCTCCA AGAGGATGGACAAGTTCATGCTTTACATCCTGACTGCAGGCAAGAAAGCATTAG CAGATGGTGGAATCACTGAAGATGTGATGAAAGAGCTCGATAAAAGAAAGTGTG GAGTTCTCATTGGCTCCGGATTGGGCGGTATGAAGGTATTCAGCGACTCCGTTGA AGCTCTGAGGATTTCATATAAGAAGATCAGTCCCTTTTGTGTACCTTTTTCTACCA CAAATATGGGATCCGCTATGCTTGCAATGGACTTGGGATGGATGGGCCCTAACTA TTCGATATCAACTGCCTGTGCAACAAGTAACTTCTGTATACTGAATGCTGCGAAC CACATAACCAAAGGCGAAGCTGACATGATGCTTTGTGGTGGCTCGGATGCGGCC ATTTTACCTATTGGTATGGGAGGTTTCGTGGCATGCCGAGCTTTGTCACAGAGGA ATAATGACCCTACCAAAGCTTCGAGACCATGGGACAGTAATCGTGATGGATTTGT GATGGGAGAAGGAGCTGGAGTGTTACTTCTCGAGGAGTTAGAGCATGCAAAGAA AAGAGGTGCAACCATTTATGCGGAATTTCTAGGTGGGAGTTTCACTTGCGATGCC TACCACATGACCGAGCCTCACCCTGATGGAGCTGGAGTGATCCTCTGCATAGAGA AGGCCTTGGCTCAGTCCGGAGTCTCGAGGGAAGAAGTAAATTACATAAATGCGC ATGCAACTTCCACTCCTGCTGGAGATATCAAGGAATACCAAGCTCTCGCCCATTG TTTCGGCCAAAACAGTGAGTTAAGAGTGAATTCCACCAAATCGATGATCGGTCAC CTTCTTGGAGGAGCTGGTGGCGTAGAAGCAGTTACAGTAGTTCAGGCAATAAGG ACTGGATGGATCCATCCAAATATTAATTTGGAAGACCCGGACAAAGGCGTGGAT GCAAAACTGCTCGTCGGCCCTAAGAAGGAGAAACTGAAGGTCAAGGTCGGTTTG TCCAATTCATTTGGGTTCGGCGGCCATAACTCATCCATACTCTTTGCCCCATGCAA TTAG SEQIDNO:33 C.avigeraKASIVaCDS 
ATGGCGGCCGCCGCTTCCATGGTTGCGTCCCCATTCTGTACGTGGCTCGTAGCCG CTTGCATGTCCACTTCCGTCGACAAAGACCCACGTTCGCCGTCTATCAAGCGTCT CCCCCGCCGGAAGAGGATTCATTCCCAATGCTCCCTCCGCGGATCCACCTTCCAA TGCCTCGTCACCTCATACAACGACCCTTGCGAACAATACCGCTCATCCGCCTCCC TTAGCTTCCTCGGGGATAACGGATTCGCATCCCTTTTCGGATCCAAGCCATTCCG GTCCAATCGCGGCCACCGGAGGCTCGGCCGTGCTTCCCATTCCGGGGAGGCCATG GCCGTGGCACTGCAACCTGCACAGGAAGTTGGCACGAAGAAGAAACCTGTTATC AAGCAAAGGCGAGTAGTTGTTACAGGAATGGGCGTGGTGACTCCTCTAGGCCAT GAACCTGATGTTTACTACAACAATCTCCTAGACGGAGTAAGCGGCATAAGTGAG ATAGAGACCTTCGACTGCACTCAGTTTCCCACGAGAATTGCCGGAGAGATCAAGT CTTTTTCCACAGATGGGTGGGTGGCTCCAAAGCTCTCTAAGAGGATGGACAAGTT CATGCTTTACTTGTTGACTGCTGGCAAGAAAGCATTGGCAGATGGTGGAATCACC GATGATGTGATGAAAGAGCTTGATAAAAGAAAGTGTGGAGTTCTCATTGGCTCC GGATTGGGCGGTATGAAGGTATTTAGCGAGTCCATTGAAGCTCTGAGGACTTCAT ATAAGAAGATCAGTCCCTTTTGTGTACCTTTTTCTACCACGAATATGGGATCCGCT ATTCTTGCAATGGACTTGGGATGGATGGGCCCTAACTATTCGATATCGACTGCCT GTGCAACAAGTAACTTCTGTATACTGAATGCTGCGAACCACATAACCAAAGGCG AAGCAGACATGATGCTTTGTGGTGGCTCGGATTCGGTCATTTTACCTATTGGTAT GGGAGGTTTCGTAGCATGCCGAGCTTTGTCACAGAGGAATAATGACCCTACCAA AGCTTCGAGACCATGGGACAGTAATCGTGATGGATTTGTGATGGGAGAAGGTGC TGGAGTTTTACTTCTCGAGGAGTTAGAGCATGCAAAGAAAAGAGGCGCAACCAT TTATGCGGAATTTCTTGGTGGGAGTTTCACTTGCGATGCCTACCACATGACCGAG CCTCACCCTGAAGGAGCTGGAGTGATCCTCTGCATAGAGAAGGCCTTGGCTCAGT CCGGAGTCTCGAGGGAAGACGTAAATTACATAAATGCGCATGCAACTTCCACTC CCGCTGGAGATATCAAAGAATACCAAGCTCTCGCCCACTGTTTCGGCCAAAACA GTGAGTTAAGAGTGAATTCCACCAAGTCGATGATCGGTCACCTTCTTGGAGGAGC CGGTGGCGTAGAAGCAGTTACAGTCGTTCAGGCAATAAGGACTGGATGGATCCA TCCAAATATTAATTTGGACGACCCGGACGAAGGCGTGGATGCAAAACTGCTCGT CGGCCCTAAGAAGGAGAAACTGAAGGTCAAGGTCGGTTTGTCCAATTCATTCGG GTTCGGCGGCCATAACTCATCCATACTCTTTGCCCCATGCAATTAG SEQIDNO:34 C.igneaKASIVaCDS ATGGCGGCGGCCGCTTCCATGTTTACGTCCCCACTCTGTACGTGGCTCGTAGCCT CTTGCATGTCGACTTCCTTCGACAACGACCCACGTTCGCCGTCCGTCAAGCGTCT CCCCCGCCGGAGGAGGATTCTCTCCCAATGCTCCCTCCGCGGATCCACCTCCCAA TGCCTCGTCACCTCATACATCGACCCTTGCAATAAGTACTGCTCCTCCGCCTCCCT TAGCTTCCTCGGGGATAACGGATTCGCATCCCTTTTCGGATCTAAGCCATTCCGG 
TCCAATCGCGGCCACCGGAGGCTCGGCCGTGCTTCCCATTCCGGGGAGGCCATGG CTGTGGCTCTGCAACCTGCACAGGAAGTCACCACGAAGAAGAAACCTGTGATCA AGCAAAGGCGAGTAGTTGTTACAGGAATGGGCGTGGTGACTCCTCTAGGCCATG AACCTGATGTTTACTACAACAATCTCCTAGATGGAGTAAGCGGCATAAGTGAGAT AGAGACCTTCGACTGCACTCAGTTTCCCACGAGAATCGCCGGAGAGATCAAGTCT TTTTCCACAGATGGGTGGGTGGCCCCAAAGCTCTCCAAGAGGATGGACAAGTTC ATGCTTTACTTGTTGACTGCTGGCAAGAAAGCATTAGCAGATGGTGGAATCACCG ATGATGTGATGAAAGAGCTTGATAAAAGAAAGTGTGGGGTTCTCATTGGCTCTG GAATGGGCGGCATGAAGTTGTTCAACGATTCCATTGAAGCTCTGAGGATTTCATA TAAAAAGATGAATCCCTTTTGTGTACCTTTTGCTACCACAAATATGGGATCAGCT ATGCTTGCAATGGACTTGGGATGGATGGGTCCTAACTACTCGATATCAACTGCCT GTGCAACAAGTAATTTCTGTATACTGAATGCTTCAAACCACATAGTCAGAGGCGA AGCTGACATGATGCTTTGTGGTGGCTCGGATGCGGTTATTATACCTATTGGTTTG GGAGGTTTTGTGGCGTGCCGAGCTTTGTCACAGAGGAATAATGACCCTACCAAA GCTTCGAGGCCATGGGATAGTAATCGTGATGGATTTGTAATGGGCGAAGGAGCT GGAGTGTTACTTCTCGAGGAGTTAGAGCATGCAAAGAAAAGAGGTGCAACCATT TATGCGGAATTTTTAGGGGGCAGTTTCACTTGCGATGCCTACCACATGACCGAGC CTCACCCTGAAGGAGCTGGAGTGATCCTCTGCATAGAGAAGGCCTTGGCTCAGG CCGGAGTCTCTAAAGAAGATGTAAATTACATAAATGCGCATGCAACTTCTACTCC TGCTGGAGATATCAAGGAATACCAAGCTCTCGCCCAATGTTTCGGCCAAAACAGT GAGCTGAGAGTGAATTCCACTAAATCGATGATCGGTCATCTTCTTGGAGCAGCTG GTGGTGTAGAAGCAGTTACTGTGGTTCAGGCGATAAGGACTGGGTGGATCCATC CAAATCTTAATTTGGAAGACCCGGACAAAGCCGTGGATGCAAAGTTGCTCGTGG GACCTAAGAAGGAGAGACTGAATGTCAAGGTCGGTTTGTCCAATTCATTTGGGTT CGGTGGGCATAATTCGTCCATACTCTTCGCCCCTTACAATTAG SEQIDNO:35 C.avigeraKASIaCDS ATGCAATCCCTCCATTCCCCTGCCCTCCGGGCCTCCCCTCTCGACCCTCTCCGACT CAAATCCTCCGCCAATGGCCCCTCTTCCACCGCCGCTTTCCGTCCCCTCCGCCGCG CCACCCTCCCCAACATTCGGGCCGCCTCCCCCACCGTCTCCGCCCCCAAGCGCGA GACCGACCCCAAGAAGCGTGTCGTCATCACCGGCATGGGCCTCGTCTCCGTCTTC GGCTCCGATGTCGACGCTTATTACGAAAAGCTCCTCTCCGGCGAGAGCGGGATCA GCTTAATCGACCGCTTCGACGCTTCCAAGTTCCCCACGAGGTTCGGCGGCCAGAT CCGGGGATTCAACGCCACGGGATACATCGACGGCAAAAACGACAGGAGGCTCGA CGATTGCCTCCGCTACTGCATTGTCGCCGGGAAGAAGGCTCTCGAAAATTCCGAT CTCGGCGGCGATAGTCTCTCAAAGATTGATAAGGAGAGAGCTGGAGTGCTAGTT GGAACTGGCATGGGTGGCCTAACCGTCTTCTCTGACGGGGTTCAGAATCTAATCG 
AGAAAGGTCACCGGAAGATCTCCCCGTTTTTCATTCCATATGCCATTACAAACAT GGGGTCTGCCCTGCTTGCCATCGATTTGGGTCTGATGGGCCCAAATTATTCGATTT CAACTGCATGTGCTACTTCCAACTACTGCTTTTATGCTGCTGCTAATCATATCCGC CGAGGCGAGGCTGACCTCATGATTGCTGGAGGAACTGAGGCTGCAATCATTCCA ATTGGGTTAGGAGGATTCGTTGCTTGCAGGGCTTTATCTCAAAGGAATGATGACC CTCAGACTGCCTCAAGGCCGTGGGATAAGGACCGTGATGGTTTTGTGATGGGTGA AGGGGCTGGAGTATTGGTTATGGAGAGCTTAGAACATGCAATGAAACGAGGAGC GCCGATTATTGCAGAATATTTGGGAGGTGCAGTCAACTGTGATGCTTATCATATG ACTGATCCAAGGGCTGATGGGCTTGGTGTCTCCTCGTGCATTGAGAGCAGTCTCG AAGATGCCGGGGTCTCACCTGAAGAGGTCAATTACATAAATGCTCATGCGACTTC TACTCTTGCTGGGGATCTTGCCGAGATAAATGCCATCAAGAAGGTTTTCAAGAAC ACCAAGGATATCAAAATCAATGCAACTAAGTCGATGATTGGACACTGTCTTGGA GCATCAGGGGGTCTTGAAGCCATTGCGACAATTAAGGGAATAACCACTGGCTGG CTTCATCCCAGCATAAACCAATTCAATCCCGAGCCATCAGTGGAATTTGACACTG TTGCCAACAAGAAGCAGCAACATGAAGTCAATGTTGCTATCTCAAATTCATTCGG ATTCGGAGGCCACAACTCAGTTGTAGCTTTCTCAGCTTTCAAGCCATGA SEQIDNO:36 C.pulcherrimaKASICDS ATGCATTCCCTCCAGTCACCCTCCCTTCGGGCCTCCCCGCTCGACCCCTTCCGCCC CAAATCATCCACCGTCCGCCCCCTCCACCGAGCATCAATTCCCAACGTCCGGGCC GCTTCCCCCACCGTCTCCGCTCCCAAGCGCGAGACCGACCCCAAGAAGCGCGTCG TGATCACCGGAATGGGCCTTGTCTCCGTTTTCGGCTCCGACGTCGATGCGTACTA CGACAAGCTCCTGTCAGGCGAGAGCGGGATCGGCCCAATCGACCGCTTCGACGC CTCCAAGTTCCCCACCAGGTTCGGCGGCCAGATTCGTGGCTTCAACTCCATGGGA TACATTGACGGCAAAAACGACAGGCGGCTTGATGATTGCCTTCGCTACTGCATTG TCGCCGGGAAGAAGTCTCTTGAGGACGCCGATCTCGGTGCCGACCGCCTCTCCAA GATCGACAAGGAGAGAGCCGGAGTGCTGGTTGGGACAGGAATGGGTGGTCTGAC TGTCTTCTCTGACGGGGTTCAATCTCTTATCGAGAAGGGTCACCGGAAAATCACC CCTTTCTTCATCCCCTATGCCATTACAAACATGGGGTCTGCCCTGCTCGCTATTGA ACTCGGTCTGATGGGCCCAAACTATTCAATTTCCACTGCATGTGCCACTTCCAAC TACTGCTTCCATGCTGCTGCTAATCATATCCGCCGTGGTGAGGCTGATCTTATGAT TGCTGGAGGCACTGAGGCCGCAATCATTCCAATTGGGTTGGGAGGCTTTGTGGCT TGCAGGGCTCTGTCTCAAAGGAACGATGACCCTCAGACTGCCTCTAGGCCCTGGG ATAAAGACCGTGATGGTTTTGTGATGGGTGAAGGTGCTGGAGTGTTGGTGCTGGA GAGCTTGGAACATGCAATGAAACGAGGAGCACCTATTATTGCAGAGTATTTGGG AGGTGCAATCAACTGTGATGCTTATCACATGACTGACCCAAGGGCTGATGGTCTC GGTGTCTCCTCTTGCATTGAGAGTAGCCTTGAAGATGCTGGCGTCTCACCTGAAG 
AGGTCAATTACATAAATGCTCATGCGACTTCTACTCTAGCTGGGGATCTCGCCGA GATAAATGCCATCAAGAAGGTTTTCAAGAACACAAAGGATATCAAAATTAATGC AACTAAGTCAATGATCGGACACTGTCTTGGAGCCTCTGGAGGTCTTGAAGCTATA GCGACTATTAAGGGAATAAACACCGGCTGGCTTCATCCCAGCATTAATCAATTCA ATCCTGAGCCATCCGTGGAGTTCGACACTGTTGCCAACAAGAAGCAGCAACACG AAGTTAATGTTGCGATCTCGAATTCATTTGGATTCGGAGGCCACAACTCAGTCGT GGCTTTCTCGGCTTTCAAGCCATGA SEQIDNO:37 C.avigamitochondrialKASCDS ATGGTGTTTCTTCCTTGGCGAAAAATGCTCTGTCCATCTCAATACCGTTTTTTGCG GCCCTTATCTTCATCTACAACTTTTGATCCTCGTAGGGTTGTTGTTACAGGCCTGG GTATGGTGACTCCATTAGGATGCGGGGTGAACACCACATGGAAACAACTCATAG AGGGGAAATGTGGGATAAGAGCAATATCCCTTGAAGACCTAAAGATGGATGCTT TTGATATTGATACTCAGGCCTATGTATTTGATCAGCTGACCTCGAAGGTCGCTGC CACCGTGCCCACCGGAGTGAATCCCGGAGAATTTAATGAAGATTTATGGTTCAAT CAGAAGGAGCACCGTGCTATTGCAAGGTTCATAGCTTATGCACTCTGTGCAGCTG ATGAAGCTCTTAAAGATGCAAATTGGGAACCTACTGAACCTGAAGAGAGAGAAA TGACGGGTGTCTCCATTGGTGGAGGGACTGGAAGCATTAGCGATGTATTAGATGC TGGTCGGATGATTTGTGAGAAGAAATTGCGTCGCCTAAGTCCATTCTTCATTCCA CGCATATTGATAAATATGGCCTCTGGTCATGTGAGCATGAAATATGGTTTCCAGG GACCCAACCATGCTGCTGTGACAGCTTGTGCAACAGGGGCTCATTCGATAGGTGA TGCTGCAAGGATGATACAGTTTGGAGATGCAGATGTCATGGTCGCTGGAGGCAC AGAATCTAGCATAGACGCCTTATCCATTGCAGGATTTTGCAGGTCAAGGGCTCTT ACAACAAAGTATAATTCTTGCCCACAAGAAGCTTCACGACCCTTTGATACCGATA GAGATGGGTTTGTAATAGGTGAAGGGTCTGGCGTCTTGGTATTGGAGGAACTAG ATCATGCAAGAAAACGTGGTGCAAAGATGTATGCCGAGTTCTGTGGATATGGAA TGTCTGGTGATGCGCATCATATAACCCAACCTCATAGCGATGGAAGAGGTGCCAT TTTAGCAATGACCCGTGCATTGAAGCAGTCAAATCTACATCCGGATCAGGTGGAT TATGTAAATGCTCACGCTACGTCTACTTCTTTAGGTGATGCAATTGAAGCTAAGG CGATTAAAACAGTTTTCTCGGATCATGCGATGTCAGGTTCGCTCGCCCTTTCCTCC ACCAAGGGAGCTATTGGGCATCTCCTCGGAGCAGCGGGTGCTGTGGAAGCCATT TTCTCCATTCTGGCTATAAAAAACGGACTTGCGCCTTTGACGCTAAATGTCGCAA GACCAGACCCTGTGTTTACCGAGCGGTTTGTGCCTTTGACTGCTTCAAAAGAGAT GCATGTAAGGGCGGCGTTGTCAAACTCTTTTGGCTTTGGAGGTACAAATACTACA CTTCTTTTCACTTCACCTCCTCAAAACTAA SEQIDNO:38 CupheapalustrisKASIVcodonoptimizedforProtothecawithcloningsequenceandtags. 
Nucleotide sequence of the C. palustris KASIV expression vector (D3145 and D3295, pSZ4312). The 5' and 3' homology arms enabling targeted integration into the pLOOP locus are noted with lowercase; the PmHXT1-2 promoter is noted in uppercase italic, which drives expression of the ScMelibiase selection marker noted with lowercase italic, followed by the PmPGK 3' UTR terminator highlighted in uppercase. The PmACP promoter (noted in bold text) drives the expression of the codon-optimized CpaIKASIV (noted with lowercase bold text) and is terminated with the CvNR 3' UTR noted in underlined, lowercase bold. Restriction cloning sites and spacer DNA fragments are noted as underlined, uppercase plain lettering.
aacggaggtctgtcaccaaatggaccccgtctattgcgggaaaccacggcgatggcacgtttcaaaacttgatgaaatacaatattcag tatgtcgcgggcggcgacggcggggagctgatgtcgcgctgggtattgcttaatcgccagcttcgcccccgtcttggcgcgaggcgt gaacaagccgaccgatgtgcacgagcaaatcctgacactagaagggctgactcgcccggcacggctgaattacacaggcttgcaaa aataccagaatttgcacgcaccgtattcgcggtattttgttggacagtgaatagcgatgcggcaatggcttgtggcgttagaaggtgcga cgaaggtggtgccaccactgtgccagccagtcctggcggctcccagggccccgatcaagagccaggacatccaaactacccacag catcaacgccccggcctatactcgaaccccacttgcactctgcaatggtatgggaaccacggggcagtcttgtgtgggtcgcgcctat cgcggtcggcgaagaccgggaaGGTACCCCGCTCCCGTCTGGTCCTCACGTTCGTGTACGGCCT GGATCCCGGAAAGGGCGGATGCACGTGGTGTTGCCCCGCCATTGGCGCCCACGTTTC AAAGTCCCCGGCCAGAAATGCACAGGACCGGCCCGGCTCGCACAGGCCATGACGAAT GCCCAGATTTCGACAGCAAAACAATCTGGAATAATCGCAACCATTCGCGTTTTGAACGA AACGAAAAGACGCTGTTTAGCACGTTTCCGATATCGTGGGGGCCGAAGCATGATTGGG GGGAGGAAAGCGTGGCCCCAAGGTAGCCCATTCTGTGCCACACGCCGACGAGGACCA ATCCCCGGCATCAGCCTTCATCGACGGCTGCGCCGCACATATAAAGCCGGACGCCTTC CCGACACGTTCAAACAGTTTTATTTCCTCCACTTCCTGAATCAAACAAATCTTCAAGGAA GATCCTGCTCTTGAGCAACTAGTatgttcgcgttctacttcctgacggcctgcatctccctgaagggcgtgttcggc gtctccccctcctacaacggcctgggcctgacgccccagatgggctgggacaactggaacacgttcgcctgcgacgtaccgagc agctgctgctggacacggccgaccgcatctccgacctgggcctgaaggacatgggctacaagtacatcatcctggacgactgct ggtcctccggccgcgactccgacggcttcctggtcgccgacgagcagaagttccccaacggcatgggccacgtcgccgaccacc 
tgcacaacaactccucctgttcggcatgtactcctccgcgggcgagtacacgtgcgccggctaccccggctccctgggccgcgag gaggaggacgcccagacttcgcgaacaaccgcgtggactacctgaagtacgacaactgctacaacaagggccagacggcac gcccgagatctcctaccaccgctacaaggccatgtccgacgccctgaacaagacgggccgccccatcactactccctgtgcaac tggggccaggacctgaccactactggggctccggcatcgcgaactcctggcgcatgtccggcgacgtcacggcggagacacgc gccccgactcccgctgcccctgcgacggcgacgagtacgactgcaagtacgccggcaccactgctccatcatgaacatcctgaa caaggccgcccccatgggccagaacgcgggcgtcggcggctggaacgacctggacaacctggaggtcggcgtcggcaacct gacggacgacgaggagaaggcgcacactccatgtgggccatggtgaagtcccccctgatcatcggcgcgaacgtgaacaacc tgaaggcctcctcctactccatctactcccaggcgtccgtcatcgccatcaaccaggactccaacggcatccccgccacgcgcgtc tggcgctactacgtgtccgacacggacgagtacggccagggcgagatccagatgtggtccggccccctggacaacggcgacca ggtcgtggcgctgctgaacggcggctccgtgtcccgccccatgaacacgaccctggaggagatcacttcgactccaacctgggct ccaagaagctgacctccacctgggacatctacgacctgtgggcgaaccgcgtcgacaactccacggcgtccgccatcctgggcc gcaacaagaccgccaccggcatcctgtacaacgccaccgagcagtcctacaaggacggcctgtccaagaacgacacccgcct gttcggccagaagatcggctccctgtcccccaacgcgatcctgaacacgaccgtccccgcccacggcatcgcgactaccgcctg cgcccctcctcctgaTACAACTTATTACGTATTCTGACCGGCGCTGATGTGGCGCGGACGC CGTCGTACTCTTTCAGACTTTACTCTTGAGGAATTGAACCTTTCTCGCTTGCTGGC ATGTAAACATTGGCGCAATTAATTGTGTGATGAAGAAAGGGTGGCACAAGATGG ATCGCGAATGTACGAGATCGACAACGATGGTGATTGTTATGAGGGGCCAAACCT GGCTCAATCTTGTCGCATGTCCGGCGCAATGTGATCCAGCGGCGTGACTCTCGCA ACCTGGTAGTGTGTGCGCACCGGGTCGCTTTGATTAAAACTGATCGCATTGCCAT CCCGTCAACTCACAAGCCTACTCTAGCTCCCATTGCGCACTCGGGCGCCCGGCTC GATCAATGTTCTGAGCGGAGGGCGAAGCGTCAGGAAATCGTCTCGGCAGCTGGA AGCGCATGGAATGCGGAGCGGAGATCGAATCAGGATCCCGCGTCTCGAACAGAG CGCGCAGAGGAACGCTGAAGGTCTCGCCTCTGTCGCACCTCAGCGCGGCATACA CCACAATAACCACCTGACGAATGCGCTTGGTTCTTCGTCCATTAGCGAAGCGTCC GGTTCACACACGTGCCACGTTGGCGAGGTGGCAGGTGACAATGATCGGTGGAGC TGATGGTCGAAACGTTCACAGCCTAGGGATATCGCCTGCTCAAGCGGGCGCTC AACATGCAGAGCGTCAGCGAGACGGGCTGTGGCGATCGCGAGACGGACGA GGCCGCCTCTGCCCTGTTTGAACTGAGCGTCAGCGCTGGCTAAGGGGAGGG AGACTCATCCCCAGGCTCGCGCCAGGGCTCTGATCCCGTCTCGGGCGGTGA 
TCGGCGCGCATGACTACGACCCAACGACGTACGAGACTGATGTCGGTCCCG ACGAGGAGCGCCGCGAGGCACTCCCGGGCCACCGACCATGTTTACACCGAC CGAAAGCACTCGCTCGTATCCATTCCGTGCGCCCGCACATGCATCATCTTTT GGTACCGACTTCGGTCTTGTTTTACCCCTACGACCTGCCTTCCAAGGTGTGA GCAACTCGCCCGGACATGACCGAGGGTGATCATCCGGATCCCCAGGCCCCA GCAGCCCCTGCCAGAATGGCTCGCGCTTTCCAGCCTGCAGGCCCGTCTCCC AGGTCGACGCAACCTACATGACCACCCCAATCTGTCCCAGACCCCAAACACC CTCCTTCCCTGCTTCTCTGTGATCGCTGATCAGCAACACATatggcttccgcggcattca ccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgccacccgcctgagggg ctccaccttccagtgcctggtgacctcctacatcgacccctgcaaccagttctcctcctccgcctccctgtccttcctgggcgaca acggcttcgcctccctgttcggctccaagcccttccgctccaaccgcggccaccgccgcctgggccgcgcctcccactccggcg aggccatggccgtggccctggagcccgcccaggaggtggccaccaagaagaagcccctggtgaagcagcgccgcgtggtg gtgaccggcatgggcgtggtgacccccctgggccacgagcccgacgtgtactacaacaacctgctggacggcgtgtccggca tctccgagatcgaggccttcgactgcacccagttccccacccgcatcgccggcgagatcaagtccttctccaccgacggctggg tggcccccaagctgtccaagcgcatggacaagttcatgctgtacctgctgaccgccggcaagaaggccctggccgacggcgg catcaccgacgacgtgatgaaggagctggacaagcgcaagtgcggcgtgctgatcggctccggcctgggcggcatgaagct gttctccgactccatcgaggccctgcgcatctcctacaagaagatgaaccccttctgcgtgcccttcgccaccaccaacatggg ctccgccatgctggccatggacctgggctggatgggccccaactactccatctccaccgcctgcgccacctccaacttctgcatc ctgaactcgccaaccacatcgtgcgccggcgaggccgacatgatgctgtgcggcggtccgacgccgtgatcatccccatcgg cctgggcggcttcgtggcctgccgcgccctgtcccagcgcaacaacgaccccaccaaggcctcccgcccctgggactccaacc gcgacggcttcgtgatgggcgagggcgccggcgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccacc atctacgccgagttcctgggcggctccttcacctgcgacgcctaccacatgaccgagccccaccccgagggcgccggcgtgat cctgtgcatcgagaaggccctggcccaggccggcgtgtcccgcgaggacgtgaactacatcaacgcccacgccacctccacc cccgccggcgacatcaaggagtaccaggccctggcccactgcttcggccagaactccgagctgcgcgtgaactccaccaaagt ccatgatcggccacctgatcggcgccgccggcggcgtggaggccgtgaccgtggtgcaggccatccgcaccggctggatcca ccccaacctgaacctggaggaccccgacaaggccgtggacgccaaggtgctggtgggccccaagaaggagcgcctgaacg 
tgaaggtgggcctgtccaactccttcggcttcggcggccacaactcctccatcctgttcgccccctacaacaccatgtaccccta cgacgtgcccgactacgcctgaTATCGAGgcagcagcagctcggatagtatcgacacactctggacgctggtcgtgtga tggactgttgccgccacacttgctgccttgacctgtgaatatccctgccgcttttatcaaacagcctcagtgtgtttgatcttgtgt gtacgcgcttttgcgagttgctagctgcttgtgctatttgcgaataccacccccagcatccccttccctcgtttcatatcgcttgcat cccaaccgcaacttatctacgctgtcctgctatccctcagcgctgctcctgctcctgctcactgcccctcgcacagccttggtttgg gctccgcctgtattctcctggtactgcaacctgtaaaccagcactgcaatgctgatgcacgggaagtagtgggatgggaacac aaatggaAAGCTTGAGCTCagcggcgacggtcctgctaccgtacgacgttgggcacgcccatgaaagtttgtataccga gcttgttgagcgaactgcaagcgcggctcaaggatacttgaactcctggattgatatcggtccaataatggatggaaaatccgaacctc gtgcaagaactgagcaaacctcgttacatggatgcacagtcgccagtccaatgaacattgaagtgagcgaactgttcgcttcggtggc agtactactcaaagaatgagctgctgttaaaaatgcactctcgttctctcaagtgagtggcagatgagtgctcacgccttgcacttcgctg cccgtgtcatgccctgcgccccaaaatttgaaaaaagggatgagattattgggcaatggacgacgtcgtcgctccgggagtcaggac cggcggaaaataagaggcaacacactccgcttctta SEQIDNO:39 CupheapalustrisKASIVcodonoptimizedforPrototheca atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggctccaccttccagtgcctggtgacctcctacatcgacccctgcaaccagttctcctcctccgcctccctgtcctt cctgggcgacaacggcttcgcctccctgttcggctccaagcccttccgctccaaccgcggccaccgccgcctgggccgcgcctccc actccggcgaggccatggccgtggccctggagcccgcccaggaggtggccaccaagaagaagcccctggtgaagcagcgccgc gtggtggtgaccggcatgggcgtggtgacccccctgggccacgagcccgacgtgtactacaacaacctgctggacggcgtgtccg gcatctccgagatcgaggccttcgactgcacccagttccccacccgcatcgccggcgagatcaagtccttctccaccgacggctggg tggcccccaagctgtccaagcgcatggacaagttcatgctggtacctgctgaccgccggcaagaaggccctggccgacggcggcatc accgacgacgtgatgaaggagctggacaagcgcaagtgcggcgtgctgatcggctccggcctgggcggcatgaagctgttctccg actccatcgaggccctgcgcatctcctacaagaagatgaaccccttctgcgtgcccttcgccaccaccaacatgggctccgccatgct ggccatggacctgggctggatgggcccaactactccatctccaccgcctgcgccacctccaacttctgcatcctgaactccgccaac 
cacatcgtgcgcggcgaggccgacatgatgctgtgcggcggctccgacgccgtgatcatccccatcggcctgggcggcttcgtggc ctgccgcgccctgtcccagcgcaacaacgaccccaccaaggcctcccgcccctgggactccaaccgcgacggcttcgtgatgggc gagggcgccggcgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccaccatctacgccgagttcctgggcggc tccttcacctgcgacgcctaccacatgaccgagccccaccccgagggcgccggcgtgatcctgtgcatcgagaaggccctggccca ggccggcgtgtcccgcgaggacgtgaactacatcaacgcccacgccacctccaccccgccggcgacatcaaggagtaccaggc cctggcccactgcttcggccagaactccgagctgcgcgtgaactccaccaagtccatgatcggccacctgatcggcgccgccggcg gcgtggaggccgtgaccgtggtgcaggccatccgcaccggctggatccaccccaacctgaacctggaggaccccgacaaggccg tggacgccaaggtgctggtgggccccaagaaggagcgcctgaacgtgaaggtgggcctgtccaactccttcggcttcggcggccac aactcctccatcctgttcgccccctacaacaccatgtacccctacgacgtgcccgactacgcctga
SEQ ID NO: 40 C. camphora KASIV codon optimized for Prototheca. Nucleotide sequence from the C. camphora KASIV (D3147, pSZ4338) expression vector. Only the codon-optimized C. camphora KASIV sequence is shown; the promoter, 3' UTR, selection marker, and targeting arms are the same as in SEQ ID NO: 38.
atggccatgatggccggctcctgctccaacctggtgatcggcaaccgcgagctgggcggcaacggcccctccctgctgcactacaa cggcctgcgccccctggagaacatccagaccgcctccgccgtgaagaagcccaacggcctgttcgcctcctccaccgcccgcaagt ccaaggccgtgcgcgccatggtgctgcccaccgtgaccgcccccaagcgcgagaaggaccccaagaagcgcatcgtgatcaccg gcatgggcctggtgtccgtgttcggcaacgacatcgacaccttctactccaagctgctggagggcgagtccggcatcggccccatcg accgcttcgacgcctcctccttctccgtgcgcttcgccggccagatccacaacttctcctccaagggctacatcgacggcaagaacga ccgccgcctggacgactgctggcgctactgcctggtggccggccgccgcgccctggaggacgccaacctgggccccgaggtgct ggagaagatggaccgctcccgcatcggcgtgctgatcggcaccggcatgggcggcctgtccgccttctccaacggcgtggagtccc tgatccagaagggctacaagaagatcacccccttcttcatcccctactccatcaccaacatgggctccgccctgctggccatcgacacc ggcgtgatgggccccaactactccatctccaccgcctgcgccaccgccaactactgcttccacgccgccgccaaccacatccgccgc ggcgaggccgagatcatggtgaccggcggcaccgaggccgccgtgtccgccaccggcgtgggcggcttcatcgcctgccgcgcc ctgtcccaccgcaacgacgagccccagaccgcctcccgcccctgggacaaggaccgcgacggcttcgtgatgggcgagggcgcc ggcgtgctggtgatggagtccctgcaccacgcccgcaagcgcggcgccaacatcatcgccgagtacctgggcggcgccgtgacct gcgacgcccaccacatgaccgacccccgcgccgacggcctgggcgtgtcctcctgcatcaccaagtccctggaggacgccggcgt gtcccccgaggaggtgaactacgtgaacgcccacgccacctccaccctggccggcgacctggccgaggtgaacgccatcaagaa ggtgttcaaggacacctccgagatgaagatgaacggcaccaagtccatgatcggccactgcctgggcgccgccggcggcctggag gccatcgccaccatcaaggccatcaacaccggctggctgcaccccaccatcaaccagttcaacatcgagcccgccgtgaccatcga caccgtgcccaacgtgaagaagaagcacgacatccacgtgggcatctccaactccttcggcttcggcggccacaactccgtggtggt gttcgcccccttcatgcccaccatgtacccctacgacgtgcccgactacgcctga SEQIDNO:41 C.camphoraKASI(D3148,pSZ4339)codonoptimizedforPrototheca atgcagatcctgcagaccccctcctcctcctcctcctccctgcgcatgtcctccatggagtccctgtccctgacccccaagtccctgccc ctgaagaccctgctgcccctgcgcccccgccccaagaacctgtcccgccgcaagtcccagaacccccgccccatctcctcctcctcc tcccccgagcgcgagaccgaccccaagaagcgcgtggtgatcaccggcatgggcctggtgtccgtgttcggcaacgacgtggacg cctactacgaccgcctgctgtccggcgagtccggcatcgcccccatcgaccgcttcgacgcctccaagttccccacccgcttcgccg 
gccagatccgcggcttcacctccgacggctacatcgacggcaagaacgaccgccgcctggacgactgcctgcgctactgcatcgtg tccggcaagaaggccctggagaacgccggcctgggcccccacctgatggacggcaagatcgacaaggagcgcgccggcgtgct ggtgggcaccggcatgggcggcctgaccgtgttctccaacggcgtgcagaccctgcacgagaagggctaccgcaagatgaccccc ttcttcatcccctacgccatcaccaacatgggctccgccctgctggccatcgagctgggcttcatgggccccaactactccatctccacc gcctgcgccacctccaactactgcttctacgccgccgccaaccacatccgccgcggcgaggccgacctgatgctggccggcggca ccgaggccgccatcatccccatcggcctgggcggcttcgtggcctgccgcgccctgtcccagcgcaacgacgacccccagaccgc ctcccgcccctgggacaaggaccgcgacggcttcgtgatgggcgagggcgccggcgtgctggtgatggagtccctggagcacgc catgaagcgcgacgcccccatcatcgccgagtacctgggcggcgccgtgaactgcgacgcctaccacatgaccgacccccgcgcc gacggcctgggcgtgtccacctgcatcgagcgctccctggaggacgccggcgtggcccccgaggaggtgaactacatcaacgccc acgccacctccaccctggccggcgacctggccgaggtgaacgccatcaagaaggtgttcaccaacacctccgagatcaagatcaac gccaccaagtccatgatcggccactgcctgggcgccgccggcggcctggaggccatcgccaccatcaaggccatcaacaccggct ggctgcacccctccatcaaccagttcaaccccgagccctccgtggagttcgacaccgtggccaacaagaagcagcagcacgaggtg aacgtggccatctccaactccttcggcttcggcggccacaactccgtggtggtgttctccgccttcaagcccaccatgtacccctacga cgtgcccgactacgcctga SEQIDNO:42 U.californicaKASI U.californicaKASI(D3150,pSZ4341)codonoptimizedforPrototheca atggagtccctgtccctgacccccaagtccctgcccctgaagaccctgctgcccttccgcccccgccccaagaacctgtcccgccgc aagtcccagaaccccaagcccatctcctcctcctcctcccccgagcgcgagaccgaccccaagaagcgcgtggtgatcaccggcat gggcctggtgtccgtgttcggcaacgacgtggacgcctactacgaccgcctgctgtccggcgagtccggcatcgcccccatcgacc gcttcgacgcctccaagttccccacccgcttcgccggccagatccgcggcttcacctccgacggctacatcgacggcaagaacgacc gccgcctggacgactgcctgcgctactgcatcgtgtccggcaagaaggccctggagaacgccggcctgggccccgacctgatgga cggcaagatcgacaaggagcgcgccggcgtgctggtgggcaccggcatgggcggcctgaccgtgttctccaacggcgtgcagac cctgcacgagaagggctaccgcaagatgacccccttcttcatcccctacgccatcaccaacatgggctccgccctgctggccatcgac ctgggcttcatgggccccaactactccatctccaccgcctgcgccacctccaactactgcttctacgccgccgccaaccacatccgcc 
gcggcgaggccgacgtgatgctggccggcggcaccgaggccgccatcatccccatcggcctgggcggcttcgtggcctgccgcg ccctgtcccagcgcaacgacgacccccagaccgcctcccgcccctgggacaaggaccgcgacggcttcgtgatgggcgagggcg ccggcgtgctggtgatggagtccctggagcacgccatgaagcgcgacgcccccatcatcgccgagtacctgggcggcgccgtgaa ctgcgacgcctaccacatgaccgacccccgcgccgacggcctgggcgtgtccacctgcatcgagcgctccctggaggacgccggc gtggcccccgaggaggtgaactacatcaacgcccacgccacctccaccctggccggcgacctggccgaggtgaacgccatcaaga aggtgttcaccaacacctccgagatcaagatcaacgccaccaagtccatgatcggccactgcctgggcgccgccggcggcctggag gccatcgccaccatcaaggccatcaacaccggctggctgcacccctccatcaaccagttcaaccccgagccctccgtggagttcgac accgtggccaacaagaagcagcagcacgaggtgaacgtggccatctccaactccttcggcttcggcggccacaactccgtggtggt gttctccgccttcaagcccaccatgtacccctacgacgtgcccgactacgcctga SEQIDNO:43 U.californicaKASIV(D3152,pSZ4343)codonoptimizedforPrototheca atgacccagaccctgatctgcccctcctccatggagaccctgtccctgaccaagcagtcccacttccgcctgcgcctgcccacccccc cccacatccgccgcggcggcggccaccgccaccccccccccttcatctccgcctccgccgccccccgccgcgagaccgacccca agaagcgcgtggtgatcaccggcatgggcctggtgtccgtgttcggcaccaacgtggacgtgtactacgaccgcctgctggccggc gagtccggcgtgggcaccatcgaccgcttcgacgcctccatgttccccacccgcttcggcggccagatccgccgcttcacctccgag ggctacatcgacggcaagaacgaccgccgcctggacgactacctgcgctactgcctggtgtccggcaagaaggccatcgagtccg ccggcttcgacctgcacaacatcaccaacaagatcgacaaggagcgcgccggcatcctggtgggctccggcatgggcggcctgaa ggtgttctccgacggcgtggagtccctgatcgagaagggctaccgcaagatctcccccttcttcatcccctacatgatccccaacatgg gctccgccctgctgggcatcgacctgggcttcatgggccccaactactccatctccaccgcctgcgccacctccaactactgcatctac gccgccgccaaccacatccgccagggcgacgccgacctgatggtggccggcggcaccgaggcccccatcatccccatcggcctg ggcggcttcgtggcctgccgcgccctgtccacccgcaacgacgacccccagaccgcctcccgcccctgggacatcgaccgcgacg gcttcgtgatgggcgagggcgccggcatcctggtgctggagtccctggagcacgccatgaggcgcgacgcccccatcctggccga gtacctgggcggcgccgtgaactgcgacgcccaccacatgaccgacccccgcgccgacggcctgggcgtgtccacctgcatcgag tcctccctggaggacgccggcgtggccgccgaggaggtgaactacatcaacgcccacgccacctccacccccaccggcgacctgg 
ccgagatgaaggccatcaagaacgtgttccgcaacacctccgagatcaagatcaacgccaccaagtccatgatcggccactgcctgg gcgcctccggcggcctggaggccatcgccaccctgaaggccatcaccaccggctggctgcaccccaccatcaaccagttcaaccc cgagccctccgtggacttcgacaccgtggccaagaagaagaagcagcacgaggtgaacgtggccatctccaactccttcggcttcg gcggccacaactccgtgctggtgttctccgccttcaagcccaccatgtacccctacgacgtgcccgactacgcctga SEQIDNO:44 C.wrightiiKASAI(D3153,pSZ4379)codonoptimizedforPrototheca atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggtacgtattccagtgcctggtggccagctgcatcgacccctgcgaccagtaccgcagcagcgccagcctgagctt cctgggcgacaacggcttcgccagcctgttcggcagcaagcccttcatgagcaaccgcggccaccgccgcctgcgccgcgccagc cacagcggcgaggccatggccgtggccctgcagcccgcccaggaggccggaccaagaagaagcccgtgatcaagcagcgcc gcgtggtggtgaccggcatgggcgtggtgacccccctgggccacgagcccgacgtgttctacaacaacctgctggacggcgtgag cggcatcagcgagatcgagaccttcgactgcacccagttccccacccgcatcgccggcgagatcaagagcttcagcaccgacggct gggtggccccccaagctgagcaagcgcatggacaagttcatgctgtacctgctgaccgccggcaagaaggccctggccgacggcgg catcaccgacgaggtgatgaaggagctggacaagcgcaagtgcggcgtgctgatcggcagcggcatgggcggcatgaaggtgttc aacgacgccatcgaggccctgcgcgtgagctacaagaagatgaaccccttctgcgtgcccttcgccaccaccaacatgggcagcgc catgctggccatggacctgggctggatgggccccaactacagcatcagcaccgcctgcgccaccagcaacttctgcatcctgaacgc cgccaaccacatcatccgcggcgaggccgacatgatgctgtgcggcggcagcgacgccgtgatcatccccatcggcctgggcggc ttcgtggcctgccgcgccctgagccagcgcaacagcgaccccaccaaggccagccgcccctgggacagcaaccgcgacggcttc gtgatgggcgagggcgccggcgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccaccatctacgccgagttc ctgggcggcagcttcacctgcgacgcctaccacatgaccgagccccaccccgagggcgccggcgtgatcctgtgcatcgagaagg ccctggcccaggccggcgtgagcaaggaggacgtgaactacatcaacgcccacgccaccagcaccagcgccggcgacatcaag gagtaccaggccctggcccgctgcttcggccagaacagcgagctgcgcgtgaacagcaccaagagcatgatcggccacctgctgg gcgccgccggcggcgtggaggccgtgaccgtggtgcaggccatccgcaccggctggattcaccccaacctgaacctggaggacc ccgacaaggccgtggacgccaagctgctggtgggccccaagaaggagcgcctgaacgtgaaggtgggcctgagcaacagcttcg 
gcttcggcggccacaacagcagcatcctgttcgccccctgcaacgtgtga SEQIDNO:45 C.avigeraKASIVb(D3287,pSZ4453)codonoptimizedforPrototheca atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggctccaccttccagtgctacatcggcgacaacggcttcggctccaagcccccccgctccaaccgcggccacc tgcgcctgggccgcacctcccactccggcgaggtgatggccgtggccatgcagtccgcccaggaggtgtccaccaaggagaagcc cgccaccaagcagcgccgcgtggtggtgaccggcatgggcgtggtgaccgccctgggccacgaccccgacgtgtactacaacaa cctgctggacggcgtgtccggcatctccgagatcgagaacttcgactgctcccagctgcccacccgcatcgccggcgagatcaagtc cttctccgccgacggctgggtggcccccaagttctcccgccgcatggacaagttcatgctgtacatcctgaccgccggcaagaaggc cctggtggacggcggcatcaccgaggacgtgatgaaggagctggacaagcgcaagtgcggcgtgctgatcggctccggcctggg cggcatgaaggtgttctccgagtccatcgaggccctgcgcacctcctacaagaagatctcccccttctgcgtgcccttctccaccacca acatgggctccgccatcctggccatggacctgggctggatgggccccaactactccatctccaccgcctgcgccacctccaacttctg catcctgaacgccgccaaccacatcaccaagggcgaggccgacatgatgctgtgcggcggctccgactccgtgatcctgcccatcg gcatgggcggcttcgtggcctgccgcgccctgtcccagcgcaacaacgaccccaccaaggcctcccgcccctgggactccaaccg cgacggcttcgtgatgggcgagggcgccggcgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccaccatcta cgccgagttcctgggcggctccttcacctgcgacgcctaccacatgaccgagccccaccccgagggcgccggcgtgatcctgtgca tcgagaaggccctggcccagtccggcgtgtcccgcgaggacgtgaactacatcaacgcccacgccacctccacccccgccggcga catcaaggagtaccaggccctggcccactgcttcggccagaactccgagctgcgcgtgaactccaccaagtccatgatcggccacct gctgggcggcgccggcggcgtggaggccgtgaccgtggtgcaggccatccgcaccggctggatccaccccaacatcaacctgga cgaccccgacgagggcgtggacgccaagctgctggtgggccccaagaaggagaagctgaaggtgaaggtgggcctgtccaactc cttcggcttcggcggccacaactcctccatcctgttcgccccctgcaacaccatgtacccctacgacgtgcccgactacgcctga SEQIDNO:46 C.paucipetalaKASIVbcodonoptimizedforPrototheca atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggctccaccttccagtgcctgggcgacatcggcttcgcctccctgatcggctccaagcccccccgctccaaccg caaccaccgccgcctgggccgcacctcccactccggcgaggtgatggccgtggccatgcagcccgcccacgaggcctccaccaa 
gaacaagcccgtgaccaagcagcgccgcgtggtggtgaccggcatgggcgtggccacccccctgggccacgaccccgacgtgta ctacaacaacctgctggacggcgtgtccggcatctcccagatcgagaacttcgactgcacccagttccccacccgcatcgccggcga gatcaagtccttctccaccgagggctacgtgatccccaagttcgccaagcgcatggacaagttcatgctgtacctgctgaccgccggc aagaaggccctggaggacggcggcatcaccgaggacgtgatgaaggagctggacaagcgcaagtgcggcgtgctgatcggctcc ggcatgggcggcatgaagatcatcaacgactccatcgccgccctgaacgtgtcctacaagaagatgacccccttctgcgtgcccttct ccaccaccaacatgggctccgccatgctggccatcgacctgggctggatgggccccaactactccatctccaccgcctgcgccacct ccaactactgcatcctgaacgccgccaaccacatcgtgcgcggcgaggccgacatgatgctgtgcggcggctccgacgccgtgatc atccccgtgggcctgggcggcttcgtggcctgccgcgccctgtcccagcgcaacaacgaccccaccaaggcctcccgcccctggg actccaaccgcgacggcttcgtgatgggcgagggcgccggcgtgctgctgctggaggagctggagcacgccaagaagcgcggcg ccaccatctacgccgagttcctgggcggctccttcacctgcgacgcctaccacatgaccgagccccaccccgacggcgccggcgtg atcctgtgcatcgagaaggccctggcccagtccggcgtgtcccgcgaggacgtgaactacatcaacgcccacgccacctccacccc cgccggcgacatcaaggagtaccaggccctggcccactgcttcggccagaactccgagctgcgcgtgaactccaccaagtccatga tcggccacctgctgggcgccgccggcggcgtggaggccgtgaccgtggtgcaggccatccgcaccggctggatccaccccaacat caacctggagaaccccgacgaggccgtggacgccaagctgctggtgggccccaagaaggagaagctgaaggtgaaggtgggcct gtccaactccttcggcttcggcggccacaactcctccatcctgttcgccccctacaacaccatgtacccctacgacgtgcccgactacg cctga SEQIDNO:47 C.igneaKASIVb(D3289,pSZ4455)codonoptimizedforPrototheca atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggctccacctcccagtgcctggtgacctcctacatcgacccctgcaacaagtactgctcctccgcctccctgtcct tcctgggcgacaacggcttcgcctccctgttcggctccaagcccttccgctccaaccgcggccaccgccgcctgggccgcgcctccc actccggcgaggccatggccgtggccctgcagcccgcccaggaggtgaccaccaagaagaagcccgtgatcaagcagcgccgc gtggtggtgaccggcatgggcgtggtgacccccctgggccacgagcccgacgtgtactacaacaacctgctggacggcgtgtccg gcatctccgagatcgagaccttcgactgcacccagttccccacccgcatcgccggcgagatcaagtccttctccaccgacggctgggt ggcccccaagctgtccaagcgcatggacaagttcatgctgtacctgctgaccgccggcaagaaggccctggccgacggcggcatc 
accgacgacgtgatgaaggagctggacaagcgcaagtgcggcgtgctgatcggctccggcatgggcggcatgaagctgttcaacg actccatcgaggccctgcgcatctcctacaagaagatgaaccccttctgcgtgcccttcgccaccaccaacatgggctccgccatgct ggccatggacctgggctggatgggccccaactactccatctccaccgcctgcgccacctccaacttctgcatcctgaacgcctccaac cacatcgtgcgcggcgaggccgacatgatgctgtgcggcggctccgactccgtgaccgtgcccctgggcgtgggcggcttcgtggc ctgccgcgccctgtcccagcgcaacaacgaccccaccaaggcctcccgcccctgggactccaaccgcgacggcttcgtgatgggc gagggcgccggcgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccaccatctacgccgagttcctgggcggc tccttcacctccgacgcctaccacatgaccgagccccaccccgagggcgccggcgtgatcctgtgcatcgagaaggccctggccca gtccggcgtgtcccgcgaggacgtgaactacatcaacgcccacgccacctccacccccgccggcgacatcaaggagtaccaggcc ctggcccgctgcttcggccagaactccgagctgcgcgtgaactccaccaagtccatgatcggccacctgctgggcgccgccggcgg cgtggaggccgtggccgtgatccaggccatccgcaccggctggatccaccccaacatcaacctggaggaccccgacgaggccgtg gaccccaagctgctggtgggccccaagaaggagaagctgaaggtgaaggtggccctgtccaactccttcggcttcggcggccacaa ctcctccatcctgttcgccccctgcaacaccatgtacccctacgacgtgcccgactacgcctga SEQIDNO:48 CupheaprocumbensKASIV(D3290,pSZ4456)codonoptimizedforPrototheca atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggctccaccttccagtgcctggtgacctcccacaacgacccctgcaaccagtactgctcctccgcctccctgtcc ttcctgggcgacaacggcttcggctccaagcccttccgctccaaccgcggccaccgccgcctgggccgcgcctcccactccggcga ggccatggccgtggccctgcagcccgcccaggaggtggccaccaagaagaagcccgccatgaagcagcgccgcgtggtggtga ccggcatgggcgtggtgacccccctgggccacgagcccgacgtgtactacaacaacctgctggacggcgtgtccggcatctccgag atcgagaccttcgactgcacccagttccccacccgcatcgccggcgagatcaagtccttctccaccgacggctgggtggcccccaag ctgtccaagcgcatggacaagttcatgctgtacctgctgaccgccggcaagaaggccctggccgacggcggcatcaccgacgacgt gatgaaggagctggacaagcgcaagtgcggcgtgctgatcggctccggcatgggcggcatgaagctgttcaacgactccatcgag gccctgcgcgtgtcctacaagaagatgaaccccttctgcgtgcccttcgccaccaccaacatgggctccgccatgctggccatggacc tgggctggatgggccccaactactccatctccaccgcctgcgccacctccaacttctgcatcctgaacgccgccaaccacatcgtgcg 
cggcgaggccgacatgatgctgtgcggcggctccgacgccgtgatcatccccatcggcctgggcggcttcgtggcctgccgcgccc tgtcccagcgcaacaacgaccccaccaaggcctcccgcccctgggactccaaccgcgacggcttcgtgatgggcgagggcgccg gcgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccaccatctacgccgagttcctgggcggctccttcacctgc gacgcctaccacatgaccgagccccaccccgagggcgccggcgtgatcctgtgcatcgagaaggccctggcccagtccggcgtgt cccgcgaggacgtgaactacatcaacgcccacgccacctccacccccgccggcgacatcaaggagtaccaggccctggcccactg cttcggccagaactccgagctgcgcgtgaactccaccaagtccatgatcggccacctgctgggcgccgccggcggcgtggaggcc gtgaccgtgatccaggccatccgcaccggctggatccaccccaacctgaacctggaggaccccgacaaggccgtggacgccaagt tcctggtgggccccaagaaggagcgcctgaacgtgaaggtgggcctgtccaactccttcggcttcggcggccacaactcctccatcc tgttcgccccctgcaacaccatgtacccctacgacgtgcccgactacgcctga SEQIDNO:49 CpaucipetalaKASIVa(D3291,pSZ4457)codonoptimizedforPrototheca atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggctccaccttccagtgcctggtgaactcccacatcgacccctgcaaccagaacgtgtcctccgcctccctgtcc ttcctgggcgacaacggcttcggctccaaccccttccgctccaaccgcggccaccgccgcctgggccgcgcctcccactccggcga ggccatggccgtggccctgcagcccgcccaggaggtggccaccaagaagaagcccgccatcaagcagcgccgcgtggtggtga ccggcatgggcgtggtgacccccctgggccacgagcccgacgtgttctacaacaacctgctggacggcgtgtccggcatctccgag atcgagaccttcgactgcacccagttccccacccgcatcgccggcgagatcaagtccttctccaccgacggctgggtggcccccaag ctgtccaagcgcatggacaagttcatgctgtacctgctgaccgccggcaagaaggccctggccgacgccggcatcaccgaggacgt gatgaaggagctggacaagcgcaagtgcggcgtgctgatcggctccggcatgggcggcatgaagctgttcaacgactccatcgag gccctgcgcgtgtcctacaagaagatgaaccccttctgcgtgcccttcgccaccaccaacatgggctccgccatgctggccatggacc tgggctggatgggccccaactactccatctccaccgcctgcgccacctccaacttctgcatcctgaacgccgccaaccacatcatccg cggcgaggccgacatgatgctgtgcggcggctccgacgccgtgatcatccccatcggcctgggcggcttcgtggcctgccgcgccc tgtcccagcgcaactccgaccccaccaaggcctcccgcccctgggactccaaccgcgacggcttcgtgatgggcgagggcgccgg cgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccaccatctacgccgagttcctgggcggctccttcacctgcg 
acgcctaccacatgaccgagccccaccccgacggcgccggcgtgatcctgtgcatcgagaaggcccttggcccagtccggcgtgtc ccgcgaggacgtgaactacatcaacgcccacgccacctccaccccgccggcgacatcaaggagtaccaggccctggcccactgc ttcggccagaactccgagctgcgcgtgaactccaccaagtccatgatcggccacctgctgggcgccgccggcggcgtggaggccg tgaccgtgatccaggccatccgcaccggctggatccaccccaacctgaacctggaggaccccgacgaggccgtggacgccaagtt cctggtgggccccaagaaggagcgcctgaacgtgaaggtgggcctgtccaactccttcggcttcggcggccacaactcctccatcct gttcgccccctacaacaccatgtacccctacgacgtgcccgactacgcctga SEQIDNO:50 CupheapainteriKASIV(D3292,pSZ4458)codonoptimizedforPrototheca atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggctccaccccccagtgcctggacccctgcaaccagcactgcttcctgggcgacaacggcttcgcctccctgat cggctccaagcccccccgctccaacctgggccacctgcgcctgggccgcacctcccactccggcgaggtgatggccgtggcccag gaggtgtccaccaacaagaagcacgccaccaagcagcgccgcgtggtggtgaccggcatgggcgtggtgacccccctgggccac gaccccgacgtgtactacaacaacctgctggagggcgtgtccggcatctccgagatcgagaacttcgactgctcccagctgcccacc cgcatcgccggcgagatcaagtccttctccaccgacggcctggtggcccccaagctgtccaagcgcatggacaagttcatgctgtac atcctgaccgccggcaagaaggccctggccgacggcggcatcaccgaggacgtgatgaaggagctggacaagcgcaagtgcgg cgtgctgatcggctccggcctgggcggcatgaaggtgttctccgactccgtggaggccctgcgcatctcctacaagaagatctccccc ttctgcgtgcccttctccaccaccaacatgggctccgccatgctggccatggacctgggctggatgggccccaactactccatctccac cgcctgcgccacctccaacttctgcatcctgaacgccgccaaccacatcaccaagggcgaggccgacatgatgctgtgcggcggct ccgacgccgccatcctgcccatcggcatgggcggcttcgtggcctgccgcgccctgtcccagcgaacaacgaccccaccaaggc ctcccgcccctgggactccaaccgcgacggcttcgtgatgggcgagggcgccggcgtgctgctgctggaggagctggagcacgcc aagaagcgcggcgccaccatctacgccgagttcctgggcggctccttcacctgcgacgcctaccacatgaccgagccccaccccga cggcgccggcgtgatcctgtgcatcgagaaggccctggcccagtccggcgtgtcccgcgaggaggtgaactacatcaagcccac gccacctccacccccgccggcgacatcaaggagtaccaggccctggcccactgcttcggccagaactccgagctgcgcgtgaactc caccaagtccatgatcggccacctgctgggcggcgccggcggcgtggaggccgtgaccgtggtgcaggccatccgcaccggctg 
gatccaccccaacatcaacctggaggaccccgacaagggcgtggacgccaagctgctggtgggccccaagaaggagaagctgaa ggtgaaggtgggcctgtccaactccttcggcttcggcggccacaactcctccatcctgttcgccccctgcaacaccatgtacccctacg acgtgcccgactacgcctga SEQIDNO:51 C.avigeraKASIVa(D3293,pSZ4459)codonoptimizedforPrototheca atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggctccaccttccagtgcctggtgacctcctacaacgacccctgcgagcagtaccgctcctccgcctccctgtcc ttcctgggcgacaacggcttcgcctccctgttcggctccaagcccttccgctccaaccgcggccaccgccgcctgggccgcgcctcc cactccggcgaggccatggccgtggccctgcagcccgcccaggaggtgggcaccaagaagaagcccgtgatcaagcagcgccg cgtggtggtgaccggcatgggcgtggtgacccccctgggccacgagcccgacgtgtactacaacaacctgctggacggcgtgtccg gcatctccgagatcgagaccttcgactgcacccagttccccacccgcatcgccggcgagatcaagtccttctccaccgacggctgggt ggcccccaagctgtccaagcgcatggacaagttcatgctgtacctgctgaccgccggcaagaaggccctggccgacggcggcatc accgacgacgtgatgaaggagctggacaagcgcaagtgcggcgtgctgatcggctccggcctgggcggcatgaaggtgttctccg agtccatcgaggccctgcgcacctcctacaagaagatctcccccttctgcgtgcccttctccaccaccaacatgggctccgccatcctg gccatggacctgggctggatgggccccaactactccatctccaccgcctgcgccacctccaacttctgcatcctgaacgccgccaacc acatcaccaagggcgaggccgacatgatgctgtgcggcggctccgactccgtgatcctgcccatcggcatgggcggcttcgtggcct gccgcgccctgtcccagcgcaacaacgaccccaccaaggcctcccgcccctgggactccaaccgcgacggcttcgtgatgggcga gggcgccggcgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccaccatctacgccgagttcctgggcggctc cttcacctgcgacgcctaccacatgaccgagccccaccccgagggcgccggcgtgatcctgtgcatcgagaaggccctggcccagt ccggcgtgtcccgcgaggacgtgaactacatcaacgcccacgccacctccacccccgccggcgacatcaaggagtaccaggccct ggcccactgcttcggccagaactccgagctgcgcgtgaactccaccaagtccatgatcggccacctgctgggcggcgccggcggc gtggaggccgtgaccgtggtgcaggccatccgcaccggctggatccaccccaacatcaacctggacgaccccgacgagggcgtg gacgccaagctgctggtgggccccaagaaggagaagctgaaggtgaaggtgggcctgtccaactccttcggcttcggcggccaca actcctccatcctgttcgccccctgcaacaccatgtacccctacgacgtgcccgactacgcctga SEQIDNO:52 CigneaKASIVa(D3294,pSZ4460)codonoptimizedforPrototheca 
atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggctccacctcccagtgcctggtgacctcctacatcgacccctgcaacaagtactgctcctccgcctccctgtcct tcctgggcgacaacggcttcgcctccctgttcggctccaagcccttccgctccaaccgcggccaccgccgcctgggccgcgcctccc actccggcgaggccatggccgtggccctgcagcccgcccaggaggtgaccaccaagaagaagcccgtgatcaagcagcgccgc gtggtggtgaccggcatgggcgtggtgacccccctgggccacgagcccgacgtgtactacaacaacctgctggacggcgtgtccg gcatctccgagatcgagaccttcgactgcacccagttccccacccgcatcgccggcgagatcaagtccttctccaccgacggctgggt ggcccccaagctgtccaagcgcatggacaagttcatgctgtaccgctgaccgccggcaagaaggccctggccgacggcggcatc accgacgacgtgatgaaggagctggacaagcgcaagtgcggcgtgctgatcggctccggcatgggcggcatgaagctgttcaacg actccatcgaggccctgcgcatctcctacaagaagatgaaccccttctgcgtgcccttcgccaccaccaacatgggctccgccatgct ggccatggacctgggctggatgggccccaactactccatctccaccgcctgcgccacctccaacttctgcatcctgaacgcctccaac cacatcgtgcgcggcgaggccgacatgatgctgtgcggcggctccgacgccgtgatcatccccatcggcctgggcggcttcgtggc ctgccgcgccctgtcccagcgcaacaacgaccccaccaaggcctcccgcccctgggactccaaccgcgacggcttcgtgatgggc gagggcgccggcgtgctgctgctggaggagctggagcacgccaagaagcgcggcgccaccatctacgccgagttcctgggcggc tccttcacctgcgacgcctaccacatgaccgagccccaccccgagggcgccggcgtgatcctgtgcatcgagaaggccctggccca ggccggcgtgtccaaggaggacgtgaactacatcaacgcccacgccacctccacccccgccggcgacatcaaggagtaccaggc cctggcccagtgcttcggccagaactccgagctgcgcgtgaactccaccaagtccatgatcggccacctgctgggcgccgccggcg gcgtggaggccgtgaccgtggtgcaggccatccgcaccggctggatccaccccaacctgaacctggaggaccccgacaaggccg tggacgccaagctgctggtgggccccaagaaggagcgcctgaacgtgaaggtgggcctgtccaactccttcggcttcggcggccac aactcctccatcctgttcgccccctacaacaccatgtacccctacgacgtgcccgactacgcctga SEQIDNO:53 C.avigeraKASIa(D3342,pSZ4511)codonoptimizedforPrototheca atgcagtccctgcactcccccgccctgcgcgcctcccccctggaccccctgcgcctgaagtcctccgccaacggcccctcctccacc gccgccttccgccccctgcgccgcgccaccctgcccaacatccgcgccgcctcccccaccgtgtccgcccccaagcgcgagaccg accccaagaagcgcgtggtgatcaccggcatgggcctggtgtccgtgttcggctccgacgtggacgcctactacgagaagctgctgt 
ccggcgagtccggcatctccctgatcgaccgcttcgacgcctccaagttccccacccgcttcggcggccagatccgcggcttcaacg ccaccggctacatcgacggcaagaacgaccgccgcctggacgactgcctgcgctactgcatcgtggccggcaagaaggccctgga gaactccgacctgggcggcgactccctgtccaagatcgacaaggagcgcgccggcgtgctggtgggcaccggcatgggcggcct gaccgtgttctccgacggcgtgcagaacctgatcgagaagggccaccgcaagatctcccccttcttcatcccctacgccatcaccaac atgggctccgccctgctggccatcgacctgggcctgatgggccccaactactccatctccaccgcctgcgccacctccaactactgctt ctacgccgccgccaaccacatccgccgcggcgaggccgacctgatgatcgccggcggcaccgaggccgccatcatccccatcgg cctgggcggcttcgtggcctgccgcgccctgtcccagcgcaacgacgacccccagaccgcctcccgcccctgggacaaggaccg cgacggcttcgtgatgggcgaggcgccggcgtgctggtgatggagtccctggagcacgccatgaagcgcggcgcccccatcatc gccgagtacctgggcggcgccgtgaactgcgacgcctaccacatgaccgacccccgcgccgacggcctgggcgtgtcctcctgca tcgagtcctccctggaggacgccggcgtgtcccccgaggaggtgaactacatcaacgcccacgccacctccaccctggccggcga cctggccgagatcaacgccatcaagaaggtgttcaagaacaccaaggacatcaagatcaacgccaccaagtccatgatcggccact gcctgggcgcctccggcggcctggaggccatcgccaccatcaagggcatcaccaccggctggctgcacccctccatcaaccagttc aaccccgagccctccgtggagttcgacaccgtggccaacaagaagcagcagcacgaggtgaacgtggccatctccaactccttcgg cttcggcggccacaactccgtggtggccttctccgccttcaagcccaccatgtacccctacgacgtgcccgactacgcctga
SEQ ID NO: 54
C. pulcherrima KASI (D3343, pSZ4512) codon optimized for Prototheca
atgcactccctgcagtccccctccctgcgcgcctcccccctggaccccttccgccccaagtcctccaccgtgcgccccctgcaccgc gcctccatccccaacgtgcgcgccgcctcccccaccgtgtccgcccccaagcgcgagaccgaccccaagaagcgcgtggtgatca ccggcatgggcctggtgtccgtgttcggctccgacgtggacgcctactacgacaagctgctgtccggcgagtccggcatcggcccca tcgaccgcttcgacgcctccaagttccccacccgcttcggcggccagatccgcggcttcaactccatgggctacatcgacggcaaga acgaccgccgcctggacgactgcctgcgctactgcatcgtggccggcaagaagtccctggaggacgccgacctgggcgccgacc gcctgtccaagatcgacaaggagcgcgccggcgtgctggtgggcaccggcatgggcggcctgaccgtgttctccgacggcgtgca gtccctgatcgagaagggccaccgcaagatcacccccttcttcatcccctacgccatcaccaacatgggctccgccctgctggccatc gagctgggcctgatgggccccaactactccatctccaccgcctgcgccacctccaactactgcttccacgccgccgccaaccacatc
cgccgcggcgaggccgacctgatgatcgccggcggcaccgaggccgccatcatccccatcggcctgggcggcttcgtggcctgcc gcgccctgtcccagcgcaacgacgacccccagaccgcctcccgcccctgggacaaggaccgcgacggcttcgtgatgggcgagg gcgccggcgtgctggtgctggagtccctggagcacgccatgaagcgcggcgcccccatcatcgccgagtacctgggcggcgccat caactgcgacgcctaccacatgaccgacccccgcgccgacggcctgggcgtgtcctcctgcatcgagtcctccctggaggacgccg gcgtgtcccccgaggaggtgaactacatcaacgcccacgccacctccaccctggccggcgacctggccgagatcaacgccatcaa gaaggtgttcaagaacaccaaggacatcaagatcaacgccaccaagtccatgatcggccactgcctgggcgcctccggcggcctgg aggccatcgccaccatcaagggcatcaacaccggctggctgcacccctccatcaaccagttcaaccccgagccctccgtggagttcg acaccgtggccaacaagaagcagcagcacgaggtgaacgtggccatctccaactccttcggcttcggcggccacaactccgtggtg gccttctccgccttcaagcccaccatgtacccctacgacgtgcccgactacgcctga
SEQ ID NO: 55
C. avigera mitochondrial KAS (D3344, pSZ4513) codon optimized for Prototheca
atggtgttcctgccctggcgcaagatgctgtgcccctcccagtaccgcttcctgcgccccctgtcctcctccaccaccttcgacccccg ccgcgtggtggtgaccggcctgggcatggtgacccccctgggctgcggcgtgaacaccacctggaagcagctgatcgagggcaag tgcggcatccgcgccatctccctggaggacctgaagatggacgccttcgacatcgacacccaggcctacgtgttcgaccagctgacc tccaaggtggccgccaccgtgcccaccggcgtgaaccccggcgagttcaacgaggacctgtggttcaaccagaaggagcaccgcg ccatcgcccgcttcatcgcctacgccctgtgcgccgccgacgaggccctgaaggacgccaactgggagcccaccgagcccgagg agcgcgagatgaccggcgtgtccatcggcggcggcaccggctccatctcccgacgtgctggacgccggccgcatgatctgcgagaa gaagctgcgccgcctgtcccccttcttcatcccccgcatcctgatcaacatggcctccggccacgtgtccatgaagtacggcttccagg gccccaaccacgccgccgtgaccgcctgcgccaccggcgcccactccatcggcgacgccgcccgcatgatccagttcggcgacg ccgacgtgatggtggccggcggcaccgagtcctccatcgacgccctgtccatcgccggcttctgccgctcccgcgccctgaccacc aagtacaactcctgcccccaggaggcctcccgccccttcgacaccgaccgcgacggcttcgtgatcggcgagggctccggcgtgct ggtgctggaggagctggaccacgcccgcaagcgcggcgccaagatgtacgccgagttctgcggctacggcatgtccggcgacgc ccaccacatcacccagccccactccgacggccgcggcgccatcctggccatgacccgcgccctgaagcagtccaacctgcacccc gaccaggtggactacgtgaacgcccacgccacctccacctccctgggcgacgccatcgaggccaaggccatcaagaccgtgttctc
cgaccacgccatgtccggctccctggccctgtcctccaccaagggcgccatcggccacctgctgggcgccgccggcgccgtggag gccatcttctccatcctggccatcaagaacggcctggcccccctgaccctgaacgtggcccgccccgaccccgtgttcaccgagcgc ttcgtgcccctgaccgcctccaaggagatgcacgtgcgcgccgccctgtccaactccttcggcttcggcggcaccaacaccaccctg ctgttcacctcccccccccagaacaccatgtacccctacgacgtgcccgactacgcctga
SEQ ID NO: 56
C. avigera KASIII (D3345, pSZ4514) codon optimized for Prototheca.
atggccaacgcctacggcttcgtgggctcctccgtgcccaccgtgggccgcgccgcccagttccagcagatgggctccggcttctgc tccgtggacttcatctccaagcgcgtgttctgctgctccgccgtgcagggcgccgacaagcccgcctccggcgactcccgcgccgag taccgcaccccccgcctggtgtcccgcggctgcaagctgatcggctccggctccgccatccccaccctgcaggtgtccaacgacga cctggccaagatcgtggacaccaacgacgagtggatctccgtgcgcaccggcatccgcaaccgccgcgtgctgaccggcaaggac tccctgaccaacctggccaccgaggccgcccgcaaggccctggagatggcccaggtggacgccgaggacgtggacatggtgctg atgtgcacctccacccccgaggacctgttcggctccgccccccagatccagaaggccctgggctgcaagaagaaccccctgtcctac gacatcaccgccgcctgctccggcttcgtgctgggcctggtgtccgccgcctgccacatccgcggcggcggcttcaacaacgtgctg gtgatcggcgccgactccctgtcccgctacgtggactggaccgaccgcggcacctgcatcctgttcggcgacgccgccggcgccgt gctggtgcagtcctgcgacgccgaggaggacggcctgttcgccttcgacctgcactccgacggcgacggccagcgccacctgcgc gccgtgatcaccgagaacgagaccgaccacgccgtgggcaccaacggctccgtgtccgacttccccccccgccgctcctcctactc ctgcatccagatgaacggcaaggaggtgttccgcttcgcctgccgctccgtgccccagtccatcgagctggccctgggcaaggccg gcctgaacggctccaacatcgactggctgctgctgcaccaggccaaccagcgcatcatcgacgccgtggccacccgcctggaggtg ccccaggagcgcgtgatctccaacctggccaactacggcaacacctccgccgcctccatccccctggccctggacgaggccgtgcg cggcggcaaggtgaagcccggccacctgatcgccaccgccggcttcggcgccggcctgacctggggctccgccatcgtgcgctg gggcaccatgtacccctacgacgtgcccgactacgcctga
SEQ ID NO: 57
C. hookeriana FATB2 (ChFATB2)
MVAAAASSAFFPVPAPGASPKPGKFGNWPSSLSPSFKPKSIPNGGFQVKANDSAHPK ANGSAVSLKSGSLNTQEDTSSSPPPRTFLHQLPDWSRLLTAITTVFVKSKRPDMHDRK SKRPDMLVDSFGLESTVQDGLVFRQSFSIRSYEIGTDRTASIETLMNHLQETSLNHCK STGILLDGFGRTLEMCKRDLIWVVIKMQIKVNRYPAWGDTVEINTRFSRLGKIGMGR DWLISDCNTGEILVRATSAYAMMNQKTRRLSKLPYEVHQEIVPLFVDSPVIEDSDLK
VHKFKVKTGDSIQKGLTPGWNDLDVNQHVSNVKYIGWILESMPTEVLETQELCSLA LEYRRECGRDSVLESVTAMDPSKVGVRSQYQHLLRLEDGTAIVNGATEWRPKNAGA NGAISTGKTSNGNSVS
SEQ ID NO: 58
23S rRNA for UTEX 1439, UTEX 1441, UTEX 1435, UTEX 1437 Prototheca moriformis
TGTTGAAGAATGAGCCGGCGACTTAAAATAAATGGCAGGCTAAGAGAATTAATA ACTCGAAACCTAAGCGAAAGCAAGTCTTAATAGGGCGCTAATTTAACAAAACAT TAAATAAAATCTAAAGTCATTTATTTTAGACCCGAACCTGAGTGATCTAACCATG GTCAGGATGAAACTTGGGTGACACCAAGTGGAAGTCCGAACCGACCGATGTTGA AAAATCGGCGGATGAACTGTGGTTAGTGGTGAAATACCAGTCGAACTCAGAGCT AGCTGGTTCTCCCCGAAATGCGTTGAGGCGCAGCAATATATCTCGTCTATCTAGG GGTAAAGCACTGTTTCGGTGCGGGCTATGAAAATGGTACCAAATCGTGGCAAAC TCTGAATACTAGAAATGACGATATATTAGTGAGACTATGGGGGATAAGCTCCAT AGTCGAGAGGGAAACAGCCCAGACCACCAGTTAAGGCCCCAAAATGATAATGAA GTGGTAAAGGAGGTGAAAATGCAAATACAACCAGGAGGTTGGCTTAGAAGCAGC CATCCTTTAAAGAGTGCGTAATAGCTCACTG
SEQ ID NO: 59
Amino acid sequence of the C. hookeriana KASIV (D3668, pSZ4756). The algal transit peptide is underlined.
MASAAFTMSACPAMTGRAPGARRSGRPVATRLRGSTFQCLDPCNQQRFLGDNGFAS LFGSKPLRSNRGHLRLGRTSHSGEVMAVAMQPAQEVSTNKKPATKQRRVVVTGMG VVTPLGHDPDVYYNNLLDGISGISEIENFDCSQFPTRIAGEIKSFSTDGWVAPKFSERM DKFMLYMLTAGKKALADGGITEDAMKELNKRKCGVLIGSGLGGMKVFSDSIEALRT SYKKISPFCVPFSTTNMGSAILAMDLGWMGPNYSISTACATSNFCILNAANHIIKGEA DMMLCGGSDAAVLPVGLGGFVACRALSQRNNDPTKASRPWDSNRDGFVMGEGAG VLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPEGAGVILCIEKALAQSGVSR EDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHLLGGAGGVEAV AVVQAIRTGWIHPNINLEDPDEGVDAKLLVGPKKEKLKVKVGLSNSFGFGGHNSSIL FAPCN
SEQ ID NO: 60
Nucleotide sequence of the C. hookeriana KASIV (D3668, pSZ4756) expression vector. The 5' and 3' homology arms enabling targeted integration into the SAD2-1 locus are noted with lowercase. The endogenous SAD2-1 promoter (present within the 5' homology targeting arm) drives the expression of the codon optimized ChKASIV (noted with lowercase bold text) and is terminated with the PmHSP90 3' UTR noted in underlined, lowercase bold. The PmHXT1-2 promoter, noted in uppercase italic, drives expression of the ScMelibiase selection marker, noted with lowercase italic, followed by the PmPGK 3' UTR terminator
highlighted in uppercase. Restriction cloning sites and spacer DNA fragments are noted as underlined, uppercase plain lettering.
gccggtcaccacccgcatgctcgtactacagcgcacgcaccgcttcgtgatccaccgggtgaacgtagtcctcgacggaaacatctg gttcgggcctcctgcttgcactcccgcccatgccgacaacctttctgctgttaccacgacccacaatgcaacgcgacacgaccgtgtg ggactgatcggttcactgcacctgcatgcaattgtcacaagcgcttactccaattgtattcgtttgttttctgggagcagttgctcgaccgc ccgcgtcccgcaggcagcgatgacgtgtgcgtggcctgggtgtttcgtcgaaaggccagcaaccctaaatcgcaggcgatccggag attgggatctgatccgagtttggaccagatccgccccgatgcggcacgggaactgcatcgactcggcgcggaacccagctttcgtaa atgccagattggtgtccgatacctggatttgccatcagcgaaacaagacttcagcagcgagcgtatttggcgggcgtgctaccagggtt gcatacattgcccatttctgtctggaccgctttactggcgcagagggtgagttgatggggttggcaggcatcgaaacgcgcgtgcatgg tgtgcgtgtctgttttcggctgcacgaattcaatagtcggatgggcgacggtagaattgggtgtggcgctcgcgtgcatgcctcgcccc gtcgggtgtcatgaccgggactggaatcccccctcgcgaccatcttgctaacgctcccgactctcccgaccgcgcgcaggatagact cttgttcaaccaatcgacaGGTACCatggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccct ggggcacgtcgctccggacggccagtcgccacccgcctgaggggcagcaccttccagtgcctggacccctgcaaccagcagc gcttcctgggcgacaacggcttcgcgtcgctgttcggctccaagcccctgcgcagcaaccgcggccacctgcgcctgggccgc acctcgcactccggcgaggtgatggccgtcgcgatgcagcccgcccaggaggtgagcaccaacaagaagcccgcgaccaa gcagcgccgcgtggtcgtgaccggcatgggcgtcgtgacccccctgggccacgaccccgacgtgtattataacaacctgctgg acggcatctcgggcatctccgagatcgagaacttcgactgcagccagttccccacccgcatcgccggcgagatcaagtcgttc tccaccgacggctgggtcgcgcccaagttcagcgagcgcatggacaagttcatgctgtatatgctgaccgccggcaagaagg cgctggccgacggcggcatcaccgaggacgcgatgaaggagctgaacaagcgcaagtgcggcgtgctgatcggctcgggc ctgggcggcatgaaggtcttctccgacagcatcgaggccctgcgcacctcgtataagaagatctcccccttctgcgtgcccttc agcaccaccaacatgggctcggcgatcctggcgatggacctgggctggatgggccccaactattccatcagcaccgcgtgcg ccacctcgaacttctgcatcctgaacgcggccaaccacatcatcaagggcgaggcggacatgatgctgtgcggcggctccga cgccgcggtgctgcccgtcggcctgggcggcttcgtggcctgccgcgcgctgagccagcgcaacaacgaccccaccaaggcc tcgcgcccctgggactccaaccgcgacggcttcgtcatgggcgagggcgcgggcgtgctgctgctggaggagctggagcacg
ccaagaagcgcggcgcgaccatctatgccgagttcctgggcggcagcttcacctgcgacgcgtatcacatgaccgagcccca ccccgagggcgccggcgtcatcctgtgcatcgagaaggcgctggcccagtcgggcgtgtcccgcgaggacgtgaactatatc aacgcgcacgccaccagcacccccgcgggcgacatcaaggagtatcaggccctggcgcactgcttcggccagaactcggag ctgcgcgtcaactccaccaagagcatgatcggccacctgctgggcggcgccggcggcgtggaggcggtcgccgtggtccagg cgatccgcaccggctggatccaccccaacatcaacctggaggaccccgacgagggcgtggacgccaagctgctggtcggcc ccaagaaggagaagctgaaggtgaaggtcggcctgtcgaactccttcggcttcggcggccacaacagctcgatcctgttcgc gccctgcaactgaCTCGAGacagacgaccttggcaggcgtcgggtagggaggtggtggtgatggcgtctcgatgccatc gcacgcatccaacgaccgtatacgcatcgtccaatgaccgtcggtgtcctctctgcctccgttttgtgagatgtctcaggcttggt gcatcctcgggtggccagccacgttgcgcgtcgtgctgcttgcctctcttgcgcctctgtggtactggaaaatatcatcgaggcc cgtttttttgctcccatttcctttccgctacatcttgaaagcaaacgacaaacgaagcagcaagcaaagagcacgaggacggtg aacaagtctgtcacctgtatacatctatttccccgcgggtgcacctactctctctcctgccccggcagagtcagctgccttacgtg acCCTAGGTGCGGTGAGAATCGAAAATGCATCGTTTCTAGGTTCGGAGACGGTCAATTC CCTGCTCCGGCGAATCTGTCGGTCAAGCTGGCCAGTGGACAATGTTGCTATGGCAGC CCGCGCACATGGGCCTCCCGACGCGGCCATCAGGAGCCCAAACAGCGTGTCAGGGT ATGTGAAACTCAAGAGGTCCCTGCTGGGCACTCCGGCCCCACTCCGGGGGCGGGAC GCCAGGCATTCGCGGTCGGTCCCGCGCGACGAGCGAAATGATGATTCGGTTACGAGA CCAGGACGTCGTCGAGGTCGAGAGGCAGCCTCGGACACGTCTCGCTAGGGCAACGC CCCGAGTCCCCGCGAGGGCCGTAAACATTGTTTCTGGGTGTCGGAGTGGGCATTTTG GGCCCGATCCAATCGCCTCATGCCGCTCTCGTCTGGTCCTCACGTTCGCGTACGGCCT GGATCCCGGAAAGGGCGGATGCACGTGGTGTTGCCCCGCCATTGGCGCCCACGTTTC AAAGTCCCCGGCCAGAAATGCACAGGACCGGCCCGGCTCGCACAGGCCATGCTGAAC GCCCAGATTTCGACAGCAACACCATCTAGAATAATCGCAACCATCCGCGTTTTGAACGA AACGAAACGGCGCTGTTTAGCATGTTTCCGACATCGTGGGGGCCGAAGCATGCTCCG GGGGGAGGAAAGCGTGGCACAGCGGTAGCCCATTCTGTGCCACACGCCGACGAGGA CCAATCCCCGGCATCAGCCTTCATCGACGGCTGCGCCGCACATATAAAGCCGGACGC CTAACCGGTTTCGTGGTTATGACTAGTatgttcgcgttctacttcctgacggcctgcatctccctgaagggcgtg ttcggcgtctccccctcctacaacggcctgggcctgacgccccagatgggctgggacaactggaacacgttcgcctgcgacgtctc cgagcagctgctgctggacacggccgaccgcatctccgacctgggcctgaaggacatgggctacaagtacatcatcctggacga 
ctgctggtcctccggccgcgactccgacggcacctggtcgccgacgagcagaagaccccaacggcatgggccacgtcgccga ccacctgcacaacaactccacctgacggcatgtactcctccgcgggcgagtacacgtgcgccggctaccccggctccctgggcc gcgaggaggaggacgcccagacttcgcgaacaaccgcgtggactacctgaagtacgacaactgctacaacaagggccagac ggcacgcccgagatctcctaccaccgctacaaggccatgtccgacgccctgaacaagacgggccgccccatcactactccctgt gcaactggggccaggacctgaccactactggggctccggcatcgcgaactcctggcgcatgtccggcgacgtcacggcggagtt cacgcgccccgactcccgctgcccctgcgacggcgacgagtacgactgcaagtacgccggcaccactgctccatcatgaacatc ctgaacaaggccgcccccatgggccagaacgcgggcgtcggcggctggaacgacctggacaacctggaggtcggcgtcggc aacctgacggacgacgaggagaaggcgcacactccatgtgggccatggtgaagtcccccctgatcatcggcgcgaacgtgaa caacctgaaggcctcctcctactccatctactcccaggcgtccgtcatcgccatcaaccaggactccaacggcatccccgccacg cgcgtctggcgctactacgtgtccgacacggacgagtacggccagggcgagatccagatgtggtccggccccctggacaacgg cgaccaggtcgtggcgctgctgaacggcggctccgtgtcccgccccatgaacacgaccctggaggagatcacttcgactccaac ctgggctccaagaagctgacctccacctgggacatctacgacctgtgggcgaaccgcgtcgacaactccacggcgtccgccatc ctgggccgcaacaagaccgccaccggcatcctgtacaacgccaccgagcagtcctacaaggacggcctgtccaagaacgaca cccgcctgttcggccagaagatcggctccctgtcccccaacgcgatcctgaacacgaccgtcccgcccacggcatcgcgttcta ccgcctgcgcccctcctcctgATACAACTTATTACGTATTCTGACCGGCGCTGATGTGGCGCG GACGCCGTCGTACTCTTTCAGACTTTACTCTTGAGGAATTGAACCTTTCTCGCTTG CTGGCATGTAAACATTGGCGCAATTAATTGTGTGATGAAGAAAGGGTGGCACAA GATGGATCGCGAATGTACGAGATCGACAACGATGGTGATTGTTATGAGGGGCCA AACCTGGCTCAATCTTGTCGCATGTCCGGCGCAATGTGATCCAGCGGCGTGACTC TCGCAACCTGGTAGTGTGTGCGCACCGGGTCGCTTTGATTAAAACTGATCGCATT GCCATCCCGTCAACTCACAAGCCTACTCTAGCTCCCATTGCGCACTCGGGCGCCC GGCTCGATCAATGTTCTGAGCGGAGGGCGAAGCGTCAGGAAATCGTCTCGGCAG CTGGAAGCGCATGGAATGCGGAGCGGAGATCGAATCAGATATCAAGCTCCATCG AGCTCcagccacggcaacaccgcgcgccttgcggccgagcacggcgacaagaacctgagcaagatctgcgggctgatcgcc agcgacgagggccggcacgagatcgcctacacgcgcatcgtggacgagttcttccgcctcgaccccgagggcgccgtcgccgcct acgccaacatgatgcgcaagcagatcaccatgcccgcgcacctcatggacgacatgggccacggcgaggccaacccgggccgca 
acctcttcgccgacttctccgcggtcgccgagaagatcgacgtctacgacgccgaggactactgccgcatcctggagcacctcaacg cgcgctggaaggtggacgagcgccaggtcagcggccaggccgccgcggaccaggagtacgtcctgggcctgccccagcgcttcc ggaaactcgccgagaagaccgccgccaagcgcaagcgcgtcgcgcgcaggcccgtcgccttctcctggatctccgggcgcgaga tcatggtctagggagcgacgagtgtgcgtgcggggctggcgggagtgggacgccctcctcgctcctctctgttctgaacggaacaat cggccaccccgcgctacgcgccacgcatcgagcaacgaagaaaaccccccgatgataggttgcggtggctgccgggatatagatc cggccgcacatcaaagggcccctccgccagagaagaagctcctttcccagcagactcct
SEQ ID NO: 61
Nucleotide sequence of the C. hookeriana KASIV CDS codon optimized for P. moriformis.
atggcttccgcggcattcaccatgtcggcgtgccccgcgatgactggcagggcccctggggcacgtcgctccggacggccagtcgc cacccgcctgaggggcagcaccttccagtgcctggacccctgcaaccagcagcgcttcctgggcgacaacggcttcgcgtcgctgtt cggctccaagcccctgcgcagcaaccgcggccacctgcgcctgggccgcacctcgcactccggcgaggtgatggccgtcgcgat gcagcccgcccaggaggtgagcaccaacaagaagcccgcgaccaagcagcgccgcgtggtcgtgaccggcatgggcgtcgtga cccccctgggccacgaccccgacgtgtattataacaacctgctggacggcatctcgggcatctccgagatcgagaacttcgactgca gccagttccccacccgcatcgccggcgagatcaagtcgttctccaccgacggctgggtcgcgcccaagttcagcgagcgcatggac aagttcatgctgtatatgctgaccgccggcaagaaggcgctggccgacggcggcatcaccgaggacgcgatgaaggagctgaaca agcgcaagtgcggcgtgctgatcggctcgggcctgggcggcatgaaggtcttctccgacagcatcgaggccctgcgcacctcgtat aagaagatctcccccttctgcgtgcccttcagcaccaccaacatgggctcggcgatcctggcgatggacctgggctggatgggcccc aactattccatcagcaccgcgtgcgccacctcgaacttctgcatcctgaacgcggccaaccacatcatcaagggcgaggcggacatg atgctgtgcggcggctccgacgccgcggtgctgcccgtcggcctgggcggcttcgtggcctgccgcgcgctgagccagcgcaaca acgaccccaccaaggcctcgcgcccctgggactccaaccgcgacggcttcgtcatgggcgagggcgcgggcgtgctgctgctgga ggagctggagcacgccaagaagcgcggcgcgaccatctatgccgagttcctgggcggcagcttcacctgcgacgcgtatcacatga ccgagccccaccccgagggcgccggcgtcatcctgtgcatcgagaaggcgctggcccagtcgggcgtgtcccgcgaggacgtga actatatcaacgcgcacgccaccagcacccccgcgggcgacatcaaggagtatcaggccctggcgcactgcttcggccagaactcg gagctgcgcgtcaactccaccaagagcatgatcggccacctgctgggcggcgccggcggcgtggaggcggtcgccgtggtccag
gcgatccgcaccggctggatccaccccaacatcaacctggaggaccccgacgagggcgtggacgccaagctgctggtcggcccc aagaaggagaagctgaaggtgaaggtcggcctgtcgaactccttcggcttcggcggccacaacagctcgatcctgttcgcgccctgc aactga
SEQ ID NO: 62
Amino acid sequence of the C. aequipetala KASIV. The algal transit peptide is underlined.
CaequeKASIV
MAAAASMVASPLCTWLVAACMSTSFDNDPRSPSIKRIPRRRRILSQSSLRGSTFQCLV TSYIDPCNQFSSSASLSFLGDNGFASLFGSKPFRSIRGHRRLGRASHSGEAMAVALEPA QEVATKKKPVVKQRRVVVTGMGVVTPLGHEPDVYYNNLLDGVSGISEIETFDCNQF PTRIAGEIKSFSTDGWVAPKLSKRMDKFMLYLLTAGKKALADGGITDDVMKELDKR KCGVLIGSGLGGMKLFSDSIEALRISYKKMNPFCVPFATTNMGSAMLAMDLGWMGP NYSISTACATSNFCILNSANHIVRGEADMMLCGGSDAVIIPIGLGGFVACRALSQRNN DPTKASRPWDSNRDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHM TEPHPEGAGVILCIEKALAQAGVSREDVNYINAHATSTPAGDIKEYQALAHCFGHNS ELRVNSTKSMIGHLIGAAGGVEAVTVVQAIRTGWIHPNLNLEDPDKAVDAKLLVGP KKERLNVKVGLSNSFGFGGHNSSILFAPYN
SEQ ID NO: 63
Amino acid sequence of the C. glossostoma KASIV. The algal transit peptide is underlined.
S07_Cg_Locus_4548_Transcript_4/9_translation
MAAAASSQLCTWLVAACMSTSFDNNPRSPSIKRLPRRRRVLSHCSLRGSTFQCLVTS YIDPCNQYCSSASLSFLGDNGFTPLIGSKPFRSNRGHPRLGRASHSGEAMAVALQPAQ EVATKKKPAMKQRRVVVTGMGVVTPLGHEPDVYYNNLLDGVSGISEIETFDCTQFP TRIAGEIKSFSTDGWVAPKLSKRMDKFMLYLLTAGKKALADGGITDDVMKELDKRK CGVLIGSGMGGMKLFNDSIEALRVSYKKMNPFCVPFATTNMGSAMLAMDLGWMGP NYSISTACATSNFCILNAANHIVRGEADMMLCGGSDAVIIPIGLGGFVACRALSQRNN DPTKASRPWDSNRDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHM TEPHPEGAGVILCIEKALAQAGVSREDVNYINAHATSTPAGDIKEYQALAHCFGQNS ELRVNSTKSMIGHLLGAAGGVEAVTVIQAIRTGWIHPNLNLDDPDKAVDAKFLVGP KKERLNVKVGLSNSFGFGGHNSSILFAPYN
SEQ ID NO: 64
Amino acid sequence of the C. hookeriana KASIV. The algal transit peptide is underlined.
S26_ChookKASIV_trinity_43853-translation
MAASSCMVGSPFCTWLVSACMSTSFDNDPRSLSHKRLRLSRRRRTLSSHCSLRGSTP QCLDPCNQHCFLGDNGFASLFGSKPPRSDLGHLRLGRTSHSGEVMAVAQEVSTNKK PATKQRRVVVTGMGVVTPLGHDPDVYYNNLLDGVSGISEIETFDCTQFPTRIAGEIKS FSTDGLVAPKLSKRMDKFMLYILTAGKKALADGGITEDVMKELDKRKCGVLIGSGL GGMKVFSDSVEALRISYKKISPFCVPFSTTNMGSAILAMDLGWMGPNYSISTACATS NFCILNAANHITKGEADMMLCGGSDAAILPIGMGGFVACRALSQRNNDPTKASRPW DSNRDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPEGAG VILCIEKALAQAGVSREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKS MIGHLIGAAGGVEAVTVIQAIRTGWIHPNLNLENPDKAVDAKLLVGPKKERLDVKV GLSNSFGFGGHNSSILFAPYN
SEQ ID NO: 65
Amino acid sequence of the C. glossostoma KASIV. The algal transit peptide is underlined.
S07_Cg_Locus_3059_Transcript_2/2_translation
MAAASSMVASSFSTSLVAACMSTSFDNDPRFLSHKRIRLSLRRGSTFQCLGDNGFAS LIGSKPPRSNHGHRRLGRTSHSGEAMAVAMQPAQEASTKNKHVTKQRRVVVTGMG VVTPLGHDPDVYYNNLLDGVSGISEIENFDCSQFPTRIAGEIKSFSTEGYVIPKFAKRM DKFMLYLLTAGKKALEDGGITEDVMKELDKRKCGVLIGSGMGGMKIINDSIAALNV SYKKMTPFCVPFSTTNMGSAMLAIDLGWMGPNYSISTACATSNYCILNAANHIIRGE ANMMLCGGSDAVVIPVGLGGFVACRALSQRNNDPTKASRPWDSNRDGFVMGEGA GVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPDGAGVILCIEKALAQSGV SREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHLLGAAGGVEA VSVVQAIRTGWIHPNINLEDPDEAVDAKLLVGPKKEKLKVKVGLSNSFGFGGHNSSI LFAPCN
SEQ ID NO: 66
Amino acid sequence of the C. carthagenensis KASIV. The algal transit peptide is underlined.
S05_CcrKASIV_17190_Seq_7/7_translation
MAAAAAFASPFCTWLVAACMSSASRHDPLPSPSSKPRLRRKILFQCAGRGSSAGSGS SFHSLVTSYLGCLEPCHEYYTSSSSLGFSSLFGSTPGRTSRRQRRLHRASHSGEAMAV ALQPAQEVTTKKKPSIKQRRVVVTGMGVVTPLGHDPDVFYNNLLDGASGISEIETFD CAQFPTRIAGEIKSFSTDGWVAPKLSKRMDKFMLYMLTAGKKALADGGISEDVMKE LDKRKCGVLIGSAMGGMKVFNDAIEALRISYKKMNPFCVPFATTNMGSAMLAMDL GWMGPNYSISTACATSNFCILNAANHITRGEADMMLCGGSDAVIIPIGLGGFVACRA LSQRNNDPTKASRPWDSNRDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFT CDAYHMTEPHPKGAGVILCIERALAQSGVSREDVNYINAHATSTPAGDIKEYQALAH CFGQNSELRVNSTKSMIGHLLGAAGGVEAVTVVQAIRTGWVHPNINLENPDEGVDA KLLVGPKKEKLKVKVGLSNSFGFGGHNSSILFAPYN
SEQ ID NO: 67
Amino acid sequence of the C. carthagenensis KASIV. The algal transit peptide is underlined.
S05_CcrKASIV_17190_Seq_6/7_translation
MAAAASVVASPFCTWLVAACMSASFDNEPRSLSPKRRRSLSRSSSASLRFLGGNGFA SLFGSDPLRPNRGHRRLRHASHSGEAMAVALQPAQEVSTKKKPVTKQRRVVVTGM GVVTPLGHDPDVYYNNLLDGVSGISEIETFDCTQFPTRIAGEIKSFSTDGWVAPKLSK RMDKFMLYMLTAGKKALADGGITEEVMKELDKRKCGVLIGSGMGGMKLFNDSIEA LRISYKKMNPFCVPFATTNMGSAMLAMDLGWMGPNYSISTACATSNFCILNAANHIT RGEADMMLCGGSDAVIIPIGLGGFVACRALSQRNNDPTKASRPWDSNRDGFVMGEG AGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPKGAGVILCIERALAQSG VSREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIGHLLGAAGGVE AVTVVQAIRTGWVHPNINLENPDEGVDAKLLVGPKKEKLKVKVGLSNSFGFGGHNS SILFAPYN
SEQ ID NO: 68
Amino acid sequence of the C. pulcherrima KASIV. The algal transit peptide is underlined.
pSZ2181-Cpu1cKASIV
MPAASSLLASPLCTWLLAACMSTSFHPSDPLPPSISSPRRRLSRRRILSQCAPLPSASSA LRGSSFHTLVTSYLACFEPCHDYYTSASLFGSRPIRTTRRHRRLNRASPSREAMAVAL QPEQEVTTKKKPSIKQRRVVVTGMGVVTPLGHDPDVFYNNLLDGTSGISEIETFDCA QFPTRIAGEIKSFSTDGWVAPKLSKRMDKFMLYMLTAGKKALTDGGITEDVMKELD KRKCGVLIGSAMGGMKVFNDAIEALRISYKKMNPFCVPFATTNMGSAMLAMDLGW MGPNYSISTACATSNFCIMNAANHIIRGEADVMLCGGSDAVIIPIGMGGFVACRALSQ RNSDPTKASRPWDSNRDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDA YHMTEPHPDGAGVILCIEKALAQSGVSREDVNYINAHATSTPAGDIKEYQALIHCFG QNRELKVNSTKSMIGHLLGAAGGVEAVSVVQAIRTGWIHPNINLENPDEGVDTKLLV GPKKERLNVKVGLSNSFGFGGHNSSILFAPYI
SEQ ID NO: 69
Clade 1 KASIV consensus C8 and C10
MAAASCMVASPFCTWLVAACMSTSXDNDPRSLSHKRLRLSRRRRTLSSHCSLRGSTF QCLDPCNQHCFLGDNGFASLFGSKPPRSNRGHLRLGRTSHSGEVMAVAXQXAQEVS TNKKPATKQRRVVVTGMGVVTPLGHDPDVYYNNLLDGVSGISEIENFDCSQFPTRIA GEIKSFSTDGWVAPKLSKRMDKFMLYILTAGKKALADGGITEDVMKELDKRKCGVL IGSGLGGMKVFSDSIEALRTSYKKISPFCVPFSTTNMGSAILAMDLGWMGPNYSISTA CATSNFCILNAANHITKGEADMMLCGGSDAAILPIGMGGFVACRALSQRNNDPTKAS RPWDSNRDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPE GAGVILCIEKALAQSGVSREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNS TKSMIGHLLGGAGGVEAVTVVQAIRTGWIHPNINLEDPDEGVDAKLLVGPKKEKLK VKVGLSNSFGFGGHNSSILFAPCN
SEQ ID NO: 70
Clade 2 KASIV consensus C10 only
MAAAASMXXSPLCTWLVAACMSTSFDNDPRSPSIKRLPRRRRVLSQCSLRGSTFQCL
VTSYIDPCNQYCSSASLSFLGDNGFASLFGSKPFRSNRGHRRLGRASHSGEAMAVAL QPAQEVATKKKPVIKQRRVVVTGMGVVTPLGHEPDVYYNNLLDGVSGISEIETFDCT QFPTRIAGEIKSFSTDGWVAPKLSKRMDKFMLYLLTAGKKALADGGITDDVMKELD KRKCGVLIGSGMGGMKLFNDSIEALRXSYKKMNPFCVPFATTNMGSAMLAMDLGW MGPNYSISTACATSNFCILNAANHIVRGEADMMLCGGSDAVIIPIGLGGFVACRALSQ RNNDPTKASRPWDSNRDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDA YHMTEPHPEGAGVILCIEKALAQAGVSREDVNYINAHATSTPAGDIKEYQALAHCFG QNSELRVNSTKSMIGHLLGAAGGVEAVTVXQAIRTGWIHPNLNLEDPDKAVDAKLL VGPKKERLNVKVGLSNSFGFGGHNSSILFAPYNV
SEQ ID NO: 71
Clade 1 KASIV consensus mature protein
KQRRVVVTGMGVVTPLGHDPDVYYNNLLDGVSGISEIENFDCSQFPTRIAGEIKSFST DGWVAPKLSKRMDKFMLYILTAGKKALADGGITEDVMKELDKRKCGVLIGSGLGG MKVFSDSIEALRTSYKKISPFCVPFSTTNMGSAILAMDLGWMGPNYSISTACATSNFC ILNAANHITKGEADMMLCGGSDAAILPIGMGGFVACRALSQRNNDPTKASRPWDSN RDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPEGAGVIL CIEKALAQSGVSREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKSMIG HLLGGAGGVEAVTVVQAIRTGWIHPNINLEDPDEGVDAKLLVGPKKEKLKVKVGLS NSFGFGGHNSSILFAPCN
SEQ ID NO: 72
Clade 2 KASIV consensus mature protein
KQRRVVVTGMGVVTPLGHEPDVYYNNLLDGVSGISEIETFDCTQFPTRIAGEIKSFST DGWVAPKLSKRMDKFMLYLLTAGKKALADGGITDDVMKELDKRKCGVLIGSGMG GMKLFNDSIEALRXSYKKMNPFCVPFATTNMGSAMLAMDLGWMGPNYSISTACAT SNFCILNAANHIVRGEADMMLCGGSDAVIIPIGLGGFVACRALSQRNNDPTKASRPW DSNRDGFVMGEGAGVLLLEELEHAKKRGATIYAEFLGGSFTCDAYHMTEPHPEGAG VILCIEKALAQAGVSREDVNYINAHATSTPAGDIKEYQALAHCFGQNSELRVNSTKS MIGHLLGAAGGVEAVTVXQAIRTGWIHPNLNLEDPDKAVDAKLLVGPKKERLNVK VGLSNSFGFGGHNSSILFAPYNV