BIOSYNTHESIS OF SULFUR-CONTAINING COMPOUNDS USING GENETICALLY MODIFIED BACTERIA

Abstract

A genetically modified prokaryotic cell which comprises: a cysteamine dioxygenase (ADO) polypeptide sequence which has at least 70% sequence coverage to SEQ 2, and at least 25% sequence identity to SEQ 2; and a vanin (VNN) polypeptide sequence selected from the group consisting of: a vanin-1 (VNN1) polypeptide sequence which has at least 70% sequence coverage to SEQ 4, and at least 25% sequence identity to SEQ 4; a vanin-2 (VNN2) polypeptide sequence which has at least 70% sequence coverage to SEQ 84, and at least 25% sequence identity to SEQ 84; and a vanin-3 (VNN3) polypeptide sequence which has at least 70% sequence coverage to SEQ 128 and at least 25% sequence identity to SEQ 128.

Claims

1. A genetically modified prokaryotic cell which comprises a cysteamine dioxygenase (ado) polynucleotide sequence which has at least 70% sequence coverage to SEQ 1, and at least 70% sequence identity to SEQ 1.

2. The genetically modified prokaryotic cell according to claim 1, wherein said ado polynucleotide sequence is selected from the group consisting of: SEQ 1; SEQ 22; SEQ 23; SEQ 24; SEQ 25; SEQ 26; and SEQ 27.

3. The genetically modified prokaryotic cell according to claim 1, wherein upon transcription and translation under the control of a native or synthetic promoter and ribosomal binding site (RBS), said ado polynucleotide sequence provides a cysteamine dioxygenase (ADO) polypeptide sequence which has at least 70% sequence coverage to SEQ 2, and at least 25% sequence identity to SEQ 2.

4. The genetically modified prokaryotic cell according to claim 3, wherein said ADO polypeptide sequence is selected from the group consisting of: SEQ 2; SEQ 28; SEQ 29; SEQ 30; SEQ 31; SEQ 32; SEQ 33; SEQ 34; SEQ 35; SEQ 36; SEQ 37; SEQ 38; SEQ 39; SEQ 40; SEQ 41; and SEQ 42.

5. A genetically modified prokaryotic cell which comprises a vanin (vnn) polynucleotide sequence selected from the group consisting of: a vanin-1 (vnn1) polynucleotide sequence which has at least 70% sequence coverage to SEQ 3 or SEQ 69, and at least 70% sequence identity to SEQ 3 or SEQ 69; a vanin-2 (vnn2) polynucleotide sequence which has at least 70% sequence coverage to SEQ 70, and at least 70% sequence identity to SEQ 70; and a vanin-3 (vnn3) polynucleotide sequence which has at least 70% sequence coverage to SEQ 111, and at least 70% sequence identity to SEQ 111.

6. The genetically modified prokaryotic cell according to claim 5, wherein: said vanin-1 (vnn1) polynucleotide sequence is selected from the group consisting of: SEQ 3; SEQ 43; SEQ 44; SEQ 45; SEQ 46; SEQ 47; SEQ 48; SEQ 49; SEQ 50; SEQ 51; SEQ 52; and SEQ 69; or said vanin-2 (vnn2) polynucleotide sequence is selected from the group consisting of: SEQ 70; SEQ 71; SEQ 72; SEQ 73; SEQ 74; SEQ 75; SEQ 76; SEQ 77; SEQ 78; SEQ 79; SEQ 80; SEQ 81; SEQ 82; SEQ 83; and said vanin-3 (vnn3) polynucleotide sequence is selected from the group consisting of: SEQ 111; SEQ 112; SEQ 113; SEQ 114; SEQ 115; SEQ 116; SEQ 117; SEQ 118; SEQ 119; SEQ 120; SEQ 121; SEQ 122; SEQ 123; SEQ 124; SEQ 125; SEQ 126; and SEQ 127.

7. The genetically modified prokaryotic cell according to claim 5, wherein upon transcription and translation under the control of a native or synthetic promoter and ribosomal binding site (RBS), said vnn1 polynucleotide sequence provides a vanin-1 (VNN1) polypeptide sequence which has at least 70% sequence coverage to SEQ 4, and at least 25% sequence identity to SEQ 4.

8. The genetically modified prokaryotic cell according to claim 5, wherein upon transcription and translation under the control of a native or synthetic promoter and ribosomal binding site (RBS), said vnn2 polynucleotide sequence provides a vanin-2 (VNN2) polypeptide sequence which has at least 70% sequence coverage to SEQ 84, and at least 25% sequence identity to SEQ 84.

9. The genetically modified prokaryotic cell according to claim 5, wherein upon transcription and translation under the control of a native or synthetic promoter and ribosomal binding site (RBS), said vnn3 polynucleotide sequence provides a vanin-3 (VNN3) polypeptide sequence which has at least 70% sequence coverage to SEQ 128, and at least 25% sequence identity to SEQ 128.

10. The genetically modified prokaryotic cell according to claim 7, wherein said VNN1 polypeptide sequence is selected from the group consisting of: SEQ 4; SEQ 53; SEQ 54; SEQ 55; SEQ 56; SEQ 57; SEQ 58; SEQ 59; SEQ 60; SEQ 61; SEQ 62; SEQ 63; SEQ 64; SEQ 65; SEQ 66; SEQ 67; and SEQ 68.

11. The genetically modified prokaryotic cell according to claim 8, wherein said VNN2 polypeptide sequence is selected from the group consisting of: SEQ 84; SEQ 85; SEQ 86; SEQ 87; SEQ 88; SEQ 89; SEQ 90; SEQ 91; SEQ 92; SEQ 93; SEQ 94; SEQ 95; SEQ 96; SEQ 97; SEQ 98; SEQ 99; SEQ 100; SEQ 101; SEQ 102; SEQ 103; SEQ 104; SEQ 105; SEQ 106; SEQ 107; SEQ 108; SEQ 109; and SEQ 110.

12. The genetically modified prokaryotic cell according to claim 9, wherein said VNN3 polypeptide sequence is selected from the group consisting of: SEQ 128; SEQ 129; SEQ 130; SEQ 131; SEQ 132; SEQ 133; SEQ 134; SEQ 135; SEQ 136; SEQ 137; SEQ 138; SEQ 139; SEQ 140; SEQ 141; SEQ 142; SEQ 143; SEQ 144; SEQ 145; SEQ 146; SEQ 147; SEQ 148; SEQ 149; SEQ 150; SEQ 151; SEQ 152; and SEQ 153.

13. The genetically modified prokaryotic cell according to claim 1 further comprising a promoter and RBS sequence which drives gene expression, wherein the genetic material for the promoter/RBS sequences comprises at least one or another of the following: SEQ 5; SEQ 6; SEQ 7; SEQ 8; SEQ 9; SEQ 10; SEQ 11; SEQ 12; SEQ 13; SEQ 14; SEQ 15; SEQ 16; SEQ 17; SEQ 18; SEQ 19; SEQ 20; and SEQ 21.

14. A genetically modified prokaryotic cell which comprises: a cysteamine dioxygenase (ado) polynucleotide sequence which has at least 70% sequence coverage to SEQ 1, and at least 70% sequence identity to SEQ 1; and a vanin (vnn) polynucleotide sequence selected from the group consisting of: a vanin-1 (vnn1) polynucleotide sequence which has at least 70% sequence coverage to SEQ 3 or SEQ 69, and at least 70% sequence identity to SEQ 3 or SEQ 69; a vanin-2 (vnn2) polynucleotide sequence which has at least 70% sequence coverage to SEQ 70, and at least 70% sequence identity to SEQ 70; and a vanin-3 (vnn3) polynucleotide sequence which has at least 70% sequence coverage to SEQ 111, and at least 70% sequence identity to SEQ 111.

15. The genetically modified prokaryotic cell according to claim 14, wherein: said ado polynucleotide sequence is selected from the group consisting of: SEQ 1; SEQ 22; SEQ 23; SEQ 24; SEQ 25; SEQ 26; and SEQ 27; said vanin-1 (vnn1) polynucleotide sequence is selected from the group consisting of: SEQ 3; SEQ 43; SEQ 44; SEQ 45; SEQ 46; SEQ 47; SEQ 48; SEQ 49; SEQ 50; SEQ 51; SEQ 52; and SEQ 69; said vanin-2 (vnn2) polynucleotide sequence is selected from the group consisting of: SEQ 70; SEQ 71; SEQ 72; SEQ 73; SEQ 74; SEQ 75; SEQ 76; SEQ 77; SEQ 78; SEQ 79; SEQ 80; SEQ 81; SEQ 82; SEQ 83; and said vanin-3 (vnn3) polynucleotide sequence is selected from the group consisting of: SEQ 111; SEQ 112; SEQ 113; SEQ 114; SEQ 115; SEQ 116; SEQ 117; SEQ 118; SEQ 119; SEQ 120; SEQ 121; SEQ 122; SEQ 123; SEQ 124; SEQ 125; SEQ 126; and SEQ 127.

16. A genetically modified prokaryotic cell which comprises: a cysteamine dioxygenase (ADO) polypeptide sequence which has at least 70% sequence coverage to SEQ 2, and at least 25% sequence identity to SEQ 2; and a vanin (VNN) polypeptide sequence selected from the group consisting of: a vanin-1 (VNN1) polypeptide sequence which has at least 70% sequence coverage to SEQ 4, and at least 25% sequence identity to SEQ 4; a vanin-2 (VNN2) polypeptide sequence which has at least 70% sequence coverage to SEQ 84, and at least 25% sequence identity to SEQ 84; and a vanin-3 (VNN3) polypeptide sequence which has at least 70% sequence coverage to SEQ 128 and at least 25% sequence identity to SEQ 128.

17. The genetically modified prokaryotic cell according to claim 16, wherein: said ADO polypeptide sequence is selected from the group consisting of: SEQ 1; SEQ 22; SEQ 23; SEQ 24; SEQ 25; SEQ 26; and SEQ 27; said vanin-1 (VNN1) polypeptide sequence is selected from the group consisting of: SEQ 4; SEQ 53; SEQ 54; SEQ 55; SEQ 56; SEQ 57; SEQ 58; SEQ 59; SEQ 60; SEQ 61; SEQ 62; SEQ 63; SEQ 64; SEQ 65; SEQ 66; SEQ 67; and SEQ 68; said vanin-2 (VNN2) polypeptide sequence is selected from the group consisting of: SEQ 84; SEQ 85; SEQ 86; SEQ 87; SEQ 88; SEQ 89; SEQ 90; SEQ 91; SEQ 92; SEQ 93; SEQ 94; SEQ 95; SEQ 96; SEQ 97; SEQ 98; SEQ 99; SEQ 100; SEQ 101; SEQ 102; SEQ 103; SEQ 104; SEQ 105; SEQ 106; SEQ 107; SEQ 108; SEQ 109; and SEQ 110; and said vanin-3 (VNN3) polypeptide sequence is selected from the group consisting of: SEQ 128; SEQ 129; SEQ 130; SEQ 131; SEQ 132; SEQ 133; SEQ 134; SEQ 135; SEQ 136; SEQ 137; SEQ 138; SEQ 139; SEQ 140; SEQ 141; SEQ 142; SEQ 143; SEQ 144; SEQ 145; SEQ 146; SEQ 147; SEQ 148; SEQ 149; SEQ 150; SEQ 151; SEQ 152; and SEQ 153.

18. The prokaryotic cell according to claim 1, where the cell is a bacterial cell.

19. The prokaryotic cell according to claim 1, where the cell is selected from the group consisting of the genera: Brevibacterium, Bacillus, Corynebacterium, Escherichia, Lactococcus, Pseudomonas, Rhodococcus, and Serratia.

20. The prokaryotic cell according to claim 1, where the cell belongs to the genus Corynebacterium.

Description

BRIEF DESCRIPTION OF THE ACCOMPANYING FIGURE

[0049] The invention may be more completely understood in consideration of the following description of various embodiments of the invention in connection with the accompanying figure, in which:

[0050] FIG. 1 is a schematic representation of various known pathways to make sulfur-containing compounds.

DETAILED DESCRIPTION OF THE INVENTION

[0051] The invention described herein addresses genetic modifications to bacterial strains allowing for the production of a sulfur-containing compound from an inexpensive feedstock using bacterial species modified with the eukaryotic ado (SEQ 1) and vanin (SEQ 3, SEQ 69, SEQ 70, or SEQ 111) polynucleotides encoding the polypeptide sequences ADO (SEQ 2) and VNN (SEQ 4, SEQ 84, or SEQ 128).

[0052] Provided herein are genetically engineered bacteria that can produce hypotaurine, taurine, or taurine precursors from a sugar source and a sulfur source. Also provided are methods to genetically engineer and culture hypotaurine and taurine producing bacteria.

Definitions

[0053] Within the context of the present invention all terms and technical parameters described fall within their commonly known meanings as known by individuals within the region of science that the proposed invention is associated with, unless otherwise stated. Furthermore, unless otherwise indicated, all techniques utilized within this invention are commonly conducted within the fields of molecular biology, cell biology, biochemistry, and microbiology.

[0054] A polynucleotide within the context of the present invention is defined as the collection of individual nucleotides in any organization or size that relates to the DNA sequence.

[0055] A polypeptide within the context of the present invention is defined as the combination of multiple peptides of any organization or size that relates to the amino acid sequence. The terms polypeptide and protein within the context of this invention can be used interchangeably.

[0056] A vector within the context of the present invention refers to the composition of a polynucleotide with the intended purpose of introducing nucleic acids into one or more organism types. Vectors are further defined based on their functional purpose and can be designated as expression vectors, cloning vectors, plasmids, or shuttle vectors.

[0057] The term expression within the context of the present invention refers to the generation of a polypeptide sequence which is produced based on its polynucleotide sequence or gene.

[0058] An expression vector within the context of the present invention references a polynucleotide sequence containing a coding sequence or gene that enhances or promotes the generation of a polypeptide when introduced into an organism. An expression vector contains all the necessary polypeptide producing features such as a promoter and ribosomal binding site which allow for the production (or expression) of a desired gene due to transcription and translation processes.

[0059] A promoter within the context of the present invention is used to describe the nucleic acid sequence for the regulation and binding of polymerases for the purpose of transcribing a gene. This promoter can be native to an organism, or a non-endogenous promoter can be introduced into an organism to alter the regulation of gene expression.

[0060] The term gene refers to a DNA sequence that encodes a specific polypeptide sequence. A gene can include both sequences between coding regions (introns) and the coding sequences themselves (exons).

[0061] The term recombinant within the context of the present invention refers to the modification or alteration of a sequence associated with either a polypeptide or polynucleotide sequence. Recombination can be utilized for altering expression and coding segments of a gene of interest that would produce a non-native or non-naturally occurring product.

[0062] The term exogenous refers to the addition of either polypeptide and/or polynucleotide molecules that are not normally found within the organism. This includes any un-altered or altered genes and/or proteins that are not found conventionally within an organism.

[0063] The term homology refers to the level of similarity between two or more polypeptide or polynucleotide sequences.

[0064] The terms transfection, transformation, or introduced refer to the addition of polynucleotide sequence(s) that would normally be considered exogenous to the organism. This can include the addition of a polynucleotide directly to the genome of an organism or the transfer of a plasmid and/or vector to be maintained within the organism.

[0065] Within the context of the present invention, the terms native or natural refer to polypeptides and/or polynucleotides present within the organism prior to any modification. These native or naturally occurring polypeptides and/or polynucleotides would be present or produced by the organism without any external alterations.

[0066] The term metabolic pathway refers to the sequential biochemical reactions involved in the formation of a biologically relevant product within an organism.

[0067] Within the context of the present invention the terms knock-in and knock-out refer to the addition or removal of DNA sequences within an organism and can also be interchangeable with the terms insertion and deletion, respectively.

[0068] A coding sequence within the context of the present invention refers to a sequence of polynucleotides or DNA that facilitates the generation of a protein through transcription and translational processes (also known as transcribed and translated).

[0069] Genetic modification or related statements herein refer to the alteration of the genetic code of an organism, which includes the insertion or deletion of DNA sequences within an organism. Within the context of the present invention, genetic modification could include insertion and maintenance of an expression vector in the organism, or the direct modification of the organism's genome by directly adding or deleting genes through processes like, but not limited to, two-step allelic exchange or CRISPR cloning.

[0070] The term ribosomal binding site (RBS) refers to the region within a polynucleotide sequence that allows for the appropriate binding of a ribosome to a polynucleotide sequence to facilitate the translation of a polynucleotide sequence to produce a polypeptide sequence, which includes the terms protein and enzyme.

[0071] The term synthetic promoter refers to the addition or modification of a promoter sequence that would not or does not exist within the organism naturally. This can include the insertion or utilization of non-native promoters, or regions of non-native promoters utilized in the modification of protein synthesis.

[0072] The term biosynthetic in the context of the present invention refers to the generation of a biological compound by a living organism. This can include but is not limited to the formation of a biological compound that naturally occurs with the organism or the formation of a compound by an organism due to modifications to its genetic code.

[0073] The term transgenic, as used herein, refers to the combination of multiple organism polynucleotide sequences within a single organism. For example, if a polynucleotide sequence was sourced from an organism outside of the intended organism of interest within the invention, the organism of the invention's interest would be considered transgenic in nature.

[0074] The term cloning vector herein refers to a polynucleotide sequence or plasmid that can be replicated within a host organism for storage or amplification purposes. A cloning vector may contain all the necessary regulatory sequences needed to facilitate the transcription and translation of a protein.

[0075] The term unmodified promoter is defined as a promoter sequence which is unaltered and/or exists within the host organism itself.

[0076] The term two-step allelic exchange refers to a process by which a gene of interest is either inserted into or deleted from an organism through specific selective conditions. The insertion or deletion of a specific gene of interest is achieved through the utilization of distinct polynucleotide sequences which allow for the exchange of genetic material between two sources.

[0077] The term CRISPR cloning is defined as a process by which the gene of interest is inserted into or removed from an organism's genome using the CRISPR-Cas9 cloning system.

[0078] The terms upstream and downstream refer to regions of polynucleotides which are found prior to or after a specific gene of interest within a plasmid and/or genome of an organism.

[0079] The term enzyme within the context of the present invention defines a polypeptide sequence, specifically in the form of a protein, that can modify a biological molecule or take part within its generation through direct or indirect interactions. The process by which an enzyme influences the modification and/or production of a biological molecule and/or product is termed enzymatic activity.

[0080] The term open reading frame (ORF) refers to the collection of nucleotides which are found in between the start and stop codons of a polypeptide encoding DNA sequence.
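As an illustration of the ORF definition above, the following is a minimal sketch that locates the first ORF on the given strand of a DNA sequence. The function name `first_orf` and the example sequence are hypothetical, and the choice to return the span from the start codon through the stop codon (rather than only the nucleotides between them) is an assumption made for convenience.

```python
from typing import Optional

# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna: str) -> Optional[str]:
    """Return the first ORF (start codon through stop codon), or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return dna[start:pos + 3]
        # No in-frame stop codon found; try the next ATG.
        start = dna.find("ATG", start + 1)
    return None


# Hypothetical example sequence, not taken from the sequence listing.
print(first_orf("CCATGGCTTAAGG"))
```

The sketch scans only the forward strand; a fuller search would also examine the reverse complement.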

[0081] The term codon(s) refers to 3 adjacent nucleotides in a polynucleotide sequence that are used by the cell to decode the polynucleotide sequence when it is translated to make the polypeptide sequence, and that are responsible for defining the order of protein residues in a polypeptide sequence based on this code. Based on a 3-letter code and 4 different nucleotide bases, there are 64 different codon combinations available to the cell, which with some redundancy code for 22 possible protein residues, as well as 1 start and 3 stop codons.

[0082] Start and stop codons are codons comprised of three specific nucleotides in succession, which allow the cell to identify the initiation (start) and termination (stop) of translation of a polypeptide sequence.

[0083] A unicellular organism refers to any organism that consists entirely of a single cell.
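The codon count stated above follows from simple arithmetic: 4 bases in 3 positions give 4 x 4 x 4 = 64 combinations. A minimal sketch of that enumeration is shown below. The stop codons listed are those of the standard genetic code, which is assumed here; the variable names are illustrative only.

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases

# Every ordered triple of bases is one codon.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]

# Standard genetic code (assumed): three stop codons, one canonical start codon.
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"

# Codons that specify an amino acid residue; AUG is among them because it
# also codes for methionine.
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons), len(sense_codons))
```

In the standard code the 61 sense codons specify 20 amino acids; the two additional residues counted above (selenocysteine and pyrrolysine) are incorporated by context-dependent recoding of stop codons and are not represented in this sketch.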

[0084] The term central dogma of molecular biology states that genetic material flows in a single direction to produce protein. This dogma states that DNA is transcribed to produce messenger RNA, which in turn is translated to produce the final protein/polypeptide sequence. Simply put: DNA → messenger RNA → Protein

[0085] The term messenger RNA refers to a transitory molecule that is found between the polypeptide sequence and the DNA polynucleotide sequence. Simply, the messenger RNA is transcribed from the polynucleotide sequence and the messenger RNA is translated to produce the final protein.
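The flow from DNA to messenger RNA to protein can be sketched in a few lines. The example below assumes the standard genetic code and takes the coding (sense) strand as input, so transcription reduces to replacing T with U; the function names and the short example sequence are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order U, C, A, G.
# "*" marks a stop codon.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triple): aa
    for triple, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """DNA coding strand -> messenger RNA (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Messenger RNA -> one-letter protein sequence, stopping at a stop codon."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


mrna = transcribe("ATGGCTTAA")  # hypothetical 3-codon gene
print(mrna, translate(mrna))
```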

[0086] The term metabolic engineering herein refers to the alteration of an organism's metabolic pathway potential. This can include both the deactivation and/or altering of pre-existing metabolic pathways of an organism or the inclusion of additional metabolic processes.

[0087] The term Sequence alignment herein refers to a bioinformatic technique by which two polynucleotide sequences or two polypeptide sequences are arranged or aligned in such a way as to identify regions of similarity between a reference sequence (the sequence that is known) and the query sequence (the sequence to be compared to the reference sequence). Those skilled in the art know that alignment algorithms such as, but by no means limited to, the BLAST, ALIGN, or CLUSTAL algorithms can be used to obtain this information for polynucleotide or polypeptide sequences.

[0088] The term Percentage sequence identity herein refers to the similarity between 2 sequences that have been processed through a sequence alignment, to provide insight into how similar aligned sequences are at either the nucleotide or peptide level for polynucleotide or polypeptide sequences, respectively. The percentage identity is used to determine the similarity of a query sequence to a reference sequence.

[0089] The term Percentage sequence coverage refers to the number of aligned nucleotides or peptides in a query sequence relative to the length of the reference sequence. The percentage coverage provides an indication of how much of the reference polynucleotide or polypeptide sequence is covered by the query sequence, allowing for instance the lengths of the found genes or proteins to be compared.
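To make the two definitions above concrete, the following minimal sketch computes both percentages from a single gapped pairwise alignment. It assumes identity is the number of matching columns divided by the alignment length, and coverage is the number of reference positions included in the alignment divided by the full reference length; alignment tools such as BLAST may differ in detail. The function name and the toy sequences are hypothetical.

```python
def identity_and_coverage(ref_aln: str, query_aln: str, ref_length: int):
    """Return (percent identity, percent coverage) for one gapped alignment.

    ref_aln and query_aln are the aligned rows, with "-" marking gaps.
    ref_length is the length of the full, ungapped reference sequence.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have the same length")
    # Columns where both rows carry the same residue.
    matches = sum(1 for r, q in zip(ref_aln, query_aln) if r == q and r != "-")
    # Reference positions that fall inside the alignment.
    ref_positions = sum(1 for r in ref_aln if r != "-")
    identity = 100.0 * matches / len(ref_aln)
    coverage = 100.0 * ref_positions / ref_length
    return identity, coverage


# Toy alignment: 10 columns, 8 matches, 9 of 12 reference positions aligned.
print(identity_and_coverage("ACGTACGT-A", "ACGAACGTTA", ref_length=12))
```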

[0090] The BLASTN algorithm was used herein as one method to determine the percentage identity and percentage coverage between one or even multiple different polynucleotide sequences with respect to an inputted reference sequence, allowing for the determination of the percentage identity and percentage coverage of one or many query sequences to said reference sequence. One of ordinary skill in the art will recognize that search results from a BLASTN search will be influenced by the search parameters used in the search. Therefore, all BLASTN searches done with respect to this invention to identify other sequences which have been catalogued in the NCBI polynucleotide databases relative to a reference used the following parameters:

[0091] Search set parameters comprise the standard databases (nr etc.), with the specific database used being the Nucleotide collection (nr/nt), and no exclusions or limitations were placed on the search (all default parameters).

[0092] Program selection parameters include optimization for highly similar sequences (known as the megablast algorithm) (the default parameter).

[0093] Algorithm parameters altered include the Max target sequences, which was set at 5000; otherwise all default parameters were used for relevant searches (other parameters in General parameters, and all parameters in Scoring parameters and Filters and masking, are default parameters).

[0094] The BLASTP algorithm was used herein as one method to determine the percentage identity and percentage coverage between one or even multiple different polypeptide sequences with respect to an inputted reference sequence, allowing for the determination of the percentage identity and percentage coverage of one or many query sequences to said reference sequence. One of ordinary skill in the art will recognize that search results from a BLASTP search will be influenced by the search parameters used in the search. Therefore, all BLASTP searches done with respect to this invention to identify other sequences which have been catalogued in the NCBI polypeptide databases relative to a reference used the following parameters:

[0095] Search set parameters comprise the standard databases (nr etc.), with the specific database used being the Non-redundant protein sequences (nr), and no exclusions or limitations were placed on the search (all default parameters).

[0096] Program selection parameters include BLASTP (known as the protein-protein BLAST algorithm) (the default parameter).

[0097] Algorithm parameters altered include the Max target sequences, which was set at 5000. Otherwise all default parameters were used for relevant searches (other parameters in General parameters, and all parameters in Scoring parameters and Filters and masking, are default parameters). Notable default parameters include an Expect Threshold and word size of 0.05 and 5, respectively, in the General parameters, the usage of the BLOSUM62 matrix with gap costs of Existence: 11 and Extension: 1 for the Scoring parameters, and no filter or masking components selected.
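For record-keeping, the BLASTN and BLASTP search settings described above can be captured as plain data. The sketch below is only an illustrative container: the class and field names are hypothetical and do not correspond to any BLAST client API. Values the text does not state explicitly (for example, the BLASTN expect threshold) are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlastSearchSettings:
    """Hypothetical record of the web BLAST settings stated in the text."""
    program: str
    database: str
    max_target_sequences: int = 5000      # the one parameter changed from default
    expect_threshold: Optional[float] = None
    word_size: Optional[int] = None
    matrix: Optional[str] = None
    gap_existence: Optional[int] = None
    gap_extension: Optional[int] = None


# Polynucleotide searches: megablast against the Nucleotide collection.
BLASTN_SETTINGS = BlastSearchSettings(program="megablast", database="nr/nt")

# Polypeptide searches: protein-protein BLAST against non-redundant proteins,
# with the default values the text calls out as notable.
BLASTP_SETTINGS = BlastSearchSettings(
    program="blastp",
    database="nr",
    expect_threshold=0.05,
    word_size=5,
    matrix="BLOSUM62",
    gap_existence=11,
    gap_extension=1,
)

print(BLASTN_SETTINGS)
print(BLASTP_SETTINGS)
```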

[0098] The phrases substantially similar or substantially identical in the context of at least 2 nucleic acid sequences or at least 2 polypeptide sequences typically means that a polynucleotide, polypeptide, or region or domain of a polypeptide has, preferably, a percentage coverage of at least 70%, 75%, 80%, 85%, 90%, 95%, 97%, 99%, or even 99.5%, and at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or even 99.5% percentage identity to the reference sequence. Some polynucleotide or polypeptide sequences that fall in this category are sequences that share genetic or protein homology to the reference sequence.
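The broadest thresholds named above (at least 70% coverage and at least 25% identity) can be applied as a simple filter over alignment results. The following is a minimal sketch under that reading; the function name, the hit names, and the numeric values are hypothetical and are not results from any search. The stricter 70% identity floor recited in the claims for polynucleotide sequences is shown by overriding the default.

```python
def is_substantially_similar(coverage_pct: float,
                             identity_pct: float,
                             min_coverage: float = 70.0,
                             min_identity: float = 25.0) -> bool:
    """True when both percentages meet or exceed their minimum thresholds."""
    return coverage_pct >= min_coverage and identity_pct >= min_identity


# Hypothetical hits, for illustration only.
hypothetical_hits = [
    {"name": "hit_A", "coverage": 98.0, "identity": 91.0},
    {"name": "hit_B", "coverage": 72.0, "identity": 26.5},
    {"name": "hit_C", "coverage": 64.0, "identity": 88.0},  # coverage too low
    {"name": "hit_D", "coverage": 95.0, "identity": 19.0},  # identity too low
]

# Polypeptide-level thresholds (70% coverage, 25% identity).
polypeptide_pass = [
    h["name"] for h in hypothetical_hits
    if is_substantially_similar(h["coverage"], h["identity"])
]

# Polynucleotide-level thresholds from the claims (70% coverage, 70% identity).
polynucleotide_pass = [
    h["name"] for h in hypothetical_hits
    if is_substantially_similar(h["coverage"], h["identity"], min_identity=70.0)
]

print(polypeptide_pass, polynucleotide_pass)
```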

[0099] The terms genetic homology and protein homology, or homologous sequences refer to polynucleotide sequences or translated polypeptide sequences that have a similar or identical function in the cell. For example, 2 different proteins share a similar or identical function even though they were isolated from 2 different organisms. Polynucleotide sequences with homology are generally understood to have similar or identical biochemical functionality.

[0100] In scientific literature, genes/proteins are often renamed as more about the gene is determined, often leaving several different associated names for each gene. The FMO1 protein is known as flavin-containing monooxygenase 1. The ADO protein is known as cysteamine dioxygenase and 2-aminoethanethiol dioxygenase. The VNN1 protein is known as vanin-1 and pantetheinase. The VNN2 protein is known as vanin-2 and as pantetheinase. The VNN3 protein is known as vanin-3 and as pantetheinase.

Enzymes and Promoters

[0101] Example polypeptide sequences for enzymes involved in the synthesis of a sulfur-containing compound that can be integrated into prokaryotic organisms are provided in the sequence listing. According to a preferred embodiment of the present invention, the expression and production of these sequences within the cell are partially driven by the genetic polynucleotide promoter and ribosomal binding site sequences as provided in the sequence listings: SEQ 5 (P_glyA), SEQ 6 (P_SOD), SEQ 7 (P_pgk), SEQ 8 (P_tuf), SEQ 9 (P_fbaA), SEQ 10 (P_lysC), SEQ 11 (P_tkt), SEQ 12 (P_glnA), SEQ 13 (P_pyc), SEQ 14 (P_hom), SEQ 15 (P_gnd), SEQ 16 (P_lysA), SEQ 17 (P_aspB), SEQ 18 (P_ddh), SEQ 19 (P_dapB), SEQ 20 (P_dapA) and SEQ 21 (P_tac). The invention is not limited to the use of these amino acid sequences.

[0102] Those of ordinary skill in the art know that organisms of a wide variety of species commonly express and utilize homologous proteins, which contain insertions, substitutions and deletions relative to the polypeptide sequences listed above, and effectively provide a similar function. For example, the protein sequences for ADO from Sus scrofa, Nycticebus coucang, or Salmo salar, and for VNN from Sus scrofa, Nycticebus coucang, or Harpia harpyja, may differ from one another to varying degrees yet maintain similar or identical regulatory or catalytic functions within the organism. Protein sequences comprising such variations are included within the scope of the present invention and are considered substantially or sufficiently similar to the reference polypeptide sequences provided above. Although it is not intended that the present invention be limited by any theory by which it achieves its advantageous result, it is believed, and supported by biochemical knowledge, that the identity between polypeptide sequences that is necessary to maintain proper functionality is related to maintaining the tertiary (3D) structure of the polypeptide. This tertiary structure positions the specific interactive/catalytic portions of the protein sequence, and it is contemplated that a protein including these interactive sequences in the proper spatial context will have the desired activity.

[0103] Those of ordinary skill in the art know that many different amino acids have similar properties and can serve similar functions in the final polypeptide sequence. Thus, when one amino acid is exchanged for another amino acid from the same group, such as a non-polar amino acid, an uncharged polar amino acid, a charged polar acidic amino acid, or a charged polar basic amino acid, some polypeptide functionality is generally maintained. For example, it is known that the uncharged polar amino acid serine may be substituted for the uncharged polar amino acid threonine in a polypeptide without substantially altering the protein structure and functionality. Whether a given substitution will affect the functionality of the enzyme may be determined without undue experimentation using synthetic techniques and screening assays known to a person of ordinary skill in the art.

[0104] A person of ordinary skill in the art will recognize that changes in the protein sequence result from individual single or multi-nucleotide substitutions, deletions, or additions to a polynucleotide, which lead to changes in the resulting translated polypeptide sequence. Small mutations, such as the change of one amino acid to another, the addition or elimination of single amino acids, or the alteration of a small to moderate percentage of amino acids in the encoded polypeptide sequence, can be considered sufficiently similar when the alteration results in the substitution of an amino acid with a chemically similar amino acid. Thus, any number of amino acid residues in a polypeptide chain, selected from a group of integers from 1-50, can be so altered. Thus, for example, 1, 2, 3, 5, 10, 12, 20, 32, 41, or even 50 alterations can be made. Conservatively modified variants typically provide similar biological activity to the unmodified polypeptide sequence from which they are derived. For example, modified ADO and VNN proteins that remain functional generally have a sequence identity of at least 40%, 50%, 60%, 70%, 80%, or 90%, preferably greater than 50%, to the native protein, allowing processing of the native substrate. Conservative substitution tables provide lists of functionally similar amino acids. Amino acids in polypeptide chains that are similar to one another include, but are not limited to, the following groups: (1) Serine (S), Threonine (T); (2) Aspartic acid (D), Glutamic acid (E); (3) Asparagine (N), Glutamine (Q); (4) Alanine (A), Leucine (L), and Isoleucine (I).
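The four similarity groups listed above can be used to classify the differences between a reference polypeptide and a variant of the same length. The sketch below is illustrative only: it covers just those four groups (the text notes the list is not exhaustive), it ignores insertions and deletions, and the function names and toy sequences are hypothetical.

```python
# The four groups of similar amino acids named in the text (one-letter codes).
CONSERVATIVE_GROUPS = [set("ST"), set("DE"), set("NQ"), set("ALI")]


def is_conservative(a: str, b: str) -> bool:
    """True if a and b are different residues from the same group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(reference: str, variant: str):
    """Return (conservative, other) substitution counts for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    conservative = other = 0
    for ref_res, var_res in zip(reference, variant):
        if ref_res == var_res:
            continue
        if is_conservative(ref_res, var_res):
            conservative += 1
        else:
            other += 1
    return conservative, other


# Toy sequences: S->T and D->E are conservative, A->K is not.
print(classify_substitutions("MSTDA", "MTTEK"))
```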

Suitable Polynucleotide and Polypeptide Sequences for ADO and VNN

[0105] A person of ordinary skill in the art will recognize that many different organisms will have functionally similar polynucleotide and polypeptide sequences (or homology between the sequences); however, there may be differences between these sequences when compared to a reference sequence. As examples, suitable polynucleotides and their corresponding polypeptide sequences for the production of a sulfur-containing compound can be seen below. Note that the following sequences are by no means meant to limit the scope of the invention. In fact, any substantially similar polynucleotide sequences or substantially similar produced polypeptide sequences for the ADO and VNN genes with similar function or similarity to these genes can also be used for the production of a sulfur-containing compound.

[0106] According to a preferred embodiment of the present invention, the polynucleotide sequence for ado, isolated from the eukaryotic species Sus scrofa (pig) (SEQ 1), was utilized in the process described herein. In this embodiment, cysteamine dioxygenase (ado) is under the transcriptional control of a native or artificial promoter and a ribosomal binding site. However, in other embodiments of the invention, polynucleotide sequences that are homologous and/or substantially similar to SEQ 1 may also be used in the present invention to produce taurine. Polynucleotide sequences for cysteamine dioxygenase in these embodiments will, preferably, have at least 70% sequence coverage, or more preferentially greater than 80%, 90%, 95%, 98%, or most preferentially greater than 99% sequence coverage of SEQ 1, and sequence identities of at least 70%, or more preferentially greater than 80%, 90%, 95%, 97% sequence identity, and most preferentially 99% sequence identity to SEQ 1. These polynucleotide sequences may include, but are by no means limited to, the following sequences: SEQ 22, SEQ 23, SEQ 24, SEQ 25, SEQ 26, and SEQ 27.

[0107] According to a preferred embodiment of the present invention, the cysteamine dioxygenase polypeptide (ADO) SEQ 2 from the eukaryotic species Sus scrofa (pig) is utilized, whereby SEQ 2 is produced from the transcription and translation of the cysteamine dioxygenase polynucleotide SEQ 1. However, in other embodiments of the invention, polypeptide sequences that are homologous and/or substantially similar to SEQ 2 may also be used in the present invention to produce taurine. Polypeptide sequences for cysteamine dioxygenase in these embodiments will, preferably, have at least 70% sequence coverage, or more preferentially greater than 80%, 90%, 95%, 98%, or most preferentially greater than 99% sequence coverage of SEQ 2, and a sequence identity of, preferably, at least 25% to SEQ 2, or more preferentially greater than 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or most preferentially greater than 99% sequence identity to SEQ 2. These polypeptide sequences may include, but are not limited to, the following sequences: SEQ 28, SEQ 29, SEQ 30, SEQ 31, SEQ 32, SEQ 33, SEQ 34, SEQ 35, SEQ 36, SEQ 37, SEQ 38, SEQ 39, SEQ 40, SEQ 41, and SEQ 42.
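
The coverage and identity thresholds above can be checked from a pairwise alignment. The following is a minimal sketch, not a definitive method. It assumes the alignment has already been produced by an external tool and is supplied as two gapped strings of equal length, with "-" marking gaps. It further assumes BLAST-style definitions: coverage is the share of reference residues paired with a candidate residue, and identity is the share of paired columns that are identical. These definitions, the function names, and the toy alignment are illustrative assumptions; other definitions give different numbers.

```python
def coverage_and_identity(ref_aligned: str, cand_aligned: str, ref_length: int):
    """Return (coverage %, identity %) for a gapped pairwise alignment.

    ref_aligned / cand_aligned: aligned strings of equal length, '-' = gap.
    ref_length: full length of the ungapped reference sequence.
    Assumed definitions (not taken from the specification):
      coverage = paired reference residues / ref_length
      identity = identical paired columns / paired columns
    """
    if len(ref_aligned) != len(cand_aligned):
        raise ValueError("aligned strings must have equal length")
    paired = 0
    identical = 0
    for r, c in zip(ref_aligned.upper(), cand_aligned.upper()):
        if r == "-" or c == "-":
            continue  # column contains a gap, so no residue pair here
        paired += 1
        if r == c:
            identical += 1
    coverage = 100.0 * paired / ref_length
    identity = 100.0 * identical / paired if paired else 0.0
    return coverage, identity


def meets_polypeptide_thresholds(coverage, identity, min_cov=70.0, min_id=25.0):
    """Apply the minimum polypeptide thresholds stated above (70% / 25%)."""
    return coverage >= min_cov and identity >= min_id


# Toy alignment (hypothetical sequences, not from the sequence listing).
cov, ident = coverage_and_identity("MKT-LLVA", "MRTALL--", ref_length=7)
ok = meets_polypeptide_thresholds(cov, ident)
```

For the polynucleotide embodiments, the same check would apply with the identity minimum raised to 70%.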

[0108] According to a preferred embodiment of the present invention, the polynucleotide sequence for vanin-1 (vnn1), isolated from the eukaryotic species Sus scrofa (pig) (SEQ 3), was utilized. However, in other embodiments of the invention, polynucleotide sequences that are homologous and/or substantially similar to SEQ 3 may also be used in the present invention. Furthermore, polynucleotide sequences that are homologous and/or substantially similar to SEQ 69 may also be used in the present invention. Polynucleotide sequences for vanin-1 in these embodiments will have at least 70% sequence coverage, or more preferentially greater than 75%, 80%, 85%, 90%, 95%, 97%, or 98% sequence coverage, or most preferentially greater than 99% sequence coverage of SEQ 3 or SEQ 69, and the polynucleotide sequence of vanin-1 has at least 70% sequence identity, or more preferentially 80%, 85%, 90%, 95%, 97%, or 98% sequence identity, or most preferentially greater than 99% sequence identity to SEQ 3 or SEQ 69. These polynucleotide sequences may include, but by no means are limited to, the following sequences: SEQ 43, SEQ 44, SEQ 45, SEQ 46, SEQ 47, SEQ 48, SEQ 49, SEQ 50, SEQ 51, and SEQ 52.

[0109] According to a preferred embodiment of the present invention, the vanin-1 polypeptide (VNN1) SEQ 4 from the eukaryotic species Sus scrofa (pig) is utilized to produce a sulfur-containing compound by the cell, whereby SEQ 4 is produced from the transcription and translation of the vanin-1 polynucleotides SEQ 3 or SEQ 69. However, in other embodiments of the invention, polypeptide sequences that are homologous and/or substantially similar to SEQ 4 may also be used in the present invention to produce taurine. Polypeptide sequences for vanin-1 in these embodiments will, preferably, have at least 70% sequence coverage, or more preferentially greater than 80%, 90%, 95%, 98%, or most preferentially greater than 99% sequence coverage of SEQ 4, and a sequence identity of at least 25% to SEQ 4, or more preferentially greater than 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or most preferentially greater than 99% sequence identity to SEQ 4. These polypeptide sequences may include, but are by no means limited to, the following sequences: SEQ 53, SEQ 54, SEQ 56, SEQ 57, SEQ 58, SEQ 59, SEQ 60, SEQ 61, SEQ 62, SEQ 63, SEQ 64, SEQ 65, SEQ 66, SEQ 67, and SEQ 68.
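
The paragraph above notes that one polypeptide (SEQ 4) can be produced from either of two polynucleotides (SEQ 3 or SEQ 69). In general this is possible because the genetic code is degenerate: different codons can encode the same amino acid. The sketch below illustrates the principle with the standard genetic code and two short toy coding sequences. The toy sequences and function names are hypothetical and are not SEQ 3 or SEQ 69; how those two sequences actually differ is not stated here.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence (DNA or RNA) up to the first stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


def percent_identity(a: str, b: str) -> float:
    """Position-by-position identity of two ungapped, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Two toy coding sequences (hypothetical) that use different synonymous codons.
cds_a = "ATGTCTGATCTGAAATAA"
cds_b = "ATGAGCGACTTAAAGTGA"

same_protein = translate(cds_a) == translate(cds_b)
nucleotide_identity = percent_identity(cds_a, cds_b)
```

In this toy case the two coding sequences differ at many nucleotide positions yet translate to the same polypeptide, which is why nucleotide-level and polypeptide-level identity are assessed against separate reference sequences.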

[0110] According to a preferred embodiment of the present invention, the polynucleotide sequence for vanin-2 (vnn2), isolated from the eukaryotic species Bos taurus (cattle) (SEQ 70), can be utilized in place of vanin-1 (vnn1) (SEQ 3 or SEQ 69). However, in other embodiments of the invention, polynucleotide sequences that are homologous and/or substantially similar to SEQ 70 may also be used in the present invention to produce taurine. Polynucleotide sequences for vanin-2 in these embodiments will have at least 70% sequence coverage, or more preferentially greater than 80%, 85%, 90%, 95%, 96%, or 97% sequence coverage, or most preferentially greater than 99% sequence coverage of SEQ 70, and the polynucleotide sequence of vanin-2 has at least 70% sequence identity, or more preferentially 80%, 85%, 90%, 95%, or 96% sequence identity, or most preferentially greater than 99% sequence identity to SEQ 70. These polynucleotide sequences may include, but by no means are limited to, the following sequences: SEQ 71; SEQ 72; SEQ 73; SEQ 74; SEQ 75; SEQ 76; SEQ 77; SEQ 78; SEQ 79; SEQ 80; SEQ 81; SEQ 82; and SEQ 83.

[0111] According to a preferred embodiment of the present invention, the vanin-2 polypeptide (VNN2) SEQ 84 from the eukaryotic species Bos taurus (cattle) is utilized, whereby SEQ 84 is produced from the transcription and translation of the vanin-2 polynucleotide SEQ 70. However, in other embodiments of the invention, polypeptide sequences that are homologous and/or substantially similar to SEQ 84 may also be used in the present invention to produce taurine. Polypeptide sequences for vanin-2 in these embodiments will, preferably, have at least 70% sequence coverage, or more preferentially greater than 75%, 80%, 90%, 95%, or most preferentially greater than 99% sequence coverage of SEQ 84, and a sequence identity of at least 25% to SEQ 84, or more preferentially greater than 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, or most preferentially greater than 99% sequence identity to SEQ 84. These polypeptide sequences may include, but are by no means limited to, the following sequences: SEQ 85; SEQ 86; SEQ 87; SEQ 88; SEQ 89; SEQ 90; SEQ 91; SEQ 92; SEQ 93; SEQ 94; SEQ 95; SEQ 96; SEQ 97; SEQ 98; SEQ 99; SEQ 100; SEQ 101; SEQ 102; SEQ 103; SEQ 104; SEQ 105; SEQ 106; SEQ 107; SEQ 108; SEQ 109; and SEQ 110.

[0112] According to a preferred embodiment of the present invention, the polynucleotide sequence for vanin-3 (vnn3), isolated from the eukaryotic species Mus musculus (house mouse) (SEQ 111), can be utilized in place of vanin-1 (vnn1) (SEQ 3 or SEQ 69). However, in other embodiments of the invention, polynucleotide sequences that are homologous and/or substantially similar to SEQ 111 may also be used in the present invention to produce taurine. Polynucleotide sequences for vanin-3 in these embodiments will have at least 70% sequence coverage, or more preferentially greater than 75%, 80%, 85%, 90%, 95%, 96%, or 97% sequence coverage, or most preferentially greater than 99% sequence coverage of SEQ 111, and the polynucleotide sequence of vanin-3 has at least 70% sequence identity, or more preferentially 75%, 80%, 90%, 95%, or 97% sequence identity, or most preferentially greater than 99% sequence identity to SEQ 111. These polynucleotide sequences may include, but by no means are limited to, the following sequences: SEQ 112; SEQ 113; SEQ 114; SEQ 115; SEQ 116; SEQ 117; SEQ 118; SEQ 119; SEQ 120; SEQ 121; SEQ 122; SEQ 123; SEQ 124; SEQ 125; SEQ 126; and SEQ 127.

[0113] According to a preferred embodiment of the present invention, the vanin-3 polypeptide (VNN3) SEQ 128 from the eukaryotic species Mus musculus (house mouse) is utilized to produce a sulfur-containing compound by the cell, whereby SEQ 128 is produced from the transcription and translation of the vanin-3 polynucleotide SEQ 111. However, in other embodiments of the invention, polypeptide sequences that are homologous and/or substantially similar to SEQ 128 may also be used in the present invention to produce taurine. Polypeptide sequences for vanin-3 in these embodiments will, preferably, have at least 70% sequence coverage, or more preferentially greater than 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or most preferentially greater than 99% sequence coverage of SEQ 128, and a sequence identity of at least 25% to SEQ 128, or more preferentially greater than 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, or most preferentially greater than 99% sequence identity to SEQ 128. These polypeptide sequences may include, but are by no means limited to, the following sequences: SEQ 129; SEQ 130; SEQ 131; SEQ 132; SEQ 133; SEQ 134; SEQ 135; SEQ 136; SEQ 137; SEQ 138; SEQ 139; SEQ 140; SEQ 141; SEQ 142; SEQ 143; SEQ 144; SEQ 145; SEQ 146; SEQ 147; SEQ 148; SEQ 149; SEQ 150; SEQ 151; SEQ 152; and SEQ 153.
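
For reference, the minimum thresholds stated in paragraphs [0106] to [0113] can be collected into a single lookup keyed by reference sequence. This is a minimal sketch: the dictionary layout and the function name are illustrative assumptions, the threshold values are the minimum (not the preferred) figures from the paragraphs above, and the coverage and identity percentages are assumed to have been computed beforehand by an alignment tool.

```python
# Minimum (coverage %, identity %) per reference sequence, taken from
# paragraphs [0106] to [0113]. The layout is an illustrative assumption.
MIN_THRESHOLDS = {
    "SEQ 1":   (70.0, 70.0),  # ado polynucleotide, Sus scrofa
    "SEQ 2":   (70.0, 25.0),  # ADO polypeptide, Sus scrofa
    "SEQ 3":   (70.0, 70.0),  # vnn1 polynucleotide, Sus scrofa
    "SEQ 69":  (70.0, 70.0),  # vnn1 polynucleotide
    "SEQ 4":   (70.0, 25.0),  # VNN1 polypeptide, Sus scrofa
    "SEQ 70":  (70.0, 70.0),  # vnn2 polynucleotide, Bos taurus
    "SEQ 84":  (70.0, 25.0),  # VNN2 polypeptide, Bos taurus
    "SEQ 111": (70.0, 70.0),  # vnn3 polynucleotide, Mus musculus
    "SEQ 128": (70.0, 25.0),  # VNN3 polypeptide, Mus musculus
}


def qualifies(reference: str, coverage: float, identity: float) -> bool:
    """Check a candidate's coverage and identity (both in percent) against
    the minimum thresholds for the named reference sequence."""
    min_cov, min_id = MIN_THRESHOLDS[reference]
    return coverage >= min_cov and identity >= min_id


# A candidate with 85% coverage and 30% identity meets the polypeptide
# minimum for SEQ 2 but not the polynucleotide minimum for SEQ 1.
passes_polypeptide = qualifies("SEQ 2", 85.0, 30.0)
passes_polynucleotide = qualifies("SEQ 1", 85.0, 30.0)
```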

[0114] Embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.

CPC classification

C12N1/20; C12N15/77; C12N15/74; C12N9/80; C12N9/0069; C12N2510/00; C12R2001/15; C12Y113/11019; C12N9/0073; C12P13/001; C12P11/00; C12Y305/01092; C12R2001/01; C12Y114/13008 (all CHEMISTRY; METALLURGY)

International classification

C12N15/77; C12N9/02; C12N9/80 (all CHEMISTRY; METALLURGY)
