Method for obtaining natural variant of enzyme and super thermostable cellobiohydrolase
09944914 · 2018-04-17
Assignee
Inventors
- Migiwa Suda (Wako, JP)
- Jiro Okuma (Wako, JP)
- Asuka Yamaguchi (Wako, JP)
- Yoshitsugu Hirose (Wako, JP)
- Yasuhiro Kondo (Wako, JP)
Cpc classification
G16B35/00
PHYSICS
C12Y302/01004
CHEMISTRY; METALLURGY
C12N15/1089
CHEMISTRY; METALLURGY
C12N15/1034
CHEMISTRY; METALLURGY
C12P19/14
CHEMISTRY; METALLURGY
International classification
C12P19/14
CHEMISTRY; METALLURGY
C12N15/10
CHEMISTRY; METALLURGY
Abstract
A method for selectively obtaining a natural variant of an enzyme having activity includes (1) a step of detecting an ORF sequence of a protein having enzyme activity from a genome database including base sequences of metagenomic DNA of environmental microbiota; (2) a step of obtaining at least one PCR clone including the ORF sequence having a full length, a partial sequence of the ORF sequence, or a base sequence encoding an amino acid sequence which is formed by deletion, substitution, or addition of at least one amino acid residue in an amino acid sequence encoded by the ORF sequence, by performing PCR cloning on at least one metagenomic DNA of the environmental microbiota by using a primer designed based on the ORF sequence; (3) a step of determining a base sequence and an amino acid sequence which is encoded by the base sequence for each PCR clone obtained in the step (2); and (4) a step of selecting a natural variant of an enzyme having activity by measuring enzyme activity of proteins encoded by each PCR clone obtained in the step (2).
Claims
1. A method for manufacturing a super thermostable cellobiohydrolase, comprising a step of culturing a transformant expressing a nucleic acid encoding a cellobiohydrolase domain having a polypeptide of an amino acid sequence equal to or higher than 90% identical to SEQ ID NO: 1, and having serine in the 88th position, phenylalanine in the 230th position and serine in the 414th position of the amino acid sequence, wherein said super thermostable cellobiohydrolase exhibits cellobiohydrolase activity under conditions of temperature of 80° C. to 100° C. and a pH of 4.0 to a pH of 7.0.
2. A transformant comprising a nucleic sequence encoding a polypeptide of an amino acid sequence having equal to or higher than 90% identical to SEQ ID NO: 1, having serine in the 88th position, phenylalanine in the 230th position and serine in the 414th position of the amino acid sequence, and having cellobiohydrolase activity under conditions of temperature of 80° C. to 100° C. and a pH of 4.0 to a pH of 7.0.
3. An expression vector comprising a nucleic sequence encoding a polypeptide of an amino acid sequence having equal to or higher than 90% identical to SEQ ID NO: 1, having serine in the 88th position, phenylalanine in the 230th position and serine in the 414th position of the amino acid sequence, and having cellobiohydrolase activity under conditions of temperature of 80° C. to 100° C. and a pH of 4.0 to a pH of 7.0.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
DETAILED DESCRIPTION OF THE INVENTION
(9) [Method for Obtaining Natural Variant of Enzyme]
(10) The method for selectively obtaining a natural variant of an enzyme having activity according to the present invention (hereinafter, referred to as a natural variant obtaining method according to the present invention in some cases) relates to a method for selectively obtaining a natural variant of an enzyme having activity from metagenomic DNA prepared from an environmental microbiota, by using a primer designed based on an open reading frame (ORF) sequence of the enzyme gene.
(11) The metagenomic DNA can be prepared from the environmental microbiota by a known method using, for example, a DNA extraction kit (ISOIL Large for Beads ver. 2, NIPPON GENE CO., LTD.).
(12) If a variant library can be constructed by collecting only gain-of-function type mutants, it is possible to avoid useless screening performed for removing the enormous number of dysfunctional mutants generated, and to efficiently obtain variants having improved functions by performing a functional assay on a relatively small number of amino acid substitution variants. The metagenomic DNA prepared from the environmental microbiota includes a large amount of gain-of-function type natural variants. Therefore, if the metagenomic DNA is used as a base, it is possible to efficiently obtain gain-of-function type variants.
(13) In the present invention, the term "having activity" means that, for at least one substrate, there is a significant difference from the negative control in the amount of reducing ends produced or in a chromogenic reaction.
(14) The reason why the metagenomic DNA prepared from the environmental microbiota includes a large amount of gain-of-function type natural variants can be explained based on the neutral theory of molecular evolution. Single nucleotide polymorphism (SNP) is one of the base sequence polymorphisms frequently occurring in genomic DNA. For example, it is said that in human genomic DNA, an SNP occurs at approximately one base per 1,000 bases. The SNP occurring in genes causes amino acid mutation or causes silent mutation that does not entail amino acid mutation, such as synonymous substitution or SNP on an intron. When the SNP causes amino acid mutation, proteins encoded by the gene undergo some structural or functional change in some cases. According to the neutral theory of molecular evolution that is widely accepted (Kimura, Nature, 1968, vol. 217 (5129), p. 624-626), genetic mutation causing amino acid substitution, which may exert a negative influence on the survival value of individuals by causing functional deterioration or functional failure and thus is evolutionarily disadvantageous, is excluded from a group. In contrast, if mutation is evolutionarily neutral or advantageous, an individual carrying the mutant gene is not excluded from the group. The gene is passed down to the next generation, and genetic diversity such as SNP is accumulated. According to the neutral theory of molecular evolution, the natural variant having the SNP causing amino acid substitution is considered to include neutral or advantageous mutation that does not cause functional failure, that is, gain-of-function type mutation.
(15) That is, the natural variant obtaining method according to the present invention is a method for selectively obtaining a natural variant of an enzyme having activity, and includes the following steps (1) to (4).
(16) (1) A step of detecting an ORF sequence of a protein having enzyme activity from a genome database including base sequences of metagenomic DNA of environmental microbiota
(17) (2) A step of obtaining at least one PCR clone including the ORF sequence having a full length, a partial sequence of the ORF sequence, or a base sequence encoding an amino acid sequence which is formed by deletion, substitution, or addition of at least one amino acid in an amino acid sequence encoded by the ORF sequence, by performing PCR cloning on at least one metagenomic DNA of the environmental microbiota by using a primer designed based on the ORF sequence
(18) (3) A step of determining a base sequence and an amino acid sequence which is encoded by the base sequence for each PCR clone obtained in the step (2)
(19) (4) A step of selecting a natural variant of an enzyme having activity by measuring enzyme activity of proteins encoded by each PCR clone obtained in the step (2)
(20) The metagenomic DNA of the environmental microbiota is obtained by physically fragmenting genomic DNA of environmental microbiota (a group of microorganisms contained in an environmental sample) into small pieces. As the metagenomic DNA of the environmental microbiota used in the present invention, it is possible to use metagenomic DNA prepared from microbiota of soil, sludge, lake water, seawater, and the like. In order to obtain an enzyme variant excellent in thermostability, it is preferable to use metagenomic DNA prepared from microbiota of a sample collected from a high-temperature environment. Examples of such a sample include soil of a high-temperature hot spring. Herein, examples of the soil of a high-temperature hot spring include hot spring water at 30° C. to 80° C. containing soil, mud, and biomass.
(21) In the method for obtaining a natural variant having activity according to the present invention, first, as the step (1), an ORF sequence of a protein having enzyme activity is detected from a genomic database including base sequences of metagenomic DNA of environmental microbiota. The genomic database can be created by performing a method such as a shotgun sequencing method used for sequence analysis of long-chain DNA on the metagenomic DNA of the environmental microbiota.
(22) Specifically, the metagenomic DNA of the environmental microbiota is subjected to assembly and annotation of shotgun sequencing data, and an ORF sequence encoding a specific amino acid sequence is detected. The ORF sequence of a protein having target enzyme activity can be detected by a known technique such as homology (sequence identity) analysis of base sequence based on a base sequence of a gene which has the target enzyme activity and of which the base sequence is already known. For example, the homology analysis is preferably set so as to detect a sequence 30% or more homologous to the base sequence of a gene which has the enzyme activity and of which the base sequence is already known.
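The homology filter described for the step (1) can be sketched as follows. This is a minimal illustration only, assuming a homology search has already produced a percent identity for each candidate ORF; the `OrfHit` record, the `filter_orfs` function, and the example identifiers and values are hypothetical and are not part of the method as described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrfHit:
    """Hypothetical record: one candidate ORF and its identity to a known gene."""
    orf_id: str
    percent_identity: float


def filter_orfs(hits, min_identity=30.0):
    """Keep ORFs whose identity to the known sequence is at least min_identity.

    The 30% default mirrors the preferred threshold stated in the text.
    """
    return [h for h in hits if h.percent_identity >= min_identity]


# Illustrative values only; these are not data from the source.
hits = [OrfHit("orf_a", 28.5), OrfHit("orf_b", 30.0), OrfHit("orf_c", 66.0)]
kept = filter_orfs(hits)
print([h.orf_id for h in kept])  # ['orf_b', 'orf_c']
```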
(23) Thereafter, as the step (2), by using a primer designed based on the ORF sequence, PCR cloning is performed on at least one metagenomic DNA of the environmental microbiota. The primer to be used is preferably designed so as to be able to amplify, by PCR, at least a domain including a catalyst domain of the enzyme in the ORF sequence. By performing PCR cloning from the metagenomic DNA, at least one PCR clone is obtained that consists of the ORF sequence having a full length; a partial sequence of the ORF sequence; or a base sequence encoding an amino acid sequence formed by deletion, substitution, or addition of at least one amino acid residue in the amino acid sequence encoded by the ORF sequence.
(24) Then the base sequence of each PCR clone obtained in the step (2) is analyzed, and an amino acid sequence encoded by the PCR clone is determined (step (3)). Subsequently, a protein encoded by the PCR clone is manufactured, the enzyme activity of the protein is measured, and a variant of an enzyme having activity is selected (step (4)). In this way, a variant having enzyme activity can be selectively obtained.
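Determining the amino acid sequence encoded by a PCR clone, as in the step (3), amounts to translating the base sequence codon by codon. The sketch below is a simplified illustration that assumes the standard genetic code, an in-frame coding strand, and unambiguous bases (A, C, G, T only); the function name `translate` and the example sequence are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA sequence, stopping at the first stop codon.

    Trailing bases that do not fill a codon are ignored. Ambiguous bases
    (for example N) raise KeyError in this simplified sketch.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGTCTTTCTAA"))  # MSF
```

A partial ORF that lacks a start codon can be translated the same way, provided the sequence is supplied in frame.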
(25) In the present invention, it is preferable to perform PCR cloning on a plurality of metagenomic DNAs derived from different sources by using the same primer. If PCR cloning is performed not only on the metagenomic DNA used as the base for designing the primer but also on a plurality of metagenomic DNAs derived from different sources, and a PCR library is constructed on a large scale by collecting colonies that carry a plasmid containing fragments amplified with the designed primer, for example, hundreds of colonies or more, then more amino acid substitution variants (natural variants) of the enzyme gene are obtained.
(26) Generally, in PCR cloning, target genes are cloned by collecting 1 to 10 colonies. However, in the natural variant obtaining method according to the present invention, it is preferable to clone genes by collecting colonies in a large number, such as 100 at least, per each metagenomic DNA sample. In this way, it is possible to construct a library constituted with natural variants including various SNPs of the target enzyme genes.
(27) [Method for Manufacturing Variant of Enzyme]
(28) In the present invention, the natural variant library constructed as above is subjected to functional screening, and in this way, the function of the enzyme, such as thermostability, is efficiently improved. As a result, it is possible to efficiently manufacture a novel variant improved in terms of the enzyme activity.
(29) That is, the method for manufacturing a variant of an enzyme according to the present invention (hereinafter, the method will be referred to as a variant manufacturing method according to the present invention in some cases) further includes the following steps (5) and (6) in addition to the steps (1) to (4).
(30) (5) A step of investigating the relationship between the amino acid sequence encoded by the ORF sequence and the enzyme activity for a plurality of variants after the steps (3) and (4)
(31) (6) A step of manufacturing a variant improved in terms of the enzyme activity by causing deletion, substitution, or addition of at least one amino acid residue in the amino acid sequence encoded by the ORF sequence, based on the relationship between the amino acid sequence and the enzyme activity that is obtained in the step (5).
(32) Herein, the term "improvement of enzyme activity" means that the activity of the enzyme encoded by the ORF sequence, in which deletion, substitution, or addition of at least one amino acid residue has occurred, is higher than the activity of the enzyme encoded by the ORF sequence in which deletion, substitution, or addition of the amino acid residue has not yet occurred.
(33) For example, in the step (5), if enzyme activities of proteins, which are encoded by two PCR clones differing from each other in terms of one, two, or more amino acid residues in the amino acid sequence, are compared with each other, amino acid mutation that can improve the enzyme activity is clearly seen. Subsequently, in the step (6), if base mutation for causing amino acid mutation that can further improve the enzyme activity is introduced into either the DNA fragments consisting of the ORF sequence or the PCR clone obtained in the step (2), and thus at least one amino acid residue in the amino acid sequence encoded by the ORF sequence undergoes deletion, substitution, or addition, a variant improved in terms of the enzyme activity is manufactured. The substitution or the like of the base that is for causing mutation of one, two, or more amino acid residues can be performed by a common method.
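The comparison in the step (5) can be illustrated with a short sketch. It is deliberately restricted to the simplest case, pairs of variants of equal length that differ at exactly one residue, so that any activity difference can be attributed to that substitution; pairs differing at two or more positions are skipped. The function name `single_site_effects`, the toy sequences, and the activity values are hypothetical and are not data from the source.

```python
from itertools import combinations


def single_site_effects(variants):
    """Rank single substitutions by the activity gain they are associated with.

    variants maps a clone name to (amino_acid_sequence, measured_activity).
    Returns (position, lower_residue, higher_residue, gain) tuples, with
    1-based positions, sorted by decreasing gain.
    """
    effects = []
    for (_, (seq1, act1)), (_, (seq2, act2)) in combinations(variants.items(), 2):
        if len(seq1) != len(seq2):
            continue  # indels are out of scope for this sketch
        diffs = [i for i, (x, y) in enumerate(zip(seq1, seq2)) if x != y]
        if len(diffs) != 1:
            continue
        i = diffs[0]
        # Orient each substitution from the less active to the more active clone.
        if act2 >= act1:
            effects.append((i + 1, seq1[i], seq2[i], act2 - act1))
        else:
            effects.append((i + 1, seq2[i], seq1[i], act1 - act2))
    return sorted(effects, key=lambda e: e[3], reverse=True)


# Toy input for illustration only.
variants = {
    "clone_1": ("MKTAYL", 1.0),
    "clone_2": ("MKSAYL", 1.5),
    "clone_3": ("MKSAFL", 2.5),
}
print(single_site_effects(variants))  # [(5, 'Y', 'F', 1.0), (3, 'T', 'S', 0.5)]
```

Substitutions near the top of the ranking are candidates to introduce in the step (6); in practice the activity differences would need to be checked against measurement error.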
(34) [Super Thermostable Cellobiohydrolase]
(35) As shown in Example 1, which will be described later, the present inventors prepared genomic DNA (metagenomic DNA) of a group of microorganisms from soil of high temperature hot spring (for example, hot spring water and the like containing soil, mud, and biofilm of 30° C. to 80° C.) collected from five sites in one place in Japan. Using the respective metagenomic DNAs, the present inventors performed the natural variant obtaining method according to the present invention and obtained a novel cellobiohydrolase excellent in thermostability.
(36) Specifically, first, from one metagenomic DNA, thirteen open reading frames (ORF) having a high degree of amino acid sequence homology (that is, identity of equal to or higher than 30%) with a known cellobiohydrolase (CBH) enzyme were obtained. Based on base sequence information of these ORFs, a primer was designed, and by PCR, the cellobiohydrolase catalyst domain was cloned from the metagenomic DNA of the soil of the high temperature hot spring. The DNA cloned by PCR (PCR clone) was incorporated into E. coli such that a protein encoded by the base sequence was expressed, and functional screening was performed by assay for phosphoric acid-swollen Avicel (PSA) degradation activity and carboxymethyl cellulose (CMC) degradation activity.
(37) Finally, from one open reading frame AR19G-166, a novel super thermostable cellobiohydrolase (AR19G-166RA) having PSA degradation activity was obtained.
(38) The open reading frame AR19G-166 had an incomplete sequence in which methionine of the start codon was lost. Accordingly, the AR19G-166RA gene cloned from the ORF was a partial gene constituted only with a GH6 catalyst domain.
(39) Thereafter, by using a primer designed based on the ORF sequence, five metagenomic DNAs were cloned by PCR cloning. As a result, twelve kinds of amino acid substitution variants of AR19G-166RA were finally obtained. The super thermostable cellobiohydrolase according to the present invention is AR19G-166RASFS (a variant in which the amino acid residue in the 299th position is arginine, the amino acid residue in the 351st position is alanine, the amino acid residue in the 88th position is serine, the amino acid residue in the 230th position is phenylalanine, and the amino acid residue in the 414th position is serine) having the highest thermostability among the variants of AR19G-166RA. The amino acid sequence of AR19G-166QA that most frequently appeared is represented by SEQ ID NO:3, and a base sequence encoding the amino acid sequence is represented by SEQ ID NO:4. The amino acid sequence of AR19G-166RASFS is represented by SEQ ID NO:1, and a base sequence encoding the amino acid sequence is represented by SEQ ID NO:2.
(40) As shown in Example 1, which will be described later, AR19G-166RASFS exhibited a high degree of hydrolytic activity with respect to PSA. Moreover, the AR19G-166RASFS exhibited degradation activity with respect to Lichenan consisting of glucan having a β-1,3 bond and a β-1,4 bond or to Avicel as crystalline cellulose, though the degradation activity was weak. In contrast, the AR19G-166RASFS exhibited almost no degradation activity with respect to CMC or to Laminarin consisting of glucan having a β-1,3 bond and a β-1,6 bond. Moreover, as a result of searching for the amino acid sequence of the AR19G-166RASFS in a known amino acid sequence database, it was confirmed that an amino acid sequence having the highest sequence identity with the aforementioned amino acid sequence is a glucoside hydrolase (SEQ ID NO: 11) belonging to a GH6 family of Herpetosiphon aurantiacus DSM 785 as a known mesophilic thermophilic bacterium in the phylum Chloroflexi, and the sequence identity was only 66%. The substrate specificity, HPLC analysis of the product of a hydrolysis reaction of PSA, and the sequence identity (homology) of the amino acid sequence with the known cellobiohydrolase clearly showed that the AR19G-166RASFS is a novel cellobiohydrolase belonging to the GH6 family.
(41) The AR19G-166RASFS has cellobiohydrolase activity under conditions of at least a temperature of 80° C. and a pH of 5.5. Actually, as shown in <9> of Example 1, which will be described later, the AR19G-166RASFS exhibits the cellobiohydrolase activity within a wide range of temperature from 50° C. to 95° C. The degree of cellobiohydrolase activity of the AR19G-166RASFS expressed by using E. coli as a host is heightened as the temperature is increased within the range of 50° C. to 95° C., and the thermostability of AR19G-166RASFS is much better than that of the AR19G-166QA and amino acid substitution variants of the AR19G-166QA.
(42) Generally, in a protein having a certain physiological activity, deletion, substitution, or addition of at least one amino acid residue can be performed without impairing the physiological activity. In other words, in the AR19G-166RASFS, deletion, substitution, or addition of at least one amino acid residue can be performed without making the enzyme lose cellobiohydrolase activity.
(43) That is, the super thermostable cellobiohydrolase according to the present invention is a super thermostable cellobiohydrolase having a cellobiohydrolase catalyst domain consisting of any of the following (A) to (C).
(44) (A) A polypeptide that consists of an amino acid sequence represented by SEQ ID NO:1
(45) (B) A polypeptide that consists of an amino acid sequence formed by deletion, substitution, or addition of at least one amino acid residue in the amino acid sequence represented by SEQ ID NO:1 (here, serine in the 88th position, phenylalanine in the 230th position, and serine in the 414th position in the amino acid sequence, in which the deletion, substitution, or addition of at least one amino acid residue has not yet occurred, are excluded) and has cellobiohydrolase activity under conditions of at least a temperature of 80° C. and a pH of 5.5
(46) (C) A polypeptide that consists of an amino acid sequence (here, in the amino acid sequence, the 88th position is serine, the 230th position is phenylalanine, and the 414th position is serine) having sequence identity of equal to or higher than 85% with the amino acid sequence represented by SEQ ID NO: 1 and has cellobiohydrolase activity under conditions of at least a temperature of 80° C. and a pH of 5.5
(47) In the polypeptide (B), the number of amino acid residues in the amino acid sequence represented by SEQ ID NO:1 that undergo deletion, substitution, or addition is preferably 1 to 20, more preferably 1 to 10, and even more preferably 1 to 5.
(48) In the polypeptide (C), the sequence identity with the amino acid sequence represented by SEQ ID NO:1 is not particularly limited as long as the sequence identity is equal to or higher than 80%. However, the sequence identity is preferably equal to or higher than 85%, more preferably equal to or higher than 90%, and even more preferably equal to or higher than 95%.
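The sequence-level conditions on the polypeptide (C) can be expressed as a simple check. The sketch below is illustrative only: it assumes the candidate sequence uses the same residue numbering as SEQ ID NO:1 (that is, no insertion or deletion shifts positions 88, 230, or 414), takes the percent identity as a precomputed input, and leaves the identity threshold to the caller. The function name `meets_sequence_conditions` is hypothetical. The activity requirement (80° C., pH 5.5) can only be established experimentally and is not covered.

```python
# Residues required at fixed positions (1-based numbering, as in the text).
REQUIRED_RESIDUES = {88: "S", 230: "F", 414: "S"}


def meets_sequence_conditions(sequence, identity_to_seq1, min_identity):
    """Check the identity threshold and the three fixed residues.

    sequence: candidate amino acid sequence (one-letter codes).
    identity_to_seq1: percent identity to SEQ ID NO:1, computed elsewhere.
    min_identity: threshold chosen by the caller, in percent.
    """
    if identity_to_seq1 < min_identity:
        return False
    for position, residue in REQUIRED_RESIDUES.items():
        if position > len(sequence) or sequence[position - 1] != residue:
            return False
    return True


# Synthetic placeholder sequence; it is not SEQ ID NO:1.
residues = ["A"] * 420
residues[87], residues[229], residues[413] = "S", "F", "S"
candidate = "".join(residues)
print(meets_sequence_conditions(candidate, 92.0, 85.0))  # True
```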
(49) The sequence identity (homology) between amino acid sequences is obtained in the following manner. That is, in a state in which a gap is formed in positions where insertion and deletion occur, two amino acid sequences are juxtaposed with each other such that the corresponding amino acids become as identical to each other as possible; and the ratio of the identical amino acids to the total amino acid sequences excluding the gap in the obtained alignment is calculated as the sequence identity. The sequence identity between amino acid sequences can be obtained by using various homology search software known in the field of related art. In the present invention, the value of the sequence identity of the amino acid sequence is obtained by calculation based on the alignment obtained by the known homology search software BLASTP.
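One reading of the calculation described above is sketched here: given two sequences that have already been aligned (with `-` marking gaps), columns containing a gap in either sequence are excluded, and identity is the fraction of the remaining columns in which the residues match. The function name `percent_identity` and the example alignment are hypothetical, and the figure reported by a given homology search program may be computed over a different denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap columns are excluded from the calculation
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Six gap-free columns, five of them identical.
print(round(percent_identity("MKT-AYLS", "MKSQAY-S"), 1))  # 83.3
```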
(50) The polypeptides (B) and (C) may be artificially designed. Alternatively, they may be either homologues of AR19G-166 and the like or partial proteins thereof.
(51) Each of the polypeptides (A) to (C) may be chemically synthesized based on the amino acid sequence. Alternatively, they may be produced by a protein expression system by using polynucleotides according to the present invention that will be described later. Furthermore, each of the polypeptides (B) and (C) can be artificially synthesized by using a genetic recombination technique introducing amino acid mutation, based on the polypeptide consisting of the amino acid sequence represented by SEQ ID NO:1.
(52) The polypeptides (A) to (C) have cellobiohydrolase activity under conditions of at least a temperature of 80° C. and a pH of 5.5. The polypeptides have an extremely high degree of cellobiohydrolase activity even at a temperature of 80° C. to 100° C. Therefore, if the enzyme has any of the polypeptides (A) to (C) as a cellobiohydrolase catalyst domain, a super thermostable cellobiohydrolase can be obtained.
(53) The super thermostable cellobiohydrolase according to the present invention has PSA as a substrate. The super thermostable cellobiohydrolase may have glucan other than PSA as the substrate. Examples of the aforementioned other glucans include Lichenan consisting of a β-1,3 bond and a β-1,4 bond, crystalline cellulose such as Avicel, crystalline bacteria cellulose (Bacterial microcrystalline cellulose, BMCC) and filter paper, carboxymethylcellulose (CMC), glucan consisting of a β-1,3 bond and a β-1,6 bond, glucan consisting of a β-1,3 bond, glucan consisting of a β-1,6 bond, xylan, and the like. The super thermostable cellobiohydrolase according to the present invention preferably has, as the substrate, PSA and at least one of the glucan consisting of a β-1,3 bond and a β-1,4 bond and crystalline cellulose, and more preferably has, as the substrate, PSA, glucan consisting of a β-1,3 bond and a β-1,4 bond, and crystalline cellulose.
(54) The optimal pH of the super thermostable cellobiohydrolase according to the present invention is within a range of a pH of 5.0 to a pH of 6.5, though it varies with the reaction temperature. The super thermostable cellobiohydrolase according to the present invention preferably exhibits cellobiohydrolase activity at least within a range of a pH of 5.0 to a pH of 6.5, and more preferably exhibits cellobiohydrolase activity within a range of a pH of 4.0 to a pH of 7.0.
(55) The super thermostable cellobiohydrolase according to the present invention may have cellulose hydrolytic activity other than the cellobiohydrolase activity. Examples of the aforementioned other cellulose hydrolytic activities include endoglucanase activity, xylanase activity, β-glucosidase activity, and the like.
(56) The super thermostable cellobiohydrolase according to the present invention may be an enzyme including only a cellobiohydrolase catalyst domain consisting of any of the polypeptides (A) to (C), or may include other domains. Examples of other domains include domains other than a cellobiohydrolase catalyst domain that known cellobiohydrolases have. For example, the super thermostable cellobiohydrolase according to the present invention includes enzymes that are obtained by substituting the cellobiohydrolase catalyst domain of known cellobiohydrolases with the polypeptides (A) to (C).
(57) When the super thermostable cellobiohydrolase according to the present invention includes a domain other than the cellobiohydrolase catalyst domain, it is preferable for the super thermostable cellobiohydrolase to include a cellulose-binding module. The cellulose-binding module may be positioned upstream (N terminal side) or downstream (C terminal side) of the cellobiohydrolase catalyst domain. Moreover, the cellulose-binding module and the cellobiohydrolase catalyst domain may be directly bonded to each other, or bonded to each other through a linker domain having an appropriate length. In the super thermostable cellobiohydrolase according to the present invention, the cellulose-binding module is preferably connected to the upstream or downstream of the cellobiohydrolase catalyst domain through the linker domain, and more preferably connected to the upstream of the cellobiohydrolase catalyst domain through the linker domain.
(58) The cellulose-binding module contained in the super thermostable cellobiohydrolase according to the present invention is preferably a domain having an ability to bind to cellulose, for example, PSA or crystalline cellulose, and the amino acid sequence thereof is not particularly limited. As the cellulose-binding module, for example, cellulose-binding modules included in known proteins or domains obtained by appropriately modifying the aforementioned modules may be used. Furthermore, when the super thermostable cellobiohydrolase according to the present invention has the cellobiohydrolase catalyst domain and the cellulose-binding module, these are preferably connected to each other through a linker sequence. The amino acid sequence, the length, and the like of the linker sequence are not particularly limited. In addition, the super thermostable cellobiohydrolase according to the present invention may further have a signal peptide, which can be localized by being transferred to a specific domain in a cell, or a signal peptide, which is secreted outside a cell, on the N terminal or the C terminal. Examples of the signal peptide include an apoplastic transport signal peptide, an endoplasmic reticulum retention signal peptide, a nuclear transport signal peptide, a secretory signal peptide, and the like. Examples of the endoplasmic reticulum retention signal peptide include a signal peptide consisting of an amino acid sequence of HDEL, and the like.
(59) Moreover, when the super thermostable cellobiohydrolase according to the present invention is produced using an expression system, in order to make it possible to simply purify the enzyme, various tags may be added to, for example, the N terminal or the C terminal. As the tags, for example, it is possible to use tags, such as a His tag, a hemagglutinin (HA) tag, a Myc tag, and a Flag tag, that are widely used for expression and purification of recombinant proteins.
(60) [Polynucleotide Encoding Super Thermostable Cellobiohydrolase]
(61) The polynucleotide according to the present invention encodes the super thermostable cellobiohydrolase according to the present invention. The super thermostable cellobiohydrolase can be produced by using an expression system of a host that is obtained by introducing an expression vector, into which the polynucleotide has been incorporated, into the host.
(62) Specifically, the polynucleotide according to the present invention is a polynucleotide encoding a domain including a cellobiohydrolase catalyst domain, and consists of any of the following base sequences (a) to (e).
(63) (a) A base sequence encoding a polypeptide that consists of an amino acid sequence represented by SEQ ID NO:1
(64) (b) A base sequence encoding a polypeptide that consists of an amino acid sequence formed by deletion, substitution, or addition of at least one amino acid residue of the amino acid sequence represented by SEQ ID NO:1 (here, serine in the 88th position, phenylalanine in the 230th position, and serine in the 414th position are excluded) and has cellobiohydrolase activity under conditions of at least a temperature of 80° C. and a pH of 5.5
(65) (c) A base sequence encoding a polypeptide that consists of an amino acid sequence (herein, in the amino acid sequence, the 88th position is serine, the 230th position is phenylalanine, and the 414th position is serine) having sequence identity of equal to or higher than 85% with the amino acid sequence represented by SEQ ID NO:1 and has cellobiohydrolase activity under conditions of at least a temperature of 80° C. and a pH of 5.5
(66) (d) A base sequence that has sequence identity of equal to or higher than 80% with the base sequence represented by SEQ ID NO:2 and encodes a polypeptide having cellobiohydrolase activity under conditions of at least a temperature of 80° C. and a pH of 5.5
(67) (e) A base sequence of a polynucleotide that hybridizes with a polynucleotide consisting of the base sequence represented by SEQ ID NO:2 under stringent conditions, and that encodes a polypeptide having cellobiohydrolase activity under conditions of at least a temperature of 80° C. and a pH of 5.5
(68) In the present invention and the present specification, examples of the stringent conditions include the method described in Molecular Cloning-A LABORATORY MANUAL THIRD EDITION (Sambrook et al., Cold Spring Harbor Laboratory Press). For example, under the condition, in a hybridization buffer consisting of 6×SSC (composition of 20×SSC: 3 M sodium chloride, 0.3 M citric acid solution, a pH of 7.0), 5×Denhardt's solution (composition of 100×Denhardt's solution: 2% by mass of bovine serum albumin, 2% by mass of ficoll, 2% by mass of polyvinylpyrrolidone), 0.5% by mass of SDS, 0.1 mg/mL salmon sperm DNA, and 50% formamide, hybridization is performed by incubating the proteins for several hours or overnight at a temperature of 42° C. to 70° C. As a wash buffer used for washing after the incubation, a 1×SSC solution containing 0.1% by mass of SDS is preferable, and a 0.1×SSC solution containing 0.1% by mass of SDS is more preferable.
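The working concentrations of the concentrated stocks named above follow from simple dilution arithmetic (stock volume = final volume × target concentration / stock concentration). The sketch below works this out for the SSC and Denhardt's components only; the 100 mL final volume and the function name `stock_volume_ml` are assumptions chosen for illustration.

```python
def stock_volume_ml(final_volume_ml, target_x, stock_x):
    """Volume of an N-fold concentrated stock needed for a given final volume."""
    if target_x > stock_x:
        raise ValueError("target concentration exceeds stock concentration")
    return final_volume_ml * target_x / stock_x


FINAL_VOLUME_ML = 100.0  # assumed batch size, for illustration only

ssc_ml = stock_volume_ml(FINAL_VOLUME_ML, 6, 20)          # 6x SSC from 20x stock
denhardt_ml = stock_volume_ml(FINAL_VOLUME_ML, 5, 100)    # 5x from 100x stock
print(ssc_ml, denhardt_ml)  # 30.0 5.0
```

The same function gives the wash buffers: 1×SSC and 0.1×SSC correspond to 5 mL and 0.5 mL of 20×SSC per 100 mL, respectively.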
(69) In the base sequences (a) to (e), as a degenerated codon, it is preferable to select a sequence that is highly frequently used as a codon of the host. For example, the base sequence (a) may be either the base sequence represented by SEQ ID NO:2 or a base sequence obtained by modifying the base sequence represented by SEQ ID NO:2 into the codon that is highly frequently used in the host without changing the amino acid sequence to be encoded. The modification of the codon can be performed by known genetic recombination techniques.
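Replacing codons with host-preferred synonyms without changing the encoded amino acid sequence can be sketched as follows. The sketch assumes the standard genetic code and an in-frame sequence; the function name `recode` is hypothetical, and the `toy_preference` map is a placeholder for illustration, not a measured codon usage table for any host.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def recode(dna, preferred):
    """Swap each codon for the preferred synonymous codon, if one is given.

    preferred maps a one-letter amino acid code to the codon to use for it.
    Codons for amino acids absent from the map are left unchanged.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        residue = CODON_TABLE[codon]
        new_codon = preferred.get(residue, codon)
        if CODON_TABLE[new_codon] != residue:
            raise ValueError(f"{new_codon} does not encode {residue}")
        recoded.append(new_codon)
    return "".join(recoded)


toy_preference = {"L": "CTG", "S": "AGC"}  # placeholder values only
print(recode("ATGTTATCTTAA", toy_preference))  # ATGCTGAGCTAA
```

Because each replacement is checked against the codon table, the recoded sequence encodes the same amino acid sequence as the input.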
(70) The polynucleotide consisting of the base sequence represented by SEQ ID NO:2 may be chemically synthesized based on base sequence information. Alternatively, the polynucleotide may be obtained by acquiring a full-length gene encoding AR19G-166 (the gene may be referred to as an AR19G-166 gene in some cases) or a partial domain thereof including the cellobiohydrolase catalyst domain from nature by using a genetic recombination technique. For example, the full-length AR19G-166 gene or the partial domain thereof can be obtained in a manner in which a sample containing microorganisms is obtained from nature; and PCR is performed using a forward primer and a reverse primer designed by a common method based on the base sequence represented by SEQ ID NO:2 by using genomic DNA collected from the sample as a template. cDNA, which is synthesized by a reverse transcription reaction performed using mRNA collected from the sample as a template, may also be used as a template. Furthermore, the sample, from which a nucleic acid to be a template is collected, is preferably a sample collected from a high temperature environment such as the soil of a hot spring.
(71) In the base sequence (d), the sequence identity with the base sequence represented by SEQ ID NO:2 is not particularly limited as long as it is equal to or higher than 80%. However, the sequence identity is preferably equal to or higher than 85%, more preferably equal to or higher than 90%, and even more preferably equal to or higher than 95%.
(72) The sequence identity (homology) between base sequences is obtained in the following manner. That is, in a state in which a gap is formed in positions where insertion and deletion occur, two base sequences are juxtaposed with each other such that the corresponding bases become as identical to each other as possible; and a ratio of the identical bases to the total bases excluding the gap in the obtained alignment is calculated as the sequence identity. The sequence identity between base sequences can be obtained by using various homology search software known in the field of related art. In the present invention, the value of the sequence identity of the base sequence is obtained by calculation based on the alignment obtained by the known homology search software BLASTN.
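The identity calculation described above can be sketched, for illustration, as a function that takes two already-aligned sequences (for example, rows copied from a BLASTN alignment) and counts identical bases over all columns that contain no gap. The function name and the toy sequences are assumptions made for this sketch. The alignment itself is presumed to have been produced beforehand.

```python
GAP = "-"


def sequence_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical bases over all alignment columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == GAP or base_b == GAP:
            continue  # gap columns are excluded from the denominator
        compared += 1
        if base_a == base_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return identical / compared


# Toy alignment (hypothetical): 7 gap-free columns, 6 of them identical.
identity = sequence_identity("ATGC-TTA", "ATGCATAA")
print(f"{identity:.1%}")
```

A threshold such as "equal to or higher than 80%" then becomes a simple comparison, `identity >= 0.80`.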
(73) For example, each of the polynucleotides consisting of the base sequence (b) or (c) can be artificially synthesized by causing deletion, substitution, or addition of at least one base in the polynucleotide consisting of the base sequence represented by SEQ ID NO:2. Moreover, the base sequence (b) or (c) may be a full-length sequence or a partial sequence of a homologue gene of the AR19G-166 gene. The homologue gene of the AR19G-166 gene can be obtained by a genetic recombination technique used for obtaining a homologue gene of a gene of which the base sequence is already known.
(74) The polynucleotide according to the present invention may have only the domain encoding the cellobiohydrolase catalyst domain. Alternatively, the polynucleotide may have domains encoding a cellulose-binding module, a linker sequence, various signal peptides, various tags, and the like, in addition to the aforementioned domain.
(75) [Expression Vector]
(76) The expression vector according to the present invention is an expression vector into which the polynucleotide according to the present invention has been incorporated and which expresses a polypeptide having cellobiohydrolase activity in a host cell under conditions of at least a temperature of 80° C. and a pH of 5.5. That is, it is an expression vector into which the polynucleotide according to the present invention has been incorporated in a state of being able to express the super thermostable cellobiohydrolase according to the present invention. Specifically, an expression cassette, which consists of DNA having a promoter sequence, the polynucleotide according to the present invention, and DNA having a terminator sequence in this order from the upstream side, needs to be incorporated into the expression vector. The polynucleotide can be incorporated into the expression vector by using a known genetic recombination technique, and a commercially available expression vector preparation kit may be used.
(77) The expression vector may be an expression vector introduced into prokaryotic cells such as E. coli or an expression vector introduced into eukaryotic cells such as yeast, filamentous fungi, insect culture cells, mammal culture cells, and plant cells. As these expression vectors, any of generally used expression vectors compatible with the host can be used.
(78) The expression vector according to the present invention preferably incorporates not only the polynucleotide according to the present invention but also a drug-resistant gene or the like. This is because such an expression vector makes it easy to distinguish a host that has undergone transformation from a host that has not.
(79) Examples of the drug-resistant gene include a kanamycin-resistant gene, a hygromycin-resistant gene, a bialaphos-resistant gene, and the like.
(80) [Transformant]
(81) The transformant according to the present invention contains the expression vector according to the present invention introduced thereinto. In the transformant, the super thermostable cellobiohydrolase according to the present invention can be expressed. The cellobiohydrolases known in the related art are expressed only in a narrow range of expression hosts. That is, many of the cellobiohydrolases are not easily expressed in different types of host cells. In contrast, the super thermostable cellobiohydrolase according to the present invention can be expressed in a wide range of expression cells such as E. coli, yeast, filamentous fungi, and the chloroplast of higher plants. That is, the transformant according to the present invention includes E. coli, yeast, filamentous fungi, the chloroplast of higher plants, and the like into which the expression vector of the present invention has been introduced.
(82) The method for preparing the transformant by using the expression vector is not particularly limited, and can be performed by methods generally used for preparing transformants. Examples of the methods include an Agrobacterium method, a particle gun method, an electroporation method, a polyethylene glycol (PEG) method, and the like. When the host is a plant cell, it is preferable to use a particle gun method or an Agrobacterium method among the above methods.
(83) In another aspect, the host into which the expression vector is introduced may be prokaryotic cells such as E. coli or eukaryotic cells such as yeast, filamentous fungi, insect culture cells, mammal culture cells, and plant cells. By culturing transformants of E. coli, the super thermostable cellobiohydrolase according to the present invention can be mass-produced more simply. Meanwhile, in eukaryotic cells, the protein undergoes glycosylation. Therefore, if transformants of the eukaryotic cells are used, a super thermostable cellobiohydrolase superior in thermostability is obtained, compared to a case of using the transformants of the prokaryotic cells. Particularly, when the transformants are eukaryotic microorganisms such as a filamentous fungus like Aspergillus or yeast, a super thermostable cellobiohydrolase superior in thermostability can be mass-produced in a relatively simple manner.
(84) When prokaryotic cells, yeast, filamentous fungi, insect culture cells, mammal culture cells, and the like are used as host, generally the obtained transformants can be cultured by a common method in the same manner as used for the host having not undergone transformation.
(85) When the transformant according to the present invention is a plant, as the host, plant culture cell, plant organs, or plant tissues may be used. By using known plant tissue culture methods and the like, transformed plants can be obtained from transformed plant cells, callus, and the like. For example, if transformed plant cells are cultured by using a hormone-free redifferentiation medium, and the obtained young plant, which has taken root, is transplanted into soil or the like and cultivated, a transformed plant can be obtained.
(86) [Method for Manufacturing Super Thermostable Cellobiohydrolase]
(87) The method for manufacturing a super thermostable cellobiohydrolase according to the present invention is a method for producing a super thermostable cellobiohydrolase in the transformant according to the present invention. In the transformant manufactured by using the expression vector, into which the polynucleotide according to the present invention has been incorporated into the downstream of a promoter that is not able to control the time of expression or the like, the super thermostable cellobiohydrolase according to the present invention is expressed constantly. In contrast, a transformant manufactured using a so-called expression induction-type promoter, which induces expression depending on a specific compound, temperature conditions, and the like, is subjected to induction treatment appropriate for the respective expression conditions, and in this way, the super thermostable cellobiohydrolase is expressed in the transformant.
(88) The super thermostable cellobiohydrolase produced by the transformant may be used in a state of being held in the transformant or may be extracted or purified from the transformant.
(89) The method for extracting and purifying the super thermostable cellobiohydrolase from the transformant is not particularly limited as long as it is a method that does not impair the activity of the super thermostable cellobiohydrolase. The super thermostable cellobiohydrolase can be extracted by a method that is generally used for extracting polypeptides from cells or biological tissues. Examples of the method include a method of dipping the transformant in an appropriate extraction buffer, extracting the super thermostable cellobiohydrolase, and then separating the enzyme into an extract and a solid residue. The extraction buffer preferably contains a solubilizing agent such as a surfactant. When the transformant is a plant, the transformant may be shred or fragmented before being dipped in the extraction buffer. Furthermore, as the method for separating the enzyme into an extract and a solid residue, for example, it is possible to use known solid-liquid separation treatment such as a filtration method, a compression filtration method, and a centrifugation method. Moreover, the transformant in a state of being dipped in the extraction buffer may be squeezed. The super thermostable cellobiohydrolase in the extract can be purified using a known purification method such as a salting-out method, an ultrafiltration method, and a chromatography method.
(90) When the super thermostable cellobiohydrolase according to the present invention is expressed in a state of having a secretory signal peptide in the transformant, by culturing the transformant and then collecting culture supernatant excluding the transformant from the obtained culture, a solution containing the super thermostable cellobiohydrolase can be simply obtained. Moreover, when the super thermostable cellobiohydrolase according to the present invention has a tag such as a His tag, the super thermostable cellobiohydrolase in the extract or the supernatant can be simply purified by an affinity chromatography method using the tag.
(91) [Cellulase Mixture]
(92) The super thermostable cellobiohydrolase according to the present invention or the super thermostable cellobiohydrolase manufactured by the method for manufacturing a super thermostable cellobiohydrolase according to the present invention can also be used as a cellulase mixture containing at least one other kind of cellulase. The super thermostable cellobiohydrolase manufactured by the method for manufacturing a super thermostable cellobiohydrolase according to the present invention may be used in a state of being contained in the transformant or may be extracted or purified from the transformant. If the super thermostable cellobiohydrolase according to the present invention is used as a mixture, which contains another cellulase, for a cellulose degradation reaction, recalcitrant lignocellulose can be more efficiently degraded.
(93) The other cellulase contained in the cellulase mixture, that is, the cellulase other than the super thermostable cellobiohydrolase, is not particularly limited as long as the cellulase exhibits hydrolytic activity with respect to cellulose. Examples thereof include a hemicellulase such as xylanase or β-xylosidase, β-glucosidase, endoglucanase, and the like. The cellulase mixture according to the present invention preferably contains at least one of the hemicellulase and endoglucanase and more preferably contains both the hemicellulase and endoglucanase. Particularly, the cellulase mixture preferably contains at least one kind selected from the group consisting of xylanase, β-xylosidase, β-glucosidase, and endoglucanase, and more preferably contains all of the xylanase, β-xylosidase, β-glucosidase, and endoglucanase.
(94) The other cellulase contained in the cellulase mixture is preferably a thermostable cellulase having cellulase activity at a temperature of at least 70° C., and more preferably a thermostable cellulase having cellulase activity at a temperature of 70° C. to 90° C. If all of the enzymes contained in the cellulase mixture are resistant to heat, the degradation reaction of cellulose performed using the cellulase mixture can be efficiently performed under high temperature conditions. That is, when the cellulase mixture contains only thermostable cellulases, if the cellulase mixture is used for hydrolysis of lignocellulose, a hydrolysis reaction of lignocellulose can be performed under a high temperature condition at a hydrolysis temperature of 70° C. to 90° C. By the hydrolysis performed under the high temperature condition, the amount of enzyme and the time taken for hydrolysis can be markedly decreased, and the hydrolysis cost can be greatly reduced.
(95) [Method for Manufacturing Cellulose Degradation Product]
(96) The method for manufacturing a cellulose degradation product according to the present invention is a method for obtaining a degradation product by degrading cellulose by using the super thermostable cellobiohydrolase according to the present invention. Specifically, by bringing a cellulose-containing material into contact with the super thermostable cellobiohydrolase according to the present invention, the transformant according to the present invention, or the super thermostable cellobiohydrolase manufactured by the method for manufacturing a super thermostable cellobiohydrolase according to the present invention, a cellulose degradation product is produced.
(97) Herein, the term cellulose degradation product means a cello-oligosaccharide containing mainly cellobiose, cellotriose and the like.
(98) The cellulose-containing material is not particularly limited as long as it contains cellulose. Examples of the material include cellulose-based biomass such as weeds and agricultural waste, waste paper, and the like. Before being brought into contact with the super thermostable cellobiohydrolase according to the present invention, the cellulose-containing material is preferably subjected to physical treatment such as fragmentation or shredding and chemical treatment using an acid, an alkali, and the like, dipped in an appropriate buffer, or subjected to a dissolution treatment.
(99) The reaction condition of the hydrolysis reaction of the cellulose performed by the super thermostable cellobiohydrolase according to the present invention is preferably a condition under which the super thermostable cellobiohydrolase exhibits cellobiohydrolase activity. For example, the reaction is preferably performed under conditions of a temperature of 55° C. to 100° C. and a pH of 3.5 to 7.0, and more preferably performed under conditions of a temperature of 70° C. to 100° C. and a pH of 4.0 to 6.0. The reaction time is appropriately adjusted in consideration of the type of the cellulose-containing material used for the hydrolysis reaction, the pretreatment method of the material, the amount of the material, and the like. For example, the reaction may be performed for 10 minutes to 100 hours. In the case of degrading cellulose-based biomass, the reaction can be performed for 1 hour to 100 hours.
(100) For the hydrolysis reaction of cellulose, in addition to the super thermostable cellobiohydrolase according to the present invention, at least one other kind of cellulase is preferably used. As the other cellulase, the same cellulase as the cellulase contained in the cellulase mixture can be used, and the cellulase is preferably a thermostable cellulase which has cellulase activity at a temperature of at least 70° C. and more preferably at a temperature of 70° C. to 100° C. Furthermore, in the method for manufacturing the cellulose degradation product, instead of the super thermostable cellobiohydrolase according to the present invention, the transformant according to the present invention, or the super thermostable cellobiohydrolase manufactured by the method for manufacturing a super thermostable cellobiohydrolase according to the present invention, the aforementioned cellulase mixture may be used.
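For illustration, the two temperature and pH windows stated above can be encoded as a small check that classifies a planned reaction condition. This is only a sketch: the class name `Window`, the function `classify_condition`, and the returned labels are hypothetical, and the numeric bounds are taken directly from the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Inclusive temperature (deg C) and pH bounds for a reaction condition."""
    temp_min_c: float
    temp_max_c: float
    ph_min: float
    ph_max: float

    def contains(self, temp_c: float, ph: float) -> bool:
        return (self.temp_min_c <= temp_c <= self.temp_max_c
                and self.ph_min <= ph <= self.ph_max)


# Bounds as stated in the text.
PREFERRED = Window(55.0, 100.0, 3.5, 7.0)
MORE_PREFERRED = Window(70.0, 100.0, 4.0, 6.0)


def classify_condition(temp_c: float, ph: float) -> str:
    """Return which of the stated windows a (temperature, pH) pair falls into."""
    if MORE_PREFERRED.contains(temp_c, ph):
        return "more preferred"
    if PREFERRED.contains(temp_c, ph):
        return "preferred"
    return "outside stated ranges"


print(classify_condition(80.0, 5.5))
```

The narrower window is tested first because every condition inside it also lies inside the broader one.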
EXAMPLES
(101) Next, the present invention will be more specifically described based on examples, but the present invention is not limited to the following examples.
Example 1
Cloning of Novel Super Thermostable Cellobiohydrolase from Soil of Hot Spring
(102) <1> Preparation of Metagenomic DNA of High Temperature Soil
(103) From an area in Japan from which high temperature hot spring gushes out, hot spring water containing soil, mud, and biomass was collected. The temperature and pH of the just collected samples were in a range of 33° C. to 78° C. and 7.2 to 8 respectively. Among the samples of soil of hot spring, five samples named AR19, OJS2, OJS4, OJS7, and OJS9 respectively were subjected to DNA extraction as metagenomic DNA samples. From 10 g of each of the collected samples of soil of hot spring, DNA was extracted by using a DNA extraction kit (ISOIL Large for Beads ver. 2, NIPPON GENE CO., LTD.). From AR19, DNA was obtained in an amount of equal to or greater than 10 μg, and metagenomic sequencing was performed on 5 μg of the DNA by using GS FLX Titanium 454 manufactured by Roche Diagnostics. The remaining DNA was used for PCR cloning of cellulase genes. Meanwhile, the samples, from which a small amount (equal to or less than 10 μg) of DNA was obtained, were subjected to genome amplification by using a genomic DNA amplification kit (GenomiPhi V2 DNA Amplification Kit, manufactured by GE Healthcare).
(104) <2> Assembly and Statistic of Metagenomic Data of Soil of Hot Spring
(105) By using AR19 as one of the genomic DNA samples extracted from the metagenomes of soil of hot spring, shotgun sequencing was performed three times according to the standard protocol of shotgun sequencing of 454 GS FLX Titanium technology manufactured by Roche. The output (sff file) of the Roche 454 was subjected to re-base calling by PyroBayes (Quinlan et al., Nature Methods, 2008, vol. 5, p. 179-81.), and as a result, FASTA format sequence files and Quality-value files were obtained. The ends of the obtained sequence reads were cut off so as to improve the quality thereof, and the sequence reads were assembled by using assembly software Newbler version 2.3 of 454 Life Science. The assembly was performed under conditions set to be minimum acceptable overlap match (mi)=0.9, option:large (for large or complex genomes, speeds up assembly, but reduces accuracy.).
(106) The reads having undergone Quality filtering processing and assembled contigs having a length of equal to or greater than 100 bp had a length of 330 Mbp in total, and the data set was used for cellulase enzyme gene analysis. Among a total of 2,766,332 reads, 2,040,651 reads were assembled into contigs (101,372 contigs in total) having a length of equal to or greater than 1,027 bp on average, and among these, the longest contig had a length of 187,970 bp.
(107) For the assembled sequences, all of the contigs and singletons were phylogenetically classified into five categories including bacteria, archaea (archaebacteria), prokaryote, virus, and a group belonging to none of these, with reference to KEGG database (Kanehisa, M. Science & Technology Japan, 1996, No. 59, p. 34-38, http://www.genome.jp/kegg/, 2011/5/11 (searched)). In the length of 330 Mbp (=total contig length+total singleton length) of the assembled sequence, the length of the sequence hit into bacteria was 59.9 Mbp, the length of the sequence hit into archaea was 3.1 Mbp, the length of the sequence hit into prokaryote was 25,499 bp, and the length of the sequence hit into virus was 384,255 bp. These results evidently showed that the metagenomic database included only 19.2% of the known DNA sequence. The length of the sequence belonging to none of the above categories was 266.7 Mbp and accounted for 80.8% of the entire assembled sequence. These are novel sequences derived from any of bacteria, archaea, and prokaryote.
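The percentages quoted above follow directly from the stated lengths. As a check, the arithmetic can be reproduced with the short sketch below. The variable names are illustrative, and the category labels simply repeat those used in the text.

```python
# Lengths as stated in the text (base pairs).
TOTAL_BP = 330e6          # total contig length + total singleton length
UNCLASSIFIED_BP = 266.7e6  # sequence belonging to none of the categories

hits_bp = {
    "bacteria": 59.9e6,
    "archaea": 3.1e6,
    "prokaryote": 25_499,
    "virus": 384_255,
}

known_bp = sum(hits_bp.values())
known_pct = 100 * known_bp / TOTAL_BP
unclassified_pct = 100 * UNCLASSIFIED_BP / TOTAL_BP

print(f"known: {known_pct:.1f}%  unclassified: {unclassified_pct:.1f}%")
```

The four classified categories sum to roughly 63.4 Mbp, which together with the 266.7 Mbp of unclassified sequence accounts for the 330 Mbp total to within rounding.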
(108) <3> Open Reading Frame (ORF) Prediction of Cellobiohydrolase
(109) From UniProt database (http://www.uniprot.org/), sequences of EC Nos. 3.2.1.4 (cellulase), 3.2.1.37 (β-xylosidase), 3.2.1.91 (cellulose 1,4-β-cellobiosidase), and 3.2.1.8 (endo-1,4-β-xylanase) were downloaded (access date: 2009/4/13), and a proteomic local database of these glycoside hydrolase genes was constructed. By using annotation software Orphelia (Hoff et al., Nucleic Acids Research, 2009, 37 (Web Server issue: W101-W105)), a gene domain (=open reading frame) was estimated from the contig sequences obtained in the section <2> (Orphelia option: default (model=Net 700, maxoverlap=60)). In order to extract glycoside hydrolase genes from the estimated ORF, BLASTP (blastall ver. 2.2.18) was used, and the local database was referred to. The option conditions of BLASTP were set to be Filter query sequence=false, Expectation value (E)<1e.sup.-20, default values: Cost to open a gap=1, Cost to extend gap=1, X dropoff value for gapped alignment=0, Threshold for extending hits=0, Word size=default, and the hit sequences were collected as glycoside hydrolase genes.
(110) The glycoside hydrolases, such as cellulase, endohemicellulase, and debranching enzymes, obtained in the aforementioned manner were subjected to functional classification based on Pfam HMMs (Pfam version 23.0 and HMMER v 2.3 Finn et al., Nucleic Acids Research Database, 2010, Issue 38, p. D211-222) as functional domain sequence database of proteins. Specifically, by using a sequence homology search algorithm HMMER (Durbin et al., The theory behind profile HMMs. Biological sequence analysis: probabilistic models of proteins and nucleic acids, 1998, Cambridge University Press.; hmmpfam (Ver. 2.3.2), E-value cutoff <1e.sup.-5; Database=Pfam_fs (models that can be used to find fragments of the represented domains in a sequence)) using a hidden Markov model, the sequences were classified into a glycoside hydrolase (GH) family based on the homology thereof with the Pfam domain database.
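As a hedged illustration of the E-value screening step, the sketch below filters BLASTP hits and keeps the ORFs that have at least one hit with E below 1e-20. It assumes that the hits are available in the 12-column tab-separated layout written by legacy BLAST (`blastall -m 8`), in which the E-value is the eleventh column. The specification does not state which output format was used, and the ORF and subject identifiers shown are hypothetical.

```python
E_VALUE_CUTOFF = 1e-20  # cutoff stated in the text
EVALUE_COLUMN = 10      # zero-based index of the E-value in 12-column tabular output


def parse_evalue(text: str) -> float:
    """Parse an E-value, tolerating an exponent written without a leading digit."""
    text = text.strip()
    if text.lower().startswith("e"):
        text = "1" + text  # e.g. "e-100" is read as 1e-100
    return float(text)


def filter_hits(tabular_lines, cutoff: float = E_VALUE_CUTOFF):
    """Return sorted query IDs having at least one hit with E-value below cutoff."""
    kept = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if parse_evalue(fields[EVALUE_COLUMN]) < cutoff:
            kept.add(fields[0])
    return sorted(kept)


# Hypothetical hits: query, subject, %id, length, mismatches, gaps,
# q.start, q.end, s.start, s.end, E-value, bit score.
example = [
    "orf_a\tsubject_1\t66.0\t400\t136\t2\t1\t400\t5\t404\t3e-45\t180",
    "orf_b\tsubject_2\t35.0\t120\t78\t3\t10\t129\t40\t159\t1e-10\t60",
    "orf_c\tsubject_3\t80.0\t500\t100\t0\t1\t500\t1\t500\te-100\t350",
]
print(filter_hits(example))
```

The comparison is strict (`<`), matching the condition "(E)<1e-20" given for the BLASTP search.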
(111) <4> Cloning of AR19G-166RA and AR19G-166QV
(112) From the metagenome AR19 of the soil of hot spring, thirteen ORFs including cellobiohydrolase catalyst domain sequences belonging to GH6, GH9, GH48, and other GH families were obtained. For these ORFs, primers were designed, and the genes were cloned from the metagenome AR19 of the soil of hot spring by means of PCR. From the open reading frame AR19G-166, cellobiohydrolase genes (AR19G-166RA and AR19G-166QV) belonging to the GH6 family were cloned.
(113) <5> Open Reading Frame AR19G-166
(114) The open reading frame AR19G-166 encoded a polypeptide (SEQ ID NO:5) consisting of 474 amino acid residues. However, AR19G-166 consisted of an incomplete sequence in which the start codon was lost, and was constituted only with the partial sequence of a linker and the GH6 catalyst domain. The GH6 catalyst domain of AR19G-166 was cloned into the genes AR19G-166RA and AR19G-166QV by PCR cloning, and exhibited 66% of amino acid sequence identity with respect to a GH6 glycoside hydrolase (Genbank: ABX04776.1) of Herpetosiphon aurantiacus DSM 785 as a mesophilic thermophilic bacterium in the phylum Chloroflexi. By PCR cloning using a forward primer consisting of a base sequence represented by SEQ ID NO:7 (5′-CACCATGTTGGACAATCCATTCATCGGAG-3′: obtained by adding seven bases (CACCATG) to the 5′-terminal side of a base sequence represented by SEQ ID NO:6; in the added sequence, ATG at the 3′ side is a start codon, and CACC at the 5′ side is a sequence for being inserted into a vector) and a reverse primer consisting of a base sequence represented by SEQ ID NO:8 (5′-TAGGGTTGGATCGGCGGATAG-3′), two gene clones (AR19G-166RA and AR19G-166QV) were obtained from AR19G-166. The amino acid sequences encoded by AR19G-166RA and AR19G-166QV were different from each other only in terms of two amino acid residues in the 299.sup.th position and the 351.sup.st position. In AR19G-166RA, the amino acid residue in the 299.sup.th position was arginine, and the amino acid residue in the 351.sup.st position was alanine. In AR19G-166QV, the amino acid residue in the 299.sup.th position was glutamine, and the amino acid residue in the 351.sup.st position was valine.
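The construction of the forward primer described above (vector insertion sequence, then start codon, then gene-specific sequence) can be sketched as follows. The gene-specific part is not quoted from the sequence listing. It is inferred here by removing the seven added bases from the forward primer given in the text, and the function names are illustrative.

```python
VECTOR_INSERTION_SEQ = "CACC"   # 5' sequence for insertion into a vector
START_CODON = "ATG"
# Inferred gene-specific part: forward primer minus the seven added bases.
GENE_SPECIFIC_5P = "TTGGACAATCCATTCATCGGAG"
REVERSE_PRIMER = "TAGGGTTGGATCGGCGGATAG"  # SEQ ID NO:8 as quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_forward_primer(gene_specific: str) -> str:
    """Prepend the vector insertion sequence and a start codon (5' to 3')."""
    return VECTOR_INSERTION_SEQ + START_CODON + gene_specific.upper()


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


forward_primer = build_forward_primer(GENE_SPECIFIC_5P)
print(forward_primer)                      # forward primer, 5' to 3'
print(reverse_complement(REVERSE_PRIMER))  # sense-strand site bound by the reverse primer
```

The reverse complement of the reverse primer gives the sequence expected at the 3′ end of the amplified sense strand, which is useful when locating the primer site in a contig.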
(115) <6> PCR Cloning of Natural Variant
(116) The aforementioned five metagenomic DNA samples (AR19, OJS2, OJS4, OJS7, and OJS9) were subjected to PCR cloning of cellobiohydrolase genes by using a forward primer consisting of the base sequence represented by SEQ ID NO:7 and a reverse primer consisting of the base sequence represented by SEQ ID NO:8. The sequence of the hit clones was decoded by a Sanger sequencer. The genotype and amino acid mutation of each of the clones are shown in Tables 1 and 2. By performing PCR cloning on a plurality of metagenomic DNA samples, a total of forty-six clones of natural variants having a large number of SNPs were obtained. These variants showed SNPs in a total of twenty-seven sites including c6t, g51a, a69g, a201g, c259t, c433t, c450t, g456c, c531t, a627g, a896g, c1116t, c98t (A33V), t251c (I84T), c262t (P88S), g497a (R166H), c683t (T228I), c688t (L230F), a762t (E254D), a761g (E254G), c805t (R269C), a896g (R299Q), a916g (S306G), c1052t (A351V), a1078g (E360K), g1216a (A406T), and t1241c (F414S) (each SNP is represented in lowercase letters, and the letters in parentheses represent the amino acid substitution caused by the SNP). The SNPs caused amino acid substitution in fourteen sites among the twenty-seven sites (Tables 1 and 2). In all of the clones showing the SNPs, IN/DEL mutation (insertion and deletion of a DNA base sequence) was not observed.
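The SNP notation used above (reference base, nucleotide position, variant base) maps onto the amino acid position by simple integer arithmetic. The sketch below illustrates this under the assumption that nucleotide numbering starts at the first base of the start codon of the cloned gene, which is consistent with every SNP and substitution pair listed in the text. The function names are hypothetical.

```python
import re

SNP_PATTERN = re.compile(r"^([acgt])(\d+)([acgt])$")


def parse_snp(snp: str):
    """Split a SNP such as 'c262t' into (reference base, position, variant base)."""
    match = SNP_PATTERN.match(snp.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized SNP notation: {snp!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def codon_number(nt_position: int) -> int:
    """1-based codon (amino acid) index containing a 1-based nucleotide position."""
    return (nt_position - 1) // 3 + 1


def position_in_codon(nt_position: int) -> int:
    """1, 2, or 3: which base of the codon the nucleotide position occupies."""
    return (nt_position - 1) % 3 + 1


# SNPs and the substitutions they cause, as listed in the text.
EXAMPLES = {"c262t": "P88S", "c688t": "L230F", "a896g": "R299Q", "t1241c": "F414S"}

for snp, substitution in EXAMPLES.items():
    _, position, _ = parse_snp(snp)
    print(snp, "-> codon", codon_number(position), "listed as", substitution)
```

For example, a762t and a761g both fall in codon 254, which is why the fifteen substitutions listed in the text correspond to fourteen distinct amino acid sites.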
(117) TABLE-US-00001 TABLE 1
Metagenome DNA samples                    AR19  OJS2  OJS4  OJS5  OJS9 Total
No. of clones amplified by colony PCR      240    48    48    48    48   432
No. of clones hit
  QA (Q299/A351)                            18     3     4     3     1    29
  QV (Q299/V351)                             1     0     0     0     0     1
  RA (R299/A351)                             4     0     0     0     0     4
  QA/A33V                                    2     0     0     0     0     2
  QA/T228I                                   0     0     1     0     0     1
  QA/R269C                                   1     0     0     0     0     1
  QA/E254G                                   0     0     0     1     0     1
  QA/S306G                                   2     0     0     0     0     2
  QA/R166H/E360K                             0     0     1     0     0     1
  QA/I84T/A406T                              0     0     0     1     0     1
  RA/E254D                                   1     0     0     0     0     1
  RA/P88S/L230F/F414S                        0     2     0     0     0     2
Total No. of clones hit                     29     5     6     5     1    46
(118) TABLE-US-00002 TABLE 2
Clone variants           SNPs and amino acid substitutions
Among QA clones          c6t, a201g, c259t, c450t, g456c, c531t, a627g
QV                       c1052t (A351V)
RA                       a51g, a201g, a896g (Q299R)
QA/A33V                  c98t (A33V)
QA/T228I                 a201g, c433t, c683t (T228I), c1116t
QA/R269C                 c805t (R269C)
QA/E254G                 a201g, a761g (E254G)
QA/S306G                 a69g, a916g (S306G)
QA/R166H/E360K           a201g, g497a (R166H), a1078g (E360K)
QA/I84T/A406T            a201g, t251c (I84T), g1216a (A406T)
RA/E254D                 a51g, a201g, a762t (E254D), a896g (Q299R)
RA/P88S/L230F/F414S      a201g, c262t (P88S), c688t (L230F), a896g (Q299R), t1241c (F414S)
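As a consistency check on Table 1, the per-sample clone counts can be tallied as in the sketch below. The counts and column labels are transcribed from the table as printed. The dictionary layout and helper names are assumptions made for illustration.

```python
# Column order follows the table header as printed.
SAMPLES = ("AR19", "OJS2", "OJS4", "OJS5", "OJS9")

CLONES_HIT = {
    "QA": (18, 3, 4, 3, 1),
    "QV": (1, 0, 0, 0, 0),
    "RA": (4, 0, 0, 0, 0),
    "QA/A33V": (2, 0, 0, 0, 0),
    "QA/T228I": (0, 0, 1, 0, 0),
    "QA/R269C": (1, 0, 0, 0, 0),
    "QA/E254G": (0, 0, 0, 1, 0),
    "QA/S306G": (2, 0, 0, 0, 0),
    "QA/R166H/E360K": (0, 0, 1, 0, 0),
    "QA/I84T/A406T": (0, 0, 0, 1, 0),
    "RA/E254D": (1, 0, 0, 0, 0),
    "RA/P88S/L230F/F414S": (0, 2, 0, 0, 0),
}


def column_totals(table):
    """Number of hit clones per metagenomic DNA sample."""
    return tuple(sum(column) for column in zip(*table.values()))


def row_totals(table):
    """Number of hit clones per variant, summed over all samples."""
    return {variant: sum(counts) for variant, counts in table.items()}


per_sample = dict(zip(SAMPLES, column_totals(CLONES_HIT)))
grand_total = sum(per_sample.values())
qa_fraction = row_totals(CLONES_HIT)["QA"] / grand_total

print(per_sample, grand_total, f"QA share: {qa_fraction:.0%}")
```

The tally reproduces the printed totals row and confirms that QA accounts for more than half of the hit clones.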
(119) Among the forty-six clones obtained as above, twenty-nine clones, which were more than half of the clones, were variants AR19G-166QA (hereinafter, abbreviated to QA, SEQ ID NO:3) encoding an amino acid sequence in which the amino acid residue in the 299.sup.th position was glutamine and the amino acid residue in the 351.sup.st position was alanine; one clone was a variant AR19G-166QV (hereinafter, abbreviated to QV, SEQ ID NO:9) encoding an amino acid sequence in which the amino acid residue in the 299.sup.th position was glutamine and the amino acid residue in the 351.sup.st position was valine; and four clones were variants AR19G-166RA (hereinafter, abbreviated to RA, SEQ ID NO:10) encoding an amino acid sequence in which the amino acid residue in the 299.sup.th position was arginine and the amino acid residue in the 351.sup.st position was alanine (Table 1).
(120) Twelve other clones were variants formed by mutation of one to three amino acid residues in QA, RA, or QV. Among the twelve clones, nine clones were variants formed by substitution of one to two amino acid residues in the amino acid sequence encoded by AR19G-166QA (Table 1). The nine clones included two QA/A33V clones in which alanine in the 33.sup.rd position was substituted with valine; one QA/T228I clone in which threonine in the 228.sup.th position was substituted with isoleucine; one QA/R269C clone in which arginine in the 269.sup.th position was substituted with cysteine; one QA/E254G clone in which glutamic acid in the 254.sup.th position was substituted with glycine; two QA/S306G clones in which serine in the 306.sup.th position was substituted with glycine; one QA/R166H/E360K clone in which arginine in the 166.sup.th position and glutamic acid in the 360.sup.th position were substituted with histidine and lysine respectively; and one QA/I84T/A406T clone in which isoleucine in the 84.sup.th position and alanine in the 406.sup.th position were substituted with threonine.
(121) Three other clones were variants formed by substitution of one or three amino acid residues in the amino acid sequence encoded by AR19G-166RA. These clones included one RA/E254D clone in which glutamic acid in the 254.sup.th position was substituted with aspartic acid; and two RA/P88S/L230F/F414S clones (also referred to as AR19G-166RASFS, SEQ ID NO:1) in which proline in the 88.sup.th position, leucine in the 230.sup.th position, and phenylalanine in the 414.sup.th position were substituted with serine, phenylalanine, and serine respectively.
(122)
(123) AR19G-166QA was the clone that most frequently appeared, and a total of twenty-nine clones of AR19G-166QA were obtained from the five kinds of metagenomic DNA (Table 1). The twenty-nine AR19G-166QA clones included SNPs at seven sites (c6t, a201g, c259t, c450t, g456c, c531t, and a627g) in the base sequence (Table 2).
(124) Unlike the AR19G-166QA genes, the base sequence of AR19G-166RA genes included SNPs at three sites (a51g, a201g, and a896g) (Table 2). Among these, two sites were silent mutation, and guanine at both the 51.sup.st position and the 201.sup.st position of the base sequence were substituted with adenine. The remaining one site was SNP involved in amino acid substitution, and adenine in the 896.sup.th position was mutated into guanine. Due to the SNP, in AR19G-166RA, the amino acid residue in the 299.sup.th position in the amino acid sequence was substituted with arginine.
(125) The base sequence of the AR19G-166QV genes included SNP at one site (Table 2). In the base sequence, cytosine in the 1,052.sup.nd position was mutated into thymine. Consequentially, in AR19G-166QV, the amino acid residue in the 351.sup.st position of the amino acid sequence encoded by AR19G-166QA was substituted with valine.
(126) Among the twelve kinds of AR19G-166 SNP variants, AR19G-166RASFS exhibiting strongest thermostability included SNPs at five sites (a201g, c262t (P88S), c688t (L230F), a896g (Q299R), and t1241c (F414S)) unlike AR19G-166QA (Table 2). Among these, four sites were involved in amino acid substitution. At one site among the four sites, adenine in the 896.sup.th position was mutated into guanine similarly to AR19G-166RA, and the amino acid residue in the 299.sup.th position in the amino acid sequence encoded by AR19G-166RASFS was substituted with arginine. At three other sites, cytosine in the 262.sup.nd position was mutated into thymine, cytosine in the 688.sup.th position was mutated into thymine, and thymine in the 1241.sup.st position was mutated into cytosine. Due to the SNPs, in the amino acid sequence encoded by AR19G-166RASFS, proline in the 88.sup.th position was substituted with serine, leucine in the 230.sup.th position was substituted with phenylalanine, and phenylalanine in the 414.sup.th position was substituted with serine.
(127) <7> Expression and Purification of Amino Acid Substitution Natural Variant
(128) Plasmids obtained by PCR cloning were introduced into a Rosetta-gamiB (DE3) pLysS strain (manufactured by Merck & Co., Inc) of E. coli for protein expression by a heat shock method. E. coli having target genes was inoculated into an LB medium containing ampicillin at 100 mg/L and cultured until OD.sub.600 became about 0.2 to 0.8. Thereafter, IPTG (isopropyl-β-D(−)-thiogalactopyranoside) was added thereto, and the bacterium was cultured for 20 hours to induce expression of target proteins. After culturing, E. coli was collected by centrifugation, and 200 mM acetate buffer (a pH of 5.5) having a volume of 1/20 of the volume of the culture fluid was added thereto so as to suspend the bacterium. Subsequently, by using an ultrasonic fragmentation device BioRuptor UCD-200T (manufactured by COSMO BIO CO., LTD.), the resultant was subjected to fragmentation for 30 seconds, and then the fragmentation was stopped for 30 seconds. This treatment cycle was repeated 10 times, and as a result, supernatant containing the target proteins having undergone amino acid substitution was obtained. The supernatant was taken as a crude enzyme sample liquid.
(129) The target proteins in the crude enzyme sample liquid were confirmed by SDS-PAGE analysis.
(130) The crude enzyme sample liquid was loaded onto an ion exchange column HiTrap Q HP (manufactured by GE Healthcare) equilibrated using 50 mM Tris-HCl buffer (a pH of 8.0). Thereafter, by using a medium/high-pressure liquid chromatography system AKTA design (manufactured by GE Healthcare) and 50 mM Tris-HCl buffer (a pH of 8.0) containing 1 M NaCl, proteins were fractionated at a concentration gradient of 0% to 50%. The fractions having cellobiohydrolase activity were mixed together, and then the solution thereof was exchanged with 50 mM Tris-HCl buffer (a pH of 8.0) containing 750 mM ammonium sulfate by using a centrifugal ultrafiltration membrane VIVASPIN 20 (manufactured by Sartorius Stedim Biotech). Then the fraction having cellobiohydrolase activity having undergone the solution exchange was loaded onto a hydrophobic interaction separation column HiTrap Phenyl HP (manufactured by GE Healthcare) equilibrated using the same buffer, and proteins were eluted at a concentration gradient of 100% to 0% by using 50 mM Tris-HCl buffer (a pH of 8.0). The fractions having cellobiohydrolase activity were mixed together and then concentrated using VIVASPIN 20 until the liquid amount thereof became about 8 mL. The concentrated sample was added to a gel filtration column HiLoad 26/60 Superdex 200 pg (manufactured by GE Healthcare) equilibrated using 50 mM Tris-HCl buffer (a pH of 8.0) containing 150 mM NaCl, and fractionation was performed by causing the same buffer having a volume 1 to 1.5 times greater than that of the column to flow at a rate of 2 mL/min to 3 mL/min. After the fractions having cellobiohydrolase activity were mixed together, the solution thereof was exchanged with 50 mM Tris-HCl buffer (a pH of 8.0), followed by concentration, thereby obtaining a purified enzyme sample liquid having a final concentration of about 1 mg/mL.
(131) <8> Measurement of Cellobiohydrolase Activity having PSA as Substrate (PSA Hydrolytic Activity)
(132) For measuring the cellobiohydrolase activity, phosphoric acid-swollen Avicel (PSA) was used as a substrate. PSA was prepared in a manner in which Avicel powder (fine crystalline cellulose powder, manufactured by Merck & Co., Inc) was dissolved in a phosphoric acid solution; sterilized distilled water was then added thereto to precipitate the crystals, and the crystals were washed until the pH thereof became equal to or greater than 5.0. All of the PSA used in the following experiment was prepared in this manner.
(133) A reaction solution for measuring the PSA hydrolytic activity was composed of a PSA solution having a final concentration of 0.5%, 50 mM acetate buffer (a pH of 5.5), and the appropriately diluted crude enzyme sample liquid (prepared in the section <7>). The reaction solution was reacted for 20 minutes at a temperature of 50° C. to 90° C. After the reaction ended, the amount of reducing ends generated by hydrolysis was measured with a spectrophotometer (540 nm) by using 3,5-dinitrosalicylic acid reagent (DNSA), and the amount of reducing sugar was calculated by using a calibration curve created using glucose. Thereafter, from the difference in the amount between a control section and an experimental section, the amount of reducing sugar generated by hydrolysis by the enzyme was calculated. The maximum amount of reducing sugar was regarded as being 100, and the amount of reducing sugar at each temperature was plotted as a relative value.
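The calculation described above (glucose calibration curve, subtraction of the control section, and scaling of the maximum to 100) can be sketched as follows. This Python example is illustrative only: it assumes a linear calibration curve fitted by ordinary least squares, which the text does not specify, and the function names are hypothetical. Any absorbance values supplied to it would be the user's own measurements; no measured data from the present examples are reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def reducing_sugar(a540, slope, intercept):
    """Convert absorbance at 540 nm into a glucose-equivalent amount."""
    return (a540 - intercept) / slope


def relative_activity(sample_a540, control_a540, slope, intercept):
    """Enzyme-generated reducing sugar per temperature, with the maximum set to 100.

    sample_a540 and control_a540 map reaction temperature to absorbance
    for the experimental section and the control section, respectively.
    """
    net = {
        temp: reducing_sugar(sample_a540[temp], slope, intercept)
        - reducing_sugar(control_a540[temp], slope, intercept)
        for temp in sample_a540
    }
    top = max(net.values())
    return {temp: 100.0 * value / top for temp, value in net.items()}
```

The calibration is fitted once from the glucose standards (fit_line), and the same slope and intercept are then applied to both the experimental and control sections so that only the difference attributable to the enzyme is scaled.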
(134)
(135) In the DE method that artificially causes mutation, a large number of loss-of-function variants are generated, hence functional assay becomes useless in many cases. However, all of the large number of variants cloned by the natural variant obtaining method according to the present invention had enzyme activity, and the cellobiohydrolase AR19G-166RASFS exhibiting heat stability of equal to or higher than 15° C. was obtained from the variant library constituted of a relatively small number of clones.
(136) <9> Examination on Enzyme Characteristics
(137) The enzyme protein of the variant AR19G-166RASFS, which was formed by substitution of three amino acids (P88S/L230F/F414S) in AR19G-166RA and was confirmed to have the highest optimal temperature T.sub.opt in the PSA hydrolytic activity assay, was purified, and its enzyme characteristics, namely the substrate specificity, pH characteristics, and temperature characteristics (optimal temperature and heat stability), were measured in detail.
(138) (Substrate Specificity)
(139) The substrate specificity was measured by using Avicel powder, carboxymethyl cellulose (CMC, manufactured by Sigma-Aldrich Co. LLC.), xylan (from Beechwood, manufactured by Sigma-Aldrich Co. LLC.), Lichenan (manufactured by MP Biomedicals LLC.), and Laminarin (from Laminaria digitata, manufactured by Sigma-Aldrich Co. LLC.). The reaction solution was composed of 100 μL of a 1% aqueous solution of each substrate, 50 μL of 200 mM acetate buffer (a pH of 5.5), 40 μL of purified water, and 10 μL of the purified enzyme sample liquid. The reaction solution was reacted for 20 minutes at 70° C. In the same manner as in the section <8>, the amount of reducing sugar generated by the hydrolysis by the enzyme was calculated, and specific activity (U/mg) was calculated. Each measurement was performed three times independently, and the average and standard error were calculated.
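For reference, the final concentrations in the reaction solution described above follow from simple dilution arithmetic. The Python sketch below is illustrative only; it assumes that the volumes are in microliters and that the purified enzyme sample liquid is at about 1 mg/mL as stated in the section <7>. The variable and function names are hypothetical.

```python
def final_concentration(stock_concentration, added_volume, total_volume):
    """Concentration after dilution; volumes must share the same unit."""
    return stock_concentration * added_volume / total_volume


# Component volumes in microliters (assumed unit).
volumes_ul = {"substrate": 100.0, "buffer": 50.0, "water": 40.0, "enzyme": 10.0}
total_ul = sum(volumes_ul.values())

# 1% substrate stock and 200 mM acetate buffer stock, as given in the text.
substrate_percent = final_concentration(1.0, volumes_ul["substrate"], total_ul)
buffer_mm = final_concentration(200.0, volumes_ul["buffer"], total_ul)

# Protein mass per reaction, assuming an enzyme stock of about 1 mg/mL.
enzyme_mg = 1.0 * volumes_ul["enzyme"] / 1000.0

print(total_ul, substrate_percent, buffer_mm, enzyme_mg)
```

Under these assumptions the 200 μL reaction contains 0.5% substrate and 50 mM acetate buffer, which matches the composition used for the crude enzyme assay in the section <8>, and about 0.01 mg of enzyme protein.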
(140)
(141) (Temperature Dependency)
(142) The temperature dependency of the PSA hydrolytic activity of AR19G-166RASFS was investigated. For measuring the PSA hydrolytic activity, a reaction solution composed of 100 μL of a 1% PSA solution, 50 μL of an acetate buffer (a pH of 5.5), 40 μL of purified water, and 10 μL of the purified enzyme sample liquid was used, and the reaction solution was reacted for 20 minutes at a temperature of 50° C., 60° C., 65° C., 70° C., 75° C., 80° C., 85° C., 90° C., 95° C., or 99° C. The amount of reducing sugar generated by the hydrolysis reaction of the enzyme was calculated. Moreover, the enzyme activity by which 1 μmol of reducing sugar is generated per minute was regarded as being 1 U, and the value obtained by dividing the enzyme activity by the protein amount was taken as specific activity (U/mg). Furthermore, 0.5 mM calcium ions or 0.5 mM manganese ions were added to the reaction solution, and the PSA hydrolytic activity was measured in the same manner as described above.
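The definition of the unit and of the specific activity given above can be written as a short calculation. The following Python sketch is illustrative only; the function name and the numerical inputs in the example call are hypothetical and are not measured values from the present examples.

```python
def specific_activity(sugar_umol, reaction_min, protein_mg):
    """Specific activity in U/mg, where 1 U releases 1 umol of reducing sugar per minute."""
    if reaction_min <= 0 or protein_mg <= 0:
        raise ValueError("reaction time and protein amount must be positive")
    units = sugar_umol / reaction_min  # enzyme activity in U
    return units / protein_mg          # U per mg of protein


# Hypothetical example: 0.4 umol of reducing sugar released in a 20-minute
# reaction containing 0.01 mg of enzyme protein.
example = specific_activity(sugar_umol=0.4, reaction_min=20.0, protein_mg=0.01)
print(round(example, 3))
```

The reaction time of 20 minutes corresponds to the condition in the text; the sugar amount and the protein amount must be taken from the actual measurement.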
(143)
(144) (pH Dependency)
(145) The pH dependency of the PSA hydrolytic activity of AR19G-166RASFS was investigated. For measuring the PSA hydrolytic activity, 0.5% by mass of PSA was used as a substrate; the pH of the respective reaction liquids was adjusted by using a McIlvaine buffer (a pH of 3 to 8); and the hydrolytic activity was measured at 50° C., 70° C., or 90° C. at each pH level.
(146) As shown in
(147) (Measurement of Heat Stability Using PSA Hydrolytic Activity)
(148) As an index relating to the heat stability of proteins, a thermal denaturation temperature or a melting temperature (T.sub.m) is frequently used. A pre-incubation temperature, at which the enzyme activity is reduced to 50% of the enzyme activity of an untreated section due to preheating (pre-incubation) performed for a certain period of time, is almost the same as the melting temperature T.sub.m of the protein, and can be determined by measuring the enzyme activity. In this manner, the melting temperature T.sub.m of AR19G-166RA and AR19G-166RASFS was determined.
(149) Specifically, by performing preheating for 40 minutes under a substrate-free condition and by using a temperature T.sub.50, at which the PSA hydrolytic activity was reduced by 50%, as an index, the heat stability of the variant AR19G-166RASFS was investigated. The respective data were subjected to approximation by using a logistic function, and the temperature at which the relative activity level became 50% in the approximated curve was taken as T.sub.50.
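One way of obtaining T.sub.50 from such data can be sketched as follows. The text states only that a logistic function was used for approximation and does not specify the fitting procedure; the Python example below therefore assumes a two-parameter logistic model, relative activity = 100/(1+exp(k(T−T.sub.50))), and fits it by linear regression on logit-transformed values, which is a simplification of a nonlinear least-squares fit. The function name fit_t50 is hypothetical.

```python
import math


def fit_t50(temps, relative_activity):
    """Estimate (T50, k) for A(T) = 100 / (1 + exp(k * (T - T50))).

    The model is linearized as ln(100 / A - 1) = k * T - k * T50 and fitted
    by ordinary least squares. Points with A <= 0 or A >= 100 cannot be
    logit-transformed and are skipped.
    """
    points = [
        (t, math.log(100.0 / a - 1.0))
        for t, a in zip(temps, relative_activity)
        if 0.0 < a < 100.0
    ]
    if len(points) < 2:
        raise ValueError("at least two points strictly between 0 and 100 are required")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    stz = sum((t - mean_t) * (z - mean_z) for t, z in points)
    k = stz / stt
    intercept = mean_z - k * mean_t
    return -intercept / k, k
```

Because the logit transform amplifies noise near 0% and 100% relative activity, a nonlinear fit on the untransformed data may be preferable for real measurements; the linearized form is used here only because it requires no external library.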
(150)
(151) (Measurement of Heat Stability by DSF Method)
(152) Differential scanning fluorimetry (DSF) is one of the methods for measuring thermal denaturation of proteins by using a fluorescent dye and a real-time PCR device, and can be applied to various proteins. Fluorescent dyes used for DSF, such as SYPRO Orange, fluoresce under a non-polar condition in which the dyes bind to a hydrophobic site, and are inhibited from fluorescing under a polar condition in which the dyes have been dissolved in water. Generally, the folded structure of proteins is opened at a thermal denaturation temperature thereof, and hydrophobic sites in the structure are exposed to the protein surface. When SYPRO Orange binds to the exposed hydrophobic sites, due to excitation light having a wavelength of 470 nm to 480 nm, the dye produces intense fluorescent light having a peak around a wavelength of 595 nm. If the temperature of the protein solution is increased stepwise at a certain interval and the fluorescence intensity is measured at each step, the melting temperature (=change point of fluorescence intensity) can be calculated.
(153) Specifically, 2 μL of SYPRO Orange (manufactured by Life Technologies) diluted 100-fold, 1 μL of enzyme proteins at a concentration of 1 mg/mL, 5 μL of 200 mM acetate buffer (a pH of 5.5), and 12 μL of purified water were added to wells of a 96-well PCR plate (Multiplate 96 Well PCR Plate MLL-9651, manufactured by Bio-Rad Laboratories, Inc.), and the volume of each well was adjusted to be 20 μL. The PCR plate was sealed with Optical flat 8-cap strips (manufactured by Bio-Rad Laboratories, Inc.), and by using a real-time PCR device (CFX96 Touch Real-Time PCR System, manufactured by Bio-Rad Laboratories, Inc.), the temperature of the wells was increased from 30° C. to 100° C. in increments of 0.5° C. After 30 seconds elapsed from when the temperature reached a target temperature, the fluorescence intensity of the respective wells was simultaneously measured. The SYPRO Orange was excited with a light emitting diode (LED) having a wavelength band of 450 nm to 490 nm, and the light radiated from the SYPRO Orange was caused to pass through a band-pass filter within a range of 560 nm to 580 nm. The fluorescence intensity was measured using a CCD camera, and the change in the fluorescence intensity was plotted in the form of a function of temperature. The thermal denaturation temperature (melting temperature; Tm value) was defined as a maximum value of d(Fluorescence)/dt shown on the ordinate of the graph of the first-order differentiation (lower side of
(154)
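The determination of the Tm value as the temperature at which the first-order derivative of the fluorescence intensity is at its maximum can be sketched as follows. This Python example is illustrative only: it uses a central-difference derivative without smoothing, which may differ from the processing performed by the instrument software, and the melting curve in the example is synthetic and noise-free rather than measured. The function name melting_temperature is hypothetical.

```python
import math


def melting_temperature(temps, fluorescence):
    """Return the temperature at which the central-difference dF/dT is largest."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired readings")
    best_temp, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


# Synthetic curve (hypothetical data): readings from 30 to 100 degrees C in
# 0.5-degree steps, with a sigmoidal transition centered at 93.0 degrees C.
temps = [30.0 + 0.5 * i for i in range(141)]
signal = [1.0 / (1.0 + math.exp(-(t - 93.0))) for t in temps]
tm = melting_temperature(temps, signal)
print(tm)
```

For measured curves, noise in the fluorescence signal can shift the position of the maximum, so smoothing before differentiation is usually advisable.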
(155) TABLE-US-00003 TABLE 3
Enzyme                              Melting temperature by DSF (° C., mean ± se)
AR19G-166RA                         80.5 ± 0.0 (n = 3)
AR19G-166RA + 3 mM Ca.sup.2+        85.7 ± 0.2 (n = 3)
AR19G-166RASFS                      93.0 ± 0.0 (n = 3)
AR19G-166RASFS + 3 mM Ca.sup.2+     96.0 ± 0.0 (n = 3)
(156) In the DSF fluorescence intensity curve of the enzyme proteins encoded by AR19G-166RASFS, peaks were observed around 95° C., and the thermal denaturation temperature was Tm=93.0±0.0° C. (n=3) (Table 3). The value of the thermal denaturation temperature was close to the optimal temperature T.sub.opt>90° C. of the enzyme determined from the PSA hydrolytic activity or the enzyme activity halving temperature T.sub.50=91.2° C. and 92.0° C.
(157) In contrast, in the DSF fluorescence intensity curve of AR19G-166RA, peaks were observed around 83° C., and the T.sub.m value was 80.5±0.0° C. (n=3). This showed that the heat stability of AR19G-166RASFS was improved by 12.5° C. as compared with the heat stability of AR19G-166RA.
(158) Divalent metal ions are generally known to stabilize the structure of a protein by binding to the protein and thus improve the thermostability. In AR19G-166RA, due to the addition of 3 mM Ca.sup.2+, the thermal denaturation temperature Tm calculated by DSF increased by 5.2° C. and became 85.7±0.2° C. (n=3). In AR19G-166RASFS, due to the addition of 3 mM Ca.sup.2+, Tm increased by 3.0° C. and became 96.0±0.0° C. (n=3). The thermal denaturation temperature of 96.0° C. is the highest value for the cellobiohydrolases known so far.
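The differences in the thermal denaturation temperature can be recomputed directly from the mean values listed in Table 3. The following Python sketch is illustrative only; the dictionary layout and function names are hypothetical, and only the four mean Tm values are taken from Table 3.

```python
# Mean Tm values (degrees C) from Table 3, keyed by (enzyme, calcium added).
TM_C = {
    ("AR19G-166RA", False): 80.5,
    ("AR19G-166RA", True): 85.7,
    ("AR19G-166RASFS", False): 93.0,
    ("AR19G-166RASFS", True): 96.0,
}


def calcium_shift(enzyme):
    """Increase in Tm caused by adding 3 mM calcium ions to one enzyme."""
    return round(TM_C[(enzyme, True)] - TM_C[(enzyme, False)], 1)


def variant_shift(with_calcium=False):
    """Difference in Tm between AR19G-166RASFS and AR19G-166RA."""
    return round(
        TM_C[("AR19G-166RASFS", with_calcium)] - TM_C[("AR19G-166RA", with_calcium)], 1
    )


print(calcium_shift("AR19G-166RA"), calcium_shift("AR19G-166RASFS"), variant_shift())
```

With the Table 3 values, the calcium-induced increase is 5.2° C. for AR19G-166RA and 3.0° C. for AR19G-166RASFS, and the difference between the two enzymes in the absence of added calcium is 12.5° C.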
(159) The recalcitrant crystalline cellulose is hydrolyzed into glucose as a monosaccharide mainly by cooperation of three kinds of glycoside hydrolases. An endoglucanase (cellulase or endo-1,4-β-D-glucanase, EC 3.2.1.4) randomly cleaves the β-1,4-glycosidic bond of a glucan chain, and generates oligosaccharides that are diverse in length. A cellobiohydrolase as an exo-glucanase (1,4-β-cellobiosidase or cellobiohydrolase, EC 3.2.1.91) continuously cleaves a glucan chain from the end thereof that is either a reducing terminal (CBH belonging to a GH7 family, for example, TrCel7A or CBH I) or a non-reducing terminal (CBH belonging to a GH6 family, for example, TrCel6A or CBH II), and generates cellobiose. A β-glucosidase (β-1,4-glucosidase, EC 3.2.1.21) hydrolyzes cellobiose and generates glucose.
(160) To summarize, the above results showed that the variant AR19G-166RASFS formed by substitution of three amino acid residues in AR19G-166RA is a variant which was markedly improved in terms of both the optimal temperature and the thermal denaturation temperature. The results also showed that the enzyme activity halving temperature T.sub.50, at which the PSA hydrolytic activity is reduced by 50%, and the thermal denaturation temperature Tm determined by DSF of the variant were higher than those of AR19G-166RA by 10.5° C. and 12.5° C., respectively. That is, T.sub.50 and T.sub.m of AR19G-166RASFS are high, which shows that the protein is a variant that has been markedly improved in terms of heat stability.
(161) A large number of super thermostable cellulase enzymes that function at a temperature of equal to or higher than 80° C. have been isolated from microorganisms surviving in an extreme environment such as a hydrothermal vent. However, regarding the cellobiohydrolase that exerts the biggest influence on the hydrolysis efficiency, an enzyme having sufficient thermostability has not been obtained. Accordingly, up to now, a super thermostable enzyme mixture liquid that degrades and hydrolyzes lignocellulose biomass at a high temperature that is equal to or higher than 80° C. has not been prepared. If the cellobiohydrolase AR19G-166RASFS having an extremely high degree of thermostability is used, it is possible to prepare an enzyme mixture liquid that enables lignocelluloses to be hydrolyzed by enzymes at a high temperature.