PROTEIN COMPLEX BASED ON DNA ENZYMES OF E FAMILY OF ESCHERICHIA COLI AND APPLICATION THEREOF IN ARTIFICIAL PROTEIN SCAFFOLDS
20230220439 · 2023-07-13
Inventors
Cpc classification
C12P21/06
CHEMISTRY; METALLURGY
C12P21/02
CHEMISTRY; METALLURGY
International classification
C12P21/06
CHEMISTRY; METALLURGY
Abstract
A protein complex based on DNA enzymes of an E family of Escherichia coli and an application thereof in artificial protein scaffolds are provided. The protein complex includes one or more of interaction pairs formed by a CL2 protein and an Im2 protein, a CL7 protein and an Im7 protein, a CL8 protein and an Im8 protein, or a CL9 protein and an Im9 protein. By protein engineering of a carboxyl terminus DNase domain of the DNA enzymes CE2, CE7, CE8 and CE9, mutants that lose DNA enzyme activity but still retain the ultra-high affinity with the corresponding Im protein are obtained, and protein interaction pairs CL2/Im2, CL7/Im7, CL8/Im8 and CL9/Im9 are constructed. These protein interaction pairs have properties of heat resistance, high affinity, high specificity, small molecular weight, fast assembly speed, etc. Based on this, an artificial protein scaffold is constructed for the construction of artificial multienzyme complexes.
Claims
1. A protein complex based on DNA enzymes of an E family of Escherichia coli, comprising one or more selected from the group consisting of an interaction pair formed by a CL2 protein and an immunity (Im2) protein, an interaction pair formed by a CL7 protein and an Im7 protein, an interaction pair formed by a CL8 protein and an Im8 protein, and an interaction pair formed by a CL9 protein and an Im9 protein; wherein an amino acid sequence of the CL2 protein is shown in SEQ ID NO: 2; wherein an amino acid sequence of the CL7 protein is shown in SEQ ID NO: 5; wherein an amino acid sequence of the CL8 protein is shown in SEQ ID NO: 8; wherein an amino acid sequence of the CL9 protein is shown in SEQ ID NO: 11; wherein an amino acid sequence of the Im2 protein is shown in SEQ ID NO: 3; wherein an amino acid sequence of the Im7 protein is shown in SEQ ID NO: 6; wherein an amino acid sequence of the Im8 protein is shown in SEQ ID NO: 9; and wherein an amino acid sequence of the Im9 protein is shown in SEQ ID NO: 12.
2. An application method of the protein complex according to claim 1, comprising: preparing a multienzyme complex system by using the protein complex, wherein the multienzyme complex system comprises a scaffold protein, the CL2 protein, the CL7 protein, the CL8 protein, the CL9 protein, and target proteins; wherein the scaffold protein comprises a carbohydrate-binding module 3a (CBM3a), the Im2 protein, the Im7 protein, the Im8 protein and the Im9 protein in series through linkers; and wherein each of the CBM3a, the Im2 protein, the Im7 protein, the Im8 protein and the Im9 protein on the scaffold protein is capable of being connected to a corresponding one of the CL2 protein, the CL7 protein, the CL8 protein, the CL9 protein to form an interaction pair, and the target proteins are respectively connected to the CL proteins to realize an orderly assembly of the target proteins on the scaffold protein.
3. The application method according to claim 2, wherein the CBM3a is derived from Clostridium thermocellum.
4. An application method of a multienzyme complex system according to claim 1, comprising: hydrolyzing a cellulose substrate by using the multienzyme complex system; wherein the multienzyme complex system comprises four CL-tagged cellulases and a scaffold protein, the scaffold protein comprises a CBM3a, the Im2 protein, the Im7 protein, the Im8 protein and the Im9 protein in series through linkers, and the four CL-tagged cellulases each are connected to a corresponding one of the Im2 protein, the Im7 protein, the Im8 protein and the Im9 protein on the scaffold protein to form protein interaction pairs; and wherein the four CL-tagged cellulases comprise an exoglucanase connected with the CL2 protein, an endoglucanase mutant connected with the CL7 protein, an endoglucanase connected with the CL8 protein, and a beta-glucosidase connected with the CL9 protein.
Description
DETAILED DESCRIPTION OF EMBODIMENTS
[0045] Technical solutions of the disclosure will be described clearly and completely in combination with embodiments of the disclosure. Obviously, the described embodiments are only some of the embodiments of the disclosure, not all of them. Based on the embodiments of the disclosure, all other embodiments obtained by those skilled in the art without creative labor are within the protection scope of the disclosure.
Embodiment 1 Expression and Purification of CL Proteins and Immunity (Im) Proteins
[0046] 1. Based on amino acid sequence alignment with the CL7 protein and on the high similarity among the four protein sequences, the carboxyl terminus DNase domains of the DNA enzymes colicin E2 DNase (CE2), CE7, CE8 and CE9 of the CE family (i.e., the E family of colicins) are subjected to protein engineering to obtain mutants CL2, CL7, CL8 and CL9 that lose DNA enzyme activity but still retain the ultra-high affinity with the corresponding Im protein. The sequences of the four CL-Im groups and the four wild-type CE proteins are as follows:
[0047] (1) an amino acid sequence of the CE2 protein as shown in SEQ ID NO: 1, an amino acid sequence of the CL2 protein as shown in SEQ ID NO: 2, and an amino acid sequence of the Im2 protein as shown in SEQ ID NO: 3;
[0048] (2) an amino acid sequence of the CE7 protein as shown in SEQ ID NO: 4, an amino acid sequence of the CL7 protein as shown in SEQ ID NO: 5, and an amino acid sequence of the Im7 protein as shown in SEQ ID NO: 6;
[0049] (3) an amino acid sequence of the CE8 protein as shown in SEQ ID NO: 7, an amino acid sequence of the CL8 protein as shown in SEQ ID NO: 8, and an amino acid sequence of the Im8 protein as shown in SEQ ID NO: 9; and
[0050] (4) an amino acid sequence of the CE9 protein as shown in SEQ ID NO: 10, an amino acid sequence of the CL9 protein as shown in SEQ ID NO: 11, and an amino acid sequence of the Im9 protein as shown in SEQ ID NO: 12.
[0051] 2. Genes of all mutants and Im proteins are synthesized by Wuhan GeneCreate Biological Engineering Co., Ltd. of China, and coding genes of the four CL proteins and the four Im proteins are respectively constructed into the pET23a vector by conventional methods in the related art. Each recombinant plasmid is transformed into Escherichia coli BL21(DE3), and the protein is induced for expression and purified. Specifically, the expression strain is inoculated into liquid lysogeny broth (LB) medium supplemented with 100 micrograms per milliliter (μg/mL) ampicillin, and cultured overnight at 37° C. Then, the seed culture is inoculated into terrific broth (TB) medium at a volume ratio of 1:100, and when the optical density at 600 nanometers (OD600) reaches 0.8 to 1.0, isopropyl-β-D-thiogalactoside (IPTG) is added to a final concentration of 1 millimole per liter (mM) and expression is induced at 18° C. and 220 revolutions per minute (rpm) for 16-18 hours. The cells are collected, fully resuspended in tris-buffered saline (TBS) buffer (pH 7.4), disrupted 5 times with a high-pressure cell disruptor, and then centrifuged at 10000 rpm for 30 minutes, and the supernatant is transferred to the pretreated gravity column.
[0052] The pretreatment method is as follows. 4 milliliters (mL) of nickel beads are loaded into a 60 mL protein purification gravity column and equilibrated with 3 column volumes of lysis buffer. The supernatant is transferred to the gravity column, which is sealed with parafilm, incubated at 4° C. for 1 hour on a silent mixer, and then washed 3 times with 3 column volumes of TBS containing 10 mM imidazole. Subsequently, the column is eluted successively with TBS containing 30 mM, 50 mM, and 200 mM imidazole, three times at each concentration with 3 column volumes each time, and the eluates are collected separately. The fractions eluted at each imidazole concentration are analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). The results are shown in
Embodiment 2 Native-Polyacrylamide Gel Electrophoresis (PAGE)
[0053] 1. Preparation of Native Polyacrylamide Gel of Native-PAGE
[0054] To verify the binding specificity of the CL-Im protein interaction pairs, a 30% acrylamide (37.5:1) and glycerol electrophoresis system is used, and the specific formula is shown in Table 1.
TABLE 1 Formula of native polyacrylamide gel of Native-PAGE

Separation gel                          Concentrated gel
Component               Volume (mL)     Component               Volume (mL)
37.5:1 acrylamide       1.6             37.5:1 acrylamide       0.83
1.5M Tris-HCl pH 8.8    1               1M Tris-HCl pH 6.8      0.63
60% glycerol            0.587           60% glycerol            -
10% APS                 0.015           10% APS                 0.050
TEMED                   0.004           TEMED                   0.005
ddH.sub.2O              0.794           ddH.sub.2O              3.45
Total volume            4               Total volume            5
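As an arithmetic check of Table 1, the following minimal Python sketch computes the final acrylamide percentage of each gel and rescales a recipe to another volume. It assumes that the 37.5:1 acrylamide stock is the 30% solution mentioned in paragraph [0054] and treats the blank glycerol entry of the concentrated gel as 0 mL; the function and variable names are illustrative and are not part of the disclosure.

```python
# Volumes (mL) copied from Table 1. The blank glycerol entry is taken as 0.
SEPARATION_GEL = {
    "acrylamide": 1.6,
    "1.5 M Tris-HCl pH 8.8": 1.0,
    "60% glycerol": 0.587,
    "10% APS": 0.015,
    "TEMED": 0.004,
    "ddH2O": 0.794,
}
CONCENTRATED_GEL = {
    "acrylamide": 0.83,
    "1 M Tris-HCl pH 6.8": 0.63,
    "60% glycerol": 0.0,
    "10% APS": 0.050,
    "TEMED": 0.005,
    "ddH2O": 3.45,
}

# Assumption: stock strength taken from paragraph [0054].
STOCK_ACRYLAMIDE_PERCENT = 30.0


def total_volume(recipe):
    """Sum of all component volumes in mL."""
    return sum(recipe.values())


def final_acrylamide_percent(recipe):
    """Final acrylamide percentage after dilution of the stock."""
    return STOCK_ACRYLAMIDE_PERCENT * recipe["acrylamide"] / total_volume(recipe)


def scale_recipe(recipe, target_ml):
    """Rescale every component so that the total equals target_ml."""
    factor = target_ml / total_volume(recipe)
    return {name: round(vol * factor, 4) for name, vol in recipe.items()}


if __name__ == "__main__":
    for label, recipe in (("separation", SEPARATION_GEL), ("concentrated", CONCENTRATED_GEL)):
        print(label, round(total_volume(recipe), 3), "mL,",
              round(final_acrylamide_percent(recipe), 2), "% acrylamide")
```

Under these assumptions the separation gel works out to 12% acrylamide and the concentrated gel to about 5%. The listed concentrated-gel components sum to 4.965 mL, which Table 1 rounds to 5 mL.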
[0055] 2. Native-PAGE
[0056] In order to verify the specificity of the binding between the CL proteins and the Im proteins, in this embodiment, each of the four CL proteins is mixed with each of the four Im proteins to obtain 16 combinations, which are incubated at room temperature for 30 minutes and then subjected to Native-PAGE, with a voltage of 120 Volts (V) for the upper (concentrated) gel and 180 V for the lower (separation) gel. When the bromophenol blue indicator band reaches the bottom of the gel, electrophoresis is terminated, and then staining is performed. The electrophoretic results are shown in
[0057] To further verify the thermal stability of the binding between the CL and Im proteins, the 16 CL-Im mixtures are incubated at room temperature for 30 minutes and then heated in a 100° C. water bath for 1 hour, and Native-PAGE is performed on the heat-treated samples. The test results are shown in
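The 16-combination layout used in both Native-PAGE experiments can be enumerated as in the following minimal Python sketch. It only lists the mixtures and flags the four cognate pairs (same numeric suffix); the expectation that a complex forms only for cognate pairs follows from the specificity asserted in the disclosure and is not a result computed here. All names are illustrative.

```python
from itertools import product

CL_PROTEINS = ["CL2", "CL7", "CL8", "CL9"]
IM_PROTEINS = ["Im2", "Im7", "Im8", "Im9"]


def is_cognate(cl, im):
    """A pair is cognate when both names carry the same numeric suffix, e.g. CL7/Im7."""
    return cl[2:] == im[2:]


def pairing_matrix():
    """Return all CL x Im mixtures as (CL, Im, cognate?) tuples."""
    return [(cl, im, is_cognate(cl, im)) for cl, im in product(CL_PROTEINS, IM_PROTEINS)]


if __name__ == "__main__":
    for cl, im, cognate in pairing_matrix():
        label = "cognate pair" if cognate else "non-cognate control"
        print(f"{cl} + {im}: {label}")
```

This gives 4 cognate mixtures and 12 non-cognate mixtures, the latter serving as cross-reactivity controls on the same gel.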
Embodiment 3 Construction of Scaffold Protein Scaf-CIQ and Verification of its Thermal Stability
[0058] 1. In this embodiment, the corresponding scaffold protein is constructed based on the interaction pairs formed by the combination of the CL and Im proteins, as follows.
[0059] Scaf-CIQ, the scaffold protein of the CIQ system, is a fusion protein in which a carbohydrate-binding module 3a (CBM3a, also referred to as a cellulose-binding module) derived from Clostridium thermocellum (living environment of 60-90° C.) is expressed in series with Im2, Im7, Im8 and Im9 through linkers. The presence of CBM3a endows the scaffold protein Scaf-CIQ with the ability to bind to cellulose matrices such as microcrystalline cellulose (Avicel) and phosphoric acid swollen cellulose (PASC), thus achieving the purpose of targeting cellulose matrices and immobilizing enzymes. The principle of the scaffold protein Scaf-CIQ is shown in
[0060] The construction process of the scaffold protein is as follows.
[0061] Corresponding gene fragments of the four kinds of Im proteins and the CBM3a are obtained by polymerase chain reaction (PCR) amplification and integrated by overlapping PCR to obtain gene sequences containing 1-4 Im modules, and recombinant vectors are constructed. Specifically, the vector skeleton selected for the scaffold protein is the pET23a vector subjected to double enzyme digestion by NdeI and XhoI. When primers are designed, the forward primer of the CBM3a gene contains 15 nucleotides that are homologous with a flanking sequence of the NdeI site of the pET23a vector, and the coding sequences of Im2, Im7, Im8 and Im9 each carry, according to the number of Im modules in the construct, a recombination sequence that is homologous with 15 bases of a flanking sequence of the XhoI site of the pET23a vector.
[0062] Specifically, the CBM3a and the Im2 protein are connected by a first linker (also referred to as Linker 1) with an amino acid sequence of TTTTTTTATTASSTSTTSTSATSG shown in SEQ ID NO: 13 (Hiroyuki Inoue et al., “Contribution of a family 1 carbohydrate-binding module in thermostable glycoside hydrolase 10 xylanase from Talaromyces cellulolyticus toward synergistic enzymatic hydrolysis of lignocellulose”, Biotechnology for Biofuels, 2015, pages 1-13, Vol. 8, No. 77.), the Im2 protein and the Im7 protein are connected by a second linker (also referred to as Linker 2) with an amino acid sequence of NTLTIGGGGGGGGTNPGT shown in SEQ ID NO: 14 (Yihan Liu et al., “Studies on properties of the xylan-binding domain and linker sequence of xylanase XynG1-1 from Paenibacillus campinasensis G1-1”, J Ind Microbiol Biotechnol, 2015, pages 1591-1599, Vol. 42, No. 12.), the Im7 protein and the Im8 protein are connected by a third linker (also referred to as Linker 3) with an amino acid sequence of GASGSGTTTTTTTTTTTTTTTGGGTDP shown in SEQ ID NO: 15 (Youzhi Miao et al., “Effect of CBM1 and linker region on enzymatic properties of a novel thermostable dimeric GH10 xylanase (Xyn10A) from filamentous fungus Aspergillus fumigatus Z5”, AMB Express, 2018, pages 1-10, Vol. 8, No. 44.), and the Im8 protein and the Im9 protein are connected by a fourth linker (also referred to as Linker 4) with an amino acid sequence of SSWEGISTDDGIVHSVRISGNKTLS shown in SEQ ID NO: 16 (Haruhiko Sakuraba et al., “Structure of L-aspartate oxidase from the hyperthermophilic archaeon Sulfolobus tokodaii”, Biochim Biophys Acta, 2008, pages 563-571, Vol. 1784, No. 3.).
[0063] The gene fragments of the four kinds of Im proteins with different linker sequences and of CBM3a are amplified by PCR. CBM3a and Im2 (or Im2+Im7, Im2+Im7+Im8, or Im2+Im7+Im8+Im9) are mixed with the linearized pET23a vector at a molar ratio of 3:1 to obtain a mixed system, and 0.5 U of T5 exonuclease and the corresponding New England Biolabs (NEB) Buffer™ 4 are added to 5 μL of the mixed system. After Escherichia coli DH5α competent cells are added, the mixture is allowed to stand in an ice-water bath for 5 minutes, and the homology arm regions of the gene fragments produce 3′-protruding cohesive ends. After the gene fragments are transferred into the competent cells, circular vectors containing the complete gene sequences of CBM3a and 1-4 Im modules are obtained by homologous recombination in Escherichia coli. The scaffold protein containing only the Im2 module is named Scaf-CIM, the scaffold protein containing both Im2 and Im7 modules is named Scaf-CIB, the scaffold protein containing Im2, Im7 and Im8 modules is named Scaf-CIT, and the scaffold protein containing Im2, Im7, Im8 and Im9 modules is named Scaf-CIQ.
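The modular architecture of the four scaffold variants can be summarized as in the following minimal Python sketch, which maps the number of Im modules to the scaffold name and to the module and linker order from the amino terminus to the carboxyl terminus. The linker sequences are copied from paragraph [0062]; the data layout and the function name scaffold_layout are illustrative assumptions, not part of the disclosure.

```python
# Junction -> (linker name, amino acid sequence), as listed in paragraph [0062].
LINKERS = {
    ("CBM3a", "Im2"): ("Linker 1", "TTTTTTTATTASSTSTTSTSATSG"),    # SEQ ID NO: 13
    ("Im2", "Im7"): ("Linker 2", "NTLTIGGGGGGGGTNPGT"),             # SEQ ID NO: 14
    ("Im7", "Im8"): ("Linker 3", "GASGSGTTTTTTTTTTTTTTTGGGTDP"),    # SEQ ID NO: 15
    ("Im8", "Im9"): ("Linker 4", "SSWEGISTDDGIVHSVRISGNKTLS"),      # SEQ ID NO: 16
}
IM_ORDER = ["Im2", "Im7", "Im8", "Im9"]
SCAFFOLD_NAMES = {1: "Scaf-CIM", 2: "Scaf-CIB", 3: "Scaf-CIT", 4: "Scaf-CIQ"}


def scaffold_layout(n_im):
    """Return (scaffold name, N-to-C module string) for a scaffold with n_im Im modules."""
    if n_im not in SCAFFOLD_NAMES:
        raise ValueError("n_im must be 1, 2, 3 or 4")
    modules = ["CBM3a"] + IM_ORDER[:n_im]
    parts = [modules[0]]
    for upstream, downstream in zip(modules, modules[1:]):
        linker_name, _sequence = LINKERS[(upstream, downstream)]
        parts.append(f"[{linker_name}]")
        parts.append(downstream)
    return SCAFFOLD_NAMES[n_im], "-".join(parts)


if __name__ == "__main__":
    for n in range(1, 5):
        print(*scaffold_layout(n))
```

For example, scaffold_layout(2) returns the name Scaf-CIB with the layout CBM3a-[Linker 1]-Im2-[Linker 2]-Im7.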
[0064] The recombinant vectors are transferred into Escherichia coli BL21 (DE3) for induction and expression, and then purified by Ni-NTA method, and the purified proteins are detected. The detection results are shown in
[0065] 2. To investigate the thermal stability of the scaffold protein Scaf-CIQ, the following methods are used.
[0066] The carboxyl terminus of the scaffold protein Scaf-CIQ is Im9, and the amino terminus is CBM3a. It is known that Im9 binds specifically to CL9, and that CBM3a binds to phosphoric acid swollen cellulose. Based on this, when exploring the thermal stability of the scaffold protein Scaf-CIQ, the phosphoric acid swollen cellulose is used as the immobilization matrix, and green fluorescence can be displayed on the phosphoric acid swollen cellulose matrix through the binding of Im9 to sfGFP fused with a CL9-tag at the amino terminus (CL9-sfGFP). According to the intensity of the green fluorescence, the integrity of the scaffold protein Scaf-CIQ can be judged, that is, whether the scaffold protein Scaf-CIQ retains the function of CBM3a at the amino terminus and of Im9 at the carboxyl terminus after high-temperature treatment.
[0067] In this embodiment, the scaffold protein Scaf-CIQ is incubated at 70° C., 75° C., 80° C. and 85° C. for 3 hours and 6 hours respectively and centrifuged at 12000 rpm for 5 minutes; the supernatant is mixed with the phosphoric acid swollen cellulose, excess CL9-sfGFP is added, and the mixture is incubated at 220 rpm and 28° C. for 1 hour. Then, the phosphoric acid swollen cellulose is washed with 2 M NaCl three times to remove free proteins, the samples are resuspended in 100 μL TBS buffer, and the fluorescence intensity of CL9-sfGFP in each sample is quantified by fluorescence spectrophotometer. The fluorescence intensity measured by combining untreated Scaf-CIQ with CL9-sfGFP and phosphoric acid swollen cellulose is defined as 100%, and the sample without Scaf-CIQ is used as a negative control. The detection results of relative fluorescence intensity are shown in
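The normalization described above can be expressed as in the following minimal Python sketch. The disclosure states only that the untreated sample is 100% and that a sample without Scaf-CIQ serves as the negative control; subtracting the negative-control reading as background is an assumption made here for illustration, and the readings in the usage example are hypothetical, not data from the disclosure.

```python
def relative_fluorescence(sample, untreated, negative=0.0):
    """Fluorescence of a heat-treated sample as a percentage of the untreated sample.

    Assumption: the negative-control reading (no Scaf-CIQ) is subtracted
    from both readings as background before taking the ratio.
    """
    span = untreated - negative
    if span <= 0:
        raise ValueError("untreated reading must exceed the negative control")
    return 100.0 * (sample - negative) / span


if __name__ == "__main__":
    # Hypothetical raw readings in arbitrary units.
    print(relative_fluorescence(sample=550.0, untreated=1050.0, negative=50.0))
```

With the hypothetical readings shown, the heat-treated sample retains 50% of the untreated signal.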
Embodiment 4 Assembly and Application of Multienzyme Complex MEC-CIQ
[0068] 1. Assembly of MEC-CIQ, a Multienzyme Complex Containing Four Cellulases (Also Referred to as Cellulose-Hydrolyzing Enzymes)
[0069] In this embodiment, the multienzyme complex assembly system MEC-CIQ is applied to the field of artificial cellulosomes. Specifically, the selected thermophilic cellulase genes include: (1) a gene of an exoglucanase Cel48S.sub.m3 obtained by DNA shuffling, (2) a gene of an endoglucanase mutant Cel8A* (K276R, G283P, S329G, S375T) derived from Clostridium thermocellum, (3) a gene of an endoglucanase GH5D derived from Caldicellulosiruptor bescii, and (4) a gene of a beta-glucosidase CoGH1A derived from Caldicellulosiruptor owensensis (for each cellulase, see Matthew A. Smith et al., “A diverse set of family 48 bacterial glycoside hydrolase cellulases created by structure-guided recombination”, FEBS Journal, 2012, pages 4453-4465, Vol. 279, No. 24; Michael Anbar et al., “Improved Thermostability of Clostridium thermocellum Endoglucanase Cel8A by Using Consensus-Guided Mutagenesis”, Applied and Environmental Microbiology, 2012, pages 3458-3464, Vol. 78, No. 9; Igor A. Dvortsova et al., “Carbohydrate binding module CBM28 of endoglucanase Cel5D from Caldicellulosiruptor bescii recognizes crystalline cellulose”, Int. J. Bio. Macromol, 2017, pages 305-311, Vol. 107; Xiaowei Peng et al., “A multifunctional thermophilic glycoside hydrolase from Caldicellulosiruptor owensensis with potential applications in production of biofuels and biochemicals”, Biotechnology for Biofuels, 2016, pages 1-13, Vol. 9, No. 98.).
[0070] The above four thermophilic cellulase genes are linked with four CL protein genes respectively to form CL-tagged thermophilic cellulases. The specific operation process is as follows.
[0071] With pET28a digested by the double enzymes NcoI and XhoI and recovered from agarose gel as the vector skeleton, the four thermophilic cellulase genes are regarded as segments A (A1, A2, A3 and A4), and the four CL protein genes are regarded as segments B (B1, B2, B3 and B4). A and B are connected by a Pro/Thr-rich linker sequence (TTVTTPQTS as shown in SEQ ID NO: 17). When designing primers, in the case where the CL-tag is at the N-terminus of the cellulase, the reverse primer of B and the forward primer of A contain an overlapping region with a length of 15 bases, the forward primer of B shares 15 identical bases with the end of the linear vector at the NcoI digestion site, and the reverse primer of A shares 15 identical bases with the end of the linear vector at the XhoI digestion site. In the case where the CL-tag is at the C-terminus of the cellulase, the reverse primer of A and the forward primer of B contain an overlapping region with a length of 15 bases, the forward primer of A shares 15 identical bases with the end of the linear vector at the NcoI digestion site, and the reverse primer of B shares 15 identical bases with the end of the linear vector at the XhoI digestion site. The PCR products thus obtained each have 15 bases identical to an end of the linear vector. Once the order of the fragments is determined, the PCR products A and B (e.g., A1 and B1 constitute gene A1-B1 or B1-A1) are transformed together with the linear vector as three fragments under the mediation of T5 exonuclease. After homologous recombination in Escherichia coli, genes A, B and the linear vector are assembled into a complete circular plasmid containing a CL-tagged thermophilic cellulase gene.
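The primer layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the fragment order is B-A for an N-terminal CL-tag and A-B for a C-terminal CL-tag, and the whole 15-base overlap at the A/B junction is placed on the reverse primer of the upstream fragment, whereas the disclosure only requires that the two junction primers overlap by 15 bases. The linker coding sequence is assumed to be already included in one of the fragments, and all function names are hypothetical.

```python
OVERLAP = 15  # homology length stated in paragraph [0071]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def fragment_order(cl_tag_position):
    """Order of segments from the NcoI end to the XhoI end (A = cellulase, B = CL gene)."""
    if cl_tag_position == "N":
        return ("B", "A")
    if cl_tag_position == "C":
        return ("A", "B")
    raise ValueError("cl_tag_position must be 'N' or 'C'")


def homology_tails(vector_ncoi_end, upstream, downstream, vector_xhoi_end, overlap=OVERLAP):
    """5' tails (written 5'->3') to prepend to the gene-specific primers.

    vector_ncoi_end: top-strand vector sequence ending at the NcoI cut.
    vector_xhoi_end: top-strand vector sequence starting at the XhoI cut.
    upstream / downstream: top-strand sequences of the two inserts in order.
    """
    return {
        # Forward primer of the upstream insert matches the vector end at NcoI.
        "upstream_forward": vector_ncoi_end[-overlap:].upper(),
        # Reverse primer of the upstream insert overlaps the start of the downstream insert.
        "upstream_reverse": revcomp(downstream[:overlap]),
        # Reverse primer of the downstream insert matches the vector end at XhoI.
        "downstream_reverse": revcomp(vector_xhoi_end[:overlap]),
    }
```

Each tail is then joined to the ordinary gene-specific priming sequence, so that every PCR product shares 15 bases with its neighbor in the final circular plasmid.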
[0072] All recombinant vectors of cellulase are expressed by Escherichia coli BL21 (DE3) and purified by the Ni-NTA method. The purified protein is assembled with the scaffold protein Scaf-CIQ to form thermostable artificial cellulosome, so as to better hydrolyze cellulose substrate under high-temperature environment. The specific operations are as follows.
[0073] In this embodiment, the system volume is set to 500 μL, in which the four CL-tagged cellulases, at a final concentration of 3 μM, are incubated with equimolar Scaf-CIQ at room temperature for 30 minutes and then subjected to molecular sieve chromatography on an ÄKTA protein purifier. The selected molecular sieve column model is Superose 6 Increase 10/300 GL. TBS buffer is used as the purification buffer, and the flow rate is 0.5 mL/min. The assembly of the multienzyme complex is detected according to the elution peaks and SDS-PAGE, and the detection results are shown in
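The amounts implied by this assembly system can be worked out as in the following minimal Python sketch. It assumes that 3 μM is the final concentration of each CL-tagged cellulase and of Scaf-CIQ; the 50 μM stock concentration in the usage example is hypothetical, and the function names are illustrative.

```python
def picomoles(conc_um, volume_ul):
    """Amount of protein in pmol (1 uM x 1 uL = 1 pmol)."""
    return conc_um * volume_ul


def stock_volume_ul(stock_um, final_um, final_volume_ul):
    """Volume of stock needed to reach final_um in final_volume_ul (C1*V1 = C2*V2)."""
    if stock_um < final_um:
        raise ValueError("stock must be at least as concentrated as the final solution")
    return final_um * final_volume_ul / stock_um


if __name__ == "__main__":
    # 500 uL system at 3 uM per component, as in paragraph [0073].
    print(picomoles(3, 500), "pmol of each component")
    # Hypothetical 50 uM stock of one CL-tagged cellulase.
    print(stock_volume_ul(50, 3, 500), "uL of stock")
```

Under these assumptions, each component is present at 1500 pmol (1.5 nmol) in the 500 μL system.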
[0074] 2. Synergistic Effect of Four Cellulases in Multienzyme Complex MEC-CIQ
[0075] In order to verify the performance of the assembled artificial cellulosome, namely, the multienzyme complex MEC-CIQ containing four cellulases, PASC is used as a cellulose substrate to determine the cellulase activity.
[0076] The determination temperatures are 65° C., 70° C., 75° C. and 80° C., and the reaction time is 3 hours. The final concentration of each cellulase in a 400 μL acetic acid buffer system containing 10 mM CaCl.sub.2 is 1 μM, and the amount of PASC is 2.5%. After the reaction, the reaction system is centrifuged at 12000 rpm for 1 minute, 120 μL of the supernatant is mixed with 180 μL of 3,5-dinitrosalicylic acid (DNS) solution, and the mixture is treated in a boiling water bath for 10 minutes and then immediately cooled in an ice-water mixture. After centrifugation at 6000 rpm for 1 minute, 250 μL of the reaction solution is collected and the absorbance is measured at 540 nm (OD.sub.540). The measured values are substituted into the measured glucose standard curve, and the glucose content generated by each group of reactions is calculated. Three independent parallel experiments are set for each group, and the four free cellulases without multienzyme complex assembly are used as a control.
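The conversion from OD.sub.540 readings to glucose content can be sketched in Python as follows. This is a minimal illustration that assumes a linear glucose standard curve fitted by ordinary least squares; the disclosure does not state how the standard curve is fitted. The standard-curve values and replicate readings are hypothetical numbers for illustration only, not data from the disclosure.

```python
from statistics import mean, stdev


def fit_standard_curve(glucose, absorbance):
    """Least-squares line: absorbance = slope * glucose + intercept."""
    gx, ay = mean(glucose), mean(absorbance)
    sxx = sum((g - gx) ** 2 for g in glucose)
    sxy = sum((g - gx) * (a - ay) for g, a in zip(glucose, absorbance))
    slope = sxy / sxx
    return slope, ay - slope * gx


def glucose_from_od540(od540, slope, intercept):
    """Invert the standard curve to obtain the glucose content."""
    return (od540 - intercept) / slope


def summarize(replicate_od540, slope, intercept):
    """Mean and sample standard deviation of glucose over parallel replicates."""
    values = [glucose_from_od540(od, slope, intercept) for od in replicate_od540]
    return mean(values), stdev(values)


# Hypothetical standard curve (glucose in mg/mL against OD540).
STANDARD_GLUCOSE_MG_ML = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
STANDARD_OD540 = [0.02, 0.12, 0.22, 0.32, 0.42, 0.52]

if __name__ == "__main__":
    slope, intercept = fit_standard_curve(STANDARD_GLUCOSE_MG_ML, STANDARD_OD540)
    # Three hypothetical parallel readings for one reaction group.
    print(summarize([0.22, 0.27, 0.32], slope, intercept))
```

The same summary can be computed for the assembled complex and for the free-cellulase control at each temperature, so that the two groups are compared on glucose content rather than on raw absorbance.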
[0077] The test results are shown in
[0078] The above description is only preferred embodiments of the disclosure, but the scope of protection of the disclosure is not limited thereto. Any change or substitution that can easily be conceived by those skilled in the art within the technical scope disclosed by the disclosure should be included in the scope of protection of the disclosure.