METHOD FOR PREPARING TARGET POLYPEPTIDE BY MEANS OF RECOMBINATION AND SERIES CONNECTION OF FUSED PROTEINS

Abstract

Provided in the present disclosure is a fusion protein. The fusion protein comprises a plurality of target protein sequences connected in series, in which every two adjacent target protein sequences are connected by a linker sequence. The linker sequence can be cleaved by a protease to release a plurality of free target proteins, the target protein sequences themselves are not cleaved by the protease, and neither the C-terminus nor the N-terminus of the free target proteins contains additional residues.

Claims

1.-50. (canceled)

51. A fusion protein, comprising a plurality of target protein sequences connected in series, wherein: every two adjacent target protein sequences are connected by a linker sequence, the linker sequence is capable of being cleaved by a protease to release the plurality of target protein sequences in a free form, none of the target protein sequences is cleaved by the protease, and neither a C-terminus nor an N-terminus of the target protein sequences in the free form contains additional residues.

52. The fusion protein according to claim 51, wherein the linker sequence is composed of at least one protease recognition site; preferably the linker sequence has a length of 1 to 10 amino acids; preferably the fusion protein comprises a plurality of linker sequences, which are the same or different.

53. The fusion protein according to claim 52, wherein the protease recognition site is consecutive lysine-arginine (KR) and the protease is Kex2 protease.

54. The fusion protein according to claim 51, wherein a mass ratio of the fusion protein to the protease is 250:1 to 2000:1.

55. The fusion protein according to claim 51, wherein the linker sequence comprises a first protease recognition site and a second protease recognition site, and none of the target protein sequences comprises the second protease recognition site, wherein: the first protease recognition site is recognized and cleaved by a first protease to form a first protease cleavage product, the N-terminus of which does not carry any residue of the linker sequence; and the second protease recognition site is recognized and cleaved by a second protease, which is capable of cleaving the C-terminus of the first protease cleavage product to release the plurality of target protein sequences in the free form, wherein neither the C-terminus nor the N-terminus of the target protein sequences in the free form contains a residue of the linker sequence.

56. The fusion protein according to claim 55, wherein the plurality of target protein sequences comprise at least one first internal protease recognition site, a sequence immediately before or after the first internal protease recognition site comprises consecutive acidic amino acids adjacent to the first internal protease recognition site, and the first internal protease recognition site is substantially not recognized by the first protease.

57. The fusion protein according to claim 56, wherein the first protease is Kex2 protease, the first internal protease recognition site is at least one of lysine-lysine (KK) and arginine-lysine (RK), and the first protease recognition site in the linker sequence is lysine-arginine (KR), arginine-arginine (RR) or arginine-lysine-arginine (RKR).

58. The fusion protein according to claim 56, wherein the consecutive acidic amino acid sequence has a length of 1 to 2 amino acids; preferably the acidic amino acid is aspartic acid or glutamic acid; more preferably the acidic amino acid is aspartic acid (D).

59. The fusion protein according to claim 58, wherein the plurality of target protein sequences comprise consecutive aspartic acid-lysine-arginine (DKR), aspartic acid-arginine-arginine (DRR), aspartic acid-lysine-lysine (DKK) or aspartic acid-arginine-lysine (DRK); the first protease recognition site is lysine-arginine (KR), arginine-arginine (RR) or arginine-lysine-arginine (RKR); the second protease recognition site is a carboxyl-terminal arginine (R) or lysine (K); the first protease is Kex2 protease; and the second protease is CPB protease.

60. The fusion protein according to claim 55, wherein the plurality of target protein sequences comprise neither the first protease recognition site nor the second protease recognition site.

61. The fusion protein according to claim 55, wherein the first protease recognition site and the second protease recognition site at least partially overlap.

62. The fusion protein according to claim 55, wherein the first protease recognition site and the second protease recognition site meet one of the following conditions: (i) the amino acid sequence of the target protein sequence does not have consecutive lysine-arginine (KR) or arginine-arginine (RR) and optionally does not have consecutive lysine-lysine (KK) or arginine-lysine (RK), the first protease recognition site is lysine-arginine (KR), arginine-arginine (RR) or arginine-lysine-arginine (RKR) and the first protease is Kex2 protease, and the second protease recognition site is a carboxyl-terminal arginine (R) or lysine (K) and the second protease is CPB protease; (ii) the amino acid sequence of the target protein sequence does not have lysine (K) but has arginine (R), the first protease recognition site is lysine (K) and the first protease is Lys-C protease, and the second protease recognition site is a carboxyl-terminal lysine (K) and the second protease is CPB protease; (iii) the amino acid sequence of the target protein sequence has neither lysine (K) nor arginine (R), the first protease recognition site is lysine (K) or arginine (R) and the first protease is Lys-C protease or trypsin, and the second protease recognition site is a carboxyl-terminal lysine (K) or arginine (R) and the second protease is CPB protease; or (iv) the amino acid sequence of the target protein sequence has consecutive lysine-arginine (KR), arginine-arginine (RR), lysine-lysine (KK) or arginine-lysine (RK) adjacent to 1 or 2 consecutive acidic amino acids, the first protease recognition site is lysine-arginine (KR), arginine-arginine (RR) or arginine-lysine-arginine (RKR) and the first protease is Kex2 protease, and the second protease recognition site is a carboxyl-terminal arginine (R) or lysine (K) and the second protease is CPB protease.

63. The fusion protein according to claim 51, further comprising an auxiliary peptide segment, wherein a carboxyl terminus of the auxiliary peptide segment is connected via the linker sequence to the N-terminus of the plurality of target protein sequences connected in series.

64. The fusion protein according to claim 63, wherein the auxiliary peptide segment comprises a tag sequence and optionally an expression-promoting sequence.

65. The fusion protein according to claim 64, wherein the amino acid sequence of the tag sequence is a repeated histidine (His) sequence; optionally, the amino acid sequence of the expression-promoting sequence is EEAEAEA (SEQ ID NO: 19), EEAEAEAGG (SEQ ID NO: 20) or EEAEAEARG (SEQ ID NO: 21); optionally, the first amino acid of the auxiliary peptide segment is methionine (Met).

66. The fusion protein according to claim 51, wherein the target protein sequence has a length of 10 to 100 amino acids; preferably, the target protein sequence has an amino acid sequence as shown in any one of SEQ ID NOs: 1 to 6; preferably, the fusion protein comprises 4 to 16 target protein sequences connected in series.

67. A method for obtaining target protein sequences in a free form, comprising: providing the fusion protein of claim 51; and contacting the fusion protein with a protease to obtain the plurality of target protein sequences in the free form, wherein: the protease is selected based on the linker sequence, none of the target protein sequences is cleaved by the protease, and neither a C-terminus nor an N-terminus of the target protein sequences in the free form contains additional residues.

68. The method according to claim 67, wherein the linker sequence comprises a first protease recognition site and a second protease recognition site, none of the target protein sequences comprises the second protease recognition site, and contacting the fusion protein with the protease comprises: contacting the fusion protein with a first protease to obtain a first protease cleavage product, the N-terminus of which does not carry any residue of the linker sequence; and contacting the first protease cleavage product with a second protease to obtain the plurality of target protein sequences in the free form, wherein the second protease is capable of cleaving the C-terminus of the first protease cleavage product.

69. The method according to claim 67, wherein the fusion protein is obtained by fermentation of a microorganism carrying a nucleic acid encoding the fusion protein; preferably the microorganism is Escherichia coli.

70. The method according to claim 69, further comprising subjecting the fermentation product of the microorganism to cell disruption and solubilization, wherein the solubilization is performed in the presence of a detergent to obtain the fusion protein.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0092] FIG. 1 is a schematic diagram showing the structure of a system for obtaining target protein sequences in a free form according to embodiments of the present disclosure;

[0093] FIG. 2 is a schematic diagram showing the structure of a proteolysis device according to embodiments of the present disclosure;

[0094] FIG. 3 is a schematic diagram showing the structure of a device for preparing a fusion protein according to embodiments of the present disclosure;

[0095] FIG. 4 is another schematic diagram showing the structure of a device for preparing a fusion protein according to embodiments of the present disclosure;

[0096] FIG. 5 is another schematic diagram showing the structure of a proteolysis device according to embodiments of the present disclosure;

[0097] FIG. 6 is a schematic diagram of the construction of recombinant plasmid pET-30a-Arg³⁴-GLP-1 (7-37) according to embodiments of the present disclosure;

[0098] FIG. 7 is a diagram showing identification of pET-30a-Arg³⁴-GLP-1 (7-37) by restriction digestion according to embodiments of the present disclosure;

[0099] FIG. 8 is a schematic diagram of the construction of recombinant plasmid pET-30a-Arg³⁴-GLP-1 (9-37) according to embodiments of the present disclosure;

[0100] FIG. 9 is a diagram showing identification of pET-30a-Arg³⁴-GLP-1 (9-37) by restriction digestion according to embodiments of the present disclosure;

[0101] FIG. 10 is a schematic diagram of the construction of recombinant plasmid pET-30a-Arg³⁴-GLP-1 (11-37) according to embodiments of the present disclosure;

[0102] FIG. 11 is a diagram showing identification of pET-30a-Arg³⁴-GLP-1 (11-37) by restriction digestion according to embodiments of the present disclosure;

[0103] FIG. 12 is a schematic diagram of the construction of recombinant plasmid pET-30a-GLP-2 according to embodiments of the present disclosure;

[0104] FIG. 13 is a diagram showing identification of pET-30a-GLP-2 by restriction digestion according to embodiments of the present disclosure;

[0105] FIG. 14 is a schematic diagram of the construction of recombinant plasmid pET-30a-Glucagon according to embodiments of the present disclosure;

[0106] FIG. 15 is a diagram showing identification of pET-30a-Glucagon by restriction digestion according to embodiments of the present disclosure;

[0107] FIG. 16 is a schematic diagram of the construction of recombinant plasmid pET-30a-T4B according to embodiments of the present disclosure;

[0108] FIG. 17 is a diagram showing identification of pET-30a-T4B by restriction digestion according to embodiments of the present disclosure;

[0109] FIG. 18 is a diagram showing SDS-PAGE results of induced expression of engineered recombinant bacteria pET-30a-Arg³⁴-GLP-1 (9-37)/BL21(DE3) according to embodiments of the present disclosure;

[0110] FIG. 19 is a mass spectrum showing molecular weights of Arg³⁴-GLP-1 (9-37) after digestion according to embodiments of the present disclosure;

[0111] FIG. 20 is a graph showing the in vitro cellular biological activity of Arg³⁴-GLP-1 (9-37) according to embodiments of the present disclosure;

[0112] FIG. 21 is a graph showing the in vitro cellular biological activity of GLP-2 according to embodiments of the present disclosure;

[0113] FIG. 22 is a diagram comparing the induced expression levels of fusion proteins with or without an expression-promoting peptide (EEAEAEARG) (SEQ ID NO: 21) according to embodiments of the present disclosure;

[0114] FIG. 23 is a diagram comparing the fusion protein content in the supernatant of disrupted bacteria expressing fusion proteins with or without an expression-promoting peptide (EEAEAEARG) (SEQ ID NO: 21) according to embodiments of the present disclosure; and

[0115] FIG. 24 is a diagram comparing the enzyme cleavage efficiency of fusion proteins with or without an expression-promoting peptide (EEAEAEARG) (SEQ ID NO: 21) according to embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

[0116] Glucagon is mainly used to treat severe hypoglycemia in diabetic patients receiving insulin therapy. Marketed glucagon drugs include GlucaGen. GLP-1 is mainly used for type II diabetes; marketed GLP-1 receptor agonist drugs include Exenatide, Exenatide QW, Liraglutide, Albiglutide, Dulaglutide, Lixisenatide and Semaglutide. GLP-2 is mainly used for short bowel syndrome; marketed GLP-2 drugs include Teduglutide.

[0117] Human GLP-1 is a peptide hormone secreted by the intestinal mucosa that promotes insulin secretion. GLP-1 regulates blood glucose metabolism by increasing insulin secretion and inhibiting glucagon release; slows intestinal motility, inducing satiety and thus suppressing appetite; and promotes the proliferation and inhibits the apoptosis of pancreatic β-cells, increasing their number and improving their function. Importantly, GLP-1 lowers blood glucose only when blood glucose is elevated, thereby avoiding hypoglycemia caused by excessive insulin secretion. GLP-1 can also improve the sensitivity of cells to insulin, which helps in treating insulin resistance. Long-term GLP-1 treatment can significantly improve a patient's medium- and long-term indicators such as glycosylated hemoglobin. For type II diabetes associated with obesity, GLP-1 can inhibit gastric emptying, helping patients to control their diet and lose weight. In the past two years, it has been confirmed that GLP-1 drugs such as Liraglutide and Semaglutide provide cardiovascular benefits. Insulin therapy usually has the disadvantages of weight gain and risk of hypoglycemia, and GLP-1 receptor agonist drugs address exactly these clinical needs.

[0118] The mechanisms by which GLP-1 drugs, represented by Liraglutide, treat diabetes include: stimulation of insulin secretion in a physiological, glucose-dependent manner; reduction of glucagon secretion; inhibition of gastric emptying; reduction of appetite; and promotion of pancreatic β-cell growth and recovery.

[0119] When the blood glucose concentration exceeds the normal level, GLP-1 stimulates insulin secretion through the above mechanisms so as to decrease the blood glucose concentration. GLP-1 is therefore a highly effective, glucose-dependent hypoglycemic agent. Based on these characteristics and years of clinical experience, GLP-1 is a suitable candidate for the treatment of type 2 diabetes. Further, combining GLP-1 with insulin can produce a better therapeutic effect in patients with type 1 diabetes. GLP-1 can even be effective in patients in whom sulfonylurea therapy has failed, without causing severe hypoglycemia, demonstrating its glucose-lowering potency. Furthermore, GLP-1 can increase the rate of insulin biosynthesis and restore the rapid response of rat pancreatic β-cells to elevated blood glucose (i.e., first-phase insulin release). It has been reported in the literature that GLP-1 can stimulate the growth and proliferation of pancreatic β-cells and promote the differentiation of ductal cells into new pancreatic β-cells. A number of human trials have shown that GLP-1 is also involved in the preservation and repair of pancreatic β-cell populations.

[0120] GLP-1 drugs mainly compete on administration frequency, hypoglycemic effect, weight-lowering effect, immunogenicity and the like. The main disadvantages of Exenatide are its short elimination half-life and strong immunogenicity. The main disadvantages of Albiglutide are its weaker hypoglycemic and weight-lowering effects: although Albiglutide served as the first long-acting GLP-1 drug administered once a week, its efficacy is far inferior to that of Dulaglutide, which entered the market later. In addition, the cardiovascular risk of antidiabetic drugs has also attracted much attention. For example, Insulin Degludec, which has been marketed in Japan, the European Union and the United States, had its approval delayed by the US FDA because of cardiovascular risk concerns. Liraglutide and Semaglutide have been proven to have cardiovascular benefits in the past two years, greatly improving the overall market competitiveness of GLP-1 receptor agonist drugs.

[0121] With the development of molecular biology, more and more marketed peptide drugs are prepared by genetic engineering. For example, Liraglutide and Semaglutide are expressed in a recombinant yeast system. When a recombinant yeast system is used to express heterologous proteins, the multiple protease families present in yeast may degrade them; small peptides with simple structures are especially susceptible. The degradation products increase as fermentation time is extended, and they are difficult to remove effectively by purification. Studies have revealed that this degradation during fermentation is caused by digestion of the polypeptide by yeast proteases. Changing the expression host, adjusting fermentation conditions and similar measures can partially reduce degradation, but cannot meet the requirements of industrial production. Knocking out or inactivating specific protease genes in the host yeast by molecular biological means can partially prevent degradation of the polypeptide, but this is technically difficult and cannot eliminate degradation completely. For example, Novo Nordisk uses Saccharomyces cerevisiae strain YES2085 (in which YPS1 and PEP4 are knocked out to prevent degradation) to efficiently express Arg³⁴-GLP-1 (7-37) (see US20100317057).

[0122] The Escherichia coli expression system is also commonly used to express recombinant heterologous proteins. Polypeptide drugs have simple structures, lack complex higher-order structure and have no glycosylation sites. Because Escherichia coli contains relatively few proteases, its recombinant expression system can produce active polypeptides with intact structures. With a conventional Escherichia coli recombinant expression system, target polypeptides can be obtained after enzyme digestion. However, the yield and recovery rate of the target polypeptides after enzyme digestion are substantially reduced, which severely restricts the industrialization of polypeptide drugs.

[0123] In a published invention patent relating to the preparation of GLP-1 polypeptides (CN201610753093.4), Arg³⁴-GLP-1 (7-37) is expressed as a fusion with a chaperone protein and released with enterokinase. Although the expression level of the fusion protein is relatively high, the Arg³⁴-GLP-1 (7-37) released by digestion accounts for only one tenth of the total fusion protein, so the yield of the target protein is low. In addition, chaperone proteins (TrxA, DsbA) are suited to the fusion expression of macromolecular proteins that require renaturation, whereas Arg³⁴-GLP-1 (7-37) has a simple spatial structure and does not require conformational renaturation. Moreover, the residual chaperone protein released by enzyme digestion must be strictly controlled during purification to avoid the resulting safety risks. CN201610857663.4 uses a recombinant SUMO-GLP-1 (7-37) fusion protein to express GLP-1 (7-37).

[0124] In published invention patents relating to the preparation of GLP-2 polypeptides (CN104072604B, CN101171262 and CN102659938A), GLP-2 analogues are all prepared by solid-phase or liquid-phase synthesis. CN103159848A discloses the preparation of a polypeptide consisting of two GLP-2 repeats connected in series. CN103945861A discloses the preparation of a fusion polypeptide of a recombinant peptide and GLP-2. CN201610537328.6 of Shanghai Pharmaceutical Industry Research Institute prepares GLP-2 using enterokinase combined with acid cleavage, in which a strong acid cleaves the peptide bond at the aspartic acid-proline (D-P) acid cleavage site to release intact GLP-2. During acid cleavage, the polypeptide may be damaged, producing peptide fragments. Further, prolonged exposure to the acid cleavage solution may generate deamidation-related substances, which seriously affects product quality and complicates subsequent purification.

[0125] In addition, in recombinant expression in conventional prokaryotic and eukaryotic cells, translation starts from the initiator methionine at the N-terminus, so the first amino acid of the expression product is methionine, which is not part of the target sequence. The N-terminal methionine can be efficiently removed by methionine aminopeptidase only when the first amino acid of the target protein has a small side chain, with a radius of gyration of 1.22 angstroms or less, such as Gly or Ala. However, when the target protein is highly expressed, the methionine is often not removed, because the enzyme becomes saturated and cofactors are limiting. The result is N-terminal heterogeneity (with or without Met), and the sequence of the expressed protein (with Met at the first position) differs from that of the target protein (without Met at the first position), which may cause immunotoxicity.
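The N-terminal methionine rule described above can be expressed as a simple check on the residue that follows the initiator Met. The sketch below is a hypothetical helper, not part of the disclosed method. It assumes the residue set G, A, S, C, P, T, V that is commonly reported for E. coli methionine aminopeptidase (the text above names only Gly and Ala as examples), and it deliberately ignores the saturation and cofactor effects that can leave Met attached at high expression levels.

```python
# Residues commonly reported to permit removal of the initiator Met by
# E. coli methionine aminopeptidase (assumption; the text names only Gly and Ala).
SMALL_SECOND_RESIDUES = {"G", "A", "S", "C", "P", "T", "V"}


def met_expected_removed(translated: str) -> bool:
    """Predict whether the initiator Met of `translated` is expected to be removed.

    Only the residue immediately after Met is considered; enzyme saturation and
    cofactor limits at high expression levels are ignored by this sketch.
    """
    if len(translated) < 2 or translated[0] != "M":
        raise ValueError("expected a sequence starting with Met and at least 2 residues")
    return translated[1] in SMALL_SECOND_RESIDUES


print(met_expected_removed("MAEGT"))   # True: Ala follows Met
print(met_expected_removed("MHAEGT"))  # False: His follows Met
```

GLP-1 (7-37) begins with His (HAEGT...), so a construct that places it directly after the initiator Met would be expected to keep the Met, which is the heterogeneity problem this paragraph describes.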

[0126] The embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the drawings. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present disclosure and should not be construed as limiting it.

[0127] One aspect of embodiments of the present disclosure provides a fusion protein and a novel method for expressing recombinant polypeptides connected end to end in series, which overcomes the drawbacks of existing genetic engineering methods for expressing recombinant polypeptides.

[0128] In the present disclosure, the novel method for expressing a recombinant polypeptide in end-to-end series connection specifically includes the following steps:

[0129] a) designing the polypeptide connected end to end in series and synthesizing the full-length DNA sequence encoding its amino acid sequence, wherein the polypeptide has the structure auxiliary peptide segment-(enzyme cleavage site-target protein sequence-enzyme cleavage site-target protein sequence)n, and n is 2 to 8;

[0130] b) constructing a recombinant plasmid expression vector containing the DNA sequence encoding the amino acid sequence of the polypeptide;

[0131] c) transforming the recombinant plasmid expression vector into a host cell to obtain genetically engineered recombinant bacteria expressing the polypeptide;

[0132] d) subjecting the genetically engineered recombinant bacteria to high-density fermentation culture;

[0133] e) double digesting the polypeptide with recombinant proteases specific for basic amino acid residues to obtain all the target protein sequences; and

[0134] f) purifying the target protein sequences by reversed-phase chromatography to obtain high-purity target protein sequences.
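Step a) fixes only the domain order of the construct. The following minimal sketch assembles the corresponding amino-acid sequence from its parts, before any reverse translation to DNA. All inputs are illustrative assumptions: the auxiliary segment is taken as Met plus a His6 tag plus the expression-promoting peptide EEAEAEARG (SEQ ID NO: 21), the cleavage site is KR, and the target is the Arg34-GLP-1 (7-37) sequence as commonly given in the literature, not a sequence taken from this document's SEQ ID listing. The helper name `build_tandem_construct` is hypothetical.

```python
# Illustrative inputs (assumptions, not taken from the SEQ ID listing).
AUX = "M" + "HHHHHH" + "EEAEAEARG"          # Met + His6 tag + expression-promoting peptide
TARGET = "HAEGTFTSDVSSYLEGQAAKEFIAWLVRGRG"  # Arg34-GLP-1 (7-37), 31 residues
LINKER = "KR"                               # Kex2 recognition site


def build_tandem_construct(aux: str, target: str, site: str, n: int) -> str:
    """Return aux-(site-target-site-target)n, which carries 2*n target copies."""
    if not 2 <= n <= 8:
        raise ValueError("n must be between 2 and 8")
    unit = site + target + site + target
    return aux + unit * n


construct = build_tandem_construct(AUX, TARGET, LINKER, n=4)
print(len(construct), construct.count(TARGET))  # 280 8
```

With n = 4 the construct carries 8 target copies; n = 2 to 8 spans the 4 to 16 copies recited in claim 66. Codon optimization and gene synthesis are outside the scope of this sketch.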

[0135] According to the present disclosure, the expression vector in step b) is an Escherichia coli expression vector containing a T7, Tac, Trp or lac promoter, or a yeast expression vector containing an α-factor secretion signal and an AOX or GAP promoter.

[0136] The host cell in step c) may be Pichia pastoris or Escherichia coli, preferably Escherichia coli. More specifically, the host cell is Escherichia coli BL21, BL21(DE3) or BL21(DE3)pLysS, preferably BL21(DE3).

[0137] The recombinant alkaline protease in step e) is a recombinant dibasic amino acid endopeptidase (Recombinant Kex2 Protease, Kex2 for short), corresponding to the membrane-bound Kex2 protease of yeast cells. In its native role, Kex2 processes the alpha-factor precursor: it specifically hydrolyzes the peptide bond on the carboxyl side of two consecutive basic amino acids, such as Lys-Arg, Lys-Lys or Arg-Arg, with Lys-Arg being cleaved most efficiently. Kex2 protease has an optimal pH of 9.0 to 9.5. The digestion buffer for Kex2 protease may be Tris-HCl buffer, phosphate buffer or borate buffer, preferably Tris-HCl buffer. Recombinant carboxypeptidase B (CPB for short) selectively hydrolyzes the basic amino acids arginine (Arg, R) and lysine (Lys, K) at the carboxyl terminus of a protein or polypeptide. CPB protease has an optimal pH of 8.5 to 9.5. The digestion buffer for CPB protease may be Tris-HCl buffer, phosphate buffer or borate buffer, preferably Tris-HCl buffer.
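The combined action of the two enzymes described above can be simulated on a sequence string. This is a simplified sketch under stated assumptions, not a kinetic model: Kex2 is modeled as cutting after the linker-type dibasic sites KR and RR only, a pair with Asp or Glu immediately before or after it is treated as not recognized (following the protection described in claims 56 and 57), and CPB is modeled as removing every C-terminal Arg or Lys. The function names and the example fusion are hypothetical.

```python
ACIDIC = {"D", "E"}


def kex2_digest(seq: str, sites: tuple[str, ...] = ("KR", "RR")) -> list[str]:
    """Simplified Kex2: cut on the carboxyl side of each dibasic pair in `sites`,
    skipping pairs that have D or E immediately before or after them."""
    fragments, start = [], 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        shielded = (i > 0 and seq[i - 1] in ACIDIC) or (
            i + 2 < len(seq) and seq[i + 2] in ACIDIC)
        if pair in sites and not shielded:
            fragments.append(seq[start:i + 2])
            start = i + 2
    fragments.append(seq[start:])
    return [f for f in fragments if f]


def cpb_trim(fragment: str) -> str:
    """Simplified CPB: strip C-terminal Arg/Lys residues one after another."""
    return fragment.rstrip("KR")


def release_targets(fusion: str) -> list[str]:
    """Kex2 digestion followed by CPB trimming; empty fragments are discarded."""
    trimmed = [cpb_trim(f) for f in kex2_digest(fusion)]
    return [f for f in trimmed if f]


TARGET = "HAEGTFTSDVSSYLEGQAAKEFIAWLVRGRG"  # Arg34-GLP-1 (7-37), illustrative
AUX = "MHHHHHHEEAEAEARG"                    # Met + His6 + EEAEAEARG, illustrative
fusion = AUX + ("KR" + TARGET) * 4

products = release_targets(fusion)
print(products[0])                                                 # MHHHHHHEEAEAEARG
print(len(products) - 1, all(p == TARGET for p in products[1:]))  # 4 True
```

The first product is the auxiliary segment, from which CPB has removed the KR left behind by Kex2; every remaining product is an intact target with no linker residue at either end.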

[0138] The present disclosure has the following advantages compared to the existing technology.

[0139] (a) For the novel polypeptide connected end to end in series designed herein, the plasmid loss rate of the genetically engineered recombinant bacteria over 80 generations does not exceed 10%, so the expression level of the target proteins is essentially unaffected. This enables industrial-scale fermentation with high cell density and high expression of the target proteins.

[0140] (b) With the glucagon-like peptides and analogs thereof connected end to end in series as designed in the present disclosure, all the target proteins can be obtained with intact structures after digestion. In contrast, in conventional fusion protein expression, although the expression level of the fusion protein is relatively high, digestion also generates undesired proteins that must be removed, and each fusion protein molecule yields only one target protein molecule, so only a fraction of the fusion protein is recovered as target protein.
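The yield advantage of the tandem design can be illustrated with simple arithmetic. The sketch below compares the fraction of fusion protein residues that end up in released target peptide, using residue count as a rough proxy for mass (an assumption that ignores differences in average residue mass). The tandem case assumes a 16-residue auxiliary segment, 2-residue KR linkers and eight 31-residue targets; the single-copy case assumes a hypothetical 110-residue fusion partner with a 5-residue cleavage linker. These figures are illustrative and are not experimental results from this disclosure.

```python
def target_residue_fraction(target_len: int, copies: int, overhead_len: int) -> float:
    """Fraction of the fusion protein's residues that end up in released target copies.

    Residue count is used as a rough proxy for mass (assumes similar average
    residue masses in the target and in the overhead sequences).
    """
    total = target_len * copies + overhead_len
    return target_len * copies / total


# Tandem construct: 16-residue auxiliary segment plus one 2-residue KR linker per copy.
tandem = target_residue_fraction(31, 8, 16 + 2 * 8)
# Hypothetical single-copy fusion: 110-residue partner plus 5-residue cleavage linker.
single = target_residue_fraction(31, 1, 110 + 5)
print(f"{tandem:.2f} {single:.2f}")  # 0.89 0.21
```

Under these assumptions roughly 89% of the tandem construct is target peptide, against roughly 21% for the single-copy fusion.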

[0141] (c) The design of the present disclosure completely overcomes the N-terminal heterogeneity caused by the methionine (Met) at the N-terminus of the fusion protein. Specifically, the N-terminal Met is removed completely together with the unique auxiliary peptide segment by the enzyme digestion method of the present disclosure, yielding target proteins with a uniform N-terminus.

[0142] (d) Kex2 protease and recombinant CPB protease have high digestion specificity, so no non-specific digestion-related substances are produced. All the target proteins are therefore obtained with the correct structure after digestion, which greatly reduces the difficulty of subsequent purification and separation. As a result, highly pure target proteins can be obtained, the recovery rate of target proteins is improved, and the cost of expressing genetically engineered recombinant polypeptides is reduced.

[0143] (e) Purification by reversed-phase chromatography provides excellent separation and a high recovery rate.

[0144] Another aspect of embodiments of the present disclosure provides a system for obtaining a plurality of target protein sequences in a free form. According to embodiments of the present disclosure, referring to FIG. 1, the system includes: a device 100 for providing a fusion protein, configured to provide the fusion protein described in the above aspect; and a proteolysis device 200, connected to the device 100 for providing a fusion protein and configured to contact the fusion protein with a protease to obtain the plurality of target protein sequences in a free form, in which the protease is selected based on the linker sequence, none of the target protein sequences is cleaved by the protease, and neither the C-terminus nor the N-terminus of the target protein sequences in the free form contains additional residues. The system according to embodiments of the present disclosure is suitable for performing the method for obtaining a plurality of target protein sequences in a free form described above. Because neither the C-terminus nor the N-terminus of the resulting target protein sequences contains additional residues, the quality of the target proteins is significantly improved and their subsequent purification is greatly facilitated. When the target protein is a pharmaceutical polypeptide, its safety is significantly improved and its immunotoxicity significantly reduced.

[0145] According to a particular embodiment of the present disclosure, referring to FIG. 2, the proteolysis device is provided with a first protease proteolysis unit 201 and a second protease proteolysis unit 202, which are connected to each other. The fusion protein is cleaved in the first protease proteolysis unit, and the first protease cleavage product can be further cleaved in the second protease proteolysis unit. The proteases can be added manually to the first and second protease proteolysis units, respectively. Alternatively, the first protease and the second protease can be immobilized so that the fusion protein is cleaved in an industrialized, automated manner.

[0146] Particularly, when the linker sequence forms the C-terminus of a protease cleavage product, that is, when the C-terminus of the protease cleavage product is consecutive lysine-arginine (KR), both the first protease proteolysis unit and the second protease proteolysis unit are immobilized with Kex2 protease. In this case, the target protein sequences in a free form are obtained after the fusion protein is cleaved in the first protease proteolysis unit. The first protease cleavage product may further be cleaved in the second protease proteolysis unit, so that any uncleaved or partially cleaved fusion protein remaining in it is cleaved to release the target protein sequences in a free form. Alternatively, the first protease cleavage product need not be cleaved further to obtain the target protein sequences in a free form.

[0147] Particularly, the linker sequence includes a first protease recognition site and a second protease recognition site, and the plurality of the target protein sequences do not contain the second protease recognition site. The first protease proteolysis unit 201 is immobilized with a first protease and the second protease proteolysis unit 202 is immobilized with a second protease. The fusion protein is contacted with the first protease in the first protease proteolysis unit to obtain a first protease cleavage product, and the N-terminus of the first protease cleavage product does not carry any residue of the linker sequence. The first protease cleavage product is contacted with the second protease in the second protease proteolysis unit to obtain the plurality of the target protein sequences in the free form, in which the second protease is capable of cleaving the C-terminus of the first protease cleavage product.

[0148] The recognition sites and proteases are chosen according to the amino acid sequence of the target protein sequence. When the target protein sequence contains no consecutive lysine-arginine (KR) or arginine-arginine (RR), whether or not it contains consecutive lysine-lysine (KK) or arginine-lysine (RK), the first protease recognition site is lysine-arginine (KR), arginine-arginine (RR) or arginine-lysine-arginine (RKR) and the first protease is Kex2 protease, and the second protease recognition site is a carboxyl-terminal arginine (R) or lysine (K) and the second protease is carboxypeptidase B (CPB). When the target protein sequence contains arginine (R) but no lysine (K), the first protease recognition site is lysine (K) and the first protease is Lys-C protease, and the second protease recognition site is a carboxyl-terminal lysine (K) and the second protease is CPB. When the target protein sequence contains neither lysine (K) nor arginine (R), the first protease recognition site is lysine (K) or arginine (R) and the first protease is Lys-C protease or trypsin (Trp), and the second protease recognition site is a carboxyl-terminal lysine (K) or arginine (R) and the second protease is CPB. When the target protein sequence contains consecutive KR, RR, KK or RK and that dibasic pair is adjacent to one or two consecutive acidic amino acids, the first protease recognition site is KR, RR or RKR and the first protease is Kex2 protease, and the second protease recognition site is a carboxyl-terminal arginine (R) or lysine (K) and the second protease is CPB.
Accordingly, in the first protease proteolysis unit the fusion protein is cleaved at the peptide bond on the carboxyl side of the first protease recognition site in each linker sequence, giving a first protease cleavage product whose N-terminus carries no linker residue. In the second protease proteolysis unit, the linker residues remaining at the carboxyl terminus of the first protease cleavage product are then removed one by one, giving the target protein sequences in a free form with no linker residue at the C-terminus either.
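The selection rules of [0148] can be expressed as a small decision function. This is a sketch under stated assumptions: "adjacent to one or two consecutive acidic amino acids" is read as "an Asp or Glu lies within `window` residues (default 2) of the dibasic pair", the protease abbreviated Trp in [0148] is read as trypsin, and the function names and return format are hypothetical. Because the conditions in [0148] can overlap, the function returns every scheme whose condition is met.

```python
ACIDIC = set("DE")
DIBASIC = {"KR", "RR", "KK", "RK"}
KEX2_CPB = ("Kex2 (linker KR, RR or RKR)", "CPB")


def dibasic_positions(seq: str) -> list[int]:
    """Start indices of every KR, RR, KK or RK pair."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in DIBASIC]


def near_acidic(seq: str, i: int, window: int = 2) -> bool:
    """True if Asp/Glu lies within `window` residues of the pair seq[i:i+2] (assumed reading)."""
    flank = seq[max(0, i - window):i] + seq[i + 2:i + 2 + window]
    return any(aa in ACIDIC for aa in flank)


def protease_options(target: str, window: int = 2) -> list[tuple[str, str]]:
    """(first protease, second protease) pairs allowed for a target sequence."""
    options = []
    if "KR" not in target and "RR" not in target:
        options.append(KEX2_CPB)   # rule 1: KK or RK may or may not be present
    elif all(near_acidic(target, i, window) for i in dibasic_positions(target)):
        options.append(KEX2_CPB)   # rule 4: every dibasic pair has a nearby acidic residue
    if "K" not in target and "R" in target:
        options.append(("Lys-C (linker K)", "CPB"))                  # rule 2
    if "K" not in target and "R" not in target:
        options.append(("Lys-C or trypsin (linker K or R)", "CPB"))  # rule 3
    return options


for name, seq in [
    ("Arg34-GLP-1 (7-37)", "HAEGTFTSDVSSYLEGQAAKEFIAWLVRGRG"),
    ("Glucagon", "HSQGTFTSDYSKYLDSRRAQDFVQWLMNT"),   # RR with Asp two residues upstream
    ("TB4", "SDKPDMAEIEKFDKSKLKKTETQEKNPLPSKETIEQEKQAGES"),
]:
    print(name, protease_options(seq))
```

Under these assumptions all three targets map to Kex2 plus CPB, which matches the KR and RKR linkers used in SEQ ID NOs: 13, 17 and 18.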

[0149] According to particular embodiments of the present disclosure, the first protease and the second protease may instead be added simultaneously to a single reaction system to cleave the fusion protein. In these embodiments, the first and second proteases are selected so that neither affects the enzymatic activity of the other.

[0150] According to embodiments of the present disclosure, referring to FIG. 3, the device for providing a fusion protein includes a fermentation unit 101. The fermentation unit 101 is configured to cause the fermentation of a microorganism carrying a nucleic acid encoding the fusion protein. Preferably, the microorganism is Escherichia coli.

[0151] According to embodiments of the present disclosure, referring to FIG. 4, the device for providing a fusion protein further includes a dissolution unit 102. The dissolution unit 102 is connected to the fermentation unit 101 and is configured to disrupt the cells obtained by fermentation of the microorganism and to dissolve the disrupted material in the presence of a detergent to obtain the fusion protein.

[0152] According to embodiments of the present disclosure, referring to FIG. 5, the proteolysis device further includes an adjustment unit 203. The adjustment unit 203 is configured to adjust the amount of protease such that the mass ratio of the fusion protein to the protease is 250:1 to 2000:1, so that the fusion protein is cleaved specifically at the enzyme cleavage sites of the linker sequences.
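The ratio range in [0152] fixes how much protease the adjustment unit 203 should dispense for a given amount of fusion protein. A minimal sketch follows; it assumes the ratio applies to each protease separately when two are used (the text does not say), uses the 1000:1 ratio of Example 8 as a default, and the function names are hypothetical.

```python
def protease_dose_mg(fusion_mg: float, ratio: float = 1000.0) -> float:
    """Protease mass for a fusion-protein:protease mass ratio of `ratio`:1."""
    if not 250.0 <= ratio <= 2000.0:
        raise ValueError("ratio lies outside the 250:1 to 2000:1 range")
    return fusion_mg / ratio


def protease_dose_range_mg(fusion_mg: float) -> tuple[float, float]:
    """Smallest and largest protease mass allowed for a given fusion-protein mass."""
    return protease_dose_mg(fusion_mg, 2000.0), protease_dose_mg(fusion_mg, 250.0)


# Hypothetical batch containing 2 g (2000 mg) of fusion protein.
print(protease_dose_mg(2000.0))        # 2.0 mg at 1000:1
print(protease_dose_range_mg(2000.0))  # (1.0, 8.0) mg
```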

[0153] The present disclosure is further described below in combination with specific embodiments. The advantages and characteristics of the present disclosure will become apparent in the description. These examples are merely illustrative and do not constitute any limitation on the scope of the present disclosure. Those skilled in the art should understand that the details and forms of the technical solutions of the present disclosure can be modified or replaced without departing from the scope of the present disclosure, and these modifications or replacements fall within the scope of the present disclosure.

EXAMPLE 1

Construction of pET-30a-Arg34-GLP-1 (7-37) Recombinant Plasmid and Engineered Recombinant Bacteria

[0154] Following the arrangement auxiliary peptide segment-(enzyme cleavage site-target protein sequence-enzyme cleavage site-target protein sequence)4, repeats of Arg34-GLP-1 (7-37) (SEQ ID NO: 1) were connected in series to form the sequence shown in SEQ ID NO: 13. The cDNA sequence shown in SEQ ID NO: 7 was designed according to E. coli codon preference, with the Nde I restriction site CAT ATG added at the 5′ end and the double stop codon TAA TGA and the BamH I restriction site GGA TCC added at the 3′ end. The complete nucleotide sequence was chemically synthesized and cloned into the PUC-57 vector to obtain the recombinant plasmid PUC-57-Arg34-GLP-1 (7-37), which was transformed into E. coli Top10 and stored as a glycerol stock.

(SEQ ID NO: 13)
NH2-Met-His-His-His-His-Glu-Glu-Ala-Glu-Ala-Glu-
Ala-Arg-Gly-Lys-Arg-His-Ala-Glu-Gly-Thr-Phe-Thr-
Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-
Lys-Glu-Phe-Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-
Lys-Arg-His-Ala-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-
Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-
Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-Lys-Arg-His-
Ala-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-
Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-Ile-Ala-Trp-
Leu-Val-Arg-Gly-Arg-Gly-Lys-Arg-His-Ala-Glu-Gly-
Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-
Gln-Ala-Ala-Lys-Glu-Phe-Ile-Ala-Trp-Leu-Val-Arg-
Gly-Arg-Gly-COOH
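To make the repeat structure of SEQ ID NO: 13 easier to check, the three-letter listing can be converted to one-letter code and the KR-linked copies counted. This is a small illustrative Python sketch; the helper names are hypothetical, and the one-letter target string is transcribed from the repeats inside SEQ ID NO: 13 rather than from SEQ ID NO: 1, which is not reproduced in this section.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert 'NH2-Met-His-...-COOH' (line breaks allowed) to one-letter code."""
    tokens = (t.strip() for t in three_letter.split("-"))
    return "".join(THREE_TO_ONE[t] for t in tokens if t and t not in ("NH2", "COOH"))


SEQ_ID_NO_13 = """NH2-Met-His-His-His-His-Glu-Glu-Ala-Glu-Ala-Glu-
Ala-Arg-Gly-Lys-Arg-His-Ala-Glu-Gly-Thr-Phe-Thr-
Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-
Lys-Glu-Phe-Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-
Lys-Arg-His-Ala-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-
Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-
Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-Lys-Arg-His-
Ala-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-
Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-Ile-Ala-Trp-
Leu-Val-Arg-Gly-Arg-Gly-Lys-Arg-His-Ala-Glu-Gly-
Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-
Gln-Ala-Ala-Lys-Glu-Phe-Ile-Ala-Trp-Leu-Val-Arg-
Gly-Arg-Gly-COOH"""

TARGET = "HAEGTFTSDVSSYLEGQAAKEFIAWLVRGRG"  # Arg34-GLP-1 (7-37) as it appears in SEQ ID NO: 13

seq = to_one_letter(SEQ_ID_NO_13)
print(len(seq))                  # 146 residues
print(seq.count("KR" + TARGET))  # 4 linker-target repeats
print(seq[:seq.index("KR")])     # auxiliary peptide: MHHHHEEAEAEARG
```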

[0155] The recombinant plasmid PUC-57-Arg34-GLP-1 (7-37) was double-digested with Nde I and BamH I, and the target nucleotide fragment was recovered and ligated with T4 DNA ligase into plasmid pET-30a (purchased from Novagen) that had been double-digested with Nde I and BamH I. The ligation products were transformed into the cloning host E. coli Top10, and the recombinant plasmid pET-30a-Arg34-GLP-1 (7-37) was screened by restriction digestion and PCR. DNA sequencing then confirmed that the Arg34-GLP-1 (7-37) cDNA sequence in the recombinant plasmid was correct. The recombinant plasmid pET-30a-Arg34-GLP-1 (7-37) was transformed into the expression host E. coli BL21 (DE3), and engineered recombinant bacteria were obtained by expression screening. The construction of the recombinant plasmid is shown schematically in FIG. 6. FIG. 7 shows the restriction digestion analysis: digestion of plasmids 1-3 gives bands of about 5000 bp and 450 bp, corresponding to pET-30a and Arg34-GLP-1 (7-37) respectively and consistent with the theoretical values, indicating that Arg34-GLP-1 (7-37) was correctly inserted into the vector pET-30a.
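The size of the Nde I/BamH I fragment can be anticipated from the protein length. The sketch below back-translates the SEQ ID NO: 13 protein with an assumed table of one common E. coli codon per amino acid (it does not reproduce SEQ ID NO: 7), takes the ATG of the Nde I site CATATG as the initiator Met (an assumption about the cassette layout), appends TAA TGA and GGATCC, and rejects designs that would create an extra internal Nde I or BamH I site. The function name is hypothetical.

```python
# Assumed one-codon-per-residue table of codons common in E. coli (illustrative only;
# this is NOT the codon choice of SEQ ID NO: 7).
CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
NDE_I, BAM_HI, DOUBLE_STOP = "CATATG", "GGATCC", "TAATGA"


def build_insert(protein: str) -> str:
    """Nde I site (its ATG encodes the start Met) + codons + TAA TGA + BamH I site."""
    if not protein.startswith("M"):
        raise ValueError("the Nde I ATG is assumed to encode the initiator Met")
    coding = "".join(CODON[aa] for aa in protein[1:])
    insert = NDE_I + coding + DOUBLE_STOP + BAM_HI
    # An extra internal site would cut the cassette during Nde I/BamH I digestion.
    if insert.count(NDE_I) != 1 or insert.count(BAM_HI) != 1:
        raise ValueError("internal Nde I or BamH I site; choose synonymous codons")
    return insert


protein = "MHHHHEEAEAEARG" + "".join("KR" + "HAEGTFTSDVSSYLEGQAAKEFIAWLVRGRG" for _ in range(4))
insert = build_insert(protein)
print(len(protein), len(insert))  # 146 residues, 453 bp
```

Under these assumptions the cassette is 3n + 15 bp for an n-residue protein, so 453 bp here, which is close to the band of about 450 bp reported for FIG. 7.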

EXAMPLE 2

Construction of pET-30a-Arg34-GLP-1 (9-37) Recombinant Plasmid and Engineered Recombinant Bacteria

[0156] Following the arrangement auxiliary peptide segment-(enzyme cleavage site-target protein sequence-enzyme cleavage site-target protein sequence)4, repeats of Arg34-GLP-1 (9-37) (SEQ ID NO: 2) were connected in series to form the sequence shown in SEQ ID NO: 14. The cDNA sequence shown in SEQ ID NO: 8 was designed according to E. coli codon preference, with the Nde I restriction site CAT ATG added at the 5′ end and the double stop codon TAA TGA and the BamH I restriction site GGA TCC added at the 3′ end. The complete nucleotide sequence was chemically synthesized and cloned into the PUC-57 vector to obtain the recombinant plasmid PUC-57-Arg34-GLP-1 (9-37), which was transformed into E. coli Top10 and stored as a glycerol stock.

(SEQ ID NO: 14)
NH2-Met-His-His-His-His-Glu-Glu-Ala-Glu-Ala-Glu-
Ala-Arg-Gly-Lys-Arg-Glu-Gly-Thr-Phe-Thr-Ser-Asp-
Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-
Phe-Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-Lys-Arg-
Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-Leu-
Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-Ile-Ala-Trp-Leu-
Val-Arg-Gly-Arg-Gly-Lys-Arg-Glu-Gly-Thr-Phe-Thr-
Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-
Lys-Glu-Phe-Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-
Lys-Arg-Glu-Gly-Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-
Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-Ile-Ala-
Trp-Leu-Val-Arg-Gly-Arg-Gly-COOH

[0157] The recombinant plasmid PUC-57-Arg34-GLP-1 (9-37) was double-digested with Nde I and BamH I, and the target nucleotide fragment was recovered and ligated with T4 DNA ligase into plasmid pET-30a (purchased from Novagen) that had been double-digested with Nde I and BamH I. The ligation products were transformed into the cloning host E. coli Top10, and the recombinant plasmid pET-30a-Arg34-GLP-1 (9-37) was screened by restriction digestion and PCR. DNA sequencing then confirmed that the Arg34-GLP-1 (9-37) cDNA sequence in the recombinant plasmid was correct. The recombinant plasmid pET-30a-Arg34-GLP-1 (9-37) was transformed into the expression host E. coli BL21 (DE3), and engineered recombinant bacteria were obtained by expression screening. The construction of the recombinant plasmid is shown schematically in FIG. 8. FIG. 9 shows the restriction digestion analysis: digestion of plasmids 1-3 gives bands of about 5000 bp and 400 bp, corresponding to pET-30a and Arg34-GLP-1 (9-37) respectively and consistent with the theoretical values, indicating that Arg34-GLP-1 (9-37) was correctly inserted into the vector pET-30a.

EXAMPLE 3

Construction of pET-30a-Arg34-GLP-1 (11-37) Recombinant Plasmid and Engineered Recombinant Bacteria

[0158] Following the arrangement auxiliary peptide segment-(enzyme cleavage site-target protein sequence-enzyme cleavage site-target protein sequence)4, repeats of Arg34-GLP-1 (11-37) (SEQ ID NO: 3) were connected in series to form the sequence shown in SEQ ID NO: 15. The cDNA sequence shown in SEQ ID NO: 9 was designed according to E. coli codon preference, with the Nde I restriction site CAT ATG added at the 5′ end and the double stop codon TAA TGA and the BamH I restriction site GGA TCC added at the 3′ end. The complete nucleotide sequence was chemically synthesized and cloned into the PUC-57 vector to obtain the recombinant plasmid PUC-57-Arg34-GLP-1 (11-37), which was transformed into E. coli Top10 and stored as a glycerol stock.

(SEQ ID NO: 15)
NH2-Met-His-His-His-His-Glu-Glu-Ala-Glu-Ala-Glu-
Ala-Arg-Gly-Lys-Arg-Thr-Phe-Thr-Ser-Asp-Val-Ser-
Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-Ile-
Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-Lys-Arg-Thr-Phe-
Thr-Ser-Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-
Ala-Lys-Glu-Phe-Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-
Gly-Lys-Arg-Thr-Phe-Thr-Ser-Asp-Val-Ser-Ser-Tyr-
Leu-Glu-Gly-Gln-Ala-Ala-Lys-Glu-Phe-Ile-Ala-Trp-
Leu-Val-Arg-Gly-Arg-Gly-Lys-Arg-Thr-Phe-Thr-Ser-
Asp-Val-Ser-Ser-Tyr-Leu-Glu-Gly-Gln-Ala-Ala-Lys-
Glu-Phe-Ile-Ala-Trp-Leu-Val-Arg-Gly-Arg-Gly-COOH

[0159] The recombinant plasmid PUC-57-Arg34-GLP-1 (11-37) was double-digested with Nde I and BamH I, and the target nucleotide fragment was recovered and ligated with T4 DNA ligase into plasmid pET-30a (purchased from Novagen) that had been double-digested with Nde I and BamH I. The ligation products were transformed into the cloning host E. coli Top10, and the recombinant plasmid pET-30a-Arg34-GLP-1 (11-37) was screened by restriction digestion and PCR. DNA sequencing then confirmed that the Arg34-GLP-1 (11-37) cDNA sequence in the recombinant plasmid was correct. The recombinant plasmid pET-30a-Arg34-GLP-1 (11-37) was transformed into the expression host E. coli BL21 (DE3), and engineered recombinant bacteria were obtained by expression screening. The construction of the recombinant plasmid is shown schematically in FIG. 10. FIG. 11 shows the restriction digestion analysis: digestion of plasmids 1-3 gives bands of about 5000 bp and 400 bp, corresponding to pET-30a and Arg34-GLP-1 (11-37) respectively and consistent with the theoretical values, indicating that Arg34-GLP-1 (11-37) was correctly inserted into the vector pET-30a.

EXAMPLE 4

Construction of pET-30a-GLP-2 Recombinant Plasmid and Engineered Recombinant Bacteria

[0160] Following the arrangement auxiliary peptide segment-(enzyme cleavage site-target protein sequence-enzyme cleavage site-target protein sequence)4, repeats of GLP-2 (SEQ ID NO: 4) were connected in series to form the sequence shown in SEQ ID NO: 16. The cDNA sequence shown in SEQ ID NO: 10 was designed according to E. coli codon preference, with the Nde I restriction site CAT ATG added at the 5′ end and the double stop codon TAA TGA and the BamH I restriction site GGA TCC added at the 3′ end. The complete nucleotide sequence was chemically synthesized and cloned into the PUC-57 vector to obtain the recombinant plasmid PUC-57-GLP-2, which was transformed into E. coli Top10 and stored as a glycerol stock.

(SEQ ID NO: 16)
NH2-Met-His-His-His-His-Glu-Glu-Ala-Glu-Ala-Glu-
Ala-Arg-Gly-Lys-Arg-His-Gly-Asp-Gly-Ser-Phe-Ser-
Asp-Glu-Met-Asn-Thr-Ile-Leu-Asp-Asn-Leu-Ala-Ala-
Arg-Asp-Phe-Ile-Asn-Trp-Leu-Ile-Gln-Thr-Lys-Ile-
Thr-Asp-Arg-Lys-Arg-His-Gly-Asp-Gly-Ser-Phe-Ser-
Asp-Glu-Met-Asn-Thr-Ile-Leu-Asp-Asn-Leu-Ala-Ala-
Arg-Asp-Phe-Ile-Asn-Trp-Leu-Ile-Gln-Thr-Lys-Ile-
Thr-Asp-Arg-Lys-Arg-His-Gly-Asp-Gly-Ser-Phe-Ser-
Asp-Glu-Met-Asn-Thr-Ile-Leu-Asp-Asn-Leu-Ala-Ala-
Arg-Asp-Phe-Ile-Asn-Trp-Leu-Ile-Gln-Thr-Lys-Ile-
Thr-Asp-Arg-Lys-Arg-His-Gly-Asp-Gly-Ser-Phe-Ser-
Asp-Glu-Met-Asn-Thr-Ile-Leu-Asp-Asn-Leu-Ala-Ala-
Arg-Asp-Phe-Ile-Asn-Trp-Leu-Ile-Gln-Thr-Lys-Ile-
Thr-Asp-COOH

[0161] The recombinant plasmid PUC-57-GLP-2 was double-digested with Nde I and BamH I, and the target nucleotide fragment was recovered and ligated with T4 DNA ligase into plasmid pET-30a (purchased from Novagen) that had been double-digested with Nde I and BamH I. The ligation products were transformed into the cloning host E. coli Top10, and the recombinant plasmid pET-30a-GLP-2 was screened by restriction digestion and PCR. DNA sequencing then confirmed that the GLP-2 cDNA sequence in the recombinant plasmid was correct. The recombinant plasmid pET-30a-GLP-2 was transformed into the expression host E. coli BL21 (DE3), and engineered recombinant bacteria were obtained by expression screening. The construction of the recombinant plasmid is shown schematically in FIG. 12. FIG. 13 shows the restriction digestion analysis: digestion of plasmids 1-3 gives bands of about 5000 bp and 480 bp, corresponding to pET-30a and GLP-2 respectively and consistent with the theoretical values, indicating that GLP-2 was correctly inserted into the vector pET-30a.

EXAMPLE 5

Construction of pET-30a-Glucagon Recombinant Plasmid and Engineered Recombinant Bacteria

[0162] Following the arrangement auxiliary peptide segment-(enzyme cleavage site-target protein sequence-enzyme cleavage site-target protein sequence)8, repeats of Glucagon (SEQ ID NO: 5) were connected in series to form the sequence shown in SEQ ID NO: 17. The cDNA sequence shown in SEQ ID NO: 11 was designed according to E. coli codon preference, with the Nde I restriction site CAT ATG added at the 5′ end and the double stop codon TAA TGA and the BamH I restriction site GGA TCC added at the 3′ end. The complete nucleotide sequence was chemically synthesized and cloned into the PUC-57 vector to obtain the recombinant plasmid PUC-57-Glucagon, which was transformed into E. coli Top10 and stored as a glycerol stock.

(SEQ ID NO: 17)
NH2-Met-His-His-His-His-Glu-Glu-Ala-Glu-Ala-Glu-
Ala-Arg-Gly-Lys-Arg-His-Ser-Gln-Gly-Thr-Phe-Thr-
Ser-Asp-Tyr-Ser-Lys-Tyr-Leu-Asp-Ser-Arg-Arg-Ala-
Gln-Asp-Phe-Val-Gln-Trp-Leu-Met-Asn-Thr-Arg-Lys-
Arg-His-Ser-Gln-Gly-Thr-Phe-Thr-Ser-Asp-Tyr-Ser-
Lys-Tyr-Leu-Asp-Ser-Arg-Arg-Ala-Gln-Asp-Phe-Val-
Gln-Trp-Leu-Met-Asn-Thr-Arg-Lys-Arg-His-Ser-Gln-
Gly-Thr-Phe-Thr-Ser-Asp-Tyr-Ser-Lys-Tyr-Leu-Asp-
Ser-Arg-Arg-Ala-Gln-Asp-Phe-Val-Gln-Trp-Leu-Met-
Asn-Thr-Arg-Lys-Arg-His-Ser-Gln-Gly-Thr-Phe-Thr-
Ser-Asp-Tyr-Ser-Lys-Tyr-Leu-Asp-Ser-Arg-Arg-Ala-
Gln-Asp-Phe-Val-Gln-Trp-Leu-Met-Asn-Thr-Arg-Lys-
Arg-His-Ser-Gln-Gly-Thr-Phe-Thr-Ser-Asp-Tyr-Ser-
Lys-Tyr-Leu-Asp-Ser-Arg-Arg-Ala-Gln-Asp-Phe-Val-
Gln-Trp-Leu-Met-Asn-Thr-Arg-Lys-Arg-His-Ser-Gln-
Gly-Thr-Phe-Thr-Ser-Asp-Tyr-Ser-Lys-Tyr-Leu-Asp-
Ser-Arg-Arg-Ala-Gln-Asp-Phe-Val-Gln-Trp-Leu-Met-
Asn-Thr-Arg-Lys-Arg-His-Ser-Gln-Gly-Thr-Phe-Thr-
Ser-Asp-Tyr-Ser-Lys-Tyr-Leu-Asp-Ser-Arg-Arg-Ala-
Gln-Asp-Phe-Val-Gln-Trp-Leu-Met-Asn-Thr-Arg-Lys-
Arg-His-Ser-Gln-Gly-Thr-Phe-Thr-Ser-Asp-Tyr-Ser-
Lys-Tyr-Leu-Asp-Ser-Arg-Arg-Ala-Gln-Asp-Phe-Val-
Gln-Trp-Leu-Met-Asn-Thr-COOH

[0163] The recombinant plasmid PUC-57-Glucagon was double-digested with Nde I and BamH I, and the target nucleotide fragment was recovered and ligated with T4 DNA ligase into plasmid pET-30a (purchased from Novagen) that had been double-digested with Nde I and BamH I. The ligation products were transformed into the cloning host E. coli Top10, and the recombinant plasmid pET-30a-Glucagon was screened by restriction digestion and PCR. DNA sequencing then confirmed that the Glucagon cDNA sequence in the recombinant plasmid was correct. The recombinant plasmid pET-30a-Glucagon was transformed into the expression host E. coli BL21 (DE3), and engineered recombinant bacteria were obtained by expression screening. The construction of the recombinant plasmid is shown schematically in FIG. 14. FIG. 15 shows the restriction digestion analysis: digestion of plasmids 1-3 gives bands of about 5000 bp and 800 bp, corresponding to pET-30a and Glucagon respectively and consistent with the theoretical values, indicating that Glucagon was correctly inserted into the vector pET-30a.

EXAMPLE 6

Construction of pET-30a-TB4 Recombinant Plasmid and Engineered Recombinant Bacteria

[0164] Following the arrangement auxiliary peptide segment-(enzyme cleavage site-target protein sequence-enzyme cleavage site-target protein sequence)4, repeats of TB4 (SEQ ID NO: 6) were connected in series to form the sequence shown in SEQ ID NO: 18. The cDNA sequence shown in SEQ ID NO: 12 was designed according to E. coli codon preference, with the Nde I restriction site CAT ATG added at the 5′ end and the double stop codon TAA TGA and the BamH I restriction site GGA TCC added at the 3′ end. The complete nucleotide sequence was chemically synthesized and cloned into the PUC-57 vector to obtain the recombinant plasmid PUC-57-TB4, which was transformed into E. coli Top10 and stored as a glycerol stock.

(SEQ ID NO: 18)
NH2-Met-His-His-His-His-Glu-Glu-Ala-Glu-Ala-Glu-
Ala-Arg-Gly-Lys-Arg-Ser-Asp-Lys-Pro-Asp-Met-Ala-
Glu-Ile-Glu-Lys-Phe-Asp-Lys-Ser-Lys-Leu-Lys-Lys-
Thr-Glu-Thr-Gln-Glu-Lys-Asn-Pro-Leu-Pro-Ser-Lys-
Glu-Thr-Ile-Glu-Gln-Glu-Lys-Gln-Ala-Gly-Glu-Ser-
Arg-Lys-Arg-Ser-Asp-Lys-Pro-Asp-Met-Ala-Glu-Ile-
Glu-Lys-Phe-Asp-Lys-Ser-Lys-Leu-Lys-Lys-Thr-Glu-
Thr-Gln-Glu-Lys-Asn-Pro-Leu-Pro-Ser-Lys-Glu-Thr-
Ile-Glu-Gln-Glu-Lys-Gln-Ala-Gly-Glu-Ser-Arg-Lys-
Arg-Ser-Asp-Lys-Pro-Asp-Met-Ala-Glu-Ile-Glu-Lys-
Phe-Asp-Lys-Ser-Lys-Leu-Lys-Lys-Thr-Glu-Thr-Gln-
Glu-Lys-Asn-Pro-Leu-Pro-Ser-Lys-Glu-Thr-Ile-Glu-
Gln-Glu-Lys-Gln-Ala-Gly-Glu-Ser-Arg-Lys-Arg-Ser-
Asp-Lys-Pro-Asp-Met-Ala-Glu-Ile-Glu-Lys-Phe-Asp-
Lys-Ser-Lys-Leu-Lys-Lys-Thr-Glu-Thr-Gln-Glu-Lys-
Asn-Pro-Leu-Pro-Ser-Lys-Glu-Thr-Ile-Glu-Gln-Glu-
Lys-Gln-Ala-Gly-Glu-Ser-COOH

[0165] The recombinant plasmid PUC-57-TB4 was double-digested with Nde I and BamH I, and the target nucleotide fragment was recovered and ligated with T4 DNA ligase into plasmid pET-30a (purchased from Novagen) that had been double-digested with Nde I and BamH I. The ligation products were transformed into the cloning host E. coli Top10, and the recombinant plasmid pET-30a-TB4 was screened by restriction digestion and PCR. DNA sequencing then confirmed that the TB4 cDNA sequence in the recombinant plasmid was correct. The recombinant plasmid pET-30a-TB4 was transformed into the expression host E. coli BL21 (DE3), and engineered recombinant bacteria were obtained by expression screening. The construction of the recombinant plasmid is shown schematically in FIG. 16. FIG. 17 shows the restriction digestion analysis: digestion of the plasmids gives bands of about 5000 bp and 600 bp, corresponding to pET-30a and TB4 respectively and consistent with the theoretical values, indicating that TB4 was correctly inserted into the vector pET-30a.
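The approximate insert bands reported in Examples 1 to 6 can be cross-checked against the fusion protein lengths. The sketch below uses target lengths, copy numbers and inter-copy linkers read from SEQ ID NOs: 13 to 18 (every construct starts with the 14-residue auxiliary peptide Met-His-His-His-His-EEAEAEARG followed by Lys-Arg; the GLP-1 analogues use Lys-Arg between copies, the other three use Arg-Lys-Arg). It assumes the cassette is CAT + ATG (Met) + remaining codons + TAA TGA + GGA TCC, that is 3n + 15 bp for an n-residue protein; the names are hypothetical. Gel bands are approximate, so agreement is only expected within a few tens of base pairs.

```python
AUX_LEN = 14          # Met-His-His-His-His + EEAEAEARG
FIRST_LINKER = "KR"   # between the auxiliary peptide and the first copy

# (construct, target length in residues, copies, linker between copies, band reported in text)
CONSTRUCTS = [
    ("Arg34-GLP-1 (7-37)", 31, 4, "KR", 450),
    ("Arg34-GLP-1 (9-37)", 29, 4, "KR", 400),
    ("Arg34-GLP-1 (11-37)", 27, 4, "KR", 400),
    ("GLP-2", 33, 4, "RKR", 480),
    ("Glucagon", 29, 8, "RKR", 800),
    ("TB4", 43, 4, "RKR", 600),
]


def fusion_length(target_len: int, copies: int, linker: str) -> int:
    """Residues in auxiliary peptide + first linker + copies + inter-copy linkers."""
    return AUX_LEN + len(FIRST_LINKER) + copies * target_len + (copies - 1) * len(linker)


def insert_bp(residues: int) -> int:
    """Assumed cassette: CAT + ATG(Met) + other codons + TAA TGA + GGA TCC."""
    return 3 + 3 * residues + 6 + 6


for name, target_len, copies, linker, reported in CONSTRUCTS:
    n = fusion_length(target_len, copies, linker)
    print(f"{name:<20} {n:>3} aa  ~{insert_bp(n)} bp  (text: ~{reported} bp)")
```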

EXAMPLE 7

Fermentation Culture of Engineered Recombinant Bacteria pET-30a-Arg34-GLP-1(7-37)/BL21(DE3), pET-30a-Arg34-GLP-1(9-37)/BL21(DE3), pET-30a-Arg34-GLP-1(11-37)/BL21(DE3), pET-30a-GLP-2/BL21(DE3), pET-30a-Glucagon/BL21(DE3) and pET-30a-TB4/BL21(DE3)

[0166] Engineered recombinant bacteria pET-30a-Arg34-GLP-1(7-37)/BL21(DE3), pET-30a-Arg34-GLP-1(9-37)/BL21(DE3), pET-30a-Arg34-GLP-1(11-37)/BL21(DE3), pET-30a-GLP-2/BL21(DE3), pET-30a-Glucagon/BL21(DE3) and pET-30a-TB4/BL21(DE3) were each streaked on LA agar plates and incubated overnight at 37° C. Cells picked from the plates were inoculated into liquid LB medium and cultured at 37° C. for 12 hours. The culture was transferred at 1% into a 1000 ml conical flask containing 200 ml of LB medium and cultured overnight at 37° C. to obtain the seed culture for the fermenter. The seed culture was inoculated at 5% into a 30 L fermenter containing YT medium and cultured at 37° C. During fermentation, dissolved oxygen was kept above 25% by adjusting the agitation speed, air flow and pure oxygen flow, and the pH was maintained at 6.5 by adding ammonia water. When the OD600 of the culture reached 50 to 80, isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to a final concentration of 0.2 mM, and fermentation was continued for a further 3 hours before being stopped. The culture was centrifuged at 8000 rpm for 10 minutes, the supernatant was discarded, and the cell pellet was collected and stored at −20° C. until use.

[0167] The SDS-PAGE analysis of induced expression in the engineered recombinant bacteria pET-30a-Arg34-GLP-1 (9-37)/BL21(DE3) is shown in FIG. 18.
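The inoculation ratios and IPTG concentration in [0166] translate into volumes by simple proportion. In this sketch the 15 l working volume of the 30 L fermenter and the 1 M IPTG stock are hypothetical values chosen for illustration, since the text states neither; the function names are also illustrative.

```python
def inoculum_ml(receiving_volume_ml: float, percent: float) -> float:
    """Seed-culture volume for an inoculation ratio given in percent."""
    return receiving_volume_ml * percent / 100.0


def stock_to_add_ml(final_mM: float, broth_ml: float, stock_mM: float) -> float:
    """Stock volume from C1*V1 = C2*V2 (ignores the small volume the addition itself adds)."""
    return final_mM * broth_ml / stock_mM


def ready_to_induce(od600: float) -> bool:
    """Induction window used in [0166]: OD600 of 50 to 80."""
    return 50.0 <= od600 <= 80.0


print(inoculum_ml(200.0, 1.0))                  # 2.0 ml into the 200 ml flask
print(inoculum_ml(15_000.0, 5.0))               # 750.0 ml for an assumed 15 l working volume
print(stock_to_add_ml(0.2, 15_000.0, 1_000.0))  # 3.0 ml of an assumed 1 M IPTG stock
print(ready_to_induce(65.0))                    # True
```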

EXAMPLE 8

Pretreatment, Enzyme Digestion and Purification of Arg34-GLP-1 (9-37)

[0168] The cell pellet of engineered recombinant bacteria pET-30a-Arg34-GLP-1 (9-37)/BL21(DE3) obtained from fermentation culture was suspended in a disruption buffer, homogenized three times at a high pressure of 600 to 700 bar, stirred at room temperature and centrifuged to collect the precipitate. The precipitate was suspended in a washing solution at a given mass-to-volume ratio and homogenized until no particles were visible. The homogenate was stirred at room temperature for 30 minutes and centrifuged to collect the precipitate, which was dissolved at 3% to 5% (mass/volume, g/mL) in an enzyme digestion buffer containing a surfactant. The mixture was adjusted to pH 10.5, stirred for 30 minutes at 28° C. to 32° C. and centrifuged to collect the supernatant. The content of the expressed fusion protein was determined by ultraviolet absorbance at 280 nm (OD280). The supernatant containing the fusion protein was adjusted to pH 8.0 to 9.0, the recombinant proteases Kex2 and CPB were added at a protease-to-fusion protein mass ratio of 1:1000, and digestion was carried out overnight at 25° C. to 35° C. with stirring. The digestion product was analyzed by RP-HPLC. For purification, a Q anion-exchange column was routinely cleaned, regenerated and equilibrated with 2 column volumes (CV) of equilibration buffer. The digestion product, adjusted to pH 9.5 to 9.8 and to a conductivity below 5 mS/cm, was loaded onto the Q column; the column was re-equilibrated with 1 CV, washed with a first eluent until the UV absorbance returned to baseline, re-equilibrated with 2 CV of equilibration buffer and then eluted with a second eluent, and the fraction containing the target peak was collected. The collected fraction was loaded onto a C4 reversed-phase column, which was equilibrated and eluted with a gradient to collect the target protein at a purity of 99% or above.

[0169] The mass spectrum (molecular weight determination) of Arg34-GLP-1 (9-37) obtained after digestion is shown in FIG. 19.
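[0168] quantifies the fusion protein by OD280 but does not state the absorption coefficient used. One common approach, sketched here as an assumption rather than the authors' method, estimates the molar absorption coefficient from the sequence with the Pace et al. coefficients (5500 per Trp, 1490 per Tyr and 125 per cystine, in M^-1 cm^-1) and converts A280 to mg/ml with the Beer-Lambert law and the average molecular mass. The sequence follows the layout of SEQ ID NO: 14, the A280 reading is hypothetical, all Cys are assumed to form cystines, and a buffer blank is needed because some surfactants absorb at 280 nm.

```python
# Average residue masses in Da (residue = amino acid minus water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528


def extinction_280(seq: str) -> int:
    """Molar absorption coefficient at 280 nm (M^-1 cm^-1), Pace et al. coefficients."""
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * (seq.count("C") // 2)


def average_mass(seq: str) -> float:
    """Average molecular mass in Da: residue masses plus one water for the termini."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mg_per_ml(a280: float, seq: str, path_cm: float = 1.0) -> float:
    """Beer-Lambert: mol/l = A / (eps * l); multiplied by g/mol this gives g/l = mg/ml."""
    return a280 / (extinction_280(seq) * path_cm) * average_mass(seq)


fusion = "MHHHHEEAEAEARG" + "".join("KR" + "EGTFTSDVSSYLEGQAAKEFIAWLVRGRG" for _ in range(4))
print(extinction_280(fusion))         # 27960 (4 Trp, 4 Tyr, no Cys)
print(round(average_mass(fusion)))    # roughly 15.4 kDa
print(round(mg_per_ml(0.5, fusion), 3))  # for a hypothetical A280 of 0.5
```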

EXAMPLE 9

In Vitro Activity Assay of Arg34-GLP-1 (9-37)

[0170] In vitro activity was assayed using recombinant CHO-K1-CRE-GLP1R cells transfected with the GLP-1 receptor (GLP-1R), obtained from PEG-BIO BIOPHARM CO., LTD. The CHO-K1-CRE-GLP1R cells were plated overnight and then stimulated with the target protein Arg34-GLP-1 (9-37) under 5% CO2 at 37° C. for 4 hours±15 minutes. A chemiluminescent substrate (Promega, Cat. No. E2510) was added at 100 μl/well, and the plate was shaken gently on an oscillator for 40 minutes±10 minutes at room temperature. The relative luciferase units (RLU) of each well were read on a microplate reader with a read time of 1 second/well. A four-parameter logistic curve was fitted with SigmaPlot software to calculate the half-maximal effective concentration (EC50) of Arg34-GLP-1 (9-37). The in vitro activity of Arg34-GLP-1 (9-37) is shown in FIG. 20.
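SigmaPlot fits all four parameters of the logistic model by nonlinear least squares. As a simpler stand-in, and not the procedure used in the text, the sketch below holds the bottom and top plateaus fixed and estimates EC50 and the Hill slope h by linear regression on logit-transformed responses, using ln((top - y)/(y - bottom)) = h ln(EC50) - h ln(x). The dose-response data are synthetic and noise-free, generated from hypothetical parameters (not the results in FIG. 20), so the fit recovers them exactly; with real RLU data a full nonlinear fit is preferable.

```python
import math


def four_pl(x: float, bottom: float, top: float, ec50: float, hill: float) -> float:
    """Four-parameter logistic dose-response curve, rising from bottom to top."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)


def estimate_ec50(doses, responses, bottom: float, top: float):
    """Linearized 4PL fit with fixed plateaus; returns (EC50, Hill slope)."""
    xs, ys = [], []
    for x, y in zip(doses, responses):
        if bottom < y < top:  # points on or beyond a plateau cannot be linearized
            xs.append(math.log(x))
            ys.append(math.log((top - y) / (y - bottom)))
    if len(xs) < 2:
        raise ValueError("need at least two points between the plateaus")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    hill = -slope                               # slope of the line is -h
    ec50 = math.exp((my - slope * mx) / hill)   # intercept is h * ln(EC50)
    return ec50, hill


# Synthetic, noise-free RLU values from hypothetical parameters.
doses_nM = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
rlu = [four_pl(d, 500.0, 20000.0, 0.08, 1.1) for d in doses_nM]
ec50, hill = estimate_ec50(doses_nM, rlu, bottom=500.0, top=20000.0)
print(f"EC50 = {ec50:.4f} nM, Hill slope = {hill:.2f}")  # EC50 = 0.0800 nM, Hill slope = 1.10
```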

EXAMPLE 10

In Vitro Activity Assay of GLP-2

[0171] In vitro activity was assayed using recombinant CHO-K1-CRE-GLP2R cells transfected with the GLP-2 receptor (GLP-2R), obtained from PEG-BIO BIOPHARM CO., LTD. The CHO-K1-CRE-GLP2R cells were plated overnight and then stimulated with the target protein GLP-2 under 5% CO2 at 37° C. for 4 hours±15 minutes. A chemiluminescent substrate (Promega, Cat. No. E2510) was added at 100 μl/well, and the plate was shaken gently on an oscillator for 40 minutes±10 minutes at room temperature. The relative luciferase units (RLU) of each well were read on a microplate reader with a read time of 1 second/well. A four-parameter logistic curve was fitted with SigmaPlot software to calculate the half-maximal effective concentration (EC50) of GLP-2. The in vitro activity of GLP-2 is shown in FIG. 21.

[0172] Some illustrative experiments conducted during development of the present method are also described to show its advantages. The experimental methods and results are presented in the examples below, which show that the present method performs significantly better than the method of the comparative examples.

COMPARATIVE EXAMPLE 1

[0173] During development of the present method, different expression-promoting sequences in the auxiliary peptide segment of the fusion protein were investigated in order to increase the expression level of the fusion protein. The screening is described in detail below.

[0174] Fusion proteins with and without the expression-promoting sequence EEAEAEARG (SEQ ID NO: 21) were designed, expressed by induced fermentation culture and cleaved enzymatically to obtain the target protein sequences. The results are as follows.

[0175] (a) The expression levels of fusion proteins with or without the expression-promoting peptide EEAEAEARG (SEQ ID NO: 21) are shown in FIG. 22.

[0176] Conclusion: after 4 hours of induction, the fusion proteins containing EEAEAEARG (SEQ ID NO: 21) show a higher expression level than the fusion proteins without EEAEAEARG (SEQ ID NO: 21).

[0177] (b) The solubility of fusion proteins with or without the expression-promoting peptide EEAEAEARG (SEQ ID NO: 21) is shown in FIG. 23.

[0178] Conclusion: the supernatant of disrupted cells expressing the fusion protein with the expression-promoting peptide EEAEAEARG (SEQ ID NO: 21) contains more fusion protein than the supernatant of disrupted cells expressing the fusion protein without EEAEAEARG (SEQ ID NO: 21).

[0179] (c) The enzyme cleavage efficiency of fusion proteins with or without the expression-promoting peptide EEAEAEARG (SEQ ID NO: 21) is shown in FIG. 24.

[0180] Conclusion: the enzyme cleavage efficiency is 96.6% for the fusion protein containing EEAEAEARG (SEQ ID NO: 21) and 62.3% for the fusion protein without it, so the fusion protein containing EEAEAEARG (SEQ ID NO: 21) is cleaved more efficiently.

[0181] The introduced protease recognition sites (KR) consist entirely of basic amino acids, which greatly raises the isoelectric point of the fusion protein and in turn impairs both its expression and its solubility during subsequent purification. The acidic glutamic acid (E) residues in the expression-promoting sequence EEAEAEARG (SEQ ID NO: 21) offset this rise in isoelectric point, thereby increasing expression of the fusion protein, improving its digestion efficiency and raising the yield of the target proteins.
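The isoelectric-point argument in [0181] can be illustrated with a Henderson-Hasselbalch estimate. The sketch below uses one commonly quoted textbook set of pKa values (an assumption; other sets shift the absolute pI values) and compares three hypothetical variants built from Arg34-GLP-1 (7-37): four copies without linkers, four KR-linked copies without EEAEAEARG, and the SEQ ID NO: 13 layout with EEAEAEARG. Only the ordering of the estimates is meaningful here, not their absolute values.

```python
# One commonly quoted textbook pKa set (assumption).
PKA_BASIC = {"K": 10.5, "R": 12.48, "H": 6.0}
PKA_ACIDIC = {"D": 3.65, "E": 4.25, "C": 8.18, "Y": 10.07}
PKA_N_TERM, PKA_C_TERM = 9.69, 2.34


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a linear peptide at the given pH."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    positive += sum(seq.count(aa) / (1.0 + 10 ** (ph - pka)) for aa, pka in PKA_BASIC.items())
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    negative += sum(seq.count(aa) / (1.0 + 10 ** (pka - ph)) for aa, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH; net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


TARGET = "HAEGTFTSDVSSYLEGQAAKEFIAWLVRGRG"  # Arg34-GLP-1 (7-37)
targets_only = "MHHHH" + TARGET * 4                                            # hypothetical
kr_no_promoter = "MHHHH" + "".join("KR" + TARGET for _ in range(4))            # hypothetical
kr_with_promoter = "MHHHHEEAEAEARG" + "".join("KR" + TARGET for _ in range(4))  # SEQ ID NO: 13 layout

for name, s in [("no KR linkers", targets_only),
                ("KR, no EEAEAEARG", kr_no_promoter),
                ("KR + EEAEAEARG", kr_with_promoter)]:
    print(f"{name:<18} pI = {isoelectric_point(s):.2f}")
```

Adding the KR linkers raises the estimate and adding EEAEAEARG lowers it again, which is the direction described in [0181].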

[0182] In the description of this specification, reference to the terms "an embodiment", "some embodiments", "one embodiment", "an example", "an illustrative example", "some examples" or the like means that a particular feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. Thus, these terms as used in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described can be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples described in this specification, and features of those embodiments or examples, provided that they do not contradict each other.

[0183] Although the embodiments of the present disclosure have been shown and described above, it should be understood that these embodiments are exemplary and should not be construed as limiting the present disclosure. A person of ordinary skill in the art may make changes, modifications, substitutions and variations to the embodiments within the scope of the present disclosure.

CPC classification: C07K14/605; C12Y304/21061; C12N15/62; C12N15/70; C12Y304/22; C12P21/06; C12R2001/19; C07K2319/50 (CHEMISTRY; METALLURGY)

International classification: C07K14/605; C12P21/06 (CHEMISTRY; METALLURGY)
