METHOD FOR ENZYMATICALLY MODIFYING THE TRI-DIMENSIONAL STRUCTURE OF A PROTEIN

Abstract

A structurally-modified recombinant protein, obtained by a method comprising generating at least one genetic construct comprising a nucleotide sequence coding for the protein comprising a recognition sequence; expressing in a host the at least one genetic construct using a vector comprising the at least one genetic construct; and using a plant-based expression system with the vector to express the protein, the plant-based expression system being a plant or a plant cell suspension; wherein the recognition sequence comprises a sequence Phe-x1-x2-Tyr, wherein Phe is phenylalanine, x1 and x2 are amino acid residues, and Tyr is tyrosine, and the plant-based expression system has an inherent enzymatic activity which converts the phenylalanine residue of the recognition sequence into a didehydrophenylalanine residue, producing a structurally-modified recombinant protein; and isolating from the plant-based expression system the protein comprising the recognition sequence, in which the phenylalanine has been converted to didehydrophenylalanine.

Claims

1. A structurally-modified recombinant protein obtained by a method for producing the structurally-modified recombinant protein, comprising the steps of: (a) generating at least one genetic construct comprising a nucleotide sequence coding for the protein comprising a recognition sequence; (b) expressing in a host the at least one genetic construct using a vector comprising the at least one genetic construct; and using a plant-based expression system with the vector to express the protein, the plant-based expression system being a plant or a plant cell suspension; wherein the recognition sequence comprises a sequence Phe-x1-x2-Tyr, wherein Phe is phenylalanine, x1 and x2 are amino acid residues, and Tyr is tyrosine, and the plant-based expression system has an inherent enzymatic activity which converts the phenylalanine residue of the recognition sequence into a didehydrophenylalanine residue, resulting in the structurally-modified recombinant protein; and at least one subsequent step of (c) isolating from the plant-based expression system the protein comprising the recognition sequence as a part of the protein, wherein the phenylalanine has been converted to a didehydrophenylalanine.

2. The structurally-modified recombinant protein according to claim 1, wherein x1 and x2 are polar hydroxyl-containing amino acids and/or basic amino acids.

3. The structurally-modified recombinant protein according to claim 2, wherein x1 and x2 are the hydroxyl-containing amino acids Thr and Ser, the basic amino acids Lys and Arg and the amide Asn, together accounting for from 70% to 80% of the amino acids found at the positions x1 and x2 of the sequence Phe-x1-x2-Tyr.

4. The structurally-modified recombinant protein according to claim 2, wherein x1 and x2 are the hydroxyl-containing amino acids Thr and Ser, the basic amino acids Lys and Arg and the amide Asn, together accounting for from 70% to 76% of the amino acids found at the positions x1 and x2 of the sequence Phe-x1-x2-Tyr.

5. The structurally-modified recombinant protein according to claim 2, wherein x1 and x2 are the hydroxyl-containing amino acids Thr and Ser, the basic amino acids Lys and Arg and the amide Asn, together accounting for from 75% to 80% of the amino acids found at the positions x1 and x2 of the sequence Phe-x1-x2-Tyr.

6. The structurally-modified recombinant protein according to claim 1, wherein the structurally-modified protein is used as a biocatalyst.

7. The structurally-modified recombinant protein according to claim 1, wherein the structurally-modified protein is used as a protein therapeutic.

8. The structurally-modified recombinant protein according to claim 1, wherein the plant-based expression system is based on at least one plant belonging to the rosid clade.

9. The structurally-modified recombinant protein according to claim 8, wherein the rosids comprise fabids or malvids.

10. The structurally-modified recombinant protein according to claim 1, wherein the plant-based expression system is based on Medicago sativa, Arabidopsis thaliana and/or Cannabis sativa.

11. The structurally-modified recombinant protein according to claim 1, which is a lipase or insulin.

Description

DRAWINGS

[0061] FIG. 1 is an exemplary representation of the MS/MS spectrum of the peptide represented by SEQ-ID NO:1.

[0062] FIG. 2 is an exemplary table listing the peaks of the MS/MS spectrum of FIG. 1.

[0063] FIG. 3 exemplarily shows the bioinformatic analysis of 512 recognition sequences from βPG proteins of plant species covering the entire kingdom Plantae. These sequences were obtained from the NCBI database, and the analysis and graphical representation were generated with WebLOGO (https://weblogo.berkeley.edu/logo.cgi). Phe accounts for 100% of the residues at position 1 and Tyr for 100% of the residues at position 4. The figure shows the dominance of Thr, Ser, Lys, Arg and Asn at positions 2 and 3, which correspond to x1 and x2 in the notation Phe-x1-x2-Tyr.

DETAILED DESCRIPTION

[0064] The invention proposes incorporating the sequence Phe-x1-x2-Tyr at critical positions in recombinant proteins. After enzymatic modification, the phenylalanine of this sequence provides a conformationally restrained bending point in the 3D structure of the protein.

[0065] The sequence Phe-x1-x2-Tyr, hereafter defined as the recognition sequence, is targeted by an enzymatic activity inherently present in plant cells, which converts the Phe of the sequence into the structure-defining amino acid derivative didehydrophenylalanine, as described below.

[0066] Based on current scientific knowledge, the beta subunit of polygalacturonase (βPG), the recognition sequence Phe-x1-x2-Tyr and the formation of didehydrophenylalanine from Phe in this recognition sequence are universal among all organisms of the taxonomic kingdom Plantae.

[0067] βPG is part of a plant-specific group of proteins that contain the plant-specific BURP domain (NCBI Conserved Domain Database entry cl03923) at their C-terminus (Hattori, J et al. A conserved BURP domain defines a novel group of plant proteins with unusual primary structures. Mol Gen Genet 259, 424-428 (1998). https://doi.org/10.1007/s004380050832).

[0068] A gene coding for βPG is found in all sequenced plant genomes.

[0069] βPG is synthesized as a three-domain precursor: an N-terminal domain containing a signal peptide and a pro-peptide, a central domain composed of repeats of 14 amino acids starting with the sequence Phe-x1-x2-Tyr, and a C-terminal BURP domain of unknown function that is nevertheless essential for phenotypic effects (Park J et al. AtPGL3 is an Arabidopsis BURP domain protein that is localized to the cell wall and promotes cell enlargement. Front Plant Sci. 2015; 6: 412. doi:10.3389/fpls.2015.00412). Bioinformatic analysis of current sequence databases shows that domains composed of repeated sequences starting with Phe-x1-x2-Tyr are found only in plant βPG.

[0070] In the plant cell wall, the subcellular location where βPG has its physiological function, only the central domain is found, thus forming the active protein. Amino acid analysis of this active protein detects only 2 of the 23 Phe residues expected from the genome sequence, whereas all other amino acids are found in the numbers expected from the genome sequence. Furthermore, peptide sequencing by Edman degradation returns a blank cycle wherever the genome sequence predicts a Phe residue (Zheng L et al. The beta subunit of tomato fruit polygalacturonase isoenzyme 1: isolation, characterization, and identification of unique structural features. Plant Cell. 1992; 4(9): 1147-1156. doi:10.1105/tpc.4.9.1147). Detailed analysis of the active domain of tomato βPG (NCBI entry Q40161.1, residues 110-412) shows that, of the 23 Phe residues expected from the genome sequence, only two are not located in a Phe-x1-x2-Tyr sequence; these correspond to the two Phe residues quantified by amino acid analysis.
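The reasoning in [0070] can be checked on any protein sequence by partitioning its Phe residues into those that start a Phe-x1-x2-Tyr motif and those that do not; under the model described here, only the latter would still be detected as Phe by amino acid analysis. The following is a minimal Python sketch under that assumption. The function name `partition_phe` and the toy sequence are hypothetical illustrations; the sequence is not the tomato βPG entry Q40161.1.

```python
import re

# A Phe starts a recognition sequence when it is followed by any two residues
# and then Tyr (F-x1-x2-Y). The zero-width lookahead also catches overlapping motifs.
MOTIF = re.compile(r"(?=F..Y)")

def partition_phe(seq: str) -> tuple[int, int]:
    """Return (Phe inside a motif, Phe outside any motif) for a one-letter sequence."""
    seq = seq.upper()
    motif_starts = {m.start() for m in MOTIF.finditer(seq)}
    phe_positions = [i for i, aa in enumerate(seq) if aa == "F"]
    inside = sum(1 for i in phe_positions if i in motif_starts)
    return inside, len(phe_positions) - inside

# Hypothetical toy sequence: two motif Phe (FTSY, FKNY) and one isolated Phe.
toy = "MAFTSYGGFKNYLLFA"
modified, detectable = partition_phe(toy)
print(f"Phe expected as didehydro-Phe: {modified}; Phe expected as Phe: {detectable}")
```

Applied to a real βPG sequence, the second number is the count that amino acid analysis would be expected to report.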

[0071] Modifying an amino acid changes the chromatographic retention time of the derivatives generated for its identification in Edman degradation and for its quantification in amino acid analysis. These observations therefore indicate that all Phe residues in the sequence Phe-x1-x2-Tyr are modified, whereas Phe residues in other sequence contexts are not.

[0072] The modification was identified as a loss of 2 Da from the Phe residue (FIG. 1), which, based on the chemical structure of Phe, can only be attributed to the formation of a double bond between the alpha- and beta-carbons of Phe, i.e. the formation of didehydrophenylalanine. This modification causes the change in chromatographic retention time and hence the failure to identify and quantify these residues by Edman degradation and amino acid analysis, respectively.
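The 2 Da shift follows directly from the elemental composition of the two residues. As a worked check using standard monoisotopic atomic masses (values not given in the source: C 12.000000, H 1.007825, N 14.003074, O 15.994915 Da), the Phe residue (C9H9NO) and the didehydrophenylalanine residue (C9H7NO), which lacks one hydrogen on each of the alpha- and beta-carbons, differ by exactly one H2:

```latex
\begin{aligned}
m(\mathrm{Phe},\ \mathrm{C_9H_9NO}) &= 9(12.000000) + 9(1.007825) + 14.003074 + 15.994915 = 147.06841\ \mathrm{Da}\\
m(\Delta\mathrm{Phe},\ \mathrm{C_9H_7NO}) &= 9(12.000000) + 7(1.007825) + 14.003074 + 15.994915 = 145.05276\ \mathrm{Da}\\
\Delta m &= 2(1.007825) = 2.01565\ \mathrm{Da}
\end{aligned}
```

Rounded to nominal masses these are the 147 Da and 145 Da residue masses referred to for FIG. 1.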

[0073] The modification is found in βPG homologues from different plant species (Arabidopsis, maize, Cannabis sativa, Medicago sativa), and βPG has never been identified without this modification. Together with the omnipresence of βPG in plants, the conservation of its sequence in all plant taxa, the impact of didehydro amino acids on protein fold and thus structure, and the fact that functional βPG is essential for plant growth, this indicates that Phe residues in the sequence Phe-x1-x2-Tyr in βPG carry the same modification in all plant species.

[0074] No proteins homologous to βPG are found outside plant taxa, nor has the modification been found in searches of proteins from organisms not classified as plants.

[0075] The presence of a rare modification on one specific amino acid of one specific protein resembles the known case of diphthamide (Liu S et al. Diphthamide modification on eukaryotic elongation factor 2 is needed to assure fidelity of mRNA translation and mouse development. Proc Natl Acad Sci USA 2012, 109 (34), 13817-13822. https://doi.org/10.1073/pnas.1206933109).

[0076] Although their expression increases during exposure to stress (Ding X. et al. Genome-wide identification of BURP domain-containing genes in rice reveals a gene family with diverse structures and responses to abiotic stresses. Planta. 2009; 230(1):149-163. doi:10.1007/s00425-009-0929-z), βPG genes are expressed at all stages of plant development (Liu H. et al. Overexpression of stress-inducible OsBURP16, the beta subunit of polygalacturonase 1, decreases pectin content and cell adhesion and increases abiotic stress sensitivity in rice. Plant Cell Environ. 2014; 37(5):1144-1158. doi:10.1111/pce.12223). The impact of didehydro amino acids on protein fold and structure, together with the need for βPG for plant growth, indicates that the as yet unidentified enzymatic activity is inherently present in all plants at all stages of development, although it is induced during exposure to stress.

[0077] The repeated Phe-x1-x2-Tyr recognition sequences of 100 βPG proteins found in NCBI were analyzed bioinformatically. Significantly overrepresented at positions x1 and x2 of the recognition sequence are the hydroxyl-containing amino acids Thr and Ser, the basic amino acids Lys and Arg and the amide Asn, which together account for 76% of the amino acids found at these positions. Significantly underrepresented at x1 and x2 are the aromatic amino acids Phe and Tyr, the branched hydrophobic amino acids Ile, Leu and Val, the acidic amino acids Asp and Glu, and the amino acids Met and His. The small hydrophobic amino acid Gly is found at its expected frequency, whereas Cys, Pro and Trp are completely absent from the analyzed Phe-x1-x2-Tyr sequences.
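The 76% figure in [0077] is the share of all pooled x1 and x2 positions occupied by Thr, Ser, Lys, Arg or Asn. A minimal Python sketch of that calculation, assuming the recognition sequences have already been extracted as four-letter one-letter-code strings. The input list is a hypothetical illustration built from a few motifs cited in this description (FSGY, FVSY, FNSY, FKAY, FTTY); it is not the NCBI data set analysed in the source, and the function name is likewise hypothetical.

```python
from collections import Counter

# Residues reported as overrepresented at x1/x2: Thr, Ser, Lys, Arg, Asn.
FAVOURED = set("TSKRN")

def favoured_fraction(motifs: list[str]) -> float:
    """Fraction of pooled x1 and x2 positions occupied by T, S, K, R or N."""
    middle = Counter()
    for m in motifs:
        if len(m) != 4 or m[0] != "F" or m[3] != "Y":
            raise ValueError(f"not a Phe-x1-x2-Tyr motif: {m!r}")
        middle.update(m[1:3])  # count x1 and x2 together
    total = sum(middle.values())
    return sum(middle[aa] for aa in FAVOURED) / total if total else 0.0

# Hypothetical input: motifs taken from the example sequences in this description.
motifs = ["FSGY", "FVSY", "FNSY", "FKAY", "FTTY"]
print(f"{favoured_fraction(motifs):.0%}")  # 7 of 10 middle residues -> 70%
```

Pooling x1 and x2 matches the single 76% figure of [0077]; computing the two positions separately would instead give per-position values such as those reported for FIG. 3.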

[0078] A typical Phe-x1-x2-Tyr recognition sequence is Phe-(Thr/Lys)-(Ser/Asn)-Tyr, but variation in x1 and x2 does not impede the conversion of the Phe in the sequence Phe-x1-x2-Tyr to didehydrophenylalanine (see SEQ-ID NO:1-6).

[0079] The conformational determination to which this invention relates originates from the enzymatic removal of two hydrogen atoms from the alpha- and beta-carbons of the phenylalanine (Phe) in the recognition sequence, which creates a double bond between these carbons. This conversion is catalyzed by an enzymatic activity inherently present in plant cell-based expression systems, as detailed above.

[0080] First, dynamic modelling of the stabilized recombinant protein (hereafter designated the “product”) and of variants thereof is carried out to identify the molecular form with the highest stability whose enzymatic properties are similar to or better than those of the wild-type protein. The product must include the recognition sequence for the modifying enzyme, including the phenylalanine residue that is to be converted to didehydrophenylalanine.

[0081] The recognition sequence Phe-x1-x2-Tyr consists of a phenylalanine residue and a tyrosine residue separated by two other amino acid residues, x1 and x2, which are predominantly polar hydroxyl-containing and/or basic amino acids, as set out above. Because the modification is specific to the recognition sequence, both the Phe at position 1 and the Tyr at position 4 are essential for the modification to occur.
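The recognition rule in [0081] (Phe, any two residues, Tyr) maps directly onto a pattern search over a protein sequence. A minimal Python sketch, assuming plain one-letter protein sequences; the function name and the toy sequence are hypothetical. The `offset` parameter lets the reported numbering follow the position of the first residue within a larger protein, so that sites are written in the same Phe800-Ser-Gly-Tyr803 style used in [0083].

```python
import re

# Standard one-letter to three-letter amino acid codes.
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

def find_recognition_sites(seq: str, offset: int = 1) -> list[str]:
    """List Phe-x1-x2-Tyr sites as 'PheN-x1-x2-TyrM' (1-based numbering by default)."""
    sites = []
    # The capturing group inside a lookahead returns overlapping motifs as well.
    for m in re.finditer(r"(?=(F..Y))", seq.upper()):
        start = m.start() + offset
        x1, x2 = (THREE.get(aa, "Xaa") for aa in m.group(1)[1:3])
        sites.append(f"Phe{start}-{x1}-{x2}-Tyr{start + 3}")
    return sites

print(find_recognition_sites("GGFSGYAAFVSY"))
```

Because only positions 1 and 4 are constrained, the search deliberately accepts any residue at x1 and x2, including the less common Ala and Gly cases discussed later in this description.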

[0082] SEQ-ID NO:1 is part of the protein sequence of the beta subunit of polygalacturonase (alfalfa contig 53863). This part of the protein sequence was identified by mass spectrometry, in particular tandem mass spectrometry (MS/MS). General information about the whole protein sequence can be retrieved at http://plantgrn.noble.org/AGED/.

[0083] In SEQ-ID NO:1, the sequences of interest in which the Phe is modified in planta are (1) Phe800-Ser-Gly-Tyr803 and (2) Phe814-Val-Ser-Tyr817.

[0084] FIG. 1 shows the MS/MS spectrum of the peptide represented by SEQ-ID NO:1. The residues labeled dF in the spectrum correspond to didehydrophenylalanine (ΔPhe), with a residue mass of 145 Da compared with the residue mass of 147 Da of unmodified phenylalanine.

[0085] Both recognition sequences (1) and (2) have thus been identified. FIG. 1 indicates both the y-ion series (fragment ions that contain the C-terminus of the peptide) and the b-ion series (fragment ions that contain the N-terminus of the peptide).

[0086] FIG. 2 shows a table of the matching peaks of the MS/MS spectrum given in FIG. 1. The fragment ions b2, b3 and y20, y19 correspond to ΔPhe (or dF) from recognition sequence (1) Phe800-Ser-Gly-Tyr803. The fragment ions b16, b17 and y5, y6 correspond to ΔPhe (or dF) from recognition sequence (2) Phe814-Val-Ser-Tyr817. These fragments show the 145 Da residue mass of ΔPhe instead of the 147 Da expected for unmodified Phe.
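The MS/MS evidence rests on simple fragment-mass arithmetic: every b- or y-ion that contains the converted residue is about 2.016 Da lighter than it would be with Phe, while ions that do not contain it are unchanged. The Python sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses (values not given in the source), treating "dF" as Phe minus two hydrogen atoms. The peptide used is hypothetical and is not SEQ-ID NO:1, whose sequence is not reproduced in this section.

```python
# Monoisotopic residue masses (Da); "dF" is didehydrophenylalanine (Phe minus H2).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
RESIDUE["dF"] = RESIDUE["F"] - 2 * 1.007825  # loss of two hydrogen atoms
PROTON = 1.007276   # mass of a proton (charge carrier)
WATER = 18.010565   # H2O retained by y-ions

def b_ions(residues: list[str]) -> list[float]:
    """Singly charged b-ion m/z values b1..b(n-1), built from the N-terminus."""
    out, total = [], 0.0
    for r in residues[:-1]:
        total += RESIDUE[r]
        out.append(total + PROTON)
    return out

def y_ions(residues: list[str]) -> list[float]:
    """Singly charged y-ion m/z values y1..y(n-1), built from the C-terminus."""
    out, total = [], 0.0
    for r in reversed(residues[1:]):
        total += RESIDUE[r]
        out.append(total + WATER + PROTON)
    return out

plain = ["A", "F", "S", "G", "Y", "K"]      # hypothetical peptide
modified = ["A", "dF", "S", "G", "Y", "K"]  # same peptide with didehydro-Phe
print(f"b2 with Phe: {b_ions(plain)[1]:.4f}; with dF: {b_ions(modified)[1]:.4f}")
```

In this toy peptide only b2 onwards and y5 carry the shift; locating the first b-ion and the first y-ion that show it brackets the modified residue, which is the same logic as the paired b- and y-ions listed in [0086].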

[0087] To confirm the results obtained by mass spectrometry, the MASCOT software, which identifies proteins by interpreting mass spectrometry data, was used.

[0088] The MASCOT database search returned a highly significant match between the spectrum and the peptide sequence containing ΔPhe, with a MASCOT score of 148 (scores above 47 being considered significant) and an expectation value of 9.3e−12.
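For orientation, Mascot reports an ions score as S = −10 log10(P), where P is the probability that the observed match between spectrum and sequence is a random event (this definition is taken from the Mascot documentation, not from the source). Under that definition, the reported score and significance threshold translate into probabilities as follows:

```latex
\begin{aligned}
S &= -10\,\log_{10} P \quad\Longrightarrow\quad P = 10^{-S/10}\\
S = 148 &\;\Rightarrow\; P = 10^{-14.8} \approx 1.6 \times 10^{-15}\\
S = 47 &\;\Rightarrow\; P = 10^{-4.7} \approx 2.0 \times 10^{-5}
\end{aligned}
```

On this scale, each 10 score units correspond to a tenfold lower probability of a random match, so a score of 148 lies about ten orders of magnitude beyond the threshold score of 47.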

[0089] Using this approach, the conversion of Phe to didehydrophenylalanine in the sequence Phe-x1-x2-Tyr was confirmed for the following sequences. SEQ-ID NO:2 gives the complete sequence of the βPG protein from alfalfa with the reference alfalfa contig Medtr8g064530. SEQ-ID NO:3, extracted from this sequence, contains the sequences Phe192-Asn-Ser-Tyr195 and Phe206-Lys-Ala-Tyr209, in both of which the Phe was found to be converted to didehydrophenylalanine. SEQ-ID NO:4, SEQ-ID NO:5 and SEQ-ID NO:6 are extracted from alfalfa contig 53863. SEQ-ID NO:4 contains the sequences Phe102-Thr-Thr-Tyr104, Phe116-Thr-Ser-Tyr119 and Phe130-Gly-Asn-Tyr133, and the Phe in each of these recognition sequences was observed to be converted to didehydrophenylalanine. Of the five Phe-x1-x2-Tyr sequences found in SEQ-ID NO:5 and SEQ-ID NO:6, four confirm the dominance of Ser, Thr, Lys and Asn at the x1 and x2 positions of the recognition sequence.

[0090] The recognition sequence Phe277-Ala-Gly-Tyr280 (SEQ-ID NO:6) does not contain these amino acids at the x1 or x2 position, yet its Phe was nonetheless identified as converted to didehydrophenylalanine. This illustrates that only the Phe at position 1 and the Tyr at position 4 are essential for the Phe at position 1 to be recognized as the residue to be converted.

[0091] General information about the protein sequence alfalfa contig Medtr8g064530 (SEQ-ID NO:2 and SEQ-ID NO:3) can be retrieved on http://plantgrn.noble.org/Legume|Pv2/.

[0092] General information about the protein sequence alfalfa contig 53863 (SEQ-ID NO:4, SEQ-ID NO:5 and SEQ-ID NO:6) can be retrieved on http://plantgrn.noble.org/AGED/.

[0093] To introduce the structure-determining modification didehydrophenylalanine into a recombinant protein, genetic constructs of the target protein containing the recognition sequence need to be created and expressed in a plant cell-based expression system.

[0094] The genetic constructs can be generated using techniques for site-directed mutagenesis. These include classical genetic modification through molecular techniques (Mardanova et al. Efficient Transient Expression of Recombinant Proteins in Plants by the Novel pEff Vector Based on the Genome of Potato Virus X. Front Plant Sci. 2017, 8, 247. doi:10.3389/fpls.2017.00247). A similar approach allows the generation of synthetic genes containing the recognition sequence (Jaynes et al. Plant protein improvement by genetic engineering: use of synthetic genes. Trends Biotech, 1986, 4(12), 314-320. doi:10.1016/0167-7799(86)90183-6). Current state-of-the-art genome-editing approaches such as the CRISPR/Cas9 system likewise allow recombinant proteins containing the recognition sequence to be generated by inserting into a gene a nucleotide sequence coding for the recognition sequence (Ma X. et al. CRISPR/Cas9 platforms for genome editing in plants: Developments and applications. Mol Plant, 2016, 9(7) 961-974. doi:10.1016/j.molp.2016.04.009).
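At the DNA level, inserting the recognition sequence into a gene amounts to adding four in-frame codons to the coding sequence. The Python sketch below illustrates only that bookkeeping step, assuming an in-frame insertion before a chosen codon. The helper names, the toy coding sequence and the codon choice (TTC ACC AGC TAC for Phe-Thr-Ser-Tyr) are hypothetical; codon optimisation for the plant host, cloning and vector design are not addressed.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO))

def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))

def insert_motif(cds: str, codon_index: int, motif_dna: str = "TTCACCAGCTAC") -> str:
    """Insert the motif codons in frame before codon `codon_index` (0-based)."""
    if len(motif_dna) % 3:
        raise ValueError("motif would shift the reading frame")
    cut = 3 * codon_index
    return cds[:cut] + motif_dna + cds[cut:]

toy_cds = "ATGGCTGCTTAA"            # hypothetical: Met-Ala-Ala-stop
edited = insert_motif(toy_cds, 2)   # insert before the second Ala
print(translate(toy_cds), "->", translate(edited))  # MAA* -> MAFTSYA*
```

Translating the edited sequence back is a cheap check that the motif is in frame and that no premature stop codon has been introduced.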

[0095] The genetic constructs thus generated, coding for recombinant proteins that have the recognition sequence Phe-x1-x2-Tyr in their sequence, are expressed in a plant cell-based expression system (i.e., plants, plant tissues and/or plant cell cultures) using a constitutive or inducible promoter.

[0096] Since the modifying enzyme that converts phenylalanine into didehydrophenylalanine is inherently active in the plant cell-based expression system, the Phe in the sequence Phe-x1-x2-Tyr of recombinant proteins containing this recognition sequence is converted into didehydrophenylalanine.

[0097] The modifying enzyme converts phenylalanine into didehydrophenylalanine, thereby determining the tri-dimensional structure of the recombinant protein, stabilizing the protein fold and making it less sensitive to changes in the environment, such as temperature, pH and solvent composition, as encountered during isolation, storage and use of recombinant proteins.

[0098] The product, i.e. the structurally determined protein containing the recognition sequence with a didehydrophenylalanine instead of Phe at the first position of the recognition sequence, can be isolated from the culture matrix (plants, plant tissue cultures or plant cell cultures) using a pull-down approach with antibodies or other techniques currently used in the art to isolate recombinant proteins. Analytical techniques known to the person skilled in the art can also be employed to determine the structure and to check the stability of the structurally modified protein; for instance, mass spectrometry, fluorescence assays and ELISA, among many others, may be used.

[0099] By this in situ approach, the stabilized recombinant protein can be obtained directly.

[0100] This process is suitable for stabilizing proteins comprising a large number of amino acids. It yields a structure-determined modified protein, the modification lying in the tri-dimensional structure of the protein. It is thus a process for the stabilization and functional customization of proteins through the incorporation of a stable, conformation-determining amino acid into a protein sequence.

[0101] Alternatively, subject to the identification, characterization and isolation of the modifying enzymatic activity that converts Phe in the recognition sequence into didehydrophenylalanine, recombinant proteins containing the recognition sequence can be produced in non-plant protein production systems and activated by incubation with the modifying enzyme. Using current state-of-the-art techniques, recombinant proteins can be produced in prokaryotic and eukaryotic cell cultures which, based on current knowledge, lack the enzymatic activity that converts Phe in the sequence Phe-x1-x2-Tyr into didehydrophenylalanine. After a recombinant protein containing the recognition sequence Phe-x1-x2-Tyr has been isolated from a non-plant expression system, it can be stored in an inactive fold. Incubating such a recombinant protein with the currently unknown enzyme that converts Phe in the recognition sequence into didehydrophenylalanine will change the fold of the recombinant protein. This would allow proteins such as insulin or lipase to be produced, stored and distributed in an inactive structure and then activated by incubation with the currently unknown modifying enzyme.

[0102] Such an ex situ approach would allow the production of recombinant proteins to be decoupled in time and space from their application.

[0103] FIG. 3 shows the bioinformatic analysis of 512 recognition sequences from βPG proteins of plant species covering the entire kingdom Plantae. These sequences were obtained from the NCBI database, and the analysis and graphical representation were generated with WebLOGO (https://weblogo.berkeley.edu/logo.cgi). More precisely, FIG. 3 shows the occurrence, in percent, of the most prevalent amino acids at positions 1 to 4 of the sequence Phe-x1-x2-Tyr, here defined as the recognition sequence, whose Phe at position 1 is converted to didehydrophenylalanine. Positions 1 and 4 are always Phe and Tyr, respectively (100%). Thr, Ser, Lys, Arg and Asn dominate positions 2 and 3 (x1 and x2 in the notation Phe-x1-x2-Tyr): together, these five amino acids represent 77% of the amino acids found at position 2 (x1) and 76.4% of those found at position 3 (x2).
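A sequence logo such as FIG. 3 is built from a position frequency matrix. The Python sketch below computes, for aligned four-residue motifs, the per-position residue frequencies and an information content of log2(20) − H bits, where H is the Shannon entropy of that position. This is a simplified version of what WebLOGO plots: it omits WebLOGO's small-sample correction, and the input motifs and function name are hypothetical, not the 512 NCBI-derived sequences.

```python
import math
from collections import Counter

def position_profile(motifs: list[str]) -> list[tuple[dict[str, float], float]]:
    """Per position: (residue frequencies, information content in bits)."""
    if not motifs or len({len(m) for m in motifs}) != 1:
        raise ValueError("need a non-empty set of equal-length motifs")
    profile = []
    for column in zip(*motifs):  # one tuple of residues per motif position
        counts = Counter(column)
        n = len(column)
        freqs = {aa: c / n for aa, c in counts.most_common()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        profile.append((freqs, math.log2(20) - entropy))
    return profile

motifs = ["FTSY", "FKNY", "FTNY", "FSGY"]  # hypothetical input
for pos, (freqs, bits) in enumerate(position_profile(motifs), start=1):
    top = ", ".join(f"{aa} {p:.0%}" for aa, p in freqs.items())
    print(f"position {pos}: {top} ({bits:.2f} bits)")
```

Fully conserved positions such as 1 (Phe) and 4 (Tyr) reach the maximum of log2(20), about 4.32 bits, while the variable x1 and x2 positions score lower, which is why they appear as shorter, mixed stacks in a logo.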

[0104] The stabilized recombinant protein produced in situ or ex situ may be used in different systems as biocatalysts (e.g. production of biodiesel by lipases, biomass valorisation, lignin cleavage, etc.) or in protein therapeutics (e.g. stabilized forms of insulin, stabilized forms of antibodies, etc.).

[0105] In this recombinant protein, the presence of didehydrophenylalanine results in a determined, stabilized fold; its exact position and structure must be determined for each individual target protein by modelling and computational analysis before the genetic construct coding for the recombinant protein is built. The structural constraints imposed by didehydrophenylalanine in an amino acid sequence are known and have been studied intensively (Crisma et al. J. Am. Chem. Soc. 1999, 121, 14, 3272-3278 https://doi.org/10.1021/ja9842114; Gupta et al. Biopolymers. 2011; 95(3): 161-173. doi:10.1002/bip.21561).

[0106] Chemical peptide synthesis was used by Menting et al. (PNAS 2014, 111(33) E3395-E3404) to generate insulin analogues with didehydrophenylalanine at positions 24 and 25 of the B chain, and these analogues show new functional and potentially therapeutic properties. However, the cost of chemical peptide synthesis precludes its use for producing more than milligram quantities of protein, and its per-cycle efficiency of less than 100% limits it to fold-stabilized proteins of fewer than 100 amino acids. According to the invention, these limitations are overcome by using the protein synthesis capacity of plants and plant cells. The structure of the structurally-modified recombinant insulin is identical to that disclosed in Menting et al.
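The length limit in [0106] follows from compounding: if each coupling cycle succeeds with efficiency e, the fraction of full-length chains after assembling n residues (n − 1 couplings onto the first residue) is e^(n−1). A short Python illustration; the efficiencies used are illustrative assumptions, not values from the source or from Menting et al., and the function name is hypothetical.

```python
def full_length_fraction(cycle_efficiency: float, n_residues: int) -> float:
    """Fraction of chains that are full length after n_residues - 1 couplings."""
    if not 0.0 <= cycle_efficiency <= 1.0 or n_residues < 1:
        raise ValueError("efficiency must be in [0, 1] and n_residues >= 1")
    return cycle_efficiency ** (n_residues - 1)

# Illustrative per-cycle efficiencies and chain lengths.
for eff in (0.99, 0.995):
    for n in (50, 100, 200):
        frac = full_length_fraction(eff, n)
        print(f"efficiency {eff:.1%}, {n} residues: {frac:.1%} full length")
```

Even at 99% per cycle, only about 37% of chains reach full length at 100 residues, and the remainder are truncated or deletion products that must be separated from the target. Ribosomal synthesis in plant cells does not suffer this compounding loss in the same way.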

[0107] Owing to their substrate and product selectivity, recombinant proteins have numerous applications as biocatalysts, but these remain limited by stability and durability issues. Approaches to overcome this have been proposed (e.g. Cejudo-Sanches et al. Process Biochemistry 2020, 92, 156-163 https://doi.org/10.1016/j.procbio.2020.02.026); the inclusion of didehydrophenylalanine, resulting from the conversion of Phe in the sequence Phe-x1-x2-Tyr within a recombinant protein, is an alternative means of attaining such stabilization. Based on modelling and computational analysis, the sequence Phe-x1-x2-Tyr is inserted into a genetic construct coding for the desired stabilized protein; this genetic construct is expressed in a system inherently having the enzymatic activity that converts Phe in the sequence Phe-x1-x2-Tyr into didehydrophenylalanine (based on current scientific knowledge, a plant or a plant cell); and the biocatalyst with a stabilized fold is isolated from the expression host using current practices and applied as a biocatalyst. Groups of recombinant proteins targeted for stabilization include lipases, proteases and nucleotide ligases.


CPC classification

C12N9/20
C12Y302/01015
C12Y301/01003
C12N15/8251
C12N9/2402
C07K14/62

Section: CHEMISTRY; METALLURGY

International classification

C12N15/82
C07K14/62
C12N9/20

Section: CHEMISTRY; METALLURGY
