ENGINEERED DNA MOLECULE FOR CODING RNA
20250215441 · 2025-07-03
Assignee
Inventors
Cpc classification
C12N2830/50
CHEMISTRY; METALLURGY
C12N2710/20022
CHEMISTRY; METALLURGY
C40B40/08
CHEMISTRY; METALLURGY
International classification
C40B40/08
CHEMISTRY; METALLURGY
Abstract
An engineered DNA molecule capable of being replicated in a cell, comprising a poly (A) tail coding sequence that makes the engineered DNA molecule more conservative when replicated in cells, particularly in prokaryotic cells, while adjusting the expression level of RNA in eukaryotic cells. Also provided are an RNA comprising the poly (A) tail and a use thereof.
Claims
1-51. (canceled)
52. An engineered DNA molecule capable of being replicated in a cell, comprising a polyadenosine tail (Poly A tail) coding sequence, wherein the poly (A) tail coding sequence comprises: a single element a and at least one element b, and at least one element c and/or at least one element d: the element a consists of a plurality of consecutive adenine (A) nucleotides, and the length range of the element a is ≥20 nt and <80 nt; the element b consists of a plurality of consecutive A nucleotides, and the length range of the element b is 3 nt ≤ b < 20 nt; the element c consists of a non-A nucleotide, and the nucleotide is selected from T, C and G nucleotides; the element d consists of any two or more consecutive nucleotides, and the nucleotides are selected from A, T, C and G nucleotides, wherein the nucleotides at a 5′ terminus and a 3′ terminus of the element d are not A nucleotides, and the element d does not comprise 3 or more consecutive A nucleotides; and the length range of the element d is 2 nt ≤ d ≤ 20 nt; wherein the element a and the element b are not adjacent, and the element c and the element d are not adjacent, and the poly (A) tail coding sequence does not comprise any two elements b that are adjacent to each other, does not comprise any two elements c that are adjacent to each other, and does not comprise any two elements d that are adjacent to each other.
53. The DNA molecule according to claim 52, wherein the length of the poly (A) tail coding sequence is 101-200 nt.
54. The DNA molecule according to claim 52, wherein the 3′ terminus of the poly (A) tail coding sequence is an A nucleotide or a non-A nucleotide.
55. The DNA molecule according to claim 52, wherein 50% or more of the polynucleotides of element a are located in the 5′ portion or 3′ portion of the poly (A) tail coding sequence.
56. The DNA molecule according to claim 52, wherein the element c is G.
57. The DNA molecule according to claim 52, wherein the element d comprises a palindromic sequence.
58. The DNA molecule according to claim 52, wherein the element d is any one or more selected from the following sequences: GATATC (SEQ ID NO: 15), GTATAC (SEQ ID NO: 16), GAATCT (SEQ ID NO: 17), GCATATGACT (SEQ ID NO: 18) and GATATCGTATAC (SEQ ID NO: 19).
59. The DNA molecule according to claim 52, wherein the number of the element d is 0-5.
60. The DNA molecule according to claim 52, wherein when the element c and the element d exist simultaneously, the total number of the element c and the element d is 2-15.
61. The DNA molecule according to claim 52, wherein the 3′ portion of the poly (A) tail coding sequence comprises one or more non-A nucleotides.
62. The DNA molecule according to claim 52, wherein the structure of the poly (A) tail coding sequence is: element a-element c-element b-element c-element b-element c-element b-element c-element b; element b-element c-element b-element c-element a-element d-element b-element c-element b-element c-element b; element b-element c-element b-element c-element b-element d-element a-element c; element a-element d-element b-element c-element b-element c-element b; or element b-element c-element b-element c-element b-element d-element a.
63. The DNA molecule according to claim 52, wherein the structure of the poly (A) tail coding sequence is: 1) element a-element d-element b-element c-element b-element c-element b; the element a is 60 nt in length, the element b is 16-19 nt in length, and the element d is 6 nt in length, or 2) element b-element c-element b-element c-element b-element d-element a; and the element a is 60 nt in length, the element b is 16-19 nt in length, and the element d is 6 nt in length.
64. The DNA molecule according to claim 52, wherein the poly (A) tail coding sequence is represented by any one of SEQ ID NOs: 1-10.
65. The DNA molecule according to claim 52, wherein it is further connected to a gene of interest fragment on the 5′ end of the poly (A) tail coding sequence, and the gene of interest fragment and the poly (A) tail coding sequence co-encode RNA.
66. The DNA molecule according to claim 52, further comprising a replicon, a resistance gene, and/or a promoter for initiating transcription of the RNA.
67. The DNA molecule according to claim 52, wherein the gene of interest fragment comprises a 5′ UTR coding sequence, and/or a 3′ UTR coding sequence.
68. The DNA molecule according to claim 52, wherein the gene of interest fragment comprises a protein coding sequence or a non-protein coding sequence.
69. The DNA molecule according to claim 52, comprising a replicon, an antibiotic resistance gene, a promoter, a 5′ UTR coding sequence, a protein coding sequence, and a 3′ UTR coding sequence.
70. The DNA molecule according to claim 52, wherein the protein coding sequence encodes an HPV (human papillomavirus) protein.
71. The DNA molecule according to claim 52, wherein the protein coding sequence encodes HPV E2, E6 or E7 protein, a fusion protein of E6 and E7 protein polypeptide fragments, or a fusion protein of E2, E6 and E7 protein polypeptide fragments.
72. The DNA molecule according to claim 52, comprising a polynucleotide sequence represented by any one of SEQ ID NOs: 22-25, or a synonymous mutant of the polynucleotide sequence represented by any one of SEQ ID NOs: 22-25, or a polynucleotide sequence sharing more than 85% sequence identity with the polynucleotide sequence represented by any one of SEQ ID NOs: 22-25 or a synonymous mutant thereof.
73. A cell comprising the DNA molecule according to claim 52.
74. The cell according to claim 73, wherein the cell is a prokaryotic cell.
75. An RNA molecule encoded by the DNA molecule according to claim 52.
76. The RNA molecule according to claim 75, further comprising a 5′-cap structure, and/or some or all of the uridines in the RNA are chemically modified uridines.
77. A DNA coding sequence for the poly (A) tail according to claim 52.
78. A library comprising the DNA molecule according to claim 52.
79. A library comprising an RNA molecule encoded by the DNA molecule according to claim 52.
80. A method for regulating protein expression, comprising: introducing a plurality of DNAs in the DNA library according to claim 48 into target cells at different times and/or in different quantitative ratios.
81. A hybrid molecule of DNA and RNA, comprising the same genetic information as the DNA molecule according to claim 52.
Description
BACKGROUND OF THE INVENTION
[0090] The present application firstly provides a method for stably amplifying a poly (A) tail transcription template DNA in vitro, so as to reduce the mutation frequency of the poly (A) tail transcription template sequence when the DNA is replicated in large quantities in a cell. Thus, a large amount of RNA comprising a poly (A) tail with a defined sequence is obtained based on the DNA. On this basis, RNA with a poly (A) tail that is engineered to have a specific function, such as a mRNA, can be produced on a large scale through in vitro fermentation.
[0091] In addition, the present application also provides a DNA comprising a poly (A) tail transcription template that can be stably amplified in vitro, and an RNA transcribed from the DNA. Furthermore, under the premise of satisfying in vitro stable amplification, the present application further provides a group of poly (A) tails having different regulatory effects on RNA stability and/or expression efficiency, an RNA comprising the poly (A) tail, a DNA comprising the poly (A) tail coding sequence, and a library consisting of the poly (A) tail, the RNA or the DNA.
[0092] Furthermore, the present application also provides uses of the aforementioned poly (A) tail, RNA, DNA, and library.
Terms
[0093] As used herein, "element a", "element b", "element c", "element d", and "element e" are types of elements comprised in the poly (A) tail. The element a consists of a plurality of consecutive adenine (A) nucleotides, and the length range of element a is ≥20 nt; the element b consists of a plurality of consecutive A nucleotides, and the length range of element b is 3 nt ≤ b < 20 nt; the element c consists of a non-A nucleotide, and the nucleotide is selected from T, C and G nucleotides; the element d consists of any two or more consecutive nucleotides, and the nucleotides are selected from A, T, C and G nucleotides, wherein the nucleotides at the 5′ terminus and 3′ terminus of element d are not A nucleotides, the element d does not comprise 3 or more consecutive A nucleotides, and the length range of element d is 2 nt ≤ d ≤ 20 nt. The element e consists of one or two consecutive A nucleotides; when it exists, the element e is located at the 3′ end of the poly (A) tail coding sequence and is adjacent to the element d or the element c. When the poly (A) tail comprises two or more elements b, elements c, or elements d, the sequences of any two elements b may be the same or different, the sequences of any two elements c may be the same or different, and the sequences of any two elements d may be the same or different, as long as they each meet the above definitions of elements a, b, c and d. In the present application, the element a, element b, element c, element d, element e, etc. in the poly (A) tail may be referred to by the term "element".
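As an informal illustration of the element definitions above, the following Python sketch splits a tail coding sequence into its elements. The function name `tokenize_tail` and the regular-expression approach are assumptions made for illustration and are not part of the application. The sketch labels runs of 3 to 19 consecutive A as element b and longer runs as element a, and it does not enforce any upper limit on element a.

```python
import re

def tokenize_tail(seq):
    """Split a poly(A) tail coding sequence (5'->3') into (element, subsequence) pairs.

    Hypothetical helper: runs of 3+ A become element a (>= 20 nt) or b (< 20 nt);
    the stretches between runs become element c (1 nt) or d (2-20 nt);
    one or two trailing A after a c/d element become element e.
    """
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and use only A, C, G, T")
    pieces = [p for p in re.split(r"(A{3,})", seq) if p]
    tokens = []
    for index, piece in enumerate(pieces):
        if re.fullmatch(r"A{3,}", piece):
            tokens.append(("a" if len(piece) >= 20 else "b", piece))
            continue
        trailing_a = ""
        if index == len(pieces) - 1:
            # Only the 3' end may carry one or two extra A (element e).
            core = piece.rstrip("A")
            trailing_a, piece = piece[len(core):], core
        if not piece or piece[0] == "A" or piece[-1] == "A" or len(piece) > 20:
            raise ValueError(f"cannot classify spacer {piece!r}")
        tokens.append(("c" if len(piece) == 1 else "d", piece))
        if trailing_a:
            tokens.append(("e", trailing_a))
    return tokens

# Illustrative input only: GATATC is one of the element d options listed in claim 58.
tail = "A" * 60 + "GATATC" + "A" * 19 + "G" + "A" * 19 + "G" + "A" * 17
layout = "-".join(kind for kind, _ in tokenize_tail(tail))
```

For this illustrative input, `layout` is `a-d-b-c-b-c-b`. Because every run of three or more A is consumed greedily, the split is unambiguous under these definitions.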
[0094] As used herein, when the positional relationship of two or more elements is described as "non-adjacent", it means that none of these elements is directly next to another. In other words, between any two of these elements there is at least one nucleotide or base that belongs to neither element.
[0095] As used herein, "encoding/coding" means i) a DNA sequence comprising genetic information that can be transcribed into an RNA molecule, and/or ii) an RNA molecule comprising genetic information that can be translated into an amino acid sequence. As used herein, therefore, the "coding sequence" refers to a ribonucleotide (RNA) sequence or a fragment thereof in a mRNA precursor or a mature mRNA that can be translated into a protein, and also refers to a complementary sequence or a fragment thereof of a deoxyribonucleotide (DNA) sequence that serves as a template for transcribing the mRNA precursor or mature mRNA. In addition, the coding sequence of the present application may further comprise polynucleotide sequences encoding proteins, functional nucleic acids, or fragments thereof, such as miRNA, shRNA, dsRNA, guide RNA, poly (A) tail, 5′ UTR, 3′ UTR, etc. Among them, a DNA molecule comprising genetic information that can be transcribed into an RNA molecule is called the coding nucleic acid of the RNA molecule; and an RNA molecule comprising genetic information that can be translated into an amino acid sequence is called the coding nucleic acid of the amino acid sequence.
[0096] In the present application, the nucleotides in all polynucleotide sequences are numbered from the 5′ end to the 3′ end, that is, the nucleotide at the 5′ terminus is the first nucleotide, and the nucleotide at the 3′ terminus is the last nucleotide. Unless otherwise specified, "5′ end" and "5′ terminus" can be used interchangeably, as can "3′ end" and "3′ terminus". "5′ end" and "3′ end" focus on describing the relative positional relationship between nucleotides, between nucleotide sequence segments, or between nucleotides and nucleotide sequence segments in the same nucleic acid sequence; "5′ terminus" and "3′ terminus" are used to describe the positions of the first and last nucleotides of a nucleic acid sequence or a segment of a nucleic acid sequence, respectively. "5′ end side" is used to describe the relative positional relationship between two non-overlapping sequences in the same polynucleotide sequence: when a sequence is described as being located on the 5′ end side of another sequence, it is closer to the 5′ end of the polynucleotide sequence than the other sequence. Similarly, when a sequence is described as being located on the 3′ end side of another sequence, it is closer to the 3′ end of the polynucleotide sequence than the other sequence, and the two sequences do not overlap. Specifically, for example, the statement that the DNA coding sequence of the poly (A) tail is located at the 3′ end of the RNA coding sequence means that the DNA coding sequence of the poly (A) tail, as a component of the RNA coding sequence, comprises the nucleotides at the 3′ terminus of the RNA coding sequence. In addition, as used herein, the "5′ portion" refers to the half of the polynucleotide sequence nearest the 5′ terminus, bounded by the center position of the polynucleotide sequence. The "3′ portion" refers to the half of the polynucleotide sequence nearest the 3′ terminus, bounded by the center position of the polynucleotide sequence. The number of nucleotides from the center position, as described in this application, to the 5′ end is equal to the number of nucleotides from the center position to the 3′ end.
[0097] As used herein, when referring to the replication of a nucleic acid molecule, the term "conservative" means a low probability of mutation during the replication process. In this context, "conservative" is a relative concept. For example, the statement that the DNA coding sequence of the poly (A) tail makes the replication of the RNA-coding DNA molecule more conservative in the host cell concerns the probability with which a parent DNA molecule encoding the RNA replicates into an offspring DNA molecule of identical sequence. If the RNA molecule comprises the Poly (A) tail, the offspring DNA molecule has a higher probability of 100% sequence identity with the parent DNA molecule than is the case for the coding DNA of an RNA molecule that does not comprise the Poly (A) tail (such as an RNA molecule comprising some other Poly (A) tail); likewise, the average sequence identity between the parent DNA molecule and the multiple offspring DNA molecules obtained by its replication is higher.
[0098] In the present application, when describing regulating the expression of an RNA molecule, "regulating" means increasing or decreasing the total amount of the protein or functional RNA expressed by the RNA molecule within the same time period, or enabling the RNA to express the protein or functional RNA over a longer or shorter time period, where the increase or decrease, or the longer or shorter time period, is relative to another RNA molecule expressing the same protein or functional RNA. When describing regulating protein expression, it means regulating the expression of RNA molecules comprising the protein coding sequence. The regulatory effect described herein can be achieved by connecting the poly (A) tail of the present application to the 3′ terminus of an RNA molecule that does not comprise a poly (A) tail, or by substituting the original poly (A) tail of the RNA with the poly (A) tail of the present application.
[0099] As used herein, the percentage of identity, for example 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, or 99.5% identity, refers to the degree of similarity between amino acid sequences or between nucleotide sequences, as determined by sequence alignment, which is 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, or 99.5%. For example, after two sequences have identical residues at as many positions as possible by introducing gaps, the ratio of the number of positions occupied by the identical bases or amino acid residues to the total number of positions is determined. The percentage of identity can be determined by using software programs known in the art. Preferably the alignment is performed by using default parameters. A preferred alignment program is BLAST. Preferred programs are BLASTN and BLASTP. Details of these programs are available at the following: ncbi.nlm.nih.gov/cgi-bin/BLAST.
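The ratio described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the BLAST procedure itself: it assumes the two sequences have already been aligned (gaps written as `-`) and it assumes that "the total number of positions" means the number of alignment columns, gap columns included. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: the denominator is the number of alignment columns,
    and a column containing a gap never counts as identical.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)

# Illustrative sequences only: one mismatch in ten aligned positions.
score = percent_identity("ACGTACGTAC", "ACGTACGAAC")
```

Other conventions exist (for example, excluding gap columns from the denominator), so a value computed this way may differ slightly from the figure a given alignment program reports.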
[0100] As used herein, nucleic acid complementarity refers to the ability of one nucleic acid to form hydrogen bonds with another nucleic acid through traditional Watson-Crick base-pairing. Percent complementarity refers to the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (i.e., Watson-Crick base-pairing) with another nucleic acid molecule (e.g., about 5, 6, 7, 8, 9, 10 out of 10 correspond to about 50%, 60%, 70%, 80%, 90%, and 100% complementarity, respectively). "Completely complementary" means that all consecutive residues of a nucleic acid sequence will form hydrogen bonds with the same number of consecutive residues in a second nucleic acid sequence. As used herein, "substantially complementary" refers to a degree of complementarity of any one of at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions. For a single base or a single nucleotide, according to the Watson-Crick base-pairing principle, when A is paired with T or U, and C is paired with G or I, it is called complementary or matching, and vice versa; any other base-pairing is called non-complementary. In the present application, the complementary polynucleotide sequence of a certain polynucleotide sequence refers to a polynucleotide sequence that is completely complementary to the certain polynucleotide sequence.
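The percent complementarity described above can be illustrated with a short Python sketch. It assumes both strands are written 5′ to 3′, have equal length, and pair in antiparallel orientation without gaps; the pairing table follows the pairs named in this paragraph (A with T or U, C with G or I). The names `PAIRS` and `percent_complementarity` are hypothetical.

```python
# Pairs named in the paragraph above, listed in both directions.
PAIRS = {
    ("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"), ("C", "I"), ("I", "C"),
}

def percent_complementarity(strand, other):
    """Percentage of residues in `strand` that pair with `other`.

    Assumption: both strands are given 5'->3', so `other` is read in
    reverse to line it up antiparallel to `strand`.
    """
    if len(strand) != len(other) or not strand:
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        (x, y) in PAIRS
        for x, y in zip(strand.upper(), reversed(other.upper()))
    )
    return 100.0 * paired / len(strand)

# Illustrative strands only: 8 of 10 residues can pair.
value = percent_complementarity("AAAAAAAAAA", "TTTTTTTTGG")
```

With 8 of 10 residues pairing, `value` is 80.0, which matches the "8 out of 10 corresponds to about 80%" reading of the paragraph.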
[0101] As used herein, a conservative substitution variant of a protein, polypeptide or amino acid sequence refers to one in which one or more amino acid residues are subjected to amino acid substitution without altering the overall conformation and function of the protein or enzyme, including but not limited to substituting the amino acids in the amino acid sequence of the parent protein in the manner described by the aforementioned conservative substitution. Therefore, the similarity between two proteins or amino acid sequences with similar functions may be different. For example, 70-99% similarity (identity) based on the MEGALIGN algorithm. The conservative substitution variants also comprise polypeptides or enzymes with 60% or more amino acid identity as determined by BLAST or FASTA algorithms, preferably 75% or more, more preferably 85% or more, and most preferably even 90% or more, and have the same or substantially similar properties or functions as the native or parent protein or enzyme.
[0102] In the context of this application, the terms DNA and RNA refer to single-stranded or double-stranded DNA or RNA molecules. Unless otherwise indicated, the terms "DNA" and "DNA molecule" refer to double-stranded DNA molecules composed of A, C, G and/or T nucleotides, and the terms "RNA" and "RNA molecule" refer to single-stranded RNA molecules composed of A, C, G and/or U nucleotides. Herein, the A, C, G, T and U nucleotides refer to nucleotides comprising adenine, cytosine, guanine, thymine and uracil as respective nitrogenous bases.
[0103] RNA molecules comprise coding RNA or non-coding RNA (ncRNA), such as pre-mRNA, mature mRNA, or long non-coding RNA (lncRNA).
[0104] As used herein, the hybrid molecule of DNA and RNA is a molecule comprising a polynucleotide sequence consisting of deoxyribonucleotides and ribonucleotides. The hybrid molecule of DNA and RNA can be obtained by the following method:
[0105] substituting one or more deoxyribonucleotides in the DNA with ribonucleotides;
[0106] substituting one or more ribonucleotides in the RNA with deoxyribonucleotides; or
[0107] de novo synthesizing by using deoxyribonucleotides and ribonucleotides as raw materials through biological or chemical synthesis.
It should be noted that the method of obtaining a hybrid molecule of DNA and RNA is not limited to the above methods, and a hybrid molecule of DNA and RNA obtained by any method belongs to the category of a hybrid molecule of DNA and RNA defined in this application.
[0108] As used herein, when describing two nucleic acid molecules as having the same genetic information, it means that the two nucleic acid molecules are complementary, or comprise exactly the same base sequence, or one nucleic acid molecule having exactly the same base sequence as another nucleic acid molecule can be obtained by converting one or more thymines in the base sequence of one nucleic acid molecule into uracil. Therefore, any two of the DNA, RNA, and a hybrid molecule of DNA and RNA can have the same genetic information. Herein, the term "base sequence" refers to the order of arrangement of bases in a polynucleotide molecule. Unless otherwise specified, those skilled in the art will understand that thymine may be indicated by T when the base sequence or polynucleotide sequence described in the present application is used to describe a DNA sequence, while the T will be substituted by U (uracil) when the base sequence or polynucleotide sequence is used to describe RNA (such as mRNA). Therefore, whenever a DNA is disclosed by a specific sequence number (SEQ ID NO) herein, an RNA (e.g. mRNA or poly (A) tail) sequence complementary or corresponding to the DNA is also disclosed, wherein each T in the DNA sequence is substituted by a U.
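The three cases named in this paragraph (identical base sequence, identical after converting T to U, or complementary) can be sketched as a simple check. This is an illustrative reading only: the helper names are hypothetical, both sequences are assumed to be written 5′ to 3′, and "complementary" is taken to mean completely complementary over the full length in antiparallel orientation.

```python
def to_rna(seq):
    """Write a base sequence in the RNA alphabet by replacing each T with U."""
    return seq.upper().replace("T", "U")

def reverse_complement_rna(seq):
    """Completely complementary strand of an RNA-alphabet sequence, 5'->3'."""
    return to_rna(seq).translate(str.maketrans("ACGU", "UGCA"))[::-1]

def same_genetic_information(seq_x, seq_y):
    """True if the sequences are identical up to T/U or completely complementary."""
    x, y = to_rna(seq_x), to_rna(seq_y)
    return x == y or x == reverse_complement_rna(y)

# Illustrative fragment only: a DNA template piece and its RNA counterpart.
dna_fragment = "AAAGATATCAAA"
rna_fragment = to_rna(dna_fragment)
```

Normalizing both inputs to one alphabet first keeps the comparison symmetric, so the same function covers DNA against RNA, DNA against DNA, and mixed DNA and RNA hybrid sequences.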
Poly (A) Tail and Uses Thereof
[0109] As used herein, the term "Poly A tail" or "Poly (A) sequence" refers to an uninterrupted or non-interrupted sequence of adenylate residues typically located at the 3′-terminus of an RNA molecule. In RNA, in the presence of a 3′-UTR, the Poly-A sequence is linked to the 3′ end of the 3′-UTR. An uninterrupted Poly-A tail is characterized by consecutive adenylate residues. The Poly-A tail can be of any length. In some embodiments, the Poly-A tail comprises, or consists of, at least 20, at least 30, at least 40, at least 80 or at least 100, and at most 500, at most 400, at most 300, at most 200 or at most 150 adenylate nucleotides (A), in particular about 120 A. Typically, the vast majority of nucleotides in the Poly-A tail are adenosines, wherein the vast majority refers to at least 75%, at least 80%, at least 85%, at least 90% of the nucleotides, etc., but the remaining nucleotides are allowed to be nucleotides other than A (non-A nucleotides), such as U (uridylic acid), G (guanylic acid) or C (cytidylic acid).
[0110] In some embodiments, the in vitro preparation process of the RNA is a prokaryotic fermentation process, that is, the coding nucleic acid of the RNA molecule comprising the poly (A) tail is introduced into prokaryotic cells, and the prokaryotic cells are amplified to achieve the purpose of amplifying the coding nucleic acid, and then the amplified coding nucleic acid is transcribed into the RNA. In some embodiments, the in vitro preparation process of the RNA is to connect an RNA fragment comprising a protein coding sequence to a poly (A) tail by homologous recombination, enzyme digestion and ligation, or other non-homologous recombination methods, and the poly (A) tail is prepared by a prokaryotic fermentation process. During the prokaryotic fermentation process, the coding nucleic acid comprising the poly (A) tail is introduced into prokaryotic cells, and the prokaryotic cells are amplified to achieve the purpose of amplifying the coding nucleic acid; subsequently, the amplified coding nucleic acid is transcribed into an RNA comprising the poly (A) tail. In some embodiments, the aforementioned coding nucleic acid is linear. In some embodiments, the aforementioned coding nucleic acid is circular. In some embodiments, the aforementioned coding nucleic acid is a plasmid. In some embodiments, the aforementioned coding nucleic acid is single-stranded or double-stranded. In some embodiments, the aforementioned coding nucleic acid is chemically modified before being introduced into a prokaryotic cell. In some embodiments, the aforementioned coding nucleic acid is chemically synthesized before being introduced into a prokaryotic cell. In some embodiments, the coding nucleic acid is inserted into the nucleoid/karyoid genomic DNA of the prokaryotic cell. In some embodiments, the coding nucleic acid is free in the cytoplasm or outside the nucleoid/karyoid of the prokaryotic cell. In some embodiments, the prokaryotic cell is E. coli.
[0111] Based on this, the present application provides a series of poly (A) tails, which are highly conserved during the in vitro preparation of RNA. The poly (A) tail comprises one or more non-A nucleotides at one or more positions.
[0112] In some embodiments, the poly (A) tail coding sequence comprises:
[0113] a single element a and at least one element b, and at least one element c;
[0114] a single element a and at least one element b, and at least one element d; or
[0115] a single element a and at least one element b, and at least one element c and at least one element d,
[0116] wherein the element a and element b are not adjacent, the element c and element d are not adjacent, and
[0117] the poly (A) tail coding sequence does not comprise any two elements b adjacent to each other, does not comprise any two elements c adjacent to each other, and does not comprise any two elements d adjacent to each other.
[0118] In some embodiments, the poly (A) tail coding sequence further comprises a single element e, wherein the element e consists of one or two consecutive A, which is located at the 3′ terminus of the poly (A) tail coding sequence, and is adjacent to element d or element c.
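The composition and adjacency rules listed above can be checked mechanically once a tail is written as a string of element letters. The following Python sketch is one possible way to do so; the function name `layout_problems` and the one-letter-per-element encoding are assumptions made for illustration.

```python
def layout_problems(layout):
    """Return a list of rule violations for a layout such as "adbcbcb".

    Each character names one element (a, b, c, d or e), read 5'->3'.
    Hypothetical helper, not taken from the application.
    """
    problems = []
    if set(layout) - set("abcde"):
        problems.append("unknown element letter")
    if layout.count("a") != 1:
        problems.append("exactly one element a is required")
    if "b" not in layout:
        problems.append("at least one element b is required")
    if "c" not in layout and "d" not in layout:
        problems.append("at least one element c or d is required")
    for left, right in zip(layout, layout[1:]):
        if {left, right} in ({"a", "b"}, {"c", "d"}):
            problems.append(f"{left} and {right} must not be adjacent")
        elif left == right and left in ("b", "c", "d"):
            problems.append(f"two elements {left} must not be adjacent")
    if "e" in layout:
        ok = (
            layout.count("e") == 1
            and layout.endswith("e")
            and layout[-2:-1] in ("c", "d")
        )
        if not ok:
            problems.append("element e must be single, 3'-terminal, and follow c or d")
    return problems

# Layout a-d-b-c-b-c-b written with one letter per element.
issues = layout_problems("adbcbcb")
```

An empty list means the layout satisfies every rule checked here. Element lengths are not checked by this sketch, since the layout string carries only the order of the elements.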
[0119] The poly (A) tail herein can be a segment of RNA, or a hybrid molecule of DNA and RNA.
[0120] The present application also provides a poly (A) tail that can regulate protein expression level. In addition, the present application also provides a poly (A) tail that can regulate protein expression level while remaining highly conserved during the in vitro preparation process. In some embodiments, the poly (A) tail that can regulate protein expression level, or the poly (A) tail that can regulate protein expression level while remaining highly conserved during the in vitro preparation process, is selected from the group consisting of:
[0121] element a-element c-element b-element c-element b-element c-element b-element c-element b;
[0122] element b-element c-element b-element c-element a-element d-element b-element c-element b-element c-element b;
[0123] element b-element c-element b-element c-element b-element d-element a-element c;
[0124] element a-element d-element b-element c-element b-element c-element b; and
[0125] element b-element c-element b-element c-element b-element d-element a.
[0126] In some embodiments, the poly (A) tail that can regulate protein expression level, or the poly (A) tail that can regulate protein expression level while remaining highly conserved during the in vitro preparation process, is selected from the group consisting of:
[0127] 60A-G-19A-G-19A-G-19A-G-3A;
[0128] 7A-C-18A-G-60A-element d-7A-C-18A-G-14A;
[0129] 60A-element d-19A-G-19A-G-17A;
[0130] 19A-G-19A-G-19A-element d-60A;
[0131] 19A-G-19A-G-19A-element d-60A-G;
[0132] 19A-G-19A-G-19A-element d-60A;
[0133] 19A-C-19A-C-19A-element d-60A; and
[0134] 19A-T-19A-T-19A-element d-60A.
[0135] In some embodiments, the poly (A) tail that can regulate protein expression level, or the poly (A) tail that can regulate protein expression level while remaining highly conserved during the in vitro preparation process, is: 60A-element d-19A-G-19A-G-17A or 19A-G-19A-G-19A-element d-60A.
[0136] Particularly, the two elements connected by "-" are directly connected, and there is no nucleotide between the two elements.
[0137] In the above-mentioned structure of the poly (A) tail, "yA" represents the number of consecutive A in element a or element b, wherein y is a natural number; for example, "19A" means that it comprises 19 consecutive A, and "60A" means that it comprises 60 consecutive A.
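The shorthand explained above can be expanded into an explicit nucleotide string with a small helper. The following Python sketch is illustrative only; the function name `expand_structure` and its `element_d` parameter are hypothetical, and the element d sequence GATATC used in the example is simply one of the options listed in claim 58, not necessarily the one used in any particular SEQ ID NO.

```python
import re

def expand_structure(structure, element_d=None):
    """Expand shorthand such as "60A-G-19A-G-3A" into a nucleotide string.

    `element_d` supplies the sequence used wherever the shorthand says
    "element d"; the caller must choose it (hypothetical parameter).
    """
    parts = []
    for token in structure.split("-"):
        token = token.strip()
        run = re.fullmatch(r"(\d+)A", token)
        if run:
            # "yA" stands for y consecutive A nucleotides.
            parts.append("A" * int(run.group(1)))
        elif token in ("T", "C", "G"):
            parts.append(token)
        elif token == "element d" and element_d is not None:
            parts.append(element_d)
        else:
            raise ValueError(f"cannot expand token {token!r}")
    return "".join(parts)

tail = expand_structure("60A-element d-19A-G-19A-G-17A", element_d="GATATC")
tail_length = len(tail)
```

With a 6 nt element d, `tail_length` is 60 + 6 + 19 + 1 + 19 + 1 + 17 = 123 nt. Since "-" denotes a direct connection, the helper joins the parts with no separator.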
[0138] In some embodiments, the poly (A) tail that can regulate protein expression level, or the poly (A) tail that can regulate protein expression level while remaining highly conserved during the in vitro preparation process, is selected from any one of the polynucleotide sequences represented by SEQ ID NOs: 1-10.
[0139] In addition, the present application also provides use of the above-mentioned poly (A) tail for regulating protein expression, wherein in said use, the poly (A) tail is located at the 3′ end of the mRNA, such as the 3′ end of the 3′ UTR. In some embodiments, the regulating protein expression uses the method for regulating protein expression described below.
Engineered DNA Molecules and Libraries
[0140] The present application also provides an engineered DNA molecule that can replicate in a cell, which comprises the coding sequence of the aforementioned Poly A tail or a complementary sequence thereof. Those skilled in the art will understand that, in addition to the coding sequence of the Poly A tail, the engineered DNA molecule should also comprise structural elements necessary for the DNA molecule to replicate or to replicate efficiently in a cell. The structural elements necessary for the DNA molecule to replicate or to replicate efficiently in a cell are known in the art and comprise, for example, an origin of replication (ORI). In some embodiments, the engineered DNA molecule further comprises a label gene or a fragment thereof, and/or a reporter gene or a fragment thereof, and a unique restriction endonuclease site allowing insertion of DNA elements, preferably a restriction endonuclease site that functions as a multiple cloning site (MCS). The label gene facilitates the identification of a cell containing a plasmid comprising the label gene, and may be selected from, for example, an antibiotic resistance gene. Each restriction endonuclease site in the MCS can be specifically recognized by a different restriction endonuclease.
[0141] In some embodiments, the DNA molecule is a DNA plasmid. As used herein, the term "DNA plasmid" refers to a plasmid consisting of a double-stranded DNA molecule. In some embodiments, the plasmid is a circular DNA molecule. In some embodiments, the plasmid may also encompass a linear DNA molecule. Specifically, the term "plasmid" also encompasses molecules obtained by linearizing a circular plasmid, for example by cleaving a circular plasmid with a restriction endonuclease, thereby converting the circular plasmid molecule into a linear molecule, as well as a linear molecule replicable in a prokaryote. A plasmid can replicate, i.e., can be amplified in a cell independently of the genomic genetic information stored in a nucleoid or karyoid of a prokaryotic cell, and can be used for cloning, i.e., for amplifying genetic information in a bacterial cell. Preferably, the DNA plasmid according to the present application is a medium copy or high copy plasmid, more preferably a high copy plasmid. Examples of such high copy plasmids are vectors based on pUC, pTZ plasmids or any other plasmid comprising an ORI supporting high copy of the plasmid (e.g., pMB1, pColE1 etc.).
[0142] In some embodiments, the DNA molecule is a DNA molecule or a fragment thereof constituting a nucleoid or karyoid of a prokaryotic organism, that is, the coding sequence comprising the aforementioned poly (A) tail or a complementary sequence thereof can be replicated along with the prokaryotic genome.
[0143] In some embodiments, the DNA molecule is further connected to a gene of interest fragment on the 5′ end side of the poly (A) tail coding sequence, and the gene of interest fragment and the poly (A) tail coding sequence co-encode RNA. In some embodiments, the gene of interest fragment and the poly (A) tail coding sequence co-encode mRNA. The gene of interest fragment comprises a coding sequence of a protein, a polypeptide or a fragment thereof. In some embodiments, the gene of interest fragment also comprises a coding sequence for an element that can be used to initiate or regulate the expression of the protein, polypeptide or a fragment thereof after transcription, and the elements include, but are not limited to, a 5′ UTR, a 3′ UTR, etc. In some embodiments, the gene of interest fragment comprises a coding sequence of at least one untranslated region (UTR). In some embodiments, the gene of interest fragment comprises at least the coding sequence of the 5′ UTR and the coding sequence of the protein, polypeptide or a fragment thereof. In some embodiments, the gene of interest fragment comprises, sequentially from 5′ to 3′, at least: the coding sequence of the 5′ UTR, the coding sequence of the protein, polypeptide or a fragment thereof, and the coding sequence of the 3′ UTR. The coding sequence of the protein, polypeptide or a fragment thereof can be ultimately translated into one or more proteins, or one or more polypeptides, for example, short peptides, oligopeptides, polypeptides, fusion proteins, proteins and fragments thereof, such as parts of known proteins, such as functional parts. The functional portion can be, for example, a biologically active portion of a protein, or an antigenic portion that can effectively generate antibodies, such as an antigenic epitope. 
The two ends of the coding sequence of the protein, polypeptide or a fragment thereof respectively comprise a start codon (5′ end) and a stop codon (3′ end), which are respectively the first three nucleotides and the last three nucleotides of the mRNA molecule that can be translated. The 5′ UTR usually comprises at least one ribosome binding site (RBS), such as the Shine-Dalgarno sequence in a prokaryote, or at least one translation start site, such as the Kozak sequence in a eukaryote. The RBS promotes efficient and accurate translation of mRNA molecules by recruiting ribosomes at translation initiation. Its activity can be optimized by varying the length and sequence of a given RBS or translation-initiating site, as well as the distance from the given RBS or translation-initiating site to the start codon. Alternatively or optionally, the 5′ UTR comprises an internal ribosome entry site (IRES). The 3′ UTR may comprise one or more regulatory sequences, for example binding sites for amino acid sequences that enhance the stability of the mRNA molecule, binding sites for regulatory RNA molecules (such as miRNA molecules), and/or signal sequences involved in the intracellular transport of mRNA molecules.
[0144] Based on the foregoing embodiments, in some embodiments, the gene of interest fragment further comprises one or more additional regulatory sequences, such as binding sites for amino acid sequences that enhance the stability of mRNA molecules, binding sites for amino acid sequences that enhance the translation of mRNA molecules, regulatory elements (such as riboswitches), binding sites for regulatory RNA molecules (such as miRNA molecules), and/or nucleotide sequences that positively affect translation initiation. Furthermore, within the 5′ UTR, preferably there is no functional upstream open reading frame, out-of-frame upstream translation start site, out-of-frame upstream start codon, and/or nucleotide sequences that produce secondary structures that reduce or prevent translation. The presence of such nucleotide sequences in the 5′ UTR may negatively affect translation.
[0145] The coding sequence of the protein, polypeptide or a fragment thereof comprises codons that can be translated into an amino acid sequence. All the codons comprised in the coding sequence may be naturally occurring codons coding amino acids or may be partially or entirely composed of artificially synthesized codons. In some embodiments, some or all of the codons are codon optimized. In some embodiments, some or all of the codons encode unnatural amino acids.
[0146] In some embodiments, the DNA molecule further comprises structural elements necessary for initiating or regulating transcription of the RNA on the 5′ end side of the gene of interest fragment, and the structural elements are known in the art. In some embodiments, the structural element comprises at least a promoter. Promoters and their sequences are known in the art, including weak promoters, medium strength promoters, strong promoters, mini-promoters or core promoters, etc. In some specific embodiments, the promoter is a strong promoter. In some embodiments, the promoter can initiate transcription of the gene of interest fragment and/or poly (A) tail in a prokaryotic cell. In some embodiments, the promoter can initiate transcription of the gene of interest fragment and/or poly (A) tail in a eukaryotic cell. The promoter comprises at least one transcription recognition site followed by a transcription factor binding site. The recognition and binding sites may interact with an amino acid sequence that mediates or regulates transcription. Compared with the recognition site, the binding site is closer to the aforementioned gene of interest fragment. The binding site may be, for example, a Pribnow box in a prokaryote or a TATA box in a eukaryote. For example, in some embodiments, when the Pribnow box is used, the transcription recognition site can be located about 35 bp upstream of the transcription start site, while the transcription factor binding site can be located about 10 bp upstream of the transcription start site. In some embodiments, the promoter comprises at least one additional regulatory element, for example an AT-rich upstream element located approximately 40 and/or 60 nucleotides before the transcription start site, and/or an additional regulatory element for enhancing promoter activity located between the recognition site and the binding site. 
In some embodiments, the promoter is a strong promoter, i.e., the promoter comprises a sequence for promoting the transcription of the aforementioned RNA coding sequence. Strong promoters are known to those skilled in the art, such as the OXB18, OXB19 and OXB20 promoters derived from the RecA promoter of E. coli or can be identified or synthesized by routine laboratory procedures. In some embodiments, the promoter is a T7 promoter. In some embodiments, the promoter further comprises additional regulatory elements, such as an enhancer comprised in the DNA plasmid that can promote the transcription of the aforementioned RNA coding sequence.
[0147] The present application also provides a library comprising the aforementioned engineered DNA molecules. In some embodiments, the library comprises at least two DNA molecules having different poly (A) tail coding sequences.
[0148] In addition, the present application also provides use of the above-mentioned engineered DNA molecule for stably amplifying the poly (A) tail coding sequence or the coding sequence of RNA with a poly (A) tail. In some embodiments, the method for amplifying the coding sequence of the poly (A) tail or the coding sequence of the RNA with the poly (A) tail is as described below in the method for stably amplifying the poly (A) tail transcription template DNA in vitro.
Engineered RNAs and Libraries
[0149] The present application provides an RNA comprising the aforementioned poly (A) tail and a gene of interest fragment on the 5′ end side of the poly (A) tail. In some embodiments, the RNA further comprises a 5′-cap structure. In some embodiments, the RNA is mRNA.
[0150] As used herein, "mRNA" (messenger RNA) is any RNA, naturally occurring, non-naturally occurring or modified, that encodes at least one protein, polypeptide, or a fragment thereof, and that can be translated to produce the encoded protein, polypeptide, or a fragment thereof in vitro, in vivo, in situ, or ex vivo. Therefore, the mRNA may be a mature mRNA or a pre-mature mRNA, and the elements or structures that the mRNA must comprise or optionally comprise are known in the art. In some embodiments, the mRNA comprises coding sequences for multiple necessary functional elements to express, regulate, or enhance the expression level of the protein, polypeptide, or a fragment thereof. The functional elements include, but are not limited to, a 5′ cap, a 5′ UTR, a 3′ UTR, etc. Both the 5′ UTR and the 3′ UTR are usually transcribed from genomic DNA and are elements present in the pre-mature mRNA. As mature mRNA,
[0151] The "5′ cap" is located at the 5′ terminus of the mRNA and comprises a methylated guanylate that is linked to the 5′ terminus of the mRNA via pyrophosphate so as to form a 5′,5′-triphosphate linkage with its adjacent nucleotide. There are usually three types of 5′ cap structures (m7G5′ppp5′Np, m7G5′ppp5′NmpNp, and m7G5′ppp5′NmpNmpNp), which are called type O, type I and type II, respectively. The type O means that the ribose of the terminal nucleotide is not methylated, the type I means that the ribose of one terminal nucleotide is methylated, and the type II means that the ribose of both terminal nucleotides is methylated. In some embodiments, a 5′-guanosine cap structure can be produced by capping the 5′ end of a polynucleotide during an in vitro transcription reaction, according to the manufacturer's protocol, using the following chemical RNA cap analogs: 3′-O-Me-m7G(5′)ppp(5′)G [ARCA cap], G(5′)ppp(5′)A, G(5′)ppp(5′)G, m7G(5′)ppp(5′)A, m7G(5′)ppp(5′)G (New England BioLabs, Ipswich, MA), or m7G(5′)ppp(5′)(2′-OMeA)pG (CleanCap AG). For example, in some embodiments, 5′ capping of the modified RNA can be accomplished after transcription by using vaccinia virus capping enzyme to produce an O-type cap structure: m7G(5′)ppp(5′)G (New England BioLabs, Ipswich, MA). The type I cap structures can be generated by using both vaccinia virus capping enzyme and 2′-O-methyltransferase to produce m7G(5′)ppp(5′)(2′-OMeA)pG. The type II cap structures can be generated from the type I cap structures by subsequent 2′-O-methylation of the third nucleotide from the 5′ terminus by using a 2′-O-methyltransferase. The type III cap structures can be generated from the type II cap structures by subsequent 2′-O-methylation of the fourth nucleotide from the 5′ terminus by using a 2′-O-methyltransferase.
[0152] In some embodiments, some or all of the uridines in the mRNA are chemically modified uridines.
[0153] In some embodiments, some or all of the uridines in the mRNA are pseudouridines or 1-methyl-pseudouridines.
[0154] In some embodiments, some or all of the uracil nucleotides in the mRNA are substituted with pseudouridine (ψ) nucleotides or N1-methyl-pseudouridine (m1ψ) nucleotides.
[0155] In some embodiments, the mRNA further comprises a stabilization element. The stabilization elements may comprise, for example, histone stem-loops. In some embodiments, the mRNA comprises a coding region, at least one histone stem-loop, and optionally a poly (A) sequence or a polyadenylation signal. The poly (A) sequence or polyadenylation signal should generally enhance the expression level of the encoded protein. In some embodiments, the mRNA comprises a combination of a poly (A) sequence or polyadenylation signal and at least one histone stem-loop, although both of them have alternative mechanisms in nature, they act synergistically to increase the protein expression to a level beyond those observed with either element alone. The synergistic effect of a combination of the poly (A) and at least one histone stem-loop is independent of the order of the elements or the length of the poly (A) sequence. In some embodiments, the histone stem-loop is generally derived from a histone gene, and comprises an intramolecular base-pairing formed loop by two adjacent partial or complete reverse complementary sequences separated by a spacer region (composed of a short sequence). The unpaired loop region is typically unable to base pair with either of the stem-loop elements. The stability of the stem-loop structure generally depends on the length, the number of mismatches or bulges, and the base composition of the paired region. In some embodiments, wobble base-pairing (non-Watson-Crick base-pairing) can be produced. In some embodiments, said at least one histone stem-loop sequence comprises 15-45 nucleotides in length.
[0156] In some embodiments, one or more AU-rich sequences of the mRNA may be removed. Such sequences, sometimes called AURES, are destabilizing sequences found in the 3′ UTR. The AURES may be removed from the mRNA. Alternatively, the AURES may be retained in the mRNA.
[0157] In some embodiments, the mRNA is formulated within a lipid nanoparticle (LNP). In some embodiments, lipids are mixed with the mRNA to form lipid nanoparticles. In some embodiments, the RNA is formulated in lipid nanoparticles. In some embodiments, the lipid nanoparticles are first formed as empty lipid nanoparticles, and then are combined or encapsulated with the mRNA of the vaccine just prior to administration (e.g., within a few minutes to an hour).
[0158] The lipid nanoparticles generally comprise ionizable lipids, non-cationic lipids, sterols and PEG lipid components and a target nucleic acid, such as the above-mentioned mRNA. The lipid nanoparticles according to the present disclosure can be produced by using components, compositions and methods generally known in the art, see, e.g., PCT/US2016/052352, PCT/US2016/068300, PCT/US2017/037551, PCT/US2015/027400, PCT/US2016/047406, PCT/US2016/000129, PCT/US2016/014280, PCT/US2017/038426, PCT/US2014/027077, PCT/US2014/055394, PCT/US2016/52117, PCT/US2012/069610, PCT/US2017/027492, PCT/US2016/059575 and PCT/US2016/069491, all of which are incorporated herein by reference in their entireties.
[0159] The present application also provides a library comprising the aforementioned mRNA molecules. The library comprises at least two mRNA molecules with different poly (A) tails.
[0160] The present application also provides uses of the mRNA and the mRNA library. At least two or more mRNA molecules having poly (A) tails with different influence gradients on the expression level of mRNA can be used to regulate the expression level of the coding sequence of the aforementioned protein, polypeptide or a fragment thereof, for example, by adjusting the ratio of different mRNA molecules in the library comprising said two or more mRNA molecules, or by introducing one or more of said two or more mRNA molecules with the same or different contents at different times.
Cells
[0161] The present application also provides cells comprising the aforementioned engineered DNA molecules, wherein the DNA molecules can be stored and/or amplified in the cells. In some embodiments, the cells are prokaryotic cells in which the DNA molecules can be replicated. In some embodiments, the cells are prokaryotic cells in which the DNA molecules can be replicated and/or transcribed. In some embodiments, the cells are eukaryotic cells in which the DNA molecules can be replicated. In some embodiments, the DNA molecules can be transcribed and/or replicated in the DNA-containing cells.
[0162] In some embodiments, the cell is a prokaryotic cell. In some embodiments, the cell is a bacterium, an actinomycete, a cyanobacterium, a mycoplasma, a rickettsia, or a chlamydia. In some embodiments, the cell is selected from the group consisting of Bacillus subtilis, Lactobacillus, Acetobacter, Corynebacterium, Brevibacterium, Arthrobacter, Pseudomonas, and Pediococcus. In some embodiments, the cell is a recA⁻ bacterium. In some embodiments, the cell is E. coli. In some embodiments, the cell is E. coli selected from the group consisting of K-12 and derivatives thereof, and B strain and derivatives thereof. In some embodiments, the E. coli is selected from the group consisting of MG1655, DH5 or DH5α, DH10B, BL21, DB3.1, HB101, JM109, JM110, MC1061, Pir1, Stbl2, Stbl3, Top10, XL1Blue, XL10Gold, BLR, HMS174, Tuner, Rosetta2, Lemo21, T7Express, and Origami2, etc. In some embodiments, the cell is selected from the group consisting of Streptomyces, Micromonospora, and Nocardia. In some embodiments, the cell is a fungus. In some embodiments, the cell is selected from a yeast or a mold.
Methods
[0163] The present application provides a method for stably amplifying a poly (A) tail transcription template DNA in vitro, so as to reduce the mutation frequency of the poly (A) tail transcription template sequence when the DNA is replicated in large quantities in a cell. The method comprises: expanding the cells comprising the engineered DNA molecules.
[0164] In some embodiments, prior to expanding the cells, the method further comprises introducing the engineered DNA molecule into the cells. In some embodiments, the introducing may comprise chemical transformation or electro-transformation. In some embodiments, the introducing is a natural endocytic process of the engineered DNA molecule performed by the cell.
[0165] In some embodiments, after expanding the cells, the method further comprises extracting cellular DNA and synthesizing the RNA by in vitro transcription. In some embodiments, after expanding the cells, the method further comprises inducing transcription of the RNA in the cells, and then extracting and isolating the RNA therein. In some embodiments, the method further comprises extracting the cellular DNA and transducing it into a second cell that can transcribe the RNA. In some embodiments, the transducing comprises administering to a human, wherein the administering is selected from the group consisting of intravenous, intraperitoneal, subcutaneous, intracranial, intrathecal, intraarterial (e.g., via the carotid artery), intramuscular, and intratumoral injection or perfusion.
[0166] In addition, the present application also provides a method for regulating protein expression, wherein the method comprises:
[0167] introducing two or more of the aforementioned engineered DNA molecules into cells of interest at different times and/or in different quantitative ratios; or
[0168] introducing two or more of the aforementioned RNA molecules into cells of interest at different times and/or in different quantitative ratios;
[0169] wherein the two or more of the aforementioned engineered DNA molecules and the two or more of the aforementioned RNA molecules have different poly (A) tails, and the poly (A) tails have different influence gradients on the expression level of RNA.
[0170] In some embodiments, the present application also provides a method for regulating protein expression, wherein the method comprises introducing the aforementioned engineered DNA molecule or the aforementioned RNA molecule into a cell of interest. In some embodiments, the coding sequence of the poly (A) tail encoded by the DNA and the poly (A) tail comprised in the RNA comprises a structure selected from the group consisting of:
[0171] element a-element c-element b-element c-element b-element c-element b-element c-element b;
[0172] element b-element c-element b-element c-element a-element d-element b-element c-element b-element c-element b;
[0173] element b-element c-element b-element c-element b-element d-element a-element c;
[0174] element a-element d-element b-element c-element b-element c-element b; or
[0175] element b-element c-element b-element c-element b-element d-element a.
[0176] In some embodiments, the coding sequence of the poly (A) tail encoded by the DNA and the poly (A) tail comprised in the RNA comprises a structure selected from the group consisting of:
[0177] 60A-G-19A-G-19A-G-19A-G-3A;
[0178] 7A-C-18A-G-60A-element d-7A-C-18A-G-14A;
[0179] 60A-element d-19A-G-19A-G-17A;
[0180] 19A-G-19A-G-19A-element d-60A;
[0181] 19A-G-19A-G-19A-element d-60A-G;
[0182] 19A-G-19A-G-19A-element d-60A;
[0183] 19A-C-19A-C-19A-element d-60A; and
[0184] 19A-T-19A-T-19A-element d-60A.
[0185] In some embodiments, the coding sequences of the poly (A) tail encoded by the DNA and the poly (A) tail comprised in the RNA comprise the following structure: 60A-element d-19A-G-19A-G-17A, or 19A-G-19A-G-19A-element d-60A.
[0186] Particularly, two elements connected by "-" are directly connected, and there is no nucleotide between the two elements.
[0187] In the above-mentioned structure of the poly (A) tail, yA represents the number of consecutive A in element a or element b, and y is a natural number, for example, 19A means that it comprises 19 consecutive A; 60A means that it comprises 60 consecutive A.
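The element definitions and the yA notation can be illustrated with a short Python sketch that splits a tail coding sequence into runs of consecutive A and the spacers between them, and prints the result in the notation used above. It is an illustration only and not part of the disclosed method: the function name `classify_tail`, the treatment of every run of three or more A as element a or element b, and the inclusive lower bound of 20 nt for element a are assumptions made for this example.

```python
import re

def classify_tail(seq):
    """Return (kinds, notation) for a poly(A) tail coding sequence.

    Assumed reading of the element definitions (illustrative only):
      a: 20 <= consecutive A < 80    b: 3 <= consecutive A < 20
      c: one non-A nucleotide        d: 2-20 nt, non-A at both termini
    """
    seq = seq.upper()
    kinds, parts = [], []

    def add_spacer(spacer):
        # A spacer is whatever lies between runs of three or more A,
        # so it can never contain three consecutive A itself.
        if spacer[0] == "A" or spacer[-1] == "A" or len(spacer) > 20:
            raise ValueError(f"invalid spacer: {spacer!r}")
        kinds.append("c" if len(spacer) == 1 else "d")
        parts.append(spacer)

    pos = 0
    for run in re.finditer(r"A{3,}", seq):
        if run.start() > pos:
            add_spacer(seq[pos:run.start()])
        n = run.end() - run.start()
        if n >= 80:
            raise ValueError(f"A run of {n} nt is too long for element a")
        kinds.append("a" if n >= 20 else "b")
        parts.append(f"{n}A")
        pos = run.end()
    if pos < len(seq):
        add_spacer(seq[pos:])
    return "".join(kinds), "-".join(parts)


# Example: the structure 60A-G-19A-G-19A-G-19A-G-3A
tail = "A" * 60 + "G" + ("A" * 19 + "G") * 3 + "AAA"
kinds, notation = classify_tail(tail)
print(kinds)     # acbcbcbcb
print(notation)  # 60A-G-19A-G-19A-G-19A-G-3A
```

The sketch does not enforce the adjacency rules or the requirement of a single element a; those could be checked on the returned `kinds` string.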
[0188] In some embodiments, the coding sequences of the poly (A) tail encoded by the DNA and the poly (A) tail comprised in the RNA comprise or consist of any one polynucleotide sequence selected from the polynucleotide sequences represented by SEQ ID NOs: 1-10.
[0189] It should be understood that the present application encompasses various aspects, embodiments and combinations of the aspects and/or embodiments described herein. The above description and the following Examples are intended to illustrate rather than limit the scope of the present application. Other aspects, improvements and modifications within the scope of the present application will be apparent to those skilled in the art. Therefore, those skilled in the art should appreciate that the scope of the present application also comprises such improvements and modifications to these aspects and embodiments.
EXAMPLES
Example 1: Construction of Poly (A) Tail
[0190] The poly (A) tails and the DNA sequences coding the poly (A) tails shown in Table 1 below were constructed by conventional genetic engineering methods.
TABLE 1
Design    Specific structure                          Sequence         Length
P1        60A-G-19A-G-19A-G-19A-G-AAA                 SEQ ID NO: 1     124 nt
P2        7A-C-18A-G-60A-element d-7A-C-18A-G-14A     SEQ ID NO: 2     130 nt
P3        60A-element d-19A-G-19A-G-17A               SEQ ID NO: 3     123 nt
P4        19A-G-19A-G-19A-element d-60A               SEQ ID NO: 4     125 nt
P5        19A-G-19A-G-19A-element d-60A-G             SEQ ID NO: 5     126 nt
P6        19A-G-19A-G-19A-element d-60A               SEQ ID NO: 6     125 nt
P7        19A-G-19A-G-19A-element d-60A               SEQ ID NO: 7     131 nt
P8        19A-C-19A-C-19A-element d-60A               SEQ ID NO: 8     125 nt
P9        19A-T-19A-T-19A-element d-60A               SEQ ID NO: 9     125 nt
P10       19A-G-19A-G-19A-element d-60A               SEQ ID NO: 10    125 nt
C1        A60-element d-A60                           SEQ ID NO: 11    130 nt
C2        A30-element d-A70                           SEQ ID NO: 12    110 nt
C3        A60-element c-A60                           SEQ ID NO: 13    121 nt
C4        A60-element d-A60                           SEQ ID NO: 14    126 nt
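Because Table 1 gives both the structure and the total length of each design, the length of element d can be back-calculated by subtracting the A runs and single-nucleotide spacers from the total. The following Python sketch does this for a subset of rows; the helper names `fixed_length` and `implied_d_length` are hypothetical, and the sketch assumes that each structure contains at most one element d. For the controls, the results can be compared with the spacer lengths stated in paragraph [0194] (10 nt for C1 and C2, 1 nt for C3, 6 nt for C4).

```python
import re

# (design, structure, total length in nt) copied from Table 1
TABLE_1 = [
    ("P1", "60A-G-19A-G-19A-G-19A-G-AAA", 124),
    ("P2", "7A-C-18A-G-60A-element d-7A-C-18A-G-14A", 130),
    ("P3", "60A-element d-19A-G-19A-G-17A", 123),
    ("P7", "19A-G-19A-G-19A-element d-60A", 131),
    ("C1", "A60-element d-A60", 130),
    ("C2", "A30-element d-A70", 110),
    ("C3", "A60-element c-A60", 121),
    ("C4", "A60-element d-A60", 126),
]

def fixed_length(structure):
    """Sum the lengths of every part except 'element d'."""
    total = 0
    for part in structure.split("-"):
        part = part.strip()
        run = re.fullmatch(r"(\d+)A|A(\d+)", part)
        if run:                       # '60A' or 'A60': a run of A
            total += int(run.group(1) or run.group(2))
        elif part == "element c":     # a single non-A nucleotide
            total += 1
        elif part != "element d":     # literal nucleotides such as 'G' or 'AAA'
            total += len(part)
    return total

def implied_d_length(structure, length):
    """Length left over for element d (0 if the structure has none)."""
    return length - fixed_length(structure)

for design, structure, length in TABLE_1:
    print(design, implied_d_length(structure, length))
```

Under these assumptions, P1 and C3 leave no remainder (they contain no element d), while the remainders for C1, C2 and C4 are 10, 10 and 6 nt, in line with the spacer lengths given for the controls.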
Example 2: Testing Poly (A) Function by Using Luciferase Coding Sequence as an Example
2.1 Testing the Replication Stability of mRNA-Encoding DNA Molecules in Prokaryotic Cells
[0191] Using luciferase as the protein coding region, the stability of different poly (A) variants in E. coli and their effects on luciferase expression in cells were investigated.
[0192] 1) Construction of a universal vector comprising the luciferase protein coding region
[0193] The E. coli cloning vector pUC57 was used as the vector backbone of this universal vector, and a T7 promoter sequence (5′-TAATACGACTCACTATAAGG-3′), a 5′ UTR, a luciferase protein coding sequence, a 3′ UTR and a polyadenylic acid string poly (dA:dT) were sequentially arranged between the Xba I restriction site and the EcoR I restriction site of the multiple cloning site.
[0194] 2) The polyadenylic acid string poly (dA:dT) in the universal vector was substituted with P1-P10 of the present application and with controls C1-C4: A60-10 nt spacer-A60 (control C1), A30-10 nt spacer-A70 (control C2), A60-1 nt spacer-A60 (control C3) and A60-6 nt spacer-A60 (control C4). Among them, C1 and C2 are derived from the patent literature (U.S. Pat. No. 10,717,982B2).
[0195] All primers required for constructing P1-P10 and controls C1-C4 were synthesized. The universal vector constructed in step 1) was double-digested with two restriction endonucleases to remove the poly (dA:dT), and P1-P10 and C1-C4 were then ligated into the digested vector by T4 DNA ligase 1, thereby completing the substitution of the poly (dA:dT) in the universal vector.
[0196] 3) Detection of the replication stability of different poly (A) variants in E. coli
[0197] The vector plasmid constructed in step 2) was confirmed to be correct by sequencing, and then transformed into E. coli DH5α. The transformed plates were grown at 30 °C, followed by plasmid extraction and sequencing. After the sequencing was completed, the stability and base deletions of the different poly (A) variants were analyzed and calculated based on the sequencing results. The replication stability is expressed as the percentage of clones without any base changes; the higher the percentage, the higher the replication stability of the plasmid in E. coli.
[0198] The results are shown in
[0199] A total of 100 clones were tested for control C1, in which 15 clones had base deletions, accounting for 15%; and the number of correct clones was 85, accounting for 85%.
[0200] A total of 50 clones were tested for control C2, in which all clones were correct without any base changes or deletions, with a correct clone percentage of 100%.
[0201] A total of 50 clones were tested for control C3, in which 9 clones had base deletions, accounting for 18%; and the number of correct clones was 41, accounting for 82%.
[0202] A total of 50 clones were tested for control C4, in which 14 clones had base deletions, accounting for 28%; and the number of correct clones was 36, accounting for 72%.
[0203] A total of 100 clones were tested for P1, in which 9 clones had base deletions, accounting for 9%; and the number of correct clones was 91, accounting for 91%.
[0204] A total of 100 clones were tested for P2, in which 12 clones had base deletions, accounting for 12%; the number of correct clones was 88, accounting for 88%.
[0205] A total of 62 clones were tested for P3, in which 5 clones had base deletions, accounting for 8%; and the number of correct clones was 57, accounting for 92%.
[0206] A total of 50 clones were tested for P4, in which 3 clones had base deletions, accounting for 6%; and the number of correct clones was 47, accounting for 94%.
[0207] A total of 50 clones were tested for P5, in which 4 clones had base deletions, accounting for 8%; and the number of correct clones was 46, accounting for 92%.
[0208] A total of 50 clones were tested for P6, in which 5 clones had base deletions, accounting for 10%; and the number of correct clones was 45, accounting for 90%.
[0209] A total of 50 clones were tested for P7, in which 7 clones had base deletions, accounting for 14%, and the number of correct clones was 43, accounting for 86%.
[0210] A total of 50 clones were tested for P8, in which 3 clones had base deletions, accounting for 6%, and the number of correct clones was 47, accounting for 94%.
[0211] A total of 50 clones were tested for P9, in which 4 clones had base deletions, accounting for 8%, and the number of correct clones was 46, accounting for 92%.
[0212] A total of 50 clones were tested for P10, in which 5 clones had base deletions, accounting for 10%, and the number of correct clones was 45, accounting for 90%.
[0213] Taking into account the results of the correct clone ratio and the number of deleted bases, the poly (A) variants designed in the present application are superior to or comparable to the prior art in terms of replication stability in E. coli cells. Particularly, the replication stability of poly (A) variants P3, P4, and P8 is the highest. The replication stability of P3, P4, and P8 is equivalent to that of C2, with no statistically significant difference (p>0.05, χ² test). In addition, the replication stability of P3, P4, and P8 is better than that of the controls C1, C3 and C4, and the difference is statistically significant (p<0.05, χ² test).
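The percentage of correct clones and a pairwise χ² comparison can be sketched in Python from the clone counts reported above. This is a minimal illustration using a Pearson χ² statistic on a 2×2 table (one degree of freedom, no continuity correction). The source does not state which groups were compared, whether counts were pooled, or whether a correction was applied, so the sketch is not expected to reproduce the reported p-values; the names `RESULTS`, `percent_correct` and `chi_square_2x2` are hypothetical.

```python
import math

# (clones tested, clones with base deletions) as reported above
RESULTS = {
    "C1": (100, 15), "C2": (50, 0), "C3": (50, 9), "C4": (50, 14),
    "P1": (100, 9), "P2": (100, 12), "P3": (62, 5), "P4": (50, 3),
    "P5": (50, 4), "P6": (50, 5), "P7": (50, 7), "P8": (50, 3),
    "P9": (50, 4), "P10": (50, 5),
}

def percent_correct(tested, deleted):
    """Replication stability: percentage of clones without base changes."""
    return 100.0 * (tested - deleted) / tested

def chi_square_2x2(group1, group2):
    """Pearson chi-square on correct/deleted counts of two groups.

    Undefined (division by zero) when neither group has any deletion.
    """
    (n1, d1), (n2, d2) = group1, group2
    a, b, c, d = n1 - d1, d1, n2 - d2, d2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

for name, (tested, deleted) in RESULTS.items():
    print(f"{name}: {percent_correct(tested, deleted):.0f}% correct")

chi2, p = chi_square_2x2(RESULTS["P4"], RESULTS["C4"])
print(f"P4 vs C4: chi2 = {chi2:.2f}, p = {p:.4f}")
```

Any other pair of designs can be compared by passing the corresponding entries of `RESULTS` to `chi_square_2x2`.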
Example 3: Testing the Poly (A) Function by Using HPV Antigen Protein Coding Sequence as an Example
1) Construction of Plasmid With HPV as Protein Coding Region
[0214] As described above, in Example 1, vectors comprising luciferase coding genes combined with different poly (A)s were constructed. Based on these vectors, the luciferase coding gene was substituted with the HPV coding gene by conventional molecular cloning methods, and the main elements were arranged in the order of T7 promoter sequence (5′-TAATACGACTCACTATAAGG-3′), 5′ UTR, HPV antigen protein coding sequence, 3′ UTR and poly (A) coding sequence.
2) Performing Small-Scale Bacterial Culture to Test the Stability of Four HPV poly (A) Variants in E. coli
[0215] The HPV vector plasmids comprising P1, P2, P3 and P4 constructed in step 1) were confirmed to be correct by sequencing, and then transformed into E. coli DH5α. The transformed plates were grown at 30 °C, followed by plasmid extraction and sequencing. After completion of sequencing, the stability and base deletions of the different poly (A) variants were analyzed and calculated based on the sequencing results. The stability is expressed as the percentage of clones without any base changes; the higher the percentage, the more stable the variant.
[0216] The results showed (
[0223] Taking into account the ratio of mutations and the average number of base deletions, when the protein-coding gene of the luciferase in Example 1 was substituted with the HPV antigen protein, the different poly (A) variants of the present application still maintain high replication stability. Particularly, P3 and C2 have comparable stability, with no statistically significant difference (p>0.05, χ² test), and the cloning stability of P3 is the best among the new variants of the present application. The probability of large fragment deletion in C2 is 1/50 = 2%, while the probability of large fragment deletion in P3 is 1/70 = 1.4%. Since large poly (A) fragment deletions will affect the in vivo expression and efficacy of mRNA products, P3 is more in line with product requirements. The above results indicate that the poly (A) variants designed in the present application are universally applicable in examples involving different protein coding regions.
[0224] As described above, in Example 1 and Example 2, the plates of transformed E. coli were cultured overnight in a biochemical incubator at 30 °C, and the obtained clones were sequenced to evaluate the replication stability. In addition, the culture temperature of E. coli affects the DNA replication rate, which in turn affects the replication stability. Therefore, with respect to Example 3, the plates of transformed E. coli were also cultured in a biochemical incubator at 37 °C, and the sequencing results were compared in the present application. The results show (
3) Detection of the Stability of the Three Poly (A) Variants in Large-Scale Fermentation and at Different Generations
[0225] The production of mRNA drugs must rely on large-scale fermentation to prepare sufficient template plasmids, and the stability of the plasmid during fermentation (in this application, the stability of the plasmid particularly refers to the stability of poly dA:dT) is crucial to the production of mRNA drugs with uniform quality. On the other hand, in order to meet the stability requirements of different batches of production, it is necessary to establish a strain library comprising the plasmid of interest, including strain libraries of different generations such as primary library and secondary library. Therefore, it is necessary to evaluate the plasmid stability in E. coli for different generations. In response to the above two issues, the present application detected the stability of P1 and P3 between different generations of the fermentation process in Example 3. According to the sequencing results, 4 correct E. coli clones were selected for each poly (A) variant to perform passage culture through fermentation, respectively. The results show that, at the 3rd, 5th, 7th, and 9th seed passages, the plasmids of the four clones of P1 and P3 remain stable, without any base changes.
Example 4: Detection of Luciferase mRNA Expression Level in Mice
[0226] In eukaryotic cells, a poly (A) tail of a certain length is essential for protecting the 3′ end of mRNA, maintaining mRNA stability and promoting protein expression. Under the influence of physiological or environmental factors in the body, the poly (A) tail gradually becomes shorter, which primes mRNA degradation. This application investigated the effects of different poly (A) variants on protein expression levels by performing in vivo expression assays in mice and evaluating luciferase activity. Luciferase mRNA-LNPs comprising control C2, P3, P4, P5, P8 or P9 were injected intramuscularly into mice. Animal imaging was performed 6 hours after injection, and the fluorescence intensity was quantified to compare the effects of the different poly (A) s on the activity of luciferase in vivo. The specific experimental process is as follows:
[0227] Luciferase mRNA was synthesized by in vitro transcription. Linearized DNA was obtained by digestion with the type II restriction endonuclease BspQ I, so that the 3′ end of the linearized DNA carried one of the different poly (A) sequences: control C2, P3, P4, P5, P8 or P9. The linearized DNA was used as a template for in vitro transcription. The 100 μl reaction system included 1× reaction buffer; 5 mM (final concentration) each of ATP, CTP, NIM-UTP and GTP; 4 mM (final concentration) of CleanCap AG; and 5 μl of in vitro transcriptase. After the reaction mixture was mixed thoroughly, the reaction was conducted at 37° C. for 3 h. The in vitro transcribed mRNA was collected by LiCl precipitation, and finally dissolved in nuclease-free water.
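The reaction composition above can be turned into pipetting volumes with the usual C1·V1 = C2·V2 relation. The sketch below is only an illustration: the final concentrations and the 100 μl reaction volume come from the text, whereas the 100 mM stock concentrations and the helper names are assumptions not given in the source.

```python
REACTION_VOLUME_UL = 100.0

# Final concentrations from the text, in mM.
final_mM = {"ATP": 5.0, "CTP": 5.0, "NIM-UTP": 5.0, "GTP": 5.0, "CleanCap AG": 4.0}

# Hypothetical stock concentrations in mM (not stated in the source).
stock_mM = {name: 100.0 for name in final_mM}

def stock_volume_ul(final_conc, stock_conc, total_ul):
    """Volume of stock needed, from C1 * V1 = C2 * V2."""
    return final_conc * total_ul / stock_conc

def amount_nmol(final_conc_mM, total_ul):
    """Amount of substance in the reaction; mM times uL gives nmol."""
    return final_conc_mM * total_ul

plan = {
    name: (stock_volume_ul(conc, stock_mM[name], REACTION_VOLUME_UL),
           amount_nmol(conc, REACTION_VOLUME_UL))
    for name, conc in final_mM.items()
}

for name, (vol_ul, nmol) in plan.items():
    print(f"{name:12s} {vol_ul:5.1f} uL of stock  ({nmol:.0f} nmol)")

nucleotide_volume_ul = sum(vol for vol, _ in plan.values())
print(f"Nucleotides and cap analog together: {nucleotide_volume_ul:.1f} uL")
```

The remaining volume would be taken up by the buffer, the 5 μl of enzyme, the template and water; those volumes depend on reagent concentrations that the source does not give.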
[0228] Animal experiment: the in vitro synthesized luciferase mRNA was encapsulated into LNPs (lipid nanoparticles). The obtained mRNA stock solutions were each dispersed in 20 mM acetic acid solution (pH 5.0) to obtain RNA solutions with an mRNA concentration of 200 μg/mL. The lipid mixture was prepared by mixing ionizable lipid, cholesterol, DSPC, and DMG-PEG2000 at a molar ratio of ionizable lipid:cholesterol:DSPC:DMG-PEG2000=50:38.5:10:1.5. The mRNA solution and the lipid mixture were combined by T mixing, with the flow rates of the aqueous phase and the oil phase controlled by an injection pump, to form LNPs. The solution was then diluted 10-fold with diluent and concentrated by centrifugation in an ultrafiltration tube, followed by three rounds of replacement. Tris aqueous solution was then added to adjust the pH to 7.0-8.0, giving an LNP-encapsulated mRNA solution. The concentration and particle size of the LNP-encapsulated mRNA were determined by using Ribogreen RNA quantification kit (Invitrogen, R11490) and Darwin ZetaSizer particle size analyzer, respectively. In the 4-component LNP, the molar ratio of the components is SM102:DSPC:cholesterol:DMG-PEG2000=50:10:38.5:1.5. After encapsulation, quality control was conducted for the LNPs by measuring the particle size, encapsulation efficiency, PDI and other indicators. The quality control results show that the prepared LNPs met the standards of a particle size of 50 nm-150 nm, PDI<0.3, and encapsulation efficiency>90%, and could be used for subsequent experiments. The mRNA content in the LNPs was determined by the Ribogreen method, and the LNPs were then diluted to an mRNA content of 100 ng/μL.
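The paragraph above states the lipid molar ratio twice, with the components listed in a different order each time. A minimal sketch such as the following converts each ratio to mole percent and checks that the two statements describe the same formulation. Treating SM102 as the ionizable lipid of the first ratio is an assumption drawn from the matching value of 50, and the function name `mole_percent` is hypothetical.

```python
def mole_percent(ratio):
    """Normalize a molar ratio so that the components sum to 100 mol%."""
    total = sum(ratio.values())
    return {name: 100.0 * part / total for name, part in ratio.items()}

# First statement of the ratio in the text.
first = {"ionizable lipid": 50, "cholesterol": 38.5, "DSPC": 10, "DMG-PEG2000": 1.5}

# Second statement, with SM102 renamed to "ionizable lipid" (assumption).
second = {"ionizable lipid": 50, "DSPC": 10, "cholesterol": 38.5, "DMG-PEG2000": 1.5}

first_pct = mole_percent(first)
second_pct = mole_percent(second)

# Dictionary comparison ignores the listing order of the components.
ratios_agree = all(abs(first_pct[name] - second_pct[name]) < 1e-9 for name in first_pct)

for name, pct in first_pct.items():
    print(f"{name:16s} {pct:5.1f} mol%")
print("Both statements agree:", ratios_agree)
```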
[0229] BALB/c mice were randomly divided into groups according to body weight, and were dosed after 2-3 days of adaptive feeding. Each of control C2, P3, P4, P5, P8 and P9 was injected into five mice, and each mouse was administered 100 μl (10 μg mRNA) by intramuscular injection. In addition, five mice injected with PBS were used as a control group. Animal imaging was performed 6 h after the injection, and then fluorescence values were calculated. The results show (
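The dosing arithmetic can be checked with a short sketch. It assumes a working concentration of 100 ng/μL, which is the value consistent with 10 μg of mRNA in a 100 μl injection. The RiboGreen reading of 250 ng/μL and the 600 μl batch volume used in the dilution example are hypothetical, as are the function names.

```python
def dose_ug(volume_ul, conc_ng_per_ul):
    """Injected mRNA mass in micrograms (1 ug = 1000 ng)."""
    return volume_ul * conc_ng_per_ul / 1000.0

def dilution_volumes(measured_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Volumes of LNP stock and diluent, from C1 * V1 = C2 * V2."""
    if measured_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / measured_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul

GROUPS = ["C2", "P3", "P4", "P5", "P8", "P9"]  # mRNA-LNP groups from the text
MICE_PER_GROUP = 5
INJECTION_UL = 100.0
TARGET_NG_PER_UL = 100.0

per_mouse_ug = dose_ug(INJECTION_UL, TARGET_NG_PER_UL)
per_group_ug = per_mouse_ug * MICE_PER_GROUP
total_mice = (len(GROUPS) + 1) * MICE_PER_GROUP  # +1 for the PBS control group

# Hypothetical example: stock measured at 250 ng/uL, 600 uL of dosing solution needed.
stock_ul, diluent_ul = dilution_volumes(250.0, TARGET_NG_PER_UL, 600.0)

print(f"Dose per mouse: {per_mouse_ug} ug; per group: {per_group_ug} ug")
print(f"Total mice including PBS controls: {total_mice}")
print(f"Mix {stock_ul} uL of stock with {diluent_ul} uL of diluent")
```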
[0230] The sequences used in the above examples of this application are shown in the following sequence listing. It should be understood that the following sequences are merely exemplary sequences of the embodiments of the present application, and do not impose any limitation on the embodiments of the present application. The nucleic acid sequences in the following sequence listing may represent DNA sequences or RNA sequences, and T represents uridine (U) when they represent RNA sequences.
TABLE-US-00002 SequenceListing: SEQ ID NO. Name Sequence Nucleicacidsequence 1 P1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAA AAAAGAAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAA AAAAAAGAAA 2 P2 AAAAAAACAAAAAAAAAAAAAAAAAAGAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAGGAAAAAAACAAAAAAAAAAAAAAAAA AGAAAAAAAAAAAAAA 3 P3 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAGATATCAAAAAAAAAAA AAAAAAAAGAAAAAAAAAAAAAAAAAAAGAAAAAAAAA AAAAAAAA 4 P4 AAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAA AGAAAAAAAAAAAAAAAAAAAGATATCAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAA 5 P5 AAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAA AGAAAAAAAAAAAAAAAAAAAGATATCAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAG 6 P6 AAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAA AGAAAAAAAAAAAAAAAAAAAGTATACAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAA 7 P7 AAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAA AGAAAAAAAAAAAAAAAAAAAGATATCGTATACAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAA 8 P8 AAAAAAAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAA ACAAAAAAAAAAAAAAAAAAAGATATCAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAA 9 P9 AAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAAAAAAA TAAAAAAAAAAAAAAAAAAAGATATCAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAA 10 P10 AAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAA AGAAAAAAAAAAAAAAAAAAAGAATCTAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAA 11 C1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAGCATATGACTAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAA 12 C2 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCATATGAC TAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 13 C3 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAA 14 C4 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAATGCATAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAA 15 Elementd, GATATC example1 16 Elementd, 
GTATAC example2 17 Elementd, GAATCT example3 18 Elementd, GCATATGACT example4 19 Elementd, GATATCGTATAC example5 20 3UTR CTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTT CCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGC CTTGAGCATCTGGATTCTGCCTGCTCGCTTTCTTGCTGTCCA ATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAA ACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGC CT 21 5UTR ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAA ACAGACACC 22 Sequence ATGGAAGACGCCAAAAACATTAAGAAGGGCCCAGCGCCAT of TCTACCCACTCGAAGACGGGACCGCCGGCGAGCAGCTGCA luciferase CAAAGCCATGAAGCGCTACGCCCTGGTGCCCGGCACCATCG encoding CCTTTACCGACGCACATATCGAGGTGGACATTACCTACGCCG gene AGTACTTCGAGATGAGCGTTCGGCTGGCAGAAGCTATGAAG CGCTATGGGCTGAATACAAACCATCGGATCGTGGTGTGCAG CGAGAATAGCTTGCAGTTCTTCATGCCCGTGTTGGGTGCCCT GTTCATCGGTGTGGCTGTGGCCCCAGCTAACGACATCTACA ACGAGCGCGAGCTGCTGAACAGCATGGGCATCAGCCAGCC CACCGTCGTATTCGTGAGCAAGAAAGGGCTGCAAAAGATCC TCAACGTGCAAAAGAAGCTACCGATCATACAAAAGATCATC ATCATGGATAGCAAGACCGACTACCAGGGCTTCCAAAGCAT GTACACCTTCGTGACTTCCCATTTGCCACCCGGCTTCAACG AGTACGACTTCGTGCCCGAGAGCTTCGACCGGGACAAAAC CATCGCCCTGATCATGAACAGTAGTGGCAGTACCGGATTGC CCAAGGGCGTAGCCCTACCGCACCGCACCGCTTGTGTCCGA TTCAGTCATGCCCGCGACCCCATCTTCGGCAACCAGATCATC CCCGACACCGCTATCCTCAGCGTGGTGCCATTTCACCACGG CTTCGGCATGTTCACCACGCTGGGCTACTTGATCTGCGGCTT TCGGGTCGTGCTCATGTACCGCTTCGAGGAGGAGCTATTCTT GCGCAGCTTGCAAGACTATAAGATTCAATCTGCCCTGCTGGT GCCCACACTATTTAGCTTCTTCGCTAAGAGCACTCTCATCGA CAAGTACGACCTAAGCAACTTGCACGAGATCGCCAGCGGC GGGGCGCCGCTCAGCAAGGAGGTAGGTGAGGCCGTGGCCA AACGCTTCCACCTACCAGGCATCCGCCAGGGCTACGGCCTG ACAGAAACAACCAGCGCCATTCTGATCACCCCCGAAGGGG ACGACAAGCCTGGCGCAGTAGGCAAGGTGGTGCCCTTCTTC GAGGCTAAGGTGGTGGACTTGGACACCGGTAAGACACTGG GTGTGAACCAGCGCGGCGAGCTGTGCGTCCGTGGCCCCATG ATCATGAGCGGCTACGTTAACAACCCCGAGGCTACAAACGC TCTCATCGACAAGGACGGCTGGCTGCACAGCGGCGACATCG CCTACTGGGACGAGGACGAGCACTTCTTCATCGTGGACCGG CTCAAAAGCCTGATCAAATACAAGGGCTACCAGGTAGCCCC AGCCGAACTGGAGAGCATCCTGCTGCAACACCCCAACATCT TCGACGCCGGGGTCGCCGGCCTGCCCGACGACGATGCCGG CGAGCTGCCCGCCGCAGTCGTCGTGCTGGAACACGGTAAA ACCATGACCGAGAAGGAGATCGTGGACTATGTGGCCAGCCA 
GGTTACAACCGCCAAGAAGCTGCGCGGTGGTGTTGTGTTCG TGGACGAGGTGCCTAAAGGACTGACCGGCAAGTTGGACGC CCGCAAGATCCGCGAGATTCTCATTAAGGCCAAGAAGGGCG GAAAGATCGCCGTGTAA 23 Sequence ATGGATGCTATGAAACGGGGCCTGTGCTGCGTGCTGCTCCT ofHPV GTGCGGCGCTGTGTTTGTGAGCCCTAGCATCACCCAGGACT protein GCTCCTTCCAACACAGCCCCATCTCCTCCGACTTCGCTGTCA encoding AAATCCGTGAGCTGTCTGACTACCTGCTTCAAGATTACCCA gene GTCACCGTGGCCTCCAACCTGCAGGACGAGGAGCTCTGCG GGGGCCTCTGGCGGCTGGTCCTGGCACAGCGCTGGATGGA GCGGCTCAAGACTGTCGCTGGGTCCAAGATGCAAGGCTTGC TGGAGCGCGTGAACACGGAGATACACTTTGTCACCAAATGT GCCTTTCAGCCCCCCCCCAGCTGTCTTCGCTTCGTCCAGAC CAACATCTCCCGCCTCCTGCAGGAGACCTCCGAGCAGCTGG TGGCGCTGAAGCCCTGGATCACTCGCCAGAACTTCTCCCGG TGCCTGGAGCTGCAGTGTCAGCCCGACTCCTCAACCCTGCC ACCCCCATGGAGTCCCCGGCCCCTGGAGGCCACAGCCCCG ACAGCCCCGGGCGGCGGCAGCGGCGATATGCACCAGAAGA GAACCGCCATGTTCCAGGACCCTCAGGAGAGACCTAGGAA GCTGCCTCACCTGTGTACAGAGCTCCAGACAACCATCCACG ACATCATCCTGGAGTGCGTGTACTGTAAGCAGCAGCTGCTG AGAAGAGAGGTGTACGACTTCGCCTTCAGAGACCTGTGCAT CGTGTACAGAGACGGCAACCCTTACGCCGTGTGCGATAAGT GTCTGAAGTTCTATTCCAAAATCTCCGAATATAGGTACATGC ACGGCGACACCCCTACCCTGCACGAGTACATGCTGGACCTC CAGCCTGAGACCACAGACCTGTACTGCTACGAGCAGCTGA ACGACAGCTCTGAGGAAGAGGACGAGATTGACGGACCTGC TGGCCAGGCCGAGCCTGACAGAGCCCACTACAATATCGTGA CATTCTGTTGCAAATGCGACTCCACACTGGACAAGTGCCTG AAGTTCTACAGCAAGATCTCTGAGTACAGATACTACTGCTAC TCTGTGTACGGCACCACACTGGAGCAGCAGTACAACAAGC CTCTGTGCGACCTCCTGATCCGCTGCATCAACTGCCAGAAG CCTCTGTGCCCTGAGGAGAAGCAGAGACACCTGGACAAGA AGCAGCGGTTCCACAACATCAGAGGCAGATGGACCGGCAG GTGCATGTCCTGCTGTAGATCCTCCAGAACCAGACGGGAGA CCCAGCTGCACTACAACATCGTGACCTTCTGCTGCAAGTGC GACTCTACCCTGAGACTGTGCGTGCAGTCTACCCACGTGGA CATCAGAACCCTGGAGGACCTGCTGATGGGCACCCTGGGCA TCGTGTGCCCTATCTGCTCTCAGAAGCCTATGGCCAGGTTCG AGGACCCTACCAGAAGACCCTACAAGCTGCCTGACCTGTGC ACCGAGCTGAACACCTCTCTGCAAGACATCGAGATCACCTG CGTGTACTGCAAGACCGTGCTGGAGCTGACCGAGGTGTTCG AGTTCGCCTTCAAGGACCTGTTCGTGGTGTACAGAGACAGC ATCCCTCACGCTGCCTGCCACAAGTGCATCGACTTCTATTCC AGGATCAGGGAGCTGCGCTATTACTCCGACTCTGTGATGTAC GGCCCCAAGGCCACCCTCCAGGACATCGTGCTGCACCTGGA GCCTCAGAACGAGATCCCCGTGGACCTGCTGTGCCACGAGC 
AGCTGTCTGACTCTGAAGAGGAGAACGACGAGATCGACGG CGTGAACCACCAGCACCTGCCTGCCAGGAGAGCTGAACCC CAGCGGCATACCATGCTGTGTATGTGCTTCTACTCTAGGATC AGAGAGCTGAGGTACTACTCTGACTCTGTGTACGGCGACAC CCTGGAGAAGCTGACCAACACCGGCCTGTACAACCTGCTGA TCCGGTGCCTGAGGTGCCAGAAGCCTCTGAACCCTGCCGA GAAGCTGAGACACCTGAACGAGAAGAGAAGATTCCACAAG ATCGCTGGCCACTACAGAGGCCAGTGCCACTCTTGCTGCAA CAGAGCCAGACAGGAGAGACTCCAGCGGAGAAGGGAGAC CCAGGTGGCCAGGAGAGCCGAGCCTCAGAGACACACCATG CTGTGCATGTGCTGCAAGTGCGAGGCCAGAATCGAGCTGGT GGTGGAGAGCTCTGCCGACGACCTGAGAGCCTTCCAGCAG CTGTTCCTGTCTACCCTGAGCTTCGTGTGCCCTTGGTGCGCC TCTCAGCAGTAA 24 5 ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAA UTR-HPV ACAGACACCGGATCCGCCACCATGGATGCTATGAAACGGGG protein CCTGTGCTGCGTGCTGCTCCTGTGCGGCGCTGTGTTTGTGA encoding GCCCTAGCATCACCCAGGACTGCTCCTTCCAACACAGCCCC gene-3 ATCTCCTCCGACTTCGCTGTCAAAATCCGTGAGCTGTCTGAC UTR-poly TACCTGCTTCAAGATTACCCAGTCACCGTGGCCTCCAACCT (A)P3 GCAGGACGAGGAGCTCTGCGGGGGCCTCTGGCGGCTGGTC CTGGCACAGCGCTGGATGGAGCGGCTCAAGACTGTCGCTG GGTCCAAGATGCAAGGCTTGCTGGAGCGCGTGAACACGGA GATACACTTTGTCACCAAATGTGCCTTTCAGCCCCCCCCCAG CTGTCTTCGCTTCGTCCAGACCAACATCTCCCGCCTCCTGCA GGAGACCTCCGAGCAGCTGGTGGCGCTGAAGCCCTGGATC ACTCGCCAGAACTTCTCCCGGTGCCTGGAGCTGCAGTGTCA GCCCGACTCCTCAACCCTGCCACCCCCATGGAGTCCCCGGC CCCTGGAGGCCACAGCCCCGACAGCCCCGGGCGGCGGCAG CGGCGATATGCACCAGAAGAGAACCGCCATGTTCCAGGACC CTCAGGAGAGACCTAGGAAGCTGCCTCACCTGTGTACAGA GCTCCAGACAACCATCCACGACATCATCCTGGAGTGCGTGT ACTGTAAGCAGCAGCTGCTGAGAAGAGAGGTGTACGACTT CGCCTTCAGAGACCTGTGCATCGTGTACAGAGACGGCAACC CTTACGCCGTGTGCGATAAGTGTCTGAAGTTCTATTCCAAAA TCTCCGAATATAGGTACATGCACGGCGACACCCCTACCCTGC ACGAGTACATGCTGGACCTCCAGCCTGAGACCACAGACCTG TACTGCTACGAGCAGCTGAACGACAGCTCTGAGGAAGAGG ACGAGATTGACGGACCTGCTGGCCAGGCCGAGCCTGACAG AGCCCACTACAATATCGTGACATTCTGTTGCAAATGCGACTC CACACTGGACAAGTGCCTGAAGTTCTACAGCAAGATCTCTG AGTACAGATACTACTGCTACTCTGTGTACGGCACCACACTG GAGCAGCAGTACAACAAGCCTCTGTGCGACCTCCTGATCCG CTGCATCAACTGCCAGAAGCCTCTGTGCCCTGAGGAGAAGC AGAGACACCTGGACAAGAAGCAGCGGTTCCACAACATCAG AGGCAGATGGACCGGCAGGTGCATGTCCTGCTGTAGATCCT CCAGAACCAGACGGGAGACCCAGCTGCACTACAACATCGT 
GACCTTCTGCTGCAAGTGCGACTCTACCCTGAGACTGTGCG TGCAGTCTACCCACGTGGACATCAGAACCCTGGAGGACCTG CTGATGGGCACCCTGGGCATCGTGTGCCCTATCTGCTCTCAG AAGCCTATGGCCAGGTTCGAGGACCCTACCAGAAGACCCTA CAAGCTGCCTGACCTGTGCACCGAGCTGAACACCTCTCTGC AAGACATCGAGATCACCTGCGTGTACTGCAAGACCGTGCTG GAGCTGACCGAGGTGTTCGAGTTCGCCTTCAAGGACCTGTT CGTGGTGTACAGAGACAGCATCCCTCACGCTGCCTGCCACA AGTGCATCGACTTCTATTCCAGGATCAGGGAGCTGCGCTATT ACTCCGACTCTGTGATGTACGGCCCCAAGGCCACCCTCCAG GACATCGTGCTGCACCTGGAGCCTCAGAACGAGATCCCCGT GGACCTGCTGTGCCACGAGCAGCTGTCTGACTCTGAAGAG GAGAACGACGAGATCGACGGCGTGAACCACCAGCACCTGC CTGCCAGGAGAGCTGAACCCCAGCGGCATACCATGCTGTGT ATGTGCTTCTACTCTAGGATCAGAGAGCTGAGGTACTACTCT GACTCTGTGTACGGCGACACCCTGGAGAAGCTGACCAACA CCGGCCTGTACAACCTGCTGATCCGGTGCCTGAGGTGCCAG AAGCCTCTGAACCCTGCCGAGAAGCTGAGACACCTGAACG AGAAGAGAAGATTCCACAAGATCGCTGGCCACTACAGAGG CCAGTGCCACTCTTGCTGCAACAGAGCCAGACAGGAGAGA CTCCAGCGGAGAAGGGAGACCCAGGTGGCCAGGAGAGCC GAGCCTCAGAGACACACCATGCTGTGCATGTGCTGCAAGTG CGAGGCCAGAATCGAGCTGGTGGTGGAGAGCTCTGCCGAC GACCTGAGAGCCTTCCAGCAGCTGTTCCTGTCTACCCTGAG CTTCGTGTGCCCTTGGTGCGCCTCTCAGCAGTAAGGCGCGC CGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTG TTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGG CCTTGAGCATCTGGATTCTGCCTGCTCGCTTTCTTGCTGTCC AATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTA AACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTG CCTGAATTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGATATCAAA AAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAAAAAGA AAAAAAAAAAAAAAAA 25 5 ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAA UTR-HPV ACAGACACCGGATCCGCCACCATGGATGCTATGAAACGGGG protein CCTGTGCTGCGTGCTGCTCCTGTGCGGCGCTGTGTTTGTGA encoding GCCCTAGCATCACCCAGGACTGCTCCTTCCAACACAGCCCC gene-3 ATCTCCTCCGACTTCGCTGTCAAAATCCGTGAGCTGTCTGAC UTR-poly TACCTGCTTCAAGATTACCCAGTCACCGTGGCCTCCAACCT (A)P4 GCAGGACGAGGAGCTCTGCGGGGGCCTCTGGCGGCTGGTC CTGGCACAGCGCTGGATGGAGCGGCTCAAGACTGTCGCTG GGTCCAAGATGCAAGGCTTGCTGGAGCGCGTGAACACGGA GATACACTTTGTCACCAAATGTGCCTTTCAGCCCCCCCCCAG CTGTCTTCGCTTCGTCCAGACCAACATCTCCCGCCTCCTGCA GGAGACCTCCGAGCAGCTGGTGGCGCTGAAGCCCTGGATC ACTCGCCAGAACTTCTCCCGGTGCCTGGAGCTGCAGTGTCA 
GCCCGACTCCTCAACCCTGCCACCCCCATGGAGTCCCCGGC CCCTGGAGGCCACAGCCCCGACAGCCCCGGGCGGCGGCAG CGGCGATATGCACCAGAAGAGAACCGCCATGTTCCAGGACC CTCAGGAGAGACCTAGGAAGCTGCCTCACCTGTGTACAGA GCTCCAGACAACCATCCACGACATCATCCTGGAGTGCGTGT ACTGTAAGCAGCAGCTGCTGAGAAGAGAGGTGTACGACTT CGCCTTCAGAGACCTGTGCATCGTGTACAGAGACGGCAACC CTTACGCCGTGTGCGATAAGTGTCTGAAGTTCTATTCCAAAA TCTCCGAATATAGGTACATGCACGGCGACACCCCTACCCTGC ACGAGTACATGCTGGACCTCCAGCCTGAGACCACAGACCTG TACTGCTACGAGCAGCTGAACGACAGCTCTGAGGAAGAGG ACGAGATTGACGGACCTGCTGGCCAGGCCGAGCCTGACAG AGCCCACTACAATATCGTGACATTCTGTTGCAAATGCGACTC CACACTGGACAAGTGCCTGAAGTTCTACAGCAAGATCTCTG AGTACAGATACTACTGCTACTCTGTGTACGGCACCACACTG GAGCAGCAGTACAACAAGCCTCTGTGCGACCTCCTGATCCG CTGCATCAACTGCCAGAAGCCTCTGTGCCCTGAGGAGAAGC AGAGACACCTGGACAAGAAGCAGCGGTTCCACAACATCAG AGGCAGATGGACCGGCAGGTGCATGTCCTGCTGTAGATCCT CCAGAACCAGACGGGAGACCCAGCTGCACTACAACATCGT GACCTTCTGCTGCAAGTGCGACTCTACCCTGAGACTGTGCG TGCAGTCTACCCACGTGGACATCAGAACCCTGGAGGACCTG CTGATGGGCACCCTGGGCATCGTGTGCCCTATCTGCTCTCAG AAGCCTATGGCCAGGTTCGAGGACCCTACCAGAAGACCCTA CAAGCTGCCTGACCTGTGCACCGAGCTGAACACCTCTCTGC AAGACATCGAGATCACCTGCGTGTACTGCAAGACCGTGCTG GAGCTGACCGAGGTGTTCGAGTTCGCCTTCAAGGACCTGTT CGTGGTGTACAGAGACAGCATCCCTCACGCTGCCTGCCACA AGTGCATCGACTTCTATTCCAGGATCAGGGAGCTGCGCTATT ACTCCGACTCTGTGATGTACGGCCCCAAGGCCACCCTCCAG GACATCGTGCTGCACCTGGAGCCTCAGAACGAGATCCCCGT GGACCTGCTGTGCCACGAGCAGCTGTCTGACTCTGAAGAG GAGAACGACGAGATCGACGGCGTGAACCACCAGCACCTGC CTGCCAGGAGAGCTGAACCCCAGCGGCATACCATGCTGTGT ATGTGCTTCTACTCTAGGATCAGAGAGCTGAGGTACTACTCT GACTCTGTGTACGGCGACACCCTGGAGAAGCTGACCAACA CCGGCCTGTACAACCTGCTGATCCGGTGCCTGAGGTGCCAG AAGCCTCTGAACCCTGCCGAGAAGCTGAGACACCTGAACG AGAAGAGAAGATTCCACAAGATCGCTGGCCACTACAGAGG CCAGTGCCACTCTTGCTGCAACAGAGCCAGACAGGAGAGA CTCCAGCGGAGAAGGGAGACCCAGGTGGCCAGGAGAGCC GAGCCTCAGAGACACACCATGCTGTGCATGTGCTGCAAGTG CGAGGCCAGAATCGAGCTGGTGGTGGAGAGCTCTGCCGAC GACCTGAGAGCCTTCCAGCAGCTGTTCCTGTCTACCCTGAG CTTCGTGTGCCCTTGGTGCGCCTCTCAGCAGTAAGGCGCGC CGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTG TTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGG 
CCTTGAGCATCTGGATTCTGCCTGCTCGCTTTCTTGCTGTCC AATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTA AACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTG CCTGAATTCAAAAAAAAAAAAAAAAAAAGAAAAAAAAAA AAAAAAAAAGAAAAAAAAAAAAAAAAAAAGATATCAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAA 26 Sequence MDAMKRGLCCVLLLCGAVFVSPSITQDCSFQHSPISSDFAVKIR ofHPV ELSDYLLQDYPVTVASNLQDEELCGGLWRLVLAQRWMERLK protein TVAGSKMQGLLERVNTEIHFVTKCAFQPPPSCLRFVQTNISRLL QETSEQLVALKPWITRQNFSRCLELQCQPDSSTLPPPWSPRPLE ATAPTAPGGGSGDMHQKRTAMFQDPQERPRKLPHLCTELQTTI HDIILECVYCKQQLLRREVYDFAFRDLCIVYRDGNPYAVCDKC LKFYSKISEYRYMHGDTPTLHEYMLDLQPETTDLYCYEQLND SSEEEDEIDGPAGQAEPDRAHYNIVTFCCKCDSTLDKCLKFYS KISEYRYYCYSVYGTTLEQQYNKPLCDLLIRCINCQKPLCPEE KQRHLDKKQRFHNIRGRWTGRCMSCCRSSRTRRETQLHYNIV TFCCKCDSTLRLCVQSTHVDIRTLEDLLMGTLGIVCPICSQKP MARFEDPTRRPYKLPDLCTELNTSLQDIEITCVYCKTVLELTE VFEFAFKDLFVVYRDSIPHAACHKCIDFYSRIRELRYYSDSVM YGPKATLQDIVLHLEPQNEIPVDLLCHEQLSDSEEENDEIDGV NHQHLPARRAEPQRHTMLCMCFYSRIRELRYYSDSVYGDTLE KLTNTGLYNLLIRCLRCQKPLNPAEKLRHLNEKRRFHKIAGHY RGQCHSCCNRARQERLQRRRETQVARRAEPQRHTMLCMCCK CEARIELVVESSADDLRAFQQLFLSTLSFVCPWCASQQ
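The poly (A) tail coding sequences in the listing above can be inspected against the element definitions of claim 52 with a short script. The following is a minimal sketch, not the applicant's method. It assumes the length limits read from the claim (element a: 20 nt or more and fewer than 80 nt; element b: 3 nt or more and fewer than 20 nt; element d: 2 nt to 20 nt), some of which are garbled in the published text. The function names are hypothetical, and the example sequence is a synthetic layout built for illustration rather than an entry copied from the listing.

```python
import re

# Length limits as read from claim 52 (assumed where the published text is garbled).
A_MIN, A_MAX_EXCL = 20, 80   # element a: 20 <= length < 80
B_MIN = 3                    # element b: 3 <= length < 20
D_MIN, D_MAX = 2, 20         # element d: 2 <= length <= 20

def classify_elements(seq):
    """Return (kind, text) pairs that together cover the whole sequence."""
    elements = []
    # The capturing group makes re.split keep every maximal run of 3 or more A's.
    for part in re.split(r"(A{3,})", seq.upper()):
        if not part:
            continue
        n = len(part)
        if n >= B_MIN and set(part) == {"A"}:
            if A_MIN <= n < A_MAX_EXCL:
                kind = "a"
            elif n < A_MIN:
                kind = "b"
            else:
                kind = "other"  # run of 80 or more A's
        elif n == 1 and part != "A":
            kind = "c"  # single non-A nucleotide
        elif D_MIN <= n <= D_MAX and part[0] != "A" and part[-1] != "A":
            kind = "d"  # spacer with non-A ends and no run of 3 or more A's
        else:
            kind = "other"
        elements.append((kind, part))
    return elements

def summarize(seq):
    """Return (kind, length) pairs, which are easier to read than raw text."""
    return [(kind, len(text)) for kind, text in classify_elements(seq)]

# Synthetic example: long A run, GATATC spacer, then two short A runs split by G.
example = "A" * 60 + "GATATC" + "A" * 19 + "G" + "A" * 19
for kind, length in summarize(example):
    print(kind, length)
```

Because the A runs are split greedily, an element a and an element b can never be reported as adjacent. Whitespace introduced by line wrapping in the listing would need to be removed before a sequence is passed to `summarize`.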