RNA sequence adaptation

Abstract

The present invention is directed to a method for modifying the retention time of RNA on a chromatographic column. The present invention also concerns a method for purifying RNA from a mixture of at least two RNA species. Furthermore, the present invention relates to a method for co-purifying at least two RNA species from a mixture of at least two RNA species. In particular, the present invention provides a method for harmonizing the numbers of A and/or U nucleotides in at least two RNA species. The present invention is also directed to RNA obtainable by said methods, a composition comprising said RNA or a vaccine comprising said RNA and methods for producing such RNA and compositions. Further, the invention concerns a kit, particularly a kit of parts, comprising the RNA, composition or vaccine. The invention is further directed to a method of treating or preventing a disorder or a disease, first and second medical uses of the RNA, composition and vaccine. Moreover, the present invention concerns a method for providing an adapted RNA sequence or an adapted RNA mixture.

Claims

1. A method for analysis or purification of a mixture comprising at least two harmonized RNA species, the method comprising: a) obtaining the coding sequences for at least two RNA species, said at least two RNA species each having a length of 800 to 20,000 nucleotides wherein the sequence of at least one RNA species is adapted by altering the number of A and/or U nucleotides in the RNA sequence with respect to the number of A and/or U nucleotides in the original RNA sequence, said coding sequences of the at least two RNA species having a harmonized number of encoded A and U nucleotides that is no more than 50 different from each other; b) synthesizing the at least two RNA species to produce at least two harmonized RNA species; and c) analysing and/or purifying a mixture of said at least two harmonized RNA species by chromatography.

2. The method according to claim 1, wherein step b) comprises the separate synthesis of the at least two harmonized RNA species.

3. The method according to claim 2, wherein step b) comprises mixing the at least two harmonized RNA species.

4. The method according to claim 1, wherein step b) comprises the synthesis of the at least two harmonized RNA species in one batch.

5. The method according to claim 1, wherein step b) comprises an in vitro transcription step.

6. The method according to claim 1, wherein at least one RNA species comprises at least 500 nucleotides.

7. The method according to claim 6, wherein the at least two RNA species are mRNAs.

8. The method according to claim 7, wherein the at least two RNA species each comprise a 5′-cap structure.

9. The method according to claim 8, wherein the at least two RNA species each comprise, in 5′ to 3′ direction, the following elements: a) a 5′-cap structure b) optionally, a 5′-UTR element, c) at least one coding region; d) a 3′-UTR element, and e) a poly(A) sequence comprising 10 to 200.

10. The method according to claim 8, wherein the at least two RNA species each encode different Influenza virus hemagglutinin (HA) antigens.

11. The method according to claim 8, wherein the at least two RNA species each comprise a coding region, wherein the coding region has an increased G/C content compared to the G/C content of an original coding sequence, wherein the encoded amino acid sequence is not modified compared to the amino acid sequence encoded by the corresponding original mRNA.

12. The method according to claim 8, wherein the method is applied to at least three RNA species.

13. The method according to claim 12, wherein the at least three RNA species encode different influenza HA antigens.

14. The method according to claim 1, comprising: c) analysing the mixture of said at least two harmonized RNA species by chromatography.

15. The method according to claim 1, wherein the chromatography comprises HPLC.

16. The method according to claim 15, wherein the chromatography comprises reversed phase HPLC.

17. The method according to claim 16, wherein the reversed phase HPLC is with a column that comprises a porous material, selected from the group consisting of polystyrene, a non-alkylated polystyrene, an alkylated polystyrene, a polystyrenedivinylbenzene, a non-alkylated polystyrenedivinylbenzene, an alkylated polystyrenedivinylbenzene, a silica gel, a silica gel modified with non-polar residues, a silica gel modified with alkyl containing residues, selected from butyl-, octyl and/or octadecyl containing residues, a silica gel modified with phenylic residues, and a polymethacrylate.

18. The method according to claim 8, wherein the at least two RNA species each encode different Influenza virus neuraminidase (NA) antigens.

19. The method according to claim 1, wherein the numbers of A and U nucleotides in the sequences of the at least two harmonized RNA species differ from each other by not more than 20.

20. The method according to claim 19, wherein the numbers of A and U nucleotides in the sequences of the at least two harmonized RNA species differ from each other by not more than 10.

21. The method according to claim 13, wherein the numbers of A and U nucleotides in the sequences of the at least two harmonized RNA species differ from each other by not more than 20.

22. The method according to claim 8, wherein the method is applied to at least four RNA species, wherein the at least four RNA species encode different influenza HA antigens.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1: illustrates the technical problem associated with HPLC co-purification and/or co-analysis of RNA mixtures. Schematic drawings of HPLC histograms (RNA mixture comprising different RNA molecule species) are shown.

(2) FIGS. 1A and 1B: RNA molecule species of an RNA mixture differ in their retention times. Impurities such as abortive sequences cannot be separated from each other. In addition, as the histograms partially overlap, quality attributes, such as integrity of the individual species, cannot be determined. FIG. 1A shows the separate histograms for each RNA species in the mixture.

(3) FIGS. 1C and 1D: In rare cases, three RNA species of an RNA mixture may have similar retention times, and impurities such as abortive sequences could be separated from each other. In addition, as the peaks entirely overlap, quality attributes, such as integrity of the whole RNA mixture, could be determined. FIG. 1C shows the separate histograms for each RNA species in the mixture. FIG. 1E: RNA species have different retention times. As the histograms are entirely separated from each other, quality attributes, such as integrity of individual RNA species of the RNA mixture, can be determined.

(4) FIG. 2: illustrates the basic principle of adapting a given nucleic acid sequence encoding a short protein (amino acid sequence: AWHPVAC) to either increase the AU count or decrease the AU count. FIG. 2A: As initial step of the adaptation method/algorithm, all codons of the nucleic acid sequence are categorized and potential exchanges with alternative codons are allocated for each individual codon (codon changes that allow increase or decrease of AU count are listed in Table 1 and Table 2). In FIG. 2A, codons labeled with an asterisk (*) may allow for an increase in AU count if the respective codon is changed accordingly; codons labeled with a hash (#) may allow for a decrease in AU count if the respective codon is changed accordingly; codons labeled with a cross (“x”) do not lead to a change in AU count. FIG. 2B shows the adaptation of the input nucleic acid sequence to an AU count increased target sequence. All four codons, which can potentially be replaced by alternative codons, are changed to the codon with a larger AU count (codons highlighted). FIG. 2C shows the adaptation of the input nucleic acid sequence to an AU decreased target sequence. All three codons, which can potentially be replaced by alternative codons, are changed to the codon with a lower AU count (codons highlighted).

(5) FIG. 3: illustrates the modular design principle of the inventive multivalent/polyvalent mRNA vaccine platform (e.g. influenza), where RNA species can be exchanged rapidly without changing the manufacturing conditions (HPLC purification and/or HPLC analysis). From an AU adapted RNA sequence pool, RNA species encoding antigens, e.g. HA and/or NA antigens, can easily be exchanged (“+”: addition of new RNA species to the RNA mixture; “−”: removal of RNA species from the mixture) for e.g. seasonal influenza vaccine production (A: season A; B: season B; C: season C). The general concept can also be used for e.g. Norovirus antigens.

(6) FIG. 4: shows that the number of adenine nucleotides in an RNA sequence correlates with HPLC retention times. RNA species (1-4) encoding firefly luciferase with varying polyA sizes were generated and individually analyzed via HPLC. HPLC chromatograms were superimposed. 1=A25; 2=A35; 3=A50; 4=A64. A detailed description of the experiment is provided in Example 1.

(7) FIG. 5: shows that adaptation of the number of adenine nucleotides in RNA sequences enables harmonization of HPLC retention time. FIG. 5A: Adenine adapted RNA sequences encoding HA-B Brisbane are shown (superimposed). FIG. 5B: Adenine adapted RNA sequences encoding HA-B Phuket are shown (superimposed). (1) Non-adapted sequence; (2) 9 adenines introduced in cds; (3) 9 adenine stretch added in the UTR or (4) 9 adenines added in polyA. A detailed description of the experiment is provided in Example 2.

(8) FIG. 6: shows that adaptation of the adenine count enables harmonization of HPLC retention time of RNA sequences encoding HA-A and RNA sequences encoding HA-B. RNA sequences encoding HA-B were adapted to match the A count of HA-A RNA sequences by increasing the A count accordingly. The asterisk indicates that the HPLC peaks completely overlap (declining slope of the peak determines retention time). Of note: For the purpose of FIG. 6, a distinction of individual chromatograms is not required. A detailed description of the experiment is provided in Example 2.

(9) FIG. 7: shows that HPLC is a particularly suitable method for co-analysis of an RNA mixture. RNA Mixtures of intact and degraded RNA at different ratios were analyzed via HPLC. FIG. 7A: Overlay of HPLC chromatograms of different RNA mixtures showing the amount of intact RNA. FIG. 7B: Overlay of HPLC chromatograms of different RNA mixtures showing the amount of degraded RNA. Of note: For the purpose of FIG. 7, a distinction of individual chromatograms is not required. A detailed description of the experiment is provided in Example 3.

(10) FIG. 8: shows that an adenine adapted RNA mixture (encoding three different NA antigens) generates one discrete HPLC peak, suitable for co-analysis and co-purification. FIG. 8A: HPLC histograms for individual non-adapted RNA sequences (1-3) and the resulting non-adapted RNA mixture (4) are shown. FIG. 8B: HPLC histogram for individual sequence adapted RNA sequences (1-3) and the resulting harmonized RNA mixture (4) are shown. A detailed description of the experiment is provided in Example 4.

(11) FIG. 9: shows a further illustration of the technical problem. RNA species encoding HA and NA of Influenza virus A and B have partially overlapping HPLC chromatograms due to different AU counts, illustrating the problem in the art that co-purification and/or co-analysis of such an RNA mixture (comprising all seven RNA species) would be technically impossible. 1=RNA species encoding neuraminidase of Influenza virus B (NA-B, Brisbane); 2=two RNA species encoding neuraminidase of Influenza virus A (NA-A, Hongkong, California); 3=two RNA species encoding hemagglutinin of Influenza virus B (HA-B, Brisbane, Phuket); 4=two RNA molecule species encoding hemagglutinin of Influenza virus A (HA-A, Hongkong, California). A detailed description of the experiment is provided in Example 5.

(12) FIG. 10: shows that adenine and/or uracil count correlates with HPLC retention times.

(13) FIG. 10A: Total number/count of A and/or U and the content (%) of AU of different HA and NA RNA species plotted against the RNA retention time on HPLC is shown. The total number/count of A and/or U correlates with HPLC retention time, whereas the content of AU does not correlate with the HPLC retention time.

(14) FIG. 10B: Total number/count of G and/or C and the content (%) of GC of different HA and NA RNA species plotted against the RNA retention time on HPLC is shown. Both guanine/cytosine (G/C) count and content (%) do not correlate with HPLC retention times. A detailed description of the experiment is provided in Example 5.

(15) FIG. 11: illustrates the sequence adaptation strategy for an RNA mixture comprising three different RNA molecule species. Two product peaks have to be shifted by 17 AU or 40 AU (shifted peaks: dashed lines) to obtain a sequence adapted RNA mixture, where each of the three components can be co-analyzed on HPLC (AU count difference for the adapted sequences: 70 AU). A detailed description of the experiment is provided in Example 6.

(16) FIG. 12: overlay of HPLC chromatograms of constructs with different AU count on a monolithic ethylvinylbenzene-divinylbenzene copolymer. Flow rates and gradient profiles indicated. First peak corresponds to a construct with AU count 824, second peak corresponds to a construct with AU count 894, third peak corresponds to a construct with AU count 1137, fourth peak corresponds to a construct with AU count 1139, last peak corresponds to a construct with AU count 1424. A detailed description of the experiment is provided in Example 8.

(17) FIG. 13: overlay of HPLC chromatograms of constructs with different AU count on a particulate poly(styrene)-divinylbenzene (PVD) column. Flow rates and gradient profiles indicated. First peak corresponds to a construct with AU count 824, second peak corresponds to a construct with AU count 894, third peak corresponds to a construct with AU count 1137, fourth peak corresponds to a construct with AU count 1139, last peak corresponds to a construct with AU count 1424. A detailed description of the experiment is provided in Example 8.

(18) FIG. 14: overlay of HPLC chromatograms of constructs with different AU count on a Silica-based C4 column. Flow rates and gradient profiles indicated. First peak corresponds to a construct with AU count 824, second peak corresponds to a construct with AU count 894, third peak corresponds to a construct with AU count 1137, fourth peak corresponds to a construct with AU count 1139, last peak corresponds to a construct with AU count 1424. A detailed description of the experiment is provided in Example 8.

(19) FIG. 15: overlay of HPLC chromatograms of constructs with different AU count on a PLPR-S column. Flow rates and gradient profiles indicated. First peak corresponds to a construct with AU count 824, second peak corresponds to a construct with AU count 894, third peak corresponds to a construct with AU count 1137, fourth peak corresponds to a construct with AU count 1139, last peak corresponds to a construct with AU count 1424. A detailed description of the experiment is provided in Example 8.

(20) FIG. 16: plot of separation factor alpha against AU count difference for representative HPLC runs of Example 8. Monolithic: Values and respective trend line for monolithic ethylvinylbenzene-divinylbenzene copolymer with flowrate 0.5, cf. FIG. 12; PVD: Values and respective trend line for particulate poly(styrene)-divinylbenzene (PVD) column with flowrate 0.3, cf. FIG. 13; C4-F1: Values and respective trend line for silica-based C4 column with flowrate 1.0, cf. FIG. 14; PLPR-S: Values and respective trend line for PLPR-S column with flowrate 1.0, cf. FIG. 15. A detailed description of the experiment is provided in Example 8.

EXAMPLES

(21) The Examples shown in the following are merely illustrative and shall describe the present invention in a further way. These Examples shall not be construed to limit the present invention thereto.

(22) TABLE 3: Materials used
Material | Supplier
U3000 UHPLC-System | Thermo Scientific
HPLC column (poly(styrene-divinylbenzene) matrix) | Thermo Scientific
WFI | Fresenius Kabi, Ampuwa
Acetonitril (MS-grade) | Fisher Scientific
0.1M TEAA in WFI (Eluent A) |
25% ACN in 0.1M TEAA (Eluent B) |

Example 1: Examination of the Correlation Between Homopolymer Stretches of Nucleotides and HPLC Retention Times

(23) The inventors surprisingly found that it is not the size of an RNA, but the total number of adenine nucleotides (A nucleotides) and/or uracil nucleotides (U nucleotides) of an RNA, that influences HPLC retention times. Further details are provided in the following.

(24) 1.1. Preparation of DNAs Encoding Firefly Luciferase Including Varying Stretches of Adenines:

(25) The DNA sequence encoding firefly luciferase protein was introduced into a modified pUC19 derived vector backbone to comprise a 5′-UTR derived from the 32L4 ribosomal protein (32L4 TOP 5′-UTR) and a 3′-UTR derived from albumin, a histone-stem-loop structure, and stretches of varying numbers of adenine nucleotides (also referred to in the following as ‘polyA stretch’ or ‘A homopolymer’) at the 3′-terminal end. The complete RNA sequences are provided in the sequence listing (see Table 4 below).

(26) TABLE 4: Constructs used in the experiment
Encoded protein | Length of polyA stretch | SEQ ID NO:
luciferase | A25 | 1
luciferase | A35 | 2
luciferase | A50 | 3
luciferase | A64 | 4

(27) DNA plasmids were linearized using EcoRI and transcribed in vitro using DNA dependent T7 RNA polymerase in the presence of a nucleotide mixture and cap analog under suitable buffer conditions. The obtained individual RNA products were purified using PureMessenger® as described in WO 2008/077592 A1 and subsequently analyzed using HPLC.

(28) 1.2 Determination of HPLC Retention Times:

(29) Individual RNA samples were diluted to 0.1 g/L using water for injection (WFI). 10 µl of the diluted RNA sample were injected into the HPLC column (monolithic poly(styrene-divinylbenzene) matrix). The RP HPLC analysis was performed using the following conditions:

(30) Gradient 1: Buffer A (0.1 M TEAA (pH 7.0)); Buffer B (0.1 M TEAA (pH 7.0) containing 25% acetonitrile). Starting at 30% buffer B the gradient extended to 32% buffer B in 2 minutes, followed by an extension to 55% buffer B over 15 minutes at a flow rate of 1 ml/min (adapted from WO 2008/077592). Chromatograms were recorded at a wavelength of 260 nm.
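The gradient described above can be written down as a small piecewise-linear function, which makes it easy to see the acetonitrile concentration the column experiences at any time point. The following is a minimal sketch under stated assumptions: the ramps are taken to be linear, the 15-minute ramp is taken to follow directly after the 2-minute ramp (breakpoints at 0, 2 and 17 minutes), buffer A is taken to contain no acetonitrile, and the names `GRADIENT_1`, `percent_b` and `percent_acn` are hypothetical.

```python
# Breakpoints of Gradient 1 as (time in minutes, % buffer B), read from the text.
# Assumption: linear ramps, the second ramp starting when the first one ends.
GRADIENT_1 = [(0.0, 30.0), (2.0, 32.0), (17.0, 55.0)]
ACN_IN_BUFFER_B = 25.0  # % acetonitrile in buffer B; buffer A assumed ACN-free


def percent_b(t_min, gradient=GRADIENT_1):
    """Linearly interpolate % buffer B at time t_min, clamped to the end points."""
    if t_min <= gradient[0][0]:
        return gradient[0][1]
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return gradient[-1][1]


def percent_acn(t_min):
    """Overall acetonitrile percentage in the mobile phase at time t_min."""
    return percent_b(t_min) * ACN_IN_BUFFER_B / 100.0


for t in (0, 2, 17):
    print(t, percent_b(t), percent_acn(t))
```

Under these assumptions the mobile phase runs from 7.5% acetonitrile at the start to 13.75% acetonitrile at the end of the gradient.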

(31) In order to examine a possible correlation between the presence and the extent of nucleotide homopolymer stretches on the one hand and the HPLC retention time on the other hand, chromatograms of each HPLC analysis run were superimposed. FIG. 4 shows the superposition of HPLC runs of RNAs with varying stretches of adenine nucleotides.

(32) Results

(33) As shown in FIG. 4, the analysis of the HPLC retention time of different RNA molecule species differing only in the length of A nucleotide homopolymer stretches (i.e. in the total number of A nucleotides) shows a clear correlation between the total number of A nucleotides in the RNA sequences and HPLC retention time. Longer A homopolymers led to an increase in retention time, suggesting that the observed effect on HPLC retention time is caused by the increased number of A nucleotides (+25 adenine nucleotides; +35 adenine nucleotides; +50 adenine nucleotides; +64 adenine nucleotides).

(34) Notably, changes in the total number of cytosine nucleotides (C nucleotides) did not have an influence on HPLC retention time (not shown). As only the number of A nucleotides and not the number of C nucleotides influences HPLC retention times, an effect merely caused by elongation of the RNA molecule species can be ruled out.

Example 2: Harmonization of HPLC Retention Times of Different HA RNA Sequences for Co-Purification and/or Co-Analysis by Adaptation of the Adenine Count

(35) The inventors surprisingly found that the adaptation of the total number of A nucleotides in two or more different RNA species (e.g. RNA molecules comprising different sequences encoding influenza HA-B) is suitable to harmonize the HPLC retention times, so that co-purification and/or co-analysis becomes feasible. Further details are provided in the following.

(36) 2.1. Adaptation of the Total Number of A Nucleotides:

(37) As the previous examples show a correlation between the number of A nucleotides in an RNA sequence and the respective HPLC retention time, RNA sequences encoding HA antigens (four different RNA sequences encoding influenza HA) were adapted so that they comprised (essentially) the same number of A nucleotides. The sequence adaptation was performed in such a way that the encoded amino acid sequence was unchanged, either by exploiting the degeneracy of the genetic code (compare with Table 1 and Table 2) or by introducing an adenine stretch into the polyA tail or the UTR of the RNA molecule species.

(38) The goal was to adapt the sequences in a way to facilitate co-purification and/or co-analysis of an RNA mixture comprising different HA RNA molecule species by obtaining a complete overlay of the four chromatograms (harmonization) in HPLC, which is a prerequisite for a cost-effective and fast production of an influenza vaccine based on an mRNA mixture (e.g., for the development of a multivalent/polyvalent influenza RNA vaccine platform, cf. FIG. 3).

(39) In order to harmonize the retention times of all RNA molecule species encoding different HA antigens (HA-A and HA-B), GC-optimized DNA sequences encoding different HA proteins of Influenza B were adapted by increasing the number of A nucleotides by adapting the coding sequence (via codon exchange), by elongating the poly A sequence, or by introducing additional A nucleotides into the UTR region (see Table 5 below). The adaptation was performed by increasing the total number of A nucleotides in the HA-B sequences by 9 in order to shift the total number of A nucleotides in the HA-B sequences closer to the number of A nucleotides in the HA-A sequences. DNA constructs and RNA were prepared as explained in Example 1.

(40) TABLE 5: HA-constructs used in the experiment
Encoded Antigen | Mode of adaptation | A count of RNA* | AU count of RNA** | SEQ ID NO:
HA-B Brisbane | Not adapted | 467 | 723 | 5
HA-B Brisbane | 9 A nucleotides introduced into cds by codon exchange | 476 | 732 | 6
HA-B Brisbane | 9 A stretch introduced into poly A tail | 476 | 732 | 7
HA-B Brisbane | 9 A stretch introduced into the UTR | 476 | 734 | 8
HA-B Phuket | Not adapted | 458 | 717 | 9
HA-B Phuket | 9 A nucleotides introduced into cds by codon exchange | 467 | 726 | 10
HA-B Phuket | 9 A stretch in poly A tail | 467 | 726 | 11
HA-B Phuket | 9 A stretch introduced into the UTR | 467 | 728 | 12
HA-A California | Not adapted | 476 | 737 | 13
HA-A Hongkong | Not adapted | 481 | 729 | 14
*A-count of RNA: total number of A nucleotides in the respective RNA
**AU-count of RNA: total number of A and U nucleotides in the respective RNA

(41) 2.2. Effect of the Total Number of A Nucleotides on HPLC Retention Time:

(42) HPLC sample preparation and HPLC analysis were performed as described in Example 1. In order to examine the effect of the number of A nucleotides on HPLC retention time, the chromatograms of each RNA species were superimposed and analyzed.

(43) FIG. 5A shows four superimposed chromatograms for RNA molecule species encoding HA-B/Brisbane (one non-adapted sequence (1) and three adapted sequences (2, 3 and 4, respectively)). FIG. 5B shows four superimposed chromatograms for the RNA molecule species encoding HA-B/Phuket (one non-adapted sequence (1) and three adapted sequences (2, 3 and 4, respectively)). FIG. 6 shows superimposed chromatograms for adapted RNA molecule species (9 Adenines introduced into cds by codon exchange) encoding HA-B/Brisbane, adapted RNA molecule species encoding HA-B/Phuket and two non-adapted RNA molecule species encoding HA-A (HA-A California; HA-A Hongkong).

(44) Results:

(45) The results show that the adaptation of the number of A nucleotides in the RNA sequences (see FIG. 5A and FIG. 5B) by addition of 9 adenines led to a shift in HPLC retention time. As indicated in FIG. 5A and FIG. 5B, an A-stretch introduced either into the UTR or into the polyA tail had a slightly stronger effect on the retention time than A nucleotides introduced into the cds by codon exchange.

(46) FIG. 6 shows that HA-B sequences were successfully adapted, and that HA-A peaks and HA-B peaks are harmonized. This adaptation of the sequences, which results in completely overlapping HPLC peaks, allows for co-analysis of the individual RNA species in the mixture and for simultaneous determination of the integrity of the RNAs in the mixture. Moreover, harmonization of HPLC retention times allows for a simultaneous RNA purification (co-purification).

(47) Of note, as analyzed and explained in further detail in Example 5, the surprisingly precise overlap of the HA-A sequences and adapted HA-B sequences as observed in FIG. 6 can also be explained by the closely matching number of A nucleotides and U nucleotides (AU count) of the respective RNA sequences (HA-B Brisbane: AU count 732; HA-B Phuket: AU count 726; HA-A California: AU count 737; HA-A Hongkong: AU count 729; see Table 5).
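How closely the AU counts quoted above match can be checked with a few lines of arithmetic. The sketch below is illustrative only: the AU counts are those quoted in the preceding paragraph, the thresholds of 50, 20 and 10 are the maximum differences recited in claims 1, 19 and 20, and the function names `max_au_difference` and `is_harmonized` are hypothetical.

```python
from itertools import combinations

# AU counts quoted in the text for the RNA species overlaid in FIG. 6.
AU_COUNTS = {
    "HA-B Brisbane (adapted, cds)": 732,
    "HA-B Phuket (adapted, cds)": 726,
    "HA-A California": 737,
    "HA-A Hongkong": 729,
}


def max_au_difference(au_counts):
    """Largest pairwise difference in AU count among the given RNA species."""
    return max(abs(a - b) for a, b in combinations(au_counts.values(), 2))


def is_harmonized(au_counts, threshold):
    """True if no two RNA species differ by more than `threshold` A/U nucleotides."""
    return max_au_difference(au_counts) <= threshold


print(max_au_difference(AU_COUNTS))  # 737 - 726 = 11
for threshold in (50, 20, 10):
    print(threshold, is_harmonized(AU_COUNTS, threshold))
```

The largest difference in this set is 11 (HA-A California versus HA-B Phuket), so the four species fall within the thresholds of 50 and 20 but not within the threshold of 10.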

Example 3: Evaluation of Suitability of HPLC for Co-Analysis of RNA Mixture

(48) The inventors showed that HPLC is a particularly suitable method for co-analysis of an RNA mixture. Further details are provided in the following.

(49) 3.1. Preparation of Test RNA:

(50) RNA for testing the HPLC system was generated according to Example 1.

(51) 3.2. Directed Degradation of RNA and Preparation of RNA Mixtures of Different Integrities:

(52) RNA samples were degraded at 90° C. for 140 minutes. Subsequently, intact RNA and degraded RNA were mixed in different ratios of intact RNA: degraded RNA (90:10, 80:20, 70:30, 60:40, 50:50, 40:60, 30:70, 20:80, and 10:90) and respective RNA mixtures of varying integrities were applied to analytic HPLC. Analytic HPLC was performed as described in Example 1. For analysis, HPLC runs of the different RNA mixtures were superimposed. The results are shown in FIG. 7.

(53) Results:

(54) As FIG. 7 shows, RNA integrity of an RNA mixture can be determined in a scenario where the RNA mixture has the same retention time (harmonized RNA peak; the peaks of the individual RNA components, in that case intact RNA and degraded RNA, are completely overlapping). Accordingly, the analytic system is suitable for the analysis of a (polyvalent) RNA mixture according to the present invention.

Example 4: Harmonization of HPLC Retention Times of NA RNA Sequences for Co-Purification and/or Co-Analysis by Adaptation of the Number of A Nucleotides

(55) The inventors surprisingly found that an RNA mixture (encoding three different NA antigens) comprising RNA species with an adapted number of A nucleotides generates one harmonized HPLC peak, suitable for co-analysis and co-purification. Further details are provided in the following.

(56) 4.1. Adaptation of the Number of A Nucleotides in DNA Encoding NA Proteins of Several Influenza Strains:

(57) The goal was to adapt NA RNA sequences in a way to facilitate co-purification and/or co-analysis of an RNA mixture of different NA RNA molecule species by obtaining a complete overlay of the three chromatograms, which is a prerequisite for a cost-effective and fast production of an RNA-mixture based influenza vaccine (e.g. for the development of a multivalent influenza RNA vaccine, cf. FIG. 3).

(58) In order to harmonize the retention time of all RNA molecule species encoding different NA antigens, GC-optimized DNA sequences encoding different NA proteins of Influenza were adapted by decreasing the number of A nucleotides by altering the coding sequence (codon exchange; see Table 6 below). The adaptation was performed in order to decrease the number of A nucleotides in RNA encoding NA H3N2 and mRNA encoding NA H1N1 to essentially match the number of A nucleotides in RNA encoding NA Influenza B.

(59) DNA constructs and RNA were prepared as explained in Example 1.

(60) TABLE 6: NA-constructs used in the experiment
Encoded Antigen | Mode of adaptation | SEQ ID NO:
NA Influenza B (Brisbane) | Not adapted | 15
NA H3N2 (Hongkong) | Not adapted | 16
NA H3N2 (Hongkong) | 17 A removed from cds by codon exchange | 17
NA H1N1 (California) | Not adapted | 18
NA H1N1 (California) | 16 A removed from cds by codon exchange | 19

(61) 4.2. Effect of the Number of A Nucleotides on HPLC Retention Time:

(62) HPLC sample preparation and HPLC analysis were performed as described in Example 1.

(63) In order to examine the effect of the number of A nucleotides on HPLC retention time, the chromatograms of non-adapted RNA species were superimposed and analyzed. In addition, non-adapted RNA molecule species were mixed (100 ng each), applied as a mixture, and analyzed by HPLC (see FIG. 8A). Likewise, adapted NA H3N2 (Hongkong) RNA, adapted NA H1N1 (California) RNA, and NA Influenza B (Brisbane) RNA were mixed (100 ng each), applied as a mixture, and analyzed by HPLC (see FIG. 8B). FIG. 8A shows superimposed chromatograms for non-adapted RNA molecule species encoding NA (1, 2, 3) next to the chromatogram of the corresponding RNA mixture (4). FIG. 8B shows the chromatogram of the harmonized NA RNA mixture (4).

(64) Results

(65) The results show that the adaptation of the number of A nucleotides in the individual RNA sequences of an RNA mixture leads to adaptation of the retention time of the RNA mixture and a discrete HPLC peak (see FIG. 8B), which allows for co-analysis and co-purification (even though the individual HPLC peaks show a slight variation). In contrast, a non-adapted RNA mixture generates a broad, non-discrete HPLC double-peak (see FIG. 8A) that is not suitable for co-analysis and co-purification.

(66) Of note, the adaptation (reduction) of the number of A nucleotides in SEQ ID NOs: 17 and 19 was performed by changing serine codon AGC to codon UCC, which led to a decrease in A count and to an increase in U count (AU count was therefore stable; ratio of A:U was decreased), suggesting that the observed slight variation in the HPLC chromatograms of the individually analyzed adapted sequences was caused by a shift in the A:U ratio. Accordingly, adaptation of the A:U ratio can also be used for sequence adaptations according to the invention.
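The bookkeeping behind the AGC to UCC exchange described above can be verified directly. The following is a minimal sketch; the function names `base_counts` and `exchange_effect` are hypothetical and only count nucleotides in the two codons named in the text.

```python
def base_counts(seq):
    """Number of A and U nucleotides in an RNA sequence."""
    return {"A": seq.count("A"), "U": seq.count("U")}


def exchange_effect(old_codon, new_codon):
    """Change in A count, U count and AU count caused by one codon exchange."""
    old, new = base_counts(old_codon), base_counts(new_codon)
    delta_a = new["A"] - old["A"]
    delta_u = new["U"] - old["U"]
    return {"delta_A": delta_a, "delta_U": delta_u, "delta_AU": delta_a + delta_u}


# Serine codon exchange named in the text: one A lost, one U gained, AU count unchanged.
print(exchange_effect("AGC", "UCC"))
```

Each such exchange removes one A and adds one U, so a series of these exchanges lowers the A count and the A:U ratio while leaving the AU count unchanged.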

Example 5: Examination of the Influence of Nucleotides on HPLC Retention Time

(67) As shown in the previous examples, the adaptation of the number of A nucleotides in RNA sequences allows for harmonization of HPLC chromatograms, which is a requirement for co-analysis and/or co-purification. The inventors further found that the number of A and/or U nucleotides correlates with HPLC retention time. That finding provides even more options for adapting an RNA sequence and for harmonizing HPLC chromatograms of RNA mixtures. Further details are provided in the following.

(68) 5.1. Preparation of DNA Encoding HA Proteins of Several Influenza Strains:

(69) DNA sequences encoding different haemagglutinin (HA) and neuraminidase (NA) proteins, two glycoproteins found on the surface of influenza viruses (Influenza A and Influenza B), were generated, and RNA was produced as described in Example 1.

(70) TABLE 7: HA-constructs used in the experiment
Encoded antigen | SEQ ID NO:
HA-A California | 13
HA-A Hongkong | 14
HA-B Brisbane | 5
HA-B Phuket | 9
NA H1N1 (California) | 18
NA H3N2 (Hongkong) | 16
NA Influenza B (Brisbane) | 15

(71) 5.2. Correlation Between the Total Number of a Nucleotide and/or the Relative Content of a Nucleotide and the HPLC Retention Time:

(72) HPLC sample preparation and HPLC analysis were performed as described in Example 1.

(73) In a first step, the individually produced RNA constructs (RNA species) encoding HA and NA antigens were separately analyzed on HPLC. The superimposed HPLC chromatograms are shown in FIG. 9. The superimposition of the chromatograms of the seven different RNA species showed that all chromatograms partially overlap, which makes both co-purification and co-analysis via HPLC technically impossible (compare illustration of problem in the art, FIG. 1).

(74) For a better understanding of the impact of the nucleotide sequence on HPLC retention time, the correlation between the nucleotide count (A, U, G, and C) and nucleotide content for each RNA molecule species and HPLC retention time was examined.

(75) FIG. 10A shows the correlation between number and content (AU %) of A and U nucleotides of different HA and NA RNA species and their respective HPLC retention times. FIG. 10B shows the correlation between number and content (GC %) of guanine (G) and cytosine (C) nucleotides of different HA and NA RNA species and their respective HPLC retention times.

(76) Results:

(77) FIG. 10A shows a clear correlation of the number of A and/or U nucleotides with the respective retention time. Such a correlation was not found for the content of A and U nucleotides (AU %). FIG. 10B shows HPLC retention times are neither influenced by the number of G and/or C nucleotides, nor by the content of G and/or C nucleotides (GC %).

(78) Notably, the correlation between the number of A nucleotides and the retention time is stronger than the correlation between the number of U nucleotides and the retention time. In line with that, the results of Example 4 also suggested that the effect of A nucleotides on retention time is stronger than the effect of U nucleotides on retention time.

(79) Overall, the number of A and U nucleotides shows the best correlation and will allow for the most precise way for adapting RNA sequences to harmonize RNA mixtures for co-analysis and co-purification.
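The distinction drawn above between the number of A and U nucleotides (AU count) and their relative content (AU %) can be illustrated with a minimal Python sketch. The function name `au_stats` and the two short sequences are hypothetical and serve only as illustration; they are not taken from the sequence listing.

```python
def au_stats(seq):
    """Return A, U and AU counts plus AU content (%) for an RNA or DNA string."""
    rna = seq.upper().replace("T", "U")  # treat T (DNA) as U (RNA)
    a, u = rna.count("A"), rna.count("U")
    return {"A": a, "U": u, "AU": a + u, "AU%": 100.0 * (a + u) / len(rna)}


# Two hypothetical sequences with identical AU content but different AU counts
short = au_stats("AUGC")
long_ = au_stats("AUGCAUGC")
print(short["AU"], short["AU%"])
print(long_["AU"], long_["AU%"])
```

Both sequences have the same AU content, yet the longer one carries twice the AU count, which is the quantity reported above to correlate with retention time.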

Example 6: Development of an Automated Nucleotide Adaptation Method (Algorithm)

(80) The inventors developed an automated in silico method (algorithm) to set the number of any nucleotide in an RNA sequence to a certain defined value, without altering the amino acid sequence. In the context of the invention, the automated in silico method was used for sequence adaptation (adaptation of the number of A and/or U nucleotides (AU count)) of RNA sequences to allow harmonization of RNA mixtures for HPLC co-analysis and/or HPLC co-purification. Further details are provided in the following.

(81) 6.1 Sequence Analysis and Definition of Target AU Count:

(82) The objective of the experiment was to generate RNA sequences for an adapted RNA mixture (comprising three different RNA molecule species encoding antibodies) suitable for co-analysis using HPLC. The AU count has to be adapted in all RNA molecule species such that their respective HPLC chromatograms are completely separated (difference in the AU count of at least 70), allowing for co-analysis of their integrity.

(83) Three antibody sequences (SEQ ID NOs: 20-22) were selected and GC optimized DNA sequences (SEQ ID NOs: 23-25) were generated (essentially according to Example 1). Nucleotide numbers were determined for the respective GC optimized sequences (product 1, product 2, product 3; see Table 8 below) to be able to define optimal numbers of A and U (T) nucleotides for HPLC co-analysis.

(84) TABLE 8 Nucleotide numbers for GC optimized constructs:

product   Length   A count   T (U) count   AT (AU) count   SEQ ID NO:
1         81       19        13            32              23
2         258      59        26            85              24
3         429      77        55            132             25

(85) To adapt the RNA molecule species comprised in the RNA mixture for HPLC co-analysis, the target AU counts for product 2 and product 3 were set to the following values, allowing integrity analysis on HPLC when analyzed as an RNA mixture:

(86) TABLE 9 Adaptation strategy for co-analysis of the RNA mixture:

product   AT (AU) count (non-adapted)   Change in AU count   Target AT (AU) count
1         32                            0                    32
2         85                            +17                  102
3         132                           +40                  172

(87) As indicated in Table 9, the target AU count for each product RNA was set in such a manner that the AU counts of the three RNA sequences differ by at least 70 nucleotides (strategy illustrated in FIG. 11).
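The arithmetic behind this adaptation strategy can be sketched in Python. The rule used here (keep the lowest AU count unchanged and raise each following count until it is at least 70 above the previous target) is an assumption made for illustration; it reproduces the stated targets but is not described as such in the text. The function name `assign_targets` is hypothetical.

```python
def assign_targets(au_counts, min_gap=70):
    """Assign target AU counts so that neighbouring targets differ by >= min_gap.

    Assumption for illustration: counts are only ever increased, and the
    product with the lowest AU count is left unchanged.
    """
    targets = {}
    previous = None
    for product, count in sorted(au_counts.items(), key=lambda kv: kv[1]):
        target = count if previous is None else max(count, previous + min_gap)
        targets[product] = target
        previous = target
    return targets


# Non-adapted AT (AU) counts of products 1 to 3
non_adapted = {1: 32, 2: 85, 3: 132}
targets = assign_targets(non_adapted)
changes = {p: targets[p] - non_adapted[p] for p in non_adapted}
print(targets)
print(changes)
```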

(88) 6.2 AU Sequence Adaptation Method:

(89) In the following, the sequence adaptation method is exemplarily described for product 2 (+17 AU) (SEQ ID NO: 24). As the number of A nucleotides in the sequence was larger than the number of T (U) nucleotides, the adaptation values were set to +8A and +9T(U) in order to maintain the distribution of A and U nucleotides in the resulting AU adapted sequence.

(90) In the initial phase of the method (algorithm), a matrix for each codon comprised in the sequence was created, identifying possible changes (herein referred to as “exchange matrix”). An exemplary “exchange matrix” is shown in Formula (I).

(91) $\left. \begin{matrix} A & 1 \\ C & 1 \\ G & 1 \\ T & 1 \\ * & 4 \end{matrix} \right\} \text{CGA} \qquad \text{Formula (I)}$

(92) Formula (I) shows that for codon “CGA” a change to an alternative codon offers the option of increasing the number of A nucleotides by 1 (e.g. CGA -> AGA), offers the option of increasing the number of C nucleotides by 1 (e.g. CGA -> CGC), offers the option of increasing the number of G nucleotides by 1 (e.g. CGA -> CGG), and offers the option of increasing the number of T nucleotides by 1 (e.g. CGA -> CGT).

(93) Exchange matrices were generated for each individual codon in the sequence. Using said exchange matrices, the potential maximum number of the respective nucleotides (A and T(U) count, respectively) in each codon was determined (without changing the amino acid sequence). Accordingly, all 63 codons of the sequence were analyzed, and the potential alternative codons were assembled in a table structure as shown in Table 10 by way of example.

(94) TABLE 10 Exemplary table of alternative codons allowing for a change in the number of a nucleotide

Codons   Alternative codons
CGA      CGT, CGC, CGG, AGA, AGG
GAT      GAC
GAC      GAT
ATG      no alternative codon
. . .    . . .
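A minimal Python sketch of how an exchange matrix and the list of alternative codons could be derived from the standard genetic code is given here. Two points are assumptions: each matrix entry is taken as the largest gain in that nucleotide offered by any synonymous codon, and the “*” entry is taken as the sum of the four nucleotide entries (the text gives *=4 for CGA and *=0 for ATG, which is consistent with this reading but does not define it). The function names are hypothetical.

```python
from itertools import product

# Standard genetic code in DNA notation, base order T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonyms(codon):
    """Alternative codons that encode the same amino acid as `codon`."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa and c != codon)


def exchange_matrix(codon):
    """Largest possible gain per nucleotide when swapping to a synonymous codon."""
    alts = synonyms(codon)
    matrix = {}
    for nt in "ACGT":
        gains = [alt.count(nt) - codon.count(nt) for alt in alts]
        matrix[nt] = max([0] + gains)  # 0 if no alternative adds this nucleotide
    matrix["*"] = sum(matrix[nt] for nt in "ACGT")  # assumed meaning of "*"
    return matrix


for codon in ("CGA", "GAT", "GAC", "ATG"):
    print(codon, synonyms(codon) or "no alternative codon", exchange_matrix(codon))
```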

(95) Next, the sequence according to SEQ ID NO: 24 was iteratively divided into separate codons and stored in table format, which resulted in a list as exemplarily shown in Table 11 (positions 1, 2, 3, 4 . . . 86 and 87 of the sequence are indicated).

(96) TABLE 11 Codon list of SEQ ID NO 24:

Codon position   Codon
1                ATG
2                AGC
3                ATC
4                ATC
. . .            . . .
86               GAG
87               AGC
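Dividing a coding sequence into a numbered codon list can be sketched as follows. The function name `split_codons` is hypothetical, and the input string contains only the first four codons indicated for SEQ ID NO: 24 (ATG, AGC, ATC, ATC); the remainder of the sequence is not reproduced here.

```python
def split_codons(seq):
    """Return (position, codon) pairs with 1-based codon positions."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [(i // 3 + 1, seq[i:i + 3]) for i in range(0, len(seq), 3)]


# First four codons only; the full sequence is not shown here
for position, codon in split_codons("ATGAGCATCATC"):
    print(position, codon)
```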

(97) Next, the list of codons (see Table 11) was analysed for possible codon changes by step-wise iteration, wherein in each iteration step the corresponding codon was analysed using the respective exchange matrix (as outlined above) for potential nucleotide changes. For example, if no changes were theoretically possible in the respective codon, e.g. as in the case of “ATG” or “TGG”, the corresponding exchange matrix as exemplarily shown in Formula (II) was used (* of exchange matrix=0).

(98) $\left. \begin{matrix} A & 0 \\ C & 0 \\ G & 0 \\ T & 0 \\ * & 0 \end{matrix} \right\} \text{ATG} \qquad \text{Formula (II)}$

(99) Formula (II) shows that for codon “ATG” a change to an alternative codon offers no option of increasing the number of A nucleotides, C nucleotides, G nucleotides or T nucleotides (as there are no alternative codons for ATG (Met)).

(100) In cases where changes according to the respective exchange matrix (*>0) were theoretically possible, the codon was further analysed to determine whether these changes could be implemented under the premise that e.g. only codons offering the option of increasing the number of A and/or T(U) nucleotides were adapted. Therefore, the intersection between the target nucleotides (e.g. A and/or T(U)) and the nucleotides that potentially generate a positive result (that is, A and/or T(U) change; see e.g. Formula (I)) in the current exchange matrix was constructed. As a result, each codon was assigned to one of three categories:

(101) Category 1 (Category “Favourable”):

(102) Potential codon exchanges allowing an increase in only one target nucleotide (in the present example A or T(U)). For example, the codon “GAC” (Asp) can be changed to “GAU” (Asp) in order to increase the number of T(U) nucleotides. No further analysis is required since that modification does not have any further impact (besides the one mentioned above) on the number of A and T(U) nucleotides.

(103) Category 2 (Category “Possible”):

(104) Potential codon exchanges allowing the increase in both target nucleotides (in the present example A and T(U)). For example, codon “GCA” can be changed to “GCU”, which would increase the T(U) count but at the same time decrease the A count. Accordingly, further analysis would be required with respect to codons belonging to this category in order to decide whether the number of one of the two target nucleotides (T(U) in this example) should be increased at the expense of a reduction in the number of the other target nucleotide (A).

(105) Category 3 (Category “Impossible”):

(106) Codons in the RNA sequence for which no alternative codons exist (*=0). Examples of category 3 are “ATG” (Met; start codon) and “UGG” (Trp).

(107) All codons of the original sequence were categorized in that manner. After this step, there were three categories with a total of 87 entries (for 87 codons present in SEQ ID NO: 24). For the next step, category 3 was no longer considered, as a codon change will not influence the target nucleotide count (A, T(U)).
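The categorization step can be sketched in Python under one possible reading of the three categories. In this reading, a codon is “favourable” if exactly one target nucleotide can be gained and a synonymous codon provides that gain without lowering the other target; it is “possible” if both targets can be gained or if the gain comes only at the expense of the other target (as for GCA); and it is “impossible” otherwise. Grouping codons that have alternatives but offer no A or T(U) gain under “impossible” is an assumption, since the text does not address that case. The function name `categorize` is hypothetical.

```python
from itertools import product

# Standard genetic code in DNA notation, base order T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def categorize(codon, targets=("A", "T")):
    """Return (category, gainable targets) for one codon; an illustrative rule."""
    aa = CODON_TABLE[codon]
    alts = [c for c, a in CODON_TABLE.items() if a == aa and c != codon]
    # Change in each target nucleotide count for every synonymous codon
    deltas = [{t: alt.count(t) - codon.count(t) for t in targets} for alt in alts]
    gainable = {t for t in targets if any(d[t] > 0 for d in deltas)}
    if not gainable:
        return "impossible", set()  # includes ATG and TGG (no alternatives)
    if len(gainable) == 1:
        (t,) = gainable
        others = [o for o in targets if o != t]
        # A "clean" exchange raises t and leaves the other target untouched
        if any(d[t] > 0 and all(d[o] >= 0 for o in others) for d in deltas):
            return "favourable", gainable
    return "possible", gainable


for codon in ("GAC", "AAG", "GCA", "ATC", "ATG", "TGG"):
    print(codon, categorize(codon))
```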

(108) Next, it was calculated how many potential nucleotide changes have been identified for all target nucleotides (A, T(U)). For the SEQ ID NO: 24 the possible nucleotide changes are listed (category 1, category 2; see Table 12).

(109) TABLE 12 Nucleotide changes that can potentially be applied to SEQ ID NO 24

Nucleotide   Potential changes   Category
A            22                  1
T            15                  1
A, T         47                  2

(110) Accordingly, 22 favourable changes (see category 1/“favourable” as explained above) were identified for A and 15 favourable changes were identified for T(U). As the adaptation values were set to +8A and +9T(U), all adaptations were taken from category 1. Table 13 summarizes the introduced codon exchanges, which were equally distributed across the sequence.

(111) TABLE 13 Codon exchanges introduced into SEQ ID NO 24

Position A   Exchange      A increase   Position T   Exchange      T(U) increase
30           AAG -> AAA    +1           3            AGC -> AGT    +1
48           AAG -> AAA    +1           12           AGC -> AGT    +1
78           AAG -> AAA    +1           63           GAC -> GAT    +1
141          CAG -> CAA    +1           90           AGC -> AGT    +1
165          GAG -> GAA    +1           102          AGC -> AGT    +1
213          GAG -> GAA    +1           117          AAC -> AAT    +1
234          GAG -> GAA    +1           138          AGC -> AGT    +1
252          GAG -> GAA    +1           150          AAC -> AAT    +1
                                        180          AGC -> AGT    +1
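How the exchanges were distributed equally across the sequence is not specified in detail. The following Python sketch shows one simple way of choosing a required number of exchanges at roughly even spacing from an ordered list of candidate positions; the selection rule and the function name `pick_evenly` are assumptions made for illustration, and the candidate list is a placeholder rather than the actual positions in SEQ ID NO: 24.

```python
def pick_evenly(candidates, n):
    """Pick n entries spread evenly over an ordered list of candidates."""
    if n <= 0:
        return []
    if n >= len(candidates):
        return list(candidates)  # not enough candidates: use all of them
    step = len(candidates) / n
    # Take the candidate nearest the middle of each of n equal-sized bins
    return [candidates[int(i * step + step / 2)] for i in range(n)]


# Placeholder: 22 favourable candidates, of which 8 are needed
candidates = list(range(22))
print(pick_evenly(candidates, 8))
```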

(112) These exchanges resulted in the following adapted sequence according to SEQ ID NO: 25.

(113) In the above described example, potential nucleotide exchanges from category 2 were not implemented. However, in scenarios where e.g. T(U) counts larger than 15 are needed, codons from category 2 are used as soon as all codons from category 1 have been used, in order to obtain additional adaptation possibilities for A and T(U) counts. If category 2 is required in order to achieve the desired nucleotide counts, calculation of the following ratio will identify the exchange nucleotide (nucleotide A or T(U)):

(114) $\frac{c_{i}}{x_{i} - p_{i}}$
wherein i represents the corresponding target nucleotide, $c_{i}$ is the count of possible adaptation positions of i in category 2, $x_{i}$ is the desired threshold for i, and $p_{i}$ is the count of identified adaptation positions of i that have already been changed. All calculated ratios are ranked and, starting from the lowest ratio and proceeding to the highest, the changes from category 2 are applied until the desired threshold has been reached or until all the possible exchanges from category 2 have been performed. This procedure is carried out iteratively for all targets for which the desired numbers cannot be achieved by using only exchanges according to category 1.

(115) For example, category 2 contains 47 codons (see Table 12), which could potentially be exchanged in order to increase the A or T count. Accordingly, further changes are implemented from category 2 until the desired threshold has been reached or until all the codons from category 2 have been used as well. For SEQ ID NO: 24, a change of the T(U) count to e.g. 20 would result in additional adaptations by using the following alternative codons from category 2 (additional codon exchanges from category 1 are not shown): 6=ATT, 9=ATT, 21=CTT, 144=CCT, 183=TCT.
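The ranking by the ratio $c_{i}/(x_{i} - p_{i})$ can be sketched in Python. The function name `rank_targets` is hypothetical, and skipping targets whose threshold has already been reached (where $x_{i} - p_{i}$ would be zero or negative) is an assumption. The example values are illustrative: 47 category 2 candidates, a desired T(U) increase of 20 of which 15 are already covered by category 1, and an A target that is already met.

```python
def rank_targets(candidates, desired, applied):
    """Rank target nucleotides by c_i / (x_i - p_i), lowest ratio first.

    candidates: c_i, category 2 positions available per target
    desired:    x_i, desired threshold per target
    applied:    p_i, positions already changed per target
    """
    ratios = {}
    for target, c in candidates.items():
        remaining = desired[target] - applied[target]
        if remaining > 0:  # assumption: targets already satisfied are skipped
            ratios[target] = c / remaining
    return sorted(ratios, key=ratios.get), ratios


order, ratios = rank_targets(
    candidates={"A": 47, "T": 47},
    desired={"A": 8, "T": 20},
    applied={"A": 8, "T": 15},
)
print(order, ratios)
```

A low ratio indicates that few candidate positions are available relative to the number of changes still needed, so that target is served first.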

(116) In cases where the desired target nucleotide count cannot be achieved (as all alternative codons from category 1 and 2 have been used, which means that no further changes are possible), an adapted sequence is generated that matches the target nucleotide count as closely as possible.
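A simplified, self-contained Python sketch of the adaptation loop is given here. It is not the method as implemented by the inventors: it applies only exchanges that raise one target nucleotide by one without changing the other, it walks the sequence from the first codon onward rather than distributing exchanges equally, and it ignores category 2. The function name `adapt` and the short input sequence are hypothetical. Any shortfall is returned, so that a sequence as close as possible to the target is still produced.

```python
from itertools import product

# Standard genetic code in DNA notation, base order T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def adapt(seq, want):
    """Raise A and T counts by want['A'] and want['T'] via synonymous codons.

    Returns the adapted sequence and the increases that could not be realized.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    remaining = dict(want)
    for idx, codon in enumerate(codons):
        aa = CODON_TABLE[codon]
        alts = sorted(c for c, a in CODON_TABLE.items() if a == aa and c != codon)
        for target, other in (("A", "T"), ("T", "A")):
            if remaining[target] <= 0:
                continue
            # Accept only exchanges adding exactly one target nucleotide
            # while leaving the count of the other target unchanged
            hit = next((alt for alt in alts
                        if alt.count(target) - codon.count(target) == 1
                        and alt.count(other) == codon.count(other)), None)
            if hit:
                codons[idx] = hit
                remaining[target] -= 1
                break  # at most one exchange per codon
    return "".join(codons), remaining


original = "ATGAAGGACTGG"  # hypothetical sequence encoding M-K-D-W
adapted, shortfall = adapt(original, {"A": 1, "T": 1})
print(adapted, shortfall, translate(adapted) == translate(original))
```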

(117) In order to further optimize the above described method (algorithm), the following improvements are implemented:
1. The basic equal distribution, which was used in the experiments described above, is based on the exchange possibilities. Other distribution models may also be envisaged, such as normal distribution, first occurrences distribution, last occurrences distribution or random-based distribution. Alternatively, the mean of the possible changes or median of the possible changes may be determined and all exchanges may be arranged around these values.
2. The exchange matrix contains additional information about the codon for the target sequence (e.g. codon usage etc.). This creates a further criterion for the question of whether a codon exchange is desirable or not, facilitating adaptation to a specific codon usage or a different nucleotide ratio in the target sequence.
3. Implementation of a third category for sequences or motifs which should be avoided by an exchange (e.g. a recognition motif of a restriction enzyme, promoter sequences or sequences forming undesired secondary structures, etc.).
4. Automated binning of input sequences, based on their length and the occurrence of the desired target nucleotides in order to identify optimal nucleotide counts for A and/or U adaptation.

Example 7: Generation and Use of a Polyvalent Influenza Virus RNA Platform for Fast-Adjustable Influenza Vaccine Production

(118) A pool of GC optimized coding sequences encoding HA antigens was AU adapted to an AU count of 612 (360 A and 252 U), resulting in an AU adapted HA sequence pool (SEQ ID NOs: 26-16263). A pool of GC optimized coding sequences encoding NA antigens was AU adapted to an AU count of 488 (271 A and 217 U), resulting in an AU adapted NA sequence pool (SEQ ID NOs: 16264-30567). AU adaptations were performed according to the invention, essentially as described in Example 6. The adaptation allows co-purification of RNA mixtures comprising adapted HA RNA sequences and co-purification of RNA mixtures comprising adapted NA sequences. Moreover, the adaptation allows co-analysis of an RNA mixture comprising adapted HA and NA RNA species, as the RNA sequences encoding HA (AU count 612) and the RNA sequences encoding NA (AU count 488) generate separated peaks (AU count difference: 124), suitable for analysis of integrity using HPLC.

(119) HA RNA mixtures are produced according to procedures as disclosed in the PCT application WO2017/109134 using GC optimized AT adapted DNA templates (generated as described in Example 1). In short, a DNA construct mixture (each construct comprising a different HA coding sequence and a T7 promoter) is used as a template for simultaneous RNA in vitro transcription to generate a mixture of HA mRNA constructs. Subsequently, the obtained harmonized RNA mixture is used for co-purification using RP-HPLC.

(120) In a parallel reaction, NA RNA mixtures are produced according to procedures as disclosed in the PCT application WO 2017/109134 using GC optimized AT adapted DNA templates. In short, a DNA construct mixture (each construct comprising a different NA coding sequence and a T7 promoter) is used as a template for simultaneous RNA in vitro transcription to generate a mixture of NA mRNA constructs. Subsequently, the obtained harmonized RNA mixture is used for co-purification using RP-HPLC.

(121) The purified mRNA mixture encoding HA antigens and the purified mRNA mixture encoding NA antigens are mixed to generate an HA/NA RNA mixture. The integrity of the mixture (that is, of the NA peak and the HA peak) is co-analyzed via HPLC as described herein.

(122) Advantageously, the AU adaptation of HA sequences and NA sequences in order to harmonize chromatographic peaks for HPLC based co-purification and co-analysis according to the invention facilitates the production of mRNA-based multivalent influenza vaccines, which may be quickly adapted to demand, e.g. in seasonal influenza vaccine design or in a pandemic scenario (compare with FIG. 3).

Example 8: Suitability of the Method on Various Reverse Phase HPLC Matrices

(123) The inventors found that modification of the retention time of an RNA via adaptation of A and/or U count is not restricted to a certain reverse phase column chemistry.

(124) To test whether a modification in retention time via an adaptation of A and/or U count can also be observed on other reverse phase columns (in Examples 1-7, a monolithic poly(styrene-divinylbenzene) matrix has been used), the following columns were tested:
monolithic ethylvinylbenzene-divinylbenzene copolymer (ThermoFisher Scientific) (see FIG. 13)
particulate poly(styrene)-divinylbenzene column (ThermoFisher Scientific) (see FIG. 14)
Silica-based C4 column (ThermoFisher Scientific) (see FIG. 15)
PLRPS, non-alkylated porous poly(styrene-divinylbenzene) matrix (see FIG. 16)

(125) RNA encoding yellow fever virus antigens was generated. The constructs encode the same antigen (YFV(17D)-prME), comprise the same UTR elements, and have the same size. The AU count of each construct was changed via coding-sequence adaptation. The constructs are listed in Table 14.

(126) TABLE 14 AU adapted YFV constructs

SEQ ID NO   Antigen   RNA size   A     U     G     C     AU     GC
30588       prME      2311       474   350   544   943   824    1487
30589       prME      2311       587   307   704   713   894    1417
30590       prME      2311       633   504   627   547   1137   1174
30591       prME      2311       788   351   555   617   1139   1172
30592       prME      2311       886   538   516   371   1424   887

(127) To evaluate the effect of AU adaptation on retention time, 500 µg of each construct was applied individually to the respective column. In addition, two different flow rates were tested for each column. Results are shown in FIGS. 12-15.

(128) In addition, the separation factor alpha was determined for each construct on each column tested. In chromatography, the separation factor alpha expresses the ratio of retention times of two compounds. Accordingly, a separation factor larger than 1.0 means that separation of the two compounds occurred. In the present analysis, SEQ ID NO: 30592 (with AU 1424) was taken as a reference for separation factor calculation. The calculated separation factors are shown in Table 15. The obtained separation factor values were plotted against the AU count difference of the constructs (SEQ ID NO: 30592, with AU 1424, was taken as a reference). The diagram is shown in FIG. 16.

(129) TABLE 15 AU adapted YFV constructs

SEQ ID NO   RNA size   AU count   AU difference   Alpha monolithic   Alpha PVD   Alpha C4   Alpha PLPR-S
30592       2311       1424       0               1.00               1.00        1.00       1.00
30591       2311       1139       285             1.07               1.05        1.05       1.08
30590       2311       1137       287             1.09               1.07        1.06       1.11
30589       2311       894        530             1.15               1.12        1.10       1.18
30588       2311       824        600             1.19               1.15        1.12       1.24
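The relationship between AU count difference and separation factor reported in Table 15 can be explored with a short Python sketch that computes a Pearson correlation coefficient per column type. The values are those of Table 15; the use of a Pearson coefficient and the helper name `pearson` are choices made here for illustration and are not part of the described analysis. With only five data points per column, the result should be read as indicative only.

```python
from math import sqrt

# Values taken from Table 15 (reference construct: AU count 1424)
au_difference = [0, 285, 287, 530, 600]
alpha = {
    "monolithic": [1.00, 1.07, 1.09, 1.15, 1.19],
    "PVD": [1.00, 1.05, 1.07, 1.12, 1.15],
    "C4": [1.00, 1.05, 1.06, 1.10, 1.12],
    "PLPR-S": [1.00, 1.08, 1.11, 1.18, 1.24],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


correlations = {col: pearson(au_difference, vals) for col, vals in alpha.items()}
for col, r in correlations.items():
    print(f"{col}: r = {r:.3f}")
```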

CONCLUSION

(130) As shown in FIG. 13, the modulation of retention time works on a monolithic ethylvinylbenzene-divinylbenzene-copolymer. All four constructs elute at different retention times and could be clearly separated. In the present case, the separation of construct with AU824 from the construct with AU894 was strong enough to allow further analysis of the peaks.

(131) As shown in FIG. 14, the modulation of retention time works on a particulate poly(styrene)-divinylbenzene column. All four constructs elute at different retention times and could be clearly separated. In the present case, the separation of construct with AU824 from the construct with AU894 was strong enough to allow further analysis of the peaks.

(132) As shown in FIG. 15, the modulation of retention time works on a Silica-based C4 column. All four constructs elute at different retention times and could be clearly separated. In the present case, the separation of construct with AU824 from the construct with AU1139 was strong enough to allow further analysis of the peaks.

(133) FIG. 16 summarizes the inventive concept of the present invention. An adaptation of the AU count of the different RNA constructs (in other words: increasing the difference in AU count between the constructs) led to a modification in retention time, thereby allowing a separation on HPLC. Notably, that effect was observed irrespective of the column matrix used.

(134) Of course, a harmonization of AU counts (in other words: decreasing the difference in AU count between the constructs) would also lead to a modification in retention time, thereby allowing co-purification on HPLC.

CPC classification

A61K31/713    HUMAN NECESSITIES
A61K31/7088   HUMAN NECESSITIES
C07H1/06      CHEMISTRY; METALLURGY
A61K39/145    HUMAN NECESSITIES
C12N15/101    CHEMISTRY; METALLURGY
Y02A50/30     GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
C12P19/34     CHEMISTRY; METALLURGY

International classification

C07H1/06      CHEMISTRY; METALLURGY
A61K31/713    HUMAN NECESSITIES