DNA impurities in a composition comprising a parvoviral virion

11021762 · 2021-06-01

Abstract

The current invention relates to nucleic acid impurities in a composition comprising a parvoviral vector. In particular, the current invention shows that DNA impurities are not randomly encapsulated within a parvoviral virion. The invention therefore relates to a method for identifying and quantifying a nucleic acid impurity in a composition comprising a parvoviral vector. Finally, the current invention relates to a method of determining whether a composition comprising a parvoviral vector is regarded as clinically pure.

Claims

1. A method for identifying and quantifying a nucleic acid impurity in a composition comprising a parvoviral vector, comprising: (a) subjecting the composition to nucleic acid sequencing to obtain at least a thousand random reads of nucleotide sequences; (b) comparing, using a processor, the at least a thousand random reads from step (a) with a nucleotide sequence of a biological component used in a process for producing the composition, whereby a match between a random read and a nucleotide sequence of a biological component identifies a nucleic acid impurity; (c) determining an average number of reads per parvoviral vector; (d) determining a distribution of reads by calculating a number of reads per nucleotide of an identified nucleic acid impurity; wherein an identified nucleic acid impurity is identified as an overrepresented impurity when the distribution of reads is not random and the number of reads per nucleotide of a nucleic acid impurity is at least 0.001% the average number of reads per parvoviral vector.

2. The method according to claim 1, wherein the nucleic acid sequencing in step (a) comprises high-throughput sequencing.

3. The method according to claim 1, wherein the parvoviral vector is a recombinant adeno-associated virus (rAAV) vector.

4. The method according to claim 1, wherein the nucleotide sequence of a biological component is selected from a group consisting of nucleotide sequences of: a host cell, a plasmid, a vector other than the recombinant parvoviral vector, and a helper virus.

5. The method according to claim 4, wherein the nucleotide sequence of a biological component is a baculoviral vector.

6. The method according to claim 4, wherein the helper virus is a recombinant adenovirus and/or a recombinant herpes simplex virus.

7. The method according to claim 1, wherein the nucleotide sequence of the biological component comprises a nucleotide sequence encoding for Rep, Cap and/or a transgene.

8. The method according to claim 7, wherein the transgene is flanked by at least one parvoviral ITR.

9. The method according to claim 1, wherein the overrepresented nucleic acid impurity is quantified in a second or further composition.

10. The method according to claim 1, wherein the composition comprising the parvoviral vector comprises a parvoviral capsid wherein the parvoviral vector is packaged.

11. The method according to claim 1, wherein the composition comprising the parvoviral vector does not consist of a sample from a mammal.

Description

BRIEF DESCRIPTION OF THE FIGURES

(1) FIGS. 1A-1F. Resistance of AAV1-transgene (upper panels) and baculovirus DNA (middle and lower panels) to DNAse I. The amount of DNA as detected by Q-PCR using three different primer sets, either without or with DNAse treatment. For each primer set two batches were tested. A) and B) primer set 59/60, C) and D) primer set 180/181, E) and F) primer set 340/341.

(2) FIG. 2. Sequence map of the baculoviral plasmid Bac.VD. The primer sets used and the ITRs are indicated.

(3) FIG. 3. Relative amount of genome copies as detected by the different primer sets. The locations of the amplicons are indicated on the axis. Amplicon 5214-5284 represents the CMV promoter of the AAV-transgene cassette. Amplicon 73555-73604 is targeted by primer set 340/341 and is located furthest from the AAV-transgene cassette. Each dot represents one measurement.

(4) FIGS. 4A-4C. rAAV comprising a transgene was analyzed by deep-sequencing. The obtained reads were aligned to the transgene (A), the cap cassette (B), or the rep cassette (C).

(5) FIG. 5. rAAV was analyzed by deep-sequencing. The obtained reads were aligned to the baculoviral genome. Depicted is the distribution of reads per nucleotide of the baculoviral backbone. Nucleotide 1 is the right ITR as indicated in FIG. 2.

(6) FIG. 6. Five different batches of rAAV vectors were tested for DNA impurities using Q-PCR or deep-sequencing with Illumina or Roche 454.

EXAMPLES

Example 1

DNA Impurities in Manufactured rAAV Vectors

(7) 1.1 Material and Methods

(8) To investigate whether the residual DNA was packaged in AAV1 particles, it was tested whether the residual DNA was DNAse resistant. The samples were treated with Benzonase (9 U/mL) and the amount of DNA was analyzed using Q-PCR.

(9) DNA was isolated from the samples, followed by Q-PCR using three different primer sets (59/60, 180/181, 340/341). To study the DNAse resistance of the baculovirus DNA, the DNAse step was omitted for some samples (indicated as without DNAse). The data were analyzed using PLA analysis, and for each sample the ratios of the amounts of DNA amplified by the different primer sets were determined.

(10) The amount of AAV1 DNA was determined using Q-PCR with primer set 59/60 targeting the CMV promoter of the AAV1-transgene vector. The quantification of residual baculovirus DNA was performed using Q-PCR with baculovirus-specific primers. The experiments were performed using two different primer sets: primer set 180/181 targets ORF 1629 of the baculovirus DNA, close to the AAV-transgene cassette, and primer set 340/341 targets the hr3 sequence of baculovirus, detecting baculovirus DNA located distantly from the AAV1-transgene cassette. For these experiments two standards were included: a plasmid standard line (pVD) and purified baculovirus DNA of clone VD.

(11) To determine the amount of baculovirus DNA using primer set 180/181, pVD with primer set 180/181 was used as standard. The concentration of pVD was determined with OD measurements. To determine the amount of baculovirus DNA using primer set 340/341, BacVD with primer set 340/341 was used as standard. The amount of BacVD for the standard line was determined by Q-PCR using primer set 180/181 with pVD as standard.

(12) The amount of DNA (gc/mL) was calculated using the formula:

(13) $$[\mathrm{DNA}] = S \cdot D \cdot C \ \ \mathrm{gc/mL}$$
in which:
S = mean quantity measured (gc)
D = dilution factor of viral DNA (either 500 times or 1000 times)
C = correction factor to calculate from 10 μl sample to 1 mL sample (100)

(14) To calculate the amount of DNA in μg/mL, the formula was extended to:

(15) $$[\mathrm{DNA}] = \frac{S \cdot D \cdot C \cdot X}{A} \cdot Mw \ \ \mu\mathrm{g/mL}$$
in which:
X = conversion factor for g to μg (10^6)
A = Avogadro's number (6.022 × 10^23)
Mw = molar weight of DNA. The baculovirus genome consists of 135 kbp double stranded DNA. Mean molar weight per bp is 649 Da. As Mw for the baculovirus DNA (after determination using primer sets 180/181 or 340/341), a Mw of 135000 · 649 = 8.76 × 10^7 Da was used.

(16) The AAV1 genome consists of 3630 bp single stranded DNA. To calculate the amount of AAV1 DNA, a Mw of 3630 bp · 340 Da = 1.23 × 10^6 Da was used.
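The two concentration formulas above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the example inputs (a mean quantity of 1.0 × 10^5 gc at a 500-fold dilution) are hypothetical and do not come from the experiments; the constants are those given in the text.

```python
AVOGADRO = 6.022e23    # A: copies per mole
G_TO_UG = 1e6          # X: conversion factor from g to ug
SAMPLE_TO_ML = 100     # C: correction from a 10 ul sample to 1 mL

# Molar weights as stated in the text.
MW_BACULOVIRUS = 135_000 * 649   # 135 kbp double stranded DNA, 649 Da per bp
MW_AAV1 = 3630 * 340             # 3630 nt single stranded DNA, 340 Da per nt


def dna_gc_per_ml(mean_quantity_gc, dilution_factor):
    """[DNA] = S * D * C, in genome copies per mL."""
    return mean_quantity_gc * dilution_factor * SAMPLE_TO_ML


def dna_ug_per_ml(mean_quantity_gc, dilution_factor, molar_weight_da):
    """[DNA] = (S * D * C * X / A) * Mw, in ug per mL."""
    gc_per_ml = dna_gc_per_ml(mean_quantity_gc, dilution_factor)
    return gc_per_ml * G_TO_UG * molar_weight_da / AVOGADRO


# Hypothetical measurement, for illustration only.
example_gc = dna_gc_per_ml(1.0e5, 500)
example_ug = dna_ug_per_ml(1.0e5, 500, MW_BACULOVIRUS)
print(f"{example_gc:.2e} gc/mL, {example_ug:.3f} ug/mL")
```

Keeping the gc/mL step as its own function mirrors the way the text extends the first formula into the second.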

(17) 1.2 Results

(18) The amount of baculovirus DNA was determined with Q-PCR using two different primer sets. Primer set 180/181 detects a sequence in the ORF 1629, close to the AAV-transgene cassette, while the sequence for primer set 340/341 is located distantly from the cassette. The results in Table 1 show that the two primer sets yield very different values for the amount of gc/mL of baculovirus DNA (primer set 180/181 yields on average 20-fold higher values than obtained by primer set 340/341).

(19) TABLE 1. Concentration of intended rAAV genome and contaminating baculovirus DNA

| Batch | AAV DNA (primers 59/60) | Baculovirus DNA (primers 180/181) | Baculovirus DNA (primers 340/341) |
|---|---|---|---|
| 1 | 5.9 × 10^12 | 3.2 × 10^11 | 1.4 × 10^10 |
| 2 | 5.6 × 10^12 | 2.9 × 10^11 | 1.5 × 10^10 |
| 3 | 6.5 × 10^12 | 3.0 × 10^11 | 1.3 × 10^10 |
| 4 | 7.1 × 10^11 | 3.7 × 10^10 | 1.5 × 10^9 |
| 5 | 7.0 × 10^12 | 3.0 × 10^11 | 1.2 × 10^10 |
| 6 | 8.5 × 10^11 | 3.5 × 10^10 | 2.7 × 10^9 |
| 7 | 9.1 × 10^12 | 3.0 × 10^11 | 2.0 × 10^10 |
| 8 | 8.9 × 10^11 | 3.9 × 10^10 | 2.0 × 10^9 |
| 9 | 8.4 × 10^11 | 3.9 × 10^10 | 1.9 × 10^9 |
| 10 | 1.1 × 10^12 | 3.6 × 10^10 | 1.7 × 10^9 |
| 11 | 3.0 × 10^12 | 1.3 × 10^11 | 5.8 × 10^9 |
| 12 | 3.3 × 10^12 | 1.3 × 10^11 | 5.9 × 10^9 |
| 13 | 2.9 × 10^12 | 1.2 × 10^11 | 6.1 × 10^9 |

The concentration of the Bac.VD standard was corrected using Q-PCR; it can thus be excluded that this difference is related to the standard. Therefore, these data indicate that the baculovirus DNA close to the ITRs (detected with primer set 180/181) is far more abundant than DNA distant from the ITRs (detected by primer set 340/341).
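As a quick check of the roughly 20-fold difference, the per-batch ratio between the two baculovirus primer sets can be computed directly from the Table 1 values. The sketch below is illustrative; the variable names are hypothetical.

```python
from statistics import mean

# (primer set 180/181, primer set 340/341) for batches 1-13, copied from Table 1.
TABLE_1 = [
    (3.2e11, 1.4e10), (2.9e11, 1.5e10), (3.0e11, 1.3e10), (3.7e10, 1.5e9),
    (3.0e11, 1.2e10), (3.5e10, 2.7e9), (3.0e11, 2.0e10), (3.9e10, 2.0e9),
    (3.9e10, 1.9e9), (3.6e10, 1.7e9), (1.3e11, 5.8e9), (1.3e11, 5.9e9),
    (1.2e11, 6.1e9),
]

# Fold difference per batch: DNA near the cassette versus DNA far from it.
fold_per_batch = [near / far for near, far in TABLE_1]
mean_fold = mean(fold_per_batch)

for batch, fold in enumerate(fold_per_batch, start=1):
    print(f"batch {batch}: {fold:.1f}-fold")
print(f"mean fold difference: {mean_fold:.1f}")
```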

Example 2

Determining Nucleic Acid Impurities Using Q-PCR

(20) 2.1 Material and Methods

(21) To further investigate which parts of the baculovirus genome were present in the samples, and possible differences in the amounts of different sequences, Q-PCR with different primer sets (see FIG. 2) was performed. For each primer set a standard line was included in the experiment. The amount of transgene copies was determined using primer set 59/60. Subsequently, the relative amount of genome copies as compared to transgene copies was determined. Since it is known that AAV particles may incorporate DNA sequences longer than the AAV genome itself (Grieger et al. 2005, Allocca et al. 2008), the primers were chosen at the start and the end of the ORFs flanking the transgene cassette and 10 kb up- and downstream of the ITRs.

(22) 2.2 Results

(23) The amount of baculovirus DNA was determined using primer set 340/341, amplifying amplicon 73555-73604, which is located close to the hr3 sequence of the baculovirus. It was assumed that the amount of genome copies determined with these primers is representative of the entire baculovirus genome. However, similar experiments using different primer sets targeting sequences closer to the AAV-transgene cassette showed that a higher number of genome copies was found when the amplicon was located closer to the AAV-transgene cassette containing the ITRs (FIG. 3). Since it is known that AAV may be able to package larger DNA sequences (up to 8.9 kb, probably even higher (Allocca et al., 2008)), we expect that these sequences are packaged inside the particle. This implies that there are two kinds of residual baculovirus DNA sequences: 1) random sequences, as determined with primer set 340/341, which are found at only 0.1% of the amount of transgene genome copies, and 2) baculovirus sequences within 10 kb of the transgene cassette (within the range of the AAV packaging limit), which are found at between 1 and 2.5% of the amount of transgene genome copies. Both kinds of sequences are presumably packaged inside the AAV1 particle or associated with the capsid.

Example 3

Determining Nucleic Acid Impurities Using Next Generation Sequencing

(24) 3.1 Material and Methods

(25) To investigate the extent and origin of the DNA impurities in manufactured rAAV vectors, four different batches of rAAV vectors were analyzed by deep-sequencing. DNA was isolated from these rAAV vectors using the NUCLEOSPIN® extract II kit (Macherey Nagel, Duren, Germany).

(26) This DNA was used to prepare the deep-sequencing libraries.

(27) In order to create separate sequencing features, an in situ hybridization is performed. Clusters are obtained by limiting dilution of the initial material. The DNA fragments are melted and the single strands are trapped inside the flow cell, which is covered by a dense lawn of primers. Subsequent local amplification (bridge PCR) leads to the formation of clusters of approximately 1000 identical molecules per square micrometer. Base incorporation starts by adding primers, polymerase and four fluorophore-labeled deoxynucleotide triphosphates (dNTPs). The dNTPs act as reversible terminators, i.e. only a single base is added per molecule in each cycle. The cluster fluorescence is measured to identify which base has been incorporated. A green laser identifies the incorporation of the bases G and T and a red laser identifies the bases A and C. Two different filters are used to distinguish between G/T and A/C, respectively. After signal detection, the fluorophore and the terminating modification of the nucleotide are removed (Dohm, J. C., Lottaz, C., Borodina, T., and Himmelbauer, H. (2008), Nucleic Acids Res. 36, e105; Shendure, J. and Ji, H. (2008), Nat. Biotechnol. 26, 1135-1145; Rothberg, J. M. and Leamon, J. H. (2008), Nat. Biotechnol. 26, 1117-1124; Kahvejian, A., Quackenbush, J., and Thompson, J. F. (2008), Nat. Biotechnol. 26, 1125-1133). This method can be particularly useful to determine what type of sequence is present as an impurity and what the ratios between the specific sequence populations are. The analysis was performed by ServiceXS (Leiden).

(28) A standard next-generation sequencing experiment yields >20 million short reads, which need to be aligned to a reference sequence or assembled de novo in order to produce contigs. Here, upon sequencing of the total content, reads were aligned to a number of reference sequences. These reference sequences represent DNA molecules which are known to be present in the rAAV vector preparations. This includes the intended genome and production-related DNA impurities. The alignment was performed with the CLC_bio aligner. The frequency at which every base is read in the experiment provides information about its relative occurrence as compared to other measured sequences. The reads per nucleotide were retrieved for each reference sequence (FIGS. 4A-4C). It is generally accepted that when nucleotides of the reference sequence are read more than 8-12 times, the sequence information has a high confidence level (Schuster, S. C. (2008), Nat. Methods 5, 16-18).
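To make the notion of "reads per nucleotide" concrete, the sketch below counts how many aligned reads cover each position of a reference sequence. It is an illustration only and is not the CLC_bio procedure: the input format (0-based start position and read length per alignment) and the function name are assumptions, and the toy alignments are invented.

```python
def reads_per_nucleotide(alignments, reference_length):
    """Count the reads covering each reference position.

    alignments: iterable of (start, read_length) pairs with 0-based starts
    (an assumed input format). Reads running past the end are clipped.
    """
    # Difference array: +1 where a read starts, -1 just after it ends.
    delta = [0] * (reference_length + 1)
    for start, read_length in alignments:
        if start < 0 or start >= reference_length:
            continue  # read does not start within the reference; ignore it
        end = min(start + read_length, reference_length)
        delta[start] += 1
        delta[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    coverage = []
    depth = 0
    for position in range(reference_length):
        depth += delta[position]
        coverage.append(depth)
    return coverage


# Invented toy data: three short reads on a 10-nucleotide reference.
toy_alignments = [(0, 4), (2, 4), (3, 5)]
toy_coverage = reads_per_nucleotide(toy_alignments, 10)
print(toy_coverage)
```

The difference-array approach keeps the work proportional to the number of reads plus the reference length, which matters when tens of millions of reads are involved.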

(29) 3.2 Results

(30) Total DNA sequencing was used to analyse the DNA composition of different AAV batches. The analysis was performed by Baseclear (Leiden, The Netherlands) based on a single-read sequencing procedure of Illumina GAI-II. The resulting quality-trimmed raw sequence data were analyzed with the help of CLC_bio bioinformatics software. The reads were reference-assembled onto the reference sequences representing the DNA molecules potentially present in rAAV vector preparations, i.e. baculovirus backbone, cap-specific, rep-specific and transgene-specific DNA. As expected, the great majority (>99.7%) of the 20 million reads generated assembled to the intended DNA transgene cassette and to the known production-related DNA impurities. All other sequences (below 0.3%), which were not assembled to any of the mentioned reference sequences, may represent sequencing errors, linker multimerization, low-quality reads and other DNA sequences.

(31) Counts per nucleotide were retrieved for the transgene cassette, cap cassette, rep cassette and baculovirus genome and plotted against nucleotide number (see FIGS. 4 and 5). The distribution of the read frequencies per nucleotide was highly consistent between the different batch preparations. Furthermore, it became evident that the distribution of reads over the baculovirus genome is not random. The genome segments flanking the ITRs were clearly overrepresented (FIG. 5).

(32) We used the average distribution of the reads retrieved from the sequencing experiments as input for calculating the relative occurrence of the various DNA sequences found in rAAV preparations (Table 2).

(33) TABLE 2. Average read distribution (S) in 5 different rAAV batches (lot #)

| | Lot#1 | Lot#2 | Lot#3 | Lot#4 | Lot#5 |
|---|---|---|---|---|---|
| Transgene cassette | 569837 | 588620 | 600677 | 589597 | 544717 |
| Baculovirus | 1184 | 1449 | 1236 | 1316 | 1087 |
| Cap cassette | 60 | 47 | 43 | 102 | 46 |
| Rep cassette | 460 | 676 | 596 | 691 | 562 |

(34) Table 2 shows the average frequencies (S) retrieved per given sequence. In Table 3 these frequencies are presented in relation to the main DNA in the sample, i.e. the transgene cassette. The percentage of a given impurity is calculated in relation to the transgene cassette according to the formula presented below, which takes into account a size correction factor.
$$X_{bac} = \frac{S_{bac}}{S_{transgene}} \cdot C_{bac} \cdot 100\%$$
where:
X_bac = percentage of DNA impurities of baculovirus in relation to transgene cassette
S_bac = average counts retrieved for baculovirus backbone
S_transgene = average counts retrieved for transgene cassette
C_bac = molecule length correction factor, where C_bac = baculovirus backbone length (nt) / transgene cassette length (nt)

(35) TABLE 3. Relative abundance of the various DNA impurities as compared to transgene cassette. The average count distributions (S) of the different molecules are presented in relation to the count distribution of the transgene (S_transgene).

| | S_transgene/S_transgene | S_bac/S_transgene | S_cap/S_transgene | S_rep/S_transgene |
|---|---|---|---|---|
| Lot#1 | 1 | 2.077E−03 | 1.048E−04 | 8.073E−04 |
| Lot#2 | 1 | 2.462E−03 | 8.068E−05 | 1.148E−03 |
| Lot#3 | 1 | 2.057E−03 | 7.207E−05 | 9.918E−04 |
| Lot#4 | 1 | 2.232E−03 | 1.727E−04 | 1.173E−03 |
| Lot#5 | 1 | 1.996E−03 | 8.467E−05 | 1.031E−03 |

(36) TABLE 4. Percentage of various impurities present in various AAV batches in relation to lpl cassette (based on the formula described in the text).

| | Transgene | Rep | Cap | Baculovirus backbone |
|---|---|---|---|---|
| Molecular length (nt) | 3645 | 2785 | 3088 | 133894 |
| Molecular length correction factor (C) | 1 | 0.76406 | 0.847188 | 36.73361 |
| % of transgene present in batch 1 | N/A | 0.061682% | 0.00888% | 7.630121% |
| % of transgene present in batch 2 | N/A | 0.087726% | 0.006835% | 9.04252% |
| % of transgene present in batch 3 | N/A | 0.075782% | 0.006106% | 7.555691% |
| % of transgene present in batch 4 | N/A | 0.089591% | 0.01463% | 8.200541% |
| % of transgene present in batch 5 | N/A | 0.078764% | 0.007173% | 7.333003% |
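The percentage calculation can be reproduced with a short Python sketch. It applies the formula of paragraph (34) to the Lot#1 counts from Table 2 and the molecular lengths from Table 4; the function and variable names are hypothetical. Because Table 2 lists rounded counts, the results agree with Table 4 only to about two or three significant figures.

```python
def impurity_percent(s_impurity, s_transgene, impurity_length_nt, transgene_length_nt):
    """X = S_impurity / S_transgene * C * 100%, with C the length correction factor."""
    length_correction = impurity_length_nt / transgene_length_nt
    return s_impurity / s_transgene * length_correction * 100.0


# Molecular lengths (nt) from Table 4.
TRANSGENE_LENGTH = 3645
LENGTHS = {"rep": 2785, "cap": 3088, "baculovirus": 133894}

# Average read distribution (S) for Lot#1 from Table 2.
LOT1_COUNTS = {"transgene": 569837, "baculovirus": 1184, "cap": 60, "rep": 460}

lot1_percent = {
    name: impurity_percent(
        LOT1_COUNTS[name], LOT1_COUNTS["transgene"], length, TRANSGENE_LENGTH
    )
    for name, length in LENGTHS.items()
}

for name, value in lot1_percent.items():
    print(f"{name}: {value:.4f}% of transgene")
```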

Example 4

Non-Random Distribution of the DNA Impurities

(37) 4.1 Material and Methods

(38) The next step was to determine the exact origin of the baculovirus-derived DNA impurities. To this end, different batches of rAAV vectors were deep-sequenced on the Illumina platform as described above. Alignment of the reads to the baculovirus genome provided a means to examine the frequency of each (baculovirus-derived) nucleotide in the deep-sequencing library. In addition, the average frequency was calculated by dividing the total number of reads that mapped to the baculovirus genome by the number of nucleotides.

(39) 4.2 Results

(40) FIG. 5 depicts the alignment of the reads to the baculovirus genome after deep-sequencing. If the DNA impurities were derived randomly from the baculovirus genome, a relatively even distribution would be expected, with about 1200 reads per nucleotide. An even distribution is indeed seen in the middle of the baculovirus genome. However, at the beginning and at the end of the baculovirus genome a strong increase in read number is observed. This indicates that these regions are overrepresented as DNA impurities in rAAV.
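One simple way to express this comparison in code is to flag stretches of the genome whose reads per nucleotide clearly exceed the genome-wide average. The sketch below is illustrative only: the 2-fold cutoff is an arbitrary assumption and is not a criterion stated in this document, the function names are hypothetical, and the toy coverage profile is invented.

```python
def mean_coverage(coverage):
    """Average reads per nucleotide over the whole reference."""
    return sum(coverage) / len(coverage)


def overrepresented_regions(coverage, fold_threshold=2.0):
    """Return half-open (start, end) intervals whose depth exceeds
    fold_threshold times the mean. The threshold is an assumed value."""
    cutoff = fold_threshold * mean_coverage(coverage)
    regions = []
    region_start = None
    for position, depth in enumerate(coverage):
        if depth > cutoff and region_start is None:
            region_start = position          # a new region opens here
        elif depth <= cutoff and region_start is not None:
            regions.append((region_start, position))  # the region closes
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(coverage)))
    return regions


# Invented toy profile: high depth at both ends, flat in the middle.
toy_profile = [900, 800, 100, 100, 100, 100, 100, 100, 700, 950]
toy_regions = overrepresented_regions(toy_profile)
print(mean_coverage(toy_profile), toy_regions)
```

With a real profile, the input would be the reads-per-nucleotide vector over the baculovirus genome, and the flagged intervals would be expected to fall near the ends of the genome, next to the ITRs.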

Example 5

Quality Control Assessment Using Q-PCR or Deep-Sequencing

(41) 5.1 Material and Methods

(42) In order to investigate the quantitative capabilities of different NGS methodologies, namely SOLEXA® and 454 Roche, the obtained NGS read distribution was compared to qPCR measurements of various targets located across the baculovirus genome (FIG. 6). The qPCR targets represent regions that are highly overrepresented (180/181), that match the average distribution (426/427, 428/429, 1018/1019, 1020/1021, 1024/1025), or that are underrepresented (340/341). The latter region was used as a calibrator for all the other measurements.

(43) 5.2 Results

(44) Three different techniques were investigated to test the level of DNA impurity in rAAV vectors. As shown in FIG. 6, the three techniques correlated well with each other and can thus be used side-by-side for the detection of DNA impurities in rAAV vectors.

(45) As indicated in FIG. 6, NGS analysis clearly demonstrates that a random choice of DNA amplicon for quantitative PCR can lead to inaccurate measurement of a particular DNA impurity in the vector preparation. The presence of a given DNA impurity is calculated from the amplicon measurement under the assumption that all parts of the investigated DNA molecule (which can be as long as 136000 bp, e.g. the baculovirus backbone) are distributed with the same frequency. The analysis presented here clearly indicates that various segments of long DNA molecules, e.g. the baculovirus genome, may contaminate vector preparations at different frequencies due to unequal packaging of different DNA sequences.

(46) TABLE 5. Q-PCR primers used in the experiments

| SEQ ID NO | Name | Sequence | Target | Direction | Amplicon |
|---|---|---|---|---|---|
| 7 | pr59 | AATGGGCGGTAGGCGTGTA | CMV promoter | forward | 5214-5284 |
| 8 | pr60 | AGGCGATCTGACGGTTCACTAA | CMV promoter | reverse | 5214-5284 |
| 9 | pr180 | CGAACCGATGGCTGGACTATC | Orf 1629 (protein sciences baculo system) | forward | 8760-8830 |
| 10 | pr181 | TGCTGCTACAAGATTTGGCAAGT | Orf 1629 (protein sciences baculo system) | reverse | 8760-8830 |
| 11 | pr340 | ATACAACCGTTGGTTGCACG | hr3 region baculo downstream | forward | 73555-73604 |
| 12 | pr341 | CGGGACACGCCATGTATT | hr3 region baculo downstream | reverse | 73555-73604 |
| 13 | pr402 | GGGAGTGGCGGCGTTGATTT | Baculovirus DNA 10 kb left | sense | 135323-135403 |
| 14 | pr403 | GCACAGTTCAAGCCTCACAGCCTA | Baculovirus DNA 10 kb left | antisense | 135323-135403 |
| 15 | pr404 | CAAACGTGGTTTCGTGTGCCAA | Baculovirus DNA left ORF603 | sense | 3501-3603 |
| 16 | pr405 | GATGCATGACTTCACCCACACACTT | Baculovirus DNA left ORF603 | antisense | 3501-3603 |
| 17 | pr406 | ACAGCCATTGTAATGAGACGCACAA | Baculovirus DNA right ORF603 | sense | 4357-4438 |
| 18 | pr407 | CCTAGCGCCCGATCAGCAACTATAT | Baculovirus DNA right ORF603 | antisense | 4357-4438 |
| 19 | pr408 | TACCGACTCTGCTGAAGAGGAGGAA | Baculovirus DNA left ORF1629 | sense | 8421-8499 |
| 20 | pr409 | TGCGTCTGGTGCAAACTCCTTTA | Baculovirus DNA left ORF1629 | antisense | 8421-8499 |
| 21 | pr410 | GATTCGTCATGGCCACCACAAA | Baculovirus DNA right ORF 1629 | sense | 10178-10261 |
| 22 | pr411 | CCAAAGCGCCCGTTGATTATTTT | Baculovirus DNA right ORF 1629 | antisense | 10178-10261 |
| 23 | pr412 | GCGTACTTGCGGCTGTCGTTGTA | Baculovirus DNA 10 kb right | sense | 14503-14605 |
| 24 | pr413 | CGAGGTCAAGTTCAAAGGGCAACAT | Baculovirus DNA 10 kb right | antisense | 14503-14605 |
| 25 | Pr426 | GCATTTCGCAGCTCTCCTTCAATT | Bac genome | sense | 32837-32935 |
| 26 | Pr427 | CTTCAAGCGAGAACGCAGCAATT | Bac genome | antisense | 32837-32935 |
| 27 | Pr428 | GTGGCGTTTGCCGTGGAAAA | Bac genome | sense | 116615-116699 |
| 28 | Pr429 | TGCAGCTGTGCGTTTTGAATGAA | Bac genome | antisense | 116615-116699 |
| 29 | Pr1018 | TTGTTATGTCAATTTGTAGCGC | Bac genome | sense | 18230-18296 |
| 30 | Pr1019 | TGCATAAAGACACAGTACAACG | Bac genome | antisense | 18230-18296 |
| 31 | Pr1020 | GACATAGTTCGTTTGAAAATTATCCC | Bac genome | sense | 25963-26024 |
| 32 | Pr1021 | AACGATCAAGCTGTTAATAAACG | Bac genome | antisense | 25963-26024 |
| 33 | Pr1024 | CGCTTCGGCGTAGTTTACC | Bac genome | sense | 109100-109151 |
| 34 | Pr1025 | CGCTATAAGCGCGGGTTAC | Bac genome | antisense | 109100-109151 |