Method for analyzing impurities of oligonucleotide sequence based on high-throughput sequencing and application

11597922 · 2023-03-07


Abstract

The present invention provides a method for analyzing impurities of an oligonucleotide sequence based on high-throughput sequencing. The method comprises the following steps: constructing a high-throughput sequencing library for analysis of impurities of an oligonucleotide sequence; subjecting the library to high-throughput sequencing; and analyzing the nucleotide sequence components according to the sequencing results. The extension primer used to construct the library consists, in sequence, of the DNA molecule set forth in positions 1-22 of SEQ ID NO: 2 followed by N bases (A, T, C or G), N being an integer greater than or equal to 6. Experiments show that the method can quickly, accurately, and comprehensively analyze the purity of an oligonucleotide sequence and the content of each of its components.

Claims

1. A method for constructing a high-throughput sequencing library for analysis of impurities of an oligonucleotide sequence, comprising the following steps: 1) adding a poly tail to the 3′ end of an oligonucleotide sequence to be detected to obtain a poly-tailed product; 2) performing reverse extension and amplification on the poly-tailed product to obtain a reverse extension and amplification product, wherein the primers used for the reverse extension and amplification consist of the oligonucleotide to be detected and an extension primer, and the extension primer sequence consists, in sequence, of the DNA molecule set forth in positions 1-22 of SEQ ID NO: 2 and N identical bases (A, T, C, or G), the N bases being complementary to the bases of the poly tail and N being an integer greater than or equal to 6; 3) precipitating the reverse extension and amplification product to obtain a precipitated product; 4) performing end repair, A-tailing reaction, adapter ligation and PCR amplification on the precipitated product sequentially to obtain a high-throughput sequencing library.

2. The method according to claim 1, wherein the extension primer sequence is the DNA molecule set forth in SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3 or SEQ ID NO: 4.

3. The method according to claim 2, wherein the extension primer sequence is the DNA molecule set forth in SEQ ID NO: 1.

4. The method according to claim 1, wherein the oligonucleotide is single-stranded DNA or double-stranded DNA.

5. The method according to claim 4, wherein the oligonucleotide is single-stranded DNA.

6. The method according to claim 1, wherein the length of the oligonucleotide is 8-120 bp.

7. A product, which is any one of the following a1)-a3): a1) the extension primer in claim 1; a2) a PCR reagent containing the extension primer of a1); a3) a kit containing the extension primer of a1) or the PCR reagent of a2).

8. The product according to claim 7, wherein the final concentration of the extension primer in the PCR reagent is 0.1-100 μM.

9. A method for analyzing impurities of an oligonucleotide sequence, comprising the following steps: (1) constructing a high-throughput sequencing library for analysis of impurities of an oligonucleotide sequence according to the method of claim 1; (2) subjecting the high-throughput sequencing library to high-throughput sequencing and analyzing the nucleotide sequence components according to the sequencing results.

10. A method for analyzing components of an artificially synthesized antisense oligodeoxynucleotide sequence for gene therapy, or for analyzing the purity and/or content of each component in an artificially synthesized antisense oligodeoxynucleotide sequence for gene therapy, comprising the following steps: (1) constructing a high-throughput sequencing library for analysis of impurities of an oligonucleotide sequence according to the method of claim 1, wherein the oligonucleotide sequence is the artificially synthesized antisense oligodeoxynucleotide sequence for gene therapy; (2) subjecting the high-throughput sequencing library to high-throughput sequencing and analyzing the nucleotide sequence components according to the sequencing results.

Description

DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a flowchart of a method for analyzing impurities of an oligonucleotide sequence based on high-throughput sequencing.

(2) FIG. 2 shows the results of electrophoresis detection of the product precipitated with sodium acetate.

(3) FIG. 3 shows the results of electrophoresis detection of the PCR amplification product.

(4) FIG. 4 shows the results of electrophoresis detection of the purified PCR amplification product.

(5) FIG. 5 shows the results of high-throughput sequencing data analysis.

DETAILED DESCRIPTION OF THE INVENTION

(6) Unless otherwise specified, the experimental methods used in the following example are conventional methods.

(7) The materials, reagents, etc. used in the following example are available commercially, unless otherwise specified.

(8) In the following example, T4 DNA polymerase is a product from NEB (Beijing) LTD (catalog number M0203L). DNA Polymerase I, Large (Klenow) Fragment is a product from NEB (catalog number M0210S) and is hereinafter referred to as DNA polymerase I. T4 Polynucleotide Kinase is a product from NEB (catalog number M0201). Klenow DNA polymerase I and Bst DNA Pol I large Fragment are products from NEB (catalog number M0275). T4 DNA Ligase is a product from NEB (catalog number M0202L). Terminal transferase is a product from Thermo Fisher Scientific (China) Co., Ltd. (catalog number EP0162). Deoxyadenosine triphosphate (dATP) is a product from Thermo Fisher Scientific (China) Co., Ltd. (catalog number 10216018). Phanta Max Super-Fidelity DNA Polymerase is a product from Vazyme Biotech Co., Ltd. (catalog number P505).

(9) The 10× End Repair Buffer used in the following example is formulated as follows: the solutes and their concentrations are 900 mM MgCl₂, 30 mM DTT, 10 mM ATP, 1 μg/μL BSA and 4 mM dNTPs; the solvent is 500 mM Tris-HCl buffer (pH 8.3).
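As a quick check on this formulation, the minimal sketch below scales each 10× concentration to the working concentration reached when, as in Table 4, 3 μL of the buffer is used in a 30 μL end repair reaction. It only applies C1V1 = C2V2; the dictionary keys are illustrative labels, not names used elsewhere in this document.

```python
# 10x End Repair Buffer composition from paragraph (9); values are 10x stock concentrations.
BUFFER_10X = {
    "MgCl2 (mM)": 900.0,
    "DTT (mM)": 30.0,
    "ATP (mM)": 10.0,
    "BSA (ug/uL)": 1.0,
    "dNTPs (mM)": 4.0,
    "Tris-HCl pH 8.3 (mM)": 500.0,
}

def working_concentrations(stock: dict, buffer_ul: float, total_ul: float) -> dict:
    """Dilute every component by buffer_ul / total_ul (C1 * V1 = C2 * V2)."""
    factor = buffer_ul / total_ul
    return {name: conc * factor for name, conc in stock.items()}

# Table 4: 3 uL of 10x buffer in a 30 uL reaction gives a 1x working buffer.
final = working_concentrations(BUFFER_10X, buffer_ul=3.0, total_ul=30.0)
for name, conc in final.items():
    print(f"{name}: {conc:g}")
```

With these inputs the end repair reaction contains 90 mM MgCl₂, 3 mM DTT, 1 mM ATP, 0.1 μg/μL BSA, 0.4 mM dNTPs and 50 mM Tris-HCl.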

Example 1. A Method for Analyzing Impurities of an Oligonucleotide Sequence Based on High-Throughput Sequencing

(10) I. Adding Poly Tail to Oligonucleotide

(11) 1. To add a poly A tail to the 3′ end of the oligonucleotide sequence, a tailing reaction system was prepared with the reagents and addition amounts listed in Table 1. The oligonucleotide sequence is 5′-CAGAGCAGCTTGTCTTTCTTC-3′ (SEQ ID NO: 5) and was synthesized by Shanghai Generay Biotech Co., Ltd.

(12) TABLE 1 Tailing reaction system
Reagent | Addition amount (μL)
Nuclease-free water | 13
Oligonucleotide (100 μM) | 1
dATP (25 μM) | 0.5
5× TdT Buffer | 4
Terminal transferase | 1.5
Total volume | 20
2. The reaction was performed at 37° C. for 25 min.
3. The reaction mixture was incubated at 70° C. for 10 min to inactivate the terminal transferase.
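Because each reaction in this example is specified as stock concentrations and volumes, a small helper can confirm that the volumes add up to the stated total and report each component's final concentration. The sketch below applies it to Table 1; the tuple layout and the function name `summarize` are illustrative assumptions, and components without a stated stock concentration are only counted toward the volume.

```python
# Table 1 tailing reaction: (reagent, stock concentration, unit, volume in uL).
TAILING = [
    ("Nuclease-free water", None, None, 13.0),
    ("Oligonucleotide", 100.0, "uM", 1.0),
    ("dATP", 25.0, "uM", 0.5),
    ("TdT Buffer", 5.0, "x", 4.0),
    ("Terminal transferase", None, None, 1.5),
]

def summarize(reaction, expected_total):
    """Check the volume total and return (total_ul, {reagent: (final conc, unit)})."""
    total = sum(volume for *_, volume in reaction)
    if abs(total - expected_total) > 1e-9:
        raise ValueError(f"volumes sum to {total} uL, expected {expected_total} uL")
    finals = {}
    for name, stock, unit, volume in reaction:
        if stock is not None:
            finals[name] = (stock * volume / total, unit)  # C2 = C1 * V1 / V2
    return total, finals

total, finals = summarize(TAILING, expected_total=20.0)
for name, (conc, unit) in finals.items():
    print(f"{name}: {conc:g} {unit}")
```

For Table 1 the volumes sum to 20 μL, giving 5 μM oligonucleotide, 0.625 μM dATP and 1× TdT buffer.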
II. Reverse Extension and Amplification
1. The reagents shown in Table 2 were added to the reaction product obtained in step I above to prepare a reverse extension and amplification system.

(13) TABLE 2 Reverse extension and amplification system
Reagent | Addition amount (μL)
Nuclease-free water | 19
2× Phanta Max Buffer | 25
dNTPs (10 mM) | 2
Extension primer extpT (100 μM) | 2
Oligonucleotide (100 μM) | 1
DNA Polymerase I | 1
Total volume | 50

(14) The sequence of the extension primer extpT is as follows:

(15) 5′-GAGACACGAATAGACGGCACGATTTTTTTTTTTTTTTTTTTT-3′ (SEQ ID NO: 1).
2. The reverse extension and amplification system prepared in step 1 was placed on a PCR instrument and the reaction procedure shown in Table 3 was performed.

(16) TABLE 3 Reverse extension and amplification procedure
Segment 1 | 95° C. for 30 s
Segment 2 (20 cycles) | 95° C. for 30 s; 60° C. for 15 s; 72° C. for 30 s
Segment 3 | 72° C. for 10 min
Segment 4 | Stored at 10° C.
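The two primers of step II are the oligonucleotide itself and extpT, so together they define the library insert. The sketch below illustrates this under stated assumptions: it takes the first 22 bases of extpT as the anchor (the remaining bases are all T), which is consistent with claim 2 listing SEQ ID NO: 1 as a primer of the form defined in claim 1, and it models an idealized poly A tail exactly as long as the primer's poly T stretch, with the primer annealing at the oligonucleotide/tail junction. Real tailing produces tails of varying length, so actual products vary at the tail. The function names are hypothetical.

```python
EXTPT = "GAGACACGAATAGACGGCACGATTTTTTTTTTTTTTTTTTTT"  # extension primer extpT, SEQ ID NO: 1
OLIGO = "CAGAGCAGCTTGTCTTTCTTC"                       # oligonucleotide tested, SEQ ID NO: 5
ANCHOR_LEN = 22                                       # assumed anchor length (positions 1-22)

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def build_extension_primer(anchor: str, tail_base: str, n: int) -> str:
    """Anchor followed by n identical bases complementary to the tail base (claim 1: n >= 6)."""
    if n < 6:
        raise ValueError("N must be an integer >= 6")
    return anchor + revcomp(tail_base) * n

def idealized_amplicon(oligo: str, primer: str, anchor_len: int = ANCHOR_LEN):
    """Return (top, bottom) strands of an idealized step II product."""
    anchor, homopolymer = primer[:anchor_len], primer[anchor_len:]
    tail = revcomp(homopolymer)           # poly T in the primer pairs with a poly A tail
    top = oligo + tail + revcomp(anchor)  # strand primed by the oligonucleotide itself
    bottom = revcomp(top)                 # strand primed by the extension primer
    return top, bottom

anchor = EXTPT[:ANCHOR_LEN]
n = len(EXTPT) - ANCHOR_LEN
assert build_extension_primer(anchor, "A", n) == EXTPT  # extpT fits the claim 1 pattern
top, bottom = idealized_amplicon(OLIGO, EXTPT)
print(top)
print(bottom)
```

In this model the strand started by extpT reads anchor, poly T, then the reverse complement of the oligonucleotide, which is why the later analysis can recover the oligonucleotide by trimming at the poly A/poly T stretch.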
III. Sodium Acetate Precipitation
1. To the product obtained in step II were added 1/10 volume of sodium acetate (pH 5.2), 2.5 volumes of absolute ethanol and 1 μL of glycogen (20 mg/mL).
2. The mixture was kept at −80° C. for 30 minutes.
3. The mixture was then centrifuged at 12000 rpm for 30 minutes at 4° C.; the supernatant was discarded and the precipitate was collected.
4. 1 mL of 80% aqueous ethanol solution was added to the precipitate, and the tube was centrifuged at 12000 rpm for 5 minutes at 4° C. The supernatant was discarded and the precipitate was collected.
5. Step 4 was repeated once.
6. The precipitate was dried at room temperature and dissolved in 30 μL of TE.
7. The precipitation product was analyzed as follows: a 12% PAGE gel was prepared, and the precipitation product was loaded, electrophoresed at 200 V for 40 minutes, stained in a dark box for 10 minutes, and photographed in a gel imaging system. The results of electrophoresis detection are shown in FIG. 2.
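Two quantities in step III depend on the setup: the reagent volumes scale with the sample volume, and a speed given in rpm corresponds to different g-forces on different rotors. A minimal sketch, assuming the sample is the 50 μL reaction of Table 2, that the 2.5 volumes of ethanol are counted after the sodium acetate is added (the text does not say), and a hypothetical rotor radius of 8.4 cm. The sodium acetate molarity is not stated in the text and is not needed for the volumes.

```python
def precipitation_volumes(sample_ul: float) -> dict:
    """Reagent volumes for step III. Assumption: the 2.5 volumes of ethanol
    are counted on the sample plus the added sodium acetate."""
    sodium_acetate = sample_ul / 10
    ethanol = 2.5 * (sample_ul + sodium_acetate)
    return {
        "sodium acetate pH 5.2 (uL)": sodium_acetate,
        "absolute ethanol (uL)": ethanol,
        "glycogen 20 mg/mL (uL)": 1.0,
    }

def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force in x g: RCF = 1.118e-5 * r(cm) * rpm**2."""
    return 1.118e-5 * radius_cm * rpm ** 2

print(precipitation_volumes(50.0))       # 50 uL reaction from Table 2
print(round(rpm_to_rcf(12000, 8.4)))     # hypothetical 8.4 cm rotor radius
```

Counting the ethanol on the sample volume alone would give 125 μL instead of 137.5 μL. Reporting the g-force alongside the rpm makes the centrifugation steps transferable between centrifuges.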
IV. End Repair
1. An end repair reaction system was prepared with the reagents and addition amounts listed in Table 4. After preparation, the reagents were mixed and briefly centrifuged.

(17) TABLE 4 End repair reaction system
Reagent | Addition amount (μL)
Nuclease-free water | 7
10× End Repair Buffer | 3
T4 DNA Polymerase | 2
T4 Polynucleotide Kinase (T4 PNK) | 2
Klenow DNA polymerase I | 0.5
Bst DNA Pol I large Fragment | 0.5
The product of step III | 15
Total volume | 30
2. The end repair reaction system in step 1 was placed on a PCR instrument and the reaction procedure shown in Table 5 was performed.

(18) TABLE 5 PCR reaction procedure
Segment 1 | 20° C. for 30 min
Segment 2 | 65° C. for 30 min
Segment 3 | Stored at 10° C.
V. Adapter Ligation
1. An adapter ligation reaction system was prepared with the reagents and addition amounts listed in Table 6. After preparation, the reagents were mixed and briefly centrifuged.

(19) TABLE 6 Adapter ligation reaction system
Reagent | Addition amount (μL)
End repair product | 30
Nuclease-free water | 11
10× T4 DNA Ligase Buffer | 5
Y-shape adapter (40 μM) | 2
T4 DNA Ligase | 2
Total volume | 50

(20) The above Y-shape adapter consists of UAF and AI5, and their sequences are as follows:

(21) UAF (the underlined base is thio-modified): 5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′; AI5 (the underlined base is phosphate-modified): 5′-GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG-3′.

(22) The above Y-shape adapter sequence was synthesized by Shanghai Generay Biotech Co., Ltd. The concentration of each adapter in the adapter ligation reaction system was 1.6 μM.

(23) 2. The adapter ligation reaction system was placed on a PCR instrument at 16° C. for 2 hours to obtain an adapter-ligated product.
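The Y shape of the adapter arises because UAF and AI5 pair only over a short stretch at the end that is ligated to the insert, leaving the remaining arms single-stranded. The minimal sketch below counts that stretch, assuming (this is not stated in the text) that the 3′-terminal T of UAF stays unpaired as an overhang for ligation to the A-tailed insert. Base modifications are ignored, and pairing is only counted up to the first mismatch. It also checks the 1.6 μM adapter concentration given in paragraph (22) against Table 6. The function names are hypothetical.

```python
UAF = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
AI5 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG"
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def duplex_length(top: str, bottom: str, overhang: int = 1) -> int:
    """Pair the 5' end of `bottom` with the 3' end of `top` (skipping `overhang`
    unpaired 3' bases of `top`) and count Watson-Crick pairs up to the first mismatch."""
    n = 0
    while n < len(bottom) and n + overhang < len(top):
        if PAIR[bottom[n]] != top[-1 - overhang - n]:
            break
        n += 1
    return n

def final_concentration(stock_um: float, volume_ul: float, total_ul: float) -> float:
    """C2 = C1 * V1 / V2."""
    return stock_um * volume_ul / total_ul

print("3' overhang on UAF:", UAF[-1])
print("paired bases:", duplex_length(UAF, AI5))
print("adapter in ligation (uM):", final_concentration(40.0, 2.0, 50.0))
```

For these sequences the count stops after 12 base pairs (the thirteenth base of AI5, an A, faces a C in UAF). That leaves 45 bases of the 58-nt UAF and 51 bases of the 63-nt AI5 as the unpaired arms of the "Y".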

(24) VI. Purification of Adapter-Ligated Product

(25) 1. Ampure XP magnetic beads (Agencourt AMPure XP Kit produced by Beckman Coulter, Inc., catalog number: A63880) were thoroughly mixed by shaking;

(26) 2. One volume (1×) of Ampure XP magnetic beads was added to the ligation product obtained in step V above, mixed by pipetting 10 times, and kept at room temperature for 1 min;

(27) 3. The mixture was placed on a magnetic stand for 5 min and the supernatant was discarded.

(28) 4. 200 μL of freshly prepared 80% aqueous ethanol solution was added to the magnetic beads, placed at room temperature for 30 s, and the supernatant was discarded;

(29) 5. The above step 4 was repeated once;

(30) 6. 50 μL of 10 mM Tris-HCl (pH 8.0) was added for elution, and the supernatant was transferred to a new centrifuge tube;

(31) 7. One volume of magnetic beads was added and mixed 10 times with a pipette, and kept at room temperature for 1 min;

(32) 8. 200 μL of freshly prepared 80% aqueous ethanol solution was added to the magnetic beads and kept at room temperature for 30 s, and the supernatant was discarded;

(33) 9. The above step 8 was repeated once;

(34) 10. The lid was opened and the tube was kept at room temperature for 10 min;

(35) 11. 50 μL of 10 mM Tris-HCl (pH 8.0) was added and mixed by pipetting, and the tube was kept at room temperature for 1 min;

(36) 12. The tube was placed on a magnetic stand for 5 min, and the supernatant was transferred to a new centrifuge tube, and the purified adapter-ligated product was obtained.

(37) VII. PCR Amplification of Adapter-Ligated Product

(38) 1. Each of the reagents in Table 7 was added to the purified adapter-ligated product (the addition amounts were also shown in Table 7) to prepare a PCR amplification system.

(39) TABLE 7 PCR amplification system
Reagent | Addition amount (μL)
Nuclease-free water | 14
mPF (25 μM) | 2
mRPI5 (25 μM) | 2
dNTP mix (10 mM) | 1
2× Phanta Max Buffer | 25
Phanta Max Super-Fidelity DNA Polymerase | 1
Total volume | 45

(40) The sequence of the primer mPF is as follows (synthesized by Shanghai Generay Biotech Co., Ltd.):

(41) AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT;
the sequence of the primer mRPI5 is as follows (synthesized by Shanghai Generay Biotech Co., Ltd.): CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT. In the above primer sequences, the underlined base is thio-modified.
2. The mixture was dispensed into PCR tubes at 45 μL per tube, and 5 μL of the ligation product was then added to each tube.
3. The PCR amplification system was placed in a PCR instrument and amplified according to the PCR reaction procedure shown in Table 8 to obtain the PCR amplification product. The PCR amplification product was subjected to gel electrophoresis detection, and the results are shown in FIG. 3.

(42) TABLE 8 PCR reaction procedure
Segment 1 | 94° C. for 3 min
Segment 2 (20 cycles) | 94° C. for 30 s; 60° C. for 30 s; 72° C. for 30 s
Segment 3 | 72° C. for 10 min
Segment 4 | Stored at 4° C.
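Segment 2 of Table 8 runs 20 cycles. If every template were copied in every cycle, the product would increase (1 + E)^n-fold after n cycles at per-cycle efficiency E. The sketch below evaluates this. The efficiency values are illustrative assumptions, and real reactions plateau as primers and dNTPs are consumed, so the result is an upper bound rather than a yield prediction.

```python
def amplification_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical fold increase (1 + E) ** n after n cycles at efficiency E (0 to 1)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return (1.0 + efficiency) ** cycles

for e in (1.0, 0.9, 0.8):  # hypothetical per-cycle efficiencies
    print(f"E = {e}: {amplification_fold(20, e):,.0f}-fold")
```

With E = 1 this is 2^20 = 1,048,576-fold. Because the fold change is exponential in the cycle count, a modest drop in efficiency lowers it sharply, which is one reason cycle numbers are kept low for library amplification.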
VIII. Purification of PCR Amplification Product
1. The target bands from the above electrophoresis were excised, and the DNA was recovered with the Agarose Gel DNA Recovery Kit (spin-column format, catalog number GK2042-50) from Shanghai Generay Biotech Co., Ltd.
2. 400 μL of Binding Solution was added to the gel and placed in a 50° C. water bath until the gel block was dissolved.
3. The mixture was shaken once every 2 minutes during this period.
4. The dissolved gel block was transferred to a silica gel column, kept at room temperature for 2 min, and centrifuged at 6000 rpm for 1 min, and the waste solution was discarded.
5. 500 μL of Washing Solution was added to the silica gel column, and kept at room temperature for 3 min.
6. The column was centrifuged at 12000 rpm for 1 min and the waste solution was discarded.
7. Step 6 was repeated once.
8. The column was centrifuged at 12000 rpm for 1 min and then transferred to a new 1.5 mL centrifuge tube.
9. 30 μL of nuclease-free water was added to the silica gel column, which was kept at room temperature for 2 min.
10. The column was centrifuged at 12000 rpm for 1 min and the eluate was collected to obtain a purified PCR amplification product. The purified PCR amplification product was subjected to electrophoresis detection, and the results are shown in FIG. 4, where M is a 20 bp DNA ladder and lane 1 is the library gel recovery product. As can be seen from the figure, a high-throughput sequencing library was successfully obtained.
IX. High-Throughput Sequencing and Data Analysis
1. High-Throughput Sequencing

(43) The library constructed in step VIII was sequenced on the HiSeq 3000 platform in single-end 150 bp mode.

(44) 2. Analysis of High-Throughput Sequencing Results

(45) The sequencing reads were processed as follows: (i) low-quality bases at the 3′ end of each read were removed with trimmomatic-0.33 (www.usadellab.org/cms/index.php?page=trimmomatic); (ii) forward sequencing reads containing the adapter were extracted with the self-written perl script ExtractValid.pl; (iii) the poly A adapter was removed from the reads with cutadapt 1.2.1 (github.com/marcelm/cutadapt/releases/tag/v1.2.1); (iv) reads containing N or poly T adapters were removed with the self-written perl script FilterTN.pl; (v) poly tails containing wrong bases, incorporated at the end because of impure dNTP, were removed with the self-written perl script trim_polytail.pl; (vi) reads that were too long or too short were filtered out with the self-written perl script FastQ_ReadFilterByLength.pl; and (vii) identical sequences were merged, and sequences observed only once were removed, with the fastx_collapser module of the FASTX Toolkit 0.0.13 (hannonlab.cshl.edu/fastx_toolkit/). The proportion (purity) of each component of the oligonucleotide was then calculated as: ratio of each component (%) = number of reads of the component / sum of the numbers of reads of all components × 100%.
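The counting logic at the heart of this workflow (cut each read at its poly A tail, filter, collapse identical sequences, drop singletons, and apply the ratio formula) can be sketched in a few lines. This is not a reimplementation of ExtractValid.pl, trim_polytail.pl or the other tools named above, and it skips quality trimming. It assumes forward reads in which the oligonucleotide is immediately followed by its poly A tail, treats the first run of at least 8 A's as the start of the tail (a hypothetical threshold), and uses the 8-120 length range of claim 6 as hypothetical length bounds.

```python
import re
from collections import Counter

# Hypothetical tail marker: the first run of at least 8 consecutive A's.
POLY_A = re.compile(r"A{8,}")

def extract_insert(read: str):
    """Return the sequence upstream of the poly A tail, or None if no tail is found."""
    match = POLY_A.search(read)
    return read[:match.start()] if match else None

def component_ratios(reads, min_len=8, max_len=120, min_count=2):
    """Collapse identical inserts and report each component's share of reads (%)."""
    inserts = (extract_insert(r) for r in reads)
    kept = [s for s in inserts
            if s is not None and "N" not in s and min_len <= len(s) <= max_len]
    counts = Counter(kept)
    # Drop sequences seen fewer than min_count times (min_count=2 removes singletons).
    counts = {seq: n for seq, n in counts.items() if n >= min_count}
    total = sum(counts.values())
    return {seq: 100.0 * n / total for seq, n in counts.items()} if total else {}

oligo = "CAGAGCAGCTTGTCTTTCTTC"
reads = ([oligo + "A" * 20 + "TCGTG"] * 3      # full-length product
         + [oligo[1:] + "A" * 15] * 2          # product missing the 5' base
         + [oligo[:-1] + "A" * 12])            # singleton, removed
print(component_ratios(reads))
```

In this toy input the full-length sequence accounts for 60% and the 5′-truncated one for 40%, while the singleton is discarded. A real pipeline must also handle oligonucleotides whose own 3′ end is A-rich, where the boundary between sequence and tail is ambiguous.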

(46) The results are shown in Table 9 and FIG. 5. Of the 9664589 oligonucleotide sequences obtained, 7830352 (81.02%) were completely identical to the oligonucleotide sequence (CAGAGCAGCTTGTCTTTCTTC). Compared with the oligonucleotide sequence, 1331990 sequences (13.78%) lacked a partial sequence at the 5′ end, 144439 (1.49%) lacked a partial sequence at the 3′ end, and 18697 (0.19%) lacked partial sequences at both the 5′ end and the 3′ end; the remaining 339111 sequences (3.51%) fell into other categories (such as insertions, deletions and incorporation of wrong bases). These results show that the method of the present invention can accurately and comprehensively determine each component of the oligonucleotide sequence and its proportion.

(47) TABLE 9 Data analysis results
Category (compared with the oligonucleotide sequence) | Number of sequences | Ratio (%)
Completely identical | 7830352 | 81.02%
Lacking a partial sequence at the 5′ end | 1331990 | 13.78%
Lacking a partial sequence at the 3′ end | 144439 | 1.49%
Lacking partial sequences both at the 5′ end and 3′ end | 18697 | 0.19%
Other cases | 339111 | 3.51%
Total | 9664589 |
Note: The oligonucleotide sequence is CAGAGCAGCTTGTCTTTCTTC.
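The ratios in Table 9 follow directly from the formula in paragraph (45). The short check below recomputes them using only the read counts given in the table; the dictionary labels are abbreviations of the table's category names.

```python
# Read counts per category from Table 9.
TABLE_9 = {
    "Completely identical": 7830352,
    "Lacking a partial sequence at the 5' end": 1331990,
    "Lacking a partial sequence at the 3' end": 144439,
    "Lacking partial sequences at both ends": 18697,
    "Other cases": 339111,
}

total = sum(TABLE_9.values())
ratios = {category: round(100.0 * count / total, 2) for category, count in TABLE_9.items()}

print("Total:", total)
for category, ratio in ratios.items():
    print(f"{category}: {ratio:.2f}%")
```

The counts sum to the stated total of 9664589, and the rounded ratios match the table.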
3. Data Analysis

(48) The ncbi-blast-2.2.28+ software (ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/) was first used to align the sequences with the oligonucleotide reference sequence; a self-written script was then used to parse the alignment results, standardize the parsed results, classify the standardized results, and finally count the sequences in each class.
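The classification step can be illustrated for the categories of Table 9. The sketch below swaps the BLAST alignment for exact string matching against the reference. That is enough to separate pure terminal truncations, but every internal change (substitution, insertion, internal deletion) falls into "other". The function name and category labels are illustrative.

```python
def classify(component: str, reference: str) -> str:
    """Assign a collapsed sequence to a Table 9 style category by exact matching."""
    if not component:
        return "other"
    if component == reference:
        return "identical"
    if reference.startswith(component):
        return "lacking 3' end"        # a prefix: bases are missing at the 3' end
    if reference.endswith(component):
        return "lacking 5' end"        # a suffix: bases are missing at the 5' end
    if component in reference:
        return "lacking both ends"     # an internal substring
    return "other"

REF = "CAGAGCAGCTTGTCTTTCTTC"
for seq in ["GCTTGTCTTTCTTC", "CAGAGCAGCTTGTCTTTCTT", "CAGAGCAGCTTGTCTTCTTC"]:
    print(seq, classify(seq, REF))
```

The three example sequences are components 2, 10 and 13 of Table 10: a 5′ truncation, a 3′ truncation, and a one-base internal deletion, which exact matching can only label "other".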

(49) Table 10 lists the components whose content is greater than 0.1%, together with their contents. The analysis results for N−1 reads and N+1 reads, that is, reads one base shorter or one base longer than the 21-nt oligonucleotide sequence, are shown in Tables 11 and 12, respectively, with the number and ratio of reads in each category.

(50) TABLE 10 Components with a content of greater than 0.1% and their contents
No. | Number of reads | Component | Ratio (%)
1 | 7830352 | CAGAGCAGCTTGTCTTTCTTC | 81.02%
2 | 209704 | GCTTGTCTTTCTTC | 2.17%
3 | 204757 | AGAGCAGCTTGTCTTTCTTC | 2.12%
4 | 182418 | CAGCTTGTCTTTCTTC | 1.89%
5 | 165497 | GAGCAGCTTGTCTTTCTTC | 1.71%
6 | 157170 | GCAGCTTGTCTTTCTTC | 1.63%
7 | 154399 | AGCAGCTTGTCTTTCTTC | 1.60%
8 | 133435 | CTTGTCTTTCTTC | 1.38%
9 | 122557 | AGCTTGTCTTTCTTC | 1.27%
10 | 85682 | CAGAGCAGCTTGTCTTTCTT | 0.89%
11 | 34227 | CAGAGCAGCTTGTCTTTCTC | 0.35%
12 | 18571 | CAGAGCAGCTTGTCTTTCT | 0.19%
13 | 18146 | CAGAGCAGCTTGTCTTCTTC | 0.19%
14 | 15125 | CAGAGAGCTTGTCTTTCTTC | 0.16%
15 | 13947 | CGGAGCAGCTTGTCTTTCTC | 0.14%
16 | 13400 | CTGAGCAGCTTGTCTTTCTTC | 0.14%
17 | 11075 | CAGAGCAGCTTGTCTTTC | 0.11%
18 | 10938 | CAAGCAGCTTGTCTTTCTTC | 0.11%
19 | 9391 | CGAGCAGCTTGTCTTTCTTC | 0.10%
Others | | | 2.83%

(51) TABLE 11 Analysis results of N − 1 reads
No. | Category (compared with the oligonucleotide sequence) | Number of reads | Ratio (%)
1 | Reads lacking some bases at the 5′ end or 3′ end | 290439 | 68.22%
2 | Reads with incorrect bases | 2447 | 0.57%
3 | Reads with missing bases in the middle | 76214 | 17.9%
4 | Reads with partial insertion or deletion at the 5′ end or 3′ end | 57425 | 13.49%
5 | Reads with inserted bases | 137 | 0.03%
6 | Total number of N − 1 reads | 425718 |
Note: The oligonucleotide sequence is CAGAGCAGCTTGTCTTTCTTC.

(52) TABLE 12 Analysis results of N + 1 reads
No. | Category (compared with the oligonucleotide sequence) | Number of reads | Ratio (%)
1 | Reads with incorrect bases | 73 | 0.56
2 | Reads with inserted bases | 8571 | 65.32
3 | Reads whose bases at the 5′ end and 3′ end cannot match | 4568 | 34.81
4 | Reads with missing bases in the middle | 0 | 0
5 | Total number of N + 1 reads | 13122 |
Note: The oligonucleotide sequence is CAGAGCAGCTTGTCTTTCTTC.
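Table 11 separates N−1 reads that lost a terminal base from those missing a base in the middle. For the simplest case, a read that differs from the target by exactly one deleted base, this can be checked by brute force: delete each position of the reference in turn and compare. The sketch below does that. It is illustrative only, not the alignment-based classification used in the example, and it does not handle substitutions or insertions. Inside a homopolymer run (for example TTT) the deleted position is ambiguous, so all candidate positions are returned, and a read is called terminal whenever a terminal position is among the candidates (a design choice of this sketch).

```python
def single_deletion_positions(reference: str, read: str) -> list[int]:
    """0-based positions of `reference` whose deletion yields `read` (empty if none)."""
    if len(read) != len(reference) - 1:
        return []
    return [i for i in range(len(reference))
            if reference[:i] + reference[i + 1:] == read]

def classify_n_minus_1(reference: str, read: str) -> str:
    """Label an N-1 read by where its single missing base lies."""
    positions = single_deletion_positions(reference, read)
    if not positions:
        return "not a single-base deletion"
    if 0 in positions:
        return "missing 5' terminal base"
    if len(reference) - 1 in positions:
        return "missing 3' terminal base"
    return "missing base in the middle"

REF = "CAGAGCAGCTTGTCTTTCTTC"
for read in ["AGAGCAGCTTGTCTTTCTTC", "CAGAGCAGCTTGTCTTTCTT", "CAGAGCAGCTTGTCTTCTTC"]:
    print(read, single_deletion_positions(REF, read), classify_n_minus_1(REF, read))
```

Component 13 of Table 10 (CAGAGCAGCTTGTCTTCTTC) maps to any of positions 14, 15 or 16, the TTT run, which illustrates why counts of internal deletions are per read rather than per position.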

INDUSTRIAL APPLICATION

(53) The present invention provides a method for analyzing impurities of an oligonucleotide sequence based on high-throughput sequencing. The method comprises the following steps: constructing a high-throughput sequencing library for analysis of impurities of an oligonucleotide sequence; subjecting the library to high-throughput sequencing; and analyzing the nucleotide sequence components according to the sequencing results. The extension primer used to construct the library consists, in sequence, of the DNA molecule set forth in positions 1-22 of SEQ ID NO: 2 followed by N bases (A, T, C or G), N being an integer greater than or equal to 6. Experiments show that the method can quickly, accurately, and comprehensively analyze the purity of an oligonucleotide sequence and the content of each of its components.