NOVEL PROTOSPACER ADJACENT MOTIF SEQUENCE AND METHOD FOR MODIFYING TARGET NUCLEIC ACID IN GENOME OF CELL BY USING SAME

20230074760 · 2023-03-09

Abstract

Provided are: a method of modifying a target nucleic acid in the genome of a cell by using a novel PAM sequence; and a cell in which a target nucleic acid of the genome of the cell is modified by the method. Accordingly, genome editing may be performed by targeting a position, which has not been previously targeted, as a target for genome editing, and thus the range of applications of genome editing may be expanded.

Claims

1. A method of modifying a target nucleic acid in the genome of a cell, the method comprising: incubating a cell comprising a target nucleic acid; a polynucleotide encoding a clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) nuclease or a variant thereof; and a guide RNA, wherein the target nucleic acid comprises a protospacer adjacent motif (PAM) and a target sequence complementary to the guide RNA, the PAM consists of a nucleotide sequence selected from the group consisting of 5′-NAGN-3′, 5′-ACGG-3′, 5′-AGAS-3′, 5′-WGGN-3′, 5′-SGRN-3′, 5′-TGAD-3′, 5′-GTGC-3′, 5′-RAGN-3′, 5′-CAGH-3′, 5′-TAGB-3′, 5′-VGAG-3′, 5′-NGGN-3′, 5′-AAGH-3′, 5′-CAGY-3′, 5′-GAGN-3′, 5′-VAAG-3′, 5′-WGRN-3′, 5′-RGYG-3′, 5′-SGAD-3′, 5′-SGGN-3′, 5′-TGTG-3′, 5′-YGCG-3′, 5′-NYGG-3′, 5′-NGRN-3′, 5′-SGCC-3′, 5′-MTGC-3′, 5′-GTGH-3′, 5′-RCAG-3′, 5′-VTAG-3′, 5′-WGCC-3′, 5′-SACG-3′, 5′-NGCD-3′, 5′-NARG-3′, 5′-WAGA-3′, 5′-GAGT-3′, 5′-NGTK-3′, 5′-NGCN-3′, 5′-NACG-3′, 5′-AGGY-3′, 5′-NGDG-3′, 5′-GAGG-3′, 5′-GGGT-3′, 5′-VYAG-3′, 5′-GAAT-3′, 5′-NGVN-3′, 5′-VACG-3′, 5′-AYGA-3′, 5′-GAGH-3′, 5′-TGTN-3′, 5′-AGTB-3′, 5′-SGTK-3′, 5′-GATG-3′, 5′-NGAN-3′, 5′-NAAG-3′, 5′-NGCG-3′, 5′-NGGR-3′, 5′-DGGC-3′, 5′-NGGT-3′, 5′-NGTG-3′, and 5′-GGAG-3′, the target nucleic acid can be recognized by a complex comprising: the Cas nuclease or the variant thereof; and the guide RNA, and the complex comprising the Cas nuclease or the variant thereof and the guide RNA modifies the target nucleic acid sequence-specifically.

2. The method of claim 1, wherein the variant of the Cas nuclease is a nuclease derived from a bacterium selected from the group consisting of eSpCas9(1.1), SpCas9-HF1, HypaCas9, evoCas9, Sniper-Cas9, SpCas9 VQR variant, SpCas9 VRER variant, SpCas9 VRQR variant, SpCas9 VRQR-HF1 variant, and SpCas9 QQR1 variant.

3. The method of claim 1, wherein, when the variant of the Cas nuclease is eSpCas9(1.1), the PAM is a polynucleotide consisting of a nucleotide sequence selected from the group consisting of 5′-NAGN-3′, 5′-ACGG-3′, 5′-AGAS-3′, 5′-WGGN-3′, 5′-SGRN-3′, 5′-TGAD-3′, and 5′-GTGC-3′.

4. The method of claim 1, wherein, when the variant of the Cas nuclease is SpCas9-HF1, the PAM is a polynucleotide consisting of a nucleotide sequence selected from the group consisting of 5′-RAGN-3′, 5′-CAGH-3′, 5′-TAGB-3′, 5′-VGAG-3′, and 5′-NGGN-3′.

5. The method of claim 1, wherein, when the variant of the Cas nuclease is HypaCas9, the PAM is a polynucleotide consisting of a nucleotide sequence selected from 5′-AAGH-3′, 5′-CAGY-3′, 5′-GAGN-3′, and 5′-NGGN-3′.

6. The method of claim 1, wherein, when the variant of the Cas nuclease is evoCas9, the PAM is a polynucleotide consisting of a nucleotide sequence selected from 5′-VAAG-3′, 5′-WGRN-3′, 5′-RGYG-3′, 5′-SGAD-3′, 5′-SGGN-3′, 5′-TGTG-3′, and 5′-YGCG-3′.

7. The method of claim 1, wherein, when the variant of the Cas nuclease is Sniper-Cas9, the PAM is a polynucleotide consisting of a nucleotide sequence selected from 5′-NAGN-3′, 5′-NYGG-3′, 5′-NGRN-3′, 5′-SGCC-3′, 5′-MTGC-3′, and 5′-GTGH-3′.

8. The method of claim 1, wherein, when the variant of the Cas nuclease is SpCas9 VQR variant, the PAM is a polynucleotide consisting of a nucleotide sequence selected from 5′-RCAG-3′, 5′-VTAG-3′, 5′-WGCC-3′, 5′-SACG-3′, 5′-NGCD-3′, 5′-NGRN-3′, 5′-NARG-3′, 5′-WAGA-3′, 5′-GAGT-3′, and 5′-NGTK-3′.

9. The method of claim 1, wherein, when the variant of the Cas nuclease is SpCas9 VRER variant, the PAM is a polynucleotide consisting of a nucleotide sequence selected from 5′-NGCN-3′, 5′-NACG-3′, 5′-AGGY-3′, 5′-NGDG-3′, 5′-GAGG-3′, and 5′-GGGT-3′.

10. The method of claim 1, wherein, when the variant of the Cas nuclease is SpCas9 VRQR variant, the PAM is a polynucleotide consisting of a nucleotide sequence selected from 5′-VYAG-3′, 5′-GAAT-3′, 5′-NGVN-3′, 5′-VACG-3′, 5′-WAGA-3′, 5′-AYGA-3′, 5′-GAGH-3′, 5′-NARG-3′, 5′-TGTN-3′, 5′-AGTB-3′, 5′-SGTK-3′, and 5′-GATG-3′.

11. The method of claim 1, wherein, when the variant of the Cas nuclease is SpCas9 VRQR-HF1 variant, the PAM is a polynucleotide consisting of a nucleotide sequence selected from 5′-NGAN-3′, 5′-NAAG-3′, 5′-RCAG-3′, 5′-VTAG-3′, 5′-NGCG-3′, 5′-NGGR-3′, 5′-DGGC-3′, 5′-GAGG-3′, 5′-NGGT-3′, and 5′-NGTG-3′.

12. The method of claim 1, wherein, when the variant of the Cas nuclease is SpCas9 QQR1 variant, the PAM is a polynucleotide consisting of a nucleotide sequence selected from 5′-NAAG-3′ and 5′-GGAG-3′.

13. The method of claim 1, wherein the guide RNA is a polynucleotide complementary to 2 to 24 consecutive nucleotide sequences in the 5′- or 3′-direction of the PAM in the target nucleic acid.

14. The method of claim 1, wherein the length of the guide RNA is 17 to 24 nucleotides.

15. The method of claim 1, wherein the modification is cleavage, insertion, ligation, deamination, or a combination thereof.

16. A cell in which a target nucleic acid of the genome is modified by the method of claim 1.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0047] FIG. 1 is a schematic diagram showing an experimental procedure for determining a PAM sequence by using an immobilized protospacer.

[0048] FIGS. 2A to 2C are schematic diagrams of SpCas9 nuclease and variants thereof used in Examples (arrows: positions of mutations compared to wild-type SpCas9, Arg: arginine-rich bridge helix, PI: PAM-interacting domain).

[0049] FIGS. 3 to 15 are heat maps showing average indel frequencies induced in 4-nt candidate PAM sequences by SpCas9 nuclease and variants thereof used in Examples.

MODE OF DISCLOSURE

[0050] Hereinafter, the present disclosure will be described in more detail with reference to Examples below. However, these Examples are for illustrative purposes of one or more embodiments, and the scope of the present disclosure is not limited thereto.

Example 1. Identification of PAM Sequence of SpCas9 Variant

1. Preparation of Plasmid Library Containing Guide RNA, PAM Sequence, and Target Sequence

[0051] To identify PAM sequences other than NGG, which is the canonical SpCas9 PAM sequence, an oligonucleotide library was designed and synthesized to order by Twist Bioscience Co. (see FIG. 1).

[0052] Each oligonucleotide was designed to include, from the 5′-terminal, a 19-nt sgRNA sequence, a BsmBI restriction site, barcode 1 (20-nt sequence), a second BsmBI restriction site, barcode 2 (15-nt sequence), and a 30-nt target sequence containing an 8-nt PAM sequence with random sequences.

[0053] First, a library of 8,130 (271×30) target sequences was produced, in which 271 different 6-nt PAM sequences (256 NNNNTA sequences+16 AGGTNN sequences−1 overlapping AGGGTA sequence) were each paired with 30 protospacers. The 30 protospacers were perfectly matched GN19 protospacers (a G followed by 19 nucleotides) with SpCas9-induced indel frequencies ranging from 38% to 42%. These indel frequencies had been determined at 7,314 lentivirally integrated target sequences with NGG PAM sequences 3 days after lentiviral delivery of SpCas9, and ranged overall from 0% to 99%. Among the 7,314 sgRNAs, 30 with indel frequencies ranging from 38% to 42% were randomly selected.

[0054] Next, 2,940 sgRNAs were designed to validate nuclease activity in mismatched target sequences: 2,940 with NGG PAM=30 sgRNAs×98 targets [1 target without mismatch (5 different barcodes)+60 targets each having 1 base mismatch+19 targets each having 2 base mismatches+18 targets each having 3 base mismatches]; and 732=4 sgRNAs×61 targets [1 target without mismatch (5 different barcodes)+60 targets each having 1 base mismatch]×3 PAMs (NGAG, NAAG and NGCG).
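The library sizes quoted in paragraphs [0053] and [0054] can be cross-checked with simple arithmetic. The following is a minimal sketch that only reproduces the counts stated in the text; the variable names are illustrative and are not part of the disclosure.

```python
# Cross-check of the library sizes stated in [0053] and [0054].
# Variable names are illustrative; every count is taken from the text.

# PAM library: 256 NNNNTA + 16 AGGTNN - 1 sequence shared by both sets
n_pam = 4**4 + 4**2 - 1
n_protospacers = 30
pam_library = n_pam * n_protospacers

# Mismatch library with NGG PAM: 30 sgRNAs x 98 targets per sgRNA
targets_per_sgrna_ngg = 1 + 60 + 19 + 18
mismatch_ngg = 30 * targets_per_sgrna_ngg

# Mismatch library with NGAG, NAAG and NGCG PAMs:
# 4 sgRNAs x 61 targets per sgRNA x 3 PAMs
targets_per_sgrna_other = 1 + 60
mismatch_other = 4 * targets_per_sgrna_other * 3

print(n_pam, pam_library, mismatch_ngg, mismatch_other)
```

The four printed values correspond to the 271 PAM sequences, 8,130 target sequences, 2,940 NGG-PAM targets, and 732 non-NGG-PAM targets given above.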

[0055] The plasmid library containing the sgRNAs and the target sequences was prepared by a two-step cloning method in order to prevent uncoupling between the guide RNA and the target sequences during PCR amplification of the oligonucleotide pool.

[0056] The first step is the preparation of an initial plasmid library containing guide RNA and target sequences. Lenti-gRNA-Puro plasmid (Addgene, #84752) was linearized using BsmBI restriction enzyme (NEB), and the PCR-amplified oligonucleotide pool (target sequence) was ligated into the linearized vector. The ligation product was transformed into E. coli, and the plasmids were isolated from the selected colonies. The primers used to amplify the oligonucleotide pool are as follows:

TABLE-US-00001
Forward primer: (SEQ ID NO: 1) 5′-TTGAAAGTATTTCGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACC-3′
Reverse primer: (SEQ ID NO: 2) 5′-GAGTAAGCTGACCGCTGAAGTACAAGTGGTAGAGTAGAGATCTAGTTACGCCAAGCT-3′

[0057] The second step is the insertion of a sgRNA scaffold. The plasmid library prepared in the first step was digested with a BsmBI restriction enzyme (NEB), and the resulting nucleic acid fragments were purified on a gel after performing agarose electrophoresis. An insert fragment containing the sgRNA scaffold was synthesized and cloned into a TOPO vector (also referred to as a T-blunt vector, Solgent). The sequence of the insert fragment is shown below, wherein the sgRNA including a poly T sequence is bolded, and the BsmBI restriction enzyme site is underlined.

TABLE-US-00002
(SEQ ID NO: 3) 5′-CGTCTCTGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTGGGAGACG-3′

[0058] The TOPO vector containing the insert fragment was digested with a BsmBI restriction enzyme (Enzynomics) to isolate an 83-nt insert fragment. The isolated insert fragment was ligated into the digested plasmid library, and the ligation product was transformed into E. coli. Then, the plasmid library was isolated from the selected colonies.

2. Preparation of Cell Library

[0059] First, to produce a lentiviral library, HEK293T cells (ATCC), which are human embryonic kidney cells, were prepared. The plasmid prepared in Example 1.1, psPAX2, and pMD2.G were mixed and then transfected into the HEK293T cells using Lipofectamine 2000 (Invitrogen). Fresh medium was added to the cells 12 hours after the transfection, and a supernatant containing the virus was collected 36 hours after the transfection. The supernatant was filtered through a Millex-HV 0.45 μm low-protein binding membrane (Millipore), and aliquots were stored at −80 °C until use. The virus yield was verified using a Lenti-X p24 Rapid Titer Kit (Clontech). To calculate the virus titer, serially diluted virus aliquots were transduced into HEK293T cells in the presence of 8 μg/ml polybrene, and the cells were then cultured in the presence of 2 μg/ml puromycin or 20 μg/ml blasticidin S (InvivoGen).

[0060] For transduction of the prepared lentiviral library, HEK293T cells were cultured overnight in a culture dish. The lentiviral library was transduced into the HEK293T cells at a multiplicity of infection (MOI) of 0.3 in the presence of 8 μg/ml polybrene, and the cells were cultured overnight. The cells were further cultured in the presence of 2 μg/ml puromycin to remove non-transduced cells, and the cell library was maintained at 1.2×10^7 cells.
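The practical effect of an MOI of 0.3 can be illustrated with a short calculation. The sketch below assumes that the number of integrations per cell follows a Poisson distribution with a mean equal to the MOI; this is a common modeling assumption and is not stated in the disclosure, and the function name is hypothetical.

```python
import math


def poisson_transduction(moi: float) -> tuple[float, float]:
    """Return (fraction of cells transduced,
    fraction of transduced cells carrying exactly one integration).

    ASSUMPTION: integrations per cell are Poisson-distributed with mean = MOI.
    """
    p_zero = math.exp(-moi)        # cells with no integration
    p_one = moi * math.exp(-moi)   # cells with exactly one integration
    transduced = 1.0 - p_zero
    return transduced, p_one / transduced


transduced, single_copy = poisson_transduction(0.3)
print(f"transduced: {transduced:.1%}, single copy among transduced: {single_copy:.1%}")
```

Under this assumption, roughly a quarter of the cells are transduced, and most of the cells that survive puromycin selection carry a single guide RNA and target sequence pair.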

3. Delivery of Cas9 to Cell Library

[0061] A cell library of 1.2×10^7 cells was prepared, and viruses encoding SpCas9, eSpCas9(1.1), SpCas9-HF1, HypaCas9, evoCas9, Sniper-Cas9, xCas9, SpCas9 VQR variant, SpCas9 VRER variant, SpCas9 VRQR variant, SpCas9 VRQR-HF1 variant, SpCas9 QQR1 variant, and SpCas9-NG were transduced into the cells in the presence of 8 μg/ml polybrene (see FIGS. 2A to 2C). The transduction was performed at an MOI of 5, and the cells were selected in the presence of 20 μg/ml blasticidin S.

4. Measurement of Indel Frequency

[0062] To measure the frequency of insertions/deletions (indels) in the genome of the cells prepared in Example 1.3, deep sequencing and analysis of indel frequencies were performed.

[0063] For the deep sequencing, genomic DNA was isolated from the cells using a Wizard Genomic DNA purification kit (Promega). The inserted target sequences were amplified by PCR using 2X Taq PCR Smart mix (Solgent) for high-performance experiments. In the primary PCR, a total of 240 μg of the genomic DNA for each cell library was used to achieve a coverage of 1000× or greater for the library (approximately 10 μg of the genomic DNA per 10^6 cells). For each reaction, 2.5 μg of the genomic DNA was subjected to primary PCR, and all reaction products were pooled first and then purified. 50 ng of a sample from the purified product was amplified by secondary PCR using a primer containing an Illumina adapter and a barcode sequence. The amplified product was purified after electrophoresis, and then analyzed using HiSeq or MiniSeq (Illumina).
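The quantities in paragraph [0063] imply a number of primary PCR reactions and a number of genome equivalents, which can be worked out as follows. This is a sketch under stated assumptions: the library size used for the coverage estimate (the sum of the 8,130, 2,940, and 732 target sequences described in Example 1.1) is an assumption, since the text does not state which library size the 1000× figure refers to.

```python
# Quantities stated in [0063]
genomic_dna_total_ug = 240.0       # genomic DNA used per cell library
dna_per_reaction_ug = 2.5          # genomic DNA per primary PCR reaction
dna_per_million_cells_ug = 10.0    # approx. genomic DNA per 10^6 cells

# Number of primary PCR reactions needed to process all of the DNA
n_reactions = genomic_dna_total_ug / dna_per_reaction_ug

# Number of cells (genome equivalents) represented by the input DNA
cell_equivalents = genomic_dna_total_ug / dna_per_million_cells_ug * 1e6

# ASSUMPTION: library size = sum of the target counts given in Example 1.1
assumed_library_size = 8130 + 2940 + 732

# Average number of genome equivalents per library member
coverage = cell_equivalents / assumed_library_size

print(f"{n_reactions:.0f} reactions, {cell_equivalents:.1e} cell equivalents, "
      f"~{coverage:.0f}x coverage")
```

With these inputs the estimate exceeds the 1000× coverage stated in the text; a different library size would change the estimate proportionally.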

[0064] Primers used in the experiment are as follows:

TABLE-US-00003
Primers for primary PCR
Forward primers:
(SEQ ID NO: 4) 5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCTCTTGAAAAAGTGGCACCGAGTCG-3′
(SEQ ID NO: 5) 5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCTTGAAAAAGTGGCACCGAGTCG-3′
(SEQ ID NO: 6) 5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCTCGCTTGAAAAAGTGGCACCGAGTCG-3′
Reverse primers:
(SEQ ID NO: 7) 5′-GTGACTGGAGTTCAGACGTGTGCTTCCGATCTTTAAGTCGAGTAAGCTGACCGCTGAAG-3′
(SEQ ID NO: 8) 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTATTAAGTCGAGTAAGCTGACCGCTGAAG-3′
(SEQ ID NO: 9) 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTATTAAGTCGAGTAAGCTGACCGCTGAAG-3′
Primers for second PCR
Forward primer: (SEQ ID NO: 10) 5′-AATGATACGGCGACCACCGAGATCTACACACACTCTTTCCCTACACGAC-3′ (index)
Reverse primer: (SEQ ID NO: 11) 5′-CAAGCAGAAGACGGCATACGAGATGTGACTGGAGTTCAGACGTGT-3′ (index)

[0065] For the analysis of indel frequency, the deep sequencing data was analyzed using modified Python scripts. Each pair of a guide RNA and a target sequence was identified using a 19-nt sequence consisting of the 15-nt barcode and the 4-nt sequence located upstream of the barcode. When an indel was located at the expected cleavage site (i.e., an 8-nt region centered at the cleavage site), the indel was considered a Cas9-induced mutation. To remove the background indel frequency resulting from the array synthesis and the PCR amplification, total reads, indel reads, and indel frequencies were also calculated for a case in which Cas9 was not introduced, and the indel frequency (%) was calculated according to the following equation:

\[
\text{Indel frequency (\%)} = \frac{\text{Indel reads} - (\text{Total reads} \times \text{Background indel frequency})}{\text{Total reads} - (\text{Total reads} \times \text{Background indel frequency})} \times 100 \qquad \text{(Equation 1)}
\]
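Equation 1 can be written as a short function. The following is a minimal sketch, not the script used in the Examples: the function and parameter names are hypothetical, and it assumes that the background indel frequency is supplied as a fraction between 0 and 1 (indel reads divided by total reads in the sample without Cas9), which is what makes the units of the equation consistent.

```python
def indel_frequency_percent(indel_reads: int,
                            total_reads: int,
                            background_fraction: float) -> float:
    """Background-corrected indel frequency (%) following Equation 1.

    ASSUMPTION: background_fraction is a fraction in [0, 1), measured in
    cells into which Cas9 was not introduced.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0.0 <= background_fraction < 1.0:
        raise ValueError("background_fraction must be in [0, 1)")
    # Reads expected to carry an indel even without Cas9
    expected_background = total_reads * background_fraction
    numerator = indel_reads - expected_background
    denominator = total_reads - expected_background
    return numerator / denominator * 100.0


# Hypothetical read counts, for illustration only
background = 20 / 1000   # 20 indel reads out of 1,000 reads without Cas9
freq = indel_frequency_percent(indel_reads=420, total_reads=1000,
                               background_fraction=background)
print(f"{freq:.2f}%")
```

The function does not clip negative values, which can occur when a sample shows fewer indel reads than the background predicts; the text does not say how such cases were handled.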

[0066] The deep sequencing data was uploaded on the NCBI Sequence Read Archive (SRA, www.ncbi.nlm.nih.gov/sra/) with the Accession No. SRR10215483.

5. Determination of the PAM Sequence

[0067] From the deep sequencing data obtained according to Example 1.4, PAM sequences with high indel frequencies in the xCas9, SpCas9-NG, or SpCas9 nuclease were selected.

[0068] PAM sequences having an average indel frequency of 5% or more until Day 6 after the transduction into human cells were selected, and the indel frequencies according to the Cas9 nuclease and the variant thereof and the PAM sequence were analyzed.

[0069] Heat maps showing average indel frequencies induced in 4-nt candidate PAM sequences by SpCas9 (wild-type), eSpCas9(1.1), SpCas9-HF1, HypaCas9, evoCas9, Sniper-Cas9, xCas9, SpCas9 VQR variant, SpCas9 VRER variant, SpCas9 VRQR variant, SpCas9 VRQR-HF1 variant, SpCas9 QQR1 variant, and SpCas9-NG are shown in FIGS. 3 to 15, respectively (one-way ANOVA analysis followed by Tukey post-hoc test). Among the 4-nt candidate PAM sequences (a total of 256 (=4^4) candidate sequences), PAM sequences with an average indel frequency of 5% or more are shaded, and sequences with an average indel frequency of less than 5% are shown in white. The PAM sequences with high average indel frequencies are shown within boxes in bold solid lines. Experiments were carried out on all possible 4-nt PAM sequences of 5′-NNNN-3′, and analysis was performed using 22 to 30 target sequences for each PAM sequence.
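The selection rule described above (an average indel frequency of 5% or more across the target sequences tested for a PAM) can be sketched as follows. The function name and the example measurements are hypothetical and are not data from the Examples; only the 5% threshold and the count of 256 candidate sequences come from the text.

```python
from itertools import product
from statistics import mean

THRESHOLD_PERCENT = 5.0  # threshold stated in the text

# All 4-nt candidate PAM sequences (5'-NNNN-3'): 4^4 = 256
ALL_PAMS = ["".join(bases) for bases in product("ACGT", repeat=4)]


def classify_pams(indel_by_pam: dict[str, list[float]],
                  threshold: float = THRESHOLD_PERCENT) -> dict[str, bool]:
    """Map each PAM to True if its mean indel frequency (%) meets the threshold."""
    return {pam: mean(values) >= threshold
            for pam, values in indel_by_pam.items()}


# Hypothetical per-target indel frequencies (%), for illustration only
example = {
    "AGGA": [38.0, 41.5, 40.2],
    "ATTA": [0.4, 0.9, 0.2],
}
print(len(ALL_PAMS), classify_pams(example))
```

In a heat map, the PAMs mapped to True would correspond to the shaded cells and the others to the white cells.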

[0070] The 4-nt PAM sequences identified by analyzing the heat maps of FIGS. 3 to 15 are shown in Table 1.

TABLE-US-00004 TABLE 1
Cas9 nuclease | PAM sequence
SpCas9 | 5′-NGRN-3′, 5′-NAGN-3′, 5′-GGCN-3′, 5′-YGCC-3′, 5′-GTGH-3′, 5′-NYGG-3′, 5′-MTGC-3′, 5′-GCGC-3′
eSpCas9(1.1) | 5′-NAGN-3′, 5′-ACGG-3′, 5′-AGAS-3′, 5′-WGGN-3′, 5′-SGRN-3′, 5′-TGAD-3′, 5′-GTGC-3′
SpCas9-HF1 | 5′-RAGN-3′, 5′-CAGH-3′, 5′-TAGB-3′, 5′-VGAG-3′, 5′-NGGN-3′
HypaCas9 | 5′-AAGH-3′, 5′-CAGY-3′, 5′-GAGN-3′, 5′-NGGN-3′
evoCas9 | 5′-VAAG-3′, 5′-WGRN-3′, 5′-RGYG-3′, 5′-SGAD-3′, 5′-SGGN-3′, 5′-TGTG-3′, 5′-YGCG-3′
Sniper-Cas9 | 5′-NAGN-3′, 5′-NYGG-3′, 5′-NGRN-3′, 5′-SGCC-3′, 5′-MTGC-3′, 5′-GTGH-3′
xCas9 | 5′-NGDN-3′, 5′-VAGN-3′, 5′-SAWC-3′, 5′-NGCH-3′, 5′-VGCG-3′, 5′-GACC-3′, 5′-TAGS-3′, 5′-AYGG-3′, 5′-YCGG-3′, 5′-GTGC-3′
SpCas9 VQR | 5′-RCAG-3′, 5′-VTAG-3′, 5′-WGCC-3′, 5′-SACG-3′, 5′-NGCD-3′, 5′-NGRN-3′, 5′-NARG-3′, 5′-WAGA-3′, 5′-GAGT-3′, 5′-NGTK-3′
SpCas9 VRER | 5′-NGCN-3′, 5′-NACG-3′, 5′-AGGY-3′, 5′-NGDG-3′, 5′-GAGG-3′, 5′-GGGT-3′
SpCas9 VRQR | 5′-VYAG-3′, 5′-GAAT-3′, 5′-NGVN-3′, 5′-VACG-3′, 5′-WAGA-3′, 5′-AYGA-3′, 5′-GAGH-3′, 5′-NARG-3′, 5′-TGTN-3′, 5′-AGTB-3′, 5′-SGTK-3′, 5′-GATG-3′
SpCas9 VRQR-HF1 | 5′-NGAN-3′, 5′-NAAG-3′, 5′-RCAG-3′, 5′-VTAG-3′, 5′-NGCG-3′, 5′-NGGR-3′, 5′-DGGC-3′, 5′-GAGG-3′, 5′-NGGT-3′, 5′-NGTG-3′
SpCas9 QQR1 | 5′-NAAG-3′, 5′-GGAG-3′
SpCas9-NG | 5′-NGNN-3′, 5′-NAGN-3′, 5′-VAHD-3′, 5′-SAMC-3′, 5′-VATC-3′, 5′-TAHG-3′, 5′-RCDG-3′, 5′-TACT-3′, 5′-TATA-3′, 5′-VTNG-3′, 5′-GTAW-3′, 5′-RCGT-3′, 5′-VTGW-3′, 5′-RTGC-3′, 5′-TTGG-3′
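The PAM sequences in Table 1 use IUPAC degenerate nucleotide codes (for example, R = A or G, N = any base). The sketch below shows one way to expand such a motif into concrete 4-nt sequences and to test whether a given sequence matches any motif listed for a nuclease. The helper names are hypothetical; the code table follows the standard IUPAC nomenclature, and the motifs used in the example are those listed for SpCas9 QQR1 in Table 1.

```python
from itertools import product

# Standard IUPAC nucleotide codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_pam(motif: str) -> list[str]:
    """Expand a degenerate motif into every concrete sequence it denotes."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in motif))]


def matches_any(pam: str, motifs: list[str]) -> bool:
    """Return True if the concrete sequence `pam` fits at least one motif."""
    return any(
        len(pam) == len(motif)
        and all(base in IUPAC[code] for base, code in zip(pam, motif))
        for motif in motifs
    )


qqr1_motifs = ["NAAG", "GGAG"]  # SpCas9 QQR1 row of Table 1
print(len(expand_pam("NGRN")))           # number of sequences covered by NGRN
print(matches_any("TAAG", qqr1_motifs))  # fits NAAG
print(matches_any("AGGA", qqr1_motifs))  # fits neither motif
```

Expanding every motif in a row and taking the union gives the full set of 4-nt sequences that the row covers, which may be useful when scanning a genome for candidate target sites.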

[0071] Therefore, by discovering targetable positions in the human genome using the novel PAM sequences and predicting the indel-inducing activity of nucleases according to the PAM sequences, the utility of the genome editing may be increased.