POLYVALENT GUIDE RNAS FOR CRISPR ANTIVIRALS
20220204970 · 2022-06-30
Inventors
Cpc classification
C12N2310/20
CHEMISTRY; METALLURGY
C12N15/111
CHEMISTRY; METALLURGY
C12N2320/11
CHEMISTRY; METALLURGY
C12N9/22
CHEMISTRY; METALLURGY
International classification
C12N15/11
CHEMISTRY; METALLURGY
C12N15/10
CHEMISTRY; METALLURGY
Abstract
Generally, the present disclosure is directed to methods for gRNA design and products thereof that can be used as antivirals in which the produced gRNAs can be tolerant to polymorphisms across clinical strains and/or adapted for activity at multiple viral sites. Aspects of example gRNAs can also include reduced interactions with the human genome or transcriptome.
Claims
1. A method to determine a pgRNA sequence comprising: identifying two or more target sequences in a viral genome for recognition by a Cas effector; for each target sequence of the two or more target sequences, calculating a homology score comprising aligning said target sequence with each other target sequence of the two or more target sequences; determining one or more target pairs based at least in part on the homology score, wherein each target pair comprises a first target sequence and a second target sequence of the two or more target sequences having the homology score calculated as greater than or equal to 60% sequence identity; generating a pgRNA template for at least one of the one or more target pairs, wherein the pgRNA template has a complementary sequence to the first target sequence, the second target sequence, or a convergent sequence; generating a relative activity score for each of one or more pgRNA templates by comparing the pgRNA template to a complementary sequence to the first target sequence and a complementary sequence to a second nucleotide sequence present in a different viral genome, a mutant viral genome, or both, wherein each pgRNA template comprises a sequence of nucleotides; determining whether to calculate an off-target score for each pgRNA template based at least in part on the relative activity score generated for said pgRNA template; and determining the pgRNA sequence based at least in part on the relative activity score for each pgRNA template, the off-target score, or both.
2. The method of claim 1, wherein the two or more target sequences are RNA, DNA or both.
3. The method of claim 2, wherein the two or more target sequences are RNA.
4. The method of claim 1, wherein identifying the two or more target sequences in the viral genome comprises: determining a sequence position for each of one or more protospacer motifs present in the viral genome based at least in part on the Cas effector, wherein each of the one or more protospacer motifs comprises an adjacent sequence of nucleotides; assigning at least one sequence position as a protospacer position; and identifying the two or more target sequences as a sequence of nucleotides immediately downstream of the protospacer position.
5. The method of claim 1, wherein the Cas effector is enAsCas12a.
6. The method of claim 4, wherein the one or more protospacer motifs are from the group consisting of: TTYN, CTTV, RTTC, TATM, CTCC, TCCC, TACA, RTTS, TATA, TGTV, ANCC, CVCC, TGCC, GTCC, TTAC, or combinations thereof.
7. The method of claim 6, wherein the one or more protospacer motifs are from the group consisting of: TTYN, CTTV, RTTC, TATM, CTCC, TCCC, TACA, or combinations thereof.
8. The method of claim 1, wherein the different viral genome and the viral genome are included in a viral family.
9. The method of claim 8, wherein the viral family is coronaviruses.
10. The method of claim 1, wherein comparing the pgRNA template to the complementary sequence to the first nucleotide sequence and the complementary sequence to the second nucleotide sequence present in the different viral genome, the mutant viral genome, or both comprises: determining a first sequence identity for the pgRNA template to the complementary sequence to the first nucleotide sequence and a second sequence identity for the pgRNA template to the complementary sequence to the second nucleotide sequence, wherein the first sequence identity and the second sequence identity are calculated based on a BLAST alignment, and wherein the relative activity score is based at least in part on the first sequence identity and the second sequence identity.
11. The method of claim 10, wherein calculating the off-target score is performed only for the pgRNA templates having calculated the first sequence identity as greater than about 60% and the second sequence identity as greater than about 60%.
12. The method of claim 11, wherein calculating the off-target score is performed only for the pgRNA templates having calculated the first sequence identity as greater than about 90% and the second sequence identity as greater than about 90%.
13. The method of claim 11, wherein calculating the off-target score is based at least in part on comparing each of the one or more pgRNA templates to a human genome sequence or a human transcriptome sequence.
14. The method of claim 1, wherein determining the pgRNA sequence is based at least in part on a region of interest comprising a sequence of adjacent nucleotides present in the viral genome.
15. The method of claim 1, wherein, each target pair comprises a first target sequence and a second target sequence of the two or more target sequences having the homology score calculated as greater than or equal to 75% sequence identity.
16. A pgRNA having a pgRNA sequence determined according to the method of claim 1.
17. The pgRNA of claim 16, wherein the pgRNA sequence is determined based on identifying two or more target sequences in a coronavirus genome.
18. The pgRNA of claim 17, wherein the coronavirus genome is SARS-CoV-2.
19. The pgRNA of claim 16, wherein the pgRNA sequence comprises UAACCAUUGUUCGCUGUAACAGUAUCA (SEQ ID NO: 4).
20. A method for treating a viral infection in a patient comprising: delivering to a patient in need thereof a composition comprising the pgRNA of claim 16.
21. The method of claim 20, wherein the patient displays symptoms of Covid-19.
Description
DETAILED DESCRIPTION
[0052] In general, the present disclosure is directed to methods for the design of gRNAs for CRISPR antivirals that exploit the widely recognized tendency of different CRISPR effectors to possess varying levels of tolerance to imperfect complementarity between the gRNA spacer and its targets. While significant efforts have gone into limiting this tendency for precision gene editing applications, where activity at multiple or “off-target” sites is to be prevented at all costs, implementations of the present disclosure utilize a process for generating “polyvalent” gRNAs (pgRNAs) that can demonstrate activity at multiple viral genomic sites, in effect producing operational multiplexing with a single gRNA. For instance, embodiments of the present disclosure can be used to generate pgRNA sequences that can be characterized by one or more of the following properties: (i) high relative activity at multiple viral targets, (ii) high relative activity across clinical strain variants, (iii) low predicted relative activity at potential human “off-targets,” and (iv) reasonable biophysical characteristics that suggest high CRISPR activity for potential antiviral and/or viral detection applications.
[0053] Aspects of example implementations include: designing pgRNAs which exhibit >95% activity at distant viral sites along a viral genome such as the SARS-CoV-2 ssRNA genome and which can be tolerant to variations across strains, while still avoiding predicted off-target activity with components of the human transcriptome. In particular, these pgRNAs may be designed based on the pgRNA use in combination with a specific Cas effector such as Cas13 from Ruminococcus flavefaciens XPD3002 (RfxCas13d). Another example of a Cas effector can include a Cas12a variant (engineered Cas12a from Acidaminococcus sp. BV3L6, enAsCas12a) that can target multiple locations along the HIV-1 provirus—up to three viral targets using a single pgRNA designed in accordance with the present disclosure—while minimizing activity at other sites in the human genome.
[0054] One example implementation in accordance with the present disclosure can include a method for determining a pgRNA sequence, such as a pgRNA sequence for producing an antiviral. The method for determining a pgRNA sequence can include identifying two or more target sequences (e.g., a nucleic acid sequence that can be RNA or DNA) in a viral genome for recognition by a Cas effector. The method can also include calculating a homology score based on performing an alignment between each target sequence of the two or more target sequences and each other target sequence. More particularly, the homology score can include a metric such as sequence identity, sequence similarity, or another similar measure of the regions of overlap between target sequences.
[0055] Example methods for determining a pgRNA sequence can also include determining a target pair comprising a first nucleotide sequence present in the viral genome and a second nucleotide sequence present in the different viral genome, the mutant viral genome, or both. In some embodiments, the target pair can be determined based at least in part on the homology score. For example, the homology score may determine that a sequence of nucleotides (nt) displays 95% sequence identity between the viral genome and a different viral genome. In certain implementations, depending on if the homology score meets a certain threshold (e.g., greater than 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%), the sequence of nucleotides can be used to determine the target pair. As should be understood, the different viral genome may include a viral genome from the same viral family (e.g., coronaviruses).
[0056] Another aspect of example methods for determining a pgRNA sequence can include generating a relative activity score for each of one or more pgRNA templates by comparing the pgRNA template to a complementary sequence to the first nucleotide sequence and a complementary sequence to the second nucleotide sequence. The pgRNA templates can be generated by various means including random generation, computer modeling, or both, and generally each pgRNA template includes a sequence of nucleotides.
[0057] Example methods for determining a pgRNA sequence may further include determining whether to calculate an off-target score for each pgRNA template based at least in part on the relative activity score generated for said pgRNA template.
[0058] For example embodiments according to the present disclosure, determining the pgRNA sequence can be based at least in part on the relative activity score for each pgRNA template, the off-target score, or both.
[0059] One example aspect of identifying the two or more target sequences in the viral genome can include determining a sequence position for each of one or more protospacer motifs present in the viral genome based at least in part on the Cas effector. For instance, certain Cas effectors may display preferential recognition and/or binding to different regions of the viral genome (e.g., protospacer motifs). In particular, some implementations may use the position of protospacer motifs in the viral genome to identify possible target sequences that would display improved efficacy for antiviral treatments. For example, by assigning at least one sequence position as a protospacer position, certain embodiments may identify the two or more target sequences as at least including a sequence of nucleotides immediately downstream of the protospacer position in the viral genome.
[0060] For implementations of the present disclosure, the Cas effector can include any Cas effector that can be implemented as part of a CRISPR system to result in breakage of nucleotide oligomers such as RNA or DNA. Some non-limiting examples of Cas effectors that can be used in embodiments of the disclosure include enAsCas12a (Cas12a), RfxCas13d (Cas 13d), and/or SpyCas9 (Cas 9).
[0061] As previously discussed, certain Cas effectors may display preferred recognition and/or binding to certain protospacer motifs. For instance, using a Cas effector of the present disclosure, the one or more protospacer motifs can include one or more from the group: TTYN, CTTV, RTTC, TATM, CTCC, TCCC, TACA, RTTS, TATA, TGTV, ANCC, CVCC, TGCC, GTCC, TTAC, or combinations thereof. In some implementations, the one or more protospacer motifs can include a subset of this group. For example, in certain embodiments, the one or more protospacer motifs are from the group: TTYN, CTTV, RTTC, TATM, CTCC, TCCC, TACA, or combinations thereof. More particularly, some embodiments can include identifying target sequences that occur downstream of the position of one or more of these protospacer motifs in the viral genome. As used herein, protospacer motifs are provided as nucleotide sequences using the standard IUPAC nucleotide codes: A (adenosine), C (cytosine), T (thymidine), G (guanosine), U (uridine), N (any nucleotide), R (adenosine or guanosine), S (guanosine or cytosine), Y (a pyrimidine: C, T, or U), V (adenosine, cytosine, or guanosine), and M (adenosine or cytosine).
[0062] One aspect of example embodiments can include methods for developing pgRNA that can target members of a viral family. For instance, in some implementations, the viral genome and the different viral genome can be included in the same viral family. Viral families are similar to animal families in that the genomes of viruses of the same family display some degree of overlap which can be determined based on aligning the genetic sequence to determine the sequence identity or similarity for regions of the genome. One non-limiting example of a viral family can include coronaviruses (coronaviridae), which includes members such as SARS-CoV-2, MERS-CoV, and SARS-CoV. Another non-limiting example of a viral family can include retroviruses (retroviridae), which includes members such as human immunodeficiency virus (HIV) and human T-lymphotropic virus (HTLV).
[0063] In certain implementations, methods for determining a pgRNA sequence can include identifying target sequences in a viral genome from a certain viral family and, calculating a homology score between a first viral genome from the certain viral family and a second, different viral genome from the same certain viral family. As an example for illustration, the first viral genome can be the genome for SARS-CoV-2 and the second viral genome can be the genome for MERS-CoV.
[0064] According to an aspect of certain embodiments, comparing the pgRNA template to a complementary sequence to the first nucleotide sequence and a complementary sequence to the second nucleotide sequence can include determining a first sequence identity for the pgRNA template to the complementary sequence to the first nucleotide sequence and a second sequence identity for the pgRNA template to the complementary sequence to the second nucleotide sequence. In general, a complementary sequence as used herein carries the ordinary meaning in biology. Base pairing rules for nucleotides indicate that each one of the 5 nucleobases (adenosine ‘A’, guanosine ‘G’, cytidine ‘C’, uridine ‘U’, thymidine ‘T’) has a complementary nucleobase based on the type of nitrogenous base. For example, the complement to A is T or U (and vice versa) and the complement to C is G (and vice versa). Thus a complementary sequence to the example oligonucleotide AUCGCAUCU can be XAGCGXAGA where ‘X’ is independently T or U. In determining whether the complement to A is T or U, the type of viral genetic material may be used as one basis. In certain embodiments for designing pgRNA, the complement to A may only be U.
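As a minimal illustration of the base pairing rule described above, the following sketch computes the position-by-position complement of a sequence. The function name `complement` and the `rna` flag are assumptions introduced here for illustration and are not part of the disclosure.

```python
def complement(seq: str, rna: bool = True) -> str:
    """Return the position-by-position complement of a nucleotide sequence.

    Assumption: when `rna` is True the complement of A is written as U,
    otherwise as T. The sequence is not reversed.
    """
    a_partner = "U" if rna else "T"
    pairs = {"A": a_partner, "U": "A", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in seq.upper())


if __name__ == "__main__":
    # The example oligonucleotide from the text, with X resolved to U.
    print(complement("AUCGCAUCU"))
```

With `rna=True` the example oligonucleotide AUCGCAUCU yields UAGCGUAGA, which is the XAGCGXAGA pattern of the text with every X taken as U.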
[0065] For some example embodiments of the present disclosure, the first sequence identity and/or the second sequence identity can be determined according to various methods. One example method can include performing a sequence alignment such as a BLAST alignment. BLAST alignment is a tool for comparing two sequences (e.g., nucleotide sequences) to determine characteristics such as sequence identity or sequence similarity as measures of overlap between portions of the sequences. In this manner, regions of higher overlap (greater similarity) and regions of poor overlap (lower similarity) can be determined. Thus these regions of greater similarity may be used to design pgRNA that can target multiple viruses. As such, in some embodiments of the present disclosure, the relative activity score can be based at least in part on the first sequence identity and the second sequence identity.
[0066] In certain example embodiments, calculating the off-target score can be performed only for pgRNA templates having calculated the first sequence identity as greater than about 60% and the second sequence identity as greater than about 60%, such as the first sequence identity greater than 62%, 64%, 66%, 68%, 70%, 72%, 74%, 76%, 78%, 80%, 82%, 84%, 86%, 88%, 90%, 92%, 94%, 96%, or 98% and, independently, the second sequence identity greater than 62%, 64%, 66%, 68%, 70%, 72%, 74%, 76%, 78%, 80%, 82%, 84%, 86%, 88%, 90%, 92%, 94%, 96%, or 98%. For instance, in some implementations, calculating the off-target score is performed only for the pgRNA templates having calculated the first sequence identity as greater than about 90% and the second sequence identity as greater than about 90%.
[0067] An aspect of some implementations may include calculating the off-target score based at least in part on comparing each of the one or more pgRNA templates to a human genome sequence or a human transcriptome sequence. Generally, the off-target score can be used to approximate overlapping or possible reactivity between the designed pgRNA and genetic material (e.g., RNA or DNA) present in humans. In this manner, overlapping reactivity may be diminished by excluding or removing pgRNA templates meeting an off-target score threshold.
[0068] Another aspect of certain implementations can include using further selection criteria in the design of pgRNAs. For instance, determining the pgRNA sequence can be based at least in part on a region of interest which includes a sequence of adjacent nucleotides present in the viral genome. The region of interest can include a position of a gene that may be of clinical or functional significance, a position which is conserved over many viral strains and/or that demonstrates greater intolerance to mutations, or a position determined using an activity prediction such as one that can be performed using bioinformatic tools and/or methods, prior to experimental validation.
[0069] While the present application is generally directed to embodiments for treating humans, it should be understood that similar protocols may be developed for treating viral diseases in a variety of organisms. For example, viral prophylaxis and/or treatment is particularly needed in many agriculturally important plants and animals. One aspect of implementations for designing pgRNA for these organisms is modifying the step for calculating the off-target score. For the organism to be treated, the off-target score should be based on the alignment to the genome or transcriptome of the host organism to be treated (e.g., a plant genome). In this manner, implementations of the present disclosure can include pgRNA designed according to such example method that can be delivered to a plant to treat a viral infestation. Further, genetic modification of organisms including plants, may be used to create transgenic organisms that produce the pgRNA rather than requiring a delivery method.
[0070] One example embodiment of the present disclosure can include a pgRNA having a pgRNA sequence determined according to example embodiments of the present disclosure. Aspects of the pgRNA can include improved activity across multiple viral strains (e.g., viruses from the same viral family). For instance, the pgRNA can be included as a cofactor in a CRISPR-Cas system to produce an antiviral.
[0071] Aspects of the pgRNA can include a pgRNA sequence that is determined based on identifying two or more target sequences in a coronavirus genome (e.g., SARS-CoV-2).
[0072] Another example embodiment of the present disclosure can include a method for treating a viral infection by delivering to a patient in need thereof a composition comprising a pgRNA, the pgRNA having a pgRNA sequence determined according to example methods of the present disclosure. For instance, an example implementation of the present disclosure can include a method for treating a patient displaying symptoms of Covid-19 by delivering a composition including a pgRNA sequence determined based on identifying one or more sequences in the SARS-CoV-2 genome.
[0073] As described in the disclosure, sequence identity is related to sequence homology. Homology comparisons may be conducted by eye or, more usually, with the aid of readily available sequence comparison programs. These commercially and publicly available computer programs can be used to determine percent (%) homology between two or more sequences and may also calculate the sequence identity shared by two or more nucleic acid sequences. Sequence homologies may be generated by any of a number of computer programs known in the art, for example BLAST, which is available for offline and online searching (see, e.g., https://blast.ncbi.nlm.nih.gov/Blast.cgi). As used herein, sequence identity values
[0074] A further embodiment of the present disclosure can include a diagnostic that includes one or more pgRNA sequences designed according to example implementations of the present disclosure. These diagnostics can include viral detection platforms which can provide advantages such as more sensitive identification of viral genetic material (e.g., by increasing the effective numbers of viral targets in a clinical sample), improved time-to-detection, and diagnostics that are more robust to viral mutations and variations across viral strains. When these example CRISPR diagnostic effectors recognize a viral nucleic acid sequence complementary to their gRNA, they cleave the viral nucleic acids, then begin to indiscriminately degrade any other single-stranded RNA or DNA they encounter. In a CRISPR-based viral detection platform, a “probe” nucleic acid is attached to a molecule that becomes highly fluorescent when the probes are degraded indiscriminately by the CRISPR effector. When these probes are included and this reaction is coupled with an isothermal PCR reaction to increase the amount of viral nucleic acids present in a clinical sample, it rapidly produces a bright signal without the need for a thermocycler.
[0075] The present invention will be better understood with reference to the following non-limiting examples.
EXAMPLES
[0076] The present examples provide aspects of embodiments of the present disclosure. These examples are not meant to limit embodiments solely to such examples herein, but rather to illustrate some possible implementations.
Material and Methods
Viral Nucleotide Sequences
[0077] The Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) isolate Wuhan-Hu-1 complete genome (NCBI Reference Sequence: NC_045512.2) served as the primary target for pgRNA development vs. the SARS-CoV-2 ssRNA genome. Design of pgRNA targets vs. HIV-1 provirus used the Human immunodeficiency virus type 1 (HXB2), complete genome; HIV1/HTLV-III/LAV reference genome (GenBank: K03455.1).
Calculation of Mismatch Penalties and Relative CRISPR Activities
[0078] Estimates of the relative CRISPR activity at sites not perfectly targeted by the gRNA/pgRNA spacer sequence were generated by calculating the Cutting Frequency Determination (CFD) score (35,45). To calculate the CFD score, the penalty (relative reduction in CRISPR activity) that results from each mismatched site is first drawn from a CFD matrix, the table of position-specific reductions in activity that occur as a result of mispairing between specific nucleotides in the spacer and target. The CFD matrices for each CRISPR effector were generated by the Sanjana lab (RfxCas13d) and the Doench lab (SpyCas9 and enAsCas12a, using the data from the “dropout” experiments) using massively parallel screens of gRNA libraries for CRISPR activity, and CFD scoring was implemented in MATLAB using publicly available data sets from those labs. The CFD score for a given target and gRNA spacer is the product of the CFD penalties for each mismatch; the position-specific penalties (average over all possible mismatched nucleotides). This approach is fast to implement and has been successfully used as a reasonable approximation for CRISPR activity at off-target sites for a number of different CRISPR effectors. The effect of different PAMs (PAM strength) on enAsCas12a activity at different sites was modeled with a multiplicative penalty using data from similar large-scale screens of PAM libraries. In the case of RfxCas13d, penalties were recovered by raising 2 to the power of the reported log2(fold-change in expression) value (i.e., converting the value back to a fold-change) relative to a perfectly complementary targeted mRNA reporter in their massively parallel screen for gRNA activity in the presence of mismatches. A missing value (rA-rC mismatch at position 15) was interpolated from the penalties of the rA-rC mismatches at positions 14 and 16.
In the event of multiple sequential mismatches (two-in-a-row, three-in-a-row, etc.), the position-specific penalties for double and triple mismatches were used to calculate the CFD scores at those sites. If an off-target site had <15 nt (nucleotide) identity with the intended target (<55% identity for RfxCas13d or <65% identity for enAsCas12a), the CRISPR effectors were considered effectively inactive at that site.
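The multiplicative scoring described above can be sketched as follows. This is a simplified illustration and not the MATLAB implementation referenced in the text: the `penalties` table layout (keyed by 1-based position, intended base, and site base, and holding the fraction of activity retained) is a hypothetical format, the penalty values in the usage example are made up, and the special handling of sequential double and triple mismatches and of PAM strength is omitted.

```python
from math import prod


def cfd_score(intended: str, site: str, penalties: dict,
              min_identical_nt: int = 15) -> float:
    """Estimate relative activity at `site` for a spacer designed for `intended`.

    `penalties[(position, intended_base, site_base)]` is assumed to hold the
    fraction of activity retained for that single mismatch (hypothetical layout).
    """
    if len(intended) != len(site):
        raise ValueError("sequences must have the same length")
    mismatches = [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(intended, site), start=1)
        if a != b
    ]
    identical = len(intended) - len(mismatches)
    # Sites with fewer than 15 identical nt are treated as effectively inactive.
    if identical < min_identical_nt:
        return 0.0
    # The score is the product of the per-mismatch penalties (1.0 if none).
    return prod((penalties[m] for m in mismatches), start=1.0)


if __name__ == "__main__":
    toy_penalties = {(3, "A", "G"): 0.5, (10, "C", "U"): 0.4}  # made-up values
    intended = "A" * 9 + "C" + "A" * 13
    site = "AAGAAAAAAUAAAAAAAAAAAAA"
    print(cfd_score(intended, site, toy_penalties))
```

A perfectly matched site returns 1.0 under this sketch, and each additional mismatch can only lower the score, which is the behavior the text relies on when ranking candidates.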
Design of Polyvalent Guide RNAs
[0079] One example protocol for the design of polyvalent guide RNAs is summarized in
[0080] Step 1: Identification of Targets (‘protospacers’). For RfxCas13d, every 27 nt sequence along the Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1 complete genome was evaluated as a CRISPR target, also known as a ‘protospacer.’ For enAsCas12a, to be recognized sufficiently by the enzyme, protospacers must be located immediately downstream of a “Tier 1” protospacer adjacent motif (‘PAM’) (TTYN, CTTV, RTTC, TATM, CTCC, TCCC, and TACA) or a weaker “Tier 2” PAM (RTTS, TATA, TGTV, ANCC, CVCC, TGCC, GTCC, TTAC). Every 23 nt target sequence located immediately downstream of a Tier 1 or Tier 2 PAM site was identified on either strand of the HIV-1 proviral reference genome and evaluated as a potential target/protospacer.
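A minimal sketch of the Cas12a target search in Step 1 is given below, assuming a DNA genome written with the letters A, C, G, and T and the standard IUPAC meanings of the degenerate PAM letters. The function and variable names are hypothetical, and positions reported for the “-” strand are indices on the reverse-complemented sequence rather than reference coordinates.

```python
import re

# Standard IUPAC codes for the letters used in the PAM lists.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]",
         "S": "[CG]", "M": "[AC]", "V": "[ACG]", "N": "[ACGT]"}
TIER1_PAMS = ["TTYN", "CTTV", "RTTC", "TATM", "CTCC", "TCCC", "TACA"]
TIER2_PAMS = ["RTTS", "TATA", "TGTV", "ANCC", "CVCC", "TGCC", "GTCC", "TTAC"]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(genome: str, motifs, length: int = 23):
    """List (strand, index, PAM, protospacer) for every PAM-adjacent target.

    The protospacer is the `length` nt immediately downstream (3') of a
    4 nt PAM; both strands are scanned.
    """
    pattern = re.compile("|".join("".join(IUPAC[c] for c in m) for m in motifs))
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - 4 - length + 1):
            pam = seq[i:i + 4]
            if pattern.fullmatch(pam):
                hits.append((strand, i, pam, seq[i + 4:i + 4 + length]))
    return hits


if __name__ == "__main__":
    toy_genome = "GGGG" + "TTTA" + "ACGTACGTACGTACGTACGTACG" + "GG"
    for hit in find_protospacers(toy_genome, TIER1_PAMS + TIER2_PAMS):
        print(hit)
```

For RfxCas13d no PAM is required, so the equivalent step is simply a sliding 27 nt window over the genome.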
[0081] Step 2: Identification of Targetable Pairs with high homology. For each virus, every potential target was aligned to every other potential target, and pairs with >75% sequence identity (≥21 nt identity for Cas13d targets and ≥16 nt identity for Cas12a targets) were identified. Pairs overlapping the SARS-CoV-2 poly(rA)-tail were removed from the list of potential pairs. For targeting the HIV provirus, exact target matches between pairs of sequences on the two long terminal repeat (LTR) regions were not considered (for reasons discussed below) unless they also formed a “target pair” with a segment between the two regions.
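The pairing in Step 2 can be illustrated with the sketch below. It assumes an ungapped, position-by-position comparison of equal-length targets as a stand-in for the alignment used in the text; the function names and the `min_identical_nt` parameter are hypothetical, and the poly(rA)-tail and LTR exclusions are not modeled.

```python
from itertools import combinations


def identical_positions(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def find_target_pairs(targets, min_identical_nt: int):
    """Return (i, j, n_identical) for every pair of targets meeting the cutoff.

    `min_identical_nt` would be, e.g., 21 for 27 nt Cas13d targets or
    16 for 23 nt Cas12a targets, following the thresholds in the text.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(targets), 2):
        if len(a) != len(b):
            continue  # only equal-length targets are compared in this sketch
        n = identical_positions(a, b)
        if n >= min_identical_nt:
            pairs.append((i, j, n))
    return pairs


if __name__ == "__main__":
    toy_targets = ["A" * 27, "A" * 21 + "C" * 6, "C" * 27]
    print(find_target_pairs(toy_targets, min_identical_nt=21))
```

Because every target is compared with every other target, the number of comparisons grows quadratically with the number of targets, which is acceptable for single viral genomes of the sizes discussed here.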
[0082] Step 3: Adaptation of pgRNA activity at pair sequences. For a given target pair, a pgRNA spacer template was generated complementary to the targets, using the location and sequences of the matching targets. Different ‘candidate pgRNA’ spacers were generated with all four potential nucleotides (rA, rU, rC, rG) at each of the sites of sequence divergence between the target pairs, i.e., 4^n candidates for target pairs with n differences between sequences. A mismatch penalty (CFD score) between the candidates and each of the target pairs was calculated using the multiplicative approach (
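The candidate enumeration in Step 3 can be sketched as follows. The sketch assumes the two targets are given in the same orientation and at equal length, and that the spacer is written as the RNA reverse complement of the sequence it targets; the function names are hypothetical and the scoring of each candidate is left out.

```python
from itertools import product


def reverse_complement_rna(seq: str) -> str:
    """RNA reverse complement of an uppercase RNA sequence."""
    return seq.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def candidate_spacers(target_a: str, target_b: str):
    """Enumerate the 4**n candidate spacers for a target pair with n differences."""
    a = target_a.upper().replace("T", "U")
    b = target_b.upper().replace("T", "U")
    if len(a) != len(b):
        raise ValueError("targets must have the same length")
    divergent = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    candidates = []
    # Try every nucleotide at every divergent site; shared sites stay fixed.
    for choice in product("ACGU", repeat=len(divergent)):
        seq = list(a)
        for pos, base in zip(divergent, choice):
            seq[pos] = base
        candidates.append(reverse_complement_rna("".join(seq)))
    return candidates


if __name__ == "__main__":
    spacers = candidate_spacers("ACGUA", "ACCUU")
    print(len(spacers))  # two divergent sites give 4**2 candidates
```

The list always contains the two spacers that are perfectly complementary to either target, along with the intermediate sequences that match neither target exactly.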
[0083] Step 4: Estimate relative CRISPR activity across clinical strains (SARS-CoV-2). Sequences of 942 SARS-CoV-2 clinical strain variants were downloaded from the Severe acute respiratory syndrome coronavirus 2 data hub (NCBI Virus, accessed Apr. 23, 2020) (48) as all the “complete” nucleotide sequences available at the time. The sequences were then each individually aligned to the Wuhan-Hu-1 reference strain using a Needleman-Wunsch global alignment, and for each potential target site (27 nt region) across the genome, the number and prevalence of unique variants were counted. In evaluating pgRNA candidates, if the minimum relative activity across variants (MRAV) for the candidate pgRNAs across all the sequenced SARS-CoV-2 strains was <95% at either target site, the candidates were flagged. Sequences with ambiguous sites or indels (because their effects on Cas13d and Cas12a are less well defined) were removed from the calculation. To evaluate sequence conservation and “conservation of targets” across the SARS-CoV-2 genome in general (i.e.,
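The variant counting and MRAV check in Step 4 can be sketched as below. The sketch assumes the clinical sequences have already been aligned to the reference so that columns correspond (the Needleman-Wunsch step is not reimplemented), and it takes the activity model as a caller-supplied function; all names are hypothetical.

```python
from collections import Counter


def site_variants(aligned_genomes, start: int, length: int = 27) -> Counter:
    """Count unique variants of one target site across pre-aligned genomes.

    Windows containing gaps ('-') or ambiguous calls (anything outside
    A, C, G, T, U) are dropped, mirroring the exclusion described in the text.
    """
    windows = (g[start:start + length].upper() for g in aligned_genomes)
    return Counter(
        w for w in windows if len(w) == length and set(w) <= set("ACGTU")
    )


def mrav(variants, activity) -> float:
    """Minimum relative activity across variants for one target site."""
    return min(activity(v) for v in variants)


def is_flagged(mrav_site1: float, mrav_site2: float,
               threshold: float = 0.95) -> bool:
    """Flag a candidate if MRAV falls below the threshold at either site."""
    return mrav_site1 < threshold or mrav_site2 < threshold


if __name__ == "__main__":
    genomes = ["ACGTACGT", "ACGTACGA", "ACGNACGT", "ACG-ACGT", "ACGTACGT"]
    counts = site_variants(genomes, start=0, length=8)
    # Toy activity model (an assumption): fraction of bases matching the reference.
    toy_activity = lambda v: sum(x == y for x, y in zip(v, "ACGTACGT")) / 8
    print(counts, mrav(counts, toy_activity))
```

In practice the `activity` argument would be a CFD-style score of the candidate spacer against each variant rather than the toy identity fraction used in the usage example.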
[0084] Step 5: Estimate relative activity at potential human off-targets. Candidate pgRNA spacers were aligned to the human genome for Cas12a (Genome Reference Consortium Human Build 38, GRCh38 human reference genome) or the human transcriptome for Cas13d (GRCh38 human RefSeq transcripts) using a local nucleotide BLAST targeted for short sequences <30 nt (blastn-short). The region surrounding each hit to the human genome or transcriptome, to a total of 27 nt (the 27 nt protospacer for Cas13d and a 4 nt PAM+23 nt protospacer for Cas12a), was evaluated for a mismatch penalty score with its respective pgRNA candidate and, for Cas12a, the presence of a Tier 1 or Tier 2 PAM. While “off-target” interactions with the human transcriptome by Cas13d are not expected to have consequences as detrimental as off-target genomic mutations by Cas12a, these unwanted interactions may titrate or dilute the activity of Cas13d against the desired targets. For Cas13d, pgRNA spacer candidates with maximum predicted relative activity at any human transcript ≥10% were removed and, for Cas12a, those with maximum predicted relative activity at any site in the human genome ≥1% were removed.
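The removal rule at the end of Step 5 can be expressed as a small filter. This sketch assumes the predicted relative activities at the human hits have already been computed (for example from BLAST hits scored with the mismatch penalties); the function name and the dictionary of limits are hypothetical, while the 10% and 1% cutoffs are those stated in the text.

```python
# Maximum tolerated predicted relative activity at any human site, per effector.
OFF_TARGET_LIMITS = {"Cas13d": 0.10, "Cas12a": 0.01}


def passes_off_target_filter(predicted_activities, effector: str) -> bool:
    """Keep a candidate only if its worst human off-target is below the limit.

    `predicted_activities` holds the relative activity (0 to 1) predicted at
    each human hit; an empty list means no hits were found.
    """
    worst = max(predicted_activities, default=0.0)
    return worst < OFF_TARGET_LIMITS[effector]


if __name__ == "__main__":
    print(passes_off_target_filter([0.02, 0.09], "Cas13d"))
    print(passes_off_target_filter([0.02], "Cas12a"))
```

The stricter limit for Cas12a reflects the text's point that off-target genomic cleavage is more consequential than transient off-target transcript binding.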
[0085] Step 6: Selection of pgRNA based on additional functional criteria. At this stage, the RNA candidates have been screened for high relative activity at multiple viral targets and across clinical strains, low predicted activity at human “off-target” sites, and biophysical characteristics that suggest high overall CRISPR activity. The candidates can then be further refined by considering pgRNA targets located within specific genes or regions of interest (ROIs) that may be of clinical or functional significance, conservation of the targets/viral intolerance to mutations, and on-target activity prediction, which can be performed using several bioinformatic tools and methods available, prior to experimental validation.
Design of Polyvalent Guide RNA Computer Implemented Code
[0086] One example computer implemented protocol for the design of polyvalent guide RNAs is coded and made available at: https://github.com/ejosephslab/pgrna. This example code can be executed by a computing system such as a laptop, personal computer, or other device configured to read the code.
Prevalence of pgRNA Target Pairs in Viral Genomes and pgRNA Candidates for Human-Hosted Viruses
[0087] All complete sequences of all RNA viruses with human, mammal, arthropoda, aves, and higher plant hosts found in the NCBI Reference Sequence database were subjected to a brute force direct (nucleotide-by-nucleotide, no gaps) alignment of each of their 23 nt sequence targets to each other, considering only sequence polymorphisms at the same site. We considered only the (+) strand, as even for (−) and dsRNA viruses these sequences would match the vast majority of mRNA sequences. Only targets lacking polynucleotide repeats (4 consecutive rU's, rC's, rG's, or rA's) were considered viable targets. Targets derived from different segments or cDNAs of the same viral strain were considered together. In total, arthropoda (1074 viral species), aves (111), mammal (496), higher plant/embryophyta (691), and human (89)-hosted viruses were considered. For human-hosted (+) ssRNA viruses or sequenced viral transcripts (59 in the RefSeq database), candidate pgRNA sequences for RfxCas13d were generated for each target pair found with predicted (monovalent) activity at both sites in the top quartile (25), screened for biophysical compatibility (lacking polynucleotide repeats or significant predicted secondary structure in the spacer), and aligned to the human reference transcriptome (Genome Reference Consortium Human Build 38, GRCh38) using a local nucleotide BLAST (34) search optimized for short sequences <30 nt (blastn-short). Only those with no hits (less than 15 nt homology out of 23 nt targets) to the human transcriptome and with predicted activity at both sites within the top quartile of all Cas13 activity for targets of that virus were considered viable pgRNA candidates.
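The polynucleotide-repeat criterion above (no run of 4 identical consecutive nucleotides) can be checked with a short regular expression, as in the following sketch. The function name is hypothetical, and T is treated like U so that cDNA-style sequences are handled the same way.

```python
import re

# A run of four identical consecutive nucleotides.
_HOMOPOLYMER = re.compile(r"A{4}|C{4}|G{4}|U{4}|T{4}", re.IGNORECASE)


def lacks_polynucleotide_repeat(target: str) -> bool:
    """True if the target contains no run of 4 identical consecutive bases."""
    return _HOMOPOLYMER.search(target) is None


if __name__ == "__main__":
    print(lacks_polynucleotide_repeat("ACGUACGUACGUACGUACGUACG"))
    print(lacks_polynucleotide_repeat("ACGUUUUACG"))
```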
Estimation of SARS-CoV-2 Target Sequence Conservation
[0088] All complete SARS-CoV-2 genomic sequences available from the NCBI Virus database were downloaded on Nov. 23, 2020 (29,123 sequences). For each of the 205 target pairs possessing biophysically feasible pgRNA candidates, we aligned (no gaps) each target sequence to each genome to determine the closest matching sequence. Alignments containing ambiguous nucleotide calls were not included. Sequence variants were grouped together, with a minimum prevalence of 0.1%, with the fraction of hits by the most prevalent group being considered the sequence conservation reported.
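One possible reading of this conservation estimate is sketched below in Python. The helper names, the tie-breaking rule (first best window), and the way the 0.1% prevalence floor is applied are assumptions; the source does not spell out these details.

```python
from collections import Counter

def closest_match(target, genome):
    """Best ungapped match of `target` in `genome` (first one on ties).

    Returns None when the best window contains an ambiguous base call.
    """
    k = len(target)
    best, best_score = None, -1
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        score = sum(x == y for x, y in zip(target, window))
        if score > best_score:
            best, best_score = window, score
    if best is None or set(best) - set("ACGTU"):
        return None
    return best

def sequence_conservation(target, genomes, min_prevalence=0.001):
    """Fraction of usable genomes whose closest match is the most common variant."""
    hits = [m for m in (closest_match(target, g) for g in genomes) if m is not None]
    if not hits:
        return 0.0
    counts = Counter(hits)
    # Keep only variant groups at or above the minimum prevalence (assumed 0.1%).
    groups = {s: n for s, n in counts.items() if n / len(hits) >= min_prevalence}
    return max(groups.values()) / len(hits) if groups else 0.0
```

For example, three genomes carrying the target exactly and one carrying a single substitution give a conservation of 0.75 under this reading.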
Construction of RfxCas13d for In Planta Expression
[0089] The DNA sequence of the plant codon-optimized Cas13d-EGFP, with the Cas13d from Ruminococcus flavefaciens (RfxCas13d) flanked by two nuclear localization signals (NLS), was amplified from plasmid pXR001 (Addgene #109049) using Q5 High-Fidelity DNA Polymerase (NEB). Similarly, overlap extension PCR was performed to amplify plant expression vector pB_35S/mEGFP (Addgene #135320) with ends that matched the ends of the Cas13 product, so that RfxCas13d expression would be under the control of the 35S Cauliflower mosaic virus promoter. The PCR products were treated with DpnI (NEB), assembled together in a HiFi DNA assembly reaction (NEB), transformed into NEB10b cells (NEB), and grown overnight on antibiotic selection to create plasmid pB_35S/RfxCas13. Successful clones were identified and confirmed by sequencing, followed by transformation into electro-competent Agrobacterium tumefaciens strain GV3101 (pMP90).
Construction of crRNA Expression Vector
[0090] Single-stranded oligonucleotides corresponding to “monovalent”, non-targeting (NT), and “polyvalent” gRNAs were purchased from Integrated DNA Technologies (Coralville, Iowa), phosphorylated, annealed, and ligated into binary vector SPDK3876 (Addgene #149275) that had been digested with restriction enzymes XbaI and XhoI (NEB), to be expressed under the pea early browning virus promoter (pEBV). The binary vectors containing the correct constructs were identified, sequenced, and finally transformed into Agrobacterium tumefaciens strain GV3101. Multiplexed expression of two crRNAs was achieved by ligating (annealed, phosphorylated) oligos for two individual crRNAs (hairpin+spacer) together with an internal 4 nt “sticky end” and into SPDK3876 so that both crRNAs would be expressed on a single transcript.
Agroinfiltration of Nicotiana benthamiana (Tobacco) Leaves
[0091] In addition to pB_35S/RfxCas13 and the SPDK3876 vectors harboring gRNA sequences (TRV RNA2), PLY192 (TRV RNA1) (Addgene #148968) and the RNA viral replicon TRBO-GFP (Addgene #800083) were individually electroporated into A. tumefaciens strain GV3101. Single colonies were grown overnight at 28° C. in LB media (10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl; pH 7). The overnight cultures were then centrifuged, re-suspended in infiltration media (10 mM MOPS buffer pH 5.7, 10 mM MgCl2, and 200 μM acetosyringone), and incubated for 3-4 hours at 28° C. The above cultures were mixed to a final OD600 of 0.5 for CasRX-NLS-GFP-pB35, 0.1 for PLY192 (TRV RNA1), 0.1 for RNA2-crRNAs, and 0.005 for TRBO-GFP, and injected into healthy leaves of five- to six-week-old N. benthamiana plants grown under long-day conditions (16 h light, 8 h dark at 24° C.). A total of four leaves for each gRNA were infiltrated. Three days post-transfection, leaves were cut out, photographed under a handheld UV light in the dark, and stored at −80° C. before subsequent analysis.
[0092] Referring now to
[0093] Referring now to
[0094] Referring now to
[0095] Referring now to
Quantitative RT-PCR
[0096] Total RNA was extracted from infiltrated leaves using the RNeasy Plant Mini Kit (Qiagen), and the yield was quantified using a NanoDrop spectrophotometer. A total of 1 μg RNA from control (NT gRNAs) and experimental samples was used for DNase I treatment (Ambion, AM2222) followed by reverse transcription using a poly-dT primer and the SuperScript III First-Strand cDNA Synthesis System for RT-PCR (Invitrogen). Quantitative PCR was performed on a QuantStudio 3 Real-Time PCR System from Applied Biosystems using iTaq PowerUP™ SYBR Green pre-formulated 2× master mix (Applied Biosystems). Relative expression levels based on fold changes were calculated using the ddCT (ΔΔCt) method. Cycle 3 GFP mRNA expression levels from the TRBO-GFP replicon were normalized against transcripts of the tobacco PP2A. Measurements were performed on three biological replicates.
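The ddCT calculation referred to above can be written out as a short Python sketch. It assumes the standard form of the method with 100% amplification efficiency (a factor of 2 per cycle); the function and argument names are hypothetical, with GFP as the target and PP2A as the reference transcript.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target transcript in a sample versus a control.

    Each Ct is first normalized to the reference transcript (delta Ct),
    the control delta Ct is then subtracted from the sample delta Ct
    (delta-delta Ct), and the result is converted to a fold change
    assuming a doubling of product per cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    ddct = delta_sample - delta_control
    return 2.0 ** (-ddct)
```

With illustrative values, a sample with GFP Ct 25 and PP2A Ct 20, compared with an NT control with GFP Ct 22 and PP2A Ct 20, gives a ddCT of 3 and a relative expression of 0.125.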
Cas13 Collateral Activity Assays
[0097] Initial screens were performed using synthetic dsDNA (˜300 bp) containing a T7 promoter located upstream of a specific target sequence derived from either SARS-CoV-2 (
Specificity of Cas13 collateral activity was evaluated using dsDNA fragments that were not complementary to the gRNAs being tested, to confirm that activation of collateral activity was target-specific. Human universal RNA (10 tissues) (Invitrogen ThermoFisher, CA US) and total human lung RNA (Invitrogen ThermoFisher, CA US) were also used, at 1 and 3 μg per reaction, respectively.
SHERLOCK-Type Viral Detection Reactions
[0098] Heat-inactivated SARS-CoV-2 RNA from respiratory specimens, deposited by the Centers for Disease Control and Prevention, was obtained through BEI Resources, NIAID, NIH: Genomic RNA from SARS-Related Coronavirus 2, Isolate USA-WA1/2020, NR-52285 (American Type Culture Collection (ATCC) VR-1986HK). In a SHERLOCK-type reaction, 1 μl of heat-denatured SARS-CoV-2 RNA (350,000 copies total) was reverse transcribed using the High Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific) with 3.4 μl of primer (0.5 μM) in a final volume of 16 μl and PCR-amplified by the addition of 2 μl of reverse and forward target primers (2 μM) and 20 μl of 2× OneTaq Master Mix (NEB) in a final volume of 40 μl under standard thermocycler conditions (2 min at 95° C., followed by 35 cycles of 30 s at 95° C., 30 s at 49° C., and 30 s at 68° C., followed by a final extension of 5 min at 72° C.). PCR cDNA targets were then combined accordingly, and serial dilutions were made such that the final concentration of the starting SARS-CoV-2 RNA material in the SHERLOCK reaction was adjusted to either 400, 40, or 4 copies per μl for each target. SHERLOCK reactions were performed as described earlier using candidate pgRNAs and their monovalent counterparts in the presence of none (background), one, two, or four cDNA targets per reaction. SHERLOCK reactions in the absence of guide RNA were also evaluated and resulted in background signals equivalent to those produced from no-RNA-template controls.
In Vitro Transcription of Cas9 gRNAs
[0099] Single guide RNA (sgRNA) was synthesized by using the EnGen sgRNA synthesis Kit (NEB, New England Biolabs, Ipswich, Mass., United States) following standard protocols. DNA oligos (IDT) were designed to contain a T7 promoter sequence upstream of the target sequences with an initiating 5′- d(G), as well as overlapping tracrRNA DNA sequence at the 3′ end of the target. The sgRNA was purified using Monarch RNA Cleanup Kit (NEB) and quantitated using standard protocols.
Duplex gRNA Generation
[0100] Duplex CRISPR gRNAs (crRNA:tracrRNA) were generated by hybridizing synthetic RNA oligos listed in Table S9 to a universal synthetic tracrRNA oligo (IDT). To hybridize oligos, equimolar concentrations of oligos were combined in IDT duplex buffer to a final concentration of 10 μM. Reactions were heated to 95° C. for 2 min and allowed to cool to room temperature prior to the reaction assembly.
Cas9 Cleavage Reactions
[0101] Cas9 Nuclease from S. pyogenes (NEB) was diluted in 1× NEB Buffer 3.1 prior to the reaction assembly. Cas9 cleavage activity was assayed using either PCR-amplified targets, whole plasmid, or hybridized DNA oligos containing desired targets using standard methods. Briefly, Cas9 was preincubated with either an sgRNA or duplex gRNA (crRNA:tracrRNA) for 5 min at equimolar concentrations in 1× NEB Buffer 3.1 (NEB) in a total volume of 10 μl. Reactions were incubated for 5-10 min at room temperature. Target DNA was then added to the reactions, NEB Buffer 3.1 was added back to a final concentration of 1×, and nuclease-free water was added, bringing the final volume to 20 μl. The final reaction contained 100 nM Cas9-CRISPR complex and 10 nM of target DNA. Similar reactions without the addition of gRNAs to Cas9 were used as a control for uncut DNA. Reactions were incubated at 37° C. for 1 hour, followed by the addition of 1 unit of Proteinase K and further incubation at 56° C. for 15 min. Reactions were stopped by the addition of one volume of purple Gel Loading Dye (NEB).
Fragments were separated and analyzed using a 1.5% agarose gel in 1×TAE and 1×SYBR Green I Nucleic Acid Gel Stain (Thermo Fisher Scientific; Waltham, Mass.), and fluorescence was photographed and measured (Amersham™ Imager 600; GE Life Sciences, Piscataway, N.J., United States).
Results and Discussion
Similarities and Differences in the Design Criteria for gRNAs Used for Precision Gene Editing and Those Used for CRISPR Antivirals
[0102] Despite significant differences in the goals and desired outcomes between CRISPR precision gene editing and CRISPR antivirals as illustrated in
[0103] In the case of precision gene editing as shown in
[0104] In contrast, for CRISPR antivirals as shown in
[0105] Referring to
Design Principles for Polyvalent gRNAs (pgRNAs)
[0106] We hypothesized that, if we could match target sequences within a viral genome to other targets on the same viral genome with some shared sequence homology, a single gRNA spacer sequence could be adapted to maximize CRISPR activity at both targets; this is, in effect, the opposite of what is performed during gRNA design for precision gene editing. The development of “polyvalent” gRNAs (with one spacer able to target multiple protospacers) would have multiple advantages for CRISPR antiviral applications: operative “multiplexing” with fewer components, limiting the potential for viral escape, and increasing the effective number of potential “targets” a CRISPR effector could recognize in viral detection applications. This approach could exploit the many validated tools that are currently used to predict and minimize off-target activity to instead maximize the predicted activity at both of those sites. However, because of the differences in the objectives of current gRNA design tools, polyvalent gRNAs would normally be algorithmically rejected, so new approaches are necessary.
[0107] The design of polyvalent gRNAs or pgRNAs relies on exploiting known tolerances of CRISPR effectors for mismatches between gRNA and the target to maximize activity at multiple viral sites. These tolerances exhibit a strong dependence on both the type of mismatch (what nucleotides are incorrectly paired) and the position of the mismatch(es) along the target, and vary not only by type of CRISPR effector but across homologues of the effector derived from different species.
[0108] Careful and systematic studies have been performed to better predict and minimize the propensity for “off-target” effects in gene editing; for the design of pgRNAs, we can use these same studies to instead attempt to maximize activity of a single gRNA at multiple viral sites. A powerful and simple-to-implement metric for scoring the relative propensity of a CRISPR effector to act at a site that does not perfectly match its target uses a Cutting Frequency Density (CFD) matrix to estimate the penalty, or relative decrease in CRISPR activity, at off-target sites as a result of each difference in sequence between the target and that site. This approach is described in more detail in the Materials and Methods section. The CFD matrix consists of the mismatch- and position-specific penalties that have been derived from massively parallel characterizations of off-target CRISPR activity, and for each expected mispairing between the gRNA and the off-target site, these penalties are multiplied together to obtain a final score, or relative expected CRISPR activity, at that site. CFD scores in precision gene editing are used to reject gRNAs that may exhibit high activities at multiple sites in a targeted genome.
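The multiplication of mismatch- and position-specific penalties described above can be sketched in a few lines of Python. The penalty values in the usage note are invented placeholders, not entries from any published CFD matrix, and the function name and the dictionary layout keyed by (position, guide base, site base) are assumptions. Both sequences are written in the same sense so that equal letters mean a correctly paired position.

```python
def cfd_score(guide, site, penalties):
    """Relative expected activity of `guide` at `site`.

    `penalties` maps (position, guide_base, site_base) to a factor
    between 0 and 1. Matching positions contribute a factor of 1, and
    every mismatch multiplies the score by its penalty. A mismatch that
    is missing from `penalties` raises KeyError rather than being
    silently ignored.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    score = 1.0
    for pos, (g, s) in enumerate(zip(guide, site)):
        if g != s:
            score *= penalties[(pos, g, s)]
    return score
```

With the made-up penalties `{(1, "U", "C"): 0.5, (5, "U", "G"): 0.8}`, the call `cfd_score("AUAUGU", "ACAUGG", penalties)` multiplies the two factors to give 0.4, while a perfectly matched site scores 1.0.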
[0109] The design of pgRNAs can use CFD scores as an example metric for increasing predicted activity at multiple viral sites based at least in part on the following approach as shown in
[0110] For instance,
[0111] More particularly, candidate pgRNAs were also evaluated in silico for biophysical characteristics, like GC %, secondary structure free energy, and the ability of the ‘direct repeat’ segment of the gRNA to form (which is essential for CRISPR activity) as preliminary indicators for a high likelihood of strong on-target activities. We note that the CFD calculated in the way described above provides an estimate of CRISPR activity at the viral sites relative to a hypothetical target with a sequence perfectly complementary to the pgRNA spacer: this allows us later to integrate our pgRNA design algorithm into other computational tools that predict CRISPR activity at on-target/perfectly matched sequences.
pgRNAs for RfxCas13d Against SARS-CoV-2 Genomic RNA
[0112] We first sought to determine if we could generate novel pgRNAs for RfxCas13d that could be expected to exhibit high activity at multiple viral targets in SARS-CoV-2, the etiological agent of the infectious respiratory illness human COVID-19, while maintaining minimal activity with potential human off-targets (
[0113] For instance,
[0114] We first identified 81 pairs of target sites along the SARS-CoV-2 reference genome that had >75% (21/27) nt sequence identity (
TABLE-US-00001 TABLE 1 Statistical analysis of in silico generation and characterization of pgRNA candidates.
 | X | SARS-CoV-2 ssRNA genome | HIV-1 proviral dsDNA genome
CRISPR effector: | - | RfxCas13d | enAsCas12a
Viral genome size: | - | 29903 | 9719
Total # potential target sites | - | 29876 | 2834.sup.1
# target pairs with >X homology: | 75% | 81 | 56.sup.2
# unique target pairs with pgRNA candidates >X activity: | 95% | 17 | -
 | 20% | - | 6
# pgRNA spacer candidates with >X activity at both target sites: | 95% | 249 | -
 | 20% | - | 156
# unique target pairs with active pgRNA candidates and <X activity vs. human genome/transcriptome: | 10% (transcriptome) | 10 | -
 | 1% (genome) | - | 5
Total # unique target pairs with pgRNA candidates passing in silico screen: | - | 5 | 5
Total # pgRNA candidates passing in silico screen: | - | 25.sup.3 | 47
.sup.1Number of targets on both strands to the immediate 5′- of a Tier 1 or Tier 2 enAsCas12a PAM.
.sup.2177 pairs, including exact matches located within the long terminal repeat (LTR) regions of the HIV-1 provirus.
.sup.3125 candidates identified with <10% activity vs. human transcriptome and >95% activity targeting the reference strain sequence; 25 candidates identified with <10% activity vs. human transcriptome and >95% activity across clinical strains.
[0115] The viral target sites for CRISPR effectors are often chosen based not only on the gene product encoded but also on conservation of nucleotide sequence across clinical strains or related viral families. However, based on the differential ability of CRISPR effectors to recognize and degrade targeted sequences in spite of mismatches between the gRNA and the protospacer, we endeavoured to quantify the “conservation of targets” (rather than sequence, per se) as potential target sites where CRISPR effectors may be highly active across strains regardless of the presence of certain sequence variations. To evaluate the “target conservation” at each of these candidate pgRNA spacers, first we aligned the 942 sequenced viral genomes from clinical samples to the reference Wuhan-1 sequence and characterized their variability. Approximately 50% (50.07%) of the target sites possessed sequence identity, or perfect sequence conservation (SC), across all 942 samples over the entire 27 nt range (
[0116] Genetic targets for detection and inactivation of SARS-CoV-2 virus have largely been focused on the highly conserved genes for nucleocapsid protein N and the gene for the RNA-dependent RNA polymerase (RdRP), which is essential for viral replication. Interestingly, the top candidate pgRNA spacers each have two target sites localized across ORF1ab, which encodes a large polyprotein later processed into smaller nonstructural proteins (nsp), several of which are important for viral replication. Two of the pairs have one target within the segment of ORF1ab that encodes the RdRP. The results presented here demonstrate that pgRNAs can be designed for RfxCas13d that are simultaneously expected to exhibit high relative activity at multiple (essential) target sites on the SARS-CoV-2 genome for which “target conservation” is high, while minimizing expected interactions with the human transcriptome.
TABLE-US-00002 TABLE 2 pgRNA spacer candidates for RfxCas13d against the SARS-CoV-2 ssRNA genome.sup.1
pgRNA spacer sequence.sup.2; Target A antisense; Target B antisense | Target A.sup.3 (ORF/product); Relative Activity (Δ).sup.4 | Target B/C (ORF/product); Activity (Δ) | Maximum relative predicted pgRNA activity at BLASTn hits to human transcriptome.sup.5
5′-UAACCAUUGUUCGCUGUAACAGUAUCA-3′ (SEQ ID NO: 4); 3′-AUUGGUAAUAUGCGACAUUGUCGUAGU-5′ (SEQ ID NO: 5); 3′-CUUGGUAAGAAGUGACAUUGUGAUAGU-5′ (SEQ ID NO: 6) | np4718 (ORF1ab/nsp3); 1.002 (+0.396) | np7751 (ORF1ab/nsp3); 0.996 (+0.155) | (no BLASTn hits)
5′-AGAUAAACGUUCUAUGCUUUAACAGCA-3′ (SEQ ID NO: 7); 3′-UCUAUUGGUAAUAUGCGACAUUGUCGU-5′ (SEQ ID NO: 8); 3′-UCUAUUAGAAACAUUCGAAAUCGUCGU-5′ (SEQ ID NO: 9) | np4721 (ORF1ab/nsp3); 1.097 (+0.779) | np13103 (ORF1ab/nsp10); 1.021 (+0.669) | (no BLASTn hits with >15/27 nt aligned)
5′-ACAUUGUUGGCAAGUUCAGCUACUGUA-3′ (SEQ ID NO: 10); 3′-UGUAAGAAACGUUCAAGUCGAAGACGU-5′ (SEQ ID NO: 11); 3′-UGUAACAAUCAUUCACGUCGAUGACUU-5′ (SEQ ID NO: 12) | np8123 (ORF1ab/nsp3); 0.988 (+0.458) | np14641 (ORF1ab/RNA-dependent RNA polymerase); 0.955 (+0.469) | (no BLASTn hits with >15/27 nt aligned)
5′-AUAUAGUAGUAGAUUAACCAGAGCAUC-3′ (SEQ ID NO: 13); 3′-UAUACCAUGACCGAAUGGUCUUCGUAG-5′ (SEQ ID NO: 14); 3′-UAGAUCAUUAUCUAAUGGUCUUCGUCG-5′ (SEQ ID NO: 15) | np9048 (ORF1ab/nsp4); 1.061 (+0.460) | np14597 (ORF1ab/RNA-dependent RNA polymerase); 1.350 (+0.549) | (no BLASTn hits with >15/27 nt aligned)
5′-UAAAUUGCAACCUGUCAUAAACGUGUC-3′ (SEQ ID NO: 16); 3′-AUUUAACGUUGAACAGUAUUUCCAGAG-5′ (SEQ ID NO: 17); 3′-AUUUAACGUUGCACAAUAUGUGCAUCG-5′ (SEQ ID NO: 18) | np17985 (ORF1ab/helicase); 0.988 (+0.568) | np19463 (ORF1ab/3′-to-5′ exonuclease); 1.019 (+0.137) | (no BLASTn hits)
.sup.1pgRNAs have >95% predicted relative activity at both targets; <10% predicted relative activity at hits to human transcriptome; and >95% predicted relative activity across all (948) clinical strains.
.sup.2Underlined at sites where Target A and Target B/C sequences diverge.
.sup.3Labelled according to np (nucleotide position) of central nucleotide of 27 nt protospacers (according to SARS-CoV-2 Wuhan-1 strain). nsp: nonstructural protein.
.sup.4Δ: Increase in predicted CRISPR activity by using pgRNA at target A or B, compared to using gRNA for target A at target B (or vice versa).
.sup.5Nucleotide BLAST targeted for short (<30 nt) sequences vs. GRCh38.p12 RefSeq transcripts.
pgRNAs for enAsCas12a Against HIV-1 Provirus
[0117] To determine whether we could generate pgRNAs against a dsDNA virus, we targeted the HIV-1 provirus using a Cas12a effector (
[0118] However, there are additional challenges for targeting the HIV-1 proviral genome using Cas12a. The HIV-1 proviral genome is smaller (9719 bp) than the SARS-CoV-2 genome and, while both strands of the dsDNA could be targeted, unlike Cas13, Cas12a can only target sequences positioned immediately downstream of a protospacer adjacent motif (PAM) that is recognized by the enzyme itself rather than the gRNA. Even with engineered enAsCas12a, which is able to recognize a larger number of PAMs than the native enzyme, strong PAM sequences (Tier 1 or Tier 2) able to activate robust endonucleolytic activity appear on average only once every 16 bp. Additionally, off-target DSBs on the human genome hold the potential for significant deleterious consequences, so we require pgRNAs with even less potential for accidental targeting of human off-targets than for Cas13.
[0119] In particular,
[0120] With these considerations taken into account, we identified 177 target sites next to Tier 1 or Tier 2 PAMs of enAsCas12a in the HIV-1 proviral genome that shared >75% homology across 23 bp targets (
[0121] For instance,
[0122] To further validate proposed example implementations, a Cas9 pgRNA was designed for two virally-derived targets. As shown in
[0123] The crRNA spacer sequences and target sequences for the above data are provided below:
TABLE-US-00003
crRNA A (SEQ ID NO: 19)            ACAUGGUUGGUGUCACACGU
Target A sequence (SEQ ID NO: 20)  ACATGGTTGGTGTCACACGT AGG
                                   .G...C.............A
pgRNA A/B (SEQ ID NO: 21)          AUAUGUUUGGUGUCACACGG
                                   .........T.T...T....
Target B sequence (SEQ ID NO: 22)  ATATGTTTGATATCAAACGG GGG
crRNA (SEQ ID NO: 23)              AUAUGUUUGAUAUCAAACGG
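The positions at which the pgRNA A/B spacer differs from each target in the listing above can be checked with a small Python helper. The function name is hypothetical; it compares the spacer, written in the same sense as the protospacer, against the 20 nt target after converting U to T, and reports 1-based positions counted from the 5′ end.

```python
def mismatch_positions(guide_rna, target_dna):
    """1-based positions where an RNA spacer differs from a DNA protospacer."""
    guide_as_dna = guide_rna.upper().replace("U", "T")
    target = target_dna.upper()
    if len(guide_as_dna) != len(target):
        raise ValueError("spacer and protospacer must have the same length")
    return [i + 1 for i, (g, t) in enumerate(zip(guide_as_dna, target)) if g != t]

PGRNA_AB = "AUAUGUUUGGUGUCACACGG"   # SEQ ID NO: 21
TARGET_A = "ACATGGTTGGTGTCACACGT"   # SEQ ID NO: 20, PAM (AGG) excluded
TARGET_B = "ATATGTTTGATATCAAACGG"   # SEQ ID NO: 22, PAM (GGG) excluded

print(mismatch_positions(PGRNA_AB, TARGET_A))  # [2, 6, 20]
print(mismatch_positions(PGRNA_AB, TARGET_B))  # [10, 12, 16]
```

The reported positions coincide with the non-dot characters in the two alignment rows of the listing: three mismatches against each target, none shared between the two.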
[0124] These results demonstrate that, even subject to the additional constraints, multiple pgRNAs for enAsCas12a could be generated that are able to target multiple viral sites simultaneously while maintaining high specificity. These candidates can then be introduced into the computational predictors for on-target enAsCas12a activity and validated experimentally, where they are expected to strongly suppress reactivation of HIV-1.
TABLE-US-00004 TABLE 3 pgRNA spacer candidates for enAsCas12a vs. HIV-1 proviral genome.sup.1
pgRNA spacer sequence.sup.2; Target A antisense; Target B antisense; Target C antisense | Target A.sup.3 (gene); Relative Activity (Δ).sup.4 | Target B/C (gene/feature); Relative Activity (Δ) | Maximum relative predicted pgRNA activity at BLASTn hits to human genome.sup.5
5′-AGCCUUAUUGAGACUCAACCAGU-3′ (SEQ ID NO: 24); 3′-TCGAAATAACTCCGAATTCGTCA-5′ (SEQ ID NO: 25); 3′-TCGGGTAAACTCTGACATGGTCA-5′ (SEQ ID NO: 26); 3′-TCGGGTAAACTCTGACATGGTCA-5′ (SEQ ID NO: 26) | 2580 (pol); 0.220 (+0.167) | 513/9598 (5′-LTR/3′-LTR); 0.306 (+0.297) | (no BLASTn hits with Tier 1 or Tier 2 PAMs)
5′-UGAAGAAUCGCAAAACCAGCCAG-3′ (SEQ ID NO: 27); 3′-ACTTCTTAGCGTTTTGGTCGTTC-5′ (SEQ ID NO: 28); 3′-AAATCTTAGCGTTTTGGTCGGCC-5′ (SEQ ID NO: 29) | 8186 (env); 0.439 (+0.393) | 6882 (env); 0.238 (+0.118) | (no BLASTn hits with Tier 1 or Tier 2 PAMs)
5′-AAAAGCAUCCCCUAGCCUUCCCU-3′ (SEQ ID NO: 30); 3′-TTCTTTTAAGGGACCGGAAGGGA-5′ (SEQ ID NO: 31); 3′-TTTTGGTAGGGGATCGAAAGGGA-5′ (SEQ ID NO: 32) | 2114 (gag); 0.214 (+0.186) | 5136 (vif); 0.488 (+0.470) | (no BLASTn hits with Tier 1 or Tier 2 PAMs)
5′-GUCAUAUUUCCCAUAUUUCCUAU-3′ (SEQ ID NO: 33); 3′-TGGTACAAAGGGTACAAAGGAAA-5′ (SEQ ID NO: 34); 3′-GAGTATAAAGGATAAAAAGGATA-5′ (SEQ ID NO: 35) | 3731 (pol); 0.334 (+0.295) | 7182 (env); 0.390 (+0.250) | 0.008526709
5′-ACUGACGUAAUACAACUAACAGA-3′ (SEQ ID NO: 36); 3′-TTACTACATTTTGTTAATTGTCT-5′ (SEQ ID NO: 37); 3′-TGTCTTCATTATGGTGATTGTCT-5′ (SEQ ID NO: 38) | 3660 (pol); 0.216 (+0.200) | 3441 (pol); 0.205 (+0.152) | 9.76531 × 10.sup.−6
.sup.1pgRNAs have predicted relative activity at both sites >20%; both targets have Tier 1 or Tier 2 PAM sites; and predicted relative activity at BLASTn hits to human genome <1%.
.sup.2Underlined at sites where Target A and Target B sequences diverge.
.sup.3Labelled at the first position of the protospacer 3′ of the PAM site, according to Human immunodeficiency virus type 1 (HXB2), complete genome; HIV1/HTLV-III/LAV reference genome.
.sup.4Increase in predicted CRISPR activity by using pgRNA at target A or B/C, compared to using gRNA for target A at target B (or vice versa).
.sup.5GRCh38.p12 human genome reference sequence.
[0125] An analysis of 2,372 genomes of RNA viruses in the NCBI Reference Sequence database revealed that these homologous pairs of Cas13-targetable sites (23 nt) with >70% identity (>16 out of 23 nt) are prevalent across RNA viruses of mammals, birds, arthropods, and plants: RNA viruses with genomes that are 5,000 nt in length have on average around 30 such pairs, and those with genomes that are 10,000 nt in length have on average approximately 120, obeying a power-law scaling with genome length. For human-hosted RNA viruses, we could identify 19,926 of these homologous target pairs across 89 viruses.
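As a rough consistency check on the power-law statement, the two averages quoted above imply an exponent of about 2. This is a two-point estimate from rounded figures, not a fit to the underlying data; the symbols N (number of pairs), L (genome length), c, and α are introduced here only for illustration.

```latex
N(L) \approx c\,L^{\alpha}, \qquad
\alpha \approx \frac{\ln\left(120/30\right)}{\ln\left(10000/5000\right)}
       = \frac{\ln 4}{\ln 2} = 2
```

A quadratic dependence is what would be expected if the number of candidate pairs grows with the number of possible pairings of target sites along the genome.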
[0126] Candidate pgRNA sequences for each pair are then generated in silico by determining what nucleotides at the positions of divergent sequence between the two targets would allow for and maximize predicted activity at both sites (
[0127] Sequences with predicted biophysical properties that might negatively impact expression or activity such as strong predicted secondary structures or the presence of mononucleotide stretches are then removed from consideration, as are any sequences with more than 65% complementarity with potential “off-targets” in the host genome or transcriptome (with at least 15 nts complementarity for 23 nt Cas13 targets), yielding a final set of pgRNA candidates with high predicted activity at multiple viral sites and effectively no predicted “off-target” activity vs. the host. To illustrate the broad potential applicability of our approach, we found we could design pgRNA candidates for RNA-targeting Cas13d from Ruminococcus flavefaciens XPD3002 (RfxCas13d) with predicted activity at both their targeted sites ranking in the top quartile of all “monovalent” gRNAs for that virus and no significant homology/predicted activity vs. the human transcriptome for 53 of the 59 (+) ssRNA viruses or expressed viral mRNA sequences in the NCBI Reference Sequence database. RfxCas13d, which has been used in CRISPR-based viral diagnostics and was recently demonstrated to disrupt influenza and SARS-CoV-2 virulence in human epithelial cells, was found to exhibit significant tolerance to mismatches relative to other CRISPR effectors and does not require specific flanking sequences next to its targets, so RfxCas13d may represent an optimal effector for antiviral applications in that regard.
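The candidate-generation step described in the two preceding paragraphs (trying each nucleotide at the positions where the two targets diverge and keeping the spacers with high predicted activity at both sites) can be sketched as follows. This is a simplified illustration, not the published algorithm: `toy_activity` uses a flat, invented per-mismatch penalty in place of a real position- and mismatch-specific matrix, sequences are written in target sense rather than as reverse complements, and all names are hypothetical.

```python
from itertools import product

def enumerate_candidates(target_a, target_b, alphabet="ACGU"):
    """Yield sequences that keep shared positions and vary divergent ones."""
    divergent = [i for i, (a, b) in enumerate(zip(target_a, target_b)) if a != b]
    for combo in product(alphabet, repeat=len(divergent)):
        seq = list(target_a)
        for i, base in zip(divergent, combo):
            seq[i] = base
        yield "".join(seq)

def toy_activity(candidate, target, penalty=0.7):
    """Placeholder score: a flat penalty per mismatch (assumed value)."""
    mismatches = sum(c != t for c, t in zip(candidate, target))
    return penalty ** mismatches

def rank_candidates(target_a, target_b, activity=toy_activity):
    """Sort candidates by their worse predicted activity across the two targets."""
    scored = [
        (min(activity(c, target_a), activity(c, target_b)), c)
        for c in enumerate_candidates(target_a, target_b)
    ]
    return sorted(scored, reverse=True)
```

Ranking by the minimum of the two scores favors spacers that split their mismatches between the targets instead of matching one target perfectly. The number of candidates grows as 4 to the power of the number of divergent positions, which stays small for 23 nt targets sharing most of their sequence.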
[0128] To test our hypothesis that pgRNAs targeting multiple viral sites simultaneously would inhibit viral propagation in vivo during a viral infection better than their monovalent counterparts, we designed pgRNAs for RfxCas13d to target pairs of protospacers found in the tobacco mosaic virus (TMV) and infected Nicotiana benthamiana with a TMV replicon (TRBO-GFP) via Agrobacterium tumefaciens-mediated transformation into its leaves (
[0129] After three days, plants expressing one of six different monovalent gRNAs showed viral RNA levels in their leaves reduced to approximately 10% to 25% of those in plants that were not targeting TMV via Cas13 (
[0130] After target recognition and cleavage, many Cas13 variants undergo a conformational change and exhibit “collateral activity” or a non-specific RNAse activity that has been used for applications in viral diagnostics such as SHERLOCK (
[0131] To assess whether pgRNAs might be suitable for in vitro viral diagnostics, we generated a series of 23 pgRNAs with high predicted activity at 15 target pairs found in SARS-CoV-2, then screened their collateral activity in the presence of their SARS-CoV-2 RNA targets and compared those results with the combined activity of their perfectly matched monovalent gRNA counterparts (30 separate gRNAs). We found that each of the pgRNAs tested exhibited collateral activity at levels similar to or higher than their combined monovalent gRNA counterparts with both targets present in the same sample, and no off-site collateral activity was detected in the presence of non-targeted RNA sequences, universal human reference RNA (10 human cell lines; ThermoFisher Scientific), or human lung total RNA (ThermoFisher Scientific) (3 μg RNA). We then assessed their limits of detection (LoD) in a SHERLOCK-type assay using Cas13 and the best-performing pgRNAs, and found that Cas13 with single pgRNAs (recognizing two sites) or two pgRNAs (recognizing four) could robustly generate detectable signals in samples initially containing 40 cp/μL heat-inactivated SARS-CoV-2 (clinically relevant LoD for SARS-CoV-2 is often considered to be 1000 cp/μL) (
[0132] Last, we sought to determine whether the design principles we use for pgRNAs could be applied to gRNAs of other types of CRISPR effectors like the Cas9 effector from Streptococcus pyogenes (SpyCas9), which recognizes and introduces double-strand breaks into dsDNA targets (
[0133] Referring now to
[0134] Referring now to
[0135] Referring now to
[0136] Referring now to
[0137] Referring now to
[0138] Referring now to
[0139] Referring now to
[0140] Referring now to
[0141] Referring now to
[0142] Referring now to
[0143] The CRISPR effector proteins used in biotechnological applications were originally found in bacteria and archaea as an antiviral mechanism to degrade foreign DNA and RNA, and so some tolerance to sequence variation in their targets is likely beneficial for this purpose. In gene editing applications, this tolerance is suppressed to the greatest extent possible using a number of strategies to prevent degradation and mutations at any sequence not exactly matching the gRNA spacer sequence. Rather, in a new gRNA design paradigm for antiviral applications, we show that the polyvalent targeting of viruses by single engineered gRNAs—optimized based on the CRISPR effector's natural position- and sequence-determined tolerance for mismatches for activity at the homologous target pairs that are abundant in viral genomes—can drive robust CRISPR activity at specific targeted pairs simultaneously in vitro/ex vivo, can exhibit stronger viral suppression during infection of a higher organism relative to “monovalent” targeting, and may in fact be optimal for applications of CRISPR antiviral diagnostics, prophylactics, and therapeutics.