Mapping a Functional Cancer Genome Atlas of Tumor Suppressors Using AAV-CRISPR Mediated Direct In Vivo Screening
20200017917 · 2020-01-16
Assignee
Inventors
Cpc classification
C12N2310/20
CHEMISTRY; METALLURGY
C12N2320/12
CHEMISTRY; METALLURGY
C12N9/22
CHEMISTRY; METALLURGY
C12N15/90
CHEMISTRY; METALLURGY
C12N15/63
CHEMISTRY; METALLURGY
C12N9/96
CHEMISTRY; METALLURGY
C12N15/111
CHEMISTRY; METALLURGY
C12Q1/6809
CHEMISTRY; METALLURGY
C12N15/86
CHEMISTRY; METALLURGY
A01K2267/0393
HUMAN NECESSITIES
A01K2217/072
HUMAN NECESSITIES
International classification
C12N15/90
CHEMISTRY; METALLURGY
C12N9/96
CHEMISTRY; METALLURGY
C12N9/22
CHEMISTRY; METALLURGY
C12Q1/6809
CHEMISTRY; METALLURGY
C12N15/11
CHEMISTRY; METALLURGY
Abstract
The present invention includes compositions and methods for identifying cancer driver mutations through use of an AAV-CRISPR library and molecular inversion sequencing probes (MIPs).
Claims
1. A method of determining at least one cancer driver mutation in vivo in a cancer-affected subject, the method comprising: administering to the subject a plurality of AAV-CRISPR vectors, wherein the AAV-CRISPR vectors comprise Cas9 and a plurality of short guide RNAs (sgRNAs) homologous to a plurality of tumor suppressor genes (TSGs); and sequencing a plurality of nucleic acids isolated from the subject's cancer; whereby analysis of the sequencing data indicates whether any cancer driver mutation is present in the subject's cancer.
2. The method of claim 1, wherein the sgRNA sequences comprise at least one selected from the group consisting of SEQ ID NOs. 1-280.
3. The method of claim 1, wherein the sgRNA sequences comprise SEQ ID NOs. 1-280.
4. The method of claim 1, wherein the sequencing comprises targeted capture sequencing.
5. The method of claim 4, wherein the targeted capture sequencing is performed using a plurality of Molecular Inversion Probes (MIPs).
6. The method of claim 5, wherein the plurality of MIPs comprises at least one selected from the group consisting of SEQ ID NOs. 289-554.
7. The method of claim 5, wherein the plurality of MIPs comprises SEQ ID NOs. 289-554.
8. The method of claim 1, wherein the mutation is a nucleotide insertion.
9. The method of claim 8, wherein the insertion comprises more than one nucleotide base.
10. The method of claim 1, wherein the mutation is a nucleotide deletion.
11. The method of claim 10, wherein the deletion comprises more than one nucleotide base.
12. The method of claim 1, wherein the subject is a mammal.
13. The method of claim 1, wherein the animal is a mouse or a human.
14. A method of identifying a plurality of cancer driver mutations in a sample, the method comprising: hybridizing a plurality of Molecular Inversion Probes (MIPs) to a plurality of nucleic acids from the sample, and performing targeted capture sequencing on the plurality of nucleic acids, wherein analyzing the data from the targeted capture sequencing indicates the presence and/or nature of any plurality of cancer driver mutations in the sample.
15. The method of claim 14, wherein the MIPs comprise at least one selected from the group consisting of SEQ ID NOs. 289-554.
16. The method of claim 14, wherein the MIPs comprise SEQ ID NOs. 289-554.
17. A composition comprising a set of Molecular Inversion Probes (MIPs) comprising at least one selected from the group consisting of SEQ ID NOs. 289-554.
18. The composition of claim 17, which comprises SEQ ID NOs. 289-554.
19. A kit comprising the composition of claim 18, and instructional material for use thereof.
20. A kit for determining at least one cancer driver mutation in a sample, the kit comprising the composition of claim 18, reagents for measuring the at least one cancer driver mutation, and instructional material for use thereof.
21. A method of determining at least one cancer driver mutation in a sample, the method comprising: contacting a plurality of Adeno-Associated Virus-Clustered Regularly Interspaced Short Palindromic Repeats (AAV-CRISPR) vectors with the sample, wherein the vectors comprise Cas9 and a plurality of nucleotide sequences homologous to a plurality of tumor suppressor genes (TSGs), thus generating a reaction mixture; sequencing a plurality of nucleic acids isolated from the reaction mixture; and analyzing the sequencing data as to identify any cancer driver mutation therein.
22. A method of determining treatment for a subject suffering from cancer, the method comprising: contacting a plurality of AAV-CRISPR vectors with a sample from the subject, wherein the vectors comprise Cas9 and a plurality of nucleotide sequences homologous to a plurality of tumor suppressor genes (TSGs), thus generating a reaction mixture; sequencing a plurality of nucleic acids isolated from the reaction mixture; and analyzing the data from the sequencing as to identify any mutation in the plurality of nucleic acids, whereby treatment for the subject suffering from cancer is determined based on the presence and/or nature of any mutation in the plurality of nucleic acids.
23. The method of claim 22, wherein the plurality of nucleotide sequences homologous to a plurality of TSGs comprises at least one selected from the group consisting of SEQ ID NOs. 1-280.
24. The method of claim 22, wherein the plurality of nucleotide sequences homologous to a plurality of TSGs comprises SEQ ID NOs. 1-280.
25. The method of claim 22, wherein the sequencing comprises targeted capture sequencing.
26. The method of claim 22, wherein the mutation is a nucleotide insertion.
27. The method of claim 26, wherein the insertion comprises more than one nucleotide base.
28. The method of claim 22, wherein the mutation is a nucleotide deletion.
29. The method of claim 28, wherein the deletion comprises more than one nucleotide base.
30. The method of claim 22, wherein the sample is a plurality of cancer cells from the subject.
31. The method of claim 22, wherein the sample is a tumor from the subject.
32. An AAV-CRISPR mTSG library comprising a plurality of AAV vectors comprising Cas9 and a plurality of nucleic acids homologous to a plurality of Tumor Suppressor Genes (TSGs).
33. The library of claim 32, wherein the plurality of nucleic acids comprises at least one selected from SEQ ID NOs. 1-280.
34. The library of claim 32, wherein the plurality of nucleic acids comprises SEQ ID NOs. 1-280.
35. A vector comprising an adeno-associated virus (AAV) genome, a U6 promoter gene, an sgRNA sequence, an EFS promoter gene, and a Cre recombinase gene.
36. A vector comprising an adeno-associated virus (AAV) genome, a U6 promoter gene, an sgRNA sequence, a TBG promoter gene, and a Cre recombinase gene.
37. The vector of claim 36, wherein the TBG promoter gene comprises the nucleic acid sequence of SEQ ID NO: 557.
38. A vector comprising the nucleic acid sequence of SEQ ID NO: 555.
39. A vector comprising the nucleic acid sequence of SEQ ID NO: 556.
40. A kit comprising a vector comprising the nucleic acid sequence of SEQ ID NO: 555, and instructional material for use thereof.
41. A kit comprising a vector comprising the nucleic acid sequence of SEQ ID NO: 556, and instructional material for use thereof.
42. A kit comprising an adeno-associated virus (AAV) genome, a U6 promoter gene, an sgRNA sequence, an EFS promoter gene, a Cre recombinase gene, and instructional material for use thereof.
43. A kit comprising an adeno-associated virus (AAV) genome, a U6 promoter gene, an sgRNA sequence, a TBG promoter gene, a Cre recombinase gene, and instructional material for use thereof.
44. The kit of claim 43, wherein the TBG promoter gene comprises the nucleic acid sequence of SEQ ID NO: 557.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] The following detailed description of specific embodiments of the invention will be better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there are shown in the drawings exemplary embodiments. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities of the embodiments shown in the drawings.
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
[0045]
[0046]
[0047]
[0048]
[0049]
[0050]
DETAILED DESCRIPTION OF THE INVENTION
Definitions
[0051] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred materials and methods are described herein. In describing and claiming the present invention, the following terminology will be used.
[0052] It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
[0053] The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element.
[0054] About as used herein when referring to a measurable value such as an amount, a temporal duration, and the like, is meant to encompass variations of ±20% or ±10%, more preferably ±5%, even more preferably ±1%, and still more preferably ±0.1% from the specified value, as such variations are appropriate to perform the disclosed methods.
[0055] As used herein the term amount refers to the abundance or quantity of a constituent in a mixture.
[0056] As used herein, the term bp refers to base pair.
[0057] The term complementary refers to the degree of anti-parallel alignment between two nucleic acid strands. Complete complementarity requires that each nucleotide be across from its opposite. No complementarity requires that each nucleotide is not across from its opposite. The degree of complementarity determines how stably the sequences remain together, i.e., anneal or hybridize. Furthermore, various DNA repair functions as well as regulatory functions are based on base pair complementarity.
[0058] The term CRISPR/Cas or clustered regularly interspaced short palindromic repeats or CRISPR refers to DNA loci containing short repetitions of base sequences followed by short segments of spacer DNA from previous exposures to a virus or plasmid. Bacteria and archaea have evolved adaptive immune defenses termed CRISPR/CRISPR-associated (Cas) systems that use short RNA to direct degradation of foreign nucleic acids. In bacteria, the CRISPR system provides acquired immunity against invading foreign DNA via RNA-guided DNA cleavage.
[0059] The CRISPR/Cas9 system or CRISPR/Cas9-mediated gene editing refers to a type II CRISPR/Cas system that has been modified for genome editing/engineering. It is typically comprised of a guide RNA (gRNA) and a non-specific CRISPR-associated endonuclease (Cas9). Guide RNA (gRNA) is used interchangeably herein with short guide RNA (sgRNA) or single guide RNA (sgRNA). The sgRNA is a short synthetic RNA composed of a scaffold sequence necessary for Cas9-binding and a user-defined 20 nucleotide spacer or targeting sequence which defines the genomic target to be modified. The genomic target of Cas9 can be changed by changing the targeting sequence present in the sgRNA.
[0060] Encoding refers to the inherent property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other polymers and macromolecules in biological processes having either a defined sequence of nucleotides (i.e., rRNA, tRNA and mRNA) or a defined sequence of amino acids and the biological properties resulting therefrom. Thus, a gene encodes a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system. Both the coding strand, the nucleotide sequence of which is identical to the mRNA sequence and is usually provided in sequence listings, and the non-coding strand, used as the template for transcription of a gene or cDNA, can be referred to as encoding the protein or other product of that gene or cDNA.
[0061] The term expression as used herein is defined as the transcription and/or translation of a particular nucleotide sequence driven by its promoter.
[0062] Expression vector refers to a vector comprising a recombinant polynucleotide comprising expression control sequences operatively linked to a nucleotide sequence to be expressed. An expression vector comprises sufficient cis-acting elements for expression; other elements for expression can be supplied by the host cell or in an in vitro expression system. Expression vectors include all those known in the art, such as cosmids, plasmids (e.g., naked or contained in liposomes) and viruses (e.g., Sendai viruses, lentiviruses, retroviruses, adenoviruses, and adeno-associated viruses) that incorporate the recombinant polynucleotide.
[0063] Homologous as used herein, refers to the subunit sequence identity between two polymeric molecules, e.g., between two nucleic acid molecules, such as, two DNA molecules or two RNA molecules, or between two polypeptide molecules. When a subunit position in both of the two molecules is occupied by the same monomeric subunit; e.g., if a position in each of two DNA molecules is occupied by adenine, then they are homologous at that position. The homology between two sequences is a direct function of the number of matching or homologous positions; e.g., if half (e.g., five positions in a polymer ten subunits in length) of the positions in two sequences are homologous, the two sequences are 50% homologous; if 90% of the positions (e.g., 9 of 10), are matched or homologous, the two sequences are 90% homologous.
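By way of non-limiting illustration, the percent homology arithmetic described above can be sketched in Python. The function name percent_homology and the toy sequences are hypothetical and are not part of the disclosure; the sketch assumes the two sequences are already aligned and of equal length.

```python
def percent_homology(seq_a: str, seq_b: str) -> float:
    """Percentage of positions occupied by the same subunit in both sequences.

    Assumption: the sequences are pre-aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (pre-aligned)")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    # Count positions at which both molecules carry the same subunit.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Toy sequences, ten subunits each (hypothetical, for illustration only).
print(percent_homology("AAAAAAAAAA", "AAAAACCCCC"))  # 5 of 10 match -> 50.0
print(percent_homology("AAAAAAAAAA", "AAAAAAAAAC"))  # 9 of 10 match -> 90.0
```

The two calls reproduce the 50% and 90% examples given in the definition.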
[0064] Identity as used herein refers to the subunit sequence identity between two polymeric molecules particularly between two amino acid molecules, such as, between two polypeptide molecules. When two amino acid sequences have the same residues at the same positions; e.g., if a position in each of two polypeptide molecules is occupied by an Arginine, then they are identical at that position. The identity or extent to which two amino acid sequences have the same residues at the same positions in an alignment is often expressed as a percentage. The identity between two amino acid sequences is a direct function of the number of matching or identical positions; e.g., if half (e.g., five positions in a polymer ten amino acids in length) of the positions in two sequences are identical, the two sequences are 50% identical; if 90% of the positions (e.g., 9 of 10), are matched or identical, the two amino acids sequences are 90% identical.
[0065] As used herein, an instructional material includes a publication, a recording, a diagram, or any other medium of expression which can be used to communicate the usefulness of the compositions and methods of the invention. The instructional material of the kit of the invention may, for example, be affixed to a container which contains the nucleic acid, peptide, and/or composition of the invention or be shipped together with a container which contains the nucleic acid, peptide, and/or composition. Alternatively, the instructional material may be shipped separately from the container with the intention that the instructional material and the compound be used cooperatively by the recipient.
[0066] A mutation as used herein is a change in a DNA sequence resulting in an alteration from a given reference sequence (which may be, for example, an earlier collected DNA sample from the same subject). The mutation can comprise deletion and/or insertion and/or duplication and/or substitution of at least one deoxyribonucleic acid base such as a purine (adenine and/or guanine) and/or a pyrimidine (thymine and/or cytosine). Mutations may or may not produce discernible changes in the observable characteristics (phenotype) of an organism (subject).
[0067] By nucleic acid is meant any nucleic acid, whether composed of deoxyribonucleosides or ribonucleosides, and whether composed of phosphodiester linkages or modified linkages such as phosphotriester, phosphoramidate, siloxane, carbonate, carboxymethylester, acetamidate, carbamate, thioether, bridged phosphoramidate, bridged methylene phosphonate, phosphorothioate, methylphosphonate, phosphorodithioate, bridged phosphorothioate or sulfone linkages, and combinations of such linkages. The term nucleic acid also specifically includes nucleic acids composed of bases other than the five biologically occurring bases (adenine, guanine, thymine, cytosine and uracil).
[0068] In the context of the present invention, the following abbreviations for the commonly occurring nucleic acid bases are used. A refers to adenosine, C refers to cytosine, G refers to guanosine, T refers to thymidine, and U refers to uridine.
[0069] Unless otherwise specified, a nucleotide sequence encoding an amino acid sequence includes all nucleotide sequences that are degenerate versions of each other and that encode the same amino acid sequence. The phrase nucleotide sequence that encodes a protein or an RNA may also include introns to the extent that the nucleotide sequence encoding the protein may in some version contain an intron(s).
[0070] The term oligonucleotide typically refers to short polynucleotides, generally no greater than about 60 nucleotides. It will be understood that when a nucleotide sequence is represented by a DNA sequence (i.e., A, T, G, C), this also includes an RNA sequence (i.e., A, U, G, C) in which U replaces T.
[0071] As used herein, the terms peptide, polypeptide, and protein are used interchangeably, and refer to a compound comprised of amino acid residues covalently linked by peptide bonds. A protein or peptide must contain at least two amino acids, and no limitation is placed on the maximum number of amino acids that can comprise a protein's or peptide's sequence. Polypeptides include any peptide or protein comprising two or more amino acids joined to each other by peptide bonds. As used herein, the term refers to both short chains, which also commonly are referred to in the art as peptides, oligopeptides and oligomers, for example, and to longer chains, which generally are referred to in the art as proteins, of which there are many types. Polypeptides include, for example, biologically active fragments, substantially homologous polypeptides, oligopeptides, homodimers, heterodimers, variants of polypeptides, modified polypeptides, derivatives, analogs, fusion proteins, among others. The polypeptides include natural peptides, recombinant peptides, synthetic peptides, or a combination thereof.
[0072] The term polynucleotide includes DNA, cDNA, RNA, DNA/RNA hybrid, anti-sense RNA, siRNA, miRNA, snoRNA, genomic DNA, synthetic forms, and mixed polymers, both sense and antisense strands, and may be chemically or biochemically modified to contain non-natural or derivatized, synthetic, or semisynthetic nucleotide bases. Also, included within the scope of the invention are alterations of a wild type or synthetic gene, including but not limited to deletion, insertion, substitution of one or more nucleotides, or fusion to other polynucleotide sequences.
[0073] Conventional notation is used herein to describe polynucleotide sequences: the left-hand end of a single-stranded polynucleotide sequence is the 5'-end; the left-hand direction of a double-stranded polynucleotide sequence is referred to as the 5'-direction.
[0074] The term promoter as used herein is defined as a DNA sequence recognized by the synthetic machinery of the cell, or introduced synthetic machinery, required to initiate the specific transcription of a polynucleotide sequence.
[0075] A sample or biological sample as used herein means a biological material from a subject, including but not limited to organ, tissue, exosome, blood, plasma, saliva, urine and other body fluid. A sample can be any source of material obtained from a subject.
[0076] The term subject is intended to include living organisms in which an immune response can be elicited (e.g., mammals). A subject or patient, as used herein, may be a human or non-human mammal. Non-human mammals include, for example, livestock and pets, such as ovine, bovine, porcine, canine, feline and murine mammals. Preferably, the subject is human.
[0077] A target site or target sequence refers to a genomic nucleic acid sequence that defines a portion of a nucleic acid to which a binding molecule may specifically bind under conditions sufficient for binding to occur.
[0078] The term therapeutic as used herein means a treatment and/or prophylaxis. A therapeutic effect is obtained by suppression, remission, or eradication of a disease state.
[0079] The term transfected or transformed or transduced as used herein refers to a process by which exogenous nucleic acid is transferred or introduced into the host cell. A transfected or transformed or transduced cell is one which has been transfected, transformed or transduced with exogenous nucleic acid. The cell includes the primary subject cell and its progeny. In certain embodiments, transfected means an exogenous nucleic acid is transferred transiently into a cell, often a mammalian cell; while transduced means an exogenous nucleic acid is transferred permanently into a cell, often a mammalian cell, for example by viruses or viral vectors; transformed means an exogenous nucleic acid is transferred into a cell, often bacterial or yeast cells.
[0080] To treat a disease as the term is used herein, means to reduce the frequency or severity of at least one sign or symptom of a disease or disorder experienced by a subject.
[0081] A vector is a composition of matter which comprises an isolated nucleic acid and which can be used to deliver the isolated nucleic acid to the interior of a cell. Numerous vectors are known in the art including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses. Thus, the term vector includes an autonomously replicating plasmid or a virus. The term should also be construed to include non-plasmid and non-viral compounds which facilitate transfer of nucleic acid into cells, such as, for example, polylysine compounds, liposomes, and the like. Examples of viral vectors include, but are not limited to, Sendai viral vectors, adenoviral vectors, adeno-associated virus vectors, retroviral vectors, lentiviral vectors, and the like.
[0082] Ranges: throughout this disclosure, various aspects of the invention can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.
Description
[0083] Herein, a Functional Cancer Genome Atlas (FCGA) of tumor suppressors in the autochthonous mouse liver was mapped using massively parallel CRISPR/Cas9 genome editing. A direct in vivo CRISPR screen was performed by intravenously injecting adeno-associated virus (AAV) pools carrying a library of 280 sgRNAs targeting 56 cancer genes into Rosa-LSL-Cas9-EGFP knock-in mice (LSL-Cas9 mice) to generate highly complex autochthonous liver tumors, and subsequently reading out the Cas9-generated variants at predicted sgRNA cut sites using molecular inversion probe sequencing (MIPS). This combination of direct mutagenesis and pooled variant readout illuminated the mutational landscape of the tumors, demonstrating that the present approach can be used to quantitatively analyze numerous putative TSGs in a high-throughput manner. Mutagenesis of individual or combinations of genes represented by high frequency variants validated certain functional drivers of liver tumorigenesis in fully immunocompetent mice.
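By way of non-limiting illustration, a pooled variant readout of the kind summarized above can be tabulated as a per-sgRNA variant frequency, i.e., the fraction of captured reads at a predicted cut site that carry a variant. The following Python sketch is an assumption for illustration only: the function names, the gene labels GeneA and GeneB, the read counts, and the choice of summarizing each gene by its highest-frequency sgRNA are all hypothetical and do not describe the analysis actually performed.

```python
from collections import defaultdict

# 280 sgRNAs targeting 56 genes corresponds to an average of 5 sgRNAs per gene.
SGRNAS_PER_GENE = 280 // 56


def variant_frequencies(read_counts):
    """Map each sgRNA id to variant_reads / total_reads at its cut site.

    read_counts: {sgRNA_id: (variant_reads, total_reads)}
    Sites with zero coverage are reported as 0.0 (a simplifying assumption).
    """
    return {
        sg: (variant / total if total else 0.0)
        for sg, (variant, total) in read_counts.items()
    }


def gene_level_max(freqs, sg_to_gene):
    """Summarize each gene by its highest-frequency sgRNA (hypothetical choice)."""
    best = defaultdict(float)
    for sg, freq in freqs.items():
        gene = sg_to_gene[sg]
        best[gene] = max(best[gene], freq)
    return dict(best)


# Toy counts, not measured data.
counts = {"GeneA_sg1": (50, 200), "GeneA_sg2": (10, 200), "GeneB_sg1": (0, 100)}
mapping = {"GeneA_sg1": "GeneA", "GeneA_sg2": "GeneA", "GeneB_sg1": "GeneB"}

print(SGRNAS_PER_GENE)                                        # 5
print(gene_level_max(variant_frequencies(counts), mapping))   # {'GeneA': 0.25, 'GeneB': 0.0}
```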
Methods
[0084] The present invention includes methods for identifying cancer driver mutations in vivo. One aspect of the method comprises selecting nucleotide sequences in silico from a plurality of tumor suppressor genes (TSGs) and designing a plurality of short guide RNA (sgRNA) sequences in silico homologous to the plurality of TSGs. In certain embodiments, the plurality of sgRNA sequences are synthesized into oligonucleotides and introduced into a plurality of AAV-CRISPR vectors. In certain embodiments, the AAV-CRISPR vectors comprise Cas9. In certain embodiments, the AAV-CRISPR vectors containing the plurality of oligonucleotides are administered into an animal. In certain embodiments, a tumor is isolated from the animal. In certain embodiments, nucleic acids are isolated from the tumor and sequenced. In certain embodiments, the sequencing data are analyzed, thus identifying the cancer driver mutation(s).
[0085] Another aspect of the invention includes a method of determining at least one cancer driver mutation in vivo in a cancer-affected subject. In certain embodiments, the method comprises administering to the subject a plurality of AAV-CRISPR vectors, wherein the AAV-CRISPR vectors comprise Cas9 and a plurality of short guide RNAs (sgRNAs) homologous to a plurality of tumor suppressor genes (TSGs). In certain embodiments, a plurality of nucleic acids isolated from the subject's cancer is sequenced and analysis of the sequencing data indicates whether any cancer driver mutation is present in the subject's cancer.
[0086] In certain embodiments of the invention, the sgRNA sequences comprise at least one selected from the group consisting of SEQ ID NOs. 1-280.
[0087] In certain embodiments of the invention, the sgRNA sequences comprise SEQ ID NOs. 1-280.
[0088] In certain embodiments of the invention, the AAV-CRISPR vector is comprised of the components as described herein. In certain embodiments, the AAV-CRISPR vector can also include (1) a constitutive EFS promoter or a tissue-specific TBG promoter (for example, polII promoters); (2) a constitutive U6 polIII promoter; (3) an sgRNA spacer cloning site with a double SapI type II restriction enzyme cutting site; (4) an sgRNA backbone derived from an 89 bp chimeric backbone from Streptococcus pyogenes Cas9 tracrRNA; and (5) a Cre recombinase.
[0089] In certain embodiments of the invention, the animal is a mouse. Other animals that can be used include but are not limited to rats, rabbits, dogs, cats, horses, pigs, cows and birds. In certain embodiments, the animal is a human. The AAV-CRISPR vectors can be administered to an animal by any means standard in the art. For example the vectors can be injected into the animal. The injections can be intravenous, subcutaneous, intraperitoneal, or directly into a tissue or organ.
[0090] Nucleotide sequencing or sequencing, as it is commonly known in the art, can be performed by standard methods commonly known to one of ordinary skill in the art. In certain embodiments of the invention, sequencing comprises targeted capture sequencing. Targeted capture sequencing can be performed as described herein, or by methods commonly performed by one of ordinary skill in the art. In certain embodiments, the targeted capture sequencing is performed using a plurality of Molecular Inversion Probes (MIPs). In certain embodiments, the plurality of MIPs comprises at least one selected from the group consisting of SEQ ID NOs. 289-554. In certain embodiments, the plurality of MIPs comprises SEQ ID NOs. 289-554.
[0091] Another aspect of the invention includes a method of identifying a plurality of cancer driver mutations in a sample comprising hybridizing a plurality of Molecular Inversion Probes (MIPs) to a plurality of nucleic acids from the sample. In certain embodiments, targeted capture sequencing is performed on the plurality of nucleic acids. In certain embodiments, data from the targeted capture sequencing is then analyzed, thus identifying the plurality of cancer driver mutations in the sample. In certain embodiments, the MIPs comprise at least one selected from the group consisting of SEQ ID NOs. 289-554. In certain embodiments, the MIPs comprise SEQ ID NOs. 289-554.
[0092] Yet another aspect of the invention includes a method of determining at least one cancer driver mutation in a sample comprising administering a plurality of AAV-CRISPR vectors to the sample, wherein the vectors comprise Cas9 and a plurality of nucleotide sequences homologous to a plurality of tumor suppressor genes (TSGs). In certain embodiments, the nucleic acids are isolated from the sample and sequenced. In certain embodiments, the sequencing data are analyzed, thus determining the at least one cancer driver mutation in the sample.
[0093] Another aspect of the invention includes a method of determining a treatment for cancer in a subject. The method comprises administering a plurality of AAV-CRISPR vectors to a sample from the subject. In certain embodiments, the vectors comprise Cas9 and a plurality of nucleotide sequences homologous to a plurality of tumor suppressor genes (TSGs). In certain embodiments, the nucleic acids are isolated from the sample and sequenced. In certain embodiments, the sequencing data are analyzed, thus identifying at least one cancer driver mutation in the sample. In certain embodiments, identifying the at least one cancer driver mutation determines the cancer treatment for the subject.
[0094] The mutations claimed herein can be any combination of insertions or deletions, including but not limited to a single base insertion, a single base deletion, a frameshift, a rearrangement, and an insertion or deletion of 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, or 300 bases, or any and all numbers of bases in between. The mutation can occur in a gene or in a non-coding region. The location of the mutation can provide information as to the type of treatment needed. For example, if a mutation occurs in a specific gene rendering that gene non-functional, a drug that acts on that particular gene will not be considered for treatment. Likewise, if a drug is known to act on a particular gene and that gene is not mutated, that drug will be considered for treatment.
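By way of non-limiting illustration, the insertion, deletion, and frameshift categories listed above can be distinguished from the lengths of the reference and observed alleles. The Python sketch below is hypothetical: the function name classify_indel and its return format are assumptions, and the frameshift flag is only meaningful when the variant lies within a protein-coding region.

```python
def classify_indel(ref_len: int, alt_len: int) -> dict:
    """Classify a length-changing variant from reference and observed allele lengths.

    Assumption: 'frameshift' is interpreted for coding sequence only, where a
    net length change that is not a multiple of 3 shifts the reading frame.
    """
    delta = alt_len - ref_len
    if delta > 0:
        kind = "insertion"
    elif delta < 0:
        kind = "deletion"
    else:
        kind = "no length change"
    size = abs(delta)
    return {"type": kind, "size": size, "frameshift": size % 3 != 0}


print(classify_indel(20, 21))  # single base insertion, frameshift
print(classify_indel(20, 17))  # three base deletion, reading frame preserved
```

A substitution that leaves the length unchanged is reported as "no length change" and is not further classified by this sketch.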
[0095] In certain embodiments the plurality of nucleotide sequences homologous to a plurality of TSGs comprises at least one selected from the group consisting of SEQ ID NOs. 1-280.
[0096] In certain embodiments the plurality of nucleotide sequences homologous to a plurality of TSGs comprises SEQ ID NOs. 1-280.
[0097] The sample of the present invention can comprise a cancer cell or a plurality of cancer cells. The sample can also comprise a tumor. In some embodiments, multiple sections of the same tumor can make up multiple samples.
[0098] The compositions described herein may be administered to a patient transarterially, subcutaneously, intradermally, intratumorally, intranodally, intramedullary, intramuscularly, by intravenous (i.v.) injection, or intraperitoneally. In other instances, the compositions of the invention are injected directly into a site of inflammation in the subject, a local disease site in the subject, a lymph node, an organ, a tumor, and the like.
Compositions
[0099] One aspect of the invention provides a composition comprising a set of Molecular Inversion Probes (MIPs) comprised of at least one selected from the group consisting of SEQ ID NOs. 289-554. Another aspect includes a kit comprising a set of Molecular Inversion Probes (MIPs) comprised of at least one selected from the group consisting of SEQ ID NOs. 289-554, and instructional material for use thereof. Yet another aspect includes a kit for determining at least one cancer driver mutation in a sample comprising a set of Molecular Inversion Probes (MIPs) comprised of at least one selected from the group consisting of SEQ ID NOs. 289-554, reagents for measuring the at least one cancer driver mutation, and instructional material for use thereof.
[0100] Another aspect includes a composition comprising an AAV-CRISPR mTSG library comprised of a plurality of AAV vectors. The AAV vectors are comprised of Cas9 and a plurality of nucleic acids homologous to a plurality of Tumor Suppressor Genes (TSGs). In one embodiment, the plurality of nucleic acids comprises at least one selected from the group consisting of SEQ ID NOs. 1-280.
[0101] In one aspect, the invention includes a vector comprising an adeno-associated virus (AAV) genome, a U6 promoter gene, an sgRNA sequence, an EFS promoter gene, and a Cre recombinase gene. In another aspect, the invention includes a vector comprising an adeno-associated virus (AAV) genome, a U6 promoter gene, an sgRNA sequence, a TBG promoter gene, and a Cre recombinase gene. In yet another aspect, the invention includes a vector comprising the nucleic acid sequence of SEQ ID NO: 555. In still another aspect, the invention includes a vector comprising the nucleic acid sequence of SEQ ID NO: 556. In certain embodiments, the TBG promoter gene comprises the nucleic acid sequence of SEQ ID NO: 557. In certain embodiments, the AAV-CRISPR vector can also include (1) a polII promoter, for example the constitutive EFS promoter or the tissue-specific TBG promoter; (2) a constitutive U6 polIII promoter; (3) an sgRNA spacer cloning site with a double SapI type II restriction enzyme cutting site; (4) an sgRNA backbone derived from an 89 bp chimeric backbone from Streptococcus pyogenes Cas9 tracrRNA; and (5) a Cre recombinase.
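The double SapI cloning site can be located computationally. The minimal Python sketch below scans a short fragment, excerpted from the open sgRNA cassette of SEQ ID NO: 555 as printed in this document (positions 744 to 792), for the SapI recognition sequence 5′-GCTCTTC-3′ on both strands. The helper names `reverse_complement` and `find_sites` are hypothetical. The sketch reports recognition sites only and does not model the offset cut positions of the enzyme.

```python
SAPI_SITE = "GCTCTTC"  # SapI recognition sequence, written 5' to 3'

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def find_sites(seq: str, site: str = SAPI_SITE) -> list:
    """Return sorted (0-based start, strand) tuples for every occurrence of site."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)

# End of the U6 promoter, the open cloning site, and the start of the sgRNA backbone.
CASSETTE = "GTGGAAAGGACGAAACACCggaagagcgagctcttctgttttagagcta"

sites = find_sites(CASSETTE)
print(sites)  # [(20, '-'), (29, '+')]
```

The two hits lie on opposite strands and face outward, which is consistent with a stuffer that is excised when a spacer is cloned in.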
[0102] Another aspect of the invention includes a kit comprising an adeno-associated virus (AAV) genome, a U6 promoter gene, an sgRNA sequence, an EFS promoter gene, and a Cre recombinase gene, and instructional material for use thereof. Yet another aspect includes a kit comprising an adeno-associated virus (AAV) genome, a U6 promoter gene, an sgRNA sequence, a TBG promoter gene, and a Cre recombinase gene, and instructional material for use thereof.
CRISPR/Cas9
[0103] The CRISPR/Cas9 system is a facile and efficient system for inducing targeted genetic alterations. Target recognition by the Cas9 protein requires a seed sequence within the guide RNA (gRNA) and a conserved di-nucleotide containing protospacer adjacent motif (PAM) sequence downstream of the gRNA-binding region. The CRISPR/Cas9 system can thereby be engineered to cleave virtually any DNA sequence by redesigning the gRNA in cell lines (such as 293T cells), primary cells, and CAR T cells. The CRISPR/Cas9 system can simultaneously target multiple genomic loci by co-expressing a single Cas9 protein with two or more gRNAs, making this system uniquely suited for multiple gene editing or synergistic activation of target genes.
[0104] The Cas9 protein and guide RNA form a complex that identifies and cleaves target sequences. Cas9 is comprised of six domains: REC I, REC II, Bridge Helix, PAM interacting, HNH, and RuvC. The REC I domain binds the guide RNA, while the Bridge Helix binds to target DNA. The HNH and RuvC domains are nuclease domains. Guide RNA is engineered to have a 5′ end that is complementary to the target DNA sequence. Upon binding of the guide RNA to the Cas9 protein, a conformational change occurs, activating the protein. Once activated, Cas9 searches for target DNA by binding to sequences that match its protospacer adjacent motif (PAM) sequence. A PAM is a two or three nucleotide base sequence within one nucleotide downstream of the region complementary to the guide RNA. In one non-limiting example, the PAM sequence is 5′-NGG-3′. When the Cas9 protein finds its target sequence with the appropriate PAM, it melts the bases upstream of the PAM and pairs them with the complementary region on the guide RNA. Then the RuvC and HNH nuclease domains cut the target DNA after the third nucleotide base upstream of the PAM.
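The PAM and cut-site geometry described above can be sketched in a few lines of Python. The example scans one strand of a DNA sequence for NGG PAMs, extracts the protospacer immediately upstream of each PAM, and reports the expected cut position three bases upstream of the PAM. The function name `find_cas9_sites`, the 20-nucleotide guide length default, and the toy target sequence are illustrative assumptions and are not taken from the sgRNA library of the invention.

```python
import re

def find_cas9_sites(seq: str, guide_len: int = 20) -> list:
    """Return (protospacer, pam, cut_index) for each NGG PAM on the given strand.

    cut_index is the 0-based index of the first base 3' of the predicted cut,
    so the break falls between seq[cut_index - 1] and seq[cut_index].
    Only the supplied strand is scanned; the reverse strand is not considered.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAM candidates are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough sequence upstream for a full protospacer
        protospacer = seq[pam_start - guide_len:pam_start]
        cut_index = pam_start - 3  # cut after the third base upstream of the PAM
        sites.append((protospacer, match.group(1), cut_index))
    return sites

# Hypothetical target: 20-nt protospacer, a TGG PAM, and three flanking bases.
toy_target = "GACGTTACCGATCGATTACA" + "TGG" + "CAT"
toy_sites = find_cas9_sites(toy_target)
print(toy_sites)  # [('GACGTTACCGATCGATTACA', 'TGG', 17)]
```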
[0105] One non-limiting example of a CRISPR/Cas system used to inhibit gene expression, CRISPRi, is described in U.S. Patent Appl. Publ. No. US20140068797. CRISPRi induces permanent gene disruption that utilizes the RNA-guided Cas9 endonuclease to introduce DNA double stranded breaks which trigger error-prone repair pathways to result in frame shift mutations. A catalytically dead Cas9 lacks endonuclease activity. When coexpressed with a guide RNA, a DNA recognition complex is generated that specifically interferes with transcriptional elongation, RNA polymerase binding, or transcription factor binding. This CRISPRi system efficiently represses expression of targeted genes.
[0106] CRISPR/Cas gene disruption occurs when a guide nucleic acid sequence specific for a target gene and a Cas endonuclease are introduced into a cell and form a complex that enables the Cas endonuclease to introduce a double strand break at the target gene. In certain embodiments, the CRISPR/Cas system comprises an expression vector, such as, but not limited to, a pAd5F35-CRISPR vector. In other embodiments, the Cas expression vector induces expression of Cas9 endonuclease. Other endonucleases may also be used, including but not limited to, T7, Cas3, Cas8a, Cas8b, Cas10d, Cse1, Csy1, Csn2, Cas4, Cas10, Csm2, Cmr5, Fok1, other nucleases known in the art, and any combination thereof.
[0107] In certain embodiments, inducing the Cas expression vector comprises exposing the cell to an agent that activates an inducible promoter in the Cas expression vector. In such embodiments, the Cas expression vector includes an inducible promoter, such as one that is inducible by exposure to an antibiotic (e.g., by tetracycline or a derivative of tetracycline, for example doxycycline). However, it should be appreciated that other inducible promoters can be used. The inducing agent can be a selective condition (e.g., exposure to an agent, for example an antibiotic) that results in induction of the inducible promoter. This results in expression of the Cas expression vector.
[0108] In certain embodiments, guide RNA(s) and Cas9 can be delivered to a cell as a ribonucleoprotein (RNP) complex. RNPs are comprised of purified Cas9 protein complexed with gRNA and are well known in the art to be efficiently delivered to multiple types of cells, including but not limited to stem cells and immune cells (Addgene, Cambridge, Mass., Mirus Bio LLC, Madison, Wis.).
[0109] The guide RNA is specific for a genomic region of interest and targets that region for Cas endonuclease-induced double strand breaks. The target sequence of the guide RNA sequence may be within a locus of a gene or within a non-coding region of the genome. In certain embodiments, the guide nucleic acid sequence is at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 or more nucleotides in length.
[0110] Guide RNA (gRNA), also referred to as short guide RNA or sgRNA, provides both targeting specificity and scaffolding/binding ability for the Cas9 nuclease. The gRNA can be a synthetic RNA composed of a targeting sequence and scaffold sequence derived from endogenous bacterial crRNA and tracrRNA. gRNA is used to target Cas9 to a specific genomic locus in genome engineering experiments. Guide RNAs can be designed using standard tools well known in the art.
[0111] In the context of formation of a CRISPR complex, target sequence refers to a sequence to which a guide sequence is designed to have some complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. Full complementarity is not necessarily required, provided there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides. In certain embodiments, a target sequence is located in the nucleus or cytoplasm of a cell. In other embodiments, the target sequence may be within an organelle of a eukaryotic cell, for example, mitochondrion or nucleus. Typically, in the context of an endogenous CRISPR system, formation of a CRISPR complex (comprising a guide sequence hybridized to a target sequence and complexed with one or more Cas proteins) results in cleavage of one or both strands in or near (e.g., within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50 or more base pairs) the target sequence. As with the target sequence, it is believed that complete complementarity is not needed, provided this is sufficient to be functional.
[0112] In certain embodiments, one or more vectors driving expression of one or more elements of a CRISPR system are introduced into a host cell, such that expression of the elements of the CRISPR system directs formation of a CRISPR complex at one or more target sites. For example, a Cas enzyme, a guide sequence linked to a tracr-mate sequence, and a tracr sequence could each be operably linked to separate regulatory elements on separate vectors. Alternatively, two or more of the elements expressed from the same or different regulatory elements may be combined in a single vector, with one or more additional vectors providing any components of the CRISPR system not included in the first vector. CRISPR system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (upstream of) or 3′ with respect to (downstream of) a second element. The coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction. In certain embodiments, a single promoter drives expression of a transcript encoding a CRISPR enzyme and one or more of the guide sequence, tracr mate sequence (optionally operably linked to the guide sequence), and a tracr sequence embedded within one or more intron sequences (e.g., each in a different intron, two or more in at least one intron, or all in a single intron).
[0113] In certain embodiments, the CRISPR enzyme is part of a fusion protein comprising one or more heterologous protein domains (e.g. about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the CRISPR enzyme). A CRISPR enzyme fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains. Examples of protein domains that may be fused to a CRISPR enzyme include, without limitation, epitope tags, reporter gene sequences, and protein domains having one or more of the following activities: methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity and nucleic acid binding activity. Additional domains that may form part of a fusion protein comprising a CRISPR enzyme are described in US20110059502, incorporated herein by reference. In certain embodiments, a tagged CRISPR enzyme is used to identify the location of a target sequence.
[0114] Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in mammalian and non-mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a CRISPR system to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell (Anderson, 1992, Science 256:808-813; and Yu, et al., 1994, Gene Therapy 1:13-26).
[0115] In certain embodiments, the CRISPR/Cas is derived from a type II CRISPR/Cas system. In other embodiments, the CRISPR/Cas system is derived from a Cas9 protein. The Cas9 protein can be from Streptococcus pyogenes, Streptococcus thermophilus, or other species.
[0116] In general, Cas proteins comprise at least one RNA recognition and/or RNA binding domain. RNA recognition and/or RNA binding domains interact with the guiding RNA. Cas proteins can also comprise nuclease domains (i.e., DNase or RNase domains), DNA binding domains, helicase domains, RNAse domains, protein-protein interaction domains, dimerization domains, as well as other domains. The Cas proteins can be modified to increase nucleic acid binding affinity and/or specificity, alter an enzymatic activity, and/or change another property of the protein. In certain embodiments, the Cas-like protein of the fusion protein can be derived from a wild type Cas9 protein or fragment thereof. In other embodiments, the Cas can be derived from modified Cas9 protein. For example, the amino acid sequence of the Cas9 protein can be modified to alter one or more properties (e.g., nuclease activity, affinity, stability, and so forth) of the protein. Alternatively, domains of the Cas9 protein not involved in RNA-guided cleavage can be eliminated from the protein such that the modified Cas9 protein is smaller than the wild type Cas9 protein. In general, a Cas9 protein comprises at least two nuclease (i.e., DNase) domains. For example, a Cas9 protein can comprise a RuvC-like nuclease domain and a HNH-like nuclease domain. The RuvC and HNH domains work together to cut single strands to make a double-stranded break in DNA. (Jinek, et al., 2012, Science, 337:816-821). In certain embodiments, the Cas9-derived protein can be modified to contain only one functional nuclease domain (either a RuvC-like or a HNH-like nuclease domain). For example, the Cas9-derived protein can be modified such that one of the nuclease domains is deleted or mutated such that it is no longer functional (i.e., the nuclease activity is absent). 
In some embodiments in which one of the nuclease domains is inactive, the Cas9-derived protein is able to introduce a nick into a double-stranded nucleic acid (such protein is termed a nickase), but not cleave the double-stranded DNA. In any of the above-described embodiments, any or all of the nuclease domains can be inactivated by one or more deletion mutations, insertion mutations, and/or substitution mutations using well-known methods, such as site-directed mutagenesis, PCR-mediated mutagenesis, and total gene synthesis, as well as other methods known in the art.
[0117] In one non-limiting embodiment, a vector drives the expression of the CRISPR system. The art is replete with suitable vectors that are useful in the present invention. The vectors to be used are suitable for replication and, optionally, integration in eukaryotic cells. Typical vectors contain transcription and translation terminators, initiation sequences, and promoters useful for regulation of the expression of the desired nucleic acid sequence. The vectors of the present invention may also be used for nucleic acid standard gene delivery protocols. Methods for gene delivery are known in the art (U.S. Pat. Nos. 5,399,346, 5,580,859 & 5,589,466, incorporated by reference herein in their entireties).
[0118] Further, the vector may be provided to a cell in the form of a viral vector. Viral vector technology is well known in the art and is described, for example, in Sambrook et al. (4th Edition, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, 2012), and in other virology and molecular biology manuals. Viruses which are useful as vectors include, but are not limited to, retroviruses, adenoviruses, adeno-associated viruses, herpes viruses, Sindbis virus, gammaretrovirus and lentiviruses. In general, a suitable vector contains an origin of replication functional in at least one organism, a promoter sequence, convenient restriction endonuclease sites, and one or more selectable markers (e.g., WO 01/96584; WO 01/29058; and U.S. Pat. No. 6,326,193).
Introduction of Nucleic Acids
[0119] Methods of introducing nucleic acids into a cell include physical, biological and chemical methods. Physical methods for introducing a polynucleotide, such as DNA or RNA, into a cell include transfection, transformation, transduction, calcium phosphate precipitation, lipofection, particle bombardment, microinjection, electroporation, and the like. RNA and DNA can be introduced into cells using commercially available methods which include electroporation (Amaxa Nucleofector-II (Amaxa Biosystems, Cologne, Germany), ECM 830 (BTX) (Harvard Instruments, Boston, Mass.), the Gene Pulser II (BioRad, Denver, Colo.), or Multiporator (Eppendorf, Hamburg, Germany)). RNA and DNA can also be introduced into cells using cationic liposome mediated transfection using lipofection, using polymer encapsulation, using peptide mediated transfection, or using biolistic particle delivery systems such as gene guns (see, for example, Nishikawa, et al. Hum Gene Ther., 12(8):861-70 (2001)).
[0120] Biological methods for introducing a polynucleotide of interest into a cell include the use of DNA and RNA vectors. Viral vectors, and especially retroviral vectors, have become the most widely used method for inserting genes into mammalian, e.g., human cells. Other viral vectors can be derived from lentivirus, poxviruses, herpes simplex virus I, adenoviruses and adeno-associated viruses, and the like. See, for example, U.S. Pat. Nos. 5,350,674 and 5,585,362. Non-viral vectors such as plasmids can also be used to introduce nucleic acids or polynucleotides into a cell. In certain embodiments, plasmids containing guide RNAs are transfected into a cell.
[0121] Chemical means for introducing a polynucleotide into a host cell include colloidal dispersion systems, such as macromolecule complexes, nanocapsules, microspheres, beads, and lipid-based systems including oil-in-water emulsions, micelles, mixed micelles, and liposomes. An exemplary colloidal system for use as a delivery vehicle in vitro and in vivo is a liposome (e.g., an artificial membrane vesicle).
[0122] Regardless of the method used to introduce exogenous nucleic acids into a host cell, in order to confirm the presence of the nucleic acids in the host cell, a variety of assays may be performed. Such assays include, for example, molecular biological assays well known to those of skill in the art, such as gel electrophoresis, Southern and Northern blotting, RT-PCR and PCR; biochemical assays, such as detecting the presence or absence of a particular peptide, e.g., by immunological means (ELISAs and Western blots) or by assays described herein to identify agents falling within the scope of the invention.
[0123] It should be understood that the methods and compositions that would be useful in the present invention are not limited to the particular formulations set forth in the examples. The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description, and are not intended to limit the scope of what the inventors regard as their invention.
[0124] The practice of the present invention employs, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are well within the purview of the skilled artisan. Such techniques are explained fully in the literature, such as Molecular Cloning: A Laboratory Manual, fourth edition (Sambrook et al. (2012) Molecular Cloning, Cold Spring Harbor Laboratory); Oligonucleotide Synthesis (Gait, M. J. (1984) Oligonucleotide Synthesis, IRL Press); Culture of Animal Cells (Freshney, R. (2010) Culture of Animal Cells, Cell Proliferation, 15(2.3), 1); Methods in Enzymology; Weir's Handbook of Experimental Immunology (Wiley-Blackwell; 5th edition (Jan. 15, 1996)); Gene Transfer Vectors for Mammalian Cells (Miller and Calos (1987) Cold Spring Harbor Laboratory, New York); Short Protocols in Molecular Biology (Ausubel et al., Current Protocols; 5th edition (Nov. 5, 2002)); Polymerase Chain Reaction: Principles, Applications and Troubleshooting (Babar, M., VDM Verlag Dr. Müller (Aug. 17, 2011)); Current Protocols in Immunology (Coligan, John Wiley & Sons, Inc., Nov. 1, 2002).
[0125] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, numerous equivalents to the specific procedures, embodiments, claims, and examples described herein. Such equivalents are considered to be within the scope of this invention and covered by the claims appended hereto. For example, it should be understood that modifications in reaction conditions, including but not limited to reaction times, reaction size/volume, and experimental reagents, such as solvents, catalysts, pressures, atmospheric conditions, e.g., nitrogen atmosphere, and reducing/oxidizing agents, with art-recognized alternatives and using no more than routine experimentation, are within the scope of the present application.
[0126] It is to be understood that wherever values and ranges are provided herein, all values and ranges encompassed by these values and ranges, are meant to be encompassed within the scope of the present invention. Moreover, all values that fall within these ranges, as well as the upper or lower limits of a range of values, are also contemplated by the present application.
[0127] The following examples further illustrate aspects of the present invention. However, they are in no way a limitation of the teachings or disclosure of the present invention as set forth herein.
EXPERIMENTAL EXAMPLES
[0128] The invention is now described with reference to the following Examples. These Examples are provided for the purpose of illustration only, and the invention is not limited to these Examples, but rather encompasses all variations that are evident as a result of the teachings provided herein.
[0129] The materials and methods employed in these experiments are now described.
[0130] Design, Synthesis and Cloning of the mTSG Library:
[0131] Pan-cancer mutation data from 15 cancer types were retrieved from The Cancer Genome Atlas (TCGA portal) via cBioPortal (Gao et al., Sci. Signal. 6, pl1 (2013); Cerami et al., Cancer Discov. 2, 401-404 (2012)) and Synapse (www dot synapse dot org). Recurrently mutated genes were calculated similarly to previously described methods (Kandoth et al., Nature 502, 333-339 (2013); Lawrence et al., Nature 499, 214-218 (2013); Davoli et al., Cell 155, 948-962 (2013)). Known oncogenes were excluded and only known or predicted tumor suppressor genes (TSGs) were included. The top 50 TSGs were chosen, and their mouse homologs (mTSG) were retrieved from mouse genome informatics (MGI) (www dot informatics dot jax dot org). A total of 49 mTSGs were found. A total of 7 known housekeeping genes were chosen as internal controls. sgRNAs against these 56 genes were designed using a previously described method (Shalem et al., Science 343, 84-87 (2014); Wang et al., Science 343, 80-84 (2014)) with custom scripts. Five sgRNAs were chosen for each gene, plus 8 non-targeting controls (NTCs), making a total of 288 sgRNAs in the mTSG library (Table 1). NTCs do not target any predicted sites in the genome, and thus were not included in subsequent MIPs analysis. Of note, two sgRNA pairs happened to be identical by design, namely Rpl22_sg4/sg5 and Cdkn2a_sg2/sg5. These sgRNAs were treated as the same in subsequent analyses.
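The library size stated above follows directly from the gene and sgRNA counts. The short Python sketch below reproduces that arithmetic. The variable names are illustrative, and the count of distinct targeting spacers is an inference from the statement that two sgRNA pairs are identical, not a figure stated in the text.

```python
N_MTSG = 49            # mouse homologs of the top TSGs
N_HOUSEKEEPING = 7     # housekeeping genes used as internal controls
SGRNAS_PER_GENE = 5
N_NTC = 8              # non-targeting controls
IDENTICAL_PAIRS = 2    # two sgRNA pairs reported as identical by design

n_genes = N_MTSG + N_HOUSEKEEPING        # 56 genes
n_targeting = n_genes * SGRNAS_PER_GENE  # 280 gene-targeting sgRNAs
n_total = n_targeting + N_NTC            # 288 sgRNAs in the library

# Inference: each identical pair collapses to one distinct spacer sequence.
n_distinct_targeting = n_targeting - IDENTICAL_PAIRS  # 278

print(n_genes, n_targeting, n_total, n_distinct_targeting)  # 56 280 288 278
```

The 280 gene-targeting sgRNAs match the range SEQ ID NOs. 1-280 recited elsewhere in this document.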
[0132] Design, Cloning of AAV-CRISPR Vectors and mTSG sgRNA Library Cloning:
[0133] AAV-CRISPR vectors were designed to express Cre recombinase for induction of Cas9 expression using constitutive or conditional promoters when delivered to LSL-Cas9 mice (plasmids available at Addgene). Two sgRNA cassettes were built in these vectors, one encoding an sgRNA targeting Trp53, with the other being an open sgRNA cassette (double SapI sites for sgRNA cloning). The vector was generated by gBlock gene fragment synthesis (IDT) followed by Gibson assembly (NEB). The mTSG library was generated by oligo synthesis, pooled, and cloned into the double SapI sites of the AAV-CRISPR vectors. Library cloning was done at over 100× coverage to ensure proper representation. Representation of plasmid libraries was read out by barcoded Illumina sequencing (Chen et al., Cell 160, 1246-1260 (2015)) with customized primers.
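The coverage and representation checks described above can be sketched as follows. This minimal Python example assumes that the stated coverage figure means more than 100 independent transformants per sgRNA, and that sequencing reads have already been trimmed to the spacer sequence. The function names, the toy library, and the toy reads are hypothetical and do not reproduce the actual readout pipeline or any sequence from the mTSG library.

```python
from collections import Counter

def min_transformants(library_size: int, coverage: int = 100) -> int:
    """Smallest transformant count giving `coverage` clones per library member."""
    return library_size * coverage

def representation(reads: list, library: dict) -> tuple:
    """Count reads per sgRNA and return (per-sgRNA counts, fraction detected).

    `library` maps spacer sequence to sgRNA name. Reads that do not match
    any library spacer exactly are ignored.
    """
    counts = Counter(read for read in reads if read in library)
    per_sgrna = {name: counts.get(spacer, 0) for spacer, name in library.items()}
    detected = sum(1 for count in per_sgrna.values() if count > 0)
    return per_sgrna, detected / len(library)

# Toy data for illustration only.
toy_library = {
    "ACGTACGTACGTACGTACGT": "geneA_sg1",
    "TTGCATTGCATTGCATTGCA": "geneA_sg2",
    "GGATCCGGATCCGGATCCGG": "geneB_sg1",
}
toy_reads = ["ACGTACGTACGTACGTACGT"] * 3 + ["GGATCCGGATCCGGATCCGG"] + ["AAAAAAAAAAAAAAAAAAAA"]

per_sgrna, fraction_detected = representation(toy_reads, toy_library)
print(min_transformants(288))  # 28800
print(per_sgrna)               # {'geneA_sg1': 3, 'geneA_sg2': 0, 'geneB_sg1': 1}
```

For a 288-member library, 100 clones per sgRNA corresponds to at least 28,800 transformants. An sgRNA with zero reads, such as geneA_sg2 in the toy data, would indicate a dropout in the plasmid pool.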
TABLE-US-00001 VectorpAAV-sgRNA-EFS-Cre: (SEQIDNO:555) 1 cctgcaggcagctgcgcgctcgctcgctcactgaggccgcccgggcaaagcccgggcgtc 61 gggcgacctttggtcgcccggcctcagtgagcgagcgagcgcgcagagagggagtggcca 121 actccatcactaggggttcctgcggccgcacgcgtgagggcctatttcccatgattcctt 181 catatttgcatatacgatacaaggctgttagagagataattggaattaatttgactgtaa 241 acacaaagatattagtacaaaatacgtgacgtagaaagtaataatttcttgggtagtttg 301 cagttttaaaattatgttttaaaatggactatcatatgcttaccgtaacttgaaagtatt 361 tcgatttcttggctttatatatcttGTGGAAAGGACGAAACACCGTGTAATAGCTCCTGC 421 ATGGgttttagagctaGAAAtagcaagttaaaataaggctagtccgttatcaacttgaaa 481 aagtggcaccgagtcggtgcTTTTTTtctagaagagggcctatttcccatgattccttca 541 tatttgcatatacgatacaaggctgttagagagataattggaattaatttgactgtaaac 601 acaaagatattagtacaaaatacgtgacgtagaaagtaataatttcttgggtagtttgca 661 gttttaaaattatgttttaaaatggactatcatatgcttaccgtaacttgaaagtatttc 721 gatttcttggctttatatatcttGTGGAAAGGACGAAACACCggaagagcgagctcttct 781 gttttagagctaGAAAtagcaagttaaaataaggctagtccgttatcaacttgaaaaagt 841 ggcaccgagtcggtgcTTTTTTggtaccaggtcttgaaaggagtgggaattggctccggt 901 gcccgtcagtgggcagagcgcacatcgcccacagtccccgagaagttggggggaggggtc 961 ggcaattgaaccggtgcctagagaaggtggcgcggggtaaactgggaaagtgatgtcgtg 1021 tactggctccgcctttttcccgagggtgggggagaaccgtatataagtgcagtagtcgcc 1081 gtgaacgttctttttcgcaacgggtttgccgccagaacacaggcgtacggccaccatgga 1141 agacgccaaaaacataaagaaaggcccggcgccattctatccgctggaagatggaaccgc 1201 tggagagcaactgcataaggctatgaagagatacgccctggttcctggaacaattgcttt 1261 tacagatgcacatatcgaggtggacatcacttacgctgagtacttcgaaatgtccgttcg 1321 gttggcagaagctatgaaacgatatgggctgaatacaaatcacagaatcgtcgtatgcag 1381 tgaaaactctcttcaattctttatgccggtgttgggcgcgttatttatcggagttgcagt 1441 tgcgcccgcgaacgacatttataatgaacgtgaattgctcaacagtatgggcatttcgca 1501 gcctaccgtggtgttcgtttccaaaaaggggttgcaaaaaattttgaacgtgcaaaaaaa 1561 gctcccaatcatccaaaaaattattatcatggattctaaaacggattaccagggatttca 1621 gtcgatgtacacgttcgtcacatctcatctacctcccggttttaatgaatacgattttgt 1681 gccagagtccttcgatagggacaagacaattgcactgatcatgaactcctctggatctac 1741 
tggtctgcctaaaggtgtcgctctgcctcatagaactgcctgcgtgagattctcgcatgc 1801 cagagatcctatttttggcaatcaaatcattccggatactgcgattttaagtgttgttcc 1861 attccatcacggttttggaatgtttactacactcggatatttgatatgtggatttcgagt 1921 cgtcttaatgtatagatttgaagaGgagctgtttctgaggagccttcaggattacaagat 1981 tcaaagtgcgctgctggtgccaaccctattctccttcttcgccaaaagcactctgattga 2041 caaatacgatttatctaatttacacgaaattgcttctggtggcgctcccctctctaagga 2101 agtcggggaagcggttgccaagaggttccatctgccaggtatcaggcaaggatatgggct 2161 cactgagactacatcagctattctgattacacccgagggggatgataaaccgggcgcggt 2221 cggtaaagttgttccattttttgaagcgaaggttgtggatctggataccgggaaaacgct 2281 gggcgttaatcaaagaggcgaactgtgtgtgagaggtcctatgattatgtccggttatgt 2341 aaacaatccggaagcgaccaacgccttgattgacaaggatggatggctacattctggaga 2401 catagcttactgggacgaagacgaacacttcttcatcgttgaccgcctgaagtctctgat 2461 taagtacaaaggctatcaggtggctcccgctgaattggaatccatcttgctccaacaccc 2521 caacatcttcgacgcaggtgtcgcaggtcttcccgacgatgacgccggtgaacttcccgc 2581 cgccgttgttgttttggagcacggaaagacgatgacggaaaaagagatcgtggattacgt 2641 cgccagtcaagtaacaaccgcgaaaaagttgcgcggaggagttgtgtttgtggacgaagt 2701 accgaaaggtcttaccggaaaactcgacgcaagaaaaatcagagagatcctcataaaggc 2761 caagaagggcggaaagatcgccgtgGCTAGCggaagcggagccactaacttctccctgtt 2821 gaaacaagcaggggatgtcgaagagaatcccgggccacccaagaagaagaggaaggtgtc 2881 caatctcctgactgttcaccagaacctccctgcgctgccagtagatgccactagcgatga 2941 ggtcaggaaaaatctcatggatatgtttagggatagacaggcgttttctgaacacacctg 3001 gaaaatgctgcttagcgtgtgccgatcctgggcagcctggtgtaagctgaacaatcgcaa 3061 atggttccccgccgagccggaggacgtgcgcgattacctgctgtatctccaggcaagagg 3121 gctggctgtcaagactatccagcagcacttgggccaactgaatatgctgcatcgacgcag 3181 cgggctcccccggcctagcgattcaaacgcagtctcccttgttatgaggagaattagaaa 3241 ggaaaacgtagatgcgggtgagagggctaagcaggctctcgcttttgagcggactgattt 3301 cgaccaggtcagatccctgatggagaacagcgatcggtgccaggacatcaggaacctcgc 3361 atttctgggaattgcatataacacacttctgcgcatagctgagatcgcccggatcagagt 3421 gaaagacatcagtcgaacggacggcggccggatgcttattcatattggacgcacaaagac 3481 attggtcagcaccgctggcgttgaaaaggccttgtccctgggcgtaacgaagctggtgga 3541 
aagatggatctcagtgtccggcgtggctgacgaccctaataattacttgttctgtcgagt 3601 gagaaaaaacggagtcgccgcgccctctgccaccagccaattgagtacacgggcccttga 3661 agggatctttgaggcaacccaccgactcatatacggagccaaggatgacagtggccagag 3721 gtatctcgcctggtcaggtcattctgctagggtgggggccgcacgagacatggcgcgggc 3781 aggagtctccataccagagattatgcaagctggaggttggacaaatgtgaacatcgttat 3841 gaactatatccgcaatcttgactctgaaaccggggccatggtgagactgctcgaagatgg 3901 tgactacccatacgatgttccagattacgctTAAGAATTCgatatcaagcttAATAAAAG 3961 ATCTTTATTTTCATTAGATCTGTGTGTTGGTTTTTTGTGTggtaaccacgtgcggaccga 4021 gcggccgcaggaacccctagtgatggagttggccactccctctctgcgcgctcgctcgct 4081 cactgaggccgggcgaccaaaggtcgcccgacgcccgggctttgcccgggcggcctcagt 4141 gagcgagcgagcgcgcagctgcctgcaggggcgcctgatgcggtattttctccttacgca 4201 tctgtgcggtatttcacaccgcatacgtcaaagcaaccatagtacgcgccctgtagcggc 4261 gcattaagcgcggcgggtgtggtggttacgcgcagcgtgaccgctacacttgccagcgcc 4321 ctagcgcccgctcctttcgctttcttcccttcctttctcgccacgttcgccggctttccc 4381 cgtcaagctctaaatcgggggctccctttagggttccgatttagtgctttacggcacctc 4441 gaccccaaaaaacttgatttgggtgatggttcacgtagtgggccatcgccctgatagacg 4501 gtttttcgccctttgacgttggagtccacgttctttaatagtggactcttgttccaaact 4561 ggaacaacactcaaccctatctcgggctattcttttgatttataagggattttgccgatt 4621 tcggcctattggttaaaaaatgagctgatttaacaaaaatttaacgcgaattttaacaaa 4681 atattaacgtttacaattttatggtgcactctcagtacaatctgctctgatgccgcatag 4741 ttaagccagccccgacacccgccaacacccgctgacgcgccctgacgggcttgtctgctc 4801 ccggcatccgcttacagacaagctgtgaccgtctccgggagctgcatgtgtcagaggttt 4861 tcaccgtcatcaccgaaacgcgcgagacgaaagggcctcgtgatacgcctatttttatag 4921 gttaatgtcatgataataatggtttcttagacgtcaggtggcacttttcggggaaatgtg 4981 cgcggaacccctatttgtttatttttctaaatacattcaaatatgtatccgctcatgaga 5041 caataaccctgataaatgcttcaataatattgaaaaaggaagagtatgagtattcaacat 5101 ttccgtgtcgcccttattcccttttttgcggcattttgccttcctgtttttgctcaccca 5161 gaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgcacgagtgggttacatc 5221 gaactggatctcaacagcggtaagatccttgagagttttcgccccgaagaacgttttcca 5281 atgatgagcacttttaaagttctgctatgtggcgcggtattatcccgtattgacgccggg 5341 
caagagcaactcggtcgccgcatacactattctcagaatgacttggttgagtactcacca 5401 gtcacagaaaagcatcttacggatggcatgacagtaagagaattatgcagtgctgccata 5461 accatgagtgataacactgcggccaacttacttctgacaacgatcggaggaccgaaggag 5521 ctaaccgcttttttgcacaacatgggggatcatgtaactcgccttgatcgttgggaaccg 5581 gagctgaatgaagccataccaaacgacgagcgtgacaccacgatgcctgtagcaatggca 5641 acaacgttgcgcaaactattaactggcgaactacttactctagcttcccggcaacaatta 5701 atagactggatggaggcggataaagttgcaggaccacttctgcgctcggcccttccggct 5761 ggctggtttattgctgataaatctggagccggtgagcgtgggtctcgcggtatcattgca 5821 gcactggggccagatggtaagccctcccgtatcgtagttatctacacgacggggagtcag 5881 gcaactatggatgaacgaaatagacagatcgctgagataggtgcctcactgattaagcat 5941 tggtaactgtcagaccaagtttactcatatatactttagattgatttaaaacttcatttt 6001 taatttaaaaggatctaggtgaagatcctttttgataatctcatgaccaaaatcccttaa 6061 cgtgagttttcgttccactgagcgtcagaccccgtagaaaagatcaaaggatcttcttga 6121 gatcctttttttctgcgcgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcg 6181 gtggtttgtttgccggatcaagagctaccaactctttttccgaaggtaactggcttcagc 6241 agagcgcagataccaaatactgtccttctagtgtagccgtagttaggccaccacttcaag 6301 aactctgtagcaccgcctacatacctcgctctgctaatcctgttaccagtggctgctgcc 6361 agtggcgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataaggcg 6421 cagcggtcgggctgaacggggggttcgtgcacacagcccagcttggagcgaacgacctac 6481 accgaactgagatacctacagcgtgagctatgagaaagcgccacgcttcccgaagggaga 6541 aaggcggacaggtatccggtaagcggcagggtcggaacaggagagcgcacgagggagctt 6601 ccagggggaaacgcctggtatctttatagtcctgtcgggtttcgccacctctgacttgag 6661 cgtcgatttttgtgatgctcgtcaggggggcggagcctatggaaaaacgccagcaacgcg 6721 gcctttttacggttcctggccttttgctggccttttgctcacatgt VectorpAAV-sgRNA-TBG-Cre: (SEQIDNO:556) 1 cctgcaggcagctgcgcgctcgctcgctcactgaggccgcccgggcaaagcccgggcgtc 61 gggcgacctttggtcgcccggcctcagtgagcgagcgagcgcgcagagagggagtggcca 121 actccatcactaggggttcctgcggccgcacgcgtgagggcctatttcccatgattcctt 181 catatttgcatatacgatacaaggctgttagagagataattggaattaatttgactgtaa 241 acacaaagatattagtacaaaatacgtgacgtagaaagtaataatttcttgggtagtttg 301 cagttttaaaattatgttttaaaatggactatcatatgcttaccgtaacttgaaagtatt 361 
tcgatttcttggctttatatatcttGTGGAAAGGACGAAACACCGTGTAATAGCTCCTGC 421 ATGGgttttagagctaGAAAtagcaagttaaaataaggctagtccgttatcaacttgaaa 481 aagtggcaccgagtcggtgcTTTTTTtctagaagagggcctatttcccatgattccttca 541 tatttgcatatacgatacaaggctgttagagagataattggaattaatttgactgtaaac 601 acaaagatattagtacaaaatacgtgacgtagaaagtaataatttcttgggtagtttgca 661 gttttaaaattatgttttaaaatggactatcatatgcttaccgtaacttgaaagtatttc 721 gatttcttggctttatatatcttGTGGAAAGGACGAAACACCggaagagcgagctcttct 781 gttttagagctaGAAAtagcaagttaaaataaggctagtccgttatcaacttgaaaaagt 841 ggcaccgagtcggtgcTTTTTTggtaccgcggcctctagactcgaggggctggaagctac 901 ctttgacatcatttcctctgcgaatgcatgtataatttctacagaacctattagaaagga 961 tcacccagcctctgcttttgtacaactttcccttaaaaaactgccaattccactgctgtt 1021 tggcccaatagtgagaactttttcctgctgcctcttggtgcttttgcctatggcccctat 1081 tctgcctgctgaagacactcttgccagcatggacttaaacccctccagctctgacaatcc 1141 tctttctcttttgttttacatgaagggtctggcagccaaagcaatcactcaaagttcaaa 1201 ccttatcattttttgctttgttcctcttggccttggttttgtacatcagctttgaaaata 1261 ccatcccagggttaatgctggggttaatttataactaagagtgctctagttttgcaatac 1321 aggacatgctataaaaatggaaagataccggtgccaccatggccccaaagGTTAACcgta 1381 cggccaccatggaagacgccaaaaacataaagaaaggcccggcgccattctatccgctgg 1441 aagatggaaccgctggagagcaactgcataaggctatgaagagatacgccctggttcctg 1501 gaacaattgcttttacagatgcacatatcgaggtggacatcacttacgctgagtacttcg 1561 aaatgtccgttcggttggcagaagctatgaaacgatatgggctgaatacaaatcacagaa 1621 tcgtcgtatgcagtgaaaactctcttcaattctttatgccggtgttgggcgcgttattta 1681 tcggagttgcagttgcgcccgcgaacgacatttataatgaacgtgaattgctcaacagta 1741 tgggcatttcgcagcctaccgtggtgttcgtttccaaaaaggggttgcaaaaaattttga 1801 acgtgcaaaaaaagctcccaatcatccaaaaaattattatcatggattctaaaacggatt 1861 accagggatttcagtcgatgtacacgttcgtcacatctcatctacctcccggttttaatg 1921 aatacgattttgtgccagagtccttcgatagggacaagacaattgcactgatcatgaact 1981 cctctggatctactggtctgcctaaaggtgtcgctctgcctcatagaactgcctgcgtga 2041 gattctcgcatgccagagatcctatttttggcaatcaaatcattccggatactgcgattt 2101 taagtgttgttccattccatcacggttttggaatgtttactacactcggatatttgatat 2161 
gtggatttcgagtcgtcttaatgtatagatttgaagaGgagctgtttctgaggagccttc 2221 aggattacaagattcaaagtgcgctgctggtgccaaccctattctccttcttcgccaaaa 2281 gcactctgattgacaaatacgatttatctaatttacacgaaattgcttctggtggcgctc 2341 ccctctctaaggaagtcggggaagcggttgccaagaggttccatctgccaggtatcaggc 2401 aaggatatgggctcactgagactacatcagctattctgattacacccgagggggatgata 2461 aaccgggcgcggtcggtaaagttgttccattttttgaagcgaaggttgtggatctggata 2521 ccgggaaaacgctgggcgttaatcaaagaggcgaactgtgtgtgagaggtcctatgatta 2581 tgtccggttatgtaaacaatccggaagcgaccaacgccttgattgacaaggatggatggc 2641 tacattctggagacatagcttactgggacgaagacgaacacttcttcatcgttgaccgcc 2701 tgaagtctctgattaagtacaaaggctatcaggtggctcccgctgaattggaatccatct 2761 tgctccaacaccccaacatcttcgacgcaggtgtcgcaggtcttcccgacgatgacgccg 2821 gtgaacttcccgccgccgttgttgttttggagcacggaaagacgatgacggaaaaagaga 2881 tcgtggattacgtcgccagtcaagtaacaaccgcgaaaaagttgcgcggaggagttgtgt 2941 ttgtggacgaagtaccgaaaggtcttaccggaaaactcgacgcaagaaaaatcagagaga 3001 tcctcataaaggccaagaagggcggaaagatcgccgtgGCTAGCggaagcggagccacta 3061 acttctccctgttgaaacaagcaggggatgtcgaagagaatcccgggccacccaagaaga 3121 agaggaaggtgtccaatctcctgactgttcaccagaacctccctgcgctgccagtagatg 3181 ccactagcgatgaggtcaggaaaaatctcatggatatgtttagggatagacaggcgtttt 3241 ctgaacacacctggaaaatgctgcttagcgtgtgccgatcctgggcagcctggtgtaagc 3301 tgaacaatcgcaaatggttccccgccgagccggaggacgtgcgcgattacctgctgtatc 3361 tccaggcaagagggctggctgtcaagactatccagcagcacttgggccaactgaatatgc 3421 tgcatcgacgcagcgggctcccccggcctagcgattcaaacgcagtctcccttgttatga 3481 ggagaattagaaaggaaaacgtagatgcgggtgagagggctaagcaggctctcgcttttg 3541 agcggactgatttcgaccaggtcagatccctgatggagaacagcgatcggtgccaggaca 3601 tcaggaacctcgcatttctgggaattgcatataacacacttctgcgcatagctgagatcg 3661 cccggatcagagtgaaagacatcagtcgaacggacggcggccggatgcttattcatattg 3721 gacgcacaaagacattggtcagcaccgctggcgttgaaaaggccttgtccctgggcgtaa 3781 cgaagctggtggaaagatggatctcagtgtccggcgtggctgacgaccctaataattact 3841 tgttctgtcgagtgagaaaaaacggagtcgccgcgccctctgccaccagccaattgagta 3901 cacgggcccttgaagggatctttgaggcaacccaccgactcatatacggagccaaggatg 3961 
acagtggccagaggtatctcgcctggtcaggtcattctgctagggtgggggccgcacgag 4021 acatggcgcgggcaggagtctccataccagagattatgcaagctggaggttggacaaatg 4081 tgaacatcgttatgaactatatccgcaatcttgactctgaaaccggggccatggtgagac 4141 tgctcgaagatggtgactacccatacgatgttccagattacgctTAAGAATTCgatatca 4201 agcttAATAAAAGATCTTTATTTTCATTAGATCTGTGTGTTGGTTTTTTGTGTggtaacc 4261 acgtgcggaccgagcggccgcaggaacccctagtgatggagttggccactccctctctgc 4321 gcgctcgctcgctcactgaggccgggcgaccaaaggtcgcccgacgcccgggctttgccc 4381 gggcggcctcagtgagcgagcgagcgcgcagctgcctgcaggggcgcctgatgcggtatt 4441 ttctccttacgcatctgtgcggtatttcacaccgcatacgtcaaagcaaccatagtacgc 4501 gccctgtagcggcgcattaagcgcggcgggtgtggtggttacgcgcagcgtgaccgctac 4561 acttgccagcgccctagcgcccgctcctttcgctttcttcccttcctttctcgccacgtt 4621 cgccggctttccccgtcaagctctaaatcgggggctccctttagggttccgatttagtgc 4681 tttacggcacctcgaccccaaaaaacttgatttgggtgatggttcacgtagtgggccatc 4741 gccctgatagacggtttttcgccctttgacgttggagtccacgttctttaatagtggact 4801 cttgttccaaactggaacaacactcaaccctatctcgggctattcttttgatttataagg 4861 gattttgccgatttcggcctattggttaaaaaatgagctgatttaacaaaaatttaacgc 4921 gaattttaacaaaatattaacgtttacaattttatggtgcactctcagtacaatctgctc 4981 tgatgccgcatagttaagccagccccgacacccgccaacacccgctgacgcgccctgacg 5041 ggcttgtctgctcccggcatccgcttacagacaagctgtgaccgtctccgggagctgcat 5101 gtgtcagaggttttcaccgtcatcaccgaaacgcgcgagacgaaagggcctcgtgatacg 5161 cctatttttataggttaatgtcatgataataatggtttcttagacgtcaggtggcacttt 5221 tcggggaaatgtgcgcggaacccctatttgtttatttttctaaatacattcaaatatgta 5281 tccgctcatgagacaataaccctgataaatgcttcaataatattgaaaaaggaagagtat 5341 gagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcctgt 5401 ttttgctcacccagaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgcacg 5461 agtgggttacatcgaactggatctcaacagcggtaagatccttgagagttttcgccccga 5521 agaacgttttccaatgatgagcacttttaaagttctgctatgtggcgcggtattatcccg 5581 tattgacgccgggcaagagcaactcggtcgccgcatacactattctcagaatgacttggt 5641 tgagtactcaccagtcacagaaaagcatcttacggatggcatgacagtaagagaattatg 5701 cagtgctgccataaccatgagtgataacactgcggccaacttacttctgacaacgatcgg 5761 
aggaccgaaggagctaaccgcttttttgcacaacatgggggatcatgtaactcgccttga 5821 tcgttgggaaccggagctgaatgaagccataccaaacgacgagcgtgacaccacgatgcc 5881 tgtagcaatggcaacaacgttgcgcaaactattaactggcgaactacttactctagcttc 5941 ccggcaacaattaatagactggatggaggcggataaagttgcaggaccacttctgcgctc 6001 ggcccttccggctggctggtttattgctgataaatctggagccggtgagcgtgggtctcg 6061 cggtatcattgcagcactggggccagatggtaagccctcccgtatcgtagttatctacac 6121 gacggggagtcaggcaactatggatgaacgaaatagacagatcgctgagataggtgcctc 6181 actgattaagcattggtaactgtcagaccaagtttactcatatatactttagattgattt 6241 aaaacttcatttttaatttaaaaggatctaggtgaagatcctttttgataatctcatgac 6301 caaaatcccttaacgtgagttttcgttccactgagcgtcagaccccgtagaaaagatcaa 6361 aggatcttcttgagatcctttttttctgcgcgtaatctgctgcttgcaaacaaaaaaacc 6421 accgctaccagcggtggtttgtttgccggatcaagagctaccaactctttttccgaaggt 6481 aactggcttcagcagagcgcagataccaaatactgtccttctagtgtagccgtagttagg 6541 ccaccacttcaagaactctgtagcaccgcctacatacctcgctctgctaatcctgttacc 6601 agtggctgctgccagtggcgataagtcgtgtcttaccgggttggactcaagacgatagtt 6661 accggataaggcgcagcggtcgggctgaacggggggttcgtgcacacagcccagcttgga 6721 gcgaacgacctacaccgaactgagatacctacagcgtgagctatgagaaagcgccacgct 6781 tcccgaagggagaaaggcggacaggtatccggtaagcggcagggtcggaacaggagagcg 6841 cacgagggagcttccagggggaaacgcctggtatctttatagtcctgtcgggtttcgcca 6901 cctctgacttgagcgtcgatttttgtgatgctcgtcaggggggcggagcctatggaaaaa 6961 cgccagcaacgcggcctttttacggttcctggccttttgctggccttttgctcacatgt TBG: (SEQIDNO:557) gcggcctctagactcgaggggctggaagctacctttgacatcatttcctctgcgaatgcatgtataatttct acagaacctattagaaaggatcacccagcctctgcttttgtacaactttcccttaaaaaactgccaattcca ctgctgtttggcccaatagtgagaactttttcctgctgcctcttggtgcttttgcctatggcccctattctg cctgctgaagacactcttgccagcatggacttaaacccctccagctctgacaatcctattctcttttgtttt acatgaagggtctggcagccaaagcaatcactcaaagttcaaaccttatcattttttgctttgttcctcttg gccttggttttgtacatcagctttgaaaataccatcccagggttaatgctggggttaatttataactaagag tgctctagttttgcaatacaggacatgctataaaaatggaaagataccggtgccaccatggccccaaag
[0134] AAV-mTSG Viral Library Production:
[0135] The AAV-CRISPR plasmid vector (AAV-vector) and library (AAV-mTSG) were subjected to AAV9 production and chemical purification. Briefly, HEK 293FT cells (ThermoFisher) were transiently transfected with AAV-vector or AAV-mTSG, AAV9 serotype plasmid and pDF6 using polyethyleneimine (PEI). Each replicate consisted of five 15-cm tissue culture dishes or T-175 flasks (Corning) of 80% confluent HEK 293FT cells. Multiple replicates were pooled to enhance production yield. Approximately 72 hours post transfection, cells were dislodged and transferred to a conical tube in sterile PBS. 1/10 volume of pure chloroform was added and the mixture was incubated at 37° C. and vigorously shaken for 1 hour. NaCl was added to a final concentration of 1 M and the mixture was shaken until dissolved and then pelleted at 20,000 g at 4° C. for 15 minutes. The aqueous layer was discarded while the chloroform layer was transferred to another tube. PEG8000 was added to 10% (w/v) and shaken until dissolved. The mixture was incubated at 4° C. for 1 hour and then spun at 20,000 g at 4° C. for 15 minutes. The supernatant was discarded and the pellet was resuspended in DPBS plus MgCl.sub.2, treated with Benzonase (Sigma), and incubated at 37° C. for 30 minutes. Chloroform (1:1 volume) was then added, and the mixture was shaken and spun down at 12,000 g at 4° C. for 15 minutes. The aqueous layer was isolated and passed through a 100 kDa MWCO filter (Millipore). The concentrated solution was washed with PBS and the filtration process was repeated. Genomic copy number (GC) of AAV was titrated by real-time quantitative PCR (qPCR) using custom TaqMan assays (ThermoFisher) targeted to Cre.
[0136] Intravenous (i.v.) Virus Injection for Liver Transduction:
[0137] Conditional LSL-Cas9 knock-in mice were bred in a mixed 129/C57BL/6 background. Mixed gender (randomized males and females) 8-14 week old mice were used in experiments. Mice were maintained and bred in standard individualized cages with a maximum of 5 mice per cage, at regular room temperature (65-75° F., or 18-23° C.), 40-60% humidity, and a 12 h:12 h light cycle. To intravenously inject AAVs, mice were restrained in a rodent restrainer (Braintree Scientific), their tail veins were dilated using a heat lamp or warm water, the tails were sterilized with 70% ethanol, and 200 microliters of concentrated AAV (1e10 GC/µL, 2e12 GC per mouse) was injected into the tail vein of each mouse. 100% of the mice survived the procedure. Animals that failed injections (<70% of total volume injected into tail vein after multiple attempts) were excluded from the study. No specific methods were implemented to choose sample sizes.
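The dose arithmetic in the preceding paragraph can be checked with a short sketch. It assumes the stated titer is read as 1e10 GC per microliter, which is the value consistent with 2e12 GC in a 200 microliter injection; the function name is hypothetical.

```python
def total_genome_copies(titer_gc_per_ul: float, volume_ul: float) -> float:
    """Total AAV genome copies (GC) delivered in one injection."""
    return titer_gc_per_ul * volume_ul


# Full 200 microliter injection at the stated titer.
full_dose = total_genome_copies(1e10, 200)

# Smallest injection still retained in the study (70% of the volume).
minimum_dose = total_genome_copies(1e10, 0.7 * 200)

print(f"full dose: {full_dose:.1e} GC, minimum retained dose: {minimum_dose:.1e} GC")
```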
[0138] MRI:
[0139] MRI imaging was performed using a standard imaging protocol with MRI machines (Varian 7T/310/ASR whole mouse MRI system, or Bruker 9.4T horizontal small animal systems). Briefly, animals were anesthetized using isoflurane, and positioned in the imaging bed with a nosecone providing constant isoflurane. A total of 20-30 frontal views were acquired for each mouse using a custom setting: echo time (TE)=20, repetition time (TR)=2000, slicing=1.0 mm. Raw image stacks were processed using Osirix or Slicer tools. Rendering and quantification were performed using Slicer (slicer dot org). Tumor size was calculated with the following formula: Volume (mm.sup.3)=1/6*3.14*length (mm)*height (mm)*depth (mm). Statistical significance was assessed by the non-parametric Mann-Whitney test, as sample numbers and sample distributions varied across treatment conditions.
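As an illustration of the tumor size calculation, the sketch below assumes the standard ellipsoid approximation, in which the leading coefficient is 1/6 and pi is rounded to 3.14 as in the text. The function name and the example measurements are hypothetical.

```python
def tumor_volume_mm3(length_mm: float, height_mm: float, depth_mm: float,
                     pi_approx: float = 3.14) -> float:
    """Ellipsoid approximation: V = 1/6 * pi * length * height * depth."""
    return (1.0 / 6.0) * pi_approx * length_mm * height_mm * depth_mm


# Hypothetical tumor measured as 6 mm x 4 mm x 3 mm on the MRI stack.
print(round(tumor_volume_mm3(6.0, 4.0, 3.0), 2), "mm^3")
```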
[0140] Survival Analysis:
[0141] LSL-Cas9 mice receiving AAV-mTSG i.v. injections rapidly deteriorated in their body condition scores (due to tumor development in most cases). Mice with body condition score (BCS)<2 were euthanized and the euthanasia date was recorded as the last survival date. Occasionally mice bearing tumors died unexpectedly early, and the date of death was recorded as the last survival date. Cohorts of mice intravenously injected with PBS, AAV-vector or AAV-mTSG virus were monitored for their survival. Survival was analyzed by the standard Kaplan-Meier method, using the survival and survminer R packages. Differences among the three treatment groups were assessed by log-rank test. Of note, several AAV-vector or PBS injected mice were sacrificed at time points earlier than the last day of survival analysis (at times when certain AAV-mTSG mice were found dead or euthanized due to poor body conditions), to provide time-matched histology, even though those mice presented with good body condition (BCS≥4). Mice euthanized early in a healthy state were excluded from calculation of survival percentages.
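For readers unfamiliar with the Kaplan-Meier estimator, a minimal sketch follows. It is written in plain Python for illustration and is not the survival/survminer R workflow named above; the follow-up times and event flags are hypothetical. The sketch supports censored observations in general, whereas in the study healthy mice euthanized early were excluded rather than censored.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time of each animal (e.g. days after injection)
    events : 1 if death or euthanasia for poor body condition was observed,
             0 if the animal was censored at that time
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every animal leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical cohort of five mice.
for day, prob in kaplan_meier([30, 45, 45, 60, 90], [1, 1, 0, 1, 0]):
    print(day, round(prob, 3))
```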
[0142] Mouse Organ Dissection, Fluorescent Imaging, and Histology:
[0143] Mice were sacrificed by carbon dioxide asphyxiation or deep anesthesia with isoflurane followed by cervical dislocation. Mouse livers and other organs were manually dissected and examined under a fluorescent stereoscope (Zeiss, Olympus or Leica). Brightfield and/or GFP fluorescent images were taken for the dissected organs, and overlaid using ImageJ. Organs were then fixed in 4% formaldehyde or 10% formalin for 48 to 96 hours, embedded in paraffin, sectioned at 6 µm and stained with hematoxylin and eosin (H&E) for pathology. For tumor size quantification, H&E slides were scanned using an Aperio digital slide scanner (Leica). Tumors were manually outlined as region-of-interest (ROI), and subsequently quantified using ImageScope (Leica). Statistical significance was assessed by Welch's t-test, given the unequal sample numbers and variances for each treatment condition.
[0144] Mouse Tissue Collection for Molecular Biology:
[0145] Mouse livers and various other organs were dissected and collected manually. For molecular biology, tissues were flash frozen with liquid nitrogen, ground in 24 Well Polyethylene Vials with metal beads in a GenoGrinder machine (OPS diagnostics). Homogenized tissues were used for DNA/RNA/protein extractions using standard molecular biology protocols.
[0146] Genomic DNA Extraction from Cells and Mouse Tissues:
[0147] For genomic DNA extraction, 50-200 mg of frozen ground tissue were resuspended in 6 ml of Lysis Buffer (50 mM Tris, 50 mM EDTA, 1% SDS, pH 8) in a 15 ml conical tube, and 30 µl of 20 mg/ml Proteinase K (Qiagen) were added to the tissue/cell sample and incubated at 55° C. overnight. The next day, 30 µl of 10 mg/ml RNAse A (Qiagen) was added to the lysed sample, which was then inverted 25 times and incubated at 37° C. for 30 minutes. Samples were cooled on ice before addition of 2 ml of pre-chilled 7.5 M ammonium acetate (Sigma) to precipitate proteins. The samples were vortexed at high speed for 20 seconds and then centrifuged at >4,000 g for 10 minutes. Then, a tight pellet was visible in each tube and the supernatant was carefully decanted into a new 15 ml conical tube. Then 6 ml of 100% isopropanol was added to the tube, which was inverted 50 times and centrifuged at >4,000 g for 10 minutes. Genomic DNA was visible as a small white pellet in each tube. The supernatant was discarded, 6 ml of freshly prepared 70% ethanol was added, the tube was inverted 10 times, and then centrifuged at >4,000 g for 1 minute. The supernatant was discarded by pouring; the tube was briefly spun, and remaining ethanol was removed using a P200 pipette. After air-drying for 10-30 minutes, the DNA changed appearance from a milky white pellet to slightly translucent. Then, 500 µl of ddH.sub.2O was added, and the tube was incubated at 65° C. for 1 hour and at room temperature overnight to fully resuspend the DNA. The next day, the gDNA samples were vortexed briefly. The gDNA concentration was measured using a Nanodrop (Thermo Scientific).
[0148] Molecular Inversion Probe (MIP) Design and Synthesis:
[0149] MIPs were designed according to previously published protocols (Hardenbol, P. et al., Nat. Biotechnol. 21, 673-678 (2003); O'Roak, B. J. et al., Science 338, 1619-1622 (2012)). Briefly, the 70 bp flanking the predicted cut site of each of the 278 unique sgRNAs were chosen as targeting regions, and the bed file with these coordinates was used as an input. Since Trp53 sg4 targets a similar region as the p53 sgRNA within the base vector, the same MIP was used to sequence both of these loci.
[0150] These coordinates contained overlapping regions which were subsequently merged into 173 unique regions. Each probe contains an extension probe sequence, a ligation probe sequence, and a 7 bp degenerate barcode (NNNNNNN) for PCR duplicate removal. A total of 266 MIP probes were designed covering a total amplicon of 42,478 bp. MIP target size statistics: min=155 bp, max=190 bp, mean=159.7 bp, median=156.0 bp. Each of the mTSG-MIPs was synthesized using standard oligo synthesis with IDT, normalized and pooled.
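The step of building flanking windows around each predicted cut site and merging the overlapping ones can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the probe design software used: the coordinates are hypothetical, the window is taken as 70 bp on each side of the cut site, and the helper names are invented for the example.

```python
def flank_window(chrom, cut_site, flank=70):
    """BED-like (chrom, start, end) window centered on a predicted cut site."""
    return (chrom, max(0, cut_site - flank), cut_site + flank)


def merge_regions(regions):
    """Merge overlapping or touching windows on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical cut sites: the first two lie close enough to share a region.
cut_sites = [("chr1", 1000), ("chr1", 1100), ("chr1", 2000), ("chr2", 1000)]
windows = [flank_window(chrom, pos) for chrom, pos in cut_sites]
for region in merge_regions(windows):
    print(*region, sep="\t")
```

Sorting first means each window only has to be compared with the most recently merged region, so the merge runs in a single pass.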
[0151] MIP Capture Sequencing:
[0152] 150 ng of genomic DNA sample from each mouse organ was used as input. MIP capture sequencing was done according to previously published protocols (Hardenbol, P. et al., Nat. Biotechnol. 21, 673-678 (2003); O'Roak, B. J. et al., Science 338, 1619-1622 (2012)) with some slight modifications. The multiplexed library was then quality controlled using qPCR, and subjected to high-throughput sequencing using the HiSeq-2500 or HiSeq-4000 platforms (Illumina) at Yale Center for Genome Analysis. 280/281 (99.6%) of targeted sgRNAs were captured for all samples from this experiment, with the missing one being Arid1a sg5.
[0153] Illumina Sequencing Data Pre-Processing:
[0154] FASTQ reads were mapped to the mm10 genome using the bwa mem function in BWA v0.7.13. Bam files were merged, sorted, and indexed using bamtools v2.4.0 and samtools v1.3.
[0155] Variant Calling:
[0156] For each sample, indel variants were called using samtools and VarScan v2.3.9. Specifically, samtools mpileup (-d 1000000000 -B -q 10) was used, and the output piped to VarScan pileup2indel (--min-coverage 1 --min-reads2 1 --min-var-freq 0.001 --p-value 0.05). To link each indel to the sgRNA that most likely caused the mutation, the center position of each indel was mapped to the closest sgRNA cut site.
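The assignment of each indel to its nearest sgRNA cut site can be illustrated with the sketch below. It assumes indels are represented by start and end coordinates and cut sites by a single genomic position; the sgRNA names, coordinates, and function name are hypothetical.

```python
def closest_cut_site(chrom, indel_start, indel_end, cut_sites):
    """Return the sgRNA whose cut site is nearest to the indel center.

    cut_sites maps sgRNA name -> (chromosome, cut position).
    Returns None when no cut site lies on the indel's chromosome.
    """
    center = (indel_start + indel_end) / 2
    best_name, best_dist = None, None
    for name, (site_chrom, site_pos) in cut_sites.items():
        if site_chrom != chrom:
            continue
        dist = abs(center - site_pos)
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Hypothetical cut sites for three sgRNAs.
sites = {
    "GeneA_sg1": ("chr1", 1000),
    "GeneA_sg2": ("chr1", 1040),
    "GeneB_sg1": ("chr2", 1010),
}
print(closest_cut_site("chr1", 1030, 1036, sites))
```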
[0157] Calling Mutated sgRNA Sites and Mutated Genes:
[0158] All detected indels were further filtered by requiring that each indel must overlap the ±3 basepair flank of the closest sgRNA cut site, as Cas9-induced double-strand breaks are expected to occur within a narrow window of the predicted cut site. To exclude any possible germline mutations, any sgRNAs with indels present in more than half of the control samples with greater than 5% variant frequency were removed. In particular, high variant frequencies were observed across all samples at the Rps19 sg5 cut site, suggesting these were germline variants; thus, Rps19 sg5 was excluded from all analyses.
[0159] To determine significantly mutated sgRNA sites in each liver sample, a false-discovery approach was used based on the PBS and vector control samples. For each sgRNA, the highest % variant read frequency across all control liver samples was first taken; for a mutation to be called in an mTSG sample, the % variant read frequency had to exceed this control sample cutoff. However, since the base vector contained a Trp53 sgRNA (p53 sg8) whose cut site was only 1 bp away from the target site of Trp53 sg4 (from the mTSG library), only PBS samples were considered when calculating the false-discovery cutoff for Trp53 sg4. To identify the dominant clones in each sample, a 5% variant frequency cutoff was set on top of the false-discovery cutoff. These criteria yielded a binary table (i.e. not significantly mutated vs. significantly mutated) detailing each sgRNA and whether its target site was significantly mutated in each sample. To convert significantly mutated sgRNA sites into significantly mutated genes, the binary sgRNA scores were collapsed by gene, such that if any of the 5 sgRNAs for a gene was found to be significantly cutting, the entire gene would be called as significantly mutated.
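The two filters described in the preceding paragraph can be sketched as small predicates. The sketch assumes the flank is 3 bp on each side of the cut site and that variant frequencies are given in percent; function names and example values are hypothetical.

```python
def overlaps_cut_flank(indel_start, indel_end, cut_site, flank=3):
    """True if the indel span overlaps [cut_site - flank, cut_site + flank]."""
    return indel_start <= cut_site + flank and indel_end >= cut_site - flank


def looks_germline(control_freqs_pct, min_freq_pct=5.0):
    """True if more than half of the control samples exceed min_freq_pct."""
    above = sum(1 for f in control_freqs_pct if f > min_freq_pct)
    return above > len(control_freqs_pct) / 2


# Hypothetical indel ending 2 bp before a cut site at position 107: kept.
print(overlaps_cut_flank(100, 105, 107))
# Hypothetical sgRNA with high variant frequency in 3 of 4 controls: removed.
print(looks_germline([12.0, 30.0, 0.5, 8.0]))
```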
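A minimal sketch of the calling logic follows, assuming variant frequencies are supplied in percent per sgRNA and that the 5% dominant-clone cutoff is inclusive (the text does not say whether it is strict). The sgRNA names follow the `Gene_sgN` pattern of Table 1; the data values and function names are hypothetical.

```python
def call_mutated_sgrnas(sample_freqs, control_freqs, dominant_cutoff=5.0):
    """Binary call per sgRNA for one mTSG sample.

    sample_freqs  : {sgRNA: % variant frequency in the sample}
    control_freqs : {sgRNA: [% variant frequencies across control samples]}
    """
    calls = {}
    for sgrna, freq in sample_freqs.items():
        # Control-derived cutoff: highest frequency seen in any control.
        background = max(control_freqs.get(sgrna, []), default=0.0)
        calls[sgrna] = freq > background and freq >= dominant_cutoff
    return calls


def collapse_to_genes(calls):
    """A gene is called mutated if any of its sgRNAs is called mutated."""
    genes = {}
    for sgrna, mutated in calls.items():
        gene = sgrna.rsplit("_sg", 1)[0]
        genes[gene] = genes.get(gene, False) or mutated
    return genes


sample = {"Trp53_sg1": 12.0, "Trp53_sg2": 0.2, "Pten_sg1": 3.0, "Pten_sg2": 6.0}
controls = {"Trp53_sg1": [0.1, 0.3], "Trp53_sg2": [0.0, 0.5],
            "Pten_sg1": [0.0], "Pten_sg2": [7.0, 0.1]}
print(collapse_to_genes(call_mutated_sgrnas(sample, controls)))
```

Handling the Trp53 sg4 exception would amount to passing only the PBS control frequencies for that sgRNA in `control_freqs`.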
[0160] Coding Frame Analysis:
[0161] For coding frame and exonic/intronic analysis, only indels that were associated with an sgRNA that had been considered significantly mutated in that particular sample were considered. This final set of significant indels was converted to .avinput format and subsequently annotated using ANNOVAR v. 2016 Feb. 1, using default settings.
[0162] Co-Occurrence and Correlation Analysis:
[0163] Co-occurrence analysis was performed by first generating a double-mutant count table for each pairwise combination of genes in the mTSG library. Statistical significance of the co-occurrence was assessed by two-sided hypergeometric test. To calculate co-occurrence rates, the intersection was defined as the number of double-mutant samples, and the union was defined as the number of samples with a mutation in either (or both) of the two genes; the intersection was then divided by the union. For correlation analysis, the table of variant frequencies was first collapsed to the gene level (in other words, summing the variant frequencies for all 5 of the targeting sgRNAs for each gene). Using these summed variant frequency values, the Pearson correlation was calculated between all gene pairs, across each mTSG sample. Statistical significance of the correlation was determined by converting the correlation coefficient to a t-statistic, and then using the t-distribution to find the associated probability. For both co-occurrence and correlation analyses, p-values were adjusted for multiple hypothesis testing by the Benjamini-Hochberg method to obtain q-values.
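The co-occurrence rate, hypergeometric test, and Benjamini-Hochberg adjustment can be sketched in plain Python as below. Two points are assumptions of this sketch rather than statements from the text: the two-sided p-value is taken as the total probability of all outcomes no more likely than the observed one, and the sample identifiers and counts are hypothetical.

```python
from math import comb


def cooccurrence_rate(mutated_a, mutated_b):
    """Intersection over union of the sample sets mutated in genes A and B."""
    union = mutated_a | mutated_b
    return len(mutated_a & mutated_b) / len(union) if union else 0.0


def hypergeom_pmf(k, n_total, n_a, n_b):
    """P(exactly k double mutants) given n_a and n_b single-gene mutants."""
    return comb(n_a, k) * comb(n_total - n_a, n_b - k) / comb(n_total, n_b)


def hypergeom_two_sided(k_obs, n_total, n_a, n_b):
    """Sum the probabilities of outcomes no more likely than k_obs."""
    lo, hi = max(0, n_a + n_b - n_total), min(n_a, n_b)
    p_obs = hypergeom_pmf(k_obs, n_total, n_a, n_b)
    total = sum(p for p in (hypergeom_pmf(k, n_total, n_a, n_b)
                            for k in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


gene_a = {"m1", "m2", "m3"}
gene_b = {"m2", "m3", "m4"}
print(cooccurrence_rate(gene_a, gene_b))
print(hypergeom_two_sided(len(gene_a & gene_b), 10, len(gene_a), len(gene_b)))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```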
[0164] Unique Variant Analysis:
[0165] Instead of first collapsing variant calls to the sgRNA level as above, unique variants and their associated mutant frequencies were compiled across all sequenced samples. To be considered present in a given sample, a particular variant must have a mutant frequency ≥1%. Hierarchically clustered heatmaps of the unique variant landscape were created in R using the NMF package, with average linkage and Euclidean distance.
[0166] A focused analysis on the unique variant landscape within a single mouse was also performed, as presented in
[0167] Clustering of Variant Frequencies to Infer Clonality of Tumors:
[0168] For each mTSG liver sample, the individual variants that comprised the MS calls in that sample were extracted, with a cutoff of 5% variant frequency to eliminate low-abundance variants. To identify clusters of variant frequencies in an unbiased manner, the variant frequency distribution was modeled with a Gaussian kernel density estimate, using the Sheather-Jones method to select the smoothing bandwidth. From the kernel density estimate, the local maxima (i.e. peaks) within the density function were then identified and counted. The number of peaks thus represented the number of variant frequency clusters for an individual sample, which is an approximation for the clonality of the tumors.
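The peak-counting idea can be illustrated with a small Gaussian kernel density sketch. It does not implement Sheather-Jones bandwidth selection; the bandwidth is passed in explicitly as a hypothetical value, and the variant frequencies and function names are invented for the example.

```python
from math import exp, pi, sqrt


def gaussian_kde(values, bandwidth):
    """Return a function evaluating the Gaussian kernel density estimate."""
    norm = 1.0 / (len(values) * bandwidth * sqrt(2.0 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def count_density_peaks(values, bandwidth, grid_points=512):
    """Count local maxima of the density evaluated on an even grid."""
    density = gaussian_kde(values, bandwidth)
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    step = (hi - lo) / (grid_points - 1)
    ys = [density(lo + i * step) for i in range(grid_points)]
    # "<" on the left and ">=" on the right counts a flat-topped peak once.
    return sum(1 for i in range(1, grid_points - 1)
               if ys[i - 1] < ys[i] >= ys[i + 1])


# Hypothetical variant frequencies (%) forming two clusters in one sample.
freqs = [10, 11, 12, 40, 41, 42]
print(count_density_peaks(freqs, bandwidth=2.0))
```

The number of peaks found depends on the bandwidth, which is why a data-driven selector such as the one named in the text matters in practice.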
[0169] Direct In Vivo Validation of Drivers or Combinations:
[0170] Liver-specific AAV-CRISPR vectors were designed to co-cistronically express firefly luciferase (FLuc) and Cre recombinase for induction of Cas9 expression under a TBG promoter when delivered to LSL-Cas9 mice (plasmids available at Addgene). Two sgRNA cassettes were built in these vectors, one encoding an sgRNA targeting Trp53, with the other being an open sgRNA cassette (double SapI sites for GeneX targeting sgRNA cloning). The vector was generated by gBlock gene fragment synthesis (IDT) followed by Gibson assembly (NEB). Each specific sgRNA targeting a driver gene was cloned separately into this vector. AAV9 virus was produced and qPCR-titrated as described above. 1e11 total viral particles were introduced by intravenous injection into LSL-Cas9 mice. For combinations of two AAVs, 5e10 viral particles were used from each AAV to generate equal titer mixtures and injected. Four to six mice were injected per group. Starting one month after injection, mice were imaged by IVIS each month. Briefly, mice were anesthetized by intraperitoneal injection of ketamine (100 mg/kg) and xylazine (10 mg/kg), and imaged for in vivo tumor growth using an IVIS machine (PerkinElmer) with 150 mg/kg body weight Firefly D-Luciferin potassium salt injected I.P. Relative tumor burden was quantified using LivingImage software (PerkinElmer).
[0171] LIHC comparative cancer genomics analysis and patient survival analysis using TCGA datasets: Somatic mutation calls, copy number variation calls, RNA-seq expression z-scores, and clinical data containing patient survival information were obtained through cBioPortal for liver hepatocellular carcinoma (LIHC data set) (Gao, et al., Sci. Signal. 6, pl1 (2013); Cerami, et al., Cancer Discov. 2, 401-404 (2012)) on Nov. 15, 2016. Pearson correlation coefficients were calculated comparing mouse and human mutation frequencies; statistical significance was calculated by converting the correlation coefficient to a t-statistic, and then using the t-distribution to find the associated probability. All patients with sequencing data and survival data were considered (n=372). A tumor was defined as being negative for a given gene if it had one or more of the following: 1) a non-silent somatic mutation, 2) homozygous deletion, or 3) an expression z-score < -2. On the basis of these negative vs. positive classifications, Kaplan-Meier survival analysis was performed, using the log-rank test to determine statistical significance.
[0172] The results of the experiments are now described.
[0173] A list of the top SMGs in the pan-cancer TCGA datasets was compiled. The top 50 SMGs were identified after excluding known oncogenes (
TABLE-US-00002 TABLE1 mTSGlibrary sgRNA_name Spacersequence SEQIDNO Fat1_sg1 GGGCAGTGTTTCAAAATCCA SEQIDNO:1 Fat1_sg2 GGAACACGAGCCGTCAGCGG SEQIDNO:2 Fat1_sg3 GGATTTCTGTTCTGCATCAA SEQIDNO:3 Fat1_sg4 GGTCCCATCTGTTGCCTCCA SEQIDNO:4 Fat1_sg5 GTTTGGAGATCCACTCGATA SEQIDNO:5 Arid1b_sg1 GTACCCAGTGCAAGCTACAG SEQIDNO:6 Arid1b_sg2 GGGTACCCCGCTATATGTTG SEQIDNO:7 Arid1b_sg3 GCATCTGGCCCCCAGGAGAT SEQIDNO:8 Arid1b_sg4 GGTCCGTACTGAGACATCTG SEQIDNO:9 Arid1b_sg5 GATTCTGACTGGCTTCCAGG SEQIDNO:10 Trp53_sg1 GTGAAATACTCTCCATCAAG SEQIDNO:11 Trp53_sg2 GGGAGAGGCGCTTGTGCAGG SEQIDNO:12 Trp53_sg3 GCAAAGAGAGGTACGCAGGC SEQIDNO:13 Trp53_sg4 GCGGTTCATGCCCCCCATGC SEQIDNO:14 Trp53_sg5 GGTATACTCAGAGCCGGCCT SEQIDNO:15 Grlf1_sg1 GCCATGGCTGAAGGGGAGCC SEQIDNO:16 Grlf1_sg2 GACTCTGTAGGTGACAGCAA SEQIDNO:17 Grlf1_sg3 GCTCCTAAACCTACTTGTTC SEQIDNO:18 Grlf1_sg4 GCATGCTGTATGGTACCAGG SEQIDNO:19 Grlf1_sg5 GAAGGCATCTACCGGGTCAG SEQIDNO:20 Npm1_sg1 GGACGATGATGAGGACGATG SEQIDNO:21 Npm1_sg2 GTGTAATTTCAAAGCCCCCT SEQIDNO:22 Npm1_sg3 GTGGACAGCATCTAGTAGGT SEQIDNO:23 Npm1_sg4 GGGCGGTTCTCTTCCCAAAG SEQIDNO:24 Npm1_sg5 GGAAGAGAACCGCCCTATTG SEQIDNO:25 Ep300_sg1 GCATATGCTCGTAAAGTGGA SEQIDNO:26 Ep300_sg2 GATGAGTGAAAATGCTGGTG SEQIDNO:27 Ep300_sg3 GGCTGAGCTGCTGTTGGCAA SEQIDNO:28 Ep300_sg4 GCCCTAGGTGCTAGTCCTAT SEQIDNO:29 Ep300_sg5 GCTAGTCCTATGGGTGTAAA SEQIDNO:30 Mll2_sg1 GCCGGCTATGTCGGGCCTGT SEQIDNO:31 Mll2_sg2 GGGATTCAGTTCTGCTGAGC SEQIDNO:32 Mll2_sg3 GTGTGTGAGACATGTGACAA SEQIDNO:33 Mll2_sg4 GCAGGCAGGTCCTCCATAGG SEQIDNO:34 Mll2_sg5 GCCTCCTCTGCCGGAGAAAG SEQIDNO:35 Setd2_sg1 GACTGTGAATGGACAGCTGA SEQIDNO:36 Setd2_sg2 GTCTTCTCAAAACATTTCAG SEQIDNO:37 Setd2_sg3 GCACTGATGGAAAATGGTGA SEQIDNO:38 Setd2_sg4 GACTACCAGTTCCAAAGATA SEQIDNO:39 Setd2_sg5 GAAGCTTCTGGTTACTTTCC SEQIDNO:40 Cdkn2a_sg1 GGGATTGGCCGCGAAGTTCC SEQIDNO:41 Cdkn2a_sg2 GGGGTACGACCGAAAGAGTT SEQIDNO:42 Cdkn2a_sg3 GGGTCGCCTGCCGCTCGACT SEQIDNO:43 Cdkn2a_sg4 GGGAACGTCGCCCAGACCGA SEQIDNO:44 Cdkn2a_sg5 GGGGTACGACCGAAAGAGTT SEQIDNO:45 Rpl7_sg1 GTACCTGCACCAGGAAAACC SEQIDNO:46 Rpl7_sg2 
GTGGAGCCATACATTGCATG SEQIDNO:47 Rpl7_sg3 GGGTGAGTTTTCTGTCTAGT SEQIDNO:48 Rpl7_sg4 GCCTTTGTCATCAGAATTCG SEQIDNO:49 Rpl7_sg5 GAAGGCAAAGCACTATCACA SEQIDNO:50 Pbrm1_sg1 GCAATGGTCTTGAGATCTAT SEQIDNO:51 Pbrm1_sg2 GACCATTGCTCAGAGGATAC SEQIDNO:52 Pbrm1_sg3 GCCTGGGTCTCAAGTATTCA SEQIDNO:53 Pbrm1_sg4 GCCAAAACATACAATGAGCC SEQIDNO:54 Pbrm1_sg5 GTGCGAAGGACCTGTCAGCC SEQIDNO:55 Pik3r1_sg1 GACTGCATGGGCAGAAGGGA SEQIDNO:56 Pik3r1_sg2 GAGACGGCACTTTCCTTGTC SEQIDNO:57 Pik3r1_sg3 GTTGGCTACAGTAGTGGGCT SEQIDNO:58 Pik3r1_sg4 GGCAGTGCTGCAGGCAAAAG SEQIDNO:59 Pik3r1_sg5 GGCTGACGCAGAAAGGTGTG SEQIDNO:60 Rps19_sg1 GGCCGCAAGCTGACGCCTCA SEQIDNO:61 Rps19_sg2 GCCTCAGGGACAGAGAGACC SEQIDNO:62 Rps19_sg3 GTCCCTGAGGCGTCAGCTTG SEQIDNO:63 Rps19_sg4 GGGCCGCAAGCTGACGCCTC SEQIDNO:64 Rps19_sg5 GTTGAAACAGAGCGGGGGGG SEQIDNO:65 Bcor_sg1 GTGGATGAAAGGCTCTTCAT SEQIDNO:66 Bcor_sg2 GGTTTTGCACAGTCTCTTCC SEQIDNO:67 Bcor_sg3 GACCTCAGGCTGAACAGCCT SEQIDNO:68 Bcor_sg4 GGCCCAGGCTGTTCAGCCTG SEQIDNO:69 Bcor_sg5 GTCCACCACCCCCTGGTCAC SEQIDNO:70 Mll3_sg1 GTTGGCACTGATTTCATAAC SEQIDNO:71 Mll3_sg2 GGGAGAAGATAGCAAGATGC SEQIDNO:72 Mll3_sg3 GTGGCTACTGACCAAACCCA SEQIDNO:73 Mll3_sg4 GAGAATTCCTAACAGCTATG SEQIDNO:74 Mll3_sg5 GCTGCCGATACTCCAAACTT SEQIDNO:75 Kdm6a_sg1 GCAACTATTTTACAACAATT SEQIDNO:76 Kdm6a_sg2 GGTAAATTAAAACACTCACC SEQIDNO:77 Kdm6a_sg3 GTAAATTAAAACACTCACCT SEQIDNO:78 Kdm6a_sg4 GCAGCATTTTCAGTTAGCTT SEQIDNO:79 Kdm6a_sg5 GGCTATTAAAGCATTTCAGG SEQIDNO:80 Atm_sg1 GTGATTTTGATCTCGTGCCT SEQIDNO:81 Atm_sg2 GCAAGGTACACTGTAATCAG SEQIDNO:82 Atm_sg3 GTGCTTATGAATCCATGAAA SEQIDNO:83 Atm_sg4 GTCCAAATATATAGTAAGGT SEQIDNO:84 Atm_sg5 GAGACTTGAGGAAAATGTTA SEQIDNO:85 Rnf43_sg1 GGGGCCAAGGGTATGCCAGA SEQIDNO:86 Rnf43_sg2 GACTGTGGGATCCCAGTTTC SEQIDNO:87 Rnf43_sg3 GTAGGTAGGAGGTGAACTCA SEQIDNO:88 Rnf43_sg4 GCATGTTCAACATCGTAGGT SEQIDNO:89 Rnf43_sg5 GGAGTCTTCTGCCTGGTTCC SEQIDNO:90 Vhl_sg1 GGACTACCCAAGTGTGCGGA SEQIDNO:91 Vhl_sg2 GCACCTTGAGAGTCAGCACC SEQIDNO:92 Vhl_sg3 GGTTAACCAGAAGTCCATCA SEQIDNO:93 Vhl_sg4 GTGCCATCCCTCAATGTCGA SEQIDNO:94 Vhl_sg5 
GTCCTGAGGAGATGGAGGCT SEQIDNO:95 Sf3b3_sg1 GTCTCCTTCTTCTAGAGGCA SEQIDNO:96 Sf3b3_sg2 GGCAAAACAGAATAGGAGAG SEQIDNO:97 Sf3b3_sg3 GGCAATTTGATACAAGTAAC SEQIDNO:98 Sf3b3_sg4 GCAATTTGATACAAGTAACT SEQIDNO:99 Sf3b3_sg5 GCACAGTATCAAAATACTTG SEQIDNO:100 Map2k4_sg1 GACAAAGTTGATGAAACTGG SEQIDNO:101 Map2k4_sg2 GCCGATTTCCTTATCCAAAG SEQIDNO:102 Map2k4_sg3 GACCCAAGTGCATCAAGACA SEQIDNO:103 Map2k4_sg4 GCACTTGGGTCTATTCTTTC SEQIDNO:104 Map2k4_sg5 GGGCGACTGTTGGATCTGTA SEQIDNO:105 Arid2_sg1 GTCCAGTAAAAGCTGGAGGA SEQIDNO:106 Arid2_sg2 GAGTGGTTCTGAAATCCACA SEQIDNO:107 Arid2_sg3 GGAGAGCAATGTTAAGCTCT SEQIDNO:108 Arid2_sg4 GACTGTGTGCAGAGAGCAAC SEQIDNO:109 Arid2_sg5 GTCACTTCTCATTACAGTTT SEQIDNO:110 Tgfbr2_sg1 GATGCCCTGCAGAGGAAAGG SEQIDNO:111 Tgfbr2_sg2 GGCAGAGCGCTTCAGTGAGC SEQIDNO:112 Tgfbr2_sg3 GACAGTGTGCTGAGAGACCG SEQIDNO:113 Tgfbr2_sg4 GGCCGGAAATTCCCAGCTTC SEQIDNO:114 Tgfbr2_sg5 GTGTTTCTTTTGGTCTTAGG SEQIDNO:115 Atrx_sg1 GTGTTTCTCCCTTTAAGTCT SEQIDNO:116 Atrx_sg2 GGCAGCCCCAATTCTGCTCA SEQIDNO:117 Atrx_sg3 GATATTAGCCGTGACTCAGA SEQIDNO:118 Atrx_sg4 GAAGACAAAGATGATTTTAA SEQIDNO:119 Atrx_sg5 GGTTTCCTACAAAAGAGTTA SEQIDNO:120 Rpl22_sg1 GCTCATTGGTTGGTTTCTGC SEQIDNO:121 Rpl22_sg2 GCCTTTCTCCAAAAGGTATG SEQIDNO:122 Rpl22_sg3 GGTTAGTATGGCTCCGCGTG SEQIDNO:123 Rpl22_sg4 GCGTTACTTCCAGATTAACC SEQIDNO:124 Rpl22_sg5 GCGTTACTTCCAGATTAACC SEQIDNO:125 Fubp1_sg1 GATAAACCTCTTAGGATTAC SEQIDNO:126 Fubp1_sg2 GGAACGGGCTGGTGTTAAAA SEQIDNO:127 Fubp1_sg3 GTCTTCCCTTTTCAACAATC SEQIDNO:128 Fubp1_sg4 GAAAAGGGAAGACCAGCCCC SEQIDNO:129 Fubp1_sg5 GTTAGCATACAAGACCTTTC SEQIDNO:130 Pcna_sg1 GGAGGCGGTGAGTAGTAAGG SEQIDNO:131 Pcna_sg2 GAGGAGGCGGTGAGTAGTAA SEQIDNO:132 Pcna_sg3 GAATTTTGGACATGCTAGGG SEQIDNO:133 Pcna_sg4 GTGAGCCTGTTTTCTCCTCT SEQIDNO:134 Pcna_sg5 GGTTACCTAGAGGAGAAAAC SEQIDNO:135 Notch1_sg1 GTATACACCTTCATAACCTG SEQIDNO:136 Notch1_sg2 GCAGTGGCCATTGTGCAGAC SEQIDNO:137 Notch1_sg3 GGCACCTGGTGAAAGAGGCA SEQIDNO:138 Notch1_sg4 GCCAACCCTTGTGAGCACGC SEQIDNO:139 Notch1_sg5 GAGCACACTCATCCACGTCC SEQIDNO:140 Casp8_sg1 GGTGACAAGGGTGTCGTCTA 
SEQIDNO:141 Casp8_sg2 GTGTCGTCTATGGAACGGAT SEQIDNO:142 Casp8_sg3 GGTAAACTTTGTCTGAAGTC SEQIDNO:143 Casp8_sg4 GGAGTTGGGTTATGTCTTCC SEQIDNO:144 Casp8_sg5 GACTCACTGTCTTGTTCTCT SEQIDNO:145 Stag2_sg1 GAGTGTTTGTACATAGATAC SEQIDNO:146 Stag2_sg2 GCAGAACGGAATAAAATGAT SEQIDNO:147 Stag2_sg3 GATGACTGCTTTGGTAAATG SEQIDNO:148 Stag2_sg4 GATTACCCACTTACCATGGC SEQIDNO:149 Stag2_sg5 GAGGACCAGCCATGGTAAGT SEQIDNO:150 Kdm5c_sg1 GCTCTGCAGAGTATATTCCC SEQIDNO:151 Kdm5c_sg2 GCATGTAGGTGATGCAGGGC SEQIDNO:152 Kdm5c_sg3 GTTTGTCATCTTCATCTCCT SEQIDNO:153 Kdm5c_sg4 GTATGCCGAATGTGTTCCCG SEQIDNO:154 Kdm5c_sg5 GACCTTCCTAGAAGGCAAGG SEQIDNO:155 Smad4_sg1 GTCCATTTCAAAGTAAGCAA SEQIDNO:156 Smad4_sg2 GCAATGGAGCACCAGTACTC SEQIDNO:157 Smad4_sg3 GATGATTGGAAATGGGAGGC SEQIDNO:158 Smad4_sg4 GTCACAACAGGGCAGCTTGA SEQIDNO:159 Smad4_sg5 GATGGCTATGTGGATCCTTC SEQIDNO:160 Cdkn1a_sg1 GGGCTCCCGTGGGCACTTCA SEQIDNO:161 Cdkn1a_sg2 GAAAACCCTGAAGTGCCCAC SEQIDNO:162 Cdkn1a_sg3 GAAGATTCCCCGGGTGGGCC SEQIDNO:163 Cdkn1a_sg4 GGGTGGGCCCGGAACATCTC SEQIDNO:164 Cdkn1a_sg5 GATTGCGATGCGCTCATGGC SEQIDNO:165 Runx1_sg1 GCGCGCGGGGGGCATGTTGG SEQIDNO:166 Runx1_sg2 GCCTCCTCCAGGCGCGCGGG SEQIDNO:167 Runx1_sg3 GTCCTAGTGTAGGGACCGGG SEQIDNO:168 Runx1_sg4 GAGGGTTGGGCGTGGGGGCT SEQIDNO:169 Runx1_sg5 GTAGAGGTGCGTATCTGTCA SEQIDNO:170 Rb1_sg1 GTTCGAGGTGAACCATTAAT SEQIDNO:171 Rb1_sg2 GAGGTCAGAACAGGAGCGCT SEQIDNO:172 Rb1_sg3 GGCTCTCTGAGTAGTGCAGG SEQIDNO:173 Rb1_sg4 GAATCATGGAATCCCTTGCA SEQIDNO:174 Rb1_sg5 GAACCTTTTTATTCCTAGGA SEQIDNO:175 Zc3h13_sg1 GAGGCAGAACGTCGTAAAGA SEQIDNO:176 Zc3h13_sg2 GTTCTCTTCCGGCGAGGAGA SEQIDNO:177 Zc3h13_sg3 GGAGGTGGACTCGGAGTGCG SEQIDNO:178 Zc3h13_sg4 GAGATGGGAAGGACAGAGGC SEQIDNO:179 Zc3h13_sg5 GACTTTCTCAGAGAAGGTGA SEQIDNO:180 Bap1_sg1 GAACCGACAAACAGTCCTGG SEQIDNO:181 Bap1_sg2 GGTCAGGCACCACTGCCATC SEQIDNO:182 Bap1_sg3 GTCCTCTCCCCAGGGCCCTA SEQIDNO:183 Bap1_sg4 GTGGACAGATAAAGCTCGAA SEQIDNO:184 Bap1_sg5 GCTATGTGCCTATCACAGGG SEQIDNO:185 Map3k1_sg1 GGGATACCTACCTGAATCCA SEQIDNO:186 Map3k1_sg2 GGGAGGTGGGGGACTCCACG SEQIDNO:187 Map3k1_sg3 
GTCCCCTTTGTAGATCTAAG SEQIDNO:188 Map3k1_sg4 GGAGATCCCATGACTTCTAC SEQIDNO:189 Map3k1_sg5 GGGGAGGGGACACCTACAGA SEQIDNO:190 Rasa1_sg1 GAGATTATTCTCTGTATTTT SEQIDNO:191 Rasa1_sg2 GTCTTAATGTCTTTCCTTTA SEQIDNO:192 Rasa1_sg3 GATCTTCTTCTCGGCCCTAA SEQIDNO:193 Rasa1_sg4 GTTCACAATGAGTTAGAAGA SEQIDNO:194 Rasa1_sg5 GGACACTGAGATATATCTAT SEQIDNO:195 Nf1_sg1 GTCCATGGTAGTTGATCTTA SEQIDNO:196 Nf1_sg2 GCTGCAGCCAAGAGCTCTTG SEQIDNO:197 Nf1_sg3 GATTATCCGAATTCTTAGCA SEQIDNO:198 Nf1_sg4 GACAATCTGATGCTATATCT SEQIDNO:199 Nf1_sg5 GGTATATTTTCCAAGTCTTG SEQIDNO:200 Kansl1_sg1 GTGGAGAGCTGTCTCACCAG SEQIDNO:201 Kansl1_sg2 GGGTGTGGAGGTGTCTGATG SEQIDNO:202 Kansl1_sg3 GGTCATGCACAGGTGGCGGC SEQIDNO:203 Kansl1_sg4 GATGGCACAGCTCTGAAGAG SEQIDNO:204 Kansl1_sg5 GCTCTGGAAGTGCAGGCTTG SEQIDNO:205 Gata3_sg1 GGTGGTGAGGTCCGAAGGAG SEQIDNO:206 Gata3_sg2 GGAAGGGTGGTGAGGTCCGA SEQIDNO:207 Gata3_sg3 GCCCACAGGCATTGCAGACC SEQIDNO:208 Gata3_sg4 GAGGAACGCTAATGGGGACC SEQIDNO:209 Gata3_sg5 GTACCATCTCGCCGCCACAG SEQIDNO:210 Pten_sg1 GGTTCATTGTCACTAACATC SEQIDNO:211 Pten_sg2 GAATGCTGATCTTCATCAAA SEQIDNO:212 Pten_sg3 GAACTTGTCCTCCCGCCGCG SEQIDNO:213 Pten_sg4 GTTCTTCATACCAGGACCAG SEQIDNO:214 Pten_sg5 GGGAATTGTGACTCCCTGAT SEQIDNO:215 Rps18_sg1 GCTGCAGAAGAAAAAGATAC SEQIDNO:216 Rps18_sg2 GCGCCACTTTTGGGGGTAAG SEQIDNO:217 Rps18_sg3 GAACCTAGATTTTGAGACAG SEQIDNO:218 Rps18_sg4 GAATTTTCTTCAGCCTCTCC SEQIDNO:219 Rps18_sg5 GAGGGCTGCGCCACTTTTGG SEQIDNO:220 Arid1a_sg1 GGCTACCCAAATATGAATCA SEQIDNO:221 Arid1a_sg2 GGACCCCCATATCCTATGGG SEQIDNO:222 Arid1a_sg3 GCTGCCTAGGATAGCCTCCT SEQIDNO:223 Arid1a_sg4 GACGCATGAGCCATTCTCCC SEQIDNO:224 Arid1a_sg5 GAAGTGTACTGGGGCATCTG SEQIDNO:225 Apc_sg1 GGAGAGAGTTTACTTCCGAG SEQIDNO:226 Apc_sg2 GTCTTTGTCCTGAGGCCTTA SEQIDNO:227 Apc_sg3 GTGGAGTGCTGCACTGGCCC SEQIDNO:228 Apc_sg4 GCTGTGAGTGAATGATGTTG SEQIDNO:229 Apc_sg5 GCCAGTGTTTTGAGTTCTAG SEQIDNO:230 Ctcf_sg1 GTCTACAAGCATAATCACAC SEQIDNO:231 Ctcf_sg2 GATTATGCTTGTAGACAGGT SEQIDNO:232 Ctcf_sg3 GATGGCGTAGAGGGGGAAAA SEQIDNO:233 Ctcf_sg4 GATAACTGTGCTGGTCCAGA SEQIDNO:234 
Ctcf_sg5 GCTATGACAGTGTCACAATG SEQIDNO:235 Cic_sg1 GTACAGGCAGGAGGCAACTG SEQIDNO:236 Cic_sg2 GCAGGAGGCAACTGGGGACT SEQIDNO:237 Cic_sg3 GGGGTGCACAGTCTTGATGG SEQIDNO:238 Cic_sg4 GTGTAGCCGTTCTGCTCCAC SEQIDNO:239 Cic_sg5 GTACCTTGGCCACTAGTGGG SEQIDNO:240 Polr2a_sg1 GTGGAACGGCACATGTGTGA SEQIDNO:241 Polr2a_sg2 GGAACGGCACATGTGTGATG SEQIDNO:242 Polr2a_sg3 GACTTCAGGAATTAGTACGC SEQIDNO:243 Polr2a_sg4 GAAGGTCACTGGGCTTAGGA SEQIDNO:244 Polr2a_sg5 GTCTGCAGATGAAGGTCACT SEQIDNO:245 Rps11_sg1 GACTCCTTGTCTGACCCCAC SEQIDNO:246 Rps11_sg2 GAGGACCATTGTCATCCGCC SEQIDNO:247 Rps11_sg3 GGATGTAATGGAGATAGTCC SEQIDNO:248 Rps11_sg4 GTCACCTGAAACAGGGGGAC SEQIDNO:249 Rps11_sg5 GTCGGATCCTGTCTGGTGAG SEQIDNO:250 Stk11_sg1 GGAGCCCGAGGAGGGGTTTG SEQIDNO:251 Stk11_sg2 GGGCGCAGGCCTTCCTGGAG SEQIDNO:252 Stk11_sg3 GAAGAAACACCCTCTGGCTG SEQIDNO:253 Stk11_sg4 GTGTCTGGGCTTGGTGGGAT SEQIDNO:254 Stk11_sg5 GTGCTGCCTAATCTGTCGGA SEQIDNO:255 Cdkn1b_sg1 GCTCCACAGTGCCAGCGTTC SEQIDNO:256 Cdkn1b_sg2 GCGAAGAAGAATCTAAGAGG SEQIDNO:257 Cdkn1b_sg3 GGAGAAGCACTGCCGGGATA SEQIDNO:258 Cdkn1b_sg4 GGTTAGCGGAGCAGTGTCCA SEQIDNO:259 Cdkn1b_sg5 GGTGCTGGCGCAGGAGAGCC SEQIDNO:260 Cdh1_sg1 GAAAACAGCCAAGGTTTGTA SEQIDNO:261 Cdh1_sg2 GGGTCAAGTGCCTGAGAATG SEQIDNO:262 Cdh1_sg3 GAGTTACCCTACATACACTC SEQIDNO:263 Cdh1_sg4 GTTCAGGCTGCTGACCTTCA SEQIDNO:264 Cdh1_sg5 GGAGGTTCCTGTCAAAGGAG SEQIDNO:265 B2m_sg1 GGTCTTGGGCTCGGCCATAC SEQIDNO:266 B2m_sg2 GGGTGAATTCAGTGTGAGCC SEQIDNO:267 B2m_sg3 GAGCCCAAGACCGTCTACTG SEQIDNO:268 B2m_sg4 GTATGTATCAGTCTCAGTGG SEQIDNO:269 B2m_sg5 GGTCGCTTCAGTCGTCAGCA SEQIDNO:270 Fbxw7_sg1 GCCGCTTGCAGCAGGTCTTT SEQIDNO:271 Fbxw7_sg2 GCAGCAGGTCTTTGGGTTCC SEQIDNO:272 Fbxw7_sg3 GAGTGTATACATACTTTATA SEQIDNO:273 Fbxw7_sg4 GTATGCATCTCCATGAAAAA SEQIDNO:274 Fbxw7_sg5 GATCTGTACACTTTTCTTAT SEQIDNO:275 Nkx2-1_sg1 GCGGGGCGCACTGGGCAGCG SEQIDNO:276 Nkx2-1_sg2 GCCACCGCTGCCCACTGAGA SEQIDNO:277 Nkx2-1_sg3 GACGGCAAACCCTGCCAGGC SEQIDNO:278 Nkx2-1_sg4 GCCATGCAGCAGCACGCCGT SEQIDNO:279 Nkx2-1_sg5 GCCGTGGGGGGCTACTGCAA SEQIDNO:280 Control_sg1 ACGGAGGCTAAGCGTCGCAA 
SEQIDNO:281 Control_sg2 CGCTTCCGCGGCCCGTTCAA SEQIDNO:282 Control_sg3 ATCGTTTCCGCTTAACGGCG SEQIDNO:283 Control_sg4 GTAGGCGCGCCGCTCTCTAC SEQIDNO:284 Control_sg5 CCATATCGGGGCGAGACATG SEQIDNO:285 Control_sg6 TACTAACGCCGCTCCTACAG SEQIDNO:286 Control_sg7 TGAGGATCATGTCGAGCGCC SEQIDNO:287 Control_sg8 GGGCCCGCATAGGATATCGC SEQIDNO:288
[0174] Live magnetic resonance imaging (MRI) of mice 3 months post-treatment revealed large nodules in mTSG-treated animals (n=4), while vector-treated animals (n=3) only occasionally had small nodules and PBS animals (n=3) were devoid of detectable nodules (
[0175] Mice that received the AAV-CRISPR mTSG library (n=27) did not survive more than four months (median survival=90 days, 95% confidence interval (CI)=84-90 days), while mice that were treated with PBS (n=10) or vector control (n=11) all survived the duration of the experiment (log-rank test, p=1.8*10.sup.-11) (
TABLE 3
Survival data for PBS, vector, or mTSG-treated animals.

Group     ID               last_day_post_injection_survived
vector    GvIV pilot m1    NA
vector    GvIV pilot m2    NA
vector    GvIV pilot m3    NA
vector    GvIV pilot m4    NA
vector    GvIV pilot m5    NA
vector    GvIV pilot m6    NA
vector    GvIV m1          NA
vector    GvIV m2          NA
vector    GvIV m3          NA
vector    GvIV m4          NA
vector    GvIV m5          NA
PBS       PBS M1           NA
PBS       PBS M2           NA
PBS       PBS M3           NA
PBS       PBS M4           NA
PBS       PBS M5           NA
PBS       PBS M6           NA
PBS       PBS M7           NA
PBS       PBS M8           NA
PBS       PBS M9           NA
PBS       PBS M10          NA
mTSG      mTSG pilot       97
mTSG      mTSG             107
mTSG      mTSG             111
mTSG      mTSG             111
mTSG      mTSG             117
mTSG      mTSG             117
mTSG      mTSG             67
mTSG      mTSG             74
mTSG      mTSG             77
mTSG      mTSG             84
mTSG      mTSG             74
mTSG      mTSG             82
mTSG      mTSG             84
mTSG      mTSG             84
mTSG      mTSG             84
mTSG      mTSG             80
mTSG      mTSG             82
mTSG      mTSG             87
mTSG      mTSG             90
mTSG      mTSG             90
mTSG      mTSG             90
mTSG      mTSG             90
mTSG      mTSG             90
mTSG      mTSG             90
mTSG      mTSG             90
mTSG      mTSG             90
mTSG      mTSG             90
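As an illustrative check of the median survival reported in paragraph [0175], the mTSG survival days listed in Table 3 can be summarized with a short Python sketch. The variable names are hypothetical. Because every mTSG-treated animal has a recorded last day, a plain sample median is used here as a stand-in for a full Kaplan-Meier estimate.

```python
from statistics import median

# Last day post-injection survived for the 27 mTSG-treated mice (Table 3).
mtsg_days = [97, 107, 111, 111, 117, 117, 67, 74, 77, 84, 74, 82, 84, 84,
             84, 80, 82, 87, 90, 90, 90, 90, 90, 90, 90, 90, 90]

n_mice = len(mtsg_days)              # number of mTSG-treated animals
median_survival = median(mtsg_days)  # plain median; no censored animals in this group
longest_survival = max(mtsg_days)    # latest recorded day in the mTSG group

print(n_mice, median_survival, longest_survival)
```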
[0176] Endpoint histological sections were analyzed from PBS (n=7), vector (n=5), and mTSG-treated mice (n=13), sacrificed 3-4 months post-treatment for controls (
[0177] To understand the molecular alterations driving the development of tumors in mTSG-treated mice, Molecular Inversion Probes (MIPs) were designed to enable capture sequencing of the 70 basepair (bp) regions surrounding the predicted cut site of each sgRNA in the mTSG library (namely, the +17 position of each 20 bp spacer sequence). As opposed to simply sequencing the sgRNA cassettes to find the relative enrichment of each sgRNA within the cell population, MIP capture sequencing enables a direct quantitative analysis of the mutations induced by the Cas9-sgRNA complex. To generate this pool of MIPs (termed mTSG-MIPs) (
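The capture-window definition above can be sketched in Python. This is a minimal illustration under stated assumptions, not the probe-design procedure itself: coordinates are taken as 0-based and half-open on the plus strand, the cut is assumed to fall immediately after the 17th base of the spacer, and the 70 bp window is assumed to be centered on the cut. The function name `capture_window` and its parameters are hypothetical.

```python
SPACER_LEN = 20   # spacer length in bp (from the text)
CUT_OFFSET = 17   # assumption: cut falls right after the 17th spacer base
WINDOW = 70       # capture window in bp (from the text)

def capture_window(spacer_start, strand):
    """Return (cut, window_start, window_end) in 0-based, half-open
    plus-strand coordinates. spacer_start is the lowest plus-strand
    coordinate covered by the spacer (hypothetical convention)."""
    if strand == "+":
        cut = spacer_start + CUT_OFFSET
    elif strand == "-":
        # On the minus strand the spacer is read from its high coordinate down.
        cut = spacer_start + SPACER_LEN - CUT_OFFSET
    else:
        raise ValueError("strand must be '+' or '-'")
    half = WINDOW // 2
    return cut, cut - half, cut + half

print(capture_window(1000, "+"))
print(capture_window(1000, "-"))
```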
[0178] After collapsing each of the filtered indel calls to the closest sgRNA by summing their constituent variant frequencies, the overall spectrum of variant frequencies across all sequenced samples was plotted (
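The collapsing step described above can be illustrated with a small Python sketch. It assumes, for simplicity, that all indel calls and cut sites lie on a single chromosome, that each indel is summarized by one position, and that variant frequencies are given in percent. The function name and the example sgRNA labels are hypothetical.

```python
from collections import defaultdict

def collapse_to_sgrna(indel_calls, cut_sites):
    """indel_calls: list of (position, variant_frequency_percent).
    cut_sites: {sgRNA name: predicted cut position}.
    Each indel is assigned to the sgRNA with the nearest cut site and the
    variant frequencies assigned to each sgRNA are summed."""
    totals = defaultdict(float)
    for position, vf in indel_calls:
        nearest = min(cut_sites, key=lambda name: abs(cut_sites[name] - position))
        totals[nearest] += vf
    return dict(totals)

# Toy example with made-up positions and frequencies.
cut_sites = {"GeneA_sg1": 100, "GeneA_sg2": 500}
calls = [(98, 2.0), (103, 5.0), (490, 10.0)]
print(collapse_to_sgrna(calls, cut_sites))
```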
[0179] Significantly mutated sgRNA sites (SMSs) were identified in the mTSG-treated liver samples using a false-discovery rate method as compared to PBS and vector-treated liver samples, such that no control sample would have any called SMSs. Of most interest were the dominant clones that had undergone strong positive selection in the tumor, thus it was further required that at least 5% of the reads have an indel in that region in order to call an SMS. Different mTSG-treated liver samples presented with highly heterogeneous mutational signatures, indicating that a diverse array of mutations had undergone positive selection in different samples (
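The two calling criteria above can be sketched as follows. The false-discovery rate procedure itself is not spelled out in this passage, so the sketch substitutes a simpler stand-in: a site must exceed the highest variant frequency seen for that site in any control sample, which by construction leaves no control sample with a called site. The 5% cutoff is taken from the text. Function and variable names are hypothetical, and the numbers in the example are made up.

```python
MIN_VF = 5.0  # minimum percent of reads carrying an indel (from the text)

def call_sms(sample_vf, control_vfs):
    """sample_vf: {sgRNA: variant frequency in percent} for one treated sample.
    control_vfs: list of such dicts, one per PBS or vector control sample.
    Returns the sorted list of sgRNA sites called as significantly mutated."""
    called = []
    for sgrna, vf in sample_vf.items():
        # Stand-in for the FDR step: highest control frequency at this site.
        control_max = max((c.get(sgrna, 0.0) for c in control_vfs), default=0.0)
        if vf >= MIN_VF and vf > control_max:
            called.append(sgrna)
    return sorted(called)

sample = {"Trp53_sg1": 40.0, "Setd2_sg2": 6.0, "Pten_sg1": 3.0, "Apc_sg1": 7.0}
controls = [{"Trp53_sg1": 0.1, "Apc_sg1": 8.0}, {"Setd2_sg2": 0.2}]
print(call_sms(sample, controls))
```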
[0180] SMSs in each sample were collapsed to the gene level to find significantly mutated genes (SMGs). Analysis of all mTSG liver samples revealed a full mutational landscape of the entire cohort, unfolded as a binary mutation spectrum (
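A minimal sketch of the gene-level collapse and the resulting binary mutation spectrum is given below. It assumes sgRNA names follow the `Gene_sgN` pattern used in the sequence listing; the function names and sample labels are hypothetical.

```python
from collections import defaultdict

def collapse_to_genes(sms_names):
    """Group called sgRNA sites by gene; names are assumed to look like 'Gene_sgN'."""
    by_gene = defaultdict(list)
    for name in sms_names:
        gene = name.rsplit("_sg", 1)[0]
        by_gene[gene].append(name)
    return dict(by_gene)

def binary_mutation_matrix(sms_per_sample, genes):
    """Return {sample: [1 if the gene has at least one SMS in that sample, else 0]}."""
    matrix = {}
    for sample, sms_names in sms_per_sample.items():
        smgs = collapse_to_genes(sms_names)
        matrix[sample] = [1 if gene in smgs else 0 for gene in genes]
    return matrix

print(collapse_to_genes(["Nkx2-1_sg3", "Pten_sg1", "Pten_sg4"]))
print(binary_mutation_matrix({"liver_1": ["Pten_sg1"], "liver_2": []}, ["Pten", "Apc"]))
```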
[0181] Of the genes that were significantly mutated in at least one sample, the vast majority (91%, or 50/55) had multiple SMSs (median=3 SMSs out of 5 total sgRNAs per gene), suggesting that these genes are indeed functional tumor suppressors (
[0182] As the study was geared to assess the selective advantage conferred by deletion of each of the genes in the mTSG library, it was reasoned that the population-wide mutation frequency across all mTSG-treated liver samples could be interpreted as a proxy for the degree to which each gene normally functions as a tumor suppressor. It was thus tested whether the population-wide mutation frequency in the mTSG-treated mice was correlated with the population-wide mutation frequency in humans. Using LIHC data from public datasets (Fujimoto et al., Nat. Genet. 44, 760-764 (2012); Ahn et al., Hepatol. Baltim. Md. 60, 1972-1982 (2014)), mouse and human mutation frequencies were significantly correlated (R=0.461, t test for correlation, p=4.78*10.sup.-4) (
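The correlation test mentioned above can be illustrated with a short Python sketch. It assumes a Pearson correlation coefficient (the text reports only "R") and uses the standard t statistic for a correlation, t = r*sqrt(n-2)/sqrt(1-r^2) with n-2 degrees of freedom. The function names and the toy inputs are hypothetical and are not the study data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Toy per-gene mutation frequencies (made up) for mouse and human cohorts.
mouse = [0.10, 0.40, 0.35, 0.80]
human = [0.02, 0.05, 0.09, 0.30]
r = pearson_r(mouse, human)
print(r, correlation_t(r, len(mouse)))
```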
[0183] To explore synergistic effects between different genes in the mTSG library, co-mutation analysis was performed. For each pair of genes, the strength of mutational co-occurrence was determined by tabulating the number of samples that were double mutant, single mutant, or double wildtype (
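The tabulation described above can be sketched in Python. The statistical test used on the table is not named in this passage, so the sketch assumes a one-sided hypergeometric (Fisher-type) test for co-occurrence as one reasonable choice. Function names and the example mutation calls are hypothetical.

```python
from math import comb

def comutation_table(mut_a, mut_b):
    """mut_a, mut_b: per-sample booleans (True = gene is significantly mutated).
    Returns counts of (double mutant, A only, B only, double wildtype)."""
    pairs = list(zip(mut_a, mut_b))
    both = sum(1 for a, b in pairs if a and b)
    only_a = sum(1 for a, b in pairs if a and not b)
    only_b = sum(1 for a, b in pairs if b and not a)
    neither = sum(1 for a, b in pairs if not a and not b)
    return both, only_a, only_b, neither

def cooccurrence_pvalue(both, only_a, only_b, neither):
    """One-sided hypergeometric p-value for seeing at least `both` double mutants."""
    n = both + only_a + only_b + neither
    n_a = both + only_a
    n_b = both + only_b
    tail = sum(comb(n_a, k) * comb(n - n_a, n_b - k)
               for k in range(both, min(n_a, n_b) + 1))
    return tail / comb(n, n_b)

table = comutation_table([True, True, True, False, False, False],
                         [True, True, True, False, False, False])
print(table, cooccurrence_pvalue(*table))
```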
[0184] The correlation of gene mutation frequencies within individual tumors was also investigated. Since the variant frequency is essentially a metric for the positive selection that acts on a given mutation, genes whose variant frequencies are highly correlated across samples could also be synergistic in driving tumorigenesis. A caveat is that some passenger mutations could be hitchhiking on strong drivers within a given tumor; however, the probability of finding the same co-occurring passenger-driver mutation pair becomes vanishingly small as the number of mice increases. The total variant frequency was calculated for each gene by summing the values from all five sgRNAs. The summed gene-level variant frequencies in each sample were then used to calculate the Spearman correlation for all 1540 possible gene pairs, and the correlations were assessed for statistical significance (
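The pairwise correlation step can be illustrated with a self-contained Python sketch. It computes the Spearman correlation as the Pearson correlation of average ranks (ties share their mean rank) and enumerates all unordered gene pairs; with 56 genes this gives the 1540 pairs noted above. The function names and the toy data are hypothetical, and significance testing is omitted.

```python
import math
from itertools import combinations

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den

def all_pair_correlations(gene_vf):
    """gene_vf: {gene: [summed variant frequency per sample]} -> {(g1, g2): rho}."""
    return {(g1, g2): spearman(gene_vf[g1], gene_vf[g2])
            for g1, g2 in combinations(sorted(gene_vf), 2)}

toy = {"GeneA": [1.0, 5.0, 9.0, 20.0], "GeneB": [0.5, 2.0, 3.0, 40.0], "GeneC": [9.0, 7.0, 3.0, 1.0]}
print(all_pair_correlations(toy))
print(math.comb(56, 2))  # number of unordered pairs among 56 genes
```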
[0185] To examine the mutational landscape of the liver tumors induced by the AAV-CRISPR mTSG library at finer resolution, the analysis was reframed to the level of specific indel variants. Across all 37 mTSG-treated liver samples, 593 unique variants were identified that had a variant frequency ≥1% in at least one sample. The majority of these variants (80.94%) were deletions rather than insertions (
[0186] To further understand the clonal architecture in this genetically complex, highly heterogeneous yet fully gene-targeted, autochthonous tumor model, analysis was focused on a single mTSG-treated mouse that had presented with multiple visible tumors in several liver lobes, 5 of which had been harvested for MIPs capture sequencing (
[0187] To clearly delineate any potential clonal mixtures among the 5 liver lobes, the unique variant patterns across these samples were examined. Across the 5 liver lobes, 178 unique variants were identified (variant frequency ≥1%). Using binary variant calls (i.e., whether a given variant is present or absent in a sample), the 178 variants were clustered into 8 groups (
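The use of binary variant calls can be illustrated with a short Python sketch. The clustering method itself is not specified in this passage, so the sketch uses the simplest stand-in: variants are grouped when they share exactly the same presence/absence pattern across lobes. It is not expected to reproduce the 8 groups reported above. The 1% threshold is taken from the text; function names and the toy frequencies are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 1.0  # percent variant frequency needed to call a variant present

def group_by_presence(variant_vf):
    """variant_vf: {variant id: [variant frequency (percent) in each lobe]}.
    Returns {presence pattern (tuple of bools): [variant ids]}; variants that
    are absent from every lobe are dropped."""
    groups = defaultdict(list)
    for variant, vfs in variant_vf.items():
        pattern = tuple(vf >= THRESHOLD for vf in vfs)
        if any(pattern):
            groups[pattern].append(variant)
    return dict(groups)

toy = {
    "v1": [2.0, 0.0, 3.0, 0.0, 0.0],
    "v2": [1.5, 0.2, 4.0, 0.0, 0.0],
    "v3": [0.0, 0.0, 0.0, 0.0, 9.0],
    "v4": [0.5, 0.0, 0.0, 0.0, 0.0],
}
print(group_by_presence(toy))
```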
[0188] Given the repeated emergence of Setd2 and Trp53 in each arm of the analysis (i.e., population-level mutation frequencies, co-mutation analysis, and clonal mixture analysis), the Setd2+Trp53 gene pair was further investigated. An AAV vector for liver-specific CRISPR knockout that expressed Cre recombinase under a TBG promoter, together with a Trp53-targeting sgRNA cassette and an empty sgRNA cassette was generated (
[0189] To assess whether loss-of-function mutations in Setd2 and Trp53 are clinically relevant for human LIHC, patient data from the TCGA LIHC dataset were subsequently analyzed. All patients (n=372) were classified into negative or positive groups based on the integration of somatic mutations, copy number variations, and gene expression profiles. Specifically, a tumor was classified as negative for SETD2 or TP53 if it exhibited one or more of the following: 1) a non-silent mutation, 2) homozygous deletion, or 3) a gene expression z-score <-2, indicating an expression level at least two standard deviations below the mean. Using these criteria, 6.99% (26/372) of LIHC patients were classified as SETD2 negative (SETD2.sup.-), and 33.87% (126/372) as TP53 negative (TP53.sup.-). Kaplan-Meier survival analysis revealed a statistically significant association between SETD2 status and overall survival, with SETD2.sup.- patients having worse survival times compared to SETD2.sup.+ patients (log-rank test, p=0.042) (
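The classification rule above can be written as a small Python sketch. The expression criterion is implemented as a z-score below -2, matching the stated meaning of at least two standard deviations below the mean. The function name and its parameters are hypothetical; the counts 26, 126, and 372 are taken from the text.

```python
def is_negative(non_silent_mutation, homozygous_deletion, expression_z):
    """Return True if a tumor meets at least one of the three criteria
    for being classified as negative for a gene."""
    return bool(non_silent_mutation or homozygous_deletion or expression_z < -2)

# Fractions of the TCGA LIHC cohort reported in the text.
n_patients = 372
setd2_negative_pct = round(100 * 26 / n_patients, 2)
tp53_negative_pct = round(100 * 126 / n_patients, 2)

print(is_negative(False, False, -2.5), setd2_negative_pct, tp53_negative_pct)
```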
[0190] After classifying all TCGA LIHC patients into 4 groups in terms of both SETD2 and TP53 status, Kaplan-Meier survival analysis was again performed. The SETD2.sup.-TP53.sup.- double-negative group (n=11) had the worst survival among all four groups (log-rank test, p=0.0011 by comparing all 4 survival curves; pairwise comparisons for the SETD2.sup.-TP53.sup.- group: p<0.0001 vs. SETD2.sup.+TP53.sup.+, p=0.039 vs. SETD2.sup.-TP53.sup.+, p=0.039 vs. SETD2.sup.+TP53.sup.-) (
[0191] The functional roles of mutations in several of the top genes were individually tested, in a Trp53-sensitized background. Gene pairs were chosen based on their ranking in the screen, potential biological function, and literature. An AAV vector for liver-specific CRISPR knockout was generated that expressed Cre recombinase under a TBG promoter, together with a Trp53-targeting sgRNA cassette and an open (GeneX-targeting) sgRNA cassette (
[0192] Herein, an approach was developed for direct in vivo CRISPR screens to map a provisional functional cancer genome atlas (FCGA) of tumor suppressors in the mouse liver in an autochthonous manner. The genes selected for this study were clinically-relevant, significantly mutated genes in human cancers. As many of the genes have not been specifically studied in the context of cancer in vivo, these candidate tumor suppressors were functionally interrogated in a controlled, quantitative, and high-throughput manner in mice. Using an AAV library carrying 280 different CRISPR sgRNAs, 56 genes were tested for their ability to promote tumorigenesis in the mouse liver upon loss-of-function by Cas9 mutagenesis. Capture sequencing of the resultant liver tumors revealed a heterogeneous mutational landscape, indicating that several of the genes in the mTSG library indeed function as tumor suppressors. The importance of epigenetic control in cancer is now widely appreciated, in part due to tumor profiling studies that have identified recurrent mutations in epigenetic regulators across multiple cancer types. However, the direct contribution of most epigenetic factors to tumor suppression has not yet been rigorously demonstrated. It is thus noteworthy that several of the top drivers identified in our screen were epigenetic modifiers, functionally demonstrating the importance of this gene family in tumor suppression. The population-wide mutation frequency in mTSG treated mice was significantly correlated with population-wide mutation frequency in human LIHC.
[0193] Co-mutation analysis identified several pairs of significantly co-occurring mutations, with Setd2+Trp53 as the top-ranked pair. Using MIP capture sequencing instead of conventional sgRNA sequencing enabled direct, multiplexed analysis of the indels induced by Cas9 mutagenesis. Variant compositions were systematically dissected across multiple liver lobes from a single mouse, uncovering evidence of clonal mixture between lobes. One variant cluster in particular was found in 4 out of 5 liver lobes, and this cluster was defined by mutations in Setd2 and Trp53. A dual-sgRNA approach was leveraged to simultaneously knock out Setd2 and Trp53 in the mouse liver, leading to rapid liver tumor growth within one month. Several other functional drivers identified in the screen also proved to be sufficient for driving liver tumorigenesis at high efficiency when paired with Trp53, including Arid2, B2m, Cic, Kdm5c, Pik3r1, Pten, Stk11, Vhl, and Zc3h13 (
[0194] As an approximation to the clonality of the tumors, the number of major clusters was also calculated (
[0195] This approach can be extended to identify genetic factors with a significant impact on various cancer types and other human diseases. The present strategy for selecting genes to target in the mTSG library was based on pan-cancer TCGA datasets, rather than being specific to LIHC. This was to identify genes that are more likely to function as tumor suppressors in a wide variety of tissues, with the overarching goal that the same AAV-CRISPR mTSG library could be used in other organs. This approach (AAV-CRISPR mutagenesis followed by MIP capture sequencing) can be readily expanded to other organ systems, enabling the construction of a multi-organ FCGA of tumor suppressors.
[0196] Though the focus was on liver tumor suppressors in this study, given the immense programmability of CRISPR mediated genome editing, it is feasible to apply this AAV-CRISPR screen approach for targeting different gene sets of interest, coding and non-coding elements, and at genome-scale, to functionally assess phenotypes in an unbiased fashion for tackling a wide array of biological problems. The AAV-CRISPR genetically engineered mouse tumor models (GEMMs), developed in fully immunocompetent mice, preserved the native tumor microenvironment, and therefore can be used in high-throughput screening of immunotherapy responses in vivo.
Other Embodiments
[0197] The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element or combination (or subcombination) of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.
[0198] The disclosures of each and every patent, patent application, and publication cited herein are hereby incorporated herein by reference in their entirety. While this invention has been disclosed with reference to specific embodiments, it is apparent that other embodiments and variations of this invention may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims are intended to be construed to include all such embodiments and equivalent variations.