ENHANCED REPROGRAMMING OF SOMATIC CELLS

Abstract

A method of preparing a population of iPS cells including (i) expressing one or more Yamanaka factors selected from Oct3/4, Sox2, Klf4, Myc, Nanog and Lin28, and reducing the amount and/or activity of Menin (Men1) in a population of target cells, and (ii) optionally isolating the iPS cells from the target cell population; and a method of enhanced differentiation of a first cell into a somatic cell of a tissue of interest, including (i) treating a cell with a differentiation factor of the tissue of interest, and (ii) reducing the amount and/or activity of Menin (Men1) in a population of target cells.

Claims

1. A method of preparing a population of iPS cells comprising: (i) expressing one or more Yamanaka factors selected from Oct3/4, Sox2, Klf4, Myc, Nanog and Lin28; and (ii) reducing the amount and/or activity of Menin (Men1) in a population of target cells.

2. A method of enhanced differentiation of a first cell into a somatic cell of a tissue of interest, comprising: (i) treating a cell with a differentiation factor of said tissue of interest; and (ii) reducing the amount and/or activity of Menin (Men1) in a population of target cells; wherein (A) said first cell is a cell of low transdifferentiation capacity selected from an adult or mature dermal cell, a blood cell, a hair follicle cell or a urinary cell; or (B) said differentiation is to a somatic cell of a different germ layer than the first cell; or (C) said somatic cell is a non-cardiac cell.

3. The method of claim 1, wherein the step of reducing the amount of Menin comprises administering to the cells one or more agents that inhibit the expression of Menin, preferably wherein the one or more agents are inhibitory nucleic acids.

4. The method of claim 3, wherein the inhibitory nucleic acid is a siRNA, shRNA, sgRNA, miRNA or antisense nucleic acid molecule.

5. The method of claim 3, wherein the agent is an inhibitory siRNA, shRNA, sgRNA, miRNA or antisense nucleic acid molecule encoded by a transient expression system in the target cells.

6. The method of claim 5, wherein the target cell is exposed to a transient expression system for between 36 and 120 hours.

7. The method of claim 1, wherein the step of reducing the activity of Menin comprises administering to the cells one or more agents that inhibit the activity of Menin.

8. The method of claim 7, wherein the one or more activity inhibiting agents are antibodies, inhibitory ligands of Menin, inhibitory mimics of Menin, or small molecule inhibitors transiently inhibiting the activity of Menin, preferably wherein the small molecule inhibitors are selected from KO-382, MI-3, MI-2, MI-463, MI-503, Vinpocetine, MI-136 and/or Sinomenine.

9. The method of claim 1, wherein the step of expressing one or more Yamanaka factors or of treating a cell with a differentiation factor, respectively, comprises integrative approaches, preferably retroviral, lentiviral or adenoviral expression vectors, especially excisable and inducible vectors, or non-integrative approaches, preferably integration-defective viral, episomal, RNA or protein delivery techniques, preferably nonviral vector-based IVT-mRNA nanodelivery systems, in particular preferred, wherein the integrative or non-integrative approach for expressing one or more Yamanaka factors is transient or inducible.

10. The method of claim 1, wherein the method additionally comprises reducing the activity of Pias1 in the target cells.

11. The method of claim 1, comprising isolating the iPS cells or the somatic cell, respectively, from the target cell population.

12. The method of claim 1, wherein the target cells are somatic mammalian cells; preferably, human cells, non-human primate cells, or mouse cells; and/or preferably wherein the somatic mammalian cells are fibroblasts, adult stem cells, Sertoli cells, granulosa cells, neurons, pancreatic islet cells, epidermal cells, epithelial cells, endothelial cells, hepatocytes, hair follicle cells, keratinocytes, hematopoietic cells, melanocytes, chondrocytes, lymphocytes (B and T lymphocytes), macrophages, monocytes, mononuclear cells, cardiac muscle cells or skeletal muscle cells.

13. The method of claim 1, wherein reducing the amount and/or activity of Menin (Men1) enhances reprogramming of the target cell to an iPS cell by the expression of the one or more Yamanaka factors selected from Oct3/4, Sox2, Klf4, Myc, Nanog and Lin28.

14. The method of claim 1, wherein the method comprises (i) expressing one or more Yamanaka factors selected from Oct3/4, Sox2, Klf4, Myc, Nanog and Lin28, and (ii) reducing the amount and/or activity of Menin (Men1), together.

15. A method for preparing a population of differentiated cells, comprising: (i) preparing a population of iPS cells according to the method of claim 1; and (ii) differentiating the iPS cells using a protocol or factor to form a population of differentiated cells.

16. A population of iPS cells prepared according to the method of claim 1, wherein the amount and/or activity of Menin is reduced compared to iPS cells that have not been treated with a Menin-reducing agent.

17. The population of iPS cells according to claim 16, wherein the cells comprise an inhibitory nucleic acid molecule of Menin.

18. A kit for enhanced reprogramming of somatic cells into iPS cells comprising: one or more Yamanaka factors or one or more Yamanaka-inducing agents and one or more agents that inhibit the expression, translation or activity of Menin, preferably wherein said Yamanaka factors or Yamanaka-inducing agents and one or more agents that inhibit the expression, translation or activity of Menin are in one or more cell culture media.

19. The kit according to claim 18, further comprising an inhibitory nucleic acid molecule of Menin, preferably with a transient transfection agent, preferably a non-integrating virus or an episomal vector.

Description

FIGURES

[0101] FIG. 1: Schematic illustration of UMI use in CRISPR/Cas9 screens. Data analysis in CRISPR screens is conventionally based on several sgRNAs targeting the same gene. Introduction of random barcodes at complexities well above total analyzed cell number will tag each individual cell with a unique molecular identifier (UMI). This generates a third layer of information at the single cell level that represents true biological replicates.

[0102] FIG. 2: Conceptual advantage of CRISPR screen analysis by single cell tracing. (a) Upon infection with sgRNA libraries and selection, each infected cell gives rise to cell colonies of daughter cells with various editing outcomes. Here only 1 guide infecting independent cells is shown. Homozygous frameshifts resulting in loss of function alleles (LOF) are shown in solid color; for alternative editing outcomes only the nucleus is labelled. Negative selection results in depletion of LOF cells. (b) Conventional CRISPR screen analysis by NGS will only detect partial depletion of sgRNA reads (red lines), masking the biological component of the effect by technical limitations (e.g. guide efficiency, LOF frequency) and thus limiting the possible depletion level. (c) Upon limiting dilution and clonal expansion, each infected cell will be tagged with a unique tag (UMI) and represent a clonal editing outcome, whereby LOF clones show complete negative selection. (d) Single cell based CRISPR analysis using NGS scores depletion in LOF clones based on biological phenotype; depletion level is approximated by median depletion of clones. (e) Positive selection of a guide can be due to high penetrance or high degree of outgrowth. (f) Unlike conventional analysis, CRISPR-UMI analysis can distinguish stochasticity and quantity of effects.

[0103] FIG. 3: Bioinformatic pipeline of sgRNA prediction. Doench-scores as measure of predicted sgRNA activity were calculated for all exonic sgRNAs compatible with our cloning strategy. Doench scores were penalized based on a ruleset for biological effects. Those rules combine evaluation of exon length, prediction of protein domains, alternative splicing and ATG start codons, Pol-III terminator sequences, and position of the sgRNA within the CDS. The penalties also spread selected sgRNAs over different exons, and an off-target prediction penalizes sgRNAs with predicted off-targets.

[0104] FIG. 4: Library cloning and sequencing strategy. (a) Vector design for library generation. Upon pooled parallel cloning of unique molecular identifiers (UMI) into retroviral backbones at complexities of 10.sup.6, chip-synthesized sgRNA pools (at a complexity of 26514) were cloned into UMI containing backbone at a coverage >1000 clones/guide. Cassette-flanking PacI sites allowed for liberation of small sgRNA containing fragments from mammalian genomic DNA. (b) Library subpools and cloning complexity resulting in overall complexity of 83 million. (c) Ethidium bromide stained agarose gel, 200 ng DNA/lane; digest of genomic DNA after screens and plasmid DNA as control with the octamer recognition site enzyme PacI results in mostly large genomic fragments, while sgRNA fragments are 589 bp long (arrowhead). Long and short fragments can be fractionated using magnetic beads. (d) Genomic Q-PCR on fractions shown in (c); short target region is enriched 4.1*10.sup.3-fold in fraction 2. Error bars are STDEV; shown is one experiment in technical triplicate representative of 3 experiments. (e) Illumina SR50 NGS sequencing strategy; first read by custom U6 primer, both index reads by standard Illumina primers. (f) Guide representation in sgRNA libraries shows 4-fold representation difference between poorly and highly represented clones (10th and 90th percentile). Reads normalized between individual subpools.

[0105] FIG. 5: Scheme illustrating generation of CRISPR-UMI library complexity. The CRISPR-UMI library is generated by 2 subsequent complex cloning steps. Initially, a random barcode consisting of 10 nucleotides is integrated into the vector backbone. Subsequently, the sgRNA pool of 30 000 sgRNAs is ligated to the barcode library with >1000 ligation events/sgRNA. Thereby, each of the >1000 ligation events/sgRNA combines the sgRNA with another random barcode. The combination of sgRNA and random barcodes generates a complexity of >1000 times the number of sgRNAs. We refer to this highly complex combination of sgRNA and barcode as UMI (unique molecular identifier). Our library reached a complexity of 83 million (see also FIG. 4).

[0106] FIG. 6: Pilot screen to identify optimal conditions for UMI based CRISPR screen analysis. (a) Setup of screen; upon editing, various clonal outgrowth regimens, followed by clonal expansion and dropout screening, were run in parallel. Cas9 expression was induced by Dox; selection for cells harboring guide RNAs was performed by G418 selection. Limiting dilution and expansion is variable in the experiment. Cells are treated with or without 3.3 nM etoposide, an LD.sub.30 dose, for 8 days. (b) Scheme illustrating variation in clone number and size. (c) Average clone numbers and size determined from NGS data. (d) Distribution of single cell derived clones in each regimen illustrated with guide_1 against Nhej1. P-value for each clone correlates with read depth but results in fewer data points. (e) Plot illustrating median dropout for each condition as well as p-value determined by combining multiple clones using MAGeCK. Signal to noise ratios (SNR) are highest in 148 clones of 35 reads, and the percentage of guides expected to have less than 5 clones due to variability in representation is, at 0.06%, lower than for 52 or 21 clone datasets.

[0107] FIG. 7: Single cell analysis of negative selection screen. (a) Graphical illustration for large scale screen setup used to identify sensitizing mutations for etoposide. Cas9 expression was induced by Dox; selection for cells harboring guide RNAs was performed by G418 selection. (b) Volcano plot of conventional CRISPR analysis; sgRNA representation relative to control on x-axis, binomial p-value on y-axis. (c) Volcano plot of single cell derived clones (CRISPR-UMI analysis), axes as in (b); median depletion of each guide is shown as dashed line. (d) Conventional analysis of depleting sgRNAs; 4 sgRNAs/gene discussed below are highlighted in the same color. (e) CRISPR-UMI analysis of same sgRNAs; p-value is based on MAGeCK score of individual clones within the population. Depletion level relative to controls and signal to noise ratio is shown below for (d) and (e).

[0108] FIG. 8: Comparison of conventional analysis performance with CRISPR-UMI. (a) Comparison of conventional and single cell based analysis on guide level, discrepancies highlighted before (diamond) and after (asterisk) outlier removal. (b) For discrepant clones, abundance relative to untreated control (X-axis) against total reads per clone (Y-axis) shows strong outlier clones dominating total read space. Depleted clones lie on the left side of the plot. (c) Venn diagrams illustrating the number of sgRNAs targeting the positive controls of the NHEJ complex (Lig4, Xrcc4-6, Nhej1) called within the top 50/100 hits. (d) Average number of guides targeting the same gene as function of top number of sgRNAs. CRISPR-UMI shows higher reproducibility/number of guides per gene across the entire range of hits. (e) Top ranking genes based on conventional analysis as well as CRISPR-UMI. Ranking of hits avoids false positive and negative calls and shows stronger depletion in CRISPR-UMI. (f) Clonal validation of selected hits at 10 nM etoposide treatment (5 nM pre-treatment) for 3 days. Homozygous loss of function mutations in Lig4, Zfp451, Rad9a, and Erbb4 show sensitization to etoposide. 4 clones per gene (2 for Trim71 and Rac1), 3 technical replicates each, error bars are SEM; a heteroskedastic, two-sided t-test was applied, p<0.001=***, p<0.01=**, p<0.05=*, p>0.05=ns. Number of samples is 12 for Lig4, Zfp451, Slc25a4, Adcy3, Rad9a, Erbb4, empty and wt and 6 for Rac1 and Trim71. (g) Visual summary of all genes identified by CRISPR-UMI. All genes, with the exception of Zfp451, have previously been implicated in DNA damage response or repair as (1) involved in resolution of Topoisomerase II entangled chromosomes, (2) NHEJ, or (3) putatively in microhomology based end joining, or (4) SUMOylation in response to DNA damage (Srivastava, M. et al. Cell 151, 1474-1487 (2012), Kurosawa, A. et al. PLoS ONE 8, e72253 (2013), Fattah, F. J. et al. DNA Repair (Amst.) 15, 39-53 (2014), Jackson, S. P. & Bartek, J. Nature 461, 1071-1078 (2009), Black et al. Genes (Basel) 7, 67 (2016), Takata et al. Nat Commun 4, 2338 (2013), Gilmore-Hebert, M., Ramabhadran, R. & Stern, D. F. Mol. Cancer Res. 8, 1388-1398 (2010), Icli, B., Bharti, A., Pentassuglia, L., Peng, X. & Sawyer, D. B. Biochem. Biophys. Res. Commun. 418, 116-121 (2012), Smilenov, L. B. et al. Cancer Res. 65, 933-938 (2005), Koidl, S. et al. The International Journal of Biochemistry & Cell Biology 79, 478-487 (2016), Guzzo, C. M. et al. Sci Signal 5, ra88-ra88 (2012), Kont, Y. S. et al. DNA Repair (Amst.) 43, 38-47 (2016), Katsube, T. et al. J. Radiat. Res. 52, 415-424 (2011) and Pommier, Y. et al. DNA Repair (Amst.) 19, 114-129 (2014)).

[0109] FIG. 9: Read distributions of single cell derived clones. (a) No strong outlier clones are detected in hits identified by conventional analysis as well as CRISPR-UMI. (b) Strong outlier clones with very high read counts as well as depletion are seen in false positive hits called by conventional analysis. (c) Genes identified only in CRISPR-UMI show modest but reproducible depletion in multiple independent clones but are often dominated by clones with high read counts that do not deplete.

[0110] FIG. 10: Pooled dropout screen without clonal outgrowth also shows outliers resulting in false positive calls. (a) Comparison of CRISPR-UMI with conventional screen analysis on guide level in absence of clonal dilution and outgrowth shows highly correlative results (yellow) as well as discrepancy between both regimens (mixed colors, Pearson correlation: 0.729). (b) While correlating sgRNAs do not contain strong outlier clones based on total read count, guides only called in conventional analysis show outlier clones responsible for overall dropout. (c) Ranking of sgRNAs improves upon removal of outlier clones (top 3 clones by read count) from the dataset, illustrating their confounding effects.

[0111] FIG. 11: Single cell analysis of positive selection screen for roadblocks of reprogramming. (a) Schematic of experimental setup. Mouse embryonic fibroblasts carrying T3G-OKSM in the ColA locus, rtTA in the ROSA locus, and a knock-in of GFP in the Oct4 locus were infected with lentiviral encoded Cas9 and subsequently with a retroviral library delivering sgRNAs. Reprogramming was induced by Dox administration for 7 days, and GFP-positive iPS cells were purified by FACS on day 11. Evaluation of individual iPS colonies by UMIs. (b) Scatterplot shows enrichment of individual sgRNAs based on abundance (total read counts) on the y-axis, and enrichment of individual sgRNAs based on incidence (independent colony number) on the x-axis. Both axes are normalized to the total number of reads (abundance) of the sgRNA in the untreated MEF population. Median enrichment of sgRNAs in 4 experiments is shown. (c) Single sgRNA validation in 6-well format in triplicate. Readout was by flow cytometry measuring fraction of GFP positive cells. As the validation was done in 4 batches and each batch shows slightly different efficiency, we normalize to controls. Knockdown of Ube2i results in an improvement of reprogramming >100 fold in low dose OKSM. Error bars are STDEV. (d) Alkaline phosphatase staining in 6-wells on day 10 illustrates enhanced reprogramming upon Men1 or Pias1 targeting in the transgenic system. (e) Box plot of colony size, assayed using the normalized and median scaled abundance of unique barcode-guide combinations, revealed similar distributions for all guides targeting one gene and illustrates increased colony size due to faster reprogramming or faster iPS colony growth. (f) Representative colonies in validation experiment on day 10 after Dox induction stained for alkaline phosphatase activity confirm colony size predictions.

[0112] FIG. 12: Predicted size distribution of colonies by UMI analysis from NGS data. (a) Alkaline phosphatase staining in 6 well dishes 10 days after Dox induction in the transgenic system illustrating enhanced iPS colony formation for guides targeting Men1 or Pias1. (b) Median size distribution of read counts per UMI for each sgRNA. Reads for each UMI were filtered for sequencing errors and median colony size is plotted relative to median size, illustrating a marked size increase per iPS colony in many but not all identified roadblocks of reprogramming. (c) Representative colonies for comparison with FIG. 8f stained with alkaline phosphatase in validation experiment on day 10 after Dox.

[0113] FIG. 13: Alkaline phosphatase (AP) staining of reprogrammed iPS cells. AP stains iPS cell colonies dark blue (arrowheads), while fibroblasts do not stain or appear as fibroblastic stained cells (asterisk). Reprogramming by sh menin (samples 9 and 10) or sg menin (samples 19 and 20); control is shown in samples 21 and 22.

[0114] FIG. 14: Differentiation ESC to iN. Mean number of iN derived from Ascl1 and Ngn2 cell line with and without menin knockout is shown in (a). The boxplots (b) and (c) show data from two clones with confirmed homozygous menin knockout and the corresponding parent cell line without menin knockout. Cell numbers counted using FACS are plotted from three independent experiments. N=3, error bars=standard deviation.

[0115] FIG. 15: Transdifferentiation MEF to iN. (a) and (b): cell images with and without menin knockout. The plot of (c) illustrates the difference in iN number obtained from Ascl1 cell line after menin knockout and empty guide control. N=3, error bars=standard deviation.

EXAMPLES

Example 1: Guide Selection

[0116] sgRNAs targeting mouse nuclear genes as well as drugged orthologs and a set of hand selected genes with 4 sgRNAs per gene (5 sgRNAs per gene for the subset of drugged genes) were selected by a bioinformatics pipeline. We aimed to design a guide selection algorithm taking both guide efficiency and biological effect due to gene structure into account. The basis of the guide selection is the activity score as described by Doench et al. (Nature Biotechnology 32, 1262-1267 (2014)). Additionally, we identified properties of each guide and exon under consideration and penalized the Doench score accordingly. We identified all exonic PAM sites in the mouse genome mm10 (Rosenbloom et al. The UCSC Genome Browser database: 2015 update. Nucleic Acids Res. 43, D670-81 (2015)). We excluded sgRNAs that are incompatible with our cloning strategy (contain: GAAGAC, GTCTCC, CTCGAG, CGTCTC or GAGACG, start with: AAGAC or end with: CTCGA). We then calculated Doench-scores for all potential sgRNAs. We penalized the Doench-scores based on heuristic rules (exact penalty scores can be found in FIG. 3) that aim to select sgRNAs which most likely lead to LOF phenotypes. Those rules include exon properties such as presence or absence of protein domains annotated in the Pfam database (Finn et al. Nucleic Acids Res. 44, D279-85 (2016)), exon size, and whether or not exon length is a multiple of 3 bp. Then we created penalties for exon distribution, in order to spread sgRNAs over many exons, where only the sgRNA with the best Doench score per exon does not get penalized. We also avoided sgRNAs that are less than 4 nt away from another better scoring sgRNA. Furthermore, we penalized sgRNAs that cut DNA upstream of a possible alternative ATG start codon and sgRNAs that cut in exons that are not common to all annotated transcripts from that locus. We avoided sgRNAs that contain a stretch of 4 or more T in a row, which would act as a Pol-III terminator. We calculated a distance-penalty, ranging from 1 to 0.5, based on the distance from the sgRNA to the transcriptional start. Then we calculated a simple off-target prediction (FIG. 3) against all exonic sequences containing a PAM site.

[0117] The off-target prediction scores weight mismatches by position in the sgRNA sequence. We re-ranked the penalized Doench score including the off-target analysis and picked the top 4 sgRNAs per gene (the top 5 sgRNAs for druggable genes) for chip oligo synthesis (CustomArray Inc.). For negative control guides we used a published list of human control guides (Wang et al. Science 343, 80-84 (2014)) and removed all guides which had a perfect match against the mouse genome. We included a total of 112 control guides in our mouse library targeting 6560 genes.
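The sequence-level exclusion rules stated in Example 1 can be sketched as a simple filter. This is a minimal illustration only, not the pipeline used in the study: the function name `passes_sequence_filters` and the example sequences are hypothetical, and the Doench scoring, exon penalties and off-target prediction are not reproduced here.

```python
import re

# Motifs listed in Example 1 as incompatible with the cloning strategy.
FORBIDDEN_MOTIFS = ("GAAGAC", "GTCTCC", "CTCGAG", "CGTCTC", "GAGACG")
FORBIDDEN_PREFIX = "AAGAC"   # sgRNA must not start with this
FORBIDDEN_SUFFIX = "CTCGA"   # sgRNA must not end with this
POLY_T = re.compile(r"T{4,}")  # 4 or more T in a row (Pol-III terminator)


def passes_sequence_filters(sgrna: str) -> bool:
    """Return True if the sgRNA sequence survives the exclusion rules.

    Hypothetical helper: it applies only the sequence rules stated in
    the text and assumes the sgRNA is given as a DNA string.
    """
    seq = sgrna.upper()
    if any(motif in seq for motif in FORBIDDEN_MOTIFS):
        return False
    if seq.startswith(FORBIDDEN_PREFIX) or seq.endswith(FORBIDDEN_SUFFIX):
        return False
    if POLY_T.search(seq):
        return False
    return True


# Illustrative, made-up candidate sequences.
candidates = [
    "GCTAGCTAGGATCCATGCAA",  # passes
    "GCTAGGAAGACATCCATGCA",  # contains GAAGAC
    "GCTAGCTTTTGATCCATGCA",  # contains TTTT
]
kept = [s for s in candidates if passes_sequence_filters(s)]
```

In such a sketch the filter would run before scoring, so that only clonable candidates are ranked.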

Example 2: Library Cloning

[0118] We ordered a gBlock (IDT) flanked by primer binding sites for amplification and restriction sites EcoRI and MfeI for cloning, containing the Illumina i7 primer binding site followed by a 10 bp random nucleotide sequence and the Illumina P7 Adaptor (acgatgagcagagccagaaccagaaggaacttgactctagaGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNNNgtcctcatctgagagctactcatcaacggtATCTCGTATGCCGTCTTaTGCTTGTTAATTAAGAATTCctggacga, SEQ ID NO: 1) (note: we exchanged C to A in the P7 Adaptor Sequence to eliminate a BbsI restriction site in the adaptor for library cloning, but reintroduced the C during PCR in the DNA-sample prep before NGS). The gBlock was digested with EcoRI (NEB R3101L) and MfeI (NEB R3589L), purified on a column (Qiagen 27106) and precipitated with ethanol. Vector backbone (see FIG. 4a) was digested with XbaI (NEB R0145L) and MfeI (NEB R3589L) and dephosphorylated with rSAP (NEB M0371L); a 1.5 kb stuffer containing an EcoRI restriction site was excised, and vector backbone fragments were separated by agarose gel electrophoresis, gel extracted (QIAGEN 28704) and precipitated with ethanol. 2 μg of vector were ligated with 125 ng plasmid at a molar ratio of V:I=1:3 with 2 μl T4 DNA Ligase (NEB M0202M) in a total volume of 200 μl split into 10 reactions of 20 μl each. The ligation was purified using a column (Qiagen 27106) and electroporated into electrocompetent XL-1 Blue cells (Agilent 200249) (80 μl in a 0.2 cm cuvette at 2.5 kV; BioRad, Gene Pulser II). Based on colony count, the electroporation complexity was estimated to be 1 million. In a second library-cloning step, sgRNAs were cloned into vector containing complex barcodes. The vector was digested using BbsI (NEB R0539L or Fermentas ER1011), excising a stuffer containing an XhoI binding site, and dephosphorylated with rSAP (NEB M0371L); linear fragments were isolated by agarose gel electrophoresis and ethanol precipitated. sgRNAs were ordered on a chip (CustomArray Inc.) and subsets of the oligos were amplified with specific flanking primers with 10 cycles of PCR. PCR product was purified on a column (Qiagen 27106) and digested overnight with BbsI (NEB R0539L or Fermentas ER1011). Vector and insert were ligated in a golden gate reaction using 0.25 μl T4 DNA Ligase (NEB M0202M) and 1 μl BbsI (NEB R0539L or Fermentas ER1011) in 50 μl reaction volume. The reaction was cycled 20 times (5 min 37° C., 5 min 16° C., 20×) followed by 10 min 50° C. inactivation. Plasmids were purified on columns (Qiagen 27106), ethanol precipitated and electroporated into electrocompetent XL-1 Blue cells (Agilent 200249). Electroporated XL-1 Blue cells were collected in 2 ml recovery diluent and incubated at 37° C. and 200 rpm for 40 min; cells were plated on 2 square 245×245 mm LB agar plates containing 100 μg/ml Ampicillin (Thermo 166508) and incubated at 37° C. for 10 h. Bacteria were collected from the plates and grown in 2L LB-Amp at 100 μg/ml (Sigma A9518) for 2 4 h until OD 2.0. Plasmid DNA was prepared (Macherey-Nagel NucleoBond Xtra Maxi Kit) according to manufacturer's recommendations.

Example 3: ES Cell Culture

[0119] A murine embryonic stem cell clone, derived from a derivative of HMSc2 termed AN3-12, with doxycycline inducible Cas9 (T3G-Cas9-IRES-mcherry PGK-GFP-rtTA) was used for this study. The following ES cell medium (ESCM) was used: 450 ml DMEM (Sigma D1152); 75 ml FCS (Invitrogen); 5.5 ml P/S (Sigma P0781); 5.5 ml NEAA (Sigma M7145); 5.5 ml L-Glu (Sigma G7513); 5.5 ml NaPyr (Sigma S8636); 0.55 ml beta-mercaptoethanol (Merck 805740; dilute 10 μl bME in 2.85 ml PBS for a 1000× stock); 7.5 μl LIF (IMBA-MolBioService; 2 mg/ml). Cell culture-grade dishes were from Greiner (Greiner 15 cm 639160) and NUNC (all other formats, e.g. 10 cm dish Nunclon A Surface, cat no. 150350; 6-well Nunclon A Surface, cat no. 140675). Cells were trypsinized and replated every 2nd day and frozen in FCS:ESCM:DMSO=4.5:4.5:1. Cells were tested for mycoplasma contamination every second week. Etoposide treatment: medium was supplemented every day with 3.3 nM etoposide (Sigma E2600000), an LD.sub.30 dose for 8 day treatment; 1000× etoposide stocks dissolved in PBS-10% EtOH were used. For doxycycline treatment, medium was supplemented every day with 1 μg/ml doxycycline (Sigma).

Example 4: Viral Vectors and ES Cell Infection

[0120] For retroviral library generation, barcoded CRISPR library virus carrying a neomycin resistance cassette was packaged in PlatinumE cells (Cell Biolabs) according to manufacturer's recommendations. Virus-containing supernatant was frozen at −80° C. 300 million ES cells were infected with a 1:10 dilution of virus-containing supernatant for 24 h in the presence of 2 μg polybrene per ml (Sigma TR-1003). 24 h post infection, selection for infected cells was started using G418 (Gibco) at 0.5 mg/ml. To estimate multiplicity of infection, 10,000 cells were plated on 15 cm dishes and selected using G418. For comparison, 1000 cells were plated but not exposed to G418 selection. On day 10, colonies were counted. After 24 h of selection, cells were split and 480 million cells were seeded on 60 15 cm dishes (Greiner 639160). Thereafter, cells were kept at a minimum cell number of 300 million cells during editing and screening.
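One way the colony counts from the two plating conditions could be turned into an infection estimate is sketched below. The text does not state the formula that was used; the Poisson relationship between infected fraction and multiplicity of infection is an assumption made here for illustration, the function names are hypothetical, and the colony counts in the usage lines are invented.

```python
import math


def infected_fraction(colonies_selected, cells_plated_selected,
                      colonies_unselected, cells_plated_unselected):
    """Fraction of cells surviving G418, corrected for plating efficiency.

    Plating efficiency is taken from the unselected dish (assumption).
    """
    plating_efficiency = colonies_unselected / cells_plated_unselected
    survival = colonies_selected / cells_plated_selected
    return survival / plating_efficiency


def poisson_moi(fraction_infected):
    """MOI under a Poisson infection model: P(infected) = 1 - exp(-MOI).

    The Poisson model is an assumption of this sketch, not a statement
    from the source.
    """
    if not 0.0 <= fraction_infected < 1.0:
        raise ValueError("fraction_infected must be in [0, 1)")
    return -math.log(1.0 - fraction_infected)


# Invented counts: 300 colonies from 10,000 selected cells and
# 300 colonies from 1,000 unselected cells.
fraction = infected_fraction(300, 10_000, 300, 1_000)
moi = poisson_moi(fraction)
```

A low infected fraction keeps the share of cells carrying more than one sgRNA small, which matters when each clone is meant to represent a single editing event.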

Example 5: iPS Cell Screen and Validations

[0121] Mouse embryonic fibroblasts containing Col1a1::tetOP-OKSM, Oct4-GFP and Rosa26 M2rtTA alleles or Oct4-GFP alone were harvested from E13.5 embryos (Stadtfeld et al. Nature Methods 7, 53-55 (2010)). iPS cells were derived in DMEM supplemented with 15% FBS, 100 U/ml penicillin, 100 μg/ml streptomycin, sodium pyruvate (1 mM), L-glutamine (4 mM), 1,000 U/ml LIF, 0.1 mM beta-mercaptoethanol, and 50 μg/ml ascorbic acid at 37° C. and 5% CO.sub.2 as well as 4.5% O.sub.2. MEFs were infected with a lentiviral vector delivering Cas9, selected for blasticidin resistance for 3 days, and subsequently infected with the sgRNA library. MEFs were treated with 0.5 mg/ml G418 for 3 days and 0.25 mg/ml G418 for an additional 3 days. For iPS induction, MEFs were plated at densities of 500,000 cells per 15 cm dish, and induced with doxycycline (1 μg/ml) for 7 days. After passaging for an additional 4 days in doxycycline-free media, Oct4-GFP-expressing cells were sorted from each replicate using a FACSAriaIII (BD Bioscience). All validation experiments were performed in 6 well dishes in triplicate, starting from 20 000 MEFs (Dox induction) or 40 000 cells (OKSM infection). Primary reprogramming was performed by infection with a lentiviral vector carrying OKSM factors as well as puromycin resistance, followed by selection for puromycin resistance for 3 days. Oct4 expression was quantified using a FACS BD LSR Fortessa (BD Biosciences); data were analyzed using FlowJo. Cells were tested for mycoplasma contamination every second week.

Example 6: Etoposide Hypersensitivity Validations

[0122] For hit validation we used mouse embryonic stem cells expressing Cas9 under the control of doxycycline (T3G-Cas9-IRES-mcherry PGK-GFP-rtTA). We generated 4 knockout clones for the genes Lig4, Zfp451, Slc25a4, Adcy3, Rad9a, Erbb4 and 2 knockout clones for the genes Trim71 and Rac1 (sgRNAs, and PCR primers for genotyping) and tested every clone in triplicate. We pretreated cells for 7 days with 5 nM etoposide (Sigma E2600000) and then measured the drop of cell viability (treated/control) within a 3 day selection with 10 nM etoposide. Viability was measured with Alamar Blue staining (DAL1100 Thermo Fisher) according to manufacturer's recommendations.

Example 7: DNA Harvest and NGS Sample Preparation

[0123] Readout of pooled CRISPR screens relies on precise PCR amplification of the integrated sgRNA cassette. However, in a genomic DNA prep the sgRNA cassette only makes up about 0.1 ppm of the total DNA. We improved this ratio by 3-4 orders of magnitude by flanking our sgRNA cassette with Pac-I sites (FIG. 4) and performing a size selective precipitation with digested genomic DNA. In detail: After 8 days of selection we lysed 170 million cells per condition using SDS-Lysis buffer (10 mM Tris pH8, 1% SDS, 10 mM EDTA, 100 mM NaCl)+1 mg/ml Proteinase K (Sigma P4032)+RNaseA (Qiagen). Genomic DNA was purified by phenol extraction and precipitated with isopropanol. Samples were digested with PacI (NEB R0547L). Size selective precipitation was carried out using Speed Beads (GE45152105050250 Sigma Aldrich) according to manufacturer's recommendations. Each sample was PCR amplified in 200 individual 50 μl PCR reactions using primers (AATGATACGGCGACCACCGAGATCTACACNNNNNNCGAGGGCCTATTTCCCATGATTCCTTC, SEQ ID NO: 2), where the 6 bp NNNNNN stretch is a specific experimental index used for demultiplexing samples after NGS sequencing, and (CAAGCAGAAGACGGCATACGAGATACCGTTGATGAGTAG, SEQ ID NO: 3). For qPCR analysis, we used GoTaq® qPCR Master mix (A6001 Promega) with primers (AATGATACGGCGACCACCGAGATCTACACGAGTGGCGAGGGCCTATTTCCCATGATTCCTTC, SEQ ID NO: 4) and (CAAGCAGAAGACGGCATACGAGATACCGTTGATGAGTAG, SEQ ID NO: 5) for detection of a 579 bp amplicon on the 589 bp Pac-I fragment with the CRISPR-UMI cassette, and (GCCTTTAAGCCAATGCTAGCTG, SEQ ID NO: 6) and (GTAAATGGACAGAGGGTGTTTAACC, SEQ ID NO: 7) as a control with a 582 bp amplicon on a 7710 bp Pac-I fragment. PCR samples were purified on columns (Qiagen 27106), pooled and size separated using agarose gel electrophoresis. The 600 bp band was excised and purified on a mini-elute column (QIAGEN 28204). This sample was sequenced on an Illumina HiSeq2500 in a single-read 50 dual indexing sequencing run. We sequenced sgRNA sequences using 10-fold concentrated custom read primer (CGATTTCTTGGCTTTATATATCTTGTGGAAAGGACGAAACACCG, SEQ ID NO: 8). For analysis, it is necessary to obtain at least 10 bp for index 1 (barcode), and 6 bp for index 2 (experimental index).

Example 8: Data Analysis

[0124] For data analysis, we assigned sgRNA, unique molecular identifier sequences, and experimental indices to all reads using bowtie, samtools, fastx-toolkit and custom scripts. For representation of guides in the library (FIG. 4f), library subpools were sequenced with individual experimental indices and the reads within each subpool were normalized to the median guide read of all subpools.

[0125] For conventional CRISPR data analysis (FIGS. 7b-d, 6d, 9a-c, 10b), we calculated depletion or enrichment of guides or clones using equation (1), adding a pseudocount of 0.5 if there were no reads for a guide. P-values were calculated from the cumulative binomial distribution function (scipy.stats 0.19.0) using equation (2). We plotted depletion (x-axis) against p.sub.cdf (y-axis) in a volcano plot.

[00001] $\begin{matrix} d = \frac{{RPM}_{treated}}{{RPM}_{control}} & (1) \end{matrix}$

[0126] d . . . depletion of guide or clone

[0127] RPM . . . reads per million of that guide or clone

[00002] $\begin{matrix} p_{cdf} = \sum_{i = 0}^{x} \binom{n}{i} p^{i} (1 - p)^{n - i} & (2) \end{matrix}$

[0128] p.sub.cdf . . . p-value from the cumulative binomial distribution function

[0129] x . . . reads of guide or clone in experiment

[0130] n . . . total reads in experiment

[0131] p . . . probability (reads of guide or clone in ctrl/total reads control)
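As a non-limiting illustration, equations (1) and (2) can be sketched in Python as follows. The original analysis used scipy.stats; this sketch instead uses only the standard library and evaluates the binomial terms in log space so that large read totals do not overflow. The function names are illustrative, and applying the pseudocount of 0.5 to a zero count in either sample is an assumption.

```python
import math

def rpm(reads, total_reads):
    """Reads per million for one guide or clone."""
    return reads / total_reads * 1e6

def depletion(treated, control, total_treated, total_control, pseudocount=0.5):
    """Equation (1): RPM(treated) / RPM(control).

    Assumption: a count of zero in either sample is replaced by the pseudocount.
    """
    t = treated if treated > 0 else pseudocount
    c = control if control > 0 else pseudocount
    return rpm(t, total_treated) / rpm(c, total_control)

def log_binom_pmf(i, n, p):
    """log of C(n, i) * p**i * (1 - p)**(n - i); requires 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return log_choose + i * math.log(p) + (n - i) * math.log1p(-p)

def p_cdf(x, n, p):
    """Equation (2): P(X <= x) for X ~ Binomial(n, p)."""
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(x + 1))
    return min(1.0, total)  # guard against rounding slightly above 1
```

Here x would be the reads of a guide in the treated sample, n the total treated reads, and p the guide's read fraction in the control, as defined above.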

[0132] For clonal CRISPR-UMI analysis (FIG. 7e and FIG. 6e) we generated a “volcano-like” plot. We considered clones which contain at least 3 total reads (in treated and control combined) and guides with at least 5 clones present. To account for random barcode sequencing errors within experiments, we used adjacency-based barcode collapsing with an edit distance of 1 and a 3-fold read count difference, using custom scripts. To combine data on clone level with data on guide level, we plotted the median depletion of clones against the MAGeCK neg score, a score for depletion of an sgRNA computed using MAGeCK 0.5.5 (Li et al. Genome Biol. 15, 554 (2014)). Instead of using MAGeCK to calculate depletion scores of a gene based on n guides, we used it to calculate depletion scores of a guide based on n clones. Median depletion (x-axis) plotted against MAGeCK neg scores (y-axis) gives a “volcano-like” plot.
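The custom scripts for barcode collapsing are not reproduced here. The following is a minimal sketch of one way adjacency-based collapsing could be implemented, under two stated assumptions: the edit distance of 1 is approximated by a single substitution (Hamming distance 1, no insertions or deletions), and a low-count barcode is merged into a neighbor only if the neighbor has at least 3-fold more reads. All names are hypothetical.

```python
def is_one_mismatch(a, b):
    """True if two equal-length barcodes differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def collapse_barcodes(counts, fold=3):
    """Merge likely sequencing-error barcodes into their high-count neighbor.

    counts: dict mapping barcode sequence -> read count (for one guide).
    Returns a dict of collapsed read counts keyed by the retained barcodes.
    """
    # Visit barcodes from most to least abundant (ties broken alphabetically).
    order = sorted(counts, key=lambda b: (-counts[b], b))
    parent = {}
    for i, big in enumerate(order):
        if big in parent:
            continue  # already absorbed by a more abundant barcode
        for small in order[i + 1:]:
            if small in parent:
                continue
            if is_one_mismatch(big, small) and counts[big] >= fold * counts[small]:
                parent[small] = big
    collapsed = {}
    for barcode, reads in counts.items():
        root = parent.get(barcode, barcode)
        collapsed[root] = collapsed.get(root, 0) + reads
    return collapsed
```

For example, a barcode with 10 reads that differs by one base from a barcode with 90 reads would be merged, whereas two neighboring barcodes with 20 and 10 reads would be kept as separate clones.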

[0133] For performance comparison of conventional analysis vs CRISPR-UMI analysis on guide level (FIG. 8a and FIG. 10a) we ranked guides by both depletion and p-values for conventional analysis and by median depletion and MAGeCK neg scores for CRISPR-UMI analysis and then generated a combined Guide score using equation (3).

[00003] $\begin{matrix} GS = \frac{{rank}_{depletion}}{N_{\text{total guides}}} \times \frac{{rank}_{\text{p-value}}}{N_{\text{total guides}}} & (3) \end{matrix}$

[0134] GS . . . Guide score−combined score, evaluation of guides

[0135] rank.sub.depletion . . . rank of guide by depletion (Conventional Analysis) [0136] rank of guide by median depletion (CRISPR-UMI Analysis)

[0137] rank.sub.p-value . . . rank of guide by p-value (Conventional Analysis) [0138] rank of guide by MAGeCK neg score (CRISPR-UMI Analysis)

[0139] N.sub.total guides . . . total guides in analysis
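By way of example only, the combined guide score of equation (3) could be computed as sketched below. The sketch assumes that rank 1 is assigned to the strongest depletion (lowest treated/control ratio) and to the lowest p-value or MAGeCK neg score, and that ties are broken by input order; neither convention is stated in the text. Function and variable names are illustrative.

```python
def guide_scores(depletion, significance):
    """Equation (3): GS = (rank_depletion / N) * (rank_p-value / N).

    depletion:    dict guide -> depletion (or median clone depletion)
    significance: dict guide -> p-value (or MAGeCK neg score)
    Assumption: lower values rank first in both dictionaries.
    """
    n_total = len(depletion)
    rank_depl = {g: r for r, g in enumerate(sorted(depletion, key=depletion.get), 1)}
    rank_sig = {g: r for r, g in enumerate(sorted(significance, key=significance.get), 1)}
    return {g: (rank_depl[g] / n_total) * (rank_sig[g] / n_total) for g in depletion}
```

With four guides, a guide ranked first on both axes receives GS = (1/4) × (1/4) = 0.0625, and a guide ranked last on both receives GS = 1.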

[0140] For gene ranking (FIG. 3e) we used MAGeCK for conventional analysis and combined guide scores (GS) for CRISPR-UMI analysis using Fisher's method of p-value combination. For the top 5 genes with lowest p-values we could not apply Fisher's method due to numerical restrictions and sorted those 5 genes by combining p values using equation (4).

[00004] $\begin{matrix} p_{Gene} = \sqrt[n]{\prod_{i = 1}^{n} {GS}_{i}} & (4) \end{matrix}$

[0141] p.sub.Gene . . . Score for Gene

[0142] GS.sub.i . . . Guide score of guide i.

[0143] n . . . number of guides for that Gene
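As an illustrative sketch, and assuming that equation (4) denotes the geometric mean of the guide scores (the n-th root of their product), the gene score can be evaluated in log space. Whether the numerical restrictions mentioned above refer to floating-point underflow is not stated; the log-space form merely avoids that particular problem. The function name is hypothetical.

```python
import math

def gene_score(guide_scores):
    """Equation (4), read as a geometric mean: (GS_1 * ... * GS_n) ** (1 / n).

    Summing logarithms avoids underflow when the scores are very small.
    Requires every guide score to be strictly positive.
    """
    n = len(guide_scores)
    return math.exp(sum(math.log(gs) for gs in guide_scores) / n)
```

For two guides with scores 0.01 and 0.04, the gene score is sqrt(0.0004) = 0.02.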

[0144] To calculate signal to noise ratios for a screen (FIGS. 3d and 3e and Supplementary FIG. 3e) we defined the signal of a guide as the distance from the origin of a volcano plot. This considers separation from noise in both depletion (x-axis) and significance (y-axis). In the volcano plot the x-axis is the ratio of guide reads in treated over control (for CRISPR-UMI the median of many clones) and the y-axis is the negative logarithm of the p-value as determined by the binomial distribution in conventional analysis, or the negative score as determined by MAGeCK for multiple clones per guide in CRISPR-UMI analysis. Distance on the x-axis was normalized to the guide with the strongest depletion in the experiment. Distance on the y-axis was normalized to the guide with the best significance score. Signal is the diagonal distance of the x-normalized and y-normalized distance from the origin, calculated using equation (5). Signal to noise ratios are calculated using equation (6).

[00005] $\begin{matrix} S_{i} = \sqrt{{(\frac{1 - d_{i}}{1 - d_{\min}})}^{2} + {(\frac{\log_{10} (p_{i})}{\log_{10} (p_{\min})})}^{2}} & (5) \end{matrix}$

[0145] S.sub.i . . . signal of guide i.

[0146] d.sub.i . . . depletion of guide i (Conventional) [0147] median depletion of clones for guide i (CRISPR-UMI)

[0148] d.sub.min . . . depletion of strongest depleting guide in the comparison (Conventional), [0149] lowest median depletion of all guides in the comparison (CRISPR-UMI)

[0150] p.sub.i . . . p-value for depletion for guide i (Conventional), [0151] MAGeCK neg score for guide i (CRISPR-UMI)

[0152] p.sub.min . . . lowest p-value of all guides in the comparison (Conventional), [0153] lowest MAGeCK neg score for all guides in the comparison (CRISPR-UMI)

[00006] $\begin{matrix} SNR = \frac{{\overline{S}}_{NHEJ}}{\sigma_{CTRL}} & (6) \end{matrix}$

[0154] SNR . . . signal to noise ratio.

[0155] S.sub.NHEJ . . . average signal of all guides of NHEJ pathway (Lig4, Nhej1, Xrcc4-6)

[0156] σ.sub.CTRL . . . Standard deviation of signal from all control guides
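Equations (5) and (6) can be illustrated with the following minimal Python sketch. It assumes the sample standard deviation for the control guides (the text does not specify sample versus population), and the function names are illustrative only.

```python
import math
import statistics

def signal(d_i, p_i, d_min, p_min):
    """Equation (5): normalized distance of a guide from the volcano-plot origin.

    d_i, d_min: depletion of guide i and of the strongest depleting guide.
    p_i, p_min: p-value (or MAGeCK neg score) of guide i and the lowest one.
    """
    x = (1 - d_i) / (1 - d_min)
    y = math.log10(p_i) / math.log10(p_min)
    return math.hypot(x, y)

def signal_to_noise(nhej_signals, control_signals):
    """Equation (6): mean NHEJ-guide signal over the SD of control-guide signals.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return statistics.mean(nhej_signals) / statistics.stdev(control_signals)
```

By construction, the guide with both the strongest depletion and the lowest p-value has a signal of sqrt(2), and a guide with d = 1 and p = 1 has a signal of 0.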

[0157] We evaluated CRISPR-UMI versus the conventional screen using a value termed Depletion (NHEJ/control) (FIGS. 7d and 7e). This is the ratio of the geometric mean of the fold change of all guides against NHEJ-pathway genes (Lig4, Nhej1, Xrcc4-6) to the geometric mean of the fold change of non-targeting control guides.

[0158] To assess the efficiency of different screening and analysis methods (FIG. 8d), we evaluated the ranking of guides by guide scores GS calculated using equation (3). We determined the average number of guides present per gene among the top guides (y-axis) while expanding the list of top guides from 1 to 100 (x-axis). For example, if the top 30 guides (x-axis) include guides against 15 different genes, there are on average 2 guides per gene (y-axis).
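The guides-per-gene measure described above can be sketched as follows. This is a minimal illustration; the function name and the guide-to-gene mapping are hypothetical inputs.

```python
def guides_per_gene_curve(ranked_guides, guide_to_gene, max_top=100):
    """Average number of guides per gene among the top-N guides, for N = 1..max_top.

    ranked_guides: guides sorted from best to worst guide score.
    guide_to_gene: dict mapping each guide to its target gene.
    """
    curve = []
    genes_seen = set()
    for n_top, guide in enumerate(ranked_guides[:max_top], start=1):
        genes_seen.add(guide_to_gene[guide])
        curve.append(n_top / len(genes_seen))
    return curve
```

If the top 30 guides target 15 distinct genes, the 30th value of the returned list is 30 / 15 = 2.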

[0159] For incidence vs abundance analysis of a positive selection screen of reprogramming MEFs to iPSC (FIG. 11b) we determined read counts for all individual UMIs. Read counts for all UMIs belonging to the same guide were added up as a measure of abundance for each guide. The number of independent UMIs per guide passing the threshold criterion for an iPSC colony (at least 10 reads) is a measure of incidence. Enrichment for guides in either abundance or incidence is normalized to the abundance of the guides in the starting MEF population. Each data point shown is the median of 4 biological replicates.
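As a non-limiting sketch, abundance and incidence per guide could be derived from per-UMI read counts as shown below. The data layout (a dictionary keyed by guide and UMI), the guide names in the usage note, and the simple ratio used for normalization are assumptions made for illustration.

```python
def abundance_and_incidence(umi_reads, min_reads=10):
    """Summarize per-UMI read counts of the iPSC sample per guide.

    umi_reads: dict mapping (guide, umi) -> read count.
    Returns (abundance, incidence):
      abundance[guide] = total reads over all UMIs of that guide
      incidence[guide] = number of UMIs with at least min_reads reads
    """
    abundance, incidence = {}, {}
    for (guide, _umi), reads in umi_reads.items():
        abundance[guide] = abundance.get(guide, 0) + reads
        incidence[guide] = incidence.get(guide, 0) + (1 if reads >= min_reads else 0)
    return abundance, incidence

def enrichment(value, mef_abundance):
    """Normalize an abundance or incidence value to the guide's abundance in MEFs."""
    return value / mef_abundance
```

For a hypothetical guide with UMIs of 500, 40 and 3 reads, the abundance is 543 reads and the incidence is 2 colonies, because the 3-read UMI falls below the 10-read threshold.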

[0160] To estimate colony size of iPSC (FIG. 11e and FIG. 12), read counts for all UMIs were determined and scaled by library size across 8 treated and 4 untreated replicates. UMI barcodes with a Hamming distance of 1 were collapsed to account for UMI sequencing errors using UMI-tools (Smith et al. Genome Res. 27, 491-499 (2017)). UMIs with low counts were filtered by retaining UMIs with 5 or more read counts. Starting with the highest represented guide-UMI combination, we included all UMIs up to a cumulative percentage of 90% per guide. Normalized UMI counts were divided by the median of the UMI counts for each sample; the values from the 8 treated replicates were then pooled, log-transformed and used to visualize the distribution of the abundance estimates relative to the experimental median for each guide.
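The filtering and normalization steps above can be sketched as follows. Several details are assumptions: the 90% cut-off is applied to the reads remaining after the 5-read filter, the UMI that crosses the 90% mark is included, and the log transform uses base 2. The function names are illustrative.

```python
import math

def retain_top_umis(umi_counts, min_reads=5, cum_fraction=0.9):
    """Keep UMIs with >= min_reads, then the most abundant UMIs of a guide
    until cum_fraction of that guide's remaining reads is covered.

    Assumption: the UMI that crosses the threshold is still included.
    """
    kept = sorted((c for c in umi_counts if c >= min_reads), reverse=True)
    total = sum(kept)
    selected, running = [], 0
    for count in kept:
        if running >= cum_fraction * total:
            break
        selected.append(count)
        running += count
    return selected

def relative_colony_sizes(selected_counts, sample_median):
    """Log2 ratio of each UMI count to the sample's median UMI count."""
    return [math.log2(c / sample_median) for c in selected_counts]
```

For UMI counts of 50, 30, 10, 6, 4 and 2 reads, the 4- and 2-read UMIs are filtered out, and the first three UMIs (90 of the remaining 96 reads) are retained.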

Example 9: Statistical Information

[0161] All error bars are standard deviation or standard error of the mean as indicated in FIG. 30 legends. Number of experiments n is given for every experiment. Statistical information for conventional and CRISPR-UMI screen-analysis is described in data analysis section.

Example 10: A Framework for Single Cell-Based CRISPR Screening

[0162] Conceptually, the depletion limit in CRISPR screens exists only on the population level. On a single cell level, some cells harbor homozygous LOF alleles, whereas others harbor any combination of alternative outcomes (Shalem et al. Nat. Rev. Genet. 16, 299-311 (2015)). In other words, the genetics at single cell level can be considered binary, while the genetics at population level is graded depending on guide efficiency. To track individual cells, we added a random barcode that, together with the sgRNA, generates a “unique molecular identifier” (UMI) (Kivioja, T. et al. Nature Methods 9, 72-74 (2011)), thus increasing the depth for CRISPR screening (FIG. 1). We evaluate independent biological replicates within a pooled CRISPR screen by following single cell-derived genetically marked clones after limiting dilution (FIG. 2c). This approach discerns clones with homozygous LOF alleles that drop out completely from clones with non-LOF alleles that survive selection (FIG. 2d), and thus overcomes the expected depletion limit of a conventional population based screen which does not discern LOF from non-LOF alleles (FIG. 2b). Importantly, the response of LOF clones to selection is a true reflection of biological effect and is no longer overlaid with editing efficiency. This binary mode of scoring can be leveraged if i) cells infected with a sgRNA can be distinguished from one another by use of a random barcode/UMI, and ii) the cell population is carried through a strong bottleneck so that a maximum of 1 cell and thereby 1 editing outcome remains in the population for each independent infection event (UMI), termed clone. We call this method CRISPR-UMI to highlight the presence of unique molecular identifiers allowing for tracking of each cellular event.

[0163] Beyond providing improved resolution for LOF screens, CRISPR-UMI also enables analysis of population behavior. For instance, the enrichment of an sgRNA after positive selection can arise from either massive expansion of a few single cells, or milder expansion of all cells within the population carrying a specific sgRNA. However, the frequency of such stochastic events cannot be deduced from conventional population based CRISPR screen analysis. In contrast, CRISPR-UMI allows for the assessment of population behavior and thus for quantification of effect size and probability. Therefore, it generates increased information content from screening readouts (FIG. 2e, f).

Example 12: Generation of a Complex CRISPR Library Using Random Barcodes

[0164] To ensure optimal sgRNA efficiency, minimal off-targeting, as well as a likely biological effect on target for our CRISPR-UMI library, we predicted Doench-scores (Doench, J. G. et al. Nature Biotechnology 32, 1262-1267 (2014)) for all possible sgRNAs within the genome and subsequently factored in additional parameters such as off-target predictions, position within the on-target transcripts, exon structure, and protein coding domains (FIG. 3).

[0165] CRISPR-UMI requires the generation of a high complexity library for clonal tracking of individual cells. We generated a retroviral vector and introduced a stretch of 10 random nucleotides by parallel cloning (1*10.sup.6 bacterial colonies, reaching the theoretical maximum complexity of a 10 nt UMI 4.sup.10=1.048*10.sup.6) and confirmed presence of random barcodes by NGS. Subsequently, several sub-pools of sgRNAs targeting a total of 6560 different mouse genes including all nuclear genes with 4 sgRNAs/gene were cloned into the vector-barcode-library at a coverage of 954-8776 independent cloning events per sgRNA. We also included a set of 112 non-targeting control sgRNAs in the library (Wang et al. Science 343, 80-84 (2014)) (FIG. 4a, b, FIG. 5, and Methods in Examples 1-10). Thus, each sgRNA is combined with a different barcode in each ligation event. The combination of sgRNA and barcode together represents the unique molecular identifier (UMI). The overall library complexity reached 83.5 million, which exceeds the number of clones assayed within a screen and thus allowed for tracking of individual cells in subsequent genetic screens. To account for the large quantities of genomic material that need to be processed subsequent to genetic screens, the vector design further allows for enrichment of genomic DNA containing sgRNA integrations prior to PCR amplification by PacI endonuclease digest and subsequent size selection (FIG. 4c, see Methods and step-by-step protocol). This step enriches the PCR-templates 10.sup.3 to 10.sup.4-fold (FIG. 4d) and thereby minimizes the required number of PCR reactions and cycles, and thus PCR amplification biases. By integration of specific sequence stretches, the vector design further allows the direct use of standard Illumina primers (FIG. 4e).

[0166] We applied our sequencing strategy on plasmid DNA of our libraries to analyze the representation of sgRNAs and UMIs. The number of distinct sequenced UMIs per sgRNA correlated with the estimation of cloning depth based on number of bacterial colonies obtained. The relative difference in abundance of guides in the library at the 10th and 90th percentile is 4-fold (FIG. 4f), which compares well to other published sgRNA libraries (Koike-Yusa et al. Nature Biotechnology 32, 267-273 (2013), Wang et al. Science 343, 80-84 (2014) and Hart, T. et al. Cell 163, 1515-1526 (2015)). Taken together, we generated libraries targeting mostly protein coding domains of nuclear proteins with even distribution and highly complex UMIs that allow for single cell based CRISPR screens.

Example 13: Single Cell Lineage Tracing Improves Signal-to-Noise Ratio in Dropout CRISPR Screening

[0167] To test CRISPR-UMI, we exposed cells to the chemotherapeutic drug etoposide, which generates DNA double strand breaks via inhibition of topoisomerase II (Burden, D. A. et al. Journal of Biological Chemistry 271, 29238-29244 (1996)). Cells that are defective in double strand break repair pathways, such as non-homologous end joining (NHEJ), are expected to be sensitive to this treatment (Srivastava, M. et al. Cell 151, 1474-1487 (2012), Kurosawa, A. et al. PLoS ONE 8, e72253 (2013), Fattah, F. J. et al. DNA Repair (Amst.) 15, 39-53 (2014) and Jackson, S. P. & Bartek, J. Nature 461, 1071-1078 (2009)).

[0168] To optimize the conditions for single cell derived clonal CRISPR screening, we performed a pilot screen on a subset of 365 genes (1437 guides) associated with DNA damage response. We infected mouse embryonic stem cells (mESCs) harboring a doxycycline (Dox)-inducible Cas9 cassette with retroviral vectors delivering our sgRNA library and selected for G418 resistant clones. Clone size and number can be modified by varying the limiting dilutions as well as the time of clonal expansion of infected cells (FIG. 6a, b). Importantly, the number of NGS reads required for the CRISPR-UMI screen does not exceed the number needed for conventional analysis. Given this limited sequencing space for a full-scale experiment, we tested a matrix of 4000 reads/guide in 4 conditions and aimed for i) 20 clones of 200 reads, ii) 64 clones of 64 reads, iii) 200 clones of 20 reads and iv) no limiting dilution or expansion resulting in 4000 mostly independently infected cells (FIG. 6c, see Methods in Examples 1-10). In each setup, we evaluated the performance of positive control guides targeting core members of the NHEJ complex. We combined the median depletion of individual clones (FIG. 6d) per guide with a p-value for depletion computed using MAGeCK 0.5.5 (Li, W. et al. Genome Biol. 15, 554 (2014)), which ranks individual clones for one guide relative to the full dataset to evaluate significance of bias towards depletion (FIG. 6e). Thus, we combined many individual clones carrying the same guide using MAGeCK similarly to combining independent guides targeting the same gene. In doing so, we reached one level deeper into the screen, namely down to clonal analysis. We identified all core members of the NHEJ complex, with multiple guides scoring highly significant, validating quality of the guide prediction and library. We next compared the different conditions of clone sizes. 
To identify the optimal compromise between number of clones and p-value per clone, we quantified performance as signal-to-noise ratio (SNR) of NHEJ pathway members relative to control sgRNAs by calculating the distance of each sgRNA to the origin of the volcano plot (FIG. 6e). SNR reached similar values of 15-18 in all conditions, but was best at 148 clones of 35 reads. Importantly, guide representation translates into a variable number of clones per guide, while clone size is not affected by sgRNA abundance. We estimated, based on the sgRNA distribution in our library (FIG. 4f), that 148 clones of 35 reads would also result in the lowest number of guides represented in too few clones for analysis (0.06% versus 1.4% or 6.6%, FIG. 6e). We therefore concluded that the ideal parameters for single cell lineage tracing CRISPR screening in our setting are roughly 150 clones with on average 30-40 reads; however, this might vary depending on application.

[0169] Using this optimized clone number and size, we next tested CRISPR-UMI in an etoposide screen with the full library of 26514 guides targeting 6560 genes, containing all predicted nuclear genes in the mouse genome, as well as orthologues of drugged human genes (FIG. 7a). Using the same mESC line carrying Dox-inducible Cas9, we compared performance at the guide level as a measure of sensitivity and specificity in CRISPR-based negative selection screens. Depletion levels and p-values of depletion for pools versus single cell derived clones were evaluated and we observed clonal variance as well as increased levels of depletion at the single cell level (FIG. 7b, c).

[0170] To determine if CRISPR-UMI outperforms conventional analysis, we combined all clones per guide and plotted median depletion for each guide versus a p-value computed using the robust ranking algorithm of MAGeCK 0.5.5 (Li, W. et al. Genome Biol. 15, 554 (2014)) (FIG. 7e, compare to 7d). Median depletion of multiple individual clones mostly representing LOF clones is expected to give a better measure for biological effect (see also FIG. 2e). Indeed, CRISPR-UMI resulted in a better separation of signal from noise (SNR.sub.CRISPR-UMI=9.2 versus SNR.sub.CRISPR=6.4 for NHEJ/control) and stronger depletion (Depl..sub.CRISPR-UMI=2.4 versus Depl..sub.CRISPR=1.4 for NHEJ/control) compared to conventional analysis (FIG. 7d, e).

[0171] To directly compare performance of conventional versus CRISPR-UMI analysis, we ranked guides within each dataset (see Methods). We observed a high degree of correlation between conventional and single-cell based screen analysis (Pearson correlation: 0.751), but we also observed guides that were uncovered only with conventional or clonal CRISPR analysis (FIG. 8a).

[0172] Intriguingly, we found that discrepancies are due to strong outlier clones with vastly overrepresented read numbers dominating the total read space in all discrepant cases (FIG. 8b and FIG. 9). These strong outliers interfere substantially with conventional analysis, which assumes that all cells within the experiment are equally represented. For guide 1 against Trim71 and guide 2 targeting Ell, which only scored in conventional analysis, outlier clones showed reduced read count in the treated sample compared to the control. These outliers reduced the total reads for these guides, masking the fact that all other clones of these guides showed no obvious tendency for depletion (FIG. 8b). In contrast, guides that were exclusively identified by CRISPR-UMI, such as Lig4 guide 3 and Rad9a guide 4, were associated with outlier clones having increased read counts in treated cells compared to the control, resulting in increased total reads for these guides and masking the depletion of most of their clones. We thus hypothesized that these outlier clones cause the discrepant results obtained between conventional and clone-based analysis.

[0173] To test this hypothesis, we removed the 2 clones with most reads from the data of each guide and reanalyzed all data using both methods. This resulted in a realignment of both analysis pipelines (see asterisks in 8a, curated Pearson correlation: 0.764). Upon removal of the outliers, the guides that were previously only identified by CRISPR-UMI were now also uncovered by conventional analysis, whereas the guides that were previously identified uniquely by conventional analysis dropped to a lower position in the conventional MAGeCK ranking of genes (Trim71: 9 to 214; Ell: 20 to 4416) upon outlier removal. Thus, the guides uniquely identified by population based methods were putative false positives, and those missed by conventional methods were putative false negatives. We conclude that in this screening regimen, outlier clones, which usually remain undetected, confounded conventional screen analysis but not CRISPR-UMI. We considered that the limiting dilution step in our protocol might underlie the observation of outlier clones. To investigate this possibility, we performed the same etoposide dropout screen without limiting dilution. A comparison of both scoring algorithms again revealed several guide RNAs that were differentially called by conventional analysis versus CRISPR-UMI. Once more, these discrepancies were due to outlier clones and highlighted shortcomings in conventional analysis (FIG. 10). Taken together, outlier clones are not introduced by limiting dilution. CRISPR-UMI analysis of pooled screens is thus superior to conventional analysis as it avoids putatively false positive or negative calls arising from clonal variation and outliers.
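The effect of a single outlier clone on pooled versus clone-level depletion, and of removing the clones with most reads, can be illustrated with the following sketch. It uses hypothetical read counts, assumes equal library sizes for treated and control samples (so raw reads stand in for RPM), assumes every clone has control reads, and defines "most reads" as treated plus control reads combined.

```python
import statistics

def pooled_depletion(clones):
    """Conventional view: sum all reads of a guide, then take treated / control.

    clones: list of (treated_reads, control_reads) tuples, one per clone.
    """
    treated = sum(t for t, _ in clones)
    control = sum(c for _, c in clones)
    return treated / control

def median_clone_depletion(clones):
    """Clone-level view: median of the per-clone treated / control ratios."""
    return statistics.median(t / c for t, c in clones)

def drop_top_clones(clones, k=2):
    """Remove the k clones with the most reads (treated + control combined)."""
    return sorted(clones, key=lambda tc: tc[0] + tc[1], reverse=True)[k:]
```

With one expanding outlier clone of (900, 300) reads and nine depleted clones of (5, 50) reads each, the pooled ratio is 945 / 750 = 1.26 and the guide appears not to be depleted, whereas the median clone depletion is 0.1. After removing the two clones with most reads, the pooled ratio also drops to 0.1.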

[0174] To compare the performance of both scoring methods directly, we asked how many of the positive control sgRNAs designed to target the NHEJ complex (Lig4, Xrcc4-6, Nhej1) score amongst the top 50 or 100 guides (FIG. 8c). Whereas conventional analysis only calls 7 and 8 out of 21 sgRNAs, respectively, CRISPR-UMI scores 12 and 13 sgRNAs within the top 50 or top 100. Next, we determined the reproducibility on sgRNA level of all genes identified in the screen. We plotted the average number of sgRNAs present per gene (for all genes hit by the respective group of guides) as a function of increasing numbers of sgRNAs according to rank (FIG. 8d; e.g. if 15 genes are hit by the top 30 sgRNAs, the value is 2; a value of 1 would be expected for a random dataset). For both full library screens (with and without clonal dilution-expansion) CRISPR-UMI clearly outperformed conventional analysis across the entire hit list and showed higher reproducibility between sgRNAs.

[0175] To evaluate the results of each method of analysis on the gene level, we combined guides using MAGeCK for conventional analysis and Fisher's method for CRISPR-UMI (see Methods) and ranked them according to score within each method (FIG. 8e). Both methods scored multiple expected hits in DNA repair pathways, which we color-coded according to function (FIG. 7g) (Jackson, S. P. & Bartek, J. Nature 461, 1071-1078 (2009), Black et al. Genes (Basel) 7, 67 (2016) and Takata et al. Nat Commun 4, 2338 (2013)). Furthermore, both methods identified a specific proton pump, Abcc1, as well as a novel SUMO E3 ligase, Zfp451. CRISPR-UMI outperformed conventional analysis methods not only on guide level but also on gene level, showing stronger depletion and better ranking of hits. To test if the genes identified in the negative selection screen indeed validate experimentally, we derived several independent KO cell lines for common hits (Lig4 and Zfp451) as well as conventional analysis specific (Slc25a4, Adcy3, Trim71) and CRISPR-UMI specific hits (Rad9a, Erbb4, Rac1) and quantified dropout in response to etoposide relative to control (FIG. 4f). Both common hits showed strong dropout, and Rad9a and Erbb4 also depleted significantly, while the conventional analysis specific hits did not validate. Of note, effect size in screen readout and validation also correlated well. Reassuringly, genes identified by CRISPR-UMI, namely Rad9a and Erbb4 (FIG. 8e and FIG. 9c), were both previously implicated in DSB repair or decatenation of DNA in response to TopoII inhibition (Gilmore-Hebert et al. Mol. Cancer Res. 8, 1388-1398 (2010), Icli et al. Biochem. Biophys. Res. Commun. 418, 116-121 (2012), Mukherjee et al. Seminars in Radiation Oncology 20, 250-257 (2010), Greer et al. J. Biol. Chem. 285, 15653-15661 (2010), He et al. Nucleic Acids Res. 39, 4719-4727 (2011) and Smilenov et al. Cancer Res. 65, 933-938 (2005)).

[0176] Taken together, we identified multiple known as well as novel proteins involved in DNA repair upon etoposide-mediated topoisomerase II inhibition. Moreover, we show that the quality of screening results obtained by CRISPR-UMI exceeds that of results generated conventionally, both in robust identification and in quantification of phenotypes.

Example 14: Positive Selection Screen to Elucidate Roadblocks of Reprogramming

[0177] Next, we chose a robust iPS cell induction protocol (Stadtfeld et al. Nature Methods 7, 53-55 (2010)) to test CRISPR-UMI during a stochastic single cell positive selection paradigm. The clonal dilution step was omitted from this approach to keep barcode complexity as high as possible, as the stochasticity of iPS induction replaces the limiting dilution and generates the clones for CRISPR-UMI analysis (FIG. 11a, compare to 2e, f). We collected mouse embryonic fibroblasts (MEFs) that contain a Dox-inducible Oct4-Klf4-Sox2-Myc (OKSM) cassette as well as an endogenous Oct4-GFP reporter, which can serve as a proxy for successful reprogramming. We infected these MEFs with a lentiviral construct delivering Cas9 and subsequently our sgRNA barcode library (FIG. 11a). Six days post infection (day 0), we induced OKSM by Dox treatment for 7 days. Cells were then passaged and cultured without Dox to assay iPS fate independent of exogenous OKSM, followed by FACS-purification of Oct4-GFP positive cells on day 11 to isolate iPS cells. Subsequently, we isolated genomic DNA from iPS cells (“treated”) and MEFs (“untreated”) and determined sgRNA abundance as well as the number of incidents of independent guide-barcode combinations (i.e. iPS colonies) (FIG. 11b). The incidence, reported by the number of independent barcodes, reflects probability of iPS colony formation, whereas sgRNA read abundance reports total amount of iPS cells. We identified many known roadblocks of reprogramming such as Trp53 (Marion et al. Nature 460, 1149-1153 (2009)), Pten (Liao et al. Mol. Ther. 21, 1242-1250 (2013)), Dot1l (Onder et al. Nature 483, 598-602 (2012)), Socs3 (Buckley et al. Cell Stem Cell 11, 783-798 (2012)), Sae1, Uba2 and Chaf1a (Cheloufi et al. Nature 528, 218-224 (2015)) (FIG. 11b). In the inventive aspects, such known reprogramming targets are preferably not used alone but in combination with another reprogramming target of the invention, such as Menin. 
Of note, guides against Senp1, Socs3, and Dot1l primarily scored on the incidence axis, and for Trp53 almost 100% of UMIs gave rise to iPS cell colonies, presumably due to the expansion of the MEF population prior to reprogramming. This highlights the additional resolution gained from this additional readout parameter.

[0178] We also identified several novel candidates for roadblocks of reprogramming and performed validation experiments for 20 genes (FIG. 11c). As reprogramming is strongly dependent on timing and dose of OKSM (Cheloufi et al. Nature 528, 218-224 (2015)), we used two approaches for validation: (1) low OKSM levels obtained with the Dox-inducible OKSM MEFs, mimicking the original screen, and (2) high OKSM levels obtained by lentiviral OKSM delivery. We included knockdown of Ube2i as positive control, which improved reprogramming >100-fold in this setting. As expected, almost all sgRNAs enhanced reprogramming efficiency in the screening approach (FIG. 11c, d and FIG. 12a). However, the effect size for each gene varied between both reprogramming systems, in agreement with prior findings, suggesting system-specific roadblocks of the iPS reprogramming process (Santos et al. Cell Stem Cell 15, 102-110 (2014) and Rais et al. Nature 502, 65-70 (2013)). Importantly, targeting of novel genes such as Pias1, an E3 SUMO-protein ligase, and Men1, encoding the transcriptional cofactor Menin, markedly outperformed all tested and previously identified roadblocks in a primary reprogramming regimen. Taken together, our screening approach based on a combination of abundance of guide RNA as well as independent clones identified multiple known as well as novel roadblocks of reprogramming.

[0179] We next wanted to make further use of our barcodes to more deeply analyze data from the primary screen, and focused on those sgRNAs that most significantly enhanced reprogramming efficiency. We plotted median iPS colony size by quantifying the read count for each barcode-tagged individual colony (FIG. 11e, FIG. 12b). Of note, colony size reflects the reprogramming speed and/or growth kinetics of resultant iPS colonies and is different from the stochastic probability of establishment of colonies; both parameters together result in the increased abundance of sgRNA reads. Interestingly, we observed a distribution of read counts that was gene specific and relatively reproducible between sgRNAs. This result was immediately suggestive of biology and predicts that distinct colony sizes will be obtained with different guide RNAs. To test if read count distribution reflects true biological outcomes regarding the incidence of iPS colony formation versus the size of such colonies, we imaged colony appearance 10 days after Dox induction for guides that were predicted to generate particularly large or small colonies. Indeed, the observed colony size correlated closely with the expected size distribution from primary screen analysis (FIG. 11f and FIG. 12c). In summary, we show that single cell based CRISPR analysis in a regimen of stochastic positive selection can robustly identify hits, as well as predict the variation of probability over variation of event quantity. We show that new reprogramming targets for iPS cell generation have been found, which were further validated in an iPS cell assay.

Example 15: Genetic Modulation of Human Reprogramming Efficiency

[0180] Primary human dermal fibroblasts were infected with lentiviral constructs carrying either knockdown shRNA constructs, or sgRNAs plus Cas9 to genetically target gene loci. These constructs additionally carried a blasticidin selection cassette. Subsequently, fibroblasts were selected for successful infection and infected again 46 days after the initial infection with a lentiviral vector carrying an expression cassette for the 4 “Yamanaka” factors Oct4, Klf4, Sox2, and Myc (OKSM) coupled to a puromycin selection cassette. Puromycin selection was initiated the day after. Six days after OKSM infection, cells were trypsinized and transferred onto a feeder cell layer consisting of CF-1 cells. Cells were maintained for another 2 weeks to evaluate reprogramming efficiency by alkaline phosphatase (AP) staining, which is shown in FIG. 13. AP stains iPS cell colonies dark blue (arrowheads), while fibroblasts do not stain or appear as fibroblastic stained cells (asterisk).

[0181] Reprogramming efficiency with empty shRNA or empty sgRNA control vectors was low, as expected. While sgRNAs against CHAF1A, SAE1, and UBE2I did not enhance reprogramming efficiency, reprogramming was markedly improved upon knockdown of Menin mRNA or, even more pronouncedly, upon editing of the MEN1 locus. In conclusion, Menin activity prevents efficient reprogramming of human dermal fibroblasts. Inhibition of Menin thus presents an efficient method of enhancing reprogramming in human samples.

Example 16: Induced Differentiation of ESC to iN (Induced Neurons)

[0182] Expression of the proneural factors Ascl1 and Ngn2 leads to direct conversion of embryonic stem cells (ESC) to neurons without intermediate states. Compared with previous differentiation protocols induced purely by growth factors, this regimen generates neurons in a simpler way (a one-step protocol rather than multi-step differentiation protocols with multiple medium conditions) and faster (only 4-5 days to generate beta-III-tubulin (Tuj) positive cells compared to 7-14 days, e.g. in Gaspard et al., Nature Protocols 4(10), 2009: 1454-1463), with near 100% purity and a more uniform neuronal subtype as the differentiation end point. Furthermore, this method is more cost effective due to the reduced requirement for growth factors.

[0183] ESC carrying a doxycycline-inducible Ascl1 (Achaete-Scute Family BHLH Transcription Factor 1) or Ngn2 (Neurogenin 2) cassette (Ascl1-ESCs and Ngn2-ESCs) and constitutively active Cas9 were infected with retrovirus carrying a guide RNA against Menin to introduce a menin knockout. ES cells were plated at clonal density, and individual colonies were picked and genotyped to confirm homozygous Menin knockout. The corresponding clones were expanded and exposed to 7 days of doxycycline treatment. From day four on, cells were treated with the drug AraC (Cytosine β-D-arabinofuranoside) to eliminate dividing cells and purify the neuron population. At day 7 of dox treatment, cells were analyzed using fluorescence activated cell sorting (FACS) for the expression of the endogenously tagged pan-neuronal gene MAPT (Microtubule-Associated Protein Tau) with a P2A-Venus reporter. Cell numbers were compared between menin knockout ESC and ESC without menin knockout. Data were acquired from biological triplicates. The first plot in FIG. 14 illustrates the mean number of iN derived from Ascl1 and Ngn2 cell lines with and without menin knockout. The boxplots of FIGS. 14 (b) and (c) show data from two clones with confirmed homozygous menin knockout and the corresponding parent cell line without menin knockout. The data show that neuronal transdifferentiation using the neuronal transcription factors Ascl1 or Ngn2 can be enhanced by Menin inhibition.

Example 17: Transdifferentiation MEF to iN

[0184] Enforced expression of transdifferentiation-inducing genes is currently the only method to convert MEFs into functional neurons besides the detour via reprogramming using e.g. Yamanaka factors. Furthermore, transdifferentiation is devoid of teratoma formation, which can arise from incomplete neuronal differentiation from ESC. MEFs (mouse embryonic fibroblasts) carrying an inducible Ascl1 cassette (Ascl1-MEFs) were infected with viruses carrying Menin guides and Cas9 to introduce a menin knockout. Cells were plated on coverslips covered with a layer of P53-knockdown immortalized primary glia obtained from P3 mouse pups. After 13 days of doxycycline treatment, cells were fixed using 4% PFA and stained for the pan-neuronal marker beta-III-tubulin (Tuj). The number of Tuj-positive neurons per defined area (1.64 μm²) on the coverslips was obtained from confocal images using the manual cell counting tool of the Fiji software.

[0185] FIG. 15 shows cell images with and without menin knockout. The plot of FIG. 15 illustrates the difference in iN number obtained from the Ascl1 cell line after menin knockout and with the empty guide control. Experiments were performed as biological triplicates.

[0186] The data confirm the results of Example 16, namely that neuronal transdifferentiation using the neuronal transcription factor Ascl1 can be enhanced by Menin inhibition, also with different starting cells.

Example 18: Results Summary

[0187] We applied CRISPR-UMI to a sensitizer screen for etoposide and identified all the expected genes in the NHEJ pathway, as well as unanticipated genes such as the transporter Abcc1 and the SUMO E3 ligase Zfp451, both by conventional and by CRISPR-UMI analysis. Interestingly, mutations in Zfp451 have recently been associated with cellular stress including DNA damage; however, a direct role in DNA damage resistance had not been reported. We therefore propose that chemical inhibition of Zfp451 will show strong synergy with etoposide in rapidly cycling tumor cells. CRISPR-UMI uncovered additional hits, Rad9a and Erbb4, that have previously been associated with the DNA damage response and were not identified using the conventional analysis because multiple single outlier clones dominated sequencing space. Elimination of such outliers in conventional analysis also removed putatively false positive hits such as Trim71 and Eli from the top scoring list. Furthermore, depletion levels based on median clone depletion, in particular for efficient guides, predict true biological effects more accurately than classical analysis, which suffers from a conceptual maximal level of measurable depletion.
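The effect of single outlier clones on a read-based analysis, as opposed to a median-per-clone analysis, can be illustrated with a minimal Python sketch. It assumes read counts per barcode that are already normalized to sequencing depth, before and after selection. The function name `log2_fold_changes`, the pseudocount and the toy counts are hypothetical and do not reproduce the scoring actually used in the screen.

```python
from math import log2
from statistics import median


def log2_fold_changes(before, after, pseudocount=1.0):
    """Compare a pooled and a per-clone depletion estimate for one sgRNA.

    before, after: dicts mapping UMI -> read count (assumed depth-normalized).
    Returns (pooled log2 fold change, median per-clone log2 fold change).
    """
    pooled = log2((sum(after.values()) + pseudocount)
                  / (sum(before.values()) + pseudocount))
    per_clone = [
        log2((after.get(umi, 0) + pseudocount) / (count + pseudocount))
        for umi, count in before.items()
    ]
    return pooled, median(per_clone)


# Hypothetical toy data: four clones are depleted about tenfold,
# while one outlier clone expands and dominates the sequencing space.
before = {"umi1": 99, "umi2": 99, "umi3": 99, "umi4": 99, "umi5": 99}
after = {"umi1": 9, "umi2": 9, "umi3": 9, "umi4": 9, "umi5": 1999}

pooled_lfc, median_lfc = log2_fold_changes(before, after)
print(f"pooled log2 fold change: {pooled_lfc:.2f}")
print(f"median clone log2 fold change: {median_lfc:.2f}")
```

In this toy example the pooled estimate is positive, because the single expanded clone outweighs the four depleted ones, whereas the median per-clone estimate still reports the depletion seen in the majority of clones.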

[0188] Furthermore, CRISPR-UMI allowed us to score the number of independent iPS cell colonies formed in a single screen, and thus to identify well-known as well as new roadblocks of reprogramming. Importantly, the expected roadblocks of reprogramming Dot1l and Socs3 mostly scored with increased incidence, i.e. colony number, and would potentially have been missed in a purely read-based analysis. CRISPR-UMI identified Pias1, an E3 ligase of SUMOylation, and Menin, neither of which was previously implicated in iPS reprogramming. Interestingly, loss of Menin has been associated with facilitation of other lineage identity switches, such as in vivo transdifferentiation of glucagon-expressing cells to insulinomas, potentially pointing to a more general role in the maintenance of lineage identity.

[0189] By studying effect size and number, i.e. the number of independent iPS cell colonies and the read numbers obtained from each event, we can predict biological function directly from NGS data obtained from the screen. CRISPR-UMI identified conditions resulting primarily in faster reprogramming and thus bigger individual iPS colonies, as confirmed by validation experiments. Examples are sgRNAs targeting Axin, APC and Tcf7l1, which lead to few but very big iPSC colonies. Indeed, Axin together with APC forms a destruction complex of beta-catenin, negatively regulating Wnt signaling, and acts through early promotion of endogenous pluripotency gene expression. This complex is often targeted via the Gsk3 inhibitor CHIR99021 to enhance reprogramming. Also, Tcf7l1 (sometimes referred to as Tcf3) inhibition was previously described to specifically promote early reprogramming stages by functioning as a transcriptional repressor of Wnt targets. In contrast to the Wnt signaling axis, targeting of Dot1l, Socs3, and Senp1 resulted primarily in an increased number of independent UMIs with few reads, representative of small iPS cell colonies. The probability of a fibroblast cell dedifferentiating into an iPS colony in the transgenic system we used is typically below 1%. Therefore, rather than affecting kinetics, these genes modulate the likelihood of iPS colony formation. The ubiquitin E3 ligase Socs3 was previously reported as a negative regulator of Stat3 signaling. Thus, Socs3 knockout boosts Stat3 signaling downstream of LIF, potentially explaining the increased numbers of UMIs we observed. However, Socs3 knockout also led to increased differentiation to trophoblast giant cells in our validation, resulting in low absolute numbers of iPS cells, in line with previous observations that Socs3 results in differentiation of trophoblast stem cells to giant cells.
Taken together, we hypothesize that Socs3 knockout boosts LIF-induced Stat3 signaling, which leads to increased reprogramming towards iPSCs but then also induces differentiation to trophoblast giant cells, resulting in the observed small cell numbers per unique molecular identifier. Similarly, targeting of Dot1l also resulted mostly in an increased number of independent colonies. These observations demonstrate the additional insights uncovered by positive selection screens. Moreover, because we can count the number and quantify the effect of incidents, this method can detect clonal effects in positive selection screens and thereby avoid false positive calls caused by clonal outgrowth, e.g. due to double infection of one cell with 2 guide vehicles, whereby one positively selecting guide also enriches for another passenger guide.

[0190] Our studies were further validated by generating human iPS cells using menin suppression or editing, thereby ablating menin activity. The iPS cells behaved like ESCs and could be reprogrammed to neuronal cells.

CPC classification (Section C, CHEMISTRY; METALLURGY): C12N2501/999, C12N2501/606, C12N2501/65, C12N2501/603, C12N5/0696, C12N2501/605, C12N2310/11, C12N2501/608, C12N2501/602, C12N15/113, C12N2501/604, C12N2310/141

International classification (Section C, CHEMISTRY; METALLURGY): C12N5/074, C12N15/113
