TRANSCRIPTIONAL RECORDING BY CRISPR SPACER ACQUISITION FROM RNA
20220049232 · 2022-02-17
Assignee
Inventors
Cpc classification
C12N2310/20
CHEMISTRY; METALLURGY
C12Q2521/107
CHEMISTRY; METALLURGY
C12N9/22
CHEMISTRY; METALLURGY
C12N2800/22
CHEMISTRY; METALLURGY
C12N2800/80
CHEMISTRY; METALLURGY
C12N15/11
CHEMISTRY; METALLURGY
C12Q1/6806
CHEMISTRY; METALLURGY
International classification
C12N9/22
CHEMISTRY; METALLURGY
C12N15/11
CHEMISTRY; METALLURGY
C12Q1/6806
CHEMISTRY; METALLURGY
Abstract
The present invention relates to a method for recording a transcriptome of a cell, the method comprising the steps of: providing a test cell comprising: a first transgene nucleic acid sequence encoding a fusion protein comprising a reverse transcriptase polypeptide and a Cas1 polypeptide and a second transgene nucleic acid sequence encoding a Cas2 polypeptide, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence are under transcriptional control of an inducible promoter sequence, and a third transgene nucleic acid sequence comprising a CRISPR direct repeat (DR) sequence, wherein said CRISPR direct repeat sequence is specifically recognizable by an RT-Cas1-Cas2 complex formed by the expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence; exposing said test cell to conditions under which expression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence is induced, wherein said RT-Cas1-Cas2 complex formed by expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence acquires protospacers from RNA molecules and integrates spacers into said third transgene nucleic acid sequence yielding a modified third transgene nucleic acid sequence; isolating said modified third transgene nucleic acid sequence from said test cell yielding an isolated modified third transgene nucleic acid sequence; and sequencing said isolated modified third transgene nucleic acid sequence.
Claims
1. A method for recording a transcript, particularly for recording a transcriptome, of a cell, the method comprising the steps of: providing a test cell comprising: a first transgene nucleic acid sequence encoding a fusion protein comprising a reverse transcriptase polypeptide and a Cas1 polypeptide and a second transgene nucleic acid sequence encoding a Cas2 polypeptide, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence are under transcriptional control of an inducible promoter sequence, and a third transgene nucleic acid sequence comprising a CRISPR direct repeat (DR) sequence; wherein said CRISPR direct repeat sequence is specifically recognizable by an RT-Cas1-Cas2 complex formed by the expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence, in an exposure step, exposing said test cell to conditions under which expression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence is induced, wherein said RT-Cas1-Cas2 complex formed by expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence acquires at least one protospacer, particularly more than one protospacer, from one or more nucleic acid molecules, more particularly one or more RNA molecules, and integrates said protospacer as spacer into said third transgene nucleic acid sequence yielding a modified third transgene nucleic acid sequence comprising at least one integrated spacer, isolating said modified third transgene nucleic acid sequence from said test cell yielding an isolated modified third transgene nucleic acid sequence, and sequencing said isolated modified third transgene nucleic acid sequence.
2. The method according to claim 1, wherein said third transgene nucleic acid sequence further comprises a CRISPR leader sequence, wherein said CRISPR leader sequence is specifically recognizable by said RT-Cas1-Cas2 complex formed by the expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence.
3. The method according to claim 1 or 2, wherein said third transgene nucleic acid sequence does not comprise any further CRISPR direct repeat sequence.
4. The method according to any one of the preceding claims, wherein said test cell additionally comprises a fourth transgene nucleic acid sequence encoding a sensor, wherein said sensor will be activated when contacted with an analyte molecule yielding an activated sensor, wherein said activated sensor will induce the expression of a record gene inside the cell; and wherein in said exposure step, if said analyte molecule is present, said activated sensor induces the expression of a record gene inside the cell and RNA derived from said record gene is acquired as a spacer.
5. The method according to any one of the preceding claims, wherein said CRISPR leader sequence and/or said CRISPR direct repeat sequence are specifically recognizable by an RT-Cas1-Cas2 complex of F. saccharivorans, Candidatus Accumulibacter, Eubacterium saburreum, Bacteroides fragilis, Campylobacter fetus, Teredinibacter turnerae, Woodsholea maritima, Desulfarculus baarsii, Azospirillum lipoferum, Cellulomonospora bogoriensis, Micromonospora rosaria, Tolypothrix campylonemoides, Oscillatoriales cyanobacterium, or Rivularia sp., or an RT-Cas1-Cas2 complex originating thereof.
6. The method according to any one of the preceding claims, wherein said test cell is an E. coli cell.
7. The method according to any one of the preceding claims, wherein said third transgene nucleic acid sequence is comprised within a vector, particularly an expression vector.
8. The method according to claim 7, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence are comprised within said vector.
9. The method according to any one of the preceding claims, wherein said conditions, under which expression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence is induced, lead to an overexpression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence.
10. The method according to any one of the preceding claims, wherein said conditions, under which expression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence is induced, comprise contacting said test cell with an inducer compound, particularly IPTG, lactose, arabinose, rhamnose or anhydrotetracycline; or comprise anaerobic conditions and said inducible promoter is an anaerobically inducible promoter.
11. The method according to any one of the preceding claims, wherein said third transgene nucleic acid sequence comprises an endonuclease recognition site sequence downstream of or within said CRISPR direct repeat, and said endonuclease recognition site sequence is specifically recognizable by a site-specific endonuclease, particularly a restriction endonuclease, wherein particularly said CRISPR direct repeat and said restriction site sequence are separated by 20 bps to 0 bps, and said site-specific endonuclease is particularly a Type IIS or Type IIG restriction endonuclease, particularly FaqI, BsmFI, BslFI, FinI, or BpuSI, and said isolated modified third transgene nucleic acid sequence is contacted with said site-specific endonuclease before said sequencing, wherein said (full length) CRISPR direct repeat (adjacent to said endonuclease site) is cleaved into a truncated CRISPR direct repeat sequence.
12. The method according to claim 11, wherein said sequencing comprises the use of a PCR primer, wherein said PCR primer comprises a nucleic acid sequence being essentially complementary to part of a full length CRISPR direct repeat sequence, but not fully complementary to said truncated CRISPR direct repeat sequences resulting from said endonuclease cleavage, within said modified third nucleic acid sequence, wherein said full length CRISPR direct repeat sequence results from or is formed by at least one spacer acquisition event.
13. The method according to any one of the preceding claims, wherein said first transgene nucleic acid sequence encoding a fusion protein comprising a reverse transcriptase polypeptide and a Cas1 polypeptide comprises or essentially consists of a sequence selected from SEQ ID NO 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31, or a sequence at least 85% identical, particularly ≥90%, ≥93%, ≥95%, ≥98% or ≥99% identical to SEQ ID NO 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, or 31 and the encoding polypeptide having substantially the same biological functionality as the polypeptide encoded by SEQ ID NO 7.
14. The method according to any one of the preceding claims, wherein said second transgene nucleic acid sequence encoding a Cas2 polypeptide comprises or essentially consists of a sequence selected from SEQ ID NO 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, and 32, or a sequence at least 85% identical, particularly ≥90%, ≥93%, ≥95%, ≥98% or ≥99% identical to SEQ ID NO 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32, and the encoding polypeptide having substantially the same biological functionality as the polypeptide encoded by SEQ ID NO 8.
15. The method according to any one of the preceding claims, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence together comprise or essentially consist of a sequence of SEQ ID NO 34, or a sequence at least 85% identical, particularly ≥90%, ≥93%, ≥95%, ≥98% or ≥99% identical to SEQ ID NO 34 and encoding polypeptides having substantially the same biological functionality as the polypeptides encoded by SEQ ID NO 34.
16. The method according to any one of the preceding claims, wherein said third transgene nucleic acid sequence comprising a CRISPR direct repeat (DR) sequence comprises or essentially consists of a sequence selected from SEQ ID NO 35 to 103.
17. An isolated nucleic acid molecule comprising: a CRISPR direct repeat (DR), wherein said isolated nucleic acid molecule does not comprise any further CRISPR direct repeat sequence.
18. The isolated nucleic acid molecule according to claim 17 additionally comprising a CRISPR leader sequence.
19. The isolated nucleic acid molecule according to claim 18, wherein said CRISPR leader sequence and said CRISPR direct repeat sequence are separated by 10 to 0 bp.
20. The isolated nucleic acid molecule according to any one of claims 17 to 19, further comprising an endonuclease recognition site sequence downstream of or within said CRISPR direct repeat, wherein said endonuclease recognition site sequence is specifically recognizable by a site-specific endonuclease, particularly a site-specific restriction endonuclease, and wherein particularly said CRISPR direct repeat and said restriction site sequence are separated by 20 bps to 0 bps, particularly by 10 bps to 0 bps.
21. The isolated nucleic acid molecule according to claim 20, wherein said site-specific endonuclease is a Type IIS or Type IIG restriction endonuclease, particularly FaqI, BsmFI, BslFI, FinI, or BpuSI.
22. The isolated nucleic acid molecule according to any one of claims 17 to 21, wherein said CRISPR leader sequence and/or said CRISPR direct repeat sequence are specifically recognizable by an RT-Cas1-Cas2 complex of F. saccharivorans, Candidatus Accumulibacter, Eubacterium saburreum, Bacteroides fragilis, Campylobacter fetus, Teredinibacter turnerae, Woodsholea maritima, Desulfarculus baarsii, Azospirillum lipoferum, Cellulomonospora bogoriensis, Micromonospora rosaria, Tolypothrix campylonemoides, Oscillatoriales cyanobacterium, or Rivularia sp., or an RT-Cas1-Cas2 complex originating thereof.
23. An expression vector comprising the following sequence elements: a first nucleic acid sequence encoding a fusion protein of a reverse transcriptase and a Cas1 polypeptide, and a second nucleic acid sequence encoding a Cas2 polypeptide, wherein said first nucleic acid sequence and said second nucleic acid sequence are under transcriptional control of an inducible promoter sequence, and a CRISPR array sequence comprising a CRISPR direct repeat (DR) sequence, wherein said CRISPR direct repeat sequence is specifically recognizable by a RT-Cas1-Cas2 complex formed by the expression products of said first nucleic acid sequence and said second nucleic acid sequence.
24. The expression vector according to claim 23, wherein said CRISPR array sequence further comprises a CRISPR leader sequence, wherein said CRISPR leader sequence and said CRISPR direct repeat sequence are separated by 10 to 0 bp.
25. The expression vector according to claim 23 or 24, wherein said CRISPR array sequence does not comprise any further CRISPR repeat sequence specifically recognizable by said RT-Cas1-Cas2 complex.
26. The expression vector according to any one of claims 23 to 25, further comprising an endonuclease recognition site sequence downstream of or within said CRISPR direct repeat, wherein said endonuclease recognition site sequence is specifically recognizable by a site-specific endonuclease, particularly a site-specific restriction endonuclease, and said CRISPR direct repeat and said restriction site sequence are separated by 10 bps to 0 bps.
27. The expression vector according to claim 26, wherein said site-specific endonuclease is a Type IIS or Type IIG restriction endonuclease, particularly FaqI, BsmFI, BslFI, FinI, or BpuSI.
28. The expression vector according to any one of claims 23 to 27, wherein said CRISPR leader sequence, said CRISPR direct repeat sequence, said first nucleic acid sequence and said second nucleic acid sequence originate from F. saccharivorans, Candidatus Accumulibacter, Eubacterium saburreum, Bacteroides fragilis, Campylobacter fetus, Teredinibacter turnerae, Woodsholea maritima, Desulfarculus baarsii, Azospirillum lipoferum, Cellulomonospora bogoriensis, Micromonospora rosaria, Tolypothrix campylonemoides, Oscillatoriales cyanobacterium, or Rivularia sp.
29. The expression vector according to any one of claims 23 to 28, wherein said inducible promoter sequence is operable in E. coli and is particularly selected from T7 promoter, lac promoter, tac promoter, P_tet promoter, P_C promoter and P_BAD promoter.
30. The expression vector according to any one of claims 23 to 29, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence are codon-optimized for E. coli.
31. A cell comprising a first transgene nucleic acid sequence encoding a fusion protein of a reverse transcriptase and a Cas1 polypeptide, and a second transgene nucleic acid sequence encoding a Cas2 polypeptide, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence are under transcriptional control of an inducible promoter sequence, and a transgene nucleic acid molecule according to any one of claims 15 to 20, wherein said first transgene nucleic acid sequence, said second transgene and said transgene nucleic acid molecule are comprised in an expression vector according to any one of claims 23 to 30 or integrated into the genome of said cell.
32. The cell according to claim 31, additionally comprising a fourth transgene nucleic acid sequence encoding a fourth transgene product, wherein said fourth transgene product is capable of modulating the expression of a record gene inside the cell, and wherein such modulating the expression of said record gene is dependent on the presence or absence of an analyte molecule.
33. The cell according to claim 32, wherein said fourth transgene product is a sensor which will be activated when contacted with a molecule of interest yielding an activated sensor, wherein said activated sensor will induce the expression of a record gene inside the cell.
34. A method for monitoring of a diet of a patient or for diagnosis of a disease of a patient, particularly of a digestive or gastrointestinal disorder of a patient, said method comprising the steps of collecting a cell according to claims 31 to 33 from a feces sample collected from said patient, wherein said cell has been previously applied orally to said patient, isolating the transgene nucleic acid sequence from said cell yielding an isolated transgene nucleic acid sequence, and sequencing said isolated transgene nucleic acid sequence thereby recording one or more transcripts of said cell produced in the environment of the gastrointestinal tract.
35. An apparatus for conducting the method of claim 34.
Description
DESCRIPTION OF THE FIGURES
[0112]-[0136] (Figure descriptions not reproduced.)
EXAMPLES
[0137] The inventors hypothesized that direct CRISPR spacer acquisition from RNA could be leveraged to store transcriptional records in CRISPR arrays within living cells. Therefore, several orthologous RT-Cas1-containing CRISPR-Cas systems were characterized. The inventors identified one from Fusicatenibacter saccharivorans that is capable of acquiring RNA spacers heterologously in E. coli. Leveraging F. saccharivorans RT-Cas1 and Cas2 (FsRT-Cas1-Cas2), the inventors developed Record-seq, a method enabling transcriptome-scale molecular recordings in populations of cells. Transcriptional events are recorded according to RNA abundance, stored in CRISPR arrays within DNA, and can be leveraged to describe continuous as well as transient complex cellular behaviors.
[0138] CRISPR Spacer Acquisition by FsRT-Cas1-Cas2
[0139] The inventors set out to identify an RT-Cas1-Cas2 CRISPR acquisition complex with the ability to acquire spacers directly from RNA upon heterologous expression in E. coli. The inventors identified 121 RT-Cas1 orthologs (Table 1), and selected 14 representatives for functional characterization (
[0140] Selective Amplification of Expanded CRISPR Arrays
[0141] Using the previously established spacer acquisition assay, the inventors obtained approximately 1300 newly acquired spacers per 1 million deep sequencing reads for FsRT-Cas1-Cas2 (
[0142] The inventors then employed Record-seq to rescreen their initial selection of RT-Cas1 orthologs (
[0143] Characteristics of FsRT-Cas1-Cas2 Spacer Acquisition
[0144] In order to better understand the properties of FsRT-Cas1-Cas2, the inventors extensively characterized newly acquired spacers by performing Record-seq on populations of E. coli overexpressing FsRT-Cas1-Cas2 (
[0145] FsRT-Cas1-Cas2 Acquires Spacers Directly from RNA
[0146] To determine whether FsRT-Cas1-Cas2 acquires spacers directly from RNA, the inventors utilized a self-splicing td group I intron. This intron is a functional ribozyme, catalyzing its own excision from the pre-mRNA, resulting in a characteristic splice junction that is not present at the DNA-level. The inventors constructed three intron-interrupted constructs based on genes that were highly sampled by spacers, namely cspA, rpoS and argR (
[0147] To further validate this finding, the inventors utilized the Enterobacteria phage MS2. MS2 phages exist as both sense and antisense single-stranded RNAs during their lifecycle but have no DNA intermediates. Given that MS2 phages require the F pilus for cell entry, which is missing in E. coli BL21(DE3) cells, the inventors turned to the E. coli K12 strain NovaBlue(DE3). Upon infection of FsRT-Cas1-Cas2 expressing cells with MS2 phage, the inventors could readily observe novel MS2-aligning spacers sampled from throughout the MS2 genome (
[0148] Recording of Arbitrary Transcripts Using Record-Seq
[0149] To assess the potential of FsRT-Cas1-Cas2 for quantitatively recording transcriptional events, the inventors utilized an inducible expression system to directly determine whether spacers were being acquired according to RNA abundance. The corresponding constructs contained super-folder GFP (sfGFP) or Renilla luciferase (Rluc) genes under transcriptional control of the anhydrotetracycline (aTc)-inducible P_tetA promoter. The inventors introduced these into E. coli cultured in increasing levels of aTc and subsequently harvested both total RNA and plasmid DNA for qRT-PCR and Record-seq, respectively (
[0150] To further generalize these findings, the inventors evaluated a second inducible expression system, placing the firefly luciferase (Fluc) gene downstream of the 3-oxohexanoyl-homoserine lactone (3OC6-HSL)-inducible P_LuxR promoter. Induction led to a 4-fold increase in Fluc-aligning spacers (
[0151] Record-Seq Shows Cumulatively Highly Expressed Genes
[0152] Considering that FsRT-Cas1-Cas2 acquired spacers directly from RNA in an abundance-dependent manner, the inventors investigated whether this could enable quantification of the cumulative cellular transcriptome. The inventors harvested both plasmid DNA for Record-seq and total RNA for RNA-seq from E. coli cultures overexpressing FsRT-Cas1-Cas2 (
[0153] Transcriptome-Scale Recording Reveals Cell Behaviors
[0154] To determine whether Record-seq could be used to record and describe complex cellular behaviors, the inventors turned to the well-studied oxidative stress and acid stress responses in E. coli. The inventors performed Record-seq on oxidative and acid stress stimulated FsRT-Cas1-Cas2 expressing cultures and analyzed cumulative expression counts using unsupervised hierarchical clustering as well as principal component analysis (PCA). Both approaches were successful in distinguishing treatment conditions, suggesting that Record-seq captured the differential molecular histories (
[0155] Sentinel Cells Encode Transient Herbicide Exposure
[0156] To determine whether Record-seq could be leveraged for producing sentinel cells, the inventors utilized the herbicide paraquat and determined if Record-seq could capture dose-dependent and transient exposures. Paraquat is a bacteriostatic herbicide that results in superoxide anion production in microbes, and is banned in a number of countries due to its acute toxicity in humans and use in suicide cases.
[0157] Using an improved FsRT-Cas1-Cas2 expression construct (
[0158] The inventors next determined whether Record-seq was also capable of capturing transient paraquat exposure in a physiological range. After transiently stimulating cultures with paraquat (
[0159] Sentinel Cells Recording the Gut Environment in Mice
[0160] Microbes have evolved to adapt and survive in diverse environments, including intestinal niches with diverse micronutrient availabilities. The gene expression patterns of these microbes reflect the extracellular environment they inhabit and could therefore provide key information on the nutrients that enable colonization as well as maintenance of commensal and pathogenic microbes. This could provide a clear entry point for devising and testing clinical interventions that attempt to address dysbiosis of gut microbiota, which has been causally linked to inflammatory bowel diseases (IBD) such as Crohn's disease and ulcerative colitis, as well as malnutrition, where supplementation with sugars and amino acids that are deficient in the diet has been demonstrated to be corrective in animal models and human infants. Unfortunately, microbial gene expression is transient and does not remain constant over time and throughout transit of microbes through the human intestine. Consequently, microbial gene expression patterns in intestinal niches are only accessible through highly invasive sample collection. The Record-seq technology presented by the inventors can address these limitations by creating sentinel cells that constantly record their environment as they transit through the mammalian intestine. It therefore has enormous potential to monitor human gut health and perturbations in the gut microbiome in a non-invasive manner, through collection of these sentinel cells from fecal sources, forming the basis for personalized medicine. Further, in combination with metagenomic data, Record-seq data from multiple sentinel microbes could help monitor changes in microbe-microbe and host-microbe interactions in the context of alterations in the gut.
[0161] The inventors investigated the potential of various strains of E. coli cells overexpressing FsRT-Cas1-Cas2 to function as transcriptional recorders (i.e. sentinel cells) when transiting through the murine gut. To this end the inventors monocolonized gnotobiotic C57BL/6 mice with BL21(DE3) or MG1655 E. coli cells encoding an anhydrotetracycline inducible FsRT-Cas1-Cas2 expression cassette through oral gavage. Expression of FsRT-Cas1-Cas2 was induced non-invasively via the administration of anhydrotetracycline through the drinking water of the animals along with kanamycin to ensure maintenance of the recording plasmid. Subsequently, these E. coli cells were longitudinally sampled from the feces of the mice as well as from different intestinal compartments at the endpoint of the experiment. Following plasmid DNA extraction, SENECA and deep-sequencing, the inventors could isolate newly acquired spacers (
[0162] Throughout their experiments, the inventors demonstrated that recording of new spacers increased when raising the concentration of aTc in the drinking water and thus inducing stronger FsRT-Cas1-Cas2 expression (
[0163] The inventors then assessed the potential of Record-seq to detect different microenvironments and disease conditions in the murine gut. In one example, the inventors induced colitis by administering 1%, 2% or 3% (w/v) dextran sulfate sodium (DSS) in the drinking water of the animals. The corresponding data can be used to classify the three treatment conditions using principal component analysis (PCA) merely by performing Record-seq on cells isolated from feces of the treated animals (
[0164] Similarly, in another experiment, the inventors were able to accurately distinguish whether animals were fed with a starch or a chow-based diet (
[0165] This was further bolstered by performing differential expression analysis on the respective Record-seq datasets to pinpoint the exact genes that were differentially expressed in response to different treatment conditions (
[0166] In additional experiments using E. coli MG1655 cells, the inventors confirmed that Record-seq could also readily distinguish three different diets, in this case based on chow, starch and fat (
[0167] Discussion
[0168] Here, the inventors describe Record-seq, a technology to encode transcriptome-scale events into DNA and assess the cumulative gene expression of populations of cells. The inventors demonstrate its potential by recording specific and complex transcriptional information. First, to improve upon existing spacer readout methods the inventors developed SENECA, resulting in a several thousand-fold improvement of spacer detection efficiency compared to recent reports, thereby enabling in-depth characterization of FsRT-Cas1-Cas2 and its application as a molecular recorder. The inventors' results suggest that RNA-derived spacers are preferentially acquired from the ends of abundant transcripts from AT-rich regions with no PAM, and are broadly sampled at transcriptome-scale, enabling the parallelized quantification of cumulative transcript expression.
[0169] In a set of experiments, the inventors show that upon increasing induction of arbitrary sequences, spacers are acquired in an orthogonal, dose-dependent manner and highly correlate with the absolute mRNA copy number in the cell, thus demonstrating that the molecular record faithfully recapitulates the initial stimulus in a predictable way. This also paves the way for increasingly multiplexed and orthogonal molecular recording devices. Upon inducing complex cellular behaviors, Record-seq provides a meaningful transcriptome-scale record of molecular events, which exceeds the capabilities of current molecular recording technologies that only record specific stimuli. Finally, the inventors use Record-seq to elucidate dose-dependent features of the complex cellular response to the bacteriostatic herbicide paraquat, and demonstrate that Record-seq, but not RNA-seq, is capable of recording transient paraquat stimulation.
[0170] Although additional work will greatly improve the capacity of Record-seq to encode richer and more dynamic expression and lineage information within fewer cells, the inventors' proof-of-principle experiments introduce a powerful tool to record transcriptome-scale events permanently in DNA for later reconstructing complex molecular histories from populations of cells. The inventors show that the recorded transcriptional histories reflect the underlying gene expression changes and could therefore be used to interrogate biological or disease processes. In the long term, the inventors envision that CRISPR spacer acquisition components could be introduced into other cell types to record the molecular sequence of events, and lineage path, that gives rise to particular cell behaviors, cell states and types.
[0171] Methods
[0172] Ortholog Discovery Pipeline
[0173] The protein sequence of Arthrospira platensis RT-Cas1 (WP_006620498) was used as a seed sequence, and a JACKHMMER search was run against all NCBI non-redundant protein sequences using HMMER v3.1b2 (E-value cutoff of 1E-05). Proteins with both Cas1 and RT domains were subsequently identified using HMMSCAN (E-value cutoff of 1E-05). Genome sequence information for the candidate proteins was retrieved and further inspected for the presence of RT-Cas1, Cas2, and a CRISPR array using CRISPRdetect v2.0, CRISPRone, and HMMSCAN. From 121 candidate proteins, 14 CRISPR loci were selected and subsequently aligned using MUSCLE v3.8.31 to identify candidate domains and catalytic residues. Genetic distances were computed using the Jukes-Cantor method and a phylogenetic tree was built using the Nearest-Neighbour method.
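By way of non-limiting illustration, the Jukes-Cantor correction named above can be sketched as follows. The sketch assumes pairwise comparison of already aligned sequences and uses the general form d = -b ln(1 - p/b) with b = (k - 1)/k, where p is the observed fraction of differing sites and k is the number of character states (4 for nucleotides, 20 for amino acids). The function names and the choice of k are assumptions made for illustration; the text does not state which alphabet or software was used for this step.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns containing a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float, states: int = 4) -> float:
    """Jukes-Cantor corrected distance for an observed p-distance.

    states=4 models nucleotides; states=20 models amino acids
    (hypothetical parameter, not taken from the source text).
    """
    b = (states - 1) / states
    if p >= b:
        # Sequences are saturated; the corrected distance is undefined.
        return math.inf
    return -b * math.log(1 - p / b)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(f"p = {p:.3f}, JC distance = {jukes_cantor(p):.4f}")
```

The correction grows faster than the raw p-distance because it accounts for multiple substitutions at the same site; for small p the two values are nearly equal.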
[0174] Bacterial Strains and Culture Conditions
[0175] Escherichia coli strains used in this study were Stbl3 (Thermo Fisher Scientific) for cloning purposes as well as BL21(DE3) Gold (Agilent Technologies), BL21AI (Invitrogen) and NovaBlue(DE3) (EMD Millipore) as a K12 strain for acquisition assays. All strains were made competent using the Mix & Go E. coli Transformation Kit & Buffer Set (Zymo Research) following the manufacturer's protocol with growth in ZymoBroth at 19° C. directly from fresh colonies. After transformation, cells were grown at 37° C. on lysogeny broth (LB) (Difco) 1.5% agar plates containing 50 μg/mL kanamycin and 1% glucose (w/v) to reduce background expression from the T7lac system. Liquid cultures for plasmid isolation were grown in TB media (24 g/L yeast extract, 20 g/L tryptone, 4 mL/L glycerol, 17 mM KH2PO4, 72 mM K2HPO4) containing 1% glucose (w/v).
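As a worked illustration of the TB recipe given above, the following sketch scales the stated concentrations to an arbitrary batch volume and converts the two phosphate concentrations from mM to grams. The molar masses (136.09 g/mol for KH2PO4 and 174.18 g/mol for K2HPO4) are standard values supplied for the example and are not stated in the text; the function name is hypothetical.

```python
# Molar masses in g/mol; standard values, not stated in the source text.
MOLAR_MASS = {"KH2PO4": 136.09, "K2HPO4": 174.18}


def tb_recipe(volume_l: float) -> dict:
    """Amounts needed for `volume_l` litres of the TB medium described above."""
    return {
        "yeast_extract_g": 24.0 * volume_l,
        "tryptone_g": 20.0 * volume_l,
        "glycerol_ml": 4.0 * volume_l,
        # mol/L * g/mol * L = g
        "KH2PO4_g": 0.017 * MOLAR_MASS["KH2PO4"] * volume_l,
        "K2HPO4_g": 0.072 * MOLAR_MASS["K2HPO4"] * volume_l,
        # 1% (w/v) glucose corresponds to 10 g per litre
        "glucose_g": 10.0 * volume_l,
    }


if __name__ == "__main__":
    for component, amount in tb_recipe(1.0).items():
        print(f"{component}: {amount:.2f}")
```

For one litre this gives roughly 2.31 g KH2PO4 and 12.54 g K2HPO4.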
[0176] Generation of Golden Gate Compatible pET30 Overexpression Vector
[0177] All standard PCRs for cloning were performed using Phusion Flash High-Fidelity PCR Master Mix (Thermo Scientific) or KAPA HiFi HotStart ReadyMix (Roche). Oligonucleotides and gBlocks were ordered from Integrated DNA Technologies. Primers are listed in Table 6. pET30b(+) (kind gift from Markus Jeschek) was PCR amplified as five fragments using primers FS_151/FS_152, FS_153/FS_154, FS_155/FS_156, FS_157/FS_158, and FS_159/FS_160, respectively, in order to remove the five undesired BbsI restriction sites present in the backbone. The resulting PCR fragments were assembled using 2× HiFi DNA Assembly Mastermix (NEB), yielding pFS_0012. Subsequently, oligos FS_380 and FS_381 were annealed to generate a double stranded DNA (dsDNA) fragment encoding the T7 terminator and cloned into pFS_0012 using XhoI/CsiI, yielding pFS_0013, a pET30-derived overexpression vector harboring two Golden Gate cloning sites and thus facilitating parallel cloning of RT-Cas1, Cas2 as well as a corresponding CRISPR array. Nucleotide sequences of all RT-Cas1 and Cas2 orthologs tested in this study along with their corresponding CRISPR arrays are listed under Sequences.
[0178] Golden Gate Assembly of RT-Cas1-Cas2 Overexpression Vectors for Ortholog Screen
[0179] RT-Cas1, Cas2 and CRISPR array sequences were ordered from Twist Biosciences and Genscript. Putative CRISPR arrays were ordered as sequences consisting of the leader sequence followed by DR-nativespacer1-DR-nativespacer2-DR. Furthermore, each fragment was flanked by BbsI restriction sites generating overhangs facilitating Golden Gate Assembly into pFS_0013. Briefly, 40 fmol per fragment (RT-Cas1, Cas2, corresponding CRISPR array, pFS_0013 acceptor vector), 1 μL ATP/DTT mix (10 mM each), 0.25 μL T7 DNA Ligase (Enzymatics), 0.75 μL BpiI (Thermo Scientific) and 1 μL buffer green, brought up to 10 μL with PCR grade H.sub.2O, were subjected to 99 cycles of 37° C. for 3 min and 16° C. for 5 min, followed by 80° C. for 10 min. Subsequently, 5 μL of this mixture were transformed into 50 μL Stbl3 cells and recovered in SOC media for 30 min at 37° C., 1000 rpm before spreading on plates.
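As a rough planning aid, the programmed hold time of the Golden Gate cycling described above can be tallied with a short sketch. The function name and the (temperature, minutes) step representation are illustrative assumptions, and ramp times between steps are ignored, so the real run time on a given thermocycler will be longer.

```python
# Minimal sketch: total programmed hold time of a cycling protocol.
# Each step is a (temperature in deg C, hold time in minutes) pair.
# Ramp times are NOT included (assumption), so this is a lower bound.

def program_minutes(cycled_steps, n_cycles, final_steps=()):
    """Return the summed hold time in minutes for a cycling program."""
    per_cycle = sum(minutes for _temp, minutes in cycled_steps)
    tail = sum(minutes for _temp, minutes in final_steps)
    return n_cycles * per_cycle + tail


# Golden Gate assembly from the text: 99 x (37 C 3 min, 16 C 5 min), then 80 C 10 min.
golden_gate = program_minutes([(37, 3), (16, 5)], 99, [(80, 10)])
print(golden_gate, "min, about", round(golden_gate / 60, 1), "h")
```

Under these assumptions the assembly occupies the thermocycler for somewhat more than 13 hours, which is consistent with running it overnight.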
[0180] Spacer Acquisition
[0181] Acquisition assays were performed at 37° C., 300 rpm in bacterial culture tubes containing 3 mL of TB media supplemented with 100 μM isopropyl-β-D-thiogalactopyranoside (IPTG) (Sigma Aldrich) for BL21(DE3) Gold and NovaBlue(DE3). For E. coli BL21AI, L-(+)-arabinose (Sigma Aldrich) was additionally added to 0.2% (w/v). Each culture was inoculated with 2 colonies of bacteria stored no longer than 14 days at 4° C. upon transformation and overnight growth at 37° C. When cultures reached saturation (typically 12-14 h post inoculation), 2 mL of bacterial culture were harvested and plasmids containing CRISPR arrays were isolated by standard plasmid Mini-Prep procedures to serve as a template for preparation of deep sequencing libraries.
[0182] Amplification of CRISPR Arrays for Classical Acquisition Readout by Deep Sequencing
[0183] Leader proximal spacers were PCR amplified from 3 ng of plasmid DNA per μL of PCR reaction using NEBNext High-Fidelity 2×PCR Master Mix (NEB) with a forward primer binding in the leader sequence of the respective CRISPR array and a reverse primer binding in the first native spacer (Primer Design Note 1 and Table 2 for primer design and binding sites of individual CRISPR arrays, respectively). For each biological replicate, 12 individual PCR reactions of 10 μL were performed with an extension time of 15 sec for 16 cycles. The individual 10-μL reactions belonging to the same biological sample were then pooled, and residual primers removed using homemade AMPure beads at a PCR to bead ratio of 1:1.5 (v/v) eluting the PCR product in 60 μL of buffer TE. Subsequently, 500 ng of first round PCR product per biological sample was run on a 3% LAB agarose gel (300V, 55 min, cooling the gel-chamber in an ice-water bath during the run) and purified by blind excision of gel slices at 211 to 300 bp, avoiding the prominent DNA band corresponding to PCR products of the unexpanded array (i.e. no acquisition of novel spacers). Amplicons were then purified from the gel slices using the QIAquick Gel Extraction Kit (QIAGEN) and eluted into 22 μL of buffer EB. Illumina sequencing adaptors and indices were appended in a second round of PCR, using 6 μL of gel purified input DNA as a template in a 20 μL PCR reaction with universal second round deep sequencing primers attaching P5 and P7 handles for binding of PCR products to the flow cell in deep sequencing as well as barcoding the samples with (N).sub.8 barcodes corresponding to Illumina TruSeq HT indices (Primer Design Note 2 and Table 3 for primer design and indices, respectively). After this second round of PCR, products were purified using the QIAquick PCR Purification Kit (QIAGEN) and eluted in 22 μL buffer EB. 
Samples were then pooled and subjected to another round of gel purification using the same parameters as described above, this time excising products in the range of 280 to 350 bp.
[0184] Selective Amplification of ExpaNdEd CRISPR Arrays (SENECA)
[0185] FsCRISPRArray2 was amplified from pFS_160 using FS_871/FS_904, generating a minimal Fs CRISPR Array consisting of the leader sequence and a single DR followed by a FaqI restriction site (CTTCAG) on the bottom strand, resulting in plasmid pFS_0235 as our standard recording plasmid. This plasmid was transformed into chemically competent BL21(DE3) Gold bacteria or NovaBlue(DE3) (EMD Millipore) and subjected to spacer acquisition as described above. Following plasmid extraction and quantification using the Quant-IT PicoGreen dsDNA Assay Kit (Thermo Scientific) read out with a Tecan M1000 Pro Microplate reader, plasmid DNA was subjected to SENECA-adapter ligation in a Golden Gate reaction. Oligonucleotides FS_0963/FS_0964 were annealed (2.5 μL each of 100 μM oligo, 5 μL NEBuffer 2 (NEB), 40 μL PCR grade H.sub.2O) by heating to 95° C. for 5 min and cooling to 20° C. at 0.12° C./sec. Annealed oligos were diluted 1:100 in TE buffer. Next, 40 fmols of plasmid DNA (180.3 ng for pFS_0235), 0.25 μL T7 Ligase (Enzymatics), 1 μL FastDigest FaqI, 0.5 μL of 20×SAM, 1 mM ATP, 1 mM DTT (all Thermo Scientific) and 1 μL of annealed, diluted oligonucleotides FS_0963/FS_0964 in 10 μL total volume were subjected to 99 cycles of 3 min 37° C., 3 min 20° C. followed by 15 min at 55° C. First round deep sequencing PCR was performed using NEBNext High-Fidelity 2×PCR Master Mix (NEB) (forward primers: FS_0968 to FS_0974, reverse primer: FS_0911). For each biosample one 30 μL reaction containing 10.38 μL of adapter ligated plasmid DNA were performed (98° C. for 30 s; 22 cycles at 98° C. for 10 s, 57° C. for 30 s and 72° C. for 20 s followed by 72° C. for 5 min), pooled and purified by magnetic beads (GE Healthcare) at a PCR to bead ratio of 1:1.6 (v/v) recovering the PCR product in 25 μL TE buffer (see Primer Design Note 3 for details on primer design). Illumina sequencing adaptors and indices were appended in a second round of PCR (98° C. for 30 s, 8 cycles of 98° C. for 10 s, 65° C. for 30 s and 72° C. for 30 s, and 72° C. for 5 min) using 5 μL of first round PCR product as input in a 20 μL reaction (see Primer Design Note 2 and Table 3 for primer design and indices, respectively). Samples were pooled, desalted using the QIAquick PCR Purification Kit (QIAGEN) and size selected on E-Gel EX Agarose Gels, 2% (Thermo Scientific), loading 200-500 ng of DNA per lane, extracted using the QIAquick Gel Extraction Kit and subjected to deep sequencing on Illumina MiSeq or NextSeq500 platforms using the MiSeq Reagent Kit v3 (150-cycle) or NextSeq 500/550 Mid/High Output v2 kit (150 cycles) (both Illumina), respectively. Libraries were loaded at a concentration of 1.4 to 1.6 μM as determined by qPCR using the KAPA Library Quantification Kit for Illumina® Platforms (Roche). PhiX was included at 5-10%.
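The conversion between a molar plasmid input and a mass, as in "40 fmols of plasmid DNA (180.3 ng for pFS_0235)", can be sketched as follows. The average mass of 650 g/mol per base pair is a common approximation and an assumption here, as are the function names. The length of pFS_0235 is not stated in this section, so the value printed below is only the length implied by the quoted figures under that assumption.

```python
# Minimal sketch: interconvert fmol and ng of double-stranded DNA.
# ASSUMPTION: an average mass of 650 g/mol per base pair of dsDNA.
AVG_BP_MASS = 650.0


def dsdna_ng(fmol, length_bp, bp_mass=AVG_BP_MASS):
    """Mass in ng of `fmol` femtomoles of a dsDNA molecule of `length_bp` base pairs."""
    # fmol * (g/mol) gives femtograms; 1e6 fg = 1 ng.
    return fmol * length_bp * bp_mass / 1e6


def implied_length_bp(ng, fmol, bp_mass=AVG_BP_MASS):
    """Length in bp implied by a given mass and molar amount of dsDNA."""
    return ng * 1e6 / (fmol * bp_mass)


# Length implied by "40 fmol = 180.3 ng" under the 650 g/mol assumption.
print(round(implied_length_bp(180.3, 40)), "bp")
```

For a plasmid of a different size, `dsdna_ng(40, length_bp)` gives the mass to pipette for the same 40 fmol input.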
[0186] SENECA Based Ortholog Screen
[0187] For the SENECA based CRISPR array directionality screen, putative CRISPR arrays were extracted from genomic sequences, assuming a standard leader length of 150 nt followed by a single DR. The FaqI restriction site required for SENECA was appended downstream of the DR and sequences were flanked by universal adapters for amplification and cloning. The final array sequences including these features are depicted under Sequences 2 and were ordered from Twist Biosciences as linear DNA fragments. These were PCR amplified using primers FS_1406/FS_1407 and cloned into CsiI/NotI-digested plasmids containing their respective RT-Cas1-Cas2 ortholog using HiFi DNA Assembly (NEB). Upon transformation into E. coli BL21(DE3), these constructs were subjected to the standard spacer acquisition assay in TB media. Plasmid DNA was extracted and subjected to SENECA adapter ligation.
[0188] The respective oligos to be annealed for each CRISPR array tested in this experiment are listed in Table 4. Following adapter ligation, a single 140 μL first round PCR reaction was prepared for each ortholog using NEBNext High-Fidelity 2×PCR Master Mix and containing the entire 20 μL SENECA adapter ligation as a template. First round PCR primers specific to the respective DR of each CRISPR array tested are listed in Table 5. The 140 μL PCR reaction was split into 12 reactions of 11 μL along the row of a 96-well plate. This plate was subjected to a gradient PCR (53 to 68° C. in an Eppendorf Mastercycler Gradient). This procedure was chosen because SENECA leverages the fact that, at a unique annealing temperature, a DR matching primer will only bind to the full DR resulting from an acquisition event but not to the truncated parental DR. By splitting the PCR reaction and subjecting it to a temperature gradient, it is ensured that, without a priori knowledge, at least one of the 12 reactions is subjected to the annealing temperature at which selective amplification of expanded CRISPR arrays occurs. PCR was performed for 30 cycles, after which the 12 reactions performed along the temperature gradient were pooled again, purified using 1.85×Ampure beads and eluted in 25 μL TE buffer. Five μL of this elution were used as a template for a standard 20 μL second round PCR at 65° C. annealing temperature for 12 cycles as described above. Subsequently, PCR products were purified using 2.2×Ampure beads, eluted into 22 μL TE buffer, size selected as described in the standard SENECA protocol (E-Gel Ex 2%, followed by gel extraction) and subjected to deep sequencing.
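The spacing of annealing temperatures across the 12 split reactions can be approximated with a short sketch. It assumes an evenly spaced (linear) gradient between 53 and 68° C., which is an idealization; the temperatures actually delivered per column by a gradient block may deviate and should be read from the instrument. The function name is hypothetical.

```python
# Minimal sketch: nominal annealing temperature per well for a gradient PCR.
# ASSUMPTION: the gradient is linear across the row of wells.

def gradient_temps(low_c, high_c, n_wells):
    """Evenly spaced temperatures (deg C, one decimal) from low_c to high_c."""
    step = (high_c - low_c) / (n_wells - 1)
    return [round(low_c + i * step, 1) for i in range(n_wells)]


temps = gradient_temps(53, 68, 12)
print(temps)

# Volume check for splitting the 140 uL master reaction into 12 x 11 uL.
leftover_ul = 140 - 12 * 11
print("leftover volume:", leftover_ul, "uL")
```

Under the linear assumption, neighboring wells differ by roughly 1.4° C., which indicates how finely the gradient samples the window in which selective amplification occurs.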
[0189] Deep Sequencing
[0190] Small scale targeted deep sequencing of CRISPR Arrays for the ortholog screen was performed using the Illumina MiSeq v3 300 cycle kit on an Illumina MiSeq platform or the Illumina HiSeq High Output PE 200 cycle kit on an Illumina HiSeq 2500. Spacer libraries prepared using SENECA were sequenced using the NextSeq 500/550 High Output Kit v2 150 cycle on an Illumina NextSeq platform or the MiSeq Reagent Kit v3 150-cycle on a MiSeq.
[0191] Data Analysis Pipeline
[0192] FASTQ files were quality filtered and trimmed using trimmomatic (trimmomatic SE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:75) and subsequently converted to FASTA files using FASTX-Toolkit v0.0.14 (fastq-to-fasta) (http://hannonlab.cshl.edu/fastx_toolkit/). Using custom scripts written in Python 2.7, spacers were identified based on the identification of a 20-66 nucleotide sequence between two 10-nt DR segments, allowing for 2 and 3 mismatches in the first and second DR segment, respectively. Arrays with multiple spacers were identified based on the presence of a complete DR sequence, allowing for 3 mismatches. Only unique spacers (>1 mismatch) from a given sample were further processed. Spacers were aligned to a merged reference genome containing plasmid and E. coli sequences [E. coli BL21(DE3) Gold (NC_012947.1) genome, E. coli K12 (NC_000913.3)] using bowtie2 (bowtie2 --very-sensitive-local). In MS2 challenge experiments, the MS2 sequence [MS2 (NC_001417.2)] was also included in the merged reference genome. Identical alignments were collapsed using samtools v1.3, and alignments were visualized in Geneious v10.2.3. Basic statistics about numbers of reads or alignment features were calculated using standard bash commands, and compiled and visualized using Prism 7.0d. Gene body percentiles were calculated using RSeQC (geneBody_coverage.py v2.6.4). Nucleotide probabilities were determined and visualized using the WebLogo web tool v2.8.2. Simulated spacer datasets were prepared using BEDtools v2.25 (bedtools random -n 500 -l 38). Transcript quantification for RNA-seq and Record-seq was performed using featureCounts v1.5.0. Using custom scripts written in Matlab v9.1.0, RNA-seq and Record-seq transcript counts were normalized using transcripts per million (TPM) and used to compute cumulative spacer sums, a linear regression fit, coefficient of determination (R.sup.2), and Pearson linear correlation coefficient.
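The spacer identification rule described above (a 20-66 nt sequence between two 10-nt DR segments, with 2 and 3 mismatches tolerated in the first and second segment, respectively) can be illustrated with the following sketch. It is not the custom script referred to in the text. It assumes that the first segment is the last 10 nt of the DR and the second segment is the first 10 nt of the DR, that mismatches are counted as substitutions only, and that the leftmost tolerated hit is refined to the best match within one segment length; all function names and the toy sequences are hypothetical.

```python
# Minimal sketch of mismatch-tolerant spacer extraction from a single read.
# ASSUMPTIONS: substitutions only (Hamming distance); the segment flanking the
# spacer upstream is the DR's last 10 nt, the downstream one its first 10 nt.

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def first_hit(read, seg, lo, hi, max_mm):
    """Leftmost start in [lo, hi] matching seg within max_mm mismatches,
    refined to the best-scoring start within one segment length; else None."""
    n = len(seg)
    hi = min(hi, len(read) - n)
    for pos in range(lo, hi + 1):
        if hamming(read[pos:pos + n], seg) <= max_mm:
            window = range(pos, min(pos + n, hi + 1))
            return min(window, key=lambda p: hamming(read[p:p + n], seg))
    return None


def find_spacers(read, dr, seg_len=10, min_len=20, max_len=66,
                 mm_first=2, mm_second=3):
    """Return spacers of min_len..max_len nt flanked by the two DR segments."""
    first_seg, second_seg = dr[-seg_len:], dr[:seg_len]
    spacers, pos = [], 0
    while True:
        i = first_hit(read, first_seg, pos, len(read), mm_first)
        if i is None:
            break
        start = i + seg_len
        j = first_hit(read, second_seg, start + min_len, start + max_len, mm_second)
        if j is None:
            pos = i + 1          # no downstream segment in range; keep scanning
        else:
            spacers.append(read[start:j])
            pos = j              # the downstream DR may precede another spacer
    return spacers


# Toy example with made-up sequences (not a real DR or spacer).
toy_dr = "A" * 10 + "C" * 10
toy_spacer = "GT" * 15
toy_read = "TTTTT" + toy_dr + toy_spacer + toy_dr + "GGGGG"
print(find_spacers(toy_read, toy_dr))
```

A production pipeline would additionally collapse near-identical spacers within a sample, as the text describes for unique spacers, before alignment with bowtie2.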
[0193] Record-seq datasets corresponding to oxidative or acid stress treatments were analyzed using custom scripts written in R v3.4.4. Briefly, transcripts with fewer than 5 counts across replicates were discarded. Heatmaps representing unsupervised hierarchical clustering of Pearson linear correlation with complete linkage (using raw transcript counts as inputs) were prepared using the ‘heatmap.2’, ‘hclust’, and ‘cor’ commands with default settings. Principal component analysis (PCA) was performed on log2-transformed data (raw counts plus one pseudocount to tolerate zeros) for the 50 most variable (standard deviation) genes using the ‘prcomp’ command with default settings. Differential expression analyses (using raw counts plus one pseudocount as input) were performed using DESeq2 v1.14.1, edgeR v3.16.5, and baySeq v2.8.0 encapsulations within R. Heatmaps representing unsupervised hierarchical clustering of signature differentially expressed genes were prepared using the ‘pheatmap’ command with default settings.
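The preprocessing that precedes the PCA (discarding low-count transcripts, adding one pseudocount, log2 transformation and selection of the most variable genes) can be sketched as below. The original analysis was carried out in R; this Python sketch is only an illustration. It assumes that the 5-count threshold applies to the sum over replicates and that the standard deviation is computed on the log2-transformed values, neither of which is specified in the text. The function name and the example counts are hypothetical.

```python
# Minimal sketch of the count preprocessing that precedes PCA.
# ASSUMPTIONS: "fewer than 5 counts across replicates" means the summed count;
# variability is the sample standard deviation of log2(count + 1) values.
import math
import statistics


def preprocess(counts, min_total=5, n_top=50):
    """counts: dict mapping gene -> list of raw counts (one per sample).
    Returns log2(count + 1) values for the n_top most variable genes."""
    kept = {g: c for g, c in counts.items() if sum(c) >= min_total}
    logged = {g: [math.log2(x + 1) for x in c] for g, c in kept.items()}
    ranked = sorted(logged, key=lambda g: statistics.stdev(logged[g]), reverse=True)
    return {g: logged[g] for g in ranked[:n_top]}


# Made-up counts for four genes across four samples.
example = {
    "geneA": [0, 1, 2, 0],          # discarded: summed count below 5
    "geneB": [10, 200, 10, 200],    # strongly variable
    "geneC": [50, 50, 50, 50],      # constant
    "geneD": [5, 40, 5, 40],        # moderately variable
}
top = preprocess(example, n_top=2)
print(list(top))
```

The resulting matrix of log2 values would then be passed to a PCA routine, which in the original analysis was the ‘prcomp’ command in R.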
[0194] Code Availability
[0195] The custom scripts used for the described data analysis are available on the Platt Lab website (platt.ethz.ch).
[0196] RNASeq of E. coli BL21(DE3)
[0197] RNA extraction from E. coli BL21(DE3) was performed after overnight growth under induction of FsRT-Cas1-Cas2 expression following the QIAGEN Supplementary Protocol: Purification of total RNA from bacteria using the RNeasy Mini Kit. To achieve the appropriate amount of input culture (corresponding to 5×10.sup.8 cells), serial dilutions of the overnight culture were prepared to achieve an OD.sub.600 between 0.2 and 0.6 measured with a NanoDrop OneC (Thermo Scientific). Bacteria were lysed using acid-washed glass beads (G1277-10G, Sigma Aldrich). The additional on-column DNase digestion was performed using the RNase-Free DNase Set (QIAGEN). DNA-free RNA was submitted to the Genomics Facility Basel for ribosomal RNA (rRNA) depletion using the Ribo-Zero rRNA Removal Kit (Illumina), followed by library preparation and sequencing on an Illumina NextSeq platform using the NextSeq 500/550 High Output v2 kit (150 cycles).
[0198] td intron
[0199] The gBlock FS_gBlock_td_intron_acceptor (Sequences 3) was cloned into pFS_0235 using SphI/SgrAI yielding pFS_0238. This gBlock encoded the BBa_J23104 promoter, the ribosome binding site from bacteriophage T7 gene 10 as well as the td intron sequence including flanking regions facilitating efficient splicing. Furthermore, a BbsI-mediated Golden Gate cloning site was placed downstream and upstream of the td intron sequence, allowing for seamless assembly of upstream and downstream exon sequences in a single one-pot reaction as described above. As the inventors had previously noticed that the 5′ end of transcripts was preferentially acquired by the FsRT-Cas1-Cas2 complex, they introduced the td intron within the first 23 to 31 nucleotides of the respective transcripts. The inventors created intron-interrupted sequences of three E. coli genes, cspA, rpoS and argR (cold shock protein CspA, RNA polymerase sigma factor RpoS and Arginine repressor, respectively). These were selected because they were well sampled by the FsRT-Cas1-Cas2 complex in preceding SENECA experiments. The flanking exon sequences were mutated in four to six positions to yield optimized sequences for td intron splicing, which also aided in unambiguously distinguishing the spliced transcripts from endogenous transcripts or DNA.
[0200] Accordingly, the inventors ordered complementary oligonucleotides for the fragment of the transcript to be cloned 5′ of the td intron and annealed them prior to Golden Gate Assembly, while the fragment to be cloned 3′ of the intron was amplified by PCR from genomic DNA. Oligonucleotides were FS_1054/1055 (5′ of the intron, annealed) and FS_1056/1057 (3′ of the intron, PCR) for CspA; FS_1038/1039 and FS_1040/1041 for RpoS; FS_1046/1047 and FS_1048/1049 for ArgR. The inventors ensured that mutating sequences of the respective genes to those of the td intron flanking sites did not generate a stop codon. The td intron containing FsRT-Cas1-Cas2 overexpression constructs were subjected to a standard acquisition assay followed by plasmid DNA extraction, SENECA and deep sequencing. Presence of td intron splice sites in DNA outside of the FsCRISPR array was tested by extracting gDNA from td-ArgR transformed cultures using the GenElute Bacterial Genomic DNA Kit (Sigma Aldrich). Libraries containing the td intron insertion site were amplified using a two-round PCR strategy analogous to the ones described above using forward primers FS_1154 to FS_1157 and reverse primers FS_1158 to FS_1161 (Table 6). First-round PCR was performed at 57° C. annealing temperature and 20 sec elongation for 15 cycles. Second-round PCR was performed at 63° C. annealing temperature and 20 sec elongation for 8 cycles.
[0201] Infection with MS2 Phage
[0202] For infections with MS2 phage, the recording plasmid pFS_0235 was transformed into NovaBlue(DE3) Competent Cells (EMD Millipore), which carry F′ and are thus MS2 susceptible. The next morning, 15 mL of TB containing 100 μM of IPTG were inoculated with 10 colonies and grown at 37° C., 150 rpm in an orbital shaker until an OD.sub.600 of 0.24. Then, MgSO.sub.4 was added to 5 mM final concentration. Aliquots of 3 mL were split into bacterial culture tubes, infected with 200 μL of high-titre MS2 phage suspension and incubated for 1 h at room temperature without shaking to allow infection by MS2. Next, culture tubes were transferred to the orbital shaker and incubated overnight at 30° C., 80 rpm. Growth of E. coli in presence of MS2 phage at 30° C. rather than 37° C. prevents lysis of cells by productive MS2. The next morning, shaking was increased to 150 rpm. Another day later (˜41 h post-infection), cultures were pelleted by centrifugation, plasmid DNA was extracted and subjected to SENECA followed by deep sequencing.
[0203] Synthetic Recording of sfGFP and Rluc Transcripts
[0204] The Pcat-tetR-term_PtetO encoding fragment was amplified with primers FS_1123/FS_1125 from pLP167 (kind gift from Luzi Pestalozzi), digested with BamHI/AgeI and cloned into AgeI/BbsI-digested pFS_0238 (see cloning of td intron constructs), yielding pFS_0270, which contains a BbsI-mediated Golden Gate cloning site immediately downstream of the P.sub.tetA promoter. Subsequently, sfGFP was amplified from pLP167 with primers FS_1134/FS_1135 and Rluc was amplified using FS_1136/FS_1137 from BBa_J52008 (registry of standard biological parts). Both fragments were cloned into pFS_0270 using BbsI-mediated Golden Gate Assembly, yielding pFS_0271 (sfGFP) and pFS_0272 (Rluc), respectively. LuxR promoter parts were amplified with primers FS_1584/FS_1585 from pIG0046 and FS_1586/FS_1587 from pIG0059 (registry of standard biological parts) and cloned into AgeI-digested pFS_0270 using NEBuilder HiFi DNA Assembly Master Mix (NEB), resulting in pFS_0399. Oligos FS_1588/FS_1589 were annealed and cloned into pFS_0399 digested with SalI/BamHI, yielding pFS_0400. The Fluc coding sequence was amplified from BBa_I712019 (registry of standard biological parts) using FS_1618/FS_1619, digested with BsaI and cloned into BbsI-digested pFS_0400, resulting in pFS_0412 that was used in RNA recording experiments. For each biological replicate, 50 mL of IPTG containing TB media were inoculated with 22 colonies of E. coli BL21(DE3) transformed with pFS_0271 (sfGFP), pFS_0272 (Rluc) or pFS_0412 (Fluc). When reaching an OD.sub.600 of 0.25, cells were split into 3 mL aliquots in bacterial culture tubes and induced with aTc in case of the P.sub.tetA promoter or N-(3-Oxododecanoyl)-L-homoserine lactone (3O06-HSL) (Sigma) in case of the P.sub.LuxR promoter, and cultured in an orbital shaker for 12-14 hours at 300 rpm, followed by plasmid DNA extraction, SENECA and deep sequencing. Spacers aligning to sfGFP, Rluc and Fluc were quantified as described above (see “Data analysis pipeline”). 
The detected number of unique spacers per million sequencing reads was normalized by defining the summed number of spacers per biological replicate as 100% and plotted using GraphPad Prism v7.0d. For RNA recording with pFS_0271 and pFS_0272, RNA extraction from the same cultures was performed using the RNAsnap method followed by treatment with the TURBO DNA-free Kit (Thermo Scientific) using 1.5 μL of TURBO DNase to minimize DNA background. Reverse transcription was performed using qScript cDNA SuperMix (Quanta Bio) with 500 ng of RNA sample as a template. cDNA was diluted 1:4 and quantification was performed in 2 technical replicates by real-time PCR (qRT-PCR) using TaqMan Fast Advanced Master Mix (Life Technologies) in a Roche LightCycler 96 System. Primer and probe sequences are listed in Table 7. Absolute copy number was calculated using the standard curve method, and 16S rRNA was used as a housekeeper. The mRNA copy number corresponding to the number of cells in a single SENECA reaction (6×10.sup.9) was calculated based on the average amount of 18700 16S rRNA transcripts per single E. coli cell (BNID 102992).
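The normalization and scaling steps in this paragraph can be illustrated with a brief sketch. The exact formulas are not given in the text, so the following is one plausible reading and should be treated as an assumption: spacer counts are scaled to reads per million, the values within one biological replicate are expressed as a percentage of their sum, and the mRNA copy number per SENECA reaction is obtained by scaling the measured mRNA to 16S rRNA ratio by 18700 16S rRNA copies per cell and 6×10.sup.9 cells. All function names and the example numbers are hypothetical.

```python
# Minimal sketch of the spacer normalization and qPCR scaling (assumed formulas).
RRNA_16S_PER_CELL = 18700   # average 16S rRNA copies per E. coli cell (from the text)
CELLS_PER_SENECA = 6e9      # cells corresponding to one SENECA reaction (from the text)


def spacers_per_million(n_unique_spacers, n_reads):
    """Unique spacers per million sequencing reads."""
    return n_unique_spacers * 1e6 / n_reads


def as_percent_of_total(values):
    """Express each value as a percentage of the summed values (sum = 100%)."""
    total = sum(values.values())
    return {key: 100.0 * value / total for key, value in values.items()}


def mrna_copies_per_reaction(mrna_copies, rrna_16s_copies):
    """Scale a measured mRNA:16S ratio to the cell number of one SENECA reaction."""
    return mrna_copies / rrna_16s_copies * RRNA_16S_PER_CELL * CELLS_PER_SENECA


# Made-up example: two conditions within one biological replicate.
replicate = {"uninduced": spacers_per_million(20, 2_000_000),
             "induced": spacers_per_million(60, 2_000_000)}
print(as_percent_of_total(replicate))
```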
[0205] Orthogonal Synthetic Recording
[0206] The Rluc coding sequence was amplified using FS_1620/FS_1137 from pFS_0272 and cloned into pFS_0399 using BbsI-mediated Golden Gate Assembly, yielding pFS_0413. The Fluc coding sequence was amplified from BBa_I712019 (registry of standard biological parts) using FS_1621/FS_1619, digested with BbsI and cloned into BsaI-digested pFS_0413, resulting in pFS_0414, which was subsequently used in orthogonal synthetic recording experiments.
[0207] For each biological replicate, 50 mL of TB media containing 100 μM IPTG were inoculated with 33 colonies of E. coli BL21(DE3) transformed with pFS_0414, containing (3-Oxododecanoyl)-L-homoserine lactone (3O06-HSL)-inducible Fluc and aTc-inducible Rluc coding sequences. When reaching an OD.sub.600 of 0.25, cells were split into 3 mL aliquots in bacterial culture tubes and induced with 75 ng/mL of anhydrotetracycline hydrochloride (aTc) (Cayman Chemical), 10 μM of 3O06-HSL (Sigma) or a combination of both, and cultured in an orbital shaker for 12 hours at 300 rpm, followed by plasmid DNA extraction, SENECA and deep sequencing, as well as parallelized RNA extraction from the same culture followed by reverse transcription and qPCR measurements. Data were analyzed as described above for recording of single synthetic transcripts.
[0208] Transcriptional Response to Oxidative Stress
[0209] Per biological replicate, 36 mL of TB media containing 100 μM IPTG were inoculated with 24 colonies of E. coli BL21(DE3) transformed with pFS_0235 the evening before (resulting in 1 colony/1.5 mL) and shaken in a 250 mL baffled shaker flask until reaching an OD.sub.600 of 0.24 to 0.25. Then cultures were split into 3 mL aliquots in bacterial culture tubes (Greiner) and treated with H.sub.2O.sub.2 (30% w/w solution, Sigma Aldrich) to a final concentration of 1 mM or with an equal volume of ddH.sub.2O. Growth was continued for 12 hours at 300 rpm followed by harvesting of 2 mL of culture for plasmid DNA extraction, SENECA and deep sequencing. Data were analyzed as described above (see “Data analysis pipeline”).
[0210] Transcriptional Response to Acid Stress
[0211] For pH-controlled growth, potassium-modified lysogenic broth (LB) (10 g/L tryptone, 5 g/L yeast extract, 7.45 g/L KCl) was buffered with 100 mM HOMOPIPES (Homopiperazine-1,4-bis(2-ethanesulfonic acid)). Subsequently, the pH of the medium was adjusted to either 5.0 (acid stress) or 7.0 (neutral) using KOH solution as described previously. For each biological replicate, 50 mL of pH-adjusted, IPTG containing LB media were inoculated with 33 colonies of E. coli BL21(DE3) transformed with pFS_0235 (resulting in 1 colony/1.5 mL). Samples were harvested at an OD.sub.600 between 0.3 and 0.6 for plasmid DNA extraction, SENECA and deep sequencing. Data were analyzed as described above (see “Data analysis pipeline”).
[0212] Cloning of aTc-Inducible FsRT-Cas1-Cas2 Expression Construct
[0213] For recording the transcriptional response to paraquat, an aTc-inducible FsRT-Cas1-Cas2 expression construct was generated. To this end, a fragment containing the tet repressor driven by a constitutive promoter as well as the P.sub.tetA promoter was amplified from pFS_0271 using FS_1574/1575 and digested with BglI/SphI. Furthermore, the N-terminus of FsRT-Cas1-Cas2 was amplified with FS_1576/1577 and digested with SphI/BglII. These two fragments were cloned into BglI/BglII-digested pFS_0235, yielding pFS_0393. The codon optimized FsRT-Cas1-Cas2 sequence was obtained from Genscript, amplified using FS_1641/1642 and cloned into pFS_0393 using XhoI/SphI, replacing the initial FsRT-Cas1-Cas2 coding sequence and yielding pFS_0453 (SEQ ID NO 334).
[0214] Transcriptional Response to 1 mM or 10 mM Paraquat
[0215] Paraquat dichloride hydrate (PESTANAL, Sigma Aldrich) was dissolved at 1 M in ddH.sub.2O. For each biological replicate, 75 mL of TB media containing 30 ng/mL aTc were inoculated with 50 colonies of E. coli BL21(DE3) transformed with pFS_0393 and shaken in baffled shaker flasks until reaching an OD.sub.600 of 0.24 to 0.25. Then cultures were split into 3 mL aliquots into bacterial culture tubes and treated with either 1 mM or 10 mM paraquat and cultured for an additional 11-12 hours before harvesting of 2 mL of culture for plasmid DNA extraction, SENECA and deep sequencing. Data were analyzed as described above (see “Data analysis pipeline”).
[0216] Transcriptional Response to Transient Paraquat Exposure
[0217] For each biological replicate, two colonies of E. coli BL21(DE3) transformed with pFS_0453 were inoculated into 3 mL of TB media containing 30 ng/mL aTc in standard bacterial culture tubes. For the first 12 h, all cultures were cultivated in the absence of paraquat (300 rpm, 37° C.). Then 2 mL of culture were aspirated, while the remaining 1 mL was spun down (2300×g, 10 min); the supernatant was aspirated and the bacterial pellet resuspended in 3 mL of fresh TB media containing 30 ng/mL of aTc. For both the transient as well as the permanent stimulus conditions, paraquat was added to 10 mM final concentration and the cultures were grown for an additional 12 h as above. Then 2 mL of culture were removed, and the remaining 1 mL was pelleted as above and resuspended in 3 mL of fresh TB media containing 30 ng/mL of aTc. Paraquat was added to 10 mM in the permanent stimulus condition and cultures were grown for an additional 12 h as above. Then 2 mL of culture were harvested for plasmid DNA extraction, SENECA and deep sequencing. Additionally, 100 μL of culture were harvested for RNA extraction by the RNAsnap protocol as described above followed by treatment with the TURBO DNA-free Kit (Thermo Scientific) using 1.5 μL of TURBO DNase. Ribosomal RNA was depleted using the Ribo-Zero rRNA Removal Kit (Illumina) followed by library prep using TruSeq Stranded mRNA (Illumina) and deep sequencing using the NextSeq 500/550 High Output v2 kit (75 cycles), sequencing each library at a depth of 4 million reads or greater.
[0218] Bacterial Population Inputs for Record-Seq Experiments and Achieved Recording Efficiencies
[0219] Record-seq experiments were performed in standard 12 mL culture tubes filled with 3 mL of terrific broth (TB) media, of which 2 mL were used for subsequent plasmid DNA extraction. In early experiments the inventors determined that using 40 fmols (180 ng of plasmid DNA) as an input to SENECA gave consistent results and left enough plasmid for archiving samples and performing several additional SENECA reactions on the same sample if necessary.
[0220] Accordingly, 40 fmols can be considered for contextualizing the number of cells used in a typical experiment. The construct depicted in
[0221] A single SENECA reaction of pFS_0235 eventually yields ˜6,126 spacers upon using the entire adapter ligated plasmid DNA for PCR amplification (two 30 μL PCR reactions, each containing 10 μL of adapter ligated plasmid DNA). Using the optimized FsRT-Cas1-Cas2 expression construct encoding an E. coli codon-optimized FsRT-Cas1-Cas2 coding sequence under transcriptional control of the aTc inducible P.sub.tetA promoter (pFS_0453), Extended Data
[0222] Based on the number of cells required to detect a specific stimulus, this calculation can be used to derive the number of cells used as a minimal input for the respective recording. For example, the inventors defined the minimum number of spacers to be required for assessing an arbitrary sequence (sfGFP) to be as low as 500 spacers, which corresponds to 8.8×10.sup.6 E. coli cells (
[0223] Likewise, the inventors estimated the number of spacers required to detect complex cellular behaviors to be 313 (7% of the original data), (
[0224] Type III Versus Type I CRISPR-Cas Systems
[0225] Type III CRISPR-Cas systems like that of F. saccharivorans are generally several thousand-fold less efficient in spacer acquisition than the prototypical Type I systems (like the E. coli Type I-E). This necessitates multiple rounds of elaborate size selection procedures followed by deep sequencing to identify new spacers. Likewise, PCR products from extended CRISPR arrays cannot be detected on DNA gels (agarose or PAGE) due to their vanishingly low abundance. Taken together, while the classic spacer readout is applicable for highly efficient spacer acquisition systems, it precludes deep characterization of most CRISPR-Cas systems, which motivated the development of SENECA.
[0226] Assessing the Correlation Between RNA-Seq and Record-Seq
[0227] The inventors set out to assess the direct correlation between RNA-seq and Record-seq (
[0228] Analysis of Complex Cellular Behaviors with Record-Seq
[0229] The inventors set out to answer the following questions: (i) are the transcriptome-scale records broadly different between the treated and untreated conditions; (ii) do the most variable genes in the dataset distinguish the two populations; (iii) do standard RNA sequencing analysis tools identify genes that were cumulatively differentially expressed; (iv) are the cumulatively differentially expressed genes informative in the context of the initial stimulus; and (v) can the inventors classify the cellular populations into treated and untreated conditions in an unbiased manner based on broad, variable, or signature responses.
[0230] Questions (i-iv) are addressed in the main text, but here the inventors will elaborate on question (v). Among the signature genes the inventors identified several that were expected to dominate the cellular responses for each stimulus. For example, the inventors identified dps (DNA protection during starvation protein), which codes for a hallmark DNA damage repair protein, among the oxidative stress signature genes. Additionally, dps has previously been shown to be the top differentially expressed gene in response to oxidative stress. Furthermore, the inventors identified three members of the SUF system (i.e., sufABCDSE operon), which primarily operates under oxidative stress conditions to aid in the formation of iron-sulfur (Fe—S) clusters. Likewise, the inventors identified hallmark members of the acid stress response, including asr (acid-shock protein precursor) as well as several chaperones (e.g., dnaK and ibpB) and heat-shock proteins (e.g., grpE and ibpA) among the acid stress signature genes.sup.35.
[0231] CRISPR Spacer Acquisition from RNA Versus DNA
[0232] The inventors present multiple lines of evidence showing CRISPR spacer acquisition from RNA, including spacer acquisition from an RNA only td intron splice junction (
[0233] Benefits of Record-Seq
[0234] The benefits of Record-seq include (i) the ability to heterologously express orthologous RT-Cas1-containing CRISPR acquisition systems in order to capture and store RNA species within DNA in an abundance-dependent process; (ii) the capacity to efficiently and scalably read out molecular histories permanently stored in DNA and reconstruct transcriptome-scale events; (iii) the application of this technology for recording specific inputs, such as virus infection or any single or orthogonal set of inducible expression systems; and (iv) the potential applications of this system for creating ‘sentinel’ cells for medical or biotechnology applications. Even if specific external stimuli cannot be recorded directly, the transcriptome-scale molecular signatures recorded within a bacterial population may be sufficient to report meaningful physiological states.
[0235] Mice Experiments
[0236] For oral gavage, E. coli (BL21 (DE3) or MG1655) cells were transformed with pFS_0453 (SEQ ID NO 334) and streaked on LB-agar plates containing 50 μg/mL kanamycin and grown overnight (12 h) at 37° C. The plasmid pFS_0453 encodes FsRT-Cas1-Cas2 under transcriptional control of an anhydrotetracycline inducible promoter (pTetA) as well as the FsCRISPR array 2 followed by a FaqI restriction site for the SENECA readout.
[0237] The following evening, a single colony was picked into 3 mL LB medium containing 50 μg/mL kanamycin under sterile conditions and grown overnight at 37° C. in a bacterial shaker (200-300 rpm). This culture was used to prepare a glycerol stock by mixing 500 μL of bacterial culture with 500 μL of sterile 50% (w/v) glycerol for long-term storage at −80° C. For in vivo recording experiments, an overnight liquid culture was inoculated either directly from this glycerol stock or from single bacterial colonies obtained by streaking bacteria on an LB-agar plate containing 50 μg/mL kanamycin.
[0238] Gnotobiotic C57BL/6 mice were orally gavaged with 1×10.sup.9 colony forming units (CFU) of E. coli BL21(DE3) or MG1655 cells transformed with pFS_0453 in 500 μL PBS. Persistence of the plasmids was ensured by adding 100 μg/mL kanamycin sulfate (Sigma Aldrich) to the drinking water. Expression of FsRT-Cas1-Cas2 was induced by the addition of 10-30 μg/mL anhydrotetracycline (Cayman Chemical) to the drinking water.
[0239] For the DSS experiment, kanamycin (100 μg/mL) and anhydrotetracycline (30 μg/mL) were added to the drinking water of the germ-free C57BL/6 mice 24 hours prior to gavage. Animals were maintained under germ-free conditions. A colony of E. coli BL21(DE3) transformed with pFS_0453 was grown overnight in LB medium containing 50 μg/mL kanamycin. The resulting culture was pelleted and resuspended in 1×PBS. This bacterial suspension was used to orally gavage each animal with 1×10.sup.9 colony forming units (CFU) of E. coli. Animals were maintained on water containing both kanamycin and anhydrotetracycline throughout the entire experiment. Fecal pellets were collected for 18 days starting 24 hours after the gavage. From day 5 to day 9 of the experiment, dextran sulfate sodium (DSS) (MPBio) was added at 1%, 2% or 3% (w/v) to the animals' drinking water while maintaining kanamycin and anhydrotetracycline as described above. Animals were treated in groups of 3, and negative control animals received no DSS in the water.
[0240] The experiment was terminated on day 19, when colonic and cecal contents were also harvested for plasmid DNA extraction.
[0241] Plasmid DNA was extracted using the QIAprep Spin Miniprep Kit according to the manufacturer's instructions, except that the volumes of buffers P1, P2 and N3 were increased to 500, 500 and 700 μL, respectively, to adjust for the increased biomass. Plasmid DNA was eluted in 150 μL of buffer EB and subsequently concentrated by precipitation. To this end, 15 μL of 3 M sodium acetate solution, pH 5.2 (Sigma-Aldrich), and 105 μL isopropanol were added to each sample. Samples were incubated at −20° C. for at least 20 min. Following centrifugation to precipitate nucleic acids (20,000×g, 30 min, 4° C.), the supernatant was removed and the DNA pellet was washed with 150 μL of 70% (v/v) ethanol by centrifugation (20,000×g, 15 min, 4° C.). Ethanol was aspirated and DNA pellets were briefly dried at 55° C., after which the DNA pellet was resuspended in 15 μL of buffer EB. From this eluate, 7.5 μL were used for SENECA adapter ligation, with all subsequent steps of the SENECA protocol performed as described previously.
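The precipitation volumes in the preceding paragraph can be sanity-checked with a short calculation. The sketch below is illustrative only: it assumes that the stated volumes (15 μL sodium acetate and 105 μL isopropanol per 150 μL eluate) correspond to 0.1 and 0.7 volumes of the eluate, a ratio inferred from the numbers and not stated as such in the text. The function name and parameters are hypothetical.

```python
from fractions import Fraction

def precipitation_volumes(eluate_ul,
                          salt_ratio=Fraction(1, 10),
                          isopropanol_ratio=Fraction(7, 10)):
    """Return (sodium acetate, isopropanol) volumes in microliters.

    The ratios are assumptions inferred from the stated volumes
    (15 uL and 105 uL added to a 150 uL eluate).
    """
    # Fractions keep the arithmetic exact before converting to float.
    salt_ul = float(salt_ratio * eluate_ul)
    isopropanol_ul = float(isopropanol_ratio * eluate_ul)
    return salt_ul, isopropanol_ul

def concentration_factor(initial_ul, final_ul):
    """Fold reduction in volume, ignoring any loss of DNA during precipitation."""
    return initial_ul / final_ul

salt_ul, isopropanol_ul = precipitation_volumes(150)
fold = concentration_factor(150, 15)
print(salt_ul, isopropanol_ul, fold)
```

Under these assumptions, resuspending the pellet in 15 μL corresponds to at most a 10-fold concentration of the 150 μL eluate; actual recovery depends on precipitation efficiency.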
[0242] For the diet experiment comparing chow and starch diets, all animals were maintained on a chow-based diet (3307, Kliba Nafag) prior to the experiment. On Day 1 of the experiment, 5 animals were continuously maintained on the chow-based diet, while a second group of 5 animals was switched to a starch based diet (D12450Ji, Research Diets Inc.). On Day 2 of the experiment, anhydrotetracycline and kanamycin sulfate were added to the drinking water (30 μg/mL and 100 μg/mL, respectively). On Day 3 of the experiment, all animals were orally gavaged with 1×10.sup.9 colony forming units (CFU) of E. coli BL21(DE3) transformed with pFS_0453 as described above. Fecal pellets were collected from day 4 to day 9 of the experiment for the extraction of plasmid DNA as described above. Furthermore, on day 10 the animals were dissected to obtain cecal and colonic contents for plasmid DNA extraction as described above.
[0243] For the diet experiment comparing chow, starch and fat diets, all animals were maintained on a chow-based diet (3307, Kliba Nafag) prior to the experiment. On day 1 of the experiment, animals were put on either a chow-based diet (3307, Kliba Nafag), a starch-based diet (D12450Ji, Research Diets Inc.) or a fat-based diet (fat-enriched diet D12492i, Research Diets Inc.). On day 2 of the experiment, anhydrotetracycline and kanamycin sulfate were added to the drinking water (30 μg/mL and 100 μg/mL, respectively). On day 3 of the experiment, all animals were orally gavaged with 1×10.sup.9 colony forming units (CFU) of E. coli MG1655 transformed with pFS_0453 as described above. Fecal pellets were collected from day 4 to day 10 of the experiment for the extraction of plasmid DNA as described above.
[0244] Furthermore, on day 10 the animals were dissected to obtain cecal and colonic contents for plasmid DNA extraction as described above.
TABLE-US-00001 TABLE 1 RT-Cas1 orthologs
Host strains and protein accession numbers of RT-Cas1 orthologs identified by HMMER-based protein sequence homology search.
Host and protein accession number
Bacteroides salyersiae 494745665 ref WP_007481073.1
Leptolyngbya sp. PCC 7375 493562087 ref WP_006515493.1
Photobacterium aphoticum 837770314 ref WP_047875592.1
Millisia brevis 1055178592 ref WP_066909103.1
Calothrix parietina 505008919 ref WP_015196021.1
Bacteroides fragilis str. 3397 T10 595923015 gb EXY33263.1
Pelodictyon phaeoclathratiforme 501500885 ref WP_012509117.1
Arthrospira platensis 493670156 ref WP_006620498.1
Calothrix sp. PCC 7507 504941836 ref WP_015128938.1
Leptolyngbya sp. PCC 6406 495588276 ref WP_008312855.1
Lachnoanaerobaculum saburreum 987863574 ref WP_060932241.1
Candidatus Brocadia fulgida 816979878 gb KKO19838.1
Leptolyngbya sp. O-77 984539873 dbj BAU44853.1
Tistrella mobilis KA081020-065 388530577 gb AFK55773.1
Smithella sp. SC K08D17 745626258 gb KIE18281.1
Lachnospiraceae bacterium oral taxon 082 497051594 ref WP_009447486.1
Psychrobacter lutiphocae 518502663 ref WP_019672870.1
Propionicicella superfundia 916602138 ref WP_051209229.1
Loktanella vestfoldensis 518800937 ref WP_019956891.1
Desulfovibrio hydrothermalis 505147525 ref WP_015334627.1
Oceanospirillum beijerinckii 654849652 ref WP_028302067.1
Fischerella muscicola 737152142 ref WP_035139015.1
Desulfobacca acetoxidans 503473041 ref WP_013707702.1
Hippea sp. KMI 643957755 ref WP_025270209.1
Chlorobium limicola 501442438 ref WP_012465887.1
Desulfarculus baarsii 503023536 ref WP_013258512.1
Thiocapsa sp. KS1 971091367 emb CRI67871.1
Candidatus Accumulibacter sp. SK-02 668684200 gb KFB76584.1
Candidatus Magnetoglobus multicellularis str. Araruama 571788307 gb ETR69258.1
Vibrio sinaloensis 740352375 ref WP_038188758.1
Campylobacter concisus 544653868 ref WP_021087740.1
Cellulomonas bogoriensis 917498396 ref WP_052104813.1
Teredinibacter turnerae 518435809 ref WP_019606016.1
Campylobacter fetus subsp. fetus 998762051 emb CZE46369.1
Gemmatimonadetes bacterium SCN 70-22 1063993205 gb ODT03821.1
Microcoleus sp. PCC 7113 504999115 ref WP_015186217.1
Micromonospora rosaria 1000329745 gb KXK58998.1
Candidatus Entotheonella sp. TSY2 575418691 gb ETX03376.1
Lachnoanaerobaculum sp. MSX33 570843978 gb ETO97675.1
Corynebacterium durum 492955761 ref WP_006063846.1
Anabaena cylindrica PCC 7122 428682296 gb AFZ61061.1
Pseudanabaena biceps 497311431 ref WP_009625648.1
Vibrio sp. MEBiC08052 972247703 gb KUI97421.1
Actinomyces johnsonii 545331217 ref WP_021604855.1
Microlunatus phosphovorus 503627960 ref WP_013862036.1
Kamptonema 494597365 ref WP_007355619.1
Skermania piniformis 1054700955 ref WP_066466672.1
Fischerella sp. NIES-3754 965689238 dbj BAU08380.1
Chlorobium phaeobacteroides 500067943 ref WP_011745868.1
Vibrio vulnificus 499466110 ref WP_011152750.1
Bacteroides fragilis 547947118 ref WP_022348096.1
Porphyromonas sp. COT-052 OH4946 746384965 ref WP_039428138.1
Kutzneria sp. 744 918333650 ref WP_052396493.1
Porphyromonas crevioricanis 565855908 ref WP_023938229.1
Rubrivivax benzoatilyticus 497541412 ref WP_009855610.1
Streptomyces sp. F-3 1026350507 dbj GAT81929.1
Campylobacter gracilis 492518353 ref WP_005873073.1
Fusicatenibacter saccharivorans 941895202 ref WP_055226073.1
uncultured Thiohalocapsa sp. PB-PSB1 557040601 gb ESQ17084.1
Porphyromonas gingivalis 492529527 ref WP_005874916.1
uncultured Thiohalocapsa sp. PB-PSB1 557029821 gb ESQ08042.1
Azospirillum lipoferum 503954719 ref WP_014188713.1
Teredinibacter sp. 991H.S.0a.06 797071444 ref WP_045826479.1
Tolypothrix campylonemoides 751570959 ref WP_041039832.1
Pseudoalteromonas rubra 800981085 ref WP_046007427.1
Rhodovulum sulfidophilum 985596740 ref WP_060836241.1
Teredinibacter turnerae 516642225 ref WP_018013804.1
Arcobacter thereius 1054172508 ref WP_066177132.1
Nocardiopsis baichengensis 516128787 ref WP_017559367.1
Arthrospira maxima 493720432 ref WP_006669920.1
Eubacteriaceae bacterium CHKCI004 1016807618 emb CVI70780.1
Frankia sp. BMG5.1 919937513 ref WP_052914180.1
Roseburia inulinivorans 937570588 emb CRL43259.1
Porphyromonas gingivalis 503581191 ref WP_013815267.1
Campylobacter fetus subsp. fetus 998759376 emb CZE50714.1
Microcystis aeruginosa 640538680 ref WP_024971209.1
Marinomonas mediterranea 503425197 ref WP_013659858.1
Candidatus Magnetomorum sp. HK-1 927673953 gb KPA10619.1
Campylobacter fetus subsp. fetus 998758141 emb CZE46264.1
Synechococcus sp. NKBG042902 780027826 ref WP_045442561.1
Chlorobaculum limnaeum 1071376969 ref WP_069809202.1
Nostoc sp. PCC 7107 764929206 ref WP_044499977.1
Arthrospira platensis 504041557 ref WP_014275551.1
Woodsholea maritima 518804695 ref WP_019960649.1
Actinomyces cardiffensis F0333 478776992 gb ENO18597.1
Mastigocladus laminosus 764662524 ref WP_044448019.1
Clostridium 916986069 ref WP_051592781.1
Rhodococcus sp. YH3-3 1033138899 ref WP_064444911.1
Rhodobacter capsulatus 940623611 gb KQB14189.1
Lachnoanaerobaculum saburreum 496026892 ref WP_008751399.1
Vibrio metoecus 941008961 ref WP_055043549.1
Porphyromonas gingivicanis 739003123 ref WP_036885018.1
Smithella sp. D17 683425608 gb KFZ44108.1
Candidatus Accumulibacter sp. BA-91 668677118 gb KFB71594.1
Nodosilinea nodulosa 515871661 ref WP_017302244.1
Phormidesmis priestleyi Ana 938299454 gb KPQ33062.1
Vibrio mexicanus 823288127 ref WP_047044098.1
Photobacterium marinum 494733933 ref WP_007469744.1
Candidatus Brocadia fulgida 816977369 gb KKO17867.1
Desulfovibrio bastinii 652926624 ref WP_027180402.1
Candidatus Magnetoovum chiemensis 778249022 gb KJR40057.1
Azospirillum lipoferum 502738680 ref WP_012973664.1
Cyanothece sp. PCC 7822 503100147 ref WP_013334941.1
Clostridiales bacterium VE202-01 639695530 ref WP_024721321.1
Actinomycetaceae bacterium BA112 1032601389 ref WP_064231067.1
Bacteroides 495935708 ref WP_008660287.1
Candidatus Jettenia caeni 494421634 ref WP_007220853.1
Rhodobacter capsulatus SB 1003 294475643 gb ADE85031.1
Oscillatoriales cyanobacterium USR001 1049312742 gb OCQ91006.1
Nostoc sp. PCC 7120 499304863 ref WP_010995638.1
Vibrio metoecus 941038135 ref WP_055051199.1
Scytonema hofmanni UTEX B 657929289 ref WP_029630506.1
Arthrospira sp. PCC 8005 495324841 ref WP_008049584.1
Phormidium willei 1057444347 ref WP_068790073.1
Vibrio rotiferianus 742405863 ref WP_038884984.1
Thermodesulfovibrio sp. N1 1057568519 ref WP_068860870.1
Bacteroides fragilis 492341859 ref WP_005815836.1
Rhodovulum sp. PH10 750340320 ref WP_040622239.1
Porphyromonas gulae 807048030 ref WP_046200570.1
Arthrospira sp. TJSD091 809071417 ref WP_046320545.1
Streptomyces sp. AVP053U2 1057451804 gb ODA69832.1
TABLE-US-00002 TABLE 2 First round PCR primers for classic acquisition readout
Primer binding sites for first round PCR primers to amplify CRISPR arrays for deep sequencing, related to the classical acquisition read-out in FIG. 6. For each array, the forward primer binding site is listed first and the reverse primer binding site second. The design of the primers including adapter sequences for first round PCR is described in detail in Primer Design Note 1 in the methods section of this paper.
Array    Forward sequence (5′→3′) (SEQ ID NO)    Reverse sequence (5′→3′) (SEQ ID NO)
Bacteroides fragilis strain S14    TCAACACTTCATCTATCTAACTGAATAA (105)    TGTTATGAACGGCTACGCCT (106)
Campylobacter fetus subsp. Fetus    CGCTCGAATTCAGCTCTCACAG (107)    AATTGCCAAATTCTGTTTCAATCC (108)
Cellulomonas bogoriensis 69B4    GTCAGCCCGGGGTCAAAAC (109)    GGAACTTTAAACCCTTTACATCCCC (110)
Fusicatenibacter saccharivorans array 1    TCAGAAAAACGATCGACCGAC (111)    AGAAGAAGCAATCGAAAAAGCG (112)
Fusicatenibacter saccharivorans array 2    AGAATCTGAAAACAGCGGAA (113)    ACGCTAGGGAATATGCAGCAA (114)
Candidatus Accumulibacter sp. SK-02    CCGAAAAGAGCCGTTAAATTCC (115)    CCTCAAAACGGTACCAAAGAAGC (116)
Micromonospora rosaria array 1    CACAGCACCTCTTCGCCACG (117)    CGATTCCGGTCCTCGGTTTC (118)
Micromonospora rosaria array 2    CTCAAGACCCACCGTTTTCG (119)    TTCAACAACGACGCCAACTATG (120)
Candidatus Accumulibacter sp. BA-91    GCAAGTCTCCGGCAAGTCAG (121)    TCACTTGAAGATTATATAGTGACTCTTTTCG (122)
Desulfarculus baarsii DSM 2075    TGGCAAACCATGTGGAAACAG (123)    AAAATGGCAACGCCGGG (124)
Woodsholea maritima    TGGAGCTGAATGTCACATCTTG (125)    GGAATCTCAAGCAGCGGAGAA (126)
Azospirillum lipoferum 4B array 1    CACAGGATGCGTGGAAAGG (127)    CTCAACGAACCGAAGCTGC (128)
Azospirillum lipoferum 4B array 2    CCGTTGGGAATTTTCCCGTT (129)    GACTCTTTTTCCCGGAGCCC (130)
Teredinibacter turnerae T8412    CCCAAACGGGGTTCTAGCAT (131)    GCGACAAAAGCATATTAAGGAGACT (132)
Tolypothrix campylonemoides    GCGCTGTAGAATTATTTCAGGGT (133)    ATGGGATGGAGGTTCGGGT (134)
Oscillatoriales cyanobacterium    GAGCTTGGGGCAAGGCTC (135)    GTCGAGAAGTAGCAGTTCACTTTCT (136)
Eubacterium saburreum DSM 3986 array 1    ACCTATCACAACGGCTTAAATG (137)    ATCACTGCTATGCAGCTTATTCG (138)
Eubacterium saburreum DSM 3986 array 2    AAAGCGAGGGCTTTCCCATA (139)    CTCATCAGAATGTGACGGTCG (140)
TABLE-US-00003 TABLE 3 Indices for deep sequencing
(N).sub.8 barcodes corresponding to Illumina TruSeq HT indices used in this study.
BC1 Sequence (5′→3′)    BC2 Sequence (5′→3′)
AAGTAGAG    CATGATCG
CATGCTTA    AGGATCTA
GCACATCT    GACAGTAA
TGCTCGAC    CCTATGCC
AGCAATTC    TCGCCTTG
AGTTGCTT    ATAGCGTC
CCAGTTAG    GAAGAAGT
TTGAGCCT    ATTCTAGG
ACACGATC    CGTTACCA
GGTCCAGA    GTCTGATG
GTATAACA    TTACGCAC
TTCGCTGA    TTGAATAG
AACTTGAC    TCCTTGGT
CACATCCT    ACAGGTAT
TCGGAATG    AGGTAAGG
AACGCATT    AACAATGG
CGCGCGGT    ACTGTATC
TCTGGCGA    AGGTCGCA
CATAGCGA    AGGTTATC
CAGGAGCC    CAACTCTC
TGTCGGAT    CCAACATT
ATTATGTT    CTAACTCG
CCTACCAT    ATTCCTCT
TACTTAGC    CTACCAGG
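As an illustration of how 8-nt dual indices such as those in Table 3 are commonly used, the following minimal sketch assigns a pair of observed index reads to a BC1/BC2 combination while tolerating a single mismatch. It is not the inventors' pipeline: the function names, the mismatch tolerance and the use of only four barcodes per column are assumptions made for the example.

```python
from itertools import combinations

# A few indices copied from Table 3 (not the full set of 24 per column).
BC1 = ["AAGTAGAG", "CATGCTTA", "GCACATCT", "TGCTCGAC"]
BC2 = ["CATGATCG", "AGGATCTA", "GACAGTAA", "CCTATGCC"]

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in a set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

def assign_barcode(observed, barcodes, max_mismatches=1):
    """Return the single barcode within max_mismatches of observed, else None."""
    hits = [bc for bc in barcodes if hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None

def demultiplex(index1, index2):
    """Assign a read to a (BC1, BC2) pair, or None if either index is unassigned."""
    pair = (assign_barcode(index1, BC1), assign_barcode(index2, BC2))
    return pair if None not in pair else None

print(min_pairwise_distance(BC1), min_pairwise_distance(BC2))
print(demultiplex("AAGTAGAC", "CATGATCG"))
```

A one-mismatch tolerance is only unambiguous when every pair of barcodes in a set differs in at least three positions, which `min_pairwise_distance` can be used to check for the full index set.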
TABLE-US-00004 TABLE 4 SENECA adapter oligos
Reverse oligos for adapter ligation during SENECA procedure sorted by their respective CRISPR array. Related to FIGS. 7 and 8. Upon annealing with the universal reverse oligo FS_0963, the array specific forward oligo (table below) creates a 4 bp overhang compatible with the plasmid overhang generated during FaqI digest in SENECA.
Array    Sequence (5′→3′) (SEQ ID NO)
Bacteroides fragilis strain S14 Array 1    ATAAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (141)
Bacteroides fragilis strain S14 Array 1 RC    GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (142)
Campylobacter fetus subsp. Fetus Array 1    TAGGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (143)
Campylobacter fetus subsp. Fetus Array 1 RC    GAAAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (144)
Cellulomonas bogoriensis 69B4 Array 1    GAGGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (145)
Cellulomonas bogoriensis 69B4 Array 1 RC    GCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (146)
Fusicatenibacter saccharivorans Array 1    TGAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (147)
Fusicatenibacter saccharivorans Array 1 RC    AGGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (148)
Fusicatenibacter saccharivorans Array 2    AAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (149)
Fusicatenibacter saccharivorans Array 2 RC    AGGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (150)
Candidatus Accumulibacter sp. SK-02 Array 1    AAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (151)
Candidatus Accumulibacter sp. SK-02 Array 1 RC    GGCTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (152)
Micromonospora rosaria Array 1    GCGGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (153)
Micromonospora rosaria Array 1 RC    CTGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (154)
Micromonospora rosaria Array 2    GCGGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (155)
Micromonospora rosaria Array 2 RC    CTGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (156)
Micromonospora rosaria Array 3    GGGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (157)
Candidatus Accumulibacter sp. BA-91 Array 1    AACAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (158)
Desulfarculus baarsii DSM 2075 Array 1    AAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (159)
Desulfarculus baarsii DSM 2075 Array 1 RC    GCATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (160)
Desulfarculus baarsii DSM 2075 Array 2    AAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (161)
Desulfarculus baarsii DSM 2075 Array 2 RC    GCATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (162)
Woodsholea maritima Array 1    GAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (163)
Woodsholea maritima Array 1 RC    GATTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (164)
Woodsholea maritima Array 2    GAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (165)
Woodsholea maritima Array 2 RC    GATGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (166)
Azospirillum lipoferum 4B Array 1    GAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (167)
Azospirillum lipoferum 4B Array 1 RC    GACAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (168)
Azospirillum lipoferum 4B Array 2    TAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (169)
Azospirillum lipoferum 4B Array 2 RC    ATGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (170)
Teredinibacter turnerae T8412 Array 1    GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (171)
Teredinibacter turnerae T8412 Array 1 RC    GAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (172)
Tolypothrix campylonemoides Array 1    GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (173)
Tolypothrix campylonemoides Array 1 RC    GAGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (174)
Tolypothrix campylonemoides Array 2    GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (175)
Tolypothrix campylonemoides Array 2 RC    GAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (176)
Tolypothrix campylonemoides Array 3    AAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (177)
Tolypothrix campylonemoides Array 3 RC    GAGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (178)
Oscillatoriales cyanobacterium Array 1    AATTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (179)
Oscillatoriales cyanobacterium Array 1 RC    TAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (180)
Oscillatoriales cyanobacterium Array 2    GATTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (181)
Oscillatoriales cyanobacterium Array 2 RC    CCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (182)
Rivularia sp. PCC 7116 Array 1    GATTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (183)
Rivularia sp. PCC 7116 Array 1 RC    CCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (184)
Rivularia sp. PCC 7116 Array 2    TAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (185)
Rivularia sp. PCC 7116 Array 2 RC    GGTAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (186)
Eubacterium saburreum DSM 3986 Array 1    TAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (187)
Eubacterium saburreum DSM 3986 Array 1 RC    GGTAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (188)
Eubacterium saburreum DSM 3986 Array 2    ATAAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (189)
Eubacterium saburreum DSM 3986 Array 2 RC    GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (190)
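The 4 bp overhang described in the Table 4 caption can be checked computationally. The sketch below is a hedged illustration, not part of the described protocol: it assumes that the universal oligo FS_0963 (SEQ ID NO 260, listed in Table 6) anneals to the 3′ portion of each array-specific oligo, so that the unpaired 5′ bases of the array-specific oligo form the overhang. The function names are hypothetical.

```python
# Universal reverse oligo FS_0963, copied from Table 6 (SEQ ID NO 260).
FS_0963 = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def adapter_overhang(array_oligo, universal_oligo=FS_0963):
    """Return the unpaired 5' bases of array_oligo after annealing.

    Assumes the universal oligo pairs with the 3' end of the
    array-specific oligo; raises ValueError if it does not.
    """
    paired = reverse_complement(universal_oligo)
    if not array_oligo.endswith(paired):
        raise ValueError("oligos are not complementary at the 3' end")
    return array_oligo[: len(array_oligo) - len(paired)]

# SEQ ID NO 141 (Bacteroides fragilis strain S14 Array 1) and FS_0964.
print(adapter_overhang("ATAAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"))
print(adapter_overhang("AAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"))
```

Under this assumption the listed oligos differ only in their first four bases, which is consistent with a common adapter body and an array-specific overhang.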
TABLE-US-00005 TABLE 5 First round PCR primers for SENECA acquisition readout
Primer binding sites for DR specific SENECA forward amplification primers sorted by their respective CRISPR arrays. Related to FIG. 8. During SENECA PCR, the forward primer was chosen corresponding to the respective CRISPR array while FS_0911 serves as a universal reverse primer binding the Illumina adapter. Details on primer design are described in Primer Design Notes 1 and 2. For the CRISPR array directionality screen, staggering was conducted by ordering only two forward primers with different stagger lengths (NN and NNN) instead of the usual 7 forward primers described for Fusicatenibacter saccharivorans array 2.
Array    Sequence (5′→3′) (SEQ ID NO)
Bacteroides fragilis strain S14 Array 1    CAGTATAATAAGGATTAAGAC (191)
Bacteroides fragilis strain S14 Array 1 RC    ACTGGAATACATCTACAT (192)
Campylobacter fetus subsp. Fetus Array 1    ATTAGGGGATGAAAC (193)
Campylobacter fetus subsp. Fetus Array 1 RC    GGAGAAAGTGTCTAAAC (194)
Cellulomonas bogoriensis 69B4 Array 1    GAGGGCATTGAAAC (195)
Cellulomonas bogoriensis 69B4 Array 1 RC    GCCATGGGTGGAAC (196)
Fusicatenibacter saccharivorans Array 1    CCTATGAGGAATTGAAAC (197)
Fusicatenibacter saccharivorans Array 1 RC    CATAGGTAAGGTACAAC (198)
Fusicatenibacter saccharivorans Array 2    CCTAAAAGGAATTGAAAC (199)
Fusicatenibacter saccharivorans Array 2 RC    TTTAGGTAAAGTACGAC (200)
Candidatus Accumulibacter sp. SK-02 Array 1    GATAAAGGGATTGAGAC (201)
Candidatus Accumulibacter sp. SK-02 Array 1 RC    GGGCTTAGTTTTCAC (202)
Micromonospora rosaria Array 1    GCGGGCATAGAAAC (203)
Micromonospora rosaria Array 1 RC    CTGTGGATGGCGAT (204)
Micromonospora rosaria Array 2    GCGGGCATAGAAAC (205)
Micromonospora rosaria Array 2 RC    CTGTGGATGGCAAT (206)
Micromonospora rosaria Array 3    GGTGATGAGCGAC (207)
Candidatus Accumulibacter sp. BA-91 Array 1    GAACAGGCTTGAAAC (208)
Desulfarculus baarsii DSM 2075 Array 1    GAAGCGGATTGAAAC (209)
Desulfarculus baarsii DSM 2075 Array 1 RC    GGCATCCCTCAATAG (210)
Desulfarculus baarsii DSM 2075 Array 2    GAAGCGGATTGAAAC (211)
Desulfarculus baarsii DSM 2075 Array 2 RC    GGCATCCCTCAATAG (212)
Woodsholea maritima Array 1    CAGAGCTGATCAAAAC (213)
Woodsholea maritima Array 1 RC    GATTCGAGCAGAGC (214)
Woodsholea maritima Array 2    GGAGCGGATTGAAAC (215)
Woodsholea maritima Array 2 RC    GATGCCGTCGCGAC (216)
Azospirillum lipoferum 4B Array 1    GGAGCGGATTGAAAC (217)
Azospirillum lipoferum 4B Array 1 RC    GACACCGGCGGAAC (218)
Azospirillum lipoferum 4B Array 2    GCTAAGGCTGTGAAAC (219)
Azospirillum lipoferum 4B Array 2 RC    CTAATGTCGATTGCGAC (220)
Teredinibacter turnerae T8412 Array 1    AAGTTGAATTAATGGAAAC (221)
Teredinibacter turnerae T8412 Array 1 RC    TTCCGAAGAAGTTTAAAG (222)
Tolypothrix campylonemoides Array 1    AAGTTGAATTAATGGAAAC (223)
Tolypothrix campylonemoides Array 1 RC    GGGAGAAGTTTAACAG (224)
Tolypothrix campylonemoides Array 2    AAGTTGAATTAATGGAAAC (225)
Tolypothrix campylonemoides Array 2 RC    TTCCGAAGAAGTTTAAAG (226)
Tolypothrix campylonemoides Array 3    AGTCAAATTAATGGAAAC (227)
Tolypothrix campylonemoides Array 3 RC    CAGAGAAGTCGAGAAG (228)
Oscillatoriales cyanobacterium Array 1    GTCAAATTAATGGAAACA (229)
Oscillatoriales cyanobacterium Array 1 RC    CCTAAGAAGTCGAAAG (230)
Oscillatoriales cyanobacterium Array 2    CGGATTAGTTGGAAAC (231)
Oscillatoriales cyanobacterium Array 2 RC    CCCAATCGGTGGGG (232)
Rivularia sp. PCC 7116 Array 1    CGGATTAGTTGGAAAC (233)
Rivularia sp. PCC 7116 Array 1 RC    CCCAATCGGTGGGG (234)
Rivularia sp. PCC 7116 Array 2    CCTATAAGGAATGGAAAC (235)
Rivularia sp. PCC 7116 Array 2 RC    TTATAGGTAAGGTACTTAC (236)
Eubacterium saburreum DSM 3986 Array 1    CCTATAAGGAATGGAAAC (237)
Eubacterium saburreum DSM 3986 Array 1 RC    TTATAGGTAAGGTACTTAC (238)
Eubacterium saburreum DSM 3986 Array 2    CAGTATAATAAGGATTAAGAC (239)
Eubacterium saburreum DSM 3986 Array 2 RC    ACTGGAATACATCTACAT (240)
TABLE-US-00006 TABLE 6 Miscellaneous Primers
Primers and oligonucleotides used for cloning purposes.
Primer ID    Sequence (5′→3′) (SEQ ID NO)
FS_0151    ATGCTTCATGTCACCAGGTAGTCTTCCATCGACTTCAAAACTCGATCCAACATCCTGAAGACGCGGCCGCTATTCTTTTGATTTATAAGGGATTTTG (241)
FS_0152    CAACAACATGAATGATCTTCGGTTTCCGTGTTTCG (242)
FS_0153    CACGGAAACCGAAGATCATTCATGTTGTTGCTCAGGTC (243)
FS_0154    CGCCGCACTTATGACTATCTTCTTTATCATGCAACTCG (244)
FS_0155    GATAAAGAAGATAGTCATAAGTGCGGCGACG (245)
FS_0156    GATACCGAAGATAGCTCATGTTATATCCCGCCG (246)
FS_0157    GATATAACATGAGCTATCTTCGGTATCGTCGTATCC (247)
FS_0158    CTCCCATGAAGATGGTACGCGACTGGGC (248)
FS_0159    GTCGCGTACCATCTTCATGGGAGAAAATAATACTGTTG (249)
FS_0160    GAAGACTACCTGGTGACATGAAGCATCTCGAGGGTCTTCCTTGCCGGTGGTGCAGATGTTGAACAGAAGACCACATATGTATATCTCCTTCTTAAAGTTAAACAAAATTATTTC (250)
FS_0380    TCGAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCTGAAAGGAGGAACTATATCCGGATA (251)
FS_0381    CCTGGTATCCGGATATAGTTCCTCCTTTCAGCAAAAAACCCCTCAAGACCCGTTTAGAGGCCCCAAGGGGTTATGCTAGTTATTGCTCAGCGGTGGCAGCAGCCAACTCAGCTTCCTTTCGGGCTTTGTTAGCAGCCGGATC (252)
FS_0658    GCTCAGCATATGGACATCCTGATCAGAAACAAGAAG (253)
FS_0659    GCTCAGCATATGCAGTACTCCAACTGGCACGACTC (254)
FS_0660    GCTCAGCATATGTTCATCAACGGTCGTTACCACATC (255)
FS_0662    CCTACTCGCTTCTGGTGAATGTC (256)
FS_0871    CCGGATACCAGGTGAGAATTAAATTG (257)
FS_0904    GTTTAGCGGCCGCGGGACGTTTCAATTCCTCATAGGTAAGGTACAACATCAGCATTTCCGCTATTTTCAC (258)
FS_0911    GTGACTGGAGTTCAGACG (259)
FS_0963    GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC (260)
FS_0964    AAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (261)
FS_0995    GATATACATATGTTCACTATAGACGAGATG (262)
FS_0996    ATATAGCTGCGGCGTATCTGATC (263)
FS_0997    AGATACGCCGCAGCTATATACATCTATATGGACAGCTACGAGAAG (264)
FS_0998    GTCGGATGTCTCTAAGATCTGG (265)
FS_1001    GCGAAATTAATACGACTCACTATAGG (266)
FS_1002    TACTCGCTTCTGGTGAATGTC (267)
FS_1003    GAGCTTTAGCCGCTAAGAGCATCATG (268)
FS_1004    CATGATGCTCTTAGCGGCTAAAGCTC (269)
FS_1005    GTTGCTGGCGGCAACAACCCC (270)
FS_1006    GGGGTTGTTGCCGCCAGCAAC (271)
FS_1007    GATGTCAGCAAAAGCCAGGTTAAGG (272)
FS_1008    CCTTAACCTGGCTTTTGCTGACATC (273)
FS_1009    GCTTGAAGATGGCAGCAAAATCC (274)
FS_1010    GGATTTTGCTGCCATCTTCAAGC (275)
FS_1011    CTATGACTATAGGCGCGAAGATGTCAGC (276)
FS_1012    GCTGACATCTTCGCGCCTATAGTCATAG (277)
FS_1054    ACGCATGTCCGGTAAAATGA (278)
FS_1055    CAAGTCATTTTACCGGACAT (279)
FS_1056    GCTCAGGAAGACTTTGCTTAAAATGGTTCAACGCTGACAAAG (280)
FS_1057    GTTTAGAAGACTTGATCTTACAGGCTGGTTACGTTACCAG (281)
FS_1038    ACGCATGAGTCAGAATACGCTGAAAGTT (282)
FS_1039    CAAGAACTTTCAGCGTATTCTGACTCAT (283)
FS_1040    GCTCAGGAAGACTTTGCTAATGAAGATGCGGAATTTGATG (284)
FS_1041    GTTTAGAAGACTTGATCTTACTCGCGGAACAGCGC (285)
FS_1046    ACGCATGCGAAGCTCGGCTAAGCAAGAAGAACTA (286)
FS_1047    CAAGTAGTTCTTCTTGCTTAGCCGAGCTTCGCAT (287)
FS_1048    GTTTAGAAGACTTTGCTTTTAAAGCATTACTTAAAGAAGAGAAATTTAGC (288)
FS_1049    GTTTAGAAGACTTGATCTTAAAGCTCCTGGTCGAACAG (289)
FS_1123    GCTCAGGAAGACTACCGGTGGCACGTAAGAGGTTCCAAC (290)
FS_1125    GTTTAGGATCCGATCGCGTCTTCTGATCGTTGGAATCGCCATGGGAAGTCGAATGGAAGACTACTCTAGTAGTGCTCAGTATCTCTATC (291)
FS_1134    GCTCAGGAAGACTTAGAGAAGCTTGCGGAGGAGCATGCATGAGCAAAGGAGAAGAACTTTTC (292)
FS_1135    GTTTAGAAGACTTGATCCTATCATTTGTAGAGTTCATCCATGCC (293)
FS_1136    GCTCAGGAAGACTTAGAGAAGCTTGCGGAGGAGCATGCATGGCTTCCAAGGTGTACG (294)
FS_1137    GTTTAGAAGACTTGATCTCATTACTGCTCGTTCTTCAGCAC (295)
FS_1154    ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNAGCTCGGCTAAGCAAGAAGA (296)
FS_1155    ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNAGCTCGGCTAAGCAAGAAGA (297)
FS_1156    ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNAGCTCGGCTAAGCAAGAAGA (298)
FS_1157    ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNAGCTCGGCTAAGCAAGAAGA (299)
FS_1158    GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCNNGGTCAACATCCGCGAGACTT (300)
FS_1159    GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCNNNGGTCAACATCCGCGAGACTT (301)
FS_1160    GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCNNNNGGTCAACATCCGCGAGACTT (302)
FS_1161    GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCNNNNNGGTCAACATCCGCGAGACTT (303)
FS_1406    GCTGAAAGGAGGAACTATATCCG (304)
FS_1407    CAAAATCCCTTATAAATCAAAAGAATAGC (305)
FS_1584    CGCCGCAAGGAATGGTGCATGCAACTAGTATACAGTGACTCTTGGCGCGCCTTGACGGCTAGCTCAGTCCTAGGTACAGTGCTAGCTACTAGAGAAAGAGGAGAAATACTAGATGAAAAAC (306)
FS_1585    CGATCCTACAGGTGAATTCATGCCTTTAATTATAAACGCAGAAAG (307)
FS_1586    GGCATGAATTCACCTGTAGGATCGTACAGGTTTACGCAAGAAAATGGTTTGTTATAGTCGAATAAATACTGAGTCTTCACCACGACGATTTCCGGCAGTTTCTCCACAGAAGACAACGATTAAAGGCATCAAATAAAACGAAAG (308)
FS_1587    GAAAGTTGGAACCTCTTACGTGCCAGTCGACCCCAGCTGTCTAGGGCG (309)
FS_1588    TCGACCATTCGACTTCCCACGATTCCAACGATCAGG (310)
FS_1589    GATCCCTGATCGTTGGAATCGTGGGAAGTCGAATGG (311)
FS_1618    GCTCAGGGTCTCATACTAGAGAAAGAGGAGAAATACTAGATGGAAGATGCCAAAAACATAAAG (312)
FS_1619    GTTTAGGTCTCAATCGTCATTACACGGCGATCTTTCCG (313)
FS_1620    GCTCAAGAAGACAAAGAGATGGCTTCCAAGGTGTACG (314)
FS_1621    GCTCAGGGTCTCATACTATGGAAGATGCCAAAAACATAAAG (315)
FS_1574    GCTCAGGCCATGCCGGCGGCACGTAAGAGGTTCCAAC (316)
FS_1575    CTCCTTTGCTCATGCATGC (317)
FS_1576    GCTCAGGCATGCATGTTCACTATAGACGAGATGCTATC (318)
FS_1577    AAGTCGGATGTCTCTAAGATCTG (319)
FS_1641    GCGGAGGAGCATGCATGTTTACCATCGACGAGATG (320)
FS_1642    CAGCCGGATCTCGAGTTAG (321)
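The staggered primers in Table 6 (FS_1154 to FS_1157) follow a regular pattern that can be reproduced programmatically. The sketch below is one reading of that pattern and not a statement of how the primers were designed: the split of each primer into an adapter prefix, a run of N bases and a binding site is an interpretation of the listed sequences, and the function and variable names are hypothetical.

```python
# Interpretation of FS_1154 to FS_1157: shared 5' adapter prefix,
# a variable-length run of N (the stagger), then a shared binding site.
ADAPTER_PREFIX = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
BINDING_SITE = "AGCTCGGCTAAGCAAGAAGA"

def staggered_primers(adapter, binding_site, stagger_lengths=range(2, 6)):
    """Build one primer per stagger length by inserting that many N bases."""
    return [adapter + "N" * n + binding_site for n in stagger_lengths]

primers = staggered_primers(ADAPTER_PREFIX, BINDING_SITE)
for name, seq in zip(["FS_1154", "FS_1155", "FS_1156", "FS_1157"], primers):
    print(name, seq, len(seq))
```

Varying the number of N bases shifts the start of the amplicon-derived sequence between reads, which is a common way to increase base diversity in the first sequencing cycles of amplicon libraries. Other stagger sets can be generated by passing a different `stagger_lengths`.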
TABLE-US-00007 TABLE 7 Primers and TaqMan probes used for qRT-PCR
Primer ID    Sequence (5′→3′) (SEQ ID NO)
16S rRNA E. coli TaqMan Fw    TGGCGCATACAAAGAGAAGC (322)
16S rRNA E. coli TaqMan Rv    ACTCCAATCCGGACTACGAC (323)
16S rRNA E. coli TaqMan probe (5′FAM/3′Black Hole Quencher 1)    ACCTCGCGAGAGCAAGCGGACC (324)
sfGFP E. coli TaqMan Fw    CGGATCACATGAAACGGCAT (325)
sfGFP E. coli TaqMan Rv    CGTCTTGTAGGTCCCGTCAT (326)
sfGFP E. coli TaqMan probe (5′HEX/3′Black Hole Quencher 1)    ACCTTCGGGCATGGCACTCTTG (327)
Rluc E. coli TaqMan Fw    AATGGGTAAGTCCGGCAAGA (328)
Rluc E. coli TaqMan Rv    CGTGGCCCACAAAGATGATT (329)
Rluc E. coli TaqMan probe (5′HEX/3′Black Hole Quencher 1)    ACCTCACCGCTTGGTTCGAGCTGC (330)
Fluc E. coli TaqMan Fw    GCTCCAACACCCCAACATCTTC (331)
Fluc E. coli TaqMan Rv    GCTCCAAAACAACAACGGCG (332)
Fluc E. coli TaqMan probe (5′HEX/3′Black Hole Quencher 1)    CAGGTGTCGCAGGTCTTCCCGACGA (333)
[0245] Sequences 1—RT-Cas1s, Cas2s and CRISPR Arrays
[0246] Codon mapped DNA Sequences for the individual RT-Cas1, Cas2 orthologs were ordered from Twist Biosciences or Genscript along with their predicted CRISPR arrays for the classical adaptation read-out in
[0247] Bacteroides fragilis strain S14
[0248] Bacteroides fragilis strain S14 RT-Cas1 (SEQ ID NO 1)
[0249] Bacteroides fragilis strain S14 Cas2 (SEQ ID NO 2)
[0250] Bacteroides fragilis strain S14 Array (SEQ ID NO 102)
[0251] Campylobacter fetus subsp. Fetus
[0252] Campylobacter fetus subsp. Fetus RT-Cas1 (SEQ ID NO 3)
[0253] Campylobacter fetus subsp. Fetus Cas2 (SEQ ID NO 4)
[0254] Campylobacter fetus subsp. Fetus Array (SEQ ID NO 103)
[0255] Cellulomonas bogoriensis 69B4
[0256] Cellulomonas bogoriensis 69B4 RT-Cas1 (SEQ ID NO 5)
[0257] Cellulomonas bogoriensis 69B4 Cas2 (SEQ ID NO 6)
[0258] Cellulomonas bogoriensis 69B4 Array (SEQ ID NO 35)
[0259] Fusicatenibacter saccharivorans
[0260] Fusicatenibacter saccharivorans RT-Cas1 (SEQ ID NO 7)
[0261] Fusicatenibacter saccharivorans Cas2 (SEQ ID NO 8)
[0262] Fusicatenibacter saccharivorans Array 1 (SEQ ID NO 36)
[0263] Fusicatenibacter saccharivorans Array 2 (SEQ ID NO 37)
[0264] Candidatus Accumulibacter sp. SK-02
[0265] Candidatus Accumulibacter sp. SK-02 RT-Cas1 (SEQ ID NO 9)
[0266] Candidatus Accumulibacter sp. SK-02 Cas2 (SEQ ID NO 10)
[0267] Candidatus Accumulibacter sp. SK-02 Array (SEQ ID NO 38)
[0268] Micromonospora rosaria
[0269] Micromonospora rosaria RT-Cas1 (SEQ ID NO 11)
[0270] Micromonospora rosaria Cas2 (SEQ ID NO 12)
[0271] Micromonospora rosaria Array 1 (SEQ ID NO 39)
[0272] Micromonospora rosaria Array 2 (SEQ ID NO 40)
[0273] Candidatus Accumulibacter sp. BA-91
[0274] Candidatus Accumulibacter sp. BA-91 RT-Cas1 (SEQ ID NO 13)
[0275] Candidatus Accumulibacter sp. BA-91 Cas2 (SEQ ID NO 14)
[0276] Candidatus Accumulibacter sp. BA-91 Array (SEQ ID NO 41)
[0277] Desulfarculus baarsii DSM 2075
[0278] Desulfarculus baarsii DSM 2075 RT-Cas1 (SEQ ID NO 15)
[0279] Desulfarculus baarsii DSM 2075 Cas2 (SEQ ID NO 16)
[0280] Desulfarculus baarsii DSM 2075 Array (SEQ ID NO 42)
[0281] Woodsholea maritima
[0282] Woodsholea maritima RT-Cas1 (SEQ ID NO 17)
[0283] Woodsholea maritima Array (SEQ ID NO 43)
[0284] Azospirillum lipoferum 4B
[0285] Azospirillum lipoferum 4B RT-Cas1 (SEQ ID NO 19)
[0286] Azospirillum lipoferum 4B Cas2 (SEQ ID NO 20)
[0287] Azospirillum lipoferum 4B Array (SEQ ID NO 44)
[0288] Azospirillum lipoferum 4B Array 2 (SEQ ID NO 45)
[0289] Vibrio sinaloensis strain T08
[0290] Vibrio sinaloensis strain T08 RT-Cas1 (SEQ ID NO 21)
[0291] Vibrio sinaloensis strain T08 Cas2 (SEQ ID NO 22)
[0292] Vibrio sinaloensis strain T08 Array (SEQ ID NO 46)
[0293] Teredinibacter turnerae T8412
[0294] Teredinibacter turnerae T8412 RT-Cas1 (SEQ ID NO 23)
[0295] Teredinibacter turnerae T8412 Cas2 (SEQ ID NO 24)
[0296] Teredinibacter turnerae T8412 Array (SEQ ID NO 47)
[0297] Tolypothrix campylonemoides
[0298] Tolypothrix campylonemoides RT-Cas1 (SEQ ID NO 25)
[0299] Tolypothrix campylonemoides Cas2 (SEQ ID NO 26)
[0300] Tolypothrix campylonemoides Array (SEQ ID NO 48)
[0301] Oscillatoriales cyanobacterium
[0302] Oscillatoriales cyanobacterium RT-Cas1 (SEQ ID NO 27)
[0303] Oscillatoriales cyanobacterium Cas2 (SEQ ID NO 28)
[0304] Oscillatoriales cyanobacterium Array (SEQ ID NO 49)
[0305] Rivularia sp. PCC 7116
[0306] Rivularia sp. PCC 7116 Cas1 (SEQ ID NO 29)
[0307] Rivularia sp. PCC 7116 RT (SEQ ID NO 33)
[0308] Rivularia sp. PCC 7116 Cas2 (SEQ ID NO 30)
[0309] Rivularia sp. PCC 7116 Array 1 (SEQ ID NO 50)
[0310] Rivularia sp. PCC 7116 Array 2 (SEQ ID NO 51)
[0311] Eubacterium saburreum DSM 3986
[0312] Eubacterium saburreum DSM 3986 RT-Cas1 (SEQ ID NO 31)
[0313] Eubacterium saburreum DSM 3986 Cas2 (SEQ ID NO 32)
[0314] Eubacterium saburreum DSM 3986 Array 1 (SEQ ID NO 52)
[0315] Eubacterium saburreum DSM 3986 Array 2 (SEQ ID NO 53)
[0316] Sequences 2—CRISPR Array Directionality Screen
[0317] Sequences of putative arrays for the CRISPR array directionality screen related to
[0318] Bacteroides fragilis strain S14
[0319] Bacteroides fragilis strain S14 Array 1 (SEQ ID NO 54)
[0320] Bacteroides fragilis strain S14 Array 1 RC (SEQ ID NO 55)
[0321] Campylobacter fetus subsp. fetus
[0322] Campylobacter fetus subsp. fetus Array 1 (SEQ ID NO 56)
[0323] Campylobacter fetus subsp. fetus Array 1 RC (SEQ ID NO 57)
[0324] Cellulomonas bogoriensis 69B4
[0325] Cellulomonas bogoriensis 69B4 Array 1 (SEQ ID NO 58)
[0326] Cellulomonas bogoriensis 69B4 Array 1 RC (SEQ ID NO 59)
[0327] Fusicatenibacter saccharivorans
[0328] Fusicatenibacter saccharivorans Array 1 (SEQ ID NO 60)
[0329] Fusicatenibacter saccharivorans Array 1 RC (SEQ ID NO 61)
[0330] Fusicatenibacter saccharivorans Array 2 (SEQ ID NO 62)
[0331] Fusicatenibacter saccharivorans Array 2 RC (SEQ ID NO 63)
[0332] Candidatus Accumulibacter sp. SK-02
[0333] Candidatus Accumulibacter sp. SK-02 Array 1 (SEQ ID NO 64)
[0334] Candidatus Accumulibacter sp. SK-02 Array 1 RC (SEQ ID NO 65)
[0335] Micromonospora rosaria
[0336] Micromonospora rosaria Array 1A (SEQ ID NO 66)
[0337] Micromonospora rosaria Array 1 RC (SEQ ID NO 67)
[0338] Micromonospora rosaria Array 2A (SEQ ID NO 68)
[0339] Micromonospora rosaria Array 2 RC (SEQ ID NO 69)
[0340] Micromonospora rosaria Array 3A (SEQ ID NO 70)
[0341] Candidatus Accumulibacter sp. BA-91
[0342] Candidatus Accumulibacter sp. BA-91 Array 1 (SEQ ID NO 71)
[0343] Desulfarculus baarsii DSM 2075
[0344] Desulfarculus baarsii DSM 2075 Array 1 (SEQ ID NO 72)
[0345] Desulfarculus baarsii DSM 2075 Array 1 RC (SEQ ID NO 73)
[0346] Desulfarculus baarsii DSM 2075 Array 2 (SEQ ID NO 74)
[0347] Desulfarculus baarsii DSM 2075 Array 2 RC (SEQ ID NO 75)
[0348] Woodsholea maritima
[0349] Woodsholea maritima Array 1 (SEQ ID NO 76)
[0350] Woodsholea maritima Array 1 RC (SEQ ID NO 77)
[0351] Azospirillum lipoferum 4B
[0352] Azospirillum lipoferum 4B Array 1 (SEQ ID NO 78)
[0353] Azospirillum lipoferum 4B Array 1 RC (SEQ ID NO 79)
[0354] Azospirillum lipoferum 4B Array 2A (SEQ ID NO 80)
[0355] Azospirillum lipoferum 4B Array 2 RC (SEQ ID NO 81)
[0356] Teredinibacter turnerae T8412
[0357] Teredinibacter turnerae T8412 Array 1 (SEQ ID NO 82)
[0358] Teredinibacter turnerae T8412 Array 1 RC (SEQ ID NO 83)
[0359] Tolypothrix campylonemoides
[0360] Tolypothrix campylonemoides Array 1 (SEQ ID NO 84)
[0361] Tolypothrix campylonemoides Array 1 RC (SEQ ID NO 85)
[0362] Tolypothrix campylonemoides Array 2 (SEQ ID NO 86)
[0363] Tolypothrix campylonemoides Array 2 RC (SEQ ID NO 87)
[0364] Tolypothrix campylonemoides Array 3 (SEQ ID NO 88)
[0365] Tolypothrix campylonemoides Array 3 RC (SEQ ID NO 89)
[0366] Oscillatoriales cyanobacterium
[0367] Oscillatoriales cyanobacterium Array 1 (SEQ ID NO 90)
[0368] Oscillatoriales cyanobacterium Array 1 RC (SEQ ID NO 91)
[0369] Oscillatoriales cyanobacterium Array 2 (SEQ ID NO 92)
[0370] Oscillatoriales cyanobacterium Array 2 RC (SEQ ID NO 93)
[0371] Rivularia sp. PCC 7116
[0372] Rivularia sp. PCC 7116 Array 1 (SEQ ID NO 94)
[0373] Rivularia sp. PCC 7116 Array 1 RC (SEQ ID NO 95)
[0374] Rivularia sp. PCC 7116 Array 2 (SEQ ID NO 96)
[0375] Rivularia sp. PCC 7116 Array 2 RC (SEQ ID NO 97)
[0376] Eubacterium saburreum DSM 3986
[0377] Eubacterium saburreum DSM 3986 Array 1 (SEQ ID NO 98)
[0378] Eubacterium saburreum DSM 3986 Array 1 RC (SEQ ID NO 99)
[0379] Eubacterium saburreum DSM 3986 Array 2 (SEQ ID NO 100)
[0380] Eubacterium saburreum DSM 3986 Array 2 RC (SEQ ID NO 101)
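In the directionality screen listing above, most arrays are given in two orientations, the second labelled "RC". Assuming "RC" denotes the reverse complement of the corresponding array (the listing itself does not spell this out), the relationship between such a pair can be sketched as follows. The function name and the short example sequence are hypothetical and are not taken from any SEQ ID NO of this document.

```python
# Minimal sketch, assuming "RC" in the listing means "reverse complement".
# The sequence used here is a made-up placeholder, not a sequence from
# this document.

# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence containing A/C/G/T."""
    # Complement every base, then reverse the resulting string.
    return seq.translate(_COMPLEMENT)[::-1]


placeholder_array = "GTTTCAGAC"  # hypothetical sequence, for illustration only
placeholder_rc = reverse_complement(placeholder_array)

# Applying the operation twice restores the original orientation.
restored = reverse_complement(placeholder_rc)
```

Under this assumption, testing both orientations of a putative array allows the screen to determine which strand, and hence which end of the array, accepts new spacers.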
[0381] Sequences 3—Miscellaneous Sequences
[0382] gBlock FS_gBlock_td_intron_acceptor (SEQ ID NO 104)
[0383] Human codon-optimized FsRT-Cas1-T7RBS-Cas2 (SEQ ID NO 34)
[0384] pFS 0453 plasmid (SEQ ID NO 334)