TRANSCRIPTIONAL RECORDING BY CRISPR SPACER ACQUISITION FROM RNA

Abstract

The present invention relates to a method for recording a transcriptome of a cell, the method comprising the steps of: providing a test cell comprising: a first transgene nucleic acid sequence encoding a fusion protein comprising a reverse transcriptase polypeptide and a Cas1 polypeptide and a second transgene nucleic acid sequence encoding a Cas2 polypeptide, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence are under transcriptional control of an inducible promoter sequence, and a third transgene nucleic acid sequence comprising a CRISPR direct repeat (DR) sequence; wherein said CRISPR direct repeat sequence is specifically recognizable by an RT-Cas1-Cas2 complex formed by the expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence, exposing said test cell to conditions under which expression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence is induced, wherein said RT-Cas1-Cas2 complex formed by expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence acquires protospacers from RNA molecules and integrates spacers into said third transgene nucleic acid sequence yielding a modified third transgene nucleic acid sequence, isolating said modified third transgene nucleic acid sequence from said test cell yielding an isolated modified third transgene nucleic acid sequence, and sequencing said isolated modified third transgene nucleic acid sequence.

Claims

1. A method for recording a transcript, particularly for recording a transcriptome, of a cell, the method comprising the steps of: providing a test cell comprising: a first transgene nucleic acid sequence encoding a fusion protein comprising a reverse transcriptase polypeptide and a Cas1 polypeptide and a second transgene nucleic acid sequence encoding a Cas2 polypeptide, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence are under transcriptional control of an inducible promoter sequence, and a third transgene nucleic acid sequence comprising a CRISPR direct repeat (DR) sequence; wherein said CRISPR direct repeat sequence is specifically recognizable by an RT-Cas1-Cas2 complex formed by the expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence, in an exposure step, exposing said test cell to conditions under which expression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence is induced, wherein said RT-Cas1-Cas2 complex formed by expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence acquires at least one protospacer, particularly more than one protospacer, from one or more nucleic acid molecules, more particularly one or more RNA molecules, and integrates said protospacer as spacer into said third transgene nucleic acid sequence yielding a modified third transgene nucleic acid sequence comprising at least one integrated spacer, isolating said modified third transgene nucleic acid sequence from said test cell yielding an isolated modified third transgene nucleic acid sequence, and sequencing said isolated modified third transgene nucleic acid sequence.

2. The method according to claim 1, wherein said third transgene nucleic acid sequence further comprises a CRISPR leader sequence, wherein said CRISPR leader sequence is specifically recognizable by said RT-Cas1-Cas2 complex formed by the expression products of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence.

3. The method according to claim 1 or 2, wherein said third transgene nucleic acid sequence does not comprise any further CRISPR direct repeat sequence.

4. The method according to any one of the preceding claims, wherein said test cell additionally comprises a fourth transgene nucleic acid sequence encoding a sensor, wherein said sensor will be activated when contacted with an analyte molecule yielding an activated sensor, wherein said activated sensor will induce the expression of a record gene inside the cell; and wherein in said exposure step, if said analyte molecule is present, said activated sensor induces the expression of a record gene inside the cell and RNA derived from said record gene is acquired as a spacer.

5. The method according to any one of the preceding claims, wherein said CRISPR leader sequence and/or said CRISPR direct repeat sequence are specifically recognizable by an RT-Cas1-Cas2 complex of F. saccharivorans, Candidatus Accumulibacter, Eubacterium saburreum, Bacteroides fragilis, Campylobacter fetus, Teredinibacter turnerae, Woodsholea maritima, Desulfarculus baarsii, Azospirillum lipoferum, Cellulomonospora bogoriensis, Micromonospora rosaria, Tolypothrix campylonemoides, Oscillatoriales cyanobacterium, or Rivularia sp., or an RT-Cas1-Cas2 complex originating therefrom.

6. The method according to any one of the preceding claims, wherein said test cell is an E. coli cell.

7. The method according to any one of the preceding claims, wherein said third transgene nucleic acid sequence is comprised within a vector, particularly an expression vector.

8. The method according to claim 7, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence are comprised within said vector.

9. The method according to any one of the preceding claims, wherein said conditions, under which expression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence is induced, lead to an overexpression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence.

10. The method according to any one of the preceding claims, wherein said conditions, under which expression of said first transgene nucleic acid sequence and said second transgene nucleic acid sequence is induced, comprise contacting said test cell with an inducer compound, particularly IPTG, lactose, arabinose, rhamnose or anhydrotetracycline; or comprise anaerobic conditions and said inducible promoter is an anaerobically inducible promoter.

11. The method according to any one of the preceding claims, wherein said third transgene nucleic acid sequence comprises an endonuclease recognition site sequence downstream of or within said CRISPR direct repeat, and said endonuclease recognition site sequence is specifically recognizable by a site-specific endonuclease, particularly a restriction endonuclease, wherein particularly said CRISPR direct repeat and said restriction site sequence are separated by 20 bps to 0 bps, and said site-specific endonuclease is particularly a Type IIS or Type IIG restriction endonuclease, particularly FaqI, BsmFI, BslFI, FinI, or BpuSI, and said isolated modified third transgene nucleic acid sequence is contacted with said site-specific endonuclease before said sequencing, wherein said (full length) CRISPR direct repeat (adjacent to said endonuclease site) is cleaved into a truncated CRISPR direct repeat sequence.

12. The method according to claim 11, wherein said sequencing comprises the use of a PCR primer, wherein said PCR primer comprises a nucleic acid sequence being essentially complementary to part of a full length CRISPR direct repeat sequence, but not fully complementary to said truncated CRISPR direct repeat sequence resulting from said endonuclease cleavage, within said modified third transgene nucleic acid sequence, wherein said full length CRISPR direct repeat sequence results from or is formed by at least one spacer acquisition event.

13. The method according to any one of the preceding claims, wherein said first transgene nucleic acid sequence encoding a fusion protein comprising a reverse transcriptase polypeptide and a Cas1 polypeptide comprises or essentially consists of a sequence selected from SEQ ID NO 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, and 31, or a sequence at least 85% identical, particularly ≥90%, ≥93%, ≥95%, ≥98% or ≥99% identical to SEQ ID NO 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, or 31, and the encoded polypeptide having substantially the same biological functionality as the polypeptide encoded by SEQ ID NO 7.

14. The method according to any one of the preceding claims, wherein said second transgene nucleic acid sequence encoding a Cas2 polypeptide comprises or essentially consists of a sequence selected from SEQ ID NO 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, and 32, or a sequence at least 85% identical, particularly ≥90%, ≥93%, ≥95%, ≥98% or ≥99% identical to SEQ ID NO 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, or 32, and the encoded polypeptide having substantially the same biological functionality as the polypeptide encoded by SEQ ID NO 8.

15. The method according to any one of the preceding claims, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence together comprise or essentially consist of a sequence of SEQ ID NO 34, or a sequence at least 85% identical, particularly ≥90%, ≥93%, ≥95%, ≥98% or ≥99% identical to SEQ ID NO 34 and encoding polypeptides having substantially the same biological functionality as the polypeptides encoded by SEQ ID NO 34.

16. The method according to any one of the preceding claims, wherein said third transgene nucleic acid sequence comprising a CRISPR direct repeat (DR) sequence comprises or essentially consists of a sequence selected from SEQ ID NO 35 to 103.

17. An isolated nucleic acid molecule comprising: a CRISPR direct repeat (DR), wherein said isolated nucleic acid molecule does not comprise any further CRISPR direct repeat sequence.

18. The isolated nucleic acid molecule according to claim 17 additionally comprising a CRISPR leader sequence.

19. The isolated nucleic acid molecule according to claim 18, wherein said CRISPR leader sequence and said CRISPR direct repeat sequence are separated by 10 to 0 bp.

20. The isolated nucleic acid molecule according to any one of claims 17 to 19, further comprising an endonuclease recognition site sequence downstream of or within said CRISPR direct repeat, wherein said endonuclease recognition site sequence is specifically recognizable by a site-specific endonuclease, particularly a site-specific restriction endonuclease, and wherein particularly said CRISPR direct repeat and said restriction site sequence are separated by 20 bps to 0 bps, particularly by 10 bps to 0 bps.

21. The isolated nucleic acid molecule according to claim 20, wherein said site-specific endonuclease is a Type IIS or Type IIG restriction endonuclease, particularly FaqI, BsmFI, BslFI, FinI, or BpuSI.

22. The isolated nucleic acid molecule according to any one of claims 17 to 21, wherein said CRISPR leader sequence and/or said CRISPR direct repeat sequence are specifically recognizable by an RT-Cas1-Cas2 complex of F. saccharivorans, Candidatus Accumulibacter, Eubacterium saburreum, Bacteroides fragilis, Campylobacter fetus, Teredinibacter turnerae, Woodsholea maritima, Desulfarculus baarsii, Azospirillum lipoferum, Cellulomonospora bogoriensis, Micromonospora rosaria, Tolypothrix campylonemoides, Oscillatoriales cyanobacterium, or Rivularia sp., or an RT-Cas1-Cas2 complex originating therefrom.

23. An expression vector comprising the following sequence elements: a first nucleic acid sequence encoding a fusion protein of a reverse transcriptase and a Cas1 polypeptide, and a second nucleic acid sequence encoding a Cas2 polypeptide, wherein said first nucleic acid sequence and said second nucleic acid sequence are under transcriptional control of an inducible promoter sequence, and a CRISPR array sequence comprising a CRISPR direct repeat (DR) sequence, wherein said CRISPR direct repeat sequence is specifically recognizable by an RT-Cas1-Cas2 complex formed by the expression products of said first nucleic acid sequence and said second nucleic acid sequence.

24. The expression vector according to claim 23, wherein said CRISPR array sequence further comprises a CRISPR leader sequence, wherein said CRISPR leader sequence and said CRISPR direct repeat sequence are separated by 10 to 0 bp.

25. The expression vector according to claim 23 or 24, wherein said CRISPR array sequence does not comprise any further CRISPR repeat sequence specifically recognizable by said RT-Cas1-Cas2 complex.

26. The expression vector according to any one of claims 23 to 25, further comprising an endonuclease recognition site sequence downstream of or within said CRISPR direct repeat, wherein said endonuclease recognition site sequence is specifically recognizable by a site-specific endonuclease, particularly a site-specific restriction endonuclease, and said CRISPR direct repeat and said restriction site sequence are separated by 10 bps to 0 bps.

27. The expression vector according to claim 26, wherein said site-specific endonuclease is a Type IIS or Type IIG restriction endonuclease, particularly FaqI, BsmFI, BslFI, FinI, or BpuSI.

28. The expression vector according to any one of claims 23 to 27, wherein said CRISPR leader sequence, said CRISPR direct repeat sequence, said first nucleic acid sequence and said second nucleic acid sequence originate from F. saccharivorans, Candidatus Accumulibacter, Eubacterium saburreum, Bacteroides fragilis, Campylobacter fetus, Teredinibacter turnerae, Woodsholea maritima, Desulfarculus baarsii, Azospirillum lipoferum, Cellulomonospora bogoriensis, Micromonospora rosaria, Tolypothrix campylonemoides, Oscillatoriales cyanobacterium, or Rivularia sp.

29. The expression vector according to any one of claims 23 to 28, wherein said inducible promoter sequence is operable in E. coli and is particularly selected from T7 promoter, lac promoter, tac promoter, P.sub.tet promoter, P.sub.C promoter and P.sub.BAD promoter.

30. The expression vector according to any one of claims 23 to 29, wherein said first nucleic acid sequence and said second nucleic acid sequence are codon-optimized for E. coli.

31. A cell comprising a first transgene nucleic acid sequence encoding a fusion protein of a reverse transcriptase and a Cas1 polypeptide, and a second transgene nucleic acid sequence encoding a Cas2 polypeptide, wherein said first transgene nucleic acid sequence and said second transgene nucleic acid sequence are under transcriptional control of an inducible promoter sequence, and a transgene nucleic acid molecule according to any one of claims 15 to 20, wherein said first transgene nucleic acid sequence, said second transgene nucleic acid sequence and said transgene nucleic acid molecule are comprised in an expression vector according to any one of claims 23 to 30 or integrated into the genome of said cell.

32. The cell according to claim 31, additionally comprising a fourth transgene nucleic acid sequence encoding a fourth transgene product, wherein said fourth transgene product is capable of modulating the expression of a record gene inside the cell, and wherein such modulating the expression of said record gene is dependent on the presence or absence of an analyte molecule.

33. The cell according to claim 32, wherein said fourth transgene product is a sensor which will be activated when contacted with a molecule of interest yielding an activated sensor, wherein said activated sensor will induce the expression of a record gene inside the cell.

34. A method for monitoring a diet of a patient or for diagnosis of a disease of a patient, particularly of a digestive or gastrointestinal disorder of a patient, said method comprising the steps of collecting a cell according to any one of claims 31 to 33 from a feces sample collected from said patient, wherein said cell has been previously administered orally to said patient, isolating the transgene nucleic acid sequence from said cell yielding an isolated transgene nucleic acid sequence, and sequencing said isolated transgene nucleic acid sequence, thereby recording one or more transcripts of said cell produced in the environment of the gastrointestinal tract.

35. An apparatus for conducting the method of claim 34.

Description

DESCRIPTION OF THE FIGURES

[0112] FIG. 1 shows the transcriptional recording by CRISPR spacer acquisition from RNA: a) Expression of RT-Cas1-Cas2 leads to the acquisition of intracellular RNAs, providing a molecular memory of transcriptional events stored within DNA; and b) Comparison of RNA sequencing (RNA-seq) and CRISPR acquisition-mediated recording of RNA followed by deep sequencing (Record-seq). RNA-seq captures the transcriptome of a population of cells at a single point in time, providing a transient snapshot of cellular events. In contrast, Record-seq permanently stores information about prior transcriptional events in a CRISPR array, providing a molecular record for reconstructing transcriptional events that occurred over time.

[0113] FIG. 2 shows the characterization of spacers acquired by FsRT-Cas1-Cas2; a) Schematic of Record-seq experimental workflow (FIG. 7); b) Coverage of spacers aligning to the E. coli genome (scale bar 250 kb) and a representative locus (scale bar 100 bp). Identical alignments represent recurrent spacers acquired in independent biological samples (n=14). The sense/antisense orientation label is with respect to the RNA; c) Length distribution of genome-aligning spacers; d) GC content distribution of genome-aligning spacers. Dotted line represents 50% GC content; e) Nucleotide probabilities of the 5′ (left) or 3′ (right) end of the spacer, along with the respective flanking sequence. The spacer (blue) and flanking (grey) nucleotides are shown. Data represent spacers merged across n=14 independent biological samples; f) Gene body coverage of spacer alignments along transcripts. Relative position represents percentiles of coding sequence lengths ±300 bp of adjacent genomic regions. Values are mean normalized coverage ±s.d., n=14 independent biological samples. Values in c-e are mean percent of genome-aligning spacers ±s.e.m., n=14 independent biological samples.
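
The length and GC-content distributions described for panels c and d can be illustrated with a minimal sketch. The spacer sequences and the function names `gc_fraction` and `length_distribution` below are hypothetical and are not taken from the disclosed data or analysis; the sketch only shows one way such per-spacer statistics might be computed.

```python
from collections import Counter


def gc_fraction(spacer: str) -> float:
    """Return the fraction of G or C bases in a spacer sequence."""
    s = spacer.upper()
    if not s:
        raise ValueError("empty spacer")
    return sum(base in "GC" for base in s) / len(s)


def length_distribution(spacers):
    """Return {length: percent of spacers with that length}."""
    counts = Counter(len(s) for s in spacers)
    total = sum(counts.values())
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}


# Hypothetical spacers, invented for illustration only.
example_spacers = ["ATGCATGC", "GGGCCC", "ATATAT", "ATGCGGCC"]
gc_values = [gc_fraction(s) for s in example_spacers]
lengths = length_distribution(example_spacers)
```

A GC fraction of 0.5 corresponds to the balanced GC content marked by the dotted line in panel d.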

[0114] FIG. 3 shows that the inventive system FsRT-Cas1-Cas2 acquires spacers directly from RNA according to abundance; a) Schematic of td intron-containing constructs and representative spacers aligning to the td intron splice junction; b) Quantification of spacers derived from the td intron splice junction. Values are mean td intron spacers per million reads ±s.e.m., n=3 independent biological samples. The sum of raw sequencing counts is shown below; c) Experimental workflow depicting MS2 recording; d) Quantification of MS2-derived RNA spacers. Values are mean MS2-aligning spacers per million reads ±s.e.m., n=3 (no MS2) and 4 (MS2) biologically independent samples; e) Coverage of spacers aligning to the MS2 genome. Data represent alignments merged across samples. Sense or antisense orientation is given with respect to the (+)-strand MS2 RNA; scale bar 200 bp; f) Schematic and quantification of transcriptional recording of arbitrary sequences. Values are mean relative spacer count ±s.e.m., n=10 independent biological samples. The constitutively expressed KanR selection marker was used as a control; g) Schematic and quantification of orthogonal transcriptional recording. Values are mean relative spacer count ±s.e.m., n=10 (treated) and 9 (untreated) independent biological samples.
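
Several panels report values as spacers per million reads, given as mean ±s.e.m. across independent samples. The following sketch shows how such a normalization and summary might be computed; the counts are invented, and the helper names `spacers_per_million` and `mean_and_sem` are assumptions made for illustration rather than part of the disclosed workflow.

```python
import math


def spacers_per_million(spacer_count: int, total_reads: int) -> float:
    """Normalize a raw spacer count to spacers per one million sequencing reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return spacer_count * 1_000_000 / total_reads


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (sample s.d. / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two samples are needed for s.e.m.")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Hypothetical (spacer count, total reads) pairs for three samples.
samples = [(12, 2_000_000), (30, 3_000_000), (8, 1_000_000)]
normalized = [spacers_per_million(c, r) for c, r in samples]
mean, sem = mean_and_sem(normalized)
```

Normalizing by read depth before averaging keeps samples sequenced to different depths comparable.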

[0115] FIG. 4 shows the transcriptome-scale recording and analysis of complex cellular behaviors; a) Workflow for comparing Record-seq with RNA-seq; b) Clustering of Record-seq data from untreated (grey) and oxidative stress treated (green) E. coli populations, performed using Pearson correlation, n=12 (untreated) and n=11 (treated) independent biological samples; c) Clustering of Record-seq data from untreated (grey boxes) and acid stress treated (orange boxes) E. coli populations, performed using Pearson correlation, n=10 independent biological samples; d) PCA of Record-seq data from untreated (grey) and oxidative stress treated (green) E. coli populations, n=12 (untreated) and n=11 (treated) independent biological samples; e) PCA of Record-seq data from untreated (grey) and acid stress treated (orange) E. coli populations, n=10 independent biological samples; f) Clustering of Record-seq data for signature differentially expressed genes under oxidative stress; g) Clustering of Record-seq data for signature differentially expressed genes under acid stress.

[0116] FIG. 5 shows sentinel cells for recording of dose-dependent and transient herbicide exposure; a) Clustering of Record-seq data from untreated (grey), 10 mM paraquat treated (red) and 1 mM paraquat treated (green) E. coli populations, performed using Pearson correlation, n=15 independent biological samples; b) PCA of Record-seq data from untreated (grey), 10 mM paraquat treated (red) and 1 mM paraquat treated (green) E. coli populations, n=15 independent biological samples; c) Clustering of Record-seq data for signature differentially expressed genes; d) Workflow for comparing Record-seq with RNA-seq upon transient paraquat exposure; e) PCA of RNA-seq data from unexposed (grey), transient paraquat exposed (turquoise) and constantly paraquat exposed (red) E. coli populations, n=6 independent biological samples; f) PCA of Record-seq data from unexposed (grey), transient paraquat exposed (turquoise) and constantly paraquat exposed (red) E. coli populations, n=6 independent biological samples.

[0117] FIG. 6 shows the RT-Cas1 ortholog search and screening; a) Experimental workflow involving the identification of 121 RT-Cas1 orthologs, overexpression in E. coli from the plasmid carrying a minimal CRISPR array, containing leader-DR-spacer1-DR-spacer2-DR, followed by deep sequencing of expanded CRISPR arrays, and analysis as well as characterization of identified spacers; b) A comparison of the 14 disparate RT-Cas1 proteins selected for functional testing. Indicated on the left is the host species followed by a neighbor-joining phylogenetic tree built using Jukes-Cantor genetic distances of a MUSCLE multiple sequence alignment. The large “Unknown Domain” is highlighted in green, Cas6 homology domain in pink, RT domain in purple, and Cas1 in yellow; c) Detection frequency of newly acquired spacers after overnight growth and induction of RT-Cas1-Cas2 in E. coli BL21(DE3) in different induction media. Shown is the sum of spacer counts per 1 million sequencing reads, n=1 biological sample; d) Representative alignments of 200 spacers sequenced from F. saccharivorans array 1 to the corresponding overexpression plasmid; e) Representative alignments of 200 spacers sequenced from F. saccharivorans array 2 to the corresponding overexpression plasmid.

[0118] FIG. 7 shows the SENECA workflow and assessment of Record-seq efficiency in different culture conditions; a) SENECA relies on a plasmid containing a minimal CRISPR array consisting of the leader sequence followed by a single DR and a recognition sequence for the restriction enzyme FaqI. The SENECA workflow for the (left) parental and (right) expanded array are shown. In a Golden Gate reaction, FaqI cleaves within the DR (I/II) introducing sticky ends for ligation to an Illumina P7 3′ adapter (III). For the parental array this results in a single truncated DR (IVa). For the expanded array this results in a truncated DR as well as an intact DR and spacer (IVb). PCR with primers binding to the full-length DR and the Illumina P7 3′ adapter results in linear amplification of the parental array (Va) and exponential amplification of the expanded array (Vb); b) Sequencing reads obtained from E. coli BL21(DE3) cells transformed with FsRT-Cas1-Cas2 encoding plasmid with or without IPTG induction; c) Same as b) but in E. coli BL21AI; d) Same as b) but in E. coli NovaBlue(DE3), a K12 substrain of E. coli; e) Comparison of the percent of sequencing reads from induced samples containing newly acquired spacers; f) Spacers per million sequencing reads obtained from cultures at an OD.sub.600 of 0.4, 0.8 or upon saturation; g) CRISPR arrays with two spacers per million sequencing reads obtained from cultures at an OD.sub.600 of 0.4, 0.8 or upon saturation. Values in b-g are mean±s.e.m., n=3 independent biological samples.
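
The selective amplification logic of panel a can be modeled as simple string operations: after cleavage, a parental array retains only a truncated DR, whereas an expanded array retains a full-length DR, a spacer and a truncated DR, so only the expanded array offers a binding site for a primer that requires the full-length DR. In the sketch below, all sequences (`LEADER`, `DR`, `TRUNCATED_DR`, `ADAPTER`, `SPACER`) and the position of the cut are invented placeholders, not the actual F. saccharivorans repeat, FaqI cut site or Illumina adapter.

```python
# All sequences below are hypothetical placeholders for illustration only.
LEADER = "ACGTACGTAA"
DR = "GTTTCAGTCCCCTAGGGATTC"   # stand-in for a full-length direct repeat
TRUNCATED_DR = DR[:12]         # stand-in for the DR remnant left after cleavage
ADAPTER = "AGATCGGAAG"         # stand-in for the ligated 3' adapter
SPACER = "TTGACCATGCAAGGCTTAACCGGTTAAGCATGC"

# Templates after cleavage and adapter ligation (steps IVa and IVb).
parental = LEADER + TRUNCATED_DR + ADAPTER
expanded = LEADER + DR + SPACER + TRUNCATED_DR + ADAPTER


def has_full_length_dr(template: str) -> bool:
    """True if a primer requiring the full-length DR would find a binding site."""
    return DR in template


def extract_spacers(template: str):
    """Return the sequences lying between a full-length DR and the next DR start."""
    spacers = []
    pos = template.find(DR)
    while pos != -1:
        start = pos + len(DR)
        end = template.find(TRUNCATED_DR, start)
        if end == -1:
            break
        spacers.append(template[start:end])
        pos = template.find(DR, end)
    return spacers


parental_amplifiable = has_full_length_dr(parental)
expanded_amplifiable = has_full_length_dr(expanded)
recovered = extract_spacers(expanded)
```

In this simplified model the parental template lacks a full-length DR and yields no spacer, which mirrors the linear versus exponential amplification described for steps Va and Vb.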

[0119] FIG. 8 shows the Record-seq-based screen of RT-Cas1 orthologs and CRISPR array directionalities; a) Schematic of the F. saccharivorans CRISPR locus depicting the selection of CRISPR arrays and directionalities for Record-seq analysis. CRISPR arrays within each locus were identified and cloned into plasmids encoding corresponding RT-Cas1-Cas2 coding sequences. Arrays were tested in both possible directionalities, forward and reverse with a 150 bp leader. In cases of insufficient genomic data, arrays were only tested in one directionality; b) Record-seq readout of RT-Cas1 orthologs and CRISPR array directionalities. Acquisition efficiency for forward (fw) and reverse complement (rc) directionality of each array are plotted in blue and orange, respectively. Values are genome-aligning spacers per million sequencing reads, n=1 biological sample.

[0120] FIG. 9 shows the characterization of spacers acquired by FsRT-Cas1-Cas2 and comparison of SENECA and classic spacer acquisition readouts; a) Nucleotide probabilities determined using plasmid-aligning spacers merged across n=14 independent biological samples, prepared analogously to FIG. 2f; b) Histogram of spacer GC content for all spacers or spacers acquired internal to the body of the transcript (‘gene body internal’). Values represent mean percent of genome-aligning spacers ±s.e.m., n=3 independent biological samples; c) Percent of spacers aligning to either the sense or antisense strand of coding genes. The sense or antisense orientation label is with respect to the RNA, prepared analogously to FIG. 2c; d) Length distribution of genome-aligning spacers, prepared analogously to FIG. 2d; e) GC-content distribution of genome-aligning spacers. The dotted line represents a balanced (50%) GC content, prepared analogously to FIG. 2e; f) Nucleotide probabilities for classic acquisition readout, prepared analogously to FIG. 2f; g) Nucleotide probabilities for SENECA acquisition readout, prepared analogously to FIG. 2f. Gene body coverage. For each gene the spacer coverage was determined and transformed into percentiles for comparison. Values are mean normalized coverage, n=1 pooled sample, containing 5798 spacers. Values in c-g are mean percent of genome-aligning spacers, n=1 pooled sample, containing 5798 spacers.

[0121] FIG. 10 shows the characterization of spacers acquired by FsRT-Cas1-Cas2; a) Experimental workflow for determining the specificity of FsRT-Cas1-Cas2 for RNA using the td intron splice junction to detect RNA-derived spacers. Genomic DNA (gDNA) was extracted from an independent culture and subjected to targeted deep sequencing of the td intron insertion site; b) Quantification of td intron splice junctions; the splice junction is specific to RNA-derived spacers and not genomic DNA or cDNA copies generated by alternative RTs in the E. coli genome. Values represent mean td intron splice junction counts per million sequencing reads ±s.e.m., n=3 independent biological samples; c) Number of spacers aligned to plasmid, E. coli genome, and MS2 genome, showing CRISPR acquisition from an RNA virus. The total number and percent of spacers aligning to each reference are shown. Values represent the sum of MS2-aligning spacers across replicates, n=64 technical replicates from n=2 biological samples, representing 22 million spacers; d) Number of MS2-aligned spacers from c) that align to the overexpression plasmid, E. coli and MS2 genome, showing that MS2-aligned spacers are specific to the MS2 genome. The total number and percent of MS2-aligned spacers that subsequently align to each reference are shown, n=64 technical replicates from n=2 biological samples, representing 22 million spacers; e) Total number of spacers aligning to features of the MS2 genome, n=64 technical replicates from n=2 biological samples, representing 22 million spacers; f) Scatter plot of transcript counts from the MS2 and E. coli genomes. Each dot represents the mean spacer count for each transcript, n=4 independent biological samples. The horizontal black bars are mean genome-aligning spacer count across all transcripts ±s.e.m.

[0122] FIG. 11 shows the quantitative analysis of arbitrary RNA sequence recording using qRT-PCR and Record-seq; a) Coverage of spacers from FIG. 3f aligning to sfGFP or Rluc. Arrow and dotted line reflect the transcription start site (TSS), black octagon indicates the transcriptional terminator. For each nucleotide position, the sum spacer coverage per million sequencing reads is shown, n=10 independent biological samples; b) Absolute quantification of sfGFP mRNA measured by qRT-PCR. Samples from FIG. 3f. Values are mean copy number per 6×10.sup.9 cells, normalized by 16S rRNA copy number, ±s.e.m., n=10 independent biological samples; c) Analogous to b, but for Rluc; d) Scatter plot depicting the correlation between absolute sfGFP mRNA copy number and the number of transcript-aligning spacers from FIG. 3f. Linear regression fit, coefficient of determination (R.sup.2), and Pearson linear correlation coefficient (P), n=10 independent biological samples; e) Analogous to d, but for Rluc; f) Comparison of spacer counts for arbitrary sfGFP sequence and endogenous transcripts. Each dot represents the mean spacer count for each transcript, horizontal black bars are mean genome-aligning spacer count ±s.e.m., n=10 independent biological samples; g) Dose-response relationship between sfGFP-aligning spacers and inducer concentration for different numbers of recorded spacers. These data represent the average number of sfGFP-aligning spacers ±s.e.m., n=10 independent biological samples; h) Relative spacer count of spacers mapping to the Fluc transcript after 3OC6-HSL induction. Values are the normalized mean number of spacers per million sequencing reads ±s.e.m. with n=6 independent biological samples; i) Absolute quantification of Fluc mRNA measured by qRT-PCR. Data were obtained from the same bacterial cultures as in FIG. 3g. Values are mean copy number per 6×10.sup.9 cells, normalized by 16S rRNA copy number, ±s.e.m., n=10 independent biological samples; j) The same as in g, but for Rluc.

[0123] FIG. 12 shows that Record-seq reveals cumulatively highly expressed genes; a) Scatter plots depicting Record-seq correlation between n=3 independent biological replicates shown in b and c. Linear regression fit, coefficient of determination (R.sup.2), and Pearson linear correlation coefficient (P) are shown for each comparison. Data represent log2-normalized transcript quantification counts; b) Spacers are preferentially acquired from highly expressed genes. Record-seq spacer counts for plasmid and E. coli genes (top) or only E. coli genes (bottom) according to decreasing RNA-seq-based gene expression values. Monte Carlo bounds reflect simulated spacers with no transcriptional bias. Mean cumulative normalized spacer count, and Monte Carlo bounds are shown, n=3 independent biological samples; c) Assessing the correlation between an RNA-seq stationary phase snapshot and a Record-seq transcriptional record. RNA-seq and Record-seq were performed on the same population of E. coli BL21(DE3) in stationary phase growth, induced to express FsRT-Cas1-Cas2 overnight. The correlations for all (top left), stationary-phase (top right), log-phase (bottom left), and plasmid-borne (bottom right) genes are shown. The linear regression fit, coefficient of determination (R.sup.2), and Pearson linear correlation coefficient (P) are shown for each comparison. The data represent the log2-normalized transcript quantification counts averaged across replicates, n=3 independent biological samples; d) Correlation of Record-seq with log and stationary-phase genes over long-term cultivation. These data represent the R.sup.2 value calculated as described for b for either stationary or logarithmic phase gene sets using different E. coli culture time points as inputs with n=3 independent biological samples; e) Comparison of transcript-aligning spacer counts with and without normalizing for gene expression level. Each dot represents the mean normalized number of counts per transcript with n=3 independent biological samples. The horizontal black bars are mean genome-aligning spacer count ±s.e.m.

[0124] FIG. 13 shows the determination of the minimum number of cells required for assessing complex cellular behaviors using Record-seq and PCA; a) Using the acid stress response data set shown in FIG. 4, PCA was performed on the entire data set as well as on progressively and randomly downsampled data. These data show that Record-seq appropriately classifies the acid stress response samples with 7% of the original data (corresponding to 314 spacers or 6.1×10.sup.6 E. coli cells), n=10 independent biological samples.

[0125] FIG. 14 shows the determination of the minimum number of cells required for assessing complex cellular behaviors using Record-seq and differentially expressed signature gene analysis; Using the acid stress response data set shown in FIG. 4e, f, g, differentially expressed signature genes were identified for the entire data set as well as for progressively and randomly downsampled data. The plots depict hierarchically clustered signature gene heatmaps. These data show that with 10% of the original data (corresponding to 448 spacers or 8.8×10.sup.6 E. coli cells) the signature genes can appropriately classify the samples, n=10 independent biological samples.

[0126] FIG. 15 shows the optimization of CRISPR spacer acquisition efficiency and detection of signature genes corresponding to Record-seq-compatible sentinel cells for encoding transient herbicide exposure; a) Plasmid and genome-aligning spacers obtained from E. coli BL21(DE3) transformed with FsRT-Cas1-Cas2 encoding plasmid using the original coding sequence (CDS) (light blue) or optimized CDS (dark blue) under the indicated IPTG concentrations; b) Plasmid and genome-aligning spacers obtained from E. coli BL21(DE3) transformed with FsRT-Cas1-Cas2 encoding plasmid using the optimized coding sequence under transcriptional control of either the P.sub.T7lac, P.sub.tetA, or P.sub.rhaB promoter, induced with the indicated concentrations of IPTG, aTc, or Rhamnose, respectively; c) Unsupervised hierarchical clustering of RNA-seq cumulative expression profiles for signature differentially (cumulatively) expressed genes. Signature genes represent the union between the top 20 most differently expressed genes identified by DESeq2, edgeR, and baySeq, n=6 independent biological samples; d) Unsupervised hierarchical clustering of Record-seq cumulative expression profiles for signature differentially (cumulatively) expressed genes. Signature genes represent the union between the top 20 most differently expressed genes identified by DESeq2, edgeR, and baySeq, n=6 independent biological samples. Data in a, b are mean±s.e.m., n=3 independent biological samples.

[0127] FIG. 16 shows a schematic of the general Record-seq workflow in the mouse gut. E. coli BL21(DE3) or MG1655 cells are transformed with a plasmid encoding FsRT-Cas1-Cas2 under transcriptional control of an inducible promoter (in this case P.sub.tetA). Furthermore, the vector encodes the SENECA-compatible version of an Fs CRISPR array. E. coli cells are grown first on solid culture after transformation, and then in liquid culture from individual colonies. Subsequently, germ-free mice are gavaged with E. coli cells. Maintenance of the plasmid and expression of FsRT-Cas1-Cas2 are ensured by addition of antibiotics (matching the resistance marker of the FsRT-Cas1-Cas2 plasmid) as well as inducers of FsRT-Cas1-Cas2 expression (in this case anhydrotetracycline). The E. coli cells colonize the gut of the germ-free mouse and FsRT-Cas1-Cas2 records spacers into plasmid-borne CRISPR arrays during the passage of cells through the gut. E. coli cells are then collected from feces of the animals or contents of the gut at different sites. Plasmid DNA is extracted from E. coli and subjected to SENECA followed by deep sequencing to retrieve the recorded spacers and infer the intestinal environment.

[0128] FIG. 17 shows acquisition of spacers detected by SENECA and deep-sequencing after oral gavage of mice with E. coli BL21(DE3) cells. Anhydrotetracycline (aTc) was supplied through the drinking water at indicated concentrations. Acquisition of spacers increased with increasing aTc concentration.

[0129] FIG. 18 shows acquisition of spacers detected by SENECA and deep-sequencing after oral gavage of mice with E. coli BL21(DE3) cells. Anhydrotetracycline (aTc) was supplied through the drinking water at indicated concentrations. Acquisition of multiple spacers increased with increasing aTc concentration.

[0130] FIG. 19 shows acquisition of spacers detected by SENECA and deep-sequencing after oral gavage of mice with E. coli BL21(DE3) cells. Plasmid DNA was isolated from E. coli cells from small intestine, cecum, colon and feces. Spacer acquisition occurs in all tested anatomical sections of the gut.

[0131] FIG. 20 shows acquisition of spacers detected by SENECA and deep-sequencing after oral gavage of mice with E. coli BL21(DE3) cells. Plasmid DNA was isolated from E. coli cells from feces of animals at days 2, 5 and 9, and spacer acquisition was shown to increase over time.

[0132] FIG. 21: Shows a PCA for Record-seq data derived from C57BL/6 mice gavaged with FsRT-Cas1-Cas2 expressing E. coli BL21(DE3) cells as outlined in FIG. 16 and treated with either water (H.sub.2O) or 1, 2 or 3% (w/v) colitis inducing dextran sulfate sodium (DSS) in their drinking water.

[0133] FIG. 22: Shows a PCA for Record-seq data derived from C57BL/6 mice gavaged with FsRT-Cas1-Cas2 expressing E. coli BL21(DE3) cells as outlined in FIG. 16 and fed with either a chow or starch-based diet.

[0134] FIG. 23: Shows a heatmap depicting unsupervised hierarchical clustering for the top differentially expressed genes for Record-seq data derived from C57BL/6 mice gavaged with FsRT-Cas1-Cas2 expressing E. coli BL21(DE3) cells as outlined in FIG. 16 and treated with either water (H.sub.2O) or 1, 2 or 3% (w/v) colitis inducing dextran sulfate sodium (DSS) in their drinking water. Variance stabilizing transformation (vst) transformed genome-aligning spacer counts were used.

[0135] FIG. 24: Shows a heatmap depicting unsupervised hierarchical clustering for the top differentially expressed genes for Record-seq data derived from C57BL/6 mice gavaged with FsRT-Cas1-Cas2 expressing E. coli BL21(DE3) cells as outlined in FIG. 16 and fed with either a chow or starch-based diet. Variance stabilizing transformation (vst) transformed genome-aligning spacer counts were used.

[0136] FIG. 25: Shows a PCA plot for Record-seq data derived from C57BL/6 mice gavaged with FsRT-Cas1-Cas2 expressing E. coli MG1655 cells as outlined in FIG. 16 and fed with either a chow, starch or fat-based diet.

EXAMPLES

[0137] The inventors hypothesized that direct CRISPR spacer acquisition from RNA could be leveraged to store transcriptional records in CRISPR arrays within living cells. Therefore, several orthologous RT-Cas1-containing CRISPR-Cas systems were characterized. The inventors identified one from Fusicatenibacter saccharivorans to be capable of acquiring RNA spacers heterologously in E. coli. Leveraging F. saccharivorans RT-Cas1 and Cas2 (FsRT-Cas1-Cas2), the inventors developed Record-seq, a method enabling transcriptome-scale molecular recordings into populations of cells. Transcriptional events are recorded according to RNA abundance, stored in CRISPR arrays within DNA, and can be leveraged to describe continuous as well as transient complex cellular behaviors.

[0138] CRISPR Spacer Acquisition by FsRT-Cas1-Cas2

[0139] The inventors set out to identify an RT-Cas1-Cas2 CRISPR acquisition complex with the ability to acquire spacers directly from RNA upon heterologous expression in E. coli. The inventors identified 121 RT-Cas1 orthologs (Table 1), and selected 14 representatives for functional characterization (FIG. 6a, b). The inventors overexpressed the corresponding RT-Cas1 and Cas2 proteins from a plasmid additionally containing their predicted CRISPR array (FIG. 6a). Using a previously established spacer acquisition assay, the inventors discovered that the ortholog of F. saccharivorans actively acquired new spacers (FIG. 6c). The endogenous F. saccharivorans locus contains two CRISPR arrays, and the inventors observed that novel spacers derived from the overexpression plasmid as well as the E. coli genome were acquired into either array (FIG. 6c-e).

[0140] Selective Amplification of Expanded CRISPR Arrays

[0141] Using the previously established spacer acquisition assay, the inventors obtained approximately 1300 newly acquired spacers per 1 million deep sequencing reads for FsRT-Cas1-Cas2 (FIG. 6c). To improve detection of novel spacers, the inventors developed Selective amplification of expanded CRISPR arrays (SENECA), a method to selectively amplify CRISPR arrays that acquired new spacers (FIG. 2a, FIG. 7a). A typical SENECA-assisted Record-seq experiment uses an input of ~180 ng of plasmid DNA extracted from an overnight culture of E. coli overexpressing FsRT-Cas1-Cas2, and yields 950,000 total spacers aligning to the plasmid or host genome for every 1 million sequencing reads (FIG. 2a, FIG. 7b-e). This marks an improvement of several thousand-fold compared to recent reports. Using Record-seq, the inventors readily demonstrated in vivo activity of FsRT-Cas1-Cas2 in various E. coli strains and throughout growth phases (FIG. 7b-g).
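By way of a non-limiting illustration, the per-million-read normalization used in the preceding paragraph may be sketched as follows. The function name is hypothetical and not part of the disclosed method; the two input values are the figures quoted above. The resulting ratio compares only the classic assay with SENECA as described here, and does not reproduce the comparison to other reports.

```python
def spacers_per_million(spacer_count: int, total_reads: int) -> float:
    """Normalize a spacer count to spacers per 1 million sequencing reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return spacer_count * 1_000_000 / total_reads


# Figures quoted in the text: ~1300 spacers per million reads (classic assay)
# and 950,000 spacers per million reads (SENECA-assisted Record-seq).
classic = spacers_per_million(1_300, 1_000_000)
seneca = spacers_per_million(950_000, 1_000_000)

# Ratio of the two in-document figures only (illustrative).
fold_enrichment = seneca / classic
print(f"SENECA vs. classic readout: ~{fold_enrichment:.0f}-fold")
```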

[0142] The inventors then employed Record-seq to rescreen their initial selection of RT-Cas1 orthologs (FIG. 7b). Furthermore, the inventors included all potential CRISPR arrays present in their endogenous loci in both possible directionalities in order to overcome the challenges associated with predicting these a priori (FIG. 8a). Due to the improved sensitivity of Record-seq compared to the classic readout, the inventors readily detected newly acquired spacers for the majority of orthologs upon RT-Cas1-Cas2 expression (FIG. 8b). Only a few orthologs exhibited a preferred directionality of the CRISPR array (i.e., specificity for an upstream leader sequence). Consistent with the classic readout, FsRT-Cas1-Cas2 outperformed all other orthologs in terms of spacer acquisition efficiency and was chosen for further characterization. The concepts employed by Record-seq may also be applied to characterize spacer acquisition in other CRISPR-Cas systems that have been intractable due to low spacer acquisition efficiencies.

[0143] Characteristics of FsRT-Cas1-Cas2 Spacer Acquisition

[0144] In order to better understand the properties of FsRT-Cas1-Cas2, the inventors extensively characterized newly acquired spacers by performing Record-seq on populations of E. coli overexpressing FsRT-Cas1-Cas2 (FIG. 2a). The inventors observed that genome-aligning spacers were preferentially acquired with a specific ‘antisense’ orientation, whereby spacers were complementary to the originating RNA (FIG. 2b, c). The median spacer length was 39 bp, with a distribution biased towards longer lengths (FIG. 2d). The median GC content was 36%, showing a strong bias towards AT-rich spacers (FIG. 2e). In line with previously described Type III CRISPR systems, the inventors did not find a sequence preference within or adjacent to newly adapted spacers acquired from either plasmid (FIG. 9a) or genome (FIG. 2f), implying that the FsRT-Cas1-Cas2 complex exhibits no protospacer adjacent motif (PAM). While observing spacer alignments to the E. coli genome, the inventors noted that many coverage peaks were located near the termini of genes (FIG. 2b). Consistent with this observation, the inventors found that at the genome-wide level, most spacers were derived from the 5′, and to a lesser extent, 3′ ends of genes (FIG. 2g). This finding raised the possibility that the apparent bias towards AT-rich spacers might be caused by the AT-richness of RNA ends in E. coli; however, the bias towards AT-rich spacers persisted when only considering spacers derived from within the gene body (FIG. 8b). The inventors directly compared SENECA with the classic spacer readout to determine whether SENECA introduces additional biases but found no major differences (FIG. 9c-h). Taken together, these results reflect a process by which FsRT-Cas1-Cas2 selects AT-rich spacers based on sequences related to the beginning or end of a gene, such as the ends of an RNA molecule.

[0145] FsRT-Cas1-Cas2 Acquires Spacers Directly from RNA

[0146] To determine whether FsRT-Cas1-Cas2 acquires spacers directly from RNA, the inventors utilized a self-splicing td group I intron. This intron is a functional ribozyme, catalyzing its own excision from the pre-mRNA, resulting in a characteristic splice junction that is not present at the DNA-level. The inventors constructed three intron-interrupted constructs based on genes that were highly sampled by spacers, namely cspA, rpoS and argR (FIG. 3a). Upon expression of these constructs followed by Record-seq the inventors observed unique spacers spanning the splice junctions (FIG. 3a, b). To exclude the possibility that splice junction-containing spacers were acquired from extended complementary DNA copies generated through unspecific RT activity in E. coli, the inventors performed targeted deep sequencing on genomic DNA extracted from td intron construct-expressing cultures (FIG. 10a) showing that the splice junction was absent at the DNA-level (FIG. 10a, b). Importantly, these results do not exclude the possibility of spacer acquisition from DNA. Taken together, FsRT-Cas1-Cas2 facilitates CRISPR spacer acquisition from RNA heterologously in E. coli.

[0147] To further validate this finding, the inventors utilized the Enterobacteria phage MS2. MS2 phages exist as both sense and antisense single-stranded RNAs during their lifecycle but have no DNA intermediates. Given that MS2 phages require the F pilus for cell entry, which is missing in E. coli BL21(DE3) cells, the inventors turned to the E. coli K12 strain NovaBlue(DE3). Upon infection of FsRT-Cas1-Cas2 expressing cells with MS2 phage, the inventors could readily observe novel MS2-aligning spacers sampled from throughout the MS2 genome (FIG. 3c-e, FIG. 10c-f). The MS2-aligning spacers shared no sequence similarity with the plasmid or host genome, confirming their specificity (FIG. 10d). In sum, FsRT-Cas1-Cas2 enables spacer acquisition directly from a foreign RNA, thereby providing a molecular memory of an invading virus.

[0148] Recording of Arbitrary Transcripts Using Record-Seq

[0149] To assess the potential of FsRT-Cas1-Cas2 for quantitatively recording transcriptional events, the inventors utilized an inducible expression system to directly determine whether spacers were being acquired according to RNA abundance. The corresponding constructs contained super-folder GFP (sfGFP) or Renilla luciferase (Rluc) genes under transcriptional control of the anhydrotetracycline (aTc)-inducible P.sub.tetA promoter. The inventors introduced these into E. coli cultured in increasing levels of aTc and subsequently harvested both total RNA and plasmid DNA for qRT-PCR and Record-seq, respectively (FIG. 3f). The inventors observed that upon increasing induction of sfGFP or Rluc there was a concordant dose-dependent increase in the coverage of spacers aligning to the respective coding sequence (FIG. 11a). The inventors quantified this response and observed a linear relationship (R.sup.2 value of 0.97) between spacer counts and absolute mRNA copy number (FIG. 11b-e) as well as aTc concentration in the media (FIG. 3f). Furthermore, sfGFP-aligning spacers were readily detected against the backdrop of genome-aligning spacers by almost an order of magnitude (FIG. 11f, g), which is in line with using a strong synthetic inducible promoter such as P.sub.tetA. Importantly, spacers aligning to the constitutively expressed KanR gene were not dependent on the aTc concentration (FIG. 3f).
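As a non-limiting sketch of how a linear relationship between spacer counts and mRNA copy number may be quantified, the following computes an ordinary least-squares fit and its coefficient of determination. The function name and the input values are hypothetical placeholders chosen for illustration; they are not measured data from the experiments described above.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical placeholder values (NOT experimental data):
mrna_copies = [1.0, 2.0, 4.0, 8.0]     # e.g. relative mRNA copy number
spacer_counts = [3.0, 5.0, 9.0, 17.0]  # e.g. transcript-aligning spacers

slope, intercept, r_squared = linear_fit(mrna_copies, spacer_counts)
```

For simple linear regression with an intercept, R.sup.2 equals the square of the Pearson correlation coefficient, which is why a single pass over the centered sums suffices.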

[0150] To further generalize these findings, the inventors evaluated a second inducible expression system, placing the firefly luciferase (Fluc) gene downstream of the 3-oxohexanoyl-homoserine lactone (3OC6-HSL)-inducible P.sub.LuxR promoter. Induction led to a 4-fold increase in Fluc-aligning spacers (FIG. 11h). Furthermore, combining both the aTc-inducible P.sub.tetA and the 3OC6-HSL-inducible P.sub.LuxR transcription system enabled orthogonal recording of two independent stimuli in parallel (FIG. 3g, FIG. 11i, j). This suggests that Record-seq is compatible with seemingly any inducible expression system, thereby enabling recording of multiple orthogonal sets of defined stimuli within a population of living cells. Taken together, these results show that CRISPR spacer acquisition from RNA can generate a quantifiable record of cumulative transcript abundance, and also that the transcriptional records are efficiently retrieved using standard molecular and sequencing methods.

[0151] Record-Seq Shows Cumulatively Highly Expressed Genes

[0152] Considering that FsRT-Cas1-Cas2 acquired spacers directly from RNA in an abundance-dependent manner, the inventors investigated whether this could enable quantification of the cumulative cellular transcriptome. The inventors harvested both plasmid DNA for Record-seq and total RNA for RNA-seq from E. coli cultures overexpressing FsRT-Cas1-Cas2 (FIG. 4a). First, the inventors confirmed the reproducibility of Record-seq between biological replicates (Pearson Correlation=0.996 to 0.999 and R.sup.2=0.560 to 0.618) (FIG. 12a), and then assessed the influence of gene expression on spacer acquisition. The FsRT-Cas1-Cas2 spacers showed a strong bias towards highly transcribed genes (FIG. 12a) and correlated with RNA-seq-based gene expression values transcriptome-wide at various growth stages (FIG. 12b-d). While certain CRISPR-Cas subtypes possess active mechanisms for preferentially acquiring plasmid-derived spacers, the inventors did not observe the same after accounting for the high expression level of these genes (FIG. 12e). Taken together, spacers are systematically acquired from highly transcribed genes, and represent cumulative transcript expression.
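One way to visualize a bias towards highly transcribed genes is a cumulative spacer curve over genes ranked by decreasing expression, compared against a null without expression bias. The following is a minimal sketch under stated assumptions: the null is approximated here by shuffling the per-gene spacer counts, which is an illustrative choice and not necessarily the Monte Carlo simulation used for FIG. 12b; all names and values are hypothetical.

```python
import random


def cumulative_fraction(expression, spacers):
    """Cumulative spacer fraction over genes sorted by decreasing expression."""
    order = sorted(range(len(expression)),
                   key=lambda i: expression[i], reverse=True)
    total = sum(spacers)
    curve, running = [], 0
    for i in order:
        running += spacers[i]
        curve.append(running / total)
    return curve


def shuffled_bounds(spacers, n_sim=1000, seed=0):
    """Min/max cumulative curves over random gene orders (assumed null model)."""
    rng = random.Random(seed)
    counts = list(spacers)
    total = sum(counts)
    lower = [1.0] * len(counts)
    upper = [0.0] * len(counts)
    for _ in range(n_sim):
        rng.shuffle(counts)
        running = 0
        for k, c in enumerate(counts):
            running += c
            frac = running / total
            lower[k] = min(lower[k], frac)
            upper[k] = max(upper[k], frac)
    return lower, upper


# Hypothetical placeholder values (NOT experimental data):
expression = [100.0, 10.0, 1.0, 50.0]  # e.g. RNA-seq expression per gene
spacers = [40, 5, 5, 50]               # e.g. Record-seq spacers per gene

curve = cumulative_fraction(expression, spacers)
lower, upper = shuffled_bounds(spacers)
```

A curve that rises above the upper bound at low ranks would indicate that spacers are concentrated in the most highly expressed genes.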

[0153] Transcriptome-Scale Recording Reveals Cell Behaviors

[0154] To determine whether Record-seq could be used to record and describe complex cellular behaviors, the inventors turned to the well-studied oxidative stress and acid stress responses in E. coli. The inventors performed Record-seq on oxidative and acid stress stimulated FsRT-Cas1-Cas2 expressing cultures and analyzed cumulative expression counts using unsupervised hierarchical clustering as well as principal component analysis (PCA). Both approaches were successful in distinguishing treatment conditions, suggesting that Record-seq captured the differential molecular histories (FIG. 4b-e). To identify the cumulatively differentially expressed genes the inventors leveraged standard differential expression (DE) analysis tools developed for RNA sequencing. To overcome specific biases and assumptions of individual tools, the inventors utilized three complementary tools, namely DESeq2, edgeR, and baySeq. After identifying DE genes with each tool, the inventors generated a set of signature genes for each stimulus based on the union of the top 20 DE genes from each analysis, which the inventors hierarchically clustered and plotted along with their expression values (FIG. 4f, g). Among the signature genes the inventors identified several that were expected to dominate the cellular responses for each stimulus. The inventors investigated the minimum number of cells required for assessing complex cellular behaviors by Record-seq, finding that 8.8×10.sup.6 cells are sufficient to appropriately classify treatment conditions (FIG. 13, 14). In sum, these data support the notion that the RNA-derived spacers stored within CRISPR arrays can be utilized to reconstruct the transcriptional response underlying a complex cellular behavior.
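The construction of a signature gene set as the union of the top-ranked genes from several differential expression tools may be sketched as follows. This is a minimal illustration only: it assumes each tool's output has already been reduced to a list of gene identifiers ordered from most to least significant, and the function name and gene identifiers are hypothetical placeholders.

```python
def signature_genes(rankings, top_n=20):
    """Union of the top_n genes from each tool's ranked gene list."""
    signature = set()
    for ranked_genes in rankings.values():
        signature.update(ranked_genes[:top_n])
    return sorted(signature)


# Hypothetical placeholder rankings (NOT experimental results):
rankings = {
    "DESeq2": ["geneA", "geneB", "geneC"],
    "edgeR": ["geneB", "geneD", "geneA"],
    "baySeq": ["geneA", "geneE", "geneB"],
}

# top_n=2 is used only to keep the toy example small; the text uses 20.
signature = signature_genes(rankings, top_n=2)
```

Because a union is taken rather than an intersection, the signature set can contain between top_n and three times top_n genes, depending on how strongly the tools agree.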

[0155] Sentinel Cells Encode Transient Herbicide Exposure

[0156] To determine whether Record-seq could be leveraged for producing sentinel cells, the inventors utilized the herbicide paraquat and determined if Record-seq could capture dose-dependent and transient exposures. Paraquat is a bacteriostatic herbicide that results in superoxide anion production in microbes, and is banned in a number of countries due to its acute toxicity in humans and use in suicide cases.

[0157] Using an improved FsRT-Cas1-Cas2 expression construct (FIG. 15a, b) the inventors exposed E. coli cultures to increasing concentrations of paraquat and retrieved the transcriptional memories by Record-seq. Quantification of cumulative gene expression in the different treatment conditions showed that samples were readily classified into appropriate exposure groups using both unsupervised hierarchical clustering and PCA (FIG. 5a, b). Moreover, the signature genes captured dose-responsive and canonical paraquat-exposure genes within E. coli (FIG. 5c). For example, within the signature genes the inventors found ahpC and ahpF, which encode the two subunits of an alkyl hydroperoxide reductase previously shown to facilitate scavenging of reactive oxygen species (ROS) caused by paraquat. Additionally, the inventors identified a set of genes of the cys-regulon involved in cysteine metabolism, namely cysC, cysJ and cysK, which were previously shown to facilitate paraquat resistance in E. coli.

[0158] The inventors next determined whether Record-seq was also capable of capturing transient paraquat exposure in a physiological range. After transiently stimulating cultures with paraquat (FIG. 5d), the inventors quantified cumulative gene expression and gene expression for Record-seq and RNA-seq data sets, respectively. Then, the inventors assessed whether the two methods were capable of capturing the transient paraquat exposure by PCA (FIG. 5e, f), and differentially expressed signature gene clustering (FIG. 15c, d). These analyses show that Record-seq, but not RNA-seq, was capable of capturing the transient paraquat exposure (FIG. 5e, f and FIG. 15c, d). Taken together, these results demonstrate that the memory of paraquat exposure was lost within the cellular transcriptome as assessed by RNA-seq, but preserved within the molecular memories stored within the DNA of the CRISPR arrays of the sentinel cells as investigated by Record-seq.

[0159] Sentinel Cells Recording the Gut Environment in Mice

[0160] Microbes have evolved to adapt and survive in diverse environments, including intestinal niches with diverse micronutrient availabilities. The gene expression patterns of these microbes reflect the extracellular environment they inhabit and could therefore provide key information on the nutrients that enable colonization as well as maintenance of commensal and pathogenic microbes. This could provide a clear entry point for devising and testing clinical interventions that attempt to address dysbiosis of gut microbiota, which has been causally linked to inflammatory bowel diseases (IBD) such as Crohn's disease and ulcerative colitis, as well as malnutrition, where supplementation with sugars and amino acids that are deficient in the diet has been demonstrated to be corrective in animal models and human infants. Unfortunately, microbial gene expression is transient and does not remain constant over time and throughout transit of microbes through the human intestine. Consequently, microbial gene expression patterns in intestinal niches are only accessible through highly invasive sample collection. The Record-seq technology presented by the inventors can address these limitations by creating sentinel cells that constantly record their environment as they transit through the mammalian intestine. It therefore has enormous potential to monitor human gut health and perturbations in the gut microbiome in a non-invasive manner, through collection of these sentinel cells from fecal sources, forming the basis for personalized medicine. Further, in combination with metagenomic data, Record-seq data from multiple sentinel microbes could help monitor changes in microbe-microbe and host-microbe interactions in the context of alterations in the gut.

[0161] The inventors investigated the potential of various strains of E. coli cells overexpressing FsRT-Cas1-Cas2 to function as transcriptional recorders (i.e. sentinel cells) when transiting through the murine gut. To this end the inventors monocolonized gnotobiotic C57BL/6 mice with BL21(DE3) or MG1655 E. coli cells encoding an anhydrotetracycline inducible FsRT-Cas1-Cas2 expression cassette through oral gavage. Expression of FsRT-Cas1-Cas2 was induced non-invasively via the administration of anhydrotetracycline through the drinking water of the animals along with kanamycin to ensure maintenance of the recording plasmid. Subsequently, these E. coli cells were longitudinally sampled from the feces of the mice as well as from different intestinal compartments at the endpoint of the experiment. Following plasmid DNA extraction, SENECA and deep-sequencing, the inventors could isolate newly acquired spacers (FIG. 16).

[0162] Throughout their experiments, the inventors demonstrated that recording of new spacers increased when raising the concentration of aTc in the drinking water and thus inducing stronger FsRT-Cas1-Cas2 expression (FIG. 17 and FIG. 18). Furthermore, spacers were recorded throughout the gastrointestinal tract, as evidenced by spacers accumulating from small intestine to cecum and colon of the mice (FIG. 19). Finally, the inventors demonstrated that the number of spacers obtained from fecal samples increased over time, indicating that bacteria robustly colonized the gut and continuously acquired new spacers throughout the experiment (FIG. 20).

[0163] The inventors then assessed the potential of Record-seq to detect different microenvironments and disease conditions in the murine gut. In one example, the inventors induced colitis by administering 1%, 2% or 3% (w/v) dextran sulfate sodium (DSS) in the drinking water of the animals. The corresponding data can be used to classify the three treatment conditions using principal component analysis (PCA) merely by performing Record-seq on cells isolated from feces of the treated animals (FIG. 21).

[0164] Similarly, in another experiment, the inventors were able to accurately distinguish whether animals were fed with a starch or a chow-based diet (FIG. 22). Together, these experiments indicate that Record-seq based sentinel cells can stratify treatment conditions as well as reveal distinct signatures of the luminal environment and thus could serve as a diagnostic device.

[0165] This was further bolstered by performing differential expression analysis on the respective Record-seq datasets to pinpoint the exact genes that were differentially expressed in response to different treatment conditions (FIG. 23 and FIG. 24). In the colitis experiment the inventors observed signatures of nitrite reduction, likely a consequence of host inflammatory NOS upregulation. Also, in the differential diet experiment the inventors observed that sugar acid catabolism genes were induced in mice fed a starch diet, whereas the Entner-Doudoroff pathway and methylglyoxal shunt genes were induced on a chow diet, likely due to the availability of plant cell wall glycosides.

[0166] In additional experiments using E. coli MG1655 cells, the inventors confirmed that Record-seq could also readily distinguish three different diets, in this case based on chow, starch and fat (FIG. 25).

[0167] Discussion

[0168] Here, the inventors describe Record-seq, a technology to encode transcriptome-scale events into DNA and assess the cumulative gene expression of populations of cells. The inventors demonstrate its potential by recording specific and complex transcriptional information. First, to improve upon existing spacer readout methods the inventors developed SENECA, resulting in a several thousand-fold improvement of spacer detection efficiency compared to recent reports, thereby enabling in-depth characterization of FsRT-Cas1-Cas2 and its application as a molecular recorder. The inventors' results suggest that RNA-derived spacers are preferentially acquired from the ends of abundant transcripts from AT-rich regions with no PAM, and are broadly sampled at transcriptome-scale, enabling the parallelized quantification of cumulative transcript expression.

[0169] In a set of experiments, the inventors show that upon increasing induction of arbitrary sequences, spacers are acquired in an orthogonal, dose-dependent manner and highly correlate with the absolute mRNA copy number in the cell, thus demonstrating that the molecular record faithfully recapitulates the initial stimulus in a predictable way. This also paves the way for increasingly multiplexed and orthogonal molecular recording devices. Upon inducing complex cellular behaviors, Record-seq provides a meaningful transcriptome-scale record of molecular events, which exceeds the capabilities of current molecular recording technologies that only record specific stimuli. Finally, the inventors use Record-seq to elucidate dose-dependent features of the complex cellular response to the bacteriostatic herbicide paraquat, and demonstrate that Record-seq, but not RNA-seq, is capable of recording transient paraquat stimulation.

[0170] Although additional work will greatly improve the capacity of Record-seq to encode richer and more dynamic expression and lineage information within fewer cells, the inventors' proof-of-principle experiments introduce a powerful tool to record transcriptome-scale events permanently in DNA for later reconstructing complex molecular histories from populations of cells. The inventors show that the recorded transcriptional histories reflect the underlying gene expression changes and could therefore be used to interrogate biological or disease processes. In the long term, the inventors envision that CRISPR spacer acquisition components could be introduced into other cell types to record the molecular sequence of events, and lineage path, that gives rise to particular cell behaviors, cell states and types.

[0171] Methods

[0172] Ortholog Discovery Pipeline

[0173] The protein sequence of Arthrospira platensis RT-Cas1 (WP_006620498) was used as a seed sequence, and a JACKHMMER search was run against all NCBI Non-redundant protein sequences using HMMER v3.1b2 (E-value cutoff of 1E-05). Proteins with both Cas1 and RT domains were subsequently identified using HMMSCAN (E-value cutoff of 1E-05). Genome sequence information for the candidate proteins was retrieved and further inspected for the presence of RT-Cas1, Cas2, and a CRISPR array using CRISPRdetect v2.0, CRISPRone, and HMMSCAN. From 121 candidate proteins, 14 CRISPR loci were selected and subsequently aligned using MUSCLE v3.8.31 to identify candidate domains and catalytic residues. Genetic distances were computed using the Jukes-Cantor method and a phylogenetic tree was built using the Nearest-Neighbour method.
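For illustration only, the Jukes-Cantor distance between two aligned nucleotide sequences can be computed from the proportion p of differing sites as d = -(3/4) ln(1 - (4/3) p). The sketch below assumes nucleotide sequences of equal aligned length, skips columns containing a gap character "-", and uses hypothetical toy sequences; it is not the software used in the pipeline described above.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # distance is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical toy alignment: one mismatch in four sites (p = 0.25).
d = jukes_cantor_distance("ACGT", "ACGA")
```

The correction matters increasingly as p grows: for small p the distance is close to p itself, whereas it diverges as p approaches 0.75, the value expected for unrelated sequences under this model.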

[0174] Bacterial Strains and Culture Conditions

[0175] Escherichia coli strains used in this study were Stbl3 (Thermo Fisher Scientific) for cloning purposes as well as BL21(DE3) Gold (Agilent Technologies), BL21AI (Invitrogen) and NovaBlue(DE3) (EMD Millipore) as a K12 strain for acquisition assays. All strains were made competent using the Mix & Go E. coli Transformation Kit & Buffer Set (Zymo Research) following the manufacturer's protocol with growth in ZymoBroth at 19° C. directly from fresh colonies. After transformation, cells were grown at 37° C. on lysogeny broth (LB) (Difco) 1.5% agar plates containing 50 μg/mL kanamycin and 1% glucose (w/v) to reduce background expression from the T7lac system. Liquid cultures for plasmid isolation were grown in TB media (24 g/L yeast extract, 20 g/L tryptone, 4 mL/L glycerol, 17 mM KH.sub.2PO.sub.4, 72 mM K.sub.2HPO.sub.4) containing 1% glucose (w/v).

[0176] Generation of Golden Gate Compatible pET30 Overexpression Vector

[0177] All standard PCRs for cloning were performed using Phusion Flash High-Fidelity PCR Master Mix (Thermo Scientific) or KAPA HiFi HotStart ReadyMix (Roche). Oligonucleotides and gBlocks were ordered from Integrated DNA Technologies. Primers are listed in Table 6. pET30b(+) (kind gift from Markus Jeschek) was PCR amplified as five fragments using primers FS_151/FS_152, FS_153/FS_154, FS_155/FS_156, FS_157/FS_158 and FS_159/FS_160, respectively, in order to remove the five undesired BbsI restriction sites present in the backbone. The resulting PCR fragments were assembled using 2×HiFi DNA Assembly Mastermix (NEB), yielding pFS_0012. Subsequently, oligos FS_380 and FS_381 were annealed to generate a double stranded DNA (dsDNA) fragment encoding the T7 terminator and cloned into pFS_0012 using XhoI/CsiI, yielding pFS_0013, a pET30 derived overexpression vector harboring two Golden Gate cloning sites and thus facilitating parallel cloning of RT-Cas1, Cas2 as well as a corresponding CRISPR array. Nucleotide sequences of all RT-Cas1 and Cas2 orthologs tested in this study along with their corresponding CRISPR arrays are listed under Sequences.

[0178] Golden Gate Assembly of RT-Cas1-Cas2 Overexpression Vectors for Ortholog Screen

[0179] RT-Cas1, Cas2 and CRISPR array sequences were ordered from Twist Biosciences and Genscript. Putative CRISPR arrays were ordered as sequences consisting of the leader sequence followed by DR-nativespacer1-DR-nativespacer2-DR. Furthermore, each fragment was flanked by BbsI restriction sites generating overhangs facilitating Golden Gate Assembly into pFS_0013. Briefly, 40 fmol per fragment (RT-Cas1, Cas2, corresponding CRISPR array, pFS_0013 acceptor vector), 1 μL ATP/DTT mix (10 mM each), 0.25 μL T7 DNA Ligase (Enzymatics), 0.75 μL BpiI (Thermo Scientific) and 1 μL buffer green, made up to 10 μL with PCR grade H.sub.2O, were subjected to 99 cycles of 37° C. for 3 min, 16° C. for 5 min, followed by 80° C. for 10 min. Subsequently, 5 μL of this mixture were transformed into 50 μL Stbl3 cells and recovered in SOC media for 30 min at 37° C., 1000 rpm before spreading on plates.

[0180] Spacer Acquisition

[0181] Acquisition assays were performed at 37° C., 300 rpm in bacterial culture tubes containing 3 mL of TB media supplemented with 100 μM isopropyl-β-D-thiogalactopyranoside (IPTG) (Sigma Aldrich) for BL21(DE3) Gold and NovaBlue(DE3). For E. coli BL21AI, L-(+)-arabinose (Sigma Aldrich) was additionally added to 0.2% (w/v). Each culture was inoculated with 2 colonies of bacteria stored no longer than 14 days at 4° C. upon transformation and overnight growth at 37° C. When cultures reached saturation (typically 12-14 h post inoculation), 2 mL of bacterial culture were harvested and plasmids containing CRISPR arrays were isolated by standard plasmid Mini-Prep procedures to serve as a template for preparation of deep sequencing libraries.

[0182] Amplification of CRISPR Arrays for Classical Acquisition Readout by Deep Sequencing

[0183] Leader proximal spacers were PCR amplified from 3 ng of plasmid DNA per μL of PCR reaction using NEBNext High-Fidelity 2×PCR Master Mix (NEB) with a forward primer binding in the leader sequence of the respective CRISPR array and a reverse primer binding in the first native spacer (Primer Design Note 1 and Table 2 for primer design and binding sites of individual CRISPR arrays, respectively). For each biological replicate, 12 individual PCR reactions of 10 μL were performed with an extension time of 15 sec for 16 cycles. The individual 10-μL reactions belonging to the same biological sample were then pooled, and residual primers removed using homemade AMPure beads at a PCR to bead ratio of 1:1.5 (v/v) eluting the PCR product in 60 μL of buffer TE. Subsequently, 500 ng of first round PCR product per biological sample was run on a 3% LAB agarose gel (300V, 55 min, cooling the gel-chamber in an ice-water bath during the run) and purified by blind excision of gel slices at 211 to 300 bp, avoiding the prominent DNA band corresponding to PCR products of the unexpanded array (i.e. no acquisition of novel spacers). Amplicons were then purified from the gel slices using the QIAquick Gel Extraction Kit (QIAGEN) and eluted into 22 μL of buffer EB. Illumina sequencing adaptors and indices were appended in a second round of PCR, using 6 μL of gel purified input DNA as a template in a 20 μL PCR reaction with universal second round deep sequencing primers attaching P5 and P7 handles for binding of PCR products to the flow cell in deep sequencing as well as barcoding the samples with (N).sub.8 barcodes corresponding to Illumina TruSeq HT indices (Primer Design Note 2 and Table 3 for primer design and indices, respectively). After this second round of PCR, products were purified using the QIAquick PCR Purification Kit (QIAGEN) and eluted in 22 μL buffer EB. 
Samples were then pooled and subjected to another round of gel purification using the same parameters as described above, this time excising products in the range of 280 to 350 bp.

[0184] Selective amplification of ExpaNdEd Crispr Arrays (SENECA)

[0185] FsCRISPRArray2 was amplified from pFS_160 using FS_871/FS_904, generating a minimal Fs CRISPR Array consisting of the leader sequence and a single DR followed by a FaqI restriction site (CTTCAG) on the bottom strand, resulting in plasmid pFS_0235 as the standard recording plasmid. This plasmid was transformed into chemocompetent BL21(DE3) Gold bacteria or NovaBlue(DE3) (EMD Millipore) and subjected to spacer acquisition as described above. Following plasmid extraction and quantification using Quant-IT PicoGreen dsDNA Assay Kit (Thermo Scientific) read out with a Tecan M1000 Pro Microplate reader, plasmid DNA was subjected to SENECA-adapter ligation in a Golden Gate reaction. Oligonucleotides FS_0963/FS_0964 were annealed (2.5 μL each of 100 μM oligo, 5 μL NEBuffer 2 (NEB), 40 μL PCR grade H.sub.2O) by heating to 95° C. for 5 min and cooling to 20° C. at 0.12° C./sec. Annealed oligos were diluted 1:100 in TE buffer. Next, 40 fmols of plasmid DNA (180.3 ng for pFS_0235), 0.25 μL T7 Ligase (Enzymatics), 1 μL FastDigest FaqI, 0.5 μL of 20×SAM, 1 mM ATP, 1 mM DTT (all Thermo Scientific) and 1 μL of annealed, diluted oligonucleotides FS_0963/FS_0964 in 10 μL total volume were subjected to 99 cycles of 3 min 37° C., 3 min 20° C. followed by 15 min at 55° C. First round deep sequencing PCR was performed using NEBNext High-Fidelity 2×PCR Master Mix (NEB) (forward primers: FS_0968 to FS_0974, reverse primer: FS_0911). For each biosample one 30 μL reaction containing 10.38 μL of adapter ligated plasmid DNA were performed (98° C. for 30 s; 22 cycles at 98° C. for 10 s, 57° C. for 30 s and 72° C. for 20 s followed by 72° C. for 5 min), pooled and purified by magnetic beads (GE Healthcare) at a PCR to bead ratio of 1:1.6 (v/v) recovering the PCR product in 25 μL TE buffer (Primer Design Note 3 for details on primer design). Illumina sequencing adaptors and indices were appended in a second round of PCR (98° C. for 30 s, 8 cycles of 98° C. for 10 s, 65° C. for 30 s and 72° C. for 30 s, and 72° C. for 5 min) using 5 μL of first round PCR product as input in a 20 μL reaction (Primer Design Note 2 and Table 3 for primer design and indices, respectively). Samples were pooled, desalted using the QIAquick PCR Purification Kit (QIAGEN) and size selected on E-Gel EX Agarose Gels, 2% (Thermo Scientific), loading 200-500 ng of DNA per lane, extracted using the QIAquick Gel Extraction Kit and subjected to deep sequencing on Illumina MiSeq or NextSeq500 platforms using the MiSeq Reagent Kit v3 (150-cycle) or NextSeq 500/550 Mid/High Output v2 kit (150 cycles) (both Illumina), respectively. Libraries were loaded at a concentration of 1.4 to 1.6 μM as determined by qPCR using the KAPA Library Quantification Kit for Illumina® Platforms (Roche). PhiX was included at 5-10%.
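
The amount of annealed adapter entering each ligation can be worked out from the volumes stated above. The sketch below assumes that the two oligonucleotides anneal quantitatively into duplex and that the annealing volume is the sum of the listed components (2.5 + 2.5 + 5 + 40 μL); the function and parameter names are hypothetical.

```python
def annealed_adapter_fmol(
    oligo_stock_uM: float = 100.0,
    oligo_volume_uL: float = 2.5,
    buffer_volume_uL: float = 5.0,
    water_volume_uL: float = 40.0,
    dilution_factor: float = 100.0,
    volume_used_uL: float = 1.0,
) -> float:
    """Femtomoles of adapter duplex added to one ligation reaction.

    Assumption: both oligos anneal completely, so the duplex concentration
    equals the concentration of each single oligo in the annealing mix.
    """
    # Two oligos are added, each at oligo_volume_uL.
    annealing_volume_uL = 2 * oligo_volume_uL + buffer_volume_uL + water_volume_uL
    duplex_uM = oligo_stock_uM * oligo_volume_uL / annealing_volume_uL
    diluted_uM = duplex_uM / dilution_factor
    picomoles = diluted_uM * volume_used_uL  # uM x uL = pmol
    return picomoles * 1000.0  # pmol -> fmol
```

Under these assumptions, 1 μL of the 1:100 dilution carries roughly 50 fmol of adapter duplex, which is on the same order as the 40 fmol of plasmid in the reaction.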

[0186] SENECA Based Ortholog Screen

[0187] For the SENECA based CRISPR array directionality screen, putative CRISPR arrays were extracted from genomic sequences, assuming a standard leader length of 150 nt followed by a single DR. The FaqI restriction site required for SENECA was appended downstream of the DR and sequences were flanked by universal adapters for amplification and cloning. The final array sequences including these features are depicted under Sequences 2 and were ordered from Twist Biosciences as linear DNA fragments. These were PCR amplified using primers FS_1406/FS_1407 and cloned into CsiI/NotI-digested plasmids containing their respective RT-Cas1-Cas2 ortholog using HiFi DNA Assembly (NEB). Upon transformation into E. coli BL21(DE3), these constructs were subjected to the standard spacer acquisition assay in TB media. Plasmid DNA was extracted and subjected to SENECA adapter ligation.

[0188] The respective oligos to be annealed for each CRISPR array tested in this experiment are listed in Table 4. Following adapter ligation, a single 140 μL 1st round PCR reaction was prepared for each ortholog using NEBNext High-Fidelity 2×PCR Master Mix and containing the entire 20 μL SENECA adapter ligation as a template. First round PCR primers specific to the respective DR of each CRISPR array tested are listed in Table 5. The 140 μL PCR reaction was split into 12 reactions of 11 μL along the row of a 96-well plate. This plate was subjected to a gradient PCR (53 to 68° C. in an Eppendorf Mastercycler Gradient). This procedure was chosen because SENECA leverages the fact that, at a unique annealing temperature, a DR matching primer will only bind to the full DR resulting from an acquisition event but not to the truncated parental DR. By splitting the PCR reaction and subjecting it to a temperature gradient, it is ensured that, without prior knowledge, at least one of the 12 reactions is subjected to the annealing temperature at which selective amplification of expanded CRISPR arrays occurs. PCR was performed for 30 cycles, after which the 12 reactions performed along the temperature gradient were pooled again, purified using 1.85×Ampure beads and eluted in 25 μL TE buffer. Five μL of this elution were used as a template for a standard 20 μL second round PCR at 65° C. annealing temperature for 12 cycles as described above. Subsequently, PCR products were purified using 2.2×Ampure beads, eluted into 22 μL TE buffer, size selected as described in the standard SENECA protocol (E-Gel Ex 2%, followed by gel extraction) and subjected to deep sequencing.

[0189] Deep Sequencing

[0190] Small scale targeted deep sequencing of CRISPR arrays for the ortholog screen was performed using the Illumina MiSeq v3 300 cycle kit on an Illumina MiSeq platform or the Illumina HiSeq High Output PE 200 cycle kit on an Illumina HiSeq 2500. Spacer libraries prepared using SENECA were sequenced using the NextSeq 500/550 High Output Kit v2 (150 cycles) on an Illumina NextSeq platform or the MiSeq Reagent Kit v3 (150-cycle) on a MiSeq.

[0191] Data Analysis Pipeline

[0192] FASTQ files were quality filtered and trimmed using trimmomatic (trimmomatic SE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:75) and subsequently converted to FASTA files using FASTX-Toolkit v0.0.14 (fastq-to-fasta) (http://hannonlab.cshl.edu/fastx_toolkit/). Using custom scripts written in Python 2.7, spacers were identified based on the identification of a 20-66 nucleotide sequence between two 10-nt DR segments, allowing for 2 and 3 mismatches in the first and second DR segment, respectively. Arrays with multiple spacers were identified based on the presence of a complete DR sequence, allowing for 3 mismatches. Only unique spacers (>1 mismatch) from a given sample were further processed. Spacers were aligned to a merged reference genome containing plasmid and E. coli sequences [E. coli BL21(DE3) Gold (NC_012947.1) genome, E. coli K12 (NC_000913.3)] using bowtie2 (bowtie2 --very-sensitive-local). In MS2 challenge experiments, the MS2 sequence [MS2 (NC_001417.2)] was also included in the merged reference genome. Identical alignments were collapsed using samtools v1.3, and alignments were visualized in Geneious v10.2.3. Basic statistics about numbers of reads or alignment features were calculated using standard bash commands, and compiled and visualized using Prism 7.0d. Gene body percentiles were calculated using RSeQC (geneBody_coverage.py v2.6.4). Nucleotide probabilities were determined and visualized using the weblogo webtool v2.8.2. Simulated spacer datasets were prepared using BEDtools v2.25 (bedtools random -n 500 -l 38). Transcript quantification for RNA-seq and Record-seq was performed using featureCounts v1.5.0. Using custom scripts written in Matlab v9.1.0, RNA-seq and Record-seq transcript counts were normalized using transcripts per million (TPM) and used to compute cumulative spacer sums, a linear regression fit, coefficient of determination (R.sup.2), and Pearson linear correlation coefficient.
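
The spacer identification rule described above can be sketched as follows. This is not the inventors' script: it is written for Python 3 rather than Python 2.7, and it assumes that the two 10-nt DR segments are the last 10 nt of the upstream repeat and the first 10 nt of the downstream repeat. The function names and any DR fragments passed to them are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_spacers(read, dr_end, dr_start, min_len=20, max_len=66,
                 max_mm_first=2, max_mm_second=3):
    """Return candidate spacers found in a single read.

    A spacer is a stretch of min_len..max_len nt flanked by an approximate
    match to dr_end (upstream) and an approximate match to dr_start
    (downstream), with the stated mismatch tolerances.
    """
    k1, k2 = len(dr_end), len(dr_start)
    spacers = []
    for i in range(len(read) - k1 + 1):
        # First DR segment: tolerate up to max_mm_first mismatches.
        if hamming(read[i:i + k1], dr_end) > max_mm_first:
            continue
        start = i + k1
        # Try each allowed spacer length, shortest first.
        for length in range(min_len, max_len + 1):
            j = start + length
            if j + k2 > len(read):
                break  # second DR segment would run off the read
            if hamming(read[j:j + k2], dr_start) <= max_mm_second:
                spacers.append(read[start:j])
                break
    return spacers
```

A read consisting of an upstream DR fragment, a 30-nt insert and a downstream DR fragment yields that insert as its first hit, while a read without any DR-like segment yields an empty list. Deduplication of spacers within a sample would follow as a separate step.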

[0193] Record-seq datasets corresponding to oxidative or acid stress treatments were analyzed using custom scripts written in R v3.4.4. Briefly, transcripts with fewer than 5 counts across replicates were discarded. Heatmaps representing unsupervised hierarchical clustering of Pearson linear correlation with complete linkage (using raw transcript counts as inputs) were prepared using the ‘heatmap.2’, ‘hclust’, and ‘cor’ commands with default settings. Principal component analysis (PCA) was performed on log2-transformed data (raw counts plus one pseudocount to tolerate zeros) for the 50 most variable (standard deviation) genes using the ‘prcomp’ command with default settings. Differential expression analyses (using raw counts plus one pseudocount as input) were performed using DESeq2 v1.14.1, edgeR v3.16.5, and baySeq v2.8.0 encapsulations within R. Heatmaps representing unsupervised hierarchical clustering of signature differentially expressed genes were prepared using the ‘pheatmap’ command with default settings.
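
The preprocessing that precedes the PCA can be illustrated with a short sketch. It is written in Python rather than R and is not the inventors' code. It assumes that "fewer than 5 counts across replicates" refers to the summed count over all samples and that variability is ranked on the log2-transformed values; the function name is hypothetical.

```python
import math
import statistics


def select_variable_genes(counts, min_total=5, top_n=50):
    """Filter, log2-transform and rank genes by standard deviation.

    counts: dict mapping gene name -> list of raw counts (one per sample,
    at least two samples). Returns a dict of the top_n most variable genes
    with their log2(count + 1) values, most variable first.
    """
    # Discard genes whose summed count is below the threshold (assumption).
    kept = {g: v for g, v in counts.items() if sum(v) >= min_total}
    # One pseudocount is added so that zero counts can be log-transformed.
    logged = {g: [math.log2(c + 1) for c in v] for g, v in kept.items()}
    ranked = sorted(logged, key=lambda g: statistics.stdev(logged[g]),
                    reverse=True)
    return {g: logged[g] for g in ranked[:top_n]}
```

The resulting matrix (genes by samples) would then be passed to a PCA routine; in the described workflow this was ‘prcomp’ with default settings.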

[0194] Code Availability

[0195] The custom scripts used for the described data analysis are available on the Platt Lab website (platt.ethz.ch).

[0196] RNASeq of E. coli BL21(DE3)

[0197] RNA extraction from E. coli BL21(DE3) was performed after overnight growth under induction of FsRT-Cas1-Cas2 expression following the QIAGEN Supplementary Protocol: Purification of total RNA from bacteria using the RNeasy Mini Kit. To achieve the appropriate amount of input culture (corresponding to 5×10.sup.8 cells), serial dilutions of the overnight culture were prepared to achieve an OD.sub.600 between 0.2 and 0.6 measured with a NanoDrop OneC (Thermo Scientific). Bacteria were lysed using acid-washed glass beads (G1277-10G, Sigma Aldrich). The additional on-column DNase digestion was performed using the RNase-Free DNase Set (QIAGEN). DNA free RNA was submitted to the Genomics Facility Basel for ribosomal RNA (rRNA) depletion using the Ribo-Zero rRNA Removal Kit (Illumina), followed by library preparation and sequencing on an Illumina NextSeq platform using the NextSeq 500/550 High Output v2 kit (150 cycles).

[0198] td intron

[0199] The gBlock FS_gBlock_td_intron_acceptor (Sequences 3) was cloned into pFS_0235 using SphI/SgrAI yielding pFS_0238. This gBlock encoded the BBa_J23104 promoter, the ribosome binding site from bacteriophage T7 gene 10 as well as the td intron sequence including flanking regions facilitating efficient splicing. Furthermore, a BbsI-mediated Golden Gate cloning site was placed downstream and upstream of the td intron sequence, allowing for seamless assembly of upstream and downstream exon sequences in a single one-pot reaction as described above. As the inventors had previously noticed that the 5′ end of transcripts was preferentially acquired by the FsRT-Cas1-Cas2 complex, the inventors introduced the td intron within the first 23 to 31 nucleotides of the respective transcripts. The inventors created intron-interrupted sequences of three E. coli genes cspA, rpoS, argR (cold shock protein CspA, RNA polymerase sigma factor RpoS and Arginine repressor, respectively). These were selected based on the fact that they were well sampled by the FsRT-Cas1-Cas2 complex in preceding SENECA experiments. The flanking exon sequences were mutated in four to six positions to yield optimized sequences for td intron splicing, which also aided in unambiguously distinguishing the spliced and endogenous transcripts or DNA.

[0200] Accordingly, the inventors ordered complementary oligonucleotides for the fragment of the transcript to be cloned 5′ of the td intron and annealed them prior to Golden Gate Assembly, while the fragment to be cloned 3′ of the intron was amplified by PCR from genomic DNA. Oligonucleotides were FS_1054/1055 (5′ of the intron, annealed) and FS_1056/1057 (3′ of the intron, PCR) for CspA; FS_1038/1039 and FS_1040/1041 for RpoS; FS_1046/1047 and FS_1048/1049 for ArgR. The inventors ensured that mutating sequences of the respective genes to those of the td intron flanking sites did not generate a stop codon. The td intron containing FsRT-Cas1-Cas2 overexpression constructs were subjected to a standard acquisition assay followed by plasmid DNA extraction, SENECA and deep sequencing. Presence of td intron splice sites in DNA outside of the FsCRISPR array was tested by extracting gDNA from td-ArgR transformed cultures using the GenElute Bacterial Genomic DNA Kit (Sigma Aldrich). Libraries containing the td intron insertion site were amplified using a two-round PCR strategy analogous to the ones described above using forward primers FS_1154 to FS_1157 and reverse primers FS_1158 to FS_1161 (Table 6). First-round PCR was performed at 57° C. annealing temperature and 20 sec elongation for 15 cycles. Second-round PCR was performed at 63° C. annealing temperature and 20 sec elongation for 8 cycles.

[0201] Infection with MS2 Phage

[0202] For infections with MS2 phage, the recording plasmid pFS_0235 was transformed into F′-carrying, and thus MS2-susceptible, NovaBlue(DE3) Competent Cells (EMD Millipore). Next morning, 15 mL of TB containing 100 μM of IPTG were inoculated with 10 colonies and grown at 37° C., 150 rpm in an orbital shaker until an OD.sub.600 of 0.24. Then, MgSO.sub.4 was added to 5 mM final concentration. Aliquots of 3 mL were split into bacterial culture tubes, infected with 200 μL of high-titre MS2 phage suspension and incubated for 1 h at room temperature without shaking to allow infection by MS2. Next, culture tubes were transferred to the orbital shaker and incubated overnight at 30° C., 80 rpm. Growth of E. coli in presence of MS2 phage at 30° C. rather than 37° C. prevents lysis of cells by productive MS2. Next morning, shaking was increased to 150 rpm. Another day later (˜41 h post-infection), cultures were pelleted by centrifugation, plasmid DNA was extracted and subjected to SENECA followed by deep sequencing.

[0203] Synthetic Recording of sfGFP and Rluc Transcripts

[0204] The Pcat-tetR-term_PtetO encoding fragment was amplified with primers FS_1123/FS_1125 from pLP167 (kind gift from Luzi Pestalozzi), digested with BamHI/AgeI and cloned into AgeI/BbsI-digested pFS_0238 (see cloning of td intron constructs), yielding pFS_0270 which contains a BbsI-mediated Golden Gate cloning site immediately downstream of the P.sub.tetA promoter. Subsequently, sfGFP was amplified from pLP167 with primers FS_1134/FS_1135 and Rluc was amplified using FS_1136/FS_1137 from BBa_J52008 (registry of standard biological parts). Both fragments were cloned into pFS_0270 using BbsI-mediated Golden Gate Assembly, yielding pFS_0271 (sfGFP) and pFS_0272 (Rluc), respectively. LuxR promoter parts were amplified with primers FS_1584/FS_1585 from pIG0046 and FS_1586/FS_1587 from pIG0059 (registry of standard biological parts) and cloned into AgeI-digested pFS_0270 using NEBuilder HiFi DNA Assembly Master Mix (NEB), resulting in pFS_0399. Oligos FS_1588/FS_1589 were annealed and cloned into pFS_0399 digested with SalI/BamHI, yielding pFS_0400. The Fluc coding sequence was amplified from BBa_I712019 (registry of standard biological parts) using FS_1618/FS_1619, digested with BsaI and cloned into BbsI-digested pFS_0400, resulting in pFS_0412 that was used in RNA recording experiments. For each biological replicate, 50 mL of IPTG containing TB media were inoculated with 22 colonies of E. coli BL21(DE3) transformed with pFS_0271 (sfGFP), pFS_0272 (Rluc) or pFS_0412 (Fluc). When reaching an OD.sub.600 of 0.25, cells were split into 3 mL aliquots in bacterial culture tubes and induced with aTc in case of P.sub.tetA promoter or N-(3-Oxododecanoyl)-L-homoserine lactone (3O06-HSL) (Sigma) in case of P.sub.LuxR promoter, and cultured in an orbital shaker for 12-14 hours at 300 rpm, followed by plasmid DNA extraction, SENECA and deep sequencing. Spacers aligning to sfGFP, Rluc and Fluc were quantified as described above (see “Data analysis pipeline”). 
The detected number of unique spacers per million sequencing reads was normalized by defining the sum of spacers per biological replicate as 100% and plotted using GraphPad Prism v7.0d. For RNA-recording with pFS_0271 and pFS_0272, RNA extraction from the same cultures was performed using the RNAsnap method followed by treatment with the TURBO DNA-free Kit (Thermo Scientific) using 1.5 μL of TURBO DNase to minimize DNA-background. Reverse transcription was performed using qScript cDNA SuperMix (Quanta Bio) with 500 ng of RNA sample as a template. cDNA was diluted 1:4 and quantification was performed in 2 technical replicates by real-time PCR (qRT-PCR) using TaqMan Fast Advanced Master Mix (Life Technologies) in a Roche LightCycler 96 System. Primer and probe sequences are listed in Table 7. Absolute copy number was calculated using the standard curve method and 16S rRNA was used as a housekeeper. The mRNA copy number corresponding to the number of cells in a single SENECA reaction (6×10.sup.9) was calculated based on the average amount of 18700 16S rRNA transcripts per single E. coli cell (BNID 102992).
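
The spacer normalization described above can be sketched in two steps. The sketch assumes that the 100% reference is the sum of the per-million values over all conditions within one biological replicate; that reading, and the function names, are hypothetical illustrations rather than the inventors' implementation.

```python
def spacers_per_million(unique_spacers: int, total_reads: int) -> float:
    """Unique spacers aligning to a transcript per million sequencing reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return unique_spacers * 1e6 / total_reads


def normalize_replicate(spm_by_condition: dict) -> dict:
    """Express each condition as a percentage of the replicate's summed signal."""
    total = sum(spm_by_condition.values())
    if total == 0:
        return {k: 0.0 for k in spm_by_condition}
    return {k: 100.0 * v / total for k, v in spm_by_condition.items()}
```

For example, per-million values of 25 and 75 for two inducer concentrations within one replicate would be reported as 25% and 75%.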

[0205] Orthogonal Synthetic Recording

[0206] The Rluc coding sequence was amplified using FS_1620/FS_1137 from pFS_0272 and cloned into pFS_0399 using BbsI-mediated Golden Gate Assembly, yielding pFS_0413. The Fluc coding sequence was amplified from BBa_I712019 (registry of standard biological parts) using FS_1621/FS_1619, digested with BbsI and cloned into BsaI-digested pFS_0413, resulting in pFS_0414 which was subsequently used in orthogonal synthetic recording experiments.

[0207] For each biological replicate, 50 mL of TB media containing 100 μM IPTG were inoculated with 33 colonies of E. coli BL21(DE3) transformed with pFS_0414, containing (3-Oxododecanoyl)-L-homoserine lactone (3O06-HSL)-inducible Fluc and aTc-inducible Rluc coding sequences. When reaching an OD.sub.600 of 0.25, cells were split into 3 mL aliquots in bacterial culture tubes and induced with 75 ng/mL of anhydrotetracyclinehydrochloride (aTc) (Cayman Chemical) or 10 μM of 3O06-HSL (Sigma) or a combination of both and cultured in an orbital shaker for 12 hours at 300 rpm, followed by plasmid DNA extraction, SENECA, deep sequencing as well as parallelized RNA extraction from the same culture followed by reverse transcription and qPCR measurements. Data was analyzed as described above for recording of single synthetic transcripts.

[0208] Transcriptional Response to Oxidative Stress

[0209] Per biological replicate, 36 mL of TB media containing 100 μM IPTG were inoculated with 24 colonies of E. coli BL21(DE3) transformed with pFS_0235 the evening before (resulting in 1 colony/1.5 mL) and shaken in a 250 mL baffled shaker flask until reaching an OD.sub.600 of 0.24 to 0.25. Then cultures were split into 3 mL aliquots into bacterial culture tubes (Greiner) and treated with H.sub.2O.sub.2 (30% w/w solution, Sigma Aldrich) to a final concentration of 1 mM or an equal volume of ddH.sub.2O. Growth was continued for 12 hours at 300 rpm followed by harvesting of 2 mL of culture for plasmid DNA extraction, SENECA and deep sequencing. Data were analyzed as described above (see “Data analysis pipeline”).

[0210] Transcriptional Response to Acid Stress

[0211] For pH-controlled growth, potassium-modified lysogenic broth (LB) (10 g/L tryptone, 5 g/L yeast extract, 7.45 g/L KCl) was buffered with 100 mM HOMOPIPES (Homopiperazine-1,4-bis(2-ethanesulfonic acid)). Subsequently, the pH of the medium was adjusted to either 5.0 (acid stress) or 7.0 (neutral) using KOH solution as described previously. For each biological replicate, 50 mL of pH adjusted, IPTG containing LB media were inoculated with 33 colonies of E. coli BL21(DE3) transformed with pFS_0235 (resulting in 1 colony/1.5 mL). Samples were harvested at an OD.sub.600 between 0.3 and 0.6 for plasmid DNA extraction, SENECA and deep sequencing. Data were analyzed as described above (see “Data analysis pipeline”).

[0212] Cloning of aTc-Inducible FsRT-Cas1-Cas2 Expression Construct

[0213] For recording the transcriptional response to paraquat, an aTc-inducible FsRT-Cas1-Cas2 expression construct was generated. To this end, a fragment containing the tet repressor driven by a constitutive promoter as well as the P.sub.tetA promoter was amplified from pFS_0271 using FS_1574/1575 and digested with BglI/SphI. Furthermore, the N-terminus of FsRT-Cas1-Cas2 was amplified with FS_1576/1577 and digested with SphI/BglII. These two fragments were cloned into BglI/BglII-digested pFS_0235 yielding pFS_0393. The codon optimized FsRT-Cas1-Cas2 sequence was obtained from Genscript, amplified using FS_1641/1642 and cloned into pFS_0393 using XhoI/SphI, replacing the initial FsRT-Cas1-Cas2 coding sequence and yielding pFS_0453 (SEQ ID NO 334).

[0214] Transcriptional Response to 1 mM or 10 mM Paraquat

[0215] Paraquat dichloride hydrate (PESTANAL, Sigma Aldrich) was dissolved at 1 M in ddH.sub.2O. For each biological replicate, 75 mL of TB media containing 30 ng/mL aTc were inoculated with 50 colonies of E. coli BL21(DE3) transformed with pFS_0393 and shaken in baffled shaker flasks until reaching an OD.sub.600 of 0.24 to 0.25. Then cultures were split into 3 mL aliquots into bacterial culture tubes and treated with either 1 mM or 10 mM paraquat and cultured for an additional 11-12 hours before harvesting of 2 mL of culture for plasmid DNA extraction, SENECA and deep sequencing. Data were analyzed as described above (see “Data analysis pipeline”).

[0216] Transcriptional Response to Transient Paraquat Exposure

[0217] For each biological replicate, two colonies of E. coli BL21(DE3) transformed with pFS_0453 were inoculated into 3 mL of TB media containing 30 ng/mL aTc in standard bacterial culture tubes. For the first 12 h all cultures were cultivated in the absence of paraquat (300 rpm, 37° C.). Then 2 mL of culture were aspirated, while the remaining 1 mL was spun down (2300×g, 10 min); the supernatant was aspirated and the bacterial pellet resuspended in 3 mL of fresh TB media containing 30 ng/mL of aTc. For both the transient and the permanent stimulus conditions, paraquat was added to 10 mM final concentration and the cultures were grown for an additional 12 h as above. Then 2 mL of culture were removed, the remaining 1 mL was pelleted as above and resuspended in 3 mL of fresh TB media containing 30 ng/mL of aTc. Paraquat was added to 10 mM in the permanent stimulus condition only, and cultures were grown for an additional 12 h as above. Then 2 mL of culture were harvested for plasmid DNA extraction, SENECA and deep sequencing. Additionally, 100 μL of culture were harvested for RNA-extraction by the RNASnap protocol as described above followed by treatment with the TURBO DNA-free Kit (Thermo Scientific) using 1.5 μL of TURBO DNase. Ribosomal RNA was depleted using Ribo-Zero rRNA Removal Kit (Illumina) followed by library prep using TruSeq Stranded mRNA (Illumina) and deep sequencing using a NextSeq 500/550 High Output v2 kit (75 cycles), sequencing each library at a depth of 4 million reads or greater.

[0218] Bacterial Population Inputs for Record-Seq Experiments and Achieved Recording Efficiencies

[0219] Record-seq experiments were performed in standard 12 mL culture tubes filled with 3 mL of terrific broth (TB) media, of which 2 mL were used for subsequent plasmid DNA extraction. In early experiments the inventors determined that using 40 fmols (180 ng of plasmid DNA) as an input to SENECA gave consistent results and left enough plasmid for archiving samples and performing several additional SENECA reactions on the same sample if necessary.

[0220] Accordingly, 40 fmols can be considered for contextualizing the number of cells used in a typical experiment. The construct depicted in FIG. 2a (pFS_0235) has a size of 7293 bp, and 40 fmol of plasmid DNA was used as an input for a SENECA reaction. Using the formula [mass of dsDNA (g)=moles of dsDNA (mol)×((length of dsDNA (bp)×617.96 g/mol)+36.04 g/mol)], this equals a mass of 180.3 ng of plasmid DNA. These 40 fmol of plasmid DNA equals a total number of 2.4×10.sup.10 plasmids (using Avogadro's number of 1 mole being equal to 6.022×10.sup.23 particles and multiplying this by 40×10.sup.−15 to account for the 40 fmol used). Assuming a copy number of ˜20 for the pET origin, this results in 1.2×10.sup.9 cells used as a standard input per SENECA reaction.
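
The arithmetic in the preceding paragraph can be reproduced with a few lines of code. The sketch below uses the dsDNA mass formula, plasmid size, Avogadro's number and assumed copy number exactly as stated in the text; only the function names are hypothetical.

```python
AVOGADRO = 6.022e23  # particles per mole, as used in the text


def dsdna_mass_ng(fmol: float, length_bp: int) -> float:
    """Mass in ng of a given molar amount of double-stranded DNA."""
    moles = fmol * 1e-15
    grams = moles * (length_bp * 617.96 + 36.04)
    return grams * 1e9


def plasmid_molecules(fmol: float) -> float:
    """Number of plasmid molecules in a given molar amount."""
    return fmol * 1e-15 * AVOGADRO


def cells_from_plasmids(molecules: float, copy_number: float = 20) -> float:
    """Estimated number of cells, assuming a fixed plasmid copy number."""
    return molecules / copy_number


if __name__ == "__main__":
    n = plasmid_molecules(40)
    print(round(dsdna_mass_ng(40, 7293), 1), "ng")   # about 180.3 ng
    print(f"{n:.2e} plasmids")                        # about 2.4e10
    print(f"{cells_from_plasmids(n):.2e} cells")      # about 1.2e9
```

The copy number of about 20 is the dominant assumption here; a different copy number scales the cell estimate proportionally.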

[0221] A single SENECA reaction of pFS_0235 eventually yields ˜6,126 spacers upon using the entire adapter ligated plasmid DNA for PCR amplification (two 30 μL PCR reactions, each containing 10 μL of adapter ligated plasmid DNA). Using the optimized FsRT-Cas1-Cas2 expression construct encoding an E. coli codon-optimized FsRT-Cas1-Cas2 coding sequence under transcriptional control of the aTc inducible P.sub.tetA promoter (pFS_0453; Extended Data FIG. 10a, b), the efficiency increased ˜10-fold to 61,462 spacers per SENECA reaction. Accordingly, 40 fmol of plasmid DNA acquired 61,462 spacers. This is equal to one in 390,485 plasmids acquiring a new spacer. Assuming the copy number of pET30b to be 20, this results in one in every 19,524 cells acquiring a new spacer.
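
The acquisition frequencies quoted above follow from dividing the plasmid input by the spacer yield. The sketch below uses the rounded input of 2.4×10.sup.10 plasmids, since that is the value that reproduces the figures in the text; the function name is hypothetical.

```python
def acquisition_frequency(plasmids_in_reaction: float,
                          spacers_detected: int,
                          copy_number: float = 20):
    """Return (plasmids per new spacer, cells per new spacer).

    Assumes every detected spacer stems from a distinct plasmid and that
    each cell carries copy_number plasmids.
    """
    plasmids_per_spacer = plasmids_in_reaction / spacers_detected
    cells_per_spacer = plasmids_per_spacer / copy_number
    return plasmids_per_spacer, cells_per_spacer


if __name__ == "__main__":
    per_plasmid, per_cell = acquisition_frequency(2.4e10, 61462)
    print(round(per_plasmid))  # 390485
    print(round(per_cell))     # 19524
```

Multiplying the cells-per-spacer figure by the number of spacers required for a given readout gives the corresponding minimal cell input under the same assumptions.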

[0222] Based on the number of cells required to detect a specific stimulus, this calculation can be used to derive the number of cells used as a minimal input for the respective recording. For example, the inventors defined the minimum number of spacers to be required for assessing an arbitrary sequence (sfGFP) to be as low as 500 spacers, which corresponds to 8.8×10.sup.6 E. coli cells (FIG. 11g).

[0223] Likewise, the inventors estimated the number of spacers required to detect complex cellular behaviors to be 313 (7% of the original data), (FIG. 13, 14). This equals 6.1×10.sup.6 E. coli cells used as an input. The total number of spacers required to record a complex stimulus happens to be lower than that required to record a defined stimulus (sfGFP), because in the complex case, spacers mapping to many different genes contribute to a ‘usable output’ while in the case of a defined stimulus, only a subset of the required total of 500 spacers is mapping to the single gene of interest (sfGFP).

[0224] Type III Versus Type I CRISPR-Cas Systems

[0225] Type III CRISPR-Cas systems like that of F. saccharivorans are generally several thousand-fold less efficient in spacer acquisition than the prototypical Type I systems (like the E. coli Type I-E). This necessitates multiple rounds of elaborate size selection procedures followed by deep sequencing to identify new spacers. Likewise, PCR products from extended CRISPR arrays cannot be detected on DNA gels (agarose or PAGE) due to their vanishingly low abundance. Taken together, while the classic spacer readout is applicable for highly efficient spacer acquisition systems, it precludes deep characterizations of most CRISPR-Cas systems, which motivated the development of SENECA.

[0226] Assessing the Correlation Between RNA-Seq and Record-Seq

[0227] The inventors set out to assess the direct correlation between RNA-seq and Record-seq (FIG. 12b, c). However, given the distinct nature of the two techniques, namely RNA-seq being a snapshot in time and Record-seq being a cumulative record, the inventors expected the current abundance of a transcript (RNA-seq) to always precede its integration into a CRISPR array (Record-seq), leading to a weak correlation at any specific point in time. To investigate this potential asynchrony, the inventors performed RNA-seq and Record-seq on the same population of E. coli in stationary growth phase and assessed the correlation between the two in the context of all genes, logarithmic-phase genes, stationary-phase genes.sup.63, and plasmid-borne genes. While a weak correlation was observed between the two datasets when considering all genes (Pearson Correlation=0.61, R.sup.2=0.37), a much stronger correlation was observed when considering only logarithmic-phase genes (Pearson Correlation=0.72, R.sup.2=0.52). In contrast, the correlation was weakest when considering only stationary-phase genes (Pearson Correlation=0.49, R.sup.2=0.24), in which case the inventors expect that the spacers corresponding to stationary-phase growth have not yet been integrated. Performing this correlation analysis with stationary-phase or logarithmic-phase genes on Record-seq datasets obtained after 12, 24 and 36 hours of growth indeed revealed that the spacer repertoire shifted towards stationary-phase genes, while the correlation to logarithmic-phase genes decreased during extended growth (FIG. 7f, g), indicating that spacer acquisition is still active in stationary phase. Furthermore, the plasmid-borne genes expressed under strong synthetic promoters, which are expected to be less affected by the growth phase, show the highest correlation (Pearson Correlation=0.84, R.sup.2=0.70). Taken together, the differences between RNA-seq and Record-seq highlight the respective features of transcript measurement by the two methods, namely that RNA-seq represents a snapshot of the cellular transcriptome at the time of cell harvest, whereas Record-seq reveals the cumulative transcriptome sampled by FsRT-Cas1-Cas2 in a population of cells over time (FIG. 1b).
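By way of a non-limiting illustration, the per-subset correlation analysis described above can be sketched as follows. The reported R.sup.2 values equal the squared Pearson coefficients within rounding (for example, 0.61 squared is approximately 0.37). The gene names and counts below are hypothetical toy values, and the log10(count + 1) transformation is an assumption, since the transformation actually applied is not stated here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def subset_correlation(rnaseq, recordseq, genes):
    """Return (r, r squared) for one gene subset on log10(count + 1) values.

    The log transformation is an assumption made for this sketch.
    """
    x = [math.log10(rnaseq[g] + 1) for g in genes]
    y = [math.log10(recordseq[g] + 1) for g in genes]
    r = pearson(x, y)
    return r, r * r


if __name__ == "__main__":
    # Hypothetical toy counts; not data from the experiments described.
    rnaseq = {"geneA": 1200, "geneB": 300, "geneC": 45, "geneD": 5}
    recordseq = {"geneA": 30, "geneB": 12, "geneC": 2, "geneD": 1}
    r, r2 = subset_correlation(rnaseq, recordseq, list(rnaseq))
    print(f"r = {r:.2f}, r^2 = {r2:.2f}")
```

In practice the same function would be called once per gene set (all genes, logarithmic-phase genes, stationary-phase genes, plasmid-borne genes) using the per-gene counts from each method.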

[0228] Analysis of Complex Cellular Behaviors with Record-Seq

[0229] The inventors set out to answer the following questions: (i) are the transcriptional-scale records broadly different between the treated and untreated conditions; (ii) do the most variable genes in the dataset distinguish the two populations; (iii) do standard RNA sequencing analysis tools identify genes that were cumulatively differentially expressed; (iv) are the cumulatively differentially expressed genes informative in the context of the initial stimulus; and (v) can the inventors unbiasedly classify the cellular populations into treated and untreated conditions based on broad, variable, or signature responses.

[0230] Questions (i-iv) are addressed in the main text, but here the inventors will elaborate on question (v). Among the signature genes the inventors identified several that were expected to dominate the cellular responses for each stimulus. For example, the inventors identified dps (DNA protection during starvation protein), which codes for a hallmark DNA damage repair protein, among the oxidative stress signature genes. Additionally, dps has previously been shown to be the top differentially expressed gene in response to oxidative stress. Furthermore, the inventors identified three members of the SUF system (i.e., sufABCDSE operon), which primarily operates under oxidative stress conditions to aid in the formation of iron-sulfur (Fe—S) clusters. Likewise, the inventors identified hallmark members of the acid stress response, including asr (acid-shock protein precursor) as well as several chaperones (e.g., dnaK and ibpB) and heat-shock proteins (e.g., grpE and ibpA) among the acid stress signature genes.sup.35.

[0231] CRISPR Spacer Acquisition from RNA Versus DNA

[0232] The inventors present multiple lines of evidence showing CRISPR spacer acquisition from RNA, including spacer acquisition from an RNA-only td intron splice junction (FIG. 3a, b and FIG. 8a-b), spacer acquisition from an RNA virus (FIG. 3c-e and FIG. 10c-f), and RNA abundance-dependent spacer acquisition (FIG. 3f, g, FIG. 11a-e and FIG. 12b-d). While these observations strongly suggest that FsRT-Cas1-Cas2 is capable of acquiring spacers directly from RNA, they do not exclude the possibility that spacers are also acquired from DNA. Although the distinction between spacer acquisition from RNA and from DNA is fundamental to understanding the molecular mechanism of FsRT-Cas1-Cas2-mediated spacer acquisition, it does not confound the interpretation of Record-seq, in which acquired spacers are preferentially derived from highly transcribed genes, correlate with gene expression at the genome-wide level, and correlate strongly with RNA abundance (FIG. 12b, c).

[0233] Benefits of Record-Seq

[0234] The benefits of Record-seq include (i) the ability to heterologously express orthologous RT-Cas1-containing CRISPR acquisition systems in order to capture and store RNA species within DNA in an abundance-dependent process; (ii) the capacity to efficiently and scalably read out molecular histories permanently stored in DNA and reconstruct transcriptome-scale events; (iii) the application of this technology for recording specific inputs, such as virus infection or any single inducible expression system or orthogonal set of inducible expression systems; and (iv) the potential applications of this system for creating ‘sentinel’ cells for medical or biotechnology applications. Even if specific external stimuli cannot be recorded directly, the transcriptome-scale molecular signatures recorded within a bacterial population may be sufficient to report meaningful physiological states.

[0235] Mice Experiments

[0236] For oral gavage, E. coli (BL21 (DE3) or MG1655) cells were transformed with pFS_0453 (SEQ ID NO 334) and streaked on LB-agar plates containing 50 μg/mL kanamycin and grown overnight (12 h) at 37° C. The plasmid pFS_0453 encodes FsRT-Cas1-Cas2 under transcriptional control of an anhydrotetracycline inducible promoter (pTetA) as well as the FsCRISPR array 2 followed by a FaqI restriction site for the SENECA readout.

[0237] The following evening, a single colony was picked into 3 mL LB medium containing 50 μg/mL kanamycin under sterile conditions and grown overnight at 37° C. in a bacterial shaker (200-300 rpm). This culture was used to prepare a glycerol stock by mixing 500 μL of bacterial culture with 500 μL of sterile 50% (w/v) glycerol for long-term storage at −80° C. For in vivo recording experiments, an overnight liquid culture was inoculated either directly from this glycerol stock or from single bacterial colonies obtained by streaking bacteria on an LB-agar plate containing 50 μg/mL kanamycin.

[0238] Gnotobiotic C57BL/6 mice were orally gavaged with 1×10.sup.9 colony forming units (CFU) of E. coli BL21(DE3) or MG1655 cells transformed with pFS_0453 in 500 μL PBS. Persistence of the plasmids was ensured by adding 100 μg/mL kanamycin sulfate (Sigma Aldrich) to the drinking water. Expression of FsRT-Cas1-Cas2 was induced by the addition of 10-30 μg/mL anhydrotetracycline (Cayman Chemical) to the drinking water.

[0239] For the DSS experiment, kanamycin (100 μg/mL) and anhydrotetracycline (30 μg/mL) were added to the drinking water of the germ-free C57BL/6 mice 24 hours prior to gavage. Animals were maintained under germ-free conditions. A colony of E. coli BL21(DE3) transformed with pFS_0453 was grown overnight in LB medium containing 50 μg/mL kanamycin. The resulting culture was pelleted and resuspended in 1×PBS. This bacterial suspension was used to orally gavage each animal with 1×10.sup.9 colony forming units (CFU) of E. coli. Animals were maintained on water containing both kanamycin and anhydrotetracycline throughout the entire experiment. Fecal pellets were collected for 18 days starting 24 hours after the gavage. From day 5 to day 9 of the experiment, dextran sulfate sodium (DSS) (MPBio) was added at 1%, 2% or 3% (w/v) to the animals' drinking water while maintaining kanamycin and anhydrotetracycline as described above. Animals were treated in groups of 3, and negative control animals received no DSS in the water.

[0240] The experiment was terminated on day 19, when colonic and cecal contents were also harvested for plasmid DNA extraction.

[0241] Plasmid DNA was extracted using the QIAprep Spin Miniprep Kit according to the manufacturer's instructions, except that the volumes of buffers P1, P2 and N3 were increased to 500, 500 and 700 μL, respectively, to adjust for the increased biomass. Plasmid DNA was eluted in 150 μL of buffer EB and subsequently concentrated by precipitation. To this end, 15 μL of 3 M sodium acetate solution pH 5.2 (Sigma-Aldrich) and 105 μL isopropanol were added to each sample. Samples were incubated at −20° C. for at least 20 min. Following centrifugation to precipitate nucleic acids (20,000×g, 30 min, 4° C.), the supernatant was removed and the DNA pellet was washed with 150 μL of 70% (v/v) ethanol by centrifugation (20,000×g, 15 min, 4° C.). Ethanol was aspirated and DNA pellets were briefly dried at 55° C., after which each DNA pellet was resuspended in 15 μL of buffer EB. From this eluate, 7.5 μL were used for SENECA adapter ligation, with all subsequent steps of the SENECA protocol performed as described previously.
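By way of a non-limiting illustration, the precipitation volumes given above can be scaled to a different eluate volume with the sketch below. The ratios of 0.1 volumes of 3 M sodium acetate and 0.7 volumes of isopropanol are inferred from the stated volumes (15 μL and 105 μL per 150 μL of eluate) and are an assumption of this sketch; the function names are hypothetical.

```python
def precipitation_volumes(eluate_ul, acetate_ratio=0.1, isopropanol_ratio=0.7):
    """Return reagent volumes (in microliters) for a given eluate volume.

    The default ratios are inferred from 15 uL acetate and 105 uL
    isopropanol per 150 uL eluate; they are not stated as ratios in the text.
    """
    if eluate_ul <= 0:
        raise ValueError("eluate volume must be positive")
    return {
        "sodium_acetate_ul": round(eluate_ul * acetate_ratio, 1),
        "isopropanol_ul": round(eluate_ul * isopropanol_ratio, 1),
    }


def concentration_factor(eluate_ul, resuspension_ul):
    """Fold concentration, assuming complete recovery of the DNA."""
    return eluate_ul / resuspension_ul


if __name__ == "__main__":
    print(precipitation_volumes(150))
    print(concentration_factor(150, 15))
```

For the volumes in the protocol (150 μL eluate resuspended in 15 μL), the nominal concentration factor is 10-fold, assuming no loss of DNA during precipitation.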

[0242] For the diet experiment comparing chow and starch diets, all animals were maintained on a chow-based diet (3307, Kliba Nafag) prior to the experiment. On day 1 of the experiment, 5 animals were kept on the chow-based diet, while a second group of 5 animals was switched to a starch-based diet (D12450Ji, Research Diets Inc.). On day 2 of the experiment, anhydrotetracycline and kanamycin sulfate were added to the drinking water (30 μg/mL and 100 μg/mL, respectively). On day 3 of the experiment, all animals were orally gavaged with 1×10.sup.9 colony forming units (CFU) of E. coli BL21(DE3) transformed with pFS_0453 as described above. Fecal pellets were collected from day 4 to day 9 of the experiment for the extraction of plasmid DNA as described above. Furthermore, on day 10 the animals were dissected to obtain cecal and colonic contents for plasmid DNA extraction as described above.

[0243] For the diet experiment comparing chow, starch and fat diets, all animals were maintained on a chow-based diet (3307, Kliba Nafag) prior to the experiment. On day 1 of the experiment, animals were put on either a chow-based diet (3307, Kliba Nafag), a starch-based diet (D12450Ji, Research Diets Inc.) or a fat-based diet (fat-enriched diet D12492i, Research Diets Inc.). On day 2 of the experiment, anhydrotetracycline and kanamycin sulfate were added to the drinking water (30 μg/mL and 100 μg/mL, respectively). On day 3 of the experiment, all animals were orally gavaged with 1×10.sup.9 colony forming units (CFU) of E. coli MG1655 transformed with pFS_0453 as described above. Fecal pellets were collected from day 4 to day 10 of the experiment for the extraction of plasmid DNA as described above.

[0244] Furthermore, on day 10 the animals were dissected to obtain cecal and colonic contents for plasmid DNA extraction as described above.

TABLE-US-00001 TABLE 1 RT-Cas1 orthologs
Host strains and protein accession numbers of RT-Cas1 orthologs identified by HMMER-based protein sequence homology search.
Host | Protein accession number
Bacteroides salyersiae | 494745665 ref WP_007481073.1
Leptolyngbya sp. PCC 7375 | 493562087 ref WP_006515493.1
Photobacterium aphoticum | 837770314 ref WP_047875592.1
Millisia brevis | 1055178592 ref WP_066909103.1
Calothrix parietina | 505008919 ref WP_015196021.1
Bacteroides fragilis str. 3397 T10 | 595923015 gb EXY33263.1
Pelodictyon phaeoclathratiforme | 501500885 ref WP_012509117.1
Arthrospira platensis | 493670156 ref WP_006620498.1
Calothrix sp. PCC 7507 | 504941836 ref WP_015128938.1
Leptolyngbya sp. PCC 6406 | 495588276 ref WP_008312855.1
Lachnoanaerobaculum saburreum | 987863574 ref WP_060932241.1
Candidatus Brocadia fulgida | 816979878 gb KKO19838.1
Leptolyngbya sp. O-77 | 984539873 dbj BAU44853.1
Tistrella mobilis KA081020-065 | 388530577 gb AFK55773.1
Smithella sp. SC K08D17 | 745626258 gb KIE18281.1
Lachnospiraceae bacterium oral taxon 082 | 497051594 ref WP_009447486.1
Psychrobacter lutiphocae | 518502663 ref WP_019672870.1
Propionicicella superfundia | 916602138 ref WP_051209229.1
Loktanella vestfoldensis | 518800937 ref WP_019956891.1
Desulfovibrio hydrothermalis | 505147525 ref WP_015334627.1
Oceanospirillum beijerinckii | 654849652 ref WP_028302067.1
Fischerella muscicola | 737152142 ref WP_035139015.1
Desulfobacca acetoxidans | 503473041 ref WP_013707702.1
Hippea sp. KMI | 643957755 ref WP_025270209.1
Chlorobium limicola | 501442438 ref WP_012465887.1
Desulfarculus baarsii | 503023536 ref WP_013258512.1
Thiocapsa sp. KS1 | 971091367 emb CRI67871.1
Candidatus Accumulibacter sp. SK-02 | 668684200 gb KFB76584.1
Candidatus Magnetoglobus multicellularis str. Araruama | 571788307 gb ETR69258.1
Vibrio sinaloensis | 740352375 ref WP_038188758.1
Campylobacter concisus | 544653868 ref WP_021087740.1
Cellulomonas bogoriensis | 917498396 ref WP_052104813.1
Teredinibacter turnerae | 518435809 ref WP_019606016.1
Campylobacter fetus subsp. fetus | 998762051 emb CZE46369.1
Gemmatimonadetes bacterium SCN 70-22 | 1063993205 gb ODT03821.1
Microcoleus sp. PCC 7113 | 504999115 ref WP_015186217.1
Micromonospora rosaria | 1000329745 gb KXK58998.1
Candidatus Entotheonella sp. TSY2 | 575418691 gb ETX03376.1
Lachnoanaerobaculum sp. MSX33 | 570843978 gb ETO97675.1
Corynebacterium durum | 492955761 ref WP_006063846.1
Anabaena cylindrica PCC 7122 | 428682296 gb AFZ61061.1
Pseudanabaena biceps | 497311431 ref WP_009625648.1
Vibrio sp. MEBiC08052 | 972247703 gb KUI97421.1
Actinomyces johnsonii | 545331217 ref WP_021604855.1
Microlunatus phosphovorus | 503627960 ref WP_013862036.1
Kamptonema | 494597365 ref WP_007355619.1
Skermania piniformis | 1054700955 ref WP_066466672.1
Fischerella sp. NIES-3754 | 965689238 dbj BAU08380.1
Chlorobium phaeobacteroides | 500067943 ref WP_011745868.1
Vibrio vulnificus | 499466110 ref WP_011152750.1
Bacteroides fragilis | 547947118 ref WP_022348096.1
Porphyromonas sp. COT-052 OH4946 | 746384965 ref WP_039428138.1
Kutzneria sp. 744 | 918333650 ref WP_052396493.1
Porphyromonas crevioricanis | 565855908 ref WP_023938229.1
Rubrivivax benzoatilyticus | 497541412 ref WP_009855610.1
Streptomyces sp. F-3 | 1026350507 dbj GAT81929.1
Campylobacter gracilis | 492518353 ref WP_005873073.1
Fusicatenibacter saccharivorans | 941895202 ref WP_055226073.1
uncultured Thiohalocapsa sp. PB-PSB1 | 557040601 gb ESQ17084.1
Porphyromonas gingivalis | 492529527 ref WP_005874916.1
uncultured Thiohalocapsa sp. PB-PSB1 | 557029821 gb ESQ08042.1
Azospirillum lipoferum | 503954719 ref WP_014188713.1
Teredinibacter sp. 991H.S.0a.06 | 797071444 ref WP_045826479.1
Tolypothrix campylonemoides | 751570959 ref WP_041039832.1
Pseudoalteromonas rubra | 800981085 ref WP_046007427.1
Rhodovulum sulfidophilum | 985596740 ref WP_060836241.1
Teredinibacter turnerae | 516642225 ref WP_018013804.1
Arcobacter thereius | 1054172508 ref WP_066177132.1
Nocardiopsis baichengensis | 516128787 ref WP_017559367.1
Arthrospira maxima | 493720432 ref WP_006669920.1
Eubacteriaceae bacterium CHKCI004 | 1016807618 emb CVI70780.1
Frankia sp. BMG5.1 | 919937513 ref WP_052914180.1
Roseburia inulinivorans | 937570588 emb CRL43259.1
Porphyromonas gingivalis | 503581191 ref WP_013815267.1
Campylobacter fetus subsp. fetus | 998759376 emb CZE50714.1
Microcystis aeruginosa | 640538680 ref WP_024971209.1
Marinomonas mediterranea | 503425197 ref WP_013659858.1
Candidatus Magnetomorum sp. HK-1 | 927673953 gb KPA10619.1
Campylobacter fetus subsp. fetus | 998758141 emb CZE46264.1
Synechococcus sp. NKBG042902 | 780027826 ref WP_045442561.1
Chlorobaculum limnaeum | 1071376969 ref WP_069809202.1
Nostoc sp. PCC 7107 | 764929206 ref WP_044499977.1
Arthrospira platensis | 504041557 ref WP_014275551.1
Woodsholea maritima | 518804695 ref WP_019960649.1
Actinomyces cardiffensis F0333 | 478776992 gb ENO18597.1
Mastigocladus laminosus | 764662524 ref WP_044448019.1
Clostridium | 916986069 ref WP_051592781.1
Rhodococcus sp. YH3-3 | 1033138899 ref WP_064444911.1
Rhodobacter capsulatus | 940623611 gb KQB14189.1
Lachnoanaerobaculum saburreum | 496026892 ref WP_008751399.1
Vibrio metoecus | 941008961 ref WP_055043549.1
Porphyromonas gingivicanis | 739003123 ref WP_036885018.1
Smithella sp. D17 | 683425608 gb KFZ44108.1
Candidatus Accumulibacter sp. BA-91 | 668677118 gb KFB71594.1
Nodosilinea nodulosa | 515871661 ref WP_017302244.1
Phormidesmis priestleyi Ana | 938299454 gb KPQ33062.1
Vibrio mexicanus | 823288127 ref WP_047044098.1
Photobacterium marinum | 494733933 ref WP_007469744.1
Candidatus Brocadia fulgida | 816977369 gb KKO17867.1
Desulfovibrio bastinii | 652926624 ref WP_027180402.1
Candidatus Magnetoovum chiemensis | 778249022 gb KJR40057.1
Azospirillum lipoferum | 502738680 ref WP_012973664.1
Cyanothece sp. PCC 7822 | 503100147 ref WP_013334941.1
Clostridiales bacterium VE202-01 | 639695530 ref WP_024721321.1
Actinomycetaceae bacterium BA112 | 1032601389 ref WP_064231067.1
Bacteroides | 495935708 ref WP_008660287.1
Candidatus Jettenia caeni | 494421634 ref WP_007220853.1
Rhodobacter capsulatus SB 1003 | 294475643 gb ADE85031.1
Oscillatoriales cyanobacterium USR001 | 1049312742 gb OCQ91006.1
Nostoc sp. PCC 7120 | 499304863 ref WP_010995638.1
Vibrio metoecus | 941038135 ref WP_055051199.1
Scytonema hofmanni UTEX B | 657929289 ref WP_029630506.1
Arthrospira sp. PCC 8005 | 495324841 ref WP_008049584.1
Phormidium willei | 1057444347 ref WP_068790073.1
Vibrio rotiferianus | 742405863 ref WP_038884984.1
Thermodesulfovibrio sp. N1 | 1057568519 ref WP_068860870.1
Bacteroides fragilis | 492341859 ref WP_005815836.1
Rhodovulum sp. PH10 | 750340320 ref WP_040622239.1
Porphyromonas gulae | 807048030 ref WP_046200570.1
Arthrospira sp. TJSD091 | 809071417 ref WP_046320545.1
Streptomyces sp. AVP053U2 | 1057451804 gb ODA69832.1

TABLE-US-00002 TABLE 2 First round PCR primers for classic acquisition readout
Primer binding sites for first round PCR primers to amplify CRISPR arrays for deep sequencing, related to the classical acquisition read-out in FIG. 6. For each species, the forward primer binding site is listed first and the reverse primer binding site second. The design of the primers including adapter sequences for first round PCR is described in detail in Primer Design Note 1 in the methods section of this paper.
Array | Forward primer binding site, sequence (5′→3′) (SEQ ID NO) | Reverse primer binding site, sequence (5′→3′) (SEQ ID NO)
Bacteroides fragilis strain S14 | TCAACACTTCATCTATCTAACTGAATAA (105) | TGTTATGAACGGCTACGCCT (106)
Campylobacter fetus subsp. Fetus | CGCTCGAATTCAGCTCTCACAG (107) | AATTGCCAAATTCTGTTTCAATCC (108)
Cellulomonas bogoriensis 69B4 | GTCAGCCCGGGGTCAAAAC (109) | GGAACTTTAAACCCTTTACATCCCC (110)
Fusicatenibacter saccharivorans array 1 | TCAGAAAAACGATCGACCGAC (111) | AGAAGAAGCAATCGAAAAAGCG (112)
Fusicatenibacter saccharivorans array 2 | AGAATCTGAAAACAGCGGAA (113) | ACGCTAGGGAATATGCAGCAA (114)
Candidatus Accumulibacter sp. SK-02 | CCGAAAAGAGCCGTTAAATTCC (115) | CCTCAAAACGGTACCAAAGAAGC (116)
Micromonospora rosaria array 1 | CACAGCACCTCTTCGCCACG (117) | CGATTCCGGTCCTCGGTTTC (118)
Micromonospora rosaria array 2 | CTCAAGACCCACCGTTTTCG (119) | TTCAACAACGACGCCAACTATG (120)
Candidatus Accumulibacter sp. BA-91 | GCAAGTCTCCGGCAAGTCAG (121) | TCACTTGAAGATTATATAGTGACTCTTTTCG (122)
Desulfarculus baarsii DSM 2075 | TGGCAAACCATGTGGAAACAG (123) | AAAATGGCAACGCCGGG (124)
Woodsholea maritima | TGGAGCTGAATGTCACATCTTG (125) | GGAATCTCAAGCAGCGGAGAA (126)
Azospirillum lipoferum 4B array 1 | CACAGGATGCGTGGAAAGG (127) | CTCAACGAACCGAAGCTGC (128)
Azospirillum lipoferum 4B array 2 | CCGTTGGGAATTTTCCCGTT (129) | GACTCTTTTTCCCGGAGCCC (130)
Teredinibacter turnerae T8412 | CCCAAACGGGGTTCTAGCAT (131) | GCGACAAAAGCATATTAAGGAGACT (132)
Tolypothrix campylonemoides | GCGCTGTAGAATTATTTCAGGGT (133) | ATGGGATGGAGGTTCGGGT (134)
Oscillatoriales cyanobacterium | GAGCTTGGGGCAAGGCTC (135) | GTCGAGAAGTAGCAGTTCACTTTCT (136)
Eubacterium saburreum DSM 3986 array 1 | ACCTATCACAACGGCTTAAATG (137) | ATCACTGCTATGCAGCTTATTCG (138)
Eubacterium saburreum DSM 3986 array 2 | AAAGCGAGGGCTTTCCCATA (139) | CTCATCAGAATGTGACGGTCG (140)

TABLE-US-00003 TABLE 3 Indices for deep sequencing
(N).sub.8 barcodes corresponding to Illumina TruSeq HT indices used in this study.
BC1 Sequence (5′→3′) | BC2 Sequence (5′→3′)
AAGTAGAG | CATGATCG
CATGCTTA | AGGATCTA
GCACATCT | GACAGTAA
TGCTCGAC | CCTATGCC
AGCAATTC | TCGCCTTG
AGTTGCTT | ATAGCGTC
CCAGTTAG | GAAGAAGT
TTGAGCCT | ATTCTAGG
ACACGATC | CGTTACCA
GGTCCAGA | GTCTGATG
GTATAACA | TTACGCAC
TTCGCTGA | TTGAATAG
AACTTGAC | TCCTTGGT
CACATCCT | ACAGGTAT
TCGGAATG | AGGTAAGG
AACGCATT | AACAATGG
CGCGCGGT | ACTGTATC
TCTGGCGA | AGGTCGCA
CATAGCGA | AGGTTATC
CAGGAGCC | CAACTCTC
TGTCGGAT | CCAACATT
ATTATGTT | CTAACTCG
CCTACCAT | ATTCCTCT
TACTTAGC | CTACCAGG

TABLE-US-00004 TABLE 4 SENECA adapter oligos
Array-specific forward oligos for adapter ligation during the SENECA procedure, sorted by their respective CRISPR array. Related to FIGS. 7 and 8. Upon annealing with the universal reverse oligo FS_0963, the array-specific forward oligo (table below) creates a 4 bp overhang compatible with the plasmid overhang generated during FaqI digest in SENECA.
Array | Sequence (5′→3′) | SEQ ID NO
Bacteroides fragilis strain S14 Array 1 | ATAAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 141
Bacteroides fragilis strain S14 Array 1 RC | GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 142
Campylobacter fetus subsp. Fetus Array 1 | TAGGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 143
Campylobacter fetus subsp. Fetus Array 1 RC | GAAAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 144
Cellulomonas bogoriensis 69B4 Array 1 | GAGGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 145
Cellulomonas bogoriensis 69B4 Array 1 RC | GCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 146
Fusicatenibacter saccharivorans Array 1 | TGAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 147
Fusicatenibacter saccharivorans Array 1 RC | AGGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 148
Fusicatenibacter saccharivorans Array 2 | AAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 149
Fusicatenibacter saccharivorans Array 2 RC | AGGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 150
Candidatus Accumulibacter sp. SK-02 Array 1 | AAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 151
Candidatus Accumulibacter sp. SK-02 Array 1 RC | GGCTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 152
Micromonospora rosaria Array 1 | GCGGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 153
Micromonospora rosaria Array 1 RC | CTGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 154
Micromonospora rosaria Array 2 | GCGGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 155
Micromonospora rosaria Array 2 RC | CTGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 156
Micromonospora rosaria Array 3 | GGGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 157
Candidatus Accumulibacter sp. BA-91 Array 1 | AACAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 158
Desulfarculus baarsii DSM 2075 Array 1 | AAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 159
Desulfarculus baarsii DSM 2075 Array 1 RC | GCATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 160
Desulfarculus baarsii DSM 2075 Array 2 | AAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 161
Desulfarculus baarsii DSM 2075 Array 2 RC | GCATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 162
Woodsholea maritima Array 1 | GAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 163
Woodsholea maritima Array 1 RC | GATTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 164
Woodsholea maritima Array 2 | GAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 165
Woodsholea maritima Array 2 RC | GATGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 166
Azospirillum lipoferum 4B Array 1 | GAGCGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 167
Azospirillum lipoferum 4B Array 1 RC | GACAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 168
Azospirillum lipoferum 4B Array 2 | TAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 169
Azospirillum lipoferum 4B Array 2 RC | ATGTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 170
Teredinibacter turnerae T8412 Array 1 | GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 171
Teredinibacter turnerae T8412 Array 1 RC | GAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 172
Tolypothrix campylonemoides Array 1 | GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 173
Tolypothrix campylonemoides Array 1 RC | GAGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 174
Tolypothrix campylonemoides Array 2 | GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 175
Tolypothrix campylonemoides Array 2 RC | GAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 176
Tolypothrix campylonemoides Array 3 | AAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 177
Tolypothrix campylonemoides Array 3 RC | GAGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 178
Oscillatoriales cyanobacterium Array 1 | AATTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 179
Oscillatoriales cyanobacterium Array 1 RC | TAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 180
Oscillatoriales cyanobacterium Array 2 | GATTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 181
Oscillatoriales cyanobacterium Array 2 RC | CCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 182
Rivularia sp. PCC 7116 Array 1 | GATTGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 183
Rivularia sp. PCC 7116 Array 1 RC | CCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 184
Rivularia sp. PCC 7116 Array 2 | TAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 185
Rivularia sp. PCC 7116 Array 2 RC | GGTAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 186
Eubacterium saburreum DSM 3986 Array 1 | TAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 187
Eubacterium saburreum DSM 3986 Array 1 RC | GGTAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 188
Eubacterium saburreum DSM 3986 Array 2 | ATAAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 189
Eubacterium saburreum DSM 3986 Array 2 RC | GAATGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 190
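By way of a non-limiting illustration, the relationship between the array-specific oligos of Table 4 and the universal reverse oligo FS_0963 can be sketched as follows. Each listed oligo consists of an array-specific 4-nucleotide prefix followed by the reverse complement of FS_0963, so that annealing leaves the 4 prefix nucleotides single-stranded. The function names are hypothetical; the sequences are those listed for FS_0963 and FS_0964.

```python
# Universal reverse oligo FS_0963 as listed in Table 6.
UNIVERSAL_REVERSE = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made up of A, C, G and T."""
    return seq.translate(_COMPLEMENT)[::-1]


def forward_adapter_oligo(overhang):
    """Build an array-specific forward oligo from a 4-nt overhang.

    The overhang stays single-stranded after annealing with FS_0963.
    """
    if len(overhang) != 4 or set(overhang) - set("ACGT"):
        raise ValueError("overhang must be 4 nucleotides of A, C, G or T")
    return overhang + reverse_complement(UNIVERSAL_REVERSE)


def overhang_of(forward_oligo):
    """Return the unpaired prefix of a forward oligo annealed to FS_0963."""
    duplex = reverse_complement(UNIVERSAL_REVERSE)
    if not forward_oligo.endswith(duplex):
        raise ValueError("oligo does not anneal fully to FS_0963")
    return forward_oligo[: len(forward_oligo) - len(duplex)]


if __name__ == "__main__":
    oligo = forward_adapter_oligo("AAAG")
    print(oligo)
    print(overhang_of(oligo))
```

With the overhang AAAG this construction yields the sequence listed for FS_0964 and for Fusicatenibacter saccharivorans Array 2 (SEQ ID NO 149).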

TABLE-US-00005 TABLE 5 First round PCR primers for SENECA acquisition readout
Primer binding sites for the DR-specific SENECA forward amplification primers, sorted by their respective CRISPR arrays. Related to FIG. 8. During SENECA PCR, the forward primer was chosen according to the respective CRISPR array, while FS_0911 serves as a universal reverse primer binding the Illumina adapter. Details on primer design are described in Primer Design Notes 1 and 2. For the CRISPR array directionality screen, staggering was conducted by ordering only two forward primers with different stagger lengths (NN and NNN) instead of the usual 7 forward primers described for Fusicatenibacter saccharivorans array 2.
Array | Sequence (5′→3′) | SEQ ID NO
Bacteroides fragilis strain S14 Array 1 | CAGTATAATAAGGATTAAGAC | 191
Bacteroides fragilis strain S14 Array 1 RC | ACTGGAATACATCTACAT | 192
Campylobacter fetus subsp. Fetus Array 1 | ATTAGGGGATGAAAC | 193
Campylobacter fetus subsp. Fetus Array 1 RC | GGAGAAAGTGTCTAAAC | 194
Cellulomonas bogoriensis 69B4 Array 1 | GAGGGCATTGAAAC | 195
Cellulomonas bogoriensis 69B4 Array 1 RC | GCCATGGGTGGAAC | 196
Fusicatenibacter saccharivorans Array 1 | CCTATGAGGAATTGAAAC | 197
Fusicatenibacter saccharivorans Array 1 RC | CATAGGTAAGGTACAAC | 198
Fusicatenibacter saccharivorans Array 2 | CCTAAAAGGAATTGAAAC | 199
Fusicatenibacter saccharivorans Array 2 RC | TTTAGGTAAAGTACGAC | 200
Candidatus Accumulibacter sp. SK-02 Array 1 | GATAAAGGGATTGAGAC | 201
Candidatus Accumulibacter sp. SK-02 Array 1 RC | GGGCTTAGTTTTCAC | 202
Micromonospora rosaria Array 1 | GCGGGCATAGAAAC | 203
Micromonospora rosaria Array 1 RC | CTGTGGATGGCGAT | 204
Micromonospora rosaria Array 2 | GCGGGCATAGAAAC | 205
Micromonospora rosaria Array 2 RC | CTGTGGATGGCAAT | 206
Micromonospora rosaria Array 3 | GGTGATGAGCGAC | 207
Candidatus Accumulibacter sp. BA-91 Array 1 | GAACAGGCTTGAAAC | 208
Desulfarculus baarsii DSM 2075 Array 1 | GAAGCGGATTGAAAC | 209
Desulfarculus baarsii DSM 2075 Array 1 RC | GGCATCCCTCAATAG | 210
Desulfarculus baarsii DSM 2075 Array 2 | GAAGCGGATTGAAAC | 211
Desulfarculus baarsii DSM 2075 Array 2 RC | GGCATCCCTCAATAG | 212
Woodsholea maritima Array 1 | CAGAGCTGATCAAAAC | 213
Woodsholea maritima Array 1 RC | GATTCGAGCAGAGC | 214
Woodsholea maritima Array 2 | GGAGCGGATTGAAAC | 215
Woodsholea maritima Array 2 RC | GATGCCGTCGCGAC | 216
Azospirillum lipoferum 4B Array 1 | GGAGCGGATTGAAAC | 217
Azospirillum lipoferum 4B Array 1 RC | GACACCGGCGGAAC | 218
Azospirillum lipoferum 4B Array 2 | GCTAAGGCTGTGAAAC | 219
Azospirillum lipoferum 4B Array 2 RC | CTAATGTCGATTGCGAC | 220
Teredinibacter turnerae T8412 Array 1 | AAGTTGAATTAATGGAAAC | 221
Teredinibacter turnerae T8412 Array 1 RC | TTCCGAAGAAGTTTAAAG | 222
Tolypothrix campylonemoides Array 1 | AAGTTGAATTAATGGAAAC | 223
Tolypothrix campylonemoides Array 1 RC | GGGAGAAGTTTAACAG | 224
Tolypothrix campylonemoides Array 2 | AAGTTGAATTAATGGAAAC | 225
Tolypothrix campylonemoides Array 2 RC | TTCCGAAGAAGTTTAAAG | 226
Tolypothrix campylonemoides Array 3 | AGTCAAATTAATGGAAAC | 227
Tolypothrix campylonemoides Array 3 RC | CAGAGAAGTCGAGAAG | 228
Oscillatoriales cyanobacterium Array 1 | GTCAAATTAATGGAAACA | 229
Oscillatoriales cyanobacterium Array 1 RC | CCTAAGAAGTCGAAAG | 230
Oscillatoriales cyanobacterium Array 2 | CGGATTAGTTGGAAAC | 231
Oscillatoriales cyanobacterium Array 2 RC | CCCAATCGGTGGGG | 232
Rivularia sp. PCC 7116 Array 1 | CGGATTAGTTGGAAAC | 233
Rivularia sp. PCC 7116 Array 1 RC | CCCAATCGGTGGGG | 234
Rivularia sp. PCC 7116 Array 2 | CCTATAAGGAATGGAAAC | 235
Rivularia sp. PCC 7116 Array 2 RC | TTATAGGTAAGGTACTTAC | 236
Eubacterium saburreum DSM 3986 Array 1 | CCTATAAGGAATGGAAAC | 237
Eubacterium saburreum DSM 3986 Array 1 RC | TTATAGGTAAGGTACTTAC | 238
Eubacterium saburreum DSM 3986 Array 2 | CAGTATAATAAGGATTAAGAC | 239
Eubacterium saburreum DSM 3986 Array 2 RC | ACTGGAATACATCTACAT | 240

TABLE-US-00006 TABLE 6 Miscellaneous Primers
Primers and oligonucleotides used for cloning purposes.
Primer ID | Sequence (5′→3′) | SEQ ID NO
FS_0151 | ATGCTTCATGTCACCAGGTAGTCTTCCATCGACTTCAAAACTCGATCCAACATCCTGAAGACGCGGCCGCTATTCTTTTGATTTATAAGGGATTTTG | 241
FS_0152 | CAACAACATGAATGATCTTCGGTTTCCGTGTTTCG | 242
FS_0153 | CACGGAAACCGAAGATCATTCATGTTGTTGCTCAGGTC | 243
FS_0154 | CGCCGCACTTATGACTATCTTCTTTATCATGCAACTCG | 244
FS_0155 | GATAAAGAAGATAGTCATAAGTGCGGCGACG | 245
FS_0156 | GATACCGAAGATAGCTCATGTTATATCCCGCCG | 246
FS_0157 | GATATAACATGAGCTATCTTCGGTATCGTCGTATCC | 247
FS_0158 | CTCCCATGAAGATGGTACGCGACTGGGC | 248
FS_0159 | GTCGCGTACCATCTTCATGGGAGAAAATAATACTGTTG | 249
FS_0160 | GAAGACTACCTGGTGACATGAAGCATCTCGAGGGTCTTCCTTGCCGGTGGTGCAGATGTTGAACAGAAGACCACATATGTATATCTCCTTCTTAAAGTTAAACAAAATTATTTC | 250
FS_0380 | TCGAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCTGAAAGGAGGAACTATATCCGGATA | 251
FS_0381 | CCTGGTATCCGGATATAGTTCCTCCTTTCAGCAAAAAACCCCTCAAGACCCGTTTAGAGGCCCCAAGGGGTTATGCTAGTTATTGCTCAGCGGTGGCAGCAGCCAACTCAGCTTCCTTTCGGGCTTTGTTAGCAGCCGGATC | 252
FS_0658 | GCTCAGCATATGGACATCCTGATCAGAAACAAGAAG | 253
FS_0659 | GCTCAGCATATGCAGTACTCCAACTGGCACGACTC | 254
FS_0660 | GCTCAGCATATGTTCATCAACGGTCGTTACCACATC | 255
FS_0662 | CCTACTCGCTTCTGGTGAATGTC | 256
FS_0871 | CCGGATACCAGGTGAGAATTAAATTG | 257
FS_0904 | GTTTAGCGGCCGCGGGACGTTTCAATTCCTCATAGGTAAGGTACAACATCAGCATTTCCGCTATTTTCAC | 258
FS_0911 | GTGACTGGAGTTCAGACG | 259
FS_0963 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC | 260
FS_0964 | AAAGGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | 261
FS_0995 | GATATACATATGTTCACTATAGACGAGATG | 262
FS_0996 | ATATAGCTGCGGCGTATCTGATC | 263
FS_0997 | AGATACGCCGCAGCTATATACATCTATATGGACAGCTACGAGAAG | 264
FS_0998 | GTCGGATGTCTCTAAGATCTGG | 265
FS_1001 | GCGAAATTAATACGACTCACTATAGG | 266
FS_1002 | TACTCGCTTCTGGTGAATGTC | 267
FS_1003 | GAGCTTTAGCCGCTAAGAGCATCATG | 268
FS_1004 | CATGATGCTCTTAGCGGCTAAAGCTC | 269
FS_1005 | GTTGCTGGCGGCAACAACCCC | 270
FS_1006 | GGGGTTGTTGCCGCCAGCAAC | 271
FS_1007 | GATGTCAGCAAAAGCCAGGTTAAGG | 272
FS_1008 | CCTTAACCTGGCTTTTGCTGACATC | 273
FS_1009 | GCTTGAAGATGGCAGCAAAATCC | 274
FS_1010 | GGATTTTGCTGCCATCTTCAAGC | 275
FS_1011 | CTATGACTATAGGCGCGAAGATGTCAGC | 276
FS_1012 | GCTGACATCTTCGCGCCTATAGTCATAG | 277
FS_1054 | ACGCATGTCCGGTAAAATGA | 278
FS_1055 | CAAGTCATTTTACCGGACAT | 279
FS_1056 | GCTCAGGAAGACTTTGCTTAAAATGGTTCAACGCTGACAAAG | 280
FS_1057 | GTTTAGAAGACTTGATCTTACAGGCTGGTTACGTTACCAG | 281
FS_1038 | ACGCATGAGTCAGAATACGCTGAAAGTT | 282
FS_1039 | CAAGAACTTTCAGCGTATTCTGACTCAT | 283
FS_1040 | GCTCAGGAAGACTTTGCTAATGAAGATGCGGAATTTGATG | 284
FS_1041 | GTTTAGAAGACTTGATCTTACTCGCGGAACAGCGC | 285
FS_1046 | ACGCATGCGAAGCTCGGCTAAGCAAGAAGAACTA | 286
FS_1047 | CAAGTAGTTCTTCTTGCTTAGCCGAGCTTCGCAT | 287
FS_1048 | GTTTAGAAGACTTTGCTTTTAAAGCATTACTTAAAGAAGAGAAATTTAGC | 288
FS_1049 | GTTTAGAAGACTTGATCTTAAAGCTCCTGGTCGAACAG | 289
FS_1123 | GCTCAGGAAGACTACCGGTGGCACGTAAGAGGTTCCAAC | 290
FS_1125 | GTTTAGGATCCGATCGCGTCTTCTGATCGTTGGAATCGCCATGGGAAGTCGAATGGAAGACTACTCTAGTAGTGCTCAGTATCTCTATC | 291
FS_1134 | GCTCAGGAAGACTTAGAGAAGCTTGCGGAGGAGCATGCATGAGCAAAGGAGAAGAACTTTTC | 292
FS_1135 | GTTTAGAAGACTTGATCCTATCATTTGTAGAGTTCATCCATGCC | 293
FS_1136 | GCTCAGGAAGACTTAGAGAAGCTTGCGGAGGAGCATGCATGGCTTCCAAGGTGTACG | 294
FS_1137 | GTTTAGAAGACTTGATCTCATTACTGCTCGTTCTTCAGCAC | 295
FS_1154 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNAGCTCGGCTAAGCAAGAAGA | 296
FS_1155 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNAGCTCGGCTAAGCAAGAAGA | 297
FS_1156 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNAGCTCGGCTAAGCAAGAAGA | 298
FS_1157 | ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNAGCTCGGCTAAGCAAGAAGA | 299
FS_1158 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCNNGGTCAACATCCGCGAGACTT | 300
FS_1159 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCNNNGGTCAACATCCGCGAGACTT | 301
FS_1160 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCNNNNGGTCAACATCCGCGAGACTT | 302
FS_1161 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCNNNNNGGTCAACATCCGCGAGACTT | 303
FS_1406 | GCTGAAAGGAGGAACTATATCCG | 304
FS_1407 | CAAAATCCCTTATAAATCAAAAGAATAGC | 305
FS_1584 | CGCCGCAAGGAATGGTGCATGCAACTAGTATACAGTGACTCTTGGCGCGCCTTGACGGCTAGCTCAGTCCTAGGTACAGTGCTAGCTACTAGAGAAAGAGGAGAAATACTAGATGAAAAAC | 306
FS_1585 | CGATCCTACAGGTGAATTCATGCCTTTAATTATAAACGCAGAAAG | 307
FS_1586 | GGCATGAATTCACCTGTAGGATCGTACAGGTTTACGCAAGAAAATGGTTTGTTATAGTCGAATAAATACTGAGTCTTCACCACGACGATTTCCGGCAGTTTCTCCACAGAAGACAACGATTAAAGGCATCAAATAAAACGAAAG | 308
FS_1587 | GAAAGTTGGAACCTCTTACGTGCCAGTCGACCCCAGCTGTCTAGGGCG | 309
FS_1588 | TCGACCATTCGACTTCCCACGATTCCAACGATCAGG | 310
FS_1589 | GATCCCTGATCGTTGGAATCGTGGGAAGTCGAATGG | 311
FS_1618 | GCTCAGGGTCTCATACTAGAGAAAGAGGAGAAATACTAGATGGAAGATGCCAAAAACATAAAG | 312
FS_1619 | GTTTAGGTCTCAATCGTCATTACACGGCGATCTTTCCG | 313
FS_1620 | GCTCAAGAAGACAAAGAGATGGCTTCCAAGGTGTACG | 314
FS_1621 | GCTCAGGGTCTCATACTATGGAAGATGCCAAAAACATAAAG | 315
FS_1574 | GCTCAGGCCATGCCGGCGGCACGTAAGAGGTTCCAAC | 316
FS_1575 | CTCCTTTGCTCATGCATGC | 317
FS_1576 | GCTCAGGCATGCATGTTCACTATAGACGAGATGCTATC | 318
FS_1577 | AAGTCGGATGTCTCTAAGATCTG | 319
FS_1641 | GCGGAGGAGCATGCATGTTTACCATCGACGAGATG | 320
FS_1642 | CAGCCGGATCTCGAGTTAG | 321

TABLE-US-00007 TABLE 7 Primers and TaqMan probes used for qRT-PCR
Primer ID Sequence (5′→3′) (SEQ ID NO)
16S rRNA E. coli TaqMan Fw TGGCGCATACAAAGAGAAGC (322)
16S rRNA E. coli TaqMan Rv ACTCCAATCCGGACTACGAC (323)
16S rRNA E. coli TaqMan probe ACCTCGCGAGAGCAAGCGGACC (324) (5′FAM/3′Black Hole Quencher 1)
sfGFP E. coli TaqMan Fw CGGATCACATGAAACGGCAT (325)
sfGFP E. coli TaqMan Rv CGTCTTGTAGGTCCCGTCAT (326)
sfGFP E. coli TaqMan probe ACCTTCGGGCATGGCACTCTTG (327) (5′HEX/3′Black Hole Quencher 1)
Rluc E. coli TaqMan Fw AATGGGTAAGTCCGGCAAGA (328)
Rluc E. coli TaqMan Rv CGTGGCCCACAAAGATGATT (329)
Rluc E. coli TaqMan probe ACCTCACCGCTTGGTTCGAGCTGC (330) (5′HEX/3′Black Hole Quencher 1)
Fluc E. coli TaqMan Fw GCTCCAACACCCCAACATCTTC (331)
Fluc E. coli TaqMan Rv GCTCCAAAACAACAACGGCG (332)
Fluc E. coli TaqMan probe CAGGTGTCGCAGGTCTTCCCGACGA (333) (5′HEX/3′Black Hole Quencher 1)

[0245] Sequences 1—RT-Cas1s, Cas2s and CRISPR Arrays

[0246] Codon-mapped DNA sequences for the individual RT-Cas1 and Cas2 orthologs were ordered from Twist Bioscience or GenScript, along with their predicted CRISPR arrays, for the classical adaptation read-out in FIGS. 6 and 7.

[0247] Bacteroides fragilis strain S14

[0248] Bacteroides fragilis strain S14 RT-Cas1 (SEQ ID NO 1)

[0249] Bacteroides fragilis strain S14 Cas2 (SEQ ID NO 2)

[0250] Bacteroides fragilis strain S14 Array (SEQ ID NO 102)

[0251] Campylobacter fetus subsp. fetus

[0252] Campylobacter fetus subsp. fetus RT-Cas1 (SEQ ID NO 3)

[0253] Campylobacter fetus subsp. fetus Cas2 (SEQ ID NO 4)

[0254] Campylobacter fetus subsp. fetus Array (SEQ ID NO 103)

[0255] Cellulomonas bogoriensis 69B4

[0256] Cellulomonas bogoriensis 69B4 RT-Cas1 (SEQ ID NO 5)

[0257] Cellulomonas bogoriensis 69B4 Cas2 (SEQ ID NO 6)

[0258] Cellulomonas bogoriensis 69B4 Array (SEQ ID NO 35)

[0259] Fusicatenibacter saccharivorans

[0260] Fusicatenibacter saccharivorans RT-Cas1 (SEQ ID NO 7)

[0261] Fusicatenibacter saccharivorans Cas2 (SEQ ID NO 8)

[0262] Fusicatenibacter saccharivorans Array 1 (SEQ ID NO 36)

[0263] Fusicatenibacter saccharivorans Array 2 (SEQ ID NO 37)

[0264] Candidatus Accumulibacter sp. SK-02

[0265] Candidatus Accumulibacter sp. SK-02 RT-Cas1 (SEQ ID NO 9)

[0266] Candidatus Accumulibacter sp. SK-02 Cas2 (SEQ ID NO 10)

[0267] Candidatus Accumulibacter sp. SK-02 Array (SEQ ID NO 38)

[0268] Micromonospora rosaria

[0269] Micromonospora rosaria RT-Cas1 (SEQ ID NO 11)

[0270] Micromonospora rosaria Cas2 (SEQ ID NO 12)

[0271] Micromonospora rosaria Array 1 (SEQ ID NO 39)

[0272] Micromonospora rosaria Array 2 (SEQ ID NO 40)

[0273] Candidatus Accumulibacter sp. BA-91

[0274] Candidatus Accumulibacter sp. BA-91 RT-Cas1 (SEQ ID NO 13)

[0275] Candidatus Accumulibacter sp. BA-91 Cas2 (SEQ ID NO 14)

[0276] Candidatus Accumulibacter sp. BA-91 Array (SEQ ID NO 41)

[0277] Desulfarculus baarsii DSM 2075

[0278] Desulfarculus baarsii DSM 2075 RT-Cas1 (SEQ ID NO 15)

[0279] Desulfarculus baarsii DSM 2075 Cas2 (SEQ ID NO 16)

[0280] Desulfarculus baarsii DSM 2075 Array (SEQ ID NO 42)

[0281] Woodsholea maritima

[0282] Woodsholea maritima RT-Cas1 (SEQ ID NO 17)

[0283] Woodsholea maritima Array (SEQ ID NO 43)

[0284] Azospirillum lipoferum 4B

[0285] Azospirillum lipoferum 4B RT-Cas1 (SEQ ID NO 19)

[0286] Azospirillum lipoferum 4B Cas2 (SEQ ID NO 20)

[0287] Azospirillum lipoferum 4B Array (SEQ ID NO 44)

[0288] Azospirillum lipoferum 4B Array 2 (SEQ ID NO 45)

[0289] Vibrio sinaloensis strain T08

[0290] Vibrio sinaloensis strain T08 RT-Cas1 (SEQ ID NO 21)

[0291] Vibrio sinaloensis strain T08 Cas2 (SEQ ID NO 22)

[0292] Vibrio sinaloensis strain T08 Array (SEQ ID NO 46)

[0293] Teredinibacter turnerae T8412

[0294] Teredinibacter turnerae T8412 RT-Cas1 (SEQ ID NO 23)

[0295] Teredinibacter turnerae T8412 Cas2 (SEQ ID NO 24)

[0296] Teredinibacter turnerae T8412 Array (SEQ ID NO 47)

[0297] Tolypothrix campylonemoides

[0298] Tolypothrix campylonemoides RT-Cas1 (SEQ ID NO 25)

[0299] Tolypothrix campylonemoides Cas2 (SEQ ID NO 26)

[0300] Tolypothrix campylonemoides Array (SEQ ID NO 48)

[0301] Oscillatoriales cyanobacterium

[0302] Oscillatoriales cyanobacterium RT-Cas1 (SEQ ID NO 27)

[0303] Oscillatoriales cyanobacterium Cas2 (SEQ ID NO 28)

[0304] Oscillatoriales cyanobacterium Array (SEQ ID NO 49)

[0305] Rivularia sp. PCC 7116

[0306] Rivularia sp. PCC 7116 Cas1 (SEQ ID NO 29)

[0307] Rivularia sp. PCC 7116 RT (SEQ ID NO 33)

[0308] Rivularia sp. PCC 7116 Cas2 (SEQ ID NO 30)

[0309] Rivularia sp. PCC 7116 Array 1 (SEQ ID NO 50)

[0310] Rivularia sp. PCC 7116 Array 2 (SEQ ID NO 51)

[0311] Eubacterium saburreum DSM 3986

[0312] Eubacterium saburreum DSM 3986 RT-Cas1 (SEQ ID NO 31)

[0313] Eubacterium saburreum DSM 3986 Cas2 (SEQ ID NO 32)

[0314] Eubacterium saburreum DSM 3986 Array 1 (SEQ ID NO 52)

[0315] Eubacterium saburreum DSM 3986 Array 2 (SEQ ID NO 53)

[0316] Sequences 2—CRISPR Array Directionality Screen

[0317] Sequences of putative arrays for the CRISPR array directionality screen related to FIG. 8b sorted by their respective ortholog. All sequences are depicted with flanking adapter sites for Gibson Assembly into their respective RT-Cas1-Cas2 expression plasmids (RC=reverse complement).
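As an illustration of the "RC" notation used for the arrays listed in this section, the reverse complement of a DNA sequence written 5′→3′ can be computed as in the following minimal Python sketch. The function name reverse_complement and the choice to pass N through unchanged are assumptions made for this example and are not taken from the specification. The primer pair FS_1003/FS_1004 from Table 6 is used only as a convenient check, because the two listed sequences are reverse complements of each other.

```python
# Complement map for the unambiguous DNA bases; N (any base) maps to itself.
# Illustrative helper only, not part of the described method.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence given 5'->3'."""
    # Complement every base, then reverse so the result also reads 5'->3'.
    return seq.translate(COMPLEMENT)[::-1]


# Sequences copied from Table 6 (SEQ ID NO 268 and 269).
fs_1003 = "GAGCTTTAGCCGCTAAGAGCATCATG"
fs_1004 = "CATGATGCTCTTAGCGGCTAAAGCTC"

# Prints True if FS_1004 is the reverse complement of FS_1003.
print(reverse_complement(fs_1003) == fs_1004)
```

Applying the function twice returns the input sequence, so an "Array 1 RC" entry can be converted back to the orientation of the corresponding "Array 1" entry in the same way.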

[0318] Bacteroides fragilis strain S14

[0319] Bacteroides fragilis strain S14 Array 1 (SEQ ID NO 54)

[0320] Bacteroides fragilis strain S14 Array 1 RC (SEQ ID NO 55)

[0321] Campylobacter fetus subsp. fetus

[0322] Campylobacter fetus subsp. fetus Array 1 (SEQ ID NO 56)

[0323] Campylobacter fetus subsp. fetus Array 1 RC (SEQ ID NO 57)

[0324] Cellulomonas bogoriensis 69B4

[0325] Cellulomonas bogoriensis 69B4 Array 1 (SEQ ID NO 58)

[0326] Cellulomonas bogoriensis 69B4 Array 1 RC (SEQ ID NO 59)

[0327] Fusicatenibacter saccharivorans

[0328] Fusicatenibacter saccharivorans Array 1 (SEQ ID NO 60)

[0329] Fusicatenibacter saccharivorans Array 1 RC (SEQ ID NO 61)

[0330] Fusicatenibacter saccharivorans Array 2 (SEQ ID NO 62)

[0331] Fusicatenibacter saccharivorans Array 2 RC (SEQ ID NO 63)

[0332] Candidatus Accumulibacter sp. SK-02

[0333] Candidatus Accumulibacter sp. SK-02 Array 1 (SEQ ID NO 64)

[0334] Candidatus Accumulibacter sp. SK-02 Array 1 RC (SEQ ID NO 65)

[0335] Micromonospora rosaria

[0336] Micromonospora rosaria Array 1A (SEQ ID NO 66)

[0337] Micromonospora rosaria Array 1 RC (SEQ ID NO 67)

[0338] Micromonospora rosaria Array 2A (SEQ ID NO 68)

[0339] Micromonospora rosaria Array 2 RC (SEQ ID NO 69)

[0340] Micromonospora rosaria Array 3A (SEQ ID NO 70)

[0341] Candidatus Accumulibacter sp. BA-91

[0342] Candidatus Accumulibacter sp. BA-91 Array 1 (SEQ ID NO 71)

[0343] Desulfarculus baarsii DSM 2075

[0344] Desulfarculus baarsii DSM 2075 Array 1 (SEQ ID NO 72)

[0345] Desulfarculus baarsii DSM 2075 Array 1 RC (SEQ ID NO 73)

[0346] Desulfarculus baarsii DSM 2075 Array 2 (SEQ ID NO 74)

[0347] Desulfarculus baarsii DSM 2075 Array 2 RC (SEQ ID NO 75)

[0348] Woodsholea maritima

[0349] Woodsholea maritima Array 1 (SEQ ID NO 76)

[0350] Woodsholea maritima Array 1 RC (SEQ ID NO 77)

[0351] Azospirillum lipoferum 4B

[0352] Azospirillum lipoferum 4B Array 1 (SEQ ID NO 78)

[0353] Azospirillum lipoferum 4B Array 1 RC (SEQ ID NO 79)

[0354] Azospirillum lipoferum 4B Array 2A (SEQ ID NO 80)

[0355] Azospirillum lipoferum 4B Array 2 RC (SEQ ID NO 81)

[0356] Teredinibacter turnerae T8412

[0357] Teredinibacter turnerae T8412 Array 1 (SEQ ID NO 82)

[0358] Teredinibacter turnerae T8412 Array 1 RC (SEQ ID NO 83)

[0359] Tolypothrix campylonemoides

[0360] Tolypothrix campylonemoides Array 1 (SEQ ID NO 84)

[0361] Tolypothrix campylonemoides Array 1 RC (SEQ ID NO 85)

[0362] Tolypothrix campylonemoides Array 2 (SEQ ID NO 86)

[0363] Tolypothrix campylonemoides Array 2 RC (SEQ ID NO 87)

[0364] Tolypothrix campylonemoides Array 3 (SEQ ID NO 88)

[0365] Tolypothrix campylonemoides Array 3 RC (SEQ ID NO 89)

[0366] Oscillatoriales cyanobacterium

[0367] Oscillatoriales cyanobacterium Array 1 (SEQ ID NO 90)

[0368] Oscillatoriales cyanobacterium Array 1 RC (SEQ ID NO 91)

[0369] Oscillatoriales cyanobacterium Array 2 (SEQ ID NO 92)

[0370] Oscillatoriales cyanobacterium Array 2 RC (SEQ ID NO 93)

[0371] Rivularia sp. PCC 7116

[0372] Rivularia sp. PCC 7116 Array 1 (SEQ ID NO 94)

[0373] Rivularia sp. PCC 7116 Array 1 RC (SEQ ID NO 95)

[0374] Rivularia sp. PCC 7116 Array 2 (SEQ ID NO 96)

[0375] Rivularia sp. PCC 7116 Array 2 RC (SEQ ID NO 97)

[0376] Eubacterium saburreum DSM 3986

[0377] Eubacterium saburreum DSM 3986 Array 1 (SEQ ID NO 98)

[0378] Eubacterium saburreum DSM 3986 Array 1 RC (SEQ ID NO 99)

[0379] Eubacterium saburreum DSM 3986 Array 2 (SEQ ID NO 100)

[0380] Eubacterium saburreum DSM 3986 Array 2 RC (SEQ ID NO 101)

[0381] Sequences 3—Miscellaneous Sequences

[0382] gBlock FS_gBlock_td_intron_acceptor (SEQ ID NO 104)

[0383] Human codon-optimized FsRT-Cas1-T7RBS-Cas2 (SEQ ID NO 34)

[0384] pFS 0453 plasmid (SEQ ID NO 334)

CPC classification (CHEMISTRY; METALLURGY): C12N2310/20, C12Q2521/107, C12N15/62, C12N9/22, C12N2800/22, C12N2800/80, C12N15/102, C12N15/11, C12Q1/6869, C07K2319/00, C12Q2521/301, C12Q1/6806, C12Q1/6883

International classification (CHEMISTRY; METALLURGY): C12N9/22, C12N15/11, C12N15/62, C12Q1/6806, C12Q1/6869, C12Q1/6883
