RNA SEQUENCING METHOD FOR THE ANALYSIS OF B AND T CELL TRANSCRIPTOME IN PHENOTYPICALLY DEFINED B AND T CELL SUBSETS

Abstract

Single-cell RNA sequencing (scRNAseq) allows the identification, characterization, and quantification of cell types in a tissue. When focused on the adaptive immune system's T and B cells, scRNAseq carries the potential to track the clonal lineage of each analyzed cell through the unique rearranged sequence of its antigen receptor (TCR or BCR, respectively), and link it to the functional state inferred from transcriptome analysis. Computational approaches to infer clonality and maturation status (for BCR only) from scRNAseq datasets of T and B cells have been developed but there are cumbersome and not costly effective. The inventors have now developed a FACS-based 5′-end RNAseq method, in particular a FACS-based 5′-end scRNAseq method, for cost effective integrative analysis of B and T cell transcriptome and paired BCR and TCR repertoire in phenotypically defined B and T cell subsets. In particular, the method of the present invention includes a reverse transcription step that uses a number of different well specific template switching oligonucleotides (TSO) to introduce a well-specific DNA barcode in the 5′-end of cDNAs.

Claims

1. A template switching oligonucleotide (TSO) comprising: a 5′-terminal PCR handle sequence, a barcode sequence, a Unique Molecular Identifier (UMI) sequence, an insulator sequence, and a 3′ terminal sequence consisting of 3 riboguanosine (rG).

2. The TSO of claim 1 wherein the 5′-terminal PCR handle sequence comprises the sequence TABLE-US-00006 (SEQ ID NO: 1) AGACGTGTGCTCTTCCGATCT

3. The TSO of claim 1 wherein the barcode sequence is selected from the group consisting of SEQ ID NO: 2 to SEQ ID NO:97 and SEQ ID NO:233 to SEQ:251.

4. The TSO of claim 1 which consists of comprises a sequence selected from the group consisting of SEQ ID NO:99 to SEQ ID NO:194.

5. A method for preparing DNA that is complementary to an RNA molecule, the method comprising conducting a reverse transcription reaction with the RNA molecule in the presence of the template switching oligonucleotide (TSO) of claim 1.

6. An RNA sequencing method comprising the steps of: a) providing a sample comprising RNA molecules, b) conducting reverse transcription (RT) of said RNA molecules by performing the method of claim 5, c) amplification of the amplifying cDNAs obtained at step b), d) pooling and purifying the cDNAs, e) preparing a cDNA library from purified cDNAs obtained in step d), and f) sequencing said cDNA library.

7. A single-cell RNA sequencing method comprising the steps of: a) isolating single cells, b) lysing the single cells and extracting RNA molecules, c) conducting reverse transcription (RT) of said RNA molecules by performing the method of claim 5, d) amplifying cDNAs obtained at step c), e) pooling and purifying the cDNAs, f) preparing a cDNA library from purified cDNAs obtained in step e), and g) sequencing said cDNA library.

8. The method of claim 6 wherein the step of conducting reverse transcription (RT) is performed using 96 different well-specific template switching oligonucleotides (TSO) to introduce a well-specific DNA barcode at the 5′-end of cDNAs, wherein said template switching oligonucleotides are sequences SEQ ID NOS: 99-194.

9. The method of claim 7 to wherein the single cells are B cells and/or T cells.

10. The method of claim 7 wherein the step of lysing is performed with a lysis mixture comprising an RNase inhibitor, an amount of dNTP and an amount of a primer suitable for priming the reverse transcription of polyadenylated mRNAs while incorporating a universal PCR handle at the 3′-end of cDNA molecules, wherein the primer comprises the sequence TGCGGTATCTAAAGCGGTGAGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN (SEQ ID NO:195), wherein V represents dG, dA, or dC and N represents dA, dT, dG, or dC.

11. The method of claim 6 wherein the step of amplifying is performed by PCR-based amplification and uses a pair of primers comprising a forward primer having the sequence AGACGTGTGCTCTTCCGATCT (SEQ ID NO:196) and a reverse primer having the sequence TGCGGTATCTAAAGCGGTGAG (SEQ ID NO:197).

12. The method of claim 6 wherein the step of preparing a cDNA library comprises subjecting the purified cDNAs to a tagmentation reaction with a plurality of adapters sequences as set forth in SEQ ID NOS: 214-229.

13. The method of claim 6 wherein the step of sequencing of the cDNA library is performed with the primers SEQ ID NOS: 230-232.

14. A method of performing an integrative analysis of a B and T cell transcriptome and paired T cell receptor (TCR)/B cell receptor (BCR) repertoire in phenotypically defined B and T cell subsets of a subject, comprising a) obtaining from the subject B and T cells that are phenotypically defined, b) lysing the B and T cells, c) extracting RNA molecules from a lysate obtained in step b), d) conducting reverse transcription (RT) of the RNA molecules to obtain cDNAs by performing the method of claim 5, e) amplifying the cDNAs, f) pooling and purifying the cDNAs to obtain purified cDNAs, g) preparing a cDNA library from the purified cDNAs, h) sequencing the cDNA library, and i) performing the integrative analysis using sequence data obtained in the sequencing step.

15. A method of, for B and T cell subsets: obtaining a dataset that includes sequence information, representation of V, D, J, C, VJ, VDJ, VJC, VDJC, antibody heavy chain, antibody light chain, CDR3, or T-cell receptor usage, representation for abundance of V, D, J, C, VJ, VDJ, VJC, VDJC, antibody heavy chain, antibody light chain, CDR3, or T-cell receptor and unique sequences; representation of mutation frequency, correlative measures of VJ V, D, J, C, VJ, VDJ, VJC, VDJC, antibody heavy chain, antibody light chain, CDR3, or T-cell receptor usage, comprising performing an integrative analysis of a B and T cell transcriptome and paired T cell receptor (TCR)/B cell receptor (BCR) repertoire for the B and T cell subsets by the method of claim 14.

16. The method of claim 15 wherein results obtained in said performing step are output or stored in a database of repertoire analyses, and used in comparisons with a reference or control repertoire to make a desired analysis.

17. A method of, in a subject: diagnosing an immune response, monitoring an immune response after or during a therapy, assessing a vaccine response, assessing clonal rearrangements and/or chromosomal translocations that occur in lymphoma, assessing an immune response that could lead to transplant rejection assessing immunosenescence, or for diagnosing immunodeficiencies, the method comprising performing an integrative analysis of phenotypically defined B and T cells of the subject by the method of claim 14, wherein results obtained from the step of performing are used to diagnose the immune response, monitor the immune response, assess the vaccine response, assess the clonal rearrangements and/or chromosomal translocations, assess the immune response that could lead to transplant rejection, assess the immunosenescence or diagnose the immunodeficiencies in the subject.

18. A method for selecting an antibody that specifically binds to an antigen of interest comprising (a) immunizing an animal with an antigen of interest; (b) isolating a plurality of B-cells from the immunized animal; (c) characterizing the plurality of B cells by carrying out the scRNAseq method of claim 6 and (d) providing the sequences of the antibody of interest.

19. A kit which comprises a plurality of TSO according to claim 1.

20. The kit of claim 19 which comprises the 96 TSO of SEQ ID NO:99 to SEQ ID NO:194.

21. The kit of claim 19 which further comprises one or more of a panel of antibodies for cell sorting, primers, dNTPs, adapter sequences and/or a post synthesis labelling reagent at least one buffer mediums, and purification beads.

22. The kit of claim 19 which further comprises a software package for statistical analysis, wherein the software package optionally includes a reference database for calculating the probability of a match between two repertoires.

Description

FIGURES

[0140] FIG. 1: Overview of FB5Pseq experimental workflow. Schematic illustration of the mapping of Read1 sequences on IGH and IGK or IGL amplified cDNA, enabling the in silico reconstruction of paired variable BCR sequences.

[0141] FIG. 2: Overview of FB5Pseq bioinformatics workflow. Major steps of the bioinformatics pipeline starting from Read1 and Read2 FASTQ files for the generation of single-cell gene expression matrices and BCR or TCR repertoire sequences.

[0142] FIG. 3: FB5seq quality metrics on human tonsil B cell subsets. (A) Experimental workflow for studying human tonsil B cell subsets with FB5Pseq. (B) Per cell quantitative accuracy of FB5Pseq computed based on ERCC spike-in mRNA detection (see Methods) for Memory B cells (Mem, n=73 Tonsil 1, n=65 Tonsil 2), GC B cells (GC, n=235 Tonsil 1, n=242 Tonsil 2) and PB/PC cells (n=78 Tonsil 1, n=152 Tonsil 2). Black line indicates the median with 95% confidence interval error bars. (C) Molecular sensitivity of FB5Pseq computed on ERCC spike-in mRNA detection rates (see Methods) in two distinct experiments. Dashed lines indicate the number of ERCC molecules required to reach a 50% detection probability. (D-E) Total number of unique genes (D) and molecules (E) detected in human tonsil Mem B cells (n=73 Tonsil 1, n=65 Tonsil 2), GC B cells (n=235 Tonsil 1, n=242 Tonsil 2) and PB/PCs (n=78 Tonsil 1, n=152 Tonsil 2). Black line indicates the median with 95% confidence interval error bars. (F) Pie charts showing the relative proportion of cells with reconstructed productive IGH and IGK/L sequences (1), only IGK/L sequences (2 only IGH sequences (3) or no BCR sequence (white) among Mem B cells, GC B cells and PB/PC cells from Tonsil 1 and Tonsil 2 samples. Total number of cells analyzed for each subset is indicated at the center of the pie chart.

[0143] FIG. 4: FB5Pseq analysis of human tonsil B cell subsets. (A) Scatter plots showing IGH mutation frequency in human Tonsil 1 (circles) and Tonsil 2 (triangles) B cells sorted by their IGH isotype and phenotype (Mem B cells: n=11 IgM/IgD+, n=37 IgG+ and n=26 IgA+; GC B cells: n=55 IgM/IgD+, n=174 IgG+ and n=32 IgA+; PB/PC: n=4 IgM/IgD+, n=179 IgG+ and n=42 IgA+PB/PCs. Black line indicates the median. (B) Scatter plots showing IGK/L mutation frequency in human Tonsil 1 (circles) and Tonsil 2 (triangles) B cells sorted by their IGK/L isotype and phenotype (Mem B cells: n=71 Igκ+, n=51 Igλ+; GC B cells: n=253 Igκ+, n=163 Igλ+; PB/PCs: n=139 Igκ+, n=84 Igλ+).

[0144] FIG. 5: FB5Pseq analysis of human peripheral blood antigen-specific CD4 T cells. (A) Experimental workflow for studying human peripheral blood Candida albicans-specific CD4 T cells with FB5Pseq. (B) Per cell quantitative accuracy of FB5Pseq computed based on ERCC spike-in mRNA detection (see Methods) for Candida albicans-specific CD4 T cells (n=82). Black line indicates the median with 95% confidence interval error bars. (C) Total number of unique genes detected in Candida albicans-specific CD4 T cells (n=82). Black line indicates the median with 95% confidence interval error bars. (D) Pie charts showing the relative proportion of cells with reconstructed productive TCRA and TCRB sequences (black), only TCRB sequences (3), only TCRA sequences (2) or no TCR sequence (1) among Candida albicans-specific CD4 T cells (n=82). (E) Distribution of TCRB clones among Candida albicans-specific CD4 T cells (n=67). Black sectors indicate the proportion of TCRB clones (clonotype expressed by >2 cells) within single-cells analyzed (white sector: unique clonotypes).

EXAMPLE 1

[0145] Material & Methods

[0146] Human Samples

[0147] Non-malignant tonsil samples from a 35-year old male (Tonsil 1) and a 30-year old female (Tonsil 2) were obtained as frozen live cell suspensions from the CeVi collection of the Institute Carnot/Calym (ANR, France, https://www.calym.org/-Viable-cell-collection-CeVi-.html). Peripheral blood mononuclear cells (PBMCs) were collected in Nantes University Hospital and used fresh in peptide restimulation assays for isolating C.alb-specific T cells. Written informed consent was obtained from the donors.

[0148] Flow Cytometry and Cell Sorting of B Cell Subsets

[0149] Frozen live cell suspensions were thawed at 37° C. in RPMI+10% FCS, then washed and resuspended in FACS buffer (PBS+5% FCS+2 mM EDTA) at a concentration of 10.sup.8 cells/ml for staining. Cells were first incubated with 2% normal mouse serum and Fc-Block (BD Biosciences) for 10 min on ice. Then cells were incubated with a mix of fluorophore-conjugated antibodies for 30 min on ice. Cells were washed in PBS, then incubated with the Live/Dead Fixable Aqua Dead Cell Stain (Thermofisher) for 10 min on ice. After a final wash in FACS buffer, cells were resuspended in FACS buffer at a concentration of 10.sup.7 cells/ml for cell sorting on a 4-laser BD FACS Influx (BD Biosciences).

[0150] Mem B cells were gated as CD3.sup.−CD14.sup.−IgD.sup.−CD20.sup.+CD10.sup.−CD38.sup.loCD27.sup.+SSC.sup.lo single live cells. GC B cells were gated as CD3.sup.−CD14.sup.−IgD.sup.−CD20.sup.+CD10.sup.+CD38.sup.+single live cells. PB/PC cells were gated as CD3.sup.−CD141gD.sup.−CD38.sup.hiCD27.sup.+SSC.sup.hi single live cells.

[0151] Restimulation and Cell Sorting of Antigen-Specific T Cells.

[0152] Fresh PBMCs (10-20×10.sup.6 cells, final concentration 10×10.sup.6 cells/ml) were stimulated for 3 h at 37° C. with 0.6 nmol/ml PepTivator Candida albicans MP65 (pool of 15 amino acids length peptides with 11 amino acid overlap, Miltenyi Biotec) in RPMI+5% human serum in the presence of 1 μg/ml anti-CD40 (HB14, Miltenyi Biotec). After stimulation, PBMCs were labeled with PE-conjugated anti-CD154 (5C8, Miltenyi Biotec) and enriched with anti-PE magnetic beads (Miltenyi Biotec). After enrichment, cells were stained with PerCP-Cy5.5 anti-CD4 (RPA-T4, Biolegend), AlexaFluor700 anti-CD3 (SK7, Biolegend) and APC-Cy7 anti-CD45RA (HI100, Biolegend), and antigen-specific T cells were gated as CD3.sup.+CD4.sup.+CD45RA.sup.− CD154.sup.+single live cells for single-cell sorting.

[0153] Single-Cell RNAseq

[0154] Single cells were FACS sorted into ice-cold 96-well PCR plates (Thermofisher) containing 2 μl lysis mix per well. The lysis mix contained 0.5 μl 0.4% (v/v) Triton X-100 (Sigma-Aldrich), 0.05 μl 40 U/μl RnaseOUT (Thermofisher), 0.08 μl 25 mM dNTP mix (Thermofisher), 0.5 μl 10 μM (dT)30_Smarter primer, 0.05 μl 0.5 pg/μl External RNA Controls Consortium (ERCC) spike-ins mix (Thermofisher), and 0.82 μl PCR-grade H.sub.2O (Qiagen).

[0155] For B cell subsets sorting, the index-sorting mode was activated to record the different fluorescence intensity of each sorted single-cell. Index-sorting FCS files were visualized in FlowJo software and compensated parameters values were exported in CSV tables for further processing. For visualization on linear scales in the R programming software, we applied the hyperbolic arcsine transformation on fluorescence parameters. In every 96-well plate, two wells (H1, H12) were left empty and processed throughout the protocol as negative controls.

[0156] Immediately after cell sorting, each plate was covered with adhesive film (Thermofisher), briefly spun down in a benchtop plate centrifuge, and frozen on dry ice. Plates containing single cells in lysis mix were stored at −80° C. and shipped on dry ice (only T cells) until further processing.

[0157] The plate containing single cells in lysis mix was thawed on ice, briefly spun down in a benchtop plate centrifuge, and incubated in a thermal cycler for 3 minutes at 72° C. (lid temperature 72° C.). Immediately after, the plate was placed back on ice and 3 μl RT mastermix was added to each well. The RT mastermix contained 0.25 μl 200 U/μl SuperScript II (Thermofisher), 0.25 μl 40 U/μl RnaseOUT (Thermofisher), and 2.5 μl 2×RT mastermix. The 2×RT mastermix contained 1 μl 5× SuperScript II buffer (Thermofisher), 0.25 μl 100 mM DTT (Thermofisher), 1 μl 5 M betaine (Sigma-Aldrich), 0.03 μl 1 M MgCl.sub.2 (Sigma-Aldrich), 0.125 μl 100 μM well-specific template switching oligonucleotide TSO BCx UMI5 TATA, and 0.095 μl PCR-grade H.sub.2O (Qiagen). Reverse transcription was performed in a thermal cycler (lid temperature 70° C.) by 90 min at 42° C., followed by 10 cycles of 2 min at 50° C. and 2 min at 42° C., then 15 min at 70° C. Plates with single-cell cDNA were stored at −20° C. until further processing.

[0158] For cDNA amplification, 7.5 μl LD-PCR mastermix were added to each well. The LD-PCR mastermix contained 6.25 μl 2×KAPA HiFi HotStart ReadyMix (Roche Diagnostics), 0.125 μl 20 μM PCR_Satij a forward primer, 0.125 μl 20 μM SmarterR reverse primer, and 1 μl PCR-grade H.sub.2O (Qiagen). The amplification was performed in a thermal cycler (lid temperature 98° C.) by 3 min at 98° C., followed by 22 cycles of 15 sec at 98° C., 20 sec at 67° C., 6 min at 72° C., then a final elongation for 5 min at 72° C. Plates with amplified single-cell cDNA were stored at −20° C. until further processing.

[0159] For library preparation, 5 μl amplified cDNA from each well of a 96-well plate were pooled and completed to 500 μl with PCR-grade H.sub.2O (Qiagen). Two rounds of 0.6× solid-phase reversible immobilization beads (AmpureXP, Beckman, or CleanNGS, Proteigene) cleaning were used to purify 100 μl pooled cDNA with final elution in 15 μl PCR-grade H.sub.2O (Qiagen). After quantification with Qubit dsDNA HS assay (Thermofisher), 800 pg purified cDNA pool were processed with the Nextera XT DNA sample Preparation kit (Illumina), according to the manufacturer's instructions with modifications to enrich 5′-ends of tagmented cDNA during library PCR. After tagmentation and neutralization, 25 μl tagmented cDNA was amplified with 15 μl Nextera PCR Mastermix, 5 μl Nextera i5 primer (S5xx, Illumina), and 5 μl of a custom i7 primer mix (0.5 μM i7_BCx+10 μM i7_primer). The amplification was performed in a thermal cycler (lid temperature 72° C.) by 3 min at 72° C., 30 sec at 95° C., followed by 12 cycles of 10 sec at 95° C., 30 sec at 55° C., 30 sec at 72° C., then a final elongation for 5 min at 72° C. The resulting library was purified with 0.8× solid-phase reversible immobilization beads (AmpureXP, Beckman, or CleanNGS, Proteigene).

[0160] Libraries generated from multiple 96-well plates of single cells and carrying distinct i7 barcodes were pooled for sequencing on an Illumina NextSeq550 platform, with High Output 75 cycles flow cells, targeting 5×10.sup.5 reads per cell in paired-end single-index mode with the following primers and cycles: Read1 (Read1_SP, 67 cycles), Read i7 (i7_SP, 8 cycles), Read2 (Read2_SP, 16 cycles).

[0161] Single-Cell RNAseq Data Processing

[0162] We used a custom bioinformatics pipeline to process fastq files and generate single-cell gene expression matrices and BCR or TCR sequence files. Detailed instructions for running the FB5P-seq bioinformatics pipeline can be found at https://github.com/MilpiedLab/. Briefly, the pipeline to obtain gene expression matrices was adapted from the Drop-seq pipeline, relied on extracting the cell barcode and UMI from Read2 and aligning Read1 on the reference genome using STAR and HTSeqCount. For BCR or TCR sequence reconstruction, we used Trinity for de novo transcriptome assembly for each cell based on Read1 sequences, then filtered the resulting isoforms for productive BCR or TCR sequences using MigMap, Blastn and Kallisto. Briefly, MigMap was used to assess whether reconstructed contigs corresponded to a productive V(D)J rearrangement and to identify germline V, D and J genes and CDR3 sequence for each contig. For each cell, reconstructed contigs corresponding to the same V(D)J rearrangement were merged, keeping the largest sequence for further analysis. We used Blastn to align the reconstructed BCR or TCR contigs against reference sequences of constant region genes, and discarded contigs with no constant region identified in-frame with the V(D)J rearrangement. Finally, we used the pseudoaligner Kallisto to map each cell's FB5Pseq Read1 sequences on its reconstructed contigs and quantify contig expression. In cases where several contigs corresponding to the same BCR or TCR chain had passed the above filters, we retained the contig with the highest expression level.

[0163] The per well accuracy (FIG. 3B) was computed as the Pearson correlation coefficient between log.sub.10(UMI.sub.ERCC-xxxxx+1) and log.sub.10(#mol.sub.ERCC-xxxxx+1), where UMI.sub.ERCC-xxxxx is the UMI count for gene ERCC-xxxxx in the well, and #mol.sub.ERCC-xxxxx is the actual number of molecules for ERCC-xxxxx in the well (based on a 1:2,000,000 dilution in 2 μl lysis mix per well). For each well, only ERCC-xxxxx which were detected (UMI.sub.ERCC-xxxxx>0) were considered for calculating the accuracy.

[0164] To estimate sensitivity (FIG. 3C), the percentage of wells with at least one molecule detected (UMI.sub.ERCC-xxxxx>0) was calculated over all the wells from 5 or 6 96-well plates corresponding to human B cells sorted from Tonsil 1 or Tonsil 2, respectively. The value for each ERCC-xxxxx gene was plotted against log.sub.10(#mol.sub.ERCC-xxxxx) and a standard curve was interpolated with asymmetric sigmoidal 5PL model in GraphPad Prism 8.1.2 to compute the EC50 for each dataset.

[0165] The normalized coverage over genes (data not shown) was computed with RSeQC geneBody_coverage on bam files from 11 scRNAseq 96-well plates corresponding to human B cells sorted from Tonsil 1 and Tonsil 2.

[0166] Single-Cell Gene Expression Analysis

[0167] Quality control was performed on each dataset (Tonsil 1, Tonsil 2, T cells) independently to remove poor quality cells. Cells with less than 250 genes detected were removed. We further excluded cells with values below 3 median absolute deviations (MADs) from the median for UMI counts, for the number of genes detected, or for ERCC accuracy, and cells with values above 3 MADs from the median for ERCC transcript percentage.

[0168] For each cell, gene expression UMI count values were log normalized with Seurat v3 NormalizeData with a scale factor of 10,000. Data from B cells of Tonsil 1 and Tonsil 2 were analyzed together. Data from C.alb-specific T cells were analyzed separately. Four thousand variable genes, excluding BCR or TCR genes, were identified with Seurat

[0169] Find VariableFeatures. After scaling with Seurat ScaleData, principal component analysis was performed on variable genes with Seurat RunPCA, and embedded in two-dimensional tSNE plots with Seurat RunTSNE on 40 principal components. Cell cycle phases were attributed with Seurat CellCycleScoring. Plots showing tSNE embeddings colored by index sorting protein expression or other metadata (including BCR or TCR sequence related informations) were generated with ggplot2 ggplot. Plots showing tSNE embeddings colored by gene expression were generated by Seurat FeaturePlot. Gene expression heatmaps were plotted with a custom function (available upon request).

[0170] Results

[0171] FB5Pseq Experimental Workflow

[0172] We based the design of the FB5Pseq experimental workflow on existing full-length.sup.3 and 5′-end.sup.4,5 scRNAseq protocols. The main originalities in FB5Pseq were to perform cell-specific barcoding and incorporate 5 bp UMI during reverse transcription, and sequence the 5′-ends of amplified cDNAs from their 3′-end, and not from the transcription start site (FIG. 1A-B). In FB5Pseq, single cells of interest are sorted in 96-well plates by FACS, routinely using a 10-color staining strategy to identify and enrich for specific subsets of B or T cells while recording all parameters through index sorting. Single-cells are collected in lysis buffer containing External RNA Controls Consortium (ERCC) spike-in mRNA (0.025 pg per well) and sorted plates are immediately frozen on dry ice and stored at −80° C. until further processing. The amount of ERCC spike-in mRNA added to each well was optimized to yield around 5% of sequencing reads covering ERCC genes when studying lymphocytes which generally contain little mRNA. mRNA reverse transcription (RT), cDNA 5′-end barcoding and PCR amplification are performed with a template switching (TS) approach. Notably, our TSO design included a PCR handle (different from the one introduced at the 3′-end upon RT priming), an 8 bp well-specific barcode followed by a 5 bp UMI, a TATA spacer.sup.6, and three riboguanines. We empirically selected the 96 well-specific barcode sequences to avoid TSO concatemers in FB5Pseq libraries. After amplification, barcoded full-length cDNA from each well are pooled for purification and one-tube library preparation. For each plate, an Illumina sequencing library targeting the 5′-end of barcoded cDNA is prepared by a modified transposase-based method incorporating a plate-associated i7 barcode. The FB5Pseq library preparation protocol is cost-effective (260 € for library preparation of a 96-well plate), easily scalable and may be implemented on a pipetting robot.

[0173] FB5Pseq libraries are sequenced in paired-end single-index mode with Read1 covering the gene insert from its 3′-end, Read i7 assigning the plate barcode, and Read2 covering the well barcode and UMI. Because FB5Pseq libraries have a broad size distribution, with a gene insert of 100-850 bp, Read 1 sequences cover the 5′-end of transcripts approximately from 30 to 850 bases downstream of the transcription start site. Consequently, sequencing reads cover the whole variable and a significant portion of the constant region of the IGH and IGK/L expressed mRNAs (FIG. 1), enabling in silico assembly and reconstitution of BCR repertoire from scRNAseq data. Because TCRα and TCRβ genes share a similar structure, FB5Pseq is equally suitable for reconstructing TCR repertoire from scRNAseq data when T cells are analyzed.

[0174] FB5Pseq Bioinformatics Workflow

[0175] The FB5Pseq data is processed to generate both a single-cell gene count matrix and single-cell BCR or TCR repertoire sequences when analyzing B cells or T cells, respectively. After extracting the well-specific barcode and UMI from Read2 sequences and filtering out low quality or unassigned reads, we use two separate pipelines for gene expression and repertoire analysis (FIG. 2). The transcriptome analysis pipeline was derived from the Drop-seq pipeline.sup.7. Briefly, it consists of mapping all Read1 sequences to the reference genome, then quantifying, for each gene in each cell, the number of unique molecules through UMI sequences. After merging the data from all 96-well plates in the experiment, we filter the resulting gene-by-cell count matrices to exclude low quality cells, and normalize by total UMI content per cell.

[0176] For the extraction of BCR or TCR repertoire sequences from FB5Pseq data, we have developed our own pipeline based on de novo single-cell transcriptome assembly and mapping of reconstituted long transcripts (contigs or isoforms) on public databases of variable immunoglobulin or TCR genes. We identify and select contigs corresponding to productive V(D)J rearrangements in-frame with an identified constant region gene. In cases where multiple isoforms are identified for a given chain (e.g. IGH) in a single cell, we assign the most highly expressed isoform and discard the other one(s). In early validation experiments, our pipeline was equally efficient and accurate as RT-PCR followed by Sanger sequencing for IGH variable region analysis (data not shown), with the major advantage of retrieving complete variable regions and large portions of constant regions of both IGH and IGK/L, or TCRA and TCRB, from the same scRNAseq run.

[0177] FB5Pseq Quality Metrics on Human Tonsil B Cell Subsets

[0178] To illustrate the performance of our scRNAseq protocol, we obtained non-malignant tonsil cell suspensions from two adult human donors, referred to as Tonsil 1 and Tonsil 2. Based on surface marker staining, we excluded monocytes, T cells and naïve B cells, and sorted memory (Mem) B cells, germinal center (GC) B cells, and plasmablasts or plasma cells (PB/PCs) for FB5Pseq analysis (FIG. 3A). We processed Tonsil 1 and Tonsil 2 samples in two separate experiments, generating libraries from 5 and 6 plates respectively. Libraries were sequenced at an average depth of approximately 500,000 reads per cell (data not shown). After bioinformatics quality controls, we retained more than 90% of cells in the gene expression dataset (data not shown). We computed per cell accuracy (FIG. 3B) and per experiment sensitivity (FIG. 3C) based on ERCC spike-in detection levels and rates, respectively.sup.1,2. All cells showed high quantitative accuracy independently of their phenotype, with an overall mean correlation coefficient of 0.83 (FIG. 3B). The molecular sensitivity ranged from 9.5 to 21.2 (FIG. 3C), which compares favorably with other current scRNAseq protocols.sup.2. We detected a mean of 987, 1712 and 1307 genes per cell in Mem B cells, GC B cells and PB/PCs, respectively (FIG. 3D). GC and Mem B cells displayed higher total molecule counts (mean UMI counts of 192,765 and 145,356, respectively) than PB/PCs (mean UMI count of 67,861) (FIG. 3E).

[0179] As expected from the method design, FB5Pseq Read1 sequence coverage was biased towards the 5′-end of gene bodies, with a broad distribution robustly covering from the 3.sup.rd to the 60.sup.th percentile of gene body length on average (data not shown). In Tonsil 1 and Tonsil 2 B cell subsets, the BCR reconstruction pipeline retrieved at least one productive BCR chain for the majority of the cells (FIG. 3F). Consistent with high expression of BCR gene transcripts for sustained antibody production, we obtained the paired IGH and IGK/L repertoire for the vast majority of PB/PCs. In Mem and GC B cells, we obtained paired IGH and IGK/L sequences on approximately 50% of the cells, and only the IGK/L sequence in most of the remaining cells. The superior recovery of IGK/L sequences was likely because the expression level of IGK/L was about 2-fold higher than IGH in our FB5Pseq data (data not shown).

[0180] Altogether, accuracy, sensitivity, gene coverage and BCR sequence recovery highlighted the high performance of the FB5Pseq method for integrative analysis of transcriptome and BCR repertoire in single B cells.

[0181] FB5Pseq Analysis of Human Tonsil B Cell Subsets

[0182] As a biological proof-of-concept, we further analyzed the Tonsil 1 and Tonsil 2 datasets. T-distributed stochastic neighbor embedding (t-SNE) analysis on the gene expression data discriminated three major cell clusters. Tonsil B cells clustered based on their sorting phenotype (Mem B cells, GC B cells or PB/PC) and did not cluster by sample origin (data not shown). Cell cycle status further separated the cycling (S and G2/M phase) from the non-cycling (G1) GC B cells (data not shown). The expression levels of surface protein markers recorded through index sorting were consistent with the gating strategy of Mem B cells (CD20.sup.+CD38.sup.lo CD10.sup.−CD27.sup.+), GC B cells (CD20.sup.+CD38.sup.+CD10.sup.+) and PB/PCs (CD38.sup.hiCD27.sup.hi) (data not shown). The expression of the corresponding mRNAs mirrored the protein expression (data not shown), but revealed numerous cells where the mRNA was undetected despite intermediate or high levels of the protein. Further, we detected the expression of known marker genes for Mem B cells (CCR7, SELL, GPRI83) GC B cells (AICDA, MKI67, CD81) or PB/PC PRDM1, IRF4) in the corresponding clusters (data not shown), and identified the top marker genes for each cell subset (data not shown). Those analyses were consistent with previous single-cell qPCR analyses' and bulk microarray analyses of human B cell subsets.sup.9,10.

[0183] Integrating the single-cell BCR repertoire data to the t-SNE embedding, we revealed that the IGH and IGK/L repertoire of tonsil B cell subsets was polyclonal (data not shown). Interestingly, while the somatic mutation load was equivalent in Igκ and Igλ light chains from Mem B cells, GC B cells and PB/PCs (FIG. 4B), the IGH mutation rate depended on isotype, with IgA cells expressing the most mutated BCR (FIG. 4A) regardless of phenotype or sample origin. By contrast, IgM/IgD.sup.+ cells exhibited the lowest somatic mutation loads (FIG. 4A).

[0184] Overall, those analyses confirmed that the FB5Pseq method is relevant for simultaneous protein, whole-transcriptome and BCR sequence analysis in human B cells.

[0185] FB5Pseq Analysis of Human Tonsil B Cell Subsets

[0186] To test whether our protocol is also effective in T cells, we applied FB5Pseq to Candida albicans-specific human CD4 T cells sorted after a brief restimulation of fresh peripheral blood mononuclear cells with a pool of MP65 antigen-derived peptides (FIG. 5A and Methods). Candida albicans is a common commensal in humans, known to generate antigen-specific circulating memory CD4 T cells with a TH17 profile. Similar to the B cell dataset, the T cell dataset displayed high per cell accuracy (FIG. 5B) and an average of 1890 detected genes per cell (FIG. 5C). Gene expression analysis showed an efficient detection of T cell marker genes (CD3E), activation genes (CD40LG, EGR2, NR4A1, IL2), and TH17-specific genes (CCL20, CSF2, IL22, IL23A, IL17A) in those reactivated antigen-specific T cells (data not shown). We recovered at least one productive TCRα or TCRβ chain in 88% of cells, and paired TCRαβ repertoire in 61% of cells (FIG. 5D). Moreover, CDR3β sequence analysis revealed some expanded TCRβ clonotypes likely related to MP65 antigen-specificity (FIG. 5E). Principal Component Analysis (PCA) of the gene expression data and visualization of V.sub.β-J.sub.β TCR rearrangements revealed no apparent segregation of antigen-specific T cells expressing different clonotypes (data not shown).

[0187] Taken together, these data indicate that our method is also relevant for integrative single-cell RNAseq analysis of human T cells.

Example 2

[0188] We adapted FBSP-seq to study the transcriptional response of human GC B cells to diverse combinations of stimuli by bulk RNA-seq. Briefly, we bulk-sorted GC B cells from human tonsils by FACS, and cultured them in vitro in the presence of any possible combination of five stimuli (IL4, IL10, 1L21, CD40L, anti-BCR, 32 combinations in total) at a density of 500 cells per well. After 6 hours, cells were washed in PBS, lyzed in RLT buffer, and RNA was captured by SPRI bead precipitation. The captured RNA was then eluted in FBSP-seq lysis buffer, and each 500-cell RNA sample was processed with the adapted FBSP-seq protocol (with only 16 cycles of PCR for cDNA amplification). Libraries corresponding to four 96-well plates (3 human donors×32 conditions×3 replicates+control conditions) were prepared and sequenced on a 75 cycles HighOutput Illumina NextSeq550 run, generating RNA-seq results for over 300 samples in a single run.

[0189] The corresponding data were analyzed to identify the top 10 induced genes by single-stimulus activation and their expression in all combinations (data not shown).

REFERENCES

[0190] Throughout this application, various references describe the state of the art to which this invention pertains. The disclosures of these references are hereby incorporated by reference into the present disclosure. [0191] 1. Ziegenhain, C. et al. Comparative Analysis of Single-Cell RNA Sequencing Methods. Molecular Cell 65, 631-643.e4 (2017). [0192] 2. Svensson, V. et al. Power analysis of single-cell RNA-sequencing experiments. Nat Meth 14, 381-387 (2017). [0193] 3. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc 9, 171-181 (2014). [0194] 4. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495-502 (2015). [0195] 5. Arguel, M.-J. et al. A cost effective 5′ selective single cell transcriptome profiling approach with improved UMI design. Nucleic Acids Res 45, e48 (2017). [0196] 6. Tang, D. T. P. et al. Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching. Nucleic Acids Res 41, e44 (2013). [0197] 7. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202-1214 (2015). [0198] 8. Milpied, P. et al. Human germinal center transcriptional programs are de-synchronized in B cell lymphoma. Nature Immunology 19, 1013 (2018). [0199] 9. Victora, G. D. et al. Identification of human germinal center light and dark zone cells and their relationship to human B-cell lymphomas. Blood 120, 2240-2248 (2012). [0200] 10. Seifert, M. et al. Functional capacities of human IgM memory B cells in early inflammatory responses and secondary germinal center reactions. Proc Natl Acad Sci USA 112, E546-E555 (2015).

RNA SEQUENCING METHOD FOR THE ANALYSIS OF B AND T CELL TRANSCRIPTOME IN PHENOTYPICALLY DEFINED B AND T CELL SUBSETS

Inventors

Cpc classification

Classification Explorer

C12Q2525/121

CHEMISTRY; METALLURGY

Classification Explorer

C12N15/1065

CHEMISTRY; METALLURGY

Classification Explorer

C12Q2521/107

CHEMISTRY; METALLURGY

Classification Explorer

C12Q2563/179

CHEMISTRY; METALLURGY

Classification Explorer

C12Q2525/121

CHEMISTRY; METALLURGY

Classification Explorer

C12N15/1096

CHEMISTRY; METALLURGY

Classification Explorer

C12Q1/6869

CHEMISTRY; METALLURGY

Classification Explorer

C12Q2563/179

CHEMISTRY; METALLURGY

Classification Explorer

C12Q1/6881

CHEMISTRY; METALLURGY

Classification Explorer

C12Q2521/107

CHEMISTRY; METALLURGY

Classification Explorer

C12Q1/6869

CHEMISTRY; METALLURGY

International classification

Classification Explorer

C12Q1/6881

CHEMISTRY; METALLURGY

Classification Explorer

C12N15/10

CHEMISTRY; METALLURGY

Abstract

Claims

Description