Immunotherapy Targeting Tumor Neoantigenic Peptides

20220401539 · 2022-12-22

Abstract

The present disclosure relates to a tumor specific neoantigenic peptide, wherein said peptide (i) is encoded by a part of an open reading frame (ORF) sequence from an unannotated transcript whose transcription is positively regulated by an aberrant fusion protein, and (ii) is expressed at a higher level or frequency in a sample from said tumor compared to a normal tissue sample. The present disclosure also relates to vaccines or immunogenic compositions, antibodies and immune cells derived therefrom, and their use in the therapy of cancer.

Claims

1-19. (canceled)

20. A method for identifying tumor specific neoantigenic peptides which comprises: i. identifying transcripts from one or more samples isolated from a tumor driven by a transcription factor fusion and obtained from one or more subjects, which transcription is specifically positively regulated by said transcription factor fusion, which are specifically associated with the transcription-fusion tumor type, and optionally which are encoded by neogenes that originate from intergenic or intronic regions of the genome, ii. identifying open reading frame (ORF) sequences from the transcripts of step (i), optionally wherein said ORF sequences are specifically expressed in a tissue (or cell) sample from said transcription factor fusion-driven tumor.

21. A tumor specific neoantigenic peptide, wherein said tumor is associated with a transcription factor fusion, and wherein said peptide i) is encoded by a part of an (ORF) sequence from a neotranscript wherein: a. its expression is regulated by a transcription factor fusion as evidenced by expression in cell line wherein the expression of said transcription factor fusion is made inducible, b. it is specifically associated with the fusion-driven tumor type, c. it is encoded by genome regions having binding motifs involved in promoter regulation such as a poly GGAA motif and/or histone marks activation, such as H3K27ac and H3K4me3 histone, located at 5 kb or less of the TSS. ii) is expressed at a higher level or frequency in a sample from said tumor compared to normal tissue sample.

22. The tumor specific neoantigenic peptide according to claim 21 wherein the aberrant fusion transcription factor is encoded by any one of the genes selected from PAX3-FOXO1, PAX7-FOXO1, ASPSCR1-TFE3, AHRR-NCOA2, EWSR1-CREB1, EWSR1-ATF1, FUS-ATF1, EWSR1-CREB1, COL1A1-PDGFB, EWSR1-WT1, WWTR1-CAMTA1, TFE3-YAP1, EWSR1-FLI1, EWSR1-ERG, EWSR1 fusion with various ETS partners such as ETV1 FEV and ETV4, FUS-ERG, EWSR1-NFATC2, CIC-DUX4, BCOR-CCNB3, EWSR1-NR4A3, TAF15-NR4A3, TCF12-NR4A3, TFG-NR4A3, ETV6-NTRK3, ALK-TPM4, ALK-TPM3, ALK-CLTC, ALK-RANBP2, ALK-ATIC, ALK-SEC31A, ALK-CARS, PLAF fusions, HMGA2 fusions, HMGA1 fusions, C-MKL2, f95-MKL2, FUS-CREB3L2, EWSR1-ZNF444, EWSR1-PBX1, EWSR1-POU5F1, FUS-DDIT3, EWSR1-DDIT3, EWS-CHOP, EWS-CHN, TGFBR3-MGEA5, TGFBR3-MGEA5, MYH9-USP6, PHF1 fusions, ACTB-GLI1, FUS-CREB3L1, NAB2-STAT6, NCOA2-SRF, NCOA2-TEAD1, SS18-SSX1, SS18-SSX2, SS18-SSX4, and CSF1-COL6A3, preferably wherein the aberrant fusion transcription factor is encoded by EWSR1-FLI1.

23. The tumor specific neoantigenic peptide according to claim 21, which is encoded by a part of an open reading frame (ORF) of one of the sequences selected from the group comprising SEQ ID No 1-145 and the transcripts identified in table 9.

24. The tumor specific neoantigenic peptide according to claim 21, wherein said peptide comprises at least 8 amino acids, in particular 8 or 9 amino acids and binds at least one MHC class I molecule of a subject or in particular from 13 to 25 amino acids and binds at least one MHC class II of a subject; optionally wherein the neoantigenic peptides are defined in SEQ ID NO:166-201.

25. A population of autologous dendritic cells or antigen presenting cells that have been pulsed with one or more of the peptides according to claim 21 or transfected with a polynucleotide encoding one or more of the peptides according to claim 21.

26. A vaccine or immunogenic composition capable of inducing a specific T-cell response comprising a) one or more neoantigenic peptides according to claim 21; and/or b) one or more polynucleotides encoding a neoantigenic peptide according to claim 21, optionally linked to a heterologous regulatory control nucleotide sequence.

27. An antibody, or an antigen-binding fragment thereof, a T cell receptor (TCR), or a chimeric antigen receptor (CAR) that specifically binds a neoantigenic peptide according to claim 21, optionally in association with an MHC molecule, with a Kd affinity of about 10⁻⁶ M or less.

28. A T cell receptor according to claim 27, wherein said T cell receptor is made soluble and fused to an antibody fragment directed to a T cell antigen, optionally wherein the targeted antigen is CD3 or CD16.

29. An antibody according to claim 27, wherein said antibody is a multispecific antibody that further targets at least an immune cell antigen, optionally wherein the immune cell is a T cell, a NK cell or a dendritic cell, optionally wherein the targeted antigen is CD3, CD16, CD30 or a TCR.

30. A polynucleotide encoding the neoantigenic peptide according to claim 21.

31. A vector comprising the polynucleotide of claim 30.

32. An immune cell that specifically binds to one or more neoantigenic peptides according to claim 21, optionally wherein the immune cell is an allogeneic or autologous cell selected from T cell, NK cell, CD4+/CD8+, TILs/tumor derived CD8 T cells, central memory CD8+ T cells, Treg, MAIT, and γδ T cell.

33. A T cell, which comprises a T cell receptor that specifically binds one or more neoantigenic peptides according to claim 21.

34. A method of treatment of a cancer associated with a transcription factor fusion in a subject in need thereof, comprising the administration of a vaccine or immunogenic composition comprising: a) one or more neoantigenic peptides according to claim 21 or one or more polynucleotides encoding said neoantigenic peptides, optionally linked to a heterologous regulatory control nucleotide sequence.

35. The method according to claim 34, wherein the cancer is Ewing sarcoma.

36. The method according to claim 34, comprising the administration of the vaccine or immunogenic composition in combination with at least one further therapeutic agent, optionally wherein the therapeutic agent is a chemotherapeutic agent or an immunotherapeutic agent.

37. A method of treatment of a cancer associated with a transcription factor fusion comprising the administration of the antibody or the antigen-binding fragment thereof, the multispecific antibody, the TCR or the CAR according to claim 27 to a subject in need thereof.

38. A method of treatment of a cancer associated with a transcription factor fusion comprising the administration of the polynucleotide according to claim 30.

39. A method of treatment of a cancer associated with a transcription factor fusion comprising the administration of a population of the immune cells according to claim 32 to a subject in need thereof.

40. A vaccine or immunogenic composition capable of inducing a specific T-cell response comprising a) a population of antigen presenting cells according to claim 25.

41. A polynucleotide encoding the antibody, the CAR or the TCR according to claim 27.

42. A T cell, which comprises a TCR or a CAR according to claim 27.

43. A method of treatment of a cancer associated with a transcription factor fusion in a subject in need thereof, comprising the administration of a vaccine or immunogenic composition comprising: a) a population of antigen presenting cells according to claim 25.

Description

FIGURES

[0358] FIG. 1. EWS-FLI1 regulates Ewing specific neogenes.

[0359] (A) Scheme of the genomic region of the JUJE1 gene. From top to bottom: a XX GGAA microsatellite is indicated, ChIP-seq of EWS-FLI1 in EWS-FLI1high and EWS-FLI1low conditions, H3K27ac and H3K4me3 in the same EWS-FLI1high and EWS-FLI1low conditions as well as short read RNA-seq alignments of the A673/TR/shEF in + and − DOX conditions and of three different Ewing tumors.

[0360] (B) RT-PCR analysis of the expression of the four neo-transcripts in the A673/TR/shEF cell line in EWS-FLI1high (DOX−) and EWS-FLI1low (DOX+) conditions.

[0361] (C) RT-PCR analysis of the expression of the four neo-transcripts in wild type and EWS-FLI1-expressing MSCs.

[0362] (D) Expression of JUJE1 in cancers (TCGA and institutional databases). TPM: transcripts per million.

[0363] (E) Expression of JUJE1 in tissues (GTEx, TCGA, HPA databases).

[0364] FIG. 2. Discovery of Ewing specific neogenes from short-read RNA-seq data.

[0365] (A) Overview of the neogene discovery pipeline (cf Methods for details).

[0366] (B) Modulation of neogene expression by EWS-FLI1 in 10 Ewing cell lines and MSCs. Size of the bubbles shows mean expression level in log10 TPM of neogenes in Ewing cell lines with baseline EWS-FLI1 expression or MSCs expressing EWS-FLI1 (EF high), capped at 100 TPM. Color represents the log2 fold-change (capped at 6) of mean neogene expression levels in EF high cell lines versus in corresponding cell lines with downregulated expression of EWS-FLI1, or MSCs not expressing EWS-FLI1 (EF low). Numbers of replicates per condition are: n=4 for TC71; n=3 for ASP14, MSC; n=2 for A673, CHLA10, EW1, EW24 and MHH-ES1; n=1 for EW7, POE and SK-N-MC.

[0367] FIG. 3. Discovery of neogenes specific for alveolar rhabdomyosarcoma (aRMS) and desmoplastic small round cell tumor (DSRCT).

[0368] (A) Modulation of aRMS-specific neogene expression by PAX3-FOXO1 in aRMS RH4 cell line. Size of the bubbles shows expression level in log10 TPM of neogenes in RH4 with baseline PAX3-FOXO1 expression (PF high). Color represents the log2 fold-change of neogene expression level in PF high RH4 versus in RH4 with downregulated expression of PAX3-FOXO1 (PF low). n=1 replicate per condition.

[0369] (B) Modulation of DSRCT-specific neogene expression by EWS-WT1 in DSRCT cell lines. Size of the bubbles shows mean expression level in log10 TPM of neogenes in cell lines with baseline EWS-WT1 expression (EW high). Color represents the log2 fold-change of neogene expression level in EW high cell lines versus in cell lines with downregulated expression of EWS-WT1 (EW low). n=3 replicates per condition for both cell lines.

[0370] FIG. 4: Example of tetramer analysis and phenotypic characterization of a peptide derived from an EWS-FLI1-regulated lincRNA in lymphocytes from a healthy donor. A, principle of the tetramer analysis; B, identification of tetramer-bound T cells; C, characterization of the naïve/memory phenotype.

[0371] FIG. 5: T cell clone sensitivity. Cytokine secretion of CD8 T cells specific for peptides encoded by EWS-FLI1-regulated lincRNAs (initially identified by PacBio sequencing) in response to increasing concentrations of their specific antigen presented by K562 antigen-presenting cells.

EXAMPLES

Experimental Procedures

[0372] Cell Culture

[0373] The A673/TR/shEF (also called ASP14) cell line (Carrillo et al., 2007) was routinely checked by PCR for the absence of mycoplasma. The A673/TR/shEF cell line was cultured at 37° C., in 5% CO₂, in Dulbecco's Modified Eagle Medium (DMEM) with High Glucose, 4 mM of L-Glutamine, 4500 mg/L Glucose and sodium pyruvate (HyClone) supplemented with 10% FBS (Eurobio) and 1% antibiotics (v/v) (penicillin and streptomycin (Gibco)). Induction of the EWS-FLI1 specific shRNA was performed by adding 1 μg/mL of doxycycline to the medium extemporaneously. After seven days of treatment, doxycycline was removed and cells were washed three times to stop the shRNA induction, thus enabling re-expression of EWS-FLI1.

[0374] RNA from MSCs with or without EWS-FLI1 expression was obtained from the Stamenkovic lab.

[0375] RNA Extraction and Reverse Transcription

[0376] From cell extract: Total RNA was isolated using the Nucleospin II kit (Macherey-Nagel) and reverse-transcribed using the High-Capacity cDNA Reverse Transcription kit (Applied Biosystems). Next, cDNA molecules were amplified by PCR performed using the AmpliTaqGold DNA Polymerase kit with Gold Buffer and MgCl2 (Applied Biosystems). One microgram of template total RNA was used for each reaction.

[0377] From polysome profiling fraction: gradient fractions were collected in 16 tubes. Equal RNA volumes (300 μL) of each gradient fraction were used for RT using the iScript cDNA Synthesis Kit (BioRad).

[0378] Next, cDNA molecules were amplified by qPCR performed using SYBR Green (Applied Biosystems). Reactions were run on a 7500 QPCR instrument and analyzed using the 7500 system SDS software (Applied Biosystems). Relative quantification of neotranscripts was normalized to an endogenous control (RPLP0) and was performed using the comparative Ct method. Error bars indicate SD. Oligonucleotides were purchased from MWG Eurofins Genomics.

TABLE-US-00010
EWS-FLI1 primers: F: GCCAAGCTCCAAGTCAATATAGC / R: GAGGCCAGAATTCATGTTATTGC
RPLP0 primers: F: GAAACTCTGCATTCTCGCTTC / R: GGTGTAATCCGTCTCCACAG
JUJE1 primers: F: ATACTGGGCTTGCAACTGGAG / R: TCCTCCTTGGAATGAATGGGC
JUJE2 primers: F: TGGCCAATTGGAGGGTCTTC / R: GCAAATTACATCCTCATTTCTCCA
JUJE3 primers: F: ACAGCACTGCTAGATTAGGGAA / R: CAGTGCAAGCCCTCGAATGTC
JUJE10 primers: F: ACTTTCCTTTCTCCTCACCACC / R: AGACATCCTAAAGTAACAGAGGCA
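By way of illustration only, the comparative Ct calculation mentioned above can be sketched as follows. The function name, the argument names and the Ct values in the usage note are hypothetical, and the sketch assumes the usual 2^-ΔΔCt form with an amplification efficiency of 100%, which the present description does not specify.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target transcript by the comparative Ct (2^-ddCt) method.

    ct_target / ct_reference: Ct of the neotranscript and of the endogenous
    control (e.g. RPLP0) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two Ct values in the calibrator
    sample (e.g. the EWS-FLI1low condition).
    Assumes a doubling of product at every cycle (100% efficiency).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

For example, with hypothetical Ct values of 24 (target) and 18 (control) in the sample and 27 and 18 in the calibrator, the target is three cycles earlier relative to the control, which corresponds to an eight-fold higher relative level.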

[0379] RNA Seq (Illumina & PacBio) & Data Processing

[0380] Illumina:

[0381] All RNA samples were evaluated for integrity using a BioAnalyzer instrument (Agilent). All samples displayed excellent quality (RNA Integrity Number above 9). Libraries were prepared using the TruSeq Stranded mRNA Library Preparation Kit. Equimolar pools of libraries were sequenced on an Illumina HiSeq 2500 machine using paired-end reads (PE, 2×101 bp) and High Output run mode, yielding around 200 million raw reads per sample. Raw reads were mapped on the human reference genome hg19 using the STAR aligner (v.2.5.0a). PCR-duplicated reads and low mapping quality reads (MQ<20) were removed using Picard tools and SAMtools, respectively.

[0382] Pacbio:

[0383] Libraries have been prepared following the protocol from Pacific Biosciences: “Procedure & Checklist—Iso-Seq™ Template Preparation for Sequel® Systems—Version 5 (November 2017)”

[0384] 1 μg of total RNA has been used as input. The cDNA has been synthesized with the SMARTer PCR cDNA Synthesis Kit from Clontech, following manufacturer's recommendations. Then the cDNA was amplified with the PrimeSTAR GXL DNA Polymerase from Clontech with 12 cycles of PCR. This number has been set up after PCR optimization, in order to obtain enough yield, and to avoid PCR bias. The amplified cDNA was then split into only 2 fractions to perform 2 different purifications using AMPure beads (ratio of 0.4× and 1×). No 3rd fraction was isolated to perform a size-selection step using the Blue Pippin system. Then an equimolar pool was made from the 2 fractions. The SMRTbell has been prepared from 2.8 μg of this equimolar pool of cDNA using the SMRTbell Template Prep Kit from Pacific Biosciences, following manufacturer's recommendations. The sequencing was performed on a Sequel system, using V2.1 chemistry and Magbead loading. 4 SMRTcells have been used for the ASP14 sample and 3 SMRTcells for the ASP14+DOX sample. The sequencing runs were set up with a pre-extension step of 240 min and 10 hours of movie. We used the implemented pbsmrtpipe pipeline to perform read processing.

[0385] To annotate IsoSeq reads, we used the MatchAnnot script (https://github.com/TomSkelly/MatchAnnot). MatchAnnot assigns each read to an annotated transcript by scoring base-pair matches. Reads that have no match on the Gencode v19 reference are annotated as NA. We manually curated the list of NA reads (n=145) using the Integrative Genomics Viewer (IGV). We also used Illumina RNAseq data, ChIPseq data (EWS-FLI1, H3K27me3, H3K27ac, H3K4me3) and the GGAA repeats track at the same time in order to identify EWS-FLI1-regulated reads from intergenic regions. After applying these filters, we found 4 clusters corresponding to four distinct expressed intergenic regions.

[0386] JUJE Neotranscripts Quantification

[0387] Public fastq files from TCGA, GTEx and HPA dataset were downloaded and aligned to the hg19 genome assembly using STAR version 2.7.0e (Dobin et al, 2013). The GTF file used for alignment and quantification of gene expression was based on evidence-based annotation of the human genome (GRCh37), version 19 (Ensembl 74) provided by GENCODE, to which was added the information for the 4 neotranscripts, previously described, in GTF format. Gene expression was quantified using the GeneCounts procedure from STAR. Raw counts were then normalized to Transcripts Per Million (TPM).
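The normalization of raw counts to TPM described above may, purely as an illustration, be sketched as follows. The function and variable names are hypothetical, and the use of a single length per gene (in base pairs) is an assumption, since the present description does not state which gene length definition was used.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts per gene to Transcripts Per Million (TPM).

    counts: dict mapping gene identifier -> raw count (e.g. from STAR GeneCounts).
    lengths_bp: dict mapping gene identifier -> gene length in base pairs
                (assumed to be supplied by the user).
    """
    # Step 1: length-normalize each count (reads per kilobase).
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Step 2: rescale so that the values of one sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}
```

Because the values of each sample sum to one million, TPM values can be compared between samples of different sequencing depth, which is what the thresholds expressed in TPM rely on.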

[0388] Tumor RNA-Seq Used for Discovery of Neogenes

[0389] Paired-end RNA-seq from the institutional database (Institut Curie) of fresh-frozen patient tumor tissue were used to search for tumor-specific neogenes. RNA-seq sequencing was performed using established protocols on Illumina instruments. 31 types of cancers, for which the inventors had at least 4 RNA-seq samples were tested, including 20 fusion-driven cancers. All diagnoses were made by pathological examination, confirmed by fusion gene detection in the case of fusion-driven cancers and independently reviewed by an expert clinician. In detail, the inventors assessed the following diseases (abbreviations for neogene nomenclature in parentheses): 20 fusion-driven cancers including Ewing sarcoma (Ew), alveolar rhabdomyosarcoma (aRMS), desmoplastic small round cell tumor (DSRCT), BCOR-rearranged sarcoma (BCOR), CIC-fused sarcoma (CIC), clear cell sarcoma (CCS), EWSR1-NFATC2 sarcoma (NFAT), synovial sarcoma (SS), angiomatoid fibrous histiocytoma (AFH), alveolar soft part sarcoma (ASPS), congenital fibrosarcoma (CFS), extraskeletal myxoid chondrosarcoma (emCS), low-grade fibromyxoid sarcoma (LGFS), mesenchymal chondrosarcoma (MCS), midline carcinoma (Midline), myxoid liposarcoma (mLPS), EWSR1-PATZ1 sarcoma (PATZ1), solitary fibrous tumor (SFT), TFE3 renal cell carcinoma (TFE3), inflammatory myofibroblastic tumor (TMFI); 11 non-fusion driven cancers including atypical teratoid rhabdoid tumor (ATRT), desmoid tumor (Desmoid), embryonal rhabdomyosarcoma (eRMS), leiomyosarcoma (LMS), dedifferentiated liposarcoma (LPS), malignant peripheral nerve sheath tumor (MPNST), nephroblastoma (NEP), neuroblastoma (NEU), osteosarcoma (OST), small cell carcinoma of the ovary hypercalcemic type (SCCOHT) and undifferentiated pleomorphic sarcoma (UPS). For discovery of neogenes, the inventors initially ran the pipeline on the first 8 fusion-driven sarcomas in the list («discovery batch 1») before testing the 23 other cancers («discovery batch 2»).

[0390] RNA-Seq Alignment, Transcript Assembly and Detection of Unannotated Transcripts

[0391] The inventors used Scallop (Shao M, Kingsford C. Accurate assembly of transcripts through phase-preserving graph decomposition. Nat Biotechnol. 2017 December; 35(12):1167-1169), a reference-based transcript assembler, to predict all transcript sequences based on aligned RNA-seq reads, independently of a reference transcriptome annotation. First, paired-end FASTQ files were aligned to the hg19 human reference genome using STAR v2.7.0 (Dobin et al., Bioinformatics 2013). They then ran Scallop on the resulting BAM file with default parameters to assemble all expressed transcript sequences. To conserve only unannotated transcripts, they used Gffcompare (Pertea and Pertea, F1000 Research 2020) to compare the Scallop output GTF file with the reference GENCODE v19 GTF file, and conserved only transcripts labeled by Gffcompare as «u» (unknown, intergenic), «y» (contains a reference within its introns) and «x» (exonic overlap on the opposite strand). Finally, to remove lowly expressed transcripts and decrease the rate of false positives, they removed all transcripts with coverage less than 10 as output by Scallop.
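As a non-limiting sketch, the retention of unannotated transcripts by class code and coverage described above could be written as follows. The sketch assumes that each transcript line of the GTF file carries both a `class_code` attribute (as written by Gffcompare) and a `cov` attribute (as written by Scallop); whether both attributes are present on the same line depends on the tool versions and is not stated in the present description. The function names are hypothetical.

```python
import re

KEPT_CLASS_CODES = {"u", "y", "x"}   # intergenic, reference within introns, antisense exonic overlap
MIN_COVERAGE = 10.0                  # transcripts with coverage below 10 are discarded

_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Parse the ninth GTF column (key "value"; pairs) into a dict."""
    return dict(_ATTRIBUTE.findall(field))


def select_unannotated(gtf_lines):
    """Return the transcript_id of every transcript passing both filters."""
    kept = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # exon lines and malformed lines are ignored
        attrs = parse_attributes(cols[8])
        if attrs.get("class_code") not in KEPT_CLASS_CODES:
            continue
        if float(attrs.get("cov", 0.0)) < MIN_COVERAGE:
            continue
        kept.append(attrs["transcript_id"])
    return kept
```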

[0392] Selection of Tumor-Specific Unannotated Transcripts

[0393] To discover tumor-specific neogenes and discard all other transcripts assembled by Scallop, three steps of selection were used.

[0394] 1. First, the inventors ran Scallop as described previously on one RNA-seq sample of the cancer of interest to generate a first set of candidate unannotated transcripts (Candidate set 1).

[0395] 2. Then they applied a first filtering process based on high and tumor-specific expression as compared to a limited set of other tumors: for this they quantified the expression of Candidate set 1 neogenes on 3 samples from each cancer type (24 samples of 8 types in «discovery batch 1» and 69 samples of 23 types in «discovery batch 2») by re-aligning each sample with STAR and quantifying expression using the GeneCounts procedure with the GENCODE v19 reference GTF file to which they added the Candidate set 1 neotranscripts. Raw counts were converted to transcripts per million (TPM) before the filtering process. To retain only tumor-specific and highly expressed candidates, they selected transcripts with:

[0396] (i) mean expression in the disease of interest of more than 20 TPM,

[0397] (ii) log-fold change of mean expression in samples of other diagnoses versus mean expression in disease of interest of less than −2,

[0398] (iii) mean expression in samples of other diagnoses of less than 3 TPM,

[0399] (iv) maximum expression in samples of other diagnoses of less than 10 TPM,

[0400] resulting in a second set of candidate neogenes (Candidate set 2). As expression in this process was quantified with a unique GTF file containing all Candidate set 1 neogenes for batch 2 (23 tumor types), the first threshold for the filtering process in batch 2 was adapted to (i) mean expression in the disease of interest of more than 10 TPM (instead of 20). As more samples were used in the filtering process for batch 2 (69 versus 24 in batch 1), the fourth threshold was adapted to (iv) maximum expression in samples of another diagnosis of less than 15 TPM (instead of 10). Other thresholds were left unchanged for batch 2. For Ewing sarcoma, we also ran Scallop as described previously on one RNA-seq sample from a cell line (ASP14) and applied the same above filters to the resulting Candidate set 1 of neogenes. The resulting Candidate set 2 neogenes not overlapping Candidate set 2 neogenes from the Ewing tumor sample were included in the final Candidate set 2 neogenes used for the subsequent round of filtering.
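For illustration only, the first filtering step with its batch-specific thresholds may be sketched as below. The class and function names are hypothetical. The description does not specify the base of the log-fold change nor how zero expression is handled; the sketch assumes a base-2 logarithm and a small pseudocount.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Thresholds:
    min_mean_tumor: float   # (i) mean TPM in the disease of interest, strictly above
    max_log2fc: float       # (ii) log fold change other/tumor, strictly below
    max_mean_other: float   # (iii) mean TPM in other diagnoses, strictly below
    max_any_other: float    # (iv) maximum TPM in other diagnoses, strictly below


BATCH1 = Thresholds(20.0, -2.0, 3.0, 10.0)
BATCH2 = Thresholds(10.0, -2.0, 3.0, 15.0)   # (i) and (iv) adapted for batch 2


def passes_first_filter(tumor_tpm, other_tpm, thresholds, pseudocount=1e-3):
    """Apply criteria (i) to (iv) to one candidate transcript.

    tumor_tpm: TPM values in samples of the disease of interest.
    other_tpm: TPM values in samples of all other diagnoses.
    pseudocount: assumed value, only used to avoid log(0).
    """
    mean_tumor = mean(tumor_tpm)
    mean_other = mean(other_tpm)
    log2fc = math.log2((mean_other + pseudocount) / (mean_tumor + pseudocount))
    return (
        mean_tumor > thresholds.min_mean_tumor
        and log2fc < thresholds.max_log2fc
        and mean_other < thresholds.max_mean_other
        and max(other_tpm) < thresholds.max_any_other
    )
```

With these definitions, a candidate with a mean of 14 TPM in the disease of interest and near-zero expression elsewhere would be rejected with the batch 1 thresholds and retained with the batch 2 thresholds.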

[0401] 3. Finally, a second filtering step was applied based on expression levels across a wide range of cancers and normal tissues: for this we quantified expression of Candidate set 2 neogenes in all tumors from our institutional database, all cancer types from TCGA (either all the samples from one type or 50 samples if number of samples exceeded 50), all normal tissue samples from TCGA, all normal tissues from GTEx (either all the samples from one type or 50 samples if number of samples exceeded 50) and all normal tissue samples from the Human Protein Atlas. Every sample was re-aligned with STAR and expression quantified by the GeneCounts procedure with the use of a GTF file including GENCODE v19 and Candidate set 2 neotranscripts (one for each discovery batch). Raw counts were converted to TPM before filtering.

[0402] To account for potentially lower tumor content in some samples, the inventors lowered the first threshold (i) as compared to the first filter. To retain tumor-specific candidates with a relatively high expression level and quasi-null expression in other cancers and normal tissues, they selected transcripts with:

[0403] (i) mean expression in the disease of interest of more than 7.5 TPM,

[0404] (ii) log-fold change of mean expression in other samples versus mean expression in disease of interest of less than −3,

[0405] (iii) mean expression in other samples of less than 2 TPM,

[0406] (iv) 99% quantile of expression in other samples of less than 10 TPM,

[0407] (v) maximum mean expression in another cancer or tissue of less than 10 TPM (excluding testis and placenta),

[0408] resulting in a final set of tumor-specific neogenes.
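The second filtering step can likewise be sketched as follows, as an illustration and not as the exact implementation used. The names are hypothetical. The sketch assumes a base-2 logarithm with a small pseudocount, a linearly interpolated 99% quantile, and that criteria (iii) and (iv) are computed over all other samples including testis and placenta, only criterion (v) excluding them.

```python
import math
from statistics import mean, quantiles

EXEMPT_TISSUES = {"testis", "placenta"}   # excluded from criterion (v) only


def passes_second_filter(tumor_tpm, other_by_group, pseudocount=1e-3):
    """Apply criteria (i) to (v) to one candidate transcript.

    tumor_tpm: TPM values in samples of the disease of interest.
    other_by_group: dict mapping each other cancer type or normal tissue
                    to the list of TPM values of its samples.
    """
    all_other = [v for values in other_by_group.values() for v in values]
    mean_tumor = mean(tumor_tpm)
    mean_other = mean(all_other)
    log2fc = math.log2((mean_other + pseudocount) / (mean_tumor + pseudocount))
    # 99th percentile: last of the 99 cut points dividing the data into 100 parts.
    q99 = quantiles(all_other, n=100, method="inclusive")[98]
    max_group_mean = max(
        (mean(values) for group, values in other_by_group.items()
         if group not in EXEMPT_TISSUES),
        default=0.0,
    )
    return (
        mean_tumor > 7.5            # (i)
        and log2fc < -3.0           # (ii)
        and mean_other < 2.0        # (iii)
        and q99 < 10.0              # (iv)
        and max_group_mean < 10.0   # (v)
    )
```

In this sketch a candidate expressed at 12 TPM in a single testis sample still passes, whereas the same expression level in a single sample of another tissue does not, reflecting the exemption of germinal tissues in criterion (v).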

[0409] During the procedure they found that a large part of candidate neogenes were mutually and exclusively expressed in the following pairs of diagnoses: angiomatoid fibrous histiocytoma (AFH) and clear cell sarcoma (CCS), alveolar soft part sarcoma (ASPS) and TFE3 renal cell carcinoma (TFE3), alveolar rhabdomyosarcoma (aRMS) and embryonal rhabdomyosarcoma (eRMS). They did not remove those neogenes (cf Results) to account for neogenes driven by the same fusion gene in two different diseases (AFH/CCS, ASPS/TFE3) and disease-associated neogenes common to both types of rhabdomyosarcoma (aRMS/eRMS).

[0410] Analysis of Fusion Transcription Factor Binding and Histone Marks Around Neogenes

[0411] The inventors analyzed ChIP-seq data to explore the epigenetic landscape of neogenes in 3 fusion-driven sarcomas (Ewing, aRMS and DSRCT). For Ewing sarcoma, they used ChIP-seq data from our laboratory for EWS-FLI1, H3K27ac and H3K4me3 in the ASP14 cell line. For alveolar rhabdomyosarcoma, ChIP-seq from public data was used: H3K27ac, H3K4me3 (GSE83726 from Gryder et al., Cancer Discov 2017) and PAX3-FOXO1 (GSE19063 from Cao et al., Cancer Res 2010) for the RH4 cell line. For DSRCT, ChIP-seq from public data was used: EWS-WT1 and PolII (GSE156277 from Hingorani et al., Sci Rep 2020) for the JN-DSRCT-1 cell line. For public data, FASTQ files were downloaded from SRA and aligned with Bowtie2 v2.2.9 (Langmead and Salzberg, Nature Methods 2012), duplicates and multi-mapped reads were removed with Samtools (Li et al., Bioinformatics 2009).

[0412] Analysis of Expression of Neogenes in Cell Lines

[0413] For Ewing sarcoma, aRMS and DSRCT, the inventors quantified expression of neogenes in cell lines having normal or downregulated expression of the fusion transcript (respectively EWS-FLI1, PAX3-FOXO1 and EWS-WT1). RNA-seq reads were aligned with STAR using a GTF file containing GENCODE v19 and the corresponding neogenes, and quantification was done using GeneCounts. For Ewing sarcoma, they used RNA-seq data from their laboratory for 10 cell lines having either normal expression of EWS-FLI1 at day 0 or downregulated expression by a doxycycline-inducible system or siRNA after 7 days. They also used RNA-seq data for MSCs, some of which were induced to express EWS-FLI1. For aRMS, the inventors used RNA-seq from public data (GSE83724 from Gryder et al., Cancer Discov 2017) for the RH4 cell line treated with an shRNA against PAX3-FOXO1 for 48 hours or a control shRNA. For DSRCT, they used RNA-seq from public data (GSE137561 from Gedminas et al., Oncogenesis 2020) for BER and JN-DSRCT-1 cell lines treated with siRNA against EWS-WT1 for 48 hours or control siRNA. All FASTQ files from public data were downloaded from SRA.
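As a hedged illustration of how the modulation of a neogene by the fusion may be summarized from such data, the sketch below computes a capped log2 fold-change between fusion-high and fusion-low conditions. The cap of 6 is the one stated for FIG. 2B; the function name, the averaging over replicates before taking the ratio and the pseudocount of 1 TPM are assumptions that are not specified in the present description.

```python
import math
from statistics import mean


def log2_fold_change(high_tpm, low_tpm, pseudocount=1.0, cap=6.0):
    """Capped log2 fold-change of mean expression, fusion-high versus fusion-low.

    high_tpm: TPM values of the replicates with baseline fusion expression.
    low_tpm: TPM values of the replicates with downregulated fusion expression.
    pseudocount: assumed value added to both means to avoid division by zero.
    cap: absolute value at which the result is clipped.
    """
    ratio = (mean(high_tpm) + pseudocount) / (mean(low_tpm) + pseudocount)
    value = math.log2(ratio)
    return max(-cap, min(cap, value))
```

A positive value indicates a neogene that is more highly expressed when the fusion is expressed, i.e. a neogene downregulated upon knock-down of the fusion.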

[0414] Generation of Genomic-Browser Style Figures for Neogene Loci

[0415] Genomic browser-style figures showing neotranscript sequences, RNA-seq read alignments, and ChIP-seq data were generated with a custom script written in R (R Core Team, 2017). RNA-seq reads in FASTQ format were aligned to the hg19 human reference genome with STAR using a GENCODE v19 reference GTF annotation and visualized in BAM format. ChIP-seq BAM files used for visualization were generated as described previously. For Ewing sarcoma, GGAA repeats (EWS-FLI1 canonical binding sites) were also displayed in the same figure.

[0416] Peptide Prediction

[0417] For each transcript, open reading frames were identified using ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder) in the three frames, with parameters “minimal ORF length”=75 nucleotides and “ORF start codon”=any.
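The following sketch is not the ORFfinder implementation; it only illustrates, under stated assumptions, what scanning the three forward frames with a minimal length of 75 nucleotides may look like. It assumes that the start codon setting "any" means that an ORF may begin at any sense codon, so that an ORF is taken as the stretch between two consecutive in-frame stop codons, that the length is counted without the stop codon, and that only stretches closed by a stop codon are reported. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_len_nt=75):
    """Return (frame, start, end) for stop-delimited ORFs in the 3 forward frames.

    start is the 0-based index of the first coding nucleotide and end is the
    0-based index of the first nucleotide of the closing stop codon, so that
    sequence[start:end] is the coding stretch.
    """
    seq = sequence.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - start >= min_len_nt:
                    orfs.append((frame, start, pos))
                start = pos + 3   # next candidate ORF begins after the stop
    return orfs
```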

[0418] Peptides binding to MHC class I molecules were predicted using the NetMHCpan 4.1 suite (http://www.cbs.dtu.dk/services/NetMHCpan/), using “HLA allele”=A2, “peptide length”=8-11 and “rank threshold for strong binding”=0.5%.
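By way of example, the candidate peptides of 8 to 11 amino acids submitted to such a predictor, and the selection of strong binders from its output, may be sketched as below. The sketch does not call NetMHCpan: the percentile ranks are assumed to have been obtained separately and supplied as a dictionary, the function names are hypothetical, and the use of a strict comparison against the 0.5% threshold is an assumption.

```python
def enumerate_peptides(protein, min_len=8, max_len=11):
    """List every distinct peptide of min_len to max_len residues, in order."""
    seen = set()
    peptides = []
    for k in range(min_len, max_len + 1):
        for i in range(len(protein) - k + 1):
            peptide = protein[i:i + k]
            if peptide not in seen:
                seen.add(peptide)
                peptides.append(peptide)
    return peptides


def strong_binders(rank_by_peptide, threshold=0.5):
    """Keep peptides whose predicted percentile rank is below the threshold.

    rank_by_peptide: dict mapping peptide -> % rank reported by the predictor
                     for the allele of interest (lower means stronger binding).
    """
    return sorted(p for p, rank in rank_by_peptide.items() if rank < threshold)
```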

[0419] Tetramer Preparation and Immunological Analyses

[0420] All experiments were conducted at the immunology department of Institut Curie. Peptides were synthesized by the GeneCust company and HLA-A*02:01/peptide monomers were prepared using the easYmer kit (immunAware, Denmark). After control of the affinity of monomers, tetramers were incubated with human CD8+ T cells prepared from healthy controls (EasySep kit from STEMCELL technologies). Tetramer-CD8 cell binding and the phenotype of the bound CD8 populations (naïve or memory) were assessed by flow cytometry.

[0421] Results:

[0422] The inventors initially performed long-read RNA sequencing using the PacBio Iso-Seq protocol to investigate the full-length transcriptomic profile of the A673 Ewing cell line. A total of 15,576,646 raw sequence reads were generated, representing 40.3 Gbp. Using the pbsmrtpipe pipeline (REF), 56,051 high-quality Circular Consensus Sequences (CCS) were kept for downstream analysis. Surprisingly, they identified 145 high-quality CCS (0.5%) that could be aligned to the human genome but had no match on the RefSeq database of human transcripts.

[0423] Manual inspection of each CCS identified 80 robust candidate neotranscripts. Other sequences were classified as mis-annotated genes (e.g. read-through transcription) or had low coverage support. They hypothesized that some of these transcripts may be generated by transcriptional activation by EWS-FLI1, the aberrant transcription factor specific for Ewing sarcoma. Under this hypothesis, such transcripts should fulfill the following criteria:

[0424] i) be regulated by EWS-FLI1,

[0425] ii) demonstrate the presence of an EWS-FLI1 ChIP-seq peak as well as H3K27ac and H3K4me3 histone marks around their transcription start sites (TSS) and

[0426] iii) be expressed specifically in Ewing tumors.

[0427] Four highly confident neotranscripts (JUJE1, JUJE2, JUJE3 and JUJE10) were identified. The inventors found that EWS-FLI1 binds GGAA microsatellites located in close vicinity to their TSS (within 5 kb). This binding was furthermore associated with the presence of H3K27ac and H3K4me3 activation marks. Both marks were considerably decreased upon down-regulation of EWS-FLI1 by DOX treatment in the inducible A673/TR/shEF cell line (FIG. 1A). A specific RT-PCR assay further showed that these transcripts were correctly spliced and induced by EWS-FLI1 both in A673/TR/shEF cells (FIG. 1B) and in mesenchymal stem cells (MSC) stably expressing EWS-FLI1 (FIG. 1C).

[0428] These transcripts were highly expressed in Ewing sarcoma tumor biospecimens as quantified by Illumina short-read RNA-seq data (FIG. 1A). To more thoroughly investigate the specificity for Ewing sarcoma, the inventors appended these transcripts to the RefSeq dataset and realigned the GTEx, HPA and TCGA cohorts as well as their in-house collection of 132 Ewing sarcomas and 862 tumors of diverse diagnoses, mainly sarcomas. As shown in FIGS. 1D & E, while sporadic low-level expression of these transcripts could be observed in a limited number of tissues or tumors, these levels were much lower than the mean expression level observed in Ewing sarcoma.

[0429] These data show that Ewing sarcoma expresses specific transcripts induced by the binding of EWS-FLI1 within otherwise transcriptionally silent regions. As long-read PacBio sequencing has technical limitations with respect to clinical samples, the inventors designed a short-read strategy to explore in more depth the neotranscriptome of Ewing sarcoma based on genome-guided assembly (Shao and Kingsford, Nat Biotech 2017). Transcripts that were not annotated in the reference GENCODE transcriptome were retrieved. The expression of transcripts that were detected in three different Ewing sarcomas but not in 21 non-Ewing tumors was thoroughly explored across various databases of normal and tumor tissues (FIG. 2A, see expression thresholds in the Materials and Methods section). A total of 61 Ewing-specific transcripts were thus identified. As some of these were derived from the same genomic loci through alternative splicing, they could be further assigned to 25 Ewing-specific neo-genes (Ew_NG) (data not shown). This set included the JUJE genes described above, except for JUJE3, which was filtered out as it corresponds to an annotated pseudogene in GENCODE, but not in RefSeq, which was used for filtering in the PacBio strategy. The inventors noted during this procedure (cf below) that some neogenes could be moderately expressed (most less than 10 TPM) in germinal tissues (testis and placenta), reflecting known higher transcriptomic diversity and exclusivity there (e.g. for cancer-testis antigens), and therefore allowed the few genes (less than 1.5% of neogenes in this study) expressed in these tissues at more than 10 TPM to pass the filter nonetheless. A total of 25 neogenes, corresponding to 61 neotranscripts, were identified in Ewing sarcoma (data not shown).

[0430] The inventors then explored the suspected role of EWS-FLI1 in the expression of these Ew_NGs. They first used 10 separate Ewing cell lines with si- or shRNA targeting EWS-FLI1 to show that most Ew_NGs were expressed in EWS-FLI1-high conditions and down-regulated in EWS-FLI1-low conditions (FIG. 2B). Similarly, while these genes were not expressed in MSC, the suspected cell-of-origin of Ewing sarcoma, most were strongly induced in EWS-FLI1-expressing MSC (FIG. 2C). They also took advantage of available EWS-FLI1 ChIP-seq data for Ewing cell lines to show that EWS-FLI1 peaks situated on GGAA microsatellites were strongly enriched around the TSS of several Ew_NGs (12/25), along with H3K27ac and H3K4me3 marks (data not shown). Altogether, these data strongly suggest that the aberrant transcription factor of Ewing sarcoma can bind specific regions in the genome and induce transcription of Ewing-specific neogenes.

[0431] The identification of EWS-FLI1-regulated neogenes in Ewing sarcoma and the validation of the short-read approach prompted the inventors to investigate other sarcomas characterized by the expression of chimeric transcription factors. Alveolar rhabdomyosarcoma (aRMS) and desmoplastic small round cell tumor (DSRCT) are sarcomas characterized by the expression of PAX3/7-FOXO1 and EWS-WT1 fusions, respectively. Using the same strategy of genome-guided assembly of RNA-seq from clinical samples (FIG. 2A) they identified 36 aRMS-specific neogenes (aRMS NG) corresponding to 72 aRMS-specific neotranscripts, and 37 DSRCT-specific neogenes (DSRCT NG) corresponding to 105 DSRCT-specific neotranscripts (data not shown).

[0432] Similarly to Ewing sarcoma, they explored the potential roles of the chimeric transcription factors PAX3/7-FOXO1 and EWS-WT1 in the expression of these neogenes. For aRMS, they used RNA-seq data (Gryder et al., Cancer Discov 2017) obtained in the RH4 cell line after treatment with an shRNA against PAX3-FOXO1 to show that 15/36 of the neogenes are expressed in this cell line and that 7/15 are downregulated (log2FC>2 in PAX3-FOXO1 high versus low) upon knock-down of PAX3-FOXO1. They also explored ChIP-seq data for PAX3-FOXO1, H3K27ac and H3K4me3 in the same cell line (Gryder et al., Cancer Discov 2017) and found that PAX3-FOXO1 peaks were enriched at the TSS of 13/36 of these neogenes (FIG. 3A).

[0433] For DSRCT, the inventors took advantage of recent work using an siRNA against EWS-WT1 in the BER and JN-DSRCT-1 cell lines (Gedminas et al., Oncogenesis 2020) to show that all but 2 neogenes were expressed in at least one cell line and that most (28/36) were strongly downregulated (log2FC>2 in EWS-WT1 high versus low) upon repression of EWS-WT1 (FIG. 3B). They also analyzed ChIP-seq data for EWS-WT1 in JN-DSRCT-1 (Hingorani et al., Sci Rep 2020) to show a high enrichment of EWS-WT1 peaks around the TSS of neogenes (25/37), as well as RNA Polymerase II modifications indicating active transcription (data not shown).
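The fold-change criterion used in the knock-down analyses above (log2FC>2 in fusion-high versus fusion-low conditions) can be illustrated with a minimal sketch. The function names and the pseudocount of 1 TPM, added to avoid division by zero, are assumptions made for illustration; the text does not state how fold changes were computed.

```python
import math


def log2_fold_change(tpm_fusion_high, tpm_fusion_low, pseudocount=1.0):
    """log2 ratio of expression in fusion-high versus fusion-low conditions.

    The pseudocount is an assumption made here, not a value from the text.
    """
    return math.log2((tpm_fusion_high + pseudocount) /
                     (tpm_fusion_low + pseudocount))


def is_fusion_dependent(tpm_fusion_high, tpm_fusion_low, threshold=2.0):
    """True if expression falls by more than the log2FC threshold on knock-down."""
    return log2_fold_change(tpm_fusion_high, tpm_fusion_low) > threshold


# 31 TPM before and 3 TPM after knock-down: (31 + 1) / (3 + 1) = 8, log2(8) = 3.
strong = log2_fold_change(31.0, 3.0)
# 7 TPM before and 3 TPM after knock-down: (7 + 1) / (3 + 1) = 2, log2(2) = 1.
weak = log2_fold_change(7.0, 3.0)
```

With a threshold of 2, the first neogene would count as downregulated upon knock-down and the second would not.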

[0434] Based on their observations in Ewing sarcoma, aRMS and DSRCT, the inventors propose the concept of fusion-driven neogenes, i.e. tumor-specific neogenes that

[0435] 1) are specifically associated with the fusion-driven tumor type,

[0436] 2) depend on the chimeric transcription factors, as shown by the regulation of their expression in cell lines according to the activity of the chimeric transcription factor, and

[0437] 3) have evidence of physical binding of the chimeric transcription factor near their TSS. Additional evidence is available for Ewing fusion-driven neogenes, with the presence of GGAA microsatellites in the binding sites for EWS-FLI1 as well as decreased chromatin activation marks in EWS-FLI1-low conditions. The numbers of fusion-driven neogenes satisfying all criteria are 12/25 in Ewing sarcoma, 3/36 in aRMS and 16/37 in DSRCT (see Table 5 for details of classification).

[0438] A subset of neogenes depends on the chimeric transcription factor for expression in knock-down or over-expression experiments but lacks clear evidence of binding by ChIP-seq. This may be due to many factors, including the sensitivity of ChIP-seq, long-range regulatory interactions, or indirect regulation. The inventors thus consider such neogenes fusion-dependent.
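The three criteria and the resulting categories can be summarized as a small decision rule. This is a sketch under stated assumptions: the function name and boolean inputs are hypothetical, and the label "fusion-associated" for neogenes that meet only the specificity criterion is an assumed reading of the terminology, not a definition given in this passage.

```python
def classify_neogene(tumor_specific, depends_on_fusion, fusion_bound_near_tss):
    """Assign a neogene to a category from three boolean lines of evidence.

    tumor_specific: specifically associated with the fusion-driven tumor type
    depends_on_fusion: expression follows the chimeric factor in cell lines
    fusion_bound_near_tss: ChIP-seq evidence of binding near the TSS
    """
    if not tumor_specific:
        return "not tumor-specific"
    if depends_on_fusion and fusion_bound_near_tss:
        return "fusion-driven"      # all three criteria satisfied
    if depends_on_fusion:
        return "fusion-dependent"   # regulated, but no clear binding evidence
    return "fusion-associated"      # assumed label: specificity only


labels = [classify_neogene(True, True, True),
          classify_neogene(True, True, False),
          classify_neogene(True, False, False)]
```

A neogene with ChIP-seq binding but no demonstrated dependence falls into the last category in this sketch, since binding alone does not show regulation.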

[0439] Based on the detailed analysis in these three fusion-driven sarcomas, the inventors hypothesized that this mechanism of chimeric transcription factor-driven (or dependent) neogenes could be a recurrent phenomenon in fusion-driven sarcomas and other cancers. To test this hypothesis, they took advantage of their large institutional database of RNA-seq for sarcomas (FIG. 2A, Methods). They studied 238 cases of 17 additional types of fusion-driven cancer (all sarcomas except for midline carcinoma and TFE3 renal cell carcinoma). Analysis of this cohort (cf. Methods) identified specific neogenes for all but congenital fibrosarcoma (data not shown), with between 2 and 47 tumor-specific neogenes per tumor type (table 6). Overall, the neogenes identified are mostly multi-exonic, typically have consensus splice sites and do not show evidence of sequence conservation across vertebrates. As the inventors have only clinical samples for these types of cancer, they cannot distinguish between fusion-driven, -dependent, and -associated neogenes. Interestingly, in their selection of tumor-specific neogenes, the inventors found that 5/15 neogenes found in clear cell sarcoma (CCS) were also highly expressed (mean TPM>10) in angiomatoid fibrous histiocytoma (AFH), and conversely 15/20 AFH neogenes were highly expressed in CCS. This sharing of neogenes between two types of tumors was also found for 7 neogenes in alveolar soft part sarcoma (ASPS) and TFE3 renal cell carcinoma (TFE3). These pairs of fusion-driven cancers share the same chimeric transcription factors in most cases (EWSR1-ATF1/CREB1 for AFH/CCS and ASPSCR1-TFE3 for ASPS/TFE3), strongly suggesting that the chimeric transcription factor is indeed a determining influence in the induction of these neogenes. Finally, it is noteworthy that fusion-driven cancers characterized by fusion genes that do not express chimeric sequence-specific DNA-binding transcription factors, such as synovial sarcoma (SSX-SS18) and congenital fibrosarcoma (ETV6-NTRK3), have only 3 and 0 tumor-specific neogenes respectively, potentially because the fusion protein is not able to directly induce strictly fusion-driven neogenes in these cases.

[0440] The inventors believe that, as in their 3 model sarcomas, some of the neogenes may be tumor-associated lncRNAs induced by a mechanism independent of the chimeric transcription factor (cf. MiTranscriptome). However, for fusion-driven cancers which have significantly more of these specific neogenes, they hypothesize that some of these neogenes may be chimeric transcription factor-driven or dependent, paralleling their observations in the 3 model sarcomas and extending this concept to all other fusion-driven cancers, at least those with chimeric DNA-binding transcription factors.

[0441] Finally, open reading frames of the transcripts identified as described above were predicted, and peptides predicted to be presented by the most frequent MHC class I complexes were synthesized for the JUJE transcripts (see tables 6-8).
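The ORF prediction step can be illustrated with a minimal sketch. The tool actually used for ORF prediction is not named in this passage, so the code below is an assumption: it scans the three forward reading frames of a transcript for ATG-to-stop ORFs using the standard genetic code, then lists the 9-residue windows of each ORF as candidate peptides. The function names, the minimum ORF length and the peptide length of 9 are hypothetical choices, and no HLA binding prediction is attempted here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def find_orfs(transcript, min_aa=9):
    """Translate ATG-to-stop ORFs in the three forward frames.

    ORFs that run off the end of the transcript without a stop codon
    are discarded in this sketch.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        protein, in_orf = [], False
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous codons
            if not in_orf:
                if aa == "M":
                    protein, in_orf = ["M"], True
            elif aa == "*":
                if len(protein) >= min_aa:
                    orfs.append("".join(protein))
                in_orf = False
            else:
                protein.append(aa)
    return orfs


def candidate_peptides(protein, length=9):
    """All overlapping windows of the given length along a protein."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


toy_transcript = "GG" + "ATG" + "GCT" * 9 + "TAA"  # toy sequence, not real data
toy_orfs = find_orfs(toy_transcript)
toy_peptides = candidate_peptides(toy_orfs[0])
```

In practice each candidate peptide would then be scored for presentation by the HLA alleles of interest with a dedicated prediction tool, which this sketch leaves out.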

[0442] Those neopeptides predicted to be presented in the context of the most frequent HLA-A2 molecule (tables 6-8) were further tested using the tetramer technology with synthetic peptides.

[0443] Soluble dye-labelled tetramers combined with synthesized candidate peptides were generated, assessed by flow cytometry and then incubated with CD3+, CD8+ T cells. Double-positive T cells were further analysed using staining with CCR7 and CD45RA. This analysis showed that double-positive naïve T cells could be detected for most of the tested peptides.

[0444] FIG. 4 shows the principle of the tetramer approach and preliminary results obtained with a peptide derived from an ORF of a chromosome 10 EWS-FLI1-regulated lincRNA (i.e., an unannotated transcript specifically regulated by the EWS-FLI1 transcription factor fusion). Selected T cells were then expanded and tested in co-culture with professional antigen-presenting cells (APCs) presenting the synthesized neopeptides. FIG. 5 shows cytokine secretion by CD8 T cells specific for the peptides (SEQ ID XX-XX) encoded by EWS-FLI1-regulated lincRNAs (initially identified by PacBio sequencing) in response to increasing concentrations of their specific antigen, presented by K562 antigen-presenting cells. The results presented herein show that the clones responded to presentation of the synthetic neopeptides at different concentrations (FIG. 5).

[0445] Therefore, the results provide evidence that neopeptides encoded by unannotated transcripts whose expression is (i) regulated by a transcription factor fusion and (ii) specifically associated with the fusion-driven tumor type can not only be recognized by naïve T cells but can also drive their activation when presented by antigen-presenting cells.