Predicting immunogenicity of T cell epitopes

11222711 · 2022-01-11

Abstract

The present invention relates to methods for predicting T cell epitopes. In particular, the present invention relates to methods for predicting whether modifications in peptides or polypeptides such as tumor-associated neoantigens are immunogenic or not. The methods of the invention are useful, in particular, for the provision of vaccines which are specific for a patient's tumor and, thus, in the context of personalized cancer vaccines.

Claims

1. A method for producing a personalized cancer vaccine for a subject having a tumor, the method comprising the steps of: identifying from nucleic acid and/or peptide sequence data that is obtained from tumor and normal cells of a subject, a plurality of modified peptides expressed in the tumor, each comprising an amino acid substitution at a position, relative to a corresponding parent peptide expressed in the normal cells; ascertaining, for each of the plurality of modified peptides, via a computer-based analytical process each of: a) a first score for binding of the modified peptide to one or more MHC molecules by using a peptide:MHC binding predictive model, wherein the first score is indicative of a likelihood of the modified peptide binding to one or more MHC molecules, and b) a second score for binding of the corresponding parent peptide to the one or more MHC molecules by using the peptide:MHC binding predictive model, wherein the second score is indicative of a likelihood of the corresponding parent peptide binding to one or more MHC molecules, and thus the likelihood that TCRs binding the corresponding parent peptide would have been deleted during the subject's development, and c) a third score for chemical and physical dissimilarity between the amino acids at the position in the parent and modified peptides by referencing a substitution matrix that describes a rate at which one amino acid in a sequence changes over evolutionary time to determine the probability of observing the amino acid substitution over evolutionary time, and thus identifying those modified peptides in which the substituted amino acid has sufficiently different physico-chemical properties from the parent amino acid such that the subject's TCR repertoire is able to detect the modified peptide; and selecting, via a computer-based analytical process, from the plurality of modified peptides, as immunogenic to the subject's TCR repertoire at least one candidate modified peptide with respect to which: (i) the first score satisfies a first pre-determined threshold indicating binding to the one or more MHC molecules such that the candidate modified peptide is MHC-presented; and (ii) the second score satisfies a second pre-determined threshold indicating binding to the one or more MHC molecules such that TCRs binding the corresponding parent peptide would have been deleted during the subject's development; and (iii) the third score for at least one amino acid substitution in the candidate modified peptide indicates a lower probability of observing the amino acid substitution over evolutionary time than at least one other modified peptides in the plurality; and producing a personalized cancer vaccine for the subject, which comprises a peptide or polypeptide comprising the at least one candidate modified peptide selected as immunogenic to the subject's TCR repertoire or a nucleic acid encoding the peptide or polypeptide.

2. The method of claim 1 further comprising the step: providing the vaccine to a patient in need thereof.

3. The method of claim 1 wherein the parent peptide has a germline amino acid at the position.

4. The method of claim 1 wherein the parent peptide and the modified peptide are each 8 to 15 amino acids in length.

5. The method of claim 1 wherein the first score and the second score are each ascertained for binding to more than one MHC molecule, and the MHC molecules comprise different MHC molecule types corresponding to different MHC alleles.

6. The method of claim 1 wherein the one or more MHC molecules are MHC class I molecules.

7. The method of claim 1 wherein the first score and/or the second score is/are ascertained by a computer-based analytical process comprising a sequence comparison with a database of MHC-binding motifs.

8. The method of claim 1 wherein the threshold applied with respect to the first score is different from the threshold applied with respect to the second score.

9. The method of claim 1 wherein the first and/or second pre-determined threshold reflects a probability for binding to one or more MHC molecules.

10. The method of claim 1 wherein the substitution matrix is or comprises an evolutionary based log-odds matrix.

11. The method of claim 1 wherein the modified peptide is selected as immunogenic if: (i) the parent peptide binds to one or more MHC molecules in standard assays, and (ii) the modified peptide binds to said one or more MHC molecules in standard assays, and (iii) the amino acids at the position in the modified and parent peptides are chemically and physically dissimilar.

12. The method of claim 1 wherein the position of the amino acid substitution is not an anchor position for binding to one or more MHC molecules.

13. The method of claim 1 wherein the position of the amino acid substitution is an anchor position for binding to one or more MHC molecules.

14. The method of claim 1, wherein the plurality of modified peptides comprises two or more different modified peptides comprising the same amino acid substitution.

15. The method of claim 14 wherein the two or more different modified peptides comprising the same amino acid substitution comprise different fragments of a modified protein, said different fragments comprising the same amino acid substitution present in the modified protein.

16. The method of claim 14 wherein the two or more different modified peptides comprising the same amino acid substitution comprise all potential MHC binding fragments of a modified protein, said fragments comprising the same amino acid substitution present in the protein.

17. The method of claim 14 wherein the vaccine comprises at least one candidate modified peptides selected from the two or more different modified peptides comprising the same amino acid substitution.

18. The method of claim 14 wherein the two or more different modified peptides comprising the same amino acid substitution differ in length.

19. The method of claim 1, wherein the plurality of modified peptides comprises two or more different modified peptides each comprising a different amino acid substitution and the different amino acid substitutions are present in the same protein.

20. The method of claim 14 which comprises comparing the first scores of two or more of said different modified peptides.

21. The method of claim 20 wherein the first score is weighted higher than the third score.

22. The method of claim 1 further comprising, prior to the step of identifying from the sequence data the plurality of modified peptides expressed in the tumor cells, identifying one or more non-synonymous mutations in one or more protein-coding regions encoding at least one of the modified peptides.

23. The method of claim 1 wherein the amino acid substitution is identified by sequencing a whole or partial genome or transcriptome of one or more cells.

24. The method of claim 22 wherein said mutations are somatic mutations.

25. The method of claim 22 wherein said mutations are cancer mutations.

26. The method of claim 1 wherein the one or more MHC molecules are MHC class II molecules.

27. The method of claim 1 wherein the first and the second scores are ascertained for binding to more than one MHC molecule, and the more than one MHC molecule comprise one or more MHC class I molecules and one or more MHC class II molecules.

28. The method of claim 1 wherein the subject is a human individual.

29. The method of claim 1 further comprising the step of determining that the tumor cells express the modified peptide.

30. The method of claim 1 further comprising the step of determining that the normal cells express the parent peptide.

31. The method of claim 29 further comprising the step of administering the vaccine to the subject.

32. The method of claim 30 further comprising the step of administering the vaccine to the subject, wherein the subject has a tumor that expresses the modified peptide.

33. The method of claim 14 wherein the two or more different modified peptides comprising the same amino acid substitution differ in the position of the amino acid substitution.

34. The method of claim 1, wherein the step of selecting comprises ranking at least two or more modified peptides in the plurality with reference to one or more of: their respective first scores, their respective second scores, and their respective third scores, thereby selecting, based on the ranking, at least one candidate modified peptide from the plurality of modified peptides that is more immunogenic to the subject's TCR repertoire than at least one other modified peptides in the plurality.

Description

FIGURES

(1) FIG. 1. MHC binding prediction overview

(2) FIG. 2. Analysis of immunogenicity as a function of the M.sub.mut score for 50 prioritized B16F10 mutations and 82 prioritized CT26.WT mutations (132 mutations in total), of which 30 were immunogenic. All vaccinations were performed with RNA. For B16F10, immunogenicity was assayed by challenging BMDCs with RNA and measuring the immune response of splenocytes with ELISPOT and FACS. For CT26.WT, immunogenicity was assayed by challenging BMDCs with RNA and peptides separately and measuring the immune response of splenocytes with ELISPOT; a mutation was considered immunogenic if either peptide or RNA registered an immune response. A Cumulative distribution of immunogenic mutations as a function of the M.sub.mut score. The graph shows the total number of mutations below a given M.sub.mut score (red), of these, the number of mutations that were immunogenic (blue), and the percent of immunogenic mutations from the total (black). B Histogram of percent of immunogenic mutations per M.sub.mut bin for the following ranges: ≤0.3, (0.3, 1], >1. Errors shown are standard errors.

(3) FIG. 3. Analysis of B16F10 and CT26.WT immunogenicity as a function of M.sub.mut. Cumulative distribution of immunogenic mutations as a function of the M.sub.mut score for B16 (A) and CT26 (C). Histogram of percent of immunogenic mutations per M.sub.mut bin for the following ranges: [0.1, 0.3], (0.3, 1], (1,∞) for B16 (B) and CT26 (D). Figures A and B are based on analysis of 50 B16F10 prioritized mutations, of which 12 were immunogenic. Figures C and D are based on analysis of 82 CT26.WT prioritized mutations, of which 18 were immunogenic. For more details see legend of FIG. 2. Errors are standard errors.

(4) FIG. 4. Models of immunogenicity and control hypotheses. Class I immunogenicity, denoted by H.sub.A, makes the assumption that both the WT and MUT epitopes are presented by cells, and that the mutation sufficiently altered the physico-chemical properties of the amino acid so that the immune system registers this change and generates an immune response (denoted by the lightning bolt). The H.sub.n hypothesis, serving as a control for H.sub.A, is simply the inverted H.sub.A hypothesis, namely, that the mutation did not significantly alter the physico-chemical properties of the amino acid and therefore has a lower likelihood of being “detected” by the immune system and generating an immune response. In class II immunogenicity (H.sub.B U H.sub.C) the WT epitope is not presented but the MUT epitope is presented. H.sub.B and H.sub.C are distinguished by high (T>τ) versus low (T≤τ) T scores, respectively. Note that for α*=α, the H.sub.BC1 model for immunogenicity (M.sub.mut<β) is a composite of all four groups: H.sub.BC1=U[H.sub.A,H.sub.B,H.sub.C,H.sub.n].

(5) FIG. 5. Hypothesized relation of the T score to immunogenicity. According to the class I immunogenicity model, during T cell development TCRs that bound strongly to the wild type epitope were deleted. Extant TCRs should exhibit only weak or no binding affinity to the wild type epitope (A). Epitopes that contain an amino acid substitution that has a high T score have similar physico-chemical properties to the wild type amino acid and therefore will likely have little impact on the binding affinity to extant TCRs (B). Epitopes that contain an amino acid substitution with a low T score have a greater chance to increase the binding affinity to extant TCRs and therefore a greater likelihood to be immunogenic (C). In this schematic illustration, color coding is used to pair T cells with a matching peptide. Orange/yellow mutations represent mutations with high T scores (similar to the WT), whereas blue/purple mutations represent mutations with low T scores (significant physico-chemical difference compared to the WT).

(6) FIG. 6. Cumulative distribution of immunogenic mutations as a function of M.sub.mut. A Comparison of the percent of immunogenic mutations that satisfy the baseline control hypothesis H.sub.BC1: {M.sub.mut≤β}, with the percent of immunogenic mutations that satisfy the partial hypothesis H.sub.A′: H.sub.BC1∩{T≤τ}, the partial hypothesis H.sub.BC2: H.sub.BC1∩{M.sub.wt≤α} and the full hypothesis H.sub.A: H.sub.BC1∩{M.sub.wt≤α}∩{T≤τ}, for α=1, τ=1. B Comparison of the percent of immunogenic mutations that satisfy the baseline control hypothesis H.sub.BC1: {M.sub.mut≤β} with the percent of immunogenic mutations that satisfy the inverse partial hypotheses: H.sub.BC1∩{T>τ} and H.sub.BC1∩{M.sub.wt>α}. The analyses in A and B are based on the pooled B16F10 and CT26.WT datasets, comprising 132 mutations, of which 30 were immunogenic. Each data point in the graphs is based on ≥4 mutations.

(7) FIG. 7. Cumulative distribution of immunogenic mutations as a function of the M.sub.mut score. Comparison of the percent of immunogenic mutations that satisfy the baseline control hypothesis H.sub.BC1: {M.sub.mut≤β}, with the percent of immunogenic mutations that satisfy the partial hypothesis H.sub.A′: H.sub.BC1∩{T≤τ}, the partial hypothesis H.sub.BC2: H.sub.BC1∩{M.sub.wt≤α} and the full hypothesis H.sub.A: H.sub.BC1∩{M.sub.wt≤α}∩{T≤τ}, given α=1, τ=1 for B16 (A) and CT26 (C). Comparison of the percent of immunogenic mutations that satisfy the baseline control hypothesis H.sub.BC1: {M.sub.mut≤β} with the percent of immunogenic mutations that satisfy the inverse partial hypotheses: H.sub.BC1∩{T>τ} and H.sub.BC1∩{M.sub.wt>α} for B16 (B) and CT26 (D). Figures A and B are based on analysis of the 50 B16F10 prioritized mutations, of which 12 were immunogenic. Figures C and D are based on the 82 CT26.WT prioritized mutations, of which 18 were immunogenic. Each data point in the graphs is based on ≥4 mutations.

(8) FIG. 8. Controlling for WT immunogenicity. To check whether omitting MUT+/WT+ solutions had an impact on these findings we excluded from the dataset 9 MUT+/WT+ mutations and 2 mutations for which the WT has not been measured, leaving in total 121 mutations (43 B16 and 78 CT26) of which 19 were MUT+/WT− (5 for B16 and 14 for CT26). We again found the same trends as in the complete dataset, namely, a highly non-linear response as a function of the M.sub.mut score, superiority of the H.sub.A hypothesis over the partial hypotheses, and inferiority of the inverted hypotheses compared to the baseline control H.sub.BC1. A Cumulative distribution of immunogenicity as a function of the M.sub.mut score. B Histogram of percent of immunogenic mutations per M.sub.mut bin. C Comparison of the percent of immunogenic mutations that satisfy the baseline control hypothesis H.sub.BC1 with H.sub.A′, H.sub.BC2 and H.sub.A. D Comparison of the percent of immunogenic mutations that satisfy the baseline control hypothesis H.sub.BC1 with the inverse hypotheses. See FIG. 5 legend for additional details.

(9) FIG. 9. Fraction of immunogenic mutations as a function of RPKM. Red: all 50 B16 mutations and 82 CT26 mutations with no filtering (132 mutations in total). Blue: mutations passing the H.sub.A hypothesis with α=1, β=0.5, τ=1. B. Percent of immunogenic mutations for different RPKM ranges with no filtering. RPKM bins are: 1=(0,1],2=(1,5],3=(5,50],4=(50,∞). C Percent of immunogenic mutations for different RPKM ranges under the H.sub.A hypothesis with α=1, β=0.5, τ=1. RPKM bins are: 1=(0,1], 2=(1, ∞). Errors are S.E.

(10) FIG. 10. Anchor and non-anchor position mutated class II immunogenic epitopes. Anchor position motifs were analyzed using SYFPEITHI.

(11) FIG. 11. Proposed models for immunogenic tumor-associated epitopes.

(12) FIG. 12. Example of a method for weighing rank position of mutations. For each mutation the rank position in the list of ranked mutations can be further weighed by the number of solutions for which the combination of HLA types for the patient, possible window lengths for the HLA type and mutation position within the epitope resulted in a solution with low M.sub.mut or resulted in a H.sub.A and/or H.sub.BUH.sub.C classification. Since all solutions per mutation potentially can be presented in parallel, this weighing factor may be an important contributor to the rank position of the mutation.

(13) FIG. 13. Example of scatter plot of all epitope solutions for mutation chr14_52837882 from CT26 against M.sub.mut and ΔM=M.sub.mut−M.sub.wt.

EXAMPLES

(14) The techniques and methods used herein are described herein or carried out in a manner known per se and as described, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2.sup.nd Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. All methods including the use of kits and reagents are carried out according to the manufacturers' information unless specifically indicated.

Example 1

Establishing a Model for Predicting Immunogenicity of T Cell Epitopes

(15) Previously we explored the immunogenicity of 50 somatic mutations identified in the B16F10 murine melanoma cell line (J. C. Castle et al., Exploiting the mutanome for tumor vaccination. Cancer Research 72, 1081 (2012)). These 50 mutations were selected from a pool of 563 expressed nonsynonymous somatic mutations primarily to maximize MHC class I expression (J. C. Castle et al., Exploiting the mutanome for tumor vaccination. Cancer Research 72, 1081 (2012)) (see also Example 2). For each mutation we predicted the minimal epitope, i.e., the epitope scoring the lowest MHC class I consensus score (Y. Kim et al., Nucleic Acids Research 40, W525 (2012)) (defined here as M.sub.mut) when searching the space of all possible MHC class I alleles, potential epitope lengths and sequence windows (where to position the mutation) (J. C. Castle et al., Exploiting the mutanome for tumor vaccination. Cancer Research 72, 1081 (2012)). Measuring the immunogenicity of these mutations using RNA vaccination followed by peptide readout (see Example 2) confirmed earlier findings using peptide vaccination (J. C. Castle et al., Exploiting the mutanome for tumor vaccination. Cancer Research 72, 1081 (2012)), and showed that only 12 out of 50 mutations (24%) were immunogenic (Table 1), with MUT+/WT− sequences comprising only 10% of all mutations tested.
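
The search for the minimal epitope described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the default length range of 8 to 11 residues, and the `score_fn` callback are hypothetical, and `score_fn` stands in for whatever MHC class I consensus predictor is actually used (lower score meaning stronger predicted binding).

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def windows_containing(protein: str, pos: int,
                       lengths: Iterable[int]) -> Iterator[Tuple[int, str]]:
    """Yield (start, peptide) for every window of the given lengths
    that covers the 0-based mutated position `pos`."""
    for k in lengths:
        first = max(0, pos - k + 1)          # leftmost start that still covers pos
        last = min(pos, len(protein) - k)    # rightmost start that fits in the protein
        for start in range(first, last + 1):
            yield start, protein[start:start + k]


def minimal_epitope(protein: str, pos: int, alleles: Iterable[str],
                    lengths: Iterable[int] = range(8, 12),  # assumed length range
                    score_fn: Callable[[str, str], float] = None
                    ) -> Optional[Tuple[float, str, str]]:
    """Return (score, allele, peptide) with the lowest score over all
    alleles, lengths and windows; `score_fn` is a hypothetical predictor."""
    lengths = list(lengths)
    best = None
    for allele in alleles:
        for _, peptide in windows_containing(protein, pos, lengths):
            score = score_fn(peptide, allele)
            if best is None or score < best[0]:
                best = (score, allele, peptide)
    return best
```

The lowest score returned by such a search plays the role of M.sub.mut for the mutation.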

(16) TABLE-US-00001 TABLE 1. Number of immunogenic mutations after RNA vaccination of B16F10 and CT26.WT murine strains. The columns +/+, +/−, −/+ and −/− give MUT/WT immunogenicity.

Strain | +/+ | +/− | −/+ | −/− | Total | % immunogenic (MUT+/total)
B16F10 | 7   | 5   | 0   | 38  | 50    | 24%
CT26   | 2*  | 14* | 1   | 63  | 80    | 20%
Total  | 9   | 19  | 1   | 101 | 130   | 21.50%

*Two CT26 MUT+ mutations were excluded from this table because their WT reactivity has not been measured yet. In total there were 18 MUT+ mutations out of 82 CT26 mutations measured thus far, resulting in a 22% success rate.

(17) The results of the B16F10 murine test case demonstrate that naively selecting expressed nonsynonymous mutations with low M.sub.mut scores (≤3.9) yields rather low success rates for predicting immunogenicity. Hence a better understanding of the mechanisms driving immunogenicity is required if personalized vaccines targeting tumor-specific neoantigens are to become effective therapies. In an effort to uncover additional variables that contribute to immunogenicity we explored the immunogenicity of expressed nonsynonymous somatic mutations identified in a colorectal murine cell line, CT26.WT. In total, 96 mutations were selected based on their M.sub.mut scores (low vs. high), mean RPKM (low vs. high), and cellular localization (intra- vs. extra-cellular), and tested for immunogenicity using RNA vaccination with both peptide and RNA readout (see Example 2 for further details). Together with the B16F10 cell line, our dataset comprised 132 epitopes, whose immunogenicity was measured ex vivo on murine splenocytes.

(18) The MHC consensus score. To investigate the dependence of immunogenicity on M.sub.mut, we plotted the cumulative percent of immunogenic mutations as a function of M.sub.mut, that is, the percent of mutations with an M.sub.mut score smaller than a given threshold (denoted by β) that were immunogenic. An analysis of the combined B16 and CT26 datasets spanning a total of 132 mutations reveals a highly nonlinear dependence of the immunogenicity success rate on M.sub.mut (FIG. 2A). FIG. 2A shows that immunogenic mutations are enriched for extremely low M.sub.mut scores (≤˜0.2). For M.sub.mut≤0.1 the percent of immunogenic mutations peaks at ˜60%, and quickly decays as M.sub.mut increases, dropping below ˜25% for M.sub.mut≥2. The percent of immunogenic mutations with M.sub.mut≤0.3 versus >0.3 was 44.4% compared with 17.1%, a statistically significant difference (P value=0.004, Fisher's exact test, one tailed). A histogram of the percent of immunogenic mutations for three M.sub.mut bins: ≤0.3, (0.3, 1] and >1 shows that the percent of immunogenic mutations drops as M.sub.mut increases (FIG. 2B). The differences between the success rate of the lowest bin (M.sub.mut≤0.3), 44.4%, and both the central bin, 20.7%, and the highest bin (M.sub.mut>1), 15.8%, in FIG. 2B were statistically significant (P values=0.05 and 0.004, respectively, Fisher's exact test, one tailed), indicating that for M.sub.mut>˜0.3 the success rate drops in a statistically significant manner. A similar trend in the success rate is also observed when analyzing B16 and CT26 mutanomes separately (FIG. 3).
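
The cumulative curve described above can be computed along the following lines. This is a minimal sketch: the function name and the (M.sub.mut, immunogenic) tuple layout are assumptions, and the records in the usage note are made-up placeholders, not measured data.

```python
from typing import Iterable, List, Sequence, Tuple


def cumulative_percent_immunogenic(
        mutations: Sequence[Tuple[float, bool]],
        thresholds: Iterable[float]) -> List[Tuple[float, int, int, float]]:
    """For each threshold beta, count the mutations with M_mut <= beta and
    report (beta, n_total, n_immunogenic, percent_immunogenic)."""
    rows = []
    for beta in thresholds:
        selected = [immunogenic for m_mut, immunogenic in mutations if m_mut <= beta]
        n_total = len(selected)
        n_immunogenic = sum(selected)
        percent = 100.0 * n_immunogenic / n_total if n_total else float("nan")
        rows.append((beta, n_total, n_immunogenic, percent))
    return rows
```

For example, calling it on the placeholder list `[(0.1, True), (0.1, False), (0.2, True), (0.5, False), (1.5, False), (3.0, True)]` with thresholds `[0.1, 0.3, 1, 3.9]` gives one row per threshold, each row corresponding to one point of a curve like the one in FIG. 2A.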

(19) Thus far our criteria for selecting mutations focused on presentation, and we have seen that restricting the MHC binding score of the mutated epitope allows prediction of immunogenic epitopes with up to 60% precision. Presentation, however, is a necessary but not sufficient condition to induce immunogenicity. By identifying additional criteria for TCR recognition we may be able to further improve the precision of our prediction. We hypothesized two mutually exclusive mechanisms for driving immunogenicity, which we refer to as the class I and class II immunogenicity models.

(20) Class I immunogenicity. In order for the TCR repertoire to recognize a mutated epitope and generate an immune response we hypothesized that three conditions must be satisfied (H.sub.A in FIG. 4): (i) the wild type epitope, at some point during the development of the organism, was presented to the immune system leading to deletion of matching TCRs via strong TCR/pMHC binding, (ii) the mutated epitope is presented, and (iii) the physico-chemical properties of the mutated amino acid are sufficiently “different” from the wild type amino acid (by some metric that we shall define below) so that the TCR repertoire is able to “detect” or “register” this substitution. Conditions (i) and (ii) ensure that the immune system is actually exposed to the change, i.e., the mutation. Condition (iii) requires that the mutation significantly change the physico-chemical character of the wild type amino acid so that the binding affinity of the mutated epitope to extant (undeleted) TCRs potentially increases, thereby turning on the signaling cascade that leads to an immune response (FIG. 5).

(21) The TCR recognition score. The class I immunogenicity model requires a metric to estimate the physico-chemical difference between two amino acids. It is well known in molecular evolution that amino acids that interchange frequently are likely to have chemical and physical similarities whereas amino acids that interchange rarely are likely to have different physico-chemical properties. The likelihood for a given substitution to occur in nature compared with the likelihood for this substitution to occur by chance is measured by log-odds matrices. The patterns observed in log-odds matrices imposed by natural selection “reflect the similarity of the functions of the amino acid residues in their weak interactions with one another in the three dimensional conformation of proteins” (M. O. Dayhoff, R. M. Schwartz, B. C. Orcutt, A model for evolutionary change. M O Dayhoff, ed. Atlas of protein sequence and structure Vol. 5, 345 (1978)). We therefore used evolutionary based log-odds matrices, which we refer to here as “T scores” to reflect TCR recognition, as effective scoring matrices for cancer associated amino acid substitutions. Substitutions with positive T scores (i.e., log-odds) are likely to occur in nature, and hence correspond to two amino acids that have similar physico-chemical properties. The class I model predicts that substitutions with positive T scores would have a lower likelihood of being immunogenic. Conversely, substitutions with negative T scores reflect substitutions that are unlikely to occur in nature and hence correspond to two amino acids that have significantly different physico-chemical properties. According to our model, such substitutions would have a greater chance of being immunogenic. We compared different methods of estimating log-odds matrices and found results to be largely robust to the exact method chosen. The maximum likelihood (ML) based estimation approach known as WAG (S. Whelan, N. Goldman, Molecular biology and evolution 18, 691 (2001)), using a PAM (point accepted mutation) distance of 250, appeared to separate predicted immunogenic from non-immunogenic mutations best, and therefore we present results with this matrix (see Example 2 for further details).
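
Looking up a T score for a single substitution can be sketched as below. The helper names are hypothetical, and the dictionary holds only a few WAG250 values that appear in Table 3 of this document (V/D, I/M, L/I, E/D); a real analysis would load the complete 20×20 matrix. Treating the matrix as symmetric is an assumption of this sketch.

```python
from typing import Dict, FrozenSet, Tuple

# Partial WAG250 log-odds table; values copied from Table 3 of this document.
# Keys are unordered amino acid pairs (symmetry is assumed here).
T_SCORES: Dict[FrozenSet[str], float] = {
    frozenset("VD"): -2.7,  # PBK:   WT V, MUT D
    frozenset("IM"): 2.6,   # FAT1:  WT I, MUT M
    frozenset("LI"): 2.7,   # Als2:  WT L, MUT I
    frozenset("ED"): 2.9,   # H2-Q8: WT E, MUT D
}


def substitution(wt: str, mut: str) -> Tuple[int, str, str]:
    """Return (0-based position, wild type residue, mutated residue) for two
    equal-length epitopes that differ at exactly one position."""
    if len(wt) != len(mut):
        raise ValueError("epitopes must have the same length")
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(wt, mut)) if a != b]
    if len(diffs) != 1:
        raise ValueError("expected exactly one substitution")
    return diffs[0]


def t_score(wt: str, mut: str,
            matrix: Dict[FrozenSet[str], float] = T_SCORES) -> float:
    """Look up the log-odds (T) score of the single substitution between wt and mut."""
    _, a, b = substitution(wt, mut)
    return matrix[frozenset((a, b))]
```

With these entries, `t_score("AAVILRVALHM", "AAVILRDALHM")` returns the −2.7 listed for the PBK epitope, a strongly negative score that the class I model associates with a higher chance of immunogenicity.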

(22) Class II immunogenicity. In the class II model for immunogenicity we hypothesize that a mutation is likely to be immunogenic if the immune system has never before seen the wild type epitope, and is therefore challenged by the mutated epitope. Therefore in order for a mutation to be immunogenic in this model we hypothesized that two conditions must be satisfied: (i) the wild-type epitope was never presented to the immune system, (ii) the mutated peptide is presented. These conditions can be co-satisfied if, for example, the mutation hits an anchor position thereby changing a “nonbinder” epitope into a “binder”. Formally, class II immunogenicity can be separated into two sub-hypotheses: high T scores (H.sub.B in FIG. 4) and low T scores (H.sub.C in FIG. 4). However, since the assumption is that the wild type epitope is not presented, the nature of the amino acid substitution is not expected to have an impact on TCR recognition and we shall therefore equate class II immunogenicity with the united hypothesis: H.sub.B U H.sub.C.

(23) Testing class I immunogenicity. The assumptions of class I immunogenicity (H.sub.A in FIG. 4) can be restated mathematically as follows: we require that the wild type epitope is presented (M.sub.wt≤α), the mutated epitope is presented (M.sub.mut≤β), and the amino acid substitution is non-trivial (T≤τ), where M.sub.wt is defined as the MHC consensus score of the epitope obtained from the mutated epitope (same HLA allele and window length) by replacing the mutated amino acid with the wild type amino acid, and T denotes the T score. Since all three conditions are necessary, we expect that the precision of the H.sub.A classifier will be higher compared to a classifier based on M.sub.mut alone (H.sub.BC1 in FIG. 4) or compared to the partial hypotheses: H.sub.BC1∩{M.sub.wt≤α} and H.sub.BC1∩{T≤τ}. We therefore calculated the percent of immunogenic mutations (number of true positives divided by the sum of true positives and false positives) as a function of β for H.sub.BC1, for the partial hypothesis H.sub.A′: H.sub.BC1∩{T≤τ} and for the partial hypothesis H.sub.BC2: H.sub.BC1∩{M.sub.wt≤α}. We found that a conservative threshold for τ in the range of ≈0.5 to 1 performed best (the range of the WAG250 matrix is from −5.1 (F⇄G substitution) to +5.4 (F⇄Y substitution)). We also found that α can be restricted conservatively compared to β, setting α≈1. FIG. 6A indeed shows that, when considering the pooled mutanome of B16 and CT26, classifiers based on H.sub.A′, H.sub.BC2, and H.sub.A attained greater precision than the baseline control hypothesis H.sub.BC1. Moreover, a classifier based on the complete hypothesis H.sub.A attained greater precision than the partial hypotheses H.sub.A′ and H.sub.BC2, thereby demonstrating an additive effect. The same conclusions hold when analyzing the B16 and CT26 datasets separately (FIG. 7).
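
The four classifiers compared here, and the precision measure (true positives over true plus false positives), can be written down as in the sketch below. The class and function names are hypothetical. The six records are a small subset of rows copied from Table 3, so the precisions this subset yields are for illustration only and are not results of the study.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Mutation:
    name: str
    m_mut: float       # MHC consensus score of the mutated epitope
    m_wt: float        # MHC consensus score of the wild type epitope
    t: float           # T score (log-odds) of the substitution
    immunogenic: bool  # measured response to the mutated epitope


def h_bc1(m: Mutation, beta: float) -> bool:
    return m.m_mut <= beta

def h_a_prime(m: Mutation, beta: float, tau: float) -> bool:
    return h_bc1(m, beta) and m.t <= tau

def h_bc2(m: Mutation, beta: float, alpha: float) -> bool:
    return h_bc1(m, beta) and m.m_wt <= alpha

def h_a(m: Mutation, beta: float, alpha: float, tau: float) -> bool:
    return h_bc1(m, beta) and m.m_wt <= alpha and m.t <= tau


def precision(records: Iterable[Mutation],
              predicate: Callable[[Mutation], bool]) -> float:
    """True positives / (true positives + false positives) among selected records."""
    selected = [m for m in records if predicate(m)]
    if not selected:
        return float("nan")
    return sum(m.immunogenic for m in selected) / len(selected)


# Illustrative subset of Table 3 rows (not the full dataset).
SUBSET = [
    Mutation("PBK", 0.1, 0.1, -2.7, True),
    Mutation("Nphp3", 0.1, 0.1, -0.5, False),
    Mutation("ATP11A", 0.1, 0.8, -0.1, True),
    Mutation("Pcdhga8", 0.1, 2.7, -2.1, False),
    Mutation("FAT1", 0.1, 0.1, 2.6, False),
    Mutation("Als2", 0.2, 0.1, 2.7, True),
]
```

A call such as `precision(SUBSET, lambda m: h_a(m, beta=0.25, alpha=1, tau=1))` evaluates the full hypothesis; swapping in `h_bc1`, `h_a_prime` or `h_bc2` evaluates the baseline and partial hypotheses on the same records.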

(24) Since the conditions M.sub.wt≤α and T≤τ are postulated to be necessary conditions for immunogenicity, one would expect that a classifier based on either the condition H.sub.BC1∩{T>τ} or the condition H.sub.BC1∩{M.sub.wt>α} (i.e., negating the secondary condition) would perform worse than H.sub.BC1. Indeed we found that this is the case for B16 and CT26 when analyzed together (FIG. 6B) or separately (FIG. 7). Therefore we conclude that the B16 and CT26 datasets support the H.sub.A hypothesis both together and separately. Omitting mutations where the WT RNA also showed reactivity did not affect these conclusions (FIG. 8).

(25) Controlling for the H.sub.A hypothesis. Although mutations with high T scores may still be immunogenic, a hypothesis that enriches for such mutations should statistically enrich for non-immunogenic mutations. Therefore if we compare the H.sub.A hypothesis (H.sub.BC2∩{T≤τ}) with its inverse, H.sub.BC2∩{T>τ} (H.sub.n in FIG. 4), we should observe a statistically significant depletion of immunogenic mutations. Table 2 indeed shows that for M.sub.mut≤β=0.5, M.sub.wt≤α=1, and T≤τ=1, H.sub.A outperforms H.sub.n, with a success rate of 52.4% (n=21) compared to 21.4% (n=14; P=0.068, one tailed Fisher's exact test).

(26) TABLE-US-00002 TABLE 2. Percent of immunogenic mutations under various hypotheses based on the B16 and CT26 pooled datasets comprising 133 mutations.

Hypothesis      | M.sub.mut threshold (β) | M.sub.wt threshold (α) | T score threshold (τ) | % of immunogenic mutations
H.sub.BC1       | ≤2   |    |    | 22/83 (26.5%)
H.sub.BC1       | ≤1   |    |    | 18/56 (32.1%)
H.sub.BC1       | ≤0.1 |    |    | 6/10 (60%)
H.sub.A′        | ≤0.1 |    | ≤1 | 6/9 (66.7%)
H.sub.BC2       | ≤0.1 | ≤1 |    | 6/9 (66.7%)
H.sub.A         | ≤0.1 | ≤1 | ≤1 | 6/8 (75%)
H.sub.A′        | ≤0.2 |    | ≤1 | 8/14 (57.1%)
H.sub.BC2       | ≤0.2 | ≤1 |    | 9/17 (52.9%)
H.sub.n         | ≤0.2 | ≤1 | >1 | 1/6 (16.7%)
H.sub.A         | ≤0.2 | ≤1 | ≤1 | 8/12 (66.7%)
H.sub.BC2       | ≤0.5 | ≤1 |    | 14/34 (41.2%)
H.sub.n         | ≤0.5 | ≤1 | >1 | 3/14 (21.4%)
H.sub.A         | ≤0.5 | ≤1 | ≤1 | 11/21 (52.4%)
H.sub.BUH.sub.C | ≤0.5 | >1 |    | 1/6 (16.7%)

(27) H.sub.A also performs better than the baseline control H.sub.BC2, which achieves 41.2% (n=35). As we decrease β the difference between the success rates of H.sub.A and H.sub.n becomes larger, since the more stringent the condition on β, the more false positives are removed from the H.sub.A group. For example, for β=0.25 the success rate of the H.sub.A group was 67% (n=12) compared to a success rate of 17% (n=6) for group H.sub.n (P=0.066, one tailed Fisher's exact test); see Table 3.
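
The one tailed Fisher's exact tests quoted above can be checked with a few lines of standard-library code. The function name is hypothetical, and the sketch assumes the tail of interest is "group 1 has at least as many immunogenic mutations as observed". The counts are the H.sub.A and H.sub.n entries of Table 2.

```python
from math import comb


def fisher_one_tailed(k1: int, n1: int, k2: int, n2: int) -> float:
    """One tailed Fisher's exact test for a 2x2 table.

    k1 of n1 successes in group 1, k2 of n2 in group 2. Returns the
    hypergeometric probability of seeing k1 or more successes in group 1
    when the margins are held fixed.
    """
    total = n1 + n2
    successes = k1 + k2
    upper = min(n1, successes)
    tail = sum(comb(successes, k) * comb(total - successes, n1 - k)
               for k in range(k1, upper + 1))
    return tail / comb(total, n1)


print(round(fisher_one_tailed(11, 21, 3, 14), 3))  # H_A 11/21 vs H_n 3/14 -> 0.068
print(round(fisher_one_tailed(8, 12, 1, 6), 3))    # H_A 8/12 vs H_n 1/6  -> 0.066
```

Both values agree with the P values reported in the text.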

(28) TABLE 3 Ranked list of 133 measured B16F10/CT26.WT mutations that satisfy the basic control hypothesis H.sub.BC1 (M.sub.mut ≤ 0.25) broken down into the three disjoint hypothesis classes: H.sub.A hypothesis for immunogenic mutations (M.sub.wt ≤ 0.8, T ≤ 0.5), H.sub.n/inverse H.sub.A hypothesis enriching for non-immunogenic mutations (M.sub.wt ≤ 0.8, T > 0.5), and H.sub.BUH.sub.C hypothesis for immunogenic mutations (M.sub.wt > 0.8). H.sub.A and H.sub.BUH.sub.C candidates are proposed to be ranked based on the relative importance of distinguishing variables. For H.sub.A the proposed order is: M.sub.mut (descending) .fwdarw. T score (descending) .fwdarw. M.sub.wt (descending). For H.sub.BUH.sub.C the proposed order is: M.sub.mut (descending) .fwdarw. M.sub.wt (ascending). Errors are standard errors.

Sample | Mut | Response | RNA response (MUT) | RNA response (WT) | Symbol (Ingenuity) | MHC I allele | Epitope (MUT) | Epitope (WT) | Mean expression | M.sub.mut | M.sub.wt | T score (WAG250)

Class I immunogenicity (H.sub.A): 67 ± 14% success rate
B16 | 33 | CD8 | yes | no | PBK | H-2-Db | AAVILRDALHM | AAVILRVALHM | 19.6 | 0.1 | 0.1 | −2.7
CT26 | 14 |  | no | no | Nphp3 | H-2-Dd | GGPGSEKSL | GGPGSGKSL | 0.6 | 0.1 | 0.1 | −0.5
B16 | 37 |  | no | no | DPF2 | H-2-Db | LALPNNYCDV | LALPNNYCDF | 20.0 | 0.1 | 0.1 | −0.1
B16 | 25 | CD4 | yes | yes | PLOD2 | H-2-Db | SHLNNDVWQI | SHLNNDFWQI | 21.7 | 0.1 | 0.1 | −0.1
CT26 | 37 | CD4 | yes | no | Dhx35 | H-2-Kd | YYMRDVIAI | YYMRDVTAI | 5.5 | 0.1 | 0.1 | −0.1
CT26 | 26 | CD8 | yes | no | E2f8 | H-2-Kd | TYLQPAQAQM | IYLQPAQAQM | 29.5 | 0.1 | 0.1 | −0.1
B16 | 21 | CD4 | yes | no | ATP11A | H-2-Db | QSLGFTYL | QRLGFTYL | 42.7 | 0.1 | 0.8 | −0.1
CT26 | 13 |  | yes | no | Nphp3 | H-2-Kd | EYWASRALDS | EYWASRALGS | 0.5 | 0.1 | 0.1 | 0.1
CT26 | 16 |  | no | no | H2-Q8 | H-2-Kd | GYLQFAYEGC | GYLQFAYEGR | 5.1 | 0.2 | 0.2 | −1.7
B16 | 46 | CD4 | yes | yes | ACTN4 | H-2-Kb | VTFQAFIDVMS | VTFQAFIDFMS | 125.6 | 0.2 | 0.1 | −0.1
CT26 | 15 |  | no | no | Slc41a2 | H-2-Kd | PYLTALDDLL | PYLTALGDLL | 2.8 | 0.2 | 0.2 | 0.1
CT26 | 27 |  | yes |  | Agxt2l2 | H-2-Dd | AGGLFVADAI | AGGLFVADEI | 29.7 | 0.2 | 0.3 | 0.1

Class II immunogenicity (H.sub.BUH.sub.C): 0% success rate
CT26 | 25 |  | no | no | Pcdhga8 | H-2-Dd | VGINFLQSYQ | VGINSLQSYQ | 26.2 | 0.1 | 2.7 | −2.1
CT26 | 1 |  | no | no | Gm8909 | H-2-Dd | TRPARDGTF | TRPAGDGTF | 11.0 | 0.2 | 1.6 | −1.0
CT26 | 40 |  | no | no | Zfp449 | H-2-Ld | EPQIAMDDM | EPQIDMDDM | 0.7 | 0.25 | 1.25 | −0.2

H.sub.n: 17 ± 15% success rate
B16 | 49 |  | no | no | FAT1 | H-2-Db | IAMQNTTQL | IAIQNTTQL | 18.8 | 0.1 | 0.1 | 2.6
B16 | 36 |  | no | no | TM9SF3 | H-2-Kb | AIYHHASRAI | AIYYHASRAI | 51.1 | 0.2 | 0.3 | 2.7
CT26 | 39 | CD8 | yes | no | Als2 | H-2-Kd | SYIALVDKNI | SYLALVDKNI | 4.7 | 0.2 | 0.1 | 2.7
CT26 | 2 |  | no | no | Snap47 | H-2-Dd | VIPILEMQF | VIPILEVQF | 15.5 | 0.2 | 0.2 | 1.8
CT26 | 17 |  | no | no | H2-Q8 | H-2-Kd | GYLQFAYDGR | GYLQFAYEGR | 5.1 | 0.2 | 0.2 | 2.9
CT26 | 38 |  | no |  | Cspp1 | H-2-Kd | VYLNLLLKFT | VYLNLFLKFT | 0.7 | 0.2 | 0.1 | 2.0
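As an illustrative check of the P value quoted for the H.sub.A versus H.sub.n comparison, the following Python sketch computes a one-tailed Fisher's exact test from a 2×2 contingency table using only the standard library. The counts (8 responders among 12 H.sub.A candidates and 1 responder among 6 H.sub.n candidates) are read off Table 3 under the assumption that a "yes" in the RNA response (MUT) column counts as a success; the function and argument names are hypothetical and are not part of the described method.

```python
from math import comb

def fisher_one_tailed(pos_a, neg_a, pos_b, neg_b):
    """One-tailed Fisher's exact test: probability that group A contains
    at least pos_a positives when all table margins are held fixed."""
    total = pos_a + neg_a + pos_b + neg_b
    positives = pos_a + pos_b      # all immunogenic mutations
    size_a = pos_a + neg_a         # size of group A
    upper = min(positives, size_a)
    tail = sum(
        comb(positives, x) * comb(total - positives, size_a - x)
        for x in range(pos_a, upper + 1)
    )
    return tail / comb(total, size_a)

# Assumed counts from Table 3: H_A 8 of 12 immunogenic, H_n 1 of 6 immunogenic.
p_value = fisher_one_tailed(pos_a=8, neg_a=4, pos_b=1, neg_b=5)
print(round(p_value, 3))  # 0.066
```

With these counts the tail probability is 1218/18564, which rounds to the reported P=0.066.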

(29) An example of additional weighting factors that may further improve immunogenicity ranking is given in Example 3.

(30) More generally, the list of mutations that satisfy H.sub.BC1 (M.sub.mut≤β) can be classified into three categories: H.sub.A, H.sub.n, and H.sub.BUH.sub.C (Table 3), where H.sub.A enriches for immunogenic mutations and H.sub.n enriches for non-immunogenic mutations. In the case of B16 and CT26, all three candidates in the H.sub.BUH.sub.C group were non-immunogenic, contrary to our expectation. However, if a more realistic threshold α* for M.sub.wt is chosen such that α*>>α, then there would be no predictions that could be tested for H.sub.BUH.sub.C.
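The three-way classification can be sketched in a few lines of Python. This is a minimal illustration that assumes the thresholds stated in the Table 3 caption (α=0.8, β=0.25, τ=0.5); the function name and the string labels are hypothetical.

```python
def classify(m_mut, m_wt, t_score, alpha=0.8, beta=0.25, tau=0.5):
    """Assign a mutation to one of the disjoint classes of Table 3.

    Returns None when the basic control hypothesis H_BC1 (M_mut <= beta)
    is not satisfied.
    """
    if m_mut > beta:
        return None
    if m_wt > alpha:
        return "H_BUH_C"   # wild type epitope predicted not to be presented
    return "H_A" if t_score <= tau else "H_n"

# Scores taken from three rows of Table 3.
print(classify(0.1, 0.1, -2.7))  # PBK      -> H_A
print(classify(0.1, 0.1, 2.6))   # FAT1     -> H_n
print(classify(0.1, 2.7, -2.1))  # Pcdhga8  -> H_BUH_C
```

Passing a larger value for alpha in the second test (for example the α* discussed above) would move borderline H.sub.BUH.sub.C candidates out of that class; in this sketch they would then fall through to the T score test, which is one of several possible design choices.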

(31) Maximal precision of immunogenicity classifiers. According to Table 1 the average success rate for predicting immunogenicity in the combined B16 and CT26 datasets was 22.7% (=30/132). By applying the most stringent threshold on the M.sub.mut score (β=0.1), the precision of an immunogenicity classifier increases to 60% (=6/10; H.sub.BC1 in Table 2). By combining H.sub.BC1 with either the M.sub.wt≤α condition or the T≤τ condition (α=1, τ=1), precision is increased to 66.7% (=6/9). The H.sub.A based classifier, which combines both criteria, results in an additive response, which increases the precision to 75% (=6/8) (Table 2).

(32) B16 epitope MUT33. The H.sub.A-class epitope that was ranked the highest by all evolutionary models (except the PAM matrix) in the pooled B16/CT26 dataset was B16's MUT33 (see Table 3). Further analysis revealed that MUT33 indeed invoked an MHC class I restricted CD8+ response and exhibited ex vivo immunogenicity against the minimal predicted epitope (data not shown).

(33) Role of gene expression. Plotting the fraction of immunogenic mutations (no. of immunogenic mutations with RPKM values below a given threshold over the total no. of immunogenic mutations) as a function of RPKM values for B16 and CT26 indicates that this ratio somewhat stagnates at very low RPKM values (FIG. 9A). This effect is observed whether the H.sub.A criterion is applied or not. Plotting the percent immunogenic mutations for different RPKM bins (FIG. 9B and C) suggests that RPKM values≤˜1 have a somewhat lower success rate (both with and without applying the H.sub.A filtering hypothesis). Although suggestive, these results are within the range of error.

(34) Survey of published CD8+ epitopes. We were next interested to see whether published T cell-defined tumor antigens with single amino acid substitutions eliciting a CD8+ restricted response fulfilled our models for immunogenicity. Of the 17 epitopes that were published (P. Van der Bruggen, V. Stroobant, N. Vigneron, B. Van den Eynde, Cancer Immun, http://www.cancerimmunity.org/peptide/, 2013) (Table 4), five satisfied the criteria for H.sub.A (α=0.7, β=0.2, τ=0.5), four satisfied the criteria for H.sub.CUH.sub.B (α=2.2, β=0.4), and two satisfied the H.sub.n criterion (α=0.6, β=0.3, τ=1.7).

(35) TABLE 4 Published epitopes with single amino acid substitution generating CD8+ responses. See Example 2 for list of references. Anchor position mutations in the H.sub.BUH.sub.C group are highlighted in red.

M.sub.mut | M.sub.wt | T score* | Gene | MUT epitope | WT epitope | Source | Note

Hypothesis H.sub.A (thresholds: M.sub.mut ≤0.2, M.sub.wt ≤0.7, T score ≤0.5)
0.10 | 0.10 | −2.1 | SIRT2 | KIFSEVTLK | KIFSEVTPK | 1 | ←MZ7-MEL
0.10 | 0.10 | −0.1 | SNRPD | SHETVIIEL | SHETVTIEL | 1 | ←MZ7-MEL
0.14 | 0.22 | 0.5 | ME1 | FLDEFMEGV | FLDEFMEAV | 2 |
0.20 | 0.70 | −1.0 | RBAF600 | RPHVPESAF | GPHVPESAF | 1 | ←MZ7-MEL
0.15 | 0.20 | −2.1 | PRDX5 | LLLDDLLVSI | LLLDDSLVSI | 3 |

Hypothesis H.sub.n (thresholds: M.sub.mut ≤0.3, M.sub.wt ≤0.6, T score ≥1.7)
0.25 | 0.20 | 1.7 | N-ras | ILDTAGREEY | ILDTAGQEEY | 4 |
0.30 | 0.60 | 1.9 | EEF2 | ETVSEQSNV | ETVSEESNV | 10 |

Hypothesis H.sub.B (thresholds: M.sub.mut ≤0.25, M.sub.wt ≥2.25, T score >0.5)
0.15 | 10.20 | 1.2 | EFTUD2 | KILDAVVAQK | KILDAVVAQE | 1 | ←MZ7-MEL
0.25 | 2.25 | 1.2 | MYO1B | KINKNPKYK | EINKNPKYK | 5 |

Hypothesis H.sub.C (thresholds: M.sub.mut <0.4, M.sub.wt >10, T score ≤0.5)
0.20 | 10.95 | −2.3 | FLT3 | YVDFREYEYY | YVDFREYEYD | 6 |
0.38 | 10.58 | −2.1 | CTNNB1 | SYLDSGIHF | SYLDSGIHS | 7 |

Other (thresholds: M.sub.mut ≥0.65, M.sub.wt ≥1)
0.65 | 2.70 | −2.1 | OS-9 | KELEGILLL | KELEGILLP | 8 |
1.20 | 1.00 | 0.1 | GPNMB | TLDWLLQTPK | TLGWLLQTPK | 1 | ←MZ7-MEL
2.68 | 3.58 | 1.0 | ACTN4 | FIASNGVKLV | FIASKGVKLV | 9 |
4.20 | 4.60 | −1.2 | P53 | VVPCEPPEV | VVPYEPPEV | 11 |
26.40 | 29.75 | −1.7 | CDK4 | ACDPHSGHFV | ARDPHSGHFV | 12 |
33.30 | 29.80 | −2.9 | K-ras | VVVGAVGVG | VVVGAGGVG | 13 |

*Based on WAG250 log-odds matrix, color legend: T ≤ 0.5, 0.5 < T ≤ 1, T > 1

(36) Thus, the H.sub.A and H.sub.CUH.sub.B hypotheses together accounted for roughly 50% of the published epitopes. Interestingly, 3 out of the 4 published epitopes that satisfied the H.sub.CUH.sub.B condition (red boxes in Table 4) had an M.sub.wt score that was larger than 10 due to anchor position mutations (FIG. 10). Since the requirement for the H.sub.CUH.sub.B hypothesis is that the probability that any cell presents the wild type epitope during the development of the organism is kept negligibly small, it is expected that the threshold for M.sub.wt should be kept high, i.e., α*>>α. Indeed, when increasing α from 0.8 to >3 the false positives for B16/CT26 in Table 3 disappear. Therefore a more realistic threshold for M.sub.wt under the H.sub.CUH.sub.B hypothesis may be somewhere between 3 and 10.

(37) The MZ7-MEL cell line. To test the ability of our immunogenicity models to predict immunogenic epitopes in a human tumor model setting, we explored the MZ7-MEL cell line, established in 1988 from a splenic metastasis of a patient with malignant melanoma (V. Lennerz et al., Proceedings of the National Academy of Sciences of the United States of America 102, 16013 (2005)). Screening of a cDNA library from MZ7-MEL cells with autologous tumor-reactive T cells revealed at least five neoantigens capable of generating CD8+ responses (V. Lennerz et al., Proceedings of the National Academy of Sciences of the United States of America 102, 16013 (2005)). This constitutes the largest set of CD8+ neoantigens derived from a patient to date. Applying our immunogenicity models to these epitopes we found that three neoantigens were classified as H.sub.A epitopes, and one neoantigen, an anchor position mutation, was classified as an H.sub.BUH.sub.C epitope (arrows in Table 4, and FIG. 10). Thus, four of the five epitopes could be explained by our immunogenicity models.

(38) To test our ability to predict these epitopes de novo in the MZ7-MEL cell line we sequenced the exome of the MZ7-MEL cell line (see Methods). In total 743 expressed nonsynonymous mutations were identified. All five mutations previously identified by Lennerz et al. (V. Lennerz et al., Proceedings of the National Academy of Sciences of the United States of America 102, 16013 (2005)) were found. We then calculated for each mutation the T score, M.sub.mut and M.sub.wt, reporting also the HLA allele and epitope that resulted in the minimal MHC consensus score for the given mutation. Mutations were classified into one of three groups: H.sub.A, H.sub.BUH.sub.C, and H.sub.n using the thresholds α=0.8, β=0.2, τ=0.5 (plus the condition RPKM>0.2), and then ranked based on their potential to be immunogenic, as explained in Table 3. We found that out of 743 mutations, 32 mutations satisfied the H.sub.A criteria (Table 5), 12 satisfied the H.sub.BUH.sub.C criterion (Table 6) and 15 satisfied the H.sub.n criterion.

(39) TABLE 5 H.sub.A-classified MZ7-MEL cell mutations. 32 of the 743 expressed nonsynonymous mutations in MZ7-MEL were classified as H.sub.A-immunogenic using the thresholds: α = 0.8, β = 0.2 and τ = 0.5. Rank is based on an M.sub.mut (descending) .fwdarw. T score (descending) sorting scheme. Immunogenic neoantigens identified by Lennerz et al. are highlighted in yellow. In addition RPKM was required to exceed 0.2.

Rank* | Gene | M.sub.mut | T score** | M.sub.wt | Mean exp | Note
1 | DPH2 | 0.1 | −2.9 | 0.1 | 10.7 |
1 | ADHFE1 | 0.1 | −2.9 | 0.1 | 2.7 |
2 | DDX41 | 0.1 | −2.1 | 0.1 | 24.8 |
2 | SIRT2 | 0.1 | −2.1 | 0.1 | 15.7 | ←MZ7-MEL
3 | PRIC285 | 0.1 | −1.0 | 0.4 | 5.4 |
4 | CSTF3 | 0.1 | −0.5 | 0.1 | 11.2 |
4 | ETFDH | 0.1 | −0.5 | 0.1 | 10.8 |
5 | MED12 | 0.1 | −0.1 | 0.1 | 21.9 |
5 | SNRPD1 | 0.1 | −0.1 | 0.1 | 18.0 | ←MZ7-MEL
5 | MLLT6 | 0.1 | −0.1 | 0.2 | 14.7 |
5 | AFAP1 | 0.1 | −0.1 | 0.2 | 5.1 |
6 | MAP3K11 | 0.1 | 0.1 | 0.1 | 41.9 |
7 | DHX30 | 0.1 | 0.3 | 0.1 | 34.4 |
7 | ALK | 0.1 | 0.3 | 0.1 | 0.4 |
7 | CHMP4B | 0.1 | 0.3 | 0.7 | 52.8 |
8 | HADHB | 0.1 | 0.5 | 0.1 | 60.6 |
8 | SUPT6H | 0.1 | 0.5 | 0.1 | 25.4 |
8 | C12orf35 | 0.1 | 0.5 | 0.1 | 3.1 |
8 | ZDHHC5 | 0.1 | 0.5 | 0.4 | 27.6 |
9 | WIPF1 | 0.15 | −2.1 | 0.15 | 37.2 |
9 | ZNF740 | 0.15 | −2.1 | 0.5 | 9.7 |
10 | MLL | 0.15 | 0.5 | 0.3 | 3.7 |
11 | KIAA1715 | 0.2 | −2.1 | 0.1 | 6.3 |
11 | CHD8 | 0.2 | −2.1 | 0.2 | 10.3 |
12 | DDX28 | 0.2 | −1.7 | 0.2 | 7.2 |
13 | MAPK1IP1L | 0.2 | −1.0 | 0.2 | 12.1 |
13 | UBR4 (RBAF600) | 0.2 | −1.0 | 0.7 | 11.8 | ←MZ7-MEL
14 | TRAK2 | 0.2 | −0.1 | 0.5 | 21.8 |
15 | MAEA | 0.2 | 0.0 | 0.25 | 25.3 |
16 | KLHL13 | 0.2 | 0.1 | 0.2 | 3.4 |
17 | FOSL2 | 0.2 | 0.3 | 0.2 | 9.4 |
18 | UTRN | 0.2 | 0.5 | 0.15 | 6.3 |

*Rank is based on M.sub.mut and the T score
**T score is based on the WAG250 log-odds matrix

(40) TABLE 6 H.sub.BUH.sub.C-classified MZ7-MEL cell mutations. 12 of the 743 expressed nonsynonymous mutations in MZ7-MEL were classified as H.sub.BUH.sub.C-immunogenic using the thresholds: α* = 0.8, β = 0.2 and RPKM >2. Rank is based on a M.sub.mut (descending) .fwdarw. M.sub.wt (ascending) sorting scheme. Immunogenic neoantigens identified by Lennerz et al. are highlighted in yellow.

Rank* | Gene | M.sub.mut | M.sub.wt | Mean exp | Note
1 | NF1 | 0.1 | 1.4 | 6.8 |
2 | MESP2 | 0.1 | 1.3 | 0.3 |
3 | EFTUD2 (SNRP116) | 0.15 | 10.2 | 22.0 | ←MZ7-MEL
4 | SEC31A | 0.15 | 2.55 | 33.3 |
5 | ZNF335 | 0.2 | 18.35 | 8.3 |
6 | CPEB1 | 0.2 | 4.2 | 6.0 |
7 | UBAC2 | 0.2 | 2.8 | 7.5 |
8 | FANCF | 0.2 | 2.4 | 3.9 |
8 | RAPH1 | 0.2 | 2.4 | 3.0 |
9 | ZNF557 | 0.2 | 1.4 | 1.5 |
10 | TLK2 | 0.2 | 1.35 | 2.9 |
11 | ST7 | 0.2 | 1.15 | 10.9 |

*Rank is based on M.sub.mut and M.sub.wt

(41) Of the 32 mutations classified as H.sub.A, the three H.sub.A-class mutations identified by Lennerz et al. (SIRT2, SNRPD1 and RBAF600) were ranked in 2.sup.nd, 4.sup.th and 13.sup.th positions out of 18 rank-classes, using an M.sub.mut.fwdarw.T score ranking scheme (see Table 3). Of the 12 mutations classified as H.sub.BUH.sub.C, the fourth Lennerz et al. mutation (SNRP116) ranked in the 3.sup.rd position. Moreover, if a higher (more realistic) threshold for M.sub.wt was employed (e.g., α*˜5) then the fourth Lennerz et al. mutation is ranked in the 1.sup.st position (together with just one additional anchor position mutation; see Table 7). Finally, the four Lennerz et al. mutations were predicted to have the correct HLA allele, epitope length and mutation position as reported by the authors.
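The notion of rank-classes, in which mutations with identical scores share one rank, can be illustrated with the short Python sketch below. It assumes that the row order of Table 5 corresponds to ascending numeric values of M.sub.mut and then of the T score, which is how the rows appear there; the function name and the dictionary keys are hypothetical, and only five rows of Table 5 are used.

```python
from itertools import groupby

def rank_classes(mutations):
    """Sort by (M_mut, T score) and give tied mutations the same rank class."""
    key = lambda m: (m["m_mut"], m["t"])
    ordered = sorted(mutations, key=key)   # stable sort keeps input order within ties
    ranked = []
    for rank, (_, group) in enumerate(groupby(ordered, key=key), start=1):
        ranked.extend((rank, m["gene"]) for m in group)
    return ranked

# Five rows taken from Table 5 (a subset, so the ranks differ from the full table).
rows = [
    {"gene": "SIRT2", "m_mut": 0.1, "t": -2.1},
    {"gene": "DPH2", "m_mut": 0.1, "t": -2.9},
    {"gene": "WIPF1", "m_mut": 0.15, "t": -2.1},
    {"gene": "DDX41", "m_mut": 0.1, "t": -2.1},
    {"gene": "ADHFE1", "m_mut": 0.1, "t": -2.9},
]
for rank, gene in rank_classes(rows):
    print(rank, gene)
```

In this subset DPH2 and ADHFE1 share rank class 1, SIRT2 and DDX41 share rank class 2, and WIPF1 falls into rank class 3.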

(42) TABLE 7 H.sub.BUH.sub.C-classified MZ7-MEL cell mutations. 2 of the 743 expressed nonsynonymous mutations in MZ7-MEL were classified as H.sub.BUH.sub.C-immunogenic using the thresholds: α* = 5, β = 0.2 and RPKM >2. Rank is based on a M.sub.mut (descending) .fwdarw. M.sub.wt (ascending) sorting scheme. Immunogenic neoantigens identified by Lennerz et al. are highlighted in yellow.

Rank* | Gene | M.sub.mut | M.sub.wt | Mean exp | Note
1 | EFTUD2 (SNRP116) | 0.15 | 10.2 | 22.0 | ←MZ7-MEL
2 | ZNF335 | 0.2 | 18.35 | 8.3 |

*Rank is based on M.sub.mut and M.sub.wt

(43) Conclusions

(44) The analysis of the B16 and CT26 datasets supports a model where immunogenicity is conferred if three conditions are satisfied: the wild type peptide is presented, the mutated peptide is presented, and the amino acid substitution has a sufficiently low log-odds score (FIG. 11A). This model for immunogenicity, which we refer to as class I immunogenicity, is further supported in the human melanoma cell line model, MZ7-MEL. The MZ7-MEL model and published CD8+ restricted neoantigens support a second model, which we refer to as class II immunogenicity, in which the wild type epitope is not presented, but a substitution (e.g., in an anchor position) leads to a significant increase in the MHC consensus score (>5 to 10), resulting in a novel, never-before-seen epitope (FIG. 11B). This framework for defining immunogenicity is captured with a three-variable classification scheme (M.sub.mut, M.sub.wt, T score). Using this classification scheme we were able to reduce the 743 MZ7-MEL mutations to a list of 34 mutations, with 3 of the 5 Lennerz et al. epitopes ranking in the top 5 classes.

(45) Table 7 demonstrates that class II immunogenic mutations are rare. Out of 743 mutations only 2 were classified as class II immunogenic (using a realistic threshold for M.sub.wt) compared with roughly 30 class I immunogenic mutations. A paucity of H.sub.BUH.sub.C-class mutations was also observed in the mouse melanoma models (Table 8). This observation underscores the importance of class I immunogenic mutations for personalized vaccines, since these are expected to be the dominant type of mutation found in patient samples that can be used for vaccination. At the same time, the fact that one of the five epitopes found by Lennerz et al. was class II immunogenic may indicate that class II immunogenic mutations are more potent or somehow selected by the immune system.

(46) TABLE 8 Number of candidate H.sub.A and H.sub.BUH.sub.C mutations in different tumor models. (α = 0.8, α* = 5, β = 0.2, τ = 0.5).

Strain | Hypothesis H.sub.A | Hypothesis H.sub.BUH.sub.C
B16 | 27 | 0
CT26 | 13 | 0
MZ7-MEL | 35 | 2

Example 2

Materials and Methods

(47) The materials and methods used in Example 1 are described below:

(48) Animals

(49) C57BL/6J and Balb/cJ mice (CRL) were kept in accordance with federal and state policies on animal research at the University of Mainz.

(50) Cells for Melanoma and Colorectal Murine Tumor Model

(51) B16F10 melanoma cell line (Product: ATCC CRL-6475, Lot Number: 58078645) and CT26.WT colon carcinoma cell line (Product: ATCC CRL-2638, Lot Number: 58494154) were purchased in 2010 from the American Type Culture Collection. Early (3rd, 4th) passages of cells were used for sequencing experiments. Cells were routinely tested for Mycoplasma. Re-authentication of cells has not been performed since receipt. The MZ7-MEL cell line (established January 1988) and an autologous Epstein-Barr virus-transformed B cell line were obtained from Dr. Thomas Wölfel (Department of Medicine, Hematology Oncology, Johannes Gutenberg University).

(52) Synthetic Peptides

(53) Peptides were purchased from Jerini Peptide Technologies (Berlin, Germany) or synthesized by the TRON peptide facility. Synthetic peptides were 27 amino acids long with the mutated (MUT) or wild-type (WT) amino acid at position 14.

(54) Immunization of Mice

(55) Age-matched female C57BL/6 or Balb/c mice were injected intravenously with 20 μg in vitro transcribed mRNA formulated with 20 μl Lipofectamine™ RNAiMAX (Invitrogen) in PBS in a total injection volume of 200 μl (3 mice per group). The mice were immunized on day 0, 3, 7, 14 and 18. Twenty-three days after the initial injection mice were sacrificed and splenocytes were isolated for immunological testing (see ELISPOT assay). DNA-sequences representing one (Monoepitope) or two mutations (Biepitope) were constructed using the sequence of 27 amino acids (aa) with the mutation on position 14 and cloned into the pST1-2BgUTR-A120 backbone (S. Holtkamp et al., Blood 108, 4009 (2006)). In vitro transcription from this template and purification were previously described (S. Kreiter et al., Cancer Immunology, Immunotherapy 56, 1577 (2007)).

(56) Enzyme-Linked Immunospot Assay

(57) Enzyme-linked immunospot (ELISPOT) assay (S. Kreiter et al., Cancer Research 70, 9031 (2010)) and generation of syngeneic bone marrow derived dendritic cells (BMDCs) as stimulators were previously described (L. MB et al., J. Immunol. Methods 223, 77 (1999)). For the B16F10 model BMDCs were peptide pulsed (6 μg/ml), with the indicated mutation, the corresponding wild-type or with control peptide (VSV-NP). For the CT26 model in addition to the restimulation with peptides BMDCs were transfected with the corresponding in vitro transcribed mRNA and used for restimulation, as well. For the assay, 5×10.sup.4 BMDCs were coincubated with 5×10.sup.5 freshly isolated splenocytes in a microtiter plate coated with anti-IFN-γ antibody (10 μg/mL, clone AN18; Mabtech). After 18 hours at 37° C., cytokine secretion was detected with an anti-IFN-γ antibody (clone R4-6A2; Mabtech). Spot numbers were counted and analyzed with the ImmunoSpot® S5 Versa ELISPOT Analyzer, the ImmunoCapture™ Image Acquisition software and the ImmunoSpot® Analysis software Version 5. Statistical analysis was done by student's t-test and Mann-Whitney test (non-parametric test). Responses were considered significant with a p-value <0.05.

(58) Intracellular Cytokine Assay

(59) Aliquots of the splenocytes prepared for the ELISPOT assay were subjected to analysis of cytokine production by intracellular flow cytometry. To this end 2×10.sup.6 splenocytes per sample were plated in culture medium (RPMI+10% FCS) supplemented with the Golgi inhibitor Brefeldin A (10 μg/mL) in a 96-well plate. Cells from each animal were restimulated for 5 h at 37° C. with 2×10.sup.5 peptide pulsed or RNA-transfected BMDCs. After incubation the cells were washed with PBS, resuspended in 50 μl PBS and extracellularly stained with the following anti-mouse antibodies for 20 min at 4° C.: anti-CD4 FITC, anti-CD8 APC-Cy7 (BD Pharmingen). After incubation the cells were washed with PBS and subsequently resuspended in 100 μl Cytofix/Cytoperm (BD Bioscience) solution for 20 min at 4° C. for permeabilization of the outer membrane. After permeabilization the cells were washed with Perm/Wash-Buffer (BD Bioscience), resuspended in 50 μL/sample in Perm/Wash-Buffer and intracellularly stained with the following anti-mouse antibodies for 30 min at 4° C.: anti-IFN-γ PE, anti-TNF-α PE-Cy7, anti-IL2 APC (BD Pharmingen). After washing with Perm/Wash-Buffer the cells were resuspended in PBS containing 1% paraformaldehyde for flow cytometry analysis. The samples were analyzed using a BD FACSCanto™ II cytometer and FlowJo (Version 7.6.3).

(60) Next Generation Sequencing

(61) Nucleic acid extraction: DNA and RNA from bulk cells and DNA from mouse tissues were extracted using Qiagen DNeasy Blood and Tissue kit (for DNA) and Qiagen RNeasy Micro kit (for RNA).

(62) DNA exome sequencing: Exome capture for B16F10, C57BL/6J and CT26.WT and DNA re-sequencing for Balb/cJ were performed in triplicates as previously described (J. C. Castle et al., Exploiting the mutanome for tumor vaccination. Cancer Research 72, 1081 (2012)). Exome capture for MZ7-MEL/EBV-B DNA re-sequencing was performed in duplicates using Agilent XT Human all Exon 50 Mb solution-based capture assay, designed to capture all protein coding regions. 3 μg purified genomic DNA (gDNA) was fragmented to 150-200 bp using a Covaris S2 ultrasound device. Fragments were end repaired, 5′ phosphorylated and 3′ adenylated according to the manufacturer's instructions. Agilent indexing specific paired-end adapters were ligated to the gDNA fragments using a 10:1 molar ratio of adapter to gDNA. 4 cycle pre-capture amplification was done using Agilent's InPE 1.0 and SureSelect indexing pre-capture PCR primers and HerculaseII polymerase. 500 ng of adapter ligated, PCR enriched gDNA fragments were hybridized to Agilent's exome capture baits for 24 hrs at 65° C. Hybridized gDNA/RNA bait complexes were removed using streptavidin coated magnetic beads, washed and the RNA baits cleaved off during elution in SureSelect elution buffer. The eluted gDNA fragments were PCR amplified post-capture for 10 cycles using SureSelect Indexing Post-Capture PCR and index PCR primers and HerculaseII polymerase. All cleanups were done with 1.8× volume of Agencourt AMPure XP magnetic beads. All quality controls were done using Invitrogen's Qubit HS assay and fragment size was determined using Agilent's 2100 Bioanalyzer HS DNA assay. Exome enriched gDNA libraries were clustered on the cBot using Truseq SR cluster kit v2.5 using 7 pM library and 1×100 bps were sequenced on the Illumina HiSeq2000 using Truseq SBS kits.

(63) RNA gene expression profiling (RNA-Seq): Barcoded mRNA-seq cDNA libraries were prepared in duplicate, from 5 μg of total RNA (modified Illumina mRNA-seq protocol using NEB reagents). mRNA was isolated using Seramag Oligo(dT) magnetic beads (Thermo Scientific) and fragmented using divalent cations and heat. Resulting fragments (160-220 bp) were converted to cDNA using random primers and SuperScriptII (Invitrogen) followed by second strand synthesis using DNA polymerase I and RNaseH. cDNA was end repaired, 5′ phosphorylated and 3′ adenylated according to NEB RNA library kit instructions. 3′ single T-overhang Illumina multiplex specific adapters were ligated with T4 DNA ligase using a 10:1 molar ratio of adapter to cDNA insert. cDNA libraries were purified and size selected at 300 bp (E-Gel 2% SizeSelect gel, Invitrogen). Enrichment, adding of Illumina six base index and flow cell specific sequences was done by PCR using Phusion DNA polymerase and Illumina specific PCR primers. All cleanups up to this step were done with 1.8× volume of Agencourt AMPure XP magnetic beads. All quality controls were done using Invitrogen's Qubit HS assay and fragment size was determined using Agilent's 2100 Bioanalyzer HS DNA assay. Barcoded RNA-Seq libraries were clustered and 50 bps were sequenced as described above.

(64) NGS data analysis, gene expression: The output sequence reads from RNA samples were preprocessed according to the Illumina standard protocol, including filtering for low quality reads. Sequence reads were aligned to the mm9 (A. T. Chinwalla et al., Nature 420, 520 (2002)) or hg18 (F. Collins, E. Lander, J. Rogers, R. Waterston, I. Conso, Nature 431, 931 (2004)) reference genomic sequence with bowtie (version 0.12.5) (B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, Genome Biol 10, R25 (2009)). For genome alignments two mismatches were allowed and only the best alignment (“-v 2 --best”) was reported; for transcript alignments default parameters were used. Reads not alignable to the genomic sequence were aligned to a database of all possible exon-exon junction sequences of the UCSC known genes (F. Hsu et al., Bioinformatics 22, 1036 (2006)). Expression values were determined by intersecting read coordinates with those of RefSeq transcripts, counting overlapping exon and junction reads, and normalizing to RPKM expression units (Reads which map per Kilobase of exon model per million mapped reads) (A. Mortazavi, B. A. Williams, K. McCue, L. Schaeffer, B. Wold, Nature methods 5, 621 (2008)).
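The RPKM normalization named above follows directly from its definition and can be sketched as follows. The function and argument names are hypothetical, and the read counts in the usage line are invented purely to show the arithmetic.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    read_count         : reads overlapping the exons and junctions of one transcript
    exon_length_bp     : summed exon length of that transcript in base pairs
    total_mapped_reads : all mapped reads in the sample
    """
    # Dividing by (length / 1e3) and by (total / 1e6) gives the factor 1e9.
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)

# Illustrative numbers only: 500 reads on a 2 kb transcript in a 25 million read sample.
print(rpkm(read_count=500, exon_length_bp=2000, total_mapped_reads=25_000_000))  # 10.0
```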

(65) NGS data analysis, somatic mutation discovery: Somatic mutations were identified as previously described (J. C. Castle et al., Exploiting the mutanome for tumor vaccination. Cancer Research 72, 1081 (2012)). Sequence reads were aligned to the mm9 or hg18 reference genome using bwa (default options, version 0.5.8c) (H. Li, R. Durbin, Bioinformatics 25, 1754 (2009)). Ambiguous reads mapping to multiple locations of the genome were removed. Mutations were identified using a consensus of two software programs: samtools (version 0.1.8) (H. Li, Bioinformatics 27, 1157 (2011)) and SomaticSniper (A. McKenna et al., Genome Research 20, 1297 (2010)). For B16F10 and C57BL/6J, GATK was also included (A. McKenna et al., Genome Research 20, 1297 (2010)). Potential somatic variations identified in all respective replicates were assigned a “false discovery rate” (FDR) confidence value (M. Löwer et al., PLoS computational biology 8, e1002714 (2012)) (CT26 and MZ7-MEL only).

(66) Mutation Selection and Validation

(67) The criteria for selecting the 50 B16F10 mutations for immunogenicity testing were previously described (J. C. Castle et al., Exploiting the mutanome for tumor vaccination. Cancer Research 72, 1081 (2012)). These criteria for the mutations included: (i) presence in all three B16F10 replicates and absence from all C57BL/6 triplicates, (ii) occurrence in a RefSeq transcript, (iii) causing a nonsynonymous change, (iv) occurrence in B16F10-expressed genes (median RPKM across replicates >10, exon expression >0) and (v) for each mutation the M.sub.mut score (see below) was required to be <5. Of the 59 remaining mutations, the product of the quantile ranks of MHC class I score, MHC class II score and transcript expression was formed, and the first 50 mutations (0.1≤M.sub.mut≤3.9) were selected for confirmation by PCR (see J. C. Castle et al., Exploiting the mutanome for tumor vaccination. Cancer Research 72, 1081 (2012) for further details). The criteria for the 96 CT26.WT mutations selected for immunogenicity testing were further refined and included the following: (i) presence in all three CT26.WT replicates and absence from all three Balb/cJ replicates, (ii) FDR≤0.05, (iii) occurrence in a UCSC known gene transcript, (iv) causing a nonsynonymous change, (v) not present in the dbSNP database, and (vi) not in a genomic repeat region. From the remaining 493 mutations, eight 12-member groups were defined according to three features: M.sub.mut score (lowest: [0.1,1.9] versus highest: [3.9,20.3]), compartment of the protein (extra-cellular, intra-cellular), and gene expression (below versus above the median of 7.1 RPKM), selecting mutations according to a greedy algorithm, and adjusting thresholds accordingly. 94 of the resultant 96 mutations were confirmed by PCR followed by Sanger sequencing.

(68) The criteria for selecting MZ7-MEL mutations for analysis included: (i) presence in two MZ7-MEL replicates and absence from two autologous EBV-B replicates, followed by steps (ii) to (vi) described above for CT26.WT. Applying steps (i)-(vi) reduced the initial list of ˜8000 mutations to 743.

(69) MHC Binding Prediction and Calculation of the M.sub.mut Score

(70) MHC binding predictions are performed using the IEDB analysis resource Consensus tool (http://tools.immuneepitope.org/analyze/html/mhc_binding.html) (Y. Kim et al., Nucleic Acids Research 40, W525 (2012)), which combines the best performing prediction methods based on benchmarking studies (H. H. Lin, S. Ray, S. Tongchusak, E. L. Reinherz, V. Brusic, BMC immunology 9, 8 (2008); B. Peters et al., PLoS computational biology 2, e65 (2006)) from ANN (C. Lundegaard et al., Nucleic Acids Research 36, W509 (2008); M. Nielsen et al., Protein Science 12, 1007 (2009)), SMM (B. Peters, A. Sette, BMC bioinformatics 6, 132 (2005)) and for some allele models also comblib (J. Sidney et al., Immunome Research 4, 2 (2008)). The consensus approach combines the prediction scores of all tools by generating a percentile rank, which reflects the binding prediction scores of the given peptide against peptide scores of five million random peptides from SWISSPROT.
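The idea of a percentile rank against a background of random peptides can be sketched as below. This is a toy illustration and not the IEDB implementation: it assumes that lower raw scores mean stronger predicted binding (as for IC50-like outputs), that each method has its own background distribution, and that the per-method ranks are combined by taking their median. The function names, method keys and all numbers are hypothetical.

```python
from bisect import bisect_left
from statistics import median

def percentile_rank(score, background_sorted):
    """Percent of background peptides with a strictly lower (stronger) score."""
    return 100.0 * bisect_left(background_sorted, score) / len(background_sorted)

def consensus_rank(scores_by_method, backgrounds_by_method):
    """Combine per-method percentile ranks into one value (median is an assumption)."""
    ranks = [
        percentile_rank(score, backgrounds_by_method[method])
        for method, score in scores_by_method.items()
    ]
    return median(ranks)

# Tiny invented backgrounds; the real background comprises millions of random peptides.
backgrounds = {
    "ann": [50, 500, 5000, 20000],
    "smm": [10, 100, 1000, 10000],
}
scores = {"ann": 100, "smm": 1000}
print(consensus_rank(scores, backgrounds))  # 37.5
```

A low consensus value therefore marks a peptide that scores better than most random peptides under the methods considered.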

(71) For each mutation we calculated the predicted MHC consensus scores for all possible (i) sequence windows (where to position the mutation), (ii) epitope lengths and (iii) possible murine MHC class I alleles. The minimum of all MHC consensus scores was defined to be the M.sub.mut score.
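The minimization over sequence windows, epitope lengths and alleles can be sketched as follows. The predictor is passed in as a function, because the real scores come from an external prediction tool; the toy predictor, the candidate lengths of 8 to 11 residues (the range seen among the epitopes of Table 3) and all names are assumptions made for illustration.

```python
def candidate_windows(peptide, mut_index, lengths=(8, 9, 10, 11)):
    """Yield every sub-peptide of the given lengths that contains the mutated residue."""
    for length in lengths:
        first = max(0, mut_index - length + 1)
        last = min(mut_index, len(peptide) - length)
        for start in range(first, last + 1):
            yield peptide[start:start + length]

def m_score(peptide, mut_index, alleles, predict):
    """Return (minimum score, allele, window) over all windows, lengths and alleles."""
    return min(
        (predict(window, allele), allele, window)
        for allele in alleles
        for window in candidate_windows(peptide, mut_index)
    )

def toy_predict(window, allele):
    # Hypothetical stand-in for a real peptide:MHC predictor (lower = stronger binding).
    return len(window) / 10 + (0.0 if allele == "H-2-Kb" else 0.5)

peptide = "A" * 13 + "D" + "A" * 13   # 27-mer with the mutated residue at position 14
score, allele, window = m_score(peptide, 13, ["H-2-Kb", "H-2-Db"], toy_predict)
print(score, allele, window)
```

Applying the same function to the mutated and to the wild type 27-mer would give values playing the roles of M.sub.mut and M.sub.wt, and the returned allele and window correspond to the reported allele and epitope.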

(72) Calculation of Log-Odds Matrices and the T Score

(73) Log-odds matrices can be estimated from sequence alignment comparisons of large protein databases. The early log-odds matrices were based on pairwise comparison of sequences (BLOSUM62 (S. Kreiter et al., Cancer Immunology, Immunotherapy 56, 1577 (2007))) and the maximum parsimony (MP) estimation method (e.g., PAM250 (M. O. Dayhoff, R. M. Schwartz, B. C. Orcutt, A model for evolutionary change. M O Dayhoff, ed. Atlas of protein sequence and structure Vol. 5, 345 (1978)), JTT250 (S. Q. Le, O. Gascuel, Molecular biology and evolution 25, 1307 (2008)), and the Gonnet matrix (C. C. Dang, V. Lefort, V. S. Le, Q. S. Le, O. Gascuel, Bioinformatics 27, 2758 (2011))). More recently, maximum likelihood (ML) based methods were developed (e.g., VT160 (P. G. Higgs, T. K. Attwood, Bioinformatics and molecular evolution. (Wiley-Blackwell, 2009)), WAG (S. Whelan, N. Goldman, Molecular biology and evolution 18, 691 (2001)) and LG (V. Lennerz et al., Proceedings of the National Academy of Sciences of the United States of America 102, 16013 (2005))). Since ML is not limited to comparison of only closely related sequences, as is the case with MP based approaches, this estimation approach is expected to be the most accurate.

(74) Calculation of the log-odds matrix has been described in detail elsewhere (C. C. Dang, V. Lefort, V. S. Le, Q. S. Le, O. Gascuel, Bioinformatics 27, 2758 (2011)). Briefly, the standard model for amino acid substitution assumes a Markovian, time-continuous, time-reversible model represented by a 20×20 rate matrix Q, where Q.sub.ij (i≠j) is the number of substitutions from amino acid i to j per unit of time, and where diagonal elements are chosen to satisfy

(75) $$Q_{ii} = -\sum_{j \neq i} Q_{ij}.$$
Q can be decomposed such that Q.sub.ij=S.sub.ij·π.sub.j for i≠j, where S.sub.ij is a symmetric exchangeability matrix, and π.sub.i is the probability to observe amino acid i (C. C. Dang, V. Lefort, V. S. Le, Q. S. Le, O. Gascuel, Bioinformatics 27, 2758 (2011)). Finally, Q is normalized such that

(76) $$1 = -\sum_{i} \pi_i Q_{ii},$$
so that a time unit t=1.0 corresponds to 1.0 expected substitution per site, or one “accepted point mutation” per site, denoted by a PAM distance of 100 (M. O. Dayhoff, R. M. Schwartz, B. C. Orcutt, A model for evolutionary change. M O Dayhoff, ed. Atlas of protein sequence and structure Vol. 5, 345 (1978); S. Q. Le, O. Gascuel, Molecular biology and evolution 25, 1307 (2008); C. C. Dang, V. Lefort, V. S. Le, Q. S. Le, O. Gascuel, Bioinformatics 27, 2758 (2011)). The probability for amino acid i to be replaced by amino acid j after time t, Pr(i.fwdarw.j|t)=P.sub.ij(t), is given by the 20×20 probability matrix P(t)=e.sup.tQ (with notation denoting matrix exponentiation). The log-odds matrix calculated for time t is given by the log-odds 20×20 matrix

(77) $$T_{i,j} = 10 \log_{10}\left(\frac{\pi_i P_{ij}(t)}{\pi_i \pi_j}\right)$$
(M. O. Dayhoff, R. M. Schwartz, B. C. Orcutt, A model for evolutionary change. M O Dayhoff, ed. Atlas of protein sequence and structure Vol. 5, 345 (1978)). Time reversibility means that π.sub.iP.sub.ij(t)=π.sub.jP.sub.ji(t), and therefore T.sub.i,j is symmetric (P. G. Higgs, T. K. Attwood, Bioinformatics and molecular evolution. (Wiley-Blackwell, 2009)).
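The construction of the log-odds matrix from an exchangeability matrix and residue frequencies can be followed step by step in the sketch below. It is a toy illustration on a three-letter alphabet with invented values for S and π, not one of the published 20×20 models; the matrix exponential is approximated by scaling and squaring of a truncated Taylor series so that the example needs only the standard library. The time t=2.5 is an assumption that follows from reading a PAM distance of 100 as t=1.0, so that a PAM distance of 250 corresponds to t=2.5.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(m, squarings=10, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    n = len(m)
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def log_odds_matrix(S, pi, t):
    n = len(pi)
    # Off-diagonal rates Q_ij = S_ij * pi_j; diagonal chosen so that rows sum to zero.
    Q = [[S[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    # Normalize so that -sum_i pi_i Q_ii = 1 (one expected substitution per unit time).
    rate = -sum(pi[i] * Q[i][i] for i in range(n))
    Q = [[q / rate for q in row] for row in Q]
    P = expm([[t * q for q in row] for row in Q])
    # T_ij = 10 log10(pi_i P_ij / (pi_i pi_j)); pi_i cancels in the ratio.
    T = [[10 * math.log10(P[i][j] / pi[j]) for j in range(n)] for i in range(n)]
    return P, T

# Invented toy inputs: symmetric exchangeabilities and frequencies summing to 1.
S = [[0.0, 1.0, 0.2], [1.0, 0.0, 0.5], [0.2, 0.5, 0.0]]
pi = [0.5, 0.3, 0.2]
P, T = log_odds_matrix(S, pi, t=2.5)
for row in T:
    print([round(x, 2) for x in row])
```

Because the toy model is time reversible, the printed matrix is symmetric, and each row of P sums to one.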

(78) The T score for the substitution i⇄j is defined here as T.sub.i,j, and depends on the evolutionary model and the time t. We explored various models and PAM distances for the T score, including PAM, BLOSUM62, JTT, VT160, Gonnet, WAG, WAG*, and LG (see references above). The figures in this report were generated using a T score based on the WAG model and a PAM distance of 250. Such a large PAM distance means that there is substantial chance for the amino acid to change (P. G. Higgs, T. K. Attwood, Bioinformatics and molecular evolution. (Wiley-Blackwell, 2009)), and is useful in detecting distant relationships between sequences where residues may not be identical but the physico-chemical properties of the amino acids are conserved (M. O. Dayhoff, R. M. Schwartz, B. C. Orcutt, A model for evolutionary change. M O Dayhoff, ed. Atlas of protein sequence and structure Vol. 5, 345 (1978); P. G. Higgs, T. K. Attwood, Bioinformatics and molecular evolution. (Wiley-Blackwell, 2009)).
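Looking up the T score for a given pair of epitopes amounts to locating the substituted residue and reading one matrix entry, as in this sketch. The function names are hypothetical, and the single matrix entry used here is the WAG250 value listed for the PBK epitope pair in Table 3 rather than a full log-odds matrix.

```python
def substitution(wt_epitope, mut_epitope):
    """Return (position, wt_residue, mut_residue) for a single-residue substitution."""
    if len(wt_epitope) != len(mut_epitope):
        raise ValueError("epitopes must have equal length")
    diffs = [
        (i + 1, w, m)
        for i, (w, m) in enumerate(zip(wt_epitope, mut_epitope))
        if w != m
    ]
    if len(diffs) != 1:
        raise ValueError("expected exactly one substitution")
    return diffs[0]

def t_score(wt_epitope, mut_epitope, log_odds):
    """Look up T for the substitution; the matrix is symmetric, so either key order works."""
    _, wt_res, mut_res = substitution(wt_epitope, mut_epitope)
    key = (wt_res, mut_res) if (wt_res, mut_res) in log_odds else (mut_res, wt_res)
    return log_odds[key]

# One entry only, taken from the PBK row of Table 3 (V to D, T = -2.7).
log_odds = {("D", "V"): -2.7}
print(substitution("AAVILRVALHM", "AAVILRDALHM"))       # (7, 'V', 'D')
print(t_score("AAVILRVALHM", "AAVILRDALHM", log_odds))  # -2.7
```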

(79) Using a t-distribution test statistic we compared the mean T scores of immunogenic versus non-immunogenic epitopes from Table 3 for the WAG matrix using various PAM distances (1, 10, 25, 50, 100, 150, 200, and 250). Analysis of the test statistic showed that the P value decreased monotonically with PAM distance, implying that a PAM distance of 250 was the optimal solution, as would be anticipated (data not shown). The classification into H.sub.A and H.sub.n was the same for all matrices except for the PAM matrix, which is the least accurate of all evolutionary models. Of all evolutionary models, the WAG250 model resulted in the maximum separation between H.sub.A and H.sub.n epitopes in Table 3, measuring separation with the test statistic: [max T score(H.sub.A)-min T score(H.sub.n)]/σ(T score (H.sub.A), T score(H.sub.n)) (data not shown). The same test statistic was also maximal for a PAM distance of 250 compared to smaller distances.

(80) Published CD8+ Epitopes

(81) CD8+ epitopes with single mutated amino acids were collected from the list of tumor antigens resulting from mutations published by the Cancer Immunity Journal (P. Van der Bruggen, V. Stroobant, N. Vigneron, B. Van den Eynde, Cancer Immun, http://www.cancerimmunity.org/peptide/, 2013; http://cancerimmunity.org/peptide/mutations/). HLA alleles were taken either from the published table or from the original paper if the latter was more precise. References listed in Table 4 are the following: (1) Lennerz et al. PNAS 102 (44), pp. 16013-16018 (2005); (2) Karanikas et al. Cancer Res 61 (9), pp. 3718-3724 (2001); (3) Sensi et al. Cancer Res 65 (2), pp. 632-640 (2005); (4) Linard et al. J. Immunol 168 (9), pp. 4802-4808 (2002); (5) Zorn et al. Eur. J. Immunol 29 (2), pp. 592-601 (1999); (6) Graf et al. Blood 109 (7), pp. 2985-2988 (2007); (7) Robbins et al. J. Exp. Med 183 (3), pp. 1185-1192 (1996); (8) Vigneron et al. Cancer Immun 2, pp. 9 (2002); (9) Echchakir et al. Cancer Res 61 (10), pp. 4078-4083 (2001); (10) Hogan et al. Cancer Res 58 (22), pp. 5144-5150 (1998); (11) Ito et al. Int. J. Cancer 120 (12), pp. 2618-2624 (2007); (12) Wölfel et al. Science 269 (5228), pp. 1281-1284 (1995); (13) Gjertsen et al. Int. J. Cancer 72 (5), pp. 784-790 (1997).

Example 3

Example of a Scheme for Weighting Mutation Scores to Improve Prioritization of Immunogenic Mutations

(82) RNA that is injected into the cell, once translated and cleaved into short peptides, can be presented on different HLA types within the cell. It therefore stands to reason that the more HLA types for which a low MHC consensus (or similar) score is predicted, the more likely a given mutation is to be immunogenic, since it can potentially be displayed on more than one HLA type in parallel. Thus, weighting mutations by the number of HLA types for which the mutation is classified as H.sub.A and/or H.sub.B∪H.sub.C, or even weighting each mutation simply by the number of HLA types that have a low M.sub.mut score, may improve immunogenicity ranking. In the most general case, when a 27-mer RNA or peptide is injected into the cell, the HLA type is not the only degree of freedom; the length of the presented peptide and the position of the mutation within this peptide can also vary. Therefore, one can scan all possible HLA types, all possible window lengths and all possible positions for the mutation within the window, and calculate the number of solutions (per given mutation) that are classified as H.sub.A and/or H.sub.B∪H.sub.C (FIG. 12). This count may be an important weighting factor for mutation prioritization to select the most efficacious epitopes for vaccination. An example of a scatter plot of all these solutions as a function of M.sub.mut and ΔM=M.sub.mut−M.sub.wt is shown in FIG. 13.
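
The scan over HLA types, window lengths and mutation positions can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the window lengths of 8 to 11 residues and the HLA labels are hypothetical, and `is_hit` stands in for whatever classification is applied (for example membership in H.sub.A and/or H.sub.B∪H.sub.C, or a low M.sub.mut score); no real binding predictor is called.

```python
from typing import Callable, Iterable, Iterator, Tuple


def windows_with_mutation(
    seq: str, mut_pos: int, lengths: Iterable[int]
) -> Iterator[Tuple[str, int]]:
    """Yield (window, index of the mutation inside the window) for every
    window of the given lengths that covers seq[mut_pos]."""
    for k in lengths:
        first = max(0, mut_pos - k + 1)      # leftmost start still covering the mutation
        last = min(mut_pos, len(seq) - k)    # rightmost start that fits inside seq
        for start in range(first, last + 1):
            yield seq[start:start + k], mut_pos - start


def count_solutions(
    seq: str,
    mut_pos: int,
    hla_types: Iterable[str],
    lengths: Iterable[int],
    is_hit: Callable[[str, int, str], bool],
) -> int:
    """Count (HLA type, window, position) combinations accepted by is_hit.

    is_hit is a hypothetical, user-supplied predicate; it is an assumption
    of this sketch, not part of the described method.
    """
    lengths = list(lengths)
    return sum(
        1
        for hla in hla_types
        for peptide, pos in windows_with_mutation(seq, mut_pos, lengths)
        if is_hit(peptide, pos, hla)
    )


if __name__ == "__main__":
    # Toy 27-mer with the mutated residue (W) in the centre; illustrative only.
    seq27 = "A" * 13 + "W" + "A" * 13
    n = count_solutions(
        seq27, 13, ["HLA-1", "HLA-2"], range(8, 12), lambda pep, pos, hla: True
    )
    print(n)
```

With a predicate that accepts everything, the count is simply the size of the search space for one mutation; a real predicate reduces it to the number of solutions used as the weighting factor.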