SINGLE MOLECULE SEQUENCING PEPTIDES BOUND TO THE MAJOR HISTOCOMPATIBILITY COMPLEX
20230103041 · 2023-03-30
Assignee
Inventors
- Edward Marcotte (Austin, TX)
- Eric ANSLYN (Austin, TX, US)
- Alexander BOULGAKOV (Oakland, CA, US)
- Angela M. BARDO (Austin, TX, US)
- Siyuan Stella WANG (Boston, MA, US)
- Jagannath Swaminathan (Austin, TX)
- Fan TU (Emeryville, CA, US)
Cpc classification
G16B50/00
PHYSICS
G16B40/10
PHYSICS
G01N33/6824
PHYSICS
G16B15/30
PHYSICS
International classification
G01N33/543
PHYSICS
Abstract
The present disclosure provides methods of identifying and quantifying the peptides displayed by the major histocompatibility complex (MHC). Such methods may comprise the ability to determine the type, identity, and quantity of each peptide displayed by the MHC. In some embodiments, these methods may be used to develop an anti-cancer therapy or type the HLA of a patient. Also provided herein are compositions comprising peptides from the MHC which have been prepared for sequencing.
Claims
1. A method of identifying a peptide displayed by a major histocompatibility complex (MHC) of a sample, the method comprising: (a) providing said sample comprising a plurality of peptides bound by major histocompatibility complexes, wherein said plurality of peptides comprises a plurality of peptide types; (b) labeling at least one amino acid of each peptide of said plurality of peptides; (c) identifying at least one label provided in step (b), coupled to at least one peptide of said plurality of peptides; and (d) identifying a sequence of said at least one peptide of said plurality of peptides having said at least one label.
2. The method of claim 1, wherein said at least one amino acid is an internal amino acid.
3. The method of claim 1, wherein said at least one amino acid is covalently-coupled with said at least one label.
4. The method of claim 1, further comprising separating said major histocompatibility complexes from said sample.
5. The method of claim 4, wherein said separating comprises lysing a plurality of cells comprising at least a subset of said plurality of peptides bound by major histocompatibility complexes.
6. The method of claim 5, wherein said plurality of cells is derived from a biological sample.
7. The method of claim 5, wherein said biological sample is a tissue biopsy, a cell culture, enriched cells, or a bodily fluid.
8. The method of claim 1, wherein said each peptide of said plurality of peptides comprise from about 5 to about 20 amino acids.
9. The method of claim 8, wherein said each peptide of said plurality of peptides comprise from about 8 to about 12 amino acids.
10. The method of claim 9, wherein said each peptide of said plurality of peptides comprises 9 amino acids or 10 amino acids.
11. The method of claim 8, wherein said each peptide of said plurality of peptides comprise from about 12 to about 17 amino acids.
12. The method of claim 8, wherein said each peptide of said plurality of peptides comprise from about 12 to about 20 amino acids.
13. The method of claim 1, further comprising immobilizing said plurality of peptides.
14. The method of claim 13, wherein said plurality of peptides are coupled to said solid surface.
15. The method of claim 14, wherein said solid support is an array.
16. The method of claim 1, wherein said identifying of said peptide displayed by said MHC comprises identifying said each peptide of said plurality of peptides from among at most 100,000 peptides.
17. The method of claim 16, wherein said each peptide of said plurality of peptides is identified at a single molecule level.
18. The method of claim 1, wherein said each peptide of said plurality of peptides is a peptide presented by said MHC.
19. The method of claim 1, wherein said MHC is a MHC Class I molecule or a Human Leukocyte Antigens (HLA) Class I molecule.
20. The method of claim 1, wherein said MHC is a MHC Class II molecule or a HLA Class II molecule.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0067] The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure. The disclosure may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
[0068]
[0069]
[0070]
[0071]
[0072]
[0073]
[0074]
[0075]
[0076]
[0077]
[0078]
[0079]
DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
[0080] In some aspects, the present disclosure provides methods of typing, identifying, quantifying, or locating the peptides presented by the major histocompatibility complex (MHC). In some aspects, the methods provided herein include the use of fluorosequencing methods to determine the identity of specific amino acid residues in the peptides presented by the MHC. These identified amino acid residues can be used to identify the peptide using algorithms and/or other computational methods, or the entire sequence may be obtained de novo. Additionally, the present methods may be used to quantify the specific peptides presented by the MHC.
[0081] The fluorosequencing methods are suited to aid in the identification of the antigenic peptides presented by the MHC. The fluorosequencing methods are based on the principle that the positional information of a small number of amino acid types in a peptide (such as xCxxC; x=any amino acid; C=cysteine) may be sufficiently reflective of the peptide's identity to allow its identification in a known protein sequence database. To enable experimental implementation, one or more amino acid types in the peptides are selectively labeled with fluorophores, the immobilized peptides are sequentially degraded on the slide by Edman chemistry, and the change in fluorescence intensity is monitored for each peptide, in parallel, as it loses one amino acid per cycle.
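The principle described above can be illustrated with a minimal sketch. It assumes an idealized experiment: every residue of the labeled type carries exactly one dye, every Edman cycle removes exactly one N-terminal residue, and no dye is lost for any other reason. The function name `ideal_intensity_trace` and the example peptide `ACGGC` (an instance of the xCxxC pattern) are hypothetical and are not taken from the disclosure.

```python
def ideal_intensity_trace(peptide, labeled_residues):
    """Count dye-bearing residues left on the peptide before any cycle
    and after each successive (idealized) Edman cycle.

    peptide          -- one-letter amino acid sequence, N terminus first
    labeled_residues -- set of one-letter codes that carry a fluorophore
    """
    trace = []
    for cycle in range(len(peptide) + 1):
        # After `cycle` removals, only residues from index `cycle` onward remain.
        remaining = peptide[cycle:]
        trace.append(sum(1 for residue in remaining if residue in labeled_residues))
    return trace


# Cysteines at positions 2 and 5 (xCxxC): intensity steps down after cycles 2 and 5.
print(ideal_intensity_trace("ACGGC", {"C"}))  # [2, 2, 1, 1, 1, 0]
```

In this idealized picture, the cycles at which the count steps down are exactly the positions of the labeled residues, which is the positional information the paragraph above refers to.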
I. PEPTIDE SEQUENCING METHODS
[0082] There exist many methods of identifying the sequence of a peptide including fluorosequencing, mass spectrometry, identifying the peptide sequence from the nucleic acid sequence, and Edman degradation. Fluorosequencing has been found to provide single molecule resolution for the sequencing of proteins of interest (Swaminathan, 2010; U.S. Pat. No. 9,625,469; U.S. patent application Ser. No. 15/461,034; U.S. patent application Ser. No. 15/510,962). One of the hallmarks of fluorosequencing is introduction of a fluorophore or other label into specific amino acid residues of the peptide sequence. This can involve the labeling of one or more amino acid residues with a unique labeling moiety. In some embodiments, one, two, three, four, five, six, or more different amino acid residues are labeled with a labeling moiety. The labeling moieties that may be used include fluorophores, chromophores, or quenchers. The labeled amino acid residues may include cysteine, lysine, glutamic acid, aspartic acid, tryptophan, tyrosine, serine, threonine, arginine, histidine, methionine, asparagine, and glutamine. Each of these amino acid residues may be labeled with a different labeling moiety. In some embodiments, multiple amino acid residues, such as aspartic acid and glutamic acid or asparagine and glutamine, may be labeled with the same labeling moiety. While this technique may be used with labeling moieties such as those described above, it is also contemplated that other labeling moieties, such as synthetic oligonucleotides or peptide-nucleic acids, may be used in fluorosequencing-like methods. In particular, the labeling moiety used in the instant applications may be suitable to withstand the conditions of removing one or more of the amino acid residues.
Some non-limiting examples of potential labeling moieties that may be used in the instant methods include those which emit a fluorescence signal in the red to infrared spectra such as an Alexa Fluor® dye, an Atto dye, a Janelia Fluor® dye, a rhodamine dye, or other similar dyes. Examples of each of these dyes which were capable of withstanding the conditions of removing the amino acid residues include Alexa Fluor® 405, Rhodamine B, tetramethyl rhodamine, Janelia Fluor® 549, Alexa Fluor® 555, Atto647N, and (5)6-naphthofluorescein. In other aspects, it is contemplated that the labeling moiety may be a fluorescent peptide or protein or a quantum dot.
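As a rough illustration of the labeling schemes described in this paragraph, the following sketch maps residue types to label channels, with aspartic acid and glutamic acid sharing one channel. The channel names (`dye_1`, `dye_2`, `dye_3`), the particular assignment, and the example peptide are hypothetical placeholders chosen for illustration only.

```python
# Hypothetical assignment of residue types to label channels.
# D and E deliberately share a channel, as the text contemplates.
LABEL_SCHEME = {"C": "dye_1", "K": "dye_2", "D": "dye_3", "E": "dye_3"}


def label_positions(peptide, scheme=LABEL_SCHEME):
    """Return, for each channel, the 1-based positions (from the N terminus)
    of the residues that would carry that channel's label."""
    positions = {}
    for index, residue in enumerate(peptide, start=1):
        channel = scheme.get(residue)
        if channel is not None:
            positions.setdefault(channel, []).append(index)
    return positions


# S1 D2 K3 E4 C5 A6 K7
print(label_positions("SDKECAK"))
```

Because D and E share a channel here, the sketch cannot tell them apart; that loss of resolution is the trade-off for needing fewer distinct labels.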
[0083] Alternatively, synthetic oligonucleotides or oligonucleotide derivatives may be used as the labeling moiety for the peptides. For example, thiolated oligonucleotides are commercially available, and may be coupled to peptides using known methods. Commonly available thiol modifications are 5′ thiol modifications, 3′ thiol modifications, and dithiol modifications and each of these modifications may be used to modify the peptide. Following oligonucleotide coupling to the peptides as above, the peptides may be subjected to Edman degradation (Edman et al., 1950) and the oligonucleotides may be used to determine the presence of a specific amino acid residue in the remaining peptide sequence. In other embodiments, the labeling moiety may be a peptide-nucleic acid. The peptide-nucleic acid may be attached to the peptide sequence on specific amino acid residues.
[0084] One element of fluorosequencing is the removal of amino acid residues from the labeled peptides through techniques such as Edman degradation, with subsequent visualization to detect a reduction in fluorescence indicating that a specific labeled amino acid has been cleaved. Removal of each amino acid residue may be carried out through a variety of different techniques including Edman degradation and proteolytic cleavage. In some embodiments, the techniques include using Edman degradation to remove the terminal amino acid residue. In other embodiments, the techniques involve using an enzyme to remove the terminal amino acid residue. These terminal amino acid residues may be removed from either the C terminus or the N terminus of the peptide chain. In situations in which Edman degradation is used, the amino acid residue at the N terminus of the peptide chain is removed.
[0085] In some aspects, the methods of sequencing or imaging the peptide sequence may comprise immobilizing the peptide on a surface. The peptide may be immobilized using an internal amino acid residue such as a cysteine residue, the N terminus, or the C terminus. In some embodiments, the peptide is immobilized by reacting the cysteine residue with the surface. In some embodiments, the present disclosure contemplates immobilizing the peptides on a surface such as a surface that is optically transparent across the visible spectra and/or the infrared spectra, possesses a refractive index between 1.3 and 1.6, is between 10 and 50 nm thick, and/or is chemically resistant to organic solvents as well as strong acids such as trifluoroacetic acid. A large range of substrates (like fluoropolymers (Teflon-AF (Dupont), Cytop® (Asahi Glass, Japan)), aromatic polymers (polyxylenes (Parylene, Kisco, Calif.), polystyrene, polymethylmethacrylate) and metal surfaces (gold coating)), coating schemes (spin-coating, dip-coating, electron beam deposition for metals, thermal vapor deposition and plasma enhanced chemical vapor deposition (PECVD)) and functionalization methodologies (polyallylamine grafting, use of ammonia gas in PECVD, doping of long chain end-functionalized fluorous alkanes, etc.) may be used in the methods described herein as a useful surface. A 20 nm thick, optically transparent fluoropolymer surface made of Cytop® may be used in the methods described herein. The surfaces used herein may be further derivatized with a variety of fluoroalkanes that will sequester peptides for sequencing and modified targets for selection. Alternatively, aminosilane-modified surfaces may be used in the methods described herein. In other embodiments, the methods described herein may comprise immobilizing the peptides on the surface of beads, resins, gels, quartz particles, glass beads, or combinations thereof.
In some non-limiting examples, the methods contemplate using peptides that have been immobilized on the surface of Tentagel® beads, Tentagel® resins, or other similar beads or resins. The surface used herein may be coated with a polymer, such as polyethylene glycol. In other embodiments, the surface is amine functionalized. In other embodiments, the surface is thiol functionalized.
[0086] Finally, each of these sequencing techniques involves imaging the peptide sequence to determine the presence of one or more labeling moieties on the peptide sequence. In some embodiments, these images are taken after each removal of an amino acid residue and used to determine the location of the specific amino acid in the peptide sequence. In some embodiments, the methods can result in the elucidation of the location of the specific amino acid in the peptide sequence. These methods may be used to determine the locations of specific amino acid residues in the peptide sequence, or these results may be used to determine the entire list of amino acid residues in the peptide sequence. The methods may involve determining the location of one or more amino acid residues in the peptide sequence, comparing these locations to known peptide sequences, and determining the entire list of amino acid residues in the peptide sequence.
[0087] In some aspects, the methods may comprise labeling one or more amino acid residues after the peptide has been separated from the MHC. If more than one position on the peptide is labeled, it is contemplated that the amino acids may be labeled in the following order: cysteine, lysine, N terminus, C terminus and/or amino acids with carboxylic acid groups on the side chain, and/or tryptophan. It is contemplated that one or more of these particular amino acids may be labeled or all of these amino acid residues may be labeled with different labels.
[0088] In some aspects, the imaging methods used in the sequencing techniques may involve a variety of different methods such as fluorimetry and fluorescence microscopy. The fluorescent methods may employ fluorescent techniques such as fluorescence polarization, Förster resonance energy transfer (FRET), or time-resolved fluorescence. In some embodiments, fluorescence microscopy may be used to determine the presence of one or more fluorophores at the single-molecule level. Such imaging methods may be used to determine the presence or absence of a label on a specific peptide sequence. After repeated cycles of removing an amino acid residue and imaging the peptide sequence, the position of the labeled amino acid residue can be determined in the peptide.
[0089] In some embodiments, the present disclosure provides methods of separating the peptide from the other components of the MHC. Some methods are known in the literature such as those described in Yadav et al., 2014 and Müller et al., 2006, both of which are incorporated herein by reference. The MHC in the sample may be enriched by trapping the MHC on a bead using a specific binding element such as an antibody. Beads for this purpose are well known in the art and include any solid support to which an antibody can be bound. For example, the antibody may be an antibody which is specific for the MHC allele or a pan-specific antibody such as the W6/32 antibody that targets all the different MHC alleles. Once the MHC has been enriched by binding to the bead and eluting the other components, the peptides may be removed using a mild acidic solution. Such a solution may include an aqueous solution containing from about 0.1% to about 2.5% of a weak acid. In some embodiments, the solution may contain about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.2%, 1.4%, 1.6%, 1.8%, 2.0%, or 2.5% of the acid, or any range derivable therein. Some non-limiting examples of acids which may be used in the methods of removing the peptides include formic acid, acetic acid, citric acid, trifluoroacetic acid, hydrochloric acid, or sulfuric acid. Once separated from the MHC, these peptides may be used in the sequencing methods described above.
[0090] The methods described herein are sensitive to the single-molecule level. The sensitivity of the methods described herein can reveal the identity of substantially all peptides derived from the MHC. The sensitivity of the methods described herein can reveal the identity of each peptide derived from the MHC. The methods described herein may reveal the identity of at most 100,000 peptides, 90,000 peptides, 80,000 peptides, 70,000 peptides, 60,000 peptides, 50,000 peptides, 40,000 peptides, 30,000 peptides, 20,000 peptides, 10,000 peptides, 5,000 peptides, 4,000 peptides, 3,000 peptides, 2,000 peptides, 1,000 peptides, 500 peptides, 100 peptides, 50 peptides, 10 peptides, 5 peptides, 2 peptides, or 1 peptide. The methods described herein may reveal the identity of at least 1 peptide, 2 peptides, 5 peptides, 10 peptides, 50 peptides, 100 peptides, 500 peptides, 1,000 peptides, 2,000 peptides, 3,000 peptides, 4,000 peptides, 5,000 peptides, 10,000 peptides, 20,000 peptides, 30,000 peptides, 40,000 peptides, 50,000 peptides, 60,000 peptides, 70,000 peptides, 80,000 peptides, 90,000 peptides, 100,000 peptides, or more peptides. The methods described herein may reveal the identity of from 100,000 peptides to 1 peptide, 50,000 peptides to 1 peptide, 10,000 peptides to 1 peptide, 5,000 peptides to 1 peptide, 1,000 peptides to 1 peptide, 500 peptides to 1 peptide, 100 peptides to 1 peptide, 10 peptides to 1 peptide, or 5 peptides to 1 peptide.
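The last step described above, recovering label positions from repeated remove-and-image cycles, can be sketched as follows. The sketch assumes a single label channel and a noise-free intensity trace expressed as a dye count (one value before any cycle, then one value after each removal); real single-molecule traces would need step detection and handling of dye loss and failed cleavage, which are not modeled. The function name `positions_from_trace` is hypothetical.

```python
def positions_from_trace(trace):
    """Infer 1-based positions of labeled residues from an idealized trace.

    trace[0] is the dye count before any cycle; trace[k] is the count
    after the k-th amino acid residue has been removed.
    """
    positions = []
    for cycle in range(1, len(trace)):
        if trace[cycle] > trace[cycle - 1]:
            # Intensity cannot increase when residues are only removed.
            raise ValueError("intensity increased between cycles")
        if trace[cycle] < trace[cycle - 1]:
            # A drop after cycle k means the k-th residue carried a label.
            positions.append(cycle)
    return positions


# Drops after cycles 2 and 5 correspond to labels at positions 2 and 5.
print(positions_from_trace([2, 2, 1, 1, 1, 0]))  # [2, 5]
```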
II. MAJOR HISTOCOMPATIBILITY COMPLEX (MHC)
[0091] The Major Histocompatibility Complex (MHC) is a series of cell surface proteins used by the body to recognize foreign molecules and is an essential factor in the acquired immune system. These proteins bind antigens and then display the antigens on their surface so that the antigens are recognized by T-cells. There are three major class I MHC haplotypes (A, B, and C) and three major MHC class II haplotypes (DR, DP, and DQ). The MHC in humans is also known as the human leukocyte antigen (HLA) complex. Class I MHC proteins may further comprise other elements such as molecules which assist in antigen presentation, such as TAP and tapasin.
[0092] Class I MHC proteins generally comprise three domains, labeled α1, α2, and α3. The α1 domain functions to attach the MHC to the β-microglobulin, the α3 domain functions as a transmembrane domain which anchors the protein into the cell membrane, and the groove between the α1 and α2 subunits functions as the peptide presenting domain. On the other hand, class II MHC proteins have two domains, each with two classes of protein subunits, α and β. The first domain comprises α1 and α2 subunits while the second domain comprises β1 and β2 subunits. The α2 and β2 subunits form the transmembrane domain of the protein, anchoring the MHC to the cellular membrane, with the α1 and β1 subunits forming the peptide binding groove.
[0093] The HLA loci are highly polymorphic and are distributed over 4 Mb on chromosome 6. The ability to haplotype the HLA genes within the region is clinically important since this region is associated with autoimmune and infectious diseases and the compatibility of HLA haplotypes between donor and recipient can influence the clinical outcomes of transplantation. HLAs corresponding to MHC class I present peptides from inside the cell and HLAs corresponding to MHC class II present antigens from outside of the cell to T-lymphocytes. Incompatibility of MHC haplotypes between the graft and the host triggers an immune response against the graft and leads to its rejection. Thus, a patient can be treated with an immunosuppressant to prevent rejection. HLA-matched stem cell lines may overcome the risk of immune rejection.
[0094] Because of the importance of HLA in transplantation, there currently exist several methods of identifying the MHC (or the HLA). Traditionally, the HLA loci are typed by serology and PCR for identifying favorable donor-recipient pairs. Serological detection of HLA class I and II antigens can be accomplished using a complement mediated lymphocytotoxicity test with purified T or B lymphocytes. This procedure is predominantly used for matching HLA-A and -B loci. Molecular-based tissue typing can often be more accurate than serologic testing. Low resolution molecular methods such as SSOP (sequence specific oligonucleotide probes) methods, in which PCR products are tested against a series of oligonucleotide probes, can be used to identify HLA antigens, and currently these methods are the most common methods used for Class II-HLA typing. High resolution techniques such as SSP (sequence specific primer) methods, which utilize allele specific primers for PCR amplification, can identify specific MHC alleles.
III. THERAPEUTIC USES OF PEPTIDES FROM THE MAJOR HISTOCOMPATIBILITY COMPLEX AND PEPTIDES OBTAINED FROM THE MHC
[0095] Peptides obtained from the MHC may be obtained from a patient. A patient may be a mammal such as a human. These peptides may be obtained from a sample such as a tissue biopsy, a cell culture, or enriched cells derived from a biological sample. The biological sample may be obtained from the blood stream or from a bodily fluid such as blood, saliva, urine, or lymphatic fluid. In an embodiment, the enriched cells may be dendritic cells. The tissue biopsy may result from a biopsy of healthy tissue or a biopsy of cancerous tissue.
[0096] In some embodiments, the methods comprise identifying the sequence of 2, 3, 4, 5, or 6 peptide sequences that are displayed by the MHC. The peptides may be further enriched from the MHC and extracted from the MHC. Peptides obtained from the MHC may have a length from about 5 to about 20 amino acid residues. In some embodiments, the MHC peptides identified have from 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, to about 20 amino acid residues, or within any range of amino acid residues derivable therein. These peptides may further comprise one or more post-translational modifications such as glycosylation or phosphorylation. These methods can be used to identify or quantify one or more peptides displayed by the MHC.
[0097] A. Promise and Pains of Immunotherapy
[0098] When 3 out of every 4 patients undergoing immunotherapy for acute lymphoblastic leukemia show complete remission 18 months later, it defines an exciting and hopeful period in the fight against cancer (Maude et al., 2018). Since the approval of ipilimumab (Yervoy®) in 2011, cancer immunotherapies have provided dramatic improvement in patients' overall survival, with ~1400 ongoing clinical trials (www.clinicaltrials.gov; as of Nov. 17, 2018; search term “immunotherapy”), cures in various types of cancers, and an estimated $120B worldwide market in 2021 (BCC Library—Report View—PHM053A). Immunotherapies are broadly built on efforts in engineering and/or co-opting patients' own immune systems to target specific cell surface tumor antigens and induce immune responses for tumor clearance (Harris et al., 2016). However, developed therapies are not always effective, with reasons ranging from non-response to fatal cytokine release syndrome. For example, deaths in clinical trials for Juno Therapeutics' drug JCAR015 for acute lymphoblastic leukemia or Merck's pembrolizumab for multiple myeloma have caused great anxiety for patients and drug companies alike (Harris et al., 2017). Notably, cancer relapse rates for immunotherapy appear to be bimodal, with the therapy either completely eliminating tumor cells or working incompletely, possibly with adverse side effects (Harris et al., 2016). This finding argues for careful patient selection. Efforts to use more predictive biomarkers to aid patient selection are thus critical and a growing unmet market need.
[0099] Since most classes of immunotherapies, namely T-cell therapies (CAR and TCRs), cancer vaccines and checkpoint inhibitors, engineer or manipulate the body's T-cells (Pham et al., 2018), a strong criterion for stratifying patients can be by directly profiling biomolecules that interact with the T-cells. T-cell receptors (TCR) recognize short peptides, 8-12 amino acids long, displayed by human leukocyte antigen class I (HLA-I) complexes on the surfaces of cells.
[0100] B. Methods Needed to Obtain HLA Peptides Directly from Tumor Biopsies
[0101] There is currently a technological “blind spot” for sequencing and identifying HLA-I bound peptides directly from patient tumor samples (Brennick et al., 2017). The challenge is due to (a) their extremely low abundance, occurring at as few as 10 copies of each peptide displayed per cell in order to trigger T cell recognition, (b) a highly heterogeneous population of up to 10,000 different tumor-associated antigen (TAA) peptides per sample, and (c) an incomplete understanding of personalized tumor-associated pathways for processing and displaying mutated peptides (Yewdell et al., 2003). While mass spectrometry can identify peptides, it is severely limited in sensitivity, requiring about a million copies (molecules) of a single peptide to produce a detectable signal. This restricts its use to cataloguing peptides from expandable cell-lines but not directly from typical tumor biopsies of more restricted size (Caron et al., 2017). Alternatively, peptide prediction algorithms can predict antigenic peptides, e.g., by integrating exome and transcriptome sequences obtained from tumor biopsies with computer models of HLA binding motifs, binding affinity, and proteasome cleavage patterns (Lee et al., 2018). Currently, such algorithms show little concordance with each other, and their predictions of tumor-specific and tumor-associated peptides are seldom right in blind trials (Vitiello and Zanetti, 2017).
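The sensitivity gap described above can be made concrete with a short back-of-the-envelope calculation. The two inputs (about one million copies needed for a mass spectrometry signal, and as few as 10 copies of a peptide per cell) come from the paragraph above; the assumption of lossless peptide recovery, and the function name `min_cells_required`, are illustrative simplifications and not part of the disclosure.

```python
import math


def min_cells_required(copies_needed, copies_per_cell):
    """Lower bound on the number of cells needed to supply `copies_needed`
    molecules of one peptide, assuming every copy is recovered (an
    idealization; real sample preparation loses material)."""
    if copies_needed <= 0 or copies_per_cell <= 0:
        raise ValueError("inputs must be positive")
    return math.ceil(copies_needed / copies_per_cell)


# ~1,000,000 copies for detection, 10 copies displayed per cell.
print(min_cells_required(1_000_000, 10))  # 100000
```

Under these assumptions, a peptide displayed at 10 copies per cell would need on the order of 100,000 cells even with perfect recovery, whereas a single-molecule method would in principle need far fewer.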
[0102] C. Establishing Clinical Correlations: Improving Patient Selection and Outcomes by HLA-I Peptide Sequencing
[0103] Today, patient screening relies on surrogate tools such as RT-PCR or whole exome sequencing to confirm the expressed genes or mutations. For example, for multiple myeloma TCR therapy, 20 patients were initially screened for full length, expressed NY-ESO-1 mRNA, but not for the actual displayed HLA-I peptide against which the therapy was developed (Robbins et al., 2015). Introducing engineered T-cells into a patient without direct confirmation of the target antigen on the tumor puts the patient at risk of an autoimmune reaction or cytokine release syndrome without knowledge of potential efficacy (Shimabukuro-et al., 2018). A large number of therapeutic peptide targets have now been identified and catalogued in ever-expanding public (iedb.org) and private databases (companies) (Caron et al., 2017). A rapid assay to identify these confirmed peptide antigens directly from tumor biopsies is needed to help assign patients to pre-designed T-cells or vaccines.
[0104] A number of immunotherapy treatments are based on targeting HLA-I bound peptide antigens and would potentially benefit from such an assay (Lee et al., 2018). These types of immunotherapy, which we term antigen-focused immunotherapies, include: (a) endogenous T-cell therapy (ETC), wherein tumor antigen-specific T-cells are isolated from patient peripheral blood, expanded in vitro, and infused back into patients, (b) TCR T-cell therapies, in which patient T cells are engineered to express tumor antigen-specific TCRs, and (c) cancer vaccines, in which a cocktail of peptide neoantigens is used to immunize a patient in order to activate the anti-tumor T-cell response (Pham et al., 2018).
IV. DEFINITIONS
[0105] As used herein, the term “amino acid” in general refers to organic compounds that contain at least one amino group, -NH₂, which may be present in its ionized form, -NH₃⁺, and one carboxyl group, -COOH, which may be present in its ionized form, -COO⁻, where the carboxylic acids are deprotonated at neutral pH, having the basic formula of NH₂CHRCOOH. An amino acid and thus a peptide has an N (amino)-terminal residue region and a C (carboxy)-terminal residue region. Types of amino acids include at least 20 that are considered “natural” as they comprise the majority of biological proteins in mammals and include amino acids such as lysine, cysteine, tyrosine, threonine, etc. Amino acids may also be grouped based upon their side chains such as those with carboxylic acid groups (at neutral pH), including aspartic acid or aspartate (Asp; D) and glutamic acid or glutamate (Glu; E); and basic amino acids (at neutral pH), including lysine (Lys; K), arginine (Arg; R), and histidine (His; H).
[0106] As used herein, the term “terminal” is referred to as singular terminus and plural termini.
[0107] As used herein, the term “side chains” or “R” refers to unique structures attached to the alpha carbon (attaching the amine and carboxylic acid groups of the amino acid) that render uniqueness to each type of amino acid. R groups have a variety of shapes, sizes, charges, and reactivities. Charged polar side chains are either positively or negatively charged, such as lysine (+), arginine (+), histidine (+), aspartate (−) and glutamate (−); amino acids can also be basic, such as lysine, or acidic, such as glutamic acid. Uncharged polar side chains have hydroxyl, amide, or thiol groups, such as cysteine, having a chemically reactive side chain, i.e., a thiol group that can form bonds with another cysteine; serine (Ser) and threonine (Thr), which have hydroxylic R side chains of different sizes; and asparagine (Asn), glutamine (Gln), and tyrosine (Tyr). Non-polar hydrophobic amino acid side chains include those of glycine; alanine, valine, leucine, and isoleucine, having aliphatic hydrocarbon side chains ranging in size from a methyl group for alanine to isomeric butyl groups for leucine and isoleucine; methionine (Met), which has a thioether side chain; and proline (Pro), which has a cyclic pyrrolidine side group. Phenylalanine (Phe) (with its phenyl moiety) and tryptophan (Trp) (with its indole group) contain aromatic side groups, which are characterized by bulk as well as nonpolarity.
[0108] Amino acids can also be referred to by a name or 3-letter code or 1-letter code, for example, Cysteine; Cys; C, Lysine; Lys; K, Tryptophan; Trp; W, respectively.
[0109] Amino acids may be classified as nutritionally essential or nonessential, with the caveat that nonessential vs. essential may vary from organism to organism or vary during different developmental stages. A nonessential or conditional amino acid for a particular organism is one that is synthesized adequately in the body, typically in a pathway using enzymes encoded by several genes, as a substrate for protein synthesis. Essential amino acids are amino acids that the organism is unable to produce, or unable to produce in sufficient quantity, naturally via de novo pathways, for example lysine in humans. Humans obtain essential amino acids through their diet, including synthetic supplements, meat, plants and other organisms.
[0110] “Unnatural” amino acids are those not naturally encoded or found in the genetic code nor produced via de novo pathways in mammals and plants. They can be synthesized by adding side chains not normally found or rarely found on amino acids in nature.
[0111] As used herein, β amino acids, which have their amino group bonded to the β carbon rather than the α carbon as in the 20 standard biological amino acids, are unnatural amino acids. A common naturally occurring β amino acid is β-alanine.
[0112] As used herein, the terms “amino acid sequence”, “peptide”, “peptide sequence”, “polypeptide”, and “polypeptide sequence” are used interchangeably to refer to at least two amino acids or amino acid analogs that are covalently linked by a peptide (amide) bond or an analog of a peptide bond. The term peptide includes oligomers and polymers of amino acids or amino acid analogs. The term peptide also includes molecules that are commonly referred to as peptides, which generally contain from about two (2) to about twenty (20) amino acids. The term peptide also includes molecules that are commonly referred to as polypeptides, which generally contain from about twenty (20) to about fifty (50) amino acids. The term peptide also includes molecules that are commonly referred to as proteins, which generally contain from about fifty (50) to about three thousand (3000) amino acids. The amino acids of the peptide may be L-amino acids or D-amino acids. A peptide, polypeptide or protein may be synthetic, recombinant or naturally occurring. A synthetic peptide is a peptide produced artificially in vitro.
[0113] As used herein, the term “subset” refers to the N-terminal amino acid residue of an individual peptide molecule. A “subset” of individual peptide molecules with an N-terminal lysine residue is distinguished from a “subset” of individual peptide molecules with an N-terminal residue that is not lysine.
[0114] As used herein, the term “fluorescence” refers to the emission of visible light by a substance that has absorbed light of a different wavelength. In some embodiments, fluorescence provides a non-destructive way of tracking and/or analyzing biological molecules based on the fluorescent emission at a specific wavelength. Proteins (including antibodies), peptides, nucleic acid, oligonucleotides (including single stranded and double stranded primers) may be “labeled” with a variety of extrinsic fluorescent molecules referred to as fluorophores.
[0115] As used herein, sequencing of peptides “at the single molecule level” refers to amino acid sequence information obtained from individual (i.e. single) peptide molecules in a mixture of diverse peptide molecules. The present disclosure is not limited to methods where the amino acid sequence information obtained from an individual peptide molecule is the complete or contiguous amino acid sequence of that molecule. In some embodiments, it is sufficient that partial amino acid sequence information is obtained, allowing for identification of the peptide or protein. Partial amino acid sequence information, including for example the pattern of a specific amino acid residue (e.g. lysine) within individual peptide molecules, may be sufficient to uniquely identify an individual peptide molecule. For example, a pattern of amino acids such as X-X-X-Lys-X-X-X-X-Lys-X-Lys, which indicates the distribution of lysine residues within an individual peptide molecule, may be searched against a known proteome of a given organism to identify the individual peptide molecule. It is not intended that sequencing of peptides at the single molecule level be limited to identifying the pattern of lysine residues in an individual peptide molecule; sequence information for any amino acid residue (including multiple amino acid residues) may be used to identify individual peptide molecules in a mixture of diverse peptide molecules.
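The pattern search described in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes that every lysine is labeled and that all other residues are indistinguishable, so each sequence is reduced to a string of K and X. The function names and the two-protein toy proteome are hypothetical and are not taken from the disclosure.

```python
def label_pattern(peptide, labeled="K"):
    """Mask every residue that is not in `labeled` as 'X'."""
    return "".join(aa if aa in labeled else "X" for aa in peptide)


def find_matches(pattern, proteome, labeled="K"):
    """Return (protein_id, start) for each window whose masked form equals `pattern`."""
    hits = []
    width = len(pattern)
    for protein_id, sequence in proteome.items():
        masked = label_pattern(sequence, labeled)
        for start in range(len(masked) - width + 1):
            if masked[start:start + width] == pattern:
                hits.append((protein_id, start))
    return hits


# Hypothetical toy proteome, for illustration only.
toy_proteome = {
    "protA": "MAGKTLSVKAK",
    "protB": "MKKLLPTAGSR",
}

# X-X-X-Lys-X-X-X-X-Lys-X-Lys written without separators.
hits = find_matches("XXXKXXXXKXK", toy_proteome)
print(hits)
```

In this toy case only one protein contains the pattern, so the partial sequence is enough to identify its source. With a real proteome the same pattern may occur in several proteins, which is why multiple labels or a probabilistic attribution may be needed.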
[0116] As used herein, “single molecule resolution” refers to the ability to acquire data (including, for example, amino acid sequence information) from individual peptide molecules in a mixture of diverse peptide molecules. In one non-limiting example, the mixture of diverse peptide molecules may be immobilized on a solid surface (including, for example, a glass slide, or a glass slide whose surface has been chemically modified). In one embodiment, this may include the ability to simultaneously record the fluorescent intensity of multiple individual (i.e. single) peptide molecules distributed across the glass surface. Optical devices that can be applied in this manner are commercially available. For example, a conventional microscope equipped with total internal reflection illumination and an intensified charge-coupled device (CCD) detector is available (see Braslavsky et al., 2003). Imaging with a high sensitivity CCD camera allows the instrument to simultaneously record the fluorescent intensity of multiple individual (i.e. single) peptide molecules distributed across a surface. In one embodiment, image collection may be performed using an image splitter that directs light through two band pass filters (one suitable for each fluorescent molecule) to be recorded as two side-by-side images on the CCD surface. Using a motorized microscope stage with automated focus control to image multiple stage positions in the flow cell may allow millions of individual single peptides (or more) to be sequenced in one experiment.
[0117] The term “label” as used herein refers to a chemical group introduced into a molecule that generates some form of measurable signal. Such a signal may include but is not limited to fluorescence, visible light, mass, radiation, or a nucleic acid sequence.
[0118] Attribution probability mass function: for a given fluorosequence, the posterior probability mass function of its source proteins, i.e. the set of probabilities $P(p_i \mid f_i)$ of each source protein $p_i$, given an observed fluorosequence $f_i$.
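One way such a posterior could be computed is with Bayes' rule, as in the following minimal Python sketch. The disclosure does not specify how the probabilities are obtained; the likelihood values, the optional prior, and the function name below are assumptions made for illustration.

```python
def attribution_pmf(likelihoods, priors=None):
    """Posterior P(protein | fluorosequence) via Bayes' rule.

    likelihoods: dict mapping protein id -> P(fluorosequence | protein)
    priors:      dict mapping protein id -> prior weight (uniform if omitted)
    """
    if priors is None:
        priors = {protein: 1.0 for protein in likelihoods}
    joint = {p: likelihoods[p] * priors[p] for p in likelihoods}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("fluorosequence has zero probability under every source protein")
    return {p: value / total for p, value in joint.items()}


# Hypothetical likelihoods of one observed fluorosequence under three proteins.
pmf = attribution_pmf({"protA": 0.6, "protB": 0.2, "protC": 0.0})
print(pmf)
```

The returned values sum to one over the candidate source proteins, which is the defining property of a probability mass function.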
V. EXAMPLES
[0119] The following examples are included to demonstrate preferred embodiments of the disclosure. The techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the disclosure, and thus can be considered to constitute preferred modes for its practice. However, in light of the present disclosure, many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the disclosure.
Example 1—Profiling the Peptides Bound to the MHC by Identity and Quantity Through Sequencing
[0120] The methodology used for profiling MHC peptides is summarized in
[0121] A. Extracting MHC bound peptides:
[0122] A number of methods for enriching and extracting MHC bound peptides have been well described in the literature (Yadav et al., 2014; Müller et al., 2006). The cells and tissues are first lysed and the MHC proteins are enriched by immunoprecipitation. Briefly, an MHC-I allele specific (or pan-allelic, depending on the experiment) antibody is fixed to beads and the MHC-I proteins are enriched. Gently treating this protein mixture with mild acid (such as 0.2-1% formic acid) releases the peptides bound to the MHC-I complex. These peptides are collected and lyophilized for downstream use. The source of the biological sample may be a tumor biopsy, a healthy tissue biopsy, cell cultures, enriched cells from the blood stream (such as dendritic cells), or other suitable sources. If a tumor and a matched control sample from the same patient are available, patient-specific MHC peptides may be extracted and identified, supporting what is called “personalized” therapy. Regardless of the source or the presence of a matched sample, the end product of the extraction method(s) is a pool of peptides.
[0123] B. Fluorosequencing of MHC Bound Peptides:
[0124] The extracted MHC peptides obtained in A are subjected to the labeling procedures used in fluorosequencing.
[0125] (i) Labeling of Peptides:
[0126] The strategy for labeling different amino acids, namely cysteine, lysine, tryptophan and aspartic/glutamic acid, has been described earlier (Swaminathan et al., 2014; Hernandez et al., 2017). It is conceivable that labeling tyrosine, methionine, histidine and post-translationally modified amino acid residues (phosphorylation and glycosylation) can be performed as well (Swaminathan et al., 2014; Phatnani & Greenleaf, 2006; Stevens et al., 2005). Experimentally, the peptide sample is divided into parts either by random sub-sampling or via fractionation methods such as separating the peptides by salt or pH gradient columns into different aliquots. Each of these aliquots would be fluorescently labeled with a subset of amino acid selective fluorophores. In a conceivable implementation, each of the aliquots is further subdivided and labeled with a different subset of amino acid selective fluorophores. Depending on the concentration of the MHC peptide sample, direct fluorescent labeling can be done.
[0127] (ii) Fluorosequencing of Labeled Peptides:
[0128] The population of fluorescently labeled peptides is sequenced as has been described (Swaminathan, 2010; U.S. Pat. No. 9,625,469; U.S. patent application Ser. No. 15/461,034; U.S. patent application Ser. No. 15/510,962). About 10-15 experimental cycles are performed, since the MHC peptides are typically 9-11 amino acids in length; one cycle comprises one round of Edman degradation chemistry and one round of raster scanning the slide surface to obtain images of all peptides across multiple fluorescent channels. The intensity trace of each peptide molecule through the Edman cycles is analyzed and a fluorosequence is obtained. After combining information on the efficiencies of the different physicochemical processes in the experiment (such as photobleaching rate and Edman efficiency), a list of fluorosequences with their counts and confidence scores is generated.
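The conversion of an intensity trace into a fluorosequence can be sketched in Python as follows. This is a deliberately idealized illustration for a single fluorescent channel: it assumes a known per-dye intensity, calls a labeled residue whenever the intensity drops by roughly one dye between consecutive images, and ignores photobleaching, Edman failures and dye loss, which the actual analysis accounts for. The function name, the trace values and the choice of “E” as the label symbol are hypothetical.

```python
def call_fluorosequence(intensities, dye_unit, label="E"):
    """Turn one molecule's intensity trace into a fluorosequence string.

    intensities[0] is the intensity before any Edman cycle;
    intensities[k] is the intensity after cycle k.
    A drop of about one dye_unit at cycle k marks position k as labeled.
    """
    calls = []
    for before, after in zip(intensities, intensities[1:]):
        dyes_lost = round((before - after) / dye_unit)
        calls.append(label if dyes_lost >= 1 else "x")
    return "".join(calls)


# Hypothetical noisy trace of a molecule carrying two dyes, imaged over 10 cycles.
trace = [2.05, 1.02, 0.97, 1.01, 0.98, 0.03, 0.01, 0.00, 0.02, 0.00, 0.01]
fluorosequence = call_fluorosequence(trace, dye_unit=1.0)
print(fluorosequence)
```

Here the intensity falls at cycles 1 and 5, so labeled residues are called at positions 1 and 5 and every other position is recorded as unlabeled.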
[0129] C. Building Reference Database of Epitopes for Matching Fluorosequences:
[0130] The list of fluorosequences obtained from B may be matched to a reference dataset to determine the exact peptide sequence of each. Construction of the reference database (e.g. the potential set of all MHC peptide sequences) requires bioinformatics analysis of the underlying cellular proteome. Given the difficulty of cataloguing all the proteins and peptides present in the cellular proteome, researchers often use exome and transcriptome sequencing data to infer the MHC peptide list. Two pertinent sources of information are required for predicting MHC peptides from genomic information: (a) the population of expressed proteins (which can be obtained from exome or transcriptome data) and (b) the HLA typing (the set of 6 different HLA alleles) of the individual cell line. Thus, in the pipeline for MHC peptide sequencing by fluorosequencing, either (a) genome (or exome) and transcriptome sequencing of the cell or tissue biopsy is performed, or (b) a publicly available dataset for the particular biological sample that can yield these two pieces of information is used.
[0131] A number of publicly available prediction algorithms use the exome and transcriptome data to infer MHC peptide sequences (Backert & Kohlbacher, 2015). The 9-11 amino acid long peptides originating from the potentially translated proteins are computationally analyzed for their secondary structures, MHC binding strengths, transcript level abundances, proteasome cleavage efficiencies, etc. to determine each peptide's probability of being presented as an MHC bound peptide (Schumacher & Schreiber, 2015). This rank-ordered list of peptides is the reference dataset for pattern matching with the observed fluorosequences. When comparisons are made between lists obtained from a tumor biopsy and a matched control sample (exome or genome data alone), tumor associated or tumor specific antigens can be determined. If fluorosequences identify or match these MHC peptide sequences, then the fluorosequencing technology can be used for discovering and confirming neoantigens. An alternate source of this dataset may be mass spectrometry identified peptides. With a high false discovery threshold, the peptide list is longer and contains more false positives, but in combination with prediction algorithms it can encompass a richer dataset than the prediction algorithm output alone.
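The first step of building such a reference dataset, enumerating every 9-11 amino acid window of the potentially translated proteins, can be sketched in Python as below. The sketch covers only the enumeration; the ranking by binding strength, abundance and cleavage efficiency is performed by the prediction tools cited above and is not reproduced here. The function name and the toy protein are hypothetical.

```python
def candidate_peptides(proteins, min_len=9, max_len=11):
    """Map each 9-11 residue window to the set of proteins it occurs in."""
    candidates = {}
    for protein_id, sequence in proteins.items():
        for length in range(min_len, max_len + 1):
            for start in range(len(sequence) - length + 1):
                window = sequence[start:start + length]
                candidates.setdefault(window, set()).add(protein_id)
    return candidates


# Hypothetical 12-residue protein, for illustration only.
candidates = candidate_peptides({"protA": "MAGKTLSVKAKE"})
print(len(candidates))
```

Keeping the set of source proteins for each window makes it possible to trace a matched peptide back to the proteins it may have come from.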
[0132] D. Matching Fluorosequencing Data to Reference Datasets:
[0133] The result of B is a list of fluorosequences, each with its observed count and a confidence score for the observation. The result from C is a dataset of peptide sequences, either rank-ordered from the prediction algorithms or a dataset of epitopes from publicly available sources. Given (a) the few amino acid groups that can be selectively labeled and (b) the short peptide length (9-11 amino acids), the number of unique matches of fluorosequences to peptides in the predicted dataset is very likely to be low. However, because the fluorosequences are directly observed, the rank-ordered peptide list can be reweighted with this orthogonal information and a new rank-ordered peptide list generated. It is also likely that the observed fluorosequences will match and confirm higher ranked peptides in the reference list. A scoring system can be developed to match the fluorosequences to the reference dataset, with higher weight ascribed to fluorosequences that match fewer of the other peptides in the dataset and that confirm higher ranked peptides.
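A scoring system of this kind could take many forms. The Python sketch below shows one hypothetical choice, not the scoring system of the disclosure: each reference peptide's prior score is multiplied by a factor that grows with the observed fraction and confidence of its fluorosequence and shrinks with the number of reference peptides sharing that fluorosequence. The formula, the function names and the toy peptides are all assumptions, and a single lysine label is assumed for simplicity.

```python
from collections import Counter


def fluorosequence(peptide, labeled="K"):
    """Mask unlabeled residues as 'x'."""
    return "".join(aa if aa in labeled else "x" for aa in peptide)


def rerank(reference, observations, labeled="K"):
    """Reweight prior scores with observed fluorosequence evidence.

    reference:    dict peptide -> prior score from the prediction algorithms
    observations: dict fluorosequence -> (count, confidence)
    """
    sharing = Counter(fluorosequence(p, labeled) for p in reference)
    total = sum(count for count, _ in observations.values())
    rescored = {}
    for peptide, prior in reference.items():
        pattern = fluorosequence(peptide, labeled)
        count, confidence = observations.get(pattern, (0, 0.0))
        fraction = count / total if total else 0.0
        # Evidence is diluted across all peptides that share the pattern.
        evidence = confidence * fraction / sharing[pattern]
        rescored[peptide] = prior * (1.0 + evidence)
    return sorted(rescored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference scores and observations.
reference = {"AAKAAAAAA": 0.6, "AAAAAKAAA": 0.5, "AAKAAAAAV": 0.4}
observations = {"xxxxxKxxx": (30, 0.9), "xxKxxxxxx": (10, 0.8)}
ranked = rerank(reference, observations)
print(ranked)
```

In this toy case the peptide with the unambiguous, frequently observed fluorosequence moves to the top of the list, while the two peptides that share a pattern split its evidence and keep their relative order.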
Example 2—Computational Simulation of Fluorosequencing to Validate its Application for MHC Peptide Profiling
[0134] Fluorosequencing of MHC peptides for identification provides an information content of the sequence between two extremes as shown in a simple schematic in
[0135] The following two simulation studies highlight the feasibility of using fluorosequencing technology to access the information content in publicly available MHC peptides.
[0136] (i) Presence of Amino Acids that can be Labeled:
[0137] Given that six of the twenty naturally occurring amino acids can be labeled for fluorosequencing, it is unclear how well these amino acids are represented in MHC peptide sequences. To determine what percentage of the putative MHC peptides would even be visible to fluorosequencing, the epitopes presented by the HLA-A2 allele were chosen from the IEDB data repository (www.iedb.org/) (filtered by confirmation with binding assay).
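The visibility calculation can be sketched in a few lines of Python. The paragraph does not list the six labelable amino acids, so the default set below (C, K, W, D, E, taken from the labeling strategies named in Example 1) is an assumption and is exposed as a parameter. The peptide strings are hypothetical stand-ins, not IEDB data.

```python
def visible_fraction(peptides, labelable="CKWDE"):
    """Fraction of peptides containing at least one labelable residue."""
    peptides = list(peptides)
    if not peptides:
        return 0.0
    visible = sum(1 for p in peptides if any(aa in labelable for aa in p))
    return visible / len(peptides)


# Hypothetical epitope list; a real analysis would load the IEDB export instead.
toy_epitopes = ["ELYAEKVATR", "AAGLGALTV", "GLYAKTVAL", "SLLPAAVNL"]
print(visible_fraction(toy_epitopes))
```

Peptides with no labelable residue produce no fluorescent signal at all, so this fraction is an upper bound on what any labeling scheme drawn from the chosen set could detect.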
[0138] (ii) Unique Identification and Confirmation of MHC Epitopes by Fluorosequencing:
[0139] Amongst the cancer types, melanoma cell lines have been observed to carry the highest mutation load. In order to find out if the labeling schemes available for fluorosequencing can uniquely identify or confirm known MHC epitopes, a validated epitope list observed to have occurred in melanoma cell-lines was chosen from the IEDB data repository. The known 133 epitopes are compiled through filtering the IEDB dataset for “melanoma” term in the validated epitope observations and can serve as a benchmark to validate the limitations of fluorosequencing to uniquely identify MHC peptides. As seen in
[0140] These results indicate that fluorosequencing as a technology provides identifiable information about MHC peptides. When combined with a reference database and multiple labeling strategies, the fluorosequencing technology can identify and confirm highly probable predicted peptides. Furthermore, if there is evidence for a fluorosequence matching a predicted neoantigen peptide, then the technology can also be used for neoantigen discovery. Previously identified neoantigens (also referred to as public neoantigens) can be directly identified by fluorosequencing from a limited tissue biopsy. This type of test is envisioned for the patient selection process. Therapies based on a select neoantigen can be paired to patients expressing the displayed neoantigen, which can be identified by fluorosequencing.
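The uniqueness analysis of simulation (ii) can be sketched in Python by grouping a list of epitopes by the fluorosequence each would produce under a given labeling scheme and keeping those whose fluorosequence is shared with no other epitope. The sketch assumes that each labeled amino acid type carries a distinguishable fluorophore and that sequencing is error free; the function names and the four peptides are hypothetical and are not the 133 melanoma epitopes.

```python
from collections import defaultdict


def fluorosequence(peptide, labeled):
    """Mask unlabeled residues as 'x'."""
    return "".join(aa if aa in labeled else "x" for aa in peptide)


def uniquely_identified(peptides, labeled):
    """Peptides whose fluorosequence is visible and shared with no other peptide."""
    groups = defaultdict(list)
    for peptide in peptides:
        groups[fluorosequence(peptide, labeled)].append(peptide)
    return sorted(
        members[0]
        for pattern, members in groups.items()
        if len(members) == 1 and any(symbol != "x" for symbol in pattern)
    )


# Hypothetical epitope list, for illustration only.
epitopes = ["ELYAEKVATR", "GLYAKTVAL", "ALYDKTVGL", "SLLPAAVNL"]
lysine_only = uniquely_identified(epitopes, labeled="K")
lysine_and_aspartate = uniquely_identified(epitopes, labeled="DK")
print(lysine_only, lysine_and_aspartate)
```

With lysine alone, two of the toy peptides collapse to the same pattern; adding a second label separates them, which mirrors the point that multiple labeling strategies increase the number of identifiable peptides.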
Example 3—Sequencing HLA Peptides
[0141] (i) HLA Peptides from Mono-Allelic B-Cells
[0142] Pilot experiments were set up to obtain and validate HLA peptides and to predict neo-antigenic peptides on mono-allelic B-cell lines. The isolated peptides were sequenced by fluorosequencing, and a target peptide was spiked into the mixture to determine limits of detection.
[0143] (ii) Isolating and Validating HLA Peptides
[0144] Two mono-allelic B-cell lines (HLA-A2603 and HLA-B0702) were purchased from The International Histocompatibility Working Group as detailed in the publication (Petersdorf et al., 2013). 3×10⁸ cells were cultured and HLA peptide purification was performed as described (Abelin et al., 2017). A schematic of the process is shown in
[0145] The isolated HLA peptides were identified by LC coupled tandem mass-spectrometer (ThermoFisher, Orbitrap Fusion Lumos) using a reference dataset of a human proteome (Swissprot) and with settings described in literature for analyzing HLA peptides (Abelin et al., 2017; Bassani-Sternberg et al., 2015). The validity of the HLA isolation procedure was confirmed by performing motif analysis and binding affinity analysis on the isolated peptides (shown in
[0146] (iii) Predicting HLA Peptides from Genomic Information
[0147] The genome and RNA sequencing data for the B cell line (expressing the HLA-A2603 allele) were obtained from publicly available datasets. The raw sequence reads were analyzed and compared with the standard reference human genome using a set of software tools, including mhcflurry, to generate a list of peptides containing single nucleotide variations and indels (neoantigens). The next step in the process is the analysis of the peptide sequences by the netMHC software, which predicts the binding affinity of the peptides to the MHC complex and serves as a proxy for their presentation on the cell. Performing this analysis narrowed down the set of transcript derived peptides to 36,000.
[0148] The Venn diagram in
[0149] (iv) Fluorosequencing of HLA Peptides
[0150] To validate the single molecule fluorosequencing method on the HLA peptides, the HLA peptides from the A2603 and B0702 cell lines were first isolated as previously described. The C-terminal carboxylic acid was then selectively capped with an acid esterified Fmoc PEG linker (Fmoc-CO-PEG4-NH₂) using a previously described oxazolone chemistry (Kim et al., 2011). The internal aspartic and glutamic acid residues were labeled with Atto647N-amine using standard carbodiimide chemistry (Totaro et al., 2016), followed by deprotection of the Fmoc group. The free dyes were removed by standard C-18 tip cleanup. This produced a set of fluorescently labeled peptides with free carboxylic acid ends, which were then subjected to fluorosequencing.
[0151] To further validate the sensitivity of the fluorosequencing technology and obtain its limits of detection, a spike-in and recovery assay for a known target antigenic peptide was performed in the HLA peptide background. A previously identified neoantigen (of sequence ELYAEKVATR (SEQ ID NO: 1)) was chosen, its internal acidic residues were labeled with the Atto647N fluorophore, and the peptide was spiked across 5 orders of magnitude in dilution into the labeled HLA peptide mixture background. Fluorosequencing was performed on this peptide mixture, and measurements were made from about 50,000 individual molecules per experiment. The number of molecules with the observed fluorosequence pattern “ExxxE” was quantified and is presented in
[0152] (v) Application of HLA Peptide Sequencing Using Single Molecule Peptide Sequencing Methods
[0153] The single molecule peptide sequencing methods, exemplified by fluorosequencing, are applicable to tumor treatment and monitoring. As a highly sensitive proteomic method, fluorosequencing requires small sample amounts and has a high dynamic range for identification. Two specific applications are shown in
[0156] All of the methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this disclosure have been described in terms of preferred embodiments, it will be apparent that variations may be applied to the methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the disclosure. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications are deemed to be within the spirit, scope and concept of the disclosure as defined by the appended claims.
REFERENCES
[0157] The following references, to the extent that they provide examples of procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.
[0158] U.S. patent application Ser. No. 15/461,034.
[0159] U.S. patent application Ser. No. 15/510,962.
[0160] U.S. Pat. No. 9,625,469.
[0161] Abelin et al., Mass Spectrometry Profiling of HLA-Associated Peptidomes in Mono-allelic Cells Enables More Accurate Epitope Prediction, Immunity, 46:315-326, 2017.
[0162] Backert & Kohlbacher, Genome Medicine, 7(1):119, 2015.
[0163] Bassani-Sternberg et al., Mol. Cell. Proteomics, 14:658-73, 2015.
[0164] BCC Library, Report View, PHM053A. Available at: www.bccresearch.com/market-research/pharmaceuticals/cancer-immunotherapy-phm053a.html.
[0165] Braslavsky et al., PNAS, 100(7):3960-4, 2003.
[0166] Brennick et al., Immunotherapy, 9(4):361-71, 2017.
[0167] Brown et al., Genome Res., 24:743-50, 2014.
[0168] Caron et al., Immunity, 47(2):203-8, 2017.
[0169] Dudley & Rosenberg, Nat. Rev. Cancer, 3:666-675, 2003.
[0170] Edman et al., Acta Chem. Scand., 4:283-293, 1950.
[0171] Goodman et al., Molecular Cancer Therapeutics, 16(11):2598-608, 2017.
[0172] Harris et al., Cancer Biology & Medicine, 13(2):171-93, 2016.
[0173] Harris et al., Nature, 552:S74, 2017.
[0174] Hernandez et al., New Journal of Chemistry, 41:462-469, 2017.
[0175] Kim et al., Anal. Biochem., 419:211-6, 2011.
[0176] Lee et al., Trends in Immunology, 39(7):536-48, 2018.
[0177] Maude et al., New England Journal of Medicine, 378(5):439-48, 2018.
[0178] Müller et al., in Immunotherapy of Cancer, 21-44, Humana Press, 2006.
[0179] Neefjes et al., Nat. Rev. Immunol., 11:823-836, 2011.
[0180] Petersdorf et al., Int. J. Immunogenet., 40, 2013.
[0181] Pham et al., Annals of Surgical Oncology, 25(11):3404-12, 2018.
[0182] Phatnani & Greenleaf, Genes Dev., 20:2922-2936, 2006.
[0183] Robbins et al., Clinical Cancer Research, 21(5):1019-27, 2015.
[0184] Schumacher & Schreiber, Science, 348(6230):69-74, 2015.
[0185] Shimabukuro-et al., Journal for Immunotherapy of Cancer, 6, 2018.
[0186] Stevens et al., Rapid Commun. Mass Spectrom., 19:2157-2162, 2005.
[0187] Swaminathan R, Biology S. Jagannath Swaminathan. Education. doi:10.1002/rcm.3179, 2010.
[0188] Swaminathan et al., bioRxiv, Cold Spring Harbor Labs Journals, 2014.
[0189] Totaro et al., Bioconjug. Chem., 27:994-1004, 2016.
[0190] Vitiello & Zanetti, Nature Biotechnology, 35(9):815-7, 2017.
[0191] Yadav et al., Nature, 515:572-576, 2014.
[0192] Yee & Lizee, Cancer J., 23:144-148, 2017.
[0193] Yee et al., Cancer J., 21:492-500, 2015.
[0194] Yewdell et al., Nat. Rev. Immunol., 3:952-961, 2003.