VIRUS PEPTIDE AND PROTEIN VARIANT SELECTION WORKFLOW
20230194543 · 2023-06-22
Abstract
Provided herein are methods, techniques and processes for selecting one or more peptides for virus detection (e.g., influenza, SARS-CoV-2) in clinical samples using mass spectrometry. Methods for selecting a combination of peptides to identify one or more disease states in a subject can include: (i) selecting the one or more disease states; (ii) collecting information on proteins associated with the one or more disease states being present within the subject; (iii) in silico digesting individual proteins associated with the one or more disease states to obtain possible workflow peptides; (iv) collecting filtering data associated with the possible workflow peptides; (v) analyzing the possible workflow peptides for coverage of the one or more disease states and for mass spectrometry detection and resolution; and (vi) selecting the combination of peptides from the possible workflow peptides based on the analyzing step.
Claims
1. A method for selecting a combination of peptides to identify one or more disease states in a subject using mass spectrometry, the method comprising: selecting the one or more disease states; collecting information on proteins associated with the one or more disease states being present within the subject; in silico digesting individual proteins associated with the one or more disease states to obtain possible workflow peptides; collecting filtering data associated with the possible workflow peptides; analyzing the possible workflow peptides for coverage of the one or more disease states and for mass spectrometry detection and resolution; and selecting the combination of peptides from the possible workflow peptides based on the analyzing step.
2. The method of claim 1, wherein two or more disease states are selected.
3. The method of claim 2, wherein three disease states are selected.
4. The method of claim 1, wherein two or more variants of the one or more disease states are selected.
5. The method of claim 1, wherein the filtering data comprises physicochemical data.
6. The method of claim 1, wherein the filtering data comprises homology data.
7. The method of claim 1, wherein the filtering data comprises abundance of peptide data (frequency %).
8. The method of claim 1, wherein the filtering data comprises data associated with the one or more disease states.
9. The method of claim 1, wherein the filtering data comprises physicochemical data and abundance of peptide data.
10. The method of claim 5, wherein the physicochemical data comprises ionization efficiency and/or retention time.
11. The method of claim 5, wherein the physicochemical data comprises one or more of (a) length of possible workflow peptides; (b) MRM transition data on co-eluting or close eluting possible workflow peptides; and (c) amino acid sequences contained within the possible workflow peptides.
12. The method of claim 1, wherein the filtering data comprises data on methionine being present within amino acid sequences contained within the possible workflow peptides.
13. The method of claim 1, further comprising eliminating possible workflow peptides using the filtering data prior to analyzing the possible workflow peptides for coverage of the one or more disease states and for mass spectrometry detection and resolution.
14. The method of claim 1, further comprising eliminating possible workflow peptides after analyzing the possible workflow peptides for coverage of the one or more disease states and for mass spectrometry detection and resolution using filtering data.
15. The method of claim 1, wherein analyzing the possible workflow peptides for coverage of the one or more disease states and for mass spectrometry detection and resolution comprises applying a statistical approach.
16. The method of claim 15, wherein the statistical approach comprises Bayesian inference.
17. The method of claim 15, wherein the statistical approach comprises a Markov Chain Monte Carlo algorithm.
18. The method of claim 1, wherein analyzing the possible workflow peptides for coverage of the one or more disease states and for mass spectrometry detection and resolution comprises analyzing coverage for a yes/no result for the disease state.
19. The method of claim 1, wherein analyzing the possible workflow peptides for coverage of the one or more disease states and for mass spectrometry detection and resolution comprises analyzing coverage for a determination of a particular variant of the disease state.
20. The method of claim 1, wherein analyzing the possible workflow peptides for coverage of the one or more disease states and for mass spectrometry detection and resolution comprises analyzing coverage for a determination of a particular disease state from a group of possible disease states.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The technology will be more fully understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
DESCRIPTION OF THE TECHNOLOGY
[0032] In general, the present technology is directed to the selection of peptides for detection by mass spectrometry that are indicative of a disease state (e.g., influenza infection, coronavirus infection, human rhinovirus infection, etc.). The present technology provides processes in which data are first collected and/or calculated and then filtered using physicochemical properties and other characteristics and traits of the disease state to provide an efficient workflow for the mass spectrometry detection of the disease state in a sample.
[0033] One challenge associated with efficiently diagnosing an influenza infection is the variability that exists among influenza types and subtypes. Due to this variability, reagents and methods designed to detect one type or subtype of the virus may not detect an infection with a different influenza type or subtype. Three types of influenza viruses infect human subjects: influenza A, influenza B, and influenza C. Influenza A and B are typically associated with seasonal flu, while influenza C generally causes mild disease. A fourth type, influenza D, can infect certain non-human mammals such as cattle. Influenza viruses are further divided into subtypes/strains based on the composition of the surface proteins of the viral envelope. For influenza A, the subtypes are based on the expression of particular variants of the surface proteins hemagglutinin (H) and neuraminidase (N). There are 18 known different variants of hemagglutinin (H1-H18), and 11 known different variants of neuraminidase (N1-N11), combinations of which result in 198 possible different influenza A subtypes (over 130 of which have been detected in nature). In view of the prevalence of influenza infection, and the overlap of influenza symptoms with other common pathogenic infections (e.g., SARS-CoV-2 infection, human rhinovirus infection), methods of accurately and efficiently detecting and diagnosing viral infection are needed.
[0034] One objective of the present technology is to identify a set of peptides that can be used to detect the presence of an influenza virus in a sample (e.g., a biological sample such as a nasal swab, saliva, sputum, blood, plasma, etc.) derived from a subject known or suspected of having an influenza infection. In exemplary embodiments, the set of peptides can include peptides having sufficient sequence homology among influenza types or subtypes to allow detection of multiple influenza types or subtypes. In addition, in some embodiments, the peptides have minimal to no homology with non-influenza proteins, thereby serving as specific marker(s) of influenza infection. By way of example, the present technology can be used to identify one or more (e.g., two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, etc.) peptides that comprise a sequence that is shared among one or more influenza types and/or one or more influenza subtypes. In some embodiments, the present technology can be used to identify peptides that have significant homology (i.e., having at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 99%, or 100% sequence identity) in two or more influenza types (e.g., peptides that have significant homology in influenza A and influenza B). In some embodiments, the present technology can be used to identify peptides that have significant homology (i.e., having at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 99%, or 100% sequence identity) in two or more influenza subtypes (e.g., peptides that have significant homology in influenza A H1N1 and influenza A H3N2).
In some embodiments, the technology provides a set of peptides, wherein each peptide in the set of peptides has significant homology in two, three, or four influenza types selected from influenza A, influenza B, influenza C, and influenza D, or even novel, mutated variants. In some embodiments, the technology provides a set of peptides, wherein each peptide in the set of peptides has significant homology in two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more influenza subtypes. In some embodiments, each peptide in the set of peptides has minimal homology (e.g., 60% or less, 50% or less, 40% or less, 30% or less, 20% or less, or 10% or less sequence identity) to non-influenza peptides or proteins. For example, in some embodiments, each peptide in the set of peptides can have minimal homology to a human protein, and/or a protein of animal origin, e.g., a canine protein or a feline protein. Selection of peptides in accordance with the methodology described herein can be used to identify a minimal set of peptides that can be used to detect multiple types or subtypes of influenza in a sample, e.g., a patient sample.
[0035] In general, processes of the present technology include collection/calculation steps and selection/weighting steps.
[0036] As an initial matter, processes of the present technology begin with the identification of the viruses or disease states for detection in a clinical analysis as well as the type of information required for a result from the clinical analysis. For example, in one embodiment, the virus to be detected is influenza in humans, and the result desired from the clinical analysis is a simple yes/no for any type of seasonal influenza infection.
[0037] As described above, there are four types of influenza virus (types A, B, C, and D). Only types A-C are known to infect humans. Type D primarily affects cattle. Further, type C infections generally result in mild symptoms. Thus, type C is generally not considered to be associated with seasonal or pandemic outbreaks. As the present embodiment is directed to detecting influenza in humans (and more particularly harmful influenza), the result of the clinical analysis of the sample would indicate the presence or absence of an influenza infection (either or both of type A and type B) in a sample from the human subject. In other embodiments, the workflow need not be restricted to types A and B. In still further embodiments, the workflow is directed to other viruses or other viruses in combination with influenza (e.g., human rhinovirus and/or SARS-CoV-2).
[0038] Influenza viruses can be further categorized by subtype (for type A) or lineage (for type B). Additional information and classification of the virus can be discerned through genetic clades and sub-clades as shown in
[0039] Returning to
[0040] From this first step of collecting data relating to the proteins associated with viral infection (e.g., influenza), additional metadata is collected on each of the proteins. The metadata is then associated with the in silico tryptic digestion of the proteins identified in the first collection step to determine the possible workflow peptides. (See Beynon, R. J.; Bond, J. S.; Proteolytic Enzymes: A Practical Approach, 2nd Edn. Oxford University Press, Oxford, UK, 2001, pp. 149-183). With this list available, further calculations using the physicochemical properties of the peptides are performed to determine their ionization efficiencies and other peptide characteristics (for mass spectrometry) and retention times (for LC separation prior to MS). For example, Geromanos et al., in “Simulating and validating proteomics data and search results” Proteomics 2011, 11, 1189-1211 describe an in silico method for the generation of proteomes, which utilizes the underlying physicochemical properties of peptides and proteins to compute peptide characteristics. Additionally, M. Gilar et al. describe various prediction models for peptide separation in their published paper entitled “Utility of retention prediction model for investigation of peptide separation selectivity in reversed-phase liquid chromatography: impact of concentration of trifluoroacetic acid, column temperature, gradient slope and type of stationary phase” published in Analytical Chemistry, Vol. 82, No. 1, Jan. 1, 2010 and also in M. Gilar et al. “Solvent selectivity and strength in reversed-phase liquid chromatography separation of peptides” published in Journal of Chromatography A, 1337, (2014), pages 140-146. All of this information is calculated and collected so that it can be used in the selection/weighting steps applied in the second stage of the process shown in
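By way of a non-limiting illustration, the in silico tryptic digestion step described above can be sketched as follows. The sketch assumes the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) except when the next residue is proline (P); the function name, the missed-cleavage parameter, and the toy sequence are hypothetical and are not taken from the specification.

```python
import re
from typing import List


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> List[str]:
    """Return peptides from an in silico tryptic digest of `protein`.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join 0..missed_cleavages additional neighbouring fragments.
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides


# Hypothetical toy sequence, for illustration only.
TOY_PROTEIN = "MKTAYRPLKGR"
fully_cleaved = tryptic_digest(TOY_PROTEIN)
one_missed = tryptic_digest(TOY_PROTEIN, missed_cleavages=1)
```

In this toy sequence the R at position 6 is followed by P, so no cleavage occurs there and the fragment TAYRPLK is kept intact.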
[0041] As mentioned above, the processes of the present technology include both collection/calculation steps followed by selection/weighting steps. In the selection stage of the process, the information calculated and collected on possible workflow peptides is evaluated and ranked to arrive at actual workflow peptides for viral detection using mass spectrometry.
[0042] After calculation and collection of ionization efficiency and retention times, as well as other physicochemical properties, filtering/weighting based on physicochemical properties and homology determination begins. Specifically, the length of the peptides, the amino acid sequences therein (i.e., their motifs), the retention times (used to determine co-eluting or near-eluting peptides), and the MRM transitions of the co-eluting or near-eluting (potentially isobaric) peptides are evaluated and used to discount (i.e., filter out, down score, reduce weighting, etc.) at least a portion of the possible workflow peptides generated from the previous collection and calculation steps. For example, the length of the possible workflow peptides is evaluated. Possible workflow peptides having a longer length are disfavored due to difficulties in liquid chromatography or mass spectrometry analysis. Therefore, a threshold cut-off is applied as a filter (e.g., peptides having a sequence length longer than 15 amino acids are eliminated in this example). Other criteria include retention time (e.g., selecting peptides that elute at different times) and/or MRM transition differences (i.e., selecting non-isobaric peptides and/or distinct MRM transitions) in case the peptides co-elute and could otherwise give rise to detection interference.
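As a hedged illustration of the length cut-off and the co-elution check described above, the following sketch eliminates peptides longer than 15 amino acids and flags pairs of peptides that elute close together while sharing an MRM transition. The retention-time window, the m/z tolerance, the data layout, and all names and values are assumptions chosen for illustration only.

```python
from itertools import combinations


def length_filter(peptides, max_len=15):
    """Keep peptides at or below the length cut-off (15 in the example above)."""
    return [p for p in peptides if len(p) <= max_len]


def coeluting_conflicts(candidates, rt_window=0.5, mz_tol=0.01):
    """Return pairs that elute within `rt_window` and share an MRM transition.

    `candidates` maps a peptide name to (retention_time, transitions), where
    transitions is a list of (precursor_mz, fragment_mz) tuples.
    """
    conflicts = []
    for (a, (rt_a, tr_a)), (b, (rt_b, tr_b)) in combinations(candidates.items(), 2):
        if abs(rt_a - rt_b) > rt_window:
            continue  # well separated in time, so no interference expected
        shared = any(
            abs(pa - pb) <= mz_tol and abs(fa - fb) <= mz_tol
            for pa, fa in tr_a
            for pb, fb in tr_b
        )
        if shared:
            conflicts.append((a, b))
    return conflicts


# Hypothetical values, for illustration only.
TOY_CANDIDATES = {
    "pep1": (10.0, [(500.25, 600.30)]),
    "pep2": (10.2, [(500.25, 600.30)]),  # co-elutes with pep1, same transition
    "pep3": (10.1, [(650.40, 720.35)]),  # co-elutes, but distinct transition
    "pep4": (25.0, [(500.25, 600.30)]),  # same transition, elutes much later
}
conflicts = coeluting_conflicts(TOY_CANDIDATES)
kept = length_filter(["LVDGK", "A" * 16])
```

Flagged pairs could then be down-weighted or one member removed, depending on the scoring approach chosen.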
[0043] In addition to the physicochemical properties and homology filter criteria described above, other filtering or scoring functions can be applied. For example, the frequency % of a peptide (i.e., the percentage of protein amino acid sequences containing a given peptide) can be used to generate a score function for selecting possible workflow peptides. As a simple example, a score function (scaled between 0 and 1) is calculated by multiplying the frequency of a particular possible workflow peptide by the number of publications describing the function of the protein with which the peptide is associated. As a result, peptides that are common between strains and types of influenza (high frequency) and that are extensively described can be favored, such that a minimal number of possible workflow peptides can be selected as a way to positively detect one or more proteins indicative of viral infection.
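The simple score function described above can be sketched as follows. The specification states only that the score is scaled between 0 and 1; dividing by the largest raw product is one possible scaling and is an assumption made here, as are the function name and the toy values.

```python
def frequency_scores(frequency_pct, publication_counts):
    """Score = frequency % x publication count, scaled to [0, 1].

    Scaling by the maximum raw product is an assumption of this sketch.
    """
    raw = {
        pep: frequency_pct[pep] * publication_counts.get(pep, 0)
        for pep in frequency_pct
    }
    top = max(raw.values(), default=0)
    if top == 0:
        return {pep: 0.0 for pep in raw}
    return {pep: value / top for pep, value in raw.items()}


# Hypothetical values, for illustration only.
scores = frequency_scores(
    {"pepA": 80.0, "pepB": 40.0, "pepC": 10.0},
    {"pepA": 10, "pepB": 5, "pepC": 0},
)
```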
[0044] Using one or more of the physicochemical property, homology, and/or frequency scoring factors, a scoring function resulting from the inference analysis, which accumulates these factors into a final, comprehensive, multi-factorial weighted score, can be applied to filter the results. In some embodiments, instead of using a scoring factor to capture all aspects, a simpler threshold cut-off can be applied, such as eliminating all possible workflow peptides having more than 15 amino acids. In other embodiments, the scoring factor can be refined to capture all aspects used in the selection steps (e.g., used in addition to or in replacement of cut-off values). Other ways of calculating a scoring function or different scoring functions are also within the scope of the present technology. For example, some data can be excluded from the analysis due to expert or domain knowledge regarding sample preparation, behavior/properties, and detection of certain peptides. For example, peptides that contain the amino acid methionine (M), an amino acid with a hydrophobic side chain, are particularly challenging to resolve as they may exist in two forms, an oxidized form and a non-oxidized form. Thus, using filtering/weighting steps of the present technology, these peptides can be excluded.
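A minimal sketch of how hard cut-offs (length, methionine exclusion) might be combined with a multi-factorial weighted score is given below. The weighted average shown is only one possible way of accumulating factors and is an assumption; the factor names, weights, and peptide sequences are hypothetical.

```python
def passes_hard_filters(peptide, max_len=15, excluded_residues="M"):
    """Apply threshold cut-offs: maximum length and excluded amino acids."""
    too_long = len(peptide) > max_len
    has_excluded = any(res in peptide for res in excluded_residues)
    return not too_long and not has_excluded


def weighted_score(factors, weights):
    """Combine per-factor scores (each in [0, 1]) into one weighted average."""
    total_weight = sum(weights.values())
    return sum(weights[name] * factors[name] for name in weights) / total_weight


# Hypothetical factor scores and weights, for illustration only.
example_score = weighted_score(
    {"frequency": 1.0, "ionization": 0.5},
    {"frequency": 3.0, "ionization": 1.0},
)
```

Peptides failing the hard filters would be removed before scoring, whereas the weighted score ranks the remaining candidates.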
[0045] In one embodiment, a Bayesian statistics-based analysis is utilized for the final selection of the actual workflow peptides. Once the actual workflow peptides are determined, the data used in modeling these actual peptides (the in silico tryptic digestion, ionization efficiency, retention time, and other metadata) are provided, such that the workflow for the mass spectrometry detection of all actual workflow peptides can be reviewed and applied.
[0046] In another embodiment, a Markov Chain Monte Carlo (MCMC) algorithm is utilized for targeting peptides for selection of the actual workflow peptides. MCMC is a method that allows for the efficient exploration of high-dimensional probability distributions by obtaining random samples (Monte Carlo) from the distribution using an iterative process in which each iteration depends only on the properties of the distribution at the current position and possible destinations (Markov Chain). This method can be used to limit the enormous number of possible experiments in checking each individual target list of peptides to arrive at the selected or actual workflow peptides.
[0047]
[0048] After eliminating peptide b and peptide c using the filtering methods described above, statistical analysis regarding the coverage of detection of the four variants with the smallest number of peptides is undertaken to make the final selection of the actual workflow peptides. In the illustration provided in
[0049] As mentioned above,
[0050] The process of the present technology can optionally include a final step after the collection/computational steps and selection steps. In the embodiments shown in
[0051] The above illustrations of the present technology are directed to obtaining a Yes/No result of virus detection from a single subject. However, the present technology is not limited to just a single class of infection, such as, for example, influenza. For example, the present technology can be applied to a clinical analysis (a mass spectrometry based analysis) of a sample to determine if more than one virus is present in a single sample. That is, the processes of the present technology can be utilized to select actual workflow peptides for an MS determination of whether the subject is infected with SARS-CoV-2, influenza (type A or type B, or a novel variant) and/or any other coronavirus or seasonal viral infection (e.g., human rhinovirus). In addition, the present technology is also applicable to providing a more detailed result (i.e., not just Yes or No). For example, the present technology can be used to select actual workflow peptides whose detection by MS will indicate not just the presence of an influenza infection, but also whether the subject is infected with type A or type B, and potentially which variant (e.g., which subclass, etc.), by incorporating information regarding the variant of interest in the coverage selection steps.
[0052] Further, the present technology is not limited to detection of viral infections. For example, the processes of the present technology can be used to select actual workflow peptides for the detection of metabolic diseases (e.g., Gaucher disease, phenylketonuria) or other types of disorders that can be detected using clinical LC/MS.
[0053] While the above illustrative embodiments were directed to a sample of a single subject, the present technology can also be employed to detect infection in pooled samples (samples stemming from the collection/pooling of numerous subjects to form a pool of subjects). Pooling can be useful when testing large sections of a population for the presence of a disease state. By pooling a number of individuals together, the hope is that a greater segment of the population can be tested regularly and discounted from having the infection, thus saving resources.
[0054] In embodiments of the technology, statistical Bayesian Inference analysis is utilized to identify target peptides that represent the protein variants of interest (e.g. coverage). Technology of the present invention (see
[0055] For example, each peptide from a candidate list of $N$ peptides can be either excluded or included in an experiment for arriving at or determining the selection. There are therefore $2^N$ possible experimental target lists $L_i$, where $1 \le i \le 2^N$, that provide varying degrees of protein variant coverage $C(i)$, where $0 \le C(i) \le 1$ and $C(i) = 1$ corresponds to 100% coverage. In order to assign relative merit to these potential target lists, other constraints must be considered, such as the peak capacity of the separation system and the duty cycle of the MS/MS detection system, along with the requirement to minimize the number of peptides utilized. In practice, because there is an enormous number of possible experiments, the target lists cannot all be checked individually (to arrive at the selection of actual workflow peptides). Instead, embodiments of the present technology apply a statistical approach in the selection step of the method.
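To make the combinatorial argument above concrete, the following sketch computes the coverage of a target list and performs an exhaustive search, which is feasible only for toy-sized candidate lists. A candidate list of only 50 peptides already yields 2**50 (roughly 1.1 x 10^15) possible target lists, which is why a statistical approach is preferred. The peptide and variant names are hypothetical.

```python
from itertools import combinations


def coverage(target_list, peptide_to_variants, all_variants):
    """Fraction of variants hit by at least one peptide in the target list."""
    covered = set()
    for peptide in target_list:
        covered |= peptide_to_variants.get(peptide, set())
    return len(covered & all_variants) / len(all_variants)


def smallest_full_coverage(peptide_to_variants, all_variants):
    """Brute-force search over all 2^N lists, smallest lists first."""
    peptides = sorted(peptide_to_variants)
    for size in range(len(peptides) + 1):
        for subset in combinations(peptides, size):
            if coverage(subset, peptide_to_variants, all_variants) == 1.0:
                return subset
    return None


# Hypothetical toy data, for illustration only.
TOY_MAP = {"pep_a": {"v1", "v2"}, "pep_b": {"v2"}, "pep_c": {"v3", "v4"}}
TOY_VARIANTS = {"v1", "v2", "v3", "v4"}
n_target_lists = 2 ** len(TOY_MAP)  # 8 lists for N = 3
best_list = smallest_full_coverage(TOY_MAP, TOY_VARIANTS)
```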
[0056] In some embodiments, such as some of the embodiments discussed above, one or more statistical approaches may be used, such as those employed in Bayesian inference. Other approaches are also possible. For example, a Markov Chain Monte Carlo (MCMC) algorithm can be utilized in the selection step of the present technology. MCMC methods allow for efficient exploration of high-dimensional probability distributions by obtaining random samples (Monte Carlo) from the distribution using an iterative process in which each iteration depends on the properties of the distribution at the current position and possible destinations (Markov Chain).
[0057] When employing a MCMC approach in the present technology, the figure of merit, which must simultaneously encompass the goals of increasing protein coverage, minimizing the number of peptides and experimental compatibility, is not strictly a probability as such and there is considerable freedom in how to define it. As a result, in some embodiments, the figure of merit is split into two concepts: a “likelihood” function which is the protein coverage $C(i)$ raised to some power $S$ and an exponential “prior probability distribution” for the number of peptides in the target list having a mean that controls the relative importance of this parameter. The experimental constraints are incorporated by imposing a maximum on the number of simultaneously eluting target peptides in the method. For a given target list $L_i$ this can be calculated by producing a scoring function in the form of a “virtual chromatogram” for the target list using the retention time of the targeted peptides and the system peak capacity.
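One possible reading of the figure of merit described above is sketched below in log form: a likelihood term S * log(C), an exponential prior term -n / mean_n on the number of peptides n, and a hard rejection when the virtual chromatogram shows too many co-eluting peptides. Binning normalized retention times into peak-capacity-many bins is an assumption about how a virtual chromatogram might be built; the defaults of 50 and 10 follow the conditions listed in Table 1, while S and mean_n are arbitrary illustrative values.

```python
import math
from collections import Counter


def max_coeluting(retention_times, peak_capacity=50):
    """Largest number of peptides falling in one bin of a virtual chromatogram.

    Retention times are assumed to be normalized to the range [0, 1].
    """
    bins = Counter(
        min(int(rt * peak_capacity), peak_capacity - 1) for rt in retention_times
    )
    return max(bins.values(), default=0)


def log_figure_of_merit(cov, retention_times, S=10.0, mean_n=20.0,
                        peak_capacity=50, max_coelute=10):
    """log(likelihood) + log(prior), or -inf if a constraint is violated."""
    if cov <= 0 or max_coeluting(retention_times, peak_capacity) > max_coelute:
        return -math.inf
    n_peptides = len(retention_times)
    return S * math.log(cov) - n_peptides / mean_n


# Hypothetical values, for illustration only.
crowded = max_coeluting([0.105, 0.115, 0.5])
merit = log_figure_of_merit(1.0, [0.105, 0.5], S=10.0, mean_n=4.0)
```

Working in log space avoids numerical underflow when the coverage is raised to a large power.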
[0058] In some embodiments the MCMC sampling of the included/excluded statuses of peptides may comprise Gibbs sampling, or Metropolis-Hastings sampling.
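As a non-authoritative sketch of Metropolis-Hastings sampling over included/excluded peptide statuses, the following example proposes flipping the status of one randomly chosen peptide per iteration and accepts the proposal with probability min(1, posterior ratio). The log-posterior used here (S * log(coverage) minus n / mean_n) is a simplified assumption that omits the co-elution constraint; all names, parameter values, and data are hypothetical.

```python
import math
import random


def metropolis_select(peptide_to_variants, n_variants, steps=5000,
                      S=20.0, mean_n=2.0, seed=0):
    """Sample target lists; return the best list visited and its log-posterior."""
    rng = random.Random(seed)
    names = sorted(peptide_to_variants)

    def log_post(included):
        covered = set().union(*(peptide_to_variants[p] for p in included))
        cov = len(covered) / n_variants
        if cov == 0:
            return -math.inf
        return S * math.log(cov) - len(included) / mean_n

    state = set(names)  # start with every candidate included
    current = log_post(state)
    best, best_lp = set(state), current
    for _ in range(steps):
        proposal = state ^ {rng.choice(names)}  # flip one peptide's status
        lp = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, exp(lp - current)).
        if lp >= current or rng.random() < math.exp(lp - current):
            state, current = proposal, lp
            if current > best_lp:
                best, best_lp = set(state), current
    return best, best_lp


# Hypothetical toy data, for illustration only.
TOY = {"a": {1, 2}, "b": {2}, "c": {3, 4}, "d": {1, 3}}
best, best_lp = metropolis_select(TOY, n_variants=4)
```

Re-running with different seeds, or recording several high-scoring states rather than only the best one, would yield alternative target lists.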
[0059] In some embodiments, the objective is to uniquely identify as many proteins as possible using a minimal number of peptides. However, owing to similarity and redundancy in the list of proteins provided, it may not be possible to find peptides that can uniquely identify certain proteins. It is therefore useful to introduce a numerical measure of “degeneracy” that can be included in the figure of merit employed in the optimization process. To give some examples, “degeneracy” could be the average number of proteins identified by a peptide sequence, the variance of this quantity or some combination of these.
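The degeneracy measures suggested above (the average number of proteins identified by a peptide sequence, and the variance of this quantity) can be sketched as follows. The mapping from peptides to proteins and all names are hypothetical, and the population variance is used here as one possible choice.

```python
from statistics import mean, pvariance


def degeneracy(target_list, peptide_to_proteins):
    """Return (mean, variance) of the number of proteins matched per peptide."""
    counts = [len(peptide_to_proteins[pep]) for pep in target_list]
    if not counts:
        return 0.0, 0.0
    return mean(counts), pvariance(counts)


# Hypothetical toy data, for illustration only.
TOY_PROTEIN_MAP = {
    "pep_a": {"P1"},              # uniquely identifies P1
    "pep_b": {"P1", "P2", "P3"},  # shared by three proteins
}
avg_deg, var_deg = degeneracy(["pep_a", "pep_b"], TOY_PROTEIN_MAP)
```

A mean of 1 with zero variance would indicate that every selected peptide identifies exactly one protein.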
[0060] In line with standard Bayesian analysis, certain embodiments employ a “posterior probability distribution” which is given by the product of the “prior probability distribution” with the “likelihood” function. In the early stages of the statistical analysis, the “likelihood” is softened (a procedure often referred to as simulated annealing) to reduce the probability of the analysis becoming trapped in local optima.
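One common way to soften a likelihood, shown here purely as an assumption, is to raise it to the power 1/T for a temperature T that starts high and decreases to 1, at which point the unmodified posterior (prior times likelihood) is recovered. The geometric schedule and all names below are hypothetical.

```python
def tempered_log_posterior(log_prior, log_likelihood, temperature):
    """log(prior x likelihood^(1/T)); T = 1 gives the ordinary log-posterior."""
    return log_prior + log_likelihood / temperature


def geometric_schedule(t_start, n_steps):
    """Temperatures decaying geometrically from t_start to 1 (n_steps >= 2)."""
    return [t_start ** (1 - k / (n_steps - 1)) for k in range(n_steps)]


# Hypothetical values, for illustration only.
schedule = geometric_schedule(16.0, 3)
early = tempered_log_posterior(-1.0, -8.0, schedule[0])
final = tempered_log_posterior(-1.0, -8.0, schedule[-1])
```

At high temperature, differences in coverage matter less, so the sampler moves more freely between target lists before the likelihood is sharpened.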
[0061] Another advantage of utilizing MCMC based approaches in selection steps is their ability to provide many alternative solutions to the problem, either by re-running the analysis several times with different random seeds, or by taking several representative samples in the final stages of the analysis. This provides some flexibility in the event that a promising looking method performs less well in practice than is predicted through simulation. To give just one example, actual retention times of peptides will differ from simulated values, which could lead to the experimental capacity being exceeded.
[0062] In some embodiments, several explorations may be carried out in parallel, and the individual exploration “objects” may interact with each other at certain times during the exploration, for example, in a genetic algorithm, nested sampling or a particle swarm optimization.
[0063] The following Examples illustrate a MCMC approach utilized in the methods of the present technology during the selection steps. Prior to the selection steps, collection steps including in silico digestion were performed.
[0064] Example 1: Selection of Peptides for Differentiation of Protein Virus Variants: Analysis Using MCMC Approach. In this example, initial selection criteria were simulated to identify actual workflow peptides for the differentiation analysis of known human coronaviruses (e.g., SARS-CoV-2, SARS-CoV, MERS-CoV, CoV 229E, CoV OC43, CoV NL63, and CoV HKU1). The protein virus variant amino acid sequences were obtained from the UniRef section (100% sequence identity clusters) of the UniProt Protein Knowledge Database and processed using the collection steps illustrated in
[0065] Example 2: Selection of Peptides for Detection of all Known Protein Virus Variants. In this example, the targeted detection is the presence or absence of a disease state, specifically influenza. That is, the question is whether a protein complement of influenza A or B is detected. This method can be adopted for other disease states, such as the presence of one or more coronavirus protein variants. In the present example, reviewed (i.e., manually annotated) amino acid sequences were obtained from the UniProt Protein Knowledge Database and were analyzed and pre-processed using the same filter criteria as described in Example 1.
[0066] Two additional protein virus subsets were created to demonstrate the effect of variant inclusion on peptide sample sets, restricting the analysis to only the most frequently observed and the gene translation products that undergo mutation, requiring 100% and 95% variant coverage respectively. The top three panels (
[0067] Example 3: Selection of Peptides for Detection of all Known Protein Virus Variants. In this example, the analysis of Example 2 is extended to include circulating influenza virus proteins based on WHO vaccine development recommendations (https://www.who.int/influenza/vaccines/virus/recommendations/202002_recommendation.pdf). The proteins and amino acid sequences from predicted circulating influenza viruses were obtained from the UniRef section (100% sequence identity clusters) and the reviewed entries of the UniProt Protein Knowledge Database.
[0068] In this particular example, which demonstrates that tailoring of the pre-analysis step based on, for example, expert domain knowledge can be readily achieved, the filtering step prior to the MCMC analyses included limiting the in silico generated peptides (digestion) to a subset of peptides with a sequence length of 5 to 20 amino acids and allowing for one missed cleavage.
[0069] In silico determined physicochemical analyte properties included normalized retention time and relative ionization efficiency (predicted abundance of doubly (2+) and triply (3+) charged peptide ions of over 10% of total abundance). For the peptide selection of both circulating influenza (A and B) variants and circulating variants plus UniProt reviewed influenza (A and B) variants, 100% and 95% variant coverage were considered, representing four cases in total. For all these cases, three possible solutions were determined. For the 100% variant coverage, each possible solution (i.e., Solution 1, Solution 2, and Solution 3) included 146 or 149 sequences. For the 95% variant coverage, each possible solution included 84 or 87 sequences.
[0070]
[0071]
[0072] Additional information reflected in the graph of
[0073] The above examples illustrate the effect that the filtering criteria and the statistical analysis approaches used to derive a scoring function have on the number of selection results. This approach can be used to simulate an enormous number of laboratory experiments in order to arrive at actual workflow peptides for the desired analysis.
[0074] Example 4: Selection of Peptides for Detection of all Known Protein Virus Variants: Respiratory Syncytial Virus (RSV). As another example, solutions were determined for RSV with the proteins obtained from the UniRef section (100% sequence identity clusters) and reviewed entries of the UniProt Protein Knowledge Database. The pre-analysis filtering steps were identical to those of Example 3, and 95% and 100% variant coverage solutions were determined. Three possible solutions for both cases were determined. For the 100% variant coverage, each possible solution (i.e., Solution 1, Solution 2, and Solution 3) included 62 sequences. For the 95% variant coverage, each possible solution included 53 or 54 sequences.
[0075] Table 1 below summarizes Examples 1-4 and identifies the optional protein subsets.
TABLE 1
Ex # | Case | Virus | Knowledge db | Conditions | # amino acids | Excluded amino acid containing peptides | Missed cleavages | Protein† subset (optional)
1 | differentiation | SARS-CoV-2; SARS-CoV; MERS-CoV; CoV 229E; CoV OC43; CoV NL63; CoV HKU1 | UniProt (reviewed) | Peak capacity = 50; Max co-eluting peptides = 10; MRM transitions/peptide = 2 | 5-15 | M | 0 | - (all)
2 | coverage | Influenza A; Influenza B | UniProt (reviewed) | Peak capacity = 50; Max co-eluting peptides = 10; MRM transitions/peptide = 2 | 5-15 | M | 0 | - (all)
3 | coverage | Influenza A; Influenza B | UniRef 100% sequence identity | Peak capacity = 50; Max co-eluting peptides = 10; MRM transitions/peptide = 2 | 5-15 | M | 0 | NP; HA; NA; NS; M; PB1; PB2; NB
4 | coverage | Respiratory Syncytial Virus (RSV) | UniRef 100% sequence identity | Peak capacity = 50; Max co-eluting peptides = 10; MRM transitions/peptide = 2 | 5-15 | M | 0 | - (all)
† Abbreviations (nucleocapsid (N) and nucleoprotein (NP) are used interchangeably in the literature and protein knowledge databases). Influenza A/B: haemagglutinin (HA); matrix protein (M); nucleoprotein (NP); non-structural protein 1 (NS); matrix protein 1 and matrix protein 2 (M); RNA-directed RNA polymerase catalytic subunit (PB1); polymerase basic protein 2 (PB2); glycoprotein NB (NB).
[0076] In additional embodiments, further separation or filtering steps may be employed in the analysis, including, but not limited to ion mobility separation. These separations may be modelled as part of an in silico experimental design workflow. In the case of ion mobility separation, arrival times of peptides or peptide fragments at a particular point in the instrument (for example a mass filter) may be determined using calibration information and/or values from previous experiments or literature.