QUANTITATIVE PROTEIN ANALYSIS

Abstract

The disclosure relates to quantitative analysis of proteins in different species, including plant species. Disclosed are methods that utilize conserved peptides across species to be used as isotope labeled internal standards, which are then used for absolute quantification of proteins. For example, a method for quantitative protein analysis of two or more species is disclosed, the method including determining a set of common peptides that are common for the two or more species, creating a set of isotope-labeled peptides out of the set of common peptides, adding a predefined amount of the labeled peptides to a sample from one of the two or more species, performing mass spectrometry to create first intensity values for a group of peptides from the sample and second intensity values for the labeled peptides, and calculating a quantitative amount of the group of peptides based on the first intensity values and the second intensity values.

Claims

1. A method for quantitative protein analysis of two or more plant species, the method comprising: determining a set of common peptides that are common for the two or more plant species; creating a set of isotope labeled peptides out of the set of common peptides; adding a predefined amount of one or more labeled peptides from the set of isotope labeled peptides to a sample from one of the two or more plant species; performing mass spectrometry to create first intensity values for a group of peptides from the sample and second intensity values for the one or more labeled peptides; and calculating a quantitative amount of the group of peptides based on the first intensity values and the second intensity values.

2. The method of claim 1, wherein determining the common peptides is based on taxonomy comprising the two or more plant species.

3. The method of claim 2, wherein the taxonomy represents evolutionary relationships.

4. The method of claim 1, wherein determining the set of common peptides comprises: determining, using at least one computer, digital data indicative of multiple species-specific sets of peptides based on digital sequence data from each species of the two or more plant species, and determining peptides that are common for the multiple sets of species-specific peptides, wherein the at least one computer comprises at least one processor, and wherein the at least one processor is operatively connected to at least one non-transitory, computer readable medium having computer-executable instructions stored thereon.

5. The method of claim 1, wherein: determining the set of common peptides is based on mass spectrometry data, the mass spectrometry data being indicative of multiple species-specific sets of peptides; and the method further comprises determining peptides that are common for the multiple sets of species-specific peptides.

6. The method of claim 4, wherein the multiple sets of species-specific peptides comprise species-specific sets determined based on the digital sequence data.

7. The method of claim 5, wherein the multiple sets of species-specific peptides comprise species-specific sets determined based on the mass spectrometry data.

8. The method of claim 1, wherein the method is used for quantifying a protein complex.

9. The method of claim 8, wherein the protein complex is the same complex in the two or more species.

10. The method of claim 1, wherein the adding the predefined amount of the one or more labeled peptides further comprises adding the predefined amount of the one or more labeled peptides to a sample from a species in a group for which the set of common peptides was determined.

11. A kit for quantitative protein analysis of two or more plant species, the kit comprising: two or more labeled peptides corresponding to peptides that are common between two or more plant species.

12. The kit of claim 11, wherein the peptides common to the two or more plant species are selected from a set of common peptides.

13. The kit of claim 11, wherein the peptides common to the two or more plant species are selected using a computational approach, a hybrid approach, and/or an empirical approach.

14. The kit of claim 11, wherein the two or more labeled peptides are selected from the group consisting of: SEQ ID NO. 54 through SEQ ID NO. 153, and combinations thereof.

15. The kit of claim 11, wherein the two or more plant species are two or more species of Rosids, and wherein the two or more labeled peptides are selected from the group consisting of: SEQ ID NO. 54 through SEQ ID NO. 453, and combinations thereof.

16. The kit of claim 11, further comprising two or more groups of labeled peptides corresponding to the peptides that are common between the two or more species, wherein the two or more groups are in a hierarchical relationship in relation to a taxonomy of species.

17. A method for quantitative protein analysis, the method comprising: receiving, by at least one processor, mass spectrometry data comprising measurements with intensity values and corresponding mass-to-charge values; based on the mass-to-charge values, identifying, by the at least one processor: a first set of measurements that relate to labeled peptides from a set of common peptides that are common for two or more plant species; and a second set of measurements that relate to sample peptides from the set of common peptides; and calculating, by the at least one processor, a quantitative amount of the sample peptides based on the intensity values of the first set of measurements and the intensity values of the second set of measurements.

18. The method of claim 17, further comprising determining, by the at least one processor, the set of common peptides that are common for the two or more plant species.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0046] The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate exemplary embodiments and, together with the description, further serve to enable a person skilled in the pertinent art to make and use these embodiments and others that will be apparent to those skilled in the art.

[0047] FIG. 1 illustrates mass spectrometry of protein samples, according to an embodiment of the disclosure.

[0048] FIG. 2 illustrates a computer system for performing quantitative protein analysis, according to an embodiment of the present disclosure.

[0049] FIG. 3 illustrates a method for quantitative protein analysis, according to an embodiment of the present disclosure.

[0050] FIG. 4 illustrates a taxonomy tree of bacteria, where the numbers indicate how many peptides are conserved among the tested species contained within the corresponding classification.

[0051] FIG. 5 illustrates a taxonomy tree of plants.

[0052] FIG. 6 illustrates the process of photosynthesis including the major complexes.

[0053] FIG. 7 illustrates molar ratios of 14 species' protein complexes, according to an embodiment of the present disclosure.

[0054] FIG. 8 illustrates ratios from the 14 species, but the ratios are relative to Rubisco and the proteins are related to the light-independent reactions of photosynthesis, according to an embodiment of the present disclosure.

[0055] FIGS. 9A-9B illustrate alignment of peptides of 10 different species against Arabidopsis as a reference sequence, according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

[0056] The present invention is more fully described below with reference to the accompanying figures. The following description is exemplary in that several embodiments are described (e.g., by use of the terms “preferably,” “for example,” or “in one embodiment”); however, such should not be viewed as limiting or as setting forth the only embodiments of the present invention, as the invention encompasses other embodiments not specifically recited in this description, including alternatives, modifications, and equivalents within the spirit and scope of the invention. Further, the use of the terms “invention,” “present invention,” “embodiment,” and similar terms throughout the description are used broadly and not intended to mean that the invention requires, or is limited to, any particular aspect being described or that such description is the only manner in which the invention may be made or used. Additionally, the invention may be described in the context of specific applications; however, the invention may be used in a variety of applications not specifically described.

[0057] The embodiment(s) described, and references in the specification to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment(s) described may include a particular feature, structure, or characteristic. Such phrases are not necessarily referring to the same embodiment. When a particular feature, structure, or characteristic is described in connection with an embodiment, persons skilled in the art may effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

[0058] In the several figures, like reference numerals may be used for like elements having like functions even in different drawings. The embodiments described, and their detailed construction and elements, are merely provided to assist in a comprehensive understanding of the invention. Thus, it is apparent that the present invention can be carried out in a variety of ways, and does not require any of the specific features described herein. Also, well-known functions or constructions are not described in detail since they would obscure the invention with unnecessary detail. Any signal arrows in the drawings/figures should be considered only as exemplary, and not limiting, unless otherwise specifically noted. Further, the description is not to be taken in a limiting sense, but is made merely for the purpose of illustrating the general principles of the invention, since the scope of the invention is best defined by the appended claims.

[0059] It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. Purely as a non-limiting example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. As used herein, the singular forms “a”, “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be noted that, in some alternative implementations, the functions and/or acts noted may occur out of the order as represented in at least one of the several figures. Purely as a non-limiting example, two figures shown in succession may in fact be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality and/or acts described or depicted.

[0060] Ranges are used herein as shorthand, so as to avoid having to list and describe each and every value within the range. Any appropriate value within the range can be selected, where appropriate, as the upper value, lower value, or the terminus of the range.

[0061] Unless indicated to the contrary, numerical parameters set forth herein are approximations that can vary depending upon the desired properties sought to be obtained. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of any claims, each numerical parameter should be construed in light of the number of significant digits and ordinary rounding approaches.

[0062] The words “comprise”, “comprises”, and “comprising” are to be interpreted inclusively rather than exclusively. Likewise the terms “include”, “including” and “or” should all be construed to be inclusive, unless such a construction is clearly prohibited from the context. The terms “comprising” or “including” are intended to include embodiments encompassed by the terms “consisting essentially of” and “consisting of”. Similarly, the term “consisting essentially of” is intended to include embodiments encompassed by the term “consisting of”. Although having distinct meanings, the terms “comprising”, “having”, “containing” and “consisting of” may be replaced with one another throughout the description of the invention.

[0063] Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment.

[0064] Terms such as, among others, “about,” “approximately,” “approaching,” or “substantially,” mean within an acceptable error for a particular value or numeric indication as determined by one of ordinary skill in the art, which depends in part on how the value is measured or determined. The aforementioned terms, when used with reference to a particular non-zero value or numeric indication, are intended to mean plus or minus 10% of that referenced numeric indication. As an example, the term “about 4” would include a range of 3.6 to 4.4. All numbers expressing dimensions, velocity, and so forth used in the specification are to be understood as being modified in all instances by the term “about.” Accordingly, unless indicated to the contrary, the numerical parameters set forth herein are approximations that can vary depending upon the desired properties sought to be obtained. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of any claims, each numerical parameter should be construed in light of the number of significant digits and ordinary rounding approaches.

[0065] “Typically” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where said event or circumstance occurs and instances where it does not.

[0066] Wherever the phrase “for example,” “such as,” “including” and the like are used herein, the phrase “and without limitation” is understood to follow unless explicitly stated otherwise.

[0067] In general, the word “instructions,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software units, possibly having entry and exit points, written in a programming language, such as, but not limited to, Python, R, Rust, Go, SWIFT, Objective C, Java, JavaScript, Lua, C, C++, or C#. A software unit may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, but not limited to, Python, R, Ruby, JavaScript, or Perl. It will be appreciated that software units may be callable from other units or from themselves, and/or may be invoked in response to detected events or interrupts. Software units configured for execution on computing devices by their hardware processor(s) may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution). Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors. Generally, the instructions described herein refer to logical modules that may be combined with other modules or divided into sub-modules despite their physical organization or storage. 
As used herein, the term “computer” is used in accordance with the full breadth of the term as understood by persons of ordinary skill in the art and includes, without limitation, desktop computers, laptop computers, tablets, servers, mainframe computers, smartphones, handheld computing devices, and the like.

[0068] In this disclosure, references are made to users performing certain steps or carrying out certain actions with their client computing devices/platforms. In general, such users and their computing devices are conceptually interchangeable. Therefore, it is to be understood that where an action is shown or described as being performed by a user, in various implementations and/or circumstances the action may be performed entirely by the user's computing device or by the user, using their computing device to a greater or lesser extent (e.g. a user may type out a response or input an action, or may choose from preselected responses or actions generated by the computing device). Similarly, where an action is shown or described as being carried out by a computing device, the action may be performed autonomously by that computing device or with more or less user input, in various circumstances and implementations.

[0069] In this disclosure, various implementations of a computer system architecture are possible, including, for instance, thin client (computing device for display and data entry) with fat server (cloud for app software, processing, and database), fat client (app software, processing, and display) with thin server (database), edge-fog-cloud computing, and other possible architectural implementations known in the art.

[0070] Generally, embodiments of the present disclosure provide a method for quantitative protein analysis. As set out above herein, the peak in the m/z intensity depends not only on the abundance of a protein, but also on the protein (peptide) structure and other factors. Therefore, it is inaccurate to infer quantities from relative peak values. For example, if a first fragment has a peak at twice the intensity of a second fragment, it is not accurate to conclude that the corresponding first protein is twice as abundant as the second protein.

[0071] However, it is possible to label chemically synthesized peptides with isotopes or synthesize proteins that have labeled peptides. This way, the labeled synthesized peptide and the unlabeled natural peptide go through the same MS process and if they were equally abundant in the sample, they would show roughly equal intensity in their m/z peaks. It is noted that the peaks for the fragments of the labeled peptides are different from the unlabeled peptides due to the different mass of the isotopes. More information can be found in U.S. Pat. No. 7,501,286 entitled “ABSOLUTE QUANTIFICATION OF PROTEINS AND MODIFIED FORMS THEREOF BY MULTISTAGE MASS SPECTROMETRY,” which is incorporated herein by reference.

[0072] More particularly, the process of protein quantification comprises identifying a set of peptides that are to be analyzed quantitatively, combining the peptides to form a protein, synthesizing DNA to express that protein, and providing the DNA to an organism (such as a bacterium) to express that protein while providing labeled precursor molecules to the organism. Alternatively, the individual isotope labeled peptides are chemically synthesized. The labeled protein or peptides can then be added to the sample at a set amount (i.e., known abundance). The peaks of the natural peptides can then be “normalized” using the peaks of the labeled peptides. In other words, the quantitative abundance of the natural peptides can be calculated using the relative intensities between the peaks of the natural peptides and the peaks of the labeled peptides. Therefore, for example, if the amount of labeled peptide in the sample is 1 μmol/l and the peak of the natural peptide is ten times the peak of the labeled peptide, the abundance of the natural peptide is 10 μmol/l. More information on this process can be found in Julie M. Pratt, Deborah M. Simpson, Mary K. Doherty, Jenny Rivers, Simon J Gaskell, and Robert J Beynon: “Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes,” Nature Protocols, Vol. 1 No. 2, 2006, which is incorporated herein by reference.
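Purely as a non-limiting illustration, the normalization arithmetic described above may be sketched as follows. The function name and parameter names are hypothetical and are chosen only for this example; the sketch assumes that the labeled and natural peptides respond equally in the mass spectrometer, as described above.

```python
def absolute_abundance(natural_intensity, labeled_intensity, labeled_amount):
    """Scale the known labeled amount by the natural-to-labeled intensity ratio."""
    if labeled_intensity <= 0:
        raise ValueError("labeled peptide intensity must be positive")
    return labeled_amount * (natural_intensity / labeled_intensity)


# Worked example from the text: 1 umol/l of labeled peptide is added and the
# natural peak is ten times the labeled peak.
amount = absolute_abundance(
    natural_intensity=10.0,  # arbitrary intensity units
    labeled_intensity=1.0,   # same arbitrary units
    labeled_amount=1.0,      # umol/l
)
print(amount)  # abundance of the natural peptide in umol/l
```

The result carries the same unit as the labeled amount, since the intensity ratio is dimensionless.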

[0073] While the above process using QconCAT synthetic proteins comprised of concatenated peptides can provide quantitative abundances, it is difficult to use for quantitative proteomics across different species because protein sequences differ across species and manufacturing the labeled peptides is burdensome and inefficient as a high number of labeled peptides is required. Of course, this also increases costs to a level where quantitative protein analysis across multiple protein targets, multiple species, and experiments is practically unviable. More particularly, analyzing samples from different species may require a different set of labeled peptides and therefore re-starting the process from the beginning. This problem is less relevant, although still problematic, for humans and other mammals since they share a relatively high percentage sequence identity across conserved proteins. In other groups of organisms, however, the species are vastly different, and therefore a set of peptides that works for one species is unlikely to yield useful results for a different species.

[0074] Embodiments of the disclosure provide a method for standardized quantitative analysis across different species. In particular, one or more embodiments provide a method to determine a set of peptides that can be used for quantitative protein analysis of all species of a selected group of species. This way, the set of labeled proteins only needs to be constructed once and can then be manufactured in a large amount, which reduces costs and complexity.

[0075] The species may be plant species. For example, a producer of grain seeds wants to achieve genetic gain through selection based on quantitative proteomic phenotyping. That producer may produce rice, barley and wheat. Instead of constructing one set of labeled peptides for each of these species, the producer can now use a single set of peptides that leads to useful quantitative data on all of those species.

[0076] In other examples, the species are prokaryotes, protoctista, fungi, plants, and animals. When reference is made to “different species” herein, the species may be from the same kingdom or from different kingdoms. For example, the methods disclosed herein may be used for quantitative protein analysis of fungi and plants, or for quantitative protein analysis of only plants. Thus, in one example, the species may be prokaryotes. In another example, the species may be eukaryotes.

[0077] Peptide Selection

[0078] In order to construct labeled proteins that are usable for different species, methods disclosed herein may comprise a step of finding peptides that are common to the species of interest.

[0079] For example, a universal set of peptides may be constructed by finding peptides that are common across species from all existing plant divisions, such as Marchantiophyta (liverworts), Anthocerotophyta (hornworts), Bryophyta (mosses), Filicophyta (ferns), Sphenophyta (horsetails), Cycadophyta (cycads), Ginkgophyta (ginkgos), Pinophyta (conifers), Gnetophyta (gnetophytes), and the Magnoliophyta (Angiosperms, flowering plants). In other examples, the peptides are selected such that they are common across all groups of flowering plants (angiosperms).

[0080] In one example, the method comprises accessing a tree-structured taxonomy of plants, where each plant is represented by a node and connected to other nodes via common nodes (which may be ancestors in the tree), so that connected plant nodes form a clade (a group of organisms believed to comprise all the evolutionary descendants of a common ancestor). The method then comprises receiving a selection of species of interest and then determining, based on the tree-structured taxonomy, the common node in the tree. This common node may be a common ancestor or an estimated common ancestor. From there, the method may sample representative species from the sub-trees below that ancestor. This may involve random sampling of species below the single common ancestor or identifying the most relevant sub-trees in the taxonomy and choosing representative species of those sub-trees.
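Purely as a non-limiting illustration, the determination of the common node and the sampling of representative species may be sketched as follows. The toy taxonomy, its node names, and all function names are hypothetical and greatly simplified; an actual taxonomy would be loaded from a suitable data source.

```python
import random

# Hypothetical toy taxonomy stored as child -> parent links (illustrative only).
PARENT = {
    "rice": "grasses", "wheat": "grasses", "barley": "grasses",
    "tomato": "eudicots", "arabidopsis": "eudicots",
    "grasses": "angiosperms", "eudicots": "angiosperms",
    "angiosperms": None,
}


def path_to_root(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def common_node(species):
    """Return the deepest node that is an ancestor of every selected species."""
    paths = [path_to_root(s) for s in species]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking from leaf to root, the first shared node is the deepest one.
    return next(n for n in paths[0] if n in shared)


def leaves_below(node):
    """Return all species (leaf nodes) in the sub-tree rooted at node."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return [node]
    return [leaf for c in children for leaf in leaves_below(c)]


def sample_representatives(node, rng):
    """Pick one random species from each sub-tree directly below node."""
    children = [c for c, p in PARENT.items() if p == node]
    return {c: rng.choice(leaves_below(c)) for c in children}


ancestor = common_node(["rice", "tomato"])
representatives = sample_representatives(ancestor, random.Random(0))
print(ancestor, representatives)
```

Random sampling is only one of the options mentioned above; a selection of the most relevant sub-trees could replace `sample_representatives` without changing the rest of the sketch.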

[0081] For each species, its comprehensive set of peptides is determined theoretically based on sequence data, empirically, or a combination of the two. There may be various different ways for determining a set of peptides for each species as set out in more detail below. For example, in cases where genome sequencing data is available for the species, it is possible to determine the peptides computationally from the genome by determining which proteins can be expressed from that genome and then determine which peptides are in those proteins according to cleavage characteristics of a selected protease such as trypsin. The genome may be retrieved from public databases or sequenced specifically for this purpose. In another example, the peptides are determined by mass spectrometry of the actual organisms. Therefore, once the species have been selected, biological samples of those species can be obtained and a set of peptides identified through mass spectrometry for each species.

[0082] In another example, an individual species may have a protein existing as different isoforms (due to alternative splicing, for example). In further examples, a group of species may have one or more common proteins that exist as homologs. As a result, the proteins have some different peptides and not all peptides are common across the group of species despite the common protein molecular function. For this reason, one or more embodiments of the disclosed method determine the set of peptides for a group of species.

[0083] Then, the method determines an intersection of the sets of peptides of the selected group of species. The intersection then contains the common peptides that can be used for labelling and quantitative protein analysis of the originally provided group of species.

[0084] For example, consider two different plant species I and II (for example, a fern and a tomato). Both species have an example protein, but as different homologs of this protein. The homologs are functionally equivalent, but their sequences differ (except for the conserved parts). Species I has protein homolog A and species II has protein homolog B and it is desired to perform a quantitative protein analysis. In this example, homolog A has peptides abc and homolog B has peptides bef, so peptide b is in common, which means peptide b is evolutionarily conserved.

[0085] In other words, Species I has homolog A, which has peptides abc, while Species II has homolog B, which has peptides bef.

[0086] Then, the labeled peptides could be bhi. This would provide quantitative protein analysis because peptide b is in common and because of the 1:1:1 ratio of protein to peptide it is possible to quantify A as well as B (in the different samples). Also, if the protein exists in a protein complex of known and conserved stoichiometry, then the amounts of the complex and the additional proteins in the complex can be calculated.
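Purely as a non-limiting illustration, the Species I and Species II example above may be sketched as a set intersection, followed by the stoichiometry calculation for a protein complex. The subunit names, the 1:2 stoichiometry, and the measured amount are hypothetical values chosen only for this example.

```python
# Peptide sets from the example: homolog A in Species I, homolog B in Species II.
peptides_by_species = {
    "species_I": {"a", "b", "c"},
    "species_II": {"b", "e", "f"},
}

# The common peptides are the intersection of all species-specific sets.
common = set.intersection(*peptides_by_species.values())
print(common)

# Hypothetical complex with a known, conserved stoichiometry: one copy of
# subunit_X (quantified via peptide b) per two copies of subunit_Y.
stoichiometry = {"subunit_X": 1, "subunit_Y": 2}
measured_subunit_x = 10.0  # umol/l, hypothetical value obtained via peptide b

complex_amount = measured_subunit_x / stoichiometry["subunit_X"]
subunit_amounts = {
    name: complex_amount * copies for name, copies in stoichiometry.items()
}
print(complex_amount, subunit_amounts)
```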

[0087] Once the set of common peptides has been found, it is possible to perform the previously described method of creating QconCAT genes, expressing them as a labeled protein, and adding that protein at known amounts to samples from the species of interest. Alternatively, the set of common peptides could be chemically synthesized with isotope labeled amino acids.

[0088] Computational Approach

[0089] As mentioned above, there are different ways to determine the set of common peptides. First, there is a computational approach where the set of peptides is determined from digital data sources. More particularly, a digital representation of the genome of different plant species can be obtained and a computer system loads this representation into storage, such as random access memory (RAM) or a hard disk drive (HDD).

[0090] The computer system starts with the first genome and scans the first genome to identify data patterns where trypsin would, if applied chemically, split a protein produced by the genome. More specifically, the computer system processes the digitally encoded DNA and replaces all occurrences of “T” (thymine) with “U” (uracil) to create a digitally encoded RNA. The computer system then translates the digitally encoded RNA into an amino acid sequence via the genetic code that converts each 3-mer of RNA (or “codon”), into one of 20 amino acids, which again are digitally encoded. The computer system then iterates over the amino acid sequence and splits the sequence after every arginine or lysine, except when that residue is followed by proline.

[0091] The parts of the amino acid sequence resulting from the splits are the digitally encoded peptide sequences (i.e., sequences of amino acids). Given that there are 20 amino acids, each amino acid can be encoded by a 5-bit variable. Alternative encodings, such as a 20-bit one-hot encoding, are also possible.
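Purely as a non-limiting illustration, the 5-bit and one-hot encodings mentioned above may be sketched as follows. The ordering of the amino acid alphabet, the function names, and the leading sentinel bit (used here so that the peptide length survives packing) are assumptions made only for this example.

```python
# The 20 standard amino acids as one-letter codes; the ordering is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}  # values 0..19 fit in 5 bits


def pack(peptide):
    """Pack a peptide into one integer, using 5 bits per amino acid."""
    value = 1  # sentinel bit, so leading residues with code 0 are not lost
    for aa in peptide:
        value = (value << 5) | CODE[aa]
    return value


def unpack(value):
    """Recover the peptide string from an integer produced by pack()."""
    residues = []
    while value > 1:
        residues.append(AMINO_ACIDS[value & 0b11111])
        value >>= 5
    return "".join(reversed(residues))


def one_hot(aa):
    """Alternative 20-bit encoding with exactly one bit set per amino acid."""
    return 1 << CODE[aa]


packed = pack("MAK")
print(packed, unpack(packed), format(one_hot("C"), "020b"))
```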

[0092] In at least one embodiment, available tools such as “translate” from the Swiss Bioinformatics Resource Portal (available at the expasy.org website) may also be used. While the above example relates to DNA as a starting point, other forms of digital sequence data, such as RNA, may be used as a starting point for the calculation of lists of proteins.
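Purely as a non-limiting illustration, the transcription, translation, and in-silico trypsin cleavage steps described above may be sketched as follows. The function names and the short example sequence are hypothetical. The sketch assumes the standard genetic code and reads a single open reading frame from the first base up to the first stop codon; identifying genes and reading frames within a whole genome is outside its scope.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Replace every T (thymine) with U (uracil)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


def trypsin_digest(protein):
    """Split after K (lysine) or R (arginine), except when followed by P (proline)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


dna = "ATGGCTAAAGGTCGTCCTCTGAAGGTTTAA"  # hypothetical sequence
protein = translate(transcribe(dna))
peptides = trypsin_digest(protein)
print(protein, peptides)
```

In this example the arginine is followed by proline, so no split occurs at that position.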

[0093] In at least one embodiment, the computer system stores the resulting list of peptides and repeats the process for the second genome and all further genomes of further species under consideration. This produces multiple lists of peptides, with one list for each species. The computer system now processes the lists to find common elements. For example, the lists may be sorted, such as by converting the binary encoding of the amino acids into decimal numbers. Alternatively, the lists may be ordered by first amino acid, then by second amino acid, and so on, similarly to how decimal numbers would be ordered sequentially by digits. The ordering speeds up the search for common peptides because it is not necessary to iterate over the entire list.
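Purely as a non-limiting illustration, one way of exploiting the ordering is a merge-style scan over two sorted lists, sketched below. The function name and the example peptide lists are hypothetical, and the two-pointer scan is only one possible technique; a binary search per peptide would be another.

```python
def sorted_intersection(list_a, list_b):
    """Find common peptides by advancing two pointers over sorted lists."""
    a, b = sorted(set(list_a)), sorted(set(list_b))
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] cannot occur later in b, because b is sorted
        else:
            j += 1  # b[j] cannot occur later in a, because a is sorted
    return common


species_1 = ["MAK", "GRPLK", "V"]  # hypothetical peptide lists
species_2 = ["V", "GRPLK", "TTK"]
shared = sorted_intersection(species_1, species_2)
print(shared)
```

Each list is traversed at most once after sorting, so no peptide needs to be compared against every entry of the other list.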

[0094] In yet another example, the peptides may be stored in a database, such that each entry of a peptide in one of the lists has one entry in a database table. The computer system can then execute a query for common peptides, such as using a JOIN operation to find common peptides or an AND connection, such as “peptide_1 is in List_1 AND is in List_2”. The advantage is that database systems, such as SQL databases, have sophisticated mechanisms to optimize this search. In yet another example, Microsoft Excel can be used with the COUNTIF function to find common peptides.
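Purely as a non-limiting illustration, the database approach may be sketched with an in-memory SQLite database accessed through the Python standard library. The table name, column names, and example rows are hypothetical. The first query uses a JOIN for two species; the second query is an assumed generalization that keeps peptides present in every species in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peptides (species TEXT, sequence TEXT)")
rows = [
    ("species_I", "MAK"), ("species_I", "GRPLK"),
    ("species_II", "GRPLK"), ("species_II", "TTK"),
]
conn.executemany("INSERT INTO peptides VALUES (?, ?)", rows)

# Self-join: peptides that occur in both species_I and species_II.
join_query = """
    SELECT DISTINCT a.sequence
    FROM peptides AS a
    JOIN peptides AS b ON a.sequence = b.sequence
    WHERE a.species = 'species_I' AND b.species = 'species_II'
    ORDER BY a.sequence
"""
common_pair = [row[0] for row in conn.execute(join_query)]

# Generalization: peptides that occur in every species present in the table.
all_query = """
    SELECT sequence
    FROM peptides
    GROUP BY sequence
    HAVING COUNT(DISTINCT species) = (SELECT COUNT(DISTINCT species) FROM peptides)
    ORDER BY sequence
"""
common_all = [row[0] for row in conn.execute(all_query)]
print(common_pair, common_all)
```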

[0095] The result of these processing methods is a list of peptides that are common for the two or more species under consideration. The advantage of this computational approach is that it requires no empirical steps, such as actual mass spectrometry data of biological samples. A potential disadvantage is that some identified peptides may be difficult to detect due to low expression levels in most species or other chemical behavior during mass spectrometry.

[0096] Empirical Approach

[0097] Aside from the computational method described above, it is possible to perform mass-spectrometry of samples from a reference species or group of species under consideration. This will yield a list of peptides per species and those lists can then be processed to identify common peptides as described above. It will be understood by those skilled in the art that any suitable mass-spectrometric instrument or mass-spectrometric data acquisition method may be used to identify common peptides. For example, SWATH analysis or other data independent methods may be used. In the case of data independent methods, peptide fragment data can be compared to a reference ion library created from a reference species.

[0098] In at least one embodiment, the reference ion library is created from data dependent acquisition analysis, and subsequent peptide-spectrum matching uses probabilistic scoring of a reference species for which comprehensive genome sequence data are available. Data independent acquisition is then used for additional species that may or may not have available genome sequence data. Comparisons of the data independent data from multiple species versus the reference ion library are scored probabilistically and identifications of conserved peptides are accepted or rejected based on a probability score such as false discovery rate. Similarly, data dependent acquisition mass spectrometry methods may be used.

[0099] In data dependent methods, the fragment ion spectra are either compared to a reference ion library as above or compared to peptide sequence data using peptide spectrum matching software that assigns peptide identifications to spectra. Those resulting peptide identifications can then be searched for conserved peptides across the multiple representative species of the taxonomic group of interest.

[0100] While this empirical approach only detects peptides that are observable, it requires mass spectrometry of samples and therefore may be cumbersome and expensive, especially where a large number of species, such as ten species, are considered for common peptides. The empirical approach does not require whole genome sequence data from more than one species; it only requires whole genome sequence data from the species that serves as the reference species. For example, Arabidopsis thaliana was the reference species in the empirical approach that identified the conserved peptides from vascular plants in Table 4. Data dependent A. thaliana peptide data were used with its full theoretical proteome, derived from its full genome sequence, to create an ion library. Then data independent data from peptides of an additional 11 species of vascular plants were compared to the A. thaliana ion library.

[0101] Hybrid Approach

[0102] While the above sections describe a computational approach and an empirical approach, it is noted that not all representative species need to be processed by the same approach but a combination is possible. For example, one of the species may be analyzed empirically, which may even involve the use of a public database to obtain mass spectrometry data including a list of observed peptides from that one species. The other species can be analyzed using the computational approach. Since unobservable peptides are not included in the first list of peptides from the first species, they are automatically “filtered” from the computationally determined lists. This is so because all peptides in the final list of common peptides need to be in all of the lists, including the first that only contains observable peptides.

[0103] Computer Systems and Computer-Implemented Methods

[0104] Turning now to FIG. 2, a computer system 200 for quantitative protein analysis is shown. Computer system 200 comprises a processor 201 connected to non-transitory (e.g., non-volatile) program memory 202 and data memory 203 (such as RAM or hard disk). Stored on program memory 202 is software code that, when executed by processor 201, causes processor 201 to execute the methods disclosed herein. In particular, processor 201 receives mass-spectrometry data from a mass spectrometer 204 and calculates quantities of proteins by performing, e.g., the steps of method 300 in FIG. 3. Processor 201 is also connected to database 205, which may store lists of peptides for two or more species or a list of common peptides across two or more species.

[0105] FIG. 3 illustrates a computer-implemented method 300 for quantitative protein analysis of two or more species as performed by processor 201. First, processor 201 receives 301 mass spectrometry data. This data comprises measurements with intensity values and corresponding mass-to-charge values. The data may be provided in the form of a text file stored on data memory 203 or provided differently, such as through distributed data storage systems, e.g. Apache's Hadoop.

[0106] Based on the mass-to-charge values, processor 201 identifies 302 first measurements that relate to labeled peptides from a set of common peptides that are common for the two or more plant species. Processor 201 then identifies 303 second measurements that relate to sample peptides from the set of common peptides. These second measurements are for un-labeled peptides, which are naturally occurring in the sample and to be measured quantitatively. Finally, processor 201 calculates 304 a quantitative amount of the sample peptides based on the intensity values of the first measurements and the intensity values of the second measurements.

[0107] Calculating the quantitative amount in step 304 may be based on a known amount of labeled peptides that was added to the sample. This known amount may have been entered by the user through a user interface. In another example, the known amount is provided electronically by a dosing machine that automatically adds a pre-set amount of labeled peptides to the sample.

[0108] The quantitative amount may be relative to the added amount. For example, the processor 201 may calculate that the amount of unlabeled peptides is 10 times higher than the amount of labeled peptides. Processor 201 may output this result as a quantitative amount or may multiply the result by the known amount of added peptide to provide an absolute amount.
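The relative and absolute calculations of step 304 can be sketched as follows. The sketch assumes that the labeled and unlabeled forms of a peptide respond equally in the mass spectrometer, so that the intensity ratio equals the amount ratio; the function name `quantify`, its parameters, and the femtomole unit are hypothetical and used only for illustration.

```python
def quantify(sample_intensity, labeled_intensity, spiked_amount=None):
    """Estimate the amount of an unlabeled peptide from an intensity ratio.

    Returns the ratio of unlabeled to labeled peptide when spiked_amount is
    None, otherwise the absolute amount in the units of spiked_amount.
    """
    if labeled_intensity <= 0:
        raise ValueError("labeled peptide intensity must be positive")
    ratio = sample_intensity / labeled_intensity
    if spiked_amount is None:
        return ratio  # relative amount
    return ratio * spiked_amount  # absolute amount


# Hypothetical values: the sample peptide is 10x as intense as the standard,
# and 50 fmol of labeled peptide was added to the sample.
print(quantify(5.0e6, 5.0e5))        # 10.0
print(quantify(5.0e6, 5.0e5, 50.0))  # 500.0
```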

[0109] Importantly, processor 201 can repeat the receiving and identification steps for a different species but using the same set of common peptides, which is also referred to herein as a “kit of labeled peptides.” As a result, the peptides of the second species can be quantitatively analyzed without the need to provide a different kit of labeled peptides. This makes the kit of peptides applicable for a wide range of species.

[0110] Even further, processor 201 can repeat the receiving and identification steps for a species that was not used for determining the common peptides. This can be done where a related species was used for determining the common peptides. In other words, there is a set of “training species” and processor 201 determines the set of common peptides for the training species as described above with reference to the computational, empirical and hybrid approaches. Processor 201 can then perform method 300 for one or more “test species” using the set of common peptides determined for the training species. Importantly, the test species does not have to be in the set of training species.

[0111] However, in examples described herein, the test species is within a space of species that is spanned by the training species in relation to a taxonomy of species, which may be an evolutionary relationship. In other words, the test species has a common ancestor in the taxonomy that is in the set of training species. In that sense, the kit of labeled peptides can be used for quantitative protein analysis of all species that have a common ancestor in the set of training species for which the kit was created.
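One possible reading of the relationship described above is that a test species is covered when it descends from the lowest taxon shared by all training species. The following sketch illustrates that reading only; the function names `lowest_common_ancestor` and `is_covered` and the simplified root-to-species lineages are assumptions made for illustration and are not taken from the disclosure.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-to-species lineages."""
    ancestor = None
    for level in zip(*lineages):
        if len(set(level)) == 1:
            ancestor = level[0]
        else:
            break
    return ancestor


def is_covered(test_lineage, training_lineages):
    """True if the test species descends from the training species' common ancestor."""
    ancestor = lowest_common_ancestor(training_lineages)
    return ancestor is not None and ancestor in test_lineage


# Simplified, illustrative lineages (phylum, class, order, family, genus).
training = [
    ["Firmicutes", "Bacilli", "Bacillales", "Bacillaceae", "Bacillus"],
    ["Firmicutes", "Bacilli", "Bacillales", "Staphylococcaceae", "Staphylococcus"],
]
listeria = ["Firmicutes", "Bacilli", "Bacillales", "Listeriaceae", "Listeria"]
escherichia = ["Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
               "Enterobacteriaceae", "Escherichia"]

print(lowest_common_ancestor(training))  # Bacillales
print(is_covered(listeria, training))     # True
print(is_covered(escherichia, training))  # False
```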

[0112] The following examples further illustrate one or more embodiments of the present disclosure, but should not be construed as limiting the present disclosure, which is defined by the claims.

EXAMPLES

[0113] Exemplary processes for the identification of conserved peptides and their uses in quantitative methods are set out in the Examples below.

Example 1

Computational Identification of Conserved Peptides in Bacteria

[0114] Conserved peptides were identified by theoretically digesting amino acid sequences from the bacterial genomes of 46 species of bacteria (FIG. 4). The species were selected to span the phylum Firmicutes, which is a large group of economically and medically significant bacteria.

[0115] Theoretical digestion of the FASTA amino acid sequences was carried out by using Protein Digestion Simulator with the following parameters: (a) no missed cleavages with trypsin cleavage defined as occurring at the C-terminal side of K or R residues and not at KP or RP; (b) a minimum of 7 residues; and (c) a minimum mass of 400 Da and a maximum of 6,000 Da.
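The digestion parameters listed above can be sketched in Python as follows. This is an independent illustration of the stated rules, not a reproduction of Protein Digestion Simulator; the names `tryptic_digest`, `peptide_mass`, and `RESIDUE_MASS` are hypothetical, the masses are standard monoisotopic residue masses, and peptides containing non-standard residue codes are simply skipped (an assumption of this sketch).

```python
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def tryptic_digest(protein, min_len=7, min_mass=400.0, max_mass=6000.0):
    """Fully tryptic peptides (no missed cleavages) that pass the filters."""
    # Cleave C-terminal to K or R, but not when the next residue is P.
    # Splitting on an empty match requires Python 3.7 or later.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    kept = []
    for piece in pieces:
        if len(piece) < min_len:
            continue
        if any(aa not in RESIDUE_MASS for aa in piece):
            continue  # skip ambiguous or non-standard residues
        if min_mass <= peptide_mass(piece) <= max_mass:
            kept.append(piece)
    return kept


# Hypothetical input sequence.
print(tryptic_digest("MKTAYIAKQRQISFVKSHFSRPLEERLGLIEVQ"))
# ['SHFSRPLEER', 'LGLIEVQ']
```

In the example, the R followed by P inside SHFSRPLEER is not cleaved, and the shorter fragments are discarded by the seven-residue minimum.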

[0116] The data was processed in Excel. Peptides in common among two or more species were identified using the COUNTIF function. For each pair or set of species in a comparison one was the reference—the set that was the range for the COUNTIF. Shared peptides returned COUNTIF values of 1 or more (more if the peptides occurred two or more times in the reference proteome).

[0117] The process was accelerated by first performing, for a set of species, a simple pairwise comparison between two species to create a list of peptides in common between them, which was much shorter than the list of total tryptic peptides for either species. The resulting short list then served as the reference list for additional comparisons.
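The COUNTIF-based comparison and the reuse of a shrinking reference list can be mimicked outside a spreadsheet. The sketch below uses Python's `collections.Counter` as a stand-in for COUNTIF; the function names `countif_shared` and `narrow` and the three-letter placeholder peptides are hypothetical.

```python
from collections import Counter


def countif_shared(reference, queries):
    """Map each query peptide to its count in the reference list.

    Only peptides with a count of 1 or more are returned, mirroring the
    COUNTIF criterion; counts above 1 mean the peptide occurs more than once
    in the reference.
    """
    counts = Counter(reference)
    return {p: counts[p] for p in dict.fromkeys(queries) if counts[p] >= 1}


def narrow(species_lists):
    """Pairwise comparison first, then reuse the short list as the reference."""
    shortlist = list(countif_shared(species_lists[0], species_lists[1]))
    for other in species_lists[2:]:
        shortlist = list(countif_shared(shortlist, other))
    return shortlist


# Placeholder peptide lists for three species.
a = ["AAK", "BBK", "BBK", "CCK"]
b = ["BBK", "CCK", "DDK"]
c = ["CCK", "EEK", "BBK"]
print(countif_shared(a, b))        # {'BBK': 2, 'CCK': 1}
print(sorted(narrow([a, b, c])))   # ['BBK', 'CCK']
```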

[0118] The numbers in FIG. 4 indicate how many peptides are conserved among the tested species contained within the corresponding classification. Once a set of conserved peptides was found at a level of taxonomy, for example the 492 peptides conserved in the genus Bacillus, only those peptides were used for comparisons at the next higher level of taxonomy. In the Bacillus example, that means the 492 conserved peptides were used as the reference set for the family Bacillaceae—they were compared against the peptides of the representative species of the other genera in Bacillaceae. Then, the 107 conserved peptides of the Bacillaceae were used as the reference set for finding conserved peptides among the families that make up the Order Bacillales (see FIG. 4).

Example 2

Computational Identification of Conserved Peptides in Eukaryotes

[0119] Amino acid sequences from the following Uniprot proteome entries were theoretically digested using Protein Digestion Simulator as above: Human (vertebrate animal), 75,069 sequences; Yeast—Saccharomyces cerevisiae (fungus), 6049 sequences; Nematode—Caenorhabditis elegans (invertebrate animal), 26,701 sequences; Arabidopsis thaliana (plant), 39,349 sequences; and Oomycete—Phytophthora infestans (member of a clade of oomycetes and protists distant from other eukaryotes), 17,514 sequences.

[0120] The digest outputs were processed in Excel. The yeast and Phytophthora outputs were combined into one Excel file. The organisms with the smallest proteomes were processed first.

[0121] As above, COUNTIF was used to determine whether yeast peptides were present in Phytophthora, resulting in 352 unique peptides conserved between yeast and Phytophthora.

[0122] COUNTIF was again used to identify peptides from Caenorhabditis elegans that are common to the 352 unique peptides identified between yeast and Phytophthora. A total of 141 conserved peptides were identified in yeast, Phytophthora and C. elegans.

[0123] COUNTIF was again used to identify peptides from A. thaliana that are common to the 141 unique peptides identified between yeast, Phytophthora and C. elegans. A total of 106 conserved peptides were identified in yeast, Phytophthora, C. elegans and A. thaliana.

[0124] COUNTIF was again used to identify human peptides that are common to the 106 unique peptides identified between yeast, Phytophthora, C. elegans and A. thaliana. A total of 100 conserved peptides were identified in humans, yeast, Phytophthora, C. elegans and A. thaliana. These are set out in Table 2, with example protein identifiers for yeast and Arabidopsis and example functional annotations from the MapMan annotation scheme for Arabidopsis.

TABLE 2 Conserved peptides in eukaryotes

Columns: Sequence; Yeast Uniprot name; Arabidopsis TAIR10 accession; MapMan annotation [manual annotations from TAIR protein names are in brackets when Mercator did not provide annotation]; SEQ ID NO:

LTGMAFR  sp|P00359|G3P3_YEAST  AT1G79530  Carbohydrate metabolism.plastidial glycolysis.glyceraldehyde 3-phosphate dehydrogenase  54
IGLFGGAGVGK  sp|P00830|ATPB_YEAST  AT5G08690  Cellular respiration.oxidative phosphorylation.ATP synthase complex.peripheral MF1 subcomplex.subunit beta  55
LQIWDTAGQER  sp|P01123|YPT1_YEAST  AT5G59840  Vesicle trafficking.regulation of membrane tethering and fusion.RAB-GTPase activities.E-class RAB GTPase  56
TITSSYYR  sp|P01123|YPT1_YEAST  AT4G17530  Vesicle trafficking.regulation of membrane tethering and fusion.RAB-GTPase activities.D-class RAB GTPase  57
EIQTAVR  sp|P02294|H2B2_YEAST  AT5G59910  Chromatin organisation.histones.histone (H2B)  58
DNIQGITKPAIR  sp|P02309|H4_YEAST  AT5G59690  Chromatin organisation.histones.histone (H4)  59
TLYGFGG  sp|P02309|H4_YEAST  AT5G59690  Chromatin organisation.histones.histone (H4)  60
ELISNASDALDK  sp|P02829|HSP82_YEAST  AT4G24190  Protein homeostasis.protein quality control.Hsp90 chaperone system.chaperone (Hsp90)  61
STTTGHLIYK  sp|P02994|EF1A_YEAST  AT5G60390  Protein biosynthesis.translation elongation.eEF1 aminoacyl-tRNA binding factor activity.aminoacyl-tRNA binding factor (eEF1A)  62
LPLQDVYK  sp|P02994|EF1A_YEAST  AT5G60390  Protein biosynthesis.translation elongation.eEF1 aminoacyl-tRNA binding factor activity.aminoacyl-tRNA binding factor (eEF1A)  63
IGGIGTVPVGR  sp|P02994|EF1A_YEAST  AT5G60390  Protein biosynthesis.translation elongation.eEF1 aminoacyl-tRNA binding factor activity.aminoacyl-tRNA binding factor (eEF1A)  64
QTVAVGVIK  sp|P02994|EF1A_YEAST  AT5G60390  Protein biosynthesis.translation elongation.eEF1 aminoacyl-tRNA binding factor activity.aminoacyl-tRNA binding factor (eEF1A)  65
EGLIDTAVK  sp|P04050|RPB1_YEAST  AT4G35800  RNA biosynthesis.DNA-dependent RNA polymerase (Pol) complexes.Pol II catalytic components.subunit 1  66
EGLVDTAVK  sp|P04051|RPC1_YEAST  AT5G60040  RNA biosynthesis.DNA-dependent RNA polymerase (Pol) complexes.Pol III catalytic components.subunit 1  67
EGIPPDQQR  sp|P05759|RS31_YEAST  AT5G37640  Protein homeostasis.ubiquitin-proteasome system.ubiquitin-fold protein conjugation.ubiquitin conjugation (ubiquitylation).ubiquitin-fold protein (UBQ)  68
ESTLHLVLR  sp|P05759|RS31_YEAST  AT5G37640  Protein homeostasis.ubiquitin-proteasome system.ubiquitin-fold protein conjugation.ubiquitin conjugation (ubiquitylation).ubiquitin-fold protein (UBQ)  69
VADFGLAR  sp|P06242|KIN28_YEAST  AT5G07280  Phytohormone action.signalling peptides.NCRP (non-cysteine-rich-peptide) category.TDL-peptide activity.TDL-peptide receptor (EMS1/MSP1)  70
MLDMGFEPQIR  sp|P06634|DED1_YEAST  AT5G63120  RNA processing.pre-mRNA splicing.U2-type-intron-specific major spliceosome.U1 small nuclear ribonucleoprotein particle (snRNP).pre-mRNA splicing regulator (DDX5)  71
SSALASK  sp|P07259|PYR1_YEAST  AT1G29900  Amino acid metabolism.biosynthesis.glutamate family.glutamate-derived amino acids.arginine.carbamoyl phosphate synthetase heterodimer.large subunit  72
YDLTVPFAR  sp|P07263|SYH_YEAST  AT3G02760  Protein biosynthesis.aminoacyl-tRNA synthetase activities.histidine-tRNA ligase  73
TITTAYYR  sp|P07560|SEC4_YEAST  AT5G59840  Vesicle trafficking.regulation of membrane tethering and fusion.RAB-GTPase activities.E-class RAB GTPase  74
QLWWGHR  sp|P07806|SYV_YEAST  AT5G16715  Protein biosynthesis.aminoacyl-tRNA synthetase activities.valine-tRNA ligase  75
AGVSQVLNR  sp|P08518|RPB2_YEAST  AT4G21710  RNA biosynthesis.DNA-dependent RNA polymerase (Pol) complexes.Pol II catalytic components.subunit 2  76
NTYQSAMGK  sp|P08518|RPB2_YEAST  AT4G21710  RNA biosynthesis.DNA-dependent RNA polymerase (Pol) complexes.Pol II catalytic components.subunit 2  77
LLLLGAGESGK  sp|P08539|GPA1_YEAST  AT2G26300  Multi-process regulation.G-protein signalling.heterotrimeric G-protein complex.component alpha  78
VEIIANDQGNR  sp|P09435|HSP73_YEAST  AT5G02500  Protein homeostasis.protein quality control.cytosolic Hsp70 chaperone system.chaperone (Hsp70)  79
TTPSYVAFTDTER  sp|P09435|HSP73_YEAST  AT1G16030  Protein homeostasis.protein quality control.cytosolic Hsp70 chaperone system.chaperone (Hsp70)  80
IINEPTAAAIAYGLDK  sp|P09435|HSP73_YEAST  AT5G42020  [In 11 heat shock proteins in Arabidopsis]  81
ITITNDK  sp|P09435|HSP73_YEAST  AT5G02490  Protein homeostasis.protein quality control.cytosolic Hsp70 chaperone system.chaperone (Hsp70)  82
FDLMYAK  sp|P09733|TBA1_YEAST  AT5G19770  Cytoskeleton organisation.microtubular network.alpha-beta-Tubulin heterodimer.component alpha-Tubulin  83
GGMQIFVK  sp|P0CG63|UBI4P_YEAST  AT5G37640  Protein homeostasis.ubiquitin-proteasome system.ubiquitin-fold protein conjugation.ubiquitin conjugation (ubiquitylation).ubiquitin-fold protein (UBQ)  84
NTTIPTK  sp|P0CS90|HSP77_YEAST  AT5G02490  Protein homeostasis.protein quality control.cytosolic Hsp70 chaperone system.chaperone (Hsp70)  85
VHGSLAR  sp|P0CX34|RS30B_YEAST  AT4G29390  Protein biosynthesis.ribosome biogenesis.small ribosomal subunit (SSU).SSU proteome.component RPS30  86
ECADLWPR  sp|P0CX42|RL23B_YEAST  AT3G04400  Protein biosynthesis.ribosome biogenesis.large ribosomal subunit (LSU).LSU proteome.component RPL23  87
DELTLEGIK  sp|P10081|IF4A_YEAST  AT3G13920  Protein biosynthesis.translation initiation.mRNA loading.mRNA unwinding factor (eIF4A)  88
IDHYLGK  sp|P11412|G6PD_YEAST  AT5G40760  Carbohydrate metabolism.oxidative pentose phosphate pathway.oxidative phase.glucose-6-phosphate dehydrogenase  89
NAEYNPK  sp|P13393|TBP_YEAST  AT3G13445  RNA biosynthesis.RNA polymerase II-dependent transcription.transcription initiation.TFIId basal transcription regulation complex.TATA-box-binding component  90
ALCTGEK  sp|P14832|CYPH_YEAST  AT5G13120  Photosynthesis.photophosphorylation.chlororespiration.NADH dehydrogenase-like (NDH) complex.lumen subcomplex L.component PnsL5  91
DVIAFPK  sp|P15179|SYDM_YEAST  AT4G33760  Protein biosynthesis.aminoacyl-tRNA synthetase activities.aspartate-tRNA ligase  92
SAIGEGMTR  sp|P16140|VATB_YEAST  AT4G38510  Solute transport.primary active transport.V-type ATPase complex.peripheral V1 subcomplex.subunit B  93
DNNLLGK  sp|P16474|BIP_YEAST  AT5G02490  Protein homeostasis.protein quality control.cytosolic Hsp70 chaperone system.chaperone (Hsp70)  94
YFPTQALNFAFK  sp|P18239|ADT2_YEAST  AT5G13490  Solute transport.carrier-mediated transport.solute transporter (MTCC)  95
APGFGDNR  sp|P19882|HSP60_YEAST  AT3G13860  Protein homeostasis.protein quality control.Hsp60 chaperone system.chaperone (Hsp60)  96
AGAFDQLK  sp|P20424|SRP54_YEAST  AT5G49500  Protein translocation.endoplasmic reticulum.co-translational insertion system.SRP (signal recognition particle) complex.component SRP54  97
GYIDLSK  sp|P20459|IF2A_YEAST  AT5G05470  Protein biosynthesis.translation initiation.Pre-Initiation Complex (PIC) module.eIF2 Met-tRNA binding factor activity.eIF2 Met-tRNA binding factor complex.component eIF2-alpha  98
TTLLHMLK  sp|P20606|SAR1_YEAST  AT3G62560  Vesicle trafficking.Coat protein II (COPII) coatomer machinery.coat protein recruiting.GTPase (Sar1)  99
HITIFSPEGR  sp|P21243|PSA1_YEAST  AT2G05840  Protein homeostasis.ubiquitin-proteasome system.26S proteasome.20S core particle.alpha-type components.component alpha type-1  100
NTYQCAMGK  sp|P22276|RPC2_YEAST  AT5G45140  RNA biosynthesis.DNA-dependent RNA polymerase (Pol) complexes.Pol III catalytic components.subunit 2  101
QITQVYGFYDECLR  sp|P23595|PP2A2_YEAST  AT5G55260  Protein modification.phosphorylation.serine/threonine protein phosphatase superfamily.PPP Fe-Zn-dependent phosphatase families.PP4-class phosphatase complex.catalytic component PP4c  102
NIGISAHIDSGK  sp|P25039|EFGM_YEAST  AT2G45030  Protein biosynthesis.organelle machinery.translation elongation.elongation factor (EF-G)  103
GSLPWQGLK  sp|P29295|HRR25_YEAST  AT5G57015  Protein modification.phosphorylation.CK protein kinase superfamily.protein kinase (CKL)  104
VAIHEAMEQQTISIAK  sp|P29496|MCM5_YEAST  AT2G07690  Cell cycle organisation.DNA replication.preinitiation.MCM replicative DNA helicase complex.component MCM5  105
NMSVIAHVDHGK  sp|P32324|EF2_YEAST  AT1G56070  Protein biosynthesis.translation elongation.eEF2 mRNA-translocation factor activity.mRNA-translocation factor (eEF2)  106
QATINIGTIGHVAHGK  sp|P32481|IF2G_YEAST  AT4G18330  Protein biosynthesis.translation initiation.Pre-Initiation Complex (PIC) module.eIF2 Met-tRNA binding factor activity.eIF2 Met-tRNA binding factor complex.component eIF2-gamma  107
LGYANAK  sp|P32481|IF2G_YEAST  AT4G18330  Protein biosynthesis.translation initiation.Pre-Initiation Complex (PIC) module.eIF2 Met-tRNA binding factor activity.eIF2 Met-tRNA binding factor complex.component eIF2-gamma  108
QSLETICLLLAYK  sp|P32598|PP12_YEAST  AT5G59160  Protein modification.phosphorylation.serine/threonine protein phosphatase superfamily.PPP Fe-Zn-dependent phosphatase families.PP1-class phosphatase  109
GNHECASINR  sp|P32598|PP12_YEAST  AT5G59160  Protein modification.phosphorylation.serine/threonine protein phosphatase superfamily.PPP Fe-Zn-dependent phosphatase families.PP1-class phosphatase  110
IYGFYDECK  sp|P32598|PP12_YEAST  AT5G59160  Protein modification.phosphorylation.serine/threonine protein phosphatase superfamily.PPP Fe-Zn-dependent phosphatase families.PP1-class phosphatase  111
HLTGEFEK  sp|P32836|GSP2_YEAST  AT5G55190  Protein translocation.nucleus.nucleocytoplasmic transport.Ran GTPase  112
VCENIPIVLCGNK  sp|P32836|GSP2_YEAST  AT5G55190  Protein translocation.nucleus.nucleocytoplasmic transport.Ran GTPase  113
FQSLGVAFYR  sp|P32939|YPT7_YEAST  AT3G16100  Vesicle trafficking.regulation of membrane tethering and fusion.RAB-GTPase activities.G-class RAB GTPase  114
YLGEGPR  sp|P33298|PRS6B_YEAST  AT5G58290  Protein homeostasis.ubiquitin-proteasome system.26S proteasome.19S regulatory particle.ATPase components.regulatory component RPT3  115
VIMATNR  sp|P33298|PRS6B_YEAST  AT5G58290  Protein homeostasis.ubiquitin-proteasome system.26S proteasome.19S regulatory particle.ATPase components.regulatory component RPT3  116
VIGSELVQK  sp|P33299|PRS7_YEAST  AT1G53750  Protein homeostasis.ubiquitin-proteasome system.26S proteasome.19S regulatory particle.ATPase components.regulatory component RPT1  117
YVGEGAR  sp|P33299|PRS7_YEAST  AT1G53750  Protein homeostasis.ubiquitin-proteasome system.26S proteasome.19S regulatory particle.ATPase components.regulatory component RPT1  118
TGHSGTLDPK  sp|P33322|CBF5_YEAST  AT3G57150  Protein biosynthesis.ribosome biogenesis.rRNA biosynthesis.post-transcriptional rRNA modification.pseudouridylation.H/ACA small nucleolar ribonucleoprotein (snoRNP) rRNA pseudouridylation complex.pseudouridine synthase component Nap57/CBF5  119
FTLWWSPTINR  sp|P33334|PRP8_YEAST  AT4G38780  RNA processing.pre-mRNA splicing.U2-type-intron-specific major spliceosome.U5 small nuclear ribonucleoprotein particle (snRNP).protein factor (PRPF8/SUS2)  120
ISLIQIFR  sp|P33334|PRP8_YEAST  AT4G38780  RNA processing.pre-mRNA splicing.U2-type-intron-specific major spliceosome.U5 small nuclear ribonucleoprotein particle (snRNP).protein factor (PRPF8/SUS2)  121
IIHTSVWAGQK  sp|P33334|PRP8_YEAST  AT4G38780  RNA processing.pre-mRNA splicing.U2-type-intron-specific major spliceosome.U5 small nuclear ribonucleoprotein particle (snRNP).protein factor (PRPF8/SUS2)  122
LAEQAER  sp|P34730|BMH2_YEAST  AT5G65430  [In 16 regulatory proteins in Arabidopsis]  123
NLLSVAYK  sp|P34730|BMH2_YEAST  AT5G65430  [In 16 regulatory proteins in Arabidopsis]  124
DSTLIMQLLR  sp|P34730|BMH2_YEAST  AT5G65430  [In 25 regulatory proteins in Arabidopsis]  125
DIVFAASLYL  sp|P35207|SKI2_YEAST  AT1G59760  RNA processing.RNA surveillance.exosome complex.associated co-factor activities.Nuclear Exosome Targeting (NEXT) activation complex.RNA helicase component MTR4/HEN2  126
AQIWDTAGQER  sp|P38555|YPT31_YEAST  AT5G65270  Vesicle trafficking.regulation of membrane tethering and fusion.RAB-GTPase activities.A-class RAB GTPase  127
AITSAYYR  sp|P38555|YPT31_YEAST  AT5G60860  Vesicle trafficking.regulation of membrane tethering and fusion.RAB-GTPase activities.A-class RAB GTPase  128
LCDFGSAK  sp|P38615|RIM11_YEAST  AT5G26751  Phytohormone action.brassinosteroid.perception and signal transduction.GSK3-type protein kinase (BIN2)  129
IADFGLAK  sp|P39009|DUN1_YEAST  AT5G67080  Protein modification.phosphorylation.STE protein kinase superfamily.protein kinase (MAP3K-MEKK)  130
GANEATK  sp|P39990|SNU13_YEAST  AT5G20160  RNA processing.pre-mRNA splicing.U2-type-intron-specific major spliceosome.U4/U6 small nuclear ribonucleoprotein particle (snRNP).protein factor (NHP2L1/SNU13)  131
LIGDAAK  sp|P40150|SSB2_YEAST  AT5G02500  Protein homeostasis.protein quality control.cytosolic Hsp70 chaperone system.chaperone (Hsp70)  132
DTQCGFK  sp|P40350|ALG5_YEAST  AT2G39630  Protein modification.glycosylation.N-linked glycosylation.dolichol-phosphate-glucose synthase (ALG5)  133
MLSCAGADR  sp|P41805|RL10_YEAST  AT1G66580  Protein biosynthesis.ribosome biogenesis.large ribosomal subunit (LSU).LSU proteome.component RPL10  134
ICDFGLAR  sp|P41808|SMK1_YEAST  AT5G19010  Protein modification.phosphorylation.CMGC protein kinase superfamily.protein kinase (MAPK)  135
AVAVVVDPIQSVK  sp|P43588|RPN11_YEAST  AT5G23540  Protein homeostasis.ubiquitin-proteasome system.26S proteasome.19S regulatory particle.non-ATPase components.regulatory component RPN11  136
VVIDAFR  sp|P43588|RPN11_YEAST  AT5G23540  Protein homeostasis.ubiquitin-proteasome system.26S proteasome.19S regulatory particle.non-ATPase components.regulatory component RPN11  137
YMTDGMLLR  sp|P53131|PRP43_YEAST  AT4G16680  [RNA helicase]  138
GVLLYGPPGTGK  sp|P53549|PRS10_YEAST  AT5G53540  [RNA helicase]  139
YIGESAR  sp|P53549|PRS10_YEAST  AT1G45000  Protein homeostasis.ubiquitin-proteasome system.26S proteasome.19S regulatory particle.ATPase components.regulatory component RPT4  140
LTSLGVIGALVK  sp|P53829|CAF40_YEAST  AT5G12980  [Cell differentiation. Rcd1-like protein]  141
GAFGEVR  sp|P53894|CBK1_YEAST  AT5G09890  Protein modification.phosphorylation.AGC protein kinase superfamily.protein kinase (AGC-VII/NDR)  142
CATITPDEAR  sp|P53982|IDHH_YEAST  AT1G54340  Enzyme classification.EC_1 oxidoreductases.EC_1.1 oxidoreductase acting on CH-OH group of donor  143
SPNGTIR  sp|P53982|IDHH_YEAST  AT1G54340  Enzyme classification.EC_1 oxidoreductases.EC_1.1 oxidoreductase acting on CH-OH group of donor  144
AGFAGDDAPR  sp|P60010|ACT_YEAST  AT5G59370  Cytoskeleton organisation.microfilament network.actin filament protein  145
IWHHTFYNELR  sp|P60010|ACT_YEAST  AT5G59370  Cytoskeleton organisation.microfilament network.actin filament protein  146
STELLIR  sp|P61830|H3_YEAST  AT5G10980  Chromatin organisation.histones.histone (H3)  147
EIAQDFK  sp|P61830|H3_YEAST  AT5G65350  Chromatin organisation.histones.histone (H3)  148
LGLTATLVR  sp|Q00578|RAD25_YEAST  AT5G41370  DNA damage response.nucleotide excision repair (NER).multi-functional TFIIh complex.core module.subunit SSL2/XPB  149
ELFVMAR  sp|Q01939|PRS8_YEAST  AT5G19990  Protein homeostasis.ubiquitin-proteasome system.26S proteasome.19S regulatory particle.ATPase components.regulatory component RPT6  150
GTGLYELWK  sp|Q02908|ELP3_YEAST  AT5G50320  RNA biosynthesis.RNA polymerase II-dependent transcription.transcription elongation.ELONGATOR transcription elongation complex.component ELP3  151
TEALTQAFR  sp|Q12464|RUVB2_YEAST  AT3G49830  Chromatin organisation.chromatin remodeling complexes.SWR1/NuA4-shared helicase (RVB)  152
AGLQFPVGR  sp|Q12692|H2AZ_YEAST  AT5G54640  Chromatin organisation.histones.histone (H2A)  153

Example 3

Hybrid Approach for the Identification of Conserved Peptides in Rosids

[0125] The Rosids are a large group of 17 orders of flowering plants (see FIG. 5). A list of 6647 peptides conserved among 10 species of Rosids (A. thaliana, Eucalyptus grandis, Ricinus communis, Phaseolus vulgaris, Vitis vinifera, Carpinus fangiana, Theobroma cacao, Malus domestica, Citrus clementina, and Cephalotus follicularis) was identified following the procedures outlined in Examples 1 and 2 above.

[0126] The list of 6647 conserved peptides was compared to the list of peptides identified in mass spectrometric experiments in the AraSpec database (Mergner et al., 2020). AraSpec has two large lists of reference peptides contained in ion libraries. One set contains phosphopeptides and the other contains non-phosphorylated peptides. For this analysis, the non-phosphorylated set was used, and the redundant peptides, modified peptides and non-tryptic peptides were removed by comparing to a theoretical digest of A. thaliana.

[0127] Of these, 4647 peptides computationally found to be conserved among the ten species were also in AraSpec.

[0128] A list of peptides observed at FDR <0.01% was created from the four Rosid species in the dataset used to create the set of peptides for all vascular plants (Arabidopsis, Flooded gum, Grape, Bean) in Example 4 below. There were 647 peptides observed in all three replicates of the four species.

[0129] There were 231 peptides in common among all three sets: in the ten Rosids species theoretically, in AraSpec, and in the mass spec data from the four Rosids in triplicate.

[0130] Fifteen (15) of these peptides are found in all Eukaryotes (see Example 2). Thirty-six (36) of them are in the QconCATs for all vascular plants (see Example 4) and there are 5 peptides in the QconCATs that are found in all eukaryotes.

[0131] Not including the peptides in all eukaryotes and the QconCATs, there are 185 peptides that could be used for a Rosids kit.
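The figure of 185 is consistent with an inclusion-exclusion count. The short check below assumes, as an inference from the stated numbers rather than something the text says explicitly, that the 15 eukaryote-wide peptides and the 36 QconCAT peptides are both drawn from the 231 and that the 5 peptides belonging to both groups should be excluded only once. The function name `remaining_candidates` is hypothetical.

```python
def remaining_candidates(total, in_eukaryotes, in_qconcats, in_both):
    """Peptides left after excluding two overlapping groups."""
    excluded = in_eukaryotes + in_qconcats - in_both  # count the overlap once
    return total - excluded


print(remaining_candidates(231, 15, 36, 5))  # 185
```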

[0132] In summary, the 185 Rosids peptides are: (1) theoretically conserved, (2) confirmed empirically from two sets of mass spectrometry data, (3) not in all eukaryotes, (4) not in the vascular plants prototype kit (QconCATs in Examples 4 through 7), (5) from 109 exemplary Arabidopsis proteins, (6) designed to be used with the eukaryotes kit and/or vascular plants kit, and (7) shown in Table 3 below.

TABLE 3 Conserved Rosid peptides

Columns: TAIR10 name; Sequence; Mercator or TAIR protein description; SEQ ID NO:

AT1G03475.1  NPFAPTLHFNYR  oxygen-dependent coproporphyrinogen III oxidase (HemF)  154
AT1G04420.1  LNLFPGYMER  NAD(P)-linked oxidoreductase superfamily protein  155
AT1G06690.1  FAALPWR  NAD(P)-linked oxidoreductase superfamily protein  156
AT1G15690.1  AAVIGDTIGDPLK  proton-translocating pyrophosphatase (VHP1)  157
AT1G15690.2  AADVGADLVGK  proton-translocating pyrophosphatase (VHP1)  158
AT1G15690.2  TDALDAAGNTTAAIGK  proton-translocating pyrophosphatase (VHP1)  159
AT1G20010.1  INVYYNEASGGR  component beta-Tubulin of alpha-beta-Tubulin heterodimer  160
AT1G29900.1  VLILGGGPNR  large subunit of carbamoyl phosphate synthetase heterodimer  161
AT1G32060.1  FYGEVTQQMLK  phosphoribulokinase  162
AT1G42970.1  VVAWYDNEWGYSQR  glyceraldehyde 3-phosphate dehydrogenase  163
AT1G54340.1  TIEAEAAHGTVTR  Peroxisomal isocitrate dehydrogenase [NADP] OS = Arabidopsis thaliana (sp|q9s1k0|icdhx_arath: 872.0) & Enzyme classification.EC_1 oxidoreductases.EC_1.1 oxidoreductase acting on CH-OH group of donor(50.1.1:732.9)  164
AT1G62750.1  MDFPDPVIK  EF-G translation elongation factor  165
AT1G62750.1  VEANVGAPQVNYR  EF-G translation elongation factor  166
AT1G62750.1  LAQEDPSFHFSR  EF-G translation elongation factor  167
AT1G62750.1  INIIDTPGHVDFTLEVER  EF-G translation elongation factor  168
AT1G62750.1  IGEVHEGTATMDWMEQEQER  EF-G translation elongation factor  169
AT1G67280.2  AFGMELLR  lactoyl-glutathione lyase (GLX1)  170
AT1G67280.2  ITACLDPDGWK  lactoyl-glutathione lyase (GLX1)  171
AT1G67280.2  GPTPEPLCQVMLR  lactoyl-glutathione lyase (GLX1)  172
AT1G70730.3  LSGTGSEGATIR  cytosolic phosphoglucomutase  173
AT1G78900.2  EDDLNEIVQLVGK  subunit A of V-type ATPase peripheral V1 subcomplex  174
AT1G78900.2  HFPSVNWLISYSK  subunit A of V-type ATPase peripheral V1 subcomplex  175
AT1G78900.2  VLDALFPSVLGGTCAIPGAFGCGK  subunit A of V-type ATPase peripheral V1 subcomplex  176
AT2G04030.2  ELVSNASDALDK  chaperone (Hsp90)  177
AT2G28000.1  VVNDGVTIAR  subunit alpha of Cpn60 chaperonin complex  178
AT2G30950.1  FQMEPNTGVTFDDVAGVDEAK  component FtsH1|2|5|6|8 of FtsH plastidial protease complexes  179
AT2G39730.3  VPLILGIWGGK  ATP-dependent activase involved in RuBisCo regulation  180
AT2G39730.3  MCCLFINDLDAGAGR  ATP-dependent activase involved in RuBisCo regulation  181
AT2G39730.3  MGINPIMMSAGELESGNAGEPAK  ATP-dependent activase involved in RuBisCo regulation  182
AT3G01340.2  DVAWAPNLGLPK  scaffolding component Sec13 of coat protein complex  183
AT3G02360.1  IGLAGLAVMGQNLALNIAEK  6-phosphogluconate dehydrogenase  184
AT3G02450.1  GVLLVGPPGTGK  component FtsHi of protein translocation ATPase motor complex  185
AT3G04400.2  GSAITGPIGK  component RPL23 of LSU proteome component  186
AT3G04400.2  NLYIISVK  component RPL23 of LSU proteome component  187
AT3G04400.2  MSLGLPVAATVNCADNTGAK  component RPL23 of LSU proteome component  188
AT3G04770.2  LLILTDPR  component RPSa of SSU proteome  189
AT3G05530.1  ADILDPALMR  regulatory component RPT5 of 26S proteasome  190
AT3G09200.2  VGSSEAALLAK  component RPP0 of LSU proteome component  191
AT3G11940.2  QAVDISPLR  component RPS5 of SSU proteome  192
AT3G11940.2  TIAECLADELINAAK  component RPS5 of SSU proteome  193
AT3G13120.2  TMGPVPLPTK  component psRPS10 of small ribosomal subunit proteome  194
AT3G13930.1  VIDGAIGAEWLK  component E2 of mitochondrial pyruvate dehydrogenase complex  195
AT3G15020.2  LFGVTTLDVVR  mitochondrial NAD-dependent malate dehydrogenase  196
AT3G15020.2  DDLFNINAGIVK  mitochondrial NAD-dependent malate dehydrogenase  197
AT3G16640.1  VVDIVDTFR  translationally controlled tumor protein  198
AT3G26650.1  LLDASHR  glyceraldehyde 3-phosphate dehydrogenase  199
AT3G26650.1  VAINGFGR  glyceraldehyde 3-phosphate dehydrogenase  200
AT3G26650.1  GTMTTTHSYTGDQR  glyceraldehyde 3-phosphate dehydrogenase  201
AT3G26650.1  VIAWYDNEWGYSQR  glyceraldehyde 3-phosphate dehydrogenase  202
AT3G46970.1  MSILSTAGSGK  cytosolic alpha-glucan phosphorylase  203
AT3G54050.2  QIASLVQR  fructose-1,6-bisphosphatase  204
AT3G54050.2  TLLYGGIYGYPR  fructose-1,6-bisphosphatase  205
AT3G58610.3  GHSYSEIINESVIESVDSLNPFMHAR  ketol-acid reductoisomerase  206
AT3G63140.1  DCEEWFFDR  endoribonuclease (CSP41)  207
AT3G63410.1  NVTILDQSPHQLAK  MSBQ-methyltransferase (APG1)  208
AT4G01800.2  VENYFFDIR  component SecA1 of thylakoid membrane Sec1 translocation system  209
AT4G02080.1  ILFLGLDNAGK  GTPase (Sar1)  210
AT4G02770.1  EQCLALGTR  component PsaD of PS-I complex  211
AT4G02770.1  EQIFEMPTGGAAIMR  component PsaD of PS-I complex  212
AT4G04640.1  VELLYTK  subunit gamma of peripheral CF1 subcomplex of ATP synthase complex  213
AT4G09000.2  QAFDEAIAELDTLGEESYK  general regulatory factor 1  214
AT4G13570.2  GDEELDTLIK  histone (H2A)  215
AT4G13940.4  HSLPDGLMR  S-adenosyl homocysteine hydrolase  216
AT4G15000.2  YTLDVDLK  component RPL27 of LSU proteome component  217
AT4G17170.1  YIIIGDTGVGK  B-class RAB GTPase  218
AT4G20360.1  MVMPGDR  EF-Tu translation elongation factor  219
AT4G20360.1  YDEIDAAPEER  EF-Tu translation elongation factor  220
AT4G20360.1  GITINTATVEYETENR  EF-Tu translation elongation factor  221
AT4G20360.1  HSPFFAGYRPQFYMR  EF-Tu translation elongation factor  222
AT4G24190.2  FGWSANMER  chaperone (Hsp90)  223
AT4G26970.1  ILLESAIR  aconitase  224
AT4G27700.1  EWTAWDIAR  Rhodanese/Cell cycle control phosphatase superfamily protein  225
AT4G29060.2  EETGAGMMDCK  EF-Ts translation elongation factor  226
AT4G30190.2  ELSEIAEQAK  P3A-type proton-translocating ATPase (AHA)  227
AT4G30920.1  TIEVNNTDAEGR  M17-class leucyl aminopeptidase (LAP)  228
AT4G33010.1  VDNVYGDR  glycine dehydrogenase component P-protein of glycine cleavage system  229
AT4G33010.2  TFCIPHGGGGPGMGPIGVK  glycine dehydrogenase component P-protein of glycine cleavage system  230
AT4G34450.1  SIATLAITTLLK  subunit gamma of cargo adaptor F-subcomplex  231
AT4G35650.1  LADGLFLESCR  regulatory component of isocitrate dehydrogenase heterodimer  232
AT4G35830.1  VLLQDFTGVPAVVDLACMR  aconitase  233
AT4G35830.2  TSLAPGSGVVTK  aconitase  234
AT4G38510.5 IALTTAEYLAYECGK subunit B of V-type ATPase 235 peripheral V1
subcomplex AT4G38510.5 IPLFSAAGLPHNEIAAQICR subunit B of V-type ATPase 236 peripheral V1 subcomplex AT4G38970.1 ALQNTCLK fructose 1,6-bisphosphate 237 aldolase AT5G03340.1 DFSTAILER platform ATPase (CDC48) 238 AT5G03340.1 GILLYGPPGSGK platform ATPase (CDC48) 239 AT5G03340.1 IVS QLLTLMDGLK platform ATPase (CDC48) 240 AT5G04140.2 WPLAQPMR Fd-dependent glutamate 241 synthase AT5G04140.2 FCTGGMSLGAISR Fd-dependent glutamate 242 synthase AT5G08690.1 EMIESGVIK subunit beta of ATP 243 synthase peripheral MF1 subcomplex AT5G08690.1 TVLIMELINNVAK subunit beta of ATP 244 synthase peripheral MF1 subcomplex AT5G08690.1 FTQANSEVSALLGR subunit beta of ATP 245 synthase peripheral MF1 subcomplex AT5G08690.1 CALVYGQMNEPPGAR subunit beta of ATP 246 synthase peripheral MF1 subcomplex AT5G09660.4 ANTFVAEVLGLDPR peroxisomal NAD- 247 dependent malate dehydrogenase AT5G09810.1 YPIEHGIVSNWDDMEK actin filament protein 248 AT5G10860.1 VGDIMTEENK Cystathionine beta- synthase 249 (CBS) family protein AT5G11520.1 LNLGVGAYR aspartate aminotransferase 250 AT5G13490.2 TAAAPIER solute transporter (MTCC) 251 AT5G13490.2 MMMTSGEAVK solute transporter (MTCC) 252 AT5G14300.1 DLQMVNLTLR prohibitin 5 253 AT5G14670.1 ILMVGLDAAGK ARF-GTPase 254 AT5G14670.1 NISFTVWDVGGQDK ARF-GTPase 255 AT5G15200.2 IFEGEALLR component RPS9 of SSU 256 proteome AT5G15650.1 DELDIVIPTIR UDP-L-arabinose mutase 257 AT5G16440.1 AFSVFLFNSK isopentenyl diphosphate 258 isomerase AT5G16990.1 NLYLSCDPYMR NADP-dependent alkenal 259 double bond reductase P2 OS = Arabidopsis thaliana (sp|q39173|p2_arath: 704.0) & Enzyme classification.EC_1 oxidoreductases.EC_1.3 oxidoreductase acting on CH—CH group of donor(50.1.3:295.5) AT5G17920.2 YLFAGVVDGR methyl-tetrahydrofolate- 260 dependent methionine synthase AT5G18380.2 TLLVADPR component RPS16 of SSU 261 proteome AT5G19780.1 AVFVDLEPTVIDEVR component alpha-Tubulin of 262 alpha-beta-Tubulin heterodimer AT5G20980.2 SWLAFAAQK methyl-tetrahydrofolate- 263 dependent methionine synthase AT5G20980.2 
YGAGIGPGVYDIHSPR methyl-tetrahydrofolate- 264 dependent methionine synthase AT5G20980.2 GMLTGPVTILNWSFVR methyl-tetrahydrofolate- 265 dependent methionine synthase AT5G23120.1 GFGILDVGYR HCF136 protein involved in 266 PS-II assembly AT5G23860.2 LAVNLIPFPR component beta-Tubulin of 267 alpha-beta-Tubulin heterodimer AT5G23860.2 LHFFMVGFAPLTSR component beta-Tubulin of 268 alpha-beta-Tubulin heterodimer AT5G23860.2 GHYTEGAELIDSVLDVVR component beta-Tubulin of 269 alpha-beta-Tubulin heterodimer AT5G25880.1 IWLVDSK cytosolic NADP-dependent 270 malic enzyme AT5G25880.1 ILGLGDLGCQGMGIPVGK cytosolic NADP-dependent 271 malic enzyme AT5G26780.2 GAMIFFR serine 272 hydroxymethyltransferase AT5G26780.2 MGTPALTSR serine 273 hydroxymethyltransferase AT5G26780.2 LIVAGASAYAR serine 274 hydroxymethyltransferase AT5G26780.2 NTVPGDVSAMVPGGIR serine 275 hydroxymethyltransferase AT5G26780.2 ISAVSIFFETMPYR serine 276 hydroxymethyltransferase AT5G30510.1 AEEMAQTFR component psRPS1 of small 277 ribosomal subunit proteome AT5G35530.1 GLCAIAQAESLR component RPS3 of SSU 278 proteome AT5G36700.4 ENPGCLFIATNR phosphoglycolate 279 phosphatase AT5G37600.1 WNYDGSSTGQAPGEDSEVILYPQAIFK cytosolic glutamine 280 synthetase (GLN1 ) AT5G38480.2 YEEMVEFMEK general regulatory factor 3 281 AT5G41670.2 GFPISVYNR 6-phosphogluconate 282 dehydrogenase AT5G42270.1 LESGLYSR component FtsH1|2|5|6|8 of 283 FtsH plastidial protease complexes AT5G42270.1 DEISDALER component FtsH1|2|5|6|8 of 284 FtsH plastidial protease complexes AT5G42270.1 LELQEVVDFLK component FtsH1|2|5|6|8 of 285 FtsH plastidial protease complexes AT5G42270.1 TPGFTGADLQNLMNEAAILAAR component FtsH1|2|5|6|8 of 286 FtsH plastidial protease complexes AT5G45775.2 YEGVILNK component RPL11 of LSU 287 proteome component AT5G45775.2 AMQLLESGLK component RPL11 of LSU 288 proteome component AT5G45930.1 IGGVMIMGDR component CHL-I of 289 magnesium-chelatase complex AT5G45930.1 INMVDLPLGATEDR component CHL-I of 290 magnesium-chelatase complex AT5G45930.1 
FILIGSGNPEEGELRPQLLDR component CHL-I of 291 magnesium-chelatase complex AT5G48300.1 MLDADVTDSVIGEGCVIK ADP-glucose 292 pyrophosphorylase AT5G49910.1 IAGLEVLR chaperone (cpHsc70) 293 AT5G49910.1 FEELCSDLLDR chaperone (cpHsc70) 294 AT5G49910.1 QFAAEEISAQVLR chaperone (cpHsc70) 295 AT5G50920.1 LDEMIVFR chaperone component ClpC 296 of chloroplast Clp-type protease complex AT5G50920.1 LDMSEFMER chaperone component ClpC 297 of chloroplast Clp-type protease complex AT5G50920.1 VIMLAQEEAR chaperone component ClpC 298 of chloroplast Clp-type protease complex AT5G50920.1 IGFDLDYDEK chaperone component ClpC 299 of chloroplast Clp-type protease complex AT5G50920.1 VITLDMGLLVAGTK chaperone component ClpC 300 of chloroplast Clp-type protease complex AT5G50920.1 ALAAYYFGSEEAMIR chaperone component ClpC 301 of chloroplast Clp-type protease complex AT5G50920.1 NTLLIMTSNVGSSVIEK chaperone component ClpC 302 of chloroplast Clp-type protease complex AT5G50920.1 AHPDVFNMMLQILEDGR chaperone component ClpC 303 of chloroplast Clp-type protease complex AT5G50920.1 LIGSPPGYVGYTEGGQLTEAVR chaperone component ClpC 304 of chloroplast Clp-type protease complex AT5G55070.1 GLVVPVIR component E2 of 2- 305 oxoglutarate dehydrogenase complex AT5G56030.2 EEYAAFYK chaperone (Hsp90) 306 AT5G56030.2 AVENSPFLEK chaperone (Hsp90) 307 AT5G56030.2 ADLVNNLGTIAR chaperone (Hsp90) 308 AT5G56030.2 EDQLEYLEER chaperone (Hsp90) 309 AT5G56030.2 GIVDSEDLPLNISR chaperone (Hsp90) 310 AT5G56500.2 VEDALNATK subunit beta of Cpn60 311 chaperonin complex AT5G56500.2 VVAAGANPVLITR subunit beta of Cpn60 312 chaperonin complex AT5G56500.2 EVELEDPVENIGAK subunit beta of Cpn60 313 chaperonin complex AT5G56500.2 AAVEEGIVVGGGCTLLR subunit beta of Cpn60 314 chaperonin complex AT5G56500.2 LSGGVAVIQVGAQTETELK subunit beta of Cpn60 315 chaperonin complex AT5G57350.2 LGDIIPADAR P3A-type proton- 316 translocating ATPase (AHA) AT5G57350.2 ADGFAGVFPEHK P3A-type proton- 317 translocating ATPase (AHA) AT5G57350.2 ADIGIAVADATDAAR 
P3A-type proton- 318 translocating ATPase (AHA) AT5G57350.2 MTAIEEMAGMDVLCSDK P3A-type proton- 319 translocating ATPase (AHA) AT5G59370.2 GYSFTTTAER actin filament protein 320 AT5G59370.2 HTGVMVGMGQK actin filament protein 321 AT5G59370.2 VAPEEHPVLLTEAPLNPK actin filament protein 322 AT5G59840.1 LLLIGDSGVGK E-class RAB GTPase 323 AT5G59850.1 IVVELNGR component RPS15a of SSU 324 proteome AT5G59910.1 LVLPGELAK histone (H2B) 325 AT5G59910.1 AMGIMNSFINDIFEK histone (H2B) 326 AT5G59970.1 DAVTYTEHAR histone (H4) 327 AT5G59970.1 ISGLIYEETR histone (H4) 328 AT5G59970.1 TVTAMDVVYALK histone (H4) 329 AT5G60390.3 STNLDWYK aminoacyl-tRNA binding 330 factor (eEF1A) AT5G60390.3 EHALLAFTLGVK aminoacyl-tRNA binding 331 factor (eEF1A) AT5G60390.3 YYCTVIDAPGHR aminoacyl-tRNA binding 332 factor (eEF1A) AT5G60390.3 NMITGTSQADCAVLIIDSTTGGFEAGISK aminoacyl-tRNA binding 333 factor (eEF1A) AT5G61410.2 VIEAGANALVAGSAVFGAK phosphopentose epimerase 334 AT5G64040.1 CGSNVFWK component PsaN of PS-I 335 complex AT5G64040.2 FPENFTGCQDLAK component PsaN of PS-I 336 complex AT5G66140.1 ALLEVVESGGK component alpha type-4 of 337 26S proteasome AT5G66190.2 LDFAVSR ferredoxin-NADP 338 oxidoreductase

Example 4

Empirical Identification of Conserved Peptides in Vascular Plants

[0133] An empirical mass spectrometric approach was used to identify conserved peptides in pineapple (Ananas comosus), thale cress (Arabidopsis thaliana), flooded gum (Eucalyptus grandis), bean (Phaseolus vulgaris), native yam (Dioscorea transversa), elkhorn fern (Platycerium bifurcatum), burrawang (Macrozamia communis), loblolly pine (Pinus taeda), tomato (Solanum lycopersicum), waratah (Telopea speciosissima), grape (Vitis vinifera), and maize (Zea mays). The 12 species were selected to span the diversity of vascular plants (see FIG. 5).

[0134] Briefly, an ion library (SWATH library) was created for Arabidopsis, based on mass spectrometric data from three Arabidopsis leaf samples. Lys-C and trypsin digested protein extracts from the three leaf samples were analyzed on a Sciex 6600 TripleTOF mass spectrometer with a data-dependent acquisition method according to Aspinwall et al. (2019), “Range size and growth temperature influence Eucalyptus species responses to an experimental heatwave,” Glob. Chang. Biol. 25:1665-1684. The resulting data were matched to a list of Arabidopsis proteins (available at the arabidopsis.org website, TAIR10) using ProteinPilot (Sciex). The ProteinPilot .group file was used to create a SWATH library in the PeakView SWATH microapp (Sciex) with a peptide FDR of <1%.

[0135] The same Arabidopsis samples, and three samples each from the 11 additional species (pineapple, flooded gum, bean, native yam, elkhorn fern, burrawang, loblolly pine, tomato, waratah, grape, and maize), were analyzed using data-independent SWATH (Aspinwall et al., 2019). The MS data from this analysis were matched to the Arabidopsis ion library using the SWATH microapp, identifying conserved peptides across the 12 different species and ensuring that the peptides were observable through MS analysis. An amino acid sequence alignment approach used alone may yield peptides that are not reliably observed through MS analysis. Presence or absence of a conserved peptide was based on FDR scores assigned by the SWATH microapp, i.e., a peptide was considered genuinely present in a species, and conserved between that species and Arabidopsis, if all three replicates from that species had a peptide FDR <1%.
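The presence/absence rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the dictionary layout, and the FDR values are hypothetical, and the sketch does not reproduce how the SWATH microapp computes or exports its FDR scores.

```python
FDR_THRESHOLD = 0.01   # peptide FDR < 1%, as stated in the text
N_REPLICATES = 3       # three samples per species, as stated in the text


def is_present(replicate_fdrs, threshold=FDR_THRESHOLD, n_replicates=N_REPLICATES):
    """True only if all expected replicates are reported and each passes the cutoff."""
    return (len(replicate_fdrs) == n_replicates
            and all(fdr < threshold for fdr in replicate_fdrs))


def species_with_peptide(fdrs_by_species):
    """Return the sorted species names in which the peptide is scored as present."""
    return sorted(name for name, fdrs in fdrs_by_species.items() if is_present(fdrs))


# Hypothetical FDR values for a single peptide (not measured data).
example_fdrs = {
    "Zea mays": [0.002, 0.004, 0.001],
    "Pinus taeda": [0.003, 0.020, 0.005],    # one replicate fails the cutoff
    "Vitis vinifera": [0.009, 0.000, 0.001],
}
present_in = species_with_peptide(example_fdrs)
```

Requiring every replicate to pass, rather than a majority, mirrors the "all three replicates" wording and makes a single poor-quality run sufficient to exclude a peptide for that species.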

[0136] A subset of 105 conserved peptides (see Table 4 below) was selected to be used as a set of isotope labeled internal standards for absolute quantification of their corresponding proteins in subsequent analyses of leaves from additional plant species. Most of the selected peptides were present in all 12 of the diverse species, meaning that they are likely present in all vascular plants. Additional criteria for selection included standard chemical stability preferences for isotope labeled peptide standards, such as peptides not arising from unfavorable trypsin cleavage sites and not containing amino acids likely to undergo spontaneous chemical modification (based on Pratt et al. 2006, “Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes,” Nat. Protoc. 1:1029-43). Peptides were also selected so that highly conserved protein complexes were represented, e.g., PSII, ATP synthase. The stoichiometries of protein subunits within conserved complexes are themselves often highly conserved. Therefore, amounts of overall complexes can be inferred from isotope labeled standards covering a small number of subunits within the complex.

TABLE-US-00004 TABLE 4 Subset of 105 conserved peptides Exemplary TAIR10 or SEQ QconCAT Protein Uniprot MapMan protein ID number Peptide target protein description NO: 1 LIFQYASFNNSR psbA/D1 atcg00020 component PsbA/D1 of 339 PS-II reaction center complex 1 VINTWADIINR psbA/D1 atcg00020 component PsbA/D1 of 340 PS-II reaction center complex 1 AYDFVSQEIR psbD/D2 atcg00270 component PsbD/D2 of 341 PS-II reaction center complex 1 NILLNEGIR psbD/D2 atcg00270 component PsbD/D2 of 342 PS-II reaction center complex 1 LAFYDYIGNNPAK psbB/CP47 atcg00680 component PsbB/CP47 343 of PS-II reaction center complex 1 VHTVVLNDPGR psbB/CP47 atcg00680 component PsbB/CP47 344 of PS-II reaction center complex 1 APWLEPLR psbC/CP43 atcg00280 component PsbC/CP43 345 of PS-II reaction center complex 1 DQETTGFAWWAGNAR psbC/CP43 atcg00280 component PsbC/CP43 346 of PS-II reaction center complex 1 YPIYVGGNR petA atcg00540 apocytochrome f 347 component PetA of cytochrome b6/f complex 1 VYDWFEER petB atcg00720 apocytochrome b 348 component PetB of cytochrome b6/f complex 1 DFGYSFPC[Pye]DGPGR psaB atcg00340 apoprotein PsaB of PS- 349 I complex 1 DKPVALSIVQAR psaB atcg00340 apoprotein PsaB of PS- 350 I complex 1 QILIEPIFAQWIQSAHGK psaB atcg00340 apoprotein PsaB of PS- 351 I complex 1 VFPNGEVQYLHPK PsaD at4g02770 component PsaD of PS- 352 I complex 1 FVQAGSEVSALLGR atpB atcg00480 subunit beta of 353 peripheral CF1 subcomplex of ATP synthase complex 1 LSIFETGIK atpB atcg00480 subunit beta of 354 peripheral CF1 subcomplex of ATP synthase complex 1 DTDILAAFR RbcL atcg00490 large subunit of 355 ribulose-1,5- bisphosphat carboxylase/oxygenase heterodimer 1 TFQGPPHGIQVER RbcL atcg00490 large subunit of 356 ribulose-1,5- bisphosphat carboxylase/oxygenase heterodimer 1 FYWAPTR RCA at2g39730 ATP-dependent 357 activase involved in RuBisCo regulation 1 VYDDEVR RCA at2g39730 ATP-dependent 358 activase involved in RuBisCo regulation 1 IGVIESLLEK PGK at3g12780 phosphoglycerate 359 chloroplast kinase 1 
AAALNIVPTSTGAAK GAPB at1g42970 glyceraldehyde 3- 360 phosphate dehydrogenase 1 VIITAPAK GAPB at1g42970 glyceraldehyde 3- 361 phosphate dehydrogenase 1 GKRLASIGLENTEANR FBA1 at2g21330 fructose 1,6- 362 bisphosphate aldolase 1 YIGSLVGDFHR CFBP1 at3G54050 fructose-1,6- 363 bisphosphatase 1 FFQLYVYK GLO1, at3g14420 glycolate oxidase 364 GOX1 1 NFEGLDLGK GLO1, at3g14420 glycolate oxidase 365 GOX1 1 AIPWIFAWTQTR PEPC2 at2g42600 PEP carboxylase 366 1 AIPWIFSWTQTR PEPC This variant of PEPC is not in 367 mutant Arabidopsis, but it is in many species that undergo C4 photosynthesis. 1 EFAPSIPEK MDH at1g04410 NAD-dependent malate 368 dehydrogenase 1 VLVVANPANTNALILK MDH at1g04410 NAD-dependent malate 369 dehydrogenase 1 AGLQFPVGR Histone at1g54690 histone 370 H2A 1 IFLENVIR Histone H4 at5g59970 histone 371 1 VTGGEVGAASSLAPK Ribosome at3g53430 component RPL12 of 372 LSU LSU proteome component 1 VSGVSLLALFK Ribosome at5g02960 component RPS23 of 373 RPS23 SSU proteome 1 ELAEDGYSGVEVR Ribosome at3g53870 component RPS3 of 374 RPS3 SSU proteome 1 GLDVIQQAQSGTGK EIF4A-2 at1g54270 mRNA unwinding 375 factor 1 VLITTDLLAR EIF4A-2 at1g54270 mRNA unwinding 376 factor 1 IGGIGTVPVGR eEF1A at5g60390 aminoacyl-tRNA 377 binding factor 1 LPLQDVYK eEF1A at5g60390 aminoacyl-tRNA 378 binding factor 1 GSGFVAVEIPFTPR ClpC1 at5g50920 chaperone component 379 ClpC of chloroplast Clp-type protease complex 1 TAIAEGLAQR ClpC1 at5g50920 chaperone component 380 ClpC of chloroplast Clp-type protease complex 1 GILAADESTGTIGK FBA8 at3g52930 aldolase 381 1 AVDSLVPIGR Mitochondrial at2g07698 subunit alpha of ATP 382 ATP synthase peripheral synthase MF1 subcomplex alpha 1 AHGGFSVFAGVGER Mitochondrial at5g08680 subunit beta of ATP 383 ATP synthase peripheral synthase MF1 subcomplex beta 1 VVDLLAPYQR Mitochondrial at5g08680 subunit beta of ATP 384 ATP synthase peripheral synthase MF1 subcomplex beta 1 AGFAGDDAPR Actin at5g09810 actin filament protein 385 1 IWHHTFYNELR Actin at5g09810 actin filament protein 386 1 
ATAGDTHLGGEDFDNR HSP70-1 at5g02500 chaperone 387 1 IINEPTAAAIAYGLDK HSP70-1 at5g02500 chaperone 388 1 ETDGYFIK ADG1 at5g48300 ADP-glucose 389 pyrophosphorylase 1 IYVLTQFNSASLNR ADG1 at5g48300 ADP-glucose 390 pyrophosphorylase 1 YNQLLR Enolase at2g36530 Bifunctional enolase 391 2/transcriptional activator OS = Arabidopsis thaliana 1 LFTGHPETLEK Myoglobin, Uniprot 392 horse P68082 MYG_HORSE 1 VEADIAGHGQEVLIR Myoglobin, Uniprot 393 horse P68082 MYG_HORSE 1 DEDTQAMPFR Ovalbumin, Uniprot 394 chicken P01012 OVAL_CHICK 1 GGLEPINFQTAADQAR Ovalbumin, Uniprot 395 chicken P01012 OVAL_CHICK 1 ISQAVHAAHAEINEAGR Ovalbumin, Uniprot 396 chicken P01012 OVAL_CHICK 2 WAMLGALGCVFPELLAR Lhcb1.3 at1g29930 component LHCb1/2/3 397 of LHC-II complex 2 STPQSIWYGPDRPK Lhcb2 at2g05070 component LHCb1/2/3 398 of LHC-II complex 2 ALEVIHGR Lhcb3 at5g54270 component LHCb1/2/3 399 of LHC-II complex 2 ECELIHGR Lhcb4/CP29 at2g40100 component LHCb4 of 400 LHC-II complex 2 LHPGGPFDPLGLAK Lhcb5/CP26 at4g10340 component LHCb5 of 401 LHC-II complex 2 TGALLLDGNTLNYFGK Lhcb5/CP26 at4g10340 component LHCb5 of 402 LHC-II complex 2 EAELIHGR Lhcb6 at1g15820 component LHCb6 of 403 LHC-II complex 2 GGSTGYDNAVALPAGGR PsbO2 at3g50820 component 404 PsbO/OEC33 of PS-II oxygen-evolving center 2 GSSFLDPK PsbO2 at3g50820 component 405 PsbO/OEC33 of PS-II oxygen-evolving center 2 AYGEAANVFGKPK PsbP at1g06680 component PsbP of PS- 406 II oxygen-evolving center 2 AWPYVQNDLR PsbQ at4g05180 component PsbQ of 407 PS-II oxygen-evolving center 2 ANELFVGR PsbS at1g44575 non-photochemical 408 quenching PsbS protein 2 ESELIHCR Lhca1 at3g54890 component LHCa1 of 409 LHC-I complex 2 QYFLGLEK Lhca3 at1g61520 component LHCa3 of 410 LHC-I complex 2 EIPLPHEFILNR psaA atcg00350 apoprotein PsaA of PS- 411 I complex 2 TAVNPLLR PsaL at4g12800 component PsaL of PS- 412 I complex 2 VYLWHETTR PsaC atcg01060 component PsaC of PS- 413 I complex 2 EIIIDVPLASR PsaF at1g31330 component PsaF of PS- 414 I complex 2 LYSIASSAIGDFGDSK FNR at5g66190 
ferredoxin-NADP 415 oxidoreductase 2 GYISPYFVTDSEK Cnp60 at1g55490 subunit beta of Cpn60 416 chaperonin complex 2 LADLVGVTLGPK Cnp60 at1g55490 subunit beta of Cpn60 417 chaperonin complex 2 AMHAVIDR RbcL atcg00490 large subunit of 418 ribulose-1,5-bisphosphat carboxylase/oxygenase heterodimer 2 SQAETGEIK RbcL atcg00490 large subunit of 419 ribulose-1,5- bisphosphat carboxylase/oxygenase heterodimer 2 LDELIYVESHLSNLSTK PRK at1g32060 phosphoribulokinase 420 2 QYADAVIEVLPTTLIPD PRK at1g32060 phosphoribulokinase 421 DNEGK 2 GVTTIIGGGDSVAAVEK PGK both at1g56190 phosphoglycerate 422 kinase 2 GGAFTGEISVEQLK TIM at2g21170 triosephosphate 423 isomerase 2 EAAWGLAR FBA1 at2g21330 fructose 1,6- 424 bisphosphate aldolase 2 VTTTIGYGSPNK TKL1 at3g60750 transketolase 425 2 YTGGMVPDVNQIIVK SBPase at3g55800 sedoheptulose-1,7- 426 bisphosphatase 2 IDLAIDGADEVDPNLDLVK RPI3 at3g04790 phosphopentose 427 isomerase 2 LVFVTNNSTK PGLP1B at5g36790 phosphoglycolate 428 phosphatase 2 LLEATGISTVPGSGFGQK GGT1 at1g23310 glutamate-glyoxylate 429 transaminase 2 LAVEAWGLK AGT1 at2g13360 serine-glyoxylate 430 transaminase 2 IAILNANYMAK GLDP1 at4g33010 glycine dehydrogenase 431 component P-protein of glycine cleavage system 2 SLLALQGPLAAPVLQHLTK GDCST at1g11860 aminomethyltransferase 432 component T-protein of glycine cleavage system 2 YSEGYPGAR SHM1 at4g37930 serine 433 hydroxymethyltransferase 2 GQTVGVIGAGR HPR at1g68010 hydroxypyruvate 434 reductase 2 FDFDPLDVTK catalase at1g20620 catalase 435 2 FSVSPVVR eEF2 at1g56070 mRNA-translocation 436 factor 2 GVQYLNEIK eEF2 at1g56070 mRNA-translocation 437 factor 2 AASFNIIPSSTGAAK GAPC2 at1g13440 NAD-dependent 438 glyceraldehyde 3- phosphate dehydrogenase 2 VPTVDVSVVDLTVR GAPC2 at1g13440 NAD-dependent 439 glyceraldehyde 3-phosphate dehydrogenase 2 LVAGLPEGGVLLLENVR PGK at1g79550 phosphoglycerate 440 kinase 2 LAADTPLLTGQR Vacuolar at1g78900 subunit A of V-type 441 ATP ATPase peripheral V1 synthase A subcomplex 2 AVVQVFEGTSGIDNK Vacuolar at1g76030 subunit B 
of V-type 442 ATP ATPase peripheral V1 synthase B subcomplex 2 AILNLSLR GS2 at5g35630 plastidial glutamine 443 synthetase 2 EHIAAYGEGNER GSR1 at5g37600 cytosolic glutamine 444 synthetase 2 LVAEAGIGTVASGVAK GLU1 at5g04140 Fd-dependent 445 glutamate synthase 2 VCPSHILNFQPGEAFVVR BCA at3g01500 446 2 DVATILHWK BCA at3g01500 447 2 FALESFWDGK ATCIMS at5g17920 methyl- 448 tetrahydrofolate- dependent methionine synthase 2 DEDTQAMPFR Ovalbumin, Uniprot 449 chicken P01012 OVAL_CHICK 2 GGLEPINFQTAADQAR Ovalbumin, Uniprot 450 chicken P01012 OVAL_CHICK 2 VEADIAGHGQEVLIR Myoglobin, Uniprot 451 horse P68082 MYG_HORSE 1 MAGRNFEGLDLGKELA Full 452 EDGYSGVEVRAHGGFS QconCAT1 VFAGVGERTAIAEGLA amino acid QREFAPSIPEKGGLEPIN sequence FQTAADQARLPLQDVY KAYDFVSQEIRGKRLAS IGLENTEANRDKPVALS IVQARAGFAGDDAPRQI LIEPIFAQWIQSAHGKIG GIGTVPVGRVHTVVLN DPGRVYDDEVRLSIFET GIKVYDWFEERLIFQYA SFNNSRVSGVSLLALFK ETDGYFIKVIITAPAKYP IYVGGNRAVDSLVPIGR AGLQFPVGRVVDLLAP YQRLAFYDYIGNNPAK VLVVANPANTNALILK AIPWIFAWTQTRLFTGH PETLEKFVQAGSEVSAL LGRNILLNEGIRFYWAP TRGLDVIQQAQSGTGK ATAGDTHLGGEDFDNR DFGYSFPCDGPGRAAA LNIVPTSTGAAKISQAV HAAHAEINEAGRYIGSL VGDFHRYNQLLRIGVIE SLLEKFFQLYVYKVLIT TDLLARIYVLTQFNSAS LNRAPWLEPLRGILAA DESTGTIGKIWHHTFYN ELRVTGGEVGAASSLA PKVFPNGEVQYLHPKVI NTWADIINRIFLENVIRII NEPTAAAIAYGLDKTF QGPPHGIQVERGSGFVA VEIPFTPRDQETTGFAW WAGNARVEADIAGHG QEVLIRAIPWIFSWTQT RDTDILAAFRDEDTQA MPFRLAAALEHHHHHH 2 HMAGRGGLEPINFQTA Full 453 ADQARLHPGGPFDPLG QconCAT2 LAKTGALLLDGNTLNY amino acid FGKDEDTQAMPFRWA sequence MLGALGCVFPELLARA WPYVQNDLRYSEGYPG ARFSVSPVVRGVQYLN EIKEAELIHGRECELIHG RAYGEAANVFGKPKAN ELFVGRLVFVTNNSTKL LEATGISTVPGSGFGQK LAVEAWGLKQYFLGLE KESELIHCREIIIDVPLAS RVYLWHETTREIPLPHE FILNRTAVNPLLRSTPQ SIWYGPDRPKAILNLSL RIAILNANYMAKSLLAL QGPLAAPVLQHLTKGQ TVGVIGAGRAMHAVID REHIAAYGEGNERALE VIHGRGVTTIIGGGDSV AAVEKGGAFTGEISVE QLKEAAWGLARGGST GYDNAVALPAGGRFAL ESFWDGKFDFDPLDVT KLYSIASSAIGDFGDSK GSSFLDPKLVAEAGIGT VASGVAKSQAETGEIKI DLAIDGADEVDPNLDL VKLDELIYVESHLSNLS TKQYADAVIEVLPTTLI PDDNEGKLADLVGVTL GPKGYISPYFVTDSEKY 
TGGMVPDVNQIIVKVT TTIGYGSPNKAVVQVFE GTSGIDNKLAADTPLLT GQRLVAGLPEGGVLLL ENVRVPTVDVSVVDLT VRAASFNIIPSSTGAAK DVATILHWKVCPSHILN FQPGEAFVVRVEADIA GHGQEVLIRLAAALEH HHHHH

[0137] Enzymatic and biological functions of the proteins targeted by the isotope labeled peptides were assigned using the MapMan functional annotation scheme (Schwacke et al., 2019). The MapMan scheme arranges protein functions hierarchically, including the subunits of complexes. Additionally, the stoichiometries of protein complex subunits were determined from publicly available sources, for example from crystallography and electron microscopy data (e.g., the RCSB Protein Data Bank, available at the rcsb.org website).

[0138] Exemplary processes for protein quantification using conserved peptides are set out in the further Examples below.

Example 5

Protein Quantification in Leaves of Three Plant Species

[0139] The conserved peptides identified in Example 4 were made into QconCATs by PolyQuant (Germany). The full sequences of the QconCATs are set out in Table 4 (SEQ ID NOs: 452 and 453). In QconCAT1, the lysines and arginines were labeled with 15N and 13C. In QconCAT2, the lysines and arginines were labeled with only 13C. The cysteines in both QconCATs were alkylated for 1 hour with 2-vinylpyridine in N-methylmorpholine/acetic acid buffer; reactions were stopped with 2-mercaptoethanol. The alkylated QconCATs were combined into a stock solution at equimolar concentrations, approximately 50 ng/μL of each.

[0140] Leaf Sample Protein Extraction

[0141] Leaf protein extraction from three species (flooded gum, bean, corn) was carried out via the methods described in Aspinwall et al. (2019). Critically, the extraction method is quantitative and extracts nearly all the protein from leaves. Also, the leaf area of each sample was known, and 38 picomoles of ovalbumin per square centimeter of leaf was added to each sample early in the extraction protocol as an internal standard. Ovalbumin was used instead of QconCATs early in the protocol because it is far less expensive. QconCATs were added later in the protocol to a small proportion of the overall extracted leaf protein. Adding QconCATs to samples early in the protocol instead of ovalbumin is functionally equivalent to adding ovalbumin early and QconCATs later. The QconCATs both contained ovalbumin peptides, which allowed measured target-to-standard ratios to be converted to target per leaf area based on the addition rate of ovalbumin (38 pmol cm.sup.−2). Additionally, target protein amounts per leaf dry weight can be calculated if dry weight per leaf area is known.

[0142] Addition of QconCAT to the Leaf Samples, Acetate Solvent Protein Extraction Method and Lys-C/trypsin Digestion

[0143] Following the alkylation step in the leaf protein extraction method, extract protein concentrations were measured using a FluoroProfile Protein Quantification Kit (Sigma). Then 50 μg protein was transferred to a new microcentrifuge tube and combined with 10 μL of the QconCAT stock solution (˜0.5 μg each QconCAT). The mixture was then subjected to a methanol-chloroform extraction method modified to be quantitative according to Aspinwall et al. (2019). The resulting pellets were digested with Lys-C and trypsin in a mass spec-compatible N-methylmorpholine buffer containing Rapigest detergent (Waters) according to Aspinwall et al. (2019), with modifications to promote complete digestion. Modifications included a higher concentration of trypsin, 1.25 μg per digest, and the addition of 4 mM CaCl.sub.2. Lys-C digestion at 45° C. for 1 hour was followed by the addition of trypsin and an overnight incubation at 37° C. Digests were stopped by the addition of 2% TFA.

[0144] If peptides are chemically synthesized instead of produced as QconCATs, then the peptides are added to samples following trypsin digestion. Also, QconCATs can be digested separately from samples and added as peptides following the digestion step as if they were chemically synthesized peptides. The addition of peptides post-digestion works with or without ovalbumin as an internal standard added during the extraction method. However, adding ovalbumin or intact QconCATs early in the extraction method is preferable to adding only peptides post-digestion because the added proteins effectively account for non-specific protein losses during sample processing.

[0145] Mass Spectrometric Analysis

[0146] Following digestion, the peptides were subjected to mass spectrometric analysis according to Aspinwall et al. (2019).

[0147] Briefly, 0.2 μg peptides per sample were analyzed by SWATH LC-MS/MS on a Sciex TripleTOF 6600 according to Cain et al. (2019) with the following modifications. The column was 10 centimeters and was run at room temperature. The acquisition LC gradient was 60 minutes. Sixty (60) variable width SWATH windows were used.

[0148] Using SWATH to analyze samples that include isotope labeled standards differs from more typical targeted mass spectrometry methods such as Selected Reaction Monitoring (SRM). SRM sets the mass spectrometer to measure only targeted analytes and their corresponding internal standards. SWATH captures data for all observable peptides in a sample; afterwards, data for the target analytes and internal standards are extracted using software. SWATH data therefore allow additional proteins not represented by internal standards to be analyzed by other means, if desired, without having to re-run the sample on a mass spectrometer.

[0149] SWATH Data Analysis

[0150] SWATH data were analyzed using MultiQuant software (Sciex), which extracts and integrates chromatograms for individual target peptide fragment ions. A list of target fragment ions, four per peptide for each target peptide and four for each isotope labeled standard, was created manually and used for the MultiQuant integration method. Example target peptide fragment ions (transitions) are shown in Table 5. The data in Table 5 can be used to create a Selected Reaction Monitoring method to target peptides with a mass spectrometer method, as opposed to extracting those data from SWATH results. The resulting outputs, integrated peak areas for each fragment ion of interest, were exported to Excel.

TABLE-US-00005 TABLE 5 Sample target peptide fragment ions (transitions)

protein_name   peptide                 QconCAT #   Retention time   precursor m/z   fragment m/z
GAPB           AAALNIVPTSTGAAK         1           20.8             692.8934        732.3887
GAPB           AAALNIVPTSTGAAK         1           20.8             692.8934        831.457
GAPB           AAALNIVPTSTGAAK         1           20.8             692.8934        1058.584
GAPB           AAALNIVPTSTGAAK         1           20.8             692.8934        944.5411
GAPB           AAALNIVPTSTGAAK[+08]    1           20.8             696.9005        740.4028
GAPB           AAALNIVPTSTGAAK[+08]    1           20.8             696.9005        839.4713
GAPB           AAALNIVPTSTGAAK[+08]    1           20.8             696.9005        1066.598
GAPB           AAALNIVPTSTGAAK[+08]    1           20.8             696.9005        952.5553
Actin          AGFAGDDAPR              1           9                488.7278        630.2842
Actin          AGFAGDDAPR              1           9                488.7278        701.3213
Actin          AGFAGDDAPR              1           9                488.7278        458.2358
Actin          AGFAGDDAPR              1           9                488.7278        573.2627
Actin          AGFAGDDAPR[+10]         1           9                493.7319        640.2924
Actin          AGFAGDDAPR[+10]         1           9                493.7319        711.3296
Actin          AGFAGDDAPR[+10]         1           9                493.7319        468.244
Actin          AGFAGDDAPR[+10]         1           9                493.7319        583.271
Histone H2A    AGLQFPVGR               1           23.7             472.7693        575.33
Histone H2A    AGLQFPVGR               1           23.7             472.7693        428.2616
Histone H2A    AGLQFPVGR               1           23.7             472.7693        703.3886
Histone H2A    AGLQFPVGR               1           23.7             472.7693        352.1979
Histone H2A    AGLQFPVGR[+10]          1           23.7             477.7734        585.3383
Histone H2A    AGLQFPVGR[+10]          1           23.7             477.7734        438.2699
Histone H2A    AGLQFPVGR[+10]          1           23.7             477.7734        713.3969
Histone H2A    AGLQFPVGR[+10]          1           23.7             477.7734        357.2021
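The relationship between the unlabeled and labeled m/z values in Table 5 can be checked with a short calculation. The sketch below assumes that the C-terminal lysine or arginine of each QconCAT1 peptide is uniformly labeled (six 13C and two 15N atoms for lysine, six 13C and four 15N atoms for arginine), which is consistent with the [+08] and [+10] annotations but is not stated explicitly in the text. The function names are illustrative.

```python
# Mass differences between heavy and light isotopes, in daltons.
DELTA_13C = 1.0033548   # 13C minus 12C
DELTA_15N = 0.9970349   # 15N minus 14N

# (carbon atoms, nitrogen atoms) per residue, assumed fully labeled.
LABELED_ATOMS = {"K": (6, 2), "R": (6, 4)}


def label_shift(residue):
    """Mass added by a fully 13C/15N-labeled lysine (K) or arginine (R)."""
    carbons, nitrogens = LABELED_ATOMS[residue]
    return carbons * DELTA_13C + nitrogens * DELTA_15N


def heavy_mz(light_mz, residue, charge):
    """Expected m/z of the labeled ion, given the unlabeled m/z and charge state."""
    return light_mz + label_shift(residue) / charge


# GAPB precursor (2+), lysine label: Table 5 lists 692.8934 -> 696.9005
gapb_precursor = heavy_mz(692.8934, "K", 2)
# GAPB fragment (1+), lysine label: Table 5 lists 732.3887 -> 740.4028
gapb_fragment = heavy_mz(732.3887, "K", 1)
# Actin precursor (2+), arginine label: Table 5 lists 488.7278 -> 493.7319
actin_precursor = heavy_mz(488.7278, "R", 2)
```

The same function can be used to generate labeled transitions for other peptides in a Selected Reaction Monitoring method, provided the charge state of each precursor and fragment ion is known.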

[0151] Data Analysis Workflow

[0152] Target:standard ratios were calculated for each pair of unlabeled:labeled ions, then the ratios were averaged for each peptide, producing a ratio of moles of target per moles of QconCAT. Those ratios were converted to moles of target protein per cm.sup.2 using ion areas from unlabeled ovalbumin (added on a per leaf area basis during protein extraction) and the corresponding ovalbumin peptides in the QconCATs. For target proteins that are not part of conserved complexes (e.g., the complexes below), the amounts of protein in grams per leaf area were calculated by multiplying moles by the molecular weight of the corresponding Arabidopsis reference protein. Arabidopsis protein molecular weights were used for all plant species because the structural annotation of Arabidopsis is better than that of most species and molecular weights of homologs are likely largely conserved. Functional annotations were assigned based on the reference Arabidopsis proteins in the MapMan functional annotation scheme (available at the MapMan Site of Analysis website).
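The workflow in paragraph [0152] can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes that all peptides of a QconCAT are present in equimolar amounts, so that the unknown amount of QconCAT cancels when the target ratio is divided by the ovalbumin ratio. All function names and example numbers are hypothetical.

```python
from statistics import mean


def mean_ratio(unlabeled_areas, labeled_areas):
    """Average the unlabeled:labeled peak-area ratio over paired fragment ions."""
    if len(unlabeled_areas) != len(labeled_areas):
        raise ValueError("each unlabeled ion needs a matching labeled ion")
    return mean(u / l for u, l in zip(unlabeled_areas, labeled_areas))


def moles_per_area(target_ratio, ovalbumin_ratio, ovalbumin_mol_per_area):
    """Convert a target:QconCAT ratio to moles of target per unit leaf area.

    ASSUMPTION: peptides in the QconCAT are equimolar, so
    (mol target / mol QconCAT) / (mol ovalbumin / mol QconCAT)
    equals mol target / mol ovalbumin.
    """
    return target_ratio / ovalbumin_ratio * ovalbumin_mol_per_area


def grams_per_area(mol_per_area, molecular_weight):
    """Multiply moles by the reference protein molecular weight (g/mol)."""
    return mol_per_area * molecular_weight


# Hypothetical peak areas for four transitions of one target peptide.
target = mean_ratio([2.0e5, 1.0e5, 4.0e5, 3.0e5], [1.0e5, 0.5e5, 2.0e5, 1.5e5])
ovalbumin = mean_ratio([1.0e5, 2.0e5], [2.0e5, 4.0e5])
amount = moles_per_area(target, ovalbumin, ovalbumin_mol_per_area=1.0e-9)
print(amount, grams_per_area(amount, molecular_weight=50000.0))
```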

[0153] For proteins that are subunits of complexes with highly conserved stoichiometry (e.g., the photosystems, ATP synthase, ribosomes, histones, etc.), the molar ratios of those proteins per complex were calculated from publicly available data such as the RCSB Protein Data Bank. Additional protein subunits in the complexes were also identified in the MapMan scheme from publicly available data, thereby identifying what subunits are effectively quantified by peptides in the QconCATs because they are all part of the same complex with known stoichiometry (shown in Table 7 below). The peptides in the QconCATs represent 25 reference subunits of the complexes listed in Table 7, which, by extension through known complex stoichiometries, cover 167 total complex subunits. Gram amounts of complexes per leaf area were calculated based on the molecular weights of the complexes from publicly available sources.
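The conversion from reference subunit amounts to complex amounts can be sketched with the stoichiometries and molecular weights in Table 7. The sketch assumes that estimates from several reference subunits of one complex are simply averaged; the disclosure does not say how they are combined, and the function names are hypothetical.

```python
def complex_amount(subunit_amounts, copies_per_complex):
    """Estimate the amount of complex from its reference subunits.

    Each subunit amount is divided by its copy number per complex, and the
    per-subunit estimates are averaged (ASSUMPTION: simple mean).
    """
    estimates = [a / c for a, c in zip(subunit_amounts, copies_per_complex)]
    return sum(estimates) / len(estimates)


def nmol_to_mg(nmol, molecular_weight):
    """Convert nanomoles to milligrams given a molecular weight in g/mol."""
    return nmol * molecular_weight * 1e-6


# Rubisco: one reference subunit with 8 copies per complex, MW 541468 (Table 7).
# The subunit amount of 27536 nmol per m^2 is a hypothetical input chosen so
# that the complex amount equals the 3442 nmol per m^2 reported for cotton.
rubisco_nmol = complex_amount([27536.0], [8])
print(rubisco_nmol, round(nmol_to_mg(rubisco_nmol, 541468), 1))
```

The milligram value obtained this way agrees with the cotton Rubisco entry in Table 8 to within rounding.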

[0154] Results

[0155] Amounts of proteins and protein complexes in nanomoles per m.sup.2 leaf area, plus or minus one standard deviation, for leaf samples from Flooded gum, Bean, and Corn, are shown in Table 6 below. These three species are all examples from the 12 training species used to identify conserved peptides. Samples were extracted and analyzed in triplicate, splitting one leaf into three samples, to demonstrate the technical precision of the method. The average percentage coefficients of variation for Flooded gum, Bean, and Corn were 10%, 9%, and 11%, respectively.
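The precision figures quoted above are average percentage coefficients of variation. A minimal sketch of that calculation follows; it assumes the sample standard deviation (n - 1 denominator) is used, which the text does not specify, and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(replicates):
    """Percentage coefficient of variation: 100 * SD / mean."""
    return 100.0 * stdev(replicates) / mean(replicates)


def average_percent_cv(replicates_by_protein):
    """Average the per-protein percentage CVs for one species."""
    return mean(percent_cv(values) for values in replicates_by_protein.values())


# Hypothetical triplicate measurements (nmol per m^2) for two proteins.
example = {
    "protein A": [90.0, 100.0, 110.0],
    "protein B": [190.0, 200.0, 210.0],
}
print(round(average_percent_cv(example), 1))
```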

TABLE 6. Amounts of proteins and protein complexes in nmoles per m.sup.2 leaf area from leaf samples from flooded gum, bean, and corn

MapMan bin | MapMan name | Protein or complex | Flooded gum, nmol per m.sup.2 | Bean, nmol per m.sup.2 | Corn, nmol per m.sup.2
1.1.1.2.1 | Photosynthesis.photophosphorylation.photosystem II.PS-II complex.reaction center complex | PSII complex | 1217 ± 168 | 587 ± 32 | 936 ± 104
1.1.1.5.1.2.1 | Photosynthesis.photophosphorylation.photosystem II.photoprotection.non-photochemical quenching (NPQ).PsbS-dependent machinery.regulatory protein (PsbS) | PsbS | 881 ± 92 | 482 ± 35 | 34 ± 0
1.1.2 | Photosynthesis.photophosphorylation.cytochrome b6/f complex | Cytochrome b6/f | 589 ± 96 | 370 ± 28 | 567 ± 66
1.1.4.2 | Photosynthesis.photophosphorylation.photosystem I.PS-I complex | PSI complex | 524 ± 87 | 190 ± 27 | 357 ± 47
1.1.5.2.1 | Photosynthesis.photophosphorylation.linear electron flow.ferredoxin-NADP reductase (FNR) activity.ferredoxin-NADP oxidoreductase | FNR | 22 ± 3 | 273 ± 15 | 89 ± 10
1.1.8.1.6.2 | Photosynthesis.photophosphorylation.chlororespiration.NADH dehydrogenase-like (NDH) complex.assembly and stabilization.Cpn60 chaperonin heterodimer | Cnp60 complex | 42 ± 3 | 60 ± 3 | 36 ± 4
1.1.9 | Photosynthesis.photophosphorylation.ATP synthase complex | ATP synthase complex | 438 ± 38 | 325 ± 17 | 638 ± 70
1.2.1.1 | Photosynthesis.calvin cycle.ribulose-1,5-bisphosphat carboxylase/oxygenase (RuBisCo) activity.RuBisCo heterodimer | Rubisco complex | 3733 ± 433 | 3476 ± 223 | 1129 ± 128
1.2.1.2.1 | Photosynthesis.calvin cycle.ribulose-1,5-bisphosphat carboxylase/oxygenase (RuBisCo) activity.RuBisCo assembly.CPN60 assembly chaperone complex | Cnp60 complex | 42 ± 3 | 60 ± 3 | 36 ± 4
1.2.1.3.2 | Photosynthesis.calvin cycle.ribulose-1,5-bisphosphat carboxylase/oxygenase (RuBisCo) activity.RuBisCo regulation.ATP-dependent activase (RCA) | RCA | 2803 ± 89 | 2891 ± 170 | 563 ± 70
1.2.2 | Photosynthesis.calvin cycle.phosphoglycerate kinase | PGK both | 84 ± 6 | 540 ± 23 | 1071 ± 149
1.2.2 | Photosynthesis.calvin cycle.phosphoglycerate kinase | PGK chloroplast | 569 ± 92 | 513 ± 229 | 1316 ± 176
1.2.3 | Photosynthesis.calvin cycle.glyceraldehyde 3-phosphate dehydrogenase | GAP | 254 ± 24 | 156 ± 7 | 365 ± 42
1.2.5 | Photosynthesis.calvin cycle.fructose 1,6-bisphosphate aldolase | FBA chloroplast | 1347 ± 62 | 937 ± 63 | 2320 ± 230
1.2.6 | Photosynthesis.calvin cycle.fructose-1,6-bisphosphatase | FBPase | 271 ± 46 | 137 ± 8 | 268 ± 32
1.2.7 | Photosynthesis.calvin cycle.transketolase | Transketolase | 459 ± 40 | 351 ± 18 | 6 ± 1
1.2.8 | Photosynthesis.calvin cycle.sedoheptulose-1,7-bisphosphatase | SBPase | 376 ± 28 | 252 ± 10 | 359 ± 36
1.3.1 | Photosynthesis.photorespiration.phosphoglycolate phosphatase | PGLP | 147 ± 18 | 100 ± 5 | 36 ± 3
1.3.2 | Photosynthesis.photorespiration.glycolate oxidase | GLO | 246 ± 33 | 611 ± 295 | 123 ± 15
1.3.3.1 | Photosynthesis.photorespiration.aminotransferase activities.glutamate-glyoxylate transaminase | GGT | 242 ± 20 | 169 ± 10 | 58 ± 6
1.3.3.2 | Photosynthesis.photorespiration.aminotransferase activities.serine-glyoxylate transaminase | AGT | 551 ± 40 | 250 ± 13 | 8 ± 0
1.3.4.1 | Photosynthesis.photorespiration.glycine decarboxylase complex.glycine dehydrogenase component P-protein | GLDP | 1180 ± 290 | 350 ± 13 | 66 ± 14
1.3.4.2 | Photosynthesis.photorespiration.glycine decarboxylase complex.aminomethyltransferase component T-protein | GDCST | 493 ± 33 | 157 ± 7 | 5 ± 1
1.3.5 | Photosynthesis.photorespiration.serine hydroxymethyltransferase (SHM) | SHM | 425 ± 15 | 225 ± 11 | 44 ± 3
1.3.6 | Photosynthesis.photorespiration.hydroxypyruvate reductase (HPR) | HPR | 172 ± 5 | 103 ± 11 | 38 ± 5
1.4.1.1 | Photosynthesis.CAM/C4 photosynthesis.phosphoenolpyruvate (PEP) carboxylase activity.PEP carboxylase | PEPC | 73 ± 3 | 53 ± 2 | 2829 ± 350
1.4.2 | Photosynthesis.CAM/C4 photosynthesis.NAD-dependent malate dehydrogenase | MDH | 150 ± 15 | 95 ± 7 | 196 ± 19
2.1.1.2 | Cellular respiration.glycolysis.cytosolic glycolysis.aldolase | FBA8 | 338 ± 31 | 186 ± 13 | 99 ± 11
2.1.1.4.1 | Cellular respiration.glycolysis.cytosolic glycolysis.glyceraldehyde 3-phosphate dehydrogenase activities.NAD-dependent glyceraldehyde 3-phosphate dehydrogenase | GAPC2 | 305 ± 12 | 183 ± 6 | 616 ± 80
2.4.6 | Cellular respiration.oxidative phosphorylation.ATP synthase complex | ATP synthase mitochondrial | 78 ± 6 | 31 ± 2 | 45 ± 2
3.1.2.2 | Carbohydrate metabolism.sucrose metabolism.biosynthesis.cytosolic fructose-bisphosphate aldolase | FBA8 | 338 ± 31 | 186 ± 13 | 99 ± 11
3.2.2.3 | Carbohydrate metabolism.starch metabolism.biosynthesis.ADP-glucose pyrophosphorylase | ADG1 | 151 ± 23 | 82 ± 4 | 130 ± 13
3.9.2.3 | Carbohydrate metabolism.oxidative pentose phosphate pathway.non-oxidative phase.transketolase | Transketolase | 459 ± 40 | 351 ± 18 | 6 ± 1
3.12.2 | Carbohydrate metabolism.plastidial glycolysis.fructose-1,6-bisphosphate aldolase | FBA chloroplast | 1347 ± 62 | 937 ± 63 | 2320 ± 230
3.12.5 | Carbohydrate metabolism.plastidial glycolysis.phosphoglycerate kinase | PGK both | 84 ± 6 | 540 ± 23 | 1071 ± 149
3.12.5 | Carbohydrate metabolism.plastidial glycolysis.phosphoglycerate kinase | PGK chloroplast | 569 ± 92 | 513 ± 229 | 1316 ± 176
4.1.2.1.3 | Amino acid metabolism.biosynthesis.aspartate family.asparagine.asparagine aminotransaminase | AGT | 551 ± 40 | 250 ± 13 | 8 ± 0
4.1.2.2.6.2.1 | Amino acid metabolism.biosynthesis.aspartate family.aspartate-derived amino acids.methionine.L-homocysteine S-methyltransferase activities.methyl-tetrahydrofolate-dependent methionine synthase | ATCIMS | 22 ± 3 | 39 ± 3 | 50 ± 8
5.1.1.3 | Lipid metabolism.fatty acid biosynthesis.citrate shuttle.cytosolic NAD-dependent malate dehydrogenase | MDH | 150 ± 15 | 95 ± 7 | 196 ± 19
10.2.1 | Redox homeostasis.enzymatic reactive oxygen species scavengers.catalase | Catalase | 116 ± 50 | 132 ± 75 | 9 ± 1
12.1 | Chromatin organisation.histones | Histone complex | 169 ± 17 | 53 ± 5 | 218 ± 26
17.1.2 | Protein biosynthesis.ribosome biogenesis.large ribosomal subunit (LSU) | Ribosome complex | 104 ± 9 | 74 ± 8 | 102 ± 11
17.4.2 | Protein biosynthesis.translation initiation.mRNA loading | EIF4 | 128 ± 12 | 54 ± 7 | 87 ± 8
17.5.1.1 | Protein biosynthesis.translation elongation.eEF1 aminoacyl-tRNA binding factor activity.aminoacyl-tRNA binding factor (eEF1A) | eEF1A | 559 ± 40 | 295 ± 18 | 553 ± 79
17.5.2.1 | Protein biosynthesis.translation elongation.eEF2 mRNA-translocation factor activity.mRNA-translocation factor (eEF2) | eEF2 | 97 ± 2 | 57 ± 1 | 99 ± 11
18.4.25.2 | Protein modification.phosphorylation.aspartate-based protein phosphatase superfamily.phosphatase (CIN) | PGLP | 147 ± 18 | 100 ± 5 | 36 ± 3
19.1.5.1 | Protein homeostasis.protein quality control.cytosolic Hsp70 chaperone system.chaperone (Hsp70) | HSP70-1 | 300 ± 10 | 124 ± 8 | 161 ± 18
19.1.7 | Protein homeostasis.protein quality control.Hsp60 chaperone system | Cnp60 complex | 42 ± 3 | 60 ± 3 | 36 ± 4
19.4.2.9.4 | Protein homeostasis.proteolysis.serine-type peptidase activities.chloroplast Clp-type protease complex.chaperone component ClpC | ClpC1 | 112 ± 12 | 83 ± 3 | 100 ± 9
20.2.1 | Cytoskeleton organisation.microfilament network.actin filament protein | Actin | 194 ± 23 | 132 ± 8 | 166 ± 15
24.1.1 | Solute transport.primary active transport.V-type ATPase complex | ATP synthase vacuolar | 13 ± 1 | 10 ± 0 | 14 ± 2
25.1.5.1.1 | Nutrient uptake.nitrogen assimilation.ammonium assimilation.glutamine synthetase activities.cytosolic glutamine synthetase (GLN1) | GSR1 | 785 ± 72 | 20 ± 3 | 110 ± 15
25.1.5.1.2 | Nutrient uptake.nitrogen assimilation.ammonium assimilation.glutamine synthetase activities.plastidial glutamine synthetase (GLN2) | GS2 | 1268 ± 288 | 1375 ± 91 | 268 ± 68
25.1.5.2.1 | Nutrient uptake.nitrogen assimilation.ammonium assimilation.glutamate synthase activities.Fd-dependent glutamate synthase | GLU1 | 130 ± 18 | 98 ± 4 | 6 ± 0
50.4.2 | Enzyme classification.EC_4 lyases.EC_4.2 carbon-oxygen lyase | Enolase | 236 ± 15 | 99 ± 7 | 186 ± 18

TABLE 7. Complexes quantified in Examples 5 and 6

Complex | Complex abbreviation | Complex MapMan bin | Subunit MapMan bins in the entire complex | Reference subunits | Reference subunit MapMan bin | Reference subunit copies per complex | Complex reference subunit ratio | Complex MW | Number of gene products in complex
Photosystem II | PSII | 1.1.1.2 | 1.1.1.2.1 to 1.1.1.2.2.2.2; 1.1.1.2.3 to 1.1.1.2.15 | atcg00020.1, atcg00270.1, atcg00680.1, atcg00280.1 | 1.1.1.2.1.1, 1.1.1.2.1.2, 1.1.1.2.1.3, 1.1.1.2.1.4 | 1, 1, 1, 1 | 1 | 331496 | 22
Cytochrome b6f | b6f | 1.1.2 | 1.1.2.1 to 1.1.2.8 | atcg00540.1, atcg00720.1 | 1.1.2.1, 1.1.2.2 | 1, 1 | 1 | 106448 | 8
Photosystem I | PSI | 1.1.4.2 | 1.1.4.2.1 to 1.1.4.2.12, 1.1.4.2.14 | atcg00350.1, atcg00340.1 | 1.1.4.2.1, 1.1.4.2.2 | 1, 1 | 1 | 298740 | 14
Chloroplast chaperonin Cnp60 | Cnp60 | 1.1.8.1.6.1 | 1.1.8.1.6.1.1, 1.1.8.1.6.1.2 | at1g55490.2 | 1.1.8.1.6.1.2 | 3 | 0.333333 | 822645 | 3
ATP synthase chloroplastic | ATP synthase chloroplastic | 1.1.9 | 1.1.9.1 to 1.1.9.2.5 | atcg00480.1 | 1.1.9.2.2 | 3 | 0.333333 | 569743 | 9
Rubisco | Rubisco | 1.2.1.1 | 1.2.1.1.1, 1.2.1.1.2 | atcg00490.1 | 1.2.1.1.1 | 8 | 0.125 | 541468 | 2
Chloroplastic glyceraldehyde 3-phosphate dehydrogenase | GAP chloroplast | 1.2.3 | 1.2.3 | at1g42970.1, at3g26650.1, at1g12900.4 | 1.2.3 | 4 | 0.25 | 152622 | 1
Cytosolic glyceraldehyde 3-phosphate dehydrogenase | GAP cytosolic | 2.1.4.1 | 2.1.4.1 | at1g13440 | 2.1.4.1 | 4 | 0.25 | 147657 | 1
Mitochondrial ATP synthase | Mitochondrial ATP synthase | 2.5.6 | 2.5.6.1 to 2.5.6.2.6 | at2g07698.1, at5g08680.1 | 2.5.6.2.1, 2.5.6.2.2 | 3, 3 | 0.333333 | 604886 | 13
ADP-glucose pyrophosphorylase | ADG | 3.2.1 | 3.2.1.3 | at5g48300.1 | 3.2.1.3 | 2 | 0.5 | 202388 | 2
Histones | Histones | 12.1 | 12.1.1 to 12.1.5 | at1g54690.1, at5g59970.1 | 12.1.2, 12.1.5 | 2, 2 | 0.5 | 144073 | 5
Cytosolic ribosome | Ribosome | 17.1 | 17.1.1 to 17.1.2.1.33 | at3g53430.1, at5g02960.1 | 17.1.1.1.12, 17.1.2.1.24 | 1, 1 | 1 | 1330626 | 71
Eukaryotic initiation factor-4A | EIF4A | 17.3.2.1 | 17.3.2.1, 17.3.2.3.1, 17.3.2.3.2 | at3g13920.1 | 17.3.2.1 | 1 | 1 | 261013 | 3
Vacuolar ATP synthase | Vacuolar ATP synthase | 24.2.1 | 24.2.1 to 24.2.1.2.8 | at1g78900.2, at1g76030.1 | 24.2.1.2.1, 24.2.1.2.2 | 3, 3 | 0.333333 | 797895 | 13
Total | | | | 25 reference subunits | | | | | 167

Example 6

Measurement of Leaf Proteins for Two Species Outside the Training Set of 12 Vascular Plant Species

[0156] Two species, Cotton (Gossypium hirsutum) and Myoporum montanum, not in the training set used to identify conserved plant proteins, and not in orders represented in the training set, were analyzed using the methods in Example 5. The species were analyzed in triplicate, one leaf sample per plant from three plants. Table 8 below shows protein and complex amounts in mg per m.sup.2 leaf area in addition to nmoles per m.sup.2 leaf area. The average percentage coefficients of variation for cotton and Myoporum were 28% and 12%, respectively. These CVs, which are larger than those for the species in Example 5, may reflect biological variation across the triplicate plants.

TABLE 8. Protein and complex in mg per m.sup.2 leaf area

MapMan bin | MapMan name | Protein or complex | Cotton, nmol per m.sup.2 | Cotton, mg per m.sup.2 | Myoporum montanum, nmol per m.sup.2 | Myoporum montanum, mg per m.sup.2
1.1.1.2.1 | Photosynthesis.photophosphorylation.photosystem II.PS-II complex.reaction center complex | PSII complex | 771 ± 104 | 255.5 ± 34.6 | 1906 ± 202 | 631.8 ± 67.1
1.1.1.5.1.2.1 | Photosynthesis.photophosphorylation.photosystem II.photoprotection.non-photochemical quenching (NPQ).PsbS-dependent machinery.regulatory protein (PsbS) | PsbS | 449 ± 114 | 9.7 ± 2.5 | 1858 ± 76 | 40.1 ± 1.6
1.1.2 | Photosynthesis.photophosphorylation.cytochrome b6/f complex | Cytochrome b6/f | 466 ± 229 | 49.6 ± 24.3 | 702 ± 111 | 74.7 ± 11.8
1.1.4.2 | Photosynthesis.photophosphorylation.photosystem I.PS-I complex | PSI complex | 427 ± 5 | 127.4 ± 1.6 | 770 ± 150 | 230 ± 44.9
1.1.5.2.1 | Photosynthesis.photophosphorylation.linear electron flow.ferredoxin-NADP reductase (FNR) activity.ferredoxin-NADP oxidoreductase | FNR | 6 ± 1 | 0.2 ± 0 | 774 ± 108 | 27.2 ± 3.8
1.1.8.1.6.2 | Photosynthesis.photophosphorylation.chlororespiration.NADH dehydrogenase-like (NDH) complex.assembly and stabilization.Cpn60 chaperonin heterodimer | Cnp60 complex | 42 ± 23 | 34.9 ± 18.7 | 68 ± 7 | 55.7 ± 5.4
1.1.9 | Photosynthesis.photophosphorylation.ATP synthase complex | ATP synthase complex | 307 ± 92 | 174.9 ± 52.3 | 718 ± 84 | 408.9 ± 48.1
1.2.1.1 | Photosynthesis.calvin cycle.ribulose-1,5-bisphosphat carboxylase/oxygenase (RuBisCo) activity.RuBisCo heterodimer | Rubisco complex | 3442 ± 1184 | 1863.9 ± 641.4 | 10012 ± 592 | 5420.9 ± 320.5
1.2.1.2.1 | Photosynthesis.calvin cycle.ribulose-1,5-bisphosphat carboxylase/oxygenase (RuBisCo) activity.RuBisCo assembly.CPN60 assembly chaperone complex | Cnp60 complex | 42 ± 23 | 34.9 ± 18.7 | 68 ± 7 | 55.7 ± 5.4
1.2.1.3.2 | Photosynthesis.calvin cycle.ribulose-1,5-bisphosphat carboxylase/oxygenase (RuBisCo) activity.RuBisCo regulation.ATP-dependent activase (RCA) | RCA | 2637 ± 927 | 122 ± 42.9 | 3654 ± 863 | 169.1 ± 39.9
1.2.2 | Photosynthesis.calvin cycle.phosphoglycerate kinase | PGK both | 470 ± 160 | 20.1 ± 6.8 | 1347 ± 120 | 57.4 ± 5.1
1.2.2 | Photosynthesis.calvin cycle.phosphoglycerate kinase | PGK chloroplast | 456 ± 139 | 19.4 ± 5.9 | 2947 ± 487 | 125.7 ± 20.8
1.2.3 | Photosynthesis.calvin cycle.glyceraldehyde 3-phosphate dehydrogenase | GAP | 175 ± 70 | 26.7 ± 10.7 | 384 ± 38 | 58.6 ± 5.7
1.2.5 | Photosynthesis.calvin cycle.fructose 1,6-bisphosphate aldolase | FBA chloroplast | 912 ± 189 | 34.7 ± 7.2 | 3736 ± 187 | 142 ± 7.1
1.2.6 | Photosynthesis.calvin cycle.fructose-1,6-bisphosphatase | FBPase | 111 ± 25 | 4.3 ± 1 | 482 ± 47 | 18.8 ± 1.8
1.2.7 | Photosynthesis.calvin cycle.transketolase | Transketolase | 288 ± 89 | 21 ± 6.5 | 29 ± 15 | 2.1 ± 1.1
1.2.8 | Photosynthesis.calvin cycle.sedoheptulose-1,7-bisphosphatase | SBPase | 211 ± 56 | 7.3 ± 1.9 | 520 ± 45 | 18 ± 1.6
1.3.1 | Photosynthesis.photorespiration.phosphoglycolate phosphatase | PGLP | 109 ± 41 | 3.7 ± 1.4 | 267 ± 12 | 9.1 ± 0.4
1.3.2 | Photosynthesis.photorespiration.glycolate oxidase | GLO | 468 ± 92 | 18.9 ± 3.7 | 2179 ± 839 | 87.9 ± 33.8
1.3.3.1 | Photosynthesis.photorespiration.aminotransferase activities.glutamate-glyoxylate transaminase | GGT | 264 ± 92 | 14.1 ± 4.9 | 524 ± 65 | 27.9 ± 3.5
1.3.3.2 | Photosynthesis.photorespiration.aminotransferase activities.serine-glyoxylate transaminase | AGT | 413 ± 87 | 18.3 ± 3.8 | 1057 ± 92 | 46.7 ± 4
1.3.4.1 | Photosynthesis.photorespiration.glycine decarboxylase complex.glycine dehydrogenase component P-protein | GLDP | 542 ± 242 | 57 ± 25.4 | 1661 ± 317 | 174.8 ± 33.3
1.3.4.2 | Photosynthesis.photorespiration.glycine decarboxylase complex.aminomethyltransferase component T-protein | GDCST | 248 ± 44 | 10.3 ± 1.8 | 488 ± 25 | 20.4 ± 1.1
1.3.5 | Photosynthesis.photorespiration.serine hydroxymethyltransferase (SHM) | SHM | 236 ± 54 | 12.8 ± 2.9 | 1180 ± 81 | 63.7 ± 4.4
1.3.6 | Photosynthesis.photorespiration.hydroxypyruvate reductase (HPR) | HPR | 104 ± 22 | 4.4 ± 0.9 | 506 ± 41 | 21.4 ± 1.7
1.4.1.1 | Photosynthesis.CAM/C4 photosynthesis.phosphoenolpyruvate (PEP) carboxylase activity.PEP carboxylase | PEPC | 40 ± 9 | 4.4 ± 1 | 144 ± 17 | 15.8 ± 1.8
1.4.2 | Photosynthesis.CAM/C4 photosynthesis.NAD-dependent malate dehydrogenase | MDH | 56 ± 11 | 2 ± 0.4 | 366 ± 17 | 13 ± 0.6
2.1.1.2 | Cellular respiration.glycolysis.cytosolic glycolysis.aldolase | FBA8 | 193 ± 70 | 7.4 ± 2.7 | 950 ± 13 | 36.5 ± 0.5
2.1.1.4.1 | Cellular respiration.glycolysis.cytosolic glycolysis.glyceraldehyde 3-phosphate dehydrogenase activities.NAD-dependent glyceraldehyde 3-phosphate dehydrogenase | GAPC2 | 198 ± 52 | 29.3 ± 7.7 | 694 ± 83 | 102.5 ± 12.2
2.4.6 | Cellular respiration.oxidative phosphorylation.ATP synthase complex | ATP synthase mitochondrial | 28 ± 2 | 16.7 ± 1.3 | 118 ± 9 | 71.2 ± 5.6
3.1.2.2 | Carbohydrate metabolism.sucrose metabolism.biosynthesis.cytosolic fructose-bisphosphate aldolase | FBA8 | 193 ± 70 | 7.4 ± 2.7 | 950 ± 13 | 36.5 ± 0.5
3.2.2.3 | Carbohydrate metabolism.starch metabolism.biosynthesis.ADP-glucose pyrophosphorylase | ADG1 | 100 ± 45 | 20.2 ± 9.1 | 194 ± 7 | 39.2 ± 1.4
3.9.2.3 | Carbohydrate metabolism.oxidative pentose phosphate pathway.non-oxidative phase.transketolase | Transketolase | 288 ± 89 | 21 ± 6.5 | 29 ± 15 | 2.1 ± 1.1
3.12.2 | Carbohydrate metabolism.plastidial glycolysis.fructose-1,6-bisphosphate aldolase | FBA chloroplast | 912 ± 189 | 34.7 ± 7.2 | 3736 ± 187 | 142 ± 7.1
3.12.5 | Carbohydrate metabolism.plastidial glycolysis.phosphoglycerate kinase | PGK both | 470 ± 160 | 20.1 ± 6.8 | 1347 ± 120 | 57.4 ± 5.1
3.12.5 | Carbohydrate metabolism.plastidial glycolysis.phosphoglycerate kinase | PGK chloroplast | 456 ± 139 | 19.4 ± 5.9 | 2947 ± 487 | 125.7 ± 20.8
4.1.2.1.3 | Amino acid metabolism.biosynthesis.aspartate family.asparagine.asparagine aminotransaminase | AGT | 413 ± 87 | 18.3 ± 3.8 | 1057 ± 92 | 46.7 ± 4
4.1.2.2.6.2.1 | Amino acid metabolism.biosynthesis.aspartate family.aspartate-derived amino acids.methionine.L-homocysteine S-methyltransferase activities.methyl-tetrahydrofolate-dependent methionine synthase | ATCIMS | 3 ± 1 | 0.3 ± 0 | 100 ± 23 | 8.4 ± 1.9
5.1.1.3 | Lipid metabolism.fatty acid biosynthesis.citrate shuttle.cytosolic NAD-dependent malate dehydrogenase | MDH | 56 ± 11 | 2 ± 0.4 | 366 ± 17 | 13 ± 0.6
10.2.1 | Redox homeostasis.enzymatic reactive oxygen species scavengers.catalase | Catalase | 134 ± 28 | 7.6 ± 1.6 | 211 ± 35 | 12 ± 2
12.1 | Chromatin organisation.histones | Histone complex | 207 ± 29 | 29.8 ± 4.2 | 836 ± 130 | 120.4 ± 18.7
17.1.2 | Protein biosynthesis.ribosome biogenesis.large ribosomal subunit (LSU) | Ribosome complex | 89 ± 42 | 118.2 ± 56.5 | 186 ± 16 | 246.9 ± 20.7
17.4.2 | Protein biosynthesis.translation initiation.mRNA loading | EIF4 | 52 ± 7 | 13.7 ± 1.8 | 177 ± 2 | 46.3 ± 0.6
17.5.1.1 | Protein biosynthesis.translation elongation.eEF1 aminoacyl-tRNA binding factor activity.aminoacyl-tRNA binding factor (eEF1A) | eEF1A | 370 ± 99 | 18.3 ± 4.9 | 882 ± 48 | 43.7 ± 2.4
17.5.2.1 | Protein biosynthesis.translation elongation.eEF2 mRNA-translocation factor activity.mRNA-translocation factor (eEF2) | eEF2 | 76 ± 23 | 7.1 ± 2.1 | 151 ± 9 | 14.1 ± 0.9
18.4.25.2 | Protein modification.phosphorylation.aspartate-based protein phosphatase superfamily.phosphatase (CIN) | PGLP | 109 ± 41 | 3.7 ± 1.4 | 267 ± 12 | 9.1 ± 0.4
19.1.5.1 | Protein homeostasis.protein quality control.cytosolic Hsp70 chaperone system.chaperone (Hsp70) | HSP70-1 | 138 ± 22 | 9.9 ± 1.6 | 614 ± 116 | 43.7 ± 8.2
19.1.7 | Protein homeostasis.protein quality control.Hsp60 chaperone system | Cnp60 complex | 42 ± 23 | 34.9 ± 18.7 | 68 ± 7 | 55.7 ± 5.4
19.4.2.9.4 | Protein homeostasis.proteolysis.serine-type peptidase activities.chloroplast Clp-type protease complex.chaperone component ClpC | ClpC1 | 69 ± 23 | 6.9 ± 2.3 | 232 ± 13 | 23.1 ± 1.3
20.2.1 | Cytoskeleton organisation.microfilament network.actin filament protein | Actin | 184 ± 53 | 7.7 ± 2.2 | 416 ± 24 | 17.3 ± 1
24.1.1 | Solute transport.primary active transport.V-type ATPase complex | ATP synthase vacuolar | 9 ± 1 | 6.8 ± 0.9 | 48 ± 2 | 38 ± 1.5
25.1.5.1.1 | Nutrient uptake.nitrogen assimilation.ammonium assimilation.glutamine synthetase activities.cytosolic glutamine synthetase (GLN1) | GSR1 | 83 ± 18 | 3.2 ± 0.7 | 697 ± 94 | 27.2 ± 3.7
25.1.5.1.2 | Nutrient uptake.nitrogen assimilation.ammonium assimilation.glutamine synthetase activities.plastidial glutamine synthetase (GLN2) | GS2 | 1012 ± 370 | 43 ± 15.7 | 2729 ± 481 | 115.9 ± 20.4
25.1.5.2.1 | Nutrient uptake.nitrogen assimilation.ammonium assimilation.glutamate synthase activities.Fd-dependent glutamate synthase | GLU1 | 72 ± 19 | 11.8 ± 3.2 | 351 ± 50 | 58 ± 8.2
50.4.2 | Enzyme classification.EC_4 lyases.EC_4.2 carbon-oxygen lyase | Enolase | 107 ± 38 | 5.1 ± 1.8 | 309 ± 25 | 14.8 ± 1.2

Example 7

Absolute Protein Quantification Makes New Types of Biological Insights Possible

[0157] This example demonstrates how absolute quantification of proteins and protein complexes across multiple species makes new types of biological comparisons possible. Amounts of key components of photosynthesis across 14 species were compared. The 14 species are the 12 species used in Example 4 and the two species in Example 6.

[0158] FIG. 6 exemplifies figures of the proteins of photosynthesis found in most university biochemistry and plant physiology textbooks (see Orr and Govindjee (2013), “Photosynthesis Web Resources,” Photosynthesis Research 115:179-214). It shows the major complexes (Photosystems I and II, ATP synthase, Cytochrome b6f) and demonstrates how they are complexes of protein subunits.

[0159] FIG. 7 contains box and whisker plots that summarize the 14 species' protein complex ratios relative to PSII. The ratios of the membrane associated complexes of the light-dependent reactions of photosynthesis, PSI complex (box 702), ATP synthase (box 704), and Cytochrome b6f (box 706), are all conserved with respect to PSII. However, the ratio relative to PSII of Rubisco (box 708), which is not membrane-associated and is part of the light-independent reactions, is not conserved. These sorts of quantitative comparisons across different protein complexes and across species are not possible without isotopically labeled peptide standards that can be used across multiple species.
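The kind of comparison summarized in FIG. 7 can be sketched with the mean values for the three species in Table 6. This is an illustration only: FIG. 7 covers 14 species, whereas the sketch uses three, and the fold-range statistic and function names are hypothetical choices rather than the analysis used for the figure.

```python
# Mean amounts in nmol per m^2 leaf area, copied from Table 6.
AMOUNTS = {
    "Flooded gum": {"PSII": 1217, "PSI": 524, "ATP synthase": 438,
                    "Cytochrome b6/f": 589, "Rubisco": 3733},
    "Bean": {"PSII": 587, "PSI": 190, "ATP synthase": 325,
             "Cytochrome b6/f": 370, "Rubisco": 3476},
    "Corn": {"PSII": 936, "PSI": 357, "ATP synthase": 638,
             "Cytochrome b6/f": 567, "Rubisco": 1129},
}


def ratios_to_reference(amounts, reference="PSII"):
    """Express every complex as a molar ratio to the reference complex."""
    return {
        species: {name: value / values[reference]
                  for name, value in values.items() if name != reference}
        for species, values in amounts.items()
    }


def fold_range(ratios, complex_name):
    """Largest ratio divided by smallest ratio across species."""
    per_species = [r[complex_name] for r in ratios.values()]
    return max(per_species) / min(per_species)


ratios = ratios_to_reference(AMOUNTS)
for name in ("PSI", "Cytochrome b6/f", "ATP synthase", "Rubisco"):
    print(name, round(fold_range(ratios, name), 2))
```

Because the amounts are absolute molar quantities, ratios between different complexes are meaningful within a species and comparable between species.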

[0160] FIG. 8 is a similar box and whisker plot summarizing ratios from the 14 species, but the ratios are relative to Rubisco and the proteins are related to the light-independent reactions of photosynthesis. RCA (box 802) is Rubisco activase, an enzyme that interacts closely with Rubisco to keep Rubisco active during the day. PGK (box 804) and GAP (box 806) are enzymes of the Calvin cycle, the carbon fixing light-independent reactions. FIG. 8 shows that, on a molar basis, there is nearly as much RCA as Rubisco. For PGK and GAP there are outliers with much higher ratios relative to Rubisco. The outliers are both from corn, which probably reflects the different type of photosynthesis corn uses (C4) compared to most other plants (which are C3). C4 plants like corn have mechanisms to enhance the carbon dioxide fixing activity of Rubisco, which means that less Rubisco per amount of other carbon fixing enzymes is required. Like the example in FIG. 7, the quantitative comparisons across proteins and species in FIG. 8 are not possible without internal peptide standards that work across species. Both examples demonstrate how the approach in this disclosure makes possible new types of biological insights.

Example 8

ATP Synthase Example

[0161] A list of 105 conserved tryptic peptides was identified in Example 4 and utilized in Examples 5 through 7. That set of peptides is not exhaustive; there are numerous additional peptides produced by trypsin that could be used as standards. Similarly, additional conserved peptides can be generated by cleavage methods other than trypsin, for example by cyanogen bromide chemical cleavage or cleavage by other proteases such as Asp N. Therefore, the method of using conserved peptides is not restricted to the 105 peptides used in Examples 5 through 7. The invention is extensible to additional cleavage methods, including gas phase fragmentation of intact proteins. In the case of intact protein mass spectrometry, conserved fragment ions could be identified and intact isotope labeled proteins containing those fragment sequences could be used as internal standards.

[0162] To demonstrate how different protein digestion and hydrolysis methods produce additional potential conserved peptides, the protein sequences for the beta subunit of chloroplastic ATP synthase from 11 diverse species were aligned. The alignment illustrates stretches of conserved amino acid sequences across the 11 species. Two of the conserved stretches were used in the previous examples to quantify chloroplastic ATP synthase—they are peptides produced by trypsin digestion.

[0163] Photosynthetic eukaryote ATP synthase is a highly conserved protein complex located in chloroplast membranes. Other versions of ATP synthase exist in membranes of vacuoles and mitochondria. The three different types of ATP synthase are covered by different peptides in the 105 used in Examples 5 through 7, which makes it possible to quantify the three types of complexes independently. The beta subunit is represented in Examples 4 through 7 by two tryptic peptides. The alignment in FIGS. 9A-9B demonstrates that there are many other conserved peptides in the beta subunit that could be used in the kit, e.g., peptides produced by other proteases and chemical cleavage.
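How different cleavage methods yield different candidate conserved peptides can be sketched with a simple in silico digestion. The cleavage rules below are simplified assumptions (for example, trypsin cutting after K or R except before P, and no missed cleavages). The two input sequences are hypothetical toy sequences built around a stretch found in the ATP synthase beta subunit, not real protein entries, and all function names are illustrative.

```python
# (residues, side of the residue that is cut, residues that block a C-side cut)
# ASSUMPTION: simplified rules without missed cleavages.
CLEAVAGE_RULES = {
    "trypsin": ("KR", "C", "P"),
    "Glu C": ("E", "C", ""),
    "Asp N": ("D", "N", ""),
    "formic acid": ("D", "C", ""),
    "cyanogen bromide": ("M", "C", ""),
}


def digest(sequence, method):
    """Split a protein sequence into peptides according to one cleavage rule."""
    residues, side, blocked_by = CLEAVAGE_RULES[method]
    cuts = [0]
    for i, amino_acid in enumerate(sequence):
        if amino_acid not in residues:
            continue
        position = i + 1 if side == "C" else i
        if position <= cuts[-1] or position >= len(sequence):
            continue
        if side == "C" and sequence[position] in blocked_by:
            continue
        cuts.append(position)
    cuts.append(len(sequence))
    return [sequence[a:b] for a, b in zip(cuts, cuts[1:])]


def conserved_peptides(sequences, method, min_length=7):
    """Peptides of at least min_length produced identically from every sequence."""
    peptide_sets = [
        {p for p in digest(s, method) if len(p) >= min_length} for s in sequences
    ]
    return set.intersection(*peptide_sets)


# Hypothetical toy sequences sharing one conserved stretch.
toy_sequences = ["MGADTKLSIFETGIKVVDLLK", "MSVDTKLSIFETGIKVVDALR"]
for method in ("trypsin", "Asp N", "formic acid"):
    print(method, sorted(conserved_peptides(toy_sequences, method)))
```

The same conserved stretch gives a different shared peptide under each rule, which is why peptides from other proteases or chemical cleavage can supplement tryptic standards.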

[0164] The alignment below contains ATP synthase beta subunit sequences from 11 widely divergent species. One of the species is a prokaryote (marine cyanobacteria Synechococcus elongatus), the rest are eukaryotes. The prokaryote does not have organelles (e.g., chloroplast, mitochondria), but it is photosynthetic and its version of ATP synthase beta is still highly conserved with eukaryotic chloroplastic ATP synthase beta. Eukaryotic chloroplasts and the cyanobacteria from which they arose evolutionarily diverged somewhere between 600 million and 2 billion years ago.

TABLE 9. Proteins in the Alignment

Protein | Uniprot entry | Entry name | Species | Classification
ATP Synthase Beta subunit, chloroplastic | P19366 | ATPB_ARATH | Arabidopsis thaliana | Angiosperm, dicot, Brassicales
ATP Synthase Beta subunit, chloroplastic | Q2MI93 | ATPB_SOLLC | Solanum lycopersicum | Angiosperm, dicot, Solanales, tomato
ATP Synthase Beta subunit, chloroplastic | P0C2Z8 | ATPB_ORYSI | Oryza sativa | Angiosperm, monocot, Poales, rice
ATP Synthase Beta subunit, chloroplastic | O47037 | ATPB_PICAB | Picea abies | Gymnosperm, Norway spruce
ATP Synthase Beta subunit, chloroplastic | A6H5I4 | ATPB_CYCTA | Cycas taitungensis | Cycad
ATP Synthase Beta subunit, chloroplastic | O03067 | ATPB_DICAN | Dicksonia antarctica | Australian tree fern
ATP Synthase Beta subunit, chloroplastic | Q5SCV8 | ATPB_HUPLU | Huperzia lucidula | Clubmoss
ATP Synthase Beta subunit, chloroplastic | P80658 | ATPB_PHYPA | Physcomitrella patens | Moss
ATP Synthase Beta subunit, chloroplastic | Q31794 | ATPB_ANTAG | Anthoceros angustus | Hornwort
ATP Synthase Beta subunit, chloroplastic | A0A250WRN1 | ATPB_CHLRE | Chlamydomonas reinhardtii | Unicellular algae
ATP Synthase Beta subunit | Q31KS4 | ATPB_SYNE7 | Synechococcus elongatus | Cyanobacteria

[0165] The two kit peptides for ATP synthase beta are highlighted in FIG. 9A as the following sequences within “SP|P19366|ATPB_ARATH”: (1) the “LSIFETGIK” sequence beginning at position 146 (SEQ ID NO: 354), and (2) the “FVQAGSEVSALLGR” sequence beginning at position 278 (SEQ ID NO: 353). Additional, but not exhaustive, examples of conserved peptides produced by trypsin that have not been used in the kit are highlighted as follows: (1) for “SP|P19366|ATPB_ARATH,” the “IGLFGGAGVGK” sequence beginning at position 168 (SEQ ID NO: 55), the “AHGGVSVFGGVGERTR” sequence beginning at position 192 (SEQ ID NO: 454), and the “VALVYGQMNEPPGAR” sequence beginning at position 232 (SEQ ID NO: 455), and (2) for “SP|Q2MI93|ATPB_SOLLC,” the “TVLIMELINNIAK” sequence beginning at position 179 (SEQ ID NO: 456). Examples of conserved peptides produced by Glu C (not in kit) are highlighted as follows: (1) for “SP|P0C2Z8|ATPB_ORYSI,” the “LINNIAKAHGGVSVFGGVGE” sequence beginning at position 185 (SEQ ID NO: 457), and (2) for “SP|Q2MI93|ATPB_SOLLC,” the “PPGARMRVGLTALTMAE” sequence beginning at position 242 (SEQ ID NO: 458). Examples of conserved peptides produced by Asp N (not in kit) are highlighted as follows: (1) for “SP|Q2MI93|ATPB_SOLLC,” the “DTKLSIFETGIKVV” sequence beginning at position 143 (SEQ ID NO: 459), and (2) for “SP|P19366|ATPB_ARATH,” the “DPAPATTFAHL” sequence beginning at position 336 (SEQ ID NO: 460). Examples of conserved peptides produced by formic acid cleavage (C terminal side of Asp) are highlighted as follows: (1) for “SP|P0C2Z8|ATPB_ORYSI,” the “TKLSIFETGIKVVD” sequence beginning at position 144 (SEQ ID NO: 461), and (2) for “SP|Q2MI93|ATPB_SOLLC,” the “PAPATTFAHLD” sequence beginning at position 337 (SEQ ID NO: 462). 
Examples of conserved peptides produced by cyanogen bromide cleavage (C terminal side of M) are highlighted as follows: (1) for “SP|O47037|ATPB_PICAB,” the “NEPPGARM” sequence beginning at position 238 (SEQ ID NO: 463), (2) for “SP|P19366|ATPB_ARATH,” the “PSAVGYQPTLSTEM” sequence beginning at position 293 (SEQ ID NO: 464), and (3) for “SP|P0C2Z8|ATPB_ORYSI,” the “RVGLTALTM” sequence beginning at position 248 (SEQ ID NO: 465). Residues that conflict with highlighted conserved sequences are highlighted as follows: (1) for “SP|Q31KS4|ATPB_SYNE7,” the “E” residue at position 133, the “PKV” sequence beginning at position 136, the “I” residue at position 146, the “Q” residue at position 173, the “E” residue at position 182, the “S” residue at position 242, the “G” residue at position 293, and the “DV” sequence beginning at position 295, (2) for “SP|O03067|ATPB_DICAN,” the “S” residue at position 180, the “S” residue at position 232, the “P” residue at position 235, the “S” residue at position 270, and the “G” residue at position 284, (3) for “SP|P06541|ATPB_CHLRE,” the “A” residue at position 240, the “A” residue at position 273, and the “A” residue at position 293, (4) for “SP|O47037|ATPB_PICAB,” the “A” residue at position 301, and (5) for “SP|Q5SCV8|ATPB_HUPLU,” the “G” residue at position 301.

[0166] In FIGS. 9A-9B, alignment by Clustal Omega (available at the uniprot.org website), “*” indicates 100% conserved identity. The first sequence from Arabidopsis is the reference sequence for the methods in Examples 4 through 7. The remaining sequences are approximately in order of evolutionary distance from Arabidopsis.
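The "*" annotation described above marks alignment columns in which every sequence has the same residue. A minimal sketch of that marking follows; it reproduces only the "*" symbol, not the other symbols Clustal Omega may print, and the aligned sequences are hypothetical toy input.

```python
def conservation_line(aligned_sequences):
    """Return a string with '*' under every fully conserved alignment column."""
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*aligned_sequences)
    )


# Hypothetical aligned fragments; the third differs at one position.
toy_alignment = ["LSIFETGIK", "LSIFETGIK", "LSVFETGIK"]
for row in toy_alignment:
    print(row)
print(conservation_line(toy_alignment))
```

Runs of consecutive "*" characters correspond to the conserved stretches from which candidate standard peptides can be chosen.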

[0167] These and other objectives and features of the invention are apparent in the disclosure, which includes the above and ongoing written specification.

[0168] The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention can be practiced in many ways. As is also stated above, it should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the invention with which that terminology is associated.

[0169] The invention is not limited to the particular embodiments illustrated in the drawings and described above in detail. Those skilled in the art will recognize that other arrangements could be devised. The invention encompasses every possible combination of the various features of each embodiment disclosed. One or more of the elements described herein with respect to various embodiments can be implemented in a more separated or integrated manner than explicitly described, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. While the invention has been described with reference to specific illustrative embodiments, modifications and variations of the invention may be constructed without departing from the spirit and scope of the invention as set forth in the following claims.

CPC classification

G01N2333/415 (PHYSICS)
G01N33/6848 (PHYSICS)
G16B40/10 (PHYSICS)
G16B20/30 (PHYSICS)
G16B10/00 (PHYSICS)
C12Q1/6895 (CHEMISTRY; METALLURGY)

International classification

C12Q1/6895 (CHEMISTRY; METALLURGY)