USE OF NATURAL-ABUNDANCE STABLE ISOTOPES AND DNA GENOTYPING FOR IDENTIFYING BIOLOGICAL PRODUCTS
20210398608 · 2021-12-23
Inventors
Cpc classification
G16B30/00
PHYSICS
C12Q2537/165
CHEMISTRY; METALLURGY
G16B25/00
PHYSICS
G01N33/6842
PHYSICS
C12Q2563/185
CHEMISTRY; METALLURGY
G16B20/00
PHYSICS
International classification
Abstract
The combined use of natural-abundance stable-isotopic analysis of bulk materials and DNA genotyping of biological materials is a highly-specific (˜1:1×10.sup.17) fingerprinting method for identifying such products in supply chains.
Claims
1. A method for objectively characterizing a biological sample containing a genetic, proteinaceous, catabolic, or metabolic constituent, comprising: (a) obtaining isotopic data from elements present in said sample; providing a mathematical array that includes the isotopic data, the mathematical array being fixed in a readable form, said readable form with said mathematical array fixed thereon being an identification of said sample, (b) obtaining genomic, proteinomic, catabolomic, or metabolomic data on the sample, (c) constructing an integrated identifying data array from the isotopic data obtained in step (a) and the genomic, proteinomic, catabolomic, or metabolomic data obtained in step (b), and (d) providing an objective characterization of the biological sample.
2. A method according to claim 1 wherein the isotopic data does not include data obtained from a taggant.
3. A method according to claim 1 or 2 wherein the elements are selected from elements that have two or more isotopes.
4. A method according to any of claims 1 to 3 wherein the elements are selected from hydrogen, carbon, nitrogen, oxygen, sulfur, chlorine, and bromine, and combinations thereof.
5. A method according to claim 3 wherein the isotopes are stable isotopes.
6. A method according to claim 5 wherein the stable isotopes are selected from .sup.1H, .sup.2H, .sup.12C, .sup.13C, .sup.14N, .sup.15N, .sup.16O, .sup.18O, .sup.32S, .sup.34S, .sup.35Cl, .sup.37Cl, .sup.79Br, and .sup.81Br and combinations thereof.
7. A method according to claim 6 wherein the isotopes are selected from the following pairs of isotopes: .sup.1H and .sup.2H, .sup.12C and .sup.13C, .sup.14N and .sup.15N, .sup.16O and .sup.18O, .sup.32S and .sup.34S, .sup.35Cl and .sup.37Cl, and .sup.79Br and .sup.81Br.
8. A method according to claim 6 wherein the isotopes are selected from the following isotope ratios: .sup.2H/.sup.1H, .sup.13C/.sup.12C, .sup.15N/.sup.14N, .sup.18O/.sup.16O, .sup.34S/.sup.32S, .sup.37Cl/.sup.35Cl, and .sup.81Br/.sup.79Br.
9. A method according to any of claims 1 to 8 wherein the isotopic data and the genomic, proteinomic, catabolomic, or metabolomic data is intrinsic data to the sample.
10. A method according to any of claims 1 to 9 wherein the integrated data (c) is fixed in a computer or machine-readable form.
11. A method according to claim 1 wherein the biological sample contains a genetic constituent and the genomic data is obtained by genotyping.
12. A method according to claim 11 wherein the genetic constituent is selected from DNA, RNA, nucleotide fragments, and nucleic acids.
13. A method according to claim 1 wherein the isotopic data is given with respect to a reference standard.
14. A data array for objectively characterizing a biological sample containing a genetic, proteinaceous, catabolic, or metabolic constituent, comprising: (a) isotopic data from elements present in said sample; providing a mathematical array that includes the isotopic data, the mathematical array being fixed in a readable form, said readable form with said mathematical array fixed thereon being an identification of said sample, and (b) genomic, proteinomic, catabolomic, or metabolomic data on the sample, wherein the isotopic data of (a) and the genomic, proteinomic, catabolomic, or metabolomic data of (b) are integrated into an identifying data array for objectively characterizing the biological sample.
15. A data array according to claim 14 wherein the elements are selected from elements that have two or more isotopes.
16. A data array according to claim 14 or 15 wherein the elements are selected from hydrogen, carbon, nitrogen, oxygen, sulfur, chlorine, and bromine, and combinations thereof.
17. A data array according to claim 15 wherein the isotopes are stable isotopes.
18. A data array according to claim 17 wherein the stable isotopes are selected from .sup.1H, .sup.2H, .sup.12C, .sup.13C, .sup.14N, .sup.15N, .sup.16O, .sup.18O, .sup.32S, .sup.34S, .sup.35Cl, .sup.37Cl, .sup.79Br, and .sup.81Br and combinations thereof.
19. A data array according to claim 18 wherein the isotopes are selected from the following pairs of isotopes: .sup.1H and .sup.2H, .sup.12C and .sup.13C, .sup.14N and .sup.15N, .sup.16O and .sup.18O, .sup.32S and .sup.34S, .sup.35Cl and .sup.37Cl, and .sup.79Br and .sup.81Br.
20. A data array according to claim 18 wherein the isotopes are selected from the following isotope ratios: .sup.2H/.sup.1H, .sup.13C/.sup.12C, .sup.15N/.sup.14N, .sup.18O/.sup.16O, .sup.34S/.sup.32S, .sup.37Cl/.sup.35Cl, and .sup.81Br/.sup.79Br.
21. A data array according to any of claims 14 to 20 wherein the isotopic data and the genomic, proteinomic, catabolomic, or metabolomic data is intrinsic data to the sample.
22. A data array according to any of claims 14 to 20 wherein the integrated data is fixed in a computer or machine-readable form.
23. A data array according to claim 14 wherein the biological sample contains a genetic constituent and the genomic data is obtained by genotyping.
24. A data array according to claim 23 wherein the genetic constituent is selected from DNA, RNA, nucleotide fragments, and nucleic acids.
25. A data array according to claim 14 wherein the isotopic data is given with respect to a reference standard.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
DETAILED DESCRIPTION OF THE INVENTION
[0036] G×E: G (genetics)×E (environment) is a powerful and elegant concept for tracing and authenticating biological materials, i.e. materials containing DNA or RNA (e.g., plant seeds such as corn, wheat, and cotton), relative to the environment via stable isotopes.
[0037] In the present invention we assert that there is a G×E relationship for one genotype. For example, “Iowa” (genetic type) soybeans grown in Iowa will likely have a very different (C, H, O, N, S) isotopic composition than “Iowa” soybeans grown in China: same DNA, different isotopic signal. This difference could be presented in a variety of ways, such as an X-Y graph (DNA phenotype vs bulk isotopes).
[0038] Although the concept of G×E is generally known, the methods and data arrays of the present invention for determining and quantitating G×E are believed to be new. As is shown herein, the present methods and data sets allow for new and useful applications for authenticating a biological sample or authenticating and/or distinguishing between two or more biological samples. The present methods and data sets provide a powerful means for performing what was not able to be performed before. The present invention takes the unique combination and integration of genetic fingerprinting data, i.e. genomic or sequencing data, with high resolution isotope ratio mass spectrometry data to provide an integrated data array that is useful for the methods herein.
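As a minimal sketch of what an integrated data array fixed in machine-readable form might look like, the following Python combines isotopic data and genomic data for one sample into a single record and serializes it. The class name, field names, δ labels, and all values are hypothetical and chosen only for illustration; the specification does not prescribe any particular data layout.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class IntegratedRecord:
    """Hypothetical integrated identifying data array for one sample."""
    sample_id: str
    # Isotopic data: delta values in per mil, keyed by a label such as "d13C".
    isotopic: dict
    # Genomic data: a genotype label plus an optional marker sequence.
    genomic: dict


def build_record(sample_id, deltas_permil, genotype, marker_sequence=""):
    """Combine isotopic and genomic data into one record."""
    return IntegratedRecord(
        sample_id=sample_id,
        isotopic=dict(sorted(deltas_permil.items())),
        genomic={"genotype": genotype, "marker_sequence": marker_sequence},
    )


def to_machine_readable(record):
    """Fix the record in a machine-readable form (JSON with stable key order)."""
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only; they are not measurements from this document.
record = build_record(
    "sample-001",
    {"d13C": -27.1, "d15N": 3.4, "d2H": -85.0, "d18O": 24.6, "d34S": 5.2},
    genotype="variety-A",
    marker_sequence="ATCGGCTA",
)
encoded = to_machine_readable(record)
print(encoded)
```

Sorting the keys gives the same byte string for the same data, which is one simple way to make two records directly comparable.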
[0039] Isotope ratio mass spectrometry (IRMS) is a specialized branch of mass spectrometry utilizing the relative abundance of isotopes in a given sample. The methodology allows for the precise measurement of mixtures of naturally occurring isotopes. Most instruments used for such precise determination of isotope ratios are of the magnetic sector type. The field of IRMS is of interest because differences in mass between different isotopes lead to isotope fractionation. This fractionation results in measurable effects on the isotopic composition of samples, thus providing a window into their biological or physical history. Consider the following example. The hydrogen isotope, deuterium (D or .sup.2H), has nearly double the mass of ordinary hydrogen (.sup.1H). Consequently, there is a significant difference in the mass of an ordinary water H.sub.2O molecule versus HDO (a water molecule in which one of the hydrogens is replaced by a deuterium). Processes involving the evaporation of water or the cleavage of hydrogen-water bonds or the dissociation of hydrogen bonds between water and/or other molecules will exhibit a fractionation. Consequently, water sources in different locations around the earth will most likely have different, and thus distinguishing, isotopic ratios or “fingerprints” of D to H.
[0040] Isotope ratios are generally given with respect to a standard, so that a δ value, together with the isotope ratio of the standard, allows calculation of a relative isotopic abundance. Reference standards can be found in Hayes, J. M., Practice and Principles of Isotopic Measurements in Organic Geochemistry, Revision 2, August 2002, pages 1-15, particularly Table 1, the gist of which is excerpted here; the reference is incorporated by reference herein in its entirety.
TABLE 1 Isotopic Compositions of Primary Standards
Primary Standard                    Isotope Ratio
Standard Mean Ocean Water (SMOW)    .sup.2H/.sup.1H, .sup.18O/.sup.16O, .sup.17O/.sup.16O
PeeDee Belemnite (PDB)              .sup.13C/.sup.12C, .sup.18O/.sup.16O, .sup.17O/.sup.16O
Air (AIR)                           .sup.15N/.sup.14N
Canyon Diablo Troilite (CDT)        .sup.34S/.sup.32S
[0041] Nucleic acid sequencing such as DNA or RNA sequencing can be used to determine the sequence of individual genes (DNA) or of the genes encoding for RNA structures such as the 16S subunit of the ribosome, which is useful for genetic identification. The methodology is used to study genomes and the proteins they encode. The advent of relatively inexpensive and rapid sequencing methodologies has allowed for the determination of sequences of DNA and RNA from biological samples to allow for their identification. These methods have led to the field of genomics, which focuses on the structure, function, evolution mapping, and editing of the genome, i.e. the genetic material of an organism. In preparing a sample for sequencing, polymerase chain reaction (PCR) is used to make copies of a specific DNA segment. In other words, the DNA sequences are exponentially amplified to generate sufficient quantities of material for genetic sequencing.
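The exponential amplification mentioned above can be illustrated with a short calculation. The sketch below uses the standard idealized PCR model, in which each cycle multiplies the copy number by (1 + efficiency); the function name and the efficiency parameter are illustrative assumptions and are not taken from this specification.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copy number after PCR: N0 * (1 + efficiency) ** cycles.

    efficiency = 1.0 is the idealized case of perfect doubling per cycle;
    real reactions typically fall somewhat below that.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# One template molecule, 30 ideal cycles versus 30 cycles at 90% efficiency.
ideal = pcr_copies(1, 30)
realistic = pcr_copies(1, 30, efficiency=0.9)
print(f"ideal: {ideal:.3e} copies, 90% efficient: {realistic:.3e} copies")
```

Even a modest drop in per-cycle efficiency compounds over many cycles, which is why the yield is usually described as exponential rather than quoted as a fixed number.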
[0042]
Relative Specificities of DNA versus Stable-Isotopic Analysis
DNA Profiling                        Stable-Isotopic Analysis
GMW: DNA, E. coli: ~3 × 10.sup.9     GMW: Spans ~2 Daltons to 10.sup.9 Daltons
Specificity: ~1:10.sup.7             Specificity of Light Isotopes
where GMW is gram molecular weight.
[0043] (C, H, O, N, S)
#δ    Dynamic Range    Specificity
1     100.sup.1        1:10.sup.2
2     100.sup.2        1:10.sup.4
3     100.sup.3        1:10.sup.6
4     100.sup.4        1:10.sup.8
[0044] The combination of DNA and natural stable isotopes (each with a specificity of ˜10.sup.7) is an exceptionally strong pairing, yielding very high specificity.
[0045] We assessed the compounded power of stable isotopes and human DNA. We estimated the specificity at ˜1 in 10.sup.17, which is extremely high. From our point of view, DNA is one among many compounds that we would typically isotopically fingerprint. Other biological compounds include RNA; proteins, peptides, and amino acids; products of catabolism; and metabolites.
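The arithmetic behind the light-isotope specificity table can be sketched as follows. The table implies that each additional δ value, with a dynamic range of about 100, multiplies the specificity by 100. Treating the δ values as independent, using five of them (one each for C, H, O, N, and S), and multiplying by the ~1:10.sup.7 DNA specificity from the comparison table are assumptions made here for illustration; the resulting product matches the ˜1 in 10.sup.17 figure in order of magnitude, but the specification does not state that the figure was derived this way.

```python
def isotopic_specificity(n_deltas, dynamic_range=100):
    """Return N such that the specificity is 1:N for n independent delta values."""
    if n_deltas < 1:
        raise ValueError("at least one delta value is required")
    return dynamic_range ** n_deltas


def combined_specificity(*specificities):
    """Multiply independent specificities (each given as N in 1:N)."""
    product = 1
    for value in specificities:
        product *= value
    return product


# Reproduce the rows of the table: 1 to 4 delta values.
for n in range(1, 5):
    print(n, f"1:{isotopic_specificity(n):.0e}")

# Assumption: five light elements (C, H, O, N, S) and a DNA specificity of 1:10^7.
five_light_isotopes = isotopic_specificity(5)
with_dna = combined_specificity(five_light_isotopes, 10 ** 7)
print(f"combined: 1:{with_dna:.0e}")
```

Integer arithmetic is used so the powers of ten stay exact.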
[0046] We assessed the compounded power of bulk stable isotopes and bacterial DNA—DNA has a specificity of 3×10.sup.17. We estimated the specificity at ˜3 in 10.sup.17, which is extremely high for such assessments.
[0047] In the foregoing δ is a measure of the parts per thousand (per mil or “‰”) difference (either positive or negative) relative to an internationally accepted standard. For example, considering carbon 12 and carbon 13, δ.sup.13C is determined as:
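$$\delta^{13}\mathrm{C} = \left[\frac{\left({}^{13}\mathrm{C}/{}^{12}\mathrm{C}\right)_{\text{sample}}}{\left({}^{13}\mathrm{C}/{}^{12}\mathrm{C}\right)_{\text{standard}}} - 1\right] \times 1000\ \text{‰}$$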
Other isotopic ratios are similarly determined and calculated.
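A minimal Python sketch of the δ calculation and its inverse is given below. The function names are hypothetical, and the numeric .sup.13C/.sup.12C ratio used for the PDB standard is a commonly cited value included only for illustration; it should be checked against the cited reference before any real use. The fractional-abundance helper assumes a two-isotope system.

```python
def delta_permil(r_sample, r_standard):
    """delta = [(R_sample / R_standard) - 1] * 1000, in per mil."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta, r_standard):
    """Invert the delta definition to recover the sample isotope ratio."""
    return r_standard * (1.0 + delta / 1000.0)


def heavy_fraction(ratio):
    """Fractional abundance heavy / (heavy + light), two-isotope assumption."""
    return ratio / (1.0 + ratio)


# Assumed illustrative 13C/12C ratio for the PDB standard.
R_PDB_13C = 0.0112372

# A hypothetical plant sample with delta13C = -25 per mil.
r_sample = ratio_from_delta(-25.0, R_PDB_13C)
print(f"13C/12C ratio: {r_sample:.7f}")
print(f"13C fraction:  {heavy_fraction(r_sample):.6f}")
print(f"round trip:    {delta_permil(r_sample, R_PDB_13C):.3f} per mil")
```

The same two functions apply to the other ratios (e.g., .sup.2H/.sup.1H or .sup.15N/.sup.14N) once the ratio of the corresponding standard is supplied.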
[0048] DNA is one among many compounds that we can isotopically fingerprint. We can therefore assess the bulk composition of a biological sample (e.g., wheat seeds, cotton) and, separately, a quantitative index of the coexisting DNA genotype. Stable-isotopic fingerprinting of a DNA molecule (as a bulk organic phase) together with its genotype would then be an example of a focused application of our method. More typically, we can isotopically fingerprint a bulk material (e.g., a bulk wheat seed) and pair the result with a quantitative index of its DNA genotype. Although genomic identification, i.e. genotyping, of genetic material in the samples (e.g., DNA, RNA, nucleic acid fragments, nucleic acids) is an important application of this method, other biological materials can also be analyzed and identified for this purpose. For example, proteomics can be used to obtain identifying information on proteins, peptides, and amino acids in the samples. Catabolomics and metabolomics can be used to obtain identifying information on products of catabolism and metabolism in the samples. Similarly, other “omic” techniques can be used to obtain information on other biological components.
[0049] DNA is a linear sequence of nucleotides built from four bases (G, guanine; T, thymine; A, adenine; and C, cytosine) that encode genetic information. RNA is also involved and is based on the nucleotides G (guanine), U (uracil), A (adenine), and C (cytosine).
[0050] Natural abundance stable isotopes (e.g., C, H, O, N, S) record the isotopic provenance of biological materials with great specificity, as described above. In particular, the stable H and O isotopes of water record the environment (E) in which the material was biosynthesized. The C, N, and S isotopes record the isotopic composition of the biological material itself to provide a highly specific isotopic fingerprint.
[0051] In an example, the G×E application can be shown on a bivariate plot (x,y-graph) as shown in
[0052]
[0053]
[0054]
[0055] Parameterizing genetics: Linear DNA molecules are made up of two types of sequences, conserved and variable sections of the DNA strands, all composed of four bases (ATCG in DNA, or AUCG in RNA). These DNA sequences can subsequently be translated into, or expressed as, proteins.
[0056] With these sequences we can use the correlation coefficients (r.sup.2) of either DNA or protein as reference sections or locations. Correlation coefficients of these reference sections span from zero to 1, with r.sup.2=0 indicating no correlation and r.sup.2=1 indicating a perfect match.
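One possible way to obtain such an r.sup.2 value for two aligned sequences of equal length is sketched below. The specification does not say how the correlation is computed, so the approach here (one-hot encoding each base and taking the squared Pearson correlation of the resulting vectors) is purely an assumption, as are the function names. Under this particular encoding, r.sup.2=1 indicates identical sequences and r.sup.2=0 corresponds to chance-level agreement (about one matching position in four).

```python
import math

BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA/RNA sequence as a flat 0/1 vector, four entries per base."""
    vector = []
    for base in sequence.upper().replace("U", "T"):
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vector.extend(1.0 if base == b else 0.0 for b in BASES)
    return vector


def r_squared(seq_a, seq_b):
    """Squared Pearson correlation between two aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    x, y = one_hot(seq_a), one_hot(seq_b)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * r


reference = "ATCGATCG"          # hypothetical reference section
print(r_squared(reference, reference))    # identical sequences
print(r_squared(reference, "ATCGATGG"))   # one mismatch in eight positions
```

Other similarity measures (for example, percent identity after alignment) would serve the same role; the point is only that the genetic axis can be reduced to a single number between 0 and 1.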
[0057]
[0058]
[0059]
[0060]
EXAMPLES
[0061] The following examples further describe and demonstrate embodiments within the scope of the present invention. The Examples are given solely for purpose of illustration and are not to be construed as limitations of the present invention, as many variations thereof are possible without departing from the spirit and scope of the invention.
Example 1
The Same Wine Grape Grown in the Sonoma Valley and the Napa Valley
[0062] This is an example of genetically identical wine grape varieties being grown in two nearby, yet different geographic and climatic environments.
[0063] In this example the genetics (G) is constant, but the environmental conditions (E) under which the grapes are grown are different. The genetic profiling is expected to be identical, but the end product of the grapes should vary because of differences in the growing conditions such as soil, water, fertilizers, etc.
[0064] The data is expected to be as that shown in the generalized
Example 2
Two Different Wine Grapes Grown in the Sonoma Valley
[0065] This is an example of genetically different wine grape varieties being grown in the same geographic and climatic environments.
[0066] In this example the genetics (G) is different, but the environmental conditions (E) under which the grapes are grown are the same. The genetic profiling is expected to be different. The end product of the grapes should vary because of the isotopic differences of the compositions of the grapes superimposed upon the different growing conditions such as soil, water, fertilizers, etc. This environmental variation is expected to be expressed as larger isotopic differences that would be much greater than the isotopic differences due to the genetics.
[0067] The data is expected to be as that shown in the generalized
Example 3
Two Different Wine Grapes Grown in the Sonoma Valley and the Napa Valley
[0068] This is an example of genetically different wine grape varieties being grown in two nearby, yet different geographic and climatic environments.
[0069] In this example the genetics (G) is different and the environmental conditions (E) under which the grapes are grown are different. The genetic profiling is expected to be different. The end products of the grapes would also be different because of differences in the growing conditions such as soil, water, fertilizers, etc.
[0070] The data is expected to be as that shown in the generalized
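Examples 1 to 3 can be summarized as a comparison of two samples along the G axis and the E axis. The following sketch classifies a pair of samples into one of the resulting cases. Treating agreement of all shared δ values within a tolerance as a proxy for "same environment", the tolerance itself, the genotype labels, and all δ values are hypothetical assumptions for illustration only.

```python
def same_environment(deltas_a, deltas_b, tolerance_permil=1.0):
    """True when every shared delta value agrees within the tolerance."""
    shared = deltas_a.keys() & deltas_b.keys()
    if not shared:
        raise ValueError("samples share no isotope systems")
    return all(abs(deltas_a[k] - deltas_b[k]) <= tolerance_permil for k in shared)


def classify_pair(genotype_a, deltas_a, genotype_b, deltas_b, tolerance_permil=1.0):
    """Place a pair of samples into one of the four G x E cases."""
    same_g = genotype_a == genotype_b
    same_e = same_environment(deltas_a, deltas_b, tolerance_permil)
    if same_g and same_e:
        return "same G, same E"
    if same_g:
        return "same G, different E"        # as in Example 1
    if same_e:
        return "different G, same E"        # as in Example 2
    return "different G, different E"       # as in Example 3


# Illustrative delta values in per mil; not measurements from this document.
sonoma = {"d2H": -60.0, "d18O": 27.0, "d13C": -26.5}
napa = {"d2H": -66.0, "d18O": 25.1, "d13C": -27.4}
sonoma_other_vine = {"d2H": -60.4, "d18O": 27.3, "d13C": -26.1}

print(classify_pair("grape-A", sonoma, "grape-A", napa))
print(classify_pair("grape-A", sonoma, "grape-B", sonoma_other_vine))
print(classify_pair("grape-A", sonoma, "grape-B", napa))
```

In practice the tolerance would have to be set from measured within-site variability, since small isotopic differences can also arise between genotypes grown side by side.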
Example 4
[0071] The Same Seed Crop, e.g. Corn, Grown Using Two Different Nitrogen Source Fertilizers
[0072] This is an example of genetically identical corn varieties being grown on the same field, with part of the field fertilized with a synthetic nitrogen source (Haber process synthetic fertilizer) and the other part of the field fertilized with an organic nitrogen source (manure).
[0073] In this example the genetics (G) is constant, but the environmental conditions (E) under which the corn is grown are different. The genetic profiling is expected to be identical, but the end product of the corn should vary because of differences in the nitrogen source.
[0074] The data is expected to be as that shown in the generalized
Example 5
Arabica and Robusta Coffee Grown in Two Different Locations
[0075] In this example Arabian or Arabica coffee (Coffea arabica) and robusta coffee (Coffea canephora, also known as Coffea robusta) (two different G's) are grown in two different geographical locations, e.g. Brazil and Vietnam (two different E's). Arabica coffee is generally preferred as being of better quality and having a better taste and aroma than robusta coffee, which is considered inferior and is often described as harsher and more bitter. Arabica coffee beans generally sell at a premium of greater than 1.5 times the price of robusta coffee beans. Arabica beans comprise about 60 percent of world production, with robusta beans comprising about 40 percent. It would therefore be desirable to identify both the variety of a coffee sample and its growing location, so as to distinguish Arabica coffee grown in Brazil or Vietnam from robusta coffee grown in those same two locations. The present method would, for example, distinguish Arabica coffee from robusta coffee grown in Brazil. This would be highly desirable to prevent robusta coffee grown in Vietnam from being misbranded as higher quality Arabica coffee grown in Brazil, or robusta coffee grown in Brazil from being misbranded as Brazilian Arabica coffee.
[0076] The data is expected to be as that shown in the generalized
REFERENCES
[0077] U.S. Pat. No. 7,323,341 B1, Stable Isotopic Identification and Method For Identifying Products By Isotopic Concentration, to Jasper, issued Jan. 29, 2008.
[0078] U.S. Pat. No. 8,367,414 B2, Tracing Processes Between Precursors and Products By Utilizing Isotopic Relationships, to Jasper, issued Feb. 5, 2013.
[0079] PCT Application Publication No. WO2015/103183, Method for Continuously Monitoring Chemical or Biological Processes, to Jasper, published Jul. 9, 2015.
[0080] PCT Application Publication No. WO2016/109631, Isotopic Identification and Tracing of Biologic Products, to Jasper, published Nov. 12, 2015.
[0081] PCT Application Publication No. WO2015/103183, Molecular Isotopic Engineering, to Jasper, published Jul. 7, 2016.
[0082] Hayes, J. M., Practice and Principles of Isotopic Measurements in Organic Geochemistry, Revision 2, August 2002, pages 1-15, particularly Table 1.
INCORPORATION BY REFERENCE
[0083] The entire disclosure of each of the patent documents, including certificates of correction, patent application documents, scientific articles, governmental reports, websites, and other references referred to herein is incorporated by reference herein in its entirety for all purposes. In case of a conflict in terminology, the present specification controls.
EQUIVALENTS
[0084] The invention can be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are to be considered in all respects illustrative rather than limiting on the invention described herein. In the various embodiments of the methods and systems of the present invention, where the term comprises is used with respect to the recited steps or components, it is also contemplated that the methods and systems consist essentially of, or consist of, the recited steps or components. Furthermore, the order of steps or order for performing certain actions is immaterial as long as the invention remains operable. Moreover, two or more steps or actions can be conducted simultaneously.
[0085] In the specification, the singular forms also include the plural forms, unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In the case of conflict, the present specification will control.
[0086] Furthermore, it should be recognized that in certain instances a composition can be described as composed of the components prior to mixing, because upon mixing certain components can further react or be transformed into additional materials.
[0087] All percentages and ratios used herein, unless otherwise indicated, are by weight.