EDIBLE OIL ANALYSIS SYSTEM AND METHOD
20210088495 · 2021-03-25
Assignee
Inventors
- Zhongping Yao (Hong Kong, CN)
- Cheuk Chi Albert Ng (Hong Kong, CN)
- Tsz-tsun NG (Hong Kong, CN)
- Suying LI (Hong Kong, CN)
Cpc classification
H01J49/0036
ELECTRICITY
G01N33/6851
PHYSICS
H01J49/0418
ELECTRICITY
International classification
Abstract
The present disclosure provides a method and system for analysing one or more edible oil samples. In an embodiment the disclosure provides for calibrating the matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) data obtained for one or more edible oil samples to obtain calibrated spectral data; and comparing the calibrated spectral data derived from the one or more samples against a library of calibrated MALDI-MS spectra for a plurality of edible oil samples to determine the most likely composition of the one or more edible oil samples.
Claims
1. A method for analysing one or more edible oil samples, the method comprising: receiving, by a processor, matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) data for one or more edible oil samples, calibrating, by the processor, the MALDI-MS data for the one or more edible oil samples to obtain calibrated MALDI-MS data by using reference peaks selected from triacylglycerol (TAG) peak(s) and 2,5-dihydroxybenzoic acid (DHB) matrix peak(s) of the MALDI-MS data of the one or more edible oil samples; comparing, by the processor, the calibrated MALDI-MS data derived from the one or more samples against a library comprising calibrated MALDI-MS data for a plurality of edible oil samples, wherein the library of calibrated MALDI-MS data is obtained by calibrating reference peaks selected from triacylglycerol (TAG) peak(s) and 2,5-dihydroxybenzoic acid (DHB) matrix peak(s) of the MALDI-MS data for the plurality of edible oil samples, wherein such comparison does not include principal component analysis (PCA) analysis; and determining a most likely composition of the one or more edible oil samples.
2. The method for analysing one or more edible oil samples according to claim 1, wherein the comparison between the calibrated MALDI-MS data derived from the one or more samples and the library of calibrated MALDI-MS data for the plurality of edible oil samples is performed using a Cosine similarity test.
3. The method for analysing one or more edible oil samples according to claim 1, wherein the most likely composition of the one or more edible oil samples is determined after ranking, based on the calibrated MALDI-MS data, a plurality of calibrated MALDI-MS data of known edible oil types in the library according to their likelihood of being in the one or more edible oil samples, from most likely to least likely using cosine similarity test scores; and determining the most likely identification of the one or more edible oil samples based upon the highest ranked cosine similarity score.
4. The method for analysing one or more edible oil samples according to claim 2, wherein the cosine similarity test is conducted on one or more regions of the calibrated MALDI-MS data derived from the one or more samples selected from the group comprising high mass, low mass and TAG regions and one or more corresponding regions of the edible oil samples in the library of calibrated MALDI-MS data.
5. The method for analysing one or more edible oil samples according to claim 1, wherein following calibration and before comparison with the library of calibrated MALDI-MS data, the calibrated MALDI-MS data derived from the one or more samples is quantised by data binning.
6. The method for analysing one or more edible oil samples according to claim 5, wherein the data binning is performed by dividing an entire spectrum into intervals of 0.5 m/z, averaging intensity of all readings within each bin, and setting an m/z reading as an m/z value at the middle of the interval.
7. The method for analysing one or more edible oil samples according to claim 6, wherein following binning, the calibrated MALDI-MS data derived from the one or more samples is normalised by dividing an intensity of each bin with either a maximum intensity or a total intensity of all bins, multiplying by an appropriate scaling parameter and rounding to a nearest integer.
8. The method for analysing one or more edible oil samples according to claim 1, wherein the comparison between the calibrated MALDI-MS data derived from the one or more samples and the library of calibrated MALDI-MS data for the plurality of edible oil samples is performed using a statistical test selected from the group consisting of characteristic peak matching methods, partial least squares discriminant analysis (PLS-DA), and decision tree-based methods.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0050] In order to describe the manner in which the above-recited and other advantages and features of the disclosure can be obtained, a more particular description of the principles briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only exemplary embodiments of the disclosure and are not therefore to be considered to be limiting of its scope, the principles herein are described and explained with additional specificity and detail through the use of the accompanying drawings.
[0051] Preferred embodiments of the present invention will be explained in further detail below by way of examples and with reference to the accompanying drawings, in which:
[0052]
[0053]
[0054]
[0055]
[0056]
[0057]
[0058]
[0059]
[0060]
[0061]
[0062]
[0063]
[0064]
[0065]
[0066]
[0067]
[0068]
[0069]
[0070]
[0071]
[0072]
[0073]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0074] Various embodiments of the disclosure are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the scope of the present disclosure.
[0075] The disclosed technology addresses the need in the art for a scalable, reliable technique for analysis of edible oils.
Typical MALDI Protocol for Analysis of Samples (Library and Unknown Samples)
[0076] A typical MALDI-MS protocol involves mixing a matrix solution and a sample solution, and the mixture is subsequently allowed to dry onto the MALDI plate. After the sample and matrix crystals have formed, the MALDI plate is inserted into the MALDI-MS instrument for analysis.
[0077] No chromatographic separation is involved in MALDI-MS analysis, thus allowing rapid analysis of edible oil samples.
[0078] In the exemplary MALDI-MS protocol, edible oil samples could be loaded directly onto the MALDI plate pre-deposited with matrix layer for the MALDI-MS analysis. Sample loading for one sample could be finished within several seconds and around three hundred samples could be loaded onto the same MALDI plate.
[0079] In specific steps of sample loading according to an exemplary approach, aliquots of 1 μL of 100 mg mL⁻¹ DHB in acetone were loaded onto spots of the MALDI plate and air-dried to form matrix layers. About 0.2 μL of each oil sample was then transferred by pipette tip or cotton tip to form a thin oil layer on the matrix layer.
[0080] The plate was then introduced into the mass spectrometer for MALDI-MS analysis.
[0081] An Ultraflex Xtreme MALDI-TOF/TOF mass spectrometer (Bruker Daltonics, Germany) was used for the analysis. The laser of the MALDI source was a Smart beam-II (Nd:YAG, 355 nm) pulse laser operating at a frequency of 2000 Hz. The mass spectrometer was operated in positive reflectron mode. The settings of positive reflectron mode for the ion source 1, source 2, lens, reflector 1, reflector 2, and pulsed ion extraction were 20.00 kV, 17.75 kV, 7.00 kV, 21.10 kV, 10.85 kV, and 140 ns, respectively. The sample rate and digitizer were set to 5.00 GS/s. Extended mass ranges were employed if necessary.
[0082] The mass spectrometer was calibrated with a PEG solution mixture (PEG600/PEG1000/PEG2000/NaI=1/2/2/5 (v/v)).
[0083] The spectral acquisition was performed using the flexControl 3.4 (Bruker Daltonics, Germany) program. The mass spectra were analysed using flexAnalysis 3.4 (Bruker Daltonics, Germany) program. The Centroid algorithm was used for peak detection.
[0084] Referring to the
[0085] Referring now to
[0086] Peaks clustered around 935.8 Da may be oxidised TAGs, and the peak at 1059.8 Da appears to correlate with a TAG-fragment cluster ion.
[0087] In
[0088] Finally,
[0089]
[0090]
[0091] Referring now to
[0092] PCA converts observations of possibly correlated variables into a set of values of linearly uncorrelated variables using an orthogonal linear transform. This technique allows for visualising and processing of high dimensional datasets but at the same time retaining as much of the variance of the dataset as possible.
[0093] In conducting the principal component analysis of the sample, a score plot is generated from first and second principal TAG components of the sample.
[0094] Results of using PCA on TAG components showed that samples from the same species were clustered individually and different vegetable oil species could be clearly differentiated from each other. (see
[0095] In
[0096] Group 1 (10) is likely to be Peanut oil, Group 2 (12) flaxseed, Group 3 (14) vegetable oils with TAGs patterns similar to olive oil and Group 4 (16) other vegetable oils.
[0097] Referring to
[0098] However, as noted in the background to the present invention, PCA analysis suffers from a number of deficiencies which mean that it is not scalable, and must be performed by skilled operators.
[0099] Accordingly, the present disclosure provides a method and system which addresses these deficiencies, and which enables a robust, scalable technique for analysis, verification and identification of edible oils.
[0100] Referring to
[0101] Referring to component 50, there is disclosed a series of steps associated with producing a library of MALDI-MS spectral data. A plurality of edible oil samples having a known type and origin are selected for assessment at step 52. In the library produced, we have selected up to six hundred samples from various suppliers, including from Mainland China, Taiwan, Hong Kong, and Sigma-Aldrich in the USA. Multiple instances of one type of oil from different sources have been selected in many cases to provide a complete database.
[0102] These samples are subjected to MALDI-MS analysis in step 54 in accordance with the protocols of the present disclosure.
[0103] The MALDI-MS spectra obtained from the analysis at step 56 are passed through an optional quality assurance review at step 58 involving review by a trained operator. During this review, the operator reviews the calibration and resolution of each spectrum in order to ensure that the best reference spectra are included in the library. A high level of oxidised products and/or poor calibration will cause the data to be rejected by the human reviewer. It will be appreciated that, although this step is optional, review of the spectral data is particularly valuable in the generation of a library of spectral data: it ensures that readily apparent errors or tainted samples are rejected, which in turn increases the integrity of the library of spectral data obtained.
[0104] Following the quality assurance review, the MALDI-MS spectra are stored in the spectral database at step 59, thereby forming a library of spectral data of edible oil samples. Optionally, in order to increase the accuracy of the presumptive identification made using the samples in the library, a plurality of different samples for a known oil type may be obtained from a number of different manufacturers. It is anticipated that, notwithstanding the oil originating from a number of different manufacturers, fairly similar spectra will be observed.
[0105] A further part of the system of the present disclosure includes the sample analysis system referred to at box 60 of
[0106] The unknown samples are subjected to MALDI-MS analysis at step 64 in accordance with the earlier described protocol of the present disclosure to produce the sample MALDI-MS spectra at step 66.
[0107] Referring now to the identification component of the present disclosure 70, the series of steps in an exemplary identification system is described in overview.
[0108] The sample spectra 66 are matched at the algorithmic matching step 72 by a processor of a computer. The algorithmic matching may be a form of comparison such as a cosine similarity test or similar by which the sample data 66 is compared against the spectral database 59 which has been obtained for a plurality of edible oil samples in the library generation component 50. This process is disclosed in more detail with reference to
[0109] The outcome of the algorithmic matching between the sample spectra 66 and the spectral database 59 is one or more similarity scores according to the edible oil samples present in the library.
[0110] As is appreciated by a person skilled in the art, the higher the similarity score produced by whatever algorithm is used for matching the spectra of the sample with the corresponding spectra in the edible oil library, the higher the chance of presumptive identification of the unknown edible oil sample.
[0111] It will be appreciated that this process is similar regardless of whether the unknown edible oil sample is an adulterated edible oil sample, a pure edible oil sample or a mixture of edible oils: the matching with the library via the algorithm and the resulting presumptive identification will be generally the same.
[0112] Referring now to
[0113] Once the raw MALDI-MS spectra have been obtained from user input, these spectra are calibrated at step 72a. Calibration is typically conducted using prominent TAG peaks and the DHB matrix peaks as a reference. This calibration process ensures that the samples are standardized, for ease of analysis and reproducibility of analysis against the library of standardized edible oil spectra peaks.
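As a minimal sketch of the calibration at step 72a, the following assumes a linear mass correction fitted by least squares to matched reference peaks. The disclosure does not specify the fitting model, so the linear form, the function names and the example peak values (a sodiated DHB ion near m/z 177.016 and sodiated triolein near m/z 907.773) are illustrative assumptions only.

```python
from typing import List, Sequence, Tuple


def fit_linear_calibration(observed: Sequence[float],
                           expected: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of expected = slope * observed + intercept.

    Assumption: a straight-line correction is adequate; the disclosure
    only states that TAG and DHB matrix peaks serve as references.
    """
    n = len(observed)
    if n < 2 or n != len(expected):
        raise ValueError("need at least two matched reference peaks")
    mean_x = sum(observed) / n
    mean_y = sum(expected) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, expected))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(mz_values: Sequence[float],
                      slope: float, intercept: float) -> List[float]:
    """Shift every m/z reading using the fitted correction."""
    return [slope * mz + intercept for mz in mz_values]


# Hypothetical example: both reference peaks were measured 0.05 too high.
observed_refs = [177.066, 907.823]
expected_refs = [177.016, 907.773]
slope, intercept = fit_linear_calibration(observed_refs, expected_refs)
calibrated = apply_calibration([881.807, 907.823], slope, intercept)
```

With more than two reference peaks the same fit averages out individual peak-picking errors; a higher-order correction could be substituted if the instrument drift is not linear.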
[0114] Alternative matrices could also be used, such as CHCA (α-cyano-4-hydroxycinnamic acid) and SA (sinapinic acid). Whichever matrix is used, appropriate calibration with its characteristic peaks could also be performed. However, it was noted that the background noise was higher when CHCA was used, and the signal intensity was very poor when sinapinic acid was used as the matrix.
[0115] To assist in obtaining reproducible results, as a matter of best laboratory practice the mass spectrometer should also be calibrated with a PEG solution mixture (PEG600/PEG1000/PEG2000/NaI=1/2/2/5 (v/v)) before conducting analysis.
[0116] Following calibration, a binning process is conducted on the data at step 72b. As is known to persons skilled in the art, data binning or bucketing is a data pre-processing technique which quantizes the data. Basically, in data binning, the original data series values which fall in a given small window or bin are replaced by a value which is a representative of that interval, often the centre value.
[0117] As is known in the art, data binning reduces the amount of data (necessarily losing information) but facilitates analysis. In the MALDI-MS spectra analysis, the typical size of bin was 0.5 m/z, although other sizes could be used. It would be appreciated by a person skilled in the art that an increased bin size will decrease the resolution of the data obtained. The size of bin affects the accuracy and quality of the matching, and the optimal size of the bin represents a balance between too much detail and too low resolution.
[0118] In an exemplary embodiment of the present invention the data binning process used was dividing the entire spectrum into intervals of 0.5 m/z, averaging the intensity of all readings within each bin, and setting the m/z reading as the m/z value at the middle of the interval.
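The binning described above can be sketched as follows. This is an illustrative sketch only: the function name `bin_spectrum`, the alignment of bin edges to multiples of the bin width starting from zero, and the omission of empty bins are assumptions not stated in the disclosure.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bin_spectrum(peaks: Iterable[Tuple[float, float]],
                 bin_width: float = 0.5) -> Dict[float, float]:
    """Average the intensity of all readings in each m/z interval.

    Each bin is keyed by the m/z value at the middle of its interval.
    Assumption: bin edges sit at multiples of bin_width from m/z 0.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for mz, intensity in peaks:
        index = math.floor(mz / bin_width)
        sums[index] += intensity
        counts[index] += 1
    return {(index + 0.5) * bin_width: sums[index] / counts[index]
            for index in sorted(sums)}


# Hypothetical readings: the first two fall in the 881.0-881.5 interval.
example = [(881.10, 10.0), (881.30, 30.0), (881.70, 5.0)]
binned = bin_spectrum(example)
# binned == {881.25: 20.0, 881.75: 5.0}
```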
[0119] Following the data binning process at step 72b, the data was normalized by dividing the intensity of each bin by the maximum intensity of all bins, multiplying by 10000 and rounding to the nearest integer.
[0120] In an alternative normalization process, the sample data could be normalised by dividing the intensity of each bin by the total intensity of all bins, and then multiplying the result by 10000 and rounding.
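Both normalization variants can be sketched in a few lines. The function name `normalise` and the `mode` argument are hypothetical; the scaling parameter of 10000 is the value given in the description.

```python
from typing import Dict


def normalise(binned: Dict[float, float], mode: str = "max",
              scale: int = 10000) -> Dict[float, int]:
    """Scale binned intensities and round them to integers.

    mode="max" divides by the largest bin intensity;
    mode="total" divides by the summed intensity of all bins.
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    if mode == "max":
        denominator = max(binned.values())
    elif mode == "total":
        denominator = sum(binned.values())
    else:
        raise ValueError("mode must be 'max' or 'total'")
    if denominator <= 0:
        raise ValueError("spectrum has no positive intensity")
    return {mz: round(scale * intensity / denominator)
            for mz, intensity in binned.items()}


example = {881.25: 20.0, 881.75: 5.0}
by_max = normalise(example, "max")      # {881.25: 10000, 881.75: 2500}
by_total = normalise(example, "total")  # {881.25: 8000, 881.75: 2000}
```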
[0121] Following the normalization process, the unknown sample data which has been calibrated, binned, and normalized, is compared against a database of reference spectra of edible oils (where the information in that database is for edible oils which have been similarly calibrated, binned and normalized) in step 74. This comparison may be conducted using a cosine similarity approach.
[0122] As is known in the art, cosine similarity matching is a measure of similarity in which the unknown sample spectrum is represented as a vector, and the dot product of that vector with each of a plurality of vectors representing the sample data in the library is obtained.
[0123] Cosine similarity calculates the cosine of the angle between two vectors.
[0124] Where vectors point in generally the same direction, the cosine of the angle between the two vectors is near 1, or 100%. Where the vectors (abstractions of the spectral data) are orientated in very different directions, approaching orthogonality, the cosine of the angle between them, obtained from the dot product divided by the product of the vector magnitudes, is near zero, or 0%. Furthermore, where the vectors point in exactly opposite directions, the cosine similarity obtained is −1.
[0125] Accordingly, the cosine similarity is a useful algorithm to obtain a numerical score which represents the degree of similarity between two spectra.
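A minimal sketch of the cosine similarity score follows, assuming each spectrum is held as a mapping from bin centre (m/z) to normalized intensity, with bins absent from a spectrum treated as zero. That representation and the function name are assumptions made for illustration. Because spectral intensities are non-negative, scores computed this way fall between 0 and 1.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[float, float], b: Dict[float, float]) -> float:
    """Cosine of the angle between two spectra treated as vectors.

    Computed as dot(a, b) / (|a| * |b|); a bin missing from one
    spectrum contributes zero to the dot product.
    """
    dot = sum(value * b.get(mz, 0.0) for mz, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


sample = {881.25: 10000, 881.75: 2500}
same_shape = {881.25: 5000, 881.75: 1250}   # same pattern, half the intensity
disjoint = {905.25: 10000}                  # no bins in common

score_same = cosine_similarity(sample, same_shape)    # very close to 1.0
score_disjoint = cosine_similarity(sample, disjoint)  # 0.0
```

The first comparison shows why the measure suits spectra: the score depends on the relative pattern of intensities, not on their absolute scale.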
[0126] Optionally, other data processing techniques such as characteristic peak matching methods (mainly for detecting the presence or absence of oxidation products or cyclopeptides), partial least squares discriminant analysis (PLS-DA), or decision-tree based techniques (e.g. random forest) could also be used to compare the calibrated, binned and normalized MALDI-MS sample data with the similarly calibrated, binned and normalized data of the reference spectra.
[0127] The outcome of the process at step 76 is the identification of the oil.
[0128] Advantageously, as is depicted in successive Figures, it may be provided in the form of a ranked series of scores of cosine similarity for a plurality of samples of the library, or alternatively may be simply selected as the highest score identification.
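The ranked output described above can be sketched as a sort over per-entry scores. The library contents, the entry names and the helper names below are hypothetical and chosen only to make the example self-contained.

```python
import math
from typing import Dict, List, Tuple

Spectrum = Dict[float, float]


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """dot(a, b) / (|a| * |b|), with missing bins treated as zero."""
    dot = sum(value * b.get(mz, 0.0) for mz, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(sample: Spectrum,
                 library: Dict[str, Spectrum]) -> List[Tuple[str, float]]:
    """Score the sample against every library entry, best match first."""
    scores = [(name, cosine_similarity(sample, reference))
              for name, reference in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical normalized spectra, not taken from the disclosure.
sample = {881.25: 10000, 907.25: 4000}
library = {
    "oil A": {881.25: 9000, 907.25: 4200},
    "oil B": {881.25: 2000, 907.25: 10000},
    "oil C": {855.25: 10000},
}
ranked = rank_library(sample, library)
best_name, best_score = ranked[0]  # highest-scoring library entry
```

Returning the full ranked list rather than only the top entry lets the user see how far the best match stands ahead of the alternatives.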
[0129] Referring now to
[0130] On the left hand side of the sample, it can be seen that there are a plurality of samples of canola oil 82 in which the details of the manufacturer and collection source have been recorded at 84.
[0131] There is also optionally the ability to view the raw numerical data in text format by selection of the link 86.
[0132] Most edible oils have one group of TAG peaks at around 870-885 Da, and another group of TAG peaks at 900-910 Da. Some edible oils, such as coconut oil, have their TAG peaks in a different region. For each type of oil, the ratio of each TAG is largely determined by the enzymes of the parent species. Therefore, the relative intensities of peaks within the TAG region are specific to each type of oil, and the intensities of the peaks form a distinctive shape for each type of oil. Hence, the location and shape of the TAG regions could be used as a fingerprint to identify the oil type of an unknown.
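Where a comparison is to be restricted to the TAG region, the relevant bins can be selected before scoring. The sketch below assumes a single window of 870 to 910 m/z, taken from the ranges quoted above; the function name and the example spectrum are hypothetical, and oils such as coconut oil would need a different window.

```python
from typing import Dict


def select_region(spectrum: Dict[float, float],
                  low: float, high: float) -> Dict[float, float]:
    """Keep only the bins whose centre lies within [low, high]."""
    return {mz: intensity for mz, intensity in spectrum.items()
            if low <= mz <= high}


# Hypothetical binned spectrum with matrix, TAG and high-mass bins.
spectrum = {177.25: 3000, 881.25: 10000, 907.25: 4000, 1059.75: 500}
tag_region = select_region(spectrum, 870.0, 910.0)
# tag_region == {881.25: 10000, 907.25: 4000}
```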
[0133] Accordingly, the present disclosure provides the ability for users to view, for example, by browsing the reference spectra of multiple different types of edible oils, and in many cases, for multiple samples of a particular edible oil.
[0134] This sample reference data may be stored as a series of raw numerical data, but represented for ease of human interpretation in the graphical format depicted.
[0135] Referring now to
[0136] Referring now to
[0137] For the sample identified at 92 as butter, the correlation score is relatively lower (0.9410) meaning that there is not as much confidence in the presumptive identification.
[0138] Similarly at 94, the edible oil identified is pumpkin seed oil, however the correlation score is also relatively low.
[0139] (Note: The low score may depend on the usage. If the user just wants to identify an unknown with no other information, the user may be satisfied with a score of 0.97, as it would be the closest match.
[0140] However, if the user wants to know if an oil with known type has been adulterated or heated for some time, the threshold for correct matching needs to be higher, as a lower score would mean that the TAGs in the oil have been changed somehow.
[0141] Generally speaking, based upon the experimental data obtained to date a threshold score of 0.97 seems to be a reasonable level for most purposes.)
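The threshold discussed in the note above could be applied as a simple comparison. In this sketch the function name and the returned labels are hypothetical; the default of 0.97 is the level suggested in the description, and a stricter value would be passed when checking a known oil type for adulteration or heating.

```python
def interpret_score(score: float, threshold: float = 0.97) -> str:
    """Label a similarity score relative to a user-chosen threshold."""
    if score >= threshold:
        return "presumptive match"
    return "low confidence"


butter_result = interpret_score(0.9410)   # score quoted for the butter sample
strict_result = interpret_score(0.975, threshold=0.99)
# butter_result == "low confidence"; strict_result == "low confidence"
```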
[0142] These results may be compared to the high correlation score shown in
[0143] Accordingly, the user reviewing these results in view of the weak correlation scores would be less confident of the presumptive identifications of the latter samples.
[0144] Referring now to
[0145] Alternatively, if these samples were obtained from an unknown sample the system would generate a poor correlation score against the oil spectra in the database, and the user would know that the oil has been somehow modified (adulterated, heated, stored for too long, etc.).
[0146] Referring now to
[0147] Similarly, referring now to
[0148] This is apparent when the spectrum or the poorly calibrated spectrum of
[0149] Another example of poor calibration and resolution can be seen in
[0150] Accordingly, following review and quality assurance being performed by an operator, as depicted in
[0151] Referring to
[0152] It can be seen that the problematic sample spectrum in 10a(i) has an entirely different peak pattern from the flaxseed oil samples depicted in 10a(ii) and 10a(iii). Following the identification of the mismatch between the claimed reference sample of flaxseed oil and the other reference samples in the reference library, the problematic sample could be subjected to a GC-FID analysis. As depicted in
[0153] Accordingly, the integrity of the reference library for identification of the edible oil samples can be increased.
[0154] The present invention provides an advantageous, potentially scalable method of identifying edible oils. This enables the rapid detection of mislabelled edible oils, the identification of adulterated oils and gutter oils, as well as the ability to authenticate labelled oil spectra.
[0155] It is also possible for the major and minor elements of the mixed oil to be identified, through comparison with reference samples such as those depicted in
[0156] In this usage, the user can check the relative proportions claimed on the label of the edible oil with the actual detected compositions.
[0157] Another advantage of the present invention is that the MALDI-MS analysis can be conducted at an analytical laboratory and the reference library may be located at a location which is geographically remote from that analytical laboratory. The presumptive identification can then take place over the internet, with the data simply uploaded onto an appropriate website.
[0158] The analysis and presumptive identification of the edible oils can be carried out as a routine laboratory procedure, without the need for ongoing training or exhaustive statistical analysis.
[0159] The algorithm used for matching the spectra provides a reliable, scalable and efficient way of presumptively identifying a wide variety of edible oils against a reference library.
[0160] The inclusion of the automated data matching process removes the need for human matching of the reference spectra.
[0161] Unlike the previous PCA approach, the algorithm does not need to be modified if a new type of oil is added to the database.
[0162] The results can also be displayed to the user automatically, which is not possible with the previous approaches.
[0163] The above embodiments are described by way of example only. Many variations are possible without departing from the scope of the invention as defined in the appended claims.
[0164] For clarity of explanation, in some instances the present technology may be presented as including individual functional blocks including functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software.
[0165] Methods according to the above-described examples can be implemented using computer-executable processes that are stored or otherwise available from computer readable media. Such processes can comprise, for example, instructions and data which cause or otherwise configure a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Portions of computer resources used can be accessible over a network. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, or source code. Examples of computer-readable media that may be used to store instructions, information used, and/or information created during methods according to described examples include magnetic or optical disks, flash memory, universal serial bus (USB) devices provided with non-volatile memory, networked storage devices, and so on.
[0166] Devices implementing methods according to these disclosures can comprise hardware, firmware and/or software, and can take any of a variety of form factors. Typical examples of such form factors include laptops, smart phones, small form factor personal computers, and so on. Functionality described herein also can be embodied in peripherals or add-in cards. Such functionality can also be implemented on a circuit board among different chips or different processes executing in a single device, by way of further example.
[0167] The instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are means for providing the functions described in these disclosures.
[0168] Although a variety of examples and other information was used to explain aspects within the scope of the appended claims, no limitation of the claims should be implied based on particular features or arrangements in such examples, as one of ordinary skill would be able to use these examples to derive a wide variety of implementations.
[0169] Further and although some subject matter may have been described in language specific to examples of structural features and/or method steps, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to these described features or acts. For example, such functionality can be distributed differently or performed in components other than those identified herein. Rather, the described features and steps are disclosed as examples of components of systems and methods within the scope of the appended claims.