IR SPECTRA MATCHING METHODS
20200284719 · 2020-09-10
Inventors
- Razib Iqbal (Springfield, MO, US)
- Keiichi Yoshimatsu (Springfield, MO, US)
- Joshua Ellis (Springfield, MO, US)
CPC classification
G01N2021/3196
PHYSICS
International classification
Abstract
Spectra matching is widely used in various applications, including searching an existing spectral database for the spectrum of an unknown or subject material, chemical, or compound, and quality control by comparing the spectra of products with standards. New systems and methods are described for identifying an unknown compound by calculating the similarities of Fourier-transform infrared (FTIR) spectra of organic compounds. The systems and methods calculate spectral similarity incrementally, based on local spectral shapes. This reduces the bias caused by uneven weighting of large or broad peaks. In addition, the new systems and methods are tolerant to common issues in spectra matching, including baseline offset, baseline sloping, and deviations in wavenumber-axis alignment, suggesting their robustness and practical applicability.
Claims
1. A method for identifying a subject material having a subject spectrum, the method comprising the steps of: providing a database containing a plurality of reference spectra for a plurality of reference materials; calculating a set of normalized local change values for the subject spectrum and for each reference spectrum in the plurality of reference spectra; calculating a spectral similarity value to the subject spectrum for each reference spectrum in the plurality of reference spectra in the database using the sets of normalized local change values; and identifying the subject material as the reference material with the highest spectral similarity value with the subject spectrum.
2. The method of claim 1 wherein each spectrum comprises a plurality of wavenumber data values and an absorbance data value associated with each of the plurality of wavenumber values, and wherein the step of calculating a set of normalized local change values for a spectrum further comprises calculating a normalized local change value for each of a plurality of the wavenumber values in the spectrum.
3. The method of claim 2 wherein the step of calculating a set of normalized local change values for a spectrum further comprises the steps of: selecting a range value; selecting absorbance data values to calculate each normalized local change value using the range value.
4. The method of claim 3 wherein the step of selecting a range value comprises the steps of: selecting at least one known spectrum associated with a first reference spectrum; calculating a spectral similarity value to the at least one known spectrum for each reference spectrum in the plurality of reference spectra utilizing a plurality of range values; calculating a success rate and a minimum average hit index for the reference spectra; and selecting a range value from the plurality of range values associated with the maximum success rate and minimum average hit index for the reference spectra.
5. The method of claim 2 wherein the step of calculating a set of normalized local change values for a spectrum further comprises the steps of: selecting a floor value for the spectrum; replacing all absorbance data values in the spectrum that are less than the floor value with the floor value.
6. The method of claim 5 wherein the step of selecting a floor value for the spectrum comprises the steps of: calculating an average absorbance value of the absorbance data values associated with the spectrum; multiplying the average absorbance value by a floor multiplier value.
7. The method of claim 6 wherein the floor multiplier value is selected by the steps of: selecting at least one known spectrum associated with a first reference spectrum; calculating a spectral similarity value to the at least one known spectrum for each reference spectrum in the plurality of reference spectra utilizing a plurality of floor multiplier values; calculating a success rate and a minimum average hit index for the reference spectra; and selecting a floor multiplier value from the plurality of floor multiplier values associated with the maximum success rate and minimum average hit index for the reference spectra.
8. The method of claim 2 wherein the step of calculating a spectral similarity value to the subject spectrum for each reference spectrum comprises the steps of: associating a wavenumber value from the subject spectrum with a wavenumber value from the reference spectrum; calculating a spectral difference value using the normalized local change value associated with the wavenumber value from the subject spectrum and the normalized local change value associated with the wavenumber value from the reference spectrum; converting the spectral difference value to a spectral similarity value.
9. The method of claim 8 wherein the step of associating a wavenumber value from the subject spectrum with a wavenumber value from the reference spectrum comprises: selecting the closest wavenumber values from the subject spectrum and the reference spectrum that are associated with an absorbance value; and associating the selected closest wavenumber values.
10. The method of claim 9 wherein the wavenumber values are selected only if the difference between the wavenumber values is less than or equal to a chosen wavenumber value for the reference spectra.
11. The method of claim 8 further comprising repeating the step of associating a wavenumber value for as many wavenumbers in the reference spectrum as possible.
12. A method for identifying a reference spectrum for an unknown subject spectrum from among a plurality of reference spectra, each spectrum comprising a plurality of wavenumber values, each wavenumber value associated with an absorbance value, the method having the steps of: calculating a set of normalized local change values for each spectrum of the subject spectrum and the plurality of reference spectra; calculating a spectral similarity value for each reference spectrum using the set of normalized change values for the subject spectrum and the set of normalized change values for the reference spectrum; identifying the reference spectrum for the unknown subject spectrum by selecting the reference spectrum with the maximum spectral similarity value.
13. The method of claim 12 wherein each set of normalized change values comprises a plurality of wavenumber values each associated with a normalized local change value, and the step of calculating a set of normalized local change values comprises the steps of: calculating a normalized local change value for each of a plurality of target wavenumber values in the spectrum; associating the calculated normalized local change value with the target wavenumber.
14. The method of claim 13 wherein the step of calculating a normalized local change value for a target wavenumber value further comprises the steps of: selecting a range value; selecting a floor multiplier value; selecting the absorbance values in the spectrum associated with wavenumber values in a band extending from the target wavenumber plus and minus the range value.
15. The method of claim 14 wherein the step of calculating a normalized local change value for a target wavenumber value further comprises the step of calculating the ratio of the sum of the absorbance values for wavenumbers in the band greater than the target wavenumber over the sum of the absorbance values for wavenumbers in the band.
16. The method of claim 15 wherein the step of calculating a spectral similarity for a reference spectrum and the subject spectrum further comprises pairing a target wavenumber from the set of normalized local change values for the reference spectrum with a target wavenumber from the set of normalized local change values for the subject spectrum.
17. The method of claim 16 wherein the step of calculating a spectral similarity for a reference spectrum and the subject spectrum further comprises calculating a spectral difference value based on the absorbance value for the paired target wavenumbers from the reference spectrum and the subject spectrum.
18. The method of claim 12 further comprising repeating the step of associating a wavenumber value for as many wavenumbers in the reference spectrum as possible.
Description
DETAILED DESCRIPTION
[0028] The normalized local change (NLC) approach and embodiments of the novel method incorporate information about the variation of a spectrum in a range around each wavenumber to calculate a value, NLC_k, for that wavenumber. In various embodiments, the range and other parameters of the method may be varied to achieve optimal performance. The value of NLC_k may be calculated as in Equation 2, where A is the set of absorbance values for a spectrum, A_i is the absorbance value for wavenumber i, and r is the range. A graphical depiction of this calculation is shown in
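Equation 2 itself is not reproduced in this text. One reading consistent with claim 15 and with paragraph [0029], offered as an assumption rather than the patent's exact formula, sums r absorbance values on each side of the target wavenumber k and excludes A_k itself, so that a locally flat spectrum gives exactly 0.5. (Claim 15 can also be read as including A_k in the denominator.)

```latex
L = \sum_{i=k-r}^{k-1} A_i, \qquad
R = \sum_{i=k+1}^{k+r} A_i, \qquad
\mathrm{NLC}_k(A) = \frac{R}{L + R}
```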
[0029] In Equation 2, NLC_k is the ratio R/(L+R), where R is the sum of the absorbance values for wavenumbers in the band above k and L is the corresponding sum on the lower side of the band (see claim 15). Because L and R are non-negative, 0 ≤ R ≤ L+R, so NLC_k lies between 0 and 1. Values of NLC_k between 0 and 0.5 indicate that the absorbance A decreases as wavenumber increases near k, and values between 0.5 and 1 indicate that the absorbance A increases as wavenumber increases near k.
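A minimal Python sketch of the per-wavenumber NLC calculation follows. It assumes a uniformly sampled spectrum with one data point per wavenumber step, so the range r is a count of samples, and it excludes the target sample k from both sums so that a flat region gives 0.5. Both choices, the function name `nlc_values`, and the 0.5 returned when a window is all zeros are illustrative assumptions, not the patent's specification. Points too close to either end for a full window are left as `None`.

```python
from typing import List, Optional, Sequence


def nlc_values(absorbance: Sequence[float], r: int) -> List[Optional[float]]:
    """Normalized local change NLC_k = R / (L + R) for each index k.

    Assumptions (hypothetical): one sample per wavenumber step, so r counts
    samples; the target sample k is excluded from both L and R.
    Indices without r samples on both sides are returned as None.
    """
    n = len(absorbance)
    out: List[Optional[float]] = [None] * n
    for k in range(r, n - r):
        left = sum(absorbance[k - r:k])           # L: the r samples below k
        right = sum(absorbance[k + 1:k + r + 1])  # R: the r samples above k
        total = left + right
        # Hypothetical rule for an all-zero window: treat it as flat (0.5).
        out[k] = right / total if total > 0 else 0.5
    return out
```

For a steadily rising trace, `nlc_values([1, 2, 3, 4, 5], 2)` returns `[None, None, 0.75, None, None]`: the only full window has L = 3 and R = 9, and a value above 0.5 signals absorbance increasing with wavenumber.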
[0030] Once the NLC_k values have been calculated for all k in a spectrum A, the spectral similarity SS_NLC may be calculated by comparing the NLC data sets of two spectra. In some embodiments of the inventive method, the spectral similarity SS_NLC of two spectra A1 and A2 is calculated according to Equation 3, where SD_NLC is the spectral difference and the total length is the wavenumber length of the spectra.
[0031] In this embodiment, SD_NLC, as shown in Equation 4, is the sum of the differences between NLC_k(A1) and NLC_k(A2) over all wavenumbers k, divided by the total wavenumber length of the spectra. In some embodiments, the total length is the number of wavenumbers k for which the spectrum data set contains an absorbance value.
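Equations 3 and 4 are likewise not reproduced here. The sketch below assumes that the "difference" in Equation 4 is the absolute difference, that the total length equals the number of paired points, and that Equation 3 converts difference to similarity as SS_NLC = 1 - SD_NLC. These assumptions keep SS_NLC between 0 and 1, with higher values meaning greater similarity, but the published formulas may differ.

```python
from typing import Sequence


def spectral_difference(nlc1: Sequence[float], nlc2: Sequence[float]) -> float:
    """SD_NLC: summed absolute NLC differences divided by the number of paired points."""
    if not nlc1 or len(nlc1) != len(nlc2):
        raise ValueError("expected two non-empty, equally long, paired NLC sequences")
    return sum(abs(a - b) for a, b in zip(nlc1, nlc2)) / len(nlc1)


def spectral_similarity(nlc1: Sequence[float], nlc2: Sequence[float]) -> float:
    """SS_NLC, assuming SS = 1 - SD so that identical NLC sequences score 1.0."""
    return 1.0 - spectral_difference(nlc1, nlc2)
```

Because every NLC value lies in [0, 1], each absolute difference does too, so the averaged difference and its complement both stay in [0, 1].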
[0032] When comparing two spectra, two NLC_k values, one from each spectrum, must be paired for the calculation of SD_NLC. Ideally, a wavenumber k is associated with an NLC_k value in both spectra. In some situations, however, a wavenumber k has an NLC_k value in only one of the two spectra. In some embodiments, the NLC_k value in one spectrum may be paired with an NLC_k value at a different wavenumber k in the other spectrum. In some embodiments, an NLC_k value in the first spectrum is paired with the NLC_k value in the second spectrum at the closest wavenumber k. In some embodiments, an NLC_k(A1) value for a given wavenumber k1 in a first spectrum A1 is disregarded by the NLC method if the second spectrum A2 has no NLC_k(A2) value at a wavenumber k2 within some proximity to k1. In some embodiments, k2 must be within seven wavenumbers of k1 for the associated NLC_k values to be paired. In other embodiments, a narrower or wider window for matching NLC_k values between spectra may be used.
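The closest-wavenumber pairing with a proximity limit can be sketched as follows. Each spectrum is represented as a hypothetical dictionary mapping wavenumber to NLC value. The default `max_gap` of 7 follows the seven-wavenumber embodiment, and allowing several reference points to pair with the same subject point is a simplifying assumption.

```python
import bisect
from typing import Dict, List, Tuple


def pair_nlc(ref: Dict[float, float], subj: Dict[float, float],
             max_gap: float = 7.0) -> List[Tuple[float, float]]:
    """Pair each reference NLC value with the subject NLC value at the closest wavenumber.

    Reference points with no subject wavenumber within max_gap are dropped.
    Returns (reference NLC, subject NLC) tuples in ascending reference wavenumber.
    """
    subj_k = sorted(subj)
    pairs: List[Tuple[float, float]] = []
    for k in sorted(ref):
        i = bisect.bisect_left(subj_k, k)
        # The nearest subject wavenumber is at position i - 1 or i in the sorted list.
        candidates = subj_k[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda s: abs(s - k))
        if abs(nearest - k) <= max_gap:
            pairs.append((ref[k], subj[nearest]))
    return pairs
```

The resulting pairs can be fed directly into an SD_NLC calculation; the number of pairs then plays the role of the total length.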
[0033] In some embodiments of the NLC method, the data values in each spectrum may be padded with absorbance values of 0 for wavenumbers below the lowest wavenumber in the spectrum and above the highest wavenumber in the spectrum. In some embodiments, the padded data values extend at least r wavenumbers beyond the lowest and highest k values in the spectrum, where r is the range used by the embodiment of the NLC method. This allows the NLC method to compare data values all the way to each end of the spectrum.
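Zero padding can be sketched as below, assuming uniformly spaced wavenumbers and a range r expressed as a number of samples; the function name `pad_spectrum` is hypothetical. After padding, a full window of r points exists on both sides of every original data point.

```python
from typing import List, Sequence, Tuple


def pad_spectrum(wavenumbers: Sequence[float], absorbance: Sequence[float],
                 r: int) -> Tuple[List[float], List[float]]:
    """Extend a uniformly spaced spectrum by r zero-absorbance points on each side."""
    if len(wavenumbers) < 2 or len(wavenumbers) != len(absorbance):
        raise ValueError("need at least two paired, uniformly spaced points")
    step = wavenumbers[1] - wavenumbers[0]
    # New wavenumbers continue the original spacing below and above the data.
    low = [wavenumbers[0] - step * j for j in range(r, 0, -1)]
    high = [wavenumbers[-1] + step * j for j in range(1, r + 1)]
    padded_w = low + list(wavenumbers) + high
    padded_a = [0.0] * r + list(absorbance) + [0.0] * r
    return padded_w, padded_a
```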
[0034] In some methods of identifying a material, chemical, or compound using the NLC method, the system receives a spectrum A_U of an unknown compound. The spectrum A_U may be collected by an infrared spectrometer or a similar instrument. It is processed to create a dataset of NLC_k(A_U) values for all or a portion of the absorbance values A_U(k) in the unknown spectrum. In some embodiments, a second spectrum A_R of a known reference compound is processed to generate a dataset of NLC_k(A_R) values for all or a portion of the absorbance values A_R(k) in the reference spectrum. The spectral difference SD_NLC(A_U, A_R) is then calculated and converted to a spectral similarity value SS_NLC. This value lies between 0 and 1, and higher values indicate greater similarity between the two spectra.
[0035] In some embodiments, a reference database is provided containing a plurality of spectra for comparison with the spectrum of an unknown compound. In some embodiments, the database may contain the actual absorbance data values A_R(k) for each reference compound c. In other embodiments, the database may contain datasets of the processed NLC_k(A_R) values for each reference compound c.
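Identification against a database of precomputed NLC datasets then reduces to scoring every reference compound and taking the one with the highest similarity. The sketch assumes every NLC dataset is already aligned to one shared wavenumber grid and uses the assumed conversion SS_NLC = 1 - SD_NLC with absolute differences. The names `similarity` and `rank_references` and the string compound keys are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def similarity(nlc_u: Sequence[float], nlc_r: Sequence[float]) -> float:
    """SS_NLC on a shared grid, assuming SS = 1 - mean absolute NLC difference."""
    if not nlc_u or len(nlc_u) != len(nlc_r):
        raise ValueError("NLC datasets must be non-empty and on the same grid")
    return 1.0 - sum(abs(a - b) for a, b in zip(nlc_u, nlc_r)) / len(nlc_u)


def rank_references(nlc_unknown: Sequence[float],
                    database: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score every reference compound and sort from most to least similar."""
    scores = [(name, similarity(nlc_unknown, nlc_ref))
              for name, nlc_ref in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

The first entry of the ranked list is the identified compound; the position of the correct compound in that list is what a hit-index metric would measure.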
[0036] In some embodiments of the method of identifying a material, additional processing of the spectrum before calculating the NLC data values further improves the performance of the NLC method. In some embodiments, a floor value is selected to prevent small peaks in regions of the spectrum with near-zero absorbance from incorrectly biasing the NLC method. When the floor pre-processing step is used, each absorbance value A_k in the data set representing the spectrum is checked, and any value A_k below the floor value is replaced with the floor value. In some embodiments, the floor value F for a spectrum A is determined by multiplying the average of the absorbance values A_k of the spectrum by a Floor Multiplier value.
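The floor step can be sketched in a few lines. The function name `apply_floor` is hypothetical, and the sketch assumes the average is taken over the original, unfloored absorbance values, as the description of F suggests.

```python
from typing import List, Sequence


def apply_floor(absorbance: Sequence[float], floor_multiplier: float) -> List[float]:
    """Replace every absorbance below F = mean(A) * floor_multiplier with F."""
    if not absorbance:
        raise ValueError("spectrum has no absorbance values")
    floor = sum(absorbance) / len(absorbance) * floor_multiplier
    return [max(a, floor) for a in absorbance]
```

For example, `apply_floor([0.0, 1.0, 3.0, 4.0], 0.5)` uses F = 2.0 × 0.5 = 1.0 and returns `[1.0, 1.0, 3.0, 4.0]`, so the near-zero region no longer contributes tiny ratios that could swing NLC_k.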
[0037] Referring to
[0038] Referring to
[0039] In some embodiments of the inventive method, the Range and Floor Multiplier parameters are set to predetermined constant values. In some embodiments, the success rate of the NLC method is analyzed over various values of Range and Floor Multiplier, using a known spectrum tested against a reference database, to select the combination of parameter values that maximizes the success rate and minimizes the average hit index.
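Paragraph [0039] and claims 4 and 7 describe choosing Range and Floor Multiplier by testing candidate values against known spectra. The sketch below runs a hypothetical `evaluate` callback for every pair in a grid and keeps the pair with the highest success rate, breaking ties by the lowest average hit index. That lexicographic rule is an assumption: the text describes comparing the lower performance of the two metrics for each pairing, but the figure defining that combination is not available here.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical callback: evaluate(range_value, floor_multiplier) -> (success_rate, average_hit_index)
Evaluator = Callable[[int, float], Tuple[float, float]]


def select_parameters(ranges: Iterable[int], multipliers: Iterable[float],
                      evaluate: Evaluator) -> Tuple[int, float]:
    """Return the (Range, Floor Multiplier) pair with the highest success rate,
    breaking ties by the lowest average hit index."""
    best_key: Optional[Tuple[float, float]] = None
    best_pair: Optional[Tuple[int, float]] = None
    for r, m in product(ranges, multipliers):
        success, avg_hit = evaluate(r, m)
        key = (success, -avg_hit)  # larger success first, then smaller hit index
        if best_key is None or key > best_key:
            best_key, best_pair = key, (r, m)
    if best_pair is None:
        raise ValueError("empty parameter grid")
    return best_pair
```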
[0041] In order to find the optimal values for the two parameters for a given embodiment, the lower performance of the two metrics for each pairing is depicted in
[0042] In
[0043] The success rate and average hit index metrics may also be used to compare the performance of the NLC method with other commonly used methods. Table 2 depicts the values of these metrics for different methods of comparison using database d.
Table 2
Method   Success Rate   Average Hit Index
NLC      93.33%         1.28
COR      89.17%         1.47
DPN      88.33%         1.55
fd-MA    88.33%         2.23
EUC      83.33%         3.03
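The two metrics in Table 2 can be computed from the rank at which the correct reference appears for each query spectrum. The definitions below are assumptions inferred from how the metrics are used: the hit index is taken to be the 1-based rank of the correct reference in the sorted result list, and the success rate is the fraction of queries whose correct reference is ranked first. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def success_and_average_hit_index(hit_indices: Sequence[int]) -> Tuple[float, float]:
    """hit_indices[j] is the 1-based rank of the correct reference for query j.

    Returns (success rate as a fraction, average hit index).
    """
    if not hit_indices or min(hit_indices) < 1:
        raise ValueError("hit indices must be non-empty, 1-based ranks")
    n = len(hit_indices)
    success_rate = sum(1 for h in hit_indices if h == 1) / n
    return success_rate, sum(hit_indices) / n
```

For hypothetical ranks `[1, 1, 2, 4]`, the function returns `(0.5, 2.0)`. A perfect method would score a success rate of 1.0 and an average hit index of 1.0.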
[0044] Spectra may be subject to artifacts of the data capture process, equipment calibration issues, or other factors that introduce artificial dissimilarities between a spectrum and the reference spectra. For example, a spectrum may be offset by some amount, so that it resembles the reference spectrum for that compound but is shifted up or down relative to it. Another common artifact is baseline sloping, in which the spectrum is tilted up or down across the wavenumber range. Yet another common artifact is a shift along the wavenumber axis, so that the spectrum is displaced left or right of the reference spectrum for that compound. The NLC method is less sensitive to these artifacts than other methods and is thus less likely to misidentify a spectrum because of them.
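Robustness to these three artifacts can be probed by perturbing a reference spectrum and checking whether a matching method still ranks the original reference first. The helpers below generate such perturbed copies. Their names, the linear-in-sample-index slope model, and the edge-value fill used for axis shifts are illustrative assumptions; the patent does not state how its perturbations were generated.

```python
from typing import List, Sequence


def add_offset(absorbance: Sequence[float], offset: float) -> List[float]:
    """Baseline offset: shift every absorbance up (or down) by a constant."""
    return [a + offset for a in absorbance]


def add_slope(absorbance: Sequence[float], slope: float) -> List[float]:
    """Baseline sloping: add a term that grows linearly with the sample index."""
    return [a + slope * i for i, a in enumerate(absorbance)]


def shift_axis(absorbance: Sequence[float], steps: int) -> List[float]:
    """Wavenumber shift: move the trace by `steps` samples (positive = toward
    higher indices), filling vacated points with the nearest edge value."""
    n = len(absorbance)
    return [absorbance[min(max(i - steps, 0), n - 1)] for i in range(n)]
```

For example, `shift_axis([1, 2, 3, 4], 1)` returns `[1, 1, 2, 3]`, displacing every feature one sample to the right.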
[0046] Artifacts like baseline offset can be caused by the presence of dust on the optical parts of the spectroscopy instrument or similar issues. As shown in
[0047] Referring now to
[0048] Referring now to
[0049] The values for Range and Floor Multiplier, and the sensitivity to offset, baseline sloping, or wavenumber shift, may differ from those depicted for the described embodiments. In the embodiments described herein, the NLC method is used with FTIR spectra. In other embodiments, the NLC method may be used with spectra from other spectroscopic techniques.
[0051] Referring specifically to
[0052] Referring now to
[0054] Overall, in comparison to the COR and DPN approaches, the NLC method considers the local characteristics of a spectrum (range-to-range comparison) without being influenced by information in the rest of the spectrum. The spectra of 2-(4-isobutylphenyl)propionic acid (ibuprofen) and propionic acid are further examples in which the NLC method matched the spectra successfully but the COR and DPN approaches failed. Visual comparison of these spectra suggests that the NLC method captures spectral features such as the location and width of both large and small peaks. These characteristics suggest that the NLC method is suitable for matching FTIR spectra, in which many absorption peaks with varied absorption cross-sections and peak widths are observed.
[0055] In a preferred embodiment, the NLC method is embodied in special-purpose software executing on a general-purpose computer. In some embodiments, the NLC method may be encoded in firmware on special-purpose computer hardware, in special-purpose integrated circuits, or in other technological processes. In some embodiments, the NLC method may be incorporated into a spectrometer and applied to each spectrum as it is captured.
[0056] Changes may be made in the above methods, devices and structures without departing from the scope hereof. Many different arrangements of the various components depicted, as well as components not shown, are possible without departing from the spirit and scope of the present invention. Embodiments of the present invention have been described with the intent to be illustrative and exemplary of the invention, rather than restrictive or limiting of the scope thereof. Alternative embodiments will become apparent to those skilled in the art that do not depart from its scope. Specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one of skill in the art to employ the present invention in any appropriately detailed structure. A skilled artisan may develop alternative means of implementing the aforementioned improvements without departing from the scope of the present invention.
[0057] It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations and are contemplated within the scope of the claims. Not all steps listed in the various figures need be carried out in the specific order described.