METHOD FOR EXTRACTING RAMAN CHARACTERISTIC PEAKS EMPLOYING IMPROVED PRINCIPAL COMPONENT ANALYSIS

Abstract

A method for extracting Raman characteristic peaks employing improved principal component analysis comprising: using a confocal microscopic Raman-spectroscopic instrument to collect Raman spectroscopic data from surfaces of pork and beef samples; and performing preprocessing on the Raman spectroscopic data, performing principal component analysis, establishing a principal component loading scatter plot, extracting dot characteristics from the principal component loading scatter plot, analyzing same, and performing filtering on the dot characteristics to obtain Raman characteristic peaks. The method for extracting Raman characteristic peaks employing improved principal component analysis is used to extract Raman characteristic peaks from pork and beef samples, and then the Raman characteristic peaks are inputted into a classifier to undergo classification, thereby achieving high accuracy and quick classification.

Claims

1. A method for extracting Raman characteristic peaks employing improved principal component analysis, comprising: 1) using a confocal microscopic Raman-spectroscopic instrument to collect Raman spectroscopic data from surfaces of pork and beef samples; 2) performing preprocessing on the Raman spectroscopic data, then performing principal component analysis, then establishing a principal component loading scatter plot, then analyzing and extracting scatter characteristics of the principal component loading scatter plot, and filtering the Raman characteristic peaks according to the scatter characteristics.

2. The method for extracting Raman characteristic peaks employing improved principal component analysis according to claim 1, wherein the scatter characteristics are a polar diameter and a polar angle of a scatter.

3. The method for extracting Raman characteristic peaks employing improved principal component analysis according to claim 1, wherein the preprocessing comprises sequentially performing smoothing and baseline correction processing.

4. The method for extracting Raman characteristic peaks employing improved principal component analysis according to claim 1, wherein Step 2) comprises:

2.1) for a Raman spectrum dataset B(B.sub.1, B.sub.2, . . . , B.sub.n) of n samples and m wave bands obtained after the preprocessing, that is, each spectrum B.sub.i containing m wave bands, adopting a random sampling method to extract 2/3 of the spectra from the dataset B to form a training set C(C.sub.1, C.sub.2, . . . , C.sub.2n/3), then performing the principal component analysis on the training set C to extract first two principal components PC.sub.1 and PC.sub.2 expressed as: $PC_{1} = \sum_{k=1}^{m} α_{1k} β_{k}$ $PC_{2} = \sum_{k=1}^{m} α_{2k} β_{k}$ where β.sub.k is a k-th wave band, α.sub.1k represents a load factor corresponding to the k-th wave band under the first principal component, and α.sub.2k represents a load factor corresponding to the k-th wave band under the second principal component;

2.2) drawing a load distribution diagram in a form of polar coordinates: establishing a two-dimensional coordinate graph with a load coefficient α.sub.1k as a horizontal axis and a load coefficient α.sub.2k as a vertical axis, taking the load coefficient α.sub.1k and the load coefficient α.sub.2k corresponding to a Raman shift point β.sub.k of a same wave band in the two principal components as scatter coordinates (α.sub.1k, α.sub.2k) drawn in the two-dimensional coordinate graph to form the principal component loading scatter plot; then converting scatters from Cartesian coordinates (α.sub.1k, α.sub.2k) into polar coordinates (d.sub.k, θ.sub.k), and dividing, according to angles of the polar coordinates, a wavelength range occupied by all scatters into eight regions D.sub.i (i=1, 2, . . . , 8) respectively being $[0, \frac{π}{4}), [\frac{π}{4}, \frac{π}{2}), [\frac{π}{2}, \frac{3π}{4}), [\frac{3π}{4}, π), [π, \frac{5π}{4}), [\frac{5π}{4}, \frac{3π}{2}), [\frac{3π}{2}, \frac{7π}{4}), [\frac{7π}{4}, 2π);$

2.3) determining positions of characteristic peaks: for each region D.sub.i, calculating a weighted distance d.sub.ik of each scatter (α.sub.1k, α.sub.2k) from a coordinate center: $d_{ik} = \frac{\sqrt{(λ_{1} α_{1k})^{2} + (λ_{2} α_{2k})^{2}}}{\sqrt{λ_{1}^{2} + λ_{2}^{2}}}$ where λ.sub.1 and λ.sub.2 respectively represent weights of the first principal component and the second principal component, and d.sub.ik represents a weighted distance of a scatter corresponding to the k-th wave band of an i-th region D.sub.i; then calculating a variance v.sub.i and a mean e.sub.i of the weighted distances d.sub.ik of all scattered points in each region, taking a maximum weighted distance d.sub.ik as a maximum polar diameter r.sub.i to then perform a following judgment, wherein for each maximum polar diameter r.sub.i, if $\frac{r_{i} - e_{i}}{v_{i}} \geq 3$ is satisfied, the Raman shift point β.sub.k corresponding to the maximum polar diameter r.sub.i is regarded as a Raman characteristic peak.

5. The method for extracting Raman characteristic peaks employing improved principal component analysis according to claim 4, wherein dividing the wavelength range occupied by all scatters into eight regions according to the angles of the polar coordinates comprises dividing the principal component loading scatter plot into eight sector-shaped regions according to the angles with the coordinate center of the plot as an origin.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0026] FIG. 1 is a flowchart of processing of Raman spectrum data according to the disclosure.

[0027] FIG. 2 shows original peak graphs of pork and beef according to the disclosure.

[0028] FIG. 3 is a principal component loading scatter plot according to the disclosure.

[0029] FIG. 4 is a principal component loading scatter plot based on polar coordinates according to the disclosure.

DETAILED DESCRIPTION OF DISCLOSED EMBODIMENTS

[0030] The disclosure will be further described below with reference to the drawings and embodiments.

[0031] As shown in FIG. 1, an embodiment of the disclosure and the implementation process are as follows.

[0032] In the example, three types of samples were distinguished: a minced meat sample of pure beef, a minced meat sample of pure pork, and an adulterated minced meat sample in which pork and beef were mixed in equal parts. The meat sources were all vacuum-packed fresh pork and beef tenderloin from animals slaughtered in the same batch (slaughtered and processed according to the standard, passed by the health and quarantine inspection department, and subjected to 24 hours of acid removal). Before the experiment, the meat was removed from the freezer, thawed in room-temperature water, and air-dried, and obvious fat and connective tissue were removed from the sample. Pork and beef were equally mixed, placed into the meat grinder, and stirred twice for 30 s each time to obtain the adulterated minced meat sample. Then, pure beef and pure pork were respectively placed into the meat grinder and stirred twice for 30 s each time to obtain the pure beef minced meat sample and the pure pork minced meat sample.

[0033] The following are the collection of Raman spectra of pork and beef, the extraction of characteristic peaks, and the method for establishing models based on the same.

[0034] A) Collection of Raman spectra of samples. In the example, a Raman spectrometer (LabRAM HR Evolution) with a 633 nm excitation light source was selected as the collection instrument. The cooling temperature of the CCD camera was −65° C., and the exposure time was 3 s. The effective power of the line laser light source was 25%. After the data of the three types of samples were collected, the data were exported in txt format and transmitted to a PC. In the example, 30 spectra were collected for each of the beef minced meat, the pork minced meat, and the adulterated minced meat samples. The Raman spectra of the beef, the pork, and the adulterated minced meat samples were respectively denoted as B.sub.i, P.sub.i, and M.sub.i (i=1, 2, . . . , 30).

[0035] B) Smoothing and denoising of Raman spectra. A smoothing window of size m=21 was specified for each spectrum. For the center point of the window, a fifth-order polynomial was used to fit the data points in the window, forming a system of 21 linear equations in six unknowns. A least squares solution of the system of equations was found to obtain the fitting parameters α.sub.j (j=0, 1, . . . , 5). The fitting parameters α.sub.j were substituted into the quintic polynomial to obtain the smoothed spectra B′.sub.i, P′.sub.i, M′.sub.i of the three types of samples.
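The windowed least squares fit described in this step can be sketched as follows. This is only an illustrative sketch, not the implementation used in the example: the function name `smooth_spectrum`, the reflective handling of the spectrum edges, and the synthetic test signal are assumptions that do not appear in the disclosure.

```python
import numpy as np

def smooth_spectrum(y, window=21, order=5):
    """Smooth a 1-D spectrum with a local least squares polynomial fit."""
    y = np.asarray(y, dtype=float)
    half = window // 2
    # Offsets -10..10 around the window center. The columns of A are
    # x**0 .. x**5, so each window gives 21 equations in 6 unknowns.
    x = np.arange(-half, half + 1, dtype=float)
    A = np.vander(x, order + 1, increasing=True)
    # Assumption: the edges are handled by reflecting the signal.
    padded = np.pad(y, half, mode="reflect")
    out = np.empty_like(y)
    for i in range(y.size):
        coeffs, *_ = np.linalg.lstsq(A, padded[i:i + window], rcond=None)
        out[i] = coeffs[0]  # value of the fitted polynomial at the center
    return out

# Hypothetical check signal: a quadratic is reproduced by a quintic fit.
t_demo = np.arange(60.0)
y_demo = 0.01 * t_demo**2 - t_demo + 3.0
smoothed_demo = smooth_spectrum(y_demo)
```

Because the fitted polynomial is evaluated at the window center (offset 0), only the constant coefficient is needed for each output point.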

[0036] C) Baseline correction of Raman spectra. For each smoothed spectral signal, an adaptive iterative reweighted penalized least squares method was adopted for baseline correction. A curve roughness penalty coefficient λ=100 was set to obtain the spectra B″.sub.i, P″.sub.i, M″.sub.i after baseline correction.
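A simplified sketch of adaptive iteratively reweighted penalized least squares is given below for orientation. It is written under several assumptions that are not stated in the disclosure: a second-order difference penalty, a dense solver that is only suitable for short spectra, a maximum of 15 iterations, a stopping tolerance of 0.001, and the names `whittaker_smooth` and `airpls_baseline`. Only the penalty coefficient λ=100 is taken from the text.

```python
import numpy as np

def whittaker_smooth(y, w, lam, diff_order=2):
    """Solve (W + lam * D^T D) z = W y for the smooth curve z."""
    m = y.size
    D = np.diff(np.eye(m), n=diff_order, axis=0)  # finite-difference operator
    A = np.diag(w) + lam * (D.T @ D)
    return np.linalg.solve(A, w * y)

def airpls_baseline(y, lam=100.0, diff_order=2, max_iter=15):
    """Estimate a baseline by iteratively down-weighting points above it."""
    y = np.asarray(y, dtype=float)
    w = np.ones(y.size)
    z = y.copy()
    for it in range(1, max_iter + 1):
        z = whittaker_smooth(y, w, lam, diff_order)
        d = y - z
        neg = d < 0
        neg_sum = np.abs(d[neg]).sum()
        # Stop when hardly any signal lies below the fitted baseline.
        if neg_sum <= 0.001 * np.abs(y).sum():
            break
        w[~neg] = 0.0  # points above the baseline are treated as peaks
        w[neg] = np.exp(it * np.abs(d[neg]) / neg_sum)
        w[0] = w[-1] = np.exp(it * d[neg].max() / neg_sum)
    return z

# Hypothetical signal: a sloping baseline plus one Gaussian peak.
t_demo = np.arange(200.0)
y_demo = 0.05 * t_demo + 5.0 + 10.0 * np.exp(-((t_demo - 100.0) / 3.0) ** 2)
corrected_demo = y_demo - airpls_baseline(y_demo, lam=100.0)
```

Subtracting the returned baseline from the smoothed spectrum gives the corrected spectrum; for full-length spectra a sparse solver would normally replace the dense `np.linalg.solve` call.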

[0037] D) Determination of a range of characteristic peaks of Raman spectra. 20 spectra were extracted from each of B″.sub.i, P″.sub.i, M″.sub.i to form a training set C.sub.i1(i1=1, 2, . . . , 60). Principal component analysis was performed on C.sub.i1 to extract the first two principal components PC.sub.1 and PC.sub.2.
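The sampling and decomposition in this step might look like the sketch below. It assumes that the load coefficients are the unit-norm eigenvectors of the covariance matrix of the mean-centered training spectra, which the disclosure does not specify; the random arrays stand in for the corrected spectra, and the names `first_two_loadings`, `classes_demo`, and `train_demo` are hypothetical.

```python
import numpy as np

def first_two_loadings(spectra):
    """Return the PC1 and PC2 loading vectors and their variances.

    spectra has shape (n_samples, m_bands). Assumption: loadings are the
    unit-norm right singular vectors of the mean-centered data matrix.
    """
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)
    return Vt[0], Vt[1], variances[:2]

# Hypothetical stand-in data: 3 sample types, 30 spectra each, 50 bands.
rng = np.random.default_rng(0)
classes_demo = [rng.normal(loc=c, scale=1.0, size=(30, 50)) for c in range(3)]

# Randomly draw 20 of the 30 spectra of each type to form the training set.
train_demo = np.vstack([
    spectra[rng.choice(30, size=20, replace=False)] for spectra in classes_demo
])
a1_demo, a2_demo, var_demo = first_two_loadings(train_demo)
```

Each entry `a1_demo[k]` and `a2_demo[k]` then plays the role of α.sub.1k and α.sub.2k for wave band k.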

[0038] A two-dimensional coordinate graph was established with a load coefficient α.sub.1k as the horizontal axis and a load coefficient α.sub.2k as the vertical axis. The load coefficient α.sub.1k and the load coefficient α.sub.2k corresponding to β.sub.k of the same wave band in the two principal components were taken as scatter coordinates (α.sub.1k, α.sub.2k), which were drawn in the two-dimensional coordinate graph to form a principal component loading scatter plot, as shown in FIG. 3.

[0039] The Cartesian coordinates (α.sub.1k, α.sub.2k) were converted into polar coordinates (d.sub.k, θ.sub.k), and the principal component loading scatter plot was then divided into eight sector-shaped regions according to the polar angles, with the coordinate center of the plot as the origin. The result is shown in FIG. 4.
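The coordinate conversion and the assignment of each scatter to one of the eight sectors can be illustrated with a short sketch. The function name `to_polar_sectors`, the zero-based sector index (0 for [0, π/4) up to 7 for [7π/4, 2π)), and the sample points are assumptions made for illustration.

```python
import numpy as np

def to_polar_sectors(a1, a2, n_sectors=8):
    """Convert loading pairs to polar form and index their sector."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    d = np.hypot(a1, a2)                          # polar diameter
    theta = np.mod(np.arctan2(a2, a1), 2 * np.pi)  # polar angle in [0, 2*pi)
    width = 2 * np.pi / n_sectors
    # The final modulo guards against an angle that rounds up to 2*pi.
    sector = np.floor(theta / width).astype(int) % n_sectors
    return d, theta, sector

# Hypothetical points, one inside each of the eight sectors.
a1_demo = [1.0, 0.5, -0.5, -1.0, -1.0, -0.5, 0.5, 1.0]
a2_demo = [0.5, 1.0, 1.0, 0.5, -0.5, -1.0, -1.0, -0.5]
d_demo, theta_demo, sector_demo = to_polar_sectors(a1_demo, a2_demo)
```

`np.arctan2` returns angles in (−π, π], so the modulo step is what maps them onto the [0, 2π) range used for the regions D.sub.i.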

[0040] E) Extraction of Raman Characteristic Peaks.

[0041] For each region D.sub.i, a weighted distance d.sub.ik of each scatter (α.sub.1k, α.sub.2k) from the coordinate center was calculated. Then, a variance v.sub.i and a mean e.sub.i were calculated according to the weighted distances d.sub.ik of all scattered points in each region. The maximum weighted distance d.sub.ik was taken as the maximum polar diameter r.sub.i, and judgment was then performed. For each maximum polar diameter r.sub.i, if

[00005] $\frac{r_{i} - e_{i}}{v_{i}} \geq 3$

was satisfied, the Raman shift point β.sub.k corresponding to the maximum polar diameter r.sub.i was regarded as a Raman characteristic peak.
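The judgment above can be sketched as follows, using the weighted distance formula of claim 4 and dividing by the variance v.sub.i exactly as the criterion is written. The function name `select_peaks`, the use of the population variance, the skipping of regions with fewer than two scatters or zero variance, and the sample values (which are not realistic loadings) are all assumptions. The weights λ.sub.1 and λ.sub.2 are passed in as plain numbers because the disclosure does not state how they are obtained.

```python
import numpy as np

def select_peaks(a1, a2, shifts, lam1, lam2, threshold=3.0):
    """Return Raman shifts whose scatter passes (r - e) / v >= threshold."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    # Weighted distance of every scatter from the coordinate center.
    d = np.sqrt((lam1 * a1) ** 2 + (lam2 * a2) ** 2) / np.sqrt(lam1**2 + lam2**2)
    theta = np.mod(np.arctan2(a2, a1), 2 * np.pi)
    sector = np.floor(theta / (np.pi / 4)).astype(int) % 8
    peaks = []
    for i in range(8):
        idx = np.flatnonzero(sector == i)
        if idx.size < 2:
            continue  # assumption: too few scatters to judge
        e = d[idx].mean()
        v = d[idx].var()  # assumption: population variance
        if v == 0:
            continue
        k = idx[np.argmax(d[idx])]  # scatter with the maximum polar diameter
        if (d[k] - e) / v >= threshold:
            peaks.append(float(shifts[k]))
    return sorted(peaks)

# Hypothetical region with one outlying scatter at 1605 cm^-1.
a1_out = np.array([0.1, 0.1, 0.1, 0.1, 0.9])
shifts_demo = [1000, 1100, 1200, 1300, 1605]
peaks_demo = select_peaks(a1_out, 0.2 * a1_out, shifts_demo, 1.0, 1.0)

# Hypothetical region with evenly spread scatters and no clear outlier.
a1_even = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
none_demo = select_peaks(a1_even, 0.2 * a1_even, shifts_demo, 1.0, 1.0)
```

At most one Raman shift is returned per region, so the number of characteristic peaks cannot exceed eight.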

[0042] In the example, the five characteristic peaks obtained by filtering were at 1605 cm.sup.−1, 1646 cm.sup.−1, 1416 cm.sup.−1, 1708 cm.sup.−1, and 2952 cm.sup.−1.

[0043] F) Establishment of a classification model of pork and beef based on Raman spectra. A training set and a test set were divided by 10-fold cross-validation repeated 10 times.

[0044] A stratified sampling method was adopted for the division, so that each mutually exclusive subset contained 3 beef samples and 3 pork samples.

[0045] The peak intensities at the five Raman characteristic peaks extracted in the above step, combined with category labels, were inputted into a classifier for training. A k-nearest neighbor classifier was adopted as the classifier. The number of nearest neighbors k in the classifier ranged from 4 to 10. A model was established for each k value of the k-nearest neighbor classifier. The model with the maximum weighted F1 score was taken as the final classification model. The final classification model was used to classify and identify the meat samples to be detected.
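A minimal sketch of this model selection is shown below, assuming Euclidean distance, majority voting, a support-weighted F1 score averaged over all folds, and integer class labels 0, 1, 2. None of these details are specified in the disclosure, and the helper names and the synthetic five-feature data are hypothetical.

```python
import numpy as np

def stratified_folds(labels, n_folds=10, seed=0):
    """Split indices into folds holding equal shares of every class."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        for j, part in enumerate(np.array_split(idx, n_folds)):
            folds[j].extend(part.tolist())
    return [np.array(f) for f in folds]

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training samples (Euclidean)."""
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in y_train[nearest]])

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to the class support."""
    classes, counts = np.unique(y_true, return_counts=True)
    total = 0.0
    for c, n in zip(classes, counts):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        total += n * (2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return total / y_true.size

def select_k(X, y, k_values=range(4, 11), n_repeats=10, n_folds=10):
    """Score every k by repeated stratified cross-validation."""
    scores = {}
    for k in k_values:
        fold_scores = []
        for rep in range(n_repeats):
            for test_idx in stratified_folds(y, n_folds, seed=rep):
                train_idx = np.setdiff1d(np.arange(y.size), test_idx)
                pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
                fold_scores.append(weighted_f1(y[test_idx], pred))
        scores[k] = float(np.mean(fold_scores))
    return max(scores, key=scores.get), scores

# Hypothetical data: 3 classes x 30 samples x 5 peak intensities.
rng_demo = np.random.default_rng(1)
X_demo = np.vstack([rng_demo.normal(10.0 * c, 0.1, size=(30, 5)) for c in range(3)])
y_demo = np.repeat(np.arange(3), 30)
best_k_demo, scores_demo = select_k(X_demo, y_demo)
```

With 30 samples per class and 10 folds, each fold holds 3 samples of every class. When several k values tie on the score, this sketch keeps the smallest one, which is a design choice of the sketch rather than something the disclosure states.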

[0046] In the model selected in the example, k=5, and the classification result is shown in Table 1.

TABLE 1 Classification result

                        Predicted result
Actual condition        Pork    Beef    Adulteration (50%)
Pork                    30      0       0
Beef                    0       30      0
Adulteration (50%)      0       0       30

[0047] It can be seen from the above table that the classification model adopting the five Raman characteristic peaks extracted by the method as input parameters can accurately distinguish the beef, pork, and adulterated meat samples. This shows that the characteristic extraction method has high accuracy and that the number of extracted characteristics is small, which effectively simplifies the model and speeds up the classification algorithm.

CPC classification: G01N21/65; G01N33/12; G01N2201/1293 (Physics)

International classification: G01N21/65; G01N33/12 (Physics)
