INTERACTION OF SPECTROSCOPY AND ARTIFICIAL INTELLIGENCE FOR SEROLOGICAL ANALYSIS AND ITS APPLICATIONS

Abstract

A spectroscopy and artificial intelligence-interaction serum analysis method includes: collecting bulk SERS spectral data of clinical serum samples; performing dimension reduction on the spectral data by using a covariance matrix to obtain the peak positions at which the spectra of cancer patients and normal individuals differ; and processing and classifying the spectral data with a support vector machine (SVM) model to obtain a cancer identification rate. Compared with conventional serum analysis methods, the method requires no antibody-antigen or other biologically specific modification, so the serum of cancer patients and normal individuals can be distinguished more cheaply, rapidly and accurately. The differing peak positions in the SERS spectra of a large number of serum samples can also be located, which provides a novel detection and analysis method at the molecular bond energy level for the field of liquid biopsy of clinical cancers.

Claims

1. A spectroscopy and artificial intelligence-interaction serum analysis method, wherein the spectroscopy and artificial intelligence-interaction serum analysis method uses silver nanowires without intrinsic Raman signal as surface enhanced Raman scattering (SERS) probes; a silver nanowire solution is directly liquid-phase mixed and co-incubated with serum samples from diseased patients and normal individuals, respectively; after incubation, serum SERS spectral data collection is performed under test of a Raman spectrometer to obtain original spectral data points; later, a dimension reduction is performed on the original spectral data points by using a covariance matrix, and spectral data points obtained by the dimension reduction are thus different peak positions of diseased samples compared with normal samples; then, a classification training and identification are further performed on the spectral data points after the dimension reduction by using a support vector machine model to finally obtain identification accuracy rates of the different diseased samples compared with the normal samples; wherein the spectroscopy and artificial intelligence-interaction serum analysis method specifically comprises the following steps:

(1) preparing a purified silver nanowire solution for later use; and in addition, centrifuging peripheral blood plasma samples of patients with different types of diseases and normal people to obtain corresponding serum samples for later use;

(2) performing liquid-phase mixing and incubation on the silver nanowire solution and all the above serum samples according to a same proportion to ensure that the silver nanowires are fully contacted with the serum, and after incubation, performing bulk SERS spectral data collection on all the samples by using the Raman spectrometer, during spectrum collection, a laser wavelength being 532 nm, a spectrum collection range being 600 cm⁻¹ to 1800 cm⁻¹, and each sample being subjected to spectrum collection for 5 times;

(3) after the spectral data of all the serum samples are collected, first performing the dimension reduction on serum SERS spectral data from different sources to remove irrelevant items in sample data points, and finally, screening effective dimensions capable of reflecting data difference, specifically: calculating an original data dimension relevancy among different samples by using the covariance matrix, and then taking data points with a lowest relevancy as effective dimensions after the dimension reduction, the effective dimensions corresponding to different peak positions among different cases;

(4) then, performing algorithm training: performing binary classification processing by taking the data points subjected to the dimension reduction as characteristic values during algorithm training and identification, dividing all the samples into a training set and a test set, and scaling data of each sample, a scaling range being [0, 1], a normalization formula used in a scaling process being:

$y' = \text{lower} + (\text{upper} - \text{lower}) \cdot \frac{y - \min}{\max - \min};$

wherein, y is data before scaling, y′ is data after scaling, lower and upper are minimum and maximum values of the data after scaling, and min/max are minimum/maximum values of the data before scaling; a corresponding support vector expansion being:

$\begin{aligned} f(x) &= w^{T} \phi(x) + b \\ &= \sum_{i=1}^{N} \alpha_{i} y_{i} \phi(x_{i})^{T} \phi(x) + b \\ &= \sum_{i=1}^{N} \alpha_{i} y_{i} k(x, x_{i}) + b \end{aligned};$

wherein, k(x, x_i) is a kernel function, and the above formula shows that an optimal solution of the support vector machine model is expanded through the kernel function of training samples; the kernel function used in the algorithm processing being a radial basis kernel function (that is, an RBF kernel function), that is:

$K(x_{i}, x_{j}) = \exp(-\gamma \|x_{i} - x_{j}\|^{2}), \quad \gamma > 0;$

γ being a hyperparameter of a Gaussian kernel function; specifically: first, converting an original problem into a convex optimization problem, the original problem being:

$\begin{aligned} \min_{w, b, \xi} \quad & \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{N} \xi_{i} \\ \text{s.t.} \quad & y_{i}(w \cdot x_{i} + b) \ge 1 - \xi_{i}, \quad i = 1, 2, \ldots, N \\ & \xi_{i} \ge 0, \quad i = 1, 2, \ldots, N \end{aligned};$

then solving the convex optimization problem; ① for a dual problem of the original problem, constructing a Lagrangian function:

$L(w, b, \xi, \alpha, \mu) = \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{N} \xi_{i} - \sum_{i=1}^{N} \alpha_{i} (y_{i}(w \cdot x_{i} + b) - 1 + \xi_{i}) - \sum_{i=1}^{N} \mu_{i} \xi_{i};$

wherein, α is a Lagrangian multiplier; w is a normal vector on plane and determines a direction of a hyperplane; b is a displacement term and represents a distance from the hyperplane to an origin; ξ represents a relaxation variable; μ is a dual variable, minimum values of w, b, and ξ are firstly solved, partial derivatives are solved respectively and the derivatives are let to be 0, then results are substituted into the original function, the maximum value of α is solved for the minimum value, and then maximum value solving is converted into minimum value solving to get the dual problem:

$\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} (x_{i} \cdot x_{j}) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \le \alpha_{i} \le C, \quad i = 1, 2, \ldots, N \end{aligned};$

selecting $K(x_{i}, x_{j}) = \exp(-\gamma \|x_{i} - x_{j}\|^{2}), \gamma > 0$ as the kernel function; ② from KKT condition establishment, obtaining:

$w^{*} = \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i}$

$b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} (x_{i} \cdot x_{j});$

wherein, C is a penalty coefficient, that is a tolerance to an error, g is a parameter of the RBF function after the RBF function is selected as the kernel function, and best parameters for C and g are selected by grid optimization through a parameter optimization tool grid.py in libsvm; a relation between σ and g being deduced from the following formula:

$k(x, z) = \exp\left(-\frac{d(x, z)^{2}}{2 \sigma^{2}}\right) = \exp\left(-gamma \cdot d(x, z)^{2}\right) \;\Rightarrow\; gamma = \frac{1}{2 \sigma^{2}};$

wherein, d(x, z) is the distance, gamma = γ, that is, a value of g is equal to the hyperparameter value of the Gaussian kernel function, and σ is a width parameter of the function; after the kernel function and the parameters C and g are selected, performing training by using the training set to obtain an SVM model for the serum SERS spectral data, a classification decision function used in this process being:

$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K(x_{i}, x) + b^{*}\right);$

wherein α* is obtained by an SMO algorithm, K(x_i, x) corresponds to the Gaussian kernel function, and b* is a threshold; selecting a hinge loss function as the loss function, λ‖w‖² being a regularization term, that is:

$\min_{w, b} \sum_{i=1}^{N} \max(0, 1 - y_{i}(w \cdot x_{i} + b)) + \lambda \|w\|^{2};$

and (5) testing the obtained model by using the test set, comparing an actual situation with a model prediction result, and finally obtaining the identification accuracy rate and outputting a result.

2. The spectroscopy and artificial intelligence-interaction serum analysis method of claim 1, wherein, in step (1), the original silver nanowire solution is centrifuged at a rotation speed of 6000 r/min.

3. An application of the spectroscopy and artificial intelligence-interaction serum analysis method of claim 1 for obtaining accuracy in cancer identification.

4. The application of claim 3, wherein, when performing the binary classification processing in step (4) of the spectroscopy and artificial intelligence-interaction serum analysis method, serum samples from normal individuals are classified into one class, and serum samples from a certain cancer patient are classified into the other class; in addition, a part of samples from cancer patients and normal individuals are subjected to algorithm training, the remaining samples are subjected to cancer identification, serum spectral data of the certain cancer patient is used as a cancer class during training and identification, serum spectral data of the normal individuals is used as a normal class independently, and finally an accuracy of cancer identification is obtained.

5. The application of claim 4, wherein the patients are lung cancer patients and colorectal carcinoma patients.

6. The application of claim 5, wherein when high-accuracy identification and different SERS peak position analysis of the lung cancer patients, the colorectal carcinoma patients and the normal individuals are performed by the spectroscopy and artificial intelligence-interaction serum analysis method, in step (3), original spectral data of each serum sample has about 1456 dimensions before the dimension reduction, and dimensions are reduced to 50 after the dimension reduction, and correspond to 50 SERS characteristic peak positions with obvious differences and belong to a source of a cancer-related database at a molecular bond energy level.

7. An application of the spectroscopy and artificial intelligence-interaction serum analysis method of claim 2 for obtaining accuracy in cancer identification.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0050] FIG. 1 shows a flow chart of a spectroscopy and artificial intelligence-interaction serum analysis method according to the present invention.

[0051] FIGS. 2A-2D show a typical SERS map and summary maps of some serum from normal individuals, colorectal carcinoma patients, and lung cancer patients in Example 2.

[0052] FIG. 3 shows a partial dimension heat map of serum SERS spectral data of 244 lung cancer samples versus 350 normal individuals in Example 2.

[0053] FIG. 4 shows a screenshot of a statistical table of 50 Raman characteristic peak positions obtained after dimension reduction of the lung cancer and normal individuals serum samples in Example 2.

[0054] FIG. 5 shows a partial dimension heat map of serum SERS spectral data of 216 colorectal carcinoma samples versus 350 normal individuals in Example 2.

[0055] FIG. 6 shows a screenshot of a statistical table of 50 Raman characteristic peak positions obtained after dimension reduction of the colorectal carcinoma and normal people samples in Example 2.

[0056] FIG. 7 shows a flow chart of identification accuracy output for colorectal carcinoma patients, lung cancer patients, and normal individuals in Example 3.

[0057] FIG. 8 shows a logic diagram of the arithmetic operation for colorectal carcinoma patients, lung cancer patients, and normal people in Example 3.

[0058] FIGS. 9A-9C show the scatter distribution charts, accuracy, and sensitivity statistical chart for three types of samples identification from colorectal carcinoma patients, lung cancer patients, and normal individuals in Example 3.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0059] The following Examples 1, 2, and 3 are provided to further illustrate the present invention, but are not to be construed as limiting the present invention. Unless otherwise specified, technical means used in the examples are conventional means well known to those skilled in the art.

Example 1

[0060] The present invention combines the SERS spectrum technique from the field of physics with an artificial intelligence technique from the field of computing. As shown in FIG. 1, the serum analysis method provided by the present invention combines the SERS technique with an artificial intelligence algorithm and completes information interaction in the sequence clinical sample collection → sample preparation → spectrum collection → data training and identification → result output, which is expected to guide clinical treatment. Finally, high-accuracy, rapid cancer identification and highly referential locating of differing peak positions are achieved. The method specifically includes the following steps:

[0061] (a) In clinical sample collection, human peripheral blood from 244 lung cancer patients, 216 colorectal carcinoma patients, 350 normal people and other different sources is extracted in this example. Each peripheral blood is centrifuged with the help of a centrifuge, the centrifugation time is 10 minutes, and the volume of the peripheral blood used is 1.5 ml. After the centrifugation, yellowish serum at the upper layer of the obtained liquid is carefully extracted to obtain the serum samples of the lung cancer patients, the colorectal carcinoma patients and the normal individuals for later use, respectively.

[0062] (b) In this example, silver nanowires are used as SERS probes. The original silver nanowire solution is prepared as follows: first, 1.665 g of polyvinylpyrrolidone (molecular weight 360000) and 0.0019 g of CuCl₂ are added to 100 ml of ethylene glycol, then stirred and dispersed uniformly in an ultrasonic cell to obtain solution A; next, 1.7 g of AgNO₃ is dissolved in 100 ml of ethylene glycol to obtain solution B. Solution A is then dropped into solution B at a uniform speed with uniform stirring. Finally, the mixed solution is transferred to a 250 ml autoclave, which is sealed and heated in an oven at 160 °C for 3 h and, after the reaction, cooled to room temperature. The original silver nanowire solution is thus obtained for later use.

[0063] The silver nanowires are centrifuged to remove impurities before the Raman spectrum test; the obtained silver nanowires have a diameter of about 100 nm and a length of 10-20 μm. The centrifugation is performed as follows: 4.5 ml of the original silver nanowire solution is centrifuged at a constant speed of 6000 r/min. After 10 min, all the supernatant is removed with a pipette, the silver nanowire precipitate is resuspended in 1 ml of deionized water, and the suspension is finally dispersed evenly with an ultrasonic cleaner to obtain the concentrated silver nanowire solution.

[0064] (c) SERS test sample preparation is then performed. First, 30 μl of the serum sample is pipetted into a 100 μl conical tube, and then 15 μl of the concentrated silver nanowire solution is added and fully mixed with the serum sample. The volume ratio of the silver nanowire solution to the serum sample is thus fixed at 1:2 (to ensure that the same amount of SERS microprobes is added to each serum sample of the same volume), and the silver nanowire microprobes are fully contacted with the serum. After 10 minutes of mixed incubation at room temperature, 30 μl of the incubated mixture is transferred to the cap of the inverted conical tube for the Raman spectrum test. The focus is first set below the liquid level of the sample with the help of a confocal microscope; the lens used for spectrum collection is a 50× confocal lens, the laser wavelength is 532 nm, and the spectrum collection range is 600 cm⁻¹ to 1800 cm⁻¹. After the same treatment, each serum sample is subjected to spectrum collection 5 times, and the total time for the 5 collections of each sample is about 15 minutes.

Example 2

[0065] After the steps of clinical sample collection → sample preparation → spectrum collection in Example 1 are completed, all collected Raman spectrum data of the 350 normal individuals, 244 lung cancer patients and 216 colorectal carcinoma patients is screened. For each sample, the spectrum with the best repeatability among its five collections is selected as the final spectrum collection result. FIG. 2A shows a typical SERS spectrogram of a normal human serum sample after screening; obvious characteristic peaks can be seen, which also confirms the extremely high detection sensitivity of the SERS technique.
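The text does not state how "best repeatability" is measured. The following is a minimal sketch under one possible reading, namely that the replicate whose spectral shape correlates best, on average, with the other four is kept. The functions `pearson` and `most_repeatable` and the toy data are illustrative assumptions, not the procedure used in the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum carries no shape information
    return cov / sqrt(var_a * var_b)


def most_repeatable(replicates):
    """Index of the replicate with the highest mean correlation to the others."""
    def mean_agreement(i):
        others = [r for j, r in enumerate(replicates) if j != i]
        return sum(pearson(replicates[i], r) for r in others) / len(others)
    return max(range(len(replicates)), key=mean_agreement)


# Toy data (assumed): five short "spectra"; the fourth is an outlier.
replicates = [
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [1.1, 2.1, 2.9, 2.0, 1.0],
    [0.9, 2.0, 3.1, 2.1, 1.1],
    [3.0, 1.0, 0.5, 1.0, 3.0],
    [1.0, 1.9, 3.0, 2.0, 0.9],
]
best = most_repeatable(replicates)
```

Other repeatability criteria (for example, the smallest deviation from the replicate mean) would fit the same description equally well.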

[0066] After all the Raman spectrum collection data is screened, SERS maps of all the serum samples from different sources can be obtained. FIG. 2B shows a serum SERS summary map of a plurality of cases of normal individuals; each serum spectrum curve of the normal individuals has obvious characteristic peak positions, and all the spectrum curves share certain characteristic peak positions. FIG. 2C and FIG. 2D show serum SERS summary maps of a plurality of cases of lung cancer patients and colorectal carcinoma patients, respectively. Each spectrum curve of the lung cancer patients and the colorectal carcinoma patients also has obvious characteristic peak positions. Although all the spectrum curves in FIGS. 2A-2D share certain characteristic peak positions, certain characteristic peaks of the cancer patients are slightly different from those of the normal individuals. Slight differences can be found by visually comparing the serum SERS summary map of the normal individuals in FIG. 2B with those of the lung cancer and colorectal carcinoma patients in FIG. 2C and FIG. 2D. However, a systematic statistical analysis of spectral data from different sources is not possible by visual comparison alone.

[0067] To address the bottleneck of analyzing spectrum data in batches, the present invention provides a method for statistically processing, analyzing and identifying a large amount of serum SERS spectrum data by means of an artificial intelligence algorithm. The algorithm tool used by the present invention is libsvm; before SVM model training and testing are performed on the serum spectrum data, all the spectral data is first converted into the format required by libsvm with the help of the weka software. The data for each sample covers the range from 600 cm⁻¹ to 1800 cm⁻¹, and this frequency range includes a total of 1456 data points. The abscissas of the SERS spectral data of all samples are the same frequencies, but the peak intensity of each sample at each frequency is different. Therefore, each frequency is regarded as an index value and the corresponding peak intensity as a dimension. In this way, the data of each sample becomes 1456-dimension data, with the 1456 dimensions sorted from low to high frequency. However, not every dimension is useful; some dimensions carry no distinguishing characteristics. Therefore, data cleaning and characteristic dimension reduction are performed next.
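The index/value representation described above matches the sparse text format read by libsvm, in which each line is a label followed by `index:value` pairs with 1-based, ascending indices. The example used weka for the conversion; the sketch below is a hand-written stand-in, and the function name `to_libsvm_line` and the toy values are assumptions for illustration.

```python
def to_libsvm_line(label, intensities):
    """Format one spectrum as a libsvm line: '<label> 1:<v1> 2:<v2> ...'."""
    features = " ".join(f"{i}:{v:g}" for i, v in enumerate(intensities, start=1))
    return f"{label} {features}"


# Toy spectrum with 4 intensity values instead of 1456 (assumed data).
spectrum = [0.5, 12.0, 3.25, 0.0]
line = to_libsvm_line(1, spectrum)
print(line)  # 1 1:0.5 2:12 3:3.25 4:0
```

One such line would be written per serum sample, with the label taken from the class assignment used for training.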

[0068] In this example, the normal individuals are placed in one class and the patients with the two types of cancer are placed in the other class during dimension reduction. Specifically, the original spectrum data of the frequency band of 600 cm⁻¹ to 1800 cm⁻¹ is divided into a plurality of effective frequency bands by taking 60 cm⁻¹ as the interval, and the relevancy between the characteristics within each frequency band is then calculated by using covariance. The relevancy degree lies between −1 and 1: the closer to −1 or 1, the greater the relevancy, and the closer to 0, the smaller the relevancy. Finally, the relevancy of frequency characteristics in different ranges is presented in the form of a heat map. FIG. 3 shows a relevancy heat map of the 244 lung cancer samples compared with the 350 normal control samples in the frequency band of 600 cm⁻¹ to 623.7705 cm⁻¹, in which the relevancy between different dimensions is clearly distributed unevenly. In the dimension reduction process, the two dimensions with the lowest relevancy are selected from every 60 continuous dimensions as effective characteristic points, and two effective dimensions are likewise selected from the remaining continuous dimensions, which number fewer than 60. The original 1456 dimensions are thus reduced to 50 dimensions (24 full groups of 60 plus one remaining group of 16, with two dimensions taken from each), which correspond to 50 characteristic Raman frequencies. Specific dimension difference details of the SERS spectra between all lung cancer patients and the normal people are shown in FIG. 4. These 50 Raman peak positions represent 50 differences between the serum SERS spectra of the lung cancer patients and those of the normal individuals. Similarly, FIG. 5 shows a relevancy heat map of the 216 colorectal carcinoma samples compared with the 350 normal control samples in the frequency band of 600 cm⁻¹ to 623.7705 cm⁻¹, in which the relevancy between different dimensions is again clearly distributed unevenly. Correspondingly, specific dimension difference details of the SERS spectra between the colorectal carcinoma patients and the normal people are shown in FIG. 6. In conclusion, the method simplifies the otherwise complicated process of locating SERS peak positions and locates the differing SERS peak positions (that is, the cancer characteristic dimensions at the molecular bond energy level) more accurately.
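The selection rule above (two lowest-relevancy dimensions per group of 60) can be sketched as follows. Two points are assumptions rather than statements from the text: the relevancy is taken to be the Pearson correlation (a covariance normalized to the range −1 to 1), and each dimension is scored by its mean absolute correlation with the other dimensions in its group. The names `correlation_matrix` and `select_dimensions` and the random data are illustrative only.

```python
import random
from math import sqrt

BLOCK = 60  # dimensions per group, as in the text
KEEP = 2    # dimensions kept per group, as in the text


def correlation_matrix(columns):
    """Pearson correlation between every pair of columns (lists of samples)."""
    unit = []
    for col in columns:
        mean = sum(col) / len(col)
        dev = [v - mean for v in col]
        norm = sqrt(sum(d * d for d in dev))
        # A constant column is given zero correlation with everything.
        unit.append([d / norm for d in dev] if norm > 0 else [0.0] * len(col))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]


def select_dimensions(spectra, block=BLOCK, keep=KEEP):
    """Sorted indices of the `keep` least-correlated dimensions in each group."""
    n_dims = len(spectra[0])
    selected = []
    for start in range(0, n_dims, block):
        idx = list(range(start, min(start + block, n_dims)))
        corr = correlation_matrix([[s[i] for s in spectra] for i in idx])

        def score(k):
            # Assumed relevancy score: mean |correlation| with the other dims.
            others = [abs(corr[k][m]) for m in range(len(idx)) if m != k]
            return sum(others) / len(others) if others else 0.0

        lowest = sorted(range(len(idx)), key=score)[:keep]
        selected.extend(sorted(idx[k] for k in lowest))
    return selected


rng = random.Random(0)
spectra = [[rng.random() for _ in range(1456)] for _ in range(20)]  # synthetic
kept = select_dimensions(spectra)
print(len(kept))  # 50
```

With 1456 dimensions the loop visits 25 groups (24 of size 60 and one of size 16), which reproduces the count of 50 retained dimensions regardless of the data.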

Example 3

[0069] After the dimension reduction in Example 2 is completed, the SERS spectrogram of each serum sample is reduced to 50 dimensions. All data is then processed according to the flow chart shown in FIG. 7, and the following two types of training and identification are performed: either the serum spectral data of a colorectal carcinoma patient is labeled 1 and the serum spectral data of a normal person is labeled 0, which is used to judge whether a sample comes from a colorectal carcinoma patient; or the serum spectral data of a lung cancer patient is labeled 1 and the serum spectral data of a normal person is labeled 0, which is used to judge whether a sample comes from a lung cancer patient.

[0070] A logic chart of the arithmetic operation in this example is shown in FIG. 8. During the arithmetic operation, the data of each of the above cases is divided into a training set and a test set at a ratio of 8:2, and the data is then scaled to the range [0, 1]. The raw data is widely scattered; scaling concentrates it and reduces the impact of singular (outlying) data. The normalization formula used in the scaling process is:

[00011] $y' = \text{lower} + (\text{upper} - \text{lower}) \cdot \frac{y - \min}{\max - \min};$

[0071] where y is the data before scaling, y′ is the data after scaling, lower and upper are the minimum and maximum values of the data after scaling, and min and max are the minimum and maximum values of the data before scaling.
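A minimal sketch of this scaling follows. It assumes that min and max are computed per dimension on the training set and then reused for the test set; the text does not say whether scaling is fitted per dimension or before or after the split, so this is one reasonable reading. The function names `fit_min_max` and `scale` and the toy numbers are illustrative.

```python
def fit_min_max(rows):
    """Per-dimension minimum and maximum over the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(rows, mins, maxs, lower=0.0, upper=1.0):
    """Apply y' = lower + (upper - lower) * (y - min) / (max - min)."""
    scaled = []
    for row in rows:
        out = []
        for y, lo, hi in zip(row, mins, maxs):
            if hi == lo:
                out.append(lower)  # constant dimension: mapped to lower (a choice made here)
            else:
                out.append(lower + (upper - lower) * (y - lo) / (hi - lo))
        scaled.append(out)
    return scaled


# Toy data (assumed): three training samples and one test sample, two dimensions.
train = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
test = [[5.0, 40.0]]
mins, maxs = fit_min_max(train)
train_scaled = scale(train, mins, maxs)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
test_scaled = scale(test, mins, maxs)    # [[0.75, 1.5]]
```

Note that a test value outside the training range (40.0 here) scales to a value outside [0, 1]; this is expected when the training-set parameters are reused.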

[0072] The corresponding support vector expansion is:

[00012] $\begin{aligned} f(x) &= w^{T} \phi(x) + b \\ &= \sum_{i=1}^{N} \alpha_{i} y_{i} \phi(x_{i})^{T} \phi(x) + b \\ &= \sum_{i=1}^{N} \alpha_{i} y_{i} k(x, x_{i}) + b \end{aligned}$

[0073] where k(x, x_i) is a kernel function, and the above formula shows that an optimal solution of the model can be expanded through the kernel function of the training samples.

[0074] The kernel function used in the algorithm processing is a radial basis kernel function (that is, an RBF kernel function). This kernel function maps samples nonlinearly to a higher-dimension space. Unlike a linear kernel, it can handle a nonlinear relationship between class labels and attributes, and it shows good performance in practical problems. The specific expression is:

[00013] $K(x_{i}, x_{j}) = \exp(-\gamma \|x_{i} - x_{j}\|^{2}), \quad \gamma > 0;$

[0075] γ is the hyperparameter of the Gaussian kernel function. Specifically:

[0076] first, the original problem is converted into a convex optimization problem;

[0077] the original problem is:

[00014] $\begin{aligned} \min_{w, b, \xi} \quad & \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{N} \xi_{i} \\ \text{s.t.} \quad & y_{i}(w \cdot x_{i} + b) \ge 1 - \xi_{i}, \quad i = 1, 2, \ldots, N \\ & \xi_{i} \ge 0, \quad i = 1, 2, \ldots, N \end{aligned}$

[0078] then the convex optimization problem is solved;

[0079] ① for the dual problem of the original problem, the Lagrangian function is constructed:

[00015] $L(w, b, \xi, \alpha, \mu) = \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{N} \xi_{i} - \sum_{i=1}^{N} \alpha_{i} (y_{i}(w \cdot x_{i} + b) - 1 + \xi_{i}) - \sum_{i=1}^{N} \mu_{i} \xi_{i};$

[0080] where α is the Lagrangian multiplier; w is the normal vector of the hyperplane, which determines the direction of the hyperplane; b is the displacement term, which represents the distance from the hyperplane to the origin; ξ represents the relaxation variable; and μ is the dual variable. The Lagrangian is first minimized with respect to w, b, and ξ: the partial derivatives are taken and set to 0, and the results are substituted into the original function. The resulting minimum is then maximized with respect to α, and this maximization is converted into a minimization to obtain the dual problem:

[00016] $\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} (x_{i} \cdot x_{j}) - \sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \le \alpha_{i} \le C, \quad i = 1, 2, \ldots, N \end{aligned};$

$K(x_{i}, x_{j}) = \exp(-\gamma \|x_{i} - x_{j}\|^{2}), \quad \gamma > 0$

is selected as the kernel function, which takes the place of the inner product (x_i · x_j);

[0081] ② from the KKT conditions, the following is obtained:

[00017] $w^{*} = \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i}$

$b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} (x_{i} \cdot x_{j});$
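The step "partial derivatives are taken and set to 0" can be written out explicitly. The block below is a standard derivation for the soft-margin SVM, given here as a worked sketch; it assumes α_i denotes the multipliers on the margin constraints and μ_i the multipliers on the constraints ξ_i ≥ 0.

```latex
\begin{aligned}
\frac{\partial L}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i}, \\
\frac{\partial L}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{N} \alpha_{i} y_{i} = 0, \\
\frac{\partial L}{\partial \xi_{i}} = 0 \;&\Rightarrow\; C - \alpha_{i} - \mu_{i} = 0 . \\[4pt]
\text{Substituting back:}\quad
\min_{w, b, \xi} L \;&=\; \sum_{i=1}^{N} \alpha_{i}
 - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} (x_{i} \cdot x_{j}) .
\end{aligned}
```

Because α_i ≥ 0 and μ_i ≥ 0, the third condition gives the box constraint 0 ≤ α_i ≤ C. Maximizing the last expression over α is the same as minimizing its negative, which is the dual problem stated in the text. The expression for b* holds for any index j whose multiplier satisfies 0 < α_j* < C.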

[0082] It should be noted that:

[0083] The parameters C and g in the present invention are the best parameters found by grid optimization with grid.py in libsvm. C is the penalty coefficient, that is, the tolerance to error. The higher C is, the less error is tolerated and the easier it is to overfit; the smaller C is, the easier it is to underfit. If C is too large or too small, generalization ability becomes worse. g is the parameter of the RBF function once the RBF function is selected as the kernel function; it implicitly determines the distribution of the data mapped to the new characteristic space. The larger g is, the fewer the support vectors; the smaller g is, the more the support vectors. The number of support vectors affects the speeds of training and prediction.
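Grid optimization of this kind evaluates a score for every (C, g) pair on a grid of powers of two and keeps the best pair. The sketch below shows only that search logic; it is not the code of grid.py. The score function `toy_score` is a hypothetical stand-in for the cross-validation accuracy that would be measured on the training set, and the grid ranges are assumptions chosen for illustration.

```python
from itertools import product


def power_of_two_grid(log2_start, log2_stop, log2_step):
    """Values 2**k for k = start, start + step, ... up to and including stop."""
    values, k = [], log2_start
    while (log2_step > 0 and k <= log2_stop) or (log2_step < 0 and k >= log2_stop):
        values.append(2.0 ** k)
        k += log2_step
    return values


def grid_search(c_values, g_values, score):
    """Return (best_score, C, g) for the pair that maximizes score(C, g)."""
    return max((score(c, g), c, g) for c, g in product(c_values, g_values))


def toy_score(c, g):
    """Hypothetical stand-in for cross-validation accuracy; peaks at C=8, g=0.25."""
    return 1.0 - 0.01 * abs(c - 8.0) - 0.1 * abs(g - 0.25)


c_values = power_of_two_grid(-1, 5, 1)  # 0.5, 1, 2, 4, 8, 16, 32
g_values = power_of_two_grid(-4, 0, 1)  # 0.0625, 0.125, 0.25, 0.5, 1
best_score, best_c, best_g = grid_search(c_values, g_values, toy_score)
```

In practice the score for each pair would come from training and validating an RBF-kernel SVM with those parameters, which is far more expensive than the toy function used here.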

[0084] The relation between σ and g is deduced from the following formula:

[00018] $k(x, z) = \exp\left(-\frac{d(x, z)^{2}}{2 \sigma^{2}}\right) = \exp\left(-gamma \cdot d(x, z)^{2}\right) \;\Rightarrow\; gamma = \frac{1}{2 \sigma^{2}};$

[0085] where d(x, z) is the distance between x and z, gamma = γ, that is, the value of g is equal to the hyperparameter γ of the Gaussian kernel function, and σ is the width parameter of the function.

[0086] In the example, when the label of serum spectral data of the colorectal carcinoma patient is 1, and the label of serum spectral data of the normal person is 0, C=8 and g=0.0488; and when the label of serum spectral data of the lung cancer patient is 1, and the label of serum spectral data of the normal person is 0, C=8 and g=0.25.
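As a worked example of the relation gamma = 1/(2σ²), the reported g values can be converted to the corresponding Gaussian width σ. The helper name `sigma_from_gamma` is an assumption introduced for illustration; only the two g values come from the text.

```python
from math import sqrt


def sigma_from_gamma(gamma):
    """Invert gamma = 1 / (2 * sigma**2) to get the Gaussian width sigma."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return sqrt(1.0 / (2.0 * gamma))


for name, g in [("colorectal carcinoma vs. normal", 0.0488),
                ("lung cancer vs. normal", 0.25)]:
    print(f"{name}: g = {g}, sigma = {sigma_from_gamma(g):.3f}")
```

The smaller g of the colorectal carcinoma model corresponds to a wider kernel (σ of about 3.2) than that of the lung cancer model (σ of about 1.41), measured in units of the scaled features.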

[0087] After the kernel function and the parameters C and g are selected, training is performed by using the training set to obtain the SVM model for the serum SERS spectral data, and the classification decision function used in this process is:

[00019] $f(x) = \operatorname{sign}\left(\sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K(x_{i}, x) + b^{*}\right);$

[0088] where α* is the optimal solution for the set of α_i satisfying the constraints, obtained by the SMO algorithm, K(x_i, x) corresponds to the Gaussian kernel function, and b* is the threshold, which is already solved in the former step.
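The decision function can be evaluated directly once the support vectors, their multipliers, and the threshold are known. The sketch below uses a hypothetical two-support-vector model rather than one trained on serum data, and it uses class labels of −1 and +1 as in the SVM formulation (the 0/1 labels of the example would be mapped accordingly). All names and numbers are illustrative assumptions.

```python
from math import exp


def rbf(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(x, support_vectors, alphas, labels, b, gamma):
    """sum_i alpha_i * y_i * K(x_i, x) + b, before taking the sign."""
    return sum(a * y * rbf(sv, x, gamma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b


def predict(x, support_vectors, alphas, labels, b, gamma):
    """Class label in {+1, -1}; a value of exactly 0 is assigned to +1 here."""
    value = decision_value(x, support_vectors, alphas, labels, b, gamma)
    return 1 if value >= 0 else -1


# Hypothetical model: one support vector per class with equal multipliers,
# so that sum_i alpha_i * y_i = 0 holds.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
b, gamma = 0.0, 0.25

near_positive = predict([0.9, 0.8], support_vectors, alphas, labels, b, gamma)
near_negative = predict([0.1, 0.2], support_vectors, alphas, labels, b, gamma)
```

A point close to the positive support vector receives the label +1 and a point close to the negative one receives −1, since the nearer support vector dominates the kernel sum.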

[0089] The hinge loss function is selected as the loss function, and λ‖w‖² is the regularization term, that is:

[00020] $\min_{w, b} \sum_{i=1}^{N} \max(0, 1 - y_{i}(w \cdot x_{i} + b)) + \lambda \|w\|^{2};$

[0090] When a sample is correctly classified, y(w·x + b) > 0; when a sample is wrongly classified, y(w·x + b) < 0. The absolute value of y(w·x + b) represents the distance between the sample and the decision boundary: the larger the absolute value, the farther the sample is from the decision boundary. When a sample is correctly classified and the function interval y(w·x + b) is at least 1, the hinge loss is 0; otherwise the loss is 1 − y(w·x + b).
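The three cases described above (correct with a function interval of at least 1, correct but inside the margin, and wrong) can be checked numerically. The sketch below evaluates the hinge loss and the regularized objective for a linear score w·x + b; the function names and toy values are assumptions for illustration.

```python
def hinge_loss(y, score):
    """max(0, 1 - y * score) for a label y in {+1, -1}."""
    return max(0.0, 1.0 - y * score)


def objective(w, b, samples, labels, lam):
    """Sum of hinge losses plus lam * ||w||^2."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in samples]
    data_term = sum(hinge_loss(y, s) for y, s in zip(labels, scores))
    return data_term + lam * sum(wi * wi for wi in w)


# Confidently correct, correct but inside the margin, and wrong.
losses = [hinge_loss(+1, 2.0), hinge_loss(+1, 0.5), hinge_loss(-1, 0.5)]
print(losses)  # [0.0, 0.5, 1.5]
```

Only the first case incurs no loss; a correctly classified sample inside the margin is still penalized, which is what pushes the solution toward a wide margin.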

[0091] The obtained model is then tested by using the test set, the actual situation is compared with the model prediction result, and finally the identification accuracy rate is obtained and the result is outputted.

[0092] FIGS. 9A-9B show scatter distribution charts for three different data sets, and it can be found that the algorithm model established by the present invention has an excellent classification effect for serum Raman data from different sources, wherein the classification and identification effects for colorectal carcinoma are slightly better than those for lung cancer. In addition, FIG. 9C shows that both lung cancer and colorectal carcinoma are identified against the normal individuals with high sensitivity, with an identification accuracy rate of at least 94.1% and a sensitivity of at least 91.84%. Specifically, lung cancer is identified with an accuracy of 94.1% at a sensitivity of 91.84%, and colorectal carcinoma is identified with an accuracy of 98.25% at a sensitivity of 97.73%. The specific identification effects approach 100%. Therefore, the spectroscopy and artificial intelligence-interaction serum analysis method provided by the present invention can realize high-accuracy cancer detection, which is of great significance to rapid, high-accuracy and non-invasive detection of clinical cancers.
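Accuracy and sensitivity of the kind reported above are computed by comparing the test-set labels with the model predictions. The sketch below uses the standard definitions (accuracy = correct / total, sensitivity = true positives / all actual positives); the function name and the toy label lists are assumptions and are not data from the study.

```python
def accuracy_and_sensitivity(actual, predicted, positive=1):
    """Accuracy = correct / total; sensitivity = TP / (TP + FN)."""
    pairs = list(zip(actual, predicted))
    correct = sum(a == p for a, p in pairs)
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = correct / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Toy labels (1 = cancer, 0 = normal).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
acc, sens = accuracy_and_sensitivity(actual, predicted)
print(acc, sens)  # 0.8 0.75
```

Sensitivity depends only on the cancer samples, so it can differ noticeably from accuracy when the classes are unbalanced, as they are with 350 normal samples against 244 or 216 cancer samples.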

[0093] In addition, it should be emphasized that, for high-accuracy cancer detection and analysis of a single serum sample, the method of the present invention takes a very short time: the whole process of sample collection → sample preparation → spectrum collection → algorithm training → identification accuracy result output takes about 1 hour, and the cost of the consumable (the silver nanowire solution) is less than ?1, excluding the cost of the detection instrument itself. This is of great significance for the current field of liquid biopsy of cancer, and may solve the problems of strong invasiveness, long detection cycle and high cost of traditional medical methods of cancer detection.

[0094] The above examples are preferred implementation modes of the present invention, but the implementation modes of the present invention are not limited by the above examples. Any other changes, modifications, substitutions, combinations, and simplifications that do not deviate from the spirit and principle of the present invention should be equivalent and included in the scope of protection of the present invention.

CPC classification

G01N2201/1296, G01N2800/7028, G01N33/54346, G01N2001/2846, G01N1/2813, G01N21/658 (PHYSICS)

International classification

G01N21/65, G01N33/543, G01N1/28 (PHYSICS)
