INTERACTION OF SPECTROSCOPY AND ARTIFICIAL INTELLIGENCE FOR SEROLOGICAL ANALYSIS AND ITS APPLICATIONS
20240210324 · 2024-06-27
Assignee
Inventors
- Xiangheng XIAO (Wuhan, CN)
- Shilian DONG (Wuhan, CN)
- Fubing WANG (Wuhan, CN)
- Changzhong JIANG (Wuhan, CN)
Cpc classification
G01N1/2813
PHYSICS
International classification
G01N33/543
PHYSICS
Abstract
A spectroscopy and artificial intelligence-interaction serum analysis method includes: collecting bulk SERS spectral data of clinical serum samples, performing dimension reduction on the spectral data by using a covariance matrix to obtain the peak positions at which the spectra of cancer patients and normal individuals differ, and performing spectral data processing and algorithm identification by using an SVM model of an artificial intelligence algorithm to obtain a cancer identification rate. Compared with conventional serum analysis methods, the spectroscopy and artificial intelligence-interaction serum analysis method requires no antibody-antigen or other biological specificity modification processes, and the serum of cancer patients and normal individuals can be identified more cheaply, rapidly, and accurately. In addition, the differing peak positions in the SERS spectra of a large number of serum samples can be located, which provides an entirely novel detection and analysis method at the molecular bond energy level for the field of liquid biopsy of clinical cancers.
Claims
1. A spectroscopy and artificial intelligence-interaction serum analysis method, wherein, the spectroscopy and artificial intelligence-interaction serum analysis method uses silver nanowires without intrinsic Raman signal as surface enhanced Raman scattering (SERS) probes; a silver nanowire solution is directly liquid-phase mixed and co-incubated with serum samples from diseased patients and normal individuals, respectively; after incubation, serum SERS spectral data collection is performed under test of a Raman spectrometer to obtain original spectral data points; later, a dimension reduction is performed on the original spectral data points by using a covariance matrix, and spectral data points obtained by the dimension reduction are thus different peak positions of diseased samples compared with normal samples; then, a classification training and identification are further performed on the spectral data points after the dimension reduction by using a support vector machine model to finally obtain identification accuracy rates of the different diseased samples compared with the normal samples; wherein the spectroscopy and artificial intelligence-interaction serum analysis method specifically comprises the following steps: (1) preparing a purified silver nanowire solution for later use; and in addition, centrifuging peripheral blood plasma samples of patients with different types of diseases and normal people to obtain corresponding serum samples for later use; (2) performing liquid-phase mixing and incubation on the silver nanowire solution and all the above serum samples according to a same proportion to ensure that the silver nanowires are fully contacted with the serum, and after incubation, performing bulk SERS spectral data collection on all the samples by using the Raman spectrometer, during spectrum collection, a laser wavelength being 532 nm, a spectrum collection range being 600 cm.sup.-1 to 1800 cm.sup.-1, and each sample being subjected to spectrum collection for 5 times; (3) after the spectral data of all the serum samples are collected, first performing the dimension reduction on serum SERS spectral data from different sources to remove irrelevant items in sample data points, and finally, screening effective dimensions capable of reflecting data difference, specifically: calculating an original data dimension relevancy among different samples by using the covariance matrix, and then taking data points with a lowest relevancy as effective dimensions after the dimension reduction, the effective dimensions corresponding to different peak positions among different cases; (4) then, performing algorithm training: performing binary classification processing by taking the data points subjected to the dimension reduction as characteristic values during algorithm training and identification, dividing all the samples into a training set and a test set, and scaling data of each sample, a scaling range being [0, 1], a normalization formula used in a scaling process being:
2. The spectroscopy and artificial intelligence-interaction serum analysis method of claim 1, wherein, in step (1), the original silver nanowire solution is centrifuged at a rotation speed of 6000 r/min.
3. An application of the spectroscopy and artificial intelligence-interaction serum analysis method of claim 1 for obtaining accuracy in cancer identification.
4. The application of claim 3, wherein, when performing the binary classification processing in step (4) of the spectroscopy and artificial intelligence-interaction serum analysis method, serum samples from normal individuals are classified into one class, and serum samples from a certain cancer patient are classified into the other class; in addition, a part of samples from cancer patients and normal individuals are subjected to algorithm training, the remaining samples are subjected to cancer identification, serum spectral data of the certain cancer patient is used as a cancer class during training and identification, serum spectral data of the normal individuals is used as a normal class independently, and finally an accuracy of cancer identification is obtained.
5. The application of claim 4, wherein the patients are lung cancer patients and colorectal carcinoma patients.
6. The application of claim 5, wherein when high-accuracy identification and different SERS peak position analysis of the lung cancer patients, the colorectal carcinoma patients and the normal individuals are performed by the spectroscopy and artificial intelligence-interaction serum analysis method, in step (3), original spectral data of each serum sample has about 1456 dimensions before the dimension reduction, and dimensions are reduced to 50 after the dimension reduction, and correspond to 50 SERS characteristic peak positions with obvious differences and belong to a source of a cancer-related database at a molecular bond energy level.
7. An application of the spectroscopy and artificial intelligence-interaction serum analysis method of claim 2 for obtaining accuracy in cancer identification.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0059] The following Examples 1, 2, and 3 are provided to further illustrate the present invention, but are not to be construed as limiting the present invention. Unless otherwise specified, technical means used in the examples are conventional means well known to those skilled in the art.
Example 1
[0060] The present invention mainly combines the SERS spectroscopy technique from the field of physics with an artificial intelligence technique from the field of computer science. As shown in
[0061] (a) Clinical sample collection. In this example, human peripheral blood from different sources, including 244 lung cancer patients, 216 colorectal carcinoma patients, and 350 normal individuals, is collected. A 1.5 ml volume of each peripheral blood sample is centrifuged for 10 minutes. After the centrifugation, the yellowish serum in the upper layer of the obtained liquid is carefully extracted to obtain the serum samples of the lung cancer patients, the colorectal carcinoma patients, and the normal individuals, respectively, for later use.
[0062] (b) In this example, silver nanowires are used as SERS probes. The original silver nanowire solution is prepared as follows: first, 1.665 g of polyvinylpyrrolidone (molecular weight 360000) and 0.0019 g of CuCl.sub.2 are added to 100 ml of ethylene glycol, then stirred and uniformly dispersed in an ultrasonic cell to obtain solution A; next, 1.7 g of AgNO.sub.3 is dissolved in 100 ml of ethylene glycol to obtain solution B. Solution A is then dropped into solution B at a uniform speed with uniform stirring. Finally, the mixed solution is transferred to a 250 ml autoclave, which is sealed and heated in an oven at 160° C. for 3 h; after the reaction, it is cooled to room temperature. The original silver nanowire solution is thus obtained for later use.
[0063] Before the Raman spectrum test, the silver nanowires are centrifuged to remove impurities; the obtained silver nanowires have a diameter of about 100 nm and a length of 10-20 μm. The centrifugation is performed as follows: 4.5 ml of the original silver nanowire solution is centrifuged at a constant speed of 6000 r/min. After 10 min, all supernatant is removed with a pipette, the obtained silver nanowire precipitate is resuspended in 1 ml of deionized water, and finally it is dispersed evenly with an ultrasonic cleaner to obtain the concentrated silver nanowire solution.
[0064] (c) SERS test sample preparation is then performed. First, 30 μl of the serum sample is pipetted into a 100 μl conical tube, and then 15 μl of the concentrated silver nanowire solution is added and fully mixed with the serum sample. The volume ratio of the silver nanowire solution to the serum sample is thus fixed at 1:2 (to ensure that the same amount of SERS microprobes is added to each serum sample of the same volume), and the silver nanowire microprobes are fully contacted with the serum. After 10 minutes of mixed incubation at room temperature, 30 μl of the incubated mixture is transferred to the cap of the inverted conical tube for the Raman spectrum test. The sample is first brought into focus below the liquid level with the help of a confocal microscope; the lens used for spectrum collection is a 50× confocal lens, the laser wavelength is 532 nm, and the spectrum collection range is 600 cm.sup.-1 to 1800 cm.sup.-1. Under the same treatment, each serum sample is subjected to spectrum collection 5 times, and the total time for the 5 collections of each sample is about 15 minutes.
Example 2
[0065] After the steps of clinical sample collection, sample preparation, and spectrum collection in Example 1 are completed, all collected Raman spectrum data of the 350 normal individuals, 244 lung cancer patients, and 216 colorectal carcinoma patients are screened. The spectrum with the best repeatability among the five spectra of each sample is selected as the final spectrum collection result.
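The text does not state how "best repeatability" is scored. By way of illustration only, the following minimal Python sketch assumes one plausible criterion: the replicate whose mean Pearson correlation with the other replicates of the same sample is highest. The function names and the toy spectra are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def most_repeatable(replicates):
    """Return the index of the replicate that agrees best with the others.

    Assumption: repeatability is the mean Pearson correlation of one
    replicate against all other replicates of the same sample.
    """
    def score(i):
        others = [r for j, r in enumerate(replicates) if j != i]
        return sum(pearson(replicates[i], r) for r in others) / len(others)
    return max(range(len(replicates)), key=score)

# Toy example: five short "spectra"; the one at index 3 is an outlier.
replicates = [
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [1.1, 2.1, 3.2, 2.0, 0.9],
    [0.9, 2.0, 2.9, 2.1, 1.0],
    [3.0, 1.0, 0.5, 1.0, 3.0],
    [1.0, 1.9, 3.1, 1.9, 1.1],
]
best = most_repeatable(replicates)
```

Other criteria (for example, the smallest distance to the mean spectrum) would fit the wording equally well.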
[0066] After all the Raman spectrum collection data is screened, SERS maps of all the serum samples from different sources can be obtained.
[0067] To address the bottleneck problem of analyzing spectrum data in batches, the present invention provides a method for statistically processing, analyzing, and identifying a large amount of serum SERS spectrum data by means of the artificial intelligence algorithm technique. The algorithm tool used by the present invention is libsvm. Before the SVM model is trained and tested with the serum spectrum data, all the spectral data are first converted into the format required by libsvm with the help of the Weka software. The data for each sample consist of data points between 600 cm.sup.-1 and 1800 cm.sup.-1, and this frequency range includes a total of 1456 data points. The abscissas of the SERS spectral data of all samples have the same frequencies, but the peak intensity of each sample at each frequency is different. Therefore, each frequency is regarded as an index, and the corresponding peak intensity is the value of that dimension. In this way, the data of each sample becomes 1456-dimensional data, with the 1456 dimensions sorted from low to high frequency. However, not every dimension is useful; some dimensions carry no distinguishing characteristics. Therefore, data cleaning and characteristic dimension reduction are performed next.
[0068] In this example, for the dimension reduction, the normal individuals are placed in one class and the patients with the two types of cancer are placed in the other class. Specifically, the original spectrum data in the frequency band of 600 cm.sup.-1 to 1800 cm.sup.-1 is divided into a plurality of effective frequency bands at intervals of 60 cm.sup.-1, and then the relevancy between the characteristics of the different frequency bands is calculated by using covariance. The relevancy degree lies between -1 and 1: the closer it is to -1 or 1, the greater the relevancy, and the closer it is to 0, the smaller the relevancy. Finally, the relevancy of the frequency characteristics in different ranges is presented in the form of a heat map.
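For illustration, the libsvm text format stores one sample per line as a label followed by `index:value` pairs with 1-based, increasing indices. The sketch below writes such a line directly in Python; it is not the Weka conversion used in this example, and the function name `to_libsvm_line` is hypothetical.

```python
def to_libsvm_line(label, intensities):
    """Format one sample as 'label index:value ...' with 1-based indices."""
    features = " ".join(
        f"{i}:{v:g}" for i, v in enumerate(intensities, start=1)
    )
    return f"{label} {features}"

# A three-point toy spectrum labelled 1 (for example, a cancer sample).
line = to_libsvm_line(1, [0.25, 0.5, 0.125])

# A full spectrum of 1456 intensities yields one label plus 1456 pairs.
full = to_libsvm_line(0, [0.0] * 1456)
```

Here `line` is the string `1 1:0.25 2:0.5 3:0.125`.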
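A relevancy bounded by -1 and 1 corresponds to a normalized covariance, that is, a Pearson correlation coefficient. The following Python sketch, given for illustration only, splits each spectrum into contiguous bands (1200 cm.sup.-1 at 60 cm.sup.-1 per band gives 20 bands), averages the intensity in each band, and computes the band-to-band correlation matrix that a heat map would display. Averaging within a band and the helper names are assumptions; the selection of the final 50 data points is not reproduced here.

```python
from math import sqrt

def split_bands(n_points, n_bands):
    """Split indices 0..n_points-1 into contiguous, near-equal bands."""
    edges = [round(k * n_points / n_bands) for k in range(n_bands + 1)]
    return [range(edges[k], edges[k + 1]) for k in range(n_bands)]

def band_means(spectrum, bands):
    """Mean intensity of one spectrum inside each band (an assumption)."""
    return [sum(spectrum[i] for i in band) / len(band) for band in bands]

def pearson(a, b):
    """Normalized covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def band_correlation(spectra, n_bands):
    """Correlation between every pair of bands, taken across samples."""
    bands = split_bands(len(spectra[0]), n_bands)
    per_sample = [band_means(s, bands) for s in spectra]
    columns = list(zip(*per_sample))  # one tuple of values per band
    return [[pearson(p, q) for q in columns] for p in columns]

# Toy data: 4 samples, 6 points, 3 bands of 2 points each.
spectra = [
    [1, 1, 2, 2, 9, 9],
    [2, 2, 4, 4, 7, 7],
    [3, 3, 6, 6, 5, 5],
    [4, 4, 8, 8, 3, 3],
]
matrix = band_correlation(spectra, 3)
```

In the toy data, bands 0 and 1 rise together (correlation 1) and band 2 falls as band 0 rises (correlation -1).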
Example 3
[0069] After the dimension reduction in Example 2 is completed, the SERS spectrogram of each serum sample can be reduced to 50 dimensions, and then all data is processed according to a flow chart shown in
[0070] A logic chart of arithmetic operation in this example is shown in
[0072] The corresponding support vector expansion is:
[0074] The kernel function used in the algorithm processing is a radial basis function kernel (that is, the RBF kernel). This kernel maps samples nonlinearly to a higher-dimensional space. Unlike a linear kernel, it can handle a nonlinear relationship between class labels and attributes, and it shows good performance in practical problems. A specific expression is:
[0075] σ is the hyperparameter of the Gaussian kernel function. Specifically: [0076] first, the original problem is converted into a convex optimization problem: [0077] the original problem:
[0080] where α is the Lagrangian multiplier; w is the normal vector of the hyperplane, which determines the direction of the hyperplane; b is the displacement term, which determines the distance from the hyperplane to the origin; ξ represents the relaxation variable; and μ is the dual variable. The function is first minimized with respect to w, b, and ξ: the partial derivatives are taken and set to 0, and the results are substituted back into the original function. The resulting minimum is then maximized with respect to α, and this maximization is converted into a minimization to get the dual problem:
is selected as the kernel function; [0081] ② from the establishment of the KKT conditions, the following is obtained:
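For reference, the standard soft-margin formulation that matches this description is sketched below in LaTeX. It is the textbook C-SVM form, given as an assumption about the notation: α_i are the Lagrangian multipliers, ξ_i the relaxation variables, μ_i the dual variables of the constraints ξ_i ≥ 0, φ the feature map, and K the kernel function.

```latex
% Original (primal) problem
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,\qquad \xi_i \ge 0

% Lagrangian
L(w,b,\xi,\alpha,\mu) = \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
 - \sum_{i=1}^{n}\alpha_i\Bigl[y_i\bigl(w^{\top}\phi(x_i)+b\bigr)-1+\xi_i\Bigr]
 - \sum_{i=1}^{n}\mu_i\,\xi_i

% Setting the partial derivatives with respect to w, b, and xi to zero
w=\sum_{i=1}^{n}\alpha_i y_i\,\phi(x_i),\qquad
\sum_{i=1}^{n}\alpha_i y_i = 0,\qquad
C=\alpha_i+\mu_i

% Dual problem, written as a minimization
\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i,x_j)-\sum_{i=1}^{n}\alpha_i
\quad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\qquad 0\le\alpha_i\le C
```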
[0082] It should be noted that:
[0083] The parameters C and g in the present invention are the best parameters found by grid optimization with grid.py in libsvm. C is the penalty coefficient, that is, the tolerance to error. The higher C is, the less error is tolerated and the easier it is to overfit; the smaller C is, the easier it is to underfit. If C is too large or too small, the generalization ability becomes worse. g is a parameter of the RBF function once the RBF function is selected as the kernel function, and it implicitly determines the distribution of the data mapped to the new characteristic space. The larger g is, the fewer support vectors there are; the smaller g is, the more support vectors there are. The number of support vectors affects the speeds of training and prediction.
[0084] A relation between σ and g is deduced from the following formula:
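The idea of grid optimization can be sketched as follows: every (C, g) pair on a power-of-two grid is scored and the best pair is kept. This Python sketch is illustrative only and does not reproduce grid.py; `evaluate` is a hypothetical callback that would normally return a cross-validated accuracy, and `toy_score` is a stand-in used only to make the example runnable.

```python
def grid_search(evaluate, log2_c_values, log2_g_values):
    """Try every (C, g) pair on a power-of-two grid and keep the best.

    `evaluate(C, g)` is a hypothetical callback returning a validation
    score (higher is better), such as cross-validated accuracy.
    """
    best = None
    for log2_c in log2_c_values:
        for log2_g in log2_g_values:
            c, g = 2.0 ** log2_c, 2.0 ** log2_g
            score = evaluate(c, g)
            if best is None or score > best[0]:
                best = (score, c, g)
    return best

# Stand-in scorer for demonstration only: it peaks at C = 8, g = 0.25.
def toy_score(c, g):
    return -abs(c - 8.0) - abs(g - 0.25)

best_score, best_c, best_g = grid_search(
    toy_score, range(-5, 16, 2), range(-15, 4)
)
```

The exponent ranges above are assumptions chosen so that the toy optimum lies on the grid.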
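As an illustration, libsvm writes its RBF kernel as exp(-g·|x - y|²). If the Gaussian kernel is written with a width sigma as exp(-|x - y|² / (2·sigma²)), which is the usual form and is assumed here, the two agree when g = 1 / (2·sigma²). The Python sketch below checks this numerically; the function names are hypothetical.

```python
from math import exp, sqrt

def squared_distance(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def rbf_sigma(x, y, sigma):
    """Gaussian kernel written with a width sigma (assumed form)."""
    return exp(-squared_distance(x, y) / (2.0 * sigma ** 2))

def rbf_g(x, y, g):
    """RBF kernel in the libsvm parameterization exp(-g * |x - y|^2)."""
    return exp(-g * squared_distance(x, y))

def g_from_sigma(sigma):
    return 1.0 / (2.0 * sigma ** 2)

def sigma_from_g(g):
    return sqrt(1.0 / (2.0 * g))

x, y = [0.2, 0.4, 0.9], [0.1, 0.7, 0.5]
sigma = sigma_from_g(0.25)          # g = 0.25 gives sigma = sqrt(2)
same = abs(rbf_sigma(x, y, sigma) - rbf_g(x, y, 0.25)) < 1e-12
```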
[0086] In the example, when the label of serum spectral data of the colorectal carcinoma patient is 1, and the label of serum spectral data of the normal person is 0, C=8 and g=0.0488; and when the label of serum spectral data of the lung cancer patient is 1, and the label of serum spectral data of the normal person is 0, C=8 and g=0.25.
[0087] After the kernel function and the parameters C and g are selected, training is performed by using the training set to obtain the SVM model for the serum SERS spectral data, and the classification decision function used in this process is:
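A kernel SVM decision function conventionally takes the sign of a weighted sum of kernel values between the input and the support vectors, plus the displacement term b. The Python sketch below shows that standard form under assumptions: `coeffs[i]` stands for the product of the multiplier and the label (in -1/+1 coding) of support vector i, and the support vectors and coefficients are toy values, not a trained model. The labels 1 and 0 used in this example would be mapped to +1 and -1.

```python
from math import exp

def rbf(x, y, g):
    """RBF kernel exp(-g * |x - y|^2)."""
    return exp(-g * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision_value(x, support_vectors, coeffs, b, g):
    """Weighted kernel sum plus displacement term b."""
    return sum(
        c * rbf(sv, x, g) for sv, c in zip(support_vectors, coeffs)
    ) + b

def classify(x, support_vectors, coeffs, b, g):
    """Return +1 or -1 according to the sign of the decision value."""
    return 1 if decision_value(x, support_vectors, coeffs, b, g) >= 0 else -1

# Toy model: one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
coeffs = [1.0, -1.0]   # multiplier times label, hypothetical values
near_first = classify([0.1, 0.1], support_vectors, coeffs, b=0.0, g=0.25)
near_second = classify([1.9, 2.0], support_vectors, coeffs, b=0.0, g=0.25)
```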
[0089] The hinge loss function is selected as loss function, is the regularization term, that is:
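For illustration, the hinge loss of a sample with label y in {-1, +1} and decision value f is max(0, 1 - y·f). The Python sketch below combines it with a regularization term; taking that term to be half the squared norm of w is an assumption based on the standard soft-margin objective, and the function names are hypothetical.

```python
def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a decision value."""
    return max(0.0, 1.0 - y * score)

def objective(w, scores, labels, c):
    """Regularization term plus C times the summed hinge losses.

    Assumption: the regularization term is 0.5 * ||w||^2.
    """
    regularization = 0.5 * sum(v * v for v in w)
    data_term = sum(hinge_loss(y, s) for y, s in zip(labels, scores))
    return regularization + c * data_term

# A sample beyond the margin costs nothing; one inside the margin does.
loss_outside = hinge_loss(1, 2.0)
loss_inside = hinge_loss(1, 0.5)
total = objective([3.0, 4.0], [2.0, 0.5], [1, 1], c=8.0)
```

With these toy numbers the regularization term is 12.5 and the data term is 8 × 0.5, so `total` is 16.5.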
[0091] The obtained model is then tested by using the test set, the actual situation is compared with the model prediction result, and finally the identification accuracy rate is obtained and the result is outputted.
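Two bookkeeping steps surround the model: the scaling to [0, 1] named in claim 1 and the accuracy computed here. The Python sketch below is illustrative only. It assumes min-max scaling applied per feature with the minimum and maximum taken from the training set, which is one reading of "scaling data of each sample", and it defines accuracy as the fraction of test samples whose predicted label matches the actual label. All names and numbers are hypothetical.

```python
def fit_min_max(train_rows):
    """Per-feature minimum and maximum taken from the training set."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def scale(row, lows, highs):
    """Min-max scale one sample; constant features map to 0.0."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(row, lows, highs)
    ]

def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose prediction matches the actual label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

train = [[1.0, 10.0], [3.0, 30.0]]
lows, highs = fit_min_max(train)
scaled = scale([2.0, 20.0], lows, highs)
rate = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Test samples scaled with training-set bounds can fall slightly outside [0, 1]; this is expected with this convention.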
[0093] In addition, it should be emphasized that, for high-accuracy cancer detection and analysis of a single serum sample, the method of the present invention takes a very short time: the whole process of sample collection, sample preparation, spectrum collection, algorithm training, and identification accuracy result output takes about 1 hour, and, apart from the cost of the detection instrument itself, the cost of the consumable (the silver nanowire solution) is less than ?1. This is of great significance for the current field of liquid biopsy of cancer, and it may solve the problems of strong invasiveness, long detection cycles, and high cost that traditional medical methods face in cancer detection.
[0094] The above examples are preferred implementation modes of the present invention, but the implementation modes of the present invention are not limited by the above examples. Any other changes, modifications, substitutions, combinations, and simplifications that do not deviate from the spirit and principle of the present invention should be equivalent and included in the scope of protection of the present invention.