CLASSIFICATION SYSTEM OF EPILEPTIC EEG SIGNALS BASED ON NON-LINEAR DYNAMICS FEATURES

Abstract

A classification system of epileptic EEG signals based on non-linear dynamics features includes a preprocessing module, a feature extraction module, a feature sorting module, a feature selection module and a classification module: the preprocessing module uses discrete wavelet transformation to remove noise in the EEG data and obtain effective EEG signal data without noise; the feature extraction module uses multiple entropy algorithms to calculate the non-linear dynamics features of each EEG signal; the feature sorting module sorts features with analysis of variance; the feature selection module uses a forward sequential feature selection algorithm to select the optimal feature subset that has the most significant impact on the accuracy of the model; and the classification module transforms the judgment of EEG during the period of epilepsy and EEG during the interval period of epilepsy into a binary classification problem by use of a least squares support vector machine algorithm.

Claims

1. A classification system of epileptic EEG signals based on non-linear dynamics features, including a preprocessing module, a feature extraction module, a feature sorting module, a feature selection module, and a classification module, wherein: the preprocessing module, for preprocessing the EEG signals, uses discrete wavelet transformation (DWT) to remove noise in the EEG data and obtain effective EEG signal data without noise; the feature extraction module, for dividing the EEG signals into several data segments, uses multiple entropy algorithms to calculate different entropy values of EEG data under the same time window as the characteristic values of the corresponding data segments and a feature set is formed by calculating the entropy values of all entropy algorithms; the feature sorting module sorts the significant influence on the classification results of epileptic EEG signals according to the entropy values of the extracted EEG signals by use of analysis of variance (ANOVA), and the more significant the influence of feature variables on classification results, the higher the sorting of the feature variables; the feature selection module uses a forward sequential feature selection (FSFS) algorithm to successively add one feature from the first most significant feature into the classification model until the accuracy of the model is no longer improved, so as to select the optimal feature subset that has the most significant impact on the accuracy of the model; and the classification module classifies EEG signals of epilepsy patients by use of a least squares support vector machine (LS-SVM) algorithm.

2. The method for classification of epileptic EEG signals based on nonlinear dynamic characteristics according to claim 1, wherein: the classification module uses the collected EEG signals as training data of a least squares support vector machine to train the classification model; and the classification model is trained according to the EEG signal database of epilepsy patients, to obtain the hyper parameters of LS-SVM and select an optimized feature subset.

3. The method for classification of epileptic EEG signals based on nonlinear dynamic characteristics according to claim 2, further comprising adding a real-time online system to perform real-time online classification of new EEG signals collected in real time through the preprocessing module, the feature extraction module, and the classification module.

4. The method for classification of epileptic EEG signals based on nonlinear dynamic characteristics according to claim 1, wherein the discrete wavelet transform used for EEG signal denoising uses a Daubechies-4 wavelet function, and an EEG signal with a frequency of 3 to 25 Hz is selected after filtering.

5. The method for classification of epileptic EEG signals based on nonlinear dynamic characteristics according to claim 1, wherein the entropy algorithms are Shannon Entropy, Conditional Entropy, Sample Entropy, and Spectral Entropy.

6. The method for classification of epileptic EEG signals based on nonlinear dynamic characteristics according to claim 2, wherein a training method for the least squares support vector machine (LS-SVM) algorithm is as follows: randomly dividing the EEG signal database of epilepsy patients into two parts: 70% and 30%, wherein 70% of the EEG data is used to train the algorithm, and the remaining 30% of the data is used to test the algorithm so as to obtain an LS-SVM model.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] Other objects and features of the invention will become apparent from the following detailed description considered in connection with the accompanying drawings. It is to be understood, however, that the drawings are designed as an illustration only and not as a definition of the limits of the invention.

[0020] In the drawings,

[0021] FIG. 1 is a structural block diagram of a method for detecting epilepsy according to the present invention; and

[0022] FIG. 2 shows original EEG signals during epileptic seizures and sub-signals of each frequency segment after DWT decomposition.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0023] The invention will be described in detail with reference to the attached drawings.

[0024] As shown in FIG. 1, the classification system of epileptic EEG signals based on non-linear dynamics features of the invention includes a preprocessing module, a feature extraction module, a feature sorting module, a feature selection module, and a classification module:

[0025] (1) Preprocessing Module

[0026] The EEG data is preprocessed. Each original single-channel EEG signal (as shown in FIG. 2) is filtered and denoised with the Daubechies-4 wavelet function. After filtering, the EEG signal components with frequencies of 3 to 25 Hz are selected, that is, the three sub-signals d3, d4, and d5.
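As a rough check on the choice of d3, d4 and d5, the nominal frequency band of each detail level can be worked out from the sampling rate. The sketch below is an illustration only: it assumes the ideal dyadic band split of a DWT (detail level k spans fs/2^(k+1) to fs/2^k) and a sampling rate derived from 4096 samples per 23.6 s recording, as reported for the experimental data. The function name `detail_bands` is hypothetical.

```python
def detail_bands(fs, levels):
    """Return {k: (low_hz, high_hz)} for detail levels 1..levels.

    Assumes the ideal dyadic split of a DWT: detail level k covers
    fs / 2**(k + 1) up to fs / 2**k. Real wavelet filters overlap at
    the band edges, so these are nominal values only.
    """
    return {k: (fs / 2 ** (k + 1), fs / 2 ** k) for k in range(1, levels + 1)}


# Assumed sampling rate: 4096 samples in 23.6 s (about 173.6 Hz).
fs = 4096 / 23.6
bands = detail_bands(fs, 5)
for k in (3, 4, 5):
    low, high = bands[k]
    print(f"d{k}: {low:.1f} to {high:.1f} Hz")
```

Under these assumptions, d3 to d5 together span roughly 2.7 to 21.7 Hz, which is close to the 3 to 25 Hz range stated above.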

[0027] (2) Feature Extraction Module

[0028] Four entropy algorithms (Shannon entropy, conditional entropy, sample entropy and spectral entropy) are used to calculate the nonlinear dynamic characteristics of the three preprocessed sub-signals respectively. The calculation methods of the four entropy algorithms are given by the following formulas:

[0029] a) SHANNON ENTROPY (ShanEn)

[0030] Given a time series $X = \{x_{i}, i = 1, 2, \ldots, N\}$, ShanEn is defined as:

$ShanEn = - \sum_{i = 1}^{N} p(x_{i}) \log(p(x_{i})),$

where $p(x_{i})$ is the probability distribution function of X with $\sum_{i = 1}^{N} p(x_{i}) = 1$ and $0 \le p(x_{i}) \le 1$.
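The ShanEn definition can be illustrated with a short sketch. The text does not say how p(x_i) is estimated, so the code below assumes a uniform histogram over the signal range; the function name `shannon_entropy` and the parameter `n_bins` are illustrative choices, not part of the described system.

```python
import math
from collections import Counter


def shannon_entropy(x, n_bins=10):
    """Histogram-based estimate of ShanEn (natural logarithm).

    Assumption: p(x_i) is estimated by counting samples in n_bins
    equal-width bins between min(x) and max(x).
    """
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0  # a constant signal occupies a single bin
    width = (hi - lo) / n_bins
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in x)
    n = len(x)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


print(shannon_entropy([0, 1, 2, 3], n_bins=4))  # four equally likely bins
```

With four equally likely bins the estimate equals log(4), the maximum for four outcomes, while a constant signal gives zero.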

[0031] b) CONDITIONAL ENTROPY (CondEn)

[0032] Given a time series $X = \{x_{i}, i = 1, 2, \ldots, N\}$, CondEn can be calculated in two steps. First, the phase space of X is reconstructed according to the sequence order, and a set of m-dimensional vectors is generated for $i \ge m$. After reconstruction, $(N - m + 1)$ new vectors are obtained: $X_{m}(i) = \{x_{i}, x_{i - 1}, \ldots, x_{i - m + 1}\}$. Each $X_{m}(i)$ vector represents a pattern of m consecutive sample points. Next, CondEn is calculated by the following formula:

[00001] $CondEn\left(\frac{m}{m - 1}\right) = - \sum_{m - 1} p_{m - 1} \sum_{m | (m - 1)} p_{m | (m - 1)} \log p_{m | (m - 1)},$

[0033] wherein $p_{m - 1}$ represents the joint probability of $X_{m - 1}(i)$, and $p_{m | (m - 1)}$ represents the conditional probability of $X_{m}(i)$ given $X_{m - 1}(i)$.
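The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the samples are first quantized into a few uniform levels so that pattern probabilities can be counted (the text does not specify a quantization), and the (m-1)-pattern is taken to be the first m-1 samples of each m-pattern. The names `quantize`, `conditional_entropy` and `n_levels` are hypothetical.

```python
import math
from collections import Counter


def quantize(x, n_levels=4):
    """Map samples to integer symbols 0..n_levels-1 (uniform bins, an assumption)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / n_levels
    return [min(int((v - lo) / width), n_levels - 1) for v in x]


def conditional_entropy(x, m=2, n_levels=4):
    """Estimate CondEn from pattern counts (natural logarithm)."""
    symbols = quantize(x, n_levels)
    # N - m + 1 patterns of m consecutive symbols.
    patterns = [tuple(symbols[i - m + 1:i + 1]) for i in range(m - 1, len(symbols))]
    joint = Counter(patterns)                   # counts of m-patterns
    prefix = Counter(p[:-1] for p in patterns)  # counts of their (m-1)-prefixes
    total = len(patterns)
    ce = 0.0
    for pattern, count in joint.items():
        p_joint = count / total                # equals p_{m-1} * p_{m|(m-1)}
        p_cond = count / prefix[pattern[:-1]]  # p_{m|(m-1)}
        ce -= p_joint * math.log(p_cond)
    return ce


print(conditional_entropy([0, 1] * 10, m=2))        # fully predictable pattern
print(conditional_entropy([0, 0, 1, 1] * 5, m=2, n_levels=2))
```

A strictly alternating series is fully predictable from the previous sample, so its estimate is zero; the second series leaves the next sample uncertain and gives a positive value.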

[0034] c) SAMPLE ENTROPY (SampEn)

[0035] Given a time series $X = \{x_{i}, i = 1, 2, \ldots, N\}$, a threshold r and a dimension m, a set of m-dimensional vectors is generated: $X_{m}(i) = \{x_{i}, x_{i + 1}, \ldots, x_{i + m - 1}\}$. The distance $d[X_{m}(i), X_{m}(j)]$ between the vectors $X_{m}(i)$ and $X_{m}(j)$ is defined as the largest absolute difference between their corresponding elements, that is,

$d[X_{m}(i), X_{m}(j)] = \max[|x(i + k) - x(j + k)|],$

[0036] wherein $m - 1 \ge k \ge 0$, $i \ne j$, $i \ge 1$, and $N - m \ge j$.

[0037] For a given threshold r, the number of vector pairs with $d[X_{m}(i), X_{m}(j)] < r$ is counted as B when the dimension is m and as A when the dimension is m+1. SampEn is then defined by the following formula:

[00002] $SampEn = - \log \frac{A}{B}.$
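The SampEn definition can be sketched directly from the counts A and B. The code below is an illustrative brute-force version: it uses the strict inequality d < r as written above, takes r as an absolute threshold, and returns infinity when no matches are found; these choices and the default values of `m` and `r` are assumptions.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Brute-force SampEn = -log(A / B) with Chebyshev distance."""
    n = len(x)

    def count_matches(dim):
        # Count ordered pairs (i, j), i != j, whose first `dim` samples
        # differ by less than r; N - m start indices for both dimensions.
        matches = 0
        for i in range(n - m):
            for j in range(n - m):
                if i == j:
                    continue
                dist = max(abs(x[i + k] - x[j + k]) for k in range(dim))
                if dist < r:
                    matches += 1
        return matches

    b = count_matches(m)      # matches at dimension m
    a = count_matches(m + 1)  # matches at dimension m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined ratio; reported as infinity here
    return -math.log(a / b)


print(sample_entropy([0, 1] * 10, m=2, r=0.5))  # perfectly regular series
```

A perfectly periodic series gives A equal to B and therefore a SampEn of zero; less regular signals give larger values.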

[0038] d) SPECTRAL ENTROPY (SE)

[0039] SE is often used to measure the degree of disorder of a signal in terms of the frequency distribution of its power spectrum. When the signal is concentrated at a single frequency, the spectral entropy SE reaches its minimum value. SE can be defined by:

$SE = - \sum_{f} p_{f} \log(p_{f}),$

[0040] where f is the frequency, and $p_{f}$ is the power spectral density at frequency f obtained from the Fourier transform.
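A small sketch of the SE calculation follows. It assumes that the power spectrum is computed with a plain discrete Fourier transform, that the mean is removed first, and that the spectrum is normalized to sum to 1 so that it can be treated as a probability distribution; none of these details are specified in the text, and the function name `spectral_entropy` is illustrative.

```python
import cmath
import math


def spectral_entropy(x):
    """Spectral entropy of a real signal (natural logarithm).

    Assumptions: direct O(N^2) DFT, mean removed, positive-frequency
    bins only, power normalized to sum to 1.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0)


n = 64
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
impulse = [1.0] + [0.0] * (n - 1)
print(spectral_entropy(tone), spectral_entropy(impulse))
```

The single-frequency tone gives a value near zero, in line with the statement that SE is minimal when the signal is concentrated at one frequency, while the impulse has a flat spectrum and gives the maximum, log(32), for 32 frequency bins.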

[0041] The above four entropies are calculated for all three decomposed EEG sub-signals. Each EEG signal segment therefore has a total of 3 × 4 = 12 entropy features, which are input into the feature sorting module for weight sorting.

[0042] (3) Feature Sorting Module

[0043] The feature sorting module ranks the nonlinear dynamic characteristics (4 types, 12 entropy values) of the extracted EEG signals by the significance of their influence on the classification results of epileptic EEG signals, using one-way analysis of variance (ANOVA). The more significant the influence of a feature variable on the classification results, the higher its rank. The 12 ranked features are input into the feature selection module to select the features that have a significant influence on the classification results.
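The ranking step can be sketched with a plain one-way ANOVA F statistic per feature. This is an illustration under assumptions: features are ranked by descending F value (which orders them the same way as ascending p-value when every feature has the same group sizes), and the feature names and values in the example are made up.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def rank_features(features, labels):
    """Rank feature names by descending F statistic.

    features: dict mapping feature name -> list of values, one per segment
    labels:   list of class labels (0 = interval period, 1 = seizure period)
    """
    classes = sorted(set(labels))
    scores = {}
    for name, values in features.items():
        groups = [[v for v, y in zip(values, labels) if y == c] for c in classes]
        scores[name] = anova_f(groups)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up values for two hypothetical features.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "shanen_d5": [0.50, 0.60, 0.40, 0.50, 0.45, 0.60],
    "sampen_d3": [0.10, 0.20, 0.15, 0.90, 1.00, 0.95],
}
print(rank_features(features, labels))
```

Here the feature whose class means differ most relative to its within-class spread is ranked first.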

[0044] (4) Feature Selection Module

[0045] The feature selection module of the invention uses a forward sequential feature selection (FSFS) algorithm to add features to the classification model one at a time, starting from the most significant feature, until the accuracy of the model no longer improves, so as to select the optimal feature subset that has the most significant impact on the accuracy of the model.
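The selection loop described above can be sketched as follows. The callback `evaluate`, which would train and score a classifier on a candidate feature subset, is a hypothetical placeholder; the toy accuracy table stands in for it and contains made-up values.

```python
def forward_select(ranked_features, evaluate):
    """Add ranked features one at a time until accuracy stops improving.

    ranked_features: feature names, most significant first
    evaluate:        callable taking a list of names, returning accuracy
    """
    selected = []
    best_score = float("-inf")
    for feature in ranked_features:
        score = evaluate(selected + [feature])
        if score <= best_score:
            break  # accuracy no longer improves: stop adding features
        selected.append(feature)
        best_score = score
    return selected, best_score


# Toy accuracy table standing in for a trained classifier (made-up values).
toy_accuracy = {
    ("f1",): 0.90,
    ("f1", "f2"): 0.95,
    ("f1", "f2", "f3"): 0.95,
}
subset, accuracy = forward_select(
    ["f1", "f2", "f3", "f4"], lambda names: toy_accuracy[tuple(names)]
)
print(subset, accuracy)
```

In this toy run the third feature does not raise the accuracy, so the loop stops and the fourth feature is never evaluated.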

[0046] (5) Classification Module

[0047] The classification module of the invention determines the seizure state of EEG signals by use of a least squares support vector machine (LS-SVM) algorithm. LS-SVM is an improved support vector machine that reduces the high computational burden of standard support vector machines, offers better real-time performance, and is often used to recognize and classify physiological signals. LS-SVM is a binary classifier. Constructing a least squares support vector machine means solving the underlying quadratic programming problem with a least squares formulation in order to find the optimal hyperplane that separates the two classes of training data. The optimal hyperplane is the classification surface that not only separates the two classes of data correctly but also maximizes the margin between them. Given N pairs of training data $\{x_{i}, y_{i}\}_{i = 1}^{N}$ (where $x_{i} \in R^{n}$ is the i-th input feature vector and $y_{i} \in R$ is the corresponding i-th category label, i.e. the seizure state of the EEG signal), the following decision function f(x) can be used to determine the category of an input x:

[00003] $f(x) = \mathrm{sign}\left[\sum_{i = 1}^{N} \alpha_{i} y_{i} K(x, x_{i}) + b\right]$

[0048] where $\alpha_{i}$ is the Lagrange multiplier obtained from training, b is the classification threshold, and $K(x, x_{i})$ is the kernel function.
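The decision function can be illustrated with a small LS-SVM sketch. It follows the commonly used LS-SVM formulation in which training reduces to one linear system in the multipliers and the threshold; the text does not give the training equations, so this formulation, the RBF kernel, the hyperparameters `gamma` and `sigma`, and the mapping of the labels 0 and 1 to -1 and +1 are all assumptions.

```python
import math


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is an assumed hyperparameter."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve_linear(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol


def train_lssvm(xs, ys, gamma=10.0, sigma=1.0):
    """Return (alpha, b); labels in ys must be -1 or +1."""
    n = len(xs)
    system = [[0.0] + [float(y) for y in ys]]
    for i in range(n):
        row = [float(ys[i])]
        for j in range(n):
            omega = ys[i] * ys[j] * rbf_kernel(xs[i], xs[j], sigma)
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    sol = solve_linear(system, [0.0] + [1.0] * n)
    return sol[1:], sol[0]


def predict(x, xs, ys, alpha, b, sigma=1.0):
    """f(x) = sign[sum_i alpha_i * y_i * K(x, x_i) + b]."""
    total = sum(a * y * rbf_kernel(x, xi, sigma) for a, y, xi in zip(alpha, ys, xs)) + b
    return 1 if total >= 0 else -1


# Toy two-dimensional feature vectors (made-up values), labels mapped to -1/+1.
xs = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]
ys = [-1, -1, 1, 1]
alpha, b = train_lssvm(xs, ys)
print([predict(x, xs, ys, alpha, b) for x in xs])
```

Solving a single linear system instead of an inequality-constrained optimization is the reason usually given for the lower computational cost of LS-SVM.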

[0049] The accuracy of the least squares support vector machine classifier depends on the quality of the training model. The present invention selects the first-episode EEG data to establish an optimal training model. First, the EEG data are processed according to the feature extraction, feature sorting and feature selection processes described above. The training method is as follows: the EEG signal database of epilepsy patients is randomly divided into two parts, 70% and 30%. The 70% part of the EEG data is used to train the algorithm, and the remaining 30% is used to test the algorithm, so as to obtain an LS-SVM model and related performance indexes.
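The random 70%/30% division can be sketched as below. The fixed seed and the function name `split_train_test` are illustrative assumptions; the integers merely stand in for EEG segments.

```python
import random


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


segments = list(range(200))  # stand-in for 200 EEG segments
train, test = split_train_test(segments)
print(len(train), len(test))  # 140 60
```

Applying the same function to each class separately keeps the class proportions identical in the training and test parts.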

Experimental Results

[0050] The method is evaluated on an open-source EEG database of epilepsy patients from the Department of Epileptology, Bonn University, Germany. The database includes 5 subsets, labeled Z, O, N, F and S. Each subset contains 100 EEG signals of equal length, each 23.6 s long and containing 4096 sampling points. Subset Z is collected from 5 healthy individuals with eyes closed, and subset O is collected from 5 healthy individuals with eyes open. Subset N is collected from the hippocampus of the epilepsy patients. Subset F is collected from the epileptic area of epilepsy patients during the interval period of epilepsy. Subset S is collected from the epileptic area of epilepsy patients during the period of epilepsy. Because EEG during the period of epilepsy and EEG during the interval period of epilepsy are the most difficult to distinguish, subset S and subset F are selected to test the effectiveness of the method of the present invention. All EEG signals have been marked by epilepsy experts: EEG signals during the interval period of epilepsy are marked as 0, and EEG signals during the period of epilepsy are marked as 1. The test uses three indexes to evaluate the classification performance: accuracy, sensitivity and specificity. The three indexes are calculated as follows:

[00004] $Accuracy = \frac{TP + TN}{TP + FN + TN + FP} \times 100\%, \quad Sensitivity = \frac{TP}{TP + FN} \times 100\%, \quad Specificity = \frac{TN}{TN + FP} \times 100\%,$

wherein TP, FP, TN and FN represent the numbers of true positives, false positives, true negatives and false negatives, respectively.
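The three indexes can be computed directly from the four counts, as in the sketch below. The counts in the example are made up for illustration and are not results from the described experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity in percent."""
    total = tp + fp + tn + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


# Made-up counts for a 60-segment test set.
metrics = classification_metrics(tp=28, fp=1, tn=29, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```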

[0051] The EEG data during the period of epilepsy and the EEG data during the interval period of epilepsy are each randomly divided into 70% and 30% parts. The least squares support vector machine model is trained, tested for its performance, and compared with other common classification methods. See Table 1 for the specific results. As can be seen from Table 1, the method provided by the present invention achieves the highest sensitivity and accuracy among the compared methods, with a specificity equal to the best of them.

TABLE 1 Comparison of epileptic EEG signal classification results between the invention and 5 other common methods

Method                                  Sensitivity (%)   Specificity (%)   Accuracy (%)
The method of the invention             99.50             100.00            99.40
k-Nearest Neighbor (KNN)                97.90             99.80             94.00
Linear Regression (LR)                  99.00             100.00            98.00
Linear Discriminant Regression (LDA)    99.00             100.00            99.00
Naive Bayes (NB)                        91.00             98.00             84.00
Random Forest (RF)                      97.00             99.00             9.00

[0052] EEG signals have important value for epilepsy research. The invention uses a classification system of epileptic EEG signals based on non-linear dynamics features to analyze the EEG signals of epilepsy patients in detail. The sensitivity is 99.50%, the specificity is 100.00%, and the accuracy is 99.40%.

[0053] The present invention is not limited to the specific technical solutions described in the above embodiments, and all technical solutions formed by equivalent replacements fall within the scope of protection claimed by the present invention.

CPC classification

A61B2505/01 HUMAN NECESSITIES
G06F2218/00 PHYSICS
A61B5/374 HUMAN NECESSITIES
A61B5/7267 HUMAN NECESSITIES
G06F18/2113 PHYSICS
A61B5/7203 HUMAN NECESSITIES
A61B5/726 HUMAN NECESSITIES
A61B5/30 HUMAN NECESSITIES
A61B5/4094 HUMAN NECESSITIES
G06F18/2411 PHYSICS

International classification

A61B5/00 HUMAN NECESSITIES
G06K9/46 PHYSICS
G06K9/62 PHYSICS
