METHOD FOR PREDICTING MULTI-TYPE ELECTROCARDIOGRAM HEART RHYTHMS BASED ON GRAPH CONVOLUTION

20230225663 · 2023-07-20


    Abstract

    A method for predicting multi-type ECG heart rhythms based on graph convolution includes: acquiring 12-lead ECG signals from a body surface of a patient, and resampling an ECG signal of each lead to a same signal length; constructing a node mutual information pooling U-shaped graph convolution network, and extracting deep features of the ECG signals by using a feature extraction module; performing one-layer one-dimensional convolution on the deep features to obtain a graph feature matrix to be constructed; inputting the obtained undirected graph into a graph encoding module in the graph convolution network, quantitatively calculating node mutual information of the undirected graph by using the graph encoding module, and selecting a node subset with the maximum mutual information to decrease the number of nodes in the undirected graph for down-sampling; inputting the undirected graph with the decreased number of nodes into a graph decoding module.

    Claims

    1. A method for predicting multi-type electrocardiogram (ECG) heart rhythms, the method comprising: (1) acquiring 12-lead ECG signals from a body surface of a patient, and resampling an ECG signal of each lead to a same signal length; (2) constructing a node mutual information pooling U-shaped graph convolution network, and extracting deep features of the ECG signals by using a feature extraction module in the graph convolution network; (3) performing one-layer one-dimensional convolution on the deep features to obtain a graph feature matrix to be constructed, where rows of the graph feature matrix correspond to various types of arrhythmias to be predicted, and columns of the graph feature matrix correspond to feature vectors of the various types of arrhythmias; and converting the graph feature matrix into an undirected graph G(V, A, E, W, X), where V is a point set of the undirected graph, and each node in the point set corresponds to one type of arrhythmias; E is an edge set of the undirected graph, which records a similarity between nodes; W is a weight matrix, which gives different weights to different edges; X is a node feature matrix; and A is a node adjacency matrix; (4) inputting the obtained undirected graph into a graph encoding module in the graph convolution network, quantitatively calculating node mutual information of the undirected graph by using the graph encoding module, and selecting a node subset with the maximum mutual information to decrease a number of nodes in the undirected graph for down-sampling; (5) inputting the undirected graph with the decreased number of nodes into a graph decoding module in the graph convolution network, up-sampling, by using the graph decoding module, the undirected graph according to existing node indexes to restore an original number of nodes so as to generate a new graph feature matrix, adding the new graph feature matrix to the graph feature matrix before down-sampling, and outputting an aggregate graph feature matrix; (6) performing summation, maximization, minimization and averaging on the aggregate graph feature matrix, splicing the results of the four operations into a feature vector, inputting the feature vector into a fully-connected layer, and finally mapping to obtain a probability value of each type of arrhythmias; and (7) training the graph convolution network through a cross entropy loss function by using a large amount of ECG data and labels of various arrhythmias to predict multi-type ECG heart rhythms.

    2. The method of claim 1, wherein the feature extraction module comprises a one-dimensional convolution layer, a residual network ResNet34, and an average pooling layer which are successively connected to one another, and all convolution layers in the residual network ResNet34 are replaced by one-dimensional convolution.

    3. The method of claim 1, wherein the graph encoding module selects the node subset with the maximum mutual information by using the following formulas:

\[
\max_{\Omega \subseteq V} C(\Omega);
\]
\[
C(\Omega) = \frac{1}{K}\sum_{v \in \Omega} \log \sigma\!\left(T_w\!\left(x_v, y_{\mathcal{N}_v}\right)\right) + \frac{1}{K^2}\sum_{v, u \in \Omega} \log\!\left(1 - \sigma\!\left(T_w\!\left(x_v, y_{\mathcal{N}_u}\right)\right)\right);
\]
\[
T_w\!\left(x_v, y_{\mathcal{N}_v}\right) = \mathcal{S}_w\!\left(\mathcal{E}_w(x_v), \mathcal{P}_w\!\left(y_{\mathcal{N}_v}\right)\right); \quad
T_w\!\left(x_v, y_{\mathcal{N}_u}\right) = \mathcal{S}_w\!\left(\mathcal{E}_w(x_v), \mathcal{P}_w\!\left(y_{\mathcal{N}_u}\right)\right);
\]

where Ω denotes the node subset; K is the number of nodes in the node subset Ω; V is the point set of the undirected graph; C(Ω) is the mutual information quantization function between neighboring nodes in the node subset Ω; v is any node in the node subset Ω; x_v denotes the feature vector of node v; σ(·) is the sigmoid activation function; N_v denotes all neighboring nodes of node v; y_{N_v} is the feature matrix comprising the feature vectors of all neighboring nodes of node v; E_w(·) is a vertex embedding function; P_w(·) is a neighborhood embedding function; S_w(·, ·) is a vertex-neighborhood affinity function, which aggregates the result of E_w(x_v) with the neighborhood embedding; E_w(·) and S_w(·, ·) are implemented by multi-layer perceptrons; and P_w(·) is implemented by R-hop graph convolution.

    4. The method of claim 1, wherein after up-sampling or down-sampling the undirected graph, the corresponding graph feature matrix is subjected to one-layer graph convolution to update features of all nodes in the undirected graph.

    5. The method of claim 1, wherein in the process of training the graph convolution network in (7), parameters in the network are iteratively updated by an Adam algorithm until the cross entropy loss function converges.

    6. The method of claim 1, wherein the cross entropy loss function is expressed as follows:

\[
L = \{l_1, \ldots, l_N\}; \quad
l_n = -w_n\left[y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n)\right]; \quad
w_n = \frac{1}{\log\left(\mathrm{Sum}(\mathrm{label}_n)\right)};
\]

where L is the cross entropy loss function; l_n is the loss function value of the n-th type of arrhythmias; N is the number of arrhythmia disease types, n is a natural number and 1 ≤ n ≤ N; w_n is the weight value of the n-th type of arrhythmias; x_n is the predicted probability value of the n-th type of arrhythmias output by inputting the current ECG signal into the graph convolution network; y_n is the label value of the n-th type of arrhythmias corresponding to the current ECG signal; and Sum(label_n) is the number of training data labels for the n-th type of arrhythmias.

    7. The method of claim 1, wherein in (4), node mutual information pooling is the core of the graph encoding module; the graph pooling part is associated with node selection and new graph structure generation; a mutual information score between a node and its neighboring nodes is calculated in the graph pooling part, and the top K node indexes with the highest scores are returned.

    8. The method of claim 1, wherein in (5), the previously recorded node indexes are required during up-sampling, and a new graph feature matrix X′ = 0 ∈ {0}^(N×d) is generated, where N is the number of arrhythmia disease types and d is the feature dimension; the graph feature matrix to be up-sampled is added to the down-sampled graph feature matrix in the same layer to obtain the up-sampled graph feature matrix.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0025] FIG. 1 is a flowchart of a method for predicting multi-type ECG heart rhythms according to one embodiment of the disclosure;

    [0026] FIG. 2 is a schematic diagram of a node mutual information pooling U-shaped graph convolution network; and

    [0027] FIG. 3 shows the results of comparison of the experimental result indexes of the disclosure and other existing algorithms.

    DETAILED DESCRIPTION

    [0028] To further illustrate the disclosure, embodiments detailing a method for predicting multi-type electrocardiogram (ECG) heart rhythms based on graph convolution are described below. It should be noted that the following embodiments are intended to describe and not to limit the disclosure.

    [0029] As shown in FIG. 1, the disclosure provides a method for predicting multi-type electrocardiogram (ECG) heart rhythms based on graph convolution, comprising the following steps.

    [0030] (1) 12-lead electrical signals of the body surface of the patient are acquired, and the electrical signal of each lead is resampled to a same signal length. The resampled data are 12 one-dimensional sequences of the same length, which can be directly input into the model.
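As an illustration only (not specified by the disclosure, which does not name an interpolation method or target length), the resampling step could be sketched with linear interpolation; the target length of 5000 samples is a hypothetical choice:

```python
import numpy as np

def resample_lead(signal, target_len):
    """Linearly interpolate one 1-D ECG lead to target_len samples."""
    x_old = np.linspace(0.0, 1.0, num=len(signal))
    x_new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(x_new, x_old, signal)

def resample_12_lead(leads, target_len=5000):
    """Stack 12 resampled leads into a (12, target_len) array ready for the model."""
    return np.stack([resample_lead(lead, target_len) for lead in leads])
```

Linear interpolation preserves the endpoints of each lead exactly; a production pipeline might instead use a polyphase or Fourier resampler.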

    [0031] (2) Since the training data carry labels for multiple diseases, the numbers of patients corresponding to different diseases are unbalanced. Hence, the number of each type of heart rhythms in the training data is counted, and the per-class weights of the cross entropy loss function to be optimized are modified accordingly, to improve the prediction of disease types with few samples.

    [00003]

\[
\ell(x, y) = L = \{l_1, \ldots, l_N\}, \quad
l_n = -w_n\left[y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n)\right]; \quad
w_n = \frac{1}{\log\left(\mathrm{Sum}(\mathrm{label}_n)\right)};
\]

    [0032] where w_n is the weight value corresponding to each type of heart rhythms, x_n is the label probability value predicted by the network, and y_n is the label value taking values in {0, 1}.
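A minimal NumPy sketch of the class weighting and weighted binary cross entropy described above; the clipping constant `eps` is an added numerical-stability assumption, not part of the disclosure:

```python
import numpy as np

def class_weights(label_counts):
    """w_n = 1 / log(Sum(label_n)), one weight per disease type."""
    return 1.0 / np.log(np.asarray(label_counts, dtype=float))

def weighted_bce(y_true, y_pred, weights, eps=1e-12):
    """Per-class weighted binary cross entropy, averaged over classes.

    y_true: multi-hot labels in {0, 1}; y_pred: predicted probabilities.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    losses = -weights * (y_true * np.log(y_pred)
                         + (1.0 - y_true) * np.log(1.0 - y_pred))
    return losses.mean()
```

Rare disease types (small label counts) receive larger weights, which is the stated goal of the weighting scheme.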

    [0033] (3) The data is conveyed to the feature extraction module in the constructed graph convolution network to obtain the deep features extracted by this module. The structure of the graph convolution network is shown in FIG. 2. The feature extraction module comprises a one-dimensional convolution layer, a residual backbone based on ResNet34, and an average pooling layer, wherein all convolution layers in the ResNet34 are replaced by one-dimensional convolutions.
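As a toy stand-in for the 1D-ResNet34 backbone (the real module has many more layers and learned weights), the three building blocks named above can be sketched in NumPy; the kernel size of 3 and channel counts are illustrative assumptions:

```python
import numpy as np

def conv1d(x, w):
    """'Same'-padded 1-D convolution: x is (C_in, L), w is (C_out, C_in, k)."""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, x.shape[1]))
    for o in range(c_out):
        for t in range(x.shape[1]):
            out[o, t] = np.sum(w[o] * xp[:, t:t + k])
    return out

def residual_block(x, w1, w2):
    """Basic 1-D residual block: conv -> ReLU -> conv, plus identity shortcut."""
    h = np.maximum(conv1d(x, w1), 0.0)
    return np.maximum(conv1d(h, w2) + x, 0.0)

def avg_pool(x):
    """Global average pooling over the time axis: one value per channel."""
    return x.mean(axis=1)
```

A real implementation would stack such blocks with batch normalization and strided downsampling, as in ResNet34.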

    [0034] (4) One-layer one-dimensional convolution is performed on the deep features to obtain the graph feature matrix H to be constructed. The number of rows of the graph feature matrix corresponds to the total number of diseases to be predicted, and the columns are the feature vectors of the corresponding nodes. Here, each node vector is used approximately as the "blurred feature" of one type of disease.

    [0035] The graph feature matrix is converted into an undirected graph G(V, A, E, W, X), where V = {v_1, . . . , v_N} is the vertex set of the undirected graph G, i.e., the set of all nodes; A is the node adjacency matrix; E is the set of edges of the undirected graph G, which records the similarity between nodes; W is the weight matrix, which gives different weights to different edges; and X = [x_1 . . . x_N]^T ∈ R^(N×d) is the node feature matrix.
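The disclosure says only that E records inter-node similarity; as one plausible sketch (cosine similarity with a hypothetical threshold, not the confirmed construction), the adjacency and weight matrices could be built as:

```python
import numpy as np

def build_graph(H, threshold=0.5):
    """Build (A, W) from node feature matrix H of shape (N, d).

    Edges connect node pairs whose cosine similarity reaches `threshold`;
    W keeps the similarity as the edge weight, A is the 0/1 adjacency.
    """
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    S = (H @ H.T) / np.clip(norms * norms.T, 1e-12, None)  # cosine similarity
    W = np.where(S >= threshold, S, 0.0)                   # weight matrix
    np.fill_diagonal(W, 0.0)                               # no self-loops
    A = (W > 0).astype(float)                              # adjacency matrix
    return A, W
```

Both matrices come out symmetric, as required for an undirected graph.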

    [0036] (5) The obtained undirected graph is input into the graph encoding module in the node mutual information pooling U-shaped graph convolution network. During the mutual information node pooling in this module, a score for node mutual information is calculated for the graph data, and node selection is performed by searching the node subset with the maximum mutual information.

    [00004]

\[
\max_{\Omega \subseteq V} C(\Omega), \quad \text{subject to } |\Omega| = K;
\]

    [0037] where C(Ω) denotes the mutual information quantization function between a node and its neighboring nodes, K denotes the number of nodes in the selected subset, and Ω is the selected node subset. Generally, the mutual information between a node and its neighboring nodes is defined as the KL divergence between the joint probability distribution and the product of the marginal probability distributions, that is:


\[
I^{(\Omega)}(v, n) = D_{\mathrm{KL}}\left(P_{v,n} \,\|\, P_v \otimes P_n\right);
\]

    [0038] where P_{v,n} denotes the joint probability distribution between node v and its set of surrounding nodes, and P_v ⊗ P_n is the product of the marginal distributions of node v and its set of surrounding nodes.

    [0039] In accordance with the relationship between the KL divergence and the f-divergence, the mutual information between the node feature and the neighborhood feature may be expressed as:

    [00005]

\[
C_w(\Omega) = \frac{1}{|\Omega|}\sum_{v \in \Omega} \log \sigma\!\left(T_w\!\left(x_v, y_{\mathcal{N}_v}\right)\right) + \frac{1}{|\Omega|^2}\sum_{(v, u) \in \Omega} \log\!\left(1 - \sigma\!\left(T_w\!\left(x_v, y_{\mathcal{N}_u}\right)\right)\right);
\]

    [0040] where σ(·) is the sigmoid activation function; N_v denotes all neighboring nodes of node v; y_{N_v} is the feature matrix comprising the feature vectors of all neighboring nodes of node v; u is a certain neighboring node of node v; and the function T_w(·, ·) denotes a neural network constructed from three trainable functions: a vertex embedding function E_w(·), a neighborhood embedding function P_w(·), and a vertex-neighborhood affinity function S_w(·, ·), specifically:

    [00006]

\[
T_w\!\left(x_v, y_{\mathcal{N}_u}\right) = \mathcal{S}_w\!\left(\mathcal{E}_w(x_v), \mathcal{P}_w\!\left(y_{\mathcal{N}_u}\right)\right) = \mathcal{S}_w\!\left(\mathcal{E}_w(x_v), \frac{1}{R}\sum_{r=0}^{R}\sum_{v \in \mathcal{N}_u} \left(\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\right)^r\right)_{v,u} W^{(r)} \mathcal{E}_w(x_v)\right);
\]

    [0041] where E_w(·) and S_w(·, ·) are implemented by two multi-layer perceptrons (MLPs), respectively, and P_w(·) is implemented by R-hop graph convolution.
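As a heavily simplified sketch of the node selection step (not the trained module: the learned functions E_w, P_w and S_w are replaced here by a crude dot-product affinity between a node's feature and its mean neighborhood feature), top-K mutual-information pooling could look like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mi_pool_topk(X, A, k):
    """Toy stand-in for node mutual-information pooling.

    Scores each node by the sigmoid affinity between its feature x_v and
    the mean feature of its neighborhood (a proxy for T_w), keeps the
    top-k node indexes, and returns the reduced graph (X_sub, A_sub).
    """
    deg = np.clip(A.sum(axis=1, keepdims=True), 1.0, None)
    neigh = (A @ X) / deg                       # mean neighborhood feature
    scores = sigmoid(np.sum(X * neigh, axis=1)) # affinity score per node
    idx = np.sort(np.argsort(-scores)[:k])      # top-k indexes, kept in order
    return idx, X[idx], A[np.ix_(idx, idx)]
```

The returned `idx` plays the role of the "recorded node indexes" that the decoding module later needs for up-sampling.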

    [0042] (6) The graph with the decreased number of nodes is input into the graph decoding module. The previously recorded node indexes are used during up-sampling, and a new graph feature matrix X′ = 0 ∈ {0}^(N×d) is generated. Then, the graph feature matrix to be up-sampled is added to the down-sampled graph feature matrix in the same layer to obtain the up-sampled graph feature matrix.
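The up-sampling step above amounts to scattering the pooled node features back into a zero matrix at the recorded indexes and adding the same-level skip features, which can be sketched as:

```python
import numpy as np

def graph_unpool(X_small, idx, X_skip):
    """Restore pooled node features to the original N nodes.

    X_small: features of the kept nodes, shape (k, d).
    idx:     node indexes recorded during down-sampling.
    X_skip:  same-level feature matrix from before down-sampling, shape (N, d).
    """
    X_new = np.zeros_like(X_skip)  # X' = 0 in {0}^(N x d)
    X_new[idx] = X_small           # scatter kept features back
    return X_new + X_skip          # U-shaped skip connection
```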

    [0043] (7) Between the down-sampling and up-sampling of the graph, the feature matrix of the graph is subjected to one-layer graph convolution to update the features of all nodes on the graph. The graph convolution calculation is expressed as:

    [00007]

\[
h_i^{(l+1)} = \sigma\!\left(b^{(l)} + \sum_{j \in \mathcal{N}(i)} \frac{e_{ji}}{c_{ji}}\, h_j^{(l)} W^{(l)}\right); \quad
c_{ji} = \sqrt{|\mathcal{N}(j)|}\,\sqrt{|\mathcal{N}(i)|};
\]

    [0044] where the superscript l denotes the layer number; b^(l) denotes the bias value; i and j denote two nodes; N(i) denotes the set of neighboring nodes of node i; e_{ji} denotes the weight of the edge from node j to node i; c_{ji} denotes the product of the square roots of the degrees of nodes i and j; W^(l) denotes the learnable parameters of the l-th layer; and h_j^(l) denotes the feature vector of node j.
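A dense NumPy sketch of this graph-convolution update, with ReLU as the activation σ and unit edge weights by default (both assumptions; the trained network learns W and b):

```python
import numpy as np

def gcn_layer(X, A, W, b, E=None):
    """One graph-convolution layer following the update above:
    h_i' = ReLU(b + sum_{j in N(i)} (e_ji / c_ji) h_j W),
    with c_ji = sqrt(|N(j)|) * sqrt(|N(i)|).
    """
    if E is None:
        E = np.ones_like(A)                 # unit edge weights e_ji
    deg = np.clip(A.sum(axis=1), 1.0, None) # node degrees |N(i)|
    C = np.sqrt(np.outer(deg, deg))         # c_ji for every node pair
    norm = A * E / C                        # zero where there is no edge
    return np.maximum(norm @ X @ W + b, 0.0)
```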

    [0045] (8) Graph features of the obtained graph feature matrix are read, that is, summation, maximization, minimization and averaging are performed on all nodes on the graph to obtain output features of the graph.
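The four readout operations above concatenate into a single graph-level feature vector of length 4d, which can be sketched as:

```python
import numpy as np

def graph_readout(X):
    """Concatenate sum, max, min and mean over the node axis of X (N, d)
    into one feature vector of length 4 * d."""
    return np.concatenate(
        [X.sum(axis=0), X.max(axis=0), X.min(axis=0), X.mean(axis=0)]
    )
```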

    [0046] (9) After the output features of the graph are obtained, the features are input into a fully-connected layer and finally mapped into a probability value of each type of diseases to be output.

    [0047]

\[
\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n = \mathrm{Sigmoid}(o_1, o_2, \ldots, o_n);
\]

    [0048] where o_1, o_2, . . . , o_n are the output values of the fully-connected layer, and ŷ_1, ŷ_2, . . . , ŷ_n are the predicted label probability values between 0 and 1 after passing through the activation function.

    [0049] (10) The forward calculation of the whole model is denoted by a function f_A = F(x, θ^(i)), and the parameters of the whole model are updated during training by the Adam algorithm.

    [00008]

\[
m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t; \quad
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2;
\]
\[
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}; \quad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}; \quad
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t;
\]

    [0050] where m_t and v_t are the estimated values of the first moment (mean) and the second moment (uncentered variance) of the gradient, respectively, and are both initialized as zero vectors; g_t is the gradient of the forward calculation function f_A at step t; m̂_t and v̂_t are the bias-corrected estimates of the moments; β_1 has a default value of 0.9, β_2 has a default value of 0.999, and ϵ is 10^(-8).
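One Adam parameter update, written directly from the equations above (the learning rate η = 0.001 is the usual default, an assumption here):

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Apply one Adam update at step t (t >= 1) and return the new state."""
    m = b1 * m + (1 - b1) * g          # first-moment estimate
    v = b2 * v + (1 - b2) * g ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```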

    [0051] To verify the effectiveness of the method of the disclosure, the ECG data of 43100 patients are trained and tested: 70% of the data are randomly assigned to a training set, and the remaining 30% to a test set. Several mainstream arrhythmia classification models are selected for comparison. FIG. 3 shows the precision-recall curves of the model of the disclosure and the other classification models. It can be seen that the method of the disclosure achieves better prediction results than models using only 1D-ResNet34, demonstrating the effectiveness of the node mutual information down-sampling graph convolution over disease nodes. On the basis of the features extracted by the convolutional neural network, the model can learn the hidden features of the ECG data more deeply, so that its representation capability for multi-label correlation prediction is greatly improved, the correlations between diseases are determined more effectively, and auxiliary diagnosis can be made precisely.

    [0052] It will be obvious to those skilled in the art that changes and modifications may be made, and therefore, the aim in the appended claims is to cover all such changes and modifications.