Segmentation method for tumor regions in pathological images of clear cell renal cell carcinoma based on deep learning

Abstract

A segmentation method for tumor regions in a pathological image of clear cell renal cell carcinoma based on deep learning includes data acquisition and pre-processing, building and training of a classification network SENet and prediction of tumor regions. The present invention studies clear cell renal cell carcinoma based on pathological images, yielding results with higher reliability than judgments made based on CT or MRI images. The present invention overcomes the drawback that the previous research on clear cell renal cell carcinoma is only limited to judgment on presence by being able to visually provide the position and size of tumor regions, which is convenient for the medical profession to better study the pathogenesis and directions to the treatment of clear cell renal cell carcinoma.

Claims

1. A segmentation method for tumor regions in a pathological image of a clear cell renal cell carcinoma based on a deep learning, comprising the following steps: a, data acquisition and pre-processing, specifically comprising: a1, converting original scanned images of HE sections of kidney cancer patients in a kfb format to images in a tiff format; a2, marking tumor regions and non-tumor regions on the images of the HE sections in the tiff format through ASAP software, then generating sampling points in the tumor regions and the non-tumor regions at random, and generating image blocks of a size 512×512 with the sampling points as centers; a3, dividing the image blocks of the HE sections into a training set, a validation set, and a test set according to the kidney cancer patients; and a4, performing an image enhancement by a random horizontal flipping and a random rotation of the image blocks, and normalizing the image blocks using a formula 1, wherein I(x) denotes a pixel value of an image block I at a position x and I′(x) denotes an output at the position x after a normalization:
I′(x)=I(x)−128)/128 formula (1); b, building and training of a classification network SENet, specifically comprising: b1, first, feeding normalized 512×512 image blocks to a convolution with a convolution kernel size of 7×7 to extract low-level features; b2, then, learning high-level features from the image blocks through 4 cascaded residual blocks, wherein SE blocks are merged into the 4 cascaded residual blocks; and b3, finally, stretching an obtained features containing a rich semantic information into one-dimensional feature vectors by a global average pooling and finally obtaining predicted value outputs of a model through full-connected layers and a sigmoid function, wherein a predicted value of 0 indicates a non-tumor image and a predicted value of 1 indicates a tumor image; c, a prediction of the tumor regions, specifically comprising: c1, first, obtaining a thumbnail of a specific size through a get_thumbnail function in an openslide library; c2, then, obtaining a foreground image with a tissue from the thumbnail of the pathological image through a maximum between-cluster variance method, denoted by a mask1; c3, then, dilating the mask1 using structural elements of a size 3×3 to fill some small holes left by a threshold segmentation to obtain a binary image mask containing only a foreground of the tissue, traversing coordinates P.sub.mask of each a foreground pixel in the binary image mask and multiplying the coordinates P.sub.mask by a scale factor by which a original image is scaled down to a thumbnail image to generate coordinates P.sub.wsi of a corresponding point in the original image; extracting from the original image a 512×512 image block with the coordinates P.sub.wsi as a center and feeding the 512×512 image block to the classification network SENet previously trained to obtain a predicted value y.sub.pred and using the predicted value y.sub.pred as a pixel value at the coordinates P.sub.mask in the binary image mask, and generating preliminary tumor region results by traversing all pixels; and c4, finally, filtering out false positive regions from preliminary segmentation results by a constraining connected component area to obtain final segmentation results of the tumor regions.

2. The segmentation method for the tumor regions in the pathological image of the clear cell renal cell carcinoma based on the deep learning according to claim 1, wherein in the SE blocks, extracted features are processed by the global average pooling to obtain features in a channel direction, and then the features in the channel direction are squeezed and excited through full-connected layers to be restored to a one-dimensional vector corresponding with an input feature channel number; the one-dimensional vector is updated with different values as a neural network is continuously optimized and iterated, wherein each value of the different values represents an importance weight of a respective original feature image of a channel to a correct prediction made by the neural network; multiplying a weight obtained by a channel attention by input features to yield enhanced features in the channel direction, thereby better promoting a learning and a convergence of the classification network SENet.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a graph showing the structure of the SENet network in the present invention.

(2) FIG. 2 is a graph showing the structure of the residual blocks into which SE blocks are merged in the present invention.

(3) FIG. 3 is a graph showing the steps of predicting tumor regions of clear cell carcinoma in an HE image in the present invention.

(4) FIG. 4 is a graph showing a receiver operating characteristic curve of ResNet18.

(5) FIG. 5 is a graph showing a receiver operating characteristic curve of the SENet in the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

(6) A specific embodiment of the present invention is detailed below.

(7) The method of the present invention mainly comprises 3 steps: data acquisition and pre-processing, building and training of a classification network and prediction of tumor regions.

(8) Data Acquisition and Pre-Processing

(9) The present invention uses 230 HE digital sections of 69 patients of Wuxi Second People's Hospital from May 2020 to May 2021 as subjects. Each patient has 1 to 10 HE sections, and each HE section does not necessarily contain a tumor region. The present invention partitions the whole data set into a training set, a validation set and a test set by taking human as a unit. As the original digital scans of sections are in kfb format, they cannot be processed by computer. Therefore, the present invention uses the converter software provided by Konfoong to convert images in kfb format to images in tiff format. In addition, tumor regions and non-tumor regions are marked on each HE section through ASAP software under the direction of a medical professional, and then image blocks of size 512×512 are randomly extracted from respective regions for ease of subsequent training in a classification network. The specific number of pictures is shown in Table 1.

(10) TABLE-US-00001 TABLE 1 Quantity distribution of pictures among data sets Number Number Number Number of of of of tumor non-tumor patients sections pictures pictures Total Training set 51 171 4709 5183 9892 Validation 9 25 466 464 930 set Test set 9 34 648 640 1288

(11) Prior to training, image enhancement is performed by random horizontal flipping and random rotation (0°, 90°, 180° or 270°) of the original images to avoid overfitting. In addition, all of the images are normalized using formula 1 to accelerate the convergence of the network, where I(x) denotes a pixel value of an image block I at position x and I′(x) denotes an output at position x after normalization:
I′(x)=(I(x)−128)/128 formula (1)

(12) Building and Training of a Classification Network

(13) The data sets obtained in the previous step are fed to a SENet network for training; the specific training and experimental parameters are shown in the following steps:

(14) a. The specific structure of the SENet network is shown in FIG. 1. First, the original 512×512 images are fed to a convolution with a convolution kernel size of 7×7 [Conv(7,7)] and a pooling layer [Maxpool(2,2)] to extract low-level features. Then, the high-level features in the images are learned through 4 cascaded residual blocks into which SE blocks are merged (SE-residual blocks). Finally, the obtained features containing rich semantic information are stretched into one-dimensional feature vectors by global average pooling (GAP), and predicted outputs of the model are finally obtained through full-connected layers and a sigmoid function; a predicted value of 0 indicates a non-tumor image and a predicted value of 1 indicates a tumor image.

(15) b. The SE-residual blocks are the original residual blocks into which SE blocks are merged (as shown in FIG. 2). A large number of experiments show that the combination of a convolution with a 3×3 kernel [Conv(3,3)], a BatchNormalization (BN) layer and a ReLU function can speed up the training of the network, increase the nonlinearity of network parameters and prevent overfitting. However, when the above-mentioned structure is only used to extract features, less gradient information can be transmitted to deep network layers as the number of network layers increases and redundant features are continuously removed by the pooling layer, which may lead to poor classification results. To avoid the gradient disappearance problem present in the process of network training, the residual blocks add the original feature x to the feature f(x) extracted by convolution, so that the information in the original feature is retained and a new feature beneficial to image classification is also added. The present invention adds SE blocks in the middle of the residual blocks, so that features extracted by the network have richer semantic information. The specific structure of the SE-Blocks is shown in the dashed box in the upper part of FIG. 2. The extracted features are processed by global average pooling (GAP) to obtain features in the channel direction, and then the obtained features are squeezed and excited through full-connected (FC) layers to be restored to a one-dimensional vector corresponding with the input feature channel number. The one-dimensional vector will be updated with different values as the neural network is continuously optimized and iterated; each value represents importance weight of a respective original feature image of the channel to a correct prediction made by the neural network. Multiplying the weight obtained by channel attention by input features will yield enhanced features in the channel direction, thereby better promoting learning and convergence of the classification network.

(16) c. Optimization of the network model. Model parameters of SENet are optimized using an Adam optimizer with a learning rate of 10.sup.−4 in the process of training. A loss function is a function for expressing the distance between a prediction result and the actual label. The distance between the prediction result and the actual label is continuously reduced through the optimizer and a gradient descent method, such that the prediction result is getting closer to the label. In the present invention, the input images require binary classification (that is, determining whether the input images indicate tumors), and therefore binary cross-entropy is used to constrain the whole model. The calculation formula for the loss function loss is given below, where y.sub.label denotes the original label and y.sub.pred denotes the result predicted by the model.
loss=−(y.sub.label log y.sub.pred+(1−y.sub.label)log(1−y.sub.pred)) formula (2)

(17) Prediction of Tumor Regions

(18) After the training of the tumor classification network is completed, the present invention accomplishes the prediction of tumor regions through the following steps. A specific flow chart for prediction is shown in FIG. 3.

(19) First, a thumbnail of specific size (the maximum length specified here is 512) is obtained through the get_thumbnail function in the openslide library, and then a foreground image with tissue is obtained from the thumbnail of the pathological image through threshold segmentation and is denoted by mask1. The reason for the segmentation of the thumbnail is that the original pathological image is large in size—about 100 thousand×100 thousand. Prediction based on the original image will consume a large amount of computational resources and is also time-consuming, and prediction is therefore performed based on a thumbnail of the original pathological image pixel by pixel.

(20) Then, mask1 is morphologically processed (specifically, dilated using structural elements of size 3×3) to fill the holes present in threshold segmentation to obtain the label of a binary image mask containing only a foreground of tissue. Then, coordinate transformation is performed on the pixels of the foreground part in the mask image—that is, the coordinates P.sub.mask of each foreground pixel in the mask image is multiplied by a scale factor by which the original image is scaled down to go back to the coordinates P.sub.wsi of a corresponding point in the original image. A 512×512 image block with the P.sub.wsi coordinates as a center is extracted from the original image and fed to the SENet trained in step two for model prediction, and the prediction result is used as the pixel value at P.sub.mask in mask. Preliminary tumor region results are generated by traversing all pixels.

(21) Finally, false positive regions with an area smaller than 200 pixels are filtered out from the preliminary segmentation results of tumor regions through a connected component area filtration method to obtain final tumor segmentation results.

(22) Comparison of Experimental Results

(23) The present invention compares the results of recognition of clear cell carcinoma and non-clear cell carcinoma pictures using a basic ResNet18 classification network and SENet; the two networks are both trained using the same training parameters and computer hardware configuration. In tests, indicators such as sensitivity (Sen), specificity (Spe), accuracy (Acc) and area under curve (AUC) of receiver operating characteristic curve (ROC) are used to measure the performance of network classification. Sen measures the performance of recognition of tumors by an algorithm. Spe measures the performance of recognition of non-tumor images by an algorithm. Acc is the recognition accuracy of an algorithm to all images in a test set. TP denotes the number of pictures that are actually tumor pictures and are correctly predicted as tumor pictures. FN denotes the number of pictures that are actually tumor pictures and are incorrectly predicted as non-tumor pictures. TN denotes the number of pictures that are actually non-tumor images and are correctly predicted. FP denotes the number of non-tumor pictures that are predicted as tumor pictures. Sen, Spe and Acc are then calculated as follows:
Sen=TP/(TP+FN) formula (3)
Spe=TN/(TN+FP) formula(4)
Acc=(TP+TN)/(TN+FN+TP+FP) formula (5)

(24) ROC is a sensitivity curve with the true positive rate of the classifier as the ordinate and the false positive rate as the abscissa. The closer the curve is to the upper left corner, the higher the accuracy of the model is. The ROC curve of ResNet18 is shown in FIG. 4, and the ROC curve of SENet is shown in FIG. 5. As an extension of this, the area under ROC curve AUC is also used to judge the merits and demerits of a binary classification model. The specific numerical values of the classification indicators described above are shown in Table 2. It is easy to see that SENet has better classification performance after attention in the channel direction is introduced into the model.

(25) TABLE-US-00002 TABLE 2 Classification results of ResNet18 and SeNet models Sen Spe AUC Acc ResNet18 0.8873 0.9937 0.94 0.9402 SeNet 0.9429 0.9906 0.97 0.9666

Segmentation method for tumor regions in pathological images of clear cell renal cell carcinoma based on deep learning

Assignee

Inventors

Cpc classification

Classification Explorer

G06T7/11

PHYSICS

Classification Explorer

G06T7/194

PHYSICS

Classification Explorer

G06T2207/20021

PHYSICS

Classification Explorer

G06T7/0012

PHYSICS

Classification Explorer

G06T2207/20084

PHYSICS

Classification Explorer

G06T2207/30084

PHYSICS

Classification Explorer

G06T2207/30096

PHYSICS

Classification Explorer

G06T2207/20081

PHYSICS

International classification

Classification Explorer

G06T7/00

PHYSICS

Classification Explorer

G06T7/11

PHYSICS

Classification Explorer

G06T7/194

PHYSICS

Abstract

Claims

Description