Cell Detection Studio: a system for the development of Deep Learning Neural Network Algorithms for cell detection and quantification from Whole Slide Images
20210216745 · 2021-07-15
Inventors
- Jacob GILDENBLAT (Holon, IL)
- Nizan Sagiv (RAANANA, IL)
- Chen Sagiv (RAANANA, IL)
- Ido BEN SHAUL (Ramat HaSharon, IL)
CPC classification
- G06V10/778 (PHYSICS)
- G06V10/44 (PHYSICS)
- G06F18/254 (PHYSICS)
- G06F18/217 (PHYSICS)
- G16H50/20 (PHYSICS)
- G06F18/2148 (PHYSICS)
- G06N3/082 (PHYSICS)
- G06F18/40 (PHYSICS)
- G06F18/28 (PHYSICS)
- G06V10/809 (PHYSICS)
International classification
Abstract
The invention comprises methods for the development of Deep Neural Networks for cell detection and quantification in Whole Slide Images (WSI): 1. A method to create a generic cell detector that detects the centers and contours of all cells in a WSI. 2. A method to create algorithms that detect cells of specific categories and can classify among various types of cells of different categories. 3. A method for efficient cell annotation with online learning. 4. A method for efficient cell annotation with active learning. 5. A method for efficient cell annotation with online learning and data balancing. 6. A method for auto annotation of cells. 7. Cell Detection Studio: a method to create an AI-based system that provides pathologists with a semi-automatic tool to create new algorithms aiming to find cells of specific categories in WSI digitally scanned from histological specimens.
Claims
1. A method for generic cell detection in WSI: providing a digital WSI scanned by a digital scanner from a histological specimen-stained slide; and detecting the centers and contours of cells of all types in the WSI using image processing and deep learning algorithms.
2. A method to create a detector that can classify among various types of cells of different categories using neural network algorithms. To create the neural network algorithm, image crops that contain cells detected using the method of claim 1 are selected from the WSI. The crop size is set to a constant value or is adjusted to the size of the cells in the image crop. An interactive annotation scheme for the sampled image patches with a GUI (graphical user interface) application can be used by human annotators. Each time, a single image crop is presented to the annotator, and using a keyboard press, mouse click, or touch-screen tap, the annotator chooses one of the possible categories, each containing a specific type of cell or background.
3. The method of claim 2 where an online learning methodology is used, so that the classification neural network is trained in parallel to the annotation process. The deep learning neural network is continuously trained on an evolving dataset. During training, the number of annotated cells in each cell category constantly changes, and this can affect the loss function of the neural network. Thus, the weights of each cell category have to be constantly updated as the database changes during online learning. Therefore, the loss function is calculated as a weighted cross entropy where the weights can be set using methods for class balancing, e.g., median frequency balancing. In addition, during the generation of the algorithm, the architecture of the neural network, the number of layers, the number of parameters, and the connections between layers may change.
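The median frequency balancing mentioned in claim 3 can be sketched as follows. This is an illustrative sketch, not part of the claimed method; the function name is hypothetical:

```python
import numpy as np

def median_frequency_weights(class_counts):
    """Per-class loss weights via median frequency balancing:
    weight_c = median(all class frequencies) / frequency_c.
    Rare classes get weights > 1, frequent classes < 1."""
    counts = np.asarray(class_counts, dtype=float)
    freqs = counts / counts.sum()
    return np.median(freqs) / freqs
```

In online learning the weights would be recomputed each time the annotation counts change, so the weighted cross entropy tracks the evolving dataset.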
4. A method for selecting the next image patches to be annotated using an active learning framework. An active learning algorithm is applied to the cells yet to be annotated and ranks each cell. Each rank has a configurable, pre-defined lifetime; when the lifetime elapses, the rank is reset to a default value. The active learning algorithm chooses the best images to be annotated in order to make the annotation process most efficient. This ranking can be based on the acquisition function (which should be minimized) or on an ensemble of models: if the ensemble of algorithms is in disagreement, the algorithm is less confident about that image patch. The ensemble of algorithms can be generated using Bayesian deep learning, where dropout is applied at test time.
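One common way to score ensemble disagreement is the entropy of the mean prediction over the stochastic forward passes. The sketch below assumes this entropy-based acquisition; the claim itself leaves the exact acquisition function open:

```python
import numpy as np

def acquisition_rank(mc_probs, eps=1e-12):
    """Disagreement score for one image patch: entropy of the mean
    softmax over T dropout-at-test-time forward passes.
    mc_probs has shape (T, num_classes); higher score = less
    confident ensemble = annotate sooner."""
    mean_p = np.asarray(mc_probs, dtype=float).mean(axis=0)
    return float(-(mean_p * np.log(mean_p + eps)).sum())
```

A patch on which the stochastic passes agree yields low entropy; a patch on which they disagree yields high entropy and is ranked for annotation first.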
5. The method of claim 4 where data balancing using resampling is added. Unbalanced data is a common situation where the number of instances of one category is significantly smaller than the number of instances of another category. In order to obtain a robust network there should be enough examples of each category. We therefore add a data balancing methodology for effective active learning: we rank the cells inversely proportional to their frequency, and we duplicate image patches that belong to the least frequent category. Once there are enough examples of each category, we can move to the usual approach of active learning.
6. The method of claim 4 where we add data balancing using weighting. The weight is inversely proportional to the proportion of the least frequent category.
7. The method of claim 4 where we add data balancing using the following weighting: Weight = E*AB*(N−E)*Pminority, where: E = Entropy(class proportions); A = the acquisition function as defined in active learning; B = a parameter; N = the number of categories; Pminority = the output of the neural network of claim 2 that gives the probability for the minority category.
8. A method for suggested auto annotation of the image crops that contain different kinds of cells. The auto annotation process can be triggered by one of the following: 1) the time elapsed from the beginning of training is higher than a threshold; 2) the classification results of the algorithm are close enough (according to some metric and threshold) to those of a human annotator; 3) the accuracy on a pre-defined validation set is good enough according to a pre-defined metric and threshold; 4) there is a large enough number of annotations of cells of all categories of interest.
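The four triggers can be collected into a single predicate, sketched below. All threshold values are illustrative defaults, not values specified by the claim:

```python
def should_auto_annotate(elapsed_s, agreement, val_accuracy, per_class_counts,
                         max_elapsed_s=3600.0, min_agreement=0.95,
                         min_accuracy=0.9, min_count=50):
    """True when any of the claim's four triggers fires:
    1) training time exceeds a threshold, 2) algorithm/annotator
    agreement is high enough, 3) validation accuracy is high enough,
    4) every category of interest has enough annotations."""
    return (elapsed_s > max_elapsed_s
            or agreement >= min_agreement
            or val_accuracy >= min_accuracy
            or all(c >= min_count for c in per_class_counts))
```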
9. The method of claim 8 where noise is added to the auto annotation process, so that some of the images are selected at random and the process does not get stuck in dead ends.
10. Cell Detection Studio: an AI-based system that provides pathologists with a semi-automatic tool to create new algorithms aiming to find cells of specific categories in WSI digitally scanned from histological specimens; a computer running a dedicated WSI viewer that contains data upload and save utilities, zoom and pan, and the ability to quickly navigate to a region of interest. The WSI viewer also offers computer vision, machine learning, and deep learning algorithms. Specifically, the system provides the methods detailed in claims 1-9, which allow detection of the presence of specific types of cells.
11. The method of claim 10, wherein the system is adapted to the transfer, storage and retrieval of the associated images, and for the generation of reports.
12. The method of claim 10 for Quality Assurance (QA) as part of the training process. The user can annotate cells of a specific category in a region of interest and apply the algorithm developed on that region of interest. The system then generates a statistical report on the accuracy level achieved. Accuracy measures to be used can include, but are not limited to: FA, AUC, confusion matrix, false positives, and false negatives.
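For the binary case (cell of interest vs. background), the confusion matrix and false positive/negative counts in such a QA report can be sketched as follows. The function name and layout are illustrative:

```python
import numpy as np

def qa_report(y_true, y_pred, num_classes=2):
    """Confusion matrix plus false-positive / false-negative counts
    for binary labels (1 = cell of interest, 0 = background)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    fp = int(cm[0, 1])  # background mistakenly called a cell
    fn = int(cm[1, 0])  # cell of interest missed
    accuracy = np.trace(cm) / cm.sum()
    return cm, fp, fn, float(accuracy)
```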
13. The method of claim 10 with the addition of calculating various attributes related to the specific cell categories for which the detection algorithm was created. These features are based on the locations and contours of the above-mentioned cells: number, density, area, location, and perimeter.
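Given a cell contour as a closed polygon (the output format described for the generic cell detector), the area and perimeter attributes can be computed with the shoelace formula. A minimal sketch, assuming an (N, 2) array of (x, y) vertices:

```python
import numpy as np

def contour_area_perimeter(contour):
    """Area (shoelace formula) and perimeter of a closed cell contour
    given as an (N, 2) array of (x, y) vertices in order."""
    pts = np.asarray(contour, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Shoelace: half the absolute cross-product sum over edges.
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    # Sum of edge lengths, including the closing edge.
    perimeter = float(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1).sum())
    return float(area), perimeter
```

Density then follows by dividing cell counts or total area by the area of the region of interest.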
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
DETAILED DESCRIPTION OF THE PRESENT INVENTION
[0030] In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present invention.
[0032] The resultant image is then submitted to a generic cell detection algorithm A2 that aims to detect all cells in the slide, of any category. A detailed description of the generic cell detector is given below.
[0033] The result of A2 is a list of the centers and contours of all cells present in the slide. The image patch extraction module A3 extracts crops that surround every selected cell. The size of the image patches can be set as a parameter; its default value is 32. The image patches created are then submitted to interactive image annotation using a GUI (graphical user interface) application A4. Each time, a single image crop is presented to the annotator, and using a keyboard press, mouse click, or touch-screen tap, the annotator chooses one of the possible categories, each containing a specific type of cell or background. A classification CNN is trained on all the available annotated image patches to create a cell categories classifier A5. Online learning is used to update the cell classification neural network as more annotations become available.
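The patch extraction step A3 can be sketched as follows, using the default size of 32 pixels. The function name and the border-skipping policy are illustrative assumptions:

```python
import numpy as np

def extract_patches(image, centers, patch_size=32):
    """Crop a patch_size x patch_size window around each detected
    cell center (x, y); centers too close to the image border to
    yield a full-size crop are skipped."""
    half = patch_size // 2
    h, w = image.shape[:2]
    patches = []
    for x, y in centers:
        if half <= y <= h - half and half <= x <= w - half:
            patches.append(image[y - half:y + half, x - half:x + half])
    return patches
```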
[0036] The input to the generic cell detection block is patches from whole slide images. The output of the generic cell detection block is the contours and centers of all detected cells in each patch. The generic cell detection can be any method for nuclei detection. One method for doing this is using a neural network for semantic segmentation, e.g., U-Net. In this case, U-Net is trained on a dataset of cells and outputs the body and the contour segmentation for each cell. The dataset consists of annotated cells where each cell has a marked polygon around its border.
[0037] In the case of a segmentation network based on U-Net, the architecture consists of an Encoder and a Decoder. The Encoder has 5 convolutional blocks, each with a kernel of 3×3 and a stride of 2. The Decoder has 5 convolutional blocks, each with a kernel of 3×3 and an upsampling ratio of 2. Each decoder block performs concatenation with features from a layer of the Encoder. The last layer in the decoder has an output size of 3: one for the background class, one for the cell body class, and one for the cell border class.
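The spatial arithmetic of this encoder/decoder can be checked with a short sketch: five stride-2 blocks halve the resolution five times, and five ×2 upsampling blocks restore it, which is what allows the skip concatenations to align. The helper below is illustrative only:

```python
def unet_shapes(input_size):
    """Feature-map spatial sizes through the described U-Net:
    5 encoder blocks (3x3 conv, stride 2) halve the size,
    5 decoder blocks (3x3 conv, x2 upsample, concatenated with the
    matching encoder features) double it back. The final layer has
    3 output channels: background, cell body, cell border."""
    sizes = [input_size]
    for _ in range(5):                  # encoder path
        sizes.append(sizes[-1] // 2)
    for _ in range(5):                  # decoder path
        sizes.append(sizes[-1] * 2)
    return sizes
```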
[0039] The input is digital slides that contain cells of categories that are of interest to the user C1. First, the generic cell detection algorithm whose generation is described above is applied.
[0041] The CNN architecture relies on the standard VGG16 architecture (other variants could be used equivalently), with 7 convolutional blocks (each containing a convolutional layer, followed by a Rectified Linear Unit and a Batch Normalization unit), followed by 3 fully connected layers, the first two of which are followed by ReLU layers and dropout layers. The last layer in the network outputs a score for the presence of the cell category in the input image patch (and equivalently a score for the lack of presence of a cell category in the input image patch).
[0045] Unbalanced data is a common situation where the number of instances of one category is significantly smaller than the number of instances of another category. In order to obtain a robust network there should be enough examples of each category. We therefore add a data balancing methodology for effective active learning F7. The data balancing methods can be one of the following: we rank the cells inversely proportional to their frequency; we duplicate image patches that belong to the least frequent category. We can also add data balancing using weighting, where the weight is inversely proportional to the proportion of the least frequent category. Another approach is to add data balancing using the following weighting: Weight = E*AB*(N−E)*Pminority, where: E = Entropy(class proportions)
[0046] A = the acquisition function as defined in Active Learning
[0047] B = a parameter
[0048] N = the number of categories
[0049] Pminority = the output of the neural network that detects cell categories, giving the probability for the minority category.
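The weighting above can be sketched as follows. Note this sketch reads the "AB" term as the product A*B; the original notation is ambiguous and could also denote A raised to the power B, so the function is illustrative only:

```python
import numpy as np

def balanced_weight(class_proportions, acquisition, b, p_minority):
    """Weight = E * A * B * (N - E) * Pminority, where E is the
    entropy of the current class proportions, A the acquisition
    value, B a tuning parameter, N the number of categories, and
    p_minority the network's probability for the minority class.
    'A * B' is one reading of the text's 'AB' (A**b is another)."""
    p = np.asarray(class_proportions, dtype=float)
    entropy = float(-(p[p > 0] * np.log(p[p > 0])).sum())
    n = len(p)
    return entropy * acquisition * b * (n - entropy) * p_minority
```

When the classes are perfectly balanced the entropy term is maximal; when one class dominates, the entropy (and hence the weight) shrinks toward zero.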
[0050] Once we have enough examples of each category, we can move to the usual approach of active learning. The image patches are then ranked according to the active learning methodology F8 described above.