CLASSIFICATION OF CELL NUCLEI
20220058371 · 2022-02-24
CPC classification: G06F18/214 (PHYSICS); G06F18/2415 (PHYSICS)
Abstract
The present invention relates to a system that can be used to accurately classify objects in biological specimens. The user first manually classifies an initial set of images, which are used to train a classifier. The classifier is then run on a complete set of images and outputs not merely a classification but the probability that each image belongs to each of a variety of classes. Images are then displayed, sorted not merely by the proposed class but also by the likelihood that the image in fact belongs in a proposed alternative class. The user can then reclassify images as required.
Claims
1. A method of classifying a set of images of cell nuclei into a plurality of classes, comprising: accepting input classifying each of an initial training set of images taken from the set of images of cell nuclei into a user-selected class among the plurality of classes; calculating a plurality of classification parameters characterising the image and/or the shapes of the individual nuclei of the initial training set of images; training a classification algorithm using the user-selected class and the plurality of classification parameters of the initial training set of images; running the trained classification algorithm on each of the set of images to output a set of probabilities that each of the set of images is in each of the plurality of classes; outputting on a user interface images of cell nuclei of the set of images which the set of probabilities indicates are in a likely class of the plurality of classes and also have a potential alternative class being a different class to the likely class of the plurality of classes; accepting user input to select images out of the output images that should be reclassified to the potential alternative class to obtain a final class for each of the set of images; and retraining the classification algorithm using the final class and the plurality of classification parameters of each of the complete set of images.
2. A method according to claim 1 further comprising: calculating at least one further optical parameter for images of a set of images being in a selected one or more of the final classes.
3. A method according to claim 1 further comprising carrying out case stratification on images of a set of images being in a selected one or more of the final classes.
4. A method according to claim 1 wherein the classification algorithm is an ensemble learning method for classification or regression that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (in the case of classification) or the mean prediction (in the case of regression) of the individual trees.
5. A method according to claim 1 wherein the plurality of classification parameters include a plurality of parameters selected from: Area, optical density, Major Axis Length, Minor Axis Length, Form Factor, Shape Factor, Eccentricity, Convex area, Concavity, Equivalent Diameter, Perimeter, Perimeterdev, Symmetry, Hu moments of the shape, Hu moments of the image within the shape, Hu moments of the whole image, Mean intensity within the shape, standard deviation of intensity within the shape, variance of intensity within the shape, skewness of intensity within the shape, kurtosis of intensity within the mask, coefficient of variation of intensity within the shape, mean intensity of whole area, standard deviation of intensity of whole area, variance of intensity in the whole area, kurtosis of intensity within whole area, border mean of shape, mean of intensity of the of the strip five pixels wide just outside the border of the mask, standard deviation of intensity of the strip five pixels wide just outside the border of the mask, variance of intensity of the strip five pixels wide just outside the border of the mask, skewness of intensity of the strip five pixels wide just outside the border of the mask, kurtosis of intensity of the strip five pixels wide just outside the border of the mask; coefficient of variation of intensity of the strip five pixels wide just outside the border of the mask, jaggedness, variance of the radius, minimum diameter, maximum diameter, number of gray levels in the object, angular change, and standard deviation of intensity of the image after applying a Gabor filter.
6. A method according to claim 5 wherein the plurality of parameters include at least five of the said parameters.
7. A method according to claim 5 wherein the plurality of parameters includes all of the said parameters.
8. A method according to claim 1 wherein the user interface has a control for selecting the potential alternative class when displaying images of nuclei of the likely class.
9. A method according to claim 1 further comprising capturing the image of cell nuclei by photographing a monolayer or section on a microscope.
10. A computer program product comprising computer program code means adapted to cause a computer to carry out a method according to claim 1 when said computer program code means is run on the computer.
11. A system comprising a computer and a means for capturing images of cell nuclei, wherein the computer is adapted to carry out a method according to claim 1 to classify images of cell nuclei into a plurality of classes.
12. A system comprising a computer and a user interface, wherein: the computer comprises code for calculating a plurality of classification parameters characterising the image and/or the shapes of the individual nuclei of the initial training set of images, training a classification algorithm using the user-selected class and the plurality of classification parameters of the initial training set of images, and running the trained classification algorithm on each of the set of images to output a set of probabilities that each of the set of images is in each of the plurality of classes; and the user interface includes a selection control for accepting user input classifying each of an initial training set of images taken from the set of images of cell nuclei into a user-selected class among the plurality of classes; a display area for outputting on the user interface images of cell nuclei of the set of images which the set of probabilities indicates are in a likely class of the plurality of classes and also have a potential alternative class being a different class to the likely class of the plurality of classes; a selection control for accepting user input to select images out of the output images that should be reclassified to the potential alternative class to obtain a final class for each of the set of images; wherein the computer system further comprises code for retraining the classification algorithm using the final class and the plurality of classification parameters of each of the complete set of images.
13. A system according to claim 12 wherein the classification algorithm is an algorithm adapted to output a set of respective probabilities that an image represents an example of each respective class.
14. A system according to claim 12 wherein the user interface has a control for selecting the potential alternative class when displaying images of nuclei of the likely class.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0028] For a better understanding of the invention, embodiments will now be described, purely by way of example, with reference to the accompanying drawings.
DETAILED DESCRIPTION
The System
[0034] Images may be captured using the components shown in
[0035] As an alternative or additionally to capturing images of specimens using the components shown in
[0036] Indeed, the method of the invention is not reliant on the images all being captured in the same way on the same apparatus and is able to cope with large numbers of images obtained from a variety of sources.
[0037] The processing of these images is then carried out in accordance with the method illustrated in
[0038] The set of images are then passed to the computer 2 which segments them, i.e. identifies the individual nuclei. A number of parameters, shown in Table 1 below, are then calculated for each of the masks.
[0039] A user then uses the system shown in
[0040] The images are retrieved (Step 200) and displayed (Step 210) on the user interface 7,8, which includes a screen 7 and a pointer controller such as mouse 8. The user can then (Step 220) sort the objects by ordering them by the parameters listed in Table 1; the objects can then be selected and moved to a relevant class, either one at a time or by selecting using the rubber band technique. An example screen of images in nuclei display area 24 sorted into class 1 (indicated by the selected class selection control 12 labelled 1) is shown in
[0041] The user interface screen 7 includes a nuclei display area 24 and a number of controls 10. A “Class selection” control 12 allows the selection of individual classes, to display the nuclei from those classes. An “Analyze” control 14 generates a histogram (of intensity) of a selected nucleus or nuclei. A select control 16 switches into a mode where selecting a nucleus with the mouse selects that nucleus, and a deselect control 18 switches into a mode where selecting a nucleus with the mouse deselects that nucleus. By the use of these controls the user can select a number of nuclei. These can then be dragged into a different class by dragging them to the respective class selection control 12.
[0042] Note that in some cases the user may be able to classify an image by eye. In alternative cases, the user may select an image and the user interface screen may respond by presenting further data relating to the image to assist the user in classifying the image.
[0043] The user interface screen 7 also includes a sort control 20,22. This may be used to sort the images of nuclei of one class by the probability that the image is in a different class at a later stage of the method. In the example of
[0044] It is not necessary for the user in this initial step to classify more than a fraction of the complete set of images.
[0045] Next, the method uses a classification approach to classify the other images that have not been classified by the user. A number of classification parameters are calculated (Step 230) for each of the images classified by the user.
[0046] The classification approach uses a number of parameters, which will be referred to as classification parameters. In the particular arrangement, the following classification parameters are calculated for each image. It will be appreciated that although the following list gives good results in the specific area of interest, other sets of classification parameters may be used where appropriate. In particular, it is not necessary to calculate all parameters for all applications; in some cases a more limited set of parameters may give results that are effectively as good.
TABLE 1. Calculated Parameters

Parameter | Description
Area | Number of pixels within the mask
OD |
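By way of illustration only, a few of the parameters from Table 1 (area, equivalent diameter, and intensity statistics within the shape) can be computed from a binary mask and a grey-level image. This is a hedged sketch, not the patented implementation; the function and array names are illustrative.

```python
import numpy as np

def nucleus_parameters(image, mask):
    """Compute a subset of the Table 1 parameters for one segmented nucleus.

    image: 2-D array of grey-level intensities.
    mask:  2-D boolean array, True inside the nucleus.
    """
    pixels = image[mask].astype(float)
    area = int(mask.sum())  # number of pixels within the mask
    mean = pixels.mean()
    std = pixels.std()
    return {
        "Area": area,
        # diameter of a circle having the same area as the mask
        "Equivalent Diameter": 2.0 * np.sqrt(area / np.pi),
        "Mean intensity within the shape": mean,
        "SD of intensity within the shape": std,
        "CV of intensity within the shape": std / mean if mean else 0.0,
    }
```

Parameters such as Hu moments, eccentricity or Gabor-filter statistics would follow the same per-nucleus pattern but need more involved moment and filtering code.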
[0047] Then, the algorithm is trained using the classification parameters for each of the initial training set of images. Data on the images, i.e. the classification parameters and the user-selected class are sent (step 240) to an algorithm to be trained (step 280).
[0048] Any suitable classification algorithm may be used. The classification algorithm should not simply output a proposed classification, but instead output a measure of the probability of each image fitting into each available class as a function of the classification parameters.
[0049] A particularly suitable type of algorithm is an ensemble learning method for classification or regression that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (in the case of classification) or the mean prediction (in the case of regression) of the individual trees. Such an algorithm calculating a set of decision trees may be based on the paper by Tin Kam Ho, IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume 20, Issue 8, August 1998), and developments thereof may be used.
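The aggregation step described above (mode of the trees' class votes for classification, mean of their predictions for regression) can be sketched in a few lines. The per-tree outputs here are illustrative placeholders, not outputs of an actual trained forest.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_votes):
    """Ensemble output for classification: the mode of the trees' class votes."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_predictions):
    """Ensemble output for regression: the mean of the trees' predictions."""
    return mean(tree_predictions)
```

For example, if four of six trees vote for class 6, `aggregate_classification` returns 6; class probabilities can likewise be estimated from the vote fractions.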
[0050] In particular, classification algorithms sometimes referred to as “XG Boost” or “Random Forest” may be used. In the examples in this case, the algorithms used were those available at https://cran.r-project.org/web/packages/randomForest/randomForest.pdf and in the alternative https://cran.r-project.org/web/packages/xgboost/xgboost.pdf.
[0051] The output of these algorithms is, for each of the set of images, a probability that each of the images represents an example of each class. For example, in the case that there are six classes, the set of probabilities of a sample image may be (0.15,0.04,0.11,0.26,0.11,0.33), in which the numbers represent the probability that the sample image is in the first, second, third, fourth, fifth and sixth class respectively. In this example, the highest probability is that the sample image is in the sixth class and so the sample image is classified into that class.
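Using the six-class probability vector from the example above, the assigned class is simply the 1-based position of the highest probability:

```python
def assign_class(probabilities):
    """Return the 1-based index of the most probable class."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i]) + 1

# The sample image from the example: highest probability (0.33) is the sixth entry.
probs = (0.15, 0.04, 0.11, 0.26, 0.11, 0.33)
```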
[0052] At this stage of the method, the classification parameters and the user-selected class of the initial training set of images are used to train the classification algorithm.
[0053] Then, the algorithm is run (Step 250) on the complete set of images, not just the initial training set of images, or alternatively on just those images that are not part of the initial training set, to classify each of the images.
[0054] These images are then displayed (step 260) not merely on the basis of the chosen sample class but also on the basis of the likelihood that the image is in a different class. Thus, the images may be displayed in groups determined not merely by the classification of the image but also by the probability that the image may be in another class.
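One possible sketch of this grouping: among the images whose most probable (likely) class is fixed, rank them by the probability of a chosen alternative class, so the images most plausibly belonging to the alternative class appear first for review. The data layout (a list of per-image probability vectors, classes numbered from one) is an assumption.

```python
def review_order(probabilities, likely_class, alternative_class):
    """Indices of images assigned to likely_class, sorted so that those most
    likely to belong instead to alternative_class come first.

    probabilities: list of per-image probability vectors.
    """
    assigned = [
        i for i, p in enumerate(probabilities)
        if max(range(len(p)), key=lambda k: p[k]) + 1 == likely_class
    ]
    return sorted(
        assigned,
        key=lambda i: probabilities[i][alternative_class - 1],
        reverse=True,
    )
```

The user interface would then page through `review_order(...)` for each (likely, alternative) class pair.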
[0055] For example, as illustrated in
[0056] The user may select the displays represented in
[0057] The user can then review these pages of images and quickly and easily select and reclassify those images that should be in the proposed alternative class (step 270).
[0058] This leads to a set of images that have been reviewed by the human user without the need for individually reclassifying every image.
[0059] At this stage, the reviewed classification of the image set can be used for further analysis. This is appropriate if what is required is a set of images for analysis. Such analysis may include calculating a further optical parameter from each of a particular class of images, i.e. each of the images in one of the classes. Such calculation of the further optical parameter can include calculating optical density, calculating integrated optical density, or calculating pixel-level measures such as texture, and/or calculating measures of some property of the cell, such as the biological cell type or other biological characteristic.
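For instance, integrated optical density can be computed per nucleus from transmitted intensity using the standard relation OD = -log10(I / I0) and summing over the mask. The background intensity I0 and the array names are assumptions for illustration.

```python
import numpy as np

def integrated_optical_density(image, mask, background_intensity):
    """Sum of per-pixel optical density, OD = -log10(I / I0), within the mask.

    image: 2-D array of transmitted intensities (must be > 0 inside the mask).
    mask:  2-D boolean array, True inside the nucleus.
    background_intensity: clear-field (I0) intensity.
    """
    intensities = image[mask].astype(float)
    od = -np.log10(intensities / background_intensity)
    return float(od.sum())
```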
[0060] Alternatively, at this stage, the classification algorithm can be retrained using the classification parameters of all of the images (by rerunning step 280 with the complete data set) and the class assigned to those images after review by the human user. In the example, the same classification algorithm is used as was trained using the initial training set of data. Alternatively, another algorithm may be used.
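The initial-training and retraining steps share the same shape: fit a classifier on (parameters, labels), first with the small user-classified subset, then with the complete set and the final reviewed classes. The sketch below uses a toy nearest-centroid model as a stand-in for the random-forest or XG Boost classifier of the source; the variable names in the comments are hypothetical.

```python
import numpy as np

def train_centroid_classifier(features, labels):
    """Toy stand-in for the classifier: one centroid per class.

    features: (n_images, n_parameters) array of classification parameters.
    labels:   per-image class labels (user-selected or final classes).
    """
    labels = np.asarray(labels)
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

# Initial training on the user-classified subset, then retraining on the
# complete set once every image has a final (reviewed) class, e.g.:
#   model = train_centroid_classifier(subset_features, user_classes)
#   model = train_centroid_classifier(all_features, final_classes)
```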
[0061] This leads to a trained classification algorithm that is effectively trained on the complete set of images without the user having had to manually classify each of the set of images. This means that it is possible to use much larger training data sets and hence to provide a more accurate and reliable trained classification algorithm.
[0062] The inventors have discovered that this approach works particularly well with some or all of the set of classification indicia proposed.
[0063] The resulting trained classification algorithm may be trained with greater quantities of data and hence is in general terms more reliable. Therefore, the trained algorithm may create a better automatic classifier of images, which can be extremely important in medical applications. Accurate classification of images of nuclei is a critical step, for example in evaluating cancer in patients, as the different susceptibility of different types of nuclei to different types of cancer means that it is necessary to have accurately classified nuclei to achieve accurate diagnosis. Such accurate classification and diagnosis may in turn allow for patients to be treated appropriately for their illness, for example only using chemotherapy where treating the exact type of cancer with chemotherapy has been shown to give enhanced life outcomes. This does not just apply to cancer, but to any medical test requiring the use of classified images of nuclei.
[0064] The utility of the larger dataset for training is that it allows the training set to include rare biological events, such as small sub-populations of cells with certain characteristics, so that these rare cells can be represented with statistical reliability and hence trained into the system. It also allows rapid retraining of a system where there have been small changes in the biological specimen, preparation or imaging system that cause the existing classifier to require refinement.