CLASSIFICATION OF CELL NUCLEI
20220058371 · 2022-02-24
CPC classification: G06F18/214 (PHYSICS); G06F18/2415 (PHYSICS)
Abstract
The present invention relates to a system that can be used to accurately classify objects in biological specimens. The user first manually classifies an initial set of images, which are used to train a classifier. The classifier is then run on a complete set of images and outputs not merely a classification but the probability that each image belongs to each of a variety of classes. Images are then displayed, sorted not merely by the proposed class but also by the likelihood that the image in fact belongs in a proposed alternative class. The user can then reclassify images as required.
Claims
1. A method of classifying a set of images of cell nuclei into a plurality of classes, comprising: accepting input classifying each of an initial training set of images taken from the set of images of cell nuclei into a user-selected class among the plurality of classes; calculating a plurality of classification parameters characterising the image and/or the shapes of the individual nuclei of the initial training set of images; training a classification algorithm using the user-selected class and the plurality of classification parameters of the initial training set of images; running the trained classification algorithm on each of the set of images to output a set of probabilities that each of the set of images is in each of the plurality of classes; outputting on a user interface images of cell nuclei of the set of images which the set of probabilities indicates are in a likely class of the plurality of classes and also have a potential alternative class being a different class to the likely class of the plurality of classes; accepting user input to select images out of the output images that should be reclassified to the potential alternative class to obtain a final class for each of the set of images; and retraining the classification algorithm using the final class and the plurality of classification parameters of each of the complete set of images.
2. A method according to claim 1 further comprising: calculating at least one further optical parameter for images of a set of images being in a selected one or more of the final classes.
3. A method according to claim 1 further comprising carrying out case stratification on images of a set of images being in a selected one or more of the final classes.
4. A method according to claim 1 wherein the classification algorithm is an ensemble learning method for classification or regression that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (in the case of classification) or the mean prediction (in the case of regression) of the individual trees.
5. A method according to claim 1 wherein the plurality of classification parameters include a plurality of parameters selected from: Area, optical density, Major Axis Length, Minor Axis Length, Form Factor, Shape Factor, Eccentricity, Convex area, Concavity, Equivalent Diameter, Perimeter, Perimeterdev, Symmetry, Hu moments of the shape, Hu moments of the image within the shape, Hu moments of the whole image, Mean intensity within the shape, standard deviation of intensity within the shape, variance of intensity within the shape, skewness of intensity within the shape, kurtosis of intensity within the mask, coefficient of variation of intensity within the shape, mean intensity of whole area, standard deviation of intensity of whole area, variance of intensity in the whole area, kurtosis of intensity within whole area, border mean of shape, mean of intensity of the of the strip five pixels wide just outside the border of the mask, standard deviation of intensity of the strip five pixels wide just outside the border of the mask, variance of intensity of the strip five pixels wide just outside the border of the mask, skewness of intensity of the strip five pixels wide just outside the border of the mask, kurtosis of intensity of the strip five pixels wide just outside the border of the mask; coefficient of variation of intensity of the strip five pixels wide just outside the border of the mask, jaggedness, variance of the radius, minimum diameter, maximum diameter, number of gray levels in the object, angular change, and standard deviation of intensity of the image after applying a Gabor filter.
6. A method according to claim 5 wherein the plurality of parameters include at least five of the said parameters.
7. A method according to claim 5 wherein the plurality of parameters includes all of the said parameters.
8. A method according to claim 1 wherein the user interface has a control for selecting the potential alternative class when displaying images of nuclei of the likely class.
9. A method according to claim 1 further comprising capturing the image of cell nuclei by photographing a monolayer or section on a microscope.
10. A computer program product comprising computer program code means adapted to cause a computer to carry out a method according to claim 1 when said computer program code means is run on the computer.
11. A system comprising a computer and a means for capturing images of cell nuclei, wherein the computer is adapted to carry out a method according to claim 1 to classify images of cell nuclei into a plurality of classes.
12. A system comprising a computer and a user interface, wherein: the computer comprises code for calculating a plurality of classification parameters characterising the image and/or the shapes of the individual nuclei of the initial training set of images, training a classification algorithm using the user-selected class and the plurality of classification parameters of the initial training set of images, and running the trained classification algorithm on each of the set of images to output a set of probabilities that each of the set of images is in each of the plurality of classes; and the user interface includes a selection control for accepting user input classifying each of an initial training set of images taken from the set of images of cell nuclei into a user-selected class among the plurality of classes; a display area for outputting on the user interface images of cell nuclei of the set of images which the set of probabilities indicates are in a likely class of the plurality of classes and also have a potential alternative class being a different class to the likely class of the plurality of classes; a selection control for accepting user input to select images out of the output images that should be reclassified to the potential alternative class to obtain a final class for each of the set of images; wherein the computer system further comprises code for retraining the classification algorithm using the final class and the plurality of classification parameters of each of the complete set of images.
13. A system according to claim 12 wherein the classification algorithm is an algorithm adapted to output a set of respective probabilities that an image represents an example of each respective class.
14. A system according to claim 12 wherein the user interface has a control for selecting the potential alternative class when displaying images of nuclei of the likely class.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0028] For a better understanding of the invention, embodiments will now be described, purely by way of example, with reference to the accompanying drawings.
DETAILED DESCRIPTION
The System
[0034] Images may be captured using the components shown in
[0035] As an alternative or additionally to capturing images of specimens using the components shown in
[0036] Indeed, the method of the invention is not reliant on the images all being captured in the same way on the same apparatus and is able to cope with large numbers of images obtained from a variety of sources.
[0037] The processing of these images is then carried out in accordance with the method illustrated in
[0038] The set of images are then passed to the computer 2 which segments them, i.e. identifies the individual nuclei. A number of parameters, shown in Table 1 below, are then calculated for each of the masks.
[0039] A user then uses the system shown in
[0040] The images are retrieved (Step 200) and displayed (Step 210) on the user interface 7,8, which includes a screen 7 and a pointer controller such as mouse 8. The user can then (Step 220) sort the objects by ordering them by the parameters listed in Table 1; the objects can then be selected and moved to a relevant class, either one at a time or by selecting using the rubber band technique. An example screen of images in nuclei display area 24 sorted into class 1 (indicated by the selected class selection control 12 labelled 1) is shown in
[0041] The user interface screen 7 includes a nuclei display area 24 and a number of controls 10. A “Class selection” control 12 allows the selection of individual classes, to display the nuclei from those classes. An “Analyze” control 14 generates a histogram (of intensity) of a selected nucleus or nuclei. A select control 16 switches into a mode where selecting a nucleus with the mouse selects that nucleus, and a deselect control 18 switches into a mode where selecting a nucleus with the mouse deselects that nucleus. By the use of these controls the user can select a number of nuclei. These can then be dragged into a different class by dragging them to the respective class selection control 12.
[0042] Note that in some cases the user may be able to classify an image by eye. In alternative cases, the user may select an image and the user interface screen may respond by presenting further data relating to the image to assist the user in classifying the image.
[0043] The user interface screen 7 also includes a sort control 20,22. This may be used to sort the images of nuclei of one class by the probability that the image is in a different class at a later stage of the method. In the example of
[0044] It is not necessary for the user in this initial step to classify more than a fraction of the complete set of images.
[0045] Next, the method uses a classification approach to classify the other images that have not been classified by the user. A number of classification parameters are calculated (Step 230) for each of the images classified by the user.
[0046] The classification approach uses a number of parameters, which will be referred to as classification parameters. In the particular arrangement, the following classification parameters are calculated for each image. It will be appreciated that although the following list gives good results in the specific area of interest, other sets of classification parameters may be used where appropriate. In particular, it is not necessary to calculate all parameters for all applications; in some cases a more limited set of parameters may give results that are effectively as good.
TABLE 1. Calculated Parameters

Parameter | Description
Area | Number of pixels within the mask
OD |
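By way of illustration only, a few of the parameters from Table 1 (area, equivalent diameter, and intensity statistics within the shape) can be computed from a binary mask and a grey-level image. This is a hedged sketch, not the patented implementation; the function and array names are illustrative.

```python
import numpy as np

def nucleus_parameters(image, mask):
    """Compute a subset of the Table 1 parameters for one segmented nucleus.

    image: 2-D array of grey-level intensities.
    mask:  2-D boolean array, True inside the nucleus.
    """
    pixels = image[mask].astype(float)
    area = int(mask.sum())  # number of pixels within the mask
    mean = pixels.mean()
    std = pixels.std()
    return {
        "Area": area,
        # diameter of a circle having the same area as the mask
        "Equivalent Diameter": 2.0 * np.sqrt(area / np.pi),
        "Mean intensity within the shape": mean,
        "SD of intensity within the shape": std,
        "CV of intensity within the shape": std / mean if mean else 0.0,
    }
```

Parameters such as Hu moments, eccentricity or Gabor-filter statistics would follow the same per-nucleus pattern but need more involved moment and filtering code.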
[0047] Then, the algorithm is trained using the classification parameters for each of the initial training set of images. Data on the images, i.e. the classification parameters and the user-selected class are sent (step 240) to an algorithm to be trained (step 280).
[0048] Any suitable classification algorithm may be used. The classification algorithm should not simply output a proposed classification, but instead output a measure of the probability of each image fitting into each available class as a function of the classification parameters.
[0049] A particularly suitable type of algorithm is an ensemble learning method for classification or regression that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (in the case of classification) or the mean prediction (in the case of regression) of the individual trees. Such an algorithm calculating a set of decision trees may be based on the paper by Tin Kam Ho, IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume 20, Issue 8, August 1998), and developments thereof may be used.
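The aggregation step described above (mode of the trees' class votes for classification, mean of their predictions for regression) can be sketched in a few lines. The per-tree outputs here are illustrative placeholders, not outputs of an actual trained forest.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_votes):
    """Ensemble output for classification: the mode of the trees' class votes."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_predictions):
    """Ensemble output for regression: the mean of the trees' predictions."""
    return mean(tree_predictions)
```

For example, if four of six trees vote for class 6, `aggregate_classification` returns 6; class probabilities can likewise be estimated from the vote fractions.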
[0050] In particular, classification algorithms sometimes referred to as “XG Boost” or “Random Forest” may be used. In the examples in this case, the algorithms used were those available at https://cran.r-project.org/web/packages/randomForest/randomForest.pdf and in the alternative https://cran.r-project.org/web/packages/xgboost/xgboost.pdf.
[0051] The output of these algorithms is, for each of the set of images, a probability that each of the images represents an example of each class. For example, in the case that there are six classes, the set of probabilities of a sample image may be (0.15,0.04,0.11,0.26,0.11,0.33), in which the numbers represent the probability that the sample image is in the first, second, third, fourth, fifth and sixth class respectively. In this example, the highest probability is that the sample image is in the sixth class and so the sample image is classified into that class.
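Using the six-class probability vector from the example above, the assigned class is simply the 1-based position of the highest probability:

```python
def assign_class(probabilities):
    """Return the 1-based index of the most probable class."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i]) + 1

# The sample image from the example: highest probability (0.33) is the sixth entry.
probs = (0.15, 0.04, 0.11, 0.26, 0.11, 0.33)
```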
[0052] At this stage of the method, the classification parameters and the user-selected class of the initial training set of images are used to train the classification algorithm.
[0053] Then, the algorithm is run (Step 250) on the complete set of images, not just the initial training set of images, or alternatively on just those images that are not part of the initial training set, to classify each of the images.
[0054] These images are then displayed (step 260) not merely on the basis of the chosen sample class but also on the basis of the likelihood that the image is in a different class. Thus, the images may be displayed in groups determined not merely by the classification of the image but also by the probability that the image may be in another class.
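One possible sketch of this grouping: among the images whose most probable (likely) class is fixed, rank them by the probability of a chosen alternative class, so the images most plausibly belonging to the alternative class appear first for review. The data layout (a list of per-image probability vectors, classes numbered from one) is an assumption.

```python
def review_order(probabilities, likely_class, alternative_class):
    """Indices of images assigned to likely_class, sorted so that those most
    likely to belong instead to alternative_class come first.

    probabilities: list of per-image probability vectors.
    """
    assigned = [
        i for i, p in enumerate(probabilities)
        if max(range(len(p)), key=lambda k: p[k]) + 1 == likely_class
    ]
    return sorted(
        assigned,
        key=lambda i: probabilities[i][alternative_class - 1],
        reverse=True,
    )
```

The user interface would then page through `review_order(...)` for each (likely, alternative) class pair.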
[0055] For example, as illustrated in
[0056] The user may select the displays represented in
[0057] The user can then review these pages of images and quickly and easily select and reclassify those images that should be in the proposed alternative class (step 270).
[0058] This leads to a set of images that have been reviewed by the human user without the need for individually reclassifying every image.
[0059] At this stage, the reviewed classification of the image set can be used for further analysis. This is appropriate if what is required is a set of images for analysis. Such analysis may include calculating a further optical parameter from each of a particular class of images, i.e. each of the images in one of the classes. Such calculation of the further optical parameter can include calculating optical density, calculating integrated optical density, or calculating pixel-level measures such as texture, and/or calculating measures of some property of the cell, such as the biological cell type or other biological characteristic.
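For instance, integrated optical density can be computed per nucleus from transmitted intensity using the standard relation OD = -log10(I / I0) and summing over the mask. The background intensity I0 and the array names are assumptions for illustration.

```python
import numpy as np

def integrated_optical_density(image, mask, background_intensity):
    """Sum of per-pixel optical density, OD = -log10(I / I0), within the mask.

    image: 2-D array of transmitted intensities (must be > 0 inside the mask).
    mask:  2-D boolean array, True inside the nucleus.
    background_intensity: clear-field (I0) intensity.
    """
    intensities = image[mask].astype(float)
    od = -np.log10(intensities / background_intensity)
    return float(od.sum())
```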
[0060] Alternatively, at this stage, the classification algorithm can be retrained using the classification parameters of all of the images (by rerunning step 280 with the complete data set) and the class assigned to those images after review by the human user. In the example, the same classification algorithm is used as was trained using the initial training set of data. Alternatively, another algorithm may be used.
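The initial-training and retraining steps share the same shape: fit a classifier on (parameters, labels), first with the small user-classified subset, then with the complete set and the final reviewed classes. The sketch below uses a toy nearest-centroid model as a stand-in for the random-forest or XG Boost classifier of the source; the variable names in the comments are hypothetical.

```python
import numpy as np

def train_centroid_classifier(features, labels):
    """Toy stand-in for the classifier: one centroid per class.

    features: (n_images, n_parameters) array of classification parameters.
    labels:   per-image class labels (user-selected or final classes).
    """
    labels = np.asarray(labels)
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

# Initial training on the user-classified subset, then retraining on the
# complete set once every image has a final (reviewed) class, e.g.:
#   model = train_centroid_classifier(subset_features, user_classes)
#   model = train_centroid_classifier(all_features, final_classes)
```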
[0061] This leads to a trained classification algorithm that is effectively trained on the complete set of images without the user having had to manually classify each of the set of images. This means that it is possible to use much larger training data sets and hence to provide a more accurate and reliable trained classification algorithm.
[0062] The inventors have discovered that this approach works particularly well with some or all of the set of classification indicia proposed.
[0063] The resulting trained classification algorithm may be trained with greater quantities of data and hence is in general terms more reliable. Therefore, the trained algorithm may create a better automatic classifier of images, which can be extremely important in medical applications. Accurate classification of images of nuclei is a critical step, for example in evaluating cancer in patients, as the different susceptibility of different types of nuclei to different types of cancer means that it is necessary to have accurately classified nuclei to achieve accurate diagnosis. Such accurate classification and diagnosis may in turn allow for patients to be treated appropriately for their illness, for example only using chemotherapy where treating the exact type of cancer with chemotherapy has been shown to give enhanced life outcomes. This does not just apply to cancer, but to any medical test requiring the use of classified images of nuclei.
[0064] The utility of the larger dataset for training is that it allows the training set to include rare biological events, such as small sub-populations of cells with certain characteristics, so that these rare cells can be represented with statistical reliability and hence trained into the system. It also allows rapid retraining of a system where there have been small changes in the biological specimen, preparation or imaging system that cause the existing classifier to require refinement.