Systems and methods for segmentation and processing of tissue images and feature extraction from same for treating, diagnosing, or predicting medical conditions
09971931 ยท 2018-05-15
Assignee
Inventors
- Peter Ajemba (Chatsworth, CA)
- Richard Scott (Chestnut Ridge, NY)
- Janakiramanan Ramachandran (Fremont, CA)
- Jack Zeineh (Fullerton, CA)
- Michael Donovan (Newton, MA)
- Gerardo Fernandez (New York, NY)
- Qiuhua Liu (Sugar Land, TX)
- Faisal Khan (Fishkill, NY)
Cpc classification
G06T5/94
PHYSICS
International classification
Abstract
Apparatus, methods, and computer-readable media are provided for segmentation, processing (e.g., preprocessing and/or postprocessing), and/or feature extraction from tissue images such as, for example, images of nuclei and/or cytoplasm. Tissue images processed by various embodiments described herein may be generated by Hematoxylin and Eosin (H&E) staining, immunofluorescence (IF) detection, immunohistochemistry (IHC), similar and/or related staining processes, and/or other processes. Predictive features described herein may be provided for use in, for example, one or more predictive models for treating, diagnosing, and/or predicting the occurrence (e.g., recurrence) of one or more medical conditions such as, for example, cancer or other types of disease.
Claims
1. A system for predicting the occurrence of a medical condition, the system comprising: (1) a database configured to store patient data, including at least one sample image of a tissue sample treated with a plurality of flurochrome labeled antibodies; and (2) a processor configured by code executing therein to perform the following: (a) generate a patient image dataset, using the at least one sample image, that includes values for one or more texture features selected from a group of features consisting of (i) homogeneity and (ii) correlation; (b) evaluate at least the patient image dataset with a Support Vector Regression for Censored Data (SVRc) algorithm executed as code by the processor, where the SVRc algorithm is configured to output a value corresponding to a risk score for a medical condition occurrence based on the patient image dataset, wherein the SVRc algorithm is generated by performing regression, using code executed in the processor, on a population dataset, where each member of the population has measurement values corresponding to each feature of the patient image dataset; (c) assign the patient to a high probability of a medical condition occurrence where the output value is below a pre-determined threshold and assign the patient to a low probability of a medical condition occurrence where the output value is above the pre-determined threshold; and (d) generate a report based on the updated patient dataset.
2. The system of claim 1, wherein the processor is configured to determine homogeneity according to:
3. The system of claim 1, wherein the processor is configured to determine correlation according to:
4. The system of claim 1, wherein the processor is configured to evaluate the patient dataset using the SVRC algorithm, the patient dataset comprising the patient image dataset, one or more clinical features, and one or more molecular features, and the SVRC algorithm is generated by performing regression on a population dataset where each member of the population dataset has data corresponding to each of the features found in the patient dataset.
5. A system of predicting occurrence of a medical condition, the system comprising: (1) a database configured to store patient data, including at least one sample image of a tissue sample treated with a plurality of flurochrome labeled antibodies; and (2) a processor configured by code executing therein to perform the following: (a) generate a patient morphometric dataset, using the at least one sample image, that includes values for at least one of the following features (1) one or more values selected from a group of ring measurements of one or more ring metrics; and (2) one or more values derived from one or more ring metrics, wherein the one or more values represents an adjacency relationship between rings; (b) evaluate at least the patient morphometric dataset with a Support Vector Regression for Censored Data (SVRc) algorithm executed as code by the processor, where the SVRc algorithm is configured to output a value corresponding to a risk score for a medical condition occurrence based on the morphometric dataset, wherein the SVRc algorithm is generated by performing regression, using code executed in the processor, on a population dataset, where each member of the population has measurement values corresponding to each feature of the patient morphometric dataset; (c) assign the patient to a high probability of a medical condition occurrence where the output model value is below a pre-determined threshold and assign the patient to a low probability of a medical condition occurrence where the output value is above the pre-determined threshold; and (d) generate a report based on the updated patient dataset.
6. The system of claim 5, wherein the medical condition is prostate cancer.
7. The system of claim 5, wherein the patient morphometric dataset further includes at least one value corresponding to (a) an outer diameter of a ring, inner diameter of a ring, a border gap, lumen or clearing diameter, a border density, a lumen ratio, a proportion of border touching inner clearing, a proportion of border touching stroma, a ratio of border less than a predefined number of pixels from stroma, a mean distance of border pixels from stroma, a width of epithelial padding between ring and stroma; (b) feature(s) derived from any of the foregoing values; and (c) feature(s) representing an adjacency relationship between rings.
8. The system of claim 5, wherein the patient morphometric dataset includes at least one value corresponding to one or more features derived from a ring segmentation and/or ring classification.
9. The system of claim 5, wherein the processor is configured to evaluate the patient dataset using the SVRC algorithm, the patient dataset comprising the patient morphometric dataset, one or more clinical features, and one or more molecular features, and the SVRC algorithm is generated by performing regression on a population dataset where each member of the population dataset has data corresponding to each of the features found in the patient dataset.
10. The system of claim 9, wherein the patient dataset further includes at least one value corresponding to one or more features selected from the group of ring combination features consisting of (i) a feature generated by localizing a feature per ring, (ii) a set of values characterizing morphological components of a ring, and (iii) a combined image feature generated by combining one or more localized features, localized per ring, with one or more morphological components of a ring.
11. The system of claim 10, wherein the one or more ring combination features comprises a biomarker feature in combination with a morphological ring feature.
12. The system of claim 10, wherein the one or more ring combination features comprises an androgen receptor (AR) biomarker intensity value in combination with a morphological ring feature.
13. The system of claim 10, wherein the one or more ring combination features comprises a Ki67 area feature in combination with a morphological ring feature.
14. The system of claim 10, wherein the one or more ring combination features comprises a morphologic or architectural marker in combination with a morphological ring feature.
15. The system of claim 14, wherein the morphologic or architectural marker comprises a high molecular weight cytokeratin, HMWCK, or AMACR marker.
16. A system of predicting occurrence of a medical condition, the system comprising: (1) a database configured to store patient data, including at least one sample image of a tissue sample treated with a plurality of flurochrome labeled antibodies; and (2) a processor configured by code executing therein to perform the following: (a) generate a patient image dataset, using the at least one sample image, that includes values for at least one of the following features: (1) a feature generated based upon a comparison of histograms, said histograms corresponding to compartments or sub-compartments of cellular objects and (2) a feature generated from an intensity index corresponding to image signal intensity; (b) evaluate at least the patient image dataset with a Support Vector Regression for Censored Data (SVRc) algorithm executed as code by the processor, where the SVRc algorithm is configured to output a value corresponding to a risk score for a medical condition occurrence based on the morphometric dataset, wherein the SVRc algorithm is generated by performing regression, using code executed in the processor, on a population dataset, where each member of the population has measurement values corresponding to each feature of the patient image dataset; (c) assign the patient to a high probability of a medical condition occurrence where the output model value is below a pre-determined threshold and assign the patient to a low probability of a medical condition occurrence where the output value is above the pre-determined threshold; and (d) generate a report based on the updated patient dataset.
17. The system of claim 16, wherein the processor is configured to evaluate the patient dataset using the SVRC algorithm, the patient dataset comprising the patient image dataset, one or more clinical features, and one or more molecular features, and the SVRC algorithm is generated by performing regression on a population dataset where each member of the population dataset has data corresponding to each of the features found in the patient dataset.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) For a better understanding of embodiments of the present invention, reference is made to the following description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)
(19)
(20)
(21)
(22)
(23)
(24)
(25)
(26)
(27)
(28)
(29)
(30)
(31)
(32)
(33)
(34)
(35)
(36)
(37)
(38)
(39)
(40)
(41)
(42)
(43)
(44)
(45)
(46)
(47)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(48)
(49) Each of modules 102-108 may include any suitable hardware (e.g., one or more computers or processors), software, firmware, or combination thereof for performing the respective functions described herein in connection with, for example, one or more of
(50) In some embodiments of the present invention, segmentation module 106 may be an apparatus configured to perform any one or more (e.g., all) of the functions described in connection with
(51) In some embodiments of the present invention, postprocessing module 108 may be an apparatus configured to perform any one or more (e.g., all) of the functions described in connection with
(52) In some embodiments of the present invention, feature extraction module 110 may be configured to perform any one or more (e.g., all) of the functions described in connection with
(53) Modules 102-110 are shown in
(54) Pre-Processing of Tissue Images
(55)
(56) Signal intensity in a tissue image is often used as a criterion to determine whether a given portion of the image is positive signal or background signal. For example, when the intensity of a given portion (e.g., one or more pixels) of the image exceeds a threshold value, the signal may be deemed a positive signal. Portions of the image having an intensity below the threshold may be deemed background signal. Positive signal may be included in subsequent processing of the tissue image including, for example, segmentation and classification of the positive signal into cellular and/or tissue objects and/or feature extraction. Background signal, on the other hand, may be ignored in subsequent processing. Non-uniform variations in intensity can result in a failure, for example, to properly distinguish positive signal from background signal and can lead to segmentation errors, misclassification of cellular and/or tissue objects, and incorrect feature extraction.
(57) In multispectral microscopy, for example, multiple proteins (antigens) in a tissue specimen are simultaneously labeled with different fluorescent dyes conjugated to antibodies specific to each particular protein. Each dye has a distinct emission spectrum and binds to its target protein within a tissue compartment. The labeled tissue is imaged under an excitation light source using a microscope fitted with one or more relevant filters and a multispectral camera. The resulting multispectral image (e.g., image cube) is then subjected to spectral unmixing to separate the overlapping spectra of the fluorescent labels. The unmixed images have multiple components, where each component (e.g., image layer) represents the expression level of a protein-antigen in the tissue. For each image, the presence or absence of a positive signal within any given region of the image may be determined based on the intensity of the signal in that region.
(58) In some embodiments of the present invention, process 200 may include, consist of, or consist essentially of estimating the inverse illumination field of a tissue image at stage 202, and generating a corrected image at stage 204 based on (e.g., based at least in part on) the estimated inverse illumination field (e.g., multiplying the tissue image by the inverse illumination field to obtain a corrected image). The intensity non-uniformity in the tissue image input to stage 202 may be corrected or substantially improved by process 200. The output of stage 204 may be a corrected or substantially improved image having fewer (e.g., none) non-uniform intensity variations than the tissue image input to stage 202. According to various embodiments of the present invention, images resulting from process 200 may be subsequently processed at stage 206 (e.g., subject to segmentation, classification of cellular and/or tissue objections, and/or feature extraction). Advantageously, the operation of stages 202-204 may allow for improved results (e.g., fewer errors) in such subsequent processing.
(59) In some embodiments of the present invention, process 200 may be used to correct for intensity non-uniformity in a tissue image that includes primarily nuclei. The tissue image input to process 200 may be generated, for example, by imaging tissue that is labeled with the nuclear counterstain 4-6-diamidino-2-phenylindole (DAPI). Such imaging may be performed by image acquisition module 102. DAPI is a fluorescent dye that has a distinct emission spectrum. In some embodiments, DAPI may be used alone or in combination with one or more additional fluorescent dyes such as, for example, the biomarker cytokeratin 18 (CK18) that binds to cytoplasm.
(60) In other embodiments of the present invention, process 200 may be utilized to correct for non-uniform intensity variations in other types of images including, for example, images that include primarily cytoplasm (e.g., generated by imaging tissue labeled with CK18), images generated by imaging tissue labeled with other biomarker(s), and/or images that include other tissue or cellular components or a combination of tissue and/or cellular components.
(61)
(62) At stage 208, tissue background in a tissue image is subtracted from the image using, for example, a top hat filter. In some embodiments, tissue background may correspond to a gray-level in the tissue image, which is in contrast to brighter regions of the tissue image that represent tissue foreground (e.g., nuclei and/or cytoplasm). The top hat filter may remove smooth regions that are larger than most, if not all, tissue and/or cellular components of interest in the image (e.g., nuclei clusters, or cytoplasm clusters when, for example, stage 208 is applied to CK18 images). In some embodiments of the present invention, a filter size (e.g., 30 pixels) may be set for the top-hat filter for use for non-uniform illumination correction. In some embodiments, nuclei and/or cytoplasm clusters have a size range from three to five small nuclei clumps (e.g., occupying roughly 100-200 pixels) to 10-30 nuclei large clumps (e.g., occupying roughly 500-2000 pixels).
(63) At stage 210, contrast enhancement is applied to the filtered image resulting from stage 208. In some embodiments of the present invention, contrast enhancement stretches the image (e.g., some or all pixels in the image), for example, to fill the full range of intensity values. For example, for 8-bit images, the new_intensity=(old_intensityminq) 255/(maxpminq), where maxp=intensity of percentile p, minq=intensity of percentile q, taking, for example, p=99% and q=1%. In another example, for 16-bit images, 255 is replaced by 65535. In another embodiment, which may provide a more balanced sampling of bright and dark pixels in the image and avoid enhancing background noise in images with small bright areas on a large dark background, max and min percentile values may be calculated in bands around the edges in the image. In some embodiments, stage 210 may be optional and may be omitted from process 200. In other embodiments, stage 210 may be utilized, for example, as necessitated by the top-hat filtering of stage 208 and/or low contrast present in the original tissue image that serves as input to process 200.
(64) At stage 212, blob detection (e.g., nuclei detection) is performed on the image resulting from stage 210, or from stage 208 if stage 210 is not utilized. In some embodiments, such detection may be performed using an Eigenvalues-of-Hessian matrix method (EoH). Advantageously, the present inventors have determined that such a detection method can detect both bright and dim components of the types (e.g., both bright and dim nuclei) that may be present in the tissue images evaluated by process 200 according to some embodiments of the present invention. Illustrative substages involved in stage 212 are described below in connection with
(65) At stage 214, local maxima are extracted from (identified in) the image resulting from stage 212. For example, stage 212 generates a blob image, which may be subsequently thresholded (e.g., as shown in
(66) At stage 216, an inverse Euclidean distance transform is computed starting from the local maxima points to produce a distance map.
(67) At stage 218, a watershed transformation is performed using the distance map from stage 216 as input. The watershed transformation divides the image into small components or cells around the local maxima (e.g., local maxima of the nuclei). A suitable example of a watershed transformation process according to some embodiments of the present invention is described in above-incorporated Gonzalez R. C and Woods R. E., Digital Image Processing, Second Edition, Prentice-Hall Inc, 2002.
(68) In some embodiments, the intensity inside each component or cell is set to its average intensity at stage 220 (field estimation).
(69) At stage 222, a first estimate of the intensity field is obtained by, for example, smoothing the image resulting from stage 220 using a Gaussian filter.
(70) At stage 224, correction for non-uniform intensity is accomplished by multiplying the original image by the inverse of the intensity field obtained as a result of stage 220. In some embodiments, process 200 includes an additional stage (e.g., after stage 224) of enhancing the separation between clustered components (e.g., nuclei), for example, along the thin dark ridges that separate them using a morphological top-hat transform on the intensity corrected nuclei image. In some embodiments, this top-hat transform may be the same as, or similar to, the top-hat filter described above in connection with stage 208 (e.g., having a filter size of 30 pixels).
(71)
(72)
(73) At stage 228, given the Hessian matrix, Eigen values of the matrix are computed, for example, as approximately:
(74)
(75) At stage 230, the Eigen values (e.g., approximate values) are used to compute the shape index at each point (x, y), for example, as:
(76)
(77) In some embodiments of the present invention, the blobs (e.g., nuclei) are defined as the points for which /4(x, y)3/4 although other values could be employed in other embodiments of the present invention.
(78)
(79)
(80)
(81) Binarization and Segmentation of Tissue Images
(82)
(83) At stage 502, an image of tissue is received. At stage 504, the image is binarized using minimum error thresholding. For example, in some embodiments, such minimum error thresholding may include a clustering-based approach that assumes that the histogram of signal intensity in the image is bimodal in order to estimate the Poisson mixture parameters (e.g., assuming that a DAPI image resulting from process 200 (
(84) In some embodiments of the present invention, the clustering-based approach of stage 504 may model the normalized image histogram of the tissue image as:
(85)
where P.sub.k is the prior probability of the k.sup.th component, p(i\k) is a Poisson distribution with mean .sub.k, and I.sub.max is the maximum intensity bin in the histogram. For any threshold t the Poisson mixture parameters are given by:
(86)
In some embodiments, the goal is to find a threshold t* that minimizes the following error criterion function, where is the overall mean intensity of the entire image:
(87)
(88)
(89) At stage 602, an image of tissue (e.g., CK18 image of cytoplasm obtained by spectral imaging) is received. At stage 604, the image is subjected to initial segmentation. At stage 606, the background texture in the image is evaluated. At stage 608, the image is subjected to an additional or final segmentation.
(90)
(91)
(92)
Contrast may be defined as:
(93)
For these equations, p(i,j) is obtained from the gray-level co-occurrence matrix that calculates how often a pixel with gray-level (grayscale intensity) value i occurs horizontally adjacent to a pixel with the value j. Suitable examples of energy and contrast parameters are described in above-incorporated Gonzalez R. C and Woods R. E., Digital Image Processing, Second Edition, Prentice-Hall Inc, 2002, and in Mathworks' Matlab (http://www.mathworksxom/help/toolbox/images/ref/graycoprops.html), which is hereby incorporated by reference herein in its entirety. At stage 622, the features are combined to generate texture aggregate values. In some embodiments of the present invention, the aggregate value may be computed as (1contrast)*energy.
(94)
(95)
EXAMPLE: SEGMENTATION OF NOISY CYTOPLASM IMAGES
(96) Adaptive epithelial cytoplasm segmentation as described above was applied on 1030 images (from 383 patients) that had fairly noisy to very noisy background. A fraction of the images had high levels of noise in the background that made the difference in foreground and background contrast very low. These were the most challenging cases. The background texture estimation process facilitated improved cytoplasm segmentation by the automatic selection of median filter size and binarization threshold level parameters. The segmentation was thus fine-tuned to adapt to every image based on the quantitative level of background noise.
(97)
(98) In some embodiments of the present invention, process 700 may detect a seed point for a plurality of (e.g., all) components (e.g., nuclei or cell) within an image. At stage 702, an image of tissue is received. At stage 704, seed detection is performed on the image to separate touching or connected components in the image.
(99) In some embodiments of the present invention, seed detection at stage 704 may be performed using a multi-scale Laplacian of Gaussian (LoG) filter (specifically, the LoG's Difference of Gaussian (DoG) approximation). In some embodiments, use of the LoG (or DoG) filter may be based on the idea that most of nuclei and cytoplasm objects have a blob-like appearance. The scale of the filter (usually the standard deviation of its Gaussian kernel) may be directly related to the size of the blob. Thus, the maximum LoG response at the center of the blob may occur when using an LoG filter with a size that matches the size of the blob. In some embodiments, the use of multiple LoG scales may be necessitated by the presence of multiple nuclei and cytoplasm object sizes in the images within a dataset under consideration. In some embodiments, the Difference of Gaussian (DoG) approximation of the multi-scale LoG may be used because of its ease of implementation and the fact that the DoG method does not require normalization across the scale. In some embodiments of the present invention, such seed detection may be performed using the process described in above-incorporated Al-Kofahi et ah, Improved automatic detection and segmentation of cell nuclei in histopathology images, IEEE Transactions on Biomedical Engineering, 57(4), 2010, or another suitable process. Also incorporated by reference herein in its entirety is Al-Kofahi, Algorithms and Framework for Cellular-Scale Mapping, Doctoral Dissertation, Rensselaer Polytechnic Institute, Troy, N.Y., 2009.
(100) In some embodiments, seed detection in the image may be performed as follows: assuming that Lnorm (x, y; ) is scale normalized LoG at scale , in terms of a difference of Gaussians, it follows that:
(101)
where G(x, y, ) is a Gaussian kernel with standard deviation . In some embodiments, in order to detect seed points, the image I(x, y) may be first convolved with multiple scale-normalized LoGs (using the DoG approximation) at different scales. Then the maximum LoG response at each point may be set to the maximum across the scales as
(102)
where * is the convolution operation, and are the scale ranges. The resulting image R(x, y) referred to as the LoG response image can be thought of as a topographic surface in which each blob (e.g., nuclei or cytoplasm) is represented by a Gaussian like blob with one local maxima. In some embodiments, seed points may be detected by identifying these local maxima points. In some embodiments, a minimum size constraint may be used to limit the size of the identified blob. The size of the area (search box) may depend on a parameter termed the clustering resolution r, which in turn depends on the minimum scale.
(103) In some embodiments, the clustering resolution parameter may be subsequently used to perform local maximum clustering. An advantage of using the multi-scale LoG method for seed detection is that in addition to producing seed points, the method yields the LoG response image R(x, y). In some embodiments, the LoG response image and the detected seeds may be combined with a clustering based approach to separate touching cells. In some embodiments, this process may not be followed by segmentation refinement using a Graph-cuts based approach. This is because the present applicant has determined that results produced by applying a local maximum clustering approach are adequate, rendering segmentation refinement unnecessary. As can be inferred from its name, the clustering method groups points in the image around their local maxima using a graduated region growth approach. As the clustering is applied to the response image and the local maxima of the image are also the seed points, the clustering method degenerates to a simple task of assigning image points to seed points.
(104) In some embodiments, the local maximum clustering algorithm uses a sliding box having a size defined by a resolution parameter where each point in the image is first assigned to its local maximum inside that box. In subsequent iterations, that assignment may be propagated until each foreground point is assigned to a seed point. In some embodiments, there may be two advantages to this clustering method over, for example, use of a watershed transform method. First, this method may have a clustering resolution parameter that controls the smallest cluster size possible, thus making it possible to avoid producing very small clusters or fragments. Second, this method can be applied on the foreground points only (or could be constrained to a limited portion of the image), making it computationally attractive and flexible.
(105) Post-Processing: Artifact Removal and Reclassification Near Boundaries
(106)
(107) Process 800 may include using a first segmented image (e.g., nuclei segmentation) as a template to fill holes in a second segmented image (e.g., cytoplasm segmentation). For example, in some embodiments of the present invention, when the first and second images correspond to a nuclei (DAPI) image and a cytoplasm (CK18) image, respectively, process 800 may entail filling holes in the second image left by the DAPI staining. For example, here, the first and second images may be the output of a nuclei image (e.g., DAPI) segmentation process (nuclei mask) and the output of a cytoplasm image (e.g., CK18) segmentation process (cytoplasm mask). Two types of holes may be common in the cytoplasm mask. The first type represents nuclei objects that are missed during an initial thresholding step (e.g., stage 504,
(108) At stage 802, small gaps inside the cytoplasm image or on its boundary are closed by applying, for example, a gray-scale morphological closing operation. A suitable example of a gray-scale morphological closing operation according to some embodiments of the present invention is described in above-incorporated Gonzalez R. C and Woods R. E., Digital Image Processing, Second Edition, Prentice-Hall Inc, 2002. At stage 804, this may be followed by filling multiple (e.g., all) of the cytoplasm holes, for example, having a size less than or equal to an average nucleus size for the image. At stage 806, holes smaller than, for example, four times the average nucleus size and/or at least 50% filled by a single nucleus may also be filled. In some embodiments of the present invention, the remaining holes may be considered (e.g., labeled or classified by one or more computers as) lumen holes. Alternatively or additionally, in some embodiments, semi-holes that touch an edge of a non-tumor mask and are completely enclosed by one cytoplasm object may be considered (e.g., labeled or classified by one or more computers as) lumen holes.
(109)
(110) In some embodiments of the present invention, process 900 may include receiving a nuclei image at stage 902, and detecting and removing lumen artifacts at stage 904 in order to produce an output nuclei image. In some embodiments, at least a portion (e.g., most) of the unwanted artifacts may have been located within lumen holes previously extracted during the generation of the cytoplasm mask (e.g., according to process 800,
(111) In some embodiments of the present invention, stage 904 may involve the use (e.g., by one or more computers) of a rule-based system that considers the morphological characteristics and texture of objects and/or their connected components in determining if they are likely artifacts. Connected component features may include, for example, one or more (e.g., all) of the size of the connected component Zc, and its average and standard deviation of intensities Ac and Sc respectively. As for object-based (nuclei) features, they may include one or more (e.g., all) of nucleus size, eccentricity Cn and elongation En. In some embodiments, assuming that Pn is the percentage of the object area inside the lumen, Zn is the average nucleus size, AVn is average of the average nuclei intensities, and STn is their standard deviations, a rule-based classifier may be defined as follows:
(112)
(113)
(114) At stage 1102, a filter (e.g., connected component filter) may be applied to the output of a nuclei segmentation. At stage 1104, a mask (e.g., binary mask) may be created from the output of a cytoplasm segmentation. At stage 1106, a complement image may be created from the binary mask of the cytoplasm segmentation to create a new image. Relative to an initial image, an image complement may be generated by, for example, replacing all high-pixel values in the initial image with low pixel values, and replacing all of the low pixel values in the initial image with high pixel values, which may also be referred to as image inversion.
(115) At stage 1108, an operation (e.g., binary morphological operation) such as binary erosion, binary dilation, binary closing or binary opening may be applied to the binary mask of the cytoplasm segmentation output (e.g., erode the mask by 2 pixels) to create an epithelial mask. At stage 1110, an operation (e.g., a binary morphological operation) such as binary erosion, binary dilation, binary closing or binary opening may be applied to the complement of the binarized cytoplasm segmentation output (e.g., erode the mask by 5 pixels) to create a stroma mask. At stage 1112, for each nuclei object in the nuclei segmentation output image, its overlap with the epithelial and/or stroma masks may be determined. In some embodiments, if the overlap with the epithelial mask is greater than a given fraction (e.g., predetermined, fixed fraction, e.g., 0.6), the nuclei object may be labeled as an epithelial object. In some embodiments, if the overlap with the stroma mask is greater than a given fraction (e.g., predetermined, fixed fraction which could be the same or different from the fraction above, e.g., 0.5), the nuclei object may be labeled as a stroma object. In some embodiments, if the nuclei is neither labeled as epithelial nor stroma (e.g., due to insufficient overlap), it may be labeled as unclassified.
(116) At stage 1114, for each nuclei labeled as epithelial and/or stroma objects, it may be determined whether the nuclei's one or more morphological shape features (e.g., eccentricity, circularity, roundness, etc.) and/or one or more texture features (e.g., energy, standard deviation of grayscale values, etc.) meet one or more thresholds (e.g., confirm that all nuclei labeled stroma nuclei have eccentricity greater than 0.5). In some embodiments, stage 1114 may be optional and may be omitted from the process of
(117)
(118)
(119)
EXAMPLE: INTEGRATED CELLULAR SEGMENTATION OF PROSTATE GLAND TISSUE
(120) 981 5 M sections of formalin-fixed paraffin-embedded human prostate tissues were rehydrated and stained using nuclear DAPI and CK18. The DAPI and CK18 emission images were captured using a Nuance multispectral camera at 20 by 10 resolution. Non-cancer tissue regions of the DAPI-CK18 images were digitally masked out by a pathologist. As a result, three gray levels were present in the images: the masked area (black), tissue background (dark gray), and tissue foreground (bright regions representing the nuclei or cytoplasm). Ground truth was created from 48 DAPI-CK18 images using a semi-automatic labeling tool and using the process described in
(121) Segmentation performance was measured using the standard Dice and Jaccard metrics (below) and also using Dice Average and Jaccard Average, which are extensions of the traditional metrics to multiple object, multiple impingement systems (such as cellular images).
(122)
The overall segmentation accuracy for image processing and segmentation according to some embodiments set forth herein exceeded 94% using all four metrics. The artifact removal process was tested using 981 actual prostate cancer tissue images. To judge the efficacy of the process, trained personnel were asked to rate overall artifact condition of the resultant images as satisfactory or not satisfactory. The artifact removal resulted in a drastic reduction in images with visible artifacts. Only 10% of images were rated as not satisfactory compared, for example, to 37% of images for the process described in the above-incorporated Al-Kofahi paper, which lacks Applicants' artifact removal process. From experiments it was observed that the most of the improvement in performance was the result of introducing both the initial non-uniform intensity normalization and final artifact removal stages as described herein.
(123) Epithelial Unit Separation
(124)
(125)
(126) At stage 1202, a segmented nuclei (e.g., DAPI) binary mask is received. At stage 1204, the segmented nuclei binary mask is variably dilated using morphological dilation. At stage 1206, the complement of the dilated mask is generated. At stage 1208, marker centers are extracted from the complement of the dilated mask. At stage 1210, using the marker centers, a smoothed cytoplasm (e.g., CK18) image (e.g., generated by smoothing the CK18 image using a Gaussian low pass filter and an averaging filter), and/or segmented cytoplasm (e.g., CK18) binary mask as input, a new image of intensity valleys and peaks (minima and maxima) is generated and a watershed transform is applied over the new image to obtain watershed lines of separations. At stage 1212, the watershed image is binarized. At stage 1214, the segmented cytoplasm (e.g., CK18) binary mask and watershed binarized image are merged. At stage 1216, missing epithelial units from the segmented cytoplasm (e.g., CK18) binary mask are searched for and retained using one or more logical operations between the binarized watershed image and the segmented cytoplasm (e.g., CK18) binary mask. For example, in some embodiments, a logical OR operation may first be used to merge all separations that are marked for elimination, and the binarized watershed based image and the segmented cytoplasm mask may be merged. At stage 1218, the image is labeled (e.g., labeled using connected components) and separation boundaries and/or regions are extracted from the labeled image, for example, by subtracting the separations marked for elimination from initial separations obtained from the watershed segmentation.
(127)
(128) At stages 1220 and 1222, respectively, an intensity (e.g., mean intensity) and standard deviation thereof (e.g., standard deviation of mean intensity) are computed on individual separations of a cytoplasm (e.g., CK18) intensity image (e.g., the original or pre-processed CK18 image). At stage 1224, the standard deviation of, for example, mean intensity is computed on individual separations on a gradient of the cytoplasm (e.g., CK18) image. At stage 1226, separations that touch any nuclei (e.g., DAPI) marker centers are identified. At stage 1228, based on a threshold criterion (e.g., obtained by dividing the lumen area by the segmented cytoplasm area), potentially false separation boundaries are eliminated (e.g., these separation boundaries may not lie in the epithelial cytoplasm gaps). At stage 1230, final or refined separation boundaries are extracted.
(129)
(130) At stage 1232, a speed image may be created, which includes the CK18 edge+ridge strength. For example, in some embodiments, the speed image may have values in the range 0-1, with propagation speed proportional to grey level, and distance (difference between neighboring fast marching output pixels) proportional to (1/grey level). The speed image may be designed to promote propagation along dark valleys between bright epithelial units and also along the outside (dark side) of bright borders of epithelial units. In some embodiments, the speed may be a rescaled edge strength image, defined as: (0.08+original_grey_level^2)/(dark side of bright edges+dark valleys), with dark side of bright edges=DilationCKImage, and dark valleys=BottomHat. At stage 1234, a Fast Marching edge strength propagation method is performed using the speed image, initialized from the CK18 borders, to create a distance map, which is then inverted at stage 1236. At stage 1238, a watershed segmentation is performed of the inverted distance map and watershed separations are extracted at stage 1240.
(131)
(132) In some embodiments of the present invention, processes for separating epithelial units (e.g., touching glands) may be the same or similar for various different types of tissue images including, for example, H&E images, IF images, IHC images, and/or other images resulting from other types of staining. For example, for an H&E image, prior to the separation an H&E image segmentation process may be used to segment the H&E image. Once the segmentation is complete, the rest of the processing stages may be the same as those used for, for example, epithelial unit separation in IF images.
(133) Lumen Generation
(134)
(135) In some embodiments of the present invention, an initial lumens mask image and an intermediate lumens mask image may be generated from cytoplasm and nucleus mask images generated by, for example, process 500 (
(136) Referring to
(137) At stage 1244, one or more shape statistics are measured for each lumen including, for example, one or more (e.g., all) of lumen perimeter, area, eccentricity, roundness, and extent. At stage 1246, a shape threshold is determined. For example, one shape statistic (e.g., eccentricity) may be multiplied with a complement of one or more additional statistics (e.g., roundness, perimeter, area, and/or extent) to obtain a shape threshold having a range, for example, that can be between 0-1 only. For example, in some embodiments, the threshold parameter may be:
Shape Threshold=Eccentricity*(1Roundness)*(1Perimeter)*(1Area)*(1Extent)
(138) At stage 1248, lumens that are less than or equal to, for example, a certain value of the threshold parameter (e.g., 0.15 shape threshold value) are retained, and the retained lumens constitute the identified shape-based lumens.
(139) Referring to
(140)
(141) Ring Segmentation
(142)
(143) At stage 1254, Delaunay triangulation with epithelial nuclei centers as vertices is performed. In some embodiments of the present invention, the triangle connectivity graph is the Voronoi diagram. At stage 1256, a depth is assigned to each triangle, for example, equal to the length of the longest side. At stage 1258, a sort by depth is performed, and then starting from the deepest triangles, neighboring regions (e.g., 3 neighboring regions) are examined and regions are merged if the length of the common side is at least, for example, 90% of the depth of the neighbor, and if both regions touch the same epithelial units. In some embodiments, at stage 1260, a merging step reduces over-fragmentation to complete the partitioning of the image into polygonal rings. For example, according to some embodiments, an advantage of a graph-based watershed method over a pixel-based algorithm is that it is more convenient to track region statistics within the algorithm and to apply fine-tuned region merging criteria. Merging criteria that may be used according to various embodiments of the present invention include, for example, one or more of: (i) merging two touching regions if region contrast=Min (region1 diameter, region2 diameter)touching edge length<300 pixels, with region diameter defined as Max(triangle edge lengths in region); (ii) merging touching regions if contrast ratio=touching edge length/region width>0.75 for either region; (iii) merging touching regions if the smaller region has width<10 pixels; and/or (iv) not merging if the two regions overlap different epithelial units.
(144)
(145) Ring Classification
(146) According to some embodiments of the present invention, rings are polygonal regions of tissue that are identified and output by a segmentation process (e.g., the ring segmentation process described above). The goal of ring classification according to some embodiments of the present invention is to classify the rings into, for example, one or more (e.g., all) of five different types: (1) gland rings; (2) epithelial non-rings; (3) stroma regions; (4) under-segmented regions; and/or (5) incomplete regions.
(147) In some embodiments, gland rings may be rings characterized by a ring of nuclei with a central clearing possibly containing a lumen. Epithelial non-rings may be characterized by dense fields of nuclei not in rings, isolated fragments of nuclei or nuclei on the outside of a ring sandwiched between neighboring rings or sandwiched between a ring and stroma. Stroma regions may be rings bordered by epithelial nuclei in different epithelial units or rings which border concavity of stroma in the same epithelial unit. Under-segmented regions may be rings with both stroma and epithelium which cannot be classified into any of the other categories. Incomplete regions may be rings bisected by image edges or a pathology mask.
(148) Referring back to
(149) According to some embodiments of the present invention, there may be three levels/types of ring features: (i) basic ring metrics, (ii) features for classification of ring types, and (iii) features for prognosis. Table 2 below describes basic ring metrics according to some embodiments of the present invention, which are also illustrated in
(150) In some embodiments of the present invention, adjacencies and connectivities of one or more basic ring metrics (e.g., Table 2) may be used to construct a more detailed set of one or more features (e.g., Table 3) for the purpose of classifying rings into the five types or classes. For example, in some embodiments of the present invention, a rule-based method may be provided that uses one or more of the basic ring metrics in Table 2 and/or the ring classification features in Table 3 to classify each ring into, for example, one of the five classes 1-5, as illustrated below in Table 1:
(151) TABLE-US-00001 TABLE 1 Ring Classification Decision Rule Ring Classifi- cation Decision Rule 5 Touching image border 4 Confused - under segmented, with mixed stroma and cytoplasm: (numBlobs > 1 and lowck == 0) or (numStromalSegments == 0 and ring2 == 0) 3 Stroma: stroma1 == 1 or (ringOK == 0 and stroma2 == 1) 1 Gland ring: (ringOK == 1 and ((big == 1 and crowded == 0) or (medium == 1 and small == 0)) or (ringOK == 0 and (big == 1 and lumenBorderOK == 1) 2 Non-ring epithelial area: Otherwise
(152) TABLE-US-00002 TABLE 2 Basic Ring Metric Ring Metrics Dir Definition Outer + D Outer diameter of ring = diameter 4 area/perimeter (exact for regular polygons, overestimate for long objects) Inner + D Inner clearing free of nuclei = diameter 2.sup.nd largest chord in nuclei triangulation Border b Gap between nuclei around border = gap 2.sup.nd largest gap Clearing + L Lumen or clearing diameter =excess of inner diameter over border gap =d b (or =Max(Min(D,d) b, 0)) Border + d/b Density of the border nuclei gaps compared density to the interior gap Range: >=1 Lumen + Lr =L/d ratio =(d b)/d = 1 b/d = density rescaled 0-1 Ltouch + Lt Proportion of border touching inner clearing = L/D Range: 0-1 Stouch + St Proportion of border touching stroma = ratio border touching stroma + ratio border separated from stroma by a narrow band of non-ring padding of width < 15 pixels (green in the diagram above) Range: 0-1 SDist20 + Sd20 Ratio of border less than 20 pixels from stroma SDist Sd Mean distance of border pixels from stroma Padding Width of epithelial padding between ring and stroma = non-ring area adjacent to ring (non-ring border proportion x ring border length
(153) TABLE-US-00003 TABLE 3 Ring Classification Features Ring Classification Features Description numBlobs Number of connected cytoplasm areas within the region after connecting together areas separated by small concavities in the cytoplasm border. A concavity is formed where the region edge (the straight line connecting adjacent nuclei) crosses the cytoplasm border. A small concavity has area < edge_length{circumflex over ()}2 numStromalSegments Number of connected background areas crossed by region edges, not including small concavities (nonblobsedges) concavityArea Total area of small concavities lowCk 1 (true): low cytoplasm density in region: Lumen_area (=area of background areas not touching region edges) < 100 and (cytoplasm area/region area) < 0.7 ck18Width ck18_area/region perimeter ck18AreaRatio ck18_area/region_area ck18WidthOK 1 (true): large average ck width, not significantly smaller than neighbors: ck18AreaRatio > 0.75 or ck18Width > 10 or ck18Width > 0.5 ck18Width of neighboring regions stroma1 1 (true): strong evidence for stromal region: lowCk and numStromalSegments == 0 nucArtifactsFound 1 (true): evidence of nuclei artifacts in stroma/lumen: #stromal nuclei > 5 and foreground/ background intensity ratio < 2.5 stroma2 1 (true): weaker evidence for stromal region: lowCk and not nucArtifactsFound and (# stromal nuclei + numStromalSegments 2ck18WidthOK) >= 0 ring1 1 (true): strong evidence for a gland ring region. All the following conditions are true: #border nuclei > 3 ck18WidthOK (stromal nuclei total area < 10 or nucArtifactsFound) not touching pathology mask or image edges numBlobs == 1 ring2 1 (true): weaker evidence for gland ring region: numStromalSegments == 0 and stroma2 == 0 ringOK ring1 or ring2 aspectRatio With m = Sqrt(1-4 region_area/((region_perimeter/2){circumflex over ()}2), aspect ratio = (1 m)/(1 + m) lumenOK 1 (true): strong evidence that background area is lumen not stroma: ck18AreaRatio < 0.9 and lumen area > 50 and lumen_area/concavityArea > 0.5 borderOK 1 (true): tight border nuclei spacing compatible with gland ring: largest_border_gap < region_diameter lumenBorderOK concavityArea/region_area < 0.15 and (borderOK or lumenOK or aspectRatio > 0.16) big Lumen area > 400 or (region_diameter > 35 and region area > 2000 and lumenBorderOK) medium lumen area > 100 or (region_diameter > 22 and concavityArea/region_area < 0.15 and borderOK) small #border_nuclei < 5 and region_area < 500 crowded #border_nuclei < #all_nuclei/2
(154) In some embodiments of the present invention, apparatus, methods, and computer-readable media are provided that utilize features based on connectivity of cytoplasm within rings with corrections for concavities where lumen or stroma crosses the ring boundary. For example, with reference to Table 3 above, embodiments of the present invention may detect small concavities and ignore these in calculating blob connectivities. This procedure may be useful, for example, whenever connectivities of overlapping blobs are analyzed, especially blobs with holes and concave borders.
(155) Biomarker Signal Segmentation/QuantitationLow Signal-to-Noise Images
(156)
(157) In some embodiments of the present invention, process 1500 may be performed on a tissue image (e.g., DAPI and/or CK18) that has already been preprocessed according to process 200 (
(158) At stage 1502, one or more bright objects having a size below a threshold are removed from an image of tissue (e.g., nuclei image) as being indicative of and characterized as speckle noise. In some embodiments, two thresholds are utilized, one for size and one for brightness. The size threshold may be utilized to remove small artifacts (e.g., nucleus and cytoplasm fragments generated during tissue process), while the brightness threshold may be utilized to remove speckles (e.g., point speckles created during the staining process). For example, in some embodiments, relative signal intensity in a biomarker image is indicative of a targeted biological process and/or attribute for which the biomarker is designed to capture. Absolute intensity of a biomarker signal in one image relative to another may be influenced by strong nonspecific binding as a result of tissue history (e.g. extraction, processing and storage steps) as well as impurities in labeling reagents. This non-specific binding constitutes background noise in the image.
(159) In some embodiments of the present invention, to limit the influence of noise due to tissue history and other elements of the assay, biomarker quantitation may be limited to segmented objects (e.g., nuclei and/or cytoplasm and/or glandular objects). Alternatively or additionally, in some embodiments, the amount of speckle noise that is allowed in each object (e.g., nucleus and/or cytoplasm and/or or glandular object) is limited using an experimentally determined threshold (e.g., determined automatically or based on review by a pathologist of image processing results using multiple threshold values). For example, the resulting average object (e.g., nuclei) intensity I may be given as:
(160)
In (1), b are valid nuclei biomarker object pixels (e.g., nuclei pixels), T is the speckle noise threshold, m is the total number of object pixels, and n is the total number of nuclei in the image.
(161) At stage 1504, a threshold (cut-off) may be determined for distinguishing between background and real signal intensity for each object (e.g., nucleus). In some embodiments, each object (e.g., nuclei) in each of a plurality of sub-compartments (e.g., epithelia and stroma) may be treated separately. In some embodiments, the signal intensity threshold T may be determined as a function of biomarker background characteristics. For example, in order to find this function, objects (e.g., nuclei) B may be split into two non-overlapping classes: 1) real signal colored objects Br (e.g., with mean, standard deviation and average intensities given by , r, and -th percentiles qr,, where =5, 10, . . . , 100; and 2) background colored objects Bb (e.g., with b, r, qr, and -th percentiles q b,). In some embodiments, a linear relationship is modeled between the threshold T and the background characteristics:
T.sup.TX,(2)
where =.sup.T are model parameters of a model, and X=.sup.T are background features. In other embodiments, non-linear modeling may be used. Note that in (2) or according to other models in accordance with other embodiments of the present invention, the threshold T may change according to the properties of each image. In some embodiments, indirect linear regression analysis via concordance index (CI) optimization may be used to determine . For example, a set of images may be selected for training the model. For each image: a) objects B.sub.c and B.sub.b may be identified; b) background features may be extracted; and c) CI may be computed. The value of that optimizes the CI may be selected as optimal.
(162) At stages 1506 and 1508, after positive object identification, sub-compartment histogram generation (stage 1506) and feature extraction (stage 1508) may be performed. In some embodiments of the present invention, histograms (e.g., epithelial and stroma histograms HE and HS) are extracted by binning the actual signal colored objects (e.g., nuclei Br), for example, separately in each sub-compartment. In some embodiments, the binning could also involve using compartments such as, for example, peri-epithelium and sub-compartments of the stroma. The histograms (e.g., epithelia and stroma histograms) may be analyzed separately to obtain relative intensity features such as, for example, cumulative expressions at each -th percentile. Alternatively or additionally, the histograms may be analyzed in relation to each other to obtain features related to the relative expression of, for example, epithelia and stroma nuclei at each a position.
(163)
(164) In some embodiments of the present invention, apparatus, methods, and computer-readable media are provided for reducing the influence of segmentation errors while averaging biomarker positive objects by excluding no/poor signal regions (e.g. by avoiding zero pixel elements). The output of a segmentation of the nuclei image may be received. One or more (e.g., all) objects in the nuclei labeled image may be filled with its corresponding intensity in original nuclei image (e.g., DAPI image). For each filled nuclei label object, pixels below an intensity threshold (e.g., fixed threshold, e.g., zero) may be eliminated. For each filled nuclei labeled object, objects below a size threshold (e.g., fixed size threshold, e.g., 20 pixels) may be identified and removed. For each filled nuclei labeled object, the average intensity of the pixels may be calculated. The original nuclei labels may be filled with the corresponding average intensity values calculated.
(165) In some embodiments of the present invention, apparatus, methods, and computer-readable media are provided for estimating and correcting for the noisy background of a biomarker image by sampling the signal in a band (or strip) region bordering a sub compartment (e.g., the cytoplasm). A biomarker image (e.g., AR image) may be received. A cytoplasm mask may be received. The cytoplasm mask may be dilated, for example, by a fixed size (e.g., 10 pixels). The original cytoplasm mask may be subtracted from the dilated mask to create a band. The biomarker image values may be sampled within the band and its average values may be calculated (e.g., after excluding pixels less than a fixed threshold, e.g., zero). The intensity of regions of the original biomarker image above a fixed intensity value (e.g., 5000) may be reduced by the calculated threshold value.
(166) Feature Extraction
(167) According to another aspect of some embodiments of the present invention, predictive features for use in predictive models are provided. Predictive models according to some embodiments of the present invention may be configured for use in treating, diagnosing, and/or predicting, for example, the occurrence (e.g., recurrence) of a medical condition (e.g., cancer) in a patient. In some embodiments of the present invention, one or more of the features described herein may be utilized within a predictive model that includes one or more morphometric features and/or one or more molecular features. Alternatively or additionally, in some embodiments, such predictive models may include one or more clinical features. Examples of suitable types of clinical, molecular, and morphometric features are described in, for example, commonly-owned U.S. application Ser. No. 12/584,048, filed on Aug. 31, 2009, which is hereby incorporated by reference herein in its entirety.
(168) In some embodiments of the present invention, one or more of the features described herein may be extracted from images processed (e.g., pre-processed, segmented, and/or post-processed) according to some or all of the teachings set forth herein (e.g., according to
(169) Ring Features
(170) In some embodiments of the present invention, predictive features of glandular objects such as rings are provided. In some embodiments, these features are generated without using initial lumen segmentation. Rather, such rings may be detected based on nuclei geometry. A segmentation of nuclei of a cellular region into individually contiguous (or labeled) regions may be provided. The centroid of each nuclei region may be extracted and each image may then be partitioned into polygonal regions with epithelial nuclei at the vertices, for example, as described above with regard to Ring Segmentation and Ring Classification.
(171) In some embodiments of the present invention, prognostic ring features are combined and/or averaged over, for example, part of an image or an entire image to create a family of image features. This family of image features may be parameterized in, for example, one or more of four ways: (i) by statistic (two alternatives); (ii) by region type (7 alternatives); (iii) by weight (10+ alternatives); and/or (iv) by variable (20+ alternatives),
thus potentially creating in total more than 4300 possible features. For example, a consistent feature naming convention may be formed as
Statistic_Weight_RegionType_Variable.
(172) Table 4 below describes statistics that may be used to generate ring features according to some embodiments of the present invention. In addition, Table 5 below describes regions, Table 6 below describes weight variables, and Table 7 below describes ring feature variables that may be used to generate ring features according to some embodiments of the present invention. Table 8 below describes examples of the features that may be generated for use within a predictive model in some embodiments of the present invention. In Table 8, a + in the Dir column means that a larger value is better for patient outcome.
(173) TABLE-US-00004 TABLE 4 Ring Feature Statistics Statistic Definition Ratio The Ratio method takes the weighted average of the variable within regions of type r, normalized over all
(174) TABLE-US-00005 TABLE 5 Ring Feature Region Types Region Type Region Number Description Ring 1 Epithelial gland ring NonR 2 Epithelial non-ring Strom 3 Stromal region Epi 12 Epithelial Ring or Non-ring All 123
(175) TABLE-US-00006 TABLE 6 Ring Feature Weights Weight Desc NumEN Number of epithelial nuclei in region AreaEN Area of epithelial nuclei in region NumBEN Number of epithelial nuclei on border of region AreaBEN Area of epithelial nuclei on border of region AreaCK Area of CK18 mask in region AreaCKL Area CK + lumens of region type 1 rings TotArea Total area of region including CK, lumens and stroma Perimeter Length of region perimeter Stouchlen Perimeter x stouch (proportion of border touching stroma) StouchAreaBEN Area of border epithelial uclei adjacent to stroma Num =1 (used for counting rings) Ltouch Proportion of border adjacent to lumen or clearing = L/D (see diagram) Stouch Proportion of border adjacent to stroma BorderGap 2.sup.nd largest gap between nuclei around border of region = d (see diagram)
(176) TABLE-US-00007 TABLE 7 Ring Feature Variables Variable Desc Ltouch Proportion of border adjacent to lumen or clearing = LD (see diagram) LumenRatio Ratio of lumen or clearing to border gap = L/d (see diagram) Stouch Proportion of border adjacent to stroma Stouchpxx Stouch raised to power xx/10, eg., Stp05 = sqrt(stouch) InvaStouchpxx (1 Stouch{circumflex over ()}xx/10) Stouchxx Stouch + xx/10 SpLtouch Stouch + Ltouch SxLtouch Stouch x Ltouch StouchBEN Proportion of epithelial nuclei adjacent to stroma Sdist20 or Ratio border < 20 pixels from stroma, (variant of Stouch) Sd20 Sdist Mean distance of region border from stroma SdistStd Stdev of distance of border from stroma aaGn =1 if feature aa > n, =0 otherwise CkG2 ck area >=2000 pixels BlobG4 Blob area >=4000 pixels StG50 Stouch >50% DiL30 Diameter <=30 DeG15 Density >=1.5 Blob Connected epithelial unit in cytoplasm mask Perimeter Length of region perimeter SoftaaGThr or Soft thresholding of feature aa with linear ramp +/0.1 xxGZThr on either side of threshold = Thr/10 Solid 1 for regions with stouch <0.4, =1 for rings with stouch <0.9, or 0 otherwise Solidz4090 soft thresholding of Solid jumps at 0.5 Solidz406090 soft thresholding of Solid, without jump at 0.5 Solidz4060 soft thresholding = 1 for stouch <0.3, =1 for rings with stouch >0.7 with ramps =/0.1, centered at 0.4 and 0.6 Solidzxxyy ramps at xx/10, yy/10 SolidStouch50 (1-2stouch) for stouch < 0.5, (1-2stouch) for rings, or 0 otherwise
(177) TABLE-US-00008 TABLE 8 Example Ring Features Example Ring Features Dir Desc EpiNucRing, Ratio_AreaEN_x_Ring +
(178) Table 9 below shows how ring prognostic features generated according to some embodiments of the present invention are associated with many aspects of the Gleason grading system such as, for example, the fusion of multiple rings in one epithelial unit, the degree to which rings touch lumen and stoma, the formation of epithelial sheets with and without cribriform rings, and the fragmentation of glands. The Gleason grading system is the gold standard for morphological analysis of H&E stained tissue images of the prostate. These aspects of Gleason grading have not previously been captured by automated systems as they depend on understanding the need for the combination of epithelial unit separation and gland ring segmentation processes and, to the inventors' knowledge, there are no previous epithelial unit separation processes. Previous gland ring processes have also been limited by the need for lumen seeds to initialize a propagation procedure, problematic for high Gleason grades where lumens shrink and disappear.
(179) TABLE-US-00009 TABLE 9 Correspondence between Ring Features and Gleason Morphology Gleason Region type 1, Stouch ranges Morphology 1-0.9 0.9-0.5 0.5=0.1 0.1-0 Code 2 3Ring + 3Fragmented +small + 3-4Ring + + + 4Ring + + 4Fragmented + + + 4Cribrifrom + +hi density 4Solid + + +
(180)
(181) Table 10 below provides additional information regarding the correspondence between features described above and gland architecture. For example, it demonstrates how aspects of glandular architecture including fused rings, lumens or clearings in rings, Cribriform, ring size, Non-ring nuclei clusters, Cytoplasm fragmentation, Cytoplasm border, Nuclei spacing, Ring Shape, Ring dimensionality, Ring spacing, Ring arrangement, and Stroma Nuclei are represented by the features. The CI per case in Table 10 refers to the concordance index value calculated from the average feature values for each patient (e.g., by using the median feature values for each patient regardless of the number of images available for that patient). The Approx CI per image refers to the concordance index values calculated from all the feature values of all the patients.
(182) Further, rather than use the actual event time (e.g., event month=56), the outcome parameter consisted of a binary grouping of all the event cases regardless of time into one group and all the non-event cases regardless of censoring into another group.
(183) TABLE-US-00010 TABLE 10 Features Categorized by Gland Architecture CI Approx Gland Per CI per Architecture Representative Features case image Fused rings Ratio nuclei area in rings with stromal touch > S (50%), and lumen diameter > Diam (3 pixels), etc: RatioAreaNucsRedS50Diam30 0.71 0.63 RatioAreaNucsRedS50 0.71 0.63 RatioAreaNucsRed (==EpiNucRing) 0.63 Ratio nuclei area in rings with stromal touch > S (40%) and density ratio (d/b, see below) > 1.0, 1.5, 2.0 respectively: EpiNucMidRed10 0.71 0.64 EpiNucMidRed15 0.70 0.63 EpiNucMidRed20 0.63 Ratio nuclei area weighted by stromal touch %: RatioAreaNucsRedStouch 0.72 0.64 Ratio CK18 area weighted by stromal touch %: RatioAreaCKRedStouch 0.70 0.62 Ratio border nuclei adjacent to stroma: RatioAreaBNucsRedStouch 0.65 Ratio nuclei area weighted by lumen touch %: RatioAreaNucsRedLtouch 0.66 0.65 Ratio border nuclei area weighted by lumen touch %: RatioAreaBNucsRedLtouch 0.66 Ratio nuclei in rings in epithelial blobs with <= 1, 2, 4 . . . rings EpiBlobRed1NucR 0.58 EpiBlobRed2NucR 0.59 EpiBlobRed4NucR 0.6 EpiBlobRed8NucR 0.61 EpiBlobRed16NucR 0.63 EpiBlobRed32NucR 0.62 Lumens or MeanLtouchRed 0.64 0.63 clearings in MeanLumenRatioRed 0.63 rings MeanDensityRed 0.62 Cribriform Mean density (d/b) for fused rings, stromal touch < S (50%) Mean_DensityRings_Yellow_S50 0.62 0.56 Ratio nuclei touching lumen, not touching stroma RatioAreaNucsRedLtouchNotStouch Ring size Diameter of central clearing in good rings MeanDiameterRed_S50 0.59 0.56 Outer diameter of ring = approx. 4 area/perimeter MeanOuterDiameterRedS50 0.61 0.56 Average #nuclei in ring = #nucs/#rings = n/r Note: MST proportion edge 3 = approx. 2.5 r/n assuming that MST connects each ring to 2.5 other rings on average = 2.5/Mean_NumNucs_RedGreen MeanNumNucsRed_S50 0.6 MeanNumNucsRedGreen_S50 0.6 Non-ring Ratio area nuclei in non-ring clusters nuclei RatioAreaNucsNonRing(=EpiNucGreen) 0.33 034 clusters Ratio area nuclei in small non-ring clusters (area < 2000) in bigger blobs (area > 4000) RatioAreaNucsNonRingSmallCK2BibBlob4 0.36 0.36 Ratio nuclei in non-ring clusters adjacent to good rings EpiBobRingRatioNumNucsNonRing 0.44 (=EpiBlobRedNucG) Cytoplasm Ratio area nuclei in small non-ring clusters in small fragmentation blobs RatioAreaNucsGreenSmallCK2SmallBlob4 0.41 Cytoplasm Mean, std of curvature border Nuclei Spacing around ring border (=approx. MST mean spacing length): MeanBorderGapRedS50 0.46 Ring shape Elongation feature: minor axis/major axis = approx 4Pi area/perimeter{circumflex over ()}2 Ring # of different nuclei reachable in n steps or in a fixed dimensionality distance along edges of the MST or Voronoi graphs Ring spacing Stromal area features Ring Sausage or Y arrangement of rings in CK blobs: arrangement MST of rings initialized with rings, without connections between rings - proportion edge 2 vs. proportion edge 3 of connecting edges Stromal nuclei Elongation, clustering
(184) Epithelial Unit Features
(185) In some embodiments of the present invention, predictive features of glandular objects such as epithelial units are provided. Epithelial units may be generated from the process described above in connection with
(186) Biomarker Features
(187) In some embodiments of the present invention, predictive biomarker features are provided. In some embodiments, biomarker feature generation may begin with the segmentation of individual cellular objects (cell nuclei and cytoplasm) and sub-compartments (epithelia and stroma). Biomarker feature generation may include two steps: sub-compartment histogram generation and feature extraction. Histograms (e.g., epithelial and stroma histograms HE and HS) may be extracted by binning actual signal colored objects (e.g., nuclei Br), for example, in each sub-compartment separately. Epithelia and stroma histograms may be analyzed separately to obtain relative intensity features such as, for example, cumulative expressions at each -th percentile.
(188) Alternatively or additionally, the histograms also may be analyzed together to obtain features related to the relative expression of objects (e.g., epithelia and stroma nuclei), for example, at each position. In some embodiments, fractional positive percentiles may be calculated.
(189)
(190) Table 11 below includes descriptions, according to some embodiments of the present invention, feature classes for AR (and Ki67) nuclei biomarkers with their favorable value indication and concordance index (CI) on a subset of 309 patients whose 980 biopsy images were studied.
(191) TABLE-US-00011 TABLE 11 Descriptions of feature classes for AR (and Ki67) nuclei biomarkers Favorable CI on Feature Name Description Value MV06 AR_EpithDiffN The difference between the fraction Low 0.33 of epithelial nuclei whose intensity is greater than N % of the maximum nuclei intensity (positivity) and the fraction of stroma nuclei whose intensity is greater than N % of the maximum positivity AR_EpithRatioN The ratio of the fraction of epithelial Low 0.31 nuclei with intensity values greater than N % of maximum positivity to all nuclei (epithelial and stroma) above N % positivity AR_StromValueN The fraction of stroma nuclei with High 0.67 intensity values lies between N % and (N + 25)% of the maximum intensity AR_StromHistN The fraction of stroma nuclei with Low 0.31 intensity values that are less than N % of the maximum intensity AR_EpithAbsenceN The fraction of nuclei whose High 0.65 intensity are between N % and (N + 10)% of the maximum intensity that are not epithelial nuclei AR_EpithMaxRiseN The ratio of the maximum intensity Low 0.35 difference between all epithelial nuclei histograms bins that are N % apart to the maximum intensity difference between all stroma nuclei bins that are N % apart AR_EpithRelRiseN The ratio of the intensity difference Low 0.35 between N % and (100 N)% positive epithelial nuclei to the intensity difference between N % and (100 N)% positive stroma nuclei Ki67_PositiveFractionN The fraction of epithelial nuclei Low 0.33 above N % of the total positive value Ki67_PositiveRatioN The number of epithelial nuclei Low 0.34 whose normalized intensity value is greater than N % divided by the total number of epithelial nuclei
(192) Table 12 below shows four examples of cases demonstrating relative values of certain AR features according to some embodiments of the present invention. Cases 1 and 2 had favorable outcomes while cases 3 and 4 had unfavorable outcomes. In some embodiments of the present invention, image processing and feature extraction as described herein can be applied to tissue samples obtained from biopsy or tissue micro array sections as it is robust to variability in image quality and heterogeneity in staining conditions. Advantageously, it may automate traditional methods of normalizing signal intensity by relative, objective appearance. In some embodiments, features constructed from the relative analysis of multi-compartment histograms as described herein are robust to tissue variations.
(193) TABLE-US-00012 TABLE 12 Four examples cases demonstrating relative values of AR features (Cases 1 and 2 were favorable while cases 3 and 4 were unfavorable). Example Cases Favorable Feature Name Case 1 Case 2 Case 3 Case 4 value AR_EpithDiff90 0.03 0.02 0.03 0.03 Low AR_EpithRatio90 0.00 0.00 1.00 1.00 Low AR_StromValue50 0.10 0.12 0.01 0.02 High AR_StromValue75 0.06 0.04 0.00 0.00 High AR_StromHist30 0.52 0.56 0.82 0.77 Low AR_StromHist40 0.70 0.74 0.96 0.92 Low AR_StromHist50 0.83 0.83 0.99 0.98 Low AR_StromHist60 0.90 0.89 1.00 0.99 Low AR_StromHist70 0.94 0.96 1.00 1.00 Low AR_StromHist80 0.97 0.98 1.00 1.00 Low AR_EpithAbsence50 0.13 0.09 0.03 0.05 High AR_EpithAbsence60 0.06 0.06 0.01 0.01 High AR_EpithAbsence70 0.04 0.07 0.00 0.01 High AR_EpithAbsence80 0.03 0.02 0.00 0.00 High AR_EpithAbsence90 0.03 0.02 0.00 0.00 High AR_EpithRelRise95 81.07 85.28 193.58 205.51 Low AR_EpithRelRise90 83.64 83.26 207.74 191.48 Low AR_EpithRelRise85 83.33 82.61 199.09 207.20 Low AR_EpithRelRise80 77.43 91.21 204.06 196.60 Low AR_EpithRelRise75 75.72 92.32 215.09 185.77 Low
(194) Texture Features
(195) In some embodiments of the present invention, predictive texture features are provided. For example, in some embodiments, texture features may be obtained by analyzing the texture of nuclei (e.g., DAPI nuclei). As a disease progresses (example, from Gleason pattern 3 to 4), the texture of DAPI nuclei becomes coarser, making it advantageous to analyze textural differences.
(196)
(197)
In some embodiments, correlation may be defined as:
(198)
where, p(i,j) is obtained from a gray-level co-occurrence matrix that calculates how often a pixel with gray-level (grayscale intensity) value i occurs horizontally adjacent to a pixel with the value j.
(199) At stage 2108, a histogram may be plotted based on the homogeneity and the correlation of the individual nuclei and, for example, the histogram may be normalized by dividing by the total sum (y-axis). In some embodiments, the histogram may also be stretched between 0 to 1 (x-axis).
(200) At stage 2110, a polynomial curve may be fit to the histogram to minimize or get rid of fluctuations in the count. In some embodiments, each bin for the fitted curve of the histogram may become a feature.
(201) At stage 2112, stromal nuclei may be extracted from the nuclei (e.g., DAPI) image and the histogram may be plotted using the same or a similar process as described above.
(202) At stage 2114, the epithelial histogram of homogeneity and correlation may be divided by their respective stromal histograms. This may normalize the histogram between various images. A polynomial curve may be fit to the new histogram (e.g., ratio of epithelial and stromal histograms) to obtain a new set of features (ho_epistromxx, cr_epistromxx). In some embodiments of the present invention, the first and second histograms (e.g., epithelial and stromal histograms) could also be subtracted from each other or added together before extracting one or more predictive features based on a result thereof. In some embodiments of the present invention, a histogram normalization process may be used, for example, as an alternative or in addition to polynomial fitting. A suitable histogram normalization process according to some embodiments of the present invention is described in, for example, above-incorporated Gonzalez R. C and Woods R. E. Digital Image Processing, Second Edition, Prentice-Hall Inc, 2002.
EXAMPLE 1: RING FEATURE GENERATION
(203) Features were developed for predicting significant prostate cancer disease progression (including treatment resistance, metastasis and death-of-disease) post radical prostatectomy (RP) from IF images of positive biopsies. This study employed a multi-institutional cohort of 306 patients. Formalin fixed, paraffin embedded biopsy tissue samples for patients treated with RP between 1989 and 2003 for localized or locally advanced prostate cancer (cTlc-cT3) were studied. The samples were labeled with the DAPI counterstain, and the CK18 biomarker, and were imaged using a CRI Nuance multispectral imaging system. After an experienced pathologist manually reviewed images to make sure that the segmentations were accurate, gland ring features were defined as proportions of epithelial nuclei in several categories, Table 13.
(204) TABLE-US-00013 TABLE 13 Feature definitions Feature Description S-touch Proportion of ring border touching stroma EpiNucRing Proportion of epithelial nuclei in gland rings (red) EpiNucGreen Proportion of epithelial nuclei in glandular non-rings (green) EpiNucMidRed Proportion of epithelial nuclei in gland rings (red) with S-touch > 0.4 RatioRed (# gland rings)/(# gland rings (red) + # glandular non-rings (green)) OguWigu5x Morphological lumen-based H&E feature for comparison: Proportion of epithelial nuclei in symmetrical gland-unit areas around lumens
(205) The EpiNucRing feature is analogous to the H&E OguWigu5x feature, which is described in Donovan et al., Personalized Prediction of Tumor Response and Cancer Progression from the Prostate Needle Biopsy, Journal of Urology, 2009, 182: 1, 125-132, which is hereby incorporated by reference herein in its entirety. The EpiNucGreen feature is almost the inverse of the EpiNucRing feature. Since few epithelial nuclei occur in fragments within stromal areas, they are generally classified as either gland ring or non-ring. This is borne out in the cross-correlation between these two features of 0.91, Table 14.
(206) TABLE-US-00014 TABLE 14 Feature cross correlations in the range 1 to 1, comparing the proposed IF features to a H&E gland-unit feature (OguWigu5x) F2 F3 F4 F5 F1: OguWigu5x 0.52 0.3 0.32 0.08 F2: EpiNucMidRed 1 0.48 0.44 0.06 F3: EpiNucGreen 0.48 1 0.91 0.77 F4: EpiNucRing 0.44 0.91 1 0.79 F5: RatioRed 0.06 0.77 0.79 1
(207) The IF feature with the highest correlation to OguWigu5x, 0.52, is EpiNucMidRed, which is a filtered version of EpiNucRing excluding rings which do not touch stroma for at least 40% of their boundary. This has the effect of reducing the gland ring nuclear proportion for areas of densely merged epithelial units, such as cribriform sheets.
(208) Concordance indices (CIs) in Table 15 below are calculated comparing feature values to the time elapsed between when the biopsy was performed and the occurrence of significant disease progression (metastasis or death as a result of prostate cancer). EpiNucMidRed is correlated to and more stable than the H&E OguWigu5x feature described above. The correlation is related to the correlation between the touching stroma and the touching lumens present in good rings. All the features except EpiNucGreen correlate positively with outcome, having CIs greater than 0.5, meaning that a higher value of the feature is better for the patient. This interpretation corresponds with the clinical and pathological correlation of glandular objects with disease progression. It is interesting to note the advantageous operation of the epithelial unit separation process to increase the CI of the EpiNucMidRed feature up to 0.70. This can be explained as a more accurate and generally larger value for S-touch, the percentage of border touching stroma, now including parts of the border where epithelial units touch but are not fused together. These features have been further refined by, for example, replacing the hard thresholds of stouch>40% and density>1.5 in EpiNucMidRed by soft linear ramp thresholds from, for example, 0.3 to 0.5 for stouch and from 1 to 2 for density. This results in greater robustness of CIs over different datasets. The updated version of this feature is named Ratio_AreaEN_Ring_SoftStG40_DeG15, as shown in Table 8 above.
(209) TABLE-US-00015 TABLE 15 Univariate concordance indices comparing the proposed IF features to a H&E gland-unit feature (OguWigu5x) CI w/out CK18 CI with CK18 Feature Separation Separation OguWigu5x 0.63 n/a EpiNucMidRed 0.64 0.70 EpiNucGreen 0.40 0.33 EpiNucRing 0.63 0.64 RatioRed 0.61 0.65
EXAMPLE 2: AR AND KI167 FEATURES
(210) 5 M sections of formalin-fixed paraffin-embedded human prostate tissues were rehydrated and stained using DAPI and CK18 as well as androgen receptor (AR) and Ki67. The biomarker emissions were captured using a Nuance multispectral camera at 20 by 10 resolution. Non-cancer tissue regions were masked out from the image. A cohort of 926 images from 306 patients was used. The segmentation of the nuclei and cytoplasm objects was based on the DAPI and CK18 images. The univariate concordance index for each of four features (two each from the AR and Ki67 biomarkers) was calculated. The selected features are described in Table 16 below.
(211) TABLE-US-00016 TABLE 16 Predictive histogram features for nuclei biomarkers Biomarker Feature CI AR Difference between the histogram 0.33 of epithelial and stroma expression at the 95-100% bin level. The higher the worse for the patient as it means there are significantly more bright epithelial nuclei than stroma nuclei. AR Fraction of bright stroma nuclei 0.69 within the 90-100% bin of all expressions. The higher the better for the patient as it means the stroma nuclei are almost as bright as (or brighter than) epithelial nuclei. Ki67 Difference between histogram of 0.65 epithelial and stroma expression at the 0-5% bin level. The higher the better for the patient as it means there are significantly more dull epithelial nuclei than stroma nuclei. Ki67 Fraction of epithelial nuclei that 0.36 have an expression above the 25- 30% bin of all expressions. The higher the worse for the patient as it means the epithelial nuclei in have a brighter average expression than the stroma nuclei.
(212) Two multivariate models were also trained in the context of a systems pathology paradigm. In this example, disparate information from patient's clinical (e.g., age), histological (via image features extracted from hematoxylin and eosin-stained tissue specimens, and molecular (via features measuring the expression of protein biomarkers) profiles were combined in a supervised learning framework to predict cancer recurrence. One of the models utilized features from a previous method while the other utilized features described herein. Analysis show a 0.04 improvement in the predictive power (CI) of the model based on the new features.
(213) Androgen receptor (AR) and Ki67 biomarker expression in prostate biopsy samples were quantified and extracted histogram features were shown to be associated with disease progression in a univariate analysis. It manifested improved performance over prior systems. The features were also selected in a multivariate model integrating other clinical and histological features, proving their independent prognostic value. The utility of this integrated approach to biomarker quantification was demonstrated by predicting prostate cancer disease progression within eight years after a radical prostatectomy.
(214) Ring Combination Features
(215) In some embodiments of the present invention, predictive features are provided that are combinations of other features, for example, biomarker features with gland morphology features. In some embodiments, the combined features may adapt the prognostic weight of other features with particular morphological architecture of the tissue image. For example, in one embodiment, AR biomarker intensity values are generated and combined with ring features for each ring in an image. The combined AR biomarker and ring values for each ring are then aggregated over, for example, the entire tissue image using a Nave Bayes probability or other method.
(216) According to various embodiments of the present invention, one or more of the following approaches may be used for generating combined features. First, features may be localized per ring. Second, features may be multiplied by one or more features within a morphologic parameter set (e.g., Gleason triangle components). Third, the features may be combined by, for example, multiplying the feature values (e.g., first converting the feature values into probability values and then combining them by multiplying their Nave Bayes probability values). Each of these approaches is described in greater detail below.
(217) Localizing features per ring: in some embodiments, this approach may involve calculating the value of a feature at each ring in an image. The process of calculating the feature value at each ring may take into account information from the all or some part of the tissue image (e.g., the average epithelial value, the maximum stroma value, and the image intensity value).
EXAMPLE 1
(218) Per Gland Relative Rise AR value at 70% (ARRR70) is calculated from the value of the 70% of the histogram of AR values per ring and the value of the 70% of the histogram of AR value in all of the stroma. It is further normalized by a ratio, N, which is a function of the number of nuclei in the ring.
ARRR70=(AR 70% rise in ring/AR 70% rise in stroma)Sqrt(N/(N1))
EXAMPLE 2
(219) Per Gland Ki67 Positive Ratio (Ki67PR) is calculated from the number of positive Ki67 nuclei in a ring and the number of epithelial nuclei in that ring.
Ki67PR=Number of positive Ki67 in ring/Number of epithelial nuclei in ring
(220) Multiplying features by a morphologic parameter: in some embodiments according to this approach, morphologic parameters provide a partial classification of a ring into two or more morphologic categories, where the sum of the partial classification values obtained equals 1.
EXAMPLE 1
(221) The morphologic parameter Gleason triangle includes (e.g., consists of) three components: pattern 3, pattern 4 fragmented and pattern 4 sheet as shown in Table 17 below. These components characterize different aspects of the morphology of the glandular region. Each ring classified using the Gleason triangle may have three component classification values which sum to 1. Two examples are shown in the table. In the first example the region has s partial classification value in the range 0 to 1 (alternatively, has s partial membership) to the pattern 3 component of the Gleason triangle. It also has 1s partial classification value to the pattern 4 sheet component and 0 classification value to the pattern 4 fragmented component. The second region has 0, s, and 1s partial classification values to pattern 3, pattern 4 fragmented and partial 4 sheet components, respectively. Isolated rings may have a high s value while fused rings or sheets may have a low s value. Thus, s could be the stouch, ltouch, or similar feature value described above in connection with the Ring Features.
(222) TABLE-US-00017 TABLE 17 Description of Gleason triangle features Pattern 3 Pattern 4 fragmented Pattern 4 sheet Region Type s3 s4f s4s 1 S 0 1 s 2 0 s 1 s
(223) Combining features: in some embodiments of the present invention, according to this approach there may be one or more (e.g., four) different schemes for combination of feature values, as illustrated below in connection with Table 18. These may include: localization per gland, localization per image, nave Bayes per ring, and nave Bayes per image. The schemes may make use of a technique that converts a prognostic feature into an approximate event probability by a least squares linear fit of the feature to 0 (non-event) or 1 (event). An alternative scheme may replace 0 and 1 by the event time. When the event time is unknown or censored, an iterative scheme can update the event times subject to the censoring constraints to optimize the outcome probability. In some embodiments, this linear rescaling of the feature does not change the concordance index (CI) or the prognostic value of the feature, but allows the feature to be treated as a probability value.
(224) TABLE-US-00018 TABLE 18 Comparison of Feature Combination Schemes Feature Approx Approx Suffix scheme Example with s = stouchxLtouch CI Set 1 CI Set 2 xxLG Localized SLt3xARRR7SLt4FxARRR7SLt4SxARRR7n3LG 0.724 0.819 per ring xxLI Localized SLt3xARRR7SLt4FxARRR7SLt4SxARRR7n3LI 0.721 0.825 per image xxNBG Nave SLt3xARRR7SLt4FxARRR7SLt4SxARRR7n3NBG 0.731 0.771 Bayes per ring xxNBI Nave SLt3xARRR7SLt4FxARRR7SLt4SxARRR7n3NBI 0.734 0.785 bayes per image
EXAMPLE 1
(225) Localizing per ring ARRR70. This involves multiplying the ring values of ARRR70, by each of the three morphology components (s3, s4f, and s4s). A weighted average of the resultant AR components is then taken over the image as shown in Table 4Ring Feature Statistics, using the nuclei area as a weight function and normalizing over region types 1 and 2. A least squares fit of these three components to 0 (non-event) or 1 (event) creates a probability score which is the output feature value.
EXAMPLE 2
(226) Localizing per image ARRR70. This involves calculating the weighted average of the ARRR70 per ring feature over the entire image, and the weighted average of the Gleason triangle components per ring over the entire image. The weighting method here uses nuclei area as a weight and normalizes over region types 1 and 2. The weighted average of ARRR70 and each of the three morphology components are multiplied to obtain three values. A least-squares fit of the result to 0 (non-event) or 1 (event) to creates a probability score which is the output feature value.
EXAMPLE 3
(227) Nave Bayes per ring ARRR70. This involves multiplying the ring values of ARRR70, by each of the three morphology components (s3, s4f, and s4s). A weighted average of the resultant AR components is then taken over the image, as in Example 1. The three components are separately converted into three probability scores (p3, p4f, p4s) which are then combined assuming the non-event probabilities are independent using the formula:
Probability score=1(1p3)(1p4f)(1p4s)
EXAMPLE 4
(228) Nave Bayes per image ARRR70. This involves calculating the weighted average of the ARRR70 per ring feature over the entire image, and the weighted average of the Gleason triangle components per ring over the entire image as in Example 2. The weighted average of ARRR70 and each of the three Gleason components are multiplied to obtain three values which are separately converted into three probability scores (p3, p4f, p4s). The three probability scores are then combined assuming the non-event probabilities are independent as in Example 3.
EXAMPLE 5
(229) Localizing per image minimum spanning tree (MST) proportional edge 3 feature which is described in above-incorporated Donovan et al., Personalized Prediction of Tumor Response and Cancer Progression from the Prostate Needle Biopsy, Journal of Urology, 2009, 182: 1, 125-132. This involves calculating the weighted average of the MST proportional edge 3 features per ring over the entire image, and the weighted average of the Gleason triangle components per ring over the entire image. The weighting method here uses nuclei area as a weight and normalizes over region types 1 and 2. The weighted average of MST proportional edge 3 feature and each of the three morphology components are multiplied to obtain three values. A least-squares fit of the result to 0 (non-event) or 1 (event) to creates a probability score which is the output feature value.
EXAMPLE 6
(230) Localizing per image energy feature which is described above in connection with
EXAMPLE 7
(231) Localizing per image a high molecular weight cytokeratin (HMWCK) feature. HMWCK is a biomarker whose absence in epithelial units defines invasive cancer. This involves calculating the weighted average of the HMWCK feature per ring over the entire image, and the weighted average of the Gleason triangle components per ring over the entire image. The weighting method here uses nuclei area as a weight and normalizes over region types 1 and 2. The weighted average of the HMWCK feature and each of the three morphology components are multiplied to obtain three values. A least-squares fit of the result to 0 (non-event) or 1 (event) to creates a probability score which is the output feature value.
(232) In some embodiments of the present invention, instead of using the Gleason triangle components, the triple (where s is a morphological component and b is a biomarker or other feature and sb is their product) could be used. This morphological parameter set can be utilized with the four schemes described above. The Localized per image scheme creates a bilinear interpolation of s and b probability scores. This is shown in
(233) Stability Analysis
(234)
(235) An input image may be received at stage 2302 (e.g., an input image resulting from preprocessing according to process 200 (
(236) At stage 2306, the original and perturbed (e.g., slightly blurred) images are then segmented using one or more segmentation processes. For example, statistical stability analysis may be applied to multiple, different segmentation processes under consideration to rank the stability of such segmentation approaches. In some embodiments, the most stable segmentation approach is selected for use in the development of predictive models and/or evaluation of tissue images for new patients to be evaluated by existing models.
(237) At stage 2308, a match between the segmentation output of the original image to the segmentation output of each of its slightly perturbed variants is then evaluated to assess stability and produce one or more match metrics. For example, in an illustrative implementation, 48 realistic phantom images obtained from actual biopsy prostate cancer tissue were utilized. The images contained only cancer tissue as all non-cancerous regions were previously masked out.
(238) Table 19 below provides a comparison of the approach for segmentation evaluation according to some embodiments of the present invention to traditional ground truth based validation. Embodiments of the present approach arise from, for example, the observation that segmentation errors can be classed as statistical or structural. Statistical errors reflect failure to account for random variations in pixel values while structural errors result from inadequate image description models. Observations show that statistical errors predominate image analysis, for example, in cytometry.
(239) TABLE-US-00019 TABLE 19 Ground-truth based Stability-based Criteria validation validation Evaluation Metrics measure the Metrics measure the method overlap between an consistency of algorithm's output segmenting perturbed of an image to the versions of an image image's ground-truth using the algorithm Possibility of use No Yes after algorithm deployment Representativeness Generalization to Generalization to of results larger image sets is larger image sets is necessary but often not required difficult
STABILITY EXAMPLE 1: ANALYZING AND RANKING SEGMENTATION ALGORITHMS
(240) Ground-truth images were created for each of 48 phantom images using the process shown in
(241) More specifically, sixteen 5 M sections of formalin-fixed paraffin-embedded human prostate tissues were rehydrated and stained using nuclear counterstain 4-6-diamidino-2-phenylindole (DAPI) and cytokeratin 18 (CK18). DAPI and CK18 emission images were captured using a Nuance multispectral camera at 20 by 10 resolution. Non-cancer tissue regions of the DAPI-CK18 images were digitally masked out by a pathologist. As a result, three gray levels were present in the images: the masked area (black), tissue background (dark gray), and tissue foreground (bright regions representing the nuclei or cytoplasm). Three ground truth images were created from each of the 16 DAPI-CK18 images using a semi-automatic labeling tool. The generated ground truth was reviewed by two senior pathologists.
(242) Ground truth generation: The ground truth images were obtained using a semi-automatic procedure that involves manual delineation of the boundaries of cells using a border tracing tool. It allows selection of points along the boundary of interest that approximate its spline control points which it uses to calculate the best B-spline approximation of the closed curve. The user can edit the curve by adding or deleting points. The shape of the cells in the phantom images were traced from actual prostate cancer cells using the semi-automated tracing tool. The texture of the nuclei, cytoplasm and background were sampled from the actual images and registered to the ground truth images. Care was taken to replicate any texture variations along the border of the actual images on the phantoms. Effort was made to normalize the statistical distribution of the overall intensity and the distribution of the phantoms to match those of the actual image templates.
(243) Image Processing and Segmentation Approaches:
(244) The approach shown in
(245)
(246) Statistical Stability Approach:
(247) Perturbed variants of each of the input images were generated using or convolution kernels in each of four directions (top-to-bottom, right-top-to-left-bottom, right-to-left, and right-bottom-to-left-top) to obtain eight variants in total. Referring to
(248) Partition Stability Approach:
(249) Illustrative stages of the process are shown in
(250)
where, p(i,j) is obtained from the gray-level co-occurrence matrix that calculates how often a pixel with gray-level (grayscale intensity) value i occurs horizontally adjacent to a pixel with the value j. One or more arithmetic operations (e.g., division or subtraction) may be applied to one or more partitions. The resulting texture metrics in two or more partitions may be compared and the output of this comparison may be utilized as a measure of stability.
(251) Statistical Stability Validation Metrics:
(252) Extensions of the basic Dice and Jaccard image similarity metrics were derived to the case of: 1) multiple images where each object in each image was considered a match to only one object in the other images (
(253) Partition Stability Validation Metrics: In some embodiments of the present invention, one or more of the metrics may be used to rank segmented images.
(254) Metric 1: In some embodiments, a background texture metric, for example, background texture of epithelial cytoplasm objects per image, may be
(255) Metric 2: In some embodiments, a band texture metric, for example, average texture of the band around each epithelial cytoplasm object per image, may be used to rank segmented images. In some embodiments, this metric may be computed as follows: 1. obtaining the background mask by complementing an original cytoplasm segmented mask; and 2. multiplying the original image with the background mask and computing the average energy in the background.
(256) Metric 3: In some embodiments, a band texture metric, for example, average texture of the band around each epithelial cytoplasm object per image, may be used to rank segmented images. In some embodiments, this metric may be computed as follows: 1. dilate a segmented epithelial cytoplasm mask using, for example, a 10 pixel disk structuring element; 2. subtracting the dilated mask from the original cytoplasm mask; and 3. calculating the energy (texture) of that band for each cytoplasm gland object and averaged for each image, with this value being the band texture for that particular image.
(257) Metric 4: In some embodiments, a background ratio texture metric, for example, the ratio of background texture of the original cytoplasm (e.g., CK18) background divided by the dilated (e.g., CK18) background, may be used to rank segmented images. In some embodiments, this metric may be computed as follows: 1. dilate a segmented epithelial cytoplasm mask is using, for example, a 10 pixel disk structuring element; 2. obtaining the original and dilated backgrounds by complementing the original and dilated cytoplasm segmented masks; and 3. computing the energy (texture) of the ratio of the original cytoplasm (e.g., CK18) background divided by the dilated (e.g., CK18) background, with this value being the background ratio texture for that particular image.
(258) Results Analysis:
(259) For each of the 48-by-4 image-algorithm pairs, a ground-truth match metric (using the classic Dice metric) and four stability validation metrics (described above) were calculated. The correlation between each of the 48-by-4 ground truth scores for each image-algorithm pair to the corresponding stability validation score were also calculated for each metric. Using 48 images and 4 algorithms, it was determined whether the stability-based validation method could pick the better of every algorithm pair for every image, as judged by accuracy-based validation. Results showed this to be true in over 96% of cases. It exaggerated the accuracy difference in segmentation results judged similar by ground-truth validation in 4% of cases. The method reduced segmentation review effort and time by over 99% and consistently detected poor segmentation outputs.
(260)
(261) To characterize the two datasets, the average statistics of the statistical stability metric 1 and the three partition stability metrics were calculated for biopsy and prostatectomy tissue images and shown in Table 20 below. Due to laboratory process differences, prostatectomy tissue images had higher background noise than biopsy tissue images. This corresponded to lower mean metric values. Further, the standard deviations of values for the biopsy images were lower than those of the prostatectomy images, indicating their segmentation was superior on average. In Table 20: S-DSC: stability-based Dice correlation coefficient, BGT: background texture; BNT: band texture; BGRT: the ratio of background texture to combined texture of the foreground and band, in other words, the texture in the background of original versus dilated cytoplasm backgrounds.
(262) TABLE-US-00020 TABLE 20 Datasets S-DSC BGT BNT BGRT Biopsy Min 90.66 88.89 94.48 94.43 (926 Images) Max 99.99 99.98 99.98 99.98 Mean 99.30 99.31 99.48 99.49 Std 0.67 0.92 0.61 0.61 Prostatectomy Min 92.80 64.67 81.53 81.20 (1030 Images) Max 99.83 99.97 99.98 99.98 Mean 98.43 90.34 95.29 95.22 Std 0.95 6.25 2.91 2.98
(263) Table 21 below illustrates the improvement in nuclei features obtained when, for example, the lower 30% of images with low statistical stability scores are pruned (eliminated) from the dataset. Table 21 shows the concordance index of minimum spanning tree nuclei features obtained from 926 prostate biopsy images before and after such pruning.
(264) TABLE-US-00021 TABLE 21 Mean Proportional Proportional Proportional Length Edge 1 Edge Edge 3 Before 0.56 0.68 0.69 0.67 pruning After 0.68 0.72 0.74 0.69 pruning
(265) Discussion:
(266) Stability analysis was applied to performance assessment of segmentation algorithms without using ground-truth images. Tests using 48 realistic phantoms and four segmentation algorithms show that statistical validation scores correspond to ground-truth validation scores in over 96% of cases. Tests on 6000 segmentation results show that this method cut the segmentation review time and effort by over 99%. As no ground-truth is required, this method can be used for performance evaluation long after algorithm development.
(267) In various embodiments of the present invention, one or both of statistical and partition stability can be applied to the task of ranking images by segmentation. Table 20 above shows average values of statistical and partition stability metrics in the biopsy and prostatectomy sets.
ADDITIONAL EMBODIMENTS
(268) Thus it is seen that systems and methods are provided for, for example, segmentation and processing of tissue images and feature extraction from the same. Although particular embodiments have been disclosed herein in detail, this has been done by way of example for purposes of illustration only, and is not intended to be limiting with respect to the scope of the appended claims, which follow. In particular, it is contemplated by the applicant that various substitutions, alterations, and modifications may be made without departing from the spirit and scope of the invention as defined by the claims. Other aspects, advantages, and modifications are considered to be within the scope of the following claims. The claims presented are representative of the inventions disclosed herein. Other, unclaimed inventions are also contemplated. Applicant reserves the right to pursue such inventions in later claims.
(269) Insofar as embodiments of the invention described above are implementable, at least in part, using a computer system, it will be appreciated that a computer program for implementing at least part of the described methods and/or the described systems is envisaged as an aspect of the present invention. The computer system may be any suitable apparatus, system or device. For example, the computer system may be a programmable data processing apparatus, a general purpose computer, a Digital Signal Processor or a microprocessor. The computer program may be embodied as source code and undergo compilation for implementation on a computer, or may be embodied as object code, for example.
(270) It is also conceivable that some or all of the functionality ascribed to the computer program or computer system aforementioned may be implemented in hardware, for example by means of one or more application specific integrated circuits.
(271) Suitably, the computer program can be stored on a carrier medium in computer usable form, such as a non-transitory computer readable storage medium, which is also envisaged as an aspect of the present invention. For example, the carrier medium may be solid-state memory, optical or magneto-optical memory such as a readable and/or writable disk for example a compact disk (CD) or a digital versatile disk (DVD), or magnetic memory such as disc or tape, and the computer system can utilize the program to configure it for operation. The computer program may also be supplied from a remote source embodied in a carrier medium such as an electronic signal, including a radio frequency carrier wave or an optical carrier wave.
(272) All of the following disclosures are hereby incorporated by reference herein in their entireties: U.S. application Ser. No. 12/821,664, filed on Jun. 23, 2010; U.S. Provisional Application No. 61/280,162, filed on Oct. 30, 2009; U.S. application Ser. No. 12/584,048, filed on Aug. 31, 2009; U.S. application Ser. No. 12/462,041, filed on Jul. 27, 2009; PCT Application No. PCT/US09/04364, filed on Jul. 27, 2009; PCT Application No. PCT/US08/004523, filed Apr. 7, 2008, which claims priority from U.S. Provisional Patent Application Nos. 60/922,163, filed Apr. 5, 2007, 60/922,149, filed Apr. 5, 2007, 60/923,447, filed Apr. 13, 2007, and 61/010,598, filed Jan. 9, 2008; U.S. patent application Ser. No. 11/200,758, filed Aug. 9, 2005; U.S. patent application Ser. No. 11/581,043, filed Oct. 13, 2006; U.S. patent application Ser. No. 11/404,272, filed Apr. 14, 2006; U.S. patent application Ser. No. 11/581,052, filed Oct. 13, 2006, which claims priority from U.S. Provisional Patent Application No. 60/726,809, filed Oct. 13, 2005; and U.S. patent application Ser. No. 11/080,360, filed Mar. 14, 2005, which is: a continuation-in-part of U.S. patent application Ser. No. 11/067,066, filed Feb. 25, 2005 (now U.S. Pat. No. 7,321,881, issued Jan. 22, 2008), which claims priority from U.S. Provisional Patent Application Nos. 60/548,322, filed Feb. 27, 2004, and 60/577,051, filed Jun. 4, 2004; a continuation-in-part of U.S. patent application Ser. No. 10/991,897, filed Nov. 17, 2004, which claims priority from U.S. Provisional Patent Application No. 60/520,815, filed Nov. 17, 2003; a continuation-in-part of U.S. patent application Ser. No. 10/624,233, filed Jul. 21, 2003 (now U.S. Pat. No. 6,995,020, issued Feb. 7, 2006); a continuation-in-part of U.S. patent application Ser. No. 10/991,240, filed Nov. 17, 2004, which claims priority from U.S. Provisional Patent Application No. 60/520,939 filed Nov. 18, 2003; and claims priority from U.S. Provisional Patent Application Nos. 60/552,497, filed Mar. 12, 2004, 60/577,051, filed Jun. 4, 2004, 60/600,764, filed Aug. 11, 2004, 60/620,514, filed Oct. 20, 2004, 60/645,158, filed Jan. 18, 2005, and 60/651,779, filed Feb. 9, 2005. Also incorporated herein by reference are Ajemba et al., Integrated segmentation of cellular structures, Proc. of the SPIE Medical Imaging Conf, Orlando, Fla., 7962 (17) (2011), and Ajemba et al, Stability based validation of cellular segmentation algorithms, Proc. of the SPIE Medical Imaging Conference., Orlando Fla., 7962(106), (2011).