SYSTEMS AND METHODS FOR FINDING REGIONS OF INTEREST IN HEMATOXYLIN AND EOSIN (H&E) STAINED TISSUE IMAGES AND QUANTIFYING INTRATUMOR CELLULAR SPATIAL HETEROGENEITY IN MULTIPLEXED/HYPERPLEXED FLUORESCENCE TISSUE IMAGES
20220323776 · 2022-10-13
Assignee
Inventors
Cpc classification
H01Q7/00
ELECTRICITY
G16H50/20
PHYSICS
G06V10/26
PHYSICS
A61N1/37229
HUMAN NECESSITIES
G06T7/187
PHYSICS
H01Q1/273
ELECTRICITY
International classification
A61N1/372
HUMAN NECESSITIES
G06T7/187
PHYSICS
G06V10/26
PHYSICS
G06V20/69
PHYSICS
G16H50/20
PHYSICS
Abstract
Graph-theoretic segmentation methods for segmenting histological structures in H&E stained images of tissues. The methods rely on characterizing local spatial statistics in the images. Also, a method for quantifying intratumor spatial heterogeneity that can work with single biomarker, multiplexed, or hyperplexed immunofluorescence (IF) data. The method is holistic in its approach, using both the expression and spatial information of an entire tumor tissue section and/or spot in a TMA to characterize spatial associations. The method generates a two-dimensional heterogeneity map to explicitly elucidate spatial associations of both major and minor sub-populations.
Claims
1. A method of identifying regions of interest in a stained tissue image, comprising: receiving color normalized image data representing the stained tissue image; determining mutual information data indicative of statistical associations between neighboring pixels in the color normalized image data; identifying and detecting boundaries of histological structures within the stained tissue image based on the determined mutual information data; and generating a segmented stained tissue image using the detected boundaries of the identified histological structures.
2. The method according to claim 1, wherein the color normalized image data comprises normalized hue data in an opponent color space, and wherein the determining mutual information data comprises estimating a joint distribution of hue angles between neighboring pixels in the normalized hue data and calculating a pointwise mutual information (PMI) of the joint distribution, the PMI being the mutual information data.
3. The method according to claim 2, wherein the identifying comprises creating an affinity function from the PMI and detecting the boundaries based on the affinity function using spectral clustering.
4. The method according to claim 2, wherein the estimating the joint distribution uses a mixture of bivariate von Mises distributions.
5. The method according to claim 1, wherein the stained tissue image is a hematoxylin and eosin (H&E) stained tissue image and wherein the segmented stained tissue image is a segmented H&E stained tissue image.
6. A non-transitory computer readable medium storing one or more programs, including instructions, which when executed by a computer, causes the computer to perform the method of claim 1.
7. A computerized system for identifying regions of interest in a stained tissue image, comprising: a processing apparatus, wherein the processing apparatus includes: a quantifying component configured for determining mutual information data indicative of statistical associations between neighboring pixels in color normalized image data representing the stained tissue image; an identifying component configured for identifying and detecting boundaries of histological structures within the stained tissue image based on the determined mutual information data; and a segmented tissue image generating component configured for generating a segmented stained tissue image using the detected boundaries of the identified histological structures.
8. The system according to claim 7, wherein the color normalized image data comprises normalized hue data in an opponent color space, and wherein the determining mutual information data comprises estimating a joint distribution of hue angles between neighboring pixels in the normalized hue data and calculating a pointwise mutual information (PMI) of the joint distribution, the PMI being the mutual information data.
9. The system according to claim 8, wherein the identifying comprises creating an affinity function from the PMI and detecting the boundaries based on the affinity function.
10. The system according to claim 9, wherein the identifying comprises creating the affinity function from the PMI and detecting the boundaries based on the affinity function using spectral clustering.
11. The system according to claim 7, wherein the stained tissue image is a hematoxylin and eosin (H&E) stained tissue image and wherein the segmented stained tissue image is a segmented H&E stained tissue image.
12. The system according to claim 9, wherein the estimating the joint distribution uses a mixture of bivariate von Mises distributions.
13. A method of identifying regions of interest in a stained tissue image, comprising: receiving color normalized image data representing the stained tissue image; quantifying local spatial statistics for the stained tissue image based on inter-nuclei distance distributions determined from the color normalized image data; identifying and detecting boundaries of histological structures within the stained tissue image based on the quantified local spatial statistics; and generating a segmented stained tissue image using the detected boundaries of the identified histological structures.
14. The method according to claim 13, wherein the quantifying comprises identifying putative nuclei locations from the color normalized image data in the form of superpixels, building a superpixel graph based on a pointwise distance between each superpixel and a number of its nearest neighbors, and clustering the superpixels into labeled segments, and wherein the identifying comprises merging the labeled segments into the histological structures.
15. The method according to claim 13, wherein the stained tissue image is a hematoxylin and eosin (H&E) stained tissue image and wherein the segmented stained tissue image is a segmented H&E stained tissue image.
16. A non-transitory computer readable medium storing one or more programs, including instructions, which when executed by a computer, causes the computer to perform the method of claim 13.
17. A computerized system for identifying regions of interest in a stained tissue image, comprising: a processing apparatus, wherein the processing apparatus includes: a quantifying component configured for quantifying local spatial statistics for the stained tissue image based on inter-nuclei distance distributions determined from color normalized image data representing the stained tissue image; an identifying component configured for identifying and detecting boundaries of histological structures within the stained tissue image based on the quantified local spatial statistics; and a segmented tissue image generating component configured for generating a segmented stained tissue image using the detected boundaries of the identified histological structures.
18. The system according to claim 17, wherein the quantifying comprises identifying putative nuclei locations from the color normalized image data in the form of superpixels, building a superpixel graph based on a pointwise distance between each superpixel and a number of its nearest neighbors, and clustering the superpixels into labeled segments, and wherein the identifying component is configured for identifying by merging the labeled segments into the histological structures.
19. The system according to claim 17, wherein the stained tissue image is a hematoxylin and eosin (H&E) stained tissue image and wherein the segmented stained tissue image is a segmented H&E stained tissue image.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] FIGS. 15-17 show exemplary cell spatial dependency images and PMI maps;
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
[0036] As used herein, the singular form of “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[0037] As used herein, the statement that two or more parts or elements are “coupled” shall mean that the parts are joined or operate together either directly or indirectly, i.e., through one or more intermediate parts or elements, so long as a link occurs.
[0038] As used herein, the term “number” shall mean one or an integer greater than one (i.e., a plurality).
[0039] As used herein, the terms “component” and “system” are intended to refer to a computer related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component can be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. While certain ways of displaying information to users are shown and described herein with respect to certain figures or graphs as screen or screenshots, those skilled in the relevant art will recognize that various other alternatives can be employed. The screens or screenshots are stored and/or transmitted as display descriptions, as graphical user interfaces, or by other methods of depicting information on a screen (whether personal computer, PDA, mobile telephone, or other suitable device, for example) where the layout and information or content to be displayed on the page is stored in memory, database, or another storage facility. The screens or screenshots may also be printed as desired.
[0040] As used herein, the term “superpixel” shall mean a coherent patch or group of pixels with similar image statistics.
[0041] Directional phrases used herein, such as, for example and without limitation, top, bottom, left, right, upper, lower, front, back, and derivatives thereof, relate to the orientation of the elements shown in the drawings and are not limiting upon the claims unless expressly recited therein.
A. Segmenting H & E Stained Tissue Images
[0042] A first aspect of the disclosed concepts focuses on improving the function and operation of (e.g., with improved processing capabilities) digital pathology systems and, in particular, on segmenting histological structures in H&E stained images of tissues, such as breast tissues, e.g. invasive carcinoma, carcinoma in situ, atypical and normal ducts, adipose tissue, and/or lymphocytes. The present inventors hypothesized that spatial image statistics present discriminative fingerprints for segmenting a broad class of histological structures. This aspect of the disclosed concepts, described in greater detail herein, provides two graph-theoretic segmentation methods that each rely on characterizing local spatial statistics.
[0043] In the first method, each node in the graph corresponds to a pixel in the image, and the edges correspond to the strength with which two nodes belong to the same group. The edge strength is determined by measuring pairwise pixel statistics, in the form of bivariate von Mises mixture distributions, in an opponent color space built to enhance the separation between pink and purple stains in H&E images. Spectral methods are used to partition the graph. The first method is expected to be more successful in segmenting structures with well-defined boundaries (e.g., adipose tissues and blood vessels).
[0044] The second method is conveniently designed to extract histological structures that have amorphous spatial extent (e.g., a tumor nest). In this formulation, putative nuclei centers become the nodes of a graph formed to capture the spatial distribution of the nuclei in H&E images. By applying data-driven thresholds on inter-nuclei spatial distances, the network is partitioned into homogeneous image patches.
[0045] The two segmentation methods described herein have two common elements, namely opponent color representation and appearance normalization, each of which is described in detail below. The segmentation methods differ in how they capture image statistics and embed them into graph partitioning strategies. These aspects of the methods will be described separately herein.
[0046] When the known standard opponent-color (with red-green, yellow-blue as opponent color axes) hue-saturation-brightness (HSV) transformation is applied to red-green-blue (RGB) images from H&E, the pink and purple color ranges are restricted to the blue-red quadrant of the color wheel. A goal of this aspect of the disclosed concepts is to enhance the separation between pink and purple colors so that the downstream spatial analysis pipeline is more robust. To this end, a color space is constructed that places the pink and purple colors in opposition. Specifically, in the exemplary implementation, an expert was allowed to select a bag of pink and purple pixels. Then, singular value decomposition was performed on this collection of data to obtain an orthogonal projection matrix of size 3×3. This aspect of the disclosed concepts provides a specific interpretation to the projected coordinates, similar to the opponent space HSV. In particular, the projection onto the first singular vector (enforced to have non-negative values) yields an H&E-brightness value $b$. The two remaining projected coordinates, $c_2$ and $c_3$, form a complex plane in which H&E-saturation $s=\sqrt{c_2^2+c_3^2}$ and H&E-hue $\theta=\tan^{-1}(c_2+ic_3)$, i.e., the angle of the point $(c_2, c_3)$ in that plane. From this construction, the hue values of purple and pink pixels are expected to be maximally separated in the complex color plane. For illustration, it is noted that the angular difference in the mean hue values of the pink and purple pixels in
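The mapping from projected coordinates to H&E-brightness, H&E-saturation, and H&E-hue can be sketched as follows. This is a minimal illustration, not the exemplary implementation: the function name `he_opponent_coords` and the matrix `PROJECTION` are hypothetical, the matrix is a fixed orthonormal stand-in rather than one obtained by singular value decomposition of expert-selected pixels, and the hue is assumed to be the angle of $c_2+ic_3$.

```python
import math

def he_opponent_coords(rgb, projection):
    """Return (brightness, saturation, hue) for one RGB triple."""
    # Each row of `projection` plays the role of one singular vector.
    b, c2, c3 = (sum(row[k] * rgb[k] for k in range(3)) for row in projection)
    s = math.hypot(c2, c3)        # H&E-saturation: radius in the (c2, c3) plane
    theta = math.atan2(c3, c2)    # H&E-hue: angle of c2 + i*c3, in [-pi, pi]
    return b, s, theta

# Hypothetical orthonormal matrix used only for illustration; the source
# obtains its matrix from an SVD of expert-selected pink and purple pixels.
R3, R2, R6 = math.sqrt(3.0), math.sqrt(2.0), math.sqrt(6.0)
PROJECTION = [
    [1 / R3, 1 / R3, 1 / R3],
    [1 / R2, -1 / R2, 0.0],
    [1 / R6, 1 / R6, -2 / R6],
]

pinkish = he_opponent_coords((0.9, 0.5, 0.7), PROJECTION)
gray = he_opponent_coords((0.5, 0.5, 0.5), PROJECTION)
```

With this stand-in matrix a gray pixel has essentially zero saturation, so its hue angle is unstable. This is the situation the uniform angular noise component of the hue mixture model is meant to absorb.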
[0047] In addition, any inconsistencies in sectioning, staining, and imaging result in variation in color appearance of H&E images. Thus, in the exemplary embodiment, the data is normalized. Previous normalization methods have utilized stain vector estimation methods such as non-negative matrix factorization. These methods were found to be ineffective for this aspect of the disclosed concepts because the color distributions for some images are very skewed toward mostly purple or mostly pink. The present inventors hypothesized that the color appearance of two images is similar if their color statistics match. However, matching the statistics of the whole pixel population of the source and target images can result in unintended artifacts. For example, if the source image has mostly pink pixels (stroma) and the target image has mostly purple pixels (invasive carcinoma), then matching the source image statistics to the target image statistics will turn many pink pixels in the source image to purple and mistakenly change the cellular component identity of those pixels from stroma to nuclei. To address this issue, the following three classes of pixels are first identified: pink (eosin), purple (hematoxylin), and white (e.g., fat, shrinkage), and the statistics are matched separately for each of these classes. To identify the three classes, H&E images are converted into H&E-hue, H&E-saturation, and H&E-brightness channels as discussed. The H&E-hue space is angular and given the separation between pink, purple, and white pixel clouds in this space, the hue values are modeled with a mixture of univariate von Mises distributions. Univariate von Mises distribution for angular statistics is the equivalent counterpart of the univariate normal distribution for linear statistics. 
The von Mises distribution is characterized by two parameters, a mean $-\pi<\mu\le\pi$ and a concentration parameter $\kappa>0$, and is given by $f(x)=\{2\pi I_0(\kappa)\}^{-1}\exp\{\kappa\cos(x-\mu)\}$, where $I_0(\kappa)$ is the modified Bessel function of the first kind of order 0. A mixture of $K$ univariate von Mises distributions is given by $\sum_{k=1}^{K} m_k f_k(x\mid\mu_k,\kappa_k)$, where the $m_k$ are the prior probabilities and the $\mu_k$ and $\kappa_k$ are the means and concentration parameters. To explicitly account for pixels with low saturation values and unstable hue angles, a uniform angular noise is added as an additional mixture component whose prior probability is approximately 0.3%. The parameters of the univariate von Mises mixture can be found using an expectation-maximization (EM) algorithm. The statistics of a distribution can be characterized by an infinite set of moments. However, for analytical convenience, in the exemplary embodiment, moments are computed only up to the fourth order (mean, standard deviation, skewness, kurtosis). In each channel, the moments of each pixel class from the source image are matched to the target image. For example, the moments of purple pixels in the source image are matched to the moments of purple pixels in the target image in all three channels. After normalizing the statistics in the H&E opponent color space, the resulting pixel values are converted into the RGB space (to create normalized RGB data) using the inverse of the rotation matrix described above.
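The hue mixture model described above can be sketched as a density function. This is an illustrative sketch only: the function names and the three `COMPONENTS` (prior, mean, concentration) are hypothetical values rather than fitted parameters, the Bessel function is evaluated by a truncated power series suitable for moderate concentrations, and the EM fitting step is not shown.

```python
import math

def bessel_i0(kappa, terms=60):
    """Modified Bessel function of the first kind, order 0, by power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((m + 1) ** 2)
    return total

def von_mises_pdf(x, mu, kappa):
    """Univariate von Mises density with mean mu and concentration kappa."""
    return math.exp(kappa * math.cos(x - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def mixture_pdf(x, components, noise_prior=0.003):
    """Mixture density; `components` holds (prior, mu, kappa) triples whose
    priors are assumed to sum to 1 - noise_prior."""
    density = noise_prior / (2.0 * math.pi)   # uniform angular noise component
    for prior, mu, kappa in components:
        density += prior * von_mises_pdf(x, mu, kappa)
    return density

# Hypothetical pink, purple, and white components (not fitted values).
COMPONENTS = [(0.5, -2.0, 8.0), (0.3, 1.0, 6.0), (0.197, 2.8, 4.0)]
```

Because every component integrates to one over the circle, the mixture does as well, which gives a quick numerical sanity check on any set of fitted parameters.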
[0048] Having described the two common elements of the two segmentation methods, namely opponent color representation and appearance normalization, the remainder of each segmentation method will now be described in detail. In each of the segmentation methods, normalized image data serves as inputs. In particular, normalized H&E-hue data is used as inputs in the first method, and normalized RGB data is used as inputs in the second method.
[0049] With regard to the first method, normal breast tissues have large areas of pink stained connective tissue (CT) surrounding small areas of ducts, each of which is an assembly of cells. The nuclei of these cells will be stained dark purple, while the cytoplasm that surrounds the nuclei exhibits a mixture of pink and purple, since the purple stain from the nuclei can spill over to the cytoplasm. Statistically speaking, if one were to stand on any of these nuclei, one would expect to be surrounded by purple pixels denoting the nuclei and pink-purple pixels denoting the cytoplasm. If these cells assemble into a duct structure, then in a given neighborhood of each cell, other cells exhibiting similar properties should be found. On the other hand, if one were to stand on a fibroblast cell nucleus, which is usually found scattered in the connective tissue, one would find mostly pink pixels in its neighborhood. With the assumption that the statistical association within a structure such as ducts is higher than across its boundaries, ducts should be able to be segmented while ignoring the fibroblast cells scattered among the connective tissue.
[0050] Using a mixture of univariate von Mises distributions, the image pixels can be separated into pink, purple and white classes, but this is insufficient to delineate histological structures, such as the glands/ducts, because such structures contain pixels from all three classes. In this aspect of the disclosed concepts, in order to segment these structures, it is assumed that the statistical association within a structure such as a duct is higher than across its boundaries, and this statistical association is modeled using a mixture of bivariate von Mises distributions. Since the H&E-hue is an angular variable, the joint distribution P(A, B) of hue values from two neighboring pixels lies on a torus. This joint density is modeled as a mixture of bivariate von Mises distributions. Let the values of pixel A and B in H&E-hue space be $\phi$ and $\psi$, respectively. The bivariate distribution of two angular variables, $-\pi<\phi\le\pi$ and $-\pi<\psi\le\pi$, is:
$$f_c(\phi,\psi)=C_c\exp[\kappa_1\cos(\phi-\mu)+\kappa_2\cos(\psi-\nu)-\kappa_3\cos(\phi-\mu-\psi+\nu)]$$
where $\mu$, $\nu$ are the means and $\kappa_1,\kappa_2>0$ are the concentrations of $\phi$, $\psi$, respectively, $\kappa_3$ is the correlation coefficient, and $C_c$ is the normalizing constant. The full bivariate von Mises model has 8 parameters, but a reduced 5-parameter cosine model with positive interaction is used in the exemplary embodiment. The marginal density is $f_c(\psi)=C_c\,2\pi I_0(\kappa_{13}(\psi))\exp\{\kappa_2\cos(\psi-\nu)\}$. The value of $\kappa_3$ decides whether the distribution is unimodal or bimodal. In particular, when $\kappa_1>\kappa_3>0$ and $\kappa_2>\kappa_3>0$, the joint density is unimodal if $\kappa_3<\kappa_1\kappa_2/(\kappa_1+\kappa_2)$ and bimodal if $\kappa_3>\kappa_1\kappa_2/(\kappa_1+\kappa_2)$.
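The cosine model and its modality condition can be explored numerically with a short sketch. The function names are hypothetical, the normalizing constant is approximated on a regular grid over the torus rather than taken from a closed form, and the boundary case in which $\kappa_3$ equals the threshold (not addressed in the text) is arbitrarily reported as bimodal.

```python
import math

def cosine_model_exponent(phi, psi, mu, nu, k1, k2, k3):
    """Exponent of the cosine model, i.e. the log density up to log(C_c)."""
    return (k1 * math.cos(phi - mu) + k2 * math.cos(psi - nu)
            - k3 * math.cos(phi - mu - psi + nu))

def modality(k1, k2, k3):
    """Classify the joint density using the condition stated in the text."""
    if not (k1 > k3 > 0 and k2 > k3 > 0):
        raise ValueError("condition is stated only for k1 > k3 > 0 and k2 > k3 > 0")
    return "unimodal" if k3 < k1 * k2 / (k1 + k2) else "bimodal"

def normalizing_constant(k1, k2, k3, n=200):
    """Approximate C_c as the reciprocal of the integral over the torus."""
    h = 2.0 * math.pi / n
    grid = [-math.pi + h * i for i in range(n)]
    total = sum(math.exp(cosine_model_exponent(p, q, 0.0, 0.0, k1, k2, k3))
                for p in grid for q in grid)
    return 1.0 / (total * h * h)
```

For example, with $\kappa_1=\kappa_2=4$ the threshold is 2, so $\kappa_3=1$ gives a single mode at $(\mu,\nu)$, whereas $\kappa_3=3$ moves the maxima away from $(\mu,\nu)$.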
[0051] When the values of neighboring pixels of the H&E image in the H&E-hue space are considered, there are at most six possibilities for the masses on the torus: purple-purple, pink-pink, white-white, and the three different pairwise interactions. To model this joint distribution, a mixture of six unimodal bivariate von Mises distributions is used. A mixture model of $K$ bivariate von Mises distributions can be parameterized by $f_c(\phi,\psi)=\sum_{i=1}^{K} m_i f_i(\phi,\psi\mid\mu_i,\nu_i,\kappa_{1i},\kappa_{2i},\kappa_{3i})$. The initial values of $\mu_i$, $\nu_i$, $\kappa_{1i}$, and $\kappa_{2i}$ are generated from the mixture of univariate von Mises distributions for all the pixels in the image. The concentration parameters $\kappa_{1i}$ and $\kappa_{2i}$ and the correlation parameter $\kappa_{3i}$ satisfy the unimodality conditions for $f_i$. $\kappa_{3i}$ is constrained to have values between −1 and 1 to avoid distortion to the elliptical patterns (observed in sampled data). Together with the above constraints, the parameters of the mixture are estimated by an EM algorithm. Since there are at most six components of the mixture model as reasoned above, an explicit model selection step is not undertaken for the mixture model. If the H&E image lacks any one of the three basic colors, purple, pink, and white, the prior probabilities or mixing proportions of clusters related to that color will be close to 0.
[0052] Consider modeling the statistical dependencies between hue angles of neighboring pixels in the H&E opponent color space. If the joint probabilities are used as a measure of statistical association, it may be found that the pink-pink pixel pair in the connective tissue has a higher probability than a purple-purple pixel pair inside a duct or a pink-purple pixel pair across the CT-duct boundary. However, because of the overabundance of pink in some H&E images, the combination of pink-purple pixel pairs across the CT-duct boundary may have an equivalent or even higher probability than a purple-purple pixel pair inside the duct. A pink-pink pair may have the highest joint probability and a purple-purple pair may have similar joint probability to a purple-pink pair. In other words, the joint probability might not be sufficient to detect correct boundaries. This can be improved by the use of mutual information (MI) to correct for relative abundance. To compute MI, a number of pixel pairs (A, B) with features $\vec{f}_A$ and $\vec{f}_B$ (e.g., H&E-hue angles) are selected randomly from all locations of the image and with distances less than a threshold. The joint probability of features of A and B at a distance d apart is denoted as p(A, B; d). The overall joint probability is defined as:
The value of d depends on the parameter σ; in particular, d=2+2|r|, where r˜N(0, σ). A nucleus is approximately 15 pixels in diameter at 10× magnification. Since the segmentation algorithm targets assemblies of nuclei, the distances between sampled pixel pairs should cover at least the diameter of a nucleus. Hence, σ is set to 3. The pointwise mutual information (PMI) is calculated from the joint probability P(A, B) modeled by a mixture of bivariate von Mises distributions and the marginal probabilities P(A) and P(B) modeled by a mixture of univariate von Mises distributions. In particular,
In the exemplary embodiment, ρ=2 to normalize for the upper bound of
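The sampling rule for pixel-pair distances, d=2+2|r| with r˜N(0, σ), can be sketched as below. The function name is hypothetical, and σ is treated as a standard deviation, which is an assumption because the text does not say whether N(0, σ) is parameterized by the standard deviation or the variance.

```python
import random

def sample_pair_distances(count, sigma=3.0, seed=0):
    """Draw pixel-pair distances d = 2 + 2|r| with r ~ N(0, sigma)."""
    rng = random.Random(seed)
    # sigma is used as a standard deviation here (an assumption).
    return [2.0 + 2.0 * abs(rng.gauss(0.0, sigma)) for _ in range(count)]

distances = sample_pair_distances(20000)
mean_distance = sum(distances) / len(distances)
```

Under this reading every sampled distance is at least 2 pixels, and the expected distance is $2+2\sigma\sqrt{2/\pi}$, which is roughly 6.8 pixels for σ=3.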
[0053] Furthermore, an affinity function is defined from PMI to indicate the likelihood of grouping two pixels into the same histological structure. The affinity matrix $W$ with elements $w_{i,j}$ denotes similarity between pixels $i$ and $j$: $w_{i,j}=e^{\mathrm{PMI}_\rho(\vec{f}_i,\vec{f}_j)}$. The affinity function is used as an input to a standard spectral graph segmentation method, such as that described in Arbelaez, P. et al., “Contour Detection and Hierarchical Image Segmentation”, IEEE TPAMI, 33(5), 898-916 (2011), which has been the state of the art for segmenting natural images. From the affinity matrix $W$, eigenpairs $(\vec{v},\lambda)$ of the generalized system $(D-W)\vec{v}=\lambda D\vec{v}$ are found. Dominant eigenvector maps (small eigenvalues) indicate boundary locations of potential histological structures. As is well known, no single eigenvector will be capable of capturing all possible boundaries in complex images. Hence, the usual practice is to calculate an edge strength map from oriented spatial derivatives of a large number of dominant eigenvectors. A post-processing step is used to eliminate spurious boundary pixels.
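A toy version of the spectral step can be sketched as follows, taking $D$ to be the diagonal degree matrix with $D_{ii}=\sum_j w_{i,j}$ (the usual convention, assumed here). The function names and the 6-node `PMI` matrix are hypothetical, and power iteration on the lazy random-walk operator is swapped in for a full eigensolver. The sketch recovers only the eigenvector with the smallest nonzero eigenvalue, not the multi-eigenvector edge strength map described above.

```python
import math

def affinity_from_pmi(pmi):
    """w[i][j] = exp(PMI_rho) for a symmetric matrix of PMI values."""
    return [[math.exp(value) for value in row] for row in pmi]

def second_eigenpair(W, iterations=500):
    """Approximate the eigenpair of (D - W) v = lambda D v with the smallest
    nonzero eigenvalue, via power iteration on (I + D^-1 W) / 2."""
    n = len(W)
    degree = [sum(row) for row in W]
    total = sum(degree)
    v = [float(i) for i in range(n)]          # arbitrary non-constant start
    for _ in range(iterations):
        # Project out the trivial constant eigenvector (D-weighted mean).
        shift = sum(degree[i] * v[i] for i in range(n)) / total
        v = [x - shift for x in v]
        v = [0.5 * (v[i] + sum(W[i][j] * v[j] for j in range(n)) / degree[i])
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Generalized Rayleigh quotient v^T (D - W) v / v^T D v.
    quad = sum(degree[i] * v[i] ** 2 for i in range(n))
    cross = sum(W[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    return v, (quad - cross) / quad

# Hypothetical PMI values: high within two groups of three pixels, low across.
PMI = [[1.0 if (i < 3) == (j < 3) else -2.0 for j in range(6)] for i in range(6)]
W = affinity_from_pmi(PMI)
vector, eigenvalue = second_eigenpair(W)
groups = [x > 0 for x in vector]
```

In this example the sign of the eigenvector separates the first three pixels from the last three, which is the boundary implied by the low cross-group PMI.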
[0055] With regard to the second segmentation method, local spatial statistics vary between the various histological structures in breast tissues. For example, the clump of cells in ductal carcinoma in situ tends to aggregate with their boundaries in close proximity of each other, because the in situ tumor is growing but is confined within ducts. On the other hand, epithelial cells in invasive carcinoma are spatially far apart. They are also growing, but can freely infiltrate into and through the breast stroma, no longer confined to ducts. Local statistics of normal ducts are more ordered; in particular, normal epithelial (inner) and myoepithelial (outer) cells form two layers surrounding a cavity (lumen).
[0056] For adipose tissue, the nuclei are small and to one side of the cells. The majority of adipose tissue consists of fat droplets. The present inventors hypothesized that different histological structures have different distributions of inter-nuclei distances (local statistics). As described below, the second segmentation method of this aspect of the disclosed concepts is based on this hypothesis.
[0057] Nuclei segmentation in histopathological and cytopathological images is an extensively researched problem. However, the close proximity of epithelial cells and the prevalence of mitotic figures (dividing cells) in breast cancer make it difficult to accurately detect nuclear boundaries, which is difficult even for the human eye. To avoid this issue, in the second segmentation method, putative nuclei locations are identified in the form of superpixels, which will approximately represent nuclei, and a graph connecting superpixels is constructed to obtain neighborhood and distance information for each superpixel pair. More specifically, in the exemplary embodiment, in order to generate superpixels from H&E images, first, the pixel colors are normalized as described above. Then, the algorithm proposed in Tosun, A. B. and Gunduz-Demir, C., “Graph Run-length Matrices for Histopathological Image Segmentation”, IEEE TMI, 30(3), 721-732 (2011), is performed to fit circular shaped superpixels. Briefly, this algorithm first clusters pixels into three classes based on intensities using the k-means algorithm, in which cluster centers are determined over randomly selected training images using principal component analysis. These three classes represent purple, pink, and white regions, which correspond to nuclei, stroma, and lumen/white regions, respectively. This algorithm then fits circular superpixels into clustered pixels for nuclei, stroma and lumen/white components. After superpixel decomposition, a Delaunay triangulation is formed based on center coordinates of superpixels to determine the neighborhood of each superpixel. Having the distance information for each superpixel pair, final segmentation of histological structures is achieved by partitioning this graph in a greedy manner and applying merging rules for specific types of segments, as detailed in the following sections. 
Although the proposed method is motivated by the inter-nuclei distance distribution, superpixel pairs from both purple and white pixel classes are considered to account for complex histological structures such as ducts, blood vessels and adipose tissues. For example, a normal duct has purple nuclei forming two cell layers surrounding a white lumen area. On the other hand, the stroma (pink) class is considered as the background and is not included in the graph partitioning step.
[0058] More specifically, each superpixel is considered a node in a graph and the connectivity of the graph is determined by a distance threshold. For each class, the pairwise distance between a superpixel center and its nearest 15 neighbors (identified by the Delaunay triangulation) is calculated. The distance threshold T is set to be proportional to the median value (δ) of the distance distribution. The proportionality constant is set to maximize the performance of the algorithm for the entire database. After building the superpixel graph, a greedy connected component analysis algorithm is used to cluster superpixels into labeled segments. In the exemplary embodiment, the largest 15 segments in terms of tissue area are selected. Since tissue images in the exemplary embodiment are of size 2K×2K, only a handful of ducts, tumor nests, and fat droplets are expected in any given image. At this point, two sets of labeled segments have been obtained from the purple and the white superpixels.
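The graph construction and clustering step for one superpixel class can be sketched as follows. The function name is hypothetical, a brute-force nearest-neighbor search is swapped in for the Delaunay-based neighborhoods, and the proportionality constant `scale` is a placeholder, since the text states only that it is tuned over the database. The selection of the largest 15 segments is not shown.

```python
import math
from statistics import median

def cluster_superpixels(centers, k=15, scale=1.5):
    """Label superpixel centers by connected components of a thresholded graph."""
    n = len(centers)
    # Brute-force k nearest neighbors (stand-in for Delaunay neighborhoods).
    nearest = []
    for i in range(n):
        ranked = sorted((math.dist(centers[i], centers[j]), j)
                        for j in range(n) if j != i)
        nearest.append(ranked[:k])
    # Threshold proportional to the median neighbor distance.
    threshold = scale * median(d for row in nearest for d, _ in row)
    adjacency = [set() for _ in range(n)]
    for i, row in enumerate(nearest):
        for d, j in row:
            if d <= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)   # keep the graph undirected
    # Connected components by depth-first traversal.
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in adjacency[i]:
                if labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels, threshold

# Two hypothetical clumps of nuclei centers, 100 pixels apart.
clump_a = [(x, y) for x in range(3) for y in range(3)]
clump_b = [(x + 100, y) for x in range(3) for y in range(3)]
labels, threshold = cluster_superpixels(clump_a + clump_b, k=3)
```

The two clumps receive different labels because no edge shorter than the data-driven threshold connects them.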
[0059] To merge purple segments and white segments into the final histological structures, a few simple rules are followed to make sure that important structures formed by nuclei clusters are not missed. If a white segment is completely covered by a purple segment, the whole purple area takes the label of the purple segment. If a white segment overlaps with a purple segment, regardless of overlapping area, the overlapping part takes the label of the purple segment and the non-overlapping part takes the label of the white segment. If a purple segment is completely covered by a white segment, the purple area takes the purple segment's label and the remaining white area retains the white segment's label. This is to make sure that a nuclei clump residing within a vessel is not missed. After merging purple and white segments, the remaining unlabeled area is considered as background or stroma.
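One way to read the three merging rules is as a per-pixel precedence: wherever a purple segment is present its label wins, white labels fill the remaining segmented area, and everything else is background. The sketch below implements that reading, which is an interpretation rather than the exemplary implementation. The function name, the use of 0 for unlabeled pixels, and the `white_offset` used to keep the two label sets distinct are all hypothetical.

```python
def merge_segments(purple, white, white_offset=1000):
    """Merge two label images (lists of rows); 0 means unlabeled."""
    merged = []
    for purple_row, white_row in zip(purple, white):
        row = []
        for p, w in zip(purple_row, white_row):
            if p:
                row.append(p)                  # purple label takes precedence
            elif w:
                row.append(white_offset + w)   # non-overlapping white area
            else:
                row.append(0)                  # background or stroma
        merged.append(row)
    return merged

purple = [[1, 1, 0, 0],
          [0, 0, 0, 0]]
white = [[0, 1, 1, 0],
         [0, 0, 0, 0]]
merged = merge_segments(purple, white)
```

In this example the overlapping pixel keeps the purple label, the non-overlapping white pixel keeps its white label, and the second row stays background.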
B. Quantifying Intratumor Spatial Heterogeneity
[0061] As described in greater detail herein, another aspect of the disclosed concepts provides improvements in the function and operation of (e.g., improved processing) digital pathology systems. In particular, this aspect provides a method for quantifying intratumor spatial heterogeneity that can work with single biomarker, multiplexed, or hyperplexed immunofluorescence (IF) data. The method is holistic in its approach, using both the expression and spatial information of an entire tumor tissue section and/or spot in a TMA to characterize spatial associations. In the exemplary embodiment described in detail herein, the method generates a two-dimensional heterogeneity map to explicitly elucidate spatial associations of both major and minor sub-populations. It is believed that the characterization of intratumor spatial heterogeneity will be an important diagnostic biomarker for cancer progression, proliferation, and response to therapy, and thus the method and system of this aspect of the disclosed concepts will be a valuable diagnostic and treatment tool.
[0062] According to this aspect of the disclosed concepts, a predetermined set of particular biomarkers is employed to quantify spatial heterogeneity in multiplexed/hyperplexed fluorescence tissue images. For illustrative purposes, this aspect of the disclosed concepts is demonstrated herein in a non-limiting exemplary embodiment wherein spatial heterogeneity is quantified using three breast cancer biomarkers (estrogen receptor (ER), human epidermal growth factor receptor 2 (HER2), and progesterone receptor (PR)) combined with biomarkers for segmentation including the nucleus, plasma membrane, cytoplasm and epithelial cells. It will be understood, however, that this aspect of the disclosed concepts may be used with different and/or additional biomarkers. In addition, it will also be understood that the impact of this aspect of the disclosed concepts, which uses pointwise mutual information (PMI) to quantify spatial intratumor heterogeneity, can be extended beyond the particular exemplary embodiments described herein. For example, and without limitation, this aspect of the disclosed concepts may be extended to the analysis of whole-slide IF images, labeled with increasing numbers of cancer and stromal biomarkers.
[0063] Furthermore, this aspect of the disclosed concepts employs a predetermined set of dominant biomarker intensity patterns (based on the predetermined set of particular biomarkers being used), also referred to herein as phenotypes, to measure and quantify cellular spatial heterogeneity. Thus, as an initial matter, a non-limiting exemplary method of establishing the set of dominant biomarker intensity patterns will first be described below with reference to
[0064] Referring to
[0065] Next, at step 115, the cells from the IF data are segregated into two partitions (using thresholds as described below) based on the distribution of signal intensity for each biomarker, under the assumption that signal intensity indicates true biomarker expression.
[0066] Next, at step 120 of
[0067] In order to choose the ideal dimensionality of D, a ten-fold cross validation of the data reconstruction is performed as shown in
[0068] The results of k-means clustering, shown to the right in
[0069] Having described the exemplary methodology for learning a set of dominant biomarker intensity patterns, the discussion will now shift to the manner in which the set of dominant biomarker intensity patterns is employed to quantify spatial heterogeneity. In particular,
[0070] Referring to
[0071] Next, at step 150, a spatial network is constructed to describe the organization of the dominant biomarker intensity patterns in the subject slide(s). Then, at step 155, the heterogeneity of the subject slide(s) is quantified by generating a PMI map for the slide(s) as described herein. In the exemplary embodiment, steps 150 and 155 are performed as set forth below.
[0072] In order to represent the spatial organization of the biomarker patterns in the biomarker image (i.e., the tissue/tumor sample) of the subject slide(s), a network is constructed for the subject slide(s). The construction of spatial networks for tumor samples intrinsically couples cellular biomarker intensity data (in the nodes of the network) to spatial data (in the edges of the network). The assumptions in the network construction are that cells have the ability to communicate with nearby cells up to a certain limit, e.g., up to 250 μm, and that the ability for cells to communicate within that limit is dependent upon cellular distance. Therefore, in the exemplary embodiment, the probability distribution is computed for the distance between a cell in the subject slide and its 10 nearest neighbors. A hard limit is chosen as 1.5 times the median value of this distribution (to estimate the standard deviation), and cells in the network are connected only within this limit. Then, the edges between cells in the network are weighted by the distance between the adjacent cells.
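The network construction described above can be illustrated with a minimal sketch. It is not the implementation of the disclosed embodiment: the function name `build_spatial_network`, its parameters, and the brute-force distance computation are assumptions made for illustration, and the sketch assumes that any two cells closer than the hard limit are connected (the text does not say whether edges are further restricted to the 10 nearest neighbors).

```python
import math
import statistics


def build_spatial_network(coords, k=10, scale=1.5):
    """Hypothetical helper: connect cells that lie within a hard distance limit.

    coords: list of (x, y) cell centroids (units chosen by the caller).
    k:      number of nearest neighbors sampled per cell (10 in the text).
    scale:  multiplier applied to the median neighbor distance (1.5 in the text).
    Returns (limit, edges), where edges maps (i, j) with i < j to the distance,
    which serves as the edge weight.
    """
    n = len(coords)
    if n < 2:
        return 0.0, {}
    k = min(k, n - 1)
    # Brute-force pairwise distances; adequate for a sketch, not for whole slides.
    dist = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    # Pool the distances from every cell to its k nearest neighbors.
    knn = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        knn.extend(others[:k])
    # Hard limit: median of the pooled distribution times the scale factor.
    limit = scale * statistics.median(knn)
    # Assumption: every pair of cells within the limit is connected.
    edges = {(i, j): dist[i][j]
             for i in range(n) for j in range(i + 1, n)
             if dist[i][j] <= limit}
    return limit, edges


# Three clustered cells and one distant cell; k is reduced to 2 for this toy input.
limit, edges = build_spatial_network([(0, 0), (1, 0), (2, 0), (10, 0)], k=2)
```

In this toy input the distant cell ends up with no edges, because its nearest neighbors lie beyond the limit derived from the pooled distance distribution.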
[0073] Next, pointwise mutual information (PMI) is used to measure the association between each pair of biomarker patterns in the dictionary, and thus different cell phenotypes, for the subject slide(s). This metric captures general statistical association, both linear and nonlinear, whereas previous studies have used linear metrics such as Spearman's rho coefficient. Once PMI is computed for each pair of biomarker patterns, a measure of all associations in the data of the subject slide is displayed in a PMI map. An exemplary PMI map 170 is shown in
[0074] PMI map 170 describes relationships between different cell phenotypes within the microenvironment of the subject slide(s). In particular, the entries 172 in PMI map 170 indicate how frequently a particular spatial interaction between two phenotypes (referenced by the row and column number) occurs in the dataset when compared to the interactions predicted by a random (or background) distribution over all phenotypes. Entries in a first color, such as red, denote a strong spatial association between phenotypes, while entries in a second color, such as black, denote a lack of any co-localization (weak spatial association between phenotypes). Other colors may be used to denote other associations. For example, PMI entries 172 colored in a third color, such as green, denote associations that are no better than a random distribution of cell phenotypes over the entire dataset. Additionally, PMI map 170 can portray anti-associations with entries 172 denoted in a fourth color, such as blue (e.g., if phenotype 1 rarely occurs spatially near phenotype 3).
[0075] Thus, a PMI map 170 with strong diagonal entries and weak off-diagonal entries describes a globally heterogeneous but locally homogeneous tumor. An example of such a PMI map 170A is shown in
[0076] In the exemplary embodiment, PMI for the subject slide(s) is calculated as follows. Given a linear deconstruction of an IF dataset X, where each column of X is a cell $x_k$, into an overcomplete dictionary D, where each column of D is a distinct pattern $d_i$, and a sparse coding matrix W which assigns each cell to only a single biomarker intensity pattern, each cell is, as described herein (step 140), assigned to have a phenotype $f_i$, where i is the nonzero index in column $w_k$ of W. A potential pitfall of the algorithm is that high and low signal intensity cells can be assigned to the same cell phenotype. PMI between a pair of biomarker phenotypes ($f_i$, $f_j$) for a given network or network set s is defined as:
where $P(f_{i_s})$ is the probability of phenotype $f_i$ occurring in network set s, and $P(f_{i_t})$ is the background probability distribution of phenotype $f_i$ derived from the complete ensemble of networks. Note that the background distributions are based on the entire dataset, in order to compare individual networks to the distribution of tissue slide as a whole. This construction is similar to the position-specific scoring matrices (PSSM) for either DNA or protein sequences, where the background distributions denote the probability of finding any particular nucleotide or amino acid over the dataset of sequences, for any given position. A PMI map consists of the PMI score for every possible pair of patterns in the vocabulary for a given network set s. While we advocate the interpretation of the two-dimensional PMI map for a thorough understanding of heterogeneity, we also derive a one-dimensional heterogeneity score value from the PMI map, for convenience of the reader interested in comparing with other one-dimensional scores in the literature. The information-deficient one-dimensional heterogeneity score is defined as:
where higher scores denote a larger difference from the background distribution. The one-dimensional scores can incorrectly map two spatially different organizations of the TMEs, as seen by their PMI maps, to the same scale.
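As an illustration only, the following sketch computes a PMI map from a spatial network. It assumes the generic, unnormalized form of pointwise mutual information, i.e., the logarithm of the joint probability of two phenotypes co-occurring on an edge of network set s divided by the product of their background probabilities. The exact expression and scaling used in the disclosed embodiment may differ, and the function name `pmi_map` and its arguments are hypothetical.

```python
import math
from collections import Counter


def pmi_map(edges, phenotype, background):
    """Hypothetical helper: PMI for every ordered pair of phenotypes.

    edges:      iterable of (i, j) cell-index pairs from network set s.
    phenotype:  mapping from cell index to phenotype label.
    background: mapping from phenotype label to its background probability,
                estimated over the entire dataset (assumed to be > 0).
    Returns {(a, b): PMI}; pairs never observed on an edge get -inf.
    """
    pair_counts = Counter()
    total = 0
    for i, j in edges:
        a, b = phenotype[i], phenotype[j]
        # Count both orderings so that the resulting map is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
        total += 2
    result = {}
    for a in sorted(background):
        for b in sorted(background):
            joint = pair_counts[(a, b)] / total if total else 0.0
            expected = background[a] * background[b]
            # Assumed generic PMI form: log(joint / product of backgrounds).
            result[(a, b)] = (math.log(joint / expected)
                              if joint > 0 else float("-inf"))
    return result


# Two phenotypes that only ever neighbor themselves.
labels = {0: "A", 1: "A", 2: "B", 3: "B"}
pmi = pmi_map([(0, 1), (2, 3)], labels, {"A": 0.5, "B": 0.5})
```

In this toy case the diagonal entry for ("A", "A") is positive and the off-diagonal entry for ("A", "B") is negative infinity, which corresponds to the strong-diagonal, weak-off-diagonal pattern described for a globally heterogeneous but locally homogeneous tumor.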
[0077] After computing PMI map 170 for the subject slide(s) and identifying significant interactions or interaction motifs, it is necessary to interrogate the cells that contributed to each significant association. An interaction is considered significant when its PMI value is close to ±1. PMI values close to 1 signify that the particular spatial interaction of biomarker patterns occurs more frequently than is observed in the background distribution. PMI values close to −1 signify that, when one pattern is observed in the network, the other pattern is observed less frequently than expected from the background distribution. PMI values close to zero signify interactions that may adequately be described by the background distribution.
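The interrogation step may be sketched as follows, under stated assumptions. The cutoff of 0.9 used to decide that a PMI value is "close to ±1" is a hypothetical value chosen for illustration and is not given in the text, and the helper names `significant_pairs` and `contributing_cells` are likewise illustrative.

```python
def significant_pairs(pmi, cutoff=0.9):
    """Hypothetical helper: phenotype pairs whose |PMI| reaches the cutoff.

    pmi:    mapping {(a, b): PMI value}.
    cutoff: assumed threshold for "close to +/-1"; not specified in the text.
    """
    return sorted(pair for pair, value in pmi.items() if abs(value) >= cutoff)


def contributing_cells(edges, phenotype, pair):
    """Hypothetical helper: cells on edges that link the two phenotypes in pair.

    edges:     iterable of (i, j) cell-index pairs from the spatial network.
    phenotype: mapping from cell index to phenotype label.
    pair:      (a, b) phenotype labels; a may equal b.
    """
    a, b = pair
    cells = set()
    for i, j in edges:
        # Unordered comparison, so (a, b) and (b, a) select the same edges.
        if {phenotype[i], phenotype[j]} == {a, b}:
            cells.update((i, j))
    return sorted(cells)


# Toy PMI values and a chain of four cells.
toy_pmi = {("A", "A"): 0.95, ("A", "B"): -0.97,
           ("B", "A"): -0.97, ("B", "B"): 0.1}
toy_labels = {0: "A", 1: "A", 2: "B", 3: "B"}
pairs = significant_pairs(toy_pmi)
cells = contributing_cells([(0, 1), (1, 2), (2, 3)], toy_labels, ("A", "B"))
```

Listing the contributing cell indices in this way allows the corresponding cells to be located in the original image for visual inspection.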
C. System Implementations
[0078]
[0079] The processor may be, for example and without limitation, a microprocessor (μP), a microcontroller, or some other suitable processing device, that interfaces with the memory. The memory can be any one or more of a variety of types of internal and/or external storage media such as, without limitation, RAM, ROM, EPROM(s), EEPROM(s), FLASH, and the like that provide a storage register, i.e., a machine readable medium, for data storage such as in the fashion of an internal storage area of a computer, and can be volatile memory or nonvolatile memory. The memory has stored therein a number of routines that are executable by the processor, including routines for implementing the disclosed concept as described herein. In particular, processing apparatus 206 includes a quantifying component 208 configured for quantifying local spatial statistics for H&E stained tissue images as described herein based on received image data representing an H&E stained tissue image, an identifying component 210 configured for identifying histological structures within the H&E stained tissue image based on the local spatial statistics as described herein, and a segmented tissue image generating component 212 configured for generating a segmented H&E stained tissue image using the received image data and the identified histological structures, which image may then be provided to display 204. Quantifying component 208 may include one or more components configured for quantifying local spatial statistics by determining mutual information data indicative of statistical associations between neighboring pixels in the H&E image data, and identifying component 210 may include one or more components configured for identifying histological structures by using the mutual information data and a graph-based spectral segmentation algorithm as described herein.
Alternatively, quantifying component 208 may include one or more components for identifying putative nuclei locations from the RGB data in the form of superpixels, building a superpixel graph based on a pointwise distance between each superpixel and a number of its nearest neighbors, and clustering the superpixels into labeled segments, and identifying component 210 may include one or more components configured for identifying histological structures by merging the labeled segments into the histological structures as described herein.
[0080]
[0081] In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word “comprising” or “including” does not exclude the presence of elements or steps other than those listed in a claim. In a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. In any device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain elements are recited in mutually different dependent claims does not indicate that these elements cannot be used in combination.
[0082] Although the invention has been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred embodiments, it is to be understood that such detail is solely for that purpose and that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present invention contemplates that, to the extent possible, one or more features of any embodiment can be combined with one or more features of any other embodiment.