Method for Finding Image Regions that Significantly Influence Classification in a Tool for Pathology Classification in a Medical Image
20230144724 · 2023-05-11
Assignee
- Agfa Healthcare Nv (Mortsel, BE)
- VRVIS Zentrum für Virtual Reality und Visualisierung Forschungs-GmbH (Vienna, AT)
Inventors
- David Major (Mortsel, BE)
- Dimitrios Lenis (Mortsel, BE)
- Maria Wimmer (Mortsel, BE)
- Gert Sluiter (Mortsel, BE)
- Astrid Berg (Mortsel, BE)
- Katja Buehler (Mortsel, BE)
CPC Classification
- G06F2218/00 (PHYSICS)
Abstract
A method for finding image regions that significantly influence classification in a tool for pathology classification in a medical image, wherein image regions that influence classification are inpainted by replacing the pixels in these regions with representations of healthy tissue, resulting in an image with a healthy classification score in the classification tool.
Claims
1. (canceled)
2. A method for inpainting image regions that influence classification in a classification tool for pathology classification in a medical image, by replacing pixels in potentially pathological regions with pixels of healthy tissue, these regions being determined and inpainted by the following steps:
randomly initializing a binary map whose image regions are image regions of said medical image,
overlaying said binary map on said medical image to generate an input image and inputting said input image into a partial convolution network,
inputting the result of said partial convolution network to said classification tool, yielding a new classification score,
optimizing the above-initialized binary map using a loss function fed with (i) said map, (ii) said original image with a corresponding classification score, and (iii) said new classification score, and
updating said binary map in successive steps by eliminating and creating new mask positions for the partial convolution network.
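The claimed loop can be illustrated outside the claim language as a minimal sketch. All function and parameter names here (`optimize_mask`, `flip_frac`, the toy loss) are illustrative assumptions, not terms from the claim; the `classifier` and `inpainter` callables stand in for the classification tool and the partial convolution network.

```python
import numpy as np

def optimize_mask(image, classifier, inpainter, steps=10, flip_frac=0.05, seed=0):
    """Sketch of the claimed loop: start from a random binary map, inpaint
    the masked regions, re-score with the classifier, and keep updates that
    lower a loss combining the new score and the mask size. Illustrative
    only; the patent optimizes a richer loss via gradient descent."""
    rng = np.random.default_rng(seed)
    mask = (rng.random(image.shape) < 0.5).astype(float)  # random binary map
    base_score = classifier(image)                        # original classification score
    best_loss = np.inf
    for _ in range(steps):
        candidate = mask.copy()
        # eliminate and create mask positions at random
        flips = rng.random(image.shape) < flip_frac
        candidate[flips] = 1.0 - candidate[flips]
        inpainted = inpainter(image * candidate, candidate)  # partial-convolution stand-in
        new_score = classifier(inpainted)
        # toy loss: favour a lower (healthier) new score and a small mask
        loss = new_score - base_score + 0.01 * candidate.mean()
        if loss < best_loss:
            best_loss, mask = loss, candidate
    return mask
```

A run with stub callables (e.g. a classifier returning the mean pixel value) returns a binary map of the same shape as the input image.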
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
[0035] The goal of the present invention is to estimate a faithful and informative saliency map between a medical image and its classification score: given an image, we search for and visually attribute the specific pixel-set that contributes towards a confident classification for a fixed class.
[0036] A class represents the label of a distinct group after stratification of all possible diagnoses based on a medical image. Examples could be: pathological/no findings; benign/pathological.
[0043] This result mask is converted into a visual image.
[0044] The general problem to be solved can be formulated as finding the smallest deletion region (SDR) of a class c, i.e. the pixel-set whose marginalization w.r.t. the classifier lowers the classification score for c.
Image-Wise Saliency Mapping:
[0045] Informally, we search for the smallest smooth map that indicates the regions we need to change (inpaint) such that we obtain a sufficiently healthy image able to fool the classifier.
[0046] We formalize the problem as follows:
[0047] Let I denote an image of a domain I with pixels x on a discrete grid m1×m2, c a fixed class, and f a classifier capable of estimating p(c|I), the probability of c for I.
[0048] Also let M_I denote the saliency mask for image I and class c, hence M_I ∈ M^(m1×m2)({0, 1}).
[0049] We use total variation tv(M_I) and size ar(M_I) to measure the mask's shape.
[0050] Total variation is generally defined as tv(M) := Σ_{i,j} (M_{i,j} − M_{i,j+1})² + Σ_{i,j} (M_{i,j} − M_{i+1,j})².
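The total variation definition above can be computed directly with array slicing; this NumPy sketch mirrors the two sums over horizontal and vertical neighbour differences (the function name `tv` follows the text):

```python
import numpy as np

def tv(M):
    """Total variation of a mask M, matching the definition above: the sum
    of squared horizontal plus squared vertical neighbour differences."""
    horiz = np.sum((M[:, :-1] - M[:, 1:]) ** 2)  # Σ (M_{i,j} − M_{i,j+1})²
    vert = np.sum((M[:-1, :] - M[1:, :]) ** 2)   # Σ (M_{i,j} − M_{i+1,j})²
    return horiz + vert
```

A constant mask has tv = 0, while a 2×2 checkerboard [[0,1],[1,0]] has tv = 4 (two horizontal and two vertical unit differences).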
[0051] Note that size here is ambiguous. Experimentally we found dice overlap with regions-of-interest like organ masks to be favourable over the map's average pixel value. With ⊙ denoting elementwise multiplication, and π(M) the inpainting result of a hole image I ⊙ M, we can define φ(M) := −log(p(c|π(M))) and ψ(M) := log(odds(I)) − log(odds(π(M))), where odds(I) = p(c|I)/(1 − p(c|I)). Both φ and ψ weigh the new probability of the inpainted image.
[0052] If we assume class c to denote pathological, then healthy images and large score differences will be favoured. With this preparation we define our desired optimization function as

L(M) := λ1·(φ(M) + ψ(M)) + λ2·tv(M) + λ3·ar(M)

where λi ∈ ℝ are regularization parameters, and search for arg min_M L(M).
[0053] There are two collaborating parts in L.
[0054] The first term enforces the class probability to drop; the latter two emphasize an informative mask. Focusing on medical images, L directly solves the SDR task, thereby minimizing medically implausible and adversarial artefacts caused by inpainting of large classifier-neutral image regions.
[0055] The optimization problem is solved by local search through stochastic gradient descent, starting from a regular grid initialization. By design, no restrictions are applied on the classifier f. For optimization we relax the mask's domain to M^(m1×m2)([0, 1]) and threshold at θ ∈ (0, 1).
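The loss L(M) defined above can be sketched as follows, assuming a `classifier` callable returning p(c|·) and an `inpainter` callable implementing π(M). The λ values, the `eps` guard, and the use of the mask's mean pixel value for ar(M) are assumptions for illustration (the text notes dice overlap with organ masks is actually preferred as the size measure):

```python
import numpy as np

def saliency_loss(M, image, classifier, inpainter, lambdas=(1.0, 0.1, 0.1), eps=1e-8):
    """Sketch of L(M) = λ1·(φ(M) + ψ(M)) + λ2·tv(M) + λ3·ar(M),
    with φ(M) = −log p(c|π(M)) and ψ(M) = log odds(I) − log odds(π(M))."""
    l1, l2, l3 = lambdas
    inpainted = inpainter(image * M, M)               # π(M): inpaint holes where M == 0
    p_orig = classifier(image)
    p_new = classifier(inpainted)
    odds = lambda p: p / max(1.0 - p, eps)
    phi = -np.log(max(p_new, eps))                    # φ(M)
    psi = np.log(odds(p_orig)) - np.log(odds(p_new))  # ψ(M)
    tv = np.sum((M[:, :-1] - M[:, 1:]) ** 2) + np.sum((M[:-1, :] - M[1:, :]) ** 2)
    ar = M.mean()                                     # simple size proxy
    return l1 * (phi + psi) + l2 * tv + l3 * ar
```

In the full method the mask entries are relaxed to [0, 1] and this loss is minimized by stochastic gradient descent, then thresholded at θ.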
Image Inpainting with Partial Convolutions:
[0056] For marginalization, we want to emphasize local context while still considering global joint region interaction, and thereby favor a globally sound anatomy. Therefore, we adapt the U-Net-like architecture described by G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao and B. Catanzaro, "Image inpainting for irregular holes using partial convolutions", in Proceedings of ECCV, 2018, pp. 85-100, which is capable of handling masks with irregular shapes, fitting our optimization requirements for pathological regions of different sizes and shapes. The chosen architecture consists of eight partial convolution layers on both the encoding and decoding parts. It takes an image with holes I ⊙ M and the hole mask M as input, and outputs the inpainted image π(M). The partial convolution layers insert the convolution result of the current sliding convolution window only when image information is present. The convolution filter W is applied on the features X using the binary mask M and yields new features x′ in the following way:

x′ = Wᵀ(X ⊙ M)·(sum(1)/sum(M)) + b, if sum(M) > 0; x′ = 0 otherwise,

where b is the bias term.
[0057] The convolution operation is scaled by sum(1)/sum(M) according to the amount of information available in the current sliding window. Moreover, a new mask m′ is passed to the next layer, updated by setting its values to 1 in the sliding window if sum(M) > 0.
[0058] We train the network with a loss function concentrating both on per-pixel reconstruction performance in the hole and non-hole regions and on the overall appearance of the image.
[0059] To improve the overall appearance, a perceptual loss and a style loss are applied, which match images in a mapped feature space. Total variation is used as a last loss component to ensure a smooth transition between hole regions and present image regions.
Experimental Set-Up
[0060] Dataset: In this work the Digital Database for Screening Mammography (DDSM) as described by M. Heath, K. Bowyer, D. Kopans, R. Moore, and W. P. Kegelmeyer, "The digital database for screening mammography," in Proceedings of the 5th international workshop on digital mammography. Medical Physics Publishing, 2000, pp. 212-218 and the Curated Breast Imaging Subset of DDSM (CBIS-DDSM) as described by R. S. Lee, F. Gimenez, A. Hoogi, and D. Rubin, "Curated Breast Imaging Subset of DDSM [Dataset]," The Cancer Imaging Archive, 2016 were used, downsampled to a resolution of 576×448 pixels. Data was split into 1231 scans containing masses and 2000 healthy samples for training, and into 334 mass and 778 healthy scans for testing. Scans with masses contain pixel-wise ground-truth annotation (GT).
[0061] Image Classifier: The basis of our saliency detection framework is a MobileNet binary classifier that categorizes images as healthy or as a sample with masses. The network was trained on all training scans with a batch size of 4, using the Adam optimizer with a learning rate (lr) of 1e-5 for 250 epochs with early stopping. Rotation, zoom, and horizontal and vertical flips were used for data augmentation. It was pre-trained on approx. 50k 224×224 pixel patches from the same data with the task of classifying background vs. masses.
[0062] Inpainting: The inpainter was trained on the healthy training samples with a batch size of 1 in two phases. The first phase was set up with batch normalization (BN) and lr=1e-5 for 100 epochs, the second without BN in the encoder part and with lr=1e-6 for 50 epochs. For each image up to 400 8×8 pixel holes were generated at random positions, where both single small holes and larger clusters were simulated to mimic configurations during optimization. The inpainter has the task of changing the classification score of an image towards healthy when replacing mass tissue; no considerable change should happen otherwise. To demonstrate that, we computed (i) a ROC curve using the classifier on all test samples without any inpainting, (ii) ROC curves for inpainting only in healthy tissue over 10 runs with randomly sampled holes, and (iii) ROC curves for inpainting of mass tissue in unhealthy scans over 10 runs (cf. the Results section).
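The training-mask generation described above can be sketched as follows; the function name `random_hole_mask` and the mask convention (1 = valid pixel, 0 = hole) are assumptions for illustration:

```python
import numpy as np

def random_hole_mask(shape=(576, 448), max_holes=400, hole=8, seed=0):
    """Sketch of the hole generation described above: up to `max_holes`
    square 8×8 holes at random positions. Larger clusters emerge
    naturally when randomly placed holes overlap or abut."""
    rng = np.random.default_rng(seed)
    mask = np.ones(shape)
    n = int(rng.integers(1, max_holes + 1))   # number of holes this image
    for _ in range(n):
        y = int(rng.integers(0, shape[0] - hole + 1))
        x = int(rng.integers(0, shape[1] - hole + 1))
        mask[y:y+hole, x:x+hole] = 0.0        # punch an 8×8 hole
    return mask
```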
[0063] Saliency Mapping: Parametrization was experimentally chosen based on grid-search, restricted by λi∈[0, 1], for i=1, 2, 3. We found the resulting masks to be especially sensitive to λ2. This smoothness-controlling term balances between noisy result maps and compression-induced information loss. We exemplify this behaviour with an ablation study, where the contributions of smoothing and sizing are set to zero.
[0064] We compared our approach against two established methods based on widespread adoption in medical imaging, namely the methods described by C. F. Baumgartner, L. M. Koch, K. C. Tezcan, J. X. Ang, and E. Konukoglu, "Visual feature attribution using Wasserstein GANs," in Proceedings of CVPR, 2017, pp. 8309-8319 and by P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Y. Ding, A. Bagul, C. Langlotz, K. S. Shpanskaya, M. P. Lungren, and A. Y. Ng, "CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning," CoRR, vol. abs/1711.05225, 2017, and with inherent validity as described by J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim, "Sanity checks for saliency maps," in Proceedings of NIPS, 2018, pp. 9505-9515.
[0065] We chose the gradient-based Saliency Map (SAL) and the network-derived CAM visualizations. As our domain prohibits the utilization of blurring, noise, etc., we could not test meaningfully against reference-based methods as described in L. M. Zintgraf, T. S. Cohen, T. Adel, and M. Welling, "Visualizing deep neural network decisions: Prediction difference analysis," in Proceedings of ICLR, 2017, in R. C. Fong and A. Vedaldi, "Interpretable explanations of black boxes by meaningful perturbation," in Proceedings of ICCV, 2017, pp. 3429-3437, and in P. Dabkowski and Y. Gal, "Real time image saliency for black box classifiers," in Proceedings of NIPS, 2017, pp. 6967-6976.
[0066] For evaluation, four measures were compared: (i) the average of distances between the centres of the GT mask's and the result mask's (RM) connected components (D), (ii) average Hausdorff distances (H) between GT masks and RM, (iii) the ratio between the derived RM and the breast masks (A), indicating map sizes, and, related, (iv) overlap coefficients (O) between our RM and those of CAM and SAL.
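Two of the measures above can be sketched for binary masks as follows. The overlap coefficient is the standard |A ∩ B| / min(|A|, |B|); for the average Hausdorff distance a common symmetric variant is assumed, since the patent does not fix the exact form:

```python
import numpy as np

def overlap_coefficient(A, B):
    """Overlap coefficient O between two binary masks:
    |A ∩ B| / min(|A|, |B|)."""
    A, B = A.astype(bool), B.astype(bool)
    inter = np.logical_and(A, B).sum()
    return inter / min(A.sum(), B.sum())

def avg_hausdorff(A, B):
    """Average Hausdorff distance H between the pixel sets of two binary
    masks: mean nearest-neighbour distance from A to B and from B to A,
    averaged (one common variant, assumed here)."""
    pa = np.argwhere(A.astype(bool))
    pb = np.argwhere(B.astype(bool))
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

Identical masks give O = 1 and H = 0; disjoint masks give O = 0 and an H reflecting their separation.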
Results and Conclusions:
[0067] Inpainting Evaluation: The ROC curves compare classification without any inpainting, with inpainting restricted to healthy tissue, and with inpainting of mass tissue, following the three experiments described above.
[0068] Saliency Evaluation: Quantitatively, our framework yields saliency masks significantly closer to the GT masks based on centre distances D, regardless of the chosen mask thresholds.
[0069] Qualitatively, the resulting saliency maps are depicted in the corresponding figures.