Similar image search for radiology
11126649 · 2021-09-21
Assignee
Inventors
- Krishnan Eswaran (San Francisco, CA, US)
- Shravya Shetty (San Francisco, CA, US)
- Daniel Shing Shun Tse (Mountain View, CA, US)
- Shahar Jamshy (Santa Clara, CA, US)
- Zvika Ben-Haim (Haifa, IL)
Cpc classification
G06F16/434
PHYSICS
G06F17/18
PHYSICS
G16H50/70
PHYSICS
International classification
G06F17/00
PHYSICS
G06F7/00
PHYSICS
G06F17/18
PHYSICS
Abstract
A computer-implemented system is described for identifying and retrieving similar radiology images to a query image. The system includes one or more fetchers receiving the query image and retrieving a set of candidate similar radiology images from a data store. One or more scorers receive the query image and the set of candidate similar radiology images and generate a similarity score between the query image and each candidate image. A pooler receives the similarity scores from the one or more scorers, ranks the candidate images, and returns a list of the candidate images reflecting the ranking. The scorers implement a modelling technique to generate the similarity score capturing a plurality of similarity attributes of the query image and the set of candidate similar radiology images and annotations associated therewith. For example, the similarity attributes could be patient, diagnostic and/or visual similarity, and the modelling techniques could be triplet loss, classification loss, regression loss and object detection loss.
Claims
1. A computer-implemented system for identifying clinically useful similar radiology images to a query image, comprising: one or more fetchers receiving the query image and retrieving a set of candidate similar radiology images from a data store; one or more scorers receiving the query image and the set of candidate similar radiology images and generating a similarity score between the query image and each candidate image; and a pooler receiving the similarity scores from the one or more scorers, ranking the candidate images, and returning a list of the candidate images reflecting the ranking, wherein the one or more scorers implement a modelling technique to generate the similarity score capturing a plurality of similarity attributes of the query image and the set of candidate similar radiology images and annotations associated therewith, wherein the attributes of the query image and the set of candidate similar radiology images captured by the similarity score including diagnostic, visual, and patient demographic attributes, and wherein: (i) the system further includes a processing unit which aggregates information from the annotations associated with the set of candidate similar radiology images, the annotations comprise text-based radiology reports, and the processing unit groups images in the set of candidate similar radiology images by relevant common text from text-based radiology reports; or (ii) the system further includes the processing unit which aggregates information from the annotations associated with the set of candidate similar radiology images and the processing unit groups images in the set of candidate similar radiology images by the presence or absence of enumerated conditions in the annotations; or (iii) the modelling technique implemented in the one or more scorers is trained to determine whether the query image is from the same patient as each candidate image using a data set that includes images of a single patient over time.
2. The system of claim 1, wherein the one or more fetchers implement a modelling technique capturing a plurality of attributes of the query image and the set of candidate similar radiology images and annotations associated therewith to retrieve the set of candidate similar radiology images.
3. The system of claim 2, wherein the modelling technique implemented in the one or more fetchers comprises triplet loss, classification loss, regression loss, or object detection loss.
4. The system of claim 2, wherein there are at least two fetchers and each uses a different modelling technique.
5. The system of claim 2, wherein there are at least two scorers and each uses a different modelling technique.
6. The system of claim 1, wherein the modelling technique implemented in the one or more scorers comprises triplet loss, classification loss, regression loss, or object detection loss.
7. The system of claim 1, wherein the pooler ranks the candidate images using a logistic regression model with weighted sum of scores.
8. The system of claim 1, wherein the pooler ranks the candidate images using a generalized additive model.
9. The system of claim 1, wherein the pooler ranks the candidate images using a neural network based on scores as input.
10. The system of claim 1, wherein the system further includes the processing unit which aggregates information from the annotations associated with the set of candidate similar radiology images.
11. The system of claim 10, wherein the processing unit groups images in the set of candidate similar radiology images across common attributes that are useful for supporting a clinical decision.
12. The system of claim 10, wherein the annotations comprise text-based radiology reports, and wherein the processing unit groups images in the set of candidate similar radiology images by relevant common text from text-based radiology reports.
13. The system of claim 12, wherein the processing unit aggregates the groupings into numerical values and a comparison of the numerical values to a baseline.
14. The system of claim 10, wherein the processing unit groups images in the set of candidate similar radiology images by the presence or absence of enumerated conditions in the annotations.
15. The system of claim 1, further comprising a front end in the form of a workstation configured to display the query image, the candidate similar radiology images, and metadata associated with each of the candidate similar radiology images.
16. The system of claim 15, wherein the metadata comprises radiology reports or excerpts thereof, clinical decisions made, classification of diseases or conditions associated with the similar radiology image, or information relating to a grouping or aggregation of data associated with the candidate similar radiology images.
17. The system of claim 1, wherein the modelling technique implemented in the one or more scorers is trained to determine whether the query image is from the same patient as each candidate image using the data set that includes images of the single patient over time.
18. The system of claim 1, wherein the patient demographic attributes comprise an age, a gender, an ethnicity, a smoking history, a body mass index, a height, or a weight.
19. A method for identifying and retrieving clinically useful similar radiology images to a query radiology image, the query image associated with annotations including metadata, comprising: curating a data store of ground truth annotated radiology images, each of the radiology images associated with annotations including metadata; receiving the query image and retrieving a set of candidate similar radiology images from the data store; and generating a similarity score between the query image and each candidate similar radiology image using at least two different scoring modules, wherein the at least two scoring modules implement a different modelling technique to generate the similarity score capturing a plurality of similarity attributes of the query image and the set of candidate similar radiology images and the annotations associated therewith, wherein the attributes of the query image and the set of candidate similar radiology images captured by the similarity score includes diagnostic, visual, and patient demographic attributes, and wherein: (i) the method further comprises: aggregating, by a processing unit, information from the annotations associated with the set of candidate similar radiology images, wherein the annotations comprise text-based radiology reports; and grouping, by the processing unit, images in the set of candidate similar radiology images by relevant common text from text-based radiology reports; or (ii) the method further comprises: aggregating, by the processing unit, information from the annotations associated with the set of candidate similar radiology images; and grouping, by the processing unit, images in the set of candidate similar radiology images by the presence or absence of enumerated conditions in the annotations; or (iii) at least one of the two different scoring modules comprises a modelling technique trained to determine whether the query image is from the same patient as each candidate image using a data set that includes images of a single patient over time.
20. The method of claim 19, further comprising: ranking the candidate similar radiology images; and returning a list of the candidate similar radiology images reflecting the ranking and aggregated information obtained from the annotations associated with the set of candidate similar radiology images.
21. The method of claim 20, wherein ranking the candidate similar radiology images comprises ranking the candidate similar radiology images using a logistic regression model with weighted sum of scores.
22. The method of claim 20, wherein ranking the candidate similar radiology images comprises ranking the candidate similar radiology images using a generalized additive model.
23. The method of claim 20, wherein ranking the candidate similar radiology images comprises ranking the candidate similar radiology images using a neural network based on similarity scores as input.
24. The method of claim 19, wherein receiving the query image and retrieving the set of candidate similar radiology images from the data store comprises implementing a modelling technique capturing a plurality of attributes of the query image and the set of candidate similar radiology images and annotations associated therewith to retrieve the set of candidate similar radiology images.
25. The method of claim 24, wherein the modelling technique comprises triplet loss, classification loss, regression loss, or object detection loss.
26. The method of claim 19, wherein the modelling techniques comprises triplet loss, classification loss, regression loss, or object detection loss.
Description
DETAILED DESCRIPTION
(13) This document describes a computer-implemented system for identifying similar radiology images to a query image. The system can be considered a tool for assisting a medical professional such as a radiologist, ER doctor, or primary care physician in arriving at a diagnosis for a patient based on a radiology image of the patient, such as a chest X-ray, mammogram, or CT scan. The system provides diagnostically useful output information to a user based upon an input image.
(14) The general idea of how the system works is illustrated in
(15) the medical professional uses the results in addition to other diagnostic procedures and methods to generate the clinical findings. Note that all findings for an image may not be clinically relevant to a specific action/plan; here, we are referring in
(16)
(17) Once the set of similar radiology images has been identified, relevant information is returned to the user from this set. This would normally include not only the images themselves, but also metadata associated with each of the images, such as radiology reports, clinical decisions made (e.g., prescribing of antibiotics, diuretics), classification of diseases/conditions associated with the similar image, and information relating to a grouping/aggregation of these results. That aggregation could include clustering together image results with similar properties, generating pivot tables summarizing the prevalence of certain conditions/diagnoses in the images, as well as indicating the prevalence of common phrases within the radiology reports. Examples of these kinds of aggregations will be explained later in this document.
(18)
(19) The objects in the back end can be roughly divided into two categories:
(20) (a) Objects that control the state machine of the back end 400:
(21) Controller 402: an object that receives queries from outside the back end (e.g., the front end 300 of
(22) Dispatcher 404: an object that distributes a query between several different fetchers 406 and scorers 408, then collates the results using a pooler 410. The dispatcher sends the candidate images and the queried image to a set of scorers in parallel, fetches the results, and passes the resulting scores to the pooler 410 for ranking.
(23) (b) Objects that perform specific operations required to identify and retrieve the similar images:
(24) (1) Fetcher(s) 406—an object that receives a query image 200 and generates a set of candidate similar images by querying a data store (not shown in
(25) (2) Scorer(s) 408—an object that receives a query image and a set of candidate images and returns a similarity score between the query image and each candidate image. In preferred embodiments, there are two or more scorers. As will be explained below, the scorers implement a modelling technique to generate the similarity score capturing a plurality of similarity attributes of the query image and the set of candidate similar radiology images and annotations associated therewith, such as diagnostic, visual and patient similarity. If there are multiple scorers, each implements a different modelling technique.
(26) (3) Pooler 410—an object that receives scoring results from several different scorers or fetchers, collated by the dispatcher 404, and returns a single list of the combined results. The pooler ranks the candidate images (e.g., on the basis of acuteness/severity), and returns a list of the candidate images reflecting the ranking.
(27) The software architecture of
(28) The software architecture of
(29) The similarity scores and candidate set of similar images are then returned to the dispatcher 404 and then supplied to the pooler 410, which then ranks the candidate set of similar images using the scores. The pooler then returns the ranked images as results 204 (again, preferably with aggregation information, statistics, groupings, metadata, etc. as described in detail elsewhere).
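The flow from fetchers to scorers to pooler described above can be sketched in a few lines. This is only an illustrative sketch, not the patented implementation: the `Image` type, the `dispatch` and `mean_score_pooler` functions, the toy data store, and the two distance-based scorers are all hypothetical names introduced here, and the scorers run sequentially rather than in parallel.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Image:
    image_id: str
    features: Tuple[float, ...]  # stand-in for pixel data or an embedding


def dispatch(query, fetchers, scorers, pooler):
    """Run every fetcher, score the merged candidates, then pool the scores."""
    candidates: Dict[str, Image] = {}
    for fetch in fetchers:
        for img in fetch(query):
            candidates[img.image_id] = img  # de-duplicate by image id
    cand_list = list(candidates.values())
    # Each scorer returns {image_id: score}; higher means more similar.
    score_tables = [score(query, cand_list) for score in scorers]
    return pooler(score_tables)


def mean_score_pooler(score_tables):
    """Rank candidate ids by their mean score across all scorers."""
    ids = score_tables[0].keys()
    mean = {i: sum(t[i] for t in score_tables) / len(score_tables) for i in ids}
    return sorted(mean, key=mean.get, reverse=True)


# Hypothetical toy data store and components.
STORE = [Image("a", (0.0, 0.0)), Image("b", (1.0, 1.0)), Image("c", (5.0, 5.0))]


def fetch_all(query):
    return list(STORE)


def neg_squared_distance(query, cands):
    return {c.image_id: -sum((x - y) ** 2 for x, y in zip(query.features, c.features))
            for c in cands}


def neg_l1_distance(query, cands):
    return {c.image_id: -sum(abs(x - y) for x, y in zip(query.features, c.features))
            for c in cands}


ranking = dispatch(Image("q", (0.9, 1.2)), [fetch_all],
                   [neg_squared_distance, neg_l1_distance], mean_score_pooler)
print(ranking)
```

Keeping the fetchers, scorers, and pooler as interchangeable callables mirrors the text's point that each component can implement a different modelling technique.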
(30)
(31)
(32)
(33) Having described the overall architecture and various possible configurations of the architecture in
(34) Fetchers 406
(35) As explained previously, the fetcher receives the query image and retrieves a set of candidate similar radiology images from a data store in the form of a library of ground truth annotated reference radiology images. The data store can be curated, i.e., developed and maintained, by obtaining ground truth annotated radiology images from publicly available or private sources, or by obtaining images from public or private sources and adding the ground truth annotations with the use of trained readers.
(36) The fetcher can take the form of a trained deep convolutional neural network or classifier, optionally with filters, e.g., to include only (or to exclude) certain images, such as those that are positive for a particular condition present in the query image. The fetcher can also include a function to first classify the query image (e.g., determine that it is positive for pneumothorax) and use that classification to filter the similar images to only those that have a ground truth annotation of pneumothorax. The fetcher could take several forms and could for example be configured in accordance with one of the following references, the content of which is incorporated by reference herein: C. Szegedy et al., Going Deeper with Convolutions, arXiv:1409.4842 [cs.CV] (September 2014); C. Szegedy et al., Rethinking the Inception Architecture for Computer Vision, arXiv:1512.00567 [cs.CV] (December 2015); see also US patent application of C. Szegedy et al., “Processing Images Using Deep Neural Networks”, Ser. No. 14/839,452 filed Aug. 28, 2015. A fourth generation, known as Inception-v4, is considered as another possible architecture. See C. Szegedy et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv:1602.07261 [cs.CV] (February 2016). See also US patent application of C. Vanhoucke, “Image Classification Neural Networks”, Ser. No. 15/395,530 filed Dec. 30, 2016, and PCT application serial no. PCT/US2017/019051 filed Feb. 23, 2017.
(37) These candidate images may or may not already be associated with scores. For example, in one possible configuration, scores to similar images may be pre-assigned, and the fetcher may make use of pre-cached similar images to retrieve candidate similar images.
(38) In one embodiment, one or more of the fetchers could be configured as a pre-cached fetcher. In a pre-cached fetcher, the similar candidate images for a given query image have been precomputed. The precomputing of similar images could use any suitable technique.
(39) The fetchers can use various different modelling techniques to determine similarity of images, and such modelling techniques are described in more detail in the discussion of the scorers. Such modelling techniques can include triplet loss, classification loss, regression loss and object detection loss.
(40) Scorers 408
(41) As noted above, the system uses one or more scorers which receive the query image and the set of candidate similar radiology images (identified by the fetcher) and generate a similarity score between the query image and each candidate image, using the image data as well as underlying annotations (image metadata, reports, patient information etc.) associated with the images. The score can be computed for example based on a pre-computed embedding and a standard distance metric (e.g., cosine or Euclidean distance) in the embedding space. For example, the scorer looks up the embedding of an image in a database and then uses a distance measure in the embedding space. See the discussion of
(42) The scorers implement a modelling technique to generate the similarity score that can capture similarity on many different axes (e.g., diagnostic, visual, patient, etc.). Diagnostic, visual and patient attributes are some of the many signals that could be important on specific axes of similarity, but these three are not meant to be an exhaustive list. A number of different modelling techniques are contemplated, and in a typical implementation where there is more than one scorer, each will use a different modelling technique that captures these different attributes of similarity (e.g., diagnostic, visual and patient).
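The embedding-based scoring mentioned above (look up a pre-computed embedding, then apply a cosine or Euclidean distance) might look like the following minimal sketch. The `EMBEDDINGS` table, its values, and the function names are hypothetical; a real scorer would read embeddings from a database and would be one of several scorers.

```python
import math

# Hypothetical pre-computed embedding table: image id -> embedding vector.
EMBEDDINGS = {
    "query": [1.0, 0.0, 1.0],
    "cand_1": [0.9, 0.1, 1.1],
    "cand_2": [-1.0, 0.5, 0.0],
}


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def score_candidates(query_id, candidate_ids, distance=cosine_distance):
    """Return {candidate id: similarity score}; higher means more similar."""
    q = EMBEDDINGS[query_id]
    # Negate the distance so that a smaller distance gives a larger score.
    return {cid: -distance(q, EMBEDDINGS[cid]) for cid in candidate_ids}


cosine_scores = score_candidates("query", ["cand_1", "cand_2"])
euclid_scores = score_candidates("query", ["cand_1", "cand_2"], euclidean_distance)
print(cosine_scores, euclid_scores)
```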
(43) In modeling similarity, one configuration of the scorers develops various signals in parallel that capture diagnostic, visual and patient similarity. The output from these signals will be either image embeddings that capture the similarity signal or a similarity score for every candidate image. The scoring module is responsible for combining the various signals and for the final scoring and ranking of the candidate images. Some proposals for similarity models include the following:
(44) Diagnostic Similarity
(45) (1) Utilize the corresponding report text based similarity to generate diagnostic similarity image triples. For instance, utilize the natural language processing (NLP) report extraction embeddings to capture report similarity; the images corresponding to these reports give us training examples for diagnostic image similarity. Since the similarity is based on the entire content of the radiology report, these examples will capture all diagnostic conditions and not focus on a subset.
(46) (2) Utilize the embeddings from existing X-ray classification models built for conditions like nodules, pneumothorax, opacity, etc. These are reasonably well-performing models and a similarity based on the top few layers of these models should capture diagnostic similarity.
(47) Diagnostic+Location Similarity
(48) (1) Use a patch detection approach to identify small abnormalities (e.g., nodules) along with their locations. Given an input image with a small abnormality, automatically identify the abnormality and its location, and retrieve images with similar abnormalities at similar locations, highlighting the abnormalities in both input and retrieved images.
(49) (2) Retrain a classifier (e.g., see J. Wang et al., Learning Fine-grained Image Similarity with Deep Ranking, arXiv:1404.4661 [cs.CV] (2014)) using patch based image triples from a training image data set. A scoring schema could be as follows: same abnormality from the same location > same abnormality from a different location > different abnormality from the same location > all others.
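The four-level scoring schema above can be turned into training triples in a straightforward way. The sketch below is an assumption-laden illustration: the dictionary fields `abnormality` and `location`, the integer rank values, and the function names are hypothetical, and it emits a (query, positive, negative) triple whenever one candidate ranks strictly above another.

```python
def schema_rank(query, candidate):
    """Map a candidate patch to the four-level schema (3 is most similar)."""
    same_abnormality = query["abnormality"] == candidate["abnormality"]
    same_location = query["location"] == candidate["location"]
    if same_abnormality and same_location:
        return 3
    if same_abnormality:
        return 2
    if same_location:
        return 1
    return 0


def make_triplets(query, candidates):
    """Return (query id, positive id, negative id) for every strictly ordered pair."""
    triplets = []
    for pos in candidates:
        for neg in candidates:
            if schema_rank(query, pos) > schema_rank(query, neg):
                triplets.append((query["id"], pos["id"], neg["id"]))
    return triplets


query_patch = {"id": "q", "abnormality": "nodule", "location": "upper_left"}
candidate_patches = [
    {"id": "c1", "abnormality": "nodule", "location": "upper_left"},
    {"id": "c2", "abnormality": "nodule", "location": "lower_right"},
    {"id": "c3", "abnormality": "opacity", "location": "upper_left"},
    {"id": "c4", "abnormality": "opacity", "location": "lower_right"},
]
triplets = make_triplets(query_patch, candidate_patches)
print(len(triplets), triplets[0])
```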
(50) Demographic and Patient Similarity
(51) (1) Models to identify if two X-rays belong to same person or not. A data set that includes longitudinal X-rays of a given patient gives us multiple images for the same person over time; use this to build a training set of same person vs not same person and the models can be trained over pairs or triplets to classify same person or not.
(52) (2) Generate demographic similarity triplets using the fields in a training data set person table like age, gender, ethnicity, smoking history, BMI (body mass index), height, weight, etc. Derive heuristics for how to rank these characteristics to generate the training data.
(53) Visual Similarity
(54) (1) Use a deep CNN image classifier such as that described in the J. Wang et al., Learning Fine-grained Image Similarity with Deep Ranking paper, supra. Or use the classifier with NCA (network component analysis) for feature selection using the X-ray data.
(55) (2) Retrain the classifier of (1) using triples generated for demographic, patient and diagnostic similarity.
(56) Abnormality Similarity
(57) (1) Train a Normal vs Abnormal image classifier. Use or develop a training data set that provides abnormal labels and is comprehensive. In one configuration, one can build a report extractor for normal vs abnormal from free text reports in the annotations and use the corresponding images to generate the classifier.
(58) (2) Use cycle generative adversarial networks (GANs) to identify abnormal regions. Generate an abnormality vector for each image in a training data set with abnormality type and one of 16 abnormality locations. Train a classifier for pairs of images that predicts abnormality vector similarity, effectively making images with the abnormalities in the same location more similar.
(59) As noted above in the discussion of
(60) Triplet Loss
(61) This is a technique, described in the literature, that allows us to handle our heterogeneous data consistently in a way that notionally captures similarity. Specifically, suppose we have three images: a query image and two candidate images. If we know that the query image is closer to one of the candidate images (the positive) than it is to the other (the negative), then we expect the distance between the extracted features of the positive pair (query and positive candidate) to be smaller than the distance between the extracted features of the query and the negative candidate. The triplet loss is thus the difference between these two distances. Thus, triplet losses are a way of comparing images by creating an ordering of some of the images, e.g. for a distance function D(.,.), saying that D(queryImage, image1)<D(queryImage, image2).
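A minimal sketch of a triplet loss follows. The text only says the loss is the difference between the two distances; the hinge at zero and the `margin` parameter used here are assumptions borrowed from the common formulation in the literature, and squared Euclidean distance is just one possible choice of D.

```python
def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(query, positive, negative, margin=0.0):
    """Hinged difference D(query, positive) - D(query, negative) + margin.

    The loss is zero once the positive is closer than the negative by at
    least `margin` (an assumed hyperparameter, not taken from the text).
    """
    d_pos = squared_euclidean(query, positive)
    d_neg = squared_euclidean(query, negative)
    return max(0.0, d_pos - d_neg + margin)


query_emb = [0.0, 0.0]
near_emb = [1.0, 0.0]
far_emb = [3.0, 0.0]

satisfied = triplet_loss(query_emb, near_emb, far_emb, margin=0.5)
violated = triplet_loss(query_emb, far_emb, near_emb, margin=0.5)
print(satisfied, violated)
```

During training the loss would be summed over many triplets and minimized with respect to the parameters of the feature extractor.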
(62) Any notion of distance can be turned into a triplet loss. The Hamming Distance is one way to construct such an ordering, by saying that images that have more of the same conditions (similar medical conditions, similar demographic information, localizable abnormalities appear in the same region, etc.) are more similar than those that have fewer.
(63) More formally, we can encapsulate an evaluation metric as a distance function
(64)
where
d_H(\cdot, \cdot) is the Hamming distance
\rho(\cdot, \cdot): \{conditions\} \times \{images\} \to \{0, 1\}
\rho(c, u) = 1 iff image u exhibits the condition c.
\pi(\cdot, \cdot, \cdot): \{image regions\} \times \{localizable conditions\} \times \{images\} \to \{0, 1\}
\pi(r, c, u) = 1 iff image u exhibits condition c in region r.
Here, condition is used loosely to capture both medical abnormalities as well as demographic information.
We say an image t is more similar to image u than to image v if
d(t, u) < d(t, v)
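The formula for the distance function d is not reproduced in this text. As one hedged reading of the definitions above, the sketch below assumes that d(u, v) is the sum of two Hamming distances: one over the condition indicators ρ(c, ·) and one over the region/condition indicators π(r, c, ·). The condition and region vocabularies and the image records are hypothetical.

```python
# Hypothetical vocabularies; "condition" includes demographic attributes.
CONDITIONS = ["pneumothorax", "nodule", "smoker"]
LOCALIZABLE = ["pneumothorax", "nodule"]
REGIONS = ["left_lung", "right_lung"]


def rho(c, u):
    """1 iff image u exhibits condition c."""
    return 1 if c in u["conditions"] else 0


def pi(r, c, u):
    """1 iff image u exhibits condition c in region r."""
    return 1 if (r, c) in u["findings"] else 0


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def d(u, v):
    """Assumed form: condition Hamming distance plus localized Hamming distance."""
    cond_u = [rho(c, u) for c in CONDITIONS]
    cond_v = [rho(c, v) for c in CONDITIONS]
    loc_u = [pi(r, c, u) for r in REGIONS for c in LOCALIZABLE]
    loc_v = [pi(r, c, v) for r in REGIONS for c in LOCALIZABLE]
    return hamming(cond_u, cond_v) + hamming(loc_u, loc_v)


t = {"conditions": {"pneumothorax", "smoker"}, "findings": {("left_lung", "pneumothorax")}}
u = {"conditions": {"pneumothorax"}, "findings": {("left_lung", "pneumothorax")}}
v = {"conditions": {"nodule"}, "findings": {("right_lung", "nodule")}}

print(d(t, u), d(t, v))  # t is more similar to u than to v when d(t, u) < d(t, v)
```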
(65) The Hamming Distance is not the only way to construct such an ordering. Some alternatives include:
(66) a) Images taken of the same patient that are closer in time are more similar than images taken of the same patient that are farther apart.
b) A medical practitioner provides their own subjective ordering of some of the images.
(67) c) Chest X-Ray images with an associated radiology report text, projected to a common embedding space, are more similar to each other than the original X-ray with a radiology report associated with a different Chest X-ray image.
(68) d) Chest X-Ray images with a follow-up chest CT, projected to a common embedding space, are more similar to each other than the original chest X-Ray with a chest CT that followed up a different Chest X-ray.
(69) e) All permutations of c) and d) swapping the positions of radiology report, Chest X-ray, and chest CT.
(70) Classification Loss
(71) There are other methods for modelling similarity as alternative to triplet loss. One is classification loss. Specifically, we could directly train classifiers for certain conditions. Classification loss can take several forms. One is cross-entropy loss, or log loss, which measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. The details are known in the art and described in the literature, for example in the tutorial http://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html.
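For a single binary condition, the cross-entropy (log) loss described above can be written out directly. This is a generic sketch of the standard formula; the function name, the `eps` clipping constant, and the example labels and probabilities are illustrative assumptions.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean log loss for labels in {0, 1} and predicted probabilities in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1]  # e.g., pneumothorax present / absent / present
confident_and_right = binary_cross_entropy(labels, [0.9, 0.1, 0.8])
confident_and_wrong = binary_cross_entropy(labels, [0.1, 0.9, 0.2])
print(confident_and_right, confident_and_wrong)
```

The loss grows as the predicted probability diverges from the label, so the second call yields the larger value.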
(72) Regression Loss
(73) This is another alternative to the triplet loss technique. If we have an embedding for an associated image modality, e.g. a radiology report associated with a chest X-Ray or a chest CT associated with the same chest X-ray, we can formulate it as a regression problem. The idea here is to predict the report embedding vector directly from the image, modeling it as a regression problem. To the extent the report embedding accurately captures similarity, then a good regression model on the image would also capture similarity.
(74) The simplest notion of regression is one-dimensional linear regression, which corresponds to finding the slope and intercept so that one can map an input feature to an output value, e.g.
y=mx+b,
where, given examples of (x_i, y_i) pairs, we find the slope m and intercept b that minimize some loss, e.g. squared error:
min_{m,b} \sum_i (y_i − (m x_i + b))^2
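For the one-dimensional case, the minimizer of the squared error has a well-known closed form, sketched below. The function name and the sample points are illustrative only.

```python
def fit_line(xs, ys):
    """Return (m, b) minimizing sum_i (y_i - (m * x_i + b))^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    b = mean_y - m * mean_x
    return m, b


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```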
(75) We can generalize this idea if we have some function ƒ to extract features from a report as well as another function g to extract features from an image. The outputs could be vectors, so there could be an equation
ƒ(report)=Mg(image)+b, where M is a matrix, and b is a vector.
(76) Furthermore, if g is a neural network, we can not only adjust the value of M and b for fixed ƒ and g, but also update the value of g over time as well. If the output dimension of g is the same as that of ƒ, it turns out that this is equivalent to making M an identity matrix and b a zero vector, so given example pairs (report_i, image_i), we can solve a regression problem by minimizing some loss, e.g. squared error: min_{g} \sum_i (ƒ(report_i) − g(image_i))^2.
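The idea of training g directly so that g(image_i) matches ƒ(report_i) can be illustrated with a deliberately tiny stand-in. In this sketch g is a single linear layer with weights `W` rather than a deep network, the report embeddings and image features are made-up numbers, and plain gradient descent with a hand-picked learning rate is an assumption.

```python
def g(W, image):
    """Toy image model: one linear layer mapping image features to an embedding."""
    return [sum(w * x for w, x in zip(row, image)) for row in W]


def squared_error(W, pairs):
    """sum_i ||f(report_i) - g(image_i)||^2 over (report embedding, image) pairs."""
    return sum((t - o) ** 2
               for target, image in pairs
               for t, o in zip(target, g(W, image)))


def gradient_step(W, pairs, lr):
    """One gradient-descent update of W on the squared error."""
    rows, cols = len(W), len(W[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for target, image in pairs:
        out = g(W, image)
        for r in range(rows):
            residual = target[r] - out[r]
            for c in range(cols):
                grad[r][c] += -2.0 * residual * image[c]
    return [[W[r][c] - lr * grad[r][c] for c in range(cols)] for r in range(rows)]


# Hypothetical (f(report_i), image_i features) pairs.
PAIRS = [([2.0, 1.0], [1.0, 0.0]), ([0.0, -1.0], [0.0, 1.0]), ([2.0, 0.0], [1.0, 1.0])]

W = [[0.0, 0.0], [0.0, 0.0]]
initial_loss = squared_error(W, PAIRS)
for _ in range(200):
    W = gradient_step(W, PAIRS, lr=0.1)
final_loss = squared_error(W, PAIRS)
print(initial_loss, final_loss)
```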
(77) Object Detection Loss
(78) Object detection loss is another modelling technique for capturing similarity. One might note that if a pneumothorax is found in the same part of a candidate image as the query, those images might be closer to one another. If the existence, size, or location of elements within an image are important for determining similarity, e.g. the position of the carina and the tip of an ET tube to determine whether the ET tube is correctly placed, or the location and size of a pulmonary nodule, then we can formulate it as an object detection problem (object detection loss, e.g. intersection over union).
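Intersection over union, mentioned above as an example object detection measure, compares a predicted bounding box with a reference box. The sketch below assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the box format and the example coordinates are illustrative.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# e.g., a nodule box in the query image vs. one in a candidate image
overlapping = intersection_over_union((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
disjoint = intersection_over_union((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
print(overlapping, disjoint)
```

A loss can then be taken as 1 minus the IoU, so that better-aligned boxes give a smaller loss.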
(79) Therefore, if one knows where in an image a condition is, that could be used to model similarity. Attention mechanisms give us the capability to do this. The technique of Integrated Gradients can be used, as an example of an attention mechanism. Attention mechanisms, such as Integrated Gradients, are machine learning tools which basically identify those portions of the data set that contribute the most to the model predictions. These portions of the X-ray or CT scan data set can then be highlighted by adding bounding boxes in the images enclosing abnormal tissue or tumors identified from the attention mechanism. The Integrated Gradients algorithm is described in the paper of M. Sundararajan et al., Axiomatic Attribution for Deep Networks, arXiv:1703.01365 [cs.LG] (June 2017), the entire content of which is incorporated by reference. The methodology will be described conceptually in the context of attribution of individual pixels in an image in a classification of the overall image. Basically, an Integrated Gradients score IGi (or attribution weight or value) for each pixel i in the image is calculated over a uniform scaling (α) of the input image information content (spectrum of brightness in this example) from a baseline (zero information, every pixel black, α=0), to the full information in the input image (α=1), where IGi (score for each pixel) is given by equation (1)
IG_i(image) = image_i \cdot \int_0^1 \nabla F_i(\alpha \cdot image) \, d\alpha   (1)
where F is a prediction function for the label;
image_i is the RGB intensity of the i-th pixel;
IG_i(image) is the integrated gradient w.r.t. the i-th pixel, i.e., the attribution for the i-th pixel; and
\nabla is the gradient operator with respect to image_i.
(80) Section 3 of the Sundararajan et al. paper explains the algorithm further and that description is incorporated by reference.
(81) The use of attention mechanisms in deep learning neural networks is described in the conference presentation of D. Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, September 2014 (arXiv:1409.0473 [cs.CL]). Further explanations of attention mechanisms in the context of healthcare include Choi et al., GRAM: Graph-based attention model for Healthcare Representation Learning, arXiv:1611.07012v3 [cs.LG] April 2017 and Choi et al., RETAIN: an Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism, arXiv:1608.05745v3 [cs.LG] February 2017.
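Equation (1) is usually approximated numerically by averaging gradients at evenly spaced values of α between the black baseline and the input. The sketch below uses a midpoint Riemann sum with a toy quadratic model whose gradient is known in closed form; the model, its `WEIGHTS`, the `steps` value, and the three-"pixel" input are all hypothetical stand-ins for a trained network and a real image.

```python
def integrated_gradients(image, grad_fn, steps=50):
    """Approximate IG_i = image_i * integral_0^1 dF/dx_i(alpha * image) d(alpha)."""
    n = len(image)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        scaled = [alpha * x for x in image]
        grad = grad_fn(scaled)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    return [image[i] * avg_grad[i] for i in range(n)]


# Toy prediction function F(x) = sum_i w_i * x_i^2 and its exact gradient.
WEIGHTS = [0.5, -1.0, 2.0]


def toy_model(x):
    return sum(w * v * v for w, v in zip(WEIGHTS, x))


def toy_gradient(x):
    return [2.0 * w * v for w, v in zip(WEIGHTS, x)]


pixels = [1.0, 2.0, 0.5]
attributions = integrated_gradients(pixels, toy_gradient)
print(attributions, sum(attributions), toy_model(pixels))
```

For this toy model the attributions sum to F(image) minus F(baseline), which is the completeness property discussed in the Sundararajan et al. paper.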
(82) The goal here is not only to use where in the image something occurs to inform similarity. There are several techniques to explain what components (pixels) in the image contribute the most to model prediction, including the techniques of soft attention and integrated gradients discussed above. One can also explicitly capture, through the assistance of labelers marking the images, where in an image a specific item is, e.g. the location of a carina, which is in the same vein as c) in the above alternative to triplet losses, i.e. object detection problems.
(83) Our framework of fetchers, scorers, and poolers allows us to seamlessly combine techniques that can use any or all of these different losses and distance methods.
(84) Pooler 410 and Ranking
(85) As noted above, the system of
(86) In one configuration, a final ranking is done at the pooler 410, with intermediate rankings proposed by the scorers, e.g., based on the similarity scores. There can also be an implicit exclusion of certain images from the ranking based on the candidate set of images that are returned by the fetcher(s) 406.
(87) The final ranking can be a mix of objective measures like the Hamming distance and scores derived from subjective measures, e.g., what medical professionals actually consider to be similar images for the clinical context they are working in. Subjective measures could be used for a final comparison of different models or ranking methods. For instance, consider a set of query images q_1, . . . , q_N, and for each of these queries, we receive ranked images r_1(q_i), r_2(q_i), . . . , r_k(q_i) the top k images returned for query image q_i. Then, doctors and/or other medical professionals, could indicate whether the ordering of r_1(q_i), r_2(q_i), . . . , r_k(q_i) makes sense for image q_i and how relevant they are. From these, one could compute scores for image pairs
(88) As we collect more of these labels and we generate/evaluate different ranking methods, we can rate how well the ranking method does based on the scores collected above, so it offers a way to compare different ranking methods against one another.
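One way such a comparison might be carried out is sketched below. The scoring formula is not reproduced in this text, so the sketch substitutes a standard measure, normalized discounted cumulative gain (NDCG), computed over rater-assigned relevance grades. The grade scale (0 to 3), the function names, and the sample data are all hypothetical and are assumptions of this sketch.

```python
from math import log2

def dcg(relevances):
    """Discounted cumulative gain of relevance grades listed in rank order."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG divided by the DCG of the best possible ordering of the same grades."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mean_ndcg(grades_per_query):
    """Average NDCG over all query images q_1..q_N."""
    return sum(ndcg(g) for g in grades_per_query) / len(grades_per_query)

# Hypothetical rater grades (0 = not relevant, 3 = highly relevant) for the
# top-4 results r_1(q_i)..r_4(q_i) of two query images, per ranking method.
method_a = [[3, 2, 1, 0], [2, 3, 0, 1]]
method_b = [[0, 1, 2, 3], [1, 0, 3, 2]]

# The method with the higher mean NDCG placed relevant images nearer the top.
better = "A" if mean_ndcg(method_a) > mean_ndcg(method_b) else "B"
```

Note that NDCG only judges the ordering of the images each method returned; if two methods return different candidate sets, a set-level measure such as mean relevance of the top k could be reported alongside it.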
(89) There are several options for the final ranking:
(90) Option 1—Logistic Regression Model with Weighted Sum of Scores
(91) This option might fail to capture certain nonlinearities in when and how to weight the different scores from the scorers.
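A minimal sketch of this option follows, assuming each candidate image has one similarity score per scorer. The weights and bias would in practice be learned from labeled relevance data; the values, candidate identifiers, and function names used here are hypothetical.

```python
from math import exp

def combined_score(scores, weights, bias=0.0):
    """Logistic function applied to a weighted sum of the per-scorer scores."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + exp(-z))

def rank_candidates(candidate_scores, weights, bias=0.0):
    """Return candidate ids ordered from most to least similar."""
    return sorted(
        candidate_scores,
        key=lambda cid: combined_score(candidate_scores[cid], weights, bias),
        reverse=True,
    )

# Hypothetical scores from three scorers (e.g., visual, diagnostic, patient).
candidates = {
    "img_a": [0.9, 0.2, 0.1],
    "img_b": [0.4, 0.8, 0.7],
    "img_c": [0.1, 0.1, 0.2],
}
weights = [1.0, 2.0, 0.5]  # hypothetical; normally fitted by logistic regression

ranking = rank_candidates(candidates, weights)
```

Because the logistic function is monotonic, the ordering is determined entirely by the weighted sum, which is why this option cannot express interactions between scorers.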
(92) Option 2 Generalized Additive Models
(93) This option offers a framework for combining features together from different scoring components. A generalized additive model is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. Such models are described in the scientific and technical literature, see for example the explanation at https://en.wikipedia.org/wiki/Generalized_additive_model; a detailed description is therefore omitted for the sake of brevity.
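For reference, the standard form of a generalized additive model can be written with the scorer outputs as predictors. The symbols s_j (the similarity score from the j-th of m scorers), y (a relevance label for a query/candidate pair), and the choice of a logit link g are illustrative assumptions; the disclosure does not fix them.

```latex
g\bigl(\mathbb{E}[\,y \mid s_1,\dots,s_m\,]\bigr) = \beta_0 + \sum_{j=1}^{m} f_j(s_j),
\qquad g(p) = \log\frac{p}{1-p}
```

When each smooth function is constrained to be linear, f_j(s_j) = w_j s_j, this form reduces to the weighted sum of Option 1.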
(94) Option 3—Neural Network Based on Scores as Inputs.
(95) As a general matter, there are a number of techniques that could be used to compute a final ranking, from a simple heuristic, e.g., taking the harmonic mean of the intermediate rankings produced by each scorer, to something more sophisticated like training a model to use a weighted approximate pairwise (WARP) loss. See for example the conference paper of J. Weston et al., Learning to Rank Recommendations with the k-Order Statistic Loss, RecSys'13, Oct. 12-16, 2013, Hong Kong, China, available on-line at https://research.google.com/pubs/archive/41534.pdf
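The simple heuristic mentioned above, the harmonic mean of the intermediate rankings, might be sketched as follows. The sketch assumes every scorer ranks the same set of candidates, with rank 1 being the most similar; the function name and sample rankings are hypothetical.

```python
from statistics import harmonic_mean

def pool_by_harmonic_mean(rankings):
    """Combine per-scorer rankings (lists of candidate ids, best first).

    Each candidate's pooled value is the harmonic mean of its 1-based rank
    positions across scorers; a smaller value means a better final rank.
    """
    positions = {}
    for ranking in rankings:
        for position, cid in enumerate(ranking, start=1):
            positions.setdefault(cid, []).append(position)
    pooled = {cid: harmonic_mean(p) for cid, p in positions.items()}
    return sorted(pooled, key=lambda cid: pooled[cid])

# Hypothetical intermediate rankings from three scorers.
scorer_rankings = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["b", "c", "a"],
]
final_ranking = pool_by_harmonic_mean(scorer_rankings)
```

The harmonic mean is dominated by the smallest values, so a candidate ranked very highly by even one scorer tends to be pulled toward the top of the pooled list.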
(96) Example User Interfaces
(97) As explained in
(98) 1) The images can be returned not merely as a list of images but rather grouped together across common attributes that are useful for supporting a clinical decision. For example, in
(99) 2) The groupings can involve the aggregation of relevant common text from radiology free text reports. For example, while there may not be a specific label indicating that an endotracheal tube is misplaced, we can aggregate together images that are associated with reports having common phrases that imply this condition to be present, for example reports having text entries “endotracheal tube at the level of the carina”, “endotracheal tube tip terminates in right main bronchus”, or “ET tube tip could be advanced a couple of centimeters for standard positioning.” Attention mechanisms in the scorers can be used to identify portions of the free text reports, such as particular words or phrases, which contribute the most to the similarity score.
(100) 3) When we group by these common phrases in reports (or by the presence or absence of enumerated conditions in other metadata), as in example 2) above, we can aggregate these into values, compare them against a baseline, and report the comparison. For example, if the similar image results consist of 100 images, we may report the fact that 60 of the 100 images indicated pneumothorax was present, even though only 1 of every 1000 images in the reference library database contains pneumothorax.
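The baseline comparison in example 3) can be illustrated with a short sketch. It assumes each returned image carries a set of condition labels and that the baseline prevalence in the reference library is known; the function name, field names, and sample data are hypothetical.

```python
def prevalence_report(condition, result_annotations, baseline_prevalence):
    """Compare how often a condition appears in the results vs. the library."""
    total = len(result_annotations)
    count = sum(1 for labels in result_annotations if condition in labels)
    observed = count / total
    return {
        "condition": condition,
        "count": count,
        "total": total,
        "observed_prevalence": observed,
        "baseline_prevalence": baseline_prevalence,
        # Ratio of observed to baseline prevalence (how over-represented
        # the condition is among the similar images).
        "enrichment": observed / baseline_prevalence,
    }

# Hypothetical result set mirroring the example: 60 of 100 similar images
# are labeled with pneumothorax, against a baseline of 1 in 1000.
results = [{"pneumothorax"}] * 60 + [set()] * 40
report = prevalence_report("pneumothorax", results, baseline_prevalence=1 / 1000)
```

With these numbers the condition is roughly 600 times more prevalent among the similar images than in the reference library, which is the kind of comparison that could be surfaced to the user.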
(101)
(102)
(103)
(104) In summary, once the set of similar images has been identified, relevant information is returned to the user from this set. This would normally include not only the images themselves, but also metadata associated with each of these images, such as radiology reports, clinical decisions made (e.g., prescribing of antibiotics, diuretics), classifications of diseases/conditions associated with the similar image, and information relating to a grouping/aggregation of these results. That aggregation could include clustering together image results with similar properties, generating pivot tables summarizing the prevalence of certain conditions/diagnoses in the images, as well as indicating the prevalence of common phrases within the radiology reports.
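The grouping of results by common report phrases might be sketched as below. This is a simple pattern-matching illustration only; it does not use the attention mechanisms described for the scorers. The group label, the phrase patterns (taken from the example report text above), and the sample reports are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical mapping from a group label to phrases that imply it.
PHRASE_GROUPS = {
    "endotracheal tube malposition": [
        r"endotracheal tube at the level of the carina",
        r"endotracheal tube tip terminates in right main bronchus",
        r"et tube tip could be advanced",
    ],
}

def group_by_phrases(reports, phrase_groups=PHRASE_GROUPS):
    """Group image ids by which phrase group their free-text report matches."""
    groups = defaultdict(list)
    for image_id, text in reports.items():
        for label, patterns in phrase_groups.items():
            if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns):
                groups[label].append(image_id)
    return dict(groups)

# Hypothetical free-text reports attached to three similar images.
reports = {
    "img_1": "Endotracheal tube at the level of the carina.",
    "img_2": "No acute cardiopulmonary process.",
    "img_3": "ET tube tip could be advanced a couple of centimeters for standard positioning.",
}
groups = group_by_phrases(reports)
```

The size of each group relative to the result set gives the phrase prevalence that could populate a pivot table of the kind described above.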
(105) The system of this disclosure could be deployed in a cloud environment in which the back end of