Method for Obtaining an Indication about the Image Quality of a Digital Image

20230197246 · 2023-06-22

Abstract

The invention relates to a method for providing an indication of the image quality of a digital image in comparison with the image quality, in terms of image content and technical image quality parameters, that would be expected for a similar exposure type. The method evaluates whether parameters of the acquired image, such as noise and dynamic range, match the expectations for the intended exposure type, and whether certain regions of interest are present and properly presented in the image.

Claims

1. A method to provide an indication about image content quality of an acquired digital medical image, said indication being dependent on body part information and view position information associated with said image, the method comprising the steps of:
accessing said image at a computing device,
obtaining body part information and view position information for said image,
extracting multi-resolution features of various scale levels from said image by a trained deep learning backbone network that comprises multi-resolution convolutional layers,
providing said multi-resolution features of various scale levels from said image and said body part and view position information as input for a trained feature combination network comprising bi-directional feature pyramid network layers that operate in both top-down and bottom-up direction, and
obtaining said indication about image content quality as an output result of an image quality head of said feature combination network,
wherein said backbone network and feature combination network are trained simultaneously.

2. The method of claim 1, wherein said body part information and view position information is obtained from an exam request that is associated with said image and accessible from a radiology information system (RIS) on a computer device.

3. The method of claim 1, wherein said body part information and view position information is obtained from DICOM information stored with said image.

4. The method of claim 1, wherein said body part information and view position information is obtained from a trained deep learning model that receives said image as an input.

5. The method of claim 1, wherein an additional network head of said feature combination network provides a prediction of said body part and view position associated with said image.

6. The method of claim 1, wherein an additional network head of said feature combination network provides a prediction for a bounding box indicating a region of interest associated with a body part of said image.

7. The method of claim 1, wherein said image content quality comprises one of technical image quality and patient positioning quality of the acquired image.

8. The method of claim 1, wherein said backbone network and feature combination network are trained by supervised learning.

9. The method of claim 1, wherein said backbone network and feature combination network are trained by unsupervised learning.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] FIG. 1 gives a schematic overview of the deep learning model of the invention, depicting a backbone network [101] consisting of multi-resolution convolutional layers [102] and accepting the acquired medical image [100] as its input. The backbone network produces multi-resolution features [103], which are generated at different resolutions or scales of the original image, and feeds them into a feature combination network [104] comprising multi-resolution feature combination layers [105]. The output head [110] performs the image quality assessment task and produces the indication about the image content quality as an output result.

[0026] FIG. 2 gives a schematic overview of the deep learning model of the invention, which comprises two output heads: an image quality head [110] and a (combined) body part and view position head [111ab]. Moreover, the requested body part and view position information is encoded and provided to the feature combination network as an additional input for the prediction [106].

[0027] FIG. 3 gives a schematic overview of the deep learning model of the invention, which comprises a separate body part head [111a] and view position head [111b], wherein each body part is considered as one class for the former head and each view position as one class for the latter head.

[0028] FIG. 4 gives a schematic overview of the deep learning model of the invention, wherein the information about the requested body part and view position is obtained from an additional deep learning model [107] that predicts the body part and view position [106] as input for the feature combination network [104].

DETAILED DESCRIPTION OF THE INVENTION

[0029] In the following detailed description, reference is made to the above-referenced drawings in sufficient detail to allow those skilled in the art to practice the embodiments explained below.

[0030] As one of the learnable parameters inside a neural network, a weight is by default initialized with a random number. For most image-based tasks, the most popular initialization method is to use model weights that are pre-trained on the ImageNet dataset for common object classification tasks. In general, this initialization gives better immediate results for such classification tasks on images.
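
By way of illustration only, these initialization options can be expressed in a few lines of Python; the use of the timm library and of the EfficientNet-B0 variant are assumptions made for this sketch and are not prescribed by the disclosure.

```python
# Minimal sketch of the two common initialization options, using the
# timm library (an implementation choice assumed for illustration).
import timm

# default: random initialization of all learnable weights
model_random = timm.create_model("efficientnet_b0", pretrained=False)

# popular alternative: start from ImageNet pre-trained weights
model_imagenet = timm.create_model("efficientnet_b0", pretrained=True)
```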

[0031] Lately, it has been noted that using an ImageNet pre-trained model for medical imaging tasks may yield suboptimal results due to the appearance differences between the photographic image domain and the medical image domain. Model pre-training with a self-supervised method on a high volume of unlabelled medical image data therefore comes into the picture. For instance, an auto-encoder setup may be applied, in which the model output attempts to reconstruct the input image, as well as contrastive training, in which versions of the same input image and of another input image are fed into the model, which learns to tell apart (or contrast) input that comes from the same image and input that does not.

[0032] Model parameters can thus in principle be initialized randomly, by pre-training on the ImageNet dataset for a natural image classification task, or by a self-supervised method, such as an auto-encoding task or contrastive learning on an unlabelled, relevant medical imaging dataset.
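
The following is a minimal sketch of the auto-encoding variant of such self-supervised pre-training, assuming PyTorch and illustrative layer sizes; a contrastive setup would instead train the encoder to pull together embeddings of augmented views of the same image and push apart embeddings of different images.

```python
# Illustrative auto-encoder pre-training step on unlabelled medical
# images (layer sizes and image shape are assumptions for the sketch).
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

ae = ConvAutoEncoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
x = torch.rand(8, 1, 256, 256)           # a batch of unlabelled images
loss = nn.functional.mse_loss(ae(x), x)  # reconstruct the input
loss.backward()
opt.step()  # the trained encoder weights then initialize the backbone
```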

[0033] The design of the proposed deep learning network that is applied in the invention is a linear sequence of two deep learning models, in which the output of the first model feeds into the second model.

[0034] The first deep learning model is a so-called backbone network [101] that consists of multi-resolution convolutional layers [102] and accepts the acquired medical image [100] as its input. The backbone network produces multi-resolution features [103], which are generated at different resolutions or scales of the original image. The multi-resolution model eliminates the need to trade off between high resolution and global context by using a cascade of low-resolution to high-resolution networks, and thus yields features at the different scales more efficiently.

[0035] The preferred backbone of the model is the so-called EfficientNet [Tan, Mingxing, and Quoc Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." International Conference on Machine Learning. PMLR, 2019], which is a deep learning network that uses the compound scaling concept to adapt the model's width, depth and image resolution in a balanced way.
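
By way of illustration, multi-resolution feature extraction from an EfficientNet backbone may look as follows; the timm library, the B0 variant and the selected scale levels are assumptions for this sketch.

```python
# Sketch: multi-scale feature extraction with an EfficientNet backbone
# via timm's feature-extraction mode (implementation choice assumed).
import timm
import torch

backbone = timm.create_model(
    "efficientnet_b0",
    pretrained=True,
    features_only=True,      # return intermediate feature maps
    out_indices=(2, 3, 4),   # three scale levels (strides 8, 16, 32)
)

# a grayscale radiograph replicated to 3 channels for this backbone
image = torch.rand(1, 1, 512, 512).repeat(1, 3, 1, 1)
multi_res_features = backbone(image)
for f in multi_res_features:
    print(tuple(f.shape))    # one feature map per scale level
```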

[0036] The extracted multi-resolution features [103] are fed into a second deep learning model, which is a so-called feature combination network [104] comprising multi-resolution feature combination layers [105]. The feature combination network merges the features from various resolution/scale levels in both top-down and bottom-up directions.

[0037] The preferred feature network consists of bi-directional feature pyramid network (BiFPN) layers [105], which provide a compute-efficient way of combining the features in a top-down and bottom-up data flow. The connections are also designed in a regular pattern, which makes the network more adaptable to new tasks rather than optimized for one specific task. The output head [110] is where the adaptation to the image quality assessment task happens. In the context of the invention, the network head always comprises an image quality head [110], optionally complemented with a body part head [111a] (classifying the imaged body part), a view position head [111b] (classifying the view position of the imaged body part), a combination of both [111ab], or a bounding box head.
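
A highly simplified, illustrative BiFPN-style layer is sketched below; it shows only the core idea of one top-down and one bottom-up pass with learned, normalised fusion weights, not the full published architecture, and it assumes that all feature levels have first been projected to a common channel width.

```python
# Simplified BiFPN-style fusion layer (a sketch of the idea only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBiFPNLayer(nn.Module):
    """One top-down plus one bottom-up pass with learned fusion weights."""
    def __init__(self, channels: int, num_levels: int = 3):
        super().__init__()
        self.num_levels = num_levels
        # two positive fusion weights per merge point (normalised below)
        self.w_td = nn.Parameter(torch.ones(num_levels - 1, 2))
        self.w_bu = nn.Parameter(torch.ones(num_levels - 1, 2))
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1)
             for _ in range(2 * (num_levels - 1))])

    @staticmethod
    def _fuse(w, a, b):
        w = F.relu(w)                        # keep the weights positive
        return (w[0] * a + w[1] * b) / (w.sum() + 1e-4)

    def forward(self, feats):                # feats ordered fine -> coarse
        td = list(feats)
        # top-down: upsample coarse context into the finer levels
        for i in range(self.num_levels - 2, -1, -1):
            up = F.interpolate(td[i + 1], size=td[i].shape[-2:])
            td[i] = self.convs[i](self._fuse(self.w_td[i], td[i], up))
        out = list(td)
        # bottom-up: pool fine detail back into the coarser levels
        for i in range(1, self.num_levels):
            down = F.adaptive_max_pool2d(out[i - 1], out[i].shape[-2:])
            out[i] = self.convs[self.num_levels - 2 + i](
                self._fuse(self.w_bu[i - 1], out[i], down))
        return out

# usage: three feature levels with a common channel width of 64
feats = [torch.rand(1, 64, s, s) for s in (64, 32, 16)]
fused = TinyBiFPNLayer(channels=64)(feats)
print([tuple(f.shape) for f in fused])
```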

[0038] The difference with the state of the art lies in the choice of the output network head (the image quality head) and in the appropriate training of the deep learning pipeline as a whole, wherein the backbone network and the feature combination network are trained simultaneously, with the first network passing its output on to the second network.

[0039] The different output heads define some of the different embodiments of the invention; where the network head is an image quality head, the output produced is the image content quality indicator as described above.

[0040] The training of the model requires the input of an appropriate set of labelled data that comprises at least the acquired medical images, labelled with the corresponding quality appreciation from experienced radiologists. It is important to note that the training of the model does not have to be limited to a single acquisition type; the same model can be trained on multiple acquisition types.
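
An illustrative end-to-end training loop is sketched below, reusing the TinyBiFPNLayer class from the earlier sketch; the toy backbone, label format and data loader are assumptions. The essential point is that a single optimiser updates the backbone, the feature combination network and the image quality head together, so that gradients from the quality loss flow through both networks.

```python
# Illustrative simultaneous training of both networks (sizes assumed).
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for the multi-resolution backbone [101]."""
    def __init__(self, ch=64):
        super().__init__()
        self.c1 = nn.Conv2d(3, ch, 3, stride=2, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.c3 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)

    def forward(self, x):
        f1 = torch.relu(self.c1(x))
        f2 = torch.relu(self.c2(f1))
        f3 = torch.relu(self.c3(f2))
        return [f1, f2, f3]                    # fine -> coarse

class QualityModel(nn.Module):
    def __init__(self, num_quality_classes=3, ch=64):
        super().__init__()
        self.backbone = TinyBackbone(ch)       # first model
        self.fpn = TinyBiFPNLayer(ch)          # second model (see above)
        self.quality_head = nn.Linear(3 * ch, num_quality_classes)

    def forward(self, x):
        feats = self.fpn(self.backbone(x))
        pooled = torch.cat([f.mean(dim=(-2, -1)) for f in feats], dim=1)
        return self.quality_head(pooled)       # quality logits

model = QualityModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # ONE optimiser
criterion = nn.CrossEntropyLoss()

# stand-in for a loader of radiologist-graded images (assumption)
loader = [(torch.rand(4, 3, 256, 256), torch.randint(0, 3, (4,)))]
for images, quality_labels in loader:
    opt.zero_grad()
    loss = criterion(model(images), quality_labels)
    loss.backward()          # gradients flow through BOTH networks
    opt.step()
```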

[0041] In a second embodiment, the deep learning model contains two output heads: the image quality head and a (combined) body part and view position head. The image quality head would give the prediction of how well the input image fulfills the image content quality criteria list for the requested body part and view position.

[0042] The information about the requested body part and view position is encoded and provided to the feature combination network as an additional input for the prediction [106]. This data is obtained either from the order information relating to the requested acquisition, or from information that is stored together with the image data file (for instance as a DICOM tag). The body part and view position are shown as one output head in FIG. 2, which means that each specific combination of body part and view position is considered as a single class of the prediction outcome of this network head.
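
By way of illustration, the encoding of the requested body part and view position, and the mapping of each body part/view position pair to a single combined class, may be sketched as follows; the vocabularies and the one-hot encoding scheme are assumptions for this sketch.

```python
# Sketch: encoding the requested body part and view position as an
# extra input vector, and one combined class per (body part, view
# position) pair (vocabularies are illustrative assumptions).
import torch
import torch.nn as nn

BODY_PARTS = ["chest", "hand", "skull"]          # example vocabulary
VIEW_POSITIONS = ["AP", "PA", "LAT"]

def encode_request(body_part: str, view_position: str) -> torch.Tensor:
    """One-hot encode the requested body part and view position."""
    bp = torch.zeros(len(BODY_PARTS))
    vp = torch.zeros(len(VIEW_POSITIONS))
    bp[BODY_PARTS.index(body_part)] = 1.0
    vp[VIEW_POSITIONS.index(view_position)] = 1.0
    return torch.cat([bp, vp])                   # extra network input

def combined_class(body_part: str, view_position: str) -> int:
    """Map one (body part, view position) pair to a single class id."""
    return (BODY_PARTS.index(body_part) * len(VIEW_POSITIONS)
            + VIEW_POSITIONS.index(view_position))

# a combined head would then cover 3 x 3 = 9 classes
combined_head = nn.LazyLinear(len(BODY_PARTS) * len(VIEW_POSITIONS))
print(encode_request("chest", "PA"), combined_class("chest", "PA"))
```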

[0043] In a third embodiment, a separate body part head and view position head can also be considered, wherein each body part is considered as one class for the body part head and each view position as one class for the view position head. (FIG. 3)

[0044] In a fourth embodiment, the information about the requested body part and view position is obtained from an additional deep learning model [107] that predicts the body part and view position [106] rather than obtaining it from recorded information about the acquired image. This body part and view position information is then encoded, and is provided to the feature combination network of the main model as described earlier. (FIG. 4)

[0045] In yet another embodiment, a combination of output heads is provided that predicts the image content quality, the body part and view position, and a bounding box containing the most significant region in the image, or region of interest (ROI).

[0046] As in the first embodiment, the model takes the acquired X-ray image and the requested body part and view position information as input. In addition to the image quality head, which provides the prediction of the fulfillment of the image content quality criteria for the requested body part, and the body part and view position head, which provides the prediction of the body part and view position contained in the input image, the model has a bounding box head. The bounding box head gives the prediction of the location of the ROI containing the most significant region in the image that depicts the predicted body part and view position.
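
A minimal sketch of such a bounding box head is given below; the head design, the (x, y, width, height) output convention and the normalised coordinates are assumptions for illustration.

```python
# Sketch of a bounding box head alongside the other output heads:
# a small regressor predicting one ROI box in normalised coordinates.
import torch
import torch.nn as nn

class BoundingBoxHead(nn.Module):
    def __init__(self, in_features: int):
        super().__init__()
        self.fc = nn.Linear(in_features, 4)   # x, y, width, height

    def forward(self, pooled_features):
        # sigmoid keeps the box inside the image (coordinates in [0, 1])
        return torch.sigmoid(self.fc(pooled_features))

bbox_head = BoundingBoxHead(in_features=192)
pooled = torch.rand(2, 192)                   # pooled FPN features
print(bbox_head(pooled))                      # one ROI box per image
# Training would add a regression term, e.g. nn.SmoothL1Loss, to the
# classification losses of the other heads.
```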