AUTOMATED SEGMENTATION OF THREE DIMENSIONAL BONY STRUCTURE IMAGES
20210369226 · 2021-12-02
Inventors
- Krzysztof B. Siemionow (Chicago, IL, US)
- Cristian J. Luciano (Evergreen Park, IL, US)
- Marek Kraft (Poznan, PL)
CPC classification
G06T11/008
PHYSICS
A61B6/5229
HUMAN NECESSITIES
A61B6/5258
HUMAN NECESSITIES
International classification
A61B6/00
HUMAN NECESSITIES
G06T3/40
PHYSICS
Abstract
A computer-implemented machine learning system comprises at least one processor communicably coupled to at least one nontransitory processor-readable storage medium storing processor-executable instructions or data. The system receives segmentation learning data comprising a plurality of batches of labeled anatomical image sets, each image set comprising image data representative of a series of slices of a three-dimensional bony structure, and each image set including at least one label which identifies the region of a particular part of the bony structure depicted in each image of the image set, wherein the label indicates one of a plurality of classes indicating parts of the bone anatomy. The system trains a segmentation CNN, which is a fully convolutional neural network model with layer skip connections, to semantically segment at least one part of the bony structure utilizing the received segmentation learning data, and stores the trained segmentation CNN in at least one nontransitory processor-readable storage medium of the machine learning system.
Claims
1. A method, comprising: receiving, at a processor, a training set including sets of labeled anatomical images, each set of labeled anatomical images including a set of two-dimensional (2D) images of a three-dimensional (3D) scan volume of patient anatomy, each labeled anatomical image of each set of labeled anatomical images being associated with one or more labels each identifying a different anatomical part of a set of anatomical parts of an anatomical structure in a portion of the patient anatomy depicted in that labeled anatomical image; processing, in iterations, the sets of labeled anatomical images using a segmentation convolutional neural network (CNN) to produce segmentation outputs associated with the set of anatomical parts for each set of labeled anatomical images; adjusting, after each iteration, one or more parameters of the segmentation CNN based on a difference between the segmentation output produced in that iteration and the one or more labels associated with the set of labeled anatomical images processed in that iteration; and in response to meeting a predetermined criterion, storing the segmentation CNN in a storage medium operatively coupled to the processor.
2. The method of claim 1, wherein the anatomical structure is the spine, and the set of anatomical parts is a set of spine parts including one or more of: a vertebral body, a pedicle, a transverse process, a lamina, or a spinous process.
3. The method of claim 1, further comprising: calculating, after each iteration, a value of a loss function representative of the difference between the segmentation output produced in that iteration and the one or more labels associated with the set of labeled anatomical images processed in that iteration, the one or more parameters of the segmentation CNN being adjusted based on the value of the loss function.
4. The method of claim 1, wherein the one or more parameters of the segmentation CNN include one or more weights of the segmentation CNN.
5. The method of claim 1, wherein the sets of labeled anatomical images are first sets of labeled anatomical images, the method further comprising: validating an accuracy of the segmentation CNN by processing a validation set including second sets of labeled anatomical images using the segmentation CNN to produce a set of segmentation outputs; and determining one or more performance metrics of the segmentation CNN based on the set of segmentation outputs.
6. The method of claim 5, wherein the predetermined criterion includes the one or more performance metrics indicating that the segmentation CNN has improved.
7. The method of claim 1, wherein the predetermined criterion includes the iterations reaching a predefined number of epochs.
8. The method of claim 1, wherein the training set is a segmentation training set, the method further comprising: receiving, at the processor, a denoising training set, the denoising training set including sets of low quality images paired with sets of high quality images, each high quality image having a lower noise level than the low quality image with which that high quality image is paired; and training a denoising CNN by processing the sets of low quality images using the denoising CNN to produce denoised outputs and adjusting one or more parameters of the denoising CNN based on differences between the denoised outputs and the sets of high quality images.
9. The method of claim 8, further comprising: processing the sets of labeled anatomical images using the trained denoising CNN to denoise the sets of labeled anatomical images, the sets of labeled anatomical images being processed using the segmentation CNN after the sets of labeled anatomical images have been denoised.
10. The method of claim 1, further comprising: processing, using the trained segmentation CNN, a set of input anatomical images to produce segmentation data associated with the set of anatomical parts.
11. The method of claim 10, further comprising: generating, using the segmentation data, a set of output anatomical images including image data from the set of input anatomical images and information identifying one or more anatomical parts of the set of anatomical parts in the image data.
12. The method of claim 10, further comprising: generating, using the segmentation data, a segmented 3D anatomical model of the anatomical structure; and displaying a visual representation of the segmented 3D anatomical model in which each anatomical part of the set of anatomical parts is displayed using different representation parameters.
13. The method of claim 10, wherein the segmentation data includes per-class probabilities for each pixel of each image of the set of input anatomical images, the per-class probabilities for each pixel including a probability of that pixel belonging to a class from a set of classes, the set of classes corresponding to the set of anatomical parts.
14. The method of claim 1, further comprising augmenting the training set by: transforming a subset of images from the sets of labeled anatomical images using one or more transformations including one or more of: rotation, scaling, movement, horizontal flip, or additive noise of Gaussian or Poisson distribution and Gaussian blur.
15. A method, comprising: receiving, at a processor, one or more sets of anatomical images, each set of anatomical images including a set of two-dimensional (2D) images of a three-dimensional (3D) scan volume of patient anatomy; processing each set of anatomical images, one set at a time, using a segmentation convolutional neural network (CNN) trained to segment an anatomical structure, to produce segmentation data for that set of anatomical images, the segmentation data associated with a set of anatomical parts of the anatomical structure; and generating a segmented 3D anatomical model of the anatomical structure at least in part by combining the segmentation data produced for the sets of anatomical images, the segmented 3D anatomical model including information identifying the set of anatomical parts.
16. The method of claim 15, wherein the anatomical structure is the spine, and the set of anatomical parts is a set of spine parts including one or more of: a vertebral body, a pedicle, a transverse process, a lamina, or a spinous process.
17. The method of claim 15, further comprising: pre-processing the sets of anatomical images based on pre-processing performed on images of a training set used to train the segmentation CNN, each set of anatomical images being processed using the segmentation CNN after the pre-processing of that set of anatomical images.
18. The method of claim 15, further comprising: displaying a visual representation of the segmented 3D anatomical model in which each anatomical part of the set of anatomical parts is displayed using different representation parameters.
19. The method of claim 18, wherein the representation parameters include at least one of: a color, an opacity, or a decimation.
20. The method of claim 18, wherein the visual representation is a polygonal mesh of the anatomical structure.
21. The method of claim 15, wherein the segmentation data includes per-class probabilities for each pixel of the images of each set of anatomical images, the per-class probabilities for each pixel including a probability of that pixel belonging to a class from a set of classes, the set of classes corresponding to the set of anatomical parts.
22. The method of claim 15, wherein the segmentation CNN is a fully convolutional network with skip connections between layers of the fully convolutional network.
23. The method of claim 15, wherein the segmentation CNN includes a contracting path including convolutional layers, pooling layers, and dropout layers, where each pooling or dropout layer is preceded by at least one convolutional layer.
24. The method of claim 15, wherein the segmentation CNN includes an expanding path including convolutional layers, upsampling layers, and a concatenation of feature maps from previous layers of the segmentation CNN, where each upsampling layer is preceded by at least one convolutional layer.
25. The method of claim 15, further comprising: processing the sets of anatomical images using a denoising CNN trained to denoise anatomical images, each set of anatomical images being processed using the segmentation CNN after the denoising of that set of anatomical images.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0041] Various embodiments are herein described, by way of example only, with reference to the accompanying drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0055] The following detailed description is of the best currently contemplated modes of carrying out the invention. The description is not to be taken in a limiting sense, but is made merely for the purpose of illustrating the general principles of the invention.
[0056] The invention relates to processing images of a bony structure, such as a spine, skull, pelvis, long bones, shoulder joint, hip joint, knee joint, etc. The following description presents examples related mostly to a spine, but a skilled person will realize how to adapt the embodiments to be applicable to the other bony structures as well.
[0057] Moreover, the invention may include, before segmentation, pre-processing of lower quality images to improve their quality. For example, the lower quality images may be low dose computer tomography (LDCT) images or magnetic resonance images captured with a relatively low power scanner. The following description presents examples related to computer tomography (CT) images, but a skilled person will realize how to adapt the embodiments to be applicable to other image types, such as magnetic resonance images.
[0061] Therefore, in the invention, low-dose medical images (such as shown in
[0062] For the purposes of this disclosure, the LDCT image is understood as an image which is taken with an effective dose of X-ray radiation lower than the effective dose for the high dose computer tomography (HDCT) image, such that the lower dose of X-ray radiation causes a higher amount of noise to appear on the LDCT image than on the HDCT image. LDCT images are commonly captured during intra-operative scans to limit the exposure of the patient to X-ray radiation.
[0063] As seen by comparing
[0064] The system and method disclosed below use a neural network and deep-learning based approach. In order for any neural network to work, it must first learn the task. The learning process is supervised (i.e., the network is provided with a set of input samples and a set of corresponding desired output samples). The network learns the relations that enable it to extract the output sample from the input sample. Given enough training examples, the expected results can be obtained.
[0065] In the presented system 100 and methods, for example method 200, a set of samples is generated first, wherein LDCT images and HDCT images of the same object (such as an artificial phantom or a lumbar spine) are captured using the computer tomography device. Next, the LDCT images are used as input and their corresponding HDCT images are used as desired output to teach the neural network to denoise the images. Since the CT scanner noise is not totally random (there are some components that are characteristic for certain devices or types of scanners), the network learns which noise component is added to the LDCT images, recognizes it as noise, and is able to eliminate it during subsequent operation, when a new LDCT image is provided as input to the network.
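By way of a hedged illustration only, the paired LDCT/HDCT training described above might be set up as follows in Python with PyTorch; the dataset class, array shapes, and hyperparameters here are assumptions for the sketch, not part of the disclosed embodiments:

    import torch
    import torch.nn as nn

    # Illustrative paired dataset: ldct[i] and hdct[i] are corresponding slices
    # of the same object captured at low and high radiation dose.
    class PairedCTDataset(torch.utils.data.Dataset):
        def __init__(self, ldct, hdct):  # numpy arrays of shape (N, H, W)
            self.ldct, self.hdct = ldct, hdct

        def __len__(self):
            return len(self.ldct)

        def __getitem__(self, i):
            x = torch.from_numpy(self.ldct[i]).float().unsqueeze(0)  # add channel dim
            y = torch.from_numpy(self.hdct[i]).float().unsqueeze(0)
            return x, y  # LDCT as input, HDCT as desired output

    def train_denoiser(model, loader, epochs=10, lr=1e-4):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        mse = nn.MSELoss()
        for _ in range(epochs):
            for noisy, clean in loader:
                opt.zero_grad()
                loss = mse(model(noisy), clean)  # penalize remaining noise
                loss.backward()
                opt.step()
        return model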
[0066] By denoising the LDCT images, the presented system and method may be used for intra-operative tasks, to provide high segmentation quality for images obtained from intra-operative scanners on low radiation dose setting.
[0069] One or more images can be presented to the input layer of the network to learn reasoning from a single slice image, or from a series of images fused to form a local volume representation.
[0070] The convolution layers 401 can be of a standard kind, the dilated kind, or a combination thereof, with ReLU or leaky ReLU activation attached.
[0071] The upsampling or deconvolution layers 403 can be of a standard kind, the dilated kind, or a combination thereof, with ReLU or leaky ReLU activation attached.
[0072] The output slice 405 denotes the densely connected layer with one or more hidden layers and a softmax or sigmoid stage connected as the output.
[0073] The encoding-decoding flow is supplemented with additional skip connections between layers with corresponding sizes (resolutions), which improves performance through information merging. It enables either the use of max-pooling indices from the corresponding encoder stage to upsample, or learning the deconvolution filters to upsample.
[0074] The architecture is general, in the sense that adapting it to images of different size is possible by adjusting the size (resolution) of the layers. The number of layers and the number of filters within a layer are also subject to change, depending on the requirements of the application.
[0075] Deeper networks typically give results of better quality. However, there is a point at which increasing the number of layers/filters does not result in significant improvement, but significantly increases the computation time and decreases the network's capability to generalize, making such a large network impractical.
[0076] The final layer for binary segmentation recognizes two classes (bone and no-bone). The semantic segmentation is capable of recognizing multiple classes, each representing a part of the anatomy. For example, for the vertebra, this includes the vertebral body, pedicles, processes, etc.
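For concreteness, one possible realization of such an encoder-decoder segmentation CNN with skip connections is a small U-Net-style model, sketched below in PyTorch. The layer counts, filter counts, and number of output classes are assumptions for illustration, not the patent's exact architecture:

    import torch
    import torch.nn as nn

    def conv_block(in_ch, out_ch):
        # Two standard convolutions with leaky ReLU, as in the contracting path.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.1))

    class MiniUNet(nn.Module):
        def __init__(self, n_classes=5):  # illustrative class count (anatomy parts + background)
            super().__init__()
            self.enc1, self.enc2 = conv_block(1, 32), conv_block(32, 64)
            self.pool = nn.MaxPool2d(2)            # pooling layer preceded by convolutions
            self.drop = nn.Dropout2d(0.2)          # dropout layer preceded by convolutions
            self.bottom = conv_block(64, 128)
            self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)  # learned upsampling
            self.dec2 = conv_block(128, 64)        # 128 = 64 upsampled + 64 from skip
            self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
            self.dec1 = conv_block(64, 32)
            self.head = nn.Conv2d(32, n_classes, 1)  # per-pixel class scores

        def forward(self, x):  # x: (B, 1, H, W), H and W divisible by 4
            e1 = self.enc1(x)
            e2 = self.enc2(self.pool(e1))
            b = self.bottom(self.drop(self.pool(e2)))
            d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
            d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
            return self.head(d1)  # logits; softmax is applied in the loss or at inference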
[0077]
[0078] The objective of the training for the denoising CNN 300 is to tune the parameters of the denoising CNN 300 such that the network is able to reduce noise in a high noise image, such as shown in
[0079] The objective of the training for the segmentation CNN 400 is to tune the parameters of the segmentation CNN 400 such that the network is able to recognize segments in a denoised image (such as shown in
[0080] The training database may be split into a training set used to train the model, a validation set used to quantify the quality of the model, and a test set.
[0081] The training starts at 501. At 502, batches of training images are read from the training set, one batch at a time. For the denoising CNN, LDCT images represent input, and HDCT images represent desired output. For the segmentation CNN, denoised images represent input, and pre-segmented (by a human) images represent desired output.
[0082] At 503 the images can be augmented. Data augmentation is performed on these images to make the training set more diverse. The input/output image pair is subjected to the same combination of transformations from the following set: rotation, scaling, movement, horizontal flip, additive noise of Gaussian and/or Poisson distribution, Gaussian blur, etc.
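A minimal augmentation sketch for a segmentation image/label pair, assuming NumPy arrays and using SciPy transforms (the parameter ranges are illustrative). The geometric transform must be identical for the input and its labels, while photometric noise applies to the input only:

    import numpy as np
    from scipy.ndimage import rotate, gaussian_filter

    def augment_pair(image, label, rng=np.random.default_rng()):
        # Apply the SAME geometric transformation to the image and its labels.
        angle = rng.uniform(-10, 10)
        image = rotate(image, angle, reshape=False, order=1)
        label = rotate(label, angle, reshape=False, order=0)  # nearest: keeps class ids
        if rng.random() < 0.5:  # horizontal flip
            image, label = image[:, ::-1], label[:, ::-1]
        # Photometric distortions are applied to the input image only.
        image = image + rng.normal(0.0, 0.01, image.shape)           # additive Gaussian noise
        image = gaussian_filter(image, sigma=rng.uniform(0.0, 1.0))  # Gaussian blur
        return image, label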
[0083] At 504, the images and generated augmented images are then passed through the layers of the CNN in a standard forward pass. The forward pass returns the results, which are then used to calculate, at 505, the value of the loss function, i.e. the difference between the desired output and the actual, computed output. The difference can be expressed using a similarity metric, e.g. mean squared error, mean absolute error, categorical cross-entropy, or another metric.
[0084] At 506, weights are updated as per the specified optimizer and optimizer learning rate. The loss may be calculated using a per-pixel cross-entropy loss function and the Adam update rule.
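Steps 504 to 506 might look as follows in PyTorch, assuming the per-pixel cross-entropy loss and Adam update mentioned above; the shapes and learning rate are illustrative:

    import torch
    import torch.nn as nn

    def train_step(model, optimizer, images, targets):
        # images: (B, 1, H, W) floats; targets: (B, H, W) integer class ids.
        optimizer.zero_grad()
        logits = model(images)                               # standard forward pass (504)
        loss = nn.functional.cross_entropy(logits, targets)  # per-pixel loss value (505)
        loss.backward()                                      # back-propagate gradients
        optimizer.step()                                     # Adam weight update (506)
        return loss.item()

    # optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)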
[0085] The loss is also back-propagated through the network, and the gradients are computed. Based on the gradient values, the network's weights are updated. The process (beginning with the image batch read) is repeated continuously until an end of the training session is reached at 507.
[0086] Then, at 508, the performance metrics are calculated using a validation dataset, which is not used in training. This is done in order to check at 509 whether or not the model has improved. If it has not, the early stop counter is incremented at 514 and it is checked at 515 whether its value has reached a predefined number of epochs. If so, the training process is complete at 516, since the model has not improved for many sessions, and it can be concluded that the network has started overfitting to the training data.
[0087] If the model has improved, the model is saved at 510 for further use and the early stop counter is reset at 511. As the final step in a session, learning rate scheduling can be applied. The sessions at which the rate is to be changed are predefined. Once one of these session numbers is reached at 512, the learning rate is set to the one associated with that specific session number at 513.
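Steps 508 to 516 (validation, early stopping, and learning rate scheduling) could be sketched as below, reusing the train_step sketch above; the validation metric, patience value, and schedule milestones are assumptions for illustration:

    import torch

    def evaluate(model, loader):
        # Mean validation loss as a simple, illustrative quality metric (508).
        model.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for images, targets in loader:
                total += torch.nn.functional.cross_entropy(model(images), targets).item()
                n += 1
        model.train()
        return total / max(n, 1)

    def fit(model, train_loader, val_loader, optimizer,
            max_epochs=100, patience=10, lr_schedule=None):
        lr_schedule = lr_schedule or {30: 1e-4, 60: 1e-5}  # illustrative milestones
        best_metric, stall = float('inf'), 0
        for epoch in range(max_epochs):
            for images, targets in train_loader:
                train_step(model, optimizer, images, targets)
            metric = evaluate(model, val_loader)
            if metric < best_metric:            # model improved (509): save and reset (510, 511)
                best_metric, stall = metric, 0
                torch.save(model.state_dict(), 'best_model.pt')
            else:                               # no improvement: early stop counter (514, 515)
                stall += 1
                if stall >= patience:
                    break                       # likely overfitting; training complete (516)
            if epoch in lr_schedule:            # predefined schedule (512, 513)
                for group in optimizer.param_groups:
                    group['lr'] = lr_schedule[epoch]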
[0088] Once the training is complete, the network can be used for inference, i.e. utilizing a trained model for prediction on new data.
[0090] After inference is invoked at 601, a set of scans (LDCT, not denoised) is loaded at 602, and the denoising CNN 300 and its weights are loaded at 603.
[0091] At 604, one batch of images at a time is processed by the inference server. At 605, a forward pass through the denoising CNN 300 is computed.
[0092] At 606, if not all batches have been processed, a new batch is added to the processing pipeline until inference has been performed on all input noisy LDCT images.
[0093] Finally, at 607, the denoised scans are saved.
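The batch-wise denoising inference of steps 602 to 607 might be sketched as follows, assuming the trained model and its weights have already been loaded (e.g. via torch.load and load_state_dict); the shapes and batch size are illustrative:

    import torch

    def denoise_scan(model, noisy_slices, batch_size=8):
        # model: trained denoising CNN with weights loaded (603).
        # noisy_slices: tensor of LDCT slices, shape (N, 1, H, W) (602).
        model.eval()
        denoised = []
        with torch.no_grad():
            for i in range(0, noisy_slices.shape[0], batch_size):  # one batch at a time (604)
                batch = noisy_slices[i:i + batch_size]
                denoised.append(model(batch))                      # forward pass (605)
        return torch.cat(denoised)                                 # denoised scan to save (607)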
[0095] After inference is invoked at 701, a set of scans (denoised images obtained from noisy LDCT images) is loaded at 702, and the segmentation CNN 400 and its weights are loaded at 703.
[0096] At 704, one batch of images at a time is processed by the inference server.
[0097] At 705, the images are preprocessed (e.g., normalized, cropped) using the same parameters that were utilized during training, as discussed above. In at least some implementations, inference-time distortions are applied and the inference results are averaged over, for example, 10 distorted copies of each input image. This feature makes the inference results robust to small variations in brightness, contrast, orientation, etc.
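The inference-time distortion averaging could be sketched as follows, here with simple photometric distortions only; the distortion magnitudes are assumptions, and geometric distortions would additionally need to be inverted before averaging:

    import torch

    def tta_predict(model, image, n_copies=10):
        # image: a single preprocessed slice, shape (1, 1, H, W).
        probs = []
        with torch.no_grad():
            for _ in range(n_copies):
                # Small random brightness/contrast perturbation (illustrative).
                distorted = image * (1 + 0.05 * torch.randn(1)) + 0.01 * torch.randn_like(image)
                probs.append(torch.softmax(model(distorted), dim=1))
        return torch.stack(probs).mean(dim=0)  # averaged per-class probabilities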
[0098] At 706, a forward pass through the segmentation CNN 400 is computed.
[0099] At 707, the system may perform postprocessing such as linear filtering (e.g., Gaussian filtering) or nonlinear filtering, such as median filtering and morphological opening or closing.
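Such postprocessing might be sketched with SciPy as follows, operating on a single-class binary mask; the kernel sizes are illustrative:

    import numpy as np
    from scipy import ndimage

    def postprocess_mask(mask):
        # mask: 2D binary array for one class (e.g. from an argmax over class maps).
        mask = ndimage.median_filter(mask.astype(np.uint8), size=3)  # nonlinear: median filter
        mask = ndimage.binary_opening(mask)  # morphological opening: drop small islands
        mask = ndimage.binary_closing(mask)  # morphological closing: fill small holes
        return mask

    # Linear (Gaussian) filtering of the per-class probability maps is an
    # alternative: smoothed = ndimage.gaussian_filter(probs, sigma=1.0)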
[0100] At 708, if not all batches have been processed, a new batch is added to the processing pipeline until inference has been performed on all input images.
[0101] Finally, at 709, the inference results are saved and can be combined into a segmented 3D model. The model can be further converted to a polygonal mesh representation for the purpose of visualization on the display. The volume and/or mesh representation parameters can be adjusted in terms of color, opacity, and mesh decimation, depending on the needs of the operator.
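Combining per-slice results into a volume and converting one anatomical part to a polygonal mesh could be sketched with scikit-image's marching cubes; this is illustrative, with decimation, color, and opacity left to the visualization layer:

    import numpy as np
    from skimage import measure

    def volume_to_mesh(label_volume, class_id):
        # label_volume: 3D array of per-voxel class ids, stacked from slice results.
        binary = (label_volume == class_id).astype(np.float32)
        verts, faces, normals, values = measure.marching_cubes(binary, level=0.5)
        return verts, faces  # polygonal mesh of one anatomical part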
[0107] Method 300 may further include a step 330 of, for each image of the image data, generating, by the at least one processor, a probability map for each of the plurality of classes using the generated per-class probabilities. Method 300 may still further include a step 340 of storing, by the at least one processor, the generated probability maps in the at least one nontransitory processor-readable storage medium.
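A brief sketch of step 330, assuming a trained PyTorch segmentation model whose output logits feed a softmax; the names and shapes are illustrative:

    import torch

    def probability_maps(model, image):
        # image: one slice, shape (1, H, W); returns one H x W map per class (330).
        with torch.no_grad():
            logits = model(image.unsqueeze(0))       # (1, n_classes, H, W)
            probs = torch.softmax(logits, dim=1)[0]  # per-pixel class probabilities
        return {c: probs[c] for c in range(probs.shape[0])}  # class id -> map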
[0108] Method 300 may also include processing the received image data through the CNN model wherein the CNN model includes a contracting path and an expanding path. Method 300 may also include the contracting path including a number of convolutional layers and a number of pooling layers, each pooling layer preceded by at least one convolutional layer.
[0109] The functionality described herein can be implemented in a computer system. The system may include at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data and at least one processor communicably coupled to that at least one nontransitory processor-readable storage medium. That at least one processor is configured to perform the steps of the methods presented herein.
[0110] While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made. Therefore, the claimed invention as recited in the claims that follow is not limited to the embodiments described herein.