AUTONOMOUS SEGMENTATION OF THREE-DIMENSIONAL NERVOUS SYSTEM STRUCTURES FROM MEDICAL IMAGES
20220245400 · 2022-08-04
Inventors
- Krzysztof B. Siemionow (Chicago, IL, US)
- Cristian J. Luciano (Chicago, IL, US)
- Dominik Gawel (Warsaw, PL)
- Edwing Isaac Mejia Orozco (Warsaw, PL)
- Michal Trzmiel (Warsaw, PL)
CPC Classification
- G06F18/214 (PHYSICS)
- G06T7/143 (PHYSICS)
- G06V20/653 (PHYSICS)
International Classification
- A61B5/00 (HUMAN NECESSITIES)
- G06V10/22 (PHYSICS)
Abstract
A method for autonomous segmentation of three-dimensional nervous system structures from raw medical images, the method including: receiving a 3D scan volume with a set of medical scan images of a region of the anatomy; autonomously processing the set of medical scan images to perform segmentation of a bony structure of the anatomy to obtain bony structure segmentation data; autonomously processing a subsection of the 3D scan volume as a 3D region of interest (ROI) by combining the raw medical scan images and the bony structure segmentation data, wherein the 3D ROI contains a subvolume of the bony structure with a portion of surrounding tissues, including the nervous system structure; and autonomously processing the ROI to determine the 3D shape, location, and size of the nervous system structures by means of a pre-trained convolutional neural network (CNN).
Claims
1. A method, comprising: processing, using a first convolutional neural network (CNN) trained to segment a first type of tissue structure, a set of two-dimensional (2D) images of a three-dimensional (3D) scan volume of a region of patient anatomy to produce segmentation data associated with a set of anatomical parts of the first type within the region of patient anatomy; generating combined image data by merging the segmentation data associated with the set of anatomical parts of the first type with the set of 2D images; determining a region of interest (ROI) in the combined image data, the ROI being of a sub-volume of the set of anatomical parts and a neighboring set of anatomical parts of a second type of tissue structure, the ROI including voxels each including data from the set of 2D images and data from the segmentation data associated with the set of anatomical parts; and processing, using a second CNN trained to segment the second type of tissue structure, the ROI to produce segmentation data associated with the neighboring set of anatomical parts.
2. The method of claim 1, wherein the first type of tissue structure is bony tissue structure, and the second type of tissue structure is nervous system structure.
3. The method of claim 2, wherein: the set of anatomical parts is of a bony structure, the data from the set of 2D images includes bone density data of the bony structure, and the data from the segmentation data includes classification information for the bony structure.
4. The method of claim 2, wherein the set of anatomical parts is a set of spine parts, the set of spine parts being one or more of: a vertebral body, a pedicle, a transverse process, a lamina, or a spinous process.
5. The method of claim 1, further comprising resizing, before processing the ROI using the second CNN, the ROI to have a predefined size suitable for processing using the second CNN, the second CNN having been trained using ROIs having the predefined size.
6. The method of claim 1, further comprising determining a shape, location, and size of the neighboring set of anatomical parts using the segmentation data.
7. The method of claim 6, wherein determining the shape, location, and size of the neighboring set of anatomical parts includes: determining a shape, location, and size of the neighboring set of anatomical parts in the ROI using the segmentation data; and combining a local coordinate system of the ROI with a global coordinate system of the 3D scan volume to determine a shape, location, and size of the neighboring set of anatomical parts in the 3D scan volume.
8. The method of claim 6, further comprising: generating, after determining the shape, location, and size of the neighboring set of anatomical parts, a 3D anatomical model including the neighboring set of anatomical parts; and displaying, via a display device, a visual representation of the 3D anatomical model.
9. The method of claim 6, further comprising: detecting, based on the shape, location, and size of the neighboring set of anatomical parts, a possible collision between a medical device and a portion of the neighboring set of anatomical parts; and displaying, via a display device, a warning of the possible collision.
10. The method of claim 1, further comprising training, before processing the ROI using the second CNN, the second CNN using a training dataset including ROIs having anatomical parts of the first and second types of tissue structure and classification data of the first and second types of tissue structure in each ROI.
11. The method of claim 10, further comprising augmenting the training dataset by: transforming a set of ROIs using a set of transformations; and transforming the classification data of the first and second types of tissue structure in each ROI of the set of ROIs using the same set of transformations, the second CNN being trained using the training dataset after augmenting the training dataset.
12. An apparatus, comprising: a memory storing instructions; and a processor operatively coupled to the memory, the processor configured to execute the instructions to: process, using a first convolutional neural network (CNN) trained to segment a first type of tissue structure, a set of two-dimensional (2D) images of a three-dimensional (3D) scan volume of a region of patient anatomy to produce segmentation data associated with a set of anatomical parts of the first type within the region of patient anatomy; generate combined image data by merging the segmentation data associated with the set of anatomical parts of the first type with the set of 2D images; determine a region of interest (ROI) in the combined image data, the ROI being of a sub-volume of the set of anatomical parts and a neighboring set of anatomical parts of a second type of tissue structure, the ROI including voxels each including data from the set of 2D images and data from the segmentation data associated with the set of anatomical parts; and process, using a second CNN trained to segment the second type of tissue structure, the ROI to produce segmentation data associated with the neighboring set of anatomical parts.
13. The apparatus of claim 12, wherein the processor is further configured to execute the instructions to determine a shape, location, and size of the neighboring set of anatomical parts by: determining a shape, location, and size of the neighboring set of anatomical parts in the ROI using the segmentation data; and combining a local coordinate system of the ROI with a global coordinate system of the 3D scan volume to determine a shape, location, and size of the neighboring set of anatomical parts in the 3D scan volume.
14. The apparatus of claim 12, wherein the processor is further configured to execute the instructions to: determine a shape, location, and size of the neighboring set of anatomical parts using the segmentation data; and detect, based on the shape, location, and size of the neighboring set of anatomical parts, a possible collision between a medical device and a portion of the neighboring set of anatomical parts.
15. A method, comprising: receiving a set of two-dimensional (2D) images of a three-dimensional (3D) scan volume of a region of patient anatomy, the set of 2D images including information about tissue appearance of first and second types of tissue structure; receiving segmentation data including classification information of a set of anatomical parts of the first type of tissue structure within the set of 2D images, the segmentation data obtained using a first convolutional neural network (CNN) trained to segment the first type of tissue structure; generating combined image data by merging the segmentation data with the set of 2D images, the combined image data including the information about tissue appearance and the classification information of the set of anatomical parts; determining a region of interest (ROI) in the combined image data, the ROI being of a sub-volume of the set of anatomical parts and a neighboring set of anatomical parts of the second type of tissue structure; processing, using a second CNN trained to segment the second type of tissue structure, the ROI to produce segmentation data associated with the neighboring set of anatomical parts; and determining a shape, location, and size of the neighboring set of anatomical parts using the segmentation data associated with the neighboring set of anatomical parts.
16. The method of claim 15, further comprising: generating a 3D anatomical model including the neighboring set of anatomical parts; and displaying, via a display device, a visual representation of the 3D anatomical model.
17. The method of claim 15, further comprising: detecting, based on the shape, location, and size of the neighboring set of anatomical parts, a possible collision between a medical device and a portion of the neighboring set of anatomical parts; and displaying, via a display device, a warning of the possible collision.
18. The method of claim 15, wherein the combined image data includes a set of color-coded 2D images.
19. The method of claim 15, wherein the first type of tissue structure is bony tissue structure, and the second type of tissue structure is nervous system structure.
20. The method of claim 15, further comprising training, before processing the ROI using the second CNN, the second CNN using a training dataset including ROIs having anatomical parts of the first and second types of tissue structure and classification data of the first and second types of tissue structure in each ROI.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0028] Various embodiments are herein described, by way of example only, with reference to the accompanying drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0047] The following detailed description is of the best currently contemplated modes of carrying out the invention. The description is not to be taken in a limiting sense, but is made merely for the purpose of illustrating the general principles of the invention.
[0048] Several embodiments of the invention relate to processing three-dimensional images of nervous system structures in the vicinity of bones, such as nerves of the extremities (arms and legs), the cervical, thoracic or lumbar plexus, the spinal cord (protected by the spinal column), nerves of the peripheral nervous system, cranial nerves, and others. The invention is presented below using the example of the spine as a bone in the vicinity of (and at least partially protecting) nervous system structures, but the method and system can equally well be applied to nervous system structures in the vicinity of other bones.
[0049] Moreover, the invention may include, before segmentation, pre-processing of low-quality images to improve their quality. This can be done by employing the method presented in European patent application EP16195826 by the present applicant, or any other quality-improving pre-processing method. The low-quality images may be, for example, low-dose computed tomography (LDCT) images or magnetic resonance images captured with a relatively low-power scanner.
[0050] The following description presents examples related to computed tomography (CT) images, but a skilled person will realize how to adapt the embodiments to other image types, such as magnetic resonance images.
[0051] The nerve structure identification method presented herein comprises, in certain embodiments, two main procedures: 1) human-assisted (manual) training, and 2) autonomous computer segmentation.
[0052] The training procedure begins in step 101 with receiving a set of DICOM scan images of the anatomical region to be analyzed.
[0053] Next, the received images are processed in step 102 to perform autonomous segmentation of tissues, in order to determine separate areas corresponding to different parts of the bony structure, such as the vertebral body 16, pedicles 15, transverse processes 14, and/or spinous process 11.
[0054] Then, in step 103, the information from the original DICOM images and the segmentation results is merged to obtain a combined image comprising information about the tissue appearance and its classification (including the assignment of structure parts to classes corresponding to different anatomy parts), for example in the form of a color-coded DICOM image 17.
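By way of illustration only, this merging step can be sketched in Python/NumPy as stacking the raw intensities and the bony structure labels into one multi-channel volume; the function name and the two-channel layout are assumptions for the sketch, not part of the disclosed system:

```python
import numpy as np

def merge_scan_and_segmentation(scan_hu: np.ndarray,
                                bone_labels: np.ndarray) -> np.ndarray:
    """Stack raw tissue appearance (e.g., Hounsfield units) and bony
    structure class labels so that each voxel carries both kinds of data."""
    assert scan_hu.shape == bone_labels.shape  # both volumes are (Z, Y, X)
    return np.stack([scan_hu.astype(np.float32),
                     bone_labels.astype(np.float32)], axis=0)  # (2, Z, Y, X)
```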
[0055] Next, in step 104, a 3D region of interest (ROI) 18 is determined from the set of slice images; it contains, for example, the volume of one vertebral level together with a part of the surrounding tissues, including the nervous system structures and other structures such as muscles, vessels, ligaments, intervertebral discs, joints, cerebrospinal fluid, and others.
[0056] Then, in step 105, a 3D resizing of the determined ROI 18 is performed so that all ROIs have the same size when stacked into 3D matrices, each containing information about the voxel distribution along the X, Y and Z axes together with the appearance and classification data of the bony structure (resizing 19A).
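A minimal sketch of such resizing using scipy.ndimage.zoom, under the assumed two-channel (appearance + classification) layout; the target size is illustrative:

```python
import numpy as np
from scipy.ndimage import zoom

def resize_roi(roi: np.ndarray, target=(64, 64, 64)) -> np.ndarray:
    """Resize a (2, Z, Y, X) ROI to a fixed spatial size so that all ROIs
    stack into 3D matrices of identical dimensions."""
    factors = [t / s for t, s in zip(target, roi.shape[1:])]
    appearance = zoom(roi[0], factors, order=1)  # trilinear for intensities
    labels = zoom(roi[1], factors, order=0)      # nearest-neighbour keeps class IDs intact
    return np.stack([appearance, labels], axis=0)
```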
[0057] Next, in step 106, a training database is prepared that comprises the previously determined ROIs and the corresponding nervous system structures manually segmented by a human.
[0058] Next, in step 107, the training database is augmented, for example with generic 3D geometric transformations and resizing with dense 3D grid deformations (data augmentation 20).
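A minimal sketch of one such augmentation, applying the identical random transformation to an ROI and its manually segmented mask so the input/output pair stays aligned; a rigid rotation and flip stand in here for the dense 3D grid deformation, which would typically be implemented with scipy.ndimage.map_coordinates:

```python
import numpy as np
from scipy.ndimage import rotate

def augment_pair(image: np.ndarray, mask: np.ndarray,
                 rng: np.random.Generator):
    """Apply one random 3D transformation identically to a (Z, Y, X) ROI
    and its segmentation mask."""
    angle = float(rng.uniform(-10.0, 10.0))
    image_t = rotate(image, angle, axes=(1, 2), order=1, reshape=False)
    mask_t = rotate(mask, angle, axes=(1, 2), order=0, reshape=False)
    if rng.random() < 0.5:  # random left-right flip, applied to both volumes
        image_t, mask_t = image_t[..., ::-1].copy(), mask_t[..., ::-1].copy()
    return image_t, mask_t
```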
[0059] Then, in step 108, a convolutional neural network (CNN) is trained on the manually segmented images to segment the nervous system structures. In certain embodiments, a network with a plurality of layers can be used, specifically a combination of convolutional layers with ReLU activation functions, or with any other non-linear or linear activation functions. For example, a network such as the segmentation CNN 400 described below can be used.
[0060] The segmentation procedure mirrors the initial steps of the training procedure, performed autonomously on new data: a set of scan images is received, the bony structure is segmented, the raw images and the bony structure segmentation data are merged, a 3D ROI is determined, and the ROI is resized to the input size of the trained network.
[0061] Next, in step 306, the nervous system structures are autonomously segmented by processing the resized ROI, by means of the pre-trained nervous-system-structure segmentation CNN 400, to determine the 3D size and shape of the nervous system structure(s).
[0062] In step 307, the information about the global coordinate system (the ROI position in the DICOM dataset) and the local ROI coordinate system (the size, shape and position of the segmented nervous system structures inside the ROI) is recombined.
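A sketch of this recombination, assuming the ROI's origin within the scan volume and its pre-resizing shape were recorded when the ROI was extracted; the names are illustrative:

```python
import numpy as np
from scipy.ndimage import zoom

def roi_mask_to_global(roi_mask: np.ndarray, roi_origin: tuple,
                       roi_shape: tuple, scan_shape: tuple) -> np.ndarray:
    """Map a mask from the local (resized) ROI coordinate system back
    into the global coordinate system of the full 3D scan volume."""
    factors = [o / r for o, r in zip(roi_shape, roi_mask.shape)]
    local = zoom(roi_mask, factors, order=0)   # undo the ROI resizing
    out = np.zeros(scan_shape, dtype=local.dtype)
    z0, y0, x0 = roi_origin                    # ROI position in the DICOM dataset
    dz, dy, dx = local.shape
    out[z0:z0 + dz, y0:y0 + dy, x0:x0 + dx] = local
    return out
```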
[0063] Next, in step 308, the output, including the segmented nervous system structures, is visualized.
[0064] Anatomical knowledge of the position, size, and shape of the nervous system structure(s) allows real-time detection of possible collisions between a medical device and the nervous system structure(s).
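One possible realization of such real-time collision detection, sketched under assumptions not stated in the disclosure (the safety margin and names are illustrative): precompute a distance map to the segmented nerves once, then compare a tracked tool position against it.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def nerve_clearance_mm(nerve_mask: np.ndarray, voxel_spacing_mm) -> np.ndarray:
    """For every voxel, the distance in millimetres to the nearest
    segmented nervous system structure (zero inside the structure)."""
    return distance_transform_edt(nerve_mask == 0, sampling=voxel_spacing_mm)

def collision_warning(clearance: np.ndarray, tool_tip_voxel,
                      safety_margin_mm: float = 3.0) -> bool:
    """True when the tracked medical device tip is within the margin."""
    return clearance[tuple(tool_tip_voxel)] < safety_margin_mm
```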
[0065] The architecture of the segmentation CNN 400 is described below.
[0066] One or more 3D ROIs can be presented to the input layer of the network to learn reasoning from the data.
[0067] The convolution layers 401 can be standard, dilated, or hybrids thereof, with ReLU, leaky ReLU, or any other kind of activation function attached.
[0068] The upsampling or deconvolution layers 403 can likewise be standard, dilated, or hybrids thereof, with a ReLU or leaky ReLU activation function attached.
[0069] The output layer 405 denotes a densely connected layer with one or more hidden layers and a softmax or sigmoid stage connected as the output.
[0070] The encoding-decoding flow is supplemented with skip connections between layers of corresponding sizes (resolutions), which improves performance through information merging. The decoder can either reuse max-pooling indices from the corresponding encoder stage or learn deconvolution filters to upsample.
[0071] The general CNN architecture can be adapted to accommodate ROIs of different sizes. The number of layers and the number of filters within a layer may also change depending on the anatomical areas to be segmented.
[0072] The final layer for binary segmentation recognizes two classes: 1) the nervous system structure, and 2) the background.
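A deliberately small PyTorch sketch of such an encoder-decoder with a single skip connection, learned deconvolution upsampling, and a binary output head; it assumes the two-channel (appearance + bony classification) input described above, and the dense output stage 405 is simplified here to a 1x1x1 convolution. The real CNN 400 would use more stages and filters.

```python
import torch
import torch.nn as nn

class TinyUNet3D(nn.Module):
    """Minimal encoder-decoder with one skip connection (illustrative only)."""
    def __init__(self, in_ch: int = 2, base: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(in_ch, base, 3, padding=1),
                                 nn.ReLU(inplace=True))
        self.down = nn.Sequential(nn.MaxPool3d(2),
                                  nn.Conv3d(base, base * 2, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose3d(base * 2, base, 2, stride=2)  # learned upsampling
        self.dec = nn.Sequential(nn.Conv3d(base * 2, base, 3, padding=1),
                                 nn.ReLU(inplace=True))
        self.head = nn.Conv3d(base, 1, 1)  # binary: nerve vs. background (sigmoid in loss)

    def forward(self, x):
        e = self.enc(x)                      # encoder features, full resolution
        d = self.up(self.down(e))            # downsample, process, upsample back
        d = self.dec(torch.cat([e, d], 1))   # skip connection merges resolutions
        return self.head(d)
```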
[0073] Additionally, Select-Attend-Transfer (SAT) gates or generative adversarial networks (GANs) can be used to increase the final quality of the segmentation. Introducing SAT gates into the encoder-decoder neural network focuses the network on the most important tissue features and their localization while decreasing memory consumption. GANs, in turn, can be used to produce new artificial training examples.
[0074] The semantic segmentation is capable of recognizing multiple classes, each representing a part of the anatomy. For example, the nervous system structure may include nerves of the upper and lower extremities, the cervical, thoracic or lumbar plexus, the spinal cord, nerves of the peripheral nervous system (e.g., the sciatic nerve, the median nerve, the brachial plexus), cranial nerves, and others.
[0075] The training process of the segmentation CNN proceeds as follows.
[0076] The training starts at 501. At 502, batches of training 3D images (ROIs) are read from the training set, one batch at a time. For the segmentation, 3D images (ROIs) represent the input of the CNN, and the corresponding pre-segmented 3D images (ROIs), which were manually segmented by a human, represent its desired output.
[0077] At 503, the original 3D images (ROIs) can be augmented. Data augmentation is performed on these 3D images (ROIs) to make the training set more diverse. The input and output pair of three-dimensional images (ROIs) is subjected to the same combination of transformations.
[0078] At 504, the original 3D images (ROIs) and the augmented 3D images (ROIs) are then passed through the layers of the CNN in a standard forward pass. The forward pass returns the results, which are then used to calculate at 505 the value of the loss function (i.e., the difference between the desired output and the output computed by the CNN). The difference can be expressed using a similarity metric (e.g., mean squared error, mean average error, categorical cross-entropy, or another metric).
[0079] At 506, weights are updated as per the specified optimizer and optimizer learning rate. The loss may be calculated using a per-pixel cross-entropy loss function and the Adam update rule.
[0080] The loss is also back-propagated through the network, and the gradients are computed. Based on the gradient values, the network weights are updated. The process, beginning with the batch read of 3D images (ROIs), is repeated continuously until the end of the training epoch is reached at 507.
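In PyTorch terms, one such optimization step, using a per-voxel binary cross-entropy loss and the Adam update rule, might look like this; it builds on the TinyUNet3D sketch above and the hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

model = TinyUNet3D()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()  # per-voxel cross-entropy for two classes

def train_step(rois: torch.Tensor, masks: torch.Tensor) -> float:
    """One forward/backward pass; masks are float tensors shaped like the logits."""
    optimizer.zero_grad()
    logits = model(rois)           # forward pass through the CNN
    loss = loss_fn(logits, masks)  # desired vs. computed output
    loss.backward()                # back-propagate, compute gradients
    optimizer.step()               # Adam weight update
    return loss.item()
```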
[0081] Then, at 508, the performance metrics are calculated using a validation dataset, which is not used in the training set. This is done to check at 509 whether or not the model has improved. If it has not, the early stop counter is incremented by one at 514, as long as its value has not reached a predefined maximum number of epochs at 515; the training process terminates at 516 once no further improvement is obtained. When the model has improved, it is saved at 510 for further use, and the early stop counter is reset at 511. As the final step in a session, learning rate scheduling can be applied. The sessions at which the rate is to be changed are predefined; once one of these session numbers is reached at 512, the learning rate is set to the value associated with that specific session number at 513.
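The early stopping and learning-rate scheduling logic described above can be condensed into a loop of roughly this shape, reusing the model, optimizer, and train_step from the previous sketch; the patience value and milestone epochs are assumptions:

```python
def fit(train_loader, validate, patience: int = 10,
        lr_schedule: dict = {30: 1e-5, 60: 1e-6}) -> None:
    """Train until the validation metric stops improving."""
    best, no_improve, epoch = float("inf"), 0, 0
    while no_improve < patience:          # stop when no further improvement
        epoch += 1
        for rois, masks in train_loader:
            train_step(rois, masks)
        val_loss = validate()             # metric on the held-out validation set
        if val_loss < best:
            best, no_improve = val_loss, 0
            torch.save(model.state_dict(), "best_model.pt")  # keep the improved model
        else:
            no_improve += 1               # increment the early stop counter
        if epoch in lr_schedule:          # predefined scheduling milestones
            for group in optimizer.param_groups:
                group["lr"] = lr_schedule[epoch]
```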
[0082] Once the training process is complete, the network can be used for inference (i.e., utilizing a trained model for autonomous segmentation of new medical images).
[0083] The inference process, in which the trained network segments new images, proceeds as follows.
[0084] After inference is invoked at 601, a set of scans (three-dimensional images) is loaded at 602, and the segmentation CNN 400 and its weights are loaded at 603.
[0085] At 604, one batch of three-dimensional images (ROIs) at a time is processed by the inference server.
[0086] At 605, the images are preprocessed (e.g., normalized, cropped, etc.) using the same parameters that were utilized during training. In at least some implementations, inference-time distortions are applied and the average inference result is taken over, for example, 10 distorted copies of each input 3D image (ROI). This makes the inference results robust to small variations in brightness, contrast, orientation, etc.
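A minimal test-time-augmentation sketch of this averaging, assuming random brightness/contrast jitter as the distortion and a two-channel input whose first channel is the intensity:

```python
import torch

@torch.no_grad()
def predict_with_tta(model, roi: torch.Tensor, n_copies: int = 10,
                     jitter: float = 0.02) -> torch.Tensor:
    """Average sigmoid predictions over randomly distorted copies of one ROI."""
    probs = []
    for _ in range(n_copies):
        gain = 1.0 + jitter * torch.randn(1).item()   # contrast jitter
        bias = jitter * torch.randn(1).item()         # brightness jitter
        distorted = roi.clone()
        distorted[:, 0] = roi[:, 0] * gain + bias     # jitter intensity channel only
        probs.append(torch.sigmoid(model(distorted)))
    return torch.stack(probs).mean(dim=0)
```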
[0087] At 606, a forward pass through the segmentation CNN 400 is computed.
[0088] At 607, the system may perform post-processing such as linear filtering (e.g., Gaussian filtering) or nonlinear filtering (e.g., median filtering, and morphological opening or closing).
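Using scipy.ndimage, such post-processing might be sketched as follows; the threshold and filter sizes are illustrative:

```python
import numpy as np
from scipy.ndimage import (gaussian_filter, median_filter,
                           binary_opening, binary_closing)

def postprocess(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Smooth the CNN output and clean up the resulting binary mask."""
    smoothed = gaussian_filter(prob, sigma=1.0)  # linear filtering
    smoothed = median_filter(smoothed, size=3)   # nonlinear filtering
    mask = smoothed > threshold
    mask = binary_opening(mask)                  # morphological opening: drop specks
    return binary_closing(mask)                  # morphological closing: fill holes
```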
[0089] At 608, if not all batches have been processed, a new batch is added to the processing pipeline until inference has been performed on all input 3D images (ROIs).
[0090] Finally, at 609, the inference results are saved and can be combined into a segmented 3D anatomical model. The model can be further converted to a polygonal mesh for the purpose of visualization. The volume and/or mesh representation parameters can be adjusted in terms of color, opacity, and mesh decimation, depending on the needs of the operator.
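One common way to perform this mask-to-mesh conversion is the marching cubes algorithm, sketched here with scikit-image; the voxel spacing is an assumed input:

```python
import numpy as np
from skimage import measure

def mask_to_mesh(mask: np.ndarray, spacing=(1.0, 1.0, 1.0)):
    """Extract a triangle mesh from a binary 3D segmentation for display."""
    verts, faces, normals, _ = measure.marching_cubes(
        mask.astype(np.float32), level=0.5, spacing=spacing)
    return verts, faces, normals
```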
[0094] The functionality described herein can be implemented in a computer-implemented system 900.
[0095] The computer-implemented system 900, for example a machine-learning system, may include at least one non-transitory processor-readable storage medium 910 that stores at least one of processor-executable instructions 915 or data, and at least one processor 920 communicably coupled to the at least one non-transitory processor-readable storage medium 910. The at least one processor 920 may be configured (by executing the instructions 915) to perform the steps of the methods described herein.
[0096] While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications, and other applications of the invention may be made. Therefore, the claimed invention as recited in the claims is not limited to the embodiments described herein.