GPU-based system for performing 2D-3D deformable registration of a body organ using multiple 2D fluoroscopic views
11164324 · 2021-11-02
Assignee
Inventors
Cpc classification
A61B6/0492
HUMAN NECESSITIES
A61B6/5235
HUMAN NECESSITIES
A61B6/5205
HUMAN NECESSITIES
A61B6/4476
HUMAN NECESSITIES
A61B6/547
HUMAN NECESSITIES
G06T11/008
PHYSICS
G06T7/30
PHYSICS
A61B6/12
HUMAN NECESSITIES
International classification
G06T7/30
PHYSICS
A61B6/00
HUMAN NECESSITIES
A61B6/04
HUMAN NECESSITIES
A61B6/12
HUMAN NECESSITIES
Abstract
Systems and methods for assisting a physician in a medical intervention comprise performing a 2D-3D deformable registration and, more particularly, performing a 2D-3D registration based on multiple live 2D fluoroscopic views, implemented on a multi-core processing framework such as a Graphics Processing Unit.
Claims
1. A system for deformably registering pre-acquired 3D CT image data of a patient to real time 3D data of the patient comprising: a first memory for receiving 3D CT image data of a body organ of the patient; a second memory for receiving real-time image data of the patient from an imaging unit; wherein the real-time image data comprises multiple real-time images obtained at different view angles; a processor framework operable to: (a) generate a candidate 3D deformation field based on one or more deformation parameters and the 3D CT image data; (b) create at least one digitally reconstructed radiograph (DRR) for each real-time image of the multiple real-time images based on the candidate 3D deformation field and the different view angles corresponding to the multiple real-time images; (c) determine an optimal deformation field by evaluating each DRR and multiple real-time images for a match; and if a match is not reached, adjust the one or more deformation parameters and repeat the above processor framework steps; or if a match is reached, output the most-recently generated candidate deformation field as the optimal deformation field.
2. The system of claim 1, further comprising the imaging unit, and wherein the imaging unit is one selected from the group consisting of a C-Arm, Biplanar fluoroscopy, 3D fluoroscopy, and Cone Beam CT.
3. The system of claim 1, wherein the processor framework is operable to compute a cost function based on the candidate 3D deformation field and the real-time image data.
4. The system of claim 3, wherein the processor framework is further operable to compute a gradient vector based on the cost function.
5. The system of claim 4, wherein the processor framework comprises a GPU, and the computations of the cost function and gradient vector are performed on the GPU.
6. The system of claim 5, wherein the processor framework is operable to implement a learning-based or BFGS algorithm to determine the optimal deformation field based on the cost function and the gradient vector.
7. The system of claim 5, wherein the deformation parameters comprise a plurality of control points, and wherein the processor framework is operable to compute, in parallel, affected regions of the 3D CT image data for the plurality of control points.
8. The system of claim 1, wherein the processor framework is operable to compute the deformation field based on images arising from different breathing levels using 4D CT.
9. A computer-implemented method for assisting a physician in a medical intervention on a patient comprising: receiving pre-acquired 3D image data of a body organ of the patient prior to the medical intervention; receiving real-time image data of the body organ, wherein the real-time image data comprises multiple real-time images obtained at different view angles; and performing, on a processing framework, computations to determine a deformable registration on the pre-acquired 3D image data and the real-time data from the receiving step, and wherein the processing framework outputs an optimal 3D deformation field; wherein the performing comprises: generating a candidate 3D deformation field based on one or more deformation parameters and the 3D image data; creating at least one digitally reconstructed radiograph (DRR) for each real-time image of the multiple real-time images based on the candidate deformation field and the different view angles corresponding to the multiple real-time images; determining the optimal deformation field by evaluating each DRR and the multiple real-time images for a match; and if a match is not reached, adjust the one or more deformation parameters and repeat the above performing steps; or if a match is reached, output the optimal deformation field.
10. The method of claim 9 wherein the performing step includes computing a cost function value and a gradient vector to determine the deformable registration on the pre-acquired 3D image data.
11. The method of claim 10 comprising inputting the cost function value and the gradient vector into an optimizing algorithm.
12. The method of claim 11 comprising computing, in parallel, affected regions of the pre-acquired 3D data for a plurality of control points.
13. The method of claim 12 wherein computing the gradient vector further includes generating a first lookup table for pixel IDs in the affected regions for each control point.
14. The method of claim 13 wherein the multiple parallel computations are carried out on a graphics processing unit (GPU).
15. The method of claim 14 wherein the optimizing algorithm is a learning or limited-memory BFGS based algorithm.
16. The method of claim 9, wherein the receiving real time 3D data of the body organ is carried out with a unit for obtaining images selected from the group consisting of a C-Arm, Biplanar fluoroscopy, 3D fluoroscopy, and Cone Beam CT.
17. The method of claim 9, wherein the performing step uses multiple fluoroscopic views at a similar breathing level.
18. The method of claim 9, wherein the performing step is performed at different breathing levels using 4D CT.
19. The system of claim 1, wherein the evaluating each DRR and multiple real-time images for a match is performed simultaneously.
20. A computer-implemented method for assisting a physician in a medical intervention on a patient comprising: receiving pre-acquired 3D image data of a body organ of the patient prior to the medical intervention; receiving real-time image data of the body organ; and performing, on a processing framework, computations to determine a deformable registration on the pre-acquired 3D image data and the real-time image data from the receiving steps, and wherein the processing framework computes a 3D deformation field based on an optimizing algorithm; wherein the performing step comprises computing a cost function value and a gradient vector to determine the deformable registration on the pre-acquired 3D image data; computing, in parallel, affected regions of the pre-acquired 3D image data for a plurality of control points; wherein computing the gradient vector includes generating a first lookup table for pixel IDs in the affected regions for each control point; and inputting the cost function value and the gradient vector into the optimizing algorithm.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
(8) Before the present invention is described in detail, it is to be understood that this invention is not limited to particular variations set forth herein as various changes or modifications may be made to the invention described and equivalents may be substituted without departing from the spirit and scope of the invention. As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process act(s) or step(s) to the objective(s), spirit or scope of the present invention. All such modifications are intended to be within the scope of the claims made herein.
(9) Methods recited herein may be carried out in any order of the recited events which is logically possible, as well as the recited order of events. Furthermore, where a range of values is provided, it is understood that every intervening value, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the invention. Also, it is contemplated that any optional feature of the inventive variations described may be set forth and claimed independently, or in combination with any one or more of the features described herein.
(10) All existing subject matter mentioned herein (e.g., publications, patents, patent applications and hardware) is incorporated by reference herein in its entirety except insofar as the subject matter may conflict with that of the present invention (in which case what is present herein shall prevail). The referenced items are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such material by virtue of prior invention.
(11) Reference to a singular item includes the possibility that there are plural of the same items present. More specifically, as used herein and in the appended claims, the singular forms “a,” “an,” “said” and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation. Last, it is to be appreciated that unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
(12) Described herein are systems and methods for assisting a physician to perform a 2D-3D deformable registration, and more particularly, a 2D-3D registration using multiple 2D fluoroscopic views implemented on a Graphics Processing Unit (GPU), or a similar programmable logic chip, processor, or parallel architecture.
(13) Overview
(15) With reference to
(16) A fluoroscope 40 takes real time fluoroscopic video of the subject. Video frames of the video or images are collected or received by workstation 50 for processing. Real time images may also be displayed on a video monitor 62. Examples of the fluoroscope unit, without limitation, are C-Arm, Biplanar fluoroscopy, 3D fluoroscopy, Cone Beam CT, etc.
(17) The location and pose of the fluoroscopy camera 42 are tracked with a tracking sensor 44. In the fluoroscopy unit shown in
(18) Additionally, the system can work without a tracking mechanism. The 3D camera positions of the fluoroscope relative to each other can be given or computed separately. For example, relative camera positions can be given by the fluoroscope unit itself, or can be computed by performing a rigid 2D-3D registration for each fluoroscopic image separately, or can be computed by observing 2D location of some fixed radio-opaque markers in fluoroscopic images.
(19) Workstation 50, as will be described in more detail below, is configured to receive the fluoroscopy images from the fluoroscopy unit 40 in real-time.
(21) The workstation 50 shown in
(22) As will be discussed in more detail herein, the workstation 50 includes at least one processor 70 (e.g., without limitation, a GPU or other multi-core processing framework) operable to perform a deformable registration in real time based on, amongst other things, the real-time fluoroscopy images.
(23) Workstation 50 is also shown having a memory device 80 which holds or stores information including imaging, device, marker, and procedural data. The memory device may be a hard drive, for example.
(24) The workstation 50 shown in
(25) The system 90 shown in
(26) It is to be understood, however, that although the system in
(27) The system 90 shown in
(29) Multiple DRRs are created 122, 124 from the deformed 3D image data 116.
(30) Next, and as will be described in more detail herein, the DRRs 122, 124 are compared 130 to the multiple fluoroscopic images 132, 134.
(31) When comparing a virtual fluoroscopic 2D image or the DRR with its candidate live fluoroscopic image, the DRR is produced according to the recorded view angle from the fluoroscope as discussed above in connection with
(32) If the images match, the deformed CT and the DRRs are output 142. Otherwise, (e.g., if the images do not match to a desired level of accuracy), the control points 126 are adjusted 144 to deform the 3D image data.
(33) Registering Multiple Fluoroscopic Images
(34) In embodiments, the step of matching or comparing the multiple fluoroscopic images to the 3D image data comprises determining the 3D deformation field U.
(35) In embodiments, the 3D deformation field U is solved as an optimization problem comprising the step of computing a cost function. The following is an example of a cost function:
J(CT,Fluo,U)=Metric(DRR(U(CT)),Fluo) (1)
(36) in which CT represents a 3D CT image and Fluo represents one large fluoroscopic image constituted by placing the fluoroscopic images one after another. U is the unknown deformation field, and U(CT) represents the deformed CT. DRR represents a large DRR constituted by placing the DRRs at different views one after another. A DRR may be produced by a ray casting method, quite similar to CT imaging. It is also possible to add additional constraints in (1) on the deformation field U, which may be based on properties of anatomical tissues in the 3D image. Examples of such anatomical properties, without limitation, include the rigidity of the spinal column, ribs, and lung tissues.
(37) For the Metric, to match the DRR with Fluo, we can use the normalized cross-correlation (NCC), since it works well in same-modality registration,
(38) NCC(DRR,Fluo)=Σ.sub.i(DRR.sub.i−μ.sub.DRR)(Fluo.sub.i−μ.sub.Fluo)/√(Σ.sub.i(DRR.sub.i−μ.sub.DRR).sup.2·Σ.sub.i(Fluo.sub.i−μ.sub.Fluo).sup.2) (2)
(39) in which DRR.sub.i and Fluo.sub.i are the i-th pixel intensities, and μ.sub.DRR and μ.sub.Fluo are the corresponding mean intensities over all pixels.
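By way of illustration only, the NCC metric of equation (2) can be sketched in a few lines of Python/NumPy (the function name and array handling are illustrative, not part of any claimed implementation):

```python
import numpy as np

def ncc(drr: np.ndarray, fluo: np.ndarray) -> float:
    """Normalized cross-correlation between two equally sized images,
    treated as flat 1-D intensity vectors."""
    a = drr.ravel().astype(np.float64)
    b = fluo.ravel().astype(np.float64)
    a -= a.mean()  # subtract mean intensities
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom)
```

NCC equals 1 for identical images and is invariant to affine intensity changes, which is why it is suitable for same-modality comparison of a DRR against a fluoroscopic image.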
(40) In one preferred GPU implementation of the cost and the gradient, the standard NCC equation (2) is adapted to suit the GPU architecture.
(41) In embodiments, the deformation is modeled as a Free-Form Deformation (FFD), which is a piece-wise cubic polynomial. (Lee S, Wolberg G, Shin S Y. Scattered data interpolation with multilevel B-splines. IEEE Transactions on Visualization and Computer Graphics 1997; 3: 228-44.) The FFD is manipulated by a regular control grid with spacing s.sub.x×s.sub.y×s.sub.z for a 3D image. FFD is computationally efficient because the deformation at any point is influenced only by its surrounding 4×4×4 control points.
(42) For a point p with coordinates (x,y,z), assume its 4×4×4 control points are p.sub.ijk, i, j, k=0 . . . 3. Denote d.sub.ijk as the displacement vector associated with the control point p.sub.ijk; the displacement at point p is then defined as,
(43) u(p)=Σ.sub.i=0.sup.3Σ.sub.j=0.sup.3Σ.sub.k=0.sup.3B.sub.i(u)B.sub.j(v)B.sub.k(w)d.sub.ijk (3)
(44) where u, v, and w are the relative coordinates of p within its control cell, and
(45) B.sub.i is the i-th basis function of B-splines. (Lee S, Wolberg G, Shin S Y. Scattered data interpolation with multilevel B-splines. IEEE Transactions on Visualization and Computer Graphics 1997; 3: 228-44.)
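By way of illustration only, the tensor-product FFD displacement described above can be sketched as follows (the function names and the (4,4,4,3) layout of the control-point displacements are illustrative assumptions):

```python
import numpy as np

def bspline_basis(t: float) -> np.ndarray:
    """The four cubic B-spline basis functions B_0..B_3 evaluated at t in [0,1)."""
    return np.array([
        (1 - t) ** 3 / 6.0,
        (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0,
        (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0,
        t ** 3 / 6.0,
    ])

def ffd_displacement(d: np.ndarray, u: float, v: float, w: float) -> np.ndarray:
    """Displacement at a point with local cell coordinates (u, v, w),
    given d: a (4, 4, 4, 3) array of surrounding control-point displacements."""
    bu, bv, bw = bspline_basis(u), bspline_basis(v), bspline_basis(w)
    # Tensor-product weighting of the 4x4x4 control displacements.
    return np.einsum('i,j,k,ijkc->c', bu, bv, bw, d)
```

Because the four basis functions sum to one, a uniform displacement of all 64 control points translates the point rigidly, which is a convenient sanity check on any implementation.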
(46) We can denote the vector U as the concatenation of d.sub.ijk.
(47) Cost function (1) is optimized by an L-BFGS optimizer, which is believed to be more efficient than a simple gradient descent optimizer for high-dimensional optimization problems. See, for example, Nocedal J and Wright S J. Numerical optimization. New York: Springer Verlag; 1999: 221-26.
(48) L-BFGS is a gradient-based optimizer; both the gradient and the cost are needed by the optimizer. For the cost function (1), the gradient can be calculated since the image, the deformation, and the DRR projection can each be represented analytically. Various methods may be used to calculate the gradient; an exemplary method is central differencing. Ames, W. F., (1977). Numerical Methods for Partial Differential Equations, Section 1.6. Academic Press, New York. ISBN 0-12-056760-1.
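By way of illustration only, central differencing over a parameter vector can be sketched as below (the helper name and step size are illustrative; an actual implementation would evaluate the cost through the GPU pipeline rather than a Python callable):

```python
import numpy as np

def central_difference_gradient(cost, u: np.ndarray, dx: float = 1e-4) -> np.ndarray:
    """Numerical gradient of cost(u) by central differences:
    g_i = (cost(u + dx*e_i) - cost(u - dx*e_i)) / (2*dx)."""
    g = np.zeros_like(u, dtype=np.float64)
    for i in range(u.size):
        e = np.zeros_like(u, dtype=np.float64)
        e.flat[i] = dx  # disturb one component at a time
        g.flat[i] = (cost(u + e) - cost(u - e)) / (2.0 * dx)
    return g
```

The gradient vector produced this way, together with the cost value, is exactly what a gradient-based optimizer such as L-BFGS consumes at each iteration.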
(49) GPU Implementation
(50) In embodiments, a GPU implementation of 2D-3D deformable registration uses OpenCL 1.1 on a GeForce GTX TITAN Black GPU (manufactured by Nvidia Corporation, Santa Clara, Calif.). OpenCL 1.1 is a parallel programming language for GPUs and is described in, for example, John E. Stone, David Gohara, and Guochun Shi. 2010. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems. IEEE Des. Test 12, 3 (May 2010), 66-73.
(53) The cost value 202 may be calculated using a series of steps (namely, the cost pipeline 212). The gradient is calculated using another series of steps (namely, the gradient pipeline 214).
(54) Cost Pipeline
(56) In the deforming step 220, the CT image is deformed 220 according to the current deformation field U. This step 220 may be carried out by a program or kernel as described in, for example, Modat, Marc et al., Fast free-form deformation using graphics processing units, Computer Methods and Programs in Biomedicine, Volume 98, Issue 3, 278-284.
(57) In the generating step 222, the DRR is generated. A program or DRRGenerator kernel may be used to produce the DRR based on the deformed image as described in, for example, Tornai G J, Cserey G, Pappas I, Fast DRR generation for 2D to 3D registration on GPUs, MedPhys. 2012 August; 39(8):4795-9. doi: 10.1118/1.4736827.
(58) Next, the partial sum is computed by a program or kernel 224. The inputs of this program are the DRR and the Fluoroscopy, and the output is the Partial sum vector. In embodiments, the kernel proceeds as follows: load the DRR and Fluoro into global memory; load the DRR and Fluoro into share memory; in a loop, calculate the sum for one pair in a binary fashion; and output the result of each group to the global memory.
(59) More specifically, the partial sum may be computed using an expanded form of the normalized cross-correlation. The applicable normalized cross-correlation (NCC) is:
(60) NCC=(N·ΣDRR.sub.iFluo.sub.i−ΣDRR.sub.i·ΣFluo.sub.i)/√((N·ΣDRR.sub.i.sup.2−(ΣDRR.sub.i).sup.2)·(N·ΣFluo.sub.i.sup.2−(ΣFluo.sub.i).sup.2)) (4)
(61) In equation (4), there are five different summation operations, which can be implemented using GPU reduction operations (Mark Harris, Optimizing Parallel Reduction in CUDA, NVIDIA Developer Technology, http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/doc/reduction.pdf). In embodiments an 8-component vector is employed. The first 5 components are used to store the summations.
(62) Preferably, the reduction on the 8-component vector is performed only once. In GPU reduction, the vector is divided into array size/work-group size parts, and each thread group is responsible for the summation of its corresponding part. For instance, if the vector size is 256×256, after reduction the output is a 256-element vector with each component storing a partial summation.
(63) Global synchronization can be performed using the kernel call. For example, after kernel PartialINCC 224, the array size/work-group size partial results will appear in GPU 228 global memory. These partial results may be fed into the CPU 226 to finish the final summation. The cost value 202 is sent to L-BFGS optimizer 210. Some intermediate results of the NCC are sent to the gradient pipeline for incremental calculation of the cost in the gradient calculation, as described herein.
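The two-stage reduction described above (per-work-group partial sums on the GPU, final summation on the CPU) can be mimicked on the CPU for illustration only (the group size and zero-padding strategy are illustrative assumptions, not the claimed kernel):

```python
import numpy as np

def partial_sums(values: np.ndarray, group_size: int = 256) -> np.ndarray:
    """Stage 1 (per work-group, on the GPU): each group reduces its slice
    to one partial sum, yielding array_size / group_size partial results."""
    v = values.ravel().astype(np.float64)
    pad = (-v.size) % group_size          # zero-pad so groups divide evenly
    v = np.concatenate([v, np.zeros(pad)])
    return v.reshape(-1, group_size).sum(axis=1)

def final_sum(partials: np.ndarray) -> float:
    """Stage 2 (on the CPU): finish the reduction over the partial results."""
    return float(partials.sum())
```

For a 256×256 input with a work-group size of 256, stage 1 yields the 256 partial sums described above, and stage 2 collapses them into the final summation.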
(64) Gradient Pipeline
(65) As mentioned above, one algorithm to calculate the gradient 204 is central differences. For simplicity, in the following, we only consider the derivative along x direction:
J.sub.x=(J(x+Δx)−J(x−Δx))/(2Δx)
(66) In an embodiment, each GPU thread calculates the derivative for one control point. Each thread can invoke the cost pipeline twice with small disturbance along positive x and negative x directions, respectively.
(67) In such an approach, each thread must hold its own copy of the DRR in global memory, and consequently a large amount of memory is needed. Also, the computation for a single thread to calculate the full cost is heavy.
(68) In another embodiment, therefore, each thread stores only the affected part of the DRR, and the cost is not calculated from scratch.
(69) A program or kernel 240 may be employed to carry out the above-described steps and implemented on GPU 228. For example, a Kernel Gradient 240 may perform the following steps: loop the local DRR region to incrementally calculate the NCC along the positive (+) x direction 241; loop the local DRR region to incrementally calculate the NCC along the negative (−) x direction 241; use central differences to calculate the derivative; and output the gradient vector 204 to the CPU 226.
(71) In embodiments, a lookup table may be built for each control point, in which each entry records a pixel ID of the DRR. This table does not change during the whole optimization procedure and only needs to be calculated once. For example, the kernel LocalDRRPixelIDLTBBuilder can perform this task.
(72) In the Kernel LocalDRRPixelIDLTBBuilder, the input is the FFD control points and the output is a lookup table recording the DRR pixel IDs in the affected region. The kernel proceeds as follows: find the 5×5×5 region in 3D CT space; determine whether each ray intersects the region; and use a gather operation to record all DRR pixel IDs for each 5×5×5 region.
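By way of illustration only, the gather step of this kernel can be sketched as the inversion of a ray-to-region mapping (the ray-box intersection test itself is geometry-dependent and omitted; all names here are illustrative):

```python
from collections import defaultdict

def build_pixel_id_table(rays_to_regions):
    """Invert a mapping {pixel_id: [region ids the ray intersects]} into a
    lookup table {region id: sorted list of affected DRR pixel ids}."""
    table = defaultdict(list)
    for pixel_id, regions in rays_to_regions.items():
        for r in regions:
            table[r].append(pixel_id)  # gather: pixel affects this region
    return {r: sorted(p) for r, p in table.items()}
```

Because the table depends only on the fixed camera geometry and control grid, it is computed once before optimization and reused at every iteration.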
(74) Without being bound to theory, parallelizing on the ray level is desirable because the size of the DRR is much larger than the number of cubes. Parallelizing (namely, performing parallel computations) on the ray level can dramatically reduce the burden of the kernel, which is favored by multi-core processing frameworks such as, for example, a GPU. GPU-based frameworks are well suited for, amongst other things, simultaneously running numerous simple tasks rather than a few complex tasks.
(75) With reference again to
(76) A kernel 234 (e.g., Kernel LocalDRRsGenerator) calculates the DRR pixel values for each control point in parallel. The inputs of this kernel are the LocalDRRPixelIdLookupTable, FFDLookupTable 236, RaySumLookupTable, and CT, and the output is the LocalDRRs 241, which record the updated DRR pixels for all 5×5×5 regions. The steps for this kernel are as follows: get the pixel IDs from the LocalDRRPixelIdLookupTable; for all samples on the ray inside the 5×5×5 region, tri-linearly interpolate the intensity based on the FFDLookupTable and the CT; update the sum of the samples on the ray inside the 5×5×5 region and then update the RaySumLookupTable; and generate the updated DRR (local DRR) for each 5×5×5 region.
(77) The kernel is launched like the LocalDRRPixelIDLTBBuilder, with the same parallel granularity. Each thread redoes the ray casting, and all DRR pixels corresponding to the control point are collected by a gather operation. In this kernel, tri-linear interpolation is performed, which is facilitated by a pre-computed lookup table, FFDLookupTable, an intermediate result of kernel CTDeformation.
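By way of illustration only, the tri-linear interpolation step can be sketched for a unit-spaced volume (the [z, y, x] indexing convention is an illustrative assumption):

```python
import numpy as np

def trilinear(volume: np.ndarray, x: float, y: float, z: float) -> float:
    """Tri-linearly interpolate volume (indexed [z, y, x]) at a fractional position."""
    x0, y0, z0 = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    fx, fy, fz = x - x0, y - y0, z - z0
    c = 0.0
    # Weighted sum over the 8 corners of the enclosing voxel.
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((fx if dx else 1 - fx) *
                     (fy if dy else 1 - fy) *
                     (fz if dz else 1 - fz))
                c += w * volume[z0 + dz, y0 + dy, x0 + dx]
    return c
```

In the kernel described above, the same eight-corner weighting is applied to the deformed CT at each sample along a ray, with the sample positions supplied by the FFDLookupTable.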
(78) Since only part of the DRR is affected by the disturbance of one control point, an incremental method can be used to update the NCC instead of calculating it from scratch.
(79) To reach the incremental calculation, the original NCC equation is rewritten as:
(80) NCC=(N·(S.sub.AB+ΔS.sub.AB)−(S.sub.A+ΔS.sub.A)·S.sub.B)/√((N·(S.sub.AA+ΔS.sub.AA)−(S.sub.A+ΔS.sub.A).sup.2)·(N·S.sub.BB−S.sub.B.sup.2)) (5)
(81) in which S.sub.A=ΣDRR.sub.i, S.sub.B=ΣFluo.sub.i, S.sub.AA=ΣDRR.sub.i.sup.2, S.sub.BB=ΣFluo.sub.i.sup.2, and S.sub.AB=ΣDRR.sub.iFluo.sub.i are the summations of equation (4), and the Δ terms are summed only over the affected DRR pixels (the fluoroscopic image, and hence S.sub.B and S.sub.BB, does not change).
(82) From equation (5), except the Δ terms, every quantity is already available from the original cost computation and does not need to be recomputed.
(83) So, in the kernel Gradient 240, we can first loop over the small affected region, output by kernel DRRGenerationForGradient, to get the sums over the changed DRR pixels; combining these with the original sums and the number of pixels, we can calculate the Δ terms and update the NCC.
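By way of illustration only, the incremental update of equation (5) can be sketched by maintaining the five running sums of equation (4) and correcting only the DRR-dependent ones over the changed pixels (all names are illustrative):

```python
import numpy as np

def ncc_from_sums(s_a, s_b, s_aa, s_bb, s_ab, n):
    """Evaluate NCC from the five running sums of equation (4)."""
    num = n * s_ab - s_a * s_b
    den = np.sqrt((n * s_aa - s_a ** 2) * (n * s_bb - s_b ** 2))
    return num / den

def incremental_ncc(sums, n, fluo, changed_ids, old_drr, new_drr):
    """Update only the DRR-dependent sums over the changed pixels, then
    re-evaluate NCC. `sums` is (s_a, s_b, s_aa, s_bb, s_ab) for the full images;
    the fluoroscopy sums s_b and s_bb never change."""
    s_a, s_b, s_aa, s_bb, s_ab = sums
    for i in changed_ids:
        d_old, d_new = old_drr[i], new_drr[i]
        s_a += d_new - d_old
        s_aa += d_new ** 2 - d_old ** 2
        s_ab += (d_new - d_old) * fluo[i]
    return ncc_from_sums(s_a, s_b, s_aa, s_bb, s_ab, n)
```

The loop runs only over the small affected region, so the per-control-point cost of a gradient evaluation is proportional to the affected pixel count rather than the full DRR size.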
(84) The gradient-difference and pattern-intensity metrics can provide more robust and accurate 2D-3D registration than some alternative metrics. This is supported in the literature. See, for example, Penney, G., Weese, J., Little, J. A., Desmedt, P., Hill, D. L., Hawkes, D. J., 1998. A comparison of similarity measures for use in 2-D-3-D medical image registration. IEEE Transactions on Medical Imaging 17, 586-595. The Gradient kernel 240 is independent of the LocalDRRsGenerator kernel 234; therefore, only the Gradient kernel 240 needs to be modified if another metric is introduced.
(85) In another embodiment, a non-transitory computer readable medium includes a set of instructions for use with a computer to carry out any one or combination of the steps, functions, and methods described herein. In another embodiment, a GPU includes a set of instructions to carry out any one or combination of the steps, functions, and methods described herein.
(86) While preferred embodiments of this disclosure have been shown and described, modifications thereof can be made by one skilled in the art without departing from the scope or teaching herein. The embodiments described herein are exemplary only and are not intended to be limiting. Because many varying and different embodiments may be made within the scope of the present inventive concept, including equivalent structures, materials, or methods hereafter thought of, and because many modifications may be made in the embodiments herein detailed in accordance with the descriptive requirements of the law, it is to be understood that the details herein are to be interpreted as illustrative and not in a limiting sense.