HARMONIZATION OF DIFFUSION MAGNETIC RESONANCE IMAGING

Abstract

A computer system that performs pairwise harmonization is described. During operation, the computer system may receive information specifying magnetic-resonance diffusion measurements, where the magnetic-resonance diffusion measurements are associated with at least an individual and a first site. Then, the computer system may perform the pairwise harmonizing of the magnetic-resonance diffusion measurements based at least in part on target magnetic-resonance diffusion measurements, where the target magnetic-resonance diffusion measurements are associated with a population and one or more target sites, and the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements may be acquired using different magnetic-resonance scanners. Next, the computer system may perform quality control on the pairwise harmonized magnetic-resonance diffusion measurements, where the quality control may include comparing the pairwise harmonized magnetic-resonance diffusion measurements associated with the first site and the one or more target sites.

Claims

1. A computer system, comprising: a computation device; memory configured to store program instructions, wherein, when executed by the computation device, the program instructions cause the computer system to perform one or more operations comprising: receiving information specifying magnetic-resonance diffusion measurements, wherein the magnetic-resonance diffusion measurements are associated with at least an individual and a first site; performing pairwise harmonizing of the magnetic-resonance diffusion measurements based at least in part on target magnetic-resonance diffusion measurements, wherein the target magnetic-resonance diffusion measurements are associated with a population and one or more target sites, and wherein the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements are acquired using different magnetic-resonance scanners; and performing quality control on the pairwise harmonized magnetic-resonance diffusion measurements, wherein the quality control comprises comparing the pairwise harmonized magnetic-resonance diffusion measurements associated with the first site and the one or more target sites, wherein the target magnetic-resonance diffusion measurements from the population are used as a prior for computing of parameters for a moving population associated with the first site.

2. The computer system of claim 1, wherein the magnetic-resonance diffusion measurements comprises measurements associated with the individual and a second individual.

3. The computer system of claim 1, wherein the magnetic-resonance diffusion measurements are associated with a different number of individuals than the target magnetic-resonance diffusion measurements.

4. The computer system of claim 1, wherein the quality control comprises detecting a harmonization error in the pairwise harmonized magnetic-resonance diffusion measurements based at least in part on the pairwise harmonized magnetic-resonance diffusion measurements and a quality-control metric; and wherein the detecting comprises segmenting the pairwise harmonized magnetic-resonance diffusion measurements into a first type of magnetic-resonance diffusion result or a second type of magnetic-resonance diffusion result.

5. The computer system of claim 1, wherein the performing of the pairwise harmonization comprises eliminating one or more outliers in the magnetic-resonance diffusion measurements based at least in part on the pairwise harmonized magnetic-resonance diffusion measurements.

6. The computer system of claim 1, wherein the magnetic-resonance diffusion measurements comprise tractography results.

7. The computer system of claim 1, wherein the magnetic-resonance diffusion measurements comprise: fractional anisotropy, apparent fiber density, an amount of free water, a demyelination metric, one or more brain-region shape metric, a scalar map, or a scalar value.

8. The computer system of claim 1, wherein the pairwise harmonizing reduces or eliminates additive and multiplicative bias in the magnetic-resonance diffusion measurements.

9. The computer system of claim 1, wherein, after performing the pairwise harmonization, the operations comprise pooling the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements.

10. The computer system of claim 1, wherein, after performing the pairwise harmonization, the operations comprise generating a normative reference based at least in part on the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements.

11. The computer system of claim 1, wherein the magnetic-resonance diffusion measurements are associated with clinical operation of a magnetic-resonance scanner.

12. The computer system of claim 1, wherein the pairwise harmonization facilitates personalized medicine for at least the individual.

13. The computer system of claim 1, wherein the performing of the pairwise harmonization comprises: fitting the magnetic-resonance diffusion measurements using a data model; removing covariates and an intercept from the magnetic-resonance diffusion measurements using the data model; after removing the covariates and the intercept, computing variance in the magnetic-resonance diffusion measurements; after computing the variance, correcting the magnetic-resonance diffusion measurements for multiplicative bias; and adding second covariates and a second intercept associated with the target magnetic-resonance diffusion measurements to the magnetic-resonance diffusion measurements.

14. The computer system of claim 13, wherein the data model comprises a nonlinear model.

15. The computer system of claim 13, wherein the variance is computed using maximum likelihood.

16. The computer system of claim 13, wherein the performing of the pairwise harmonization comprises hyper-parameter autotuning with the target magnetic-resonance diffusion measurements to prevent overfitting to the data model.

17. The computer system of claim 1, wherein the performing of the pairwise harmonization comprises supplementing the magnetic-resonance diffusion measurements with additional magnetic-resonance diffusion measurements from a different magnetic-resonance scanner when a number of individuals associated with the magnetic-resonance diffusion measurements is less than a predefined value; and wherein the magnetic-resonance diffusion measurements and the additional magnetic-resonance diffusion measurements have characteristics within a predefined value of each other.

18. A non-transitory computer-readable storage medium for use in conjunction with a computer system, the computer-readable storage medium configured to store program instructions that, when executed by the computer system, causes the computer system to perform one or more operations comprising: receiving information specifying magnetic-resonance diffusion measurements, wherein the magnetic-resonance diffusion measurements are associated with at least an individual and a first site; performing pairwise harmonizing of the magnetic-resonance diffusion measurements based at least in part on target magnetic-resonance diffusion measurements, wherein the target magnetic-resonance diffusion measurements are associated with a population and one or more target sites, and wherein the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements are acquired using different magnetic-resonance scanners; and performing quality control on the pairwise harmonized magnetic-resonance diffusion measurements, wherein the quality control comprises comparing the pairwise harmonized magnetic-resonance diffusion measurements associated with the first site and the one or more target sites, wherein the target magnetic-resonance diffusion measurements from the population are used as a prior for computing of parameters for a moving population associated with the first site.

19. A method for performing pairwise harmonization, comprising: by a computer system: receiving information specifying magnetic-resonance diffusion measurements, wherein the magnetic-resonance diffusion measurements are associated with at least an individual and a first site; performing the pairwise harmonizing of the magnetic-resonance diffusion measurements based at least in part on target magnetic-resonance diffusion measurements, wherein the target magnetic-resonance diffusion measurements are associated with a population and one or more target sites, and wherein the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements are acquired using different magnetic-resonance scanners; and performing quality control on the pairwise harmonized magnetic-resonance diffusion measurements, wherein the quality control comprises comparing the pairwise harmonized magnetic-resonance diffusion measurements associated with the first site and the one or more target sites, wherein the target magnetic-resonance diffusion measurements from the population are used as a prior for computing of parameters for a moving population associated with the first site.

20. The method of claim 19, wherein the method comprises detecting a harmonization error in the magnetic-resonance diffusion measurements based at least in part on the pairwise harmonizing and a quality-control metric; and wherein the detecting comprises segmenting the magnetic-resonance diffusion measurements into a first type of magnetic-resonance diffusion result or a second type of magnetic-resonance diffusion result.

Description

BRIEF DESCRIPTION OF THE FIGURES

[0025] FIG. 1 is a drawing of an example of an end-to-end tractometry/tractography pipeline in accordance with an embodiment of the present disclosure.

[0026] FIG. 2 is a block diagram illustrating an example of a computer system in accordance with an embodiment of the present disclosure.

[0027] FIG. 3 is a flow diagram illustrating an example of a method for performing pairwise harmonization using a computer system in FIG. 2 in accordance with an embodiment of the present disclosure.

[0028] FIG. 4 is a drawing illustrating an example of communication between components in a computer system in FIG. 2 in accordance with an embodiment of the present disclosure.

[0029] FIG. 5 is a drawing of an example of raw diffusion MRI (dMRI) in accordance with an embodiment of the present disclosure.

[0030] FIG. 6 is a drawing of an example of fractional anisotropy in accordance with an embodiment of the present disclosure.

[0031] FIG. 7 is a drawing of an example of unharmonized apparent fiber density in accordance with an embodiment of the present disclosure.

[0032] FIG. 8 is a drawing of an example of harmonized apparent fiber density in accordance with an embodiment of the present disclosure.

[0033] FIG. 9 is a drawing of examples of a Vanilla (existing) ComBAT technique and a Clinical ComBAT technique in accordance with an embodiment of the present disclosure.

[0034] FIG. 10 is a drawing of an example of a Vanilla ComBAT technique in accordance with an embodiment of the present disclosure.

[0035] FIG. 11 is a drawing of an example of harmonization of mean diffusivity using a Vanilla ComBAT technique in accordance with an embodiment of the present disclosure.

[0036] FIG. 12 is a drawing of an example of harmonization of mean diffusivity using a Vanilla ComBAT technique in accordance with an embodiment of the present disclosure.

[0037] FIG. 13 is a drawing of an example of harmonization using a Clinical ComBAT technique in accordance with an embodiment of the present disclosure.

[0038] FIG. 14 is a drawing of an example of outlier rejection in accordance with an embodiment of the present disclosure.

[0039] FIG. 15 is a drawing of examples of different polynomial models in accordance with an embodiment of the present disclosure.

[0040] FIG. 16 is a drawing of examples of harmonization of mean diffusivity using a Vanilla ComBAT technique and a Clinical ComBAT technique in accordance with an embodiment of the present disclosure.

[0041] FIG. 17 is a drawing of examples of harmonization of apparent fiber density using a Vanilla ComBAT technique and a Clinical ComBAT technique in accordance with an embodiment of the present disclosure.

[0042] FIG. 18 is a drawing of an example of harmonization using a Clinical ComBAT technique in accordance with an embodiment of the present disclosure.

[0043] FIG. 19 is a drawing of an example of a goodness of fit of the harmonization of mean diffusivity using a Clinical ComBAT technique in accordance with an embodiment of the present disclosure.

[0044] FIG. 20 is a drawing of an example of a hyperparameter tuning in accordance with an embodiment of the present disclosure.

[0045] FIG. 21 is a drawing of an example of a fit of mean diffusivity using a Clinical ComBAT technique in accordance with an embodiment of the present disclosure.

[0046] FIG. 22 is a drawing of an example of outlier rejection in accordance with an embodiment of the present disclosure.

[0047] FIG. 23 is a block diagram illustrating an example of a computer in accordance with an embodiment of the present disclosure.

[0048] Note that like reference numerals refer to corresponding parts throughout the drawings. Moreover, multiple instances of the same part are designated by a common prefix separated from an instance number by a dash.

DETAILED DESCRIPTION

[0049] A computer system that performs pairwise harmonization is described. This computer may include: a computation device (such as a processor, a graphics processing unit or GPU, etc.) that executes program instructions; and memory that stores the program instructions. During operation, the computer system may receive information specifying magnetic-resonance diffusion measurements, where the magnetic-resonance diffusion measurements are associated with at least an individual and a first site. Then, the computer system may perform the pairwise harmonizing of the magnetic-resonance diffusion measurements based at least in part on target magnetic-resonance diffusion measurements, where the target magnetic-resonance diffusion measurements are associated with a population and one or more target sites, and the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements may be acquired using different magnetic-resonance scanners. Next, the computer system may perform quality control on the pairwise harmonized magnetic-resonance diffusion measurements, where the quality control may include comparing the pairwise harmonized magnetic-resonance diffusion measurements associated with the first site and the one or more target sites.

[0050] By performing the pairwise harmonization, these Clinical ComBAT techniques may address the problems associated with existing harmonization techniques (which are sometimes referred to as Vanilla ComBAT). These capabilities may allow clinical data to be appropriately normalized and corrected for inter-site variations (e.g., in hardware, software, patient populations, etc.). These capabilities may allow clinical data to corrected as it is acquired and to be used to facilitate improved diagnosis of diseases and personalized treatments.

[0051] In the discussion that follows, the Clinical ComBAT techniques are used to analyze dMRI data associated with different MRI scanners at different sites or locations. However, the Clinical ComBAT techniques may be used to analyze a wide variety of types of magnetic-resonance images (which may or may not involve MRI, e.g., free-induction-decay measurements), such as: magnetic resonance spectroscopy (MRS) with one or more types of nuclei, magnetic resonance spectral imaging (MRSI), magnetic resonance elastography (MRE), magnetic resonance thermometry (MRT), magnetic-field relaxometry and/or another magnetic resonance technique (e.g., functional MRI, metabolic imaging, molecular imaging, blood-flow imaging, diffusion-tensor imaging, etc.). More generally (and provided that the result neurological fibers are in the same reference as the analyzed images), the Clinical ComBAT techniques may be used to analyze measurement results from a wide variety of invasive and non-invasive imaging techniques, such as: X-ray measurements (such as X-ray imaging, X-ray diffraction or computed tomography at one or more wavelengths between 0.01 and 10 nm), neutron measurements (neutron diffraction), electron measurements (such as electron microscopy or electron spin resonance), optical measurements (such as optical imaging or optical spectroscopy that determines a complex index of refraction at one or more visible wavelengths between 300 and 800 nm or ultraviolet wavelengths between 10 and 400 nm), infrared measurements (such as infrared imaging or infrared spectroscopy that determines a complex index of refraction at one or more wavelengths between 700 nm and 1 mm), ultrasound measurements (such as ultrasound imaging in an ultrasound band of wavelengths between 0.2 and 1.9 mm), proton measurements (such as proton scattering), positron emission spectroscopy, positron emission tomography (PET), impedance measurements (such as electrical impedance at DC and/or an AC frequency) and/or susceptibility measurements (such as magnetic susceptibility at DC and/or an AC frequency).

[0052] We now describe embodiments of the Clinical ComBAT techniques. The brain is composed of three main regions: the gray matter (GM), the white matter (WM) and cerebrospinal fluid (CSF). The white matter is composed of axons and glial cells (such as astrocytes, microglias, oligodendrocytes, etc.), which are surrounded by water. An axon is a myelinated and elongated portion of a nerve cell that conducts electric pulses between cortical regions. Axons are organized in fibers, which are themselves grouped into bundles that one can picture as large brain connection pathways. Note that the myelin sheaths are wrapped around axons. Moreover, the glial cells include oligodendrocyte, astrocyte and microglia. All the white-matter components are surrounded by cerebrospinal fluid.

[0053] With recent advances in imaging technologies, it is possible to measure the connectivity of the brain and quantify white-matter microstructure, as well as its deterioration, typically due to aging or some neurodegenerative disease. Thanks to the fibrous nature of axons, the diffusion of molecules in the white matter is anisotropic and can be measured by a non-invasive technology, such as dMRI.

[0054] Diffusion imaging is the process by which the average displacement of water molecules within a voxel (or volume element, which is sometimes referred to as a voxel) is measured. A dMRI acquisition protocol is typically designed to acquire a series of diffusion weighted images that are each sensitized to the water diffusion into a particular direction. These images are usually acquired following the seminal pulse gradient spin echo protocol (or a slight variation of it). In an MRI scanner, a strong permanent magnetic field combined with the interplay of radio-frequency pulses and diffusion-sensitizing gradients (which are referred to as pulsed field gradients or PFG).

[0055] Given the pulse duration and pulse separation, each PFG has a magnitude or strength (which is typically called b-value) and a unique pre-defined orientation (which is typically called b-vector). Mathematically, this amounts to G=G.Math.u, where G is the gradient, G is the norm of that vector (the b-value) and u is a unit vector (the b-vector). An arbitrary large number of diffusion weighted images can be acquired, each with a different PFG b-value/b-vector pair.

[0056] When the diffusion weighted images were acquired with the same PFG b-value but different b-vectors, the PFGs can be illustrate as points lying on a sphere whose distance to the origin equals the PFG b-value. This configuration is called single-shell. However, when different b-values are used, the PFG points may lie on different concentric spheres in a configuration called multi-shell. While these q-space sampling techniques are often used, other q-space configurations may occur, such as: spectrum sampling, random sampling, grid sampling, and sparse sampling. Note that a dot at the center of the spheres indicates the b0 image, which amounts to a b-value of zero. The b0 image is T.sub.2-weighted image that takes into account tissue contrasts and signals in the absence of diffusion gradients. The b0 serves as an anatomical baseline image and is often used to normalize diffusion weighted images.

[0057] Thus, diffusion images encode how water molecules diffuse along certain directions. Depending on the environment surrounding a water molecule (such as fat, muscle, fibrous tissues, free water, etc.) it may move more or less freely and, therefore, may return a more or less intense signal. By sampling the q-space with different PFG b-values and b-vectors, a voxel may get different signal values, which can be combined into a compact mathematical model that represents the underlying local tissue structure.

[0058] The accuracy of such a local model typically depends heavily on the total number of acquired diffusion-weighted images. The simplest model is one derived from a single diffusion acquisition (or one point in the q-space). This results in a raw diffusion-weighted image whose voxel's intensity is related to the level of sensitization along one direction and to the local tissue structure. While this simple acquisition can be used for clinical applications (thanks to its simplicity and the short time, e.g., a few minutes, it takes to acquire it), it often has limitations. Notably, given that the measured signal S after the PFG is given by

[00001] $S = S_{0} e^{- bD},$

where S.sub.0 is the magnetic resonance (MR) signal at baseline, b is the b-value and D is the diffusion coefficient. Therefore, the acquired image may include the raw signal S, but not the diffusion coefficient, per se. Also, because the term e.sup.bD typically behaves very much like the T.sub.2-weighting term e.sup.TE/T2 (where TE is the echo time and T.sub.2 the transverse relaxation time), the measured value S is usually heavily affected by relaxation times. This results in variations in the image intensities that can be hard to interpret. In addition, a diffusion image usually has a low signal-to-noise ratio (SNR), which makes it difficult to distinguish the underlying anatomy.

[0059] In order to compensate for the T.sub.2 contamination and the low SNR, more diffusion-weighted images need to be acquired. Consequently, when two diffusion images with different b-values are acquired and combined, the T.sub.2 contamination can be cancelled out and the diffusion coefficient D (which is often called the apparent diffusion coefficient image or ADC image) can be recovered:

[00002] $ADC = D = \frac{\ln (\frac{S_{2}}{S_{1}})}{b 1 - b 2},$

where S.sub.1 and S.sub.2 are signal intensities obtained with b1 and b2 b-values. Note that when one of the two signals is obtained with a b0, the following ADC map (which is also called the attenuation map) is obtained:

[00003] $\begin{matrix} ADC = \frac{\ln (\frac{S_{0}}{S_{1}})}{b 1} . & (1) \end{matrix}$

ADC maps are widely used and have been shown to help diagnose brain strokes.

[0060] Moreover, in order to increase the SNR, three diffusion images may be acquired. A typical approach is to use the same b-value for all three images, but with b-vectors aligned on the X, Y and Z canonical axis:

[00004] $S_{x} = S_{o} e^{- {bD}_{xx}},$ $S_{y} = S_{o} e^{- {bD}_{yy}}, and$ $S_{z} = S_{o} e^{- {bD}_{zz}},$

where D.sub.xx, D.sub.yy, D.sub.zz are the diffusion coefficients along the three axes. The combination of these images results in the so-called trace DW image (which is sometimes referred to as the short DWI). The combination may be implemented as follows:

[00005] $\begin{matrix} S_{DWI} = \sqrt[3]{S_{x} S_{y} S_{z}} = S_{o} e^{- b (D_{xx} + D_{yy} + D_{zz}) / 3} = S_{0} e^{- b (D_{trace) / 3}} . & (2) \end{matrix}$

Note that D.sub.trace is the trace of the diffusion tensor.

[0061] Compared to a single diffusion image S, S.sub.DWI typically offers significantly improved contrast. In order to remove the T.sub.2 contamination, a b0 image may be acquired and an attenuated ADC map may be computed following Eqn. 1:

[00006] $\begin{matrix} ADC = \frac{\ln (\frac{S_{0}}{S_{DWJ}})}{b} = \frac{D_{trace}}{3} . & (3) \end{matrix}$

[0062] Note that, by their very nature, the signal intensities of an ADC map are opposite to those of a DWI, which can be confusing. In order to see this, contemplate from Eqn. 2 that S.sub.DWI is maximum when D.sub.trace is 0 and decreases exponentially fast as D.sub.trace is increasingly positive. This is the exact opposite with the ADC map of Eqn. 3, which is minimum when D.sub.trace is 0. Thus, tissues with large diffusivity values (such as free water and cerebrospinal fluid) usually appear bright in an ADC map, while tissues with low diffusivity values (such as stroke and fibrous tissues) typically appear dark. These contrasts are the exact opposite in a DWI image. In order to alleviate the confusion, an exponential ADC image (which is sometimes referred to as a Stejskal-Tanner attenuation map) may be computed:

[00007] $\begin{matrix} S_{ex p} = \frac{S_{DWI}}{S_{0}} = e^{- b . ADC} . & (4) \end{matrix}$

[0063] By acquiring more diffusion images, the 3D structure of the brain tissue may be measured. As discussed previously, the fibrous nature of axons makes the white matter a highly anisotropic environment, with diffusion coefficients larger along the axonal tracts than perpendicular to the tracts. One way of recovering the 3D microstructure of the white matter is by acquiring at least seven images, such as six diffusion-weighted images from six different b-vectors (S.sub.1 to S.sub.6) plus a b0 image (S.sub.0) to recover the unimodal model that is referred to as the diffusion tensor. The diffusion tensor is a 33 symmetric positive definite matrix:

[00008] $\begin{matrix} (\begin{matrix} D_{xx} & D_{xy} & D_{xz} \\ D_{yx} & D_{yy} & D_{yz} \\ D_{zx} & D_{zy} & D_{zz} \end{matrix}) . & (5) \end{matrix}$

[0064] From this matrix, the ADC can be represented as D(u)=u.sup.TDu, where u is a b-vector and the b0 attenuated signal is

[00009] $S_{u} = e^{- b . (u^{T} Du)} .$

Because D has six independent variables, it can be recovered from six (or more) diffusion-weighted images using least-square regression, weighted least-square regression or another technique involving Rician noise.

[0065] Moreover, from the diffusion tensor D, one can extract three eigenvalues (.sub.1, .sub.2 and .sub.3) whose combination leads to diffusion metrics, such as the fractional anisotropy (FA), the mean diffusivity (MD), the radial diffusivity (RD) and the axial diffusivity (AD), where .sub.1, .sub.2 and .sub.3 are the eigenvalues of the diffusion tensor. Because each voxel is associated with a diffusion tensor, note that these diffusion metrics can be visualized as image modalities. Fractional anisotropy is often useful for the assessment of children's brain maturation, dyslexia, and for some psychiatric disorders.

[0066] One challenge associated with the diffusion tensor D is that it only encodes one white matter fiber direction. Considering that more than two-thirds of the white matter voxels contain crossing brain fibers oriented in more than one direction, this model is often regarded as ill-suited for many applications. As a solution, multi-directional local models have been proposed, such as the bi-tensor and multi-tensor models. Other models account for higher orders by using spherical deconvolution (SD) approaches. In these cases, instead of assuming a fixed number of local fiber orientations, the SD models approximate the local white-matter structure using so-called ODF models or fiber ODF. Note that the SD approaches express the acquired diffusion signal in each voxel as a spherical convolution between an ODF and a low-pass filter called a fiber response function (FRF) that describes the signal profile of the white matter. Deconvoluting the measured signal with a FRF leads to the desired fODF.

[0067] Tractography is a process in which fibers are reconstructed by iteratively following the local fiber orientation models. When a local fiber orientation is put on each white matter voxel, the underlying fibrous structure of the white matter tissues may be determined. Moreover, with these models, several tractography techniques have been proposed to reconstruct brain fibers. When enough fibers have been reconstructed, so that each voxel is traversed by at least one fiber, one obtains a so-called whole brain tractogram (which may contain millions of fibers).

[0068] While some global tractography techniques have been proposed, typical approaches involve a local iterative technique that builds fibers point by point. The local tractography techniques can be deterministic or probabilistic, may use particle filtering (such as a Particle Filtering Tractography or PFT technique), may be driven by a pre-computed cortex manifold (such as a Surface Enhanced Tracking or SET technique), and/or may be specific to pre-segmented white matter regions (such as the Bundle Specific Tracking or BST technique).

[0069] In general, all tracking techniques have limitations, because they all produce spurious or anatomically implausible brain fibers. Such fibers may be too short or too long, have a loopy shape, pass-through non-white matter regions, connect inappropriate gray matter regions and/or connect no gray matter regions at all. These fibers are typically filtered out in a post-processing operation. The remaining anatomically-plausible fibers may then be grouped into fiber bundles.

[0070] Fibers in a bundle have roughly the same shape and connect the same gray matter regions. The most obvious way to extract white-matter bundles from a whole brain tractogram is by asking a neuroanatomist to manually dissect bundles of interest. However, manual dissecting bundles is long, tedious, and prone to large inter- and intra-expert variability. Automated techniques, such as QuickBundles (QB), QuickBundlesX (QBx), Deep Fiber Clustering (DFC), RecoBundles (RB), RecoBundlesX (RBx), TractSeg, XTRACT, FINTA, FIESTA, Deep White Matter Analysis (WMA), and BINTA, have been proposed to accelerate and increase the reproducibility of this process.

[0071] Bundles can be useful to assess the integrity of the white matter into specific regions associated with some neurodegenerative diseases, such as Alzheimer's disease. White-matter bundles open the door to tractometry analysis, in which white-matter metrics (such as the fractional anisotropy, axial diffusivity, radial diffusivity, and mean diffusivity) are analyzed within each bundle. These metrics can be analyzed across the entire bundles or within sub-regions. Note that more advanced metrics based at least in part on high-dimensional local models (be they single- or multi-shell) may also be computed, such as: free-water indices, myelin water fraction, neurite density index, orientation dispersion index, and/or magnetization transfer ratio.

[0072] Tractography is a well-established technique for brain connectivity analysis, and has been shown to be effective in measuring the local brain microstructure. While tractography and tractometry can be used independently, they are often used jointly. FIG. 1 presents a drawing of an example of an end-to-end tractometry/tractography pipeline for converting raw input diffusion and structural images into a tractometry report.

[0073] We now describe a computer system that may implement the Clinical ComBAT techniques. FIG. 2 presents a block diagram illustrating an example of a computer system 200. This computer system may include one or more computers 210. These computers may include: communication modules 212, computation modules 214, memory modules 216, and optional control modules 218. Note that a given module or engine may be implemented in hardware and/or in software.

[0074] Communication modules 212 may communicate frames or packets with data or information (such as measurement results or control instructions) between computers 210 via a network 220 (such as the Internet and/or an intranet). For example, this communication may use a wired communication protocol, such as an Institute of Electrical and Electronics Engineers (IEEE) 802.3 standard (which is sometimes referred to as Ethernet) and/or another type of wired interface. Alternatively or additionally, communication modules 212 may communicate the data or the information using a wireless communication protocol, such as: an IEEE 802.11 standard (which is sometimes referred to as Wi-Fi, from the Wi-Fi Alliance of Austin, Texas), Bluetooth (from the Bluetooth Special Interest Group of Kirkland, Washington), a third generation or 3G communication protocol, a fourth generation or 4G communication protocol, e.g., Long Term Evolution or LTE (from the 3rd Generation Partnership Project of Sophia Antipolis, Valbonne, France), LTE Advanced (LTE-A), a fifth generation or 5G communication protocol, other present or future developed advanced cellular communication protocol, or another type of wireless interface. For example, an IEEE 802.11 standard may include one or more of: IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, IEEE 802.11-2007, IEEE 802.11n, IEEE 802.11-2012, IEEE 802.11-2016, IEEE 802.11ac, IEEE 802.11ax, IEEE 802.11ba, IEEE 802.11be, IEEE 802.11bn or other present or future developed IEEE 802.11 technologies.

[0075] In the described embodiments, processing a packet or a frame in a given one of computers 210 (such as computer 210-1) may include: receiving the signals with a packet or the frame; decoding/extracting the packet or the frame from the received signals to acquire the packet or the frame; and processing the packet or the frame to determine information contained in the payload of the packet or the frame. Note that the communication in FIG. 2 may be characterized by a variety of performance metrics, such as: a data rate for successful communication (which is sometimes referred to as throughput), an error rate (such as a retry or resend rate), a mean squared error of equalized signals relative to an equalization target, intersymbol interference, multipath interference, a signal-to-noise ratio, a width of an eye pattern, a ratio of number of bytes successfully communicated during a time interval (such as 1-10 s) to an estimated maximum number of bytes that can be communicated in the time interval (the latter of which is sometimes referred to as the capacity of a communication channel or link), and/or a ratio of an actual data rate to an estimated data rate (which is sometimes referred to as utilization). Note that wireless communication between components in FIG. 2 uses one or more bands of frequencies, such as: 900 MHz, 2.4 GHZ, 5 GHZ, 6 GHZ, 60 GHZ, the Citizens Broadband Radio Spectrum or CBRS (e.g., a frequency band near 3.5 GHz), and/or a band of frequencies used by LTE or another cellular-telephone communication protocol or a data communication protocol. In some embodiments, the communication between the components may use multi-user transmission (such as orthogonal frequency division multiple access or OFDMA).

[0076] Moreover, computation modules 214 may perform calculations using: one or more microprocessors, ASICs, microcontrollers, programmable-logic devices, GPUs and/or one or more digital signal processors (DSPs). Note that a given computation component is sometimes referred to as a computation device.

[0077] Furthermore, memory modules 216 may access stored data or information in memory that local in computer system 200 and/or that is remotely located from computer system 200. Notably, in some embodiments, one or more of memory modules 216 may access stored measurement results in the local memory, such as dMRI data for one or more individuals (which, for multiple individuals, may include cases and controls or disease and healthy populations). Alternatively or additionally, in other embodiments, one or more memory modules 216 may access, via one or more of communication modules 212, stored measurement results in the remote memory in computer 224, e.g., via network 220 and network 222. Note that network 222 may include: the Internet and/or an intranet. In some embodiments, the measurement results are received from one or more measurement systems 226 (such as MRI scanners) via network 220 and network 222 and one or more of communication modules 212. Thus, in some embodiments at least some of the measurement results may have been received previously and may be stored in memory, while in other embodiments at least some of the measurement results may be received in real-time from the one or more measurement systems 226.

[0078] While FIG. 2 illustrates computer system 200 at a particular location, in other embodiments at least a portion of computer system 200 is implemented at more than one location. Thus, in some embodiments, computer system 200 is implemented in a centralized manner, while in other embodiments at least a portion of computer system 200 is implemented in a distributed manner. For example, in some embodiments, the one or more measurement systems 226 may include local hardware and/or software that performs at least some of the operations in the Clinical ComBAT techniques. This remote processing may reduce the amount of data that is communicated via network 220 and network 222. In addition, the remote processing may anonymize the measurement results that are communicated to and analyzed by computer system 200. This capability may help ensure computer system 200 is compatible and compliant with regulations, such as the Health Insurance Portability and Accountability Act, e.g., by removing or obfuscating protected health information in the measurement results.

[0079] Although we describe the computation environment shown in FIG. 2 as an example, in alternative embodiments, different numbers or types of components may be present in computer system 200. For example, some embodiments may include more or fewer components, a different component, and/or components may be combined into a single component, and/or a single component may be divided into two or more components.

[0080] As discussed previously, existing ComBAT techniques often suffer from a number of problems. Moreover, as described further below with reference to FIGS. 5-22, in order to address these challenges computer system 200 may perform the Clinical ComBAT techniques. Notably, during the Clinical ComBAT techniques, one or more of optional control modules 218 may divide the analysis among computers 210. Then, a given computer (such as computer 210-1) may perform at least a designated portion of the analysis. In particular, computation module 214-1 may receive (e.g., access) information (e.g., using memory module 216-1) specifying medical-imaging data that specify the central nervous system (including white matter) for one or more individuals. Note that the medical-imaging data may include or may correspond to dMRI data. Then, computation module 214-1 may perform operations in an analysis pipeline and/or a Clinical ComBAT technique.

[0081] For example, the analysis pipeline may include: QC (such as visual inspection, an image resolution check and/or a gradient distribution check), skull stripping or SS, preprocessing or PP (such as denoising, motion correction and/or a correction for magnetic field inhomogeneity), harmonization (to correct for data variation, e.g., in dMRI and/or MRI data, associated with different MR scanners), tissue segmentation (e.g., white matter, grey matter, tumor, cerebrospinal fluid, etc., and which may be based at least in part on structural MRI data using a convolutional neural network). The output of these portions or stages of the analysis pipeline may include tractography results (which are sometimes referred to as tractograms) that specify a set of neurological fibers, which are sometimes referred to as neural tracts or streamlines. In the present discussion, note that a neurological fiber includes a sequence of three dimensional or 3D points that are connected together.

[0082] Moreover, computation module 214-1 may perform additional tractography by computing, using a predetermined (e.g., pretrained) autoencoder neural network, second tractography results that specify a second set of neurological fibers based at least in part on the tractography results and information associated with a neurological anatomical region. Note that a subset of the set of neurological fibers may be anatomically implausible and the second set of fibers may exclude the subset. Moreover, the predetermined autoencoder neural network may be trained using an unsupervised-learning technique. In some embodiments, the second set of neurological fibers may, at least in part, be different from the set of neurological fibers. Note that computing the second tractography results may include a cleaning and bundling operation in which the neurological fibers are grouped or bundled into different types of bundles of neurological fibers having different numbers of neurological fibers.

[0083] Next, computation module 214-1 may perform additional operations in multiple stages in the analysis pipeline. For example, the second tractography results may be input to or used by one or more stages or operations in the analysis pipeline, such as: statistical analysis, connectome analysis, population analysis, etc. Notably, the analysis pipeline may include: microstructure analysis, region-wise microstructure statistics (RWMS), and/or region-wise statistical analysis (RWSA). In the case of analysis for an individual, region-wise statistical analysis may be based at least in part on a comparison with a reference atlas or data structure (such as a region-wise microstructure brain atlas corresponding to multiple individuals). Alternatively, in the case of analysis of a population, at least a portion of the analysis may be computed based at least in part on cases and controls in the population.

[0084] After performing the operations in the stages in the analysis pipeline, computation module 214-1 may output a subset of white-matter disease biomarkers for different neurological anatomical regions for at least the designated portion of the analysis. For a given neurological anatomical region, the set of white-matter disease biomarkers may include: an apparent fiber density, which corresponds to a total intra-axonal volume; an amount of free water; or a demyelination metric. Note that the amount of free water may correspond to neuroinflammation, and/or the apparent fiber density may correspond to axonal disruption or axonal quality. The apparent fiber density may be recovered or determined using: a diffusion fractional anisotropy, water corrected fractional anisotropy, water corrected axial diffusivity, intracellular neurite orientation dispersion and density imaging (NODDI) volume fraction, and/or a local fODF. Moreover, the demyelination metric may include: an inverse free-water-corrected radial diffusivity, a ratio of T.sub.1 to T.sub.2, a magnetization transfer ratio, or MWF. As noted previously, the MWF may be the ratio of the signal amplitudes measured in or associated with two water compartments

[00010] $MWF = \frac{M W}{(M W + EICW)},$

where MW is the signal amplitude associated with a myelin water compartment and EICW is a signal amplitude associated with an extra-intra cellular water compartment. These signal amplitude values may be obtained via a multi-exponential function and nonlinear optimization from a series of acquired T.sub.2-weighted images at different echo times (e.g., up to 48 echos). In some embodiments, the set of white-matter disease biomarkers may be computed on a per-voxel basis and/or a per-neurological-fiber (or fixel) basis.

[0085] Furthermore, computation module 214-1 may perform one or more operations in a Clinical ComBAT technique to correct for data variation, e.g., in dMRI and/or MRI data, associated with different MR scanners. Embodiments of Clinical ComBAT techniques are described further below with reference to FIGS. 5-22.

[0086] Additionally, one or more feedback modules 228 in computer system 200 may use the set of white-matter disease biomarkers for different neurological anatomical regions to provide feedback information associated with at least the individual or the population. For example, the set of white-matter disease biomarkers for different neurological anatomical regions may be inputs to a pretrained predictive model (such as a supervised machine-learning model or a neural network), which outputs the feedback information. Note that the feedback information may include: diagnostic information, information associated with disease progression (such as a disease stage), information regarding efficacy of a treatment, or a treatment recommendation (e.g., based at least in part on the disease stage). In some embodiments, the feedback information (such as the diagnostic information) may be based at least in part on a volume of a neurological anatomical region in at least the individual, such as the hippocampus.

[0087] In these ways, computer system 200 may automatically and accurately analyze white matter for at least an individual or a population, such as: perform filtering and/or grouping or bundling of the neurological fibers, computing the set of white-matter disease biomarkers, etc. These capabilities may allow computer system 200 to perform subsequent analyses or one or more additional operations, such as: connectome analysis, white matter segmentation, one or more clinical trial enrollment or exclusion criteria, assessing the impact of a medical intervention for a disease (e.g., in a clinical trial for a candidate pharmaceutical agent, neurostimulation and/or another type of therapy), precision medicine (such as in selecting a correct medical intervention to treat a disease, e.g., as a companion diagnostic for a prescription drug or a dose of a prescription drug), etc. For example, the more-accurate white-matter results may allow more accurate determination of: an axon density index, radial diffusion, volume, a myelin index or metric, an inflammation index or metric, fractional anisotropy (which is sensitive to anomalies), free water, apparent fiber density, a microglia index or metric, an astrocyte index or metric, an oligodendrocyte index or metric, a lesion edema index or metric, a cerebrospinal fluid index or metric and/or, more generally, a characteristic or attribute associated with microstructure environment (e.g., on a sub-millimeter length scale that is less than a length scale corresponding to a voxel) of one or more neurological fibers (and, more generally, white matter) in particular neurological anatomic regions. Note that the disease may include: Parkinson's disease, Alzheimer's disease, multiple sclerosis, amyotrophic lateral sclerosis (ALS) or Lou Gehrig's disease, normal pressure hydrocephalus, concussion, migraine, epilepsy, a type of cancer, an autoimmune disease, schizophrenia, depression, bipolar disorder, another type of mental illness, traumatic brain injury, an alcohol-use disorder, a neurodegenerative disease, a neuroinflammatory disease, or another type of neurological disease of the central nervous system. Consequently, the Clinical ComBAT techniques may facilitate accurate, value-added use of the measurement results, such as medical-imaging data.

[0088] We now describe embodiments of the method. FIG. 3 presents a flow diagram illustrating an example of a method 300 for performing pairwise harmonization, which may be performed by a computer system (such as computer system 200 in FIG. 2). During operation, the computer system may receive information (operation 310) specifying magnetic-resonance diffusion measurements, where the magnetic-resonance diffusion measurements are associated with at least an individual and a first site. For example, the receiving may include accessing the information in memory. Note that the magnetic-resonance diffusion measurements may be associated with magnetic-resonance images of at least the individual, including: T.sub.1-weighted, T.sub.2-weighted, proton density, fluid-attenuated inversion recovery (FLAIR), b-value images, diffusion weighted images, diffusion tensor images, high angular resolution diffusion images, and/or another type of magnetic-resonance image, data or derived metric (such as the fractional anisotropy, the mean diffusivity, the apparent fiber density, etc.). In some embodiments, harmonization is also performed for brain region information (such as the size, length, shape, etc.).

[0089] Then, the computer system may perform the pairwise harmonizing (operation 312) of the magnetic-resonance diffusion measurements based at least in part on target magnetic-resonance diffusion measurements, where the target magnetic-resonance diffusion measurements are associated with a population and one or more target sites, and the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements acquired using different magnetic-resonance scanners.

[0090] Next, the computer system may perform a quality control (operation 314) on the pairwise harmonized magnetic-resonance diffusion measurements, where the quality control includes comparing the pairwise harmonized magnetic-resonance diffusion measurements associated with the first site and the one or more target sites.

[0091] Note that the magnetic-resonance diffusion measurements may include measurements associated with the individual and a second individual.

[0092] Moreover, the magnetic-resonance diffusion measurements may be associated with a different number of individuals than the target magnetic-resonance diffusion measurements.

[0093] Furthermore, the quality control may include detecting a harmonization error in the pairwise harmonized magnetic-resonance diffusion measurements based at least in part on the pairwise harmonized magnetic-resonance diffusion measurements and a quality-control metric. The detecting may include segmenting the pairwise harmonized magnetic-resonance diffusion measurements into a first type of magnetic-resonance diffusion result or a second type of magnetic-resonance diffusion result.

[0094] Additionally, the magnetic-resonance diffusion measurements may include tractography results. Moreover, the magnetic-resonance diffusion measurements may include: fractional anisotropy, apparent fiber density, an amount of free water, and/or a demyelination metric.

[0095] Note that the pairwise harmonizing may reduce or eliminate additive and multiplicative bias in the magnetic-resonance diffusion measurements.

[0096] Moreover, the magnetic-resonance diffusion measurements may be associated with clinical operation of a magnetic-resonance scanner.

[0097] Furthermore, the pairwise harmonization may facilitate personalized medicine for at least the individual.

[0098] In some embodiments, the computer system may optionally perform one or more additional operations (operation 316). For example, the performing of the pairwise harmonization (operation 312) may include eliminating one or more outliers in the magnetic-resonance diffusion measurements based at least in part on the pairwise harmonized magnetic-resonance diffusion measurements.

[0099] Furthermore, after performing the pairwise harmonization (operation 312), the computer system may pool the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements.

[0100] Additionally, after performing the pairwise harmonization (operation 312), the computer system may generate a normative reference based at least in part on the magnetic-resonance diffusion measurements and the target magnetic-resonance diffusion measurements.

[0101] Note that the performing of the pairwise harmonization (operation 312) may include: fitting the magnetic-resonance diffusion measurements using a data model; removing covariates and an intercept from the magnetic-resonance diffusion measurements using the data model; after removing the covariates and the intercept, computing variance in the magnetic-resonance diffusion measurements; after computing the variance, correcting the magnetic-resonance diffusion measurements for multiplicative bias; and adding second covariates and a second intercept associated with the target magnetic-resonance diffusion measurements to the magnetic-resonance diffusion measurements.

[0102] Moreover, the data model may include a nonlinear model. For example, the variance may be computed using maximum likelihood for the target population and MAP for the moving population. Alternatively or additionally, the performing of the pairwise harmonization may include hyper-parameter autotuning with the target magnetic-resonance diffusion measurements to prevent overfitting to the data model.

[0103] Furthermore, the performing of the pairwise harmonization (operation 312) may include supplementing the magnetic-resonance diffusion measurements with additional magnetic-resonance diffusion measurements from a different magnetic-resonance scanner when a number of individuals associated with the magnetic-resonance diffusion measurements is less than a predefined value. Note that the magnetic-resonance diffusion measurements and the additional magnetic-resonance diffusion measurements may have characteristics within a predefined value of each other.

[0104] In some embodiments of method 300, there may be additional or fewer operations. Furthermore, the order of the operations may be changed, and/or two or more operations may be combined into a single operation.

[0105] Embodiments of the Clinical ComBAT techniques are further illustrated in FIG. 4, which presents a drawing illustrating an example of communication among components in computer system 200. In FIG. 4, a computation device (CD) 410 (such as a processor or a GPU) in computer 210-1 may access, in memory 412 in computer 210-1, information 414 specifying dMRI measurements for at least an individual and a first site. After receiving dMRI measurements, computation device 410 may may access, in memory 412 in computer 210-1, information 416 specifying target dMRI measurements.

[0106] Then, computation device 410 may perform pairwise harmonizing (PH) 418 of the dMRI measurements based at least in part on target dMRI measurements, where the target dMRI measurements are associated with a population and one or more target sites, and the dMRI measurements and the target dMRI measurements acquired using different magnetic-resonance scanners.

[0107] Next, computation device 410 may perform quality control (QC) 420 on the pairwise harmonized dMRI measurements, where quality control 420 includes comparing the pairwise harmonized dMRI measurements associated with the first site and the one or more target sites.

[0108] In some embodiments, computation device 410 may store, in memory 412, information 422 specifying the pairwise harmonized dMRI measurements and/or results of quality control 420 on the pairwise harmonized dMRI measurements. Alternatively or additionally, computation device 410 may provide instructions 424 to display information 422 on display 426 in computer 210-1. In some embodiments, computation device 410 may provide instructions 428 to interface circuit 430 in computer 210-1 to transmit, to another electronic device, one or more frames or packets 432 that include information 422.

[0109] While FIG. 4 illustrates communication between components using unidirectional or bidirectional communication with lines having single arrows or double arrows, in general the communication in a given operation in this figure may involve unidirectional or bidirectional communication.

[0110] Moreover, while the preceding discussion illustrated the Clinical ComBAT techniques with data from different magnetic-resonance scanners, in other embodiments, the diffusion magnetic-resonance measurements and/or the target diffusion magnetic-resonance measurements may come or may be acquired using the same physical magnetic-resonance scanner, but at different time points (which may result in signal drift), after a software update, using a different acquisition protocol, and/or for different populations.

[0111] In some embodiments, the computer may provide feedback information associated with at least the individual based at least in part on a set of white-matter disease biomarkers in different neurological anatomical regions. Moreover, the feedback information may include: diagnostic information, information associated with disease progression (such as a disease stage), information regarding efficacy of a treatment, or a treatment recommendation (e.g., based at least in part on the disease stage). For example, the diagnostic information may be associated with a neurological or neurodegenerative disease, including: epilepsy, depression, autism, cerebral palsy, multiple sclerosis, Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, traumatic brain injury, an alcohol-use disorder, or another neurological or neurodegenerative disease. Furthermore, the feedback information may be based at least in part on a volume of a neurological anatomical region in at least the individual, such as the hippocampus.

[0112] Moreover, the medical-imaging data may be associated with a population of cases and controls, which includes at least the individual, and the feedback information may be associated with the population.

[0113] Furthermore, the Clinical ComBAT and/or the feedback information may be determined using a pretrained predictive model. For example, the pretrained predictive model may include: a machine-learning model or a neural network. Note that a given node in a given layer in the neural network may include an activation function, and the activation function may include: a rectifier (ReLU), a leaky ReLU, an exponential linear unit (ELU) activation function, a parametric ReLU, a tanh activation function, or a sigmoid activation function.

[0114] Additionally, the set of white-matter disease biomarkers includes free-water, apparent-fiber-density and myelin metrics. These white matter disease biomarkers may be used in conjunction with different neurological diseases or neurodegenerative diseases. For example, the neurological diseases may include: Alzheimer's disease, Parkinson's disease, multiple sclerosis, traumatic brain injury, depression, amyotrophic lateral sclerosis and/or chronic traumatic encephalopathy.

[0115] In some embodiments, the set of white-matter disease biomarkers may be associated with Alzheimer's disease, Parkinson's disease and/or multiple sclerosis. Because Alzheimer's disease, Parkinson's disease and multiple sclerosis are, in general, slowly progressing neurological diseases, they often have corresponding signatures associated with them. Notably, in their early stages, Alzheimer's disease, Parkinson's disease and multiple sclerosis are characterized by inflammation of the white matter, which typically causes the free-water metric to increase. At this stage, the diseases are often unknown to the patients and the apparent-fiber-density and myelin metrics are usually stable. However, in later disease stages, the apparent-fiber-density and myelin metrics often start to decrease, thereby showing signs of axonal loss and myelin degradation. Consequently, Alzheimer's disease, Parkinson's disease and multiple sclerosis are typically characterized by an increase of free water and, in later stages of these diseases, by a decrease of the apparent-fiber-density and myelin metrics.

[0116] During clinical trials, it is important to know if a particular medication is effective or not. Interestingly, when a medication is effective, the free water typically decreases (which indicates that there is less inflammation in the white matter). In some cases, an increase of the apparent-fiber-density and myelin metrics are also observed, which suggests that the medication has the effect of remyelinizing the white matter and rebuilding axons. It is hypothesized that this happens when patients are in the middle stage of their diseases, where myelin and axons have not been irreversibly damaged.

[0117] For traumatic brain injury, immediately following brain concussion, an increase of free water is observed because of sudden neuro-inflammation and swelling. In most cases, the free water decreases after some time as the symptoms disappear. In these cases, the apparent-fiber-density and myelin metrics are stable all along. However, in some cases, the traumatic brain injury neuroinflammation becomes chronic which in turn can affect the integrity of the white matter and lead to reduction in apparent fiber density and myelin metrics.

[0118] Thus, for Alzheimer's disease, Parkinson's disease, multiple sclerosis and traumatic brain injury, there is an increase of free water followed in time by a decrease of the apparent-fiber-density and myelin metrics.

[0119] Note that where these changes occur in the brain may depend on the neurological diseases. For example, for Alzheimer's disease the changes may occur in the fornix region and in Alzheimer's disease bundles. Alternatively, for Parkinson's disease the changes may occur in the substantia nigra and Parkinson's disease bundles. Moreover, for multiple sclerosis the changes may occur in the vicinity of white-matter lesions and multiple sclerosis bundles. Furthermore, for traumatic brain injury the changes may occur in a variety of locations in the brain, but there are some traumatic brain injury bundles. Additionally, note that Alzheimer's disease, Parkinson's disease, and multiple sclerosis are often associated with an atrophy of the hippocampal volume.

[0120] While harmonization can be applied to the diffusion signal, in some embodiments it may also be applied to feature maps (such as a free-water or FW map, an apparent fiber density or AFD map, a myelin map, and/or brain regions, such as area size, circumference, volume, etc.). These feature map harmonization(s) may be deep-learning based (e.g., using a pretrained neural network) or non-deep learning base (e.g., by adjusting the image histogram to a reference histogram or using a technique such as Vanilla ComBAT, e.g., from the Dana-Farber Cancer Institute, Boston, Massachusetts).

[0121] In the Clinical ComBAT techniques, one or more operations may be performed using a pretrained neural network. For example, Clinical ComBAT may be implemented using a deep autoencoder or another type of neural network. The pretrained neural network may significantly improve the accuracy relative to existing ComBAT techniques (e.g., by 10), thereby reducing the use of processor, memory and communication resources in or associated with a computer system (such as computer system 200 in FIG. 2).

[0122] We now further describe the Clinical ComBAT techniques. dMRI derived scalar maps have proven effective in measuring the impact of various neurodegenerative diseases and brain conditions.

[0123] FIG. 5 presents a drawing of an example of raw dMRI in accordance with an embodiment of the present disclosure. Note that CC is the Corpus Callosum, CST is the Cortical Spinal Tract, and Cg is the Cingulum.

[0124] FIG. 6 presents a drawing of an example of fractional anisotropy in accordance with an embodiment of the present disclosure.

[0125] However, by their very nature, dMRIs can hinder the combination of data from multiple acquisition sites without a harmonization step to mitigate scanner-specific biases. The existing prevalent harmonization technique, ComBAT, despite its efficacy in academic and clinical studies, often faces limitations in real-life, large-scale clinical applications, particularly in personalized medicine. This is due to its reliance on assumptions that are seldom met in clinical settings, such as a linear model of data with respect to its covariates, a homogeneous population across sites, a fixed number of sites, and a large sample size to accurately estimate harmonization parameters.

[0126] In the present disclosure, we introduce Clinical ComBAT, which is designed for the nuances of real-life clinical practice. (In the present discussion, existing ComBAT techniques are sometimes referred to collectively as Vanilla ComBAT.) Our technique employs a simplified data model, uses a polynomial modeling of the data, a site-specific harmonization using a normative reference site, and a robust function to compensation for data irregularities. Additionally, it incorporates prior terms, uses a new hyperparameter tuning technique, and introduces a goodness of fit metric to evaluate harmonization quality.

[0127] ComBAT is a harmonization technique initially devised for genomics to counter batch effects. Its effectiveness in mitigating MRI acquisition effects in scalar maps from dMRI was subsequently demonstrated. Despite its widespread use, the theoretical underpinnings and implementation of ComBAT can be perplexing. To facilitate a comprehensive understanding, we detail the technique in the discussion below.

[0128] FIG. 7 presents a drawing of an example of unharmonized apparent fiber density in accordance with an embodiment of the present disclosure. The apparent fiber density in the brain region arcuate fasciculus in the left hemisphere was determined for 32 sites using the 1212 dataset. Also shown in FIG. 7 are the sites for the 1112 dataset.

[0129] FIG. 8 presents a drawing of an example of harmonized apparent fiber density in accordance with an embodiment of the present disclosure. In FIG. 8, the 1112 dataset was used as the normative or target reference.

[0130] FIG. 9 presents a drawing of examples of a Vanilla (existing) ComBAT technique and a Clinical ComBAT technique in accordance with an embodiment of the present disclosure. Notably, FIG. 9 shows the operations in ComBAT. Moreover, Table 1 presents a list of ComBAT variables.

[0131] In order to address the problems with Vanilla ComBAT, we introduce Clinical ComBAT, which is designed for the practicalities of clinical use. As shown in FIG. 9, Clinical ComBAT aligns each (moving) site with a well-populated (target) normative reference, circumventing the issue of increasing site numbers. Moreover, we improved the model and replaced the linear function with a nonlinear, polynomial one. Also, using a large normative population allows for: the integration of new priors, the estimation of a degree of a polynomial function, the integration of a robust function to handle abnormal data, and the inclusion of a goodness of fit metric to assess harmonization quality.

[0132] Let Y={Y.sub.1, Y.sub.2, . . . , Y.sub.1,} represent a set of I physical MRI sites, where Y.sub.i={Y.sub.i1, Y.sub.i2, . . . , Y.sub.iJ.sub.i} are a set of J.sub.i participants whose brain has been scanned with the same MRI machine i. In voxel-wise harmonization, these participant's MR images are usually non-linearly registered to a common space, such as the MNI space. Note that the number of participants J.sub.i is site dependent, as their number varies from one site to another. Each scalar map Y.sub.ij can be expressed as Y.sub.ij=[y.sub.ij1, y.sub.ij2, . . . y.sub.ijv], where V is the number of voxels (or regions) in the common space and y.sub.ijy custom-character .

TABLE-US-00001 Context Variable Meaning Global i, I Physical MRI site index. Total number of sites j, J.sub.i Scan index for site i, Total number of scans for site i v, V Voxel index for site i and scan j, Total number of voxols y Set of I physical MRI sites Y.sub.i Set of J.sub.i scans of site i Y.sub.ij Scan j from site i Unstandardized y.sub.ijv Feature of site i, scan j, and voxel v .sub.v, {circumflex over ()}.sub.v Model intercept of the overall population, Estimate [00011] ${\hat{}}_{n}^{CC}$ Estimate of CamCan intercept x.sub.ij, X.sub.ij Y.sub.ij vector of biological covariates, Matrix of stacked x.sub.ij .sub.v, {circumflex over ()}.sub.v Regression vector for a voxel v, Estimate [00012] $_{v}^{CC}$ Estimated CamCan regression vector for a voxel v, .sub.iv, {circumflex over ()}.sub.iv Additive bias of site i at the voxel v, Estimate .sub.iv, {circumflex over ()}.sub.iv Multiplicative bias of site i at the voxel v, Estimate text missing or illegible when filed Random independent Gaussian noise for the feature y.sub.ijv .sub.v, {circumflex over ()}.sub.v Standard deviation of , Estimate [00013] $_{v}^{CC}$ Estimate of CamCan standard deviation Standardized z.sub.ifv Standardized value of y.sub.ijv .sub.iv.sup.* Standardized additive bias [00014] $_{iv}^{*}$ [00015] $Estimate of_{iv}^{*}$ [00016] ${\overline{}}_{iv}^{*}$ [00017] $Estimate of_{iv}^{*} using an empirical Bayes$ [00018] $_{iv}^{*}$ Standardized multiplicative bias [00019] ${\hat{}}_{iv}^{*}$ [00020] $Estimate of_{iv}^{*}$ [00021] ${\overline{}}_{iv}^{*}$ [00022] $Estimate of ? using an empirical Bayes$ .sub.i, .sub.i [00023] $Mean of ? for site i, E stimate$ [00024] $_{i}^{2}, {\overline{}}_{i}^{2}$ [00025] $Variance of ? for site i, Estimate$ .sub.i, .sub.i [00026] $Shape of ? for site i, E stimate$ .sub.i, .sub.i [00027] $Scale of ? for site i, E stimate$ [00028] $\overline{G} ?$ First empirical moment of an Inverse-Gamma [00029] ${\overline{S}}_{i}^{2}$ Second empirical moment of an Inverso-Gamma text missing or illegible when filed indicates data missing or illegible when filed
Table 1. List of ComBAT variables.

[0133] To mitigate site-specific biases, ComBAT employs a linear model for the image formation of each region (often voxel) as follows:

[00030] $\begin{matrix} y_{ijv} =_{v} + x_{ij}^{T}_{v} +_{iv} +_{iv}_{ijv} & (6) \end{matrix}$

where .sub.v is the model intercept of the overall population across all sites, x.sub.ij is a vector of covariates, .sub.v is the regression coefficient vector, .sub.iv and .sub.iv are the additive and multiplicative effects of site i, and .sub.ijv is a random independent Gaussian noise with

[00031] $_{ijv} ~ (0,_{v}^{2}) .$

This model is depicted for two sites in FIG. 10, which presents a drawing of an example of a Vanilla ComBAT technique, and FIG. 11 presents a drawing of an example of harmonization of mean diffusivity using a Vanilla ComBAT technique. (Notably, FIG. 10 illustrates the seven operations in a typical Vanilla ComBAT harmonization of two sites underlying the effect of each variable of Eqn. 22. FIG. 10 illustrates the raw data and the harmonized data. The gray and the red curves in FIG. 10 illustrate the overall trend of the population from site 1 and site 2, whereas the scatter plots illustrate the operations leading to the harmonization of the two sites.) In FIGS. 10 and 11, the black and red lines illustrate the trend of the two populations

[00032] $(i . e ._{v} + x_{1 j}^{T}_{v} +_{1 v} and_{v} + x_{2 j}^{T}_{v} +_{2 v})$

whereas the dotted line corresponds to the overall population trend i.e.

[00033] $_{v} + x_{i j}^{T}_{v} .$

the population) is constant for all sites which, as will be demonstrated, is often incorrect and results in flawed harmonization functions. Thus, the harmonization process illustrated in FIG. 10 is an idealized version of the technique.

[0134] In FIG. 11, the MD metric is harmonized using Vanilla ComBAT between sites of healthy participants but with a different slope .sub.v. In all cases, Vanilla ComBAT leads to an erroneous harmonization. Notably, FIG. 11 shows the harmonization of the two datasets 1110 and 1112, and a different dataset 1114 and a biased version of itself. The dashed line represents the slopes and intercept computed by Vanilla ComBAT. Note that 1116 illustrates the same experiment as in 1114, but with the average of 30 groups of 4 randomly selected participants.

[0135] The primary goal of ComBAT is usually to eliminate the site-specific additive and multiplicative biases (respectively .sub.iv and .sub.iv) ensuring that the harmonized population profile conforms to the model depicted in FIG. 10,

[00034] $\begin{matrix} y_{ijv}^{ComBAT} =_{v} + x_{ij}^{T}_{v} +_{ijv} . & (7) \end{matrix}$

Because the parameters of the model .sub.v, .sub.v, .sub.iv and .sub.iv are a priori unknown, they are usually empirically estimated from the data. If these parameters can be accurately estimated (referred to as {circumflex over ()}.sub.v, {circumflex over ()}.sub.v, {circumflex over ()}.sub.iv and {circumflex over ()}.sub.iv), the harmonized values

[00035] $y_{ijv}^{ComBAT}$

can be computed as

[00036] $\begin{matrix} y_{ijv}^{ComBAT} = \frac{y_{ijv} - {\hat{}}_{v} - x_{ij}^{T} {\hat{}}_{v} - {\hat{}}_{iv}}{{\hat{S}}_{iv}} + {\hat{}}_{v} + x_{ij}^{T} {\hat{}}_{v} . & (8) \end{matrix}$

[0136] One simplistic approach to estimating these parameters is through a Location and Scale (L/S) technique. Although this approach is straightforward and effective, it often has its share of limitations. These are typically addressed with the Bayesian formulation of ComBAT, which is discussed further below.

[0137] L/S ComBAT estimates the parameters {circumflex over ()}.sub.v, {circumflex over ()}.sub.v, {circumflex over ()}.sub.iv and {circumflex over ()}.sub.iv following a maximum likelihood of Gaussian distributions. As such, the regression vector {circumflex over ()}.sub.v and the intercept {circumflex over ()}.sub.v are obtained with an ordinary least-square (L/S) approach. This involves pooling data for every voxel from every patient j across all sites i, and use the usual L/S regression solution:

[00037] $\begin{matrix} {[{\hat{}}_{v}, {\hat{}}_{v}]}^{T} = {(X^{T} X)}^{- 1} X^{T} Y_{v}, & (9) \end{matrix}$

where X is the matrix of covariate vectors from every participant across all sites, and Y.sub.v is a vector containing the metric associated with location for every participant in all sites. According to this framework, the global intercept {circumflex over ()}.sub.v represents the average feature value at voxel for all images across all sites, excluding the effect of covariates, namely

[00038] $\begin{matrix} {\hat{}}_{v} = \frac{1}{J_{I}} {.Math.}_{ij} (y_{ijv} - x_{ij}^{T} {\hat{}}_{v}), & (10) \end{matrix}$

where J.sub.I is the total number dMRI-derived metric images acquired across all sites, illustrated as the total number of gray and red points in FIG. 10.

[0138] The additive and multiplicative site-specific biases {circumflex over ()}.sub.iv and {circumflex over ()}.sub.iv are determined by taking the mean and variance of the rectified value at location for all images acquired at site i:

[00039] $\begin{matrix} {\tilde{}}_{iv} = \frac{1}{J_{i}} \underset{j}{.Math.} (y_{ijv} - {\hat{}}_{v} - x_{ij}^{T} {\hat{}}_{v}), & (11) \end{matrix}$ $\begin{matrix} {\hat{}}_{iv}^{2} = \frac{1}{J_{i} - 1} \underset{j}{.Math.} {(y_{ijv} - {\hat{}}_{v} - x_{ij}^{T} {\hat{}}_{v} - {\tilde{}}_{iv})}^{2}, & (12) \end{matrix}$

[0139] When the total number of images J.sub.I acquired across all sites is sufficient, the maximum likelihood estimate of {circumflex over ()}.sub.v and {circumflex over ()}.sub.v from Eqns. 9 and 10 can be considered unbiased. However, for any site i with a low number of acquired images (a common scenario in clinical practice) {circumflex over ()}.sub.iv and {circumflex over ()}.sub.iv are biased and thus should not be relied upon.

[0140] One common solution to that problem is to estimate {circumflex over ()}.sub.iv and {circumflex over ()}.sub.iv following a maximum a posteriori formulation while keeping as is Eqns. 9 and 10 for estimating {circumflex over ()}.sub.v and {circumflex over ()}.sub.v. This leads to the most-used version of ComBAT, which is called Vanilla ComBAT in the remainder of the present disclosure.

[0141] Vanilla ComBAT typically begins by standardizing the data, which involves subtracting the population intercept, regression weight and variance, as depicted in FIG. 10 and represented mathematically as

[00040] $\begin{matrix} z_{ijv} = \frac{y_{ijv} - {\hat{}}_{v} - x_{ij}^{T} {\hat{}}_{v}}{{\hat{}}_{v}}, & (13) \end{matrix}$

where the variance is determined as

[00041] $\begin{matrix} {\overset{}{}}_{v}^{2} = \frac{1}{J_{I}} {.Math.}_{i j} {(y_{i j v} - {\overset{}{}}_{v} - x_{i j}^{T} {\overset{}{}}_{v} - {\overset{}{}}_{i v})}^{2}, & (14) \end{matrix}$

and {circumflex over ()}.sub.iv is calculated following Eqn. 11.

[0142] At this stage, the bias of each site (the vertical distance between the origin and the red and blue dashed line in FIG. 10) is denoted as

[00042] $_{i v}^{*}$

and the variance of each site is

[00043] $_{i v}^{2 *} .$

These two variables are usually estimated by maximizing their posterior distributions rather than their likelihood distributions. According to Bayes' theorem, the posteriors (.) of

[00044] $_{i v}^{*} and_{i v}^{2 *}$

can be expressed as

[00045] $\begin{matrix} P (_{i v}^{*} .Math. z_{lv},_{i v}^{2 *}) P (z_{iv} .Math._{iv}^{*},_{i v}^{2 *}) P (_{i v}^{*}), & (15) \end{matrix}$ $\begin{matrix} P (_{i v}^{2 *} .Math. z_{i v},_{i v}^{*}) P (z_{i v} .Math._{i v}^{*},_{i v}^{2 *}) P (_{i v}^{2 *}), where & (16) \end{matrix}$ $P (z_{iv} .Math._{iv}^{*},_{iv}^{2 *}) = (_{iv}^{*},_{iv}^{2 *})$

[0143] is the likelihood-Gaussian,

[00046] $P (_{i v}^{*}) = (_{i},_{i}^{2})$

is the prior-Gaussian, and

[00047] $P (_{i v}^{2 *}) = IG (_{i},_{i})$

is the prior-Inverse Gamma.

[0144] The hyperparameters of the prior distributions .sub.i, .sub.i.sup.2, .sub.i, and .sub.i are usually estimated based on the moment of these distributions. Specifically, .sub.i and .sub.i.sup.2 may be derived from the first and second moment of a Gaussian distribution, namely

[00048] $\begin{matrix} \begin{matrix} {\overline{}}_{i} = \frac{1}{V} \underset{v}{.Math.} {\hat{}}_{iv}^{*} \\ {\overline{}}_{i}^{2} = \frac{1}{V - 1} {.Math.}_{v} {({\hat{}}_{iv}^{*} - {\overline{u}}_{i})}^{2} \end{matrix} & (17) \end{matrix}$

where V is the number of brain regions (or voxels) and

[00049] ${\hat{}}_{iv}^{*} = \frac{1}{J_{i}} {.Math.}_{j} z_{ijv}$

is the region-wise and site-wise model intercept of the standardized data and should not be confused with {circumflex over ()}.sub.iv from Eqn. 11.

[0145] To estimate .sub.i and .sub.i, one usually must first calculate the voxel-wise and site-wise standardized variance

[00050] ${\hat{}}_{iv}^{2 *} = \frac{1}{J_{i} - 1} {.Math.}_{j} {(z_{ijv} -)}^{2} .$

This variance shall not be confused wise standardized variance with

[00051] ${\hat{}}_{iv}^{2}$

from Eqn 12. The empirical mean and variance of

[00052] ${\hat{}}_{iv}^{2 *}$

across all voxels is computed as

[00053] ${\overline{G}}_{.Math.} = \frac{1}{V} {.Math.}_{v} {\hat{}}_{iv}^{2 *} and \overline{S_{.Math.}^{2}} = \frac{1}{V - 1} {.Math.}_{v} {({\hat{}}_{iv}^{2 *} - {\overline{G}}_{i})}^{2} .$

Then, by equating G.sub.i and

[00054] ${\overline{V}}_{i}^{2}$

to the first and second theoretical moments of the inverse gamma distribution, we get,

[00055] $\begin{matrix} {\overline{S}}_{i}^{2} = \frac{{\overline{G}}_{.Math.} = \frac{_{i}}{_{i_{_{i}^{2}}} - 1}}{{(_{i} - 1)}^{2} (_{i} - 2)} & (18) \end{matrix}$

which can be rearranged to estimate the two hyperparameters,

[00056] $\begin{matrix} {\overset{}{}}_{?} = \frac{{\overline{G}}_{?}^{2} + 2 {\overline{S}}_{?}^{2}}{{\overline{S}}_{?}^{2}}, {\overline{}}_{?} = \frac{{\overline{G}}_{?}^{3} + {\overline{GS}}_{?}^{2}}{{\overline{S}}_{?}^{2}} & (19) \end{matrix}$ $? indicates text missing or illegible when filed$

[0146] Now that the hyperparameters of the likelihood and prior distributions have been estimated, by expanding Eqns. 15 and 16 and integrating them with Eqn. 17 and (19), we arrive at the mathematical expectation of the posterior distributions, which yields the estimates of

[00057] $_{iv}^{*} and_{i v}^{2 *}$

[00058] $\begin{matrix} {\overline{}}_{?}^{*} = \dot{} (_{?}^{*} .Math. z_{?},_{?}^{2 *}) = \frac{J_{?} {\overline{}}_{?}^{2} {\hat{}}_{?}^{*} + {\overline{}}_{?}^{2 *} {\overline{}}_{?}}{J_{?} {\overline{}}_{?}^{2} + {\overline{}}_{?}^{2 *}} & (20) \end{matrix}$ $\begin{matrix} {\overline{}}_{?}^{2 *} = \dot{} (_{?}^{2 *} .Math. z_{?},_{?}^{*}) = \frac{{\overline{}}_{?} + \frac{1}{2} {.Math.}_{j} {(z_{ijv} - {\overline{}}_{iv}^{*})}^{2}}{\frac{J_{?}}{2} + {\overline{}}_{i} - 1} & (21) \end{matrix}$ $? indicates text missing or illegible when filed$

[0147] Given the interdependency of Eqns. 20 and 21,

[00059] ${\overset{}{}}_{i v}^{*} and {\overset{}{}}_{i v}^{2 *}$

are often computed through an iterative process. This may start with a reasonable initial value for

[00060] ${\overset{}{}}_{i v}^{2 *} (e . g ., {\overset{}{}}_{i v}^{2 *}),$

followed by the calculation of

[00061] $_{iv}^{*},$

and then re-estimation of

[00062] ${\overset{}{}}_{i v}^{2 *}$

using the new

[00063] $_{i v}^{*}$

value, and so forth. This cycle can be repeated until convergence is achieved.

[0148] After all parameters of Eqn. 7 have been empirically estimated, data harmonization is performed as follows:

[00064] $\begin{matrix} y_{?}^{ComBat} = \frac{{\overset{?}{}}_{v}}{{\overline{}}_{?}^{*}} (z_{?} - {\overline{}}_{?}^{*}) + {\overset{?}{}}_{?} + x_{?}^{T} {\overset{?}{}}_{?} & (22) \end{matrix}$ $? indicates text missing or illegible when filed$

[0149] Despite its popularity, Vanilla ComBAT often has its limitations. As such, over the years, several derived versions of ComBAT have been proposed to improve the original approach. Most of these improved versions tackle the core assumptions of ComBAT. For example, ComBAT-GAM counters the linear assumption of ComBAT using spline-based generalized additive models (GAM). M-ComBAT harmonizes data to a specific reference (or target) site rather than a global average. B-ComBAT employes Monte Carlo bootstrapping to robustly estimate the parameters in ComBAT, whereas Longitudinal ComBAT integrates intra-subject temporal information. CoVBat additionally corrects for covariance effects besides variance and mean.

[0150] Vanilla ComBAT and its subsequent versions are primarily designed for use in the context of clinical and/or academic research studies. Here, studies (or trials) are defined as research initiatives where participants (human or otherwise) undergo specific interventions under a predetermined research plan with data collected across multiple sites following stringent protocols. Studies may also encompass retrospective research on public datasets. However, the assumption-laden ComBAT harmonization for study data is typically not well-suited to the dynamism of real-life clinical practice. This necessitates a specialized harmonization approach for clinical settings, which we introduce as Clinical ComBAT.

[0151] Clinical ComBAT is designed for the nuances of real-life clinical practice. It employs a simplified data model, uses a polynomial modeling of the data, a site-specific harmonization using a normative reference site, and/or a robust function to compensation for data irregularities. Additionally, it incorporates novel prior terms, and/or introduces a goodness of fit metric to evaluate harmonization quality.

[0152] We now discuss harmonization in clinical practice. While not widely documented, the harmonization requirements for data collected in research studies versus day-to-day clinical practice are often distinct. We have identified nine core differences between these two settings, as outlined in Table 2. The most significant distinction lies in their primary objectives. Research studies focus on analyzing population trends, characterizing diseases, identifying biomarkers, or developing diagnostic and prognostic techniques. Note that studies typically involve a homogeneous population, often compared to a reference group. In contrast, clinical practice is centered on personalized medicine, where the data of each individual are processed and analyzed independently of other participants scanned at the same site. Therefore, the harmonization requirement in clinical practice is usually oriented towards comparing the metrics of an individual to those of a normative reference, akin to comparing a child's height and body mass index against pediatric growth charts.

TABLE-US-00002 TABLE 2 Differences between research studies and clinical practice. Clinical studies Clinical practice Application Population study Personalized medicine Acquisition period Limited Unlimited Nb MRI scanner Fixed Increases over time Nb participants per site Fixed Increases over time (very low at the beginning) MRI drifting Little or none Inevitable Harmonization need When all data has been acquired As soon as possible Medical condition Similar within and across sites Different within and across (known for each participant) sites (often unknown for each patient) Age bracket Similar span across sites Variable Acquisition protocol Similar across sites Different across sites

[0153] The second difference relates to the acquisition period. Research studies are inherently finite, concluding with a fixed number of sites and participants. In contrast, real-life clinical practice usually involves a continuous influx of data over an indefinite period for each site. Furthermore, from a software service provider's standpoint, the number of data-contributing sites may expand at an unpredictable rate over time. The open-ended acquisition timeline in clinical settings also means that the MRI signal is prone to drift, a factor often not problematic in research studies. Such drifting can result from hardware aging or updates to software and/or hardware.

[0154] An additional distinction is the timing of data harmonization. In clinical research, harmonization is typically a one-time process, executed after all data has been collected and curated. Conversely, in day-to-day clinical practice, harmonization typically occurs each time new data is acquired.

[0155] The final difference lies in the heterogeneity of the data. Research studies enforce strict acquisition protocols and select participants based on various covariates like physical condition, lifestyle, age, sex, etc., resulting in a consistent data profile across sites. In contrast, day-to-day clinical practice deals with a highly heterogeneous patient population, without a specific selection process. Moreover, the acquisition protocols can vary significantly from one site to another, adding to the complexity of data harmonization in clinical environments.

[0156] Given these factors, day-to-day clinical practice necessitates a harmonization technique tailored to its specific realities. First, scanner drift-due to the normal aging of scannersand periodic software and hardware updates usually necessitate regular updates to the harmonization function of MRI clinical scanners. However, clinics and hospitals often hesitate to regularly acquire new data solely for updating harmonization, because of various financial, ethical, and staff availability constraints. As such, a proper harmonization should normally adapt to drifting without requiring the acquisition of specific data. Second, the harmonization function usually must also perform well with limited data, which is common when a site begins scanning its initial patients. It should also be robust against data spanning a limited age range and corrupted data from patients with undocumented diseases. Furthermore, when new sites are integrated into the network, the harmonization process should not require recalibrating the entire network, as would be the case with indiscriminate application of Vanilla ComBAT.

[0157] Given the realities of clinical practice, several drawbacks emerge from the indiscriminate use of Vanilla ComBAT. The first issue is its linear assumption regarding data distribution (c.f. Eqns. 6 and 7). Although data may exhibit linear distribution in certain scenarios, this is not universally applicable, as illustrated by the Mean Diffusivity age distribution in FIG. 11. As will be discussed further below, a more effective harmonization technique is one that can automatically adjust to the trend of the data, which is often nonlinear.

[0158] Another significant issue with Vanilla ComBAT is its assumption that the slope (parameter .sub.v) of the distribution remains consistent across all sites. While this might not be a problem when the acquisition protocol of all sites has been carefully tuned, this assumption is often wrong between sites exhibiting very different signals, as demonstrated in FIG. 11. For example, the slopes of two populations in two datasets 1110 and 1112 (represented by black and blue dashed lines) differ notably. Given the non-linear nature of the MD metric with respect to age, and because these populations cover different age ranges (20 to 60 years for 1110 and 19 to 90 years for 1114), the slopes for both sites diverge substantially. These slope discrepancies are further amplified by significant additive and multiplicative biases between the sites. Additionally, the larger population size of 1112 (441 participants) compared to 1110 (132 participants) means the slope derived from Eqn. 9 leans more towards 1112 than 1110, leading to the inappropriate dash line for 1110 in FIG. 11, which underpins the flawed harmonization shown in FIG. 11.

[0159] FIG. 11 exemplifies this issue with the harmonization of a modified version of 1114 against itself. Assuming for this data the linear model in Eqn. 7, simulating the effects of multiplicative and additive biases (.sub.i and .sub.i) yields the linear model {circumflex over ()}.sub.ijv=y.sub.ijv.sub.i+.sub.i, or more explicitly

[00065] $\begin{matrix} {\overset{}{y}}_{i j v} =_{v}_{i} +_{i} + x_{i j}^{T}_{v}_{i} +_{ijv}_{i} . & (23) \end{matrix}$

This indicates that any multiplicative bias affects the population slope (here .sub.v.sub.i) contradicting the Vanilla ComBAT assumption of uniform slope across sites. In FIG. 11, with applied multiplicative and additive biases of 0.5 and 0.0008 to 1114, it is evident that Vanilla ComBAT is inadequate for re-harmonizing the data.

[0160] FIG. 12 presents a drawing of an example of harmonization of mean diffusivity of two different sites using a Vanilla ComBAT technique in accordance with an embodiment of the present disclosure. Notably, FIG. 12 illustrates raw data 1210, Vanilla ComBAT harmonization of the healthy subject in two datasets 1112 and 1212, and Vanilla ComBAT harmonization of all participants (healthy plus Alzheimer's Disease or AD and Moderate Cognitive Impairment or MCI) with the health participants in the other dataset 1214.

[0161] A further significant limitation is that the intercept and slope parameters (.sub.v and .sub.v) are estimated using the maximum likelihood approach of Eqn. 9 without any regularization prior. This leads to considerable variability when only a small number of participants are used to calculate the ComBAT parameters. This issue is depicted for 1116 in FIG. 11, where the biased version of the dataset 1116 is harmonized onto the original dataset 1112. As opposed to dataset 1114 where the harmonization parameters were derived using all 441 participants, only 4 participants were used in 1116. This procedure was repeated 30 times, each with a different set of participants. The average resulting curves, shown in FIG. 11 for 1116, exhibit significant difference.

[0162] Another limitation stems from Vanilla ComBAT's assumption that the data to be harmonized across all sites come from a homogeneous patient population. In practice, the populations scanned across multiple imaging centers are often heterogeneous, as they do not undergo a uniform selection process. This discrepancy becomes apparent when harmonizing clinical sites onto a normative reference target, which ideally should comprise only healthy controls (HC). An example of this is shown in FIG. 12, where the data from the 1212 dataset is harmonized with the 1112 dataset, which acts as the normative target. While the 1112 dataset includes only healthy control (HC) participants, the 1212 dataset contains a mix of healthy control, Mild Cognitive Impairment, and Alzheimer's disease patients.

[0163] Raw data 1210 in FIG. 12 highlights the differences between the two populations. The blue and gray-shaded areas indicate the healthy control distribution for both sites, whereas the turquoise and orange triangles represent respectively the Alzheimer's disease and Mild Cognitive Impairment participants from the first site. In 1212, the harmonization outcome is presented when only healthy control from the first site were used to derive the ComBAT parameters in comparison to second-site data. In contrast, 1214 illustrates the harmonization process for both healthy control and Alzheimer's disease participants from the first site. Harmonizing using only healthy control participants correctly segregates the Alzheimer's disease and Mild Cognitive Impairment participants, placing the Alzheimer's disease participants away from the normal population curve, which does not happen when both healthy control and Alzheimer's disease and Mild Cognitive Impairment participants are included in the harmonization. This discrepancy underscores the flaw in assuming patient population homogeneity during the harmonization process. Of course, it is possible to include these pathologies in the covariate vector {right arrow over (x)}.sub.ij along with age and sex, allowing for their consideration in the estimation of .sub.v. However, there is currently no standardized reference encompassing the distribution of multiple pathologies encountered in routine clinical settings. Moreover, in clinical practice, patient pathologies are frequently undisclosed, given that the primary aim of scans is to aid in diagnosing suspected neurological conditions.

[0164] Clinical ComBAT introduces eleven modifications compared to Vanilla ComBAT, each tailored to better suit the nuances of clinical practice. The primary focus of Clinical ComBAT is to facilitate the comparison of patient data against a normative model, which is an important aspect of clinical applications. FIG. 13 presents a drawing of an example of harmonization using a Clinical ComBAT technique in accordance with an embodiment of the present disclosure. Notably, FIG. 13 depicts the operations in the harmonization procedure of Clinical ComBAT. The first notable feature of Clinical ComBAT is its strategy of harmonizing each site with a specific clinical target site. A similar approach of pairwise harmonization has been previously explored with M-ComBAT. However, Clinical ComBAT enhances this concept by ensuring that the target site contains a sufficiently large and diverse healthy control (HC) population to establish a robust normative model. It is predicated on the diversity of the subject distribution of the target site, encompassing a wide range of ages, handedness, and a balanced representation of genders.

[0165] The pairwise harmonization approach in Clinical ComBAT effectively addresses two challenges. First, it avoids the complexity of needing to harmonize an ever-growing number of sites simultaneously, as each site is harmonized independently. Secondly, it resolves the problem with Vanilla ComBAT, which could inadvertently modify the harmonization parameters for already calibrated sites when applied indiscriminately to a new site.

[0166] FIG. 13 presents a drawing of an example of harmonization of a moving site and a target site using a Clinical ComBAT technique in accordance with an embodiment of the present disclosure. Notably, 1310 is the raw data, and 1312 is the harmonized data. The grey (target site) and the red (moving site) lines illustrate the overall trend of the population from site 1 and site 2. (1-4) are the operations leading to the harmonization of the two sites.

[0167] Additionally, pairwise harmonization simplifies the data formation equation used in Vanilla ComBAT (see Eq. (6)). At the core of most ComBAT techniques is the assumption that each dataset includes two additive biases (a population bias .sub.v and a site bias .sub.iv) and two multiplicative biases (a population bias .sub.v and a site bias .sub.iv). These four parameters are intended to represent the variations of each site from an overall unbiased reference population. Because Clinical ComBAT independently harmonizes each moving site, the data formation model can be streamlined to two biases. First, the model being a linear equation, Clinical ComBAT combines ay and .sub.iv into a single site-wise additive bias, denoted as b.sub.iv. Additionally, because the multiplication of a normally distributed variable

[00066] $_{ijv} (0,_{v}^{2})$

by .sub.iv still results in a normal distribution with variance (.sub.v.sub.iv).sup.2, our model adopts a single multiplicative bias, denoted as d.sub.iv. This modification leads to the following simplified site-wise data formation equation:

[00067] $\begin{matrix} y_{i j v} = b_{i v} + x_{i j}_{v} + d_{iv}_{ijv}, & (24) \end{matrix}$

where i represents the site index i{Moving, Target} and .sub.ijv custom-character (0,1). This new data formation equation stands as the second improvement with respect to Vanilla ComBAT.

[0168] A third distinction arises with the slope parameter .sub.v in Vanilla ComBAT, which is the same across every site. Clinical ComBAT introduces a subtle but crucial modification by adding a site index i, resulting in .sub.iv. Consequently, each site has its own distinct slope, an adjustment that is critical for compensating in specific scenarios, as will be demonstrated in the results section.

[0169] The fourth distinction in Clinical ComBAT is the nonlinearity introduced in the Vanilla ComBAT model. This nonlinearity is achieved using a polynomial basis function of positive degree P, expressed as

[00068] $\begin{matrix} ({\overset{.fwdarw.}{x}}_{i j}) = {({\overset{.fwdarw.}{x}}_{i j}^{T} .Math. {\overset{.fwdarw.}{x}}_{i j} + 1)}^{P} & (25) \end{matrix}$

denoted as {right arrow over ()}.sub.ij for notation simplicity. Notably, regardless of the degree P, the expansion of this polynomial always includes a constant term, which allows the additive bias b.sub.iv to be effectively integrated into it. Consequently, the data formation equation is revised to

[00069] $\begin{matrix} y_{ijv} = {\overset{.fwdarw.}{}}_{i j}^{T} {\overset{.fwdarw.}{}}_{i v} + d_{iv}_{ijv} . & (26) \end{matrix}$

Here, each site is characterized by a specific set of weights {right arrow over ()}.sub.iv, along with a single multiplicative bias d.sub.iv and an additive bias that is incorporated into {right arrow over ()}.sub.iv.

[0170] With the pairwise harmonization framework of Clinical ComBAT and according to Eqn. 26, the harmonization of a moving site M onto a target normative reference T requires the estimation of four terms: two weight vectors {right arrow over ()}.sub.Mv and {right arrow over ()}.sub.Tv and two multiplicative biases d.sub.Mv and d.sub.Tv.

[0171] For the target site, like Vanilla ComBAT, {right arrow over ()}.sub.Tv is computed by maximizing a likelihood Gaussian distribution of the target data. Doing so boils down to minimizing the following L2 loss function

[00070] $\begin{matrix} {\overset{.fwdarw.}{}}_{T v} = \arg \max_{\overset{.fwdarw.}{}} \frac{1}{J_{T}} {.Math.}_{i = 1}^{J_{T}} {(y_{Tjv} - {\overset{.fwdarw.}{}}_{Tj}^{T} {\overset{.fwdarw.}{}}_{T v})}^{2}, & (27) \end{matrix}$

which may be solved with the following close form solution

[00071] $\begin{matrix} {\overset{.fwdarw.}{}}_{Tv} = {(_{T}^{T}_{T})}^{- 1}_{T}^{T} Y_{Tv}, & (28) \end{matrix}$

where .sub.T is a matrix containing the concatenated covariable vectors {right arrow over ()}.sub.Tj of all target participants, and Y.sub.Tv is a vector containing the values y.sub.Tjv of the target site. While this formulation is similar to that of Eqn. 9, it only contains data from the target site whereas Eqn. 9 contains data from every site.

[0172] A fifth difference with Vanilla ComBAT is on the computation of

[00072] $d_{Tv}^{2} .$

With the hypothesis that the target site is well populated, the estimation of

[00073] $d_{Tv}^{2}$

does not require an a priori term as in Eqns. 15 and 16. Instead,

[00074] $d_{Tv}^{2}$

can be estimated following a maximum likelihood which is a much simpler alternative to the iterative solution in Vanilla ComBAT derived from the use of an inverse gamma prior term (c.f. Eqns. 20 and 21). Instead, d.sub.Tv.sup.2 is obtained following a maximum likelihood of a Gaussian distribution which leads to the calculation of the simple variance of the rectified data,

[00075] $\begin{matrix} d_{Tv}^{2} = \frac{1}{J_{T}} {.Math.}_{j = 1}^{J_{T}} {(y_{Tjv} - {\overset{.fwdarw.}{}}_{Tj}^{T} {\overset{.fwdarw.}{}}_{Tv})}^{2} . & (29) \end{matrix}$

[0173] Because the moving site may contain a limited number of data, we proceed with the estimation of {right arrow over ()}.sub.Mv and d.sub.Mv with a maximum a posteriori (MAP) formulation. Unlike Vanilla ComBAT (the sixth difference) which models {right arrow over ()}.sub.Mv with a likelihood distribution, we model it with a posterior distribution involving the product of a Gaussian likelihood and prior:

[00076] $\begin{matrix} P ({\overset{.fwdarw.}{}}_{Mv} .Math. D_{Mv}) P (D_{Mv} .Math. {\overset{.fwdarw.}{}}_{Mv}) P ({\overset{.fwdarw.}{}}_{Mv}) & (30) \end{matrix}$

where D.sub.Mv={Y.sub.Mv, X.sub.M} contains the list of all values and covariables of the moving site at location and where

[00077] $\begin{matrix} P (D_{Mv} .Math. {\overset{.fwdarw.}{}}_{Mv}) = (_{M}^{T} {\overset{.fwdarw.}{}}_{iv}, {.Math.}_{l}) (likelihood - Gaussian) & (31) \end{matrix}$ $\begin{matrix} P ({\overset{.fwdarw.}{}}_{Mv}) = ({\overset{.fwdarw.}{}}_{Tv}, {.Math.}_{0})) (Prior - Gaussian) . & (32) \end{matrix}$

The goal of the prior is to make sure {right arrow over ()}.sub.Mv does not differ too much from {right arrow over ()}.sub.Tv, the regression weights of the target site. This is often important when a site accounts for a very low number of participants. Note that maximizing the posterior P({right arrow over ()}.sub.Mv|D.sub.Mv) leads to the following close-form solution:

[00078] $\begin{matrix} {\overset{.fwdarw.}{}}_{Mv} = {(_{M}^{T}_{M} + {\overset{.fwdarw.}{}}^{T} I)}^{- 1} (_{M}^{T} {\overset{.fwdarw.}{Y}}_{Mv} + {\overset{.fwdarw.}{}}^{T} {\overset{.fwdarw.}{}}_{Tv}), & (33) \end{matrix}$

where I is the identity matrix and {right arrow over ()} custom-character .sup.p+ is a regularization vector. As can be seen, for {right arrow over ()}={right arrow over (0)}, the regression of {right arrow over ()}.sub.Mv is a maximum likelihood estimation as in Eqn. 28. Moreover, as the regularization increases, more weight is put on {right arrow over ()}.sub.Tv, and, when {right arrow over ()} becomes large enough, {right arrow over ()}.sub.Mv{right arrow over ()}.sub.Tv.

[0174] A seventh difference with Vanilla ComBAT comes from our maximum a posteriori formulation for estimating the moving site variance

[00079] $d_{Mv}^{2} .$

For Clinical ComBAT, the posterior is given by

[00080] $\begin{matrix} P (d_{Mv}^{2} .Math. z_{Mv}) P (z_{Mv} .Math. d_{Mv}^{2}) P (d_{Mv}^{2}), & (34) \end{matrix}$

where z.sub.Mv is the set of all rectified values of the moving site, namely

[00081] $z_{Mjv} = y_{Mjv} - {\overset{.fwdarw.}{}}_{Mj}^{T} {\overset{.fwdarw.}{}}_{Mv} j .$

Like Vanilla ComBAT, the likelihood distribution follows a Gaussian distribution. However, the prior is a Gamma (instead of an inverse Gamma) distribution:

[00082] $\begin{matrix} P (z_{Mv} .Math. d_{Mv}^{2}) = (0, d_{Mv}^{2}) (likelihood - Gaussian), & (35) \end{matrix}$ $\begin{matrix} P (_{Mv}) = G (_{Mv} .Math. a_{0}, b_{0}) (prior - Gamma), & (36) \end{matrix}$

where

[00083] $_{Mv} = \frac{1}{d_{Mv}^{2}}$

is the inverse of the variance also called precision. By incorporating the variance of the target site

[00084] $d_{Tv}^{2}$

in the Gamma prior, maximizing the posterior leads to the following solution:

[00085] $\begin{matrix} d_{Mv}^{2} = \frac{J_{M} {\hat{d}}_{Mv}^{2}}{J_{M} + v_{0}} + \frac{v_{0} d_{Tv}^{2}}{J_{M} + v_{0}} & (37) \end{matrix}$

where J.sub.M is the number of data in the moving site,

[00086] ${\hat{d}}_{Mv}^{2}$

is the empirical variance of the rectified moving site

[00087] $\frac{1}{J_{M}} {.Math.}_{j = 1}^{J_{M}} {(y_{Mjv} - {\overset{.fwdarw.}{}}_{Mj}^{T} {\overset{.fwdarw.}{}}_{Mv})}^{2}$

and .sub.0 custom-character .sup.+ is an hyperparameter. This equation comes with an intuitive flavor. Indeed, when the moving site lacks data, indicated by J.sub.M=0, the variance of the moving site

[00088] $d_{Mv}^{2}$

defaults to match the variance of the target site

[00089] $d_{Tv}^{2} .$

As the quantity of data of the moving site increases,

[00090] $d_{Mv}^{2}$

transitions to a weighted average between

[00091] ${\hat{d}}_{M v}^{2} and d_{Tv}^{2} .$

At some point, when

[00092] $J_{M} = v_{0}, d_{Mv}^{2}$

becomes the exact average between

[00093] ${\hat{d}}_{Mv}^{2} and d_{Tv}^{2} .$

Beyond this point, when

[00094] $J_{M} v_{0}, d_{Mv}^{2}$

closely approximates to

[00095] ${\overset{}{d}}_{Mv}^{2} .$

In addition to its simplicity, Eqn. 37 does not require an iterative process and has the target variance as a prior instead of data taken across the brain.

[0175] We now describe the harmonization process. At the heart of Clinical ComBAT are two techniques. The first, named Clinical ComBAT-fit, processes data and covariates for a specified region v collected at the target and moving sites. The data from the target site, D.sub.Tv, includes {(y.sub.T1v, {right arrow over (x)}.sub.T1), (y.sub.T2v, {right arrow over (x)}.sub.T2), . . . , (y.sub.TN.sub.T.sub.v, {right arrow over (x)}.sub.TJ.sub.T)} and similarly, the moving site data, D.sub.Mv, includes {(y.sub.M1v, {right arrow over (x)}.sub.M1), (y.sub.M2v, {right arrow over (x)}.sub.M2), . . . , (y.sub.MN.sub.M.sub.v, {right arrow over (x)}.sub.MJ.sub.M)}. From these, it calculates the four harmonization parameters

[00096] ${d_{Mv}^{2}, {\overset{.fwdarw.}{}}_{Mv}, d_{Tv}^{2}, {\overset{.fwdarw.}{}}_{Tv}} .$

[0176] The second technique is called Clinical ComBAT-Apply. It uses a set of J.sub.M data from the moving site and harmonizes with the target site. Detailed explanations and mathematical frameworks of these techniques are outlined in Tables 3 and 4.

TABLE-US-00003 TABLE 3 Clinical ComBAT-Fit Input: D.sub.Tv ={(y.sub.T1v, {right arrow over (x)}.sub.T1), (y.sub.T2v, {right arrow over (x)}.sub.T2), ... , (y.sub.TN.sub.T.sub.v, {right arrow over (x)}.sub.TJ.sub.T) // Target site data Input: D.sub.Mv = {(y.sub.M1v, {right arrow over (x)}.sub.M1), (y.sub.M2v, {right arrow over (x)}.sub.M2), ... , (y.sub.MN.sub.M.sub.v, {right arrow over (x)}.sub.MJ.sub.M)} // Moving site data Input: P, {right arrow over ()}, V.sub.0 // Hyperparameters [00097] $Output : {d_{M v}^{2}, {\overset{.fwdarw.}{}}_{Mv}, d_{T v}^{2}, {\overset{.fwdarw.}{}}_{Tv}}$ // harmonization parameters // Target site parameters 1 [00098] ${\overset{.fwdarw.}{}}_{T j} {({\overset{.fwdarw.}{x}}_{Tj}^{T} .Math. {\overset{.fwdarw.}{x}}_{T j} + 1)}^{P} j$ Eqn. 24 2 .sub.T stack every vector {right arrow over ()}.sub.Tj in a matrix 3 [00099] ${\overset{.fwdarw.}{}}_{Tv} = {(_{T}^{T}_{T})}^{- 1}_{T}^{T} Y_{Tv}$ Eqn. 28 4 [00100] $d_{T v}^{2} = \frac{1}{J_{T}} {.Math.}_{j = 1}^{J_{T}} {(y_{Tjv} - {\overset{.fwdarw.}{}}_{Tj}^{T} {\overset{.fwdarw.}{}}_{Tv})}^{2}$ Eqn. 29 // Moving site parameters 5 [00101] ${\overset{.fwdarw.}{}}_{Mj} {({\overset{.fwdarw.}{x}}_{Mj}^{T} .Math. {\overset{.fwdarw.}{x}}_{Mj} + 1)}^{P} j$ Eqn. (24 6 .sub.M stack every vector {right arrow over ()}.sub.Mj in a matrix 7 [00102] ${\overset{.fwdarw.}{}}_{Mv} = {(_{M}^{T}_{M} + {\overset{.fwdarw.}{}}^{T} I)}^{- 1} (_{M}^{T} {\overset{.fwdarw.}{Y}}_{Mv} + {\overset{.fwdarw.}{}}^{T} {\overset{.fwdarw.}{}}_{Tv})$ Eqn. 33 8 [00103] ${\overset{}{d}}_{M v}^{2} = \frac{1}{J_{M}} {.Math.}_{j = 1}^{J_{M}} {(y_{Mjv} - {\overset{.fwdarw.}{}}_{Mj}^{T} {\overset{.fwdarw.}{}}_{Mv})}^{2}$ 9 [00104] $d_{M v}^{2} = \frac{J_{M} {\overset{}{d}}_{M v}^{2}}{J_{M} + v_{0}} + \frac{v_{0} d_{Tv}^{2}}{J_{M} + v_{0}}$ Eqn. 37 10 [00105] $Return {d_{M v}^{2}, {\overset{.fwdarw.}{}}_{Mv}, d_{T v}^{2}, {\overset{.fwdarw.}{}}_{Tv}}$

TABLE-US-00004 TABLE 4 Clinical ComBAT-Apply Input: D.sub.Mv = {(y.sub.M1v, {right arrow over (x)}.sub.M1), (y.sub.M2v, {right arrow over (x)}.sub.M2), ... , (y.sub.MN.sub.M.sub.v, {right arrow over (x)}.sub.MJ.sub.M)} // Moving site data Input: P // Hyperparameter Output: {.sub.M1v, .sub.M2v, ... .sub.MN.sub.M.sub.v} // harmonized values 1 for j = 1 ... J.sub.M, do 2 [00106] $| {\overset{.fwdarw.}{}}_{Mj} {({\overset{.fwdarw.}{x}}_{Mj}^{T} .Math. {\overset{.fwdarw.}{x}}_{Mj} + 1)}^{P}$ Eqn. 24 3 [00107] $| {\hat{y}}_{Miv} = \frac{y_{Miv} - {\overset{.fwdarw.}{}}_{Mv}^{T} {\overset{.fwdarw.}{}}_{Mj}}{d_{Mv}^{2}} d_{Tv}^{2} + {\overset{.fwdarw.}{}}_{Tv}^{T} {\overset{.fwdarw.}{}}_{Tj}$ Eqn. 26 4 Return {.sub.M1v, .sub.M2v, ... .sub.MJ.sub.M.sub.v}

[0177] Additional improvements in Clinical ComBAT include goodness of fit quality control. FIGS. 11 and 12 demonstrate that Vanilla ComBAT (and its variants) are susceptible to harmonization errors. These errors often result in a harmonized population distribution that is poorly aligned to the target population. This issue is clearly depicted in FIG. 11, where the red-shaded harmonized distribution leans downward compared to the gray-shaded target population. Such discrepancies in harmonization can lead to misinterpretation of the data by healthcare providers.

[0178] From this perspective, an eighth different with Vanilla ComBAT is a quality control system designed to identify abnormal shifts in a set of harmonized data relative to a target population. Several heuristic techniques can be implemented, such as counting the instances where harmonized data points fall two (or more) standard deviations away from the reference population. A flag would be raised when these deviations exceed a predefined threshold.

[0179] In the case of Clinical ComBAT, a population overlap score is computed to gauge the extent of overlap between the target and harmonized populations. Initially, from the original raw datasets D.sub.Tv and D.sub.Mv and using the Clinical ComBAT-Fit technique, we derive the harmonization parameters

[00108] ${d_{Mv}^{2}, {\overset{.fwdarw.}{}}_{Mv}, d_{Tv}^{2}, {\overset{.fwdarw.}{}}_{Tv}} .$

Subsequently, applying Clinical ComBAT-Apply to D.sub.Mv (or on any other dataset from the moving site) produces a set of harmonized data called D.sub.Mv.

[0180] From this point on, the objective is to determine how much the harmonized population D.sub.Mv overlaps the target population Dry. This may be achieved by rectifying both populations using the target parameter vector {right arrow over ()}.sub.Tv:

[00109] $\begin{matrix} \begin{matrix} z_{Tjv} = y_{Tjv} - {\overset{.fwdarw.}{}}_{Tj}^{T} {\overset{.fwdarw.}{}}_{Tv} & (y_{Tjv}, {\overset{.fwdarw.}{}}_{Tj}^{T}) D_{Tv} \end{matrix} & (38) \end{matrix}$ $\begin{matrix} \begin{matrix} {\overset{}{z}}_{Mjv} = {\overset{}{y}}_{Mjv} - {\overset{.fwdarw.}{}}_{Mj}^{T} {\overset{.fwdarw.}{}}_{Tv} & ({\overset{}{y}}_{Mjv}, {\overset{.fwdarw.}{}}_{Mj}^{T}) D_{Mv} . \end{matrix} & (39) \end{matrix}$

By removing the effect of the covariates, this rectification process allows us to assume that z.sub.ijv is independent from {right arrow over (x)} and that P(z.sub.Tjv|{right arrow over (x)}.sub.Tj)=P(z.sub.Tjv) and P({circumflex over (z)}.sub.Mjv|{right arrow over (x)}.sub.Mj)=P({circumflex over (z)}.sub.Mjv).

[0181] The distance between univariate distribution functions can be computed using various statistical metrics and divergence measures, including: total variation distance, Hellinger distance, Lvy-Prokhorov metric, Wasserstein metric, Mahalanobis distance, Amari distance, integral probability metrics, KL divergence, Jensen-Shannon divergence, Bhattacharyya distance, Rnyi divergence, and/or f-divergence. In Table 7, we illustrate the use of the Bhattacharyya distance which, in the case of two Gaussian distributions such as in ours, results into the close form solution of line 5 in Table 7.

TABLE-US-00005 TABLE 5 Harmonization-Qc Input {circumflex over (D)}.sub.Mv = {(.sub.M1v, {right arrow over (x)}.sub.M1), (.sub.M2v, {right arrow over (x)}.sub.M2), ... , (.sub.MN.sub.M.sub.v, {right arrow over (x)}.sub.MJ.sub.M)} // Harmonized data D.sub.Tv = {(y.sub.T1v, {right arrow over (x)}.sub.T1), (y.sub.T2v, {right arrow over (x)}.sub.T2), ... , (y.sub.TN.sub.T.sub.v, {right arrow over (x)}.sub.TJ.sub.T)} // Target site data thr (default = 0.04) // Threshold {right arrow over ()}.sub.Tv // Target site parameter Output Qc flag // data rectification 1 [00110] $Z_{T} = {z_{Tjv} = y_{Tjv} - {\overset{.fwdarw.}{}}_{Tj}^{T} {\overset{.fwdarw.}{}}_{Tv} .Math. (y_{Tjv}, {\overset{.fwdarw.}{}}_{Tj}^{T}) D_{Tv}}$ 2 [00111] $Z_{M} = {{\hat{z}}_{Mjv} = {\hat{y}}_{Mjv} - {\overset{.fwdarw.}{}}_{Mj}^{T} {\overset{.fwdarw.}{}}_{Tv} .Math. ({\hat{y}}_{Mjv}, {\overset{.fwdarw.}{}}_{Mj}^{T}) D_{Mv}}$ // Compute the Bhattacharyya distance 3 [00112] $_{T} = mean (Z_{T}),_{T}^{2} = var (Z_{T})$ 4 [00113] $_{M} = mean (Z_{M}),_{M}^{2} = var (Z_{M})$ 5 [00114] $d_{B} = \frac{1}{4} \frac{{(_{T} -_{M})}^{2}}{_{T}^{2} +_{M}^{2}} + \frac{1}{2} \ln (\frac{_{T}^{2} +_{M}^{2}}{2_{T}_{M}})$ 6 if d.sub.B > thr then // a default value for thr is 0.04 7 return False 8 else 9 return True

[0182] The nineth difference with Vanilla ComBAT is the rejection of outliers. Data acquired in real-life conditions is rarely obtained from healthy individuals. Instead, it is typically collected from subjects with conditions that may affect the measured dMRI signal. Furthermore, the condition of the subject is often unknown at the time of scanning, as the primary purpose of an MRI scan is to aid in diagnosis. Consequently, the moving data D.sub.Mv can often be corrupted by outlier measures. If used as-is, this can lead to situations such as the one illustrated in FIG. 12 where the harmonization squashes together the healthy and non-healthy measures along the normative target model.

[0183] One solution is to reduce as much as possible the influence of outlier data. One way of doing so is by removing data that are spreading far away from the center of gravity of the distribution. This can be done by clipping data that are more than a few standard deviations away from the mean or the median of the population or whose p-value is below a certain threshold. As an alternative, one can also use techniques based on the median absolute deviation or other estimators like the s, and the .sub.Q.sub.n estimators.

[0184] However, because outlier data are typically due to persons suffering from a medical condition, the outlier typically fall on one side of the distribution (e.g. in FIG. 12 the Alzheimer's disease subjects are all above the average). As such, a blind clipping may end up removing as many normal data than outliers. As an alternative, one might iteratively remove extreme data on the side the distribution is skewed towards, up until when the remaining data exhibit a better Gaussian shape, such as when it passes a normality test.

[0185] FIG. 14 presents a drawing of an example of outlier rejection in accordance with an embodiment of the present disclosure. Notably, FIG. 14 illustrates an outlier rejection technique with data lying mostly on one site of the distribution. 1410 shows a population made of healthy controls (in blue) and outliers (in red) with the entire population distribution being shifted toward the outliers. Moreover, 1412 shows data lying on the left of the median are considered healthy controls. Furthermore, 1414 shows that flipping the selected data results into a distribution that is closer to the healthy control distribution. Additionally, 1416 shows that removing the right most extreme data leads to a distribution close to that of the healthy controls.

[0186] One way of doing so is illustrated in FIG. 14 for the mean diffusivity metric. Starting with rectified moving data

[00115] $Z_{M} = {{\overset{}{z}}_{Mjv} = {\overset{}{y}}_{Mjv} - {\overset{.fwdarw.}{}}_{Mj}^{T} {\overset{.fwdarw.}{}}_{Mv} | ({\overset{}{y}}_{Mjv}, {\overset{.fwdarw.}{}}_{Mj}^{T}) D_{Mv}},$

1410 illustrates the healthy controls and outlier data as blue and red points. As can be seen, the distribution of the entire population (healthy controls and outliers) is skewed towards the right, i.e., on the side of the red outlier data. One way of removing the outliers is to compute the median of the entire population and consider the data on the left of the median to be mostly composed of healthy controls (1412). Then, by flipping these data around the median and fitting on it a Gaussian distribution (the green aka Flip distribution in 1414) we get a distribution that approximate that of the healthy controls. Then, one can iteratively remove the right-most data (illustrated by the black dots) up until when the mean of the entire population falls on the left of the flip distribution as shown by 1416.

[0187] FIG. 15 presents a drawing of examples of different polynomial models of degree 3 in accordance with an embodiment of the present disclosure. The gray dots with their associated gray curve model are from the 1112 dataset and the orange ones are from sample of 19 participants from a 1510 dataset. Overfitting of the moving model obtained with {right arrow over ()}-0 diverges for the 20-40 age range. Moreover, the result of an auto-tuning technique illustrates that the shape of the orange model does not deviate too much from that of the gray model.

[0188] We now discuss hyperparameter auto-tuning. The tenth difference with Vanilla ComBAT is the ability for the disclosed Clinical ComBAT technique to automatically tune its most influential hyperparameter. For Clinical ComBAT, that hyperparameter is {right arrow over ()} in Eqn. 33. The size of this hyperparameter vector dictates the extent to which the model weights, denoted by {right arrow over ()}.sub.Mv, will diverge from the target site weights {right arrow over ()}.sub.Tv. At one extreme, when {right arrow over ()}.fwdarw.{right arrow over (0)}, {right arrow over ()}.sub.Mv aligns closely with the moving data, potentially leading to overfitting. This is more likely when the moving dataset is small, contains outliers, or is limited to a narrow age range, or when the polynomial degree P of Eqn. 24 is too high. FIG. 15 provides an example, displaying target data (gray dots) against raw moving data (orange triangles). In this example, the amount of data in the moving site is low and spans over a limited age range. The gray curve represents the target model fit:

[00116] ${\overset{.fwdarw.}{}}_{T v}^{T} {\overset{.fwdarw.}{}}_{j} j [1 8, 9 0]$

calculated with Eqn. 28, while the orange dotted curve shows the moving model fit:

[00117] ${\overset{.fwdarw.}{}}_{Mv}^{T} {\overset{.fwdarw.}{}}_{j} j [1 8, 9 0]$

derived from Eqn. 33. Both curves are polynomials of degree P equal to 3. On the left, for {right arrow over ()}={right arrow over (0)}, the orange dotted curve visibly overfits, deviating significantly from the gray curve. Conversely, with a carefully adjusted {right arrow over ()}, the orange dotted curve on the right strikes a balance between fitting the data and resembling the gray target curve.

[0189] Assuming that the vector {right arrow over ()}.sub.Tv has the correct form, our objective is to identify the appropriate {right arrow over ()} vector that ensures the orange curve aligns closely with the gray curve while accurately representing the orange moving data. This task is quantified using four distinct distance measures between the two curves, as illustrated in FIG. 15. The first two measures, d.sub.min and d.sub.max represent the minimum and maximum distances between the curves across all voxels for the age range present in the moving data:

[00118] $\begin{matrix} d_{m i n} = \min ({\overset{.fwdarw.}{}}_{Tv}^{T} {\overset{.fwdarw.}{}}_{j} - {\overset{.fwdarw.}{}}_{Mv}^{T} {\overset{.fwdarw.}{}}_{j}) j x_{M}, & (40) \end{matrix}$ $\begin{matrix} d_{m ax} = \max ({\overset{.fwdarw.}{}}_{Tv}^{T} {\overset{.fwdarw.}{}}_{j} - {\overset{.fwdarw.}{}}_{Mv}^{T} {\overset{.fwdarw.}{}}_{j}) j x_{M} . & (41) \end{matrix}$

The other two distances d.sub.1 and d.sub.2, are the minimum and maximum inter-curve distances across all voxels for the full age range.

[0190] The optimization of A can be expressed as

[00119] $\begin{matrix} \overset{.fwdarw.}{} = \arg \min_{\overset{.fwdarw.}{}} sign (d_{m i n}_{1^{-}} d_{1}) + sign (d_{2} - d_{m ax}_{2}) + 2, & (42) \end{matrix}$

where .sub.1 [0,1] and .sub.21 are predefined values, and the sign function is defined as

[00120] $\begin{matrix} sign () = {\begin{matrix} 1 & if 0 \\ - 1 & otherwise . \end{matrix} & (43) \end{matrix}$

Note that when .sub.1=.sub.2=1, the optimized {right arrow over ()} vector results in an orange dotted curve perfectly parallel to the gray target curve. Furthermore, since .sub.1 is positive, the two curves cannot intersect as in FIG. 13.

TABLE-US-00006 TABLE 6 Hyperparameter auto-tuning Input: D.sub.Tv = {(y.sub.T1v, {right arrow over (x)}.sub.T1), (y.sub.T2v, {right arrow over (x)}.sub.T2), ... , (y.sub.TN.sub.T.sub.v, {right arrow over (x)}.sub.TJ.sub.T)} // Target site data Input: D.sub.Mv = {(y.sub.M1v, {right arrow over (x)}.sub.M1), (y.sub.M2v, {right arrow over (x)}.sub.M2), ... , (y.sub.MN.sub.M.sub.v, {right arrow over (x)}.sub.MJ.sub.M)} // Moving site data Input: P, v.sub.0, .sub.1, .sub.2, k, .sub.min // Hyperparameters Output: {right arrow over ()} // harmonization parameters // Initialization of {right arrow over ()} 1 [00121] ${d_{Mv}^{2}, {\overset{.fwdarw.}{}}_{Mv}, d_{Tv}^{2}, {\overset{.fwdarw.}{}}_{Tv}} = ClinicalComBAT - Fit (P, \overset{.fwdarw.}{0}, v_{0}, D_{Tv}, D_{Mv})$ 2 [00122] ${\overset{.fwdarw.}{}}_{0} = .Math. \frac{{\overset{.fwdarw.}{}}_{Tv}^{[0]}}{{\overset{.fwdarw.}{}}_{Tv}} .Math.$ // 1D Search technique 3 found = False 4 = .sub.min 5 while found == False do 6 {right arrow over ()} = , {right arrow over ()}.sub.0 7 [00123] ${d_{Mv}^{2}, {\overset{.fwdarw.}{}}_{Mv}, d_{Tv}^{2}, {\overset{.fwdarw.}{}}_{Tv}} = ClinicalComBAT - Fit (P, \overset{.fwdarw.}{}, v_{0}, D_{Tv}, D_{Mv})$ 8 Compute d.sub.min, d.sub.max, d.sub.1, d.sub.2 9 if sign(d.sub.min .sub.1 d.sub.1) + sign(d.sub.2 d.sub.max .sub.2) + 2 == 0 then Eqn.(35) 10 found = True 11 else: 12 = k// k is a multiplicative step site > 1 13 return {right arrow over ()}

[0191] Finding a suitable {right arrow over ()} vector that satisfies the optimization condition of Eqn. 42 can be achieved using a technique such as the simplex optimizer. Alternatively, one may initialize {tilde over ()} with a normalized version {right arrow over ()}.sub.Tv and then incrementally adjust a multiplicative term until the optimization criterion is met. This approach simplifies the optimization process to one dimension, with details provided in Table 6.

[0192] Please note that the hyper-parameter .sub.0 may also be estimated with the same auto-tuning approach. However, our empirical experiments reveal that a value of .sub.0=5 produces good results in every case we tested.

[0193] Note that computing harmonization parameters with a very small number of participants (e.g. 10 or less) of moving data can result in a transfer function that is poorly suited for accurately harmonizing future data. This issue can be particularly pronounced when the initial data used to compute these parameters has a limited range of covariates. For example, when the data is from participants who are all of similar age (e.g., between 20 and 25), the harmonization parameters may not be effective for future participants with a significantly different age range (e.g., between 70 and 90).

[0194] The solution to this problem is the eleveth difference with Vanilla ComBAT and its variants. As such, the disclosed Clinical ComBAT technique pools the harmonization parameters

[00124] ${d_{M v}^{2}, {\overset{.fwdarw.}{}}_{Mv}, d_{Tv}^{2}, {\overset{.fwdarw.}{}}_{T v}}$

or the data from a community of sites that have been properly harmonized beforehand. When the number of these community sites is sufficiently large, it can be assumed that the population distribution of the moving site is similar to at least one of these third-party sites. Based at least in part on this hypothesis, the strategy may be to identify the closest third-party sites whose distribution of covariates matches that of the moving site and use their harmonization parameters and/or data to harmonize the moving site. This approach forms the basis of the Community-ComBAT harmonization technique, as described in Table 7.

[0195] In this technique, the distance between the moving site D.sub.Mv and a third-party site D.sub.cv is measured by the Bhattacharyya distance of the rectified data. Other statistical distances, such as the Kullback-Leibler divergence, total variation distance, Hellinger distance, Jensen-Shannon divergence, or any Earth mover distances, could also be used. Moreover, to prevent the more heavily populated sites from having an overwhelming influence on the ComBAT fit (e.g., in line 7 of Table 6), one can resample the distribution of every site in order to have an equal amount of data among the k closest sites in the community.

TABLE-US-00007 TABLE 7 Community-ComBAT Input: D.sub.Tv = {(y.sub.T1v, {right arrow over (x)}.sub.T1), (y.sub.T2v, {right arrow over (x)}.sub.T2), ... , (y.sub.TN.sub.T.sub.v, {right arrow over (x)}.sub.TJ.sub.T)} // Target site data Input: D.sub.cv = {D.sub.cv1, D.sub.cv2, D.sub.cv3, ... , D.sub.cvM} // M Community site data Input: D.sub.Mv = {(y.sub.M1v, {right arrow over (x)}.sub.M1), (y.sub.M2v, {right arrow over (x)}.sub.M2), ... , (y.sub.MN.sub.M.sub.v, {right arrow over (x)}.sub.MJ.sub.M)} // Moving site data Input: P, {right arrow over ()}, v.sub.0, k, nb_sub // Hyperparameters [00125] $\begin{matrix} Output : {d_{Mv}^{2}, {\overset{.fwdarw.}{}}_{Mv,} d_{Tv}^{2}, {\overset{.fwdarw.}{}}_{Tv}} \end{matrix}$ // harmonization parameters // Initialization of {right arrow over ()} 1 Site_dist = { } 2 For i = 1 to M do 3 // Compute the Bhattacharyya distance 4 d.sub.B = Harmonization-Qc(D.sub.Mv, D.sub.cvi, {right arrow over ()}.sub.cvi) // section 2.7.4 5 Site_dist Site dist (d.sub.B, D.sub.cvi,) // Harmonize the data of the closest sites onto the target site 6 D.sub.kv pool a fix number of nb_sub data of the k closest sites according to Site_dist 7 [00126] ${d_{Mv}^{2}, {\overset{.fwdarw.}{}}_{Mv}, d_{Tv}^{2}, {\overset{.fwdarw.}{}}_{Tv}} = Clinical ComBAT - Fit (D_{Tv}, D_{kv}, P, \overset{.fwdarw.}{}, v_{0})$ 8 [00127] $return {d_{Mv}^{2}, {\overset{.fwdarw.}{}}_{Mv}, d_{Tv}^{2}, {\overset{.fwdarw.}{}}_{Tv}}$

[0196] FIG. 16 presents a drawing of examples of harmonization of mean diffusivity of the arcuate fasciculus in the left hemisphere in the 1110, 1212, 1510 and 1610 datasets using a Vanilla ComBAT technique and a Clinical ComBAT technique in accordance with an embodiment of the present disclosure. The lines indicate the 95th, 75th, 50th, 25th, and 5th percentiles.

[0197] For off-the-shelf Clinical ComBAT, our experiments suggest that the polynomial degree for modeling should be determined by the target site data. The minimal degree that provides an adequate goodness of fit should be chosen (e.g., identified using the Bhattacharyya distance). Specifically, mean diffusivity, fractional anisotropy, and apparent fiber density may be suitably fitted with a polynomial degree of 2.

[0198] The variance estimation parameter .sub.0 controls the variance estimation of the moving site data

[00128] $d_{M v}^{2} .$

A too low .sub.0 value can lead to misestimation and incorrect variance correction when data points are sparse. Conversely, a too high .sub.0 value fixes the moving site variance to the target site variance, failing to correct the multiplicative bias of the moving site. Our experiments on the mean diffusion metrics white matter bundles suggest .sub.0=5 provides a reliable estimation of

[00129] $d_{M v}^{2} .$

When sufficient data points are available,

[00130] $d_{M v}^{2}$

is progressively influenced by the target site data.

[0199] The hyperparameters .sub.1 and .sub.2 control the predictive capability of the model for unseen age data. Our experiments suggest that .sub.1=0.5 and .sub.2=2 strike a good balance between the moving site data-driven fit and providing reasonable model predictions for unseen data. Values closer to one tend to fix the polynomial curve shape to the target site curve, while excessively high values may yield unrealistic model predictions.

[0200] We conducted a series of experiments to underline the robustness of Clinical ComBAT compared to Vanilla ComBAT. Results were obtained on the data from the 1110, 1212, 1510 and 1610 datasets.

[0201] The harmonization results for mean diffusivity (FIG. 16) and apparent fiber density (as shown in FIG. 17, which presents a drawing of examples of harmonization of apparent fiber density using a Vanilla ComBAT technique and a Clinical ComBAT technique in accordance with an embodiment of the present disclosure) of the arcuate fasciculus in the left hemisphere, using Vanilla ComBAT and Clinical ComBAT, are presented for the 1110, 1212, 1510 and 1610 datasets (with, respectively, 119 health controls, 73 healthy controls, 11,382 healthy controls and 80 healthy controls). In FIG. 17, the lines indicate the 95th, 75th, 50th, 25th and 5th percentiles. Moreover, the first row of FIGS. 16 and 17 illustrate the pre-harmonized data. These figures underscore the necessity of data harmonization, as the non-harmonized data distributions are misaligned with the target site. While Vanilla ComBAT improves alignment to the target site, it often overestimates the multiplicative bias, artificially concentrating the data towards the mean of the target distribution and reducing the distinction between pathological subjects and healthy controls (e.g., the second and third sites). Conversely, Clinical ComBAT provides a more accurate estimation of the multiplicative bias, resulting in a similar distribution for healthy controls and better delineation of pathological subjects.

[0202] FIG. 18 presents a drawing of an example of harmonization using a Clinical ComBAT technique in accordance with an embodiment of the present disclosure. Notably, FIG. 18 illustrates the effect of the variance a priori of Clinical ComBAT on a limited number of healthy controls. Here, we used data from ten randomly selected healthy control subjects from the 1110 dataset and tested different values of .sub.0 (see Eqn. 37). Note that the parameter .sub.0 of Clinical ComBAT controls the weight of the a priori on the variance estimation of the moving site

[00131] $(d_{M v}^{2}) .$

[0203] In FIG. 18, panel A shows the non-harmonized data. The harmonization results using Vanilla ComBAT are shown in Panel B, using ten randomly selected healthy controls. Panels C, D, E, F show the harmonization results on the same ten participants using increasing .sub.0 values (see Eqn. 37). With .sub.0=0, the standard deviation of the moving site is overestimated and overcorrected. Panels D, E show the standard deviation of the moving site aligned with the standard deviation of the target site (.sub.0=5, .sub.0=10). The last column demonstrates an underestimation of the standard deviation because of an excessively strong a priori from the target site (.sub.0=100).

[0204] In the panel C in FIG. 18, with .sub.0=0,

[00132] $d_{M v}^{2}$

is underestimated, leading to an under correction of the multiplicative bias, as indicated by the larger percentile lines compared to the target site. Panels D and E in FIG. 18 shows

[00133] $d_{M v}^{2}$

aligned with

[00134] $d_{T v}^{2} (v_{0} = 5, v_{0} = 10)$

whereas panel f in FIG. 18 demonstrates an overestimation of

[00135] $d_{M v}^{2}$

due to an excessively strong a priori from the target site (.sub.0=100). A high value of .sub.0 with few moving site subjects effectively fixes the standard deviation of the moving site

[00136] $d_{M v}^{2}$

to that of the target site

[00137] $d_{T v}^{2},$

preventing correction the multiplicative bias of the moving site data.

[0205] FIG. 19 presents a drawing of an example of a goodness of fit (see Table 5) of the harmonization of mean diffusivity of the arcuate fasciculus in the left hemisphere using a Clinical ComBAT technique in accordance with an embodiment of the present disclosure. The plots are shown for various numbers of healthy controls for the 1110, 1212 and 1610 datasets, and different hyper-parameter values and .sub.0. The Clinical ComBAT model is fitted ten times with random healthy control subjects for each number of subjects. The lines show the mean Bhattacharyya distance (the smaller the better), and the shaded areas indicate one standard deviation. At the top of FIG. 19, and with a fix regularization term =10, the solid line (teal, .sub.0=0) show poor performance with low number of subjects. The dashed (green, .sub.0=5) and dotted (orange, .sub.0=10) lines show similar excellent results regardless of the number of subjects. The dot-dashed line (purple, .sub.0=100) show improved results with low number of subjects compared to harmonization with .sub.0=0. However, as the number of subjects increases, the strong a priori leads to an increased distance when more subject are available. The bottom panel shows results for four different values of and the auto-tuning of (curve clinical .sub.2=2) for a fix .sub.0=5. The best results are obtained with the auto-tuning technique which outperforms Vanilla ComBAT. Vanilla ComBAT (the red or orange dashed lines) with hyperparameter auto-tuning with .sub.1=0.5 and .sub.2=2 tends to show weaker harmonization results than Clinical ComBAT, especially for .sub.0=5 or .sub.0=10. Note that the Clinical ComBAT model is fitted ten times for each number of subjects.

[0206] FIG. 20 presents a drawing of an example of a hyperparameter tuning when dealing with a moving site whose healthy control subjects span a limited age range (from age 64 to 80) in accordance with an embodiment of the present disclosure. In FIG. 20, the solid black line represents the second-order model fitted to the target healthy control data (black dots), while the dashed blue line represents the second-order model fitted to the healthy controls of the 1212 dataset (blue dots) using Clinical ComBAT. Panel A shows the model fitted without regularization. Panels B and C show the model fitted with automatic hyperparameter tuning using .sub.1=0.25, .sub.2=4, and .sub.1=1, .sub.2=1, respectively.

[0207] Although panel A in FIG. 20 shows a good model fit (blue dotted line) within the age range of the moving site, the model quickly diverges as the age deviates from the input data. Consequently, the model can be used to harmonized data within this age range, but should be used cautiously for data outside of it. The proposed hyperparameter tuning technique of Clinical ComBAT (Table 5) automatically increases the regularization parameters to limit deviation from the target model when no moving data is available. In panel B of FIG. 20, the technique is applied with .sub.1=0.25, .sub.2=4, yielding an optimal regularization parameter =1,278. In panel C of FIG. 20, with .sub.1=1, .sub.2=1, the technique yields an optimal regularization parameter =1e10. This effectively fixes the shape of the moving site curve to the target site curve, estimating only the intercept. Overall, data models estimated in panels B and C in FIG. 12 are suitable for the harmonization of moving site subjects of all ages.

[0208] FIG. 21 presents a drawing of an example of a fit of mean diffusivity of the arcuate fasciculus in the left hemisphere using a Clinical ComBAT technique in accordance with an embodiment of the present disclosure. Notably, FIG. 21 highlights the effects of using reference site a priori with and without automatic hyperparameter tuning for various values with a fix .sub.0=5 with data model of polynomial order P equal to 1 to 5. Note that the first row shows the fitted model using a fixed =10. The second and third rows show the fitted model using automatic hyperparameter tuning (.sub.1=0.5 and .sub.2=2, and .sub.1=0.9 and .sub.2=1.1, respectively). As can be seen, the use of a polynomial degree of 2 and 3 with an auto-tuning with .sub.1=0.5, .sub.2=2 yields the best results. High polynomial degrees (P equal to 4 and P equal to 5) need auto-tuning to be usable outside the age range of the input data.

[0209] FIG. 22 presents a drawing of an example of outlier rejection in accordance with an embodiment of the present disclosure. Notably, FIG. 22 shows harmonization results obtained with the disclosed outlier rejection technique. On the top left, we see the target distribution in gray level containing only healthy controls and the unharmonized moving distribution in blue (from a site of the 1212 dataset) with Alzheimer's disease subjects as green dots. On the top right is the results of Clinical ComBAT when only the healthy controls of the moving site are considered. On the bottom left, is the harmonization of the moving site when both the healthy controls and the Alzheimer's disease subjects are considered. As a result, the healthy control blue distribution is poorly aligned with the target distribution and the Alzheimer's disease subjects falls inside the normative distribution. On the bottom right is the results of Clinical ComBAT harmonization when the outlier rejection technique is used to remove outliers (here Alzheimer's disease subjects). As can be seen, the result is close to that obtained with only healthy control with Alzheimer's disease subjects being above the normative distribution.

[0210] Table 8 summarizes the 11 features in Clinical ComBAT.

TABLE-US-00008 TABLE 8 Feature Clinical ComBAT 1. Pairwise harmonization between a moving site and a well populated target site 2. New data formation model, Eqn. 19 with one additive bias and one multiplicative bias 3. The model parameter vector {right arrow over ()}.sub.iv is site specific 4. Use of a non-linear model in Eqns. 20 and 21 5. [00138] $The variance of the target site d_{Tv}^{2} is computed with a maximum likelihood$ (Eqn. 24) 6. The model parameter vector of the moving site {right arrow over ()}.sub.Mv is computed with a maximum a posteriori formulation with the data of the moving site and a prior depending on the parameter vector of the target site {right arrow over ()}.sub.Tv. (Eqn. 28) 7. [00139] $The variance of the moving site d_{Mv}^{2} is computed with a maximum a$ posteriori formulation involving a Gamma prior distribution based at least [00140] $in part on the variance of the target site d_{Tv}^{2} (Eqn . 32)$ 8. Use of a quality control module that measures the goodness of fit between the target site and the harmonized moving site (Table 5) 9. Use of an outlier rejection module to remove outliers from the moving site 10. Some hyperparameters (like {right arrow over ()} and v.sub.0) are automatically estimated following an auto-tuning technique (Table 6) 11. When the amount of data in the moving site is too low or that the data is poorly distributed across its co-variables, one can use Community- ComBAT to compensate for it (Table 7)

[0211] We now describe embodiments of a computer, which may perform at least some of the operations in the Clinical ComBAT techniques. FIG. 23 presents a block diagram illustrating an example of a computer 2300, e.g., in a computer system (such as computer system 200 in FIG. 2), in accordance with some embodiments. For example, computer 2300 may include: one of computers 210. This computer may include processing subsystem 2310, memory subsystem 2312, and networking subsystem 2314. Processing subsystem 2310 includes one or more devices configured to perform computational operations. For example, processing subsystem 2310 can include one or more microprocessors, ASICs, microcontrollers, programmable-logic devices, GPUs and/or one or more DSPs. Note that a given component in processing subsystem 2310 are sometimes referred to as a computation device.

[0212] Memory subsystem 2312 includes one or more devices for storing data and/or instructions for processing subsystem 2310 and networking subsystem 2314. For example, memory subsystem 2312 can include dynamic random access memory (DRAM), static random access memory (SRAM), and/or other types of memory. In some embodiments, instructions for processing subsystem 2310 in memory subsystem 2312 include: program instructions or sets of instructions (such as program instructions 2322 or operating system 2324), which may be executed by processing subsystem 2310. Note that the one or more computer programs or program instructions may constitute a computer-program mechanism. Moreover, instructions in the various program instructions in memory subsystem 2312 may be implemented in: a high-level procedural language, an object-oriented programming language, and/or in an assembly or machine language. Furthermore, the programming language may be compiled or interpreted, e.g., configurable or configured (which may be used interchangeably in this discussion), to be executed by processing subsystem 2310.

[0213] In addition, memory subsystem 2312 can include mechanisms for controlling access to the memory. In some embodiments, memory subsystem 2312 includes a memory hierarchy that comprises one or more caches coupled to a memory in computer 2300. In some of these embodiments, one or more of the caches is located in processing subsystem 2310.

[0214] In some embodiments, memory subsystem 2312 is coupled to one or more high-capacity mass-storage devices (not shown). For example, memory subsystem 2312 can be coupled to a magnetic or optical drive, a solid-state drive, or another type of mass-storage device. In these embodiments, memory subsystem 2312 can be used by computer 2300 as fast-access storage for often-used data, while the mass-storage device is used to store less frequently used data.

[0215] Networking subsystem 2314 includes one or more devices configured to couple to and communicate on a wired and/or wireless network (i.e., to perform network operations), including: control logic 2316, an interface circuit 2318 and one or more antennas 2320 (or antenna elements). (While FIG. 23 includes one or more antennas 2320, in some embodiments computer 2300 includes one or more nodes, such as antenna nodes 2308, e.g., a metal pad or a connector, which can be coupled to the one or more antennas 2320, or nodes 2306, which can be coupled to a wired or optical connection or link. Thus, computer 2300 may or may not include the one or more antennas 2320. Note that the one or more nodes 2306 and/or antenna nodes 2308 may constitute input(s) to and/or output(s) from computer 2300.) For example, networking subsystem 2314 can include a Bluetooth networking system, a cellular networking system (e.g., a 3G/4G/5G network such as UMTS, LTE, etc.), a universal serial bus (USB) networking system, a networking system based on the standards described in IEEE 802.11 (e.g., a Wi-Fi networking system), an Ethernet networking system, and/or another networking system.

[0216] Networking subsystem 2314 includes processors, controllers, radios/antennas, sockets/plugs, and/or other devices used for coupling to, communicating on, and handling data and events for each supported networking system. Note that mechanisms used for coupling to, communicating on, and handling data and events on the network for each network system are sometimes collectively referred to as a network interface for the network system. Moreover, in some embodiments a network or a connection between the electronic devices does not yet exist. Therefore, computer 2300 may use the mechanisms in networking subsystem 2314 for performing simple wireless communication between electronic devices, e.g., transmitting advertising or beacon frames and/or scanning for advertising frames transmitted by other electronic devices.

[0217] Within computer 2300, processing subsystem 2310, memory subsystem 2312, and networking subsystem 2314 are coupled together using bus 2328. Bus 2328 may include an electrical, optical, and/or electro-optical connection that the subsystems can use to communicate commands and data among one another. Although only one bus 2328 is shown for clarity, different embodiments can include a different number or configuration of electrical, optical, and/or electro-optical connections among the subsystems.

[0218] In some embodiments, computer 2300 includes a display subsystem 2326 for displaying information on a display, which may include a display driver and the display, such as a liquid-crystal display, a multi-touch touchscreen, etc. Moreover, computer 2300 may include a user-interface subsystem 2330, such as: a mouse, a keyboard, a trackpad, a stylus, a voice-recognition interface, and/or another human-machine interface.

[0219] Computer 2300 can be (or can be included in) any electronic device with at least one network interface. For example, computer 2300 can be (or can be included in): a desktop computer, a laptop computer, a subnotebook/netbook, a server, a supercomputer, a tablet computer, a smartphone, a cellular telephone, a consumer-electronic device, a portable computing device, communication equipment, and/or another electronic device.

[0220] Although specific components are used to describe computer 2300, in alternative embodiments, different components and/or subsystems may be present in computer 2300. For example, computer 2300 may include one or more additional processing subsystems, memory subsystems, networking subsystems, and/or display subsystems. Additionally, one or more of the subsystems may not be present in computer 2300. Moreover, in some embodiments, computer 2300 may include one or more additional subsystems that are not shown in FIG. 23. Also, although separate subsystems are shown in FIG. 23, in some embodiments some or all of a given subsystem or component can be integrated into one or more of the other subsystems or component(s) in computer 2300. For example, in some embodiments program instructions 2322 are included in operating system 2324 and/or control logic 2316 is included in interface circuit 2318.

[0221] Moreover, the circuits and components in computer 2300 may be implemented using any combination of analog and/or digital circuitry, including: bipolar, PMOS and/or NMOS gates or transistors. Furthermore, signals in these embodiments may include digital signals that have approximately discrete values and/or analog signals that have continuous values. Additionally, components and circuits may be single-ended or differential, and power supplies may be unipolar or bipolar.

[0222] An integrated circuit may implement some or all of the functionality of networking subsystem 2314 and/or computer 2300. The integrated circuit may include hardware and/or software mechanisms that are used for transmitting signals from computer 2300 and receiving signals at computer 2300 from other electronic devices. Aside from the mechanisms herein described, radios are generally known in the art and hence are not described in detail. In general, networking subsystem 2314 and/or the integrated circuit may include one or more radios.

[0223] In some embodiments, an output of a process for designing the integrated circuit, or a portion of the integrated circuit, which includes one or more of the circuits described herein may be a computer-readable medium such as, for example, a magnetic tape or an optical or magnetic disk or solid state disk. The computer-readable medium may be encoded with data structures or other information describing circuitry that may be physically instantiated as the integrated circuit or the portion of the integrated circuit. Although various formats may be used for such encoding, these data structures are commonly written in: Caltech Intermediate Format (CIF), Calma GDS II Stream Format (GDSII), Electronic Design Interchange Format (EDIF), OpenAccess (OA), or Open Artwork System Interchange Standard (OASIS). Those of skill in the art of integrated circuit design can develop such data structures from schematics of the type detailed above and the corresponding descriptions and encode the data structures on the computer-readable medium. Those of skill in the art of integrated circuit fabrication can use such encoded data to fabricate integrated circuits that include one or more of the circuits described herein.

[0224] While some of the operations in the preceding embodiments were implemented in hardware or software, in general the operations in the preceding embodiments can be implemented in a wide variety of configurations and architectures. Therefore, some or all of the operations in the preceding embodiments may be performed in hardware, in software or both. For example, at least some of the operations in the Clinical ComBAT techniques may be implemented using program instructions 2322, operating system 2324 (such as a driver for interface circuit 2318) or in firmware in interface circuit 2318. Thus, the Clinical ComBAT techniques may be implemented at runtime of program instructions 2322. Alternatively or additionally, at least some of the operations in the Clinical ComBAT techniques may be implemented in a physical layer, such as hardware in interface circuit 2318.

[0225] In the preceding description, we refer to some embodiments. Note that some embodiments describes a subset of all of the possible embodiments, but does not always specify the same subset of embodiments. Moreover, note that the numerical values provided are intended as illustrations of the Clinical ComBAT techniques. In other embodiments, the numerical values can be modified or changed.

[0226] The foregoing description is intended to enable any person skilled in the art to make and use the disclosure, and is provided in the context of a particular application and its requirements. Moreover, the foregoing descriptions of embodiments of the present disclosure have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the present disclosure to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Additionally, the discussion of the preceding embodiments is not intended to limit the present disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

HARMONIZATION OF DIFFUSION MAGNETIC RESONANCE IMAGING

Inventors

Cpc classification

Classification Explorer

G06T2210/41

PHYSICS

Classification Explorer

G06T2211/424

PHYSICS

Classification Explorer

G06T12/30

PHYSICS

Classification Explorer

G06T7/0012

PHYSICS

Classification Explorer

G06T12/20

PHYSICS

Classification Explorer

G06T2207/10092

PHYSICS

Classification Explorer

G16H20/10

PHYSICS

Classification Explorer

G06T2207/30168

PHYSICS

Classification Explorer

G06T12/10

PHYSICS

Classification Explorer

G06T2207/30016

PHYSICS

Classification Explorer

G06T7/174

PHYSICS

International classification

Classification Explorer

G06T7/00

PHYSICS

Classification Explorer

G06T11/00

PHYSICS

Classification Explorer

G06T7/174

PHYSICS

Classification Explorer

G16H20/10

PHYSICS

Abstract

Claims

Description