Method and Device for Determining a Measure of Causal Influence Between Components of Complex Systems
20220058211 · 2022-02-24
Inventors
Cpc classification
G06F18/2135
PHYSICS
A61B5/055
HUMAN NECESSITIES
G06F17/18
PHYSICS
G06F18/213
PHYSICS
A61B5/318
HUMAN NECESSITIES
International classification
G06F16/28
PHYSICS
Abstract
Disclosed is a computer-implemented method for determining a measure of the causal interdependence of a source dataset S and a target dataset Y in the simultaneous presence of a non-empty confounding dataset B. The method includes a dimensional modification step to reduce the complexity of the data and an augmentation step to add information to the dimensionally modified data without a significant increase in size or complexity of the data. The augmented dimensionally modified data is used to calculate a measure of the causality or relatedness between the source dataset S and the target dataset Y. The method enables linear or nonlinear causality analysis of high-dimension multivariate time-series or features datasets, such as time-series datasets generated by functional MRI full-brain scans.
Claims
1. A computer-implemented method for determining a measure of the causal interdependence of a source dataset S and a target dataset Y in the simultaneous presence of a non-empty confounding dataset B, the confounding dataset B not being the source dataset S and not being the target dataset Y, the method comprising the steps of: (a) calculating at least one of the following dimensionally modified representations (aa) and (ab): (aa) a dimensionally modified representation n.sub.i of one of the following three datasets (1), (2), and (3): (1) the confounding dataset B, (2) the target data set Y, and (3) the union of the confounding dataset B and the target dataset Y, whereby the representation n.sub.i does not include information about the source dataset S, and (ab) a dimensionally modified representation n.sub.c of at least one of the following datasets (1), (2), (3), and (4): (1) the source dataset S, (2) the union of the source dataset S and the confounding dataset B, (3) the union of the source dataset S and the target dataset Y, and (4) the union of the source dataset S, the confounding dataset B, and the target dataset Y; (b) constructing at least one of the following augmented dimensionally modified representations m.sub.i or m.sub.c by: (ba) constructing an augmented dimensionally modified representation m.sub.i that does not contain information about the source dataset S by augmenting a dimensionally modified representation n.sub.i with information about at least one of the following datasets (1), (2), and (3): (1) the target dataset Y, (2) the confounding dataset B, and (3) the union of the target dataset Y and the confounding dataset B; and/or (bb) constructing an augmented dimensionally modified representation m.sub.c that contains information about the source dataset S by: (bba) augmenting the dimensionally modified representation n.sub.c with information about at least one of the following datasets (1), (2), (3), and (4): (1) the source dataset S, (2) the 
confounding dataset B, (3) the target dataset Y, and (4) any union of at least two of the datasets S, B, and Y; or (bbb) augmenting the dimensionally modified representation n.sub.i or the augmented dimensionally modified representation m.sub.i with information about the source dataset S; and (c) calculating a measure of causal interdependence of the source dataset S and the target dataset Y using at least one of the augmented dimensionally modified representations m.sub.i or m.sub.c, whereby information about the confounding data set B is used: (1) directly, or (2) indirectly by having the information about the confounding data set B having been included in the representations n.sub.i, n.sub.c, m.sub.i, or m.sub.c, which are used for calculating a measure of causal interdependence of the source dataset S and the target dataset Y.
2. The method of claim 1, wherein step (c) comprises: calculating estimation errors that compare estimated and actual values of the target dataset Y, or calculating information-theoretic quantities that characterize information exchange between the source dataset S and the target dataset Y, such as transfer entropy, cross-entropy, corr-entropy, mutual information, or conditional mutual information.
3. The method of claim 1, wherein the calculation of a dimensionally modified representation n.sub.i is based on performing the calculation of a dimensionally modified representation n.sub.c in such a way that information about the source dataset S is eliminated.
4. The method of claim 1 wherein the calculation of the dimensionally modified representations n.sub.i or n.sub.c is performed in such a way that information about the target dataset Y or the confounding dataset B, or both, is eliminated from n.sub.i or n.sub.c.
5. The method of claim 1 wherein step (c) comprises: calculating quantities ỹ that provide information about the target dataset Y without using information about the source dataset S by basing the calculations of ỹ: (i) on the dimensionally modified representation n.sub.i, or (ii) on the augmented dimensionally modified representation m.sub.i; and calculating quantities ŷ that provide information about the target dataset Y using information about the source dataset S by basing the calculations of ŷ: (iii) on the dimensionally modified representation n.sub.c, or (iv) on the augmented dimensionally modified representation m.sub.c; and the calculation of a measure of causal interdependence of the source dataset S and the target dataset Y is based on both quantities ỹ and ŷ.
6. The method of claim 5 wherein quantities ỹ and ŷ are estimated values of the target dataset Y, whereby the calculation of a measure of causal interdependence of the source dataset S and the target dataset Y is based on the calculation of quantities that compare estimation errors of the estimated values ỹ and the estimated values ŷ in estimating the target dataset Y.
7. The method of claim 5 wherein the quantities ỹ and ŷ provide statistical information about the target dataset Y that is used to calculate information-theoretic measures that characterize information exchange between the source dataset S and the target dataset Y, such as transfer entropy, cross-entropy, corr-entropy, mutual information, or conditional mutual information.
8. The method of claim 5, wherein calculating a measure of causal interdependence between the source dataset S and the target dataset Y comprises: calculating Granger causality, such as by calculating Granger causality indices or f-statistic.
9. The method of claim 1 wherein: a complete dataset N.sub.c is defined as: the union of the source dataset S and the confounding dataset B, or the union of the source dataset S and the target dataset Y, or the union of the source dataset S, the confounding dataset B, and the target dataset Y; and step (a) comprises the steps of: removing the source dataset S from the complete dataset N.sub.c to form the incomplete dataset N.sub.i; and calculating a dimensionally modified representation of N.sub.i to form n.sub.i.
10. The method of claim 9 comprising the steps of: iterating over a non-empty and at least two-element subset P of the complete dataset N.sub.c, and for each iteration: assigning one or more different non-empty subsets of P as the source dataset S; removing that source dataset S from the complete dataset N.sub.c to form an incomplete dataset N.sub.i that includes the confounding dataset B; and calculating the measure of causal interdependence of the source dataset S and a target dataset Y by performing steps (a)-(c), wherein step (a) comprises calculating the dimensionally modified representation n.sub.i from the incomplete dataset N.sub.i.
11. The method of claim 9 comprising the steps of: iterating over a non-empty and at least two-element subset Q of the complete dataset N.sub.c, and for each iteration: assigning one or more different non-empty subsets of Q as the target dataset Y; and calculating a measure of causal interdependence of the source dataset S and the target dataset Y by performing steps (a)-(c).
12. The method of claim 9 wherein one dimensionally modified representation n.sub.c of the complete dataset N.sub.c is calculated and used for more than one iteration by, in each iteration: eliminating information about the source dataset S from n.sub.c; and/or eliminating information about the target dataset Y from n.sub.c.
13. (canceled)
14. The method of claim 1 wherein estimates of any non-empty subset or any union of any of the following data sets are calculated: target Y, source S, confounding data set B, dimensionally modified representations n.sub.i, n.sub.c, m.sub.i, or m.sub.c; and such calculations are performed using (i) linear methods, including, affine, autoregressive, such as MVAR, or any other linear regression methods; or (ii) non-linear methods, including, non-linear regression methods, feed-forward multi-layer perceptrons, e.g. trained by error-back-propagation, radial basis functions (RBF) neural networks or generalized radial basis functions (GRBF) neural networks, support vector regression, fuzzy logic, random forests, decision trees, recurrent neural networks, including, but not limited to Echo State Networks or LSTMs; or (iii) any type of local models, including, local linear models or local average models; (iv) any type of ordinary or partial differential equation solvers; or (v) any combination of different modeling approaches, including, combining approaches previously recited herein, committee machines, boosting, bootstrap aggregating, mixture-of-experts, and/or ensemble learning or ensemble methods.
15-18. (canceled)
19. The method of claim 1 wherein the calculation of a dimensionally modified representation of the source dataset S, the target dataset Y, or the confounding data set B includes a non-linear transformation, such as induced by a non-linear kernel derived from a clustering method; and such a calculated dimensionally modified representation serves as input to linear modeling methods, including, affine or MVAR modeling.
20. The method of claim 1 wherein step (c) comprises: calculating a measure of causal influence of a source dataset S on a target dataset Y; or, calculating a measure of causal influence of a target data set Y on a source dataset S.
21. The method of claim 1 wherein the source dataset S, the target dataset Y, and the confounding dataset B are each time-series datasets.
22. The method of claim 1 wherein the source dataset S, the target dataset Y, and the confounding dataset B are each features datasets.
23. The method of claim 1 wherein the source dataset S, the target dataset Y, the confounding dataset B, or any union of at least two of these datasets, defines a dataset X∈ℝ.sup.N×T, wherein the dataset X is an underdetermined data set with N>T.
24.-25. (canceled)
26. Computer code stored in a tangible medium and operable by a computer processor for computer implementation of the method as recited in claim 1.
27. A device comprising: one or more processors; and one or more memories storing code executable by the processors to perform: (a) reading a source dataset S, a target dataset Y, and a non-empty dataset B, the dataset B not being the source dataset S and not being the target dataset Y; (b) calculating at least one of the following dimensionally modified representations (aa) and (ab): (aa) a dimensionally modified representation n.sub.i of one of the following three datasets (1), (2), and (3): (1) a confounding dataset B, (2) a target data set Y, and (3) a union of the confounding dataset B and the target dataset Y, whereby the representation n.sub.i does not include information about the source dataset S, and (ab) a dimensionally modified representation n.sub.c of at least one of the following datasets (1), (2), (3), and (4): (1) the source dataset S, (2) the union of the source dataset S and the confounding dataset B, (3) the union of the source dataset S and the target dataset Y, and (4) the union of the source dataset S, the confounding dataset B, and the target dataset Y; (c) constructing at least one of the following augmented dimensionally modified representations m.sub.i or m.sub.c by: (ba) constructing an augmented dimensionally modified representation m.sub.i that does not contain information about the source dataset S by augmenting a dimensionally modified representation n.sub.i with information about at least one of the following datasets (1), (2), and (3): (1) the target dataset Y, (2) the confounding dataset B, and (3) the union of the target dataset Y and the confounding dataset B; and/or (bb) constructing an augmented dimensionally modified representation m.sub.c that contains information about the source dataset S by: (bba) augmenting the dimensionally modified representation n.sub.c with information about at least one of the following datasets (1), (2), (3), and (4): (1) the source dataset S, (2) the confounding dataset B, (3) the target dataset Y, and 
(4) any union of at least two of the datasets S, B, and Y; or (bbb) augmenting the dimensionally modified representation n.sub.i or the augmented dimensionally modified representation m.sub.i with information about the source dataset S; and (d) calculating a measure of causal interdependence of the source dataset S and the target dataset Y using at least one of the augmented dimensionally modified representations m.sub.i or m.sub.c, whereby information about the confounding data set B is used: (1) directly, or (2) indirectly by having the information about the confounding data set B having been included in the representations n.sub.i, n.sub.c, m.sub.i, or m.sub.c, which are used for calculating a measure of causal interdependence of the source dataset S and the target dataset Y.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0083] A multivariate time-series dataset defines a complete system N.sub.c∈ℝ.sup.N×T of N time-series, each having T simultaneously measured values. The complete system N.sub.c includes as a subset one or more time-series defining a conditional source time-series or source S. It is desired to evaluate the causal influence of each source S on a dataset 14 defining a target time-series or target Y having values simultaneously measured with those of the complete series N.sub.c. A source S may include more than one row of the N rows of N.sub.c. The target Y may or may not be an element or subset of the complete system N.sub.c.
[0085] In the illustrated application of the lsXGC method the source time-series S is the time-series x.sub.3 and the target time-series Y is an element of the complete system N.sub.c, namely the time-series x.sub.N-2. It is desired to evaluate the causal influence F.sub.S.fwdarw.Y of the source time-series x.sub.3 on the target time-series x.sub.N-2. The other time-series of the complete system N.sub.c are confounding time-series that do not belong to the source S and do not belong to the target Y but may influence or may be influenced by all or some of the other time-series, including the target Y.
[0086] In the first lsXGC step 16, generate a dimension-reduced representation n.sub.i∈ℝ.sup.D×T of an incomplete system N.sub.i and, optionally, generate a dimension-reduced representation n.sub.c∈ℝ.sup.E×T of the complete system N.sub.c, wherein D, E<N. The incomplete system N.sub.i is the system created from the complete system N.sub.c by removing the source S from the complete system N.sub.c, that is, N.sub.i=(N.sub.c\S)∈ℝ.sup.F×T wherein F<N. As explained in more detail below, the dimension-reduced incomplete system representation n.sub.i can be generated either by direct dimensionality reduction of the incomplete system N.sub.i or by dimensionality reduction of the complete system N.sub.c and additional, such as subsequent, elimination of information about the source time-series S.
[0087] Dimensionality reduction of the incomplete system N.sub.i or of the complete system N.sub.c can be performed using linear dimensionality reduction techniques, such as Principal Component Analysis (PCA), or non-linear techniques. Principal Component Analysis is a linear technique that projects the original dataset onto a lower-dimensional subspace spanned by its leading principal components. The first principal component explains the greatest amount of the variance in the data, the second principal component explains the second greatest amount of variance, and so on. A plot of explained variance versus the number of principal components can be used in Principal Component Analysis to determine a reasonable number of principal components and hence the dimensionality of the reduced-dimension incomplete representation n.sub.i or the optional reduced-dimension complete representation n.sub.c.
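As a minimal sketch of the PCA-based reduction described above, the following Python code (using NumPy) reduces an N×T system to D×T and reports the explained variance per component. The function name pca_reduce, the 95% cumulative-variance stopping rule, and the synthetic data are assumptions made for illustration only; the description itself leaves the choice of D to inspection of the explained-variance plot.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.95):
    """Reduce an N x T system (rows = time-series) to D x T by PCA.

    var_threshold is an illustrative stopping rule, not taken from the text.
    """
    Xc = X - X.mean(axis=1, keepdims=True)        # center each time-series (row)
    # Economy SVD: columns of U are principal directions in the N-dim component space
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # variance fraction per component, descending
    cumulative = np.cumsum(explained)
    D = min(int(np.searchsorted(cumulative, var_threshold)) + 1, len(s))
    W = U[:, :D].T                                # D x N coefficient (projection) matrix
    return W @ Xc, W, explained

# Synthetic example: 20 observed series driven by 3 latent series plus small noise
rng = np.random.default_rng(0)
latent = rng.standard_normal((3, 200))
mixing = rng.standard_normal((20, 3))
N_c = mixing @ latent + 0.01 * rng.standard_normal((20, 200))
n_c, W, explained = pca_reduce(N_c)
```

Plotting the cumulative sum of explained against the component index gives the explained-variance curve mentioned above.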
[0088] For non-linear dimensionality reduction methods, one can utilize my previous work on information visualization by non-linear dimension reduction disclosed in my U.S. Pat. No. 7,567,880 as well as a wide scope of non-linear algorithms available from the literature, see e.g. [3]. These include machine learning techniques for dimensionality reduction, such as autoencoders and their variants (e.g. variational autoencoders), clustering methods (e.g. k-means, fuzzy c-means, self-organizing maps (SOM)), divergence-based methods, such as stochastic neighbor embedding (SNE), independent component analysis (ICA), exploratory observation machines (XOM), as well as variants or combinations thereof.
[0089] The dimension-reduction algorithm used does not have to be invertible. The original target time-series Y can be estimated directly from the dimension-reduced system and does not have to be estimated indirectly from back-projecting an estimate of the dimension-reduced time-series into the original space as is performed using the large scale Granger Causality (lsGC) method, although such back-projection of estimates calculated in the dimension-reduced space can be used with lsXGC as well.
[0090] Generating the dimension-reduced incomplete system n.sub.i is described next.
[0092] There are two alternative methods for generating the dimension-reduced representation n.sub.i of the incomplete system N.sub.i. These alternative methods are illustrated in
[0093] Referring to
[0094] There are various ways to eliminate information of the source S from n.sub.c to generate n.sub.i, depending on the specific dimension reduction algorithm used.
[0095] For example, if Principal Components Analysis (PCA) is used to generate the dimension-reduced complete representation n.sub.c, all entries in the coefficient transformation matrix that correspond to the input dimension(s) of the conditional source time-series S can be set to zero. If kernel transformations are used for dimension reduction, such as based on clustering methods, all contributions to calculating the kernel-transformed time-series originating from the input dimensions of the conditional source time-series S can be set to zero. If trained supervised neural networks are used for dimension reduction, such as auto-encoders, all network inputs related to the input dimensions of the conditional source time-series S can be set to zero. Various approaches to eliminating information about the conditional source time-series S (or target time-series Y, discussed later below) after performing dimension reduction have been described in the literature discussing large-scale Granger causality (lsGC), such as [1], [2], and the literature cited therein.
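The PCA case described above can be sketched as follows: the complete system is reduced once, and information about a source is then eliminated by zeroing the columns of the coefficient matrix that correspond to the source rows. This is an illustrative sketch under stated assumptions, not a definitive implementation; the function names reduce_complete and incomplete_from_complete and the fixed choice D=3 are hypothetical.

```python
import numpy as np

def reduce_complete(N_c, D):
    """PCA of the complete system: returns the D x N coefficient matrix and row means."""
    mean = N_c.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(N_c - mean, full_matrices=False)
    return U[:, :D].T, mean

def incomplete_from_complete(N_c, W, mean, source_rows):
    """Form n_i from the complete-system PCA by zeroing the source input dimensions."""
    W_i = W.copy()
    W_i[:, source_rows] = 0.0          # drop every contribution of the source rows
    return W_i @ (N_c - mean)

rng = np.random.default_rng(1)
N_c = rng.standard_normal((8, 50))
W, mean = reduce_complete(N_c, D=3)    # performed once for all candidate sources
n_i = incomplete_from_complete(N_c, W, mean, source_rows=[2])

# Sanity check: replacing the source row leaves n_i unchanged
N_alt = N_c.copy()
N_alt[2] = rng.standard_normal(50)
n_i_alt = incomplete_from_complete(N_alt, W, mean, source_rows=[2])
```

Because W is computed once, a different source only requires a different source_rows argument, which is the computational advantage attributed to Method A.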
[0096] Note that if Method A is used, it is not necessary to generate the incomplete system N.sub.i. Also note that if Method A is used, different dimension-reduced incomplete representations n.sub.i for different source time-series S of the complete system N.sub.c can be generated by performing step 24 of
[0097] So, for example, if an analyst wishes to analyze the causal influence of different conditional sources S on a given target Y of a complete system N.sub.c, Method A permits the dimensionality reduction of the complete system N.sub.c to be performed only once for all the possible conditional sources S of the given target Y in the system. Besides the potentially desirable stability of the dimensionality reduction result, this provides significant computational advantages, specifically when large, i.e. high-dimensional, systems (such as those from full-brain fMRI) need to be analyzed with regard to causal influences among their components.
[0098] Referring again to
[0099] As a specific example,
[0100] In a second step 34 of the lsXGC method (see
[0101] The at least one augmenting time-series x.sub.a can be an element of the source S (x.sub.a∈S) and/or the target Y (x.sub.a∈Y). Alternatively, the at least one augmenting time-series x.sub.a is a modified element of the source S and/or the target Y. For example, the at least one augmenting time-series x.sub.a can be generated by replacing the original values of the source S with the logarithms of the original values. Arbitrary other modifications of the source S and/or the target Y are possible, such as for more efficient generation of causality indices in subsequent steps.
[0102] As shown in
[0103] As similarly shown in
[0104] At least one augmented representation of the dimension-reduced incomplete representation n.sub.i or of the dimension-reduced complete representation n.sub.c will be used in the next step of the lsXGC method to estimate the target Y for causal analysis. In this embodiment of the lsXGC method a source-augmented dimension-reduced incomplete representation n.sub.iaS will be used for estimating the target Y using information about the source S. The incomplete representation n.sub.i will be used for predicting the target Y without using information about the source S.
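A minimal sketch of the augmentation step, assuming the augmenting time-series are simply appended as extra rows to the dimension-reduced representation. The helper name augment and the synthetic data are hypothetical; the logarithm variant mirrors the modified-source example given above and requires positive source values.

```python
import numpy as np

def augment(n_reduced, *extra_series):
    """Append augmenting time-series (each of length T) as extra rows."""
    rows = [np.atleast_2d(x) for x in extra_series]
    return np.vstack([n_reduced, *rows])

rng = np.random.default_rng(2)
n_i = rng.standard_normal((3, 40))                 # dimension-reduced incomplete system
S = np.abs(rng.standard_normal((1, 40))) + 1.0     # positive source so the log is defined

n_iaS = augment(n_i, S)                # source-augmented representation
n_iaS_log = augment(n_i, np.log(S))    # augmentation with a modified source
```

The augmented representation has only one more row than n_i per augmenting series, so the increase in size stays small.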
[0106] In a third step 40 of the lsXGC method (see
[0107] (a) a first set of predictions ỹ of future values of the target Y using the incomplete system n.sub.i having no information about the conditional source S, and
[0108] (b) a second set of predictions ŷ of future values of the target Y using the source-augmented representation n.sub.iaS having information about the conditional source S.
[0110] In the embodiment of
[0111] Nonlinear estimation models, such as for time-series prediction, may be used as well. Variants of lsXGC, in which nonlinear estimation models are used, will be called large-scale Nonlinear Granger Causality (lsNGC). A typical embodiment would be to use Radial Basis Functions (RBF) or Generalized Radial Basis Functions (GRBF) neural networks. Here, a typical embodiment would include dimensionality reduction of the complete or incomplete original time-series system by clustering of extracted state-space vectors using a sliding window approach. The extracted windows represent state-space vectors that can be processed using clustering techniques, such as k-means, fuzzy c-means, self-organizing maps, agglomerative clustering or any clustering method known from prior art. As an example for state space clustering, let Q be a q-dimensional time-series of length T, such as the complete or the incomplete system, or the source or the target time-series, or any other one- or multidimensional time-series considered in this description, such as dimension-reduced and/or augmented complete or incomplete systems. Then sliding windows U(t) of size q×e can be extracted from time-series Q for each time t with t=e, . . . , T, with e the temporal width of the sliding window. The q×e-dimensional vectors U(t) can be clustered yielding K cluster prototype vectors, e.g. cluster centers, U.sub.k of dimension q×e with k=1, . . . , K. The computed cluster prototype vectors U.sub.k can then be used to obtain processed time-series representations, which contain information about the original time-series Q. A simple way to generate such processed time-series representations is to perform nonlinear kernel transformations of the original time-series, which exploit a distance measure between the values of a given time-series and each cluster center. 
In the above state-space clustering example for the originally q-dimensional time-series Q, one may compute a K-dimensional processed time-series Q′(t)=(Q.sub.1.sup.′(t), . . . , Q.sub.k.sup.′(t), . . . , Q.sub.K.sup.′(t)).sup.T by using a Gaussian (Radial Basis Functions, RBF) kernel, i.e., Q.sub.k.sup.′(t)=exp(−(U(t)−U.sub.k).sup.2/(2σ.sup.2)) with kernel width σ, or using a normalized Gaussian (Generalized Radial Basis Functions, GRBF) kernel, i.e., Q.sub.k.sup.″(t)=(1/Σ)Q.sub.k.sup.′(t) with Σ=Σ.sub.k=1.sup.K Q.sub.k.sup.′(t).
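The state-space clustering and kernel transformation described above can be sketched as follows. This is a hedged illustration, not the applicant's implementation: plain k-means (Lloyd's algorithm) stands in for whichever clustering method is chosen, and the function names, the random initialization, and the values of e, K, and σ are assumptions.

```python
import numpy as np

def sliding_windows(Q, e):
    """Stack the windows U(t), t = e..T, of a q x T series as rows of length q*e."""
    q, T = Q.shape
    return np.stack([Q[:, t - e:t].ravel() for t in range(e, T + 1)])

def kmeans(U, K, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns K cluster prototype vectors U_k."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=K, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):               # keep old center if a cluster empties
                centers[k] = U[labels == k].mean(axis=0)
    return centers

def rbf_features(U, centers, sigma):
    """Return the RBF series Q' and the normalized GRBF series Q'', each K x (T-e+1)."""
    d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Q_rbf = np.exp(-d2 / (2.0 * sigma**2))
    Q_grbf = Q_rbf / Q_rbf.sum(axis=1, keepdims=True)   # divide by the sum over k
    return Q_rbf.T, Q_grbf.T

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 100))          # q = 2, T = 100
U = sliding_windows(Q, e=3)                # 98 windows of dimension q*e = 6
centers = kmeans(U, K=4)
Q_rbf, Q_grbf = rbf_features(U, centers, sigma=2.0)
```

Note that the normalization sum is computed separately for each time t, so each column of Q_grbf sums to one.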
[0112] For example, a dimension-reduced representation n.sub.i=Q′ of the incomplete system shown in
ŷ=a.sub.0+a.sub.1·Q′+a.sub.2·S′, (1)
where model coefficients a.sub.0, a.sub.1, and a.sub.2 can be estimated by least-squares optimization, and “·” represents the scalar product based on element-wise multiplication. Here, as well, the equivalent sliding window technique and affine modeling can also be used to generate the set of predictions ỹ using the dimension-reduced representation n.sub.i=Q′ of the incomplete system only, i.e. without information about the source time-series S, such as by computing
ỹ=b.sub.0+b.sub.1·Q′, (2)
where model coefficients b.sub.0 and b.sub.1 can be estimated using least-squares optimization.
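As a minimal sketch, the two affine models of equations (1) and (2) can be fitted by ordinary least squares as shown below. The function name fit_affine and the synthetic data are assumptions for illustration. For brevity, the features and the target are taken at the same sample index; in a prediction setting the target values would be shifted to the time step following each window.

```python
import numpy as np

def fit_affine(features, y):
    """Least-squares fit of y ~ c0 + c . features; features is P x n, y has length n."""
    A = np.column_stack([np.ones(features.shape[1]), features.T])  # n x (P+1) design matrix
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_est = A @ coef
    return y_est, float(np.sum((y - y_est) ** 2))   # estimates and residual sum of squares

rng = np.random.default_rng(3)
n = 200
Q_prime = rng.standard_normal((4, n))    # stands in for n_i = Q'
S_prime = rng.standard_normal((2, n))    # stands in for the processed source S'
y = 0.5 + Q_prime[0] - 2.0 * S_prime[1] + 0.1 * rng.standard_normal(n)

y_tilde, rss_r = fit_affine(Q_prime, y)                       # equation (2), without source
y_hat, rss_u = fit_affine(np.vstack([Q_prime, S_prime]), y)   # equation (1), with source
```

Because the model of equation (2) is nested in that of equation (1), rss_u can never exceed rss_r on the fitted data; a large gap indicates that the source carries information about the target.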
[0113] Note that instead of RBF or GRBF models, it is straightforward to use other non-linear modeling approaches, such as local models, e.g. [5] or [12], or multivariate non-linear regression techniques.
[0114] In a fourth step 42 of the lsXGC method (see
[0115] In the above example, the f-statistic can be obtained by recognizing that the two models, equations (1) and (2) above, can be characterized as an unrestricted model and a restricted model, respectively. Residual sum of squares (RSS) of the restricted model, RSS.sub.R, and residual sum of squares of the unrestricted model, RSS.sub.U, are obtained. Let n be the number of vectors, and p.sub.U and p.sub.R the numbers of parameters to be estimated for the unrestricted and the restricted model, respectively. For the example of equations (1) and (2) using state space clustering with K clusters for calculating n.sub.i=Q′, and S clusters for calculating S′, in (2) and (1), p.sub.R=K, and p.sub.U=K+S, respectively. Given these settings, the f-statistic lsXGC causality index F.sub.S.fwdarw.Y is given by:
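As a minimal sketch, the f-statistic can be computed from the quantities defined above, assuming the standard F-test form for nested regression models, F = ((RSS.sub.R − RSS.sub.U)/(p.sub.U − p.sub.R)) / (RSS.sub.U/(n − p.sub.U − 1)). The residual degrees of freedom n − p.sub.U − 1, which count the intercept separately from p.sub.U, are an assumption of this sketch, as is the function name f_statistic.

```python
def f_statistic(rss_r, rss_u, n, p_r, p_u):
    """Nested-model F statistic from restricted and unrestricted residual sums of squares.

    Assumes n - p_u - 1 residual degrees of freedom (intercept counted separately).
    """
    if p_u <= p_r:
        raise ValueError("the unrestricted model must have more parameters")
    if n - p_u - 1 <= 0:
        raise ValueError("not enough samples for the unrestricted model")
    numerator = (rss_r - rss_u) / (p_u - p_r)     # error reduction per added parameter
    denominator = rss_u / (n - p_u - 1)           # residual variance of the unrestricted model
    return numerator / denominator

# Example with K = 4 clusters for Q' and 2 clusters for S': p_R = 4, p_U = 6
F_example = f_statistic(rss_r=10.0, rss_u=5.0, n=100, p_r=4, p_u=6)
```

A large value indicates that adding source information reduces the estimation error by more than would be expected from the added parameters alone.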
[0116] Application of the lsXGC method for causal analysis between time-series of a high-dimensional multivariate functional MRI (fMRI) dataset is described next.
[0118] An exemplary whole-brain fMRI multivariate time-series dataset is composed of 100,000 voxels (volume pixels) within the brain that are sampled every 0.5 seconds for four minutes. The resulting fMRI dataset is a time-series X∈ℝ.sup.(100,000×480). Note that the number of voxels, sampling rate, and time period used to collect fMRI data vary depending on the MRI scanner used, the age and physical condition of the subject, and other variables. The fMRI dataset is a high-dimensional multivariate time-series having a relatively low number of time points, that is, 100,000 time-series of length 480.
[0119] The raw fMRI dataset 46 is pre-processed before analysis to generate a complete fMRI dataset 48 or complete system N.sub.c∈ℝ.sup.N×T that is used for causal analysis. Pre-processing may include, but is not limited to, motion correction, removing a limited number of the initial values of the time-series to remove the effects of initial saturation, linear de-trending by high-pass filtering to remove effects of signal drifts, and normalizing the data to zero mean and unit standard deviation to focus on signal dynamics rather than amplitude.
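A hedged sketch of part of this pre-processing chain is given below. Motion correction is omitted, and the de-trending is done by subtracting a least-squares line from each time-series, which is a simple stand-in for the high-pass filtering named above. The function name preprocess and the default of ten dropped initial values are assumptions.

```python
import numpy as np

def preprocess(raw, n_drop=10):
    """Drop initial values, remove a linear trend per row, normalize to zero mean, unit std."""
    X = raw[:, n_drop:].astype(float)             # discard initial saturation values
    t = np.arange(X.shape[1], dtype=float)
    A = np.column_stack([np.ones_like(t), t])     # intercept and slope regressors
    coef, *_ = np.linalg.lstsq(A, X.T, rcond=None)
    X = X - (A @ coef).T                          # subtract the fitted line from each row
    X = X - X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0                           # guard against constant time-series
    return X / std

rng = np.random.default_rng(7)
raw = rng.standard_normal((5, 60)) + 0.05 * np.arange(60) + 3.0   # drifting synthetic signals
N_c = preprocess(raw, n_drop=10)
```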
[0120] It is desired to quantify the causal influence of each time-series of the complete system N.sub.c on each of the other time-series of the complete system N.sub.c for generating an N.sup.2 affinity (connectivity) matrix. The analysis requires iterating through each time-series (row) of the complete system N.sub.c, taking that time-series as the target Y (or the conditional source S), and, for that time-series, iterating through each time-series as the conditional source S (or the target Y) to evaluate the causal influence F.sub.S.fwdarw.Y of the source S on the target Y for that pair of time-series, the other time-series being confounding time-series. Note that, as a special case, the target and source time-series can be identical, i.e. S=Y, for evaluating the causal influence of a time-series on itself. This special case is often referred to in the context of so-called “Granger autonomy” in prior art. The analysis is illustrated in the following pseudocode in which each row of the complete system N.sub.c is selected as the target time-series Y and each time-series is selected as the conditional source time-series S:
FOR each time-series x ϵ N.sub.c
    Target Y := x
    FOR each time-series x′ ϵ N.sub.c
        Conditional Source S := x′
        Calculate causality index F.sub.S.fwdarw.Y by lsXGC method (previously described above)
            Step 1 - generate dimension-reduced representation(s)
            Step 2 - augmentation of dimension-reduced representation(s)
            Step 3 - generate predictions ỹ and ŷ from augmented representation(s)
            Step 4 - calculate causality index F.sub.S.fwdarw.Y using sets of predictions ỹ and ŷ
        Store F.sub.S.fwdarw.Y as element in N.sup.2 affinity matrix
    ENDFOR
ENDFOR
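The double loop of the pseudocode can be sketched in Python as follows. The function toy_causality_index is a deliberately simple, hypothetical stand-in for Steps 1 to 4: it fits lag-1 affine models on the full system without any dimension reduction or augmentation and returns the logarithm of the ratio of residual sums of squares. Storing the result with the source as row index and the target as column index is also an assumption of this sketch.

```python
import numpy as np

def rss_affine(features, y):
    """Residual sum of squares of a least-squares affine fit; features is P x n."""
    A = np.column_stack([np.ones(features.shape[1]), features.T])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))

def toy_causality_index(N_c, source, target):
    """Stand-in for Steps 1-4: lag-1 prediction of the target with and without the source."""
    past, future = N_c[:, :-1], N_c[target, 1:]
    keep = [r for r in range(N_c.shape[0]) if r != source]
    return float(np.log(rss_affine(past[keep], future) / rss_affine(past, future)))

def affinity_matrix(N_c, causality_index=toy_causality_index):
    N = N_c.shape[0]
    F = np.zeros((N, N))
    for target in range(N):              # FOR each time-series x: Target Y := x
        for source in range(N):          # FOR each time-series x': Source S := x'
            F[source, target] = causality_index(N_c, source, target)
    return F

# Synthetic system in which x0 drives x1 with a one-step delay; x2 is unrelated noise
rng = np.random.default_rng(4)
T = 300
x0 = rng.standard_normal(T)
x1 = np.zeros(T)
x1[1:] = 0.9 * x0[:-1] + 0.1 * rng.standard_normal(T - 1)
x2 = rng.standard_normal(T)
F = affinity_matrix(np.vstack([x0, x1, x2]))
```

Any function with the same signature, including a full lsXGC implementation, can be passed as causality_index.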
[0121] For fMRI causality analysis, Method A (
[0122] The performance of the lsXGC method (specifically in its non-linear lsNGC variants) was also evaluated and compared to other methods using synthetic data with known connectivity structure. The performance of these methods to unveil known connectivity relationships can be evaluated using Receiver Operating Characteristic (ROC) analysis to quantitatively compare performance. The lsNGC method was found to perform better than current state-of-the-art analysis using Convergent Cross Mapping (CCM) [5] with a high statistical significance.
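As an illustration of the ROC analysis mentioned above, the area under the ROC curve can be computed by comparing the causality indices of true connections with those of absent connections. This sketch uses the rank-based formulation of the area under the curve; the function name roc_auc and the example values are hypothetical and are not results from the evaluation described above.

```python
import numpy as np

def roc_auc(scores, labels):
    """Probability that a true connection outscores a non-connection (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical causality indices for four candidate connections, two of them real
auc = roc_auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0])
```

In practice, the scores would be the off-diagonal entries of the affinity matrix and the labels the corresponding entries of the known connectivity structure.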
[0123] The applicant has also found in using the lsNGC method for causality analysis of fMRI datasets and synthetic data that the time-series modeling approaches, e.g. supervised machine learning methods using the non-linear transformation functions, e.g. Q′ and S′ in the above example, do not have to be IIV invariant to generate useful results. That is, although IIV invariance is mathematically relevant, its relevance to real-world causality analysis of high dimensionality multivariate datasets is not so clear. It is suggested that data analysts use and compare both IIV invariant methods and IIV non-invariant modeling methods before settling on the machine learning or statistical modeling method to be used for lsNGC analysis of their datasets.
[0124] Application of the disclosed lsXGC method for feature selection is discussed next.
[0125] Analogies between causality analysis of multivariate time-series data and of features using the lsNGC method are set out in Table 1:
TABLE 1
Description                         | Time-Series     | Features
Components                          |                 |
Number of Components                | N               | N
Number of Values of each Component  | T               | T
Relationship between Components     | Causality index | Relevancy score
[0127] An analysis using a lsNGC variant of the lsXGC method was conducted using the lung cancer feature as the target Y and each of the other features as a source S to obtain a vector of relevancy indices to identify relevant factors for lung cancer.
[0128] The lsNGC method was used in essentially the same manner as used with multivariate time-series datasets to determine causality indices such as Granger causality indices or f-statistic indices. The lung cancer feature was predicted using dimension-reduced and augmented dimension-reduced representations as described above for multivariate time-series analysis. One machine learning model predicts the lung cancer feature utilizing the information from all of the other features except for the source S, and the other machine learning model predicts the lung cancer feature utilizing the information from all of the other features including the source S. Because the feature values are not sequential (time-ordered) data, all the appropriate data and not merely a windowed portion of the data were used for generating predictions.
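The two-model comparison for features can be sketched as follows, using all samples rather than sliding windows. This is a simplified illustration under stated assumptions: affine models replace the machine learning models, no dimension reduction is performed, and the relevancy score is taken as the logarithm of the ratio of residual sums of squares. The function names and the synthetic data are hypothetical and unrelated to the lung cancer dataset.

```python
import numpy as np

def rss_affine(features, y):
    """Residual sum of squares of a least-squares affine fit; features is P x n."""
    A = np.column_stack([np.ones(features.shape[1]), features.T])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))

def relevance_scores(features, y):
    """Score each feature by how much the fit degrades when that feature is left out."""
    rss_all = rss_affine(features, y)                 # model including the source
    scores = np.empty(features.shape[0])
    for s in range(features.shape[0]):
        others = np.delete(features, s, axis=0)       # model without the source
        scores[s] = np.log(rss_affine(others, y) / rss_all)
    return scores

rng = np.random.default_rng(6)
features = rng.standard_normal((6, 300))              # 6 features, 300 samples
y = 2.0 * features[1] - 1.5 * features[4] + 0.1 * rng.standard_normal(300)
scores = relevance_scores(features, y)
```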
[0129] The lsNGC method was able to successfully assign a high relevancy score to the five known relevant features of the data set even at small sample sizes. Results of the lsNGC method for feature selection appear very promising, but as always, the data analyst must investigate practical limitations of any method, the sufficiency of data, and other considerations when selecting and using any feature selection approach.
[0130] The lsXGC method, including its lsNGC variants, such as used for time-series causality analysis or feature relevance detection or feature selection, is a computer-implemented method, as it is not practical for hand-calculation.
[0131] In a classic linear Granger causality application setting, multivariate autoregressive (MVAR) time-series estimation can be used, where the augmentation of dimension-reduced time-series systems by additional time-series, which contain information about the conditional source and/or target time-series, can be easily accomplished by adding extra dimensions for these additional time-series in the MVAR modeling approach. This is distinctly different from the large-scale Granger causality (lsGC) approach mentioned above. Note that the affine modeling approach explained previously can be seen as a one-dimensional version of an MVAR model.
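As an illustration of augmentation by extra dimensions in a linear setting, the following sketch may be helpful. It rests on several assumptions that are not taken from the disclosure: principal component analysis is used as the dimensional modification, only the target's row of an order-1 autoregressive model is fitted, and the causality index is the logarithm of the ratio of residual variances. All function names are hypothetical.

```python
import numpy as np

def pca_reduce(Z, c):
    """Project the time-series Z (T, N) onto its first c principal components."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:c].T

def ar1_residual_variance(inputs, y):
    """Fit y[t+1] from inputs[t] by least squares; return the residual variance."""
    A = np.column_stack([np.ones(len(inputs) - 1), inputs[:-1]])
    b = y[1:]
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(np.var(b - A @ coef))

def granger_index(X, source, target, c=3):
    rest = [j for j in range(X.shape[1]) if j != source]
    n_i = pca_reduce(X[:, rest], c)               # holds no information about the source
    m_i = np.column_stack([n_i, X[:, target]])    # augmented with the target
    m_c = np.column_stack([m_i, X[:, source]])    # source added as one extra dimension
    v_i = ar1_residual_variance(m_i, X[:, target])
    v_c = ar1_residual_variance(m_c, X[:, target])
    return float(np.log(v_i / v_c))

# Synthetic system (assumption): series 0 drives series 1 with a lag of one step.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
X[1:, 1] += 0.9 * X[:-1, 0]
forward = granger_index(X, source=0, target=1)
backward = granger_index(X, source=1, target=0)
```

In this sketch the augmented model has only one more input than the non-augmented one, so the model size stays small regardless of the number N of time-series in the system.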
[0132] Note that, although the lsXGC method so far has been mainly described for embodiments related to time-series analysis, the method can easily be generalized to other contexts. Instead of interpreting the rows and columns of the data matrices of a complete series N.sub.c as time-series (rows) and temporal samples (columns), one can interpret the rows as features and the columns as multi-dimensional data samples, i.e. as vectors whose elements represent the value of each feature for a given sample. Instead of estimating the value of a target time-series at given points in time (e.g., future, past, present), the target may represent any quantity that may be estimated from the data samples. Thus, the described lsXGC method can be used to calculate measures that can serve as statistical surrogates of the “causal influence” of each feature on any chosen target quantity. The resulting measures can be used to determine the “relevance” of each feature for estimating the target, which can be used for feature selection in function approximation or supervised machine learning tasks. This is especially important in high-dimensional settings, where the number of features is high, but the number of available data samples is low. Note that for lsXGC, inherent assumptions made for Granger causality, such as on the direction of causality, are not required, as discussed below.
[0133] The lsXGC method disclosed herein has been inspired by Granger causality, namely as a method for calculating a measure of causal influence of the conditional source on the target. It is very important to note that the lsXGC method is not limited to assumptions underlying any type of Granger causality. Such assumptions include, but are not limited to the following:
[0134] (1) Direction of time: Granger causality is usually based on an assumed direction of time: most often, the future of the target time-series is estimated from the present or past of the (complete) time-series system. Alternatively, time-reversed Granger causality methods have been described as well, e.g. [4]. Note that for the lsXGC method described herein, any subset of the target time-series may be estimated from any subset of time-points of any subset of time-series of the complete time-series system. Specifically, past, present, or future values of any time-series may be used for estimating the target.
[0135] (2) Direction of causality: Granger causality is typically used to calculate a measure of causal influence of the conditional source time-series on the target time-series. This implied direction of causality, i.e., from source to target, is not required for the lsXGC method or its non-linear lsNGC variants. Specifically, lsXGC can be used for calculating a measure of causal influence of the target on the source, by using the computational steps explained herein. This does not mean simply re-naming source and target and repeating calculations with such exchanged roles. It means an actual inversion of the direction of inferred causality. At first glance, this may seem counter-intuitive, but it is inspired by prior work on non-linear time-series analysis, e.g. [5]. There, it is widely accepted that in many non-linear time-series systems, a causal driver may be estimated well from the causally driven time-series, but not vice versa. In general, the computational steps of the lsXGC (lsNGC) method may be used to calculate measures of causal influence or any form of interdependence, such as those motivated by appropriate information-theoretic measures, e.g. transfer entropy [6] or conditional mutual information [7], regardless of whether such measures of causal influence or interdependence are unidirectional or bi-directional.
[0136] (3) Direct inference of causality: Note that the disclosed method can be used independently of methods that compare quantities calculated in the presence and the absence of information about the source dataset, such as Granger causality, transfer entropy, or conditional mutual information. Instead, measures of causal interdependence between source and target dataset, such as the estimation errors obtained when calculating estimates of the target using linear or non-linear modeling approaches, can be directly calculated, even in a multivariate setting. For example, one can use one or more of the augmented dimensionally modified representations calculated in the lsXGC method for estimating values of the target using any linear or non-linear multivariate modeling method, and use the obtained estimation errors directly as measures of causal interdependence, e.g. in a multivariate estimation approach, which may be sliding-window based. This is inspired by the above-cited methods for non-linear time-series modeling, e.g. [5], and can also be seen as a multivariate extension of Mutual Connectivity Analysis (MCA) based on the disclosed invention. Here, a specific embodiment could also include (i) selecting a target dataset; (ii) augmenting a dimensionally modified representation of a complete system, which contains information about more than one possible source dataset, with information about one specifically chosen source dataset, and calculating estimation errors for estimating the target dataset as a direct measure of causal interdependence between source and target dataset; (iii) iterating step (ii) for a different choice of source dataset; and (iv) comparing the causal interdependence measures so obtained. Note that such an approach is also applicable to features datasets.
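Steps (i) to (iv) of the specific embodiment above can be sketched as follows. This is an illustrative reading under stated assumptions, not the disclosed implementation: principal component analysis serves as the dimensional modification of the complete system, a linear one-step-ahead estimator is used, and the mean squared estimation error is taken as the direct measure (a lower error suggesting stronger interdependence). The function name `direct_errors` and the synthetic data are hypothetical.

```python
import numpy as np

def direct_errors(X, target, c=2):
    """Return {source: mean squared error of estimating the target} when the
    reduced complete system is augmented with that one source."""
    T, N = X.shape
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    n_c = Xc @ Vt[:c].T                     # reduced representation of the complete system
    y = X[1:, target]                       # (i) chosen target, one step ahead
    errors = {}
    for s in range(N):                      # (iii) iterate over candidate sources
        if s == target:
            continue
        # (ii) augment the reduced representation with source s, then estimate
        m_c = np.column_stack([np.ones(T - 1), n_c[:-1], X[:-1, s]])
        coef, *_ = np.linalg.lstsq(m_c, y, rcond=None)
        errors[s] = float(np.mean((y - m_c @ coef) ** 2))
    return errors

# Synthetic system (assumption): series 4 drives series 2 with a lag of one step.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 6))
X[1:, 2] += 1.5 * X[:-1, 4]
errors = direct_errors(X, target=2)
best_source = min(errors, key=errors.get)   # (iv) compare the obtained measures
```

No model without source information is fitted here; the estimation errors are compared with one another across sources rather than against a source-free baseline.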
[0137] Given the aspects (1), (2), and (3) discussed in the preceding paragraphs, the lsXGC (lsNGC) method draws a connection between Granger causality and certain traits of non-linear time-series analysis.
[0138] For the above-described generalization of the lsXGC method to features, inverting the direction of causality would mean interpreting the target as features, the relevance of which for the conditional source can be evaluated by using the computational steps described herein. The above comments on calculating measures of causal influence or any form of interdependence between source and target hold for feature selection as well.
[0139] A “Divide and Conquer (D&C) strategy” for enhancing the computational efficiency of lsNGC is discussed next. Using the disclosed method, we can compute quantitative estimates for the presence of multivariate directed causal interaction for all N.sup.2 directed connections between N network nodes and store these in an affinity matrix D with N.sup.2 elements. However, it remains a significant challenge to implement this approach in practical situations, such as in functional MRI. Here, a major difficulty arises from the large number of time-series (N˜10.sup.3 to 10.sup.5) for analyzing effective connectivity on a fine-grained brain atlas parcellation template or even on a state-of-the-art neuro-imaging voxel resolution scale with whole-brain coverage.
[0140] Instead of analyzing a small number, say N˜10.sup.2, of pre-selected brain regions of interest, such as in contemporary seed-based approaches to resting-state fMRI analysis, this leads to very large affinity matrices D with up to N.sup.2˜10.sup.10 elements. Thus, for lsNGC, we will have to train up to 10.sup.10 models, e.g. supervised learning machines, as non-linear time-series estimators, e.g. 10.sup.10 neural networks, in order to compute all matrix elements of D. Even though high-performance computing equipment may be widely available, a ‘brute force’ approach to accomplish this goal still appears challenging. Here, a key observation is that the proposed lsNGC approach allows us to pursue an algorithmic ‘Divide and Conquer’ (D&C) strategy to successfully address the aforementioned computational challenge.
[0141] The D&C strategy is illustrated in
[0142] The D&C strategy can be directly adopted for the lsNGC scheme, if we exploit the specific properties of (G)RBF neural networks. These can be interpreted as supervised learning machines, which are partially trained in an unsupervised way. In the (G)RBF network, non-linear feature extraction requires a vector quantization step to determine cluster centers and a subsequent computation of corresponding hidden-layer neuron activities, neither of which depends on information about the prediction target, i.e., the desired network output. These steps constitute the computationally most expensive components in (G)RBF neural network training. However, these steps need to be performed only once for each input, regardless of how many different output prediction targets should be approximated by the network. Although the subsequent training of the (G)RBF network output weights needs to be performed in a supervised manner for each output, the corresponding computational expense is low, because this only requires a matrix inversion for linear regression. That is, one (G)RBF neural network needs to be trained for each conditional source S, which amounts to N such networks, where each (G)RBF network acts as a non-linear estimator for all N possible target time-series Y in the N-dimensional system.
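The reuse of the unsupervised steps across all prediction targets can be illustrated by a minimal sketch. It makes several assumptions beyond the text: plain Lloyd (k-means) iterations stand in for the vector quantization step, the hidden-layer activities are normalized Gaussians with one fixed width, the network input is simply the previous time step of a small synthetic system, and the output weights are obtained by a single least-squares solve. All names and parameter values are hypothetical.

```python
import numpy as np

def kmeans(Z, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns k cluster centers for the rows of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):         # leave empty clusters where they are
                centers[j] = Z[labels == j].mean(axis=0)
    return centers

def hidden_activations(Z, centers, width):
    """Normalized Gaussian hidden-layer activities; no target is needed."""
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    a = np.exp(-d / (2.0 * width ** 2))
    return a / a.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
T, N, k = 300, 5, 10
X = rng.normal(size=(T, N))
Z, Y = X[:-1], X[1:]                        # inputs and all N one-step-ahead targets

centers = kmeans(Z, k)                      # unsupervised, performed once per input
H = hidden_activations(Z, centers, width=1.0)   # performed once per input
W, *_ = np.linalg.lstsq(H, Y, rcond=None)   # one linear solve gives weights for all N targets
pred = H @ W
```

Each column of `W` equals the output weights that a separate fit for the corresponding target would give, so the expensive clustering and activation steps are shared by all N targets.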
[0143] Note that instead of using (G)RBF networks, we can similarly use local models and multivariate regression techniques to implement the proposed D&C strategy. For example, when using such methods, defining a local neighborhood of a given data input, e.g. by k-nearest neighbor search, would correspond to the “unsupervised” learning step in
[0144] Further analysis of causal interdependence measures calculated by the disclosed lsXGC (lsNGC) method can be performed in many ways. Specifically, such measures can be used as inputs for subsequent modeling, e.g. machine learning tasks, either directly or indirectly, e.g. after extraction of graph-theoretic network features, for classification or regression. For example, in network analysis applications, such as fMRI brain connectivity described above, the calculated causal interdependence measures between specific brain regions, or even whole connectivity matrices with N.sup.2 elements, may be used for classifying imaged human subjects as suffering from a certain neurologic condition or even for estimating their clinical performance levels in neurocognitive tasks based on such fMRI analysis. Interestingly, the lsXGC method can also be applied for estimating the “relevance” of input features for such classification/regression tasks, e.g. for feature selection. Also, one can perform community detection in network graphs calculated from lsXGC (lsNGC)-derived connectivity matrices, such as by using graph-theoretic methods, the Louvain method, or non-metric clustering. For such network analysis and network-based inference applications, geometric deep learning methods, such as graph convolution networks (GCN), can be used.
[0145] While this disclosure includes one or more illustrative embodiments described in detail, it is understood that the one or more embodiments are each capable of modification and that the scope of this disclosure is not limited to the precise details set forth herein but include such modifications that would be obvious to a person of ordinary skill in the relevant art including (but not limited to) changes in supervised and unsupervised machine learning algorithms, dimensional modification algorithms, measures of causality, computer hardware and software for computer implementation, environment or field of use, ordering of method steps, and the like, as well as such changes and alterations that fall within the purview of the following claims.
REFERENCES
[0146] [1] Axel Wismüller et al., Pair-wise Clustering of Large Scale Granger Causality Index Matrices for Revealing Communities. Proc. of SPIE, Int. Soc. Opt. Eng. 2014 Mar. 13; 9038: doi:10.1117/12.2044340 (2014)
[0147] [2] Adora M. DSouza, Anas Z. Abidin, Lutz Leistritz, and Axel Wismüller: Exploring connectivity with large-scale Granger causality on resting-state functional MRI. Journal of Neuroscience Methods 287:68-79 (2017)
[0148] [3] John A. Lee, Michel Verleysen, Nonlinear Dimensionality Reduction, Springer, 2007.
[0149] [4] Irene Winkler, Danny Panknin, Daniel Bartz, Klaus-Robert Müller and Stefan Haufe: Validity of time reversal for testing Granger causality. arXiv:1509.07636v2 [math.ST] 11 Feb. 2016
[0150] [5] G. Sugihara, R. May, H. Ye, C. H. Hsieh, E. R. Deyle, M. Fogarty, and S. Munch: Detecting causality in complex ecosystems. Science 338:496-500, 2012
[0151] [6] Schreiber T: Measuring information transfer. Phys Rev Lett 85(2):461-464 (2000)
[0152] [7] M. Palus, V. Komarek, Z. Hrncir, and K. Sterbova, Phys. Rev. E 63, 046211 (2001)
[0153] [8] Granger C., Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37:424-38 (1969)
[0154] [9] Marinazzo D, et al., Nonlinear parametric model for Granger analysis of time series, Phys. Rev. E 73 066216 (2006)
[0155] [10] Ancona N, Stramaglia S, An invariance property of predictors in kernel induced hypothesis spaces. Neural Computation, volume 18, no. 4 (2006)
[0156] [11] Guyon et al. Causal feature selection. Technical report, Berkeley, Calif., March 2007
[0157] [12] Nadaraja E., On estimating regression. Theory Probabil Appl 9(1):141-142 (1964)