Method and Device for Determining a Measure of Causal Influence Between Components of Complex Systems

20220058211 · 2022-02-24

    Inventors

    Cpc classification

    International classification

    Abstract

    Disclosed is a computer-implemented method for determining a measure of the causal interdependence of a source dataset S and a target dataset Y in the simultaneous presence of a non-empty confounding dataset B. The method includes a dimensional modification step to reduce the complexity of the data and an augmentation step to add information to the dimensionally modified data without a significant increase in size or complexity of the data. The augmented dimensionally modified data is used to calculate a measure of the causality or relatedness between the source dataset S and the target dataset Y. The method enables linear or nonlinear causality analysis of high-dimension multivariate time-series or features datasets, such as time-series datasets generated by functional MRI full-brain scans.

    Claims

    1. A computer-implemented method for determining a measure of the causal interdependence of a source dataset S and a target dataset Y in the simultaneous presence of a non-empty confounding dataset B, the confounding dataset B not being the source dataset S and not being the target dataset Y, the method comprising the steps of: (a) calculating at least one of the following dimensionally modified representations (aa) and (ab): (aa) a dimensionally modified representation n.sub.i of one of the following three datasets (1), (2), and (3): (1) the confounding dataset B, (2) the target data set Y, and (3) the union of the confounding dataset B and the target dataset Y, whereby the representation n.sub.i does not include information about the source dataset S, and (ab) a dimensionally modified representation n.sub.c of at least one of the following datasets (1), (2), (3), and (4): (1) the source dataset S, (2) the union of the source dataset S and the confounding dataset B, (3) the union of the source dataset S and the target dataset Y, and (4) the union of the source dataset S, the confounding dataset B, and the target dataset Y; (b) constructing at least one of the following augmented dimensionally modified representations m.sub.i or m.sub.c by: (ba) constructing an augmented dimensionally modified representation m.sub.i that does not contain information about the source dataset S by augmenting a dimensionally modified representation n.sub.i with information about at least one of the following datasets (1), (2), and (3): (1) the target dataset Y, (2) the confounding dataset B, and (3) the union of the target dataset Y and the confounding dataset B; and/or (bb) constructing an augmented dimensionally modified representation m.sub.c that contains information about the source dataset S by: (bba) augmenting the dimensionally modified representation n.sub.c with information about at least one of the following datasets (1), (2), (3), and (4): (1) the source dataset S, (2) the confounding dataset B, (3) the target dataset Y, and (4) any union of at least two of the datasets S, B, and Y; or (bbb) augmenting the dimensionally modified representation n.sub.i or the augmented dimensionally modified representation m.sub.i with information about the source dataset S; and (c) calculating a measure of causal interdependence of the source dataset S and the target dataset Y using at least one of the augmented dimensionally modified representations m.sub.i or m.sub.c, whereby information about the confounding data set B is used: (1) directly, or (2) indirectly by having the information about the confounding data set B having been included in the representations n.sub.i, n.sub.c, m.sub.i, or m.sub.c, which are used for calculating a measure of causal interdependence of the source dataset S and the target dataset Y.

    2. The method of claim 1, wherein step (c) comprises: calculating estimation errors that compare estimated and actual values of the target dataset Y, or calculating information-theoretic quantities that characterize information exchange between the source dataset S and the target dataset Y, such as transfer entropy, cross-entropy, corr-entropy, mutual information, or conditional mutual information.

    3. The method of claim 1, wherein the calculation of a dimensionally modified representation n.sub.i is based on performing the calculation of a dimensionally modified representation n.sub.c in such a way that information about the source dataset S is eliminated.

    4. The method of claim 1 wherein the calculation of the dimensionally modified representations n.sub.i or n.sub.c is performed in such a way that information about the target dataset Y or the confounding dataset B, or both, is eliminated from n.sub.i or n.sub.c.

    5. The method of claim 1 wherein step (c) comprises: calculating quantities {tilde over (y)} that provide information about the target dataset Y without using information about the source dataset S by basing the calculations of {tilde over (y)}: (i) on the dimensionally modified representation n.sub.i, or (ii) on the augmented dimensionally modified representation m.sub.i; and calculating quantities y{circumflex over ( )} that provide information about the target dataset Y using information about the source dataset S by basing the calculations of y{circumflex over ( )}: (iii) on the dimensionally modified representation n.sub.c, or (iv) on the augmented dimensionally modified representation m.sub.c; and the calculation of a measure of causal interdependence of the source dataset S and the target dataset Y is based on both quantities {tilde over (y)} and y{circumflex over ( )}.

    6. The method of claim 5 wherein quantities {tilde over (y)} and y{circumflex over ( )} are estimated values of the target dataset Y, whereby the calculation of a measure of causal interdependence of the source dataset S and the target dataset Y is based on the calculation of quantities that compare estimation errors of the estimated values {tilde over (y)} and the estimated values y{circumflex over ( )} in estimating the target dataset Y.

    7. The method of claim 5 wherein the quantities {tilde over (y)} and y{circumflex over ( )} provide statistical information about the target dataset Y that is used to calculate information-theoretic measures that characterize information exchange between the source dataset S and the target dataset Y, such as transfer entropy, cross-entropy, corr-entropy, mutual information, or conditional mutual information.

    8. The method of claim 5, wherein calculating a measure of causal interdependence between the source dataset X and the target dataset Y comprises: calculating Granger causality, such as by calculating Granger causality indices or f-statistic.

    9. The method of claim 1 wherein: a complete dataset N.sub.c is defined as: the union of the source dataset S and the confounding dataset B, or the union of the source dataset S and the target dataset Y, or the union of the source dataset S, the confounding dataset B, and the target dataset Y; and step (a) comprises the steps of: removing the source dataset S from the complete dataset N.sub.c to form the incomplete dataset N.sub.i; and calculating a dimensionally modified representation of N.sub.i to form n.sub.i.

    10. The method of claim 9 comprising the steps of: iterating over a non-empty and at least two-element subset P of the complete dataset N.sub.c, and for each iteration: assigning one or more different non-empty subsets of P as the source dataset S; removing that source dataset S from the complete dataset N.sub.c to form an incomplete dataset N.sub.i that includes the confounding dataset B; and calculating the measure of causal interdependence of the source dataset S and a target dataset Y by performing steps (a)-(c), wherein step (a) comprises calculating the dimensionally modified representation n.sub.i from the incomplete dataset N.sub.i.

    11. The method of claim 9 comprising the steps of: iterating over a non-empty and at least two-element subset Q of the complete dataset N.sub.c, and for each iteration: assigning one or more different non-empty subsets of Q as the target dataset Y; and calculating a measure of causal interdependence of the source dataset S and the target dataset Y by performing steps (a)-(c).

    12. The method of claim 9 wherein one dimensionally modified representation n.sub.c of the complete dataset N.sub.c is calculated and used for more than one iteration by, in each iteration: eliminating information about the source dataset S from n.sub.c; and/or eliminating information about the target dataset Y from n.sub.c.

    13. (canceled)

    14. The method of claim 1 wherein estimates of any non-empty subset or any union of any of the following data sets are calculated: target Y, source S, confounding data set B, dimensionally modified representations n.sub.i, n.sub.c, m.sub.i, or m.sub.c; and such calculations are performed using (i) linear methods, including, affine, autoregressive, such as MVAR, or any other linear regression methods; or (ii) non-linear methods, including, non-linear regression methods, feed-forward multi-layer perceptrons, e.g. trained by error-back-propagation, radial basis functions (RBF) neural networks or generalized radial basis functions (GRBF) neural networks, support vector regression, fuzzy logic, random forests, decision trees, recurrent neural networks, including, but not limited to Echo State Networks or LSTMs; or (iii) any type of local models, including, local linear models or local average models; (iv) any type of ordinary or partial differential equation solvers; or (v) any combination of different modeling approaches, including, combining approaches previously recited herein, committee machines, boosting, bootstrap aggregating, mixture-of-experts, and/or ensemble learning or ensemble methods.

    15-18. (canceled)

    19. The method of claim 1 wherein the calculation of a dimensionally modified representation of the source dataset S, the target dataset Y, or the confounding data set B includes a non-linear transformation, such as induced by a non-linear kernel derived from a clustering method; and such a calculated dimensionally modified representation serves as input to linear modeling methods, including, affine or MVAR modeling.

    20. The method of claim 1 wherein step (c) comprises: calculating a measure of causal influence of a source dataset S on a target dataset Y; or, calculating a measure of causal influence of a target data set Y on a source dataset S.

    21. The method of claim 1 wherein the source dataset S, the target dataset Y, and the confounding dataset B are each time-series datasets.

    22. The method of claim 1 wherein the source dataset S, the target dataset Y, and the confounding dataset B are each features datasets.

    23. The method of claim 1 wherein the source dataset S, the target dataset Y, the confounding dataset B, or any union of at least two of these datasets, defines a dataset X∈custom-character.sup.N×T, wherein the dataset X is an underdetermined data set with N>T.

    24.-25. (canceled)

    26. Computer code stored in a tangible medium and operable by a computer processor for computer implementation of the method as recited in claim 1.

    27. A device comprising: one or more processors; and one or more memories storing code executable by the processors to perform: (a) reading a source dataset S, a target dataset Y, and a non-empty dataset B, the dataset B not being the source dataset S and not being the target dataset Y; (b) calculating at least one of the following dimensionally modified representations (aa) and (ab): (aa) a dimensionally modified representation n.sub.i of one of the following three datasets (1), (2), and (3): (1) a confounding dataset B, (2) a target data set Y, and (3) a union of the confounding dataset B and the target dataset Y, whereby the representation n; does not include information about the source dataset S, and (ab) a dimensionally modified representation n.sub.c of at least one of the following datasets (1), (2), (3), and (4): (1) the source dataset S, (2) the union of the source dataset S and the confounding dataset B, (3) the union of the source dataset S and the target dataset Y, and (4) the union of the source dataset S, the confounding dataset B, and the target dataset Y; (c) constructing at least one of the following augmented dimensionally modified representations m.sub.i or m.sub.c by: (ba) constructing an augmented dimensionally modified representation m.sub.i that does not contain information about the source dataset S by augmenting a dimensionally modified representation n.sub.i with information about at least one of the following datasets (1), (2), and (3): (1) the target dataset Y, (2) the confounding dataset B, and (3) the union of the target dataset Y and the confounding dataset B; and/or (bb) constructing an augmented dimensionally modified representation m.sub.c that contains information about the source dataset S by: (bba) augmenting the dimensionally modified representation n.sub.c with information about at least one of the following datasets (1), (2), (3), and (4): (1) the source dataset S, (2) the confounding dataset B, (3) the target dataset Y, and (4) any union of at least two of the datasets S, B, and Y; or (bbb) augmenting the dimensionally modified representation n.sub.i or the augmented dimensionally modified representation m.sub.i with information about the source dataset S; and (d) calculating a measure of causal interdependence of the source dataset S and the target dataset Y using at least one of the augmented dimensionally modified representations m.sub.i or m.sub.c, whereby information about the confounding data set B is used: (1) directly, or (2) indirectly by having the information about the confounding data set B having been included in the representations n.sub.i, n.sub.c, m.sub.i, or m.sub.c, which are used for calculating a measure of causal interdependence of the source dataset S and the target dataset Y.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0071] FIG. 1 is a flow chart showing major steps of the disclosed lsXGC method.

    [0072] FIG. 2 illustrates a complete multivariate system to be analyzed for source to target causality using the lsXGC method.

    [0073] FIG. 3 illustrates an incomplete multivariate system generated by removing the source data elements from the complete system shown in FIG. 2.

    [0074] FIG. 4 illustrates alternative methods A and B for generating a dimension-reduced representation of the incomplete system shown in FIG. 3.

    [0075] FIG. 5 conceptually illustrates the dimension-reduced representation of the incomplete system shown in FIG. 3.

    [0076] FIGS. 6 and 7 illustrate alternative methods for augmenting the dimension-reduced incomplete representation shown in FIG. 5.

    [0077] FIG. 8 conceptually illustrates the dimension-reduced incomplete representation shown in FIG. 5 augmented with information about the target time-series Y.

    [0078] FIG. 9 illustrates calculating predictions of the target time-series using the “sliding window” technique.

    [0079] FIG. 10 schematically illustrates generation of a complete system of fMRI multivariate time-series data for causality analysis.

    [0080] FIG. 11 schematically illustrates a multivariate feature dataset for feature relational analysis using the lsXGC method.

    [0081] FIG. 12 schematically illustrates a computer implementation of the lsXGC method.

    [0082] FIG. 13 schematically illustrates a “Divide and Conquer (D&C) strategy” for enhancing the computational efficiency of the lsNGC variant embodiment of the lsXGC method.

    DETAILED DESCRIPTION

    [0083] FIG. 1 illustrates the main steps of the lsXGC method 10 as applied to a high-dimension multivariate time-series dataset 12. The dataset defines a complete multivariate time-series system N.sub.c∈custom-character.sup.N×T having simultaneously measured values. The complete system N.sub.c includes as a subset one or more time-series defining a conditional source time-series or source S. It is desired to evaluate the causal influence of each source S on a dataset 14 defining a target time-series or target Y having values simultaneously measured with those of the complete series N.sub.c. A source S may include more than one row of the N rows of N.sub.c. The target Y may or may not be an element or subset of the complete system N.sub.c.

    [0084] FIG. 2 illustrates the dataset 12 defining the complete system N.sub.c that includes N time-series x.sub.1, x.sub.2, x.sub.3, . . . , x.sub.N-3, x.sub.N-2, x.sub.N-1, and x.sub.N. Each time-series is a series of T data points indexed or stored in time order. FIG. 2 illustrates the members of the time-series x.sub.1=(r.sub.1, r.sub.2, r.sub.3, . . . , r.sub.T-2, r.sub.T-1, r.sub.T) listed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time.

    [0085] In the illustrated application of the lsXGC method the source time-series S is the time-series x.sub.3 and the target time-series Y is an element of the complete system N.sub.c, namely the time-series x.sub.N-2. It is desired to evaluate the causal influence F.sub.S.fwdarw.Y of the source time-series x.sub.3 on the target time-series x.sub.N-2. The other time-series of the complete system N.sub.c are confounding time-series that do not belong to the source S and do not belong to the target Y but may influence or may be influenced by all or some of the other time-series, including the target Y.

    [0086] In the first lsXGC step 16 (see FIG. 1), generate a dimension-reduced representation n.sub.i, n.sub.i∈custom-character.sup.D×T of an incomplete system N.sub.i and, optionally, generate a dimension-reduced representation n.sub.c∈custom-character.sup.E×T of the complete system N.sub.c wherein D, E<N. The incomplete system N.sub.i is the system created from the complete system N.sub.c by removing the source S from the complete system N.sub.c, that is, N.sub.i=(N.sub.c\S)∈custom-character.sup.F×T wherein F<N. As explained in more detail below, the dimension-reduced incomplete system representation n.sub.i can be generated either by direct dimensionality reduction of the incomplete system N.sub.i or by dimensionality reduction of the complete system N.sub.c and additional, such as subsequent, elimination of information about the source time-series S.

    [0087] Dimensionality reduction of the incomplete system N.sub.i or of the complete system N.sub.c can be performed using linear dimensionality reduction techniques, such as Principal Component Analysis (PCA), or non-linear techniques. Principal Component Analysis is a linear technique that transforms the original dataset into its principal components by transforming the dataset into a lower-dimensional subspace. The first principal component explains the greatest amount of the variance in the data, the second principal component explains the second greatest amount of variance, and so on. A plot of explained variance versus the number of principal components can be used in Principal Component Analysis to determine a reasonable number of principal components and hence the dimensionality of the reduced-dimension incomplete representation n.sub.i or the optional reduced-dimension complete representation n.sub.c.

    [0088] For non-linear dimensionality reduction methods, one can utilize my previous work on information visualization by non-linear dimension reduction disclosed in my U.S. Pat. No. 7,567,880 as well as a wide scope of non-linear algorithms available from the literature, see e.g. [3]. These include machine learning techniques for dimensionality reduction, such as autoencoders and their variants (e.g. variational autoencoders), clustering methods (e.g. k-means, fuzzy c-means, self-organizing maps (SOM), divergence-based methods, such as stochastic neighbor embedding (SNE), independent component analysis (ICA), exploratory observation machines (XOM), as well as variants or combinations thereof.

    [0089] The dimension-reduction algorithm used does not have to be invertible. The original target time-series Y can be estimated directly from the dimension-reduced system and does not have to be estimated indirectly from back-projecting an estimate of the dimension-reduced time-series into the original space as is performed using the large scale Granger Causality (lsGC) method, although such back-projection of estimates calculated in the dimension-reduced space can be used with lsXGC as well.

    [0090] Generating the dimension-reduced incomplete system n.sub.i is described next.

    [0091] FIG. 3 illustrates the dataset 18 of the incomplete system N.sub.i generated by removing the source S, i.e., x.sub.3, from the complete system N.sub.c. The incomplete system N.sub.i must include at least one confounding time-series that is not an element of the source S or the target Y. The illustrated dataset 18 includes the confounding time-series x.sub.1, x.sub.2, x.sub.4, . . . , x.sub.N-3, x.sub.N-1, and x.sub.N(the target Y is not considered to be a confounding time-series).

    [0092] There are two alternative methods for generating the dimension-reduced representation n.sub.i of the incomplete system N.sub.i. These alternative methods are illustrated in FIG. 4. One alternative method (“Method A”) generates the dimension-reduced incomplete representation n.sub.i through dimensionality reduction of the complete system N.sub.c and without generating the incomplete system N.sub.i directly. The other alternative method (“Method B”) generates the incomplete system N.sub.i and then generates the dimension-reduced incomplete representation by dimensionality reduction of the incomplete system N.sub.i directly.

    [0093] Referring to FIG. 4, in the first alternative method (“Method A”), perform dimension reduction of the complete system N.sub.c in a first step 20 to generate a dataset 22 that is the dimension-reduced representation of the complete system (referred to herein as n.sub.c). In a second step 24 eliminate information about the conditional source time-series S from the representation n.sub.c to generate the dataset 26 defining the dimension-reduced representation of the incomplete system n.sub.i.

    [0094] There are various ways to eliminate information of the source S from n.sub.c to generate n.sub.i, depending on the specific dimension reduction algorithm used.

    [0095] For example, if Principal Components Analysis (PCA) is used to generate the dimension-reduced complete representation n.sub.c, all entries in the coefficient transformation matrix that correspond to the input dimension(s) of the conditional source time-series S can be set to zero. If kernel transformations are used for dimension reduction, such as based on clustering methods, all contributions to calculating the kernel-transformed time-series originating from the input dimensions of the conditional source time-series S can be set to zero. If trained supervised neural networks are used for dimension reduction, such as auto-encoders, all network inputs related to the input dimensions of the conditional source time-series S can be set to zero. Various approaches to eliminating information about the conditional source time-series S (or target time-series Y, discussed later below) after performing dimension reduction have been described in the literature discussing large-scale Granger causality (lsGC), such as [1], [2], and the literature cited therein.

    [0096] Note that if Method A is used, it is not necessary to generate the incomplete system N.sub.i. Also note that if Method A is used, different dimension-reduced incomplete representations n.sub.i for different source time-series S of the complete system N.sub.c can be generated by performing step 24 of FIG. 4 for each different source time-series S using the initially generated dimension-reduced complete representation n.sub.c calculated in step 20.

    [0097] So, for example, if an analyst wishes to analyze the causal influence of different conditional sources S on a given target T of a complete system N.sub.c, Method A permits that the dimensionality reduction of the complete system N.sub.c is performed only once for all the possible conditional sources S of the given target T in the system. Besides the potentially desirable stability of the dimensionality reduction result, this provides significant computational advantages, specifically when large, i.e. high-dimensional, systems (such as those from full-brain fMRI) need to be analyzed with regards to causal influences among their components.

    [0098] Referring again to FIG. 4, in the second alternative method (“Method B”), generate the incomplete system N.sub.i by removing the source S from the complete system N.sub.c in a first step 28 to form the incomplete system N.sub.i and then perform dimension reduction of the incomplete system N.sub.i directly in the second step 30. Because the incomplete system N.sub.i has already had the information of conditional source time-series S removed from it, no further processing of the resulting dimension-reduced representation n.sub.i is required.

    [0099] As a specific example, FIG. 5 conceptually illustrates the dimension-reduced incomplete representation n.sub.i 32, such as obtained after using PCA or any other dimensionality reduction method. As explained above, the representation n.sub.i does not include information about the source time-series S. Note that n.sub.i may or may not include information about the target time-series Y, such as depending on whether Y is or is not an element of the complete system N.sub.c. Note that the representation n.sub.i in FIG. 5 contains D<N time-series v.sub.k=(ν.sub.k.sup.(1),ν.sub.k.sup.(2),ν.sub.k.sup.(3), . . . ,ν.sub.k.sup.(T-2),ν.sub.k.sup.(T-1),ν.sub.k.sup.(T)), where k=1, . . . , D. Note also that, in general, each of the time-series v.sub.k contains a mixture of information originating from several or all of the original time-series of the incomplete system.

    [0100] In a second step 34 of the lsXGC method (see FIG. 1), the reduced-dimension incomplete representation n.sub.i and/or the reduced-dimension complete representation n.sub.c is augmented with information based on the conditional source S, the target Y, and/or both the conditional source S and the target Y. An augmented dimension-reduced representation n.sub.i or n.sub.c gains information about the source S and/or the target Y without a substantial increase in dimensionality, i.e. without substantial increase in size of the data set or required computational complexity. One or both of the augmented reduced-dimension representations will be used to predict the values of the source Y in a later step.

    [0101] The at least one augmenting time-series x.sub.a can be an element of the source S (x.sub.a∈S) and/or the target Y (x.sub.a∈Y). Alternatively, the at least one augmenting time-series x.sub.a is a modified element of the source S and/or the target Y. For example, the at least one augmenting time-series x.sub.a can be generated by replacing the original values of the source S with the logarithms of the original values. Arbitrary other modifications of the source S and/or the target Y are possible, such as for more efficient generation of causality indices in subsequent steps.

    [0102] As shown in FIG. 6 in which dashed arrows represent alternative Boolean-inclusive “or” paths, the dimension-reduced incomplete representation n.sub.i can be augmented with information about the source S to generate a source-augmented dimension-reduced incomplete representation n.sub.iaS, information about the target Y to generate a target-augmented dimension-reduced incomplete representation n.sub.iaY, or information about both the source S and the target Y to generate a source-and-target-augmented dimension-reduced incomplete representation n.sub.iaSY.

    [0103] As similarly shown in FIG. 7 the dimension-reduced complete representation n.sub.c can be augmented with information about the source S to generate a source-augmented dimension-reduced complete representation n.sub.caS, information about the target Y to generate a target-augmented dimension-reduced complete representation n.sub.caY, or information about both the source S and the target Y to generate a source-and-target-augmented dimension-reduced complete representation n.sub.caSY.

    [0104] At least one augmented representation of the dimension-reduced incomplete representation n.sub.i or of the dimension-reduced complete representation n.sub.c will be used in the next step of the lsXGC method to estimate the target Y for causal analysis. In this embodiment of the lsXGC method a source-augmented dimension-reduced incomplete representation n.sub.iaS will be used for estimating the target Y using information about the source S. The incomplete representation n.sub.i will be used for predicting the target Y without using information about the source S.

    [0105] FIG. 8 conceptually illustrates the dataset 32 of the dimension-reduced incomplete representation n.sub.i augmented to generate a dataset 36 that defines a source-augmented dimension-reduced incomplete representation n.sub.iaS. The incomplete representation n.sub.i is augmented with data S′ that has information about the source S. A simple example would be to augment n.sub.i with the source time-series S directly, i.e., S′=S. As a result of augmentation, the source-augmented representation n.sub.iaS includes information about the source S that is not included in the non-augmented incomplete representation n.sub.i. The source-augmented incomplete representation n.sub.iaS will be used for predicting the target time-series Y using information about the source S. The incomplete system n.sub.i will be used for predicting the target time-series Y without using information about the source S.

    [0106] In a third step 40 of the lsXGC method (see FIG. 1) calculate:

    [0107] (a) a first set of predictions {tilde over (y)} of future values of the target Y using the incomplete system n.sub.i having no information about the conditional source S, and

    [0108] (b) a second set of predictions ŷ of future values of the target Y using the source-augmented representation n.sub.iaS having information about the conditional source S.

    [0109] FIG. 9 illustrates use of the “sliding window” technique for structuring time-series data for supervised machine learning or other mathematical modeling, including but not limited to autoregressive or affine modeling, local models, linear or non-linear regression methods, to generate the set of predictions f. Note, however, that other techniques for time-series estimation that do not require fixed-length sliding windows, such as recurrent neural networks, including, but not limited to Echo State Networks (ESN) or Long Short-Term Memory (LSTM), may be used as time-series estimators as well. In the classical Granger causality setting using the sliding window example of FIG. 9, a number d of previous time-lagged values of all the time-series of the dataset 36 of the source-augmented incomplete representation n.sub.iaS are used as input variables for a prediction model that is used to predict a future value, e.g. the next, value y.sup.(t) of the target Y, where the sliding window W is a matrix of size (D+1)×d, with d the temporal width of the sliding window and D the number of time-series in the dimension-reduced incomplete system n.sub.i, and (D+1) the number of time-series in the source-augmented dimension-reduced system n.sub.iaS in the special case that the dimension of the time-series S′ carrying information about the source S that is used to augment n.sub.i for creating n.sub.iaS is one-dimensional. Let w.sup.(t) represent the column vector of W for time t. Then, a typical embodiment of a linear time-series prediction model can be an affine model, e.g., y.sup.(t)=b.sub.0+Σ.sub.τ=1.sup.d b.sub.τ.Math.w.sup.(t-τ), where b.sub.0 is a bias coefficient, b.sub.τ are coefficient vectors of dimension (D+1) and “.Math.” represents the scalar product based on element-wise multiplication. The model coefficients b.sub.0 and b.sub.τ can be calculated by suitable optimization techniques, such as least squares optimization for estimating y.sup.(t). Other linear time-series prediction models include, but are not limited to MVAR models used on augmented or non-augmented dimension-reduced representations of the complete or incomplete system, where the MVAR predictions belonging to the time-series of the non-augmented dimension-reduced representations are used for estimating y.sup.(t), such as by using inverse PCA in a similar way as in large-scale Granger Causality (lsGC). Other variants of the method are possible, such as a multi-step modeling approach, wherein (i) estimates, such as predictions, for values of the augmented and/or non-augmented dimension-reduced representations are calculated by any linear or non-linear modeling method, and (ii) such estimates are used for calculating estimated, e.g. predicted, values of the target dataset Y.

    [0110] In the embodiment of FIG. 9, the sliding window technique with a fixed window width d generates the predictions ŷ.sup.(t), t=d+1, . . . , T of the set of predictions f using the source-augmented representation n.sub.iaS. The sliding window technique is also used to generate the set of predictions {tilde over (y)} using the dimension-reduced representation n.sub.i of the incomplete system.

    [0111] Nonlinear estimation models, such as for time-series prediction, may be used as well. Variants of lsXGC, in which nonlinear estimation models are used, will be called large-scale Nonlinear Granger Causality (lsNGC). A typical embodiment would be to use Radial Basis Functions (RBF) or Generalized Radial Basis Functions (GRBF) neural networks. Here, a typical embodiment would include dimensionality reduction of the complete or incomplete original time-series system by clustering of extracted state-space vectors using a sliding window approach. The extracted windows represent state-space vectors that can be processed using clustering techniques, such as k-means, fuzzy c-means, self-organizing maps, agglomerative clustering or any clustering method known from prior art. As an example for state space clustering, let Q be a q-dimensional time-series of length T, such as the complete or the incomplete system, or the source or the target time-series, or any other one- or multidimensional time-series considered in this description, such as dimension-reduced and/or augmented complete or incomplete systems. Then sliding windows U(t) of size q×e can be extracted from time-series Q for each time t with t=e, . . . , T, with e the temporal width of the sliding window. The q×e-dimensional vectors U(t) can be clustered yielding K cluster prototype vectors, e.g. cluster centers, U.sub.k of dimension q×e with k=1, . . . , K. The computed cluster prototype vectors U.sub.k can then be used to obtain processed time-series representations, which contain information about the original time-series Q. A simple way to generate such processed time-series representations is to perform nonlinear kernel transformations of the original time-series, which exploit a distance measure between the values of a given time-series and each cluster center. In the above state-space clustering example for the originally q-dimensional time-series Q, one may compute a K-dimensional processed time-series Q′ (t)=(Q.sub.1.sup.′(t), . . . , Q.sub.k.sup.′(t), . . . , Q.sub.K.sup.′(t)).sup.T by using a Gaussian (Radial Basis Functions, RBF) kernel, i.e., Q.sub.k.sup.′(t)=exp(−((U(t)−U.sub.k).sup.2/(2σ.sup.2))) with kernel width σ, or using a normalized Gaussian (Generalized Radial Basis Functions, GRBF) kernel, i.e., Q.sub.k.sup.″(t)=(1/Σ)Q.sub.k.sup.′(t) with Σ=Σ.sub.k=1.sup.k Q.sub.k.sup.′(t).

    [0112] For example, a dimension-reduced representation n.sub.i=Q′ of the incomplete system shown in FIG. 5 may be obtained by state-space clustering of the incomplete system (corresponding to Method B in FIG. 4) or state-space clustering of the complete system with subsequent elimination of information about the conditional source time-series, such as by setting all contributions originating from the conditional source time-series to zero (corresponding to Method A in FIG. 4). Similarly, state-space clustering of the source or target time-series may be used to obtain processed, possibly multi-dimensional, time-series, which contain information about the source time-series or the target time-series. Using the example of FIG. 8, such processed time-series S′ containing information about the source time-series may be used to augment the dimension-reduced representation n.sub.i=Q′ of the incomplete system in FIG. 5 to obtain a source-augmented dimension-reduced representation of the incomplete system according to FIG. 8. Performing least-squares optimization on such a system for predicting a future, e.g. the next, value ŷ of the target time-series using a sliding window of width one and an affine model, as explained above, is equivalent to training the output weights of Radial Basis Functions (RBF) neural networks, if a Gaussian (RBF) kernel had been used to perform above nonlinear kernel transformation using state-space clustering of the original time-series. For example, one can calculate


    ŷ=a.sub.0+a.sub.1.Math.Q′+a.sub.2.Math.S′,  (1)

    where model coefficients a.sub.0, a.sub.1, and a.sub.2 can be estimated by least-squares optimization, and “.Math.” represents the scalar product based on element-wise multiplication. Here, as well, the equivalent sliding window technique and affine modeling can also be used to generate the set of predictions {tilde over (y)} using the dimension-reduced representation n.sub.i=Q′ of the incomplete system only, i.e. without information about the source time-series S, such as by computing


    ŷ=b.sub.0+b.sub.1.Math.Q′,  (2)

    where model coefficients b.sub.0 and b.sub.1 can be estimated using least-squares optimization.

    [0113] Note that instead of RBF or GRBF models, it is straightforward to use other non-linear modeling approaches, such as local models, e.g. [5] or [12], or multivariate non-linear regression techniques.

    [0114] In a fourth step 42 of the lsXGC method (see FIG. 1) calculate a measure of causal interdependence, such as a causality index F.sub.S.fwdarw.Y, e.g. by comparing target Y values with sets of predictions {tilde over (y)} and ŷ. Calculation of the Granger causality index was previously described. Another useful causality index F.sub.S.fwdarw.Y is the f-statistic:

    [0115] In the above example, the f-statistic can be obtained by recognizing that the two models, equations (1) and (2) above, can be characterized as an unrestricted model and a restricted model, respectively. Residual sum of squares (RSS) of the restricted model, RSS.sub.R, and residual sum of squares of the unrestricted model, RSS.sub.U, are obtained. Let n be the number of vectors, and p.sub.U and p.sub.R the numbers of parameters to be estimated for the unrestricted and the restricted model, respectively. For the example of equations (1) and (2) using state space clustering with K clusters for calculating n.sub.i=Q′, and S clusters for calculating S′, in (2) and (1), p.sub.R=K, and p.sub.U=K+S, respectively. Given these settings, the f-statistic lsXGC causality index F.sub.S.fwdarw.Y is given by:

    [00002] F S .fwdarw. Y = ( RSS R - RSS U ) / ( p U - p R ) ( RSS U ) / ( n - p U - 1 )

    [0116] Application of the lsXGC method for causal analysis between time-series of a high-dimensional multivariate functional MRI (fMRI) dataset is described next.

    [0117] FIG. 10 schematically illustrates an MRI scanner 44 that generates a raw whole brain fMRI multivariate voxel time-series dataset 46. It is desired to obtain a measure of causal influence between every pair of voxel time-series.

    [0118] An exemplar whole brain fMRI multivariate time-series dataset is composed of 100,000 voxels (volume pixels) within the brain that are sampled every 0.5 seconds for four minutes. The resulting fMRI dataset is a time-series X∈custom-character.sup.(100,000×480). Note that the number of voxels, sampling rate and time period used to collect fMRI data varies depending on the MRI scanner used, age and physical condition of the subject, and other variables. The fMRI dataset is a high-dimensional multivariate time-series having a relatively low number of temporal time values, that is, 100,000 time-series of length 480.

    [0119] The raw fMRI dataset 46 is pre-processed before analysis to generate a complete fMRI dataset 48 or complete system N.sub.c∈custom-character.sup.N×T that is used for causal analysis. Pre-processing may include, but is not limited to motion correction, removing a limited number of the initial values of the time-series to remove the effects of initial saturation, linear de-trending by high-pass filtering to remove effects of signal drifts, and normalizing the data to zero mean and unit standard deviation to focus on signal dynamics rather than amplitude.

    [0120] It is desired to quantify the causal influence of each time-series of the complete system N.sub.c on each of the other time-series of the complete system N.sub.c for generating a N.sup.2 affinity (connectivity) matrix. The analysis requires iterating through each time-series (row) of the complete system N.sub.c and having that time-series be a conditional source S or a target Y and, for that time-series, iterate through each time-series as the target Y or the conditional source S to evaluate the causal influence F.sub.S.fwdarw.Y of the source S on the target Y for that pair of time-series, the other time-series being confounding time-series. Note that, as a special case, the target and source time-series can be identical, i.e. S=Y for evaluating the causal influence of a time-series on itself. This special case is often referred to in the context of so-called “Granger autonomy” in prior art. The analysis is illustrated in the following pseudocode in which each row of the complete system N.sub.c is selected as the target time-series Y and each time-series is selected as the conditional source time series S:

    TABLE-US-00001 FOR each time-series x ϵ N.sub.c  Target Y := x  FOR each time-series x ϵ N.sub.c   Conditional Source S := x′   Calculate causality index F.sub.S.fwdarw.Y by 1sXGC method (previously    described above)    Step 1 - generate dimension-reduced representation(s)    Step 2 - augmentation of dimension-reduced representation(s)    Step 3 - generate predictions {tilde over (y)} and ŷ from augmented        representation(s)    Step 4 - calculate causality index F.sub.S.fwdarw.Y using sets of        predictions {tilde over (y)} and ŷ   Store F.sub.S.fwdarw.Y as element in N.sup.2 affinity matrix  ENDFOR ENDFOR

    [0121] For fMRI causality analysis, Method A (FIG. 4) is usually preferred for generating the dimension-reduced representation n.sub.i. Specifically, if one wants to analyze the causal influence of different conditional sources S of a complete system N.sub.c on a given target T, the dimensionality reduction of the complete system N.sub.c needs to be performed only once for all the possible conditional sources S of the given target T. Besides the potentially desirable stability of the dimensionality reduction result, this provides significant computational advantages, specifically when large, i.e. high-dimensional, systems such as fMRI multivariate data need to be analyzed with regards to causal influences among their components.

    [0122] The performance of the lsXGC method (specifically in its non-linear lsNGC variants) was also evaluated and compared to other methods using synthetic data with known connectivity structure. The performance of these methods to unveil known connectivity relationships can be evaluated using Receiver Operating Characteristic (ROC) analysis to quantitatively compare performance. The lsNGC method was found to perform better than current state-of-the-art analysis using Convergent Cross Mapping (CCM) [5] with a high statistical significance.

    [0123] The applicant has also found in using the lsNGC method for causality analysis of fMRI datasets and synthetic data that the time-series modeling approaches, e.g. supervised machine learning methods using the non-linear transformation functions, e.g. Q′ and S′ in the above example, do not have to be IIV invariant to generate useful results. That is, although IIV invariance is mathematically relevant, its relevance to real-world causality analysis of high dimensionality multivariate datasets is not so clear. It is suggested that data analysts use and compare both IIV invariant methods and IIV non-invariant modeling methods before settling on the machine learning or statistical modeling method to be used for lsNGC analysis of their datasets.

    [0124] Application of the disclosed lsXGC method for feature selection is discussed next.

    [0125] Analogies between causality analysis of multivariate time-series data and features using the lsNGC method is set out in Table 1:

    TABLE-US-00002 TABLE 1 Description Time-Series Features Components Number of Components N N Number of Values of each T T Component Relationship between Causality Relevancy Components index score

    [0126] FIG. 11 illustrates a feature dataset 50 that defines a complete system N.sub.c containing N features related to lung cancer and containing T sets of simultaneous values for those features obtained from a sample population. It is desired to evaluate the relative relevancy of each feature as a source S on the target Y of the feature “lung cancer” (presence or absence of lung cancer) to evaluate the relative relevancy or importance of each of the other features on the presence or absence of lung cancer.

    [0127] An analysis using a lsNGC variant of the lsXGC method was conducted using the lung cancer feature as the target Y and each of the other features as a source S to obtain a vector of relevancy indices to identify relevant factors for lung cancer.

    [0128] The lsNGC method was used in essentially the same manner as used with multivariate time-series datasets to determine causality indices such as Granger causality indices or f-statistic indices. The lung cancer feature was predicted using dimension-reduced and augmented dimension-reduced representations as described above for multivariate time-series analysis. One machine learning model predicts the lung cancer feature utilizing the information from all of the other features except for the source S, and the other machine learning model predicts the lung cancer feature utilizing the information from all of the other features including the source S. Because the feature values are not sequential (time-ordered) data, all the appropriate data and not merely a windowed portion of the data were used for generating predictions.

    [0129] The lsNGC method was able to successfully assign a high relevancy score to the five known relevant features of the data set even at small sample sizes. Results of the lsNGC method for feature selection appear very promising, but as always, the data analyst must investigate practical limitations of any method, the sufficiency of data, and other considerations when selecting and using any feature selection approach.

    [0130] The lsXGC method, including its lsNGC variants, such as used for time-series causality analysis or feature relevance detection or feature selection, is a computer-implemented method, as it is not practical for hand-calculation. FIG. 12 is a schematic representation of a computer implementation for implementation of the lsXGC method. The computer implementation includes a computer 52 having a database 54, a memory 56, and a processor 58. The database stores computer instructions for carrying out the lsXGC method and the multivariate data for analysis using the lsXGC method. The computer instructions may be in a tangible, non-transient, processor-usable object code format and/or source code format of a programming language (such as Python, C, C++, C #, Java, JavaScript, Ruby, R, MATLAB, or the like) for translation or compilation into processor-usable object code. The processor reads the computer instructions from the database through a memory, and can write data such as an affinity matrix to the database through the memory. The processor may include one or more central processing units (CPUs), each CPU having one core or multiple cores. The processor may include one or more graphical processing units (GPUs) designed for efficient arithmetic processing of the algorithms and data structures often used in machine learning. The components of the computer may be wholly contained in a computer case, be distributed in a local area network, or distributed in a wide area network or cloud. For example, some cloud providers offer internet access to GPUs for use with machine learning applications. The method can also be partly or fully implemented by using quantum computers.

    [0131] In a classic linear Granger causality application setting, multivariate autoregressive (MVAR) time-series estimation can be used, where the augmentation of dimension-reduced time-series systems by additional time-series, which contain information about the conditional source and/or target time-series, can be easily accomplished by adding extra dimensions for these additional time-series in the MVAR modeling approach. This is distinctly different from the large-scale Granger causality (lsGC) approach mentioned above. Note that the affine modeling approach explained previously can be seen as a one-dimensional version of an MVAR model.

    [0132] Note that, although the lsXGC method so far has been mainly described for embodiments related to time-series analysis, the method can easily be generalized to other contexts. Instead of interpreting the rows and columns of the data matrices of a complete series N.sub.c as time-series (rows) and temporal samples (columns), or the rows as features and the columns as multi-dimensional data samples, i.e. as vectors, whose elements represent the value of each feature for a given sample. Instead of estimating the value of a target time-series at given points in time (e.g., future, past, present), the target may represent any quantity that may be estimated from the data samples. Thus, the described lsXGC method can be used to calculate measures that can serve as statistical surrogates of the “causal influence” of each feature on any chosen target quantity. The resulting measures can be used to determine the “relevance” of each feature for estimating the target, which can be used for feature selection in function approximation or supervised machine learning tasks. This is specifically important in high-dimensional settings, where the number of features is high, but the number of available data samples is low. Note that for lsXGC, inherent assumptions made for Granger causality, such as on the direction of causality, are not required, as discussed below.

    [0133] The lsXGC method disclosed herein has been inspired by Granger causality, namely as a method for calculating a measure of causal influence of the conditional source on the target. It is very important to note that the lsXGC method is not limited to assumptions underlying any type of Granger causality. Such assumptions include, but are not limited to the following:

    [0134] (1) Direction of time: Granger causality is usually based on an assumed direction of time: most often, the future of the target time-series is estimated from the present or past of the (complete) time-series system. Alternatively, time-reversed Granger causality methods have been described as well, e.g. [4]. Note that for the lsXGC method described herein, any subset of the target time-series may be estimated from any subset of time-points of any subset of time-series of the complete time-series system. Specifically, past, present, or future values of any time-series may be used for estimating the target.

    [0135] (2) Direction of causality: Granger causality is typically used to calculate a measure of causal influence of the conditional source time-series on the target time-series. This implied direction of causality, i.e., from source to target, is not required for the lsXGC or its non-linear lsNGC variants. Specifically, lsXGC can be used for calculating a measure of causal influence of the target on the source, by using the computational steps explained herein. This does not mean simply re-naming source and target and repeating calculations with such exchanged roles. It means an actual inversion of the direction of inferred causality. At first glance, this may seem counter-intuitive, but is inspired by prior work on non-linear time-series analysis, e.g. [5]. Here, it is widely accepted that in many non-linear time-series systems, a causal driver may be estimated well from the causally driving time-series, but not vice-versa. In general, the computational steps of the lsXGC (lsNGC) method may be used to calculate measures of causal influence or any form of interdependence, such as motivated by appropriate information-theoretic measures, such as transfer entropy [6] or conditional mutual information [7], regardless whether such measures of causal influence or interdependence be unidirectional or bi-directional.

    [0136] (3) Direct inference of causality: Note that the disclosed method can be used independently from methods that compare quantities calculated in the presence and the absence of information about the source dataset, such as Granger causality, transfer entropy, or conditional mutual information. Instead, measures of causal interdependence between source and target dataset can be directly calculated, even in a multivariate setting, such as the estimation errors obtained when calculating estimates of the target using linear or non-linear modeling approaches. For example, one can use one or more of the augmented dimensionally modified representations calculated in the lsXGC method for estimating values of the target using any linear or non-linear multivariate modeling method, and use the obtained estimation errors as measures of causal interdependence directly, e.g. in a multivariate, e.g. sliding-window based, estimation approach. This is inspired by above cited methods for non-linear time-series modeling e.g. [5], and can be also seen as a multivariate extension of Mutual Connectivity Analysis (MCA) based on the disclosed invention. Here, a specific embodiment could also include (i) selecting a target dataset; (ii) augmenting a dimensionally modified representation of a complete system, which contains information about more than one possible source dataset, with information about one specifically chosen source dataset, and calculating estimation errors for estimating the target dataset as a direct measure of causal interdependence between source and target dataset; (iii) iterating step (ii) for a different choice of source dataset; and (iv) comparing such obtained causal interdependence measures. Note that such an approach is also applicable to features datasets.

    [0137] Given the aspects (1), (2), and (3) discussed in the preceding paragraph, the lsXGC (lsNGC) method draws a connection between Granger causality and certain traits of non-linear time-series analysis.

    [0138] For the above described generalization of the lsXGC method to features, inverting the direction of causality would mean to interpret the target as features, the relevance of which for the conditional source can be evaluated by using the computational steps described herein. The above comments on calculating measures of causal influence or any form of interdependence between source and target hold as well for feature selection.

    [0139] A “Divide and Conquer (D&C) strategy” for enhancing the computational efficiency in lsNGC is discussed next. Using the disclosed method, we can compute quantitative estimates for the presence of multivariate directed causal interaction for all N.sup.2 directed connections between N network nodes and store these in a N.sup.2 affinity matrix D. However, it still remains a significant challenge to implement this approach in practical situations, such as in functional MRI. Here, a major difficulty arises from the large number of time-series (N˜10.sup.3−10.sup.5) for analyzing effective connectivity on a fine-grained brain atlas parcellation template or even on a state-of-the-art neuro-imaging voxel resolution scale with whole-brain coverage.

    [0140] Instead of analyzing a small number, say N˜10.sup.2, of pre-selected brain regions of interest, such as in contemporary seed-based approaches to resting-state fMRI analysis, this leads to very large affinity matrices D with up to N˜10.sup.10 elements. Thus, for lsNGC, we will have to train up to 10.sup.10 models, e.g. supervised learning machines, as non-linear time-series estimators, e.g. 10.sup.10 neural networks, in order to compute all matrix elements of D. Even though high-performance computing equipment may be available to many widely, a ‘brute force’ approach to accomplish this goal still appears challenging. Here, a key observation is that the proposed lsNGC approach allows us to pursue an algorithmic ‘Divide and Conquer’ (D&C) strategy to successfully address the afore mentioned computational challenge.

    [0141] The D&C strategy is illustrated in FIG. 13. Instead of training N supervised learning (SVL) machines 60 for each source dataset directly, the D&C strategy disentangles the learning problem into one computationally expensive unsupervised learning (USL) step 62 and N subsequent computationally inexpensive SVL steps 64. In the lsNGC setting, each input source 66 corresponds to extracted multidimensional time-series windows resulting from a specific choice for the conditional source time-series S. As there are N such possible choices for S, the algorithmic complexity of calculating all N.sup.2 elements of the affinity matrix, i.e. all lsNGC indices, is O(N.sup.2). By decoupling USL and SVL steps when choosing RBF (or GRBF) neural networks for the lsNGC, as explained above, we can decrease the empirical complexity dramatically, namely from O(N.sup.2) to almost O(N), because the SVL step in (G)RBF networks is a computationally inexpensive matrix inversion. For a large number N of time-series as in fMRI analysis, the resulting increase of computational efficiency is substantial.

    [0142] The D&C strategy can be directly adopted for lsNGC scheme, if we exploit the specific properties of (G)RBF neural networks. These can be interpreted as supervised learning machines, which are partially trained in an unsupervised way. In the (G)RBF network, non-linear feature extraction requires a vector quantization step to determine cluster centers and a subsequent computation of corresponding hidden-layer neuron activities, both of which do not depend on information about the prediction target, i.e., the desired network output. These steps constitute the computationally most expensive components in (G)RBF neural network training. However, these steps need to be performed only once for each input, regardless of how many different output prediction targets should be approximated by the network. Although the subsequent training of the (G)RBF network output weights needs to be performed in a supervised manner for each output, the corresponding computational expense is low, because this only implies a linear regression matrix inversion. I.e., one (G)RBF neural network needs to be trained for each conditional source S, which amounts to N such networks, where each (G)RBF network acts as a non-linear estimator for all N possible target time-series Y in the N-dimensional system.

    [0143] Note that instead of using (G)RBF networks, we can similarly apply local models and multivariate regression techniques to apply the proposed D&C strategy. For example, when using such methods, defining a local neighborhood of a given data input, e.g. by k-nearest neighbor search, would correspond to the “unsupervised” learning step in FIG. 13, as a computationally expensive step, which, however, has to be performed only once for that source S (complexity O(N)), whereas calculating the local model or regression output for a single source/target pair (corresponding to the “supervised” learning step in FIG. 13) is computationally cheap, but has to be performed for all pairs of inputs and outputs in the system (complexity O(N.sup.2)). In summary, by following the D&C strategy outlined in FIG. 13, i.e., by disentangling unsupervised and supervised learning steps when choosing RBF neural networks, local models, or multivariate regression techniques as non-linear estimation models for lsNGC, we will be able to decrease the empirical complexity of the affinity matrix computation dramatically, namely from O(N.sup.2) to almost O(N) by decoupling the estimation procedure into an initial, computationally expensive step of complexity O(N), followed by a computationally inexpensive step of complexity O(N.sup.2). Note that the D&C approach can be applied to both time-series and features datasets.

    [0144] Further analysis of causal interdependence measures calculated by the disclosed lsXGC (lsNGC) method can be performed in manifold ways. Specifically, such measures can be used as inputs for subsequent modeling, e.g. machine learning tasks, either directly or indirectly, e.g. after extraction of graph-theoretic network features, for classification or regression. For example, in network analysis applications, such as fMRI brain connectivity described above, the calculated causal interdependence measures between specific brain regions, or even whole N.sup.2 connectivity matrices, may be used for classifying imaged human subjects as suffering from a certain neurologic condition or even estimate their clinical performance levels in neurocognitive tasks based on such fMRI analysis. Interestingly, the lsXGC method can also be applied for estimating the “relevance” of input features for such classification/regression tasks, e.g. for feature selection. Also, one can perform community detection in network graphs calculated from lsXGC (lsNGC)-derived connectivity matrices, such as by using graph-theoretic methods, the Louvain method, or non-metric clustering. For such network analysis and network-based inference applications, geometric deep learning methods, such as graph convolution networks (GCN) can be used.

    [0145] While this disclosure includes one or more illustrative embodiments described in detail, it is understood that the one or more embodiments are each capable of modification and that the scope of this disclosure is not limited to the precise details set forth herein but include such modifications that would be obvious to a person of ordinary skill in the relevant art including (but not limited to) changes in supervised and unsupervised machine learning algorithms, dimensional modification algorithms, measures of causality, computer hardware and software for computer implementation, environment or field of use, ordering of method steps, and the like, as well as such changes and alterations that fall within the purview of the following claims.

    REFERENCES

    [0146] [1] Axel Wismüller et al., Pair-wise Clustering of Large Scale Granger Causality Index Matrices for Revealing Communities. Proc. of SPIE, Int. Soc. Opt. Eng. 2014 Mar. 13; 9038: doi:10.1117/12.2044340 (2014) [0147] [2] Adora M. DSouza, Anas Z. Abidin, Lutz Leistritz, and Axel Wismüller: Exploring connectivity with large-scale Granger causality on resting-state functional MRI. Journal of Neuroscience Methods 287:68-79 (2017) [0148] [3] John A. Lee, Michel Verleysen, Nonlinear Dimensionality Reduction, Springer, 2007. [0149] [4] Irene Winkler, Danny Panknin, Daniel Bartz, Klaus-Robert Müller and Stefan Haufe: Validity of time reversal for testing Granger causality. arXiv:1509.07636v2 [math.ST]11 Feb. 2016 [0150] [5] G. Sugihara, R. May, H. Ye, C. H. Hsieh, E. R. Deyle, M. Fogarty, and S. Munch: Detecting causality in complex ecosystems. Science 338:496-500, 2012 [0151] [6] Schreiber T: Measuring information transfer. Phys Rev Lett 85(2):461-464 (2000) [0152] [7] M. Palus, V. Komarek, Z. Hrncir, and K. Sterbova, Phys. Rev. E 63, 046211 (2001) [0153] [8] Granger C., Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37:424-38 (1969) [0154] [9] Marinazzo D, et al., Nonlinear parametric model for Granger analysis of time series, Phys. Rev. E 73 066216 (2006) [0155] [10] Ancona N, Stramaglia S, An invariance property of predictors in kernel induced hypothesis spaces. Neural Computation, volume 18, no. 4 (2006) [0156] [11] Guyon et al. Causal feature selection. Technical report, Berkeley, Calif., March 2007 [0157] [12] Nadaraja E., On estimating regression. Theory Probabil Appl 9(1):141-142 (1964)