Parametric and non-parametric criteria for causal inference from time-series

Daniel Chicharro

Center for Neuroscience and Cognitive Systems@UniTn, Istituto Italiano di Tecnologia, Via Bettini 31, 38068 Rovereto (TN), e-mail: [email protected]

Abstract Granger causality constitutes a criterion for causal inference from time series that has been widely applied to study causal interactions in the brain from electrophysiological recordings. This criterion underlies the classical parametric implementation in terms of linear autoregressive processes as well as transfer entropy, i.e. a non-parametric implementation in the framework of information theory. In the spectral domain, partial directed coherence and the Geweke formulation are related to Granger causality but rely on alternative criteria for causal inference which are inherently based on the parametric formulation in terms of autoregressive processes. Here we clearly differentiate between criteria for causal inference and measures used to test them. We compare the different criteria for causal inference from time-series and we further introduce new criteria that complete a unified picture of how the different approaches are related. Furthermore, we compare the different measures that implement these criteria in the information-theoretic framework.

1 Introduction

The inference of causality in a system of interacting processes from recorded time-series is a subject of interest in many fields. Particularly successful has been the concept of Granger causality [29, 31], originally applied to economic time-series. In recent years, measures of causal inference have also been widely applied to electrophysiological signals, in particular to characterize causal interactions between different brain areas (see [46, 28, 10] for a review of Granger causality measures applied to neural data).

In the original formulation of Granger causality, causality from a process Y to a process X was examined based on the reduction of the prediction error of X when including the past of Y [60, 29]. However, this prediction error criterion generalizes to a criterion of conditional independence on probability distributions [31] that is generally applicable to stationary and non-stationary stochastic processes. Here we consider the criterion of Granger causality together with related criteria of causal inference, like Sims causality [55]. We also consider the criteria underlying other measures that have been introduced to infer causality but for which the underlying criterion has not been made explicit. This includes the Geweke spectral measures of causality (GSC) [25, 26] and partial directed coherence (PDC) [5].

We make a clear distinction between criteria for causal inference and measures implementing them. Accordingly, we refer by Granger causality to the general criterion of causal inference and not, as is often the case, to the measure implementing it for linear processes. This means that we consider transfer entropy [54] as a particular measure to test for Granger causality in the information-theoretic framework (e.g. [56, 1]). This distinction between criteria and measures is important because in practice one is usually not only interested in assessing the existence of a causal connection but in evaluating its strength (e.g. [11, 9, 8, 52, 59]). Causal inference can be associated with the construction of a causal graph representing which connections exist in the system [19].
However, quantifying the causal effects resulting from these connections is a more difficult task. Recently, [16] examined how the general notion of causality developed by Pearl [45] can be applied to study the natural dynamics of complex systems. This notion is based on the idea of externally manipulating the system to evaluate causal effects. For example, if one is studying causal connectivity in the brain, this manipulation could be the deactivation of some connections between brain areas, or the electrical stimulation of a given area. It is clear that these manipulations alter the normal dynamics of the brain, that is, exactly the dynamics one wants to analyze in order to understand neural computations. Accordingly, [16] pointed out that if the main interest is not the effect of external perturbations, but how the causal connections participate in the generation of the unperturbed dynamics of the system, then only in some cases is it meaningful to characterize interactions between different subsystems in terms of the effect of one subsystem over another. To identify these cases the notion of natural causal effects between dynamics was introduced and conditions for their existence were provided. Consequently, Granger causality measures, and in particular transfer entropy, cannot be used in general as measures of the strength of causal effects [4, 39]. Alternatively, a different approach was developed in [15]. Instead of examining the causal effects resulting from the causal connections, a unifying multivariate framework was proposed to study the dynamic dependencies between the subsystems that arise from the causal interactions.

Considering this, we here focus on the criteria for causal inference, and the measures are only used as statistics to test these criteria. We closely follow [14] in relating the different formulations of Granger causality to the corresponding criteria of causal inference, and in integrating parametric and non-parametric formulations, as well as time-domain and spectral formulations, for both bivariate and multivariate systems. Furthermore, we do not discuss the fundamental assumptions that determine the valid applicability of the criterion of Granger causality. In particular we assume that all the relevant processes are observed and well-defined. This is of course a big idealization for real applications, but our purpose is to examine the relation between the different criteria and measures that appear in the different formulations of Granger causality. (For a detailed discussion of the limitations of these criteria see [58, 16].) More generally, [45] offers a complete explanation of the limitations of causal inference without intervening on the system.

This Chapter is organized as follows: In Section 2 we review the non-parametric formulation of the criteria of Granger and Sims causality and the information-theoretic measures, including transfer entropy, used to test them. In Section 3 we review the parametric autoregressive representation of the processes and the time-domain and spectral measures of Granger causality, in particular GSC and PDC. We make explicit the parametric criteria of causal inference underlying these measures and discuss their relation to the non-parametric criteria. Furthermore we introduce related new criteria for causal inference that allow us to complete a consistent unifying picture that integrates all the criteria and measures. This picture is presented all together in Section 4.
2 Non-parametric approach to causal inference from time-series

We here review Granger causality and Sims causality as non-parametric criteria to infer causality from time-series, as well as some measures used to test them. Although both the criterion of Granger causality [29, 30] and the criterion of Sims causality [55] were originally introduced in combination with a linear formulation, we here consider their general non-parametric expression [31, 12].

2.1 Non-parametric criteria for causal inference

In [31] a general criterion for causal inference from time-series was stated based on the comparison of two probability distributions. We consider first its bivariate formulation. Assume that for the processes X and Y we record two time-series $\{X\} = \{X_1, X_2, \ldots, X_N\}$ and $\{Y\} = \{Y_1, Y_2, \ldots, Y_N\}$. Granger causality states that there is no causality from Y to X if the equality

$p(X_{t+1}|X^t) = p(X_{t+1}|X^t, Y^t) \quad \forall X^t, Y^t$   (1)

holds. Here $X^t = \{X_t, X_{t-1}, \ldots, X_1\}$ is the past of the process at time t. From now on we will assume stationarity so that the results do not depend on the particular time. Therefore we consider $N \to \infty$ and select t such that $X^t$ accounts for the infinite past of the process. See [56, 15] for a non-stationary formulation. According to Eq. 1, Granger causality indicates that there is no causality from Y to X when the future $X_{t+1}$ is conditionally independent of the past $Y^t$ given its own past $X^t$. That is, the past of Y has no dependence with the future of X that cannot be accounted for by the past of X.

As an alternative criterion, Sims causality [55] examines the equality

$p(X^{t+1:N}|X^t, Y^t) = p(X^{t+1:N}|X^t, Y^t, Y_{t+1}) \quad \forall X^t, Y^t, Y_{t+1}.$   (2)

It states that there is no causality from Y to X if the whole future $X^{t+1:N}$ is conditionally independent of $Y_{t+1}$ given the past of the two processes. In fact, assuming stationarity it is not necessary to condition on $Y^t$, so that, like Granger causality, the criterion indicates that the future of X is completely determined by its own past (see [37] for a detailed review of the relation between the two criteria).
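As a concrete reference point for these criteria, the following minimal sketch (ours, in Python; the coefficients, model order, and names are arbitrary illustrative choices, not part of the original formulation) simulates a stationary bivariate system with a single causal connection Y → X, so that the equality of Eq. 1 fails for the direction Y → X and holds for X → Y. We reuse this toy system in the sketches below.

```python
import numpy as np

def simulate_var1(n=20000, c=0.5, seed=0):
    """Toy stationary bivariate VAR(1) with unidirectional coupling
    Y -> X of strength c and no coupling X -> Y."""
    rng = np.random.default_rng(seed)
    x, y = np.zeros(n), np.zeros(n)
    for t in range(n - 1):
        x[t + 1] = 0.5 * x[t] + c * y[t] + rng.standard_normal()
        y[t + 1] = 0.5 * y[t] + rng.standard_normal()
    return x, y

x, y = simulate_var1()
```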
While Granger causality and Sims causality are equivalent criteria for the bivariate case [12], this is not true for multivariate processes. When other processes also interact with X and Y it is necessary to distinguish a causal connection from Y to X from other connections that also result in statistical dependencies incompatible with the equality in Eq. 1. These other connections are indirect causal connections $Y \to Z \to X$ as well as the effect of common drivers, i.e. a common parent Z such that $Z \to Y$ and $Z \to X$. The formulation of Granger causality turns out to be easily generalizable to account for these influences, resulting in the equality

$p(X_{t+1}|X^t, Z^t) = p(X_{t+1}|X^t, Y^t, Z^t) \quad \forall X^t, Y^t, Z^t,$   (3)

where $Z^t$ refers to the past of any other process that interacts with X and Y. In fact, on which processes one needs to condition depends on the particular causal structure of the system, which is exactly what one wants to infer. This renders the criterion of Granger causality context dependent [31]. This means that if Z does not include all the relevant processes, a false positive can be obtained when testing for causality from Eq. 3. The problem of hidden variables for causal inference is an issue not specific to time-series that in general can only be addressed by an interventional treatment of causality [45]. In practice, from observational data, some procedures can help to optimize the selection of the variables on which to condition [22, 41]. In this Chapter we do not deal further with this problem and we assume that all the relevant processes are observed.

In contrast to Granger causality, Sims causality cannot be generalized to the multivariate case as a criterion for causal inference. The reason is that, since in Eq. 2 the whole future $X^{t+1:N}$ is considered jointly, there is no way to disentangle direct from indirect causal connections from Y to X. This means that for multivariate processes the criterion of Granger causality in Eq. 3 remains the unique non-parametric criterion for causal inference between the time series.

2.2 Measures to test for causality

In this Chapter we want to clearly differentiate between the criteria for causal inference and the particular measures used to test for causality according to these criteria. This is why we refer by Granger causality to the general criterion proposed in [31] (Eqs. 1 and 3), so that Granger causality measures include both the transfer entropy and the linear Granger causality measure. The linear measure, which quantifies the predictability improvement [60], implements for linear processes a test on the equality of the means of the distributions appearing in Eq. 1. More generally, if one wants to test for the equality between two probability distributions without examining specific moments of a given order, the Kullback-Leibler divergence (KL-divergence) [38]

$KL(p^*(x)\,\|\,p(x)) = \sum_x p^*(x) \log \frac{p^*(x)}{p(x)}$   (4)

is a non-negative measure that is zero if and only if the two distributions are identical. For a multivariate variable X, since it quantifies the divergence of the distribution $p(x)$ from $p^*(x)$, one can construct $p(x)$ to reflect a specific null-hypothesis about the dependence between the components of X. As a particular application of the KL-divergence to quantify the interdependence between random variables one has the conditional mutual information

$I(X; Y|Z) = \sum_{x,y,z} p(x, y, z) \log \frac{p(x|y, z)}{p(x|z)}.$   (5)

We can see that the form of the probability distributions in the argument of the logarithm is the same as the ones in Eqs. 1-3. Accordingly, testing the equality of Eq. 1 is equivalent to testing for a zero transfer entropy [54, 44]

$T_{Y \to X} = I(X_{t+1}; Y^t | X^t) = 0.$   (6)

An analogous information-theoretic measure of Sims causality is obtained, so that Eq. 2 leads to

$S_{Y \to X} = I(Y_{t+1}; X^{t+1:N} | Y^t, X^t) = 0.$   (7)

For multivariate processes Eq. 3 leads to a zero conditional transfer entropy

$T_{Y \to X|Z} = I(X_{t+1}; Y^t | X^t, Z^t) = 0.$   (8)
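As an illustration of how Eq. 6 can be evaluated from data, the sketch below (ours) uses a naive binned plug-in estimator, truncating the past to a single lag, i.e. assuming the processes are Markovian of finite order as in [54]. Such estimators are biased for small samples, and in practice the significance of nonzero values has to be assessed, e.g. with surrogate data (see [34] for a review of estimation issues).

```python
import numpy as np

def transfer_entropy_binned(src, dst, n_bins=8):
    """Plug-in estimate of T_{src->dst} = I(dst_{t+1}; src_t | dst_t)
    (Eq. 6), truncating the past to one lag and discretizing into
    equally populated bins."""
    def digitize(v):
        edges = np.quantile(v, np.linspace(0, 1, n_bins + 1)[1:-1])
        return np.digitize(v, edges)
    d, s = digitize(dst), digitize(src)
    joint = np.zeros((n_bins,) * 3)
    for i, j, k in zip(d[1:], d[:-1], s[:-1]):   # (dst_{t+1}, dst_t, src_t)
        joint[i, j, k] += 1
    p = joint / joint.sum()
    p_d1d0 = p.sum(axis=2)                        # p(dst_{t+1}, dst_t)
    p_d0s0 = p.sum(axis=0)                        # p(dst_t, src_t)
    p_d0 = p.sum(axis=(0, 2))                     # p(dst_t)
    nz = p > 0
    num = p * p_d0[None, :, None]
    den = p_d1d0[:, :, None] * p_d0s0[None, :, :]
    return np.sum(p[nz] * np.log(num[nz] / den[nz]))

print(transfer_entropy_binned(y, x), transfer_entropy_binned(x, y))
```

On the toy system above, the first value (an estimate of $T_{Y \to X}$) is clearly positive, while the second is close to zero up to the estimation bias.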
[54] introduced the transfer entropy to test the equality of Eq. 1, further assuming that the processes are Markovian with a finite order. A similar information-theoretic quantity, the directed information, has been introduced in the context of communication theory [42, 43, 36]. The directed information was originally formulated for the non-stationary case and naturally appears in a causal decomposition of the mutual information (e.g. [1]). Such a decomposition can also be expressed in terms of transfer entropies, and is valid both for a non-stationary formulation of the measures which is local in time and for another that is cumulative over the whole time series [15]. These two formulations converge for the stationary case, resulting in

$I(X^N; Y^N) = T_{Y \to X} + T_{X \to Y} + T_{X \cdot Y},$   (9)

where $T_{X \cdot Y}$ is a measure of instantaneous causality. From this relation it can be checked that, for both the cumulative non-stationary formulation and the stationary one, if there is no instantaneous causality

$T_{Y \to X} = H(X_{t+1}|X^t) - H(X_{t+1}|X^t, Y^t) = H(Y_{t+1}|X^t, Y^t) - H(Y_{t+1}|X^N, Y^t) = S_{Y \to X}.$   (10)

This equality, restricted to the stationary linear case, is indicated already in Theorem 1(ii) of [25], where no instantaneous causality is enforced by a normalization of the covariance matrix.

Notice that here we consider the measures as particular instantiations of the KL-divergence used as a statistic for hypothesis testing [38]. This is important to keep in mind because the KL-divergence can be interpreted as well in terms of code length [17], and in particular the transfer entropy (directed information) determines the error-free transmission rate when applied to specific communication channels with feedback [36] (and see also [47] for a discussion of different applications of transfer entropy). Furthermore, any conditional mutual information can be evaluated as a difference of two conditional entropies, and interpreted as a reduction of uncertainty. To test for causality only the significance of nonzero values is of interest, but it is common to use the values of $T_{Y \to X}$ to characterize the causal dependencies. Alternatively, the value of $S_{Y \to X}$ could be used, giving a not necessarily equivalent characterization if the conditions of Eq. 10 are not fulfilled, or depending on the particular estimation procedure.

More generally, the KL-divergence is not the only option to test the criteria of causality above in a non-parametric way. Other measures have been proposed based on the same criterion (e.g. [33, 2]) that are sensitive to higher-order moments of the distributions. A natural alternative that also considers all the moments of the distributions is to use the Fisher information

$F(Y; x) = \int dY\, p(Y|x) \left( \frac{\partial \ln p(Y|x)}{\partial x} \right)^2,$   (11)

which, by means of the Cramer-Rao bound [17], is related to the accuracy of an unbiased estimator of x from Y. For the particular equality of Eq. 1 this leads to testing

$E_{y^t}[F(X_{t+1}; y^t | X^t)] = 0.$   (12)

In the Appendix we examine this expression in detail for linear Gaussian autoregressive processes.

3 Parametric approach to causal inference from time-series

The criteria of Section 2.1 do not assume any particular form of the processes. In contrast, in the implementation originally introduced by [29], the processes are assumed to have a linear autoregressive representation. Here by parametric we refer specifically to the assumption of this representation. Notice that this is different from a parametric approach in which not the processes but the probability distributions are estimated parametrically, for example using generalized linear models [49]. We first review the autoregressive representation of stationary stochastic processes for bivariate and multivariate systems, describing the projections used in the different linear formulations of Granger causality. We then review these formulations, in particular the Geweke formulation in the temporal and spectral domain [25, 26] and partial directed coherence [5, 53]. Apart from stationarity we will assume that there is no instantaneous causality, i.e. that the covariance matrices of the innovation terms in the autoregressive representation are diagonal.
This substantially simplifies the formulation, avoiding a normalization step [25, 18]. Furthermore, strictly speaking, the existence of instantaneous causality is a signature of temporal or spatial aggregation, or of the existence of hidden variables, which questions the validity of the causal inference [30].

3.1 The autoregressive process representation

Consider the system formed by the stationary stochastic processes X and Y. Two projections are required to construct the bivariate linear measure of Granger causality from Y to X. First, the projection of $X_{t+1}$ on its own past:

$X_{t+1} = \sum_{s=0}^{\infty} a^{(x)}_{xs} X_{t-s} + \epsilon^{(x)}_{x,t+1}, \quad var(\epsilon^{(x)}_x) = \Sigma^{(x)}_x;$   (13)

second, its projection on the past of both X and Y:

$X_{t+1} = \sum_{s=0}^{\infty} a^{(xy)}_{xxs} X_{t-s} + a^{(xy)}_{xys} Y_{t-s} + \epsilon^{(xy)}_{x,t+1}$
$Y_{t+1} = \sum_{s=0}^{\infty} a^{(xy)}_{yxs} X_{t-s} + a^{(xy)}_{yys} Y_{t-s} + \epsilon^{(xy)}_{y,t+1}$   (14)

$\Sigma^{(xy)} = \begin{pmatrix} \Sigma^{(xy)}_{xx} & \Sigma^{(xy)}_{xy} \\ \Sigma^{(xy)}_{yx} & \Sigma^{(xy)}_{yy} \end{pmatrix},$   (15)

where $\Sigma^{(xy)}_{xx} = var(\epsilon^{(xy)}_x)$, $\Sigma^{(xy)}_{yy} = var(\epsilon^{(xy)}_y)$, $\Sigma^{(xy)}_{xy} = cov(\epsilon^{(xy)}_x, \epsilon^{(xy)}_y)$, and $\Sigma^{(xy)}_{yx} = \Sigma^{(xy)T}_{xy}$. Notice that while the subindexes are used to refer to the corresponding variable or to components of a matrix, the superindexes refer to the particular projection. As we said above, we assume that $\Sigma^{(xy)}$ is diagonal.

[25] also proved the equality between Granger and Sims causality measures for linear autoregressive processes. For that purpose the projection of $Y_{t+1}$ on the whole process X is also needed:

$Y_{t+1} = \sum_{s=-\infty}^{\infty} b^{(xy)}_{xs} X_{t-s} + \eta^{(xy)}_{y,t+1}.$   (16)

For multivariate systems we consider the fully multivariate autoregressive representation of the system $W = \{X, Y, Z\}$:

$X_{t+1} = \sum_{s=0}^{\infty} a^{(xyz)}_{xxs} X_{t-s} + a^{(xyz)}_{xys} Y_{t-s} + a^{(xyz)}_{xzs} Z_{t-s} + \epsilon^{(xyz)}_{x,t+1}$
$Y_{t+1} = \sum_{s=0}^{\infty} a^{(xyz)}_{yxs} X_{t-s} + a^{(xyz)}_{yys} Y_{t-s} + a^{(xyz)}_{yzs} Z_{t-s} + \epsilon^{(xyz)}_{y,t+1}$   (17)
$Z_{t+1} = \sum_{s=0}^{\infty} a^{(xyz)}_{zxs} X_{t-s} + a^{(xyz)}_{zys} Y_{t-s} + a^{(xyz)}_{zzs} Z_{t-s} + \epsilon^{(xyz)}_{z,t+1}$

$\Sigma^{(xyz)} = \begin{pmatrix} \Sigma^{(xyz)}_{xx} & \Sigma^{(xyz)}_{xy} & \Sigma^{(xyz)}_{xz} \\ \Sigma^{(xyz)}_{yx} & \Sigma^{(xyz)}_{yy} & \Sigma^{(xyz)}_{yz} \\ \Sigma^{(xyz)}_{zx} & \Sigma^{(xyz)}_{zy} & \Sigma^{(xyz)}_{zz} \end{pmatrix}.$   (18)

As for the bivariate case, we assume that $\Sigma^{(xyz)}$ is diagonal. Apart from the joint autoregressive representation of W, to calculate the conditional GSC from Y to X the projection of $X_{t+1}$ only on the past of X and Z is also needed:

$X_{t+1} = \sum_{s=0}^{\infty} a^{(xz)}_{xxs} X_{t-s} + a^{(xz)}_{xzs} Z_{t-s} + \epsilon^{(xz)}_{x,t+1}$
$Z_{t+1} = \sum_{s=0}^{\infty} a^{(xz)}_{zxs} X_{t-s} + a^{(xz)}_{zzs} Z_{t-s} + \epsilon^{(xz)}_{z,t+1}$   (19)

$\Sigma^{(xz)} = \begin{pmatrix} \Sigma^{(xz)}_{xx} & \Sigma^{(xz)}_{xz} \\ \Sigma^{(xz)}_{zx} & \Sigma^{(xz)}_{zz} \end{pmatrix}.$   (20)
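In practice the infinite-order projections are truncated to a finite order p and estimated by least squares. A minimal sketch (ours; p and the implementation details are arbitrary choices, assuming the finite order is adequate for the data):

```python
import numpy as np

def fit_var(series, p=5):
    """Least-squares fit of the projections of Section 3.1 truncated
    to a finite order p. `series` is a list of 1-D arrays, one per
    process; the first one plays the role of X. Returns the stacked
    lag coefficients, the innovations covariance (e.g. Sigma^{(xy)}
    of Eq. 15) and the innovation (residual) series."""
    n, k = len(series[0]), len(series)
    target = np.stack([s[p:] for s in series], axis=1)     # (n-p, k)
    # Regressors: lags 1..p of every process, stacked column-wise
    lags = np.concatenate([np.stack([s[p - l:n - l] for s in series], axis=1)
                           for l in range(1, p + 1)], axis=1)
    coef, *_ = np.linalg.lstsq(lags, target, rcond=None)
    resid = target - lags @ coef
    Sigma = np.atleast_2d(np.cov(resid.T))
    return coef, Sigma, resid

# Univariate projection of X (Eq. 13) and bivariate projection (Eq. 14)
_, S_x, _ = fit_var([x])
_, S_xy, _ = fit_var([x, y])
```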
3.2 Parametric measures of causality

The autoregressive representations described in Section 3.1 have been used to define many measures related to the criterion of Granger causality. We here focus on the Geweke measures [25, 26] and partial directed coherence [5]. Other measures introduce some variation or refinement of these measures to deal with estimation problems or to attenuate the influence of hidden variables (e.g. [13, 32, 52]). Furthermore, the directed transfer function [35] is another related measure [14], but it is only equivalent to the Geweke measure for bivariate systems [20].

3.2.1 The Geweke measures of Granger causality

The temporal formulation and the relation between linear Granger causality and transfer entropy

Granger [29, 30] proposed to test for causality from Y to X by examining whether there is an improvement in the predictability of $X_{t+1}$ when the past of Y is used in addition to the past of X for an optimal linear predictor. For a linear predictor $h(X^t)$, using only information from the past of X, the squared error is determined by

$E^{(x)} = \int dX_{t+1}\, dX^t\, (X_{t+1} - h(X^t))^2\, p(X_{t+1}, X^t),$   (21)

and analogously for $E^{(xy)}$ using information from the past of X and Y. Since the optimal linear predictor is the conditional mean [40], we have that

$E^{(x)} = \int dX^t\, p(X^t) \int dX_{t+1}\, (X_{t+1} - E_{X_{t+1}}[X_{t+1}|X^t])^2\, p(X_{t+1}|X^t) = E_{X^t}[\sigma^2(X_{t+1}|X^t)].$   (22)

If the autoregressive representation of Eq. 13 is assumed to be valid, the variance $\sigma^2(X_{t+1}|X^t)$ does not depend on the value of $X^t$ and we have

$E^{(x)} = E_{X^t}[\sigma^2(X_{t+1}|X^t)] = \Sigma^{(x)}_x.$   (23)

An analogous equality is obtained for $E^{(xy)}$, so that the Geweke measure of Granger causality is defined as

$G_{Y \to X} = \ln \frac{\Sigma^{(x)}_x}{\Sigma^{(xy)}_{xx}},$   (24)

using the autoregressive representation of Eqs. 13-15. This measure, as indicated in [31], tests if there is causality from Y to X in mean, that is, the equality

$E_{X_{t+1}}[X_{t+1}|X^t] = E_{X_{t+1}}[X_{t+1}|X^t, Y^t] \quad \forall X^t, Y^t.$   (25)

Accordingly, given Eqs. 1 and 25, it is clear that

$G_{Y \to X} \neq 0 \Rightarrow T_{Y \to X} \neq 0,$   (26)

since the first only tests for a difference in the moment of order 1 and the other in the whole probability distribution. In principle, the opposite implication is not always true. However, since Eq. 25, as well as Eqs. 1-3, imposes a stack of constraints (one for each value of the conditioning variables), we expect that, at least in general, an inequality in higher-order moments is accompanied by one in the conditional means. Furthermore, when the autoregressive representations are assumed to be valid, testing for the equality in the mean or the variance of the distributions is equivalent, given Eq. 23 and given that the conditional variance is independent of the conditioning value. Notice that Gaussianity does not have to be assumed for this equality, and in general in [25] it is only further assumed to find the distribution of the measures under the null-hypothesis of no causality.

The explanation above further relates the distinction in [31] between causation in mean (Eq. 25) and causation prima facie (Eq. 1) to the equivalence between the Geweke linear measure of Granger causality $G_{Y \to X}$ and the transfer entropy for Gaussian processes. Since a Gaussian probability distribution is completely determined by its first two moments, and the conditional variance is independent of the conditioning value, it is clear from the explanation above that for Gaussian variables causation in mean and causation prima facie have to be equivalent. In practice this can be seen [7] taking into account that the entropy of an N-variate Gaussian distribution is completely determined by its covariance matrix $\Sigma$:

$H(X^N_{Gaussian}) = \frac{1}{2} \ln\left((2\pi e)^N |\Sigma|\right).$   (27)

Accordingly, the two measures are such that:

$G_{Y \to X} = 2\, T_{Y \to X}.$   (28)

For multivariate processes the conditional GSC [26] is defined in the time domain analogously to $G_{Y \to X}$ in Eq. 24, but now using the autoregressive representations of Eqs. 17-20:

$G_{Y \to X|Z} = \ln \frac{\Sigma^{(xz)}_{xx}}{\Sigma^{(xyz)}_{xx}}.$   (29)

It is straightforward to see that, given the form of the entropy for Gaussian variables (Eq. 27) and the definition of the conditional transfer entropy $T_{Y \to X|Z}$ (Eq. 8), the relation between Granger causality and transfer entropy also holds for the conditional measures for Gaussian variables:

$G_{Y \to X|Z} = 2\, T_{Y \to X|Z}.$   (30)
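Reusing fit_var from the sketch above, the time-domain measures of Eqs. 24 and 29 reduce to log-ratios of estimated residual variances. A minimal sketch (ours; note that plug-in estimates of these measures are positively biased for finite samples):

```python
import numpy as np

def geweke_gc(x_series, y_series, z_series=None, p=5):
    """Time-domain Geweke measure G_{Y->X} (Eq. 24) or, if z_series
    is given, the conditional G_{Y->X|Z} (Eq. 29), as the log-ratio
    of the residual variances of the reduced and full projections."""
    reduced = [x_series] + ([z_series] if z_series is not None else [])
    _, S_red, _ = fit_var(reduced, p)
    _, S_full, _ = fit_var(reduced + [y_series], p)
    return np.log(S_red[0, 0] / S_full[0, 0])

G_yx = geweke_gc(x, y)   # clearly positive for the toy system
G_xy = geweke_gc(y, x)   # close to zero
# By Eq. 28, for Gaussian processes G_yx / 2 can be compared directly
# with an estimate of the transfer entropy T_{Y->X}.
```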
The spectral formulation

Geweke [25] also proposed a spectral decomposition of the time-domain Granger causality measure (Eq. 24). Geweke derived the spectral measure of causality from Y to X, $g_{Y \to X}(\omega)$, requiring the fulfillment of some properties:

1. The spectral measure should have an intuitive interpretation so that the spectral decomposition is useful for empirical applications.
2. The measure has to be nonnegative.
3. The temporal and spectral measures have to be related so that

$\frac{1}{2\pi} \int_{-\pi}^{\pi} g_{Y \to X}(\omega)\, d\omega = G_{Y \to X}.$   (31)

Conditions two and three imply that

$G_{Y \to X} = 0 \Leftrightarrow g_{Y \to X}(\omega) = 0 \quad \forall \omega.$   (32)

The GSC is obtained from the spectral representation of the bivariate autoregressive process as follows. Fourier transforming Eq. 14 leads to

$\begin{pmatrix} A^{(xy)}_{xx}(\omega) & A^{(xy)}_{xy}(\omega) \\ A^{(xy)}_{yx}(\omega) & A^{(xy)}_{yy}(\omega) \end{pmatrix} \begin{pmatrix} X(\omega) \\ Y(\omega) \end{pmatrix} = \begin{pmatrix} \epsilon^{(xy)}_x(\omega) \\ \epsilon^{(xy)}_y(\omega) \end{pmatrix},$   (33)

where $A^{(xy)}_{xx}(\omega) = 1 - \sum_{s=1}^{\infty} a^{(xy)}_{xxs} e^{-i\omega s}$, as well as $A^{(xy)}_{xy}(\omega) = -\sum_{s=1}^{\infty} a^{(xy)}_{xys} e^{-i\omega s}$, and analogously for $A^{(xy)}_{yy}(\omega)$, $A^{(xy)}_{yx}(\omega)$. The coefficients matrix $A^{(xy)}(\omega)$ can be inverted into the transfer function $H^{(xy)}(\omega) = (A^{(xy)})^{-1}(\omega)$, so that

$\begin{pmatrix} X(\omega) \\ Y(\omega) \end{pmatrix} = \begin{pmatrix} H^{(xy)}_{xx}(\omega) & H^{(xy)}_{xy}(\omega) \\ H^{(xy)}_{yx}(\omega) & H^{(xy)}_{yy}(\omega) \end{pmatrix} \begin{pmatrix} \epsilon^{(xy)}_x(\omega) \\ \epsilon^{(xy)}_y(\omega) \end{pmatrix}.$   (34)

Accordingly, the spectral matrix can be expressed as

$S^{(xy)}(\omega) = H^{(xy)}(\omega)\, \Sigma^{(xy)}\, (H^{(xy)})^*(\omega),$   (35)

where $*$ denotes complex conjugation and matrix transposition. Given the lack of instantaneous correlations,

$S_{xx}(\omega) = \Sigma^{(xy)}_{xx} |H^{(xy)}_{xx}(\omega)|^2 + \Sigma^{(xy)}_{yy} |H^{(xy)}_{xy}(\omega)|^2.$   (36)

The GSC from Y to X at frequency $\omega$ is defined as

$g_{Y \to X}(\omega) = \ln \frac{S_{xx}(\omega)}{\Sigma^{(xy)}_{xx} |H^{(xy)}_{xx}(\omega)|^2}.$   (37)

This definition fulfills the requirement of being nonnegative since, given Eq. 36, $S_{xx}(\omega)$ is always higher than $\Sigma^{(xy)}_{xx} |H^{(xy)}_{xx}(\omega)|^2$. It also fulfills the requirement of being intuitive, since $g_{Y \to X}(\omega)$ quantifies the portion of the power spectrum which is associated with the intrinsic innovation process of X. Furthermore, the third condition is also fulfilled (see [25, 57, 14] for details). This can be seen considering that

$g_{Y \to X}(\omega) = -\ln\left(1 - |C(X, \epsilon^{(xy)}_y)|^2\right),$   (38)

where $|C(X, \epsilon^{(xy)}_y)|^2$ is the squared coherence of X with the innovations $\epsilon^{(xy)}_y$ of Eq. 14. Given the general relation of the mutual information rate with the squared coherence [24], we have that for Gaussian variables

$T_{Y \to X} = I(X^N; \epsilon^{(xy)N}_y) = \frac{-1}{4\pi} \int_{-\pi}^{\pi} \ln\left(1 - |C(X, \epsilon^{(xy)}_y)|^2\right) d\omega.$   (39)
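The construction of Eqs. 33-37 can be traced numerically. The sketch below (ours) builds the coefficient matrix on a frequency grid, inverts it into the transfer function, forms the spectral matrix, and evaluates $g_{Y \to X}(\omega)$. For the toy system the fitted innovations covariance is approximately diagonal, as assumed here; in general a normalization step is required [25].

```python
import numpy as np

def spectral_gc(series, p=5, n_freq=512):
    """Bivariate spectral GSC g_{Y->X}(w) (Eq. 37) on [0, pi], from the
    coefficients fitted with fit_var (neglecting the small off-diagonal
    terms of the estimated innovations covariance)."""
    coef, Sigma, _ = fit_var(series, p)
    k = len(series)
    A_lags = [coef[l * k:(l + 1) * k, :].T for l in range(p)]  # lags 1..p
    omegas = np.linspace(0, np.pi, n_freq)
    g = np.empty(n_freq)
    for i, w in enumerate(omegas):
        A = np.eye(k, dtype=complex)
        for l, Al in enumerate(A_lags, start=1):
            A -= Al * np.exp(-1j * w * l)        # coefficient matrix, Eq. 33
        H = np.linalg.inv(A)                      # transfer function, Eq. 34
        S = H @ Sigma @ H.conj().T                # spectral matrix, Eq. 35
        g[i] = np.log(S[0, 0].real / (Sigma[0, 0] * abs(H[0, 0]) ** 2))
    return omegas, g

omegas, g = spectral_gc([x, y])
# Check of Eq. 31: g(w) is even in w, so its average over [0, pi]
# should match the time-domain measure computed above.
print(g.mean(), geweke_gc(x, y))
```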
For the multivariate case, to derive the spectral representation of $G_{Y \to X|Z}$, for simplicity we assume again that there is no instantaneous causality and that $\Sigma^{(xyz)}$ and $\Sigma^{(xz)}$ are diagonal (see [18] for a detailed derivation when instantaneous correlations exist). We rewrite Eq. 19 after Fourier transforming as

$\begin{pmatrix} A^{(xz)}_{xx}(\omega) & A^{(xz)}_{xz}(\omega) \\ A^{(xz)}_{zx}(\omega) & A^{(xz)}_{zz}(\omega) \end{pmatrix} \begin{pmatrix} X(\omega) \\ Z(\omega) \end{pmatrix} = \begin{pmatrix} \epsilon^{(xz)}_x(\omega) \\ \epsilon^{(xz)}_z(\omega) \end{pmatrix}.$   (40)

Furthermore we rewrite Eq. 17 using the transfer function $H^{(xyz)}$:

$\begin{pmatrix} X(\omega) \\ Y(\omega) \\ Z(\omega) \end{pmatrix} = H^{(xyz)} \begin{pmatrix} \epsilon^{(xyz)}_x(\omega) \\ \epsilon^{(xyz)}_y(\omega) \\ \epsilon^{(xyz)}_z(\omega) \end{pmatrix}.$   (41)

Geweke [26] showed that

$G_{Y \to X|Z} = G_{Y \epsilon^{(xz)}_z \to \epsilon^{(xz)}_x}.$   (42)

Accordingly, Eqs. 40 and 41 are combined to express Y, $\epsilon^{(xz)}_z$, and $\epsilon^{(xz)}_x$ in terms of the innovations of the fully multivariate process:

$\begin{pmatrix} \epsilon^{(xz)}_x(\omega) \\ Y(\omega) \\ \epsilon^{(xz)}_z(\omega) \end{pmatrix} = D H^{(xyz)} \begin{pmatrix} \epsilon^{(xyz)}_x(\omega) \\ \epsilon^{(xyz)}_y(\omega) \\ \epsilon^{(xyz)}_z(\omega) \end{pmatrix},$   (43)

where

$D = \begin{pmatrix} A^{(xz)}_{xx}(\omega) & 0 & A^{(xz)}_{xz}(\omega) \\ 0 & 1 & 0 \\ A^{(xz)}_{zx}(\omega) & 0 & A^{(xz)}_{zz}(\omega) \end{pmatrix}.$   (44)

Considering $Q = D H^{(xyz)}$, the spectral matrix of Y, $\epsilon^{(xz)}_z$, and $\epsilon^{(xz)}_x$ is

$\hat{S}(\omega) = Q(\omega)\, \Sigma^{(xyz)}\, Q^*(\omega),$   (45)

and in particular

$S_{\epsilon^{(xz)}_x \epsilon^{(xz)}_x}(\omega) = |Q_{xx}(\omega)|^2 \Sigma^{(xyz)}_{xx} + |Q_{xy}(\omega)|^2 \Sigma^{(xyz)}_{yy} + |Q_{xz}(\omega)|^2 \Sigma^{(xyz)}_{zz}.$   (46)

The conditional GSC from Y to X given Z is defined [26] as the portion of the power spectrum associated with $\epsilon^{(xyz)}_x$, in analogy to Eq. 37:

$g_{Y \to X|Z}(\omega) = g_{Y \epsilon^{(xz)}_z \to \epsilon^{(xz)}_x}(\omega) = \ln \frac{S_{\epsilon^{(xz)}_x \epsilon^{(xz)}_x}(\omega)}{|Q_{xx}(\omega)|^2 \Sigma^{(xyz)}_{xx}}.$   (47)

This measure also fulfills the requirements that [25] imposed on the spectral measures. Furthermore, in analogy to Eq. 38, $g_{Y \to X|Z}(\omega)$ is related to a multiple coherence:

$g_{Y \to X|Z}(\omega) = -\ln\left(1 - |C(\epsilon^{(xz)}_x, \epsilon^{(xyz)}_y \epsilon^{(xyz)}_z)|^2\right),$   (48)

where $|C(\epsilon^{(xz)}_x, \epsilon^{(xyz)}_y \epsilon^{(xyz)}_z)|^2$ is the squared multiple coherence [48]. This equality results from the direct application of the definition of the squared multiple coherence (see [14] for details). Given the definition of $g_{Y \to X|Z}(\omega)$ in terms of the squared multiple coherence it is clear that, analogously to $G_{Y \to X}$ (Eq. 39),

$G_{Y \to X|Z} = 2\, I(\epsilon^{(xz)N}_x; \epsilon^{(xyz)N}_y \epsilon^{(xyz)N}_z).$   (49)

3.2.2 Partial directed coherence

The other measure related to Granger causality that we review here is partial directed coherence [6, 5], which is defined only in the spectral domain. In particular, the information partial directed coherence (iPDC) from Y to X [57] is defined in the bivariate case as

$i\pi_{xy}(\omega) = C(\epsilon^{(xy)}_x, \eta^{(xy)}_y) = A^{(xy)}_{xy}(\omega) \frac{\sqrt{S_{yy|X}(\omega)}}{\sqrt{\Sigma^{(xy)}_{xx}}},$   (50)

where $A^{(xy)}(\omega)$ is the spectral representation of the autoregressive coefficients matrix of Eq. 14 and $S_{yy|X}(\omega)$ is the partial spectrum [48] of the Y process when partialized on process X. Furthermore, $\eta^{(xy)}_y$ refers to the partialized process resulting from the Y process when partialized on X, as results from Eq. 16. As in the case of the GSC, a mutual information rate is associated with the iPDC [57], and is further related to $S_{Y \to X}$ [14] for Gaussian variables:

$S_{Y \to X} = I(\epsilon^{(xy)N}_x; \eta^{(xy)N}_y) = \frac{-1}{4\pi} \int_{-\pi}^{\pi} \ln\left(1 - |i\pi_{xy}(\omega)|^2\right) d\omega.$   (51)

In the multivariate case the information partial directed coherence (iPDC) from Y to X [57] is

$i\pi^{(xyz)}_{xy}(\omega) = C(\epsilon^{(xyz)}_x, \eta^{(xyz)}_y) = A^{(xyz)}_{xy}(\omega) \frac{\sqrt{S_{yy|W \setminus y}(\omega)}}{\sqrt{\Sigma^{(xyz)}_{xx}}},$   (52)

where $A^{(xyz)}(\omega)$ is the spectral representation of the autoregressive coefficients matrix of Eq. 17 and $S_{yy|W \setminus y}(\omega)$ is the partial spectrum of the Y process when partialized on all the other processes in the multivariate process W. Furthermore, $\eta^{(xyz)}_y$ refers to the partialized process resulting from the Y process when partialized on all the others. In the multivariate case, not even after integration across frequencies can the iPDC be expressed in terms of the variables of the observed processes X, Y, Z. The equality [57]

$I(\epsilon^{(xyz)N}_x; \eta^{(xyz)N}_y) = \frac{-1}{4\pi} \int_{-\pi}^{\pi} \ln\left(1 - |i\pi^{(xyz)}_{xy}(\omega)|^2\right) d\omega,$   (53)

analogous to the one of the bivariate case (Eq. 51), provides only an expression which involves the innovation processes $\epsilon^{(xyz)}_x$ and $\eta^{(xyz)}_y$.
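As an illustration, the bivariate iPDC of Eq. 50 can be evaluated from the same fitted projection, computing the partial spectrum as $S_{yy|X}(\omega) = S_{yy}(\omega) - |S_{yx}(\omega)|^2/S_{xx}(\omega)$ [48]. A minimal sketch (ours, reusing fit_var above):

```python
import numpy as np

def ipdc_bivariate(series, p=5, n_freq=512):
    """Bivariate iPDC from Y to X following Eq. 50: A_xy(w) weighted by
    the partial spectrum of Y given X and the X innovation variance."""
    coef, Sigma, _ = fit_var(series, p)
    k = len(series)
    A_lags = [coef[l * k:(l + 1) * k, :].T for l in range(p)]
    omegas = np.linspace(0, np.pi, n_freq)
    ipdc = np.empty(n_freq, dtype=complex)
    for i, w in enumerate(omegas):
        A = np.eye(k, dtype=complex)
        for l, Al in enumerate(A_lags, start=1):
            A -= Al * np.exp(-1j * w * l)
        H = np.linalg.inv(A)
        S = H @ Sigma @ H.conj().T
        Syy_X = (S[1, 1] - abs(S[1, 0]) ** 2 / S[0, 0]).real  # partial spectrum [48]
        ipdc[i] = A[0, 1] * np.sqrt(Syy_X) / np.sqrt(Sigma[0, 0])
    return omegas, ipdc

omegas, ipdc = ipdc_bivariate([x, y])
# Eq. 51: S_{Y->X} as the integral of -ln(1 - |ipdc|^2), here
# approximated by the average over [0, pi] divided by two.
S_yx = -0.5 * np.mean(np.log(1 - np.abs(ipdc) ** 2))
```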
3.3 Parametric criteria for causal inference

From the review above of the spectral Geweke measures of Granger causality and of partial directed coherence, one can see that alternative criteria for causal inference, which involve the innovation processes intrinsic to the parametric autoregressive representation, are implicit in the mutual information terms. In particular, for the bivariate case, the spectral Geweke measure is related (Eq. 39) to the criterion

$p(X^N) = p(X^N | \epsilon^{(xy)N}_y) \quad \forall \epsilon^{(xy)N}_y.$   (54)

The bivariate PDC is related (Eq. 51) to

$p(\epsilon^{(xy)N}_x) = p(\epsilon^{(xy)N}_x | \eta^{(xy)N}_y) \quad \forall \eta^{(xy)N}_y.$   (55)

For the multivariate case the Geweke measure is related (Eq. 49) to

$p(\epsilon^{(xz)N}_x) = p(\epsilon^{(xz)N}_x | \epsilon^{(xyz)N}_y, \epsilon^{(xyz)N}_z) \quad \forall \epsilon^{(xyz)N}_y, \epsilon^{(xyz)N}_z,$   (56)

while the PDC is related (Eq. 53) to

$p(\epsilon^{(xyz)N}_x) = p(\epsilon^{(xyz)N}_x | \eta^{(xyz)N}_y) \quad \forall \eta^{(xyz)N}_y.$   (57)

Comparing the non-parametric criteria of Section 2.1 with these parametric criteria we can see another main difference, apart from the fact that the parametric ones all involve some innovation process. This difference is that in Eqs. 54-57 temporal separation between future and past is not required to state the criteria, while the non-parametric criteria all rely explicitly on temporal precedence. The lack of temporal separation is exactly what allows the construction of the spectral measures based on the criteria of Eqs. 54-57. In [14] it was shown, based on this difference with respect to temporal separation, that transfer entropy does not have a non-parametric spectral representation.

This lack of a non-parametric spectral representation of the transfer entropy can be further understood by considering why a criterion without temporal separation that involves only the processes X, Y, and not innovation processes, cannot be used for causal inference. Consider $p(X^N) = p(X^N | Y^N)$ as a criterion to infer causality from Y to X, in contrast to the ones of Eqs. 1 and 54. Using the chain rule for the probability distributions, this equality implies checking $p(X_{t+1}|X^t) = p(X_{t+1}|X^t, Y^N)$. But this equality does not hold if there is a causal connection in the opposite direction, from X to Y, because of the conditioning on the whole process $Y^N$ instead of only on its past. Oppositely,

$p(X^N | \epsilon^{(xy)N}_y) = \prod_{t=0}^{N-1} p(X_{t+1} | X^t, \epsilon^{(xy)N}_y) = \prod_{t=0}^{N-1} p(X_{t+1} | X^t, \epsilon^{(xy)t}_y) = \prod_{t=0}^{N-1} p(X_{t+1} | X^t, Y^t),$   (58)

since by construction there are no causal connections from the processes to the innovation processes. The last equality can be understood by considering that the autoregressive projections described in Section 3.1 introduce a functional relation between the variables, such that, for example, given Eq. 14, $X_{t+1}$ is completely determined by $\epsilon^{(xy)t+1}_x$, $\epsilon^{(xy)t}_y$, and analogously for $Y_{t+1}$. Accordingly, it is equivalent to condition on $X^t, \epsilon^{(xy)t}_y$ or on $X^t, Y^t$.

The probability distributions in Eq. 1 and Eq. 54 are still not the same, as is clear from Eq. 58. However, under the assumption of stationarity, it is the functional relations that completely determine the processes from the innovation processes (and inversely) that lead to the equality in Eq. 39 of the transfer entropy with the mutual information corresponding to the comparison of the probability distributions in Eq. 54, and analogously for Eqs. 49 and 51. Remarkably, the mutual information associated with Eq. 57, as noticed above regarding Eq. 53, is not equal to a mutual information associated with a non-parametric criterion. As indicated in Eq. 51 (see [14] for details), for bivariate processes the PDC is related to Sims causality. However, for the multivariate case, while there is no extension of Sims causality, it is clear from the comparison of the definitions in Eqs. 50 and 52, as well as from the comparison of the criteria of Eqs. 55 and 57, that the multivariate formulation appears as a natural extension of the bivariate one. This stresses the role of the functional relations that are assumed to implicitly define the innovation processes.
It is not only the causal structure between the variables but the specific functional form in which they are related that guarantees the validity of the criteria in Eqs. 54-57. In general this functional form is not required to be linear, as long as it establishes that the processes and innovation processes are mutually determined.

Another interesting aspect is revealed by the comparison of the bivariate and multivariate criteria respectively associated with the GSC and PDC measures. While for the PDC the multivariate criterion is a straightforward extension of the bivariate one, this is not the case for the criteria associated with the GSC. This can also be noticed by comparing the autoregressive projections used for each measure. In particular, for the bivariate case, $g_{Y \to X}(\omega)$ is obtained directly from the bivariate autoregressive representation (Eq. 14), not by combining it with the univariate autoregressive representation of X (Eq. 13). Oppositely, $g_{Y \to X|Z}(\omega)$ requires the combination of the full multivariate projection (Eq. 17) and the projection on the past of X, Z (Eq. 19). Below we show that in fact there is a natural counterpart for both the criteria of Eqs. 54 and 56, respectively.

3.4 Alternative Geweke spectral measures

Instead of constructing $g_{Y \to X}(\omega)$ just from the bivariate autoregressive representation, one could proceed alternatively following the procedure used for the conditional case. This means combining Eq. 34 with the Fourier transform of Eq. 13,

$\epsilon^{(x)}_x(\omega) = a^{(x)}_{xx}(\omega) X(\omega).$   (59)

This is analogous to combining Eqs. 40 and 41 in the conditional case. Combining Eqs. 34 and 59 we get an expression analogous to Eq. 43:

$\begin{pmatrix} \epsilon^{(x)}_x(\omega) \\ Y(\omega) \end{pmatrix} = P H^{(xy)} \begin{pmatrix} \epsilon^{(xy)}_x(\omega) \\ \epsilon^{(xy)}_y(\omega) \end{pmatrix},$   (60)

where

$P = \begin{pmatrix} a^{(x)}_{xx}(\omega) & 0 \\ 0 & 1 \end{pmatrix}.$   (61)

Considering $\hat{Q} = P H^{(xy)}$, the spectrum of $\epsilon^{(x)}_x$ is

$S_{\epsilon^{(x)}_x \epsilon^{(x)}_x}(\omega) = |a^{(x)}_{xx}(\omega)|^2 S_{xx}(\omega) = |\hat{Q}_{xx}|^2 \Sigma^{(xy)}_{xx} + |\hat{Q}_{xy}|^2 \Sigma^{(xy)}_{yy},$   (62)

and, comparing the total power to the portion related to $\epsilon^{(xy)}_x$, one can define

$\hat{g}_{Y \to X}(\omega) = \ln \frac{S_{\epsilon^{(x)}_x \epsilon^{(x)}_x}(\omega)}{|\hat{Q}_{xx}|^2 \Sigma^{(xy)}_{xx}} = \ln \frac{S_{xx}(\omega)}{|H^{(xy)}_{xx}|^2 \Sigma^{(xy)}_{xx}} = g_{Y \to X}(\omega).$   (63)

This shows that

$g_{Y \to X}(\omega) = -\ln\left(1 - |C(\epsilon^{(x)}_x, \epsilon^{(xy)}_y)|^2\right) = -\ln\left(1 - |C(X, \epsilon^{(xy)}_y)|^2\right)$   (64)

and

$T_{Y \to X} = I(\epsilon^{(x)N}_x; \epsilon^{(xy)N}_y) = I(X^N; \epsilon^{(xy)N}_y).$   (65)

This equality indicates that although the procedure used for the multivariate case is apparently not reducible to the bivariate case for $Z = \emptyset$, the spectral decomposition $g_{Y \to X}(\omega)$ is the same. The criterion for causal inference that results from straightforwardly reducing the one of Eq. 56 to the bivariate case is

$p(\epsilon^{(x)N}_x) = p(\epsilon^{(x)N}_x | \epsilon^{(xy)N}_y) \quad \forall \epsilon^{(xy)N}_y.$   (66)

Again, the particular functional relation between the processes and the innovation processes determines that $X^N$ and $\epsilon^{(x)N}_x$ share the same information with $\epsilon^{(xy)N}_y$, given that they are mutually determined in Eq. 13.

Analogously, we want to find the criterion that results from a straightforward extension of the one in Eq. 54. An alternative way to construct $g_{Y \to X|Z}(\omega)$ is suggested by the relation between the bivariate and the conditional measures stated by Geweke [26]:

$G_{Y \to X|Z} = G_{YZ \to X} - G_{Z \to X},$   (67)

which is just an application of the chain rule for the mutual information [17]. In analogy to Eq. 37,

$g_{YZ \to X}(\omega) = \ln \frac{S_{xx}}{|H^{(xyz)}_{xx}|^2 \Sigma^{(xyz)}_{xx}}$   (68)

and

$g_{Z \to X}(\omega) = \ln \frac{S_{xx}}{|H^{(xz)}_{xx}|^2 \Sigma^{(xz)}_{xx}},$   (69)

where $H^{(xz)}$ is the inverse of the coefficients matrix of Eq. 40.
This leads to

$\hat{g}_{Y \to X|Z}(\omega) = \ln \frac{|H^{(xz)}_{xx}|^2 \Sigma^{(xz)}_{xx}}{|H^{(xyz)}_{xx}|^2 \Sigma^{(xyz)}_{xx}}.$   (70)

Notice that while $g_{Y \to X}(\omega) = \hat{g}_{Y \to X}(\omega)$, the two measures are different in the conditional case. This means that two alternative spectral decompositions are possible, although their integration is equivalent. This can be seen considering that the integration of the logarithm terms including $|H^{(xyz)}_{xx}|^2$ and $|H^{(xz)}_{xx}|^2$ is zero, based on Theorem 4.2 of Rozanov [51] (see [14] for details). Accordingly,

$T_{Y \to X|Z} = I(X^N; \epsilon^{(xyz)N}_y \epsilon^{(xyz)N}_z) - I(X^N; \epsilon^{(xz)N}_z),$   (71)

and the natural extension of the criterion in Eq. 54 is

$p(X^N | \epsilon^{(xz)N}_z) = p(X^N | \epsilon^{(xyz)N}_y, \epsilon^{(xyz)N}_z) \quad \forall \epsilon^{(xyz)N}_y, \epsilon^{(xyz)N}_z.$   (72)

The fact that the conditioning variable on the left-hand side is not preserved among the conditioning variables on the right-hand side is what determines that the information-theoretic statistic to test this equality is not a single KL-divergence (in particular a mutual information) but a difference of two.

We examine whether the alternative spectral measures fulfill the three conditions imposed by Geweke described in Section 3.2.1. In the bivariate case the measure is equal, so it is clear that it does. In the multivariate case the measure has an intuitive interpretation and fulfills the relation with the time-domain measure under integration. However, nonnegativity is not guaranteed for every frequency, since the measure is related to a difference of mutual informations.
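The decomposition of Eq. 67 can be checked numerically with the functions of the previous sketches. In the sketch below (ours; coefficients arbitrary), all three measures are computed from projections fitted on the same toy trivariate data, so the identity holds up to numerical precision:

```python
import numpy as np

# Toy trivariate system with Z -> Y -> X (coefficients arbitrary)
rng = np.random.default_rng(1)
n = 50000
X, Y, Z = np.zeros(n), np.zeros(n), np.zeros(n)
for t in range(n - 1):
    Z[t + 1] = 0.5 * Z[t] + rng.standard_normal()
    Y[t + 1] = 0.5 * Y[t] + 0.4 * Z[t] + rng.standard_normal()
    X[t + 1] = 0.5 * X[t] + 0.4 * Y[t] + rng.standard_normal()

G_cond = geweke_gc(X, Y, z_series=Z)      # G_{Y->X|Z}, Eq. 29
_, S_x, _ = fit_var([X])                  # projection on the past of X alone
_, S_xyz, _ = fit_var([X, Y, Z])          # full projection, Eqs. 17-18
G_yz_x = np.log(S_x[0, 0] / S_xyz[0, 0])  # joint measure G_{YZ->X}
G_z_x = geweke_gc(X, Z)                   # G_{Z->X}, Eq. 24
print(G_cond, G_yz_x - G_z_x)             # equal up to numerical precision
```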
3.5 Alternative parametric criteria based on innovations partial dependence

Above we have shown that the different criteria underlying bivariate and multivariate GSC can be reduced or extended, respectively, to the other case. We indicated that the parametric criteria rely not only on the causal structure but also on the functional relations assumed between the processes and the innovation processes. This is particularly clear in the multivariate criteria (Eqs. 56, 57 and 72), because these criteria combine innovations from different projections. This prevents considering the autoregressive models as actual generative models whose structure can be mapped to a causal graph. Here we introduce an alternative type of parametric criteria which relies on a single projection, which can be considered as the model from which the processes are generated. In the bivariate case the criterion is

$p(\epsilon^{(xy)N}_x | X^N) = p(\epsilon^{(xy)N}_x | X^N, \epsilon^{(xy)N}_y) \quad \forall X^N, \epsilon^{(xy)N}_y,$   (73)

which can be tested with the mutual information

$I(\epsilon^{(xy)N}_x; \epsilon^{(xy)N}_y | X^N) = 0.$   (74)

The innovations $\epsilon^{(xy)N}_x$ and $\epsilon^{(xy)N}_y$ are assumed to be independent (or are rendered independent after the normalization step in [25]) when there is no conditioning. The logic of the criterion is that if conditioning on $X^N$ introduces some dependence, this can only be because both innovation processes have a dependence with process X (this is the conditioning on a joint child effect). Since by construction $\epsilon^{(xy)N}_x$ is associated with $X^N$, this effect occurs if and only if $\epsilon^{(xy)N}_y$ has an influence on $X^N$, which can only be through an existent connection from Y to X. In the multivariate case the criterion is straightforwardly extended to

$p(\epsilon^{(xyz)N}_x | X^N, Z^N) = p(\epsilon^{(xyz)N}_x | X^N, Z^N, \epsilon^{(xyz)N}_y) \quad \forall X^N, Z^N, \epsilon^{(xyz)N}_y,$   (75)

and can be tested with the mutual information

$I(\epsilon^{(xyz)N}_x; \epsilon^{(xyz)N}_y | X^N, Z^N) = 0.$   (76)

Here the conditioning on $Z^N$ is required so that the connection from $\epsilon^{(xyz)N}_y$ to $X^N$ is not indirect through Z.

These criteria have the advantage that they rely on a unique autoregressive representation. They are also useful to illustrate the difference between using the information-theoretic measures as statistics to test for causality or as measures to quantify the dependencies. In particular, the mutual informations of Eqs. 74 and 76 are either infinite or zero depending on whether there is or there is no causality from Y to X. This is clear when they are expressed in terms of the squared coherence; for example, $|C(X \epsilon^{(xy)}_x, \epsilon^{(xy)}_y)|^2$ is associated with Eq. 74, and is 1 when there is causality. This is because, since the two innovation processes completely determine X, inversely the innovations $\epsilon^{(xy)}_y$ can be known from process X and its innovations. The same occurs in the multivariate case. In principle, this renders these mutual informations very powerful to test for causality and useless to quantify in some way the strength of the dependence. In practice, the actual value estimated would also reflect how valid the chosen autoregressive model is.

4 Comparison of non-parametric and parametric criteria for causal inference from time-series

We have reviewed different criteria for causal inference and introduced some related ones that form a whole consistent framework for causal inference from time-series. Here we briefly summarize them, further highlighting their relations. In Tables 1 and 2 we collect all the criteria for causal inference, organized according to whether they are parametric or non-parametric, and bivariate or multivariate. We see that all the bivariate criteria have their multivariate counterpart except criterion 2, associated with Sims causality. In Table 3 we display the corresponding information-theoretic measures to test the criteria. We group the measures according to which of them are equal given the functional form that determines the processes from the innovation processes. In the bivariate case the measures in 1 and 2 are equivalent only when there is no instantaneous causality. All these measures can be used to test for causality from Y to X, but when a nonzero value is obtained they provide alternative characterizations of the dependencies. Finally, in Table 4 we use the set $W = \{X, Y, Z\}$ to re-express the criteria of Tables 1 and 2 in a synthetic form that integrates the bivariate and multivariate notation used so far, making their link more transparent. For example, $\{W \setminus Y\}$ refers to all the processes except Y. Furthermore, for innovation processes, $\epsilon^{(\{W \setminus Y\})}_{\{W \setminus X,Y\}}$ refers to, given the projection $(\{W \setminus Y\})$, which includes all the processes except Y, all the innovation processes $\{W \setminus X, Y\}$, that is, all the ones in the projection except the ones associated with X and Y.
Table 1 Bivariate criteria for causal inference

Non-parametric
1. $p(X_{t+1}|X^t) = p(X_{t+1}|X^t, Y^t)$
2. $p(X^{t+1:N}|X^t, Y^t) = p(X^{t+1:N}|X^t, Y^t, Y_{t+1})$
Parametric
3. $p(\epsilon^{(x)N}_x) = p(\epsilon^{(x)N}_x | \epsilon^{(xy)N}_y)$
4. $p(X^N) = p(X^N | \epsilon^{(xy)N}_y)$
5. $p(\epsilon^{(xy)N}_x) = p(\epsilon^{(xy)N}_x | \eta^{(xy)N}_y)$
6. $p(\epsilon^{(xy)N}_x | X^N) = p(\epsilon^{(xy)N}_x | X^N, \epsilon^{(xy)N}_y)$

Table 2 Multivariate criteria for causal inference

Non-parametric
1. $p(X_{t+1}|X^t, Z^t) = p(X_{t+1}|X^t, Y^t, Z^t)$
2. −
Parametric
3. $p(\epsilon^{(xz)N}_x) = p(\epsilon^{(xz)N}_x | \epsilon^{(xyz)N}_y, \epsilon^{(xyz)N}_z)$
4. $p(X^N | \epsilon^{(xz)N}_z) = p(X^N | \epsilon^{(xyz)N}_y, \epsilon^{(xyz)N}_z)$
5. $p(\epsilon^{(xyz)N}_x) = p(\epsilon^{(xyz)N}_x | \eta^{(xyz)N}_y)$
6. $p(\epsilon^{(xyz)N}_x | X^N, Z^N) = p(\epsilon^{(xyz)N}_x | X^N, Z^N, \epsilon^{(xyz)N}_y)$

Table 3 Mutual information measures to test for causality

Bivariate
1. $I(X_{t+1}; Y^t | X^t) = I(X^N; \epsilon^{(xy)N}_y) = I(\epsilon^{(x)N}_x; \epsilon^{(xy)N}_y)$
2. $I(Y_{t+1}; X^{t+1:N} | Y^t, X^t) = I(\epsilon^{(xy)N}_x; \eta^{(xy)N}_y)$
3. $I(\epsilon^{(xy)N}_x; \epsilon^{(xy)N}_y | X^N)$
Multivariate
4. $I(X_{t+1}; Y^t | X^t, Z^t) = I(\epsilon^{(xz)N}_x; \epsilon^{(xyz)N}_y, \epsilon^{(xyz)N}_z) = I(X^N; \epsilon^{(xyz)N}_y, \epsilon^{(xyz)N}_z) - I(X^N; \epsilon^{(xz)N}_z)$
5. $I(\epsilon^{(xyz)N}_x; \eta^{(xyz)N}_y)$
6. $I(\epsilon^{(xyz)N}_x; \epsilon^{(xyz)N}_y | X^N, Z^N)$

Table 4 Criteria for causal inference

Non-parametric
1. $p(X_{t+1}|\{W \setminus Y\}^t) = p(X_{t+1}|\{W\}^t)$
Parametric
2. $p(\epsilon^{(\{W \setminus Y\})N}_x) = p(\epsilon^{(\{W \setminus Y\})N}_x | \epsilon^{(\{W\})N}_{\{W \setminus X\}})$
3. $p(X^N | \epsilon^{(\{W \setminus Y\})N}_{\{W \setminus X,Y\}}) = p(X^N | \epsilon^{(\{W\})N}_{\{W \setminus X\}})$
4. $p(\epsilon^{(\{W\})N}_x) = p(\epsilon^{(\{W\})N}_x | \eta^{(\{W\})N}_y)$
5. $p(\epsilon^{(\{W\})N}_x | \{W \setminus Y\}^N) = p(\epsilon^{(\{W\})N}_x | \{W \setminus Y\}^N, \epsilon^{(\{W\})N}_y)$

5 Conclusion

We have reviewed criteria for causal inference related to Granger causality and proposed some new ones in order to complete a unified framework of criteria and measures to test for causality in a parametric and non-parametric way, in the time or spectral domain, and for bivariate or multivariate processes. These criteria and measures are summarized in Tables 1-4. This offers an integrating picture comprising the measures proposed by Geweke [25, 26] and partial directed coherence [5]. The contributions of this Chapter are complementary to the work in [57] and [14]. The distinction between parametric and non-parametric criteria further emphasizes the necessity to check the validity of the autoregressive representation when applying a measure which inherently relies on the definition of the innovation processes. The distinction between criteria and measures stresses that causal inference and the characterization of the dynamic dependencies resulting from the causal connections should be addressed by different approaches [16, 15].

Finally, we notice again that we have here focused on the formal relation between the different criteria and measures. For practical applications, problems like the influence of hidden variables [21] or temporal aggregation [23] constitute serious challenges that can prevent the successful application of these criteria. For example, in the case of brain causal analysis it is now clear that a successful characterization can only be obtained if the application of these criteria is combined with a biologically plausible reconstruction of how the recorded data are generated by the neural activity [50, 58, 23]. Even at a more practical level, estimating from small data sets the information-theoretic measures to test for causality is complicated [34]. Most often stationarity is assumed for simplification, but event-related estimation is also possible [3, 27].
We believe that a clear understanding of the underlying criteria for causal inference and their relation to measures can also help to better interpret and address these practical problems.

6 Appendix: Fisher information measure of Granger causality for linear autoregressive Gaussian processes

In Eq. 12 we showed how the criterion of Granger causality of Eq. 1 can be tested using the Fisher information. For linear Gaussian autoregressive processes, considering the definition of the Fisher information (Eq. 11), we have

$E_{y^t}[F(X_{t+1}; y^t|X^t)] = \int dy^t\, p(y^t) \int dX^t\, p(X^t|y^t) \int dX_{t+1}\, p(X_{t+1}|X^t, y^t) \left( \frac{\partial \log p(X_{t+1}|X^t, y^t)}{\partial y^t} \right)^2.$   (77)

We start by considering the term $F(X_{t+1}; y^t|x^t)$ corresponding to the innermost integral. For a Gaussian process, $p(X_{t+1}|x^t, y^t) = N(\mu(X_{t+1}|x^t, y^t), \sigma(X_{t+1}|x^t, y^t))$ is Gaussian. Therefore

$F(X_{t+1}; y^t|x^t) = \int dX_{t+1}\, N\big(\mu(X_{t+1}|x^t, y^t), \sigma(X_{t+1}|x^t, y^t)\big) \left( \frac{\partial}{\partial y^t} \log \frac{1}{\sqrt{2\pi}\,\sigma(X_{t+1}|x^t, y^t)} - \frac{\partial}{\partial y^t} \frac{\big(x_{t+1} - \sum_{s=0}^{\infty} (a^{(xy)}_{xxs} x_{t-s} + a^{(xy)}_{xys} y_{t-s})\big)^2}{2\sigma^2(X_{t+1}|x^t, y^t)} \right)^2.$   (78)

The first summand inside the integral is zero, because the term on which the derivative acts does not depend on $y^t$. For the second summand, since it is linear, we consider for simplification just the partial derivative with respect to a single variable $y_t$. We get

$F(X_{t+1}; y_t|x^t) = \int dX_{t+1}\, N\big(\mu(X_{t+1}|x^t, y_t), \sigma(X_{t+1}|x^t, y_t)\big) \left( \frac{x_{t+1} - \mu(X_{t+1}|x^t, y_t)}{\sigma(X_{t+1}|x^t, y_t)}\, \frac{a^{(xy)}_{xyt}}{\sigma(X_{t+1}|x^t, y_t)} \right)^2 = \frac{(a^{(xy)}_{xyt})^2}{\sigma^2(X_{t+1}|x^t, y_t)}.$   (79)

This term is independent of both $x^t$ and $y_t$, so that the other two integrations in Eq. 77 can be done straightforwardly. We have

$E_{y_t}[F(X_{t+1}; y_t|X^t)] = \frac{(a^{(xy)}_{xyt})^2}{\sigma^2(X_{t+1}|x^t, y^t)},$   (80)

so that each coefficient in the autoregressive representation can be given a meaning in terms of the Fisher information. This relation further illuminates the relation between the coefficients and $G_{Y \to X}$ [55, 40]:

$G_{Y \to X} = 0 \Leftrightarrow a^{(xy)}_{xys} = 0 \quad \forall s.$   (81)
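This result can be checked with a short Monte Carlo sketch (ours):

```python
import numpy as np

# Monte Carlo check of Eq. 80 for X_{t+1} = a_xx x_t + a_xy y_t + eps,
# with var(eps) = sig2. Since Eq. 79 shows the Fisher information does
# not depend on the conditioning values, x_t and y_t can be drawn from
# any distribution.
rng = np.random.default_rng(2)
a_xx, a_xy, sig2, m = 0.5, 0.4, 1.0, 200000
x_t = rng.standard_normal(m)
y_t = rng.standard_normal(m)
x_next = a_xx * x_t + a_xy * y_t + np.sqrt(sig2) * rng.standard_normal(m)
# Score of the Gaussian conditional density with respect to y_t
score = (x_next - a_xx * x_t - a_xy * y_t) * a_xy / sig2
print(np.mean(score ** 2), a_xy ** 2 / sig2)   # both close to 0.16
```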
References

1. Amblard, P.O., Michel, O.: On directed information theory and Granger causality graphs. J Comput Neurosci 30, 7–16 (2011)
2. Ancona, N., Marinazzo, D., Stramaglia, S.: Radial basis function approach to nonlinear Granger causality of time series. Phys Rev E 70(5), 056221 (2004)
3. Andrzejak, R.G., Ledberg, A., Deco, G.: Detection of event-related time-dependent directional couplings. New J Phys 8, 6 (2006)
4. Ay, N., Polani, D.: Information flows in causal networks. Advances in Complex Systems 11, 17–41 (2008)
5. Baccala, L., Sameshima, K.: Partial directed coherence: a new concept in neural structure determination. Biol Cybern 84(1), 463–474 (2001)
6. Baccala, L., Sameshima, K., Ballester, G., Do Valle, A., Timo-Iaria, C.: Studying the interaction between brain structures via directed coherence and Granger causality. Appl Sig Process 5, 40–48 (1999)
7. Barnett, L., Barrett, A.B., Seth, A.K.: Granger causality and transfer entropy are equivalent for Gaussian variables. Phys Rev Lett 103(23), 238701 (2009)
8. Besserve, M., Schoelkopf, B., Logothetis, N.K., Panzeri, S.: Causal relationships between frequency bands of extracellular signals in visual cortex revealed by an information theoretic analysis. J Comput Neurosci 29(3), 547–566 (2010)
9. Bressler, S.L., Richter, C.G., Chen, Y., Ding, M.: Cortical functional network organization from autoregressive modeling of local field potential oscillations. Stat Med 26(21), 3875–3885 (2007)
10. Bressler, S.L., Seth, A.K.: Wiener-Granger causality: A well established methodology. Neuroimage 58(2), 323–329 (2011)
11. Brovelli, A., Ding, M., Ledberg, A., Chen, Y., Nakamura, R., Bressler, S.L.: Beta oscillations in a large-scale sensorimotor cortical network: Directional influences revealed by Granger causality. P Natl Acad Sci USA 101, 9849–9854 (2004)
12. Chamberlain, G.: The general equivalence of Granger and Sims causality. Econometrica 50(3), 569–581 (1982)
13. Chen, Y., Bressler, S., Ding, M.: Frequency decomposition of conditional Granger causality and application to multivariate neural field potential data. J Neurosci Meth 150(2), 228–237 (2006)
14. Chicharro, D.: On the spectral formulation of Granger causality. Biol Cybern 105(5-6), 331–347 (2011)
15. Chicharro, D., Ledberg, A.: Framework to study dynamic dependencies in networks of interacting processes. Phys Rev E 86, 041901 (2012)
16. Chicharro, D., Ledberg, A.: When two become one: The limits of causality analysis of brain dynamics. PLoS One 7(3), e32466 (2012)
17. Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. John Wiley and Sons (2006)
18. Ding, M., Chen, Y., Bressler, S.L.: Granger causality: Basic theory and application to neuroscience. In: Handbook of Time Series Analysis: Recent Theoretical Developments and Applications, pp. 437–460. Wiley-VCH Verlag (2006)
19. Eichler, M.: A graphical approach for evaluating effective connectivity in neural systems. Phil Trans R Soc B 360, 953–967 (2005)
20. Eichler, M.: On the evaluation of information flow in multivariate systems by the directed transfer function. Biol Cybern 94(6), 469–482 (2006)
21. Eichler, M.: Granger causality and path diagrams for multivariate time series. J Econometrics 137, 334–353 (2007)
22. Faes, L., Nollo, G., Porta, A.: Information-based detection of nonlinear Granger causality in multivariate processes via a nonuniform embedding technique. Phys Rev E 83(5), 051112 (2011)
23. Friston, K.J.: Functional and effective connectivity: A review. Brain Connectivity 1(1), 13–36 (2012)
24. Gelfand, I., Yaglom, A.: Calculation of the amount of information about a random function contained in another such function. Am Math Soc Transl Ser 2(12), 199–246 (1959)
25. Geweke, J.F.: Measurement of linear dependence and feedback between multiple time series. J Am Stat Assoc 77(378), 304–313 (1982)
26. Geweke, J.F.: Measures of conditional linear dependence and feedback between time series. J Am Stat Assoc 79(388), 907–915 (1984)
27. Gómez-Herrero, G., Wu, W., Rutanen, K., Soriano, M.C., Pipa, G., Vicente, R.: Assessing coupling dynamics from an ensemble of time series. arXiv:1008.0539v1 (2010)
28. Gourevitch, B., Le Bouquin-Jeannes, R., Faucon, G.: Linear and nonlinear causality between signals: methods, examples and neurophysiological applications. Biol Cybern 95(4), 349–369 (2006)
29. Granger, C.W.J.: Economic processes involving feedback. Information and Control 6, 28–48 (1963)
30. Granger, C.W.J.: Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37(3), 424–438 (1969)
31. Granger, C.W.J.: Testing for causality: A personal viewpoint. J Econ Dynamics and Control 2(1), 329–352 (1980)
32. Guo, S., Seth, A.K., Kendrick, K.M., Zhou, C., Feng, J.: Partial Granger causality - eliminating exogenous inputs and latent variables. J Neurosci Meth 172(1), 79–93 (2008)
33. Hiemstra, C., Jones, J.D.: Testing for linear and nonlinear Granger causality in the stock price-volume relation. J Financ 49(5), 1639–1664 (1994)
34. Hlaváčkova-Schindler, K., Paluš, M., Vejmelka, M., Bhattacharya, J.: Causality detection based on information-theoretic approaches in time-series analysis. Phys Rep 441, 1–46 (2007)
35. Kaminski, M., Blinowska, K.: A new method of the description of the information flow in the brain structures. Biol Cybern 65(3), 203–210 (1991)
36. Kramer, G.: Directed information for channels with feedback. PhD dissertation, Swiss Federal Institute of Technology, Zurich (1998)
37. Kuersteiner, G.: Granger-Sims causality. In: The New Palgrave Dictionary of Economics, 2nd edn. (2008). DOI 10.1057/9780230226203.0665
38. Kullback, S.: Information Theory and Statistics. Dover, Mineola, NY (1959)
39. Lizier, J.T., Prokopenko, M., Zomaya, A.Y.: Local information transfer as a spatiotemporal filter for complex systems. Phys Rev E 77, 026110 (2008)
40. Lütkepohl, H.: New Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin (2006)
41. Marinazzo, D., Pellicoro, M., Stramaglia, S.: Causal information approach to partial conditioning in multivariate data sets. Comput Math Meth Med, 303601 (2012). DOI 10.1155/2012/303601
42. Marko, H.: The bidirectional communication theory - a generalization of information theory. IEEE T Commun 21(12), 1345–1351 (1973)
43. Massey, J.: Causality, feedback and directed information. In: Proc Intl Symp Info Theory Appl, Waikiki, Hawaii, USA (1990)
44. Paluš, M., Komárek, V., Hrnčı́ř, Z., Štěrbová, K.: Synchronization as adjustment of information rates: Detection from bivariate time series. Phys Rev E 63, 046211 (2001)
45. Pearl, J.: Causality: Models, Reasoning, Inference, 2nd edn. Cambridge University Press, New York (2009)
46. Pereda, E., Quian Quiroga, R., Bhattacharya, J.: Nonlinear multivariate analysis of neurophysiological signals. Prog Neurobiol 77, 1–37 (2005)
47. Permuter, H., Kim, Y., Weissman, T.: Interpretations of directed information in portfolio theory, data compression, and hypothesis testing. IEEE Trans Inf Theory 57(3), 3248–3259 (2009)
48. Priestley, M.: Spectral Analysis and Time Series. Academic Press, San Diego (1981)
49. Quinn, C.J., Coleman, T.P., Kiyavash, N., Hatsopoulos, N.G.: Estimating the directed information to infer causal relationships in ensemble neural spike train recordings. J Comput Neurosci 30, 17–44 (2011)
50. Roebroeck, A., Formisano, E., Goebel, R.: The identification of interacting networks in the brain using fMRI: Model selection, causality and deconvolution. NeuroImage 58(2), 296–302 (2011)
51. Rozanov, Y.: Stationary Random Processes. Holden-Day, San Francisco (1967)
52. Schelter, B., Timmer, J., Eichler, M.: Assessing the strength of directed influences among neural signals using renormalized partial directed coherence. J Neurosci Meth 179(1), 121–130 (2009)
53. Schelter, B., Winterhalder, M., Eichler, M., Peifer, M., Hellwig, B., Guschlbauer, B., Lucking, C., Dahlhaus, R., Timmer, J.: Testing for directed influences among neural signals using partial directed coherence. J Neurosci Meth 152(1-2), 210–219 (2006)
54. Schreiber, T.: Measuring information transfer. Phys Rev Lett 85, 461–464 (2000)
55. Sims, C.: Money, income, and causality. American Economic Rev 62(4), 540–552 (1972)
56. Solo, V.: On causality and mutual information. In: Proceedings of the 47th IEEE Conference on Decision and Control, pp. 4939–4944 (2008)
57. Takahashi, D.Y., Baccala, L.A., Sameshima, K.: Information theoretic interpretation of frequency domain connectivity measures. Biol Cybern 103(6), 463–469 (2010)
58. Valdes-Sosa, P., Roebroeck, A., Daunizeau, J., Friston, K.: Effective connectivity: Influence, causality and biophysical modeling. Neuroimage 58(2), 339–361 (2011)
59. Vicente, R., Wibral, M., Lindner, M., Pipa, G.: Transfer entropy: A model-free measure of effective connectivity for the neurosciences. J Comput Neurosci 30, 45–67 (2010)
60. Wiener, N.: The theory of prediction. In: Modern Mathematics for Engineers, pp. 165–190. McGraw-Hill, New York (1956)