Influence functions and their uses in econometrics

Maximilian Kasy

Object of interest: a statistic $\phi(P) \in \mathbb{R}$, which is a function of the distribution $P$ of the vector $X$. Examples include $E[X]$, $\mathrm{Var}(X)$, quantiles of $X$, the Gini coefficient, ...

What does it mean for $\phi$ to be differentiable at $P^0$? Intuitively, that there is a continuous linear functional $D\phi$ such that
\[ \phi(P) \approx \phi(P^0) + D\phi(P - P^0). \]
On function spaces, there are different notions of differentiability; which one is appropriate depends on the context.

Strongest notion of differentiability: the Fréchet derivative at $P^0$,
\[ \lim_{P \to P^0} \frac{(\phi(P) - \phi(P^0)) - D\phi(P - P^0)}{\| P - P^0 \|} = 0, \]
where $D\phi$ is a continuous linear functional with respect to some norm $\|P\|$, for instance the $L^2$ norm on the space of densities, defined by
\[ \|P\| = \sqrt{\int (dP/dP^0)^2 \, dP^0}. \]
The limit has to equal $0$ for all sequences of measures $P$ converging to $P^0$.

A weak notion of differentiability: the directional or Gâteaux derivative,
\[ \lim_{t \to 0} \frac{1}{t} \left[ \big( \phi(P^0 + t(P - P^0)) - \phi(P^0) \big) - t \, D\phi(P - P^0) \right] = 0, \]
where the limit has to equal $0$ for all measures $P$.

Riesz representation theorem: Consider a Hilbert space $H$, i.e., a vector space equipped with an inner product $\langle \cdot, \cdot \rangle$ and with the corresponding norm. Then for any continuous linear functional
\[ \psi : H \to \mathbb{R} \]
there is an element $x \in H$ such that
\[ \psi(y) = \langle x, y \rangle \]
for all $y \in H$. The vector $x$ is the dual representation of the linear functional $\psi$.

Intuition for this theorem when $H = \mathbb{R}^k$ and $\langle x, y \rangle = \sum_i x_i y_i$: We can write any vector $y$ as $y = \sum_i y_i e_i$, where $e_i$ is the $i$th unit vector. By linearity of the functional $\psi$,
\[ \psi(y) = \sum_i y_i \psi(e_i) = \langle x, y \rangle \]
for $x$ defined by $x_i = \psi(e_i)$.

To apply this representation theorem in the present context, take $H$ to be the space of all measurable functions of $X$ which have mean zero and finite variance under $P^0$. This space can be equipped with the inner product
\[ \langle y, z \rangle = E^0[y(X) \cdot z(X)] = \mathrm{Cov}(y(X), z(X)). \]

Consider the derivative $D\phi$ as a functional applied to the relative density $dP/dP^0$, minus the relative density of $P^0$ with respect to itself, which equals $1$. If $D\phi$ is a continuous functional on $H$, the Riesz representation theorem implies the existence of a mean-zero, finite-variance function $IF(X)$, the influence function, such that
\[ D\phi(dP/dP^0 - 1) = E^0[IF(X) \cdot (dP/dP^0)(X)] = \int IF(X) \, (dP/dP^0)(X) \, dP^0(X) = \int IF(X) \, dP(X) = E[IF(X)]. \]
Put differently, we have an approximation of the functional $\phi$ by the mean of $IF$:
\[ \phi(P) \approx \phi(P^0) + E[IF(X)]. \]

Alternative, intuitive derivation of the influence function: Suppose we want an approximation for the plug-in estimator $\hat\phi = \phi(P_n)$, where $P_n$ is the empirical distribution
\[ P_n = \frac{1}{n} \sum_i \delta_{X_i}. \]
Using the approximation given by the derivative, and the linearity of the derivative, we get
\[ \hat\phi \approx \phi(P^0) + D\phi(P_n - P^0) = \phi(P^0) + \frac{1}{n} \sum_i D\phi(\delta_{X_i} - P^0) = \phi(P^0) + E_n[D\phi(\delta_{X_i} - P^0)], \]
where $E_n$ denotes the sample average. This suggests that
\[ IF(X) = D\phi(\delta_X - P^0). \]
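To make this linearization concrete, here is a minimal numerical sketch (an illustration added to these notes, not part of the original; the function names are mine): it approximates $IF(x) = D\phi(\delta_x - P^0)$ by a finite-difference Gâteaux derivative along the path $(1 - \varepsilon) P^0 + \varepsilon \delta_x$, taking $\phi = \mathrm{Var}$ and a discrete $P^0$, and compares the result to the closed form $(x - E[X])^2 - \mathrm{Var}(X)$ that follows from the worked example at the end of these notes.

```python
# Numerical sketch: influence function as a finite-difference Gateaux derivative,
#   IF(x) ~ [phi((1 - eps) P0 + eps delta_x) - phi(P0)] / eps,
# for the variance functional phi(P) = Var(X) and a discrete P0.
import numpy as np

def variance(support, probs):
    """phi(P) = E[X^2] - E[X]^2 for a discrete distribution."""
    mean = np.sum(probs * support)
    return np.sum(probs * support**2) - mean**2

rng = np.random.default_rng(0)
support = rng.normal(size=200)      # support points of a discrete P0
probs = np.full(200, 1 / 200)       # uniform weights

eps = 1e-6
phi0 = variance(support, probs)
mean0 = np.sum(probs * support)

def influence(x):
    """Derivative of phi along the path from P0 toward the point mass at x."""
    mixed_support = np.append(support, x)
    mixed_probs = np.append((1 - eps) * probs, eps)
    return (variance(mixed_support, mixed_probs) - phi0) / eps

for x in (-2.0, 0.0, 3.0):
    # finite-difference IF(x) vs. closed form (x - E[X])^2 - Var(X)
    print(x, influence(x), (x - mean0)**2 - phi0)
```

The two columns should agree up to a finite-difference error of order $\varepsilon$.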
Influence functions play a role in a number of different contexts in econometrics:

• Asymptotic distribution theory, efficiency bounds: One can show that any "regular" estimator $\hat\phi$ of the just-identified parameter $\phi(P)$ is asymptotically equivalent to a linearized plug-in estimator,
\[ \hat\phi \approx \phi(P^0) + E_n[IF(X)]. \]
This implies the asymptotic efficiency bound
\[ n \, \mathrm{Var}(\hat\phi) \to \mathrm{Var}(IF(X)). \]
Tsiatis, A. (2006). Semiparametric theory and missing data. Springer Verlag, in particular chapter 3.
van der Vaart, A. (2000). Asymptotic statistics. Cambridge University Press, chapter 20.

• Robust statistics: Since estimators can be approximated by $\hat\phi \approx \phi(P^0) + E_n[IF(X)]$, the value of $\hat\phi$ can be dominated by a single outlier, even in large samples, unless $IF$ is bounded.
Huber, P. (1996). Robust statistical procedures. Number 68. Society for Industrial Mathematics.

• Distributional decompositions: In labor economics, we are often interested in counterfactual distributions of the form
\[ P(Y) = \int P^1(Y|X) \, dP^2(X), \]
where we observe samples from the distributions $1$ and $2$. In order to estimate $\phi(P)$, we can again use the approximation
\[ \phi(P) \approx \phi(P^2) + \int IF(Y) \, dP(Y) = \phi(P^2) + \int\!\!\int IF(Y) \, dP^1(Y|X) \, dP^2(X) = \phi(P^2) + E^2\big[E^1[IF(Y)|X]\big]. \]
The conditional expectation $E^1[IF(Y)|X]$ can be estimated using regression methods; the expectation with respect to $X$ can be estimated using the $P^2$ sample average of predicted values from the regression.
Firpo, S., Fortin, N., and Lemieux, T. (2009). Unconditional quantile regressions. Econometrica, 77(3):953–973.

• Partial identification: Nonparametric models with endogeneity, such as those discussed in the first part of class, tend to lead to representations of potential outcome distributions of the form
\[ P(Y^d) = \alpha P^1(Y) + (1 - \alpha) P^2(Y^d), \]
where draws from $P^1(Y)$ are observable, while the data are uninformative about $P^2(Y^d)$. A linear approximation to $\phi(P(Y^d))$ then implies
\[ \phi(P(Y^d)) - \phi(P^0) \approx \alpha \, D\phi(P^1(Y) - P^0) + (1 - \alpha) \, D\phi(P^2(Y^d) - P^0). \]
The first term here is identified; the second term can be bounded if and only if $D\phi$ is bounded on the admissible counterfactual distributions $P^2(Y^d)$, the same condition as in robust statistics.
Kasy, M. (2012). Partial identification, distributional preferences, and the welfare ranking of policies. Working paper, section 3.

How can we actually calculate influence functions? The easiest way is via directional derivatives: Consider families of distributions indexed by a parameter $\theta \in \mathbb{R}$: $P(X; \theta)$. Then $\phi(P(\cdot\,; \theta)) : \mathbb{R} \to \mathbb{R}$ is a function we can easily differentiate. Next, do some algebra to get the resulting expression into the form
\[ \frac{\partial \phi}{\partial \theta} = \int IF(X) \, \frac{\partial}{\partial \theta} dP(X; \theta). \]
Finally, normalize, by adding a constant, so that $IF$ has mean zero.

Example:
\[ \phi(P) = \mathrm{Var}(X) = \int X^2 \, dP - \left( \int X \, dP \right)^2. \]
Thus
\[ \frac{\partial \phi}{\partial \theta} = \int X^2 \, \frac{\partial}{\partial \theta} dP - 2 \left( \int X \, dP \right) \int X \, \frac{\partial}{\partial \theta} dP = \int \left( X^2 - 2 E[X] \cdot X \right) \frac{\partial}{\partial \theta} dP. \]
Normalizing, as required, to $E[IF] = 0$, we get
\[ IF(X) = X^2 - 2 E[X] \cdot X - \mathrm{Var}(X) + E[X]^2, \]
since $E[X^2 - 2 E[X] \cdot X] = \mathrm{Var}(X) - E[X]^2$. Equivalently, $IF(X) = (X - E[X])^2 - \mathrm{Var}(X)$.
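As a quick sanity check on this example (again an illustrative sketch, not part of the original notes; the choice of the Exponential(1) reference distribution is mine), the following Monte Carlo confirms that $E[IF(X)] = 0$ and that the plug-in variance estimator satisfies $n \, \mathrm{Var}(\hat\phi) \approx \mathrm{Var}(IF(X))$, as stated in the efficiency-bound bullet above.

```python
# Monte Carlo check of the variance example: with
#   IF(X) = X^2 - 2 E[X] X - Var(X) + E[X]^2,
# we expect E[IF(X)] = 0 and n * Var(phi_hat) -> Var(IF(X))
# for the plug-in estimator phi_hat = Var_n(X).
import numpy as np

rng = np.random.default_rng(1)
n, reps = 2_000, 4_000

# Reference distribution P0 = Exponential(1): E[X] = 1, Var(X) = 1.
mean0, var0 = 1.0, 1.0

def IF(x):
    return x**2 - 2 * mean0 * x - var0 + mean0**2

# Sampling distribution of the plug-in variance estimator.
draws = rng.exponential(scale=1.0, size=(reps, n))
phi_hat = draws.var(axis=1)              # plug-in Var_n in each sample

x = rng.exponential(scale=1.0, size=1_000_000)
print("E[IF(X)]        ~", IF(x).mean())        # should be ~ 0
print("n Var(phi_hat)  ~", n * phi_hat.var())   # should be ~ Var(IF(X))
print("Var(IF(X))      ~", IF(x).var())         # equals 8 for Exponential(1)
```

For Exponential(1) the fourth central moment is $\mu_4 = 9$, so $\mathrm{Var}(IF(X)) = \mu_4 - \sigma^4 = 8$, and both estimates should land close to this value.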