Political Selection and the Relative Age Effect

Daniel Müller and Lionel Page∗
March 14, 2012
Abstract
In this paper we present substantial evidence for the existence of a
bias in the distribution of births of leading US politicians in favor of those
who were the oldest in their cohort at school. This “relative age
effect” has been shown to influence performance at school and in sports,
but evidence on its impact on people’s vocational success has been rare.
We find a marked break in the density of politicians’ birthdates using a
maximum likelihood test and McCrary’s (2008) nonparametric test. We
conjecture that being relatively old in a peer group may create long-term
advantages which can play a significant role in the ability to succeed in a
highly competitive environment like the race for top political offices in the
USA. The magnitude of the effect we estimate is larger than what most
other studies on the relative age effect find for a broader (adult) population,
but is in general in line with studies that look at populations in
high-competition environments.
∗ We thank Justin McCrary, Guido Imbens and Karthik Kalyanaraman for providing us
with the code of their estimators. Contact: School of Economics and Finance,
Queensland University of Technology, Gardens Point Campus. Phone: +61 7 3138 4793.
E-mail: [email protected].
“’Excellence’ is not a gift, but a skill that takes practice.”
Plato
1 Introduction
Research in political economy usually focuses on the incentives people face,
either once they are in office or, more fundamentally, to become a politician
in the first place (Diermeier et al., 2005; Besley and Coate, 1997). Studies
that investigate in detail who the winning politicians are, on the other hand,
are rare. It is only recently that economists have shown interest in whether the
political process selects politicians with specific desirable characteristics
such as ability or trustworthiness (Besley, 2005). Besley argues that the proper
selection of politicians is of crucial concern for a functioning government and
that it is therefore essential to understand exactly what factors make it more
likely for people to run successfully for office. Understanding what individual
qualities and traits allow politicians to reach a top-class position in the
political system is an important goal in itself. A better understanding of the
selection process producing the population of politicians may also contribute to
understanding the way political institutions work in practice. Previous research
has already demonstrated that individual characteristics of leaders can have
real impacts. Malmendier and Tate (2005) show that CEOs with a scientific
background are less likely to be overconfident. Bertrand and Schoar (2003) find
that a company’s policy is significantly influenced by its managers. Jones and
Olken (2005) present evidence that political leaders have a
significant influence on economic growth. In the field of politics, Ferreira and
Gyourko’s (2010) results suggest that female politicians have higher unobserved
political skills but are less able to resolve conflict in their political coalitions.
It is obvious that successful politicians are a subpopulation with specific traits,
different from the average population. One may naturally think, for example,
of wealth, gender, ethnicity or education (Besley et al., 2011) as factors which
give an advantage in the race to top-flight political positions or other leadership
tasks in general. It is also evident that successful politicians have specific talents
such as leadership or charisma which help them lead and motivate their group
of partisans and voters.
In the present paper we argue that selection of politicians is significantly
influenced by their relative position among their peers at school. We provide,
for the first time, evidence of a significant bias in the distribution of births
of leading US politicians - US Congressmen - in favor of those who were the oldest
in their cohort. In line with previous research, we interpret this as indicating
that older children tend to benefit from their advantageous relative position
in their cohort by learning early the skills required for leadership (see for
example Dhuey and Lipscomb (2008)). These learned abilities plausibly help people
to be more competitive and enable them to win competitions, like political elections.
As a result, relative age might have a long-lasting effect on people and their
vocational success.
Our estimates indicate that US Senators and Representatives with a higher
relative age are indeed strongly overrepresented compared to the average American
population as well as compared to a uniform distribution. The point estimate
indicates that US politicians are up to 50% more likely to be among the
oldest in their cohort than the average population. The effect size is similar to
what other studies find in professional sports. This suggests that people’s date
of birth may in some cases have a much longer lasting influence
than previously thought.
Our study contributes to the existing literature mainly along two dimensions:
firstly, as mentioned before, we establish the existence of a relative age effect
among top-flight US American politicians. Secondly, we provide in this paper
a general framework - a combination of a parametric and a nonparametric test
- to analyze any desired population for a relative age effect. Unlike most of the
existing age effect literature we do not rely on rough classifications in months or
quarters of birth, but on a maximum likelihood test which is able to test for a
discontinuity in a potentially continuous density. We implement this test based
on the exact day of birth and cut-off day. The rationale behind our testing
framework is straightforward. We simply look for a break in the relative age
distribution at the specific cut-off date. This test enables us to test not only
against a break in the uniform distribution, but also in the empirically observed
US birth distribution.
The test is easy to implement and to interpret; however, it relies on a linearity
assumption. Therefore we additionally illustrate how to use McCrary’s (2008)
nonparametric density test in our setting. This test does not rely on
any distributional assumption. However, it suffers from the same drawback as
other nonparametric methods: there are free parameters to choose.
We implement his framework using the latest research on optimal parameter
choices. The remainder of this paper is organized as follows. Section 2
presents a brief review of the literature on the relative age effect, Section 3
describes the data we use in more detail, and Section 4 presents the methods we use
and our main results. Section 5 displays some robustness checks and Section 6
concludes.
2 The Relative Age Effect
School entry cut-off dates arbitrarily divide students into different cohorts. As
a consequence, although attending the same class, those born directly after the
cut-off are almost one year older than those born directly before. The fact
that differing relative age within a class affects students along many different
dimensions is supported by a large body of literature. For example, relatively
young students tend to notably underperform at school (Bedard and Dhuey,
2006) and are less likely to enroll at a university (Mühlenweg and Puhani, 2010).
The effect on school performance, although declining for teenage students, is
found to persist until the end of tertiary education. Beyond educational
attainment, the relative age effect also seems to play a significant role in the
sports domain, among professional as well as among youth athletes (see Musch
and Grondin (2001) for a summary of the relative age effect in sports). Relatively
old athletes seem to enjoy a significant competitive advantage compared to
others. However, it is well established that the effect of relative age does not stem
from relatively older students being superior in some absolute sense, but purely
from the fact that students end up being older or younger than their peers. It has
been shown that the relative age effect is independent of the seasonal climatic
conditions and socio-cultural circumstances. Musch and Hay (1999) for instance
substantiate the existence of a relative age effect in professional soccer in four
different countries with different climatic conditions at the cut-off date. They are
even able to show that the effect remains stable after a within-country change
in the cut-off date (see also Sykes et al. (2009) for instance).
Although there is so far no generally accepted theory, several explanations
have been put forward for this undisputed phenomenon. One part of the
story might be that lower physical strength and height compared to one’s peers
is likely to be a source of frustration and feelings of inferiority. In line with
this, studies find that students who are younger within their cohort are more
likely to commit suicide and face a higher rate of psychological disorders, see
Thompson et al. (1991), Goodman et al. (2003) and Evans et al. (2010). On the
other hand, the results in Dhuey and Lipscomb (2008) suggest that relatively
older students are more likely to become a leader of a high school club. Being
older than their peers might help them to take a leadership position among
them.
Studies that deal with the potential long-term effects of relative age beyond
school and university attainment outside of sports, for example in labor market
outcomes, are considerably less frequent. Fredriksson and Öckert (2005) find
that being older at school entry slightly increases earnings in later life. Du et al.
(2009) study a population probably most similar to ours. They demonstrate by
looking at a sample of US American CEOs that these leaders seem to have
a higher likelihood of being born directly after the cut-off date than directly
before. They use quarterly dummies, which is common practice in the
literature but leads to a loss of information, and they do not account for changing
cut-off dates over the years. By looking at students at an Italian university,
Billari and Pellizzari (2008) find a negative effect, if any, of being relatively
old on performance. Taken together this yields an interesting pattern: studies
that deal with the outcomes of a broader population after school often find
small to insignificant age effects,¹ whereas studies that deal with special, very
successful populations (like Du et al. (2009) and Musch and Hay (1999), for
instance) typically find a strong and significant effect.² Studies that deal with
labor market outcomes may fail to find such an effect because they deal with
a broader population where the relative age effect is not as pronounced as in
environments with more intense competition. In the upper tail of the ‘success
distribution’ it is possible that every detail counts more and that, in some sense,
every ingredient needs to have been right. In such a case, the relative age effect
may matter for exactly that reason.
3 The Data
We employ a dataset that includes the birthdays and states of birth of most US
Senators and members of the House of Representatives, as of June
2011. These data are publicly available. We dropped those observations for which
we were not able to determine the state in which the politician went to
kindergarten and primary school. For instance, a few politicians were born in a
foreign country, some moved across state borders during their childhood, and
for some the information provided on their homepages was simply not enough
to identify the state where he or she went to school and kindergarten. Moreover,
we dropped those born in states without statewide school entry cut-off dates.
To be able to calculate the relative age of the Congressmen, we draw on
school entry cut-off data for all 50 US states from the US Education Commission
of the States. From our communication with staff members of the Education
Commission of the States we know that the same cut-off dates for kindergarten
¹ It needs to be stressed again that all studies consistently find an effect for children and younger students.
² And in fact, by looking at US politicians we find an effect of similar size to that found in earlier studies of professional sportsmen.
and schools apply in each state. Unfortunately, these dates only go back to
1975 and most politicians were born before. Therefore we also employ cut-off
dates collected by Dhuey and Lipscomb (2008) for the period preceding 1975.
Table 1 in their paper depicts state cut-off dates for kindergarten entrance for
1947-1949, 1959 and 1967-1969. We assign each politician to one of the four
cut-off dates that is closest to his or her birthday. Including these dates gives us
443 observations, which is more than 80% of the total number of Congressmen.
There is no reason to believe that the sample we have is biased in any obvious
way.
Moreover, we also make use of monthly birth frequency data for the period
from 1990 to 2008, taken from the US Department of Health and Human Services.
This dataset covers all births in the US in the given period. We take monthly
averages over the whole period of 19 years to approximate the true underlying
birth distribution over the year. A deviation from the uniform distribution
might for example stem from seasonal effects in child-bearing behavior. These
data can also be combined with the educational background of
the mother, available from the same source. This allows us to test not only
against a deviation from the uniform distribution but also against the general
and “educated people only” empirical birth distributions.
4 Methods and Results
4.1 Sawtooth Test against Uniform Distribution
The idea of the testing procedures we use is simple. If being relatively old is
beneficial for developing leadership qualities and improving competitiveness, we
would expect to observe more relatively old politicians than young ones, with a
discontinuity in the birthday distribution at the cut-off date. To test this we
implement a parametric maximum likelihood version of the Bayesian “Sawtooth
test” (Barnett and Dobson, 2009).
[Figure 1: Stylized Sawtooth Density. Horizontal axis: Relative Age, from −1/2 to 1/2; vertical axis: Density. The density falls with slope −d and jumps by d at relative age 0.]

Firstly, we define relative age more formally. In general, it is defined to
be the relative distance between person i’s birthday and the relevant cut-off
date that person is subject to. We compute the relative age of person i as follows:
let co_i be the cut-off date person i is exposed to and x_i his or her birthday, both
normalized such that co_i, x_i ∈ [0, 1].³ We then define relative age y_i as
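\[
y_i =
\begin{cases}
(x_i - co_i) + 1 & \text{if } (x_i - co_i) < -\tfrac{1}{2} \\
(x_i - co_i)     & \text{if } -\tfrac{1}{2} \le (x_i - co_i) \le \tfrac{1}{2} \\
(x_i - co_i) - 1 & \text{if } \tfrac{1}{2} < (x_i - co_i).
\end{cases}
\tag{1}
\]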
Thus y_i ∈ [−1/2, 1/2], where the “critical date” (the date that separates the
relatively oldest from the youngest) can be found at y = 0. The relatively old
politicians can accordingly be found in the positive area to the right of zero and
the young politicians to the left of zero. The choice of interval is somewhat
arbitrary; nevertheless, the fact that the interval has a length of one facilitates interpretation of
the results.
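As an illustration, equation (1) can be sketched in a few lines of Python. This is a minimal sketch and not the code used for the estimates reported here: the function names year_fraction and relative_age are hypothetical, and mapping a date to its day-of-year share (so that the 31st of December maps to 1) is an assumed normalization.

```python
from datetime import date


def year_fraction(d: date) -> float:
    """Map a calendar date to (0, 1]; 31 December maps to 1 (assumed normalization)."""
    days_in_year = date(d.year, 12, 31).timetuple().tm_yday
    return d.timetuple().tm_yday / days_in_year


def relative_age(x: float, co: float) -> float:
    """Equation (1): wrap the distance x - co into the interval [-1/2, 1/2]."""
    diff = x - co
    if diff < -0.5:
        return diff + 1.0
    if diff > 0.5:
        return diff - 1.0
    return diff


# Hypothetical example: a cut-off on 1 September and a birthday two weeks later.
cutoff = year_fraction(date(1950, 9, 1))
birthday = year_fraction(date(1950, 9, 15))
y = relative_age(birthday, cutoff)  # small and positive: relatively old
```

Positive values of y correspond to births shortly after the cut-off (the relatively old), negative values to births shortly before it.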
The test is implemented as follows: we write the density of yi , the relative
age of person i, as
\[
f(y_i) = 1 - d\,y_i - \frac{d}{2} + d\,\mathbf{1}_{[y_i \ge 0]}.
\tag{2}
\]
Since yi is normalized to a unit interval, d is not only the height of the break
but also the (negative) slope of the density function. A stylized graph of this
³ For example, if i is born on the 31st of December, then x_i = 1. If the relevant cut-off date for i were the 30th of June, then co_i = 1/2.
density is given in Figure 1. The coefficient d is estimated by maximizing the
log-likelihood function
\[
L = \sum_{i=1}^{N} \ln\!\left[ 1 - d\,y_i - \frac{d}{2} + d\,\mathbf{1}_{[y_i \ge 0]} \right],
\tag{3}
\]
with respect to d, where N is the sample size.
If the relative age of politicians is roughly uniformly distributed across the
year, we would expect d to be close to zero. In that case we would have to conclude
that there is in fact no relative age effect among US politicians. If, however,
politicians do have a higher likelihood of being born directly after the cut-off date, we would
expect d to be significantly larger than zero.
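The maximization of (3) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation behind the reported results: all function names are hypothetical, the maximizer is found by bisection on the score (valid because the density is linear in d, so the log-likelihood is concave in d), and the p-value comes from a likelihood-ratio test against d = 0, which is an assumed choice since the text does not specify the test statistic. The simulated sample only serves to exercise the code.

```python
import math
import random


def sawtooth_loglik(d, ys):
    """Log-likelihood (3) of the sawtooth density (2)."""
    return sum(math.log(1.0 - d * y - d / 2 + (d if y >= 0 else 0.0)) for y in ys)


def sawtooth_score(d, ys):
    """Derivative of (3) with respect to d; f(y) = 1 + d * g with g as below."""
    total = 0.0
    for y in ys:
        g = (1.0 if y >= 0 else 0.0) - 0.5 - y
        total += g / (1.0 + d * g)
    return total


def sawtooth_mle(ys, lo=-1.99, hi=1.99, tol=1e-10):
    """Bisection on the score, which is decreasing in d on (-2, 2)."""
    if sawtooth_score(lo, ys) <= 0:
        return lo
    if sawtooth_score(hi, ys) >= 0:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sawtooth_score(mid, ys) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def lr_pvalue(d_hat, ys):
    """Likelihood-ratio p-value for H0: d = 0 (assumed test; log-lik at d = 0 is 0)."""
    lr = max(2.0 * sawtooth_loglik(d_hat, ys), 0.0)
    return math.erfc(math.sqrt(lr / 2.0))  # chi-square(1) upper tail


def simulate_sawtooth(n, d, seed):
    """Rejection sampler for density (2); used only to generate test data."""
    rng = random.Random(seed)
    f_max = 1.0 + abs(d) / 2
    ys = []
    while len(ys) < n:
        y = rng.uniform(-0.5, 0.5)
        f = 1.0 - d * y - d / 2 + (d if y >= 0 else 0.0)
        if rng.uniform(0.0, f_max) < f:
            ys.append(y)
    return ys


sample = simulate_sawtooth(n=5000, d=0.4, seed=1)
d_hat = sawtooth_mle(sample)
p_value = lr_pvalue(d_hat, sample)
```

On real data, ys would be the relative ages from equation (1); the estimate d_hat is then read as both the break height and the negative slope.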
Figure 2 depicts the results. The estimated d-coefficient for the sawtooth
test and the corresponding p-value are given in the upper left graph (“Month
+0”). The coefficient is equal to 0.42 and the p-value is 1.1%, so the estimate is
clearly significant at the 5% level. A coefficient of d = 0.42 means a big deviation
from the uniform distribution: the estimated density is 1 + d/2 ≈ 1.2 directly after
the cut-off date and 1 − d/2 ≈ 0.8 directly before it, which roughly implies that
the observations have, at the limit, a (1.2 − 0.8)/0.8 = 50% higher likelihood of
being born directly after the cut-off date than directly before. These results
provide strong support for the existence of a
relative age effect among US politicians.
This effect is large and is in the same range as the effects found in studies
on professional sport athletes. Barnett (2010) for example finds an effect of
78% for AFL players between January and December, around the cut-off date.
Musch and Hay (1999) find an effect not smaller than 66% up to more than
200% for professional soccer players in Japan, Australia, Germany and Brazil
(they are using quarters of birth). Helsen et al. (2005) estimate the effect size to
be larger than 370% for a sample of more than 2100 soccer players in national
youth selection teams across Europe. Edgar and O’Donoghue (2005) find an
effect of the order of 80% by looking at professional tennis players (they also
use quarters).⁴ Thus our estimate, although large, is not outstanding relative
to the literature on the relative age effect in professional sport.
⁴ All these effect sizes are compared to a uniform distribution.
[Figure 2: Sawtooth Test Against Uniform Distribution. Twelve panels, “Cut-off Month+0” to “Cut-off Month+11”, each plotting the estimated density against Relative Age (from −0.5 to 0.5). The estimated break height and the slope are given by d, p is the corresponding p-value. The upper left graph stands for the test at the hypothesized cut-off (d = .42; p = .01). The other eleven graphs depict pseudo-tests where the potential break point is shifted by the equivalent of one month in each step; their estimates range from d = −.29 to d = .22, with p-values between .08 and .89. N = 443.]

As a robustness test we run “pseudo-tests”, where we shift the assumed
breakpoint by the equivalent of one month in each step.⁵ The other eleven
graphs in Figure 2 show the results of these tests. As can be seen, the
pseudo-tests show no break significant at the 5% level and all estimated pseudo
coefficients are considerably smaller than 0.42. This is strong evidence that a
discontinuity occurs specifically at around the cut-off.

⁵ Since the variables are normalized to a unit interval, a plus one month move corresponds to a shift of 1/12 in our case.
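The shift used in the pseudo-tests can be sketched as below. This is only an illustration: the function name shift_breakpoint is hypothetical, and the sign convention (a placebo break k months after the true cut-off moves every relative age by −k/12 before wrapping) is an assumption about how the shift is applied.

```python
def shift_breakpoint(y: float, months: int) -> float:
    """Relative age measured from a placebo break `months`/12 after the cut-off."""
    shifted = y - months / 12.0
    # The calendar year is circular, so wrap back into [-1/2, 1/2).
    return (shifted + 0.5) % 1.0 - 0.5


# Hypothetical relative ages, re-centred on each of the eleven placebo breaks.
relative_ages = [-0.45, -0.10, 0.02, 0.30]
placebo_samples = {
    k: [shift_breakpoint(y, k) for y in relative_ages] for k in range(1, 12)
}
```

Each re-centred sample can then be passed to the same sawtooth estimation as the original relative ages.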
4.2 Generalized Sawtooth Test
While we find a large estimated break in the density, a problem with that analysis
might be that we test implicitly against the null hypothesis of a uniform
distribution of y_i. However, the data generating process underlying birth
frequencies is not exactly uniform, for instance due to seasonal effects or strategic
child-bearing behavior. Another concern might be that more educated parents
who are aware of the relative age effect endogenously “sort” the births of their
children right after the cut-off date, in order to give them a natural advantage.
If this is true, it would confound an educational effect (children
of more educated parents are more likely to get more education themselves, and
more educated people are more likely to be found in leading positions
than less educated ones) with a potential relative age effect. To control for this
possibility, we generalize the sawtooth test from section 4.1 as follows:
Let h(x_i) be the null hypothesis density of observing a birth at date x_i for
politician i. Then the density of y_i becomes:
\[
f(y_i) = h(x_i) - d\,y_i - \frac{d}{2} + d\,\mathbf{1}_{[y_i \ge 0]}.
\tag{4}
\]
We approximate the null hypothesis density h(x_i) by the monthly birth frequency
data from the US Department of Health and Human Services, averaged
over all 50 states and over the whole period 1990 to 2008 (see section 3), and
then rerun the Sawtooth test. All politicians were born before 1990; we therefore
rely on the identification assumption that the birth distribution is
sufficiently stable over time to give a good approximation.⁶
Figure 3 depicts the monthly birth frequencies for 1990-2008.⁷ The
interpretation of the coefficient remains the same.

⁶ Arguably we are unable to measure the exact birth distribution during the period of the birth of the Congressmen. We conducted stability tests in which we compared monthly frequencies of different subsamples of the 1990-2008 series and found that the frequencies remained virtually unchanged. In any case, any difference between the actual distribution of births at the time of birth of the Congressmen and the one we use would only tend to create an attenuation bias, making our estimates conservative.
⁷ Practically, we have birth frequency data on a monthly basis, whereas we need data on a daily basis. We computed the average birth frequency of each day as the frequency of the month, divided by 30.

[Figure 3: Birth Frequencies in the US. Monthly birth frequencies in the US, January to December (horizontal axis: Months; vertical axis: Frequencies). Aggregated data for the period of 1990 to 2008.]

As mentioned in Section 3, we are also able to combine empirical birth
frequency data with the educational background of the mother and thereby
examine the differences in birth distributions of educated and uneducated people.
In addition to the actual 1990-2008 monthly birth frequencies, we use two
definitions of “educated” to construct two other null hypothesis distributions with
data from the same period: we define “educated” first as holding at least a
high school degree and then, secondly, as holding at least a college degree. This
allows us to minimize concerns about sorting.
The results strongly support our conclusion from section 4.1: all three
coefficient estimates fall slightly to d = 0.4, but are still very high, with a p-value
of 1.5%. The stability of our point estimates alleviates concerns that our results
are driven by an incorrect distributional assumption or sorting effects. Table 1
summarizes the results.
Null Hypothesis Distribution      Coefficient Estimate
Uniform Distribution              0.42** (1.1%)
Empirical Birth Distribution      0.40** (1.5%)
At least high school degree       0.40** (1.5%)
At least college degree           0.40** (1.5%)

Table 1: Generalized Sawtooth Test. Coefficient estimates with p-values in
brackets. N = 443. *, ** and *** indicate significance at the 10%, 5% and
the 1% level respectively.
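A sketch of the generalized test in equation (4) is given below. It is an illustration under assumptions rather than the estimation code behind Table 1: the function names are hypothetical, the null density h is taken to be piecewise constant over twelve equal-length months (a simplification of the daily frequencies described in the footnote), and the maximizer is found by ternary search, which relies on the log-likelihood being concave in d and on h staying well above zero over the search range.

```python
import math


def monthly_density(shares):
    """Piecewise-constant density h on [0, 1] from 12 monthly birth shares."""
    total = sum(shares)
    heights = [12.0 * s / total for s in shares]

    def h(x):
        return heights[min(int(x * 12), 11)]

    return h


def generalized_loglik(d, xs, ys, h):
    """Log-likelihood implied by equation (4); -inf where the density is not positive."""
    ll = 0.0
    for x, y in zip(xs, ys):
        f = h(x) - d * y - d / 2 + (d if y >= 0 else 0.0)
        if f <= 0:
            return float("-inf")
        ll += math.log(f)
    return ll


def generalized_mle(xs, ys, h, lo=-1.5, hi=1.5, iters=100):
    """Ternary search for the maximizer of a concave function of d."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if generalized_loglik(m1, xs, ys, h) < generalized_loglik(m2, xs, ys, h):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# With equal monthly shares h is identically 1 and (4) reduces to (2).
uniform_h = monthly_density([1.0] * 12)
d_check = generalized_mle([0.3, 0.7], [0.1, -0.1], uniform_h)
```

Replacing the equal shares by observed monthly birth shares (for all births, or for mothers with a given level of education) gives the three alternative null distributions.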
4.3 A Nonparametric Test
The results presented in sections 4.1 and 4.2 draw an interesting picture of the
presence of a relative age effect among US politicians. We think the MLE test
presented above is very well suited to this context. However, an implicit
assumption underlying this test is that the influence of the relative age effect
on the probability of becoming a successful politician changes linearly with the
distance of one’s birthday from the relevant cut-off date, here the breakpoint at
y_i = 0. This might not always be a justifiable assumption and it is moreover not
a necessary condition for the existence of a relative age effect.⁸ To address this
issue, we also implement Justin McCrary’s (2008) nonparametric continuous
density test.
His test was initially designed to validate the continuity assumption of the
assignment variable in a regression discontinuity (RDD) framework, which is
a sufficient condition for identification. The idea is to test for a jump in the
density of the assignment variable at the relevant cut-off point to check for the
validity of this standard RDD assumption. In that respect this test is also well
suited in our setting, because we are interested in testing for a potential jump
in the density of relative age. For nonparametric identification of this jump the
limit value of the density of the variable of interest (in our case relative age)
from the left and from the right at the cut-off point needs to be estimated.
Standard kernel density estimators suffer from a strong boundary bias. For
this reason they are not well suited for situations where the interest lies in
the points at the boundary itself. McCrary’s test tackles this problem in a
two-step procedure: first the data are “binned”, that is, a histogram of the
variable of interest is calculated, with no bin overlapping the suspected point of
discontinuity. In the second step, nonparametric methods are used to estimate
the density endpoints from both sides, as proposed by Hahn et al. (2001). Local
polynomial regression of order one combined with a triangle kernel was proven
⁸ It could be that this effect only shows up for those born four weeks after and four weeks before the cut-off date, for example. These circumstances could render the Sawtooth Test insignificant when in fact there is an effect.
to have the best properties for estimating points at the boundary (see Cheng
et al. (1997) and Fan and Gijbels (1992)). Besides, Fan and Gijbels (1996)
illustrate that in general polynomials with odd order have better Mean Squared
Error properties than ones with even order and more recently Porter (2003)
was able to prove that local linear estimators are rate optimal. This is exactly
how we (and McCrary) implement the estimation of the jump point. Details
of the calculation of the standard errors are given in the web appendix on
McCrary’s homepage. Under standard nonparametric regularity conditions this
estimator is consistent and asymptotically normal.
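The two steps described above can be sketched as follows. This is a simplified illustration and not McCrary’s code: the function names are hypothetical, standard errors are omitted, and the bin grid (bins of the form [kb, (k+1)b), so that no bin straddles zero) is an assumption about the first step. The second step is a local linear fit at the boundary with a triangle kernel, applied separately to the bins on each side of zero.

```python
import math


def binned_density(ys, binsize):
    """Step 1: histogram heights at bin midpoints; bin k covers [k*b, (k+1)*b)."""
    n = len(ys)
    counts = {}
    for y in ys:
        k = math.floor(y / binsize)
        counts[k] = counts.get(k, 0) + 1
    return [((k + 0.5) * binsize, counts.get(k, 0) / (n * binsize))
            for k in range(min(counts), max(counts) + 1)]


def boundary_fit(points, bandwidth):
    """Step 2: intercept of a triangle-kernel weighted linear fit at distance 0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    used = 0
    for dist, height in points:
        w = max(0.0, 1.0 - dist / bandwidth)
        if w > 0:
            used += 1
        s0 += w
        s1 += w * dist
        s2 += w * dist * dist
        t0 += w * height
        t1 += w * dist * height
    if used < 2:
        raise ValueError("need at least two bins inside the bandwidth")
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def log_density_break(ys, binsize, bandwidth):
    """Log difference of the estimated density just right and just left of 0."""
    hist = binned_density(ys, binsize)
    right = [(m, f) for m, f in hist if m > 0]
    left = [(-m, f) for m, f in hist if m < 0]
    return math.log(boundary_fit(right, bandwidth)) - math.log(boundary_fit(left, bandwidth))


# Artificial step density: height 1.2 to the right of 0 and 0.8 to the left.
ys_step = ([(j + 0.5) / 1200 for j in range(600)]
           + [-(j + 0.5) / 800 for j in range(400)])
theta_step = log_density_break(ys_step, binsize=0.05, bandwidth=0.3)
# theta_step is close to ln(1.2) - ln(0.8)
```

The artificial example has a flat density on each side, so the local linear fit recovers the two heights and the log difference directly.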
4.3.1 The Binsize Choice
McCrary’s procedure has the advantage that no functional form assumption
about the underlying density is necessary. However it has the drawback that
the researcher has to choose smoothing parameters, which can be crucial for
the outcome. In the McCrary test there are even two: the binsize for the
histogram and the bandwidth of the local polynomial regression. He recommends
visual inspection of the first and second step diagrams guided by some automatic
procedure (like cross validation, see Stone (1974)) to reach an optimal choice.
We implement every parameter decision as theory-driven as possible. We use
the Asymptotic Mean Integrated Squared Error (A-MISE) optimal histogram
binsize here as a benchmark. The A-MISE basically tries to balance the tradeoff
between the bias and the variance inherent in the optimal binsize problem (see
for example Härdle (1991) for details). Formally, the MISE of a histogram is
defined to be
\[
\mathrm{MISE}\big(\hat f_b(y)\big) = E\left[ \int_{-\infty}^{\infty} \big( \hat f_b(y) - f(y) \big)^2 \, dy \right],
\tag{5}
\]
where {y_i}_{i=1}^n ∼ f(y) is an i.i.d. sample, b is the binsize and f̂_b is the
estimated histogram. Intuitively, the MISE-optimal binsize minimizes the expected
distance between the true density function and the histogram. The Asymptotic-MISE
optimal binsize b* then results from minimizing (5) with respect to b and
neglecting all terms that are asymptotically equal to zero:
\[
b^{*} = \left( \frac{6}{n\,\|f'(y)\|_2^2} \right)^{1/3}.
\tag{6}
\]
In general the A-MISE optimal binsize is of limited practical use as the term
||f′(y)||₂² is unknown. However, proceeding on the assumption that our model
estimated in section 4.2 is a sufficiently good approximation of the true density,
we are able to use the fact that ||f′(y)||₂ ≈ 0.4.⁹ Then, empirically, the binsize
estimator turns out to be:
\[
b^{*} = \left( \frac{6}{443 \cdot 0.4^{2}} \right)^{1/3} \approx 0.44.
\tag{7}
\]
This will admittedly result in a rather oversmoothed histogram. Nevertheless,
this binsize choice has the theoretical advantage of minimizing the A-MISE.¹⁰
Intuitively, this makes sense since we are dealing with a distribution
which has a very similar slope over the whole support. McCrary generally starts
from a more bell-shaped distribution, in which case the slope changes distinctly
over the support. However, he also stresses the irrelevance of the first
step binsize choice for the asymptotic normal approximation of his estimator.
Nevertheless, we will implement a large range of different smaller binsizes to
check the sensitivity of the results.
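The arithmetic behind the benchmark binsize can be checked with a short sketch. The function name amise_binsize is hypothetical; the inputs are the sample size N = 443 and the squared norm 0.4² taken from the text.

```python
def amise_binsize(n: int, deriv_norm_sq: float) -> float:
    """Equation (6): b* = (6 / (n * ||f'||_2^2)) ** (1/3)."""
    return (6.0 / (n * deriv_norm_sq)) ** (1.0 / 3.0)


# Inputs from the text: n = 443 and ||f'(y)||_2 of roughly 0.4.
b_star = amise_binsize(443, 0.4 ** 2)
```

Rounded to two decimals this reproduces the value of 0.44 in equation (7); a smaller squared norm (a flatter density) gives a larger optimal binsize.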
4.3.2 The Bandwidth Choice
The second free parameter to choose is the bandwidth in the second step local
polynomial regression. Recently Imbens and Kalyanaraman (2009) (IK henceforth)
proposed an optimal bandwidth for the estimation around boundary
⁹ If the uniform distribution were the true data generating process, ||f′(y)||₂² would in fact be equal to zero and thus the A-MISE optimal binsize would be infinity, which amounts to choosing a one-bin histogram.
¹⁰ We are aware of the fact that, strictly speaking, the optimal binsize from A-MISE minimization is not necessarily optimal in our framework. This is so because the A-MISE is a “global criterion”, whereas we are only interested in the boundary points. This argument is analogous to Imbens and Kalyanaraman’s (2009) argument for the optimal bandwidth in a regression discontinuity design.
points in a RDD setting. Their suggestion is based on the observation that
the MISE is not necessarily the relevant criterion in settings where the interest does not lie in the whole function but only in the boundary points. This
notion goes at least back to Ludwig and Miller (2005). IK’s basic idea is to
use a criterion function that minimizes the expected squared difference between
the RDD estimator and the true parameter value itself instead of minimizing
the MSE over the whole support. IK are able to show that their bandwidth
is asymptotically optimal and outperforms the only other bandwidth selection
procedure especially designed for regression discontinuity design suggested by
Ludwig and Miller (2005), which is based on cross validation. For this reason
we implement IK’s suggestion in the following.
4.3.3 Results
We test for the existence of a break in the relative age distribution using McCrary’s test, including the modifications explained before. Using IK’s optimal
bandwidth estimator for the second step, we try a range of different histogram
binsizes guided by the A-MISE optimal binsize, see section 4.3.1. The estimated
coefficient θ now stands for the log difference of the density at y_i = 0, estimated
from the right and from the left:
\[
\hat\theta = \ln \lim_{y_i \to 0^{+}} \hat f(y_i) - \ln \lim_{y_i \to 0^{-}} \hat f(y_i),
\tag{8}
\]
where f̂(·) is the density estimated via local linear regression using
the triangle kernel K(z) = (1 − |z|) 1[|z| ≤ 1] and y_i ∈ [−1/2, 1/2] again stands for
relative age.¹¹
The first column in Table 2 depicts the results with different binsizes and the
corresponding IK optimal bandwidth. We start out with the A-MISE optimal
binsize and reduce it successively in 0.05 steps down to a binsize which is by
a factor of ten smaller than the first suggestion. As mentioned before, we
defined relative age such that y ∈ [− 12 , 21 ]. However the optimal bandwidths
11 We thank Justin McCrary and IK for providing us with the code of their tests.
Break Estimate θ    Binsize    IK Bandwidth
0.48***             0.44       2.72
0.44***             0.40       2.54
0.33***             0.35       2.20
0.34***             0.30       1.78
0.31**              0.25       1.06
0.41*               0.20       0.49
0.36**              0.15       0.95
0.63*               0.10       0.26
0.56*               0.05       0.27
0.36**              0.04       0.72
Table 2: McCrary (2008) test for break in relative age distribution for different
binsizes. The bandwidth is calculated according to Imbens and Kalyanaraman
(2009). *,** & *** indicate significance at the 10%, 5% & the 1% level respectively. θ is the estimated log difference of the density from the left and from the
right, estimated using local linear regression with triangle kernel. N = 443.
[Plot: estimated density (vertical axis) against relative age (horizontal axis, −0.5 to 0.5).]
Figure 4: McCrary density estimate and confidence intervals of relative age of
US politicians. The binsize is 0.04 and the bandwidth is 0.72.
are sometimes larger than 0.5, especially for large binsizes, which means that
observations outside that interval are in general taken into account. Since our
data is by nature circular, we estimated densities over y ∈ [−1, 1], which means that
we have the same observations to the left and to the right of 0 (but we calculated
the two optimal parameters using y ∈ [−1/2, 1/2]). In cases where the bandwidth
is larger than one, the estimate simply gets close to an OLS estimate (Fan and
Gijbels, 1996).
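One possible reading of this circular extension is sketched below in Python. The function name is hypothetical and the construction is an assumption based on the description above, not the code actually used: each relative age in [−1/2, 1/2] is kept and additionally copied one full year away, so that the same set of observations appears on both sides of 0.

```python
def wrap_to_double_interval(ys):
    """Extend relative ages in [-1/2, 1/2] periodically to [-1, 1].

    Every observation is kept and copied once, shifted by one year
    towards the opposite side of the cut-off (assumed construction).
    """
    extended = list(ys)
    for y in ys:
        extended.append(y + 1.0 if y < 0 else y - 1.0)
    return extended

# Two observations become four, two on each side of the cut-off.
print(sorted(wrap_to_double_interval([-0.25, 0.25])))
```

Under this reading a bandwidth above 0.5 draws on wrapped copies of observations from the other end of the year instead of running off the edge of the support.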
The point estimate using the optimal binsize and bandwidth is 0.48 and
significant at the one percent level. It is depicted in the first row of Table 2. To
see that the point estimates are largely in line with the parametric test, note
that the estimated Sawtooth coefficient is the same as ln(1.2) − ln(0.8) ≈ 0.41.
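The comparison can be reproduced with a few lines of Python. The sketch rests on an illustrative assumption that is not spelled out in this section: a Sawtooth coefficient d is taken to mean a density of 1 + d/2 just to the right of the cut-off and 1 − d/2 just to the left. The values 1.2 and 0.8 above then correspond to d = 0.4. The function name is hypothetical.

```python
import math

def sawtooth_log_break(d):
    """Log density difference implied by a jump from 1 - d/2 to 1 + d/2 (assumed parametrization)."""
    return math.log(1 + d / 2) - math.log(1 - d / 2)

print(round(sawtooth_log_break(0.4), 2))  # ln(1.2) - ln(0.8), prints 0.41
```

On this scale the parametric coefficient and the nonparametric θ̂ can be compared directly.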
As can be seen, the other estimated coefficients always have the expected
sign and are always significant at the 10% level.12 However, the point estimate θ
is not as stable as the MLE suggested before, which is not surprising given that
12 Strictly speaking, since we stated a clear hypothesis we are facing a one-sided test. Thus
significance at the 10% level is sufficient here.
we employ a wide range of different binsizes. In all cases, the nonparametrically
estimated curve looks quite similar to the plotted Sawtooth graph from sections
4.1 and 4.2. Graphical analysis also suggests that there is indeed a clear deviation from the uniform distribution, with the estimated density giving noticeably
more probability mass to observations to the right of the cut-off date than to
the left. This finding is represented in Figure 4.
5 Robustness Tests
5.1 Excluding Older Age Groups
To assess the robustness of our results we perform two types of checks. Firstly,
since we suspect that the data from Dhuey and Lipscomb (2008) may not be as reliable
as the official cut-off data from 1975, we exclude Congressmen who were born
before 1960. To the remaining ones we assign the 1975 cut-off dates. The
rationale is to include only observations that went to school in a 10-year window
around 1975 and to use only the official Education Commission data (assuming
that school enrollment is at age five). This reduces the sample size to 125. With
this reduced sample, using only the more reliable cut-off dates, we find that the
coefficient increases to d = 0.6 with a p-value close to 5% when conducting the
Sawtooth test.13 Table 3 presents the results in column one.
We also perform the McCrary test on this reduced sample. As binsize we use
the A-MISE optimal one (“McCrary1”) and half of it (“McCrary2”). To estimate
the optimal binsize we use the sawtooth estimate d = 0.6 to approximate
f′(y). The bandwidth for the second step is each time calculated according to
IK. Both tests display a large and significant break in the density (see Table
3). All in all, it seems the point estimates increase rather than decrease when
more uncertain observations are excluded.
13 We also performed robustness checks in which we narrowed the window
around 1975 even further. The estimated coefficient remains stable between
0.5 and 0.7, whereas the p-value increases as the sample size goes to zero.
5.2 Excluding Precarious States
Secondly, we want to rule out the possibility that our results are driven by states
where the recorded cut-off dates seem suspicious. In order to do so, we
exclude all observations from states which had “too many” changes in cut-off
dates over time. As mentioned before, we rely on cut-off dates given by Dhuey
and Lipscomb (2008) for three different points in time, in addition to the dates
from the Education Commission for 1975. We say a state has “too many”
different cut-off dates when we are confronted with more than two different cut-off dates for the four points in time. If we had excluded all states with
more than one change, we would have had to delete almost every observation; if we
had used at least three changes as the reference point, we would have excluded
almost no observations. That is why two changes are in our opinion the right
compromise. With this exclusion, the sample size reduces to N = 340.14
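The exclusion rule can be written down as a short Python sketch. The function name, state labels, and dates are made up for illustration and are not taken from the data. A state is flagged when its four recorded cut-off dates contain more than two distinct values.

```python
def states_to_exclude(cutoffs_by_state, max_distinct=2):
    """Return the states with more than `max_distinct` distinct recorded cut-off dates."""
    return {
        state
        for state, dates in cutoffs_by_state.items()
        if len(set(dates)) > max_distinct
    }

# Hypothetical cut-off dates (month-day) at the four points in time.
example = {
    "State A": ["09-01", "09-01", "09-01", "09-01"],  # never changed
    "State B": ["09-01", "10-01", "12-01", "12-31"],  # four distinct dates
    "State C": ["09-01", "09-01", "12-01", "12-01"],  # two distinct dates
}
print(states_to_exclude(example))  # {'State B'}
```

Changing `max_distinct` gives the stricter or looser thresholds discussed above.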
With this restriction, the estimated coefficient goes up to d = 0.50 with significance
at the 1% level. The Sawtooth test against the general empirical distribution
yields very similar and clearly significant results. The nonparametric test is
implemented as well. Both estimated coefficients are high. The first one is
significant at the one percent level. The second one narrowly fails to be significant (p = 10.2%),
but the coefficient still has a similar magnitude.
To sum up, these robustness tests support that the results from the sections before are neither driven by a subgroup of states that appear to have
unreasonable cut-off dates nor by measurement error in the more unreliable,
hand-collected cut-off dates from Dhuey and Lipscomb (2008). In fact, just the
opposite appears to be true: the point estimates increase even more once we
exclude data which is likely to be badly measured. This enhances the credibility of the initial estimates. Table 3 summarizes the results of both robustness
sections.
14 We see this kind of problem in 13 different states and we consequently exclude 103 observations.
Test                                       Excluding Older     Excluding Precarious States
Sawtooth against Uniform Distribution      0.60* (5.2%)        0.50*** (0.7%)
Sawtooth against Empirical Distribution    0.52* (9.0%)        0.49*** (0.8%)
McCrary1                                   0.75*** (0.00%)     0.57*** (0.00%)
McCrary2                                   0.55** (1.87%)      0.44 (10.2%)
N                                          125                 340
Table 3: Sawtooth and McCrary tests for a break. McCrary’s test displays
the log difference. P-values in brackets. McCrary1 is McCrary’s test
with the A-MISE optimal binsize and McCrary2 is conducted using half of
this binsize. The bandwidth is calculated according to IK. *,** & *** indicate
significance at the 10%, 5% & the 1% level respectively.
5.3 Measurement Error and Redshirting
The previous sections revealed that removing observations that seemed to be more
prone to measurement error increases rather than decreases the size of
the estimated effect. Still, we can hardly assume that there is no measurement error
left at all. Measurement error will in fact bias the slope estimate towards zero,
since the source of errors is in the cut-off date. Thus the estimates are significant
despite measurement error and not because of it.
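The direction of this bias can be illustrated with a small simulation in Python. It is a sketch under stated assumptions, not part of the analysis: the function name is hypothetical, the true density is a simple step with a share `p_right` of births to the right of the cut-off, and a wrongly recorded cut-off is modeled as shifting the recorded relative age by a random fraction of a year. The share of observations recorded to the right of the cut-off serves as a crude stand-in for the slope estimate.

```python
import random

def share_right_of_cutoff(n, p_right, error_rate, seed=0):
    """Simulated share of observations recorded to the right of the cut-off.

    With probability `error_rate` the cut-off is recorded wrongly, which is
    modeled (an assumption) as a uniform circular shift of relative age.
    """
    rng = random.Random(seed)
    right = 0
    for _ in range(n):
        # True relative age: step density with more mass right of the cut-off.
        y = rng.uniform(0, 0.5) if rng.random() < p_right else rng.uniform(-0.5, 0)
        if rng.random() < error_rate:
            # Wrong cut-off: shift by a random fraction of a year, wrap to [-1/2, 1/2).
            y = (y + rng.random() + 0.5) % 1.0 - 0.5
        if y >= 0:
            right += 1
    return right / n

clean = share_right_of_cutoff(100_000, p_right=0.6, error_rate=0.0)
noisy = share_right_of_cutoff(100_000, p_right=0.6, error_rate=0.3)
print(clean, noisy)  # the noisy share lies closer to 0.5
```

In this setup the misrecorded observations are uniformly distributed, so the recorded share is a mixture of the true share and 0.5, and the apparent break shrinks in proportion to the error rate.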
In that regard, another issue is also worth noting: in our analysis we implicitly assumed that all Senators and Representatives entered school at the
designated age. In that sense we are only working with “assigned relative age”
as defined by Bedard and Dhuey (2006).15 But what if this is not the case?
The behavior called “academic redshirting” denotes the fact that parents,
especially academic ones, sometimes decide to postpone the school entry of
their children by one year. Bedard and Dhuey (2006) state that about 5% of children in the US enter kindergarten one year later. Among these children, those
born into high-status families and born in the last quarter before the cut-off
are overrepresented. If redshirting by families from a higher social background
is widespread, it will de facto dampen our estimated effect towards zero in the
case where children who are relatively young in their cohort are redshirted with
a higher probability. If redshirting happens at random relative to the age in a
cohort, it will not have any effect.
Redshirting might also be an explanation why the parametric test in general
displays stronger evidence for a break in the data than the nonparametric test:
the former takes all observations into account to measure a potential break in the
data, whereas the latter tries to estimate the break only in a neighborhood around
the hypothesized cut-off. As a consequence, the nonparametric test is more prone
to confounding effects around the cut-off, like redshirting or measurement error.
This also suggests that the influence of relative age is not only a matter of
15 They define “assigned relative age” in the same manner as we define “relative age”: as
the distance of an individual’s birthday relative to the state-wide school entry cut-off, ignoring
any kind of redshirting behavior.
being among the very oldest or very youngest, but shows up - somewhat less
pronounced - among the remaining students as well.
6 Conclusion
Elected politicians are arguably important and powerful people in a democratic
system. They have the right to pass new legislation and often to elect important
figures in the political system. Thus they play a crucial role in the democratic
process. Despite their general importance, little to nothing is known about how
people choose to become politicians and what traits are necessary or at least
beneficial in gaining such a position of leadership. This study is, to the best of
our knowledge, the first that tries to systematically identify an underlying reason
shaping selection into political offices. However, we think this finding can only
be the starting point for a broader research agenda investigating politicians’
attributes and their consequences for political outcomes more closely.
Alongside the empirical work, we also presented a general statistical framework that is in our opinion a significant improvement compared to the existing
literature. We use a parametric and nonparametric test to test for a jump in
the density around the cut-off. The prevailing studies usually rely on dummies
and t tests or chi squared tests of distributional equality, using coarse classifications in months or even quarters of birth. Chi squared tests also suffer from low
power and are thus not suited to detect smaller effect sizes. Using cut-offs and
birthdays on a daily basis, as we do, increases the information used and thus also
enhances the reliability of the estimates. Quite frequently, studies using US data
assume September the first to be the cut-off. However, our data and the data in
Dhuey and Lipscomb (2008) show that cut-off dates are in fact time-varying and
more or less spread all over the year for different states, which makes September
the first unlikely to be a good approximation. The maximum likelihood test we
use is also applicable to all sorts of similar situations where the interest lies in
estimating a break point in a density, when there is a reasonable null hypothesis
distribution. This test is useful, either as a complementary robustness check to
nonparametric methods or because one is not willing to make crucial parameter
choices.
The evidence presented in the previous sections also adds to the growing literature on the impact relative age has beyond school age. It is known that
professional athletes enjoy an advantage if they are among the relatively old, but
evidence for areas outside of sports has so far been scarce. Competition
is a necessary condition for the existence of a relative age effect and the political competition for a seat in the US Senate or House of Representatives is
arguably one of the toughest. In that sense it might not even be surprising to find
similar indications in that very special population. We would conjecture that
being relatively old among their peers during their youth helps people to develop leadership skills, perhaps by learning how to take initiative and lead their peers.
Small initial differences could accumulate, as those successful at taking the lead
among their peers may learn to like such a position and grow in confidence
about their ability to do so. If the social skills required to be successful in politics can be learned, those who were older in their cohort may be more likely to
have accumulated such skills.
The magnitude of the effect we find is larger than what most other studies on the
relative age effect for a broader (adult) population find, but is in general in line with
research that looks at populations in high-competition environments such as
professional sports or CEOs. This is possibly due to the fact that small initial
differences in advantages may see their effect compounded in highly competitive
environments. We reason that the relative age effect is unlikely to play an
important role for the larger adult population, but evidently exists in settings
where the stakes are high and small things are critical.
The reported findings leave an interesting question unanswered: are politicians selected efficiently or, put differently, does selection on relative age lead to
an outcome where the best possible candidates are in office? Similarly, one may wonder whether some individuals with the potential to be good
political decision makers may never make it to an official position, because they
developed fewer of the skills needed to get elected in the first place due to the fact that they were
relatively young at school. A potential concern might therefore be that a loss
in efficiency occurs because the allocation of talents induced by such a relative age
effect is suboptimal. The answer depends on which view one takes on “leadership abilities”, which we have not defined here in more detail. If the abilities
gained through a higher relative age are purely non-productive (for example
charisma, presentation and persuasion skills that increase election probabilities
but are unrelated to expert knowledge), then selection on relative age is indeed
inefficient and another, welfare-enhancing selection mechanism is theoretically
conceivable. If, however, the skill set acquired in effect enhances the quality and
productivity of a politician (for instance superior motivation and guidance of
employees; improved abilities to efficiently lead a political discussion), then selection on relative age is desirable. In general, the aim should be to elect the
most educated, able and smartest candidates, those with the greatest farsightedness,
and not those who are the best speakers and have the most charisma. A welfare-enhancing
selection mechanism should explicitly stress the importance of factors that are directly relevant for political performance. For that, more research
on characteristics of politicians and their consequences needs to be conducted.
Besley et al. (2011) is in our view an important step in that direction.
References
Barnett, A. (2010). The relative age effect in australian football league players.
Unpublished working paper.
Barnett, A. and Dobson, A. (2009). Analysing Seasonal Health Data. Springer
Verlag.
Bedard, K. and Dhuey, E. (2006). The persistence of early childhood maturity: International evidence of long-run age effects. The Quarterly Journal of
Economics, 121(4):1437 – 1472.
Bertrand, M. and Schoar, A. (2003). Managing with style: The effect of managers
on firm policies. The Quarterly Journal of Economics, 118(4):1169–1208.
Besley, T. (2005). Political selection. The Journal of Economic Perspectives,
19(3):43–60.
Besley, T. and Coate, S. (1997). An economic model of representative democracy. The Quarterly Journal of Economics, 112(1):85–114.
Besley, T., Montalvo, J., and Reynal-Querol, M. (2011). Do educated leaders
matter? The Economic Journal, 121(554):F205–227.
Billari, F. and Pellizzari, M. (2008). The younger, the better? relative age
effects at university. IZA Discussion Paper No. 3795.
Cheng, M., Fan, J., and Marron, J. (1997). On automatic boundary corrections.
The Annals of Statistics, 25(4):1691–1708.
Dhuey, E. and Lipscomb, S. (2008). What makes a leader? relative age and
high school leadership. Economics of Education Review, 27(2):173–183.
Diermeier, D., Keane, M., and Merlo, A. (2005). A political economy model of
congressional careers. American Economic Review, pages 347–373.
Du, Q., Gao, H., and Levi, M. (2009). Born leaders: The Relative-Age effect
and managerial success. In AFA 2011 Denver Meetings Paper.
Edgar, S. and O’Donoghue, P. (2005). Season of birth distribution of elite tennis
players. Journal of sports sciences, 23(10):1013–1020.
Evans, W., Morrill, M., and Parente, S. (2010). Measuring inappropriate medical diagnosis and treatment in survey data: The case of ADHD among schoolage children. Journal of Health Economics, 29(5):657–673.
Fan, J. and Gijbels, I. (1992). Variable bandwidth and local linear regression
smoothers. The Annals of Statistics, pages 2008 – 2036.
Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications,
volume 66. Chapman & Hall/CRC.
Ferreira, F. and Gyourko, J. (2010). Does gender matter for political leadership?
the case of US mayors. Philadelphia, PA: Wharton School, University of
Pennsylvania.
Fredriksson, P. and Öckert, B. (2005). Is early learning really more productive?
the effect of school starting age on school and labor market performance. IZA
Discussion Paper No. 1659.
Goodman, R., Gledhill, J., and Ford, T. (2003). Child psychiatric disorder and
relative age within school year: cross sectional survey of large population
sample. BMJ, 327(7413):472.
Hahn, J., Todd, P., and Van der Klaauw, W. (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica,
69(1):201–209.
Härdle, W. (1991). Smoothing techniques: with implementation in S. Springer.
Helsen, W., Van Winckel, J., and Williams, A. (2005). The relative age effect
in youth soccer across europe. Journal of Sports Sciences, 23(6):629–636.
Imbens, G. and Kalyanaraman, K. (2009). Optimal bandwidth choice for the
regression discontinuity estimator. Technical report, National Bureau of Economic Research.
Jones, B. and Olken, B. (2005). Do leaders matter? national leadership and
growth since world war II. The Quarterly Journal of Economics, 120(3):835–
864.
Ludwig, J. and Miller, D. (2005). Does head start improve children’s life
chances? evidence from a regression discontinuity design. Technical report,
National Bureau of Economic Research.
Malmendier, U. and Tate, G. (2005). CEO overconfidence and corporate investment. The Journal of Finance, 60(6):2661–2700.
McCrary, J. (2008). Manipulation of the running variable in the regression
discontinuity design: A density test. Journal of Econometrics, 142(2):698–
714.
Mühlenweg, A. and Puhani, P. (2010). The evolution of the school-entry age
effect in a school tracking system. Journal of Human Resources, 45(2):407–
438.
Musch, J. and Grondin, S. (2001). Unequal competition as an impediment to
personal development: A review of the relative age effect in sport. Developmental review, 21(2):147–167.
Musch, J. and Hay, R. (1999). The relative age effect in soccer: Cross-cultural
evidence for a systematic discrimination against children born late in the
competition year.
Porter, J. (2003). Estimation in the regression discontinuity model. Unpublished
Manuscript, University of Wisconsin at Madison.
Stone, M. (1974). Cross-validation and multinomial prediction. Biometrika,
61(3):509–515.
Sykes, E., Bell, J., and Rodeiro, C. (2009). Birthdate effects: A review of the
literature from 1990-on. Unpublished paper, University of Cambridge.
Thompson, A., Barnsley, R., and Stebelsky, G. (1991). Born to play ball: the
relative age effect and major league baseball. Sociology of Sport Journal,
8:146–151.