Political Selection and the Relative Age Effect

Daniel Müller and Lionel Page ∗

March 14, 2012

Abstract

In this paper we present substantial evidence for the existence of a bias in the distribution of births of leading US politicians in favor of those who were the oldest in their cohort at school. This “relative age effect” has been shown to influence performance at school and in sports, but evidence on its impact on people’s vocational success has been rare. We find a marked break in the density of politicians’ birthdates using a maximum likelihood test and McCrary’s (2008) nonparametric test. We conjecture that being relatively old in a peer group may create long-term advantages which can play a significant role in the ability to succeed in a highly competitive environment like the race for top political offices in the USA. The magnitude of the effect we estimate is larger than what most other studies on the relative age effect find for a broader (adult) population, but is in general in line with studies that look at populations in high-competition environments.

∗ We thank Justin McCrary, Guido Imbens and Karthik Kalyanaraman for providing us with the code of their estimators. Contact: School of Economics and Finance, Queensland University of Technology. Phone: + 61 7 3138 4793, Gardens Point Campus. E-mail: [email protected].

“’Excellence’ is not a gift, but a skill that takes practice.” Plato

1 Introduction

Research in political economy usually focuses on the incentives people face, either once they are in office or, more fundamentally, when deciding whether to become a politician at all (Diermeier et al., 2005; Besley and Coate, 1997). Studies that investigate in detail who the winning politicians are, on the other hand, are rare. It is only recently that economists have taken an interest in whether the political process selects politicians with specific desirable characteristics such as ability or trustworthiness (Besley, 2005).
Besley argues that the proper selection of politicians is of crucial concern for a functioning government and that it is therefore essential to understand exactly which factors make it more likely for people to run successfully for office. Understanding which individual qualities and traits allow politicians to reach a top-class position in the political system is also an important goal in itself. A better understanding of the selection process producing the population of politicians may also contribute to understanding the way political institutions work in practice.

Previous research has already demonstrated that individual characteristics of leaders can have real impacts. Malmendier and Tate (2005) show that CEOs with a scientific background are less likely to be overconfident. Bertrand and Schoar (2003) find that a company’s policy is significantly influenced by its managers. Jones and Olken (2005) present evidence that political leaders have a significant influence on economic growth. In the field of politics, Ferreira and Gyourko’s (2010) results suggest that female politicians have higher unobserved political skills but are less able to resolve conflict in their political coalitions.

It is obvious that successful politicians are a subpopulation with specific traits, different from the average population. One may naturally think, for example, of wealth, gender, ethnicity or education (Besley et al., 2011) as factors which give an advantage in the race to top-flight political positions or other leadership tasks in general. It is also evident that successful politicians have specific talents such as leadership or charisma which help them lead and motivate their group of partisans and voters.

In the present paper we argue that the selection of politicians is significantly influenced by their relative position among their peers at school.
We give, for the first time, evidence of a significant bias in the distribution of births of leading US politicians - US Congressmen - in favor of those who were the oldest in their cohort. In line with previous research, we interpret this as indicating that older children tend to benefit from their advantageous relative position in their cohort to learn early the skills required for leadership (see for example Dhuey and Lipscomb (2008)). Evidently, these learned abilities help people to be more competitive and enable them to win competitions, like political elections. As a result, relative age might have a long-lasting effect on people and their vocational success.

Our estimates indicate that US Senators and Representatives with a higher relative age are indeed substantially overrepresented compared to the average American population as well as compared to a uniform distribution. The point estimate indicates that US politicians are up to 50% more likely to be among the oldest in their cohort than the average population. The effect size is similar to what other studies find in professional sports. This suggests that people’s date of birth may in some cases have a much longer-lasting influence than previously thought.

Our study contributes to the existing literature mainly along two dimensions. Firstly, as mentioned before, we establish the existence of a relative age effect among top-flight US politicians. Secondly, we provide a general framework - a combination of a parametric and a nonparametric test - to analyze any desired population for a relative age effect. Unlike most of the existing age effect literature we do not rely on rough classifications in months or quarters of birth, but on a maximum likelihood test which is able to test for a discontinuity in a potentially continuous density. We implement this test based on the exact day of birth and cut-off day.
The rationale behind our testing framework is straightforward. We simply look for a break in the relative age distribution at the specific cut-off date. This enables us to test not only against a break in the uniform distribution, but also against a break in the empirically observed US birth distribution. The test is easy to implement and to interpret; however, it relies on a linearity assumption. Therefore we additionally illustrate how to use McCrary’s (2008) nonparametric density test in our setting. This test does not rely on any distributional assumption. However, it suffers from the same drawback as other nonparametric methods, namely that there are free parameters to choose. We implement his framework using the latest research on optimal parameter choices.

The remainder of this paper is organized as follows. Section 2 presents a brief review of the literature on the relative age effect, Section 3 describes the data we use in more detail, and Section 4 presents the methods we use and our main results. Section 5 displays some robustness checks and Section 6 concludes.

2 The Relative Age Effect

School entry cut-off dates arbitrarily divide students into different cohorts. As a consequence, although attending the same class, those born directly after the cut-off are almost one year older than those born directly before. The fact that differing relative age within a class affects students along many different dimensions is supported by a huge body of literature. For example, relatively young students tend to notably underperform at school (Bedard and Dhuey, 2006) and are less likely to enroll at a university (Mühlenweg and Puhani, 2010). The effect on school performance, although declining for teenage students, is found to persist until the end of tertiary education.
Beyond educational attainment, the relative age effect also seems to play a significant role in the sports domain, among professional as well as among youth athletes (see Musch and Grondin (2001) for a summary of the relative age effect in sports). Relatively old athletes seem to enjoy a significant competitive advantage compared to others. However, it is well established that the effect of relative age does not stem from relatively older students being superior in some absolute sense, but purely from the fact that students end up being older or younger than their peers. It has been shown that the relative age effect is independent of seasonal climatic conditions and socio-cultural circumstances. Musch and Hay (1999), for instance, substantiate the existence of a relative age effect in professional soccer in four different countries with different climatic conditions at the cut-off date. They are even able to show that the effect remains stable after a within-country change in the cut-off date (see also Sykes et al. (2009) for instance).

Although there is so far no generally accepted theory, several explanations have been adduced for this undisputed phenomenon. One part of the story might be that lower physical strength and height compared to one’s peers is likely to be a source of frustration and feelings of inferiority. In line with this, studies find that students who are younger within their cohort are more likely to commit suicide and face a higher rate of psychological disorders, see Thompson et al. (1991), Goodman et al. (2003) and Evans et al. (2010). On the other hand, the results in Dhuey and Lipscomb (2008) suggest that relatively older students are more likely to become the leader of a high school club. Being older than their peers might help them to take a leadership position among them.
Studies that deal with the potential long-term effects of relative age beyond school and university attainment and outside of sports, for example in labor market outcomes, are considerably less frequent. Fredriksson and Öckert (2005) find that being older at school entry slightly increases earnings in later life. Du et al. (2009) study a population probably most similar to ours. Looking at a sample of US CEOs, they demonstrate that these leaders seem to have a higher likelihood of being born directly after the cut-off date than directly before. They use quarterly dummies, which is common practice in the literature but leads to a loss of information, and they do not account for changing cut-off dates over the years. Looking at students at an Italian university, Billari and Pellizzari (2008) find a negative effect, if any, of being relatively old on performance.

Taken together this yields an interesting pattern: studies that deal with the outcomes of a broader population after school often find small to insignificant age effects,[1] whereas studies that deal with special, very successful populations (like Du et al. (2009) and Musch and Hay (1999) for instance) typically find a strong and significant effect.[2] Studies that deal with labor market outcomes may fail to find such an effect because they deal with a broader population where the relative age effect is not as pronounced as in environments with more intense competition. In the upper tail of the ’success distribution’ it is possible that every detail counts more and that, in some sense, every ingredient needs to have been right. In such a case, the relative age effect may matter for exactly that reason.

3 The Data

We employ a dataset that includes the birthdays and state of birth of most US Senators and members of the House of Representatives, as of June 2011. These data are publicly available.
We dropped those observations for which we were not able to determine the state in which the politician attended kindergarten and primary school. For instance, a few politicians were born in a foreign country, some moved across state borders during their childhood, and for some the information provided on their homepages was simply not enough to identify the state where he or she went to school and kindergarten. Moreover, we dropped those born in states without statewide school entry cut-off dates. To be able to calculate the relative age of the Congressmen, we draw on school entry cut-off data for all 50 US states from the US Education Commission of the States. From our communication with staff members of the Education Commission of the States we know that, within each state, the same cut-off dates apply to kindergarten and schools. Unfortunately, these dates only go back to 1975 and most politicians were born before. Therefore we also employ cut-off dates collected by Dhuey and Lipscomb (2008) for the period preceding 1975. Table 1 in their paper depicts state cut-off dates for kindergarten entrance for 1947-1949, 1959 and 1967-1969. We assign each politician to the one of the four cut-off dates that is closest to his or her birthday. Including these dates gives us 443 observations, which is more than 80% of the total number of Congressmen. There is no reason to believe that the sample we have is biased in any obvious way.

[1] It needs to be stressed again that all studies consistently find an effect for children and younger students.

[2] In fact, by looking at US politicians we find an effect of similar size to that found in earlier studies of professional sportsmen.

Moreover, we also make use of monthly birth frequency data for the period from 1990 to 2008 taken from the US Department of Health and Human Services. This dataset covers all births in the US in the given period.
We take monthly averages over the whole period of 19 years to approximate the underlying birth distribution over the year. A deviation from the uniform distribution might, for example, stem from seasonal effects in child-bearing behavior. These data can also be combined with the educational background of the mother, available from the same source. This allows us to test not only against a deviation from the uniform distribution but also against the general and the “educated people only” empirical birth distributions.

4 Methods and Results

4.1 Sawtooth Test against Uniform Distribution

The idea of the testing procedures we use is simple. If being relatively old is beneficial for developing leadership qualities and improving competitiveness, we would expect to observe more relatively old politicians than young ones, with a discontinuity in the birthday distribution at the cut-off date. To test this we implement a parametric maximum likelihood version of the Bayesian “Sawtooth test” (Barnett and Dobson, 2009).

[Figure 1: Stylized Sawtooth Density. Horizontal axis: relative age, from $-\tfrac{1}{2}$ to $\tfrac{1}{2}$; vertical axis: density; annotations: break height $d$, slope $-d$.]

Firstly, we define relative age more formally. In general, it is defined to be the relative distance between person $i$’s birthday and the relevant cut-off date that person is subject to. We compute the relative age of person $i$ as follows. Let $co_i$ be the cut-off date person $i$ is exposed to and $x_i$ his or her birthday, both normalized such that $co_i, x_i \in [0, 1]$.[3] We then define relative age $y_i$ as

$$
y_i =
\begin{cases}
(x_i - co_i) + 1 & \text{if } (x_i - co_i) < -\tfrac{1}{2} \\
(x_i - co_i) & \text{if } -\tfrac{1}{2} \le (x_i - co_i) \le \tfrac{1}{2} \\
(x_i - co_i) - 1 & \text{if } \tfrac{1}{2} < (x_i - co_i)
\end{cases}
\tag{1}
$$

Thus $y_i \in [-\tfrac{1}{2}, \tfrac{1}{2}]$, and the “critical date” (the date that separates the relatively oldest from the youngest) can be found at $y = 0$. The relatively old politicians can accordingly be found in the positive area to the right of zero and the relatively young politicians to the left of zero.
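The definition in equation (1) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `year_fraction` and `relative_age` are hypothetical, and mapping a calendar date to the unit interval by dividing the day of the year by the number of days in that year is one possible normalization (under it, a birthday on the 31st of December maps to 1). The dates in the usage lines are made-up values, not observations from our data.

```python
import calendar
from datetime import date


def year_fraction(day: date) -> float:
    """Map a calendar date to (0, 1]; 31 December maps to 1 (assumed normalization)."""
    days_in_year = 366 if calendar.isleap(day.year) else 365
    return day.timetuple().tm_yday / days_in_year


def relative_age(x: float, co: float) -> float:
    """Relative age y in [-1/2, 1/2] from birthday x and cut-off co, as in equation (1)."""
    diff = x - co
    if diff < -0.5:
        return diff + 1.0  # birthday falls after the cut-off once the year wraps around
    if diff > 0.5:
        return diff - 1.0  # birthday falls before the cut-off once the year wraps around
    return diff


# Illustrative values only: a hypothetical cut-off on 1 September and three birthdays.
cutoff = year_fraction(date(1950, 9, 1))
for birthday in (date(1950, 9, 15), date(1950, 8, 15), date(1950, 1, 10)):
    y = relative_age(year_fraction(birthday), cutoff)
    print(birthday.isoformat(), round(y, 3))
```

A birthday shortly after the cut-off yields a small positive value (relatively old), a birthday shortly before it a small negative value (relatively young).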
The interval is chosen rather arbitrarily; nevertheless, the fact that it has a length of one facilitates the interpretation of the results. The test is implemented as follows: we write the density of $y_i$, the relative age of person $i$, as

$$
f(y_i) = 1 - d\,y_i - \frac{d}{2} + d\,1_{[y_i \ge 0]}. \tag{2}
$$

Since $y_i$ is normalized to a unit interval, $d$ is not only the height of the break but also the (negative) slope of the density function. A stylized graph of this density is given in Figure 1.

[3] For example, if $i$ is born on the 31st of December, then $x_i = 1$. If the relevant cut-off date for $i$ were the 30th of June, then $co_i = \tfrac{1}{2}$.

The coefficient $d$ is estimated by maximizing the log-likelihood function

$$
L = \sum_{i=1}^{N} \ln\left[ 1 - d\,y_i - \frac{d}{2} + d\,1_{[y_i \ge 0]} \right] \tag{3}
$$

with respect to $d$, where $N$ is the sample size. If the relative age of politicians is roughly uniformly distributed, we would expect $d$ to be equal to zero, and we would have to conclude that there is in fact no relative age effect among US politicians. If, however, politicians have a higher likelihood of being born directly after the cut-off date, we would expect $d$ to be significantly larger than zero.

Figure 2 depicts the results. The estimated $d$-coefficient for the sawtooth test and the corresponding p-value are given in the upper left graph (“Month +0”). The coefficient is equal to 0.42 and the p-value is 1.1%, so the estimate is clearly significant at the 5% level. A coefficient of $d = 0.42$ means a big deviation from the uniform distribution. By equation (2), the density approaches $1 + d/2 \approx 1.2$ from the right of zero and $1 - d/2 \approx 0.8$ from the left, which roughly implies that the observations have, at the limit, a $\frac{1.2 - 0.8}{0.8} = 50\%$ higher likelihood of being born directly after the cut-off date than directly before. These results provide strong support for the existence of a relative age effect among US politicians. This effect is large and is in the same range as the effects found in studies on professional athletes. Barnett (2010), for example, finds an effect of 78% for AFL players between January and December, around the cut-off date.
Musch and Hay (1999) find an effect not smaller than 66% and up to more than 200% for professional soccer players in Japan, Australia, Germany and Brazil (they use quarters of birth). Helsen et al. (2005) estimate the effect size to be larger than 370% for a sample of more than 2100 soccer players in national youth selection teams across Europe. Edgar and O’Donoghue (2005) find an effect of the order of 80% by looking at professional tennis players (they also use quarters).[4] Thus our estimate, although large, is not outstanding relative to the literature on the relative age effect in professional sport.

[4] All these effect sizes are compared to a uniform distribution.

[Figure 2: Sawtooth Test Against Uniform Distribution. Twelve panels, “Cut-off Month+0” to “Cut-off Month+11”, each plotting the estimated density (0.8 to 1.2) against relative age (−0.5 to 0.5) with its d and p annotated. The panel “Cut-off Month+0” shows d=.42; p=.01; the eleven pseudo-test panels show d between −.29 and .22, with p between .08 and .89.]

Figure 2: The estimated break height and the slope are given by d, p is the corresponding p-value. The upper left graph stands for the test at the hypothesized cut-off. The other eleven graphs depict pseudo-tests where the potential break point is shifted by the equivalent of one month in each step. N = 443.

As a robustness test we run “pseudo-tests”, where we shift the assumed breakpoint by the equivalent of one month in each step.[5] The other eleven graphs in Figure 2 show the results for these tests. As can be seen, the pseudo-tests show no break significant at the 5% level and all estimated pseudo coefficients are considerably smaller than 0.42. This is strong evidence that a discontinuity occurs specifically around the cut-off.

[5] Since the variables are normalized to a unit interval, a plus one month move corresponds to a shift of $\tfrac{1}{12}$ in our case.

4.2 Generalized Sawtooth Test

While we find a large estimated break in the density, a problem with that analysis might be that we test implicitly against the null hypothesis of a uniform distribution of $y_i$. However, the data generating process underlying birth frequencies is not exactly uniform, for instance due to seasonal effects or strategic child-bearing behavior. Another concern might be that more educated parents who are aware of the relative age effect endogenously “sort” the births of their children right after the cut-off date, in order to give them a natural advantage. If this is true, it would confound an educational effect (children of more educated parents are more likely to get more education themselves, and more educated people are more likely to be found in leading positions than less educated ones) with a potential relative age effect.

To control for this possibility, we generalize the sawtooth test from Section 4.1 as follows. Let $h(x_i)$ be the null hypothesis density of observing a birth at date $x_i$ for politician $i$.
Then the density of $y_i$ becomes:

$$
f(y_i) = h(x_i) - d\,y_i - \frac{d}{2} + d \cdot 1_{[y_i \ge 0]}. \tag{4}
$$

We approximate the null hypothesis density $h(x_i)$ by the monthly birth frequency data from the US Department of Health and Human Services, averaged over all 50 states and over the whole period 1990 to 2008 (see Section 3), and then rerun the Sawtooth test. All politicians were born before 1990; we therefore rely on the identification assumption that the birth distribution is sufficiently stable over time to get a good approximation.[6] Figure 3 depicts the monthly birth frequencies for 1990-2008.[7] The interpretation of the coefficient remains the same.

[6] Arguably we are unable to measure the exact birth distribution during the period of the birth of the Congressmen. We conducted stability tests by comparing monthly frequencies of different subsamples of the 1990-2008 series and found that the frequencies remained virtually unchanged. In any case, any difference between the actual distribution of births at the time of birth of the Congressmen and the one we use would only tend to create an attenuation bias, making our estimates conservative.

[7] Practically, we have birth frequency data on a monthly basis, whereas we need data on a daily basis. We computed the average birth frequency of each day as the frequency of the month, divided by 30.

[Figure 3 panel: “Birth Frequencies in the US”. Horizontal axis: months, Jan to Dec; vertical axis: frequencies, 0.02 to 0.1.]

Figure 3: Monthly birth frequencies in the US. Aggregated data for the period of 1990 to 2008.

As mentioned in Section 3, we are also able to combine empirical birth frequency data with the educational background of the mother and thereby examine the differences in the birth distributions of educated and uneducated people. In addition to the actual 1990-2008 monthly birth frequencies, we use two definitions of “educated” to construct two further null hypothesis distributions with data from the same period: we define “educated” first as holding at least a high school degree and, secondly, as holding at least a college degree. This allows us to minimize concerns about sorting.

The results strongly support our conclusion from Section 4.1: all three coefficient estimates are slightly reduced to d = 0.4, but are still very high, with a p-value of 1.5%. The stability of our point estimates alleviates concerns that our results are driven by an incorrect distributional assumption or by sorting effects. Table 1 summarizes the results.

Null Hypothesis Distribution      Coefficient Estimate
Uniform Distribution              0.42** (1.1%)
Empirical Birth Distribution      0.40** (1.5%)
At least high school degree       0.40** (1.5%)
At least college degree           0.40** (1.5%)

Table 1: Generalized Sawtooth Test. Coefficient estimates with p-values in brackets. N = 443. *, ** and *** indicate significance at the 10%, 5% and 1% level respectively.

4.3 A Nonparametric Test

The results presented in Sections 4.1 and 4.2 draw an interesting picture of the presence of a relative age effect among US politicians. We think the MLE test presented above is very well suited to that context. However, an implicit assumption underlying this test is that the influence of the relative age effect on the probability of becoming a successful politician changes linearly in the distance between one’s birthday and the relevant cut-off date, here the breakpoint at $y_i = 0$. This might not always be a justifiable assumption, and it is moreover not a necessary condition for the existence of a relative age effect.[8] To address this issue, we also implement Justin McCrary’s (2008) nonparametric continuous density test.
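Before turning to the nonparametric test, the parametric test of Sections 4.1 and 4.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the estimates reported above: all function names are hypothetical, the maximizer is a plain ternary search (which is valid because the log-likelihood is concave in d), the default search bounds of ±1.9 are only guaranteed to keep the density positive under the uniform null, and no p-value is computed. Passing `h=None` corresponds to the uniform null of equation (3); passing a list of null densities $h(x_i)$ corresponds to equation (4). The helper `sawtooth_quantile` only serves to generate synthetic data with a known break.

```python
import math


def sawtooth_loglik(d, y, h=None):
    """Log-likelihood of equation (4); h=None uses the uniform null h(x_i) = 1, i.e. equation (3)."""
    total = 0.0
    for i, yi in enumerate(y):
        base = 1.0 if h is None else h[i]
        density = base - d * yi - d / 2.0 + (d if yi >= 0 else 0.0)
        if density <= 0.0:
            return -math.inf  # d lies outside the admissible range for this sample
        total += math.log(density)
    return total


def fit_sawtooth(y, h=None, lo=-1.9, hi=1.9, tol=1e-8):
    """Maximize the concave log-likelihood over d by ternary search on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if sawtooth_loglik(m1, y, h) < sawtooth_loglik(m2, y, h):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return 0.5 * (lo + hi)


def sawtooth_quantile(u, d):
    """Inverse CDF of the density in equation (2) for d != 0 (synthetic data only)."""
    left_mass = 0.5 - d / 8.0  # probability of a negative relative age
    if u < left_mass:
        s = (1.0 - math.sqrt(1.0 - 2.0 * d * u)) / d
        return s - 0.5
    a = 1.0 + d / 2.0
    return (a - math.sqrt(a * a - 2.0 * d * (u - left_mass))) / d


# Synthetic, evenly spaced quantiles of a sawtooth density with d = 0.4 (not our data).
n = 2000
sample = [sawtooth_quantile((k + 0.5) / n, 0.4) for k in range(n)]
d_hat = fit_sawtooth(sample)
print(round(d_hat, 2))
```

On the synthetic sample the estimate should lie close to the value of d used to generate it, while a sample spread evenly over the interval should give an estimate close to zero.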
McCrary’s test was initially designed to validate the continuity assumption of the assignment variable in a regression discontinuity design (RDD) framework, which is a sufficient condition for identification. The idea is to test for a jump in the density of the assignment variable at the relevant cut-off point to check the validity of this standard RDD assumption. In that respect this test is also well suited to our setting, because we are interested in testing for a potential jump in the density of relative age.

[8] It could be, for example, that this effect only shows up for those born four weeks after and four weeks before the cut-off date. These circumstances could render the Sawtooth test insignificant when in fact there is an effect.

For nonparametric identification of this jump, the limit values of the density of the variable of interest (in our case relative age) from the left and from the right at the cut-off point need to be estimated. Standard kernel density estimators suffer from a strong boundary bias. For this reason they are not well suited to situations where the interest lies in the points at the boundary itself. McCrary’s test tackles this problem in a two-step procedure. First, the data are “binned”, that is, a histogram of the variable of interest is calculated, with no bin overlapping the suspected point of discontinuity. In the second step, nonparametric methods are used to estimate the density endpoints from both sides, as proposed by Hahn et al. (2001). Local polynomial regression of order one combined with a triangle kernel has been shown to have the best properties for estimating points at the boundary (see Cheng et al. (1997) and Fan and Gijbels (1992)). In addition, Fan and Gijbels (1996) illustrate that in general polynomials of odd order have better Mean Squared Error properties than ones of even order, and more recently Porter (2003) was able to prove that local linear estimators are rate optimal.
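The two-step idea described above can be sketched as follows. This is a simplified illustration and not McCrary’s estimator itself: the function names are hypothetical, the bins are simply aligned at zero so that none straddles the break, the bandwidth is supplied by the user, at least two bins with positive kernel weight are assumed on each side, and no standard errors are computed.

```python
import math


def binned_density(y, binsize):
    """Step 1: histogram heights at bin midpoints; bin k covers [k*b, (k+1)*b), so no bin straddles 0."""
    n = len(y)
    counts = {}
    for yi in y:
        k = math.floor(yi / binsize)
        counts[k] = counts.get(k, 0) + 1
    k_min, k_max = min(counts), max(counts)
    # Empty bins between the extremes enter with height zero.
    return [((k + 0.5) * binsize, counts.get(k, 0) / (n * binsize))
            for k in range(k_min, k_max + 1)]


def boundary_density(points, bandwidth, right):
    """Step 2: weighted local linear fit on one side of 0; returns the fitted value at 0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, f in points:
        if (x >= 0) != right:
            continue  # use only bins on the requested side of the break
        w = max(0.0, 1.0 - abs(x) / bandwidth)  # triangle kernel weight
        s0 += w
        s1 += w * x
        s2 += w * x * x
        t0 += w * f
        t1 += w * x * f
    # Intercept of the weighted least squares line through the bin heights.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def log_density_break(y, binsize, bandwidth):
    """Log difference of the estimated density limits from the right and from the left of 0."""
    points = binned_density(y, binsize)
    f_right = boundary_density(points, bandwidth, right=True)
    f_left = boundary_density(points, bandwidth, right=False)
    return math.log(f_right) - math.log(f_left)


# Synthetic check (not our data): 40% of the mass to the left of 0, 60% to the right.
toy = ([-0.5 + 0.5 * (k + 0.5) / 800 for k in range(800)]
       + [0.5 * (k + 0.5) / 1200 for k in range(1200)])
theta = log_density_break(toy, binsize=0.05, bandwidth=0.3)
print(round(theta, 3))
```

In the synthetic example the density is flat at 0.8 to the left and 1.2 to the right of zero, so the estimated log difference should be close to ln(1.2) − ln(0.8).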
We (and McCrary) implement the estimation of the jump point in exactly this way, that is, by local linear regression with a triangle kernel. Details of the calculation of the standard errors are given in the web appendix on McCrary’s homepage. Under standard nonparametric regularity conditions this estimator is consistent and asymptotically normal.

4.3.1 The Binsize Choice

McCrary’s procedure has the advantage that no functional form assumption about the underlying density is necessary. However, it has the drawback that the researcher has to choose smoothing parameters, which can be crucial for the outcome. In the McCrary test there are even two: the binsize for the histogram and the bandwidth of the local polynomial regression. He recommends visual inspection of the first and second step diagrams, guided by some automatic procedure (like cross validation, see Stone (1974)), to reach an optimal choice. We make every parameter decision as theory-driven as possible. We use the Asymptotic Mean Integrated Squared Error (A-MISE) optimal histogram binsize as a benchmark. The A-MISE balances the tradeoff between the bias and the variance inherent in the optimal binsize problem (see for example Härdle (1991) for details). Formally, the MISE of a histogram is defined to be

$$
MISE(\hat{f}_b(y)) = E\left[ \int_{-\infty}^{\infty} \left( \hat{f}_b(y) - f(y) \right)^2 dy \right], \tag{5}
$$

where $\{y_i\}_{i=1}^{n} \sim f(y)$ is an i.i.d. sample, $b$ is the binsize and $\hat{f}_b$ is the estimated histogram. Intuitively, the MISE-optimal binsize minimizes the expected distance between the true density function and the histogram. The Asymptotic-MISE optimal binsize $b^*$ then results from minimizing (5) with respect to $b$ and neglecting all terms that are asymptotically equal to zero:

$$
b^* = \left( \frac{6}{n\,\|f'(y)\|_2^2} \right)^{\frac{1}{3}}. \tag{6}
$$

In general the A-MISE optimal binsize is of limited practical use as the term $\|f'(y)\|_2^2$ is unknown.
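Equation (6) is straightforward to evaluate once a value for $\|f'(y)\|_2^2$ is plugged in. The sketch below does so under the assumption that the density is linear with slope $-d$ on the unit interval, as in the sawtooth model, so that $\|f'(y)\|_2^2 = d^2$. The function name is hypothetical; the inputs are the sample size N = 443 from Section 3 and the estimate d = 0.4 from Table 1.

```python
import math


def amise_binsize(n, sq_norm_fprime):
    """A-MISE optimal histogram binsize from equation (6)."""
    if sq_norm_fprime <= 0.0:
        return math.inf  # a flat density calls for a single, arbitrarily wide bin
    return (6.0 / (n * sq_norm_fprime)) ** (1.0 / 3.0)


d = 0.4   # sawtooth slope estimate reported in Table 1
n = 443   # sample size reported in Section 3
b_star = amise_binsize(n, d ** 2)
print(round(b_star, 2))  # prints 0.44
```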
However, proceeding on the assumption that our model estimated in Section 4.2 is a sufficiently good approximation of the true density, we are able to use the fact that $\|f'(y)\|_2 \approx 0.4$.[9] Empirically, the binsize estimator then turns out to be:

$$
b^* = \left( \frac{6}{443 \cdot 0.4^2} \right)^{\frac{1}{3}} \approx 0.44. \tag{7}
$$

This will admittedly result in a rather oversmoothed histogram. Nevertheless, this binsize choice has the theoretical advantage of minimizing the A-MISE.[10] Intuitively, this makes sense since we are dealing with a distribution which has a very similar slope over the whole support. McCrary generally starts from a more bell-shaped distribution, in which case the slope changes distinctively over the support. However, he also stresses the irrelevance of the first step binsize choice for the asymptotic normal approximation of his estimator. Nevertheless, we will implement a large range of different smaller binsizes to check the sensitivity of the results.

[9] If the uniform distribution were the true data generating process, $\|f'(y)\|_2^2$ would in fact be equal to zero and thus the A-MISE optimal binsize would be infinity, which amounts to choosing a one-bin histogram.

[10] We are aware of the fact that, strictly speaking, the optimal binsize from A-MISE minimization is not necessarily optimal in our framework. This is so because the A-MISE is a “global criterion”, whereas we are only interested in the boundary points. This argument is analogous to Imbens and Kalyanaraman’s (2009) argument for the optimal bandwidth in a regression discontinuity design.

4.3.2 The Bandwidth Choice

The second free parameter to choose is the bandwidth in the second step local polynomial regression. Recently Imbens and Kalyanaraman (2009) (IK henceforth) proposed an optimal bandwidth for the estimation around boundary points in an RDD setting.
Their suggestion is based on the observation that the MISE is not necessarily the relevant criterion in settings where the interest does not lie in the whole function but only in the boundary points. This notion goes at least back to Ludwig and Miller (2005). IK’s basic idea is to use a criterion function that minimizes the expected squared difference between the RDD estimator and the true parameter value itself instead of minimizing the MSE over the whole support. IK are able to show that their bandwidth is asymptotically optimal and outperforms the only other bandwidth selection procedure especially designed for regression discontinuity design suggested by Ludwig and Miller (2005), which is based on cross validation. For this reason we implement IK’s suggestion in the following. 4.3.3 Results We test for the existence of a break in the relative age distribution using McCrary’s test, including the modifications explained before. Using IK’s optimal bandwidth estimator for the second step, we try a range of different histogram binsizes guided by the A-MISE optimal binsize, see section 4.3.1. The estimated coefficient θ stands now for the log difference of the density at yi = 0, estimated from the right and the left: θ̂ = ln lim+ fˆ(yi ) − ln lim− fˆ(yi ) , yi →0 yi →0 (8) where f (.) is the density estimated via linear local polynomial regression using the triangle kernel K(z) = (1 − |z|)1[|z|≤1] and yi ∈ [− 12 , 21 ] stands again for relative age.11 The first column in Table 2 depicts the results with different binsizes and the corresponding IK optimal bandwidth. We start out with the A-MISE optimal binsize and reduce it successively in 0.05 steps down to a binsize which is by a factor of ten smaller than the first suggestion. As mentioned before, we defined relative age such that y ∈ [− 12 , 21 ]. However the optimal bandwidths 11 We thank Justin McCrary and IK for providing us with the code of their tests. 
are sometimes larger than 0.5, especially for large binsizes, which means that observations outside that interval are in general taken into account. Since our data are circular by nature, we estimated densities over $y \in [-1, 1]$, which means that we have the same observations to the left and to the right of 0 (but we calculated the two optimal parameters using $y \in [-\tfrac{1}{2}, \tfrac{1}{2}]$). In cases where the bandwidth is larger than one, the estimate simply gets close to an OLS estimate (Fan and Gijbels, 1996).

Break Estimate θ    Binsize    IK Bandwidth
0.48***             0.44       2.72
0.44***             0.40       2.54
0.33***             0.35       2.20
0.34***             0.30       1.78
0.31**              0.25       1.06
0.41*               0.20       0.49
0.36**              0.15       0.95
0.63*               0.10       0.26
0.56*               0.05       0.27
0.36**              0.04       0.72

Table 2: McCrary (2008) test for a break in the relative age distribution for different binsizes. The bandwidth is calculated according to Imbens and Kalyanaraman (2009). *, ** and *** indicate significance at the 10%, 5% and 1% level respectively. θ is the estimated log difference of the density from the left and from the right, estimated using local linear regression with triangle kernel. N = 443.

The point estimate using the optimal binsize and bandwidth is 0.48 and significant at the one percent level. It is depicted in the first row of Table 2. To see that the point estimates are largely in line with the parametric test, note that the estimated Sawtooth coefficient corresponds to a log difference of ln(1.2) − ln(0.8) ≈ 0.41. As can be seen, the other estimated coefficients always have the expected sign and are always significant at the 10% level.¹² However, the point estimate θ is not as stable as the MLE suggested before, which is not surprising given that we employ a huge range of different binsizes. In all cases, the nonparametrically estimated curve looks quite similar to the plotted Sawtooth graph from sections 4.1 and 4.2. Graphical analysis also suggests that there is indeed a clear deviation from the uniform distribution, with the estimated density giving noticeably more probability mass to observations to the right of the cut-off date than to the left. This finding is represented in Figure 4.

¹² Strictly speaking, since we stated a clear hypothesis we are facing a one-sided test. Thus significance at the 10% level is sufficient here.

Figure 4: McCrary density estimate and confidence intervals of relative age of US politicians. The binsize is 0.04 and the bandwidth is 0.72. (Horizontal axis: Relative Age; vertical axis: Estimated Density.)

5 Robustness Tests

5.1 Excluding Older Age Groups

To assess the robustness of our results we perform two types of checks. Firstly, since we suspect that the data from Dhuey and Lipscomb may not be as reliable as the official cut-off data from 1975, we exclude Congressmen who were born before 1960. To the remaining ones we assign the 1975 cut-off dates. The rationale is to include only individuals who went to school in a 10-year window around 1975 and to use only the official Education Commission data (assuming that school enrollment is at age five). This reduces the sample size to 125. With this reduced sample, using only the more reliable cut-off dates, we find that the coefficient increases to d = 0.6 with a p-value close to 5% when conducting the Sawtooth test.¹³ Table 3 presents the results in column one.

We also perform the McCrary test on this reduced sample. As binsize we use the A-MISE optimal one (“McCrary1”) and half of it (“McCrary2”). To estimate the optimal binsize we use the sawtooth estimate d = 0.6 to approximate $f'(y)$. The bandwidth for the second step is each time calculated according to IK. Both tests display a large and significant break in the density (see Table 3). All in all, it seems that the point estimates increase rather than decrease when the more uncertain observations are excluded.

¹³ We also performed robustness checks in which we narrowed down the window around 1975 even further.
The result is that the estimated coefficient remains stable between 0.5 and 0.7, whereas the p-value increases as the sample size shrinks towards zero.

5.2 Excluding Precarious States

Secondly, we want to rule out the possibility that our results are driven by states whose recorded cut-off dates seem suspicious. To do so, we exclude all observations from states which had “too many” changes in cut-off dates over time. As mentioned before, we rely on cut-off dates given by Dhuey and Lipscomb (2008) for three different points in time, in addition to the dates from the Education Commission for 1975. We say that a state has “too many” different cut-off dates when we are confronted with more than two different cut-off dates for the four points in time. Had we excluded all states with more than one change, we would have had to delete almost every observation; had we used at least three changes as the reference point, we would have excluded almost no observations. That is why two changes are in our opinion the right compromise. This reduces the sample size to N = 340.¹⁴ In this sample, the estimated coefficient goes up to d = 0.50 with significance at the 1% level. The Sawtooth test against the general empirical distribution yields very similar and clearly significant results. The nonparametric test is implemented as well. Both estimated coefficients are high. The first one is significant at the one percent level. The second one narrowly fails to be significant, but the coefficient still has a similar magnitude.

To sum up, these robustness tests support the view that the results from the previous sections are driven neither by a subgroup of states that appear to have unreasonable cut-off dates nor by measurement error in the less reliable, hand-collected cut-off dates from Dhuey and Lipscomb (2008). In fact, just the opposite appears to be true: the point estimates increase even more once we exclude data which is likely to be badly measured.
This enhances the credibility of the initial estimates. Table 3 summarizes the results of both robustness sections.

¹⁴ We see this kind of problem in 13 different states and we consequently exclude 103 observations.

                                          Excluding Older    Excluding Precarious States
Sawtooth against Uniform Distribution     0.60*              0.50***
                                          (5.2%)             (0.7%)
Sawtooth against Empirical Distribution   0.52*              0.49***
                                          (9.0%)             (0.8%)
McCrary1                                  0.75***            0.57***
                                          (0.00%)            (0.00%)
McCrary2                                  0.55**             0.44
                                          (1.87%)            (10.2%)
N                                         125                340

Table 3: Sawtooth and McCrary tests for a break. McCrary’s test displays the log difference. P-values in brackets below. McCrary1 is McCrary’s test with the A-MISE optimal binsize and McCrary2 is conducted using half of this binsize. The bandwidth is calculated according to IK. *, ** and *** indicate significance at the 10%, 5% and 1% level respectively.

5.3 Measurement Error and Redshirting

The previous section revealed that removing observations that seemed more prone to measurement error increases rather than decreases the significance of the results. Still, we can hardly assume that there is no measurement error left at all. Measurement error will in fact bias the slope estimate towards zero, since the source of the errors is the cut-off date. Thus the estimates are significant despite measurement error and not because of it.

In that regard, another issue is worth noting: in our analysis we implicitly assumed that all Senators and Representatives entered school at the designated age. In that sense we are only working with “assigned relative age” as defined by Bedard and Dhuey (2006).¹⁵ But what if this is not the case? So-called “academic redshirting” refers to the fact that parents, especially those with an academic background, sometimes decide to postpone the school entry of their children by one year. Bedard and Dhuey (2006) state that about 5% of children in the US enter kindergarten one year later.
Among these children, those born into high-status families and those born in the last quarter before the cut-off are overrepresented. If redshirting among families from a higher social background is widespread, it will de facto dampen our estimated effect towards zero, provided that children who are relatively young in their cohort are redshirted with a higher probability. If redshirting happens at random relative to the age in a cohort, it will not have any effect.

Redshirting might also explain why the parametric test in general displays stronger evidence for a break in the data than the nonparametric test: the former takes all observations into account to measure a potential break in the data, whereas the latter tries to estimate the break only in a neighborhood around the hypothesized cut-off. As a consequence, the nonparametric test is more prone to confounding effects around the cut-off, like redshirting or measurement error. This also suggests that the influence of relative age is not only a matter of being among the very oldest or very youngest, but shows up, somewhat less pronounced, among the remaining students as well.

¹⁵ They define “assigned relative age” in the same manner as we define “relative age”: as the distance of an individual’s birthday relative to the state-wide school entry cut-off, ignoring any kind of redshirting behavior.

6 Conclusion

Elected politicians are arguably important and powerful people in a democratic system. They have the right to pass new legislation and often to elect important figures in the political system. Thus they play a crucial role in the democratic process. Despite their general importance, little to nothing is known about how people choose to become politicians and what traits are necessary or at least beneficial in gaining such a position of leadership. This study is, to the best of our knowledge, the first that tries to systematically identify an underlying reason shaping selection into political offices.
However, we think this finding can only be the starting point for a broader research agenda investigating politicians’ attributes and their consequences for political outcomes more closely.

Alongside the empirical work, we also presented a general statistical framework that is in our opinion a significant improvement compared to the existing literature. We use a parametric and a nonparametric test to test for a jump in the density around the cut-off. The prevailing studies usually rely on dummies and t tests or chi-squared tests of distributional equality, using coarse classifications in months or even quarters of birth. Chi-squared tests also suffer from low power and are thus not suited to detecting smaller effect sizes. Using cut-offs and birthdays on a daily basis, as we do, increases the information used and thus also enhances the reliability of the estimates. Quite frequently, studies using US data assume September the first to be the cut-off. However, our data and the data in Dhuey and Lipscomb (2008) show that cut-off dates are in fact time-varying and more or less spread over the whole year across states, which makes September the first unlikely to be a good approximation. The maximum likelihood test we use is also applicable to all sorts of similar situations where the interest lies in estimating a break point in a density and there is a reasonable null hypothesis distribution. This test is useful either as a complementary robustness check to nonparametric methods or because one is not willing to make crucial parameter choices.

The evidence presented in the previous sections also adds to the growing literature on the impact relative age has beyond school age. It is known that professional athletes enjoy an advantage if they are among the relatively old, but evidence for areas outside of sports has so far been missing.
Competition is a necessary condition for the existence of a relative age effect, and the political competition for a seat in the US Senate or House of Representatives is arguably one of the toughest. In that sense it might not even be surprising to find similar indications in that very special population. We would conjecture that being relatively old among their peers during their youth helps people to develop leadership skills, and perhaps to learn how to take the initiative and lead their peers. Small initial differences could accumulate, as those successful at taking the lead among their peers may learn to like such a position and grow in confidence about their ability to do so. If the social skills required to be successful in politics can be learned, those who were older in their cohort may be more likely to have accumulated such skills.

The magnitude of the effect we find is larger than what most other studies on the relative age effect find for a broader (adult) population, but is in general in line with research that looks at populations in high-competition environments such as professional sports or CEOs. This is possibly due to the fact that small initial differences in advantages may see their effect compounded in highly competitive environments. We reason that the relative age effect is unlikely to play an important role for the larger adult population, but evidently exists in settings where the stakes are high and small things are critical.

The reported findings leave an interesting question unanswered: are politicians selected efficiently or, put differently, does selection on relative age lead to an outcome in which the best possible candidates are in office? Similarly, one may wonder whether some individuals with the potential to be good political decision makers may never make it to an official position, because they developed fewer of the skills needed to get elected in the first place due to the fact that they were relatively young at school.
A potential concern might therefore be that a loss in efficiency occurs because the allocation of talents induced by such a relative age effect is suboptimal. The answer depends on which view one takes on “leadership abilities”, which we have not defined here in more detail. If the abilities gained through a higher relative age are purely non-productive (for example charisma, presentation and persuasion skills that increase election probabilities but are unrelated to expert knowledge), then selection on relative age is indeed inefficient and another welfare-enhancing selection mechanism is theoretically conceivable. If, however, the skill set acquired in effect enhances the quality and productivity of a politician (for instance superior motivation and guidance of employees, or improved abilities to efficiently lead a political discussion), then selection on relative age is desirable. In general, the aim should be to elect the most educated, able and smartest candidates, those with the greatest farsightedness, and not those who are the best speakers and have the most charisma. A welfare-enhancing selection mechanism should explicitly stress the importance of factors that are directly relevant for political performance. For that, more research on the characteristics of politicians and their consequences needs to be conducted. Besley et al. (2011) is in our view an important step in that direction.

References

Barnett, A. (2010). The relative age effect in Australian Football League players. Unpublished working paper.

Barnett, A. and Dobson, A. (2009). Analysing Seasonal Health Data. Springer Verlag.

Bedard, K. and Dhuey, E. (2006). The persistence of early childhood maturity: International evidence of long-run age effects. The Quarterly Journal of Economics, 121(4):1437–1472.

Bertrand, M. and Schoar, A. (2003). Managing with style: The effect of managers on firm policies. The Quarterly Journal of Economics, 118(4):1169–1208.

Besley, T. (2005). Political selection. The Journal of Economic Perspectives, 19(3):43–60.

Besley, T. and Coate, S. (1997). An economic model of representative democracy. The Quarterly Journal of Economics, 112(1):85–114.

Besley, T., Montalvo, J., and Reynal-Querol, M. (2011). Do educated leaders matter? The Economic Journal, 121(554):F205–227.

Billari, F. and Pellizzari, M. (2008). The younger, the better? Relative age effects at university. IZA Discussion Paper No. 3795.

Cheng, M., Fan, J., and Marron, J. (1997). On automatic boundary corrections. The Annals of Statistics, 25(4):1691–1708.

Dhuey, E. and Lipscomb, S. (2008). What makes a leader? Relative age and high school leadership. Economics of Education Review, 27(2):173–183.

Diermeier, D., Keane, M., and Merlo, A. (2005). A political economy model of congressional careers. American Economic Review, pages 347–373.

Du, Q., Gao, H., and Levi, M. (2009). Born leaders: The relative-age effect and managerial success. In AFA 2011 Denver Meetings Paper.

Edgar, S. and O’Donoghue, P. (2005). Season of birth distribution of elite tennis players. Journal of Sports Sciences, 23(10):1013–1020.

Evans, W., Morrill, M., and Parente, S. (2010). Measuring inappropriate medical diagnosis and treatment in survey data: The case of ADHD among school-age children. Journal of Health Economics, 29(5):657–673.

Fan, J. and Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. The Annals of Statistics, pages 2008–2036.

Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications, volume 66. Chapman & Hall/CRC.

Ferreira, F. and Gyourko, J. (2010). Does gender matter for political leadership? The case of US mayors. Philadelphia, PA: Wharton School, University of Pennsylvania.

Fredriksson, P. and Öckert, B. (2005). Is early learning really more productive? The effect of school starting age on school and labor market performance. IZA Discussion Paper No. 1659.

Goodman, R., Gledhill, J., and Ford, T. (2003). Child psychiatric disorder and relative age within school year: Cross sectional survey of large population sample. BMJ, 327(7413):472.

Hahn, J., Todd, P., and Van der Klaauw, W. (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69(1):201–209.

Härdle, W. (1991). Smoothing techniques: With implementation in S. Springer.

Helsen, W., Van Winckel, J., and Williams, A. (2005). The relative age effect in youth soccer across Europe. Journal of Sports Sciences, 23(6):629–636.

Imbens, G. and Kalyanaraman, K. (2009). Optimal bandwidth choice for the regression discontinuity estimator. Technical report, National Bureau of Economic Research.

Jones, B. and Olken, B. (2005). Do leaders matter? National leadership and growth since World War II. The Quarterly Journal of Economics, 120(3):835–864.

Ludwig, J. and Miller, D. (2005). Does Head Start improve children’s life chances? Evidence from a regression discontinuity design. Technical report, National Bureau of Economic Research.

Malmendier, U. and Tate, G. (2005). CEO overconfidence and corporate investment. The Journal of Finance, 60(6):2661–2700.

McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2):698–714.

Mühlenweg, A. and Puhani, P. (2010). The evolution of the school-entry age effect in a school tracking system. Journal of Human Resources, 45(2):407–438.

Musch, J. and Grondin, S. (2001). Unequal competition as an impediment to personal development: A review of the relative age effect in sport. Developmental Review, 21(2):147–167.

Musch, J. and Hay, R. (1999). The relative age effect in soccer: Cross-cultural evidence for a systematic discrimination against children born late in the competition year.

Porter, J. (2003). Estimation in the regression discontinuity model. Unpublished manuscript, University of Wisconsin at Madison.

Stone, M. (1974). Cross-validation and multinomial prediction. Biometrika, 61(3):509–515.

Sykes, E., Bell, J., and Rodeiro, C. (2009). Birthdate effects: A review of the literature from 1990-on. Unpublished paper, University of Cambridge.

Thompson, A., Barnsley, R., and Stebelsky, G. (1991). Born to play ball: The relative age effect and major league baseball. Sociology of Sport Journal, 8:146–151.