European Social Survey ESS 2012
Documentation of the Spanish sampling
procedure
The 2012 sampling design incorporates small innovations to the 2010 design. These are:
a) Changes in the number of individuals in each PSU
The analysis of the 2010 data has shown that the differences in response rate among the
three largest size brackets have decreased. Significant differences in the response rate
remain only between the bracket with fewer than 10,001 inhabitants and the rest.
Therefore, anticipating that these differences will persist in the sixth round, for the
2012 sample 7 individuals will be selected per PSU in the first three brackets and 6
individuals per PSU in the smallest bracket.
b) Calculation of the inclusion probabilities
Following the recommendation of the ESS Round 6 Guidelines, the effect due to
variation in inclusion probabilities has also been taken into consideration when
estimating the design effect.
Additionally, in order to achieve a more accurate sample size for the 2012 round,
estimates of the ineligibility rate, the response rate and the mean number of completed
interviews per cluster have been obtained as a weighted mean of the 2008 and 2010
results. The continuous update of the municipal registers and the fieldwork improvements
justify calculating these estimates from the most recent data.
The design effect, however, has been estimated using the data from 2006 and 2010. The
2008 data has been excluded from the analysis as the ESS Round 4 design oversampled
two regions (Catalonia and Galicia).
A. TARGET POPULATION
The population consists of all people aged 15 or older who are resident within private
households in Spain (including Ceuta and Melilla), regardless of nationality, citizenship
or language.
B. SAMPLING FRAME
The sampling frame for the 2012 ESS sample is the Spanish population register, structured
by census section, taken from the Continuous Register (Padrón Continuo) as updated in
December 2011 by the Instituto Nacional de Estadística (INE, the Public Statistics Office
of Spain).
Taking the Continuous Register as a sampling frame ensures the best available coverage
of the population residing in Spain. The Continuous Register is updated from the
municipal registers: when a citizen moves from one municipality to another, s/he has to
notify the local authorities of his/her new place of residence. Registration grants
her/him access to the public health services, public schools and other public services,
and it also allows the electoral register to be updated. The law obliges every Spanish
city council to send the data from its register to the INE once a year. This process
keeps the national Continuous Register of inhabitants up to date. Foreigners usually
register in the municipal rolls in order to benefit from welfare services, even if they
are not legal residents in the country.
C. SAMPLE DESIGN
The proposed design for the 2012 round of the ESS is a stratified two-stage sample
design.
The strata are obtained by crossing two population classification criteria. The first
criterion is the Autonomous Community or region of residence (there are 17 of them plus
another one grouping the North-African autonomous cities of Ceuta and Melilla). The
second criterion (the type of habitat criterion) distinguishes among four types of habitats
according to their size:
• The first bracket: cities with more than 100,000 inhabitants aged 15+
• The second bracket: cities between 50,001 and 100,000 inhabitants aged 15+
• The third bracket: municipalities between 10,001 and 50,000 inhabitants aged 15+
• The fourth bracket: municipalities with less than 10,001 inhabitants aged 15+
The difference in response rates between the first three brackets and the fourth one
justifies maintaining this stratification. More details about the rationale behind the
proposed stratification can be found in Appendix A.1.
The cross-tabulation of the two criteria gives a total of 72 theoretical strata (18x4), only
64 of them being effective.
In each stratum the two sampling stages are the following:
1. In the first stage, a fixed number of census sections are drawn with probability
proportional to the number of inhabitants aged 15+ in each section. Thus, census
sections are the primary sampling units (PSUs)1.
2. In the second stage, for each PSU selected in the previous stage, 6 or 7 individuals
are randomly drawn: 7 in the sections belonging to the first three brackets
and 6 in the fourth. As mentioned above, the analysis of the 2008 and 2010 rounds
showed that the response rates in the first three brackets have converged, although they
remain significantly different from the response rate in the fourth bracket (see
Table A.1 in Appendix A.1).
The probabilities of inclusion of sections and individuals are provided by the INE.
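The two stages can be illustrated with a minimal Python sketch. It assumes systematic selection on cumulated section sizes as the probability-proportional-to-size method and simple random sampling without replacement within each section; the source does not state which selection algorithm is used, and the function names and section sizes below are hypothetical.

```python
import random


def pps_systematic(sizes, n, rng):
    """Pick n unit indices with probability proportional to size.

    Systematic selection on the cumulated sizes (an assumed method).
    A unit larger than the sampling step could be hit more than once;
    the sketch assumes no section is that large.
    """
    total = sum(sizes)
    step = total / n
    start = rng.random() * step          # random start in [0, step)
    chosen, cum, idx = [], 0, 0
    for i in range(n):
        point = start + i * step
        # advance until the selection point falls inside unit idx
        while idx < len(sizes) - 1 and cum + sizes[idx] <= point:
            cum += sizes[idx]
            idx += 1
        chosen.append(idx)
    return chosen


def individuals_per_section(bracket):
    """7 individuals in brackets 1 to 3, 6 in bracket 4 (from the text)."""
    return 7 if bracket in (1, 2, 3) else 6


def two_stage_sample(section_sizes, n_sections, bracket, seed=0):
    """Return {section index: sorted list of selected individual indices}."""
    rng = random.Random(seed)
    m = individuals_per_section(bracket)
    sample = {}
    for s in pps_systematic(section_sizes, n_sections, rng):
        # second stage: simple random sample without replacement
        sample[s] = sorted(rng.sample(range(section_sizes[s]), m))
    return sample


# Hypothetical stratum with six sections (population aged 15+ per section)
sizes = [1200, 900, 1500, 800, 1300, 1000]
print(two_stage_sample(sizes, n_sections=2, bracket=1, seed=1))
```

With a fixed number of individuals per section and first-stage probabilities proportional to section size, every individual in a stratum has approximately the same overall inclusion probability, which is the usual motivation for this kind of design.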
D. DESIGN EFFECTS
The effect due to variation in inclusion probabilities has been included in the design
effect of the Spanish sample for the first time.
The total design effect (DEFF) is now calculated as the product of the design effect
due to clustering (DEFFc) and the design effect due to the variation in inclusion
probabilities (DEFFp).
For the estimation of the effect of clustering in the 2012 round, 22 variables from the
2006 round and 23 variables from the 2010 round have been used. See Appendix A.2 for the
calculations leading to the final estimate of the 2012 mean design effect:
DEFF = DEFFp * DEFFc = 1.083 * 1.177 = 1.275
1 There are 34,600 census sections in Spain. Census sections are the most elementary framing units of
eligible voters. The size of a section varies between 500 and 2,000 voters (18+ years old), with an average
size of 1,300. Nevertheless, it should be stressed that, although census sections are defined with regard to
electoral processes, here they are used only to establish the boundaries of the administrative units used
for sample designs. Census sections include all citizens registered in the municipal registers, regardless
of their voting rights.
E. RESPONSE RATE
The first two ESS rounds highlighted the difficulty of achieving the target response
rate of 70% in Spain. The response rates were 53% in 2002 and 55% in 2004. However,
owing to the improvements implemented in the fieldwork plan and the serious involvement
of the survey company in the way it conducts and monitors the fieldwork, the
Spanish response rate rose to 66.8% in 2008 and to 68.5% in 2010. The goal of 70% is
within reach.
In the calculation of the 2012 sample size an estimated response rate of 69.1% has been
used.
F. VALID CASES
The proportion of valid cases in the 2010 round was 0.960, lower than in 2008 (0.973),
but still significantly higher than in Round 3 (0.870). Taking into account that the
population data used in the 2012 sampling design have been recently updated (December
2011), we anticipate a high eligibility rate for the sixth round. We estimate the 2012
eligibility rate as a weighted average of the last two eligibility rates, assigning the
higher weight to the most recent one.
Estimated proportion of valid cases in the 2012 round =
0.960*(2/3) + 0.973*(1/3) = 0.964
G. SAMPLE SIZE
In the calculation of the sample size three quantities have to be estimated: the proportion
of valid cases, the response rate and the design effect. As explained in the previous
paragraphs, the estimates of the response rate and the eligibility rate for the 2012 design
have been obtained from the two previous rounds, while the design effect has been
estimated from the 2006 and 2010 data. The values of these estimates are:
Proportion of valid cases = 0.964
Mean response rate = 0.691
Design effect = 1.275
Taking into account the above information, the calculations to determine the sample size
for the 2012 survey are the following:
Minimum effective sample size = 1,500
Net sample size = 1,500*1.275 ≈ 1,912
Gross sample size = net sample size/(0.964*0.691) ≈ 2,869 (the factors shown are rounded)
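The chain of calculations can be checked with a short Python sketch. It assumes that the unrounded intermediate values are carried through (design effect 1.083*1.177, proportion of valid cases from the 2/3 and 1/3 weighting, response rate 0.691); this is an assumption about how the figure of 2,869 was obtained, since plugging the three rounded factors 1.275, 0.964 and 0.691 into the same formula gives about 2,871. The function name is illustrative.

```python
def gross_sample_size(n_eff, deff, valid, response):
    """Gross sample = effective sample * DEFF / (valid cases * response rate)."""
    net = n_eff * deff
    return net, net / (valid * response)


deff = 1.083 * 1.177                          # DEFFp * DEFFc, unrounded
valid = 0.960 * (2 / 3) + 0.973 * (1 / 3)     # weighted eligibility rate
response = 0.691                              # estimated response rate

net, gross = gross_sample_size(1500, deff, valid, response)
print(round(deff, 3), round(valid, 3), round(net), round(gross))
```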
In the process of assigning individuals to each stratum proportionally to the population
aged 15+, the constraint of taking an integer number of sections in each stratum has led
to a slight modification of the total number. Therefore, the total sample size is 2,868.
Table 1 presents the number of sections and individuals to be selected in each
stratum, proportional to the population aged 15+. See Appendix A.3 for the distribution of
the 2012 Spanish population aged 15+.
Table 1. Distribution of sections and individuals by strata (proportionally to the population)

Number of sections, by size of habitat
Region                More than    Between    Between  Less than      Total
                        100,000 50,001 and 10,001 and     10,001
                                   100,000     50,000
Andalucía                    25         13         20         17         75
Aragón                        6          0          2          4         12
Asturias                      5          1          3          2         11
Baleares                      4          0          5          2         11
Canarias                      7          2          8          2         19
Cantabria                     2          0          2          2          6
Castilla y León               7          3          3         11         24
Castilla-La Mancha            2          3          5         10         20
Cataluña                     27          8         17         14         66
Valencia                     14          6         17          9         46
Extremadura                   1          1          2          5          9
Galicia                       5          4          8          9         26
Madrid                       41          7          6          4         58
Murcia                        6          1          5          1         13
Navarra                       2          0          1          3          6
País Vasco                    7          2          6          4         19
La Rioja                      1          0          0          1          2
Ceuta y Melilla               0          1          0          0          1
Total                       162         52        110        100        424

Number of individuals, by size of habitat
Region                More than    Between    Between  Less than      Total
                        100,000 50,001 and 10,001 and     10,001
                                   100,000     50,000
Andalucía                   175         91        140        102        508
Aragón                       42          0         14         24         80
Asturias                     35          7         21         12         75
Baleares                     28          0         35         12         75
Canarias                     49         14         56         12        131
Cantabria                    14          0         14         12         40
Castilla y León              49         21         21         66        157
Castilla-La Mancha           14         21         35         60        130
Cataluña                    189         56        119         84        448
Valencia                     98         42        119         54        313
Extremadura                   7          7         14         30         58
Galicia                      35         28         56         54        173
Madrid                      287         49         42         24        402
Murcia                       42          7         35          6         90
Navarra                      14          0          7         18         39
País Vasco                   49         14         42         24        129
La Rioja                      7          0          0          6         13
Ceuta y Melilla               0          7          0          0          7
Total                     1,134        364        770        600      2,868
Appendix A.1: Stratification
We discuss below the reasons for stratification by region and type of habitat:
Stratification by region. It is common practice for social surveys in Spain to
stratify by Autonomous Community (region). The practice is based on the
observed socio-economic, political and cultural differences among regions. The analysis
of the first five rounds of the ESS corroborated those differences for some variables
and thus the benefit of stratifying by Autonomous Community, even though the mean design
effect due to stratification seems to be negligible.
Stratification by type of habitat. Since the first stratification in the 2002 round, the
stratification by type of habitat has been improved. Response rates by habitat
bracket justify the applied stratification: the bigger the city, the lower the response rate.
Results of the second round also suggested the need to reconsider the stratification with
the aim of reducing the heterogeneity in town size within the strata. Thus, from
the third round onwards the stratification has been composed of the following four
brackets: cities with more than 100,000 inhabitants, cities between 50,001 and 100,000
inhabitants, municipalities between 10,001 and 50,000 inhabitants and, finally,
municipalities with less than 10,001 inhabitants.
Table A.1. Expected response rate (%) and size of cluster by bracket
                            More than     Between     Between   Less than       Total
                              100,000  50,001 and  10,001 and      10,001
                                          100,000      50,000
2008 response rate               62.4        65.2        67.6        74.8        66.8
2010 response rate               66.7        66.3        65.6        76.3        68.5
Weighted mean                    65.3        65.9        66.3        75.8        67.9
2012 forecast                    67.0        67.0        67.0        76.0        69.1
Proportion of population         0.38        0.12        0.26        0.24        1.00
Individuals by section              7           7           7           6        6.76
Appendix A.2: Design effects
The design effect due to variation in inclusion probabilities and the design effect due
to clustering for the 2012 round are both estimated as a weighted mean of the
corresponding design effects in the 2006 and 2010 data. The special sampling design of
the Spanish ESS Round 4 (with an oversampling of the Catalonia and Galicia regions) makes
the 2008 data inappropriate for these calculations.
The formula used in the calculation of the design effect due to clustering is the following:
DEFFc = 1 + (k − 1) ⋅ ρ
where:
ρ = intra-group correlation coefficient
k = estimated average number of completed interviews per cluster
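As a small illustration, the formula can be written as a Python function. The inputs used below (k = 4.257 and ρ = 0.054) are the 2012 values reported in Tables A.2.1 and A.2.2; pairing them in this way is an assumption made only for illustration, because the figure of 1.177 in Table A.2.3 is obtained as a weighted mean of the 2006 and 2010 design effects rather than by direct substitution.

```python
def deff_clustering(k, rho):
    """Design effect due to clustering: DEFFc = 1 + (k - 1) * rho."""
    return 1.0 + (k - 1.0) * rho


# Direct substitution of the 2012 values of k and rho (illustrative only)
print(round(deff_clustering(4.257, 0.054), 3))
```

Direct substitution gives a value close to, but not identical with, the weighted mean reported in Table A.2.3.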
To estimate the intra-group correlation coefficient for the 2012 round we have used the
data from the 2006 and 2010 rounds for a group of numerical, ordinal and dummy variables.
All of the variables were also used in the 2010 design. Some of the ordinal variables were
also used in the 2006 design and others were proposed by the ESS experts' panel.
Table A.2.1 provides the forecast of the average number k of completed interviews per
cluster, while Table A.2.2 displays the list of selected variables and the calculation of
the value of the intra-group correlation coefficient used in the estimation of the design
effect of clustering. The intra-group correlation coefficients (ρ) for each of the
variables have been estimated using two-level variance decomposition models. Each PSU is
considered as a group or cluster (level-2 unit).
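To make the notion of ρ concrete, the sketch below computes an intra-cluster correlation with the one-way ANOVA estimator. This is a simpler stand-in, not the two-level variance decomposition models used for the figures in this document, and the two estimators need not give identical values. The function name and the toy data are hypothetical.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intra-cluster correlation.

    clusters: list of lists, one list of numeric responses per PSU.
    """
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    n_clusters = len(clusters)
    grand_mean = sum(sum(c) for c in clusters) / n_total
    means = [sum(c) / len(c) for c in clusters]

    # between-cluster and within-cluster sums of squares
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)

    ms_between = ss_between / (n_clusters - 1)
    ms_within = ss_within / (n_total - n_clusters)

    # adjusted mean cluster size for unequal cluster sizes
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_clusters - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Toy data: two PSUs with three respondents each
print(round(anova_icc([[1, 2, 3], [2, 3, 4]]), 3))
```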
We computed the mean number of completed interviews per cluster in the 2008 and
2010 rounds. The forecast for k is the weighted average of these two means.
Table A.2.1. Estimation of the mean number of completed interviews per cluster (k)
       2008 data   2010 data   Weighted mean   2012 forecast
k          4.223       4.274           4.257           4.257
Table A.2.2. Intra-group correlation coefficient
Variable       2006 ρ   2010 ρ   Mean ρ
Ordinal
PPLTRST         0.023    0.095    0.106
PPLFAIR         0.008    0.083    0.098
PPLHLP          0.087    0.133    0.168
POLINTR         0.088    0.073    0.088
TRSTLGL         0.084    0.063    0.096
TRSTPLT         0.098    0.060    0.095
STFECO          0.016    0.077    0.100
STFGOV          0.038    0.054    0.069
STFDEM          0.098    0.088    0.111
DISCRIGV            -    0.008    0.013
Numerical
HHMMB           0.023    0.034    0.036
YRBRN           0.028    0.064    0.060
EDUYRS          0.108    0.168    0.161
PDJOBYR         0.093    0.012    0.041
WKHCT           0.070    0.047    0.052
Dummy
VOTE            0.031    0.035    0.038
PDJOBEV         0.008    0.028    0.045
MOCNTR          0.049    0.074    0.078
GNDR            0.004    0.002    0.011
UEMP3M          0.024    0.016    0.036
UEMP12M         0.031    0.030    0.083
UEMP5YR         0.002    0.002    0.022
CHLDHHE         0.010    0.098    0.079
Total           0.046    0.058    0.054
Finally, the design effects for the 2012 round are estimated as follows:
Table A.2.3. Estimation of design effects
         2006 data   2010 data   2012 forecast
DEFFp        1.016       1.117           1.083
DEFFc        1.151       1.190           1.177
The total design effect is:
DEFF = DEFFp * DEFFc = 1.083 * 1.177 = 1.275
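The 2012 figures in Table A.2.3 can be reproduced with the following Python sketch. It assumes weights of 1/3 for the 2006 data and 2/3 for the 2010 data; the document states these weights explicitly only for the proportion of valid cases, so their use here is an inference that happens to match the tabulated values.

```python
W_OLD, W_NEW = 1 / 3, 2 / 3   # assumed weights: older round, most recent round


def weighted_mean(old, new):
    """Weighted mean giving the higher weight to the most recent round."""
    return W_OLD * old + W_NEW * new


deff_p = weighted_mean(1.016, 1.117)   # variation in inclusion probabilities
deff_c = weighted_mean(1.151, 1.190)   # clustering
deff_total = deff_p * deff_c

print(round(deff_p, 3), round(deff_c, 3), round(deff_total, 3))
```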
Appendix A.3: Assignment of the number of individuals and sections to strata
Table A.3.1 shows the distribution of the Spanish population in the 64 strata considered:
Table A.3.1. 2011 Spanish population of 15 years old and over per strata, by size of habitat
Region                   More than     Between     Between   Less than       Total
                           100,000  50,001 and  10,001 and      10,001
                                       100,000      50,000
Andalucía                2,342,460   1,182,786   1,917,987   1,607,709   7,050,942
Aragón                     580,417           0     187,948     393,260   1,161,625
Asturias                   446,283      74,156     275,037     171,227     966,703
Baleares                   345,586           0     442,203     155,913     943,702
Canarias                   654,156     203,136     745,215     212,934   1,815,441
Cantabria                  158,664           0     146,169     210,064     514,897
Castilla y León            680,984     238,810     298,072   1,034,076   2,251,942
Castilla - La Mancha       143,480     277,547     449,496     919,089   1,789,612
Catalonia                2,612,271     770,355   1,654,450   1,334,712   6,371,788
Comunidad Valenciana     1,316,461     525,168   1,630,670     880,049   4,352,348
Extremadura                126,435      80,348     238,331     504,445     949,559
Galicia                    476,448     400,118     751,507     843,230   2,471,303
Madrid                   3,889,789     692,737     544,807     362,683   5,490,016
Murcia                     542,694     130,714     454,700      82,878   1,210,986
Navarra                    170,565           0      98,719     273,677     542,961
País Vasco                 679,471     210,303     583,203     417,275   1,890,252
Rioja (La)                 129,377           0      33,170     113,084     275,631
Ceuta y Melilla                  0     126,607           0           0     126,607
Total                   15,295,541   4,912,785  10,451,684   9,516,305  40,176,315
Source: INE, 2011 Continuous Municipal Register
The total number of sections to be selected comes from the gross sample (2,869) divided
by the mean number of individuals per section (6.76), giving an initial total of 424
sections. This total has been distributed among the four habitat brackets proportionally
to their population aged 15+. The assignment of 7 individuals per section in the first
three brackets and 6 in the fourth gives the final distribution of individuals to be
selected in each bracket, as provided in Table 1.
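The bracket-level arithmetic can be sketched in Python using the population totals of Table A.3.1. The sketch rounds at the bracket level only, which is a simplification: it yields 161 sections for the largest bracket, whereas Table 1 reports 162, presumably because the integer constraint is applied stratum by stratum. Variable names are illustrative.

```python
# Population aged 15+ by habitat bracket (totals row of Table A.3.1)
population = {
    "more than 100,000": 15_295_541,
    "50,001 to 100,000": 4_912_785,
    "10,001 to 50,000": 10_451_684,
    "less than 10,001": 9_516_305,
}
# Individuals drawn per section in each bracket
per_section = {
    "more than 100,000": 7,
    "50,001 to 100,000": 7,
    "10,001 to 50,000": 7,
    "less than 10,001": 6,
}
gross_sample = 2869

total_pop = sum(population.values())
shares = {b: n / total_pop for b, n in population.items()}

# Population-weighted mean number of individuals per section
mean_per_section = sum(shares[b] * per_section[b] for b in population)
n_sections = round(gross_sample / mean_per_section)

# Proportional allocation of sections, rounded per bracket (simplification)
sections = {b: round(n_sections * shares[b]) for b in population}

print(round(mean_per_section, 2), n_sections, sections)
```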