Importance level 2

Working document
Revision of the 1976-2000 unemployment series
under the LFS 2002 definition
August 2005
Directorate of
Labour Market Statistics
Document prepared by:
Javier Trejo Malfaz
Lourdes Ortega Núñez
With the assistance of:
Esperanza Gil Moraleda
Technical guidance:
Ramiro López Paños
Florentina Álvarez Álvarez
Miguel Ángel García Martínez
Contents
Summary
Introduction
Lines of analysis
Adopted procedure: Regression with a binary response variable. Probit regression model
Appendix: revised results
1 Summary
Under the Resolution of the Government Delegate Committee for Economic Affairs on
improved transparency of economic and statistical information provided by the
Government, the National Statistical Institute will revise all series whose
methodological basis has changed, in order to preserve historical information.
This document describes the procedure used to revise unemployment series so as to
make up for the discontinuity of the series due to the change of definition in 2002.
The revision extends from the third quarter of 1976 up to the fourth quarter of 2000,
and has been effected for the main aggregates (unemployment by gender and by age
under 25 years old or 25 years old and over) at the national and Autonomous
Community (regional) levels.
First, we introduce the break in the unemployment series to which the linking
procedure has been applied, describing the reasons for and nature of the change of
definition. Then we set out the methods considered to make
up for the break in the series, and look at the pros and cons of each approach. Finally,
we explain the chosen procedure in greater depth and provide the results.
2 Introduction
The Active Population Survey (in Spanish, Encuesta de Población Activa, “EPA”), Spain's
Labour Force Survey (LFS), is one of the chief statistical sources on the labour market
in Spain and provides harmonised employment and unemployment data for the European Union.
The Active Population Survey is governed by a range of Regulations, particularly
Council Regulation 577/98 of 9 March 1998, which is implemented by several
Commission Regulations. Changes to the EU’s statistical laws proposed by the
Statistical Office of the European Communities (Eurostat) apply also to the Spanish
EPA.
In 2002 the working definition of unemployment used in the EPA up to then was
modified in line with Community law.
Under the EPA, unemployed people are defined as people aged 16 and over who are
out of work, available for work and actively seeking employment. The definition also
covers jobless people who have already found a job they have yet to start and are
available for it, and who are therefore no longer seeking work. The survey uses the
generic definition of unemployment of the International Labour Office.
Specific application of this definition involves a range of additional conventions, such
as defining a ‘period of availability’ for work (the person must be available for work for
the two weeks after the survey reference week) and defining what ‘actively seeking
employment’ means.
For better harmonisation of employment and unemployment figures across the EU,
Commission Regulation (EC) 1897/2000 of 7 September 2000 lays down (besides
other requirements on the order of questions, treatment of special groups, and so
forth) practical rules for specific application of the conditions for a person to be
considered unemployed in the European Union. The conditions are slightly different
from the ones in force up to 2002. In particular, Annex I, point 1 of the Regulation
defines the following as the only active methods of seeking employment:
- having been in contact with a public employment office to find work, regardless of
  who took the initiative, the person or the office (renewing registration only for
  administrative reasons is not an active step),
- having been in contact with a private agency (temporary work agency, firm
  specialising in recruitment, etc.) to find work,
- applying to employers directly,
- asking among friends, relatives, unions, etc., to find work,
- placing or answering job advertisements,
- studying job advertisements,
- taking a recruitment test or examination or being interviewed,
- looking for land, premises or equipment to become self-employed,
- applying for permits, licences or financial resources to become self-employed.
In Spain, the condition that contact with a public employment office must be for the
purpose of finding work, such that renewing registration for administrative reasons only
is not regarded as an active step, has had a considerable impact on unemployment as
measured by the EPA.
In 2001 the questionnaire was modified to collect the information needed to
calculate unemployment figures under both the working definition of
Regulation 1897/2000 and the previous one. The 2001 data thus served as the link
between the old definition and the new.
Questionnaires prior to 2001 did not reflect the nuance of whether contact with a public
employment office was for the purpose of finding work or for other reasons, so survey
variables for periods before 2001 cannot be used to determine what unemployment
figures the EPA would have given under the new definition. But unemployment figures
under both the old and the new (and now official) definition can be given for 2001-2004.
Unemployed people under the old definition who are excluded under the new one
move into the ‘inactive’ category, as they do not meet the condition of ‘actively seeking
work’. The effect of the change of definition, therefore, is a reduction of the number of
unemployed people, the unemployment rate and the activity rate.
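A small worked example may help to show why all three figures fall. The sketch below is only illustrative: the population counts are invented, not EPA data, and the function names are hypothetical.

```python
def labour_rates(employed, unemployed, inactive):
    """Return (unemployment rate, activity rate) as percentages."""
    active = employed + unemployed
    population = active + inactive
    unemployment_rate = 100.0 * unemployed / active
    activity_rate = 100.0 * active / population
    return unemployment_rate, activity_rate


def apply_new_definition(employed, unemployed, inactive, reclassified):
    """Move `reclassified` people from unemployed to inactive status."""
    return employed, unemployed - reclassified, inactive + reclassified


# Hypothetical population aged 16 and over (thousands of people).
old = (16000, 2000, 15000)
new = apply_new_definition(*old, reclassified=300)

old_unemployment, old_activity = labour_rates(*old)
new_unemployment, new_activity = labour_rates(*new)
```

Because the reclassified people leave both the unemployed and the active population while the total population is unchanged, the unemployment count, the unemployment rate and the activity rate all decrease together.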
3 Lines of analysis
For those years prior to 2001, our task is to assign unemployed people (as defined
historically by the definition then current) to an unemployed or an inactive status,
according to the new definition.
Based on our analysis of the behaviour of the two unemployment definitions for the
period 2001-2004, and given the special features detected in the different Autonomous
Communities, we have examined each independently.
As for the other explanatory variables that may influence the differences between
the former definition and the new one, we have considered only the main ones: sex and
age group, under 25 years old or 25 years old and over (the usual threshold for
determining unemployment among young people). We could not go into finer detail
without the risk of invalidating the results in one domain or another due to the small
size of the sample.
Initially, we used 1987 as the starting point for the methods analysed (i.e., we
assumed that for 1987 the two definitions did not differ) and we disaggregated data by
Autonomous Community and sex. Our choice of 1987 was determined by two
factors. On the one hand, it is close to 1985, the year of issue of the Ministry of Labour
and Social Security Order of 11 March laying down the definitions concerning
employment claims at INEM public employment offices (today’s State Employment
Service); this Order may be assumed to have influenced individuals’ behaviour
regarding public employment offices. On the other hand, in 1987 a major
methodological change was made to the survey questionnaire to measure
unemployment in greater depth, and these methods remained substantially
unaltered up to 2004.
In line with the recommendations of the Working Group on Short-term labour market
statistics (High Statistical Council, June 2005), we decided to push the initial period
back to 1976¹ and use age group as the discriminating factor (under 25 years old, and
25 years old or over). The methods applied for the study are listed below:
- Classification methods. Discriminant analysis with binary logistic regression
Discriminant analysis enables us to identify any significant differences among groups
of individuals as regards a set of variables, and to devise procedures for systematically
classifying new observations of unknown origin into one of the analysed groups.
In our study, we tried to classify individuals into two distinct groups according to their
labour status under each unemployment definition (old and new). The two groups are:
  - Unemployed under both the old and the new definitions.
  - Inactive under the new definition while unemployed under the old definition.
We produced this classification using surveys for years 2001 to 2004; with the
information available on each individual, one can determine his or her labour status
under both definitions.
This method cannot be applied for years prior to 1999, because the questionnaire
provided insufficient information. The variable eliciting the information on the latest
contact with the employment office (‘CONTAC’ variable) – where the relevant
jobseeking method is registration with a public employment office – is unavailable in
EPA questionnaires administered over the period 1992-1998. The variable is essential
for a working distinction between the old and the new unemployment definition.
- Multivariate or econometric models
It may be possible to provide efficient results by explaining unemployment behaviour
(new definition) based on the evolution of variables considered to be explanatory
(causal variables) under econometric models. The main explanatory variable would be
whether contact with an employment office was active or not. However, when
attempting to make estimates using these models, if the explanatory variable values
are unknown, it is necessary to use predictions of the variables.
¹ Except for Ceuta and Melilla as a whole, for which data are available only from 1998.
In our case, we chose not to use these models because of the difficulty or impossibility
of knowing the values of those magnitudes that would be considered as exogenous. In
addition, since we conduct the study by Autonomous Community, and as the EPA
sample providing the basic data is small in the less populous Autonomous
Communities, sampling errors would lead to ‘non-significant’ estimates, particularly in
the under-represented regions.
- Stochastic time series models. Box-Jenkins model
When the purpose of a study is only ‘prediction’, as in our case, it is not always
necessary to specify a causal model in which the explained variable is expressed on
the basis of a set of explanatory variables, because satisfactory predictions can often
be produced using univariate or time-series models. Moreover, when no abrupt
changes are expected from the present behaviour of the variable, these methods can
provide good predictions. Box-Jenkins models are among the most rigorous
techniques for univariate prediction.
For each studied group, we examined the effect of the change of definition of
unemployment introduced in 2002 using ARIMA models in combination with
intervention models.
After fitting the model, we recalculated the unemployment series for each sex in
every Autonomous Community, incorporating the estimated effect of the new definition
of unemployment. The national total was obtained by aggregation.
This method provides a good fit between the model and the historical unemployment
series. But the demographic variables relating to the labour status of the population
are affected by external factors that are very hard to quantify, such as economic
cycles, employment policies and immigration. These factors affect the variability of the
model, increase prediction errors and mask the effect of the change of definition of
unemployment, i.e., they reduce the estimated size of the intervention. As a result,
for some quarters the estimated number of unemployed people under the new definition
is greater than under the old one, which is theoretically impossible.
- Regression with a binary response variable. Probit regression models
Regression models can also be used to predict the binary response of an individual for
whom certain measurable characteristics are known. In the problem at hand, the
response variable is indeed binary (it takes the value 1 or 0 depending on
whether or not the individual is unemployed). One of the most widely used regression
models for such responses is the probit model, which links the probability to the
explanatory variables through the standard normal distribution function.
After assessing the various ways we could address the problem of estimating
retrospective unemployment series under the new definition, we decided to use the
probit procedure, as described in detail in the next section.
4 Adopted procedure: Regression with a binary response variable.
Probit regression models

Available base information
The data we used to estimate the probability of falling within the unemployed category
under the old and under the new definition were the records for unemployed
individuals according to the old definition, from the first quarter of 2001 to the fourth
quarter of 2004 (the period for which we have information for both definitions).
After obtaining these probabilities, we applied them to the historic unemployment
series to estimate the number of unemployed people according to the definition laid
down in 2002. Since the methodology and production of the EPA, and the various factors
influencing the Spanish labour market and other social and economic conditions, have
changed over time, we can assume that the effect of the new definition decreases
the further back we move from 2001.
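As a rough sketch of how such probabilities can be applied to one quarter of the historic series, assume the old-definition unemployed are broken down by sex and age group and that each group has its own probability of remaining unemployed. The counts, probabilities and function name below are invented for illustration.

```python
def revise_unemployment(old_counts, stay_probability):
    """Estimate new-definition unemployment for one quarter.

    old_counts: unemployed under the old definition, keyed by (sex, age group).
    stay_probability: probability of remaining unemployed under the new
    definition, keyed the same way.
    Returns (revised unemployed, people moved to inactivity).
    """
    revised = sum(old_counts[group] * stay_probability[group] for group in old_counts)
    moved_to_inactive = sum(old_counts.values()) - revised
    return revised, moved_to_inactive


# Hypothetical quarter for one region (thousands of people).
old_counts = {
    ("male", "<25"): 40.0,
    ("male", ">=25"): 90.0,
    ("female", "<25"): 50.0,
    ("female", ">=25"): 120.0,
}
stay_probability = {
    ("male", "<25"): 0.90,
    ("male", ">=25"): 0.85,
    ("female", "<25"): 0.85,
    ("female", ">=25"): 0.80,
}
revised, moved = revise_unemployment(old_counts, stay_probability)
```

Since every probability lies between 0 and 1, the revised figure obtained this way can never exceed the old-definition figure.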

Description of the procedure
To maintain continuity with earlier models, the study was conducted breaking data
down by sex and Autonomous Community. The national total was again obtained by
aggregation. It is important to point out that the results of estimating the unemployment
series by aggregation of Autonomous Communities and by applying the method
directly to the national total are nonetheless similar.
The procedure allows for a range of different approaches, which we summarise below.
In a first approach, we obtain the probabilities of remaining within the unemployed
category under both the old and the new definition for each quarter in the period
2001-2004.
Then we model mathematically the curve describing the trend over time of these
probabilities for each group considered, by regression. The quarterly probabilities prior
to 2001 are calculated by extrapolation from these models. The number of
unemployed people is estimated by combining these probabilities and the historic
unemployment series.
We used linear, polynomial, power, logarithmic and exponential models, among
others. We also looked at models with ‘lagged variables’ (despite the name, they
refer to the value of the transition probability in later quarters) similar to those
used to calculate the first activity rate projections published by INE in 1995. In all
cases, equation parameters were estimated by the least squares method.
Since information was available only for a short period (four years) compared with
the period for which we needed estimates, we preferred other procedures that would
produce estimates more robust over the long term.
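The fragility of this first approach can be illustrated with a minimal sketch. The quarterly probabilities below are invented (an exactly linear decline over 2001-2004), and the helper name is hypothetical; the point is only that a slope estimated on 16 quarters, once projected 98 quarters back, can yield a value outside the admissible range for a probability.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical quarterly probabilities, 2001Q1 (t = 0) to 2004Q4 (t = 15).
quarters = list(range(16))
observed = [0.88 - 0.002 * t for t in quarters]

intercept, slope = fit_line(quarters, observed)

# 1976Q3 lies 98 quarters before 2001Q1.
p_1976q3 = intercept + slope * (-98)
```

With these assumed figures the extrapolated value for 1976Q3 is above 1, which cannot be a probability.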
Using a second approach, we tried to obtain the probability of remaining in
unemployed status without treating the observations as a time series.
After producing this probability using a probit regression, we fitted a trend line so
that for the first observation (the third quarter of 1976) the probability would be
equal to 1, while in the final quarter of 2000 it would be the value obtained using the
probit model. We analysed linear and quadratic trends.
Fitting a linear trend entails drawing the straight line through the
two points we have available (the first quarter of the historic series, where the
probability is 1, and the fourth quarter of 2000, where the probability is the figure
produced by the probit regression model).
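The straight line through those two points can be written down directly. The sketch below assumes the series runs from the third quarter of 1976 to the fourth quarter of 2000 (98 quarters, as stated in the Summary); the end-point probability of 0.85 and the function name are purely illustrative.

```python
def trended_probability(k, n_quarters, p_end):
    """Probability for quarter k on the straight line from 1.0 to p_end.

    k = 0 is the first quarter of the series (probability 1) and
    k = n_quarters - 1 is the last one (probability p_end).
    """
    if not 0 <= k < n_quarters:
        raise ValueError("quarter index outside the series")
    return 1.0 - (1.0 - p_end) * k / (n_quarters - 1)


N_QUARTERS = 98   # 1976Q3 to 2000Q4 inclusive (assumed span)
P_2000Q4 = 0.85   # hypothetical probit probability for one group

series = [trended_probability(k, N_QUARTERS, P_2000Q4) for k in range(N_QUARTERS)]
```

Each group (region, sex, age) would have its own end-point probability and therefore its own line.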
To fit a quadratic trend, we needed an additional point that could only be obtained
in the period 2001 to 2004, very close to the end of the series. On fitting a
second-degree polynomial curve, the results obtained with the quadratic trend did not
allow a satisfactory general treatment.
We also considered taking into account possible seasonal variations and adjusting the
trend based on the quarter of the year the observations came from, which would
therefore produce a different probability for each quarter. However, if we distinguished
between quarters, the size of the EPA sample providing the basic data was small in
less populous Autonomous Communities, thus affecting parameter estimates, while in
large Autonomous Communities there were no significant differences from the figures
obtained without seasonal variations. Therefore we decided to calculate a single
common probit for all four quarters.
The considerations set out above led us finally to choose a probit regression model
corrected by a linear trend.
In essence, the procedure consisted of the following stages:
1. Calculation of the transition probabilities between the old and the new
   definitions of unemployment using a probit regression model, considering:
   - an endogenous binary variable indicating whether or not an individual is
     unemployed under both definitions;
   - sex, age group (under 25 or 25 and over) and the interaction between the two
     as exogenous variables, with observations weighted by the elevation
     (grossing-up) factor.
2. After producing these transition probabilities, we applied a linear trend under
   which the effect of the change of definition disappears completely in the third
   quarter of 1976.
3. We then applied the linearly adjusted probabilities to the historic
   unemployment series.
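Stage 1 can be sketched without specialised software under one observation: with sex, age group and their interaction, the model has four parameters for four sex-by-age cells, so the weighted maximum-likelihood fit reproduces the weighted share of each cell, and the coefficients follow from the inverse normal distribution function. This shortcut is an assumption about the model structure, not a description of the software actually used; the records, weights and function name are invented.

```python
from collections import defaultdict
from statistics import NormalDist


def saturated_probit(records):
    """Probit coefficients for sex, age and their interaction.

    records: iterable of (sex, age, y, weight) with sex, age, y in {0, 1},
    where y = 1 means unemployed under both definitions and weight is the
    elevation factor. Returns (b0, b1, b2, b12).
    """
    stay = defaultdict(float)
    total = defaultdict(float)
    for sex, age, y, weight in records:
        total[(sex, age)] += weight
        stay[(sex, age)] += weight * y
    # Probit index of each cell: inverse normal CDF of the weighted share.
    z = {cell: NormalDist().inv_cdf(stay[cell] / total[cell]) for cell in total}
    b0 = z[(0, 0)]
    b1 = z[(1, 0)] - b0
    b2 = z[(0, 1)] - b0
    b12 = z[(1, 1)] - b0 - b1 - b2
    return b0, b1, b2, b12


# Hypothetical weighted records: (sex, age, y, weight).
records = [
    (0, 0, 1, 80.0), (0, 0, 0, 20.0),
    (1, 0, 1, 85.0), (1, 0, 0, 15.0),
    (0, 1, 1, 86.0), (0, 1, 0, 14.0),
    (1, 1, 1, 87.0), (1, 1, 0, 13.0),
]
b0, b1, b2, b12 = saturated_probit(records)
```

The shortcut requires every cell share to lie strictly between 0 and 1; a model without the interaction term would need an iterative fit instead.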
The models produced using this procedure are presented below, both by Autonomous
Community and for the national total. The theoretical model for the national total is
included because, although the national results are obtained by aggregating the data
for all Autonomous Communities, we thought it would be interesting to compare its
behaviour with the one observed in the different regions.
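The general model type is:
$$p = P(Y = 1 \mid \text{sex}, \text{age}) = \Phi(\beta_0 + \beta_1\,\text{sex} + \beta_2\,\text{age} + \beta_{12}\,\text{int})$$
where
$$Y = \begin{cases} 1 & \text{if unemployed under both definitions} \\ 0 & \text{otherwise} \end{cases}$$
$$\text{age} = \begin{cases} 1 & \text{if} < 25 \text{ years} \\ 0 & \text{if} \geq 25 \text{ years} \end{cases}$$
$$\text{sex} = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases}$$
$$\text{int} = \text{age} \cdot \text{sex} = \begin{cases} 1 & \text{if male} < 25 \text{ years} \\ 0 & \text{in any other case} \end{cases}$$
$$\Phi(\cdot) = \text{distribution function of } N(0, 1)$$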
The fitted models for the national total and the Autonomous Communities are:
1. National total
$$p = \Phi(0.8814 + 0.1980\,\text{sex} + 0.2099\,\text{age} - 0.1439\,\text{int})$$
2. Andalusia:
$$p = \Phi(0.9293 + 0.2796\,\text{sex} + 0.1557\,\text{age} - 0.2420\,\text{int})$$
3. Aragon:
$$p = \Phi(0.4767 + 0.3464\,\text{sex} + 0.2413\,\text{age} - 0.1555\,\text{int})$$
4. Asturias:
$$p = \Phi(0.5064 - 0.0214\,\text{sex} + 0.1348\,\text{age} - 0.0159\,\text{int})$$
5. Balearic Islands:
$$p = \Phi(1.2164 + 0.2549\,\text{sex} + 0.1422\,\text{age} - 0.1729\,\text{int})$$
6. Canary Islands:
$$p = \Phi(0.9877 + 0.0042\,\text{sex} + 0.2400\,\text{age} - 0.0606\,\text{int})$$
7. Cantabria:
$$p = \Phi(0.8439 + 0.0614\,\text{sex} + 0.1411\,\text{age} - 0.1244\,\text{int})$$
8. Castile-La Mancha:
$$p = \Phi(0.5847 + 0.1796\,\text{sex} + 0.2603\,\text{age} - 0.2214\,\text{int})$$
9. Castile and Leon:
$$p = \Phi(0.8079 + 0.1838\,\text{sex} + 0.2393\,\text{age} - 0.1982\,\text{int})$$
10. Catalonia:
$$p = \Phi(1.9164 + 0.0299\,\text{sex} + 0.0813\,\text{age} + 0.1827\,\text{int})$$
11. Valencian Community:
$$p = \Phi(1.0515 + 0.3167\,\text{sex} + 0.2786\,\text{age} - 0.1830\,\text{int})$$
12. Extremadura:
$$p = \Phi(0.2660 + 0.3525\,\text{sex} + 0.2351\,\text{age} - 0.3022\,\text{int})$$
13. Galicia:
$$p = \Phi(0.9522 + 0.0935\,\text{sex} + 0.2403\,\text{age} - 0.2689\,\text{int})$$
14. Madrid:
$$p = \Phi(0.5429 + 0.0524\,\text{sex} + 0.2332\,\text{age} + 0.0060\,\text{int})$$
15. Murcia:
$$p = \Phi(1.1251 + 0.3664\,\text{sex} + 0.1863\,\text{age} - 0.2417\,\text{int})$$
16. Navarre:
$$p = \Phi(0.9591 + 0.1577\,\text{sex} + 0.3882\,\text{age} - 0.1288\,\text{int})$$
17. Basque Country:
$$p = \Phi(1.1264 + 0.1955\,\text{sex} + 0.1961\,\text{age} - 0.0252\,\text{int})$$
18. La Rioja:
$$p = \Phi(0.4835 + 0.3708\,\text{sex} + 0.1234\,\text{age} - 0.2268\,\text{int})$$
19. Ceuta and Melilla:
$$p = \Phi(-0.5076 + 0.3515\,\text{sex} + 0.0873\,\text{age} - 0.3648\,\text{int})$$
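As a quick check on how these models are read, the sketch below evaluates the national-total model for the four sex-by-age groups. It assumes the coding sex = 1 for males and age = 1 for people under 25, and takes the coefficient signs as implied by the national-total probabilities reported in Table 3; the function name is hypothetical.

```python
from statistics import NormalDist

# National-total coefficients (b0, b1, b2, b12); signs as implied by Table 3.
NATIONAL = (0.8814, 0.1980, 0.2099, -0.1439)


def stay_probability(coefs, sex, age):
    """P(Y = 1 | sex, age) under the probit model with interaction."""
    b0, b1, b2, b12 = coefs
    z = b0 + b1 * sex + b2 * age + b12 * sex * age
    return NormalDist().cdf(z)


males_under_25 = stay_probability(NATIONAL, sex=1, age=1)
males_25_and_over = stay_probability(NATIONAL, sex=1, age=0)
females_under_25 = stay_probability(NATIONAL, sex=0, age=1)
females_25_and_over = stay_probability(NATIONAL, sex=0, age=0)
```

Up to the rounding of the published coefficients, these four values agree with the national-total row of Table 3.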
Three tables are presented below. Table 1 sets out marginal probabilities by sex, and
by age (under 25 years old or 25 years old and over). The next two tables show the
probability of a person remaining within the unemployed category when both variables,
sex and age, are considered jointly; Table 2 leaves out the interaction effect,
while Table 3 includes it.
Table 1
                              Males        Females      < 25 years   >= 25 years
National total                0.86365      0.82242      0.86790      0.83087
Andalusia                     0.88222      0.83241      0.86482      0.85108
Aragon                        0.80148      0.69977      0.78858      0.72263
Asturias (Principality of)    0.69641      0.70304      0.73358      0.69068
Balearic Islands              0.92807      0.89424      0.91921      0.90623
Canary Islands                0.85009      0.85028      0.88484      0.83880
Cantabria                     0.81844      0.80751      0.82974      0.80735
Castile and Leon              0.84197      0.80344      0.85108      0.80727
Castile-La Mancha             0.78084      0.73857      0.79593      0.74042
Catalonia                     0.97805      0.97346      0.98202      0.97313
Valencian Community           0.91822      0.86705      0.91745      0.87923
Extremadura                   0.72706      0.62104      0.69948      0.65503
Galicia                       0.85048      0.84007      0.86665      0.83784
Madrid (Community of)         0.74496      0.72114      0.78965      0.71332
Murcia (Region of)            0.92983      0.88027      0.91328      0.89584
Navarre                       0.88126      0.84910      0.91312      0.84528
Basque Country                0.91375      0.87677      0.91985      0.88421
La Rioja                      0.79714      0.69659      0.74579      0.73862
Ceuta & Melilla               0.40617      0.31503      0.33508      0.36117
Table 2
                              Males                     Females
                              < 25 years   >= 25 years  < 25 years   >= 25 years
National total                0.88613      0.85513      0.85116      0.81431
Andalusia                     0.88974      0.87955      0.84239      0.82943
Aragon                        0.83583      0.78752      0.74796      0.68739
Asturias (Principality of)    0.72923      0.68544      0.73741      0.69422
Balearic Islands              0.93394      0.92535      0.90287      0.89144
Canary Islands                0.88396      0.83757      0.88571      0.83977
Cantabria                     0.83415      0.81308      0.82527      0.80352
Castile and Leon              0.81606      0.76667      0.78084      0.72658
Castile-La Mancha             0.86911      0.83165      0.83740      0.79457
Catalonia                     0.98354      0.97561      0.98043      0.97129
Valencian Community           0.93914      0.91007      0.89848      0.85690
Extremadura                   0.75510      0.71922      0.65481      0.61341
Galicia                       0.87090      0.84346      0.86311      0.83461
Madrid (Community of)         0.79735      0.72446      0.78184      0.70622
Murcia (Region of)            0.93876      0.92584      0.89389      0.87462
Navarre                       0.92422      0.86431      0.90363      0.83358
Basque Country                0.93367      0.90634      0.90544      0.87036
La Rioja                      0.80556      0.79479      0.70655      0.69316
Ceuta & Melilla               0.38684      0.41439      0.29736      0.32250
Table 3
                              Males                     Females
                              < 25 years   >= 25 years  < 25 years   >= 25 years
National total                0.87400      0.85980      0.86245      0.81096
Andalusia                     0.86921      0.88665      0.86105      0.82364
Aragon                        0.81831      0.79477      0.76362      0.68321
Asturias (Principality of)    0.72707      0.68618      0.73931      0.69373
Balearic Islands              0.92515      0.92939      0.91286      0.88808
Canary Islands                0.87927      0.83938      0.89022      0.83835
Cantabria                     0.82173      0.81734      0.83768      0.80064
Castile and Leon              0.84914      0.83932      0.85250      0.79042
Castile-La Mancha             0.78903      0.77763      0.80093      0.72061
Catalonia                     0.98646      0.97420      0.97713      0.97235
Valencian Community           0.92839      0.91438      0.90826      0.85350
Extremadura                   0.70932      0.73187      0.69184      0.60486
Galicia                       0.84543      0.85214      0.88346      0.82950
Madrid (Community of)         0.79801      0.72419      0.78116      0.70640
Murcia (Region of)            0.92452      0.93209      0.90513      0.86973
Navarre                       0.91561      0.86796      0.91105      0.83125
Basque Country                0.93225      0.90689      0.90699      0.86999
La Rioja                      0.77365      0.80352      0.72805      0.68564
Ceuta & Melilla               0.33231      0.43799      0.33715      0.30588
An analysis of these results shows that males are more likely to remain in the
unemployed category under the new definition than females. The probability is also
higher in young people aged under 25 years old than in people aged 25 years old and
over (Table 1). This general pattern is only broken in Asturias and the Canary Islands,
as regards sex, and in Ceuta and Melilla taken as a whole, as regards age.
On the other hand, introducing the interaction between the sex and age-group
variables (Table 3) produces a general decrease in the probability of remaining within
the unemployed category for males under 25 (compared with Table 2), except in
Madrid and Catalonia, where the probability is higher.
The Appendix to this document sets out the results of the series revision procedure,
obtained by applying models 2 to 19. The national total is calculated by
summing the results for all the Autonomous Communities.
Appendix: revised results*
* The old unemployment figures and the new figures measured directly by the LFS may
differ in the decimals from those available on INEbase, because totals are sums
that may have accumulated rounding errors.