Working document

Revision of 1976-2000 unemployment series under the LFS 2002 definition

August 2005

Directorate of Labour Market Statistics

Document prepared by: Javier Trejo Malfaz, Lourdes Ortega Núñez
With the assistance of: Esperanza Gil Moraleda
Technical guidance: Ramiro López Paños, Florentina Álvarez Álvarez, Miguel Ángel García Martínez

Contents
Summary
Introduction
Lines of analysis
Adopted procedure: Regression with a binary response variable. Probit regression model
Appendix: revised results

1 Summary

Under the Resolution of the Government Delegate Committee for Economic Affairs on improved transparency of economic and statistical information provided by the Government, the National Statistical Institute will revise all series whose methodological basis has changed, in order to preserve historical information. This document describes the procedure used to revise the unemployment series so as to bridge the break caused by the change of definition in 2002. The revision extends from the third quarter of 1976 to the fourth quarter of 2000 and covers the main aggregates (unemployment by sex and by age group, under 25 years old or 25 years old and over) at the national and Autonomous Community (regional) levels.

First, we introduce the break in the unemployment series that the revision addresses, describing the reasons for and nature of the change of definition. We then set out the methods considered for bridging the break and weigh the pros and cons of each. Finally, we explain the chosen procedure in greater depth and present the results.

2 Introduction

The Active Population Survey (Spanish Encuesta de Población Activa, “EPA”) is one of the chief statistical sources on the labour market in Spain and provides harmonised employment and unemployment data for the European Union.
The EPA is governed by a range of regulations, particularly Council Regulation 577/98 of 9 March 1998, which is implemented by several Commission Regulations. Changes to EU statistical legislation proposed by the Statistical Office of the European Communities (Eurostat) therefore also apply to the Spanish EPA. In 2002 the working definition of unemployment used in the EPA until then was modified in line with Community law.

Under the EPA, unemployed people are defined as people aged 16 and over who are out of work, available for work and actively seeking employment. The definition also covers jobless people who have already found a job and are available to take it up, and who are therefore no longer seeking work.

The survey uses the generic definition of unemployment of the International Labour Office. Applying this definition in practice involves a range of additional conventions, such as defining a 'period of availability' for work (the person must be available for work in the two weeks following the survey reference week) and defining what 'actively seeking employment' means.

For better harmonisation of employment and unemployment figures across the EU, Commission Regulation (EC) 1897/2000 of 7 September 2000 lays down (besides other requirements on the order of questions, the treatment of special groups, and so forth) practical rules for applying the conditions under which a person is considered unemployed in the European Union. These conditions differ slightly from those in force up to 2002. In particular, Annex I, point 1 of the Regulation defines the following as the only active methods of seeking employment:

- having been in contact with a public employment office to find work, regardless of whether the initiative came from the person or from the office (renewing registration only for administrative reasons is not an active step);
- having been in contact with a private agency (temporary work agency, firm specialising in recruitment, etc.) to find work;
- applying to employers directly;
- asking among friends, relatives, unions, etc., to find work;
- placing or answering job advertisements;
- studying job advertisements;
- taking a recruitment test or examination or being interviewed;
- looking for land, premises or equipment to become self-employed;
- applying for permits, licences or financial resources to become self-employed.

In Spain, the condition that contact with a public employment office must be for the purpose of finding work, so that renewing registration only for administrative reasons is not regarded as an active step, has a considerable impact on the unemployment figures measured by the EPA.

In 2001 the questionnaire was modified so that unemployment could be calculated both under the working definition of Regulation 1897/2000 and under the previous one; 2001 data therefore serve as the link between the old definition and the new. Questionnaires prior to 2001 did not record whether contact with a public employment office was for the purpose of finding work or for other reasons, so survey variables for periods before 2001 cannot be used to determine what unemployment figures the EPA would have given under the new definition. Unemployment figures under both the old and the new (now official) definition are, however, available for 2001-2004.

Unemployed people under the old definition who are excluded under the new one move into the 'inactive' category, as they do not meet the condition of 'actively seeking work'. The effect of the change of definition is therefore a reduction in the number of unemployed people, the unemployment rate and the activity rate.

3 Lines of analysis

For the years prior to 2001, our task is to assign each person who was unemployed under the definition then current to either unemployed or inactive status under the new definition.
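The core of the change of definition, and of the reassignment just described, can be sketched as a simple classification rule. This is a simplified illustration, not the EPA's actual processing: the class `Jobseeker`, its fields and all figures are hypothetical, the person is assumed to be out of work and available, and the old rule is assumed to count any contact with a public employment office. The example also shows, with made-up counts, why moving people from 'unemployed' to 'inactive' lowers both the unemployment rate and the activity rate.

```python
from dataclasses import dataclass


@dataclass
class Jobseeker:
    """Hypothetical, simplified record of a person aged 16 or over who is out
    of work and available for work; field names are illustrative only."""
    office_contact: bool               # contact with a public employment office
    office_contact_to_find_work: bool  # that contact was made to find work
    other_active_method: bool          # any other active method (agency, adverts, ...)


def unemployed_old(p: Jobseeker) -> bool:
    # Old convention (assumed): any office contact counted, including
    # renewing registration for administrative reasons only.
    return p.office_contact or p.other_active_method


def unemployed_new(p: Jobseeker) -> bool:
    # Regulation (EC) 1897/2000: office contact is active only if made to find work.
    return (p.office_contact and p.office_contact_to_find_work) or p.other_active_method


def rates(employed: float, unemployed: float, population: float):
    """Return (unemployment rate = U / active, activity rate = active / population)."""
    active = employed + unemployed
    return unemployed / active, active / population


# Someone who only renewed their registration: unemployed before, inactive now.
renewal_only = Jobseeker(office_contact=True, office_contact_to_find_work=False,
                         other_active_method=False)
print(unemployed_old(renewal_only), unemployed_new(renewal_only))  # True False

# Made-up figures (thousands): 100 renewal-only people move to 'inactive'.
before = rates(employed=16000, unemployed=2000, population=34000)
after = rates(employed=16000, unemployed=1900, population=34000)
print(before, after)
```

The employed count and the population do not change, so the only effect is to shrink both the numerator of the unemployment rate and the active population.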
Our analysis of the behaviour of the two unemployment definitions over the period 2001-2004 revealed special features in the different Autonomous Communities, so we examined each of them independently. Of the other explanatory variables that may account for the differences between the old and the new definition, we considered only the main ones: sex and age group, under 25 years old or 25 years old and over (the usual threshold for measuring unemployment among young people). We could not go into finer detail without the risk of invalidating the results in one domain or another because of the small sample size.

Initially, we took 1987 as the starting point for the methods analysed (i.e., we assumed that in 1987 the two definitions did not differ) and disaggregated the data by Autonomous Community and sex. The choice of 1987 was determined by two factors. On the one hand, it is close to 1985, the year of the Order of the Ministry of Labour and Social Security of 11 March laying down the definitions concerning employment claims at INEM public employment offices (today's State Employment Service); this Order may be assumed to have influenced individuals' behaviour regarding public employment offices. On the other hand, in 1987 a major methodological change was made to the survey questionnaire to measure unemployment in greater depth, and this methodology remained substantially unaltered up to 2004.

In line with the recommendations of the Working Group on Short-term Labour Market Statistics (High Statistical Council, June 2005), we decided to push the starting point back to 1976¹ and to use age group (under 25 years old, and 25 years old or over) as a discriminating factor.

¹ Except for Ceuta and Melilla as a whole, for which data are available only from 1998.

The methods considered in the study are listed below:

- Classification methods. Discriminant analysis with binary logistic regression

Discriminant analysis identifies significant differences between groups of individuals with respect to a set of variables, and yields procedures for systematically classifying new observations of unknown origin into one of the groups analysed. In our study, we tried to classify individuals into two distinct groups according to their labour status under each unemployment definition (old and new): people unemployed under both the old and the new definition, and people inactive under the new definition but unemployed under the old one.

We produced this classification using the surveys for 2001 to 2004, for which the information available on each individual determines his or her labour status under both definitions. The method cannot be applied to years prior to 1999 because the questionnaire provided insufficient information: the variable recording the latest contact with the employment office ('CONTAC'), which is relevant when the jobseeking method is registration with a public employment office, is missing from the EPA questionnaires administered over the period 1992-1998. This variable is essential for distinguishing in practice between the old and the new unemployment definition.

- Multivariate or econometric models

Econometric models could, in principle, give efficient results by explaining unemployment behaviour (new definition) through the evolution of variables considered to be explanatory (causal variables). The main explanatory variable would be whether or not contact with an employment office was active. However, when such models are used for estimation, any explanatory variable whose values are unknown must itself be predicted.
We chose not to use these models because the values of the magnitudes that would have to be treated as exogenous are difficult or impossible to know. In addition, since the study is conducted by Autonomous Community and the EPA sample providing the basic data is small in the less populous Autonomous Communities, sampling errors would lead to 'non-significant' estimates, particularly in the under-represented regions.

- Stochastic time series models. Box-Jenkins model

When the purpose of a study is only prediction, as in our case, it is not always necessary to specify a causal model in which the explained variable is expressed as a function of a set of explanatory variables: satisfactory predictions can often be produced with univariate time-series models, especially when no abrupt departures from the current behaviour of the variable are expected. Box-Jenkins models are among the most rigorous techniques for univariate prediction.

For each group studied, we examined the effect of the 2002 change of definition using ARIMA models combined with intervention models. After fitting the model, we recalculated the unemployment series for each sex in every Autonomous Community, including the estimated effect of the new definition; the national total was obtained by aggregation. This method fits the historical unemployment series well. However, the demographic variables relating to the labour status of the population are affected by external factors that are very hard to quantify, such as economic cycles, employment policies and immigration. These factors increase the variability of the model and its prediction errors, and mask the effect of the change of definition, i.e., they reduce the estimated size of the intervention.
All these factors distort the predictions, so that for some quarters the estimated number of unemployed people under the new definition exceeds the number under the old one, which is theoretically impossible.

- Regression with a binary response variable. Probit regression models

Regression models can also be used to predict the binary response of an individual for whom certain measurable characteristics are known. In the problem at hand, the response variable is indeed binary: it takes the value 1 or 0 depending on whether or not the individual is unemployed. One of the most widely used models of this kind is the probit model, which uses the standard normal distribution function as its link.

After assessing the various ways of estimating retrospective unemployment series under the new definition, we decided to use the probit procedure, described in detail in the next section.

4 Adopted procedure: Regression with a binary response variable. Probit regression models

Available base information

To estimate the probability of being unemployed under both the old and the new definition, we used the records of individuals unemployed under the old definition from the first quarter of 2001 to the fourth quarter of 2004 (the period for which information is available under both definitions). We then applied these probabilities to the historic unemployment series to estimate the number of unemployed people under the definition introduced in 2002. Since the methodology and production of the EPA, the factors influencing the Spanish labour market and the wider social and economic situation have all changed over time, we can assume that the effect of the new definition decreases the further back we move from 2001.

Description of the procedure

To maintain continuity with the earlier models, the study was conducted with the data broken down by sex and Autonomous Community.
The national total was again obtained by aggregation. It should be noted that estimating the unemployment series by aggregating the Autonomous Communities and by applying the method directly to the national total gives similar results.

The procedure allows for several approaches, which we summarise below.

In a first approach, we obtain, for each quarter in the period 2001-2004, the probability of remaining in the unemployed category under both the old and the new definition. We then fit, by regression, a curve describing the trend of these probabilities over time for each group considered, and calculate the quarterly probabilities for earlier periods by extrapolation from these models. The number of unemployed people is estimated by combining these probabilities with the historic unemployment series. We used linear, polynomial, power, logarithmic and exponential models, among others. We also looked at models with 'lagged' variables (despite the name, these refer to the value of the transition probability in later quarters, since the extrapolation runs backwards in time), similar to those used to calculate the first activity rate projections published by INE in 1995. In all cases, the equation parameters were estimated by least squares. Since information was available only for a short period (four years) compared with the period for which estimates were needed, we preferred other procedures that would yield more robust estimates over the long term.

In a second approach, we obtained the probability of remaining unemployed without treating the observations as a time series. After producing this probability with a probit regression, we fitted a trend line such that the probability equals 1 at the first observation (the third quarter of 1976) and equals the probit estimate in the final quarter of 2000.
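The second approach reduces to a straight line in time between two fixed points: probability 1 at the start of the series and the probit estimate in the fourth quarter of 2000. The sketch below assumes, as stage 2 of the adopted procedure states, that the effect of the change of definition vanishes in the third quarter of 1976. The function names, the quarter encoding, the probit value of 0.85 and the unemployment figure of 200 are hypothetical, chosen only to show the mechanics; the revised figure for a quarter is the old-definition figure multiplied by the adjusted probability.

```python
def quarter_index(year: int, quarter: int) -> int:
    """Consecutive integer index for a (year, quarter) pair."""
    return 4 * year + (quarter - 1)


def linear_trend_probability(year, quarter, p_end, start=(1976, 3), end=(2000, 4)):
    """Probability of remaining unemployed under the new definition, linearly
    interpolated from 1 at `start` to the probit estimate `p_end` at `end`."""
    t0, t1 = quarter_index(*start), quarter_index(*end)
    t = quarter_index(year, quarter)
    if not t0 <= t <= t1:
        raise ValueError("quarter outside the revised period")
    return 1.0 + (p_end - 1.0) * (t - t0) / (t1 - t0)


def revise(old_unemployed, year, quarter, p_end):
    """Old-definition unemployment scaled by the trend-adjusted probability."""
    return old_unemployed * linear_trend_probability(year, quarter, p_end)


# Hypothetical group: probit probability 0.85, 200 (thousand) unemployed
# under the old definition in each of three sample quarters.
for yq in [(1976, 3), (1988, 3), (2000, 4)]:
    print(yq,
          round(linear_trend_probability(*yq, 0.85), 4),
          round(revise(200, *yq, 0.85), 1))
```

With 98 quarters between 1976Q3 and 2000Q4 inclusive, each step back in time raises the probability by (1 - p_end)/97, so the correction fades smoothly towards the start of the series.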
We analysed linear and quadratic trends. Fitting a linear trend amounts to drawing the straight line through the two available points: the first quarter of the historic series, where the probability is 1, and the fourth quarter of 2000, where the probability is the value produced by the probit regression model. Fitting a quadratic trend required an additional point, which could only be taken from the period 2001 to 2004, very close to the end of the series; the resulting second-degree polynomial curves did not allow a satisfactory general treatment.

We also considered allowing for seasonal variation by fitting the trend according to the quarter of the year the observations came from, which would produce a different probability for each quarter. However, when quarters were distinguished, the EPA sample providing the basic data was too small in the less populous Autonomous Communities, which affected the parameter estimates, while in the large Autonomous Communities there were no significant differences from the figures obtained without seasonal variation. We therefore decided to estimate a single probit common to all four quarters.

These considerations finally led us to choose a probit regression model corrected by a linear trend. In essence, the procedure consisted of the following stages:

1. Calculation of the transition probabilities between the old and the new definition of unemployment with a probit regression model, using as endogenous variable a binary indicator of whether or not the individual is unemployed under both definitions, and as exogenous variables sex, age group (under 25, or 25 and over) and the interaction between them, with each record weighted by its elevation (grossing-up) factor.
2. Once these transition probabilities had been obtained, application of a linear trend under which the effect of the change of definition disappears completely in the third quarter of 1976.
3. From that quarter onwards, application of the linearly adjusted probabilities to the historic unemployment series.

The models produced by this procedure are presented below, both by Autonomous Community and for the national total. The theoretical national model is included because, although the national results are obtained by aggregating the data of all Autonomous Communities, we thought it would be interesting to compare its behaviour with that observed in the different regions.

The general model is:

$$p = P(Y = 1 \mid \mathrm{sex}, \mathrm{age}) = \Phi(\beta_0 + \beta_1\,\mathrm{sex} + \beta_2\,\mathrm{age} + \beta_{12}\,\mathrm{int})$$

where

$$Y = \begin{cases} 1 & \text{if unemployed under both definitions} \\ 0 & \text{otherwise} \end{cases} \qquad \mathrm{age} = \begin{cases} 1 & \text{if under 25 years} \\ 0 & \text{if 25 years or over} \end{cases}$$

$$\mathrm{sex} = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases} \qquad \mathrm{int} = \mathrm{age}\cdot\mathrm{sex} = \begin{cases} 1 & \text{if male under 25 years} \\ 0 & \text{in any other case} \end{cases}$$

and $\Phi$ is the distribution function of the standard normal distribution N(0,1).

The fitted models for the national total and the Autonomous Communities are:

1. National total: $p = \Phi(0.8814 + 0.1980\,\mathrm{sex} + 0.2099\,\mathrm{age} - 0.1439\,\mathrm{int})$
2. Andalusia: $p = \Phi(0.9293 + 0.2796\,\mathrm{sex} + 0.1557\,\mathrm{age} - 0.2420\,\mathrm{int})$
3. Aragon: $p = \Phi(0.4767 + 0.3464\,\mathrm{sex} + 0.2413\,\mathrm{age} - 0.1555\,\mathrm{int})$
4. Asturias: $p = \Phi(0.5064 - 0.0214\,\mathrm{sex} + 0.1348\,\mathrm{age} - 0.0159\,\mathrm{int})$
5. Balearic Islands: $p = \Phi(1.2164 + 0.2549\,\mathrm{sex} + 0.1422\,\mathrm{age} - 0.1729\,\mathrm{int})$
6. Canary Islands: $p = \Phi(0.9877 + 0.0042\,\mathrm{sex} + 0.2400\,\mathrm{age} - 0.0606\,\mathrm{int})$
7. Cantabria: $p = \Phi(0.8439 + 0.0614\,\mathrm{sex} + 0.1411\,\mathrm{age} - 0.1244\,\mathrm{int})$
8. Castile-La Mancha: $p = \Phi(0.5847 + 0.1796\,\mathrm{sex} + 0.2603\,\mathrm{age} - 0.2214\,\mathrm{int})$
9. Castile and Leon: $p = \Phi(0.8079 + 0.1838\,\mathrm{sex} + 0.2393\,\mathrm{age} - 0.1982\,\mathrm{int})$
10. Catalonia: $p = \Phi(1.9164 + 0.0299\,\mathrm{sex} + 0.0813\,\mathrm{age} + 0.1827\,\mathrm{int})$
11. Valencian Community: $p = \Phi(1.0515 + 0.3167\,\mathrm{sex} + 0.2786\,\mathrm{age} - 0.1830\,\mathrm{int})$
12. Extremadura: $p = \Phi(0.2660 + 0.3525\,\mathrm{sex} + 0.2351\,\mathrm{age} - 0.3022\,\mathrm{int})$
13. Galicia: $p = \Phi(0.9522 + 0.0935\,\mathrm{sex} + 0.2403\,\mathrm{age} - 0.2689\,\mathrm{int})$
14. Madrid: $p = \Phi(0.5429 + 0.0524\,\mathrm{sex} + 0.2332\,\mathrm{age} + 0.0060\,\mathrm{int})$
15. Murcia: $p = \Phi(1.1251 + 0.3664\,\mathrm{sex} + 0.1863\,\mathrm{age} - 0.2417\,\mathrm{int})$
16. Navarre: $p = \Phi(0.9591 + 0.1577\,\mathrm{sex} + 0.3882\,\mathrm{age} - 0.1288\,\mathrm{int})$
17. Basque Country: $p = \Phi(1.1264 + 0.1955\,\mathrm{sex} + 0.1961\,\mathrm{age} - 0.0252\,\mathrm{int})$
18. La Rioja: $p = \Phi(0.4835 + 0.3708\,\mathrm{sex} + 0.1234\,\mathrm{age} - 0.2268\,\mathrm{int})$
19. Ceuta and Melilla: $p = \Phi(-0.5076 + 0.3515\,\mathrm{sex} + 0.0873\,\mathrm{age} - 0.3648\,\mathrm{int})$

Three tables are presented below. Table 1 sets out the marginal probabilities by sex and by age group (under 25 years old, or 25 years old and over). The next two tables show the probability of a person remaining in the unemployed category when sex and age are considered jointly; Table 2 leaves out the interaction effect, while Table 3 includes it.

Table 1. Probability of remaining unemployed: marginal probabilities by sex and by age group

Region                      Males       Females     Under 25    25 and over
National total              0,86365     0,82242     0,86790     0,83087
Andalusia                   0,88222     0,83241     0,86482     0,85108
Aragon                      0,80148     0,69977     0,78858     0,72263
Asturias (Principality of)  0,69641     0,70304     0,73358     0,69068
Balearic Islands            0,92807     0,89424     0,91921     0,90623
Canary Islands              0,85009     0,85028     0,88484     0,83880
Cantabria                   0,81844     0,80751     0,82974     0,80735
Castile and Leon            0,84197     0,80344     0,85108     0,80727
Castile-La Mancha           0,78084     0,73857     0,79593     0,74042
Catalonia                   0,97805     0,97346     0,98202     0,97313
Valencian Community         0,91822     0,86705     0,91745     0,87923
Extremadura                 0,72706     0,62104     0,69948     0,65503
Galicia                     0,85048     0,84007     0,86665     0,83784
Madrid (Community of)       0,74496     0,72114     0,78965     0,71332
Murcia (Region of)          0,92983     0,88027     0,91328     0,89584
Navarre                     0,88126     0,84910     0,91312     0,84528
Basque Country              0,91375     0,87677     0,91985     0,88421
La Rioja                    0,79714     0,69659     0,74579     0,73862
Ceuta & Melilla             0,40617     0,31503     0,33508     0,36117

Table 2. Probability of remaining unemployed by sex and age group, without interaction

                            Males                   Females
Region                      Under 25    25 and over Under 25    25 and over
National total              0,88613     0,85513     0,85116     0,81431
Andalusia                   0,88974     0,87955     0,84239     0,82943
Aragon                      0,83583     0,78752     0,74796     0,68739
Asturias (Principality of)  0,72923     0,68544     0,73741     0,69422
Balearic Islands            0,93394     0,92535     0,90287     0,89144
Canary Islands              0,88396     0,83757     0,88571     0,83977
Cantabria                   0,83415     0,81308     0,82527     0,80352
Castile and Leon            0,81606     0,76667     0,78084     0,72658
Castile-La Mancha           0,86911     0,83165     0,83740     0,79457
Catalonia                   0,98354     0,97561     0,98043     0,97129
Valencian Community         0,93914     0,91007     0,89848     0,85690
Extremadura                 0,75510     0,71922     0,65481     0,61341
Galicia                     0,87090     0,84346     0,86311     0,83461
Madrid (Community of)       0,79735     0,72446     0,78184     0,70622
Murcia (Region of)          0,93876     0,92584     0,89389     0,87462
Navarre                     0,92422     0,86431     0,90363     0,83358
Basque Country              0,93367     0,90634     0,90544     0,87036
La Rioja                    0,80556     0,79479     0,70655     0,69316
Ceuta & Melilla             0,38684     0,41439     0,29736     0,32250

Table 3. Probability of remaining unemployed by sex and age group, with interaction

                            Males                   Females
Region                      Under 25    25 and over Under 25    25 and over
National total              0,87400     0,85980     0,86245     0,81096
Andalusia                   0,86921     0,88665     0,86105     0,82364
Aragon                      0,81831     0,79477     0,76362     0,68321
Asturias (Principality of)  0,72707     0,68618     0,73931     0,69373
Balearic Islands            0,92515     0,92939     0,91286     0,88808
Canary Islands              0,87927     0,83938     0,89022     0,83835
Cantabria                   0,82173     0,81734     0,83768     0,80064
Castile and Leon            0,84914     0,83932     0,85250     0,79042
Castile-La Mancha           0,78903     0,77763     0,80093     0,72061
Catalonia                   0,98646     0,97420     0,97713     0,97235
Valencian Community         0,92839     0,91438     0,90826     0,85350
Extremadura                 0,70932     0,73187     0,69184     0,60486
Galicia                     0,84543     0,85214     0,88346     0,82950
Madrid (Community of)       0,79801     0,72419     0,78116     0,70640
Murcia (Region of)          0,92452     0,93209     0,90513     0,86973
Navarre                     0,91561     0,86796     0,91105     0,83125
Basque Country              0,93225     0,90689     0,90699     0,86999
La Rioja                    0,77365     0,80352     0,72805     0,68564
Ceuta & Melilla             0,33231     0,43799     0,33715     0,30588

These results show that males are more likely than females to remain in the unemployed category under the new definition, and that the probability is also higher for young people under 25 than for people aged 25 and over (Table 1). This general pattern is broken only in Asturias and the Canary Islands as regards sex, and in Ceuta and Melilla taken as a whole as regards age.
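Because the probit model has four parameters and sex and age group define only four cells, the model is saturated: its fitted probability in each cell equals the weighted share of that cell's old-definition unemployed who are also unemployed under the new definition. The sketch below uses this to move between coefficients and cell probabilities with Python's standard-library `statistics.NormalDist`. It evaluates the national model, taking the interaction coefficient as -0.1439 (the sign that reproduces the national figures of Table 3), and then recovers the coefficients from the four national probabilities of Table 3. The function names are illustrative, not part of the original procedure.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal N(0, 1)


def probit_probability(beta, sex, age):
    """p = Phi(b0 + b1*sex + b2*age + b12*int), with int = sex*age.
    sex: 1 male, 0 female; age: 1 under 25, 0 aged 25 and over."""
    b0, b1, b2, b12 = beta
    return PHI.cdf(b0 + b1 * sex + b2 * age + b12 * sex * age)


def coefficients_from_cells(p_m_young, p_m_old, p_f_young, p_f_old):
    """Invert the saturated model: each coefficient is a contrast of the
    cell probabilities on the probit (inverse normal) scale."""
    z_my, z_mo, z_fy, z_fo = (PHI.inv_cdf(p) for p in
                              (p_m_young, p_m_old, p_f_young, p_f_old))
    return (z_fo,                       # b0: females aged 25 and over
            z_mo - z_fo,                # b1: sex effect
            z_fy - z_fo,                # b2: age effect
            z_my - z_mo - z_fy + z_fo)  # b12: interaction


national = (0.8814, 0.1980, 0.2099, -0.1439)
cells = {(s, a): probit_probability(national, s, a) for s in (1, 0) for a in (1, 0)}
for (s, a), p in cells.items():
    print("male" if s else "female", "under 25" if a else "25 and over", round(p, 5))

# National row of Table 3: males <25, males 25+, females <25, females 25+.
print([round(b, 4) for b in coefficients_from_cells(0.87400, 0.85980, 0.86245, 0.81096)])
```

The same inversion applied to a region's row of Table 3 gives the sign of its interaction term directly: it is positive only where the male under-25 probability exceeds what the separate sex and age effects would predict.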
On the other hand, introducing the interaction between the sex and age-group variables (Table 3) generally lowers the probability of remaining in the unemployed category for males under 25 (compared with Table 2), except in Madrid and Catalonia, where the probability is higher.

The Appendix to this document sets out the results of the series revision procedure, obtained by applying models 2 to 19. The national total is calculated by adding the results of all the Autonomous Communities.

Appendix: revised results*

* The decimals of the old unemployment figures and of the new figures measured directly by the LFS may differ from those available on INEbase, because totals are sums that may have accumulated rounding errors.