Presentation

European Conference on Quality in Official Statistics
3-6 May, 2010 - Helsinki
A mixture model for estimating under-coverage rate in Italian municipal population registers
by Marco Fortini and Gerardo Gallo
The National Institute of Statistics, Italy
Q2010
Helsinki
3-6 May 2010
Content of presentation
- Relationship between the Population Census and the Population Register in Italy
- Quality and accuracy of Population Census results and Population Register (PR) data
- Analysis of the 2001 Population Register Undercount (PRU events) at municipality level using administrative data
- Findings and solutions for the next Population Census rounds
Census returns and update of population register (1)
- The Census provides valuable data, at a reference day, on population size by socio-economic characteristics of usual residents
- The Population Register (PR) is managed at municipality level according to a legal framework; it continuously records personal data for administrative use (e.g. to establish the identity of individuals or a change of legal residence)
- Both data sources are affected by coverage and quality problems
Census returns and update of population register (2)
- Census data can be incorrect because of missed people (undercount) and because of duplicates or other erroneous enumerations (overcount)
- The PR is affected by overcoverage whenever emigrations and deaths do not lead to a deletion, and by undercoverage whenever immigrations and births do not lead to a formal registration
- Census returns play an important role in evaluating and controlling PR coverage, since the PR can be extensively updated every 10 years as a result of the comparison with Census records
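The Census-to-register comparison can be sketched as a simple set comparison. This is a minimal illustration that assumes, unrealistically, that both sources share a clean personal identifier; real record linkage relies on names, dates of birth and addresses. The identifiers and the `compare_census_register` helper are hypothetical.

```python
# Hypothetical identifiers: a clean shared key is an assumption made for
# illustration only; real linkage matches on names, birth dates, addresses.
census_ids = {"A01", "A02", "A03", "A05"}    # usual residents enumerated by the Census
register_ids = {"A01", "A02", "A04", "A05"}  # persons enrolled in the PR


def compare_census_register(census, register):
    """Classify coverage discrepancies between Census records and the PR."""
    return {
        # enumerated as usual residents but missing from the PR:
        # candidate PR undercount (PRU events)
        "pr_undercount": sorted(census - register),
        # enrolled in the PR but not found by the Census:
        # candidate PR overcoverage, or a Census undercount
        "pr_overcoverage": sorted(register - census),
        # present in both sources
        "matched": sorted(census & register),
    }


result = compare_census_register(census_ids, register_ids)
print(result)
```

Persons in the PR but not in the Census are ambiguous: they may be register overcoverage or people the Census missed, which is why both error types in the list above matter.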
Register-supported Census: implications and proposals
Key operational assumptions of a register-supported Census:
- Overcoverage events can be amended during the fieldwork operations of the 2011 Census
- The PR is affected by undercount for people who usually reside in the territory at the Census reference day without being enrolled in the PR
- The goal is to estimate, at municipal level, an approximate number of Population Register Undercount (PRU) events on the basis of the 2001 Census returns
Data description
- Data on PRU events are provided by the 8,101 municipal Register Offices (from Oct. 2001 to Dec. 2006)
- Local authorities verify and finalise the administrative procedures concerning persons who were enumerated as usual residents at the Census date but were not included in the municipal archive
- PRU events refer to people who moved to their current municipality without applying for a formal change of place of residence
PRU events according to 2001 Census returns
Absolute value of 2001 PRU events = 244,429, by municipal population class:
- Under 5,000 inh.: 55,248
- 5,000-19,999 inh.: 96,006
- 20,000-49,999 inh.: 44,164
- 50,000 inh. and more: 49,011
1,175 municipalities reported zero PRU events
Immigration rate and population register's undercount ratio of the larger Italian municipalities (over 250,000 inh.)
[Figure: PR undercount ratio plotted against the immigration rate (x 1,000 inh.) for Milan, Verona, Bologna, Naples, Florence, Bari, Palermo, Rome, Messina, Catania, Genoa, Venice and Turin]
The assumption based on the available evidence
- The observed 2001 PRU events can only be considered an underestimate of the true total
- The number of municipalities that updated their PR accurately, efficiently and promptly is unknown
- Two sets of municipalities can be expected:
  1) those updating the PR accurately
  2) those not achieving this task, or fulfilling it only partly and late
Why do we use mixture regression modelling?
- To correct the observed PRU events
- To take into account the underreporting by municipalities that did not properly update their population register (undercount)
How do finite mixture regression models work?
A finite mixture density is a weighted average of G component densities of the same or different type:
$$f(x) = \sum_{g=1}^{G} p_g \, f_g(x)$$
where the weights (prior probabilities) satisfy $p_g > 0$ and $\sum_{g=1}^{G} p_g = 1$.
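Within each component g, a linear regression model relates the dependent variable to one or more explanatory variables:
$$y_i = \alpha_g + \beta_g^{\top} x_i + \varepsilon_{ig}, \qquad \varepsilon_{ig} \sim N(0, \sigma_g^2), \qquad g = 1, \dots, G$$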
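The log-likelihood can then be written as
$$L(\alpha_g, \beta_g, \sigma_g, p_g) = \sum_{i=1}^{n} \log f(x_i) = \sum_{i=1}^{n} \log \left[ \sum_{g=1}^{G} p_g \, \varphi\left(y_i \mid \alpha_g, \beta_g; x_i, \sigma_g^2\right) \right]$$
where $\varphi$ is the normal density of component g; it can be maximised with the EM algorithm.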
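A minimal sketch of the EM algorithm for a mixture of linear regressions with normal errors follows. For brevity it assumes a single explanatory variable (the study uses POP and AI0205), and it starts from a partition of the residuals of one pooled regression line. The function names, the starting scheme and the simulated data are illustrative choices, not the authors' implementation.

```python
import math
import random


def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def weighted_line(x, y, w):
    """Weighted least squares fit of y = alpha + beta * x; returns (alpha, beta, var, sum_w)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    beta = sxy / sxx
    alpha = ym - beta * xm
    var = sum(wi * (yi - alpha - beta * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return alpha, beta, max(var, 1e-8), sw


def em_mixture_regression(x, y, G, n_iter=200):
    """EM for y_i = alpha_g + beta_g * x_i + eps, eps ~ N(0, sigma_g^2), g = 1..G."""
    n = len(x)
    # Start: split points into G equal groups by their residual from one OLS line
    a0, b0, _, _ = weighted_line(x, y, [1.0] * n)
    order = sorted(range(n), key=lambda i: y[i] - a0 - b0 * x[i])
    resp = [[0.0] * G for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * G // n] = 1.0
    loglik = -math.inf
    for _ in range(n_iter):
        # M-step: mixing weight p_g and a weighted regression per component
        # (a tiny floor on responsibilities keeps every fit well defined)
        params = []
        for g in range(G):
            w = [max(r[g], 1e-10) for r in resp]
            alpha, beta, var, sw = weighted_line(x, y, w)
            params.append((sw / n, alpha, beta, var))
        # E-step: posterior membership probabilities via log-sum-exp;
        # the log-likelihood is accumulated at the current parameters
        loglik = 0.0
        for i in range(n):
            logs = [math.log(p) + log_normal_pdf(y[i], a + b * x[i], v)
                    for p, a, b, v in params]
            top = max(logs)
            lse = top + math.log(sum(math.exp(lp - top) for lp in logs))
            loglik += lse
            resp[i] = [math.exp(lp - lse) for lp in logs]
    return params, loglik


# Simulated (hypothetical) data: half the points follow y = 1 + 2x, half y = 1 + 0.5x
rng = random.Random(42)
x = [rng.uniform(1, 10) for _ in range(200)]
y = [1 + (2.0 if i % 2 == 0 else 0.5) * xi + rng.gauss(0, 0.3) for i, xi in enumerate(x)]
params, loglik = em_mixture_regression(x, y, G=2)
for p, a, b, v in sorted(params, key=lambda t: t[2]):
    print(f"weight={p:.2f} alpha={a:.2f} beta={b:.2f} sigma={math.sqrt(v):.2f}")
```

The E-step works on the log scale so that points far from every regression line do not underflow to a zero total density. A random or k-means style start would also work; the residual split is simply a cheap deterministic choice.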
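The best-fitting model is selected by means of the Bayesian Information Criterion (BIC): the maximised log-likelihood penalised by the number of model parameters (degrees of freedom) multiplied by the logarithm of the number of municipalities, $\mathrm{BIC} = 2\hat{L} - k \log n$. The model with the highest BIC is chosen.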
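The BIC selection step can be sketched as follows. This assumes the "larger is better" form used above (many software packages instead report $-2\hat{L} + k \log n$ and minimise it, which is equivalent) and counts an intercept, p slopes and a variance per component plus G - 1 free mixing weights. The log-likelihood values are made up purely for illustration and are not results from the study.

```python
import math


def n_parameters(G, p):
    """Free parameters of a G-component mixture of linear regressions with p
    explanatory variables: per component an intercept, p slopes and a variance,
    plus G - 1 free mixing weights."""
    return G * (p + 2) + (G - 1)


def bic(loglik, G, p, n):
    """BIC in the 'larger is better' form: 2 * log-likelihood - k * log(n)."""
    return 2.0 * loglik - n_parameters(G, p) * math.log(n)


# Made-up maximised log-likelihoods for G = 1..4 (illustration only);
# n = 8,101 municipalities and p = 2 explanatory variables (POP, AI0205)
logliks = {1: -60000.0, 2: -59000.0, 3: -58800.0, 4: -58790.0}
scores = {G: bic(ll, G, p=2, n=8101) for G, ll in logliks.items()}
best_G = max(scores, key=scores.get)
print(best_G, {G: round(s, 1) for G, s in scores.items()})
```

Each extra component here adds 5 parameters, so it must raise the log-likelihood by more than about 5/2 * log(8101), roughly 22.5, to improve the BIC.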
The relationship between PRU events and the explanatory variables
Municipality population size (POP) and the average annual number of immigrants from 2002 to 2005 (AI0205) have the greatest explanatory power for PRU events (other promising variables were discarded after preliminary analysis)
The best-fitting model is a 3-component mixture regression
Expected PRU events according to municipal population size

Class of municipalities   Observed PRU events   Expected PRU events   Absolute difference
Under 5,000                            55,248               105,863                50,615
5,000 - 19,999                         96,006               251,210               155,204
20,000 - 49,999                        44,164               103,882                59,718
50,000 and more                        49,011                98,936                49,925
Total                                 244,429               559,892               315,463
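The absolute differences in the table are expected minus observed PRU events. The short check below recomputes them from the table values and adds the expected/observed ratio per class; the ratio is an extra derived quantity, not a figure reported in the slides.

```python
# Observed and expected PRU events by municipal population class (table values)
observed = {"Under 5,000": 55_248, "5,000-19,999": 96_006,
            "20,000-49,999": 44_164, "50,000 and more": 49_011}
expected = {"Under 5,000": 105_863, "5,000-19,999": 251_210,
            "20,000-49,999": 103_882, "50,000 and more": 98_936}

# Absolute difference = expected - observed; the ratio says how many times
# larger the model-based estimate is than the observed count
differences = {c: expected[c] - observed[c] for c in observed}
ratios = {c: expected[c] / observed[c] for c in observed}

for c in observed:
    print(f"{c:>16}: difference {differences[c]:>8,}   expected/observed {ratios[c]:.2f}")

# The class rows sum to 244,429 observed and 559,891 expected events; the
# published expected total (559,892) is one higher, presumably from rounding.
print(sum(observed.values()), sum(expected.values()))
```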
[Figure: observed and expected PRU ratio (%) by municipal population class (Under 5,000; 5,000-19,999; 20,000-49,999; 50,000 and more; Total)]
Expected PRU events according to geographical area
[Figure: observed PRU ratio, expected PRU ratio and % increase of PRU by geographical area (North, Centre, South)]
Final remarks
- This figure remains lower than the undercoverage estimated by the 2001 Census coverage survey (800,000 individuals)
- So the strategy based on a register-supported Census could be accomplished without causing trouble for …
- Each municipal Register Office should provide the Italian National Institute of Statistics with a final balance sheet reporting quality and accuracy information on the maintenance of the population archive