Estimating Internally Forced Displacement (IFD) Flows in
Colombia using Mark and Recapture methods, 1996 – 2004
Soledad Granada
Universidad del Rosario and CERAC
Mauricio Sadinle García-Ruíz
Universidad Nacional and CERAC
Jorge A. Restrepo
Pontificia Universidad Javeriana and CERAC
September 2nd, 2007
Content
1. Difficulties with displacement flows in
Colombia
2. Objectives
3. IFD according to the available information
4. Sources
5. Methods
6. Results
7. Further planned work
Caveats/Acknowledgement
• Work still in progress
– We are waiting for a major third source of
data to become available and to integrate it in
order to improve on the estimation quality
– We are testing further improvements to
extract more information from the data
• Project developed thanks to the support of
Foreign Affairs-CANADA (DFAIT)
Difficulties with displacement flows in Colombia
• Multiple complementary sources
• Measurement:
– Government
– Catholic Church’s Conference of Bishops
– NGO Codhes
– ICRC
• Surveys
– Ibáñez (2006)
– Household survey (13 cities)
– Catholic Church’s Conference of Bishops
• Estimations
– NGO Codhes
Difficulties with displacement flows in Colombia
• In general, these sources offer similar trends,
different levels, but they have differential
coverage, methods and purposes
• As a result, there is no consistent information base for
humanitarian attention and planning
• Inefficiencies in humanitarian attention
• Attention to victims is also provided mostly on an ad hoc basis, both by government and other
assistance agencies
• Most attention is emergency-based and not
long-term oriented
Objectives
• To estimate the level of IFD using several
complementary sources, starting with two
• To estimate dynamics of IFD
• To provide a measure of the reliability of
the estimates
• To provide a measure of
coverage/undercount of sources
• To provide a “humanitarian” IFD risk map
IFD available information
[Figure: Internally Forced Displaced Population (Expulsion Rates) recorded by SIPOD of Acción Social, yearly, 1991-2007; vertical axis from 0 to 500,000. Source: Sistema de Información para Población Desplazada (SIPOD), Agencia Presidencial para la Acción Social.]
[Figure: Internally Forced Displaced Population (Expulsion Rates) recorded by RUT of Pastoral Social, yearly, 1990-2006; vertical axis from 0 to 70,000. Source: RUT from Pastoral Social.]
Sources
Secretariado Nacional de Pastoral Social –RUT-
• The RUT database has 242,565 records between 1996 and 2004; after
cleaning it up we had 236,795 usable records; only 2.37% had
incomplete fields.
• Several variables, including information on the victim, family relatives, and
the displacement event.
• Voluntary, not compulsory, survey at humanitarian attention and
Church attention posts.
• Low coverage, self-selected, good coverage in isolated areas
• Potential problematic biases
Acción Social –SUR-
• The Acción Social database holds 2,278,978 records covering the 1995-2006
period.
• Several variables, including family, individual, and
displacement-event information.
• Low “quality”: after cleaning it up, only 1,616,743 records were
usable.
• Compulsory registry to obtain humanitarian emergency aid from the
government and to be able to apply for further assistance
• Self-selected
• Large coverage, mostly in urban areas and large cities
Methods
• We use RUT and Acción Social (AS) for the 1996-2004 period to
estimate the level of displacement for day-municipality
pairs, using capture-recapture (C-R) census correction
models (Pollock, 2002, JASA), also known as mark and recapture (see Wikipedia)
• C-R permits estimating population levels from
incomplete samples of the population drawn by different, overlapping sources
• It assumes:
– Two samples (at record level)
– Equal probability of inclusion for every individual in every sample
– Independence between samples
– It is possible to identify the intersection
– Population is closed
Methods
• Under those assumptions, estimation is
unbiased (consistent) and efficient.
• Applications in the natural sciences are
countless
• In the social sciences, Ball et al. have
pioneered the use of these techniques to
estimate the level of HR violations in civil
war contexts (Guatemala, Perú, Timor
Leste) (see Ball et al., 2002).
Methods
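Estimators used
• Lincoln-Petersen:
$$\hat{N} = \frac{n_1 n_2}{m}$$
• Chapman:
$$\hat{N} = \frac{(n_1 + 1)(n_2 + 1)}{m + 1} - 1$$
where $n_1$ is the number of records at SUR, $n_2$ the number of records at RUT, and $m$ the number recaptured (records found in both sources).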
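As a minimal sketch of the two estimators above, the following Python functions compute the Lincoln-Petersen estimate n1·n2/m and the Chapman estimate (n1+1)(n2+1)/(m+1) − 1. The function names and the example counts are illustrative assumptions, not figures from this study.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Lincoln-Petersen estimate N = n1 * n2 / m (undefined when m == 0)."""
    if m <= 0:
        raise ValueError("Lincoln-Petersen needs at least one recapture (m > 0)")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman estimate N = (n1 + 1)(n2 + 1) / (m + 1) - 1; defined even for m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical stratum: 200 records in source 1, 50 in source 2, 10 matched.
print(lincoln_petersen(200, 50, 10))  # 1000.0
print(round(chapman(200, 50, 10), 1))  # 930.9
```

With few recaptures the two estimates can differ noticeably, which is one reason to report both.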
Estimators
• Chapman (1951) improves on Lincoln-Petersen
• Chapman is unbiased if (Wittes, 1972):
$$n_1 + n_2 \geq N$$
• Chapman (1951) shows that the estimator has
negative bias if:
$$n_1 + n_2 < N$$
• The most problematic of all assumptions (in terms of the quality of
the estimate) is the equi-probability of capture across individuals
• Sekar and Deming (1949) propose a stratified estimation procedure
in order to account for the effect of differential probability of capture.
The procedure thus yields the estimate:
$$\hat{N} = \sum_i \hat{N}_i$$
• It assumes differential probability of capture across strata, the same
probability within strata, independence between samples, and a closed
population.
• This way the (upward) bias created falls dramatically
• The quality of stratification determines the quality of the estimate
Methods
• We then use both the Lincoln-Petersen and
Chapman (preferred) stratified estimators
• When $m = 0$ in stratum $i$, we set $\hat{N}_i = n_{1i} + n_{2i}$
• That is, we use the simple sum of the two
observed samples.
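The stratified procedure can be sketched as follows: each stratum contributes its own Chapman estimate, and strata with no recaptures contribute the simple sum n1 + n2. The tuple layout `(n1, n2, m)` and the example strata are hypothetical, chosen only to illustrate the rule.

```python
from typing import Iterable, Tuple


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman estimate (n1 + 1)(n2 + 1) / (m + 1) - 1."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def stratified_estimate(strata: Iterable[Tuple[int, int, int]]) -> float:
    """Sum per-stratum estimates; strata are (n1, n2, m) tuples."""
    total = 0.0
    for n1, n2, m in strata:
        if m == 0:
            # No recaptures in this stratum: fall back to n1 + n2.
            total += n1 + n2
        else:
            total += chapman(n1, n2, m)
    return total


# Three hypothetical day-municipality strata; the second has no matches.
strata = [(200, 50, 10), (40, 5, 0), (120, 30, 6)]
print(round(stratified_estimate(strata), 1))
```

The fallback keeps every observed record in the total while avoiding the very large values that an estimator produces when no overlap is observed.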
How to define the strata?
• There are several applications that take the strata as defined
exogenously (communities, regions, etc.)
• We want to obtain k statistically different groups of records that will
have a statistically similar probability of being included in the group:
$$\hat{p}_{1j} = \frac{m_j}{n_{2j}}, \qquad \hat{p}_{2j} = \frac{m_j}{n_{1j}}$$
• We test the null hypothesis:
$$H_0: \hat{p}_{e1} = \hat{p}_{e2} = \cdots = \hat{p}_{ek}$$
against
$$H_1: \hat{p}_{ei} \neq \hat{p}_{ej} \text{ for some } i \neq j$$
where $e = 1, 2$.
How to define the strata?
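• We test using:
$$Q = \sum_{i=1}^{k} \frac{(m_i - n_i \pi_0)^2}{n_i \pi_0 (1 - \pi_0)}, \qquad \pi_0 = \frac{\sum_{i=1}^{k} m_i}{\sum_{i=1}^{k} n_i}$$
• Under the null hypothesis, Q is distributed as:
$$Q \sim \chi^2_{k-1}$$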
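A minimal sketch of this homogeneity test, assuming `m[i]` is the number of matched records in candidate stratum i and `n[i]` is the number of records of the reference source in that stratum (so that m[i]/n[i] is the estimated capture probability). The function name and example counts are hypothetical; the critical value 3.841 is the standard 5% point of the chi-square distribution with 1 degree of freedom.

```python
from typing import Sequence, Tuple


def homogeneity_q(m: Sequence[int], n: Sequence[int]) -> Tuple[float, int]:
    """Return (Q, degrees of freedom) for k candidate strata."""
    if len(m) != len(n) or len(m) < 2:
        raise ValueError("need matching sequences with at least two strata")
    pi0 = sum(m) / sum(n)  # pooled capture probability under H0
    if not 0.0 < pi0 < 1.0:
        raise ValueError("pooled probability must lie strictly between 0 and 1")
    q = sum((mi - ni * pi0) ** 2 / (ni * pi0 * (1.0 - pi0)) for mi, ni in zip(m, n))
    return q, len(m) - 1


# Two hypothetical strata with capture rates of 30% and 10%.
q, df = homogeneity_q([30, 10], [100, 100])
CRITICAL_5PCT_DF1 = 3.841  # chi-square 5% critical value for 1 degree of freedom
print(q, df, q > CRITICAL_5PCT_DF1)
```

Here Q exceeds the critical value, so the two groups would be kept as separate strata; for more than two strata the critical value for k − 1 degrees of freedom applies instead.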
Procedure for stratification
• We test for stratification by municipality (admin unit) of
arrival, and day of arrival
• We further jointly test for strata by day and municipality
of arrival, creating strata for each day and municipality
pairs.
• We further test within each municipality across consecutive
days, in order to rule out over-stratification within a
municipality, which would lead to an upward bias.
• (We do not do this by municipality as these are more
“natural” strata).
• The rejection level was set at 5% for all tests
Matching criteria
• We define the matching criteria according to
date of expulsion and arrival, place of expulsion
and arrival, and gender. Stricter
criteria yielded no substantial difference in the
estimates.
• We create an indicator for each record and
compare the indicators across sources using a computer
algorithm
• We do not have access to more specific
characteristics of the individual (name, id) that
would help us improve the matching
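One way such an indicator-based comparison could work is sketched below. The field names, the normalisation (trimming and lower-casing), and the rule of matching duplicate keys one-to-one are all assumptions made for illustration; the source does not describe the algorithm in this detail.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical field names for the five matching criteria.
FIELDS = ("expulsion_date", "arrival_date", "expulsion_place", "arrival_place", "gender")


def match_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build the matching indicator for one record."""
    return tuple(record[f].strip().lower() for f in FIELDS)


def count_matches(source1: Iterable[Dict[str, str]], source2: Iterable[Dict[str, str]]) -> int:
    """Count records present in both sources, pairing duplicate keys one-to-one."""
    keys1 = Counter(match_key(r) for r in source1)
    keys2 = Counter(match_key(r) for r in source2)
    # Counter intersection keeps the minimum count of each shared key.
    return sum((keys1 & keys2).values())


a = {"expulsion_date": "1999-03-01", "arrival_date": "1999-03-04",
     "expulsion_place": "05001", "arrival_place": "11001", "gender": "F"}
b = dict(a, gender="M")
print(count_matches([a, a, b], [a, dict(b, arrival_place="76001")]))  # 1
```

Because several individuals can share the same key, the one-to-one pairing avoids counting more matches than either source can support.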
Confidence: Bootstrapping methods
• In order to obtain confidence intervals, and given
that we do not know the theoretical distribution
of the estimator, we use bootstrapping methods
to approximate that distribution.
We follow Buckland and Garthwaite
(1991).
• We perform 10,000 replications of sub-samples
with replacement in order to obtain CIs and an
estimate of the variance of the estimator.
• We correct for the bias of the
estimator, $R = \hat{N} - N$, using the bootstrap
estimate $R^*$ of that bias, obtaining: $\hat{N}_B = \hat{N} - R^*$
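The bias correction and percentile interval can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: it resamples capture histories (source 1 only, source 2 only, both, neither) with replacement from a population of size round(N̂), applies the Chapman estimator to each replicate, takes R* as the bootstrap mean minus N̂, and reports 2.5 and 97.5 percentiles. The function names, the single-stratum setup, and the example counts are hypothetical.

```python
import random
import statistics
from typing import Tuple


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman estimate (n1 + 1)(n2 + 1) / (m + 1) - 1."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def bootstrap_chapman(n1: int, n2: int, m: int, reps: int = 1000,
                      seed: int = 0) -> Tuple[float, float, Tuple[float, float]]:
    """Return (estimate, bias-corrected estimate, percentile CI)."""
    rng = random.Random(seed)
    n_hat = chapman(n1, n2, m)
    pop = round(n_hat)
    cells = ["10", "01", "11", "00"]  # capture histories
    unseen = max(pop - (n1 + n2 - m), 0)
    weights = [n1 - m, n2 - m, m, unseen]
    estimates = []
    for _ in range(reps):
        draw = rng.choices(cells, weights=weights, k=pop)
        both = draw.count("11")
        b1 = draw.count("10") + both
        b2 = draw.count("01") + both
        estimates.append(chapman(b1, b2, both))
    estimates.sort()
    bias = statistics.fmean(estimates) - n_hat  # R*
    corrected = n_hat - bias                    # N_B = N_hat - R*
    ci = (estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1])
    return n_hat, corrected, ci


# Hypothetical stratum: 200 and 50 records, 10 matched.
print(bootstrap_chapman(200, 50, 10))
```

With a stratified estimator the same idea would be applied within each stratum and the replicate totals summed before taking the mean and percentiles.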
Results
• The following are the bias-corrected and uncorrected point estimates
from the Lincoln-Petersen and Chapman estimators.
• In graphs and maps we present the Chapman estimate

Estimator          Point estimate   Corrected estimate*   Confidence interval
Chapman            2,440,207        2,504,760             2,307,363 - 2,447,239
Lincoln-Petersen   2,732,908        2,843,858             2,495,533 - 2,761,800

n1 (Acción Social): 1,616,743
n2 (RUT): 236,795
m (Intersection): 28,888
* Estimate corrected by the bootstrap-estimated bias
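As a quick arithmetic check on the figures reported above, the short script below derives the bias implied by the Chapman point and corrected estimates (from N̂_B = N̂ − R*) and the share of the estimated total captured by each source. Reading n/N̂ as a coverage rate is an interpretation added here for illustration, and the variable names are assumptions.

```python
n1 = 1_616_743            # usable Acción Social (SUR) records
n2 = 236_795              # usable RUT records
m = 28_888                # records matched in both sources
n_hat = 2_440_207         # Chapman point estimate
n_hat_corrected = 2_504_760  # bias-corrected Chapman estimate

# N_B = N_hat - R*  implies  R* = N_hat - N_B
implied_bias = n_hat - n_hat_corrected

# Distinct records observed by at least one source
union = n1 + n2 - m

coverage = {
    "SUR": n1 / n_hat,
    "RUT": n2 / n_hat,
    "either source": union / n_hat,
}

print(implied_bias, union)
for name, share in coverage.items():
    print(f"{name}: {share:.1%}")
```

On these figures the bootstrap bias is negative, which is why the corrected estimate lies above the point estimate.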
[Figure: Estimated distribution of the Chapman estimate; horizontal axis: estimate (2,250,000 to 2,550,000), vertical axis: density.]
[Figure: Estimated number of displaced persons, SUR, RUT and intersection, cumulative, 1996-2004; vertical axis from 0 to 3,000,000. Series: SUR, RUT, intersection, estimate, corrected estimate, 2.5 and 97.5 percentiles. Source: Conferencia Episcopal (RUT) and Acción Social (SUR). Processed by: CERAC.]
[Figure: Estimated number of displaced persons, SUR, RUT and intersection, yearly, 1996-2004; vertical axis from 0 to 800,000. Series: SUR, RUT, intersection, estimate, corrected estimate, 2.5 and 97.5 percentiles. Source: Conferencia Episcopal (RUT) and Acción Social (SUR). Processed by: CERAC.]
Departmental Results

Estimates by department    1996-1999   2000-2002   2003-2004       Total
Antioquia                     38,422     144,367      53,672     236,461
Atlántico                      5,137      58,295      23,254      86,686
Bogotá D.C.                   16,566      90,136     242,996     349,697
Bolívar                       35,495     151,062      21,479     208,036
Boyacá                           228       4,825       3,375       8,428
Caldas                           211      30,707      10,921      41,839
Caquetá                        6,066      71,686      80,031     157,783
Cauca                            145      25,727      10,391      36,263
Cesar                          3,938      69,803      34,987     108,729
Córdoba                       13,874      56,784       9,777      80,435
Cundinamarca                   1,916      24,178      18,475      44,569
Chocó                         16,174     101,594       7,017     124,786
Huila                          2,630      29,341      24,345      56,316
La Guajira                       703      20,073      16,222      36,998
Magdalena                      4,319      59,918      19,723      83,959
Meta                          13,957      28,549      22,555      65,061
Nariño                         1,996      66,058      31,537      99,590
Norte de Santander             6,735      40,712      20,228      67,675
Quindío                          173       7,826       6,438      14,437
Risaralda                        753      30,369      13,001      44,123
Santander                      8,263      47,647      15,703      71,613
Sucre                          8,890      78,906      18,230     106,025
Tolima                         2,804      71,211      20,868      94,883
Valle del Cauca                3,280      78,717      42,681     124,678
Arauca                           609       4,626       5,806      11,041
Casanare                         902       6,780       4,774      12,456
Putumayo                         316      27,758      16,386      44,460
San Andrés y Providencia          11          22          13          46
Amazonas                           1         225         487         713
Guainía                            6         348         856       1,210
Guaviare                         681      11,220       7,043      18,944
Vaupés                            17          20         698         735
Vichada                           81         804         649       1,534
Total                        195,299   1,440,294     804,614   2,440,207

Source: SUR and RUT. Estimates: CERAC.
[Map: Recorded by Secretariado Nacional de Pastoral Social (RUT)]
[Map: Recorded by Acción Social (SUR)]
[Map: Recaptured]
[Map: Estimated arrival IFD, 1996-2004]
[Map: Recorded exit IFD by the two systems (projected)]
[Map: Probability of being displaced by municipality (exit)]
[Map: Estimated arrival IFD population per 100,000 inhabitants]
Comparison with other displacement figures
• CODHES:
– Estimation (Jan 1985 - June 2006): 3,832,527*
– Estimation (Jan 1995 - June 2005): 2,576,610
• RUT:
– Estimation (1985 - 1994): 720,000
– Records (Jan 1995 - June 2004): 237,614
• CEDE:
• SUR:
– Records (Jan 1995 - June 2005): 1,877,328
• CERAC:
– Estimation (1996 - 2004): 2,440,207
* Includes the Conferencia Episcopal figure for the 1985 - 1994 period.
Further Planned Work
• We just received data until mid-2007 and will
consequently update the estimation
• We are devising a method to improve the estimation in
order to incorporate the large number of un-matchable
observations using a window-based recursive procedure
with those observations.
• We are researching the properties of the estimator using
Monte Carlo methods.