Computing Poverty measures with R vs. Stata
Rosendo Ramirez and Darryl McLeod
Professor Vinod R-Group presentation,
May 1, 2014
Fordham University E-530 Dealy 12 noon
Outline of Presentation
1. Accessing survey data in R and Stata. Peru has a survey of about 25,000 households, a longitudinal panel
covering 2007 to 2011. We use the 2011 survey data, reading it first into Stata (it is published in Stata
format by the Peruvian …..???)
2. To make the survey sample representative of the 30 million people in Peru, we have to weight each family
by its relative prevalence in the national population. This weighting scheme is set up by svyset in
Stata and, more or less, by a function called svydesign (in the survey package) in R.
3. We also use a Stata program called sepov to compute p(0), p(1) and p(2), three standard poverty measures
derived from the Foster-Greer-Thorbecke (FGT) poverty index.
4. We find that the Stata and R routines are equally capable of computing basic poverty rates, but so far we
have not been able to reproduce in R, for the poverty measures, the survey design or weighting scheme
Stata uses to make a HH survey representative of the entire population.
5. On the other hand, R is free and constantly being updated, and its present capacity to handle large data
sets such as the Peru survey of 25,000 households is impressive.
6. As of this writing, Stata's panel data routines (not shown here) are a bit easier to use than those in R. In
fact we have not figured out how to load the entire 5-year Peruvian survey into R (suggestions
welcome).
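The weighting idea in item 2 can be illustrated with a toy sample. The five households, their expenditures and their weights below are hypothetical and are not taken from the Peruvian survey; the sketch only shows how an expansion weight changes a mean when poorer households are over-represented in the sample.

```r
# Hypothetical toy sample: 5 households (NOT from the Peruvian survey).
toy <- data.frame(
  area   = c("urban", "urban", "rural", "rural", "rural"),
  spend  = c(600, 500, 200, 250, 150),    # monthly per capita expenditure
  weight = c(4000, 4000, 700, 700, 600)   # people each household stands for
)

# The weights add up to the population the sample is meant to represent.
population <- sum(toy$weight)

# The unweighted mean treats every sampled household alike.
mean_unweighted <- mean(toy$spend)

# The weighted mean counts each household in proportion to its weight.
mean_weighted <- weighted.mean(toy$spend, toy$weight)

print(c(population = population,
        unweighted = mean_unweighted,
        weighted   = mean_weighted))
```

In this made-up sample the rural households are three of five observations but only 20 percent of the represented population, so the unweighted mean (340) understates the weighted mean (480.5).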
Resources/Files
Camtasia tutorial for RStudio, early version (needs editing); the mp4 video can be downloaded
How do I use the Stata survey (svy) commands?
The Nuevo Sol is the currency of Peru. Its most commonly quoted exchange rate is the PEN to USD rate.
The currency code for Nuevos Soles is PEN, and the currency symbol is S/.
Data: 2011 HH Survey data for Peru, from the
Stata Do file for tutorial: Sample Stata output with notes
All files on http://www.gdsnet.org/
R files: R file for reading Stata survey data; R inflation VAR data; Prueba.R (not sure what this file is)
Background note on the FGT poverty and severity measures. The three standard measures are the headcount
ratio, H or p(0); the poverty gap, p(1) = H*I, where I is the average distance of the poor below the
poverty line as a fraction of that line; and the severity measure p(2), or squared gap. A useful,
encompassing measure of poverty is the Foster-Greer-Thorbecke (FGT) index:

\[ FGT_\alpha = \frac{1}{n} \sum_{i=1}^{q} v_i^{\alpha}, \qquad v_i = \frac{y_p - y_i}{y_p} \]

where y_p is the poverty line, y_i is the income of poor household i, v_i is the income gap or shortfall
of that household as a fraction of the poverty line, q is the number of poor households, and n is the
number of households in the entire population.

Suppose the poverty line is $400 and there are four poor people out of a total population (n) of 10. The
two rural poor people have $200 annual income and the two urban poor have $300. When α = 0 the FGT index
p(0) equals the basic headcount measure of poverty (H); here H = 4/10 = 0.40. When α = 1 the FGT index
p(1) is H*I, where I = (y_p - ȳ)/y_p is the average income shortfall, ȳ is the average income of the poor
and again y_p is the official poverty line; here ȳ = $250, I = 150/400 = 0.375 and p(1) = 0.40 × 0.375 =
0.15. When α = 2 the FGT index p(2) is the sum of the squared income gaps v_i divided by n; here
p(2) = (2 × 0.5² + 2 × 0.25²)/10 = 0.0625. Squaring gives the poorest more weight in the poverty index,
so that if the government redistributes income to the poorest of the poor, the index p(2) falls the most
("remember the neediest" is the NY Times motto).

The global standard for severe poverty is $38 per month or $1.25 a day PPP in low income countries.
Middle income countries like Peru use $2.50 per day or $76 per month as their severe poverty line, and
$4-$5 per day as their everyday or moderate poverty line.
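The ten-person example in the background note can be reproduced with a few lines of base R. This is a minimal sketch of the FGT formula, not the routine used by sepov or by the ineq package. The function name fgt and the incomes of the six non-poor people (set to $500 here) are assumptions made for illustration; any income above the $400 line would give the same result.

```r
# FGT index for incomes y and poverty line z.
# alpha = 0: headcount, alpha = 1: poverty gap, alpha = 2: severity.
fgt <- function(y, z, alpha) {
  poor <- y < z
  gap  <- ifelse(poor, (z - y) / z, 0)  # shortfall v_i, zero for the non-poor
  mean(poor * gap^alpha)                # (1/n) * sum over the poor of v_i^alpha
}

z <- 400
# Two rural poor at $200, two urban poor at $300, six non-poor (assumed $500).
y <- c(200, 200, 300, 300, rep(500, 6))

p0 <- fgt(y, z, 0)
p1 <- fgt(y, z, 1)
p2 <- fgt(y, z, 2)

# Cross-check p(1) = H * I, with I the average shortfall of the poor.
H <- mean(y < z)
I <- (z - mean(y[y < z])) / z

print(c(p0 = p0, p1 = p1, p2 = p2, H_times_I = H * I))
```

The sketch gives p(0) = 0.40, p(1) = 0.15 and p(2) = 0.0625, and confirms that H*I equals p(1).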
Note that the Peruvian currency, the Nuevo Sol, trades at about 2.8 per U.S. dollar. The PPP conversion
factor for Peru is about 1.66; in other words, a dollar spent in Peru (rural and urban) buys what $1.66
would buy in the United States.
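To relate the national poverty lines in soles to the dollar-a-day standards, the two conversion figures in the note can be chained together. The sketch below follows the note's reading of the PPP factor (a dollar in Peru buys what $1.66 buys in the U.S.) and assumes 30.4 days per month, which is what $1.25 a day being about $38 a month implies. The two lines in soles are the national means of linpe and linea reported in the results later in this document; the function name is made up for the example.

```r
soles_per_usd  <- 2.8    # market exchange rate quoted in the note
ppp_factor     <- 1.66   # note's reading: $1 in Peru buys what $1.66 buys in the U.S.
days_per_month <- 30.4   # assumption: implied by $1.25/day being about $38/month

# Convert a monthly amount in soles to PPP dollars per day.
soles_to_ppp_per_day <- function(soles_month) {
  usd_month <- soles_month / soles_per_usd   # dollars at the market rate
  ppp_month <- usd_month * ppp_factor        # PPP dollars
  ppp_month / days_per_month
}

# National mean extreme poverty line (linpe) and poverty line (linea), in soles.
lines_soles <- c(extreme = 143.03, poverty = 272.26)
ppp_per_day <- soles_to_ppp_per_day(lines_soles)
print(round(ppp_per_day, 2))
```

Under these assumptions the extreme poverty line comes to roughly $2.79 a day PPP and the poverty line to roughly $5.31 a day PPP, close to the $2.50 and $4-$5 benchmarks mentioned above.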
Files:
This Stata file contains the 24,809 HHs in the 2011 survey: sumaria2011.dta
Do file program: sumaria.do
Stata code
clear
* open the data
use "D:\economic_research\r-software\fordham\sumaria2011", clear
*set the data survey design
svyset conglome [pw=facpob], strata(estrato)
* monthly per capita expenditure – National
tabstat gpcm [aw=facpob], stats(mean semean sd n )
* mean of the monthly per capita extreme poverty line (linpe), in local currency (soles); exchange rate = 2.8 Soles/US$
* National
tabstat linpe if (estrato>=1) [aw=facpob], stats(mean p50)
* Urban
tabstat linpe if (estrato<6) [aw=facpob], stats(mean p50)
*Rural
tabstat linpe if (estrato>=6) [aw=facpob], stats(mean p50)
* mean of the monthly per capita poverty line (linea), in local currency (soles); exchange rate = 2.8 Soles/US$
* National
tabstat linea if (estrato>=1) [aw=facpob], stats(mean p50)
* Urban
tabstat linea if (estrato<6) [aw=facpob], stats(mean p50)
* Rural
tabstat linea if (estrato>=6) [aw=facpob], stats(mean p50)
* Poverty headcount (linea is the poverty line)
* National
sepov gpcm [w=facpob], povline(linea)
* Urban
sepov gpcm [w=facpob] if (estrato<6), povline(linea)
* Rural
sepov gpcm [w=facpob] if (estrato>=6), povline(linea)
* Extreme poverty headcount (linpe is the extreme poverty line)
* National
sepov gpcm [w=facpob], povline(linpe)
* Urban
sepov gpcm [w=facpob] if (estrato<6), povline(linpe)
* Rural
sepov gpcm [w=facpob] if (estrato>=6), povline(linpe)
Stata Results
1. * monthly per capita expenditure - National
. tabstat gpcm [aw=facpob], stats(mean semean sd n )

    variable |      mean  se(mean)        sd         N
-------------+----------------------------------------
        gpcm |  484.6624  2.556388  402.6534     24809
------------------------------------------------------

2. * mean of the monthly per capita extreme poverty line (linpe), in local currency (soles); exchange rate = 2.8 Soles/US$
. * National
. tabstat linpe if (estrato>=1) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linpe |  143.0299  .1328722  137.7326
--------------------------------------------

. * Urban
. tabstat linpe if (estrato<6) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linpe |  150.6009  .1561769  143.5867
--------------------------------------------

. * Rural
. tabstat linpe if (estrato>=6) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linpe |  121.2698  .0161088  121.4675
--------------------------------------------

3. * mean of the monthly per capita poverty line (linea), in local currency (soles); exchange rate = 2.8 Soles/US$
. * National
. tabstat linea if (estrato>=1) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linea |  272.2597  .3591983  275.7272
--------------------------------------------

. * Urban
. tabstat linea if (estrato<6) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linea |  296.3015  .3693753  277.5714
--------------------------------------------

. * Rural
. tabstat linea if (estrato>=6) [aw=facpob], stats(mean semean p50)

    variable |      mean  se(mean)       p50
-------------+------------------------------
       linea |  203.1609  .0766447  200.8827
--------------------------------------------
4. * Poverty headcount
. * National
. sepov gpcm [w=facpob], povline(linea)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                          Number of obs    =      24809
Strata:   <one>                           Number of strata =          1
PSU:      <observations>                  Number of PSUs   =      24809
                                          Population size  =   29943619

----------------------------------------------------------------------
    Mean |    Estimate   Std. Err.    [95% Conf. Interval]        Deff
---------+------------------------------------------------------------
      p0 |    .2782429      .00415    .2701086    .2863772    2.127523
      p1 |    .0780467    .0014051    .0752928    .0808007    1.902044
      p2 |    .0318401    .0007396    .0303904    .0332898    1.827785
----------------------------------------------------------------------

. * Urban
. sepov gpcm [w=facpob] if (estrato<6), povline(linea)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                          Number of obs    =      15065
Strata:   <one>                           Number of strata =          1
PSU:      <observations>                  Number of PSUs   =      15065
                                          Population size  =   22214450

----------------------------------------------------------------------
    Mean |    Estimate   Std. Err.    [95% Conf. Interval]        Deff
---------+------------------------------------------------------------
      p0 |    .1799882    .0048984    .1703869    .1895896    2.448933
      p1 |    .0400419    .0014403    .0372188     .042865    2.561502
      p2 |    .0138027    .0006963    .0124379    .0151675    2.724085
----------------------------------------------------------------------

. * Rural
. sepov gpcm [w=facpob] if (estrato>=6), povline(linea)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                          Number of obs    =       9744
Strata:   <one>                           Number of strata =          1
PSU:      <observations>                  Number of PSUs   =       9744
                                          Population size  =  7729168.5

----------------------------------------------------------------------
    Mean |    Estimate   Std. Err.    [95% Conf. Interval]        Deff
---------+------------------------------------------------------------
      p0 |    .5606372    .0062727    .5483413    .5729331    1.556334
      p1 |    .1872767     .002944    .1815059    .1930476    1.737225
      p2 |    .0836816    .0018121    .0801295    .0872337    1.834893
----------------------------------------------------------------------
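As a consistency check on the three sepov tables above (national, urban and rural at the poverty line linea), the national FGT measures should equal the population-weighted average of the urban and rural measures, because the FGT index is additively decomposable across groups. A minimal sketch in R, with the figures typed in by hand from the Stata output; the variable names are chosen for the example.

```r
# Figures copied from the sepov output for povline(linea).
pop <- c(urban = 22214450, rural = 7729168.5)    # population sizes

urban    <- c(p0 = 0.1799882, p1 = 0.0400419, p2 = 0.0138027)
rural    <- c(p0 = 0.5606372, p1 = 0.1872767, p2 = 0.0836816)
national <- c(p0 = 0.2782429, p1 = 0.0780467, p2 = 0.0318401)

# Population shares of the two areas.
share <- pop / sum(pop)

# Rebuild the national measures as a share-weighted average.
rebuilt <- share[["urban"]] * urban + share[["rural"]] * rural

print(round(rbind(national, rebuilt), 6))
```

The rebuilt figures agree with the reported national ones to about six decimal places, which suggests that the urban and rural subsamples (estrato<6 and estrato>=6) partition the national sample as intended.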
5. * Extreme poverty headcount
. * National
. sepov gpcm [w=facpob], povline(linpe)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                          Number of obs    =      24809
Strata:   <one>                           Number of strata =          1
PSU:      <observations>                  Number of PSUs   =      24809
                                          Population size  =   29943619

----------------------------------------------------------------------
    Mean |    Estimate   Std. Err.    [95% Conf. Interval]        Deff
---------+------------------------------------------------------------
      p0 |    .0634228    .0019537    .0595934    .0672523    1.594156
      p1 |    .0149874    .0005739    .0138625    .0161122    1.588561
      p2 |    .0053678    .0002667    .0048451    .0058906    1.497365
----------------------------------------------------------------------

. * Urban
. sepov gpcm [w=facpob] if (estrato<6), povline(linpe)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                          Number of obs    =      15065
Strata:   <one>                           Number of strata =          1
PSU:      <observations>                  Number of PSUs   =      15065
                                          Population size  =   22214450

----------------------------------------------------------------------
    Mean |    Estimate   Std. Err.    [95% Conf. Interval]        Deff
---------+------------------------------------------------------------
      p0 |    .0141625    .0015339    .0111558    .0171693    2.538724
      p1 |    .0027967    .0004132    .0019867    .0036066    2.922182
      p2 |    .0008881    .0001642    .0005662    .0012099    2.583605
----------------------------------------------------------------------

. * Rural
. sepov gpcm [w=facpob] if (estrato>=6), povline(linpe)
(sampling weights assumed)

Poverty measures for the variable gpcm: (unlabeled)

Survey mean estimation

pweight:  facpob                          Number of obs    =       9744
Strata:   <one>                           Number of strata =          1
PSU:      <observations>                  Number of PSUs   =       9744
                                          Population size  =  7729168.5

----------------------------------------------------------------------
    Mean |    Estimate   Std. Err.    [95% Conf. Interval]        Deff
---------+------------------------------------------------------------
      p0 |    .2050021    .0055135    .1941946    .2158097    1.817265
      p1 |    .0500248     .001753    .0465885    .0534611    1.902171
      p2 |    .0182432    .0008841    .0165102    .0199762    1.957318
----------------------------------------------------------------------
Poverty measures with R Software
# Set the working directory
setwd("D:/economic_research/r-software/fordham")
# Check the working directory
getwd()
# Read a Stata file with the foreign package (install it first if needed)
# Example Stata files: sumaria2011.dta, mus08psidextract.dta, etc.
library(foreign)
c<-read.dta("D:/economic_research/r-software/fordham/sumaria2011.dta")
summary(c$gpcm)
# Declare the survey design with the survey package (install it first if needed)
library(survey)
poverty<-svydesign(id=~conglome, strata=~estrato, weights=~facpob, data=c)
monthly_percapita_expenditure<-svymean(~gpcm, design=poverty)
monthly_percapita_expenditure
# Design-weighted means of the poverty line (linea) and the extreme poverty line (linpe)
linea<-svymean(~linea, design=poverty)
linea
linpe<-svymean(~linpe, design=poverty)
linpe
# Poverty measures with the ineq package (install it first if needed)
# pov() takes no weights, so the two calls below use the unweighted sample
library(ineq)
# Headcount at the mean extreme poverty line, then at the mean poverty line
pov(c$gpcm, 143.03, parameter=1, type ="Foster")
pov(c$gpcm, 272.26, parameter=1, type ="Foster")
R Software Results
1. > # monthly per capita expenditure - National
> monthly_percapita_expenditure<-svymean(~gpcm, design=poverty)
> monthly_percapita_expenditure
       mean     SE
gpcm 484.66 5.3645

Comparison:
We have the same mean monthly per capita expenditure but a different standard error of the mean.

              Mean gpcm   SE(mean gpcm)
Stata          484.6624        2.556388
R software       484.66          5.3645
2. > # mean of the national poverty line (linea)
> linea<-svymean(~linea, design=poverty)
> linea
        mean   SE
linea 272.26 0.83

3. > # mean of the national extreme poverty line (linpe)
> linpe<-svymean(~linpe, design=poverty)
> linpe
        mean     SE
linpe 143.03 0.3305
Comparison
We have the same mean extreme poverty line but a different standard error of the mean.

National      Mean linpe   SE(mean linpe)
Stata           143.0299         .1328722
R software        143.03           0.3305

We have the same mean poverty line but a different standard error of the mean.

National      Mean linea   SE(mean linea)
Stata           272.2597         .3591983
R software        272.26             0.83
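To see how far apart the two sets of standard errors are, the figures from the comparisons above can be put side by side. One possible reading, which has not been verified here, is that tabstat with aweights does not use the cluster and strata information declared by svyset, whereas svymean uses the full design passed to svydesign; clustering would then explain the larger standard errors in R. The sketch below only computes the ratios and does not test that explanation.

```r
# Standard errors of the mean, copied from the Stata and R output.
se <- data.frame(
  variable = c("gpcm", "linpe", "linea"),
  stata    = c(2.556388, 0.1328722, 0.3591983),
  r        = c(5.3645, 0.3305, 0.83)
)

# Ratio of the R (svymean) standard error to the Stata (tabstat) one.
se$ratio <- se$r / se$stata

print(se, digits = 4)
```

In all three cases the R standard error is between about 2.1 and 2.5 times the Stata one, while the point estimates agree.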
4. > # headcount at the national mean extreme poverty line and mean poverty line
> # National extreme poverty headcount
> pov(c$gpcm, 143.03, parameter=1, type ="Foster")
[1] 0.1050022
> # National poverty headcount
> pov(c$gpcm, 272.26, parameter=1, type ="Foster")
[1] 0.3374179
Comparison
Stata's sepov applies the survey weights and the household-specific poverty lines, while pov() in R uses
only the unweighted sample and a single national mean poverty line.

National                        Headcount extreme poverty   Headcount poverty
Stata (weighted sample)                          .0634228            .2782429
R software (unweighted data)                    0.1050022           0.3374179
I am still looking for other R packages that compute poverty measures using the survey design. So far I have
found only the ineq package, which works with the unweighted sample and not with the survey design (weights).
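In the meantime, the weighted point estimates can be sketched directly in base R, since a weighted FGT measure is just a weighted average of the powered shortfalls. The function name fgt_weighted and the toy data below are hypothetical; the column names mirror gpcm, linea and facpob in sumaria2011.dta only for readability. The sketch gives point estimates and does not produce design-based standard errors.

```r
# Weighted FGT index.
# y: expenditure, z: poverty line (one value or one per household),
# w: expansion weight, alpha: 0 (headcount), 1 (gap) or 2 (severity).
fgt_weighted <- function(y, z, w, alpha) {
  poor <- y < z
  gap  <- ifelse(poor, (z - y) / z, 0)   # shortfall, zero for the non-poor
  sum(w * poor * gap^alpha) / sum(w)
}

# Hypothetical toy data (NOT from sumaria2011.dta).
toy <- data.frame(
  gpcm   = c(100, 200, 300, 500, 800),
  linea  = c(250, 250, 300, 300, 300),    # household-specific poverty line
  facpob = c(500, 500, 1000, 4000, 4000)
)

p0_weighted   <- fgt_weighted(toy$gpcm, toy$linea, toy$facpob, 0)
p1_weighted   <- fgt_weighted(toy$gpcm, toy$linea, toy$facpob, 1)
p2_weighted   <- fgt_weighted(toy$gpcm, toy$linea, toy$facpob, 2)
p0_unweighted <- mean(toy$gpcm < toy$linea)

print(c(p0_weighted = p0_weighted, p1_weighted = p1_weighted,
        p2_weighted = p2_weighted, p0_unweighted = p0_unweighted))
```

In the toy data the weighted headcount is 0.10 against an unweighted 0.40, the same direction of difference as in the comparison above. On the survey data the analogous call would be fgt_weighted(c$gpcm, c$linea, c$facpob, 0); this has not been run against sumaria2011.dta here, so whether it reproduces the sepov estimate remains to be checked. For standard errors, one option to explore is adding the 0/1 poor indicator as a column of the data frame before calling svydesign and then passing that column to svymean.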