Water, Spillovers and Free Riding:
Provision of Local Public Goods in a Spatial Network
Rossa O’Keeffe-O’Donovan
January 31, 2016
Abstract
Investment by one community in a local public good may affect the returns to investment in similar goods for
neighboring communities. This paper estimates the magnitude of spillover and free riding effects from local public
goods investments in a spatial network of interdependent communities, using new data on the population of water
sources in rural Tanzania. Reduced form estimates establish positive spatial correlation in functionality of water
sources of the same type but find no spatial correlation between water sources of a different type. Pumps are 21
percentage points more likely to be functional if the nearest working water source is a pump of the same technology.
These results suggest that the spatial interdependence is driven by positive spillovers and motivate the development
of a structural model, which allows for spillover and free riding effects in a strategic framework. They also provide
a novel solution to two identification challenges inherent in estimating network games: endogeneity and reflection.
I estimate the model by treating geographic communities as a series of segregated networks, for which I compute
the likelihood using an equilibrium selection rule. My research provides evidence that both positive spillovers and
free riding play important roles in the efficient provision of local public goods. I estimate that there would be a
5% increase in public goods provision if the technology of water sources were standardized to fully exploit positive
spillovers between them.
Keywords: local public goods, spatial network, water, spillovers, free rider problem, Tanzania.
JEL Classification: H41, L14, O13
1 Introduction
The majority of research related to public goods focuses on the provision of a single good. However, in many cases
public goods are provided at a local level, leading to the possibility that provision of a public good in one locality
affects decisions to provide similar public goods in nearby localities. In particular, investment in public services, public
spaces, transport infrastructure and law and order may have spillovers, incur free riding, or both.
This paper sets out a framework to estimate the extent of spillovers and free riding in the provision of local public goods, and applies this framework to analyze the provision of water in rural Tanzania. Water provision has two
main stages: installation and maintenance. While the installation of water sources is undertaken by actors operating
at the regional level, maintenance investment decisions are made locally, at the water point or village level. My research focuses on the maintenance investment decision of a local community, in a spatial network of interdependent
communities with similar public goods.
The goals of this paper are twofold. First, I seek to contribute towards the recent and growing literature that
analyzes the provision of public goods in a network setting. Second, I aim to improve understanding of why water
projects in developing countries have persistently high failure rates, a question that is poorly understood by practitioners. It is estimated that roughly one third of rural water sources in sub-Saharan Africa are non-functional at any
given time, and increased access to water brings well-documented improvements in health, poverty and gender equality.
My empirical application estimates the extent of positive spillovers and free riding in water provision in Tanzania. I use
newly available data on the population of rural water sources in Tanzania, which gives information about water source
characteristics, location and functional status. Reduced form analysis establishes two empirical facts. First, there is
positive spatial correlation in the functionality of similar water sources (of the same type and technology). Second,
this positive spatial correlation is not present between water sources of different types. Although this correlation may
be explained by spatially correlated shocks or unobservable variables, these would not explain why only water sources
of the same type have correlated functionality, and why the correlation is strongest when water sources are also of the
same technology. As such, these two empirical facts provide strong evidence for positive spillovers in the maintenance
of water sources in Tanzania.
Positive spillovers may occur through a number of mechanisms. In many rural areas of Tanzania, markets for inputs, such as skilled labor and spare parts, do not exist or are very thin; investment in maintenance by one community
might help create or expand these markets for neighboring communities. Explicit cost-sharing between communities
is also possible, and has been documented, as has information sharing. Although I am agnostic about which of these
mechanisms is predominant, the estimated magnitude of spillover effects is large and significant. My reduced form
analysis estimates that water pumps are 21 percentage points more likely to be functional if the nearest working water
source is a pump of the same technology.
The reduced form results motivate the development of a structural model, which can test the mechanisms through
which spillover and free riding effects act and provide counterfactual analysis to solve optimal policy problems, such as
where to install pumps to maximize access to clean water. My model sets out a network game with strategic interactions
between communities making maintenance decisions. When a community maintains its water source, its neighbors
benefit from positive spillovers but can also free ride on the investment. I define a series of segregated spatial networks
and estimate my model by maximum likelihood. My model estimates that the aggregate functionality rate would
increase by 5.3% if pump technologies were standardized to exploit positive spillovers. I also find evidence for free riding, specifically that communities are significantly more likely to invest in maintenance of their own public good if the
nearest viable alternative is more costly to access, because of a large distance to travel, a need to pay user fees, or both.
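The idea of segregated spatial networks can be illustrated with a minimal Python sketch. Purely for illustration, it assumes that two water sources belong to the same network whenever a chain of sources connects them and each link in the chain is shorter than a threshold distance. The function name `segregated_networks`, the threshold `max_link_km` and the use of planar kilometre coordinates are all hypothetical. The paper's own definition of a network may differ.

```python
import math


def segregated_networks(coords, max_link_km):
    """Group water sources into connected components (hypothetical network definition).

    coords: list of (x_km, y_km) planar coordinates, an illustrative assumption.
    Two sources are linked if they are closer than max_link_km; a network is
    any maximal set of sources connected by a chain of such links.
    Returns a list of networks, each a list of source indices.
    """
    n = len(coords)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # O(n^2) pairwise check: fine for a sketch, too slow for a national census of sources
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < max_link_km:
                union(i, j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two pairs of nearby sources and one isolated source
print(segregated_networks([(0, 0), (1, 0), (10, 0), (10.5, 0), (30, 0)], 2.0))
```

Each resulting group can then be handled separately. This keeps any equilibrium computation local to a small set of interacting communities.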
This paper draws on a number of distinct economic literatures. The most relevant is the recent and growing body of
research examining the provision of goods and services in a network setting. In particular, my modeling approach is
similar to Acemoglu, García-Jimeno, and Robinson [2015], who analyze the effect of municipality-level state capacity1
on nearby communities’ outcomes, and model and estimate these externalities in a network context. They find that
most of the effect of neighbors’ state capacity comes through the equilibrium responses of other communities, that is, through network effects.
It is well known that the identification of social interactions is challenging because of endogeneity, due to unobservables that are correlated within the network, and reflection, in which each node’s action is a function of the actions
of its neighbors, making it difficult to disentangle cause and effect (Manski [1993], Brock and Durlauf [2001], Brock and
Durlauf [2007]). Identification in my model exploits some novel aspects of my empirical context to address endogeneity,
and uses partially overlapping peer groups to address the reflection problem (Bramoullé, Djebbari, and Fortin [2009],
De Giorgi, Pellizzari, and Redaelli [2010]). Other recent papers have laid theoretical foundations for the analysis of
public goods provision in a network setting, including Elliott and Golub [2015], Bramoullé, Kranton, and D’Amours
[2014] and Allouch [2015], though I cannot use their results directly in my empirical setting as my action space is
discrete rather than continuous. My estimation procedure is most similar to Todd and Wolpin [forthcoming], who
calculate the set of possible equilibria within a network for a given set of parameters, and use this to estimate their
model using a simulation-based likelihood approach.
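The step of computing the set of possible equilibria for given parameters can be sketched for a binary maintenance game.

The payoff below is a hypothetical linear form, chosen only to show the mechanics; it is not this paper's model. Community i's net gain from maintaining has three parts:

- a stand-alone term `theta[i]`;
- plus `spill` for each maintaining neighbour of the same technology;
- minus `free_ride` if any neighbour maintains.

The function names and parameters are illustrative. Every action profile is checked for mutual best responses, so the cost grows as 2^n. The approach is only practical for small networks.

```python
from itertools import product


def net_gain(i, profile, theta, neighbors, same_tech, spill, free_ride):
    """Hypothetical net payoff to community i from maintaining, given others' actions.

    theta[i]: stand-alone benefit minus cost of maintaining.
    spill: gain per maintaining same-technology neighbour (positive spillover).
    free_ride: incentive lost when some neighbour maintains (a working alternative exists).
    """
    maintaining = [j for j in neighbors[i] if profile[j] == 1]
    spillover = spill * sum(1 for j in maintaining if same_tech[i][j])
    free_riding = free_ride if maintaining else 0.0
    return theta[i] + spillover - free_riding


def pure_nash_equilibria(theta, neighbors, same_tech, spill, free_ride):
    """Enumerate all pure-strategy equilibria of the binary maintenance game."""
    n = len(theta)
    equilibria = []
    for profile in product((0, 1), repeat=n):
        # Equilibrium: every community plays its best response to the others.
        # Ties (net gain exactly 0) are resolved in favour of maintaining.
        if all(
            (net_gain(i, profile, theta, neighbors, same_tech, spill, free_ride) >= 0)
            == (profile[i] == 1)
            for i in range(n)
        ):
            equilibria.append(profile)
    return equilibria


neighbors = [[1], [0]]                        # two neighbouring pumps
same_tech = [[False, True], [True, False]]    # of the same technology
print(pure_nash_equilibria([-0.5, -0.5], neighbors, same_tech, spill=1.0, free_ride=0.0))
print(pure_nash_equilibria([0.5, 0.5], neighbors, same_tech, spill=0.0, free_ride=1.0))
```

The two cases behave differently:

- **Spillovers only:** the two-pump example has two equilibria, (0, 0) and (1, 1). This multiplicity illustrates why an equilibrium selection rule is needed before a likelihood can be written down.
- **Free riding only:** the equilibria are (0, 1) and (1, 0), so exactly one community maintains.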
Empirically, the extent of information sharing and technology transfer has been tested in agricultural practices, and
there is significant evidence for both in a rural developing country context (Conley and Udry [2010], Foster and
Rosenzweig [1995], Bandiera and Rasul [2006] and Munshi [2004]). However, Emerick [2013] finds that the transfer of
technology outside of one’s close (first degree) network is inefficient relative to direct (door-to-door) sales. My research
is also related to the extensive public goods literature, particularly recent papers focusing on multiple public goods
(Boadway, Song, and Tremblay [2007], Mutuswami and Winter [2004] and Schultz and Sjostrom [2001]). Cremer and
Laffont [2003] introduce heterogeneous costs of access to a public good and my modeling takes a similar approach:
each community is responsible for the maintenance of its public good, but can access a (non-excludable) alternative in
the event of breakdown, although at an increased cost. A further strand of the public goods literature finds evidence
that more heterogeneous communities are less able to overcome the collective action problem and provide public goods
(Alesina and La Ferrara [2000], Banerjee, Mookherjee, Munshi, and Ray [2001], Alesina, Baqir, and Easterly [1999],
Habyarimana, Humphreys, Posner, and Weinstein [2007], Miguel [2005]); my results provide further evidence for this.
The remainder of this paper is structured as follows. Section 2 gives an overview of the empirical context that I
will study, namely the provision of water in developing countries, and section 3 describes the data. Section 4 presents
results of my reduced form analysis, and how these motivate the use, and inform the design, of a structural model.
Section 5 describes the model, and section 6 gives details of my estimation approach; in the Appendix, I discuss an alternative estimation
approach that I will explore in future work. Finally, section 7 presents my results and counterfactuals,
and section 8 concludes.
1 The authors describe ‘state capacity’ as ‘the existence of central and local states with the capacity to enforce law and order, regulate economic activity and provide public goods’.
2 Empirical context
An estimated one billion people rely upon hand-powered water pumps as their main source of water (Carter, Harvey,
and Casey [2010]), though at any one point in time it is estimated that roughly one third of these pumps are broken down, with the functionality rate varying across different developing countries (UNICEF [2014], Lockwood and
Smits [2011]). Despite the high breakdown rates, handpumps remain the preferred and predominant mode of supplying
water in rural sub-Saharan Africa, with more than 60,000 new pumps installed every year (Sansom and Koestler [2009]).
The poor operational performance of rural water supplies is well recognized (Harvey and Reed [2004], Foster [2013]),
and has contributed to the projection that 45 countries will miss Millennium Development Goal 7.C, to halve the
proportion of people without access to an improved water source (WHO, UNICEF [2014]).2 Recent economic research
has estimated that access to improved water sources reduces rural poverty in India by 10-12 percent (Sekhri [2014])
and child diarrhea in Kenya by 25% (Kremer, Leino, Miguel, and Zwane [2011]). Improvements in gender equality in
areas where girls are traditionally responsible for water collection are also well documented (World Bank [2010]).
Although non-functionality of rural water pumps is a longstanding and widespread problem, the causes of this poor
performance are not well understood. It is recognized to be a complex issue with a number of factors explaining breakdown, including social, institutional and economic factors, as well as hydrological and engineering factors (Prokopy
[2005], Schweitzer and Mihelcic [2011], WaterAid [2011]). In Tanzania, 28% of breakdowns are due to problems that
cost less than $10 to fix, and 15% of pumps have broken down primarily because they are no longer used, suggesting
that economic incentives to maintain a pump are an important determinant of its sustainability.3 As shown in
section 4, socioeconomic factors, such as the fractionalization of a community and whether users are charged to access
its pump, are significant predictors of pump functionality.
The provision of water services in rural areas of developing countries can be broken down into two main stages:
installation, and ongoing maintenance and repairs. Although the vast majority of handpumps in rural communities
are installed by third parties (government agencies and non-governmental organizations) who pay for these installation
costs, ongoing costs are typically borne by the community using the water source. It is estimated that there are about
60,000 hand pumps installed each year in sub-Saharan Africa, at a capital expenditure of between $20 and $61 per
person. Recurrent costs are estimated at between $3 and $6 per person per year (IRC [2012]). In typical cases, these
ongoing costs are affordable, even for some of the world’s poorest communities.
Community-based management has been the dominant form of managing and maintaining handpumps after they
have been installed (Harvey and Reed [2004], Lockwood and Smits [2011]). Communities typically appoint a ‘water
point committee’ (WPC) responsible for managing the local public good: setting user fees; maintaining the pump,
which consists of keeping parts oiled and replacing pump components (seals, washers, ropes) as they wear out; and
carrying out repairs and pump deepening if the pump breaks down or dries up. This paper uses newly available data
on the location and functionality of water sources in Tanzania to analyze this decentralized decision of whether to
maintain a pump after it has been installed.
2 Although the world as a whole has met this target, progress has been very uneven geographically, with most countries missing the
target in sub-Saharan Africa and South Asia. An improved water source includes the hand-powered water pumps that are the subject of
this research, as well as piped water provision, and certain rain-fed schemes.
3 See Figure 5 and Table 11 in the Appendix for more details on reasons given for breakdown in my data.
When analyzing the incentives to pay for the ongoing maintenance of a rural water source, it is important to note that
maintenance decisions may be influenced by the existence of other similar public goods nearby. In the vast majority
of developing countries, including Tanzania, there is a belief that access to water is a human right, and a norm that
people from outside of a community may access water on the same terms as community members. Therefore, incentives
to maintain a pump may be diminished if there is a freely available, functional alternative water source nearby, with
communities free-riding on each other’s investments, and under-investing in maintenance relative to a social optimum
(Cameron [2011]).
However, positive spillovers in the maintenance of nearby water sources might counteract this free-riding problem.
Angelucci and Di Maro [2015] define four types of spillovers: externalities; social interactions, for example resource
sharing, information sharing and changing incentives; context equilibrium effects, such as changes to behavioral or
social norms; and general equilibrium effects through changes in market prices because of changes in supply or demand. In my empirical context, positive spillovers may arise through a number of these channels. It may be possible
for communities to share some costs of maintenance, for example the costs of obtaining spare parts, tools and skilled
labor. There is also evidence that spillovers occur through the development of markets for spare parts and skilled
labor for repairs, as well as the sharing of information between communities (Pond and Pedley [2011]). Many markets
in these very poor rural settings are non-existent or very thin, so the actions of an individual community can have an
impact on the market. In this sense, communities are not price takers and act strategically when considering positive
spillovers from their investments and those of their neighbors. My research is agnostic about the exact mechanism
through which positive spillovers occur, as this is not observed, but I test for their presence and find strong evidence
for their existence.
3 Data
There has been a large increase in the collection of geo-coded data on rural water sources in developing countries
in the last 10 years, with more than 60 datasets collected in various regions of developing countries, each including
different information about water source and community characteristics. The primary data sources used in this paper are two water point mapping exercises carried out in Tanzania. The first was conducted by WaterAid between
2005 and 2008, in partnership with Concern, Engineers Without Borders (EWB) and SNV4 and was published in
2010. These data contain information on every rural water source in 42 out of 132 districts in Tanzania. The second dataset was collected by the Tanzania Ministry of Water between 2011 and 2013, and was published in 2013.
It includes information on the population of rural water sources in the country. The observations are plotted on a
map of Tanzania in Figure 1, with the right hand image showing the observations when we zoom in on a single district.
The data were collected to provide a census of all rural water services in the country and to determine which sources
were functional or not at the time of data collection. As shown in Figure 1, GPS coordinates give the location of
a water source, with administrative areas (village, ward, district and region5) also listed for each observation. The data
also include other information about the water sources that is important for my research. Table 1 shows that the
most common water source in Tanzania is a communal standpipe (tap), though many communities rely solely on hand
pumps, which make up one third of the observations in 2010, and 23% in 2013. The technology of a water source is
also given in the data, and among hand pumps there are four technologies that are most prevalent, all of which use a
machine-drilled borehole6, though they have different operation and maintenance requirements. The data also give detail on
who is responsible for managing and maintaining the water source and, as noted in section 2, more than 90% of hand
pumps are managed by a community committee (Community Based Management). There is also information about
when the water source was installed, whether users pay to use it, and (in some cases) date of last breakdown and repair.
4 SNV is an international not-for-profit development organisation based in the Netherlands.
5 The administrative divisions in Tanzania are broken down as follows: 26 regions, containing a total of 132 districts, with further sub-divisions into wards, villages and sub-villages.
6 Rope pumps can be installed on a hand dug well.
Table 1: Summary Statistics for two main sources of data: Tanzanian water source data from 2010 and 2013

| | 2010 Obs | 2010 Percent | 2010 Func. Rate | 2013 Obs | 2013 Percent | 2013 Func. Rate |
|---|---|---|---|---|---|---|
| All water sources | 24,427 | 100% | 56.5% | 57,435 | 100% | 55.2% |
| Water source type | | | | | | |
| Communal Standpipe | 12,727 | 52.1% | 59.3% | 38,565 | 67.1% | 59.9% |
| Hand Pump | 8,208 | 33.6% | 55.7% | 13,022 | 22.7% | 58.6% |
| Improved Spring | 745 | 3.1% | 71.0% | 425 | 0.7% | 82.4% |
| Other | 2,747 | 11.3% | 41.3% | 5,423 | 9.4% | 11.7% |
| Hand pump technology | | | | | | |
| Afridev | 711 | 8.7% | 63.6% | 1,449 | 12.5% | 66.4% |
| India mark II | 538 | 6.6% | 61.9% | 2,513 | 21.7% | 57.5% |
| Nira/Tanira | 3,580 | 43.6% | 64.1% | 4,652 | 40.2% | 61.6% |
| SWN 80 | 1,822 | 22.2% | 55.2% | 3,111 | 26.9% | 59.5% |
| Rope pump | 73 | 0.9% | 90.4% | 280 | 2.4% | 74.3% |
| other | 1,484 | 18.1% | 28.4% | 1,017 | 8.8% | 29.1% |
| Management (all water sources) | | | | | | |
| Community Based Management | 19,895 | 81.4% | 55.6% | 50,878 | 88.6% | 54.8% |
| Parastatal | 808 | 3.3% | 46.7% | 1,262 | 2.2% | 66.4% |
| Private | 1,474 | 6.0% | 77.3% | 4,154 | 7.2% | 60.2% |
| other | 2,250 | 9.2% | 55.1% | 1,141 | 2.0% | 45.1% |
| Management (hand pumps) | | | | | | |
| Community Based Management | 7,421 | 90.4% | 56.9% | 12,080 | 92.8% | 58.9% |
| Parastatal | 332 | 4.0% | 44.9% | 202 | 1.6% | 71.3% |
| Private | 88 | 1.1% | 62.5% | 442 | 3.4% | 42.1% |
| other | 367 | 4.5% | 39.0% | 298 | 2.3% | 62.8% |
| | 2010 Obs | 2010 Mean | 2010 Std dev | 2013 Obs | 2013 Mean | 2013 Std dev |
| Other variables (all water sources) | | | | | | |
| Age of water source at record (years) | 22,958 | 14.8 | 12.0 | 45,167 | 15.3 | 12.5 |
| Pay for use dummy | 15,896 | 0.30 | 0.46 | 50,801 | 0.48 | 0.50 |
| Other variables (hand pumps) | | | | | | |
| Age of pump at record (years) | 7,824 | 11.0 | 8.7 | 9,479 | 12.8 | 9.8 |
| Pay for use dummy | 5,647 | 0.17 | 0.38 | 12,123 | 0.32 | 0.47 |
Figure 1: Location of functional and non-functional pumps in Tanzanian data
The left-hand image shows the entire country; the right-hand image zooms in on a specific district, Dodoma.
The only paper to conduct a thorough, robust statistical analysis of pump functionality and its correlates using
similar data is Foster [2013]. Its findings are consistent with those in my reduced form analysis, that system age,
distance to the country’s capital and an absence of user fee collection are the main predictors of pump breakdown.
However, the paper presents a purely statistical analysis of the data, showing correlations but not considering interactions between different communities in maintaining their water sources.
I also use secondary data sources in my analysis, which I merge with the primary water data. I use population
and demographic data from the 2002 Tanzanian National Census (Tanzania National Bureau of Statistics [2002]),
and household and community data from the 2007-08 Living Standards Measurement Survey - Integrated Survey on
Agriculture (World Bank [2007-08]). These additional datasets do not have information on pump functionality but
provide useful information about the communities that use the water sources.7
4 Reduced form analysis
My initial reduced form research establishes two empirical facts about the functionality of water sources, and uses these
to motivate the development of a structural network model to analyze how neighbors’ maintenance decisions affect
each other. First, I demonstrate positive spatial correlation in the functionality of water sources of the same type, and
that the correlation is strongest when the water sources are also of the same technology.8 Second, I show that when
water sources are of a different type or technology, there is either negative or no statistically significant correlation in
functionality rates. These two facts suggest that positive spillovers between similar water sources, rather than spatially correlated shocks, drive the spatial correlations, as such shocks would affect water sources of all types. I discuss how
these facts motivate the use of a structural model in section 4.3, and discuss identification in more detail in section 4.3.2.
7 The LSMS Household module is only representative at the regional level.
8 See Table 1 for the most prevalent water source types and technologies.
The reduced form analysis takes functionality of water pumps as the dependent variable,9 and uses two main groups
of specifications to learn about the spatial relationships between different water sources in determining functionality.
Section 4.1 uses the 2010 dataset to test whether the distance to the nearest alternative water source predicts whether
a water pump is functional or not. Because the distance may be endogenous, I also construct an instrument, which
gives results with the same qualitative conclusions. I find a negative relationship between functionality and distance
to an alternative water source, though this relationship only exists between sources of the same type (hand pumps).
Section 4.2, using the 2013 data, keeps functionality as the dependent variable but uses a measure of a water source’s centrality in its network as the independent variable. I find that a pump is more likely to be
functional if there are more pumps of the same technology within a certain distance; however, there is an insignificant
or negative relationship with the number of non-pump water sources, or pumps of a different technology. This again
suggests that positive spillovers between similar pumps are driving the spatial correlation in functionality, with the
negative relationship with the number of dissimilar water sources suggestive of free riding. I discuss these results, and
their implications for my research, in more detail below.
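The count-based measure used in Section 4.2 can be sketched in Python. For a given pump, it counts other sources within a radius, split three ways:

- pumps of the same technology;
- pumps of a different technology;
- non-pump sources.

The radius is not stated here, so `radius_km` is a free parameter. The function name, the dictionary keys (`xy`, `is_pump`, `tech`) and the planar coordinates are illustrative assumptions. The function counts whatever it is given, so restricting to working sources would mean filtering `sources` first.

```python
import math


def neighbour_counts(i, sources, radius_km):
    """Count other sources within radius_km of pump i, split by similarity.

    sources: list of dicts with keys 'xy' (planar km coordinates), 'is_pump', 'tech'.
    Returns (same-technology pumps, other-technology pumps, non-pump sources).
    """
    same_tech = other_tech = non_pump = 0
    src = sources[i]
    for j, other in enumerate(sources):
        if j == i or math.dist(src["xy"], other["xy"]) > radius_km:
            continue
        if not other["is_pump"]:
            non_pump += 1
        elif other["tech"] == src["tech"]:
            same_tech += 1
        else:
            other_tech += 1
    return same_tech, other_tech, non_pump


example_sources = [
    {"xy": (0, 0), "is_pump": True, "tech": "Afridev"},        # the pump of interest
    {"xy": (1, 0), "is_pump": True, "tech": "Afridev"},        # same technology, 1 km
    {"xy": (0, 2), "is_pump": True, "tech": "India mark II"},  # other technology, 2 km
    {"xy": (2, 2), "is_pump": False, "tech": None},            # standpipe, ~2.8 km
    {"xy": (9, 9), "is_pump": True, "tech": "Afridev"},        # too far away
]
print(neighbour_counts(0, example_sources, 3.0))
```

Each count then enters the functionality regression as its own regressor. The reduced-form comparison rests on contrasting their coefficients.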
4.1 Distance to the nearest alternative water source
I use a simple Probit specification to test whether the distance to an alternative working water source is a significant
predictor of whether a water pump is functional or not. A negative relationship between distance to an alternative
source and pump functionality could be explained by positive spillovers, i.e. communities that are isolated are less
able to adequately maintain their pump in some way. A positive relationship could be explained by free riding, i.e. communities that are further from alternative water sources have fewer free riding opportunities, and so a greater
incentive to maintain their water pump. The baseline Probit specification is given by:
$$\Pr(\text{functional}_i \mid d_i, X_i) = \Phi\left(\beta_0 + \beta_1 d_i + \beta_2 d_i^2 + \beta' X_i\right)$$
where $\Phi$ is the standard normal CDF, $d_i$ is the distance from community $i$ to the nearest alternative working water source, and $X_i$ is a vector of
water source and community characteristics. Figure 6 in the Appendix gives a histogram of the distance to the nearest
alternative working water source, and the proportion of pumps that are functional in each distance ‘bin’. This shows
a non-parametric estimate of a negative relationship between distance to the nearest working water source and pump
functionality over empirically relevant distances (less than 10km).
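Table 2 reports marginal effects evaluated at covariate means rather than raw index coefficients. A short sketch shows how the probability and the distance marginal effect follow from the Probit specification above. With the quadratic term, the marginal effect of distance is $\phi(\text{index})(\beta_1 + 2\beta_2 d_i)$. The coefficient values below are made up for illustration and are not estimates from this paper.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def linear_index(d, x, b0, b1, b2, beta):
    """Probit index b0 + b1*d + b2*d^2 + beta'x."""
    return b0 + b1 * d + b2 * d ** 2 + sum(bk * xk for bk, xk in zip(beta, x))


def prob_functional(d, x, b0, b1, b2, beta):
    """Pr(functional | d, x) = Phi(index)."""
    return norm_cdf(linear_index(d, x, b0, b1, b2, beta))


def marginal_effect_distance(d, x, b0, b1, b2, beta):
    """dPr/dd = phi(index) * (b1 + 2*b2*d); negative below d = -b1 / (2*b2) if b1 < 0 < b2."""
    return norm_pdf(linear_index(d, x, b0, b1, b2, beta)) * (b1 + 2.0 * b2 * d)


# Illustrative (made-up) index coefficients, not estimates from this paper
example = dict(b0=0.5, b1=-0.2, b2=0.008, beta=[0.3])
p = prob_functional(3.0, [1.0], **example)
me = marginal_effect_distance(3.0, [1.0], **example)  # negative: 3 km is below the turning point
print(round(p, 3), round(me, 4))
```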
Though this specification can show correlation, we may reasonably be worried about endogeneity of our key variable of interest, the distance to the nearest available alternative water source. Firstly, omitted variables may be
correlated with both distance to alternative sources and the probability that a pump breaks down. I include a number
of control variables to try to minimize potential sources of bias: pump type, age, and whether a community pays for
its use; population figures from the 2002 national census, including nationality and ethnicity fractionalization; district
and ward fixed effects; regional indicators from the LSMS;10 and distance to the 11 largest Tanzanian cities (proxies
for the isolation of a community).
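The fractionalization controls are commonly measured as one minus the Herfindahl-Hirschman index of group shares. This equals the probability that two individuals drawn at random, with replacement, belong to different groups. This paragraph does not spell out the formula, and Table 2 labels the variable "(HHI)". The sketch below therefore assumes the standard 1 - HHI definition, which is consistent with the text's reading that more diverse communities score higher. The function name is hypothetical.

```python
from collections import Counter


def fractionalization(group_labels):
    """Assumed definition: 1 - sum_k s_k^2, where s_k is group k's population share.

    group_labels: one label (e.g. nationality or ethnicity) per individual.
    """
    counts = Counter(group_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one individual")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


print(fractionalization(["A", "A", "B", "B"]))  # two equal groups
print(fractionalization(["A", "A", "A"]))       # perfectly homogeneous
```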
9 I restrict my sample to looking at the functionality status of water pumps only (rather than all water sources) for two main reasons.
Firstly, more than 90% of water pumps are managed by a community water point committee, the decisions of which I am seeking to
understand in this research. Secondly, water pumps are individual installations that are not physically connected to other installations. In
some cases communal standpipes (taps) are part of a wider system, so the point of collection does not represent the entire infrastructure
over which decisions are made.
10 Although the LSMS is only representative at the regional level, it asks some questions that may be correlated with water source
functionality. As well as the standard questions about income, employment and agriculture, it also asks about water availability and
any water shortages in previous years.
Including these control variables does not rule out endogeneity, of course. For example, we may think that more
pumps are installed in communities that are better able to maintain them, or that water practitioners disproportionately target low income areas, reducing the distance between water sources in these areas. To reduce these concerns
and obtain a ‘causal’ estimate of the effect of distance on pump functionality, I construct an instrument for distance
to the nearest alternative water source. I assume that breakdown in the first year of operation of a water source, B,
is a result of a (random) faulty installation by water practitioners, not poor maintenance, and that this results in an
exogenous increase in the distance to the next available source for a nearby pump, A. The instrument is the resulting
increase in distance to an alternative source for community A, as given by α in Figure 2.
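The construction of α can be sketched as follows. It assumes planar kilometre coordinates, and it treats a source as an exogenous first-year failure only if it is broken at the time of the survey after failing in its first year. Both of these, along with the class and function names, are illustrative assumptions. The computation has three steps:

1. Find A's nearest candidate alternative, counting first-year failures as candidates.
2. If that candidate B failed in its first year, measure how much further A must travel to the next working source.
3. Apply the restriction, described in the next paragraph, that B's users must have some working source C nearer to them than A.

```python
import math
from dataclasses import dataclass


@dataclass
class WaterSource:
    x_km: float                      # planar coordinates in km (illustrative assumption)
    y_km: float
    functional: bool                 # working at the time of the survey
    failed_first_year: bool = False  # broke down in its first year of operation


def _dist(a, b):
    return math.hypot(a.x_km - b.x_km, a.y_km - b.y_km)


def _early_failure(s):
    # Treated as exogenously unavailable: broken at record after failing in year one
    return s.failed_first_year and not s.functional


def distance_instrument(a_idx, sources):
    """Increase in A's distance to an alternative caused by first-year failure of its nearest source B.

    Returns 0.0 if A's nearest candidate alternative is working, None if the
    instrument is undefined or excluded by the restriction, else alpha >= 0.
    """
    a = sources[a_idx]
    candidates = [i for i, s in enumerate(sources)
                  if i != a_idx and (s.functional or _early_failure(s))]
    if not candidates:
        return None
    b_idx = min(candidates, key=lambda i: _dist(a, sources[i]))
    b = sources[b_idx]
    if not _early_failure(b):
        return 0.0  # nearest alternative is working: no exogenous shift
    working = [i for i, s in enumerate(sources)
               if i not in (a_idx, b_idx) and s.functional]
    if not working:
        return None
    # Restriction: B's users need a working source C nearer to them than A,
    # so that B's failure need not push extra users onto A.
    if min(_dist(b, sources[i]) for i in working) >= _dist(b, a):
        return None
    next_nearest = min(_dist(a, sources[i]) for i in working)
    return next_nearest - _dist(a, b)


example = [
    WaterSource(0.0, 0.0, functional=True),                           # A
    WaterSource(1.0, 0.0, functional=False, failed_first_year=True),  # B
    WaterSource(1.5, 0.0, functional=True),                           # C
    WaterSource(0.0, 3.0, functional=True),                           # D
]
alpha = distance_instrument(0, example)  # 1.5 km - 1.0 km
print(alpha)
```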
Figure 2: The instrument I construct is the increase in distance to an alternative pump for community A resulting
from breakdown of its nearest alternative water source in its first year of operation. I treat breakdown in the first year
as ‘random’ (uncorrelated with community characteristics) and thus the increase in distance is exogenous.
The assumption that breakdown in the first year after installation is random is reasonable as the expected lifespan
of the most common pump types (Afridev and Nira pumps) is 15 years, and the shortest expected lifespan of any
component, the U-seal, is one year. To allay concerns that breakdown of the pump nearest to A leads to an increased
number of users of A, I restrict my instrument to cases where users of the broken down water source (B) have an
alternative source nearer to them than A, shown in Figure 2 as community C. The estimates under this restriction
remain similar. In each specification, I firmly reject the null hypotheses of under-identification (LM version of Anderson [1951]) and weak identification (Cragg and Donald [1993]). There are 304 positive realizations of the instrument, with a mean increase
in distance of 1.1km resulting from exogenous breakdown. The instrumental variables estimates are performed using
Stata’s ivprobit command, a control function approach.
The results of my baseline Probit specifications are given in Table 2. Both the linear and squared distance terms
are highly statistically significant in all three specifications, and they estimate a negative relationship between distance to an alternative water source and pump functionality over relevant distances.11 This negative relationship gives
evidence for spatial interdependence in functionality of water sources; specifically, it indicates that pumps that are
nearby to other working water sources are more likely to be functional.
There are a number of other notable results coming from the baseline specification. I include the same covariates
in the later specifications, and their results are all qualitatively the same, so I do not include their estimates in later
tables, and will only discuss them here. The first, and most important result for this research, is that water pumps for
which the nearest alternative water source is also a hand pump are more likely to be functional. Furthermore, there
is a large and significant additional positive effect if the nearest water source is a pump of the same technology. The
combined effect of having the nearest working alternative water source be a hand pump of the same technology is an
estimated 21 percentage point increase in the probability of being functional. Combined with the negative relationship
between pump functionality and distance to an alternative working water source, these results give evidence for my
first empirical fact, that there is positive spatial correlation between water sources of the same type, and that this is
strongest between water sources of the same technology.
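Two figures quoted in the text can be reproduced from the marginal effects in Table 2.

- **Minimum point.** Marginal effects at means are the index coefficients scaled by the same factor $\phi(\bar{x}'\beta)$, so the turning point of the quadratic, $d^* = -\beta_1 / (2\beta_2)$, can be computed directly from the reported marginal effects.
- **Combined effect.** The 21 percentage point effect is roughly the sum of the two nearest-source dummies. Treating probit dummy effects as additive is only an approximation.

The confidence intervals for $d^*$ cannot be recovered this way without the estimated covariance matrix.

```python
def quadratic_minimum(b_linear, b_squared):
    """Turning point of b_linear*d + b_squared*d**2: d* = -b_linear / (2*b_squared)."""
    return -b_linear / (2.0 * b_squared)


# Table 2 marginal effects, columns (1)-(3): (distance, squared distance)
table2 = {1: (-0.0803, 0.00319), 2: (-0.0634, 0.00237), 3: (-0.0636, 0.00245)}
minimum_km = {col: round(quadratic_minimum(*coefs), 1) for col, coefs in table2.items()}
print(minimum_km)  # {1: 12.6, 2: 13.4, 3: 13.0}, matching the "Estimated minimum point" row

# Column (2): nearest working source is a pump (0.113) plus same-technology pump (0.102)
combined_effect = 0.113 + 0.102  # about 0.215, i.e. roughly 21 percentage points
```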
11 The estimated relationship is downward sloping up to the fitted minimum point implied by the quadratic form, which is reported in Table 2. This minimum point is estimated at roughly 13km, but 99% of observations have $d_i < 13$km, as shown by Figure 6 in the Appendix.
Table 2: Probit results, distance to nearest working water source
Probit Results, Marginal Effects (reported at means)
VARIABLES                                                 (1)          (2)          (3)
Distance to nearest working water source, km              -0.0803***   -0.0634***   -0.0636***
                                                          (0.0103)     (0.0100)     (0.0101)
Squared distance to nearest working water source, km      0.00319***   0.00237***   0.00245***
                                                          (0.000637)   (0.000638)   (0.000659)
Nearest working source is a pump? Dummy                                0.113***     0.123***
                                                                       (0.0287)     (0.0281)
Nearest handpump of the same type? Dummy                               0.102***     0.0911***
                                                                       (0.0283)     (0.0290)
Age of pump at record                                                  -0.0139***   -0.0124***
                                                                       (0.00482)    (0.00144)
Pay for use dummy (1=pay per bucket/month/year)                        0.228***     0.219***
                                                                       (0.0294)     (0.0297)
Pay for use dummy, nearest working water source                        -0.118***    -0.110***
                                                                       (0.0351)     (0.0351)
Pump type dummy: Hand drilled tubewell                                 -0.000379    0.0358
                                                                       (0.0542)     (0.0489)
Pump type dummy: Machine drilled borehole                              -0.0498      -0.0376
                                                                       (0.0316)     (0.0318)
Distance to Dar es Salaam, km                                                       -0.0139***
                                                                                    (0.00458)
Ward nationality fractionalisation (HHI)                               -0.458***    -0.428**
                                                                       (0.177)      (0.213)
Census (2002) variables                                   No           Yes          Yes
LSMS variables                                            No           No           Yes
District Fixed Effects                                    No           Yes          No
Estimated minimum point (km)                              12.6         13.4         13.0
Estimated 95% confidence interval for minimum point (km)  9.7 - 15.5   9.0 - 17.8   8.7 - 17.3
Observations                                              5,256        5,256        5,256
Pseudo R2                                                 0.0258       0.105        0.103
Percent correctly predicted                               62.1%        67.7%        67.3%
Standard errors clustered at the ward level, in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Specification (1) has no controls; (2) includes controls from water data and 2002 census, with district fixed effects; (3) includes region level controls from the LSMS-ISA
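The estimated minimum points follow from the quadratic in distance: with a linear term $\beta_1 < 0$ and a squared term $\beta_2 > 0$, the fitted relationship bottoms out at $d^* = -\beta_1/(2\beta_2)$. Because marginal effects at the mean rescale every coefficient by the same factor $\phi(\bar{x}'\beta)$, this ratio is the same whether it is computed from the marginal effects in Table 2 or from the raw coefficients in Table 4. The minimal Python sketch below reproduces the reported point estimates from the rounded table entries. It does not attempt the confidence intervals, which would also need the covariance between the two estimates (for example via the delta method); that covariance is not reported.

```python
def quadratic_minimum(beta1: float, beta2: float) -> float:
    """Distance d* = -beta1 / (2 * beta2) at which beta1*d + beta2*d**2 is smallest."""
    if beta2 <= 0:
        raise ValueError("need beta2 > 0 for an interior minimum")
    return -beta1 / (2.0 * beta2)


# (distance, squared distance) estimates for columns (1)-(3), copied from the tables
table2_marginal_effects = [(-0.0803, 0.00319), (-0.0634, 0.00237), (-0.0636, 0.00245)]
table4_iv_coefficients = [(-0.449, 0.0350), (-0.392, 0.0305), (-0.434, 0.0363)]

for label, rows in [("Table 2, probit", table2_marginal_effects),
                    ("Table 4, IV", table4_iv_coefficients)]:
    print(label, [round(quadratic_minimum(b1, b2), 1) for b1, b2 in rows])
```

Run as written, this prints [12.6, 13.4, 13.0] for Table 2 and [6.4, 6.4, 6.0] for the instrumental variable rows of Table 4, matching the reported minimum points up to rounding.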
Other results from my estimates are consistent with the existing (limited) literature on water pump functionality.
First, I find that communities that pay for water usage are more likely to have a functional water pump; this may
reflect the fact that these communities have been able to overcome the collective action problem and raise funds
for the maintenance of their water pump. Second, communities with greater diversity of nationalities are less likely
to have a functional water pump, a finding consistent with Miguel [2005] and other papers discussed in section 1,
though religious fractionalization is not a significant predictor of pump functionality. Finally, pumps that are further
from the capital city, Dar es Salaam, are less likely to be functional, though the distance to other cities is not significant.
Evidence for my second empirical fact, that positive spatial correlations in water source functionality are not present
when the water sources are of different types, is given in Table 3. These specifications include the distance to pump
and non-pump water sources as separate explanatory variables, both in separate regressions (panels 2 and 3) and in the
same regression (panel 4). The estimates for distance to the nearest working water pump are very similar to those in
the baseline specification, indicating a negative relationship between distance and pump functionality over all relevant
distances. However, there is no statistically significant relationship between pump functionality and the distance to a
non-pump water source, as shown in the final two panels. These results show that only the distance to a similar water
source predicts pump functionality.
The results in Table 3 rule out one possible source of endogeneity of distance to the nearest alternative working water source. We might be concerned that water practitioners install more pumps in areas with a good water table, so that such pumps have both a small distance to an alternative and a greater probability of remaining functional due to the plentiful water. This would explain the negative relationship between distance to an alternative source and pump functionality. However, if this were the explanation, we would expect to see the same negative relationship with distance to non-pump water sources, which Table 3 shows is not present in the data.
Finally, I present the instrumental variables estimates in Table 4, with the first stage results given in Table 7 in the Appendix. The estimates are qualitatively similar to the baseline probit specification, estimating a negative relationship between distance to the nearest alternative water source and pump functionality. These estimates reinforce the finding of my baseline specification that functionality is positively correlated between nearby water sources of the same type. By taking an instrumental variables approach, this specification provides a plausibly causal estimate of the negative effect of increasing the distance to the nearest alternative source on pump functionality.12
4.2 Network measures - degree centrality
The results from section 4.1 showed positive spatial correlation in the functionality of similar water sources, but no
correlation when the water sources are of a different type. However, this analysis only looks at the distance to the
nearest alternative working water source, and so ignores the effects of other nearby water sources. This section presents
further reduced form evidence for my two empirical facts, by testing the relationship between a pump’s ‘centrality’ in
its network and the probability that it is functional.
I use a simple measure of centrality in a network, degree centrality. I use a binary connection measure, defining
community A as connected to community B if they are within 1.2km of each other.13 Degree centrality is the sum
12 The null hypothesis of exogeneity in the Wald chi-squared test is rejected at the 10% significance level in all three specifications (p-values of 0.021, 0.098 and 0.025). This suggests that the non-random placement of pumps does matter. The instrumental variable specification also estimates that the negative relationship extends over only about half the distance implied by the probit estimates (estimated minimum points of 6.0-6.4km, compared with 12.6-13.4km), as shown in Table 4. One possible explanation is that water practitioners place pumps non-randomly in a way that deliberately exploits positive spillovers between communities, possibly placing pumps strategically to extend the distance over which positive spillovers occur.
13 I chose a distance cutoff of 1.2km as this gave a reasonable mean and variance in the number of links, my explanatory variable. For
robustness, I varied this cutoff to 1km and 1.5km, and the results remain qualitatively similar.
Table 3: Probit results, distance to nearest working pump, non-pump or combination
Probit Results, Marginal Effects (reported at means)
VARIABLES                                                      (1)          (2)          (3)
Baseline - distance to nearest working water source (all)
Distance to nearest working water source, km                   -0.0803***   -0.0634***   -0.0636***
                                                               (0.0103)     (0.0100)     (0.0101)
Squared distance to nearest working water source, km           0.00319***   0.00237***   0.00245***
                                                               (0.000637)   (0.000638)   (0.000659)
Pseudo R2                                                      0.0258       0.105        0.103
Percent correctly predicted                                    62.1%        67.7%        67.3%

Distance to nearest working hand pump only
Distance to nearest working handpump, km                       -0.0757***   -0.0621***   -0.0631***
                                                               (0.00917)    (0.00871)    (0.00880)
Squared distance to nearest working handpump, km               0.00241***   0.00197***   0.00207***
                                                               (0.000494)   (0.000442)   (0.000441)
Pseudo R2                                                      0.0326       0.109        0.106
Percent correctly predicted                                    62.7%        68.2%        67.5%

Distance to nearest working non-pump only
Distance to nearest working non-pump water source, km          -0.00172*    -0.00195     -0.000857
                                                               (0.00102)    (0.00147)    (0.00166)
Squared distance to nearest working non-pump water source, km  2.64e-06     8.12e-06     5.92e-06
                                                               (3.48e-06)   (6.18e-06)   (7.05e-06)
Pseudo R2                                                      0.00257      0.0865       0.0824
Percent correctly predicted                                    59.1%        66.6%        65.7%

Distance to nearest working pump and non-pump
Distance to nearest working handpump, km                       -0.0745***   -0.0643***   -0.0646***
                                                               (0.00936)    (0.00874)    (0.00857)
Squared distance to nearest working handpump, km               0.00240***   0.00211***   0.00221***
                                                               (0.000518)   (0.000452)   (0.000443)
Distance to nearest working non-pump water source, km          -0.000937    5.14e-05     0.000518
                                                               (0.00102)    (0.00138)    (0.00154)
Squared distance to nearest working non-pump water source, km  1.11e-06     -3.41e-07    -2.40e-07
                                                               (3.35e-06)   (5.87e-06)   (6.40e-06)
Pseudo R2                                                      0.0335       0.103        0.103
Percent correctly predicted                                    62.4%        67.8%        67.3%

District Fixed Effects?                                        No           Yes          No
Observations                                                   5,256        5,256        5,256
Standard errors clustered at the ward level, in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Specification (1) has no controls; (2) includes controls from water data and 2002 census, with district fixed effects; (3) includes region level controls from the LSMS-ISA
Table 4: Instrumental Variable results
Instrumental variable results, Estimated Coefficients
Distance to nearest working water source
                                                          (1)            (2)            (3)
Baseline - probit estimates
Distance to nearest working water source, km              -0.207         -0.164         -0.164
                                                          (0.0268)***    (0.0258)***    (0.0261)***
Squared distance to nearest working water source, km      0.00822        0.00613        0.00633
                                                          (0.00164)***   (0.00166)***   (0.00171)***
Estimated minimum point (km)                              12.6           13.4           13.0
Estimated 95% confidence interval for minimum point (km)  9.7 - 15.5     9.0 - 17.8     8.7 - 17.3

Early breakdown as an instrument for distance
Distance to nearest working water source, km              -0.449         -0.392         -0.434
                                                          (0.101)***     (0.122)***     (0.118)***
                                                          [0.144]***     [0.175]**      [0.211]**
Squared distance to nearest working water source, km      0.0350         0.0305         0.0363
                                                          (0.0103)***    (0.0118)***    (0.0117)***
                                                          [0.0230]       [0.0229]       [0.0356]
Estimated minimum point (km)                              6.4            6.4            6.0
Estimated 95% confidence interval for minimum point (km)  4.5 - 8.3      4.2 - 8.5      4.1 - 7.8
                                                          [1.4 - 11.4]   [1.3 - 11.5]   [-0.8 - 12.8]
Wald chi-squared test of exogeneity, test statistic       7.75           4.65           7.40
Wald chi-squared test of exogeneity, p-value              0.0208         0.0979         0.0247
Observations                                              5,256          5,256          5,256
Where two standard errors or confidence intervals are reported, those in round parentheses are estimated without clustering and those in square brackets are clustered at the ward level. All other standard errors are clustered at the ward level.
*** p<0.01, ** p<0.05, * p<0.1
Specification (1) has no controls; (2) includes controls from WA data and 2002 census, with district fixed effects; (3) includes region level controls from the LSMS-ISA
of the strength of all network connections, in this case simply the number of positive network connections. I include degree centrality as a regressor in a probit regression:
$$\Pr(\text{functional}_i \mid \text{centrality}_i, X_i) = \Phi(\beta_0 + \beta_1 \, \text{centrality}_i + \beta' X_i)$$
where $X_i$ is a vector of covariates, similar to those used in section 4.1.
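Tables 2, 3 and 5 report probit marginal effects at the mean of the covariates. Treating $\text{centrality}_i$ as continuous, that marginal effect is $\phi(\bar{x}'\beta)\,\beta_1$, where $\phi$ is the standard normal density and $\bar{x}$ collects the covariate means. The minimal Python sketch below, using only the standard library, evaluates both the probability and this marginal effect; the function names and all coefficient and covariate values are hypothetical placeholders, not estimates from this paper.

```python
import math
from typing import Sequence


def norm_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_index(beta0: float, beta1: float, beta_x: Sequence[float],
                 centrality: float, x: Sequence[float]) -> float:
    """Linear index beta0 + beta1 * centrality_i + beta' X_i."""
    return beta0 + beta1 * centrality + sum(b * v for b, v in zip(beta_x, x))


def probit_prob(beta0, beta1, beta_x, centrality, x) -> float:
    """Pr(functional_i = 1 | centrality_i, X_i) = Phi(index)."""
    return norm_cdf(probit_index(beta0, beta1, beta_x, centrality, x))


def marginal_effect_at_mean(beta0, beta1, beta_x, mean_centrality, mean_x) -> float:
    """d Pr / d centrality = phi(index evaluated at the means) * beta1."""
    return norm_pdf(probit_index(beta0, beta1, beta_x, mean_centrality, mean_x)) * beta1


# Hypothetical coefficients and covariate means, for illustration only.
beta0, beta1, beta_x = 0.2, 0.05, [-0.02, 0.6]
mean_centrality, mean_x = 0.8, [10.0, 0.3]

print(round(probit_prob(beta0, beta1, beta_x, mean_centrality, mean_x), 3))
print(round(marginal_effect_at_mean(beta0, beta1, beta_x, mean_centrality, mean_x), 4))
```

Because $\phi$ never exceeds about 0.399, each marginal effect is a shrunken version of the corresponding probit coefficient, which is why the coefficients in Table 4 are larger in magnitude than the marginal effects in Table 2.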
I test the effects of five variants of degree centrality as predictors of a pump’s functional status:
1. the number of network connections to other water sources
2. the number of network connections to other hand-powered water pumps
3. the number of network connections to other hand-powered water pumps of the same technology
4. the number of network connections to non-pump water sources
5. the number of network connections to pump water sources of a different technology
The first measure nests the second, which nests the third. Summary statistics and histograms for the number of
network connections of each type are given in Table 13 and Figure 8 in the Appendix, showing a mean of 4.95 links to
non-pump water sources, 0.82 links to pumps of the same technology, and 1.31 links to pumps of a different technology.
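The five measures can be computed from water-source locations in one pass over all other sources. The Python sketch below is illustrative only: it assumes projected coordinates in km (the text does not say how distances are computed), treats "within 1.2km" as a strict `< 1.2` cutoff, and uses hypothetical field and function names. The counters mirror the nesting described above: every same-technology link is also a pump link, and every pump link is also a link to a water source. With $n$ sources the full computation is $O(n^2)$, so a spatial index would be preferable at scale.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WaterSource:
    x_km: float          # projected easting in km (hypothetical coordinates)
    y_km: float          # projected northing in km
    is_pump: bool        # hand pump vs. non-pump source (tap, well, ...)
    tech: Optional[str]  # pump technology, None for non-pump sources


def degree_measures(sources: List[WaterSource], i: int,
                    cutoff_km: float = 1.2) -> Dict[str, int]:
    """Count binary links from source i to every other source closer than cutoff_km."""
    me = sources[i]
    counts = {"all": 0, "pumps": 0, "pumps_same_tech": 0,
              "non_pumps": 0, "pumps_diff_tech": 0}
    for j, other in enumerate(sources):
        if j == i:
            continue
        if math.hypot(me.x_km - other.x_km, me.y_km - other.y_km) >= cutoff_km:
            continue  # outside the neighborhood: no link
        counts["all"] += 1                      # measure 1
        if other.is_pump:
            counts["pumps"] += 1                # measure 2
            if other.tech == me.tech:
                counts["pumps_same_tech"] += 1  # measure 3
            else:
                counts["pumps_diff_tech"] += 1  # measure 5
        else:
            counts["non_pumps"] += 1            # measure 4
    return counts


sources = [
    WaterSource(0.0, 0.0, True, "Afridev"),
    WaterSource(0.5, 0.0, True, "Afridev"),
    WaterSource(0.0, 1.0, True, "Nira"),
    WaterSource(0.3, 0.3, False, None),
    WaterSource(2.0, 0.0, True, "Afridev"),
]
print(degree_measures(sources, 0))
```

For the first pump, three sources fall within 1.2km (one same-technology pump, one pump of a different technology and one non-pump source), while the pump 2km away is not linked.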
Given my results in section 4.1, we expect a positive relationship between pump functionality and the number of network connections for each of the first three measures, with this relationship strongest for the number of connections to pumps of the same technology. The previous results also show no statistically significant spatial correlation between the functionality of pump and non-pump water sources, suggesting that the fourth and fifth measures will not be significant predictors of pump functionality.
The results from the probit specifications of pump functionality on degree centrality are given in Table 5. Specifications 1 and 2 show that the number of water sources a community has in its network is not a significant predictor of its pump functionality. Note that the centrality variable here includes links to all water sources, regardless of type or technology. This finding is unsurprising given that the distance to the nearest non-pump water source was not significant in section 4.1, and the majority of network connections are to non-pump water sources. Specifications 3 and 4, however, show that the number of pumps in a community's network has a statistically significant and positive effect on that community's pump functionality. While statistically significant, the effect is small: an increase of one standard deviation (2.2) in the number of pumps in a community's network increases the probability that its pump is functional by 1.3 percentage points. Specifications 5 and 6 show that the number of pumps of the same technology in a community's network has a fairly large and significant effect on pump functionality. A one standard deviation (2.3) increase in the number of pumps of the same technology in a community's network is associated with a 3.7 percentage point increase in pump functionality. These results are stable across a large number of alternative specifications, and provide evidence for my first empirical fact: there is positive spatial correlation between the functionality of water pumps, and this is strongest when the pumps are more similar.
Specifications 7-12 contain multiple centrality measures for water sources of different types. Specifications 7-9 indicate that non-pump water sources in a pump's network have no significant effect, in keeping with our previous results and empirical facts. While these specifications suggest a positive effect of having pumps in a community's network, specifications 10-12 show that this is driven entirely by pumps of the same technology, which have a large and positive effect. Indeed, the number of pumps of a different technology in a community's network again has no significant effect on pump functionality, while the number of non-pump sources
remains insignificant in these specifications. As discussed in more detail in section 4.3, these negative but insignificant
coefficients are consistent with free riding and positive spillover effects offsetting each other.
4.3 A structural modeling approach
The results presented in sections 4.1 and 4.2 demonstrate two key empirical facts about the functionality of rural water
pumps in Tanzania. First, there is a clear positive spatial correlation between the functionality of water pumps,14 and
these spatial correlations are strongest for pumps that are more similar; second, there is no spatial correlation between
the functionality of water pumps and non-pump water sources.15
There are two main explanations for the positive correlation of nearby water pumps. First, there may be spatially
correlated shocks or unobservable factors driving this correlation in functionality. For example, we may think that
nearby water sources have a similar water table, or experience similar rainfall shocks, which might explain the spatial
correlation. However, such shocks or unobservables would have to be particular to a certain type of water source or
technology to explain why the correlation only exists for water sources of the same type, and not between sources of
different types. The second explanation is that there are positive spillovers in pump maintenance, and that these are
strongest between water sources that are of the same type and technology, explaining the lack of spatial correlation
between water sources of different types. I believe this is the most plausible explanation of the results in sections 4.1
and 4.2, and discuss this, and alternative explanations, in more detail in section 4.3.2.
The reduced form results also give some limited evidence for free riding in the maintenance of water pumps. The
negative coefficients on the number of links to pumps of a different technology in specifications (10)-(12) of Table 5 are
consistent with communities having a reduced incentive to maintain their pump if there is a close substitute nearby,
though these coefficients are not statistically significant. Given these results, the remainder of this paper seeks to
estimate the extent of positive spillovers and free riding in my empirical context, using a structural model.
4.3.1 Motivation
My model sets out a network game in which each community’s action is a best response to the actions of its neighbors.
This allows us to model the strategic interactions between nearby communities. If communities free ride on their
neighbors’ investments, and there are no positive spillover effects, then investments will be strategic substitutes; however, if there are spillover effects that increase a community’s returns to investing, then investments will be strategic
complements. The model allows me to be explicit about how spillovers and free riding occur, and to test different
mechanisms through which they can act. For example, I can test different ways in which positive spillovers might decay
as distance between communities increases, or the extent to which positive spillovers are dependent on pumps being
of the same technology.
A structural model also allows me to conduct counterfactual analysis, analyzing the effect of possible policies on
the functionality rates of water sources. In section 7.2, I estimate the impact of standardizing pump technologies to maximize opportunities for positive spillovers; I will analyze many other interesting and policy-relevant counterfactuals in future work. Water practitioners (governments, NGOs) are essentially solving
a second-best policy problem. In the first-best world, they would be able to choose actions at both stages of water provision, installation and maintenance, to meet some social optimum. However, as maintenance decisions are
decentralized, communities choose privately optimal maintenance, leaving water practitioners to choose the optimal
14 This is shown by the negative relationship between pump functionality and the distance to an alternative working water pump (Tables
2, 3 and 4), and by the positive coefficient on the number of water pumps within 1.2km (Table 5).
15 This is shown by the positive coefficients on the nearest water source being of the same type and technology (Table 2), the lack of
significance on the distance to the nearest non-pump water source (Table 3), and the insignificant coefficients on the number of non-pump
water sources within 1.2km (Table 5).
Table 5: Degree centrality regressions - marginal effects reported at the mean of covariates
VARIABLES                       (1)          (2)          (3)          (4)          (5)          (6)
No. links to water sources      0.000664     0.000577
                                (0.000856)   (0.000712)
No. links to pumps                                        0.00670***   0.00566***
                                                          (0.00222)    (0.00212)
No. links to pumps, same tech                                                       0.0188***    0.0158***
                                                                                    (0.00424)    (0.00346)
Age at record                                -0.00665***               -0.00673***               -0.00685***
                                             (0.000913)                (0.000906)                (0.000899)
Pay for use dummy                            0.251***                  0.248***                  0.243***
                                             (0.0187)                  (0.0189)                  (0.0182)
District Fixed Effects          No           Yes          No           Yes          No           Yes
Month Fixed Effects             No           Yes          No           Yes          No           Yes
Observations                    9,456        9,456        9,456        9,456        9,456        9,456
Pseudo R2                       0.000160     0.0956       0.00162      0.0964       0.00540      0.0988

VARIABLES                       (7)          (8)          (9)          (10)         (11)         (12)
No. links to non-pump sources   -0.00144     -0.000941    -0.00108     -0.000834    -0.000463    -0.000564
                                (0.00128)    (0.00125)    (0.000924)   (0.00119)    (0.00114)    (0.000877)
No. links to pumps              0.00834***   0.00403*     0.00676***
                                (0.00227)    (0.00215)    (0.00226)
No. links to pumps, diff tech                                          -0.00635     -0.00799     -0.00594
                                                                       (0.00486)    (0.00516)    (0.00446)
No. links to pumps, same tech                                          0.0209***    0.0146***    0.0172***
                                                                       (0.00444)    (0.00399)    (0.00361)
Age at record                                -0.00632***  -0.00673***               -0.00638***  -0.00685***
                                             (0.00136)    (0.000905)                (0.00130)    (0.000897)
Pay for use dummy                            0.216***     0.248***                  0.210***     0.245***
                                             (0.0188)     (0.0189)                  (0.0183)     (0.0181)
District Fixed Effects          No           No           Yes          No           No           Yes
Month Fixed Effects             No           No           Yes          No           No           Yes
Observations                    9,456        9,456        9,456        9,456        9,456        9,456
Pseudo R2                       0.00202      0.0471       0.0966       0.00633      0.0500       0.0994
Standard errors clustered at the ward level, in parentheses
*** p<0.01, ** p<0.05, * p<0.1
installation policy, taking community actions as given. My model can answer this optimal policy question: how many pumps of each technology to install, and where to locate them. Additionally, by using the price paid to use water at neighboring water sources, which I observe in the data, to back out the dollar equivalents of my estimated
parameters, I can estimate the effect of making transfers to communities conditional on their water source remaining
functional. I discuss these counterfactual analyses in more detail in section 7.2.
4.3.2 Identification
There are two main identification challenges inherent in estimating social interaction effects in a network: endogeneity
and reflection (Manski [1993]). Endogeneity may arise in at least two ways: first, agents might choose their peers
endogenously; second, they might have correlated unobservables or experience common or correlated shocks. I discuss
each of the possible sources of endogeneity and how I deal with the reflection problem in turn.
In my empirical context, communities make maintenance decisions but do not make the installation decision of water
sources,16 and so, from the communities’ perspective, the network is exogenous. Further, because network connections
in my model depend on the distance between water sources, and inherent characteristics of the water source or community, communities are not able to choose their peers endogenously.
However, although communities do not choose their peers, it is possible that water practitioners choose where to
install different types of water source based on specific community characteristics, which would lead to correlated
unobservables particular to one type of water source or technology. For example, we might think that practitioners
choose one technology for poorer communities, and another for richer communities. Similarly, an individual NGO
might use a particular technology and also have a preference for working with a certain type of community, for example Christian charities might favor working with Christian communities. However, although water practitioners differ
in their favored technologies, the vast majority have very similar stated goals that guide their installation decisions: in particular, to help Tanzania meet Millennium Development Goal 7.C, to halve the proportion of people without
access to an improved water supply. As they use the same definitions of ‘improved’ and ‘access’ and must typically
agree locations for installation with local government, they are likely to have similar objective functions, and are likely
to work in similar communities on average.
To formally test whether water sources of different types and technologies are installed in similar communities, I
run a series of selection regressions, using water source type or technology as the dependent variable, and community
characteristics from the community LSMS as independent variables. The results are shown in Table 14 in the Appendix, and show that of the 15 community characteristics tested in the main specifications, only the education of the
village leader seems to be a significant predictor of the type or technology of water source installed. All other variables
are insignificant at the 5% level in my preferred specifications, which include district fixed effects. Although we cannot definitively rule out selection of water source type on unobservables, these results show that communities with different types of water sources are observably similar.
There remains a further possible source of endogeneity, spatially correlated shocks. I allow for spatially correlated
shocks, but the identification of spillover effects requires an identifying assumption: that the correlation of shocks
between nearby water sources is independent of the type of water sources. As shown in Figure 3, although I allow for
spatially correlated shocks, I assume that there cannot be a shock that only affects certain types of water source in an
area, but not others. More formally, I make the following assumption:
Assumption 1. If shocks to a water source $i$ of type $j$ in cluster $c$ are given by $\varepsilon_{jic} = \eta_i + \eta_{cj}$, where $\eta_i$ is an idiosyncratic shock and $\eta_{cj}$ is a shock specific to water sources of type $j \in \{1, 2, \ldots, J\}$ in cluster $c$, then $\eta_{c1} = \eta_{c2} = \cdots = \eta_{cJ} = \eta_c$ for all $c$.
16 Installation of water sources is carried out by third parties: the government, non-governmental organizations such as charities, or the World Bank.
Therefore, I allow for spatially correlated shocks, but do not allow these to be specific to one type of water source
or technology. I think this assumption broadly reflects the reality of my empirical context. The main shocks that
are related to functionality of water sources stem from hydrological shocks (water tables, or rainfall), or shocks to
the communities maintaining these pumps (e.g. negative income shocks). In both cases, it seems that such shocks,
if spatially correlated, would be correlated across water source types, as well as between water sources of the same type.
Given Assumption 1, the fact that communities do not select their peers, and the evidence that practitioners do not select particular types or technologies of water source based on observable community characteristics, we can identify spillover effects from spatial correlations that are stronger between water source types that are more similar.
Identification of these spillover effects does not rely upon specific functional forms or parametric assumptions in my
model.
Figure 3: The reduced form results show that there is spatial correlation in functionality between water sources of
type A, and between water sources of type B, but no correlation between water sources of different types, as shown in
the left hand panel. My main identifying assumption is that there cannot be shocks or unobservables that are specific
to one type of water source within a given area: I am ruling out shocks such as the ones presented in the right hand
panel.
The other main challenge in identification is the ‘reflection problem’, formalized by Manski [1993]. In a network setting,
each agent’s action affects the actions of his peers, but at the same time his peers’ actions affect his own. As such we
cannot distinguish cause from effect, and have a problem of simultaneity. However, in my empirical setting, because
the strength of connections in the network declines with the distance between the nodes (with some cutoff distance
at which network connections are zero), each node has its own individual peer group. Essentially, we have partially
overlapping groups, with ‘excluded peers’ or ‘neighbors of neighbors’, that is nodes that are not in one’s peer group,
but are included in the groups of one’s peers. As such, I use an approach similar to that taken by De Giorgi, Pellizzari,
and Redaelli [2010], Bramoullé, Djebbari, and Fortin [2009] and Calvó-Armengol, Patacchini, and Zenou [2009], using
excluded peers’ actions as an exogenous shift in neighbors’ actions to overcome the reflection problem.
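The "excluded peers" argument can be made concrete with the binary network used in section 4.2: for each community, collect the neighbors of its neighbors and drop the community itself and its direct neighbors. The Python sketch below is an illustration under that simplification, not the estimator used in this paper (which works with distance-weighted connections); the function name is hypothetical. In Bramoullé, Djebbari, and Fortin [2009] it is the exogenous characteristics of such second-degree contacts that supply the instruments. When every community is linked to every other (a single group, as in Manski's linear-in-means setting) these sets are empty and this route around the reflection problem is unavailable.

```python
from typing import Dict, Set


def excluded_peers(adj: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Neighbors-of-neighbors that are neither the node itself nor its direct neighbors.

    adj maps each node to the set of its neighbors; the network is assumed
    undirected (symmetric) with no self-loops.
    """
    out = {}
    for i, nbrs in adj.items():
        second = set()
        for j in nbrs:
            second |= adj[j]           # everyone reachable in two steps
        out[i] = second - nbrs - {i}   # keep only nodes outside i's own peer group
    return out


# A line of four communities, 0 - 1 - 2 - 3: peer groups partially overlap
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(excluded_peers(line))

# Everyone linked to everyone: peer groups coincide, so no excluded peers
complete = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(excluded_peers(complete))
```

In the line network each community has exactly one excluded peer (for example, community 2 for community 0), whereas in the complete network every set is empty.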
5 The Model
Informed by the reduced form results, I develop a structural model to capture strategic interactions between communities within the same network. Interdependence of decision-making between communities may come through two
channels. First, I allow for, and will estimate the magnitude of, positive spillovers in the maintenance of nearby water
sources, with spillovers strongest for communities with water sources of the same type and technology. Possible mechanisms include cost-sharing, market creation (for inputs), skill development and information sharing (see Section 2 for
a more detailed discussion). Second, the model allows for free riding, or ‘incentive effects’, whereby a community may
access a neighbor’s water source in the event that their source is non-functional. The cost of accessing a neighboring
water source depends on the distance traveled, and characteristics of both communities. Consistent with the reduced
form evidence and previous literature, I allow positive spillovers and free riding to occur through different mechanisms,
and thus depend on a different set of variables, allowing me to identify both effects.
The unit of observation is a community, as defined by an observed water source (functional or non-functional) in
my data. For example if a large village has three water sources, I will define this as three separate communities.
This is a reasonable assumption, as the norm of ‘community based management’ of rural water sources requires the
appointment of a ‘water point committee’ at the water source level, normally around 10 people, mostly women, who
are responsible for maintenance and management of the water point.17
There is a finite number of communities, each of which has a fixed, exogenously determined spatial location. Communities are connected to each other through a fixed, weighted, undirected network, represented by a symmetric matrix $N(\delta)$, with the strength of connection between communities $i$ and $j$ given by:
$$n_{ij} = \max\left\{ \frac{1 - \delta_1 \mathbf{1}(TY_i \neq TY_j) - \delta_2 \mathbf{1}(TE_i \neq TE_j)}{1 + \delta_3 d_{ij}} \, \mathbf{1}(d_{ij} < c), \; 0 \right\} \qquad (1)$$
where $TY_i$ is the type of water source $i$ (e.g. pump, tap, well) and $TE_i$ is the technology of pump $i$ (e.g. Nira, Afridev, India Mark II). $d_{ij}$ is the distance between communities $i$ and $j$, and $c$ is a distance cutoff such that network connections between communities further apart than $c$ are zero.18 The size of positive spillovers between two communities will depend on the strength of the network connection between them, and the variables and functional form are chosen to reflect the reduced form evidence. In particular, positive spillovers will be stronger between communities that have a similar water source, but the strength of these spillovers decays as distance between the communities increases. I restrict $\delta = (\delta_1, \delta_2, \delta_3) \geq 0$ so that $n_{ij} \in [0, 1]$, where $n_{ij} = 1$ if and only if $d_{ij} = 0$ and communities $i$ and $j$ have water sources of the same type and technology.
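Equation (1) translates directly into code. The Python sketch below is a minimal illustration under stated assumptions: planar coordinates in km, non-pump sources given technology `None` (so two non-pumps count as sharing a technology), and hypothetical function names and parameter values. The diagonal is set to zero because only other communities enter $N_i(\delta) m_{-i}$. Returning zero early when $d_{ij} \geq c$ is equivalent to multiplying by the indicator before taking the maximum.

```python
import math
from typing import List, Optional, Sequence, Tuple


def connection_strength(d_ij: float, same_type: bool, same_tech: bool,
                        delta: Tuple[float, float, float], cutoff: float) -> float:
    """n_ij from equation (1); delta = (delta1, delta2, delta3) >= 0."""
    d1, d2, d3 = delta
    if d_ij >= cutoff:  # indicator 1(d_ij < c) is zero
        return 0.0
    numerator = 1.0 - (0.0 if same_type else d1) - (0.0 if same_tech else d2)
    return max(numerator / (1.0 + d3 * d_ij), 0.0)


def connection_matrix(coords: Sequence[Tuple[float, float]],
                      types: Sequence[str], techs: Sequence[Optional[str]],
                      delta: Tuple[float, float, float],
                      cutoff: float) -> List[List[float]]:
    """Symmetric N(delta) with a zero diagonal."""
    n = len(coords)
    N = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            N[i][j] = N[j][i] = connection_strength(
                d, types[i] == types[j], techs[i] == techs[j], delta, cutoff)
    return N


# Hypothetical example: two pumps of different technologies 1km apart, and a well 3km away
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
types = ["pump", "pump", "well"]
techs = ["Afridev", "Nira", None]
delta, cutoff = (0.5, 0.25, 1.0), 2.0
N = connection_matrix(coords, types, techs, delta, cutoff)
print(N)
```

Here the two pumps are linked with strength $(1 - 0.25)/(1 + 1) = 0.375$, and the well lies beyond the cutoff, so its row is all zeros.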
The communities play a static network game of complete information, in which they all move simultaneously. Each community has a binary maintenance decision, $m_i \in \{0, 1\}$, and maintenance is perfectly effective, so that a water source is functional if and only if the community sets $m_i = 1$.19 I consider Nash equilibria in pure strategies. Community $i$'s cost of maintenance depends on the maintenance decisions of those in $i$'s network ($j$ such that $n_{ij} > 0$), as well as (possibly correlated) cost shocks, $\varepsilon_i$, which are realized and publicly known before the communities choose $m_i$:
$$c(m_i) = \left( \frac{f(X_i; \psi)}{1 + N_i(\delta) m_{-i}} + \varepsilon_i \right) m_i \qquad (2)$$
$X_i$ are observable pump characteristics (e.g. age, technology) and community characteristics (e.g. fractionalization, population); $N_i(\delta) m_{-i}$ is the sum of the products of maintenance decisions and network connection strengths for community $i$ (given by the product of the $i$th row of the network connection matrix and the vector of maintenance
17 It may be more realistic to define the community, and thus the decision making process, at the village level. Using village level observations does not significantly alter the model: each community would still choose a level of maintenance as a best response to their neighbors, but this maintenance decision would no longer be binary. I define a community as a water source to simplify my empirical and data work, and will explore decision-making at the village level in future work.
18 The distance cutoff controls the density of the network, and thus the dimensionality of my estimation problem. The role it plays
depends on the estimation method undertaken and the size of the network we want to estimate. I discuss this in more detail in section 6,
and in the Appendix.
19 By assuming that maintenance is perfectly effective, my model assumes that there is a finite price for every community at which their
water source will work. In practice some maintenance, for example deepening an existing borehole, is too expensive for many communities
to undertake. My model treats these cases as having a very high cost of setting $m_i = 1$, possibly because of a large cost shock, with the
community choosing not to pay this high cost of maintenance. I am agnostic about the timing: this maintenance can be thought of as
pre-emptive or as a ‘repair’ after a shock, but in both cases there is perfect information about the cost of maintenance once the shock has
been realized, and no uncertainty about functionality.
19
decisions of other communities). f (Xi ; ψ) should be thought of as the full cost of maintenance, for a community that
is in isolation or has no neighboring communities that are investing. The more of i’s neighbors that invest, the greater
a discount i receives on its cost of maintenance, as given by the denominator. This discount occurs through positive
spillovers from the maintenance of these water sources.
The simple functional form specifying the mechanism through which positive spillovers occur captures the main empirical patterns from the reduced form analysis, allows a cost-discount interpretation of the spillovers, and does not place arbitrary bounds on the aggregate strength of network connections to ensure the cost of maintenance remains positive.²⁰ The parametric form of $f(X_i; \psi)$ used for estimation will be specified in section 6, and is informed by reduced form evidence and the previous literature on costs of public goods provision (e.g. on collective action).
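To make the cost-discount mechanism concrete, the Python sketch below computes $N_i(\delta) m_{-i}$ as the dot product of row $i$ of the connection matrix with the other communities’ maintenance decisions, and compares the ratio form of equation 2 with the linear form mentioned in footnote 20. It is an illustrative sketch, not the estimation code used in this paper: the function names and all numerical values (connection strengths, full cost) are hypothetical, and the cost shock defaults to zero.

```python
from typing import Sequence


def spillover_term(n_row: Sequence[float], m_others: Sequence[int]) -> float:
    """N_i(delta) m_{-i}: row i of the connection matrix dotted with others' decisions."""
    if len(n_row) != len(m_others):
        raise ValueError("connection row and decision vector must have equal length")
    return sum(n_ij * m_j for n_ij, m_j in zip(n_row, m_others))


def ratio_cost(f_i: float, spill: float, eps_i: float = 0.0) -> float:
    """Cost of setting m_i = 1 under equation 2: f / (1 + N m) + eps."""
    return f_i / (1.0 + spill) + eps_i


def linear_cost(f_i: float, spill: float, eps_i: float = 0.0) -> float:
    """Linear alternative of footnote 20: f * (1 - N m) + eps."""
    return f_i * (1.0 - spill) + eps_i


if __name__ == "__main__":
    f_i = 10.0               # hypothetical full cost of maintenance
    n_row = [0.9, 0.8, 0.5]  # hypothetical connection strengths n_ij
    for m_others in ([0, 0, 0], [1, 0, 0], [1, 1, 1]):
        s = spillover_term(n_row, m_others)
        print(f"m_-i={m_others}  N m={s:.2f}  "
              f"ratio={ratio_cost(f_i, s):.3f}  linear={linear_cost(f_i, s):.3f}")
```

With all three neighbors maintaining, the aggregate connection strength is 2.2, so the ratio form gives a cost of 10/3.2 = 3.125 while the linear form gives 10 × (1 − 2.2) = −12, illustrating why the linear specification would need bounds on $N_i(\delta) m_{-i}$.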
If a community chooses not to maintain their water source then it is non-functional and they may access a water source maintained by another community, free riding on that community’s maintenance. However, they face a cost of accessing an alternative water source, depending on the distance to and characteristics of the alternative:

$$
\min_{j \neq i,\; m_j = 1} g(d_{ij}, \tilde{X}_j; \gamma) \tag{3}
$$
That is, they will access water from whichever community with $m_j = 1$ offers them the lowest access cost. The $\tilde{X}_j$ terms include characteristics of community $j$ (a different set from those included in $X_i$), for example whether community $j$ charges for water access, which we see in the data. Again, the specific functional form of $g(d_{ij}, \tilde{X}_j; \gamma)$ is informed by reduced form evidence and is given in section 6.
Communities have linear utility, with a fixed valuation of having water in their own community, $w$:²¹

$$
u_i = w m_i - m_i \left( \frac{f(X_i; \psi)}{1 + N_i(\delta) m_{-i}} + \epsilon_i \right) - (1 - m_i) \min_{j \neq i,\; m_j = 1} g(d_{ij}, \tilde{X}_j; \gamma) \tag{4}
$$
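Community $i$’s best response function, taking the maintenance decisions of other communities in $i$’s network as given, is:

$$
\begin{aligned}
m_i = 1 \quad &\text{iff} \quad u_i^1 > u_i^0 \qquad (5)\\
&\Leftrightarrow \quad \underbrace{w - \frac{f(X_i; \psi)}{1 + N_i(\delta) m_{-i}} + \min_{j \neq i,\; m_j = 1} g(d_{ij}, \tilde{X}_j; \gamma)}_{\bar{u}_i} > \epsilon_i \qquad (6)\\
m_i = 0 \quad &\text{otherwise} \qquad (7)
\end{aligned}
$$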
where $u_i^1$ is the utility of community $i$ if it sets $m_i = 1$, and $u_i^0$ is the utility if it sets $m_i = 0$. As is usual in discrete choice games with multiple players, multiple equilibria are possible; I discuss how this affects my estimation strategy in more detail in section 6.
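The best response in equations 5 to 7 reduces to comparing the cost shock with the cutoff $\bar{u}_i$. A minimal Python sketch, assuming the access costs $g(d_{ij}, \tilde{X}_j; \gamma)$ of the maintaining neighbors have already been computed; the function names, the optional `outside_cost` fallback (which anticipates the role of $C_0$ in section 6) and the numbers in the example are hypothetical.

```python
from typing import Optional, Sequence


def cutoff(w: float, f_i: float, spill: float,
           access_costs: Sequence[float],
           outside_cost: Optional[float] = None) -> float:
    """ubar_i from equation 6.

    access_costs: g(d_ij, X~_j; gamma) for each j != i with m_j = 1.
    outside_cost: hypothetical fallback used when no neighbour maintains
    (the role C_0 plays in section 6); None means no fallback.
    """
    options = list(access_costs)
    if outside_cost is not None:
        options.append(outside_cost)
    if not options:
        raise ValueError("no maintained alternative and no outside option")
    return w - f_i / (1.0 + spill) + min(options)


def best_response(eps_i: float, ubar_i: float) -> int:
    """Equations 5 to 7: maintain (1) iff ubar_i > eps_i, otherwise 0."""
    return 1 if ubar_i > eps_i else 0


if __name__ == "__main__":
    # Hypothetical values: w = 5, f = 8, N m = 1, two maintained alternatives.
    u = cutoff(w=5.0, f_i=8.0, spill=1.0, access_costs=[2.5, 4.0])
    print(u)                                             # 3.5
    print(best_response(3.0, u), best_response(4.0, u))  # 1 0
```

A strict inequality is used, as in equation 5; with a continuous shock distribution, ties occur with probability zero.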
20 For example, a simpler linear specification of the form $f(X_i; \psi)(1 - N_i(\delta) m_{-i})$ would require bounds on the aggregate strength of network connections, $N_i(\delta) m_{-i}$, to ensure costs of maintenance are not negative.
21 Including w as a parameter to be estimated is equivalent to estimating the mean of the cost shock terms, which I am normalizing to
be zero.
6 Estimation
Estimation of this model is non-trivial because of cross-community choice dependencies: community $i$’s optimal choice of $m_i$ depends on the choices made by other communities in $i$’s network. By assuming a distribution for the cost shock terms, we can write down the individual likelihood of observing a given $m_i$, conditional on other maintenance decisions and a given set of parameters. In what follows, I assume that the cost shocks are i.i.d. normal with mean zero and standard deviation $\sigma$, $\epsilon_i \sim N(0, \sigma^2)$, though this assumption can be relaxed.²²
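$$
L_i(m_i \mid N_i(\delta) m_{-i}, X_i, \tilde{X}_{-i}; \psi) = \left[ \Phi\left( \frac{\bar{u}_i}{\sigma} \right) \right]^{m_i} \left[ 1 - \Phi\left( \frac{\bar{u}_i}{\sigma} \right) \right]^{1 - m_i} \tag{8}
$$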
However, these conditional likelihoods are not independent across different $i$’s and so we cannot write the joint likelihood as the product of conditionals (see the online appendix of Acemoglu, García-Jimeno, and Robinson [2015] for a similar discussion). Similarly, individual moment conditions will not be independent in a moment inequality approach (this would be a violation of Assumption 2 in Pakes, Porter, Ho, and Ishii [2011]).
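Under the normality assumption, $\Pr(m_i = 1) = \Pr(\epsilon_i < \bar{u}_i) = \Phi(\bar{u}_i / \sigma)$. The short Python sketch below evaluates equation 8 for a single community, taking $\bar{u}_i$ as given; the function names are hypothetical and the default $\sigma = 1$ is an illustrative choice, not a value stated in the paper. As the paragraph above explains, multiplying these conditional likelihoods across communities does not give the joint likelihood.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_likelihood(m_i: int, ubar_i: float, sigma: float = 1.0) -> float:
    """Equation 8: Phi(ubar/sigma)^m_i * (1 - Phi(ubar/sigma))^(1 - m_i)."""
    if m_i not in (0, 1):
        raise ValueError("m_i must be 0 or 1")
    p_maintain = normal_cdf(ubar_i / sigma)  # Pr(eps_i < ubar_i)
    return p_maintain if m_i == 1 else 1.0 - p_maintain


if __name__ == "__main__":
    for m in (0, 1):
        print(m, round(conditional_likelihood(m, 0.5), 4))
```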
There are two main approaches I can take for estimation, as shown in Figure 4. The first approach, which I take
in this paper, is the ‘segregated networks’ approach, which defines relatively small, separate non-overlapping networks
(clusters) according to pre-defined geographical areas, such as administrative divisions. The assumption is that a
community is only affected by (and therefore only responds to) the actions of communities in its network. Although
the definition of the segregated networks (clusters) requires potentially strong restrictions on the types of interactions I
allow, it makes estimation much more tractable. For a given set of parameters, one can solve for all possible equilibria
in each cluster and use a probabilistic equilibrium selection rule to obtain maximum likelihood estimates. The estimation procedure is similar to that used in Todd and Wolpin [forthcoming], and is discussed in more detail in section 6.1.
A second possible approach does not restrict network connections to be positive only between communities in the
same defined cluster or administrative area. As a result, it is possible for there to be a single large network, in which
all communities are connected to each other, either directly or indirectly, and are therefore playing a large series of
interconnected games. The existing literature has not estimated such a model using a likelihood approach. One possible approach is to partition the data into observations for which we can and cannot write down an exact likelihood,
and then approximate the likelihood for the latter group using Bayesian updating. This procedure is proposed in the
online appendix of Acemoglu, García-Jimeno, and Robinson [2015] and I will explore methods of implementing it in
future work. I discuss this in more detail in the Appendix.
6.1 Segregated networks
The basic model from section 5 remains the same: each community chooses perfectly effective, binary maintenance, $m_i$, to maximize their utility as a best response to other communities’ choices. However, I now define $k = 1, \ldots, N$ clusters, each containing $i = 1, \ldots, N_k$ communities with a water source. Network connections are only positive within a cluster, that is:

$$
n_{ij} = \max\left\{ \frac{1 - \delta_1 \mathbb{1}(TY_i \neq TY_j) - \delta_2 \mathbb{1}(TE_i \neq TE_j)}{1 + \delta_3 d_{ij}} \, \mathbb{1}(d_{ij} < c) \, \mathbb{1}(i, j \in k),\; 0 \right\}
$$
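The segregated-network connection strengths can be computed directly from source attributes. The Python sketch below is a hypothetical illustration: the `Source` record, planar coordinates in kilometres, Euclidean distance and the zero diagonal (a source is not its own neighbor) are assumptions made for the example, not details taken from the data.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Source:
    """Hypothetical record for one water source (one 'community')."""
    type_: str       # water source type, TY_i
    technology: str  # water source technology, TE_i
    cluster: int     # cluster index k
    x: float         # planar coordinates (assumed to be in km)
    y: float


def connection(a: Source, b: Source, d1: float, d2: float, d3: float,
               c: float) -> float:
    """n_ij for the segregated-networks specification."""
    d = math.hypot(a.x - b.x, a.y - b.y)
    if a.cluster != b.cluster or not d < c:  # 1(i, j in k) and 1(d_ij < c)
        return 0.0
    numer = (1.0 - d1 * float(a.type_ != b.type_)
             - d2 * float(a.technology != b.technology))
    return max(numer / (1.0 + d3 * d), 0.0)


def connection_matrix(sources: Sequence[Source], d1: float, d2: float,
                      d3: float, c: float) -> List[List[float]]:
    """Full connection matrix with a zero diagonal."""
    n = len(sources)
    return [[0.0 if i == j else connection(sources[i], sources[j], d1, d2, d3, c)
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    srcs = [Source("hand pump", "india mark ii", 1, 0.0, 0.0),
            Source("hand pump", "india mark ii", 1, 3.0, 4.0),
            Source("hand pump", "nira/tanira", 1, 3.0, 4.0)]
    for row in connection_matrix(srcs, d1=0.2, d2=0.5, d3=0.1, c=10.0):
        print([round(v, 3) for v in row])
```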
22 As discussed in section 4.3.2, my identification strategy allows for spatially correlated cost shocks. Allowing such correlations does
not significantly change the estimation procedure, but just requires simulation of the cost shocks, rather than analytic evaluation of the
probability of a given strategy profile being an equilibrium. In future work, I will specify log normal cost shocks, to ensure that the cost
of maintenance remains positive.
Figure 4: The segregated networks approach (left) treats each ward/village as an individual network in equilibrium.
An alternative approach is to allow network connections between all water sources, possibly creating a single large,
connected network. Using a distance cutoff rule (right) allows networks to overlap, but restricts the density of network
connections in a large connected network, reducing the dimensionality problem.
In each cluster $k$ there are $2^{N_k}$ strategy profiles, each of which is a possible Nash equilibrium, $E_0^k, E_1^k, \ldots, E_{2^{N_k}-1}^k$, where $E_0^k$ is the strategy profile observed in the data. Note that for a given set of parameters, any strategy profile can be an equilibrium for some set of cost shocks. To see this, we can fix any possible strategy profile of $m_i$’s in a cluster $k$ and check whether a set of shocks exists for which this configuration is a Nash equilibrium. A set of shocks that rationalizes this strategy profile as an equilibrium always exists: for example, every community that sets $m_i = 1$ may have received a very large negative cost shock, and all those that set $m_i = 0$ may have received a very large positive cost shock, making maintenance of their water pump prohibitively expensive. However, given a set of parameters and my parametric assumption on the distribution of the cost shock terms, the probability that each configuration is an equilibrium will vary and can be calculated.
The estimation procedure is as follows:
1. Choose a set of parameter values: $\theta = (\delta, \psi, w, \gamma)$.
2. For each possible strategy profile, $l$, in each cluster, $k$, calculate the cutoff value of the cost shock for each community, $i$: $\bar{u}_{il}$.
3. For each strategy profile, $l$, in each cluster, $k$, calculate the probability that it is an equilibrium (the probability that the cost shocks in the cluster are such that no community wants to deviate):
$$
\Pr(E_l^k) = \prod_{\{i \in k :\, m_{il}^k = 0\}} \Pr(\epsilon_i > \bar{u}_{il}) \prod_{\{i \in k :\, m_{il}^k = 1\}} \Pr(\epsilon_i \le \bar{u}_{il}) \tag{9}
$$
4. Using a parametric probabilistic equilibrium selection rule, calculate the likelihood of the observed strategy profile being the equilibrium that is being played:
$$
L_k(E_0^k \mid X_k; \theta) = \frac{\exp(\Pr(E_0^k))}{\sum_{l=0}^{2^{N_k}-1} \exp(\Pr(E_l^k))} \tag{10}
$$
5. Repeat steps 1 to 4 over values of $\theta$ to maximize $\prod_{k=1}^{N} L_k(E_0^k \mid X_k; \theta)$.
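Steps 2 to 5 can be sketched in Python for a single cluster as follows, assuming independent $N(0, \sigma^2)$ shocks as above. The cutoff function `ubar(i, profile)` is a placeholder for $\bar{u}_{il}$ (for example equation 14 below), `example_ubar` is a hypothetical two-community specification, and all function names are illustrative; the parameter search in step 5 would be handled by a numerical optimizer, which is not shown.

```python
import itertools
import math
from typing import Callable, Iterable, Tuple

Profile = Tuple[int, ...]
Cutoff = Callable[[int, Profile], float]  # (i, profile) -> ubar_il; must ignore profile[i]


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_equilibrium(profile: Profile, ubar: Cutoff, sigma: float = 1.0) -> float:
    """Step 3 / equation 9 with independent N(0, sigma^2) cost shocks."""
    prob = 1.0
    for i, m_i in enumerate(profile):
        p_le = normal_cdf(ubar(i, profile) / sigma)  # Pr(eps_i <= ubar_il)
        prob *= p_le if m_i == 1 else 1.0 - p_le     # m_i = 0 needs eps_i > ubar_il
    return prob


def cluster_likelihood(observed: Profile, ubar: Cutoff, sigma: float = 1.0) -> float:
    """Step 4 / equation 10: logit over the 2^{N_k} equilibrium probabilities."""
    profiles = itertools.product((0, 1), repeat=len(observed))
    weights = {p: math.exp(prob_equilibrium(p, ubar, sigma)) for p in profiles}
    return weights[tuple(observed)] / sum(weights.values())


def log_likelihood(clusters: Iterable[Tuple[Profile, Cutoff]], sigma: float = 1.0) -> float:
    """Step 5 objective: sum over clusters of log L_k (an optimizer would search theta)."""
    return sum(math.log(cluster_likelihood(obs, ub, sigma)) for obs, ub in clusters)


def example_ubar(i: int, profile: Profile) -> float:
    """Hypothetical cutoff: w = 1, f = 2, n_ij = 1 for every other member, access cost 0.5."""
    spill = sum(m_j for j, m_j in enumerate(profile) if j != i)
    return 1.0 - 2.0 / (1.0 + spill) + 0.5


if __name__ == "__main__":
    for obs in itertools.product((0, 1), repeat=2):
        print(obs, round(cluster_likelihood(obs, example_ubar), 4))
```

Because several profiles can be equilibria at once, the $\Pr(E_l^k)$ need not sum to one; the logit in step 4 normalizes them into a proper distribution over observed profiles.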
Note that step 3 of the procedure simply calculates the probability that a given strategy profile is an equilibrium; we need to use a (parametric) equilibrium selection rule in step 4 to give the probability of observing the equilibrium seen in the data, $E_0^k$, given the probabilities of all possible equilibria. This paper uses a multinomial logit, in line with previous literature and Todd and Wolpin [forthcoming].
This estimation procedure can be generalized in a number of ways. Firstly, the choice of probabilistic equilibrium
selection rule can be generalized, for example by making the equilibrium selection rule dependent on equilibrium characteristics weighted by estimable parameters. More fundamentally, I am using a simple error structure, in which cost
shocks are normally distributed and independent between communities. As noted in section 4.3.2, my identification
allows for spatially correlated cost shocks, and the estimation procedure does not significantly change by allowing
these to be correlated within a cluster, or by using log normal shocks to ensure that the cost of maintenance is always
positive. Allowing this more flexible shock structure would make analytic calculation of the probability of a strategy
profile being an equilibrium (step 3) more difficult, and so would require a simulation approach. In this case, I would
draw cost shocks from the joint distribution in step 3, and calculate the set of equilibria for that set of shocks. By
drawing a large number of shocks for each community, one can obtain a consistent estimate of the probability of each
configuration being an equilibrium. This simulation-based approach is also attractive if we allow for measurement
error in pump functionality (or equivalently do not constrain pump maintenance to be perfectly effective). In this
case, pump functionality is a noisy measure of a community’s investment decision, which I would treat as a latent
variable, similar to the approach taken by Todd and Wolpin [forthcoming].
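The simulation approach described above can be sketched in Python as a frequency estimator: draw a vector of cost shocks for the cluster, record every strategy profile that is a Nash equilibrium for that draw, and average over draws. The one-factor, equicorrelated shock structure (parameter `rho`) is an assumption chosen for illustration; the paper does not specify the form of spatial correlation here. Function names are hypothetical.

```python
import itertools
import math
import random
from typing import Callable, Dict, Sequence, Tuple

Profile = Tuple[int, ...]
Cutoff = Callable[[int, Profile], float]  # (i, profile) -> ubar_il, ignoring profile[i]


def is_equilibrium(profile: Profile, eps: Sequence[float], ubar: Cutoff) -> bool:
    """True if no community wants to deviate: m_i = 1 exactly when eps_i <= ubar_il."""
    return all((eps[i] <= ubar(i, profile)) == (m_i == 1)
               for i, m_i in enumerate(profile))


def simulate_equilibrium_probs(n_k: int, ubar: Cutoff, rho: float = 0.0,
                               sigma: float = 1.0, draws: int = 10_000,
                               seed: int = 0) -> Dict[Profile, float]:
    """Share of shock draws under which each profile is a Nash equilibrium.

    Assumed one-factor structure: eps_i = sigma * (sqrt(rho) z + sqrt(1 - rho) u_i),
    with z and u_i iid N(0, 1), so shocks share correlation rho within the cluster.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    rng = random.Random(seed)
    profiles = list(itertools.product((0, 1), repeat=n_k))
    counts = {p: 0 for p in profiles}
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)  # common cluster-level shock
        eps = [sigma * (math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
               for _ in range(n_k)]
        for p in profiles:
            if is_equilibrium(p, eps, ubar):
                counts[p] += 1
    return {p: c / draws for p, c in counts.items()}
```

With interdependent cutoffs a single draw can support more than one equilibrium, so these frequencies need not sum to one, just as the probabilities in equation 9 need not.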
To estimate the model, I must make particular functional form assumptions for $f(X_i; \psi)$ and $g(d_{ij}, \tilde{X}_j; \gamma)$. I present results using simple forms, which include the strongest predictors of pump functionality from the reduced form analysis. In future work, I will include additional pump and community characteristics (essentially controls), which will help improve the fit of my model. In particular, I set the full cost of maintenance, $f(X_i; \psi)$, to be a linear function of water source age, $a_i$, giving a specific version of equation 2:

$$
c(m_i) = \left( \frac{\psi a_i}{1 + N_i(\delta) m_{-i}} + \epsilon_i \right) m_i \tag{11}
$$
In addition, I give a simple functional form to the cost of accessing another water source (equation 3), so that it depends linearly on the distance to an alternative source, $d_{ij}$, and whether users must pay at that source, $p_j$:

$$
\min_{j \neq i,\; m_j = 1} \{\gamma_1 d_{ij} + \gamma_2 p_j,\; C_0\} \tag{12}
$$
Here $C_0$ is the cost paid by a community if it has no alternative functional water sources available in its cluster, and it essentially bounds the cost of accessing an alternative source. This parameter is estimated and is needed for the small minority of clusters in which we observe no functional water pumps. The empirical interpretation is the cost of using an unimproved water source (e.g. a river or lake), rather than the improved water sources that we see in the data (pumps, wells and taps). The best response function is then given by:
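$$
\begin{aligned}
m_i = 1 \quad &\text{iff} \quad u_i^1 > u_i^0 \qquad (13)\\
&\Leftrightarrow \quad \underbrace{w - \frac{\psi a_i}{1 + N_i(\delta) m_{-i}} + \min_{j \neq i,\; m_j = 1} \{\gamma_1 d_{ij} + \gamma_2 p_j,\; C_0\}}_{\bar{u}_i} > \epsilon_i \qquad (14)\\
m_i = 0 \quad &\text{otherwise} \qquad (15)
\end{aligned}
$$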
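The access-cost term in equations 12 and 14 is a capped minimum over maintaining neighbors. A minimal Python sketch, with hypothetical function and argument names; `pays` holds the pay-for-use dummy $p_j$ and the numbers in the usage lines are illustrative only.

```python
from typing import Sequence


def access_cost(i: int, dist_row: Sequence[float], pays: Sequence[int],
                maintained: Sequence[int], gamma1: float, gamma2: float,
                c0: float) -> float:
    """Capped access cost of equation 12 for community i.

    dist_row[j] is d_ij, pays[j] is the pay-for-use dummy p_j and
    maintained[j] is m_j; C_0 applies when no cheaper maintained source exists.
    """
    candidates = [gamma1 * dist_row[j] + gamma2 * pays[j]
                  for j in range(len(maintained))
                  if j != i and maintained[j] == 1]
    return min(candidates + [c0])


if __name__ == "__main__":
    row, pays = [0.0, 1.0, 0.5], [0, 1, 0]  # hypothetical cluster of three
    print(access_cost(0, row, pays, [1, 1, 1], 2.0, 1.5, 4.0))  # nearby free source: 1.0
    print(access_cost(0, row, pays, [1, 0, 0], 2.0, 1.5, 4.0))  # nothing maintained: C_0 = 4.0
```

The second call returns $C_0$ because no other source in the cluster is maintained, which is the case the text describes for clusters with no functional pumps.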
I estimate 8 parameters of my model: $\delta_1, \delta_2, \delta_3, \gamma_1, \gamma_2, \psi, w, C_0$.

6.2 Data
I present preliminary results from estimation of my model on a subset of my data, taken from the district of Mpanda in the region of Rukwa, from the 2013 Tanzanian data. I chose this district because it has a high proportion of water sources that are pumps, the maintenance of which my model is designed to capture, and because it is relatively easy to divide the district into non-overlapping clusters based on village codes, the smallest administrative area in the data.²³ I defined every village with between 2 and 8 observations as a cluster, merged any village with only one observation into its nearest cluster, and divided villages with more than 8 observations into two or more clusters based on GPS coordinates. This gave me 238 observations in 59 clusters. The data used are summarized in Table 6, with summary statistics given for the sample overall and for each of the seven possible cluster sizes.
Table 6: Summary statistics for the sample used to estimate the model, from Rukwa, Mpanda

| Size of cluster | 2 | 3 | 4 | 5 | 6 | 7 | 8 | All |
|---|---|---|---|---|---|---|---|---|
| Number of obs | 26 | 45 | 28 | 55 | 48 | 28 | 8 | 238 |
| Number functional | 10 | 26 | 9 | 30 | 19 | 9 | 3 | 106 |
| % functional | 38.5% | 57.8% | 32.1% | 54.5% | 39.6% | 32.1% | 37.5% | 44.5% |
| Number pay for use | 17 | 32 | 20 | 33 | 35 | 15 | 4 | 156 |
| % pay for use | 65.4% | 71.1% | 71.4% | 60.0% | 72.9% | 53.6% | 50.0% | 65.5% |
| Mean pump age (years) | 12.6 | 12.1 | 18.3 | 19.7 | 20.9 | 18.3 | 5.3 | 16.9 |
| Type: communal standpipe | 4 (15.4%) | 7 (15.6%) | 6 (21.4%) | 3 (5.5%) | 7 (14.6%) | 2 (7.1%) | 0 (0.0%) | 29 (12.2%) |
| Type: hand pump | 19 (73.1%) | 28 (62.2%) | 16 (57.1%) | 45 (81.8%) | 38 (79.2%) | 22 (78.6%) | 7 (87.5%) | 175 (73.5%) |
| Type: improved spring | 0 (0.0%) | 2 (4.4%) | 0 (0.0%) | 1 (1.8%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 3 (1.3%) |
| Type: other | 3 (11.5%) | 8 (17.8%) | 6 (21.4%) | 6 (10.9%) | 3 (6.3%) | 4 (14.3%) | 1 (12.5%) | 31 (13.0%) |
| Technology: gravity | 1 (3.8%) | 7 (15.6%) | 4 (14.3%) | 3 (5.5%) | 6 (12.5%) | 0 (0.0%) | 0 (0.0%) | 21 (8.8%) |
| Technology: India Mark II | 10 (38.5%) | 18 (40.0%) | 8 (28.6%) | 37 (67.3%) | 24 (50.0%) | 6 (21.4%) | 7 (87.5%) | 110 (46.2%) |
| Technology: Nira/Tanira | 9 (34.6%) | 5 (11.1%) | 5 (17.9%) | 6 (10.9%) | 1 (2.1%) | 8 (28.6%) | 0 (0.0%) | 34 (14.3%) |
| Technology: other | 6 (23.1%) | 15 (33.3%) | 11 (39.3%) | 9 (16.4%) | 17 (35.4%) | 14 (50.0%) | 1 (12.5%) | 73 (30.7%) |

7 Results
Table 7 presents estimates of the model parameters and their standard errors. All of the parameters except the $\delta$ terms, which govern the strength of connections between different communities, are precisely estimated. The parameter estimates are consistent with the reduced form evidence, with pumps less likely to be functional the older they are, and $\hat{\delta}_1 + \hat{\delta}_2 = 1$ suggesting that there are no positive spillovers between communities with water sources of a different type and technology. The point estimates provide a plausible ‘back of the envelope’ calculation for the probability of the average pump being functional: a pump of average age (17 years) in my sample, with a distance of 1 km to the next alternative water source, at which water use is charged, would have a roughly 50% chance of being maintained and remaining functional.

23 This sample was chosen for convenience in defining the clusters, but is small relative to the size of my dataset (it contains only about 1% of observations) and is not representative along observable dimensions: the proportion of pumps that are functional is significantly lower, and the proportion of pumps that require payment for use is significantly higher, than in the dataset as a whole. Future work will present estimates from my entire sample of data.
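Table 7: Parameter estimates for estimation of the model using subset of data from Rukwa, Mpanda

| Parameter | Coefficient | Standard error |
|---|---|---|
| $\hat{\delta}_1$ | 0.153 | 2.997 |
| $\hat{\delta}_2$ | 0.847 | 3.533 |
| $\hat{\delta}_3$ | 0.104 | 1.049 |
| $\hat{\gamma}_1$ | 4.137 | 1.290 |
| $\hat{\gamma}_2$ | 3.250 | 1.255 |
| $\hat{\psi}$ | 1.210 | 0.296 |
| $\hat{w}$ | 7.856 | 2.191 |
| $\hat{C}_0$ | 4.736 | 0.936 |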
There are a number of possible explanations for the lack of significance of the $\hat{\delta}$ terms. I estimated the model with the constraint $\delta_1 + \delta_2 \le 1$, and this constraint was binding in the estimation. I have therefore essentially obtained corner estimates, with the likelihood flat to the right of my point estimates; standard errors estimated from a flat likelihood function using the estimated Hessian would be expected to be large. One possible solution to this problem is to alter the functional form used for the network connections, which naturally generates a kink in the likelihood when the constraint is binding. I have estimated the model using the smooth functional form $n_{ij} = \exp(-\delta_1 \mathbb{1}(TY_i \neq TY_j) - \delta_2 \mathbb{1}(TE_i \neq TE_j) - \delta_3 d_{ij})$ in place of equation 1: initial simulation results show improved estimates, and I will include these in future work, as well as estimation of the model on the full sample of data. The selection of my subsample, which is not representative of my overall sample along observable dimensions, may also explain the large standard errors.
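The kink can be seen by comparing the two weight functions for a pair of sources that differ in both type and technology. In the Python sketch below (function names hypothetical; distance set to zero for simplicity, and $\hat{\delta}_1 = 0.153$ taken from Table 7), the kinked form is exactly zero for every $\delta_2 \ge 1 - \delta_1$, so such pairs contribute no variation to the likelihood over that range, whereas the exponential form keeps declining smoothly.

```python
import math


def kinked_weight(d1: float, d2: float, diff_type: bool, diff_tech: bool,
                  d3: float = 0.0, dist: float = 0.0) -> float:
    """Segregated-network form: max{(1 - d1*1[TY] - d2*1[TE]) / (1 + d3*d), 0}."""
    numer = 1.0 - d1 * float(diff_type) - d2 * float(diff_tech)
    return max(numer / (1.0 + d3 * dist), 0.0)


def smooth_weight(d1: float, d2: float, diff_type: bool, diff_tech: bool,
                  d3: float = 0.0, dist: float = 0.0) -> float:
    """Smooth alternative: exp(-d1*1[TY] - d2*1[TE] - d3*d)."""
    return math.exp(-d1 * float(diff_type) - d2 * float(diff_tech) - d3 * dist)


if __name__ == "__main__":
    d1 = 0.153  # point estimate of delta_1 reported in Table 7
    for d2 in (0.5, 1.5, 3.0):
        # pair differing in both type and technology, at distance zero
        print(d2, kinked_weight(d1, d2, True, True),
              round(smooth_weight(d1, d2, True, True), 4))
```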
7.1 Model fit
Table 8 shows the proportion of water sources correctly predicted as functional or non-functional for the subsample as a whole, and separately for the water sources that are functional and non-functional in the data, while Table 9 compares model predictions with actual outcomes, broken down by cluster size. Table 8 shows that my model correctly predicts 69% of the water sources that are functional, though it systematically over-predicts functionality, correctly predicting only 43% of non-functional water sources. Table 9 also shows that my model predicts more functional pumps than we see in the data for clusters of every size apart from size 3. However, for a very parsimonious model using only six variables from the data, the fit is reasonable: as I add more explanatory variables (including month and district fixed effects, and pump and community characteristics) in future work, I would expect the fit to improve significantly.
Table 8: Number and percentage of observations predicted correctly

| | Functional | Non-functional | Total |
|---|---|---|---|
| Total obs | 106 | 132 | 238 |
| Correctly predicted | 73 | 57 | 130 |
| Percentage | 68.9% | 43.2% | 54.6% |
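The percentages in Table 8 follow directly from the counts. The Python sketch below rebuilds synthetic actual and predicted vectors that match the reported counts (their ordering is arbitrary and they are not the underlying data) and recomputes the three shares; it also confirms that the implied number of sources predicted functional, 73 + 75 = 148, matches the total in Table 9. The function name is hypothetical.

```python
from typing import Dict, Sequence


def fit_summary(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Share of sources predicted correctly, overall and by actual status."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have equal length")
    pairs = list(zip(actual, predicted))
    func = [p for a, p in pairs if a == 1]
    nonfunc = [p for a, p in pairs if a == 0]
    return {
        "functional": sum(p == 1 for p in func) / len(func),
        "non_functional": sum(p == 0 for p in nonfunc) / len(nonfunc),
        "total": sum(a == p for a, p in pairs) / len(pairs),
    }


if __name__ == "__main__":
    # Synthetic vectors rebuilt from the Table 8 counts (ordering is arbitrary):
    # 106 functional, 73 of them predicted functional;
    # 132 non-functional, 57 of them predicted non-functional.
    actual = [1] * 106 + [0] * 132
    predicted = [1] * 73 + [0] * 33 + [0] * 57 + [1] * 75
    shares = fit_summary(actual, predicted)
    print({k: round(100 * v, 1) for k, v in shares.items()})
    print("predicted functional:", sum(predicted))
```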
Table 9: Model fit by cluster size

| Size of cluster | 2 | 3 | 4 | 5 | 6 | 7 | 8 | All |
|---|---|---|---|---|---|---|---|---|
| Number of obs | 26 | 45 | 28 | 55 | 48 | 28 | 8 | 238 |
| Actual number functional | 10 | 26 | 9 | 30 | 19 | 9 | 3 | 106 |
| Actual proportion functional | 38.5% | 57.8% | 32.1% | 54.6% | 39.6% | 32.1% | 37.5% | 44.5% |
| Number predicted functional | 17 | 23 | 18 | 41 | 27 | 16 | 6 | 148 |
| Proportion predicted functional | 65.4% | 51.1% | 64.3% | 74.6% | 56.3% | 57.1% | 75.0% | 62.2% |
| Number predicted correctly | 13 | 28 | 15 | 28 | 26 | 15 | 5 | 130 |
| Percentage predicted correctly | 50.0% | 62.2% | 53.6% | 50.9% | 54.2% | 53.6% | 62.5% | 54.6% |

7.2 Counterfactuals
Although the estimates of some of my key model parameters are not precise, and the model fit could be improved
by including more explanatory variables, this section conducts some brief counterfactual analysis and discusses what
types of counterfactuals might be explored in future work, and how this would relate to optimal policy.
The reduced form results in section 4 show that positive spatial correlations in water source functionality are stronger
between neighboring water sources that are nearby and of the same type and technology. In the model these correlations are explained by positive spillovers in the form of maintenance cost-reductions that are stronger when neighboring
communities maintain water sources of the same type and technology. A natural counterfactual, therefore, is to analyze how pump functionality would change if these positive spillovers were maximized in some way, either through the
choice of a uniform pump technology, or by locating pumps to maximize spillovers and reduce free riding.
Table 10 shows the results of eliminating the reduction in the strength of spillovers that results from water sources being of a different type or technology, by setting $\delta_1 = \delta_2 = 0$. This means that communities with different types and technologies of water source enjoy positive spillovers as strong as if their water sources were of the same type and technology; this is essentially equivalent to a policy standardizing the technology of all water sources. This counterfactual takes a partial equilibrium approach, in which I do not re-compute the possible equilibria and draw one using an equilibrium selection rule, but simply look at individual best responses and how these change when we set $\delta_1 = \delta_2 = 0$. I find an increase in the functionality rate of 3.3 percentage points (5.3%) as a result of standardizing technology. The increase is largest for communities in small clusters: because there are diminishing returns from neighbors’ investments, strengthening these spillovers has the greatest impact in clusters where they were previously weak, namely those clusters with fewer water sources. Note that we should expect a larger effect once we account for the full equilibrium response when technology types are standardized, by re-computing the possible equilibria and selecting one, as shown by Acemoglu, García-Jimeno, and Robinson [2015]. Comparing the partial and full equilibrium responses to counterfactual policies is a key priority for future work.
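The partial equilibrium counterfactual holds the other communities’ decisions at their observed values and recomputes each community’s best response under $\delta_1 = \delta_2 = 0$. A minimal Python sketch under several assumptions: the cutoff follows equation 14, the distance cutoff $c$ is ignored, cost shocks are fixed at their mean of zero (the text does not state which shock values underlie the predicted counts), and the function names, the three-source cluster and all parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple

Profile = Tuple[int, ...]


def make_ubar(w, psi, ages, types, techs, dist, pays, d1, d2, d3, g1, g2, c0):
    """Return ubar(i, profile) following equation 14 for one cluster (cutoff c ignored)."""
    def ubar(i: int, profile: Profile) -> float:
        spill, access = 0.0, [c0]
        for j, m_j in enumerate(profile):
            if j == i or m_j == 0:
                continue  # only maintaining neighbours give spillovers or access
            numer = (1.0 - d1 * float(types[i] != types[j])
                     - d2 * float(techs[i] != techs[j]))
            spill += max(numer / (1.0 + d3 * dist[i][j]), 0.0)
            access.append(g1 * dist[i][j] + g2 * pays[j])
        return w - psi * ages[i] / (1.0 + spill) + min(access)
    return ubar


def partial_response(profile: Profile, ubar, eps: Sequence[float]) -> List[int]:
    """Each community best-responds to the others' baseline choices (no re-solving)."""
    return [1 if ubar(i, profile) > eps[i] else 0 for i in range(len(profile))]


if __name__ == "__main__":
    # Hypothetical cluster: two India Mark II pumps and one Nira/Tanira pump.
    common = dict(w=2.0, psi=0.5, ages=[10, 12, 12], types=["hand pump"] * 3,
                  techs=["india mark ii", "india mark ii", "nira/tanira"],
                  dist=[[0, 1, 1], [1, 0, 1], [1, 1, 0]], pays=[1, 1, 1],
                  d3=0.5, g1=1.0, g2=1.0, c0=3.0)
    observed, eps = (1, 1, 0), [0.0, 0.0, 0.0]  # shocks fixed at their mean (assumed)
    base = partial_response(observed, make_ubar(d1=0.2, d2=0.8, **common), eps)
    cf = partial_response(observed, make_ubar(d1=0.0, d2=0.0, **common), eps)
    print("baseline:", base, "counterfactual:", cf)
```

In this example the Nira/Tanira pump receives little discount from its India Mark II neighbors at baseline and does not maintain, but maintains once the type and technology penalties are removed.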
Table 10: Predicted partial equilibrium functionality response to the standardization of water source technologies

| Cluster size | 2 | 3 | 4 | 5 | 6 | 7 | 8 | All |
|---|---|---|---|---|---|---|---|---|
| Number of obs | 26 | 45 | 28 | 55 | 48 | 28 | 8 | 238 |
| Number predicted functional | 17 | 23 | 18 | 41 | 27 | 16 | 6 | 148 |
| Proportion predicted functional | 65.4% | 51.1% | 64.3% | 74.6% | 56.3% | 57.1% | 75.0% | 62.2% |
| Counterfactual ($\delta_1 = \delta_2 = 0$): number predicted functional | 18 | 26 | 19 | 42 | 28 | 17 | 6 | 156 |
| Counterfactual ($\delta_1 = \delta_2 = 0$): proportion predicted functional | 69.2% | 57.8% | 67.9% | 76.4% | 58.3% | 60.7% | 75.0% | 65.5% |
There are a number of other promising avenues for counterfactual analysis. First, as discussed in section 4.3.1, I will solve the optimal policy question related to the location of water sources upon installation. In a first best world, the social planner can make socially optimal installation and maintenance decisions. However, in reality there is a norm of ‘community based maintenance’, so that maintenance decisions are decentralized. The second best policy therefore chooses the optimal location of pump installations, given that communities solve a private optimization problem, accounting for positive spillovers and free riding. The optimal policy for the social planner is to choose the installation portfolio (given a fixed budget or number of pumps) to maximize the total number of people with access to a functional water point.²⁴ I am able to solve this optimal policy question using estimates from my model.
Another possible policy to increase the pump functionality rate would be to subsidize communities for keeping their
pumps functional in a given time period, as continued functionality produces positive externalities for neighboring
communities. By including the dollar amount that communities charge for water access in my model, rather than just
a ‘pay for use’ dummy variable, I can estimate a ‘willingness to pay for functionality’ for each community, allowing
me to estimate the effect of a conditional transfer policy. The cost and benefit of each of these counterfactuals can
be compared to other possible policy interventions, such as the installation of new water sources, using cost estimates
from the practitioner literature.
8 Conclusion
This paper analyzes the determinants of water pump functionality in Tanzania. Specifically, it estimates the magnitude of positive spillovers and free riding, and the mechanisms through which they act, in the maintenance of water
sources, using a spatial network framework. My reduced form analysis establishes a positive spatial correlation in
the functionality of water sources of the same type, which is strongest when the water sources are of the same technology. There is no evidence of spatial correlation between functionality of water sources of different types. There
are various potential explanations for these correlations, of which the existence of positive spillovers between similar
water sources is the most compelling. The reduced form estimates also give limited evidence for free riding in the
provision of water sources, whereby communities have less incentive to maintain their water source if their neighbors
are maintaining theirs. These results motivate the development of a structural model, to estimate the magnitude of
these effects, and the mechanisms through which they act, in a spatial network game between neighboring communities.
Identification of social interactions in such network games presents two well-known challenges: endogeneity and reflection. To address endogeneity, I exploit a novel aspect of my data: information on the extent of similarity between
different public goods. The two empirical facts derived in the reduced form analysis provide a relatively mild identifying restriction, that spatial correlation in shocks cannot be specific to individual types or technologies of water
sources. This allows me to disentangle the effects of spatially correlated shocks and unobservable variables from positive spillovers between neighboring communities. To address reflection, I exploit the spatial features of my network to
define partially overlapping peer groups that generate ‘peers of peers’ or ‘neighbors of neighbors’ that are not players
in a community’s direct network game, but are players in the games of their neighbors.
I estimate the spatial network game by partitioning the large network into segregated clusters of observations, and
restricting positive spillovers and free riding to occur only within these clusters. For each cluster, I am able to compute the probability of every possible equilibrium, and thus the likelihood of the strategy profile observed in the data,
using an equilibrium selection rule. This allows me to obtain maximum likelihood estimates of the parameters in my
model. Previous research has not estimated spillovers in a single large network using a likelihood approach and this is
a promising avenue for future work; I discuss a possible approach in the Appendix.
24 I will use standardized definitions of ‘access’ as used by the World Bank, UNICEF and other development practitioners.
The reduced form results give strong evidence for positive maintenance spillovers between pumps that are similar,
but limited evidence for free riding on neighbors’ investments in water sources. I estimate that water pumps are 21
percentage points more likely to be functional if the nearest working water source is a pump of the same technology.
My model allows for the evaluation of counterfactual policies, and I estimate that exploiting positive spillovers through
the standardization of water source types and technologies would yield a partial equilibrium response equivalent to an
increase in the functionality rate of 5.3%. The full equilibrium response is likely to be significantly greater, and I will
estimate this, as well as the effect of other possible policies related to the effective provision of water in developing
countries, in future work.
References
Daron Acemoglu, Camilo García-Jimeno, and James A Robinson. State capacity and economic development: A network approach. The American Economic Review, 2015.
A. Alesina, R. Baqir, and W. Easterly. Public goods and ethnic divisions. The Quarterly Journal of Economics, 114
(4):1243–1284, 1999.
Alberto Alesina and Eliana La Ferrara. Participation in heterogeneous communities. The Quarterly Journal of Economics, 115(3):847–904, 2000.
Nizar Allouch. On the private provision of public goods on networks. Journal of Economic Theory, 2015.
T. W. Anderson. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Annals
of Mathematical Statistics, 22:327–51, 1951.
Manuela Angelucci and Vincenzo Di Maro. Programme evaluation and spillover effects. Journal of Development
Effectiveness, pages 1–22, 2015.
O. Bandiera and I. Rasul. Social Networks and Technology Adoption in Northern Mozambique. The Economic Journal,
116(514):869–902, 2006.
Abhijit Banerjee, Dilip Mookherjee, Kaivan Munshi, and Debraj Ray. Inequality, control rights, and rent seeking: Sugar cooperatives in Maharashtra. Journal of Political Economy, 109(1):138–190, 2001.
R. Boadway, Z. Song, and J.-F. Tremblay. Commitment and matching contributions to public goods. Journal of Public
Economics, 91(9):1664–1683, 2007.
Yann Bramoullé, Habiba Djebbari, and Bernard Fortin. Identification of peer effects through social networks. Journal of Econometrics, 150(1):41–55, 2009.
Yann Bramoullé, Rachel Kranton, and Martin D’Amours. Strategic interaction and networks. The American Economic
Review, 104(3):898–930, 2014.
William A Brock and Steven N Durlauf. Discrete choice with social interactions. The Review of Economic Studies, 68
(2):235–260, 2001.
William A Brock and Steven N Durlauf. Identification of binary choice models with social interactions. Journal of
Econometrics, 140(1):52–75, 2007.
Antoni Calvó-Armengol, Eleonora Patacchini, and Yves Zenou. Peer effects and social networks in education. The
Review of Economic Studies, 76(4):1239–1267, 2009.
John Cameron. Social cost-benefit analysis - principles. In John Cameron, Paul Hunter, Paul Jagals, and Katherine
Pond, editors, Valuing Water, Valuing Livelihoods. World Health Organization: IWA Publishing, 2011.
Richard C Carter, Erik Harvey, and Vincent Casey. User financing of rural handpump water services. IRC Symposium
2010, 2010.
T. G. Conley and C. R. Udry. Learning about a New Technology: Pineapple in Ghana. The American Economic
Review, 100(1):35–69, 2010.
H. Cremer and J.-J. Laffont. Public goods with costly access. Journal of Public Economics, 87(9-10):1985–2012, 2003.
Giacomo De Giorgi, Michele Pellizzari, and Silvia Redaelli. Identification of social interactions through partially
overlapping peer groups. American Economic Journal: Applied Economics, pages 241–275, 2010.
Matthew Elliott and Benjamin Golub. A network approach to public goods. 2015.
Kyle Emerick. The efficiency of trading in social networks: Experimental measures from India. 2013.
A. D. Foster and M. R. Rosenzweig. Learning by Doing and Learning from Others: Human Capital and Technical
Change in Agriculture. Journal of Political Economy, 103(6):1176–1209, 1995.
Tim Foster. Predictors of sustainability for community-managed handpumps in sub-Saharan Africa: Evidence from Liberia, Sierra Leone, and Uganda. Environmental Science & Technology, 47(21):12037–12046, 2013.
J. Habyarimana, M. Humphreys, D. N. Posner, and J. M. Weinstein. Why does ethnic diversity undermine public
goods provision? American Political Science Review, 101(4):709–725, 2007.
Peter Harvey and Bob Reed. Rural Water Supply in Africa: Building Blocks for Handpump Sustainability. Water,
Engineering and Development Centre, Loughborough University, 2004.
IRC. Providing a basic level of water and sanitation services that last: cost benchmarks. WASHCost infosheet, 2012.
M. Kremer, J. Leino, E. Miguel, and A. P. Zwane. Spring cleaning: Rural water impacts, valuation and property
rights institutions. The Quarterly Journal of Economics, 126(1):145–205, 2011.
Harold Lockwood and Stef Smits. Supporting Rural Water Supply Moving towards a Service Delivery Approach.
Practical Action Publishing, 2011.
Charles F Manski. Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 60(3):531–542, 1993.
E. Miguel. Tribe or Nation? Nation building and public goods in Kenya versus Tanzania. World Politics, 56(03):
328–362, 2005.
K. Munshi. Social learning in a heterogeneous population: Technology diffusion in the Indian Green Revolution. Journal of Development Economics, 73:185–213, 2004.
S. Mutuswami and E. Winter. Efficient mechanisms for multiple public goods. Journal of Public Economics, 88(3-4):
629–644, 2004.
Ariel Pakes, Jack Porter, Kate Ho, and Joy Ishii. Moment inequalities and their application. 2011.
Katherine Pond and Stephen Pedley. Current situation in access to drinking-water. In John Cameron, Paul Hunter,
Paul Jagals, and Katherine Pond, editors, Valuing Water, Valuing Livelihoods. World Health Organization: IWA
Publishing, 2011.
L. S. Prokopy. The relationship between participation and project outcomes: Evidence from rural water supply projects
in India. World Development, 33(11):1801–1819, 2005.
K. Sansom and L. Koestler. African handpump market mapping study: Summary report for UNICEF WASH Section and
Supply Division. UNICEF, New York, 2009.
C. Schultz and T. Sjostrom. Local public goods, debt and migration. Journal of Public Economics, 80(2):313–337,
2001.
R. W. Schweitzer and J. R. Mihelcic. Community Managed Rural Water Systems: What makes them sustainable?
2011.
Sheetal Sekhri. Wells, water, and welfare: The impact of access to groundwater on rural poverty and conflict. American
Economic Journal: Applied Economics, 6(3):76–102, 2014.
Tanzania National Bureau of Statistics. Tanzanian national census. 2002.
Petra E Todd and Kenneth I Wolpin. Estimating a coordination game in the classroom. forthcoming.
UNICEF. Progress on Sanitation and Drinking-Water, 2014 Update. 2014.
WaterAid. Sustainability Framework. 2011.
WHO, UNICEF. A Snapshot of Progress - 2014 Update, WHO/UNICEF Joint Monitoring Program for Water Supply
and Sanitation. 2014.
World Bank. Living Standards Measurement Survey - Integrated Survey on Agriculture (LSMS-ISA). 2007-08.
World Bank. Gender in Water and Sanitation. World Bank Water and Sanitation Program, 2010.
Appendix
Overlapping networks: partition and integration approach
As discussed in section 6, the estimation procedure I follow in this paper, which uses segregated, non-overlapping clusters,
places strong restrictions on the interactions it allows between neighboring communities. Specifically, it allows positive
spillovers and free riding to occur only within a cluster defined by an administrative area, whose boundaries are to some
extent arbitrary. While a community's maintenance decision may be a function of the decisions of other communities in
its cluster, I do not allow it to be a function of the decisions of communities in other clusters, even if those communities
are closer than some communities within its own cluster. The segregated networks approach is therefore restrictive, and
is less intellectually appealing as a description of behaviors and interactions in my empirical context. In future work, I
will explore methods of estimating the full network model without these restrictions, which may allow my research to
make a methodological contribution. This section gives an overview of my proposed approach.
One promising method for estimating the full network model is a 'partition and integration' approach, a novel method
suggested in the online appendix of Acemoglu, García-Jimeno, and Robinson [2015]. It approximates the likelihood
function by using a prior distribution for the unknown joint distribution of actions of a subset of observations. The
properties of this approximation and of the methodology used to implement it (particularly the choice of index ordering
and the Bayesian updating stage) are not yet known, and learning more about them through simulations will be the
priority of my future work. Because my action space is binary, my empirical application may be an ideal testing ground
for this methodology: specifying a prior joint distribution and performing Bayesian updating are more straightforward
than in a problem with a continuous action space.
As noted at the start of section 6, the individual likelihood from our model is given by:
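$$L_i(m_i \mid N_i(\delta) m_{-i}, X_i, \tilde{X}_{-i}; \psi) = \left[\Phi\left(\frac{\bar{u}_i}{\sigma}\right)\right]^{m_i} \left[1 - \Phi\left(\frac{\bar{u}_i}{\sigma}\right)\right]^{1-m_i} \qquad (16)$$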
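To make equation (16) concrete, here is a minimal Python sketch of the individual probit likelihood. It assumes the index $\bar{u}_i$ has already been computed from $N_i(\delta)m_{-i}$, $X_i$ and $\tilde{X}_{-i}$ and is simply passed in as a number; the function names are hypothetical, and the standard normal CDF is built from `math.erf`.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def individual_likelihood(m_i: int, u_bar_i: float, sigma: float) -> float:
    """Probit likelihood of a binary maintenance decision m_i, as in equation (16).

    u_bar_i stands in for the deterministic part of community i's net payoff
    (in the model it depends on N_i(delta) m_{-i}, X_i and X~_{-i}); in this
    sketch it is supplied directly by the caller.
    """
    if m_i not in (0, 1):
        raise ValueError("m_i must be 0 or 1")
    p = std_normal_cdf(u_bar_i / sigma)  # probability that i maintains its source
    return p ** m_i * (1.0 - p) ** (1 - m_i)


# With u_bar_i = 0 both decisions are equally likely.
print(individual_likelihood(1, 0.0, 1.0))  # 0.5
```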
Without network effects, we could take the product of these individual likelihoods to obtain the joint likelihood,
but in the network case they are not independent. However, a subset of the individual likelihoods are independent of
each other, and we can use this fact to partition observations and approximate the likelihood. To simplify notation
in what follows, I define $N(i) = \{j : n_{ij} > 0\}$ and $M(i) = \{m_j : j \in N(i)\}$. Suppressing the notation for
observables, we can then write the likelihood for $i$ as conditional only on the actions of the communities in its
network:
$$L_i(m_i \mid N_i(\delta) m_{-i}, X_i, \tilde{X}_{-i}; \psi) = L_i(m_i \mid M(i); \psi) \qquad (17)$$
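The neighbor sets $N(i)$ and $M(i)$ can be built directly from the network connections matrix. The sketch below assumes, purely for illustration, that $n_{ij}$ is a 0/1 indicator for two communities lying within a cutoff distance of each other on planar coordinates; the paper's actual weights may differ, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def connections_matrix(coords: Sequence[Tuple[float, float]],
                       cutoff: float) -> List[List[float]]:
    """Hypothetical n_ij: 1.0 if communities i != j lie within `cutoff`, else 0.0."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0.0
             for j in range(n)] for i in range(n)]


def neighbor_set(nmat: Sequence[Sequence[float]], i: int) -> List[int]:
    """N(i) = {j : n_ij > 0}."""
    return [j for j, w in enumerate(nmat[i]) if w > 0]


def neighbor_actions(nmat: Sequence[Sequence[float]], m: Sequence[int],
                     i: int) -> Dict[int, int]:
    """M(i) = {m_j : j in N(i)}, keyed by j so each action stays tied to its community."""
    return {j: m[j] for j in neighbor_set(nmat, i)}


coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
nmat = connections_matrix(coords, cutoff=2.0)
print(neighbor_set(nmat, 0))                 # [1]
print(neighbor_actions(nmat, [1, 0, 1], 0))  # {1: 0}
```

Only the nonzero entries of $n_{ij}$ matter for equation (17), so a sparser matrix (a smaller cutoff) directly shrinks each $M(i)$.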
Next, fixing an ordering of the communities from 1 to $n$, we can write the joint likelihood as:
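$$\begin{aligned}
L(m_1, \ldots, m_n; \psi) &= L_1(m_1 \mid m_2, \ldots, m_n; \psi)\, L(m_2, \ldots, m_n; \psi) && (18)\\
&= \prod_{j=1}^{n-1} L_j(m_j \mid m_{j+1}, \ldots, m_n; \psi)\, L(m_n; \psi) && (19)
\end{aligned}$$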
For any given ordering, there is a subset $A$ of communities for which every community in their network has a higher
index, i.e. communities $j$ such that $k \notin N(j)$ for all $k < j$. For communities in $A$ we can write:
$$L^{A}_j(m_j \mid m_{j+1}, \ldots, m_n; \psi) = L_j(m_j \mid M(j); \psi) \qquad (20)$$
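For a given ordering, membership of $A$ depends only on the network structure. The following sketch (hypothetical function names, 0-based indices rather than the text's 1 to $n$) returns $A$ and, for a community outside $A$, its lower-indexed neighbors, i.e. the neighbors whose actions cannot be conditioned on and must instead be integrated over.

```python
from typing import List, Sequence, Set


def subset_A(neighbors: Sequence[Sequence[int]]) -> Set[int]:
    """Communities j with no network neighbor k < j (k not in N(j) for all k < j).

    neighbors[j] lists N(j) under a fixed ordering 0, ..., n-1.
    """
    return {j for j, nbrs in enumerate(neighbors) if all(k > j for k in nbrs)}


def lower_indexed_neighbors(neighbors: Sequence[Sequence[int]], j: int) -> List[int]:
    """For j outside A: the neighbors with a lower index than j."""
    return sorted(k for k in neighbors[j] if k < j)


# A chain 0 - 1 - 2 plus an isolated community 3.
nbrs = [[1], [0, 2], [1], []]
print(sorted(subset_A(nbrs)))            # [0, 3]
print(lower_indexed_neighbors(nbrs, 2))  # [1]
```

Reordering the communities changes which of them fall in $A$, which is why the choice of ordering is one of the design questions the method leaves open.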
Therefore, the factors of the joint likelihood in equation 19 that belong to communities in $A$ reduce to the network
conditionals in equation 20, and these can simply be multiplied together. For a relatively sparse network (e.g. in my
empirical setting, if I set a low cutoff distance beyond which network connections are set to zero), this subset contains a
significant proportion of communities, although $A^c$ is never empty once the network has at least one link, since the
higher-indexed member of any linked pair has a lower-indexed neighbor. For communities in $A^c$, some network
neighbors have a lower index, so we cannot simply take the product of the conditionals; instead, we integrate over the
distribution of actions of these lower-indexed neighbors. Formally, the conditional likelihood integrates over the likelihood
of $j$'s 'missing' network neighbors, $j_1, \ldots, j_K$:
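$$\begin{aligned}
L^{A^c}_j(m_j \mid m_{j+1}, \ldots, m_n; \psi) &= \int L_j(m_j \mid m_{j_1}, \ldots, m_{j_K}, m_{j+1}, \ldots, m_n; \psi)\, L(m_{j_1}, \ldots, m_{j_K})\, dm_{j_1} \cdots dm_{j_K} && (21)\\
&= \int L_j(m_j \mid M(j); \psi)\, L(m_{j_1}, \ldots, m_{j_K})\, dm_{j_1} \cdots dm_{j_K} && (22)
\end{aligned}$$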
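Because actions are binary, the integral in equations (21) and (22) is in practice a finite sum over the $2^K$ possible action profiles of the missing neighbors. A minimal sketch, assuming the caller supplies the conditional likelihood (with every non-missing neighbor held at its observed action) and a prior standing in for $L(m_{j_1}, \ldots, m_{j_K})$; the independent-Bernoulli prior helper is a hypothetical simplification, not the paper's specification.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple

Config = Tuple[int, ...]  # one action profile (m_j1, ..., m_jK)


def marginal_factor(cond_lik: Callable[[Config], float],
                    joint_prior: Callable[[Config], float],
                    K: int) -> float:
    """Equation (22) with binary actions: sum, over all 2**K profiles of the
    missing neighbors, of the conditional likelihood times the prior weight."""
    total = 0.0
    for config in itertools.product((0, 1), repeat=K):
        total += cond_lik(config) * joint_prior(config)
    return total


def independent_bernoulli_prior(p: Sequence[float]) -> Callable[[Config], float]:
    """Hypothetical prior: missing neighbors act independently, P(m_jk = 1) = p[k]."""
    def prior(config: Config) -> float:
        return math.prod(pk if mk == 1 else 1.0 - pk for pk, mk in zip(p, config))
    return prior


# K = 1: m_j is more likely when the single missing neighbor maintains.
lik = lambda config: 0.9 if config[0] == 1 else 0.4
print(round(marginal_factor(lik, independent_bernoulli_prior([0.5]), 1), 10))  # 0.65
```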
Note that for communities in $A^c$ the initial problem remains: we do not know the joint likelihood. However, for
a relatively sparse network we have greatly reduced the dimensionality of this problem, since the unknown joint
likelihood is now only $L(m_{j_1}, \ldots, m_{j_K})$.
Substituting equations 22 and 20 into equation 19, we get an expression for the joint likelihood:
$$L(m_1, \ldots, m_n; \psi) = \prod_{j \in A} L_j(m_j \mid M(j); \psi) \prod_{j \in A^c} \int L_j(m_j \mid M(j); \psi)\, L(m_{j_1}, \ldots, m_{j_K})\, dm_{j_1} \cdots dm_{j_K} \qquad (23)$$
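Putting the pieces together, the log of equation (23) can be evaluated as below. This is a sketch under loudly stated assumptions: the conditional likelihood is a hypothetical probit whose index is $\alpha + \beta$ times the number of maintaining neighbors, and the missing neighbors' actions receive independent Bernoulli priors `prior_p`; neither choice comes from the paper.

```python
import itertools
import math
from typing import Dict, Sequence


def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cond_lik(m_j: int, nbr_actions: Dict[int, int],
             alpha: float, beta: float, sigma: float) -> float:
    """Hypothetical L_j(m_j | M(j); psi): probit with
    u_bar_j = alpha + beta * (number of neighbors that maintain)."""
    p = phi((alpha + beta * sum(nbr_actions.values())) / sigma)
    return p if m_j == 1 else 1.0 - p


def approx_log_likelihood(m: Sequence[int], neighbors: Sequence[Sequence[int]],
                          prior_p: Sequence[float],
                          alpha: float, beta: float, sigma: float) -> float:
    """Log of equation (23) for communities ordered 0, ..., n-1."""
    loglik = 0.0
    for j, nbrs in enumerate(neighbors):
        observed = {k: m[k] for k in nbrs if k > j}  # higher-indexed: condition on data
        missing = [k for k in nbrs if k < j]         # lower-indexed: integrate out
        if not missing:
            # j in A: the factor is the plain network conditional, eq. (20)
            loglik += math.log(cond_lik(m[j], observed, alpha, beta, sigma))
            continue
        # j in A^c: sum over the 2**K action profiles of the missing neighbors, eq. (22)
        factor = 0.0
        for config in itertools.product((0, 1), repeat=len(missing)):
            weight = math.prod(prior_p[k] if a == 1 else 1.0 - prior_p[k]
                               for k, a in zip(missing, config))
            actions = {**observed, **dict(zip(missing, config))}
            factor += weight * cond_lik(m[j], actions, alpha, beta, sigma)
        loglik += math.log(factor)
    return loglik
```

With $\beta = 0$ there are no network effects, every factor collapses to the individual probit likelihood of equation (16), and the expression reduces to the ordinary product of individual likelihoods, which is a useful sanity check.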
In principle, we can estimate this using a prior distribution for $m_{j_1}, \ldots, m_{j_K}$, though the properties of the
resulting approximate likelihood are not known. However, with a small distance cutoff (a sparse network), the
dimensionality of this integral is not too high, and because the action space in my application is binary, specifying a
prior distribution is more straightforward than in most cases. For communities with just one lower-indexed network
connection, i.e. $K = 1$, the prior distribution is simply Bernoulli, as actions are binary. We can then iteratively update
the Bernoulli success parameter using Bayes' rule and the previous iteration's parameter estimates. For communities
with $K > 1$, I will likely start with a Binomial prior and then update using the more general multinomial distribution.
The priority for future work exploring this method is to simulate data with a relatively sparse network connections matrix
and a binary action space, and then to explore the estimation properties of different algorithms for choosing the index
ordering and performing the Bayesian updating.
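For $K = 1$, one reading of the Bayesian updating step is the standard Bernoulli update below: the prior probability that the missing neighbor maintains its source is reweighted by the conditional likelihood of the observed $m_j$ under each of that neighbor's possible actions, evaluated at the previous iteration's estimate $\hat{\psi}$. This is a hypothetical sketch; the paper leaves the exact updating scheme for future work.

```python
def bayes_update_bernoulli(p_prior: float, lik_if_working: float,
                           lik_if_not: float) -> float:
    """Posterior P(m_k = 1 | m_j) for a single missing neighbor k (K = 1).

    lik_if_working = L_j(m_j | m_k = 1, ...; psi_hat)
    lik_if_not     = L_j(m_j | m_k = 0, ...; psi_hat)
    Both are evaluated at the previous iteration's estimate psi_hat
    (hypothetical inputs supplied by the caller).
    """
    numerator = p_prior * lik_if_working
    denominator = numerator + (1.0 - p_prior) * lik_if_not
    if denominator <= 0.0:
        raise ValueError("likelihoods must not both be zero")
    return numerator / denominator


# A flat prior is pulled toward 'working' when m_j is four times as likely under it.
print(bayes_update_bernoulli(0.5, 0.8, 0.2))  # 0.8
```

In an iterative scheme, this step would alternate with re-estimating $\psi$ given the updated priors until both stabilize; that outer loop is not shown.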
Figures and Tables
Figure 5: Pump components. A breakdown is classified as 'minor' in Table 11 if the malfunctioning components cost less
than $10 to replace.
Table 11: Primary reason given for breakdown of water pump, 2008 Tanzania data

| Breakdown type | Freq. | Percent |
|---|---|---|
| Dried | 143 | 26.34 |
| Major housing issue | 25 | 4.60 |
| Major rising main issue | 121 | 22.28 |
| Minor housing issue | 42 | 7.73 |
| Minor rising main issue | 105 | 19.34 |
| No longer used | 79 | 14.55 |
| Water contaminated | 28 | 5.16 |
| Total | 543 | 100 |
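Combining Table 11 with the classification rule in the Figure 5 caption, the reported frequencies can be aggregated into minor and major breakdowns. A quick arithmetic check in Python (the dictionary keys are simply the table's row labels):

```python
# Frequencies from Table 11 (primary reason for breakdown, 2008 Tanzania data)
freq = {
    "Dried": 143,
    "Major housing issue": 25,
    "Major rising main issue": 121,
    "Minor housing issue": 42,
    "Minor rising main issue": 105,
    "No longer used": 79,
    "Water contaminated": 28,
}
total = sum(freq.values())
percent = {k: round(100 * v / total, 2) for k, v in freq.items()}

minor = freq["Minor housing issue"] + freq["Minor rising main issue"]
major = freq["Major housing issue"] + freq["Major rising main issue"]
print(total)                          # 543
print(round(100 * minor / total, 2))  # 27.07
print(round(100 * major / total, 2))  # 26.89
```

Minor breakdowns (housing and rising main) thus account for about 27.1% of reported breakdowns and major ones for about 26.9%.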
Table 12: Summary statistics for Tanzanian data, 2008

| Variable | Obs | Functional rate |
|---|---|---|
| All water sources (with GPS) | 24,276 | 56.5% |
| Hand pump water sources (with GPS) | 8,046 | 55.7% |
| Hand pump technology 1: Shallow well | 5,475 | 53.7% |
| Hand pump technology 2: Machine drilled | 1,365 | 56.6% |
| Hand pump technology 3: Hand drilled | 1,206 | 64.0% |

| Variable | Obs | Mean | SD |
|---|---|---|---|
| Functional status dummy | 8,046 | 0.56 | 0.50 |
| Distance to working water source, km | 8,046 | 1.69 | 3.72 |
| Age of pump at record | 7,694 | 11.0 | 8.6 |
| Pay for use dummy | 5,571 | 0.17 | 0.38 |
Figure 6: Distribution of distance to nearest working water source (km) and proportion of pumps working at each
distance interval
Table 13: Summary statistics for three degree centrality measures

| Variable | Obs | Mean | Std Dev |
|---|---|---|---|
| Number of links to non-pump water sources | 9,885 | 4.95 | 8.93 |
| Number of links to pumps, different technology | 9,885 | 0.82 | 2.19 |
| Number of links to pumps, same technology | 9,885 | 1.31 | 2.33 |
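The three degree centrality measures in Table 13 count a water source's links by the type of the linked source. A minimal sketch of that counting, assuming a link is any pair of sources within a hypothetical cutoff distance on planar coordinates and that the measures are computed for pumps only; the cutoff rule, the technology labels and the function name are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def degree_measures(coords: Sequence[Tuple[float, float]],
                    tech: Sequence[Optional[str]],
                    cutoff: float) -> List[Tuple[int, int, int]]:
    """For each pump i (tech[i] is not None), count its links to
    (non-pump sources, pumps of a different technology, pumps of the same technology).

    A link is assumed to exist when two sources are within `cutoff` of each other.
    """
    out = []
    for i, ti in enumerate(tech):
        if ti is None:  # non-pump source: not a focal observation in this sketch
            continue
        non_pump = diff_tech = same_tech = 0
        for j, tj in enumerate(tech):
            if j == i or math.dist(coords[i], coords[j]) > cutoff:
                continue  # no self-links, and no link beyond the cutoff
            if tj is None:
                non_pump += 1
            elif tj == ti:
                same_tech += 1
            else:
                diff_tech += 1
        out.append((non_pump, diff_tech, same_tech))
    return out


coords = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 0.0)]
tech = ["shallow_well", "shallow_well", None, "machine_drilled"]
print(degree_measures(coords, tech, cutoff=1.0))  # [(1, 0, 1), (1, 0, 1), (0, 0, 0)]
```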
Figure 7: First stage results of instrumental variables regression

| Early breakdown instrument | (1) | (2) | (3) |
|---|---|---|---|
| Tests of linear term | | | |
| F-test of excluded instruments | 150.84 (0.0000) | 141.92 (0.0000) | 119.42 (0.0000) |
| Angrist-Pischke (2009) chi-squared test of underidentification | 61.27 (0.0000) | 43.91 (0.0000) | 50.39 (0.0000) |
| Angrist-Pischke (2009) F-test of weak identification | 61.23 | 43.91 | 50.10 |
| Tests of squared term | | | |
| F-test of excluded instruments | 83.17 (0.0000) | 80.14 (0.0000) | 62.47 (0.0000) |
| Angrist-Pischke (2009) chi-squared test of underidentification | 33.78 (0.0000) | 24.79 (0.0000) | 26.36 (0.0000) |
| Angrist-Pischke (2009) F-test of weak identification | 33.76 | 24.79 | 26.20 |
| Joint tests | | | |
| Underidentification test (Anderson (1951) LM statistic) | 90.25 (0.0000) | 72.44 (0.0000) | 70.27 (0.0000) |
| Weak identification test (Cragg-Donald (1993) Wald F statistic) | 45.89 | 36.50 | 35.41 |

p-values in parentheses where available.
Angrist-Pischke weak-ID test uses Stock and Yogo (2005) critical values.
Figure 8: Histograms of three degree centrality measures
Table 14: Water source type and technology selection equations, estimated using a probit model. The dependent
variable in specifications (1) and (2) is whether a water source is a hand pump, and in specifications (3) to (6) whether
a water source is of a specific pump technology (Nira/Tanira or SWN 80). Independent variables are community
characteristics taken from the community questionnaire of the LSMS.
(1)
Pump
VARIABLES
Farmer’s cooperative in village
SACCO in village
Islam main religion
Education level of leader
Market in village
Nursery in village
Government primary school
Private primary school
Government secondary school
Government health center
Court
Constant
Observations
District FE
Pseudo R2
(2)
Pump
(3)
Nira
(4)
Nira
(5)
SWN 80
(6)
SWN 80
5.030
5.253
4.794
4.046
1.365
4.76e-09
(150.4)
(281.9)
(310.0)
(821.2)
(486.8)
(1,024)
-4.200***
-5.918
3.258
-3.758
-5.036
-4.854
(1.091)
(85.48)
(1,030)
(821.2)
(477.4)
(284.7)
4.321
3.502
3.345
-0.350
3.592
0.981
(150.4)
(203.8)
(310.0)
(637.7)
(338.0)
(709.1)
1.578***
1.563***
-0.899
-4.673
1.182***
1.182***
(0.481)
(0.491)
(0.586)
(464.5)
(0.353)
(0.353)
0.344
-2.92e-10
-1.424***
1.29e-07
3.004
-6.69e-09
(0.271)
(398.6)
(0.380)
(1,161)
(506.0)
(1,328)
-2.879***
-3.562
0.0604
-0.350
-7.204
-5.776
(0.606)
(203.8)
(0.487)
(637.7)
(167.5)
(709.1)
-8.975
-10.91
-1.983
-7.937
-10.68
-9.465
(150.4)
(294.5)
(1,075)
(1,642)
(849.8)
(980.9)
-6.884
-2.146*
-17.08
9.513
10.04
7.419
(467.6)
(1.100)
(2,179)
(1,265)
(1,087)
(503.9)
-1.013
0.489
-8.001
8.445
9.648
(150.4)
(294.5)
(1,075)
(369.6)
(1,101)
2.391
1.927
5.579
4.108
-6.148
-5.776
(150.4)
(447.7)
(310.0)
(1,325)
(461.1)
(1,532)
-0.475
1.004
-5.070
-4.975
-0.0198
1.182
(150.4)
(294.5)
(310.0)
(975.8)
(593.3)
(1,141)
1.682
4.472
-0.730
16.59
4.538
5.932
(1.586)
(237.0)
(1,030)
(1,746)
(954.9)
(815.4)
574
No
0.328
519
Yes
0.266
574
No
0.283
508
Yes
0.148
574
No
0.516
475
Yes
0.480
Standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1
Omitted religion dummy: Christianity main religion.
Other independent variables not shown: government hospital, private hospital, police station or bank in village.
Probit regressions estimated on a subset of the data with LSMS community information available.
There are fewer observations when I use district fixed effects, as some districts have only a
few LSMS observations and these perfectly predict the type or technology of the water source.