PRIO-GRID Codebook - Peace Research Institute Oslo

PRIO-GRID Codebook
Andreas Forø Tollefsen
Peace Research Institute Oslo (PRIO)
Version 1.01
Last updated March 15, 2012
Please cite:
Tollefsen, Andreas Forø; Håvard Strand & Halvard Buhaug (2012) PRIO-GRID: A unified
spatial data structure. Journal of Peace Research 49(2): 363-374.
PRIO-GRID is a product of the Advanced Conflict Data Catalogue (ACDC) project, funded by the Research
Council of Norway. PRIO-GRID was initiated by Andreas Forø Tollefsen and Halvard Buhaug in 2008 and is
maintained by Andreas Forø Tollefsen and administered by ACDC project leader Håvard Strand at PRIO. Halvard
Buhaug and Gerdis Wischnath have contributed to the finalizing and testing of the data. We thank numerous
colleagues at PRIO, Uppsala University, ETH Zurich, University of Colorado Boulder, Columbia University, and
elsewhere for crucial feedback during the development of the data project. We also appreciate the providers of
the source data that are integrated into the PRIO-GRID framework. The PRIO-GRID website is:
http://www.prio.no/CSCW/Datasets/PRIO-Grid/. Questions and comments should be addressed to Andreas
Forø Tollefsen: [email protected].
1. Introduction
This document describes the development and content of the PRIO-GRID dataset – a
standardized spatial grid structure with global coverage at a resolution of 0.5 x 0.5 decimal
degrees. See Tollefsen, Strand & Buhaug (2012) for additional information on the
background, motivation, and application of PRIO-GRID, and respective source
documentation for third-party data that are included in PRIO-GRID.
PRIO-GRID consists of three components. The first is a set of data files (*.dta; *.csv)
that contain spatially disaggregated data at the grid cell level. This also includes geographic
information systems (GIS) shapefiles containing the polygon grid and the point grid. Many of
the data files contain time-series data with one grid representation per calendar year. The
second component includes open-source replication scripts that were used to generate the
PRIO-GRID data set. These files facilitate replication, modification, and extension of the
original files, including joining of additional geo-referenced data. The third component is the
documentation consisting of the journal article presenting PRIO-GRID (Tollefsen, Strand &
Buhaug 2012) and this codebook.
PRIO-GRID is a versioned dataset, meaning that any changes to the data are released
with a new version number. Higher version numbers indicate more recent data. All files,
scripts, and documentation should reflect these version changes.
2. The development of PRIO-GRID
PRIO-GRID is developed in a relational database management system (RDBMS): PostgreSQL,
with the PostGIS extension supplying the spatial and geometric functionality of the
Structured Query Language (SQL) database. In addition, the Python scripting language is used to
automate the development process. Each Python script connects to the SQL database
using the psycopg2 Python module. The combination of Python and
PostgreSQL/PostGIS gives the PRIO-GRID project the ability to modify spatial
data and read most available spatial data formats. The Python scripts also make it possible
to loop through data at different ranges and time dimensions, which is crucial since the
PRIO-GRID project has a temporal component (represented as annual grids).
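The looping pattern described above can be illustrated with a minimal sketch. It is not taken from the PRIO-GRID replication scripts: the table name `priogrid`, the query, and the connection string passed to `run_all` are hypothetical and serve only to show how one parameterized statement per calendar year might be issued through psycopg2.

```python
from typing import Dict, Iterator, Tuple

FIRST_YEAR, LAST_YEAR = 1946, 2008  # temporal coverage stated in this codebook

# Hypothetical table and column names, used for illustration only.
COUNT_SQL = "SELECT count(*) FROM priogrid WHERE year = %s"


def yearly_statements(first: int = FIRST_YEAR,
                      last: int = LAST_YEAR) -> Iterator[Tuple[str, tuple]]:
    """Yield one parameterized SQL statement per calendar year."""
    for year in range(first, last + 1):
        yield COUNT_SQL, (year,)


def run_all(dsn: str) -> Dict[int, int]:
    """Execute every yearly statement; needs psycopg2 and a reachable database."""
    import psycopg2  # imported here so the sketch loads without the driver

    counts = {}
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            for sql, params in yearly_statements():
                cur.execute(sql, params)
                counts[params[0]] = cur.fetchone()[0]
    finally:
        conn.close()
    return counts
```

Passing the year as a query parameter rather than formatting it into the SQL string keeps the statement identical across iterations.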
PRIO-GRID is released with a 0.5 x 0.5 decimal degree cell resolution. This
corresponds to a cell of roughly 55 x 55 kilometers at the equator (3025 square kilometers
area). Cell area decreases with higher latitudes.
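How quickly cell area shrinks with latitude can be approximated with a short sketch. It assumes a spherical Earth with a mean radius of 6,371 km, which is a simplification and not necessarily the method used to compute the cellarea variable (which, in addition, counts land area only).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)
CELL_SIZE_DEG = 0.5       # PRIO-GRID cell resolution


def full_cell_area_km2(south_lat_deg: float) -> float:
    """Area of a full 0.5 x 0.5 degree cell whose southern edge is at south_lat_deg."""
    lat1 = math.radians(south_lat_deg)
    lat2 = math.radians(south_lat_deg + CELL_SIZE_DEG)
    dlon = math.radians(CELL_SIZE_DEG)
    # Area of a latitude-longitude rectangle on a sphere.
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat2) - math.sin(lat1))


for lat in (0.0, 30.0, 60.0, 80.0):
    print(lat, round(full_cell_area_km2(lat)))
```

Under these assumptions an equatorial cell comes out slightly above the rounded 55 x 55 km figure given above, and a cell at 60 degrees latitude covers about half that area.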
The grid structure is defined by a south-western starting point at x and y
coordinates 180W and 90S and is represented using the WGS84 geographic reference
system. The cell identifier starts at 1 in the south-western corner (column 1, row 1) and
increases by 1 for each column until reaching 720 (column 720, row 1). Numbering
then continues on the next row, beginning at 721 (column 1, row 2). The full grid at 0.5 x 0.5
degrees resolution contains 259,200 cells (720 x 360). A majority of these cells cover water
and other uninhabited areas (notably the Arctic and Antarctica) and are of little relevance in
most applications. To limit file size, the released PRIO-GRID only includes terrestrial grid cells,
although the full grid is maintained and is available on request.
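The numbering scheme implies a simple arithmetic relationship between the cell identifier, the column and row numbers, and the cell centroid. The following sketch spells it out; the function names are illustrative and not part of the PRIO-GRID scripts.

```python
from typing import Tuple

N_COLS, N_ROWS = 720, 360      # columns and rows of the full grid
CELL = 0.5                     # cell size in decimal degrees
WEST, SOUTH = -180.0, -90.0    # south-western starting point


def gid_from_colrow(col: int, row: int) -> int:
    """Cell identifier for a one-based column and row number."""
    return (row - 1) * N_COLS + col


def colrow_from_gid(gid: int) -> Tuple[int, int]:
    """One-based (col, row) for a cell identifier."""
    row, col = divmod(gid - 1, N_COLS)
    return col + 1, row + 1


def centroid(gid: int) -> Tuple[float, float]:
    """Centroid (longitude, latitude) in decimal degrees."""
    col, row = colrow_from_gid(gid)
    return WEST + (col - 0.5) * CELL, SOUTH + (row - 0.5) * CELL


print(colrow_from_gid(49182), centroid(49182))
```

For gid 49182 this yields column 222, row 69 and a centroid at (-69.25, -55.75), which agrees with the first row of Table 1.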
The current version of PRIO-GRID consists of one grid per calendar year for the period
1946–2008. Each cell can be assigned to one and only one country in each yearly file. To
determine country ownership, PRIO-GRID draws on the cShapes dataset (Weidmann et al.
2008). cShapes is a time-series dataset that contains geographic data on the outline of
countries (represented as polygons) since 1946, based on the Gleditsch & Ward (1999) list of
independent states. Grid cells that fall completely within the territory of an independent
state are assigned the corresponding Gleditsch & Ward country code (gwno). Grid cells that
cover the territory of two or more independent states (i.e. the cell intersects with multiple
country polygons) are assigned to the country that covers the largest share of the cell’s area
(Figure 1). Note that while all terrestrial cells are included in all yearly files, country codes
are assigned to cells only in those years that the host country is a member of the Gleditsch &
Ward international system (Figure 2). In case of territorial transfer (e.g., from East Pakistan
to Bangladesh in 1971), a cell is given the country code that applies to the status at the end
of the year, 31 December.
Figure 1. Country code allocation based on majority rule
Figure 2. Grid cells allocated to independent states by year
Note: The graph shows the number of cells with valid country codes. This increases as states become
independent according to the Gleditsch & Ward (1999) list of independent states (solid line). However, each
yearly grid file contains 64,818 cells (dashed line).
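The majority rule for country allocation can be sketched as follows. The overlap areas are assumed to have been computed beforehand (for example in a GIS); the input values are made up, and the handling of exact ties is an assumption because the codebook does not describe it.

```python
from typing import Dict, Optional


def assign_gwno(overlap_km2: Dict[int, float]) -> Optional[int]:
    """Return the country code covering the largest share of a cell.

    overlap_km2 maps a Gleditsch & Ward country code to the area of the
    cell covered by that country in the given year. An empty mapping means
    that no independent state covers the cell, so the code is missing (None).
    Exact ties go to the first country listed (assumption).
    """
    if not overlap_km2:
        return None
    return max(overlap_km2, key=overlap_km2.get)


# Made-up overlap areas for a border cell shared by two countries.
print(assign_gwno({432: 1200.0, 436: 900.0}))
```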
3. Data
This section contains a brief presentation of all variables in the PRIO-GRID files and how they
were imported and modified to fit into the PRIO-GRID data structure. Each file, or attribute
table, contains the same grid cell identifier (gid) and the temporal data tables also include a
year variable. This means that gid + year create a unique identifier in the time-series data
whereas gid constitutes the unique identifier in the static files.
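Because gid + year identify each observation in the time-series tables, two such tables can be combined with a one-to-one join on that pair. The sketch below uses plain dictionaries so that it runs without additional libraries; the capdist value is made up, and the helper names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def index_by_key(rows: Iterable[Row]) -> Dict[Tuple[int, int], Row]:
    """Index cell-year rows by (gid, year) and fail on duplicate keys."""
    index: Dict[Tuple[int, int], Row] = {}
    for row in rows:
        key = (row["gid"], row["year"])
        if key in index:
            raise ValueError(f"duplicate cell year: {key}")
        index[key] = row
    return index


def merge_cell_years(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    """Left-join two cell-year tables on (gid, year)."""
    right_index = index_by_key(right)
    merged = []
    for row in left:
        extra = right_index.get((row["gid"], row["year"]), {})
        merged.append({**extra, **row})  # values from the left table win
    return merged


# gid, year and gwno follow Table 1; the capdist value is made up.
grid = [{"gid": 49182, "year": 1946, "gwno": 155}]
distance = [{"gid": 49182, "year": 1946, "capdist": 2500.0}]
print(merge_cell_years(grid, distance))
```

The duplicate check matters for files such as GeoEPR or Confsitegrid, where a cell year can appear more than once and a one-to-one join is therefore not appropriate.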
3.1. PRIO-GRID
The PRIO-GRID file is the main attribute table in the PRIO-GRID project. This table contains
yearly observations of all terrestrial grid cells (excluding Antarctica and Greenland) for all
calendar years between 1946 and 2008. In total, the table contains 64,818 cells x 63 years =
4,083,534 observations (cell years). The PRIO-GRID table contains eight variables:
gid
is the grid cell identifier, a unique id code for each cell in the grid. Since we
only include the terrestrial cells from the full grid, the gid starts at 49182 and
ends at 249344.
col
denotes column number for the grid cell; column 1 is the westernmost
column in the grid, between 180 and 179.5 decimal degrees W. With one
column per half degree, there are 720 columns in PRIO-GRID.
row
denotes the row number for the grid cell; while row 1 is the southernmost
row (between 90 and 89.5 degrees S) and row 360 is the northernmost row in
the full grid in the underlying data, the released data file contains
observations for cells between row 69 and 347.
xcoord
denotes the longitude coordinate (decimal degrees) for the centroid of the
grid cell. Negative coordinates are located west of the Prime Meridian
(Greenwich) at 0 degrees longitude.
ycoord
denotes the latitude coordinate (decimal degrees) for the centroid of the grid
cell. Negative coordinates are located south of the equator at 0 degrees
latitude.
year
gives the calendar year of observation.
gwno
denotes the numerical country code for the country to which the cell is
allocated, based on the Gleditsch & Ward (1999) system membership list.
Missing values imply non-independent territory. The country code reflects the
status as of 31 December of each year.
cellarea gives the area of the land territory of the grid cell, given in square kilometers.
Table 1. Attribute table of PRIO-GRID
gid     col   row   xcoord   ycoord   year   gwno   cellarea
49182   222   69    -69.25   -55.75   1946   155    4.132284
49183   223   69    -68.75   -55.75   1946   155    3.950606
49184   224   69    -68.25   -55.75   1946   155    268.3174
49185   225   69    -67.75   -55.75   1946   155    242.042
49186   226   69    -67.25   -55.75   1946   155    310.7517
49898   218   70    -71.25   -55.25   1946   155    3.401011
49899   219   70    -70.75   -55.25   1946   155    346.9918
49900   220   70    -70.25   -55.25   1946   155    179.8895
49901   221   70    -69.75   -55.25   1946   155    806.2632
49902   222   70    -69.25   -55.25   1946   155    1098.728
3.2. Distance
The Distance attribute table includes measures of the location of each cell centroid relative
to other entities of interest. It includes seven variables:
gid
is the grid cell identifier, a unique id code for each cell in the grid.
year
gives the calendar year of observation.
bdist1
gives the distance (in kilometers) from the cell centroid to the border of the
nearest contiguous neighboring country (based on Gleditsch & Ward system
membership list). This implies that cells in e.g. Northern Denmark are
measured to the border to Germany even if the straight-line distance to
Norway (across international waters) is shorter. Cells belonging to island
states with no contiguous neighboring country (e.g. New Zealand) are coded
as missing.
bdist2
gives the distance (in kilometers) from the cell centroid to the border of the
nearest neighboring country, regardless of whether the nearest country is
located across international waters. Hence, for cells belonging to island states
(e.g. New Zealand), bdist2 gives the shortest distance to the nearest land
territory of another state.
bdist3
gives the distance (in kilometers) from the cell centroid to the territorial
outline of the country the cell belongs to. For cells located along a coast and
for cells of island states (e.g. New Zealand), bdist3 measures the shortest
straight-line distance to international waters. By definition, bdist3 can never
have higher values than the two other border distance indicators and for 44 %
of the cell years all three border distance estimates are identical.
capdist
gives the distance (in kilometers) from the cell centroid to the national capital
city in the corresponding country (gwno). Geographical coordinates for the
capital cities were derived from the cShapes dataset (Weidmann et al. 2008)
and capture changes over time wherever relevant. Figure 3 visualizes these
straight-line distances.
ttime
gives the estimated cell-average travel time (in minutes) by land
transportation from the cell to the nearest major city with more than 50,000
inhabitants. The values are extracted from a global high-resolution raster map
of accessibility (Nelson 2008) where ttime reflects the average pixel value
within each PRIO-GRID grid cell. Unlike the other distance indicators, ttime
does not contain time-varying information.
Figure 3. Cell-capital distance calculations
Note: The map visualizes straight-line distance calculations from the centroid of each grid cell to the capital city
(year 2008).
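A straight-line distance between a cell centroid and a capital city can be approximated with the haversine formula. This is a sketch under a spherical-Earth assumption; the codebook does not state which distance function was used, so results may differ slightly from the released capdist values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def great_circle_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Haversine distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Distance covered by one degree of longitude along the equator.
print(round(great_circle_km(0.0, 0.0, 1.0, 0.0), 1))
```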
3.3. Conflict
The Conflict attribute table includes information on the outbreak and incidence of armed
intrastate conflict, derived from the UCDP/PRIO Armed Conflict Dataset (Gleditsch et al.
2002; Themnér & Wallensteen 2011) and adapted to PRIO-GRID through two versions of the
PRIO Conflict Site dataset (Raleigh et al. 2006; Dittrich Hallberg 2012). In these datasets, all
conflicts are assigned circular conflict polygons that encompass all reported locations of
fighting in the given calendar year (Figure 4). Only cells assigned to the host country of the
armed conflict (indicated by the gwnoloc indicator in the UCDP/PRIO Armed Conflict
Dataset) are coded in conflict. See source material for the Armed Conflict Dataset and the
Conflict Site codebook for definitions and further information on armed conflict and the georeferencing of the conflict zones.
Figure 4. Circular conflict polygons represented in PRIO-GRID
Note: The figure shows the circular conflict site polygon for the 1990 conflict in eastern Mali. All cells that
intersect the conflict polygon and belong to the conflict-affected country (i.e. Mali; shaded cells) are coded as
being in conflict.
The Conflict table contains 13 variables:
gid
is the grid cell identifier, a unique id code for each cell in the grid.
year
gives the calendar year of observation.
conf
is a simple dummy variable that indicates whether the grid cell was located in
a conflict zone in the given year. This indicator is based on the updated v.3 of
the Conflict Site coding (Dittrich Hallberg 2012) and contains time-varying
values for the period 1989–2008.
civconf
is a dummy variable indicating whether the conf is of type 3 or type 4, that is,
intrastate or internationalized intrastate conflict. See Themnér & Wallensteen
2011 for more information on the type variable.
confold
is a similar conflict incidence dummy based on the older v.2 of the Conflict
Site coding (see Raleigh et al. 2006), initially developed by Buhaug & Gates
(2002). This indicator contains valid values for the period 1946–2005, though
users should be aware that the grid-specific incidence coding in many earlier
conflicts varies little during the course of conflict due to poor information on
the exact location of battle events (meaning that cells that were part of the
conflict zone in one or more years are coded in conflict in all years of the
conflict).
civcfold
is a dummy variable indicating whether the confold is of type 3 or type 4, that
is, intrastate or internationalized intrastate conflict. See Themnér &
Wallensteen 2011 for more information on the type variable.
onset
is a dummy that identifies the grid cell hosting the initial battle location for
each intrastate conflict, based on coding from Holtermann (n.d.).
conflag1 is a spatial conflict lag indicator that gives the share of ongoing conflict
(conf=1) among contiguous cells in the same country in the given year. This
indicator takes on values between 0 and 1. For cells with no contiguous cells
in conflict, conflag1 is coded 0 whereas a cell with 3 of 8 contiguous cells in
conflict is assigned the value 3/8=0.375.
conflag3 is a spatial conflict lag indicator that gives the share of ongoing conflict among
cells within three orders of proximity in the same country in the given year.
conflag5 is a spatial conflict lag indicator that gives the share of ongoing conflict among
cells within five orders of proximity in the same country in the given year.
cfoldlag1 is a spatial conflict lag indicator that gives the share of ongoing conflict
(confold = 1) among contiguous cells in the same country in the given year.
This indicator takes on values between 0 and 1. For cells with no contiguous
cells in conflict, cfoldlag1 is coded 0 whereas a cell with 3 of 8 contiguous cells
in conflict is assigned the value 3/8=0.375.
cfoldlag3 is a spatial conflict lag indicator that gives the share of ongoing conflict among
cells within three orders of proximity in the same country in the given year.
cfoldlag5 is a spatial conflict lag indicator that gives the share of ongoing conflict among
cells within five orders of proximity in the same country in the given year.
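The first-order spatial lag can be sketched as follows. This is one reading of the definition above, not the original implementation: the denominator is taken to be the number of contiguous cells that are present in the data and belong to the same country, cells without any such neighbor return a missing value, and columns are assumed to wrap around at the 180th meridian.

```python
from typing import Dict, Iterator, Optional, Tuple

N_COLS, N_ROWS = 720, 360


def neighbours(gid: int) -> Iterator[int]:
    """Yield the gids of the (up to) eight cells surrounding gid."""
    row, col = divmod(gid - 1, N_COLS)  # zero-based row and column
    for drow in (-1, 0, 1):
        for dcol in (-1, 0, 1):
            if drow == 0 and dcol == 0:
                continue
            nrow = row + drow
            if 0 <= nrow < N_ROWS:
                ncol = (col + dcol) % N_COLS  # wrap at 180 degrees (assumption)
                yield nrow * N_COLS + ncol + 1


def conflict_lag1(gid: int,
                  cells: Dict[int, Tuple[Optional[int], int]]) -> Optional[float]:
    """Share of same-country contiguous cells in conflict.

    cells maps gid -> (gwno, conf) for a single calendar year.
    """
    gwno = cells[gid][0]
    if gwno is None:
        return None
    same_country = [cells[n][1] for n in neighbours(gid)
                    if n in cells and cells[n][0] == gwno]
    if not same_country:
        return None  # no same-country neighbor; treatment is an assumption
    return sum(same_country) / len(same_country)
```

With eight same-country neighbors of which three are in conflict, the function returns 3/8 = 0.375, matching the example in the variable description.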
In some cases the conflict zone polygons in the Conflict Site datasets (Raleigh et al. 2006;
Dittrich Hallberg 2012) overlap, implying that several grid cells host more than one intrastate
armed conflict in a calendar year. Providing information on the conflict IDs for all conflicts
represented in PRIO-GRID while maintaining a 1:1 relationship with the other PRIO-GRID files
(i.e. one observation per grid cell per calendar year) would require multiple conflict ID
indicators with mostly missing or redundant information. Instead, we offer information on
the identity of the conflicts in PRIO-GRID through separate files (Confsitegrid and
Confsiteoldgrid) that contain one observation per conflict per cell year.
3.4. Confsitegrid
The Confsitegrid attribute table provides the conflict IDs for the conflicts that are indicated
by the conf variable, and corresponds to v4-2010 of the UCDP/PRIO Armed Conflict Dataset.
In cases of multiple overlapping conflict polygons, each cell year is listed once per conflict ID.
Georeferenced conflict data are based on the Conflict Site dataset v.3 (Dittrich Hallberg
2012). The Confsitegrid table has valid values for the period 1989–2008, reflecting the
temporal coverage of v.3 of the PRIO Conflict Site dataset. This file contains the following
four variables:
gid
is the grid cell identifier, a unique id code for each cell in the grid.
year
gives the calendar year of observation.
confid
indicates the identity of the conflict(s) for cell years where conf=1. This
indicator is identical to the “ID” variable of the UCDP/PRIO Armed Conflict
Dataset. Additional variables from the UCDP/PRIO dataset (incompatibility,
type, etc) may be imported into PRIO-GRID by joining on the confid.
type
indicates the type of the conflict. The UCDP/PRIO Armed Conflict Dataset
identifies four types of conflict. Type 1 conflicts are extrasystemic conflicts
between a state and a non-state group outside its own territory. Typically
type 1 conflicts are referred to as colonial conflicts. Type 2 conflicts are
interstate conflicts between two or more states. Type 3 conflicts are internal
armed conflicts between a state and one or more opposition groups. Type 4
conflicts are internationalized internal armed conflicts between a state and
one or more opposition groups with intervention from other states on one or
both sides.
3.5. Confsiteoldgrid
The Confsiteoldgrid table contains the same information as Confsitegrid, but is based on the
older v.2 of the PRIO Conflict Site dataset. This table has valid values for the period
1946–2005, reflecting the temporal coverage of v.2 of the PRIO Conflict Site parent dataset
(Buhaug & Gates 2002; Raleigh et al. 2006). This file corresponds to the confold coding and
contains the following four variables:
gid
is the grid cell identifier, a unique id code for each cell in the grid.
year
gives the calendar year of observation.
confid
indicates the identity of the conflict(s) for cell years where confold=1. This
indicator is identical to the “ID” variable of the UCDP/PRIO Armed Conflict
Dataset. Additional variables from the UCDP/PRIO dataset (incompatibility,
type, etc) may be imported into PRIO-GRID by joining on the confid.
type
indicates the type of the conflict. The UCDP/PRIO Armed Conflict Dataset
identifies four types of conflict. Type 1 conflicts are extrasystemic conflicts
between a state and a non-state group outside its own territory. Typically
type 1 conflicts are referred to as colonial conflicts. Type 2 conflicts are
interstate conflicts between two or more states. Type 3 conflicts are internal
armed conflicts between a state and one or more opposition groups. Type 4
conflicts are internationalized internal armed conflicts between a state and
one or more opposition groups with intervention from other states on one or
both sides.
3.6. Socioeco
The Socioeco attribute table includes variables related to the demographic and socioeconomic characteristics of the cell. Note that while the table contains observations of all
terrestrial grid cells in all years, the variables only contain data for certain years (see below
for details). This file contains four variables:
gid
is the grid cell identifier, a unique id code for each cell in the grid.
year
gives the calendar year of observation.
pop
gives the population size for each populated cell in the grid, extracted from
the Gridded Population of the World, version 3 (CIESIN 2005). Population
estimates are available for 1990, 1995, 2000, and 2005. The unit is number of
people in the cell; to obtain true population density estimates this variable
should be divided by the cellarea variable contained in the PRIO-GRID table.
imr
indicates cell-specific infant mortality rate, based on raster data from the
SEDAC Global Poverty Mapping project (Storeygard et al. 2008). The imr
variable is the average pixel value inside the grid cell. The value unit is the
number of children per 10,000 that die before reaching their first birthday.
This indicator is available for the year 2000 only.
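The density calculation suggested for the pop variable, and a rescaling of imr to the more common per-1,000 unit, can be sketched as below. The input values are made up, and returning a missing value for cells with missing or zero land area is an assumption.

```python
from typing import Optional


def pop_density(pop: Optional[float],
                cellarea_km2: Optional[float]) -> Optional[float]:
    """People per square kilometer of land; None when inputs are missing."""
    if pop is None or cellarea_km2 is None or cellarea_km2 <= 0:
        return None
    return pop / cellarea_km2


def imr_per_1000(imr_per_10000: float) -> float:
    """Rescale infant mortality from deaths per 10,000 to deaths per 1,000."""
    return imr_per_10000 / 10.0


print(pop_density(5000.0, 250.0), imr_per_1000(650.0))
```

Since pop is only available for 1990, 1995, 2000, and 2005, density can be computed directly for those years only.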
3.7. Physclimate
The Physclimate attribute table contains variables that tap physical characteristics and
climatic conditions for all cells in PRIO-GRID. The file contains nine variables:
gid
is the grid cell identifier, a unique id code for each cell in the grid.
year
gives the calendar year of observation.
mnt
gives the proportion (i.e., average pixel value, in percent) of mountainous
terrain within each cell. This indicator is based on high-resolution mountain
raster data that were developed for UNEP’s Mountain Watch Report (UNEP-WCMC 2002). This indicator does not vary over time.
frst
gives the percentage of the cell area covered by forest, extracted from the
Globcover 2009 dataset (Bontemps et al. 2009). To compute the frst variable we
aggregate the Globcover 2009 land classes numbered 40, 50, 60, 70, 90, 100,
110, 160 and 170.
irri
gives the proportion (i.e., average pixel value, in percent) of area equipped for
irrigation within each cell. The data is taken from the FAO Aquastat irrigation
raster (Siebert et al. 2007), which indicates pixelated data on areas equipped
for irrigation as of year 2000. This indicator does not vary over time.
prec
gives the yearly total amount of precipitation (in millimeters) in the cell, based
on monthly meteorological statistics from the University of Delaware (NOAA
2011). This indicator contains data for all years 1946–2008.
temp
gives the yearly mean temperature (in degrees Celsius) in the cell, based on
monthly meteorological statistics from the University of Delaware (NOAA
2011). This indicator contains data for all years 1946–2008.
spi6
is an aggregated yearly Standardized Precipitation Index that indicates within-year deviations in precipitation based on monthly data. For each month, the
monthly SPI6 index measures deviation from long-term normal rainfall during
the six preceding months. The values are standardized where deviation
estimates less than 1 standard deviation indicate near normal rainfall; 1 to
1.49 standard deviations from mean indicate moderately dry conditions; 1.5
to 1.99 indicate severely dry conditions; and values in excess of 2 standard
deviations indicate extremely dry conditions. See Guttmann (1999) and
McKee et al. (1993) for more on the SPI6 calculation and drought
measurement. The monthly data were then aggregated to a yearly format and
categorized to indicate anomalous years. The annualized spi6 is coded 1 if
there were at least three consecutive months with SPI6 ≥ 1 in the given grid
cell during the year (moderate drought); spi6 = 1.5 if SPI6 was ≥ 1.5 for at least
two consecutive months (severe drought); and spi6 = 2.5 if both of the above
criteria are met (extreme drought). A spi6 value of 0 indicates that no drought
event occurred during that year. The SPI6 data are extracted from NetCDF
data provided by the International Research Institute for Climate and Society
derived from the Global Precipitation Climatology Centre (GPCC), see Rudolf
et al. (2010) for more information. Cells without spatial or temporal SPI6
coverage are coded as missing.
spi6dist
measures the distance (in kilometers) to the nearest cell with a spi6 value above
0 in the current year. For cells with a spi6 above 0, the spi6dist indicator is
coded 0 kilometers by default. The search is limited to a 500-kilometer radius
from the cell centroid; if no cell with spi6 above 0 lies within 500 kilometers,
spi6dist is set to missing.
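The annual spi6 coding rules can be expressed compactly. The sketch below takes the twelve monthly SPI6 values as dryness deviations in the sense used in the spi6 description (larger values mean drier conditions); the sign convention of the source data, the treatment of missing months, and the restriction to runs within a single calendar year are assumptions.

```python
from typing import Sequence


def longest_run(values: Sequence[float], threshold: float) -> int:
    """Length of the longest run of consecutive values at or above threshold."""
    best = current = 0
    for value in values:
        current = current + 1 if value >= threshold else 0
        best = max(best, current)
    return best


def annual_spi6(monthly: Sequence[float]) -> float:
    """Annual drought category from twelve monthly SPI6 deviations."""
    moderate = longest_run(monthly, 1.0) >= 3   # three consecutive months >= 1
    severe = longest_run(monthly, 1.5) >= 2     # two consecutive months >= 1.5
    if moderate and severe:
        return 2.5
    if severe:
        return 1.5
    if moderate:
        return 1.0
    return 0.0


print(annual_spi6([1.1, 1.2, 1.0] + [0.0] * 9))
```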
3.8. GeoEPR
The GeoEPR attribute table contains information on the identity of spatially defined,
politically relevant ethnic groups settled in the grid cell, derived from the GeoEPR-ETH v.2
dataset (Wucherpfennig et al. 2011). Note that when more than one group exists inside a
cell, this file contains one row per group per cell year, where gid + year + grpid constitute the
unique observation identifier. Five variables are included:
gid
is the grid cell identifier, a unique id code for each cell in the grid.
year
gives the calendar year of observation.
grpid
denotes the ID of the ethnic group in the cell. This ID variable corresponds to
the cowgroup variable of the GeoEPR-ETH and EPR-ETH datasets, which
enables importing additional group-level information on e.g. political status
and population size.
grptype refers to the settlement pattern of the coded ethnic group. Users should be
aware that grptype = 5 refers to both large (often dominant) groups that are
settled throughout a country and small, dispersed groups that (perhaps
unrealistically) are coded with the entire country as their settlement area due
to lack of more precise information. See codebook and additional
documentation for the GeoEPR-ETH dataset for further information.
grparea measures the coded group’s settlement area as proportion (percent) of the
cell’s total land area (cellarea).
3.9. Globcover
The Globcover attribute table includes land cover information within each cell extracted
from the Globcover 2009 project (Bontemps et al. 2009). To preserve as much information
as possible from the original land cover raster dataset, the table includes the area share of each land
cover class within each cell. In cells containing several land classes, each class is listed in a
separate row. Accordingly, gid + lclass constitute the unique observation identifier. See
Bontemps et al. (2009: 52) for more information on the classes. This file contains three
variables:
gid
is the grid cell identifier, a unique id code for each cell in the grid.
lclass
gives the land class code.
lclasspct gives the proportion (percent) of the cell area covered by the given land class.
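The long format of the Globcover table makes it straightforward to aggregate classes per cell. As an illustration, the sketch below sums the land classes listed for the frst variable in section 3.7; the input rows are made up, and this is not necessarily how the released frst values were computed.

```python
from typing import Dict, Iterable, Tuple

# Land class codes listed in the frst variable description (section 3.7).
FOREST_CLASSES = {40, 50, 60, 70, 90, 100, 110, 160, 170}


def forest_share(rows: Iterable[Tuple[int, int, float]]) -> Dict[int, float]:
    """Sum lclasspct over forest classes for each gid.

    rows are (gid, lclass, lclasspct) tuples, one per land class per cell.
    """
    totals: Dict[int, float] = {}
    for gid, lclass, pct in rows:
        totals.setdefault(gid, 0.0)
        if lclass in FOREST_CLASSES:
            totals[gid] += pct
    return totals


# Made-up rows for two cells.
rows = [(49182, 40, 12.5), (49182, 50, 7.5), (49182, 210, 80.0),
        (49183, 210, 100.0)]
print(forest_share(rows))
```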
3.10. Nordhaus
The Nordhaus attribute table contains information on local economic activity. This file is
based on the G-Econ dataset (Nordhaus 2006) which is released at a 1x1 decimal degree
resolution. Hence, each of the G-Econ cells contains four 0.5x0.5 decimal degree cells. This
has to be taken into account when using the data together with the other PG variables. The
Nordhaus file includes the economic gross cell product (GCP) variables from G-Econ,
indicating the level of local economic activity for five-year intervals since 1990. In border
areas, the G-Econ 1x1 decimal degree cells might overlap with PRIO-GRID cells allocated to a
neighboring country. To minimize bias, the PG only extracts G-Econ data for cells that have
the same country code as the G-Econ cell represents. See the original source documentation
for further details on the data. Ten variables are included:
gid
is the grid cell identifier, a unique id code for each cell in the grid.
gcppc90 indicates the gross cell product per capita in 1990, measured in USD. This
variable is calculated by multiplying the gcp90 value with one billion divided
by the population of the G-Econ 1 x 1 decimal degree cell (gcp90
*1,000,000,000 / gpw90).
gcppc95 indicates the gross cell product per capita in 1995, measured in USD.
gcppc00 indicates the gross cell product per capita in 2000, measured in USD.
gcppc05 indicates the gross cell product per capita in 2005, measured in USD.
gcp90
indicates gross cell product in 1990, measured in Billion USD.
gcp95
indicates gross cell product in 1995, measured in Billion USD.
gcp00
indicates gross cell product in 2000, measured in Billion USD.
gcp05
indicates gross cell product in 2005, measured in Billion USD.
gcpqual indicates the quality of the GCP values. Quality is a measure of the quality of
the economic data. Quality = 1 for countries for which the data are consistent,
but it does not capture the quality of the underlying country statistics. In
general, quality < 1 indicates that there are major inconsistencies in one of the
underlying data inputs into GCP. See the G-Econ definition table, available at
http://gecon.yale.edu/.
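The per capita calculation given for gcppc90 can be written out as a small function. The input values are made up, and returning a missing value for cells with missing or zero population is an assumption.

```python
from typing import Optional


def gcp_per_capita(gcp_billion_usd: Optional[float],
                   population: Optional[float]) -> Optional[float]:
    """Gross cell product per capita in USD for a G-Econ 1 x 1 degree cell."""
    if gcp_billion_usd is None or not population:
        return None
    return gcp_billion_usd * 1_000_000_000 / population


# Made-up values: 0.5 billion USD shared by 250,000 people.
print(gcp_per_capita(0.5, 250_000))
```

Because all four PRIO-GRID cells inside a G-Econ cell share the same input values, the resulting per capita figure is identical across them.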
4. Adding additional data using the provided shapefile
In addition to the data available in the tables explained above, we provide two shapefiles
that make it possible for users to add their own data. One shapefile with the cell geometry
and one shapefile with the centroid geometry are provided. These files may be used in a GIS
software to extract, join or overlay with other spatial data. These shapefiles may be joined to
the various attribute tables using the gid variable.
We provide two different examples on how to combine data with the PRIO-GRID
using ArcGIS. The first one relates to cases where one wants to combine vector data with the
PRIO-GRID data. In these cases one would use the spatial join function in ArcGIS. If the
relationship is 1:1, use the PRIO-GRID cell shapefile as the target layer and the 3rd party
vector shapefile as the join features. If one has many vector data overlapping with each cell,
one would typically use the 3rd party vector shapefile as the target layer and the PRIO-GRID
cell shapefile as the join feature. This would create a one-to-many join where all the 3rd party
features would be assigned the gid for the PRIO-GRID cell overlaying the 3rd party data.
The other example relates to instances where one wants to aggregate 3rd party raster
data to the PRIO-GRID cells. In these cases one would use the zonal statistics in ArcGIS. The
PRIO-GRID cell shapefile would work as the zonal features, and the 3rd party raster as the
input raster. One would also select the gid variable as the feature id. The gid makes it easy to
merge the output table from the zonal statistics with the rest of the PG data.
Please refer to your GIS software user manual for more information.
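For simple point data, such as geo-referenced events, the regular grid structure also allows the gid to be computed directly from coordinates without GIS software. The sketch below is an alternative to the ArcGIS workflow, not a description of it; points that fall exactly on a cell boundary are assigned to the cell to the north or east, which is an assumption, and the event coordinates are made up.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

N_COLS, N_ROWS = 720, 360
CELL = 0.5


def gid_from_lonlat(lon: float, lat: float) -> int:
    """PRIO-GRID cell identifier for a WGS84 point in decimal degrees."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates outside the WGS84 range")
    # min() keeps points on the eastern or northern edge inside the grid.
    col = min(int(math.floor((lon + 180.0) / CELL)) + 1, N_COLS)
    row = min(int(math.floor((lat + 90.0) / CELL)) + 1, N_ROWS)
    return (row - 1) * N_COLS + col


def events_per_cell(points: Iterable[Tuple[float, float]]) -> Counter:
    """Count (lon, lat) points per grid cell."""
    return Counter(gid_from_lonlat(lon, lat) for lon, lat in points)


# Made-up event locations; the first two fall in the same cell.
events = [(-69.30, -55.80), (-69.10, -55.60), (-68.75, -55.75)]
print(events_per_cell(events))
```

The resulting gids may include water cells that are absent from the released files, so the counts should be joined to the PRIO-GRID table on gid before use.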
5. Using the Python scripts
Information to be added in future updates. Please contact the author in case of questions
regarding the Python scripts.
6. References
Bontemps, Sophie; Pierre Defourny & Eric Van Bogaert (2009) Globcover 2009. Products
Description and Validation Report. European Space Agency.
(http://globcover.s3.amazonaws.com/LandCover2009/GLOBCOVER2009_Validation_Report_1.0.pdf).
Buhaug, Halvard & Scott Gates (2002) The geography of civil war. Journal of Peace Research
39 (4): 417-433.
CIESIN - Center for International Earth Science Information Network, Columbia University;
and Centro Internacional de Agricultura Tropical (CIAT) (2005) Gridded Population of the
World Version 3 (GPWv3): Population Grids. Palisades, NY: Socioeconomic Data and
Applications Center (SEDAC), Columbia University. (http://sedac.ciesin.columbia.edu/gpw/).
Dittrich Hallberg, Johan (2012) PRIO Conflict Site 1989-2008: A geo-referenced dataset on
armed conflict. Conflict Management and Peace Science (In press).
Gleditsch, Kristian S & Michael D Ward (1999) Interstate system membership: A revised list
of the independent states since 1816. International Interactions 25: 393-413.
Gleditsch, Nils Petter; Peter Wallensteen, Mikael Eriksson, Margareta Sollenberg & Håvard
Strand (2002) Armed Conflict 1946–2001: A new dataset. Journal of Peace Research 39(5):
615–637.
Guttmann, Nathaniel B (1999) Accepting the Standardized Precipitation Index: A calculation
algorithm. Journal of the American Water Resources Association 35(2): 311-322.
Holtermann, Helge (n.d.) Location of armed conflict onsets: Documentation of coding
decisions. Unpublished.
McKee, Thomas B; Nolan J Doesken & John Kleist (1993) The relationship of drought
frequency and duration to time scales. Proceedings of the 8th Conference of Applied
Climatology, 17-22 January, Anaheim, CA. American Meteorological Society, Boston, MA:
179-184.
National Oceanic and Atmospheric Administration. 2011. Delaware climate data provided by
the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA. http://www.esrl.noaa.gov/psd/.
Nordhaus, William D (2006) Geography and macroeconomics: New data and new findings.
Proceedings of the National Academy of Sciences of the USA 103(10): 3510-3517.
Rudolf, Bruno; Andreas Becker, Udo Schneider, Anja Meyer-Christoffer & Markus Ziese
(2010) GPCC Status Report, December 2010. Global Precipitation Climatology Centre,
http://gpcc.dwd.de.
Raleigh, Clionadh; David Cunningham, Lars Wilhelmsen & Nils Petter Gleditsch (2006) Conflict
Sites 1946-2005.
(http://www.prio.no/sptrans/1140767671/conflict%20site%20codebook%20v2.pdf).
Siebert, Stefan; Petra Döll, Jippe Hoogeveen, J-M Faurès, Karen Frenken & Sebastian Feick
(2007) Development and validation of the global map of irrigation areas. Hydrology and
Earth System Sciences 9: 535-547.
Storeygard, Adam; Deborah Balk, Marc Levy & Glenn Deane (2008) The global distribution of
infant mortality: A subnational spatial view. Population, Space and Place 14 (3): 209-229.
Themnér, Lotta & Peter Wallensteen (2011) ‘Armed conflict, 1946–2010.’ Journal of Peace
Research 48(4): 525–536.
Tollefsen, Andreas Forø; Håvard Strand & Halvard Buhaug (2012) PRIO-GRID: A unified
spatial data structure. Journal of Peace Research 49(2): 363-374.
UNEP-WCMC World Conservation Monitoring Centre. 2002. Mountain Watch 2002.
http://www.unep-wcmc.org/mountains/mountain_watch/pdfs/WholeReport.pdf.
Weidmann, Nils B; Doreen Kuse & Kristian Skrede Gleditsch (2008) The
geography of the international system: The CShapes Dataset. International Interactions:
Empirical and Theoretical Research in International Relations 36(1): 86-106.
Weidmann, Nils B; Jan Ketil Rød & Lars-Erik Cederman (2010) Representing ethnic groups in
space: A new dataset. Journal of Peace Research 47(4): 491-499.
Wucherpfennig, Julian; Nils B Weidmann, Luc Girardin, Lars-Erik Cederman & Andreas
Wimmer (2011) Politically relevant ethnic groups across space and time: Introducing the
GeoEPR dataset. Conflict Management and Peace Science 20 (10): 1-15.