Project Civil Strife is a collection of dyadic event data sets focusing on political conflict and
cooperation processes among myriad actors within countries. The bulk of the project focused on
South and Southeast Asian nations from 2001 to 2010, and all data were machine coded using the
Xenophon event data coding engine developed by Strategic Analysis Enterprises. Most of the
data are provided at the country level. We have also developed a proprietary multi-method geocoding software package to geo-locate events within districts and provinces.
At this time we are releasing only some of the India data that were used in our Minerva study.
More geo-located data will be made available later; in the meantime, contact Professor Shellman directly
for more information or data needs.
Funding
This project and these data are the culmination of funding from the U.S. National Science Foundation, the Defense
Advanced Research Projects Agency, the Office of the Secretary of Defense, and the Office of Naval Research,
together with Strategic Analysis Enterprises internal research and development funding. These data are
being released as deliverables for National Science Foundation projects (BCS-0904921; SES-0721618; SES-0516545; SES-0214287), but the coding engine used to code the data benefitted
from several other sources of funding, and the data improved over time as a result. Moreover, the
actor and event dictionaries were developed and expanded over time with additional funding.
The agencies and project numbers are provided below:
Direct Funding
2009-11 National Science Foundation Grant, NSCC/SA: Terror, Conflict Processes,
Organizations, & Ideologies: Completing the Picture. DOD-NSF Minerva Initiative (NSF:
BCS-0904921).
2007-09 National Science Foundation Grant, SES-0721618. Domestic Terrorism & Political
Violence: Empirical Models of Government & Dissident Tactics and Strategies in South &
Southeast Asia.
2006-08 National Science Foundation Grant SES-0619997. Research Experience for
Undergraduates (REU) Supplement: Modeling Intranational Conflict-Cooperation
Processes.
2005-08 National Science Foundation Grant, SES-0516545 & SES-0452769. Project Civil
Strife: Multi-Actor Models of Intranational Conflict & Cooperation.
2002-04 National Science Foundation Grant, SES-0214287. Doctoral Dissertation Research
in Political Science: Taking Turns: A Theory and A Model of State-Dissident Interactions.
Indirect Funding (Increased accuracy of software and dictionaries)
2012-13 Office of Naval Research. “Worldwide Integrated Crisis Early Warning System
(W-ICEWS).”
2012-13 Office of Naval Research. “Subregional Modeling of Political Conflict and
Instability.”
2010-13 Office of Naval Research. “Turning Text into Behavioral Processes and Public
Support: Supporting the Next Generation of Conflict Analysis.”
2009-11 Defense Advanced Research Projects Agency (DARPA), Subcontracted through
Lockheed Martin. “Integrated Crisis Early Warning System (ICEWS) – Phase III.”
2009-11 Defense Advanced Research Projects Agency (DARPA), Subcontracted through
Lockheed Martin. “Integrated Crisis Early Warning System (ICEWS) – Phase II.”
2008-09 Defense Advanced Research Projects Agency. Automated Sentiment Analysis.
August 15 – February 15.
2007-08 Defense Advanced Research Projects Agency (DARPA), Subcontracted through
Lockheed Martin. “Integrated Crisis Early Warning System (ICEWS) – Phase I.”
2008-2013 Strategic Analysis Enterprises. Internal Research and Development for
Xenophon, Taxis, Logos, and Pathos.
The Coder
Xenophon is a software program developed to code “who is doing what to whom” within
countries. The program was developed by Strategic Analysis Enterprises, a private corporation,
and licensed to William & Mary to code these data. More information can be obtained by
contacting SAE (www.strategicanalysisenterprises.com). In multiple random samples, the
automated event data were hand graded for accuracy and recall. The date, source, actor, target,
and event all had to be correct for a grader to mark an observation correct. The data are 70%
accurate (+/- 3 percentage points, depending on the sample). In a recall study, coders read reports and hand
coded event data. These data were then compared to the output from Xenophon on the same
stories. The recall scores were greater than 50% (that is, Xenophon coded more than 50% of the
events found by human coders). To our knowledge, these values are the highest accuracy and
recall scores reported for publicly available event data. Hand coding events and hand grading
machine output is a long, painstaking process, and few researchers producing such data
undertake it. We can say with confidence that our data are among the most accurate, if not
the most accurate, event data publicly available today.
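The grading rule and the two scores described above can be illustrated with a minimal sketch. It is not Xenophon's or the project's actual evaluation code: the Event record, the function names, and the sample values are all hypothetical, and the five fields are simply the ones named in the text. Note that accuracy as defined here (the share of machine-coded events that graders marked correct) corresponds to what is often called precision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record holding the five graded fields."""
    date: str
    source: str
    actor: str
    target: str
    event: str


def is_correct(machine: Event, reference: Event) -> bool:
    # A machine-coded event counts as correct only if all five fields
    # match the grader's reference coding (dataclass equality compares
    # every field).
    return machine == reference


def accuracy(graded_pairs):
    """Share of machine-coded events that match the grader's coding."""
    if not graded_pairs:
        raise ValueError("need at least one graded event")
    correct = sum(is_correct(m, ref) for m, ref in graded_pairs)
    return correct / len(graded_pairs)


def recall(human_events, machine_events):
    """Share of human-coded events that the machine also coded."""
    human = set(human_events)
    if not human:
        raise ValueError("need at least one human-coded event")
    return len(human & set(machine_events)) / len(human)


if __name__ == "__main__":
    # Made-up records for illustration only.
    ref = Event("2005-01-02", "wire", "REB", "GOV", "protest")
    bad = Event("2005-01-02", "wire", "REB", "GOV", "attack")
    print(accuracy([(ref, ref), (bad, ref)]))
    print(recall([ref], [ref, bad]))
```

Because a single wrong field makes the whole observation incorrect, this all-or-nothing rule is stricter than grading each field separately.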
Actor Dictionaries
The actor dictionaries are also the culmination of efforts over the last 5-7 years. They were begun
by undergraduate students at the College of William & Mary and the University of Georgia.
Along the way, as they were used in multiple government projects, they benefitted from assistance
from Phil Schrodt’s teams at both the University of Kansas and the Pennsylvania State
University. In addition, as part of the Integrated Crisis Early Warning System (ICEWS) project,
the dictionaries were also modified and maintained by Lockheed Martin researchers. Given all
the work on these dictionaries through the years, they are deemed to be some of the most
complete dictionaries ever produced, especially for this region.
How these data differ from the ICEWS Data
These data are not direct products of the ICEWS project. In particular, they are coded using a
completely different engine, Xenophon. They also adopt a different actor scheme. However, they
correlate very highly with the ICEWS data, in the sense that the data come from virtually the same
set of documents and are generated by a coder that yields accuracies similar to those of Raytheon-BBN’s
Serif coder, which was used to produce the ICEWS data.