electronic seismologist - Seismological Research Letters

ELECTRONIC SEISMOLOGIST
Community Online Resource for Statistical
Seismicity Analysis
J. Douglas Zechar,1,2 Jeanne L. Hardebeck,3
Andrew J. Michael,3 Mark Naylor,4 Sandy Steacy,5
Stefan Wiemer,1 Jiancang Zhuang,6 and the
CORSSA Working Group7
INTRODUCTION
Statistical analysis of seismicity is critical for understanding
earthquake observations, testing proposed prediction and forecast methods, and assessing seismic hazard. Unfortunately,
despite its importance to seismology—especially to those studies that potentially impact public policy—statistical seismology
is mostly ignored in the education of seismologists, and there
has been no central repository for relevant software. To remedy
these deficiencies, and with the broader goal of enhancing the
quality of statistical seismology research, we have begun building the Community Online Resource for Statistical Seismicity
Analysis (CORSSA). CORSSA is an educational platform
that is designed to be authoritative, up-to-date, prominent, and
useful. We anticipate an audience that ranges from beginning
graduate students to experienced researchers.
Every co-author of this article has served as a referee for at
least one seismology manuscript in which the author(s) made a
questionable or incorrect application or interpretation of statistics. We suspect that most readers have had a similar experience.
This is not a matter of stupidity—not even the important kind
championed by Schwartz (2008)—but rather a lack of understanding and/or awareness of sometimes sophisticated mathematical concepts and how they should be applied to uncertain
data. We seek to fill this gap in knowledge, understanding, and
application, and to promote excellence in statistical seismology, by providing the information and resources necessary to
understand and implement the best practices, with the hope
that readers will apply these methods to their own research.
1. Swiss Seismological Service, ETH Zurich, Zurich, Switzerland
2. Lamont-Doherty Earth Observatory, Columbia University,
Palisades, New York, U.S.A.
3. U.S. Geological Survey, Menlo Park, California, U.S.A.
4. School of Geosciences, University of Edinburgh, Edinburgh,
Scotland
5. School of Environmental Sciences, University of Ulster, Northern
Ireland
6. Institute of Statistical Mathematics, Tokyo, Japan
7. See http://www.corssa.org/about/community
Given that seismology is a field of applied physics, it is
reasonable that, starting with only Hooke’s law, students are
taught to derive the wave equation, Snell’s law, reflection/
refraction coefficients, and the behavior of surface waves. But
seismology is also increasingly becoming a field of applied statistics, and few seismology students are taught even the most
basic statistical methods let alone the underlying theory. For
instance, while most seismology texts mention the Gutenberg-Richter magnitude distribution, few include Aki’s (1965) demonstration and Weichert’s (1980) additional treatment suggesting that one should use maximum likelihood to estimate the
model parameters, the a- and b-values. Such disregard for statistics in
seismology texts might be explained by the fact that seismology
evolved from physics and had an early emphasis on theoretical
understanding. Relying on supplementary statistical courses is
an imperfect solution for seismology students: Even with some
basic training, it is rarely simple to apply textbook statistical
procedures to problems of seismicity, where clustering undermines the common assumption of independent data, and issues
of data quality related to seismic networks are unique.
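As a minimal illustration of the maximum-likelihood approach mentioned above, the following Python sketch applies Aki's (1965) estimator, b = log10(e) / (mean(M) - Mc), to a synthetic catalog. It is not taken from any CORSSA article. The function names, the completeness magnitude Mc = 2.0, the optional half-bin correction, and the synthetic data are all assumptions made for illustration, and only the b-value is estimated.

```python
import math
import random


def aki_b_value(magnitudes, m_c, bin_width=0.0):
    """Maximum-likelihood b-value (Aki 1965) for events with M >= m_c.

    bin_width applies the commonly used half-bin correction for magnitudes
    rounded to a fixed interval; leave it at 0.0 for continuous magnitudes.
    Returns the estimate and Aki's first-order standard error, b / sqrt(N).
    """
    sample = [m for m in magnitudes if m >= m_c]
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two events at or above m_c")
    mean_excess = sum(sample) / n - (m_c - bin_width / 2.0)
    if mean_excess <= 0:
        raise ValueError("mean magnitude must exceed the lower cutoff")
    b = math.log10(math.e) / mean_excess
    return b, b / math.sqrt(n)


def synthetic_magnitudes(n, b, m_c, seed=0):
    """Draw n magnitudes from a Gutenberg-Richter (exponential) law above m_c."""
    rng = random.Random(seed)
    beta = b * math.log(10.0)  # rate of the equivalent exponential distribution
    return [m_c + rng.expovariate(beta) for _ in range(n)]


# Hypothetical catalog: 20,000 independent events with a true b-value of 1.0
mags = synthetic_magnitudes(20000, b=1.0, m_c=2.0, seed=42)
b_hat, sigma_b = aki_b_value(mags, m_c=2.0)
print(f"b = {b_hat:.3f} +/- {sigma_b:.3f}")
```

The synthetic events here are independent by construction. Real catalogs are clustered and have magnitude-dependent completeness, which is exactly why textbook estimators such as this one need care when applied to seismicity.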
Because statistics is so little emphasized in seismology
texts, the audience that stands to benefit from CORSSA is
quite varied. CORSSA material can serve undergraduate students as a starting point to understand the issues, and it should
serve graduate students as a resource for their own research.
Moreover, it should serve experienced researchers from outside the statistical seismology community, and even researchers within that group, as a point of reference to enhance the
quality of their work. To serve this diverse audience, CORSSA
covers a wide variety of material, which we categorize using the
following seven themes:
I. Introductory material
II. Basic features of seismicity
III. Statistical foundations
IV. Understanding seismicity catalogs
V. Models and techniques for analyzing seismicity
VI. Earthquake predictability and related hypothesis testing
VII. Data standards
The thematic structure was devised to make it easy for readers
to focus on their personal requirements to get an introduction
to statistical seismology (Theme I), or to learn about the basics
of earthquakes (Theme II), statistics (Theme III), and/or the
intricacies of seismicity catalogs (Theme IV) before moving
on to applications found in Theme V and Theme VI. Theme VII provides information about data formats and standardized
datasets that can be used for testing software.
Seismological Research Letters, Volume 82, Number 5, September/October 2011, p. 686; doi: 10.1785/gssrl.82.5.686
Each of these themes comprises a series of articles. Articles
act as tutorials and rely on previously published, peer-reviewed
literature. Each article deals with a specific task or topic and
includes some subset of the following: discussion of why the topic
is useful for research; a brief review of theory; a list of methods
and software that address this topic; a discussion of trade-offs
between analysis choices; pitfalls to be aware of; example results;
examples of applications in scientific journals; recommendations
for further reading; and next steps for the reader to take.
FEATURES
CORSSA is a collection of review articles related to statistical
seismicity analysis, organized by a few thematic elements, and
supplemented by software packages, data, a glossary, news items,
and discussion forums. To more fully understand this project,
it is useful to compare it with three common contemporary
research outlets: peer-reviewed journals, textbooks, and wikis.
As with textbooks and unlike wikis and regular issues
of journals, a comprehensive design guides CORSSA development. But unlike a book and similar to a wiki, individual
CORSSA articles and other content are made available immediately once they are deemed ready, rather than waiting for
everything to be completed. Moreover, given that technology
now allows a more accurate representation of the ever-evolving,
incremental nature of scientific advancement, the concept of a
final state of knowledge is obsolete—in short, CORSSA articles and content can be revised and updated, and version information can be included when appropriate. Also like a wiki,
large datasets can be curated and presented in the context of an
article and as standalone resources. Because CORSSA is primarily an educational resource, its articles will not contain new
interpretive science; on the contrary, and given that content
can be updated, CORSSA will feature “living” review articles.
We believe that identifying authors and using an optionally anonymous peer-review system provides an authority
that is sometimes missing in anonymous wiki entries (and
Web pages in general). Therefore, as with journals, CORSSA
authors are clearly identified, and articles are peer-reviewed
and subject to editorial approval. By identifying authors, we
also acknowledge their efforts, which are crucial to CORSSA’s
existence. CORSSA articles can be cited in much the same
way as traditional peer-reviewed journal articles. Although we
categorize the articles by theme, the relatively small number of
articles allows us a simple citation scheme without specifying
a volume or issue number: Articles are cited by author(s), year,
and a unique digital object identifier (DOI). CORSSA is not
a traditional journal, so its articles are not indexed in the Web
of Science databases. But because these articles have DOIs, the
Web of Science Cited Reference Search and other tools such as
Google Scholar can track citations.
Recognizing that the portable document format (PDF) is
the current standard for research articles, we present articles as
PDF files. Nevertheless, readers can search the text of all articles
directly via the Web interface, rather than having to open
each article file. The PDF also allows authors to easily include
long equations and in-line vector graphics, an advantage over
most Web-based content, which tends to present equations
and figures as low-resolution images. Authors can also provide
standalone, high-quality graphics that are appropriate for presentations. Because these articles are more educative than most
research articles, we anticipate that authors will include illustrative examples and accompanying code. To accommodate this
need, the CORSSA system allows authors to link an article with
software, data, and accompanying explanatory text (Figure 1).
Like most textbooks and wikis, CORSSA maintains a
glossary of relevant terms. If one of these terms is used in a
CORSSA article, its first occurrence within the article is linked
directly to the definition in the glossary (similar to the electronic version of the New York Times). The glossary includes
community-developed definitions that are specific to statistical
analysis of seismicity, but the definitions are general enough to
be shared by multiple articles, much like a wiki.
Perhaps the most important feature for readers, and unlike
most textbooks and journals, is that CORSSA content is free
to all.
BUILDING CORSSA
In May 2010, 24 scientists from 11 nations attended a workshop in Zürich, Switzerland, to flesh out a plan for CORSSA
and begin drafting an initial set of articles and accompanying material (Figure 2). This was a workshop in the literal
sense: The majority of the time was dedicated to working in
small groups, designing the contents of each thematic section.
During the workshop, the authors of this article formed a
CORSSA executive committee; by volunteering for this committee, we pledged our commitment to implement and publicize CORSSA, including sharing editorial and administrative
responsibilities.
After the conclusion of the Zürich workshop, CORSSA
participants continued drafting articles; soon thereafter, two
article templates were designed and distributed: one for authors
who prefer Microsoft Word and another for authors who prefer LaTeX. These templates provide a consistent look for each
article with minimal typesetting.
The workshop participants agreed to use the Silva content management system for the CORSSA Web presence.
Silva allows us to quickly add and edit all CORSSA content
without requiring detailed knowledge of Web development
technology. Silva has an open source license and was a natural
choice because we rely on the technical support of the Swiss
Seismological Service IT group, which was already familiar
with using and supporting Silva.
Figure 1. Typical CORSSA article landing page with navigation to other CORSSA features.
We worked with colleagues at the ETH-Bibliothek
(http://www.doi.ethz.ch) to obtain DOIs for CORSSA content. Serendipitously, we discovered that ETH-Bibliothek is
a member of the DataCite consortium (http://datacite.org),
which is one of only seven DOI registration agencies worldwide. This drastically reduced the administrative overhead and
cost for DOIs. We reserved the prefix doi:10.5078/corssa, to
which we append a unique eight-digit number for each article.
We note that we could also register DOIs for CORSSA datasets, software, and other content in the same way, but we have
not yet chosen to do so.
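The DOI scheme described above can be checked mechanically. The Python sketch below is an illustration only: the function name is hypothetical, and the hyphen between the prefix and the eight-digit number is inferred from the DOIs in this article's reference list rather than from a formal specification.

```python
import re

# Reserved prefix, a hyphen, then exactly eight digits (an assumed layout)
CORSSA_DOI = re.compile(r"(?:doi:)?10\.5078/corssa-([0-9]{8})")


def parse_corssa_doi(text):
    """Return the eight-digit article number, or None if text is not a CORSSA DOI."""
    match = CORSSA_DOI.fullmatch(text.strip())
    return match.group(1) if match else None


print(parse_corssa_doi("doi:10.5078/corssa-77337879"))  # a CORSSA article DOI
print(parse_corssa_doi("doi:10.1785/gssrl.82.5.686"))   # a journal DOI, rejected
```

The sketch uses `fullmatch` rather than `search` so that a string with extra trailing digits or text is rejected instead of being silently truncated.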
With an initial set of DOI-registered articles and accompanying content, the CORSSA Web presence was officially publicized to attendees of the European Seismological Commission
in Montpellier, France, in September 2010.
CURRENT STATE
In this section, we describe CORSSA as it existed at the time of
this writing; because it is a living resource, we don’t expect that
the description will remain exactly accurate in the future, but
this section provides the reader with an informative snapshot.
We encourage the reader to visit http://www.corssa.org for current information.
At the time of this writing, CORSSA includes seven published articles across five themes. In Theme I, introductory material, Michael and Wiemer (2010) described the motivation for,
and some historical development related to, the CORSSA project. Vere-Jones (2010) adapted his keynote presentation from
the 2007 International Statistical Seismology (StatSei) conference, suggesting how statistical tools can aid seismicity analyses
and how students of seismology can obtain effective statistical
training. As part of Theme III, statistical foundations, Naylor
et al. (2010) mentioned some of the difficulties that a new
researcher may face when attempting exploratory data analysis
with earthquake catalogs, and they provided several practical
exercises and code snippets. Husen and Hardebeck (2010) contributed a review of an important topic that many researchers
neglect—accuracy and precision of earthquake locations—to
Theme IV, understanding seismicity catalogs. They outlined in
clear language how events are usually located or relocated; they
also reported typical assumptions and highlighted the coupled
nature of seismic velocity models and earthquake location estimates. For Theme V, models and techniques for analyzing seismicity,
Hainzl et al. (2010) reviewed work related to spatiotemporal seismicity models based on rate-and-state friction and
Coulomb stress transfer; they supplemented a brief theoretical
treatment with discussion of numerical algorithms for parameter value estimation. Marsan and Wyss (2011) described the
challenges in robustly identifying and understanding seismicity rate changes. In Theme VI, earthquake predictability and
related hypothesis testing, Zechar (2010) compared various
methods for evaluating earthquake predictions and earthquake forecasts, noting advantages and disadvantages for each
strategy. He also contributed several software implementations
and practical applications to accompany the article.
An additional six articles exist in various states of draft.
Gulia et al. (under review) discuss methods for investigating the quality of a seismic catalog, including techniques
for deblasting, or identifying non-tectonic events. Mignan
and Woessner (under review) comprehensively review methods used to estimate catalog completeness—the magnitude
level above which all earthquakes are believed to be reliably
reported. Woessner et al. (under review) tell the story of how
a seismicity catalog is generated and maintained. Zhuang et al.
(forthcoming) provide a broad overview of several statistical
models used to describe seismicity distributions. Iwata (under
review) discusses earthquake triggering caused by forces other
than tectonic loading, e.g., tides and passing seismic waves.
The important issue of declustering—identifying and removing aftershock sequences from catalogs—is reviewed by van
Stiphout et al. (under review).
Moreover, several articles that were planned during the
initial CORSSA workshop have not yet been drafted. The
complete list of envisioned articles is maintained at
http://www.corssa.org/articles/draft_toc.pdf and, while this list associates potential authors with potential articles, we would happily consider additional volunteer authors or article suggestions
(e‑mail [email protected]).
Figure 2. Photographs from initial CORSSA workshop in Zürich, during which participants organized themes, began drafting articles, and posed for photographs.
The CORSSA glossary contains 69 terms. The list of terms
was originally compiled during the organizational meeting in
Zürich, primarily by eavesdropping on the discussions of the
individual working groups; this list was then augmented by
combing through the submitted articles for frequently used
terms. The linking between articles and the glossary is automated with a few simple scripts—one implemented as a macro
for articles drafted using the Word template, and another
implemented as a Java function that operates on LaTeX articles. Although the glossary is closely linked to the articles, it
can also be used as a standalone resource for readers seeking
seismicity-related definitions. The purpose of the CORSSA
glossary is to provide a concise, contextual definition of each
term, but we also provide links to more comprehensive treatments; for example, pointing to relevant research articles, U.S.
Geological Survey Web pages for earthquake-specific terms,
and Wikipedia for statistical terms. We also link some glossary terms to other glossary terms with which they can be contrasted; for example, “moment magnitude” is in this way linked
with “local magnitude.”
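The first-occurrence linking described above can be sketched in a few lines of Python. This is not the CORSSA macro or Java function, whose details are not given here. The function name, the use of LaTeX `\href`, and the example.org URLs are assumptions chosen for illustration.

```python
import re


def link_first_occurrences(text, glossary):
    """Wrap the first occurrence of each glossary term in a LaTeX \\href.

    glossary maps a term to the URL of its definition. When candidate
    matches overlap, the one that starts first is linked (the longer one
    if both start at the same position) and the other is left unlinked.
    """
    candidates = []
    for term, url in glossary.items():
        # Case-insensitive, whole-word match of the term's first occurrence
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        match = pattern.search(text)
        if match:
            candidates.append((match.start(), match.end(), url))

    # Earlier matches first; at equal start, the longer term first
    candidates.sort(key=lambda c: (c[0], -(c[1] - c[0])))

    pieces, cursor = [], 0
    for start, end, url in candidates:
        if start < cursor:
            continue  # overlaps a term that was already linked
        pieces.append(text[cursor:start])
        pieces.append(r"\href{%s}{%s}" % (url, text[start:end]))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical glossary entries and article text
glossary = {
    "moment magnitude": "https://example.org/glossary/moment-magnitude",
    "local magnitude": "https://example.org/glossary/local-magnitude",
    "magnitude": "https://example.org/glossary/magnitude",
}
text = ("The moment magnitude differs from the local magnitude; "
        "moment magnitude saturates less.")
linked = link_first_occurrences(text, glossary)
print(linked)
```

One limitation of this simple approach is that a short term such as "magnitude" stays unlinked when its first occurrence falls inside a longer linked term. A production script would presumably search again for a later free-standing occurrence.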
All contributors to CORSSA are acknowledged on the Web
presence; the list of contributors includes workshop attendees,
article authors, referees, and individuals who shared software.
Because we want CORSSA articles to be as useful and
accurate as is practical, we host forums that allow open communication between the authors of each article and the readers;
these forums also allow communication among readers. At the
time of this writing, these forums have been little used, which
may indicate that the forums are too new, that our reader community is too small to merit this functionality, or perhaps that
readers are not accustomed to this type of interaction. We note
that journals such as Nature and Nature Geoscience allow similar functionality in the form of online comments, and these
too are often unused.
As a service to readers, we maintain a minimal news section that includes a list of recent and upcoming relevant meetings and a growing list of relevant journal articles. We suspect
that these manually curated lists will serve as a convenient central point to access the latest information related to statistical
seismicity research advances. Such a resource is increasingly
useful; as David Foster Wallace pointed out in 1996, as the
amount of information that we daily receive continues to grow,
we need some method for filtering what is important (Lipsky
2010, 38). We intend for this news section to be sparingly used
to announce calls for papers for special issues of journals or
conference sessions.
We have received anecdotal positive feedback regarding
CORSSA articles. The content has been effective
for introducing new and continuing graduate students to complex topics, and article reprints that we brought to conferences
have been very popular souvenirs.
CONCLUSIONS
Less than one year after work began on CORSSA, we have
made tremendous progress in building a resource that we
believe will educate students and researchers.
When designing CORSSA, we made a deliberate decision
to limit the initial scope of the project; as none of the participants had done something quite like this before, and because we
were mostly reliant on volunteer efforts, we resisted the temptation to make an excessively ambitious plan. It was primarily
for this reason that we chose to emphasize statistical seismicity analysis rather than the broader field of statistical seismology. Moreover, seismicity analysis has tended to dominate the
recent StatSei meetings (e.g., Schorlemmer and Jackson 2009).
Nevertheless, nothing about the design of CORSSA
precludes us from expanding to cover other topics within
statistical seismology. As research interests evolve, so too can
CORSSA, provided that a sufficiently energetic community
persists. We suspect that many other subfields would benefit
from a resource similar to what we have designed and implemented, and because so many of the features of CORSSA are
not knowledge domain specific, we hope that it can serve as a
blueprint for others.
ACKNOWLEDGMENTS
Portions of this article appear in slightly different form in the
CORSSA article by Michael and Wiemer (2010). We thank
an anonymous referee for many insightful comments and useful suggestions. We thank Benno Luthiger and Philipp Kästli
for general technical assistance. We thank Angela Gastl and
Francesco Croci for assistance with DOIs. We thank the following organizations for supporting CORSSA: Network of
Research Infrastructures for European Seismology (NERIES),
the Swiss Seismological Service, Southern California
Earthquake Center, and the U.S. Geological Survey. JDZ was
partially supported by NSF grant EAR-0944202. We especially thank Mietta Petronio for her patient and energetic support of this work.
REFERENCES
Aki, K. (1965). Maximum-likelihood estimate of b in the formula log
N = a − bM and its confidence limits. Bulletin of the Earthquake
Research Institute 45, 237–239.
Gulia, L., S. Wiemer, and M. Wyss (under review). Catalog artifacts
and quality control. Community Online Resource for Statistical
Seismicity Analysis; doi:10.5078/corssa-93722864.
Hainzl, S., S. Steacy, and D. Marsan (2010). Seismicity models based
on Coulomb stress calculations. Community Online Resource for
Statistical Seismicity Analysis; doi:10.5078/corssa-32035809.
Husen, S., and J. L. Hardebeck (2010). Earthquake location accuracy.
Community Online Resource for Statistical Seismicity Analysis;
doi:10.5078/corssa-55815573.
Iwata, T. (under review). Earthquake triggering caused by the external
oscillation of stress/strain changes. Community Online Resource for
Statistical Seismicity Analysis; doi:10.5078/corssa-65828518.
Lipsky, D. (2010). Although of Course You End Up Becoming Yourself: A
Road Trip with David Foster Wallace. New York: Broadway, 352 pp.
Marsan, D., and M. Wyss (2011). Seismicity rate changes. Community
Online Resource for Statistical Seismicity Analysis;
doi:10.5078/corssa-25837590.
Michael, A. J., and S. Wiemer (2010). CORSSA: The Community
Online Resource for Statistical Seismicity Analysis. Community
Online Resource for Statistical Seismicity Analysis;
doi:10.5078/corssa-39071657.
Mignan, A., and J. Woessner (under review). Completeness magnitude
in earthquake catalogs. Community Online Resource for Statistical
Seismicity Analysis; doi:10.5078/corssa-00180805.
Naylor, M., K. Orfanogiannaki, and D. Harte (2010). Exploratory data
analysis: Magnitude, space, and time. Community Online Resource
for Statistical Seismicity Analysis; doi:10.5078/corssa-92330203.
Schorlemmer, D., and D. D. Jackson (2009). Seismologists and statisticians establish new research targets. Eos, Transactions, American
Geophysical Union 90 (43); doi:10.1029/2009EO430008.
Schwartz, M. A. (2008). The importance of stupidity in scientific research.
Journal of Cell Science 121, 1,771.
van Stiphout, T., J. Zhuang, and D. Marsan (under review). Seismicity
declustering. Community Online Resource for Statistical Seismicity
Analysis; doi:10.5078/corssa-52382934.
Vere-Jones, D. (2010). How to educate yourself as a statistical seismologist. Community Online Resource for Statistical Seismicity Analysis;
doi:10.5078/corssa-17728079.
Weichert, D. H. (1980). Estimation of the earthquake recurrence parameters for unequal observation periods for different magnitudes.
Bulletin of the Seismological Society of America 70 (4), 1,337–1,346.
Woessner, J., J. L. Hardebeck, and E. Hauksson (under review). What is
an instrumental seismicity catalog? Community Online Resource for
Statistical Seismicity Analysis; doi:10.5078/corssa-38784307.
Zechar, J. D. (2010). Evaluating earthquake predictions and earthquake
forecasts: A guide for students and new researchers. Community
Online Resource for Statistical Seismicity Analysis;
doi:10.5078/corssa-77337879.
Zhuang, J., M. J. Werner, D. Harte, S. Hainzl, and S. Zhou (forthcoming). Basic models of seismicity. Community Online Resource for
Statistical Seismicity Analysis; doi:10.5078/corssa-47845067.
Swiss Seismological Service, ETH Zurich
NO H3
Sonneggstrasse 5
8092 Zurich, Switzerland
J. Douglas Zechar et al.
[email protected]
(J. D. Z.)