Article Volume 14, Number 2 28 February 2013 doi:10.1002/ggge.20067 ISSN: 1525-2027 Toward a semantic web of paleoclimatology Julien Emile-Geay Department of Earth Sciences, University of Southern California, Los Angeles, California, USA ([email protected]) Jason A. Eshleman IO Informatics, Berkeley, California, USA [1] The paleoclimate record is information-rich, yet significant technical barriers currently exist before this record can be used to answer scientific questions. Here we make the case for a universal format to structure paleoclimate data. A simple example demonstrates the scientific utility of such a self-describing way of organizing coral data and meta-data. This example is generalized to a universal ontology that may form the backbone of an open-source, open-access, and crowd-sourced paleoclimate database. The format is designed to enable semantic searches, and is expected to accelerate discovery on topical scientific problems like climate extremes, the characteristics of natural climate variability, and climate sensitivity to various forcings. Components: 8,100 words, 1 table, 4 figures. Keywords: paleoclimatology; database; data-mining; machine-learning; semantic; archive. Index Terms: 0473 Biogeosciences: Paleoclimatology and paleoceanography (3344, 4900); 3344 Atmospheric Processes: Paleoclimatology (0473, 4900); 1912 Informatics: Data management, preservation, rescue; 1914 Informatics: Data mining. Received 6 August 2012; Revised 10 January 2013; Accepted 10 January 2013; Published 28 February 2013. Emile-Geay J., and J. A. Eshleman (2013), Toward a semantic web of paleoclimatology, Goechem. Geophys. Geosyst., 14, 457–469, doi:10.1002/ggge.20067. 1. Introduction [2] Science is based on building on, reusing and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open. Panton Principles [Murray-Rust et al., 2010] [3] In the past decade or so, the paleoclimate community has embraced these principles to a remarkable extent, so much so that a vast number of proxy records from all parts of the world are now publicly available through online databases like the International Tree ©2013. American Geophysical Union. All Rights Reserved. Ring Data Bank [ITRDB, 2009], the National Climatic Data Center (NCDC) (http://www.ncdc.noaa. gov/paleo/data.html) or PANGAEA (http://www. pangaea.de/). The benefits of making this information freely available are immense. In principle, it may: (1) Facilitate the reproduction of published results, a core principle of modern science, (2) Enable the exploration of all available data, and ease comparisons between all relevant proxy records through meta-searches, (3) Serve as the basis for multiproxy reconstructions of past climates, and (4) Widen the of citizen participation to individuals outside the relatively small community of climate science specialists. 457 Geochemistry Geophysics Geosystems 3 G EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB [4] Yet, as currently archived, this information does not allow to answer societally relevant questions like: “How anomalous is 20th century climate in the context of the past 2000 years?” or “What is the relationship between climate forcing and observed climate variations in the paleoclimate record?” To be sure, no simple answer is expected to either question; nonetheless, a persistent and seldom-recognized barrier to the optimal use of paleoclimate information is that the diversity of proxies, measurements, and chronological information requires a flexible format that maintains sufficient metadata. So far, no universal format has been established to do so; the synthesis of climate proxy information is therefore labor-intensive, unnecessarily tedious, and increases the probability of introducing human errors. Furthermore, such obstacles are common to other geoscientific fields; it was estimated, for instance that NASA scientists spend up to 60% of their time preparing data for comparison with model output—an enormous waste of their precious time [NASA, 2012]. Such considerations have led the National Science Foundation to launch the EarthCube initiative (http:// www.nsf.gov/geo/earthcube/) in June 2011, seeking “transformative concepts and approaches to create integrated data management infrastructures across the geosciences.” [5] Many obstacles to the access to, and analysis of, paleoclimate data would vanish if data and metadata were archived in a machine-readable format, as has been demonstrated in other scientific domains (e.g., BioDash [Neumann and Quan, 2006], QuakeML [Schorlemmer et al., 2004], GenBank [Benson et al., 2011], and MAGE-ML [Spellman et al., 2002]). Microarray experimental metadata and parameters can be searched and complete data sets rapidly retrieved from two centralized best-practice repositories, Gene Expression Omnibus [Edgar et al., 2002] and ArrayExpress [Parkinson et al., 2007]. This ability to rapidly construct novel data sets with all appropriate data and metadata has enabled data-mining and machine-learning algorithms to be applied to a much wider data set than ever possible before. Such work, to give but a few examples, has allowed for the discovery and validation of robust biomarkers for organ transplant rejection [Chen et al., 2010], characterize gene function [Pierre et al., 2010], and detect patterns of gene coexpression [Adler et al., 2009]. Similar breakthroughs would be possible in climate science if online archives conformed to relatively simple principles. [6] The key to those recent success stories has unquestionably been the concept of interoperability (the ability for different systems to readily exchange data); what has enabled it is semantic technology. In this article, we discuss the benefits of designing a 10.1002/ggge.20067 universal, semantically based format for the archival of paleoclimate data. To further motivate this technical operation and illustrate its scientific merit, we start by describing a currently existing format used in recent publications [Emile-Geay et al., 2013a, 2013b], demonstrating how it can enable rapid information extraction (section 2). We then propose semantic extensions of this format (section 3), discussing the pros and cons of the approach and outlining an achievable path toward the stated objectives. We finish by discussing the climate science repercussions of this potentially disruptive technology in section 4. 2. The Power of Well-Organized Data 2.1. Database Format [7] The University of Southern California climate dynamics group recently compiled all publicly available coral records and archived them into a format suitable for the Matlab programming language. The format organizes data as a Matlab structure with fields describing the proxy class (e.g., class: ’CORAL’), the type of measurement (e.g., d18O, Sr/Ca , extension rate, fluorescence), includes the raw chronology (chron) and raw data (data) as vectors, as well as metadata such as latitude (lat, a scalar), longitude (lon, a scalar), the site name (site, a character string), a quick bibliographic reference (reference, a string), and a cite key to a bibliographical software like EndNote or BibTeX. For a concrete example, consider the Sr/Ca record from Palmyra Island from Nurhati et al. [2011]. The corresponding database entry reads: [8] Accompanying the aforementioned fields are others that are calculated from them and enable a rapid parsing of the database. These include the initial year present in the chronology (year_i), the final year (year_f), and the average resolution in months (raw_res). Such quantities will soon prove 458 Geochemistry Geophysics Geosystems 3 G 10.1002/ggge.20067 EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB useful to select records on the basis of their time coverage or resolution. In addition to these fields, one may want to process the raw proxy information to obtain more meaningful quantities. For instance, the data may have been collected at uneven time intervals, so it would need to be interpolated onto a regular time grid for certain types of analysis (e.g., spectral analysis, empirical orthogonal function (EOF) analysis) to be performed. In addition, a monthly-resolved record such as this one exhibits a prominent seasonal cycle that one may want to remove prior to analysis. This can be done very simply by adding the corresponding data vectors chron_reg and data_reg, the new resolution res (in this case, res = raw_res = 1) and a vector of monthly anomalies anom). [9] This strategy had the advantage of enabling a “journaled” approach to data processing; it makes clear which transformations were applied to each proxy record and enables users to go back to the raw data and apply their own processing steps if so desired. Let us now see what scientific information may be derived from this form of organization. 2.2. Analysis [10] The first thing one might want to do is visualize the spatiotemporal coverage of the database. This may be done in a few lines of code, and is illustrated in Figure 1. a) Proxy representation by age (67 records) 40oN 30oN 20oN 10oN 0o o 10 S 18 O 20oS Sr/Ca 30oS other 40oS 1200 1300 1400 1500 1600 1700 1800 1900 Most ancient age resolved b) Proxy availability over time # proxies 60 18 O Sr/Ca other 40 20 0 1500 1550 1600 1650 1700 1750 1800 1850 1900 1950 2000 Time Figure 1. An information-rich representation of the coral database introduced herein. (a) Location of each record, stratified by proxy types d18O (circles), Sr/Ca (squares), “other”, (triangles, covering fluorescence and extension rate) coded by shape and most ancient age resolved. (b) Proxy availability over time, against stratified by proxy type. 459 Geochemistry Geophysics Geosystems 3 G EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB [11] Next, one may want to inspect the data them- selves. It is easy to annualize all records (using DJF averages for those subannually-resolved records) and assemble them all into a single matrix, which is plotted in Figure 2 to obtain a synoptic view of the data set. The well-documented tendency for negative d18O and Sr/Ca values in the late 20th century (probably due to large-scale warming and/ or reorganizations of the hydrological cycle, see Gagan et al. [2000]) is then immediately apparent. [12] Finally, one may want to perform a more revealing analysis of climate variability as portrayed by these records. The extensive meta-data and the structured array syntax allow the user to easily subset records according to several criteria, e.g., finding records containing d18O measurements only, with a quarterly resolution or finer, at least 100 years in length and overlapping. This leaves 25 records out of the original 67 (Table 1), covering the period 1893 to 1984. These records can be readily annualized (taking December–January–February averages for monthly records, cold season averages otherwise), assembled into a matrix and an EOF analysis [Lorenz, 1956] may be readily performed. [13] The first mode of this EOF analysis is pre- sented in Figure 3. As expected, the spatial pattern of sea-surface temperature associated with this 10.1002/ggge.20067 mode bears a strong resemblance to El Niño-Southern Oscillation (ENSO), as confirmed by its temporal expression (PC1), which displays maxima and minima coincident with known ENSO events. The multitaper spectrum of this principal component reveals a relatively strong annual component and dominant interannual variability, consistent with what is independently known about ENSO [e.g., Sarachik and Cane, 2010]. The relative area covered by each circle illustrates the magnitude of the EOF coefficients (“loadings”), while the color refers to their sign (after multiplying d18O values by –1 so that negative excursions in oxygen isotopes correspond to positive temperature anomalies). Their signs are consistent with the ENSO thermal signal expected in each isotopic record, though the disparate amplitudes reflect the varying influence of other factors (climatic or not). [14] That a pan-tropical network of subannually- resolved coral records may capture ENSO variability has been amply demonstrated before [Evans et al., 2000, 2002]. It is merely confirmed by the present analysis, with the difference that this information was extracted in less than 200 lines of code (including visualization) and can be easily modified for different data selection criteria, which amount to a mere logical indexing of the Matlab structured array. Figure 2. Synopsis of the coral matrix. Colors represent Z-scores with records sorted in alphabetical order. 460 Geochemistry Geophysics Geosystems 3 G 10.1002/ggge.20067 EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB Table 1. List of Coral Records Used for the EOF Analysis of Figure 3. The Records Were Obtained From the Full Database (Illustrated in Figure 1) After Applying the Following Conditions: d18O Only, Quarterly Resolution or Finer, at Least 100 Years Long, and Overlapping. Time Resolution is Expressed in Months. The Quantity r Represents the Linear Correlation to Local Temperature Derived From the HadSST2 Data Set [Rayner et al., 2006]. Correlations are Assessed Using a Nonparametric Monte Carlo Test Against 1000 Isopersistent Red Noise Time Series; Significant Correlations are Denoted in Bold 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Site Type Resolution Latitude Longitude Range r Reference Abrolhos Amedée Light. Bunaken Clipperton Atoll Guam Ifaty La Réunion Laing Lombok Strait Madang Mafia Mahe Maiana Atoll Mentawai Moorea Ningaloo Reef Palmyra Rabaul Rarotonga(2R) Rarotonga(3R) Ras Umm Sidd Secas d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O d18O 1 3 1 1 1 2 2 3 1 3 2 1 2 1 2 2 1 1 1 1 2 1 28 S 22 S 2 N 10 N 13 N 23 S 21 S 4 S 8 S 5 S 8 S 5 S 1 N 0 N 18 S 22 S 6 N 4 S 21 S 21 S 28 N 8 N 114 E 166 E 125 E 109 W 145 E 44 E 55 E 145 E 116 E 146 E 40 E 55 E 173 E 99 E 150 W 114 E 162 W 152 E 160 E 160 E 34 E 82 W 1794–1994 1660–1993 1860–1990 1893–1994 1790–2000 1659–1995 1832–1995 1884–1993 1782–1990 1880–1993 1622–1722 1846–1995 1840–1994 1858–1997 1852–1990 1878–1995 1886–1998 1867–1997 1726–1996 1874–2000 1751–1995 1707–1984 –0.27 –0.24 –0.21 –0.27 –0.32 –0.11 –0.07 –0.37 –0.13 –0.22 –0.20 –0.32 –0.46 –0.37 –0.06 –0.24 –0.62 –0.23 –0.18 –0.20 –0.24 –0.11 Kuhnert et al. [1999] Quinn et al. [1998] Charles et al. [2003] Linsley et al. [2000] Asami et al. [2005] Zinke et al. [2004] Pfeiffer et al. [2004] Tudhope et al. [2001] Charles et al. [2003] Tudhope et al. [2001] Damassa et al. [2006] Charles et al. [1997] Urban et al. [2000] Abram et al. [2008] Boiseau et al. [1998] Kuhnert et al. [2000] Cobb et al. [2001] Quinn et al. [2006] Linsley et al. [2008] Linsley et al. [2006] Felis et al. [2000] Linsley et al. [1994] The main reason for the conciseness of this code is the consistent format associated with each record, which enable very easy manipulations of the data set. In the next section, we describe how simple extensions of this principle of data organization may enable much richer scientific questions to be asked of paleoclimate proxy networks. 3. The Paleoclimate Database of the Future [15] Given the imperfections of each proxy type, the value of inferring climate information from multiple proxy classes is no longer in doubt [e.g., Mann, 2002; Li et al., 2010; Wahl et al., 2010], and indeed forms the basis of surface temperature reconstructions of the past millennium and beyond [Jones et al., 2009; NRC, 2006], which have wide implications in the public sphere. The recent community focus embodied by the PAGES 2 K project (http://www.pages-igbp.org/ workinggroups/2k-network) illustrates the widespread recognition of the importance of pooling together all the proxy information available over the Common Era to enable new scientific discoveries to be made. [16] Three leading questions motivated the PAGES 2 K data collation initiative: (1) What did the main patterns and modes of climate variability on subdecadal to orbital time scales look and operate like? (2) How does climate variability and extreme events relate to the important primary forcing factors, namely orbital, solar, and volcanic? (3) What feedbacks operated to modulate the climate response? [17] In an ideal world, the first two questions could be answered relatively easily by applying machinelearning and signal-processing algorithms to an internally consistent database gathering all paleoclimate records contributed to date. Our world, as it stands, is far from ideal, but relatively simple steps, which we describe next, should improve upon this situation. 3.1. Attributes of an Ideal Paleoclimate Data Format [18] Our own experience and discussions with col- leagues suggest that an ideal data format would conform to the following attributes: 1. Parsability: the format should be self-describing (hence machine-readable), and would therefore enable a semantic web [Berners-Lee et al., 2001] of paleoclimate information. 461 Geochemistry Geophysics Geosystems 3 G 10.1002/ggge.20067 EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB Coral 18 O network, 22 sites, EOF 1 - 18% variance 24oN 16oN o 8 N o 0 o 8 S o 16 S 24oS EOF > 0, size magnitude EOF < 0, size magnitude -1 -0.5 0 0.5 Contours: SST ( C) regressed onto PC 1 Principal component timeseries 1 PC 1 MTM spectrum 0.15 Frequency 0.05 0 -0.05 Power PC 1 (unitless) 0.1 -0.1 100 50 20 10 6 4 2 1 10-2 -0.15 10-4 -0.2 1900 1910 1920 1930 1940 1950 1960 1970 1980 Time 10-2 10-1 100 Frequency (cpy) Figure 3. First EOF mode of the network of 25 d18O records presented in Table 1. (top) EOF coefficients (blue < 0, red > 0) overlain on a map of HadSST2 DJF temperature regressed onto the first principal component. d18O values were multiplied by –1 so that positive excursions correspond to warming temperature; (bottom left) time series of the first principal component; (bottom right) multitaper spectrum [Thomson, 1982] of the first principal component; note that the y-axis is the product of the power by the frequency so that the relative area under the spectrum is preserved in this logarithmic scale. Numbers refers to the period of oscillation in years 2. Universality: the format should be platformindependent (readable on all computer and operating systems), and language-independent (readable in major programming languages like Fortran, C, MATLAB, R, Python, IDL, and NCL/PyNGL). It should enable remote data access (e.g., OPeNDAP, http://opendap.org/) to ensure that users rely on the most up-to-date information prior to conducting analysis. 3. Extensibility: the format should require a minimum set of fields to appropriately define a paleoclimate record, but allow for the database to grow organically as more records are added, or—equally important—as more metadata are added to existing records. 4. Citability: a legitimate concern amongst scientists who generate paleoclimate observations is that their work be given due recognition in subsequent analyses and syntheses. Currently, the NCDC relies on an honor system, but one could imagine easier and more stringent tools. This could include, as in the database described in section 2, electronic references to publications (such a digital object identifiers, EndNote or BibTeX cite keys) embedded in the data format. This would allow the automatic citation of peer-reviewed articles as well as data citations [Killeen, 2012] whenever a data record is being used for analysis. For instance, Table 1 was created automatically in Matlab (using LaTeX conventions) to make due recognition of scientific work an automatic part of paleoclimate data analysis. This principle should be foundational to a universal paleoclimate data format. 5. Ergonomy: The format should be easy to use, update and manage. [19] We submit that semantic technology provides the best way of realizing this vision. In the next section, we discuss semantics and ontologies as relevant to paleoclimate knowledge representation. 462 Geochemistry Geophysics Geosystems 3 G EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB 3.2. Semantic Technology for Paleoclimatology 3.2.1. What is the Semantic Web? [20] Anyone familiar with computers knows that they scarcely understand human language. A Google search for instance, parses the World Wide Web for instances of certain keywords, and returns links to documents containing these keywords, ranked by some measure of relevance. The current Web links together strings of characters, without any notion of what they mean. In contrast, semantic technology [e.g., Berners-Lee et al., 2001] aims to assign meaning to its constituents; it defines various entities (e.g., a place, a person, a document) and describes the relationships between them (e.g., Albert Einstein is a person who wrote articles about physics while living in Switzerland, Germany, and California). It does so by utilizing three-component statements or “triples”: a subject of the statement, a predicate expressing some property of the statement, and an object related to the subject via this predicate. For instance, in the expression “Albert Einstein is a person”, the subject of the statement is “Albert Einstein”, the object “person”, and the predicate relating the two is “is a.” Objects of one statement can be the subject of another statement and predicates may themselves be the subjects of other statements, thus creating a potentially limitless graph of information. In this simplicity lies its main strength: elementary relationships are easy to define, but the potentially infinite size of the graph means that complex reasoning may be automatically carried out as an emergent property of the great many triples. Semantic technology underlies the recent success of computers to process queries in human language, such as IBM’s Watson (http://www-03. ibm.com/innovation/us/watson/science-behind_watson. shtml) or Apple’s Siri. They also underlie many recent innovations in knowledge management (http:// www.huffingtonpost.com/steve-hamby/semantic-webtechnology_b_1228883.html). The key to this success is knowledge representation (KR), an emerging field of computer science central to the Semantic Web. KR studies how to formalize relationships so that the information can make sense to machines. At the core of knowledge representation is an ontology. Ontologies define relationships between entities and provide a core vocabulary for organizing information within a domain. A common vocabulary provides for easier interoperability between data modeled using the ontology. The Dublin Core Metadata Initiative (http:// dublincore.org/) provides a vocabulary sufficient to generally describe most published documents; domain-specific ontologies, like the Semantic Web 10.1002/ggge.20067 for Earth and Environmental Technology (SWEET, http://sweet.jpl.nasa.gov/) describe classes of entities, relationships between classes and relationship more specific to modeling Earth Science data. Subclasses inherit the properties from parent classes; membership in a subclass implies membership in the parent class. Adherence to an ontology also allows for formal reasoning whereby properties not explicitly described can be inferred from known properties. As a simple example from the SWEET ontology, the Greenhouse Effect is a subclass of the general phenomena of Global Warming. By simple inference, any paper discussing the Greenhouse Effect is discussing the broader concept Global Warming. The latter property of such a paper need not be made explicit to be true. [21] Knowledge representation is no magic wand. But while it cannot solve any scientific question per se, it can make many problems far easier. Just as the same integer can be expressed in Roman or Hindu-Arabic representation, the latter makes calculations much easier than the former. Similarly, expressing data via structured vocabularies may help solve many issues related to access, manipulation and interpretation. Simply put, by shifting the burden of description onto the data themselves, one can build leaner code to read, write and manipulate it. [22] In the Earth Sciences, knowledge representation has helped solve a number of real-life problems: setting up virtual observatories [Fox et al., 2009], tracking volcanic effects through several subdisciplines of the geosciences [Fox et al., 2007] or greatly accelerating the analysis of stream metabolism for watershed ecosystem management [Gil et al., 2011]. EarthCube is keenly aware of the potential of KR to solve many data management issues, with an entire working group devoted to ontologies (http://earthcube.ning.com/group/semantics-andontologies). One should note that several ontologies already exist in the Earth Sciences (e.g., SWEET, GeoSCI-ML, http://www.geosciml.org/), and some are specific to EarthCube [Berg-Cross et al., 2012]. Some initial efforts have also been made to encode climate data via ontologies (http://iridl.ldeo.columbia. edu/ontologies/). Whichever format is adopted by paleoclimatologists should be made compatible with those, perhaps via the use of ontology design patterns [Janowicz and Hitzler, 2012a]. 3.2.2. Semantic Pros and Cons [23] When engaging colleagues about semantic tech- nology in paleoclimate, we are always asked this pertinent question: “what can a semantic framework 463 Geochemistry Geophysics Geosystems 3 G EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB achieve that a standard database cannot?” The most honest answer is probably “nothing” [Janowicz and Hitzler, 2012b], but issues of efficiency quickly come to the fore: one can go anywhere on a bicycle, but it is often too slow or impractical to do so. Similarly, while much of EarthCube could conceptually be built with existing tools, the fact that it has not is a testimony to the difficulty of the enterprise; we believe that a semantic framework would lower those barriers. For the sake of discussion, we shall limit ourselves to knowledge representation using the resource description framework (RDF), although it is worth noting that much of the same would be true of non-RDF implementations as well. [24] A graph-data format for representing informa- tion, RDF is in fact the backbone of a semantic web. Bergman [2009] provides a comprehensive overview of semantic technology in RDF, listing no fewer than 60 advantages over competing approaches. RDF is readily expressed (“serialized”) in many formats, including—but not limited to— eXtensible Markup Language (XML) text files. Open-source database backend “triple stores” also exist that make it possible to store RDF in a manner that facilitates distributed querying. When made available as SPARQL endpoints, the data within can be queried across the internet. While much of this is also true of storing data in a relational database, storing data in a triple store does not require a priori schema, unlike relational databases. To extend a data set requires only adding additional triples. Perhaps most importantly, dramatically fewer limits, if any at all, are placed on the extensibility of a data set when compared to a relational database where the initial choice of schema can limit the types of information that can be recorded and the types of queries that can be performed. This is a crucial advantage: science is a dynamic enterprise, and as it progresses, terminology, notation and understanding inevitably change. It is therefore critical that a paleoclimate database be allowed to grow organically. To quote Bergman [2009]: “This very adaptability is what enables RDF to be viewed as data-driven design. We can deal with a partial and incomplete world; we can learn as we go; we can start small and simple and evolve to more understanding and structure; and we can preserve all structure and investments we have previously made.” This flexibility is particularly attractive as large organizations like NSF contemplate adopting a format that may commit them for several decades. [25] Equally importantly, because RDF data come with their own “dictionary” in the form of an 10.1002/ggge.20067 ontology, querying a triple store can be done without prior knowledge of the structure of the data. Distributed querying via SPARQL is aimed at breaking down barriers between the segmented data “silos” where RDBM reigns supreme. RDF is in fact the basis for the Linked Open Data framework [Bizer et al., 2009], a set of best practices for storing and exchanging data on the Web. Publishing paleoclimate data online as Linked Open Data means that they will be instantly accessible to a wide community, helping to valorize the data by encouraging re-use, and will enable machine learning algorithms to find common patterns between paleoclimate data and other sources (e.g., weather station data, climate model output). Finally, there are already libraries for creating, writing, storing, searching, and analyzing RDF databases available in open-source languages like R and Python (http://code.google.com/p/rdflib/). Some open-source triple stores already offer formal reasoning and inferencing capabilities. [26] What are the disadvantages of a semantic representation? An obvious one is that it is more complex than existing tools like, say, netCDF (http://www.unidata.ucar.edu/software/netcdf/). More complexity means a steeper learning curve and, unless anything is done, a slower adoption. However, although netCDF is ideally suited to gridded data characteristic of numerical model output, it is in fact very ill-suited to the storage of very heterogeneous data sets composed of multiple proxies from multiple locations at multiples time resolutions, all hallmarks of paleoclimate networks. To some extent, then, this complexity is warranted by the heterogeneity of the paleoclimate data themselves. Furthermore, upward of 100 “RDFizers” already exist, allowing users to turn an Excel spreadsheet into an RDF representation with minimal programming knowledge. Along those lines, we should note that there are also tools to translate between RDF and the more traditional RDB format (http://www. w3.org/2001/sw/rdb2rdf/), so adopting RDF is anything but a dead-end. It also makes it easy to build Web or desktop applications that assist paleoclimate data contributors in properly formatting the data and metadata prior to uploading it to a central repository, or to their own webpage. [27] Another common concern is the memory over- head of storing data in RDF. In this nascent age of Big Data, paleoclimatologists paradoxically find themselves more plagued by the scarcity, rather than the excess, of observations. Storage space is therefore not a primary concern. Nonetheless, one would obviously want as efficient a storage solution 464 Geochemistry Geophysics Geosystems 3 G EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB as possible. The—at times—more verbose triples may present a storage overhead, although this is mitigated by well-formed RDF that adheres to a robust ontology and binary serializations of the data (e.g., HDT (http://www.w3.org/Submission/2011/ SUBM-HDT-20110330/) or Sesame (http://rivulidevelopment.com/2011/11/binary-rdf-in-sesame/)). [28] There are some practical limitations to utilizing RDF as a means of modeling and storing data. While possible to model as such, RDF is not the most efficient means of storing some tabular data or multidimensional arrays. In such cases, it may be preferable to employ a hybrid approach, separating out metadata—which is efficiently stored as triples—and the source data themselves, maintained in their original format or stored in a relational database. Nonetheless, modeling data as RDF allows for flexible integration from multiple sources without the same level of curation necessary to store data in a relational system. Once modeled, it is possible to model some or all of the graph as relational tables [Ramanujam et al., 2009]. Presently, query performance on graph data in triple stores may underwhelm some users, especially when compared to a relational database. A direct comparison of query performance between data modeled as RDF and data stored in a relational database is difficult to quantify, because performance is a product of the database software and the relational model employed (http://www.w3.org/ wiki/RdfStoreBenchmarking). For many queries, 10.1002/ggge.20067 however, relational models presently outperform triples. The flexible semantic model makes more queries possible, but it is more difficult to optimize. Additionally, while the triple construction of data semantics was proposed almost four decades ago [Abrial, 1974] dedicated triple stores do not have the maturity of relational database management software and tools. 3.3. Building a Paleoclimate Web of Data [29] We now consider practical steps to building a paleoclimate Web of data. The standard should be widely adopted, which requires collective discussion and community consensus. This paper serves as a blueprint for the elaboration of a format of optimal use to the paleoclimate community. [30] In practice, two elements are necessary to a semantic database. The first is an ontology, which describes the objects and the relationships between them. As an illustration, a basic ontology describing relationships between the data and metadata fields of the climate record described in section 2.1 is given in Figure 4; there is no question that it will have to be extended based on community input. The second element needed is a way to store this information. As described above, RDF is optimal for metadata and ontologies, whereas the data themselves may be stored in a universal columnar format like comma-operated values, or in RDF as well (with some storage overhead). Figure 4. Preliminary ontology describing relationships between the data and meta-data fields of the Nurhati et al. [2011] climate record. Several fields are viewed as instances of larger classes (ProxyClass, Site, Reference), which would allow computers to perform operations on all records within a specific class (e.g., if the measurement type is d18O , or if the proxy class is “Tree Ring Width”, or if the resolution is less than 3 months, etc.). All records in such a database would be bound to each other by similar links, allowing machines to automatically process any form of query involving existing information. Such a design would also allow growth, by adding records and/or additional information about each record. 465 Geochemistry Geophysics Geosystems 3 G EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB [31] A specificity of paleoclimate observations is that they almost always consist of time series. Their semantic representation should therefore build this into the initial design, perhaps using dedicated time series representation tools like SDMX-RDF (http://publishing-statistical-data.googlecode.com/svn/ trunk/specs/src/main/html/index.html). This would allow the reuse of a considerable body of R code dealing with SDMX-formatted data (http://opensdmxdevelopers.wikispaces.com/RSDMX). [32] An important problem to address is to ensure that scientists who take the time to upload their data to online repositories (a) conform to this new data standard and (b) do not face significant additional time burden for doing so. This can be done most effectively by providing them with spreadsheet or Web-form templates, which does not require any programming expertise and will automatically organize the data into a form that can be converted to the desired one. The complexity of the data network in the backend should be largely or wholly invisible to the scientist sharing data. Once uploaded, however, the full extent of this data network complexity, including any raw and processed data as well as metadata, should be easily ported directly into analysis tools in a manner similar to that employed by the BioConductor package [Gentleman et al., 2004], where entire microarray data sets can be imported directly from ArrayExpress or Gene Expression Omnibus into R for analysis. [33] Another important consideration is scalability: it should be easy to nucleate the database using existing efforts (cf. section 2) or a similar initiative in R (vpIR, T. Ault, pers. comm., 2012). It should also be easy to grow given a backbone. This leads to the definition of a minimal information standard (part of the ontology) necessary to define a paleoclimate record. At a minimum, our own experience suggests that the record must include: a chronology (time series of dates, one date per sample), one measurement time series (one proxy observation per sample), latitude, longitude, site name, and bibliographic reference. Of course, considerably more information could be added, and indeed, should be. For instance, age model uncertainties in nonannual proxies are best assessed if the actual radiometric dates are available, not just the one age model selected for analysis and publication. One may therefore need to define a standard format for geoscientific chronologies as part of the ontology, incorporating raw measurements so that the chronology can be updated in the light of different calibrations (e.g., updates to the “radiocarbon curve”) as was done (manually) in Anchukaitis and Tierney 10.1002/ggge.20067 [2012]. Designing a universal representation of chronologies used in paleoclimatology would yield additional dividends, such as enabling a more rigorous comparison between records (e.g., a U-series dated cave deposit and a varve-counted lake record) and disseminating statistical best practices via open-source code. [34] Additionally, one should be able to add as much metadata as is thought relevant. One could think of adding information about experimental protocol, proxy interpretation, uncertainties, instrument precision, sample number, reproducibility, as well as tracking various processing steps applied to the raw data (e.g., calibration information, isotopic corrections, standardization, detrending) as illustrated in section 2. This method of defining a minimal information standard has proved successful with microarray data, where minimal information about microarray experiments [Brazma et al., 2001] specifies those core elements for recording data and its provenance, experimental protocols necessary, and versioning. How much or how little metadata to include for each proxy class will require community discussion; several workshops are planned over the next few years as part of the EarthCube roadmap. [35] Semantic technology has the power to greatly simplify the input, organization, intelligibility, and therefore usability of paleoclimate data. It is worth noting, however, that its adoption will take time, and that other structured formats—like the one described in section 2—may provide useful stepping stones along that path. 4. Discussion [36] This article has proposed a new, self-describing format to store paleoclimate data. It is intended as a blueprint to spark discussion among the community, and to hopefully lead to the implementation and deployment of one such technology in the near future. [37] The large-scale adoption of such a system would profoundly affect the way we do science. As demonstrated in section 2.2, a rich data format means that more information can be extracted with less effort from the user. The semantic framework we propose here would enable proxy records to be viewed as instances of certain classes of objects, thus enabling object-oriented programming to be applied to paleoclimatology. A universal, open-source data format would quickly enable software libraries to be built around those objects, and would apply, by design, to all objects within a class. Making this 466 Geochemistry Geophysics Geosystems 3 G EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB code widely available (perhaps on the same Web servers as the data themselves) would enable paleoclimate scientists to easily apply sophisticated data analysis tools to their own data before sharing it, even without much programming knowledge. [38] In a nutshell, a semantic web for paleoclimatol- ogy would allow the power of automation to enter the study of past climates. This would first change the way paleoclimate records are compared to each other, a task done routinely in paleoclimate studies. Currently, each new record is typically stacked against existing ones and a comparison is made. The choice of records used in this comparison is partially dictated by scientific considerations (e.g., gathering proxies thought to be sensitive to a particular climate phenomenon), and partially dictated by ease of access. A universal data format would ensure that all the information relevant to a comparison would be gathered using objective criteria, and using the most appropriate methods (e.g., formally taking time uncertainties into account). Furthermore, a semantic web would radically change the way paleoclimate information is used to evaluate general circulation models, an endeavor at the heart of initiatives like the Paleoclimate Modeling Intercomparison Project (http://pmip3.lsce.ipsl.fr/). It would also transform the integration of paleoclimate records with instrumental variables (i.e., climate reconstruction), again allowing records to be included in a reconstruction only if they fit certain objective criteria (e.g., “give me all annuallyresolved records located within 30 degrees of the equator”). Coupled to process-based models of proxy behavior, it would eventually enable the assimilation of paleoclimate data into climate model trajectories. All these tasks presently require sophisticated tools and a daunting amount of human resources, chiefly provided by volunteers. These tasks would be considerably eased by agreeing upon a common data standard, whose overhead cost would be negligible once the submission of proxy records to online databases is designed to structure the data to conform to these standards (via web forms or spreadsheet templates). Indeed, since a majority of paleoclimate scientist already send their data to the NCDC or PANGAEA, there would be little additional work involved for them once the submission process is standardized. [39] On one hand, this semantic web is expected to bring many benefits: increased relevance of paleoclimate data to future climate projections, increased transparency, increased accessibility, increased reproducibility, and increased educational opportunities. On the other hand, one may easily imagine a scenario 10.1002/ggge.20067 whereby inadequate algorithms will be thoughtlessly applied to crunch through the database and come up with absurd answers to irrelevant questions. Obviously, automation will never be a substitute for knowledge, and expert judgment will always be critical to the interpretation of any result derived from paleoclimate records. The solution to such problems is more collaboration between all members of the paleoclimate community; a common language to describe such data can only help this objective. Furthermore, malicious or ignorant use of climate data is already performed routinely by climate change deniers, and a paleoclimate semantic web would do little to change this. [40] The evolution we propose parallels that of other data-driven fields like bioinformatics, whose recent history has shown both the power and limits of automation. A semantic web of paleoclimatology would create much of the same opportunities, and likely much of the same pitfalls. Yet, in this field as in others, the advantages seem to far outweigh the disadvantages: a machine-readable database of paleoclimate proxies means that paleoclimatologists will be able to put their data in a larger, more useful context, spend less time accessing data, and more time thinking about the scientific implications of their results. Who would not want that? Acknowledgments. [41] J.E.G. acknowledges Tiffany Tsai and Nasim Mirnateghi for help with assembling the coral database. This work was partially supported by grant NA10OAR4310115 from the National Oceanographic and Atmospheric Administration. Data and code necessary to generate the figures of this article are available at http://climdyn.usc.edu/Publications_files/EmileGeay%26Eshleman_SI.zip. References Abram, N. J., M. K. Gagan, J. E. Cole, W. S. Hantoro, and M. Mudelsee (2008), Recent intensification of tropical climate variability in the Indian Ocean, Nat. Geosci., 1, 849–853, doi:10.1038/ngeo357. Abrial, J. (1974), Data semantics, in Data Base Management Proceedings IFIP Working Conference on Data Base Management, edited by J. W. Klimbie and K. L. Koffeman, pp. 1–59, North-Holland Publishing Company, Cargèse, Corsica, France. Adler, P., R. Kolde, M. Kull, A. Tkachenko, H. Peterson, J. Reimand, and J. Vilo (2009), Mining for coexpression across hundreds of datasets using novel rank aggregation and visualization methods, Genome Biol., 10(12), R139 doi:10.1186/gb-2009-10-12-r139. Anchukaitis, K. J., and J. E. Tierney (2012), Identifying coherent spatiotemporal modes in time-uncertain proxy 467 Geochemistry Geophysics Geosystems 3 G EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB paleoclimate records, Clim. Dyn., 1–16, doi:10.1007/s00382012-1483-0. Asami, R., T. Yamada, Y. Iryu, T. M. Quinn, C. P. Meyer, and G. Pauley (2005), Interannual and decadal variability of the western Pacific sea surface condition for the years 1787-2000: Reconstruction based on stable isotope record from a Guam coral, J. Geophys. Res., 110, doi:10.1029/2004JC002555. Benson, D. A., I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and E. W. Sayers (2011), GenBank, Nucleic Acids Res., 39 (Database issue), D32–37, doi:10.1093/nar/gkq1079. Berg-Cross, G., et al. (2012), Semantics and ontologies for earthcube, in Proceedings of the Workshop on GIScience in the Big Data age. Bergman, M. (2009), Advantages and Myths of RDF, http:// www.mkbergman.com/483/advantages-and-myths-of-rdf/ Berners-Lee, T., J. Hendler, and O. Lassila (2001), The semantic web: a new form of Web content that is meaningful to computers will unleash a revolution of new possibilities, Sci. Am., 29–37. Bizer, C., T. Heath, and T. Berners-Lee (2009), Linked data - the story so far, Inter. J. Semantic Web Inform. Syst., 5(3), 1–22, doi:10.4018/jswis.2009081901. Boiseau, M., A. Juillet-Leclerc, P. Yiou, B. Salvat, P. Isdale, and M. Guillaume (1998), Atmospheric and oceanic evidences of El Niño-Southern Oscillation events in the south central Pacific Ocean from coral stable isotopic records over the last 137 years, Paleoceanogr., 13, 671–685, doi:10.1029/ 98PA02502. Brazma, A., et al. (2001), Minimum information about a microarray experiment (MIAME)-toward standards for microarray data, Nat. Genet., 29(4), 365–371 doi:10.1038/ng1201-365. Charles, C. D., D. E. Hunter, and R. G. Fairbanks (1997), Interaction Between the ENSO and the Asian Monsoon in a Coral Record of Tropical Climate, Science, 277(5328), 925–928, doi:10.1126/science.277.5328.925. Charles, C. D., K. Cobb, M. D. Moore, and R. G. Fairbanks (2003), Monsoon-tropical ocean interaction in a network of coral records spanning the 20th century, Mar. Geol., 201, 207–222, doi:10.1016/S0025-3227(03)00217-2. Chen, R., et al. (2010), Differentially expressed RNA from public microarray data identifies serum protein biomarkers for cross-organ transplant rejection and other conditions, PLoS Comput. Biol., 6(9), doi:10.1371/journal.pcbi.1000940. Cobb, K. M., C. D. Charles, and D. E. Hunter (2001), A central tropical Pacific coral demonstrates Pacific, Indian, and Atlantic decadal climate connections, Geophys. Res. Lett., 28, 2209–2212, doi:10.1029/2001GL012919. Damassa, T. D., J. E. Cole, H. R. Barnett, T. R. Ault, and T. R. McClanahan (2006), Enhanced multidecadal climate variability in the seventeenth century from coral isotope records in the western Indian Ocean, Paleoceanogr., 21, PA2016, doi:10.1029/2005PA001217. Edgar, R., M. Domrachev, and A. E. Lash (2002), Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Res., 30(1), 207–210. Emile-Geay, J., K. Cobb, M. Mann, and A. T. Wittenberg (2013), Estimating Central Equatorial Pacific SST variability over the Past Millennium. Part 1: Methodology and Validation, J. Clim., in press doi:10.1175/JCLI-D-11-00510.1. Emile-Geay, J., K. Cobb, M. Mann, and A. T. Wittenberg (2013), Estimating Central Equatorial Pacific SST variability over the Past Millennium. Part 2: Reconstructions and Implications, J. Clim., in press doi:10.1175/JCLI-D-11-00511.1. Evans, M. N., A. Kaplan, and M. A. Cane (2000), Intercomparison of coral oxygen isotope data and historical sea surface temperature (SST): Potential for coral-based SST field reconstructions, Paleoceanogr., 15, 551–562. 10.1002/ggge.20067 Evans, M. N., A. Kaplan, and M. A. Cane (2002), Pacific sea surface temperature field reconstruction from coral d18O data using reduced space objective analysis, Paleoceanogr., 17, 7–1, doi:10.1029/2000PA000590. Felis, T., J. Pätzold, Y. Loya, M. Fine, A. H. Nawar, and G. Wefer (2000), A coral oxygen isotope record from the northern Red Sea documenting NAO, ENSO, and North Pacific teleconnections on Middle East climate variability since the year 1750, Paleoceanogr., 15, 679–694, doi:10.1029/ 1999PA000477. Fox, P., D. McGuinness, R. Raskin, and K. Sinha (2007), A volcano erupts: semantically mediated integration of heterogeneous volcanic and atmospheric data, in Proceedings of the ACM First Workshop on CyberInfrastructure: Information Management in eScience, CIMS ’07, pp. 1–6, ACM, New York, NY, USA, doi:10.1145/1317353.1317355. Fox, P., D. L. McGuinness, L. Cinquini, P. West, J. Garcia, J. L. Benedict, and D. Middleton (2009), Ontology-supported scientific data frameworks: The virtual solar-terrestrial observatory experience, Comput. Geosci., 35(4), 724–738, doi:10.1016/j.cageo.2007.12.019. Gagan, M. K., L. K. Ayliffe, J. W. Beck, J. E. Cole, E. R. M. Druffel, R. B. Dunbar, and D. P. Schrag (2000), New views of tropical paleoclimates from corals, Q. Sci. Rev., 19(1-5), 45–64, doi:10.1016/S0277-3791(99)00054-2. Gentleman, R. C., et al. (2004), Bioconductor: open software development for computational biology and bioinformatics, Genome Biol., 5(10), R80, doi:10.1186/gb-2004-5-10-r80. Gil, Y., P. Szekely, S. Villamizar, T. Harmon, V. Ratnakar, S. Gupta, M. Muslea, F. Silva, and C. Knoblock (2011), Mind your metadata: Exploiting semantics for configuration, adaptation, and provenance in scientific workflows, in Proceedings of the Tenth International Semantic Web Conference (ISWC), Bonn, Germany, Springer: Verlag, 395. ITRDB (2009), National Climatic Data Center, World Data Center for Paleoclimatology, Tree Ring Database. Janowicz, K., and P. Hitzler (2012a), The digital earth as knowledge engine, Semant. Web, 3(3), 213–221, doi:10.3233/SW-2012-0070. Janowicz, K., and P. Hitzler (2012b), Key ingredients for your next semantics elevator talk, in Advances in Conceptual Modeling, Lecture Notes in Computer Science, vol. 7518, edited by S. Castano, P. Vassiliadis, L. Lakshmanan, and M. Lee, pp. 213–220, Springer, Berlin Heidelberg, doi:10.1007/978-3-642-33999-8_27. Jones, P., et al. (2009), High-resolution palaeoclimatology of the last millennium: a review of current status and future prospects, Holocene, 19(1), 3–49, doi:10.1177/0959683608098952. Killeen, T. (2012), Data citation in the geosciences, http:// www.nsf.gov/pubs/2012/nsf12058/nsf12058.jsp Kuhnert, H., J. Pätzold, B. Hatcher, K. H. Wyrwoll, A. Eisenhauer, L. B. Collins, Z. R. Zhu, and G. Wefer (1999), A 200-year coral stable oxygen isotope record from a high-latitude reef off Western Australia, Coral Reefs, 18, 1–12, doi:10.1007/ s003380050147. Kuhnert, H., J. Pätzold, K.-H. Wyrwoll, and G. Wefer (2000), Monitoring climate variability over the past 116 years in coral oxygen isotopes from Ningaloo Reef, Western Australia, Inter. J. Earth Sci., 88, 725–732, doi:10.1007/ s005310050300. Li, B., D. W. Nychka, and C. M. Ammann (2010), The value of multi-proxy reconstruction of past climate, J. Amer. Statist. Assoc., 105, 883–911, doi:10.1198/jasa.2010.ap09379. Linsley, B. K., R. B. Dunbar, G. M. Wellington, and D. A. Mucciarone (1994), A coral-based reconstruction of intertropical convergence zone variability over Central 468 Geochemistry Geophysics Geosystems 3 G EMILE-GEAY AND ESHLEMAN: PALEOCLIMATE SEMANTIC WEB America since 1707, J. Geophys. Res., 99, 9977–9994, doi:10.1029/94JC00360. Linsley, B. K., L. Ren, R. B. Dunbar, and S. S. Howe (2000), El Niño Southern Oscillation (ENSO) and decadal-scale climate variability at 10 N in the eastern Pacific from 1893 to 1994: A coral-based reconstruction from Clipperton Atoll, Paleoceanogr., 15, 322–335, doi:10.1029/1999PA000428. Linsley, B. K., A. Kaplan, Y. Gouriou, J. Salinger, P. B. de Menocal, G. M. Wellington, and S. S. Howe (2006), Tracking the extent of the South Pacific Convergence Zone since the early 1600 s, Geochem. Geophys. Geosyst., 7, 5003–+, doi:10.1029/2005GC001115. Linsley, B. K., P. Zhang, A. Kaplan, S. S. Howe, and G. M. Wellington (2008), Interdecadal-decadal climate variability from multicoral oxygen isotope records in the South Pacific Convergence Zone region since 1650 A.D., Paleoceanogr., 23(26), A262,219+, doi:10.1029/2007PA001539. Lorenz, E. N. (1956), Empirical orthogonal functions and statistical weather prediction, Scientific Report 1, Statistical Forecasting Project. 110268, Massachusetts Institute of Technology Defense Doc. Center. Mann, M. E. (2002), Climate reconstruction: The value of multiple proxies, Science, 297(5586), 1481–1482, doi:10.1126/ science.1074318. Murray-Rust, P., C. Neylon, R. Pollock, and J. Wilbanks (2010), Panton principles, principles for open data in science. NASA (2012), A.40 computational modeling algorithms and cyberinfrastructure, Tech. rep., National Aeronautics and Space Administration (NASA). Neumann, E. K., and D. Quan (2006), BioDash: a semantic web dashboard for drug development, Pac. Symp. Biocomput., 176–187, PMID: 17094238. Committee on Surface Temperature Reconstructions for the Last 2, 000 Years, N. R. C. (2006), Surface Temperature Reconstructions for the Last 2000 Years, The National Academies Press, Washington, D.C., 160. Nurhati, I. S., K. M. Cobb, and E. Di Lorenzo (2011), Decadal-scale SST and salinity variations in the central tropical pacific: Signatures of Natural and Anthropogenic Climate change, J. Clim., 24(13), 3294–3308, doi:10.1175/2011JCLI3852.1. Parkinson, H., et al. (2007), ArrayExpress--a public database of microarray experiments and gene expression profiles, Nucleic Acids Res., 35(Database issue), D747–750, doi:10.1093/nar/gkl995. Pfeiffer, M., O. Timm, W.-C. Dullo, and S. Podlech (2004), Oceanic forcing of interannual and multidecadal climate variability in the southwestern Indian Ocean: Evidence from a 160 year coral isotopic record (La Réunion, 55 E, 21 S), Paleoceanogr., 19, 4006–+, doi:10.1029/2003PA000964. Pierre, M., B. DeHertogh, A. Gaigneaux, B. DeMeulder, F. Berger, E. Bareke, C. Michiels, and E. Depiereux (2010), Meta-analysis of archived DNA microarrays identifies genes 10.1002/ggge.20067 regulated by hypoxia and involved in a metastatic phenotype in cancer cells, BMC Cancer, 10, 176, doi:10.1186/14712407-10-176. Quinn, T. M., T. J. Crowley, F. W. Taylor, C. Henin, P. Joannot, and Y. Join (1998), A multicentury stable isotope record from a New Caledonia coral: Interannual and decadal sea surface temperature variability in the southwest Pacific since 1657 A.D., Paleoceanogr., 13, 412–426, doi:10.1029/ 98PA00401. Quinn, T. M., F. W. Taylor, and T. J. Crowley (2006), Coral-based climate variability in the Western Pacific Warm Pool since 1867, J. Geophys. Res. (Oceans), 111, 11,006–+, doi:10.1029/ 2005JC003243. Ramanujam, S., A. Gupta, L. Khan, S. Seida, and B. Thuraisngham (2009), R2d: A bridge between the semantic weband rerlational visualization tools, in IEEE International Conference on Semantic Computing ICSC ’09. Rayner, N., P. Brohan, D. Parker, C. Folland, J. Kennedy, M. Vanicek, T. Ansell, and S. Tett (2006), Improved analyses of changes and uncertainties in sea surface temperature measured in situ since the mid-nineteenth century: the HadSST2 data set, J. Clim., 19(3), 446–469, doi:10.1175/JCLI3637.1. Sarachik, E. S., and M. A. Cane (2010), The El Niño-Southern Oscillation Phenomenon, Cambridge University Press, Cambridge, UK. 336. Schorlemmer, D., A. Wyss, S. Maraini, S. Wiemer, and M. Baer (2004), QuakeML - An XML schema for seismology, Tech. rep., Swiss Seismological Service. Spellman, P. T., et al. (2002), Design and implementation of microarray gene expression markup language (MAGE-ML), Genome Biol., 3(9), RESEARCH0046, doi:10.1186/ gb-2002-3-9-research0046. Thomson, D. J. (1982), Spectrum estimation and harmonic analysis, Proc. IEEE, 70(9), 1055–1096. Tudhope, A. W., C. P. Chilcott, M. T. McCulloch, E. R. Cook, J. Chappell, R. M. Ellam, D. W. Lea, J. M. Lough, and G. B. Shimmield (2001), Variability in the El Niño-Southern oscillation through a glacial-interglacial cycle, Science, 291, 1511–1517, doi:10.1126/science.1057969. Urban, F. E., J. E. Cole, and J. T. Overpeck (2000), Influence of mean climate change on climate variability from a 155-year tropical Pacific coral record, Nature, 407, 989–993. Wahl, E. R., D. M. Anderson, B. A. Bauer, R. Buckner, E. P. Gille, W. S. Gross, M. Hartman, and A. Shah (2010), An archive of high-resolution temperature reconstructions over the past 2+ millennia, Geochem. Geophys. Geosyst., 11(1), doi:10.1029/2009GC002817. Zinke, J., W.-C. Dullo, G. A. Heiss, and A. Eisenhauer (2004), ENSO and Indian Ocean subtropical dipole variability is recorded in a coral record off southwest Madagascar for the period 1659 to 1995, Earth Planet. Sci. Lett., 228, 177–194, doi:10.1016/j.epsl.2004.09.028. 469
© Copyright 2026 Paperzz