Introducing the WaterML R Package for retrieving data from CUAHSI WaterOneFlow Web Services
Jiří Kadlec, Bryn StClair,
Daniel P. Ames, Richard A. Gill
Brigham Young University
2015 AWRA Annual Water Resources Conference
16th November 2015
Presentation Outline
• Background
• What is R?
• What is WaterML?
• WaterML R Package Functions
• WaterML R Package Design
• Use Case
Importance of Hydrological Data
• Agriculture: water management
• Data collection and modeling
• Flood event classification and forecasting
• Map and time series
• Snow and ski track condition analysis
• Big Data
Need for statistical analysis
R Statistical Software (www.r-project.org)
• Built-in data structures and functions for manipulating arrays, matrices, tables
• Analysis can be saved as a script for future re-use
Suitable environment to analyze big spatial and time series data
R Packages
• A package is a collection of reusable R functions, data and code. Packages are available for download and installation from the R website.
• More than 6,000 user-contributed packages for graphics, data mining, spatial analysis, modeling…
How to get the data into R? One option: CUAHSI Water Data Center
• http://data.cuahsi.org
CUAHSI WaterOneFlow Web Services
• 104 officially registered data sources
• All of the data is open-access (free to redistribute)
[Diagram: service-oriented architecture]
• HIS Central Catalog (Find): Data Discovery Web Service
• HydroServer 1 and HydroServer 2 (Publish): Data Retrieval Web Services
• Client (Bind)
[Figure: examples of available data]
WaterML Data Exchange Format
• Based on XML (Extensible Markup Language)
• Has both the data and metadata about a time-series
• Site, Variable, Observation Method, Data Source,
Quality Control Level, Time Zone
• WaterML 1.1
• WaterML 2.0
• International Standard of Open Geospatial Consortium
• Used by all data providers registered at CUAHSI
Previous approaches for connecting
CUAHSI data and R
• RObsDat (direct connection to ODM database)
• HydroDesktop + HydroR (Windows only)
• DataRetrieval (USGS NWIS and EPA only)
Need: an easy-to-use method to connect R with any HydroServer WaterOneFlow service or any WaterML file
WaterML + R → WaterML R Package
• Free R package for discovery, download and parsing of
WaterML data
• Retrieves data from the CUAHSI HIS Central Catalog,
WaterOneFlow web services, and custom WaterML files
• Supports WaterML 1.0, 1.1, and 2.0
• Published on CRAN
• (http://cran.r-project.org/web/packages/WaterML)
WaterML R Package Functions
(1) Data Search (HIS Central Catalog)
• GetServices
• HISCentral_GetSites
• HISCentral_GetSeriesCatalog
(2) Data Download (HydroServer Download API, WaterML)
• GetSites
• GetVariables
• GetSiteInfo
• GetValues
(3) Data Upload (HydroServer Upload API, JSON)
• AddSites
• AddVariables
• AddMethods
• AddSources
• AddValues
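As a rough illustration of the download functions listed above, the sketch below chains GetSiteInfo and GetValues for one site. It assumes the WaterML package is installed and that the service is reachable; the helper name download_series and the dates in the usage comment are hypothetical, and the server URL and site code are simply the ones that appear in the SOAP example in this talk.

```r
# Minimal sketch, not the package's documented workflow.
# The WaterML calls need network access, so they are wrapped in a
# function (hypothetical name) instead of being run at load time.
download_series <- function(server, site_code, variable_code,
                            start_date, end_date) {
  # Metadata about the time series available at this site
  site_info <- WaterML::GetSiteInfo(server, site_code)

  # Observations for one variable at one site in the given period
  values <- WaterML::GetValues(server,
                               siteCode = site_code,
                               variableCode = variable_code,
                               startDate = start_date,
                               endDate = end_date)

  list(site_info = site_info, values = values)
}

# Example call (commented out: requires a live service; the variable code
# must be looked up first, for example with WaterML::GetVariables(server)):
# result <- download_series("http://hydrodata.info/chmi-h/cuahsi_1_1.asmx",
#                           "HYDRODATACZ-HR:187", variable_code,
#                           "2015-01-01", "2015-01-31")
```

Passing the site and variable codes in as arguments avoids relying on particular column names in the returned tables.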
Software Development Challenges (1)
• Communicating with SOAP Web Service from R
• SOAP = Simple Object Access Protocol
• To call a SOAP web service method, we must use
HTTP POST web request with 2 parts:
– SOAP:Envelope
– SOAPAction header
SOAP Envelope and SOAP Action
POST http://hydrodata.info/chmi-h/cuahsi_1_1.asmx
HTTP REQUEST POST DATA
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetSiteInfoObject xmlns="http://www.cuahsi.org/his/1.1/ws/">
<site>HYDRODATACZ-HR:187</site><authToken></authToken>
</GetSiteInfoObject>
</soap:Body>
</soap:Envelope>
Web Method Name: GetSiteInfoObject
Parameter Value: HYDRODATACZ-HR:187
HTTP REQUEST HEADERS
Content-Type: text/xml
SOAPAction: http://www.cuahsi.org/his/1.1/ws/GetSiteInfoObject
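The two parts of the request shown above can be assembled in R roughly as follows. This is a sketch under stated assumptions, not the package's actual implementation: the helper names (build_soap_envelope, build_soap_headers, post_soap) are hypothetical, the httr package is assumed for the HTTP POST, and the site code is inserted without XML escaping.

```r
# Build the SOAP envelope for a WaterOneFlow 1.1 method that takes a
# <site> and an <authToken> parameter (as in the slide's example).
build_soap_envelope <- function(method, site_code, auth_token = "") {
  paste0(
    '<?xml version="1.0" encoding="utf-8"?>',
    '<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ',
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema" ',
    'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">',
    '<soap:Body>',
    '<', method, ' xmlns="http://www.cuahsi.org/his/1.1/ws/">',
    '<site>', site_code, '</site>',
    '<authToken>', auth_token, '</authToken>',
    '</', method, '>',
    '</soap:Body>',
    '</soap:Envelope>'
  )
}

# Build the HTTP headers; SOAPAction is the namespace plus the method name.
build_soap_headers <- function(method) {
  c("Content-Type" = "text/xml",
    "SOAPAction" = paste0("http://www.cuahsi.org/his/1.1/ws/", method))
}

# Send the request (assumes httr is installed; not called here because it
# needs a live service).
post_soap <- function(url, method, site_code) {
  httr::POST(url,
             body = build_soap_envelope(method, site_code),
             httr::add_headers(.headers = build_soap_headers(method)))
}

envelope <- build_soap_envelope("GetSiteInfoObject", "HYDRODATACZ-HR:187")
headers <- build_soap_headers("GetSiteInfoObject")
```

Keeping the envelope and header builders separate from the network call makes them easy to inspect without contacting the server.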
Software Development Challenges (2)
• Parsing very large XML data files (100,000 or more lines)
• Initially I used xmlTreeParse → very slow
• Better solution: XPath
The xpathSApply function finds all elements with the same name and hierarchy level,
and stores them in an array
dataValues = xpathSApply(doc, "//sr:value", xmlValue, namespaces = ns)
dateTimesUTC = xpathSApply(doc, "//sr:value", xmlGetAttr, name = "dateTimeUTC", namespaces = ns)
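For context, the two calls above can be tried on a small inline document. The sketch below assumes the XML package is installed and that the sr prefix is bound to the WaterML 1.1 namespace; the sample values and the column names of the resulting table are made up for illustration.

```r
library(XML)

# Tiny stand-in for a WaterML 1.1 response (illustrative data only)
waterml_text <- '<timeSeriesResponse xmlns="http://www.cuahsi.org/waterML/1.1/">
  <timeSeries>
    <values>
      <value dateTimeUTC="2015-01-01T00:00:00">1.5</value>
      <value dateTimeUTC="2015-01-01T01:00:00">1.7</value>
    </values>
  </timeSeries>
</timeSeriesResponse>'

doc <- xmlParse(waterml_text, asText = TRUE)

# Assumed namespace binding for the "sr" prefix used in the XPath queries
ns <- c(sr = "http://www.cuahsi.org/waterML/1.1/")

# One pass over all <value> elements for the text content,
# and one pass for the dateTimeUTC attribute
dataValues <- xpathSApply(doc, "//sr:value", xmlValue, namespaces = ns)
dateTimesUTC <- xpathSApply(doc, "//sr:value", xmlGetAttr,
                            name = "dateTimeUTC", namespaces = ns)

# Combine the two character vectors into a typed table
series <- data.frame(
  time = as.POSIXct(dateTimesUTC, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
  DataValue = as.numeric(dataValues)
)
```

Each XPath query returns a plain character vector, so the conversion to numbers and timestamps happens once on the whole vector rather than node by node.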
Using the WaterML Package
Website: worldwater.byu.edu/app/index.php/rushvalley
Automated uploading of data from sensors to ODM and HydroServer
[Workflow diagram components]
• Sensors and data logger
• DECAGON Data Server (api.echo2data.com), DECAGON website and DECAGON API
• R data conversion script with lookup table (Site, Variable, Logger, Sensor)
• Upload API
• HydroServer (worldwater.byu.edu/interactive/rushvalley): ODM database, website, WaterML Services API
• HIS Central Catalog
• R statistical analysis tool
Example test (null hypothesis): there is no difference in NDVI between plots with and without mammals
Other uses: Exploratory analysis (error bar plot)
(daily mean with 1 standard error bars)
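A daily mean with one-standard-error bars can be computed in base R along these lines. This is only a sketch: the data are synthetic, the column names time and DataValue are assumptions about the downloaded table, and the helper names standard_error and plot_daily are hypothetical.

```r
# Synthetic observations (illustrative values, not real measurements)
values <- data.frame(
  time = as.POSIXct(c("2015-06-01 00:00", "2015-06-01 08:00",
                      "2015-06-01 16:00", "2015-06-02 00:00",
                      "2015-06-02 12:00"), tz = "UTC"),
  DataValue = c(1, 2, 3, 4, 6)
)

# Standard error of the mean: sample standard deviation / sqrt(n)
standard_error <- function(x) sd(x) / sqrt(length(x))

# Group observations by calendar day (UTC) and summarise each group
values$day <- as.Date(values$time, tz = "UTC")
daily_mean <- tapply(values$DataValue, values$day, mean)
daily_se <- tapply(values$DataValue, values$day, standard_error)

daily <- data.frame(
  day = as.Date(names(daily_mean)),
  mean = as.numeric(daily_mean),
  se = as.numeric(daily_se)
)

# Error bar plot: points at the daily mean, bars of +/- 1 standard error.
# Defined as a function so that no graphics device is opened on load.
plot_daily <- function(d) {
  plot(d$day, d$mean, ylim = range(c(d$mean - d$se, d$mean + d$se)),
       xlab = "Day", ylab = "Daily mean")
  arrows(d$day, d$mean - d$se, d$day, d$mean + d$se,
         angle = 90, code = 3, length = 0.05)
}
```

Calling plot_daily(daily) draws the chart.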
WaterML R Package Usage Statistics
• 1733 downloads (since May 2015)
• 250 downloads in last month
• Officially recognized by CUAHSI (2015 CUAHSI
president’s award for community
contribution)
Thank you for your attention
R Website
www.r-project.org
WaterML R Package Website
https://cran.r-project.org/web/packages/WaterML
Source Code on Github
http://github.com/jirikadlec2/waterml