GeoInformatics: CUAHSI Hydrologic Information Systems Project Summary This proposal advances integrative hydrologic science through the development of a hydrologic information system that can be implemented at universities throughout the United States. It involves collaboration between hydrologic scientists from the Consortium of Universities for the Advancement of Hydrologic Science, Inc (CUAHSI) and computer scientists from the San Diego Supercomputer Center, and supports a larger strategy at NSF to develop cyberinfrastructure for the environmental and earth sciences. The CUAHSI Hydrologic Information System (HIS) is a geographically distributed network of hydrologic data sources and functions that are integrated using web services so that they function as a connected whole. CUAHSI web services make national water data archives directly accessible to hydrologic scientists, almost as if the data were located on a local disk drive. HIS is built on a hydrologic information model that has six components: time series (streamflow, water quality, groundwater levels), multidimensional fields (remote sensing, Nexrad precipitation, weather and climate model outputs), geospatial themes (terrain, hydrography, watersheds, soils, land cover, vegetation, geology, aquifers), simulation models (existing models and scientific workflow models), information tools (data access, transformation, publication, analysis, visualization), and information collections (digital watersheds, digital aquifers, digital estuaries) accessible through online portals and desktop applications. The intellectual merit of this project is that it harnesses information technology to support hydrologic science by building an information model that has a coherent intellectual structure and synthesizes data from many disciplines. 
It enables the tracing of water movement and transport of constituents vertically between atmosphere, surface water and groundwater, and horizontally through the landscape from watersheds and aquifers to streams, rivers, estuaries and bays. It integrates data across scales of space and time. This will enable the testing of hypotheses about the interfaces between hydrologic processes in a manner and at a scale that is rarely attempted now. The results of these studies will be published in journal articles and as a series of CUAHSI monographs. The hydrologic information system developed in this project is of significant value in itself for hydrologic science research and also in the wider sense of being an example of how cyberinfrastructure is developed for earth and environmental sciences. The broader impacts of this project include its networking of hydrologic scientists at many universities who will jointly be contributing and receiving hydrologic information. The CUAHSI hydrologic information system and its accompanying datasets will be developed in the public domain and available to the professional hydrology community, and to educators at all levels. The integration of water information is important for water science, but also for water planning, engineering and management. This project will have broad benefits in improving water information access and scientific analysis across the nation.

Results from Prior NSF Support (only the prior project most relevant to this project is reported)

David R. Maidment Grant # EAR-0413265 ($964,218) 1/04 – 3/07 CUAHSI Hydrologic Information Systems Principal Investigators: D. R. Maidment, C. Baru, P. Kumar, M. Piasecki, R. Hooper. Summary of Results: This project defined how cyberinfrastructure should be developed for the hydrologic sciences, and developed a prototype community hydrologic information system using web services through the Consortium of Universities for the Advancement of Hydrologic Science, Inc (CUAHSI).
Publications: Goodall, J.,2005, A geotemporal framework for hydrologic analysis, PhD dissertation, University of Texas at Austin. Strassberg, G., 2005, A geographic data model for groundwater, PhD dissertation, University of Texas at Austin. Maidment, D.R., (Ed.), 2005, CUAHSI Hydrologic Information System Status Report, Consortium of Universities for the Advancement of Hydrologic Science, Inc, 224p.,http://www.cuahsi.org/docs/HISStatusSept15.pdf David G. Tarboton Collaborator on the Grant # EAR-0413265 ($964,218) 1/04 – 3/07 CUAHSI Hydrologic Information Systems for which David Maidment is the Principal Investigator. Summary of Results: Dr. Tarboton has as part of this project been responsible for the User Needs Assessment Survey and Hydrologic Observations Data Model Design. Publications: Bandaragoda, C. J., D. G. Tarboton and D. R. Maidment, (2005), "User Needs Assessment, Chapter 4," in Hydrologic Information System Status Report, Version 1, Edited by D. R. Maidment, p.48-87, http://www.cuahsi.org/docs/HISStatusSept15.pdf. Bandaragoda, C., D. G. Tarboton and D. R. Maidment, (2006), "Hydrology's Effort Towards the Cyberfrontier," EOS, 87(1): 2,6. Horsburgh, J. S., D. G. Tarboton and D. R. Maidment, (2005), "A Community Data Model for Hydrologic Observations, Chapter 6," in Hydrologic Information System Status Report, Version 1, Edited by D. R. Maidment, p.102-135, http://www.cuahsi.org/docs/HISStatusSept15.pdf. Michael Piasecki Grant # EAR-0412904 ($181,007) 1/04 – 3/07 CUAHSI Hydrologic Information Systems Principal Investigators: D.R.Maidment, C. Baru, P. Kumar, M. Piasecki, R. Hooper, Summary of Results: Piasecki has developed a basic community metadata profile and also conceptual representations for vocabulary taxonomy and processes using ontologies. Publications: Bermudez, L., 2004, “ONTOMET: Ontology Metadata Framework”, PhD dissertation, Drexel University, and a Status Report (see above). 
Ilya Zaslavsky Grant # EAR-0413182 ($1,059,352) 1/04 – 3/07 CUAHSI Hydrologic Information Systems Principal Investigators: D. R. Maidment, C. Baru, P. Kumar, M. Piasecki, R. Hooper. Summary of Results: Dr. Zaslavsky has been responsible for the development and deployment of cyberinfrastructure components of the prototype CUAHSI hydrologic information system, the Hydrologic Data Access system in particular. Publications: Baru, C., I. Zaslavsky, and R. Wahadj (2005), "System Architecture, Chapter 3," in Hydrologic Information System Status Report, Version 1, Edited by D. R. Maidment, p.24-47, http://www.cuahsi.org/docs/HISStatusSept15.pdf. Ruddell, B., Zaslavsky, I., Kumar, P., Jennings, C., Mehnert, E., Thomas, D., Holmes, R., Maidment, D., Piasecki, M. 2005. Development of the Illinois River Basin Virtual Observatory Prototype. Eos Trans. AGU, 86(52), Fall Meet. Suppl., Abstract IN24A-04.

Project Description

Introduction

The Consortium of Universities for the Advancement of Hydrologic Science, Inc, (CUAHSI) is a legally independent organization, with 101 US universities as members, that is supported by the National Science Foundation to develop infrastructure and services for the advancement of hydrologic science in the United States. The Hydrologic Information System (HIS) project is one component of CUAHSI’s mission, which is motivated by the community’s desire to better access and analyze hydrologic information. The broad aim of HIS is to provide a strong and flexible foundation for data-intensive hydrologic research that can evolve as the needs of the community change. This coincides with a larger initiative within the National Science Foundation to develop cyberinfrastructure for revolutionizing science and engineering.
The emerging vision “is to use cyberinfrastructure to build more ubiquitous, comprehensive digital environments that become interactive and functionally complete for research communities in terms of people, data, information, tools, and instruments and that operate at unprecedented levels of computational, storage, and data transfer capacity.” (Atkins et al., 2003, p.17). NSF has recently set up an Office of Cyberinfrastructure, and is presently developing a strategic planning document called “NSF’s Cyberinfrastructure Vision for 21st Century Discovery” (NSF Cyberinfrastructure Council, 2005). The CUAHSI Hydrologic Information System project is the key component of NSF’s cyberinfrastructure development to support hydrologic science. The existing CUAHSI Hydrologic Information System project was initiated in April 2004 for a two-year period, which terminates in March 2006. This proposal is for a 5-year renewal of that project. In this proposal, the existing and proposed HIS projects are referred to as HIS Phase 1, and HIS Phase 2, respectively. There is a closely parallel NSF program in environmental engineering called CLEANER (Collaborative Large-Scale Engineering Analysis Network for Environmental Research). A status report summarizing the findings of HIS Phase 1 has been published (Maidment, 2005), and has been reviewed in detail by the CLEANER cyberinfrastructure committee, who consider it a model to which other science communities can look to guide their cyberinfrastructure development efforts. Many copies of this report have been requested by NSF for distribution to Program Officers concerned with cyberinfrastructure development. 
Project Goals

The CUAHSI HIS project has four goals:

Data Access – provide rapid access to a large volume of high quality hydrologic data;
Hydrologic Observatories – develop a digital watershed framework for synthesizing data and models for a hydrologic region;
Hydrologic Science – strengthen place-based hydrologic science by supporting the representation of hydrologic processes with equations through an enhanced capacity to describe hydrologic environments with data;
Hydrologic Education – quantify and visualize the movement of water and chemicals in a hydrologic environment continuously in space and time.

A user survey conducted during HIS Phase 1 (Tarboton et al., 2005) showed that CUAHSI members rank Data Access as the highest priority of these four goals.

Hydrologic Information Model

When HIS Phase 1 was initiated, what a Hydrologic Information System would consist of, or how it would function, was unknown. It required more than a year of study to identify the four goals just described. HIS investigators were confronted with a plethora of information types and cyberinfrastructure techniques, without a visible pattern for structuring them. In any such situation of great complexity, it is useful to take the whole problem and break it down into a series of components, which can be addressed individually, and then reassembled to form a solution for the whole problem. The CUAHSI Hydrologic Information Model consists of seven components: time series, multidimensional fields, geospatial themes, simulation models, information tools, information collections and web portals. The first three of these components are categories of data. The next three are means for analyzing data, modeling processes, and storing sets of related data and models. Web portals are the windows through which data and models are accessed and shared.
These components are now described in more detail: Time Series – these include observational data records from streamflow, precipitation and groundwater level gages, water quality and biological sampling, and climate and weather stations. Typically the locations of these measurements are represented in space as points – groundwater levels are measured at a very large number of points, but with a few samples at each point, randomly scattered in time; water quality sampling produces large numbers of variables at each sampling, the number of geographic points is smaller than for groundwater levels but the number of data measured at each point is larger; streamflow data are very systematically collected at regular intervals through time and a limited number of locations. Time series may also be produced by hydrologic simulation models and by averaging continuous phenomena over spatial regions, such as the average precipitation over a watershed. Multidimensional Fields – these include the products of satellite and aircraft remote sensing, Nexrad radar rainfall grids, and the output of weather and climate models. Fields represent continuously distributed phenomena in space, with one or more variables described at each location on a regular mesh or array. Observed fields are often spatially extensive but thin in time, and what is needed for hydrologic science is spatially localized over a watershed or aquifer but much deeper in time, so there is an important space-time recomposition problem involved when using such data sources. Geospatial Themes – these are static representations of particular layers of information describing the earth’s surface and subsurface, including land surface terrain, watershed and stream networks, stream channel morphology, land cover, vegetation, soils, geology and aquifers. Other themes that may enter a study are census data on population, agricultural statistics, and infrastructure such as roads and dams. 
A particular theme may have a different form depending on the spatial scale – the land surface terrain of a whole nation may be represented using a 1 km Digital Elevation Model grid, while the terrain surface of a small watershed may be represented using LIDAR data with 1 m post spacings. Geospatial themes may be composed of discrete space objects or features, which are spatially distinct points, lines, areas or volumes, or they may be continuous space themes, such as terrain surfaces or digital orthophoto images. Simulation Models – these are computerized sets of equations representing the functioning of hydrologic processes. Hydrologic modeling has occurred over decades, and most current models were developed as stand-alone systems with specially designed input and output files that have little in common from one model to another. Transferring information between these models and an external data infrastructure is possible but difficult, and directly connecting one arbitrarily selected hydrologic model with another is nearly impossible. New methods of modeling that emphasize loosely connected modularized functions are needed. Information Tools – these are devices for accessing, transforming, analyzing and visualizing hydrologic data, and for connecting data with models. Tools generally perform a single function, and when many tools are assembled into a package, they form a toolkit or application system. The use of scientific workflows such as Kepler (Altintas et al., 2004), D2K (NCSA, 2006), or ModelBuilder (ESRI, 2006), to sequence the operations of tools is a new way of constructing complex analysis and modeling systems. Information Collections – these are assemblies of series, fields, themes, models and tools into a connected structure that comprehensively describes a hydrologic environment. A Digital Watershed is an information collection describing a drainage basin.
One may similarly define a Digital Aquifer or Digital Estuary to describe other hydrologic environments. Data file formats, such as HDF (Hierarchical Data Format) (Folk, 2005), or the ArcGIS geodatabase may be used to store collections. A more flexible method is provided by the GEON registration system in which individual datatypes registered in a catalog can be combined into a “Data Integration Cart” collection using a variety of relationships, including spatial ones (GEON, 2005). Web Portals – these are a type of content management system web site that serves as a gateway to a broad array of resources and services accessed through portlets. Cross-platform portlet standards are emerging (OASIS, 2003) to allow portal developers to easily embed remotely running web services into their portals. A cybercollaboratory is a particular kind of portal designed to facilitate sharing and discussion of information among a community of scientific investigators. Thus, the CUAHSI Hydrologic Information System is a geographically distributed set of series, fields, themes, models, and tools, which are connected using the internet, assembled into collections, and accessed through portals. The innovation in computer science which makes all this possible is the Service Oriented Architecture (SOA) in which individual functions are constructed as web services and made available at network nodes for use by other participants in the network via standard access protocols. Thus, a hydrologic scientist using CUAHSI HIS has access to a network of data sources, models and tools, some of which are resident on his or her own computer but many of which function automatically on remote computers, in much the same way that a scientist communicates with colleagues via email without worrying about what operating system or software the recipient’s email system is using.
Accomplishments of HIS Phase 1

(1) CUAHSI Web Services

A CUAHSI web services library has been built that provides direct access to data from the USGS National Water Information System (NWIS), the Ameriflux tower network, a portion of the National Climatic Data Center’s (NCDC) archive, and some products from MODIS remote sensing. Functions in the NWIS library can be viewed at http://river.sdsc.edu/NWISTS/nwis.asmx. For streamflow, this library contains several functions, such as GetSiteInfo, GetDischargeInfo, and GetDischargeValues. GetSiteInfo takes a USGS station number as input and returns an XML document that contains station metadata (name, location, and other attributes); GetDischargeInfo takes a station number as input and returns the number of discharge observations and the date times of the first and last observations; GetDischargeValues takes a station number, start date and end date as inputs, and produces a time series of discharge values and times. Similar functions exist for extracting water quality and groundwater data from NWIS. Each web service method is an elementary piece of code, which performs a single function and is described using the Web Services Description Language (WSDL), a W3C standard that enables instructions made on one computer to be executed on another. CUAHSI web services for NWIS are web page scrapers: they programmatically mimic the action a human user would take when using the NWIS web site http://waterdata.usgs.gov/nwis to create a URL request string that, when submitted to NWIS, produces the same output file a human user sees. The web service then parses that file to transform the information into a standard XML format as required by WSDL. A similar approach has been used to access Ameriflux and MODIS data. The value of this approach is that it uses a data agency’s web site as it is.
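The scraper pattern described above (build a URL request string, fetch the output file, parse it, and re-emit it as XML) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CUAHSI implementation: the endpoint, the query parameter names, the tab-delimited sample response, the station number and the XML element names are all made up for the example. Only the idea of a GetDischargeValues-style call taking a station number, start date and end date comes from the text.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint and parameter names; the real NWIS query interface may differ.
BASE_URL = "http://example.org/nwis/discharge"


def build_request_url(station, start_date, end_date):
    """Assemble the URL a human user would otherwise produce by filling in a web form."""
    query = urlencode({"site_no": station, "begin_date": start_date, "end_date": end_date})
    return f"{BASE_URL}?{query}"


def parse_response(text):
    """Turn a tab-delimited response into (date, value) pairs, skipping comments and the header."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] == "datetime":  # header row of the assumed format
            continue
        rows.append((fields[0], float(fields[1])))
    return rows


def to_xml(station, rows):
    """Re-express the parsed values as a small XML document (element names are assumptions)."""
    root = ET.Element("DischargeValues", station=station)
    for when, value in rows:
        ET.SubElement(root, "Value", dateTime=when).text = str(value)
    return ET.tostring(root, encoding="unicode")


def get_discharge_values(station, start_date, end_date, fetch):
    """fetch is any callable mapping a URL to response text, injected so the sketch runs offline."""
    url = build_request_url(station, start_date, end_date)
    return to_xml(station, parse_response(fetch(url)))


# Made-up response standing in for the file a data agency's web site would return.
SAMPLE = "# sample response (made up)\ndatetime\tdischarge_cfs\n2005-10-01\t120.0\n2005-10-02\t135.5\n"
xml_doc = get_discharge_values("01234567", "2005-10-01", "2005-10-02", lambda url: SAMPLE)
print(xml_doc)
```

Passing the fetch step in as a function keeps the URL construction and parsing logic separate from network access, which is also where such a wrapper is most fragile: any change to the agency's page layout breaks only `parse_response`.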
The NCDC web services use a different and more profound approach – NCDC has placed a portion of its archive outside its firewall for the Automated Surface Observing System (ASOS) at airport weather stations. The NCDC has created querying functions using the Simple Object Access Protocol (SOAP), another W3C standard, which allows CUAHSI to make direct data access requests into the NCDC archive without mimicking any web page operations. This is faster and more secure for CUAHSI, but riskier for the data provider because it allows remote machine access to the archive. The advent of CUAHSI web services means that a hydrologic scientist using any programming language (Fortran, C/C++, Visual Basic, Java), or any application (Excel, ArcGIS, Matlab), running on any operating system (Windows, Unix, Linux, Macintosh), can directly access hydrologic data in several national archives (NWIS, Ameriflux, NCDC, MODIS). This is a remarkable accomplishment. It is almost as if the national data archives are loaded on a local disk in the user’s computer. A HydroObjects library has been prepared that acts as a middleware component on the user’s computer and provides applications like Excel, ArcGIS and Matlab with access to the web services without having to program each service into each application.

The value of CUAHSI web services was quickly recognized outside academic circles. The availability of these services was announced during a CUAHSI cyberseminar on HIS presented on Friday, October 28, 2005.
On Wednesday Nov 2, Jason Love, from a private firm, RESPEC, in Sioux Falls, South Dakota, posted on the EPA Basins list server: “Occasionally one comes across something that is worth sharing; the CUAHSI Hydrologic Information Systems - Web Services Library for NWIS is a valuable tool for those of us interested in rapidly acquiring and processing data from the USGS, e.g., calibrating models and performing watershed assessments.” He provided a tutorial on how to use the services with Matlab, which CUAHSI had not developed. Thus, the technology transfer from academia to the private sector to the public sector occurred in less than one week! Better access to the nation’s water information has wide benefits beyond its contributions to the advancement of hydrologic science.

(2) Hydrologic Data Access System

Web services work just fine when you know where the data have been measured and what has been measured there. A data archive like NWIS is actually a collection of observing networks, one for streamflow, another for water quality, a third for groundwater levels. Each network has a set of observation stations, each with its own name, identifying number, latitude and longitude. Each station has a set of one or more observation parameters (stage height, streamflow, dissolved oxygen, water level) that may be regularly or irregularly recorded through time to form observation series. This pattern of networks of stations having parameters described by series is repeated in all the national hydrologic observation systems (NWIS, EPA Storet, NCDC, NAWQA, Ameriflux) and the pattern is repeated in state, local and academic investigator hydrologic observation systems. The web sites that provide access to these data have tabular interfaces because the data series are stored behind them in relational database tables.
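The pattern of networks, stations, parameters and series described above maps naturally onto a small set of nested record types. The sketch below is illustrative only: the class and field names, the station and its values are assumptions made for the example, not the schema of any of the archives named in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Series:
    """Observations of one parameter at one station, regular or irregular in time."""
    parameter: str
    values: List[Tuple[str, float]] = field(default_factory=list)  # (ISO date-time, value)


@dataclass
class Station:
    station_id: str
    name: str
    latitude: float
    longitude: float
    series: Dict[str, Series] = field(default_factory=dict)  # keyed by parameter name

    def add_observation(self, parameter, when, value):
        # Create the series on first use, then append the new observation.
        self.series.setdefault(parameter, Series(parameter)).values.append((when, value))


@dataclass
class Network:
    name: str
    stations: Dict[str, Station] = field(default_factory=dict)

    def catalog(self):
        """One row per station/parameter pair: what is measured where, and how many values exist."""
        return [
            (s.station_id, s.latitude, s.longitude, p, len(ser.values))
            for s in self.stations.values()
            for p, ser in s.series.items()
        ]


# Made-up station and observations, for illustration only.
net = Network("streamflow")
net.stations["S1"] = Station("S1", "Example Creek near Sampletown", 35.0, -78.0)
net.stations["S1"].add_observation("streamflow", "2005-10-01T00:00", 120.0)
net.stations["S1"].add_observation("streamflow", "2005-10-02T00:00", 135.5)
net.stations["S1"].add_observation("stage height", "2005-10-01T00:00", 2.1)
print(net.catalog())
```

The `catalog` rows carry a location plus a summary of what is measured there, which is the kind of information a station map with attached attribute tables needs.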
As part of providing access to a particular observation network, a station map is constructed by building a program or web service for GetSiteInfo and applying that systematically over the spatial domain of the data source to harvest all the station locations. Then, by applying a GetParameterInfo service, the number and type of measurements available at that station can also be harvested. The end result is a hydrologic observations metadata catalog for the network that consists of a dot map showing station locations, and attached attribute tables that show what is measured at each station. The station maps for each observing network can be integrated to form an observation station map for the nation, as shown in Figure 1.

Figure 1. Hydrologic observation station map for the continental US.

The observation station map is presented in the CUAHSI Hydrologic Data Access System (HDAS) (http://river.sdsc.edu/HDAS) against a backdrop of watershed and stream network data to provide spatial context. A hydrologic scientist can zoom in to any region of the nation and see where data have been measured, query what has been measured there, obtain graphs and tables of selected observation series, and download them as .csv or Excel files. These functions are supported by calls to the web services library. The HDAS provides a common data window on water observation information in the nation in much the same way that Travelocity or the Home Shopping Network do for travel or shopping. The HIS effort began in 2002 with a committee that wrote an HIS white paper, later published as a CUAHSI Technical Report (HIS, 2002). In that report, it was envisaged that the HIS would be based on a Hydrologic Data Access Center which would be a centralized facility supporting data access. What has emerged through web services, however, is a Hydrologic Data Access System which hydrologic scientists can extend by adding web services and station maps for any observation network in the nation.
Thus, the role of CUAHSI HIS is to create the framework for this system and its services for accessing national data archives, and then support scientists in CUAHSI institutions to extend this system by adding state, local and individual investigator networks.

(3) Hydrologic Observations Database

Hydrologic scientists collect data in field campaigns and experimental sites and it is useful to have a standardized hydrologic observations database for storing that information so that it can be automatically incorporated into the national HIS when the investigator is ready to publish the information. Based on an initial design concept, and review from 22 CUAHSI scientists, a more fully configured relational database schema has been designed and tested on limited hydrologic datasets (Horsburgh et al., 2005). [David Tarboton – you might want to rephrase this or write more here]

(4) Hydrologic Metadata

Whether data are obtained from a government agency archive or an investigator dataset, they require metadata to describe the character of the information. Each national data archive has its own metadata schema, and the scope of the issue can be assessed from the fact that the NWIS system alone stores data for more than 10,000 parameters, mostly water quality species. The EPA Storet system for water quality has another metadata schema, different from that of the USGS even when describing exactly the same water quality variable. The National Climatic Data Center has yet another approach. Atmospheric water flux data from the Ameriflux network have one symbology and the same variables described in the North American Regional Reanalysis of climate have another. There is thus a very complex problem of semantic mediation or interpreting the parameter information from each individual source correctly, and then finding how parameter information from separate sources can be combined appropriately.
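At its simplest, the semantic mediation problem described above amounts to a crosswalk from each source's own parameter label to a shared concept and unit. The sketch below uses a plain lookup table as a deliberately simplified stand-in; it is not the ontology-based approach developed in the project. The archive names, parameter labels and concept names are all hypothetical, and only the unit conversion factor is a physical fact.

```python
# Cubic feet per second to cubic metres per second (1 ft = 0.3048 m exactly).
CFS_TO_CMS = 0.3048 ** 3

# (source, native parameter label) -> (common concept, factor to the common unit).
# Every source name and label here is a made-up placeholder.
CROSSWALK = {
    ("archive_a", "Q_cfs"): ("streamflow_m3s", CFS_TO_CMS),
    ("archive_b", "discharge_cms"): ("streamflow_m3s", 1.0),
    ("archive_a", "DO_mgL"): ("dissolved_oxygen_mgL", 1.0),
    ("archive_b", "oxygen_dissolved"): ("dissolved_oxygen_mgL", 1.0),
}


def mediate(source, label, value):
    """Translate one observation into the common vocabulary and unit."""
    try:
        concept, factor = CROSSWALK[(source, label)]
    except KeyError:
        raise KeyError(f"no mapping for {label!r} from {source!r}") from None
    return concept, value * factor


# Two sources describing the same variable end up under one concept name.
print(mediate("archive_a", "Q_cfs", 100.0))
print(mediate("archive_b", "discharge_cms", 2.5))
```

A flat table like this breaks down quickly once thousands of parameters, differing sampling methods and partial overlaps between concepts are involved, which is the motivation for richer vocabulary and ontology representations.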
[Michael – you might want to rephrase this or write more here]

(5) Hydrologic Modeling using Web Services

Hydrologic models can be operated using web services by automatically ingesting their input data from national archives at run time. Goodall (2005) built a hydrologic flux coupler that ingests precipitation, evaporation and groundwater recharge flux fields from the North American Regional Reanalysis of climate and streamflow discharge data from NWIS gaging stations, and combines all of these with geospatial information on watersheds to compute a daily or monthly water balance for watersheds of the Neuse basin, North Carolina. This accomplishes in minutes what otherwise requires hours of tedious effort manipulating web pages to get data, making format conversions to get everything in the right units, and then running the water balance simulation. The resulting water balance model can readily be applied anywhere in the nation because it is built directly over the national data archives. This was accomplished using a scientific workflow language, ArcGIS ModelBuilder, in which modular tools are visually connected in an iconic tableau to show the computational logic. It has also been shown that entire hydrologic simulation models such as HEC-HMS or HEC-RAS can be called directly as tools in a workflow (Whiteaker et al., 2006). The workflow model shown in Figure 2 can itself be published on a server and called as a web service by a user at a remote location. Thus, models can become web services just like data, and a geographically distributed system of data and models can be created.

Figure 2. A scientific workflow using web services to ingest hydrologic flux and flow data and perform a water balance for watersheds in the Neuse basin, North Carolina.
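To make the arithmetic of such a water balance concrete, the sketch below computes a daily change in watershed storage from the four inputs named above. It is a simplified illustration, not the formulation used in the flux coupler: it assumes that precipitation, evaporation and recharge are already expressed as depths in mm per day averaged over the watershed, treats recharge and streamflow as losses, and converts streamflow from m³/s to an equivalent depth over the watershed area. The function names and all numbers are made up.

```python
def streamflow_depth_mm(q_m3s, area_km2, seconds=86400):
    """Convert mean streamflow (m^3/s) over a period into an equivalent depth (mm) over the watershed."""
    volume_m3 = q_m3s * seconds
    area_m2 = area_km2 * 1e6
    return volume_m3 / area_m2 * 1000.0


def daily_storage_change(precip_mm, evap_mm, recharge_mm, q_m3s, area_km2):
    """Storage change per day (mm): precipitation minus evaporation, recharge and streamflow."""
    return [
        p - e - r - streamflow_depth_mm(q, area_km2)
        for p, e, r, q in zip(precip_mm, evap_mm, recharge_mm, q_m3s)
    ]


# Made-up inputs for a 1000 km^2 watershed over two days.
changes = daily_storage_change(
    precip_mm=[5.0, 0.0],
    evap_mm=[2.0, 2.5],
    recharge_mm=[1.0, 0.5],
    q_m3s=[10.0, 8.0],
    area_km2=1000.0,
)
print(changes)
```

Most of the effort in practice lies in the steps this sketch takes for granted: retrieving each input, averaging gridded fluxes over the watershed boundary, and bringing everything to common units and time steps.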
Other NSF scientific communities are developing scientific workflows – the LTER ecological community is working with the SDSC Kepler system, and the NCSA environmental cyberinfrastructure project is using their D2K (Data to Knowledge) system. It has been demonstrated that CUAHSI’s web services for hydrologic data can be called as tools in Kepler and in D2K, so the same web services library can support any number of scientific workflow systems. The NCSA environmental cyberinfrastructure project is investigating metaworkflows that will make workflow models developed in different systems interoperable. [Jon Goodall – you may want to edit or add something here].

(6) Digital Watershed Toolkit

A Digital Watershed Toolkit (http://geo.sdsc.edu/cuahsi/Toolkit/tabid/79/Default.aspx) has been prepared which contains six components: a groundwater data model and toolkit, the hydrologic observations database schema, GeoLearn – a standalone system for processing remote sensing data, a river channel morphology model and toolkit, a watershed data model and toolkit, and a Time Series Analyst. The Time Series Analyst is particularly interesting because it was developed independently of the CUAHSI HIS project by Jeffrey Horsburgh at Utah Water Research Laboratory (UWRL). At first it operated only on its own specially constructed database of downloaded hydrologic data for a watershed in Utah. When CUAHSI web services became available, Dr Horsburgh created a version of this analyst that operates directly over these web services, and thus his analyst is now applicable anywhere in the nation. This application is supported on the web at UWRL, http://water.usu.edu/nwisanalyst/, so now a hydrologic scientist anywhere in the nation can access Dr Horsburgh’s tool and execute it on NWIS data collected anywhere in the nation! This multiplies by a factor of thousands the value of Dr Horsburgh’s work in programming this analyst tool.
The HIS development team hopes that other CUAHSI members will similarly nationally enable their tools and models as Dr Horsburgh has done and include them in the Digital Watershed Toolkit.

(7) Hydrologic Observatory Collection

A Digital Library has been constructed for the Illinois River Basin Observatory (http://irbho.cee.uiuc.edu/irbho/digitallibrary.php) that indexes more than 600 information sources containing information and data about the observatory region. Initially this collection was assembled using Arbitrary Digital Objects, which are unstructured collections of information in zip files with metadata descriptors, but it later emerged that arbitrarily structured objects are difficult to interpret automatically, so the collection has now been reconfigured using the GEON registration system in which each indexed object has a particular datatype (e.g. image, relational database, netCDF file), and thus can be linked automatically to tools that will open and view such datatypes.

(8) Web Portals

Products from the HIS-1 effort are displayed through an HIS portal mounted at the San Diego Supercomputer Center (http://geo.sdsc.edu/cuahsi/), through the CLEANER-CUAHSI Cybercollaboratory mounted at the NCSA in Illinois (http://cleaner.ncsa.uiuc.edu/home/), and through the CUAHSI program office portal in Washington (http://www.cuahsi.org/his.html). Thus, multiple science communities and web outlets can access the same hydrologic information model components. These components may also be similarly incorporated in web portals that have been created by regional CUAHSI Hydrologic Observatory teams.

Assessment of HIS Phase 1 Accomplishments

Three letters attached to this proposal have been solicited to assess the accomplishments from the current HIS project. Dr William Michener, Co-Director of the NEON project office, and Associate Director of the LTER Network Office, represents the ecological science community.
He writes “I remain extremely impressed with the team of experts that you have engaged to develop the HIS and the logical and product-oriented process that you have followed.” He describes the HIS products developed to date for hydrologic data, and goes on “Importantly, and from a NEON perspective, it will be possible for us to build upon the CUAHSI HIS efforts and to focus more of our energies on activities related to creating the databases and analytical tools related to terrestrial ecosystems and biodiversity”. Dr Robert Hirsch writes as Associate Director for Water at the USGS, and Chairman of the Subcommittee on Water Availability and Quality (SWAQ) of the President’s National Science and Technology Council, which is “the primary coordinating and planning group for water-related science and technology in Federal government”. Dr Hirsch states “I am familiar with the HIS project and am highly supportive of its goals and highly impressed by its accomplishments to date…. I believe that the HIS effort holds great potential for helping all of the data delivery services of the Federal water agencies live up to their full potential.” Dr Hirsch has suggested and CUAHSI has accepted that the SWAQ should set up a committee of federal water agency representatives to advise CUAHSI on how best to bring together the nation’s water information. Thus, CUAHSI HIS has been endorsed at the highest level of federal water agency coordination. Clint Brown writes as Director of Software Products for the Environmental Systems Research Institute (ESRI), makers of the ArcGIS geographic information system.
He states “This important project will greatly benefit information access, integration, and use by the hydrologic science community, yet it goes far beyond this single community….We believe that your work will be a critical footprint in illustrating how to build distributed information systems that integrate scientific principles for helping to better manage our nation.”

These three perspectives, from leaders in the neighboring sciences, in the federal water agencies, and in the computer industry, show collectively the high regard in which the CUAHSI HIS project is held. It is viewed as a critical component for the advancement not just of hydrologic science, but of ecological science, of uniting the nation’s water information, and of advancing the scientific implementation of information systems on a broad scale. The challenge for the HIS team is to maintain the pattern of accomplishments that has earned this regard.

Plans for HIS Phase 2

CUAHSI Community Challenge

Now that HIS Phase 1 has produced a reasonable understanding of how a community-based hydrologic information system can function, an obvious focus for Phase 2 is to consolidate the effort and concentrate on delivering usable information products. The task statements presented subsequently have that aim. However, it is useful in the larger sense to have a single strategic goal to unite the various components of the effort. Dr Richard Hooper, President and CEO of CUAHSI, has presented the following CUAHSI community challenge to animate the activities of the observatory teams: “Predict the fluxes of water and chemicals, continuously in space and time, throughout the rivers, lakes, aquifers and estuaries of the nation.” At first sight, this seems like a lofty goal, so far out into the future as to be unrealizable.
But Global Climate Modeling was similarly thought to be far-fetched when it was initiated, and the first models were not very effective, yet that science has now matured to the point where GCM results are leading to critical national policy decisions. Having a lofty science challenge in mind helps to keep the HIS effort focused on supporting science issues – it is very easy to become preoccupied with the very necessary details of how cyberinfrastructure actually works.

The CUAHSI community challenge has several advantages: it serves to link the four HIS goals (data access, observatories, science, education); it helps to frame science questions that clarify particular aspects of the challenge; it shows that HIS must think in terms of continuous space and time distributions of phenomena, but also consider the movement of water and chemicals into and out of discrete-space objects (rivers, lakes, aquifers, estuaries); it presents a large national scale hydrologic science challenge that the NSF petascale computing infrastructure could support; and it means that HIS must deliver information and integrate modeling across all spatial scales, from global and continental scale weather patterns to water movement and chemical transport and transformation at a point location within a soil column or a stream channel. HIS must also consider all time scales, from instantaneous events like flood peak flows to the very slow evolution of the landscape through geological time. Figure 3 shows typical cartographic mapping scales for geospatial themes that CUAHSI HIS uses to depict information in digital watersheds.

Figure 3. CUAHSI HIS integrates information and modeling across all spatial scales.

CLEANER and WATERS

CUAHSI in geosciences and CLEANER in environmental engineering have been encouraged by NSF to jointly pursue an MREFC (Major Research Equipment Facilities Construction) project called WATERS (Water and Terrestrial Environmental Research System).
This is a very long term goal whose implementation cannot begin until 2011 or later, but in the meantime, it is expected that these two programs will have interoperable systems and facilities. A conceptual diagram of the research process, prepared by Dr Barbara Minsker, PI of the CLEANER planning office, is shown in Figure 4. It contains six components of an integrated cyberinfrastructure process: knowledge services, data services, workflows and model services, meta-workflows, collaboration services, and digital libraries. Knowledge services help scientists and engineers find the information they need quickly and effectively, and the remaining terms have been explained earlier in this proposal. On January 19, 2006, Drs Minsker and Maidment presented a joint seminar to the NSF Engineering Directorate, in which they pointed out that the NCSA environmental cyberinfrastructure project, in which Dr Minsker participates, is focused on the first, fourth and fifth of these boxes (knowledge services, meta-workflows and collaboration services), while the CUAHSI HIS project is focused on the second, third and sixth boxes (data services, workflows and model services, and digital libraries). The two projects are working together to construct this integrated cyberinfrastructure for the research process.

Figure 4. Integrated cyberinfrastructure for WATERS [1]

Surface Process Cyberinfrastructure

On January 18, 2006, a meeting was held at NSF in Washington to coordinate earth surface process cyberinfrastructure. The PI of this proposal (Maidment) developed a meeting book with contributions from all the groups represented at the meeting (see Table 1) to document their aspirations and technical approaches. It is clear that there is a great deal of synergy among these various cyberinfrastructure efforts – indeed, one of the most exciting aspects of cyberinfrastructure development is the opportunity to benefit from data and model sharing with neighboring sciences.
The CUAHSI HIS team will work diligently to accomplish this goal.

Organization | Name | Web site
CLEANER | Collaborative Large-Scale Engineering Analysis Network for Environmental Research | http://cleaner.ncsa.uiuc.edu/
CSDMS | Community Surface Dynamics Modeling System | http://www.nced.umn.edu/CSDMS.html
CUAHSI | Consortium of Universities for the Advancement of Hydrologic Science, Inc | http://www.cuahsi.org
CZEN | Critical Zone Exploration Network | http://www.wssc.psu.edu/
EarthChem | Advancing Data Management in Solid Earth Geochemistry | http://www.earthchem.org/
GEON | The Geosciences Network | http://www.geongrid.org/
IRIS | Incorporated Research Institutions for Seismology | http://www.iris.edu/
LTER | The US Long Term Ecological Research network | http://www.lternet.edu/
NCAR | National Center for Atmospheric Research | http://www.ncar.ucar.edu/
NCED | National Center for Earth-surface Dynamics | http://www.nced.umn.edu/
NCEAS | National Center for Ecological Analysis and Synthesis | http://www.nceas.ucsb.edu/
SAHRA | Sustainability of semi-Arid Hydrology and Riparian Areas | http://www.sahra.arizona.edu/
Unidata | Unidata Program Center | http://www.unidata.ucar.edu/

Table 1. Surface process cyberinfrastructure communities and centers.

[1] This diagram (Figure 4) is a slight amendment of one drafted by Dr Barbara Minsker, University of Illinois.

SAHRA and NCED

The NSF Earth Sciences Division supports two Science and Technology Centers, SAHRA and NCED (see Table 1). Both of these centers maintain substantial hydrology and stream channel morphology data collection efforts, SAHRA in the Upper Rio Grande and Upper San Pedro basins, and NCED in the Angelo Coast Range Reserve in the Eel River watershed, California. NCED has one component of its program called “Desktop Watersheds” which is aimed at developing predictive models for watershed behavior that can guide field investigation.
The CUAHSI Digital Watershed provides a structured information base upon which the predictive models contained within the Desktop Watershed may be constructed. CUAHSI will work with NCED and SAHRA to incorporate their data into the national HIS. A commitment letter to this effect from Dr James Shuttleworth, Director of SAHRA, is attached to this proposal.

CUAHSI Program Office

The CUAHSI Program Office in Washington, DC is the hub of CUAHSI’s operation, and in all policy matters the CUAHSI HIS program is subordinate to CUAHSI’s President and Executive Committee. In discussions between CUAHSI, NSF and HIS, it has been decided that all outreach efforts for the CUAHSI HIS program, such as the annual HIS Symposium, will be funded from the program office. Moreover, to the extent possible, the CUAHSI web site http://www.cuahsi.org will become the central portal to which hydrologic scientists go to learn about and use HIS products.

HIS Committees

The HIS program will have two associated committees, an HIS Standing Committee and a Technical Advisory Committee. The HIS Standing Committee represents the CUAHSI community and provides oversight on the functioning of HIS. This Committee has been functioning throughout HIS Phase 1. Its members are Dennis Lettenmaier, University of Washington (Chairman), Larry Band, University of North Carolina, William Michener, University of New Mexico, Paul Morin, University of Minnesota, and Kelly Redmond, Desert Research Institute. Members of the Standing Committee participate in the HIS biweekly conference calls, and the HIS PI and the Standing Committee Chairman consult with each other from time to time. When necessary, the Standing Committee develops formal reports for the CUAHSI Executive Committee about the HIS program. The Technical Advisory Committee is still being formed, but it will consist of representatives from the SWAQ federal water agencies and from the computer industry.
The purpose of this Committee, which will probably meet twice per year, is to provide advice and guidance to the HIS effort.

GIS and HIS

Geographic Information Systems (GIS) are an important underlying technology in HIS because of the use of geospatial themes in HIS and the use of geospatial location as a mechanism for integrating hydrologic observation networks. The Hydrologic Data Access System is built at the San Diego Supercomputer Center on top of ArcGIS Server version 9.1. This web server technology is a back-end infrastructure that does not in any way impede the manner in which hydrologic scientists receive and use hydrologic data. As indicated in the attached letter from Clint Brown, the Environmental Systems Research Institute has “joined in an informal partnership with CUAHSI and San Diego Super Computing Center to help support the HIS system implementation using GIS technology and methods.”

This collaboration is helpful to the HIS in several ways. First, it provides a worthwhile peer review of our HIS product development strategy from experienced industrial software engineers. The task of building information products to support users in more than 100 universities on a budget of less than $1 million per year is a challenge. The knowledge acquired by listening to experienced ESRI software engineers, whose products are already deployed at all these institutions and widely used in hydrologic science, is helpful to our development team. Moreover, collaboration with ESRI provides a place to refer groups who want to build an HIS for themselves but who do not fit into the domain of NSF’s science focus. At the SAHRA center, for example, there is an aspiration to build an Arizona Hydrologic Information System, to unite water information for Arizona in much the same way that HIS is proposing here to unite that information for the nation. CUAHSI HIS cannot dilute its efforts by supporting State-level HIS programs in various States.
NSF has a policy, stated in the EAR/IF proposal guidelines, that all products developed under this competition shall be open source. There is therefore a policy issue that has to be worked out between CUAHSI, NSF and the participating universities to ensure that all products and information developed in this project conform to appropriate standards and guidelines. The fact that 96% of the CUAHSI community uses the Windows operating system needs to be considered in this policy. A draft CUAHSI policy statement addressing this issue is attached to this proposal. This policy statement will subsequently be discussed, opened for public comment, amended and adopted by the CUAHSI Executive Committee after consultation with NSF. The CUAHSI HIS project will then conform to the policy guidelines thus established.

Tasks

An outline of the proposed tasks to be undertaken during HIS Phase 2 is now presented, using the structure of the products and services developed during Phase 1. This is followed by a section on project management in which the various roles taken by the project team members are explained. It should be understood that the responsibility for executing all these tasks is shared by the team as a whole, and the work is apportioned among the PI’s according to resources and knowledge, in ways that depend on the individual task.

(1) CUAHSI Web Services

Continue to build and deploy web services for hydrologic data from national archives. Work with the agencies who maintain those archives to ensure the services are secure and updated as necessary. Develop the HydroObjects library for local access to web services from applications and programming languages. Design a database for the hydrologic observations metadata catalog and deploy the observation station maps in several formats, such as KML files for Google Earth. Develop application examples and tutorials for utilization of web services by hydrologic scientists.
Develop more web services for observation fields, such as remote sensing, Nexrad and weather and climate grid models. Investigate the OpenDAP web data access system and compare it to OGC standards such as the web coverage and map services. Select an appropriate method for deployment of CUAHSI web services for observation fields.

(2) Hydrologic Data Access System

Develop and maintain the Hydrologic Data Access System, adapting it to new upgrades of the underlying server technology as they occur. Develop a querying system to select sets of stations with particular kinds of measurements available. Incorporate geospatial themes, observation series and fields for CUAHSI regional observatories, and the SAHRA and NCED study regions.

(3) Hydrologic Observations Database

[David T. – this is your slot – please define your role!]

(4) Hydrologic Metadata

[Michael – this is your slot – please define your role!]

(5) Hydrologic Modeling using Web Services

Demonstrate that existing hydrologic simulation models can be individually operated as web services, thus providing hydrologic simulation services. Show that scientific workflows can be used to integrate hydrologic simulation services with existing hydrologic data services to supply the input data for a simulation. Demonstrate that the output from one hydrologic simulation service can be ingested as the input to another hydrologic simulation service operated elsewhere. Show how hydrologic models defined for different spatial scales and hydrologic environments can form an integrated hydrologic simulation and data system. Use the integrated system to address important hydrologic science research questions in the Neuse River basin, North Carolina.

(6) Digital Watershed Toolkit

Develop and maintain the Digital Watershed Toolkit and encourage contributions of tools from CUAHSI community members to form part of this toolkit, including web-based tools like the Utah State Time Series Analyst.
Define a protocol specifying the level of utility, performance and documentation that a tool must meet to be included in the CUAHSI Toolkit.

(7) Hydrologic Information Collections

Build a Hydrologic Information Repository at the San Diego Supercomputer Center so that hydrologic scientists who wish to contribute their data and models can do so without having to maintain the server architecture themselves. Continue to develop the GEON information registration system so that collections can be registered and stored in the repository.

(8) Web Portals

Maintain the HIS web portal and support the deployment of HIS tools in the CLEANER and CUAHSI portals. Investigate the connection with the GeoSpatial OneStop portal for federal information (http://geodata.gov) and determine if this is a suitable venue for a similar outreach for water information – a “Water OneStop”.

(9) Outreach

Present an Annual HIS Symposium and an Annual Report that summarize the findings and products of the HIS project. Present cyberseminars and seminars at various locations around the nation to keep CUAHSI members updated on HIS progress. Prepare tutorials about HIS use and short courses of instruction in HIS tools and methods.

HIS Project Roles

University of Texas – PI – David Maidment

(1) Project Management – Responsible to NSF as a single point of contact for all aspects of the HIS project management. Interacts with the CUAHSI Program Office and President (R. Hooper), and manages partnership efforts with neighboring science communities, federal water agencies, and the computer industry. Coordinates with the four co-PI’s the work being done in their institutions. Is responsible for project reporting and documentation of results from the project as a whole.

(2) HIS Component Development – Prototype web services and tools for time series, multidimensional fields and geospatial themes. Development of the HydroObjects library. Design and development of Digital Watersheds.
San Diego Supercomputer Center – co-PI – Ilya Zaslavsky

Hydrologic Service Oriented Architecture – responsible for development of a web services oriented architecture for hydrology. This includes the Hydrologic Data Access System, including its observations metadata catalog, the web services library, and the hydrologic information repository. Is the key cyberinfrastructure designer for CUAHSI HIS. Responsible for deploying and maintaining 24/7 hydrologic data services infrastructure at SDSC.

Drexel University – co-PI – Michael Piasecki

Hydrologic Metadata – responsible for definition of HIS metadata standards for all components of the CUAHSI Hydrologic Information Model, mediation among the metadata systems used by federal water agencies and CUAHSI metadata, definition of a hydrologic markup language (HML), development of a data search engine for the HDAS, and development of a framework of hydrologic ontologies, including controlled vocabularies for domain keywords.

Duke University – co-PI – Jon Goodall

Web Services for Hydrologic Modeling – responsible for defining how hydrologic simulation models can be wrapped as web services, for designing scientific workflows that combine data and simulation modeling, and for prototyping the next generation of hydrologic simulation models that operate on top of CUAHSI cyberinfrastructure.

Utah State University – co-PI – David Tarboton

Web Services for Hydrologic Observations – responsible for designing a hydrologic observations database, for defining how information from sensor networks and field sampling is loaded into the database, and for web services that publish information from the database.

Project Management

HIS Phase 1 has laid out a reasonable strategy for technical development of the CUAHSI Hydrologic Information System, and to some degree what should follow in Phase 2 is a focusing of that effort to produce usable information products for the CUAHSI community.
To this end, the size of the project team in Phase 2 has been reduced considerably – in Phase 1 there were five PI’s and 12 collaborating scientists. In this proposal there are just five PI’s. This team has become accustomed to working together over the last two years, and maintains frequent communications by email and by means of a two-hour conference call held every two weeks. The same relationship, in which four academic PI’s do research and prototyping and the San Diego Supercomputer Center consolidates and synthesizes the products, will continue in Phase 2.

HIS Phase 1 is a Collaborative Grant where each institution submitted its budget separately to NSF. HIS Phase 2 is proposed as a Cooperative Project where the University of Texas at Austin serves as the single point of contact for NSF, and the other four institutions are subcontracted to Texas. The Cooperative Project mechanism allows NSF a greater degree of control over the effort, including adjustment of the amount and proportioning of the budget from year to year if experience shows that to be necessary. The University of Texas at Austin has waived its right to charge overhead on the funds passed through the university to the subcontractors.

Dr Maidment holds an endowed Chair at the University of Texas at Austin, which comes with a reduced teaching load. He is also Director of the University’s Center for Research in Water Resources, which means that he has a clerical and technical support staff in place. The path of development of CUAHSI’s various programs has demonstrated that the success of a community-based science effort depends critically on the skills and commitment of the program leader. If the present PI (Maidment) were unexpectedly to be unable to continue in that role, leadership of the project would pass to Dr David Tarboton of Utah State University.
Dr Tarboton was host to the widely acclaimed CUAHSI Hydrologic Observatory Conference in Logan, UT, in August 2004, and is a trusted leader within the CUAHSI community. Each Fall semester for the last six years, he and Dr Maidment have been teaching together via the internet a graduate course on GIS in Water Resources, so each is closely acquainted with the technical thinking of the other.

References

Altintas, I., C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, S. Mock, (2004), “Kepler: An Extensible System for Design and Execution of Scientific Workflows”, system demonstration, 16th Intl. Conf. on Scientific and Statistical Database Management (SSDBM'04), 21-23 June 2004, Santorini Island, Greece.

Atkins, D., et al., (2003), “Revolutionizing Science and Engineering Through Cyberinfrastructure”, Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure, 84p., http://www.communitytechnology.org/nsf_ci_report/report.pdf

CLEANER Cyberinfrastructure Committee, (2005), “Review of Hydrologic Information System Status Report”, December 19, 5 p.

ESRI, (2006), ArcGIS ModelBuilder. Accessed February 2006 at http://www.esri.com/software/arcgis/about/desktop.html#modelbuilder

Folk (2005) – HDF format.

GEON, (2005), GEON Annual Report. Accessed February 2006 at www.geongrid.org/communications/annual_reports/Annual_Report_2005_Final_Pub.pdf

Goodall, J.L., (2005), A geotemporal framework for hydrologic analysis, PhD dissertation, University of Texas at Austin, August 2005.

Horsburgh, J. S., D. G. Tarboton and D. R. Maidment, (2005), "A Community Data Model for Hydrologic Observations, Chapter 6," in Hydrologic Information System Status Report, Version 1, Edited by D. R. Maidment, p.102-135, http://www.cuahsi.org/docs/HISStatusSept15.pdf.

Maidment, D.R. (ed.), (2005), Hydrologic Information System Status Report, Consortium of Universities for the Advancement of Hydrologic Science, Inc, September 15, 214pp, http://www.cuahsi.org/docs/HISStatusSept15.pdf

NCSA, (2006), D2K - Data to Knowledge. Accessed February 2006 at http://alg.ncsa.uiuc.edu/do/tools/d2k

NSF Cyberinfrastructure Council, (2005), “NSF’s Cyberinfrastructure Vision for the 21st Century Discovery”, Version 4.0, Sept 26, 24 pp. http://www.nsf.gov/od/oci/CI-v40.pdf

OASIS, (2003), WSRP: Web Services for Remote Portlets (http://www.oasis-open.org/committees/wsrp/)

Tarboton et al, User Survey – cite the EOS paper

Whiteaker et al. 2006 – Map to Map paper.