Environ Geol
DOI 10.1007/s00254-007-0753-3
ORIGINAL ARTICLE
Data integration and standardization in cross-border
hydrogeological studies: a novel approach to hydrostratigraphic
model development
Diana M. Allen · Nadine Schuurman ·
Aparna Deshpande · Jacek Scibek
Received: 7 November 2006 / Accepted: 3 April 2007
© Springer-Verlag 2007
Abstract Data integration—or the merging of multiple
source data sets—is central to hydrogeological studies. In
cross-border situations, data heterogeneities are the source
of most integration problems. Semantic integration of the
subsurface geological terms is undertaken for the Abbotsford–Sumas aquifer, a cross-border (trans-national) aquifer,
which is equally shared by British Columbia (Canada) and
Washington State (US). Subsurface information is largely
derived from water well information submitted to the
respective governments. Use of this information is constrained due to inconsistent use of geological terms in
water well reports. Lack of standardized methodology resulted in 6,000 unique geological descriptions for the
aquifer alone. Semantic standardization of geological
descriptions progressed from database interpretation to
domain expert interpretation. Despite the poor quality of
water well information, trends were observed that facilitated the development of a hydrostratigraphic model that
honors the generalized early conceptual models of the
aquifer, but provides a much higher degree of resolution in
the stratigraphy necessary for groundwater flow modeling.
The standardization protocols introduced support the
model creation despite the constraint of poor quality data.
Keywords Hydrostratigraphic model · GIS · Semantic
standardization · Aquifer heterogeneity · Data integration · Groundwater modelling
D. M. Allen (✉) · J. Scibek
Department of Earth Sciences, Simon Fraser University
Burnaby, British Columbia, Canada V5A 1S6
e-mail: [email protected]
N. Schuurman · A. Deshpande
Department of Geography, Simon Fraser University Burnaby,
British Columbia, Canada V5A 1S6
Introduction
The ability to store, manipulate and visualize data has
made geographical information systems (GIS) a common
tool in many groundwater investigations and groundwater
management activities, particularly those involving large
datasets. GIS is being used to assemble groundwater data,
such as water quality levels, water table surfaces and geological data, and to integrate these with various coverages
(surface water courses, land use, etc.) for groundwater
assessment and management activities. Activities such as
aquifer vulnerability mapping and the construction of
groundwater models depend on an understanding of the
conceptual hydrostratigraphic model, and the development
of such a model is dependent on the availability of subsurface data obtained from field investigations (i.e., drilling
activities and/or geophysical surveys). For large regional
studies, particularly those that are trans-jurisdictional in
nature, the assembled datasets may be very large, very
diverse, and very inconsistent. Specifically, water well
information is often the chief source of the depth-specific
subsurface data. The specific use of water well data for
hydrostratigraphic model generation, however, is often
constrained due to inconsistent geological descriptions
(Russell et al. 1998).
The Abbotsford–Sumas aquifer (Fig. 1), which straddles
the border between British Columbia (BC), Canada and
Washington State (WA), USA, offers a unique opportunity
to explore the semantic issues in building a conceptual
hydrostratigraphic model. The aquifer has been exploited
extensively on both sides of the border (Cox and Kahle
1999). It is one of the largest aquifers in the region, and
supports the activities of approximately 200,000 people
who live in this area. Groundwater is used not only for
drinking purposes, but also supports industrial, farming and
agricultural activities (Cox and Kahle 1999; Kohut 1987).
Such activities have threatened the integrity of the aquifer
(Ricketts 1999); agricultural practices and the poultry
industry have led to widespread occurrence of nitrate
contamination (Cox and Kahle 1999). While Canada is
concerned with the excessive groundwater withdrawal
south of the border (Kohut 1987), the US is concerned with
groundwater contamination that may originate north of the
border (Cox and Kahle 1999). As groundwater is the primary source of water for many inhabitants of the study area,
there is a pressing need to develop groundwater management strategies. Groundwater management strategies,
however, call for a thorough and detailed understanding of
the hydrogeological framework as well as a numerical
groundwater flow model, both of which rely on subsurface
geological maps, which are lacking for this area.
Fig. 1 Extent of the Abbotsford–Sumas Aquifer. Inset map shows
location of the aquifer within southwest British Columbia and
northwest Washington State
Approaches for mapping subsurface geology have
stemmed largely from the petroleum industry (e.g., LeRoy
1955; Tearpock and Bischke 2002; Walker and Cohen
2006). The petroleum industry has the distinct advantage of
having actual core, petrophysical data, paleontological
data, geophysical logs and seismic data, all of which are
collected and interpreted by trained geologists or geophysicists. In the water well industry, by contrast,
drillers have little to no geological training, and the overall
quality of the data is generally very poor, largely because
different drillers use different terminologies to represent
the same geologic units and/or the level of detail varies
widely. Also, geologic information, where collected, is
based on a rudimentary description of cuttings such that
well records typically provide only basic relative grain size
information (e.g., sand or gravel). In addition, geophysical
datasets are few, and paleontological data are non-existent.
Perhaps the greatest shortcoming of water well information
relates to the positioning of wells. Wells are drilled based
on lot development such that well location is random from
the perspective of gaining insight into subsurface stratigraphy and structure (i.e., wells are not strategically placed
to obtain good subsurface data). While strategic drilling
programs are undertaken at specific sites (e.g., superfund
sites), they are rare in regional investigations. Consequently, environmental geoscientists are at a distinct disadvantage in their ability to make sense of subsurface
information.
As a result of these problems, there have been numerous
initiatives to help improve how water well information is
collected and stored in databases. For example, in British
Columbia, standardized geologic terms will be required for
reporting of subsurface geologic information collected
during water well drilling (new provincial Ground Water
Protection Regulation). Other jurisdictions are also experimenting with reporting standards for water well information. Another North American initiative is the development
of the North America data model (NADM), which was
initiated between the Geological Survey of Canada (GSC)
and the United States Geological Survey (USGS). Despite
all of the initiatives that are aimed at improving how data
are collected and stored in databases, existing water well
databases in Canada and the US store water well lithology
data in the form that they were originally collected. Thus,
the problem of standardization and classification of well
log lithologic terms is a continuing problem in many
jurisdictions in North America.
When undertaking hydrogeologic studies, whether
cross-jurisdictional or not, the hydrogeologist is faced with
sifting through databases that comprise possibly several
thousand well records, of variable quality. Within the
Abbotsford–Sumas study area, most depth-specific, subsurface information is dependent on water well information
provided to provincial (BC Ministry of Environment), State
(WA State Department of Ecology), and Federal (Environment Canada, GSC, USGS) government agencies. Lack
of consistency in describing the geology has resulted in
over 6,000 unique geological descriptions for the aquifer
for more than 10,000 water wells in the region. Confounding the problem is the heterogeneous nature of the
geology owing to the complex depositional history.
Herein, we describe a novel approach to developing a
hydrostratigraphic model within a complex geologic setting.
This paper begins with a review of data integration and
standardization issues, and provides background on the
geological history of the study area, which has resulted in a
complex (heterogeneous) distribution of sediments. We
then provide a simple methodology for standardizing and,
hence, reconciling semantic issues where subsurface
information is based on water well reports containing a
large number of unique geological descriptions (in this case
over 6,000). The standardized data, along with supporting
geologic insight, are then used to construct a hydrostratigraphic
model of the aquifer that is validated numerically
in a groundwater flow model.
Materials and methods
Data integration and standardization
Data integration is at the core of GIS and is one of its
defining properties (Vckovski 1998). Although data fuels
the GIS industry, it is the source of most integration
problems.
Data heterogeneities inherent in the component databases have been identified as the source of data integration problems (Stock and Pullar 1999; Vckovski
1999; Bishr 1998; Sheth and Larson 1990). These include (1) syntactic heterogeneity, which stems from the
use of different data models to represent database elements (Bishr 1998); (2) schematic heterogeneity, which
results from different classification schemes employed in
the component databases or structuring of database elements in component databases (Kim and Seo 1991). For
example, in this research the geological descriptions
contained in the well logs are represented by a single
attribute in the BC database and with three attributes in
the WA State database. Schematic heterogeneities also
result from different definitions of semantically similar
entities, missing attributes, and different representations
for equivalent data; (3) semantic heterogeneity, which
occurs when there is a disagreement about the meaning,
interpretation or intended use of the same or related data
(Sheth and Larson 1990). This heterogeneity results from
the different categorizations employed by individuals
when conceptualizing real world objects. Such categorizations differ between individuals depending on education, experience and theoretical assumptions (Stock and
Pullar 1999). An example is the often varied descriptions
used by different drillers to represent the same lithologic
unit. Such semantic heterogeneities have been identified
as the main cause of data sharing problems and are the
most difficult to reconcile (Bishr 1998; Vckovski 1998;
Kottam 1999).
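The schematic heterogeneity described above (one description attribute in the BC database, three in the WA State database) can be illustrated with a minimal sketch. The field names `description`, `desc_1`, `desc_2` and `desc_3` are hypothetical; the text does not give the actual attribute names, and this is not the authors' code.

```python
# Hypothetical field names: the source states only that BC stores the
# geological description in one attribute and WA State in three.
def unify_description(record):
    """Return a single free-text lithology description for a layer record."""
    if "description" in record:  # BC-style record (single attribute)
        return record["description"].strip()
    # WA-style record: join the non-empty parts of the three attributes
    parts = (record.get(key, "") for key in ("desc_1", "desc_2", "desc_3"))
    return " ".join(p.strip() for p in parts if p and p.strip())


bc_layer = {"description": "Silty sand and gravel "}
wa_layer = {"desc_1": "sand", "desc_2": "fine to medium", "desc_3": ""}
print(unify_description(bc_layer))
print(unify_description(wa_layer))
```

Resolving the schema in this way only brings the two sources to a common structure; the semantic differences between the descriptions themselves remain.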
A number of sophisticated approaches to reconciling
semantics have been proposed (e.g., Ahlqvist 2003, 2004,
2005; Visser et al. 2002; Fonseca et al. 2000; Kashyap and
Sheth 1996; Sheth and Larson 1990), but these are generally restricted to the academic domain and are still in the
prototyping stage. Most government agencies and organizations, however, still maintain datasets in relational format (Schuurman 2002), which is an obstacle to adopting these approaches at this stage.
Our method supports standardizing data from multiple
sources in relational data format.
There are two closely related issues that bear on
semantic data integration of geological data: classification
and standardization. Classification is the process of allocating record names (e.g., borehole layer descriptions) to
broader categories. The content of classification systems is
all based on the category name, but the attributes of categories are discerned through implicit knowledge on the part
of the user. There are strategies to deal with the problem of
a lack of universal understanding of category meaning,
including use of data dictionaries—which create equivalencies among different conventions. Standardization entails limiting the number of record names permitted, and
re-assigning existing records to those categories. In this
paper, we introduce a strategy for standardization of diverse nomenclature that accounts for local interpretations
by using semi-automated, rather than fully automated, integration.
Aquifer hydrostratigraphy
Quaternary sediments in the Fraser–Whatcom Lowland
The Fraser–Whatcom Lowland, which hosts the Abbotsford–Sumas aquifer, consists of rolling hills of glacial drift,
60–120 m above broad valley floors. The floodplains are
currently near sea level, and there are several prominent
Tertiary-age bedrock outcrops, such as Sumas and Vedder
Mountains, bordering Sumas Valley (Fig. 1). These sedimentary rocks underlie a thick (up to 600 m) Quaternary
sediment fill (Clague 1994; Easterbrook 1969), consisting
of complex sequences of diamictons and stratified drift, in
various associations with marine and deltaic sediments,
which provide the physical framework that controls the
architecture of the aquifers, the Abbotsford–Sumas aquifer
being the largest.
Our understanding of Quaternary lithostratigraphy has
evolved over many years through mapping of surficial
deposits and detailed stratigraphic studies. Extension of
this lithostratigraphic scheme into the subsurface is difficult, partly because of the remarkable complexity of the
glacial stratigraphy. This complexity has resulted from
interactions between sedimentation and erosion during
advance and retreat of the ice sheets, the concomitant retreat and advance of the seas, and the isostatic effects of ice
loading (subsidence) and unloading (uplift). The various
stages of glaciation include (from the youngest to the
oldest): Fraser Glaciation (20–10 ka); Olympia Interglaciation (60–20 ka); Possession Glaciation (80–60 ka);
Whidbey Interglaciation (100–80 ka), and Double Bluff
Glaciation (>100 ka) (Jones 1999). Therefore, there are
many units of sufficiently high porosity and hydraulic
conductivity to qualify as aquifers, particularly those that
accumulated in close proximity to ice. Mapping has
identified more than 200 aquifers in the region (Ricketts
and Liebscher 1994).
The maximum glacial ice sheet advance corresponds to
the Vashon Stade of the Fraser Glaciation, which deposited
the Vashon Drift (Armstrong et al. 1965). The time of
retreat of the Vashon ice is called the Everson Interstade
(Armstrong et al. 1965), during which glaciomarine sediments were deposited. These are
referred to in the Canadian part of the Fraser Lowland as the
Capilano Sediments and Fort Langley Formation (Armstrong 1981). The same sediments were named Everson
Glaciomarine Drift in the US (Easterbrook 1969), which
also includes some glaciofluvial sediments. The Everson
Interstade ended when the ice re-advanced briefly into parts
of the Fraser Lowland. This episode is called the Sumas
Stade. Sumas Drift was deposited up to 120 m elevation;
large outwash plains and kame terraces were created by
glacial meltwaters (Easterbrook 1969; Armstrong 1981).
Abbotsford outwash is part of Sumas Drift sediments, and
forms the thickest and uppermost layer of the Abbotsford–
Sumas Aquifer. An outwash terrace slopes southward
across the international boundary from a ridge of ice-contact deposits (Easterbrook 1969). The terrace ends at
Lynden, WA, above the Nooksack River valley and
floodplain. The glaciofluvial Abbotsford outwash is composed
of stratified sandy gravel, gravel and sand, mostly
horizontally bedded with some cross-bedding, scour and
fill, and foreset bedding (Easterbrook 1969). The sediments
fine to the south-west, grading from boulder-cobble gravel
along the international boundary to pebble gravel, then to sand
near Lynden. South of the Nooksack River valley, there is
much recent alluvium and the Abbotsford outwash may
or may not be present. The Lynden terrace is interrupted by
the modern floodplain of the Nooksack River, but continues south of the river for several kilometers and terminates
against highlands composed of Everson glaciomarine drift
(Easterbrook 1969). A number of lakes and peat bogs occur
in abandoned meltwater channels and kettles on the outwash terrace. The lakes include Abbotsford Lake, Laxton
and Judson Lakes, Pangborn Lake and smaller ponds.
Table 1 shows a compilation of the geologic units in
comparison to hydrostratigraphic units identified by Halstead (1986). The regional hydrostratigraphic framework is
discussed in the following section.
Hydrostratigraphic framework
Table 1 Hydrostratigraphic units (Halstead 1986), comparing US and Canadian geologic units (compiled by Golder Associates 1995)
| Hydrostratigraphic units (a) | Possible geologic equivalents: US geologic units (b) | Possible geologic equivalents: Canadian geologic units (c) | General geologic description |
|---|---|---|---|
| C1 | Qt | | Glaciofluvial sand and gravel deposited by meltwater streams, often occurring as raised deltas |
| C2 | Qp peat; Qal alluvial deposits; Qsc silt and clay; Qs till and ice-contact deposits; Qso outwash sand and gravel | Fraser River and Salish sediments; Sumas drift | Fluvial and floodplain deposits of silt, sand, gravel and peat; till, glaciofluvial, and ice-contact deposits; outwash sand and gravel |
| A/B | Qb Bellingham drift; Qk Kulshan drift | Fort Langley formation and Capilano sediments | Glaciomarine deposits consisting of stony clays, and stony silt with marine shells |
| C3 | Qd Deming sand | Fort Langley formation and Capilano sediments | Stratified, well sorted sand and gravel with some layers of clay, silt and gravel |
| D | Qvt Vashon till; Qve Esperance sand | Vashon drift; Quadra sand | Till and ice-contact deposits of poorly sorted gravel in matrix of silt, clay and sand; and glaciofluvial deposits of sand and gravel |
| E | Kulshan drift; Pre-Vashon marine deposits | Bellingham drift, Capilano, Fort Langley, Cowichan Head formations | Clay and silt, with interbedded estuarine and fluvial deposits of fine sand and silt |
| C4 | Pre-Vashon sediments | Pre-Vashon sediments | Fine to medium sand of fluvial or glaciofluvial origin |
| F | TKc Tertiary bedrock; Th | Tertiary bedrock | Tertiary-aged consolidated sedimentary deposits and interbedded volcanic deposits |
(a) Halstead (1986)
(b) After Easterbrook (1976)
(c) After Armstrong (1981)
The original 3D mapping of the Abbotsford aquifer
(extended only in BC north of the US–Canada boundary) must
be credited to Halstead (1986), who produced a series of
detailed fence diagrams and maps showing interpreted
aquifer units and depths of water wells. Halstead’s work
preceded numerical flow modeling, and the products were
paper maps rather than digital databases or CAD drawings.
Halstead (1986) defined hydrostratigraphic units on the
basis of lithology, permeability and porosity, and subordinate factors, such as origin (marine, fluvial), stratigraphic
position, and to some extent aquifer type (e.g., water table
aquifers); they are different from the formal lithostratigraphic units, which are defined primarily on mappability
and degree of homogeneity. Halstead’s scheme is useful
from a regional perspective, although detailed mapping
reveals some ambiguities. Halstead’s Unit C (Table 1), for
example, corresponds to the easily mapped, unconfined
Sumas Drift aquifers. However, other hydrostratigraphic
units, such as units A and B, correspond to the more heterogeneous Fort Langley and Capilano Formations, and not
specifically to aquifers within these formations. Several
aquifers mapped herein are confined by finer grained Fort
Langley–Capilano deposits—some of the confined aquifers
appear very similar in general sedimentological characteristics and map extent to the unconfined Unit C aquifers.
Halstead (1986) grouped the sediments into six units of
significance to groundwater, either acting as barriers to
flow or as units that readily transmit ground water. These
are summarized in Table 1, along with their lithostratigraphic equivalents.
In 1993 the GSC began a series of hydrogeologic
investigations and analyses in the Fraser Valley, with the
main objective being the analysis of regional stratigraphic
framework of aquifers, groundwater flow, discharge and
recharge dynamics (e.g., Ricketts and Jackson 1994).
Similar work has been carried out in Washington State,
notably by Kahle (1991), Jones (1999), and Cox and Kahle
(1999) at the USGS, and Tooley and Erickson (1996) at
Washington State Department of Ecology (WA Ecology).
The USGS studies are known as the LENS (Lynden–
Everson–Nooksack–Sumas) hydrogeologic study and have
been published as the Water Resources Investigations Area
(WRIA) 1 report.
There were also small hydrogeologic projects near
various landfill sites, water supply well locations, and a
proposed power plant site (e.g., Golder Associates 1995;
Gibbons and Culhane 1994; Mitchell et al. 2000; Piteau
Associates 1991, 2002).
Well litholog database
To develop a hydrostratigraphic model of the aquifer, water
well data from various sources were acquired. Sources
of lithology data include: WA Ecology (WRIA 1 of
USGS regional groundwater study database) and NWIFC
(Northwest Indian Fisheries Commission), BC Ministry of
Environment (BC MoE) (WELLs database), BC Ministry
of Energy and Mines (several deep exploration boreholes),
GSC reports and papers, BC Ministry of Transportation
(bridge construction sites), and Simon Fraser University
(Cameron 1989, MSc thesis with lithologs of Sumas
Valley).
The BC MoE has a very extensive well record digital
database. In BC, submission of well drilling reports is
currently not mandatory, and there is no training of drillers
in the preparation of well lithologs. As a result, the
lithology information and location of wells contain many
errors. Typically, litholog quality depends on the driller’s
experience and/or education in geology, the amount of
detail recorded in the database, transcription errors, the method of
drilling, the geologic setting, misplaced well records, and
incorrect well locations. Location coordinates were not
available for many wells. In many instances, the given
coordinates were not accurate, and the error was not
known. For some of the wells, the coordinates were taken
from address information and matched to addresses in
Street Network files. In the lithologic records, the problems
included incorrectly formatted text output, truncated
lithologs (e.g., maximum 24 layers per litholog), lack of
ground elevation of top of well (very common), missing
uppermost unit (assumed to be soil where thin), and
problems with conversion of unit top and bottom depths to
elevations above sea level (which requires the elevation of the top of the well).
Washington State Department of Ecology (WA Ecology) maintains a large and detailed digital database of well
drill records, including lithologs. The lithologs, however,
are mostly in .tif image format, from scanned images of
paper forms. In some areas, local governments and organizations have entered the information from paper forms to
database records and these can be queried directly. In
northern Whatcom County, the south part of our study area
along the Nooksack River and on Lynden terrace, the
database has only images of litholog paper forms, which
had to be entered into text and numeric digital format. In
the WA Ecology logs the values are in feet and locations
are in latitude/longitude. All were converted to meters and
the UTM coordinate system.
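The unit conversion mentioned above, together with the conversion of layer depths to elevations noted for the BC records, can be sketched as follows. The function name and the ground elevation value are illustrative assumptions; the projection of latitude/longitude to UTM is not shown because it would require a projection library.

```python
FT_TO_M = 0.3048  # exact length of the international foot in metres


def layer_elevations(ground_elev_m, top_ft, bottom_ft):
    """Convert layer depths (feet below ground) to elevations (metres above sea level)."""
    top_m = top_ft * FT_TO_M
    bottom_m = bottom_ft * FT_TO_M
    return ground_elev_m - top_m, ground_elev_m - bottom_m


# Hypothetical well: ground surface at 50.0 m above sea level,
# with a layer logged between 20 ft and 22 ft depth.
top_elev, bottom_elev = layer_elevations(50.0, 20, 22)
print(round(top_elev, 3), round(bottom_elev, 3))
```

A missing ground elevation, which the text reports as very common, makes this conversion impossible for the affected wells until an elevation is obtained from another source.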
Data from all sources were integrated to create a single
database. Figure 2a shows the shallow wells (0.6–32 m
depth) and Fig. 2b shows the deep wells (32 to >200 m
depth). Data integration challenges relating to data sources, quality, access to information, and schematic and
syntactic heterogeneities are described in Deshpande
(2004). The semantic issues are discussed herein.
As the data were obtained from multiple sources, the
geological descriptions varied in style, classification and
naming conventions. For example, the geological classification
for the bridge construction reports was based on the
Unified Soil Classification System, which is used for
engineering purposes and is based on the particle size, liquid limit and plasticity index; the drill core record
descriptions were based on the Wentworth Scale; GSC
geological descriptions were based on the stratigraphy and
environment of deposition, and the drillers’ descriptions
were based on experience or education. These semantic
differences and lack of standardized descriptions for the
study area resulted in 6,000 unique categories.
Table 2 Varieties of descriptions for potentially the same lithology
| i | Description |
|---|---|
| 1 | Brown fine to medium sand and gravel, and some silt |
| 2 | Silty sand and gravel, fine–medium, brown |
| 3 | Sand, fine to medium, brown and silty, gravel, fine to medium |
| 4 | Brn. fn./med. sand and gravel with silt |
| 5 | Silty sand and gravel |
| 6 | Sand with gravel |
| 7 | Sand |
| ... | |
| n | |
Fig. 2 Borehole locations and depths in central Fraser valley (all data
sets): a shallow wells (0.6–32 m depth), b deep wells (32 to >200 m
depth)
Lithologic data standardization
A borehole litholog is a record of geologic materials
encountered at different depths during the drilling process.
The level of precision in such records varies between wells,
and probably within each well. Numerous contractors and
hydrogeologists have contributed information to the well
databases; the variables are quality of expertise, field
conditions, drilling purpose, cost of drilling, well depth and
size, litholog translation into the database, and database
management. Typically, lithologs follow a format that
identifies the top and bottom depth of each layer, and gives a
description of the lithology encountered at each depth interval.
The choice of words varies slightly to significantly between
different lithologs, even those that describe the same
material type. For example, consider a layer of unconsolidated deposits consisting of sand (60% by volume) and
gravel (30% by volume) with properties of fine to medium
grain size in each, brown in color, and containing some silt
(5% by volume). This lithology could be described in any of n different ways (i = 1 to n), as shown in
Table 2. Each of the descriptions in Table 2 is unique,
ambiguous to some extent, and typical of lithologs in the
well databases. Some well reports describe more lithological details than others, and the degree of generalization
varies as well. Furthermore, the complexity is increased by
frequent non-standard abbreviations and word misspellings, grammatical ambiguities, and variable delimiters
(comma, slash, space). When using these data some
assumptions were made:
1. The descriptions are taken literally and describe the
actual lithology of the site where the borehole was
drilled. However, wells can be assigned litholog
quality designations that can be used for weighting the
data in further analysis. These are subjective criteria
and may be based on the amount of detail written in a
litholog, the date of drilling, and the well size, depth, and purpose,
noting that larger hydrogeologic studies usually involve professional hydrogeologists.
2. Each litholog can be successfully interpreted. Therefore, lithologs that are too ambiguous cannot be used.
3. The data are output correctly from the database. In
each litholog, the sequence of layers has to be correct,
and the depths of layers must be in the correct order.
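The third assumption lends itself to an automated check. The following is a minimal sketch, assuming each litholog is available as a list of (top depth, bottom depth) pairs in the order output by the database; the function name and the sample data are hypothetical and this is not the authors' code.

```python
def check_layer_order(layers):
    """Return a list of problems found in a litholog's (top, bottom) depth pairs."""
    problems = []
    previous_bottom = None
    for i, (top, bottom) in enumerate(layers, start=1):
        if bottom <= top:
            problems.append(f"layer {i}: bottom depth {bottom} is not below top depth {top}")
        if previous_bottom is not None and top < previous_bottom:
            problems.append(
                f"layer {i}: top depth {top} overlaps previous layer ending at {previous_bottom}"
            )
        previous_bottom = bottom
    return problems


good = [(0, 20), (20, 22), (22, 28)]
bad = [(0, 20), (18, 22), (28, 22)]  # overlapping layer, then inverted depths
print(check_layer_order(good))
for message in check_layer_order(bad):
    print(message)
```

Lithologs that fail such a check would need to be corrected or set aside before standardization.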
Pre-processing of text files
A series of pre-processing steps was required prior to
standardization and classification to deal with database
structure issues. These steps were undertaken using a
custom computer code. The first step was to parse and sort
the data into several fields (e.g., a unique well identifier,
UTM coordinates, layer top and bottom depths, layer
lithology description, and additional well information, such
as yield and screen depth). A multi-dimensional array of
wells and their lithologs was then processed further. At this
stage a litholog layer is the smallest unit of data aggregation. An array of litholog data for one well is shown in
Table 3.
Table 3 Example of a litholog array for a single well
| Layer | Top depth | Bottom depth | Lithology description |
|---|---|---|---|
| 1 | 0 | 20 | Silty sand and coarse gravel |
| 2 | 20 | 22 | Sand and coarse gravel |
| 3 | 22 | 28 | Grey/brn med. sand few pebbles |
| 4 | 28 | 48 | Grey/brn med. sand-coarse gravel few pebbles up to 2″ |
| 5 | 48 | 84 | Grey/brn med coarse sand some gravel few pebbles |
| 6 | 84 | 136 | Gry/brn med coarse sand, trace med gravel at 117′ |
| 7 | 136 | 156 | Gry/brn fn-med sand trace silt |
Word recognition process
A module was written for word recognition. For each litholog layer, the text is broken up into word groups as
delineated by word separators in the original text (i.e.,
commas, slashes, or other characters). The word groups
preserve the grammatical structure of the source text. Each
word is read separately and compared to a custom dictionary of geological terms. This dictionary consists of lists of
words and their alternative spellings for different categories of words, based on grammatical meaning. These lists
were developed for words describing rock and unconsolidated sediment materials (e.g., “granite”, “sand”), words
specifying grain size, color, sedimentological structure or
rock structure (e.g., “interlayered”), modifying words
such as “sandy” or “wet”, hydrogeologic terms, words
describing technical aspects of well design and drilling
process, and special words used to recognize grammatical
relationships between words (e.g., “and”, “to”). One list
also links some modifying words to material types such as
“sandy” to “sand”.
For each word in the dictionary there may be many
alternative spellings, abbreviations, and synonyms. For
example, the color “brown” is often written in lithologs as
“brn” or “brwn”. In extreme cases, a commonly used
word “gravel” is spelled in all of the following forms in
the database: “gravel”, “grav”, “grv”, “gravels”,
“grvl” in a combination of lower and upper case letters.
Therefore, each word is also converted to lower case as a
default.
Word recognition reaches practical limits where words
are badly misspelled, joined together (missing separator),
or totally ambiguous. The program also outputs a list of
unrecognized words, which are checked by the user who
then updates the appropriate word lists. The program is rerun for the entire database until all important words are
recognized. For example, the text “fn. to med. gry. sand &
grav. with coarse gravels” is recognized as “fine–medium
grey sand and gravel and coarse gravel.”
Material property assignment
The largest challenge concerned grammatical structures of
litholog text. In that text there are descriptions of different
materials (rock or unconsolidated deposit) and their properties. The materials are also arranged in order of importance, where usually the most abundant material is specified
first, and all other subsequent materials are present in
smaller amounts. There are exceptions, identified by such
words as “and”, which relate two materials as being
equally abundant in a layer. For grain size ranges, the word
“to” links size descriptors such as “fine” or “coarse”, and
the “–” dash character may also be used instead of “to”. The
modifying terms such as “silty”, when combined with a
material such as “sand”, have a special meaning, from
which two separate materials “sand” and “silt”, the silt
being the lesser amount, must be extracted to standardize
this text. The complexities grow exponentially with poorly
constructed sentences and ambiguous sentences.
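The word recognition and material extraction steps described so far can be sketched together in a few lines. This is an illustrative simplification, not the authors' program: the word lists are small hypothetical samples, and the sketch keeps linking words such as “to” and “with” as they are rather than rewriting the grammatical structure.

```python
import re

# Illustrative word lists only; the authors' dictionary is not reproduced in the text.
SYNONYMS = {
    "fn": "fine", "med": "medium", "crs": "coarse",
    "gry": "grey", "brn": "brown", "brwn": "brown",
    "grav": "gravel", "grv": "gravel", "grvl": "gravel", "gravels": "gravel",
    "&": "and",
}
MATERIALS = {"sand", "gravel", "silt", "clay"}
MODIFIERS = {"silty": "silt", "sandy": "sand", "clayey": "clay", "gravelly": "gravel"}
KNOWN = MATERIALS | set(MODIFIERS) | set(SYNONYMS.values()) | {"to", "with", "some", "trace"}


def normalize(text):
    """Lower-case the text, split it into words, and map known variants to standard terms."""
    words = re.findall(r"[a-z]+|&", text.lower())
    return [SYNONYMS.get(w, w) for w in words]


def unrecognized(words):
    """List the words that would be passed to the user for review."""
    return sorted({w for w in words if w not in KNOWN})


def extract_materials(words):
    """Return (major, minor) materials; a modifier such as 'silty' yields a minor material."""
    major, minor = [], []
    for w in words:
        if w in MATERIALS and w not in major:
            major.append(w)
        elif w in MODIFIERS and MODIFIERS[w] not in minor:
            minor.append(MODIFIERS[w])
    return major, [m for m in minor if m not in major]


print(" ".join(normalize("fn. to med. gry. sand & grav. with coarse gravels")))
print(extract_materials(normalize("Silty sand and gravel")))
print(unrecognized(normalize("brn sandd and grvl")))
```

Handling relative abundance, grain size ranges, and ambiguous sentences would require additional rules of the kind the text describes.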
The goal is to extract all the materials and all separate
properties, in standard form, from all lines in all lithologs.
This task involves an iterative process of test-and-run to
verify the results. It is most economical to train the program
on about 5% of the cases, let the program handle about 80%
of the cases, and verify the remaining 15% of cases by visual
inspection, without further modifying the program in an attempt to improve it. Software that successfully recognized >95% of the lithologs with proper grammatical
relationships would almost have to mimic a human reader,
and is thus impractical to develop because of its complexity.
Standardized lithologs
Table 4 shows an example of a non-standardized litholog
for one well and Fig. 3 shows the standardized litholog for
the same well. The standardized forms can be queried in a
database environment using SQL statements or other
methods, and layers can be generalized for spatial and
structural analysis. Once standardization was complete, the
original records were compared to the standardized format
for a sample of wells.
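As an illustration of querying standardized lithologs with SQL, the sketch below uses Python's built-in sqlite3 module. The table layout (`litholog` with `well_id`, `top_m`, `bottom_m`, `material`) and the four sample rows are invented for this example and do not come from the study database.

```python
import sqlite3

# In-memory database with a hypothetical schema for standardized lithologs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE litholog (well_id TEXT, top_m REAL, bottom_m REAL, material TEXT)"
)
# Invented sample intervals (depths in metres below ground surface).
rows = [
    ("W1", 0.0, 3.0, "gravel"),
    ("W1", 3.0, 4.5, "clay"),
    ("W2", 0.0, 6.0, "sand"),
    ("W2", 6.0, 8.0, "clay"),
]
conn.executemany("INSERT INTO litholog VALUES (?, ?, ?, ?)", rows)


def clay_thickness_by_well(connection):
    """Return (well_id, total clay thickness in metres) for each well with clay."""
    cursor = connection.execute(
        "SELECT well_id, SUM(bottom_m - top_m) FROM litholog "
        "WHERE material = 'clay' GROUP BY well_id ORDER BY well_id"
    )
    return cursor.fetchall()
```

A query of this kind is only possible once every interval carries a single standardized material term, which is the purpose of the standardization step.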
Litholog data classification and interpretation
Rules were developed for litholog classification, which were
used as guides for constructing the geologic cross-sections.
Table 4 Example of a non-standardized litholog
Coarse gravel and silt
Clean coarse sand and small w b gravel
Very coarse sand/coarse gravel and fine silt
Coarse sand and med sand
Med sand/thin clay layers and some boulders
Med and coarse sand/fine sand and silt
Coarse gravel with clay layers
Coarse sand and some gravel
Medium sand with pebbles
Gravel/some sand
Very coarse gravel/very little sand
In essence, the classified well logs aided the interpretation
process, but ultimately did not replace geologic expertise.
Material classification was undertaken over a series of passes, each reducing the output to a lower number of material
types. The classification is simple when only one material
type is present because the classified material is the same as
the constituent material. For mixtures, a series of rules was
applied (Table 5). These rules are not ideal and may be modified, but they attempt to capture the important hydrogeologic
properties of the subsurface materials.
Classification was aided by expert knowledge of the
local geology and depositional environment. In the Abbotsford uplands, gravel is the dominant material, followed
by sand and larger boulders, and locally clay and silt. The
clay occurs in lenses associated with tills, and silt content
may vary in the sand matrix of those gravels. Small silt
lenses may be present, but silt is more common as lacustrine deposits in the Sumas Valley. Clean gravels are rare
except in lenses of fluvial or glaciofluvial deposits. Sand
usually occurs with other materials, and most commonly
with gravel. Due to lack of other information, the occurrence of sand as the dominant lithology in the litholog was
interpreted as pure sand (fine to coarse—unknown grain
size distribution).
Fig. 3 Standardized litholog
output from a custom
standardization code
In the central Fraser Valley, clays are present in most
deep lithologs at some intervals, and in most
boreholes in the Langley uplands, where Fort Langley Formation glaciomarine stony clays outcrop at ground surface or
lie beneath thin sands or gravels of Sumas drift or other
coarse grained sediments. Clay is almost exclusively
associated with clay-rich tills. Thus, intervals containing
predominantly clay, or at least clay as secondary material,
were classified as clay material. Therefore, most ‘‘clay’’
intervals inherently contain a mixture of sand, gravel, boulders, or silt. Furthermore, we expected that many lithologs
confuse silt with clay. From a groundwater flow point of
view, clay and silt both have low hydraulic conductivity
relative to sand and gravel. Therefore, if clay was present as a
major constituent, it was classified as clay. If clay contained
silt, or was interbedded with silt, it was classified as clay. If
clay was present as a minor constituent or trace amount, and
it was a thin layer, then clay was ignored. Although small silt
lenses may be present, silt is more common as a lacustrine
deposit in the Sumas Valley, which experienced more
lacustrine flooding and deposition. If silt was present as a
major constituent, then the material was classified as silt,
whereas if silt was present as a minor constituent or a trace
amount, and was a thin layer, then silt was ignored.
Sometimes bedrock fragments can be mixed with other
materials near the bedrock surface. In such cases, the other
material was considered dominant. Occasionally, bedrock
was recorded between layers of unconsolidated materials.
Here, it was assumed that the recorded bedrock was in fact a boulder.
Soil and fill were ignored. These are very local in extent
and are usually in the unsaturated zone, so do not play a
major part in saturated groundwater flow, although it is
recognized that soils impact recharge significantly. Soil was
considered in recharge modelling (Scibek and Allen 2006).
Some thin layers represent local lenses of materials, while
others are part of larger, but thinning, continuous layers. For
interpolation purposes, some layers were aggregated into
larger, more generalized layers, some thin layers were preserved, and others were ignored. During interpolation, these
decisions may be changed to provide a better fit to the data.
However, the borehole density is insufficient in some
locations to resolve the detailed stratigraphy.
Table 5 Primary rules for classification of mixed sediments as recorded in lithologs from the Abbotsford–Sumas aquifer
Gravel
Gravel + cobbles (or boulders) = gravel
Gravel + sand = gravel (if gravel is the dominant material)
Gravel + clay or silt = NOT gravel (see clay or silt classification)
Sand
Sand + gravel = sand
Sand + other material = sand
Clay
Clay + silt = clay
Other material + trace clay or thin clay layer = other material
Other material + clay = clay
Silt
Other material + trace silt or thin silt layer = other material
Other material + silt = silt
Soil and fill
Soil and fill were ignored
Bedrock
Other material + bedrock = other material
Other material underlying bedrock = bedrock
Thin layers
Preserve all clay layers as these are important for groundwater flow
Preserve silt layers if the thickness is significant (the threshold can be adjusted)
If clay is interbedded with thinner layers of other materials, then generalize this group of layers as clay (same rules apply to thin layers of silt)
The actual implementation of rules for classification was
done in Visual Basic (VB code) and run on an Excel
spreadsheet with the litholog database. The frequency of
occurrence and average thickness of each material class
(graphed in Fig. 4 for the second and third classifications)
are very helpful in selecting appropriate aggregation rules
and material classes. Ultimately, the database was reduced to five material
classes and these were used to map the regional trends in
lithology. However, intermediate classification results aided in the interpretation at the local level.
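To make the mixture rules of Table 5 concrete, the following sketch expresses a subset of them in Python. It is not the Visual Basic code used in the study. The function name `classify` and its three arguments are hypothetical, and the sketch assumes that each interval has already been split into a first-listed (dominant) material, other non-trace materials, and trace or thin-layer materials.

```python
def classify(dominant, secondary=(), trace=()):
    """Classify one litholog interval using a subset of the Table 5 rules.

    dominant:  first-listed (most abundant) material
    secondary: other materials present in more than trace amounts
    trace:     trace materials or thin layers; these do not affect the class
    """
    # Soil and fill are ignored.
    present = [m for m in (dominant, *secondary) if m not in ("soil", "fill")]
    if not present:
        return None
    # Other material + clay = clay; clay + silt = clay.
    if "clay" in present:
        return "clay"
    # Other material + silt = silt.
    if "silt" in present:
        return "silt"
    # Other material + bedrock = other material.
    non_bedrock = [m for m in present if m != "bedrock"]
    if not non_bedrock:
        return "bedrock"
    # Gravel + sand = gravel if gravel is dominant; sand + gravel = sand.
    return non_bedrock[0]
```

For example, `classify("gravel", ["sand"])` returns gravel, `classify("gravel", ["clay"])` returns clay, and `classify("gravel", [], trace=["clay"])` returns gravel because trace clay is ignored. The thin-layer rules, which depend on interval thickness and on neighbouring intervals, are not covered by this sketch.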
Results
Early attempts at constructing a traditional layered
hydrostratigraphic model for the Abbotsford–Sumas
aquifer using the standardized well database were fraught
with difficulty. Deshpande (2004) reported that it was
practically impossible to fit any ‘‘surfaces’’ to the very
heterogeneous Quaternary sediments in that area, despite
having only five material classes. In this aquifer, the heterogeneity of sediments is such that a fit to any regional
‘‘geologic layers’’ or ‘‘hydrostratigraphic layered units’’
would reduce model resolution so greatly as to make it
impossible to calibrate at a regional scale, let alone
to local conditions.
The approach used required the HUV package in
MODFLOW 2000 (Waterloo Hydrogeologic Inc. 2000) to
represent geology in a 3D grid, rather than assigning geologic layers (where possible) to MODFLOW layer surfaces. The primary reason for standardizing the lithology
database was to allow pseudo-3D computer representation
and manipulation of the borehole logs in geospatial databases and flow modeling software. At first, the GMS 4.0
software (Brigham Young University 2002) was used to
examine the information, but the very large quantity of data
(>2000 good quality boreholes) slowed the software
so much that it was impractical to use. The second
solution involved the use of ArcGIS 8.3 (ESRI 2004) to
display the boreholes in 3D, together with pre-defined
MODFLOW surfaces (slices of the aquifer area without
regard for geology, but thinning toward ground surface to
increase resolution of mapping), ground and bedrock surfaces, and surficial geology polygons (Fig. 5). The software (ArcScene module in ArcGIS 8.3) allows rotation
and zooming in 3D, and proved to be very fast and
easy to use. Lithologic materials were color-coded for
quick reference. The following colors are used to represent
different lithologies: clay (blue), silt (green), gravel (orange), sand (yellow). Surficial fill or soil units were not
displayed to simplify the materials to only four general
types.
The goal of the mapping was to fill the 3D space of the
model domain with geologic materials, classed into
lithostratigraphic units, based on borehole lithologs. The
lithostratigraphic units were also identified as hydrostratigraphic units by assigning appropriate hydraulic conductivity, porosity, and storativity values (as determined
from pumping test results). Lithostratigraphic units could then
be joined with others on the basis of their hydraulic
properties.
The MODFLOW grid was then defined. Grid layers
were created as slices (flat where possible), and thickening downward. Near ground surface, the layers were
thin (3 m first layer, 5–10 m second in the uplands, 1–
3 m in lowlands). MODFLOW requires continuous layers
and some judgment was required to create appropriate
slice elevations. This was done using GIS, where elevation zones were created for each slice surface, then imported to MODFLOW as xyz surface elevation points.
The surfaces were displayed in GIS during the mapping
process. For example, to map geology in layer 4, a view
of the bottom of layer 4 would be displayed, effectively
truncating all deeper lithologs in the view; a view of the
bottom of layer 3 could be switched on and off to
constrain the mapped litholog intervals. Mapping was
done city block by city block (street network and
drainage network were used as orientation guides), small
area by small area, directly into Visual MODFLOW
software (WHI 2004), by ‘‘painting’’ zones of geologic
materials on the MODFLOW grid in each layer. In each
small area, all boreholes were examined from many
views, through all layers, checking with surficial geology,
and also viewing row and column cross sections of the
MODFLOW grid with defined (color-coded) material
zones. This novel approach effectively bypassed the
difficult steps of creating a 3D solid model (i.e., continuous geologic layers). The model is regionally consistent with the hydrogeologic fence diagrams developed
by Halstead (1986) and the cross-sections developed by
Cox and Kahle (1999), but provides the much greater degree of resolution necessary for numerical groundwater
flow modeling. Figure 6 shows slices through the
MODFLOW model, representing each model layer (1
through 8). The bottom layer of the model consists entirely of clay overlying impermeable bedrock.
Fig. 4 AB–SUM aquifer litholog classification histograms of material occurrence and thickness. Left side: material classes in lithologs after second aggregation-reclassification pass. Right side: material classes in lithologs after third aggregation-reclassification pass
Fig. 5 Two litholog database in 3D intersected by MODFLOW surfaces (constructed in ArcScene, ESRI 2004)
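The layer-by-layer truncation of lithologs described above was carried out visually in ArcScene. A programmatic analogue is to clip each borehole's intervals to the elevation range of one model layer, as sketched below. The function `clip_to_layer` and the tuple layout (top elevation, bottom elevation, material) are assumptions made for illustration.

```python
def clip_to_layer(intervals, layer_top, layer_bottom):
    """Keep only the parts of litholog intervals that fall inside one model layer.

    intervals: list of (top_elev, bottom_elev, material), elevations in metres,
               with top_elev > bottom_elev for each interval
    layer_top, layer_bottom: elevations of the layer surfaces at the borehole
    """
    clipped = []
    for top, bottom, material in intervals:
        new_top = min(top, layer_top)
        new_bottom = max(bottom, layer_bottom)
        # Intervals entirely above or below the layer collapse and are dropped.
        if new_top > new_bottom:
            clipped.append((new_top, new_bottom, material))
    return clipped
```

With a layer between 47 m and 35 m elevation, a borehole logged as gravel (50 to 45 m), sand (45 to 30 m) and clay (30 to 20 m) would contribute gravel from 47 to 45 m and sand from 45 to 35 m to that layer.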
After initial groundwater flow model calibration attempts (see Scibek and Allen 2005), there were areas with
large residuals that did not respond to changes in hydraulic
conductivity within a reasonable range for each mapped
K-zone (hydrostratigraphic unit zone). In those areas, the
geology was re-interpreted, again from borehole lithologs,
this time with much more attention paid to possible
interpretations, keeping in mind the model residuals
and surficial geology, and looking at individual borehole
records to verify standardized lithologic units. In many
areas, several interpretations of local geology are possible
because of the poor distribution of boreholes. The interpretation favoring lower model residuals was selected and
the geology re-mapped in that area. Therefore, the
groundwater flow model was used to guide interpretation of
the subsurface geology in this area: the attempt to explain
groundwater levels, flows, existence of lakes and other
features gives additional information to help interpolate
the geology from poorly distributed boreholes.
Fig. 6 Hydrostratigraphic model of central Fraser valley fill by MODFLOW layer (nearly-horizontal slices of valley)
The groundwater flow model ultimately achieved a
normalized root mean square (RMS) error of 7.15% using
roughly 1,700 static water levels from drilled wells. The
main source of error in model calibration stems from our
inability to adequately map highly conductive gravels and
sand, and less conductive ‘‘dirty’’ gravel and sand mixtures
that contain some finer grained materials, which lower the
hydraulic conductivity. Typically, the difference can be as
much as one order of magnitude (e.g., 300 m/day for clean
gravels and sands, and 20 m/day for dirty gravel). From
most of the well lithology information, it is impossible to
distinguish these high and low zones.
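The text does not define the normalized RMS error. A common convention, assumed here, divides the root mean square of the residuals by the range of observed heads and expresses the result as a percentage. The function name `normalized_rms` and the sample heads in the usage note are illustrative only.

```python
import math


def normalized_rms(observed, simulated):
    """Normalized RMS error in percent, assuming normalization by observed range."""
    residuals = [s - o for o, s in zip(observed, simulated)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return 100.0 * rms / (max(observed) - min(observed))
```

For observed heads of 10, 20 and 30 m and simulated heads of 11, 19 and 31 m, every residual has magnitude 1 m, so the RMS error is 1 m and the normalized value is 5% of the 20 m observed range.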
It is expected that similar difficulties would be encountered in
other heterogeneous aquifer systems, and that alternative
(stochastic) approaches may provide a solution to representing heterogeneity (e.g., T-PROGS module in GMS).
However, such software similarly requires classification of
material types, and the approach used herein could provide
a means for achieving such a classification.
Conclusions
In the absence of field investigations, water well
reports can be an invaluable source of information. This
was the case for the Abbotsford–Sumas aquifer where
water well reports were the chief source of depth specific
geological information. The extremely poor data quality
and semantics of geological descriptions, however, proved
a hindrance in the development of the hydrostratigraphic
model. Due to the lack of a consistent methodology for documenting geological descriptions, 6,000 unique geological
descriptions were observed for the study area alone.
Semantic inconsistencies in the geological descriptions
were adequately resolved by reclassification. A further
confounding issue was the high degree of heterogeneity
resulting from a complex geological history, which prevented the development of a traditional hydrostratigraphic
model, based on a layered paradigm. The hydrostratigraphic model was constructed using ArcGIS to display the
boreholes in 3D, together with pre-defined MODFLOW
surfaces. The model is regionally consistent with the hydrogeologic fence diagrams developed by previous
researchers, but provides a much greater degree of resolution necessary for numerical groundwater flow modeling.
References
Ahlqvist O (2003) Rough and fuzzy geographical data integration. Int
J Geogr Inf Sci 17(3):223–234
Ahlqvist O (2004) A parameterized representation of uncertain
conceptual spaces. Trans GIS 8:493–514
Ahlqvist O (2005) Using semantic similarity metrics to uncover
category and land cover change. In: Rodriguez MA (ed) GeoS
2005, vol LNCS 3799. Springer, Heidelberg, pp 107–119
Armstrong JE (1981) Post-Vashon Wisconsin glaciation, Fraser
Lowland, British Columbia. Geological Survey Bulletin 322,
Geological Survey of Canada, Ottawa
Armstrong JE, Crandell DR, Easterbrook DJ, Noble JB (1965) Late
Pleistocene stratigraphy and chronology in Southwestern British
Columbia and Northwestern Washington. Geol Soc Am Bull
76:321–330
Bishr Y (1998) Overcoming the semantic and other barriers to GIS
interoperability. Int J Geogr Inf Sci 12(4):299–314
Brigham Young University (2002) Groundwater modeling system
(GMS) Version 4.0
Cameron VJ (1989) The Late Quaternary geomorphic history of the
Sumas Valley. MSc, Department of Geography, Simon Fraser
University, Burnaby, 154 pp
Clague JJ (1994) Quaternary stratigraphy and history of south-coastal
British Columbia. In: Monger JWH (ed) Geology and geologic
hazards of Vancouver Region. Southwestern British Columbia,
Geological Survey of Canada, pp 181–192
Cox SE, Kahle SC (1999) Hydrogeology, ground-water quality, and
sources of nitrate in Lowland glacial aquifers of Whatcom
County, Washington, and British Columbia, Canada. In: US
Geological Survey Water-Resources Investigations Report, US
Geological Survey
Deshpande A (2004) Data interoperability across borders: a case
study of the Abbotsford–Sumas aquifer (BC/ Washington State).
MSc, Department of Geography, Simon Fraser University,
Burnaby, p 137
Easterbrook DJ (1969) Pleistocene chronology of the Puget Lowland
and San Juan Islands, Washington. Geol Soc Am Bull 80:2273–
2286
Easterbrook DJ (1976) Geologic map of western Whatcom County,
Washington. United States Geological Survey Miscellaneous
Investigations Series, Map I-854-B, 1:62500
ESRI (2004) ArcGIS 8.3 user manual and documentation. Environmental Systems Research Institute (ESRI)
Fonseca FT, Egenhofer M J, Davis Jr CA, Borges KAV (2000)
Ontologies and knowledge sharing in urban GIS. Comput
Environ Urban Syst 24:251–271
Gibbons TD, Culhane T (1994) Basin study of Johnson Creek,
Whatcom County hydraulic continuity investigations. In: Part 2,
open file technical report OFTR 94–01. Washington State
Department of Ecology, USA
Golder Associates Inc (1995) Blaine ground water management
program. Golder Associates, Canada
Halstead EC (1986) Ground water supply—Fraser Lowland, British
Columbia. In: NHRI Paper No.26, IWD Scientific Series No.145.
National Hydrology Research Institute, Saskatoon
Kashyap V, Sheth A (1996) Semantic and schematic similarities
between database objects: a context-based approach. VLDB J
5:276–304
Jones MA (1999) Geologic framework for the Puget sound aquifer
system, Washington State and British Columbia. Professional
paper 1424–C, US Geological Survey, Reston, Virginia
Kahle SC (1991) Hydrostratigraphy and groundwater flow in the
Sumas Area, Whatcom County, Washington. MSc Western
Washington University, Bellingham
Kim W, Seo J (1991) Classifying schematic and data heterogeneities
in multidatabase systems. IEEE 24(12):12–18
Kohut AP (1987) Groundwater supply capability Abbotsford Upland.
BC Ministry of Environment and Parks, Water Management
Branch, Victoria
Kottman CA (1999) The open GIS consortium and progress towards
interoperability in GIS. In: Egenhofer M, Goodchild M, Fegeas
R, Kottman C (eds) Interoperating geographic information
systems. Kluwer Academic, Boston, pp 39–54
LeRoy LW (ed) (1955) Subsurface geologic methods, 2nd edn.
Colorado School of Mines, Golden, p 1156
Mitchell R, Babcock S, Stasney D, Nanus L, Gelinas S, Boeser S,
Matthews R, Vandersypen J (2000) Abbotsford–Sumas aquifer
monitoring project, final report. Geology Department, Western
Washington University
Piteau & Associates (1991) Hydrogeological assessment of Aldergrove aquifer, Aldergrove, BC. Report for the Corporation of the
Township of Langley
Piteau & Associates (2002) 2001 annual water quality monitoring
report, Jackman sanitary landfill, WMB Permit No PR-1841.
Township of Langley File 4720–L01
Ricketts BD (1999) The Fraser Lowland hydrogeology project: an
overview. open file D3828. Geological Survey of Canada,
Vancouver
Ricketts BD, Jackson LE Jr (1994) An overview of the Vancouver–
Fraser valley hydrogeology project, Southern British Columbia.
Cordilleran and Pacific margin—current research. Geological
Survey of Canada, Vancouver, pp 201–206
Ricketts BD, Liebscher H (1994) The geological framework of
groundwater in the Greater Vancouver area. In: Monger JWH
(ed) Geology and geological hazards of the Vancouver region,
Southwestern British Columbia. Geol Surv Canada Bull
481:287–298
Russell HAJ, Brennand TA, Logan C, Sharpe DR (1998) Standardization and assessment of geological descriptions from water
well records: Greater Toronto and Oak Ridges Moraine Areas,
Southern Ontario. Current Research 1998–E. Geological Survey
of Canada, Ottawa
Schuurman N (2002) Flexible standardization: making interoperability accessible to agencies with limited resources. Cartogr Geogr
Inf Sci 29(4):343–353
Scibek J, Allen DM (2006) Comparing the responses of two high
permeability, unconfined aquifers to predicted climate change.
Glob Planet Change 50:50–62
Scibek J, Allen DM (2005) Numerical groundwater flow model of the
Abbotsford–Sumas aquifer, Central Fraser Lowland of BC,
Canada, and Washington State, US. Report prepared for
Environment Canada, Vancouver, p 203
Sheth A, Larson JA (1990) Federated database systems for managing
distributed, heterogeneous, and autonomous databases. ACM
Comput Surv 22(3):183–236
Stock K, Pullar D (1999) Identifying semantically similar elements in
heterogeneous spatial databases using predicate logic. In:
Vckovski A, Brassel K, Schek H (eds) Interoperating geographic
information systems, second international conference, INTEROP’99 Zurich, Switzerland, 1999. Springer, Heidelberg, pp
231–252
Tearpock DJ, Bischke RE (2002) Applied subsurface geological
mapping with structural methods, 2nd edn. Prentice Hall, NJ, p
822
Tooley J, Erickson D (1996) Nooksack watershed surficial aquifer
characterization. Washington State Department of Ecology,
Ecology Report, pp 96–311
Vckovski A (1998) Special issue: interoperability in GIS (Guest
Editorial). Int J Geogr Inf Sci 12(4):297–298
Vckovski A (1999) Interoperability and spatial information theory. In:
Egenhofer M, Goodchild M, Fegeas R, Kottman C (eds)
Interoperating geographic information systems. Kluwer Academic, Boston, pp 31–37
Visser U, Stuckenschmidt H, Schuster G, Vogele T (2002) Ontologies
for geographic information processing. Comput Geosci 28:103–
117
Walker JD, Cohen HA (2006) Geoscience handbook: AGI data
sheets, 4th edn. American Geological Institute, p 310
Waterloo Hydrogeologic Inc (2000) Visual MODFLOW v 3.0: user
manual. Waterloo Hydrogeologic Inc, Waterloo