Mapping an extended ER model to a spatial relational model

Areti Dilo
June 22, 2000
Abstract
Observing that spatial applications quite often need more complex spatial object types than the basic ones (points, lines and areas), this research is dedicated to identifying the most important and most frequently needed spatial object types. A set of spatial elements is chosen to cover these needs, and these elements are formalised using topology and graph concepts. The formal specifications of the spatial elements are refined into specifications of spatial data types. These data types can be used in the conceptual modelling of spatial applications, both as data types for the spatial attributes of spatial objects and to set constraints on the extension of object classes. Considering an Entity-Relationship model extended with specialisation, grouping, etc., and a nested relational data model, a set of rules is defined for translating a conceptual schema into a relational schema.
Acknowledgment
I would like to express gratitude to The Netherlands Fellowship Programme for providing
the fellowship for this study.
Special thanks to my main supervisor, Dr. Rolf de By, for being patient with all my delays
in the time schedule, and for clarifying the ideas every time I was feeling lost in all kinds
of technical details.
Thanks to all the ITC teachers who helped us to increase our knowledge.
Many thanks to Angela, Tal, Nirvana, Füsun, Vahit, Romina and all our classmates for making the time we spent together so pleasant.
Special thanks to Milton, for his willingness to help with comments and answers to all my
questions (small and important ones).
My gratitude to Mohamed, for always being willing to help with everything.
Arta
Contents
1 Introduction
1.1 General
1.2 Problem Definition
1.3 Research Questions
1.4 Objectives
1.5 Approach
1.6 Structure of the Thesis
2 Literature Review
2.1 Formal methods
2.1.1 The Z Notation
2.2 Conceptual models for GIS
2.3 Data Modelling
3 Spatial Elements
3.1 Graph notions
3.2 Spatial Elements
3.2.1 Zero-dimensional elements
3.2.2 One-dimensional elements
3.2.3 Two-dimensional elements
3.3 Summary
4 Logical Model
4.1 Graph Design Schemas
4.2 Spatial Data Types
4.3 Mapping Rules
5 Conclusions
Appendix A
A.1 Topology Concepts
A.2 Z at work
List of Figures
2.1 Sort and Kind hierarchy of a spatial DB
3.1 Graph hierarchy
3.2 Extended Spatial ER Diagram
3.3 Examples of line
3.4 Examples of line collections
3.5 Examples of planar line collections
3.6 One-dimensional elements hierarchy
3.7 The relation between one-dimensional elements and graphs
3.8 Examples of region
3.9 Examples of region collections
3.10 Two-dimensional elements hierarchy
4.1 A (partial) hierarchy of spatial data types
Chapter 1
Introduction
1.1 General
Conceptual modelling offers important advantages over approaches that go directly to logical design. First, users can express their knowledge about the application using concepts that are independent of computer-specific terms. Second, the model is independent of the software tool with which the application will be implemented. Conceptual modelling can also make the application model easier to understand, because such a model is closer to a general way of thinking and is not specific to computer science. However, this makes it necessary to translate the conceptual schema into a logical schema, which is the basis for an actual implementation in software.
The relational model is a firmly established data model, implemented in many commercial relational DBMS packages. It is one of the three major data models used in commercially available DBMSs, the other two being the hierarchical model and the network model.
The Entity-Relationship (ER) model is probably the most widely used conceptual model in the field of database design. The ER model offers a number of well-chosen primitives for defining the information content of the future database. These primitives use a format that is intuitively appealing to both specialists and non-specialists. The model also supports the downward translation from conceptual to logical to physical database schemas, yielding a representative relational database schema [8].
1.2 Problem Definition
What is missing in the spatial domain is a conceptual model for spatial applications that is accepted as a kind of standard. Previous work has been done on building conceptual models for spatial applications [11], [18], [14], and on formalising such models [13], [12], [7]. There is still room for a complete formalisation that separates implementation issues from conceptual ones ([7]), and for a more complete list of the spatial elements needed in many spatial applications. Furthermore, there seems to be no work on a formal translation from a conceptual model to a logical representation of spatial data.
This research takes the ER model as a good solution for the conceptual modelling phase of spatial database design. It takes an extensible DBMS based on the relational model as the spatial database system. On that basis, the research aims to find a set of rules for mapping an extended ER diagram to the logical schema.
A spatial database system deals with different approaches to spatial data: the field-based approach, the object-based approach, fuzzy objects, etc. This research considers the object-based approach to spatial data.
1.3 Research Questions
To help solve this problem, the following questions should be answered:
• Which spatial elements cover the needs of the spatial applications considered here?
• How can they be described formally?
• How can they be fitted into the standard ER model?
• What is a proper implementation of spatial elements?
• What rules can be formulated for translating from the conceptual model to the logical model?
1.4 Objectives
To reach the aim of the research, first a set of spatial elements will be defined that covers the needs of most spatial applications. Then the formal specification of these spatial elements will be given, and the role they play in the ER model will be defined. A non-first normal form model will be taken as the data model of the target system, in which the spatial data will be implemented. On that basis, a concrete specification of the spatial elements will be given. This specification is the basis of their implementation as (complex) data types in a relational database system. Finally, considering the data structure of these spatial elements and some common basic relationships between them, a set of rules will be defined for translating an ER diagram into a relational database schema.
1.5 Approach
The work in this thesis proceeds in the following order:
• Defining the spatial elements that can be used to build a conceptual schema for spatial applications.
• Looking at some commercial spatial database systems: which spatial elements do they offer, and do the elements defined here cover them?
• Finding a proper formalisation of the spatial elements and defining their role in the ER model.
• Refining the formal specifications of the spatial elements into more concrete representations, which are closer to computer implementation.
• Defining rules for mapping from the extended ER model to the logical model.
1.6 Structure of the Thesis
The thesis consists of five chapters. This chapter has introduced the aim of the research, its objectives and the approach followed. Chapter 2 gives a short introduction to Z, the formal language used for the specifications in the following chapters. It then summarises some of the papers dealing with spatial data modelling and with data modelling in general. Chapter 3 is dedicated to the formalisation of spatial elements. Chapter 4 refines the specifications given in chapter 3 into implementation schemas and indicates how they can be used in the modelling of spatial applications. Chapter 5 closes the thesis with some conclusions.
Chapter 2
Literature Review
This chapter consists of three parts. Section 2.1 discusses the importance of formal methods in software engineering. Section 2.1.1 describes the Z notation, a formal language used for formal specifications. Section 2.2 gives some ideas on what should be done to apply formal methods in GIS, and describes work on conceptual modelling of spatial applications. Section 2.3 summarises the ideas given in some papers on data modelling.
2.1 Formal methods
For many years, the software industry has been producing software with much effort spent on programming. Still, the vast majority of computer code is handcrafted from raw programming languages by artisans using techniques they neither measure nor are able to repeat consistently [9]. This brings the danger of errors in the written code, which can only be controlled by testing the final software. It also brings the risk of extra work in producing (probably different) code for the solution of similar problems.
For the first of these problems, testing only tells us that there are bugs in the program (if testing is good enough to find some of them). It can never prove that there are no bugs. Expressing the program design in algebraic form makes it possible to avoid serious mistakes, because the correctness of the design can then be checked (proved).
The second problem, duplication of code, can be solved in three steps. First, the problem is structured into small parts. Second, code is written for each part. Third, technology that supports interchangeable software parts is used to combine existing and newly written code.
Formal methods help to solve or reduce these problems. They help in writing robust computer programs because they provide
• techniques for structuring the problem that the program is supposed to solve, and
• an output that is concise, precise and unambiguous.
They consist of:
• Formal specifications — that tell what the system (solution of the problem) should
do.
• Verified design — that tells how the system is going to do the job.
The methodology underlying formal methods is that one first specifies the behaviour of a
piece of software, then that software is written and one proves whether or not that actual
implementation meets its (formal) specification. This final aspect of formal methods is
known as verified design. This term applies to the relation between a formal specification
and the software component that is written to meet that specification. Clearly, we want
the software component to satisfy its specification; proof techniques have been developed
to enable someone to prove that a software component meets its specification [10].
To write formal specifications we use formal languages. The branches of mathematics most used in formal languages are set theory and first-order logic. The formal language used here is the Z notation, whose ingredients are:
• some basic mathematical types (e.g., set, relation, etc.) and some operators defined on them. Other types can be defined from the basic types, and the language provides tools for this.
• some more complex structures that can carry semantics (e.g., schemas) or define rules or properties (e.g., axioms, general definitions).
• some mathematical laws, rules and proof methods (e.g., mathematical induction) that make it possible to reason effectively about the way a specified system will behave.
What is the output of formal methods in describing a system, using the Z notation? These
methods define:
• States in which the system can be, and
• operations on states.
Both can be described by Z schemas. The system is described at an abstract level by abstract schemas, and at a concrete level by concrete schemas, which are closer to computer concepts. Refinement shows the correspondence between the two levels. Refinement is orthogonal to the system description levels (abstract, concrete, and any intermediate levels that may be needed). It has two parts: data refinement and operation refinement.
Formal specifications sit at a middle stage in the software life cycle. They come after the requirements document has been written in a natural language (like English or Spanish), and before the system is implemented in a programming language. They do not replace the natural language description of the system; they complement it. Because they are concise and unambiguous, they can serve as a reliable reference point for several groups: those who write the customer requirements, those who implement the program, those who test the results, and those who write the reference manuals of the system [9].
The next section first describes a standard collection of mathematical symbols, then the features of the Z language, and finally data and function refinement. An example, the specification of a Birthday Book, will be used to illustrate the use of Z. Some definitions needed for it are given in this section, and a more complete solution is given in Appendix A.2.¹ This example shows how schema calculus can be used to modularise a specification and how data refinement is used to relate specifications and designs.
2.1.1 The Z Notation
One of the key ideas of Z is that specification and implementation should be kept separate. The specification should state precisely what the eventual piece of software should do, not how it is to achieve its task. Separating specifications from implementations separates two often conflicting tasks: correctly solving the problem at hand, and building an efficient piece of software. It is important to realise that writing a formal specification is quite different from writing a computer program [10].
A Z specification is a combination of mathematical language statements that strictly define
or prove system properties, and natural language statements that describe what is said or
done by the mathematics used.
Basic Notions
Some of the mathematical concepts used in the Z language are sets, bags, sequences, relations, functions, etc. Special sets offered by Z are the naturals, denoted by ℕ, and the integers, denoted by ℤ. Built upon these basic concepts are others:
• the Cartesian product X × Y of X and Y;
• the power set ℙ X, which is the set of all subsets of X;
• finiteness, applicable to sets: 𝔽 X is the set of finite subsets of X, and the # operator gives the number of elements in a finite set;
• nonemptiness, applicable to sets, bags, sequences, functions, etc.; e.g., seq₁ X is the set of nonempty finite sequences over X.
Definitions, properties and operations for some of these are given below (taken from [9]).
Relations: A relation R from (the set) X to (the set) Y is a set of pairs (x, y) where x ∈ X and y ∈ Y; thus R ⊆ X × Y. The element (x, y) ∈ R can also be written as x ↦ y ∈ R. X ↔ Y denotes the set of all relations between X and Y.
¹ The example and its solution are taken from [22].
The domain of a relation R is the subset of X whose elements are related (by R) to at least one element of Y. Its definition in Z is
$\dom R = \{ x : X; y : Y \mid x \mapsto y \in R \spot x \}$
The range of a relation R is the subset of Y whose elements are related to at least one element of X. Its definition in Z is
$\ran R = \{ x : X; y : Y \mid x \mapsto y \in R \spot y \}$
Domain restriction (over A ⊆ X) is again a relation: it contains all the pairs of R whose first members are elements of A. Its definition in Z is
$A \dres R = \{ x : X; y : Y \mid x \mapsto y \in R \land x \in A \spot x \mapsto y \}$
Range restriction (over B ⊆ Y) is the relation containing all the pairs of R whose second members are elements of B. Its definition in Z is
$R \rres B = \{ x : X; y : Y \mid x \mapsto y \in R \land y \in B \spot x \mapsto y \}$
Domain co-restriction (over A) is the relation containing all the pairs of R whose first members are not elements of A. (Range co-restriction is defined analogously.)
$A \ndres R = \{ x : X; y : Y \mid x \mapsto y \in R \land x \notin A \spot x \mapsto y \}$
The inverse of R is the relation R∼ : Y ↔ X defined from R by
$R^{\sim} = \{ x : X; y : Y \mid x \mapsto y \in R \spot y \mapsto x \}$
The identity relation on X is
$\id X = \{ x : X \spot x \mapsto x \}$
The relational composition of R : X ↔ Y and S : Y ↔ Z is
$R \comp S = \{ x : X; y : Y; z : Z \mid x \mapsto y \in R \land y \mapsto z \in S \spot x \mapsto z \}$
The backward composition: with R and S as above,
$S \circ R = R \comp S$
Overriding: If R, S : X ↔ Y, then R ⊕ S is the relation that agrees with R everywhere outside the domain of S, but agrees with S where S is defined:
$R \oplus S = (\dom S \ndres R) \cup S$
Relational image: If R : X ↔ Y and A ⊆ X, then
$R \limg A \rimg = \ran (A \dres R)$
is the relational image of A through R.
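The following Python sketch offers a cross-check of the definitions above. It models a finite relation as a set of (x, y) tuples and implements each operator directly from its set-comprehension definition. The representation and the function names (dom, ran, dres, rres, ndres, inverse, identity, compose, override, image) are assumptions made for illustration; they are not part of Z or of any Z tool.

```python
# Finite relations modelled as Python sets of (x, y) pairs (an illustrative sketch).
from typing import Any, Set, Tuple

Rel = Set[Tuple[Any, Any]]


def dom(r: Rel) -> set:
    """dom R: every first component of a pair in R."""
    return {x for (x, _) in r}


def ran(r: Rel) -> set:
    """ran R: every second component of a pair in R."""
    return {y for (_, y) in r}


def dres(a: set, r: Rel) -> Rel:
    """Domain restriction A ◁ R: keep pairs whose first member is in A."""
    return {(x, y) for (x, y) in r if x in a}


def rres(r: Rel, b: set) -> Rel:
    """Range restriction R ▷ B: keep pairs whose second member is in B."""
    return {(x, y) for (x, y) in r if y in b}


def ndres(a: set, r: Rel) -> Rel:
    """Domain co-restriction A ⩤ R: keep pairs whose first member is not in A."""
    return {(x, y) for (x, y) in r if x not in a}


def inverse(r: Rel) -> Rel:
    """Inverse R∼: swap the members of every pair."""
    return {(y, x) for (x, y) in r}


def identity(xs: set) -> Rel:
    """Identity relation id X."""
    return {(x, x) for x in xs}


def compose(r: Rel, s: Rel) -> Rel:
    """Forward composition R ⨾ S, equal to the backward composition S ∘ R."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def override(r: Rel, s: Rel) -> Rel:
    """Overriding R ⊕ S = (dom S ⩤ R) ∪ S."""
    return ndres(dom(s), r) | s


def image(r: Rel, a: set) -> set:
    """Relational image R⦇A⦈ = ran (A ◁ R)."""
    return ran(dres(a, r))


r = {("ann", 1), ("bob", 2), ("bob", 3)}
s = {(1, "x"), (3, "y")}
print(sorted(compose(r, s)))              # [('ann', 'x'), ('bob', 'y')]
print(sorted(override(r, {("bob", 9)})))  # [('ann', 1), ('bob', 9)]
print(sorted(image(r, {"bob"})))          # [2, 3]
```

Because each function is a literal transcription of a comprehension, small examples like these can be used to check laws such as id X ⨾ R = R for a relation R with domain X.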
Functions: A partial function from X, called the source, to Y, the target, is a relation that maps each element of X to at most one element of Y. The set of all partial functions from X to Y is²
$X \pfun Y == \{ f : X \rel Y \mid \forall x : X; y_1, y_2 : Y \spot x \mapsto y_1 \in f \land x \mapsto y_2 \in f \implies y_1 = y_2 \}$
² See Abbreviations in the next subsection.
A total function is a partial function in which each element of the source is mapped to exactly one element of the target. The set of all total functions from X to Y is
$X \fun Y == \{ f : X \pfun Y \mid \dom f = X \}$
If x ↦ y ∈ f, then y is denoted f(x). All operations defined on relations are also valid on functions, since functions are relations with some specific properties. Some properties of functions are:
Injections: If distinct elements of the domain are mapped to distinct elements of the target, the function is said to be injective. There are partial injective functions, whose symbol is ⤔, and total injective functions, whose symbol is ↣.
Surjections: If the range of the function is the whole target, the function is said to be surjective. Partial surjections have the symbol ⤀; total surjections have the symbol ↠.
Bijections: A function that is both surjective and injective is called bijective. The symbol for bijection is ⤖.
Finite functions are defined as
$X \ffun Y == \{ f : X \pfun Y \mid \dom f \in \finset X \}$
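The function properties can be tested on finite relations in the same way. This sketch uses the same set-of-pairs representation and hypothetical function names. The source and target sets are passed explicitly, because a Python set of pairs does not carry its declared type. Following Z, bijective is taken here to mean total, injective and surjective.

```python
# Checking Z function properties on finite relations (sets of (x, y) pairs).
from typing import Any, Set, Tuple

Rel = Set[Tuple[Any, Any]]


def is_partial_function(f: Rel) -> bool:
    """X ⇸ Y: no source element is mapped to two different targets."""
    seen = {}
    for x, y in f:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


def is_total_function(f: Rel, source: set) -> bool:
    """X → Y: a partial function whose domain is the whole source."""
    return is_partial_function(f) and {x for x, _ in f} == source


def is_injective(f: Rel) -> bool:
    """Distinct domain elements map to distinct targets (|ran f| = |f|)."""
    return is_partial_function(f) and len({y for _, y in f}) == len(f)


def is_surjective(f: Rel, target: set) -> bool:
    """The range of f is the whole target."""
    return is_partial_function(f) and {y for _, y in f} == target


def is_bijective(f: Rel, source: set, target: set) -> bool:
    """X ⤖ Y: total, injective and surjective."""
    return (is_total_function(f, source) and is_injective(f)
            and is_surjective(f, target))


f = {(1, "a"), (2, "b"), (3, "c")}
g = {(1, "a"), (2, "a")}
print(is_bijective(f, {1, 2, 3}, {"a", "b", "c"}))       # True
print(is_injective(g), is_total_function(g, {1, 2, 3}))  # False False
```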
Sequences: A sequence is an ordered collection of objects (it can be empty); e.g., ⟨a, c, f, d, c⟩ is a sequence. If X is a set, then the set of all finite sequences of objects from X is defined by
$\seq X == \{ s : \nat \pfun X \mid \exists n : \nat \spot \dom s = 1 \upto n \}$
where $1 \upto n = \{ i : \nat \mid 1 \leq i \leq n \}$.
Some operations on sequences are:
Concatenation: If s, t ∈ seq X, then s ⁀ t ∈ seq X denotes the concatenation of s and t, i.e., t appended at the end of s.
Head and Tail: If s ∈ seq₁ X, then head s is the first element of s, and tail s is the remaining part.
Length: If s ∈ seq X, then #s denotes the length of s, that is, the largest integer for which the associated partial function is defined.
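The definition of seq X says that a sequence is simply a partial function whose domain is 1 . . n. The sketch below makes that view explicit in Python by storing a sequence as a dict keyed by 1..n. The helper names are hypothetical, and the dict representation is chosen only to mirror the Z definition; ordinary Python code would use a list instead.

```python
# A Z sequence viewed as a finite function from 1..n to X (a sketch).
from typing import Any, Dict, List

Seq = Dict[int, Any]  # the keys must be exactly 1..n


def from_list(items: List[Any]) -> Seq:
    """Build the function {1 ↦ items[0], 2 ↦ items[1], ...}."""
    return {i + 1: v for i, v in enumerate(items)}


def is_seq(s: Seq) -> bool:
    """seq X: the domain is exactly 1..n for some n (n = 0 gives ⟨⟩)."""
    return set(s) == set(range(1, len(s) + 1))


def length(s: Seq) -> int:
    """#s: the number of pairs, i.e. the largest index used."""
    return len(s)


def concat(s: Seq, t: Seq) -> Seq:
    """s ⁀ t: shift the indices of t past the end of s."""
    n = length(s)
    return {**s, **{i + n: v for i, v in t.items()}}


def head(s: Seq) -> Any:
    """head s, defined only for non-empty s (s ∈ seq₁ X)."""
    if not s:
        raise ValueError("head is undefined on the empty sequence")
    return s[1]


def tail(s: Seq) -> Seq:
    """tail s: drop the first element and renumber from 1."""
    if not s:
        raise ValueError("tail is undefined on the empty sequence")
    return {i - 1: v for i, v in s.items() if i > 1}


s = from_list(["a", "c", "f"])
t = from_list(["d", "c"])
print(concat(s, t))      # {1: 'a', 2: 'c', 3: 'f', 4: 'd', 5: 'c'}
print(head(s), tail(s))  # a {1: 'c', 2: 'f'}
```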
Bags: A bag is an unordered collection of objects in which the multiplicities matter; e.g., ⟦a, c, f, d, c⟧ is a bag, which is the same as ⟦a, c, c, d, f⟧. If X is a set, then the set of all bags with elements from X is defined as
$\bag X == X \pfun \nat_1 \qquad (\nat_1 = \nat \setminus \{0\})$
The number of times an element x : X appears in a bag B : bag X is count B x (B ♯ x is another notation). The fact that an element x : X is in the bag B is denoted by x ⋿ B, and the following equivalences hold:
$\forall x : X; B : \bag X \spot (x \inbag B \iff x \in \dom B) \land (x \inbag B \iff B \bcount x > 0)$
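Since bag X == X ⇸ ℕ₁, a bag is a finite function from elements to positive counts. Python's collections.Counter is a close but looser match: it also allows zero or negative counts. The sketch below therefore builds bags only through make_bag, which never creates such counts. The helper names are illustrative.

```python
# Bags as finite functions X ⇸ ℕ₁, using collections.Counter (a sketch).
from collections import Counter


def make_bag(*items) -> Counter:
    """⟦items⟧: map each element to its (positive) multiplicity."""
    return Counter(items)


def count(b: Counter, x) -> int:
    """count B x (B ♯ x); Counter returns 0 for absent elements."""
    return b[x]


def in_bag(x, b: Counter) -> bool:
    """x ⋿ B, equivalently B ♯ x > 0."""
    return count(b, x) > 0


b1 = make_bag("a", "c", "f", "d", "c")
b2 = make_bag("a", "c", "c", "d", "f")
print(b1 == b2)         # True: order is irrelevant, multiplicities matter
print(count(b1, "c"))   # 2
print(in_bag("z", b1))  # False
```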
Definitions
To illustrate definitions, for which syntax will be given here, we will use examples from the
Birthday Book. (The Birthday Book is a system that records people’s birthdays, allowing
us to add new people and their birthdays, and ask for birthdays of people that are already
in the book.) A definition can be one of the following:
A declaration introduces a new type, or a new variable of an existing or previously declared type. We will need a type NAME for the names of persons, and a type DATE for their birthdays in our Birthday Book. To declare them we write:
$[NAME, DATE]$
To declare a variable that holds the names of persons whose birthdays are recorded (so known is a set of names), we state
$known : \power NAME$
An abbreviation introduces a new name for some expression. It is of the form
symbol == expression
e.g.,
$Month == \{Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec\}$
or of a generic form, which introduces a family of symbols:
symbol[parameter] == expression
An example of the generic form is the empty set ∅, which is a subset of every set. Over a set X it is defined as
$\emptyset[X] == \{ x : X \mid false \}$
Because the predicate part on the right side is always false, no element x : X can satisfy it.
An axiom states a property that always holds. It is of the form:
\begin{axdef}
x : X
\where
P(x)
\end{axdef}
An axiom can also take a generic form, based on one or more types. An example is the definition of the projection functions for ordered pairs:
\begin{gendef}[X, Y]
first : X \cross Y \fun X \\
second : X \cross Y \fun Y
\where
\forall x : X; y : Y \spot first(x, y) = x \land second(x, y) = y
\end{gendef}
A free type: In our Birthday Book example, we will look up the birthday of a person, and that person may not yet be in our book. For this special case (and probably others) it is useful to have a variable result that reports the outcome. The type of this variable can be a free type, REPORT, declared as:
\begin{zed}
REPORT ::= ok | already\_known | not\_known
\end{zed}
If we then declare the variable result : REPORT, that variable can take just three values: ok, already_known and not_known.
The notation for free type definitions adds nothing to the power of the Z language, but it makes it easier to describe recursive structures such as lists and trees [22].
Schemas: A schema definition has a name, a declaration part, and a predicate part. A schema is used to describe state information, operations on states, and initialisation. To describe the state space of our system, the Birthday Book, we write the schema:
\begin{schema}{BirthdayBook}
known : \power NAME \\
birthday : NAME \pfun DATE
\where
known = \dom birthday
\end{schema}
Here birthday is a function which, when applied to a certain name (that is an element
of the set known), gives the birthday associated with that name.
A schema can also be written on one line (called a horizontal schema):
$SchemaName \defs [declaration \mid predicate]$
A schema can also be defined in terms of its signature and its property. The signature
introduces the schema variables together with their types. A declaration is a signature together with implicit predicates. The property constrains the variables and
describes the relationship between them. The property includes the explicit predicate ‘below the line’, and any implicit predicates concealed within the types used in
the declaration. The schema property is also called the schema invariant.
The relationship known = dom birthday in the BirthdayBook schema is an invariant
of the system.
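A schema's declaration part and predicate part correspond naturally to a record type plus an invariant check. The following sketch renders BirthdayBook in Python under two assumptions. First, the given sets NAME and DATE are represented by strings. Second, a dict stands for the partial function birthday; a dict maps each key to at most one value, as a partial function must. The class and method names are hypothetical.

```python
# Hypothetical Python rendering of the BirthdayBook state schema.
from dataclasses import dataclass, field
from typing import Dict, Set

NAME = str  # the given sets [NAME, DATE] are abstract in Z;
DATE = str  # plain strings stand in for them here (an assumption)


@dataclass
class BirthdayBook:
    known: Set[NAME] = field(default_factory=set)
    birthday: Dict[NAME, DATE] = field(default_factory=dict)  # NAME ⇸ DATE

    def invariant(self) -> bool:
        """Schema predicate (state invariant): known = dom birthday."""
        return self.known == set(self.birthday)


ok_book = BirthdayBook({"ann"}, {"ann": "25-Mar"})
bad_book = BirthdayBook({"ann", "bob"}, {"ann": "25-Mar"})
print(ok_book.invariant())   # True
print(bad_book.invariant())  # False: bob is known but has no birthday
```

In Z, a binding that violates the predicate is simply not a member of the schema's set of bindings. Python cannot forbid such a value by its type alone, so the invariant has to be checked explicitly.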
A schema may be used wherever a declaration is expected. It may be used as a
predicate, and also it may be used as a type, in the same way as mathematical types.
A schema S represents a set, the set of all its bindings. A binding is an assignment
of values to a schema’s components such that they obey its predicate. If a variable is
declared with a schema type s : S , then the variable’s value is one of these bindings.
This is exactly the same as saying that if a variable is declared x : ℕ, then x has the value of one of the members of ℕ. A schema binding is denoted by θS, S being the schema name [3].
Schemas may be composed, using the schema calculus (schema operators), to form specifications of new states and operations. This has two benefits:
• the operations can be broken into small parts that are more easily understood,
• extensive reuse of schemas becomes possible.
Schema operators are decoration (whose symbol is ′), conjunction (∧), disjunction (∨), negation (¬), quantification (∀ for all, ∃ there exists, ∃₁ there exists exactly one), hiding (\), composition (⨾), and precondition (whose symbol is ‘pre’).
We use decoration to indicate the after variables of an operation, where the before variables are undecorated [3]. For example, known′ is the state of known after adding another person to the Birthday Book, which adds another name to the set known. Using decoration we can define two common schemas, the Delta (Δ) schema and the Xi (Ξ) schema, which we will use in our example later.
$\Delta BirthdayBook \defs BirthdayBook \land BirthdayBook'$
The ∆ is part of the name of the schema and is used to indicate change of state [3].
\begin{schema}{\Xi BirthdayBook}
\Delta BirthdayBook
\where
\theta BirthdayBook = \theta BirthdayBook'
\end{schema}
The Ξ is part of the schema name and is used to indicate no change of state [3], i.e., the
values of BirthdayBook ’s components are the same before and after the operation.
Examples of conjunction, disjunction, and quantification will be given later.
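To make Δ and Ξ concrete, the sketch below treats an operation as a function from a before state to an after state, and Ξ as the check that the two are equal. The operations add_birthday and find_birthday, their use of the REPORT values, and the string representation of names and dates are assumptions. They are modelled loosely on the Birthday Book example taken from [22], and they are not the specification itself. The free type REPORT maps naturally onto a Python Enum.

```python
# Sketch: a Δ operation may change the state; a Ξ operation must leave it unchanged.
from enum import Enum
from typing import Dict, Tuple

State = Dict[str, str]  # birthday : NAME ⇸ DATE; known is derived as dom birthday


class REPORT(Enum):
    """Free type REPORT ::= ok | already_known | not_known."""
    ok = "ok"
    already_known = "already_known"
    not_known = "not_known"


def is_xi(before: State, after: State) -> bool:
    """Ξ: θBirthdayBook = θBirthdayBook′ (the state is unchanged)."""
    return before == after


def add_birthday(before: State, name: str, date: str) -> Tuple[State, REPORT]:
    """Hypothetical Δ operation: adds name ↦ date if the name is not yet known."""
    if name in before:                   # name? ∈ known: state left unchanged
        return dict(before), REPORT.already_known
    after = dict(before)
    after[name] = date                   # birthday′ = birthday ∪ {name? ↦ date?}
    return after, REPORT.ok


def find_birthday(before: State, name: str) -> Tuple[State, str, REPORT]:
    """Hypothetical Ξ operation: queries the state without changing it."""
    date = before.get(name, "")
    result = REPORT.ok if name in before else REPORT.not_known
    return dict(before), date, result


book = {"ann": "25-Mar"}
book2, report = add_birthday(book, "bob", "01-Jan")
print(report, is_xi(book, book2))  # REPORT.ok False
_, date, report = find_birthday(book2, "ann")
print(date, report)                # 25-Mar REPORT.ok
```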
To show how hiding works, let us take the schema
$S \defs [x : X; y : Y \mid P]$
S \ (x) (read “S hiding x”) is the schema:
$S \hide (x) \defs [y : Y \mid \exists x : X \spot P]$
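When the types are finite, hiding can be computed by brute force. The bindings of S \ (x) are exactly the values y for which some x : X satisfies P. The carrier sets X and Y and the predicate P below are arbitrary illustrative choices, not taken from the text.

```python
# Hiding over finite types: S \ (x) keeps each y for which some x satisfies P.
from itertools import product

X = range(0, 4)  # illustrative finite carrier sets (assumptions)
Y = range(0, 6)


def P(x: int, y: int) -> bool:
    """Example predicate of S ≙ [x : X; y : Y | P]: y is twice x."""
    return y == 2 * x


# Bindings of S: all (x, y) pairs that satisfy P.
S = {(x, y) for x, y in product(X, Y) if P(x, y)}

# Bindings of S \ (x): the component x is hidden, i.e. ∃ x : X • P.
S_hiding_x = {y for y in Y if any(P(x, y) for x in X)}

print(sorted(S))           # [(0, 0), (1, 2), (2, 4)]
print(sorted(S_hiding_x))  # [0, 2, 4]
```

Note that x = 3 contributes nothing: 2 · 3 = 6 lies outside Y, so no binding of S has x = 3.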
The operation pre is applied to operation schemas. The precondition of an operation is a
schema that characterises the collection of ‘before’ states for which some ‘after’ states can
be shown to exist [9]. The precondition of an operation can be calculated. The purpose
of the precondition calculation is to check that the operation is valid. There must be at
least one before state in which the operation is applicable. If there is an inconsistency in
the definition, the precondition is false, and hence there are no appropriate before states.
So, in a precondition calculation the aim is to determine what must be true of the before
state and the operation inputs to achieve a satisfactory outcome. This is done by hiding
the outputs and after variables [3].
$\pre Operation = \exists State' \spot Operation \hide (outputs)$
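Likewise, on a finite state space the precondition can be computed by searching for an after state. This sketch assumes a hypothetical two-name universe and an operation add_name that adds an unknown name to known. The operation has no outputs, so nothing needs to be hidden except the after state. The computed precondition turns out to be exactly name? ∉ known.

```python
# Computing pre Op on a tiny finite state space by searching for after states.
from itertools import chain, combinations

NAMES = ["ann", "bob"]  # hypothetical finite NAME set


def all_states():
    """Every possible value of known (a subset of NAMES)."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(NAMES, k) for k in range(len(NAMES) + 1))]


def add_name(before: frozenset, after: frozenset, name: str) -> bool:
    """Operation schema as a relation: name? ∉ known ∧ known′ = known ∪ {name?}."""
    return name not in before and after == before | {name}


def pre(op, before: frozenset, name: str) -> bool:
    """pre Op = ∃ State′ • Op: some after state exists for this before state."""
    return any(op(before, after, name) for after in all_states())


for s in all_states():
    print(sorted(s), pre(add_name, s, "bob"))
# prints: [] True / ['ann'] True / ['bob'] False / ['ann', 'bob'] False
```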
Refinement
The meaning of refinement is ‘to make more concrete’. Refinement is the process of
turning a more abstract specification into a more concrete one [3]. Typically refinement is
performed in a number of steps, in which we move gradually towards the concrete design
of the system, proving in each step that the lower level (the more concrete one) is a correct
representation of the higher level (the more abstract one), from which it is derived. What
we get at the end of the refinement process is a system design that is closer to the level
of the programming language. Efficiency issues, the space/time trade-off, should be taken
into consideration in the system design.
Refinement has two parts, data refinement and function (operation) refinement.
Data Refinement [3] In data refinement, the abstract data type in the abstract specification is related to the concrete data type in the concrete specification (design).
This is done by means of a retrieve schema that shows the relationship between the
abstract and the concrete state items in logical terms. This relationship allows us to
retrieve the abstract state from the concrete one.
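A retrieve relation can be illustrated with a concrete state made of two parallel arrays, one of names and one of dates. This follows the style of the array-based design commonly used for the Birthday Book, but the representation and the function names are assumptions for illustration only. The retrieve function rebuilds the abstract pair (known, birthday). Several concrete states can represent the same abstract state.

```python
# Sketch of a retrieve relation from a concrete array-based state to the abstract one.
from typing import Dict, List, Set, Tuple


def concrete_invariant(names: List[str], dates: List[str]) -> bool:
    """Concrete state: parallel arrays of equal length with no repeated name."""
    return len(names) == len(dates) and len(set(names)) == len(names)


def retrieve(names: List[str], dates: List[str]) -> Tuple[Set[str], Dict[str, str]]:
    """Rebuild the abstract state (known, birthday) from the concrete arrays."""
    birthday = dict(zip(names, dates))  # names[i] ↦ dates[i]
    known = set(birthday)               # abstract invariant: known = dom birthday
    return known, birthday


names, dates = ["ann", "bob"], ["25-Mar", "01-Jan"]
reordered = (["bob", "ann"], ["01-Jan", "25-Mar"])
print(concrete_invariant(names, dates))                # True
print(retrieve(names, dates) == retrieve(*reordered))  # True
```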
Function Refinement [3] In moving toward a more concrete description of the system,
the operations change from describing the ‘what’ of the operation to the ‘how’. To
be able to demonstrate the refinement, we have to show the following:
• Correct initial concrete state. Each possible initial concrete state (subscript
c) must represent a possible initial abstract state (subscript a). The concrete
version should not allow starting points that the abstract specification forbids.
$\forall State_c' \spot Init_c \implies \exists State_a' \spot Init_a \land Retrieve'$
• Correct operation refinement. Whenever the abstract operation terminates, so
should the concrete operation. In other words, if we are in a state in which the
abstract operation is guaranteed to terminate and we apply the retrieve relation,
we will be in a state in which the concrete operation is guaranteed to terminate.
$\forall State_a; State_c \spot \pre Op_a \land Retrieve \implies \pre Op_c$
• Correct concrete operation. If the abstract operation terminates, then so should
the concrete one and the state in which the concrete operation terminates should
represent a possible abstract state in which the abstract operation could terminate. In other words, we can either start in the precondition of the abstract
operation, perform the retrieve operation to reach the concrete state and then
perform the concrete operation, or we can first perform the abstract operation
and then apply the retrieve relation.
$\forall State_a; State_c; State_c' \spot \pre Op_a \land Retrieve \land Op_c \implies \exists State_a' \spot Op_a \land Retrieve'$
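For a small finite model, the three conditions above can be checked exhaustively. In this hypothetical example the abstract state is a set of names and the concrete state is a duplicate-free tuple. Retrieve relates a tuple to the set of its elements, and both operations add a name that is not yet present. Brute-force enumeration of this kind is only a sanity check on a toy universe; it is not a substitute for a proof.

```python
# Exhaustive check of the three refinement conditions on a toy model (not a proof).
from itertools import combinations, permutations

NAMES = ["ann", "bob", "cy"]  # hypothetical finite NAME set

# Abstract states: sets of names. Concrete states: duplicate-free tuples of names.
ABS = [frozenset(c) for k in range(len(NAMES) + 1) for c in combinations(NAMES, k)]
CON = [p for k in range(len(NAMES) + 1) for p in permutations(NAMES, k)]


def retrieve(a, c):
    """Retrieve: the abstract set is the set of names stored in the tuple."""
    return a == frozenset(c)


def init_a(a):
    return a == frozenset()


def init_c(c):
    return c == ()


def op_a(a, a2, n):
    """Abstract AddName: n? ∉ known ∧ known′ = known ∪ {n?}."""
    return n not in a and a2 == a | {n}


def op_c(c, c2, n):
    """Concrete AddName: append n? to the tuple if it is not already there."""
    return n not in c and c2 == c + (n,)


def pre(op, states, s, n):
    """pre Op: some after state in 'states' is related to s by op."""
    return any(op(s, s2, n) for s2 in states)


# Correct initial concrete state.
init_ok = all(any(init_a(a2) and retrieve(a2, c2) for a2 in ABS)
              for c2 in CON if init_c(c2))

# Correct operation refinement: pre Op_a ∧ Retrieve ⇒ pre Op_c.
applicability = all(pre(op_c, CON, c, n)
                    for a in ABS for c in CON for n in NAMES
                    if pre(op_a, ABS, a, n) and retrieve(a, c))

# Correct concrete operation: pre Op_a ∧ Retrieve ∧ Op_c ⇒ ∃ State_a′ • Op_a ∧ Retrieve′.
correctness = all(any(op_a(a, a2, n) and retrieve(a2, c2) for a2 in ABS)
                  for a in ABS for c in CON for c2 in CON for n in NAMES
                  if pre(op_a, ABS, a, n) and retrieve(a, c) and op_c(c, c2, n))

print(init_ok, applicability, correctness)  # True True True
```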
2.2 Conceptual models for GIS
Applying formal methods to GIS requires three choices: the choice of an ontology, the choice of a paradigm of formalisation, and the choice of formal languages and reasoning techniques [4].
Finite and discrete representations of infinite and continuous domains are achieved by
means of abstraction and discretisation. One way of representing infinite and continuous
domains of individuals is to represent classes of individuals rather than individuals. Individuals in the same class are considered to be equivalent. Equivalence classes of individuals
partition the domain of individuals. The difficult problem is to define the classes of equivalence in such a way that the structural properties of the original domain of individuals are
preserved. This can be achieved if structures that govern the domain of individuals are
used to define the classes of equivalent individuals [4].
Such structures can be obtained through the following paradigms of formalisation:
• The geometric paradigm defines the equivalence based on properties and relations that remain invariant under particular classes of transformations.
• The analytic paradigm defines the equivalence based on order relations with respect to a frame of reference.
• The qualitative paradigm is based on landmark values (individuals that represent significant changes) and on order relations between them, which qualitatively structure the domain.
Often more than one paradigm of formalisation is used, depending on the structural properties of the class of intended models [4].
Until the early nineteenth century, Euclidean geometry was the only form of geometry. Then hyperbolic (Lobachevskian) geometry and elliptic (Riemannian) geometry were constructed. Once more than one geometric theory existed, the problem became determining what makes a theory of space a geometry. What is the essence of geometry? Geometry abstracts from particular location and considers properties of geometric
figures that are independent of particular location, i.e., invariant under a certain group
of transformations. Geometric figures are equivalence classes of geometric individuals or
configurations of geometric individuals that can be made to coincide by transformations
belonging to such a group (e.g., triangles, configurations of triangles, polygons, configurations of polygons). Transformations are operations on geometric individuals that change
certain properties and leave others invariant [5]. For example, Euclidean geometry deals
with properties such as the length of a line, the sizes of angles, etc., all of which remain
unchanged under the transformations of rotations and translations. This definition of geometry also includes areas of mathematics such as graph theory and topology, which have
a geometric component. Typically each group of transformations defines a set of properties
that remain invariant and thus creates a geometry that can be formally defined and studied
[16].
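As a small illustration of invariance, the following sketch (a hypothetical example, not taken from [5] or [16]) applies a rotation followed by a translation to a triangle. The coordinates change, but the side lengths, which are Euclidean properties, do not, so both triangles belong to the same geometric figure. The helper names are our own.

```python
import math

def rigid_motion(p, angle, dx, dy):
    """Rotate point p about the origin by `angle` radians, then translate by (dx, dy)."""
    x, y = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + dx, s * x + c * y + dy)

def side_lengths(poly):
    """Lengths of the sides of a closed polygon given as a list of vertices."""
    n = len(poly)
    return [math.dist(poly[i], poly[(i + 1) % n]) for i in range(n)]

triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
moved = [rigid_motion(p, math.pi / 3, 10.0, -2.0) for p in triangle]

print(moved[0])                # (10.0, -2.0): location is not invariant
print(side_lengths(triangle))  # [3.0, 5.0, 4.0]
# Lengths are invariant under the motion, up to floating-point rounding.
print(all(math.isclose(a, b)
          for a, b in zip(side_lengths(triangle), side_lengths(moved))))  # True
```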
Location is a relation between spatial individuals and a frame of reference. Location
within a frame of reference allows abstracting from individuals using equivalence classes
with respect to identity of location. Individuals that have the same location in a particular
frame of reference, form an equivalence class. An important issue is to evaluate which
geometric properties of the domain of individuals are preserved in the domain of equivalence
classes of individuals with respect to location [5].
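A minimal sketch of abstraction by location follows. It assumes that individuals are identified by names and that their locations are coordinates in a single frame of reference. All names and values are hypothetical. Grouping by location yields equivalence classes that together partition the domain.

```python
from collections import defaultdict

def classes_by_location(individuals, location):
    """Group individuals into equivalence classes of identical location."""
    classes = defaultdict(set)
    for ind in individuals:
        classes[location[ind]].add(ind)
    return dict(classes)

# Hypothetical individuals and their coordinates in one frame of reference.
location = {"well": (2, 3), "pump": (2, 3), "school": (5, 1), "church": (0, 0)}
classes = classes_by_location(location.keys(), location)
print(sorted(classes[(2, 3)]))  # ['pump', 'well']

blocks = list(classes.values())
print(set().union(*blocks) == set(location))         # True: the classes cover the domain
print(sum(len(b) for b in blocks) == len(location))  # True: no individual is in two classes
```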
Geometry can be represented in terms of locations in $\mathbb{R}^n$. Analytical geometry is a mathematical theory of geometric properties of sets of points based on finite representations of their locations in a frame of reference given by a system of $n$ axes of directed real lines and the corresponding coordinate space $\mathbb{R}^n$ [5].
In [15], spatial objects are represented by inequalities, and the paper discusses the advantages and disadvantages of a linear-constraint representation compared with a pointer-based vector representation, in terms of storage and of the processing time of operations.
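To make the idea of a constraint representation concrete, here is a sketch of a convex region stored as a list of linear inequalities $a x + b y \le c$. The encoding is hypothetical and is not the representation used in [15]. It shows one property that favours constraints: intersection reduces to conjoining the inequality lists.

```python
# A convex region as a conjunction of linear constraints a*x + b*y <= c.
# Hypothetical example: the unit square 0 <= x <= 1, 0 <= y <= 1.
unit_square = [(-1, 0, 0), (1, 0, 1), (0, -1, 0), (0, 1, 1)]
# The same square shifted right by 0.5: 0.5 <= x <= 1.5, 0 <= y <= 1.
shifted = [(-1, 0, -0.5), (1, 0, 1.5), (0, -1, 0), (0, 1, 1)]

def contains(region, x, y):
    """Point-in-region test: evaluate every inequality."""
    return all(a * x + b * y <= c for a, b, c in region)

def intersect(r1, r2):
    """Intersection of two constraint-defined regions: conjoin their constraints."""
    return r1 + r2

overlap = intersect(unit_square, shifted)  # 0.5 <= x <= 1, 0 <= y <= 1
print(contains(overlap, 0.75, 0.5))  # True
print(contains(overlap, 0.25, 0.5))  # False
```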
In [14], [20], [21], [23] and [24], space is modelled as a subset of $\mathbb{R}^2$, and the objects defined on that space are zero-, one- or two-dimensional. These papers consider different levels of granularity (different scales) for the objects in a single conceptual schema element. For example, a land parcel can be seen as a point or as a region, depending on the current scale of the application. A land parcel object can thus be either a zero-dimensional point or a two-dimensional region.
The position of a spatial object in space is made up of four components that together define it fully and non-redundantly: shape, size, orientation and centroid [20].
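Two of these components can be computed directly from a polygon's vertices. The sketch below assumes a simple polygon given by its vertices in order. It uses the shoelace formula to obtain the size (area) and the centroid. Translating the polygon moves the centroid but leaves the size unchanged. The function name is our own.

```python
def area_and_centroid(poly):
    """Signed area and centroid of a simple polygon (vertices in order, not repeated).

    Uses the shoelace formula; the area is positive for counter-clockwise order.
    """
    a = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a /= 2.0
    return a, (cx / (6.0 * a), cy / (6.0 * a))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
moved = [(x + 5, y + 5) for x, y in square]
print(area_and_centroid(square))  # (4.0, (1.0, 1.0))
print(area_and_centroid(moved))   # (4.0, (6.0, 6.0)): same size, new centroid
```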
Spatial objects have descriptive attributes and spatial attributes. Spatial attributes are
properties of the embedding space that indirectly become properties of the spatial objects
via their position in space, i.e., the spatial objects inherit them from space. The spatial
attributes of objects may be captured independently of the objects using so-called fields (also called layers). Layers are of two types: continuous functions, e.g., “temperature” or “erosion”, and discrete functions, e.g., a “county division” represented as regions [24].
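The idea that objects inherit spatial attributes from layers can be sketched by treating each layer as a function of position. Both layers below are hypothetical: a continuous one (temperature) and a discrete one (county). The function `spatial_attributes` is our own illustrative helper.

```python
def temperature(x, y):
    """Hypothetical continuous layer: temperature falling linearly with y."""
    return 20.0 - 0.5 * y

def county(x, y):
    """Hypothetical discrete layer: two counties separated by the line x = 10."""
    return "County A" if x < 10 else "County B"

def spatial_attributes(position, layers):
    """An object inherits its spatial attributes by evaluating each layer at its position."""
    x, y = position
    return {name: layer(x, y) for name, layer in layers.items()}

layers = {"temperature": temperature, "county": county}
print(spatial_attributes((4.0, 6.0), layers))
# {'temperature': 17.0, 'county': 'County A'}
```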
It is frequently necessary to capture the position of spatial objects in the database. The
first step to support this is to provide means for representing the space in which the objects
are embedded. The next is to provide means for indicating that the objects’ position in
this space is to be captured. For this purpose the following special entity and relationship
sets are introduced.
• The special entity sets SPACE, GEOMETRY, POINT, LINE and REGION. Entity set GEOMETRY captures the geometric position of a spatial entity set and can be a POINT, LINE, REGION, or any other geometric type (or geometry).
• The special relationship set “is located at” that associates a spatial entity set with
its geometry. The cardinality of this set is 1:M because a spatial entity may have
more than one geometry when multiple granularities are employed. The relationship
set “belongs to” between GEOMETRY and SPACE with cardinality M:1 is also
included.
The spatial attributes of entities are calculated via the relationship “belongs to” [24].
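One possible way to render these special entity and relationship sets in code is sketched below with Python dataclasses. The class and field names (`Space`, `Geometry`, `SpatialEntity`, `located_at`, `belongs_to`) are hypothetical. The example gives one land parcel both a point and a region geometry, as happens when multiple granularities are employed.

```python
from dataclasses import dataclass, field

@dataclass
class Space:
    name: str

@dataclass
class Geometry:
    kind: str           # "POINT", "LINE", "REGION", ...
    coords: list
    belongs_to: Space   # "belongs to": many geometries -> one space (M:1)

@dataclass
class SpatialEntity:
    name: str
    # "is located at": one entity -> many geometries (1:M), one per granularity
    located_at: list = field(default_factory=list)

    def geometry(self, kind):
        """Return the entity's geometry of the requested kind, if any."""
        return next((g for g in self.located_at if g.kind == kind), None)

plane = Space("municipality plane")
parcel = SpatialEntity("parcel 17")
parcel.located_at.append(Geometry("POINT", [(3.0, 4.0)], plane))
parcel.located_at.append(Geometry("REGION", [(2, 3), (4, 3), (4, 5), (2, 5)], plane))
print(parcel.geometry("POINT").coords)  # [(3.0, 4.0)]
```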
MADS, as described in [18] and [19], is a conceptual model for spatial data. MADS offers a
set of spatial abstract types, organised in a generalisation hierarchy. This hierarchy can be
changed according to the needs of the application, by creating new subtypes, or grouping
some existing types into a new supertype. Every spatial type has associated methods for defining and manipulating instances of that type. The most general
type is Geo; its subtypes are Simple Geo and Complex Geo. Subtypes of Simple Geo are
Point, Line, Simple Area. The type Line is a subtype of Oriented Line type. Subtypes of
Complex Geo are Point Set, Line Set, and Complex Area. Line Set is a subtype of Oriented
Line Set. If an object type has type Geo, the precise type of every instance will be defined
at the moment of its creation.
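The MADS hierarchy described above maps naturally onto a class hierarchy. The following Python sketch is only illustrative. The class names are adapted from the MADS type names, and Oriented Line and Oriented Line Set are omitted to keep the tree simple. An attribute declared with the general type Geo receives its precise type only when each instance is created.

```python
class Geo:
    """Most general spatial abstract type."""

class SimpleGeo(Geo): pass
class ComplexGeo(Geo): pass

class Point(SimpleGeo): pass
class Line(SimpleGeo): pass
class SimpleArea(SimpleGeo): pass

class PointSet(ComplexGeo): pass
class LineSet(ComplexGeo): pass
class ComplexArea(ComplexGeo): pass

# Declared type: Geo. The precise type of each value is fixed at creation time.
river_geometries = [Line(), LineSet()]
print([type(g).__name__ for g in river_geometries])  # ['Line', 'LineSet']
```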
The spatiality of an object is described by a predefined attribute, geometry, which groups a shape (e.g., Point, Line, Area, or Simple Geo) and a location, given either in absolute coordinates or relative to other known locations. The domain of values of
geometry is one of the spatial abstract types given above. An element can be described as
a spatial object or a spatial attribute, depending on the application. A spatial attribute
is a simple attribute, single-valued or multivalued, derived or not, whose domain is a
spatial abstract type. Spatial integrity constraints may be associated with spatial entities
or spatial attributes.
A spatial relationship may be of different types, e.g., topological, orientation, metrical,
or spatial aggregation. Spatial relationships can be deduced from the spatiality of objects. They therefore exist implicitly and are accessible through GIS functions. In MADS, however, they can also be defined explicitly. This allows attributes or methods to be attached to them, or special semantics to be given to them that complement the semantics
they have from GIS functions. MADS provides topological relationships and spatial aggregation. Any other spatial relationship type may be explicitly declared with the methods
attached to the spatial abstract types. The predefined relationship types in MADS are:
disjunction, adjacency, crossing, overlapping, inclusion, and equality.
Aggregation can be spatial or thematic. It is a binary link directed from the composite
to the component object. An object type composed of several object types is represented
using several aggregation links, one for each component type. It is common that some
attributes of the composite and component objects are related. These dependencies are
represented either by derived attributes, or by integrity constraints.
Generalisation links may relate to spatial or non-spatial object types. Inheritance can
be adjusted by using either refinement or redefinition. Refinement is useful each time a
property of an object (thematic attribute or geometry) has a smaller domain in a subtype
than in a supertype. Redefining an inherited attribute aims at a different objective. Redefinition creates a new attribute in the subtype, with the same name. Thus, redefinition of
geometry makes it possible to associate different geometries to the same object. This can
be used for multiple representations. Multiple inheritance is another kind of generalisation link; it allows the same object to be shared by several object types [18].
Spatial partitions are discussed in [12]. Partitions are a central spatial concept for organising our perception and understanding of space. They enable us to consider the attributes of single points (space-based view), and they also provide access to collections of points having equal attributes (object-based view). The model thus closes the gap between these two views of space.
In set theory, a partition is a complete decomposition of a set $S$ into non-empty, disjoint subsets $\{S_i \mid i \in I\}$, called blocks. A partition can be seen as a total surjective function $\pi : S \twoheadrightarrow I$. A spatial partition can be defined as a set-theoretic partition of the plane, or as a function $\pi : \mathbb{R}^2 \twoheadrightarrow I$.
From an application point of view, the different blocks of a spatial partition are marked (labelled). A partition model should therefore consider point sets together with their associated values. The set of values used for labelling in a specific partition effectively defines the type of the partition. Spatial partitions of type $A$ are functions of type $\pi : \mathbb{R}^2 \to A$, where $A$, in contrast to $I$, carries some semantics. To ensure that $\pi$ is a total function, every label type $A$ is assumed to contain an element $\bot_A$ (called undefined or unknown), and the area outside a partition is labelled by $\bot_A$.
Blocks of a spatial partition are called regions. Regions that actually appear in applications are regular, without cuts or punctures, and without isolated points or lines. The interior of the regions of a partition is therefore required to be a regular open set. Since points on the boundary cannot be assigned uniquely to either of the adjacent regions, they are mapped to the set of labels of all their adjacent regions.
The definition of spatial partitions is given in two steps. First, a spatial mapping of type $A$ is defined as a total mapping $\pi : \mathbb{R}^2 \to A \cup \mathcal{P}(A)$. The range of a spatial mapping $\pi$ is the set of labels actually used by $\pi$ and is denoted by $\mathrm{range}(\pi)$. The blocks of a spatial mapping $\pi$ are maximal point sets that are mapped to the same value. (If $f : X \to A$, then $\forall a \in A \bullet f^{-1}(a) = \{x \in X \mid f(x) = a\}$; when $f^{-1}$ is applied to a set, it yields a set of sets.) The common label of a block $b$ of $\pi$ is denoted by $\pi[b]$, that is, $\pi(b) = \{l\} \Rightarrow \pi[b] = l$.
The cardinality of block labels identifies the different parts of a partition: the interior and the boundary. A region of $\pi$ is any block of $\pi$ that is mapped to a single element of $A$, and a border of $\pi$ is a block that is mapped to a set of $A$-values. The interior of $\pi$ is the union of all its regions, and the boundary of $\pi$ is the union of all its border blocks. Let $\pi$ be a spatial mapping of type $A$; then
$$
\rho(\pi) := \pi^{-1}(\mathrm{range}(\pi) \cap A), \qquad \beta(\pi) := \pi^{-1}(\mathrm{range}(\pi) \cap \mathcal{P}(A))
$$
are the regions and the borders of the partition, respectively.
Then a spatial partition is defined by topologically constraining regions to regular point sets and by semantically constraining boundary labels to those of adjacent regions. Thus, a spatial partition of type $A$ is a spatial mapping $\pi$ of type $A$ such that
$$
\begin{array}{l}
\forall r \in \rho(\pi) : r \text{ is a regular point set, and} \\
\forall b \in \beta(\pi) : \pi[b] = \{\pi[r] \mid r \in \rho(\pi) \land b \subseteq \overline{r}\}
\end{array}
$$
The set of all spatial partitions of type $A$ is denoted by $[A]$, and $[A] \subseteq \mathbb{R}^2 \to A \cup \mathcal{P}(A)$.
The partition boundary can be seen as an undirected planar graph. From this point of view, borders can be classified further by the cardinality of their labels. An edge block is mapped to a two-element subset of $A$ and defines a border curve between two regions. A vertex block is mapped to a subset of $A$ with three or more elements; it is a singleton point set and describes a location where more than two regions of a partition meet [12].
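A raster approximation gives a concrete feel for the cardinality criterion. The following sketch works on a hypothetical labelled grid and does not implement the formal model of [12]. Each interior grid corner is mapped to the set of labels of the four cells that meet there. One label means the corner lies inside a region, two labels mean an edge block, and three or more mean a vertex block. Corners on the outer border of the grid, where the label ⊥ would appear, are ignored.

```python
def corner_labels(grid):
    """Map each interior grid corner to the set of labels of the cells meeting there."""
    rows, cols = len(grid), len(grid[0])
    return {
        (i, j): {grid[i - 1][j - 1], grid[i - 1][j], grid[i][j - 1], grid[i][j]}
        for i in range(1, rows)
        for j in range(1, cols)
    }

def classify(labels):
    """Classify a corner by label cardinality, as for blocks of a spatial partition."""
    if len(labels) == 1:
        return "region interior"
    if len(labels) == 2:
        return "edge block"
    return "vertex block"

grid = [["A", "A", "B"],
        ["A", "A", "B"],
        ["C", "C", "B"]]
for corner, labels in sorted(corner_labels(grid).items()):
    print(corner, sorted(labels), classify(labels))
# (1, 1) ['A'] region interior
# (1, 2) ['A', 'B'] edge block
# (2, 1) ['A', 'C'] edge block
# (2, 2) ['A', 'B', 'C'] vertex block
```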
Three basic operators on partitions, Intersection, Relabel and Refine, are formally defined in [12]. The paper also shows that operations that arise generally in spatial analysis, cartography, etc., such as overlay, reclassify, merge, cover and clipping, can be realised by these three basic operators.
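On a raster approximation of partitions, two of these operators are easy to sketch. The functions below are hypothetical cell-wise analogues of Intersection and Relabel; Refine is not shown. Intersection realises an overlay. Relabelling realises reclassification, and merging is the special case in which several labels map to one.

```python
def intersection(p, q):
    """Overlay: label every cell with the pair of labels of both partitions."""
    return [[(a, b) for a, b in zip(row_p, row_q)] for row_p, row_q in zip(p, q)]

def relabel(p, f):
    """Reclassify: apply f to every label; cells that receive equal labels merge."""
    return [[f(a) for a in row] for row in p]

soil = [["clay", "sand"],
        ["clay", "clay"]]
land_use = [["farm", "farm"],
            ["town", "farm"]]

overlay = intersection(soil, land_use)
texture = relabel(soil, lambda s: "fine" if s == "clay" else "coarse")
print(overlay[1][0])  # ('clay', 'town')
print(texture)        # [['fine', 'coarse'], ['fine', 'fine']]
```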
Spatial types, together with some Base, Time, Temporal and Range types, are introduced in [13]. The spatial types used in the design of the wider spatio-temporal types are:
• point;
• points, as finite sets of points;
• line, as finite sets of curves that intersect each other only at their ends;
• region, as regular point sets.
A curve is given as the range of a continuous function from a closed interval in $\mathbb{R}$ to $\mathbb{R}^2$. It is not self-intersecting but can be looped, and a condition for the uniqueness of the representation of a curve by such a function is given.
In the geometry object model of [7], a GeometryCollection is an object type that is a collection of one or more geometric types. Its subtypes are:
• Point.
• MultiPoint, the type of finite collections of points.
• Curve, given as the homeomorphic image of a closed real interval. A curve can be simple (the line is not self-intersecting) and looped (the start and end of the line are the same point, although the homeomorphism excludes a looped curve).
• LineString, a curve obtained by linear interpolation between its representing points.
• Line, a line string with exactly two points.
• LinearRing, a simple looped curve.
• MultiCurve, a collection of curves, with MultiLineString as a subtype.
• Surface, which can be a three-dimensional as well as a planar surface.
• Polygon, a planar surface that is a regular closed set whose interior is connected and whose frontier consists of a set of linear rings.
• MultiSurface, a collection of surfaces.
• MultiPolygon, a collection of polygons that can intersect each other only in a finite set of points of their frontiers.
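The LinearRing conditions (closed and simple) can be tested on a list of points. The following sketch uses a standard orientation-based segment intersection test. The function names are hypothetical and are not part of the specification of [7]. Adjacent segments may share their common endpoint but must not fold back onto each other.

```python
def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def on_segment(p, q, r):
    """For collinear p, q, r: does r lie within the bounding box of segment pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(a, b, c, d):
    """Do the closed segments ab and cd share at least one point?"""
    o1, o2, o3, o4 = orient(a, b, c), orient(a, b, d), orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and on_segment(a, b, c)) or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a)) or (o4 == 0 and on_segment(c, d, b)))

def is_linear_ring(points):
    """Closed (first point == last point) and simple (no self-intersection)."""
    if len(points) < 4 or points[0] != points[-1]:
        return False
    segs = list(zip(points, points[1:]))
    n = len(segs)
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                # Adjacent segments (a, b), (b, c) must not overlap beyond b.
                (a, b), (_, c) = (segs[i], segs[j]) if j == i + 1 else (segs[j], segs[i])
                if orient(a, b, c) == 0 and (on_segment(a, b, c) or on_segment(b, c, a)):
                    return False
            elif segments_intersect(*segs[i], *segs[j]):
                return False
    return True

print(is_linear_ring([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]))  # True: a square
print(is_linear_ring([(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]))  # False: a bow tie
```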
2.3 Data Modelling
The IFO model described in [1] is a formal semantic database model whose primary focus is on the structural component of the data model (the other components being data manipulation and integrity specification).
Four fundamental principles of semantic database modelling are identified. The most basic is that data about objects and relationships between them should be modelled in a
direct manner. As first introduced by the Entity-Relationship model, such “object-based”
modelling allows database designers and users to think in terms of objects without the
indirection resulting from the symbolic identifiers necessitated by records and pointers. As
highlighted in the Functional Data Model (FDM), a second basic perception of semantic modelling is that many (if not most) of the relationships recorded in a database are
functional in nature. A third basic perception is the significance of the so-called “ISA”
relationships, which specify the fact that one set of objects must be a subset of another
set of objects. The final perception of semantic data modelling is to provide a hierarchical mechanism for building object types out of other object types, like aggregation and
grouping.
The IFO model is a mathematically defined database model that incorporates the four
principles within a coherent, graph-based representational framework. The presentation
of the model is done in four steps. First types are introduced that model the structure of
objects arising in database applications. Second, fragments are built from types, and are
used to represent functional relationships in the IFO model. Third, it is described how
ISA relationships between various objects of the schema are incorporated. Finally, all the
pieces are put together to form IFO schemas.
Types can be atomic or nonatomic, and they are defined as tree structures. There are three kinds of atomic types:
• Printable types are predefined types such as STRING, INTEGER, BOOLEAN, PICTURE, etc.
• Abstract types correspond to objects in the world that have no underlying structure from the point of view of the application; the type PERSON is a typical example.
• Free types correspond intuitively to entities obtained via an ISA relationship. For example, STUDENTs are a subclass of PERSONs, and STUDENT is a free type.
Nonatomic types are built from the previous ones using two mechanisms. “Collection”, or grouping, forms finite sets of objects and is represented by star-vertices (⊛-vertices) in the IFO schema: if $m \ge 0$ and $O_1, \ldots, O_m$ are objects, then $\{O_1, \ldots, O_m\}$ is a set object. “Aggregation”, or composition, is a Cartesian product and is represented by cross-vertices (⊗-vertices): if $n > 0$ and $O_1, \ldots, O_n$ are objects, then $[O_1, \ldots, O_n]$ is a tuple object. The two constructs corresponding to star and cross vertices can be applied recursively in any order.
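The two IFO constructs can be mimicked with immutable Python values: frozen sets for grouping and tuples for aggregation. The sketch below is illustrative only; the identifiers are hypothetical. It shows that the constructs nest in any order and that a tuple object needs at least one component.

```python
def group(*objects):
    """Grouping (star-vertex): a finite set object {O1, ..., Om}, m >= 0."""
    return frozenset(objects)

def aggregate(*objects):
    """Aggregation (cross-vertex): a tuple object [O1, ..., On], n > 0."""
    if not objects:
        raise ValueError("a tuple object needs at least one component")
    return tuple(objects)

# Hypothetical abstract objects (PERSONs) represented by opaque identifiers.
ann, bob = "PERSON#1", "PERSON#2"

team = group(ann, bob)                                    # {ann, bob}
project = aggregate("Bridge", team)                       # [name, {members}]
portfolio = group(project, aggregate("Tunnel", group()))  # constructs nest in any order
print(len(portfolio))  # 2
```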
Fragments are used to represent functional relationships. Unlike the FDM, however, the IFO model distinguishes between vertices serving the role of domain and those serving the role of range. Another difference is that nested functions can be modelled in the IFO model but not in the FDM.
An ISA Relationship from a type SUB to a type SUPER indicates that each object associated with SUB is associated with the type SUPER, and functions of SUPER are inherited
by SUB. Two types of ISA relationships are distinguished in an IFO model: specialisation
and generalisation. Specialisation can be used to define possible roles for members of a given type, e.g., subtypes EMPLOYEE and STUDENT are specialisations of PERSON.
An object may change such roles without changing its underlying or fundamental identity.
A generalisation represents situations where distinct, pre-existing types are combined to
form new virtual types, e.g., types CAR and MOTOR-BOAT can be combined to form
VEHICLE. In such situations it is not allowed for an object of one subtype to migrate into
another subtype. Also, it is common to require that a generalised supertype be covered by
its subtypes.
Two simple constraints on ISA relationships can be incorporated into IFO schemas: subtypes forming a generalised type can be forced to be disjoint, and specialisations of a
supertype can be asked to cover the supertype.
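The two constraints amount to simple set conditions on the extensions of the types involved. The following sketch checks them for hypothetical extensions given as Python sets. The example objects and the function names are our own.

```python
from itertools import combinations

def disjoint(subtypes):
    """Constraint 1: no object belongs to two of the subtypes."""
    return all(not (s & t) for s, t in combinations(subtypes, 2))

def covers(supertype, subtypes):
    """Constraint 2: every object of the supertype belongs to some subtype."""
    return supertype <= set().union(*subtypes)

# Generalisation: VEHICLE formed from the pre-existing types CAR and MOTOR-BOAT.
car, motor_boat = {"car1", "car2"}, {"boat1"}
vehicle = car | motor_boat
print(disjoint([car, motor_boat]), covers(vehicle, [car, motor_boat]))  # True True

# Specialisation: roles EMPLOYEE and STUDENT of PERSON may overlap and need not cover it.
person = {"ann", "bob", "eve"}
employee, student = {"ann", "bob"}, {"bob"}
print(disjoint([employee, student]), covers(person, [employee, student]))  # False False
```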
IFO Schemas can be built in a top-down design fashion, beginning with the specification
of the major object types arising in the application environment, then specifying subsidiary
object types, either constructed or defined as sub- or supertypes, and finally specifying the
functions of all object types of the schema. Five rules are imposed on the ISA relationships of an IFO schema.
To characterise very precisely the types occurring locally in a schema instance the concept
of “derived type” is introduced. The derived types are a family of tree structures that start
with basic types (printable and abstract) and permit the application of three constructs, ⊗, ⊛ and ⊕, which represent aggregation, grouping and generalisation, respectively. The type $T$ formed by applying ⊕ to two types $T_1$ and $T_2$ has a domain equal to the union of the domains of $T_1$ and $T_2$. An order relation and an equivalence relation are defined on the derived types.
The paper also focuses on the semantics of updates in the IFO model. In particular, it carefully examines the different ways in which a modification of the data associated
with one part of a database schema can affect data associated with other parts of the
schema. A fundamental observation was that local update semantics can be specified
separately for each construct of the model, and then combined in a natural manner to
form a well-defined global semantics [1].
The spatial data model described in [11] is an integration of functional data modelling
concepts with order-sorted algebras. The novelties of this model are the modelling and
querying of networks and heterogeneous collections of spatial objects. Graphs are used for
modelling networks, but they are also presented as a modelling tool to describe relationships
between objects, thus making available all the efficient graph algorithms in data queries.
The authors use a multilevel order-sorted algebra to express all the concepts they introduce in a common formalism. The classical relational algebra is a one-sorted algebra,
its domain being a set of relations, having a collection of operations (functions like join,
project) defined on this domain. A many-sorted algebra provides for a well-structured type
system and integration of arithmetic or aggregate functions, and generally ADT functions.
Order-sorted algebra allows for subtype hierarchies and inheritance, by means of a partial
order on algebra sets (sort carriers), which implies that functions defined on one set can be
applied to elements of a subset. The model describes complex object types that are built from simpler ones by means of type constructors, which cannot be described by a one-level algebra. The model also supports polymorphic functions, and this polymorphism cannot be modelled by a single level of algebra either. Parametric polymorphism requires sets of types over which the (polymorphic) function is defined. These sets are realised by kinds, which are simply sets of types (sorts of the first level). The first-level algebra describes basic types, which are the sorts of this algebra, and operations on sorts. The second-level algebra describes kinds, whose carriers are sets of sorts of the first-level algebra. The operations of the second-level algebra are the type constructors. Their associated functions are mappings between kinds, i.e., they map one or more sorts of one kind to a sort of another kind. This second level is called the kind algebra. The types obtained by applying the type constructors of the second level are then used at the first level. A third-level algebra (called the class algebra) is also mentioned; it introduces some operations on complex (structured) types.
The data model is developed in three steps, by introducing first the data types, then the
object types, and finally the structures.
Data types are a collection of standard data types (BOOL, STR, INT, REAL) as well as the geometric types (POINT, LINE, REG) and some other types, shown in figure 2.1.a, which gives the sort hierarchy.
Standard and geometric types are the basic data types (sorts) and are the leaves of the tree in the sort hierarchy. The notation $\langle \cdot \rangle$ is used to refer to the domain of a type (the carrier set of the sort), e.g., $\langle INT \rangle$ denotes the carrier set of sort INT. The carrier of an internal node is defined to be the union of the carriers of its children, e.g., $\langle NUM \rangle = \langle INT \rangle \cup \langle REAL \rangle$. This meets the subtype constraint of order-sorted algebra.
The set of kinds $S = \{NUM, ORD, GEO, DATA\}$ is introduced in the second-level algebra. A kind hierarchy is also defined for data types; it is given in figure 2.1.b as part of the complete type system hierarchy. The notation $\langle\langle \cdot \rangle\rangle$ is used to denote the carrier of a kind. The carrier of a kind contains just the corresponding sort and its descendants in the sort hierarchy, e.g., $\langle\langle GEO \rangle\rangle = \{GEO, POINT, EXT, LINE, REG\}$.
The introduction of kinds makes it possible to define polymorphic functions. For example, the spatial operation ‘inside’ can be defined as $inside : GEO \times REG \to BOOL$, where GEO can be any of the spatial types POINT, LINE or REG.
The type constructor ⊕ on data types, with the signature
$$
\oplus : DATA \times DATA \to DATA,
$$
is defined to return, for any two sorts in $\langle\langle DATA \rangle\rangle$, their smallest common supersort, e.g., $LINE \oplus REG = EXT$ and $POINT \oplus LINE = GEO$.
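The carrier and ⊕ definitions can be sketched over a small sort tree. Figure 2.1.a is not reproduced here, so the hierarchy below is only partly grounded in the text. NUM over INT and REAL, GEO over POINT and EXT, and EXT over LINE and REG follow from the examples given. Placing BOOL and STR directly under DATA is an assumption, and ORD is omitted.

```python
# Sort hierarchy as a child -> parent map, with DATA as the root (partly assumed, see above).
PARENT = {
    "NUM": "DATA", "GEO": "DATA", "BOOL": "DATA", "STR": "DATA",
    "INT": "NUM", "REAL": "NUM",
    "POINT": "GEO", "EXT": "GEO",
    "LINE": "EXT", "REG": "EXT",
}
SORTS = set(PARENT) | {"DATA"}

def ancestors(sort):
    """The sort itself followed by all its supersorts up to the root."""
    chain = [sort]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def oplus(s, t):
    """Type constructor (+): the smallest common supersort of s and t."""
    above_t = set(ancestors(t))
    return next(a for a in ancestors(s) if a in above_t)

def kind_carrier(kind):
    """<<kind>>: the sort of the same name and every sort below it."""
    return {s for s in SORTS if kind in ancestors(s)}

print(oplus("LINE", "REG"), oplus("POINT", "LINE"))  # EXT GEO
print(sorted(kind_carrier("GEO")))  # ['EXT', 'GEO', 'LINE', 'POINT', 'REG']
```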
Object types are totally dependent on a specific application. Sorts represent object
classes, and operations represent functions applicable to objects. The object sorts model the objects stored in the database, and together they make up a kind BASEOBJ. For each sort in BASEOBJ, the carrier is in principle a set of object identifiers.
Two type constructors, ⊕ (union) and ⊗ (aggregation), are used to build other object types. Objects of these types are called “potential objects” and are not stored in the database.
The sort s ⊕ t resulting from union operation will represent a “collection of objects” which
are of type s or t. In the object sort hierarchy, the constructed sort is a supersort of the
operands to the constructor. The ⊗ operation allows us to build a sort for “aggregation
objects”. The sort constructed from aggregation operation is a subsort of the operands to
the constructor.
Structured types: The two fundamental structures available in this model are sequences
and graphs, which are introduced through the constructors seq and graph, respectively. The (second-level) signature of the seq constructor, used to build sorts of kind SEQ, is $seq : ANY \to SEQ$. The set of sorts obtained by applying the seq constructor to the sorts in $\langle\langle ANY \rangle\rangle$ is the carrier of kind SEQ:
$$
\langle\langle SEQ \rangle\rangle = \{seq(BOOL), seq(INT), \ldots, seq(POINT), \ldots\}
$$
A database could be modelled as an object hierarchy, together with functions applicable to
objects, describing attributes and relationships. The model introduces graphs as another
modelling tool, which means that the user can define some part of the database explicitly
as a graph structure.
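Any three object sorts $s, t, u$ of kind BASEOBJ can be selected, and by applying the graph constructor to them a type $graph(s, t, u)$ can be defined:
$$
\begin{array}{l}
\forall s, t, u \in \langle\langle BASEOBJ \rangle\rangle : \\
\langle graph(s, t, u) \rangle := \{ (N, E, XP, \varepsilon, \pi) \mid \\
\quad \text{(i) } N \subseteq \langle s \rangle,\ E \subseteq \langle t \rangle,\ XP \subseteq \langle u \rangle, \\
\quad \text{(ii) } \varepsilon : E \rightarrowtail N \times N \text{ (no two edges between the same nodes)}, \\
\quad \text{(iii) } \pi : XP \to E^{*} \text{, whose range contains only simple paths of the graph } (N, \varepsilon(E)) \}
\end{array}
$$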
The idea is that in a given graph of type graph(HLocation, Section, Highway) a HLocation
object is associated to each node, a Section object is associated with each edge, and for
each Highway in XP there is a path in this graph associated to it by π. It is assumed
that an object type used in such a graph definition is ‘devoted to’ this graph instance, i.e.,
that every object in the object type used for nodes, is automatically a node of this graph
instance.
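Conditions (ii) and (iii) can be checked for a concrete graph value. The following sketch rests on several assumptions. ε is a dictionary from edges to ordered node pairs, read as directed edges. π maps each xpath object to a non-empty list of edges. Condition (i), which depends on the carriers of $s$, $t$ and $u$, is not checked. The highway names are hypothetical.

```python
def is_valid_graph(N, E, XP, eps, pi):
    """Check conditions (ii) and (iii) of graph(s, t, u) for a value (N, E, XP, eps, pi).

    eps -- dict: edge -> (node, node), read here as a directed edge (assumption)
    pi  -- dict: xpath object -> list of edges
    """
    # (ii) eps is total on E, maps into N x N, and is injective.
    if set(eps) != set(E):
        return False
    if any(a not in N or b not in N for a, b in eps.values()):
        return False
    if len(set(eps.values())) != len(eps):
        return False  # two edges between the same nodes
    # (iii) pi is total on XP and every image is a simple path.
    for x in XP:
        edges = pi.get(x)
        if not edges or any(e not in E for e in edges):
            return False  # assumption: a path has at least one edge
        nodes = [eps[edges[0]][0]]
        for e in edges:
            start, end = eps[e]
            if start != nodes[-1]:
                return False  # consecutive edges must share a node
            nodes.append(end)
        if len(set(nodes)) != len(nodes):
            return False  # a node is visited twice: the path is not simple
    return True

N = {"A", "B", "C"}
E = {"s1", "s2"}
eps = {"s1": ("A", "B"), "s2": ("B", "C")}
print(is_valid_graph(N, E, {"H1"}, eps, {"H1": ["s1", "s2"]}))  # True
```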
The constructors node, edge, and xpath, are in fact selectors: they extract from a graph
sort the sorts it was constructed from:
node(graph(s, t, u)) = s, edge(graph(s, t, u)) = t, xpath(graph(s, t, u)) = u.
These constructors map to the kind COMPOBJ (graph component object) whose element
can be treated like any other object sort; it is therefore a subkind of BASEOBJ. The
last constructor is path, whose application restricts the carrier of a given graph type. Hence $\forall G \in \langle\langle GRAPH \rangle\rangle : \langle path(G) \rangle \subseteq \langle G \rangle$, which defines a subsort relationship in the sort hierarchy. In other words, any path can also be viewed as a graph, which means that it inherits all operations defined on graphs [11].
TM, presented in [2], is a typed language with object-oriented features such as attributes and methods in the presence of subtyping, and FM is the formal theory on which it is based. The paper introduces two important features in conceptual database modelling: predicative description of sets, and static constraints of different granularity (object level, class level, database level). The formal theory FM is based on Cardelli's type theory. TM can handle expressions that denote enumerated sets, as well as set expressions formed by set comprehension, which have the form $\{x : \sigma \mid \phi(x)\}$, where $\phi(x)$ is a boolean expression and $\sigma$ a type in TM. A set expression of this form is called a predicative set.
Types are basic types such as integer, real, etc., power types, and record types such as $\langle age : integer, name : string \rangle$.
The boolean expressions of TM are:
• the constants true and false;
• logical formulas $\neg(e)$, $(e \Rightarrow e')$, $(e \land e')$, $(e \lor e')$, $(e \Leftrightarrow e')$, expressions involving the quantifiers $\forall$ and $\exists$ (where $e$ and $e'$ are boolean expressions), and expressions built up from arithmetical relations like $\le$, $>$, etc.;
• the special boolean expressions e isa e′, $e = e'$, $e \in e'$ (e in e′), $e \mathrel{\varepsilon} e'$ (e sin e′), and $e \subset e'$ (e subset e′), where $e$ and $e'$ are TM expressions.
Expressions are constants such as $1_{integer}$ and $2.0_{real}$, variables such as $x_{integer}$, records, or projections such as $\langle age = 3, name = \text{“John”} \rangle \cdot name$ (which evaluates to “John”).
The set of types is equipped with a subtyping relation $\le$ that is a partial order on the set of types. The typing rules are extended such that $e : \sigma,\ \sigma \le \tau \Rightarrow e : \tau$. This is called subtyping. The subtyping relation introduces polymorphism into the language, in the sense that expressions can have more than one type. A unique type can be attached to a correctly typed expression; this is called minimal typing. An expression $e$ has minimal type $\tau$ if $e : \tau \land \neg(\exists \sigma \mid e : \sigma \land \sigma \le \tau \land \sigma \ne \tau)$. The notation $e :: \tau$ states that $\tau$ is the minimal type of $e$.
If $\sigma$ is a type, then $\mathbb{P}\sigma$ denotes the powertype of $\sigma$, which is the collection of all sets of expressions $e$ such that $e : \sigma$. An expression $e$ is called a set if it has a powertype as its type, i.e., $e : \mathbb{P}\sigma$ for some type $\sigma$.
The methodology adopted in the paper to describe the set of allowed states of a database uses three levels:
1. The object level, in which the object types of interest are described as well as, for each object type, the set of allowed objects of that type. For a class $C$, the object type of $C$ at the object level is denoted by $\gamma$, and $C\_Universe :: \mathbb{P}\gamma$ denotes the set of allowed objects of class $C$:
$$
C\_Universe = \{x : \gamma \mid \phi(x)\}
$$
The predicate $\phi(x)$ determines which objects of type $\gamma$ are allowed objects.
2. The class extension level, in which the set of allowed class extensions for each class is described. At this level, $C\_ClassUniverse :: \mathbb{P}\mathbb{P}\gamma$ denotes the set of allowed class extensions (an element of $C\_ClassUniverse$ is thus a possible class extension of class $C$):
$$
C\_ClassUniverse = \{X : \mathbb{P}\gamma \mid X \subseteq C\_Universe \land \phi'(X)\}
$$
The predicate $\phi'(X)$ is used to state constraints on the class extension (for instance, that any extension of the class must contain more than ten objects).
3. The database level, in which the set of allowed database states is described. At this level, $DatabaseUniverse :: \mathbb{P}\langle C_1 : \mathbb{P}\gamma_1, \ldots, C_n : \mathbb{P}\gamma_n \rangle$ denotes the collection of allowed database states:
$$
DatabaseUniverse = \{DB : \langle C_1 : \mathbb{P}\gamma_1, \ldots, C_n : \mathbb{P}\gamma_n \rangle \mid \textstyle\bigwedge_{i=1}^{n} DB \cdot C_i \in C_i\_ClassUniverse \land \Phi(DB)\}
$$
By the generalised conjunction, this definition first of all requires each class extension in an allowed database state to be an allowed class extension. Furthermore, it may pose additional requirements on the database state by means of $\Phi(DB)$, such as referential integrity between distinct class extensions.
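The three levels translate directly into nested predicate checks. In this sketch, everything specific is hypothetical: the classes Person and Enrolment, their fields, and the chosen constraints (a plausible age range for φ, unique names for φ′, referential integrity for Φ). Class extensions are represented as lists of dictionaries.

```python
def in_class_universe(extension, phi, phi_prime):
    """X is an allowed class extension: all objects satisfy phi and X satisfies phi'."""
    return all(phi(x) for x in extension) and phi_prime(extension)

def in_database_universe(db, class_specs, Phi):
    """Every DB.C_i is an allowed class extension (generalised conjunction) and Phi(DB) holds."""
    return all(in_class_universe(db[c], phi, phi_prime)
               for c, (phi, phi_prime) in class_specs.items()) and Phi(db)

class_specs = {
    # Person: phi = plausible age; phi' = names are unique within the extension.
    "Person": (lambda p: 0 <= p["age"] <= 150,
               lambda X: len({p["name"] for p in X}) == len(X)),
    # Enrolment: phi = year not before 2000; no class-level constraint.
    "Enrolment": (lambda e: e["year"] >= 2000, lambda X: True),
}

def referential_integrity(db):
    """Phi: every enrolment refers to an existing person."""
    names = {p["name"] for p in db["Person"]}
    return all(e["student"] in names for e in db["Enrolment"])

db = {"Person": [{"name": "Ann", "age": 30}],
      "Enrolment": [{"student": "Ann", "year": 2000}]}
print(in_database_universe(db, class_specs, referential_integrity))  # True
```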
Chapter 3
Spatial Elements
In this chapter, we will formally present spatial elements that can be used in a conceptual
schema for spatial applications, and relationships between them. We will first introduce
graphs and graph concepts that can be used in data modelling in general.1 This is done
in section 3.1. We will use these concepts in section 3.2 to represent or define the spatial
elements. Section 3.3 gives a short summary of all spatial elements and relationships
between them.
3.1 Graph notions
A graph, G, is an ordered triple (V (G), E (G), ψG ) consisting of a nonempty set V (G)
of vertices, a set E (G), disjoint from V (G), of edges, and an incidence function ψG that
associates with each edge of G an unordered pair of (not necessarily distinct) vertices of
G. If e is an edge and u and v are vertices such that ψG (e) = {u, v }, then e is said to
join u and v ; the vertices u and v are called the ends of e. (An edge with distinct ends is
a link and one with identical ends is a loop.) The ends of an edge are said to be incident
with the edge and vice versa. Two vertices which are incident with a common edge are
adjacent, as are two edges which are incident with a common vertex.
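Graphs built from two given sets, $V$ and $E$, are:
$$
\begin{array}{l}
[V, E] \\
GRAPH : \mathbb{P}(\mathbb{P}\,V \times \mathbb{P}\,E \times (E \rightharpoonup \mathbb{P}_1 V)) \\
\forall G : GRAPH;\ V' : \mathbb{P}\,V;\ E' : \mathbb{P}\,E;\ \psi : E \rightharpoonup \mathbb{P}_1 V \mid G = (V', E', \psi) \bullet \\
\quad \operatorname{dom} \psi = E' \land \forall e : E' \bullet (\exists v_1, v_2 : V' \bullet \psi(e) = \{v_1, v_2\})
\end{array}
$$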
Graphs can be represented graphically by a diagram: each vertex is indicated by a point,
and each edge by a line joining the points which represent its ends.
We can find the components of a given graph by:
1 Definitions of graph concepts are taken from [6].
$$
\begin{array}{l}
[V, E] \\
vertices : GRAPH[V, E] \to \mathbb{P}\,V \\
edges : GRAPH[V, E] \to \mathbb{P}\,E \\
incf : GRAPH[V, E] \to (E \rightharpoonup \mathbb{P}_1 V) \\
\forall G : GRAPH[V, E] \bullet \\
\quad (vertices\ G = first\ G \land edges\ G = second\ G \land incf\ G = third\ G)
\end{array}
$$
first, second and third are the projection functions of a Cartesian product. We will use a shorter notation2 for vertices G, edges G and incf G:
$$
V_G == vertices\ G, \qquad E_G == edges\ G, \qquad \psi_G == incf\ G
$$
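For every graph we can define its incidence matrix as

$$
\begin{array}{l}
[V, E] \\
\mathit{inc} : \mathit{GRAPH}[V, E] \rightarrow (V \times E \rightharpoonup \mathbb{N}) \\
\forall G : \mathit{GRAPH}[V, E] \bullet \operatorname{dom}(\mathit{inc}\ G) = V_G \times E_G \land {} \\
\qquad \forall v : V_G;\ e : E_G \bullet \mathit{inc}\ G(v, e) =
\begin{cases}
0 & \text{if } v \notin \psi_G(e) \\
1 & \text{if } v \in \psi_G(e) \land \#\psi_G(e) = 2 \\
2 & \text{if } v \in \psi_G(e) \land \#\psi_G(e) = 1
\end{cases}
\end{array}
$$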
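The degree of a vertex v in G is the number of edges incident with v, each loop counting as two edges. Thus, for every vertex of a graph we can define its degree as

$$
\begin{array}{l}
[V, E] \\
\mathit{degree} : \mathit{GRAPH}[V, E] \rightarrow (V \rightharpoonup \mathbb{N}) \\
\forall G : \mathit{GRAPH}[V, E] \bullet \operatorname{dom}(\mathit{degree}\ G) = V_G \land {} \\
\qquad \forall v : V_G \bullet \mathit{degree}\ G(v) = \sum_{e \in E_G} \mathit{inc}\ G(v, e)
\end{array}
$$

The following statement holds, because every edge contributes exactly 2 to the sum (1 to each end of a link, 2 to the single end of a loop):

$$\forall G : \mathit{GRAPH}[V, E] \bullet \sum_{v \in V_G} \mathit{degree}\ G(v) = 2 \cdot \#E_G.$$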
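As an illustration outside the formal specification, the incidence values and the degree-sum statement can be checked mechanically. The Python sketch below assumes a representation of our own choosing: the incidence function is a dict `psi` mapping each (hypothetical) edge name to the frozenset of its ends, so a loop has a one-element set of ends.

```python
def inc(psi, v, e):
    """inc G(v, e): 0 if v is not an end of e, 1 for a link, 2 for a loop."""
    ends = psi[e]
    if v not in ends:
        return 0
    return 1 if len(ends) == 2 else 2


def degree(psi, v):
    """degree G(v): sum of inc G(v, e) over all edges, so a loop counts twice."""
    return sum(inc(psi, v, e) for e in psi)


# Hypothetical example: links e1 = {a, b}, e2 = {b, c} and a loop e3 at c.
V = {"a", "b", "c"}
psi = {
    "e1": frozenset({"a", "b"}),
    "e2": frozenset({"b", "c"}),
    "e3": frozenset({"c"}),
}
degrees = {v: degree(psi, v) for v in V}
print(degrees)
# Degree-sum statement: the degrees add up to twice the number of edges.
assert sum(degrees.values()) == 2 * len(psi)
```

Here vertex c has degree 3: one from the link e2 and two from the loop e3.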
A walk in G is a finite non-empty sequence $W = \langle v_0, e_1, v_1, \ldots, e_k, v_k \rangle$, whose terms are alternately vertices and edges, such that, for $1 \le i \le k$, the ends of $e_i$ are $v_{i-1}$ and $v_i$. If the edges $e_1, e_2, \ldots, e_k$ of a walk are distinct, W is called a trail. If, in addition, the vertices $v_0, v_1, \ldots, v_k$ are distinct, W is called a path (a $(v_0, v_k)$-path). To define walks, trails and paths we need alternating sequences over two sets: sequences of odd length whose elements in odd positions come from one set and whose elements in even positions come from the other set.

$$
\begin{array}{l}
\mathit{altseq}[X, Y] \mathrel{==} \{ s : \mathbb{N}_1 \rightharpoonup X \cup Y \mid \exists n : \mathbb{N} \bullet \operatorname{dom} s = 1 \mathbin{..} (2n + 1) \land {} \\
\qquad (\forall m : \mathbb{N} \mid 2m + 1 \le 2n + 1 \bullet s(2m + 1) \in X) \land {} \\
\qquad (\forall m : \mathbb{N}_1 \mid 2m < 2n + 1 \bullet s(2m) \in Y) \}
\end{array}
$$
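A minimal membership test for altseq[X, Y] may help to read the definition. In this sketch (an assumption, not part of the specification) a Python list stands in for a Z sequence, with the 1-based positions of the definition mapped onto `enumerate(..., start=1)`.

```python
def is_altseq(s, X, Y):
    """Membership test for altseq[X, Y]: a sequence of odd length 2n + 1 whose
    terms at odd (1-based) positions lie in X and at even positions lie in Y."""
    if len(s) % 2 == 0:  # also rejects the empty sequence
        return False
    for pos, term in enumerate(s, start=1):
        allowed = X if pos % 2 == 1 else Y
        if term not in allowed:
            return False
    return True


print(is_altseq(["a", "e1", "b"], {"a", "b"}, {"e1"}))
```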
² When there is no confusion about V and E, we do not attach them to the function names; e.g., we write incf instead of incf[V, E].
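Now we can give the definitions of walks, trails and paths in a graph G.

$$
\begin{array}{l}
[V, E] \\
\mathit{walks}, \mathit{trails}, \mathit{paths} : \mathit{GRAPH}[V, E] \rightarrow \mathbb{P}\,\mathit{altseq}[V, E] \\
\forall G : \mathit{GRAPH}[V, E] \bullet \\
\quad \mathit{walks}\ G = \{ s : \mathit{altseq}[V_G, E_G] \mid \exists k : \mathbb{N} \bullet \#s = 2k + 1 \land {} \\
\qquad\qquad (\forall i : 1 \mathbin{..} k \bullet \psi_G(s(2i)) = \{ s(2i - 1), s(2i + 1) \}) \} \land {} \\
\quad \mathit{trails}\ G = \{ w : \mathit{walks}\ G \mid \exists k : \mathbb{N} \bullet \#w = 2k + 1 \land {} \\
\qquad\qquad (\forall i, j : 1 \mathbin{..} k \mid i \neq j \bullet w(2i) \neq w(2j)) \} \land {} \\
\quad \mathit{paths}\ G = \{ t : \mathit{trails}\ G \mid \exists k : \mathbb{N} \bullet \#t = 2k + 1 \land {} \\
\qquad\qquad (\forall i, j : 0 \mathbin{..} k \mid i \neq j \bullet t(2i + 1) \neq t(2j + 1)) \}
\end{array}
$$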
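A walk is closed if it has positive length and its origin and terminus are the same. A closed trail whose origin and internal vertices are distinct is a cycle.

$$
\begin{array}{l}
[V, E] \\
\mathit{cycles} : \mathit{GRAPH}[V, E] \rightarrow \mathbb{P}\,\mathit{altseq}[V, E] \\
\forall G : \mathit{GRAPH}[V, E] \bullet \\
\qquad \mathit{cycles}\ G = \{ t : \mathit{trails}\ G \mid \exists k : \mathbb{N}_1 \bullet \#t = 2k + 1 \land t(1) = t(2k + 1) \land {} \\
\qquad\qquad (\forall i, j : 1 \mathbin{..} k \mid i \neq j \bullet t(2i + 1) \neq t(2j + 1)) \}
\end{array}
$$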
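The four definitions can be read as successively stronger tests on an alternating sequence. The following Python sketch mirrors them under our own assumptions: a walk is a list `[v0, e1, v1, ..., ek, vk]`, `V` is the vertex set, and `psi` maps hypothetical edge names to frozensets of ends.

```python
def is_walk(seq, V, psi):
    """seq = [v0, e1, v1, ..., ek, vk]; every ei must join v(i-1) and vi."""
    if len(seq) % 2 == 0 or any(v not in V for v in seq[0::2]):
        return False
    return all(
        seq[i] in psi and psi[seq[i]] == frozenset({seq[i - 1], seq[i + 1]})
        for i in range(1, len(seq), 2)
    )


def is_trail(seq, V, psi):
    edges = seq[1::2]  # the edges must be distinct
    return is_walk(seq, V, psi) and len(edges) == len(set(edges))


def is_path(seq, V, psi):
    vertices = seq[0::2]  # additionally, the vertices must be distinct
    return is_trail(seq, V, psi) and len(vertices) == len(set(vertices))


def is_cycle(seq, V, psi):
    """Closed trail of positive length whose origin and internal vertices are distinct."""
    if len(seq) < 3 or not is_trail(seq, V, psi) or seq[0] != seq[-1]:
        return False
    later = seq[2::2]  # v1, ..., vk (vk coincides with v0)
    return len(later) == len(set(later))
```

With this encoding a loop `[c, e, c]` is a cycle of length one, as in the formal definition.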
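A graph H is a subgraph of G (written H ⊆ G) if $V_H \subseteq V_G$, $E_H \subseteq E_G$, and $\psi_H$ is the restriction of $\psi_G$ to $E_H$. Subgraphs of a given graph G are defined by

$$
\begin{array}{l}
[V, E] \\
\mathit{subgraphs} : \mathit{GRAPH}[V, E] \rightarrow \mathbb{P}\,\mathit{GRAPH}[V, E] \\
\forall G : \mathit{GRAPH}[V, E] \bullet \mathit{subgraphs}\ G = {} \\
\qquad \{ H : \mathit{GRAPH}[V, E] \mid V_H \subseteq V_G \land E_H \subseteq E_G \land \psi_H = E_H \lhd \psi_G \}
\end{array}
$$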
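Subgraphs of a graph whose vertices and edges are the terms of a path are

$$
\begin{array}{l}
[V, E] \\
\mathit{SPaths} : \mathit{GRAPH}[V, E] \rightarrow \mathbb{P}\,\mathit{GRAPH}[V, E] \\
\forall G : \mathit{GRAPH}[V, E] \bullet \\
\qquad \mathit{SPaths}\ G = \{ p : \mathit{paths}\ G;\ H : \mathit{subgraphs}\ G \mid \exists k : \mathbb{N} \bullet \#p = 2k + 1 \land {} \\
\qquad\qquad V_H = \{ i : 0 \mathbin{..} k \bullet p(2i + 1) \} \land E_H = \{ i : 1 \mathbin{..} k \bullet p(2i) \} \land {} \\
\qquad\qquad (\forall i : 1 \mathbin{..} k \bullet \psi_H(p(2i)) = \{ p(2i - 1), p(2i + 1) \}) \bullet H \}
\end{array}
$$

The set

$$\mathit{PATH}[V, E] \mathrel{==} \bigcup_{G \in \mathit{GRAPH}[V, E]} \mathit{SPaths}\ G$$

is a subtype of GRAPH[V, E].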
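Similarly, we define the subgraphs of a graph whose vertices and edges are the terms of a cycle, and we call them SCycles.

$$
\begin{array}{l}
[V, E] \\
\mathit{SCycles} : \mathit{GRAPH}[V, E] \rightarrow \mathbb{P}\,\mathit{GRAPH}[V, E] \\
\forall G : \mathit{GRAPH}[V, E] \bullet \\
\qquad \mathit{SCycles}\ G = \{ c : \mathit{cycles}\ G;\ H : \mathit{subgraphs}\ G \mid \exists k : \mathbb{N}_1 \bullet \#c = 2k + 1 \land {} \\
\qquad\qquad V_H = \{ i : 0 \mathbin{..} k \bullet c(2i + 1) \} \land E_H = \{ i : 1 \mathbin{..} k \bullet c(2i) \} \land {} \\
\qquad\qquad (\forall i : 1 \mathbin{..} k \bullet \psi_H(c(2i)) = \{ c(2i - 1), c(2i + 1) \}) \bullet H \}
\end{array}
$$

The set

$$\mathit{CYCLE}[V, E] \mathrel{==} \bigcup_{G \in \mathit{GRAPH}[V, E]} \mathit{SCycles}\ G$$

is another subtype of GRAPH[V, E].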
Suppose V′ is a nonempty subset of the vertex set $V_G$ of a graph G. The subgraph of G whose vertex set is V′ and whose edge set is the set of edges of G that have both ends in V′ is called the subgraph of G induced by V′, and it is denoted by G[V′].

$$
\begin{array}{l}
[V, E] \\
\_[\_] : \mathit{GRAPH}[V, E] \times \mathbb{P}_1 V \rightharpoonup \mathit{GRAPH}[V, E] \\
\forall G : \mathit{GRAPH}[V, E];\ V' : \mathbb{P}_1 V \mid V' \subset V_G \bullet \\
\qquad G[V'] = (V', \operatorname{dom}(\psi_G \rhd \mathbb{P}_1 V'), \psi_G \rhd \mathbb{P}_1 V')
\end{array}
$$
Two vertices u and v of G are said to be connected if there is a (u, v ) path in G. Connection
is an equivalence relation on the vertex set VG of G. Thus there is a partition of VG into
nonempty subsets V1 , V2 , . . . Vn such that two vertices u and v are connected if and only
if both u and v belong to the same set Vi . The subgraphs G[V1 ], G[V2 ], . . . G[Vn ] are
called the components of G. If G has exactly one component, G is connected ; otherwise
G is disconnected. Thus, a graph G is connected if every two vertices of it are connected.
The set of connected graphs CGRAPH [V , E ] is
CGRAPH [V , E ] == {G : GRAPH [V , E ] | ∀ u, v : VG •
(∃ p : paths G • p(1) = u ∧ p(#p) = v )}
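The partition of $V_G$ into the vertex sets $V_1, \ldots, V_n$ of the components can be computed by a depth-first search. The sketch below is an illustration under our own assumptions (the same dict-of-frozensets encoding of the incidence function as before); it is not an algorithm prescribed by the specification.

```python
def components(V, psi):
    """Vertex sets V1, ..., Vn of the components of G (depth-first search)."""
    neighbours = {v: set() for v in V}
    for ends in psi.values():
        for u in ends:
            neighbours[u] |= ends  # a loop only adds u itself
    parts, seen = [], set()
    for start in V:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(neighbours[v] - part)
        seen |= part
        parts.append(part)
    return parts


def is_connected(V, psi):
    """G is connected iff it has exactly one component."""
    return len(components(V, psi)) == 1
```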
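Suppose E′ is a nonempty subset of the edge set of a graph G. The subgraph of G obtained by deleting the edges of E′ is G − E′:

$$
\begin{array}{l}
[V, E] \\
\_ - \_ : \mathit{GRAPH}[V, E] \times \mathbb{P}_1 E \rightharpoonup \mathit{GRAPH}[V, E] \\
\forall G : \mathit{GRAPH}[V, E];\ E' : \mathbb{P}_1 E \mid E' \subset E_G \bullet G - E' = (V_G, E_G \setminus E', (E_G \setminus E') \lhd \psi_G)
\end{array}
$$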
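A cut edge of G is an edge e ∈ $E_G$ such that the number of components of G − {e} is greater than the number of components of G. If every vertex v ∈ $V_G$ of a graph G has even degree, G has no cut edges: deleting such an edge would leave a component with exactly one vertex of odd degree, which contradicts the degree-sum statement above.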
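Cut edges can be found directly from the definition by deleting one edge at a time and counting components. The Python sketch below assumes the same encoding as the previous examples and uses a simple union-find; the "bowtie" test graph (two triangles sharing a vertex) has only even degrees and, as stated above, no cut edge, although its shared vertex is a cut vertex.

```python
def count_components(V, psi):
    """Number of components, using a simple union-find over the vertices."""
    parent = {v: v for v in V}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for ends in psi.values():
        u, *rest = ends
        if rest:  # loops never merge anything
            parent[find(u)] = find(rest[0])
    return len({find(v) for v in V})


def cut_edges(V, psi):
    """Edges e such that G - {e} has more components than G."""
    base = count_components(V, psi)
    return {
        e for e in psi
        if count_components(V, {f: ends for f, ends in psi.items() if f != e}) > base
    }
```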
Figure 3.1: Graph hierarchy
An acyclic graph is one that contains no cycles. A tree is a connected acyclic graph.

$$\mathit{TREE}[V, E] \mathrel{==} \{ G : \mathit{CGRAPH}[V, E] \mid \mathit{cycles}\ G = \emptyset \}$$

The following statement is valid for trees: $\forall G : \mathit{TREE}[V, E] \bullet \#E_G = \#V_G - 1$.
A path is a tree whose vertices have degree at most 2; a path with at least one edge has exactly two vertices of degree 1. A cycle is a connected graph all of whose vertices have degree 2. Thus, PATH[V, E] ⊂ TREE[V, E] and CYCLE[V, E] ⊂ CGRAPH[V, E].
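The degree characterisations above translate into short tests. This Python sketch is illustrative only: it assumes finite graphs in the dict-of-frozensets encoding, and it tests acyclicity of a connected graph through the equivalent edge count #E = #V − 1 rather than by enumerating cycles.

```python
def degrees(V, psi):
    deg = {v: 0 for v in V}
    for ends in psi.values():
        for v in ends:
            deg[v] += 1 if len(ends) == 2 else 2  # a loop counts twice
    return deg


def is_connected(V, psi):
    if not V:
        return False
    start = next(iter(V))
    reached, frontier = {start}, [start]
    while frontier:
        v = frontier.pop()
        for ends in psi.values():
            if v in ends:
                new = ends - reached
                reached |= new
                frontier.extend(new)
    return reached == set(V)


def is_tree(V, psi):
    # For a connected graph, being acyclic is equivalent to #E = #V - 1.
    return is_connected(V, psi) and len(psi) == len(V) - 1


def is_path_graph(V, psi):
    return is_tree(V, psi) and all(d <= 2 for d in degrees(V, psi).values())


def is_cycle_graph(V, psi):
    return is_connected(V, psi) and all(d == 2 for d in degrees(V, psi).values())
```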
Another property of graphs (orthogonal to connectivity) is planarity. A graph is said to be
embeddable in the plane, or planar, if it can be drawn in the plane so that its edges intersect
only at their ends. Such a drawing of a planar graph G is called a planar embedding of G.
A planar embedding G̃ of G can itself be regarded as a graph; the vertex set of G̃ is the
set of points representing vertices of G, the edge set of G̃ is the set of lines representing
edges of G, and a vertex of G̃ is incident with all the lines of G̃ that contain it. A planar
embedding of a planar graph is sometimes called a plane graph. A plane graph carries the geometry of the plane.
We give the definition of plane graphs, PLGRAPH, when we discuss the one-dimensional elements (because this definition needs topological concepts). Connectivity, defined for graphs in general, is valid for plane graphs, and all the other concepts derived from it (e.g. tree, path and cycle, given above) are also valid for plane graphs. Figure 3.1 gives the hierarchy of the graph types introduced so far together with their planar analogues. PGRAPH are the planar graphs, PCGRAPH the connected planar graphs, PCYCLE the planar cycles, PTREE the planar trees, and PPATH the planar paths. (For readability, we omit V and E from the notation of graph types in figure 3.1.)
A directed graph (or digraph) D is an ordered triple $(V_D, A_D, \psi_D)$ consisting of a nonempty set $V_D$ of vertices, a set $A_D$, disjoint from $V_D$, of arcs, and an incidence function $\psi_D$ that associates with each arc of D an ordered pair of (not necessarily distinct) vertices of D. If a is an arc of D and u, v are vertices of D such that $\psi_D(a) = (u, v)$, then u is the tail of a and v is its head. Digraphs on V, A are defined by

$$
\begin{array}{l}
[V, A] \\
\mathit{DIGRAPH} : \mathbb{P}(\mathbb{P} V \times \mathbb{P} A \times (A \rightharpoonup V \times V)) \\
\forall D : \mathit{DIGRAPH};\ V' : \mathbb{P} V;\ A' : \mathbb{P} A;\ \psi : A \rightharpoonup V \times V \mid D = (V', A', \psi) \bullet \\
\qquad \operatorname{dom} \psi = A' \land (\forall a : A' \bullet \exists v_1, v_2 : V' \bullet \psi(a) = (v_1, v_2))
\end{array}
$$
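We can define the constituents of a digraph in the same way as those of a graph: for a given digraph D we denote by $V_D$ the set of its vertices, by $A_D$ the set of its arcs, and by $\psi_D$ its incidence function. The incidence matrix of a digraph is defined as

$$
\begin{array}{l}
[V, A] \\
\mathit{incd} : \mathit{DIGRAPH}[V, A] \rightarrow (V \times A \rightharpoonup \mathbb{Z}) \\
\forall D : \mathit{DIGRAPH}[V, A] \bullet \operatorname{dom}(\mathit{incd}\ D) = V_D \times A_D \land {} \\
\qquad \forall v : V_D;\ a : A_D \bullet \mathit{incd}\ D(v, a) =
\begin{cases}
1 & \text{if } v \text{ is the tail of } a \\
-1 & \text{if } v \text{ is the head of } a \\
0 & \text{otherwise}
\end{cases}
\end{array}
$$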
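The indegree $d^-_D(v)$ of a vertex v in a digraph D is the number of arcs with head v, and the outdegree $d^+_D(v)$ of v in D is the number of arcs with tail v.

$$
\begin{array}{l}
[V, A] \\
d^-, d^+ : \mathit{DIGRAPH}[V, A] \rightarrow (V \rightharpoonup \mathbb{N}) \\
\forall D : \mathit{DIGRAPH}[V, A] \bullet \operatorname{dom} d^-_D = V_D \land \operatorname{dom} d^+_D = V_D \land {} \\
\qquad \forall v : V_D \bullet d^-_D(v) = -\sum_{a \in A_D} \min\{0, \mathit{incd}\ D(v, a)\} \land d^+_D(v) = \sum_{a \in A_D} \max\{0, \mathit{incd}\ D(v, a)\}
\end{array}
$$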
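The two sums can be evaluated literally from incd. In the Python sketch below, a digraph's incidence function is assumed to be a dict `psi_d` from hypothetical arc names to `(tail, head)` pairs. The definition of incd leaves loops (tail equal to head) ambiguous; this sketch simply returns 1 for them, and the example avoids loops.

```python
def incd(psi_d, v, a):
    """incd D(v, a): 1 if v is the tail of a, -1 if v is its head, 0 otherwise.
    Loops (tail == head) are ambiguous in the definition; here they give 1."""
    tail, head = psi_d[a]
    if v == tail:
        return 1
    if v == head:
        return -1
    return 0


def indegree(psi_d, v):
    return -sum(min(0, incd(psi_d, v, a)) for a in psi_d)


def outdegree(psi_d, v):
    return sum(max(0, incd(psi_d, v, a)) for a in psi_d)


# Hypothetical digraph: a1 = (u, v), a2 = (u, w), a3 = (w, v)
psi_d = {"a1": ("u", "v"), "a2": ("u", "w"), "a3": ("w", "v")}
print({x: (indegree(psi_d, x), outdegree(psi_d, x)) for x in ("u", "v", "w")})
# prints {'u': (0, 2), 'v': (2, 0), 'w': (1, 1)}
```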
A directed walk in D is a finite non-null sequence $W = \langle v_0, a_1, v_1, \ldots, a_k, v_k \rangle$, whose terms are alternately vertices and arcs, such that for $i \in 1 \mathbin{..} k$ the arc $a_i$ has head $v_i$ and tail $v_{i-1}$. A directed trail is a directed walk that is a trail; directed paths and directed cycles are defined similarly.
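$$
\begin{array}{l}
[V, A] \\
\mathit{diwalks}, \mathit{ditrails}, \mathit{dipaths}, \mathit{dicycles} : \mathit{DIGRAPH}[V, A] \rightarrow \mathbb{P}\,\mathit{altseq}[V, A] \\
\forall D : \mathit{DIGRAPH}[V, A] \bullet \\
\quad \mathit{diwalks}\ D = \{ s : \mathit{altseq}[V_D, A_D] \mid \exists k : \mathbb{N} \bullet \#s = 2k + 1 \land {} \\
\qquad\qquad (\forall i : 1 \mathbin{..} k \bullet \psi_D(s(2i)) = (s(2i - 1), s(2i + 1))) \} \land {} \\
\quad \mathit{ditrails}\ D = \{ w : \mathit{diwalks}\ D \mid \exists k : \mathbb{N} \bullet \#w = 2k + 1 \land {} \\
\qquad\qquad (\forall i, j : 1 \mathbin{..} k \mid i \neq j \bullet w(2i) \neq w(2j)) \} \land {} \\
\quad \mathit{dipaths}\ D = \{ t : \mathit{ditrails}\ D \mid \exists k : \mathbb{N} \bullet \#t = 2k + 1 \land {} \\
\qquad\qquad (\forall i, j : 0 \mathbin{..} k \mid i \neq j \bullet t(2i + 1) \neq t(2j + 1)) \} \land {} \\
\quad \mathit{dicycles}\ D = \{ t : \mathit{ditrails}\ D \mid \exists k : \mathbb{N}_1 \bullet \#t = 2k + 1 \land t(1) = t(2k + 1) \land {} \\
\qquad\qquad (\forall i, j : 1 \mathbin{..} k \mid i \neq j \bullet t(2i + 1) \neq t(2j + 1)) \}
\end{array}
$$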
Using directed paths we can define DITREE[V, A], DIPATH[V, A] and DICYCLE[V, A] as subtypes of DIGRAPH[V, A].

$$
\begin{array}{l}
\mathit{DITREE}[V, A] \mathrel{==} \{ D : \mathit{DIGRAPH}[V, A] \mid \exists r : V_D \bullet d^-_D(r) = 0 \land {} \\
\qquad (\forall v : V_D \setminus \{r\} \bullet d^-_D(v) = 1 \land (\exists p : \mathit{dipaths}\ D \bullet p(1) = r \land p(\#p) = v)) \} \\
\mathit{DIPATH}[V, A] \mathrel{==} \{ D : \mathit{DITREE}[V, A] \mid \forall v : V_D \bullet d^+_D(v) \le 1 \} \\
\mathit{DICYCLE}[V, A] \mathrel{==} \{ D : \mathit{DIGRAPH}[V, A] \mid (\forall v : V_D \bullet d^+_D(v) = 1 \land d^-_D(v) = 1) \land {} \\
\qquad (\forall u, v : V_D \bullet \exists p : \mathit{dipaths}\ D \bullet p(1) = u \land p(\#p) = v) \}
\end{array}
$$

The vertex r in the definition of a directed tree is the root of the ditree: the directed path connecting the root to any vertex of the ditree is unique. The directed path connecting any two vertices of a dicycle is also unique.
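The degree conditions and the reachability condition of these subtypes can be checked together. The following Python sketch is one possible reading, assuming digraphs given as a vertex set plus a dict of hypothetical arc names to `(tail, head)` pairs; reachability by search stands in for the existence of a directed path (any directed walk contains a directed path between its ends).

```python
def _degrees(V, psi_d):
    din = {v: 0 for v in V}
    dout = {v: 0 for v in V}
    for tail, head in psi_d.values():
        dout[tail] += 1
        din[head] += 1
    return din, dout


def reachable(psi_d, start):
    """End vertices p(#p) of all directed paths p with p(1) = start."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for tail, head in psi_d.values():
            if tail == u and head not in seen:
                seen.add(head)
                stack.append(head)
    return seen


def ditree_root(V, psi_d):
    """The root r if D is a ditree, otherwise None."""
    din, _ = _degrees(V, psi_d)
    roots = [v for v in V if din[v] == 0]
    if len(roots) != 1:
        return None
    r = roots[0]
    if any(din[v] != 1 for v in V if v != r) or reachable(psi_d, r) != set(V):
        return None
    return r


def is_dipath(V, psi_d):
    _, dout = _degrees(V, psi_d)
    return ditree_root(V, psi_d) is not None and all(d <= 1 for d in dout.values())


def is_dicycle(V, psi_d):
    din, dout = _degrees(V, psi_d)
    if any(din[v] != 1 or dout[v] != 1 for v in V):
        return False
    return all(reachable(psi_d, v) == set(V) for v in V)
```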
With each digraph D, we can associate a graph G on the same vertex set; corresponding
to each arc of D is an edge of G with the same ends. This graph is the underlying graph
of D. (Conversely, given any graph G, we can obtain a digraph from G by specifying, for
each link, an order of its ends. Such a digraph is called an orientation of G.)
We postulate a function atoe from the set of arcs A to the set of edges E that associates with each arc an edge, namely the arc without its direction. For example, if arcs represent relationships between source and target elements, atoe associates with every arc an edge that represents the relationship between the source and the target of that arc, without distinguishing between the two. We use atoe to build the function ugraph, which associates a graph with every digraph.

$$
\begin{array}{l}
[V, A, E] \\
\mathit{atoe} : A \rightarrow E \\
\mathit{ugraph} : \mathit{DIGRAPH}[V, A] \rightarrow \mathit{GRAPH}[V, E] \\
\forall D : \mathit{DIGRAPH}[V, A] \bullet \mathit{ugraph}\ D = {} \\
\qquad (V_D, \operatorname{ran}(A_D \lhd \mathit{atoe}), \{ a : A_D \bullet (\mathit{atoe}(a), \{\mathit{first}\ \psi_D(a), \mathit{second}\ \psi_D(a)\}) \})
\end{array}
$$
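If S and T are subsets of V and D is a digraph on V, A, we denote by $(S, T)_D$ the set of arcs of $A_D$ that have their tails in S and their heads in T.

$$
\begin{array}{l}
[V, A] \\
(\_, \_)\_ : \mathbb{P} V \times \mathbb{P} V \times \mathit{DIGRAPH}[V, A] \rightarrow \mathbb{P} A \\
\forall S, T : \mathbb{P} V;\ D : \mathit{DIGRAPH}[V, A] \bullet (S, T)_D = \operatorname{dom}(\psi_D \rhd (S \times T))
\end{array}
$$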
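A network N is a digraph D (the underlying digraph of N) with two distinguished subsets of vertices, X and Y, and a non-negative integer-valued function c defined on its arc set $A_D$; the sets X and Y are assumed to be disjoint and nonempty. The vertices in X are the sources of N and those in Y are the sinks of N; they correspond to production centres and markets, respectively. Vertices that are neither sources nor sinks are called intermediate vertices; the set of such vertices will be denoted by I. The function c is the capacity function.

$$
\begin{array}{l}
[V, A] \\
\mathit{NETWORK} : \mathbb{P}(\mathit{DIGRAPH}[V, A] \times \mathbb{P}_1 V \times \mathbb{P}_1 V \times (A \rightharpoonup \mathbb{N})) \\
\forall N : \mathit{NETWORK}[V, A];\ D : \mathit{DIGRAPH}[V, A];\ X, Y : \mathbb{P}_1 V;\ c : A \rightharpoonup \mathbb{N} \mid N = (D, X, Y, c) \bullet \\
\qquad \operatorname{dom} c = A_D \land X \cap Y = \emptyset \land X \cup Y \subset V_D \land (V_D \setminus X, X)_D = \emptyset \land (Y, V_D \setminus Y)_D = \emptyset
\end{array}
$$

The last two predicates state that no arc enters X from a vertex outside X (X is the set of sources), and no arc leaves Y towards a vertex outside Y (Y is the set of sinks). We denote

$$
\begin{array}{l}
D_N \mathrel{==} \mathit{first}\ N, \quad V_N \mathrel{==} V_{D_N}, \quad A_N \mathrel{==} A_{D_N}, \quad \psi_N \mathrel{==} \psi_{D_N}, \\
X_N \mathrel{==} \mathit{second}\ N, \quad Y_N \mathrel{==} \mathit{third}\ N, \quad I_N \mathrel{==} V_N \setminus (X_N \cup Y_N), \quad c_N \mathrel{==} \mathit{fourth}\ N
\end{array}
$$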
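The arc sets $(S, T)_D$ and the network predicates can be checked on small examples. This Python sketch assumes our own encoding (vertex set `V`, a dict `psi_d` from hypothetical arc names to `(tail, head)` pairs, a capacity dict `c`); it reads X ∪ Y ⊂ V_D as ⊆, which is an interpretation rather than something the text settles.

```python
def arcs_between(S, T, psi_d):
    """(S, T)_D = dom(psi_D restricted to S x T): arcs with tail in S and head in T."""
    return {a for a, (tail, head) in psi_d.items() if tail in S and head in T}


def is_network(V, psi_d, X, Y, c):
    """Check the NETWORK predicates for (D, X, Y, c)."""
    return bool(
        set(c) == set(psi_d)
        and all(isinstance(k, int) and k >= 0 for k in c.values())
        and X and Y and not (X & Y) and (X | Y) <= V
        and not arcs_between(V - X, X, psi_d)  # no arc enters the sources
        and not arcs_between(Y, V - Y, psi_d)  # no arc leaves the sinks
    )


# Hypothetical network: source x, intermediate vertex i, sink y
V = {"x", "i", "y"}
psi_d = {"a1": ("x", "i"), "a2": ("i", "y"), "a3": ("x", "y")}
c = {"a1": 3, "a2": 2, "a3": 1}
print(is_network(V, psi_d, {"x"}, {"y"}, c))
```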
To define a flow in a network we first need some notation. If f is a real-valued function defined on the arc set $A_N$ of N and $K \subset A_N$, we denote $\sum_{a \in K} f(a)$ by f(K). Furthermore, if $S \subset V_N$, we write $f^+(S)$ for $f((S, V_N \setminus S)_{D_N})$ and $f^-(S)$ for $f((V_N \setminus S, S)_{D_N})$, and we write $f^+(v)$ for $f^+(\{v\})$ and $f^-(v)$ for $f^-(\{v\})$ [6].
A flow in a network N is an integer-valued function $f_N$ defined on $A_N$ such that

$$0 \le f_N(a) \le c(a) \text{ for all } a \in A_N, \quad \text{and} \quad f_N^+(v) = f_N^-(v) \text{ for all } v \in I_N.$$
The value $f_N(a)$ of $f_N$ on an arc a can be likened to the rate at which material is transported along a under the flow $f_N$. The upper bound $f_N(a) \le c(a)$, called the capacity constraint, states that the rate of flow along an arc cannot exceed the capacity of the arc. The other condition, called the conservation condition, requires that, for any intermediate vertex v, the rate at which material is transported into v equals the rate at which material is transported out of v.
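To write a flow formally, we have to formalise the notation given above.

$$
\begin{array}{l}
[V, A] \\
f : A \rightarrow \mathbb{R} \\
f^+, f^- : \mathit{NETWORK}[V, A] \rightarrow (V \rightharpoonup \mathbb{R}) \\
\forall N : \mathit{NETWORK}[V, A] \bullet \operatorname{dom} f^+_N = V_N \land \operatorname{dom} f^-_N = V_N \land {} \\
\qquad \forall v : V_N \bullet f^+_N(v) = \sum_{a \in (\{v\}, V_N \setminus \{v\})_{D_N}} f(a) \land f^-_N(v) = \sum_{a \in (V_N \setminus \{v\}, \{v\})_{D_N}} f(a)
\end{array}
$$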
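We use the notations $f^+_N$ and $f^-_N$ for $f^+(N)$ and $f^-(N)$, respectively. We do the same in the following definition of a flow in a network N, writing $\mathit{flow}_N$ for flow(N).

$$
\begin{array}{l}
[V, A] \\
\mathit{flow} : \mathit{NETWORK}[V, A] \rightarrow (A \rightharpoonup \mathbb{N}) \\
\forall N : \mathit{NETWORK}[V, A] \bullet \operatorname{dom} \mathit{flow}_N = A_N \land {} \\
\qquad (\forall a : A_N \bullet \mathit{flow}_N(a) \le c_N(a)) \land (\forall v : I_N \bullet \mathit{flow}^+_N(v) = \mathit{flow}^-_N(v))
\end{array}
$$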
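The capacity constraint and the conservation condition give a direct validity test for a candidate flow. The Python sketch below is an illustration under our own encoding (a dict `psi_d` of hypothetical arc names to `(tail, head)` pairs, capacities `c`, candidate flow `f`); arcs from v to v are excluded from both sums, matching the sets $(\{v\}, V_N \setminus \{v\})$ and $(V_N \setminus \{v\}, \{v\})$.

```python
def is_flow(V, psi_d, X, Y, c, f):
    """Capacity constraint 0 <= f(a) <= c(a) on every arc, and conservation
    f+(v) = f-(v) at every intermediate vertex v in I = V - X - Y."""
    if set(f) != set(psi_d):
        return False
    if any(not isinstance(f[a], int) or not 0 <= f[a] <= c[a] for a in psi_d):
        return False
    for v in V - X - Y:
        out_v = sum(f[a] for a, (t, h) in psi_d.items() if t == v and h != v)
        in_v = sum(f[a] for a, (t, h) in psi_d.items() if h == v and t != v)
        if out_v != in_v:
            return False
    return True


# Hypothetical network: source x, intermediate vertex i, sink y
V = {"x", "i", "y"}
psi_d = {"a1": ("x", "i"), "a2": ("i", "y"), "a3": ("x", "y")}
c = {"a1": 3, "a2": 2, "a3": 1}
print(is_flow(V, psi_d, {"x"}, {"y"}, c, {"a1": 2, "a2": 2, "a3": 1}))
```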
3.2 Spatial Elements
The space we deal with is the two-dimensional space ℝ². Together with the usual metric, the Euclidean distance ρ, it forms a metric space (ℝ², ρ). The metric topology generated by the usual metric on ℝ² is called the usual topology (let us denote it $\tau_\rho$); (ℝ², $\tau_\rho$) is a topological space. The spatial elements that we introduce are subsets of ℝ², and they are subspaces of ℝ² with the relative topology derived from the usual topology. (Any other topology would also be valid for all the definitions and reasoning given below.)³
Properties of space can be described as functions from ℝ² to the domain of values of the property. The property domain may contain measurement values of one of the following types: nominal (the values are qualitative rather than quantitative), ordinal, interval, or ratio [27]. Since nominal values can be represented as enumerated types, the property domain can in all cases be taken to be a subset of the real numbers. Generally we deal only with subsets of space; to make the function total, we extend the property domain with an undefined value, ⊥. A property in space can then be given as a function f : ℝ² → ℝ ∪ {⊥}, and the property domain is ran f.
A field is (generally) a continuous function from space to an interval of ℝ extended with ⊥, so the range of the function is an uncountable set. To get a discrete representation of a field, we try to discretise the range of function values, following the idea that points close in space have close function values. When ran f is a finite set, we can follow another approach. A total function f : X → Y is completely defined if we can give, for every y ∈ ran f, the set $f^{-1}(\{y\}) \subseteq X$. If the preimages of the elements of ran f are (in some sense) regular, e.g. linear features, collections of points, or areas (and the rest of the space has the value undefined), then we have the object representation of space, and we can treat such properties as functions from the property domain to a set of spatial objects. What we then need to define is what these spatial objects are and how they can be represented. This is our concern in the coming sections.
³ Definitions of topology terms, taken from [26], are given in Appendix A.1.
Figure 3.2: Extended Spatial ER Diagram
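The passage from a finite-valued property to the sets $f^{-1}(\{y\})$ can be illustrated in a few lines. This Python sketch is a simplification of our own: space is assumed to be sampled at a finite set of hypothetical grid locations (the text's f is defined on all of ℝ²), and `None` stands in for the undefined value ⊥.

```python
from collections import defaultdict

UNDEFINED = None  # stands in for the undefined value ⊥


def preimages(field):
    """Invert a finite-valued property {location: value} into {y: f^-1({y})}
    for every y in ran f, leaving out locations whose value is undefined."""
    result = defaultdict(set)
    for location, value in field.items():
        if value is not UNDEFINED:
            result[value].add(location)
    return dict(result)


# Hypothetical land-cover property sampled on four grid cells
field = {(0, 0): "forest", (0, 1): "forest", (1, 0): "water", (1, 1): UNDEFINED}
objects = preimages(field)
print(objects)
```

Each key of the result plays the role of a property value and each set of locations the role of the spatial object carrying that value.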
We give the description and formalisation of the spatial elements using concepts from topology and the graph concepts introduced above. We define zero-, one-, and two-dimensional primitives, and then, with the help of the set constructor (called grouping in some papers, or the star-vertex in the IFO model), we build new elements from these primitives, namely collections of zero-, one-, and two-dimensional primitives. Because some special collections of spatial objects are used frequently, such as paths (e.g., for highways), directed trees in hydrological models, and spatial partitions in the administrative division of a country, we need further elements for these collections. We obtain such elements by setting constraints on the collections. Then, by adding further constraints to the newly created elements, we get specialisations of these elements.
Figure 3.2 gives a spatial ER diagram (as given in [27]) extended with the elements that will be introduced in this section. In the diagram, rectangles are used for entity types, diamonds for relationship types, and ellipses for attributes; cardinality constraints are given as a pair of "min..max" values that denote the participation constraint and the cardinality ratio, respectively (the notation follows [8]); arrows show subtyping, and ⊗ is used for the set constructor. A brief summary of this diagram is given at the end of the chapter.
The spatial elements are presented as follows: the zero-dimensional primitive and collections of it are introduced in section 3.2.1; the one-dimensional primitive and the other one-dimensional elements, built up using the set constructor and the addition of constraints, are defined in section 3.2.2; the two-dimensional primitive and the two-dimensional elements (built up in the same way as the one-dimensional ones) are given in section 3.2.3. The relations between higher-dimensional primitives and lower-dimensional ones are also described.
3.2.1 Zero-dimensional elements
Zero-dimensional elements of ℝ² are Point and PointSet.
• A zero-dimensional primitive, point, is an element of ℝ². A point is a suitable representation for an object for which only the position, not the extent, is of interest [13]. The set of all points is Point == ℝ².
• PointSet is the set of finite collections of points. Thus PointSet == 𝔽 Point, that is, PointSet = 𝔽 ℝ².
3.2.2 One-dimensional elements
The one-dimensional elements are Line and LineSet, the analogues of the two zero-dimensional elements, and some others that will be defined by adding constraints to the elements of
LineSet. There are cases when we are interested in the direction of movement on a line,
and other cases when we just want the set of points that make up a line. For this reason,
we will make the distinction between a directed line and a line.
• A one-dimensional primitive, line, is a continuous non-self-intersecting curve. It can
be a looped curve (the begin and end of the curve are the same point). A line is
(in most cases) an abstraction for ways of moving through space, or for connections
through space (roads, rivers, electricity networks, gas lines, etc.) [13]. Figure 3.3.a
gives an example of a line, 3.3.b is a looped line and 3.3.c is not an accepted line.
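To give the definitions of a directed line and a line we need to define intervals in ℝ.
Figure 3.3: (a) a line, (b) a looped line, (c) not a line.

$$
\begin{array}{l}
[\_, \_],\ [\_, \_[,\ ]\_, \_[ : \mathbb{R} \times \mathbb{R} \rightharpoonup \mathbb{P}\,\mathbb{R} \\
\forall p, q : \mathbb{R} \mid p < q \bullet \\
\qquad [p, q] = \{ r : \mathbb{R} \mid p \le r \le q \} \land [p, q[ = \{ r : \mathbb{R} \mid p \le r < q \} \land ]p, q[ = \{ r : \mathbb{R} \mid p < r < q \}
\end{array}
$$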
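We now give the definitions of directed lines and lines using continuous functions on I, the closed unit interval in ℝ. A directed line is a continuous function that is injective on [0, 1[ and whose end value f(1) is not the image of any point of ]0, 1[ (it may coincide with f(0), which gives a looped line).

$$
\begin{array}{l}
C(I, \mathbb{R}^2) \mathrel{==} \{ f : [0, 1] \rightarrow \mathbb{R}^2 \mid f \text{ is continuous} \} \\
\mathit{DLine} \mathrel{==} \{ f : C(I, \mathbb{R}^2) \mid ([0, 1[ \lhd f) \in [0, 1[ \rightarrowtail \mathbb{R}^2 \land f(1) \notin f(\!|\, ]0, 1[ \,|\!) \} \\
\mathit{Line} \mathrel{==} \{ d : \mathit{DLine} \bullet \operatorname{ran} d \}
\end{array}
$$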
A directed line is a function d from the unit interval in ℝ onto a point set l ⊂ ℝ². The point set l is a line (an element of Line). The function d lifts the order of the interval [0, 1] to the subset l of ℝ². This order gives a direction of movement along the line l, which is why we call d a directed line. DLine is the set of directed lines.
The functions FNode and TNode define the begin and end node (respectively) of a directed line.
FNode : DLine " Point
TNode : DLine " Point
∀ d : DLine • FNode(d ) = d (0) ∧ TNode(d ) = d (1)
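For every line there are only two directed lines whose range is that line, each being the reverse of the other:

$$\forall l : \mathit{Line};\ d, d' : \mathit{DLine} \mid \operatorname{ran} d = l \land \operatorname{ran} d' = l \land d \neq d' \bullet (\forall \lambda : [0, 1] \bullet d'(\lambda) = d(1 - \lambda))$$

For two distinct directed lines d, d′ whose range is the line l it follows that $d(0) = d'(1) \land d(1) = d'(0)$, which implies $\{d(0), d(1)\} = \{d'(0), d'(1)\}$ (trivially, this also holds when d = d′).
Figure 3.4: (a) a collection of lines, (b) a tree, (c) a path, (d) a cycle.
Thus, we can define the set of nodes of a line l as

$$
\begin{array}{l}
\mathit{Node} : \mathit{Line} \rightarrow \mathbb{P}_1 \mathit{Point} \\
\forall l : \mathit{Line};\ d : \mathit{DLine} \mid l = \operatorname{ran} d \bullet \mathit{Node}(l) = \{d(0), d(1)\}
\end{array}
$$

The boundary of a line l is the set Node(l).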
• DLineSet is built from DLine in the same way PointSet is built from Point. Thus, DLineSet == 𝔽 DLine is the set of finite collections of directed lines.
From DLineSet we can build LineSet, whose elements are the ranges of the directed lines that are components of DLineSet elements. First, we define a function lines:

$$
\begin{array}{l}
\mathit{lines} : \mathit{DLineSet} \rightarrow \mathbb{F}\,\mathit{Line} \\
\forall \mathit{ds} : \mathit{DLineSet} \bullet \mathit{lines}(\mathit{ds}) = \{ d : \mathit{ds} \bullet \operatorname{ran} d \}
\end{array}
$$

Then we build LineSet == lines(|DLineSet|). So, LineSet = 𝔽 Line. Figure 3.4.a gives an example of a collection of lines.
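For a collection of directed lines we can find all the begin and end nodes of its components.

$$
\begin{array}{l}
\mathit{FNodes}, \mathit{TNodes} : \mathit{DLineSet} \rightarrow \mathbb{P}\,\mathit{Point} \\
\forall \mathit{ds} : \mathit{DLineSet} \bullet \mathit{FNodes}(\mathit{ds}) = \bigcup_{d \in \mathit{ds}} \{\mathit{FNode}(d)\} \land \mathit{TNodes}(\mathit{ds}) = \bigcup_{d \in \mathit{ds}} \{\mathit{TNode}(d)\}
\end{array}
$$

We can define the set of nodes of a collection of lines as the union of the node sets of its line components.

$$
\begin{array}{l}
\mathit{Nodes} : \mathit{LineSet} \rightarrow \mathbb{P}\,\mathit{Point} \\
\forall \mathit{ls} : \mathit{LineSet} \bullet \mathit{Nodes}(\mathit{ls}) = \bigcup_{l \in \mathit{ls}} \mathit{Node}(l)
\end{array}
$$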
For any collection of directed lines and its corresponding collection of lines we have
∀ ls : LineSet, ∀ ds : DLineSet | ls = lines(ds) •
Nodes(ls) = FNodes(ds) ∪ TNodes(ds)
We define the other one-dimensional elements using the graph specifications given in the previous section. We first give another representation of collections of lines, as graphs built on the sets V = Point and E = Line, and we use this representation to specify the subtypes of LineSet. Later we give a representation of collections of directed lines as directed graphs built on V = Point and A = DLine; then, as with LineSet, we define subtypes of DLineSet.
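Graphs built on the sets Point and Line are:

$$
\begin{array}{l}
\mathit{LGRAPH} : \mathbb{P}\,\mathit{GRAPH}[\mathit{Point}, \mathit{Line}] \\
\forall G : \mathit{LGRAPH} \bullet V_G = \mathit{Nodes}(E_G) \land \psi_G = E_G \lhd \mathit{Node}
\end{array}
$$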
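Then with every ls : LineSet we can associate a graph whose edges are the lines of ls, whose vertices are the nodes of ls, and whose incidence function associates with every line of ls the set of its nodes. The function graph builds this relation.

$$
\begin{array}{l}
\mathit{graph} : \mathit{LineSet} \rightarrow \mathit{LGRAPH} \\
\forall \mathit{ls} : \mathit{LineSet} \bullet \mathit{graph}(\mathit{ls}) = (\mathit{Nodes}(\mathit{ls}), \mathit{ls}, \mathit{ls} \lhd \mathit{Node})
\end{array}
$$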
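To make the construction of graph(ls) concrete, the sketch below approximates directed lines by polylines, which is our own hypothetical discretisation (the text's lines are arbitrary continuous curves), and it does not check that a polyline is non-self-intersecting. Because a directed line and its reverse have the same range, `as_line` keeps one canonical orientation to represent the Line.

```python
Point = tuple[float, float]
Polyline = tuple[Point, ...]  # hypothetical discretisation of a directed line


def as_line(d: Polyline) -> Polyline:
    """d and its reverse have the same range, so pick one canonical orientation."""
    return min(d, d[::-1])


def node(line: Polyline) -> frozenset:
    """Node(l): the set of end points; a single point for a looped line."""
    return frozenset({line[0], line[-1]})


def graph(ls: set) -> tuple:
    """graph(ls) = (Nodes(ls), ls, ls ◁ Node)."""
    psi = {line: node(line) for line in ls}
    vertices = set().union(*psi.values())
    return vertices, set(ls), psi


def deg(ls: set, p: Point) -> int:
    """Degree of node p in graph(ls); a looped line counts twice."""
    _, _, psi = graph(ls)
    return sum(1 if len(ends) == 2 else 2 for ends in psi.values() if p in ends)
```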
Using this representation of line collections we can define subtypes on them.
• Tree == {ls : LineSet | graph(ls) ∈ LGRAPH ∩ TREE [Point, Line]}
Figure 3.4.b gives an example of a tree.
• Path == {ls : LineSet | graph(ls) ∈ LGRAPH ∩ PATH [Point, Line]}
Figure 3.4.c gives a path.
• Cycle == {ls : LineSet | graph(ls) ∈ LGRAPH ∩ CYCLE [Point, Line]}
Figure 3.4.d shows a cycle.
We now define plane graphs:

$$\mathit{PLGRAPH} \mathrel{==} \{ G : \mathit{LGRAPH} \mid \forall l_i, l_j : E_G \mid l_i \neq l_j \bullet l_i \cap l_j = \mathit{Node}(l_i) \cap \mathit{Node}(l_j) \}$$
• A collection of lines that intersect each other (only) at their ends is
PGraph == {ls : LineSet | graph(ls) ∈ PLGRAPH }
Figure 3.5.a is showing a plane graph. (We are using the same name for a graph and
a collection of lines, assuming that the context will make clear which we are referring
to.)
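The PLGRAPH condition can be tested exactly when the lines are straight segments with integer coordinates, which is a restrictive assumption of this sketch and not of the text. It uses the standard orientation test for segment intersection; two segments sharing one node are allowed to touch there, but not to overlap collinearly.

```python
from itertools import combinations


def orient(p, q, r):
    """Twice the signed area of triangle pqr (0 when collinear)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def in_box(p, q, r):
    """For collinear p, q, r: does r lie on the segment pq?"""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_meet(s, t):
    (p1, p2), (p3, p4) = s, t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and in_box(p3, p4, p1)) or (d2 == 0 and in_box(p3, p4, p2))
            or (d3 == 0 and in_box(p1, p2, p3)) or (d4 == 0 and in_box(p1, p2, p4)))


def meet_only_at_shared_nodes(s, t):
    """l_i ∩ l_j = Node(l_i) ∩ Node(l_j) for two distinct straight segments."""
    shared = set(s) & set(t)
    if not shared:
        return not segments_meet(s, t)
    if len(shared) == 2:
        return False  # the same segment twice
    (p,) = shared
    a = s[0] if s[1] == p else s[1]  # far end of s
    b = t[0] if t[1] == p else t[1]  # far end of t
    if orient(p, a, b) != 0:
        return True  # not collinear: they touch only at p
    # collinear: they overlap unless they leave p in opposite directions
    return (a[0] - p[0]) * (b[0] - p[0]) + (a[1] - p[1]) * (b[1] - p[1]) < 0


def is_plane_graph(segments):
    return all(meet_only_at_shared_nodes(s, t) for s, t in combinations(segments, 2))
```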
Figure 3.5: (a) a plane graph, (b) a partition boundary, (c) a plane cycle, (d) a region boundary.
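Every node of Nodes(ls) is a vertex of the graph graph(ls), so we can talk about the degree of a node as the degree of the corresponding vertex in graph(ls).

$$
\begin{array}{l}
\mathit{deg} : \mathit{LineSet} \rightarrow (\mathit{Point} \rightharpoonup \mathbb{N}) \\
\forall \mathit{ls} : \mathit{LineSet} \bullet \operatorname{dom}(\mathit{deg}\ \mathit{ls}) = \mathit{Nodes}(\mathit{ls}) \land {} \\
\qquad \forall p : \mathit{Nodes}(\mathit{ls}) \bullet \mathit{deg}\ \mathit{ls}(p) = \mathit{degree}\ (\mathit{graph}(\mathit{ls}))(p)
\end{array}
$$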
• The spatial partition is a central concept in our perception of space. The boundary of a spatial partition is a plane graph whose edges are the border curves between two regions and whose vertices are the points where more than two regions of the partition meet [12]. Since the node of an isolated looped line has degree two, a partition boundary is a plane graph without cut edges whose only nodes of degree two are the nodes of looped lines. A plane graph ls such that no line of it is a cut edge in graph(ls), and whose only nodes of degree two are the nodes of looped lines, is:

$$
\begin{array}{l}
\mathit{PBound} \mathrel{==} \{ \mathit{ls} : \mathit{PGraph} \mid \mathit{graph}(\mathit{ls}) \text{ has no cut edges} \land {} \\
\qquad \forall p : \mathit{Nodes}(\mathit{ls}) \mid \mathit{deg}\ \mathit{ls}(p) = 2 \bullet \exists l : \mathit{ls} \bullet \mathit{Node}(l) = \{p\} \}
\end{array}
$$

Figure 3.5.b gives a partition boundary (an element of PBound).
• We will define plane cycles using cycles and plane graphs.
PCycle == Cycle ∩ PGraph
Figure 3.5.c gives a plane cycle.
A Jordan curve is a continuous non-self-intersecting curve whose origin and terminus coincide. A Jordan curve J partitions the rest of the plane into two disjoint open sets, called the interior and the exterior of J and denoted by Int J and Ext J, respectively. $U_\rho(O, 1)$ is the unit disk in ℝ², where O : Point is its centre and ρ is the Euclidean distance. To say that a subset A ⊂ ℝ² is homeomorphic to the unit disk we write $A \sim U_\rho(O, 1)$. For every Jordan curve J, $\operatorname{Int} J \sim U_\rho(O, 1)$.
The union of the lines of a plane cycle constitutes a Jordan curve:

$$
\begin{array}{l}
J : \mathit{PCycle} \rightarrow \mathit{Line} \\
\forall c : \mathit{PCycle} \bullet J(c) = \bigcup_{l \in c} l
\end{array}
$$

For any plane cycle c, J(c) is a Jordan curve and $\operatorname{Int} J(c) \sim U_\rho(O, 1)$.
Figure 3.6: (a) The hierarchy of directed line collections, (b) The hierarchy of line collections
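For a polygonal approximation of J(c), membership in Int J(c) can be decided by the classical ray-casting (crossing-number) test. The Python sketch below assumes the Jordan curve is given as a list of polygon vertices; this is a standard computational-geometry technique, not a method taken from the text. Points on the curve are reported as not interior, since J belongs to neither Int J nor Ext J.

```python
def orient(p, q, r):
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def on_curve(pt, polygon):
    """Is pt on the closed polygonal curve through the given vertices?"""
    n = len(polygon)
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]
        if (orient(p, q, pt) == 0
                and min(p[0], q[0]) <= pt[0] <= max(p[0], q[0])
                and min(p[1], q[1]) <= pt[1] <= max(p[1], q[1])):
            return True
    return False


def in_interior(pt, polygon):
    """Membership in Int J by ray casting: count crossings of a horizontal ray."""
    if on_curve(pt, polygon):
        return False  # the curve itself belongs to neither Int J nor Ext J
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```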
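• Another subtype of plane graphs is RegBound, which we need for the two-dimensional element region; the boundary of a region is of type RegBound.

$$
\begin{array}{l}
\mathit{RegBound} \mathrel{==} \{ \mathit{cs} : \mathbb{P}\,\mathit{PCycle};\ \mathit{ls} : \mathit{PGraph} \mid \mathit{ls} = \bigcup_{c \in \mathit{cs}} c \land {} \\
\qquad (\exists c_0 : \mathit{cs} \bullet \forall c : \mathit{cs} \setminus \{c_0\} \bullet \operatorname{Int} J(c) \subset \operatorname{Int} J(c_0) \land {} \\
\qquad\qquad (\forall c' : \mathit{cs} \setminus \{c_0, c\} \bullet \operatorname{Int} J(c) \cap \operatorname{Int} J(c') = \emptyset)) \land {} \\
\qquad \#\mathit{SCycles}(\mathit{graph}(\mathit{ls})) = \#\mathit{cs} \bullet \mathit{ls} \}
\end{array}
$$

($c_0$ is the outer cycle.) The constraints placed on a region boundary are justified when the two-dimensional primitive region is introduced.
Figure 3.5.d gives an example of a region boundary.
From the hierarchy of graph elements we can derive a hierarchy on one-dimensional elements. Figure 3.6.b gives the hierarchy of line collection types.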
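Directed graphs on the sets Point and DLine are:

$$
\begin{array}{l}
\mathit{DiLGRAPH} : \mathbb{P}\,\mathit{DIGRAPH}[\mathit{Point}, \mathit{DLine}] \\
\forall D : \mathit{DiLGRAPH} \bullet V_D = \mathit{FNodes}(A_D) \cup \mathit{TNodes}(A_D) \land {} \\
\qquad \psi_D = \{ a : A_D \bullet (a, (\mathit{FNode}(a), \mathit{TNode}(a))) \}
\end{array}
$$

Figure 3.7: The relation between one-dimensional elements and graphs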
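For every ds : DLineSet we can build a digraph whose arcs are the directed lines of ds, whose vertices are the begin and end nodes of the directed lines of ds, and whose incidence function associates with every d ∈ ds the begin and end node of d.

$$
\begin{array}{l}
\mathit{digraph} : \mathit{DLineSet} \rightarrow \mathit{DiLGRAPH} \\
\forall \mathit{ds} : \mathit{DLineSet} \bullet \mathit{digraph}(\mathit{ds}) = (\mathit{FNodes}(\mathit{ds}) \cup \mathit{TNodes}(\mathit{ds}), \mathit{ds}, \psi_{\mathit{ds}}) \land {} \\
\qquad \forall d : \mathit{ds} \bullet \psi_{\mathit{ds}}(d) = (\mathit{FNode}(d), \mathit{TNode}(d))
\end{array}
$$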
Subtypes of DLineSet can be defined using subtypes of directed graphs.
• DTree == {ds : DLineSet | digraph(ds) ∈ DiLGRAPH ∩ DITREE[Point, DLine]}
• DPath == {ds : DLineSet | digraph(ds) ∈ DiLGRAPH ∩ DIPATH[Point, DLine]}
• DCycle == {ds : DLineSet | digraph(ds) ∈ DiLGRAPH ∩ DICYCLE[Point, DLine]}
• A collection of directed lines whose ranges intersect each other only at their ends is
DPGraph == {ds : DLineSet | digraph(ds) ∈ DiLGRAPH ∧ ∀ d, d′ : ds; a, b : [0, 1] | d ≠ d′ ∧ d(a) = d′(b) • {a, b} ⊆ {0, 1}}
These subtypes of DLineSet are connected to the subtypes of LineSet by the function lines: Tree, Path, Cycle and PGraph are the relational images under lines of DTree, DPath, DCycle and DPGraph, respectively; e.g., Tree = lines(|DTree|). The hierarchy of directed line types is analogous to the hierarchy of line types; the hierarchy of the types introduced here is given in figure 3.6.a.
Figure 3.7 gives the relationship between collections of lines, collections of directed lines,
graphs, and digraphs.
3.2.3 Two-dimensional elements
Elements that will be introduced here are regions, finite collections of regions, (quasi-)
disjoint regions, and spatial partitions.
A plane graph G partitions the rest of the plane into a number of connected regions; the
closures of these regions are called the faces of G. Each plane graph has exactly one
unbounded face, called the exterior face. The definition that we will give for a region and
a spatial partition is the formalisation of this statement.
• A face is the two-dimensional primitive that we call region. A region is the abstraction
of an object for which the position and the extent are relevant [13]. A region is a
regular closed set (i.e., it is a set without isolated points or lines, dangling lines, cuts
or punctures) of which the interior is a connected set. The set given in figure 3.8.a is
not a region because it contains a cut and a dangling line, 3.8.b and 3.8.c are regions
(3.8.c is an example of an exterior face that is an unbounded region), but 3.8.d and
3.8.e are not regions because the interior of the sets is not connected.
Figure 3.8: (a) not a region, (b), (c) regions, (d), (e) not regions
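To give the definition of a region we use sets homeomorphic to the unit disk in ℝ² (we omit the distance ρ from its notation); $\overline{U}(O, 1)$ denotes the closed unit disk. The frontier of a set A is denoted by ∂A, its interior by A°, and its closure by $\overline{A}$. The frontier of ℝ² is the empty set. We use the notation β(A) for the set of lines that constitutes the frontier of A, and we call it the boundary of A.

$$
\begin{array}{l}
\mathit{Region} \mathrel{==} \{ r : \mathbb{P}\,\mathit{Point};\ D : \operatorname{iseq}_1(\mathbb{P}\,\mathit{Point}) \mid (D(1) \sim \overline{U}(O, 1) \lor D(1) = \mathit{Point}) \land {} \\
\qquad (\forall j : 2 \mathbin{..} \#D \bullet D(j) \subset D(1) \land D(j) \sim U(O, 1)) \land {} \\
\qquad (\forall j, k : 2 \mathbin{..} \#D \mid j \neq k \bullet D(j) \cap D(k) = \emptyset) \land {} \\
\qquad (\forall j, k : \operatorname{dom} D \mid j \neq k \bullet \partial D(j) \cap \partial D(k) = \mathit{Nodes}(\beta(D(j))) \cap \mathit{Nodes}(\beta(D(k)))) \land {} \\
\qquad \#\mathit{SCycles}(\mathit{graph}(\beta(r))) = \#D \land r = D(1) \setminus \bigcup_{j \in 2 \mathbin{..} \#D} D(j) \bullet r \}
\end{array}
$$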
D(1) is the whole ℝ², or a set homeomorphic to the closed unit disk. The last predicate states that a region r is a set D(1) with (possibly) a finite number of holes D(j) in it. When D(1) = ℝ² the region is unbounded; otherwise it is bounded.
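For polygonal boundaries, the last predicate yields a simple membership test for r = D(1) minus the union of the holes. The Python sketch below is illustrative and rests on our own assumptions: D(1) is either the whole plane (`outer=None`) or the closed set bounded by a polygon, each hole D(j) is the open interior of a polygon, and interiors are tested by ray casting. Points on a hole's boundary belong to r, because a region is a closed set.

```python
def orient(p, q, r):
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def on_curve(pt, polygon):
    n = len(polygon)
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]
        if (orient(p, q, pt) == 0
                and min(p[0], q[0]) <= pt[0] <= max(p[0], q[0])
                and min(p[1], q[1]) <= pt[1] <= max(p[1], q[1])):
            return True
    return False


def in_interior(pt, polygon):
    """Open interior of the polygon, by ray casting."""
    if on_curve(pt, polygon):
        return False
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def in_region(pt, outer, holes):
    """r = D(1) minus the union of the holes D(2), ..., D(n).
    outer is None when D(1) is the whole plane."""
    in_d1 = outer is None or on_curve(pt, outer) or in_interior(pt, outer)
    return in_d1 and not any(in_interior(pt, h) for h in holes)
```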
The unit disk in ℝ² is a regular open set. The sets D(j) (j : dom D \ {1}) are homeomorphic to the unit disk and are bounded by Jordan curves (their boundaries are plane cycles); since the interior of a Jordan curve is a regular open set, they are regular open sets as well. The union of two regular open sets is not always a regular open set. It is true that:

$$A = \overline{A}^{\circ} \land B = \overline{B}^{\circ} \land x \in \overline{A \cup B}^{\circ} \setminus (A \cup B) \Rightarrow x \in \partial A \cap \partial B.$$

If ∂A ∩ ∂B is a finite set of points, then the union A ∪ B is a regular open set. By mathematical induction this can be proved for the union of a finite number of regular open sets. Thus the union of the D(j)'s (j : dom D \ {1}) is a regular open set, because we require their frontiers to meet in a finite set of points.
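The implication stated above for regular open sets A and B can be verified directly. A sketch of the argument, which uses only $\overline{A \cup B} = \overline{A} \cup \overline{B}$ and $\partial B = \overline{B} \setminus B$ for an open set B, is:

$$
\begin{aligned}
&\text{Let } x \in \overline{A \cup B}^{\circ} \setminus (A \cup B) \text{ and suppose } x \notin \overline{B}. \\
&\text{Then some open } N \ni x \text{ satisfies } N \subseteq \overline{A \cup B} = \overline{A} \cup \overline{B} \text{ and } N \cap \overline{B} = \emptyset, \\
&\text{so } N \subseteq \overline{A}, \text{ i.e. } x \in \overline{A}^{\circ} = A, \text{ contradicting } x \notin A \cup B. \\
&\text{Hence } x \in \overline{B} \setminus B = \partial B, \text{ and by symmetry } x \in \partial A.
\end{aligned}
$$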
A closed unit disk is a regular closed set, and ℝ² is regular closed; this implies that D(1) is regular closed. A region r is the difference of D(1) and the union of the D(j)'s (j : dom D \ {1}), that is, the intersection of D(1) with the complement of that union. The complement of a regular open set is a regular closed set, so a region r is the intersection of two regular closed sets, which is not always a regular closed set. It is true that:

$$E = \overline{E^{\circ}} \land F = \overline{F^{\circ}} \land x \in (E \cap F) \setminus \overline{(E \cap F)^{\circ}} \Rightarrow x \in \partial E \cap \partial F.$$

If ∂E ∩ ∂F is a finite set of points and E ∩ F has no isolated points, then the intersection of E and F is a regular closed set. We require the intersection of ∂D(1) with any other ∂D(j) to be a finite set of points; then the intersection of the frontier of D(1) with the frontier of the union of the other D(j)'s is again a finite set of points, because the frontier of a union is a subset of the union of the frontiers, a finite union of finite sets is finite, and any subset of a finite set is finite. The union of the D(j)'s (j : dom D \ {1}) is a subset of D(1), and no D(j) (j : dom D) has isolated points, so the difference has no isolated points. This makes r a regular closed set.
The boundary of a region r is the union of the boundaries of the D(j)-sets (j : dom D) that build it up: $\beta(r) = \bigcup_{j \in \operatorname{dom} D} \beta(D(j))$. We also want the interior of a region r to be a connected set.
β(r) has at least as many cycles as there are D sets forming the region r. Any additional cycles in graph(β(r)) would define further open sets, disconnected from the rest of the region. This is why we require the number of cycles in graph(β(r)) to equal the number of D sets that form r. This also implies that the frontiers of any two D sets meet in at most one point, which is an assertion made about polygons (a subtype of region) in [7].
A Jordan curve uniquely defines its interior, a point set homeomorphic to the unit disk in $\mathbb{R}^2$. Thus a set D(j) is defined by its boundary β(D(j)), and a region is defined by the boundaries of the D(j) sets that make it up; this boundary β(r) is of type RegBound.
The boundary of a set D homeomorphic to the unit disk is of type PCycle, so β(r) is the union of one or more plane cycles. We require the meeting points of the frontiers of the sets D(j) that make up r to be nodes of the cycles forming their boundaries; hence β(r) : PGraph. $\mathit{Int}_J(D(1))$ includes the interiors of the other cycles, and the predicate on the number of cycles in β(r) is also satisfied. Together these make β(r) : RegBound.
Figure 3.9: — (a) a collection of regions, (b) quasi-disjoint regions, (c) spatial partition
An alternative definition of a region starts from its boundary: the region is the closure of the interior of the outer cycle minus the interiors of the other cycles.
• RegionSet is built from Region in the same way as PointSet and LineSet are built from Point and Line, respectively. Thus $\mathit{RegionSet} == \mathbb{F}\,\mathit{Region}$, and its elements are finite collections of regions. Figure 3.9.a shows a collection of three regions: one pair overlaps, one pair touches, and the remaining pair is disjoint.
• (Quasi-)disjoint regions form a collection of regions whose interiors are pairwise disjoint. They intersect each other only in a finite (possibly empty) set of points from their frontiers.
$$
\begin{aligned}
\mathit{DisjointRegs} == \{\, rs : \mathit{RegionSet} \mid {} & \forall r, r' : rs \bullet r \neq r' \Rightarrow (r^\circ \cap r'^\circ = \emptyset \land {} \\
& r \cap r' = \mathit{Nodes}(\beta(r)) \cap \mathit{Nodes}(\beta(r'))) \,\}
\end{aligned}
$$
The last predicate ensures that the meeting points of regions are nodes of the line sets constituting their boundaries. Figure 3.9.b gives an example of quasi-disjoint regions.
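• A spatial partition is a subdivision of the plane into regions whose interiors are disjoint. These regions share (parts of) their frontiers with each other.
$$
\begin{aligned}
\mathit{SpPartition} == \{\, rs : \mathit{RegionSet} \mid {} & (\forall r, r' : rs \mid r \neq r' \bullet r^\circ \cap r'^\circ = \emptyset \land {} \\
& r \cap r' = \textstyle\bigcup_{l \in \beta(r) \cap \beta(r')} l \cup \bigcup_{p \in \mathit{Nodes}(\beta(r)) \cap \mathit{Nodes}(\beta(r'))} \{p\}) \land {} \\
& \textstyle\bigcup_{r \in rs} r = \mathbb{R}^2 \,\}
\end{aligned}
$$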
The last predicate states that the union of the regions of a spatial partition is the whole plane. Every spatial partition has exactly one exterior (unbounded) region. In a spatial partition, two regions intersect only in their common boundary. The predicate on the intersection of two regions forces a boundary line to be split at the point where it touches the frontier of another region, or at the ends of a common part.
Figure 3.10: Two-dimensional elements hierarchy
Figure 3.9.c gives an example of a spatial partition: the exterior region is shown in grey and the other regions in white. Figure 3.9.d shows two regions A and B that are part of a spatial partition. Their boundaries are β(A) = {a, f, g, d, i} and β(B) = {a, b, c, d, e}, with two lines, a and d, in common. The intersection of their frontiers is ∂A ∩ ∂B = a ∪ d ∪ {3}, i.e., two lines and one node. This is not a typical configuration, but it explains the expression used in the second predicate of the definition of spatial partitions.
The boundary of a spatial partition is
$$\mathit{Bound} : \mathit{SpPartition} \rightarrow \mathit{PGraph}$$
$$\forall S : \mathit{SpPartition} \bullet \mathit{Bound}(S) = \textstyle\bigcup_{r \in S} \beta(r)$$
A pair of lines in Bound(S) falls into one of two cases. In the first case, both lines are in the boundary of one region, and they can intersect each other only at their nodes. In the second case, the lines are in the boundaries of two different regions. They can again intersect only at their nodes, because the frontiers of two regions intersect either at the nodes of their boundary lines or in lines shared by both boundaries. The latter is excluded here because the two lines are different. Thus two lines in Bound(S) can intersect only at their ends, i.e., Bound(S) : PGraph. Every line in Bound(S) is in the boundary of a region of S, so it cannot be a cut edge in graph(Bound(S)). If the only nodes with degree two are the nodes of looped lines, then Bound(S) : PBound.
The hierarchy of two-dimensional elements is given in figure 3.10.
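The region definition $r = D(1) \setminus \bigcup_{j \in 2..\#D} D(j)$ suggests a direct point-membership test. The sketch below is an illustration only. It assumes that each $D(j)$ is approximated by a simple polygon, given as a vertex list with integer or `Fraction` coordinates; the definition itself allows any set homeomorphic to the unit disk. The helper names are hypothetical. A point belongs to a bounded region if it lies in the closed outer polygon and not in the open interior of any hole. Hole frontiers therefore belong to the region, in line with r being regular closed.

```python
from fractions import Fraction

def on_segment(p, a, b):
    """True if point p lies on the closed segment from a to b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross == 0
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def classify(p, ring):
    """Classify p against a simple polygon: 'boundary', 'inside' or 'outside'.

    Uses even-odd ray casting with a horizontal ray towards +x.
    """
    inside = False
    for i in range(len(ring)):
        a, b = ring[i], ring[(i + 1) % len(ring)]
        if on_segment(p, a, b):
            return "boundary"
        if (a[1] > p[1]) != (b[1] > p[1]):  # edge straddles the ray's height
            x_cross = a[0] + Fraction(p[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if x_cross > p[0]:
                inside = not inside
    return "inside" if inside else "outside"

def in_region(p, outer, holes):
    """p in r = D(1) minus the union of the open holes D(j), j >= 2."""
    if classify(p, outer) == "outside":
        return False
    return all(classify(p, hole) != "inside" for hole in holes)
```

Take `outer = [(0, 0), (4, 0), (4, 4), (0, 4)]` with one hole `[(1, 1), (3, 1), (3, 3), (1, 3)]`. The point (2, 2) is rejected, while (1, 2), which lies on the hole's frontier, is accepted.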
Given a plane graph G, one can define another graph G ∗ as follows: corresponding to each
face f of G there is a vertex f ∗ of G ∗ , and corresponding to each edge e of G there is an
edge e ∗ of G ∗ ; two vertices f ∗ and g ∗ are joined by the edge e ∗ in G ∗ if and only if their
corresponding faces f and g are separated by the edge e. The graph G ∗ is called the dual
of G. The dual of a plane graph is a planar graph [6].
The dual of the graph of a spatial partition boundary represents the adjacency relationship between regions: each edge of the dual graph stands for the adjacency of two region-vertices. This turns region-adjacency problems into graph problems, for which efficient algorithms exist.
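Here is a minimal sketch of this idea. It assumes the partition boundary is available as a table from each boundary line to the pair of regions on its two sides, the role played by Adj below. The function name `dual_graph` and this input format are assumptions for illustration. Each boundary line becomes one dual edge between two region-vertices, and adjacency queries become set lookups.

```python
from collections import defaultdict

def dual_graph(adj):
    """Build the dual of a partition boundary.

    adj maps each boundary line l to the pair (r1, r2) of regions whose
    boundaries both contain l. Returns the region-vertices, the dual
    edges (one per boundary line) and an adjacency-set view.
    """
    vertices = set()
    edges = {}
    neighbours = defaultdict(set)
    for line, (r1, r2) in adj.items():
        vertices.update((r1, r2))
        edges[line] = frozenset((r1, r2))
        if r1 != r2:  # a line of a proper partition boundary separates two regions
            neighbours[r1].add(r2)
            neighbours[r2].add(r1)
    return vertices, edges, dict(neighbours)
```

Two boundary lines between the same pair of regions, like a and d in Figure 3.9.d, give two parallel dual edges but only one adjacency.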
Two regions of a spatial partition S are adjacent if they share a line l ∈ Bound(S). A line from Bound(S) is in the frontier of exactly two regions r, r' ∈ S, that is,
$$\forall S : \mathit{SpPartition} \bullet \forall l : \mathit{Bound}(S) \bullet \exists r : S \bullet \exists_1 r' : S \bullet (l \in \beta(r) \cap \beta(r') \land \forall r'' : S \setminus \{r, r'\} \bullet l \cap \beta(r'') = \emptyset)$$
With every line in the boundary of a spatial partition we can associate its two adjacent regions.
Adj == {S : SpPartition, ls : PGraph, r : Region, l : Line |
ls = Bound (S ) ∧ l ∈ ls ∧ r ∈ S • (l , {l : β(r ) • r })}
Adj is a partial function, and ADJ == ran Adj is the adjacency relationship in Region.
Using this relationship we define dual graphs, but only for a subset of plane graphs.
Dual : PGraph ⇸ PGRAPH[Region, Adj]
∀ ls : PGraph, ∀ S : SpPartition | ls = Bound (S ) •
Dual (ls) = (S , ls Adj , (ls Adj ) second )
The function Dual is defined on $\operatorname{dom} \mathit{Dual} = \mathit{Bound}(\!|\mathit{SpPartition}|\!)$. The boundary of a spatial partition forms a plane graph (with no cut edges). Its dual is a graph whose vertices are the regions of the spatial partition and whose edges are the adjacency relationships between these regions. Thus, for every S : SpPartition, the dual of G = graph(Bound(S)) is G* = Dual(Bound(S)).
3.3 Summary
The diagram of Figure 3.2 is a (more or less) complete diagram of the types presented here and their relationships. Some types, or some parts of this diagram, will be used in the conceptual schemas of spatial applications to represent the spatial objects relevant to each application. The diagram brings together in a single schema:
• the relationships between zero-, one- and two-dimensional primitives and their collections;
• the hierarchies on one- and two-dimensional collections (given in Figures 3.6 and 3.10);
• the relationships between higher- and lower-dimensional primitives: Node, DLine and Region.
The diagram also gives the relationship between directed lines and lines. Because the hierarchy on DLineSet is analogous to the hierarchy on LineSet, it is not presented in this diagram. The relationships BEGIN OF and END OF between DLine and Node are realised by the functions FNode and TNode, respectively. The relationship RANGE between DLine and Line is realised by ran.
The zero-dimensional types introduced here are points and finite collections of points: Point and PointSet, respectively.
Linear features in the plane are lines and directed lines, the elements of the types Line and DLine. A line is a point set, and a directed line is a point set with an order defined on it. The types LineSet and DLineSet describe collections of lines and of directed lines, respectively.
A collection of lines connected to each other in a linear fashion is a path (an element of Path). When branching is allowed in such collections, trees are obtained, of type Tree. If a linear collection is closed, i.e., the beginning of the first element coincides with the end of the last element, a Cycle is obtained. The analogous collections of directed lines, in which all components follow the same direction, are DPath, DTree and DCycle.
Collections of lines whose components intersect each other only at their ends are of type PGraph. Cycles that satisfy this condition are elements of PCycle. Analogously to PGraph, DPGraph is defined as a subtype of DLineSet.
The areal feature is the region, a regular closed point set, possibly with holes. Collections of regions are of type RegionSet; the regions of a region set may overlap. Collections of quasi-disjoint regions form the type DisjointRegs. A collection of regions that partitions the plane is of type SpPartition. To describe the boundary of a region, the type RegBound is introduced among the linear features; RegBound is a subtype of PGraph. Consider a spatial partition in which the frontier of every region is split only at the end points of a part shared with another region, or at a point where it touches another region. The collection of boundary lines of all its regions is then of type PBound. PBound is also a subtype of PGraph.
Chapter 4
Logical Model
In this chapter we refine the abstract specification of spatial elements given in Chapter 3 into concrete specifications. These can serve as a basis for defining the structure of new data types. The chapter consists of three parts:
• Section 4.1 defines concrete schemas of graphs.
• Section 4.2 continues the refinement of the abstract specifications of spatial elements, whose schemas can serve for the definition of (some) spatial data types.
• Section 4.3 deals with the possibilities these spatial data types offer to better express and control the relationships between spatial entities in a spatial database.
We assume that our data types will be implemented in an extensible RDBMS whose data model is NF². That is, it allows set and composite types, supports subtyping (specialisation and generalisation), and offers data structures such as lists and trees. We use database terminology to describe concrete schemas: tables and columns (informal terms), or tuples and attributes (formal ones).
4.1 Graph Design Schemas
We begin by refining the abstract schema of graphs. Based on this schema, we then give concrete schemas for the other graph features: trees, paths and cycles. We do the same for directed graphs and their subtypes: ditrees, dipaths and dicycles. In some cases the refinement takes more than one step; the refinement of every specification schema given in Section 3.1 is a one- or two-step process. We use the following naming convention for refined schemas:
• each intermediate schema has the name of its abstract schema preceded by e;
• each final design schema has the name of its abstract schema preceded by c.
We give the retrieve schemas (the relation between the abstract and concrete schemas) and the proofs of correct refinement in only a few cases.
A graph can be represented as a set of free vertices (that are not incident with any edge)
and a table that holds the incidence function (a column for the edges and a column for the
set of vertices incident with the edges).
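[V, E]
$$\mathit{cGRAPH} : \mathbb{P}(\mathbb{P}\,V \times (E \pfun \mathbb{P}_1 V))$$
$$\forall G : \mathit{cGRAPH} \mid G = (W, t) \bullet W \cap \textstyle\bigcup_{s \in \operatorname{ran} t} s = \emptyset \land \forall e : \operatorname{dom} t \bullet \#t(e) \leq 2$$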
cgraph relates the abstract specification of graphs, GRAPH[V, E], to the concrete specification cGRAPH[V, E].
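[V, E]
$$\mathit{cgraph} : \mathit{GRAPH}[V, E] \rightarrow \mathit{cGRAPH}[V, E]$$
$$\forall G : \mathit{GRAPH}[V, E], \forall cG : \mathit{cGRAPH}[V, E] \mid cG = \mathit{cgraph}(G) \bullet \mathit{first}\ cG = V_G \setminus \textstyle\bigcup_{e \in E_G} \psi_G(e) \land \mathit{second}\ cG = \psi_G$$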
To show that cgraph is a total function we must prove two things. First, (G, cG₁) ∈ cgraph and (G, cG₂) ∈ cgraph imply cG₁ = cG₂. Second, for every G : GRAPH[V, E] there exists cG : cGRAPH[V, E] such that (G, cG) ∈ cgraph. An element of cGRAPH[V, E] is an ordered pair, so cG₁ = (W₁, t₁) and cG₂ = (W₂, t₂), and they are equal if their coordinates are equal. This follows directly from the definition of cgraph, since both coordinates are determined by G. The same definition shows that every G : GRAPH[V, E] is related by cgraph to an element of cGRAPH[V, E].
For every cG = (W, t) of type cGRAPH[V, E], $G = (W \cup \bigcup_{s \in \operatorname{ran} t} s, \operatorname{dom} t, t)$ is of type GRAPH[V, E] and cG = cgraph G. This makes cgraph a surjection.
Now let G₁ = (V₁, E₁, ψ₁) and G₂ = (V₂, E₂, ψ₂) be of type GRAPH[V, E], and let cG₁ = (W₁, t₁) and cG₂ = (W₂, t₂) be their images under cgraph. If G₁ ≠ G₂, they differ in at least one coordinate. If E₁ ≠ E₂ or ψ₁ ≠ ψ₂, then t₁ ≠ t₂, which implies cG₁ ≠ cG₂. Otherwise V₁ ≠ V₂. Since ψ₁ = ψ₂, the sets of non-free vertices of G₁ and G₂ are equal, which implies W₁ ≠ W₂. Thus cgraph is also injective.
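The retrieve function and the preimage used in the surjectivity argument can be sketched as follows. V and E are represented as Python sets, and ψ (and t) as a dict from edges to frozensets of one or two vertices. The function names `cgraph` and `cgraph_inverse` are hypothetical. The round trip on a single example mirrors the bijection argument; it is not a proof.

```python
def cgraph(vertices, edges, psi):
    """Map an abstract graph (V, E, psi) to its concrete form (W, t).

    W is the set of free vertices (incident with no edge) and t is psi,
    the incidence table from edges to sets of one or two vertices.
    """
    if set(psi) != set(edges) or any(not 1 <= len(s) <= 2 for s in psi.values()):
        raise ValueError("psi must map every edge to one or two vertices")
    used = set().union(*psi.values())
    return frozenset(vertices) - used, dict(psi)

def cgraph_inverse(free, t):
    """Rebuild (W ∪ ⋃ ran t, dom t, t), the preimage used in the surjectivity proof."""
    used = set().union(*t.values())
    return frozenset(free) | used, frozenset(t), dict(t)
```

For V = {1, 2, 3, 4} with an edge between 1 and 2 and a loop at 2, the free vertices are {3, 4}. Applying `cgraph_inverse` to the result recovers (V, E, ψ).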
To write the degree of a vertex in a graph we will need a function that converts from
boolean values to integer values:
BtoI == {(false, 0), (true, 1)}
For a graph G = (W, t) we can define the incidence matrix cInc G as a partial function from the Cartesian product of vertices and edges to the natural numbers, such that for any vertex v and any edge e:
$$
\mathit{cInc}\ G(v, e) = \begin{cases} 0 & \text{if } v \notin t(e) \\ 1 & \text{if } v \in t(e) \text{ and } \#t(e) = 2 \\ 2 & \text{if } v \in t(e) \text{ and } \#t(e) = 1 \end{cases}
$$
The function cInc G(v, e) can also be written
$$
\mathit{cInc}\ G(v, e) = \begin{cases} 0 & \text{if } v \notin t(e) \\ 3 - \#t(e) & \text{if } v \in t(e) \end{cases}
$$
or $\mathit{cInc}\ G(v, e) = (3 - \#t(e)) \cdot \mathit{BtoI}(v \in t(e))$.
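The complete definition of the incidence matrix (its concrete schema) is:
[V, E]
$$\mathit{cInc} : \mathit{cGRAPH}[V, E] \rightarrow ((V \times E) \pfun \mathbb{N})$$
$$
\begin{aligned}
\forall G : \mathit{cGRAPH}[V, E] \mid G = (W, t) \bullet {} & \operatorname{dom} \mathit{cInc}\ G = (\textstyle\bigcup_{s \in \operatorname{ran} t} s) \times \operatorname{dom} t \land {} \\
& \forall v : \textstyle\bigcup_{s \in \operatorname{ran} t} s, \forall e : \operatorname{dom} t \bullet \mathit{cInc}\ G(v, e) = (3 - \#t(e)) \cdot \mathit{BtoI}(v \in t(e))
\end{aligned}
$$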
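Then, for a cGRAPH, we can define the degree of each of its vertices:
[V, E]
$$\mathit{cDeg} : \mathit{cGRAPH}[V, E] \rightarrow (V \pfun \mathbb{N})$$
$$
\begin{aligned}
\forall G : \mathit{cGRAPH}[V, E] \mid G = (W, t) \bullet {} & \operatorname{dom} \mathit{cDeg}\ G = W \cup \textstyle\bigcup_{s \in \operatorname{ran} t} s \land {} \\
& \forall v : W \bullet \mathit{cDeg}\ G(v) = 0 \land {} \\
& \forall v : \textstyle\bigcup_{s \in \operatorname{ran} t} s \bullet \mathit{cDeg}\ G(v) = \textstyle\sum_{e \in \operatorname{dom} t} (3 - \#t(e)) \cdot \mathit{BtoI}(v \in t(e))
\end{aligned}
$$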
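The handshaking property of graphs, that the sum of the vertex degrees is twice the number of edges, is automatically satisfied. Summing cDeg G over all vertices counts each edge e once for each of its #t(e) vertices, with weight 3 − #t(e), and #t(e) · (3 − #t(e)) = 2 for #t(e) ∈ {1, 2}.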
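The incidence matrix and degree function translate almost literally into code. The sketch below is a minimal illustration; the names `btoi`, `c_inc` and `c_deg` are hypothetical. The incidence table t is a dict from edges to frozensets of one or two vertices, and the free vertices are passed separately, as W in the concrete schema. The example checks the handshaking property on a graph with a loop.

```python
def btoi(b):
    """BtoI == {(false, 0), (true, 1)}."""
    return 1 if b else 0

def c_inc(t):
    """cInc G(v, e) = (3 - #t(e)) * BtoI(v in t(e)) for every non-free v and edge e."""
    vertices = set().union(*t.values())
    return {(v, e): (3 - len(t[e])) * btoi(v in t[e]) for v in vertices for e in t}

def c_deg(free, t):
    """cDeg G: 0 for free vertices, otherwise the row sum of the incidence matrix."""
    degree = {v: 0 for v in free}
    for (v, _edge), entry in c_inc(t).items():
        degree[v] = degree.get(v, 0) + entry
    return degree
```

Because a loop e has #t(e) = 1, its single vertex receives 2 from that edge, so the degree sum stays at twice the number of edges.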
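The function cwalks defines walks in a cGRAPH[V, E]:
[V, E]
$$\mathit{cwalks} : \mathit{cGRAPH}[V, E] \rightarrow \mathbb{P}\,\mathit{altseq}[V, E]$$
$$
\begin{aligned}
\forall G : \mathit{cGRAPH}[V, E] \mid G = (W, t) \bullet \mathit{cwalks}\ G = \{\, & s : \mathit{altseq}[\textstyle\bigcup_{s \in \operatorname{ran} t} s, \operatorname{dom} t] \mid \exists k : \mathbb{N} \bullet \#s = 2k + 1 \land {} \\
& \forall i : 1 \mathbin{..} k \bullet t(s(2i)) = \{s(2i - 1), s(2i + 1)\} \,\}
\end{aligned}
$$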
ctrails, cpaths, ccycles are defined from cwalks in the same way trails, paths, cycles are
defined from walks.
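A membership test for cwalks G is a direct transcription of the schema. The sketch below assumes the same dict representation of t as in the examples above; `is_cwalk` is a hypothetical name. The only subtlety is the shift from the 1-based indices of the text to Python's 0-based indices.

```python
def is_cwalk(s, t):
    """Test s ∈ cwalks G for G = (W, t).

    s alternates vertex, edge, vertex, ... and has odd length 2k + 1; with
    1-based indices the text requires t(s(2i)) = {s(2i - 1), s(2i + 1)}.
    In 0-based Python indexing the edge s(2i) is s[j] with j = 2i - 1 odd.
    """
    non_free = set().union(*t.values())
    if len(s) % 2 == 0:
        return False
    if any(v not in non_free for v in s[0::2]) or any(e not in t for e in s[1::2]):
        return False
    return all(t[s[j]] == frozenset({s[j - 1], s[j + 1]}) for j in range(1, len(s), 2))
```

A loop e with t(e) = {v} is traversed as v, e, v, since {v, v} = {v}.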
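The function subgs builds the subgraphs of a given graph cGRAPH[V, E]:
[V, E]
$$\mathit{subgs} : \mathit{cGRAPH}[V, E] \rightarrow \mathbb{P}\,\mathit{cGRAPH}[V, E]$$
$$\forall G : \mathit{cGRAPH}[V, E] \bullet \mathit{subgs}\ G = \{ H : \mathit{cGRAPH}[V, E] \mid \mathit{first}\ H \subseteq \mathit{first}\ G \land \mathit{second}\ H \subseteq \mathit{second}\ G \}$$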
Subgraphs of a graph whose vertices and edges form paths or cycles can be defined analogously from subgs, cpaths, and ccycles.
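The concrete schema of connected graphs, eCGRAPH[V, E], is:
$$
\begin{aligned}
\mathit{eCGRAPH}[V, E] == \{\, G : \mathit{cGRAPH}[V, E] \mid {} & \mathit{first}\ G = \emptyset \land {} \\
& \forall u, v : \textstyle\bigcup_{s \in \operatorname{ran}\, \mathit{second}\ G} s \bullet \exists p : \mathit{cpaths}\ G \bullet p(1) = u \land p(\#p) = v \,\}
\end{aligned}
$$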
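A tree is a connected graph (which here means that it has no free vertices) whose number of edges is one less than its number of vertices. The refined TREE[V, E] is:
[V, E]
$$\mathit{cTREE} : \mathbb{P}(E \pfun \mathbb{P}_1 V)$$
$$
\begin{aligned}
\forall T : \mathit{cTREE}, \forall G : \mathit{cGRAPH}[V, E] \mid G = (\emptyset, T) \bullet {} & \#\textstyle\bigcup_{s \in \operatorname{ran} T} s = \#T + 1 \land {} \\
& \forall u, v : \textstyle\bigcup_{s \in \operatorname{ran} T} s \bullet (\exists p : \mathit{cpaths}\ G \bullet p(1) = u \land p(\#p) = v)
\end{aligned}
$$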
T is a subset of $E \times \mathbb{P}_1 V$ (from the definition of partial functions). Because T is a function, its cardinality equals #dom T, the number of tree edges. The first predicate uses this equality to express the relation between the number of edges and the number of vertices in a tree. The second predicate ensures connectivity.
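The two cTREE predicates can be checked mechanically. This sketch assumes the dict representation of T used above and tests the existence of paths by breadth-first search, which is a choice of this illustration rather than part of the schema; `is_ctree` is a hypothetical name. A connected graph with exactly one vertex more than it has edges is a tree, so the vertex count rejects cycles and loops.

```python
from collections import deque

def is_ctree(t):
    """Check the cTREE predicates for G = (∅, T), where t maps each edge to
    its set of one or two vertices: #⋃ ran T = #T + 1, and every pair of
    vertices is joined by a path (tested here by breadth-first search)."""
    vertices = set().union(*t.values())
    if len(vertices) != len(t) + 1:
        return False
    adjacency = {v: set() for v in vertices}
    for ends in t.values():
        for u in ends:
            adjacency[u] |= ends - {u}
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adjacency[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == vertices
```

The counting predicate alone is not enough. A disconnected graph with a loop, e.g. edges {1, 2}, {3, 4} and a loop at 3, also has four vertices and three edges, and only the connectivity test rejects it.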
Graphs and subgraphs whose table elements are terms of a path are ePATH[V, E], and graphs whose table elements are terms of a cycle are eCYCLE[V, E].
$$\mathit{ePATH}[V, E] == \{ G : \mathit{cGRAPH}[V, E], p : \mathit{cpaths}\ G \mid \exists k : \mathbb{N} \bullet \#p = 2k + 1 \bullet (\emptyset, \{ i : 1 \mathbin{..} k \bullet (p(2i), \{p(2i - 1), p(2i + 1)\}) \}) \}$$
$$\mathit{eCYCLE}[V, E] == \{ G : \mathit{cGRAPH}[V, E], c : \mathit{ccycles}\ G \mid \exists k : \mathbb{N} \bullet \#c = 2k + 1 \bullet (\emptyset, \{ i : 1 \mathbin{..} k \bullet (c(2i), \{c(2i - 1), c(2i + 1)\}) \}) \}$$
Paths and cycles can be refined as sequences of $E \times \mathbb{P}_1 V$ that satisfy certain conditions. cPATH[V, E] is such a representation of paths.
[V, E]
$$\mathit{cPATH} : \mathbb{P}\,\operatorname{seq}(E \times \mathbb{P}_1 V)$$
$$
\begin{aligned}
\forall P : \mathit{cPATH} \bullet {} & \operatorname{ran} P \in E \pfun \mathbb{P}_1 V \land \forall i : \operatorname{dom} P \bullet \#\mathit{second}\ P(i) = 2 \land {} \\
& \forall i : 1 \mathbin{..} \#P - 1 \bullet \mathit{second}\ P(i) \cap \mathit{second}\ P(i + 1) \neq \emptyset \land {} \\
& \forall i, j : \operatorname{dom} P \mid i \neq j \land |i - j| \neq 1 \bullet \mathit{second}\ P(i) \cap \mathit{second}\ P(j) = \emptyset
\end{aligned}
$$
The first predicate ensures that each edge is associated with a unique set of vertices. The second predicate states that exactly two vertices are associated with each edge. The third predicate states that the edges of two consecutive sequence members share vertices, which ensures connectedness. The last predicate states that this holds only for consecutive sequence members, which prevents cycles.
The statement ∃ G : cGRAPH[V, E] • G = (∅, ran P) is equivalent to the first and second predicates. The last predicate can be replaced with the following statement:
$$\#\textstyle\bigcup_{i \in \operatorname{dom} P} \mathit{second}\ P(i) = \#P + 1.$$
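The four cPATH predicates can be checked directly on a Python list of (edge, vertex-set) pairs. This is a sketch under that representation assumption; `is_cpath` is a hypothetical name. It implements the predicates as stated in the schema, with j quantified over dom P and required to differ from i.

```python
def is_cpath(p):
    """Check the four cPATH predicates on a list of (edge, frozenset-of-vertices) pairs."""
    ends = {}
    for edge, vs in p:
        if ends.setdefault(edge, vs) != vs:  # ran P must be a function E ⇸ P1 V
            return False
    if any(len(vs) != 2 for _, vs in p):  # exactly two vertices per edge
        return False
    n = len(p)
    for i in range(n):
        for j in range(n):
            share = bool(p[i][1] & p[j][1])
            if abs(i - j) == 1 and not share:  # consecutive edges must meet
                return False
            if i != j and abs(i - j) != 1 and share:  # non-consecutive edges must not
                return False
    return True
```

Consider a star whose three edges all contain vertex 2. It is rejected by the last predicate, because its first and third edges share that vertex.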
cCYCLE [V , E ] is a representation of cycles.
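[V, E]
$$\mathit{cCYCLE} : \mathbb{P}\,\operatorname{seq}(E \times \mathbb{P}_1 V)$$
$$
\begin{aligned}
\forall C : \mathit{cCYCLE} \bullet {} & \operatorname{ran} C \in E \pfun \mathbb{P}_1 V \land \forall i : \operatorname{dom} C \bullet \#\mathit{second}\ C(i) \leq 2 \land {} \\
& \mathit{second}\ C(1) \cap \mathit{second}\ C(\#C) \neq \emptyset \land {} \\
& \forall i : 1 \mathbin{..} \#C - 1 \bullet \mathit{second}\ C(i) \cap \mathit{second}\ C(i + 1) \neq \emptyset \land {} \\
& \forall i, j : \operatorname{dom} C \mid i \neq j \land |i - j| \neq 1 \land |i - j| \neq \#C - 1 \bullet \mathit{second}\ C(i) \cap \mathit{second}\ C(j) = \emptyset
\end{aligned}
$$
The first and second predicates ensure that every edge is associated with at most two vertices (loops are allowed). The third predicate states that the edge of the first sequence member shares vertices with the edge of the last sequence member. The fourth predicate states that the edge of each sequence member shares vertices with the edge of the next sequence member. The last predicate states that only the edges of consecutive members, or of the first and the last member, share vertices.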
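The first, second and last predicates can be replaced with the following:
$$\exists G : \mathit{cGRAPH}[V, E] \bullet G = (\emptyset, \operatorname{ran} C) \land \forall v : \textstyle\bigcup_{i \in \operatorname{dom} C} \mathit{second}\ C(i) \bullet \mathit{cDeg}\ G(v) = 2$$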
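The degree form of the cycle condition is convenient to implement. This sketch assumes the list-of-pairs representation used for paths; `is_ccycle` is a hypothetical name. It keeps the third and fourth predicates (consecutive members, wrapping from the last to the first, share a vertex). It then requires every vertex of (∅, ran C) to have degree 2, counting a loop twice as cDeg does. Treating the empty sequence as not a cycle is an assumption of the sketch, since second C(1) is undefined for it.

```python
def is_ccycle(c):
    """Check a list of (edge, vertex-set) pairs against cCYCLE using the
    degree form: each member shares a vertex with the next one (the last with
    the first), and every vertex has degree 2 in the graph (∅, ran C)."""
    if not c:
        return False  # second C(1) is undefined for the empty sequence
    t = {}
    for edge, ends in c:
        if t.setdefault(edge, ends) != ends or not 1 <= len(ends) <= 2:
            return False  # ran C must be a function E ⇸ P1 V with #t(e) <= 2
    n = len(c)
    if any(not (c[i][1] & c[(i + 1) % n][1]) for i in range(n)):
        return False
    degree = {}
    for ends in t.values():
        for v in ends:
            degree[v] = degree.get(v, 0) + (3 - len(ends))  # a loop counts twice
    return all(d == 2 for d in degree.values())
```

A single loop is accepted as a one-edge cycle, in line with the remark that loops are allowed.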
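An efficient digraph representation consists of a set of free vertices and a table with three columns: one for the arcs, one for the tails of the arcs, and one for their heads.
[V, A]
$$\mathit{cDIGRAPH} : \mathbb{P}(\mathbb{P}\,V \times \mathbb{P}(A \times V \times V))$$
$$\forall D : \mathit{cDIGRAPH} \mid D = (W, T) \bullet W \cap (\mathit{second}\ T \cup \mathit{third}\ T) = \emptyset \land \forall t, t' : T \mid t \neq t' \bullet \mathit{first}\ t \neq \mathit{first}\ t'$$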
The last predicate makes T a function from the set of arcs to the Cartesian product of the vertex set with itself. The only difference is that such a function would have type $\mathbb{P}(A \times (V \times V))$. From now on we refer to an element of the first column of T as an arc of D, to an element of the second column as a tail, and to an element of the third column as a head. Heads, tails and free vertices are together called the vertices of D.
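The relation between the abstract specification of digraphs and their concrete representation is given by:
[V, A]
$$\mathit{cdigraph} : \mathit{DIGRAPH}[V, A] \rightarrow \mathit{cDIGRAPH}[V, A]$$
$$
\begin{aligned}
\forall D : \mathit{DIGRAPH}[V, A], \forall cD : \mathit{cDIGRAPH}[V, A] \mid cD = \mathit{cdigraph}(D) \bullet {} & \mathit{first}\ cD = V_D \setminus (\mathit{first}\ \operatorname{ran} \psi_D \cup \mathit{second}\ \operatorname{ran} \psi_D) \land {} \\
& \mathit{second}\ cD = \{ a : A_D \bullet (a, \mathit{first}\ \psi_D(a), \mathit{second}\ \psi_D(a)) \}
\end{aligned}
$$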
As for cgraph, it can be proven that cdigraph is a bijection.
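For D = (W, T) of type cDIGRAPH[V, A], its incidence matrix is:
$$
\forall r : T, \forall (v, a) : \operatorname{dom} \mathit{cIncd}\ D \mid a = \mathit{first}\ r \bullet \mathit{cIncd}\ D(v, a) = \begin{cases} 1 & \text{if } v = \mathit{second}\ r \text{ and } v \neq \mathit{third}\ r \\ -1 & \text{if } v \neq \mathit{second}\ r \text{ and } v = \mathit{third}\ r \\ 0 & \text{otherwise} \end{cases}
$$
The function cIncd D can be written
$$\mathit{cIncd}\ D(v, a) = \mathit{BtoI}(v = \mathit{second}\ r) - \mathit{BtoI}(v = \mathit{third}\ r).$$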
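The (complete) concrete schema of the incidence matrix is:
[V, A]
$$\mathit{cIncd} : \mathit{cDIGRAPH}[V, A] \rightarrow ((V \times A) \pfun \mathbb{Z})$$
$$
\begin{aligned}
\forall D : \mathit{cDIGRAPH}[V, A] \mid D = (W, T) \bullet {} & \operatorname{dom} \mathit{cIncd}\ D = (\mathit{second}\ T \cup \mathit{third}\ T) \times \mathit{first}\ T \land {} \\
& \forall r : T, \forall v : \mathit{second}\ T \cup \mathit{third}\ T, \forall a : \mathit{first}\ T \mid a = \mathit{first}\ r \bullet {} \\
& \quad \mathit{cIncd}\ D(v, a) = \mathit{BtoI}(v = \mathit{second}\ r) - \mathit{BtoI}(v = \mathit{third}\ r)
\end{aligned}
$$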
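The functions cd⁻ D and cd⁺ D are the concrete representations of the functions d⁻ D and d⁺ D, which compute the indegree and outdegree of vertices in a digraph.
[V, A]
$$\mathit{cd}^- : \mathit{cDIGRAPH}[V, A] \rightarrow (V \pfun \mathbb{N})$$
$$\mathit{cd}^+ : \mathit{cDIGRAPH}[V, A] \rightarrow (V \pfun \mathbb{N})$$
$$
\begin{aligned}
\forall D : \mathit{cDIGRAPH}[V, A] \mid D = (W, T) \bullet {} & \operatorname{dom} \mathit{cd}^- D = W \cup \mathit{second}\ T \cup \mathit{third}\ T \land \operatorname{dom} \mathit{cd}^+ D = \operatorname{dom} \mathit{cd}^- D \land {} \\
& (\forall v : W \bullet \mathit{cd}^- D(v) = 0 \land \mathit{cd}^+ D(v) = 0) \land {} \\
& \forall v : \mathit{second}\ T \cup \mathit{third}\ T \bullet {} \\
& \quad \mathit{cd}^- D(v) = \textstyle\sum_{r \in T} \max\{0, \mathit{BtoI}(v = \mathit{third}\ r) - \mathit{BtoI}(v = \mathit{second}\ r)\} \land {} \\
& \quad \mathit{cd}^+ D(v) = \textstyle\sum_{r \in T} \max\{0, \mathit{BtoI}(v = \mathit{second}\ r) - \mathit{BtoI}(v = \mathit{third}\ r)\}
\end{aligned}
$$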
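The digraph incidence matrix and the two degree functions can be sketched over a set of (arc, tail, head) rows, mirroring the table T. The names `c_incd` and `c_degrees` are hypothetical. The sketch follows the formulas above exactly. In particular, both max{0, ...} terms vanish for a loop, so under these definitions a loop adds to neither the indegree nor the outdegree of its vertex.

```python
def btoi(b):
    """BtoI == {(false, 0), (true, 1)}."""
    return 1 if b else 0

def c_incd(table):
    """cIncd D(v, a) = BtoI(v = tail) - BtoI(v = head) over (tails ∪ heads) × arcs."""
    vertices = {tail for _a, tail, _h in table} | {head for _a, _t, head in table}
    return {(v, a): btoi(v == tail) - btoi(v == head)
            for (a, tail, head) in table for v in vertices}

def c_degrees(free, table):
    """Indegree and outdegree following the cd- and cd+ sums in the text."""
    vertices = set(free) | {r[1] for r in table} | {r[2] for r in table}
    indeg = dict.fromkeys(vertices, 0)
    outdeg = dict.fromkeys(vertices, 0)
    for _arc, tail, head in table:
        if tail != head:  # both max{0, ...} terms are 0 for a loop
            outdeg[tail] += 1
            indeg[head] += 1
    return indeg, outdeg
```

For example, the table {("a", 1, 2), ("b", 2, 3), ("c", 2, 2)} with free vertex 4 gives vertex 2 an indegree and an outdegree of 1. The loop "c" is ignored.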
The second and third columns of a digraph table hold, respectively, the tails and heads of the arcs in the first column. A vertex can appear several times in the second column, meaning that it is the tail of several arcs. It can also appear several times in the third column, meaning that it is the head of several arcs. Both columns are therefore bags of vertices. The multiplicity of an element in the second column is the number of arcs of which it is the tail. The multiplicity of an element in the third column is the number of arcs of which it is the head. The following functions Heads and Tails build bags on V whose multiplicities are exactly these numbers.
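[V, A]
$$\mathit{Heads} : \mathbb{P}(A \times V \times V) \pfun \operatorname{bag} V$$
$$\mathit{Tails} : \mathbb{P}(A \times V \times V) \pfun \operatorname{bag} V$$
$$
\begin{aligned}
& \operatorname{dom} \mathit{Tails} = \{ T : \mathbb{P}(A \times V \times V) \mid (\forall r, r' : T \mid r \neq r' \bullet \mathit{first}\ r \neq \mathit{first}\ r') \} \land {} \\
& \operatorname{dom} \mathit{Heads} = \operatorname{dom} \mathit{Tails} \land {} \\
& \forall T : \operatorname{dom} \mathit{Tails} \bullet \mathit{Tails}\ T = [\![ r : T \bullet \mathit{second}\ r ]\!] \land \mathit{Heads}\ T = [\![ r : T \bullet \mathit{third}\ r ]\!]
\end{aligned}
$$
The semantics of the bag expression is¹:
$$
\begin{aligned}
\forall T : \operatorname{dom} \mathit{Tails} \bullet {} & \forall v : \operatorname{dom}(\mathit{Tails}\ T) \bullet \mathit{Tails}\ T \sharp v = \textstyle\sum_{r \in T} \mathit{BtoI}(v = \mathit{second}\ r) \land {} \\
& \forall v : \operatorname{dom}(\mathit{Heads}\ T) \bullet \mathit{Heads}\ T \sharp v = \textstyle\sum_{r \in T} \mathit{BtoI}(v = \mathit{third}\ r)
\end{aligned}
$$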
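Python's `collections.Counter` is a natural stand-in for a Z bag: the count of a missing key is 0, and only vertices with a positive multiplicity are stored, just as dom (Tails T) contains only the vertices that actually occur. The sketch below is illustrative; the names `tails`, `heads` and `_check_arcs` are hypothetical. It rejects tables outside dom Tails, i.e., tables in which an arc name occurs in two rows.

```python
from collections import Counter

def _check_arcs(table):
    """Membership of T in dom Tails: no arc name appears in two different rows."""
    arcs = [arc for arc, _tail, _head in table]
    if len(arcs) != len(set(arcs)):
        raise ValueError("T is not in dom Tails: an arc occurs in two rows")

def tails(table):
    """Tails T as a Counter: tails(T)[v] is the number of arcs whose tail is v."""
    _check_arcs(table)
    return Counter(tail for _arc, tail, _head in table)

def heads(table):
    """Heads T as a Counter: heads(T)[v] is the number of arcs whose head is v."""
    _check_arcs(table)
    return Counter(head for _arc, _tail, head in table)
```

With these bags, the vertex count #(dom Tails D ∪ dom Heads D) used in the dipath and dicycle schemas becomes `len(set(tails(T)) | set(heads(T)))`.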
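The concrete schema of directed walks (in a digraph) is:
[V, A]
$$\mathit{cdws} : \mathit{cDIGRAPH}[V, A] \rightarrow \mathbb{P}\,\mathit{altseq}[V, A]$$
$$
\begin{aligned}
\forall D : \mathit{cDIGRAPH}[V, A] \mid D = (W, T) \bullet \mathit{cdws}\ D = \{\, & s : \mathit{altseq}[\mathit{second}\ T \cup \mathit{third}\ T, \mathit{first}\ T] \mid \exists k : \mathbb{N} \bullet \#s = 2k + 1 \land {} \\
& \{ i : 1 \mathbin{..} k \bullet (s(2i), s(2i - 1), s(2i + 1)) \} \subseteq T \,\}
\end{aligned}
$$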
Functions cdts, cdps, and cdcs build ditrails, dipaths and dicycles in a digraph, and they
are defined from cdws in the same way their abstract equivalents ditrails, dipaths and
dicycles are defined from diwalks.
A directed tree can be defined in a recursive way using free types in Z.
Similarly to computer representations (concrete schemas) of paths and cycles we will define
computer representations of dipaths and dicycles.
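[V, A]
$$\mathit{cDIPATH} : \mathbb{P}\,\operatorname{seq}(A \times V \times V)$$
$$
\begin{aligned}
\forall D : \mathit{cDIPATH} \bullet {} & \forall i, j : \operatorname{dom} D \mid i \neq j \bullet \mathit{first}\ D(i) \neq \mathit{first}\ D(j) \land {} \\
& \forall i : 1 \mathbin{..} \#D - 1 \bullet \mathit{third}\ D(i) = \mathit{second}\ D(i + 1) \land {} \\
& \#(\operatorname{dom} \mathit{Tails}\ D \cup \operatorname{dom} \mathit{Heads}\ D) - 1 = \#D
\end{aligned}
$$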
From the type definition of cDIPATH it follows that there are no free vertices in dipaths. The first predicate ensures that each arc is associated with only one pair of vertices. The second predicate states that the head of a sequence member is the tail of the next member, which ensures the connectivity of D in a linear fashion. The last predicate states that the number of vertices is one more than the number of arcs in D. Together with connectivity, this predicate ensures the acyclicity of D.
¹ We will not write the type parameters of a function when it is clear from the context which parameters are meant; e.g., we will write Tails T instead of Tails[V, A] T.
[V, A]
$$\mathit{cDICYCLE} : \mathbb{P}\,\operatorname{seq}(A \times V \times V)$$
$$
\begin{aligned}
\forall D : \mathit{cDICYCLE} \bullet {} & \forall i, j : \operatorname{dom} D \mid i \neq j \bullet \mathit{first}\ D(i) \neq \mathit{first}\ D(j) \land {} \\
& \forall i : 1 \mathbin{..} \#D - 1 \bullet \mathit{third}\ D(i) = \mathit{second}\ D(i + 1) \land {} \\
& \mathit{second}\ D(1) = \mathit{third}\ D(\#D) \land \#(\operatorname{dom} \mathit{Tails}\ D \cup \operatorname{dom} \mathit{Heads}\ D) = \#D
\end{aligned}
$$
As for dipaths, it follows from the cDICYCLE type that there are no free vertices in dicycles. The first and second predicates are the same as the first and second predicates in the definition of computer dipaths. The third predicate closes the dicycle: the tail of the first sequence member is the head of the last one. The last predicate states that the number of vertices equals the number of arcs. It ensures that only consecutive arcs, or the first and the last arc, are adjacent.
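Both dipath and dicycle schemas reduce to a few checks over a list of (arc, tail, head) triples, mirroring the rows of a cDIGRAPH table. This is a hedged sketch; the function names are hypothetical. The vertex count #(dom Tails D ∪ dom Heads D) is computed as the number of distinct tails and heads. Treating the empty sequence as not a dicycle is an assumption of the sketch, since second D(1) is undefined for it.

```python
def _distinct_and_chained(d):
    """First two predicates: arc names are distinct and the head of D(i) is the tail of D(i+1)."""
    arcs = [a for a, _, _ in d]
    return len(arcs) == len(set(arcs)) and all(d[i][2] == d[i + 1][1] for i in range(len(d) - 1))

def _vertex_count(d):
    """#(dom Tails D ∪ dom Heads D): the number of distinct tails and heads."""
    return len({t for _, t, _ in d} | {h for _, _, h in d})

def is_cdipath(d):
    """cDIPATH: chained arcs with exactly one more vertex than arcs."""
    return _distinct_and_chained(d) and _vertex_count(d) - 1 == len(d)

def is_cdicycle(d):
    """cDICYCLE: chained arcs, closed (tail of first = head of last), as many vertices as arcs."""
    return (len(d) > 0 and _distinct_and_chained(d)
            and d[0][1] == d[-1][2] and _vertex_count(d) == len(d))
```

A chained sequence that revisits a vertex, such as 1 → 2 → 1 → 3, has fewer distinct vertices than the dipath count requires and is therefore rejected.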
4.2 Spatial Data Types
The spatial elements specified formally in Chapter 3 will be implemented as abstract data types. (An abstract data type generally consists of a structure definition together with a definition of operators that are exclusively applicable to this type [25].) What we give here is a basis for building the data structures of these types. The other important part of an abstract data type, the operators, can be defined later, based on the structure of the data type.
With the formal specification of spatial elements in place, we can proceed with their refinement. As for the abstract specifications of spatial elements, the design schemas of graph types will be quite useful. We will not dwell on the implementation schemas of points and lines, and will pass over them with a few words about their existing implementations. We concentrate instead on the types defined for collections of lines and collections of directed lines, and say a few words about the implementation of two-dimensional elements.
In presenting the concrete representations of spatial elements, we follow the order in which their abstract schemas were given. Assuming that there is a finite representation of real numbers in Z, which we call Q, we have:
$$\mathit{cPoint} == Q \times Q$$
and
$$\mathit{cPointSet} == \mathbb{F}\,\mathit{cPoint}$$
An implementation of directed lines consists of a sequence of points, $\mathit{cDLine} == \operatorname{seq}_1 \mathit{cPoint}$, and a function that defines the interpolation between these points. The curves obtained by interpolating between consecutive points must not intersect each other, except that consecutive curves meet in their shared point of the sequence.
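Under linear interpolation, the simplest case discussed next in the text, the condition on a cDLine becomes a polyline simplicity test. This sketch assumes exact integer or `Fraction` coordinates; all function names are hypothetical. It uses orientation signs to test closed segments for intersection. Consecutive segments may share only their common point, so a collinear fold-back is rejected. Looped lines, whose first and last points coincide, would need an extra exception for the first and last segments and are not handled here.

```python
def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def on_seg(p, a, b):
    """True if p lies on the closed segment ab."""
    return (orient(a, b, p) == 0
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def segments_meet(a, b, c, d):
    """True if the closed segments ab and cd have at least one common point."""
    if orient(a, b, c) * orient(a, b, d) < 0 and orient(c, d, a) * orient(c, d, b) < 0:
        return True  # proper crossing
    return on_seg(c, a, b) or on_seg(d, a, b) or on_seg(a, c, d) or on_seg(b, c, d)

def is_simple_dline(points):
    """Check the cDLine condition under linear interpolation: consecutive
    segments share only their common point, other segments are disjoint.
    Looped lines (first point equal to last) are not handled here."""
    segs = list(zip(points, points[1:]))
    for i in range(len(segs)):
        for j in range(i + 1, len(segs)):
            (a, b), (c, d) = segs[i], segs[j]
            if j == i + 1:
                # here b == c; reject a collinear fold-back that overlaps beyond b
                if on_seg(a, c, d) or on_seg(d, a, b):
                    return False
            elif segments_meet(a, b, c, d):
                return False
    return True
```

The pairwise loop is quadratic in the number of segments. A sweep-line algorithm would be the usual choice for long lines, but the simple form keeps the correspondence with the definition visible.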
Lines are defined as the sets of points obtained from the interpolation on directed lines.
cLine is the spatial type for lines and it is derived from the elements of cDLine, i.e. its
elements are not stored but calculated from the stored directed lines. The simplest (and most commonly used) case is linear interpolation. If linear interpolation is the only interpolation method used, and writing $d(i) = (x_i, y_i)$, lines are:
$$\mathit{cLine} == \{ d : \mathit{cDLine} \bullet (d, \textstyle\bigcup_{i \in 1 \mathbin{..} \#d - 1} \{ \lambda : [0, 1] \bullet (x_i \cdot (1 - \lambda) + x_{i+1} \cdot \lambda,\ y_i \cdot (1 - \lambda) + y_{i+1} \cdot \lambda) \}) \}$$
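A point p belongs to cLine(d) exactly when some segment i and some λ ∈ [0, 1] reproduce it through the formula above. The following sketch returns such a witness (i, λ), with i 1-based as in the text. It assumes exact integer or `Fraction` coordinates; floating-point input would need a tolerance. `locate_on_line` is a hypothetical name. As in the formula, a one-point sequence yields an empty union and therefore no points.

```python
from fractions import Fraction

def locate_on_line(p, d):
    """Return (i, lam) with p = (x_i(1-lam) + x_{i+1} lam, y_i(1-lam) + y_{i+1} lam)
    for some segment i (1-based) and lam in [0, 1], or None if p is not in cLine(d).
    Coordinates must be ints or Fractions so that the arithmetic is exact."""
    for i in range(len(d) - 1):
        (x0, y0), (x1, y1) = d[i], d[i + 1]
        dx, dy = x1 - x0, y1 - y0
        if dx * (p[1] - y0) - dy * (p[0] - x0) != 0:
            continue  # p is not on the line through this segment
        if dx == 0 and dy == 0:  # degenerate segment: a single point
            if (p[0], p[1]) == (x0, y0):
                return i + 1, Fraction(0)
            continue
        lam = Fraction(p[0] - x0, dx) if dx != 0 else Fraction(p[1] - y0, dy)
        if 0 <= lam <= 1:
            return i + 1, lam
    return None
```

Because segments are scanned in order, an interior vertex of d is reported with λ = 1 on the earlier segment rather than λ = 0 on the later one.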
The begin and end nodes of a directed line are:
cFNode == {d : cDLine, p : cPoint | p = d (1) • (d , p)}
cTNode == {d : cDLine, p : cPoint | p = d (#d ) • (d , p)}
The function cNode associates every (directed) line with its ends:
cNode == {d : cDLine • (d , {cFNode(d ), cTNode(d )})}.
Before proceeding with the concrete schemas for collections of lines, we make a general remark about all the collection types: collections of points, lines, directed lines and regions, and their subtypes. Their implementation is based on a structure that stores references to the components that constitute the collection. The components themselves, i.e., their geometry, are stored elsewhere. The data structure of an abstract type for a collection should allow fast checks of (mostly topological) relationships between collection components. For the collection-of-lines types, the references point to the directed lines that define the line components, because lines themselves are not stored.
Collections of lines are:
$$\mathit{cLineSet} == \{ L : \mathit{cDLine} \pfun \mathbb{P}_1\,\mathit{cPoint} \mid L = \operatorname{dom} L \dres \mathit{cNode} \}.$$
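The function cDg calculates the degree of every node of a collection of lines.
$$\mathit{cDg} : \mathit{cLineSet} \rightarrow (\mathit{cPoint} \pfun \mathbb{N})$$
$$
\begin{aligned}
\forall L : \mathit{cLineSet} \bullet {} & \operatorname{dom} \mathit{cDg}\ L = \textstyle\bigcup_{l \in \operatorname{dom} L} L(l) \land {} \\
& \forall p : \textstyle\bigcup_{l \in \operatorname{dom} L} L(l) \bullet \mathit{cDg}\ L(p) = \textstyle\sum_{l \in \operatorname{dom} L} (3 - \#L(l)) \cdot \mathit{BtoI}(p \in L(l))
\end{aligned}
$$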
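Node degrees of a collection of lines follow the same pattern as cDeg for graphs, with each directed line contributing its end points. The sketch below is an illustration only. It assumes each directed line is given by name as a list of coordinate tuples, and derives the map L from the end points as cNode does. The names `c_node` and `c_dg` are hypothetical.

```python
def c_node(points):
    """cNode for one directed line: the set {cFNode(d), cTNode(d)} of its end points."""
    return frozenset({tuple(points[0]), tuple(points[-1])})

def c_dg(lines):
    """cDg L(p) = Σ_l (3 - #L(l)) * BtoI(p ∈ L(l)); a looped line adds 2 to its node."""
    node_table = {name: c_node(pts) for name, pts in lines.items()}  # the map L
    degree = {}
    for ends in node_table.values():
        for p in ends:
            degree[p] = degree.get(p, 0) + (3 - len(ends))
    return degree
```

In the test data, line c is looped at (3, 0), so that node has degree 1 + 2 = 3. Intermediate vertices of the polylines, such as (1, 1), are not nodes and do not appear.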
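The refined Tree is eTree, a collection of lines that has a tree shape. Its definition uses the properties of the refined (graph) tree. (It could be implemented with a tree structure.)
$$
\begin{aligned}
\mathit{eTree} == \{\, L : \mathit{cLineSet}, G : \mathit{cGRAPH}[\mathit{cPoint}, \mathit{cLine}] \mid {} & G = (\emptyset, L) \land {} \\
& \forall u, v : \textstyle\bigcup_{l \in \operatorname{dom} L} L(l) \bullet (\exists p : \mathit{cpaths}\ G \bullet p(1) = u \land p(\#p) = v) \land {} \\
& \#L + 1 = \#\textstyle\bigcup_{l \in \operatorname{dom} L} L(l) \bullet L \,\}
\end{aligned}
$$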
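The concrete representation of collections of lines that form a path is cPath. It is the refined Path, and its definition makes use of the properties of cPATH[V, E].
$$\mathit{cPath} : \mathbb{P}\,\operatorname{seq}(\mathit{cDLine} \times \mathbb{P}_1\,\mathit{cPoint})$$
$$
\begin{aligned}
\forall P : \mathit{cPath} \bullet {} & \operatorname{ran} P \in \mathit{cLineSet} \land {} \\
& \forall i : 1 \mathbin{..} \#P - 1 \bullet \mathit{second}\ P(i) \cap \mathit{second}\ P(i + 1) \neq \emptyset \land {} \\
& \#P + 1 = \#\textstyle\bigcup_{i \in \operatorname{dom} P} \mathit{second}\ P(i)
\end{aligned}
$$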
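The concrete representation of collections of lines that form a cycle is:
$$\mathit{cCycle} : \mathbb{P}\,\operatorname{seq}(\mathit{cDLine} \times \mathbb{P}_1\,\mathit{cPoint})$$
$$
\begin{aligned}
\forall C : \mathit{cCycle} \bullet {} & \operatorname{ran} C \in \mathit{cLineSet} \land {} \\
& \mathit{second}\ C(1) \cap \mathit{second}\ C(\#C) \neq \emptyset \land {} \\
& \forall i : 1 \mathbin{..} \#C - 1 \bullet \mathit{second}\ C(i) \cap \mathit{second}\ C(i + 1) \neq \emptyset \land {} \\
& \#C = \#\textstyle\bigcup_{i \in \operatorname{dom} C} \mathit{second}\ C(i)
\end{aligned}
$$
The last predicate in the cCycle definition can be replaced by:
$$\forall p : \textstyle\bigcup_{i \in \operatorname{dom} C} \mathit{second}\ C(i) \bullet \mathit{cDg}\ C(p) = 2$$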
The concrete representation of collections of lines that intersect only at their nodes is:
$$\mathit{cPGraph} == \{ L : \mathit{cLineSet} \mid \forall d, d' : \operatorname{dom} L \mid d \neq d' \bullet \mathit{cLine}(d) \cap \mathit{cLine}(d') = L(d) \cap L(d') \}$$
A concrete schema cRegBound for region boundaries will include functions involving geometric calculations. These functions determine whether the cycles that constitute a cRegBound lie inside or outside each other.
The concrete schema of a partition boundary will define a spatial type cPBound with two properties. First, the only nodes with degree two are the nodes of looped lines (in the collection of lines that constitute the partition boundary). Second, no line has the same region on both of its sides.
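The set cDLineSet is the concrete representation of collections of directed lines.
$$\mathit{cDLineSet} : \mathbb{P}(\mathbb{P}(\mathit{cDLine} \times \mathit{cPoint} \times \mathit{cPoint}))$$
$$
\begin{aligned}
\forall D : \mathit{cDLineSet} \bullet {} & \forall d, d' : D \mid d \neq d' \bullet \mathit{first}\ d \neq \mathit{first}\ d' \land {} \\
& \forall d : D \bullet \mathit{second}\ d = \mathit{cFNode}(\mathit{first}\ d) \land \mathit{third}\ d = \mathit{cTNode}(\mathit{first}\ d)
\end{aligned}
$$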
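The concrete representation of collections of directed lines that form a directed path is cDPath. Its definition is based on the properties of cDIPATH[V, A].
$$\mathit{cDPath} : \mathbb{P}\,\operatorname{seq}(\mathit{cDLine} \times \mathit{cPoint} \times \mathit{cPoint})$$
$$
\begin{aligned}
\forall P : \mathit{cDPath} \bullet {} & \operatorname{ran} P \in \mathit{cDLineSet} \land {} \\
& \forall i : 1 \mathbin{..} \#P - 1 \bullet \mathit{third}\ P(i) = \mathit{second}\ P(i + 1) \land {} \\
& \#(\operatorname{dom} \mathit{Tails}\ P \cup \operatorname{dom} \mathit{Heads}\ P) - 1 = \#P
\end{aligned}
$$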
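The first predicate is equivalent to the following statements:
$$\forall i, j : \operatorname{dom} P \mid i \neq j \bullet \mathit{first}\ P(i) \neq \mathit{first}\ P(j)$$
and
$$\forall i : \operatorname{dom} P \bullet \mathit{second}\ P(i) = \mathit{cFNode}(\mathit{first}\ P(i)) \land \mathit{third}\ P(i) = \mathit{cTNode}(\mathit{first}\ P(i))$$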
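cDCycle is the refinement of DCycle.
$$\mathit{cDCycle} : \mathbb{P}\,\operatorname{seq}(\mathit{cDLine} \times \mathit{cPoint} \times \mathit{cPoint})$$
$$
\begin{aligned}
\forall C : \mathit{cDCycle} \bullet {} & \operatorname{ran} C \in \mathit{cDLineSet} \land {} \\
& \mathit{second}\ C(1) = \mathit{third}\ C(\#C) \land {} \\
& \forall i : 1 \mathbin{..} \#C - 1 \bullet \mathit{third}\ C(i) = \mathit{second}\ C(i + 1) \land {} \\
& \#(\operatorname{dom} \mathit{Tails}\ C \cup \operatorname{dom} \mathit{Heads}\ C) = \#C
\end{aligned}
$$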
A cRegion will be defined as a boundary that defines the region together with a (label) point inside the area enclosed by the boundary. In other words, it is a subset of the Cartesian product cPoint × cRegBound, restricted by the constraint that the point lies inside the area defined by the boundary. Collections of regions and their subtypes can then be defined using the concrete schema of a region and the constraints of the collection type.
4.3 Mapping Rules
The spatial data types introduced in the previous section can be used to describe the shape of entities in the conceptual model of a spatial application. The entity type then has an attribute that describes its geometry, and the domain of this attribute is one of the spatial types given in Section 4.2. Used this way, the types work at the tuple level, setting constraints that define which tuples are allowed. Collection types, on the other hand, can be used at the relation level. There they set constraints that define which extensions of a relation instance (for this entity type) are allowed.
Before continuing with the translation rules, we give a table of (topological) relationships
between the spatial primitives (points, lines, and regions) that will be needed to formulate
the rules.²

Relationships between spatial primitives (each entry describes the first primitive with respect to the second):
cPoint and cPoint: =; ≠
cPoint and cLine: ∈; ∉
cLine and cLine: intersect; share part; disjoint
cPoint and cRegion: inside; in the boundary; outside
cLine and cRegion: inside; outside; intersects the boundary; is part of the boundary; shares part with the boundary; meets the boundary
cRegion and cRegion: the relationships given by the 9-intersection matrix: disjoint, contains, contained, equals, meets, covers, covered by, overlaps

² A complete list of topological relationships between lines and regions can be found in [17]. The list
given here suffices for our purpose.
All entity types whose shape is described by a collection type are translated into a
relation that holds (the attributes of) the entity itself and another relation that holds the
geometry of all the components of the (collection) entities.
A spatial entity type whose shape is of type cPoint is translated into a single relation.
A spatial entity type with cLine (/cDLine) shape, a cLineSet (/cDLineSet) extension
constraint, and no ‘share part’ intrarelational constraint (a relationship that is defined
only between some entities and cannot be defined over the whole entity type extension)
is translated into a single relation schema, which means the geometry is stored
together with the other attributes; otherwise there is a separate relation for storing
the geometry.
A spatial entity type with cRegion shape, a cRegionSet extension constraint, and no ‘meets’,
‘covers’ or ‘covered by’ intrarelational constraint is translated into a single relation;
otherwise the geometry is kept in a separate relation.
The presence of the following relationships in the conceptual schema also affects the
translation to the relational schema: ∈ between cPoint and cLine; ‘in the boundary’ between
cPoint and cRegion; and ‘is part of the boundary’, ‘shares part with the boundary’ and ‘meets
the boundary’ between cLine and cRegion.
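The rules for deciding whether the geometry is stored together with the other attributes can be encoded as a small decision function. The Python sketch below is only an illustration: the SpatialEntityType record, the string labels used for types and constraints, and the reading that a line or region without the matching extension constraint also receives a separate geometry relation (the literal ‘otherwise’ of the rules) are all assumptions. It does not model the effect of the ∈, ‘in the boundary’ and boundary-sharing relationships, whose exact consequences the rules above mention without spelling out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# For each shape type that may be stored in a single relation: the extension
# constraint it requires, and the intrarelational constraints that force the
# geometry into a separate relation.
SINGLE_RELATION_RULES = {
    "cLine": ("cLineSet", {"share part"}),
    "cDLine": ("cDLineSet", {"share part"}),
    "cRegion": ("cRegionSet", {"meets", "covers", "covered by"}),
}


@dataclass(frozen=True)
class SpatialEntityType:
    name: str
    shape: str                           # e.g. "cPoint", "cLine", "cRegion", "cPath"
    extension: Optional[str] = None      # collection type constraining the extension
    intra: FrozenSet[str] = frozenset()  # intrarelational constraints in the schema


def separate_geometry_relation(e: SpatialEntityType) -> bool:
    """True if the geometry should be stored in a relation of its own."""
    if e.shape == "cPoint":
        return False
    if e.shape in SINGLE_RELATION_RULES:
        required, blocking = SINGLE_RELATION_RULES[e.shape]
        return e.extension != required or bool(e.intra & blocking)
    # Collection-typed shapes: one relation for the entity and one for the
    # geometry of its components.
    return True
```

For example, a road network typed cLine with a cLineSet extension constraint stays in one relation until a ‘share part’ constraint appears, at which point its geometry moves to a separate relation.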
Because the implementation schemas differ, the hierarchy of spatial data types is not the
same as the hierarchy of spatial elements. Figure 4.1 gives a partial hierarchy of spatial
data types, including two generalised types: Geo, which stands for any spatial type, and Ext,
which stands for any type of one- or two-dimensional element. The remaining types, cPath,
cCycle, cDPath and cDCycle (omitted from the figure for lack of space), are directly under
the Ext type in the hierarchy of types.
Figure 4.1: A (partial) hierarchy of spatial data types
Chapter 5
Conclusions
Based on topology concepts, we formally defined zero-, one-, and two-dimensional primitives: Point, Line, and Region. A distinction was made between a directed line and a line:
a directed line is an ordered set.
Using the set constructor, we built collections of points, collections of lines and of directed
lines, and collections of regions. To build special collections of linear objects, i.e., collections
of lines or directed lines that satisfy some constraints, we used graph concepts. First
we presented formal definitions of graphs, directed graphs (digraphs), paths, cycles, trees,
dipaths, dicycles and ditrees. We gave an alternative representation of collections of lines as
graphs, and of collections of directed lines as digraphs. Using these representations and graph
concepts, we introduced the other one-dimensional elements. Topological constraints were
imposed on collections of regions to build new two-dimensional elements.
We defined the relation between each spatial primitive and the lower-dimensional primitives.
Thus the boundary of a line is the set of points that are the nodes of this line; a directed
line has a pair of points as its begin and end node; a region boundary is a collection of lines
that satisfy some constraints. RegBound is the type that describes a region boundary.
We refined the formal specifications of the spatial elements to obtain schemas that are
closer to computer representations. Thus, a point is a pair of real numbers and a
collection of points is a set of pairs. A directed line is a sequence of pairs, and a line
is calculated from that sequence using an interpolation function. Collections of lines,
collections of directed lines, and their subtypes are structures that store references to
their components and take care of the topological relationships between the components.
For collections of lines and their subtypes, the references are made to directed lines that
resolve the lines (because the lines are not stored, but calculated from the stored directed
lines). A region is resolved (determined) from its boundary, thus, it is represented by the
(structure of the) collection of lines that makes up its boundary. The types representing
collection of regions are structures that store references to the region components, and
enforce constraints in the collection.
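To make ‘a line is calculated from that sequence using an interpolation function’ concrete, the sketch below assumes linear interpolation between consecutive stored vertices (the thesis does not fix the interpolation function) and returns the point at a given fraction of the arc length. Reversing the vertex sequence yields the same point set but a different parameterisation, which is precisely what distinguishes a directed line from a line.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_at(dline: List[Point], t: float) -> Point:
    """Point at fraction t of the arc length along a directed line,
    assuming straight segments between consecutive stored vertices."""
    if len(dline) < 2:
        raise ValueError("a directed line needs at least two vertices")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    segments = list(zip(dline, dline[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    remaining = t * sum(lengths)
    for (a, b), length in zip(segments, lengths):
        if length > 0 and remaining <= length:
            f = remaining / length
            return (a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1]))
        remaining -= length
    # Reached only if every segment has zero length or rounding overshoots.
    return dline[-1]
```

On the directed line (0, 0), (2, 0), (2, 2), the point at t = 0.75 is (2, 1); on the reversed sequence, the same point is reached at t = 0.25.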
The adjacency relationship between regions was represented as a graph, which translates
problems related to adjacency relationships into graph problems.
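As an illustration of turning adjacency into a graph problem, the sketch below builds the adjacency graph from a hypothetical mapping of region identifiers to the identifiers of their boundary lines. It treats two regions as adjacent when they share a boundary line (regions meeting only in a point are not connected, which is an assumption). Once the graph exists, ordinary graph algorithms answer adjacency questions; here a breadth-first search finds the regions reachable by crossing shared boundaries.

```python
from collections import defaultdict, deque
from typing import Dict, Set


def adjacency_graph(boundaries: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each region to the regions sharing at least one boundary line."""
    regions_on_line: Dict[str, Set[str]] = defaultdict(set)
    for region, lines in boundaries.items():
        for line in lines:
            regions_on_line[line].add(region)
    graph: Dict[str, Set[str]] = {region: set() for region in boundaries}
    for regions in regions_on_line.values():
        for r in regions:
            graph[r] |= regions - {r}
    return graph


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Regions reachable from start by crossing shared boundary lines (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```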
The (concrete) schemas of spatial elements can be used to define the (data) structure
of abstract data types, and the operators can be defined based on this structure. The
abstract data types are then used in the conceptual model to describe spatial attributes
of (spatial) entity types. The data types of collections of (spatial) elements can also be used
in the conceptual model to impose constraints on the extension of a spatial entity
type. These spatial data types can be implemented in a nested (NF2 ) relational model
that supports structures such as lists or trees.
Using the topological relationships between spatial primitives and constraints imposed on
the spatial attribute (by its data type) and the relation extension, we defined a set of rules
for translating from an (extended) Entity-Relationship schema to a relational schema.
As a final remark, the formal specifications of graph elements define type constructors, e.g.,
a graph constructor, a path constructor, etc., which we applied to build types representing
(special) collections of lines or directed lines.
Appendix A
A.1 Topology Concepts
Definition We say two sets A and B intersect each other if A ∩ B ≠ ∅. Otherwise, A and
B are disjoint.
Definition A metric space is an ordered pair (M, ρ) consisting of a set M together with
a function ρ : M × M → ℝ satisfying, for x, y, z ∈ M:
1. ρ(x , y) ≥ 0,
2. ρ(x , x ) = 0; ρ(x , y) = 0 implies x = y,
3. ρ(x , y) = ρ(y, x ),
4. ρ(x , y) + ρ(y, z ) ≥ ρ(x , z ) (triangle inequality).
The function ρ is called a metric on M. Functions ρ : M × M → ℝ (which are potential
metrics, but which have not yet been tested) are called distance functions.
Fact The distance function
$$\rho\big((x_1, \ldots, x_n), (y_1, \ldots, y_n)\big) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$$
is called the usual metric in ℝⁿ. ℝⁿ together with the usual metric is a metric space.
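The fact can be checked numerically on sample points. The following sketch implements the usual metric and a brute-force test of the four axioms over a finite set of points. Such a test can refute, but never prove, that a distance function is a metric; the small tolerance in the triangle test is an assumption made to absorb floating-point rounding.

```python
import itertools
import math
from typing import Callable, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def usual_metric(x: Sequence[float], y: Sequence[float]) -> float:
    """rho(x, y) = sqrt(sum over k of (x_k - y_k)^2) on R^n."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))


def satisfies_metric_axioms(points: Sequence[Sequence[float]], rho: Metric,
                            tol: float = 1e-12) -> bool:
    """Check axioms 1-4 on every triple of sample points."""
    for x, y, z in itertools.product(points, repeat=3):
        d_xy = rho(x, y)
        if d_xy < 0:                                 # 1. non-negativity
            return False
        if (d_xy == 0) != (tuple(x) == tuple(y)):    # 2. zero exactly when x = y
            return False
        if d_xy != rho(y, x):                        # 3. symmetry
            return False
        if d_xy + rho(y, z) < rho(x, z) - tol:       # 4. triangle inequality
            return False
    return True
```

The squared difference on ℝ fails the triangle inequality (1 + 1 < 4 for the points 0, 1, 2), so it is a distance function in the sense above but not a metric.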
Definition A metric ρ on M is bounded iff for some constant A, ρ(x , y) ≤ A for all x and
y in M .
Definition Let (M, ρ) be a metric space and x a point of M. For ε > 0, we define
$$U_\rho(x, \varepsilon) = \{y \in M \mid \rho(x, y) < \varepsilon\},$$
called the ε-disk about x. When there is no confusion about ρ, we can abbreviate $U_\rho(x, \varepsilon)$
to $U(x, \varepsilon)$.
Definition A set E in a metric space (M, ρ) is open iff for each x ∈ E there is an ε-disk
$U(x, \varepsilon)$ about x contained in E. A set is closed iff it is the complement of an open set.
Evidently, a set F is closed iff, whenever every disk about x meets F, then x ∈ F.
Definition If (M, ρ) and (N, σ) are metric spaces, a function f : M → N is continuous
at x in M iff for each ε > 0 there is some δ > 0 such that σ(f(x), f(y)) < ε whenever
ρ(x, y) < δ.
Theorem The open sets in a metric space (M, ρ) have the following properties:
1. Any union of open sets is an open set.
2. Any finite intersection of open sets is an open set.
3. ∅ and M are both open.
Definition A topology on a set X is a collection τ of subsets of X, called the open sets,
satisfying:
1. any union of elements of τ belongs to τ,
2. any finite intersection of elements of τ belongs to τ,
3. ∅ and X belong to τ.
We say (X , τ ) is a topological space, sometimes abbreviated “X is a topological space”
when no confusion can result about τ .
Definition If X is a topological space and E ⊂ X , we say E is closed iff X − E is open.
Definition If (X, τ) is a topological space and A ⊂ X, the collection τ′ = {G ∩ A | G ∈ τ}
is a topology for A, called the relative topology for A. The fact that a subset of X is being
given this topology is signified by referring to it as a subspace of X.
Definition Let (M , ρ) be a metric space. Then (by Theorem ...) the open sets in M form
a topology on M , called the metric topology τρ .
Definition If X is a topological space and E ⊂ X, the closure of E in X is the set
$$\overline{E} = \mathrm{Cl}_X(E) = \bigcap \{K \subset X \mid K \text{ is closed and } E \subset K\}.$$
$\overline{E}$ is closed, and it is the smallest closed set containing E.
Definition If X is a topological space and E ⊂ X, the interior of E in X is the set
$$E^\circ = \mathrm{Int}_X(E) = \bigcup \{G \subset X \mid G \text{ is open and } G \subset E\}.$$
$E^\circ$ is open, and it is the largest open set contained in E.
Definition If X is a topological space and E ⊂ X, the frontier of E in X is the set
$$\partial E = \mathrm{Fr}_X(E) = \overline{E} \cap \overline{X - E}.$$
The frontier of E is a closed set.
Theorem For any subset E of a topological space X:
1. $\overline{E} = E \cup \mathrm{Fr}(E)$
2. $E^\circ = E - \mathrm{Fr}(E)$
3. $X = E^\circ \cup \mathrm{Fr}(E) \cup (X - E)^\circ$
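The closure, interior and frontier operators, together with the three identities of the theorem, can be tried out on a finite topological space, where every union and intersection is finite. The sketch below is illustrative: it represents a topology as an explicit list of open sets (assumed to satisfy the topology axioms) and computes Cl, Int and Fr directly from the definitions.

```python
from typing import FrozenSet, List

S = FrozenSet[int]


def interior(E: S, opens: List[S]) -> S:
    """Union of all open sets contained in E."""
    result: set = set()
    for G in opens:
        if G <= E:
            result |= G
    return frozenset(result)


def closure(E: S, X: S, opens: List[S]) -> S:
    """Intersection of all closed sets (complements of open sets) containing E."""
    result = set(X)
    for G in opens:
        K = X - G
        if E <= K:
            result &= K
    return frozenset(result)


def frontier(E: S, X: S, opens: List[S]) -> S:
    return closure(E, X, opens) & closure(X - E, X, opens)
```

For X = {1, 2, 3} with open sets ∅, {1}, {1, 2} and X, the set E = {2} has closure {2, 3}, empty interior and frontier {2, 3}, and all three identities of the theorem hold.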
Definition An open subset G of a topological space is regularly open iff G is the interior
of its closure, i.e. $G = (\overline{G})^\circ$. A closed subset C is regularly closed iff it is the closure of
its interior, i.e. $C = \overline{C^\circ}$.
Facts
1. The complement of a regularly open set is a regularly closed set and vice versa.
2. The intersection, but not necessarily the union, of two regularly open sets is regularly
open.
3. The union, but not necessarily the intersection, of two regularly closed sets is regularly
closed.
Definition If X is a topological space and x ∈ X, a neighbourhood (abbreviated nhood)
of x is a set U which contains an open set V containing x. Thus, evidently, U is a nhood
of x iff $x \in U^\circ$.
Definition Let X and Y be topological spaces and let f : X → Y. Then f is continuous
at x₀ ∈ X iff for each nhood V of f(x₀) in Y there is a nhood U of x₀ in X such that
f(U) ⊂ V. We say f is continuous on X iff f is continuous at each x₀ ∈ X.
Definition If X and Y are topological spaces, a function f : X → Y is a homeomorphism
iff f is bijective and continuous and f⁻¹ is also continuous. In that case we say that X and
Y are homeomorphic.
Definition A space X is disconnected iff there are disjoint nonempty open sets H and K
in X such that X = H ∪ K . When no such disconnection exists, X is connected.
A.2 Z at work
To add persons to our Birthday Book, we use the schema:
AddBirthday
∆BirthdayBook
name? : NAME
date? : DATE
name? ∉ known
birthday′ = birthday ∪ {name? ↦ date?}
∆BirthdayBook indicates that the state may change as a result of this operation. The ? suffix is a
convention for naming input variables. name? ∉ known is a precondition for the success of the
operation (a reasonable one, because each person can have only one birthday). What
we expect to happen is that the set of names known to the system is augmented with the
new name:
known′ = known ∪ {name?}
It can be proved that this happens (using the specification of AddBirthday).
To find the birthday of a person, we use the schema:
FindBirthday
ΞBirthdayBook
name? : NAME
date! : DATE
name? ∈ known
date! = birthday(name?)
ΞBirthdayBook indicates that the state of the system does not change after this operation.
! is a convention for naming output variables. The precondition for success of the operation
is that name? ∈ known.
We also need to specify an initial state of our system, which is naturally an empty Birthday
Book: known is the empty set. The schema below specifies this initial state.
InitBirthdayBook
BirthdayBook
known = ∅
Until now, we have not considered error situations, e.g. trying to add a person who
is already in the book, or looking for the birthday of a person who is not in the book.
To include those cases, we change our specification (in fact we only add
some other schemas and then combine them). First, let us declare a schema that reports
a successful operation.
Success
result! : REPORT
result! = ok
The schema AddBirthday ∧ Success describes an operation which, for correct input, both
acts as described by AddBirthday and produces the result ok .
We show here a complete specification only for the AddBirthday operation (an analogous
specification can be given for the FindBirthday operation). We now declare a schema
that deals with the case where we try to add an existing name.
AlreadyKnown
ΞBirthdayBook
name? : NAME
result! : REPORT
name? ∈ known
result! = already known
The complete (robust) RAddBirthday operation is
RAddBirthday ≙ (AddBirthday ∧ Success) ∨ AlreadyKnown.
The operation RAddBirthday terminates whatever its input. If the input name? is already known, the state of the system does not change, and the result already known is returned; otherwise, the new birthday is added to the database as described by AddBirthday,
and the result ok is returned.
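The abstract specification, including the robust operation, can be mirrored in a few lines of Python. This sketch is for intuition only and is not a refinement: a dict stands for the partial function birthday, known is computed as its domain, violated preconditions raise an exception, and the report values are plain strings chosen for illustration.

```python
from typing import Dict, Set

OK = "ok"
ALREADY_KNOWN = "already known"


class BirthdayBook:
    def __init__(self) -> None:
        self.birthday: Dict[str, str] = {}   # InitBirthdayBook: known = ∅

    @property
    def known(self) -> Set[str]:
        return set(self.birthday)            # invariant: known = dom birthday

    def add_birthday(self, name: str, date: str) -> None:
        if name in self.known:               # precondition: name? ∉ known
            raise ValueError("precondition of AddBirthday violated")
        self.birthday = {**self.birthday, name: date}  # birthday ∪ {name? ↦ date?}

    def find_birthday(self, name: str) -> str:
        if name not in self.known:           # precondition: name? ∈ known
            raise KeyError(name)
        return self.birthday[name]

    def r_add_birthday(self, name: str, date: str) -> str:
        if name in self.known:               # AlreadyKnown: state unchanged
            return ALREADY_KNOWN
        self.add_birthday(name, date)        # AddBirthday ∧ Success
        return OK
```

Unlike add_birthday, r_add_birthday never fails: a second attempt to add the same name simply returns the report value and leaves the stored date untouched.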
What we have done so far is the abstract description of our Birthday Book. We now turn
to a concrete description of the system. A natural idea is to use
arrays to store the names and their birthdays. So the variables
names : array [1 . . ] of NAME;
dates : array [1 . . ] of DATE;
are a suitable implementation in a programming language (for simplicity, we do not discuss
the dimension of the arrays). The name and birthday of a person have the same index
in the two arrays. A mathematical approximation of an array is a function from ℕ₁ (the naturals
without 0) to the desired type:
names : ℕ₁ → NAME
dates : ℕ₁ → DATE
Then the concrete state space can be defined as
BirthdayBook1
names : ℕ₁ ↣ NAME
dates : ℕ₁ → DATE
hwm : ℕ
The injection names ensures that there are no repetitions among the elements of the
names array. hwm (which stands for ‘high water mark’) shows how much of the arrays is in use.
The retrieve schema that defines the relationship between the abstract and the concrete
schema is
Abs
BirthdayBook
BirthdayBook1
known = { i : 1 . . hwm • names(i ) }
∀ i : 1 . . hwm •
birthday(names(i )) = dates(i )
This schema relates the variables of the abstract schema (known and birthday) to
the variables of the concrete schema (names, dates and hwm). The first predicate says
that the set known consists of just those names which occur somewhere among names(1),
. . . , names(hwm). The second predicate says that the birthday for names(i ) is the corresponding element dates(i ) of the array dates.
The schema AddBirthday1 adds another person to the book. To add a new name, we
increase hwm by one and fill in the name and date in the arrays:
AddBirthday1
∆BirthdayBook1
name? : NAME
date? : DATE
∀ i : 1 . . hwm • name? ≠ names(i)
hwm′ = hwm + 1
names′ = names ⊕ {hwm′ ↦ name?}
dates′ = dates ⊕ {hwm′ ↦ date?}
AddBirthday1 is a correct implementation of AddBirthday because of the following two facts:
1. Whenever AddBirthday is legal in some abstract state, the implementation AddBirthday1
is legal in any corresponding concrete state; that is, it is a correct operation refinement.
2. The final state which results from AddBirthday1 represents an abstract state which
AddBirthday could produce; that is, it is a correct concrete operation.
The operation AddBirthday is legal exactly if its pre-condition name? ∉ known is satisfied.
If this is so, the predicate
known = { i : 1 . . hwm • names(i) }
from Abs tells us that name? is not one of the elements names(i):
∀ i : 1 . . hwm • name? ≠ names(i)
This is the pre-condition of AddBirthday1. That means AddBirthday1 is legal whenever
AddBirthday is legal.
To prove the second fact, we need to think about the concrete states before and after an
execution of AddBirthday1, and the abstract states they represent according to Abs. The
two concrete states are related by AddBirthday1, and we must show that the two abstract
states are related as prescribed by AddBirthday:
birthday′ = birthday ∪ {name? ↦ date?}
The above equality is between functions that are both of type NAME ⇸ DATE. To prove
that two functions are equal, we must prove that their domains are the same and that
they map each element of the domain to the same element of the target.
The domains of these two functions are the same:
dom birthday′
  = known′                                          [invariant after]
  = { i : 1 . . hwm′ • names′(i) }                  [from Abs′]
  = { i : 1 . . hwm • names′(i) } ∪ {names′(hwm′)}  [since hwm′ = hwm + 1]
  = { i : 1 . . hwm • names(i) } ∪ {name?}          [since names′ = names ⊕ {hwm′ ↦ name?}]
  = known ∪ {name?}                                 [from Abs]
  = dom birthday ∪ {name?}.                         [invariant before]
To prove that the mapping is done correctly, we split the proof into two parts.
For all i in the range 1 . . hwm, the overrides in AddBirthday1 affect only index hwm′, so
names′(i) = names(i) ∧ dates′(i) = dates(i).
For any i in this range,
birthday′(names′(i))
  = dates′(i)               [from Abs′]
  = dates(i)                [dates unchanged]
  = birthday(names(i)).     [from Abs]
For the new name, stored at index hwm′ = hwm + 1,
birthday′(names′(hwm′))
  = dates′(hwm′)            [from Abs′]
  = date?.                  [spec. of AddBirthday1]
So the two functions birthday′ and birthday ∪ {name? ↦ date?} are equal, and the abstract states before and after the operation are guaranteed to be related as described by
AddBirthday.
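The two facts proved above can also be spot-checked by running a concrete model alongside the retrieve relation Abs. In the sketch below, Python lists stand for the arrays (index i of the specification is list position i − 1), retrieve computes known and birthday as Abs prescribes, and the checks confirm that adding through AddBirthday1 and then retrieving gives the abstract state AddBirthday would produce. A test on a few runs is, of course, no substitute for the proof.

```python
from typing import Dict, List, Set, Tuple


class BirthdayBook1:
    """Concrete state: arrays names and dates plus the high water mark hwm."""

    def __init__(self) -> None:               # InitBirthdayBook1: hwm = 0
        self.names: List[str] = []
        self.dates: List[str] = []
        self.hwm = 0

    def add_birthday1(self, name: str, date: str) -> None:
        # pre-condition: ∀ i : 1 . . hwm • name? ≠ names(i)
        if any(self.names[i - 1] == name for i in range(1, self.hwm + 1)):
            raise ValueError("precondition of AddBirthday1 violated")
        self.hwm += 1                         # hwm′ = hwm + 1
        # names′ = names ⊕ {hwm′ ↦ name?}, dates′ = dates ⊕ {hwm′ ↦ date?};
        # the lists always hold exactly hwm entries here, so this is an append.
        self.names.append(name)
        self.dates.append(date)


def retrieve(c: BirthdayBook1) -> Tuple[Set[str], Dict[str, str]]:
    """Abs: known = { i : 1 . . hwm • names(i) }, birthday(names(i)) = dates(i)."""
    used = range(1, c.hwm + 1)
    known = {c.names[i - 1] for i in used}
    birthday = {c.names[i - 1]: c.dates[i - 1] for i in used}
    return known, birthday
```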
The concrete operation for FindBirthday is
FindBirthday1
ΞBirthdayBook1
name? : NAME
date! : DATE
∃ i : 1 . . hwm •
name? = names(i ) ∧ date! = dates(i )
The predicate says that there is an index i at which the names array contains the input
name?, and the output date! is the corresponding element of the array dates. For this
to be possible, name? must in fact appear somewhere in the array names: this is the
pre-condition of the operation.
Since neither the abstract nor the concrete operation changes the state, there is no need
to check that the final concrete state is acceptable, but we need to check that the precondition of FindBirthday1 is sufficiently liberal, and that the output date! is correct.
The pre-conditions of the abstract and concrete operations are in fact the same: that the
input name? is known. The output is correct because for some i , name? = names(i ) and
date! = dates(i ), so
date!
  = dates(i)                [spec. of FindBirthday1]
  = birthday(names(i))      [from Abs]
  = birthday(name?).        [spec. of FindBirthday1]
Let us now define the schema for the initial state of the program:
InitBirthdayBook1
BirthdayBook1
hwm = 0
This correctly represents the initial abstract state (it is a correct initial concrete state),
because
known
  = { i : 1 . . hwm • names(i) }    [from Abs]
  = { i : 1 . . 0 • names(i) }      [from InitBirthdayBook1]
  = ∅.                              [since 1 . . 0 = ∅]
Bibliography
[1] Serge Abiteboul and Richard Hull. IFO : A Formal Semantic Database Model. In
Gio Wiederhold, editor, ACM Transactions on Database Systems, pages 525–565.
Association for Computing Machinery, December 1987.
[2] Herman Balsters, Rolf de By, and Roberto Zicari. Typed Sets as a Basis for Object-Oriented Database Schemas. In Oscar M. Nierstrasz, editor, Proc. European Conf. on
Object-oriented Programming (ECOOP’93), pages 161–184. Springer-Verlag, 1993.
[3] Rosalind Barden, Susan Stepney, and David Cooper. Z in Practice. Prentice Hall,
Cambridge, England, August 1994.
[4] T. Bittner and A.U. Frank. An Introduction to the Application of Formal Theories
to GIS. In F. Dollinger and J. Strobl, editors, Informationsverarbeitung IX (AGIT),
pages 11–22. Institut fuer Geographie and Universität Salzburg and Salzburger Geographische Materialien, 1997.
[5] T. Bittner and A.U. Frank. On Representing Geometries of Geographic Space. In
T.K. Poiker and N. Chrisman, editors, 8th Int. Symposium on Spatial Data Handling,
pages 111–122. International Geographic Union, July 1998.
[6] J.A. Bondy and U.S.R. Murty. Graph Theory with Applications. The Macmillan Press
Ltd, University of Waterloo, Ontario, Canada, 1976.
[7] Open GIS Consortium. Open GIS Simple Feature Specification for OLE/COM. Technical Report Document 99-050, Open GIS Project, 1999.
[8] Rolf A. de By. Database Design. Course Notes. International Institute for Aerospace
Survey and Earth Sciences (ITC), 1998.
[9] Rolf A. de By and Rom Langerak. Specificatie-Methoden. Universiteit Twente Faculteit der Informatica, August 1996.
[10] Antoni Diller. An Introduction to Formal Methods. John Wiley & Sons, School of
Computer Science, University of Birmingham, February 1990.
[11] Martin Erwig and Ralf Hartmut Güting. Explicit graphs in a functional model for
spatial databases. IEEE Transactions on Knowledge and Data Engineering, 6(5):787–
804, October 1994.
[12] Martin Erwig and Markus Schneider. Partition And Conquer. Technical Report CH-97-07, Chorochronos, 1997.
[13] Ralf Hartmut Güting, Michael H. Böhlen, Martin Erwig, Christian S. Jensen, Nikos A.
Lorentzos, Markus Schneider, and Michalis Vazirgiannis. A Foundation for Representing and Querying Moving Objects. Technical Report CH-98-03, Chorochronos, 1998.
[14] Thanasis Hadzilacos and Nectaria Tryfona. An Extended Entity-Relationship Model
for Geographic Application. SIGMOD Record, 26(3), September 1997.
[15] P. Haunold, A.U. Frank, S. Grumbach, G. Kuper, and Z. Lacroix. Geometric Objects Represented by Inequalities. In F. Dollinger and J. Strobl, editors, Angewandte
Geographische Informationsverarbeitung IX (AGIT), pages 77–86. Institut fuer Geographie, Universitaet Salzburg, Salzburger Geographische Materialien, July 1997.
[16] D.M. Mark and A.U. Frank. Experiential and Formal Models of Geographic Space.
Environment and Planning, Series B(23), 1996.
[17] Martien Molenaar. An Introduction to the Theory of Spatial Object Modelling. Taylor
& Francis, International Institute for Aerospace Survey and Earth Sciences (ITC),
Enschede, The Netherlands, 1998.
[18] C. Parent, S. Spaccapietra, E. Zimányi, P. Donini, C. Plazanet, and C. Vangenot.
Modeling Spatial Data in the MADS Conceptual Model. In Proceedings of the 8th
Int. Symp. on Spatial Data Handling, SDH’98, July 1998.
[19] C. Parent, S. Spaccapietra, E. Zimányi, P. Donini, C. Plazanet, C. Vangenot,
N. Rognon, J. Pouliot, and P. A. Crausaz. MADS: un modèle conceptuel pour des
applications spatio-temporelles. Revue Internationale de Géomatique, 7(3-4), 1997.
[20] Dieter Pfoser and Nectaria Tryfona. Requirements, Definitions, and Notations for
Spatiotemporal Application environments. Technical Report CH-98-09, Chorochronos,
1998.
[21] Rosanne Price, Nectaria Tryfona, and Christian S. Jensen. A Conceptual Modeling Language for Spatiotemporal Applications. Technical Report CH-99-20,
Chorochronos, 1999.
[22] J.M. Spivey. The Z Notation. Prentice Hall, Oxford University Computing Laboratory,
September 1988.
[23] Nectaria Tryfona and Thanasis Hadzilacos. Evaluation of Database Modeling Methods
for Geographic Information systems. Technical Report CH-97-05, Chorochronos, 1997.
[24] Nectaria Tryfona and Christian S. Jensen. Conceptual Data Modeling for Spatiotemporal Applications. Technical Report CH-98-08, Chorochronos, 1998.
[25] Gottfried Vossen. Data Models, database languages and database management systems.
Addison-Wesley, Rheinisch-Westfälische Technische Hochschule Aachen, 1990.
[26] Stephen Willard. General Topology. Addison-Wesley, University of Alberta, April
1970.
[27] Michael F. Worboys. GIS : A Computing Perspective. Taylor & Francis, Department
of Computer Science, University of Keele, Keele, UK, 1998.
