InfoModels-Ontologies--EdBarkmeyer_20070412

Information Models
as a Basis for Ontologies
Ed Barkmeyer, NIST
Ontolog Forum, April, 2007
04/12/07
Next Generation Info Models
1
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
History
• Linked record models (1968)
– CODASYL standard (1974), Navigational Data Model (1980)
• E.F.Codd: Relational Algebra (1970)
• Peter Chen: Entity Attribute Relationship Models (1976)
• ISO TR 8002: 1984
the Conceptual schema and the information base
• 1980s information modeling technologies
– IDEF1-X, SDM, NIAM/ORM, SSADM, EXPRESS, etc.
• 1990s object modeling technologies (UML)
• Frame-based logics (1975-1995)
• Description logics (1985-present): DAML, OWL
Differences in Nature
• Navigational and relational models
– relate data to data
– relational normal forms model functions of keys
• Information models
– relate things (entities) to other things
– relate things to information about them
– use classifiers to collect properties
• Ontologies
– relate things to things
– relate things to information about them
– use information to classify things
Differences in Purpose
• Data models
– support software implementations of business processes
– organize information for access
– describe instances
• Information models
– support sets of business processes
– organize information for comprehension
– support design of databases and messages
– use classifications to describe instances
• Ontologies
– support retrieval of information using inferencing
– organize information for relevance
– describe subjects and categories by classifications
Differences in Concept
• Information models
– universe is things used by the business processes
– classification/axioms are as used by the business
business rules, not accepted scientific truth
– distinguish conceptual schema
= invariants, quantified assertions
from the information base
= current assertions about individual things
• Ontologies
– universe is all things that may be encountered in a domain
– classification/axioms are accepted truth in the domain
– primarily quantified assertions with a few ground facts
– distinguished from an information base for some practical uses
Common Ideas
• Universe is a set of things of interest
• Classification enables understanding of the universe
• Axioms (invariants, necessities)
but with a different concept of truth
• Ground facts = axiomatic truths about instances
• conceptual schema is “nearly monotonic”
current/transient facts restricted to the information base
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
Information Modeling: Classifiers
• Entity type classifies things in the universe
– a template for capturing (current) information about things
– a model of the state of a thing
– identity is distinct from state
– domain of properties
• Value type classifies information about things
– instance is an information unit, a data element
– can be a structure of component data elements
– identity is state (state is invariant)
– only range of properties (its properties proceed from its identity)
• Data type represents Value types
– instance is a computational data value
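The entity/value distinction above can be sketched in Python. This is only an illustration: the class names Ship and Weight and their fields are hypothetical, not taken from the slides. A value type compares by state and cannot change; an entity type keeps its identity while its state changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weight:
    """Value type: identity is state, and the state is invariant."""
    amount: float
    unit: str


@dataclass(eq=False)
class Ship:
    """Entity type: identity is distinct from its (mutable) state."""
    name: str
    gross_weight: Weight


a = Ship("Aurora", Weight(900.0, "t"))
b = Ship("Aurora", Weight(900.0, "t"))

# Two value instances with the same state are the same value.
same_value = Weight(900.0, "t") == Weight(900.0, "t")

# Two entities with identical state are still different things.
same_entity = a == b

# Changing the state of an entity does not change which thing it is.
a.gross_weight = Weight(950.0, "t")
```

Here `eq=False` keeps Python's default identity comparison for the entity, while `frozen=True` makes the value type immutable and comparable by content.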
Information modeling: Subtypes
• Subtype relationships among classifiers
– S is a subtype (subclass) of E iff
every s in class S is also an instance of E
– multiple supertypes: S is a subtype of E1, ..., En
• Exclusion relationships
– if t is an instance of E then t is not an instance of D
• Covering relationships
– E is covered by S1, ..., Sn iff e in E implies
there exists at least 1 k such that e is in Sk
– Mutually exclusive coverings are “partitions”
– “abstract type” = a type that is covered by some set of subtypes
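As a rough sketch, the type relationships above can be checked against the current extension of each class, represented here as Python sets. The function names and the vessel data are assumptions made for illustration. Note that a model states these relationships as invariants over every state of the information base, whereas this code only tests one snapshot.

```python
def is_subtype(s, e):
    """Every instance of S is also an instance of E."""
    return s <= e


def excludes(e, d):
    """No thing is an instance of both E and D."""
    return e.isdisjoint(d)


def covers(e, subtypes):
    """Every instance of E is in at least one of the subtypes."""
    return e <= set().union(*subtypes)


def is_partition(e, subtypes):
    """A covering whose subtypes are mutually exclusive."""
    mutually_exclusive = all(
        a.isdisjoint(b)
        for i, a in enumerate(subtypes)
        for b in subtypes[i + 1:]
    )
    return covers(e, subtypes) and mutually_exclusive


# Hypothetical extensions: v2 is both a cargo ship and a passenger ship.
vessel = {"v1", "v2", "v3"}
cargo_ship = {"v1", "v2"}
passenger_ship = {"v2", "v3"}
```

With this data the two subtypes cover `vessel` but do not partition it, because `v2` belongs to both.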
Information Modeling: Class definition
• Union (“choice”, “select”) types
– Class E is the union of classes F and G and ...
E(x) == F(x) OR G(x)
– Union types are “abstract” by construction
• Intersection
– Class E is the intersection of classes F and G
E(x) == F(x) AND G(x)
• Relative complement
– if S is a subtype of E, C is the relative complement iff C = E – S
Classification
• Entity classes can represent roles or states of things
– no notion of intrinsic properties
– models contain intrinsic classifiers, e.g., maximal superclasses
but languages don’t identify them
• A thing can be an instance of multiple entity types
– the entity types need not be explicitly related
• Default relationship among subtypes is “overlaps”
– a thing can be an instance of both
• A thing can change classification over time
– “thing is instance of class” is just part of the state of the thing
• Most of these concepts are not supported by object models
Aside: Value Types
• Value type = conceptual classifier for information unit
• Categories
– name (referencer, supports equal/unequal)
• enumerated lists
• codes/identifiers taken from registries
• strings intended to identify things
– quantity
• includes numbers and values with “dimensions”
– quantitative name (names that support quantitative operations)
• ordinal, date, time, time period, temperature, etc.
– truth value
– text (structured and unstructured)
• a body of information interpreted by a specific agent
Information Modeling: Properties
• Attributes (data type properties)
– domain is entity, range is value
• Relationships (object properties, associations)
– domain is entity, range is entity
• Inverse relationship
– same relationship, nominal domain and range reversed
– different “reading” (spelling of the relationship name)
• Multiplicity/cardinality of attributes and relationships
– one entity can have the same property (type)
0, 1, n, unbounded times
– distinguish set of the same property from
property whose range is a set
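A minimal sketch of inverse and cardinality, assuming a relationship is held as a set of (domain, range) pairs. The helper names and the vehicle data are hypothetical.

```python
from collections import Counter


def inverse(rel):
    """Same relationship with nominal domain and range reversed."""
    return {(y, x) for (x, y) in rel}


def cardinality_ok(rel, domain, lo, hi=None):
    """Each entity in the domain has between lo and hi instances of the
    property; hi=None means unbounded."""
    counts = Counter(x for (x, _) in rel)
    return all(
        counts[d] >= lo and (hi is None or counts[d] <= hi)
        for d in domain
    )


vehicle_has_model = {("car1", "m1"), ("car2", "m1")}
model_of_vehicle = inverse(vehicle_has_model)
```

A cardinality of 1..1 on `vehicle_has_model` says each vehicle has exactly one model; the inverse reading has cardinality 0..unbounded, since a model may have any number of vehicles.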
Property domains
• Domain and range of a property must each be a single class
– Name of a property implicitly qualified by the domain
• Ad hoc supertypes (“union type”)
may be created to be domain or range
– enumerate the entity types constituting the domain, or
– enumerate the entity types constituting the range, or
– (rarely) enumerate the value types constituting the range
• Mutable and immutable properties
– a property P(e, v) is “mutable” if
the value v associated with a given e may change over time
– P(e,v) is “immutable” if P(e,x) implies x=v over all time
Property Relationships
• Property implies property
– (there exists v such that P(d,v)) implies
(there exists x such that Q(d,x))
• Property excludes property
– (there exists v such that P(d,v)) implies
NOT (there exists x such that Q(d,x))
• Properties P1, ..., Pn cover entity type
– For every instance e of E there exists some i such that
there exists v such that Pi(e,v)
Relationship Relationships
• Relationship implies/subsets relationship (pairwise)
– P(x,y) implies Q(x,y)
– every pair (x,y) that satisfies P also satisfies Q
• Relationship excludes relationship (pairwise)
– P(x,y) implies NOT Q(x,y)
• Relationship refines/subtypes relationship
– property P is a specialization of property Q
– every instance of P is an instance of Q
– not just implication
Examples
• Property implies property
– x is an officer of ship S implies
there exists officer y such that x reports to y
• Property excludes property
– x is employee of G implies NOT x is eligible for prize p
• Relationship implies/subsets relationship (pairwise)
– x is an officer of ship S implies x has cabin on S
• Relationship excludes relationship (pairwise)
– x is an officer of ship S implies NOT x is passenger on S
• Relationship refines/subtypes relationship
– x is captain of ship S refines x is officer of ship S
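The ship examples above can be sketched as checks over sets of pairs. The function names and the sample crew data are assumptions for illustration only. The property-level rules ask whether an entity has any value at all, while the pairwise rules compare the pairs themselves.

```python
def has(prop, d):
    """True if d has at least one value for the property."""
    return any(x == d for (x, _) in prop)


def property_implies(p, q, domain):
    """Whoever has some P value also has some Q value."""
    return all(has(q, d) for d in domain if has(p, d))


def property_excludes(p, q, domain):
    """Whoever has some P value has no Q value."""
    return all(not has(q, d) for d in domain if has(p, d))


def relationship_implies(p, q):
    """Pairwise: every pair (x, y) satisfying P also satisfies Q."""
    return p <= q


def relationship_excludes(p, q):
    """Pairwise: no pair (x, y) satisfies both P and Q."""
    return p.isdisjoint(q)


people = {"ann", "bob", "cy"}
officer_of = {("ann", "S1"), ("bob", "S1")}
reports_to = {("ann", "bob"), ("bob", "cy")}
has_cabin_on = {("ann", "S1"), ("bob", "S1"), ("cy", "S1")}
passenger_on = {("cy", "S1")}
```

Refinement is deliberately not modeled here: as the slides note, "P refines Q" is more than implication, so it cannot be recovered from the extensions alone.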
Qualifying Properties
• Qualifying property
– a property whose existence or value determines
membership in a given subtype
– existence:
If there exists y such that Q(d,y) then d is an instance of S
– value:
If Q(d, ‘red’) then d is an instance of S
– functional value:
Let y = Q(d); if Greater(y, 1) then d is an instance of S
– the domain (D) of property Q must be a supertype of S
Q may be optional (cardinality 0..<something>) on D
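A small sketch of the three kinds of qualification, assuming a single-valued optional property stored as a Python dict from entity to value. The function names and the sample data are hypothetical.

```python
def qualify_by_existence(q, domain):
    """d is in S if d has any value for Q."""
    return {d for d in domain if d in q}


def qualify_by_value(q, domain, value):
    """d is in S if Q(d) equals the given value."""
    return {d for d in domain if d in q and q[d] == value}


def qualify_by_function(q, domain, predicate):
    """d is in S if a function of Q(d) holds, e.g. Greater(y, 1)."""
    return {d for d in domain if d in q and predicate(q[d])}


things = {"t1", "t2", "t3"}
color = {"t1": "red", "t2": "blue"}        # optional (0..1): t3 has no color
part_count = {"t1": 2, "t2": 1, "t3": 4}

colored = qualify_by_existence(color, things)
red_things = qualify_by_value(color, things, "red")
assemblies = qualify_by_function(part_count, things, lambda y: y > 1)
```

In each case the subtype extension is computed from the supertype's instances, which matches the requirement that the domain of Q be a supertype of S.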
Derived Properties
• Derived Property:
a property created by “joining” relationships
– represented by a “path through the semantic network”
• Example:
– vehicle and model are entity types
– weight is a value type (a quantity)
– attribute: model-has-gross-weight(model, weight)
– relationship: vehicle-has-model(vehicle, model)
– derived property: vehicle-has-gross-weight(vehicle, weight)
= vehicle.vehicle-has-model[model].model-has-gross-weight[weight]
= { (vehicle, weight) : (exists m)
(and vehicle-has-model(vehicle,m)
model-has-gross-weight(m,weight)) }
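The set comprehension above translates almost directly into Python. This is a sketch with made-up instance data; the `join` helper is an assumed name, not part of any modeling language.

```python
def join(first, second):
    """Compose two relationships: (a, c) whenever some b links them."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


vehicle_has_model = {("car1", "m1"), ("car2", "m1"), ("truck1", "m2")}
model_has_gross_weight = {("m1", 1500), ("m2", 7500)}

# Derived property: the path vehicle -> model -> weight.
vehicle_has_gross_weight = join(vehicle_has_model, model_has_gross_weight)
```

The intermediate model `m` is existentially quantified away, exactly as in the definition: only the (vehicle, weight) pairs remain.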
Information Modeling: Identifiers
• Identifiers/keys distinguish instances of an entity class
– simple key: a property whose inverse is “functional”
• for each v in the range, there exists at most 1 d in the domain
such that P(d,v)
• almost always an attribute (value type)
– relative uniqueness
• property P is unique within property Q
• for each p in the range of P and each q in the range of Q,
there exists at most 1 d in the domain such that P(d,p) AND Q(d,q)
• p is usually a value, and q is usually an entity such that
for each d there exists exactly 1 q such that Q(d,q)
• selection of a key for q gives rise to a “composite key” for d
by “concatenating” (making a tuple of) the keys
– a key property must apply to all things in the class
– a given entity class may have multiple identifier/key properties
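A sketch of the two uniqueness notions above, with properties as sets of (instance, value) pairs. The function names and the course/section data are assumptions for illustration.

```python
from collections import Counter


def is_simple_key(p, instances):
    """P applies to every instance and its inverse is functional:
    each value belongs to at most one instance."""
    owners_per_value = Counter(v for (_, v) in p)
    covered = {d for (d, _) in p}
    return instances <= covered and all(n == 1 for n in owners_per_value.values())


def is_unique_within(p, q):
    """P is unique within Q: no two instances share both a P value
    and a Q value."""
    owner = {}
    for (d, pv) in p:
        for (d2, qv) in q:
            if d2 == d and owner.setdefault((pv, qv), d) != d:
                return False
    return True


sections = {"s1", "s2", "s3"}
section_number = {("s1", 1), ("s2", 2), ("s3", 1)}
section_of_course = {("s1", "math101"), ("s2", "math101"), ("s3", "phys201")}
```

Section number alone is not a key, because `s1` and `s3` share the number 1, but it is unique within the course. Pairing the course's own key with the section number then gives the composite key described above.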
Dependencies
• Entity type E is “dependent on” property P(e,x) iff
(for all e) E(e) implies (exists x) P(e,x)
– that is, the e cannot exist unless the x exists
– a meta-property of a relationship between instances
• sometimes modeled as “dependent on class X”
• in IDEF1-X, E is a “weak entity type” and P “supports” E
– not all “mandatory” properties are dependencies
– dependency is an “intrinsic” property
– dependency is an invariant property: the x never changes
• Example
– course-has-section(course, section) has inverse
section-of-course(section, course)
– section is dependent on section-of-course
the section cannot meaningfully exist without the course
Aggregates
• Entity type E “aggregates” property P(e,m) iff
every instance e of E is a “collection” and
P(e,m) is the relationship of e to its members
– aggregate is a metaproperty of E that is based on P
– P is a “logical” or “virtual” “part of” relationship
• Problem: e is only instantaneously a “set”
– the identity of e does not change if a member is deleted
– no axiom is associated with this metaproperty
• Example:
– Entity type Convoy, with property convoy-includes-ship(c,s)
• Convoy aggregates convoy-includes-ship
• by extension, Convoy “is aggregation of” Ship
Composition
• Entity type E “is composed by” properties Pi(e,ci) iff
– each instance e of E is constructed from the ci such that Pi(e,ci)
– each Pi relates an instance e to one (or more) of its components
– for each i, there are at least n distinct ci such that Pi(e,ci),
where n is the minimum cardinality of Pi
(otherwise e is not an instance of E)
– for each ci such that Pi(e,ci), if Pi(x,ci) then x = e
(a ci belongs to at most one e)
– some models make the ci dependent on the inverse of Pi
– “composite” is a metaproperty of E that is based on the Pi
– each Pi is a “physical” “part of” relationship
• Example
– entity type Book is composed by book-has-chapter(b, c)
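A rough check of the two composition conditions above for a single composing property, held as a set of (composite, component) pairs. The function name, the default minimum cardinality, and the book data are hypothetical.

```python
from collections import Counter


def is_valid_composition(part_of, composites, min_card=1):
    """Each composite has at least min_card components, and each
    component belongs to at most one composite."""
    parts_per_composite = Counter(e for (e, _) in part_of)
    owners_per_part = Counter(c for (_, c) in part_of)
    enough_parts = all(parts_per_composite[e] >= min_card for e in composites)
    exclusive = all(n == 1 for n in owners_per_part.values())
    return enough_parts and exclusive


book_has_chapter = {("b1", "ch1"), ("b1", "ch2"), ("b2", "ch3")}

# The same chapter placed in two books violates composition.
shared_chapter = book_has_chapter | {("b2", "ch1")}
```

An aggregate such as Convoy would not be held to the exclusivity test, which is one way to see the difference between the "logical" and "physical" part-of relationships.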
Validity Rules
• Validity Rule =
arbitrary first-order logic expression
involving instances, classifiers and properties
that must hold in a “valid” information base
• Languages have limitations on expressibility
– instance references
– existentials
– “special functions”
– nature of comparisons
• NOT inferencing rules
– cannot conclude x should be classified as an instance of E
conclusion E(x) means invalid information base if NOT E(x)
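The difference between the two readings of the same rule can be sketched as follows. The function names, the rule "anything with a captain is a Ship", and the data are all assumptions made for illustration.

```python
def check_validity(rule_applies, members_of_e, universe):
    """Validity reading: report each x for which the rule requires E(x)
    but the information base does not assert it."""
    return {x for x in universe if rule_applies(x) and x not in members_of_e}


def infer(rule_applies, members_of_e, universe):
    """Inferencing reading: conclude E(x) and add it to the class."""
    return members_of_e | {x for x in universe if rule_applies(x)}


universe = {"a", "b", "c"}
has_captain = {"a", "b"}
ship = {"a"}


def rule(x):
    # Hypothetical rule: x has a captain, so x must be a Ship.
    return x in has_captain


violations = check_validity(rule, ship, universe)   # information base is invalid
inferred_ship = infer(rule, ship, universe)         # what a reasoner would conclude
```

Under the validity reading, `b` makes the information base invalid; under the inferencing reading, `b` is simply classified as a Ship.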
Aside: Object Modeling
• Ad hoc models of state
– properties needed for some set of software applications
• Object is to design software programs
– Object templates (class models)
– Attributes, Relationships (associations, pointers)
– Superclasses and “inheritance”
– Validity rules
– ‘Operations’ = actions on the object state
• No real association to process
– No keys, no qualifiers
Some known Issues
• Diverse keys for union types
– identity of individuals determined by type and type-specific keys
• Variance of cardinality constraints over time/state
– can be stated as validity rules (only)
• Intermediate states (transactions)
– validity rules don’t apply while the info base is in transition
during certain times in a process
• Localization of properties
– subtype A always has property P, subtypes B and C never do
– model property P local to A?
– model optional property P to common supertype S,
and use its existence to define (“qualify”) subtype A
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
OWL Features – Classification
• Classification
– Entity type: Class
– Value type: Class
• enumeration: Y (all values from)
• name: N (datatype string)
• text: N (datatype string)
• quantities: N (numeric datatypes)
• truth values: Y
– Data type: Y
– Multiple classification: Y
– Default overlap: Y
– Classification change: not applicable
OWL Features – Type relationships
• Type relationships
– subtype: Y
– multiple supertypes: Y
– exclusion: Y
– covering: Y
– relative complement: Complement, Difference
– choice/union: Y
– intersection: Y
OWL Features -- Properties
• Properties
– Attributes: Datatype property
– Relationships: Object property
– Inverse: Y
– Multiplicity/Cardinality: Y
– Set of property instances: Y
– Single domain, range: Y
– Mutable property: not applicable
OWL Features -- Metaproperties
• Property relationships
– Property implies property: Y
– Property excludes property: Y
– Properties cover entity type: N?
– Relationship implies relationship: Y
– Relationship excludes relationship: Y
– Relationship refines relationship: N (only implies)
• Derived properties: some
• Identifiers: functional property
• Dependencies: N
• “Part of”, Aggregates, Composites: N
OWL Features – Definitions and Rules
• Qualifying properties: Class definition
– based on presence: Y
– based on value equal: Y
– based on function of value: N
• Validity rules: N
• Inferencing rules: N
OWL as Info Modeling Language
• OWL has all the major features
• OWL is formally defined
– other information modeling languages have formal models
ascribed to them after the fact (not standard interpretations)
• OWL has formal classification inferencing
– but it is not much stronger than languages like ORM
– not even strong in “datatype reasoning”
• OWL needs:
– Identifier/Key metaproperties – identification of individuals
– Relative uniqueness rules
– Validity rules
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
Information Analysis Approach
• Interview
– obtain initial information from the experts
• Formalize
– formally capture what the experts said
• Design
– reorganize the formal model to provide insight
• Review
– walk the experts through the designed model
– examine one or more use cases
– solicit questions, concerns, variants
• Revise
– correct the design to accommodate the clarifications
Information Analysis Method
• Identify the processes to be supported
• Identify the principal business classifications of things
used/modified by the processes
• Identify the properties of those things
that are used/modified by the processes
• Identify types, specializations and generalizations
that collect uses and properties
• Determine type-to-type relationships
• Associate properties with the classifications
• Determine cardinality constraints
• Distinguish entity types from value types
• Identify the keys for individuals
• Specify validity rules
Phases covered: Interview, Formalize, Design
Process Modeling
• Business Process Modeling
– Activities and control flows
– Decision points and rules
– Process decomposition
– Data/Message/Material flows
– Information as ‘documents’
– Languages: BPMN, ARIS, METIS, ...
Binding process to information
• Actions of process on entities
– creating an entity instance
– creating a relationship instance between entity instances,
usually as a property having a “domain” (or “subject”)
and a “range” (or “object”)
– changing one or more properties of an entity instance or
relationship
– destroying an entity instance
– destroying a relationship instance
– using a property of an entity instance
Relating Process to Info Requirements
• USE defines an information requirement
• All other actions define EVENTS
– Process models can/should represent impact of events
• Use and Events can be aggregated or decomposed
– Entity/Class level (UML)
– Specific instance
– Aspect (a collection of properties)
– Property
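A minimal sketch of how process actions might be sorted into information requirements and events. The enum members paraphrase the actions a process can take on entities; the `kind` function, the process steps, and the targets are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    CREATE_ENTITY = "create entity instance"
    CREATE_RELATIONSHIP = "create relationship instance"
    CHANGE_PROPERTY = "change property"
    DESTROY_ENTITY = "destroy entity instance"
    DESTROY_RELATIONSHIP = "destroy relationship instance"
    USE_PROPERTY = "use property"


def kind(action):
    """USE defines an information requirement; every other action is an event."""
    return "requirement" if action is Action.USE_PROPERTY else "event"


# Hypothetical process steps: (activity, action, target in the model)
steps = [
    ("book passage", Action.CREATE_ENTITY, "Booking"),
    ("book passage", Action.USE_PROPERTY, "Ship.capacity"),
    ("assign cabin", Action.CREATE_RELATIONSHIP, "passenger-has-cabin"),
    ("cancel booking", Action.DESTROY_ENTITY, "Booking"),
]

requirements = sorted(
    {target for (_, action, target) in steps if kind(action) == "requirement"}
)
events = sorted(
    {(action.name, target) for (_, action, target) in steps if kind(action) == "event"}
)
```

The targets here are recorded at the entity and property levels; the same bookkeeping could be aggregated or decomposed to any of the levels listed above.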
Conclusions
• Emphasis on supported processes as driver
– scopes the model in breadth and depth
– orthogonal to semantic web concerns
• Model for understanding
– model must be meaningful to the domain experts
– correct formal interpretation is important
– implementation is a separate engineering activity
• OWL language is strong
– formal logic basis
– almost all known features (necessary and optional)
– identifiers are a critical concern
– validity rules will be required