Challenges in Natural Language Processing

Fundamentals/ICY: Databases
2012/13
WEEK 7
John Barnden
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham, UK
Reminder of Week 6
A Generalization Hierarchy
with Overlapping Subtypes
Actual Realization of Subtyping
A supertype maintains a 1:1 relationship with each subtype
[optional in the super-to-sub direction and
mandatory in the other]
New for Week 7
More-Than-Binary Relationships
in ERMs and ERDs
Relationship Degree
Binary relationship [my definition]

Two entities (“entity occurrences”) are associated by each
instance of the relationship, as in all previous examples in
lecture slides.
Ternary relationship [my definition]

Three entities are associated by each relationship instance.
Etc.
NB: the entities associated with each other need not be
of different types.

Indeed, could have an entity (occurrence) associated with
itself: e.g. an “employs” relationship where someone can
employ him/herself. Doesn’t violate binarity.
Relationship Degree:
Terminology Problems
The standard terminology & definitions relating to
“relationship degree” (see the textbook) are
mathematically anomalous.
The degree of a relationship should be the number of
entities (“entity occurrences”) that are associated by
each instance of the relationship, no matter how many
entity types are involved.
But the definitions used in the books count the number
of entity types (“entities” in the abbreviated language
used), departing from normal mathematical practice.
Terminology Problems contd.
A “unary” relationship is standardly defined as being a
relationship where the entity occurrences are all within
the same entity type (“entity”).

E.g., a “manages” relationship between employees.
A better name is “recursive” (also used in the books).

Above example is binary recursive, under my definition.
The books define a binary relationship as one relating
two different types, even when there are more than two
entity occurrences per relationship instance.
Similarly ternary and three different types.
Different Degrees
The “unary” case is badly named. It’s really a type of binary. The
word “recursive” [later] is better.
Caution about Next Slide
Astute members of the class noted an error in the
following slide.
The FUND table should just be descriptions of different
funds, such as Heart fund, a Cancer fund, and so on.
FUND_ID should be the PK.
 At some point I’ll replace the example with another.
For now, please see the doctor/patient/drug prescription
example from the textbook, pp.337-339.
Tables for a Ternary Relationship
CFR is just like a bridging entity type for a binary M:N relationship,
but has 3 links to other types instead of 2
Recursive Relationships in ERMs
and ERDs
Recursive Relationships
A recursive relationship links entities of the same
type.

E.g.: marriage, management, parthood, …
Can have partial recursion: just some of the entity
types involved in a relationship (that is ternary or of
higher degree in my sense) could be the same.
Recursive Relationships: Symmetry
 A relationship R between entity types E,F (possibly the same)
is symmetric iff:
if eRf then fRe (i.e., if R relates entity e of type E to entity f of
type F, then it must also relate f to e.)
E.g.: marriage, being-sibling-of.
 Recursive relationships cause major redundancy problems
when symmetric.
 Symmetry only makes sense in the 1:1 and M:N cases.
 ((Can generalize the points to partly-recursive cases.))
An ER Representation
of Recursive Relationships
I don’t know why the Crow’s Ft model uses two links in each case
(necessarily non-symmetric) 1:M recursive:
“EMPLOYEE Manages EMPLOYEE”
Just a standard 1:M implementation except linking a table to itself.
No redundancy problem.
non-symmetric M:N recursive:
“PART Contains PART”
The COMPONENT entity type is just a bridging type, linking PART to
itself. NB: its first two columns both refer to PART’s PK but must be
differently named. No redundancy problem.
symmetric (1:1) recursive relationship:
“EMPLOYEE Married to EMPLOYEE”
Symmetry is the Problem
A non-symmetric 1-1 relationship would not have
the problem shown on previous slide.
A symmetric M:N relationship would have a
redundancy problem, whether implemented as in
the 1-1 case or by a bridging table.

E.g.: being-sibling-of.
symmetric (1:1) recursive relationship:
redundant & non-redundant implemntns
1) As previously—redundant .
2) MARRIED_V1 is just a bridging entity type: still redundant.
3) MARRIAGE together with MARPART act as a sort of bridge.
Non-redundant.
Symmetric M:N, etc.
Previous slide: explained and commented upon in the
textbook, pp.353-355.
Method 3 on previous slide can straightforwardly be
generalized to:
symmetric recursive M:N relationships
(( (partially-)symmetric&recursive, etc. relationships
that link more entities together, whatever the
connectivity --- M:N:P, 1:1:1, M:1:P, … ))
Summary: Creating ERMs/ERDs
Designing an ER model for a database is an iterative
process, because, e.g.:

As you proceed, you think of new ways of conceiving what’s
going on (much as in ordinary programming)

Multivalued attributes need to be re-represented eventually

M:N relationships can be included as such at an early stage, but
usually need to be replaced by bridging entity types later

1:1 relationships raise a red flag: may indicate poor design,
especially when mandatory both ways.
• But standard in supertype/subtype representation, and natural in
recursive relationships like “married-to”.


Special supertype/subtype notation needs eventually to be
converted into more standard diagram notation
Conversion of ERM portions to a “Normal Form” – LATER.
MATHEMATICAL
BACKGROUND TO TABLES
Mathematical “sets”: Basics
 A “set” is an unordered collection of items of any sorts (people,
numbers, numerals, shoes, atoms, strings of characters, databases,
sets, blades of grass, …) without any duplication of items.
The items are called “elements” or “members”.
 S = {34, JAB, 59, UoB}, where “JAB” is a name for me and “UoB”
is a name for this university,
means that
S is the set consisting of (exactly) the following four items:
the abstract number 34, me, the abstract number 59, this
university.
Basics, contd
 {34, JAB, 59, UoB} = {UoB, 59, 34, JAB, JAB, 34}

Order of writing the members doesn’t matter; duplication in the writing
doesn’t duplicate the member.
 A set can be infinite (e.g., the set of all whole numbers).
 A set can contain just one member (e.g. the set whose only element
is your favourite pencil). Singleton set.

It’s different from the member itself..
 There’s a set with no members at all: the “empty set”, usually
notated as , but can also be written { }.

Somewhat analogous to zero, or a new committee which has no members yet.

There is only one empty set (rather than an empty set of numbers, an empty
set of pencils, etc.)
Another Notation
 {n | n is an integer, n > 301} =
“The set of n such that n is an integer and n > 301.”
(Actually, this notation is a slight simplification.)
The set is the same as that denoted by, for instance,
{n | n is an integer, n  302}.