Language, Logic and Data Bases

Chapter 6
Language, Logic and Data Bases
6.1 Some Logical Traps.
If all the queries one could formulate were limited to the simple ones in the “Query by Example”
mode, the whole idea of a data-base would not be very useful. In particular, we need unambiguous
ways of expressing rather complex questions.
Unfortunately, complexity and ambiguity have a way of going hand-in-hand. For example, the
IMPLEMENTATION of the query answering system differs between Microsoft Works 4.0 for the
Macintosh and Microsoft Works 3.0 for Windows 3.1. (Microsoft Works is an integrated
software package containing a Word Processor module, a Spreadsheet module and a Database
module - less powerful, and cheaper, than MS Office.) This makes no difference for a simple
query - one containing no ANDs or ORs - since you don’t have much of a choice on how you interpret it.
It DOES make a difference when the query requires the satisfaction of several different conditions
linked by ANDs and ORs. Let’s start with a simple data-base:
Prefix   Last Name   First Name   State   Town      Zip Code
Mrs.     Brown       Mary         MI      Hackley   87654
Ms.      Terry       Anne         MI      Hackley   87654
Ms.      Green       Emily        MI      Lansing   87653
For example, the query
Prefix = Ms. or Prefix = Mrs. and Town = Hackley,
which might correspond to ”give me all the ladies living in Hackley”, leads to TWO distinct
sets of answers: on the Macintosh it will retrieve the answer to the query
(Prefix = Ms.) or ((Prefix = Mrs.) and (Town = Hackley)),
thus returning the WHOLE data-base, while under Windows 3.1 it will retrieve the answer to
the query
((Prefix = Ms.) or (Prefix = Mrs.)) and (Town = Hackley),
thus returning the first and second record only.
The two sets of answers ARE DIFFERENT, because the queries are different.
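To see the two readings side by side, here is a minimal sketch in Python that applies each of them to the three records of the small data-base. The dictionary keys and the function names are our own choices for illustration; they are not taken from Microsoft Works.

```python
# The three records of the small data-base above (only the fields the query uses).
records = [
    {"prefix": "Mrs.", "last": "Brown", "town": "Hackley"},
    {"prefix": "Ms.",  "last": "Terry", "town": "Hackley"},
    {"prefix": "Ms.",  "last": "Green", "town": "Lansing"},
]

def macintosh_reading(r):
    # (Prefix = Ms.) or ((Prefix = Mrs.) and (Town = Hackley))
    return r["prefix"] == "Ms." or (r["prefix"] == "Mrs." and r["town"] == "Hackley")

def windows_reading(r):
    # ((Prefix = Ms.) or (Prefix = Mrs.)) and (Town = Hackley)
    return (r["prefix"] == "Ms." or r["prefix"] == "Mrs.") and r["town"] == "Hackley"

mac_answer = [r["last"] for r in records if macintosh_reading(r)]
win_answer = [r["last"] for r in records if windows_reading(r)]

print(mac_answer)  # ['Brown', 'Terry', 'Green']  (the whole data-base)
print(win_answer)  # ['Brown', 'Terry']           (first and second record only)
```

Note that Python itself has to make a choice when the parentheses are left out: its `and` binds more tightly than its `or`, so the unparenthesized query would be read the way the Macintosh version reads it.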
One should be able to expect consistency in the interpretation of the queries, particularly in
software systems that claim to be versions of the same thing running on different machines, but it
appears that we have no such luck. You might reasonably expect, given this example, that moving
from one database engine (Access is one such) to another will lead to all kinds of complications.
Fortunately, the user interface of MS Access takes care of the problem just mentioned: it helps the
user to avoid the ambiguity. In order to understand the question better, we will look at a slightly
more complex database, and we will generate a number of queries. Two of them will correspond to
the two queries discussed above. In the process, we will learn a little more about how to generate
”Queries by Example” in Access.
Figure 6.1 : A small Data Base.
If we want an answer to the request: ”give me all the ladies living in Hackley”, we must create the
following query:
Figure 6.2 : Query Design View.
Which will give us the ”answer table”:
Figure 6.3 : Query Response.
An examination of the original database lets us conclude that we have the right answer.
We have taken care of the query:
((Prefix = Ms.) or (Prefix = Miss) or (Prefix = Mrs.)) and (Town = Hackley).
We must observe that there are several different ways in which we can associate the various parts
of the query. We could also alter the order of the terms and THEN apply the different ways to
associate them. After all we asked just ONE of the many possible questions.
Let us look back at the query that the Macintosh version of MS Works gave us:
(Prefix = Ms.) or ((Prefix = Mrs.) and (Town = Hackley)).
How do we rewrite it in Access?
Figure 6.4 : A New Query.
From which we get the (correct) answer:
Figure 6.5 : Query Response.
We can see that, if we want to find all those individuals who satisfy any one of several compound
conditions, we simply write each different condition on a different row of the Criteria section of the
Query Design table. Each row can consist of multiple conditions – one per column – all of which
must be satisfied simultaneously. Each individual column condition can contain OR, AND and
other operators.
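The rule just described (rows are alternatives, columns within a row must all hold) can be sketched in a few lines of Python. This is only a model of the rule as stated in the text, not a description of how Access is implemented; the function `matches`, the record layout and the record for "White" are hypothetical.

```python
# Hypothetical stand-in for the Criteria section of a query grid:
# each dict is one row; a row maps a column name to a test on that column.
def matches(record, criteria_rows):
    # A record is returned if ANY row is satisfied (rows are OR-ed) ...
    return any(
        # ... and a row is satisfied only if ALL its column tests hold (AND-ed).
        all(test(record[column]) for column, test in row.items())
        for row in criteria_rows
    )

# (Prefix = Ms.) or ((Prefix = Mrs.) and (Town = Hackley))
criteria_rows = [
    {"prefix": lambda v: v == "Ms."},
    {"prefix": lambda v: v == "Mrs.", "town": lambda v: v == "Hackley"},
]

people = [
    {"prefix": "Mrs.", "last": "Brown", "town": "Hackley"},
    {"prefix": "Ms.",  "last": "Green", "town": "Lansing"},
    {"prefix": "Mrs.", "last": "White", "town": "Lansing"},  # made-up record
]

selected = [p["last"] for p in people if matches(p, criteria_rows)]
print(selected)  # ['Brown', 'Green']
```

Writing the query as a list of rows forces us to decide, for every condition, which alternative it belongs to, and that is exactly what removes the ambiguity.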
All this may sound rather complicated, and, beyond the simplest level of queries, it is. There
are, simply, too many ways of asking questions: we need some tools to keep track of what we are
actually asking. And we have no choice but to go back to LANGUAGE.
We must try to deal with the question of interpretation of statements, and therefore the question
of making language sufficiently precise so that most ambiguity is avoided. It brings us back to
the need for a programming language, and the database community seems to have settled on SQL
(Structured Query Language). The major problem, though, remains: how can we guarantee that
we are asking the question that we want answered, rather than SOME OTHER question that
appears sufficiently similar to fool us, but not the computer system?
6.2 How do we use language PRECISELY?
This is a question that has bothered people for a very long time - and has clearly NOT been
resolved as of this writing. Lawyers have been bothered by it for at least 4000 years - the stele
of Hammurabi bears witness. Philosophers for at least 2500. And we all have been bothered by
”word problems” that we had to solve with algebra.
The first to have systematically codified the ”rules of thought” was probably Aristotle - around
350 BC or so - and not much was done to improve on him for 2000 years.
Starting at the end of the 1600s, people began to question the system of Aristotle - too tied to
”Church reasoning” for their taste, and inadequate to express the new scientific theories based on
new mathematical models and physical understanding.
Mathematical Logic was ”invented” in the mid-1800s by the Englishman George Boole and
eventually put to use on the foundations of Mathematics, Physics and Philosophy.
The thought of ”automating reasoning” had actually occurred before (Leibnitz - 1700), but it
was only with the invention of the ”tabulating machines” that were introduced at the end of the
1800s (and used in the 1890 U.S. census) that the thought could be made real. Furthermore, it is
only with the modern electronic computer that more than trivial ”reasoning” can be carried out.
Reasoning is HARD and it requires lots of processing power.
It also requires that we give very precise meanings to our statements. Actually, this is not
quite correct: we need precision if we require consistency and predictability. If we can live without
them, then sloppiness is perfectly acceptable in one’s reasoning - we are all living proof of this last
statement. Dead proofs of the contrary don’t count: they just got too sloppy.
6.3 Some Definitions.
(some of this material is taken from: C. D. Miller and V. E. Heeren, Mathematical Ideas - an
Introduction, Scott, Foresman, 1969)
We are really back to language, but with a slightly different twist from the one presented earlier:
we are less concerned with trying to determine whether we have a legal sentence (syntax), which
could, at least in principle, be carried out via our abstract machines, than with determining the
meaning of a complex sentence. We will assume that all the stuff we call ”sentence” satisfies the
requirement of being syntactically well-formed – which is another way of saying that appropriate
abstract machines CAN be constructed – but we need some additional conditions.
What is a SENTENCE?
Definition : A sentence is a (syntactically legal) statement which is either TRUE or FALSE.
Examples :
The sky is blue.
The lights in this room are off.
1 + 1 = 3.
There are a number of syntactically legal statements about which no truth or falsity can be
determined. Examples might be:
Eat your dinner.
Go to school.
What am I doing here?
These are not sentences in this meaning of sentence. They are certainly sentences to a linguist
– in the syntactic sense – but maybe not to a logician who needs to associate one of the two labels,
TRUE or FALSE, to every one of them. They are not declarative sentences: they are imperatives,
questions, etc. This leads us to the problem of how to associate such labels to a syntactically legal
statement – unfortunately this also takes us into deep philosophical questions. From the point
of view of COMPUTATION, all we need is SOME way to decide, so that the computation can
proceed. The obvious way is to rely on our ”intuitive understanding of simple statements”, and
leave it at that (with apologies to four+ millennia of philosophers).
Definition : Statements with ”variables” – e.g., x + 5 = 10 – will be called OPEN SENTENCES,
since they become true or false whenever a specific value (numeric in this case) is put in place of
the variable. ”y is at home” would be another such open sentence, which would be true or false
depending on who y is. They are syntactically legal sentences in which ONE (or, possibly, more)
of the words is left ”unfilled” : the x, or y (called a variable) is just a ”place-holder” for an object
of the correct type.
The open sentences serve a very important role: they allow us to ask questions in ways that are
computationally acceptable. The open sentence x + 5 = 10 corresponds to the question ”What
are the values of x that, when added to 5, will give a result of 10?”. In some sense, the only
”acceptable” questions are those that can be recast as open sentences. Imperative sentences have
no ready way to be recast.
As another example, recall that the Employee database contains one field with “Date Hired”. In
the “query by example”, you wrote down, in the correct column, <1/1/91. What you constructed –
by means of the inequality – was the open sentence “y such that the Date Hired field of y contains a
value less than January 1st 1991”. It is the function of the database engine to find ALL occurrences
of the record y – within its tables – that satisfy the condition. Another way of saying this is that
the database engine must find all occurrences of y that will make the open sentence TRUE.
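As a rough illustration, an open sentence behaves like a function that returns TRUE or FALSE once the variable is filled in, and the “engine” simply tries every record. The sketch below assumes a tiny, made-up employee table; the names, the dates and the function `hired_before_1991` are hypothetical.

```python
from datetime import date

# Hypothetical employee records; names and dates are made up for illustration.
employees = [
    {"name": "Adams", "date_hired": date(1989, 6, 15)},
    {"name": "Baker", "date_hired": date(1991, 1, 1)},
    {"name": "Clark", "date_hired": date(1990, 12, 31)},
]

def hired_before_1991(y):
    # The open sentence: "the Date Hired field of y is less than January 1st 1991".
    return y["date_hired"] < date(1991, 1, 1)

# The "engine": try every record y and keep those that make the sentence TRUE.
answer = [y["name"] for y in employees if hired_before_1991(y)]
print(answer)  # ['Adams', 'Clark']
```

Baker, hired exactly on January 1st 1991, is left out: the condition is a strict “less than”.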
Single condition (i.e., simple) sentences are fairly easy to manage: just put the condition on the
single field used for the check, and watch the right list being returned. The very first example
showed that even the simplest compound conditions can be tricky. This is the problem we must
tackle.
6.4 Connectives.
and – or – not – if...then
are called ”connectives” and sentences that contain these words are called COMPOUND SENTENCES, while the ones that don’t are called SIMPLE.
Examples of simple sentences:
Death Valley is in Massachusetts.
The average Harvard Freshman has an SAT score of 1310.
It is dark here.
Examples of compound sentences:
No person here has visited Death Valley.
If I have an SAT score of 1600 then I will apply to Harvard.
3 + 4 = 7 and 1 + 1 = 3.
For reasons of convenience we use some special symbols to represent the connectives:
and = ∧
or = ∨
not = ¬
Also for reasons of convenience we will denote simple sentences by lower case letters: p, q, r, s, etc...
In this notation, p ∧ q will denote the compound sentence ”p and q” where p and q are simple
sentences.
We discussed briefly the notion of TRUTH for simple sentences: we just attach labels to the
statements, depending on some empirical observation or some other criterion. Once we start using
connectives - and thus dealing with an entity that consists of more than one simple sentence - we have to give rules which allow us to derive the truth of the complex entity as soon as we
know the truths of the components. This, in turn, will allow us to answer the question: when
are two compound sentences containing the same SIMPLE sentences ”equivalent”? And this will,
finally, allow us a better understanding of what is happening in the two different implementations
of Microsoft Works.
How does TRUTH carry into compound sentences?
If we start with two or more simple sentences whose truth or falsity we know, how do we determine
whether compound sentences that contain them are true or false?
The solution has been to construct TRUTH TABLES associated with the connectives. These
truth tables are ”arbitrary”, although they attempt to reflect ”empirical reality”.
For example, the connective ∧ (and - also called conjunction) has the following truth table:
p       q       p ∧ q
True    True    True
True    False   False
False   True    False
False   False   False
Which does agree with our ”intuitive” notion of AND.
The connective ∨ (or - also called a disjunction) has the truth table:
p       q       p ∨ q
True    True    True
True    False   True
False   True    True
False   False   False
Which agrees with our intuitive notion of OR.
The connective ¬ (not - also called a negation) has the truth table:
p       ¬p
True    False
False   True
Which agrees with our intuitive notion of NOT.
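Most programming languages build these three tables in. As a small sketch, Python's own `and`, `or` and `not`, applied to `True` and `False`, reproduce them; the function and table names below are ours.

```python
from itertools import product

# Python's `and`, `or`, `not` follow the truth tables above on True/False.
def conj(p, q):
    return p and q

def disj(p, q):
    return p or q

def neg(p):
    return not p

# One entry per row of the table, in the same order: TT, TF, FT, FF.
and_table = {(p, q): conj(p, q) for p, q in product([True, False], repeat=2)}
or_table = {(p, q): disj(p, q) for p, q in product([True, False], repeat=2)}
not_table = {p: neg(p) for p in [True, False]}

for (p, q), value in and_table.items():
    print(p, q, value)
```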
Some Examples.
And - each simple sentence will be surrounded by parentheses:
a) (North America is an island) and (3 - 2 = 1)
b) (The aardvark is a mammal) and (Harry Truman was President)
Truth Values :
a) The first sentence is FALSE, the second sentence is TRUE. The truth table for AND
gives FALSE for the conjunction.
b) Both sentences are TRUE. The truth table gives TRUE for the conjunction. Notice
that there is no need for the two sentences to be in any way related.
Or :
c) (To graduate) you must take Math or Computer Science.
This is shorthand for the compound sentence:
(You must take Math) or (you must take Computer Science)
Since I believe that satisfying a Math requirement is necessary for graduation, I must believe that
the sentence is true regardless of the truth of the second simple sentence.
d) The price of admission to Disney World is either less than $50 or more than $20.
Is this sentence TRUE?
It is made up of the two simple sentences:
i) The price of admission to Disney World is less than $50.
ii) The price of admission to Disney World is more than $20.
What can we say and why? Is this sentence ALWAYS true, regardless of what the admission price
is?
More Examples.
Consider the compound sentence
(p ∨ q) ∧ (r ∨ p)
where p, q and r are simple sentences whose truth values are:
p = F, q = T, r = T.
What is its truth value? In other words, given truth values for each of the components p, q and r,
what is the truth value of the compound sentence?
We start with a ”general” approach: try ALL possibilities. The GENERAL truth table for the
sentence is:
p   q   r   p ∨ q   r ∨ p   (p ∨ q) ∧ (r ∨ p)
T   T   T   T       T       T
T   T   F   T       T       T
T   F   T   T       T       T
T   F   F   T       T       T
F   T   T   T       T       T
F   T   F   T       F       F
F   F   T   F       T       F
F   F   F   F       F       F
To answer the question, just look in the row with p = F , q = T , r = T : this has T under
(p ∨ q) ∧ (r ∨ p), so the desired value is T (i.e., TRUE).
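The “try ALL possibilities” approach is easy to mechanize. The sketch below, with names of our own choosing, builds the general table for this sentence and then looks up the one row we were asked about.

```python
from itertools import product

def sentence(p, q, r):
    # (p or q) and (r or p)
    return (p or q) and (r or p)

# One row per assignment of truth values, in the order TTT, TTF, TFT, ..., FFF.
table = {(p, q, r): sentence(p, q, r) for p, q, r in product([True, False], repeat=3)}

for (p, q, r), value in table.items():
    print(p, q, r, value)

# The specific question: p = F, q = T, r = T.
print(table[(False, True, True)])  # True
```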
We can now return to the query that was asked of the two variants of Microsoft Works and that
returned different answers.
We had three simple sentences, say p, q and r. They were OPEN sentences (i.e. they contained
variables, since they were really questions), but this will not make much difference to our analysis.
What we must do is construct a full truth table for p, q, r, (p ∨ q) ∧ r, p ∨ (q ∧ r), since these are the
two possible ”interpretations” of the original query. Here it is:
p   q   r   p ∨ q   (p ∨ q) ∧ r   q ∧ r   p ∨ (q ∧ r)
T   T   T   T       T             T       T
T   T   F   T       F             F       T
T   F   T   T       T             F       T
T   F   F   T       F             F       T
F   T   T   T       T             T       T
F   T   F   T       F             F       F
F   F   T   F       F             F       F
F   F   F   F       F             F       F
The truth tables are DIFFERENT, as can be seen by comparing the columns for (p ∨ q) ∧ r and p ∨ (q ∧ r) - which means that the two compound sentences are different: we are not asking the same question...
Definition: Two compound sentences made up from the same simple sentences are “the same” - i.e., logically equivalent - when and only when they have the same truth tables.
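The definition suggests a mechanical test: compute both columns and compare them. Here is a minimal sketch under that reading; `truth_column` and `equivalent` are hypothetical helper names, and the two lambdas stand for the two interpretations of the original query.

```python
from itertools import product

def truth_column(sentence, n):
    # Truth values of `sentence` over all 2**n assignments, in a fixed row order.
    return [sentence(*values) for values in product([True, False], repeat=n)]

def equivalent(s1, s2, n):
    # Logically equivalent = identical truth-table columns.
    return truth_column(s1, n) == truth_column(s2, n)

windows = lambda p, q, r: (p or q) and r      # (p or q) and r
macintosh = lambda p, q, r: p or (q and r)    # p or (q and r)

print(equivalent(windows, macintosh, 3))  # False

# Rows (p, q, r) on which the two readings disagree.
disagree = [v for v in product([True, False], repeat=3) if windows(*v) != macintosh(*v)]
print(disagree)  # [(True, True, False), (True, False, False)]
```

The two readings part ways exactly when p is true and r is false: a record with Prefix = Ms. whose Town is not Hackley.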
The Exclusive OR :
Sometimes we need to specify that we want the answer to a compound query - in particular a
DISJUNCTION - for which one of the simple queries is true, but not more than one. The truth
table must then look like this:
p   q   p ⊕ q
T   T   F
T   F   T
F   T   T
F   F   F
Unfortunately, most query systems do NOT provide us with a ready-made exclusive or. How
do we deal with it? In other words, how can we obtain the exclusive or of two sentences using
just AND, OR and NOT? Is it possible? (Yes... we will see another way of reaching the same result
later)
Claim: p ⊕ q = (p ∧ (¬q)) ∨ ((¬p) ∧ q).
Proof: construct the truth table.
p   q   ¬p   ¬q   p ∧ (¬q)   (¬p) ∧ q   (p ∧ (¬q)) ∨ ((¬p) ∧ q)
T   T   F    F    F          F          F
T   F   F    T    T          F          T
F   T   T    F    F          T          T
F   F   T    T    F          F          F
And we are done: compare the first two and last columns of this table with the columns of the
previous table, and you see they are the same.
Can you set this up in Access?
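Before trying it in a query tool, the claim can be checked mechanically. The sketch below builds the exclusive or from AND, OR and NOT only, and compares it, row by row, with Python's `!=` on booleans (which is true exactly when one of the two values is true, but not both). The function name is ours.

```python
from itertools import product

def xor_from_basic(p, q):
    # (p and (not q)) or ((not p) and q)
    return (p and (not q)) or ((not p) and q)

for p, q in product([True, False], repeat=2):
    # Same answer as "exactly one of p, q is true" on every row.
    assert xor_from_basic(p, q) == (p != q)
    print(p, q, xor_from_basic(p, q))
```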
6.4.1 Implication.
One of the important ideas developed by philosophers at first and then by mathematicians (or
vice versa - the two were probably interchangeable at the time), is the idea of implication. The
verbal formulation of implication is given by our ”if...then...”. The logical formulation will use the
same words, but will have to be made precise.
We are used to thinking of implication as containing some causality - for example, many politicians make the claim:
If I am elected, then taxes will go down;
leading you to believe that THEY will be responsible for the desired consequence.
How do we determine the truth status of this sentence? That is, how can we determine, from
the constituent simple sentences, whether the compound sentence is true?
First of all, what are the simple sentences?
a) I am elected.
b) Taxes will go down.
What will the truth table look like?
I am elected   Taxes will go down   If I am elected then taxes will go down
T              T                    ?
T              F                    ?
F              T                    ?
F              F                    ?
How do we fill in the question marks?
First Row: Assume that the politician was elected, and that the taxes did go down. The politician told the truth ((s)he may or may not have had anything to do with the taxes going down, so
no causality can be concluded), so put a T in the third column.
Second Row: Assume the politician was elected, and that the taxes did NOT go down. Our
politician lied, so put an F in the third column.
Third Row: Assume the politician was NOT elected, and that the taxes did go down. No lie
was perpetrated on the voters, so we must put a T in the third column - this may appear somewhat
strange, since the PREMISE is not satisfied, but we are trying to come up with the truth of the
COMPOUND sentence: the empirically observable fact that the politician did not lie is what is
used to conclude the truth of the sentence (we are using: not a lie = truth).
Fourth Row: Assume the politician was not elected, and that the taxes did not go down. Again,
no lie. So put a T in the third column.
The completed truth table becomes:
I am elected   Taxes will go down   If I am elected then taxes will go down
T              T                    T
T              F                    F
F              T                    T
F              F                    T
People use several ways of expressing the “if ... then” implication. Two of the most common
ones are p → q and (¬p) ∨ q. The first one is just a notational convenience - we write less. The
second one is a bit more complicated and may not be immediately obvious. To make it obvious,
we construct the truth table of this new expression, and check it against the truth table we just
made up.
The completed truth table becomes:
p   q   p → q   ¬p   (¬p) ∨ q
T   T   T       F    F ∨ T = T
T   F   F       F    F ∨ F = F
F   T   T       T    T ∨ T = T
F   F   T       T    T ∨ F = T
Since the third and fifth columns have exactly the same entries, the two expressions have exactly
the same truth value - they are logically equivalent.
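Since most programming languages offer no “if ... then” connective for truth values, the form (¬p) ∨ q is the one we would actually type. The sketch below writes down the table we agreed on for the politician and checks the (¬p) ∨ q form against it; `IMPLIES_TABLE` and `implies` are our own names.

```python
from itertools import product

# The truth table agreed on above for "if p then q", row by row.
IMPLIES_TABLE = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

def implies(p, q):
    # The equivalent form: (not p) or q.
    return (not p) or q

for p, q in product([True, False], repeat=2):
    assert implies(p, q) == IMPLIES_TABLE[(p, q)]
    print(p, q, implies(p, q))
```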
There is one point that should be made: our usual ”if ... then” is often tied with some kind of
temporal sequencing. This simply means that the sentence tied to the if (often called the antecedent)
is true BEFORE the sentence tied to the then (the consequent). In propositional logic, which is
what we use here, there is NO TIME, and this latter form of implication ((¬p) ∨ q) brings this
timelessness out.
What other connectives do we use in everyday life? How can we make them ”precise”?
unless :
p unless q = if q does not occur then p will
= (¬q) → p
What is the truth table? Recall that a → b = (¬a) ∨ b, so that (¬q) → p = (¬(¬q)) ∨ p = q ∨ p,
since ¬(¬q) = q (double negation is the same as affirmation).
p   q   (¬q) → p = q ∨ p
T   T   T
T   F   T
F   T   T
F   F   F
either ... or :
This is our old friend, the exclusive or. Another way we could restate it is: (p or q) and (not
both p and q). The truth table:
p   q   (p ∨ q)   (p ∧ q)   ¬(p ∧ q)   (p ∨ q) ∧ (¬(p ∧ q)) = p ⊕ q
T   T   T         T         F          F
T   F   T         F         T          T
F   T   T         F         T          T
F   F   F         F         T          F
because :
p because q = q → p.
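These everyday connectives can all be written in terms of NOT, OR and AND. The sketch below follows the definitions given in the text; the function names are ours, and `because` simply encodes the reading q → p stated above.

```python
from itertools import product

def implies(a, b):
    # a -> b written as (not a) or b
    return (not a) or b

def unless(p, q):
    # p unless q = (not q) -> p
    return implies(not q, p)

def either_or(p, q):
    # (p or q) and not both (p and q)
    return (p or q) and not (p and q)

def because(p, q):
    # p because q = q -> p, as defined in the text
    return implies(q, p)

for p, q in product([True, False], repeat=2):
    assert unless(p, q) == (q or p)       # same table as q or p
    assert either_or(p, q) == (p != q)    # same table as the exclusive or
    print(p, q, unless(p, q), either_or(p, q), because(p, q))
```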
6.4.2 More about Conditionals.
Implications (if...then) have been used, and studied by logicians, for a very long time. With
two sentences, implication and negation, we can put together four different ”forms”. They are
important, in part at least, because people - in normal speech - get their equivalence wrong much
of the time, thus reaching conclusions that are totally unwarranted from the evidence available.
Let’s look at these forms.
a) Direct Statement           p → q           if p then q
b) Converse Statement         q → p           if q then p
c) Inverse Statement          (¬p) → (¬q)     if not p then not q
d) Contrapositive Statement   (¬q) → (¬p)     if not q then not p
Let’s look at a sentence: If I will average 70% on all my tests then I will pass this course.
p = I will average 70% on all my tests.
q = I will pass this course.
Direct Statement : p → q.
p   q   p → q = (¬p) ∨ q
T   T   T
T   F   F
F   T   T
F   F   T
Converse Statement : q → p: If I will pass this course then I will average 70% on all my tests.
p   q   q → p = (¬q) ∨ p
T   T   T
T   F   T
F   T   F
F   F   T
Inverse Statement : (¬p) → (¬q): If I will not average 70% on all my tests then I will not pass
this course.
p   q   (¬p) → (¬q) = (¬(¬p)) ∨ (¬q) = p ∨ (¬q)
T   T   T
T   F   T
F   T   F
F   F   T
Contrapositive Statement : (¬q) → (¬p): If I will not pass this course then I will not average
70% on all my tests.
p   q   (¬q) → (¬p) = (¬(¬q)) ∨ (¬p) = q ∨ (¬p)
T   T   T
T   F   F
F   T   T
F   F   T
Notice that:
Direct Statement = Contrapositive Statement.
Converse Statement = Inverse Statement.
The four types split into two groups of equivalent statements. The statements are not equivalent
across groups.
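The grouping can be confirmed by computing the four columns and comparing them. This is a small sketch with names of our own choosing; `implies` is the (¬a) ∨ b form of the implication.

```python
from itertools import product

def implies(a, b):
    # a -> b written as (not a) or b
    return (not a) or b

forms = {
    "direct":         lambda p, q: implies(p, q),
    "converse":       lambda p, q: implies(q, p),
    "inverse":        lambda p, q: implies(not p, not q),
    "contrapositive": lambda p, q: implies(not q, not p),
}

# Rows in the usual order: TT, TF, FT, FF.
rows = list(product([True, False], repeat=2))
columns = {name: [f(p, q) for p, q in rows] for name, f in forms.items()}

print(columns["direct"] == columns["contrapositive"])  # True
print(columns["converse"] == columns["inverse"])       # True
print(columns["direct"] == columns["converse"])        # False
```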
6.4.3 Valid Arguments.
Notice also that the statement
If I have a dollar then I can buy a 32 cent stamp
could well be false, even if I have my dollar: I may not be able to use it for the purpose of buying
the stamp (perverse vending machine? exact change required?).
The upshot is that, to determine the truth or falsehood of an implication, you need to know the
truth or falsehood of BOTH antecedent and consequent.
Usually, though, what we REALLY want is to know that a given implication is TRUE, i.e., that
p → q is true, and then we plug in the TRUE antecedent (p) to conclude the consequent (q). After
all, this is what you really want to know: what grade do I need to pass?
Why does this work? Look at the truth table: the only row that has BOTH p and p → q marked
T, also has q marked T.
What do we mean by an ”implication being TRUE”? That the situation where the antecedent
is true and the consequent is false can NEVER occur.
To conclude the THEN part, we must know:
a) The if ... then ... is TRUE.
b) The antecedent (the argument of if...) is also TRUE.
In this case, the consequent (the argument of the then...) will be always TRUE.
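The same check can be done by brute force: keep only the rows of the truth table in which both a) and b) hold, and look at what the consequent is in those rows. A minimal sketch, with our own names:

```python
from itertools import product

def implies(p, q):
    # p -> q written as (not p) or q
    return (not p) or q

# Keep only the rows in which both premises hold: (p -> q) is TRUE and p is TRUE.
rows = [(p, q) for p, q in product([True, False], repeat=2) if implies(p, q) and p]

print(rows)                     # [(True, True)]
print(all(q for _, q in rows))  # True: the consequent holds in every such row
```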