LG519 Prolog and NLP Basics, Syntax and Semantics, Using Prolog

Doug Arnold
University of Essex
[email protected]
1 Prolog Basics
Prolog is:
• a general purpose programming language,
• which implements an idea about Programming in Logic,
• in which it is very easy to write grammars for Natural Languages,
• and with which it is relatively easy to implement Natural Language Processing (NLP) systems,
  i.e. systems for ‘understanding’ Natural Languages.
Prolog is a language – it has a syntax and a semantics.
Bear in mind too that learning a language (human or computer) involves:
1. study and understanding; and
2. practice (practice, practice, practice) — using it.
1.1 The course motto
“Remember, if you don’t understand something right away, don’t worry. You never learn
anything, you only get used to it.”
Laurent Siklossy (1976) Let’s talk LISP, Prentice-Hall, Inc., Englewood Cliffs, N.J.
1.2 Examples
1.2.1 Factorial
The following computes the factorial of a number N (i.e. N!), e.g. 4! = 4 × 3 × 2 × 1 = 24.
Example 1.1
factorial(0,1).
factorial(N,FN) :-
    N1 is N-1,
    factorial(N1,FN1),
    FN is FN1*N.
Query 1.1
| ?- factorial(5,What).
What = 120 ?
yes
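A caution about Example 1.1: the first answer is correct, but if you ask for further solutions (by typing “;”) the second clause also applies when N is 0, and the computation runs on through negative numbers without stopping. The following is a minimal sketch of one common remedy, a guard on the recursive clause. The name factorial_guarded is invented here so that it does not clash with the example above.

```prolog
% factorial_guarded(N, FN): FN is N!, for integers N >= 0.
% (factorial_guarded/2 is a hypothetical name used only in this sketch.)
factorial_guarded(0, 1).
factorial_guarded(N, FN) :-
    N > 0,                      % guard: the recursive clause only applies above 0
    N1 is N-1,
    factorial_guarded(N1, FN1),
    FN is FN1*N.
```

With the guard in place, the query factorial_guarded(5,What) should give What = 120, and asking for more solutions should simply give “no”.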
1.2.2 Adding up a list of numbers
Example 1.2
addup([],0).
addup([X|Xs],N) :-
    addup(Xs,M),
    N is M+X.
Query 1.2
| ?- addup([4,2,3],What).
What = 9 ?
yes
1.2.3 Logical Reasoning
The following implements the syllogism:
For any x, x is mortal if x is human;
Socrates is human;
Therefore: Socrates is mortal
Example 1.3
mortal(X) :- human(X).
human(socrates).
Query 1.3
| ?- mortal(socrates).
yes
1.2.4 Parsing
The following is a Prolog program which will parse (or generate) the sentence the baby saw the toy
(etc.).
Example 1.4
s --> np, vp.
vp --> v,np.
np --> det,n.
det --> [the].
n --> [baby].
n --> [toy].
v --> [saw].
Query 1.4
| ?- s([the,baby,saw,the,toy],[]).
yes
| ?- s(S,[]).
S = [the,baby,saw,the,baby] ?
yes
| ?- s(S,[]).
S = [the,baby,saw,the,baby] ? ;
S = [the,baby,saw,the,toy] ? ;
S = [the,toy,saw,the,baby] ? ;
S = [the,toy,saw,the,toy] ? ;
no
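Notice that the queries give s two arguments, although the grammar rules mention none. Prolog translates each grammar rule into an ordinary clause with two extra arguments: a list of words, and whatever is left of that list once the phrase has been taken off the front. The following is a rough hand-written sketch of that translation. The exact form differs between Prolog systems, and the names s2, np2, etc. are invented here so that they do not clash with the grammar itself.

```prolog
% Hand-written counterparts of the grammar rules in Example 1.4.
% Each predicate relates a word list to the remainder left over
% after one phrase has been removed from the front of it.
s2(S0, S)  :- np2(S0, S1), vp2(S1, S).
vp2(S0, S) :- v2(S0, S1), np2(S1, S).
np2(S0, S) :- det2(S0, S1), n2(S1, S).
det2([the|S], S).
n2([baby|S], S).
n2([toy|S], S).
v2([saw|S], S).
```

The query s2([the,baby,saw,the,toy],[]) asks whether the whole list is a sentence with nothing left over, which is why the second argument is the empty list.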
2 Prolog Syntax
Prolog expressions are called terms. There are several sorts of term:
• atoms
• variables
• numbers
• “compound terms”
• an atom is either:
  – a sequence of alphanumerics, starting with a lowercase letter, and possibly including “_” (underscore); or
  – anything at all enclosed in single quotes (');
• a variable is a sequence of alphanumerics, starting with an uppercase letter or “_” (underscore);
  variables that start with “_” are called anonymous (strictly speaking, only the bare “_” is fully
  anonymous: every occurrence of it is a different variable, whereas two occurrences of _hello in
  one clause are the same variable);
• a compound term consists of:
  – a functor, or predicate (any atom); and
  – a sequence of terms, enclosed in round parentheses, and separated by commas. These are
    called arguments of the predicate.[1]
  There must be no space between the functor and the left bracket.
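Prolog has standard built-in predicates for testing what sort of term something is: var/1, number/1, atom/1 and compound/1. The following is a small sketch that uses them to name the sort of a term. The predicate term_kind/2 is invented here for illustration and is not part of Prolog.

```prolog
% term_kind(Term, Kind): Kind names the sort of term that Term is.
% (term_kind/2 is a hypothetical helper; var/1, number/1, atom/1 and
% compound/1 are standard built-ins.)
term_kind(T, variable) :- var(T).
term_kind(T, number)   :- number(T).
term_kind(T, atom)     :- atom(T).
term_kind(T, compound) :- compound(T).
```

For example, term_kind(foo,K) should give K = atom, term_kind('Hello Sam',K) should also give K = atom, term_kind(Sam,K) should give K = variable, and term_kind(f(a,b),K) should give K = compound.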
2.1 Examples
Some terms:
foo                 hello
Hello               sam
Sam                 hello_Sam
'hello'             'Hello'
'Hello Sam'         _
a1                  '1a'
X                   _hello
_234                hello(X)
f(a)                f(a,b,c)
f(hello,Sam)        f( a, b, c )
f(f(a),b)           hello(1,hello(x,X,hello(sam)))
                    t(a,t(b,t(c,t(d,[]))))
Some non-terms:
hello Sam           Hello Sam
hello-sam           1a
f(a, b              f(a,
f a,b)              f (a, b)
X(a,b)              1(a,b)
2.2 Clauses
One kind of term is particularly important: Clauses.
A clause is a collection of terms, separated by “,” (commas), or “;” (semicolons), and ending with a
“.” (fullstop).
[1] In fact, some functors can be written between, or even after, their arguments. Examples: a=X, man :- mortal.
These are called operators and can be ignored for the moment.
Some special cases:
• a simple clause: a term followed by a “.” (fullstop), e.g. happy(sam).
• a conjunction, e.g. happy(sam), young(sam).
• a rule, which consists of:
– a head (a single Prolog term); and
– :-; and
– a body (a sequence of Prolog terms, separated by commas, or semicolons); and
– a fullstop.
• a query: a clause preceded by ?-.
• a directive: a clause preceded by :-.
The head of a clause is the first term in it.
2.2.1 Examples
woman(leslie).
capital(scotland,edinburgh).
capital(scotland,belfast),loves(kim,sam).
woman(X) :- female(X), adult(X).
woman(X) :-
    female(X),
    adult(X).
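The examples above use only commas (‘and’) in rule bodies. The following is a small sketch of a rule whose body uses a semicolon (‘or’) instead. The facts and the predicate person/1 are invented here purely for illustration.

```prolog
% Hypothetical facts, for illustration only.
female(leslie).
male(kim).

% A rule with a disjunction in its body:
% X is a person if X is female or X is male.
person(X) :- female(X) ; male(X).
```

The query person(Who) should give Who = leslie first and, if you ask for another solution, Who = kim.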
2.3 Syntax: Summary
terms
    atoms
    numbers
    variables
        anonymous
        non-anonymous
    compound-terms
        lists
        ...
        clauses
            complex
                rules
                conjunctions
                disjunctions
            simple
                facts
                simple-queries
3 Prolog Semantics
3.1 Matching
• a variable matches anything;
• two atoms match if (and only if) they are the same (same goes for numbers);
• two compound terms match if (and only if):
– they have the same functor; and
– the same number of arguments (same arity); and
– corresponding arguments match.
Note: terms with functors with different arity can never match, so arity is as important as spelling.
The following involve different predicates:
capital(scotland,edinburgh).
capital(edinburgh).
Predicates can be specified by name/arity: capital/2, capital/1.
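The name and arity of a term can be inspected with the standard built-in predicate functor/3. The following sketch uses it to check the first two conditions for matching compound terms. The name same_predicate/2 is invented here for illustration.

```prolog
% same_predicate(T1, T2): T1 and T2 have the same name and the same arity.
% (same_predicate/2 is a hypothetical helper; functor/3 is a standard built-in.)
same_predicate(T1, T2) :-
    functor(T1, Name, Arity),
    functor(T2, Name, Arity).
```

The query same_predicate(capital(scotland,edinburgh), capital(X,Y)) should succeed, while same_predicate(capital(scotland,edinburgh), capital(edinburgh)) should fail, because capital/2 and capital/1 are different predicates. Passing this test is necessary for two compound terms to match, but it is not sufficient: the corresponding arguments must match as well.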
3.2 Examples
Term1                   Term2                   Match?
------------------------------------------------------
hello                   hello                   yes
X                       Y                       yes
X                       leslie                  yes
_                       leslie                  yes
f(a)                    f(a)                    yes
f(X)                    f(a)                    yes
woman(X)                woman(leslie)           yes
loves(X,sam)            loves(leslie,Y)         yes
loves(woman(X), sam)    loves(woman(kim), S)    yes
hello                   goodbye                 no
x                       leslie                  no
f(X)                    f(a,b)                  no
woman(X)                womna(leslie)           no
3.3 Unification
When two terms match, they can be unified. Intuitively, the unification of two terms is the term
that ‘combines the information from both’. More precisely, it is the most specific term that matches
both.
When two variables are unified, they are said to share: essentially, they become the same variable.
More generally, when a variable is unified with another term it is said to be bound to that term, and
has that term as its value — in effect it becomes that term. Thus, it is not possible to unify X with
both a and b (at the same time).
loves( kim , _      ).
loves( _   , leslie ).
----------------------
loves( kim , leslie ).
3.4 Examples
Term1            Term2              Unification
------------------------------------------------
hello            hello              hello
X                leslie             leslie
X                f(a)               f(a)
f(X)             f(a)               f(a)
woman(X)         woman(leslie)      woman(leslie)
loves(X,sam)     loves(leslie,Y)    loves(leslie,sam)
hello            goodbye            *fail*
x                leslie             *fail*
f(X)             f(a,b)             *fail*
woman(X)         womna(leslie)      *fail*

loves( woman(X)  , sam)
loves( woman(kim), S  )
-----------------------
loves( woman(kim), sam)
Notice that matching and unification of variables has to be ‘consistent’ in the sense that one variable
cannot get two different values:
f(X,Y)
f(a,b)
------
f(a,b)

f(X,X)
f(a,b)
------
*fail*
But anonymous variables are special (they are called anonymous because their names don’t matter):
f(_,_)
f(a,b)
------
f(a,b)
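These cases can be checked with the standard built-in =/2, which succeeds just in case its two arguments unify. The sketch below wraps the three cases above in small predicates, so that they can be loaded from a file. The predicate names are invented here for illustration.

```prolog
% Succeeds, binding X to a and Y to b.
two_variables(X, Y) :- f(X, Y) = f(a, b).

% Fails: X would have to be both a and b at the same time.
one_variable_twice(X) :- f(X, X) = f(a, b).

% Succeeds: each occurrence of _ is a separate variable.
anonymous_variables :- f(_, _) = f(a, b).
```

The same tests can be typed directly as queries, e.g. f(X,X) = f(a,b), which should give the answer “no”.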
3.5 Prolog Programs
A Prolog program consists of a database of facts, and rules.
• facts are simple clauses, consisting only of a head;
• rules are compound clauses, of the form head :- body.
The user can access the database by queries or goals (which are simply clauses).
3.5.1 A Simple Database
Program 3.1
% authors.pl
% author(Name, Book).
author(rendell,tree_of_hands).
author(rendell,demon_in_view).
author(alcott,little_women).
author(dickens,bleak_house).
author(deighton,berlin_game).
author(deighton,london_set).
author(deighton,mexico_match).
writer(X) :-
    author(X,_).
% authors.pl ends here ----------------------------
Query 3.1
?- author(deighton,X).
Query 3.2
?- author(W,london_set).
Query 3.3
?- author(X,Book).
Query 3.4
?- author(X,Y).
Query 3.5
?- author(X,X).
3.5.2 A More Interesting Example
See presidents.pl.
What party did Reagan represent?
Query 3.6
?- pres(_,'Reagan',Party,_,_).
When did President Lincoln come into office?
Query 3.7
?- pres(_,'Lincoln',_,Start,_).
Was Clinton a Republican?
Query 3.8
?- pres(_,'Clinton','Republican',_,_).
Who followed Kennedy?
Query 3.9
?- pres(_,'Kennedy',_,_,Year), pres(_,Who,_,Year,_).
Suppose we define:
leftoffice(Pres,Date) :-
    pres(_, Pres, _Party, _Start, Date).
We can ask:
When did Johnson leave office?
Query 3.10
?- leftoffice('Johnson',When).
Who left office in 1974?
Query 3.11
?- leftoffice(Who,1974).
Similarly, we can define party(President,Party) which will relate presidents to their parties.
Or democrat(Pres) which is true just in case Pres is a democrat, and republican(Pres).
[Do these as exercises]
Suppose we define:
followed(Pres1,Pres2) :-
    pres(_,Pres1,_,_,Date),
    pres(_,Pres2,_,Date,_).
We can ask:
Query 3.12
?- followed('Kennedy',Who).
?- followed(Who,'Kennedy').
?- followed(Who1,Who2).
(What should followed(Who,Who) mean? You would expect this query to fail, but it doesn’t. It
succeeds for Presidents Harrison and Garfield. Explain why.)
3.6 To Summarize
Prolog solves (answers) queries roughly as follows:
• the database is searched from top to bottom until a clause is found whose head matches the
  query;
• the head of the clause and the query are unified;
• if the clause is a fact then we have success; some results are returned;
• if the clause has a body then each member of the body is treated as a new sub-query (sub-goal)
  to be solved;
• if a solution cannot be found to a sub-goal, backtracking occurs: Prolog goes back to the most
  recently solved sub-goal and resumes searching the database from where it left off before.
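The steps above can be followed on a small example. The sketch below repeats three facts from authors.pl so that it stands alone (leave them out if authors.pl is already loaded), and adds one rule. The predicate two_books/1 is invented here for illustration, and \== is the standard test for two terms not being identical.

```prolog
% A few facts from authors.pl, repeated so the example stands alone.
author(rendell,tree_of_hands).
author(rendell,demon_in_view).
author(alcott,little_women).

% two_books(A): A wrote at least two different books.
% (two_books/1 is a hypothetical predicate used only in this sketch.)
two_books(A) :-
    author(A, B1),
    author(A, B2),
    B1 \== B2.
```

For the query two_books(rendell), the head of the rule is unified with the query, and the body gives three sub-goals. The first is solved by the first fact, with B1 = tree_of_hands. The second is at first solved by the same fact, so B1 and B2 are identical and the third sub-goal fails. Prolog backtracks into the second sub-goal, carries on down the database, finds B2 = demon_in_view, and this time the third sub-goal succeeds. The query two_books(alcott) should fail, since every alternative is eventually exhausted.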
4 Using Prolog
Prolog is an interpreted language, i.e. there is no need to compile files, etc. before using them.
Instead, the user typically interacts directly with Prolog.
The user:
• creates databases (using a normal text editor) which are stored in files;
• starts Prolog;
• consults the files (‘loads’ them into Prolog);
• types queries, which Prolog solves (answers), and whose answers Prolog displays (prints).
Note: to consult a file, pose a query with either (i) the predicate consult/1, with the file name
as argument; or (ii) the name of the file in square brackets. File names must be atoms.
An Example Interaction
SICStus 2.1 #9: 09/03/95
| ?- ['authors.pl'].
consulting authors.pl...
authors.pl consulted, 18 msec 1104 bytes
yes
| ?- listing.
writer(A) :-
    author(A, _).
author(rendell, tree_of_hands).
author(rendell, demon_in_view).
author(alcott, little_women).
author(dickens, bleak_house).
author(deighton, berlin_game).
author(deighton, london_set).
author(deighton, mexico_match).
yes
| ?- author(deighton,X).
X = berlin_game ?
yes
| ?- author(deighton,X).
X = berlin_game ? ;
X = london_set ? ;
X = mexico_match ? ;
no
| ?- author(X,Book).
Book = tree_of_hands,
X = rendell ? ;
Book = demon_in_view,
X = rendell ? ;
Book = little_women,
X = alcott ?
yes
| ?- author(X,X).
no
| ?- halt.
4.1 Starting Prolog
In an xterm or decterm window, type “sicstus”:
% sicstus
4.2 Loading a file
At the Prolog prompt (?-) type:
| ?- [file].
where file is an atom, namely the name of the file containing your program, e.g.
| ?- ['myfile.pl'].
| ?- ['myprolog_progs/myfile.pl'].
These cause Prolog to consult the file, i.e. read in clauses from the file, and put them in the database
(asserting them).
Sicstus expects Prolog files to end with “.pl”, so you can leave the “.pl” extension off:
| ?- [myfile].
4.3 Queries
Normally, what you want to do at the keyboard is to solve queries (goals). Simply type the queries:
| ?- author(rendell,X).
You get back a ‘yes’ or ‘no’, and if your query contains variables (other than the ‘anonymous’ variable
“_”) an indication of the values they have when the query succeeds. You can get alternative solutions
by typing “;” (a semi-colon). When there are no more solutions, you will get the answer “no” (i.e.
no more solutions).
If you want to find out what is in the Prolog database, type “listing”:
| ?- listing.
This gives you everything. If you only want to see one predicate (say author/2), then:
| ?- listing(author).
| ?- listing(author/2).
?- listing.          print all the clauses currently in the database
?- listing(foo).     print all the clauses currently in the database for
                     the predicate foo, with any arity
?- listing(foo/2).   print all the clauses currently in the database for
                     foo with arity 2
4.4 Syntactic Errors
Often you will get ‘syntax error’ messages when you consult (load) a file. You will have to look at
your file and try to work out what they mean.
It is worth remembering that some Prologs require that your file ends with a blank line.
4.5 Semantic Errors
Your program loads okay, but you get different answers from what you expect – use the trace
(debugging) facility. If the program seems to be taking too long to do anything, it may be in a loop
– try typing CTRL-C. You start the debugger by typing:
| ?- trace.
You are then shown stages of the computation, like the following:
23 6 Call: author(rendell,_35) ?
Here:
?           means the tracer is waiting for you to tell it what to do
23          is the invocation identifier
the rest    is the current goal, as currently instantiated
At this point you have several options (not all will make sense yet):
h      get help
RET    ‘creep’ to the next goal
s      skip: at a Call or Redo port, the tracer skips
       to the end (Exit or Fail port) of this invocation
r      retry: go back to the Call port of this procedure
l      leap: go to the next spy point
4.6 Stopping Prolog (exiting)
To exit Prolog, type:
| ?- halt.
4.7 Comments
When creating Prolog databases, it is useful to bear in mind that Prolog allows comments; these
are bits of text that are simply ignored by Prolog:
• Anything on a line after “%”, and
• Anything between “/*” and “*/”.
Examples:
% This whole line is a comment.
f(a,b). % part of this line is a comment.
% This is a comment
% on several lines.
%% This is a comment too.
/*
This comment goes on
for several lines -- it is a good idea to indent things
so you can see what is in, and what is not in, a comment
*/
5 Exercises
5.1 Basic Exercises
The first task is to get to know how the machines in the Computer Lab work. This involves
learning how to log on and off, and how to move (raise, lower, etc.) windows.
5.2 Unix
Xterms (xterms) – where you type normal Unix commands. You should learn how to print files to
both laser printers and line printers, and learn about commands to make directories (mkdir), change
directory (cd), list files (ls), read files (cat, more), rename and move files (mv), and a few other basic
things.
5.3 Netscape, Internet Explorer, etc.
You can use Netscape or Internet Explorer for getting soft copies of files (programs and handouts)
across the network. You should go to the lg519 directory (http://courses.essex.ac.uk/lg/LG519/)
and look at the documents that relate to this course:
Files with .pl extension contain Prolog code
Files with .el extension contain emacs code
Files with .ps extension are PostScript files which can be printed on a PostScript (i.e. laser) printer
or viewed with ghostview.
Files with .pdf extension are pdf files which can be viewed with acrobat.
Be careful when you save a ‘.pl’ file: Internet Explorer in particular is fond of saving such files with an
extra ‘.txt’ on the end (so it saves ‘authors.pl’ as ‘authors.pl.txt’). Make sure you rename the file to
remove the ‘.txt’ if this happens.
5.4 Emacs
For Emacs (or whatever other editor you propose to use) you should find out how to start it up (click
on the Text/Graphics menu item on the menu bar at the top of the screen), enlarge the window it
appears in, and run the tutorial (see the help menu at the top of the emacs window). If you are
using another editor, you do not need to do this.
Exercise 5.1 Use emacs (or whatever) to make some of the editing changes suggested on the first
handout.
5.5 Mail
You must learn how to:
read mail messages
write mail messages
send mail messages
save all or part of a mail message to a file
Exercise 5.2 Send me a mail telling me that you are on this course, and including the text of this
handout. (Don’t type it in by hand)
5.6 Prolog
You should start up Prolog in an xterm or Emacs window, load a file, try some queries, and generally
try things out as described on the handout under ‘Using Prolog’.
Exercise 5.3 The file capitals.pl contains a simple Prolog database. Save this file from the LG519
WebPage to your own directory. Consult it (i.e. load it into the Prolog database), and type queries
to find out the capitals of various American states:
?- capital(alabama,X).
?- capital(X,atlanta).
?- capital(X,Y).
Exercise 5.4 Extract the code fragments from the first handout to do with factorials, etc.; put
them in a file, consult it, and try to formulate sensible queries.
Exercise 5.5 The file errors.pl contains a Prolog database with some syntactic errors in it. Consult
it, see what happens, and try to find and fix them.
Exercise 5.6 The errors.pl file contains facts like the following, and the following single rule:
author(deighton,berlin_game).
writer(X) :-
    author(X,_).
What does this rule mean, intuitively?
Exercise 5.7 The directory http://courses.essex.ac.uk/lg/LG519/Programs/ contains a file
prime_ministers.pl which contains a database of English Prime Ministers in the form of clauses
for a predicate pm/5, e.g.
pm('Portland', ['Duke',of], coalition, 1783, 1783).
The arguments are: FamilyName, Titles (etc), Party, StartDate, EndDate.
Load this file, and write Prolog queries to find out:
• when John Major came to power;
• when Thatcher left office;
• who all the Labour prime ministers have been;
• when Churchill was prime minister.
Exercise 5.8 The directory http://courses.essex.ac.uk/lg/LG519/Programs/ contains a file
league_table.pl which gives a snapshot of the English football league table on 6 Dec 1999. Use
it to find out (via Prolog queries):
• how many goals Man U had conceded;
• how many goals Man U had scored;
• how many games they had won, lost and drawn;
• which team was leading the table that day.
Exercise 5.9 The directory http://courses.essex.ac.uk/lg/LG519/Programs/ contains a number of other prolog databases. Play with them in the same sort of way.
6 Code Listings
Program 6.1 (capitals.pl)
%% capitals.pl: capital cities of American States:
%% capital(State,City) --
%% where City is the state capital of the US state State
capital(alabama,montgomery).
capital(alaska,juneau).
capital(arkansas,little_rock).
capital(arizona,phoenix).
capital(california,sacramento).
capital(colorado,denver).
capital(connecticut,hartford).
capital(delaware,dover).
capital(florida,tallahassee).
capital(georgia,atlanta).
capital(hawaii,honolulu).
capital(idaho,boise).
capital(illinois,springfield).
capital(indiana,indianapolis).
capital(iowa,des_moines).
capital(nevada,carson_city).
/*------ capitals.pl ends here ------------------- */
Program 6.2 (authors.pl)
% authors.pl
% author(Name, Book).
author(rendell,tree_of_hands).
author(rendell,demon_in_view).
author(alcott,little_women).
author(dickens,bleak_house).
author(deighton,berlin_game).
author(deighton,london_set).
author(deighton,mexico_match).
writer(X) :-
    author(X,_).
% authors.pl ends here ----------------------------
Program 6.3 (errors.pl)
% errors.pl
% This file contains syntax errors:
% try to find them and fix them.
This line is supposed to be a comment. Is it?
author(rendell,tree_of_hands).
author(rendell demon_in_view).
author(alcott,little women).
author(dickens,'bleak house').
author(deighton,'berlin game).
author(deighton,london_set.
author(deighton,mexico_match)
writer(X) :-
    author(X,_).
% errors.pl ends here ---------------------------------------
Program 6.4 (presidents.pl)
%% -*- mode:prolog; mode:font-lock -*-
%% Time-stamp: <06/10/12 09:21:16 doug s3453i.essex.ac.uk presidents.pl>
%% ------------------------------------------------------------
%% File: presidents.pl --- table of US presidents
%% Author: Arnold D J<doug@s2103>, Fri Dec 15 2000
%% ------------------------------------------------------------
%% Presidents of the United States of America
%pres(FirstNames             ,Name         ,Political Party             ,Start ,End  ).
pres('George'                ,'Washington' ,'xxx'                       ,1789  ,1797 ).
pres('John'                  ,'Adams'      ,'Federalist'                ,1797  ,1801 ).
pres('Thomas'                ,'Jefferson'  ,'Democratic-Republican'     ,1801  ,1809 ).
pres('James'                 ,'Madison'    ,'Democratic-Republican'     ,1809  ,1817 ).
pres('James'                 ,'Monroe'     ,'Democratic-Republican'     ,1817  ,1825 ).
pres('John Quincy'           ,'Adams'      ,'Democratic-Republican'     ,1825  ,1829 ).
pres('Andrew'                ,'Jackson'    ,'Democratic'                ,1829  ,1837 ).
pres('Martin'                ,'Van Buren'  ,'Democratic'                ,1837  ,1841 ).
pres('William Henry'         ,'Harrison'   ,'Whig'                      ,1841  ,1841 ).
pres('John'                  ,'Tyler'      ,'Whig'                      ,1841  ,1845 ).
pres('James Knox'            ,'Polk'       ,'Democratic'                ,1845  ,1849 ).
pres('Zachary'               ,'Taylor'     ,'Whig'                      ,1849  ,1850 ).
pres('Millard'               ,'Fillmore'   ,'Whig'                      ,1850  ,1853 ).
pres('Franklin'              ,'Pierce'     ,'Democratic'                ,1853  ,1857 ).
pres('James'                 ,'Buchanan'   ,'Democratic'                ,1857  ,1861 ).
pres('Abraham'               ,'Lincoln'    ,'Republican'                ,1861  ,1865 ).
pres('Andrew'                ,'Johnson'    ,'Democratic-National Union' ,1865  ,1869 ).
pres('Ulysses Simpson'       ,'Grant'      ,'Republican'                ,1869  ,1877 ).
pres('Rutherford Birchard'   ,'Hayes'      ,'Republican'                ,1877  ,1881 ).
pres('James Abram'           ,'Garfield'   ,'Republican'                ,1881  ,1881 ).
pres('Chester Alan'          ,'Arthur'     ,'Republican'                ,1881  ,1885 ).
pres('Grover'                ,'Cleveland'  ,'Democratic'                ,1885  ,1889 ).
pres('Benjamin'              ,'Harrison'   ,'Republican'                ,1889  ,1893 ).
pres('Grover'                ,'Cleveland'  ,'Democratic'                ,1893  ,1897 ).
pres('William'               ,'McKinley'   ,'Republican'                ,1897  ,1901 ).
pres('Theodore'              ,'Roosevelt'  ,'Republican'                ,1901  ,1909 ).
pres('William Howard'        ,'Taft'       ,'Republican'                ,1909  ,1913 ).
pres('Woodrow'               ,'Wilson'     ,'Democratic'                ,1913  ,1921 ).
pres('Warren Gamaliel'       ,'Harding'    ,'Republican'                ,1921  ,1923 ).
pres('Calvin'                ,'Coolidge'   ,'Republican'                ,1923  ,1929 ).
pres('Herbert Clark'         ,'Hoover'     ,'Republican'                ,1929  ,1933 ).
pres('Franklin Delano'       ,'Roosevelt'  ,'Democratic'                ,1933  ,1945 ).
pres('Harry S'               ,'Truman'     ,'Democratic'                ,1945  ,1953 ).
pres('Dwight David'          ,'Eisenhower' ,'Republican'                ,1953  ,1961 ).
pres('John Fitzgerald'       ,'Kennedy'    ,'Democratic'                ,1961  ,1963 ).
pres('Lyndon Baines'         ,'Johnson'    ,'Democratic'                ,1963  ,1969 ).
pres('Richard Milhous'       ,'Nixon'      ,'Republican'                ,1969  ,1974 ).
pres('Gerald Rudolph'        ,'Ford'       ,'Republican'                ,1974  ,1977 ).
pres('Jimmy'                 ,'Carter'     ,'Democratic'                ,1977  ,1981 ).
pres('Ronald Wilson'         ,'Reagan'     ,'Republican'                ,1981  ,1989 ).
pres('George Herbert Walker' ,'Bush'       ,'Republican'                ,1989  ,1993 ).
pres('Bill'                  ,'Clinton'    ,'Democratic'                ,1993  ,2000 ).
%% ----- presidents.pl ends here ------------------------------
These are the examples from section 1.
Program 6.5 (examples.pl)
%% examples.pl -- Some simple demo Prolog programs
%% factorial(N,M) --
%% M is the factorial of N (i.e. N!).
factorial(0,1).
factorial(N,FN) :-
    N1 is N-1,
    factorial(N1,FN1),
    FN is FN1*N.
%% addup(List,Sum) --
%% Sum is the sum of the numbers in list List
addup([],0).
addup([X|Xs],N) :-
    addup(Xs,M),
    N is M+X.
%% Syllogistic reasoning:
mortal(X) :-
    human(X).
human(socrates).
%% A simple grammar
%% to use this, you need queries like
%% ?- s([the,baby,saw,the,toy],[]).
%% ?- s(X,[]).
%% ?- np(X,[]).
s --> np, vp.
vp --> v,np.
np --> det,n.
det --> [the].
n --> [baby].
n --> [toy].
v --> [saw].
n --> [teacher].
%% examples.pl ends here -------------------------------------