Formal Language Theory
Homework
• Read documentation on Graphviz
– http://graphviz.org/
– http://www.graphviz.org/pdf/dotguide.pdf
• Use graphviz to generate figures like
these (more or less):
Back to Regular Expressions
10. A more interesting example
import re
myString = ("I have red shoes and blue pants and a green shirt. My phone "
            "number is 8005551234 and my friend's phone number is (800)-565-7568 "
            "and my cell number is 1-800-123-4567. You could also call me at "
            "18005551234 if you'd like.")
# Optional leading 1, optional hyphens, optional parentheses around the area code
phoneNumbersRegEx = re.compile(r'1?-?\(?\d{3}\)?-?\d{3}-?\d{4}')
print(phoneNumbersRegEx.findall(myString))
Answer is here, but let’s
derive it together
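One possible way to derive the phone-number pattern is to start with the simplest case and add one optional piece at a time. The sketch below does this in four steps. The intermediate patterns and the shortened test string are illustrative assumptions, not necessarily the steps taken in the lecture.

```python
import re

# Shortened version of the example sentence (same four phone numbers).
text = ("My phone number is 8005551234 and my friend's phone number is "
        "(800)-565-7568 and my cell number is 1-800-123-4567. "
        "You could also call me at 18005551234 if you'd like.")

# Each step makes one more piece of the number optional.
steps = [
    ('ten digits in a row',  r'\d{10}'),
    ('optional hyphens',     r'\d{3}-?\d{3}-?\d{4}'),
    ('optional parentheses', r'\(?\d{3}\)?-?\d{3}-?\d{4}'),
    ('optional leading 1',   r'1?-?\(?\d{3}\)?-?\d{3}-?\d{4}'),
]

results = {}
for label, pattern in steps:
    results[label] = re.findall(pattern, text)
    print('%-22s %s' % (label, results[label]))
```

Only the last pattern recovers all four numbers in full. The earlier ones either miss the punctuated forms or cut off part of a number that starts with 1.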
Formal Definition of Regular Expressions
• <expr> → character
• <expr> → ( <expr> )
• Concatenation: <expr> → <expr> <expr>
• Union: <expr> → <expr> + <expr>
• Kleene Star: <expr> → ( <expr> ) *
• Characters:
– lower case: a-z
– upper case: A-Z
– digits: 0-9
– special cases: \t \n
– octal codes: \000
– any single character: .
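The formal operators above map onto Python's re module roughly as follows. One difference is that the formal union a + b is written a|b in Python, where + means "one or more". The helper name accepts is hypothetical, and re.fullmatch assumes Python 3.4 or later.

```python
import re

def accepts(pattern, string):
    """Return True if the whole string belongs to the pattern's language."""
    return re.fullmatch(pattern, string) is not None

# Concatenation: <expr> <expr>
print(accepts(r'ab', 'ab'))          # True
# Union: formal "a + b" is written "a|b" in Python
print(accepts(r'a|b', 'b'))          # True
print(accepts(r'a|b', 'ab'))         # False
# Kleene star: ( <expr> ) *
print(accepts(r'(ab)*', ''))         # True (zero repetitions)
print(accepts(r'(ab)*', 'abab'))     # True
print(accepts(r'(ab)*', 'aba'))      # False
# Characters: ranges, special cases, octal codes, any single character
print(accepts(r'[a-z][A-Z][0-9]', 'xY7'))  # True
print(accepts(r'\t', '\t'))          # True
print(accepts(r'\101', 'A'))         # True (octal 101 is 'A')
print(accepts(r'.', 'x'))            # True
print(accepts(r'.', 'xy'))           # False
```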
An Equivalence Relation (=R)
• A Partition of S ≡ Set of Subsets of S
– Mutually Exclusive & Exhaustive
• Equivalence Classes ≡ A Partition such that
– All the elements in a class are equivalent (with respect to =R)
– No element from one class is equivalent to an element from another
• Example: Partition integers into evens & odds
• Even integers: 2,4,6…
• Odd integers: 1,3,5…
– x =R y ⇔ x has the same parity as y
• Three Properties
– Reflexive: a =R a
– Symmetric: a =R b ⇒ b =R a
– Transitive: a =R b & b =R c ⇒ a =R c
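The three properties can be checked by brute force over a small finite set. This is a minimal sketch: the function names are hypothetical, and the second relation ("differs by at most 1") is an added counterexample that is reflexive and symmetric but not transitive.

```python
from itertools import product

def same_parity(x, y):
    return x % 2 == y % 2

def is_equivalence(relation, elements):
    """Check reflexivity, symmetry and transitivity on a finite set."""
    elements = list(elements)
    reflexive = all(relation(a, a) for a in elements)
    symmetric = all(relation(b, a)
                    for a, b in product(elements, repeat=2) if relation(a, b))
    transitive = all(relation(a, c)
                     for a, b, c in product(elements, repeat=3)
                     if relation(a, b) and relation(b, c))
    return reflexive and symmetric and transitive

def equivalence_classes(relation, elements):
    """Group elements into classes; only meaningful for an equivalence relation."""
    classes = []
    for x in elements:
        for cls in classes:
            if relation(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

numbers = range(1, 9)
print(is_equivalence(same_parity, numbers))        # True
print(equivalence_classes(same_parity, numbers))   # [[1, 3, 5, 7], [2, 4, 6, 8]]

# "Close to" is not transitive: 1~2 and 2~3, but 1 and 3 are not related.
close = lambda x, y: abs(x - y) <= 1
print(is_equivalence(close, numbers))              # False
```

The second relation is a useful thing to keep in mind when asking whether synonymy is an equivalence relation: "similar" relations often fail transitivity.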
>>> from nltk.corpus import wordnet as wn
>>> for s in wn.synsets('car'): print(s.lemma_names())
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']
WordNet (Ch2): An Equivalence Relation
>>> for s in wn.synsets('car'): print(' '.join(s.lemma_names()) + ': ' + s.definition())
car auto automobile machine motorcar: a motor vehicle with four wheels; usually
propelled by an internal combustion engine
car railcar railway_car railroad_car: a wheeled vehicle adapted to the rails of railroad
car gondola: the compartment that is suspended from an airship and that carries
personnel and the cargo and the power plant
car elevator_car: where passengers ride up and down
cable_car car: a conveyance for passengers or freight on a cable railway
Synonymy: An Equivalence Relation?
Comments
A Partial Order (≤R)
• Powerset({x,y,z})
– Subsets ordered by inclusion
– a ≤R b ⇔ a ⊆ b
• Three properties
– Reflexive:
• a≤a
– Antisymmetric:
• a≤b & b≤a ⇒ a=b
– Transitive:
• a≤b & b≤c ⇒ a≤c
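The powerset example can be checked directly, since Python's <= on sets already means "is a subset of". This is a small sketch with a hypothetical helper powerset. It also lists pairs that are incomparable, which is what makes the order partial rather than total.

```python
from itertools import combinations, product

def powerset(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

subsets = powerset(['x', 'y', 'z'])
print(len(subsets))  # 8

# a <= b on (frozen)sets tests inclusion: a is a subset of b.
reflexive = all(a <= a for a in subsets)
antisymmetric = all(a == b
                    for a, b in product(subsets, repeat=2)
                    if a <= b and b <= a)
transitive = all(a <= c
                 for a, b, c in product(subsets, repeat=3)
                 if a <= b and b <= c)
print(reflexive, antisymmetric, transitive)  # True True True

# Pairs where neither set contains the other, e.g. {x} and {y}.
incomparable = [(a, b) for a, b in combinations(subsets, 2)
                if not (a <= b or b <= a)]
print((frozenset('x'), frozenset('y')) in incomparable)  # True
```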
Wordnet: A Partial Order
>>> for h in wn.synsets('car')[0].hypernym_paths()[0]:
...     print(h.lemma_names())
['entity']
['physical_entity']
['object', 'physical_object']
['whole', 'unit']
['artifact', 'artefact']
['instrumentality', 'instrumentation']
['container']
['wheeled_vehicle']
['self-propelled_vehicle']
['motor_vehicle', 'automotive_vehicle']
['car', 'auto', 'automobile', 'machine', 'motorcar']
Help
>>> s = wn.synsets('car')[0]
>>> s.name()
'car.n.01'
>>> s.pos()
'n'
>>> s.lemmas()
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'),
Lemma('car.n.01.automobile'),
Lemma('car.n.01.machine'),
Lemma('car.n.01.motorcar')]
>>> s.examples()
['he needs a car to get to work']
>>> s.definition()
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> s.hyponyms()[0:3]
[Synset('stanley_steamer.n.01'),
Synset('hardtop.n.01'), Synset('loaner.n.02')]
>>> s.hypernyms()
[Synset('motor_vehicle.n.01')]
CFGs: Context-Free Grammars (Ch8)
Ambiguity
• The Chomsky Hierarchy
– Type 0 > Type 1 > Type 2 > Type 3
– Recursively Enumerable > CS > CF > Regular
• Examples
– Type 3: Regular (Finite State):
• Grep & Regular Expressions
• Right-Branching: A → a A
• Left-Branching: B → B b
– Type 2: Context-Free (CF):
• Center-Embedding: C → … x C y
• Parenthesis Grammars: <expr> → ( <expr> )
• w w^R (a string followed by its reverse)
– Type 1: Context-Sensitive (CS): w w
– Type 0: Recursively Enumerable
– Beyond Type 0: Halting Problem
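The example languages from the hierarchy can be made concrete with small membership tests. These are plain Python recognizers, not grammars, and the function names are hypothetical. They only show which strings belong to each language, not why one language needs a more powerful grammar than another.

```python
def is_balanced(s):
    """Parenthesis language (context-free): every '(' has a matching ')'."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:       # a ')' with nothing open
                return False
        else:
            return False        # only parentheses are allowed
    return depth == 0

def is_w_wR(s):
    """w followed by its reverse (context-free): even-length palindromes."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:][::-1]

def is_ww(s):
    """w followed by a copy of itself (not context-free)."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:]

print(is_balanced('(())()'), is_balanced(')('))   # True False
print(is_w_wR('abba'), is_w_wR('abab'))           # True False
print(is_ww('abab'), is_ww('abba'))               # True False
```

The counter in is_balanced can grow without bound, which is the intuition for why no finite-state machine (Type 3) recognizes the parenthesis language.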
Syntax & Semantics
• Syntax: Symbol pushing / Parsing
– Parsing: use context-free grammar to map string → tree
• Semantics: Meaning (making sense of trees)
– Is synonymy an equivalence relation?
• Dichotomy is important both for
– Natural Languages (English, FIGS, CJK, etc.)
• FIGS: French, Italian, German & Spanish
• CJK: Chinese, Japanese & Korean
– as well as Artificial Languages
• Python, HTML, Javascript, SQL, C
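The syntax/semantics split above can be illustrated on a toy artificial language. The sketch below assumes a balanced-parenthesis grammar (expr → ( expr ) expr, or empty), which is an illustrative choice rather than a grammar from the lecture. parse does the syntax step (string → tree, here nested lists), and nesting_depth is one possible "meaning" computed from the tree. Both names are hypothetical, and max(..., default=0) assumes Python 3.4 or later.

```python
def parse(tokens):
    """Syntax: map a balanced-parenthesis string to a tree of nested lists."""
    tree, pos = _parse_seq(tokens, 0)
    if pos != len(tokens):
        raise ValueError('unexpected %r at position %d' % (tokens[pos], pos))
    return tree

def _parse_seq(tokens, pos):
    """Parse zero or more '( ... )' groups starting at pos."""
    children = []
    while pos < len(tokens) and tokens[pos] == '(':
        subtree, pos = _parse_seq(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ')':
            raise ValueError('missing ")" at position %d' % pos)
        children.append(subtree)
        pos += 1                      # consume the ')'
    return children, pos

def nesting_depth(tree):
    """Semantics: interpret the tree, here as its maximum nesting depth."""
    return max((1 + nesting_depth(child) for child in tree), default=0)

tree = parse('(()())')
print(tree)                 # [[[], []]]
print(nesting_depth(tree))  # 2
print(parse('()()'))        # [[], []]
```

A string such as '(()' is rejected with a ValueError at the parsing stage, before any meaning is computed.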
Summary
Chapter 1
• NLTK (Natural Lang Toolkit)
– Unix for Poets without Unix
– Unix → Python
• Object-Oriented
– Polymorphism:
• “len” applies to lists, sets, etc.
• Ditto for: +, help, print, etc.
• Types & Tokens
– “to be or not to be”
– 6 tokens & 4 types
• FreqDist: sort | uniq -c
• Concordances
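The token/type distinction and the "sort | uniq -c" idea can be reproduced with the standard library alone. This sketch uses collections.Counter as a stand-in for NLTK's FreqDist, which is an assumption made so the example runs without NLTK installed.

```python
from collections import Counter

tokens = "to be or not to be".split()   # every word occurrence is a token
freq = Counter(tokens)                  # counts per distinct word (type)

print(len(tokens))   # 6 tokens
print(len(freq))     # 4 types: to, be, or, not
for word, count in sorted(freq.items()):
    print(count, word)                  # same information as sort | uniq -c
```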
Chapters 2-8
• Chapter 3: URLs
• Chapter 2
– Equivalence Relations:
• Parity
• Synonymy (?)
– Partial Orders:
• Wordnet Ontology
• Chapter 8: CF Parsing
– Chomsky Hierarchy
• CS > CF > Regular