Lecture 5: Finite State Transducers

Regular Relations
Morphological Analysis
Finite State Transducers
CSA3202 Human Language Technology
Mike Rosner, Dept ICS
October 2011
CSA3202 Human Language Technology
L5
Finite State Technology
1/ 23
Acknowledgements
Jurafsky and Martin Chapters 2 and 3 (diagrams)
Outline
1. Regular Relations
2. Morphological Analysis
3. Finite State Transducers
Three Important Concepts
A Regular Language L
Two Regular Languages L1, L2
A Regular Relation R ⊆ L1 × L2
Operations on Regular Relations
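Regular relations can be combined using the same set of operations as
regular languages, namely:
  Concatenation
  Iteration
  Complementation
  Alternation (Union)
  Intersection
Unlike regular languages, however, regular relations are not in general
closed under complementation and intersection.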
Concatenation and Complementation of Regular Relations
Concatenation: R.S = {(a.p, b.q) | (a, b) ∈ R, (p, q) ∈ S}
  Let R = {(a,b)} and S = {(p,q)}.
  Then R.S = {(ap,bq)}.
  Let R = {(a,b), (c,d)} and S = {(p,q), (r,s)}.
  Then R.S = {(ap,bq), (ar,bs), (cp,dq), (cr,ds)}.
Iteration: defined via exponentiation and union, i.e.
R* = R^0 ∪ R^1 ∪ R^2 ∪ ..., where R^0 = {(ε, ε)} and R^(n+1) = R^n.R
Complement: the complement of R is straightforward to define if we know
the domain and co-domain of the relation.
Alternation and intersection follow the standard definitions of
set theory.
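The concatenation and iteration examples above can be checked with a small sketch. It assumes relations are finite Python sets of string pairs; the function names `concat`, `power` and `iterate` are illustrative only, and `iterate` computes a bounded approximation of R* because the full iteration is infinite.

```python
from itertools import product


def concat(R, S):
    """Pairwise concatenation: (a, b) in R and (p, q) in S give (ap, bq)."""
    return {(a + p, b + q) for (a, b), (p, q) in product(R, S)}


def power(R, n):
    """R^n, i.e. R concatenated with itself n times; R^0 is {(ε, ε)}."""
    result = {("", "")}
    for _ in range(n):
        result = concat(result, R)
    return result


def iterate(R, max_n):
    """Bounded approximation of R*: the union of R^0 up to R^max_n."""
    out = set()
    for n in range(max_n + 1):
        out |= power(R, n)
    return out


R = {("a", "b"), ("c", "d")}
S = {("p", "q"), ("r", "s")}
print(sorted(concat(R, S)))
# [('ap', 'bq'), ('ar', 'bs'), ('cp', 'dq'), ('cr', 'ds')]
```

Note that each pair of R is combined with each pair of S as a unit, which is why (ap, dq) is not in the result.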
Composition of Relations
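Composition of relations is defined as follows:
R ◦ S = {(x, z) | ∃y: (x, y) ∈ R ∧ (y, z) ∈ S}
Let R = {(a,b), (c,d)} and S = {(p,q), (d,s)}.
Then R ◦ S = {(c, s)}, since d is the only value that occurs both as the
second component of a pair in R and as the first component of a pair in S.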
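As a minimal sketch of the definition, assuming relations are finite Python sets of pairs (the name `compose` is illustrative), composition keeps a pair (x, z) whenever some intermediate y links a pair of R to a pair of S:

```python
def compose(R, S):
    """R ◦ S: all (x, z) such that (x, y) is in R and (y, z) is in S for some y."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


R = {("a", "b"), ("c", "d")}
S = {("p", "q"), ("d", "s")}
print(compose(R, S))  # {('c', 's')}
```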
Composition of Automata
Regular Relations for Morphological Analysis
Two Level Morphology
Invented by Kimmo Koskenniemi (1983) [1].
Postulates two levels of representation, reflected by different
languages:
  Surface level language (what you see on the page)
  Lexical level language for expressing the result of
  morphological analysis
Morphological processing involves computations over the relation
between the two levels, such as:
  Lookup (from the surface level to the lexical level)
  Lookdown (from the lexical level to the surface level)
What kind of relation is it:
one:one, one:many, many:one, or many:many?
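Lookup and lookdown can be illustrated on a toy relation. This is a sketch under stated assumptions: the word forms and the tag notation (`+N`, `+V`, `+PL`, and so on) are hypothetical examples rather than data from the lecture, and lookup is taken to run from the surface level to the lexical level.

```python
# Hypothetical toy relation: pairs of (lexical form, surface form).
RELATION = {
    ("cat+N+SG", "cat"),
    ("cat+N+PL", "cats"),
    ("fly+N+PL", "flies"),
    ("fly+V+3SG", "flies"),
}


def lookup(surface):
    """Analysis: every lexical form paired with the given surface form."""
    return sorted(lex for lex, surf in RELATION if surf == surface)


def lookdown(lexical):
    """Generation: every surface form paired with the given lexical form."""
    return sorted(surf for lex, surf in RELATION if lex == lexical)


print(lookup("flies"))       # ['fly+N+PL', 'fly+V+3SG']
print(lookdown("cat+N+PL"))  # ['cats']
```

In this toy relation a single surface form can have more than one analysis, which is worth keeping in mind when answering the question about the kind of relation.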
Regular Relations for Other Tasks
Regular relations can also model the problem domains of
several other kinds of linguistic task:
  POS Tagging
  Text Normalisation
  Spelling Correction
  Language Translation
Relations are abstract. Next we turn to the concrete
mechanisms which perform the actual computations.
Finite State Transducers
Intuitively, a finite state transducer (FST) is an FSA that
works on two (or more) tapes.
The most common way to think about transducers is as a kind
of translating machine which reads from one tape and writes
on the other.
Formally, it doesn't matter which tape is considered the input
and which the output, so FSTs are inherently bidirectional.
In computational morphology, the convention is that
  lower level = surface level
  upper level = lexical level
Finite State Transducer
Four Possible Operation Modes
1. Generation mode: it writes on both tapes, a string of a's on
   one tape and a string of b's on the other tape. Both strings
   have the same length.
2. Recognition mode: it accepts when the word on the first tape
   consists of exactly as many a's as the word on the second tape
   consists of b's.
3. Translation mode (left to right): it reads a's from the first tape
   and writes a b onto the second tape for every a that it reads.
4. Translation mode (right to left): it reads b's from the second
   tape and writes an a onto the first tape for every b that it reads.
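The four modes can be sketched in code. This assumes the transducer being described pairs every a on the first tape with a b on the second (the figure is not reproduced here), and the function names are illustrative. Each mode is written as a separate function, although all four are views of the same relation {(a^n, b^n) | n ≥ 0}.

```python
from itertools import count, islice


def generate():
    """Generation mode: enumerate the pairs (a^n, b^n) written on both tapes."""
    for n in count():
        yield ("a" * n, "b" * n)


def recognize(first, second):
    """Recognition mode: accept iff first is a^n and second is b^n for the same n."""
    return (set(first) <= {"a"} and set(second) <= {"b"}
            and len(first) == len(second))


def translate_lr(first):
    """Left to right: write one b for every a read; None if the input is rejected."""
    return "b" * len(first) if set(first) <= {"a"} else None


def translate_rl(second):
    """Right to left: write one a for every b read; None if the input is rejected."""
    return "a" * len(second) if set(second) <= {"b"} else None


print(list(islice(generate(), 3)))  # [('', ''), ('a', 'b'), ('aa', 'bb')]
print(recognize("aa", "bb"))        # True
print(translate_lr("aaa"))          # bbb
print(translate_rl("bb"))           # aa
```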
Finite State Transducer - Formal Definition
A finite state transducer (FST) is a six-tuple
⟨Q, q0, Σ1, Σ2, δ, F⟩, where
  Q is a finite set of states
  q0 ∈ Q is the initial state
  F ⊆ Q is a set of final states
  Σ1 is a finite set of symbols, the input alphabet
  Σ2 is a finite set of symbols, the output alphabet
  δ ⊆ Q × Σ1 × Σ2 × Q is the transition relation
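The six-tuple can be written down directly as a data structure. The following is a minimal sketch, not a reference implementation: the class and field names are assumptions chosen for illustration, and, following the definition as stated, every transition consumes exactly one symbol on each tape (no ε-transitions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FST:
    states: frozenset   # Q
    start: str          # q0
    sigma1: frozenset   # input alphabet
    sigma2: frozenset   # output alphabet
    delta: frozenset    # transitions (state, in_symbol, out_symbol, next_state)
    finals: frozenset   # F

    def transduce(self, word):
        """Return every output string that the relation pairs with `word`."""
        configs = {(self.start, "")}  # reachable (state, output so far) pairs
        for sym in word:
            configs = {
                (q2, out + o)
                for (q, out) in configs
                for (q1, i, o, q2) in self.delta
                if q1 == q and i == sym
            }
        return {out for (q, out) in configs if q in self.finals}

    def inverted(self):
        """Swap the two tapes; this is what makes the FST bidirectional."""
        flipped = frozenset((q, o, i, q2) for (q, i, o, q2) in self.delta)
        return FST(self.states, self.start, self.sigma2, self.sigma1,
                   flipped, self.finals)


# One state with a single a:b self-loop.
ab = FST(frozenset({"q0"}), "q0", frozenset("a"), frozenset("b"),
         frozenset({("q0", "a", "b", "q0")}), frozenset({"q0"}))
print(ab.transduce("aaa"))            # {'bbb'}
print(ab.inverted().transduce("bb"))  # {'aa'}
```

Because δ is a relation rather than a function, `transduce` tracks a set of configurations and returns a set of outputs, so nondeterministic transducers are handled as well.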
A Very Simple Transducer
A Slightly More Complex Transducer
Exercise
1. What is the relation computed by the FST on the previous
   slide?
2. A vending machine delivers peanuts for 55c. It requires exact
   change and accepts the following cent coins: 5c, 10c, 20c.
     Draw an FSA showing the possible coin sequences to get the
     peanuts.
     Draw an FST which gives change if an inexact amount is given.
Summary
Regular relations
Underlying model of morphological analysis
Finite State Transducers for computations over regular
relations
Next: complex relations and transducers
Bibliography
[1] Kimmo Koskenniemi.
    Two-level Morphology: A General Computational Model for
    Word-Form Recognition and Production.
    PhD thesis, University of Helsinki, 1983.
    Available from http://www.ling.helsinki.fi/~koskenni/doc/Two-LevelMorphology.pdf