Lexical analysis Finite Automata

Finite Automata (FA) 4b
Lexical analysis Finite Automata •  FA also called Finite State Machine (FSM) –  Abstract model of a compu;ng en;ty. –  Decides whether to accept or reject a string. –  Every RE can be represented as a FA and vice versa •  Two types of FAs: –  Non-­‐determinis;c (NFA): more than one ac;on for same input symbol –  Determinis;c (DFA): at most one ac;on for a given input symbol •  Example: how do we write a program to recognize the Java keyword “int”? q0 RE and Finite State Automaton (FA) • REs are a declara;ve way to describe the tokens –  Describes what is a token, but not how to recognize the token • FAs are used to describe how the token is recognized –  FAs are easy to simulate in a programs • A 1-­‐1 correspondence between FAs & REs –  A scanner generator (e.g., lex) bridges the gap between regular expressions and FAs. String stream
Regular
expression
Finite
automaton
Scanner generator
scanner
program
Tokens
i q1 n q2 t q3 TransiAon Diagram • FA can be represented using transi;on diagram • A transi;on diagram has: –  States represented by circles; –  An Alphabet (Σ) represented by labels on edges; –  TransiAons represented by labeled directed edges between states. The label is the input symbol; –  One Start State shown as having an arrow head; –  One or more Final State(s) represented by double circles. • Example transi;on diagram to recognize (a|b)*abb a q0 b a q1 b q2 b q3 Simple examples of FA a
start a*
start a 0
•  Define input alphabet and ini;al state •  Draw the transi;on diagram •  Check 1 a 0 a start a+
a 0
1 a (a|b)*
start 0 Defining a DFA/NFA a, b start 0 b
–  Do all states have out-­‐going arcs labeled with all the input symbols (DFA) –  Any missing final states? –  Any duplicate states? –  Can all strings in the language can be accepted? –  Are any strings not in the language accepted? •  Op;onally name the states Example of construcAng a FA •  Construct a DFA accep;ng a language L over the alphabet {0, 1} where L is set of strings with any number of 0s followed by any number of 1s •  Regular expression: 0*1* •  Σ = {0, 1} •  Draw ini;al state of the transi;on diagram Example of construcAng a FA •  Dra] the transi;on diagram 0 Start 0 1 1 •  Is 111 accepted? •  Le]most state has missed an arc for input 1 0 Start Start 0 1 1 1 Example of construcAng a FA Example of construcAng a FA •  Is 00 accepted? •  The le]most two states are also final states –  First state from the le]: ε is also accepted –  Second state from the le]: strings with “0”s only are also accepted 0 0 Start •  The le]most two states are duplicate –  their arcs point to the same states with same symbols 0 1 Start •  Check that they are correct –  All strings in the language can be accepted » ε, the empty string, is accepted » strings with 0s/1s only are accepted –  No strings not in language are accepted 0 •  Naming all the states 1 1 1 Start TransiAon table –  One row for each state, S –  One column for each symbol, A –  Cell (S,A) is set of states can reachable from S on input A •  NFAs have at least one cell with more than one state •  DFAs have a singe state in every cell •  NFA is more concise but not as easy to implement; •  DFAs easily simulated via algorithm •  Every NFA can be converted to an equivalent DFA –  What does equivalent mean? •  There are general algorithms to ‘minimize’ a DFA –  Minimal in what sense? INPUT
a q0 b a q1 b q2 b q3 STATES
a
b
>Q0
{q0, q1}
q0
Q1
q2
Q2
q3
*Q3
1 1 q0
q1
DFA to program •  A transi;on table is a good way to implement a FSA (a|b)*abb 1 •  There are systems to convert REs to programs using a minimal DFA to recognize strings defined by the RE •  Learn more in 451 (automata theory) and 431 (Compiler design) RE
Thompson construction
NFA
Subset construction
DFA
Minimization
Minimized DFA
DFA simulation
Program
Scanner
generator