Levenshtein Algorithm - Chair of Software Engineering

Chair of Software Engineering
Einführung in die Programmierung
Introduction to Programming
Prof. Dr. Bertrand Meyer
Complement to lecture 11:
Levenshtein distance algorithm
Levenshtein distance
Also called “edit distance”
Purpose: to compute the smallest number of basic operations
- Insertion
- Deletion
- Replacement
that will turn one string into another
Intro. to Programming, lecture 11 (complement): Levenshtein
Levenshtein distance
“Michael Jackson” to “Mendelssohn”
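
Source:    M  I  C  H  A  E  L  _  J  A  C  K  S  O  -  N
Target:    M  E  -  N  D  E  L  S  -  -  -  -  S  O  H  N
Operation:    S  D  S  S        S  D  D  D  D        I
Distance:  0  1  2  3  4        5  6  7  8  9        10

S: substitute, D: delete, I: insert; columns without an operation are kept unchanged. "_" stands for the space between the two words, "-" for a gap. The distance row counts the operations so far; the total, 10, is the Levenshtein distance between the two strings.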
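The ten operations above can be replayed with EiffelBase STRING features: `put` replaces the character at a position, `insert_character` inserts one, and `remove` deletes one. The class below is a minimal sketch (the class name MENDELSSOHN_DEMO is hypothetical) that applies one such script, four substitutions, five deletions and one insertion, to “Michael Jackson”. It works from the right end leftwards, so every position it uses still refers to a character of the original string.

```eiffel
class
    MENDELSSOHN_DEMO
        -- Hypothetical demo class: replays one ten-operation edit script.

create
    make

feature {NONE} -- Initialization

    make
            -- Turn "Michael Jackson" into "Mendelssohn", working from the
            -- right so that each position refers to the original string.
        local
            s: STRING
        do
            create s.make_from_string ("Michael Jackson")
            s.insert_character ('h', 15)    -- I: insert 'h' before the final 'n'
            s.remove (12)                   -- D: 'k'
            s.remove (11)                   -- D: 'c'
            s.remove (10)                   -- D: 'a'
            s.remove (9)                    -- D: 'J'
            s.put ('s', 8)                  -- S: space -> 's'
            s.put ('d', 5)                  -- S: 'a' -> 'd'
            s.put ('n', 4)                  -- S: 'h' -> 'n'
            s.remove (3)                    -- D: 'c'
            s.put ('e', 2)                  -- S: 'i' -> 'e'
            io.put_string (s)               -- Mendelssohn
            io.put_new_line
        end

end
```

Applying the operations left to right would also work, but every later position would then have to be shifted by the insertions and deletions already performed.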
Levenshtein distance algorithm
levenshtein (source, target: STRING): INTEGER
        -- Minimum number of operations to turn `source' into `target'
    local
        distance: ARRAY_2 [INTEGER]
                -- Indexed from zero, with bounds 0 .. source.count and 0 .. target.count
        i, j, deletion, insertion, substitution: INTEGER
    do
        create distance.make (source.count, target.count)
        from i := 0 until i > source.count loop
                -- source [1 .. i] becomes the empty string through i deletions
            distance [i, 0] := i ; i := i + 1
        end
        from j := 0 until j > target.count loop
                -- The empty string becomes target [1 .. j] through j insertions
            distance [0, j] := j ; j := j + 1
        end
        -- (Continued)
Levenshtein, continued
        from i := 1 until i > source.count loop
            from
                j := 1
            invariant
                -- For all p: 0 .. i, q: 0 .. j - 1, we can turn source [1 .. p]
                -- into target [1 .. q] in distance [p, q] operations
            until
                j > target.count
            loop
                if source [i] = target [j] then
                        -- Keep: the characters already match
                    distance [i, j] := distance [i - 1, j - 1]
                else
                    deletion := distance [i - 1, j]           -- Delete source [i]
                    insertion := distance [i, j - 1]          -- Insert target [j]
                    substitution := distance [i - 1, j - 1]   -- Replace source [i] by target [j]
                        -- One operation more than the cheapest of the three
                    distance [i, j] := deletion.min (insertion).min (substitution) + 1
                end
                j := j + 1
            end
            i := i + 1
        end
        Result := distance [source.count, target.count]
    end

s [m .. n]: substring of s with items at positions k such that m ≤ k ≤ n (empty if m > n)
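The code above relies on an ARRAY_2 indexed from zero. EiffelBase's ARRAY2 is indexed from 1, so a version that compiles against it has to shift every index by one: `d.item (i + 1, j + 1)` plays the role of `distance [i, j]`. The following is a sketch under that assumption (the class name LEVENSHTEIN_DEMO is hypothetical, and it assumes ARRAY2's `make_filled`, `item` and `put` as in current EiffelBase); it prints the distances for the two examples of this lecture.

```eiffel
class
    LEVENSHTEIN_DEMO
        -- Hypothetical class: the slide algorithm on EiffelBase ARRAY2,
        -- indexed from 1, so `d.item (i + 1, j + 1)' stands for the
        -- slide's `distance [i, j]'.

create
    make

feature {NONE} -- Initialization

    make
            -- Print the distances for the two examples of this lecture.
        do
            io.put_integer (levenshtein ("Michael Jackson", "Mendelssohn"))    -- 10
            io.put_new_line
            io.put_integer (levenshtein ("BEETH", "BEATLES"))    -- 4
            io.put_new_line
        end

feature -- Access

    levenshtein (source, target: STRING): INTEGER
            -- Minimum number of insertions, deletions and replacements
            -- that turn `source' into `target'.
        local
            d: ARRAY2 [INTEGER]
            i, j, deletion, insertion, substitution: INTEGER
        do
            create d.make_filled (0, source.count + 1, target.count + 1)
            from i := 0 until i > source.count loop
                d.put (i, i + 1, 1)    -- First column: i deletions
                i := i + 1
            end
            from j := 0 until j > target.count loop
                d.put (j, 1, j + 1)    -- First row: j insertions
                j := j + 1
            end
            from i := 1 until i > source.count loop
                from j := 1 until j > target.count loop
                    if source.item (i) = target.item (j) then
                        d.put (d.item (i, j), i + 1, j + 1)    -- Keep
                    else
                        deletion := d.item (i, j + 1)
                        insertion := d.item (i + 1, j)
                        substitution := d.item (i, j)
                        d.put (deletion.min (insertion).min (substitution) + 1, i + 1, j + 1)
                    end
                    j := j + 1
                end
                i := i + 1
            end
            Result := d.item (source.count + 1, target.count + 1)
        end

end
```

Shifting the indices keeps the loop structure identical to the slide version; the only cost is one extra row and column of storage, exactly as in the zero-based version.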
[Figure: distance table for source “BEETH” (rows) and target “BEATLES” (columns). Each entry is marked with the operation that yields its value: K (Keep), I (Insert), D (Delete), S (Substitute).]
[Figure: the same table with one optimal path highlighted: Keep B, Keep E, Subst E→A, Keep T, Ins L, Ins E, Subst H→S.]
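The highlighted path can be recovered mechanically: start at the bottom-right entry and repeatedly step to a neighbour from which the current value can be reached (the diagonal with equal characters for Keep, the diagonal plus one for Substitute, the entry above plus one for Delete, the entry to the left plus one for Insert). Below is a hypothetical sketch, EDIT_SCRIPT_DEMO, that fills the table as on the previous slides (again with 1-based EiffelBase ARRAY2) and then walks back. For “BEETH” and “BEATLES” it prints Keep B, Keep E, Subst E->A, Keep T, Ins L, Ins E, Subst H->S; the four steps other than Keep give the distance 4.

```eiffel
class
    EDIT_SCRIPT_DEMO
        -- Hypothetical class: fills the distance table, then walks back
        -- from the bottom-right entry to list one optimal edit script.

create
    make

feature {NONE} -- Initialization

    make
        do
            io.put_string (script ("BEETH", "BEATLES"))
            io.put_new_line
        end

feature -- Access

    script (source, target: STRING): STRING
            -- One minimal sequence of operations turning `source' into `target'.
        local
            d: ARRAY2 [INTEGER]
            i, j: INTEGER
            step: STRING
        do
                -- d.item (i + 1, j + 1) = distance from source [1 .. i] to target [1 .. j]
            create d.make_filled (0, source.count + 1, target.count + 1)
            from i := 0 until i > source.count loop d.put (i, i + 1, 1); i := i + 1 end
            from j := 0 until j > target.count loop d.put (j, 1, j + 1); j := j + 1 end
            from i := 1 until i > source.count loop
                from j := 1 until j > target.count loop
                    if source.item (i) = target.item (j) then
                        d.put (d.item (i, j), i + 1, j + 1)
                    else
                        d.put (d.item (i, j + 1).min (d.item (i + 1, j)).min (d.item (i, j)) + 1, i + 1, j + 1)
                    end
                    j := j + 1
                end
                i := i + 1
            end
                -- Walk back from [source.count, target.count] to [0, 0],
                -- prepending each step so the script reads left to right.
            create Result.make_empty
            from
                i := source.count
                j := target.count
            until
                i = 0 and j = 0
            loop
                if i > 0 and then j > 0 and then source.item (i) = target.item (j)
                    and then d.item (i + 1, j + 1) = d.item (i, j) then
                    step := "Keep " + source.item (i).out
                    i := i - 1; j := j - 1
                elseif i > 0 and then j > 0 and then d.item (i + 1, j + 1) = d.item (i, j) + 1 then
                    step := "Subst " + source.item (i).out + "->" + target.item (j).out
                    i := i - 1; j := j - 1
                elseif i > 0 and then d.item (i + 1, j + 1) = d.item (i, j + 1) + 1 then
                    step := "Delete " + source.item (i).out
                    i := i - 1
                else
                    step := "Ins " + target.item (j).out
                    j := j - 1
                end
                if not Result.is_empty then
                    Result.prepend (", ")
                end
                Result.prepend (step)
            end
        end

end
```

When several neighbours qualify, the order of the tests decides which of the equally short scripts is reported; this sketch prefers Keep, then Substitute, then Delete, then Insert.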