Introduction to Cryptography with Coding Theory

Cracking the Code:
Foundations of Cryptology
A brief introduction to the
underlying terms and
concepts of cryptography
and cryptanalysis
Martina Weber
Project Definition & Requirements


Design and implement tools that allow you to
quickly crack XOR-encryption schemes.
General Requirements:
– XOR-Encrypt a text using a key.
– Given an encrypted message, produce the original message.
– Analyze the “quality” of various techniques and solutions.
– Create a Human Computer Interface for the system.
The Story Line
Alice needs to send a classified message to
Bob, however, she does not want her
archrival, Eve, to know the confidential
information. Therefore Alice and Bob agree
they will disguise their message by
employing an encryption scheme with an
agreed upon key - but Eve is clever and
devious...
Defining the Terms



Plain text - the text that Alice wishes to
transmit to Bob, in its original form
Cipher text - the result of Alice encrypting the
text with the key
Decrypt - reconstructing the plaintext using
the cipher text and the key
The Conventions

To distinguish the original text from the
encoded text, plaintext characters will be
written in lower case and cipher text
characters in upper case.

Generally, it is standard to omit all punctuation
and spaces from the plaintext. This is done to
eliminate analysis based on sentence structure
and word length in the cipher text.
Eve’s Attack: Cryptanalysis
At first, Eve is baffled, but then
she realizes that Alice and Bob
only know two encryption
schemes. Better yet, Eve is
confident in her ability to
cryptanalyze these schemes
and knows she will be able to
crack the code.
The Encryption Schemes
Monoalphabetic Substitution Cipher

Each plaintext character
in a message is
substituted with a unique
alternate character to
obtain the cipher text,
thus any given letter of
the alphabet is always
enciphered by the same
cipher text letter.
Polyalphabetic Substitution Cipher

The plaintext message is
encoded with a keyword
of length m. Thus, a
character in the original
text can be enciphered with
any of the m characters in
the keyword, depending on
its position in the text.
A Closer Look at Monoalphabetic
Substitution Ciphers

When a monoalphabetic substitution cipher is
used, there is a one-to-one correspondence
between the characters in the plaintext and
the characters in the cipher text
A Simple Example
Using a Monoalphabetic Substitution Cipher

The following is the key used:
Plaintext:  a b c d e f g h i j k l m n o p q r s t u v w x y z
Ciphertext: J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
Example using the key:
Plaintext: thisiseasy
Cipher text: CQRBRBNJBH
To decrypt, simply look up the encrypted character in the
table and use the plaintext character listed directly above
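The table lookup above can be sketched in a few lines of Python. This is only an illustration: the names PLAIN, CIPHER, mono_encrypt and mono_decrypt are made up for this sketch, and the key is the one shown in the table (a maps to J, b to K, and so on).

```python
import string

# Key from the slide: plaintext alphabet on top, ciphertext alphabet below.
PLAIN = string.ascii_lowercase
CIPHER = "JKLMNOPQRSTUVWXYZABCDEFGHI"

# Translation tables for both directions (hypothetical helper names).
ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)


def mono_encrypt(plaintext: str) -> str:
    """Replace every plaintext letter by its fixed ciphertext letter."""
    return plaintext.translate(ENCRYPT_TABLE)


def mono_decrypt(ciphertext: str) -> str:
    """Look each ciphertext letter up and return the letter above it."""
    return ciphertext.translate(DECRYPT_TABLE)


print(mono_encrypt("thisiseasy"))   # CQRBRBNJBH
print(mono_decrypt("CQRBRBNJBH"))   # thisiseasy
```

Because the same table is used for every position, the letter frequencies of the plaintext carry over unchanged into the cipher text.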
A Closer Look at Polyalphabetic
Substitution Ciphers

When a polyalphabetic substitution cipher is
used, there is NO one-to-one correspondence
between the characters in the plaintext and
the characters in the cipher text; a character
could have been encoded using any of the m
letters of the keyword.
Understanding Polyalphabetic Substitution Ciphers
will Require a “New” Alphabet...

Instead of using alphabetic characters, the
new notation will be using the numerical
position (0 to 25) of a given letter
For example, A = 0, B = 1, ..., Y = 24, Z = 25
 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
...And a Modest Mathematical Background in
Modular Arithmetic


(x mod m) is evaluated
as the remainder when
dividing x by m
Modular addition ([x+y]
mod m) is performed by
first adding x and y and
then reducing the result
modulo m. Adding two
numbers in the range 0
to m-1 this way always
yields a number in the
range 0 to m-1
Examples:
6 mod 3 = 0
5 mod 3 = 2
20 mod 7 = 6 10 mod 7 = 3
~~~~~~~~~~~~~~~~~~~~~~
Let m = 26
(7+8) mod 26 = 15
(20 + 6) mod 26 = 0
(17 + 11) mod 26 = 2
(23 + 25) mod 26 = 22
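As a quick check, the examples above can be reproduced with Python's % operator. The helper name add_mod is only an illustrative choice for this sketch.

```python
def add_mod(x: int, y: int, m: int = 26) -> int:
    """Add x and y, then reduce the sum modulo m."""
    return (x + y) % m


# Plain remainders from the first set of examples.
print(6 % 3, 5 % 3, 20 % 7, 10 % 7)   # 0 2 6 3

# Addition modulo 26 from the second set of examples.
print(add_mod(7, 8))     # 15
print(add_mod(20, 6))    # 0
print(add_mod(17, 11))   # 2
print(add_mod(23, 25))   # 22
```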
Here is a “Cheat Sheet” for
Arithmetic Modulo 26
Addition mod 26
 + |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 0 |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 1 |  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0
 2 |  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1
 3 |  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2
 4 |  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3
 5 |  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4
 6 |  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5
 7 |  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6
 8 |  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7
 9 |  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8
10 | 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9
11 | 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10
12 | 12 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11
13 | 13 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12
14 | 14 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13
15 | 15 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
16 | 16 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
17 | 17 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
18 | 18 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
19 | 19 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
20 | 20 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
21 | 21 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
22 | 22 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
23 | 23 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
24 | 24 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
25 | 25  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
A Simple Example
Using a Polyalphabetic Substitution Cipher




First take the text to be encoded and convert it character
by character into the respective numerical equivalent.
Then choose a key to use on the text. Convert this key
into its numerical representation as well.
Next, write the converted key above the converted
plaintext, repeating it as necessary, and add the
characters together character by character, modulo 26.
Finally, convert the encoded numerical text back to
alphabetic text (if you so wish)

Step One: Converting plaintext to numerical equivalent
Plaintext              h  a  v  i  n  g  f  u  n  y  e  t
Numerical Equivalent   7  0 21  8 13  6  5 20 13 24  4 19

Step Two: Converting key to numerical equivalent
Key                    y  e  s
Numerical Equivalent  24  4 18

Step Three: Adding the plaintext with the key, modulo 26
Plaintext   7  0 21  8 13  6  5 20 13 24  4 19
+ Key      24  4 18 24  4 18 24  4 18 24  4 18
=          31  4 39 32 17 24 29 24 31 48  8 37
Mod 26      5  4 13  6 17 24  3 24  5 22  8 11

Step Four: Converting cipher text to its alphabetic equivalent
Ciphertext             5  4 13  6 17 24  3 24  5 22  8 11
Alphabetic Ciphertext  F  E  N  G  R  Y  D  Y  F  W  I  L

Thus, the plaintext “havingfunyet” is encoded with the key “yes” to
the cipher text “FENGRYDYFWIL”
Had Alice actually sent this message to Bob, he would decode it
using the inverse procedure: subtract the key from the cipher text
mod 26
Ciphertext   5  4 13  6 17 24  3 24  5 22  8 11
- Key        2 22  8  2 22  8  2 22  8  2 22  8
=            7 26 21  8 39 32  5 46 13 24 30 19
Mod 26       7  0 21  8 13  6  5 20 13 24  4 19
Plaintext    h  a  v  i  n  g  f  u  n  y  e  t
Note that subtracting modulo 26 means adding the additive
inverse of an element. The additive inverse of a nonzero number x
is 26 - x (for the key “yes”: 26 - 24 = 2, 26 - 4 = 22, 26 - 18 = 8).
These inverses are the values shown in the “- Key” row
of the above decryption.
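The encryption and decryption steps of this example can be sketched as follows. This is a minimal illustration, assuming the text contains only the letters a to z; the function names to_numbers, poly_encrypt and poly_decrypt are invented for the sketch.

```python
def to_numbers(text: str) -> list:
    """Map letters to 0..25 (A = 0, ..., Z = 25), ignoring case."""
    return [ord(ch) - ord("a") for ch in text.lower()]


def poly_encrypt(plaintext: str, key: str) -> str:
    """Add the repeated key to the plaintext, letter by letter, mod 26."""
    k = to_numbers(key)
    sums = [(p + k[i % len(k)]) % 26
            for i, p in enumerate(to_numbers(plaintext))]
    return "".join(chr(n + ord("A")) for n in sums)   # cipher text in upper case


def poly_decrypt(ciphertext: str, key: str) -> str:
    """Add the additive inverse (26 - key letter) to each letter, mod 26."""
    k = to_numbers(key)
    sums = [(c + (26 - k[i % len(k)])) % 26
            for i, c in enumerate(to_numbers(ciphertext))]
    return "".join(chr(n + ord("a")) for n in sums)   # plaintext in lower case


cipher = poly_encrypt("havingfunyet", "yes")
print(to_numbers(cipher))            # the "Mod 26" row of step three
print(poly_decrypt(cipher, "yes"))   # havingfunyet
```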
Initial Assumptions

Assumptions about the Language:
– The plaintext will be based on the English language
– When doing frequency analysis to determine the key used, I will
assume the key is an actual word

Assumptions about the Method Used:
– I will be doing analysis on the XOR polyalphabetic substitution
cipher
– XOR encryption can be considered addition mod 26 as used previously
in the example (i.e. A = 0, B = 1, ..., Z = 25)
A Note on the Method
Using addition mod 26 (instead of converting letters to
binary representations and doing XOR bit-by-bit) does not
take away from the learning experience. This is because
in this type of cryptanalysis, the algorithm analyzes a
character at a time without regard to the actual character,
noting only that it is a distinct character. Addition
mod 26 simply provides an easier medium for my peers and
me to understand and convey the information, since we can
talk about specific characters and not be concerned with
the abnormal or unprintable characters that XOR encryption
would otherwise produce.
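For comparison, here is a small sketch of the bit-by-bit XOR variant that the note sets aside. It assumes the text and key are ASCII bytes and the key is repeated as needed; the function name xor_bytes is made up for this illustration.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


cipher = xor_bytes(b"havingfunyet", b"yes")
print(cipher)                     # contains bytes that are not printable letters
print(xor_bytes(cipher, b"yes"))  # b'havingfunyet'
```

XOR is its own inverse, so the same function both encrypts and decrypts. The first cipher byte here is 0x11 ('h' XOR 'y'), a control character, which shows why the letters-only mod 26 version is easier to discuss.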
Exploiting the Weaknesses
After Eve determines that Alice
used a polyalphabetic cipher (after
all, a monoalphabetic substitution
cipher is too simple, even for Alice),
she remembers the strategy for
cracking the code: find the key
length and then use a frequency
analysis to determine either the
plaintext or the key used for
encryption.
Applicable Theories and Terms


Kasiski Test: In a polyalphabetic cipher text message, two identical
segments of plaintext will be encrypted to the same cipher text
whenever the distance between their occurrences in the plaintext is
a multiple of the length of the keyword; therefore, if a string of
characters appears repeatedly in the cipher message, it is possible
that the distance between the occurrences is a multiple of the length
of the keyword
Friedman Test: Used to determine whether a cipher text has been
enciphered using a monoalphabetic or polyalphabetic substitution
cipher. If the cipher used is polyalphabetic, the test also suggests
the length of the keyword using the Index of Coincidence

Index of Coincidence: The probability of two letters
randomly selected from a text being equal
– The expected frequencies of the letters A through Z in
the English language are known. Using these
probabilities, the index of coincidence for the language is
approximately 6.5%. Hence, if two letters are arbitrarily
chosen from an English text, nearly 6.5% of the time the
letters would be the same.
– In a purely random text, the letters would occur with
roughly the same frequency, resulting in the index of
coincidence being about 3.8%.
Conventions and Abbreviations Employed


n = the length of the cipher text being cryptanalyzed
IC = Index of Coincidence, as discussed
previously and represented by the following
formula

$$IC = \frac{\sum_{i=0}^{25} f_i (f_i - 1)}{n (n - 1)}$$

Where f_i represents the frequency (number of occurrences) of the
respective alphabetic character in the cipher text
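A minimal sketch of this formula in Python follows. The function name index_of_coincidence is an illustrative choice, and the sketch assumes that anything other than the letters A to Z should simply be ignored.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """IC = sum of f_i * (f_i - 1) over all letters, divided by n * (n - 1)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0   # two letters cannot be drawn from fewer than two
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


print(index_of_coincidence("AAAA"))   # 1.0 (every pair of letters matches)
print(index_of_coincidence("ABCD"))   # 0.0 (no pair of letters matches)
```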
Let the Code Breaking Begin...
Armed with this bank of
knowledge, Eve can
proceed to cryptanalyze
Alice’s message to Bob.
What are the methods she
can use and how effective
are the various techniques?
What is the best approach?
And Now Onto the Fun Part...
Applying the theories and principles!
Determining the Key Length

I employed four distinct, yet related
algorithms for finding the key length. These
algorithms are outlined on the following
slides.

Note: These algorithms can stand alone, however,
for increased accuracy, they can be combined
(Formula taken from Cryptology by Albrecht Beutelspacher, page 39.)
Algorithm One: “Plug it in”

Simply plug data into the following formula:

$$\text{Key Length} = \frac{0.027\, n}{(n - 1)\, IC - 0.038\, n + 0.065}$$

Where n is the length of the text and IC is the Index of Coincidence for a
specific text
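The formula translates directly into code. In this sketch the function name estimate_key_length is hypothetical, and returning infinity when the denominator is not positive is my own choice for handling texts whose IC is at or below the random level; the slide does not say what to do in that case.

```python
def estimate_key_length(n: int, ic: float) -> float:
    """Key length estimate: 0.027*n / ((n - 1)*IC - 0.038*n + 0.065)."""
    denominator = (n - 1) * ic - 0.038 * n + 0.065
    if denominator <= 0:
        # Assumption: treat an IC this low as "key too long to estimate".
        return float("inf")
    return 0.027 * n / denominator


# An IC equal to the English value of 0.065 gives an estimate of about 1,
# i.e. a monoalphabetic cipher.
print(estimate_key_length(1000, 0.065))
```

The result is generally not a whole number, so it is best read as a rough guide to be rounded or cross-checked against the other algorithms.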
(Algorithm taken from Introduction to Cryptography with Coding
Theory by Wade Trappe and Lawrence C. Washington, page 19.)
Algorithm Two: Shift and Count
1. Make a duplicate copy of the cipher text.
2. Align the copy under the original, only
shifted by x places.
3. Record x and the number of coincidences.
(i.e. where the letters match)
4. Increase x and go to step two.
5. The shift with the most coincidences is a
likely guess for the key length.
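The five steps above can be sketched as follows. The function names and the max_shift parameter (the largest shift x to try) are assumptions made for this illustration.

```python
def count_coincidences(ciphertext: str, shift: int) -> int:
    """Count positions where the text matches a copy of itself shifted by `shift`."""
    return sum(1 for a, b in zip(ciphertext, ciphertext[shift:]) if a == b)


def shift_and_count(ciphertext: str, max_shift: int) -> dict:
    """Record the number of coincidences for every shift x = 1..max_shift."""
    return {x: count_coincidences(ciphertext, x) for x in range(1, max_shift + 1)}


def likely_key_length(ciphertext: str, max_shift: int) -> int:
    """Return the shift with the most coincidences (the smallest one on ties)."""
    table = shift_and_count(ciphertext, max_shift)
    return max(table, key=table.get)


print(shift_and_count("ABCABCABC", 4))     # {1: 0, 2: 0, 3: 6, 4: 0}
print(likely_key_length("ABCABCABC", 4))   # 3
```

Multiples of the true key length also tend to score highly, so the full table is usually worth inspecting rather than only the single best shift.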
(Algorithm adapted from Cryptological Mathematics by Robert
Edward Lewand, pages 90 - 92.)
Algorithm Three: Friedman Test
1. for m = 1 to n
2.    Fill the ROWS of a rectangular array with consecutive
      substrings of length m from the cipher text (giving about
      n/m rows and m columns).
3.    Compute the IC of each COLUMN.
4.    Find the average of all the column ICs.
5.    If the average IC is approximately 0.065, break; m is
      the likely keyword length. Otherwise continue the loop.
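A rough sketch of this loop is given below. The function names are invented for the illustration, and two details are my own assumptions: the loop stops at a caller-supplied max_m instead of n, and "approximately 0.065" is implemented as "at least a threshold" with a default of 0.06.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """IC = sum of f_i * (f_i - 1) divided by n * (n - 1)."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(f * (f - 1) for f in Counter(text).values()) / (n * (n - 1))


def average_column_ic(ciphertext: str, m: int) -> float:
    """Write the text in rows of length m and average the IC of the m columns."""
    columns = [ciphertext[i::m] for i in range(m)]   # column i = letters i, i+m, ...
    return sum(index_of_coincidence(col) for col in columns) / m


def friedman_key_length(ciphertext: str, max_m: int, threshold: float = 0.06):
    """Return the first m whose average column IC reaches the threshold."""
    for m in range(1, max_m + 1):
        if average_column_ic(ciphertext, m) >= threshold:
            return m
    return None   # no candidate length looked like English
```

Slicing with ciphertext[i::m] picks out column i without building the array explicitly, which is equivalent to reading down a column of the row layout described in step 2.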
(Algorithm adapted from Cryptological Mathematics by Robert
Edward Lewand, pages 90 - 92, and Cryptography Theory and
Practice by Douglas R. Stinson, page 31.)
Algorithm Four: Kasiski Test
1. Determine repeating strings of characters in
the cipher text (of length at least three).
2. Tabulate the distances between occurrences.
3. The probable key length is a divisor of the
greatest common divisor (GCD) of all the
distances.
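The three steps above might look like this in Python. The names repeated_distances and kasiski_gcd are hypothetical, and the sketch only measures the distance between consecutive occurrences of each repeated string.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeated_distances(ciphertext: str, length: int = 3) -> list:
    """Distances between consecutive occurrences of each repeated substring."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    distances = []
    for places in positions.values():
        distances.extend(b - a for a, b in zip(places, places[1:]))
    return distances


def kasiski_gcd(ciphertext: str, length: int = 3):
    """GCD of all distances; the probable key length divides this number."""
    distances = repeated_distances(ciphertext, length)
    return reduce(gcd, distances) if distances else None


print(kasiski_gcd("ABCDEFABCDEFABC"))   # 6
```

A single accidental repetition can pull the GCD down to 1, so in practice the list of distances is worth examining directly as well.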
Theory Behind the Kasiski Test

If a string of characters is repeated in a
plaintext message at a distance apart which
is equal to a multiple of the length of the
keyword, then the cipher text representations
of these characters will be identical in each
occurrence
And the Winner is...




The most accurate is the Friedman Test, also the
slowest algorithm
The Shift and Count algorithm is very accurate as
well, taking less time than the Friedman Test
The “Plug it in!” algorithm runs the fastest, but is
only accurate on small keys
The Kasiski Test almost always results in output of
the correct key length or a multiple thereof, but how
many possible lengths must the user try before
finding the correct one?
Determining the Plain Text/Key

I used three distinct, yet related algorithms for
finding the plain text/key. These algorithms are
outlined on the following slides.

These algorithms all require the key length as
input. Knowing the key length, the cipher text
can be split into rows of that length. Looking
down a column, all letters are encrypted by the
same key letter - resulting in a Monoalphabetic
Substitution cipher!
(Algorithm taken from Beutelspacher and Lewand.)
Algorithm One: Basic Frequency
Analysis
1. Split text into rows of the same length as the
key.
2. For each column, determine the frequencies
of each letter.
3. Compare to expected English frequencies
(these values are known and tabulated) and
"guess" at encryption.
4. Repeat process on next column.
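One very simple way to make the "guess" in step 3 is sketched below. It assumes that the most frequent letter in each column stands for plaintext e, the most common English letter. That heuristic is my own simplification, not necessarily the comparison used in the project, and the function name guess_key_basic is made up.

```python
from collections import Counter


def guess_key_basic(ciphertext: str, key_length: int) -> str:
    """Guess each key letter by assuming the top letter of its column is 'e'.

    Assumes upper-case cipher text that is at least key_length letters long.
    """
    key = []
    for i in range(key_length):
        column = ciphertext[i::key_length]            # letters under key letter i
        top_letter = Counter(column).most_common(1)[0][0]
        # key letter = cipher letter - plaintext letter (e = 4), mod 26
        shift = (ord(top_letter) - ord("A") - 4) % 26
        key.append(chr(shift + ord("a")))
    return "".join(key)


# "eeeeeeeee" encrypted with the key "yes" is "CIWCIWCIW".
print(guess_key_basic("CIWCIWCIW", 3))   # yes
```

On short columns the most frequent letter is often not e, which fits the later observation that this approach only works well for small key lengths.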
(Algorithm taken from Introduction to Cryptography with Coding
Theory by Wade Trappe and Lawrence C. Washington, pages 22 - 23.)
Algorithm Two: Permute through
All Shifts
1. Split text into rows of the same length as the
key.
2. For each column, determine the frequencies of
each letter.
3. Take the dot product of the column frequencies
with every possible shift of the standard
English alphabet frequencies.
4. The largest value is the most likely shift.
5. Repeat the process on the next column.
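The steps above can be sketched as follows. The ENGLISH list holds approximate letter frequencies for a to z as commonly tabulated in cryptography textbooks; treat the exact values, along with the names best_shift and recover_key, as assumptions of this sketch.

```python
# Approximate English letter frequencies for a..z (assumed textbook values).
ENGLISH = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
           0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
           0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001]


def best_shift(column: str) -> int:
    """Return the shift g whose shifted English frequencies best match the column."""
    counts = [0] * 26
    for ch in column:                     # assumes upper-case letters A..Z only
        counts[ord(ch) - ord("A")] += 1
    freqs = [c / len(column) for c in counts]
    # Dot product of the column frequencies with English shifted by g.
    scores = [sum(freqs[(i + g) % 26] * ENGLISH[i] for i in range(26))
              for g in range(26)]
    return max(range(26), key=scores.__getitem__)


def recover_key(ciphertext: str, key_length: int) -> str:
    """Apply best_shift to every column and spell out the key."""
    return "".join(chr(best_shift(ciphertext[i::key_length]) + ord("a"))
                   for i in range(key_length))


print(recover_key("CIWCIWCIW", 3))   # yes
```

Because all 26 shifts are scored for every column, the work grows only linearly with the key length.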
(Algorithm taken from Stinson pages 33 - 36.)
Algorithm Three: Find Relative
Shifts between Key Letters
1. Split text into rows of the same length as the key.
2. For each column, determine the frequencies of each
letter.
3. Find all MIc of each column with every other column.
4. Search for the MIc's closest to .065, this yields the
relative shift from column i to column j.
5. Form a system of equations and solve in terms of one
key letter.
6. The keyword is a cyclic shift of the result.

MIc (the mutual index of coincidence) is represented by the
equation below, where n and m are the lengths of substrings f
and h, f_i is the frequency of letter i in f, and h_{i-g} is the
frequency of letter i - g (taken mod 26) in h, where 0 <= g <= 25.

$$MI_c(f, h_g) = \frac{\sum_{i=0}^{25} f_i \, h_{i-g}}{n \, m}$$
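A small sketch of the MIc computation and of step 4 follows. The function names are invented for the illustration, and picking the g with the largest MIc is my simplification of "closest to .065": for real English text the correct g is expected to give a value near 0.065 while the others stay near 0.038.

```python
def letter_counts(text: str) -> list:
    """Count occurrences of A..Z (assumes upper-case letters only)."""
    counts = [0] * 26
    for ch in text:
        counts[ord(ch) - ord("A")] += 1
    return counts


def mutual_ic(col_f: str, col_h: str, g: int) -> float:
    """MIc(f, h_g) = sum over i of f_i * h_(i-g), divided by n * m."""
    f, h = letter_counts(col_f), letter_counts(col_h)
    total = sum(f[i] * h[(i - g) % 26] for i in range(26))
    return total / (len(col_f) * len(col_h))


def relative_shift(col_f: str, col_h: str) -> int:
    """Return the g with the largest MIc: the key letter of f minus that of h, mod 26."""
    return max(range(26), key=lambda g: mutual_ic(col_f, col_h, g))


# Columns of e's encrypted with y (24) and with e (4): relative shift 24 - 4 = 20.
print(relative_shift("CCCC", "IIII"))   # 20
```

Each relative shift gives one equation of the form k_i - k_j = g (mod 26), which is the system referred to in step 5.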
And the Winner is...



Permute through All Shifts Algorithm, logical
winner since all possibilities are attempted
The Basic Frequency Analysis works okay for
small key lengths
What about the Relative Shifts Algorithm?
– I need far more computing power (or patience) to
test this algorithm.
– Yields accurate results when the matrix can be
solved
Down the Road: Unaddressed
Issues and Enhancements to
Implement


When the key length is equal to the plaintext length and
the key is perfectly random, this XOR encryption method
is considered perfectly secure. But, does key length really
have to equal the plaintext length for the encryption to
be secure; where exactly is the critical point?
What if a random key is used instead of
an actual word? How will this affect
the frequency analysis to determine the
key?
•I used a cipher text only attack (the only available resource to
analyze is the encrypted cipher text). Consideration should be
given to various types of attacks, such as cribbing
(knowledge that a certain word(s) appears in the plaintext) and
taking advantage of multiple cipher texts in which the same
key was used (additional information is gained under these
circumstances because you KNOW the keys are overlapping
starting at the beginning of the cipher text - however, how do
you determine initially that the same key was used?).


My final code requires “slimming down” to
increase efficiency.
A spell checker/dictionary could be added to
increase accuracy
– Instead of giving the user all cyclic shifts of the
keyword on the Find Relative Shifts between Key
Letters Algorithm, only give the user actual words
– When using the other two algorithms, a spelling
auto-corrector would improve accuracy


In the first two find plain text algorithms, allow
the user to select specific letters in the
keyword or plaintext to change and display
the effect of these changes.
The key length algorithm that attempts to
compute the GCD could be altered to throw
out “bad” data
– i.e. find the number(s) that are preventing a
common GCD and ignore those numbers


Combine the various algorithms so they can
share the results and base results off of one
another.
Finally, how about considering a new method
of encryption?
Strategies & Knowledge

Research, research, research!
– Understand everything you read, even how the
author got from one step to the next
Trial and error, but try it.
Do an example first - ON PAPER (but make
sure you do your math right)
No single part of the project was difficult to
code, but implementation required an in-depth
understanding of the problem
Advice to Next Year’s Seniors



Start EARLY! It goes by
FAST.
It is almost impossible
to stay on target with
your first schedule,
second schedule, third
schedule...
Lofty aspirations at the
beginning, but reality
will hit

ASK QUESTIONS!
– Different professors have different “specialty”
areas; take advantage of them
– Your classmates can provide great insight
– Don’t re-invent the wheel; check out other
solutions first
QUESTIONS