Algebraic Coding Theory

Bret Benesh
College of St. Benedict/St. John’s University
Department of Mathematics
Math 331
The Problem
Transmission of data is not perfect. Data (via a cell phone or
computer) is usually transferred in the form of a series of 0’s
and 1’s. An electric surge, cross-contamination from another
data stream, or human error can easily change some of the 0’s
to 1’s and vice versa.
The Solution
These errors seem to be inevitable, so we develop
"error-correcting codes." Error-correcting codes are part of a
larger field called "algebraic coding theory," which also includes
data compression (like .zip files and mp3s).
We saw a simple way of "coding" information when we studied
cryptography: a ↦ 00, b ↦ 01, c ↦ 02, . . . But suppose you try to
transmit the message "pat the dog on its head," and the 00 that
encodes a gets its first 0 switched to a 2, thereby becoming 20 (u).
Then the message that is received is "put the dog on its head,"
which is weird.
Our codes will be able to find (and fix!) these errors.
A simple solution
We could keep the same code, but repeat it multiple times. For
example, instead of coding pat as 150019, we could code it as
151515000000191919. Then if noise changes a 00 to 20, it
might look like 151515000020191919. We can tell the receiving
machine to take whatever code appears at least twice: then
151515 definitely translates as p, 000020 translates as a since
two of the three copies are 00, and 191919 is definitely t. This
code actually fixes a single error.
A simple solution (continued)
As described, this method can only correct one error per letter. For
instance, if 000000 gets corrupted to 030020 (two errors), the
decoding machine would not know what to do. We could make
it able to correct two errors by repeating each code 5 times, or
correct n errors by repeating each code 2n + 1 times.
The problem with this method: it is expensive. Your typical 1
GB movie becomes 3 GB by this method. This takes a long
time to download.
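The repetition scheme can be sketched in a few lines of Python. This is only an illustration: the function names `encode_repeat` and `decode_repeat` are made up here, and each two-digit letter code is treated as one symbol.

```python
from collections import Counter

def encode_repeat(symbols, copies=3):
    """Repeat every symbol `copies` times in a row."""
    return [s for s in symbols for _ in range(copies)]

def decode_repeat(received, copies=3):
    """Majority vote inside each block; None marks a block with no majority."""
    decoded = []
    for i in range(0, len(received), copies):
        block = received[i:i + copies]
        symbol, count = Counter(block).most_common(1)[0]
        decoded.append(symbol if count > copies // 2 else None)
    return decoded

sent = encode_repeat(["15", "00", "19"])   # "pat" with each code sent 3 times
noisy = sent.copy()
noisy[5] = "20"                            # one copy of 00 becomes 20
print(decode_repeat(noisy))                # the single error is outvoted
print(decode_repeat(["03", "00", "20"]))   # two errors: no majority
```

With `copies=2*n + 1`, the same majority vote outvotes up to n corrupted copies per block.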
We will work for the remainder of the day in binary, so our
codes will only involve 0’s and 1’s. Perhaps a ↦ 0, b ↦ 1,
c ↦ 10, d ↦ 11, . . . .
A cheaper, though flawed, solution
We can detect (but not correct) single errors by attaching a
single parity-checking digit to the beginning of the code: 0 if
there is an even number of 1’s, and 1 if there is an odd number.
For instance, if we want to send the message 1011101, we
would instead send the message 11011101. If the message part
got corrupted to 1010101 in transit, then the received message
would be 11010101. The leading 1 says that the rest of the word
should contain an odd number of 1’s, but it contains an even
number, so we would know there is an error (but we would
not know where it is).
This has the advantage of being "cheaper" (our 1 GB movie
would only become 8/7 (1) ≈ 1.14 GB), and the code still detects
one error. But the code cannot correct the error, and it is tough
to see how we could detect more than one error.
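A minimal sketch of the parity check, assuming messages are stored as strings of 0’s and 1’s. The names `add_parity` and `parity_ok` are illustrative and not part of the original slides.

```python
def add_parity(bits: str) -> str:
    """Put a parity digit in front: 0 for an even number of 1's, 1 for odd."""
    return str(bits.count("1") % 2) + bits

def parity_ok(word: str) -> bool:
    """True when the leading digit matches the parity of the remaining digits."""
    return int(word[0]) == word[1:].count("1") % 2

print(add_parity("1011101"))   # the word that is actually sent
print(parity_ok("11011101"))   # received intact
print(parity_ok("11010101"))   # one flipped digit is detected
print(parity_ok("11000101"))   # two flipped digits slip through
```

The last call shows the weakness noted above: two errors restore the parity, so the check passes even though the word is wrong.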
A compromise
We will see that we can develop error-correcting (but relatively
cheap) codes by looking at kernels of particular
homomorphisms. In particular, codes like these are groups.
We will also see that we can use cosets (cosets!) to efficiently
decode the words.
Some definitions
Suppose we have a set C of binary codes. For instance, maybe
C = {000, 001, 010, 011, 100, 101, 110, 111}.
The Hamming distance between two codes is the number of
digits where the two codes disagree. For instance, the
Hamming distance between 010 and 111 is two.
This truly is a distance; let d(x, y) denote the Hamming
distance between any two words x, y ∈ C. Then for all x, y, z ∈ C:
1. d(x, y) ≥ 0
2. d(x, y) = 0 if and only if x = y
3. d(x, y) = d(y, x)
4. d(x, z) ≤ d(x, y) + d(y, z)
Therefore, it fits the definition of "distance."
The weight w(x) of a binary code x is the number of 1’s that
appear in x. For instance, w(101011) = 4 and
w(0000001000) = 1.
Since our codes only have 0’s and 1’s in them, we can view
codes as being made up of elements of Z_2 = {0, 1} under
addition. Suppose that we want our code words to have length
n. Then we want to look at Z_2^n = Z_2 × · · · × Z_2 (n factors).
We will first try to find codes that are able to correct e errors.
We will find subgroups C of Z_2^n such that for all distinct x, y ∈ C,
d(x, y) ≥ 2e + 1.
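The two definitions translate directly into code. The sketch below assumes words are stored as bit strings. The helper `add` (componentwise addition in Z_2) is an illustrative name. It also checks the identity d(x, y) = w(x + y), which holds because x + y has a 1 exactly where x and y disagree.

```python
def weight(x: str) -> int:
    """Number of 1's in the word."""
    return x.count("1")

def distance(x: str, y: str) -> int:
    """Hamming distance: number of positions where x and y differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))

def add(x: str, y: str) -> str:
    """Componentwise sum in Z_2 (XOR of the digits)."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(x, y))

print(distance("010", "111"))   # the example from the definition
print(weight("101011"))
print(distance("010101", "110100") == weight(add("010101", "110100")))
```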
Solve the following:
1. d(101, 110) = 2
2. d(1001, 1011) = 1
3. d(010101, 110100) = 2
4. w(0010000) = 1
5. w(010100) = 2
6. w(1111111) = 7
Theorem
Suppose a subgroup C of Z_2^n is such that, for all distinct x, y ∈ C,
d(x, y) ≥ 2e + 1. Then C can correct any e or fewer errors.
Since we are working with groups, we have an easy way to
determine if d(x, y) ≥ 2e + 1 for all distinct x, y ∈ C:
Theorem
The minimum distance between any two distinct elements x, y ∈ C is
the smallest weight w(z) of any nonzero z ∈ C.
Example
Let C = {00000, 00111, 11100, 11011}. The most recent
theorem says that the smallest distance between any two
of these elements is the minimum of
{w(00111), w(11100), w(11011)} = {3, 3, 4}. So the minimum
distance is 3.
We can verify this by making a table of the Hamming distances:
         00000   00111   11100   11011
00000      0       3       3       4
00111      3       0       4       3
11100      3       4       0       3
11011      4       3       3       0
The minimal distance is 3 = 2(1) + 1, so e = 1 and this code
can correct 1 error.
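As a quick check of this example, the following sketch compares the minimum pairwise distance with the minimum nonzero weight and then reads off e. The variable names are illustrative.

```python
from itertools import combinations

def weight(x: str) -> int:
    return x.count("1")

def distance(x: str, y: str) -> int:
    return sum(a != b for a, b in zip(x, y))

C = ["00000", "00111", "11100", "11011"]

# Minimum over all pairs of distinct code words (the table above).
min_distance = min(distance(x, y) for x, y in combinations(C, 2))
# Minimum over the nonzero code words only (the shortcut from the theorem).
min_weight = min(weight(z) for z in C if weight(z) > 0)
# Largest e with 2e + 1 <= minimum distance.
e = (min_distance - 1) // 2

print(min_distance, min_weight, e)
```

The pairwise computation needs a number of comparisons that grows quadratically with the size of C, while the weight shortcut looks at each code word once.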
Example (Continued)
If we receive a message 00101, we can see that
d(00101, 00111) = 1 = e but d(00101, x) ≥ 2 for all
other x ∈ C. So, if we assume there is only one error, we
can correct this message to be 00111.
If we receive a message 10101, we can see that
d(10101, 00111) = 2 = 2e and d(10101, 11100) = 2, while
d(10101, x) = 3 for the other two x ∈ C. No code word is within
distance 1, and two code words tie for closest. So, we know
there is an error (at least two errors, actually), but cannot correct it.
If a code 00111 is sent, but the message 11011 is received, we
can see that three errors have occurred. But 11011 is a
codeword, so we would not know that any errors had occurred (we
would think that the sender meant to send 11011).
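The three cases above can be reproduced with a brute-force decoder that looks for a code word within distance e of the received word. This is a sketch under the assumption that the code is small enough to scan completely. The function name `nearest_codeword` is made up for illustration.

```python
def distance(x: str, y: str) -> int:
    return sum(a != b for a, b in zip(x, y))

def nearest_codeword(received: str, code, e: int):
    """Return the unique code word within distance e of `received`, else None."""
    close = [c for c in code if distance(received, c) <= e]
    return close[0] if len(close) == 1 else None

C = ["00000", "00111", "11100", "11011"]

print(nearest_codeword("00101", C, 1))   # one error: corrected
print(nearest_codeword("10101", C, 1))   # two errors: detected, not corrected
print(nearest_codeword("11011", C, 1))   # three errors: looks like a valid word
```

Because distinct code words are at least 2e + 1 apart, at most one code word can lie within distance e of any received word, so the list `close` never has more than one entry for this code.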
Theorem
Suppose a subgroup C of Z_2^n is such that, for all distinct x, y ∈ C,
d(x, y) ≥ 2e + 1. Then C can correct any e or fewer errors, or it
can detect 2e or fewer errors.
Proof.
Suppose x ∈ C is sent but y ∈ Z_2^n is received, and
further suppose that at most e errors occurred in transmission.
Then d(x, y) ≤ e.
If z ∈ C is any codeword other than x, then d(x, z) ≥ 2e + 1.
Then
2e + 1 ≤ d(x, z) ≤ d(x, y) + d(y, z) ≤ e + d(y, z).
So 2e + 1 ≤ e + d(y, z), and hence d(y, z) ≥ e + 1 > e ≥ d(x, y).
So y is closer to x than to any other codeword z, and the received
message y is automatically corrected to x.
For detection, suppose instead that between 1 and 2e errors occurred.
Then 1 ≤ d(x, y) ≤ 2e < 2e + 1, so y cannot be a codeword, and we
know that an error has occurred.
Theorem
The minimum distance between any two distinct elements x, y ∈ C is
the smallest weight w(z) of any nonzero z ∈ C.
Proof.
Let d_min denote the minimal distance. Note that
d_min = min{d(x, y) | x, y ∈ C, x ≠ y}
      = min{d(x, y) | x, y ∈ C, x + y ≠ 0}
      = min{w(x + y) | x, y ∈ C, x + y ≠ 0}
      = min{w(z) | z ∈ C, z ≠ 0}.
The third equality holds because x + y has a 1 exactly where x and y
disagree, and the last holds because C is a group, so the sums x + y
range over all of C.
How do we generate these group codes?
Answer: Define a homomorphism φ : Z_2^m → Z_2^n where n < m by
φ(x) = Hx
for all x ∈ Z_2^m and some n × m matrix H. Then the kernel (call it
Ker(H)) of this homomorphism will be our group code. (These
codes clearly form a group, since kernels are subgroups.)
In fact, we will choose H so that H = (A | I_n) for some
n × (m − n) matrix A and the n × n identity matrix I_n. H is called
a parity-check matrix.
We will then have an efficient way of computing the kernel. To
do this, we will form the standard generator matrix
\[ G = \begin{pmatrix} I_{m-n} \\ A \end{pmatrix}, \]
the m × (m − n) matrix with the (m − n) × (m − n) identity matrix
stacked on top of A. A somewhat technical (but easy) proof shows
that G defines an isomorphism ψ : Z_2^{m−n} → Ker(H).
So Im(G) = Ker(H).
Example
Suppose that we want to encode the elements of
Z_2^3 = {000, 001, 010, 011, 100, 101, 110, 111}. Consider
\[ H = \begin{pmatrix}
0 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}, \]
with its associated homomorphism φ : Z_2^6 → Z_2^3. We
will have our group code C = Ker(H). Note that our "A" is
\[ A = \begin{pmatrix}
0 & 1 & 1 \\
1 & 1 & 0 \\
1 & 0 & 1
\end{pmatrix}. \]
Example (Continued)
Suppose we receive a message y = 001001. We can tell if
there has been an error (and we will be able to correct it) if
there is at most one error. To do this, do matrix multiplication:
\[ Hy = \begin{pmatrix}
0 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \neq \vec{0}. \]
Since y is not in the kernel, it is not a code word. So an error
has occurred, and we will see that it can be automatically
corrected.
Example (Continued)
We can generate Ker(H) by computing
\[ G\vec{x} = \begin{pmatrix} I_3 \\ A \end{pmatrix} \vec{x}
= \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 1 & 1 \\
1 & 1 & 0 \\
1 & 0 & 1
\end{pmatrix} \vec{x} \]
for all x ∈ Z_2^3.
Message word x    Code word Gx
000               000000
001               001101
010               010110
011               011011
100               100011
101               101110
110               110101
111               111000
Notice that each element of Z_2^3 maps to something that
resembles itself, but with some extra information tacked on at
the end.
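The table can be regenerated mechanically. The sketch below builds H = (A | I_3) and G from the matrix A of this example, multiplies every message word by G modulo 2, and confirms that each result lies in the kernel of H. Matrices are plain lists of rows, and `mat_vec` is an illustrative helper rather than a library function.

```python
from itertools import product

A = [[0, 1, 1],
     [1, 1, 0],
     [1, 0, 1]]
n = len(A)       # number of parity checks (rows of H)
k = len(A[0])    # message length, m - n

# H = (A | I_n): each row of A followed by the matching identity row.
H = [row + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
# G = identity on top of A.
G = [[int(i == j) for j in range(k)] for i in range(k)] + A

def mat_vec(M, v):
    """Matrix-vector product with entries reduced mod 2."""
    return [sum(a * b for a, b in zip(row, v)) % 2 for row in M]

code = {}
for x in product([0, 1], repeat=k):
    c = mat_vec(G, x)
    assert mat_vec(H, c) == [0] * n          # every code word is in Ker(H)
    code["".join(map(str, x))] = "".join(map(str, c))

for message, word in code.items():
    print(message, word)
```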
Example (Continued)
So C = Ker(H) = Im(G) = {000000, 001101, 010110,
011011, 100011, 101110, 110101, 111000}.
Suppose you receive a message y = 010010, and assume that
it has at most one error. How can you tell what the intended
message was?
1. Compute Hy.
2. If you get the zero vector, y was the message.
3. If you get something other than the zero vector, find out
which column of H the vector Hy corresponds to. Call this
column k.
4. Then your error is in the kth entry of the vector y.
Example (Continued)
Find which code word was intended if the received message
y = 010010 contains at most one error.
1. \[ Hy = \begin{pmatrix}
0 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \neq \vec{0}, \]
so there is an error.
2. This is the 4th column of H.
3. So there is an error in the 4th entry of y.
Therefore, y = 010010 started out as x = 010110.
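The four-step procedure can be sketched as follows, assuming at most one error and words given as bit strings. The names `syndrome` and `correct_single_error` are illustrative. The function returns None when Hy is nonzero but matches no column of H, since a single error cannot explain such a syndrome.

```python
H = [[0, 1, 1, 1, 0, 0],
     [1, 1, 0, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]

def syndrome(H, y: str):
    """Compute Hy over Z_2 as a list of bits."""
    bits = [int(b) for b in y]
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]

def correct_single_error(H, y: str):
    s = syndrome(H, y)
    if not any(s):                      # step 2: zero vector, y is a code word
        return y
    columns = [list(col) for col in zip(*H)]
    if s not in columns:                # no single error gives this syndrome
        return None
    k = columns.index(s)                # step 3: matching column (0-based)
    flipped = "1" if y[k] == "0" else "0"
    return y[:k] + flipped + y[k + 1:]  # step 4: flip that entry

print(correct_single_error(H, "010010"))
print(correct_single_error(H, "001001"))
```

The columns of this H are all distinct and nonzero, which is what makes the column lookup unambiguous.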
Why does this work?
If a received message y contains exactly one error, then
y = c + e_i for some code word c ∈ C and a standard basis
vector e_i = 00 . . . 010 . . . 00 (with the 1 in the ith entry).
From the previous example, 010010 = 010110 + 000100.
Recall from linear algebra that, for any matrix A, Ae_i is equal to
the ith column of A. For instance:
\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} e_2
= \begin{pmatrix} 2 \\ 5 \end{pmatrix}. \]
Then Hy = H(c + e_i) = Hc + He_i = 0 + He_i = He_i, which is the
ith column of H.
Now let
\[ H = \begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 1
\end{pmatrix}. \]
Without computing the code
words (so do not use the standard generator matrix G),
determine what the sent messages were if the following
messages were received:
1. w = 111110: Hw = 111, which is the 3rd column of H, so the sent
message was 111110 + 001000 = 110110.
2. x = 101010: Hx = 000, so the sent message was x.
3. y = 111111: Hy = 110, which does not appear as a column. So y
contains multiple errors.
4. z = 010111: Hz = 100, which is the 4th column of H, so the sent
message was 010111 + 000100 = 010011.
Cosets!
From a previous page, we saw that H(c + e_i) = He_i for all
c ∈ C. This means that every element of the coset C + e_i maps
to the same thing! This leads us to an efficient way of finding
the error.
For every received word y, define Hy to be its syndrome. In the
previous example, we found the syndrome of w to be 111.
Example
Let
\[ H = \begin{pmatrix}
0 & 1 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 1
\end{pmatrix}. \]
A calculation yields C = {00000, 01101, 10011, 11110}. We
can compute the cosets and put them in a table. Each coset
is named by a word of least weight in the coset.
These words are called coset leaders.
Coset        Elements of the coset
C            00000 01101 10011 11110
C + 10000    10000 11101 00011 01110
C + 01000    01000 00101 11011 10110
C + 00100    00100 01001 10111 11010
C + 00010    00010 01111 10001 11100
C + 00001    00001 01100 10010 11111
C + 10100    10100 11001 00111 01010
C + 00110    00110 01011 10101 11000
Example (Continued)
Suppose we receive a word y = 10111. We simply look up
which coset it is in, C + 00100, and we can conclude that the
sent message is y + 00100 = 10111 + 00100 = 10011 ∈ C.
Suppose we receive a word y = 11100. Find what the sent
message was. The coset is C + 00010, so the intended
message was:
y + 00010 = 11100 + 00010 = 11110 ∈ C.
Now suppose y = 01011. We simply look up which coset it is
in, C + 00110. Note that the coset leader for this is 00110 with
w(00110) = 2, so there is more than one error and we cannot
correct it.
Example (Continued)
Finally, we can compile the syndromes of the coset leaders in a
table:
Syndrome   Coset Leader   Because...
000        00000          H00000 = 000
001        00001          H00001 = 001
010        00010          H00010 = 010
011        10000          H10000 = 011
100        00100          H00100 = 100
101        01000          H01000 = 101
110        00110          H00110 = 110
111        10100          H10100 = 111
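The syndrome table can also be built by machine. The sketch below visits all words of Z_2^5 from lightest to heaviest and records the first word seen for each syndrome as that coset's leader. The names `leaders` and `decode` are illustrative. One caveat: a coset may contain several words of least weight, and the tie is broken by visiting order. For syndrome 111 this sketch picks 01010 rather than the 10100 used in the table. Both have weight 2, so the conclusion (more than one error) is unchanged.

```python
from itertools import product

H = [[0, 1, 1, 0, 0],
     [1, 0, 0, 1, 0],
     [1, 1, 0, 0, 1]]

def syndrome(word: str) -> str:
    """Compute H * word over Z_2 and return it as a bit string."""
    bits = [int(b) for b in word]
    return "".join(str(sum(h * b for h, b in zip(row, bits)) % 2) for row in H)

def add(x: str, y: str) -> str:
    """Componentwise sum in Z_2."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(x, y))

# All 32 words, lightest first (sorted is stable, so ties keep lexicographic order).
words = sorted(("".join(p) for p in product("01", repeat=5)),
               key=lambda w: w.count("1"))

leaders = {}
for w in words:
    leaders.setdefault(syndrome(w), w)   # keep the first (lightest) word per syndrome

def decode(y: str, max_errors: int = 1):
    """Subtract the coset leader; give up if the leader is too heavy."""
    leader = leaders[syndrome(y)]
    if leader.count("1") > max_errors:
        return None
    return add(y, leader)

for s in sorted(leaders):
    print(s, leaders[s])
```

For example, `decode("10111")` looks up syndrome 100, finds the leader 00100, and returns 10011.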
Example (Continued)
Now, suppose we receive messages x = 10111, y = 00101,
and z = 01011. We can determine the intended messages by
finding the syndromes of each:
1. Hx = H10111 = 100. This corresponds to the coset leader
00100, so the intended message was
10111 + 00100 = 10011 ∈ C.
2. Find the intended message for y. Hy = H00101 = 101.
This corresponds to the coset leader 01000. So the
intended message was 00101 + 01000 = 01101 ∈ C.
3. Hz = H01011 = 110. This corresponds to the coset leader
00110, and w(00110) = 2 > 1! So there were multiple
errors in z, and we cannot correct them.
Why would we want to use coset decoding?
It saves memory. If you have a lot of codes, say you turned
words from Z_2^24 into words in Z_2^32, you would have
2^24 = 16,777,216 code words. So if you wanted to check to see if
your message was a code word, you would need to check it
against almost 17 million things.
On the other hand, there are only 2^(32−24) = 2^8 = 256 cosets!
(Let G = Z_2^32, which has a subgroup H isomorphic to Z_2^24, and
|G : H| = |G| / |H| = 2^32 / 2^24 = 2^8 by Lagrange’s Theorem.)
So you could either store 2^24 things in memory, or 256 if you
use coset decoding.