資訊錯誤偵測與修正

ERROR-DETECTING AND ERRORCORRECTING CODES
․A simple scheme for detecting errors is to append a parity
check digit to each block of message. Even parity: the total
number of 1s in a block is even.
Thus, for the message BAT, which encoded as
01000010 01000001 01010100
will be transmitted as 010000100 010000010 010101001.
․Suppose BAT is received as 010000110 010000010
010101001. We know one digit is wrong in the first block.
․Another error-detecting scheme is to transmit each block twice.
For instance, BAT is transmitted as
01000010 01000010 01000001 01000001
01010100 01010100
Again, received message is as the following, revealing the 8th
digit is incorrect in the 1st block.
01000011 01000010 01000001 01000001
01010100 01010100
․Definition:
(1) message word: message block; message length k = 8
(2) code-word: bit string to be transmitted for a message word;
code-word length n = 9 (the former), n = 16 (the latter)
(3) efficiency of (k,n)-block code, i.e. k/n: 8/9 (the former),
8/16 (the latter)
․編碼(E)/解碼(D)函數: one-to-one function
and E = D-1 (inverse func of D), D = E-1
message w1, w2, w3…
to be encoded as E(w1), E(w2), E(w3), …..
decoded as D(E(w1)) = w1 if correct, D(E(w2)) = w2, ….
Ex31: By transmitting each block 3 times, we get
01000010 01000010 01000010 01000001 01000001
01000001 01010100 01010100 01010100
Although efficiency is low (k/n=8/24), it is possible to identify
the intended code-word even if transmitted wrongly. The 8th
bit is incorrectly received as follows: 01000011 01000010
01000010 …., but we assume correct code-word is
01000010 01000010 01000010 …. dut to probability.
Ex32: 兩字碼如後: 01001100 01001101, 若錯誤發生在這兩個
字碼的最後一個位元, 系統不會察覺錯誤已發生; 所以欲擁有
錯誤修正能力, 不只要求編/解碼函數為一對一函數, 另外還需
下述的論辯.
․Hamming distance: d(010101, 011001) = 2
d(11010100, 01111110) = 4
․The Triangle Inequality: d(c1, c3) ≤ d(c1, c2) + d(c2, c3)
if c1, c2 and c3 are code-words of the same length.
接收到一字碼 c, 系統解碼成 D(c) ; 但, 接收到不是字碼的 c’ 碼,
系統還是會解碼成 D(c) , 如果只有 d(c,c’) 最小. 如果有兩個以
上的字碼一樣的最靠近 c’ , 則系統記錄有錯誤發生; 接收到不是
字碼的 z 碼, 若D(z)沒有定義 (例如雜訊), 系統應記錄: 錯誤發生
․假設一編碼系統的字碼距離都至少為 3 , 且接收到不是字碼的
c’ 碼; 如果已知傳送時發生單一錯誤, 設 d(c6, c’) = 1 , 則 codeword c6 必為原先所傳輸的字碼.
Suppose a code-word c7≠ c6 ; with the Triangle Inequality,
3 ≤ d(c6, c7) ≤ d(c’, c6) + d(c’, c7) = 1 + d(c’, c7)
 2 ≤ d(c’, c7)
但, 已知傳送時只發生單一錯誤; 故, c7不可能為原先傳輸的字碼.
Theorem: The min Hamming distance of a block code is m,
then the sys
(1) can detect r or fewer errors if and only if m ≥ r + 1
(2) can correct r or fewer errors if and only if m ≥ 2r + 1
․好的編碼--修正區塊中儘可能多的錯誤--其字碼間的最小距離
須愈大愈好
MATRIX CODES
․Define Wm to be the set of all strings of 0s and 1s with length
m. For example,
W3 = {000, 001, 010, 011, 100, 101, 110, 111}
the encoding func for a (k, n)-block code is a one-to-one func
E: Wk → Wn . Suppose A is a kxn matrix of 1s and 0s, and x
is a 1xn matrix of 1s and 0s. The product xA over Z2 is a 1xn
matrix of 1s and 0s. Hence, an element in Wm can be regarded
as a 1xm matrix of 1s and 0s, and then the product of an
element x in Wk and matrix A is the element xA in Wn .
Ex33: Let
1 0 0 1
A  0 1 1 1
1 0 1 1
and x = [1 1 0] , y = [0 1 0]
xA = [1 1 1 0] is an element of W4
yA = [0 1 1 1]
․E defines a matrix code: when E is one-to-one and
E: Wk → Wn defined by E(x) = x A
A: generator matrix
․To ensure that E is one-to-one is to choose the first k columns
of A form the kxk identity matrix Ik. Denote A having this form
by writing A = [Ik |J] , where J is a kx(n-k) matrix whose
columns are the last n-k columns of A. For such matrix A, the
first k entries of xA will be the same as in x.
Hence, if x1A = x2A then x1 = x2 .
1 0 0 1 0 0 1
Ex34: A  0 1 0 0 1 0 1
0 0 1 0 0 1 1
1 0 0 1
then J  0 1 0 1


0 0 1 1
․前述的生成矩陣 A 為3x7矩陣, 其對應的矩陣碼為 (3,7)-區塊碼;
對任何W3中元素, [w1 w2 w3], 可得
[w1 w2 w3]A = [w1 w2 w3 w1 w2 w3 w1+w2+w3]
i.e. 對訊息字元w而言, 其對應字碼, E(w) = wA, 是重複w兩次
再加上其同位位元; 編碼如下:
E([0 0 0]) = [0 0 0 0 0 0 0] = c1
E([0 0 1]) = [0 0 1 0 0 1 1] = c2
:
:
E([1 1 1]) = [1 1 1 1 1 1 1] = c8
Please note d(c1, c2) = 3, d(c1, c8) = 8 …, so it can detect 2
bits error and correct one bit error according to the theorem
above.
The Check Matrix of a Code
․Define a nx(n-k) matrix A* so that its first k rows are the
corresponding rows of J and its last n-k rows are those of
the (n-k)x(n-k) identity matrix.
 J 
Symbolically, we write A*  
 and is called the check


 I nk 
matrix associated with A.


1 0 0 1 0 1 0


Ex35: A  0 1 0 1 1 0 1
0 0 1 0 1 1 1


1
1

0
A*  
1
0

0

0
1 0 1 0
0 0 0 0
1 1 0 1
0 0 0 0
J

AA
*






0 1 1 1
0 0 0 0
0
1
1
0
1
0
1
0
1
0
0
1
0
0
0
1

1
0
0

0

1
is not coincident!
Theorem: A = [Ik |J] be the generator matrix for a (k, n)-block
code, and let A* be its check matrix.
(1) In the kx(n-k) matrix AA*, each entry equals 0.
(2) A word c of length n is a code-word iif cA* = 0
(a 1x(n-k) zero matrix)


 1 0 0 1
Ex36: ( ref ex34)


 0 1 0 1
For x = [1 0 1 0 0 1 1],
its

0
A*  
1
0

0

0
0
0
1
0
1
0
0
1
0
0

1
0
0 
0

1
we have xA = [1 0 0 1]
Because xA ≠0, theorem
implies x is not a codeword.
Ex37: Justify the condition that [w1 w2 w3 w4 w5 w6 w7]A*
= [0 0 0 0]
Since [w1 w2 w3 w4 w5 w6 w7]A*
= [w1+w4 w2+w5 w3+w6 w1+w2+w3+w7]
․We get w1 = -w4, i.e. w1= w4 (over Z2)
then w2 = w5, w3 = w6, and w1 + w2 + w3 = -w7 = w7
Hence, w7 = 0 if [w1 w2 w3] contains an even number of 1s,
and w7 = 1 if [w1 w2 w3] contains an odd number of 1s.
․Thus [w1 w2 w3 w4 w5 w6 w7]A* = [0 0 0 0] when [w4 w5 w6]
is identical to [w1 w2 w3], and w7 is a check bit of 1 or 0
according to whether the number of 1s in [w1 w2 w3] is even
or odd. But these are exactly the requirements that make xA*
a codeword for this code.
MATRIX CODES THAT CORRECT ALL
SINGLE-DIGIT ERRORS
Theorem: A code can correct all single-bit errors iif its check
matrix satisfies the following two conditions:
(1) No row of A* consists entirely of 0s.
(2) No two row of A* are identical.
Ex38: (ref ex35) The codeword corresponding to the message
word x = [1 0 1] is xA = [1 0 1 1 1 0 1].
Because this is a (3, 7)-block code, the first 3 bits are the
original message word.
Similarly, the codeword [0 1 1 1 0 1 0] decodes as [0 1 1].
Note: In step 1, the wA* is named as syndrome of w.
Ex39: Decode the message
0010111 1011100 0111010 1000111 1100010
which was sent using the code in ex 35.
The decoding of this message, using the check matrix row
decoding method, is shown in the following table:
Received
word w
Syndrome
wA*
0010111
1011100
0111010
1000111
1100010
0000
0001
0000
1101
0101
Row of A*
-7
-2
not a row of
A*
Codeword
0010111
1011101
0111010
1100111
unknown
Message
word
001
101
011
110
unknown
Since the syndrome of the last word is nonzero, but not a row of
A*, there must have been 2 or more errors in the transmission.
Hamming Code
․回憶 ex37 的 (3, 7)-碼, 能夠修正所有單一位元錯誤, 其效率為
3/7 ; 似乎不夠好, 不禁要問: 能夠修正所有單一位元錯誤且最有
效率的 (3, n)-碼 為何?
P. 51 的定理透露端倪, 如果 n=3 則每個收到的字元皆為字碼;
因而無法偵測到傳輸錯誤. 假設 A = [Ik |J] 是 (3, 4)-碼 的生成
矩陣, 則J為3x1矩陣, 會有兩相同列; 所以A*有兩個相同的列,
故無法修正所有單一位元錯誤.
設 A = [Ik |J] 是 (3, 5)-碼 的生成矩陣, 則J為3x2矩陣, 會有兩相
同列; 所以A*無法修正所有單一位元錯誤.
然而, 若 A 是 (3, 6)-碼 的生成矩陣, 則J為3x3矩陣, 很容易找到
1 1 0
A*, 例如
J  1 0 1
, 故 (3, 6)-碼 可以修正所有單一位元錯誤.
0 1 1
Theorem: A (k, n)-code exists that corrects all single-bit errors
iif 2n-k – 1 ≥ n.
Setting r = n – k, n = 2r – 1 i.e. n is 1 less than a power of 2.
We require r > 1 because r = 1 implies that k = 0 (wrong!)
The first five such values of k and n are given in the following:
r
k = 2r – r - 1
2
1
3
4
4
11
5
26
6
57
n = 2r - 1
3
7
15
31
63
For these pairs of k and n, we can construct a (k, n)-code that
corrects all single-bit errors. The check matrix, which is nxr, has
n rows, each of length r. But there are only 2r such rows for A*,
so each nonzero row must be used. A matrix code for which the
check matrix has this property is called a Hamming code.
Ex40: The preceding table shows that the Hamming code for r = 2 has k = 1
and n = 3. Thus the check matrix is
? ? 
1 1 
J 
J 
A*     1 0  A*     1 0
 I 2  0 1 
 I 2  0 1 




since the rows of A* must be nonzero
and distinct, so ? must be 1.
In this case, for every message word [x] we have [x]A = [x x x]
and so the code repeats each message word three times.
The decoding of each possible word that can be received
is given in the following:
Word received
syndrome
row of A*
codeword
Message word
000
001
010
011
100
101
110
111
00
01
10
11
11
10
01
00
-3
2
1
1
2
3
--
000
000
000
111
000
111
111
111
0
0
0
1
0
1
1
1
• Suppose we have a Hamming code, and a word c’ is received.
The syndrome of c’ is s = c’A*. If s ≠ 0, then s appears as a
row of A* because every nonzero word of length r appears as
a row of A*. Let row i of A* equal s. Then c’ will be decoded as
the first k bits of c, where c is the word obtained by changing
the ith bit of c’. Thus when check matrix row decoding is used,
step 4 never happens, and every received word can be
decoded.
• The efficiency of the Hamming code for a particular value of r
is
k 2r  r  1
r
 r
 1 r
n
2 1
2 1
and
k
 1 when
n
r 
Thus there are Hamming codes with efficiencies that is close to 1.