DISTRIBUTED ARITHMETIC ARCHITECTURE FOR IMAGE CODING
S. N. Merchant & B. V. Rao
A.C.R.E., I.I.T., Powai, BOMBAY-400 076
ABSTRACT
The aim of this paper is to describe the development of a hardware circuit based on distributed arithmetic architecture to obtain the fast DCT of a given image. It not only provides DCT transform coefficients but also other transforms' coefficients. In addition to this it also provides inverse transforms with little or no change in components/interconnections. It has also been demonstrated that the distributed arithmetic circuit can be used to obtain a fourth order FIR/IIR filter.
1. INTRODUCTION
Orthogonal transforms have been used for a long time in the field of image coding to reduce the amount of data needed to store/transmit an image. Many transforms have been used for this purpose. Of all these transforms, the discrete cosine transform has been found to be the most suitable for image coding, since it gives the best compression ratio for a given amount of mean-square error.
The aim of this paper is to describe the development of a hardware circuit based on distributed arithmetic architecture to obtain the fast DCT of a given image. The image is first broken into small blocks of 8x8 matrices. The hardware circuit developed and described in this paper not only provides DCT transform coefficients but also other transforms' coefficients. In addition to this it also provides inverse transforms with little or no change in components/interconnections. The change in the circuit for different transforms is just a change in the look-up-table stored in the ROM.
It has also been demonstrated that the distributed arithmetic circuit can be used to obtain a fourth order FIR filter. The modification needed is that input data bits are given serially to the serial input of one of the input registers. We have also shown that a fourth order IIR filter can be implemented using the 4 point distributed arithmetic circuit if four additional parallel to serial registers are used.
2. DISTRIBUTED ARITHMETIC PROCESSING
In image processing one has to obtain the following function

$$ y = \sum_{n=1}^{N} a_n x_n \tag{1} $$

where $a_n$ is often a set of predetermined coefficients, and $x_n$ are the data values. If the data values are such that $|x_n| < 1$ and are represented in signed 2's complement code of B bits accuracy, then we can write the above equation as

$$ y = \sum_{n=1}^{N} a_n \left[ \sum_{k=1}^{B-1} x_n^k \, 2^{-k} - x_n^0 \right] \tag{2} $$

where $x_n^k$ is the k-th bit of $x_n$ and $x_n^0$ is its sign bit. Interchanging the order of summation over indices n and k yields

$$ y = \sum_{k=1}^{B-1} 2^{-k} \sum_{n=1}^{N} a_n x_n^k - \sum_{n=1}^{N} a_n x_n^0 \tag{3} $$

We now define a function F with N binary valued arguments as follows

$$ F(x_1^k, x_2^k, \ldots, x_N^k) = \sum_{n=1}^{N} a_n x_n^k = F_k \tag{4} $$

Then we can write y as

$$ y = \sum_{k=1}^{B-1} 2^{-k} F(x_1^k, x_2^k, \ldots, x_N^k) - F(x_1^0, x_2^0, \ldots, x_N^0) \tag{5} $$

Thus, given the value of the function $F(x_1^k, x_2^k, \ldots, x_N^k)$, it is possible to compute y by using additions (subtraction for k=0) and shift operations only. Since the arguments of 'F' can take values 0 or 1 only, F has a finite number (equal to $2^N$) of possible outcomes. This set of possible outcomes is stored in a memory as a look-up-table. The look-up-ROM is accessed using the arguments of 'F' as address. The output of the ROM, i.e. '$F_k$', is given to ALU units where appropriate addition and shift operations are performed. This method of obtaining $y = \sum_{n=1}^{N} a_n x_n$ is known as a distributed arithmetic system.

3. IMPLEMENTATION OF A DISTRIBUTED ARITHMETIC SYSTEM
We have represented the equation $y = \sum_{n=1}^{N} a_n x_n$ as

$$ y = \sum_{k=1}^{B-1} 2^{-k} F(x_1^k, x_2^k, \ldots, x_N^k) - F(x_1^0, x_2^0, \ldots, x_N^0) $$

For simplicity we assume B = 8, and then

$$ y = \sum_{k=1}^{7} 2^{-k} F(x_1^k, x_2^k, \ldots, x_N^k) - F(x_1^0, x_2^0, \ldots, x_N^0) \tag{6} $$

Let $F(x_1^k, x_2^k, \ldots, x_N^k)$ be represented as '$F_k$'; then

$$ y = \sum_{k=1}^{7} 2^{-k} F_k - F_0 \tag{7} $$
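The table look-up scheme of equations (4) and (5) can be sketched in software. The Python below is only an illustration under stated assumptions, not the circuit of this paper: the helper names `to_bits`, `build_lut` and `da_inner_product`, the sample coefficients and data (N = 4), and the convention that address bit n carries the bit taken from the (n+1)-th data word are all hypothetical choices made for the example.

```python
def to_bits(x, B=8):
    """Bits [x^0, x^1, ..., x^(B-1)] of x in B-bit signed 2's complement,
    so that x = -x^0 + sum over k >= 1 of x^k * 2**-k, for -1 <= x < 1."""
    scaled = round(x * 2 ** (B - 1)) & ((1 << B) - 1)  # wrap negatives into B bits
    return [(scaled >> (B - 1 - k)) & 1 for k in range(B)]


def build_lut(a):
    """All 2**N values of F; bit n of the address is the bit of data word n+1."""
    N = len(a)
    return [sum(a[n] for n in range(N) if (addr >> n) & 1)
            for addr in range(2 ** N)]


def da_inner_product(a, x, B=8):
    """Evaluate y = sum a_n x_n as sum_{k=1}^{B-1} 2**-k * F_k - F_0."""
    lut = build_lut(a)
    bits = [to_bits(xn, B) for xn in x]

    def F(k):
        # Form the ROM address from the k-th bit of every data word.
        addr = sum(bits[n][k] << n for n in range(len(x)))
        return lut[addr]

    return sum(2.0 ** -k * F(k) for k in range(1, B)) - F(0)


# Hypothetical coefficients and data, chosen to be exactly representable in 8 bits.
a = [0.25, -0.5, 0.75, 0.125]
x = [0.5, -0.25, 0.375, -0.75]
direct = sum(an * xn for an, xn in zip(a, x))
print(da_inner_product(a, x), direct)
```

The look-up table replaces every multiplication: only table reads, additions, one subtraction for the sign bit and scalings by powers of two remain. Data values that are not multiples of $2^{-(B-1)}$ would be rounded by `to_bits`, so the result would then differ from the direct sum by the quantisation error.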
CH2766-4/89/0000-0074 © 1989 IEEE
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on January 7, 2009 at 06:53 from IEEE Xplore. Restrictions apply.
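Writing the sum in equation (7) in nested form gives

$$ y = ((((((F_7 \cdot 2^{-1} + F_6) \cdot 2^{-1} + F_5) \cdot 2^{-1} + F_4) \cdot 2^{-1} + F_3) \cdot 2^{-1} + F_2) \cdot 2^{-1} + F_1) \cdot 2^{-1} - F_0 \tag{8} $$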
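As a small check of the nested form, consider a hypothetical case that is not taken from the paper: N = 2, B = 3, $a_1 = 0.5$, $a_2 = 0.25$, $x_1 = 0.75$ and $x_2 = -0.5$. With only three bits the nesting has two levels instead of seven.

```latex
\begin{align*}
x_1 &= 0.75 = 0.11_2, \qquad x_2 = -0.5 = 1.10_2 \\
F_0 &= a_1 \cdot 0 + a_2 \cdot 1 = 0.25 \\
F_1 &= a_1 \cdot 1 + a_2 \cdot 1 = 0.75 \\
F_2 &= a_1 \cdot 1 + a_2 \cdot 0 = 0.5 \\
y &= (F_2 \cdot 2^{-1} + F_1) \cdot 2^{-1} - F_0
   = (0.25 + 0.75) \cdot 0.5 - 0.25 = 0.25
\end{align*}
```

The direct evaluation $a_1 x_1 + a_2 x_2 = 0.375 - 0.125 = 0.25$ gives the same value.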
(8)
The above equation can now be implemented using the following hardware components:
(1) Eight 8-bit registers (parallel to serial).
(2) A ROM of capacity $2^8 = 256$ locations which stores the look-up-table.
(3) A register which acts as a memory data register or pipeline register, so that when the adder acts on '$F_k + F_{k-1}$', the ROM is being accessed by the $x^{k-2}$ bits to obtain $F_{k-2}$.
(4) An adder/subtractor.
(5) A temporary register to store intermediate results.
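The way these components cooperate can be sketched as a bit-serial simulation. This is a behavioural model under assumptions, not a description of the actual circuit: the names `ShiftRegister`, `make_rom` and `da_serial` are hypothetical, the ROM holds floating point values rather than fixed-width words, and the overlap of ROM access and addition provided by the pipeline register is not modelled (each clock step is executed sequentially).

```python
def make_rom(coeffs):
    """Look-up table holding every partial sum of the coefficients."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]


class ShiftRegister:
    """Parallel-in, serial-out register; the least significant bit leaves first."""

    def __init__(self, word, width=8):
        self.value = word & ((1 << width) - 1)  # keep the 2's complement bit pattern

    def shift_out(self):
        bit = self.value & 1
        self.value >>= 1
        return bit


def da_serial(coeffs, words, width=8):
    """Inner product of coeffs with 2's complement words (value = word / 2**(width-1))."""
    rom = make_rom(coeffs)
    regs = [ShiftRegister(w, width) for w in words]
    acc = 0.0                                   # temporary register
    for step in range(width):
        addr = sum(r.shift_out() << i for i, r in enumerate(regs))
        mdr = rom[addr]                         # memory data register
        if step < width - 1:
            acc = (acc + mdr) * 0.5             # add, then shift right by one bit
        else:
            acc = acc - mdr                     # last bit is the sign bit: subtract
    return acc


# Hypothetical 8-coefficient example with 8-bit data words.
coeffs = [0.5, -0.25, 0.125, 0.75, -0.5, 0.25, -0.125, 0.0625]
words = [64, -32, 48, -96, 127, -128, 1, 0]
print(da_serial(coeffs, words), sum(c * w / 128 for c, w in zip(coeffs, words)))
```

Processing the least significant bit first is what lets a single adder/subtractor and one right shift per clock realise the nested form of equation (8).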
4. DISTRIBUTED ARITHMETIC & DISCRETE COSINE TRANSFORM
The discrete cosine transform of a data sequence '$X_m$' is given as

$$ Y_k = \sum_{m=0}^{N-1} a_{km} X_m, \qquad k = 0, 1, \ldots, N-1 \tag{9} $$

where

$$ a_{km} = c_k \cos\left[\frac{(2m+1)k\pi}{2N}\right], \qquad c_k = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k \neq 0 \end{cases} \tag{10} $$

Thus each '$Y_k$' can be treated as an equation of the type of (1). Thus if N = 8, then for each $Y_k$ (represented in 8 bits) we need a ROM of 256 locations. Fig. 1 gives the implementation of an 8-point DCT using the distributed arithmetic circuit.
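The ROM contents for one output coefficient can be generated directly from the transform coefficients. The sketch below assumes the scaling $c_k = 1/\sqrt{2}$ for k = 0 and 1 otherwise; the function names `dct_row` and `rom_for_row` and the address convention (bit m of the address belongs to $X_m$) are hypothetical.

```python
import math


def dct_row(k, N=8):
    """Coefficients a_km, m = 0..N-1, for output Y_k (assumed scaling c_k)."""
    ck = 1 / math.sqrt(2) if k == 0 else 1.0
    return [ck * math.cos((2 * m + 1) * k * math.pi / (2 * N)) for m in range(N)]


def rom_for_row(k, N=8):
    """2**N-entry look-up table: entry addr is the sum of a_km over set bits m."""
    row = dct_row(k, N)
    return [sum(row[m] for m in range(N) if (addr >> m) & 1)
            for addr in range(2 ** N)]


# One 256-location ROM per output coefficient of the 8-point DCT.
roms = [rom_for_row(k) for k in range(8)]
print(len(roms), len(roms[0]))
```

Changing the transform only changes what `dct_row` returns, which mirrors the statement that a different transform needs only a different look-up-table.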
Even though 16 registers is not a small number (each register having a parallel to serial function), the ROM size of $2^{16}$ is too large and expensive to implement. So if one has to develop a distributed arithmetic system to implement a 16-point DCT, then the memory size has to be brought down to some practical level. With this aim, it can be shown that if '$X_m$', m = 0, 1, 2, ..., N-1, is an N-point input sequence, then the DCT is given by

$$ Y_k = \sum_{m=0}^{(N/2)-1} b_{km} Z_m, \qquad k \text{ even} \tag{11} $$

$$ Y_k = \sum_{m=0}^{(N/2)-1} c_{km} S_m, \qquad k \text{ odd} \tag{12} $$

where

$$ Z_m = X_m + X_{N-1-m}, \qquad S_m = X_m - X_{N-1-m}, \qquad b_{km} = c_{km} = c_k \cos\left[\frac{(2m+1)k\pi}{2N}\right] $$

Thus if one knows '$Z_m$' and '$S_m$', then one can obtain the even and odd numbered DCT output coefficients separately. Also, since the summation is now done only for N/2 terms, we need a memory capacity of $2^{N/2}$ to obtain one even or odd term of the DCT. Thus the total memory capacity needed will be $N \cdot 2^{N/2}$ $\left(\frac{N}{2} \cdot 2^{N/2} + \frac{N}{2} \cdot 2^{N/2}\right)$.
A block schematic to obtain the DCT using pre-addition and pre-subtraction is shown in Fig.2 for N=8.
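The pre-addition and pre-subtraction step, and the memory saving it brings, can be checked numerically. The sketch below is an illustration only: the function names are hypothetical and the scaling $c_k$ is assumed to be $1/\sqrt{2}$ for k = 0 and 1 otherwise (the even/odd split itself holds for any scaling of the rows).

```python
import math


def coeff(k, m, N):
    ck = 1 / math.sqrt(2) if k == 0 else 1.0
    return ck * math.cos((2 * m + 1) * k * math.pi / (2 * N))


def dct_direct(x):
    """N-term sum for every output coefficient."""
    N = len(x)
    return [sum(coeff(k, m, N) * x[m] for m in range(N)) for k in range(N)]


def dct_even_odd(x):
    """N/2-term sums on pre-added (even k) or pre-subtracted (odd k) data."""
    N = len(x)
    half = N // 2
    z = [x[m] + x[N - 1 - m] for m in range(half)]   # pre-addition
    s = [x[m] - x[N - 1 - m] for m in range(half)]   # pre-subtraction
    return [sum(coeff(k, m, N) * (z if k % 2 == 0 else s)[m] for m in range(half))
            for k in range(N)]


def rom_words(N, split):
    """Total look-up-table locations for all N outputs."""
    return N * 2 ** (N // 2) if split else N * 2 ** N


x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
print(max(abs(d - e) for d, e in zip(dct_direct(x), dct_even_odd(x))))
print(rom_words(8, split=False), rom_words(8, split=True))
print(rom_words(16, split=False), rom_words(16, split=True))
```

For N = 8 the table storage falls from 8 x 256 = 2048 to 8 x 16 = 128 locations, and for N = 16 from 16 x 65536 to 16 x 256 = 4096 locations.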
5. IMPLEMENTATION OF 8-POINT DCT USING DISTRIBUTED ARITHMETIC
It was shown above that an 8-point DCT can be obtained using a pair of 4-point distributed arithmetic circuits. But since the two 4-point distributed arithmetic circuits differ only in the look-up-tables stored in the ROMs, if both the look-up-tables (one for even numbered and one for odd numbered coefficients) are stored in a single ROM, then only one 4 point distributed arithmetic circuit is enough. Of course, the speed is now reduced by an equivalent factor. Hence, to reduce the hardware, a 4 point distributed arithmetic circuit was implemented to obtain the DCT of an N x N image matrix in blocks of 8 x 8.
A 4-point distributed arithmetic system to obtain the 8 point DCT using a microprocessor, as mentioned above, is shown in Fig.3.
The memory is shown as a 16 x 16 array, since 16 is the next higher multiple of 8. Therefore any program written to obtain the DCT of a 16 x 16 image in blocks of 8 x 8 can be modified, with a few changes in some microprocessor instructions, to obtain the DCT of a bigger sized image matrix.
A two dimensional DCT of an image is obtained by row transformations followed by column transformations.
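The row-then-column procedure on 8 x 8 blocks can be sketched in plain Python. This illustrates the data flow only: the 1-D transform is computed directly here rather than by distributed arithmetic, the scaling $c_k$ (1/sqrt(2) for k = 0, 1 otherwise) is an assumption, the function names are hypothetical, and the image side is assumed to be a multiple of the block size.

```python
import math


def dct_1d(v):
    """1-D DCT of a sequence with the assumed scaling c_k."""
    N = len(v)
    return [sum((1 / math.sqrt(2) if k == 0 else 1.0)
                * math.cos((2 * m + 1) * k * math.pi / (2 * N)) * v[m]
                for m in range(N))
            for k in range(N)]


def dct_2d_block(block):
    rows = [dct_1d(r) for r in block]                  # row transformations
    cols = [dct_1d(list(c)) for c in zip(*rows)]       # column transformations
    return [list(r) for r in zip(*cols)]               # transpose back


def blockwise_dct(image, size=8):
    """Transform a square image block by block; side must be a multiple of size."""
    n = len(image)
    out = [[0.0] * n for _ in range(n)]
    for r0 in range(0, n, size):
        for c0 in range(0, n, size):
            block = [row[c0:c0 + size] for row in image[r0:r0 + size]]
            coeffs = dct_2d_block(block)
            for i in range(size):
                out[r0 + i][c0:c0 + size] = coeffs[i]
    return out


# A constant 16 x 16 image: every 8 x 8 block should have only a DC term.
image = [[1.0] * 16 for _ in range(16)]
result = blockwise_dct(image)
print(result[0][0], result[8][8], result[3][5])
```

Moving to a larger image changes only the loop bounds over `r0` and `c0`, which is the software counterpart of changing a few microprocessor instructions.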
6. APPLICATIONS OF 4 POINT DISTRIBUTED ARITHMETIC CIRCUIT
(1) In addition to obtaining the DCT and IDCT, the same circuit with some modifications and different look-up-tables (ROM) can be used to obtain the Slant transform, Walsh-Hadamard transform, and discrete Fourier transform and their inverses.
(2) The 4 point distributed arithmetic circuit can be used to obtain a fourth order FIR filter. The modification needed is that input data bits are given serially to the serial input of one of the input registers. This type of scheme to obtain a fourth order FIR filter is shown in Fig.4.
(3) A fourth order IIR filter can also be implemented using the 4 point distributed arithmetic circuit, if four additional parallel to serial registers are used. Fig.5 shows a block diagram to implement a fourth order IIR filter.
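The FIR application in item (2) can be sketched as follows. This is a word-level model under assumptions, not the circuit of Fig.4: the four data words of the distributed arithmetic circuit are taken to be the four most recent input samples, the chaining of registers is modelled as a list-based delay line, and the names `make_rom`, `da_dot` and `fir_da` as well as the coefficient values are hypothetical.

```python
def make_rom(taps):
    """Look-up table of all partial sums of the filter coefficients."""
    n = len(taps)
    return [sum(t for i, t in enumerate(taps) if (addr >> i) & 1)
            for addr in range(1 << n)]


def da_dot(rom, words, width=8):
    """Bit-serial inner product; each word is a 2's complement integer
    whose value is word / 2**(width - 1)."""
    acc = 0.0
    for shift in range(width):                  # shift = 0 selects the LSB
        addr = sum(((w >> shift) & 1) << i for i, w in enumerate(words))
        if shift < width - 1:
            acc = (acc + rom[addr]) * 0.5       # add, then shift right
        else:
            acc -= rom[addr]                    # sign bit: subtract
    return acc


def fir_da(taps, samples, width=8):
    """Filter integer samples; delay[i] holds the sample x[n - i]."""
    rom = make_rom(taps)
    delay = [0] * len(taps)
    out = []
    for s in samples:
        delay = [s] + delay[:-1]                # newest sample enters first
        out.append(da_dot(rom, delay, width))
    return out


# Impulse of amplitude 64/128 = 0.5 through a hypothetical four-coefficient filter.
taps = [0.5, 0.25, -0.25, 0.125]
print(fir_da(taps, [64, 0, 0, 0]))
```

An IIR filter could be modelled in the same way by extending the ROM address with bits taken from previously computed outputs, which corresponds to the four additional registers mentioned in item (3).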
REFERENCES
[1] A. Peled & B. Liu, Digital Signal Processing - Theory, Design and Implementation, New York, John Wiley & Sons, 1976.
[Figure: 4 pt. DISTRIBUTED ARITHMETIC CIRCUIT TO OBTAIN 8 pt. DCT. Blocks shown include parallel to serial input registers, ROM, memory data registers, adder and output registers.]
[FIG. 4: 4th ORDER FIR FILTER USING 4 pt. DISTRIBUTED ARITHMETIC]
[FIG. 5: 4th ORDER IIR FILTER USING 4 pt. DISTRIBUTED ARITHMETIC. Blocks shown include ROM and memory data register.]