Algorithm and Architecture Design of Power

Yu-Han Chen, Tung-Chien Chen,
Chuan-Yung Tsai, Sung-Fang Tsai,
and Liang-Gee Chen, Fellow, IEEE
IEEE CSVT 2009
■ Introduction
■ Integer motion estimation
■ Fractional motion estimation
■ Parameterized power-scalable encoding system
■ Flexible system architecture
■ Implementation results
■ Conclusion
[Figure: battery capacity and lifetime with a power-aware encoder]

A power-aware encoder can adjust its power consumption in response to different conditions, e.g., the user’s preferences and the battery state.
Power-aware encoder

This paper provides multiple operating configurations between points C and D, so the encoder can adapt to different environmental conditions.
■ Integrates low-power design techniques at both the algorithm level and the architecture level.
• Hardware-oriented fast algorithm
– Improves data reuse capability.
• Content-aware algorithm
– Achieves a good trade-off between coding performance and computational complexity.
■ Parallel-VBS-IME algorithm
• Computes the matching costs of all block sizes for the same MV candidate simultaneously.
■ Intra-candidate data reuse
• The costs of the 4×4 blocks are computed first; the costs of larger block sizes are then obtained immediately by summing the corresponding 4×4 costs.
■ Inter-candidate data reuse
• For two horizontally neighboring candidates of a 16×16 block, 16×15 reference pixels overlap and can be shared.
■ Parallel-VBS-FSS
• Good for inter-candidate data reuse.
• Parallel-VBS-IME is adopted.
[Figure: the search moves to the locally best candidate and stops when the locally best candidate is at the center]
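The stopping rule in the figure (move to the locally best candidate, stop when it is at the center) can be modeled as an iterative local pattern search. This is a minimal sketch, assuming a 3 × 3 pattern with a unit step and a square search window; the actual FSS step sizes are not reproduced. `cost` is a hypothetical callable standing in for, e.g., the 16×16 SAD produced by parallel-VBS-IME, and `search_range`/`max_iter` are assumed parameters.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # (x, y) motion vector in integer pixels


def pattern_search(cost: Callable[[MV], int], start: MV,
                   search_range: int = 16, max_iter: int = 64) -> MV:
    """Iterative local search: move to the best of the 3x3 neighborhood until the center wins."""
    center = start
    for _ in range(max_iter):
        best, best_cost = center, cost(center)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                cand = (center[0] + dx, center[1] + dy)
                if max(abs(cand[0]), abs(cand[1])) > search_range:
                    continue  # stay inside the (assumed square) search window
                c = cost(cand)
                if c < best_cost:  # strictly better, so ties keep the current center
                    best, best_cost = cand, c
        if best == center:
            return center  # locally best candidate is at the center: stop
        center = best  # move to the locally best candidate
    return center
```

Because consecutive centers differ by one pixel, most reference pixels of the next 3 × 3 pattern were already fetched, which is the inter-candidate reuse the slide points to.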
■ If motion activity is high:
• Set more initial candidates to find accurate MVs.
■ Multi-iteration parallel-VBS-FSS algorithm
[Figure: six initial candidates, the predicted motion window (PMW), and the search window]
■ Six initial candidates
• (0, 0)
• MV predictor
– Median MV of the left, up, and up-right blocks.
• The remaining four are used to find good matches in complex-motion regions.
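The MV predictor is a component-wise median of three neighboring MVs. The sketch below shows only that basic median rule and omits the standard's special cases (unavailable neighbors, directional prediction for 16×8 and 8×16). The `extra` list is hypothetical: the slide does not say how the remaining four candidates are chosen, so the caller supplies them.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (x, y) motion vector


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return a + b + c - min(a, b, c) - max(a, b, c)


def median_mv_predictor(left: MV, up: MV, up_right: MV) -> MV:
    """Component-wise median of the left, up, and up-right neighbors' MVs."""
    return (median3(left[0], up[0], up_right[0]),
            median3(left[1], up[1], up_right[1]))


def initial_candidates(left: MV, up: MV, up_right: MV, extra: List[MV]) -> List[MV]:
    """(0, 0), the MV predictor, then caller-supplied candidates (hypothetical)."""
    return [(0, 0), median_mv_predictor(left, up, up_right)] + list(extra)
```

For neighbors (1, 2), (3, -1), and (2, 5), the predictor is (2, 2): the median is taken separately for x and y, so the result need not equal any single neighbor's MV.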
■ Content-adaptive strategy
• The PMW is adaptively shrunk according to the neighboring motion activity.
■ The search candidate conditionally moves vertically or horizontally; flexible memory access supports efficient data reuse.
[Figure: memory access pattern reading A2-D2 or B0-B3, with the data rotated right by one, two, or three positions]
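The "rotate right one/two/three" labels suggest data read from interleaved memory banks being realigned by a rotation. The following is a generic sketch of that idea, not the paper's memory organization: it assumes four banks with pixel column c stored in bank c mod 4, so any four consecutive pixels can be read in one cycle and then rotated into order. `NUM_BANKS`, `read_row_segment`, and the bank mapping are all assumptions; only the horizontal direction is shown.

```python
from typing import List

NUM_BANKS = 4  # assumption: four column-interleaved banks


def rotate_right(words: List[int], k: int) -> List[int]:
    """Move element i to position (i + k) mod n."""
    n = len(words)
    k %= n
    return words[-k:] + words[:-k] if k else list(words)


def read_row_segment(row: List[int], x: int) -> List[int]:
    """Read NUM_BANKS consecutive pixels starting at column x in a single access."""
    # Pixel at column c lives in bank c % NUM_BANKS, so every bank supplies one pixel.
    bank_out = [0] * NUM_BANKS
    for c in range(x, x + NUM_BANKS):
        bank_out[c % NUM_BANKS] = row[c]
    # bank_out is in bank order; rotate right so that column x comes first.
    return rotate_right(bank_out, -x % NUM_BANKS)
```

With this mapping, a start column offset of 1, 2, or 3 (mod 4) needs a right rotation of three, two, or one position respectively, and an aligned start needs none.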
[Figure: IME data flow]
1. Reference and current frames: inter-candidate data reuse.
2. Current MB: intra-candidate data reuse; reference MBs: two-directional random access.
3. 16×16
4. Compute the absolute difference values.
5. Compute the SAD.
■ Advanced mode pre-decision algorithm
• The N best modes (N = 0 to 7) are pre-decided after IME with integer-pixel precision.
• Only the N best modes are refined to quarter-pixel precision.
• This reduces computation.
■ Hardware-oriented one-pass algorithm
• The half-pixel and quarter-pixel candidates are processed simultaneously to share the accessed memory data, reducing memory access by 50%.
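Mode pre-decision amounts to keeping the N cheapest modes after integer-pixel IME. A minimal sketch, assuming each of the seven H.264 inter partition modes carries one integer-pixel matching cost (the dictionary and its values in any example are hypothetical); only the returned modes would be passed on to fractional refinement.

```python
from typing import Dict, List


def predecide_modes(int_pel_costs: Dict[str, int], n: int) -> List[str]:
    """Return the n modes with the lowest integer-pixel cost (ties keep dict order)."""
    if not 0 <= n <= len(int_pel_costs):
        raise ValueError("n must lie between 0 and the number of modes")
    return sorted(int_pel_costs, key=lambda m: int_pel_costs[m])[:n]
```

With seven modes, N = 7 refines every mode and N = 0 skips fractional refinement altogether; intermediate values trade coding performance against FME computation.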
■ Hardware-oriented one-pass algorithm
[Figure: integer-pixel, half-pixel, and quarter-pixel candidate positions]
Two-step algorithm: 17; one-pass algorithm: 25
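The two counts can be reproduced if they are read as numbers of fractional search candidates, which is an assumption. A conventional two-step search checks the integer MV plus its 8 half-pixel neighbors (9), then 8 quarter-pixel neighbors of the best half-pixel position: 9 + 8 = 17. A one-pass search that checks every quarter-pixel offset in [-2, 2] in both directions covers 5 × 5 = 25 positions. Offsets below are in quarter-pixel units.

```python
from typing import List, Tuple

Pos = Tuple[int, int]  # (dx, dy) offset in quarter-pixel units from the best integer MV


def two_step_candidates(best_half: Pos) -> List[Pos]:
    """Step 1: 9 half-pixel positions; step 2: 8 quarter-pixel neighbors of best_half."""
    half = [(2 * dx, 2 * dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    quarter = [(best_half[0] + dx, best_half[1] + dy)
               for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return half + quarter


def one_pass_candidates() -> List[Pos]:
    """All offsets in [-2, 2] x [-2, 2]: integer center, half- and quarter-pixel positions."""
    return [(dx, dy) for dy in range(-2, 3) for dx in range(-2, 3)]
```

The two-step version has a data dependency: step 2 cannot start until step 1 picks `best_half`, so the reference data is traversed twice. The one-pass version examines more candidates but has no such dependency, so all of them can be evaluated from a single traversal of the reference data.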
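■ Q, a 4 × 4 block of a quarter-pixel candidate, is bilinearly interpolated from two 4 × 4 blocks (A and B) of half-pixel candidates.
■ Because the Hadamard transform (HT) is linear, the transformed residual of Q can be derived from the transformed residuals of A and B, so the data processing power of the HT for all quarter-pixel candidates is saved.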
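The saving rests on linearity. If Q = (A + B) / 2 with the rounding offset ignored, then for the current block C, Q - C = ((A - C) + (B - C)) / 2, and therefore HT(Q - C) = (HT(A - C) + HT(B - C)) / 2. The sketch below checks this numerically with a 4 × 4 Hadamard transform; the helper names are illustrative, and the test data are chosen so that A + B is even and Q is exact.

```python
from typing import List

Mat = List[List[int]]

# 4x4 Hadamard matrix (symmetric, so H^T == H).
H4: Mat = [[1, 1, 1, 1],
           [1, 1, -1, -1],
           [1, -1, -1, 1],
           [1, -1, 1, -1]]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def add(a: Mat, b: Mat) -> Mat:
    return [[a[i][j] + b[i][j] for j in range(4)] for i in range(4)]


def sub(a: Mat, b: Mat) -> Mat:
    return [[a[i][j] - b[i][j] for j in range(4)] for i in range(4)]


def hadamard(x: Mat) -> Mat:
    """2-D Hadamard transform H * X * H^T of a 4x4 block."""
    return matmul(matmul(H4, x), H4)


def satd(coeffs: Mat) -> int:
    """Sum of absolute transformed differences."""
    return sum(abs(v) for row in coeffs for v in row)


def quarter_satd_from_half(ht_res_a: Mat, ht_res_b: Mat) -> float:
    """SATD of Q - C from the already transformed residuals of A and B (rounding ignored)."""
    return satd(add(ht_res_a, ht_res_b)) / 2
```

The half-pixel residual transforms are computed anyway, so each quarter-pixel candidate costs only an addition per coefficient instead of a full transform. With rounding in the actual interpolation, the result becomes an approximation rather than an exact identity.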
[Figure: comparison results, annotated "0.06 dB drop" and "same memory access"]
■ Parallel architecture
• Generates the half-pixel reference data from the integer-pixel reference data.
• Generates the quarter-pixel reference data from the half-pixel reference data.
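The two generation stages follow the H.264 luma interpolation rules. Horizontal and vertical half-pixel samples use the 6-tap filter (1, -5, 20, 20, -5, 1) with rounding by (x + 16) >> 5 and clipping to the 8-bit range. Quarter-pixel samples are the rounded average of two neighboring integer or half-pixel samples. This sketch omits the center half-pixel position, which the standard filters from unrounded intermediate values.

```python
from typing import Sequence

TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-pixel filter


def clip1(v: int) -> int:
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))


def half_pel(p: Sequence[int]) -> int:
    """Half-pixel sample between p[2] and p[3], from six integer samples in a row or column."""
    if len(p) != 6:
        raise ValueError("need exactly six integer samples")
    acc = sum(t * s for t, s in zip(TAPS, p))
    return clip1((acc + 16) >> 5)


def quarter_pel(a: int, b: int) -> int:
    """Quarter-pixel sample: rounded average of two neighboring integer/half samples."""
    return (a + b + 1) >> 1
```

Because each quarter-pixel sample needs only its two neighbors, the quarter-pixel stage can consume the half-pixel stage's output directly, which fits the two-stage generation described on the slide.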
■ Power-scalable parameters
• Applied to the IME, FME, intra prediction (IP), and deblocking (DB) engines.
• Flexibly control the power consumption of the whole encoding system.
Settings per parameter: (1) 4, (2) 4, (3) 2 + 2, (4) 2
Power modes: 4 × 4 × 4 × 2 = 128
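The mode count is a plain product over independent parameters. This sketch enumerates the combinations. The parameter names are hypothetical because the slide only numbers them (1) to (4), and parameter (3) is taken to contribute 2 + 2 = 4 settings, as the product on the slide implies.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical parameter names; the slide only numbers the parameters (1)-(4).
SETTINGS_PER_PARAMETER: Dict[str, int] = {
    "param1": 4,
    "param2": 4,
    "param3": 2 + 2,  # listed on the slide as "2+2"
    "param4": 2,
}


def power_modes() -> List[Tuple[int, ...]]:
    """Enumerate every combination of parameter settings (one tuple per power mode)."""
    return list(product(*(range(n) for n in SETTINGS_PER_PARAMETER.values())))
```

Each tuple selects one setting index per parameter, giving 128 distinct operating points.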
■ The curve shows the best coding performance with the highest power consumption.
■ On average, a 2.69% bit-rate increase and a 0.12 dB quality drop.
[Comparison with Huang’s H.264/AVC encoder and Lin’s low-power MPEG-4 encoder; annotations: from two reference frames to one reference frame, multi-iteration IME and FME, power scalability of IP and DB]
■ A low-power and power-aware H.264/AVC video encoder has been proposed.
■ Power efficiency was co-optimized at the algorithm, architecture, and circuit levels.
■ Compared with previous state-of-the-art designs, the encoder provides competitive power efficiency for D1 (720×480) 30 frames/s video encoding and the best power configurations.