Department of Computer Science and Information Engineering
College of Electrical Engineering and Computer Science
National Taiwan University
Master Thesis
Distributed Video System Realized on Mobile Device
with Efficient Feedback Channel
Chun-Yuan Chen (陳群元)
Advisor: Ja-Ling Wu (吳家麟), Ph.D.
July 2012 (ROC year 101)
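Acknowledgements
I could not have completed this master's thesis on schedule by my own effort alone, and I am truly grateful to the many people who helped. First, I thank my seniors 沈允中 and 蘇則仲. Without their help, it would have been difficult to grasp the essence of distributed coding within such a tight schedule. Under their guidance I searched papers and books for the knowledge I needed, and whenever I had questions they were generous with their advice, so that my research gradually got on track.
Along the way, I am also very grateful to my seniors 謝致仁, 林映孜, 胡敏君, 鄭文皇, 林裕訓, 賴瑞欣 and 黃俊翔 for their strong support.
Of course, the deepest influence came from my advisor, Dr. Ja-Ling Wu (吳家麟). Whenever I was at a loss in my studies, he was like a beam of light, guiding me toward the next summit.
Realizing distributed video compression on a mobile phone required overcoming many technical difficulties. Here I especially thank 航緯, 蔡德育, 銘遠 and the other members of the network group. Thanks to the help of many people, I was able to build such a complete video coding system on a mobile phone.
What moved me most was having so many good friends with me along the way. Through the hard work of research, the members of the DSP group, 際巧, 明宏, 奕婷, 宇蓓, 志霖 and many other companions, always fought side by side with me and filled my graduate school life with friendship.
Finally, I thank my family and my girlfriend. Their love and care allowed me to devote myself to my studies without worries.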
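Chinese Abstract
Distributed coding is a new coding paradigm. Unlike conventional coding architectures, it has a simpler encoder and a more complex decoder, so it is well suited to encoding on devices with limited computing power. Because of this property, distributed coding is becoming increasingly popular in applications where many sensors perform video encoding. For example, video conferencing and video recording on mobile phones can both exploit distributed coding to reduce the power consumption of the mobile device, while a server with very strong computing power, such as one with cloud computing capability, performs the harder decoding.
However, we found a serious problem. Although both encoding and decoding run quickly on high-performance computers, current distributed systems use low-density parity-check codes, so the decoder and encoder must communicate heavily during decoding. This is a considerable problem in real-world use. Therefore, this thesis not only realizes a distributed system on a mobile device but also handles the back-and-forth communication more efficiently.
Keywords: distributed video coding, low-density parity-check codes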
ABSTRACT
Distributed video coding (DVC) is a new video coding paradigm. Compared with traditional video codecs, DVC has a lightweight encoder and a heavyweight decoder, so the DVC encoder suits applications in which encoding is done on devices with low computing power. Because of this characteristic, DVC is becoming more and more popular in the era of wireless sensor networks. On one hand, the lightweight encoder saves power in mobile video conferencing and video recording. On the other hand, the DVC decoder can run on a powerful server, such as a cloud-based data center.
Unfortunately, even though the DVC encoder and decoder run rather fast with the aid of powerful hardware, the adoption of LDPCA (low-density parity-check accumulate) codes requires a lot of communication between encoder and decoder, which is a big challenge. In our work, we realized a practical DVC system on a mobile device and proposed an efficient feedback channel to meet this challenge.
Keywords: distributed video coding, LDPCA
CONTENTS
Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
Chapter 2 DVC system architecture
Chapter 3 DVC system realized on mobile devices
3.1 Encoder
3.1.1 Capture video sequence with mobile devices
3.1.2 Realize DVC encoder on mobile devices
3.2 Decoder
3.2.1 DVC decode on remote server
3.2.2 Transcode result sequence for mobile
3.3 Feedback channel
3.3.1 Feedback channel realized over the network
Chapter 4 Efficient feedback channel
4.1 Syndrome distribution
4.2 Estimate the syndrome size per WZ frame
4.3 Estimate the syndrome size per bitplane
Chapter 5 Performance Evaluation
5.1 Test conditions and benchmarks
5.2 Decoding complexity analysis
5.3 Quality and bitrate evaluation
Chapter 6 Conclusion and Future Work
REFERENCE
LIST OF FIGURES
Figure 1. DVC architecture.
Figure 3-1. DVC system realized on mobile devices.
Figure 3-2. C/C++ project.
Figure 3-3. Makefile.
Figure 3(a)(b)(c). The process of decoding one bitplane with LDPCA.
Figure 4-1. Correlation between syndrome bits and residual value, where the residual value is the sum of absolute differences between the previous and next reference frames.
Figure 4-2. Bit rate per bitplane per band for Qi=8, for one WZ frame.
Figure 4-3. Bit rate per bitplane and bit rate estimated from the DC band for Qi=8, for one WZ frame.
Figure 5-1. Bitrate with all proposed methods, for Qi=8, for Foreman, Soccer, Coast Guard, and Hall Monitor (QCIF at 15 Hz).
Figure 5-2. Rate-distortion curve for Foreman, GOP 8.
Figure 5-3. Rate-distortion curve for Hall Monitor, GOP 8.
LIST OF TABLES
Table 5-1(a). Decoding time without a feedback channel implementation.
Table 5-1(b). Decoding time in our work.
Table 5-2(a). Bitrate and PSNR for Foreman, GOP 8, without estimation.
Table 5-2(b). Bitrate and PSNR for Foreman, GOP 8, estimate per WZ frame.
Table 5-2(c). Bitrate and PSNR for Foreman, GOP 8, estimate per bitplane.
Table 5-3(a). Bitrate and PSNR for Hall Monitor, GOP 8, without estimation.
Table 5-3(b). Bitrate and PSNR for Hall Monitor, GOP 8, estimate per WZ frame.
Table 5-3(c). Bitrate and PSNR for Hall Monitor, GOP 8, estimate per bitplane.
Chapter 1 Introduction
Nowadays, more and more people use video coding technology on mobile devices, for example to attend video conferences or to record video with a phone camera. Even though the computing power of mobile devices has grown in recent years, video coding still consumes much of it and drains the battery quickly. Conventional video coding does not fit modern mobile devices well because of its high encoding complexity. In this thesis, we build a complete video transcoding system for mobile devices on the basis of distributed video coding, which adopts lightweight coding on the mobile device and shifts the complexity to a remote server, so that the practical video system can considerably reduce power consumption on the mobile device. Take an application from daily life as an example: FaceTime, a popular video call application that transfers camera video from an iPhone or iPad to another device over the network, usually consumes a lot of power during operation. The battery runs out quickly when using FaceTime because the network, the camera and video coding all consume a great amount of power, and video coding is the major burden. Since the network and the camera must keep operating during video communication, the only way to reduce power consumption is to apply a video coding system that better fits devices with low computing power. Therefore, our work is designed to minimize the power consumption of mobile devices by using a new video coding paradigm, the distributed video coding system.
Distributed video coding (DVC) is a new video coding paradigm that departs from traditional prediction-based video coding standards by exploiting the source statistics at the decoder, which allows simpler encoders. A traditional video codec has a cumbersome encoder but a lightweight decoder, whereas a DVC system is characterized by a lighter encoder and a heavier decoder. That is, a DVC system is more suitable when the encoder runs on a mobile device with limited computing power, so that the mobile device can run efficiently.
DISPAC [1] is one of the benchmarks among DVC-based video codecs. DISPAC adopts WZ coding, which is based on the Slepian-Wolf [2] and Wyner-Ziv [3] theorems. DVC divides the source video into key frames and WZ frames. DISPAC generates side information (SI) from the key frames and adopts LDPCA [4,5] to correct the errors in the SI. Because of the properties of LDPCA, the DISPAC decoder may request more syndrome while decoding one unit (e.g., one bitplane in a WZ frame), and a feedback channel is designed to handle this communication between encoder and decoder.
Our study proceeds as follows. First, we build a DVC system based on the benchmark [6] and a DVC-to-H.264 transcoder [7]. Second, we realize the DVC encoder on a mobile device, run the DVC decoder on a remote server, and build the feedback channel. Last but not least, we transcode the decoded video into mp4 or 3gp format so that mobile devices can decode and play the resulting sequence.
Although QCIF video (e.g., surveillance video) can be decoded in near real time in DISPAC, the communication over the feedback channel unfortunately takes a lot of time. LDPCA sends only a little syndrome in each network packet but requires frequent communication between encoder and decoder. In practice, network transmission delay and the overhead of network packet headers make the feedback channel very time-consuming.
To solve these problems, it is necessary to group the syndrome into fewer network packets and to predict a suitable syndrome size for each bitplane decoding. For short, syndrome size denotes the amount of syndrome the LDPCA decoder requires to correct one bitplane. We propose two methods to predict the syndrome size needed for each bitplane decoding. The first method estimates the syndrome size per WZ frame: it predicts the syndrome size by referencing the corresponding WZ frame in the previous group of pictures (GOP). Because of temporal correlation, frames in neighboring GOPs have similar characteristics, such as motion vectors and side information quality, so it is possible to estimate the syndrome size by referencing that of the neighboring GOP.
The second method estimates the syndrome size per bitplane. By observing the experimental results and the statistics of the number of requests [8], we found that the numbers of requests in the DC band and in the AC bands have similar distributions within a WZ frame. The syndrome size per bitplane is larger in the DC band, and the syndrome sizes in the AC bands follow a similar trend; therefore, we can estimate the syndrome sizes of the AC bands by referencing those of the DC band.
Chapter 2 DVC architecture
The architecture of the benchmark [6] DVC system, DISPAC, is shown in Fig.1. It is capable of decoding low-motion video in real time. In this work, we use this version of the DVC code as the reference for realizing DVC on mobile devices. In the benchmark project, the encoder and decoder are executed on the same PC; our work instead realizes the DVC video system in a more practical situation, with the encoder on a mobile device.
Fig.1 DVC architecture
On the encoder side, the video sequence is divided into two parts, WZ frames and key frames. Key frames are coded with the traditional video codec H.264/AVC, while WZ frames are processed by components such as mode selection, the quantizer, the LDPCA encoder and CRC generation.
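The split into key frames and WZ frames can be illustrated with a minimal sketch. It assumes, as is common in DVC codecs but not stated explicitly here, that the first frame of every GOP is a key frame and all other frames are WZ frames; the function name `split_gop` is hypothetical.

```python
def split_gop(num_frames, gop_size):
    """Return (key_indices, wz_indices) for a sequence of num_frames frames.

    Assumption: frame 0 and every gop_size-th frame after it is a key frame;
    all remaining frames are WZ frames.
    """
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    key = [i for i in range(num_frames) if i % gop_size == 0]
    wz = [i for i in range(num_frames) if i % gop_size != 0]
    return key, wz


# With a GOP size of 8, frames 0, 8 and 16 of a 17-frame clip are key frames.
key_frames, wz_frames = split_gop(num_frames=17, gop_size=8)
print(key_frames)      # [0, 8, 16]
print(len(wz_frames))  # 14
```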
On the decoder side, side information is generated for each WZ frame; however, the side information generally differs from the actual WZ frame. Consequently, the LDPCA decoder has to request syndrome in order to correct the SI. Because the LDPCA decoder may request more syndrome whenever an error-correction attempt does not succeed, a feedback channel between the LDPCA encoder and decoder is required. The feedback channel delivers syndrome to the decoder and carries the decoder's requests to the encoder whenever the LDPCA decoder fails to decode a bitplane of a WZ frame. The feedback channel is therefore the most vital component in a practical DVC application, because LDPCA coding cannot perform error correction without it.
However, because the encoder and decoder are executed on the same device, no feedback channel is implemented in the benchmark project. The decoder of our reference DVC system accesses the same memory as the encoder when it performs LDPCA decoding. Moreover, the conventional video coding of key frames is done offline with JM 9.5, a version of the H.264 reference software, which has heavy complexity. This is not suitable for practical applications: a DVC video system cannot run on separate devices without a feedback channel, and running JM to encode key frames is inefficient. To address these problems, we implement the feedback channel over a network connection and replace JM with x264, an efficient implementation of H.264 coding.
Chapter 3 DVC system realized on mobile devices
Fig.3-1 DVC system realized on mobile devices
This chapter discusses how the DVC system is realized on mobile devices, as shown in Fig.3-1. First, on the encoder side, real video is captured with the mobile camera. Second, a real feedback channel is implemented. Third, to reduce the time spent, two methods are proposed to estimate the syndrome size; the estimation effectively reduces the number of requests over the feedback channel and thus the time spent on it.
Compared with the previous work mentioned above, this study implements several components and utilities for turning the DVC system into a practical application. On the encoder side, real video is captured with the mobile camera and the DVC encoder runs on the mobile device. On the decoder side, the DVC decoder runs on a remote server, which then transcodes the decoded video sequence into a format that mobile devices are able to decode. The study also provides a mobile application for watching the decoded video. Across the whole DVC system, we implement the feedback channel to handle the communication between encoder and decoder over the network. In addition, two methods that make the feedback channel more efficient, the major contribution of this project, are presented in Chapter 4.
3.1 Encoder
Because this work aims at a practical application of the DVC system on mobile devices, the mobile device's camera is used to record video and the DVC encoder is realized on the mobile device.
3.1.1 Capture video sequence with mobile devices
The previous work we reference takes test video sequences as the source. In contrast, this work uses the mobile device's camera to capture the video.
3.1.2 Realize DVC encoder on mobile devices
The benchmark DVC project is written in C/C++, and the mobile device we use runs Android. Therefore, to run the DVC encoder on the mobile device, we chose the Android Native Development Kit (NDK) [9]. With the NDK tools, it is possible to generate a native code library from the existing DVC encoder code. Once the native library is generated, our application on the Android device calls it, and the native code performs the DVC encoding.
The porting process is described step by step below. First, set up the development environment, including the Java platform (JDK), Eclipse, the Android SDK and the NDK.
Second, adjust the C/C++ code project as shown in Fig.3-2.
Fig.3-2 C/C++ project
Third, create the makefile (Fig.3-3). As with an ordinary C/C++ project, there should be one makefile similar to a C/C++ makefile, but written according to the rules specified by the NDK.
Fig.3-3 Makefile
Fourth, generate the static libraries (.a) of x264 and the other linked libraries used in key frame coding. A cross compiler that targets the Android system is used to generate the static libraries.
Fifth, create a builder in Eclipse and give it the build information: the arguments, the C/C++ project and the library locations.
Sixth, change the file paths to match the storage layout of the mobile device. Note that the file locations will not be the same as those on the PC.
Last, build the project to generate the shared library and wrap it as a function callable from Java. Afterwards, the DVC encoder is run by calling that function from our application.
3.2 Decoder
The DVC decoder has heavy complexity, so it is unsuitable for mobile devices. Therefore, we execute the DVC decoder on a remote server. For a similar reason, the decoded sequence also needs to be transcoded into a format that mobile devices can decode, so that other mobile devices can display the decoded video.
3.2.1 DVC decode on remote server
We execute the DVC decoder on the server, which communicates with the mobile device through the network. At the same time, we replace the JM decoder with the more efficient x264 decoder.
3.2.2 Transcode result sequence for mobile
In the benchmark's implementation, the final result is a raw sequence. To enable the receiving mobile device to decode the video, the result is transcoded into mp4 or 3gp format.
In addition, because the benchmark in this project codes only the luminance component of the video, some extra processing is needed before transcoding. Whenever the DVC decoder finishes decoding, the luminance values are extracted and used to build a gray-level picture in RGB form for each frame. FFmpeg is then executed to collect these gray-level RGB pictures into one video in the mobile-decodable mp4 or 3gp format.
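The luminance-to-gray step can be sketched as follows. This is only an illustration: the thesis does not state which intermediate picture format was used, so the binary PPM output and the function name `luma_to_gray_ppm` are assumptions. The QCIF frame size of 176x144 is standard.

```python
def luma_to_gray_ppm(luma, width, height):
    """Pack an 8-bit luminance plane into a binary PPM (P6) gray RGB image.

    Assumption: PPM is used here only as a simple, self-describing RGB
    container; any RGB picture format would serve the same purpose.
    """
    if len(luma) != width * height:
        raise ValueError("luma size does not match width * height")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    pixels = bytearray()
    for y in luma:
        pixels += bytes((y, y, y))  # R = G = B = Y gives a gray pixel
    return header + bytes(pixels)


QCIF_W, QCIF_H = 176, 144
frame = bytes([128]) * (QCIF_W * QCIF_H)  # a flat mid-gray test frame
ppm = luma_to_gray_ppm(frame, QCIF_W, QCIF_H)
print(len(ppm))
```

One such picture per decoded frame could then be handed to a video tool for packaging into the final container.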
3.3 Feedback channel
The previous work and the benchmark do not implement feedback channel communication. In the previous implementation, the DVC encoder and decoder are both executed on the same PC, and the decoder directly accesses the memory of the LDPCA encoder's buffer, so no feedback channel exists. In this project, by contrast, a real feedback channel is implemented to handle sending and requesting over the network, which makes the system practical.
Because LDPCA is used for Slepian-Wolf coding, the LDPCA decoder requests more syndrome bits as needed while decoding each unit, for example a bitplane in a WZ frame. Although this helps the DVC system keep the bitrate near the Shannon limit, in a practical situation the DVC encoder and decoder are not executed on the same device. In this study they are separate: the encoder runs on the mobile device and the decoder on a remote server. A feedback channel between encoder and decoder is therefore responsible for syndrome delivery. We build a network connection to realize the feedback channel, and the process is described below.
First, Fig.3(a) shows that when the decoder starts decoding one bitplane of a WZ frame, the LDPCA encoder computes the minimum syndrome size that the LDPCA decoder needs, packs this syndrome into one network packet and sends it to the decoder. Second, Fig.3(b) shows that once the decoder receives the syndrome, the LDPCA decoder attempts to decode the bitplane and verifies the result with a cyclic redundancy check (CRC). Once a bitplane has been corrected successfully, the decoder moves on to the next bitplane, until all bitplanes in the WZ frame have been corrected. Third, Fig.3(c) shows that if the decoder fails to decode the bitplane correctly, it requests more syndrome and the encoder sends more. The amount of additional syndrome depends on the LDPCA's bipartite graph. In our work, the syndrome size sent for each request is the same. After the process in Fig.3 is repeated, one WZ frame is decoded.
Fig.3(a)(b)(c) the process of decoding one bitplane with LDPCA.
Because LDPCA is an accumulated form of LDPC, the size of each syndrome transmission (yellow or blue block) depends on how much syndrome one LDPC bipartite graph needs. (a) The LDPCA encoder sends part of its syndrome to the LDPCA decoder. (b) Once the LDPCA decoder receives the syndrome packet from the encoder, it starts LDPCA decoding and the CRC check. If the bitplane is corrected, the decoder starts to decode the next bitplane; if the correction fails, the decoder sends a request for more syndrome. (c) Once the encoder receives the request from the decoder, it sends more syndrome to the decoder.
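The request loop of Fig.3 can be mimicked with a small simulation. It is a sketch under stated assumptions, not the DISPAC implementation: real LDPCA decoding and the CRC check are replaced by a comparison against a hypothetical `required` amount, and all names and numbers are illustrative.

```python
def decode_bitplane(required, first_chunk, increment):
    """Simulate the request loop for one bitplane.

    required    -- syndrome bits the decoder actually needs (unknown in practice)
    first_chunk -- syndrome bits sent before the first decoding attempt
    increment   -- syndrome bits added per extra request (fixed size, as in the text)
    Returns (bits_sent, extra_requests).
    """
    if first_chunk <= 0 or increment <= 0:
        raise ValueError("chunk sizes must be positive")
    sent = first_chunk
    requests = 0
    # Stand-in for "LDPCA decoding succeeded and the CRC check passed".
    while sent < required:
        requests += 1      # decoder asks the encoder for more syndrome
        sent += increment  # encoder answers with one more fixed-size chunk
    return sent, requests


# Starting from a small first chunk costs many round trips ...
print(decode_bitplane(required=400, first_chunk=50, increment=50))   # (400, 7)
# ... while a first chunk close to the real need costs only one.
print(decode_bitplane(required=400, first_chunk=350, increment=50))  # (400, 1)
```

The second call shows why a good first guess matters: the same bitplane is decoded with one extra request instead of seven.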
3.3.1 Feedback channel realized over the network
In our work, we run the DVC encoder on the mobile device and the DVC decoder on the remote server. Because the encoder and decoder do not execute on the same device, we create a feedback channel through the network to handle LDPCA's request mechanism. The Internet protocol suite has two main transport protocols, TCP and UDP, each with advantages and disadvantages. Transmission Control Protocol (TCP) provides reliable, ordered delivery of a byte stream. User Datagram Protocol (UDP), on the other hand, emphasizes reduced latency over reliability. Even though UDP is faster than TCP, our DVC project DISPAC does not support error resilience, so we adopt TCP.
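Since TCP delivers a byte stream without message boundaries, a feedback channel built on it needs some framing so that syndrome payloads and requests can be told apart. The sketch below shows one possible length-prefixed format; the wire layout, the message type codes and the function names are hypothetical and are not taken from the thesis.

```python
import struct

# Hypothetical message types for the feedback channel.
MSG_SYNDROME = 1  # encoder -> decoder: payload carries syndrome bytes
MSG_REQUEST = 2   # decoder -> encoder: ask for more syndrome

HEADER = struct.Struct("!BI")  # 1-byte type, 4-byte big-endian payload length


def pack_message(msg_type, payload=b""):
    """Prefix the payload with its type and length."""
    return HEADER.pack(msg_type, len(payload)) + payload


def unpack_messages(buffer):
    """Split a received byte buffer into complete messages.

    Returns (messages, remainder); remainder holds an incomplete trailing
    message that should be kept until more bytes arrive.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break
        messages.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]


stream = pack_message(MSG_SYNDROME, b"\x01\x02\x03") + pack_message(MSG_REQUEST)
print(unpack_messages(stream))
```

The same functions can be used on both ends of a socket: the sender writes `pack_message(...)` and the receiver feeds whatever bytes it has read so far into `unpack_messages`.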
Once a TCP connection is established between the mobile device and the server, we can deliver syndrome from encoder to decoder and send requests from decoder to encoder. Because of LDPCA's properties, decoding one WZ frame involves many requests, and each time the encoder sends more syndrome, the amount is small. As a result, many network packets are sent and each carries only a little syndrome, sometimes less than the packet header. The accumulated network transit delay and packet header overhead make the process very time-consuming. In our experiment, the time spent on the feedback channel is up to 90 percent of the total decoding time. The send-request exchange is clearly the bottleneck of the whole system.
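A back-of-the-envelope model makes the cost of many small packets concrete. The sketch below assumes the minimum 40 bytes of IPv4 and TCP headers per packet and one round trip of waiting per exchange; the packet counts, payload sizes, round-trip time and bandwidth are illustrative values, not measurements from this work.

```python
TCP_IP_HEADER_BYTES = 40  # minimum IPv4 (20) + TCP (20) headers, no options


def channel_cost(num_packets, payload_bytes_each, rtt_s, bandwidth_bps):
    """Rough time and header overhead for a series of request/response exchanges.

    Returns (total_seconds, header_share), where header_share is the
    fraction of bytes on the wire that are headers.
    """
    wire_bytes = num_packets * (payload_bytes_each + TCP_IP_HEADER_BYTES)
    transfer_s = wire_bytes * 8 / bandwidth_bps
    waiting_s = num_packets * rtt_s  # each exchange waits one round trip
    header_share = TCP_IP_HEADER_BYTES / (payload_bytes_each + TCP_IP_HEADER_BYTES)
    return transfer_s + waiting_s, header_share


# The same 8000 bytes of syndrome, sent as many tiny packets or fewer larger ones.
print(channel_cost(num_packets=1000, payload_bytes_each=8, rtt_s=0.05, bandwidth_bps=1_000_000))
print(channel_cost(num_packets=100, payload_bytes_each=80, rtt_s=0.05, bandwidth_bps=1_000_000))
```

In this toy setting the waiting time dominates the transfer time by far, which is consistent with the observation that reducing the number of requests matters more than reducing the number of bits.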
Chapter 4 Efficient feedback channel
As described in the previous section, network communication accounts for the main part of the total decoding time. The central problem is that too many requests lead to frequent data transmission and hence a large accumulated network delay. To mitigate this problem, the number of requests must be reduced.
To reduce the number of requests, we estimate the syndrome size needed for decoding each bitplane by referencing previously decoded information, such as earlier syndrome sizes. It is natural to expect the quality of the side information to affect the syndrome size, because with LDPCA, the more similar the side information and the WZ frame are, the fewer syndrome bits are needed. However, the decoder cannot know the difference between the side information and the real WZ frame; it can only use other values as a proxy for that difference. As the next section shows, however, no value known to the decoder has a strong correlation with the syndrome size.
To relieve the bottleneck, we propose two methods for reducing the communication time. The first method estimates the syndrome size per WZ frame by referencing the corresponding WZ frame in the previous GOP. The second method estimates the syndrome size per bitplane: when the decoder decodes the bitplanes of the AC bands, we estimate the syndrome size for one bitplane by referencing the corresponding bitplane in the DC band of the same WZ frame.
4.1 Syndrome distribution
This section discusses which values correlate strongly with the syndrome size needed for one bitplane or one WZ frame. By the properties of LDPCA, the syndrome size needed is related to the quality of the side information. As proxies for the side information quality, we consider two candidates, the motion vector value and the residual value. The motion vector value, shown in Formula 1, is the sum of the absolute values of the motion vectors over all pixels. It relates to the side information quality because the smaller the motion vectors are, the more similar the side information is to the WZ frame. The residual value, shown in Formula 2, is the sum over all pixels of the absolute difference between the previous and next reference frames. Of these two values, the residual value is the better choice because it is more accurate in situations where the motion vector value is large.
Following this reasoning, we compute the correlation between the residual value and the syndrome size. The experimental result shown in Fig.4-1 is calculated for the middle WZ frame of each GOP. Unfortunately, Fig.4-1 shows that the residual value and the syndrome size have no positive correlation. From this observation, we conclude that it is too hard to find a single value that describes how much syndrome the LDPCA decoder needs to decode one WZ frame.
$$M = \sum_{p \in \text{pixels}} \left| mv_p \right|$$
Formula.1 motion vector value
$$R = \sum_{p \in \text{pixels}} \left| F_p^{\text{previous}} - F_p^{\text{next}} \right|$$
Formula.2 residual value
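The two candidate values can be computed as in the following sketch. Frames are given as flat sequences of pixel values and motion vectors as (dx, dy) pairs, one per pixel. The text does not define the absolute value of a two-dimensional motion vector, so taking |dx| + |dy| is an assumption, and both function names are hypothetical.

```python
def motion_vector_value(motion_vectors):
    """Formula 1: sum of |mv_p| over all pixels.

    Assumption: |mv| is taken as |dx| + |dy| for a 2-D vector (dx, dy).
    """
    return sum(abs(dx) + abs(dy) for dx, dy in motion_vectors)


def residual_value(prev_frame, next_frame):
    """Formula 2: sum over pixels of |F_previous - F_next|."""
    if len(prev_frame) != len(next_frame):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(prev_frame, next_frame))


print(motion_vector_value([(1, -2), (0, 3)]))        # 6
print(residual_value([10, 20, 30], [12, 20, 25]))    # 7
```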
Fig.4-1 Correlation between syndrome bits and residual value, where the residual value is the sum of absolute differences between the previous and next reference frames.
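The text does not say which correlation measure was used. One common choice is the Pearson correlation coefficient, sketched below for paired lists of residual values and syndrome sizes; the function name and the sample numbers are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up sample: residual values against syndrome sizes for a few WZ frames.
residuals = [1200, 3400, 2100, 5000, 800]
syndrome_sizes = [310, 290, 420, 300, 350]
print(pearson(residuals, syndrome_sizes))
```

A coefficient near +1 would support using the residual value as a predictor; values near zero or below would not.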
4.2 Estimate the syndrome size per WZ frame
Before describing the methods for estimating the syndrome size, we observe the distribution of syndrome size for each bitplane. From our experiments, we found that corresponding WZ frames in neighboring GOPs have similar syndrome size distributions.
The first method for reducing the number of requests estimates the syndrome size per WZ frame. We assume that corresponding WZ frames in neighboring GOPs have similar syndrome sizes, so we estimate the syndrome size by referencing the corresponding WZ frame in the previous GOP.
The first method is given in Formula 4-1, where $WZ_n$ is the WZ frame with index n, n-GOPsize is the index of the corresponding WZ frame in the previous GOP, bt is the bitplane index, $\{ES^{bt}\}_{WZ_n}$ is the estimated syndrome size for the bt-th bitplane of frame $WZ_n$, and $\{S^{bt}\}_{WZ_{n-GOPsize}}$ is the syndrome size of the bt-th bitplane of frame $WZ_{n-GOPsize}$.
$$\{ES^{bt}\}_{WZ_n} = \{S^{bt}\}_{WZ_{n-GOPsize}}$$
Formula. 4-1 estimate syndrome size with the previous GOP
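Formula 4-1 amounts to a lookup in the decoder's record of earlier frames. The sketch below assumes the decoder keeps a dictionary from WZ frame index to the per-bitplane syndrome sizes it ended up using; the dictionary layout, the function name and the `default_size` fallback for the first GOP are hypothetical.

```python
def estimate_per_wz_frame(history, n, gop_size, num_bitplanes, default_size):
    """Formula 4-1: ES^bt for frame WZ_n is S^bt of frame WZ_(n - GOPsize).

    history maps a WZ frame index to the list of syndrome sizes, one per
    bitplane, that its decoding actually used. default_size is a
    hypothetical fallback for the first GOP, where no reference exists yet.
    """
    reference = history.get(n - gop_size)
    if reference is None:
        return [default_size] * num_bitplanes
    return list(reference)


history = {1: [120, 80, 40]}  # sizes recorded after decoding WZ frame 1
print(estimate_per_wz_frame(history, n=9, gop_size=8, num_bitplanes=3, default_size=24))
print(estimate_per_wz_frame(history, n=2, gop_size=8, num_bitplanes=3, default_size=24))
```

The estimate would be sent as the first chunk for each bitplane, with the usual request mechanism covering any shortfall.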
Fig.4-2 Bit rate per bitplane per band for Qi=8, for one WZ frame
Fig.4-3 Bit rate per bitplane and bit rate estimated from the DC band for Qi=8, for one WZ frame
4.3 Estimate the syndrome size per bitplane
From the experiment shown in Fig.4-2, we can see that the syndrome sizes in the DC band and the AC bands have similar distributions. When decoding one WZ frame, we can therefore estimate the syndrome sizes in the AC bands by referencing the DC band of the same WZ frame.
The second method is given in Formula 4-2, where $WZ_n$ is the n-th WZ frame, $ac_n$ is the n-th AC band, bt is the bitplane index within this AC band, $ES_{ac_n}^{bt}$ is the estimated syndrome size for the bt-th bitplane of the n-th AC band, dc denotes the DC band, and $S_{dc}^{bt-1}$ is the syndrome size of the (bt-1)-th bitplane of the DC band.
$$\{ES_{ac_n}^{bt}\}_{WZ_n} = \{S_{dc}^{bt-1}\}_{WZ_n}$$
Formula 4-2 Estimate the syndrome size per bitplane
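Formula 4-2 can be read as a shifted lookup into the DC band's record for the current WZ frame. In the sketch below, the list `dc_sizes`, the function name, the indexing of bitplanes from 0 and the `default_size` fallback for a missing reference bitplane are assumptions made for illustration.

```python
def estimate_ac_bitplane(dc_sizes, bt, default_size):
    """Formula 4-2: ES^bt of an AC band is S^(bt-1) of the DC band, same WZ frame.

    dc_sizes[i] is the syndrome size the decoder used for bitplane i of the
    DC band (bitplanes indexed from 0, an assumption). default_size is a
    hypothetical fallback when bitplane bt-1 of the DC band does not exist.
    """
    index = bt - 1
    if 0 <= index < len(dc_sizes):
        return dc_sizes[index]
    return default_size


dc_sizes = [300, 200, 100]  # recorded while decoding the DC band
for bt in range(4):
    print(bt, estimate_ac_bitplane(dc_sizes, bt, default_size=24))
```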
Chapter 5 Performance Evaluation
5.1 Test conditions and Benchmarks
The following experiments are performed with our transform-domain Wyner-Ziv (TDWZ) video codec, DISPAC [2]. The mobile video communication system runs on an HTC Sensation, and the transcoder runs on a server with a Tesla M2050 GPGPU. This section discusses and compares four methods: the basic version, syndrome rearrangement, pre-sending an average amount of syndrome, and parallel syndrome delivery. All four methods are run. In the detailed settings, we use the 8th quantization table, a group of pictures (GOP) size of 8, and turn intra mode on to enable intra coding. The settings behind the following tables and results are those that give the better system performance.
5.2 Decoding complexity analysis
We measure the decoding complexity of the whole system with four sequences: Soccer, Foreman, Coastguard and Hall Monitor. The LDPCA decoding time of the previous work, DISPAC [6], is shown in Table 5-1(a). The decoding time of our practical system for these four sequences is shown in Table 5-1(b). Compared with the times in Table 5-1(a), network communication takes about 96 percent of the whole decoding process. From these two tables, we can see that communication over the feedback channel is the main part and the bottleneck of the practical DVC system.
We also measure the decoding complexity of the two proposed methods in Table 5-1(b). The first row shows the time complexity of the DVC system with the feedback channel but without the proposed methods. The second and third rows show the time complexity of the two methods on the four sequences. We found that the first method, estimating the syndrome size per WZ frame, greatly improves decoding speed. The second method, estimating the syndrome size per bitplane, does not improve decoding speed as much as the first, but it has a better RD curve.
Method \ Video                     foreman     soccer      coastguard  hall
DISPAC without feedback channel    16.38 sec   11.66 sec   6.18 sec    7.22 sec
Table 5-1(a) Decoding time without the feedback channel implemented.
Method \ Video           foreman      soccer       coastguard   hall
With feedback channel    446.83 sec   419.77 sec   482.40 sec   417.29 sec
Estimate per WZ frame    104.66 sec   73.28 sec    109.27 sec   78.96 sec
Estimate per bitplane    237.56 sec   237.56 sec   271.61 sec   305.15 sec
Table 5-1(b) Decoding time in our work
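To make the comparison in Table 5-1(b) concrete, the following sketch computes the speedup of each proposed method over the plain feedback channel as the ratio of decoding times. The dictionary layout and the function name are illustrative; the numbers are copied from the table.

```python
# Decoding times in seconds, copied from Table 5-1(b).
TIMES = {
    "with feedback channel": {"foreman": 446.83, "soccer": 419.77, "coastguard": 482.40, "hall": 417.29},
    "estimate per WZ frame": {"foreman": 104.66, "soccer": 73.28, "coastguard": 109.27, "hall": 78.96},
    "estimate per bitplane": {"foreman": 237.56, "soccer": 237.56, "coastguard": 271.61, "hall": 305.15},
}


def speedups(baseline: dict, method: dict) -> dict:
    """Speedup per video: baseline time divided by the method's time."""
    return {video: baseline[video] / method[video] for video in baseline}


BASELINE = TIMES["with feedback channel"]
PER_FRAME = speedups(BASELINE, TIMES["estimate per WZ frame"])
PER_BITPLANE = speedups(BASELINE, TIMES["estimate per bitplane"])

for video in BASELINE:
    print(f"{video}: per WZ frame {PER_FRAME[video]:.2f}x, per bitplane {PER_BITPLANE[video]:.2f}x")
```

On these numbers the per-WZ-frame estimate is roughly 4.3 to 5.7 times faster than the plain feedback channel, and the per-bitplane estimate roughly 1.4 to 1.9 times faster.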
5.3 Quality and Bitrate evaluation
In this section, we measure the bitrate and quality of the methods in our work. As Fig. 5-1 shows, the second proposed method, estimating the syndrome size per bitplane, has a bitrate similar to that of DISPAC [6]. This result shows that the second method gives an accurate estimate. The first proposed method, on the other hand, has a higher bitrate. This is because the referenced WZ frame is far from the WZ frame being decoded, so the two WZ frames differ considerably and the estimate is inaccurate.
Tables 5-2(a)(b)(c) and 5-3(a)(b)(c) show the bitrate and PSNR of the proposed methods for the foreman and hall monitor sequences. The average PSNR is the same for all methods because our methods do not affect the quality of the sequences.
Fig. 5-2 and Fig. 5-3 show the RD curves of the proposed methods. We found that the second method has better RD performance than the first. Combining this result with the measurements above, we see a tradeoff between RD performance and decoding complexity.
Fig. 5-1 BitRate with all proposed methods, for Qi=8, for Foreman, Soccer, Coast
Guard, and Hall Monitor (QCIF at 15 Hz).
           Q8       Q4       Q2
Bitrate    450.32   161.18   85.79
PSNR       39.29    32.13    28.53
Table 5-2(a) Bitrate and PSNR for the foreman, GOP8, without estimate
           Q8       Q4       Q2
Bitrate    564.29   204.38   112.06
PSNR       39.29    32.13    28.53
Table 5-2(b) Bitrate and PSNR for the foreman, GOP8, estimate per WZ frame
           Q8       Q4       Q2
Bitrate    475.29   178.56   94.50
PSNR       39.29    32.13    28.53
Table 5-2(c) Bitrate and PSNR for the foreman, GOP8, estimate per bitplane
Fig.5-2 Rate-distortion curve for the foreman, GOP8
           Q8       Q4       Q2
Bitrate    153.45   66.01    38.89
PSNR       39.47    33.51    30.86
Table 5-3(a) Bitrate and PSNR for the hall monitor, GOP8, without estimate
           Q8       Q4       Q2
Bitrate    195.62   84.76    49.76
PSNR       39.47    33.51    30.86
Table 5-3(b) Bitrate and PSNR for the hall monitor, GOP8, estimate per WZ frame
           Q8       Q4       Q2
Bitrate    157.72   68.68    39.50
PSNR       39.47    33.51    30.86
Table 5-3(c) Bitrate and PSNR for the hall monitor, GOP8, estimate per bitplane
Fig.5-3 Rate-distortion curve for the hall monitor, GOP8
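Because the PSNR is identical across methods, the RD comparison reduces to a bitrate comparison at each quantization level. The sketch below computes the bitrate overhead of each estimation method relative to the "without estimate" figures in Tables 5-2 and 5-3. The data layout and the function name are illustrative; the numbers are copied from the tables.

```python
# Bitrates copied from Tables 5-2(a)(b)(c) and 5-3(a)(b)(c).
BITRATE = {
    "foreman": {
        "without estimate": {"Q8": 450.32, "Q4": 161.18, "Q2": 85.79},
        "estimate per WZ frame": {"Q8": 564.29, "Q4": 204.38, "Q2": 112.06},
        "estimate per bitplane": {"Q8": 475.29, "Q4": 178.56, "Q2": 94.50},
    },
    "hall monitor": {
        "without estimate": {"Q8": 153.45, "Q4": 66.01, "Q2": 38.89},
        "estimate per WZ frame": {"Q8": 195.62, "Q4": 84.76, "Q2": 49.76},
        "estimate per bitplane": {"Q8": 157.72, "Q4": 68.68, "Q2": 39.50},
    },
}


def overhead_percent(reference: float, estimated: float) -> float:
    """Bitrate increase of an estimation method over the reference, in percent."""
    return 100.0 * (estimated - reference) / reference


OVERHEAD = {
    (sequence, method, q): overhead_percent(tables["without estimate"][q], tables[method][q])
    for sequence, tables in BITRATE.items()
    for method in ("estimate per WZ frame", "estimate per bitplane")
    for q in ("Q8", "Q4", "Q2")
}

for (sequence, method, q), percent in OVERHEAD.items():
    print(f"{sequence}, {q}, {method}: +{percent:.1f}%")
```

On these numbers, estimating per WZ frame adds roughly 25 to 31 percent to the bitrate, while estimating per bitplane adds roughly 2 to 11 percent.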
Chapter 6 Conclusion and future work
To sum up, in this project we propose and implement many components and utilities, such as realizing the DVC encoder on a mobile device and recording the video sequence with the mobile device's camera. For communication between the encoder and the decoder, we implement the feedback channel over a network connection so that the whole DVC system can be realized on mobile devices. To decrease the time complexity of the feedback channel, two methods are proposed to estimate the syndrome size. Furthermore, the DVC decoder is executed on a remote server. In addition, once the decoder finishes decoding the video, transcoding is applied so that mobile devices can decode the resulting video and play it.
In conclusion, to decrease the time complexity of the feedback channel, the project proposes two methods that estimate the syndrome size per WZ frame and per bitplane. The first method estimates the syndrome size by referencing the corresponding WZ frame in the previous GOP, whereas the second method estimates the syndrome size of an AC band by referencing the DC band. In comparison, the former greatly reduces time complexity but also causes a considerable bitrate increase. The latter, by contrast, costs only a small increase in bitrate in exchange for a desirable speedup on the feedback channel.
As suggestions for future research, an improved design for more precise estimation of the syndrome size and a more reliable DVC codec system with higher loss tolerance are both highly recommended. Once the problem of loss tolerance is solved, it becomes much more feasible to adopt the UDP protocol efficiently and to decrease the communication time substantially.
References
[1] Han-Ping Cheng, Yun-Chung Shen, Ja-Ling Wu, and Kiyoharu Aizawa. High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing.
[2] Slepian, D. and Wolf, J. 1973. Noiseless coding of correlated information sources. IEEE Transactions on Information Theory. 19, 4, 471-480.
[3] Wyner, A. and Ziv, J. 1976. The rate-distortion function for source coding with side information at the decoder. IEEE Transactions on Information Theory. 22, 1, 1-10.
[4] Yu-Shan Pai, Han-Ping Cheng, Yun-Chung Shen, and Ja-Ling Wu. Fast Decoding for LDPC Based Distributed Video Coding.
[5] David Varodayan, Anne Aaron, and Bernd Girod. Rate-Adaptive Codes for Distributed Source Coding.
[6] Tse-Chung Su, Yun-Chung Shen, and Ja-Ling Wu. 2011. Real-time Decoding for LDPC Based Distributed Video Coding. National Taiwan University.
[7] Martinez, J.L.; Fernandez-Escribano, G.; Kalva, H.; Fernando, W.A.C.; Cuenca, P. 2009. Wyner-Ziv to H.264 Video Transcoder for Low Cost Video Encoding.
[8] Catarina Brites, João Ascenso, José Quintas Pedro, and Fernando Pereira. Evaluating a feedback channel based transform domain Wyner–Ziv video codec.
[9] Android NDK link: http://developer.android.com/tools/sdk/ndk/index.html