A TUTORIAL on HEVC

Multimedia Processing Lab,UTA
1










Need for MDCT
Introduction
Definition of MDCT
Properties of MDCT
Variants of MDCT
Special Characteristics of MDCT
DFT (vs) SDFT (vs) MDCT
Applications
MDCT-Overview
References
Multimedia Processing Lab,UTA
2
 With rapid deployment of audio compression technologies, more and more
audio content is stored and transmitted in compressed formats.
 Signal representation in the Modified Discrete Cosine Transform (MDCT)
domain has emerged as a dominant tool in high quality audio coding because
of its special properties like:
 Energy compaction
 Critical Sampling
 Block effect reduction
 Flexible window switching
 MDCT is designed to achieve a perfect reconstruction.
Multimedia Processing Lab,UTA
3



MDCT is a linear orthogonal lapped transform, based on the idea of time
domain aliasing cancellation (TDAC), where the basis functions of the
transformation overlap the block boundaries. However, the number of
coefficients which results from a series of overlapping block transforms
remains the same as any other non-overlapping block transformation.
The MDCT is based on the type-IV discrete cosine transform (DCT-IV) [11],
which is given by
It is designed to be applied on consecutive blocks of larger dataset, in
cases where subsequent blocks overlap such that the last half of one block
coincides with the first half of the next block. MDCT is unusual compared
to other Fourier-related transforms in that it has half as many outputs as
inputs (instead of the same number). It is linear function i.e. F:R2n RN
where R denotes the set of real numbers.
Multimedia Processing Lab,UTA
4

MDCT
Xn : windowed i/p signal
an : i/p signal with 2N samples
hn : window function


This window is an identical analysis-synthesis time window, given by
A sine window is widely used in audio coding as it offers good stop-band
attenuation. It also provides attenuation of the block edge effect and allows
perfect reconstruction. However, other optimized windows can also be
applied. The sine window mentioned here can be defined as:
Multimedia Processing Lab,UTA
5
 IMDCT
The yn the equation contain time domain aliasing
Though the IMDCT obtains yn , the original xn can be perfectly obtained by
adding the overlapped IMDCTs of subsequent overlapping blocks as shown
in fig 1. This leads to cancellation of errors and hence the original data can
be retrieved.
Multimedia Processing Lab,UTA
6
1) MDCT is not an orthogonal transform. Perfect signal reconstruction can be achieved in the
overlap-add (OA) process. For the overlap-add window of 2N samples, the first N and last N
samples of the signal will remain modified. One can easily see this from the fact that performing
MDCT and IMDCT of an arbitrary signal xk reconstructs signal Xk.
The Fig 1 describes the simple overlap and add algorithm. Here, first the signal x[n] is partitioned into nonoverlapping sequences. After this the discrete Fourier transforms of the sequences yk[n] are calculated [2].
This is done by multiplying the FFT of x[n] with the FFT of h[n]. Then we recover the yk[n] ] using inverse FFT
[7], the resulting output signal can be reconstructed using overlapping and adding the yk[n] as described in
the Fig 1. The overlap is based on the idea that a linear convolution is always longer than the original
sequences [2].
Multimedia Processing Lab,UTA
7
2) NONORTHOGONAL PROPERTY OF MDCT
Multimedia Processing Lab,UTA
8
Multimedia Processing Lab,UTA
9
3) MDCT becomes an orthogonal transform if the signal length is infinite. This is
different from the traditional definition of orthogonality, which require a square
transform matrix.
4) MDCT is similar to the orthogonal transforms such as DFT, DCT, DST and it also
possesses energy compaction capability.
5) Invertibility: Since the number of inputs and outputs are not equal it may seem that
the MDCT is not invertible, however MDCT is perfectly invertible which is achieved by
adding the overlapped IMDCTs of subsequent overlapping blocks, this leads the
errors to cancel and the original data will be retrieved. This technique is known as
time-domain aliasing cancellation (TDAC) [10].
6) Performing MDCT and then IMDCT with one single frame of time domain samples,
the original time samples cannot be perfectly reconstructed, instead the
reconstructed samples are normally an alias-embedded version. MDCT by itself is a
lossy process.
7) 90% energy is concentrated within 10% of the normalized frequency scale for most of
the test signals for all transforms concerned [2]. The energy compaction property of
different transforms becomes more unified with increasing window length.
Multimedia Processing Lab,UTA
10
8) Window shape has an impact on MDCT energy compaction property. In the
case of rectangular window, DCT has always the best energy compaction
property since DCT corresponds to an even extension of the signal.
9) MDCT is very useful with its time domain alias cancellation (TDAC)
characteristics [10]. However, its mismatch with the DFT domain based
psychoacoustic model must be kept in mind when developing a MDCT based
audio codec with its full potential in terms of coding performance.
10) The distinct advantage of MDCT lies in its critical sampling property,
reduction of block effect and the possibility of adaptive window switching.
11) In comparison to the orthogonal transforms, the MDCT has a special
property i.e. the input signal cannot be perfectly reconstructed from a
single block of MDCT coefficients of the sliding discrete Fourier transformSDFT(N-1)/2,1/1 [12] are lost in the MDCT.
Multimedia Processing Lab,UTA
11
12) In terms of energy compaction property, MDCT does not have any advantage in
comparison to DFT and DCT as indicated in Fig 2.
Multimedia Processing Lab,UTA
12
Low Delay MDCT (LD-MDCT) [3]: Non-overlapping transforms, such as DCT-IV leads to aliasing in lossy compression
coding. MDCT on the other hand eliminate this aliasing effect. However to do this the MDCT requires a 50%
overlap-add (OLA), and as a result it leads to a delay. The LD_MDCT (Low Delay MDCT) is developed to reduce this
delay needed for the look ahead.
Multimedia Processing Lab,UTA
13
The Fig 3(a) illustrates the 50% window overlap. However, MDCT spectra of different time slots
in Fig 4(b) Fig 4(d) Fig 4(f) are calculated with rectangular windows. The IMDCT time domain
samples of frame 1,2,3 are shown in Fig 4(c) Fig 4(e) Fig 4(g) respectively. The reconstructed time
domain samples after overlap-add (OA) procedure is shown in Fig 4(h).
Multimedia Processing Lab,UTA
14
Multimedia Processing Lab,UTA
15
 The Fig 4 shows the fluctuation of MDCT spectrum in comparison with DFT and
SDFT (N=1)/2,1/2 spectra. With a frequency-modulated time signal in Fig 2(a), the
DFT power spectrum is very stable despite a moving window. Conversely, the MDCT
spectrum is very unstable. The SDFT (N=1)/2,1/2 spectrum is in between. This shows
that the SDFT (N=1)/2,1/2 can be used as a bridge go connect MDCT and DFT in
audio coding applications [2].
Multimedia Processing Lab,UTA
16
Multimedia Processing Lab,UTA
17
 MDCT is employed in most modern lossy audio formats such as MP3, AX-3, Vorbis, Windows
Media Audio, Cook and AAC which are different lossy audio compression codecs [13].
 The direct application of MDCT formula require O(N2 ) operations, however the number of
operations can be reduced to only O(Nlog2N) complexity. This reduction can be done by
recursively factorizing the computation as in the cases of an FFT. It is also possible to compute
MDCT using other transforms such as DFT (FFT) or a DCT combined with O(N) pre and post
processing steps.
 MDCT is used as an analysis filter where it limits the sources of output distortion at the
quantization stage. MDCT performs a series of inner products between the input data x(n) and
the analysis filter hk(n).
 Eliminates the blocking artifacts that would cause a problem during the reconstruction of
the sample. The inverse MDCT reconstructs the samples without the blocking artifacts.
 MDCT makes use the concept of time-domain alias cancellation (TDAC) whereas the
quadrature mirror filter bank (QMF) uses the concept of the frequency-domain alias
cancellation [10]. This can be viewed as a duality of MDCT and QMF. However, it is to be noted
that MDCT also cancels frequency-domain aliasing, whereas QMF does not cancel time-domain
aliasing this means that the MDCT is designed to achieve perfect reconstruction, QMF on the
other hand does not produce perfect reconstruction.
Multimedia Processing Lab,UTA
18
 Overlapped windows allow for better frequency response functions but carry the penalty of
additional values in the frequency domain, thus these transforms are not critically sampled. MDCT
thus has solved the paradox satisfactorily and is currently the best solution with critical sampling.
Multimedia Processing Lab,UTA
19
Multimedia Processing Lab,UTA
20
Multimedia Processing Lab,UTA
21
Multimedia Processing Lab,UTA
22
 MDCT becomes an orthogonal transform, if the signal length is infinite. This is
different from the traditional definition of orthogonality, which requires a square
transform matrix.
 The MDCT spectrum of a signal is the Fourier spectrum of the signal mixed with
its alias. This compromises the performance of MDCT as a Fourier spectrum
analyser and leads to possible mismatch problems between MDCT and DFT
based perceptual models. Nevertheless, MDCT has been successfully applied to
perceptual audio compression without major problems if a proper window such
as a sine window is employed.
 The TDAC of an MDCT filter bank can only be achieved with an overlap-add (OA)
process in the time domain. Although MDCT coefficients are quantized in an
individual data block, it is usually analyzed in the context of a continuous stream.
In the case of discontinuity such as editing or error concealment, the aliases of
the two neighbouring blocks in the overlapped area are not able to cancel each
other.
Multimedia Processing Lab,UTA
23
 MDCT can achieve perfect reconstruction only without quantization, which is
never the case in coding applications. If we model the quantization as a
superposition of quantization noise to the MDCT coefficients, then the time
domain alias of the input signal is still cancelled, but the noise components will
be extended as additional “noise alias”. In order to have 50% window overlap
and critical sampling simultaneously, the MDCT time domain window is twice as
long as that of ordinary orthogonal transforms such as DCT. Because of the
increased time domain window length, the quantization noise is spread to the
whole window, thus making pre-echo more likely to be audible. Well-known
solutions to this problem are window switching and temporal noise shaping
(TNS)[15].
 In very low bitrate coding, the high frequency components are often removed.
This corresponds to a very steep low pass filter. Due to the increased window
size, the ringing effect caused by high frequency cutting is longer.
Multimedia Processing Lab,UTA
24
[1] Y.E. Wang, M. Vilermo, “Modified Discrete Cosine Transform – Its Implications for Audio Coding and Error Concealment”, J.
Audio Eng. Soc., Vol. 51, No. ½. January/February 2003.
[2] Y.E. Wang, L. Yaroslavsky, M.Vilermo and M. Vaananen, “Some Peculiar Properties of the MDCT”, Proceedings of ICSP, 2000.
[3] S. Lee, I. Lee, “A Low-Delay MDCT/IMDCT”, ETRI Journal, pp 935-938, Vol 35, Issue 5, October 2013.
[4] Y.E. Wang, L. Yaroslavsky, M. Vilermo, “On the Relationship between MDCT, SDFT and DFT”, Proceedings of ICSP, 2000.
[5] V. Britanak, “A survey of efficient MDCT implementations in MP3 audio coding standard; Retrospective and state-of-the-art”,
Signal Processing, pp 624-672, Vol 91, Issue 4, April 2011.
[6] V. Britanak, H.J. Lincklaen, Arriens, “Fast computational structures for an efficient implementation of the complete TDAC
analysis/synthesis MDCT/MDST filter banks”, Signal Processing, pp 1379-1394, Vol 89, Issue 7, July 2009.
[7] K.R. Rao, D.N. Kim, J.J. Hwang, “Fast Fourier transform: Advantages and applications”, Springer, 2010.
[8] Nuno Roma, Leonel Sousa, “A tutorial overview on the properties of the discrete cosine transforms for encoded image and
video processing”, Signal Processing, pp 2443-2464, Vol 91, Issue 11, November 2011.
[9] Shanmugam. K.S,” Comments on Discrete Cosine Transform”, IEEE Transactions on Computers, Vol C- 24, Issue 7, pp 759, July
1975.
[10] E. Kurniawati, C.T. Lau, B. Premkumar, “Time-domain aliasing cancellation role in error concealment”, Electronics Letters, Vol
40, n 12, pp 781-783, June 10, 2004.
[11] K.R. Rao, P. Yip, “Discrete cosine transforms : algorithms, advantages applications”, New York, Academic Press, 1990.
[12] K. Duda. “ Accurate, Guaranteed Stable, Sliding Discrete Fourier Transform”, IEEE Signal Processing Magazine, Vol 27, Issue 6,
pp 124-127, Nov 10, 2010.
[13] S. Lai, C.H. Luo, S. Lei, “Common Architecture Design of Novel Recursive MDCT and IMDCT Algorithms for Applications to
AAC, AAC in DRAM, and MP3 Codecs”, IEEE Transactions on Circuits and Systems II: Express Briefs, Vol 56, Issue 10, pp 793-797,
Oct 2009.
[14] A. Ferreira, “Spectral Coding and Post-Processing of High Quality Audio”, PhD thesis
http://telecom.inescn.pt/doc/phd_en.html.
[15] Bosi, M., Brandenburg “ISO/IEC MPEG-2 Advanced Audio Coding,” Journal of Audio Engineering Society, vol. 45, no. 10, 1997.
Multimedia Processing Lab,UTA
25