CM0340/CMT502 Solutions CARDIFF UNIVERSITY EXAMINATION

CM0340/CMT502 Solutions
CARDIFF UNIVERSITY
EXAMINATION PAPER
Academic Year:
2012/2013
Examination Period:
Spring
Examination Paper Number: CM0340/CMT502 Solutions
Examination Paper Title:
Multimedia
Duration:
2 hours
Do not turn this page over until instructed to do so by the Senior Invigilator.
Structure of Examination Paper:
There are 14 pages.
There are 4 questions in total.
There are no appendices.
The maximum mark for the examination paper is 81 and the mark obtainable for a question
or part of a question is shown in brackets alongside the question.
Students to be provided with:
The following items of stationery are to be provided:
ONE answer book.
Instructions to Students:
Answer 3 questions.
The use of calculators is permitted in this examination.
The use of translation dictionaries between English or Welsh and a foreign language bearing
an appropriate departmental stamp is permitted in this examination.
Q1.
(a) How does the human eye sense colour? What characteristics of the human visual
system can be exploited for the compression of colour images and video?
The eye is basically sensitive to colour and intensity
• The retina of the eye has ‘neurons’ on which light is focused. Each neuron is either
a rod or a cone.
[1]
• Rods are not sensitive to colour; they sense intensity only (monochrome).
[1]
• Cones come in 3 types: the first responds most to light of long wavelengths,
red/yellowish colours. The second type responds most to light of medium wavelength, peaking at a green colour. The third type responds most to short-wavelength light, of a bluish colour.
[1]
• Each type responds differently to the various frequencies of light: non-linearly, and not equally for R, G and B.
[1]
• Image and video compression exploits the fact that intensity (monochrome) can
be modelled at high resolution, while colour can be modelled at lower resolution and
non-linearly w.r.t. colour sensitivity.
[1]
5 Marks - Bookwork
(b) Different colour models are often used in different applications. What is the
CMYK colour model? Give an application in which this colour model is mostly
used and explain the reason.
The CMYK colour model uses Cyan, Magenta, Yellow and Black as primaries
(components).
[1]
The CMYK colour model is mostly used in printing because the colour pigments
on the paper absorb certain colours, so a subtractive model is suitable; black is
used to produce a darker black than simply mixing CMY.
[2]
3 Marks — Bookwork
Given a colour represented in RGB colour space as R = 0.2, G = 0.6, B = 0.3,
what is its representation in the CMYK colour model?
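First convert to CMY as
\[
\begin{pmatrix} \bar{C} \\ \bar{M} \\ \bar{Y} \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
- \begin{pmatrix} R \\ G \\ B \end{pmatrix}
= \begin{pmatrix} 0.8 \\ 0.4 \\ 0.7 \end{pmatrix}
\]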
Then
K = min(C̄, M̄ , Ȳ ) = 0.4,
C = C̄ − K = 0.4,
M = M̄ − K = 0,
Y = Ȳ − K = 0.3.
[2]
2 Marks — Unseen problem
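The conversion above can be checked with a minimal Python sketch. The function name `rgb_to_cmyk` is illustrative only, and the inputs are assumed to be normalised to the range 0 to 1 as in the question.

```python
def rgb_to_cmyk(r, g, b):
    """Convert RGB values in [0, 1] to CMYK using K = min(C', M', Y')."""
    # Subtractive complement of each channel (the "barred" CMY values).
    c_bar, m_bar, y_bar = 1.0 - r, 1.0 - g, 1.0 - b
    # Black takes over the part that is common to all three inks.
    k = min(c_bar, m_bar, y_bar)
    return c_bar - k, m_bar - k, y_bar - k, k


c, m, y, k = rgb_to_cmyk(0.2, 0.6, 0.3)
# Round to hide floating-point noise in the subtraction.
print([round(v, 2) for v in (c, m, y, k)])  # [0.4, 0.0, 0.3, 0.4]
```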
(c) What is a colour look-up table and how is it used to represent colour?
Colour Look-Up Tables (LUTs)
• Store only an index into the colour LUT for each pixel.
• Look up the table to find the colour (RGB) for that index.
[1]
[1]
[3]
5 Marks - Bookwork
Give an advantage and a disadvantage of this representation with respect to true
(24-bit) colour.
Advantage : Uses significantly less memory than full 24-bit colour.
Disadvantage : Restricted number of colours available.
[1]
[1]
2 Marks - Bookwork
How do you convert from 24-bit colour to an 8-bit colour look up table representation?
• A LUT needs to be built when converting 24-bit colour images to 8-bit, by grouping similar colours (each group is assigned one colour entry).
[1]
1 Mark - Bookwork
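As an illustration of grouping similar colours, the sketch below uses uniform 3-3-2 bit quantisation (3 bits red, 3 bits green, 2 bits blue). This particular grouping is an assumption made for the example, not something the solution prescribes; adaptive methods such as median cut choose the groups from the image content instead. The names `build_332_lut` and `to_index` are hypothetical.

```python
def build_332_lut():
    """Build a 256-entry LUT: 3 bits red, 3 bits green, 2 bits blue."""
    lut = []
    for index in range(256):
        r3 = (index >> 5) & 0b111
        g3 = (index >> 2) & 0b111
        b2 = index & 0b11
        # Scale each field back to the full 0..255 range.
        lut.append((r3 * 255 // 7, g3 * 255 // 7, b2 * 255 // 3))
    return lut


def to_index(r, g, b):
    """Map a 24-bit colour to an 8-bit LUT index by dropping low-order bits."""
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


lut = build_332_lut()
idx = to_index(200, 100, 50)   # the one byte stored for this pixel
print(idx, lut[idx])           # 204 (218, 109, 0)
```

Each pixel then costs 8 bits instead of 24, at the price of the colour (200, 100, 50) being displayed as the nearby LUT entry (218, 109, 0).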
(d) What is chroma subsampling? Why is chroma subsampling meaningful? What is
the benefit of doing chroma subsampling?
Chroma subsampling is a method that stores colour information at lower resolution than intensity information.
[1]
Chroma subsampling is meaningful because human visual system is less sensitive
to variations in colour than brightness.
[1]
Chroma subsampling can reduce the bandwidth for colour detail with almost no
perceivable visual difference.
[1]
3 Marks — Bookwork
For the following array of colour values, give chroma subsampling results with
4:2:2, 4:1:1 and 4:2:0 schemes. Note: Listing the formulae to obtain the entries
without calculating the final numbers is acceptable.
90 100  96  42
80  18  82  78
44  62  52  38
28  23  48  22
Chroma subsampling result for 4:2:2 scheme:
90  96
80  82
44  52
28  48
[2]
Chroma subsampling result for 4:1:1 scheme:
90
80
44
28
[2]
Chroma subsampling result for 4:2:0 scheme:
(90 + 100 + 80 + 18)/4 = 72      (96 + 42 + 82 + 78)/4 = 74.5 ≈ 75
(44 + 62 + 28 + 23)/4 = 39.25 ≈ 39      (52 + 38 + 48 + 22)/4 = 40
[2]
6 Marks — Unseen problem
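The three schemes can be sketched in Python as follows. This is a minimal illustration that follows the conventions used in the solution (keep the first sample of each horizontal group for 4:2:2 and 4:1:1, average each 2x2 neighbourhood for 4:2:0); the function names are illustrative, and real codecs may place or filter the retained samples differently.

```python
def subsample_422(block):
    """Keep every second sample in each row (horizontal factor 2)."""
    return [row[::2] for row in block]


def subsample_411(block):
    """Keep every fourth sample in each row (horizontal factor 4)."""
    return [row[::4] for row in block]


def subsample_420(block):
    """Average each 2x2 neighbourhood (factor 2 horizontally and vertically)."""
    out = []
    for i in range(0, len(block), 2):
        out.append([
            (block[i][j] + block[i][j + 1]
             + block[i + 1][j] + block[i + 1][j + 1]) / 4
            for j in range(0, len(block[i]), 2)
        ])
    return out


block = [[90, 100, 96, 42],
         [80, 18, 82, 78],
         [44, 62, 52, 38],
         [28, 23, 48, 22]]

print(subsample_422(block))  # [[90, 96], [80, 82], [44, 52], [28, 48]]
print(subsample_411(block))  # [[90], [80], [44], [28]]
print(subsample_420(block))  # [[72.0, 74.5], [39.25, 40.0]]
```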
Question 1 Total Marks 27
Q2.
(a) GIF and JPEG are two commonly used image representations. Do they usually
use lossless or lossy compression? State the major compression algorithm (if
lossless) or the lossy steps of the algorithm (if lossy) for each representation.
Lossless or lossy:
GIF : Lossless.
JPEG : Lossy.
[1]
[1]
Key algorithms:
GIF : Key algorithm is LZW (lossless)
JPEG : Lossy steps involve quantisation and chroma subsampling
[1]
[1]
4 Marks — Bookwork
(b) Briefly describe the four basic types of data redundancy that data compression
algorithms can apply to audio, image and video signals.
4 types of redundancy:
• Temporal – redundancy over time: in 1D data and 1D signals (audio), and between successive frames in video (3D data). [2]
• Spatial – correlation between neighbouring pixels or data items.
[2]
• Spectral – correlation between colour or luminance components. This
uses the frequency domain to exploit relationships between frequency of
change in data.
[2]
• Psycho-visual, psycho-acoustic – exploit perceptual properties of the human
visual system or aural system to compress data.
[2]
8 Marks Bookwork
(c) Given the following string as input, /TAN/HAN/HAN/AN/, with the initial dictionary below, encode the sequence with LZW algorithm, showing the intermediate steps.
Index   Entry
1       /
2       H
3       A
4       N
5       T
RECAP: (Not explicitly required for solution)
The LZW Compression Algorithm:
w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
output the code for w;   /* flush the final token */
The steps to encode the above string are as follows:
• wk = /: EXISTS, so w = /
• wk = /T: NEW. Output 1 (/), new table entry 6 : /T, w = T
• wk = TA: NEW. Output 5 (T), new table entry 7 : TA, w = A
• wk = AN: NEW. Output 3 (A), new table entry 8 : AN, w = N
• wk = N/: NEW. Output 4 (N), new table entry 9 : N/, w = /
• wk = /H: NEW. Output 1 (/), new table entry 10 : /H, w = H
• wk = HA: NEW. Output 2 (H), new table entry 11 : HA, w = A
• wk = AN: EXISTS, so w = AN
• wk = AN/: NEW. Output 8 (AN), new table entry 12 : AN/, w = /
• wk = /H: EXISTS, so w = /H
• wk = /HA: NEW. Output 10 (/H), new table entry 13 : /HA, w = A
• wk = AN: EXISTS, so w = AN
• wk = AN/: EXISTS, so w = AN/
• wk = AN/A: NEW. Output 12 (AN/), new table entry 14 : AN/A, w = A
• wk = AN: EXISTS, so w = AN
• wk = AN/: EXISTS, so w = AN/. End of input: output the final token, which is 12
To summarise, the output table (new elements) is:
6 : /T
7 : TA
8 : AN
9 : N/
10 : /H
11 : HA
12 : AN/
13 : /HA
14 : AN/A
So the output will be 1 5 3 4 1 2 8 10 12 12
10 Marks — Unseen problem applying algorithms covered in lectures. 3
marks for keeping w, 2 marks for appropriate allocation of index, 3 marks
for symbol table and 3 marks for output
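The encoding can be reproduced with a short Python sketch of the recap algorithm. The function name `lzw_encode` and the use of a Python dict for the dictionary are choices made for this illustration.

```python
def lzw_encode(text, dictionary):
    """Encode text with LZW; dictionary maps strings to integer codes."""
    table = dict(dictionary)              # work on a copy of the initial table
    next_code = max(table.values()) + 1   # first free index (6 here)
    w = ""
    output = []
    for k in text:
        wk = w + k
        if wk in table:
            w = wk                        # keep extending the current match
        else:
            table[wk] = next_code         # add wk to the dictionary
            next_code += 1
            output.append(table[w])       # output the code for w
            w = k
    if w:
        output.append(table[w])           # flush the final token
    return output, table


initial = {"/": 1, "H": 2, "A": 3, "N": 4, "T": 5}
codes, table = lzw_encode("/TAN/HAN/HAN/AN/", initial)
print(codes)  # [1, 5, 3, 4, 1, 2, 8, 10, 12, 12]
```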
(d) Briefly describe the LZW decoding process, and illustrate your answer with the
above string sequence.
RECAP: (Not explicitly required for solution)
The LZW Decompression Algorithm:
read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
    entry = dictionary entry for k;
    output entry;
    add w + entry[0] to dictionary;
    w = entry;
}
Decoding:
Have sequence: 1 5 3 4 1 2 8 10 12 12
And Code Book:
Index   Entry
1       /
2       H
3       A
4       N
5       T
So we get:
• Input (w = k): 1. Output (table entry for k): /
• Input k: 5. Output (table entry for k): T. New table entry 6 : /T
• Input k: 3. Output (table entry for k): A. New table entry 7 : TA
• Input k: 4. Output (table entry for k): N. New table entry 8 : AN
• Input k: 1. Output (table entry for k): /. New table entry 9 : N/
• Input k: 2. Output (table entry for k): H. New table entry 10 : /H
• Input k: 8. Output (table entry for k): AN. New table entry 11 : HA
• Input k: 10. Output (table entry for k): /H. New table entry 12 : AN/
• Input k: 12. Output (table entry for k): AN/. New table entry 13 : /HA
• Input k: 12. Output (table entry for k): AN/. New table entry 14 : AN/A
Decoded Stream is (as expected):
/TAN/HAN/HAN/AN/
Note Output Table (New Elements) is as before:
6 : /T
7 : TA
8 : AN
9 : N/
10 : /H
11 : HA
12 : AN/
13 : /HA
14 : AN/A
5 Marks — Unseen problem
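The decoding can likewise be sketched in Python. The function name `lzw_decode` is illustrative. The branch for a code that is not yet in the table is the usual LZW special case; it is not part of the recap pseudocode and is not triggered by this particular sequence, so it is included here only as a safeguard.

```python
def lzw_decode(codes, dictionary):
    """Decode a list of LZW codes; dictionary maps integer codes to strings."""
    table = dict(dictionary)          # work on a copy of the initial code book
    if not codes:
        return "", table
    next_code = max(table) + 1        # first free index (6 here)
    w = table[codes[0]]
    output = [w]
    for k in codes[1:]:
        if k in table:
            entry = table[k]
        elif k == next_code:
            entry = w + w[0]          # code refers to the entry being built
        else:
            raise ValueError(f"invalid LZW code: {k}")
        output.append(entry)
        table[next_code] = w + entry[0]   # add w + entry[0] to the dictionary
        next_code += 1
        w = entry
    return "".join(output), table


code_book = {1: "/", 2: "H", 3: "A", 4: "N", 5: "T"}
text, table = lzw_decode([1, 5, 3, 4, 1, 2, 8, 10, 12, 12], code_book)
print(text)  # /TAN/HAN/HAN/AN/
```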
Question 2 Total Marks 27
Q3.
(a) Briefly outline, with the aid of suitable diagrams, the JPEG/MPEG I-Frame
compression pipeline and list the constituent compression algorithms employed
at each stage in the pipeline.
The Major Steps in JPEG/MPEG Coding involve:
[Diagrams: JPEG encoder; MPEG encoder]
[2]
• Colour space transform and subsampling
• DCT (Discrete Cosine Transform)
• Quantisation
• Zig-zag scan
• Differential Pulse Code Modulation (DPCM) on the DC component (in JPEG)
• Run-length encoding (RLE) on the AC components (JPEG), or on the whole zig-zag scan (MPEG)
• Entropy coding: Huffman or arithmetic
[7]
9 Marks Bookwork
What are the key differences between the JPEG and MPEG I-Frame compression
pipelines?
Four main differences:
• JPEG uses YIQ whilst MPEG uses YUV (YCbCr) colour space
[1]
• MPEG uses larger DCT block sizes, 16 or even 32, as opposed to
JPEG’s 8
[1]
• Different quantisation: MPEG usually uses a constant quantisation value.
[1]
• Differential Pulse Code Modulation (DPCM) is applied to the DC component only in JPEG.
RLE is applied to the AC components in JPEG and to the complete zig-zag scan in MPEG.
[1]
4 Marks Applied Bookwork: Some lateral thinking to compare JPEG and
MPEG not directly compared in course notes at least
(b) Motion JPEG (or M-JPEG) is a video format that uses JPEG picture compression for each frame of the video. Why is M-JPEG not widely used as a video
compression standard?
Compressing each frame independently does not yield the high compression ratio
that is required for general video needs. The temporal aspect of video can be exploited to
get better compression.
[2]
2 Marks Bookwork
Briefly state what additional approaches are used by MPEG video compression
algorithms to improve on M-JPEG.
Adopt some form of temporal compression. Use P-frames and B-frames to do
differencing between frames, together with motion estimation.
[2]
2 Marks Bookwork
(c) What processes above give rise to the lossy nature of JPEG/MPEG video
compression?
Lossy steps:
• Colour space subsampling in IQ or UV components. [2]
• Quantisation reduces bits needed for DCT components. [2]
4 Marks Bookwork
(d) Given the following portion from a block (assumed to be 4x4 pixels to simplify
the problem) from an image after the Discrete Cosine Transform stage of the
compression pipeline has been applied:
118   42   54  150
 42   32   30   34
100   60   43   98
 44   39   40   31
i. What is the result of the quantisation step of the MPEG video compression
method assuming that a constant quantisation value of 32 is used?
Trick needed to be remembered from notes is that we divide the matrix by
the quantisation table or in this case a constant.
So in this case divide all values by 32 and round down (Integer division).
3 1 1 4
1 1 0 1
3 1 1 3
1 1 1 0
[3]
ii. What is the output of the following zig-zag step being applied to the resulting
quantised block?
Trick needed to be remembered from notes is that the zig-zag scan reads off values
from the DCT block in order of increasing frequency (better than row by row).
Create a vector rather than a matrix.
So we get a vector from matrix above:
3 1 1 3 1 1 4 0 1 1 1 1 1 3 1 0
[3]
6 Marks: Unseen Problem
Question 3 Total Marks 27
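Both steps of part (d) can be checked with a small Python sketch. It assumes the DCT block is read row by row as 118 42 54 150 / 42 32 30 34 / 100 60 43 98 / 44 39 40 31, and that the zig-zag scan starts at the top-left, moves right first and then alternates direction along each anti-diagonal, as in JPEG. The function names are illustrative.

```python
def quantise(block, q):
    """Divide every coefficient by the constant q, rounding down."""
    return [[value // q for value in row] for row in block]


def zigzag(block):
    """Read a square block in zig-zag order (low to high frequency)."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):            # s = row + col labels each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)         # even diagonals run bottom-left to top-right
        out.extend(block[i][s - i] for i in rows)
    return out


dct = [[118, 42, 54, 150],
       [42, 32, 30, 34],
       [100, 60, 43, 98],
       [44, 39, 40, 31]]

q_block = quantise(dct, 32)
print(q_block)          # [[3, 1, 1, 4], [1, 1, 0, 1], [3, 1, 1, 3], [1, 1, 1, 0]]
print(zigzag(q_block))  # [3, 1, 1, 3, 1, 1, 4, 0, 1, 1, 1, 1, 1, 3, 1, 0]
```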
Q4.
(a) In MPEG audio compression, what is
i. frequency masking?
When an audio signal consists of multiple frequencies, the sensitivity of the
ear changes with the relative amplitude of the signals. If two frequencies are
close and the amplitude of one is less than that of the other, then the
quieter frequency may not be heard.
[2]
2 Marks: Bookwork
ii. temporal masking?
After the ear hears a loud sound, consisting of multiple frequencies, it takes
a further short while before it can hear a quieter sound close in frequency.[2]
2 Marks: Bookwork
Briefly describe the cause of each kind of masking in the human auditory system?
Frequency Masking:
• Stereocilia in inner ear get excited as fluid pressure waves flow over them.
[1]
• Stereocilia of different length and tightness on the basilar membrane resonate
in sympathy to different frequencies of fluid waves (banks of stereocilia at
each frequency band).
[1]
• Stereocilia already excited by a frequency cannot be further excited by a
lower-amplitude wave of nearby frequency.
[1]
3 Marks: Bookwork
Temporal Masking:
• (Like frequency masking) Stereocilia in inner ear get excited as fluid pressure waves flow over them and respond to different frequencies.
[1]
• Stereocilia already excited by a certain frequency will take a while to return
to rest state, as inner ear is a closed fluid chamber and pressure waves will
eventually dampen down.
[1]
• Similar to frequency masking, stereocilia in a ‘dampening state’ may not
respond to a lower-amplitude wave of nearby frequency.
[1]
3 Marks: Bookwork
6 Marks: subtotal
10 Marks: Q4(a) Total
(b) Briefly describe, using a suitable diagram if necessary, the MPEG-1 audio compression algorithm, outlining how frequency masking and temporal masking are
encoded.
MPEG audio compression basically works by:
• Dividing the audio signal up into a set of frequency subbands (Filtering) [1]
• Use filter banks to achieve this.
• Sub-bands approximate critical bands.
• Each band is quantised according to the audibility of quantisation noise.
[2]
[1]
[1]
[1]
Frequency masking and temporal masking are encoded by:
Frequency Masking: MPEG Audio encodes this by quantising each filter bank
with adaptive values derived from the energy of neighbouring bands, defined by a look-up
table.
[2]
Temporal Masking:
Not so easy to model as frequency masking. MP3 achieves this with a 50%
overlap between successive transform windows, which gives window sizes of 36 or
12, and applies basic frequency masking as above.
[2]
10 Marks: Bookwork
(c) In MPEG-4 Audio an alternative synthesis-based approach may be adopted to
achieve compression. Briefly discuss how the following may be compressed with
MPEG-4 Audio:
• Musical Audio Signals.
• Spoken Word Audio.
What are advantages and disadvantages of such approaches?
• Musical Audio Signal: Use MIDI-type Structured Audio facilities in MPEG-4. "Compose" music from scratch using S/W tools, or use pitch-to-MIDI or
some transcription tools.
[1]
• Spoken Word Audio. Use Text-to-Speech (TTS) facilities in MPEG-4. Again
could transcribe audio or use some text-to-speech analysis tools.
[1]
Advantages: Very low bitrate streams/compression.
[1]
Disadvantages: Lose control of the true nature of sounds, so audio won’t sound like
the given speaker or the source music.
[1]
4 Marks: Applied Bookwork, Text-to-Speech UNSEEN
(d) Assume that after analysis, the critical band filters of MPEG-1 Audio have output
the levels of 3 consecutive critical bands as:
Band         1    2    3
Level (dB)   20   90   55
Assume that, for signals above 80 dB in band 2, the signal-to-mask ratios give
a masking of 30 dB in band 1 and 40 dB in band 3:
Show how temporal masking is implemented in MPEG audio compression. What
is the saving in bits to transmit the masked value in each masked band?
Relies on simple thresholding above or below given values (look-up table)
• In band 1, 20 dB < 30 dB, so ignore it: don’t send any bits. The saving is clearly
4 bits.
[1]
• In band 3, 55 dB > 40 dB, so send it: only the difference value above the masking
value, 15 dB (suitably coded), needs to be sent. 4 bits instead of 6 bits: a saving of 2 bits (= 12
dB).
[2]
3 Marks: Unseen problem
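The thresholding decision can be sketched in Python as below. This illustrates only the comparison against the masking levels given in the question; the function name `apply_masking` and the dictionary layout are assumptions made for the example, and the mapping from dB to bits is left out.

```python
def apply_masking(levels, masks):
    """Per band: None if the level is below its mask, else dB above the mask."""
    result = {}
    for band, level in levels.items():
        mask = masks.get(band, 0)     # bands without a mask are sent in full
        result[band] = None if level < mask else level - mask
    return result


levels = {1: 20, 2: 90, 3: 55}
# The masking in bands 1 and 3 only applies when band 2 exceeds 80 dB.
masks = {1: 30, 3: 40} if levels[2] > 80 else {}

print(apply_masking(levels, masks))  # {1: None, 2: 90, 3: 15}
```

Band 1 is dropped entirely, band 2 is sent in full, and band 3 is sent as the 15 dB that lies above its masking level.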
Question 4 Total Marks 27
END OF EXAMINATION