Loss location - Plymouth University

Impact of Packet Loss Location on
Perceived Speech Quality
Lingfen Sun
Graham Wade, Benn Lines
Emmanuel Ifeachor
University of Plymouth, U.K.
IPTEL'2001, New York, USA
Outline
• Introduction
• Codec's internal concealment and
convergence time
• Perceptual speech quality measurement
• Simulation system
• Loss location with perceived quality
• Loss location with convergence time
• Conclusions and future work
Introduction
[Figure: end-to-end path SCN - Gateway - IP Network - Gateway - SCN]
• End-to-end speech transmission quality
– IP network performance (e.g. packet loss and jitter)
– Gateway/terminal (codec + loss/jitter compensation)
• Impact of packet loss on perceived speech quality
– Loss pattern (e.g. burst/random)
– Loss location (codec's concealment)
Introduction (cont.)
• Previous research on loss location
– Concealment performance depends on speech
content (e.g. voiced/unvoiced)
– Analysis based on MSE or SNR for a limited set of codecs
– Perceptual objective methods used only to assess
overall quality under stochastic loss simulations
• Questions:
– How does packet loss location affect perceived
speech quality?
– How does packet loss location affect the codec's
convergence time (for the loss constraint)?
Codec's internal concealment
• What is a codec's concealment?
– When a loss occurs, the decoder interpolates the
parameters for the lost frame from parameters of
previous frames.
• Which codecs have a concealment algorithm?
– G.729/G.723.1/AMR (main VoIP codecs)
– CELP analysis-by-synthesis
• What are the limitations of concealment
algorithms?
– During unvoiced (u) or voiced (v) segments
– During u/v transitions
Codec's convergence time
• What is convergence time?
– The time taken by decoder to resynchronize its
state with encoder after a loss occurs. It is also
called resynchronization time.
– It is used to set the loss constraint distance
between two consecutive losses in new packet
loss metrics.
• What is the relationship between
convergence time and loss location, codec
type and packet size?
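The definition above can be sketched as an MSE-based measurement: compare the decoder output with the loss against the decoder output without it, frame by frame, and count the frames until the two agree again. This is a minimal sketch under stated assumptions; the function names, the `threshold` value and the `hold` length are hypothetical and are not taken from the study.

```python
def frame_mse(no_loss, with_loss, frame_len):
    """Per-frame mean squared error between two decoded sample sequences."""
    n_frames = min(len(no_loss), len(with_loss)) // frame_len
    mse = []
    for i in range(n_frames):
        s = i * frame_len
        pairs = zip(no_loss[s:s + frame_len], with_loss[s:s + frame_len])
        mse.append(sum((x - y) ** 2 for x, y in pairs) / frame_len)
    return mse


def convergence_time(no_loss, with_loss, frame_len, loss_frame, threshold, hold=3):
    """Frames between the loss (0-based frame index) and resynchronization.

    Assumption: the decoder counts as resynchronized at the first frame from
    which the per-frame MSE stays below `threshold` for `hold` frames in a row.
    Returns None if that never happens within the signal.
    """
    mse = frame_mse(no_loss, with_loss, frame_len)
    for i in range(loss_frame, len(mse) - hold + 1):
        if all(m < threshold for m in mse[i:i + hold]):
            return i - loss_frame
    return None
```

For a 10 ms G.729 frame at 8 kHz sampling, `frame_len` would be 80 samples; a suitable `threshold` would have to be chosen empirically.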
Perceptual quality measurement
[Figure: the reference signal passes through the system/network under test to give the degraded signal; an objective perceptual quality test compares the two and outputs an objective MOS]
• Transform both signals into a psychophysical
representation approximating human perception
• Calculate their perceptual difference
• Map the difference to an objective MOS (Mean Opinion Score)
• Algorithms: PSQM/PSQM+/MNB/EMBSD/PESQ
Simulation System
[Figure: simulation block diagram. Blocks: reference speech, encoder, bitstream, loss simulation, two decoders (degraded speech without loss, degraded speech with loss), convergence time analysis, perceptual quality measure against the reference speech]
• Perceptual speech quality analysis with loss location
• Convergence time analysis with loss location
Speech test sentence
• The speech test sentence is about 6 seconds long.
• The first talkspurt (about 1.34 seconds, waveform shown
on the slide) is used for loss location analysis.
• It contains four voiced segments, V(1) to V(4), which
can be identified from the pitch delay in the G.729 codec.
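The slides do not state the rule used to identify voiced segments from the pitch delay. One plausible heuristic, shown here purely as an assumption, is that the pitch delay changes smoothly from frame to frame in voiced speech and jumps erratically elsewhere. The function name and the `tol` and `min_len` parameters are hypothetical.

```python
def voiced_segments(pitch_delay, tol=5, min_len=4):
    """Return (start, end) pairs of 0-based, inclusive frame indices.

    Assumption: a run of frames is treated as voiced when consecutive pitch
    delays differ by at most `tol` and the run lasts at least `min_len` frames.
    """
    segments = []
    start = 0
    n = len(pitch_delay)
    for i in range(1, n + 1):
        # A run ends at the end of the track or at a large jump in pitch delay.
        run_ends = i == n or abs(pitch_delay[i] - pitch_delay[i - 1]) > tol
        if run_ends:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = i
    return segments
```

Applied to a per-frame pitch delay track, this returns candidate voiced runs; the thresholds would need tuning against the actual codec output.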
Pitch delay from G.729 codec
[Figure: pitch delay (axis 0 to 140) vs. frame location (frames 1 to 131, 10 ms/frame), with voiced segments V(1) to V(4) marked]
Loss location with perceived quality
• Only one packet loss is created in each test
• Loss position moves from left to right, one
frame at a time
• Overall perceptual quality is measured with
PSQM/PSQM+, MNB and EMBSD
• Packet size: 1 to 4 frames/packet
• Codec: G.729/G.723.1/AMR
• How does a loss location affect perceived
speech quality?
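The sweep described above can be sketched as a generator of loss masks, one per loss position, each containing exactly one lost packet. This is an illustrative sketch only: the function name is hypothetical, and it assumes the start of the lost packet advances by one frame per test even when a packet holds several frames.

```python
def single_loss_patterns(n_frames, frames_per_packet):
    """Yield (loss_start, lost_flags) for every position of a single packet loss.

    lost_flags[i] is True when frame i (0-based) belongs to the lost packet.
    """
    for start in range(n_frames - frames_per_packet + 1):
        flags = [start <= i < start + frames_per_packet for i in range(n_frames)]
        yield start, flags


# Example: a 134-frame talkspurt (about 1.34 s at 10 ms/frame), 2 frames/packet.
patterns = list(single_loss_patterns(134, 2))
```

Each mask would then be applied to the encoded bitstream before decoding, and the decoded speech scored against the reference; those steps depend on the codec and quality tools and are not shown.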
Loss position with quality (1)
[Figure: panels labelled reference speech, degraded speech, PSQM and PSQM+, with the loss position marked]
Loss position with quality (2)
[Figure: panels labelled reference speech, degraded speech, PSQM and PSQM+, with the loss position marked]
Loss position with quality (3)
[Figure: panels labelled reference speech, degraded speech, PSQM and PSQM+, with the loss position marked]
Loss position with quality (4)
[Figure: panels labelled reference speech, degraded speech, PSQM and PSQM+, with the loss position marked]
Overall PSQM+ vs loss location (G.729)
[Figure: overall PSQM+ (axis 1 to 2.6) vs. loss location (frames 1 to 131, 10 ms/frame) for G.729; legend: 1-frame, 2-frame, 3-frame, 4-frame]
Overall MNB vs loss location (G.729)
[Figure: overall MNB (axis 2.5 to 4) vs. loss location (frames 1 to 131, 10 ms/frame) for G.729; legend: 1-frame, 2-frame, 3-frame, 4-frame]
Overall EMBSD vs loss location (G.729)
[Figure: overall EMBSD (axis 0 to 8) vs. loss location (frames 1 to 131, 10 ms/frame) for G.729; legend: 1-frame, 2-frame, 3-frame, 4-frame]
Overall PSQM+ vs loss location (G.723.1)
[Figure: overall PSQM+ (axis 1 to 4.5) vs. loss location (frames 1 to 41, 30 ms/frame) for G.723.1; legend: 1-frame loss, 2-frame loss, 3-frame loss, 4-frame loss]
Loss location with perceived quality
• Loss location affects perceived quality.
• A loss in an unvoiced speech segment has no
obvious impact on perceived quality.
• A loss at the beginning of a voiced
segment has the most severe impact on
perceived quality.
• PSQM+ yields more detailed results
than MNB or EMBSD.
Convergence time based on MSE
[Figure: convergence time in frames (axis 0 to 50) vs. loss location (frames 1 to 131, 10 ms/frame) for G.729; legend: 1-frame loss, 2-frame loss, 3-frame loss, 4-frame loss]
Convergence time based on PSQM+
[Figure: PSQM+ per frame (axis 0 to 80) vs. frame position (1 to 15); legend: location 1 to location 5]
Convergence time based on PSQM+
[Figure: PSQM+ per frame (axis 0 to 30) vs. frame position (1 to 41); legend entries 1 to 12]
Loss location with convergence time
• Convergence time is almost the same for
different packet sizes
• Convergence time for a loss in unvoiced
segments appears stable
• Convergence time varies approximately linearly
with loss location within voiced segments
– maximum at the beginning of the segment
– decreases linearly towards the end
– upper-bounded by the end of the voiced segment
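One way to read the findings above is that, for a loss inside a voiced segment, the decoder has resynchronized by the time the segment ends, so the convergence time is bounded by the number of frames left in that segment. The sketch below encodes that reading only; it is an interpretation, not a model given in the slides, and the function name and the inclusive frame-counting convention are assumptions.

```python
def convergence_upper_bound(loss_frame, voiced_segments):
    """Upper bound on convergence time (in frames) for a loss in a voiced segment.

    voiced_segments: list of (start, end) 0-based inclusive frame indices.
    Assumption: the bound is the number of frames from the lost frame to the
    end of the enclosing voiced segment, counting the lost frame itself.
    Returns None when the loss falls outside every voiced segment.
    """
    for start, end in voiced_segments:
        if start <= loss_frame <= end:
            return end - loss_frame + 1
    return None
```

Under this reading the bound is largest for a loss at the first frame of a segment and falls by one frame for each frame the loss moves to the right.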
Conclusions and future work
• Investigated the impact of loss locations on
perceived speech quality
• Investigated the impact of loss locations on
convergence time
• The results will help in developing a
perceptually relevant packet loss metric.
• Future work will focus on more extensive
analysis of the impact of packet loss on
speech content