Impact of Packet Loss Location on Perceived Speech Quality
Lingfen Sun, Graham Wade, Benn Lines, Emmanuel Ifeachor
University of Plymouth, U.K.
IPTEL'2001, New York, USA

Outline
• Introduction
• Codec's internal concealment and convergence time
• Perceptual speech quality measurement
• Simulation system
• Loss location with perceived quality
• Loss location with convergence time
• Conclusions and future work

Introduction
[Figure: SCN, Gateway, IP Network, Gateway, SCN connected end to end]
• End-to-end speech transmission quality
  – IP network performance (e.g. packet loss and jitter)
  – Gateway/terminal (codec + loss/jitter compensation)
• Impact of packet loss on perceived speech quality
  – Loss pattern (e.g. burst/random)
  – Loss location (codec's concealment)

Introduction (cont.)
• Previous research on loss location
  – Concealment performance depends on speech content (e.g. voiced/unvoiced)
  – Analysis was based on MSE or SNR, for a limited set of codecs
  – Perceptual objective methods were used only to assess overall quality under stochastic loss simulations
• Questions:
  – How does the location of a packet loss affect perceived speech quality?
  – How does the location of a packet loss affect the codec's convergence time (for the loss constraint)?

Codec's internal concealment
• What is codec concealment?
  – When a loss occurs, the decoder interpolates the parameters for the lost frame from the parameters of previous frames.
• Which codecs have a concealment algorithm?
  – G.729/G.723.1/AMR (main VoIP codecs)
  – CELP analysis-by-synthesis
• What are the limitations of concealment algorithms?
  – During unvoiced (u) or voiced (v) speech
  – During u/v transitions

Codec's convergence time
• What is convergence time?
  – The time taken by the decoder to resynchronize its state with the encoder after a loss occurs. It is also called resynchronization time.
  – It is used to set the loss constraint distance between two consecutive losses for new packet loss metrics.
• What is the relationship between convergence time and loss location, codec type and packet size?

Perceptual quality measurement
[Figure: reference signal → system/network under test → degraded signal; reference and degraded signals → objective perceptual quality test → objective MOS]
• Transform the signals into a psychophysical representation approximating human perception
• Calculate their perceptual difference
• Map the difference to an objective MOS (Mean Opinion Score)
• Algorithms: PSQM/PSQM+/MNB/EMBSD/PESQ

Simulation System
[Figure: block diagram with reference speech, encoder, bitstream, loss simulation, two decoders (degraded speech with loss and without loss), convergence time analysis, and perceptual quality measure]
• Perceptual speech quality analysis with loss location
• Convergence time analysis with loss location

Speech test sentence
[Figure: waveform of the first talkspurt]
• The speech test sentence is about 6 seconds long.
• The first talkspurt (about 1.34 seconds, waveform above) is used for loss location analysis.
• It contains four voiced segments, V(1) to V(4), which can be identified from the pitch delay in the G.729 codec.

Pitch delay from G.729 codec
[Figure: pitch delay (axis 0 to 140) vs. frame location (frames 1 to 131, 10 ms/frame), with voiced segments V(1) to V(4) marked]

Loss location with perceived quality
• Only one packet loss is created at a time
• The loss position moves from left to right, one frame at a time
• Overall perceptual quality is measured with PSQM/PSQM+, MNB and EMBSD
• Packet size: 1 to 4 frames/packet
• Codec: G.729/G.723.1/AMR
• How does a loss location affect perceived speech quality?
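The single-loss sweep listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' simulation code: the function names `single_loss_patterns` and `sweep` are hypothetical, and it assumes that a lost packet removes `frames_per_packet` consecutive frames and that only start positions where the whole packet fits inside the talkspurt are used. Decoding and PSQM/MNB/EMBSD scoring are outside its scope.

```python
from typing import Dict, Iterator, List, Tuple


def single_loss_patterns(
    n_frames: int, frames_per_packet: int
) -> Iterator[Tuple[int, List[bool]]]:
    """Yield (start_frame, loss_mask) for every position of one lost packet.

    loss_mask[i] is True when frame i belongs to the lost packet.
    Assumption: the packet covers `frames_per_packet` consecutive frames
    and must fit entirely inside the n_frames analysed.
    """
    if frames_per_packet < 1 or frames_per_packet > n_frames:
        raise ValueError("frames_per_packet must be between 1 and n_frames")
    # Move the loss from left to right, one frame at a time.
    for start in range(n_frames - frames_per_packet + 1):
        end = start + frames_per_packet
        mask = [start <= i < end for i in range(n_frames)]
        yield start, mask


def sweep(
    n_frames: int, packet_sizes: Tuple[int, ...] = (1, 2, 3, 4)
) -> Dict[int, List[Tuple[int, List[bool]]]]:
    """Collect the loss patterns for each packet size (1 to 4 frames/packet)."""
    return {k: list(single_loss_patterns(n_frames, k)) for k in packet_sizes}
```

For the first talkspurt (about 1.34 s at 10 ms/frame for G.729, i.e. about 134 frames), `sweep(134)` would give one bitstream-loss pattern per frame position and packet size; each pattern would then be applied before decoding and the decoded speech scored against the reference.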

Loss position with quality (1) to (4)
[Figures: four slides, each showing a loss position together with the reference speech, the degraded speech, and the PSQM and PSQM+ results]

Overall PSQM+ vs loss location (G.729)
[Figure: PSQM+ (axis 1 to 2.6) vs. loss location (frames 1 to 131, 10 ms/frame) for G.729; legend: 1-frame, 2-frame, 3-frame, 4-frame]

Overall MNB vs loss location (G.729)
[Figure: MNB (axis 2.5 to 4) vs. loss location (frames 1 to 131, 10 ms/frame) for G.729; legend: 1-frame, 2-frame, 3-frame, 4-frame]

Overall EMBSD vs loss location (G.729)
[Figure: EMBSD (axis 0 to 8) vs. loss location (frames 1 to 131, 10 ms/frame) for G.729; legend: 1-frame, 2-frame, 3-frame, 4-frame]

Overall PSQM+ vs loss location (G.723.1)
[Figure: PSQM+ (axis 1 to 4.5) vs. loss location (frames 1 to 41, 30 ms/frame) for G.723.1; legend: 1-frame loss, 2-frame loss, 3-frame loss, 4-frame loss]

Loss location with perceived quality
• Loss location affects perceived quality.
• A loss in an unvoiced speech segment has no obvious impact on perceived quality.
• A loss at the beginning of a voiced segment has the most severe impact on perceived quality.
• PSQM+ yields more detailed results than MNB/EMBSD.

Convergence time based on MSE
[Figure: convergence time in frames (axis 0 to 50) vs. loss location (frames 1 to 131, 10 ms/frame) for G.729; legend: 1-frame loss, 2-frame loss, 3-frame loss, 4-frame loss]

Convergence time based on PSQM+
[Figure: per-frame PSQM+ (axis 0 to 80) vs. frame position (1 to 15); legend: location 1 to location 5]

Convergence time based on PSQM+
[Figure: per-frame PSQM+ (axis 0 to 30) vs. frame position (1 to 41); series labeled 1 to 12]

Loss location with convergence time
• Convergence time is almost the same for different packet sizes
• Convergence time for a loss in an unvoiced segment appears stable
• Convergence time shows a good linear relationship with loss location for a loss in a voiced segment
  – maximum at the beginning of the segment
  – decreases linearly
  – upper-bounded by the end of the voiced segment

Conclusions and future work
• Investigated the impact of loss location on perceived speech quality
• Investigated the impact of loss location on convergence time
• The results will help in developing a perceptually relevant packet loss metric.
• Future work will focus on a more extensive analysis of the impact of packet loss on speech content
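
As a rough illustration of the MSE-based convergence time analysis in the slides above, the sketch below compares the decoder output with loss against the output without loss, frame by frame. The slides do not give the exact criterion, so this sketch assumes one: the decoder counts as resynchronized once the per-frame MSE stays below a chosen `threshold`. The function names, the threshold and the `loss_end` parameter are all hypothetical.

```python
from typing import List, Sequence


def frame_mse(
    no_loss: Sequence[float], with_loss: Sequence[float], frame_len: int
) -> List[float]:
    """Per-frame mean squared error between two decoded signals.

    Any trailing samples that do not fill a whole frame are ignored.
    """
    if len(no_loss) != len(with_loss):
        raise ValueError("signals must have the same length")
    if frame_len < 1:
        raise ValueError("frame_len must be positive")
    mse = []
    for start in range(0, len(no_loss) - frame_len + 1, frame_len):
        err = sum(
            (no_loss[j] - with_loss[j]) ** 2
            for j in range(start, start + frame_len)
        )
        mse.append(err / frame_len)
    return mse


def convergence_time(mse: Sequence[float], loss_end: int, threshold: float) -> int:
    """Number of frames after the loss until the MSE stays below threshold.

    loss_end is the index of the first frame after the lost packet.
    Assumed criterion: converged after the last frame whose MSE is still
    at or above the threshold.
    """
    last_bad = loss_end - 1
    for i in range(loss_end, len(mse)):
        if mse[i] >= threshold:
            last_bad = i
    return last_bad - loss_end + 1
```

Running `convergence_time` once per loss position would give a curve of convergence time against loss location; the same idea could be applied to a per-frame perceptual distance in place of the MSE.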