STQ Workshop, Sophia-Antipolis, February 11th, 2003
Packet loss concealment using
audio morphing
Franck Bouteille¹
Pascal Scalart²
Balazs Kövesi²
¹ PRESCOM SA, Lannion, FRANCE
² France Telecom R&D, Lannion, FRANCE
Motivation
In packet data networks, excess traffic leads to delays or losses in the delivery of
information. In voice communication, long delays are intolerable, and network
delay budgets strongly influence the design of packet voice systems.
Several techniques have been developed to increase the tolerance of packet
voice systems to lost packets.
These techniques do not use the a posteriori information carried by the next
received packet, whose arrival reveals the loss of one or several frames.
They are also poorly suited to long loss periods (>15 ms), because the speech
signal is not stationary over the long term.
• This a posteriori information is generally available thanks to playout
buffer management and the real-time network protocol.
• The proposed technique uses the frame received after the last lost one,
the models of the last received frames, and model interpolation to
synthesize the missing signal.
Outline
• Introduction
• Morphing audio principle
  • Voiced / Unvoiced strategy
  • Modelling and interpolation
  • Block concatenation and smoothing
  • Some results of concealed signal
• Comparisons and performances
  • Configuration
  • Results
• Conclusion
Morphing audio principle
• Context of loss:
[Figure: previous frame (Frame A), missing signal, next frame (Frame B)]
• Voiced/Unvoiced strategy
Pitch estimation is performed on Frame A (pitch period P0) and on Frame B
(pitch period P1), with 2.5 ms (400 Hz) ≤ P0, P1 ≤ 15 ms (67 Hz).

Frame A | Frame B | Pitch periods used
V       | V       | P0, P1
V       | UV      | P0, P1 = P0
UV      | V       | P0 = P1, P1
UV      | UV      | Unvoiced signal

V: voiced; UV: unvoiced.
When the missing signal is classified as unvoiced, either Frame A is copied
into the missing signal or comfort noise is generated.
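The voiced/unvoiced decision above can be sketched in a few lines of Python. This is only an illustration: the function name `select_pitch_periods`, the boolean voicing flags and the sample-domain limits (20 to 120 samples, assuming the 8 kHz sampling frequency used elsewhere in these slides) are assumptions, not part of the original material.

```python
from typing import Optional, Tuple

# Assumed: 8 kHz sampling, so 2.5 ms -> 20 samples and 15 ms -> 120 samples.
MIN_PITCH = 20
MAX_PITCH = 120


def pitch_in_range(p: int) -> bool:
    """Check that a pitch period (in samples) lies in the allowed range."""
    return MIN_PITCH <= p <= MAX_PITCH


def select_pitch_periods(
    voiced_a: bool, voiced_b: bool, p0: Optional[int], p1: Optional[int]
) -> Optional[Tuple[int, int]]:
    """Return the (P0, P1) pair to use, or None for an unvoiced gap.

    p0 / p1 are the pitch periods estimated on Frame A / Frame B and are
    only meaningful when the corresponding frame is voiced.
    """
    if voiced_a and voiced_b:
        return p0, p1      # V / V: use both estimates
    if voiced_a:
        return p0, p0      # V / UV: P1 = P0
    if voiced_b:
        return p1, p1      # UV / V: P0 = P1
    return None            # UV / UV: copy Frame A or generate comfort noise
```

A caller would treat a `None` result as the unvoiced case and fall back to frame copy or comfort noise.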
Morphing audio principle
• Modelling and interpolation:
• P0 and P1 are used to estimate the number of intermediate blocks
required (NbBloc) and the size of these blocks (SizeBloc):
$$\mathrm{SizeBloc} = \max(P_0, P_1)$$
$$\mathrm{NbBloc} = \mathrm{round}\!\left(\frac{\mathrm{NbSampleLoss}}{\mathrm{SizeBloc}}\right)$$
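As a small worked example of the block layout, the sketch below computes NbBloc and SizeBloc from pitch periods expressed in samples. The function name `block_layout` is hypothetical, and rounding halves upward is an assumption, since the slides only say "round".

```python
import math


def block_layout(p0: int, p1: int, nb_sample_loss: int) -> tuple:
    """Return (NbBloc, SizeBloc) for a gap of nb_sample_loss samples."""
    size_bloc = max(p0, p1)
    # Assumed rounding rule: round half up (Python's round() rounds half to even).
    nb_bloc = math.floor(nb_sample_loss / size_bloc + 0.5)
    return nb_bloc, size_bloc


# 30 ms lost at 8 kHz = 240 samples, with P0 = 40 and P1 = 50 samples:
# SizeBloc = 50 and NbBloc = round(4.8) = 5.
print(block_layout(40, 50, 240))
```

Note that NbBloc × SizeBloc need not equal NbSampleLoss exactly (here 250 versus 240 samples); the slides do not say how the remainder is handled.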
• We model the last pitch period vector (X0) of Frame A, giving ModP0,
and the first pitch period vector (X1) of Frame B, giving ModP1. The DCT
(Discrete Cosine Transform) is used to model X0 and X1. The resolution is
120 points at a sampling frequency of 8 kHz.
• Intermediate blocks, Block_i, transform the model vector ModP0
continuously into the model vector ModP1 by linear interpolation of the
model parameters:
$$\mathrm{Block}_i(n) = \mathrm{IDCT}_{120}\!\left(\mathrm{ModP0}(k) + i \cdot \frac{\mathrm{ModP1}(k) - \mathrm{ModP0}(k)}{\mathrm{NbBloc}}\right)$$
$$0 \le k \le 120 - 1, \quad 0 \le i \le \mathrm{NbBloc} - 1, \quad 0 \le n \le \mathrm{SizeBloc} - 1$$
IDCT: Inverse Discrete Cosine Transform.
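The interpolation of the DCT models can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not say how a pitch period shorter than 120 samples is mapped onto the 120-point model, so the sketch zero-pads it, uses an orthonormal DCT-II / DCT-III pair, and keeps the first SizeBloc output samples. The names `dct`, `idct` and `morph_blocks` are hypothetical.

```python
import math

N_DCT = 120  # model resolution given in the slides


def dct(x, n=N_DCT):
    """Orthonormal DCT-II of x, zero-padded to n points (assumed padding)."""
    padded = list(x) + [0.0] * (n - len(x))
    out = []
    for k in range(n):
        s = sum(padded[m] * math.cos(math.pi * (m + 0.5) * k / n) for m in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for m in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (m + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def morph_blocks(x0, x1, nb_bloc, size_bloc):
    """Build Block_0 .. Block_{NbBloc-1} by interpolating ModP0 towards ModP1."""
    mod_p0 = dct(x0)
    mod_p1 = dct(x1)
    blocks = []
    for i in range(nb_bloc):
        # ModP0(k) + i * (ModP1(k) - ModP0(k)) / NbBloc
        model = [a + i * (b - a) / nb_bloc for a, b in zip(mod_p0, mod_p1)]
        blocks.append(idct(model)[:size_bloc])
    return blocks
```

With plain zero-padding the DCT is a linear map, so this sketch reduces to a time-domain cross-fade of the padded pitch periods; a length normalisation of X0 and X1 before the transform would behave differently, and the slides leave that choice open.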
Morphing audio principle
• Block concatenation and smoothing
• Each block is then copied into the synthesis frame.
[Figure: synthesis frame made of Frame A, Block_0, Block_1, …, Block_i, …, Block_{NbBloc−1}, Frame B]
Smoothing
• Smoothing between blocks is performed according to:
$$x(0) = \alpha(0)\, x(-1) + (1 - \alpha(0))\, y(0)$$
$$x(i) = \alpha(i)\, x(i-1) + (1 - \alpha(i))\, y(i), \quad 0 \le i \le \mathrm{NbPSmoothing}$$
x(−1): last sample of the previous block (or frame)
y(0): first sample of the current block (or frame)
α(i): smoothing factor,
$$\alpha(i) = 1 - \frac{i + 1}{\mathrm{NbPSmoothing} + 1}$$
Morphing audio principle
• Some results of concealed signal
[Figure: original frame and concealed frame, plotted against sample number]
Voiced frames of a female speech signal (30 ms of missing signal).
Morphing audio principle
• Some results of concealed signal
[Figure: original frame and concealed frame, plotted against sample number]
Behaviour of the morphing technique during a transition frame (30 ms)
for a male speech signal.
• Note that the concealed speech-to-noise transition is more voiced than in
the original frame. In an enhanced morphing technique, the voiced duration
could be controlled.
Comparisons and performances
Ten subjects participated in an informal listening test: they were
asked to listen to coded speech signals that had been
corrected by different concealment techniques.
• Configuration
• Two speech coders (G.711 and G.723.1) were tested independently;
the frame size is 30 ms.
• Five concealment techniques: Previous Frame Copy (PFC), Double Sided
Periodic Substitution (DSPS) [1], the ITU-T recommended technique defined
for each specific coder (G.711 and G.723.1), the GFEC technique [2] and
Audio Morphing.
• Two loss rates were defined: 5% and 10%. Losses can occur in bursts
but are usually isolated.
• The number of sentences was 15 (8 female and 7 male speech files).
[1] J. Tang, "Evaluation of Double Sided Periodic Substitution (DSPS) Method for Recovering Missing
Speech in Packet Voice Communications," IEEE Computers and Communications, pp. 454-458, 1991.
[2] B. Kövesi, D. Massaloux, "Method of Packet Errors Cancellation Suitable for any Speech and Sound
Compression Scheme," ETSI STQ Workshop, February 2003, Sophia-Antipolis.
Comparisons and performances
• Results for G.711 codec
[Bar chart: score (/15), axis from 0 to 7, at loss rates of 5% and 10%, for PFC, FECG711, DSPS, GFEC and MORPHING]
Comparisons and performances
• Results for G.723.1 codec
[Bar chart: score (/15) for G.723.1, axis from 0 to 7, at loss rates of 5% and 10%, for PFC, FECG.723.1, DSPS, GFEC and MORPHING]
Conclusion
• The proposed technique improves the quality of
frame correction at high loss rates (5% and 10%).
• Audio morphing adds latency (Frame B is required),
but this is acceptable for VoIP applications.
• Other modelling approaches are possible, and the voiced
condition can be controlled to improve restitution
quality.