SPEECH QUALITY EVALUATION OF HANDS-FREE
TELEPHONES DURING DOUBLE TALK: NEW
EVALUATION METHODOLOGIES
H. W. Gierlich; F. Kettler; E. Diedrich*
HEAD acoustics GmbH, Ebertstr. 30a, 52134 Herzogenrath, Germany
Tel.: +49 2407 57722; Fax: +49 2407 57799, E-Mail: [email protected]
*Deutsche Telekom Berkom GmbH, Goslarer Ufer 35, 10589 Berlin, Germany
Tel.: +49 30 3497 2416, Fax: +49 30 3497 2954, E-Mail: [email protected]
ABSTRACT
During the past years, auditory test procedures
(conversational tests, listening tests and double
talk tests) have been developed in order to quantify
the speech quality of hands-free telephones
subjectively. These auditory test results are the
basis for the instrumental evaluation of hands-free
telephones. They allow the creation of test
procedures as well as the determination of
performance parameters and limits. The following
paper describes objective test parameters
specifically for the double talk situation, which is
the most critical one for hands-free telephones and
requires the most advanced analysis techniques.
Using the example of one hands-free telephone, the
test methodology is described.
1 INTRODUCTION
In recent years significant progress has been made
in evaluating the quality of hands-free telephones:
new objective test procedures have been proposed
[1], and new test signals [2] and new analysis
methodologies have been developed [3], [4], [5].
On the other hand, advanced signal processing
technology is available at relatively low cost.
Modern hands-free telephones consequently already
incorporate signal processing technology which
would have been impossible some years ago:
acoustic echo cancellers (fullband as well as
subband cancellers), speech level controlled
attenuation and companders, and advanced center
clipper technologies are typical examples.
Consequently the analysis techniques need to be
improved in order to qualify the telephone behavior,
especially with respect to speech quality. The most
important situation in this scenario is the double
talk situation. In the double talk situation speech
detection must be very reliable, background noise
transmission must not interfere with speech
detection, acoustical echo cancellers must not
diverge and, last but not least, real double talk
interaction between the two conversational partners
is required. In this situation the human sensation
of quality also differs from that in the single talk
situation.
2 OVERVIEW ABOUT RELEVANT AUDITORY PARAMETERS
When evaluating the performance of hands-free
telephones subjectively, a very realistic test
situation is required on the one hand; on the other
hand, very specific tests are needed in order to
quantify the relevant performance parameters and to
achieve comparable and reproducible results. Fig. 1
gives an overview of the test methods which are used
for subjective performance evaluation and which are
the basis of all objective (instrumental) evaluation
of telephones: conversational tests give the overall
quality.
Fig.1: Overview about test methods used for
subjective evaluation
In conversational tests, parameters like overall
quality, dialog capability and sound quality can be
evaluated. During conversational tests, test subjects
may give hints in case they perceive specific
problems. In addition, a well designed double talk
test is recommended specifically for the double talk
situation. In these specific tests, which are kept
quite short, subjects are able to evaluate the double
talk situation in much more detail. Typical
parameters evaluated using this type of test are:
double talk capability, completeness of speech
transmission during double talk, room noise
transmission, sound quality (in single and double
talk), disturbance caused by echoes, and disturbance
caused by level variation of speech. Besides the
specific double talk tests, there is also the
possibility to evaluate the situation using
binaurally recorded double talk situations in a
listening only test. This type of test allows a more
detailed parameter evaluation, also by naive
subjects, since it requires only listening. A more
detailed description of the test procedures can be
found in [1], [6] and [8].
3 INSTRUMENTAL EVALUATION: SIGNALS AND PROCEDURES
For objective evaluation of hands-free telephones
the test signals used are of major importance. On
the one hand test signals should be as speech like
as possible in order to evaluate the hands-free
telephone behavior appropriately; on the other hand
the test signal should enable the test equipment to
easily and reproducibly assess the relevant
performance parameters. Especially in a double talk
situation, the choice of the test signal is critical. A
speech signal can hardly be used due to its time
and frequency variance. So, artificial test signals
and sequences are needed (in addition to speech
signals) which reproduce:
• the dynamic behavior of speech,
• talk spurt durations of speech (temporal
distribution similar to speech),
• the typical power density spectrum of speech.
Fig.2: Double talk test signal using modified CS
signals [2] with increasing and decreasing
level variations for sending (light) and
receiving direction (dark)
In order to simulate typical double talk situations
(long duration double talk, short interruptions, fast
interactions of speakers) various sequences of test
signals are introduced. One example is shown in
Fig. 2.
The test signal in Fig. 2 consists of spectrally
shaped Composite Source Signals [2] which overlap
in such a way that the pseudo-random noise sequence
(near end) and the voiced sound of the double talk
signal (far end) are present at the same time,
whereas during the pause the pseudo-random noise
sequence of the double talk signal (opposite
channel) is always present. The sequence length is
32 s. The test signals are uncorrelated. These test
signals allow a basic overview of double talk
measurement, in which the following parameters of a
telephone can be evaluated:
• all types of time constants relevant in a double
talk situation,
• level and level variation during double talk,
• frequency responses during double talk,
• loudness and loudness variations during double
talk.
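The opposing level variations of such a sequence can be sketched as
a simple level schedule. The sketch below is only an illustration:
it uses the level ranges quoted in the caption of Fig. 3 and the
32 s sequence length, but the number of level steps (4 up, 4 down),
the equal segment durations and the function name are assumptions
that are not taken from the measurement set-up.

```python
def level_schedule(start_db, peak_db, steps_up):
    """Levels stepping from start_db to peak_db and back to start_db."""
    step = (peak_db - start_db) / steps_up
    rising = [start_db + n * step for n in range(steps_up + 1)]
    # Mirror the rising part without repeating the peak value.
    return rising + rising[-2::-1]

# Assumed: 4 equal steps in each direction (not stated in the paper).
receiving_dbm0 = level_schedule(-28.0, -8.0, steps_up=4)
sending_dbpa = level_schedule(-4.7, -24.7, steps_up=4)

# Assumed: the 32 s sequence is divided into equally long segments.
segment_s = 32.0 / len(receiving_dbm0)

for rcv, snd in zip(receiving_dbm0, sending_dbpa):
    print(f"receiving {rcv:6.1f} dBm0   sending {snd:6.1f} dB(Pa)")
```

While the receiving level rises, the sending level falls by the same
amount, so each segment probes a different level relation between the
two talkers.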
For a typical test the telephone is placed in the
standard measurement position in a room on a table
(arrangement e.g. according to ITU-T P.340). A
measurement microphone for measuring the
receiving direction is placed directly at the
loudspeaker of the hands-free telephone (HFT) in
order to achieve a good acoustical decoupling
between sending and receiving. An artificial head is
used for signal generation in sending direction.
3.1 Level Variations During Double Talk
A general view of the behavior of an HFT in double
talk conditions is given in Fig. 3, which shows the
receiving path amplification as a function of the
receiving as well as the sending signal level. The
test signal used for this evaluation is shown in
Fig. 2.
Fig. 3: Variation of amplification in receiving
direction during double talk,
level variation:
receiving: -28 dBm0..-8 dBm0..-28 dBm0
sending: -4.7 dB(Pa)..-24.7 dB(Pa)..-4.7 dB(Pa)
It is obvious that an asymmetric automatic gain
control (AGC) is in operation, which for higher
receiving signal levels will have a noticeable effect
on the subjectively perceived loudness fluctuation.
For increasing receiving levels, first a smooth
decrease in amplification can be found, followed by
a sudden attenuation of about 10 dB. While
decreasing the receiving level again, 3 steps in the
attenuation can be found: 10 dB, 12.5 dB and 15 dB,
where specifically the 10 dB step will have a
noticeable impact on the perceived quality. The
level variation is quite high (15 dB) and
incorporates a rapid change of amplification.
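Sudden changes of amplification like those described above can be
located automatically in a measured gain trajectory. The following
is a minimal sketch under stated assumptions: the gain values are
invented for illustration (they are not the data of Fig. 3), and the
5 dB threshold that separates a "step" from a smooth change is a
hypothetical choice.

```python
def find_gain_steps(gain_db, threshold_db=5.0):
    """Return (index, change_db) for every jump of at least threshold_db."""
    steps = []
    for n in range(1, len(gain_db)):
        change = gain_db[n] - gain_db[n - 1]
        if abs(change) >= threshold_db:
            steps.append((n, change))
    return steps

# Hypothetical gain trajectory in dB, one value per test segment:
# smooth decrease, sudden 10 dB attenuation, later release.
gain_db = [0.0, -1.0, -2.0, -3.0, -13.0, -13.0, -13.0, -3.0, -1.0, 0.0]

steps = find_gain_steps(gain_db)
total_variation_db = max(gain_db) - min(gain_db)
print(steps, total_variation_db)
```

Smooth changes of 1 or 2 dB per segment stay below the threshold;
only the abrupt attenuation and its release are reported.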
A more detailed evaluation using this type of test
signal is shown in Fig. 4. Fig. 4 shows frequency
responses and loudness ratings (at -28 dBm0 and
-8 dBm0 receiving signal level) which are measured
in the receiving direction during double talk using
the pause in the double talk signal. Thus, the
frequency response and loudness rating were
measured using 88 ms of signal for analysis. In
this example the level variation as a function of
frequency of the HFT under test can be seen.
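An analysis window of only 88 ms limits the frequency resolution
that can be obtained. As a short worked example, the sketch below
computes the number of samples and the resulting spectral line
spacing. The 8 kHz sampling rate is an assumption (the paper does
not state the sampling rate used), as is the function name.

```python
def window_properties(duration_s, sample_rate_hz):
    """Return (number of samples, frequency resolution in Hz)."""
    samples = round(duration_s * sample_rate_hz)
    resolution_hz = sample_rate_hz / samples
    return samples, resolution_hz

# Assumed sampling rate: 8 kHz.
samples, resolution_hz = window_properties(0.088, 8000)
print(samples, round(resolution_hz, 2))
```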
Fig. 4: Frequency responses and Loudness Ratings
in receiving direction during double talk
RLR = 6.3 dB: high level excitation in sending
(-4.7 dB(Pa)) and low level excitation in
receiving (-28 dBm0)
RLR = 19.5 dB: low level excitation in sending
(-24.7 dB(Pa)) and high level excitation in
receiving (-8 dBm0)
Fig. 5: Switching time and level variation during
double talk
Fig. 6: Frequency response during double talk,
activated and attenuated
From this more detailed investigation it can be seen
that the attenuation is not frequency dependent.
Thus no further (audible) artifacts are introduced.
3.2 Switching During Double Talk
One example from the same telephone, but derived
from the sending direction, is shown in Figs. 5 and
6. In sending direction, level switching during
double talk can be found for sending levels less
than -10.7 dB(Pa). In order to determine the
subjective relevance of this switching, the time
constants and switching gain need to be determined,
as well as the level dependent change of these
parameters. Switching time and gain can be seen in
Fig. 5. 35 ms after double talk the amplification is
raised by about 10 dB. The switching time is about
10 ms. Fig. 6 indicates that there is no frequency
dependent amplification. The level variation is
about 10 dB, as already seen from Fig. 5. These
values indicate a reasonable double talk performance
as derived from subjective tests [8].
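Switching gain and switching time can be read from a level versus
time trace such as the one in Fig. 5. The sketch below shows one
possible way to do this. The definitions used (switching starts when
the level leaves the initial value by more than a tolerance and ends
when it comes within that tolerance of the final value), the 0.5 dB
tolerance, the 1 ms time grid and the synthetic trace are all
assumptions made for illustration, not the procedure of the paper.

```python
def switching_parameters(level_db, step_ms=1.0, tolerance_db=0.5):
    """Return (gain_db, switching_time_ms, onset_ms) of one level switch."""
    initial, final = level_db[0], level_db[-1]
    # First sample that has clearly left the initial level.
    start = next(n for n, v in enumerate(level_db)
                 if abs(v - initial) > tolerance_db)
    # First later sample that has reached the final level.
    end = next(n for n, v in enumerate(level_db)
               if n >= start and abs(v - final) <= tolerance_db)
    return final - initial, (end - start) * step_ms, start * step_ms

# Synthetic trace on a 1 ms grid: 0 dB until 35 ms, then a ramp of
# 1 dB per ms up to +10 dB. These are not measured values.
trace = [min(max(t - 35.0, 0.0), 10.0) for t in range(80)]

gain_db, switching_ms, onset_ms = switching_parameters(trace)
print(gain_db, switching_ms, onset_ms)
```

Other definitions, for example a 10 % to 90 % rise time, would give
slightly different switching times for the same trace.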
3.3 Echo During Double Talk
A quite annoying disturbance is echo during double
talk. Echo during double talk occurs when
transmission delay is present in the connection. In
order to assess echo problems instrumentally, a
delay needs to be inserted in the measurement
arrangement. Echo problems during double talk are
hard to evaluate. They are mostly noticeable during
speech sequences where the double talk signal
level itself is low, but the echo signal is high
enough to be audible. This motivates a measurement
which allows the determination of the echo loss
during the time where the double talk signal is
low and the echo signal is present. This can be
done with a specific time sequence consisting of
artificial signals (like CSS) which are composed
such that the echo signal is measured during a
pause in the double talk signal or during a low
level double talk signal. A general problem for
this measurement is that, due to insufficient S/N
ratios, no cross-correlation between sending signal
and echo signal can be performed for determination
of the echo delay. So the actual echo delay must be
determined in advance without the presence of the
double talk signal. The measurement signal must be
chosen in a way that sufficient energy is present
during the evaluation of the echo during double talk.
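The two-stage idea described above (first determine the echo delay
without double talk, then evaluate the echo loss in a window placed
at that delay) can be sketched as follows. This is a simplified
illustration only: Gaussian noise stands in for the actual
measurement signal, the echo path is a pure delay with 20 dB
attenuation, and all names and values (37 samples delay, 400 sample
window) are hypothetical.

```python
import math
import random

def estimate_delay(reference, recorded, max_lag):
    """Lag in samples that maximises the cross-correlation."""
    best_lag, best_value = 0, float("-inf")
    for lag in range(max_lag + 1):
        value = sum(r * recorded[i + lag]
                    for i, r in enumerate(reference)
                    if i + lag < len(recorded))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag

def level_db(samples):
    """Mean power of the samples in dB."""
    return 10.0 * math.log10(sum(s * s for s in samples) / len(samples))

def echo_loss_db(reference, recorded, delay):
    """Echo loss in a window that starts at the known echo delay."""
    window = recorded[delay:delay + len(reference)]
    return level_db(reference) - level_db(window)

# Stage 1: recording WITHOUT double talk, used to find the delay.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(400)]
true_delay, echo_gain = 37, 0.1          # hypothetical echo path
recorded = [0.0] * (len(reference) + 100)
for i, r in enumerate(reference):
    recorded[i + true_delay] += echo_gain * r

delay = estimate_delay(reference, recorded, max_lag=100)

# Stage 2: echo loss evaluated in the window given by that delay.
loss_db = echo_loss_db(reference, recorded, delay)
print(delay, round(loss_db, 1))
```

In a real measurement the second stage would be applied to the
recording made with the double talk sequence, with the window
located in the pause of the double talk signal.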
The analysis window must be chosen such that the
echo signal is present during the analysis even for
slight delay variations. A specific test sequence
can be used which contains the double talk signal,
followed by the actual test signal, followed again
by the double talk signal, all composed of CSS
sequences. The analysis of the echo signal is made
directly after the second double talk sequence. The
timing of the sequences must be such that the
inserted delay provides the echo signal directly
after the second double talk sequence. This test
allows the measurement of echo loss as a function of
level and frequency.
Another, in principle well known, technique for
double talk evaluation is the use of orthogonal test
sequences. A simple methodology is to insert just a
sine wave as a double talk signal and to extract
this frequency component from the echo measurement.
Advanced processing methodologies, however, will not
react to a sine wave in a proper way. Thus a more
complex double talk signal is more suitable, which
introduces e.g. comb-filtered, orthogonal sequences
in sending and receiving. By correct filtering of
the received echo signal a separation between double
talk signal and echo signal can be made. In any
case, a proper frequency spacing must be chosen
in order to take the specific hands-free properties
into account. Of course, for any measurement it
must be guaranteed that the echo canceller has fully
converged before the measurement is conducted.
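The separation of echo and double talk signal by orthogonal,
comb-like spectra can be illustrated with two multisine signals
that occupy interleaved frequency bins. This is a sketch under
assumptions, not the test signal of the paper: the block length,
the bin sets, the unit amplitudes and the 20 dB echo attenuation
are all chosen freely for the example.

```python
import cmath
import math

N = 256                                  # assumed block length
receive_bins = list(range(4, 60, 4))     # comb for receiving signal
send_bins = list(range(6, 60, 4))        # interleaved comb, sending

def multisine(bins, n=N):
    """Sum of unit-amplitude cosines on the given DFT bins."""
    return [sum(math.cos(2 * math.pi * k * t / n) for k in bins)
            for t in range(n)]

def bin_power(signal, bins):
    """Power of a real signal contained in the given DFT bins."""
    n = len(signal)
    total = 0.0
    for k in bins:
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signal)) / n
        total += 2.0 * abs(coeff) ** 2
    return total

receive_signal = multisine(receive_bins)
send_signal = multisine(send_bins)       # near-end double talk

# Sending path: echo of the receiving signal (assumed 20 dB down)
# plus the simultaneous near-end double talk signal.
mixed = [0.1 * r + s for r, s in zip(receive_signal, send_signal)]

# "Filtering" the mixture on the receive comb isolates the echo.
echo_power = bin_power(mixed, receive_bins)
echo_loss_db = 10.0 * math.log10(
    bin_power(receive_signal, receive_bins) / echo_power)
leakage = bin_power(send_signal, receive_bins)
print(round(echo_loss_db, 1), leakage < 1e-12)
```

Because the two combs share no bins, the double talk signal
contributes practically nothing to the receive bins, and the echo
loss is recovered although both signals are present at once.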
3.4 More Detailed Evaluations
Fig. 5 already indicates that the hands-free
telephone probably incorporates, besides the
detected switching, some kind of companding device.
The effect of a compander can hardly be evaluated
using simply a Composite Source Signal. This type of
signal does not approximate the speech dynamics
well enough to find artifacts caused by companding.
When judging this kind of processing, a more speech
like signal such as artificial voice [7] is more
appropriate. The detailed evaluation using a short
sequence of this P.50 signal is shown in Fig. 7.
Fig. 7: Companding characteristic sending,
excitation signal sending using artificial voice
(ITU-T P.50), -4.7 dB(Pa)
For this evaluation a spectral representation is
needed. Simple level evaluation may fail due to the
nonlinear frequency response of the HFT, which is
excited by a signal with changing frequency
content. When just evaluating the level variation of
the output signal vs. the input signal, a level
variation may be seen which only reflects the
frequency variation of the input signal, amplified
differently according to the frequency dependent
attenuation for the individual frequency component.
Fig. 7 shows one example. Ideally (no companding)
the colors of the picture would be frequency
dependent, but no variation in time would be found.
Fig. 7 clearly indicates that the companding device
operates in a level range of ± 3 dB over the whole
frequency band. This relatively low degree of
companding is of minor impact on the subjective
rating. The loudness variation caused by this
companding is noticeable, but not disturbing.
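The reasoning above can be made concrete with a small sketch: the
gain is computed per frequency band and per time frame, and the
variation over time is then inspected for each band. A device with
a frequency dependent but time invariant response shows no
variation, while a level dependent (companding) device does. All
band levels, band gains and the compression slope below are invented
for illustration and do not come from Fig. 7.

```python
def gain_variation_per_band(input_db, output_db):
    """Max minus min of the gain over time, one value per band.

    input_db and output_db are lists of frames; each frame is a
    list of band levels in dB.
    """
    gains = [[o - i for i, o in zip(frame_in, frame_out)]
             for frame_in, frame_out in zip(input_db, output_db)]
    return [max(band) - min(band) for band in zip(*gains)]

# Hypothetical input: 3 time frames, 2 frequency bands (dB).
input_db = [[-30.0, -40.0], [-20.0, -35.0], [-25.0, -30.0]]
band_gain_db = [2.0, -6.0]      # assumed frequency response

# Device A: frequency dependent gain only, no companding.
linear_out = [[lvl + g for lvl, g in zip(frame, band_gain_db)]
              for frame in input_db]

# Device B: additional level dependent gain (assumed slope).
compand_out = [[lvl + g - 0.25 * (lvl + 30.0)
                for lvl, g in zip(frame, band_gain_db)]
               for frame in input_db]

print(gain_variation_per_band(input_db, linear_out))
print(gain_variation_per_band(input_db, compand_out))
```

A plain broadband level comparison would mix both effects, which is
why the evaluation is carried out per frequency band.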
SUMMARY
An overview has been given of objective measures
for the double talk situation which can be used in
order to instrumentally assess the double talk
performance of hands-free telephones. The
measures are based on auditory experiments which
have been conducted previously. Some selected
examples have shown the validity of these test
methods. More detailed validation procedures are
being carried out in order to validate the
methodology with the various equipment available.
The work was supported by Deutsche Telekom AG.
References
[1] Gierlich, H.W.
The Auditory Perceived Quality of Hands-Free
Telephones: Auditory Judgements, Instrumental
Measurements and Their Relationship, Speech
Communication 20 (1996) 241-254, October 1996
[2] Gierlich, H.W.
Principle and Application of a new Test Signal to
Determine the Transfer Characteristics of
Telecommunication Systems, IEEE Workshop 1993,
New Paltz, New York, Final Program And Paper
Summaries, Session 7
[3] Krebber, W.; Böhme St.; Gierlich, H.W.
A new Artificial Ear for Telephone Measurements, ASA
1993, Denver Colorado
[4] Gierlich, H.W.; Kettler, F.; Krebber, W.; Diedrich, E.
Quality Evaluation Procedures for Hands-Free
Telephones, ITG-Workshop Darmstadt, 1996,
Proceedings pp. 30-31
[5] Gierlich, H.W.; Kettler, F.; Hottenbacher, A.; Diedrich, E.
Transmission Quality of Hands-Free Telephones -
Auditory Tests, Instrumental Measurements and
Suggested Measurement Parameters for
Classifications, ITG Workshop Darmstadt, 1996,
Proceedings pp. 32-33
[6] Gierlich, H.W.; Krebber, W.; Kettler, F.
Subjective Evaluation of Hands-Free Telephones
Using Conversational Tests, Specific Double Talk
Tests and Listening Only Tests, ITU-T SG 12 Meeting,
1997, Genf, COM 12-06E
[7] ITU-T Recommendation P.50: Artificial voice
[8] Diedrich, E.; Dehnel, A.
Ein aufwandsreduziertes Listening Only-Testverfahren
zur Bestimmung der Sprachqualität von
Freisprecheinrichtungen, DAGA 1998, Zürich