SPEECH QUALITY EVALUATION OF HANDS-FREE TELEPHONES DURING DOUBLE TALK: NEW EVALUATION METHODOLOGIES

H. W. Gierlich; F. Kettler; E. Diedrich*
HEAD acoustics GmbH, Ebertstr. 30a, 52134 Herzogenrath, Germany
Tel.: +49 2407 57722; Fax: +49 2407 57799, E-Mail: [email protected]
*Deutsche Telekom Berkom GmbH, Goslarer Ufer 35, 10589 Berlin, Germany
Tel.: +49 30 3497 2416, Fax: +49 30 3497 2954, E-Mail: [email protected]

ABSTRACT
During the past years, auditory test procedures (conversational tests, listening tests and double talk tests) have been developed in order to quantify the speech quality of hands-free telephones subjectively. These auditory test results form the basis for the instrumental evaluation of hands-free telephones: they allow test procedures to be created and performance parameters and limits to be determined. This paper describes objective test parameters specifically for the double talk situation, which is the most critical one for hands-free telephones and requires the most advanced analysis techniques. The test methodology is described using the example of one hands-free telephone.

1 INTRODUCTION
During the last years, significant progress has been made in evaluating the quality of hands-free telephones: new objective test procedures have been proposed [1], and new test signals [2] and new analysis methodologies [3], [4], [5] have been developed. At the same time, advanced signal processing technology has become available at relatively low cost. Consequently, modern hands-free telephones already incorporate signal processing that would have been impossible some years ago; typical examples are acoustic echo cancellers (fullband as well as subband), speech-level-controlled attenuation and companders, and advanced center clipper technologies. The analysis techniques therefore need to be improved in order to assess the telephone behavior, especially with respect to speech quality.
The most important situation in this scenario is double talk. In the double talk situation, speech detection must be very reliable, background noise transmission must not interfere with speech detection, acoustic echo cancellers must not diverge and, last but not least, real double talk interaction between the two conversational partners is required. In this situation, the human perception of quality also differs from that in the single talk situation.

2 OVERVIEW OF RELEVANT AUDITORY PARAMETERS
When evaluating the performance of hands-free telephones subjectively, a very realistic test situation is required on the one hand; on the other hand, very specific tests are needed in order to quantify the relevant performance parameters and to achieve comparable and reproducible results. Fig. 1 gives an overview of the test methods used for subjective performance evaluation, which are the basis of all objective (instrumental) evaluation of telephones; conversational tests give the overall quality.

Fig. 1: Overview of test methods used for subjective evaluation

In conversational tests, parameters such as overall quality, dialog capability and sound quality can be evaluated. During conversational tests, test subjects may also give hints in case they perceive specific problems. In addition, a well-designed double talk test is recommended specifically for the double talk situation. In these specific tests, which are kept quite short, subjects are able to evaluate the double talk situation in much more detail. Typical parameters evaluated using this type of test are:
• double talk capability,
• completeness of speech transmission during double talk,
• room noise transmission,
• sound quality (in single and double talk),
• disturbance caused by echoes,
• disturbance caused by level variation of speech.

Besides the specific double talk tests, the double talk situation can also be evaluated in a listening-only test using binaurally recorded double talk situations.
This type of test allows a more detailed parameter evaluation even by naive subjects, since it requires only listening. A more detailed description of the test procedures can be found in [1], [6] and [8].

3 INSTRUMENTAL EVALUATION: SIGNALS AND PROCEDURES
For the objective evaluation of hands-free telephones, the test signals used are of major importance. On the one hand, test signals should be as speech-like as possible in order to evaluate the hands-free telephone behavior appropriately; on the other hand, the test signal should enable the test equipment to assess the relevant performance parameters easily and reproducibly. The choice of test signal is especially critical in a double talk situation. A speech signal can hardly be used because of its variation in time and frequency. Artificial test signals and sequences are therefore needed (in addition to speech signals) which reproduce:
• the dynamic behavior of speech,
• the talk spurt durations of speech (temporal distribution similar to speech),
• the typical power density spectrum of speech.

Fig. 2: Double talk test signal using modified CS signals [2] with increasing and decreasing level variations for sending (light) and receiving direction (dark)

In order to simulate typical double talk situations (long-duration double talk, short interruptions, fast interaction between speakers), various sequences of test signals are introduced; one example is shown in Fig. 2. The test signal in Fig. 2 consists of spectrally shaped Composite Source Signals (CSS) [2] which are overlapped in such a way that the pseudo-random noise sequence (near end) and the voiced sound of the double talk signal (far end) are present simultaneously, whereas during the pause the pseudo-random noise sequence of the double talk signal (opposite channel) is always present. The sequence length is 32 s. The test signals for the two directions are mutually uncorrelated.
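A minimal sketch of how such a double talk sequence might be assembled, assuming simplified stand-ins for the CSS components: a flat harmonic complex for the voiced sound, white Gaussian noise for the pseudo-random noise sequence, and silence for the pause. The durations (48 ms, 200 ms, 102 ms), the fundamentals (125 Hz and 150 Hz), the 8 kHz sample rate and all function names are illustrative assumptions; spectral shaping and level calibration are omitted, and the sequence is far shorter than the 32 s signal of Fig. 2.

```python
import numpy as np

FS = 8000  # sample rate in Hz (assumption: narrowband telephony)


def css_burst(voiced_ms=48.0, noise_ms=200.0, pause_ms=102.0, f0=125.0, seed=0):
    """One simplified CSS-like burst: voiced part, noise part, pause.

    Hypothetical stand-in: a flat harmonic complex replaces the voiced sound,
    white Gaussian noise replaces the pseudo-random noise sequence, and the
    spectral shaping of the real signal is omitted.
    """
    rng = np.random.default_rng(seed)
    n_v = int(round(FS * voiced_ms / 1000))
    n_n = int(round(FS * noise_ms / 1000))
    n_p = int(round(FS * pause_ms / 1000))
    t = np.arange(n_v) / FS
    harmonics = np.arange(1, int((FS / 2) // f0))  # harmonics below FS/2 only
    voiced = np.sin(2 * np.pi * f0 * np.outer(t, harmonics)).sum(axis=1)
    noise = rng.standard_normal(n_n)
    burst = np.concatenate([voiced, noise, np.zeros(n_p)])
    active = burst[: n_v + n_n]
    return burst / np.sqrt(np.mean(active ** 2))  # RMS of active part = 1


def burst_train(levels_db, f0, seed):
    """Concatenate bursts whose active-part RMS follows levels_db (dB re 1)."""
    return np.concatenate([
        css_burst(f0=f0, seed=seed + i) * 10 ** (lvl / 20)
        for i, lvl in enumerate(levels_db)
    ])


def double_talk_pair(levels_tx_db, levels_rx_db, shift_ms=200.0):
    """Sending and receiving trains with independent noise and different f0.

    With the default burst durations, the receiving train is circularly
    shifted by shift_ms so that its voiced part falls into the noise part of
    the sending burst and its noise part covers the sending pause.
    """
    if len(levels_tx_db) != len(levels_rx_db):
        raise ValueError("both directions need the same number of bursts")
    tx = burst_train(levels_tx_db, f0=125.0, seed=1)
    rx = burst_train(levels_rx_db, f0=150.0, seed=1000)
    rx = np.roll(rx, int(round(FS * shift_ms / 1000)))
    return tx, rx
```

For example, with `up = [-20, -15, -10, -5, 0, -5, -10, -15, -20]`, calling `double_talk_pair([-20 - l for l in up], up)` produces opposite level ramps in the two directions, loosely mimicking the increasing and decreasing level variations of Fig. 2. Using different noise seeds and different fundamentals keeps the two directions essentially uncorrelated.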
These test signals provide a basic overview of double talk performance and allow the following telephone parameters to be evaluated:
• all types of time constants relevant in a double talk situation,
• level and level variation during double talk,
• frequency responses during double talk,
• loudness and loudness variations during double talk.

For a typical test, the telephone is placed in the standard measurement position on a table in a room (arrangement e.g. according to ITU-T P.340). A measurement microphone for the receiving direction is placed directly in front of the loudspeaker of the hands-free telephone (HFT) in order to achieve good acoustic decoupling between sending and receiving; an artificial head is used for signal generation in the sending direction.

3.1 Level Variations During Double Talk
A general view of the behavior of an HFT in double talk conditions is given in Fig. 3, which shows the receiving path amplification as a function of both the receiving and the sending signal level. The test signal used for this evaluation is shown in Fig. 2.

Fig. 3: Variation of amplification in receiving direction during double talk; level variation: receiving: -28 dBm0 .. -8 dBm0 .. -28 dBm0; sending: -4.7 dB(Pa) .. -24.7 dB(Pa) .. -4.7 dB(Pa)

An asymmetric AGC is obviously in operation, which for higher receiving signal levels will have a noticeable effect on the loudness fluctuation perceived subjectively. For increasing receiving levels, a smooth decrease in amplification is found first, followed by a sudden attenuation of about 10 dB. While the receiving level decreases again, three steps in the attenuation can be found: 10 dB, 12.5 dB and 15 dB, where specifically the 10 dB step will have a noticeable impact on the perceived quality. The level variation is quite high (15 dB) and includes a rapid change of amplification.

A more detailed evaluation using this type of test signal is shown in Fig. 4, which gives frequency responses and loudness ratings (at -28 dBm0 and -8 dBm0 receiving signal level) measured in the receiving direction during double talk, using the pause in the double talk signal. Consequently, only 88 ms of signal are available for the analysis of the frequency response and the loudness rating. In this example, the level variation of the HFT under test can be seen as a function of frequency.

Fig. 4: Frequency responses and loudness ratings in receiving direction during double talk. Upper curve: RLR = 6.3 dB, high-level excitation in sending (-4.7 dB(Pa)) and low-level excitation in receiving (-28 dBm0). Lower curve: RLR = 19.5 dB, low-level excitation in sending (-24.7 dB(Pa)) and high-level excitation in receiving (-8 dBm0).

This more detailed investigation shows that the attenuation is not frequency dependent, so no further (audible) artifacts are introduced.

3.2 Switching During Double Talk
An example from the same telephone, but derived from the sending direction, is shown in Figs. 5 and 6. In the sending direction, level switching during double talk occurs for sending levels below -10.7 dB(Pa). In order to determine the subjective relevance of this switching, the time constants and the switching gain need to be determined, as well as the level dependence of these parameters. Switching time and gain can be seen in Fig. 5: 35 ms after double talk, the amplification is raised by about 10 dB, and the switching time is about 10 ms. Fig. 6 indicates that the amplification is not frequency dependent; the level variation is about 10 dB, as already seen in Fig. 5. According to subjective tests [8], these values indicate reasonable double talk performance.

Fig. 5: Switching time and level variation during double talk
Fig. 6: Frequency response during double talk, activated and attenuated

3.3 Echo During Double Talk
Echo during double talk is a quite annoying disturbance.
It becomes audible when transmission delay is present in the connection, so in order to assess echo problems instrumentally, a delay needs to be inserted in the measurement arrangement. Echo problems during double talk are hard to evaluate. They are mostly noticeable during speech sequences in which the double talk signal level itself is low, but the echo signal is high enough to be audible. This motivates a measurement which determines the echo loss during the time in which the double talk signal is low and the echo signal is present. This can be achieved by a specific time sequence of artificial signals (such as CSS) composed in such a way that the echo signal is measured during a pause in the double talk signal or during a low-level double talk signal.

A general problem for this measurement is that, due to insufficient signal-to-noise ratios, the echo delay cannot be determined by cross-correlation between the sending signal and the echo signal. The actual echo delay must therefore be determined in advance, without the double talk signal present. The measurement signal must be chosen such that sufficient energy is present during the evaluation of the echo during double talk, and the analysis window must be chosen such that the echo signal is still present during the analysis even for slight delay variations. A specific test sequence composed of CSS sequences can be used: a double talk signal, followed by the actual test signal, followed again by the double talk signal. The echo signal is analyzed directly after the second double talk sequence; the timing of the sequences must be such that the inserted delay places the echo signal directly after the second double talk sequence. This test allows the echo loss to be measured as a function of level and frequency.

Another, in principle well-known, technique for double talk evaluation is the use of orthogonal test sequences.
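A minimal sketch of the orthogonal-sequence idea, assuming periodic random-phase multisines with interleaved ("comb") spectra: the receiving-direction signal occupies one set of FFT bins and the near-end double talk signal the bins in between, so the echo can be read from the sending-path signal at the receiving bins while double talk is active. This is one generic way of building orthogonal sequences, not necessarily the authors' signal; the period `N`, the 200 to 3400 Hz band, the bin spacing and all function names are hypothetical. A real HFT with nonlinear or time-variant processing will leak energy between the combs, which this sketch ignores.

```python
import numpy as np

FS = 8000  # sample rate in Hz (assumption)
N = 4096   # period of the periodic test signals in samples (assumption)


def interleaved_bins(f_lo=200.0, f_hi=3400.0, spacing=4):
    """Two disjoint comb-like sets of rfft bins inside [f_lo, f_hi); spacing >= 2."""
    k_lo = int(np.ceil(f_lo * N / FS))
    k_hi = int(np.floor(f_hi * N / FS))
    rx_bins = np.arange(k_lo, k_hi, spacing)  # receiving (far-end) comb
    dt_bins = rx_bins + spacing // 2          # near-end double talk comb
    return rx_bins, dt_bins


def comb_signal(bins, seed):
    """Periodic random-phase multisine with energy only in `bins`, unit RMS."""
    rng = np.random.default_rng(seed)
    spec = np.zeros(N // 2 + 1, dtype=complex)
    spec[bins] = np.exp(2j * np.pi * rng.random(len(bins)))
    x = np.fft.irfft(spec, N)
    return x / np.sqrt(np.mean(x ** 2))


def band_power(sig, bins):
    """Energy of one signal period summed over the selected rfft bins."""
    spec = np.fft.rfft(sig, N)
    return float(np.sum(np.abs(spec[bins]) ** 2))


def echo_loss_db(x_rx, y_send, rx_bins):
    """Echo loss evaluated only at bins where the double talk signal is silent."""
    return 10 * np.log10(band_power(x_rx, rx_bins) / band_power(y_send, rx_bins))
```

As a usage example, a simulated sending-path signal `y = 0.1 * np.roll(rx, 100) + dt` (an echo path with 20 dB loss and a circular delay, which is valid for periodic excitation in steady state, plus the near-end double talk signal) yields an echo loss of 20 dB at `rx_bins` while the double talk energy stays entirely at `dt_bins`. Frequency-dependent echo loss could be obtained by passing subsets of `rx_bins`; as stated above, the comb spacing has to suit the hands-free properties of the device.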
A simple methodology is to insert just a sine wave as the double talk signal and to extract this frequency component from the echo measurement. Advanced signal processing, however, will not react properly to a sine wave. A more complex double talk signal is therefore more suitable, e.g. comb-filtered, orthogonal sequences in sending and receiving. By correct filtering of the received echo signal, the double talk signal and the echo signal can be separated. In any case, a proper frequency spacing must be chosen in order to take the specific hands-free properties into account. Of course, for any such measurement it must be ensured that the echo canceller has fully converged before the measurement is conducted.

3.4 More Detailed Evaluations
Fig. 5 already suggests that the hands-free telephone probably incorporates, besides the detected switching, some kind of companding device. The effect of a compander can hardly be evaluated using a simple Composite Source Signal, since this type of signal does not approximate the speech dynamics well enough to reveal artifacts caused by companding. For judging such processing, a more speech-like signal such as the artificial voice [7] is more appropriate. The detailed evaluation using a short sequence of this P.50 signal is shown in Fig. 7.

Fig. 7: Companding characteristic, sending direction; excitation signal: artificial voice (ITU-T P.50), -4.7 dB(Pa)

For this evaluation, a spectral representation is needed. A simple level evaluation may fail because of the nonlinear frequency response of the HFT, which is excited by a signal with changing frequency content. When only the level variation of the output signal versus the input signal is evaluated, a level variation may appear which merely reflects the frequency variation of the input signal, with each frequency component amplified differently according to the frequency-dependent attenuation. Fig. 7 shows one example. Ideally (no companding), the colors of the picture would be frequency dependent, but no variation in time would be found. Fig. 7 clearly reveals the companding device, which operates in a level range of ±3 dB over the whole frequency band. This relatively low degree of companding has only a minor impact on the subjective rating: the loudness variation caused by the companding is noticeable, but not disturbing.

SUMMARY
An overview has been given of objective measures for the double talk situation which can be used to assess the double talk performance of hands-free telephones instrumentally. The measures are based on previously conducted auditory experiments. Selected examples have shown the validity of these test methods. More detailed validation procedures are being carried out in order to validate the methodology with the various equipment available.

This work was supported by Deutsche Telekom AG.

References
[1] Gierlich, H.W. The Auditory Perceived Quality of Hands-Free Telephones: Auditory Judgements, Instrumental Measurements and Their Relationship, Speech Communication 20 (1996) 241-254, October 1996
[2] Gierlich, H.W. Principle and Application of a new Test Signal to Determine the Transfer Characteristics of Telecommunication Systems, IEEE Workshop 1993, New Paltz, New York, Final Program and Paper Summaries, Session 7
[3] Krebber, W.; Böhme, St.; Gierlich, H.W. A new Artificial Ear for Telephone Measurements, ASA 1993, Denver, Colorado
[4] Gierlich, H.W.; Kettler, F.; Krebber, W.; Diedrich, E. Quality Evaluation Procedures for Hands-Free Telephones, ITG-Workshop Darmstadt, 1996, Proceedings pp. 30-31
[5] Gierlich, H.W.; Kettler, F.; Hottenbacher, A.; Diedrich, E. Transmission Quality of Hands-Free Telephones: Auditory Tests, Instrumental Measurements and Suggested Measurement Parameters for Classifications, ITG-Workshop Darmstadt, 1996, Proceedings pp. 32-33
[6] Gierlich, H.W.; Krebber, W.; Kettler, F. Subjective Evaluation of Hands-Free Telephones Using Conversational Tests, Specific Double Talk Tests and Listening Only Tests, ITU-T SG 12 Meeting, 1997, Geneva, COM 12-06E
[7] ITU-T Recommendation P.50: Artificial voice
[8] Diedrich, E.; Dehnel, A. Ein aufwandsreduziertes Listening Only-Testverfahren zur Bestimmung der Sprachqualität von Freisprecheinrichtungen [A reduced-effort listening-only test procedure for determining the speech quality of hands-free devices], DAGA 1998, Zürich