6. Speech Quality Assessment

Outline
– Quality Levels
– Subjective Tests
– Objective Tests
– Intelligibility
– Naturalness

Quality Levels
– Synthetic Quality (below 4.8 kbps)
– Communication Quality (4.8 to 13 kbps)
– Toll Quality (13 to 64 kbps)
– Broadcast Quality (above 64 kbps)

Test Types

              Intelligibility                      Naturalness
Subjective    DRT, MRT                             MOS, DAM
Objective     None (possibly future ASR systems)   AI, Global SNR, Seg. SNR, FW-Seg. SNR, Itakura Measure, WSSM

First Class: Subjective Intelligibility Tests
Diagnostic Rhyme Test (DRT)
– The listener chooses between two CVC (consonant-vowel-consonant) words that differ only in the first consonant
– The first consonants should have specific properties
– Ex. hop / fop, than / dan
Modified Rhyme Test (MRT)
– The listener chooses among several CVC words that differ in the first consonant
– Ex. cat, bat, rat, mat, fat, sat

First Class (Cont'd): Subjective Intelligibility Tests
DRT is widely used and considered reliable.
In this test the listener hears each speech item only once.

$$\mathrm{DRT}\,\% = \frac{N_{\text{correct}} - N_{\text{incorrect}}}{N_{\text{tests}}} \times 100$$

Second Class: Subjective Naturalness Tests
Mean Opinion Score (MOS)
– MOS is widely used and considered reliable
– In this test the listener may hear the speech many times
Diagnostic Acceptability Measure (DAM)
– This test is very complex

Mean Opinion Score (MOS)
MOS scores are assigned as follows:

Score   Speech Quality
1       Not Acceptable
2       Weak
3       Medium
4       Good
5       Excellent

Diagnostic Acceptability Measure (DAM)
This test is very complex.
It scores 19 different parameters, which fall into 3 main groups:
– Signal Quality
– Background Quality
– Total Quality

Objective Tests
These tests cannot be used for intelligibility, because an automatic system cannot judge whether speech is intelligible.
Objective tests can only be used to assess speech naturalness.

Objective Tests (Cont'd)
Articulation Index (AI)
Signal to Noise Ratio (SNR)
– Global (Classic) SNR
– Segmental SNR
– Frequency Weighted Segmental SNR

Articulation Index (AI)
AI assumes that the distortions in different frequency bands are independent, and measures signal quality in each band separately.
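In each band, it determines the percentage of the signal that is perceptible to the listener.
[Figure: 20 frequency bands spanning 200 Hz to 6100 Hz]

Articulation Index (Cont'd)
A signal is perceptible to the listener when it is:
– 1. above the human hearing threshold
– 2. below the human pain threshold
– 3. above the masking noise level
– In each case, one of conditions 1 and 3 prevails

Articulation Index (Cont'd)
In AI, the SNR is measured separately in each band:

$$\mathrm{AI} = \frac{1}{20} \sum_{j=1}^{20} \frac{\min(\mathrm{SNR}_j,\, 30)}{30}$$

Signal to Noise Ratio (SNR)
With $s(n)$ the original signal and $\hat{s}(n)$ the signal under assessment, the error signal is

$$\varepsilon(n) = s(n) - \hat{s}(n)$$

$$E_s = \sum_n s^2(n), \qquad E_\varepsilon = \sum_n \varepsilon^2(n)$$

$$\mathrm{SNR}_{\text{global}} = 10 \log_{10} \frac{E_s}{E_\varepsilon} = 10 \log_{10} \frac{\sum_n s^2(n)}{\sum_n [s(n) - \hat{s}(n)]^2}$$

Segmental SNR

$$\mathrm{SNR}_{\text{seg}} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{n=m_j-N+1}^{m_j} s^2(n)}{\sum_{n=m_j-N+1}^{m_j} [s(n) - \hat{s}(n)]^2} \right]$$

Each term of the outer sum is the SNR of the j-th frame, which holds the N samples ending at sample $m_j$.
M: number of frames

Frequency Weighted Segmental SNR

$$\mathrm{SNR}_{\text{fw-seg}} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{k=1}^{K} W_{j,k} \, \dfrac{E_{s,k}(m_j)}{E_{\varepsilon,k}(m_j)}}{\sum_{k=1}^{K} W_{j,k}} \right]$$

K: number of frequency bands
M: number of frames

Itakura Measure
The measure uses the all-pole (AR) model. $H(\omega)$ is the envelope spectrum of the signal, whose power spectrum is

$$S(\omega) = \mathcal{F}\{R(\tau)\}, \qquad S(\omega) = |X(\omega)|^2$$

Itakura Measure (Cont'd)

$$H(\omega) = \frac{1}{1 - \sum_{i=1}^{p} a_i e^{-j\omega i}}$$

The measure is based on the spectral difference between the original signal and the signal under assessment.
$a_i$: autoregressive coefficients
$K_i$: reflection coefficients
$R_i$: autocorrelation coefficients

Itakura Measure (Cont'd)

$$d\big(g_s(m), g_{\hat{s}}(m)\big) = \frac{1}{M} \sum_{l=1}^{M} \big[ g_s(l, m) - g_{\hat{s}}(l, m) \big]^2$$

m: index of the frame
l: index of the coefficient (in this formula the sum runs over the M coefficients of a frame)

Itakura Measure (Cont'd)

$$\tilde{d}_{l_p}\big(\xi_s(m), \xi_{\hat{s}}(m')\big) = \left[ \frac{\sum_{l=1}^{M} W_{l,m,m'} \, \big| \xi_s(l, m) - \xi_{\hat{s}}(l, m') \big|^p}{\sum_{l=1}^{M} W_{l,m,m'}} \right]^{1/p}$$

$\xi_s(l, m)$ is the l-th parameter of the frame ending at sample m.

Weighted Spectral Slope Measure (WSSM)

$$\Delta|s(k, m)| = |s(k+1, m)| - |s(k, m)|$$

$$\Delta|\hat{s}(k, m)| = |\hat{s}(k+1, m)| - |\hat{s}(k, m)|$$

$|s(k+1, m)|$ and $|s(k, m)|$ are in dB.
$s(k, m)$ is the STFT of the k-th band of the frame ending at sample m.

$$d_{\text{WSSM}}\big(|s(\omega, m)|, |\hat{s}(\omega, m)|\big) = \sum_{k=1}^{K} W_{k,m} \big[ \Delta|s(k, m)| - \Delta|\hat{s}(k, m)| \big]^2, \qquad K = 36$$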
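
Worked example: SNR measures and AI
The global SNR, segmental SNR, and AI formulas above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a reference implementation: the function names are hypothetical, frames are taken as non-overlapping blocks of frame_len samples, every frame is assumed to have nonzero signal and error energy, and the per-band SNR values passed to articulation_index are assumed to be already measured in dB.

```python
import math


def global_snr(s, s_hat):
    """Global SNR in dB: 10*log10(E_s / E_eps), with eps(n) = s(n) - s_hat(n).

    Assumes both energies are nonzero (otherwise the ratio is undefined).
    """
    e_s = sum(x * x for x in s)
    e_eps = sum((x - y) ** 2 for x, y in zip(s, s_hat))
    return 10.0 * math.log10(e_s / e_eps)


def segmental_snr(s, s_hat, frame_len):
    """Average of the per-frame SNRs (dB) over M non-overlapping frames.

    Assumption: frames do not overlap and trailing samples that do not
    fill a whole frame are ignored.
    """
    frame_snrs = []
    for start in range(0, len(s) - frame_len + 1, frame_len):
        end = start + frame_len
        frame_snrs.append(global_snr(s[start:end], s_hat[start:end]))
    return sum(frame_snrs) / len(frame_snrs)


def articulation_index(band_snrs_db):
    """AI = mean over bands of min(SNR_j, 30) / 30, as in the slide formula."""
    return sum(min(snr, 30.0) / 30.0 for snr in band_snrs_db) / len(band_snrs_db)


# Toy signal: the first frame has a larger error than the second one.
s = [1.0, -1.0] * 4
s_hat = [x - 0.1 for x in s[:4]] + [x - 0.01 for x in s[4:]]

print(round(global_snr(s, s_hat), 2))        # 22.97
print(round(segmental_snr(s, s_hat, 4), 2))  # 30.0 (frames at 20 dB and 40 dB)
print(articulation_index([15.0] * 20))       # 0.5
```

In the toy signal the global SNR is dominated by the noisier frame, while the segmental SNR averages the two frame SNRs in dB, which is why the two measures disagree.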