CODEBOOK MODELING IN THE SPEAKER VERIFICATION/IDENTIFICATION TASK

Kh. M. Akhmad
Vladimir State University, 600005, Vladimir, Gorkovo st., 87; +7 920 911 45 05; [email protected]

Speaker recognition methods based on vector quantization (VQ) algorithms are considered: distortion measures, reference (codebook) modeling, and identification using codebooks.

Introduction

Various speech recognition methods exist [3]; recently, however, comparison against reference patterns has become the dominant approach. This is mainly due to progress in electronic components, in particular the growth of processor computational power and memory capacity. In template matching, descriptions of speech signals are compared with reference (template) descriptions stored in advance, and their degree of similarity is computed. The recognition result is the most similar reference pattern.

Task description

When solving the identification problem, using the entire set of extracted feature vectors as the reference is inefficient and inconvenient, since their number can be very large (proportional to the length of the utterance), which leads to significant growth of the speaker database and to slower identification. Moreover, feature vectors are distributed non-uniformly over some region of the feature space, and groups of mutually similar feature vectors lie close to one another. It therefore makes sense to partition the set of feature vectors into groups of similar vectors. This problem is solved by vector quantization (VQ), the process of mapping a large number of vectors into a finite number of regions of the vector space. Each such region is called a cluster and can be represented by its centroid, referred to as a code vector. The set of code vectors of one speaker is called the codebook and serves as that speaker's reference [4, 6, 7, 8].

Task solution

The codebook modeling task is stated as follows. Let a set of feature vectors be given, X = {x_i | i = 1, ..., L}, where x_i = (x_{i1}, ..., x_{iD})^T \in R^D and D is the vector dimension (D = 12). Let C = {c_1, ..., c_K} be the codebook, where K << L is the number of code vectors (clusters) and c_i = (c_{i1}, ..., c_{iD})^T \in R^D. A vector quantizer Q of dimension D and size K is then a mapping of the vector set X onto one of the centroids: Q: X -> C = {c_i | i = 1, ..., K}. The main task in vector quantizer modeling is distortion minimization, i.e. minimization of the distances between the vectors and the centroids. The choice of distortion measure during codebook modeling can affect its quality. By a distance, or distortion measure, we mean a function

d: R^D \times R^D -> R,    (1)

which measures the dissimilarity between two feature vectors x and y; for identical vectors the distance equals zero.

The Euclidean distance is the most common measure. It follows the physical notion of distance between two points in space and is defined here as

d_E(x, y) = (x - y)^T (x - y) = \sum_{k=1}^{D} (x_k - y_k)^2.    (2)

The mean-square error (MSE),

d_{MSE}(x, y) = (1/D)(x - y)^T (x - y) = (1/D) \sum_{k=1}^{D} (x_k - y_k)^2,    (3)

can serve as a modification of this measure.

Weighted mean-square error. The main problem with the Euclidean distance is that the variances of the vector components differ from one another, so components with larger variance dominate the distance. Moreover, the variances of the mel-frequency cepstral coefficient (MFCC) components differ strongly. Hence either normalization of the feature vectors is needed, or a normalizing measure should be used:

d_w(x, y) = (1/D)(x - y)^T W (x - y),    (4)

where W is a positive-definite weighting matrix. W is often taken as W = \Gamma^{-1}, where \Gamma is the covariance matrix of the random vector x:

\Gamma = E[(x - \bar{x})(x - \bar{x})^T],  \bar{x} = E[x].    (5)

In this case

d_w(x, y) = (1/D)(x - y)^T \Gamma^{-1} (x - y),    (6)

which is the well-known Mahalanobis distance [8]. A simplified normalization method based on the Mahalanobis distance [6] is used in this work. The method uses the MSE measure, but with arguments that have been normalized beforehand:

\hat{x}_k = (x_k - \mu_k) / \sigma_k,  k = 1, ..., D,    (7)

where x_k and \hat{x}_k are the original and normalized k-th components of a feature vector, and \mu_k and \sigma_k are the mean and standard deviation of the k-th component over all feature vector samples. After this normalization the vectors have zero mean and unit variance.
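As an illustration of the distortion measures (2), (3), (4), (6) and the normalization (7), the following Python sketch implements them with NumPy. The function names and the 12-dimensional "MFCC" example data are illustrative assumptions, not part of the original text.

import numpy as np

def euclidean_distortion(x, y):
    """Squared Euclidean distortion, eq. (2): (x - y)^T (x - y)."""
    d = x - y
    return float(d @ d)

def mse_distortion(x, y):
    """Mean-square error, eq. (3): (1/D)(x - y)^T (x - y)."""
    d = x - y
    return float(d @ d) / d.size

def weighted_distortion(x, y, W):
    """Weighted MSE, eq. (4); with W = inv(cov) it becomes the Mahalanobis distance, eq. (6)."""
    d = x - y
    return float(d @ W @ d) / d.size

def normalize_features(X):
    """Per-component normalization, eq. (7): zero mean, unit variance over all samples."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Illustrative usage with random 12-dimensional vectors standing in for MFCC features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 12))          # L = 100 feature vectors, D = 12
    Xn = normalize_features(X)
    W = np.linalg.inv(np.cov(X, rowvar=False))
    print(euclidean_distortion(X[0], X[1]))
    print(mse_distortion(Xn[0], Xn[1]))
    print(weighted_distortion(X[0], X[1], W))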
When modeling the codebook it is necessary that the cluster for each successive feature vector be chosen by the minimum of the distortion measure, i.e. that the nearest cluster be chosen, and that each code vector be chosen from the condition of minimizing the average distortion within its cell. There are quite a few algorithms for clustering and codebook modeling [6, 7, 8].

The generalized Lloyd algorithm (GLA, also known as LBG or k-means) starts from an initial codebook, which is iteratively improved until a local minimum is reached. It is the most popular algorithm for clustering problems. At each step the feature vectors are assigned to the nearest clusters, and then the centroid (code vector) of every cluster is recomputed from the vectors assigned to it. The quality of the new codebook after an iteration is better than, or equal to, that of the previous one. This is repeated until the required codebook quality is reached. The algorithm is given below.

Assume that the training vector set X = {x_i | i = 1, ..., L} has to be partitioned into K clusters. Let C_i^(m) denote the i-th cluster at the m-th iteration, with centroid c_i^(m); K is the required number of code vectors.

Step 1. Set k = 1 and create a codebook consisting of a single code vector:

c_1^* = (1/L) \sum_{i=1}^{L} x_i.    (8)

Compute the average distortion within this single cluster:

D_avg^* = (1/L) \sum_{i=1}^{L} d(x_i, c_1^*).    (9)

Step 2. Splitting. Double the codebook size by splitting each already created cluster into two according to the rule

c_i^(0) = (1 + \epsilon) c_i^*,  c_{k+i}^(0) = (1 - \epsilon) c_i^*,  i = 1, ..., k,    (10)

where \epsilon = 0.01 is the splitting parameter; then set k = 2k.

Step 3. Iteration. Set D_avg^(0) = D_avg^* and m = 0 (iteration counter).
a) The training vectors X = {x_i | i = 1, ..., L} are assigned to the clusters C_i, i = 1, ..., k, by the nearest-neighbour rule: x \in C_i^(m) if and only if d(x, c_i^(m)) <= d(x, c_j^(m)) for all j != i. In other words, each feature vector is assigned to the cluster to which it is closest in the chosen metric.
b) The centroids are corrected by the formula

c_i^(m+1) = (1/|C_i^(m)|) \sum_{x \in C_i^(m)} x,  i = 1, ..., k.    (11)

c) Set m = m + 1.
d) Compute the average distance between the feature vectors and their corresponding centroids:

D_avg^(m) = (1/L) \sum_{i=1}^{L} d(x_i, c_j^(m)),  j: x_i \in C_j^(m).    (12)

e) If (D_avg^(m-1) - D_avg^(m)) / D_avg^(m) > \delta, where \delta is a preset convergence threshold, go to step 3a; otherwise set D_avg^* = D_avg^(m).
f) Set c_i^* = c_i^(m), i = 1, ..., k, as the resulting set of code vectors.

Step 4. If k < K, go to step 2; otherwise stop.

The generalized Lloyd algorithm is easy enough to implement, has complexity of order O(KLH) (H is the number of iterations), and gives good results in most cases. However, this method finds only the first local minimum, and the final clustering quality depends strongly on the initial values.
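A minimal Python sketch of the LBG procedure described above (steps 1-4) follows; it uses the squared Euclidean distortion (2), which on normalized vectors is proportional to the MSE (3). Function and parameter names (lbg_codebook, eps, delta) are illustrative, not taken from the original.

import numpy as np

def lbg_codebook(X, K, eps=0.01, delta=1e-3, max_iter=100):
    """Generalized Lloyd (LBG) algorithm: grow the codebook by splitting (step 2)
    and refine it by nearest-neighbour partitioning and centroid updates (step 3).
    The doubling rule (10) assumes K is a power of two."""
    L, D = X.shape
    codebook = X.mean(axis=0, keepdims=True)          # step 1, eq. (8)

    while codebook.shape[0] < K:
        # Step 2: splitting, doubling the codebook size.
        codebook = np.vstack([(1 + eps) * codebook, (1 - eps) * codebook])

        # Step 3: Lloyd iterations until the relative distortion drop falls below delta.
        prev_dist = np.inf
        for _ in range(max_iter):
            # 3a: nearest-neighbour assignment of every vector to a cluster.
            d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # 3b: recompute centroids, eq. (11); empty clusters keep their old code vector.
            for i in range(codebook.shape[0]):
                members = X[labels == i]
                if len(members) > 0:
                    codebook[i] = members.mean(axis=0)
            # 3d: average distortion over the current assignment, eq. (12).
            dist = d2[np.arange(L), labels].mean()
            # 3e: stop when the relative improvement is small.
            if prev_dist < np.inf and (prev_dist - dist) / dist <= delta:
                break
            prev_dist = dist
    return codebook

# Hypothetical usage: a 64-vector codebook from 12-dimensional feature vectors.
if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(1000, 12))
    C = lbg_codebook(X, K=64)
    print(C.shape)   # (64, 12)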
One way to overcome these drawbacks is to use global (inter-cluster) optimization. Such optimization can be performed by random or deterministic interchange of centroids, or by a split-and-merge method. An example is randomized local search (RLS). This algorithm includes global and local optimization stages [1, 2].

Step 1. Initialization. Choose K feature vectors (K being the number of clusters) at random as centroids. On this basis an optimal partition is built by the nearest-neighbour rule; as a result the algorithm is insensitive to the initial partition. Set the iteration counter m = 1.

Step 2. Iteration.
a) Random swap. A cluster is chosen at random and its centroid is replaced by a randomly chosen feature vector:

c_i^(m) = x_j,  i = random(1, K),  j = random(1, L).    (13)

b) Local repartition. A new partition is built taking the changed centroid into account.
c) New centroids are computed from the new partition.
d) Set m = m + 1.

Step 3. Let f be the criterion function. If f_m < f_{m-1}, accept the centroids obtained at step 2c as the current solution.

Step 4. If m < T, go to step 2; otherwise stop. T is fixed and set before the algorithm runs (T = 100, 200, 500, 1000, 2000). The algorithm complexity is about O(TL). The mean-square error can serve as the criterion function f:

f = (1/L) \sum_{i=1}^{L} d(x_i, c_j^(m)),  j: x_i \in C_j^(m).    (14)

We can suggest improving this algorithm by replacing step 2c with several iterations of the GLA. This performs the local optimization much better and builds optimal clusters, unlike the usual centroid recomputation procedure. The proposed improvement slows the algorithm down, which in most applications is not critical. In this case the algorithm complexity is about O(TKL). Other clustering algorithms exist as well, but those described above are generally more effective and more convenient to implement [7].

The codebook obtained by one of the above methods represents the model of a given speaker and is then entered into the speaker database, which is a set of codebooks (references). Speaker identification over an existing set of codebooks (the speaker database) is similar to the training process. A set of feature vectors X = {x_i | i = 1, ..., L} is extracted from the test speaker's speech. It is then determined which codebook in the database best matches the obtained vector set. The speaker database consists of the set of codebooks (references) B = {C_1, ..., C_N}, where N is the number of speakers in the database and C_i = {c_{i1}, ..., c_{iK}} is the codebook corresponding to the i-th speaker (K is the codebook size). In what follows it is assumed, for simplicity, that the codebooks in the database are of identical size, although all the algorithms remain valid in the general case.

One simple and effective way of determining which codebook best matches the test speaker's feature vectors is described by the following algorithm [7]:

Step 1. For each speaker codebook C_i, i = 1, ..., N, compute the distortion

D_i = d(X, C_i)    (15)

between X and C_i.

Step 2. Identify the index of the unknown speaker, ID, as the one with the smallest distortion, i.e.

ID = argmin_{i = 1, ..., N} D_i.    (16)

The distortion measure (15) approximates the dissimilarity between the codebook C_i = {c_{i1}, ..., c_{iK}} and the feature vector set X = {x_1, ..., x_L}. We use the most intuitive distortion measure: each vector in X is mapped to the nearest code vector in C_i, and the average of these distances is computed:

d(X, C_i) = (1/L) \sum_{j=1}^{L} \min_{k} d_E(x_j, c_{ik}),    (17)

where d_E is the Euclidean metric (2).
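A possible Python sketch of this minimum-distortion decision rule, eqs. (15)-(17), follows. The database here is simply a list of codebooks produced, for example, by the lbg_codebook sketch above, and all names are illustrative.

import numpy as np

def avg_distortion(X, codebook):
    """Average distortion d(X, C_i), eq. (17): each feature vector is mapped to
    its nearest code vector and the (squared Euclidean) distances are averaged."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def identify_speaker(X, codebooks):
    """Minimum-distortion identification, eqs. (15)-(16):
    return the index of the codebook with the smallest average distortion."""
    distortions = [avg_distortion(X, C) for C in codebooks]
    return int(np.argmin(distortions)), distortions

# Hypothetical usage: three enrolled speakers, test vectors closest to speaker 1.
if __name__ == "__main__":
    rng = np.random.default_rng(2)
    codebooks = [rng.normal(loc=m, size=(64, 12)) for m in (0.0, 1.0, 2.0)]
    X_test = rng.normal(loc=1.0, size=(200, 12))
    speaker_id, dists = identify_speaker(X_test, codebooks)
    print("identified speaker:", speaker_id)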
An approach based on introducing weight coefficients [5] is interesting enough. Starting from this idea, an algorithm performing weighted identification was devised. Before the comparison, the relation between the codebooks is evaluated, and code vectors with higher discriminative power are assigned larger weights. No a priori information about the feature vectors is required. Unlike the previous method, it turned out to be more convenient here to use a similarity measure rather than a dissimilarity measure. At identification, the best matching codebook is then defined as the codebook that maximizes the similarity between the feature vector set X and the codebook C_i, i.e.

ID = argmax_{i = 1, ..., N} s(X, C_i),    (18)

where the similarity measure is defined as

s(X, C_i) = (1/L) \sum_{j=1}^{L} [1 + \min_{k} d_E(x_j, c_{ik})]^{-1}.    (19)

Since different code vectors have different discriminative power, it is proposed to use for identification not only the distance from a vector to the nearest code vector, but also the discriminative power of that code vector. For this purpose weight coefficients are introduced, and the similarity measure (19) then takes the form

s(X, C_i) = (1/L) \sum_{j=1}^{L} w(c_{ij}^{min}) [1 + \min_{k} d_E(x_j, c_{ik})]^{-1},    (20)

where c_{ij}^{min} is the code vector of codebook C_i nearest to x_j, and w is the weight function. Such weighting can be viewed as an operator shifting the dividing surface towards the more significant code vectors. The weight function is computed for each code vector. The modified speaker database is then represented as B_m = {(C_1, W_1), ..., (C_N, W_N)}, where W_i = {w(c_{i1}), ..., w(c_{iK})} are the weights assigned to the i-th codebook. The weights are recomputed each time a new speaker is added to the database; this is done at the training stage and does not affect the computational cost of identification. The weight coefficients are computed by the formula

w(c_{ij}) = (1/(N - 1)) \sum_{k = 1, k \ne i}^{N} \min_{m} d_E(c_{ij}, c_{km}),  i = 1, ..., N,  j = 1, ..., K.    (21)
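The following Python sketch illustrates the weighted variant, eqs. (18)-(21): weights are computed once at the training stage from the distance of each code vector to the other speakers' codebooks, and identification maximizes the weighted similarity (20). It uses the same squared Euclidean distortion as the earlier sketches, all names are illustrative, and the weight computation follows the reconstruction of eq. (21) given above rather than a formula confirmed by the original source.

import numpy as np

def code_vector_weights(codebooks, i):
    """Weights for codebook i, eq. (21) as reconstructed: for every code vector, the
    distance to the nearest code vector of each other speaker's codebook, averaged
    over the other speakers."""
    Ci = codebooks[i]
    others = [C for k, C in enumerate(codebooks) if k != i]
    weights = np.zeros(Ci.shape[0])
    for j, c in enumerate(Ci):
        nearest = [np.min(((C - c) ** 2).sum(axis=1)) for C in others]
        weights[j] = np.mean(nearest)
    return weights

def weighted_similarity(X, codebook, weights):
    """Weighted similarity s(X, C_i), eq. (20)."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest_idx = d2.argmin(axis=1)
    nearest_d = d2[np.arange(len(X)), nearest_idx]
    return np.mean(weights[nearest_idx] / (1.0 + nearest_d))

def identify_weighted(X, codebooks, all_weights):
    """Weighted identification, eq. (18): maximize the similarity measure."""
    scores = [weighted_similarity(X, C, W) for C, W in zip(codebooks, all_weights)]
    return int(np.argmax(scores)), scores

# Hypothetical usage: weights are prepared once when speakers are enrolled.
if __name__ == "__main__":
    rng = np.random.default_rng(3)
    codebooks = [rng.normal(loc=m, size=(64, 12)) for m in (0.0, 1.0, 2.0)]
    all_weights = [code_vector_weights(codebooks, i) for i in range(len(codebooks))]
    X_test = rng.normal(loc=2.0, size=(200, 12))
    speaker_id, _ = identify_weighted(X_test, codebooks, all_weights)
    print("identified speaker:", speaker_id)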
Conclusion

As experiments have shown, the developed weighted identification method copes with the task much better when the number of feature vectors available at testing is small, i.e. when identification is performed on short fragments of speech. The described methods solve the identification problem on a closed set, i.e. when it is known in advance that the reference taken from the tested speaker is present in the database. If the test speaker is one whose reference is not in the database, these algorithms will choose the most similar speaker, which in any case will be erroneous. However, if threshold values are introduced for the distortion (similarity) measure, then when the threshold is exceeded for the chosen speaker, that speaker can be considered foreign (an impostor) and absent from the database. Thus the identification problem on an open set is solved without significant changes to the described methods. The thresholds are chosen experimentally, and in practice such a method works accurately and effectively enough.

References

1. Franti P., Kivijarvi J. Random swapping technique for improving clustering in unsupervised classification. ftp://ftp.cs.joensuu.fi/franti/papers/scia99-l.ps
2. Franti P., Kivijarvi J. Randomized local search algorithm for the clustering problem. Pattern Analysis and Applications, 3(4): 358-369, 2000. ftp://ftp.cs.joensuu.fi/franti/papers/rls.ps
3. Gorelik A. L., Skripkin V. A. Recognition Methods: textbook. 3rd ed. Moscow: Higher School, 1989. 232 p. (in Russian)
4. Gray R. M. Vector quantization. IEEE ASSP Magazine, vol. 1, pp. 4-29, April 1984.
5. Kinnunen T., Franti P. Speaker discriminative weighting method for VQ-based speaker identification. http://cs.joensuu.fi/pages/tkinnu/research/pdf/DiscriminativewightingMethod.pdf
6. Kinnunen T., Karkkainen I., Franti P. Is speech data clustered? - statistical analysis of cepstral features. http://cs.joensuu.fi/pages/tkinnu/research/pdf/IsSpeechClustered.pdf
7. Kinnunen T., Kilpelainen T., Franti P. Comparison of clustering algorithms in speaker identification. Proc. IASTED Int. Conf. Signal Processing and Communications (SPC): 222-227. Marbella, Spain, 2000.
8. Makhoul J. Vector quantization in speech coding. Proc. IEEE, 1985, vol. 73, no. 11, pp. 19-60. (in Russian)