FORMATION OF A MODEL LIBRARY FOR RECOGNITION OF SPEECH COMMANDS AGAINST BACKGROUND NOISE

N.A. Krasheninnikova
Ulyanovsk State University, 432970, Ulyanovsk, Lev Tolstoy St., 42, e-mail: [email protected]

The article considers the problem of forming a model library for the recognition of speech commands against background noise. Algorithms are suggested which obtain good solutions of the model-choice problem within acceptable time.

1. Introduction

When recognizing speech commands (SC) from a fixed dictionary, the speech command under recognition (SCR) is usually compared with model commands (MC), and the SCR is assigned to the category of the MC to which it is nearest. The distance between SC is measured in some space of attributes, for example by comparing SC autocorrelation portraits [1]. Due to speech variability, one and the same SC is pronounced slightly differently even by the same announcer, so a single MC does not represent an SC completely. Moreover, the presence of noise greatly influences the estimated distance between SC. These two factors decrease recognition quality. To improve SC representation it is desirable to use several MC: the more MC we use, the better the representation and recognition become, especially against background noise. However, if the number of MC is large, the computational cost of recognition increases; moreover, the announcer must spend much time pronouncing the commands. In [2] a method was suggested for imitating pronunciations of a command from a single real pronunciation by the announcer, which solves the problem of obtaining MC. To reduce the computational load, the number of MC of each SC must be reduced in such a way that they still characterize the variability of the given SC's pronunciation completely enough.

(This work was supported by RFBR grant 06-08-008-10.)
Thus, within each set of pronunciations of each SC it is necessary to choose a subset which represents the given SC in the best way.

Let us formulate the problem under consideration. The dictionary consists of m SC: {C_1, C_2, ..., C_m}. For every SC C_i there exists a set of its pronunciations P_i = {p_i1, p_i2, ..., p_in_i}. This set can consist of real pronunciations or can be formed artificially [2]; it can also include pronunciations against different noise backgrounds. In general, this set must describe, as completely as possible, the variants of the given SC which may occur during recognition. For any elements p_i and p_j from P = P_1 ∪ ... ∪ P_m a function (quasimetric) d(p_i, p_j) is defined; among the metric axioms it may fail to satisfy the triangle inequality. The distance d(p_i, p_j) measures the difference between elements p_i and p_j which is used in the process of SC recognition; for example, it may be the difference between the spectra of the sound signals, their autocorrelation functions, their wavelet transforms, and so on.

From every set P_i a subset E_i = {e_i1, e_i2, ..., e_ik} ⊂ P_i of k elements must be chosen; these elements we call MC. This subset will be used in the process of recognition, so it has to represent the whole variety of pronunciations in the sense of the metric d(p_i, p_j) used. For this purpose the average quasi-distance of the elements of P to the nearest MC of the same command,

    d = (1/M) Σ_{i=1}^{m} Σ_{p ∈ P_i} min{ d(p, e) : e ∈ E_i },   M = n_1 + n_2 + ... + n_m − km,   (1)

must be as small as possible. Besides, the MC must be chosen in such a way that the MC of different commands are easily distinguished from one another in the sense of the chosen metric. Let us express this requirement through the average quasi-distance between the MC of different commands,

    D = (1/(mk)) Σ_{i=1}^{m} Σ_{e ∈ E_i} min{ d(e, f) : f ∈ E_j, j ≠ i },   (2)

which, on the contrary, must be as large as possible.
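Criteria (1) and (2) can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: pronunciations are assumed to be feature vectors, plain Euclidean distance stands in for the quasimetric d, and all names are hypothetical.

```python
# Illustrative sketch: the average distance (1) to the nearest model of the
# same command, and the average separation (2) between models of different
# commands. The representation and the quasimetric are assumptions.
import math

def quasimetric(p, q):
    # Placeholder quasimetric; in practice this would compare spectra,
    # autocorrelation portraits, wavelet transforms, etc.
    return math.dist(p, q)

def avg_distance_to_models(P, E):
    """Criterion (1): mean distance from non-model pronunciations of each
    command to the nearest model of the same command (smaller is better)."""
    total, count = 0.0, 0
    for P_i, E_i in zip(P, E):
        for p in P_i:
            if p in E_i:          # the models themselves are excluded
                continue
            total += min(quasimetric(p, e) for e in E_i)
            count += 1
    return total / count

def avg_separation(E):
    """Criterion (2): mean distance from each model to the nearest model
    of a *different* command (larger is better)."""
    total, count = 0.0, 0
    for i, E_i in enumerate(E):
        others = [f for j, E_j in enumerate(E) if j != i for f in E_j]
        for e in E_i:
            total += min(quasimetric(e, f) for f in others)
            count += 1
    return total / count

# Two toy commands with three pronunciations each, one model per command.
P = [[(0.0,), (0.4,), (1.0,)], [(2.0,), (2.6,), (3.0,)]]
E = [[(0.4,)], [(2.6,)]]
print(round(avg_distance_to_models(P, E), 6))   # → 0.5
print(round(avg_separation(E), 6))              # → 2.2
```

A full library-formation algorithm would then search for model sets E that make the first quantity small while keeping the second one large.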
It is necessary to notice that the problem under consideration has much in common with the clustering problem, as each MC can be considered the "center" of a cluster consisting of the pronunciations nearest to that MC (Pic. 1 shows one model from each of the four SC).

Pic. 1. The classes of speech command pronunciations and the models within them

Thus it is necessary to minimize (1) and maximize (2) simultaneously, and these requirements contradict each other. That is why we introduce a common quality criterion of MC library formation as a function

    U = U(d, D),   (3)

which does not decrease in d and does not increase in D. The minimum of function (3) corresponds to the optimal library. In [3, 4] this problem was considered for the case of a single SC. The algorithms suggested in those works can be applied to the case of several commands, but only separately within each of them, which corresponds to function (3) in the form

    U = d.   (4)

In this case, however, the relations between different SC are not taken into account. More general variants of (3) are

    U = d − D,   (5)
    U = d / D.   (6)

Criterion (5) is not a good variant, as the following simple example illustrates. Let there be two classes represented by the segments K_1 = [0; 1] and K_2 = [2; 3] on the number line (Pic. 2), and let the distance d(A, B) be the usual distance between points A and B. Then the minimum of d is attained at E_1 = {0.5} and E_2 = {2.5}, i.e. at the central points of the segments. The maximum of D is attained at E_1 = {0} and E_2 = {3}, i.e. at the extreme points of the segments; at the same points the minimum of U = d − D is attained. But this is obviously a bad variant of the library, as the models are too far from the class centers.

If U = d/D, the minimum is attained at the models E_1 = {(5 − √17)/2} ≈ {0.438} and E_2 = {3 − (5 − √17)/2} ≈ {2.562}; they are shown as bold dots in Pic. 2.

Pic. 2. The choice of models on a system of segments
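The segment example can be checked numerically. The sketch below is an illustration, not the authors' code: it measures the within-class term as the mean distance of K_1 points to their model, and the separation term as the mean distance from the model to the points of the other class (one plausible reading of the criteria that reproduces the value e ≈ 0.438 given in the text).

```python
# Numerical check of the two-segment example. By symmetry the models are
# placed at e and 3 - e, so it is enough to search over e in [0, 1].
# Closed forms of the integrals are used: the mean distance of K1 = [0, 1]
# points to a model e is (e^2 + (1 - e)^2) / 2, and the mean distance from
# e to the opposite segment K2 = [2, 3] is 2.5 - e.
def U(e):
    d = (e ** 2 + (1 - e) ** 2) / 2   # mean within-class distance
    D = 2.5 - e                       # mean distance to the other class
    return d / D                      # criterion (6)

# Grid search over candidate model positions in [0, 1].
best = min((U(i / 1000), i / 1000) for i in range(1001))
print(round(best[1], 3))              # → 0.438, i.e. (5 - sqrt(17)) / 2
```

The grid minimum agrees with the analytic minimizer (5 − √17)/2 ≈ 0.4384 to the grid resolution.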
This variant of the library seems well-founded: each model is situated not far from the centre of its class and at the same time rather far from the other classes. Thus criterion (6) is the more suitable variant, leading to intuitively better choices of MC.

The problem under consideration can of course be solved by exhaustive search, but this is unacceptable even for a small number of pronunciations. Below, several quasi-optimal algorithms are suggested which solve the problem with an acceptable amount of calculation.

2. The Algorithm of Improving an Available Solution

First an initial set of MC, E^1, is chosen at random, and the corresponding value U(E^1) is estimated using formula (3). Then all variants of replacing the first MC e_11 of the first SC by an element from P_1 \ E^1 are tried. The best of E^1 and these variants (in the sense of the minimum of U) is remembered and taken as E^2 = {e'_11, e_12, ..., e_1k}, where e'_11 is the optimal replacement of the model e_11. Then attempts are made to replace the second MC e_12 of the first SC by elements from the set P_1 \ E^2, and so on, until the set of models E^{k+1} is obtained. The models of the other SC are then replaced in the same way. The described procedure of improving the MC set is carried out twice.

Experiments with this algorithm have shown that the obtained set of MC usually reaches a deadlock (it cannot be further improved by the suggested procedure), and even when it is not optimal it is close to optimal, being only 3-6 per cent inferior. The algorithm executes rather quickly, and, importantly, the time spent grows nearly linearly with the number of pronunciations. To run the algorithm it is only necessary to specify the distances d(p, q) between pronunciations.
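The replacement procedure above can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the authors' code: pronunciations are one-dimensional feature values, d(p, q) = |p − q|, the criterion is U = d/D from (6), and all names are hypothetical.

```python
# Sketch of the improvement algorithm: starting from a random model set,
# each model slot is in turn tentatively replaced by every remaining
# pronunciation of the same command, and a replacement is kept only when
# it decreases U = d/D.
import random

def U(P, E):
    """Criterion (6) for pronunciation sets P and model sets E."""
    d_tot = d_cnt = 0.0
    D_tot = D_cnt = 0.0
    for i, P_i in enumerate(P):
        others = [f for j, E_j in enumerate(E) if j != i for f in E_j]
        for p in P_i:
            if p not in E[i]:
                d_tot += min(abs(p - e) for e in E[i]); d_cnt += 1
        for e in E[i]:
            D_tot += min(abs(e - f) for f in others); D_cnt += 1
    return (d_tot / d_cnt) / (D_tot / D_cnt)

def improve(P, E, passes=2):
    # Two passes over all commands and model slots, as in the text.
    for _ in range(passes):
        for i, P_i in enumerate(P):
            for s in range(len(E[i])):
                for cand in P_i:
                    if cand in E[i]:
                        continue
                    trial = [list(E_j) for E_j in E]
                    trial[i][s] = cand
                    if U(P, trial) < U(P, E):
                        E = trial
    return E

random.seed(0)
# Two commands, ten pronunciations each, two models per command.
P = [sorted(random.uniform(c, c + 1) for _ in range(10)) for c in (0.0, 3.0)]
E0 = [[P_i[0], P_i[1]] for P_i in P]          # arbitrary initial models
E1 = improve(P, E0)
print(U(P, E1) <= U(P, E0))                   # → True: U never worsens
```

Since a replacement is accepted only when it strictly improves U, the criterion decreases monotonically; trying several random initial sets E0 and keeping the best result corresponds to the multiple-restart strategy described below.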
These distances can be calculated using the concrete recognition algorithm for which the MC library is being chosen. Of course, the final result depends substantially on the initial choice of MC; that is why it is better to try several random initial variants. It was found that a good solution can usually be obtained after about a dozen attempts.

3. Gravitation Algorithm

Let the pronunciations be represented as dots in an m-dimensional Euclidean space with the usual metric (the space of attributes in which SC recognition is conducted), and let them be material dots of unit mass in a viscous medium. Then these dots experience mutual gravity together with the resistance of the medium. Dots situated nearer to each other attract more strongly; they approach each other more quickly and join into clusters. There is an analogy between dots forming clusters and dots divided into clusters: in both cases a set of dots is divided into groups of dots close to each other. If during the movement of the dots we mark the k largest clusters and within each cluster take the dot nearest to the centre of gravity as a model element, we obtain good solutions of the problem of choosing models for each SC separately, i.e. for criterion (4). For a criterion of type (6), repulsion between dots belonging to different SC is introduced into the algorithm.

This heuristic algorithm is easy to use and needs little memory: it is only necessary to store the current coordinates and speeds of the moving dots. The viscosity of the medium is imitated by multiplying the dot speed obtained at every iteration by a coefficient c < 1.

4. Algorithms of Fuzzy Clusterization

The problem considered in this article is close to the problem of fuzzy clusterization. The latter also requires dividing elements into classes (clusters), but an element's belonging to each of the clusters is fuzzy, i.e.
an element belongs to every cluster in some degree, and the total measure of an element's belonging to all clusters equals one. Besides, the representatives (centres) of the clusters are not necessarily elements of the initial set; they are typical representatives of their clusters in the sense of this fuzzy belonging. Optimality of the clusterization is understood in the sense of the minimum of criteria of type (3), but the sums are taken with coefficients equal to the degrees of belonging to the clusters. In [5] a number of iterative algorithms for the quasi-optimal solution of the fuzzy clusterization problem were suggested. Similar algorithms have been used for the problem under consideration, i.e. the problem of model choice; the degree of belonging was chosen as a fixed decreasing function of distance.

6. Libraries with Different Numbers of MC

In the problem of MC library formation as formulated above, the numbers of MC were assumed to be the same for all SC. This requirement is not necessary: different SC may have different variability, so it is desirable to increase the number of MC for SC with larger variability and to decrease it for SC with smaller variability. The algorithms described above are easy to modify to form such libraries. For this purpose the average distances (1) are calculated for every separate SC. If this distance is much less than the average over all SC, the number of MC for this SC is decreased by one, and vice versa. After such a change of the MC numbers, the MC library is formed in the usual way, and then another round of changing the MC numbers is conducted. The value of the quality criterion is monitored throughout. The algorithm stops after a given number of cycles or when a set value of the criterion is attained.

In the gravitation algorithm the problem can be solved in another way: the maximal radius of the clusters, i.e. the maximal distance from a pronunciation to its MC, is given in advance.
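The dot dynamics underlying the gravitation algorithm can be sketched as follows. This is an illustrative simplification, not the authors' code: dots of the same command attract, dots of different commands repel, and viscosity is imitated by multiplying the speeds by a coefficient c < 1. All constants (g, c, dt and the softening term eps) are assumptions.

```python
# Sketch of damped gravitational dynamics for model choice: same-command
# dots attract, different-command dots repel, and speeds decay by the
# factor c at every iteration (the viscous medium).
import math

def simulate(points, labels, steps=300, g=0.05, c=0.8, dt=0.05, eps=0.01):
    pos = [list(p) for p in points]
    vel = [[0.0] * len(points[0]) for _ in points]
    for _ in range(steps):
        for i, p in enumerate(pos):
            force = [0.0] * len(p)
            for j, q in enumerate(pos):
                if i == j:
                    continue
                r2 = sum((a - b) ** 2 for a, b in zip(p, q)) + eps  # softened
                sign = 1.0 if labels[i] == labels[j] else -1.0
                for k in range(len(p)):
                    force[k] += sign * g * (q[k] - p[k]) / r2 ** 1.5
            for k in range(len(p)):
                vel[i][k] = (vel[i][k] + dt * force[k]) * c  # viscous damping
        for i in range(len(pos)):
            for k in range(len(pos[i])):
                pos[i][k] += dt * vel[i][k]
    return pos

def spread(pos, labels, lab):
    """Mean distance of the dots of one command from their centre of gravity."""
    pts = [p for p, l in zip(pos, labels) if l == lab]
    centre = [sum(col) / len(pts) for col in zip(*pts)]
    return sum(math.dist(p, centre) for p in pts) / len(pts)

points = [(0.0, 0.0), (0.6, 0.1), (0.3, 0.5), (3.0, 3.0), (3.5, 2.9), (3.2, 3.4)]
labels = [0, 0, 0, 1, 1, 1]
final = simulate(points, labels)
print(spread(final, labels, 0) < spread(points, labels, 0))  # clusters tighten
```

Within each resulting cluster the dot nearest to its centre of gravity would then be taken as a model; in the fixed-radius variant, clusters are cut off at the given maximal radius instead of fixing their number k in advance.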
Then the number of MC for each command is defined by the algorithm itself in the process of attraction and repulsion of the dots.

7. Algorithm Tests

The conducted experiments have shown that the probability of correct recognition of speech commands is higher when the models are chosen not at random but with the help of the algorithms described, because in that case the model sets better represent the different pronunciations.

8. Conclusion

The suggested algorithms obtain good solutions of the model-choice problem within acceptable time, much less than exhaustive search demands. The first of the algorithms is the most universal, as no particular structure is assumed in the initial set of elements: only knowledge of the quasimetric is required. The experiments showed that if the models are chosen with the help of these algorithms, the recognition quality is higher than if the MC are simply recorded pronunciations.

References

1. V.R. Krasheninnikov, A.I. Armer. Speech signal recognition on the background of noise // "Sample recognition and image analysis: new information technologies", Proceedings of the 7th international conference РОАИ-7, St. Petersburg, 2004, pp. 752-755 (in Russian).
2. V.R. Krasheninnikov, A.I. Armer. The Speech Commands Variability Simulation // Proceedings of the International Conference on Concurrent Engineering, International Society for Productivity Enhancement (ISPE), Dallas, USA, 2005, pp. 387-390.
3. V.R. Krasheninnikov, N.A. Krasheninnikova, V.V. Kuznetsov. Algorithms of speech command model choice in the process of speech recognition // Proceedings of the 62nd scientific session devoted to the Day of Radio, Moscow, 2007, pp. 158-159 (in Russian).
4. V.R. Krasheninnikov, V.V. Kuznetsov, E.A. Rasputko. The algorithm of model choice in a given set of elements // Bulletin of UlSTU, Ulyanovsk, UlSTU, 2006, No. 3, pp. 59-61 (in Russian).
5. A.P. Velmisov.
Algorithm of fuzzy clusterization // Proceedings of the Middle-Volga Mathematical Society, Saransk, 2006, Vol. 8, No. 1, pp. 192-197 (in Russian).