NUSC Technical Report 7989
11 June 1987

THE MOMENTS OF MATCHED AND MISMATCHED HIDDEN MARKOV MODELS

Roy L. Streit
Submarine Sonar Department

Naval Underwater Systems Center
Newport, Rhode Island / New London, Connecticut

Approved for public release; distribution is unlimited.

PREFACE

This report was prepared under NUSC Project No. A75210, "Application of Hidden Markov Models to Transient Classification," Principal Investigator Dr. R. L. Streit. This project is part of the NUSC Independent Research Program sponsored by the Office of Naval Research. The technical reviewer for this report was Dr. J. P. Ianniello (Code 2112).

The author thanks Dr. J. P. Ianniello of the Naval Underwater Systems Center, New London, CT, and Jeff Woodard of Rockwell International Corporation, Anaheim, CA, for their helpful comments and discussions.

Reviewed and Approved: 11 June 1987
W. A. Von Winkle, Associate Technical Director for Technology

REPORT DOCUMENTATION (abridged)

Unclassified final report TR 7989, dated 1987 June 11, 40 pages. Performing organization: Naval Underwater Systems Center, New London Laboratory, New London, CT 06320. Sponsoring organization: Office of Naval Research, Arlington, VA 22217-5000. Responsible individual: Roy L. Streit, (203) 440-4906, Office Symbol 214. Subject terms: Hidden Markov Models (HMMs); Log Likelihood Ratio; Maximum Likelihood Classification; Moments; Receiver Operating Characteristics (ROC).

ABSTRACT

An algorithm for computing the moments of matched and mismatched hidden Markov models from their defining parameters is presented. The ability of the first two moments to adequately describe the probability density function of a maximum posterior likelihood classifier based on hidden Markov models is assessed by examples. These examples include ergodic and nonergodic simulated hidden Markov observations that are matched and mismatched with the posterior likelihood classifier. One example discusses the effect of a noisy discrete communication channel on the reliability of the posterior likelihood classifier. The examples indicate that the posterior likelihood function is log-normal when the Markov chains are ergodic, and thus the first two moments suffice to describe the required probability density functions.
The examples are also of independent interest because they indicate how different internal structures of hidden Markov models impact the performance of a maximum posterior likelihood classifier.

TABLE OF CONTENTS

LIST OF ILLUSTRATIONS
LIST OF TABLES
I. INTRODUCTION
II. THE MOMENT ALGORITHM
    A. Finite Symbol HMMs
    B. Continuous Symbol HMMs
III. COMPARISON OF THEORETICAL MOMENTS WITH SIMULATION
    A. Two Ergodic HMMs
    B. Mixed Ergodic and Left-to-Right HMMs
    C. Left-to-Right HMM With Noise
IV. CONCLUSIONS
REFERENCES

LIST OF ILLUSTRATIONS

Figure 1. Classification of Unknown Signal s(t) as One of p Signals for Which Trained HMMs Are Available
Figure 2. Histogram of 10000 Values of log dF_22(x) for T = 25 (the normal curve has the sample mean and variance given in table 3)
Figure 3. Histogram of 10000 Values of log dF_21(x) for T = 25 (the normal curve has the sample mean and variance given in table 3)
Figure 4. Histogram of 10000 Values of log dF_33(x) for T = 25 (the normal curve has the sample mean and variance given in table 5)
Figure 5. Histogram of 10000 Values of log dF_23(x) for T = 25 (the normal curve has the sample mean and variance given in table 7)
Figure 6. Histogram of 9879 Samples of log dF_34(x) for T = 25 (the normal curve has the sample mean -28.156 and the variance 3.6167)

LIST OF TABLES

Table 1. Parameters of HMM(1)
Table 2. Parameters of HMM(2), Rounded to Three Significant Digits
Table 3. Comparison of Two Estimates for the Mean and Standard Deviation of log dF_2j(x) for j = 1, 2
Table 4. Parameters of HMM(3)
Table 5. Comparison of Two Estimates for the Mean and Standard Deviation of log dF_33(x)
Table 6. Number of O_T ∈ HMM(i) for Which f_3(O_T) = 0, i = 1, 2
Table 7. Comparison of Two Estimates for the Mean and Standard Deviation of log dF_23(x)
Table 8. Number of O_T ∈ HMM(3) + Noise for Which f_3(O_T) = 0 at Various Values of E_T
Table 9. Parameters of HMM(4), Rounded to Three Significant Digits

THE MOMENTS OF MATCHED AND MISMATCHED HIDDEN MARKOV MODELS

I. INTRODUCTION

Hidden Markov models (HMMs) are statistical models of nonstationary time series or signals. In speech applications, they are used to characterize the time variation of the short term spectra of spoken words.
A specific example is the speaker-independent isolated word recognition (SIIWR) problem, where HMMs characterize the words in a finite-size vocabulary. Different words are characterized by different HMMs.1

Every HMM is comprised of two basic parts: a Markov chain and a set of random variables. The Markov chain has a finite number of states, and each state is uniquely associated with one of the random variables. The state sequence generated by the chain is not observable; i.e., the Markov chain is assumed to be "hidden." At each time t = 0, 1, 2, ..., the Markov chain is in some state; it transitions to another state at time t + 1 according to its state transition probability matrix. At each time t, one observation is generated by the random variable associated with the state of the Markov chain at time t. The observations are referred to as symbols. If the random variables assume only a finite set of values, the HMM is referred to as a finite symbol HMM. If the random variables assume a continuum of values, the HMM is called a continuous symbol HMM. The full parameter set defining an HMM is comprised of an initial state probability density function of the Markov chain at time t = 0, the Markov chain state transition probability matrix, and the probability density functions of each of the random observation variables.

The act of computing numerical values for the various parameters of an HMM is called "training," and training is equivalent to solving a mathematical optimization problem to determine maximum likelihood estimates of the HMM parameters.2 In this paper, it is assumed that the training phase is completed and that the HMMs developed are adequate models for each of the nonstationary time series, or signals, of interest (e.g., the vocabulary words in the SIIWR problem).

During the training phase, a suitable preprocessor is developed to map (or transform) an arbitrary input signal s(t) into a discrete observation sequence {O(t), t = 1, 2, ...}. A good description of how this is done for the SIIWR problem is given in reference 1 (pp. 1075-1078). The observation sequence is truncated to O_T = {O(t), t = 1, 2, ..., T}, where T is a positive integer. The truncated sequence O_T is then passed to each of the HMM recognizers, one recognizer for every HMM utilized in the classification problem, as depicted in figure 1. Denote the i-th hidden Markov model by HMM(i). The i-th HMM recognizer evaluates the posterior likelihood function

   f_i(O_T) = Pr[O_T | O_T ∈ HMM(i)] ,  i = 1, ..., p ,   (1)

where O_T ∈ HMM(i) denotes the hypothesis that the observation sequence O_T is a realization of HMM(i). The maximum of the p computed posterior likelihoods is assumed to identify, or classify, the original signal s(t). In practice, some kind of threshold must be set and a tie-breaking rule defined to identify signals for which HMMs have not been trained. The likelihood function (1) can be computed using the forward-backward algorithm2 in n^2 T multiplications (where n is the number of states in the Markov chain).

The misclassification rate (or false alarm rate) of the system depicted in figure 1 can be estimated by simulation after training is completed.
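The forward pass behind equation (1) is short enough to sketch. The following is a minimal illustration, assuming NumPy; the function and array names are hypothetical, not code from the report:

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """f(O_T) = Pr[O_T | HMM] by the forward recursion of reference 2.

    pi  : (n,)   initial state probabilities
    A   : (n, n) transition matrix, A[i, j] = Pr[state j at t+1 | state i at t]
    B   : (n, m) symbol probabilities, B[i, s] = Pr[symbol s | state i]
    obs : list of symbol indices O(1), ..., O(T)
    """
    alpha = pi * B[:, obs[0]]            # initialize with the first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # n^2 multiplications per time step
    return alpha.sum()
```

The classifier of figure 1 simply evaluates this quantity for each of the p trained models and takes the maximum.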
Alternatively, the misclassification rate of signal i as signal j can be determined from the conditional cumulative distribution functions

   F_ij(x) = Pr[f_i(O_T) ≤ x | O_T ∈ HMM(j)]   (2)

by using classical detection and estimation theory methods3 to develop receiver operating characteristic (ROC) curves. The validity of F_ij(x) depends on the validity of the assumption that the HMMs developed are adequate models for the signals of interest. Agreement between theory and simulation would support the hypothesis that the HMMs really do represent the signals.

Figure 1. Classification of Unknown Signal s(t) as One of p Signals for Which Trained HMMs Are Available

Unfortunately, algorithms for calculating F_ij(x) directly from the HMM parameters are not known. For later reference, note that F_ij(x) ≠ F_ji(x) in general. The moments of dF_ij(x) are defined by the Riemann-Stieltjes integral

   M_ij(k,T) = ∫ x^k dF_ij(x) ,  k = 0, 1, 2, ... .   (3)

If F_ij(x) is differentiable with derivative F'_ij(x), then the moments can be written equivalently as the Riemann integral

   M_ij(k,T) = ∫ x^k F'_ij(x) dx .

The moments depend on the length T of the observation sequence because F_ij(x) depends on T, as seen from equation (2). They uniquely determine F_ij(x) when the characteristic function of dF_ij(x) has a positive radius of convergence.4 From equations (1) and (2), it is clear that dF_ij(x) = 0 for x < 0 and for x > 1. Thus, from equation (3),

   M_ij(k,T) = ∫ over [0,1] of x^k dF_ij(x) ≤ 1 ,

so that all the moments are finite. The series

   Σ_{v=0}^∞ (iω)^v M_ij(v,T) / v!

for the characteristic function of dF_ij is absolutely convergent with an infinite radius of convergence because, for fixed ω ≠ 0, each summand is bounded in magnitude by |ω|^v / v!, and thus the radius of convergence must be at least as large as exp(|ω|). Consequently, the moments of dF_ij(x) uniquely determine dF_ij(x) and, thereby, F_ij(x).

This paper presents an algorithm for computing explicitly the moments of dF_ij(x) up to any desired order directly from the given underlying parameters of the HMMs involved. The only essential assumption made is the usual one that the HMMs have a finite number of states. However, it is not required that all HMMs have the same number of states.

This paper also presents examples that compare the theoretical moments with simulation results. The examples are of independent interest because they exhibit important features of posterior likelihood classification based on ergodic and left-to-right HMMs that theoretical analysis alone would not show as easily or as quickly. These features are important because they indicate how the internal structure of HMMs impacts the performance of the system depicted in figure 1.

II. THE MOMENT ALGORITHM

The reader is assumed to be familiar with such first principles of HMMs as given in reference 2. It is not, however, necessary to read this section to understand the examples provided in section III.

A. FINITE SYMBOL HMMs

Let HMM(v) be a hidden Markov process with n(v) states, v = 1, 2, ... . Subscripted indices will always be written as functions of their subscripts (for instance, n(v) is used instead of n_v) to avoid subscripted subscripts. Let the transition probability matrix of HMM(v) be denoted A_v = [a_{i(v),j(v)}^v], for i(v), j(v) = 1, ..., n(v). Let the initial state probability vector of HMM(v) be denoted π_v = [π_{i(v)}^v], for i(v) = 1, ..., n(v).
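For concreteness in the sketches that follow, the full parameter set λ_v = (π_v, A_v, B_v), with the symbol probability matrix B_v defined just below, can be carried in a small container. A minimal illustration, assuming NumPy; the class and the helper are hypothetical, not from the report:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMMParams:
    pi: np.ndarray   # (n,)   initial state probability vector
    A:  np.ndarray   # (n, n) state transition probability matrix
    B:  np.ndarray   # (n, m) symbol probability matrix

    def validate(self):
        # pi, every row of A, and every row of B must each sum to one
        assert np.isclose(self.pi.sum(), 1.0)
        assert np.allclose(self.A.sum(axis=1), 1.0)
        assert np.allclose(self.B.sum(axis=1), 1.0)

# HMM(1) of section III: five states, eight symbols, uniform everywhere.
hmm1 = HMMParams(np.full(5, 0.2), np.full((5, 5), 0.2), np.full((5, 8), 0.125))
hmm1.validate()
```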
We first restrict attention to finite symbol HMMs; that is, we suppose that every observation sequence O_T = {O(t), t = 1, ..., T} is such that O(t) ∈ V = {V_1, ..., V_m}, where V is the set of all possible output symbols of the preprocessor. The true nature of the symbols in V is of no importance here. The HMMs assume that O(t) is a random variable whose probability density function depends on the current state of the Markov process. Let the discrete probability density function for HMM(v) when it is in state i(v) be denoted as B_{i(v)}, for i(v) = 1, ..., n(v). Thus, each B_{i(v)} is a row vector of length m. Stacking these row vectors gives the n(v)-by-m symbol probability matrix

   B_v = [b_{i(v),j}] ,  i(v) = 1, ..., n(v),  j = 1, ..., m .

Note that b_{i(v),j} = b_{i(v)}^v(V_j), where we define

   b_{i(v)}^v(O(t)) = Pr[O(t) | HMM(v) and Markov state = i(v)] .

The assumption that the training phase is completed means that the parameters λ_v = (π_v, A_v, B_v) are known.

For discrete symbol HMMs, the cumulative distribution function F_ij(x) is an increasing step function with a finite number of jump discontinuities. Let X_ij denote the set of all values of x at which F_ij(x) is discontinuous. It follows that dF_ij(x) = 0 if x is not in X_ij and that, for x in X_ij,

   dF_ij(x) = F_ij(x) − F_ij(x−) = Pr[f_i(O_T) = x | O_T ∈ HMM(j)] .   (4)

Substituting equation (4) into equation (3) gives

   M_ij(k,T) = Σ_{x ∈ X_ij} x^k Pr[f_i(O_T) = x | O_T ∈ HMM(j)]
             = Σ_{O_T} {f_i(O_T)}^k Pr[O_T | O_T ∈ HMM(j)]
             = Σ_{O_T} {Pr[O_T | O_T ∈ HMM(i)]}^k Pr[O_T | O_T ∈ HMM(j)] .   (5)

It is clear from equation (5) that M_ij(1,T) = M_ji(1,T) for all i and j. For k > 1, however, we have M_ij(k,T) ≠ M_ji(k,T) in general.

The expression in equation (5) is directly computable for small T because the forward-backward algorithm2 calculates Pr[O_T | O_T ∈ HMM(v)] using n^2(v) T multiplications. To see this, note that each summand in equation (5) requires a total of k[n^2(i) + n^2(j)]T multiplications. There are m^T different possible observation sequences O_T = {O(t), t = 1, ..., T} because each O(t) can be any one of the m output symbols in V. Thus, direct calculation of equation (5) requires on the order of k[n^2(i) + n^2(j)]T m^T multiplications, and the computational effort increases exponentially with T.

We now derive a recursion for equation (5) that requires computational effort that grows only linearly with T. The recursion is derived for a more general expression that contains equation (5) as a special case. For k = 1, 2, ..., define

   R(k,T) = Σ_{O_T} Π_{v=1}^k Pr[O_T | O_T ∈ HMM(v)] .   (6)

The application of equation (6) to compute any moment from equation (5) is straightforward; for example, R(k+1,T) equals M_21(k,T) when HMM(2) = ... = HMM(k+1). Note that R(k,T) can be interpreted as a joint moment of HMMs.

The derivation of the recursion for R(k,T) proceeds as follows. The forward portion of the forward-backward algorithm gives the expression

   Pr[O_T | O_T ∈ HMM(v)] = Σ_{j(v)=1}^{n(v)} α_T^v(j(v)) ,   (7)

where, for 2 ≤ t ≤ T,

   α_t^v(j(v)) = [ Σ_{i(v)=1}^{n(v)} α_{t−1}^v(i(v)) a_{i(v),j(v)}^v ] b_{j(v)}^v(O(t)) ,   (8)

and

   α_1^v(j(v)) = π_{j(v)}^v b_{j(v)}^v(O(1)) .   (9)
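For tiny T, the sums (5) and (6) can be evaluated by direct enumeration, which is useful as an independent check on the recursion derived next. A brute-force sketch, assuming NumPy; exponential in T, for validation only, and none of the names are from the report:

```python
import itertools
import numpy as np

def fwd(pi, A, B, obs):
    # forward recursion of equations (7)-(9)
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def R_bruteforce(models, T, m):
    """R(k,T) of equation (6): sum over all m**T sequences of prod_v Pr[O_T | v]."""
    return sum(np.prod([fwd(*lam, obs) for lam in models])
               for obs in itertools.product(range(m), repeat=T))

def M_bruteforce(lam_i, lam_j, k, T, m):
    """M_ij(k,T) of equation (5)."""
    return sum(fwd(*lam_i, obs) ** k * fwd(*lam_j, obs)
               for obs in itertools.product(range(m), repeat=T))
```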
,k • bJ( )(O(t)) i(i) ) t vl-~ ai(U) E~ ) l | m I bV() (O(t)] E =~),( I~ TR 7989 Because ct- (i(v)) t-1 does not depend observation sequence 0t = (0(1), n(v k i(u)=l V=l ..... k U=l last symbol O(t) in the ..., 0(t)}, we have a-ICL,j) Mi 0t Because the sum over 0(t) ... the i( k ) ) lit (J(1) . , = [0(l), on , Q(t-l)}, is as v) k l Ot b V )(O (t ) ) V=l independent of the observation sequence Ot- l well as the indices i(v), and because of equation (11), we have jt(J(1), . .. , j(k)) n(u) Fk =rcj(1), . .,j~k)) Ut-Il(i(1), .. , i(k)) a (12) where k r(j(I) . , ,j(k))= buj(V)Ot) (M) I O(t) V=l m k b I)(Vs) (13) s=l U=l Equation (12) is the desired recursion substituting equation (9) into equation (11) lJ( j, j(k)) 2 < t < gives TI = 0(1) 10 for k V=l (j(u)) T. For t = 1, )k (k TR 7989 b)(00)) E k r(j(), j(k)) , ... I (14) rj(v) U=l Let k regardless = From the definition, it is clear that R(l,T) 1. of HMM(l), because the sum in equation (6) is I for all T, over all 0 T. To independently check the recursion (12)-(13), note that, from equation (13), m (1 )(V) r(j(l)) < t< T = s=1 From equation (14), we have j(l) I= Hence, from equation (10), we obtain n(l) R(ll) = The recursion is verified for T = %__= 1. For T = 2, from equation (12), we have n(1) i(1 )=l n(l) 21i() ai(), ! i(l)=l 11 TR 7989 so that, from equation (10), n(1) (n(l) R(,2) = i(l) ai(1),j(1) ii(1)=l j(1)=l n(i) a(n()) i(1)=1 lu)j(1)=l and the recursion is verified for T = 2. The first nontrivial identically the first special moment case is k = 2. M1 2 (l,T). In this From equation t (i( ),i(2)) case, (12), we R(2,T) have 2<t<T n(1) lt(j(l),j(2)) = n(2) r(j(1),j(2)) V i(l)=l i(2)=l and, from equation (14), 1 p(j(l),j(2)) = r(j(l),j(2)) j() 1 2 j(2) where, from equation (13), m '(l r(j(1),j(2)) = )(Vs) (2)(VS) s=l From equation (10), then, we have n(1) R(2,T) = n(2) E j(l)=l j(2)=l 12 uT(J(l),j(2)) a ), ai2 is for TR 7989 Computation of R(2,T) = M1 2 (l,T) is therefore not excessively laborious. The evaluation of into two parts. possible every R(k,T) using the recursion (12) is properly The first is the precalculation of r(j(l), .... requires This indices j(v). value of the , broken j(k)) for (k-l)m Nk multiplications and Nk storage locations, where storag loatoswhr N (15) net) = is the geometric mean of the number of different states in the various HMMs and is not different requires necessarily observation 262144 an symbols, storage If N = 8 integer. then locations computing and and 7 and 2.1 if there x 10 storing are r m for multiplications. 16 k 6 = Storage is clearly more crucial an issue than multiplications. It is possible in some cases to utilize the underlying symmetries of to reduce both storage and computational = effort. For example, if HMM(2) = HMM(k+l), then r(j(1),j(2), .....j(k+l)) = r(j(l), a(j(2)), for every permutation c of the k integers j(2), equation (16) commutative. follows easily from equation number combinations ..., o(j(k l))) ... , j(k+l). (13) because (16) The proof of multiplication is Thus one only need consider indices that satisfy 1 s j(1) s n(l) and 1 s j(2) The r of ordered of n(2) index letters S sets taken (j(u)) k at repeated any number of times up to k. Nk+l (n(2)(n(2) + 1) j(3) s a . :.. j(k+l) s n(2) is equal time, when to each the number of letter may be Storage is therefore proportional to . k. (n(2) + k ) n(l) 13 TR 7989 is which necessary. be otherwise storage would that reduced is also count multiplication total The [n(2)] k n(l) the than smaller significantly proportionately. 
r Once recursion sequence. sum the N all over the of each For appears to require k N k k k E ... sum ai(k),j(k "t-l~i1 ) 1~k .. ~). one iteration of (k)=1 (2)=1 i(1)=l (12), This undertaken. be the multiplications; however, by using the nested form, E aM(),W() equation in (j(to)} k, of observation the of T length must value given indices of (i(u)} indices n(2) n(l) sets N a for any for computed be can (12) stored and computed been has it is possible to use approximately instead. If equation (12) sequence of for about requires T, length If N multiplications. required order terms lower (N N) (Nk -1) neglected, computing N k N2 Nk = are 2k N computing For an on the order UT requires of N2kT multiplications are 8 and T =32, then 2.2 x1 Assuming a multiplication takes k = 6. observation multiplications. one microsecond, the calculation requires 611 hours and is clearly impractical. Significant reduction by utilizing = the in computational underlying symmetries in effort is possible in some cases For example, vt" = HMM(k l), then Vt(j(l),j(2), ... , j(k+l)) = ,t(j(l), 0 (j(2)), .... for every permutation a of the k integers j(2), equation (17) multiplication property 14 if HMM(2) (16) follows easily is commutative in this case. by and Thus, induction because r ... from j(ki-l). , (17) The proof of equation (12) because the same symmetry satisfies the recursion o(j(k+l))) (12) need be computed for TR 7989 2 4 sets Nk~l only which NkiT, total The of indices. is significantly multiplication count multiplications N2k T the than smaller to is reduced mutplktin For the above example requiring 611 hours, that would otherwise be needed. is utilized, the calculation if N = n(1) = n(2) = 8 and if the symmetry (17) would be reduced to roughly a 96-minute calculation. Utilizing symmetry is clearly significant in that it can turn an impractical long calculation into a feasible shorter one. Underflow is It easily be can potentially a problem when the recursion (12) in overcome exactly same the is computed. pointed manner as out in reference 2 for preventing numerical underflow during the calculation of the equation (12) which use is the in be computed according to and then multiplied by a scale factor ct defined by = ct Then let lt Specifically, forward-backward algorithm. scaled vdlues of t(j(1), ... , j(k)) the recursion Vt turn scaled as shown in in equation R(k,T) (12) If (18). fashion and recall the expression in equation (10), (1a) to compute pt+l' this we continue in it follows that (19) ct = Because the product cannot be evaluated without underflow, we compute instead T log R(k,T) (20) log ct = -E t=l Any convenient potentially scale useful factor one might can be be to used take instead Tt = of Nk. equation Using A (18). Tt would eliminate the effort of computing the sum in equation (18) before scaling. 15 TR 7989 B. CONTINUOUS SYMBOL HMMS The objective of this section is to show that the moment algorithm for discrete symbol HMMs can be carried over essentially unchanged to continuous symbol HMMs. In fact, it holds also for continuous vector symbol HMMs; however, only the continuous symbol HMMs are treated here for simplicity. Throughout this section, it is assumed that each output symbol 0(t) is a real random variable defined on some underlying event space, V. The probability density function of 0(t) is uniquely defined for each state i(V) = 1, ..., n(i) b (x). 
b An of v = 1, 2, ..., HMM(v), and is denoted as Thus, for real numbers a and B with a < 0, we have (x) dx observation real each numbers = Pr[a < 0(t) < B I HMM(u) and Markov state = i(u)] sequence xt, with O(t). The posterior density function for 0 = [xt, t = 1, 2, ..., T) xt being a realization of is the a (21) sequence random of variable likelihood continuous function f (0T) is now a probability symbol HMMs, as opposed to a simple probability (see equation (1)) for discrete symbol HMMs. Thus,- for real vectors V and B with '&<B, we have f f(OT) dOT = Pr[- < where 0T c HMM(u) HMM(u) and dOT = dx The denotes the 0T hypothesis 0T C HMM(u)] , (22) that 0T is a realization of ... dxT . conditional cumulative distribution functions Fi(x) are defined by equation (2), just as for discrete symbol HMMs. From equation (3), we have the moments 16 TR 7989 M ij(k,T) f xk dF ij(x) f k Pr[x < fi(OT)< x + dx I Sf. 0 T c HMM(j)] dx [f i(OT)} k f COT) dOT , (23) T-fold which is the equation (23) case k = 1. continuous that analog of equation M ij(k,T) = Mji(k,T) in (5). general It only is clear for the from special The analog of equation (6) for continuous HMMs is R(k,T) f = 1 f (T) E dOT (24) Ul T-fold The forward-backward algorithm for function for continuous HMMs is modified f ( computing the posterior likelihood as follows: )(j(v)) = (25) , j(u):l where (9), now aT(j(v)) with the is only interpreted equation (21). computed exactly difference as the being as given that probability b" by j(V) the (0(t)) density recursions in equation function (8) and (8) is implicit in Consequently, equation (10) still holds exactly if we define 17 TR 7989 t(i(l), f j(k)) ..... (26) t(j(v)) dOt ... t-fold as the analog of equation (11). replacing Proceeding as before with t-fold integrals t-fold summations gives exactly the (12), recursion but with the one-dimensional integral k rum, ... , j(k)) J jib (27) (x) dx in place of equation (13). The remarks in the preceding section concerning storage, multiplication symmetry properties all counts, and primary difference is that equation apply for continuous (27) requires an instead of a finite sum as in equation (13). initial computational overhead, but once algorithm (12) proceeds exactly as before. 1B 18! symbol integral HMMs. The evaluation This evaluation increases the equation (27) is computed, the TR 7989 III. Ergodic from Markov chains are those for which it is possible to transition every step. COMPARISON OF THEORETICAL MOMENTS WITH SIMULATION state to every other state, although not necessarily in one Left-to-right Markov chains are those for which transitions to lower numbered states are not allowed, that is, have probability zero. types of chains are sufficiently different that they are These two considered separately in the examples. One interpretation signals, while ultimately left-to-right become exited once is that ergodic HMMs are models of HMMs stationary it is entered). are (because models the of quasi-stationary transient highest signals numbered state that is not One might therefore expect these two types of HMMs to impact the performance of the maximum likelihood classifier depicted in figure 1 in different ways. The three examples given in this section support this expectation. Using the above follows. 
The first based HMMs can on stationary signals second interpretation, example shows reliably with the examples may be described that maximum likelihood distinguish between sufficiently a reasonable amount of as classification long computational quasi- effort. The example shows that short quasi-stationary and transient signals look significantly different to the HMM transient recognizer, but not to the HMM recognizer based on the quasi-stationary signal. The third example shows that noisy observations of transient signals adversely affect classification performance by making the transient signal appear to have a stationary component, which is then misclassified by the HMM transient recognizer. A. TWO ERGODIC HMMS HMM(l) and HMM(2) parameters are given and 2, respectively. uniformly distributed every symbol are five-state, (rounded HMM(l) symbols. can be generated interest here is the following. to three clearly HMM(2) eight-symbol ergodic significant decimals) generates observation is more complex in every state. models whose in tables sequences in structure, The fundamental 1 of but question ot How long must an observation sequence be to 19 TR 7989 guarantee that posterior maximum the Introduction) classification (described probability be 98 percent reliable? will in We will give what may best be 6escribed as a semiempirical answer to this question. Because of the nature of HMM(l), it is easy to see that PrFOT f0 In other constant HMM(l). 0T words, the posterior because all observation In particular, f1 0. c HMM(l)] likelihood 0T ) function sequences cannot 8- are equally distinguish useful likely 0T is if 0~TC c HMM(l) from for classification. were generated, and Ten-thousand the posterior observation probability instead of HMM(l), is sequences NUMBER OF MARKOV STATES = 5 NUMBER OF SYMBOLS PER STATE =8 INITIAL STATE PROBABILITf VECTOR: 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol TRANSIlTION 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol PROBABILITY MATRIX: 2.OOE-Ol 2.QOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.00E-01 0T f2 (01T) was Table 1. Parameters of HMM(l) 20 HMM(l) HMM(2) and thus is useless for classification. The posterior likelihood function based on HMM(2), HMM on based 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE -01 2.OOE-Ol ?.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol 2.OOE-Ol SYMBOL PROBABILITY MATRIX (TRANSPOSED): 1.25E-01 1.25E-01 1.25E-01 1.25E-01 1.25E-01 1.25E-01 1.25E-01 1.25E-01 1.25E-01 1.25E-01 1 .25E-01 1 .25E-01 1 .25E-01 1 .25E-Ol 1 .25E-01 1.25E-01 1 .25E-01 1 .25E-01 1.25E-01 1.25E-01 1 .25E-01 1 .25E-01 1.25E-01 1.25E-01 1 .25E-01 1.25E-01 l.25E--Ol 1.25E-01 1 .25E-01 1 .25E -01 1.25E-01 1.25E-01 1 .25E-01 1 .25E-01 1.25E-01 of each computed TR 7989 Table 2. Parameters of HMM(2), Rounded to Three Significant Digits NUMBER OF MARKOV STATES = 5 NUMBER OF SYMBOLS PER STATE = B INITIAL STATE PROBABILITY VECTOR: I.OOE-O0 O.OOE*O0 O.OOEO0 O.OOE+O0 1.24E-01 2.13E-Ol 1.27E-Ol 1.15E-Ol 2.82E-01 1.94E-Ol 2.34E-01 3.38E-01 2.75E-01 2.98E-02 SYMBOL PROBABILITY MATRIX (TRANSPOSED): 1.81E-Ol 1.22E-Ol 7.89E-03 1.48E-Ol 7.04E-02 TRANSITION 1.40E-Ol 1.40E-O1 4.37E-02 9.73E-02 2.36E-01 using the natural thus O.OOE+OO PROBABILITY MATRIX: 2.35E-01 3.08E-01 1.14F-O. 
2.99E-01 3.20E-01 1.72E-Ol 4.97E-Ol 1.53E-02 2.49E-02 4.27E-01 1.39E-Ol 8.28E-02 3.23E-02 9.13E-Q2 1.33E-Ol 2.67E-02 l.79E-Ol 1.56E-Ol 1.60E-Ol 1.66E-O1 1.58E-Ol 5.87E-02 2.18E-Ol 2.15E-Ol 1.08E-Ol 1.30E-Ol 2.09E-Ol 2.34E-01 5.97E-02 2.35E-01 1.19E-O1 5.75E-02 l.lIE-Ol 1.02E-Ol 1.03E-Ol l.76E-Ol 2.37E-02 1.32E-Ol 1.22E-Ol 2.40E-01 1.17E-Ol 6.61E-02 1.46E-Ol 1.76E-02 1.47E-Ol forward-backward algorithm. logarithm of matched histogram of to dF2 2 (x) for the posterior log dF2 1 (x) for Figure 2 shows T = 25. likelihood T = 25. mismatched to the likelihood function. The observation function. In a histogram of figure the sequences Figure 3, 3 are shows a 01 is then, As is clear from figures 2 and 3, the difference between the mean values of the log likelihood functions is about 1.4 standard deviations. Thus, the potential exists for using log dF2 2 (x) to classify observation sequences; however, T = 25 is not long enough to classify with high reliability. A useful observation drawn from figures 2 and 3 is that the probability density function distribution. log dF ij(x). 1.. of log dF2 i(x) Let vii Then, and is nicely approximated by the normal aij denote the mean and standard deviation of if dFi (x) is ;og-normal, it is easy to show that and a. . are related to the moments M. .(k,T) by the formulas 21 TR 7989 2 2 It is stressed dFii (x) is (28) 2 log M ij(l,T) - (1/2) log Mij(2,T) = = log Mij (2,T) - 2 log M ij(1,T) that equations truly (28) log-normal. and (29) For (29) hold finite exactly symbol if and HMMs, only if dFii (x) is necessarily discrete, so that both equations (28) and (29) must be viewed as approximations. Sufficient conditions under which it may be proved that dF ij(x) is, in some sense, approximately log-normal are unknown. the mean and standard deviations of Table 3 gives a comparison between log dF ij(x) estimated calculated between and from from equations 10000 (28) the approximations of variances. T = 400 are HMM(2) with value of standard It long high also as equations to and the Assuming they appear to (28) that value log dF2 1 (x) is reliability 01 C the mean about are of length and between log dF 22 (x) the of 0T c HMM(l) log dF2 2 (x) and those good agreement sequences difference of and (29) and the sample means between mean 0T shows observation is, the then table and That be, sequences This distinguish reliability. deviations. (29). establishes enough log dF2 1 (x) distributed, and observation 5.2 normally the maximum posterior classifier is about 98 percent. Computing the requires n 2 T = 10000 effort as computing 9216 posterior multiplications. are small forward-backward equivalent to function This is one 512-point FFT, which multiplications. f2 (04 0 0 ) likelihood In enough algorithm a nested other for words, the practical f2 (OT) about the for same T = 400 level of requires (2)(512)(log 2 512) computational application. requirements Furthermore, = of the for computing f2(OT) is mathematically sequence of matrix-vector multiplications. Consequently, it is possible to reduce total computation time by the design of a "black box" to exploit this special structure in hardware. B. MIXED ERGODIC AND LEFT-TO-RIGHT HMMS HMM(3) is a parameters are given 22 five-state, in table 4. eight-symbol left-to-right model whose It has a structure that might conceivably TR 7989 g -60 -57 -54 -51 -48 -45 -42 Figure 2. Histogram of 10000 Values of log dF2 2 (x) for T = 25. (The Normal Curve Has the Sample Mean and Variance Given in Table 3.) -60 Figure 3. 
-57 -54 -51 -46 -45 -42 Histogram of 10000 Values of log dF2 1 (x0 for T = 25. (The Normal Curve Has the Sample Mean and Variance Given in Table 3.) 23 TR 7989 Table 3. Comparison of Two Estimates for the Mean and Standard Deviation of log dF2 j(x) for j = 1, 2 Standard Deviation Eq. 29 Sample Mean Value j Eq. 28 Sample T = 1 5 10 15 20 25 50 100 200 400 -10.8 -21.3 -31.9 -42.4 -53.0 -105.8 -211.4 -422.6 -845.0 -10.6 -21.2 -31.8 -42.4 -52.9 -105.8 -211.5 -423.0 -845.8 0.95 1.11 1.24 1.35 1.47 1.93 2.62 3.60 5.09 0.71 0.92 1.10 1.25 1.38 1.91 2.67 3.76 5.30 = 2 5 10 15 20 25 50 100 200 400 -10.1 -20.3 -30.6 -40.8 -51.0 -102.1 -204.4 -408.9 -818.0 -10.1 -20.3 -30.5 -40.8 -51.0 -102.1 -204.4 -409.0 -818.1 0.69 0.90 1.08 1.23 1.37 1.92 2.66 3.77 5.33 0.59 0.84 1.03 1.20 1.34 1.90 2.69 3.80 5.37 Note that HMM(3) never arise in the SIIWR problem. once is it entered. sequences ultimately Note also that been symbol and V8 V VV 3 or f3(OT) = 0. will be against symbol the It entered. ergodic V that the forbidden these observation of sequences make f3( 0 T) fifth may a also state has containing sequence 0 five V8 . V, symbols zero; be powerful noticed. the V2 , i.e., It discriminator summarize briefly, this example To sequences. V6 f V7 9 and if the the observation long likelihood posterior symbol facts any symbols only observation have must three if and occurs containing subsequently V5 only that an follows Other seen contain sufficiently all Consequently, leaves the fifth state will show that short observation sequences of quasi-stationary and transient HMMs look hand, very all different observation to the transient sequences look HMM recognizer. somewhat alike to On the ergodic other HMM recognizers. When 24 HMM(3) enters its fifth state, it becomes stationary and, TR 7989 Parameters of HMM(3) Table 4. NUMBER OF MARKOV STATES = 5 NUMBER OF SYMBOLS PER STATE = 8 INITIAL STATE PROBABILITY VECTOR: O.OOE+O0 O.OOE+O0 l.OOE+OO O.OOE+O0 TRANSITION 6.OOE-01 O.OOE OO O.OOE+OO O.OOE+OO O.OOE O0 O.OOE-O0 l.OOE-Ol 4.00E-Ol 7.OOE-Ol O.OOE+O0 O.OOE+-O0 O.OOE+OO O.OOE+OO 3.OOE-O1 l.OOE+OO SYMBOL PROBABILITY MATRIX (TRANSPOSED): l.OOE-Ol O.OOE+OO O.OOE+O0 9.OOE-Ol l.OOE-Ol 6.OOE-Ol O.OOE OQ O.OOE+O0 3.OOE-Ol O.OOE+O0 O.OOE+O0 2.OOE-Ol l.OOE-Ol 6O0E-Ol l.OOE-Ol O.OOE+O0 2.OOE-Ol l.OOE-Ol OO O.OOE O.OOE+O0 O.OOE+OO 4.OOE-Ol O.OOE+OO O.00E4-O O.OOE+O0 O.OOE+O0 O.OOE+O0 3.OOE-Ol O.OOE+OO O.OOE O0 O.OOE OO O.OOE-O O.OOE+OO O.OOEO0 O.OOEO0 O.OOE+OO O.OOE+OO l.OOE-Ol 6.OOE-O 3.OOE-Ol transient HMM(3) of portion estimating the first passage The 6 entered. mean simulation however, explicitly; deviation respectively. largest The first the of was by gained used times here first passage time was 43 transitions. be computed 10000 In instead. was time was may is found that the mean and it was time fifth state its before passage passage passage first least first for HMM(3), observation sequences generated standard of of into its fifth state, that is, the Markov process variance and length the is sequences observation time of HMM(3) in of transitions the number into Insight interesting. less significantly consequently, the PROBABILITY MATRIX: O.OOE+O0 4.O0E-Ol 2.OOE-O| 7.OOE-Ol O.OOE*OO 6.OOE-Ol O.OOE+O0 O.OOE O0 O.OOE O0 O.OOE+OO O.OOE+OO 3 4.8, and 10.9 transitions, and the Thus, observation sequences almost certainly become stationary for t > 50. Figure 4 distribution as closely dF 22(x), as 5 clearly and table even though approximated evidenced HMM(3) by by a the show is not that dF3 3 (x) is a ergodic. 
log-normal discrepancy "well-behaved" However, dF 3 3 (x) is not distribution in table as are 5 between dF2 1 (x) and the sample 25 TR 7989 I -45 -40 -35 -30 -20 -2s -15 Figure 4. Histogram of 10000 Values of log dF3 3 (x) for T = 25. (The Normal Curve Has the Sample Mean and Variance Given in Table 5.) and StdLiStics the statistics that would hold if dF3 3 (x) were truly log-normal. Ten-thousand and observation sequences the posterior backward algorithm. posterior likelihood The likelihood 0 which f3( T) = 0. ergodic HMM f3(OT) observation function. Better observations than was into state 5. Total was and HMM(2) were generated computed sequences Table are 6 gives the 99-percent attained observation sequences were about as HMM(3) of HMM(l) using to the number of sequences for of T = 10, that long as the mean rejection of the forward- thus mismatched rejection when the the simulated is, when the first passage time of 10000 ergodic observations occurred for T = 20. The ability much more lack of of impressive symmetry f3( 0 T) than Fi (x) o gives two estimates of the 26 and those reject 0 f2( T) observations rejection F.i (x) is striking in of this of 0 0 T c HMM(2) T c HMM(3). instance. the mean and standard deviation of figure 5 is a histogram of samples to predicted the case T = 25. by equation (28) is The Table log dF2 3 (x), 7 and The mean values of the 10000 agree very well; however, TR 7989 Table 5. Comparison of Two Estimates for the Mean and Standard Deviation of log dF 3 3 (x) Mean Value Sample Eq. 28 T - 5.6 -12.8 -18.6 -23.5 -28.1 -50.5 5 10 15 20 25 50 Standard Deviation Sample Eq. 29 - 4.9 -11.8 -18.2 -22.6 -26.3 -47.2 Table 6. 1.92 2.30 2.72 3.22 3.61 4.59 1.13 1.91 2.33 2.29 2.15 2.75 Number of 0 T c HMM(i) for Which f3(OT) = 0, i = 1, 2 T HMM(l) HMM(2) 5 10 15 20 9389 9937 9997 10000 9172 9918 9988 10000 Table 7. Comparison of Two Estimates for the Mean and Standard Deviation of log dF2 3 (x) Mean Value Sample Eq. 28 T 5 - 10.8 - 21.4 - 32.0 - 42.6 - 53.2 -106.4 10 15 20 25 50 dF 23 (x) is not as dFl (x), as seen from standard deviations. well Standard Deviation Sample Eq. 29 -10.8 -21.4 -32.0 -42.7 -53.5 -106.8 0.51 0.91 1.02 1.04 1.05 1.11 approximated the discrepancy In any event, it by a in the log-normal 0.60 0.89 1.04 1.15 1.12 1.43 as sample versus dF 22(x) and the predicted is clear by comparing table 7 with 27 TR 7989 -60 -51 -54 -ST -48 -42 -45 Figure 5. Histogram of 10000 Values of log dF2 3 (x) for T = 25. (The Normal Curve Has the Sample Mean and Variance Given in Table 7.) the lower HMM(3) time half from of of table 0T c HMM(2) HMM(3) is 3 that when almost f2(OT) T = 50. certainly cannot reliably distinguish However, since first less than the T = 50, c 0 passage increasing the observation sequence length to improve reliability is not appropriate if the underlying intent is the classification of the transient portion of HMM(3). C. LEFT-TO-RIGHT HMM WITH NOISE In this example, the effect of noise on the maximum posterior likelihood classifier is assessed for the left-to-right model HMM(3). study noise in finite symbol The right way to HMMs is to add the noise to the original time series s(t) and then analyze the particular preprocessor under consideration to determine the noisy symbol sequence. 
However, no particular preprocessor is proposed here, and so we resort to modeling noise in much the same way that discrete Shannon can give an modeled noisy indication of the successful of the probable number of incorrect it cannot 28 memoryless provide an assessment channels. 7 This approach classification rate as a function symbols in an observation sequence, but of the effect of signal-to-noise ratio on TR 7989 classification because such hkj probability an assessment requires knowledge of the preprocessor. Denote altered by to symbol the V. 3 by the that the observation noise mechanism and define symbol the Vk m-by-m noise probability matrix H = [hkj]. It is assumed that H is independent of state of of given HMM the noiseless. matrix H, BH). The Markov chain and corrupted If x by (ir, = noise is t. Consequently, the equivalent to another is of the equivalent noiseless straightforward: probability that the state of symbol bik hkj gives probability output of HMM the of the Markov chain HMM are product that the component matrix b.j B. Clearly, of b.. = bik h (IT, is a is the equivalent equals the is The sum over noiseless (ij) A, the j is i and that symbol produced, given that symbol k was the output of the given HMM. k the A, B) are the parameters of a given HMM with noise the parameters proof time is HMM component of the product BH, so that B = BH. The noise row probability matrix H must be sum must equal one. row stochastic; The HMM-generated symbol Vk that is altered is, every by noise to one of the available symbols, so that row k must sum to one. Because H has probability matrix = BH sums to one. row sums for the equal to one, the matrix equivalent noiseless HMM; B is a valid that is, each symbol row of We have m j=1 m m j~l kzl m E k=l m bikEI j~l hkj m EZ bik k=l =21 29 TR 7989 The worst case noise entry = 1/m for all i and j. m E bik mL kj guishable. HMMs In with fact, 0 H .=k m k=l k=l Consequently, has the constant In this case, m ij 0 denoted H , probability matrix, noise makes probability all HMMs H0 matrix statistically are indistin- equivalent to the ergodic HMM(l) given in table 1. Let Pr[Vi] be the relative frequency of occurrence of the symbol Vi observation sequences of length T before the addition of noise. Thus, we have E Pr[V.] = I. After alteration by noise, the probability of in correct occurrences of V in 0T is that the symbol O(t) c 0 T is correct is then Pr[Vi] h. .. The probability m D (30) Pr[Vi] hii = i=1 and the probability that O(t) is incorrect is ET = 1 - DT For (31) the examples here, given a specific value of E we choose the simple noise probability matrix H defined by hi I - ET , all i , (32) ET h.. Tal m - 1 ij For this choice of H, 0T is * j all i independent of j the actual as is clear from equation (30) and the fact that 30 values Z Pr[V.] = 1. of Pr[Vi, TR 7989 Noise tends to make observations HMM(l), and ergodic sequences for therefore to function of results the how many incorrect various sequences. likely small of T It shows T than for HMMs look like observations of tend to HMM(3). The forbidden symbol symbol values observation for sequences left-to-right determine the for observation of all first probability and ETo that large based This forbidden natural sequences ET. forbidden T. 
have on 8 of the 10000 sequences are shows a as gives simulations table also is issue occur Table symbol symbol that less noisy observations of HMM(3) do not have as high a proportion of Forbidden symbol sequences as observations of HMM(l) and HMM(2), as can be seen by comparing tables 6 and 8. that ET due to must be forbidden ET = 0.001, 1.21 percent. in the small the T must symbol be for ET = 10 percent, One may conclude from table 8 short to sequences. misclassification Shorter likelihood and even For minimize misclassification instance, rate is and thus T = 25 apparently T, however, causes smaller shifts function if at and least in the statistics increases the misclassification rate. Consequently, a tradeoff exists between short T and long T. The total misclassification misclassification rate due rate to can be expressed forbidden misclassification rate due to noise-induced nonzero values of the posterior likelihood symbol shift as the sum of sequences and in the statistics function. of the the the We examine the total misclassification rate for HMM(4), which is defined to be the HMM equivalent Table 8. Number of 0 T c HMM(3) + Noise for Which f3(OT) = 0 at Various Values of ET ET T 0.1 0.01 0.001 5 10 15 20 25 50 2194 3906 5305 6625 7643 9643 236 443 651 986 1303 2684 23 37 64 103 121 345 0.0001 1 1 11 13 11 34 31 TR 7989 to HMM(3) with the noise matrix H given by equation (32) with E 0.001. = The parameters of HMM(4) are given explicitly in table 9. 0(x) the cumulative distribution function Denote by F. Pr[O < fi(OT) < x I 0 T c HMM(j)] F for x > 0 , (33) = for x < 0 O, Ten-thousand 25. As given likelihood nonzero observation in function values of sequences table 8, 121 values (that f3( 0 T) give 0 T were generated sequences is, the from resulted for in zero and the remaining shown in figure 0 f3( T) = 0) histogram HMM(4) I = posterior 6. 9819 By comparison with figure 4, it is clear that no significant difference between 0 log dF3 4 (x) and log dF3 3 (x) is evident. Therefore, the misclassifi- Table 9. Parameters of HMM(4), Rounded to Three Significant Digits NUMBER OF MARKOV STATES = 5 NUMBER OF SYMBOLS PER STATE = 8 32 INITIAL STATE PROBABILITY VECTOR: l.OOE+O0 O.OOE O0 O.OOE+0O O.OOE+00 O.OOE+O0 TRANSITION 6.OOE-Ol O.OOE+O0 O.OOE+O0 O.OOEO0 O.OOE+OO O.OOE+O0 l.OOE-0l 4.OOE-Ol 7.OOE-Ol O.OOE+0O O.OOE O0 O.OOE+00 O.OOE-0O 3.OOE-Ol l.OOE O0 SYMBOL PROBABILITY MATRIX (TRANSPOSED): B.99E-Ol l.OOE-Ol 1.43E-04 l.43E-04 l.OOE-Ol 5.99E-01 1.43E-04 l.43E-04 l.43E-04 2.OOE-Ol 3.OOE-Ol l.43E-04 l.43E-04 l.OOE-Ol 5.99E-01 I.OOE-Ol l.43E-04 1.43E-04 l.OOE-Ol 2.OOE-Ol l.43E-04 1.43E-04 1.43E-04 4.OOE-Ol l.43E-04 1.43E-04 1.43E-04 3.OOE-0l l.43E-04 1.43E-04 1.43E-04 l.43E-04 1.43E-04 1.43E-04 1.43E-04 1.43E-04 1.43E-04 l.OOE-Ol 5.99E-01 3.OOE-Ol PROBABILITY MATRIX: 4.OOE-Ol O.OOE+O0 7.OOE-0 2.OOE-01 O.OOEfOO 6.OOE-Ol O.OOE+O0 O.OOE+O0 O.OOE OO O.OOE+O0 TR 7989 -45 -40 -35 -3o -15 -20 -is 0 Figure 6. Histogram of 9879 Samples of log dF3 4 (x) for T = 25. (The Normal Curve Has the Sample Mean - -28.156 and the Variance = 3.6167.) cation rate due effectively reliable to zero. when noise-induced The used maximum with shifts in posterior noisy the statistics classifier observations of is thus characterized dF0 4 (x) is 98.8 by percent the noise probability matrix H. ET = 0.001 Because has probability 0.025 observation of sequences, symbol is 250. 
caused the in this having at the expected Nearly half only example, (121) significant each least one number observation incorrect with contained at sequence symbol. least one forbidden symbol misclassification problem. Of 025 10000 incorrect sequences and The other half of F0ii~x (x) apparently made no contribution to misclassification. It would instead of compute the seems to be be desirable Fii (x). be Alternatively, amplitude present to of in the the able to compute it would impulse be (delta left-to-right the moments desirable function) HMMs considered to in be able dF. .(x) here. to that In other words, if we write dF. (x) = A 6(x) + dF.0 (x) ii3 , (34) 33 TR 7989 Knowing A and then an algorithm to compute A directly would be worthwhile. of moments IV. The first apparently strictly are are empirical and a proof The of ergodic, distribution. dF ij(x) Consequently, does higher dFi(x) developing order give good in case when is that are dFij (x) is claim is this of the needed dF ij(x) or both, or HMM(j), approximate moments the approximation When either HMM(i) not F ij(x) supporting degree of reasonable continuous approximations to dFij(x). 34 of reason The evidence the log-normal would be worthwhile. not function ergodic. log-normal. nearly M ij(2,T) and density both However, CONCLUSIONS M ij(1,T) probability HMM(j) and HMM(i) to the of estimatE, moments two F0 (x). ir of F.. gives the moments .. such an algorithm requires further work. the log-normal to develop TR 7989 REFERENCES 1. L. R. Rabiner, S. E. Levinson, and M. M. Sondhi, Vector Quantization and Hidden Isolated Word Recognition," Markov Models "On the Application of to Speaker-Independent The Bell System Technical Journal, vol. 62, no. 4, April 1983, pp. 1075-1105. 2. S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition," The Bell System Technical Journal, vol. 62, no. 4, April 1983, pp. 1035-1074. 3. H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I, chapter 2, Wiley and Sons, New York, 1968. 4. A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, 1965, p. 158. 5. L. A. Liporace, Observations Theory, vol. 6. J. G. Kemeny of "Maximum Markov Likelihood Sources," IEEE Estimation for Transactions on Multivariate Information IT-28, no. 5, September 1982, pp. 729-734. and J. L. Snell, Finite Markov Chains, chapter 3, Van Nostrand, Princeton, New Jersey, 1960. 7. C. E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, vol. 27, 1948. 35/36 Reverse Blank INITIAL DISTRIBUTION LIST Addressee NAVSEA (Code 63-0 (CDR E. Graham, C. Walker)) NRL (Code 5130 (Dr. P. Abraham), - 5844 (Dr. F. Rosenthal, Dr. R. J. Hansen), -5104 (Dr. S. Hanish)-, -5130 (R. Menton, J. Cole)) NRL/USRD, Orlando (A. Markowitz) NOSC, San Diego (E. Schiller, R. Johnson, R. Smith, G. Benthein) DTNSRDC, Bethesda (Code 1541 (P. Rispin)) DARPA (CMDR A. Sears) OTIC NAVPGSCOL, Monterey Naval Technology Office (L. Epley) ONR (Code 434 (Dr. N. Glassman, Dr. R. Fitzgerald), -220B (Dr. T. Warfield)) ONR, Boston (Dr. R. L. Sternberg) AMSI (W. Zimmerman) Bell Laboratories (S. Francis, G. Zipfel) S. Berlin, Inc. (S. Berlin) Campbell and Kronauer Associates (C. Campbell, R. Kronauer) Gould, Inc. (S. Lemon) G&R Associates (F. Reiss) SCRIPPS (0. Andrews, Jr.) Technology Service Corp. (L. Brooks) Atlantic Applied Research (S. Africk) Signal Technology, Inc. (Dr. S. B. 
Davis) ATT Bell Laboratory (Dr. S. Levinson) Digital Equipment Corp. (Dr. T. Vitale) Atlantic Aerospace Electronics Corp. (L. Lynn) URI, Dept. of Math. (Prof. P. T. Liu) URI, Dept. of Elec. Eng. (Dr. 0. Tufts) Chase, Inc. (0. Chase) A&T, No. Stonington (L. Gorham) No. of Copies 2 6 1 4 1 1 12 1 1 3 1 I 2 1 2 I 1 1 1 I 1 1 1 1 1 I 1 1