International Journal of Control Vol. 81, No. 5, May 2008, 820–835

Structural transformations of probabilistic finite state machines

I. CHATTOPADHYAY and A. RAY*
The Pennsylvania State University, University Park, PA 16802, USA
*Corresponding author. Email: [email protected]

(Received 12 June 2007; in final form 25 September 2007)

Probabilistic finite state machines have recently emerged as a viable tool for modelling and analysis of complex non-linear dynamical systems. This paper rigorously establishes such models as finite encodings of probability measure spaces defined over symbol strings. The well known Nerode equivalence relation is generalized in the probabilistic setting and pertinent results on existence and uniqueness of minimal representations of probabilistic finite state machines are presented. The binary operations of probabilistic synchronous composition and projective composition, which have applications in symbolic model-based supervisory control and in symbolic pattern recognition problems, are introduced. The results are elucidated with numerical examples and are validated on experimental data for statistical pattern classification in a laboratory environment.

1. Introduction and motivation

Probabilistic finite state machines have recently emerged as a modelling paradigm for constructing causal models of complex dynamics. The general inapplicability of classical identification algorithms to complex non-linear systems has led to the development of several techniques for constructing probabilistic representations of dynamical evolution from observed system behaviour. The essential feature of a majority of such reported approaches is a partial or complete departure from classical continuous-domain modelling towards a formal language-theoretic, and hence symbolic, paradigm (Ray 2004, Shalizi and Shalizi 2004). The continuous range of a sufficiently long observed data set is discretized and tagged with labels to obtain a symbolic sequence (Ray 2004), which is subsequently used to compute a language-theoretic finite state probabilistic predictor via recursive model update algorithms. Symbolization essentially discretizes the continuous state space and gives rise to probabilistic dynamics from the underlying deterministic process, as illustrated in figure 1. Among various reported symbolic reconstruction algorithms, causal-state splitting reconstruction (CSSR) (Shalizi and Shalizi 2004) computes optimal representations (e.g., ε-machines) and is reported to yield the minimal representation consistent with accurate prediction. In contrast, the D-Markov construction (Ray 2004) produces a sub-optimal model, but it has a significant computational advantage and has been shown to be better suited for online detection of small parametric anomalies in the dynamic behaviour of physical processes (Rajagopalan and Ray 2006). This paper addresses the issue of structural manipulation of such inferred probabilistic models of system dynamics. The ability to transform and manipulate the automaton structure is critical for the design of supervisory control algorithms for symbolic models and for real-time pattern recognition from symbol sequences. Specific issues are delineated in the sequel.

1.1 Applications of symbolic model-based control

The natural setting for developing control algorithms for symbolic models is that of probabilistic languages.
The notion of probabilistic languages in the context of studying qualitative stochastic behaviour of discrete-event systems first appeared in Garg (1992a, b), where the concept of p-languages ('p' implying probabilistic) is introduced and an algebra is developed to model probabilistic languages based on concurrency.

Figure 1. Emergence of probabilistic dynamics from the underlying deterministic system due to discretization; q1, q2, q3 are symbolic states and σ, τ are symbolic events.

Figure 2. Symbolic model and control specification.

A multitude of control algorithms for p-language-theoretic models have been reported. Earlier approaches (Lawford and Wonham 1993, Kumar and Garg 2001) attempt a direct generalization of Ramadge and Wonham's supervisory control theory (Ramadge and Wonham 1987) for deterministic languages and prove to be somewhat cumbersome in practice. A significantly simpler approach is suggested in Ray (2005), Chattopadhyay (2006) and Chattopadhyay and Ray (2007), where supervisory control laws are synthesized by elementwise maximization of a language measure vector (Ray 2005, Chattopadhyay and Ray 2006) to ensure that the generated event strings cause the supervised plant to visit the "good" states while attempting to avoid the "bad" states optimally in a probabilistic sense. The notion of "good" and "bad" is induced by specifying scalar weights on the model states, with relatively more negative weights indicating less desirable states. Unlike the previous approaches, the measure-theoretic approach does not require a "specification automaton"; instead, a specification weight is assigned to each state of the finite state machine. (Note: these states are different from the states obtained via symbolic reconstruction of observed physical data.) Figure 2 illustrates the underlying concept. The symbolic model shown on the left has three states q1, q2, q3, while the control objective is specified by weights +1 and −1 on the states qA, qB of the two-state automaton on the right. Recalling that every finite state automaton induces a right-invariant partition on the set of all possible finite-length strings, the above situation is illustrated by figures 3(a) and (b). The operation of probabilistic synchronous composition, defined later in this paper, resolves the mismatch between the partitions induced by the model and the specification by considering the product partition in figure 3(c). The given model is then transformed into the one shown in figure 3(d), on which the optimization algorithm reported in Chattopadhyay and Ray (2007) can be directly applied to yield the optimal supervision policy.

Figure 3. Imposing control specification by probabilistic synchronous composition of automata.

1.2 Applications of symbolic pattern recognition

As mentioned earlier, the symbolic reconstruction algorithms (Ray 2004, Shalizi and Shalizi 2004) generate probabilistic finite state models from observed time series.
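To make the symbolization step concrete, the sketch below partitions the range of a scalar time series into cells, replaces each sample by the label of its cell, and estimates a depth-D Markov model by counting symbol transitions. This is only a minimal illustration in the spirit of the D-Markov construction, not the CSSR or D-Markov implementations of the cited references; the partition boundaries, the noisy sinusoid and all function names are assumptions introduced for the example.

```python
import numpy as np

def symbolize(series, boundaries, alphabet="abcd"):
    """Map each continuous sample to the symbol of the partition cell it falls in."""
    # np.digitize returns the cell index; len(boundaries) + 1 cells in total.
    cells = np.digitize(series, boundaries)
    return "".join(alphabet[c] for c in cells)

def d_markov_model(symbol_string, alphabet="abcd", depth=1):
    """Estimate a depth-D Markov model: states are the observed D-symbol words, and
    pi_tilde[state][sigma] approximates the probability of emitting sigma next."""
    counts = {}
    for k in range(depth, len(symbol_string)):
        state = symbol_string[k - depth:k]   # current state = last D symbols
        sigma = symbol_string[k]             # next emitted symbol
        counts.setdefault(state, {s: 0 for s in alphabet})
        counts[state][sigma] += 1
    pi_tilde = {}
    for state, row in counts.items():
        total = sum(row.values())
        pi_tilde[state] = {s: row[s] / total for s in alphabet}
    return pi_tilde

# Hypothetical usage on a noisy sinusoid with a uniform 4-cell partition.
t = np.linspace(0.0, 20.0, 2000)
x = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(t.size)
symbols = symbolize(x, boundaries=[-0.5, 0.0, 0.5])
model = d_markov_model(symbols, depth=1)
```

A maximum entropy partition, as used for the experimental data in § 6, would instead place the cell boundaries so that each symbol occurs with roughly equal frequency.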
However, in a pattern classification problem, one may be interested only in a given class of possible future evolutions. For example, as illustrated in figure 4, while the systems G1, G2, ..., Gk yield different symbolic models A1, A2, ..., Ak, we may be interested only in matching a given template, i.e., in knowing how similar the systems are as far as strings with an even number of 0s are concerned (note: qA = {strings with an even number of 0s}). The operation of projective composition, defined in this paper, allows transformation of each model Ai to the structure of the template while preserving the distribution over the strings of interest, and is of critical importance in symbolic pattern classification problems. As shown in the sequel, the model order of the machines Ai is not particularly important; hence projective composition accomplishes model order reduction within a quantifiable error.

Figure 4. Symbolic template matching problem.

1.3 Organization of the paper

The paper is organized in seven sections including the present one. Section 2 presents preliminary concepts and pertinent results that are necessary for subsequent development. Section 3 introduces the concept of probabilistic finite state automata as finite encodings of probability measure spaces. The concept of Nerode equivalence is generalized to probabilistic automata and the key results on existence and uniqueness of minimal representations are established. Section 4 presents metrics on the space of probability measures on symbolic strings, which are shown to induce pseudometrics on the space of probabilistic finite state automata. Along this line, the concept of probabilistic synchronous composition is introduced and the results are elucidated with a simple example. Section 5 defines projective composition and invariance of projected distributions is established. A numerical example is provided for clarity of exposition. Section 6 demonstrates applicability of the developed method to a pattern classification problem on experimental data. The paper is summarized and concluded in § 7 with recommendations for future research.

2. Preliminary notions

A deterministic finite state automaton (DFSA) is defined (Hopcroft et al. 2001) as a quintuple Gi = (Q, Σ, δ, qi, Qm), where Q is the finite set of states and qi ∈ Q is the initial state; Σ is the (finite) alphabet of events. The Kleene closure of Σ, denoted as Σ*, is the set of all finite-length strings of events including the empty string ε; the set of all finite-length strings of events excluding the empty string ε is denoted as Σ+, and the set of all strictly infinite-length strings of events is denoted as Σ^ω. A subset of Σ^ω is called an ω-language on the alphabet Σ and a subset of Σ* is called a *-language. If the meaning is clear from the context, we refer to a set of strings simply as a language. The function δ: Q × Σ → Q represents the state transition map, δ*: Q × Σ* → Q is the reflexive and transitive closure (Hopcroft et al. 2001) of δ, and Qm ⊆ Q is the set of marked (i.e. accepting) states. For given functions f and g, we denote the composition as f ∘ g.

Definition 1: The classical Nerode equivalence N (Hopcroft et al. 2001) on Σ* with respect to a given language L ⊆ Σ* is defined as follows: for all x, y ∈ Σ*,

x N y ⟺ ( ∀u ∈ Σ*, (xu ∈ L) ⟺ (yu ∈ L) ).  (1)

A language L is regular if and only if the corresponding Nerode equivalence is of finite index (Hopcroft et al. 2001).
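To ground these preliminaries, the sketch below encodes a two-state DFSA over Σ = {0, 1} that accepts strings containing an even number of 0s (a hypothetical stand-in for the template states qA, qB of figure 4), extends δ to δ* over strings, and spot-checks the Nerode condition of Definition 1. It uses only the sufficient direction: strings that drive a DFSA from the initial state to the same state are Nerode-equivalent with respect to the accepted language; the converse holds when the machine is minimal.

```python
from itertools import product

# DFSA over alphabet {0, 1} accepting strings with an even number of 0s.
alphabet = "01"
delta = {
    ("qA", "0"): "qB", ("qA", "1"): "qA",
    ("qB", "0"): "qA", ("qB", "1"): "qB",
}
initial, marked = "qA", {"qA"}

def delta_star(state, string):
    """Reflexive and transitive extension of delta to strings (delta*)."""
    for sigma in string:
        state = delta[(state, sigma)]
    return state

def accepts(string):
    return delta_star(initial, string) in marked

def same_state(x, y):
    """Sufficient condition for x N y with respect to the accepted language:
    x and y reach the same state of the DFSA from the initial state."""
    return delta_star(initial, x) == delta_star(initial, y)

# Spot-check the Nerode condition on all suffixes of length up to 3.
x, y = "0110", "00"
assert same_state(x, y)
for n in range(4):
    for u in map("".join, product(alphabet, repeat=n)):
        assert accepts(x + u) == accepts(y + u)
```

Since this machine is minimal, its two states correspond exactly to the two Nerode classes of the language, so N is of finite index, as regularity requires.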
Probabilistic finite state automata (PFSA) considered in this paper are built upon deterministic finite state automata (DFSA) with a specified event generating function. The formal definition is stated next. Definition 2 (PFSA): A probabilistic finite state ~ automata (PFSA) is a quintuple Pi ¼ ðQ, , , qi , Þ where the quadruple (Q, , , qi) is a DFSA with unspecified marked states and the mapping ~ : Q ! ½0, 1 satisfies the following condition: X ~ j , Þ ¼ 1: ð2Þ ðq 8qj 2 Q, 2 In the sequel, ~ is denoted as the event generating function. For a PFSA Pl, cardinality of the set of states is denoted as NUMSTATES(PI). ~ Definition 3: For every PFSA Pi ¼ ðQ, , , qi , Þ, there is an associated stochastic matrix Q 2 RNUMSTATESðPl ÞNUMSTATESðPi Þ called the state transition probability matrix, which is defined as follows: Y X ~ j , Þ: ¼ ð3Þ ðq jk :ðqj , Þ¼qk We further note that for every stochastic matrix , there exists at least one row-vector } such that } ¼ }, where 8j }j = 0 and NUMSTATES X ðPi Þ }j ¼ 1, j¼1 ð4Þ where } is a stable long term distribution over the PFSA states. If is irreducible, then } is unique. Otherwise, there may exist more than one possible solution to equation (4), one for each eigenvector corresponding to unity eigenvalue. However, if the initial state is specified (as it is in this paper), then } is always unique. Several efficient algorithms have been reported in Kemeny and Snell (1960), Harrod and 823 Structural transformations of probabilistic finite state machines Plemmons (1984) and Stewart (1999) for computation of }. ⎧ ⎨ ⎩ Key definitions and results from measure theory that are used here are recalled. S1 Past q0 τ qk Definition 4 (-Algebra): A collection M of subsets of a non-empty set X is said to be a -algebra (Rudin 1988) in X if 2M has the following properties: (1) X 2 M (2) If A 2 M, then Ac 2 M where Ac is the complement c of A relative S1 to X, i.e., A ¼ X\A (3) If A ¼ n¼1 An and if An 2 M for n 2 N, then A 2 M. Theorem 1: If f is any collection of subsets of X, there exists a smallest -algebra M in X such that f M . Proof: See (Rudin 1988, Theorem 1.10) œ Definition 5 (Measure): A finite (non-negative) measure is a countably additive function , defined on a -algebra M, whose range is [0, K] for some K 2 R. Countable additivity means that if {Ai} is a disjoint countable collection of members of M, then ! 1 1 X [ Ai ¼ ðAi Þ: ð5Þ i¼1 i¼1 Theorem 2: If is a (non-negative) measure on a -algebra M, then (1) (;) ¼ 0 (2) (Monotonicity) A B ) ðAÞ ðBÞ if A, B 2 M. Proof: See (Rudin 1988, Theorem 1.19). œ Definition 6: A probability measure on a non-empty set with a specified -algebra M is a finite non-negative measure on M. Although not required by the theory, a probability measure is defined to have the unit interval [0, 1] as its range. Definition 7: A probability measure space is a triple ðX, M, qÞ where X is the underlying set, M is the -algebra in X and q is a finite non-negative measure on M. 3. Properties of probabilistic finite state automata For any 2 , the language ! has an important physical interpretation pertaining to systems modeled as probabilistic language generators (figure 5). A string 2 can be interpreted as a symbol sequence that has been already generated, and any string in ! qualifies as a possible future evolution. Thus, the language ! is conceptually associated with the current dynamical state of the modelled system. S3 Present Σω Possible Future S4 Figure 5. Interpretation of the language ! 
pertaining to dynamical evolution of a language generator. Definition 8: Given an alphabet , the set B ¼ 2 ! is defined to be the -algebra generated by the set fL : L ¼ ! where 2 g, i.e., the smallest -algebra on the set ! which contains the set fL : L ¼ ! where 2 g. Remark 1: Cardinality of B is @1 because both 2 and ! have cardinality @1. The following relations in the probability measure space ð! , B , qÞ are consequences of Definition 8. . qð! Þ ¼ qð"! Þ ¼ 1 . 8x, u 2 , xu! j x! and hence qðxu! Þ 5 qð! Þ Notation 1: For brevity, the probability qð! Þ is denoted as qðÞ8 2 in the sequel. Next the notion of probabilistic Nerode equivalence N q is introduced on for representing the measure space ð! , B , qÞ in the form of a PFSA. In this context, the following logical formulae are introduced. Definition 9: For x, y 2 , U1 ðx, yÞ ¼ ðqðxÞ ¼ 0 ^ qðyÞ ¼ 0Þ ð6aÞ U2 ðx, yÞ ¼ ðqðxÞ 6¼ 0 ^ qðyÞ 6¼ 0Þ^ qðxuÞ qðyuÞ ¼ : 8u 2 qðxÞ qðyÞ ð6bÞ Theorem 3 (probabilistic nerode equivalence): Given an alphabet , every measure space ð! , B , qÞ induces a right-invariant equivalence relation N q on defined as 8x, y 2 , xN q y , U1 ðx, yÞ _ U2 ðx, yÞ : ð7Þ Proof: Reflexivity and symmetry properties of the relation N q follow from Definition 9. Let x, y, z 2 be distinct and arbitrary strings such that xN q y and yN q z. Then, transitivity property of N q follows from equation (7) and Definition 9. Hence, N q is an equivalence relation. To establish right-invariance (Hopcroft et al. 2001) of N q , it suffices to show that 8x, y 2 , xN q y , 8u 2 , xuN q yu : ð8Þ 824 I. Chattopadhyay and A. Ray Let x, y, u be arbitrary strings in such that xN q y. If qðxÞ ¼ 0, qðyÞ ¼ 0 from equation (7). Then, it follows from the monotonicity property of the measure (see Theorem 2) that qðxuÞ ¼ 0, which implies the truth of U1 ðxu, yuÞ and hence the truth of xuN q yu. If qðxÞ 6¼ 0, then ðxN q yÞ ^ ðqðxÞ 6¼ 0Þ implies qðyÞ 6¼ 0. Hence, qðxuÞ qðxuÞ qðxÞ ¼ ¼ : qðxuÞ qðxÞ qðxuÞ ð9Þ If qðxÞ ¼ qðyÞ, then xN q y implies qðxuÞ ¼ qðyuÞ and also 8 2 ðqðxuÞ ¼ qðyuÞÞ. Similarly, if qðxÞ 6¼ qðyÞ, then xN q y implies qðxuÞ 6¼ qðyuÞ and also 8 2 ðqðxuÞ 6¼ qðyuÞÞ. Hence, 8 2 ððqðxuÞ ¼ qðyuÞÞ , ðqðxuÞ ¼ qðyuÞÞÞ: œ Definition 10 (perfect encoding): Given an alphabet , ~ is defined to be a perfect PFSA Pi ¼ ðQ, , , qi , Þ encoding of the measure space ð! , B , qÞ if 8 2 þ and ¼ 1 2 . . . r , ~ i , 1 Þ qðÞ ¼ ðq r1 Y ~ ðqi , 1 . . . k Þ, kþ1 Þ: ð ~ ‘ , 1 Þ qðhk Þ ¼ qðhk Þðq Theorem 4: A PFSA is a perfect encoding if and only if the corresponding probabilistic Nerode equivalence N q is of finite index. Proof (Left to Right): Let Q be the finite set of equivalence classes of the relation N q of the PFSA ~ that is constructed as follows: Pi ¼ ðQ, , , qi , Þ (1) Since N q is an equivalence relation on , there exists a unique qi 2 Q such that " 2 qi . The initial state of Pi is set to qi. (2) If x 2 qj and x 2 qk , then (qj, ) ¼ qk ~ j , Þ ¼ ðqðxÞ=qðxÞÞ where x 2 qj . (3) ðq First we verify that the steps 2 and 3 are consistent in the sense that and ~ are well-defined. Probabilistic nerode equivalence (see Theorem 3) implies that if x, y 2 , then ððx 2 qj Þ ^ ðx 2 qk Þ^ ðy 2 qj ÞÞ ) ðy 2 qk Þ. Therefore, the constructed is well-defined. Similarly, since ðx, y 2 qj Þ ) ðqðxÞ ¼ qðyÞÞ ^ ðqðxÞ ¼ qðyÞÞ, the constructed ~ is also well-defined. Therefore, the steps 2 and 3 are consistent. For ¼ 1 2 . . . r 2 þ , it follows that R Y qð1 . . . r Þ qð1 . . . r1 Þ r¼2 R 1 Y r¼1 ~ ð ðqi , 1 . . . 
r Þ, rþ1 Þ: ~ ð ðq‘ , 1 m Þ, mþ1 Þ m¼1 ð10Þ Remark 2: The implications of Definition 10 are as follows: The encoding introduced is perfect in the sense that the measure q can be reconstructed without error from the specification of Pi. ~ i , 1 Þ ¼ ðq r1 Y ~ ‘ , 1 Þ qðhj Þ ¼ qðhj Þðq k¼1 qðÞ ¼ pð1 Þ Hence, the criterion for perfect encoding (see Definition 10) is satisfied. ~ Right to Left: Let the PFSA Pi ¼ ðQ, , , qi , Þ be a perfect encoding; and let the probabilistic Nerode equivalence N q be of infinite index. Then, there exists a set of strings h , having the same cardinality as , such that each element of h belongs to a distinct N q -equivalence class. That is, 8hj , hk 2 h such that j 6¼ k, we have hj N q hk . Since qðhj Þ ¼ qðhk Þ ¼ 0 implies hj N q hk , there can exist at most one element such that qðh0 Þ ¼ 0. That is, h0 2 h qðhj Þ 6¼ 0 8hj 2 h fh0 g. ~ where Q is the For the PFSA Pi ¼ ðQ, , , qi , Þ, finite set of states, there exists q‘ 2 Q and hj , hk 2 h such Let 2 þ and that (qi,hj) ¼ (qi,hk) ¼ q‘. ¼ 1 2 r . Since Pi ia a perfect encoding, it follows from Definition 10 that r1 Y ~ ð ðq‘ , 1 m Þ, mþ1 Þ: m¼1 Now, it follows that ^ qðhj Þ qðhk ÞÞ ¼ qðhj Þ 6¼ 0 ^ qðhk Þ 6¼ 0 qðhj Þ qðhk Þ ) U2 ðhj , hk Þ ) hj N q hk which contradicts the initial assertion that hj N q hk 8hj , œ hk 2 h. This completes the proof. The construction in the first part of Theorem 4 is stated in the form of Algorithm 1. Algorithm 1: Construction of PFSA from the probability measure space (!, B , q) input: (!, B , q) such that N q is of finite index output: Pi 1 begin 2 Let Q ¼ fqj : j 2 J $ Ng be the set of equivalence classes of the relation N q ; 3 Set the initial state of Pi as qi such that " belongs to the equivalence class qi; If x 2 qj and x 2 qk , then set ðqj , Þ ¼ qk ; ~ j , Þ ¼ qðxÞ=qðxÞ where x 2 qj ; 4 ðq 5 end Corollary 1 ~ ðQ, , , qi , Þ -algebra B equivalence is (to Theorem 4): A PFSA Pi ¼ induces a probability measure q on the and the corresponding probabilistic Nerode of finite index. 825 Structural transformations of probabilistic finite state machines Proof: Let a probability measure q be constructed on the -algebra B as follows: ! r1 Y ~ i , 1 Þ 8 2 þ , qðÞ ¼ ðq ~ ð ðqi , 1 k Þ, kþ1 Þ : k¼1 It follows from Definition 10 that Pi perfectly encodes the measure q and Theorem 4 implies that the œ corresponding N q is of finite index. On account of Corollary 1, we can map any given PFSA to a measure space (!, B , q). Definition 11: Let p be the space of all probability measures on B and a be the space of all possible ~ PFSA Pi ¼ ðQ, , , qi , Þ. . The map H : a ! p is defined as HðPi Þ ¼ q such that ! r1 Y þ ~ i , 1 Þ 8 2 , qðÞ ¼ ðq ~ ð ðqi , 1 k Þ, kþ1 Þ k¼1 where ¼ 1 2 r : ð11Þ . The map H1 : p ! a is defined as ( Pi given by Algo:1 if N q is of finite index H1 ðqÞ ¼ Undefined otherwise: ð12Þ Lemma 1: Pi is a perfect encoding for HðPi Þ. Proof: The proof follows from Definition 10 and Definition 11. œ Next we show that, similar to classical finite state machines, an arbitrary PFSA can be uniquely minimized. However, the sense in which the minimization is achieved is somewhat different. To this end, we introduce the notion of reachable states in a PFSA and define isomorphism of two PFSA. Definition 12 (reachable states): Given a PFSA ~ the set of reachable states Pi ¼ ðQ, , , qi , Þ, RCHðPi Þ j Q is defined as: q~ 2 RCHðPi Þ ) 9 ¼ 1 R 2 such that ð ðqi , Þ ¼ q~ Þ ~ i , i Þ ^ ðq R 1 Y ! 
~ ð ðqi , 1 r Þ, rþ1 Þ > 0 : r¼1 Remark 3: The strict positivity condition in Definition 12 ensures that every state in the set of reachable states can actually be attained with a strictly non-zero probability. In other words, for every state qj 2 RCHðPi Þ, there exists at least one string !, initiating from qi and eventually terminating on state qj, such that the generation probability of ! is strictly positive. Definition 13 (Isomorphism): Two PFSA Pi ¼ ~ and P0i0 ¼ ðQ0 , , 0 , q0i0 , ~ 0 Þ are defined to ðQ, , , qi , Þ be isomorphic if there exists a bijective map : RCHðPi Þ ! RCHðP0i0 Þ such that ~ j , Þ ~ j , Þ 6¼ 0 ) ~ 0 ððqj Þ, Þ ¼ ðq ðq ^ 0 ððqj Þ, k Þ ¼ ððqj , k ÞÞ : Remark 4: The notion of isomorphism stated in Definition 13 generalizes graph isomorphism to PFSA by considering only the states that can be reached with non-zero probability and transitions that have a nonzero probability of occurrence. Theorem 5 (Minimization of PFSA): For a PFSA ~ H1 HðPi Þ is the unique minimal Pi ¼ ðQ, , , qi , Þ, realization of Pi in the sense that the following conditions are satisfied: (1) The PFSA H1 HðPi Þ perfectly encodes the probability measure HðPi Þ. (2) For a PFSA P0i0 that perfectly encodes HðPi Þ, the inequality CARDðRCHðH1 HðPi ÞÞÞ 5 CARDðRCHðP0i0 ÞÞ holds. (3) The equality, CARDðRCHðH1 HðPi ÞÞÞ 5 CARDðRCHðP0i0 ÞÞ, implies isomorphism of Pi and P0i0 in the sense of Definition 13. Proof: (1) The proof follows from the construction in Theorem 4. (2) Let P0i0 ¼ ðQ0 , , 0 , q0i0 , ~ 0 Þ be an arbitrary PFSA that perfectly encodes the probability measure HðPi Þ. Let us construct a PFSA Pyi0 ¼ ðQ0 [ fqd g, , y , qi0 , ~ y Þ, where qd is a new state not in Q0 , as follows: y ðq0j0 k Þ ¼ 8q0j 2 Q0 , 2 , ( qd if ~ 0 ðq0j k Þ ¼ 0 0 ðq0j0 k Þ ð13aÞ otherwise 8 2 , y ðqd , k Þ ¼ qd 8q0j 2 Q0 , 8 2 , ~ y ðq0j0 k Þ ¼ ~ 0 ðq0j0 k Þ: ð13bÞ ð13cÞ It is seen that Pyi0 perfectly encodes HðPi Þ as well, which follows from Definition 10 and equation (13c). It is claimed that CARDðRCHðPyi0 ÞÞ ¼ CARDðRCHðP0i0 ÞÞ ð14Þ based on the following rationale. Let q0j 2 RCHðP0i0 Þ. Following Definition 12, there exists a stringQ 2 such that 0 ðqi0 , xÞ ¼ q0j and ~ i0 , 1 Þ R1 ~ 0 ð 0 ðqi0 , 1 r Þ, rþ1 Þ > 0. It follows ðq r¼1 Q ~y from Equation (13c) that ~ y ðqi0 , 1 Þ R1 r¼1 y ð ðqi0 , 1 r Þ, rþ1 Þ > 0 and hence we conclude 826 I. Chattopadhyay and A. Ray using equation (13a) that y ðqi0 , xÞ ¼ qj 6¼ qd which then implies that q0j 2 RCHðPyi0 Þ. Hence we have 0 CARDðRCHðPi0 ÞÞ 5 CARDðRCHðPyi0 Þ. By a similar argument, we have CARDðRCHðPyi0 Þ 5 CARDðRCHðP0i0 ÞÞ and hence CARDðRCHðPyi0 Þ 5 CARDðRCHðP0i0 ÞÞ. Next, we claim 8x, y 2 y ðqi0 , xÞ ¼ y ðqi0 , yÞ ) xN HðPi Þ y ð15Þ based on the following rationale. Let x, y 2 s.t. ðy ðqi0 , xÞ ¼ y ðqi0 , yÞÞ. It follows from equations (13a–c) that ( ðHðPi ÞðxÞ ¼ 0 ^ HðPi ÞðyÞ ¼ 0Þ if y ðqi0 , xÞ ¼ qd ðHðPi ÞðxÞ 6¼ 0 ^ HðPi ÞðyÞ 6¼ 0Þ otherwise: Let the PFSA H1 HðPi Þ be denoted as ðQ# , , # , q#i# , ~ # Þ. Let eðq#j Þ denote the equivalence class of N HðPi Þ that q# represents. We define a map 0 : RCHðH1 HðPi ÞÞ ! 2RCHðPi0 Þ as follows: n o ðq#j Þ ¼ q0j 2 Q0 9x 2 eðq#j Þ s:t: 0 ðqi0 , xÞ ¼ q0j : ð19Þ We claim 8q#j , q#k 2 Q# ðC2Þ Let q0‘ 2 ðq#j Þ \ ðq#k Þ. Hence there exists xj 2 eðq#j Þ, xk 2 eðq#k Þ such that q0‘ ¼ 0 ðqi0 , xj Þ ¼ 0 ðqi0 , xk Þ ð16Þ Now, if ðHðPi ÞðxÞ ¼ 0 ^ HðPi ÞðyÞ ¼ 0Þ, then it follows from equation (6b) that xN HðPi Þ y. 
On the other hand, if ðHðPi ÞðxÞ 6¼ 0 ^ HðPi ÞðyÞ 6¼ 0Þ, then equation (6a) yields 8u ¼ 1 R 2 , ¼ ~ y ðqi0 , 1 Þ R 1 Y HðPi ÞðxuÞ HðPi ÞðyuÞ ¼ HðPi ÞðxÞ HðPi ÞðyÞ ~ y y ðqi0 , 1 r Þ, rþ1 ) xN HðPi Þ y: r¼1 We define a map : RCHðH1 HðPi ÞÞ ! RCHðPyi0 Þ as follows: Let q# 2 RCHðH1 HðPi ÞÞ and let "ðq# Þ be the equivalence class of the relation N HðPi Þ represented by q#. Let x ¼ 1 R 2 "ðq# Þ. HðPi ÞðxÞ > 0 ðSee Definition 12Þ y ) ~ ðqi0 , 1 Þ R 1 Y y y ~ ðqi0 , 1 r Þ, rþ1 > 0 xj N HðPi Þ xk ) 9u 2 r¼1 which contradicts equation (21). Next we claim that ðC3Þ Let x1 , x2 2 Eðq#j Þ such that 0 ðqi , x1 Þ ¼ q0j 0 ðqi , x2 Þ ¼ q0k ) y ðqi0 , xÞ 2 RCHðPyi0 Þ: Let ðq# Þ ¼ y ðqi0 , xÞ. Note that (q#) depends on the choice of x. Let q#1 , q#2 2 RCHðH1 HðPi ÞÞ such that ðq#1 Þ ¼ ðq#2 Þ. If x1, x2 are the corresponding strings chosen to define ðq#1 Þ, ðq#2 Þ, we have y ðqi0 , x1 Þ ¼ y ðqi0 , x2 Þ which implies x1 N HðPi Þ x2 , i.e., q#1 ¼ q#2 . Hence we conclude is injective which, in turn, implies ð17Þ Finally, from equations (14) and (17), it follows that ð18Þ Let P0i0 ¼ ðQ0 , , 0 , qi0 , ~ 0 Þ be an arbitrary PFSA that perfectly encodes HðPi Þ such that CARDðRCHðH1 HðPi ÞÞÞ 5 CARDðRCHðP0i0 ÞÞ: HðPi Þðxj uÞ HðPi Þðxk uÞ 6¼ ð21Þ HðPi Þðxj Þ HðPi Þðxk Þ but, P0i0 perfectly encodes HðPi Þ implying HðPi Þðxk uÞ HðPi Þðxj uÞ 6¼ 8u ¼ 1 R 2 HðPi Þðxj Þ HðPi Þðxk Þ ! R 1 Y ¼ ~ 0 ðq0‘ , 1 Þ ~ 0 ð 0 ðqi0 , 1 r Þ, rþ1 ðSince Pyi0 perfectly encodes HðPi ÞÞ CARDðRCHðH1 HðPi ÞÞÞ 5 CARDðRCHðP0i0 ÞÞ: ð20Þ that 8q#j 2 RCHðH1 HðPi ÞÞ CARDððq#j ÞÞ ¼ 1 r¼1 CARDðRCHðH1 HðPi ÞÞÞ 5 CARDðRCHðPyi0 ÞÞ: \ q#j 6¼ q#k ) ðq#j Þ ðq#k Þ ¼ : ðC1Þ with q0j 6¼ q0k : ð22Þ Therefore, CARDððq#j ÞÞ41 X ¼) q#k 2 RCHðH1 HðPi ÞÞ CARDððq#k ÞÞ 0 4CARDðRCHðH1 HðPi ÞÞÞ ¼)CARDðRCHðP0i0 ÞÞ4CARDðRCHðH1 HðPi ÞÞÞ which contradicts C1 thus proving C3. On account of C2 and C3, let us define a bijective map ~ #j Þ ¼ 0 ðqi , xÞ, ~ : RCHðH1 HðPi ÞÞ ! RCHðP0i0 Þ as ðq # x 2 eðqj Þ. Then, 9 8k 2 , 8q#j 2 RCHðH1 HðPi ÞÞÞ, x 2 eðq#j Þ > = ð23Þ HðP Þðx Þ i k > ~ #j Þ, k Þ ; ~ # ðq#j , k Þ ¼ ¼ ~ # ððq HðPi ÞðxÞ 827 Structural transformations of probabilistic finite state machines ~ # ðq#j , k ÞÞ ¼ 0 ðqi0 , xk Þ ð ~ #j Þ, k Þ ¼ 0 ð 0 ðqi0 , k Þ, k Þ ¼ 0 ððq ð24Þ which implies that H1 HðPi Þ and P0i0 , are isomorphic in the sense of Definition 13. This completes the proof. œ ~ the Theorem 6: For a PFSA Pi ¼ ðe, , , Ei , Þ, function ~ : Q ! Q can be extended to ~ : Q ! Q as 2 , ~ j , "Þ ¼ 1 ðq 2 , ~ j , Þ ¼ ðq ~ j , Þðq ~ j , Þððq ~ ðq j , Þ, Þ: 8qj 2 Q, ð25Þ Proof: Let ¼ HðPi Þ. We note that that Pi perfectly encodes q (see Lemma 1). It follows from Theorem 4 that ~ j , "Þ ¼ 8qj 2 Q, ðq qðx"Þ qðxÞ ¼ ¼ 1 where ðqi , xÞ ¼ qj : qðxÞ qðxÞ Similarly, for a string initiating from state qj, where 2 , 2 , we have qðxÞ qðxÞ qðxÞ ¼ : qðxÞ qðxÞ qðxÞ 4. Metrization of the space p of probability measures on BD Metrization of p is important for differentiating physical processes modeled as dynamical systems evolving probabilistically on discrete state spaces of finite cardinality. In this section, we introduce two metric families, each of which captures a different aspect of such dynamical behaviour and can be combined to form physically meaningful and useful metrics for system analysis and design. Definition 14: Given two probability measures q1 , q2 on the -algebra B and a parameter s 2 ½1, 1, the function ds : p p ! 
½0, 1 is defined as follows: !1=s jj X q1 ðxj Þ q2 ðxj Þs ds ðq1 , q2 Þ ¼ sup q ðxÞ q ðxÞ x 2 j¼1 1 2 8s 2 ½1, 1Þ ð28aÞ q1 ðxj Þ q2 ðxj Þ : d1 ðq1 , q2 Þ ¼ sup max q2 ðxÞ x 2 2 q1 ðxÞ ð28bÞ ð26Þ Theorem 8: The space p of all probability measures on B is ds-metrizable for s 2 ½1, 1. ~ j , Þ. Also, (qi, x) ¼ qj We note that qðxÞ=qðxÞ ¼ ðq implies (qj, ) ¼ (qi, x). Therefore, qðxÞ=qðxÞ ¼ ~ ððq j , Þ, Þ and hence Proof: Strict positivity and symmetry properties of a metric follow directly from Definition 14. Validity of the remaining property of triangular inequality follows by application of Minkowski inequality (Rudin 1988). œ ~ j , Þ ¼ ðq ~ j , Þððq ~ ~ j , Þ ¼ ðq ðq j , Þ, Þ: œ This completes the proof. Theorem 7: For a measure space ð! , B , qÞ, H H1 ðqÞ ¼ q ð27Þ i.e., H H1 is the identity map from p onto itself. ~ We note Pi Proof: Let H1 ðqÞ ¼ Pi ¼ ðQ, , , qi , Þ. perfectly encodes q (See Lemma 3.1). Let HðPi Þ ¼ q0 . We claim 8x 2 , qðxÞ ¼ q0 ðxÞ: The result is immediate for jxj ¼ 0, i.e., x ¼ . For jxj 1, we proceed by the method of induction. For jxj ¼ 1, we note ~ i , Þ ¼ qðÞ ðperfect EncodingÞ: 8 2 , q0 ðÞ ¼ ðq Next let us assume that 8x 2 , s.t. jxj ¼ r 2 N, Since 8x 2 with jxj ¼ r 2 N, q ðxÞ ¼ qðxÞ. it follows that 0 0 0 ~ j , Þ where ðqi , xÞ ¼ qi q ðxÞ ¼ q ðxÞðq ~ i , Þ ¼ qðxÞ: ¼ qðxÞðq This completes the proof. œ Definition 15: Let M be a right invariant equivalence relation on and the ith equivalence class of M be denoted as Mi ; i 2 I, where I is an arbitrary index set. Let q be a probability measure on the -algebra B inducing the probabilistic Nerode equivalence N q on j with the jth equivalence class of N q denoted as N q , j 2 J , where J is an index set distinct from I . Then, the map M : p ! ½0, 1CARDðI Þ ½0, 1CARDðJ Þ is defined as X M ðqÞij ¼ qðxÞ: x 2 Mi \N jq be two probability Definition 16: Let q1 , q2 measures on the -algebra B . Then, the function dF : p p ! ½0, 1 is defined as follows: dF ðq1 , q2 Þ ¼ N q2 ðq1 Þ N q2 ðq2 Þ , F ð29Þ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi where kkF ¼ Trace½H is the Frobenius norm of the operator , and H is the Hermitian of . Definition 16 implies that if I and J are the index sets corresponding to N q1 and N q2 respectively, 828 I. Chattopadhyay and A. Ray then N q1 ðq2 Þ 2 ½0, 1CARDðI Þ ½0, 1CARDðJ Þ and N q2 ðq1 Þ 2 ½0, 1CARDðJ Þ ½0, 1CARDðI Þ . Theorem 9: The function dF is a pseudometric on the space p of probability measures. Proof: The Frobenius norm on a probability space satisfies the metric properties except strict positivity because of the almost sure property of a probability measure. œ probability measure. Thus when comparing two probabilistic finite state machines, we need not concern ourselves with whether the machines are represented in their minimal realizations; the distance between two non-minimal realizations of the same PFSA is always zero. However this implies that ,s only qualifies as a pseudo-metric on a. Theorem 10: For 8 2 ½0, 1Þ and 8s 2 ½1, 1Þ, the para metized function ,s ¼ dF þ ð1 Þds is a metric on p. 4.1 Explicit computation of the pseudometric m for PFSA Proof: Following Theorems 8 and 9, ds is a metric for s 2 ½1, 1 and dF is a pseudometric on p. Non-negativity, finiteness, symmetry and sub-additivity of ,s follow from the respective properties of dF and ds. Strict positivity of ,s on 2 ½0, 1Þ is established below The pseudometric ,s is computed explicitly for pairs of PFSA over the same alphabet. 
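The explicit computations that follow (Lemma 2, Lemma 3 and Algorithm 2) rely on the stable state probability vector ℘ of Definition 3, i.e., a unit-sum left eigenvector of the state transition probability matrix Π satisfying ℘Π = ℘ (equation (4)). The sketch below assembles Π from the transition map δ and the event generating function π̃, and extracts ℘ by eigen-decomposition; the three-state machine at the end is a hypothetical example, not one of the machines used elsewhere in this paper, and the function names are assumptions introduced for illustration.

```python
import numpy as np

def transition_probability_matrix(states, alphabet, delta, pi_tilde):
    """Pi[j, k] = sum of pi_tilde(q_j, sigma) over events sigma with
    delta(q_j, sigma) = q_k (equation (3))."""
    index = {q: j for j, q in enumerate(states)}
    Pi = np.zeros((len(states), len(states)))
    for q in states:
        for sigma in alphabet:
            Pi[index[q], index[delta[(q, sigma)]]] += pi_tilde[(q, sigma)]
    return Pi

def stable_distribution(Pi):
    """Unit-sum left eigenvector of Pi for eigenvalue 1 (equation (4)),
    obtained from the eigen-decomposition of the transpose of Pi."""
    vals, vecs = np.linalg.eig(Pi.T)
    p = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]))
    return p / p.sum()

# Hypothetical 3-state PFSA over {0, 1}.
states, alphabet = ["q1", "q2", "q3"], ["0", "1"]
delta = {("q1", "0"): "q2", ("q1", "1"): "q3",
         ("q2", "0"): "q3", ("q2", "1"): "q1",
         ("q3", "0"): "q1", ("q3", "1"): "q2"}
pi_tilde = {("q1", "0"): 0.7, ("q1", "1"): 0.3,
            ("q2", "0"): 0.4, ("q2", "1"): 0.6,
            ("q3", "0"): 0.8, ("q3", "1"): 0.2}

Pi = transition_probability_matrix(states, alphabet, delta, pi_tilde)
p = stable_distribution(Pi)   # p @ Pi equals p up to numerical error
```

For an irreducible Π this yields the unique ℘; if Π were reducible, the specified initial state singles out the intended solution, and a Cesàro-averaged iteration of the initial distribution under Π would be the natural substitute, in line with the references cited after equation (4).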
Before proceeding to the general case, ,s is computed for the special case, where the pair of PFSA have identical state sets, initial states and transition maps. ,s ðq1 , q2 Þ ¼ 0 ) ð1 Þds ðq1 , q2 Þ ¼ 0 ) q1 ¼ q2 : ð30Þ Lemma 2: Given two PFSA P1i ¼ ðQ, , , qi , ~ 1 Þ, P2i ¼ ðQ, , , qi , ~ 2 Þ, and 2 , the steps for computation of 0, s ðP1i , P2i Þ are œ Remark 5: If two physical processes are modelled as discrete-event dynamical systems, then the respective probabilistic language generators can be associated with probability measures q1 and q2 . The metric ds ðq1 , q2 Þ is related to the production of single symbols as arbitrary strings and hence captures the difference in short term dynamic evolution. In contrast, the pseudometric dF is related to generation of all possible strings and therefore captures the difference in long term behavior of the physical processes. The metric ,s thus captures the both short-term and long-term behaviour with respective relative weights of 1 and . Definition 17: The metric ,s on p for 2 ½0, 1Þ, s 2 ½1, 1, induces a function ,s on a a as follows: 8Pi , P0i0 2 a, ,s ðPi , P0i0 Þ ¼ ,s ðHðPi Þ, HðP0i0 ÞÞ: ð31Þ Corollary 2 (to Theorem 10): The function ,s in Definition 17 for 2 ½0, 1Þ and s 2 ½1, 1 is a pseudometric on a. Specifically, the following condition holds: ,s ðPi , H1 HðPi ÞÞ ¼ 0: ð32Þ Proof: Following Theorem 7, ,s ðPi , H1 HðPi ÞÞ ¼ ,s ðHðPi Þ, H H1 HðPi ÞÞ ¼ ,s ðHðPi Þ, ðH H1 Þ HðPi ÞÞ ¼ ,s ðHðPi Þ, HðPi ÞÞ ¼ 0: œ Remark 6: Corollary 2 can be physically interpreted to imply that the metric family ,s does not differentiate between different realizations of the same Set ðqj Þ ¼ ~ 1 ðqj , Þ ~ 2 ðqj , Þ Then, 0, s ðP1i , P2i Þ ¼ max ðqj Þ s : qj 2 Q P Proof: Let HðP2i ÞðxÞ=HðP2i ÞðxÞ denote a j j-dimensional vector-valued function, where 2 . For proof of the lemma, it suffices to show that the following relation holds: HðP1i ÞðxÞ HðP2i ÞðxÞ , ¼ max ðqj Þ s : sup ds 1 2 qi 2 Q HðPi ÞðxÞ HðPi ÞðxÞ x2 Since P1i perfectly encodes HðP1i Þ, it follows that ð8x, y 2 , ðqi , xÞ ¼ ðqi , yÞÞ implies HðP1i Þðxk Þ HðP1i ÞðyÞ 1 ~ ðq , Þ ¼ ¼ , j k HðP1i ÞðxÞ HðP1i ÞðyÞ where (qi, x) ¼ qj. Similar argument holds for HðP2i Þ. Hence, it follows that for computing 0, s ðP1i , P2i Þ, only one string needs to be considered for each state qj 2 Q. That is, HðP1i Þðxk Þ HðP2i Þðxk Þ 0, s ðP1i , P2i Þ ¼ max x:ðqi ,xÞ¼qj HðP1i ÞðxÞ HðP2i ÞðxÞ s ¼ max ~ 1 ðqj , k Þ ~ 2 ðqj , k Þ s x:ðqi ,xÞ¼qj ¼ max ðqj Þ s : qj 2 Q Lemma 3: Let }1 , }2 be the stable probability distributions for PFSA P1i ¼ ðQ, , , qi , ~ 1 Þ and P2i ¼ ðQ, , , qi , ~ 2 Þ respectively. Then, lim ,s ðP1i , P2i Þ ¼ d2 ð}1 , }2 Þ !1 829 Structural transformations of probabilistic finite state machines Proof: Since P1i and P2i have the same initial state and state transition maps, ( \ k if j 6¼ k j N HðP2 Þ ¼ N HðP1 Þ j k i i N HðP1 Þ ¼ N HðP2 Þ otherwise i j N HðP1 Þ i i k N HðP2 Þ j where and are the jth and kth equivalence classes (i.e., states qj and qk) for P1i , P2i , respectively. The result follows from Definition 16 and Corollary 11 and noting that 8 if j ¼ k < qðxÞ ¼ }1 j : N HðP2 Þ ðHðP1i ÞÞ ¼ x:ðqi ,xÞ¼qj : i jk 0 otherwise Theorem 11: Given two PFSA P1i ¼ ðQ, , , qi , ~ 1 Þ and P2i ¼ ðQ, , , qi , ~ 2 Þ, the pseudometric ,s ðP1i , P2i Þ can be computed explicitly for 2 ½0, 1Þ and s 2 ½1, 1 as ,s ðP1i , P2i Þ ¼ lim ,s ðP1i , P2i Þ þ ð1 Þ0, s ðP1i , P2i Þ: !1 ð33Þ Proof: The result follows from Theorem 10 and Corollary 2. œ The algorithm for computation of the the pseudometric is presented below. 
To extend the approach presented in Lemma 2 to Algorithm 2: Computation of , s ðPi , P0i0 Þ input: Pi , P0i0 , s, output: , s ðPi , P0i0 Þ 1 begin , , ~ 12 Þ ¼ Pi P00 ; 2 Compute P12 ¼ ðQ, i ~ 21 Þ ¼ P00 Pi ; 3 Compute P21 ¼ ðQ, , , i 4 for j ¼ 1 to CARDðQÞ do 5 ðJÞ ¼ ~ 12 ðqj , k Þ ~ 21 ðqj , k Þ s; 6 endfor 7 0, s ðPi , P0i0 Þ ¼ maxj ðjÞ; 8 Compute }12 ; = State prob: for P12 ðDef: 2:3Þ = 9 Compute }21 ; = State prob: for P21 ðDef: 2:3Þ = 10 Compute d ¼ }12 }21 2 ; 11 Compute , s ðPi , P0i0 Þ ¼ d þ ð1 Þ0, s ðPi , P0i0 Þ; 12 end Then, Pi Gi0 ¼ ðQ Q0 , , , ðqi , q0i0 Þ, ~ Þ, where ððqj , q0k0 Þ, Þ 8 0 0 > > < ðqj , Þ, ðqj0 , Þ , ¼ > > : Undefined if ðqj , Þ and 0 ðq0j0 , Þ are defined otherwise ð34Þ ~ j , Þ: ~ ððqj , q0k0 Þ, Þ ¼ ðq Remark 7: Synchronous composition for PFSA is not commutative, i.e., for an arbitrary pair Pi and Gi 0 , Pi Gi0 6¼ Gi0 Pi : Definition 18: The binary operation of synchronous composition of PFSA, denoted as : p p ! p, is defined as follows: ( Let ~ Pi ¼ ðQ, , , qi , Þ Gi0 ¼ ðQ0 , , 0 , q0i0 , ~ 0 Þ: ð36Þ Synchronous composition of PFSA is associative, i.e., 8P1i1 , P2i2 , P3i3 2 p, P1i1 P2i2 P3i3 ¼ P1i1 P2i2 P3i3 Theorem 11: For a pair of PFSA Pi and Gi0 over the same alphabet, HðPi Þ ¼ HðPi Gi0 Þ: ð37Þ Proof: Let q ¼ HðPi Þ and q0 ¼ HðPi Gi0 Þ. It suffices to show that 8x 2 , qðxÞ ¼ q0 ðxÞ: ðC4Þ For jxj ¼ 0, i.e., x ¼ , the result is immediate. For jxj = 1, we use the method of induction. Since Pi perfectly encodes HðPi Þ, ~ i , Þ 8 2 , qðÞ ¼ ðq ¼ ~ ððqi , q0i0 Þ, Þ ¼ q0 ðÞ: Hence C4 is true for jxj 5 1. With the induction hypothesis 8x 2 , s:t:, jxj ¼ r 2 N, qðxÞ ¼ q0 ðxÞ ð38Þ we proceed with an arbitrary 2 to yield ~ j , Þ where ðqi , xÞ ¼ qj qðxÞ ¼ qðxÞðq 0 ¼ q ðxÞ~ ððqj , q0j0 Þ, Þ where ððqi , q0i0 Þ, xÞ ¼ ðqj , q0j0 Þ ¼ q0 ðxÞ: This completes the proof. arbitrary pairs of PFSA, we need to define the synchronous composition of a pair of PFSA. ð35Þ œ Theorem 12: Given a pair of PFSA Pi, P0i0 and an arbitrary parameter s 2 ½1, 1, Algorithm 2 computes ,s(Pi, P0i0 ) for 2 ½0, 1Þ, s 2 ½1, 1. Proof: By Theorem 11, 8 2 ½0, 1Þ, s 2 ½1, 1, ,s ðPi , P0i0 Þ ¼ ,s ðPi P0i0 , P0i0 Pi Þ: P0i0 P0i0 ð39Þ Since Pi and Pi have the same state sets, initial states and transition maps (see Definition 18), correctness of Algorithm 2 follows from Lemmas 2 and 3. œ 830 I. Chattopadhyay and A. Ray Example 1: The theoretical results of x 4 are illustrated with a numerical example. The following PFSA are considered: P1q1 ¼ ðfq1 , q2 , q3 g, f0, 1g, 1 , q1 , ~ 1 Þ ð40Þ P2qA ¼ ðfqA , qB g, f0, 1g, 2 , qA , ~ 2 Þ ð41Þ as shown in figures 6 and 7, respectively, and figure 8 illustrates the computed compositions P1q1 P2qA (above) and P2qA P1q1 (below). Following Algorithm 2, we have 3 2 k0:4 0:9 0:6 0:1ks 6 k0:2 0:9 0:8 0:1k 7 s7 6 7 6 6 k0:3 0:9 0:7 0:1ks 7 7: 6 ð42Þ s ¼ 6 7 6 k0:3 0:7 0:7 0:3ks 7 7 6 4 k0:4 0:7 0:6 0:3ks 5 k0:2 0:7 0:8 0:3ks As an illustration, we set s ¼ 1. Hence, 1 ¼ ½0:5 ) 0:7 0:6 0, 1 ðP1q1 , P2qA Þ 0:4 0:3 0:5T ¼ maxð1 Þ ¼ 0:7: ð43Þ ð44Þ The final state probabilities are computed to be 0:07 }P1 P2 ¼ ½0:15 }P2 P1 ¼ ½0:29 0:29 0:09 0:29 0:2 0:22 0:27 ð45Þ 0:043 0:043 0:043: ð46Þ For ¼ 0.5, we have 0:5, 1 P1q1 , P2qA ¼ 0:5d2 }P1 P2 , }P2 P1 þ 0:5 0:7 such that probability of occurrence of , given that x has already occurred, is 70% more in one system compared to the other. Also, the occurrence probability of any event, given an arbitrary string has already occurred, is different by no more than 70% for the two systems. 
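To make the computation of Example 1 concrete, the sketch below implements the same-structure case to which Algorithm 2 reduces once the two synchronous compositions are formed: the short-term term is the maximum over states of the s-norm of the difference between the event-generation rows (Lemma 2), the long-term term is the Euclidean distance between the stable state distributions (Lemma 3), and the two are blended by a weight in [0, 1) (written phi below; the paper's symbol for this weight did not survive extraction). The helpers repeat, in compact form, the sketch given in § 4, and the pair of machines at the end is a hypothetical placeholder rather than the PFSA of figures 6-8.

```python
import numpy as np

# Compact repeats of the helpers sketched in section 4, so this block runs on its own.
def build_Pi(states, alphabet, delta, pi):
    """State transition probability matrix of equation (3)."""
    idx = {q: j for j, q in enumerate(states)}
    Pi = np.zeros((len(states), len(states)))
    for q in states:
        for a in alphabet:
            Pi[idx[q], idx[delta[(q, a)]]] += pi[(q, a)]
    return Pi

def stable(Pi):
    """Unit-sum left eigenvector of Pi for eigenvalue 1 (equation (4))."""
    vals, vecs = np.linalg.eig(Pi.T)
    p = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]))
    return p / p.sum()

def short_term(pi1, pi2, states, alphabet, s):
    """Lemma 2: max over states of the s-norm of the difference of event-generation rows."""
    return max(
        np.linalg.norm([pi1[(q, a)] - pi2[(q, a)] for a in alphabet], ord=s)
        for q in states
    )

def nu(pi1, pi2, states, alphabet, delta, phi=0.5, s=1):
    """Pseudometric for two PFSA sharing states, initial state and transition map:
    phi weighs the long-term (stationary) discrepancy, (1 - phi) the next-event one."""
    d_long = np.linalg.norm(
        stable(build_Pi(states, alphabet, delta, pi1))
        - stable(build_Pi(states, alphabet, delta, pi2))
    )
    return phi * d_long + (1.0 - phi) * short_term(pi1, pi2, states, alphabet, s)

# Hypothetical pair of machines on a common 3-state structure.
states, alphabet = ["q1", "q2", "q3"], ["0", "1"]
delta = {("q1", "0"): "q2", ("q1", "1"): "q3",
         ("q2", "0"): "q3", ("q2", "1"): "q1",
         ("q3", "0"): "q1", ("q3", "1"): "q2"}
pi_a = {("q1", "0"): 0.7, ("q1", "1"): 0.3, ("q2", "0"): 0.4,
        ("q2", "1"): 0.6, ("q3", "0"): 0.8, ("q3", "1"): 0.2}
pi_b = {("q1", "0"): 0.5, ("q1", "1"): 0.5, ("q2", "0"): 0.6,
        ("q2", "1"): 0.4, ("q3", "0"): 0.9, ("q3", "1"): 0.1}
print(nu(pi_a, pi_b, states, alphabet, delta, phi=0.5, s=1))
```

For machines with different state sets, Algorithm 2 first computes the two probabilistic synchronous compositions of Definition 18, which share states, initial state and transition map, and then applies exactly this computation to them.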
The composition P1q1 P2qA shown in the upper part of figure 8 is an encoding of the measure HðP1i Þ and hence is a non-minimal realization of P1i , while the composition P2qA P1q1 shown in the lower part of figure 8 encodes HðP2i0 Þ and therefore is a non-minimal realization of P2i0 . Although the structures of the two compositions are identical in a graph-theoretic sense (i.e. there is a graph isomorphism between the compositions), they represent very different probability distributions on B . 5. Model order reduction for PFSA This section investigates the possibility of encoding an arbitrary probability distribution on B by a PFSA with a pre-specified graph structure. As expected, such encodings will not always be perfect. However, we will show that the error can be rigorously computed and hence is useful for very close approximation of large PFSA models by smaller models. Definition 19: The binary operation of projective ~ : p p ! p is defined as follows: composition 8 P ¼ ðQ, , , qi , ~ 0 Þ > < i Let Gi0 ¼ ðQ0 , , 0 , q0i0 , ~ 0 Þ > : Gi0 Pi ¼ ðQ0 Q, , , ðq0i0 , qi Þ, ~ Þ: ¼ 0:4599: 1/ 0.3 The pseudonorm 0, 1 ðP1q1 , P2qA Þ ¼ 0:7 is interpreted as follows. There exists a string x 2 and an event 2 qA3 1/ 0.4 qA1 qA2 1/ 0.2 1/ 0.4 0/ 0.6 q1 1/0.2 1/ 0.3 0/ 0.8 1/0.4 0/.7 0/.8 q2 qB2 0/ 0.6 q3 0/ 0.8 1/ 0.2 qB1 0/ 0.7 0/0.6 1/ 0.9 1/0.3 Figure 6. qB3 0/ 0.7 PFSA P1. qA3 1/ 0.9 qA1 1/ 0.9 qA2 1/ 0.7 1/0.9 0/0.3 1/.7 qA 0/ 0.1 1/ 0.7 qB2 qB 1/ 0.7 0/ 0.3 qB3 0/ 0.3 qB1 0/ 0.3 0/0.1 Figure 7. 0/ 0.1 0/ 0.1 2 PFSA P . Figure 8. P1 P2 (above) and P2 P1 (below). 831 Structural transformations of probabilistic finite state machines For notational simplicity set 8qj 2 Q and 8q0k0 2 Q0 , X #ðq0k0 , qj Þ ¼ ðHðGi0 ÞðxÞÞ: Projective composition preserves distribution which is defined next. Theorem 13: alphabet, ~ j H ~ k Þ ¼ Pi H ~ k (1) Pi ðG ~ j ÞH ~ j H ~ k Þ ðNon-associativeÞ ~ k 6¼ Pi ðG (2) ðPi G ~ j¼ ~ i ðNon-commutativeÞ: (3) Pi G 6 Gj P œ We justify the nomenclature ‘projective’ composition in the following theorem. Theorem 14: For arbitrary PFSA Pi and Gi0 over the same alphabet, ~ i ¼ Gi0 P ~ i P ~ i: ð48Þ Gi0 P ~ Definition 19 implies Proof: Let Pi ¼ ðQ, , , qi , Þ. z ~ 0 ~ that ðGi Pi Þ ¼ ðQ, , , qi , for ~ z computed as specified in equation (47). It further follows from ~ i ÞP ~ i ¼ ðQ, , , qi , ~ ~ Þ, Definition 19, that ðGi0 P ~ i ÞP ~ i have the same state set, ~ i and Gi0 P i.e., ðGi0 P initial state and state transition maps. Thus, it suffices to show that 8qj 2 Q, , ~ ðqj , Þ ¼ ~ ðqj , Þ: ð49Þ Considering the probabilistic synchronous ~ i Þ Pi ¼ ðQ Q, , , ðqi , qi Þ, ~ Þ composition ðGi0 P (see Definition 18), 8x 2 , ððqi , qi Þ, xÞ ¼ ðqi , qj Þ, for some qj 2 Q It follows that, for qk 6¼ qj, ! X H Gi0 Pi ðxÞ ¼ 0: #ðqk , qj Þ ¼ ð50Þ Finally we conclude 8qj 2 Q, 2 , P ! ~ ððqk , qj Þ, Þ qk 2 Q #ðqk , qj Þ P ~ ðqj , Þ ¼ qk 2 Q #ðqk , qj Þ This completes the proof. Theorem 15: (Projected Distribution Invariance): For two arbitrary PFSA Pi and Gi over the same alphabet, ! ½½Gi0 Pi ¼ ½½Gi0 Pi Pi : ~ Proof: Let Pi ¼ðQ, , , qi , Þ and Gi0 ¼ ðQ0 , , 0 , ! ! 0 ~0 qi0 , Þ. It follows that Gi0 Pi ¼ ðQ, , , qi , ~ Þ, where ! ~ is as computed in Definition 19. 
Using the same notation as in Definition 19, we have 8 2 , X HðGi0 ÞðxÞ x: ðqi , xÞ¼qj ¼ X #ðq0k0 , qj Þ~ 0 ðq0k0 , Þ q0k0 2 Q0 ¼ X (P #ðq0k0 , qj Þ q0k0 2 Q0 8 < X q0k0 2 Q0 P #ðq0k0 , qj Þ~ 0 ðq0k0 , Þ q0k0 2 Q0 ) #ðq0k0 , qj Þ 9 = ~ #ðq0k0 , qj Þ ~ ðqj , Þ: ¼ :q0 2 Q0 ; ð53Þ P P Since ½½Gi0 Pi jj ¼ x: ðqi , xÞ¼qj HðGi0 ÞðxÞ ¼ q0 0 2 Q0 k #ðq0k0 , qj Þ, it follows that 8 2 , P x: ðqi , xÞ¼qj HðGi0 ÞðxÞ ½½Gi0 Pi j ~ ¼ ~ ððqj , qj Þ, Þ ðsee Definition 18Þ: ð52Þ ¼ ~ ðqj , Þ P X x: ðqi , xÞ¼qj HðGi0 ÞðxÞ ) ½½Gi0 Pi j :ðqj , Þ¼q‘ X ~ ¼ ~ ðqj , Þ #ðqj , qj Þ~ ððqj , qj Þ, Þ #ðqj , qj Þ ¼ ~ z ðqj , Þ We note ½½Gi0 Pi is a probability vector, i.e., NUMSTATES X X ½½Gi0 Pi j ¼ HðGi0 ÞðxÞ ¼ 1: x 2 j¼1 k0 x: ððqi , qi Þ, xÞ ¼ðqk , qj Þ ¼ such that if Nj is the jth equivalence class (i.e. the jth state) of Pi, X then HðGi0 ÞðxÞ ¼ }j : x 2 Nj Proof: The results follow from Definition 19. ~ ½½Gi0 Pi ¼ } 2 ½0, 1NUMSTATESðPi Þ , ð47Þ For PFSA Pi, Gj, Hk over the same z projected Definition 20 (projected distribution): The projected distribution } 2 ½0, 1NUMSTATESðPi Þ of an arbitrary PFSA Gi0 with respect to a given PFSA Pi is defined by the map ½½Pi : a ! ½0, 1NUMSTATESðPi Þ as follows: x: ððq0i0 , qi Þ, xÞ ¼ðq0k0 , qj Þ ~ i ¼ ðQ, , , qi , ~ ~ Þ s.t. Then, Gi0 P P 0 ~ ððq0k0 , qj Þ, Þ q0k0 2 Q0 #ðqk0 , qj Þ ~ P : ~ ðqj , Þ ¼ 0 q0k0 , 2 Q0 #ðqk0 , qj Þ the :ðqj , Þ¼q‘ ð51Þ ) œ 1 HðGi0 Þðxj‘ Þ ¼ ~ ðqj , q‘ Þ, ½½G Pi j i0 ð54Þ 832 I. Chattopadhyay and A. Ray where j‘ j such that 2 j‘ ) ðqj , Þ ¼ q‘ and ~ ðqj , q‘ Þ is the j‘th element of the stochastic state transition matrix ~ corresponding to the PFSA ~ i . It follows from (54), that Gi0 P X qj 2 Q HðGi0 Þðxj‘ Þ ¼ X ~ ½½Gi0 Pi j ðqj , q‘ Þ qj 2 Q ) ½½Gi0 Pi j‘ ¼ X ~ ½½Gi0 Pi j ðqj , q‘ Þ: qj 2 Q ð55Þ It follows that ½½Gi0 Pi satisfies the vector equation ~ ½½Gi0 Pi ¼ ½½Gi0 Pi : The idea is further clarified in the commutative diagram of figure 9. Probabilistic synchronous composition is an exact representation with no loss of statistical information; but the model order increases due to the product automaton construction. On the other hand, the projective composition has the same number of states ~ as the second argument in ðÞðÞ. Both representations have exactly the same projected distribution with respect ~ an extremely to a fixed second argument, thus making useful tool for model order reduction. Algorithm 3 computes the projected composition of two arbitrary PFSA. ð56Þ ~ i P is the stable probability We note that ½½Gi0 P i ~ i and hence, we have distribution of the PFSA Gi0 P ~ ~ i P ¼ ½½Gi0 P ~ i P : ½½Gi0 P i i ð57Þ In general, a stochastic matrix may have more than one eigenvector corresponding to unity eigenvalue (Bapat and Raghavan 1997). However, as per our definition of PFSA (see Definition 2), the initial state is explicitly specified. It follows that the right hand side of (53) assumes that all strings begin from the same state qi 2 Q. Hence it follows: ~ i P : ½½Gi0 Pi ¼ ½½Gi0 P i This completes the proof. ð58Þ œ Algorithm 3: Computation of Projected Composition 1 2 3 4 5 6 7 8 ~ Gi0 ¼ ðQ0 , , 0 , qi0 , ~ 0 Þ input: Pi ¼ ðQ, , , qi , Þ, ~ i0 output: Pi G begin Compute Pi Gi0 ¼ ðQ Q0 , , , ðqi , q0i0 Þ, ~ Þ; = See Definition 4:5 = Compute }; = State prob: for Pi Gi0 Def: 2:3 = Set up matrix T s.t. Tjk ¼ }ððqj , q0k ÞÞ; ~ Compute ~ ~ ¼ T; ! return Pi Gi0 ¼ ðQ0 , , 0 , qi0 , ~ ~ Þ end 5.2 Incurred error in projective composition Given any two PFSA Pi and Gi0 , the incurred error in ! 
projective composition operation P Gi_0 is quantified in the pseudo-metric defined in x 4 as follows: ~ i0 Þ: v,s ðPi , Pi G 5.1 Physical significance of projected distribution invariance Given a symbolic language theoretic PFSA model for a physical system of interest, one is often concerned with only certain class of possible future evolutions. For example, in the paradigm of deterministic finite state automata (DFSA) (Ramadge and Wonham 1987), the control requirements are expressed in the form of a specification language or a specification automaton. In that setting, it is critical to determine which state of the specification automaton the system is currently visiting. In contrast, for a PFSA, the issue is the probability of certain class of future evolutions. For example, given a large order model of a physical system, it might be necessary to work with a much smaller order PFSA, that has the same long-term behaviour with respect to a specified set of event strings. Although projective composition may incur a representation error in general, the long-term distribution over the states of the projected model is preserved as shown in Theorem 14. ð59Þ Next we establish a sufficient condition for guaranteeing zero incurred error in projective composition. ~ Theorem 16: For arbitrary PFSA Pi ¼ ðQ, , , qi , Þ with corresponding and Gi0 ¼ ðQ0 , , 0 , q0i0 , ~ 0 Þ 0 probabilistic Nerode equivalence relations N and N , we have ! 0 N 5 N ) v,s ðGi0 , Gi0 Pi Þ ¼ 0: 0 Proof: N 5 N implies that there exists a possibly non-injective map f: Q ! Q such that 8x 2 , ðqi , xÞ ¼ qj 2 Q ) ðq0i0 , xÞ ¼ fðqj Þ 2 Q0 : It then follows from Definition 19 that #ðq0k0 , qj Þ ¼ 0 if fðqj Þ 6¼ q0k0 : 833 Structural transformations of probabilistic finite state machines Denoting Gi0 Pi ¼ ðQ Q0 , , , ðqi , q0i0 Þ, ~ Þ and ! ! Gi0 Pi ¼ ðQ, , , qi , ~ Þ, we have fron Definition 19 that P 0 ~ ððq0k0 , qj Þ, Þ ! q0k0 2 Q0 #ðqk0 , qj Þ P ~ ðqj , Þ ¼ 0 q0 0 2 Q0 #ðqk0 , qj Þ k 0 ¼ ~ ððfðqj Þ, qj Þ, Þ ¼ ~ ðfðqj Þ, Þ, where the last step follows from Definition 18. The proof is completed by noting 8x 2 , Using Algorithm 3, we compute the event ~ 21 as ~ 12 and functions 2 " # 0:1250 0:7197 0:2803 6 ~ 12 ¼ ~ 21 ¼ 4 0:1250 , 0:6891 0:3109 0:1250 Example 2: The results of x 5 are illustrated considering the PFSA models described in Example 1. Given the PFSA models P1q1 ¼ ðfq1 , q2 , q3 g, , 1 , q1 , ~ 1 Þ and P2qA ¼ ðfqA , qB g, , 2 , qA , ~ 2 Þ (see (40) and (41)), we ! compute the projected compositions P1q1 P2qA ¼ ! 1 2 12 2 ðfqA , qB g, , , qA , ~ Þ and PqA Pq1 ¼ ðfq1 , q2 , q3 g, The synchronous compositions , 1 , q1 , ~ 21 Þ. P1q1 P2qA and P2qA P1q1 were computed in Example 1 and are shown in figure 8. Denoting the associated stochastic transition matrices for P1q1 P2qA and P2qA P1q1 as 12 and 21 respectively, we note 3 3 2 2 0:2000:8 0:9000:1 ðq1 , qA Þ 6 00:3:700 7 6 00:9:100 7 ðq , q Þ 7 7 6 6 2 A 7 7 6 6 6 :4000:60 7 6 :9000:10 7 ðq3 , qA Þ 7 7 6 6 12 ¼ 6 7, 21 ¼ 6 7 6 0:2000:8 7 6 0:7000:3 7 ðq1 , qB Þ 7 7 6 6 7 7 6 6 4 00:3:700 5 4 00:7:300 5 ðq2 , qB Þ 7 0:8750 5: ðq3 , qB Þ: ð62Þ ! }21 ¼ ½ 0:3333 0:3333 0:3333 : The operations are illustrated in figures 10 and 11 and invariance of the projected distribution is checked as follows: ! }12 ð1Þ þ }12 ð2Þ þ }12 ð3Þ ¼ 0:3017 ¼ }12 ð1Þ ð63aÞ ! }12 ð4Þ þ }12 ð5Þ þ }12 ð6Þ ¼ 0:6983 ¼ }12 ð2Þ ð63bÞ ! }21 ð2Þ þ }21 ð5Þ ¼ 0:333 ¼ }21 ð2Þ ð63cÞ ! }21 ð3Þ þ }21 ð6Þ ¼ 0:333 ¼ }21 ð3Þ: ð63dÞ 6. 
An engineering application of pattern recognition Projective composition is applied to a symbolic pattern identification problem. Continuous-valued data from a laser ranging array in a sensor fusion test bed are fed to a symbolic model reconstruction algorithm (CSSR) The stable probability distributions}12 and }21 are computed to be }12 ¼ ½0:1458 0:0695 0:0864 0:2017 0:2186 0:2780 ð60aÞ }21 ¼ ½0:2917 0:2917 0:2917 0:0417 0:0417 0:0417: ð60bÞ ⊗Gi′ Pi ⊗ Gi′ → ⊗Gi′ [• Gi′ [• [ Pi q1 1/0.2 1/0.4 0/.7 0/0.6891 1/0.2803 0/.8 q2 → 2 ⊗P qA q3 0/0.6 .31 1 qA qB 1/0.3 Figure 10. P1q1 projectively composed with P2qA . 1/0.9 q1 0/0.3 1/.875 Gi′ 1/.875 [ 0 [• [ → Pi ⊗ Gi′ Gi′ 3 ! }12 ¼ ½ 0:3017 0:6983 , œ :7000:30 0:8750 0:8750 ð61Þ 1 ! 2 We note that the stable distributions for Pq1 PqA ! and P2qA P1q1 are given by HðGi0 ÞðxÞ ¼ ~ 0 ðq0i0 , xÞ ¼ ~ ððfðqi Þ, qi Þ, xÞ ! ! ¼ ~ ðqi , xÞ ¼ HðGi0 Pi ÞðxÞ: :4000:60 generating ℘ Figure 9. Commutative diagram relating probabilistic composition, projective composition and the original projected distribution. qA 0.7 1 0/0.1 qB → 1 ⊗P q1 0 .125 q2 q3 0/.125 1/.875 Figure 11. P2qA projectively composed with P1q1 . 834 I. Chattopadhyay and A. Ray (Shalizi and Shalizi 2004) to yield probabilistic finite state models over a four-letter alphabet. A maximum entropy partitioning scheme (Rajagopalan and Ray 2006) is employed to create the symbolic alphabet on the continuous time series. Figure 12 depicts the results from four different experimental runs. Two of those runs in the top two rows of figure 12 correspond to a human subject moving in the sensor field; the other two runs in the bottom two rows correspond to a robot representing an unmanned ground vehicle (UGV). The symbolic reconstruction algorithm yields PFSA having disparate number of states in each of the above four cases (i.e., two each for the human subject and the robot), with their graph structures being significantly different. The resulting patterns (i.e., state probability vectors) for these PFSA models in each of the four cases are shown on the left side of figure 12. The models are then projectively composed with a 64 state D-Markov machine (Ray 2004) having alphabet size ¼ 4 and depth ¼ 3. The resulting pattern vectors are shown on the right hand column of figure 12. The four rows in figure 12 demonstrate the applicability of projective composition to statistical pattern classification; the state probability vectors of projected models unambiguously identify the respective patterns of a human subject and an UGV. 0.4 7. Summary, conclusions and future work This paper presents a rigorous measure-theoretic approach to probabilistic finite state machines. Key concepts from classical language theory such as the Nerode equivalence relation is generalized to the probabilistic paradigm and the existence and uniqueness of minimal representations for PFSA is established. Two binary operations, namely, probabilistic synchronous composition and projective composition of PFSA are introduced and their properties are investigated. Numerical examples have been provided for clarity of exposition. The applicability of the defined binary operators has been demonstrated on experimental data from a laboratory test bed in a pattern identification and classification problem. This paper lays the framework for three major directions for future research and the associated applications. . 
Figure 12. Experimental validation of projective composition in pattern recognition: (a) and (b) correspond to ranging data for a human subject in the sensor field; (c) and (d) correspond to an UGV.

• Probabilistic non-regular languages: Since projective composition can be used to obtain smaller order models with quantifiable error, the possibility of projectively composing infinite state probabilistic models with finite state machines must be investigated. The extension of the theory developed in this paper to non-regular probabilistic languages would prove invaluable in handling strictly non-Markovian models in the symbolic paradigm, especially physical processes that fail to have the semi-martingale property, e.g., fractional Brownian motion (Decreusefond and Ustunel 1999). Future work will investigate language-theoretic non-regularity as the symbolic analogue of chaotic behaviour in the continuous domain.

• Optimal control: The reported measure-theoretic approach to optimal supervisor design in PFSA models will be extended, in the light of the developments reported in this paper, to situations where the control specification is given as weights on the states of DFSA models disparate from the plant under consideration. Such a generalization would allow the fusion of Ramadge and Wonham's constraint-based supervision approach (Ramadge and Wonham 1987) with the measure-theoretic approach reported in Chattopadhyay (2006) and Chattopadhyay and Ray (2007). This new control synthesis tool would prove invaluable in the design of event-driven controllers in probabilistic robotics.

• Pattern identification: Preliminary application in pattern classification has already been demonstrated in § 6. Future research will formalize the approach and investigate methodologies for optimally choosing the plant model on which to project the constructed PFSA to yield maximum algorithmic performance.

Future investigations will explore applicability of the structural transformations developed in this paper for the fusion, refinement and computation of bounded order symbolic models of observed system behaviour in complex dynamical systems.

Acknowledgements

The authors would like to thank Dr. Eric Keller for his valuable contribution in obtaining the experimental results. This work has been supported in part by the U.S. Army Research Office under Grant Nos. W911NF-06-1-0469 and W911NF-07-1-0376.

References

R. Bapat and T. Raghavan, Nonnegative Matrices and Applications, Cambridge University Press, 1997.
I. Chattopadhyay, "Quantitative control of probabilistic discrete event systems," PhD Dissertation, Dept. of Mech. Engg., Pennsylvania State University, http://etda.libraries.psu.edu/theses/approved/WorldWideIndex/ETD-1443, 2006.
I. Chattopadhyay and A. Ray, "Renormalized measure of regular languages", Int. J. Contr., 79, pp. 1107–1117, 2006.
I. Chattopadhyay and A. Ray, "Language-measure-theoretic optimal control of probabilistic finite-state systems", Int. J. Contr., 80, pp. 1271–1290, 2007.
L. Decreusefond and A. Ustunel, "Stochastic analysis of the fractional Brownian motion", Potential Analysis, 10, pp. 177–214, 1999.
V.
Garg, ‘‘Probabilistic languages for modeling of DEDs’’, Proceedings of 1992 IEEE Conference on Information and Sciences, Princeton, NJ, pp. 198–203, March 1992a. V. Garg, ‘‘An algebraic approach to modeling probabilistic discrete event systems’’, Proceedings of 1992 IEEE Conference on Decision and Control, Tucson, AZ, pp. 2348–2353, December 1992b. W.J. Harrod and R.J. Plemmons, ‘‘Comparison of some direct methods for computing the stationary distributions of markov chains,’’, SIAM J. Sci. Statist. Comput., 5, pp. 453–469, 1984. J.E. Hopcroft, R. Motwani and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, 2nd ed., Boston, MA, USA: Addison-Wesley, 2001, pp. 45–138. J.G. Kemeny and J.L. Snell, Finite Markov Chains, 2nd ed., New York: Springer, 1960. R. Kumar and V. Garg, ‘‘Control of stochastic discrete event systems modeled by probabilistic languages’’, IEEE Trans. Autom. Contr., 46, pp. 593–606, 2001. M. Lawford and W. Wonham, ‘‘Supervisory control of probabilistic discrete event systems’’, Proceedings of 36th Midwest Symposium on Circuits and Systems, pp. 327–331, 1993. V. Rajagopalan and A. Ray, ‘‘Symbolic time series analysis via wavelet-based partitioning’’, Signal Process., 86, pp. 3309–3320, 2006. P.J. Ramadge and W.M. Wonham, ‘‘Supervisory control of a class of discrete event processes’’, SIAM J. Contr. Optimiz., 25, pp. 206–230, 1987. A. Ray, ‘‘Symbolic dynamic analysis of complex systems for anomaly detection’’, Signal Process, 84, pp. 1115–1130, 2004. A. Ray, ‘‘Signed real measure of regular languages for discrete-event supervisory control’’, Int. J. Contr., 78, pp. 949–967, 2005. W. Rudin, Real and Complex Analysis, 3rd ed., New York: McGraw Hill, 1988. C.R. Shalizi and K.L. Shalizi, ‘‘Blind construction of optimal nonlinear recursive predictors for discrete sequences’’, in AUAI’04: Proceedings of the 20th conference on uncertainty in artificial intelligence, Arlington, Virginia, United States, AUAI Press, pp. 504–511, 2004. W. Stewart, Computational Probability: Numerical Methods for Computing Stationary Distribution of Finite Irreducible Markov Chains, New York: Springer, 1999.