Strategy optimization for controlled Markov process with descriptive complexity constraint

JIA QingShan & ZHAO QianChuan†
Center for Intelligent and Networked Systems, Department of Automation, TNLIST, Tsinghua University, Beijing 100084, China

Abstract  Due to various advantages in storage and implementation, simple strategies are usually preferred to complex strategies when their performances are close. Strategy optimization for controlled Markov processes with a descriptive complexity constraint provides a general framework for many such problems. In this paper, we first show by examples that the descriptive complexity and the performance of a strategy could be independent, and use the F-matrix in the No-Free-Lunch Theorem to show the risk that approximating complex strategies may lead to simple strategies that are unboundedly worse in cardinal performance than the original complex strategies. We then develop a method that handles the descriptive complexity constraint directly: it describes simple strategies exactly and approximates only complex strategies during the optimization. The ordinal performance difference between the resulting strategies of this selective approximation method and the global optimum is quantified. Numerical examples on an engine maintenance problem show how this method improves the solution quality. We hope this work sheds some insight on solving general strategy optimization for controlled Markov processes with descriptive complexity constraints.

Keywords  strategy optimization, controlled Markov process, descriptive complexity

1 Introduction

Strategy optimization for controlled Markov processes[1,2] provides a general framework for many control, decision making, and optimization problems in real life.
Besides the well-known difficulties of large state spaces and large action spaces, the pervasive application of digital computers introduces the constraint of limited memory space when using computer-based optimization or implementing a strategy on computers in practice. Strategies that can be stored in the given memory space and run in reasonable time are called simple strategies; otherwise, they are called complex strategies. Simple strategies are easy to store and implement, contain fewer parameters, thus require less historical data and shorter training time, and usually generalize better in unknown environments. Due to these various advantages, simple strategies are usually preferred to complex strategies when the performances are close. These preferences can be modeled as descriptive complexity constraints[3]. So it is of great practical interest to study strategy optimization for controlled Markov processes with descriptive complexity constraint. The concept of Kolmogorov complexity (KC)[3] quantifies the descriptive complexity of a strategy mathematically. However, KC is in general incomputable (Theorem 2.3.2 in ref. [3]).

Received January 27, 2009; accepted August 7, 2009; doi: 10.1007/s11432-009-0192-8
† Corresponding author (email: [email protected])
Supported by the National Natural Science Foundation of China (Grant Nos. 60274011, 60574067, 60704008, 60736027, 60721003, 90924001), the New Century Excellent Talents in University (Grant No. NCET-04-0094), the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20070003110), and the Programme of Introducing Talents of Discipline to Universities (the National 111 International Collaboration Projects) (Grant No. B06002)
Citation: Jia Q S, Zhao Q C. Strategy optimization for controlled Markov process with descriptive complexity constraint. Sci China Ser F-Inf Sci, 2009, 52(11): 1993–2005, doi: 10.1007/s11432-009-0192-8
Also note that the descriptive complexity of a strategy is related to the actions at all the states. This makes the actions at different states correlated, so the traditional policy iteration and value iteration cannot be applied directly. The preference for simple strategies is usually handled in two ways in practice. First, practitioners usually focus on strategies with specific structures or properties, say threshold-type strategies or parameterized strategies, and then solve a parameter optimization problem to find a strategy. However, simple strategies do not necessarily have good performance. Thus it is usually difficult to quantify the performance difference among the strategy obtained in the above way, the optimal simple strategy, and the optimal strategy (not necessarily simple). Also, the given memory space may not be fully utilized. A second way is to solve strategy optimization for controlled Markov processes without the descriptive complexity constraint. If the resulting strategy is complex, we then approximate the strategy by simple ones. However, as we will show in section 3.1, there is no guarantee that the simple strategy obtained in this way still has good performance. In fact, this simple strategy can be arbitrarily worse than the complex strategy in cardinal performance, which will be shown by examples later. In this paper, we first present the formulation of the strategy optimization problem for controlled Markov processes with descriptive complexity constraint. Please note that the general dynamic programming problem can be handled in the same way. Then we show by examples that the descriptive complexity and the performance of a strategy could be independent, and use the F-matrix in the No-Free-Lunch Theorem to show the risk that approximating complex strategies may lead to simple strategies that are unboundedly worse in cardinal performance than the original optimal (complex) strategies.
We then provide an upper bound of the ordinal performance difference between simple strategies and optimal strategies. After that, we develop a method that handles the descriptive complexity constraint directly. This method uses the support vector machine (SVM) to describe strategies in a succinct way, describes simple strategies exactly, and approximates only complex strategies during the optimization. The ordinal performance difference between the solution of this selective approximation method and the global optimum is quantified. We then use numerical examples on an engine maintenance problem to show how this selective approximation method improves the solution quality.

The rest of this paper is organized as follows. The problem formulation is provided in section 2. The bounds of the performance difference between simple and complex strategies are provided in section 3. The SVM-based selective approximation is introduced in section 4. The numerical results on a generic strategy optimization problem and an engine maintenance problem are shown in section 5. We briefly conclude in section 6.

2 Problem formulation

We present the mathematical problem formulation of strategy optimization for controlled Markov processes with descriptive complexity constraint in this section. Let S be the state space and A(s) be the action space at state s ∈ S. To simplify the discussion, we consider discrete and finite S and A. A strategy γ is a mapping from the state space to the action space. Let Γ be the set of strategies. When the system state is s ∈ S at time t and an action a ∈ A(s) is taken, a cost c(s, a) is incurred, which is assumed to take only finite values, i.e., there exists M > 0 such that |c(s, a)| < M for all s ∈ S and a ∈ A. After the action is taken, the system state transits to another state s' with probability Pr{s_{t+1} = s' | s_t = s, a_t = a}. The following criteria are usually considered in a controlled Markov process. The target cost is

    J(γ) = E[ \sum_{t=0}^{\min\{T | s_T ∈ S_T\}} c(s_t, γ(s_t)) ],    (1)

which is the expected cost of the Markov process until the state reaches some target set S_T; the total cost is

    J(γ) = E[ \sum_{t=0}^{T-1} c(s_t, γ(s_t)) ];    (2)

the discounted cost over an infinite horizon is

    J(γ) = \lim_{T→∞} E[ \sum_{t=0}^{T-1} α^t c(s_t, γ(s_t)) ],    (3)

where α is the discount factor, 0 < α < 1; and the average cost is

    J(γ) = \lim_{T→∞} (1/T) E[ \sum_{t=0}^{T-1} c(s_t, γ(s_t)) ].    (4)

Strategy optimization for a controlled Markov process is to find a strategy γ ∈ Γ that minimizes one of the above J(γ)'s. The descriptive complexity constraint is modeled as

    C(γ | U) ≤ C_0,    (5)

where U is a given description mechanism (e.g., threshold-type strategies, neural networks, or the SVM as in section 4, the last of which will be denoted by U_SVM), C(γ | U) is the shortest length of the program that implements strategy γ using description mechanism U, and C_0 represents the upper limit of the descriptive complexity of the strategy that one can tolerate (e.g., the size of the given memory space) and is called the descriptive capacity. Note that C(γ | U) is also called the conditional KC. Thus, the problem we want to solve is

    \min_{γ∈Γ} J(γ)  s.t.  C(γ | U) ≤ C_0.    (6)

The meaning is to find, among all the strategies that can be stored in the given memory space, a strategy that minimizes the cost of the controlled Markov process.

3 Bound of performance difference between simple and complex strategies

To solve the problem in (6), one may want to first solve the problem without the constraint. If the solution strategy is complex, one may approximate this strategy by a simple one that satisfies the descriptive complexity constraint (5), and hope this simple strategy also has good performance. However, as we will see in section 3.1, the simple strategy obtained in the above way may have poor performance. In fact, the performance and descriptive complexity of a strategy are independent.
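Problem (6) can be stated compactly in code. The sketch below is ours for illustration only: it enumerates a toy strategy space under the uniform-transition chain of section 3.1 (so the average cost of γ reduces to the mean of c(s, γ(s))), and it uses a stand-in complexity measure (counting action switches along the state index), not the conditional KC of (5); the cost table is hypothetical.

```python
from itertools import product

S = range(4)                      # states of a toy chain with uniform transitions
A = (0, 1)                        # two actions per state
c = {(s, a): (s + 1) * (1 + a) for s in S for a in A}   # hypothetical costs

def avg_cost(g):
    """Average cost (4) under uniform transitions: mean of c(s, g(s))."""
    return sum(c[(s, g[s])] for s in S) / len(S)

def complexity(g):
    """Toy stand-in for C(gamma | U): 1 plus the number of action switches."""
    return 1 + sum(g[i] != g[i + 1] for i in range(len(g) - 1))

def solve(C0):
    """Brute-force version of problem (6): minimize J over feasible strategies."""
    feasible = [g for g in product(A, repeat=len(S)) if complexity(g) <= C0]
    return min(feasible, key=avg_cost)

best = solve(C0=2)
```

For such small spaces the enumeration is exact; the point of the rest of the paper is precisely that realistic strategy spaces are far too large for this, and that the complexity measure itself must come from the description mechanism.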
In section 3.2, we provide an upper bound of the ordinal performance difference between the best simple strategy and the global optimum.

3.1 Independence between performance and descriptive complexity

We first use an example to show that the performance and descriptive complexity of a strategy are independent. Consider a Markov chain with |S| states. At each state there are |A| actions. The choice of the action only affects the one-step cost c(s, a), and does not affect the transition probability: Pr{s_{t+1} = s' | s_t = s, a_t = a} = 1/|S| for all s, a, and s'. We consider the average cost criterion. The average cost of a strategy γ then is

    J(γ) = \lim_{T→∞} (1/T) E[ \sum_{t=0}^{T-1} c(s_t, γ(s_t)) ] = (1/|S|) \sum_{s∈S} c(s, γ(s)).

To describe a strategy in a digital computer, we assign an index to each action. Then a strategy γ can be regarded as a program, which outputs the index of γ(s_t) when state s_t is input. As aforementioned, the descriptive complexity of γ is defined as the length of the shortest program that implements γ. So C(γ) is independent of c(s, γ(s)). Thus the performance and descriptive complexity of the strategy are independent. We use P = {c(s, a), s ∈ S, a ∈ A} to denote a problem setting. When we change c(s, a) in the problem setting, the average cost of a strategy γ changes without affecting the descriptive complexity of γ. This implies that the performance difference between the simple and complex strategies could be arbitrarily large. Mathematically speaking, we have the following.

Proposition 1. For all γ1, γ2 ∈ Γ with γ1 ≠ γ2, and for all M > 0, there exists a problem setting P s.t. J^P(γ1) − J^P(γ2) ≥ M, where J^P(γ) is the average cost of strategy γ under problem setting P.

Proof. We prove this proposition by constructing a problem setting P s.t. J^P(γ1) − J^P(γ2) = M. First, we specify c(s, γ2(s)) = 0 for all s ∈ S.
Since γ1 ≠ γ2, there must exist at least one state s0 s.t. γ1(s0) ≠ γ2(s0). Let c(s0, γ1(s0)) = M|S|, and c(s, γ1(s)) = 0 for all s ∈ {s | γ1(s) ≠ γ2(s), s ≠ s0}. Then we have

    J^P(γ1) − J^P(γ2) = J^P(γ1) = (1/|S|) \sum_{s∈S} c(s, γ1(s)) = M|S| / |S| = M.    (7)

So P satisfies the requirement.

An implication of Proposition 1 is that when we approximate a complex strategy by a simple strategy, the performance of the simple strategy could be much worse than that of the complex strategy. In the following, we use the Fundamental matrix (F-matrix for short) in the No-Free-Lunch Theorem[4] to further explain this. An n-row F-matrix is defined as a matrix that 1) contains n rows and 2^n columns; 2) each entry is either 0 or 1; and 3) no two columns are the same. Each column of the n-row F-matrix represents a Boolean function f(x) with x taking n values. An example of a 3-row F-matrix is shown in Figure 1.

         f1(x)  f2(x)  f3(x)  f4(x)  f5(x)  f6(x)  f7(x)  f8(x)
    x1     0      0      0      0      1      1      1      1
    x2     0      0      1      1      0      0      1      1
    x3     0      1      0      1      0      1      0      1

Figure 1  A 3-row Fundamental matrix.

One important property of the F-matrix is that each row contains an equal number of 0's and 1's. The F-matrix is used to prove the No-Free-Lunch Theorem in ref. [4]. We use this concept to show why approximating complex strategies may lead to simple strategies with bad performance. Let each row of the F-matrix represent a strategy γ, and each column represent a problem setting P. We say a strategy has satisfying performance if J(γ) ≤ J0, where J0 is a constant defined by the user. If strategy γi has satisfying performance in problem setting Pj, we let the (i, j)th entry of the F-matrix be 1; otherwise, 0. Following this formulation, all the problem settings can be classified into 2^n types (where n = |Γ| is the number of strategies of interest). Proposition 1 shows that each type contains at least one problem setting.
When the descriptive complexity constraint is given, some rows represent simple strategies and the remaining rows represent complex strategies. When the value of C0 changes, the numbers of simple and complex strategies may change accordingly. However, we can always find a problem setting in which all complex strategies have satisfying performance and all simple strategies have unsatisfying performance, and vice versa. It is then clear that if we do not know which type of problem we are dealing with, the performance difference between the simple and complex strategies could be arbitrarily large in cardinal values. In this subsection we have shown that if the performance of a simple strategy is not considered when approximating a complex strategy, the approximation could lead to a performance degradation that is arbitrarily large. This shows the importance of considering the performance of simple strategies during the approximation.

3.2 Ordinal performance bounds for simple strategies

In section 3.1, we saw the difficulty of quantifying the performance difference between simple and complex strategies in cardinal values. In this subsection, we take another viewpoint and focus on the ordinal performance of simple strategies, i.e., if we rank all the simple and complex strategies according to a performance criterion J from small to large, what is the rank of the best simple strategy? Let N_simple be the number of simple strategies. Then the best simple strategy is at least top-(|Γ| − N_simple + 1), where |Γ| is the total number of strategies, including both simple and complex ones. This upper bound is conservative, since it assumes all the simple strategies have worse performance than the complex strategies. Although this rank does not tell us the performance difference between the best simple strategy and the global optimum in cardinal values, it provides some information on the global goodness of such strategies.
In this section, we have seen that approximating complex strategies may not produce simple strategies with satisfying performance. Searching within strategies with specific structures means searching within a subset of all the simple strategies, and thus may not obtain a good simple strategy. These two existing methods fail to solve strategy optimization for controlled Markov processes with descriptive complexity constraint. We thus need to handle the descriptive complexity constraint directly. This means we must identify which strategies are simple and which are complex.

4 SVM-based selective approximation for strategies

Given a description mechanism, only some strategies can be described exactly in the limited memory space. To fully utilize the given memory space, we should calculate the minimal memory space needed to describe a strategy using the given description mechanism (i.e., the conditional KC). Those simple strategies (i.e., strategies that can be described within the given memory space) are stored exactly. Other strategies (called complex strategies) can only be stored approximately. This will be called selective approximation, and is discussed in detail in section 4.2. In this way, we can explore more strategies than other methods, and thus improve the solution quality. As a first attempt in this research direction, we show how to implement this selective approximation method for a specific description mechanism, based on the support vector machine (SVM)[5]. For other description mechanisms, for example the neural networks used in neuro-dynamic programming[6], the implementation procedure provided in this paper supplies a guide. We note that the task of describing a strategy is essentially a classification problem, given suitable features. There are many tools for classification. We adopt the SVM, which has the following specific advantages.

1. There is a one-to-one correspondence between the number of support vectors and the conditional KC of a strategy, so we can distinguish simple and complex strategies.

2. It is easy to determine how many bits are required to store one support vector. So, given memory space in units of bits, we can directly tell how many support vectors can be stored. Following this way, we can answer how many strategies can be described exactly and what these strategies are.

3. The calculation of the support vectors and parameter values is a quadratic optimization problem (details follow). There are standard algorithms to solve this problem, and few parameters need to be tuned by heuristics.

4. The SVM can be regarded as a generalized threshold-type description technique in the following sense. A threshold-type strategy approximates the original strategy well only when that strategy is "piecewise constant" (i.e., the actions corresponding to several neighboring states are the same) and the number of break points is no greater than the number of thresholds. The SVM uses a similar technique (details follow) but permits nonlinear transformations of the (state, action) pairs. Furthermore, this nonlinear transformation is done implicitly. This enables us to describe more strategies than the threshold type.

5. For practical applications, the state space may be extremely large, so that it is computationally infeasible to explicitly list all the (state, action) pairs of a strategy. The good generalization ability of the SVM is well known. This makes it possible to encode (approximately) a strategy based on some, instead of all, of the (state, action) pairs.

4.1 SVM-based description mechanism

Consider a two-class classification problem. Let {(x_i, y_i)}, i = 1, 2, ..., N, be the set of training samples, where x_i is a row vector indicating the input pattern of the sample, and y_i is a scalar taking values from {−1, +1}, −1 for the first class and +1 for the second class.
We want to find a hyper-plane that can classify the training samples correctly and has good generalization. According to the structural risk minimization principle, this optimal hyper-plane should maximize the margin to both classes of training samples. We use Figure 2 to illustrate this. In Figure 2, black points represent training samples from class 1 and crosses represent samples from class 2. The solid line represents the optimal hyper-plane. The dashed lines are parallel to the solid line, and there are no training samples between the dashed and solid lines. The margin is the distance between the dashed and solid lines. The two margins are equal. To maximize the two margins, each dashed line should cross at least one training sample in one class, which is marked by a circle. These training samples are called support vectors.

Figure 2  The optimal hyper-plane should separate the training samples with the maximal margin.

A support vector machine is a decision function based on the sign of

    f(x) = \sum_{i=1}^{m} y_i α_i K(x_i, x) + b,    (8)

where m (m ≤ N) is the number of support vectors (those whose multipliers α_i are non-zero), K(x_i, x) is the kernel function mapping the input vectors into a feature space, and b is the bias term[7]. By letting the hyper-plane pass through the origin, we can let b = 0. The coefficients α_i are obtained by solving the following dual form of the quadratic optimization problem[7,8]: maximize

    w(α) = \sum_{i=1}^{N} α_i − (1/2) \sum_{i,j=1}^{N} α_i α_j y_i y_j K(x_i, x_j)    (9)

subject to the constraints

    α_i ≥ 0,  i = 1, 2, ..., N.    (10)

This particular SVM formulation is called the hard margin formulation without threshold (b = 0), where no training errors are allowed. So basically the SVM is a kernel-function-based function approximation technique. The idea is the following: when the training samples are not linearly separable, the SVM uses a kernel function to convert the training samples into a high-dimensional space in which they are linearly separable. One advantage of the SVM is that this conversion is done implicitly by the kernel function.

We use a simple example to show how the SVM can describe a strategy succinctly. Suppose the state is directly observable and there are 4 states in total, denoted by 1 through 4. There are two actions to choose for each state, represented by 0 and 1. Then, using a lookup table, a strategy can be represented by a 4-bit sequence. For example, "0011" represents the strategy that takes action 0 for states 1 and 2, and takes action 1 for states 3 and 4. Using the SVM, we can find two support vectors, action 0 at state 2 and action 1 at state 3, and an optimal hyper-plane, as shown in Figure 3.

Figure 3  Use SVM to describe strategy "0011".

In Figure 3, the horizontal axis represents the input pattern (state), and the vertical axis represents the label (action). We use circles to mark the support vectors. The solid line represents the optimal hyper-plane found by the SVM. Using the two support vectors, we can recover strategy "0011". Note that if we use a lookup table to represent this strategy, we need to store four (state, action) pairs (i.e., four support vectors). Using the SVM, we need to store only two support vectors. We thus find a more succinct way to represent strategy "0011". When the kernel function is fixed and all the strategies can be recovered using the SVM (i.e., no training error on any strategy), we use the minimal number of support vectors to evaluate C(γ | U_SVM) for strategy γ. Note that although the optimal hyper-plane is unique^1) (see p. 402 in ref. [5]), the related optimal number of support vectors may not be unique (see p. 406 in ref. [5]).
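To make the recovery idea concrete, here is a deliberately simplified stand-in (ours, not the paper's classifier): store only the two pairs (state 2, action 0) and (state 3, action 1), and assign every other state the action of its nearest stored state. This nearest-stored-pair rule replaces the SVM decision function (8) purely for illustration of how a few stored pairs can reproduce the whole lookup table:

```python
def recover(stored_pairs, states):
    """Reconstruct a strategy string from a few stored (state, action) pairs
    by giving each state the action of the nearest stored state. This is a
    simplified stand-in for the SVM decision function (8), for illustration."""
    return "".join(
        str(min(stored_pairs, key=lambda p: abs(p[0] - s))[1]) for s in states
    )

# Two stored pairs recover "0011" over states 1..4, mirroring the two
# support vectors (state 2, action 0) and (state 3, action 1) of Figure 3.
assert recover([(2, 0), (3, 1)], [1, 2, 3, 4]) == "0011"
```

The stored pairs play the role of support vectors here: two pairs instead of four lookup-table entries, and the decision rule fills in the rest.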
So when we find several support vectors that can be used to recover a strategy, this number of support vectors is an upper bound estimate of C(γ | U_SVM) for that strategy. As we will see in section 5, using this upper bound estimate we can already discover interesting effects of the descriptive capacity (i.e., the given memory space) and improve the solution quality. To simplify the discussion, we do not distinguish between the upper bound estimate and the true C(γ | U_SVM) in the rest of this paper.

4.2 Selective approximation

To be specific, we propose the following two-step technique (one example of selective approximation) to deal with the constraint of limited memory space.

The two-step technique
Step 1. We use the SVM to describe strategies succinctly, and use the number of support vectors to measure the conditional KC of each strategy.
Step 2. By comparing the conditional KC of each strategy with the given descriptive capacity, we know which strategies can be described exactly (simple strategies) and which cannot (complex strategies). Then we describe simple strategies exactly, and use approximation techniques for complex strategies. (An approximation technique represents a group of strategies, instead of one strategy, together. The strategies within the same group cannot be distinguished from each other. One example of such approximation techniques is discussed in section 5.)

In this way, we can utilize the descriptive capacity more fully, and find better solutions.

5 Numerical results

In this section, we first consider a generic strategy optimization problem, in which all strategies represented by binary sequences are feasible. Then we consider an engine maintenance problem, in which only the strategies represented by some Boolean sequences are feasible.

5.1 Example 1: a generic strategy optimization problem

In this subsection, we first show how the descriptive capacity (i.e., C0) affects the number of strategies that can be explored.
Then we consider the case with insufficient descriptive capacity and show that selective approximation performs better than total approximation and no approximation (defined later). After that, we show how to give an upper bound estimate of the ordinal difference between the strategy found by the proposed method and the global optimum (not necessarily describable in the given memory space).

Encoding is an important factor that affects the performance of a description mechanism. For example, assume there are 4 states in total, and we consider only binary actions at each state. If the optimal strategy is "0101" in one encoding, then we can exchange the indexes of actions "0" and "1" in states 2 and 4. This gives us a new encoding, in which the optimal strategy is "0000". This means that by exchanging the indexes of some states and the indexes of some actions, we can find an encoding in which the optimal strategy can be easily described. So before we study the performance of a description mechanism, we must first fix the encoding. In all the following discussion, we assume the encoding is fixed.

Consider the following strategy optimization problem. There are 9 states and 4 actions for each state (represented by "00", "01", "10", and "11"). In a lookup table, we can use an 18-bit binary sequence (i.e., 9 support vectors) to represent a strategy. Within this section, we make the following assumption.

Assumption 1. We can exactly evaluate a strategy, i.e., when strategy γ is specified, we know J(γ) exactly.

If only noisy performance evaluation is available, the truly optimal strategy that has been explored may not be selected as the result. Then the following analysis shows the best that each method can do.

1) We assume there is an optimal hyper-plane that classifies all training samples correctly (this is called the separable case). If there is no such optimal hyper-plane (the non-separable case), we can introduce a punishment for each training error and solve a similar quadratic optimization problem to find the support vectors and the corresponding coefficients (see pp. 408–412 in ref. [5]).

Effect of descriptive capacity on exploration. There are 2^18 = 262144 strategies in total. We use the SVM to calculate the conditional KC of each strategy. Many free SVM toolboxes are available online; we use the OSU SVM toolbox in Matlab[9], with the polynomial kernel function and the default parameter settings of that software. To describe a strategy exactly, we need at most 9 support vectors and at least 0 support vectors^2). We increase the descriptive capacity (C0) from 1 to 9, where C0 is in units of the number of support vectors that can be stored. For each C0, we count how many strategies can be described exactly (denoted as |Γ_C0|) and show the results in Table 1.

Table 1  Exploration under different descriptive capacities in Example 1

    C0        1    2     3     4     5      6      7       8       9
    |Γ_C0|    10   230   1290  4410  16290  48130  115140  210382  262144

From Table 1 we see that most strategies have high conditional KC (higher than 5). This means that if the descriptive capacity is small (e.g., 3), we can explore only a small portion of the strategies. When the descriptive capacity is given, we can immediately tell how many strategies can be described exactly and what these strategies are. For example, when the descriptive capacity is 7, from Table 1 we can explore only 115140 strategies. Then, by calculating the conditional KC of each strategy and comparing it with this descriptive capacity, we know what these 115140 strategies are. We show how this affects the solution quality in the next subsection.

Compare selective approximation, total approximation, and no approximation.
Selective approximation, total approximation, and no approximation are three different types of description techniques. We define the following techniques as representatives of each type.

1. SVM description (no approximation). We use a succinct description of each strategy. When the descriptive capacity is insufficient, we explore only those strategies that can be described exactly. We use the ordinal index (1 for the best and 2^18 for the worst) of the strategy to evaluate the solution quality.

2. Lookup table description (total approximation). In a lookup table, we need 18 bits (i.e., 9 support vectors) to describe one strategy exactly; each 2 bits represent one action for a state. In this sense a lookup table uses 9 support vectors to recover each strategy. When the descriptive capacity is insufficient, we cannot specify one strategy, but only a group of strategies. For example, if the descriptive capacity allows storing only 7 support vectors, then we cannot describe strategy "000000000" but only "0000000XX", where each X can be any of the 4 actions, randomly and with equal probability. In other words, we can distinguish only 4^7 groups of strategies. We use the average ordinal index of the strategies within a group to represent the performance of that group. In this case, to solve the strategy optimization problem, we select the best group as the solution, and regard the corresponding average ordinal index as the solution quality. A lookup table is an approximation technique in the following sense: when we cannot use a lookup table to describe one strategy exactly, we distinguish only among different groups of strategies. This means we use a simpler representation (e.g., 7 support vectors) to approximate a group of strategies (e.g., each set of 7 support vectors approximates 16 different strategies).

3. Two-step technique (selective approximation). The two-step technique is a combination of the SVM description and the lookup table description.
When the descriptive capacity is insufficient (e.g., it allows storing at most 7 support vectors), we separate all strategies into two classes: simple and complex. For simple strategies, we use the SVM to describe them exactly. For complex strategies, we use the above lookup-table-based technique to describe them approximately, i.e., within each of the 4^7 groups of strategies, some strategies may be simple and can be described exactly using the SVM description method. After excluding these simple strategies from a group, we cannot distinguish the remaining strategies within the same group. In this way, we can distinguish each simple strategy and the different groups of complex strategies. (Note that we still cannot distinguish the complex strategies within the same group.) Then we compare and select the best strategy (or group of strategies) as the solution. Similarly, the (average) ordinal index is used to evaluate the solution quality of a group of complex strategies. The cost function could be any criterion mentioned in section 2.

2) When the actions at all states are identical, we only need to record one action and need no support vector. In all other cases, we need at least two support vectors to decide a hyper-plane.

When comparing different description methods, we consider the following two extreme cases: 1) a strategy with higher conditional KC also has better performance (i.e., a smaller value of the performance function); 2) a strategy with smaller conditional KC has better performance. We compare the three description methods in both cases and show the numerical results in Tables 2 and 3. In Table 2, each number in the second, third, and fourth rows is the (average) ordinal performance of the solution under that descriptive capacity. For example, when the descriptive capacity is 7, if we use the SVM description method, the best strategy we find is top-147005 among all the 262144 strategies.
If we use the lookup table description method, the best group of strategies we find is top-9884.6 on average. If we use the two-step technique, we can find the top-4703 strategy on average.

Table 2  Comparison of SVM, lookup table, and two-step technique in case 1

C0   SVM      Lookup table   Two-step
1    262135   99794.2        99789.3
2    261915   76159.8        76125.7
3    260855   48561.2        48561.2
4    257735   37346.2        37346.2
5    245855   28344.2        28344.2
6    214015   22523.1        10354.0
7    147005   9884.6         4703.0
8    51763    18.5           1.5
9    1        1.0            1.0

In case 2, the best strategy is the simplest to describe. The SVM description and the two-step technique can explore this best strategy under all descriptive capacities. Under Assumption 1, the SVM description and the two-step technique always select this best strategy as the final solution.

Table 3  The solution quality of lookup table in case 2

C0   Lookup table   C0   Lookup table
1    99965.7        6    1456.8
2    72185.6        7    137.3
3    31658.3        8    9.3
4    13143.1        9    1.0
5    3735.4

In Table 2, the lookup table method outperforms the SVM method. On the contrary, the SVM method outperforms the lookup table method in Table 3. In both tables, the two-step technique is the best under all descriptive capacities. This is reasonable, because the SVM description and the lookup table description are special cases of the two-step technique, and enlarging the search region never hurts the solution of an optimization problem. We have used numerical examples to show how measuring conditional KC helps to fully utilize the descriptive capacity, and how this can improve the solution quality. In the following, we show how to give a reasonable estimate of the ordinal difference between the strategy that the two-step technique finds and the global optimal strategy (not necessarily describable in the given memory space).
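As an aside, the group-averaging used for the lookup-table entries in Tables 2 and 3 can be sketched in a few lines. This is a minimal sketch with hypothetical names, assuming strategies are identified with 9-tuples of actions (4 choices per state) and grouped by their first C0 fixed entries, as in the lookup-table description above; the performance ordering used below is purely illustrative.

```python
import itertools
from statistics import mean

N_STATES, N_ACTIONS = 9, 4  # 4^9 = 262144 strategies, as in Example 1

def best_group_average(ordinal, capacity):
    """Smallest average ordinal index over the groups of strategies that
    share their first `capacity` actions (the part a truncated lookup
    table can still describe); `ordinal` maps a strategy to its rank."""
    groups = {}
    for strategy in itertools.product(range(N_ACTIONS), repeat=N_STATES):
        groups.setdefault(strategy[:capacity], []).append(ordinal[strategy])
    return min(mean(members) for members in groups.values())

# Hypothetical ordering: lexicographic order coincides with performance
# order, so with capacity 7 each group holds 16 consecutive ranks and the
# best group averages (1 + 2 + ... + 16) / 16 = 8.5.
ordinal = {s: i + 1 for i, s in
           enumerate(itertools.product(range(N_ACTIONS), repeat=N_STATES))}
print(best_group_average(ordinal, 7))  # 8.5
```

With full capacity (9 support vectors) every group is a singleton, so the best group average equals 1, matching the last rows of Tables 2 and 3.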
Since we have no a priori knowledge about the performance of a strategy before conducting the evaluation, we can only give a conservative estimate, i.e., the strategy we found is the best among all the strategies that we evaluated, and we can say nothing more than this. Following this idea, let N_exact denote the number of strategies that can be described exactly, and N_approximate denote the number of groups of strategies that can only be described approximately (under that descriptive capacity, we can only distinguish among different groups but not among strategies within a group). When we use the two-step technique, the solution we find is at least top-(|Γ| − N_exact − N_approximate + 1). For Example 1, we show the corresponding values in Table 4. In Table 4, the P columns show the upper bound estimate of the ordinal performance (P) of the strategy found by the two-step technique under different descriptive capacities. For example, when the descriptive capacity is 8, the strategy found by the two-step technique is at least top-28992. The true ordinal performance is top-1.5 in case 1 and top-1 in case 2 (please refer to Table 2).

Table 4  The upper bound estimate of the ordinal performance of the solution found by the two-step technique in Example 1

C0   P        C0   P
1    262131   6    209967
2    261899   7    133093
3    260791   8    28992
4    257479   9    1
5    244831

By comparing the two-step column in Table 2 with the P values in Table 4, we find that the estimate in Table 4 is conservative. If we can incorporate problem information, we can possibly give a tighter upper bound estimate. However, this is beyond the scope of this paper.

5.2 Example 2: an engine maintenance problem

Due to its significance in many industrial and military areas, the optimization of maintenance problems has been extensively studied in the past several decades[10–13], and more recently in refs. [14, 15].
The engine maintenance strategy optimization is one kind of maintenance problem, in which one seeks the maintenance strategy with the minimal cost during a contract duration or on average. There are many components in an engine, with different new lifetimes and prices. When the engine is working, the remaining lifetimes of the components decrease. When a component expires (i.e., its remaining lifetime reaches zero) or an emergent failure happens (e.g., the engine gets stuck), the engine stops working and is sent to the workshop, which causes a shop visit. During the shop visit, the engine is disassembled into components. A maintenance strategy determines which components are replaced by new ones. After the replacement, the engine is assembled and shipped back to work. The next expiration of a component or the next emergent failure causes the next shop visit. The cost of a shop visit consists of two parts: the shipment cost of the engine and the prices of the replaced new components. Many engines are expensive, so the manufacturer usually signs a contract with the customer to cover the maintenance cost of the engine for a couple of years. The manufacturer then wants to find the best maintenance strategy, i.e., the one with the minimal cost during the contract duration or on average. The difficulty of this engine maintenance strategy optimization problem is well known[16,17]. Besides the difficulty of large state space and large action space, the constraint of limited memory space is also usually met in practice, and is the focus of this paper.

Suppose there are n components; the new lifetime of each component is d days; and we consider only stationary strategies. Then the state is the remaining lifetime of each component at a shop visit, which is a 1-by-n row vector. The size of the state space is |S| = d^n. The action is a 1-by-n Boolean vector, where "1" means replacing the component and "0" means not. The size of the action space is |A| = 2^n.
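The scale of these spaces can be checked numerically. Below is a minimal sketch (variable names are ours) for the d = 100, n = 10 instance discussed next; it assumes each (state, action) pair is encoded with ⌈n log2 d⌉ bits for the state and n bits for the Boolean action vector.

```python
import math

n, d = 10, 100            # components and new lifetime (days)
state_space = d ** n      # |S| = d^n: one remaining lifetime per component
action_space = 2 ** n     # |A| = 2^n: Boolean replace/keep vector

# A full lookup table stores d^n (state, action) pairs; each pair takes
# ceil(n * log2(d)) bits for the state plus n bits for the action.
bits_per_pair = math.ceil(n * math.log2(d)) + n
total_gb = bits_per_pair * state_space / 8 / 1e9   # decimal gigabytes

print(f"|S| = {state_space:.4g}, |A| = {action_space}")
print(f"lookup table: {total_gb:.4g} GB")   # 9.625e+11 GB
```

This reproduces the 9.625 × 10^11 GB figure for storing all the pairs, and makes concrete why a direct lookup-table representation is out of reach.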
To store such a strategy in a lookup table, we need to store d^n (state, action) pairs. It is trivial to show that for each such pair we need ⌈n log2 d⌉ + n bits, where ⌈·⌉ is the ceiling function. Thus to store all the pairs, we need (⌈n log2 d⌉ + n)d^n bits. When d = 100 and n = 10, this value is 9.625 × 10^11 gigabytes (GB), which exceeds the memory space of any present-day computer so far as we know. So it is important to answer the following question: how can we fully utilize the given memory space?

Since in an engine maintenance problem the size of the strategy space (2^(nd^n)) grows faster than exponentially in n, for proof of concept we consider the following simple case. There are 2 components and the new lifetime of each component is 3 days, i.e., n = 2 and d = 3. There are 9 states and 4 actions (represented by "00", "01", "10", and "11") in total. In the lookup table, we can use an 18-bit binary sequence (i.e., 9 support vectors) to represent a strategy. Suppose at each time unit the engine has probability 0.1 to fail; the failure of the engine does not affect the remaining lifetimes of the components. Consider an aircraft engine, for example: a bird may hit the engine, and we need to send the engine to the workshop to clean up the mess. When some component expires or the engine fails, a shop visit happens and the maintenance strategy determines which components to replace. Each shop visit incurs a shipping cost of 1. Components 1 and 2 have prices of 2 and 3, respectively. We want to minimize the daily average maintenance cost, and use a simulation of 1000 days to estimate this average cost for a strategy. Different from the generic strategy optimization problem considered in section 5.1, in the engine maintenance problem not all 18-bit binary sequences represent feasible strategies. For example, when a component expires we have to replace it with a new one.
This means, if only component 1 expires, the feasible actions are "10" and "11"; if only component 2 expires, the feasible actions are "01" and "11"; if both components expire, the only feasible action is "11"; and only when no component expires are all four actions feasible. Following this analysis, we can see that among all the 2^18 = 262144 18-bit binary sequences, only 4096 represent feasible strategies. We consider only these feasible strategies in this subsection.

Effect of descriptive capacity on exploration. We use SVM to calculate the conditional KC of each feasible strategy. To describe a strategy exactly, we need at most 9 support vectors and as few as zero support vectors. We increase the descriptive capacity C0 from 1 to 9, where C0 is in units of the number of support vectors that can be stored. For each C0, we count how many strategies can be described exactly within C0 (denoted |Γ_C0|) and show the results in Table 5.

Table 5  Exploration under different descriptive capacities in Example 2

C0   |Γ_C0|   C0   |Γ_C0|
1    1        6    637
2    11       7    1149
3    50       8    1847
4    117      9    4096
5    312

Similar to Table 1, we see in Table 5 that most strategies have high conditional KC (higher than 3). This means that if we focus on simple strategies, we will explore only a small portion of all the strategies. We show how this affects the solution quality in the next subsection.

Comparison of selective approximation, total approximation, and no approximation. Since only the strategies represented by some 18-bit binary sequences are feasible, we clarify the three techniques as follows.

1. SVM description (no approximation). When the descriptive capacity is insufficient, we explore only the feasible strategies that can be described exactly.

2. Lookup table description (total approximation).
When the descriptive capacity is insufficient, we cannot specify one deterministic strategy, but only a randomized strategy that picks from a group of deterministic strategies with equal probability. Note that each such deterministic strategy is feasible. Also note that the maintenance cost of this randomized strategy may not equal the average of the costs of the group of deterministic strategies. We use simulation to estimate the daily average maintenance cost of such randomized strategies.

3. Two-step technique (selective approximation). When the descriptive capacity is insufficient (e.g., it allows storing at most 7 support vectors), we separate all strategies into two classes: simple and complex. For simple strategies, we use SVM to describe them exactly. For complex strategies, we use the above lookup-table-based technique to describe them approximately (i.e., we use a randomized strategy to approximate a group of complex deterministic strategies). In this way, we have simple deterministic strategies and randomized strategies. We use simulation to estimate the performance of each such strategy. Then we compare and select the best (deterministic or randomized) strategy as the solution.

For different values of C0, we show the best strategy found by each technique in Table 6. We can see the following interesting facts. First, selective approximation is the best. For 1 ≤ C0 ≤ 8, both SVM and the two-step technique (selective approximation) find strategies with maintenance cost strictly smaller than the lookup-table technique. For C0 = 9, all three techniques find the best strategy. The two-step technique beats SVM when C0 = 1 and ties SVM for 2 ≤ C0 ≤ 9. Second, focusing on simple strategies reduces the search space effectively. The best strategy has performance 2.0620. Both SVM and the two-step technique find the best strategy for all C0 ≥ 4.
The lookup-table technique, however, finds this best strategy only when C0 = 9. If we consider only strategies with complexity no greater than 4, there are only 117 such strategies (refer to Table 5). By focusing on these simple strategies, we reduce the search space from 4096 to 117, which is only 2.86% of the original search space, and still find the best strategy.

Table 6  Comparison of SVM, lookup table, and two-step technique in Example 2

C0   SVM      Lookup table   Two-step
1    2.2200   2.2470         2.1900
2    2.1170   2.1890         2.1170
3    2.1170   2.1620         2.0620
4    2.0620   2.1600         2.0620
5    2.0620   2.1500         2.0620
6    2.0620   2.1410         2.0620
7    2.0620   2.1440         2.0620
8    2.0620   2.1250         2.0620
9    2.0620   2.0620         2.0620

6 Conclusions

In this paper, we focus on strategy optimization for controlled Markov processes with a descriptive complexity constraint. We first show that it is difficult to quantify the cardinal performance difference between simple and complex strategies. We show that two existing methods in practice cannot handle the descriptive complexity constraint well. One method first solves the problem without the descriptive complexity constraint; if the solution strategy is complex, the method then approximates the complex strategy by a simple one. We show that this method may not produce simple strategies with good performance. Another method searches among strategies with predetermined structures and thus may not find the best among the simple strategies. Then, we provide an upper bound for the ordinal performance of the best simple strategy. After that, we propose to consider the descriptive complexity constraint directly, and develop the selective approximation to best utilize the given memory space. By regarding the description of a strategy as a classification problem, we propose an SVM-based description mechanism, and quantify the corresponding conditional KC.
The numerical results confirm the effect of descriptive capacity on the strategies that can be explored, and show that the proposed SVM-based selective approximation can further improve the solution quality. It should be pointed out that we assume the conditional KC of a strategy (conditioned on the SVM description mechanism) can be accurately calculated. When the state space is extremely large, we need to estimate this conditional KC, and the performance of our method then depends on the accuracy of this estimate. In order to obtain an accurate estimate of the conditional KC, problem information is needed.

We also point out some possible future research directions. Note that when the descriptive capacity is large, it could be infeasible to enumerate all the simple strategies. Then we need to combine the selective approximation with some stochastic optimization algorithm to find the best (or a good) simple strategy. Also note that we have shown that when the descriptive capacity increases, the performance of the solution strategy can be improved. An interesting research problem of practical interest is to conduct a sensitivity analysis, i.e., to quantify how much performance improvement can be achieved when the descriptive capacity increases. We hope this work sheds some insight into strategy optimization for controlled Markov processes with descriptive complexity constraints in general.

The authors would like to thank Prof. Y. C. Ho, Dr. L. Xia, and three anonymous reviewers for their helpful comments on a previous version of this manuscript.

1 Puterman M L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: John Wiley and Sons, Inc., 1994
2 Bertsekas D P. Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 2007
3 Li M, Vitányi P. An Introduction to Kolmogorov Complexity and Its Applications. 2nd ed.
New York: Springer-Verlag New York Inc., 1997
4 Ho Y C, Zhao Q C, Pepyne D L. The no free lunch theorems: complexity and security. IEEE Trans Automat Contr, 2003, 48(5): 783–793
5 Vapnik V N. Statistical Learning Theory. New York: John Wiley and Sons, Inc., 1998
6 Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996
7 Gunn S. Support Vector Machines for Classification and Regression. ISIS Technical Report, 1998
8 Burges C J C. A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc, 1998, 2(2): 955–975
9 Ma J, Zhao Y, Ahalt S. OSU SVM Classifier Matlab Toolbox. Version 3.00. Available: http://www.ece.osu.edu/maj/osu svm/
10 Cho D I, Parlar M. A survey of maintenance models for multiunit systems. Eur J Oper Res, 1991, 51: 1–23
11 Dekker R. Applications of maintenance optimization models: a review and analysis. Reliab Eng Syst Safe, 1996, 51: 229–240
12 Tan J S, Kramer M A. A general framework for preventive maintenance optimization in chemical process operations. Comput Chem Eng, 1997, 21(12): 1451–1469
13 Wang H. A survey of maintenance policies of deteriorating systems. Eur J Oper Res, 2002, 139: 469–489
14 Xia L, Zhao Q C, Jia Q S. A structure property of optimal policies for maintenance problems with safety-critical components. IEEE Trans Automat Sci Eng, 2008, 5(3): 519–531
15 Sun T, Zhao Q C, Luh P B, et al. Optimization of joint replacement policies for multi-part systems by a rollout framework. IEEE Trans Automat Sci Eng, 2008, 5(4): 609–619
16 Dekker R, Wildeman R E, Van Der Duyn Schouten F A. A review of multi-component maintenance models with economic dependence. Math Method Oper Res, 1997, 45: 411–435
17 Van Der Duyn Schouten F A, Vanneste S G. Two simple control policies for a multi-component maintenance system. Oper Res, 1993, 41: 1125–1136