ILP & ASM DRAFT

Participants:
Stephen Muggleton (University of York)
James Cussens (University of York)
Hendrik Blockeel (University of Leuven)
Alex Bradley (Syllogic Ireland)
Daniel van der Wallen (Syllogic Holland)

This report describes the research done at the University of York. The research was done to give a proof of concept of ILP for use in Adaptive System Management (ASM). For the application we used both Progol (1) and Tilde (2).

Draft version: not all Tilde results and statistics are reported yet (waiting for input from Leuven). Not all relevant tests have been done yet; some tests should still be done in Houten and Dublin.

Table of contents

1 Introduction
2 Data
3 Model
4 Experiments
4.1 Experiment 1
4.1.1 Progol
4.1.2 Tilde
4.2 Experiment 2
4.2.1 Progol
4.2.2 Tilde
4.3 Experiment 3
4.3.1 Progol
4.3.2 Tilde
4.4 Experiment 4
4.4.1 Progol
4.5 Experiment 5
4.5.1 Progol
5 Visualization of the Data
6 Interpretation of the experiments
7 Conclusions
8 Future work

1 Introduction

The current learning techniques in ASM are propositional: the target is related to individual monitors. This means we can find rules of the type: if monitor X > 12 then bad_performance. To interpret those rules we need system management knowledge. For example, when a large number of monitors in such a rule set are disk monitors of a certain server, a system manager could interpret the set as a whole as an indication that the disks on that server are performing badly. So, from this set of rules we need to derive a more useful indication, such as: the disks on server Y are causing low performance. With the given knowledge we may then be able to deduce that the disks are busy swapping memory; at this point a real cause of low performance is found. To automate this we need two things: first, model information, so that we know that a disk runs on a certain server; and secondly, background knowledge with which real causes can be deduced from a set of correlated events. This means we want the model and other background knowledge to be included in the learning process.
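To illustrate the intended use of model information, the following is a minimal Prolog sketch. All predicate names and facts in it (runs_on/2, correlated/1, the disk1 and server_y identifiers) are hypothetical illustrations, not part of the actual ASM model:

% Hypothetical model knowledge: which component a monitor runs on,
% and which higher-level component that is part of.
runs_on(m_free_diskspace, disk1).
runs_on(m_disk_busy,      disk1).
part_of(disk1, server_y).

% Monitors that occur in the learned (propositional) rule set.
correlated(m_free_diskspace).
correlated(m_disk_busy).

% If at least two correlated monitors run on the same component,
% report that component (and what it is part of) as a likely cause.
likely_cause(Component) :-
    runs_on(M1, Component),
    runs_on(M2, Component),
    M1 \= M2,
    correlated(M1),
    correlated(M2).
likely_cause(Higher) :-
    part_of(Component, Higher),
    likely_cause(Component).

A query ?- likely_cause(C) would then return disk1 and server_y, i.e. exactly the component-level diagnosis described above.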
Therefore we did a feasibility study into the possibility of using ILP systems for this type of learning. Besides theoretical issues, such a study should take into account issues like scalability and performance.

2 Data

The data we used is system data from a real computer network. The data set is small compared to the data sets on which the technique is eventually intended to be used. It comes from a computer system that was monitored for two months, every 15 minutes, on approx. 200 different components. This gives a data set of 5000 records (1 million fields), each containing values for the different monitors. The intended size of the data could be something like 12000 components monitored every minute for one month, which gives 500 million fields.

Because we want to compare different monitors at the same time, we have made sure that every monitor has a value for every time slice. This is done by first calculating all the time slices and afterwards filling in the values for all monitors; if a monitor does not have a value, we linearly interpolate between the two time slices for which the monitor did have a value. This probably introduces some noise into the data.

Because of the highly numeric character of these monitors and the symbolic character of ILP, we only looked at interesting values: those values that are 2 standard deviations above or below the average, indicated by high and low. This also reduces the size of the data set to be handled by the ILP algorithm enormously. It does, of course, require a preprocessing step: calculating the average and standard deviation for every monitor over the time period and checking each value to be high, low or normal (a sketch of this step is given at the end of this section).

Next to the monitor data, we use model data, which is discussed in the next section.
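The preprocessing step described above could be sketched as follows. The predicate names raw/3 (interpolated raw measurements) and stats/3 (precomputed mean and standard deviation per monitor) are hypothetical; only the 2-standard-deviation rule itself is taken from the text above:

% level(Value, Mean, Std, Level): discretize a raw value.
level(Value, Mean, Std, hi)     :- Value > Mean + 2*Std.
level(Value, Mean, Std, lo)     :- Value < Mean - 2*Std.
level(Value, Mean, Std, normal) :-
    Value =< Mean + 2*Std,
    Value >= Mean - 2*Std.

% monitor(Time, MonitorId, Level) as used in the experiments below;
% only the interesting (hi/lo) values are kept.
monitor(T, M, Level) :-
    raw(T, M, V),
    stats(M, Mean, Std),
    level(V, Mean, Std, Level),
    Level \= normal.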
3 Model

The model underlying the data is a simplified version of the ASM data model. The model consists of abstract components (cc_component) and instantiations of those components (ci_component). This defines an "isa" relation. Next to this, abstract components are part of higher-level components. This defines the "part-of" relation. A typical example would be: isa(AIXServer, Beta) and partof(Beta, HardDisk).

This gives the following Prolog facts:

class( classid, classname, parentid ).
isa( classid, classid2 ) :- class( classid, _, classid2 ), class( classid2, _, _ ).

This defines the hierarchical isa relations.

instance( instance_id, instance_name, class_id ).
part_of( instance_id, instance_id ).

This defines the part-of relations. So, for every instance we have an "isa" relation, and for every component we have a "part-of" relation (all components are derived from the top-level component).

Monitors are described by three parameters: MonitorId (unique), Name, Component_id. Monitors are linked to the model by the component they run on. So, in Prolog:

monitor( monitor_id, monitor_name, component_id ).

The actual monitor values can be recorded in two ways, 'vertically' or 'horizontally'. Say we have t time slices and n monitors. We can construct, for every time slice, one predicate with arity n, which makes the total number of records t. We can also construct, for every time slice, n predicates with arity 3, which makes the total number of records t*n. For our current data set we have n = 250 and t = 5000, coming down to either 5000 records or about one million records (see Data).

4 Experiments

A number of experiments were done with both Progol and Tilde. The first experiment was the propositional case, where the "unhappiness" of the system (defined as a threshold on a monitor) is related to individual monitors. The reason for this experiment is to compare the results with previous results found with decision trees and association rules. The second experiment was done with time as a variable: the systems could use the predicate prev(Time, Time-1), so rules can be found that relate the target at time t to the monitors at time t-1. The third experiment was like the second, but here the systems were allowed to look back two periods in time. The fourth experiment was done to see if existing rules could be further refined: the rules found in experiments 2 and 3 are added to the background knowledge and the system is run again. This allows longer rules to come into the theory.

4.1 Experiment 1

The first experiment was the propositional case, where the "unhappiness" of the system (defined as a threshold on a monitor) is related to individual monitors. The data was discretized by:

Value = high  if  Value > average + 2 std. dev.
Value = low   if  Value < average - 2 std. dev.

So, we expect rules like:

unhappy(Time) :- monitor(Time, 'free diskspace', low).

4.1.1 Progol

Progol settings:

:- set(i,2)?
:- set(noise,5)?
:- set(nodes,200)?

The theory found by Progol was:

unhappy(A) :-
    monitor(A, m7_cpu_system_s11 (24), hi),
    monitor(A, m14_nfs_client_calls_s11 (30), hi).

Runtime: 20 minutes.

The accuracy of this theory:

Contingency table (rows: predicted; columns: actual; expected counts in parentheses):

                 A          ~A       total
P              157          53         210
           (   9.7)    ( 200.3)
~P              75        4714        4789
           ( 222.3)    (4566.7)
total          232        4767        4999

Overall accuracy = 97.44% +/- 0.22%
Chi-square = 2418.98 (without Yates correction: 2435.49)
Chi-square probability = 0.0000

4.1.2 Tilde

Induction time: 1534.59 seconds.

Theory found by Tilde:

modelid(A)

monitor(A,157,hi) ?
+--yes: unhappy [136 / 136]
+--no:  monitor(A,58,hi) ?
        +--yes: monitor(A,135,lo) ?
        |       +--yes: happy [12 / 20]
        |       +--no:  unhappy [33 / 34]
        +--no:  monitor(A,82,hi) ?
                +--yes: monitor(A,49,hi) ?
                |       +--yes: happy [3 / 4]
                |       +--no:  unhappy [4 / 4]
                +--no:  happy [284 / 334]

Contingency table: not yet reported (see the draft note above).

4.2 Experiment 2

4.2.1 Progol

Runtime: 35 minutes.

Theory found:

Rule 1:
unhappy(A) :-
    prev(A,B),
    monitor(B, m6_cpu_busy_s11 (23), hi),
    monitor(B, m75_nr_rec_breq_s11 (58), hi).

Rule 2:
unhappy(A) :-
    prev(A,B),
    monitor(B, m2_number_of_processes_s11 (19), hi),
    monitor(B, m201_number_of_users_s2 (134), hi),
    monitor(B, m203_paging_space_used_s2 (136), lo).

Rule 3:
unhappy(A) :-
    prev(A,B),
    monitor(B, m3_paging_space_used_s11 (20), hi),
    monitor(B, m121_545h502_s11 (104), lo),
    monitor(B, daytime (227), lo).

Contingency table:

                 A          ~A       total
P              169          54         223
           (  10.3)    ( 212.7)
~P              63        4713        4776
           ( 221.7)    (4554.3)
total          232        4767        4999

Overall accuracy = 97.66% +/- 0.21%
Chi-square = 2652.71 (without Yates correction: 2669.51)
Chi-square probability = 0.0000

Distribution of positive coverage:
Rule 1: 157
Rule 2: 5
Rule 3: 100

Distribution of errors of commission (negative examples):
Rule 1: 16
Rule 2: 12
Rule 3: 26
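The rules above use the background predicate prev/2, which relates a time slice to the immediately preceding one. Its actual encoding is not shown in this report; a minimal definition, assuming time slices are identified by consecutive integers (an assumption), could be:

% prev(T, T1): T1 is the time slice immediately before T.
% Assumes consecutive integer time-slice identifiers.
prev(T, T1) :-
    integer(T),
    T > 1,
    T1 is T - 1.

The two-period look-back of experiment 3 then simply corresponds to chaining this predicate: prev(A,B), prev(B,C).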
4.2.2 Tilde

Induction time: 601.3 seconds.

Theory found by Tilde:

modelid(A), prev(A,B)

monitor(B,58,hi) ?
+--yes: monitor(B,86,hi) ?
|       +--yes: monitor(B,49,hi) ?
|       |       +--yes: unhappy [3 / 3]
|       |       +--no:  happy [11 / 17]
|       +--no:  unhappy [168 / 169]
+--no:  monitor(B,82,hi) ?
        +--yes: monitor(B,98,hi) ?
        |       +--yes: happy [2 / 2]
        |       +--no:  monitor(B,134,hi) ?
        |               +--yes: unhappy [3 / 3]
        |               +--no:  monitor(B,20,hi) ?
        |                       +--yes: happy [2 / 2]
        |                       +--no:  unhappy [5 / 7]
        +--no:  monitor(B,84,hi) ?
                +--yes: monitor(B,227,lo) ?
                |       +--yes: unhappy [3 / 3]
                |       +--no:  monitor(B,52,hi) ?
                |               +--yes: happy [5 / 5]
                |               +--no:  monitor(B,128,hi) ?
                |                       +--yes: unhappy [4 / 5]
                |                       +--no:  monitor(B,48,lo) ?
                |                               +--yes: unhappy [2 / 3]
                |                               +--no:  happy [16 / 24]
                +--no:  happy [259 / 289]

Contingency table:

                 A          ~A       total
P              188         222         410
           (  19.0)    ( 391.0)
~P              44        4544        4588
           ( 213.0)    (4375.0)
total          232        4767        4999

Overall accuracy = 94.68% +/- 0.32%
Chi-square = 1703.63 (without Yates correction: 1713.76)
Chi-square probability = 0.0000

4.3 Experiment 3

4.3.1 Progol

Theory given by Progol:

Rule 1:
unhappy(A) :-
    prev(A,B),
    monitor(B, m75_nr_rec_breq_s11 (58), hi),
    monitor(B, m121_545h502_s11 (104), lo).

Rule 2:
unhappy(A) :-
    prev(A,B),
    prev(B,C),
    monitor(B, m101_vs651fde_filetime_s11 (84), hi),
    monitor(C, m99_scar_rem_accestime_s11 (82), hi).

Rule 3:
unhappy(A) :-
    prev(A,B),
    prev(B,C),
    monitor(C, m43_ora_sga_free_memory_s11 (49), lo),
    monitor(C, m304_system_load_s4 (172), hi).

Rule 4:
unhappy(A) :-
    prev(A,B),
    monitor(B, m2_number_of_processes_s11 (19), hi),
    monitor(B, m201_number_of_users_s2 (134), hi),
    monitor(B, m203_paging_space_used_s2 (136), lo).

Contingency table:

                 A          ~A       total
P              178         167         345
           (  16.0)    ( 329.0)
~P              54        4600        4654
           ( 216.0)    (4438.0)
total          232        4767        4999

Overall accuracy = 95.58% +/- 0.29%
Chi-square = 1834.66 (without Yates correction: 1846.04)
Chi-square probability = 0.0000

Distribution of positive coverage:
Rule 1: 168
Rule 2: 5
Rule 3: 144
Rule 4: 5

Distribution of errors of commission (negative examples):
Rule 1: 50
Rule 2: 8
Rule 3: 100
Rule 4: 12

4.3.2 Tilde

Induction time: 4576.69 seconds.

Theory found by Tilde:

modelid(A), prev(A,B), prev(B,C)

monitor(C,58,hi) ?
+--yes: monitor(C,104,lo) ?
|       +--yes: unhappy [168 / 169]
|       +--no:  monitor(B,49,hi) ?
|               +--yes: unhappy [3 / 3]
|               +--no:  monitor(C,111,hi) ?
|                       +--yes: unhappy [2 / 2]
|                       +--no:  happy [11 / 15]
+--no:  monitor(C,82,hi) ?
        +--yes: monitor(B,49,hi) ?
        |       +--yes: happy [4 / 5]
        |       +--no:  unhappy [8 / 8]
        +--no:  monitor(C,28,hi) ?
                +--yes: unhappy [3 / 4]
                +--no:  monitor(B,28,hi) ?
                        +--yes: unhappy [3 / 4]
                        +--no:  monitor(C,45,hi) ?
                                +--yes: unhappy [2 / 3]
                                +--no:  monitor(C,84,hi) ?
                                        +--yes: monitor(C,177,hi) ?
                                        |       +--yes: unhappy [3 / 3]
                                        |       +--no:  monitor(C,26,hi) ?
                                        |               +--yes: unhappy [2 / 2]
                                        |               +--no:  monitor(B,179,hi) ?
                                        |                       +--yes: unhappy [2 / 2]
                                        |                       +--no:  happy [24 / 34]
                                        +--no:  monitor(B,82,hi) ?
                                                +--yes: monitor(C,207,hi) ?
                                                |       +--yes: unhappy [2 / 2]
                                                |       +--no:  happy [2 / 3]
                                                +--no:  happy [255 / 273]

Contingency table:

                 A          ~A       total
P              198         367         565
           (  26.2)    ( 538.8)
~P              34        4399        4433
           ( 205.8)    (4227.2)
total          232        4767        4999

Overall accuracy = 91.98% +/- 0.38%
Chi-square = 1322.46 (without Yates correction: 1330.19)
Chi-square probability = 0.0000

4.4 Experiment 4

This experiment includes the significant rules from experiments 2 and 3 above as background knowledge. The goal of this experiment is to see whether or not longer and better rules are found. To this end the search is restricted by pruning all nodes that do not include a rule/2 term (see the sketch after the rule list below). The rules that were used as background knowledge are the following:

rule(2, A) :-
    prev(A,B),
    monitor(B, m75_nr_rec_breq_s11 (58), hi),
    monitor(B, m121_545h502_s11 (104), lo).

rule(4, A) :-
    prev(A,B),
    prev(B,C),
    monitor(C, m43_ora_sga_free_memory_s11 (49), lo),
    monitor(C, m304_system_load_s4 (172), hi).

rule(6, A) :-
    prev(A,B),
    monitor(B, m6_cpu_busy_s11 (23), hi),
    monitor(B, m75_nr_rec_breq_s11 (58), hi).

rule(8, A) :-
    prev(A,B),
    monitor(B, m3_paging_space_used_s11 (20), hi),
    monitor(B, m121_545h502_s11 (104), lo),
    monitor(B, daytime (227), lo).
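The restriction that every clause must contain a rule/2 literal could be expressed as follows. This is a sketch in the style of CProgol's mode declarations and user-defined prune/1 hook; the exact declarations used in the experiment are not shown in this report, and the type names (time, monitorid, level) are assumptions:

% Mode declarations: heads are unhappy/1, bodies may use the stored
% rules, the previous time slice and monitor values.
:- modeh(1, unhappy(+time))?
:- modeb(1, rule(#int, +time))?
:- modeb(1, prev(+time, -time))?
:- modeb(*, monitor(+time, #monitorid, #level))?

% User-defined pruning: discard every search node (candidate clause)
% whose body contains no rule/2 literal, so each learned clause
% refines one of the stored rules.
prune((_Head :- Body)) :-
    not(in_body(rule(_, _), Body)).

% in_body(Goal, Body): Goal occurs as a literal in the conjunction Body.
in_body(G, G).
in_body(G, (G, _)).
in_body(G, (_, Rest)) :- in_body(G, Rest).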
4.4.1 Progol

Theory found by Progol:

Rule 1: unhappy(A) :- rule(6,A).
Rule 2: unhappy(A) :- rule(8,A).
Rule 3: unhappy(A) :- rule(2,A), prev(A,B), monitor(B,21,hi).
Rule 4: unhappy(A) :- rule(2,A), prev(A,B), monitor(B,154,hi).

Contingency table:

                 A          ~A       total
P              168          63         231
           (  10.7)    ( 220.3)
~P              64        4704        4768
           ( 221.3)    (4546.7)
total          232        4767        4999

Overall accuracy = 97.46% +/- 0.22%
Chi-square = 2520.85 (without Yates correction: 2536.95)

Distribution of positive coverage:
Rule 1: 157
Rule 2: 100
Rule 3: 7
Rule 4: 79

Distribution of errors of commission (negative examples):
Rule 1: 16
Rule 2: 26
Rule 3: 15
Rule 4: 24

4.5 Experiment 5

The fifth experiment was done with the inclusion of a part of the model information from the system (see Model). This, of course, is a non-propositional experiment. The data consists of the same data as in the previous experiments plus the relational information from the model. We allowed only relational attributes in the rules.

4.5.1 Progol

Progol settings were:

hi => avg + 4 std. dev. (!)
set(i, 3)
set(nodes, 1000)

Progol generated the following two rules:

unhappy(A) :-
    prev(A,B),
    monitor(B,C,hi),
    monitorclass_id(C, Breq Table (1045)).

unhappy(A) :-
    prev(A,B),
    monitor(B,C,hi),
    monitorclass_id(C, NFS server (1026)).

Contingency table:

                 A          ~A       total
P              186         335         521
           (  24.2)    ( 496.8)
~P              46        4432        4478
           ( 207.8)    (4270.2)
total          232        4767        4999

Overall accuracy = 92.38% +/- 0.38%
Chi-square = 1260.01 (without Yates correction: 1267.84)
Chi-square probability = 0.0000
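The monitorclass_id/2 predicate in the rules above links a monitor to the class of the component it runs on. Its definition is not shown in this report, but given the schema of section 3 a plausible sketch is:

% monitorclass_id(MonitorId, ClassId): the class of the component the
% monitor runs on, derived from the model facts of section 3.
monitorclass_id(MonitorId, ClassId) :-
    monitor(MonitorId, _MonitorName, ComponentId),
    instance(ComponentId, _InstanceName, ClassId).

This is exactly the kind of hook that lets Progol generalize over component classes (e.g. "NFS server") instead of individual monitors.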
5 Visualization of the Data

To get more insight into the data we plotted some of the monitors that occur in the rules against time and/or the target.

Figure 1: CPU system % on s11 over time
Figure 2: CPU system % on s11 against the target

6 Interpretation of the experiments

The experiments done with Tilde and Progol all give a predictive accuracy between 92% and 97% on the complete data. The default theory of always predicting happy gives an accuracy of 95%, but we have to take into account that the theories were learned on all positive examples and a small subset of the negatives, which together form about 10% of the complete data. The general problem with data sets such as this one is the skewness of the distribution between positives and negatives: a system is performing well most of the time and badly sometimes. The training set is equalized by sampling a small subset of the negatives, giving the default theory an accuracy of about 50%. The theories have a predictive accuracy of at least 88% on the training set.

We have seen that in a short time, approx. 20 minutes, Progol could find a relatively simple theory of only 1-4 rules. Tilde took about equally long, giving more complex theories of 10-20 rules. Although not exactly the same theories were found, both systems came up with many of the same monitors. The time was included to increase the 'predictiveness' of the rules. Many of the propositional clauses explain approximately the same data, so there is a large overlap between the theories. Tilde had a lower overall predictive accuracy than Progol but made fewer errors on the positives. We should be careful, though, in comparing these results, because for both systems the positive part of the test set has also been used for training.

The theories leave approx. 20% of the unhappies (positives) unexplained. This could mean that in real systems, too, 80% of performance problems can be predicted and 20% cannot. It could, however, also mean that the data is not very clean. As can be seen from the visualization of some of the attributes, the data is corrupted for some time periods. Partly due to the mechanism that was used to retrieve the data (see Data), the data has certain null areas in which no values were recorded. These introduce noise in the data as well as in the discretized data, because they affect the mean and standard deviation.

When including the model information and restricting Progol so that it uses no specific monitor information, we get a model of only two rules. These two rules predict worse overall than the propositional theories (92.3% versus 97.8%), but score better on the positives (80.1% versus 72%). In addition, the theory is more sensible: it says more about the domain because it has a richer language. The theory found contained a rule predicting unhappy when there is a high monitor of class NFS server. NFS server is a component class on which several NFS monitors are defined. This gives exactly the kind of generalisation we want: 'an NFS problem' is the level of abstraction system managers use.

The time the systems took for the induction was approx. 20-60 minutes for the propositional case and approx. 180 minutes for the first-order case. For background processing this is reasonable; for online analysis it is not. The implementations of the systems are academic, meaning that efficiency and performance were not the first concern of the developers. Because the search space can be restricted considerably and the implementations can be improved, we can probably bring this time down considerably.

7 Conclusions

8 Future work