Investigation of the Fuzzy System for the
Assessment of Cadastre Operators’ Work

Dariusz Król1, Grzegorz Kukla1, Tadeusz Lasota2, and Bogdan Trawiński1

1 Institute of Applied Informatics, Wroclaw University of Technology, Poland
[email protected] grzegorz [email protected] [email protected]
2 Faculty of Environmental Engineering and Geodesy, Agricultural University of
Wroclaw, Poland [email protected]

1 Introduction
Cadastre systems are mission-critical systems designed for the registration of
parcels, buildings and apartments as well as their owners and users. These
systems have complex data structures and sophisticated data processing
procedures. They are built in client-server architecture for LANs as well
as in Web technology for use in intranets and extranets. There are over
400 information centres located in district local self-governments as well as
in the municipalities of bigger towns. Managers of information centres often
complain that they have no adequate tools for assessing the work of cadastre
system operators. A fuzzy model for the assessment of operators’ work
in a cadastre information system was proposed in [6]. Following the centre
managers’ suggestions, four input criteria were designed: productivity,
complexity, time and quality. Productivity was expressed by the
number of changes input into the cadastre database within a given period;
complexity was specified as the mean number of objects modified in the
database per change; time was determined by the average time of inputting
one change; and quality was defined as the percentage of changes without any
corrections.
The results of the investigation of the fuzzy model are discussed in the
present paper. The tests have been carried out using real data taken from one
cadastre centre. The data comprised all change records input into cadastre
database during the period of one year from October 2004 till September
2005.
In general, numerous methods are used for determining the optimal parameters
of fuzzy models [7], including neuro-fuzzy systems
[1], genetic algorithms [4] and fuzzy clustering techniques [3]. Fuzzy models
are also evaluated using specific analyses such as interpretability [9],
sensitivity [8] or regression [2]. In our approach descriptive statistics,
correlation, multiple regression and the distance between rankings have been
used in the analysis of the test results.
2 The Structure of the Fuzzy System
2.1 General description of the system
The Mamdani fuzzy model with Larsen’s implication proposed in [6] constitutes
the basis of the fuzzy system, which is intended to rationalize the
management of information centres, to improve the organization of work and
to determine the wages of part-time employees. The general architecture of the
fuzzy system is shown in Fig. 1. It comprises five main modules: operators’
work statistics, fuzzification, inference, defuzzification and visualization.
Fig. 1. Architecture of the fuzzy system for the assessment of operators’ work
For each input criterion, i.e. productivity (P), complexity (C), time (T)
and quality (Q), as well as for the output assessment, triangular and
trapezoidal membership functions have been defined. The statistics module
provides the initial parameters of the model and the values of the input
criteria. The final assessment is based on the average values of the P, C, Q
and T criteria, calculated from the change records saved in the cadastre
database for all operators over a long period of time, e.g. a year or half a
year. These average values serve as the reference values of 100%, against
which the percentage achieved by a given operator within an assessment period
is calculated. The domain for Q, P and T has been set from 0% to
200%, which means that if an operator achieves a result better than 200% of
the mean during the assessment period, the result is trimmed to 200%. Data for
the quality variable are used directly, because this criterion is expressed in
percent. Standard deviations, calculated for each criterion separately,
determine the initial width of the base of the triangles and trapezoids. The
domain for the output is an arbitrary assessment scale from 0 to 200, with 0
being the lowest and 200 the highest mark.
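
The scaling of a criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalise_criterion` and the sample change counts are hypothetical and do not come from the cadastre database.

```python
from statistics import mean

def normalise_criterion(operator_value, all_values, cap=200.0):
    """Express a criterion as a percentage of the overall mean, trimmed to `cap`."""
    reference = mean(all_values)                 # the overall mean plays the role of 100%
    percent = 100.0 * operator_value / reference
    return min(percent, cap)                     # results above 200% are trimmed

# Hypothetical numbers of changes per operator (illustrative only).
productivity_all = [120, 80, 100, 60, 140]
print(normalise_criterion(150, productivity_all))  # 150.0
print(normalise_criterion(260, productivity_all))  # 200.0 (trimmed)
```
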
2.2 Characteristics of the inputs and the output of the system
The first step of the data analysis was to examine significant relations
between the input criteria. Work statistics for a period of 12 months for 10
operators were calculated and the values of the input criteria were obtained.
Criteria with zero values, for months in which a given operator did not work,
and criteria with values trimmed to 200, for months in which a given operator
exceeded 200 percent of the average, were neglected, so the correlation matrix
was finally calculated for 97 quadruplets of input values (see Table
1). The correlation coefficients between T and C as well as between T and P
turned out to be significant. This result led to the decision to remove the
input variable Time from the fuzzy model.
Table 1. Correlation matrix for input criteria

               Complexity  Productivity  Quality  Time
Complexity        1.0
Productivity     −0.04        1.0
Quality           0.05        0.11        1.0
Time             −0.45        0.27        0.01    1.00
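
A correlation matrix of this kind can be obtained from the Pearson coefficient of each pair of criteria. The sketch below is a minimal pure-Python version; the function names and the idea of passing the criteria as a dictionary of columns are assumptions made for illustration, and the paper does not state which tool was actually used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map every (criterion, criterion) pair to its correlation coefficient."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

Called with four columns named C, P, Q and T, `correlation_matrix` returns all sixteen entries; Table 1 shows only the lower triangle because the matrix is symmetric.
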
Fig. 2. Examples of membership functions of input and output variables
Three main models of input and output variables have been programmed
and named 3x5, 5x7 and 7x9. In the 3x5 model 3 fuzzy sets define each
input criterion and the output is defined by 5 fuzzy sets. In the 5x7 and 7x9
models there are 5 and 7 fuzzy sets for each input and 7 and 9 fuzzy sets for
the output respectively. Examples of fuzzy membership functions used to define
the C, P and Q criteria and the output are shown in Fig. 2, where EL, VL, L, BM,
M, AM, H, VH, EH denote Extremely Low, Very Low, Low, Below Medium,
Medium, Above Medium, High, Very High and Extremely High respectively.
Delta (∆) is a parameter which determines the width of the base of a
triangular membership function. The initial value of ∆ is set to the standard
deviation (σ), calculated for each criterion separately. During the tests the
value of ∆ was changed from 1.0σ to 0.2σ.
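
To make the role of ∆ concrete, the following sketch evaluates a symmetric triangular membership function. It assumes that ∆ is used as the half-width of the triangle base; the text only says that ∆ determines the width, so this reading, like the value σ = 30, is an assumption made for illustration.

```python
def triangular(x, centre, half_base):
    """Membership degree of x in a symmetric triangle peaking at `centre`."""
    if half_base <= 0:
        raise ValueError("half_base must be positive")
    return max(0.0, 1.0 - abs(x - centre) / half_base)

sigma = 30.0           # hypothetical standard deviation of a criterion
delta = 0.6 * sigma    # one of the tested settings, 0.6 sigma
print(triangular(100.0, 100.0, delta))  # 1.0 at the peak
print(triangular(109.0, 100.0, delta))  # 0.5 halfway down the slope
print(triangular(120.0, 100.0, delta))  # 0.0 outside the base
```

Under this reading, a smaller ∆ gives narrower triangles, so neighbouring fuzzy sets overlap less.
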
2.3 Rule base and inference process
The rule base for each model contains simple IF-THEN rules in which the
condition consists of only two input variables combined by the AND operator and
the conclusion contains one variable. An example of a rule is as follows:
IF Complexity is low AND Productivity is medium THEN Assessment is low.
Fig. 3. Representation of rule base in matrix form for 3x5 and 5x7 models
Thus the rules for one pair of input criteria can be given in the form
of a matrix, as shown in Fig. 3 and 4. Three matrices, one for each pair of
input criteria, i.e. (C,P), (C,Q) and (P,Q), have been designed, each
comprising 9, 25 or 49 rules for the 3x5, 5x7 and 7x9 models respectively. In
order to express the strength of the rules belonging to a particular
combination, rule weights can be assigned to each combination as multipliers
of the rule conditions in the aggregation step of the inference module, for
example w(C,P) = 0.60, w(C,Q) = 0.20 and w(P,Q) = 0.20.
In order to assure that each input value and each rule has an impact
on the final assessment, the following operators have been used: PROD for the
aggregation of rule conditions, PROD for the activation of rule conclusions and
ASUM for the accumulation of output membership functions, where PROD means
algebraic product and ASUM denotes algebraic sum [5]. In the defuzzification
step the centre of gravity method has been used.
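
The inference chain described above (PROD aggregation, PROD activation, ASUM accumulation, centre of gravity) can be sketched for a single pair of inputs as follows. Everything specific in this sketch is assumed for illustration: the set centres, the half-base of 50, the rule matrix built as "output index = sum of input indices", and the discretisation of the output domain. It is not the rule base of Fig. 3, and it covers only the (C,P) pair rather than the weighted combination of all three pairs.

```python
def tri(x, centre, half_base):
    """Triangular membership with its peak at `centre`."""
    return max(0.0, 1.0 - abs(x - centre) / half_base)

IN_LABELS = ["low", "medium", "high"]
OUT_LABELS = ["very low", "low", "medium", "high", "very high"]
IN_CENTRES = dict(zip(IN_LABELS, [50.0, 100.0, 150.0]))                 # assumed
OUT_CENTRES = dict(zip(OUT_LABELS, [0.0, 50.0, 100.0, 150.0, 200.0]))   # assumed
HALF_BASE = 50.0                                                        # assumed

# Hypothetical 3x3 rule matrix: the output label index is the sum of the
# two input label indices, e.g. low AND medium -> low.
RULES = [(a, b, OUT_LABELS[i + j])
         for i, a in enumerate(IN_LABELS)
         for j, b in enumerate(IN_LABELS)]

def assess(c, p, weight=1.0, steps=401):
    """Assessment on the 0..200 scale for complexity c and productivity p."""
    ys = [200.0 * k / (steps - 1) for k in range(steps)]
    accumulated = [0.0] * steps
    for a, b, out in RULES:
        # PROD aggregation of the two conditions, multiplied by the rule weight.
        firing = weight * tri(c, IN_CENTRES[a], HALF_BASE) * tri(p, IN_CENTRES[b], HALF_BASE)
        if firing == 0.0:
            continue
        for k, y in enumerate(ys):
            # PROD activation scales the output set by the firing strength.
            activated = firing * tri(y, OUT_CENTRES[out], HALF_BASE)
            # ASUM accumulation: algebraic sum a + b - a*b.
            accumulated[k] += activated - accumulated[k] * activated
    area = sum(accumulated)
    if area == 0.0:
        return None  # no rule fired
    # Centre of gravity of the accumulated output membership function.
    return sum(y * m for y, m in zip(ys, accumulated)) / area
```

With these assumed sets, inputs at 100% of the mean fire only the medium/medium rule and yield an assessment of about 100, while higher inputs move the centre of gravity towards 200.
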
Fig. 4. Representation of rule base in matrix form for 7x9 models
3 Plan of the Investigation
The experiment has been carried out using the cadastre database taken from one
information centre and the change records added to the database by 10 operators
during the period of one year from October 2004 till September 2005. The
fuzzy system has been treated as a black box, which means that only the input
values of Complexity, Productivity and Quality and the corresponding output
assessments have been taken into account in the study. Multiple regression,
descriptive statistics, and the distance between rankings have been used in the
analysis of the test results. In order to examine how the output assessments
change for different parameters of the system, 180 variants of the fuzzy model
have been constructed by a simulation program and tested. The variants covered
all possible combinations of the three basic 3x5, 5x7 and 7x9 models, five
values of the ∆ parameter determining the widths of the triangle bases, three
sets of rules and four sets of rule weights. Each variant of the model has been
coded according to the method shown in Table 2, where (1), (2), (3) and (4)
next to a column caption indicate the position of the digit in the code. For
example, 7413 denotes the 7x9 model with ∆=0.4σ using the rule sets (C,P) with
a weight of 0.8, (C,Q) with a weight of 0.4 and (P,Q) with a weight of 0.1. In
turn, 3134 denotes the 3x5 model with ∆=1.0σ using the rule matrices (C,P)
with a weight of 0.6 and (P,Q) with a weight of 0.2.
The purpose of the experiment was to examine how the input values influence
the output of the system, how well the assessments produced by the system
differentiate the results of operators’ work and how close the system
assessments are to the manager’s subjective judgments.
Table 2. Coding method of variants tested

C(1)  Model   C(2)  Delta   C(3)  Rule sets                C(4)  Weights
3     3x5     1     1.0σ    1     (C,P), (C,Q), (P,Q)      1     1.0, 1.0, 1.0
5     5x7     2     0.8σ    2     (C,P), (C,Q)             2     0.9, 0.6, 0.3
7     7x9     3     0.6σ    3     (C,P), (P,Q)             3     0.8, 0.4, 0.1
              4     0.4σ                                   4     0.6, 0.2, 0.2
              5     0.2σ
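
The coding scheme of Table 2 can be expressed as a small lookup, which also confirms that the four code positions yield 3 × 5 × 3 × 4 = 180 variants. The dictionary and function names below are illustrative only and are not taken from the simulation program.

```python
from itertools import product

MODELS = {"3": "3x5", "5": "5x7", "7": "7x9"}
DELTAS = {"1": 1.0, "2": 0.8, "3": 0.6, "4": 0.4, "5": 0.2}   # multiples of sigma
RULE_SETS = {"1": ("CP", "CQ", "PQ"), "2": ("CP", "CQ"), "3": ("CP", "PQ")}
WEIGHTS = {"1": (1.0, 1.0, 1.0), "2": (0.9, 0.6, 0.3),
           "3": (0.8, 0.4, 0.1), "4": (0.6, 0.2, 0.2)}        # order: CP, CQ, PQ

def decode(code):
    """Translate a four-digit variant code such as '7413' into its settings."""
    m, d, r, w = code
    return {"model": MODELS[m], "delta_sigma": DELTAS[d],
            "rule_sets": RULE_SETS[r], "weights": WEIGHTS[w]}

# Every combination of the four code positions.
ALL_CODES = ["".join(c) for c in product(MODELS, DELTAS, RULE_SETS, WEIGHTS)]

print(len(ALL_CODES))   # 180
print(decode("7413"))
```
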
4 Results of the Investigation
Data for the analysis of descriptive statistics and multiple linear regression
have been prepared in the same way as the data used in the correlation study.
Input criteria with zero values, for months in which a given operator
did not work, and criteria with values trimmed to 200, for months in which a
given operator exceeded 200 percent of the average,
were neglected. However, the comparison of the assessments produced by the
fuzzy system with the subjective judgments of an information centre manager has
been conducted using statistical data on the changes added by operators to the
cadastre system during September 2005.
4.1 Multiple linear regression analysis
Multiple linear regression with no intercept has been calculated for all 180
models. In each case the coefficient R was greater than 0.9 (minimum value
0.935 and maximum value 0.997), the F-value ranged between 219.473
and 4760.762 and the p-value was very close to zero. This indicates that the
input criteria are strongly related to the output assessments. The analysis of
the β coefficients has revealed that the p-value for the βQ coefficient of the
Quality variable was greater than 0.05 in 37 cases, i.e. 20.6%. Moreover, the
value of the βQ coefficient was negative in 156 cases, which may be interpreted
as operators achieving better complexity or productivity at the cost of
decreasing quality. The results of the regression analysis of 9 selected models
for which all βC, βP and βQ coefficients were significant are shown in Table 3.
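
A regression without intercept of the kind used here can be sketched by solving the normal equations directly. The sketch makes two assumptions that the text does not confirm: R is computed from the uncentred total sum of squares, which is a common convention for models without an intercept, and the returned coefficients are raw rather than standardised. The function names are illustrative.

```python
from math import sqrt

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def regress_no_intercept(rows, y):
    """Least-squares coefficients and multiple R for y = b1*x1 + ... + bp*xp."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
    sst = sum(v * v for v in y)      # uncentred, because there is no intercept
    return beta, sqrt(max(0.0, 1.0 - sse / sst))
```

Each row of `rows` would hold the (C, P, Q) values of one operator-month and `y` the corresponding assessments produced by one model variant.
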
Table 3. Results of multiple linear regression analysis for 9 selected models

Model type  Model code  Multiple R  F-value    βC      βP      βQ
3x5         3111        0.984        933.819   0.326   0.411    0.201
3x5         3223        0.981        807.087   0.626   0.439   −0.200
3x5         3434        0.969        475.597   0.538   0.724   −0.453
5x7         5131        0.993       2264.987   0.256   0.543    0.289
5x7         5313        0.984        973.042   0.617   0.447   −0.137
5x7         5422        0.971        507.758   0.706   0.402   −0.285
7x9         7312        0.965        426.105   0.555   0.516   −0.220
7x9         7424        0.956        331.688   0.759   0.627   −0.613
7x9         7531        0.948        275.480   0.392   0.791   −0.498

4.2 Analysis of descriptive statistics
The general question addressed by the analysis of descriptive statistics was
how well the output generated by the system differentiates the results of
operators’ work. Two measures have been taken into account: the
variability coefficient, expressed as the standard deviation divided by
the mean, and the range, which equals the difference between the maximum and
minimum assessments. It may be expected that the more differentiated the
results provided by the fuzzy system, the better it will assist managers in
assessing their workers. The variability coefficient calculated for the 180
models took values between 0.283 and 0.773, and the range values between 128
and 182. The values of the variability coefficient presented in Fig. 5 are
greater for 7x9 models than for 3x5 and 5x7 models.
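
Both measures are straightforward to compute for the assessments produced by one model variant. In the sketch below the sample standard deviation is used; the text does not say whether the sample or the population formula was applied, so this choice and the example assessments are assumptions.

```python
from statistics import mean, stdev

def variability_coefficient(assessments):
    """Standard deviation divided by the mean (sample standard deviation assumed)."""
    return stdev(assessments) / mean(assessments)

def assessment_range(assessments):
    """Difference between the maximum and the minimum assessment."""
    return max(assessments) - min(assessments)

scores = [50, 100, 150]                  # hypothetical assessments on the 0..200 scale
print(variability_coefficient(scores))   # 0.5
print(assessment_range(scores))          # 100
```
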
Fig. 5. Values of variability coefficient for 180 models tested
Similar conclusions can be drawn from examining the output surface of the
fuzzy system. The plots generated by the Matlab Surface Viewer have shown that
the 7x9 model assures more distinguishable assessments than those
produced by the 3x5 model (see Fig. 6).
Fig. 7a presents the range values for different ∆ sizes for corresponding 3x5,
5x7 and 7x9 models (the same rule sets: (C,P), (C,Q), (P,Q) and the same weight
variant: 0.6, 0.2, 0.2), where 1, 2, 3, 4, 5 on the X axis denote ∆
equal to 1.0σ, 0.8σ, 0.6σ, 0.4σ, 0.2σ respectively. Fig. 7b presents the range
values for different rule weight variants for corresponding 3x5, 5x7 and 7x9
models (the same rule sets: (C,P), (C,Q), (P,Q) and the same ∆=0.6σ),
where 1, 2, 3, 4 on the X axis denote the variants (1.0, 1.0, 1.0),
(0.9, 0.6, 0.3), (0.8, 0.4, 0.1) and (0.6, 0.2, 0.2) respectively. Both
Fig. 7a and 7b clearly show that the 7x9 models provide more distinguishable
results than the other models.
Fig. 6. Assessment surface versus C and P criteria for 3x5 and 7x9 models
Fig. 7. Value of a range for a) different variants of ∆ and for b) different variants
of rule weights
4.3 Comparison of the assessments assigned by the system and by a centre manager
In order to evaluate how the output of the system is related to a centre
manager’s judgment, one information centre manager was asked to appraise his
operators’ work in September 2005. He was not informed how the fuzzy system
worked and he did not see the results of the statistics module, so his
judgments were entirely subjective. The manager was able to give only
relatively rough assessments expressed in percent: 150%, 120%, 120%, 120%,
110%, 100%, 80%, 80%, and 70% for successive operators. It can easily be seen
that the manager had difficulties in differentiating individual operators.
Nevertheless, in the case of equal assessments he was asked to rank the
operators, so we were able to compare the ranking determined by the manager
with those produced by the fuzzy system. The tenth operator was not classified
by the manager, who stated that this operator fulfilled different tasks and
added changes to the cadastre database only sporadically; the operator was
therefore assigned the last position in the manager’s ranking. We used the
following measure of the distance between these two rankings:
$$D_{Rank} = \sum_{i=1}^{10} \left| r_{mi} - r_{si} \right| \qquad (1)$$
where $r_{mi}$ denotes the position of the i-th operator in the manager’s
ranking and $r_{si}$ the position of the i-th operator in the ranking produced
by the fuzzy system. The $D_{Rank}$ measure was calculated for each of the 180
models tested and its value ranged between 18 and 26.
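
The distance of formula (1) is the sum of absolute differences between rank positions. A minimal sketch follows; the two three-operator rankings are hypothetical and serve only to show the computation.

```python
def rank_distance(manager_rank, system_rank):
    """Sum over operators of |manager position - system position|, as in (1)."""
    return sum(abs(manager_rank[op] - system_rank[op]) for op in manager_rank)

# Hypothetical rank positions for three operators (illustrative only).
manager = {"a": 1, "b": 2, "c": 3}
system = {"a": 1, "b": 3, "c": 2}
print(rank_distance(manager, system))  # 2
```

A value of 0 would mean that both rankings are identical; larger values indicate greater disagreement.
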
Rank positions of individual operators produced by the system for three
selected 3x5, 5x7 and 7x9 models and positions assigned by the centre manager
are presented in Fig. 8.
Fig. 8. Rank positions assigned to individual operators by the manager and the
system
It can be seen that the manager and the system equally recognized the
best and the worst operators. However, the manager clearly underestimated
operator c and operator d. This may have been caused by the manager’s
subjective approach: for example, when assessing operator d’s work at 70%, he
stated that the operator was admittedly a very experienced person but tended
to work slowly. It is also possible that there are other criteria of
operators’ work assessment, perhaps even immeasurable ones, which the manager
took into consideration.
5 Conclusions and Future Work
The fuzzy system for the multi-criteria assessment of information system
operators’ work has been evaluated using real data taken from one cadastre
centre. Input data generated by the statistical module have been processed
using 180 automatically created variants of the fuzzy model. The variants
covered all possible combinations of the three basic 3x5, 5x7 and 7x9 models,
five values of the ∆ parameter determining the widths of the triangle bases,
three sets of rules and four sets of rule weights. Multiple linear regression,
descriptive statistics, correlation and the distance between operator rankings
have been used in the analysis of the test results.
The experiment allowed us to investigate the properties of the fuzzy system.
In 79% of the models all input variables influenced the output significantly.
The assessments generated by the models differed in the value of the
variability coefficient and the range. The 7x9 models assured better
differentiation of the results. It is not possible to determine definitively
which model is optimal; nevertheless, the study showed the usefulness of the
model. It is planned to carry out further evaluation experiments with the
active participation of the centre managers. This time the centre managers
will be instructed how the fuzzy system operates and will be made familiar
with the statistics of operators’ work within a given time. Moreover, they
will be able to determine the weights of the rules in order to adjust the
system to their preferences.
References
1. Ajith A (2001) Neuro-Fuzzy Systems: State-of-the-Art Modelling Techniques. In:
Proceedings of the 6th International Conference on Neural Networks 269–276
2. Cheung W, Pitcher T, Pauly D (2004) A Fuzzy Logic Expert System for Estimating the Intrinsic Extinction Vulnerabilities of Seamount Fishes to Fishing.
Fisheries Centre Research Reports 12(5):33–50
3. Gomez A, Delgado M, Vila M (1999) About the use of fuzzy clustering techniques for fuzzy model identification. Fuzzy Sets and Systems 106(2):179–188
4. Herrera F (2005) Genetic Fuzzy Systems: Status, Critical Considerations and
Future Directions. Journal of Computational Intelligence Research 1(1):59–67
5. IEC 1131 - Programmable Controllers (1997) Part 7 - Fuzzy Control Programming. Committee Draft CD 1.0 (Rel. 19 Jan 97)
6. Król D, Kukla G S, Lasota T, Trawiński B (2006) Fuzzy Model for the Assessment of Operators’ Work in a Cadastre Information System (to be published)
7. Piegat A (2003) Fuzzy Modelling and Control (in Polish). Akademicka Oficyna
Wydawnicza EXIT Warszawa
8. Saez D, Cipriano A (2001) A new method for structure identification of fuzzy
models and its application to a combined cycle power plant. Engineering Intelligent Systems 9(2):101–107
9. Xing Zong-Yi, Jia Li-Min, Zhang Yong, Hu Wei-Li, Qin Yong (2005) A Case
Study of Data-driven Interpretable Fuzzy Modeling. Acta Automatica Sinica
31(6):815–824