A Formal Model of Interpretability of Linguistic Variables

A Formal Model of Interpretability of
Linguistic Variables
Ulrich Bodenhofer1 and Peter Bauer2
1
Software Competence Center Hagenberg
A-4232 Hagenberg, Austria
e-mail: [email protected]
COMNEON Software
A-4040 Linz, Austria
e-mail: [email protected]
2
Abstract. The present contribution is concerned with the interpretability of fuzzy
rule-based systems. While this property is widely considered to be a crucial one in
fuzzy rule-based modeling, a more detailed investigation of what “interpretability”
actually means is still missing. So far, interpretability has often been associated with
heuristic assumptions about shape and mutual overlapping of fuzzy membership
functions. In this chapter, we attempt to approach this problem from a more general
and formal point of view. First, we clarify what, in our opinion, the different aspects
of interpretability are. Following that, we propose an axiomatic framework for the
interpretability of linguistic variables (in Zadeh’s sense) which is underlined by
examples and application perspectives.
1
Introduction
The epoch-making idea of L. A. Zadeh’s early work was to utilize what he
called “fuzzy sets” as mathematical models of linguistic expressions which
cannot be represented in the framework of classical binary logic and set theory
in a natural way. The introduction of his seminal article on fuzzy sets [37]
contains the following remarkable words:
“More often than not, the classes of objects encountered in the real
physical world do not have precisely defined criteria of membership.
[. . . ] Yet, the fact remains that such imprecisely defined “classes” play
an important role in human thinking, particularly in the domains of
pattern recognition, communication of information, and abstraction.”
Fuzzy systems became a tremendously successful paradigm – a remarkable triumph which started with well-selling applications in consumer goods
implemented by Japanese engineers. The reasons for this development are
manifold; however, we are often confronted with the following arguments:
1. The main difference between fuzzy systems and other control or decision support systems is that they are parameterized in an interpretable
A Formal Model of Interpretability of Linguistic Variables
525
way – by means of rules consisting of linguistic expressions. Fuzzy systems, therefore, allow rapid prototyping as well as easy maintenance and
adaptation.
2. Fuzzy systems offer completely new opportunities to deal with processes
for which only a linguistic description is available. Thereby, they allow to
achieve a robust, secure, and reproducible automation of such tasks.
3. Even if conventional control or decision support strategies can be employed, re-formulating a system’s actions by means of linguistic rules can
lead to a deeper qualitative understanding of its behavior.
We would like to raise the question whether fuzzy systems, as they appear
in daily practice, really reflect these undoubtedly nice advantages. One may
observe that the possibility to estimate the system’s behavior by reading and
understanding the rule base only is a basic requirement for the validity of
the above points. If we adopt the usual wide understanding of fuzzy systems
(rule-based systems incorporating vague linguistic expressions), we can see,
however, that this property – let us call it interpretability – is not guaranteed
by definition.
In our opinion, interpretability should be the key property of fuzzy systems. If it is neglected, one ends up in nothing else than black-box descriptions of input-output relationships and any advantage over neural networks
or conventional interpolation methods is lost completely.
The more fuzzy systems became standard tools for engineering applications, the more Zadeh’s initial mission became forgotten. In recent years,
however, after a relatively long period of ignorance, an increasing awareness of
the crucial property of interpretability has emerged [1,2,6,9,17,34–36], where
the present book is intended to bundle these forces by presenting a comprehensive overview of recent research on this topic. So far, the following
questions have been identified to have a close connection to interpretability:
1. Does the inference mechanism produce results that are technically and
intuitively correct?
2. Is the number of rules still small enough to be comprehensible by a human
expert?
3. Do the fuzzy sets associated to the linguistic expressions really correspond
to the human understanding of these expressions?
The first question has to be approached from the side of approximate reasoning [14–16] and relational equations [10,21,26]. There is no standard way to
tackle the problems associated with the second question (see the introductory
chapter of this book for an overview of different ideas).
The given chapter is solely devoted to the third question. So far, there
is a shallow understanding that the third question is related to shape, ordering, and mutual overlapping of fuzzy membership functions. We intend
to approach this question more formally. This is accomplished making the
inherent relationships between the linguistic labels explicit by formulating
526
Ulrich Bodenhofer and Peter Bauer
them as (fuzzy) relations. In order to provide a framework that is as general
as possible, we consider linguistic variables in their most general form.
2
Preliminaries
Throughout the whole chapter, we do not explicitly distinguish between fuzzy
sets and their corresponding membership functions. Consequently, uppercase
letters are used for both synonymously. For a given non-empty set X, we
denote the set of fuzzy sets on X with F(X). As usual, a fuzzy set A ∈ F(X)
is called normalized if there exists an x ∈ X such that A(x) = 1.
Triangular norms and conorms [24] are common standard models for
fuzzy conjunctions and disjunctions, respectively. In this chapter, we will
mainly need these two concepts for intersections and unions of fuzzy sets.
It is known that couples consisting of a nilpotent t-norm and its dual tconorm [18,24] are most appropriate choices as soon as fuzzy partitions are
concerned [11,26]. The most important representatives of such operations are
the so-called Lukasiewicz operations:
TL (x, y) = max(x + y − 1, 0)
SL (x, y) = min(x + y, 1)
The intersection and union of two arbitrary fuzzy sets A, B ∈ F(X) with
respect to the Lukasiewicz operations can then be defined as
A ∩L B (x) = TL A(x), B(x) and
A ∪L B (x) = SL A(x), B(x) ,
respectively. We restrict to these two standard operations in the following –
for the reason of simplicity and the fact that they perfectly fit to the concept
of fuzzy partitions due to Ruspini [33]; recall that a family of fuzzy sets
(Ai )i∈I ⊆ F(X) is called Ruspini partition if the following equality holds for
all x ∈ X:
X
Ai (x) = 1
i∈I
Furthermore, recall that a fuzzy set A ∈ F(X) is called convex if the
property
x ≤ y ≤ z =⇒ A(y) ≥ min A(x), A(z)
holds for all x, y, z ∈ X (given a crisp linear ordering ≤ on the domain X)
[4,27,37].
Lemma 1. [4] Let X be linearly ordered. Then an arbitrary fuzzy set A ∈
F(X) is convex if and only if there exists a partition of X into two connected
subsets X1 and X2 such that, for all x1 ∈ X1 and all x2 ∈ X2 , x1 ≤ x2 holds
and such that the membership function of A is non-decreasing over X1 and
non-increasing over X2 .
A Formal Model of Interpretability of Linguistic Variables
527
As a trivial consequence of the previous lemma, a fuzzy set whose membership function is either non-decreasing or non-increasing is convex.
3
Formal Definition
Since it has more or less become standard and offers much freedom, in particular with respect to integration of linguistic modifiers and connectives, we
closely follow Zadeh’s original definition of linguistic variables [38–40].
Definition 1. A linguistic variable V is a quintuple of the form
V = (N, G, T, X, S),
where N , T , X, G, and S are defined as follows:
1. N is the name of the linguistic variable V
2. G is a grammar
3. T is the so-called term set, i.e. the set linguistic expressions resulting
from G
4. X is the universe of discourse
5. S is a T → F(X) mapping which defines the semantics – a fuzzy set on
X – of each linguistic expression in T
In this chapter, let us assume that the grammar G is always given in BackusNaur Form (BNF) [32].
In our point of view, the ability to interpret the meaning of a rule base
qualitatively relies deeply upon an intuitive understanding of the linguistic
expressions. Of course, this requires knowledge about inherent relationships
between these expressions. Therefore, if qualitative estimations are desired,
these relationships need to transfer to the underlying semantics, i.e. the fuzzy
sets modeling the labels. In other words, interpretability is strongly connected
to the preservation of inherent relationships by the mapping S (according to
Def. 1).
The following definition gives an exact mathematical formulation of this
property.
Definition 2. Consider a linguistic variable V = (N, T, X, G, S) and an index set I. Let R = (Ri )i∈I be a family of relations on the set of verbal values
T , where each relation Ri has a finite arity ai . Assume that, for every relation
Ri , there exists a relation Qi on the fuzzy power set F(X) with the same
arity.1 Correspondingly, we abbreviate the family (Qi )i∈I with Q. Then the
linguistic variable V is called R-Q-interpretable if and only if the following
holds for all i ∈ I and all x1 , . . . , xai ∈ T :
Ri x1 , . . . , xai =⇒ Qi S(x1 ), . . . , S(xai )
(1)
1
Qi is associated with the “semantic counterpart” of Ri , i.e. the relation that
models Ri on the semantical level.
528
Ulrich Bodenhofer and Peter Bauer
Remark 1. The generalization of Def. 2 to fuzzy relations is straightforward.
If we admit fuzziness of the relations Ri and Qi , the implication in (1) has
to be replaced by the inequality
Ri x1 , . . . , xai ≤ Qi S(x1 ), . . . , S(xai ) .
4
A Detailed Study by Means of Practical Examples
In almost all fuzzy control applications, the domains of the system variables
are divided into a certain number of fuzzy sets by means of the underlying ordering – a fact which is typically reflected in expressions like “small”,
“medium”, or “large”. We will now discuss a simple example involving orderings to illustrate the concrete meaning of Def. 2.
Let us consider the following linguistic variable:
V = (“v1”, G, T, X, S)
The grammatical definition G is given as follows:
⊥
:= hatomici ;
hatomici := hadjectivei | hadverbi hadjectivei ;
hadjectivei := “small” | “medium” | “large” ;
hadverbi := “at least” | “at most” ;
Obviously, the following nine-element term set can be derived from G:
T = {“small”, “medium”, “large”,
“at least small”, “at least medium”,
“at least large”, “at most small”,
“at most medium”, “at most large”}
The universe of discourse is the real interval X = [0, 100].
Taking the “background” or “context” of the variable into account, almost
every human has an intuitive understanding of the qualitative meaning of
each of the above linguistic expressions, even if absolutely nothing about
the quantitative meaning, i.e. the corresponding fuzzy sets, is known. This
understanding, to a major part, can be attributed to elementary relationships
between the linguistic values. According to Def. 2, let us assume that these
inherent relationships are modeled by a family of relations R = (Ri )i∈I .
In our opinion, the most obvious relationships in the example term set T
are orderings and inclusions. Therefore, we consider the following two binary
relations (for convenience, we switch to infix notations here):
R = (, v)
(2)
A Formal Model of Interpretability of Linguistic Variables
529
The first relation stands for the ordering of the labels, while the second
one corresponds to an inclusion relation, e.g. u v v means that v is a more
general term than u.
First of all, one would intuitively expect a proper ordering of the adjectives, i.e.
“small” “medium” “large”.
(3)
Moreover, the following monotonicities seem reasonable for all adjectives u, v
(atomic expressions from the set {“small”, “medium”, “large”}):
u v “at least” u
v v “at most” v
u v =⇒ “at least” u “at least” v
u v =⇒ “at most” u “at most” v
u v =⇒ “at least” v v “at least” u
u v =⇒ “at most” u v “at most” v
Figures 1 and 2 show Hasse diagrams which fully describe the two relations
and v (note that both relations are supposed to be reflexive, a fact which,
for the sake of simplicity, is not made explicit in the diagrams).
atl. small
6
small
6
atm. small
- atl. med.
6
-
med.
6
- atm. med.
- atl. large
6
-
large
6
- atm. large
Fig. 1. Hasse diagram of ordering relation Now we have to define meaningful counterparts of the relations in R on
the semantical level, i.e. on F(X). We start with the usual inclusion of fuzzy
sets according to Zadeh [37].
Definition 3. Consider two fuzzy sets of A, B ∈ F(X). A is called a subset
of B, short A ⊆ B, if and only if, for all x ∈ X, A(x) ≤ B(x). Consequently,
in this case, B is called a superset of A.
For defining a meaningful counterpart of the ordering relation , we adopt
a simple variant of the general framework for ordering fuzzy sets proposed
in [4,5], which includes well-known orderings of fuzzy numbers based on the
extension principle [23,25].
530
Ulrich Bodenhofer and Peter Bauer
atm. large
atl. small
6@
I
@
@
6
@
atm. med.
atm. small
6
small
atl. med.
@
@
6@
I
@
@
6
@
@
@
@
@
@
atl. large
@
6
@
med.
@
@
large
Fig. 2. Hasse diagram of inclusion relation v
Definition 4. Suppose that a universe X is equipped with a crisp linear
ordering ≤. Then a preordering . of fuzzy sets can be defined by
A . B ⇐⇒ ATL(B) ⊆ ATL(A) and ATM(A) ⊆ ATM(B) ,
where the operators ATL and ATM are defined as follows:
ATL(A)(x) = sup{A(y) | y ≤ x}
ATM(A)(x) = sup{A(y) | y ≥ x}
Figure 3 shows an example what the operators ATL and ATM give for
a non-trivial fuzzy set. It is easy to see that ATL always yields the smallest
superset with non-decreasing membership function, while ATM yields the
smallest superset with non-increasing membership function. For more details
about the particular properties of the ordering relation . and the two operators ATL and ATM, see [4,5].
Summarizing, the set of counterpart relations Q looks as follows (with the
relations from Defs. 3 and 4):
Q = (., ⊆)
(4)
Now R-Q-interpretability of linguistic variable V (with definitions of R and
Q according to (2) and (4), respectively) specifically means that the following
two implications hold for all u, v ∈ T :
u v =⇒ S(u) . S(v)
u v v =⇒ S(u) ⊆ S(v)
(5)
(6)
This means that the mapping S plays the crucial role in terms of interpretability. In this particular case, R-Q-interpretability is the property that
A Formal Model of Interpretability of Linguistic Variables
531
1
0.8
A
0.6
0.4
0.2
1
2
3
4
5
1
2
3
4
5
1
2
3
4
5
1
0.8
ATL(A)
0.6
0.4
0.2
1
0.8
ATM(A)
0.6
0.4
0.2
Fig. 3. A fuzzy set A ∈ F (R) and the results which are obtained when applying
the operators ATL and ATM
an ordering or inclusion relationship between two linguistic terms is never violated by the two corresponding fuzzy sets. From (3) and (5), we can deduce
the first basic necessary condition for the fulfillment of R-Q-interpretability
– that the fuzzy sets associated with the three adjectives must be in proper
order:
S(“small”) . S(“medium”) . S(“large”)
(7)
It is easy to observe that this basic ordering requirement is violated by the
example shown in Fig. 4, while it is fulfilled by the fuzzy sets in Fig. 5.
In order to fully check R-Q-interpretability of V , the semantics of linguistic expressions containing an adverb (“at least” or “at most”) have to be
considered as well. The definition of linguistic variables does not explicitly
contain any hint how to deal with the semantics of such expressions. From
a pragmatic viewpoint, two different ways are possible: one simple variant is
to define a separate fuzzy set for each expression, regardless whether they
contain an adverb or not. As a second traditional variant, we could use fuzzy
modifiers – F(X) → F(X) functions – for modeling the semantics of adverbs.
In this example, it is straightforward to use the fuzzy modifiers introduced
532
Ulrich Bodenhofer and Peter Bauer
1
S(“medium”) S(“small”)
HH
j
H
?
S(“large”)
0.8
0.6
0.4
0.2
Fig. 4. A non-interpretable setting
1
S(“small”)
?
S(“medium”)
?
S(“large”)
?
0.8
0.6
0.4
0.2
Fig. 5. An example of an interpretable setting
in Def. 4 (see [3,4,12] for a detailed justification):
S(“at least”A) = ATL S(A)
S(“at most”A) = ATM S(A)
Since it is by far simpler and easier to handle with respect to interpretability,
we strongly suggest the second variant.
In case that we use the above fuzzy modifiers for modeling the two adverbs
“at least” and “at most”, we are now able to formulate a necessary condition
for the fulfillment of R-Q-interpretability in our example.
Theorem 1. Consider the linguistic variable V and the two relation families R and Q as defined above. Provided that the mapping S always yields a
normalized fuzzy set, the following two statements are equivalent:
(i) V is R-Q-interpretable
(ii) S(“small”) . S(“medium”) . S(“large”)
A Formal Model of Interpretability of Linguistic Variables
533
Proof. (i)⇒(ii): Trivial (see above).
(ii)⇒(i): The following basic properties hold for all normalized fuzzy sets
A, B ∈ F(X) [4]:
A ⊆ ATL(A)
A ⊆ ATM(A)
ATL ATL(A) = ATL(A)
ATM ATM(A) = ATM(A)
ATL ATM(A) = ATM ATL(A) = X
A ⊆ B =⇒ ATL(A) ⊆ ATL(B)
A ⊆ B =⇒ ATM(A) ⊆ ATM(B)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
Since the relations ⊆ and . are reflexive and transitive [4,5], it is sufficient
to prove the relations indicated by arrows in the two Hasse diagrams (see
Figs. 1 and 2).
Let us start with the ordering relation. The validity of the relations in
the middle row is exactly assumption (ii). The relations in the two other
rows follow directly from the following two relationships which can be
proved easily using (10), (11), and (12):
A . B =⇒ ATL(A) . ATL(B)
A . B =⇒ ATM(A) . ATM(B)
The three vertical relationships in Fig. 1 follow directly from
ATM(A) . A . ATL(A)
which can be shown using (8), (9), (13), and (14).
The relations in the Hasse diagram in Fig. 2 follow from (8), (9), and the
definition of the preordering . (cf. Def. 4).
t
u
Obviously, interpretability of V in this example (with respect to the families R and Q) does not fully correspond to an intuitive human understanding
of interpretability. For instance, all three expressions “small”, “medium”,
and “large” could be mapped to the same fuzzy set without violating R-Qinterpretability. The intention was to give an example which is just expressive
enough to illustrate the concrete meaning and practical relevance of Def. 2.
In order to formulate an example in which R-Q-interpretability is much
closer to a human-like understanding of interpretability (e.g. including separation constraints), we have to consider an extended linguistic variable
V 0 = (“v2”, G0 , T 0 , X 0 , S 0 ).
The extended grammar G0 is given as follows:
534
Ulrich Bodenhofer and Peter Bauer
⊥
:= hexpi | hboundsi ;
hexpi
:= hatomici | hatomici hbinaryi hatomici ;
hatomici := hadjectivei | hadverbi hadjectivei ;
hadjectivei := “small” | “medium” | “large” ;
hadverbi := “at least” | “at most” ;
hbinaryi := “and” | “or” ;
hboundsi := “empty” | “anything” ;
It is easy to see that the corresponding term set T 0 has the following
elements: the grammar admits nine atomic expressions (three adjectives plus
two adverbs times three adjectives; note that this subset coincides with T
from the previous example). Hence, there are 9 + 2 · 92 = 171 expressions of
type hexpi. Finally adding the two expressions of type hboundsi, the term set
T 0 has a total number of 173 elements.
As in the previous example, we would like to use an inclusion and an
ordering relation. Since the two relations . and ⊆ are defined for arbitrary
fuzzy sets, we can keep the relation family Q as it is. If we took R as defined
above, R-Q-interpretability would be satisfied under the same conditions as
in Th. 1. This example, however, is intended to demonstrate that partition
and convexity constraints can be formulated level of linguistic expressions,
too. Therefore, we extend the inclusion relation v as follows. Let us consider
a binary relation v· on T 0 . First of all, we require that v· coincides with v
on the set of atomic expressions (u, v ∈ T ):
(u v v) =⇒ (u v· v)
(15)
Of course, we assume that the two binary connectives are non-decreasing
with respect to inclusion, commutative, and that the “and” connective yields
subsets and the “or” connective yields supersets (for all u, v, w ∈ T ):
(v v· w) =⇒ (u “and” v) v· (u “and” w)
(v v· w) =⇒ (u “or” v) v· (u “or” w)
(16)
(17)
(u “and” v) v· (v “and” u)
(u “or” v) v· (v “or” u)
(18)
(19)
(u “and” v) v· u
u v· (u “or” v)
(20)
(21)
Next, let us suppose that “anything” is the most general and that “empty”
is the least general expression, i.e., for all u ∈ T 0 ,
u v· “anything”
and “empty” v· u.
(22)
Now we can impose reasonable disjointness constraints like
“small and at least medium” v· “empty”
“at most medium and large” v· “empty”
(23)
(24)
A Formal Model of Interpretability of Linguistic Variables
535
and coverage properties:
“small or medium” v· “at most medium”
“at most medium” v· “small or medium”
“anything” v· “at most medium or large”
“medium or large” v· “at least medium”
“at least medium” v· “medium or large”
“small or at least medium” v· “anything”
(25)
(26)
(27)
(28)
(29)
(30)
Finally, let us assume that “small” and “large” are the two boundaries with
respect to the ordering of the labels:
“at most small” v· “small”
“anything” v· “at least small”
(31)
(32)
“at least large” v· “large”
“anything” v· “at most large”
(33)
(34)
If we denote the reflexive and transitive closure of v· with v0 , we can finally
write down the desired family of relations:
R0 = (, v0 )
(35)
In order to study the R0 -Q-interpretability of V 0 , we need to define the
semantics of those expressions that have not been contained in T . Of course,
for the expressions in T , we use the same semantics as in the previous example, i.e., for all u ∈ T , S 0 (u) = S(u). Further, let us make the convention
that the two expressions of type hboundsi are always mapped to the empty
set and the whole universe, respectively:
S 0 (“empty”) = ∅
S 0 (“anything”) = X
The “and” and the “or” connective are supposed to be “implemented” by the
intersection and union with respect to the Lukasiewicz t-norm and its dual
t-conorm (for all u, v ∈ T ):
S 0 (u “and” v) = S 0 (u) ∩L S 0 (v)
S 0 (u “or” v) = S 0 (u) ∪L S 0 (v)
Now we are able to fully characterize R0 -Q-interpretability for the given
example (the linguistic variable V 0 ).
536
Ulrich Bodenhofer and Peter Bauer
Theorem 2. Provided that S 0 yields a normalized fuzzy set for each adjective, V 0 is R0 -Q-interpretable if and only if the following three properties hold
together:
1. S 0 (“small”) . S 0 (“medium”) . S 0 (“large”)
2. S 0 (“small”), S 0 (“medium”), and S 0 (“large”) are convex
3. S 0 (“small”), S 0 (“medium”), and S 0 (“large”) form a Ruspini partition
Proof. First of all, let us assume that V 0 is R0 -Q-interpretable. The first
property follows trivially as in the proof of Th. 1. Now, taking (23) into
account, we obtain from R0 -Q-interpretability that
0 ≥ TL S 0 (“small”)(x), S 0 (“at least medium”)(x)
= TL S 0 (“small”)(x), ATL(S 0 (“medium”))(x)
≥ TL S 0 (“small”)(x), S 0 (“medium”)(x) ,
i.e. that the TL -intersection of S 0 (“small”) and S 0 (“medium”) is empty. Analogously, we are able to show that the TL -intersection of S 0 (“medium”) and
S 0 (“large”) is empty, too. Since S 0 (“medium”) . S 0 (“large”) implies that
ATL S 0 (“large”) ⊆ ATL S 0 (“medium”) ,
it follows that, by the same argument as above, that the TL -intersection of
S 0 (“small”) and S 0 (“large”) is empty as well. Now consider (25) and (26).
R0 -Q-interpretability then implies the following (for all x ∈ X):
SL S 0 (“small”)(x), S 0 (“medium”)(x) = S 0 (“at most medium”)(x)
= ATL S 0 (“medium”) (x)
Taking (27) into account as well, we finally obtain
1 ≤ SL S 0 (“at most medium”)(x), S 0 (“large”)(x)
= SL S 0 (“small”)(x), S 0 (“medium”)(x), S 0 (“large”)(x)
which proves that the SL -union of all fuzzy sets associated to the three adjectives yields the whole universe X. Since all three fuzzy sets are normalized
and properly ordered, not more than two can have a membership degree
greater than zero at a given point x ∈ X. This implies that the three fuzzy
sets form a Ruspini partition [26].
From (31) and (33) and R0 -Q-interpretability, we can infer that
ATM S 0 (“small”) = S 0 (“small”),
ATL S 0 (“large”) = S 0 (“large”),
hence, S 0 (“small”) has a non-increasing membership function and S 0 (“large”)
has a non-decreasing membership function. Both fuzzy sets, therefore, are
A Formal Model of Interpretability of Linguistic Variables
537
convex. Since the three fuzzy sets S 0 (“small”), S 0 (“medium”), and S 0 (“large”)
form a Ruspini partition, while only two can overlap to a positive degree, we
have that S 0 (“medium”) is non-decreasing to the left of any value x for which
S 0 (“medium”)(x) = 1 and non-increasing to the right. Therefore, by Le. 1,
S 0 (“medium”) is convex, too.
Now let us prove the reverse direction, i.e. we assume all three properties
and show that R0 -Q-interpretability must hold. By (15) and the first property
from Th. 2, we can rely on the fact that all correspondences remain preserved
for cases that are already covered by Th. 1. Therefore, it is sufficient to show
that S 0 preserves all relationships (16)–(34). Clearly, the preservation of (16)–
(21) follows directly from elementary properties of t-norms and t-conorms
[24]. The inclusions (22) are trivially maintained, since S 0 is supposed to map
“empty” to the empty set and “anything” to the universe X. The preservation
of the two disjointness conditions (23) and (24) follows from the fact that
S 0 (“small”), S 0 (“medium”), and S 0 (“large”) form a Ruspini partition and
that the first property holds. The same is true for the six coverage properties
(25)–(30). Since all three fuzzy sets are convex and form a Ruspini partition,
S 0 (“small”) must have a non-increasing membership function and S 0 (“large”)
must have a non-decreasing membership function. Therefore, the following
needs to hold:
ATM S 0 (“small”) = S 0 (“small”)
ATL S 0 (“large”) = S 0 (“large”)
This is a sufficient condition for the preservation of inclusions (31) and
(33). Then the preservation of (32) and (34) follows from the fact, that
ATL(ATM(A)) = X for any normalized fuzzy set A (cf. (12)). Since v0 and
⊆ are both supposed to be transitive (v0 being the transitive closure of the
intermediate relation v· ), the preservation of all other relationships follows
instantly.
t
u
At first glance, this example might seem unnecessarily complicated, since
the final result is nothing else than exactly those common sense assumptions
– proper ordering, convexity, partition constraints – that have been identified
as crucial for interpretability before in several recent publications (see [8] for
an overview). However, we must take into account that they are not just
heuristic assumptions here, but necessary conditions that are enforced by
intuitive requirements on the level of linguistic expressions. From this point
of view, this example provides a sound justification for exactly those three
crucial assumptions.
Now the question arises how the three properties can be satisfied in practice. In particular, it is desirable to have a constructive characterization of
the constraints implied by requiring R0 -Q-interpretability. The following theorem provides a unique characterization of R0 -Q-interpretability under the
assumption that we are considering real numbers and fuzzy sets with continuous membership functions – both are no serious restrictions from the practical
538
Ulrich Bodenhofer and Peter Bauer
point of view. Fortunately, we obtain a parameterized representation of all
mappings S 0 that maintain R0 -Q-interpretability.
Theorem 3. Assume that X is a connected subset of the real line and that
S 0 , for each adjective, yields a normalized fuzzy set with continuous membership function. Then the three properties from Th. 2 are fulfilled if and only if
there exist four values a, b, c, d ∈ X satisfying a < b ≤ c < d and two continuous non-decreasing [0, 1] → [0, 1] functions f1 , f2 fulfilling f1 (0) = f2 (0) = 0
and f1 (1) = f2 (1) = 1 such that the semantics of the three adjectives are
defined as follows:

1
if x ≤ a
if a < x < b
S 0 (“small”)(x) = 1 − f1 x−a
b−a

0
if x ≥ b

0
if x ≤ a



x−a

if a < x < b
 f1 b−a
S 0 (“medium”)(x) = 1
if b ≤ x ≤ c

x−c

1
−
f
if c < x < d

2
d−c


0
if x ≥ d

0
if x ≤ c
if c < x < d
S 0 (“large”)(x) = f2 x−c
d−c

1
if x ≥ d
(36)
(37)
(38)
Proof. It is a straightforward, yet tedious, task to show that the fuzzy sets
defined as above fulfill the three properties from Th. 2. Under the assumption
that these three properties are satisfied, we make the following definitions:
a = sup{x | S 0 (“small”)(x) = 1}
b = inf{x | S 0 (“medium”)(x) = 1}
c = sup{x | S 0 (“medium”)(x) = 1}
d = inf{x | S 0 (“large’)(x) = 1}
The two functions f1 , f2 can be defined as follows:
f1 (x) = S 0 (“medium”) a + x · (b − a)
f2 (x) = S 0 (“large”) c + x · (d − c)
Since all membership functions associated with adjectives are continuous, the
functions f1 and f2 are continuous. Taking the continuity and the fact that
the three fuzzy sets associated to the adjectives form a Ruspini partition into
A Formal Model of Interpretability of Linguistic Variables
539
account, it is clear that the following holds:
S 0 (“small”)(a) = 1
S 0 (“small”)(b) = S 0 (“small”)(c) = S 0 (“small”)(d) = 0
S 0 (“medium”)(b) = S 0 (“medium”)(c) = 1
S 0 (“medium”)(a) = S 0 (“medium”)(d) = 0
S 0 (“large”)(d) = 1
S 0 (“large”)(a) = S 0 (“large”)(b) = S 0 (“large”)(c) = 0
These equalities particularly imply that f1 (0) = f2 (0) = 0 and f1 (1) =
f2 (1) = 1 holds. As a consequence of convexity and Le. 1, we know that
the membership function of S 0 (“medium”) to the left of b is non-decreasing.
Analogously, we can infer that the same is true for S 0 (“large”) to the left of
d. Therefore, f1 and f2 are non-decreasing. To show that the three representations (36)–(38) hold is a routine matter.
t
u
5
5.1
Applications
Design Aid
As long as the top-down construction of small fuzzy systems (e.g. two-input
single-output fuzzy controllers) is concerned, interpretability is usually not
such an important issue, since the system is simple enough that a conscious
user will refrain from making settings which contradict his/her intuition.
In the design of complex fuzzy systems with a large number of variables
and rules, however, interpretability is a most crucial point. Integrating tools
which guide the user through the design of a large fuzzy system by preventing
him/her from making non-interpretable settings accidentally are extremely
helpful. As a matter of fact, debugging of large fuzzy systems becomes a
tedious task if it is not guaranteed that the intuitive meanings of the labels
used in the rule base are reflected in their corresponding semantics.
To be more precise, our goal is not to bother the user with additional theoretical aspects. Instead, the idea is to integrate these aspects into software
tools for fuzzy systems design, but not necessarily transparent for the user,
with the aim that he/she can build interpretable fuzzy systems in an even
easier way than with today’s software tools. Theorem 3 gives a clue how this
could be accomplished. This result, for one particular example, clearly identifies how much freedom one has in choosing interpretable settings. The example is not quite representative, since three linguistic expressions are a quite
restrictive assumption. However, the extension to an arbitrary finite number
of such expressions is straightforward, no matter whether we consider such
a typical “small”-“medium”-“large” example or a kind of symmetric setting
(e.g. “neg. large”, “neg. medium”, “neg. small”, “approx. zero”, “pos. small”,
540
Ulrich Bodenhofer and Peter Bauer
“pos. medium”, “pos. large”) as it is common in many fuzzy control applications. In all these cases, the requirements for interpretability are similar
and, by Th. 3, the resulting set of degrees of freedom is an increasing chain
of values that mark the beginning/ending of the kernels of the fuzzy sets
and a set of continuous non-decreasing functions that control the shape of
the transitions between two neighboring fuzzy sets. While linear transitions
are common and easy to handle, smooth transitions by means of polynomial
functions with higher degree may be beneficial in some applications as well.
As simple examples, the following three polynomial [0, 1] → [0, 1] functions of degrees 1, 3, and 5 perfectly serve as transition functions in the
sense of Th. 3. They produce membership functions that are continuous (p1 ),
differentiable (p3 ), and twice differentiable (p5 ), respectively:
p1 (x) = x
p3 (x) = −2x3 + 3x2
p5 (x) = 6x5 − 15x4 + 10x3
Figure 6 shows examples of interpretable fuzzy partitions with three fuzzy
sets using the transition functions p1 , p3 , and p5 .
1
0.8
0.6
0.4
0.2
1
0.8
0.6
0.4
0.2
1
0.8
0.6
0.4
0.2
Fig. 6. Three interpretable fuzzy partitions with polynomial transitions of degree
1 (top), 3 (middle), and 5 (bottom)
A Formal Model of Interpretability of Linguistic Variables
5.2
541
Tuning
Automatic design and tuning of fuzzy systems has become a central issue in
machine learning, data analysis, and the identification of functional dependencies in the analysis of complex systems. In the last years, a vast number
of scientific publications dealt with this problem. Most of them, however,
disregarded the importance of interpretability – leading to results which are
actually black-box functions that do not provide any meaningful linguistic information (typical pictures like in Fig. 4 can be found in an enormous number
of papers).
One may argue that proper input-output behavior is the central goal of
automatic tuning. To some extent, this is true; however, as stated already in
Sect. 1, this is not the primary mission of fuzzy systems.
Again, Th. 3 gives a clear indication how the space of possible solutions
among interpretable settings may be parameterized – by an ascending chain
of transition points (given a set of transition functions). Note that this kind
of parameterization even leads to a reduction of the search space. Parameterizing three trapezoidal fuzzy sets independently requires a total number of
twelve parameters and most probably leads to difficulty interpretable results.
Requiring interpretability (as described in the previous section) leads to a set
of only four parameters. It is true that such a setting is much more restrictive.
However, in our opinion, it is not necessarily the case that requiring interpretability automatically leads to a painful loss of accuracy. The requirement
of interpretability implies more constraints that have to be taken into account and, therefore, is more difficult to handle for many tuning algorithms,
no matter whether we consider genetic algorithms, fuzzy-neuro methods, or
numerical optimization. As a recent investigation has shown, it is indeed
possible to require interpretability while, at the same time, maintaining high
accuracy and robustness [22]. Many other studies have also come up with tuning algorithms that produce interpretable and accurate results [2,7,13,17,35].
5.3
Rule Base Simplification
The examples in Sect. 4 used an inclusion relation v and its counterpart on
the semantical side – the inclusion relation ⊆. Both relations are preorderings which particularly implies that their symmetric kernels are equivalence
relations. As easy to see, the relation ⊆ is even an ordering on F(X), which
implies that the symmetric kernel is the crisp equality relation, i.e. A ⊆ B
and B ⊆ A hold together if and only if the two membership functions coincide
exactly.
Now let us assume that we are given a linguistic variable V and two
relation families R and Q, where R contains an inclusion relation v and Q
contains its counterpart ⊆. If we have two linguistic labels u and v for which
u v v and v v v hold, R-Q-interpretability guarantees that the inclusions
S(u) ⊆ S(v) and S(v) ⊆ S(u) hold, i.e. u v v and v v v are sufficient
542
Ulrich Bodenhofer and Peter Bauer
conditions that the membership functions of S(u) and S(v) are equal. This
means that the equivalence relation defined as (for two u, v ∈ T )
(u ≡ v) ⇐⇒ (u v v and v v u)
may be considered as a set of simplification rules, while R-Q-interpretability
corresponds to the validity of these rules on the semantical side.
Let us recall the second example (linguistic variable V 0 defined as in
Sect. 4). The two inequalities (25) and (26) together imply
“small or medium” ≡ “at most medium”.
This could be read as a replacement rule
“small or medium” −→ “at most medium”,
with the meaning that, in any linguistic expression, “small or medium” can be
replaced by “at most medium”. If we assume interpretability, we can be sure
that this replacement is also semantically correct. If we incorporate as many
reasonable relationships in the relation v as possible such that interpretability can still be fulfilled, we are able to provide a powerful set of simplification
rules.
Of course, very simple grammars do not necessitate any simplification.
However, if we have to consider very complex rule bases like they appear in
grammar-based rule base optimization methods (e.g. inductive learning [28–
31] or fuzzy genetic programming [19,20]), simplification is a highly important
concern. The methodology presented here allows to deal with simplification in
a symbolic fashion – assuming interpretability – without the need to consider
the concrete semantics of the expressions anymore.
6
Conclusion
This chapter has been devoted to the interpretability of linguistic variables. In
order to approach this key property in a systematic and mathematically exact
way, we have proposed to make implicit relationships between the linguistic
labels explicit by formulating them as (fuzzy) relations. Then interpretability
corresponds to the preservation of this relationships by the associated meaning. This idea has been illustrated by means of two extensive examples. These
case studies have demonstrated that well-known common sense assumptions
about the membership functions, such as, ordering, convexity, or partition
constraints, have a sound justification also from a formal linguistic point of
view. In contrast to other investigations, the model proposed in this chapter
cannot just be applied to simple fuzzy sets, but also allows smooth integration
of connectives and ordering-based modifiers.
By characterizing parameterizations which ensure interpretability, we have
been able to provide hints for the design and tuning of fuzzy systems with
A Formal Model of Interpretability of Linguistic Variables
543
interpretable linguistic variables. Finally, we have seen that interpretability
even corresponds to the fact that symbolic simplification rules on the side of
linguistic expressions still remain valid on the semantical side.
Acknowledgements
Ulrich Bodenhofer is working in the framework of the Kplus Competence
Center Program which is funded by the Austrian Government, the Province
of Upper Austria, and the Chamber of Commerce of Upper Austria.
References
1. R. Babuška. Construction of fuzzy systems—interplay between precision and
transparency. In Proc. European Symposium on Intelligent Techniques (ESIT
2000), pages 445–452, Aachen, September 2000.
2. M. Bikdash. A highly interpretable form of Sugeno inference systems. IEEE
Trans. Fuzzy Systems, 7(6):686–696, December 1999.
3. U. Bodenhofer. The construction of ordering-based modifiers. In G. Brewka,
R. Der, S. Gottwald, and A. Schierwagen, editors, Fuzzy-Neuro Systems ’99,
pages 55–62. Leipziger Universitätsverlag, 1999.
4. U. Bodenhofer. A Similarity-Based Generalization of Fuzzy Orderings, volume
C 26 of Schriftenreihe der Johannes-Kepler-Universität Linz. Universitätsverlag
Rudolf Trauner, 1999.
5. U. Bodenhofer. A general framework for ordering fuzzy sets. In B. BouchonMeunier, J. Guitiérrez-Rı́oz, L. Magdalena, and R. R. Yager, editors, Technologies for Constructing Intelligent Systems 1: Tasks, volume 89 of Studies
in Fuzziness and Soft Computing, pages 213–224. Physica-Verlag, Heidelberg,
2002.
6. U. Bodenhofer and P. Bauer. Towards an axiomatic treatment of “interpretability”. In Proc. 6th Int. Conf. on Soft Computing (IIZUKA2000), pages 334–339,
Iizuka, October 2000.
7. U. Bodenhofer and E. P. Klement. Genetic optimization of fuzzy classification
systems — a case study. In B. Reusch and K.-H. Temme, editors, Computational
Intelligence in Theory and Practice, Advances in Soft Computing, pages 183–
200. Physica-Verlag, Heidelberg, 2001.
8. J. Casillas, O. Cordón, F. Herrera, and L. Magdalena. Finding a balance between interpretability and accuracy in fuzzy rule-based modelling: An overview.
In J. Casillas, O. Cordón, F. Herrera, and L. Magdalena, editors, Trade-off between Accuracy and Interpretability in Fuzzy Rule-Based Modelling, Studies in
Fuzziness and Soft Computing. Physica-Verlag, Heidelberg, 2002.
9. O. Cordón and F. Herrera. A proposal for improving the accuracy of linguistic
modeling. IEEE Trans. Fuzzy Systems, 8(3):335–344, June 2000.
10. B. De Baets. Analytical solution methods for fuzzy relational equations. In
D. Dubois and H. Prade, editors, Fundamentals of Fuzzy Sets, volume 7 of The
Handbooks of Fuzzy Sets, pages 291–340. Kluwer Academic Publishers, Boston,
2000.
544
Ulrich Bodenhofer and Peter Bauer
11. B. De Baets and R. Mesiar. T -partitions. Fuzzy Sets and Systems, 97:211–223,
1998.
12. M. De Cock, U. Bodenhofer, and E. E. Kerre. Modelling linguistic expressions
using fuzzy relations. In Proc. 6th Int. Conf. on Soft Computing (IIZUKA2000),
pages 353–360, Iizuka, October 2000.
13. M. Drobics, U. Bodenhofer, W. Winiwarter, and E. P. Klement. Data mining
using synergies between self-organizing maps and inductive learning of fuzzy
rules. In Proc. Joint 9th IFSA World Congress and 20th NAFIPS Int. Conf.,
pages 1780–1785, Vancouver, July 2001.
14. D. Dubois and H. Prade. What are fuzzy rules and how to use them. Fuzzy
Sets and Systems, 84:169–185, 1996.
15. D. Dubois, H. Prade, and L. Ughetto. Checking the coherence and redundancy
of fuzzy knowledge bases. IEEE Trans. Fuzzy Systems, 5(6):398–417, August
1997.
16. D. Dubois, H. Prade, and L. Ughetto. Fuzzy logic, control engineering and artificial intelligence. In H. B. Verbruggen, H.-J. Zimmermann, and R. Babuška,
editors, Fuzzy Algorithms for Control, International Series in Intelligent Technologies, pages 17–57. Kluwer Academic Publishers, Boston, 1999.
17. J. Espinosa and J. Vandewalle. Constructing fuzzy models with linguistic integrity from numerical data—AFRELI algorithm. IEEE Trans. Fuzzy Systems,
8(5):591–600, October 2000.
18. J. Fodor and M. Roubens. Fuzzy Preference Modelling and Multicriteria Decision Support. Kluwer Academic Publishers, Dordrecht, 1994.
19. A. Geyer-Schulz. Fuzzy Rule-Based Expert Systems and Genetic Machine
Learning, volume 3 of Studies in Fuzziness. Physica-Verlag, Heidelberg, 1995.
20. A. Geyer-Schulz. The MIT beer distribution game revisited: Genetic machine
learning and managerial behavior in a dynamic decision making experiment. In
F. Herrera and J. L. Verdegay, editors, Genetic Algorithms and Soft Computing,
volume 8 of Studies in Fuzziness and Soft Computing, pages 658–682. PhysicaVerlag, Heidelberg, 1996.
21. S. Gottwald. Fuzzy Sets and Fuzzy Logic. Vieweg, Braunschweig, 1993.
22. J. Haslinger, U. Bodenhofer, and M. Burger. Data-driven construction of
Sugeno controllers: Analytical aspects and new numerical methods. In Proc.
Joint 9th IFSA World Congress and 20th NAFIPS Int. Conf., pages 239–244,
Vancouver, July 2001.
23. E. E. Kerre, M. Mareš, and R. Mesiar. On the orderings of generated fuzzy
quantities. In Proc. 7th Int. Conf. on Information Processing and Management
of Uncertainty in Knowledge-Based Systems, volume 1, pages 250–253, 1998.
24. E. P. Klement, R. Mesiar, and E. Pap. Triangular Norms, volume 8 of Trends
in Logic. Kluwer Academic Publishers, Dordrecht, 2000.
25. L. T. Kóczy and K. Hirota. Ordering, distance and closeness of fuzzy sets.
Fuzzy Sets and Systems, 59(3):281–293, 1993.
26. R. Kruse, J. Gebhardt, and F. Klawonn. Foundations of Fuzzy Systems. John
Wiley & Sons, New York, 1994.
27. R. Lowen. Convex fuzzy sets. Fuzzy Sets and Systems, 3:291–310, 1980.
28. R. S. Michalski, I. Bratko, and M. Kubat. Machine Learning and Data Mining.
John Wiley & Sons, Chichester, 1998.
29. S. Muggleton and L. De Raedt. Inductive logic programming: Theory and
methods. J. Logic Program., 19 & 20:629–680, 1994.
A Formal Model of Interpretability of Linguistic Variables
545
30. J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106,
1986.
31. J. R. Quinlan. Learning logical definitions from relations. Machine Learning,
5(3):239–266, 1990.
32. A. Ralston, E. D. Reilly, and D. Hemmendinger, editors. Encyclopedia of Computer Science. Groves Dictionaries, Williston, VT, 4th edition, 2000.
33. E. H. Ruspini. A new approach to clustering. Inf. Control, 15:22–32, 1969.
34. M. Setnes, R. Babuška, and H. B. Verbruggen. Rule-based modeling: Precision
and transparency. IEEE Trans. Syst. Man Cybern., Part C: Applications and
Reviews, 28:165–169, 1998.
35. M. Setnes and H. Roubos. GA-fuzzy modeling and classification: Complexity
and performance. IEEE Trans. Fuzzy Systems, 8(5):509–522, October 2000.
36. J. Yen, L. Wang, and C. W. Gillespie. Improving the interpretability of TSK
fuzzy models by combining global learning and local learning. IEEE Trans.
Fuzzy Systems, 6(4):530–537, November 1998.
37. L. A. Zadeh. Fuzzy sets. Inf. Control, 8:338–353, 1965.
38. L. A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning I. Inform. Sci., 8:199–250, 1975.
39. L. A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning II. Inform. Sci., 8:301–357, 1975.
40. L. A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning III. Inform. Sci., 9:43–80, 1975.