On different notions of model size

(position paper)
Harald Störrle
University of Innsbruck
Technikerstrasse 21A
6020 Innsbruck, Austria
1
Introduction
1.1
Motivation
Models are an important artifact in the processes of
developing any large scale system. In model-driven approaches such as those promoted by the OMG’s Model Driven
Architecture (MDA, [4]), they even play a central role,
but even in explicitly code-centric approaches such as XP
[1], and even in the small scale development efforts where
these approaches may be applied, there usually are models.
They might not be fully fledged formal models, but rather
sketches, and their role might not be quite as pivotal as in
MDA, but they are models nonetheless.
The more models are created and used in a development
effort, the more they are subject to all the activities that traditionally apply to code, such as debugging, reviewing, versioning, reusing and so on. Measuring attributes such as the quality of models, or the productivity in creating
and improving them, thus becomes important. One prerequisite for
defining such measures is an adequate notion of the size of a
model. For instance, in analogy to similar notions concerning programs, quality might be measured as errors per unit of size,
and productivity might be measured as size per unit of effort.
Even though today the Unified Modeling Language
(UML, cf. [3, 8]) is the “lingua franca of software engineering” (cf. [5, p. iv]), there are many other languages around,
and since many of them have been around for many years,
there are many models out there that have been created and
are continuously maintained using them. Comparing different modeling languages is also an important issue
in its own right. Therefore, it is essential that a model
sizing approach be general enough to cater for a wide range
of modeling languages. Incidentally, this will also help in
quickly gaining a sufficiently large empirical basis for calibrating weight factors.
1.2
Approach
This paper is entirely devoted to models as they occur
in the design and maintenance of business information systems. Models from domains like system software have not
been taken into account, though the results should be transferable. Only the size of models is considered;
other interesting properties like quality and complexity (see footnote 1) are
outside our focus (see Figure 1).
[Figure 1: diagram showing size next to quality and complexity, with the size dimensions length, width, breadth, density, and strength.]
Figure 1. Dimensions of the size of models.
The general idea in this approach is straightforward:
the size of a model should somehow express the amount
of information contained in the model. Intuitively, this
would correspond to the number of decisions a modeler
would have to make while modeling, or the number of answers
to (unasked) questions a reader would get from reading the
model. First, we identify several dimensions that might intuitively be considered as aspects of the size of a model.
In order to be able to compare models expressed in different modeling languages, we define a very simple and
general metamodel that is capable of capturing all notions
of any modeling language. We then sketch the mapping
from the UML 2 metamodel to this sizing metamodel, and
define a size metric based on the sizing metamodel. Finally,
we discuss a number of controversial cases, which will help
sharpen the notion of model size.
1 Sometimes, size is considered as the simplest possible measure for
complexity, based on the suspicion that large things are also complex
things.
2
Basic concepts
We must first define the basic concepts as a foundation
for the following discussions and definitions. The concepts
are formalized by a meta-metamodel (see Figure 2).
Model vs. original
A model is an abstraction of some original (cf. [6]). In software engineering, there are usually two originals:
the socio-economic or legacy system which is to be (partially) replaced, and the program system implementing
(part of) the replacement. With respect to the former original, a model is descriptive, while the same model is
prescriptive with respect to the latter.
[Figure (unnumbered): the real world of originals (physical entities) contains the socio-economic system and the program system; the model world of abstractions (concepts) contains the model, which relates to the former by description and to the latter by prescription; a diagram, text, table, or data structure is a presentation of the model.]
Model vs. document
A model may be presented as a diagram, but its content is independent of any visual or textual notation. The
content is basically a set of model elements and some attributes. The presentation is optional, and is done in the
form of documents capturing the model completely or partially. In many cases, the document is a diagram such as a
class diagram. While a model is content without (visual) representation, a document is a visual or textual
representation of a model in some concrete syntax without its content. Note that one possibility of representing a
model is to suppress some of the model details, such as in an abstraction or selection of elements of the model.
Model, Model element, attribute
A model is a purely conceptual entity, consisting of some attributes and a set of model elements. Model elements
consist of an id, a type, and a set of attributes, which are just name-value pairs. We assume appropriate
enumeration types for the various terminals to be available, i. e., String, MType, MFType, METype, and DType. A
model is typed as an instance or expression of some modeling language. For AType, we assume appropriate basic types
like INT, CHAR, and references to elements of the model in Figure 2.
Model, model language, model language family
The type of a model is called its model language. There are scores of model languages in practical use today. Some
“model languages” like IDEF, SDL, or UML are really families of model languages, while others like ERD,
StateCharts, or use case maps are just a single language. In the remainder, when we speak of a model language, we
refer to an individual model language, such as IDEF0, MSCs, or UML 2 activity diagrams. We reserve the term model
language family for compound languages such as UML, SDL, and IDEF. In the context of UML 2, a profile may constitute
a model language.
Model family
Typically, models do not exist in isolation (at least in industrial use). Rather they come as a possibly very large
set of models belonging together, referring to each other, and building on each other. We will call this a model
family. Exactly which models are supposed to be in a family, what role they play there, in which order and for which
purpose they are created and so on is determined by guidelines (such as a method or process) created or selected and
modified by a particular organization (i. e. a company, department, or project).
2.1
UML mapping
Table 1 sketches the mapping from UML metaclasses to the sizing meta-metamodel. Size functions defined on this
meta-metamodel may be applied to all UML diagrams.
Should a model need to have attributes of its own, then these could be attached to a special model element. Should
a model family need to have attributes of its own, then there could be a special model containing model elements
carrying these attributes.
Purely for convenience, we also define a textual representation of the Meta-Metamodel.
[Figure 2, top (class diagram "CD Sizing Meta-Metamodel"): classes ModelLanguageFamily (e. g. UML, IDEF), ModelLanguage (name: String, presentation: DType [1..*]; e. g. UML 2 activities, IDEF0, use case maps), ModelFamily (type: MFType, name: String; e. g. the analysis models for some application system), Model (type: MType; e. g. myClassModel, theCompanyDataModel), ModelElement (type: METype; e. g. ERD entities or UML 2 metaclasses), Attribute (key: String, value: AType; e. g. <name, foo> or <aggregationKind, none>), and Document (e. g. text or diagrams), connected by associations including "type" and "presentation".]
    ModelFamily   ::= mf Attributes Models end
    Models        ::= Model . Models | Model
    Model         ::= m MType Attributes ModelElements .
    ModelElements ::= ModelElement ; ModelElements | ModelElement
    ModelElement  ::= me Id METype Dim [Attributes]
    Attribute     ::= a Key Value
    Attributes    ::= Attribute , Attributes | Attribute
Figure 2. A class diagram representing a generic Meta-Metamodel for sizing models (top), a grammar
representing the same model (below).
Table 1. A mapping from UML to the metamodel of Figure 2.
    Sizing Meta-Metamodel | UML-Metamodel
    ModelFamily           | not in UML
    Model                 | not the UML Metaclass Model
    ModelElement          | all concrete subclasses of the UML Metaclass ModelElement
    Attribute             | all StructuralFeatures (i. e. Attribute or Association) of all UML Metaclasses
    Diagram               | not in UML
    AType                 | <<primitive>> types and references to InstanceSpecifications
3
Requirements for model size metrics
Model language independence
The modeling language should not contribute to the model size. Ideally, the same model should have the same size,
independent of the modeling language used. So, all three models in Figure 3 should have the same size. Only those
differences should contribute to differences in the size that are expressible only in one of several modeling
languages. For instance, multiplicities of attributes may be expressed in UML class diagrams, but not in IE diagrams
or ER diagrams.
[Figure 3: three diagrams, a UML class diagram (CD P), an IE diagram (IE Q), and an ER diagram (ER R), each relating the entities A and B.]
Figure 3. The same model in different modeling languages should have the same size.
Modeling style independence
Consider the two models presented by the UML class diagrams in Figure 4. According to the explanations in the
UML standard, both models have the same meaning. Thus, one would expect that they also have the same size. However,
the XMI files resulting from these two models differ.
[Figure 4: two UML 2 class diagrams, CD P and CD Q, relating the classes A and B; one of them uses an attribute b: B {composite}.]
Figure 4. Two UML 2 models with the same meaning.
Representation and tool independence
Different UML modeling tools use different internal data structures and file formats. Accordingly, if the same model
is created in different tools, the resulting data structures and files have different sizes. Even among UML tools
adhering to (the same version of) the XMI standard, different interpretations lead to XMI files of different size.
Clearly, a size metric should not be influenced by this, and the same model should yield the same size, no matter
what tool is used.
3.0.1
Method independence
If the same model is created using different methods, all differences in the size of the two model instances
should result only from the method, and should be directly attributable to the method used.
3.0.2
Comments as part of model
In all models, natural language comments are an important aspect. Therefore, comments should contribute to the size
of a model. Their size might be measured simply by their length as a text. In Figure 6, the model represented by
diagram Q should be larger than the model represented by diagram P.
[Figure 6: two class diagrams, CD P and CD Q, each showing the classes A and B; in CD Q, a comment "Some explanation of A in terms of the application domain" is attached.]
Figure 6. Comments should contribute to model size.
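To make the meta-metamodel of Figure 2 more concrete, the following minimal Python sketch represents its core classes as data types and renders a model in the textual form of the grammar. The class and function names, the use of plain strings in place of MType, METype, and AType, and the omission of the Dim component of the ModelElement rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    key: str
    value: str  # stands in for AType (basic value or element reference)


@dataclass
class ModelElement:
    id: str
    type: str  # stands in for METype, e.g. "Class"
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Model:
    type: str  # stands in for MType, e.g. "ClassDiagram"
    elements: List[ModelElement]
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class ModelFamily:
    models: List[Model]
    attributes: List[Attribute] = field(default_factory=list)


def render_attributes(attrs: List[Attribute]) -> str:
    # Attributes ::= Attribute , Attributes | Attribute
    return " , ".join(f"a {a.key} {a.value}" for a in attrs)


def render_element(me: ModelElement) -> str:
    # ModelElement ::= me Id METype [Attributes]  (Dim omitted by assumption)
    parts = ["me", me.id, me.type]
    if me.attributes:
        parts.append(render_attributes(me.attributes))
    return " ".join(parts)


def render_model(m: Model) -> str:
    # Model ::= m MType Attributes ModelElements .
    parts = ["m", m.type]
    if m.attributes:
        parts.append(render_attributes(m.attributes))
    parts.append(" ; ".join(render_element(e) for e in m.elements))
    parts.append(".")
    return " ".join(parts)


# Hypothetical example: a class model with two classes.
a = ModelElement("e1", "Class", [Attribute("name", "A"), Attribute("isAbstract", "false")])
b = ModelElement("e2", "Class")
print(render_model(Model("ClassDiagram", [a, b])))
```

Keeping the model as plain data, separate from any diagram or file format, mirrors the distinction between model and document made in Section 2.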
4
Different dimensions of size
In this section we discuss several factors that—from an
intuitive point of view—contribute to the size of a model.
4.1
Length
Probably the simplest aspect is the number of conceptual entities contained in a model. In fact, counts of particular
entities have been proposed several times (e. g. [2], or [7]; Whitmire calls this class of metric “population
metrics”, see [9]), though with very different counting strategies. The simplest and most general approach is to
always count everything, and instrument the count by an external weight table. Based on the grammar defined in
Figure 2 above, such a definition is straightforward.
Assuming that a model language is described in terms of a metamodel, this means simply counting the instances
of a metaclass (or more generally: metaconcept) and their attributes. Most likely, these counts will be complemented
by weight factors (to be determined empirically).
    length_f(ε) = 0
    length_f(x, w) = length_f(x) + length_f(w)
    length_f(x; w) = length_f(x) + length_f(w)
    length_f(mf Models end) = length_f(Models)
    length_f(m MType ModelElements .) = f(MType) ∗ length_f(ModelElements)
    length_f(me METype Attributes) = f(METype) ∗ length_f(Attributes)
    length_f(a Key Value) = f(Value)
for the empty word ε, any word w, and x a word of length 1.
We assume that for each combination of model family type (that is, method) and model type (that is, notation),
there is a function f which defines a non-negative real weight factor for every model element type.
The exact value of these weight factors will have to be determined empirically, relative to the respective purpose.
[Figure 5 (table): for the association properties aggregation kind, navigability, (lower / upper) multiplicity, and constraints, the weaker values (none, unspecified, unspecified, none) are contrasted with stronger values (aggregation, composition; navigable, not navigable; a concrete value; ordered, subsets, ...).]
Figure 5. Some examples for weaker (i. e., more generic) and stronger (i. e., more specific) values.
Elements in typewriter font are UML keywords.
4.2
Width
As we have already described above, models usually do not occur in isolation but as families of interrelated models.
Many development methods dwell on providing orthogonal views of the same entity like a static and a behavioral view
(see Figure 7). The idea is that a human (or machine) interpreter might gain additional, synergistic knowledge from
complementing different viewpoints.
This principle is known under the Gestalt-psychology slogan “the whole is more than the sum of its parts”. One
might also think of the cover illustration of Douglas Hofstadter's “Gödel, Escher, Bach: An Eternal Golden Braid”.
[Figure 7: a class diagram (CD P) of a class A with the operations nq and dq, next to a state machine (SM A) with the states empty and non_empty and transitions labeled nq and dq.]
Figure 7. Complementing viewpoints might create synergistic information.
4.3
Height
Consider two buildings, one small and one big (see sketch). From a certain point of view, both appear to be
equally tall, since their images are equally large. In fact, the amount of information a human observer processes
for both houses is equal.
However, many more details of the smaller house will
be visible, since the granularity of the picture is finer. Carrying this intuition over to models, a house is an original
system, an image is a model, and the distances between the
observer and the houses are the heights of abstraction or the
concreteness of the picture.
That is, intuitively, one would expect that the product of
the height of abstraction and the size of the original determines the size of the model (see Figure 8).
[Figure 8: sketch relating a large and a small original to their models M1 and M2 along the axes height of abstraction and concreteness (ranging from idea to system, and from less to more).]
Figure 8. The model size is the product of the height of abstraction and the size of the original.
In terms of models, consider the two models presented by the diagrams P and Q in Figure 9. Assume that both models
are identical, except for the originals they represent. In a way, the two models represent different heights of
abstraction, and thus different distances from the model to a program implementing it. Since the originals are very
different in size, in some sense, the two models have a very different size, too. Of course, this size aspect is not
obvious from the model presentation.
[Figure 9: two class diagrams, CD P and CD Q; one shows Button and EventQueue, the other SAP R/3 FI and UNIServ.]
Figure 9. The size of the original contributes to the intuitive size of its models as well.
If the model is descriptive, that is, the original is an existing program system, the size of the original can be determined by traditional metrics like Lines-of-code or Function
Points. If the model is prescriptive, the breadth of the model
cannot be determined mechanically. Instead, an estimate
based on similar systems might be used.
4.4
Depth
One aspect related to but different from height is the degree of concreteness or closeness to implementation in
terms of the degree of detail. We will call this aspect “depth” in the remainder. Where height speaks about
the distance between original and model, depth speaks about the model and its elements only (one could maybe
also say “sharpness”).
The model elements in a modeling language may be ordered in the sense that some of them are more specific than
others. For instance, the UML 2 metaclass Component is a direct specialization of the UML 2 metaclass Class.
Thus, the former represents more decisions by a modeler, i. e., more information. Thus, in the UML 2, a component
model element should have a higher weight than a class model element. So, the model presented in diagram Q of
Figure 10 should have a larger size than the model presented in diagram P.
[Figure 10: two diagrams, CD P and CD Q, each showing a single element A; in P it is a class, in Q a component.]
Figure 10. In the UML 2 metamodel, Component is a specialisation of Class.
As a generalization one might demand that for modeling languages organizing their model elements in a specialisation
hierarchy, the weight of a model element should at least partially be determined by its distance from the root in
the specialisation hierarchy. If there is more than one path from a model element to the root (that is, if the
specialisation hierarchy uses multiple inheritance as e. g. the UML does), then the longest path counts.
Another aspect of strength that cannot be captured by considering the depth in the inheritance tree is presented in
Figure 11. Obviously, model Q is more detailed than model P, though this does not necessarily increase the number of
model elements or attributes; it just replaces an unspecific value of an attribute by a more specific one.
[Figure 11: two class diagrams, CD P and CD Q, each relating A and B; in Q the association carries {ordered} and the multiplicity 2..4.]
Figure 11. Model Q contains more information than model P, since it is more specific.
In fact, there are many such examples for associations alone (see Figure 5), and plenty more when considering all
kinds of tags, stereotypes, constraints and other decorations of model elements.
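The longest-path rule for specialisation hierarchies can be sketched as follows. The hierarchy below is a simplified, hypothetical excerpt and not the actual UML 2 metamodel (only the fact that Component specialises Class is taken from the text above); the names PARENTS, specialisation_depth, and the element Hybrid are likewise assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical excerpt of a specialisation hierarchy: metaclass -> direct superclasses.
PARENTS: Dict[str, List[str]] = {
    "Element": [],
    "NamedElement": ["Element"],
    "Classifier": ["NamedElement"],
    "Class": ["Classifier"],
    "Component": ["Class"],
    # Multiple inheritance: two paths of different length lead to the root.
    "Hybrid": ["Component", "NamedElement"],
}


def specialisation_depth(metaclass: str, parents: Dict[str, List[str]] = PARENTS) -> int:
    """Return the length of the longest path from metaclass up to a root."""
    supers = parents.get(metaclass, [])
    if not supers:
        return 0  # a root has distance zero
    return 1 + max(specialisation_depth(s, parents) for s in supers)


for name in ("Class", "Component", "Hybrid"):
    print(name, specialisation_depth(name))
```

A weight function f could then, for example, grow with this depth, so that a Component element weighs more than a Class element; how strongly depth should influence the weight would have to be determined empirically.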
4.5
Density
In any given modeling language and method, each model element has a number of attributes that may or must be
filled. The more such attributes are actually filled, the higher is the degree of detail, which contributes
considerably to the size of a model, even if this is usually hidden behind some tool (see Figure 12). Comparing
modeling to photography, one might call density “resolution”.
[Figure 12: screenshot of an attribute dialog of a modeling tool.]
Figure 12. A screenshot from the ADONIS™ modeling tool: a simple graphical symbol like
a UML use case has a large number of attributes (24 in this particular case).
Of course, this implies that the value and the update status of attributes must be distinguished; otherwise, default
values would increase density automatically, even if they are not needed at all. So, the state of an attribute is
not just its value, but must also comprise the kind of the value (default, actual value, open question), as well as
the quality of the information (i. e. QA state or automatic checker state), since approved and verified information
is in some sense bigger or heavier than the same information if it is tentative.
It may be described by the following state machine (see Figure 13).
[Figure 13: state machine "attribute state"; its labels include empty, filled, <Value>, ??, open question, work in progress, submitted for approval, partially, approved, qa approved, unchecked, checked, modified, unchanged, ok, check, and modify.]
Figure 13. The states and transitions associated with an attribute. The double question
mark indicates an open question.
Thus, we assume that for each combination of model family type (that is, method) and model type (that is,
notation), there is a function aoe which defines a list of the attributes required for a properly specified model
element for every model element type.
The density d of a given model element me is thus defined as
    d(me) := aoe(me) / attrs(me)
where attrs(me) is defined as the number of attributes in the model element, i. e.
    attrs(me . . . Attributes) := |Attributes|
where |w| is the length of the word w.
Obviously, different tools and methods will provide different sets of attributes, and what might be a mandatory
attribute in one project might be optional in another project or at a later stage of the same project.
4.6
Compound size metrics
Based on these different aspects of size, we may now define a comprehensive metric for the size of a model. First,
consider height and depth. Both contribute to model size
independently, and together they determine the distance to
be covered from a model to an implementation. We call this
the breadth of the gap between model and implementation,
defined as the product of height and depth.
Next, we define the metric volume as the product of length, width, and
breadth. Similarly, a metric mass
may be defined as the product of volume and density.
    breadth = height ∗ depth
    volume  = length ∗ width ∗ breadth
    mass    = volume ∗ density
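The three compound metrics can be written down directly, as in the following minimal Python sketch. The class name SizeDimensions and the numeric values in the example are hypothetical; how the five basic dimensions are measured or estimated is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeDimensions:
    """The five basic size dimensions of a model (values are assumed inputs)."""
    length: float
    width: float
    height: float
    depth: float
    density: float

    @property
    def breadth(self) -> float:
        # breadth = height * depth
        return self.height * self.depth

    @property
    def volume(self) -> float:
        # volume = length * width * breadth
        return self.length * self.width * self.breadth

    @property
    def mass(self) -> float:
        # mass = volume * density
        return self.volume * self.density


# Hypothetical example values, chosen only to show the arithmetic.
s = SizeDimensions(length=120.0, width=2.0, height=1.5, depth=2.0, density=0.5)
print(s.breadth, s.volume, s.mass)
```

Because all three metrics are plain products, any dimension that is set to zero makes the compound value vanish, so dimensions that cannot be determined are better represented by a neutral factor of 1 than by 0.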
For an appropriately defined model-difference operator
diff, it is possible to define a model-distance metric d as
    d(M1, M2) := length_f(diff(M1, M2)).
Together with the class of all models M, d forms a metric
space ⟨M, d⟩.
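As an illustration, the length and distance metrics might be computed as in the following Python sketch. Since diff is left unspecified here, the sketch assumes, purely for illustration, that diff is the symmetric difference of the element sets of two models of the same type, and it flattens the grammar of Figure 2 into nested tuples. All names in the code are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

Attr = Tuple[str, str]                      # (key, value)
Element = Tuple[str, str, FrozenSet[Attr]]  # (id, METype, attributes)
Model = Tuple[str, FrozenSet[Element]]      # (MType, model elements)
Weight = Callable[[str], float]             # the weight function f


def length_f(model: Model, f: Weight) -> float:
    """Weighted length, following the equations of Section 4.1."""
    mtype, elements = model
    total = 0.0
    for _id, metype, attrs in elements:
        # length_f(me METype Attributes) = f(METype) * length_f(Attributes)
        total += f(metype) * sum(f(value) for _key, value in attrs)
    # length_f(m MType ModelElements .) = f(MType) * length_f(ModelElements)
    return f(mtype) * total


def diff(m1: Model, m2: Model) -> Model:
    """Assumed difference operator: elements present in exactly one model."""
    mtype1, elements1 = m1
    mtype2, elements2 = m2
    if mtype1 != mtype2:
        raise ValueError("this sketch only compares models of the same type")
    return (mtype1, elements1 ^ elements2)


def distance(m1: Model, m2: Model, f: Weight) -> float:
    # d(M1, M2) := length_f(diff(M1, M2))
    return length_f(diff(m1, m2), f)


def unit(_: str) -> float:
    """Hypothetical weight function: every type and value weighs 1."""
    return 1.0


a: Element = ("e1", "Class", frozenset({("name", "A")}))
b: Element = ("e2", "Class", frozenset({("name", "B"), ("isAbstract", "true")}))
m1: Model = ("ClassDiagram", frozenset({a}))
m2: Model = ("ClassDiagram", frozenset({a, b}))
print(length_f(m1, unit), length_f(m2, unit), distance(m1, m2, unit))
```

Taken literally, the length equations give an element without attributes a length of zero; whether such elements should nevertheless count is one of the choices that an implementation would have to make explicit.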
5
Discussion
In this paper we study the notion of “model size” by
examining different aspects of this notion. We define a
generic metamodel that allows size-relevant features of any conceivable modeling language to be captured. It is a kind
of normal form for models independent of modeling language, methodology, and similar factors, and will thus allow these factors to be studied experimentally by keeping the
model fixed and varying the other variables like modeling
language, modeling style etc.
A number of aspects of model size are difficult to quantify (e. g. height, depth). At this point, we cannot even say
whether they actually have an influence, much less how big
it is. If factors like these have an influence, its magnitude must
be studied empirically, and it would be an interesting question to find out what context conditions determine them.
In order to proceed toward this goal, we need to provide a mapping to the metamodel for the most important
modeling languages besides the UML. Then, the measures
must be implemented and applied to real models in order to
study them further. A part of this implementation has already been completed, and some data have been collected,
but much more data is needed.
References
[1] K. Beck. Extreme Programming. Addison-Wesley, 2000.
[2] S. R. Chidamber and C. F. Kemerer. Towards a Metrics Suite for Object Oriented Design. In Proc. Intl. Conf. Object
Oriented Programming, Systems, and Languages (OOPSLA), pages 197–211. ACM Press, 1991.
[3] OMG. UML 2.0 Superstructure Specification (formal/05-07-04). Technical report, Object Management Group, Aug. 2005.
Available at www.omg.org, downloaded on September 19th, 2005.
[4] MDA Guide Version 1.0.1. Technical report, Object Management Group, June 2003. Available at www.omg.org/mda,
document number omg/2003-06-01.
[5] B. Selic, S. Kent, and A. Evans, editors. Proc. 3rd Intl. Conf. <<UML>> 2000. Advancing the Standard, number 1939 in
LNCS. Springer Verlag, Oct. 2000.
[6] H. Stachowiak. Allgemeine Modelltheorie. Springer Verlag, 1973.
[7] H. Störrle. Models of Software Architecture. Design and Analysis with UML and Petri-nets. PhD thesis, LMU München,
Institut für Informatik, Dec. 2000. ISBN 3-8311-1330-0.
[8] H. Störrle. UML 2 erfolgreich einsetzen. Addison-Wesley, 2005.
[9] S. A. Whitmire. Object-Oriented Design Measurement. Wiley Publishing Inc., 1997.