On different notions of model size (position paper)

Harald Störrle
University of Innsbruck
Technikerstrasse 21A
6020 Innsbruck, Austria

1 Introduction

1.1 Motivation

Models are an important artifact in the process of developing any large-scale system. In model-driven approaches such as those promoted by the OMG's Model Driven Architecture (MDA, [4]), they even play a central role. But even in explicitly code-centric approaches such as XP [1], and even in the small-scale development efforts where these approaches may be applied, there usually are models. They might not be fully fledged formal models, but rather sketches, and their role might not be quite as pivotal as in MDA, but they are models nonetheless.

The more models are created and used in a development effort, the more they are subject to all the activities that traditionally apply to code, such as debugging, reviewing, versioning, reusing and so on. Thus, measuring attributes like the quality of models or the productivity in creating and improving them becomes important. One prerequisite for defining such measures is an adequate notion of the size of a model. For instance, in analogy to similar notions concerning programs, quality might be measured as errors by size, and productivity might be measured as size by effort.

Even though today the Unified Modeling Language (UML, cf. [3, 8]) is the "lingua franca of software engineering" (cf. [5, p. iv]), there are many other languages around, and since many of them have been around for many years, there are many models out there that have been created and are continuously maintained using them. Comparing different modeling languages is an important issue in its own right. Therefore, it is essential that a model sizing approach is general enough to cater for a wide range of modeling languages. Incidentally, this will also help in quickly gaining a sufficiently large empirical basis for calibrating weight factors.
1.2 Approach

This paper is entirely devoted to models as they occur in the design and maintenance of business information systems. Models from domains like system software have not been taken into account, though the results should at least be transferable. Only the size of models is considered; other interesting properties like quality and complexity¹ are outside our focus (see Figure 1).

Figure 1. Dimensions of the size of models. (Figure labels: quality, size, complexity; length, width, breadth, density, strength.)

The general idea of this approach is straightforward: the size of a model should somehow express the amount of information contained in the model. Intuitively, this corresponds to the number of decisions a modeler has to make while modeling, or the number of answers to (unasked) questions a reader gets from reading the model.

First, we identify four dimensions that might intuitively be considered as aspects of the size of a model. In order to be able to compare models expressed in different modeling languages, we define a very simple and general metamodel that is capable of capturing all notions of any modeling language. We then sketch the mapping from the UML 2 metamodel to this sizing metamodel, and define a size metric based on the sizing metamodel. Finally, we discuss a number of controversial cases, which will help sharpen the notion of model size.

¹ Sometimes, size is considered the simplest possible measure of complexity, based on the suspicion that large things are also complex things.

2 Basic concepts

We must first define the basic concepts as a foundation for the following discussions and definitions. The concepts are formalized by a meta-metamodel (see Figure 2). For AType, we assume appropriate basic types like INT, CHAR, and references to elements of the model in Figure 2.

Model vs. original

A model is an abstraction of some original (cf. [6]).
In software engineering, there are usually two originals: the socio-economic or legacy system which is to be (partially) replaced, and the program system implementing (part of) the replacement. With respect to the former original, a model is descriptive, while the same model is prescriptive with respect to the latter.

[Figure: model vs. original. Labels: real world: originals (physical entities), socio-economic system, program system; model world: abstractions (concepts), model; description, prescription, presentation; diagram, text, table, data structure.]

Model, model language, model language family

The type of a model is called its model language. There are scores of model languages in practical use today. Some "model languages" like IDEF, SDL, or UML are really families of model languages, while others like ERD, StateCharts, or use case maps are just a single language. In the remainder, when we speak of a model language, we refer to an individual model language, such as IDEF0, MSCs, or UML 2 activity diagrams. We reserve the term model language family for compound languages such as UML, SDL, and IDEF. In the context of UML 2, a profile may constitute a model language.

Model family

Typically, models do not exist in isolation (at least in industrial use). Rather, they come as a possibly very large set of models belonging together, referring to each other, and building on each other. We will call this a model family. Exactly which models are supposed to be in a family, what role they play there, in which order and for which purpose they are created, and so on, is determined by guidelines (such as a method or process) created or selected and modified by a particular organization (i.e. a company, department, or project).

Model vs. document

A model may be presented as a diagram, but its contents are independent of any visual or textual notation. The contents are basically a set of model elements and some attributes. The presentation is optional, and is done in the form of documents capturing the model completely or partially.
In many cases, the document is a diagram such as a class diagram. While a model is contents without (visual) representation, a document is a visual or textual representation of a model in some concrete syntax without its contents. Note that one possibility of representing a model is to suppress some of the model details, such as in an abstraction or selection of elements of the model.

Model, model element, attribute

A model is a purely conceptual entity, consisting of some attributes and a set of model elements. A model element consists of an id, a type, and a set of attributes, which are just name-value pairs. A model is typed as an instance or expression of some modeling language. Should a model need to have attributes of its own, then these could be attached to a special model element. Should a model family need to have attributes of its own, then there could be a special model containing model elements carrying these attributes. Purely for convenience, we also define a textual representation of the meta-metamodel. We assume appropriate enumeration types for the various terminals to be available, i.e., String, MType, MFType, METype, and DType.

2.1 UML mapping

Table 1 sketches the mapping from UML metaclasses to the sizing meta-metamodel. Size functions defined on this meta-metamodel may be applied to all UML diagrams.

3 Requirements for model size metrics

Model language independence

The modeling language should not contribute to the model size. Ideally, the same model should have the same size, independent of the modeling language used. So, all three models in Figure 3 should have the same size. Only those differences that are expressible in just one of several modeling languages should contribute to differences in size. For instance, multiplicities of attributes may be expressed
in UML class diagrams, but not in IE diagrams or ER diagrams.

[Figure 2, top: class diagram "Sizing Meta-Metamodel"]
  ModelLanguageFamily, e.g. UML, IDEF
  ModelLanguage (name: String, presentation: DType [1..*]), e.g. UML 2 activities, IDEF0, use case maps
  ModelFamily (type: MFType, name: String), e.g. the analysis models for some application system
  Model (type: MType), e.g. myClassModel, theCompanyDataModel
  ModelElement (type: METype), e.g. ERD entities or UML 2 metaclasses
  Attribute (key: String, value: AType), e.g. <name, foo> or <aggregationKind, none>
  Document, e.g. text or diagrams

[Figure 2, below: grammar]
  ModelFamily   ::= mf Attributes Models end
  Models        ::= Model . Models | Model
  Model         ::= m MType Attributes ModelElements .
  ModelElements ::= ModelElement ; ModelElements | ModelElement
  ModelElement  ::= me Id METype Dim [Attributes]
  Attribute     ::= a Key Value
  Attributes    ::= Attribute , Attributes | Attribute

Figure 2. A class diagram representing a generic Meta-Metamodel for sizing models (top), a grammar representing the same model (below).

Table 1. A mapping from UML to the meta model of Figure 2.

  Sizing Meta-Metamodel   UML-Metamodel
  ---------------------   -------------------------------------------------------------
  ModelFamily             not in UML
  Model                   not the UML Metaclass Model
  ModelElement            all concrete subclasses of the UML Metaclass ModelElement
  Attribute               all StructuralFeatures (i.e. Attribute or Association) of all UML Metaclasses
  Diagram                 not in UML
  AType                   «primitive» types and references to InstanceSpecifications

Figure 4. Two UML 2 models with the same meaning.

Even where tools adhere to the same standard, different interpretations lead to XMI files of different size. Clearly, a size metric should not be influenced by this, and the same model should yield the same size, no matter what tool is used.

3.0.1 Method independence

If the same model is created using different methods, all differences in the size of the two model instances should result only from the method, and should be directly attributable to the method used.

3.0.2 Comments as part of model

In all models, natural language comments are an important aspect.
Therefore, comments should contribute to the size of a model. Their size might be measured simply by their length as a text. In Figure 6, the model represented by diagram Q should be larger than the model represented by diagram P.

Figure 3. The same model in different modeling languages should have the same size.

Figure 6. Comments should contribute to model size. (In diagram Q, the comment reads "Some explanation of A in terms of the application domain".)

Modeling style independence

Consider the two models presented by the UML class diagrams in Figure 4. According to the explanations in the UML standard, both models have the same meaning. Thus, one would expect that they also have the same size. However, the XMI files resulting from these two models differ.

Representation and tool independence

Different UML modeling tools use different internal data structures and file formats. Accordingly, if the same model is modeled in different tools, the resulting data structures and files have different sizes. Even among UML tools adhering to (the same version of) the XMI standard, different interpretations of the standard occur.

Figure 5. Some examples for weaker (i.e., more generic) and stronger (i.e., more specific) values. Elements in typewriter font are UML keywords.

  Attribute                      Weaker value    Stronger values
  ----------------------------   -------------   ------------------------
  aggregation kind               none            aggregation, composition
  navigability                   unspecified     navigable, not navigable
  (lower / upper) multiplicity   unspecified     value
  constraints                    none            ordered, subsets, ...

4 Different dimensions of size

In this section we discuss several factors that, from an intuitive point of view, contribute to the size of a model.

4.1 Length

Probably the simplest aspect is the number of conceptual entities contained in a model. In fact, counts of particular entities have been proposed several times (e.g. [2] or [7]; Whitmire calls this class of metric "population metrics", see [9]), though with very different counting strategies. The simplest and most general approach is to always count everything, and instrument the count by an external weight table.
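As an illustration of counting everything and weighting the counts through an external table, the following minimal sketch evaluates the recursive length function of this section over the sizing meta-metamodel. The Python class names, the single lookup table shared by model types, element types and attribute values, the default weight of 1.0, and the concrete weight values are all assumptions made for this illustration; the actual weight factors would have to be determined empirically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Attribute:
    key: str
    value: str


@dataclass
class ModelElement:
    id: str
    type: str  # METype, e.g. the name of a UML 2 metaclass
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Model:
    type: str  # MType, i.e. the notation of the model
    elements: List[ModelElement] = field(default_factory=list)


@dataclass
class ModelFamily:
    models: List[Model] = field(default_factory=list)


Weight = Callable[[str], float]


def table_weight(table: Dict[str, float], default: float = 1.0) -> Weight:
    """Turn an external weight table into a weight function f.

    The default weight of 1.0 is an assumption of this sketch.
    """
    return lambda name: table.get(name, default)


def length_attribute(a: Attribute, f: Weight) -> float:
    # An attribute contributes the weight of its value.
    return f(a.value)


def length_element(me: ModelElement, f: Weight) -> float:
    # Element weight times the summed attribute lengths;
    # an element without attributes therefore yields 0.
    return f(me.type) * sum(length_attribute(a, f) for a in me.attributes)


def length_model(m: Model, f: Weight) -> float:
    # Model-type weight times the summed element lengths.
    return f(m.type) * sum(length_element(me, f) for me in m.elements)


def length_family(mf: ModelFamily, f: Weight) -> float:
    # The length of a family is the sum over its models.
    return sum(length_model(m, f) for m in mf.models)


# Hypothetical example: one class model containing a Class and a Component.
family = ModelFamily(models=[
    Model(type="ClassModel", elements=[
        ModelElement("e1", "Class", [Attribute("name", "A")]),
        ModelElement("e2", "Component",
                     [Attribute("name", "B"), Attribute("isAbstract", "false")]),
    ]),
])
weights = table_weight({"Component": 2.0})  # made-up weight factor

print(length_family(family, weights))
```

Keeping the weight table outside the counting functions means that the same model can be re-measured under a different method or notation simply by swapping the table.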
Based on the grammar defined in Figure 2 above, such a definition is straightforward. Assuming that a model language is described in terms of a metamodel, this simply means counting the instances of a metaclass (or more generally: metaconcept) and their attributes. Most likely, these counts will be complemented by weight factors (to be determined empirically).

\[
\begin{aligned}
\mathit{length}_f(\epsilon) &= 0 \\
\mathit{length}_f(x, w) &= \mathit{length}_f(x) + \mathit{length}_f(w) \\
\mathit{length}_f(x; w) &= \mathit{length}_f(x) + \mathit{length}_f(w) \\
\mathit{length}_f(\texttt{mf}\ \mathit{Models}\ \texttt{end}) &= \mathit{length}_f(\mathit{Models}) \\
\mathit{length}_f(\texttt{m}\ \mathit{MType}\ \mathit{ModelElements}\ \texttt{.}) &= f(\mathit{MType}) \cdot \mathit{length}_f(\mathit{ModelElements}) \\
\mathit{length}_f(\texttt{me}\ \mathit{METype}\ \mathit{Attributes}) &= f(\mathit{METype}) \cdot \mathit{length}_f(\mathit{Attributes}) \\
\mathit{length}_f(\texttt{a}\ \mathit{Key}\ \mathit{Value}) &= f(\mathit{Value})
\end{aligned}
\]

for the empty word \(\epsilon\), any word w, and x a word of length 1. We assume that for each combination of model family type (that is, method) and model type (that is, notation), there is a function f which defines a non-negative real weight factor for every model element type. The exact values of these weight factors will have to be determined empirically, relative to the respective purpose.

4.2 Width

As described above, models usually do not occur in isolation but as families of interrelated models. Many development methods dwell on providing orthogonal views of the same entity, like a static and a behavioral view (see Figure 7). The idea is that a human (or machine) interpreter might gain additional, synergistic knowledge from complementing different viewpoints. One might also think of the cover illustration of Douglas Hofstadter's "Gödel, Escher, Bach: an Eternal Golden Braid".

Figure 7. Complementing viewpoints might create synergistic information.

4.3 Height

Consider two buildings, one small and one big (see sketch). From a certain point of view, both appear to be equally tall, since their images are equally large. In fact, the amount of information a human observer processes for both houses is equal.
However, many more details of the smaller house will be visible, since the granularity of the picture is finer. Carrying this intuition over to models, a house is an original system, an image is a model, and the distances between the observer and the houses are the heights of abstraction or the concreteness of the picture. That is, intuitively, one would expect that the product of the height of abstraction and the size of the original determines the size of the model (see Figure 8).

Figure 8. The model size is the product of the height of abstraction and the size of the original. (Figure labels: idea, large original, small original system; models M1 and M2; height of abstraction; concreteness, from less to more.)

In terms of models, consider the two models presented by the diagrams P and Q in Figure 9. Assume that both models are identical, except for the originals they represent. In a way, the two models represent different heights of abstraction, and thus different distances from the model to a program implementing it. Since the originals are very different in size, in some sense, the two models have a very different size, too. Of course, this size aspect is not obvious from the model presentation.

Figure 9. The size of the original contributes to the intuitive size of its models as well. (Figure labels: Button, EventQueue; SAP R/3 FI, UNIServ.)

If the model is descriptive, that is, the original is an existing program system, the size of the original can be determined by traditional metrics like lines of code or function points. If the model is prescriptive, the breadth of the model cannot be determined mechanically. Instead, an estimate based on similar systems might be used.

A remark on width (Section 4.2): the principle that complementing viewpoints yield additional knowledge is known under the Gestalt-psychology slogan "the whole is more than the sum of its parts".

4.4 Depth

Where height speaks about the distance between original and model, depth speaks about the model and its elements only (one could maybe also say "sharpness").
Depth is an aspect related to but different from height: the degree of concreteness or closeness to implementation in terms of the degree of detailedness.

The model elements in a modeling language may be ordered in the sense that some of them are more specific than others. For instance, the UML 2 metaclass Component is a direct specialization of the UML 2 metaclass Class. Thus, the former represents more decisions by a modeler, i.e., more information. Consequently, in UML 2, a component model element should have a higher weight than a class model element. So, the model presented in diagram Q of Figure 10 should have a larger size than the model presented in diagram P.

Figure 10. In the UML 2 metamodel, Component is a specialisation of Class.

As a generalization, one might demand that for modeling languages organizing their model elements in a specialisation hierarchy, the weight of a model element should at least partially be determined by its distance from the root in the specialisation hierarchy. If there is more than one path from a model element to the root (that is, if the specialisation hierarchy uses multiple inheritance, as e.g. the UML does), then the longest path counts.

Another aspect of strength that cannot be captured by considering the depth in the inheritance tree is presented in Figure 11. Obviously, model Q is more detailed than model P, though this does not necessarily increase the number of model elements or attributes; it just replaces an unspecific value of an attribute by a more specific one.

Figure 11. Model Q contains more information than model P, since it is more specific. (In diagram Q, the association between A and B carries {ordered} and the multiplicity 2..4.)

In fact, there are many such examples for associations alone (see Figure 5), and plenty more when considering all kinds of tags, stereotypes, constraints and other decorations of model elements.
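The longest-path rule for the specialisation hierarchy described in this section can be sketched as follows. This is only an illustration under stated assumptions: the hierarchy excerpt is hypothetical and heavily simplified (only the fact that Component specialises Class is taken from the text, and "Hybrid" is a made-up element with two parents), and the linear mapping from depth to weight with the parameters `base` and `step` is an invented placeholder for an empirically calibrated function.

```python
from typing import Dict, List

# Hypothetical, simplified excerpt of a specialisation hierarchy:
# each metatype maps to the list of its direct generalisations.
PARENTS: Dict[str, List[str]] = {
    "Element": [],
    "NamedElement": ["Element"],
    "Classifier": ["NamedElement"],
    "Class": ["Classifier"],
    "Component": ["Class"],
    # Made-up element with multiple inheritance, to exercise the longest-path rule.
    "Hybrid": ["NamedElement", "Component"],
}


def depth(metatype: str, parents: Dict[str, List[str]]) -> int:
    """Length of the longest path from metatype to a root (assumes an acyclic hierarchy)."""
    memo: Dict[str, int] = {}

    def walk(t: str) -> int:
        if t not in memo:
            supers = parents[t]
            # A root has depth 0; otherwise the longest path through any parent counts.
            memo[t] = 0 if not supers else 1 + max(walk(p) for p in supers)
        return memo[t]

    return walk(metatype)


def depth_weight(metatype: str, parents: Dict[str, List[str]],
                 base: float = 1.0, step: float = 0.5) -> float:
    """Assumed linear weight: more specific metatypes receive a higher weight."""
    return base + step * depth(metatype, parents)


for name in ("Class", "Component", "Hybrid"):
    print(name, depth(name, PARENTS), depth_weight(name, PARENTS))
```

A weight function of this kind could serve as the function f of the length metric, so that a Component element automatically weighs more than a Class element.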
4.5 Density

In any given modeling language and method, each model element has a number of attributes that may or must be filled. The more such attributes are actually filled, the higher is the degree of detail, which contributes considerably to the size of a model, even if this is usually hidden behind some tool (see Figure 12). Comparing modeling to photography, one might call density "resolution".

Figure 12. A screenshot from the ADONIS(TM) modeling tool: a simple graphical symbol like a UML use case has a large number of attributes (24 in this particular case).

Obviously, different tools and methods will provide different sets of attributes, and what might be a mandatory attribute in one project might be optional in another project or at a later stage of the same project.

Of course, this implies that the value and the update status of attributes must be distinguished; otherwise, default values would increase density automatically, even if they are not at all needed. So, the state of an attribute is not just its value, but must also comprise the kind of the value (default, actual value, open question), as well as the quality of the information (i.e. QA state or automatic checker state), since approved and verified information is in some sense bigger or heavier than the same information if it is tentative. It may be described by the following state machine (see Figure 13).

Figure 13. The states and transitions associated with an attribute. The double question mark indicates an open question. (Figure labels: attribute state; check: unchecked, checked; modify: modified, unchanged; ok; <Value>, ??; empty, filled; open question, work in progress, submitted for approval, partially approved, qa approved.)

The number of attributes in a model element is attrs(me ... Attributes) := |Attributes|, where |w| is the length of the word w.

4.6 Compound size metrics
For density, we assume that for each combination of model family type (that is, method) and model type (that is, notation), there is a function aoe which defines a list of the attributes required for a properly specified model element for every model element type. The density d of a given model element me is thus defined as

\[ d(me) := \frac{\mathit{aoe}(me)}{\mathit{attrs}(me)} \]

where attrs(me) is the number of attributes in the model element me.

Based on these different aspects of size, we may now define a comprehensive metric for the size of a model. First, consider height and depth. Both contribute to model size independently, and together they determine the distance to be covered from a model to an implementation. We call this the breadth of the gap between model and implementation, defined as the product of height and depth. Next, we combine length, width, and breadth into the metric volume, defined as their product. Similarly, a metric mass may be defined as the product of volume and density.

\[
\begin{aligned}
\mathit{breadth} &= \mathit{height} \cdot \mathit{depth} \\
\mathit{volume} &= \mathit{length} \cdot \mathit{width} \cdot \mathit{breadth} \\
\mathit{mass} &= \mathit{volume} \cdot \mathit{density}
\end{aligned}
\]

For an appropriately defined model-difference operator diff, it is possible to define a model-distance metric d as

\[ d(M_1, M_2) := \mathit{length}_f(\mathit{diff}(M_1, M_2)). \]

Together with the class of all models \(\mathcal{M}\), d forms a metric space \(\langle \mathcal{M}, d \rangle\).

5 Discussion

In this paper we study the notion of "model size" by examining different aspects of this notion. We define a generic metamodel that allows capturing size-relevant features of any conceivable modeling language. It is a kind of normal form for models independent of modeling language, methodology, and similar factors, and will thus allow studying these factors experimentally by keeping the model fixed and varying the other variables like modeling language, modeling style etc.

A number of aspects of model size are difficult to quantify (e.g. height, depth). At this point, we can't even say whether they actually have an influence, much less how big it is.
If factors like these have an influence, the size of that influence must be studied empirically, and it would be an interesting question to find out what context conditions determine them.

In order to proceed toward this goal, we need to provide a mapping to the metamodel for the most important modeling languages besides the UML. Then, the measures must be implemented and applied to real models in order to study them further. A part of this implementation has already been completed, and some data have been collected, but much more data are needed.

References

[1] K. Beck. Extreme Programming. Addison-Wesley, 2000.
[2] S. R. Chidamber and C. F. Kemerer. Towards a Metrics Suite for Object Oriented Design. In Proc. Intl. Conf. Object Oriented Programming, Systems, and Languages (OOPSLA), pages 197–211. ACM Press, 1991.
[3] OMG. UML 2.0 Superstructure Specification (formal/05-07-04). Technical report, Object Management Group, Aug. 2005. Available at www.omg.org, downloaded on September 19th, 2005.
[4] MDA Guide Version 1.0.1. Technical report, Object Management Group, June 2003. Available at www.omg.org/mda, document number omg/2003-06-01.
[5] B. Selic, S. Kent, and A. Evans, editors. Proc. 3rd Intl. Conf. «UML» 2000. Advancing the Standard, number 1939 in LNCS. Springer Verlag, Oct. 2000.
[6] H. Stachowiak. Allgemeine Modelltheorie. Springer Verlag, 1973.
[7] H. Störrle. Models of Software Architecture. Design and Analysis with UML and Petri-nets. PhD thesis, LMU München, Institut für Informatik, Dec. 2000. ISBN 3-8311-1330-0.
[8] H. Störrle. UML 2 erfolgreich einsetzen. Addison-Wesley, 2005.
[9] S. A. Whitmire. Object-Oriented Design Measurement. Wiley Publishing Inc., 1997.