
University of Innsbruck
Department of Computer Science
Dissertation
Applying Cognitive Psychology for Improving the
Creation, Understanding and Maintenance of Business
Process Models
Stefan Zugal
submitted to the Faculty of Mathematics, Computer Science and Physics of the
University of Innsbruck in partial fulfillment of the requirements for the degree of
“Doktor der Naturwissenschaften”
Advisor:
Assoc.–Prof. Dr. Barbara Weber
Innsbruck, 2013
Abstract
Considering the widespread adoption of business process modeling, the role of process models has become ever more central. Still, industrial process model collections display a wide range of quality problems, resulting in active research on the quality of process models, business process modeling languages and methods. This thesis contributes to this stream of research by advocating the adoption of concepts from cognitive psychology for improving business process modeling languages and
methods. To address this rather broad research statement, this thesis focuses on two
particular problems that are approached with the support of cognitive psychology.
First, issues related to the creation, understanding and maintenance of declarative
process models are analyzed. To counteract respective problems, the adoption of
test cases, supporting the test driven development and maintenance of declarative
process models, is proposed. By developing a prototypical implementation of the
proposed concepts and employing them in empirical studies, the feasibility of the
proposed approach is demonstrated. More specifically, empirical evidence for the
positive influence on the creation and understanding of declarative process models
is provided in a case study. Furthermore, beneficial effects on the maintenance of
declarative process models are established in the course of two experiments.
Second, the focus is shifted toward the interplay between a process model’s modularization and the resulting impact on understandability. By conducting a systematic
literature review, apparently contradictory findings related to the understanding of modularization are identified. To resolve these apparent contradictions, a cognitive–
theory–based framework for assessing the impact of a process model’s modularization on its understandability is proposed. The subsequent empirical validation in
the course of three experiments provides empirical evidence for the validity of the
proposed framework.
Summarizing, the creation, understanding and maintenance of declarative process
models as well as the connection between a process model’s modularization and
its understandability are successfully addressed. Thus, it can be concluded that
concepts from cognitive psychology are indeed a promising foundation for improving
business process modeling languages and methods.
Zusammenfassung
In Anbetracht der weiten Verbreitung der Geschäftsprozessmodellierung ist die Rolle
von Geschäftsprozessmodellen, kurz Prozessmodellen, zentraler denn je. Nichtsdestotrotz weisen industrielle Prozessmodelle immer noch eine Vielzahl von Qualitätsproblemen auf, was wiederum zu reger Forschungsaktivität führte, um Prozessmodelle, Prozessmodellierungssprachen sowie Methoden zur Erstellung von Prozessmodellen zu verbessern. Auch diese Dissertation befasst sich mit diesem Forschungszweig und untersucht, wie Qualitätsverbesserung von Prozessmodellen zielgerichtet
durch das Übertragen von Konzepten aus der kognitiven Psychologie erreicht werden kann. Um diese breite Forschungsfrage zu behandeln, werden zwei spezifische
Probleme aus der Prozessmodellierung mit Hilfe von Konzepten aus der kognitiven
Psychologie adressiert.
Der erste Teil dieser Dissertation befasst sich mit dem Erstellen, Verstehen und
Warten von deklarativen Prozessmodellen. Um bestehenden Problemen gegenzusteuern, werden Testfälle vorgeschlagen, die es erlauben, deklarative Prozessmodelle testgetrieben zu entwickeln und zu warten. Die Machbarkeit von Testfällen
für deklarative Prozessmodelle wird mit Hilfe einer prototypischen Implementierung
und folgender empirischen Validierung demonstriert. Dabei wird im Rahmen einer
Fallstudie der positive Einfluss von Testfällen auf die Erstellung von deklarativen
Prozessmodellen belegt. Des Weiteren wird der positive Einfluss von Testfällen
auf die Wartbarkeit von deklarativen Modellen im Zuge zweier Experimente untermauert.
Im zweiten Teil dieser Dissertation wird der Fokus auf die Verständlichkeit von
Prozessmodellen gelegt und das Zusammenspiel zwischen der Modularisierung eines
Prozessmodells und dessen Verständlichkeit genauer beleuchtet. Im Rahmen einer
systematischen Literaturanalyse werden offenbar widersprüchliche Ergebnisse empirischer Forschungen identifiziert und mit Hilfe eines auf kognitiver Psychologie
basierenden Frameworks aufgelöst. Analog zum ersten Teil dieser Dissertation werden die darin ausgearbeiteten Konzepte im Zuge von drei Experimenten empirisch
getestet und validiert.
In einem Satz gesagt, adressiert diese Dissertation das Erstellen, die Verständlichkeit und die Wartbarkeit von deklarativen Prozessmodellen sowie das Zusammenspiel der Modularisierung eines Prozessmodells und dessen Verständlichkeit mit Hilfe
von Konzepten aus der kognitiven Psychologie. In Anbetracht der Ergebnisse kann
geschlussfolgert werden, dass Konzepte aus der kognitiven Psychologie in der Tat
einen vielversprechenden Ausgangspunkt für die Verbesserung von Prozessmodellen,
Prozessmodellierungssprachen und –methoden bilden.
Acknowledgements
From my subjective point of view, a PhD feels like an exhausting and demanding, but
still enjoyable and rewarding journey. In this sense, I want to thank everybody who accompanied and supported this endeavor. First and foremost, I am indebted to
the continuous support of my advisor, Barbara Weber. Thank you for your guidance
and your encouragement—not only on a professional, but also on a personal level.
The next person, who undoubtedly deserves a paragraph of his own, is Jakob Pinggera.
Thank you for the countless discussions, feedback and suggestions. Also, I highly
appreciate your support regarding coffee domination as well as all other not–so–
serious projects that lighten up the daily routine.
Also, I want to thank my parents and family for giving me a home I always
cherish coming back to. Thank you for your sheer endless patience, encouragement
and solace. Thank you, Eva Zangerle, for giving me a similarly enjoyable new home.
Furthermore, this work would not have been possible without the highly appreciated aid of researchers from other universities. Most of all, I am indebted to the support of Manfred Reichert, Hajo Reijers and Pnina Soffer. Besides your professional support, I highly appreciate our cooperation on a personal level. Thank
you Manfred for letting me learn that there are also Germans who appreciate soccer
players from Austria. Thank you Hajo for introducing me to the pleasures of whisky
and squash. Thank you Pnina for not throwing shoes at me.
If you, dear reader, are wondering why your name was not mentioned so far, please
be forgiving and let me catch up for that. Thank you.
Eidesstattliche Erklärung
Ich erkläre hiermit an Eides statt durch meine eigenhändige Unterschrift, dass ich
die vorliegende Arbeit selbständig verfasst und keine anderen als die angegebenen
Quellen und Hilfsmittel verwendet habe. Alle Stellen, die wörtlich oder inhaltlich
den angegebenen Quellen entnommen wurden, sind als solche kenntlich gemacht.
Die vorliegende Arbeit wurde bisher in gleicher oder ähnlicher Form noch nicht als
Magister–/Master–/Diplomarbeit/Dissertation eingereicht.
Innsbruck,
Oktober 2013
Stefan Zugal
Contents
1 Introduction  1

2 Research Methodology  5

3 Background  9
  3.1 Declarative Business Process Models  9
    3.1.1 Characteristics of Declarative Business Process Models  9
    3.1.2 Semantics of Declarative Sub–Processes  12
    3.1.3 Enhanced Expressiveness  15
    3.1.4 Impact on Adaptation  16
  3.2 Cognitive Psychology  17
    3.2.1 Search  17
    3.2.2 Recognition  18
    3.2.3 Inference  19
  3.3 Cheetah Experimental Platform  25

4 Test Driven Modeling  29
  4.1 Introduction  30
  4.2 Terminology  31
  4.3 Example  32
  4.4 Understandability and Maintainability of Declarative Process Models  33
    4.4.1 Understandability  34
    4.4.2 Maintainability  37
  4.5 Testing Framework for Declarative Processes  38
    4.5.1 Software Testing Techniques  38
    4.5.2 Process Testing Framework Concepts  39
    4.5.3 Test Driven Modeling and the Declarative Process Life–Cycle  45
    4.5.4 Limitations  52
  4.6 Test Driven Modeling Suite  53
    4.6.1 Software Components  53
    4.6.2 Support for Empirical Evaluation, Execution and Verification  57
    4.6.3 Example  58
  4.7 The Influence of TDM on Model Creation: A Case Study  61
    4.7.1 Definition and Planning of the Case Study  62
    4.7.2 Performing the Case Study  66
    4.7.3 Limitations and Discussion  75
  4.8 The Influence of TDM on Model Maintenance: Experiments  77
    4.8.1 Experimental Definition and Planning  78
    4.8.2 Performing the Experiment (E1)  83
    4.8.3 Performing the Replication (R1)  89
  4.9 Limitations  95
  4.10 Related Work  96
  4.11 Summary  99

5 The Impact of Modularization on Understandability  101
  5.1 Introduction  102
  5.2 Existing Empirical Research into Modularizing Conceptual Models  103
    5.2.1 Planning the Systematic Literature Review  104
    5.2.2 Performing the Systematic Literature Review  105
    5.2.3 Reporting the Systematic Literature Review  106
  5.3 A Framework for Assessing Understandability  108
    5.3.1 Antagonists of Understanding: Abstraction and Fragmentation  108
    5.3.2 Toward a Cognitive Framework for Assessing Understandability  112
    5.3.3 Limitations  114
  5.4 Evaluation Part I: BPMN  114
    5.4.1 Experimental Definition and Planning  115
    5.4.2 Performing the Experiment (E2)  121
    5.4.3 Performing the Replication (R2)  138
  5.5 Evaluation Part II: Declare  155
    5.5.1 Experimental Definition and Planning  155
    5.5.2 Performing the Experiment (E3)  162
  5.6 Limitations  178
  5.7 Discussion  179
  5.8 Related Work  182
  5.9 Summary  184

6 Summary  187

Appendices

A Tests for Normal Distribution  189
  A.1 Experiment E1  189
  A.2 Replication R1  190
  A.3 Experiment E2  190
  A.4 Replication R2  199
  A.5 Experiment E3  208

B Supplementary Information  209
  B.1 Process of Thesis Writing  209
  B.2 Publications  211

Abbreviations  213

Bibliography  215
Chapter 1
Introduction
For decades, conceptual models have been used to facilitate the development of
information systems [207, 266] and to support practitioners in the analysis of complex business domains [26]. For instance, strategists create mind maps [109], supply
chain managers create decision models [181] and system analysts create conceptual
models [48] (cf. [26]). In this sense, over the years numerous conceptual modeling
languages and associated modeling tools have been proposed [160], as also discussed
in [226]: “Even though hundreds of modelling methods are in existence today, practitioners and researchers are zealously ’producing’ new modelling methods”. In more
recent years, business process models, or process models for short, have gained particular attention [48, 207]. The creation of business process models—depicting the
way organizations conduct current or future business processes—is a fundamental
prerequisite for organizations to engage in Business Process Management (BPM)
initiatives [110]. Therefore, it is not surprising that process modeling was found to
be one of the highest ranked reasons for engaging in conceptual modeling [48].
In general, business process models are used, for instance, to support the development of process–aware information systems [60], inter–organizational workflows [293], service–oriented architectures [65] and web services [148]. Typically,
business process models employ graphical notations to capture which activities, events
and states constitute the underlying business process. In this sense, a business
process can be defined as a set of connected activities that collectively realize a
certain business goal [205, 281]. Business processes can be found in almost every entrepreneurial context, ranging from insurance claim handling and the refunding of travel expenses to healthcare applications. Similar to other conceptual models, process
models are first and foremost required to be intuitive and easily understandable, especially in project phases that are concerned with requirements documentation and
communication [252]. Also, business process vendors and practitioners ranked the
usage of process models for understanding business processes as a core benefit [110].
Knowing that many companies design and maintain several thousand process models, often also involving non–expert modelers [214], it seems of central importance
that respective models are easy to understand and of appropriate quality. Still, it
was observed that these large model collections exhibit serious quality issues, apparently negatively influencing their understandability [273]. For instance, error rates
between 10% and 20% were found in industrial model collections [141].
These problems regarding the quality of process models resulted in active research
into potential reasons and countermeasures. For instance, in [140] syntactic errors,
such as deadlocks, were analyzed in industrial process model collections, aiming to
establish a connection between structural model properties and error rates. Other
researchers investigated the consistency of activity labels [144] and issues related to
secondary notation, e.g., layout, of process models [221]. Also, the visual design of
modeling languages [151] was discussed and modeling languages were screened for
ontological deficiencies [202]. Recently, a new stream of research emerged, shifting
the emphasis from the process model toward the act of creating a process model,
referred to as the process of process modeling (PPM). Thereby, researchers identified
varying strategies that are adopted during the creation of a model [185, 186] and
linked particular strategies to the quality of the resulting process model [31]. Moreover, various techniques for visualizing the PPM [30, 192], analysis techniques using
eye movement analysis [183] and theoretical considerations were discussed [233]. So
far, investigations regarding the PPM were focused on a subset of the Business
Process Model and Notation (BPMN) [167], however, recently also other modeling
notations, such as change patterns [274], were considered [268, 269].
In this thesis, we1 contribute to these streams of research and aim to improve business process modeling languages and methods by widening our perspective
toward the field of cognitive psychology. In particular, we think that the adoption
of concepts from cognitive psychology fosters the systematic improvement of business process modeling languages and respective methods. In this vein, the central
research question of this thesis can be formulated, as follows:
How can we apply concepts from cognitive psychology for systematically
improving business process modeling languages and methods?
Apparently, this research question is of rather general nature. Therefore, we selected two particular problems for which we expected that the adoption of cognitive psychology would be beneficial. More specifically, as illustrated in Figure 1.1, first, promising concepts from cognitive psychology are put into the context of BPM (cf. Chapter 3). Then, we shift our focus to the creation, understanding and maintenance of declarative business process models, as shown in the upper branch (cf. Chapter 4). Subsequently, we turn toward the link between the modularization and the understandability of a process model, as shown in the lower branch (cf. Chapter 5). In each of these branches, based upon concepts from cognitive psychology, theories and methods are developed for solving the respective problem at hand. Then, to demonstrate the feasibility, operational support for developed theories and methods is provided through the implementation of respective software tools. Finally, the implemented tools are employed in empirical studies to validate the theories. At this point, we also would like to mention that large parts of this thesis were already published by the author, however, in a more condensed form. In particular, concepts from cognitive psychology were described in [298, 299], contributions regarding declarative business process models were published in [296, 300–302] and the link between a process model's modularization and its understandability was published in [297, 303, 304].

Footnote 1: The contributions of this thesis can be clearly attributed to the author. At the same time, the author is indebted to continuous feedback, suggestions and discussions that helped to guide this work (cf. acknowledgements and Appendix B.2). To express gratitude for this support, we instead of I is used for the remainder of the thesis.
[Figure 1.1: Overview of thesis. Three columns (Theory, Tools, Empirical Research) show how Cognitive Psychology (Chapter 3) feeds two branches: the branch on creation, understanding and maintenance (Chapter 4) comprises Test Driven Modeling, the Test Driven Modeling Suite, and the empirical studies Experiment E1 (Innsbruck, 2010), Case Study (Innsbruck, 2011) and Replication R1 (Ulm, 2011); the branch on understanding (Chapter 5) comprises the Understandability Framework, the Hierarchy Explorer, and Experiment E2 (Eindhoven, 2012), Replication R2 (Online, 2012) and Experiment E3 (Ulm/Ibk, 2012).]
The remainder of this thesis is structured as follows. Chapter 2 describes the employed research methodology. Then, Chapter 3 introduces background information
regarding declarative process models, cognitive psychology and tools for empirical
research. Chapter 4 describes the adoption of cognitive psychology for improving
the creation, understanding and maintenance of declarative business process models.
Chapter 5, in turn, describes the application of concepts from cognitive psychology
for investigating the link between a process model’s modularization and its understandability. Finally, Chapter 6 concludes the thesis with a summary and an
outlook.
Chapter 2
Research Methodology
In this chapter, we focus on the research methodology applied in this thesis. As
discussed in Chapter 1, the main contribution of this thesis targets the creation,
understanding and maintenance of business process models. In particular, building
upon concepts from cognitive psychology, as introduced in Chapter 3, Chapter 4
addresses the creation, understanding and maintenance of declarative business process models. Then, Chapter 5 advances research into the interplay between the
modularization and understanding of imperative and declarative business process
models. In the following, we describe the generic research methodology we applied.
Generally, this research can be attributed to the design science paradigm, “which
seeks to create innovations that define the ideas, practices, technical capabilities and
products through which the analysis, design, implementation, management and use
of information systems can be effectively and efficiently accomplished” [100]. More
specifically, in this research we follow the Design Science Research Methodology
(DSRM) approach [173], which includes the following activities:
(1) Problem identification and motivation
(2) Define the objectives for a solution
(3) Design and development
(4) Demonstration
(5) Evaluation
(6) Communication
Besides guiding the conducted research, the structure of Chapter 4 and Chapter 5 is aligned with the DSRM approach. In particular, as it is known
that solutions to irrelevant problems will not be used, problem relevance is of central importance [199, 283]. Hence, both chapters start with a general discussion of
the problem and the definition of objectives for a solution. Subsequently, artifacts
solving the discussed problems will be designed, developed, demonstrated and evaluated, focusing on research rigor [100]. The final step, communication, focuses on the publication of results, i.e., it is achieved through this document. By adopting the
DSRM approach we are following the analytic paradigm and a deductive model by
“proposing a set of axioms, developing a theory, deriving results and, if possible,
verifying the results with empirical observations” [13].
In this thesis, we put a particular emphasis on the empirical evaluation of the
proposed artifacts, i.e., activity evaluation (5).1 Hence, in the following we briefly
discuss the character of the empirical research conducted in this work. As defined
in DSRM, the evaluation aims to measure “how well the artifact supports a solution
to the problem" [173]. In this sense, the experiments conducted in this research are designed to test a hypothesis rather than to explore a new domain, cf. [13]. According to [13], empirical studies can be of descriptive nature, i.e., such studies identify patterns in the data, but do not examine relationships between variables. If the
variation of dependent variable(s) can be attributed to the variation of independent
variable(s), the study is called correlational. Finally, if the treatment variable(s)
is the only plausible cause of variation in the dependent variable(s), a cause–effect
study is conducted. In this research, we focus mostly on cause–effect studies, in particular the execution of controlled experiments [61]. Thus, the empirical research will
be mainly conducted in vitro, i.e., in the laboratory under controlled conditions [13].
For quality assurance, we make use of checklists for the definition, execution and
analysis of experiments, as far as applicable [284, 285]. For pragmatic reasons, in particular limitations with respect to budget and personnel resources, experiments will mostly be conducted with students rather than with experts. As argued in [61],
recruiting students is a common practice for obtaining a large group of subjects.
However, researchers are then required to critically analyze the external validity of
the experiments, i.e., to what extent the obtained results can be generalized. Regarding
the empirical investigations conducted in this thesis, we generally follow a mixed–
method approach [39], i.e., we combine qualitative and quantitative research methods
in order to achieve method triangulation [112], as detailed in the following.
Footnote 1:
According to [282], empirical research can play the role of validation or evaluation. Validation
refers to artifacts that were not transferred to practice yet, while evaluation refers to the performance of an artifact after it was transferred to practice. In this work, we focus on validation.
However, DSRM defines evaluation more generally as to “observe and measure how well the
artifact supports a solution to the problem” [173]. To avoid confusion, we adopt the definition
of DSRM, but refer to validation in the sense of [282].
Qualitative Methods
The roots of qualitative research methods can be found in educational research
and other social sciences, where qualitative methods were introduced to study the
complexities of humans, e.g., motivation, communication or understanding [239].
In this sense, qualitative methods produce qualitative data that is rich in detail,
mostly represented as text and pictures, not numbers [86]. This, in turn, allows the
researcher to delve into the complexity of the problem, rather than abstracting it
away. Likewise, results obtained through qualitative methods are richer in detail
and more informative [222]. In this way, qualitative methods help to answer the
questions of why and how certain phenomena occur. It is important to note that, even
though qualitative data cannot be captured using numbers, qualitative data is not
necessarily subjective. Rather, objectivity or subjectivity of data are orthogonal
aspects [222]. In this research, the application of think–aloud techniques [64] and
concepts from grounded theory [35] can be attributed to the adoption of qualitative
methods. Therein, the basic idea is to ask participants to think out loud while
performing a task. This, in turn, allows the researcher to attain a unique view of
the problem solving process [228].
Quantitative Methods
While qualitative research methods focus on the question of how phenomena occur,
quantitative research methods are mainly concerned with quantifying relationships or
comparing two or more groups [39]. In this sense, quantitative research methods are
appropriate when testing the effect of some manipulation or activity, as quantitative
data allows for comparisons and statistical methods [286]. Just like qualitative data
is not necessarily subjective, quantitative data is not objective by default, rather
objectivity has to be seen as an orthogonal factor [222]. It is important to note that
qualitative and quantitative methods do not exist in isolation, but can be connected.
For instance, by coding think–aloud protocols, qualitative information in the form
of transcripts can be transformed to quantitative data, e.g., the number of codes
occurring in the protocol.
Method Triangulation
So far, we have discussed the basic nature of qualitative and quantitative research
methods. In this research, we combine both research methods, thereby adopting a
mixed–method approach [39]. The combination of research methods has a distinct
tradition in the literature on social sciences and was described as method triangulation [267]. The triangulation metaphor actually originates from the domain
of navigation: given multiple viewpoints, the exact position can be determined at
greater accuracy. Likewise, by adopting multiple research methods, phenomena can
be investigated from different perspectives, allowing researchers to improve the accuracy of their results [112]. The basic idea is thereby to compensate for the weaknesses of an approach with the strengths of a complementary approach [61]. For instance, quantitative methods allow for investigating cause–effect relations, but are not suited for explaining the underlying reasons, i.e., they neglect the question of how phenomena occur. To compensate for this shortcoming, qualitative approaches can help by providing
the necessary data. More figuratively, the adoption of qualitative research methods
has also been described to “function as the glue that cements the interpretation of
multi method results” [112].
Chapter 3
Background
In this chapter, we introduce the background necessary for understanding the
remainder of this thesis. In particular, Section 3.1 introduces declarative business
process models, Section 3.2 discusses concepts from cognitive psychology in the light
of business process modeling, whereas Section 3.3 describes Cheetah Experimental
Platform.
3.1 Declarative Business Process Models
The assessment and improvement of declarative process models is one of the major contributions of this thesis. In this section, we aim for building a general understanding of declarative process models by providing the necessary background
information. Particularly, Section 3.1.1 provides a general introduction to declarative business process models, whereas Section 3.1.2 discusses the semantics of sub–
processes in declarative process models. Finally, Section 3.1.3 and Section 3.1.4
discuss peculiarities of modularized declarative process models.
3.1.1 Characteristics of Declarative Business Process Models
There has been a long tradition of modeling business processes in an imperative way.
Process modeling languages supporting this paradigm, like BPMN [167], EPC [218]
and UML Activity Diagrams [166], are widely used. Recently, declarative approaches
have received increasing interest and suggest a fundamentally different way of describing business processes [175]. While imperative models specify exactly how
things have to be done, declarative approaches rather focus on what to achieve.
In other words, instead of explicitly describing the control flow to be followed when executing a process instance, e.g., as done in BPMN, declarative approaches rather
focus on the conditions to be achieved when executing a process instance. To discuss
the characteristics of declarative process modeling notations, we will particularly focus on the declarative process modeling language Declare [175] in the remainder of
this thesis.1 Likewise, unless explicitly indicated otherwise, we use declarative process model synonymously with a Declare–based declarative process model.
The declarative process modeling notation Declare focuses on the logic that governs the interplay of actions in the process by describing the activities that can
be performed as well as constraints prohibiting undesired behavior. Thereby, constraints can be classified along two dimensions. First, constraints may be classified
as existence constraints, relation constraints or negation constraints [253]. Existence
constraints specify how often an activity must be executed for a particular process
instance. Relation constraints, in turn, restrict the relation between activities. Finally, negation constraints define negated relations between activities, i.e., can be
seen as negated relation constraints. Second, and orthogonally, constraints can be
classified as execution constraints and completion constraints (also referred to as termination constraints, cf. [304]). Execution constraints, on the one hand, restrict the
execution of activities, e.g., an activity can be executed at most once. Completion
constraints, on the other hand, affect the completion of process instances and specify
when process completion is possible. For instance, an activity must be executed at
least once before the process can be completed. Here, it is worthwhile to note that
declarative process instances must be completed explicitly, i.e., the end user must
decide when a process instance, for which all completion constraints are satisfied,
should be completed [205]. Most constraints focus either on execution or completion semantics, however, some constraints also combine execution and completion
semantics (e.g., the cardinality constraint [175], cf. Table 3.1). To give an overview
of typical constraints, Table 3.1 shows examples for each category. An overview of
all constraints defined in Declare can be found in [253].
For each constraint, the listing indicates whether it affects the execution of activities, the completion of process instances, or both.

Existence Constraints
- cardinality(a,m,n): a must occur at least m times and at most n times (execution and completion); see Footnote 2.
- init(a): a must be the first activity executed in every process instance (execution).
- last(a): a must be the last activity executed in every process instance (completion).

Relation Constraints
- precedence(a,b): b must be preceded by a (not necessarily directly preceded) (execution).
- response(a,b): if a is executed, b must be executed afterwards (not necessarily directly afterwards) (completion).
- succession(a,b): combines precedence(a,b) and response(a,b) (execution and completion).
- chain response(a,b): if a is executed, b must be executed directly afterwards (execution and completion).
- chain precedence(a,b): before each execution of b, a must be executed directly before (execution).
- chain succession(a,b): combines chain precedence(a,b) and chain response(a,b) (execution and completion).
- coexistence(a,b): if a is executed, b must be executed and vice versa (completion).

Negation Constraints
- neg response(a,b): if a is executed, b must not be executed afterwards (execution).
- neg coexistence(a,b): a and b cannot co-occur in any process instance (execution).

Table 3.1: Definition of constraints

Footnote 1: Declare was formerly known as ConDec, see: http://www.win.tue.nl/declare/2011/11/declare-renaming
Footnote 2: In [175], the cardinality constraint is approached by three constraints: existence, i.e., an activity must be executed at least n times, exactly, i.e., an activity must be executed exactly n times, and absence, i.e., an activity must be executed at most n − 1 times. In this thesis, for improving readability, we merged these three constraints into the cardinality constraint.
To illustrate the concept of declarative processes, a process model (PMM) specified in Declare [175] is shown in Figure 3.1a. It contains activities A to F as well as constraints C1 and C2. C1 prescribes that A must be executed at least once (i.e., C1 restricts the completion of process instances). C2 specifies that E can only be executed if C has been executed at some point in time before (i.e., C2 imposes restrictions on the execution of activity E). In Figure 3.1b an example of a process instance (PIM) illustrates the semantics of PMM. Therein, we make use of events to describe relevant changes during process execution, e.g., instantiation of the process instance or the start and completion of activities. After process instantiation (event e1), activities A, B, C, D and F can be executed. E, however, cannot be executed as C2 specifies that C must have been executed before (cf. grey bar below “E”). Furthermore, the process instance cannot be completed as C1 is not satisfied, i.e., A has not been executed at least once (cf. grey area below “Completion”). The subsequent execution of B (in e2 B is started, in e3 B is completed) does not cause any changes as B is not involved in any constraint. However, after A is executed (e4, e5), C1 is satisfied, i.e., A has been executed at least once and thus PIM can be completed—after e5 the box below “Completion” is white. Then, C is executed (e6, e7), satisfying C2 and consequently allowing E to be executed. Finally, the execution of E (e8, e9) does not affect any constraint, thus no changes with respect to constraint satisfaction can be observed. As all completion constraints are satisfied, PIM can be completed. Please note that declarative process instances have to be completed explicitly, i.e., the end user must decide when to complete the process instance (e10). Completion constraints thereby specify when completion is allowed, i.e., PIM could have been completed at any point in time after e5.
As illustrated in Figure 3.1b, a process instance can be specified through a list of events. In the following, we will denote this list as execution trace, e.g., for PIM: <e1, e2, e3, ..., e10>. Considering process model PMM from Figure 3.1a, execution traces <A>, <C, A, E> and <B, A> are considered valid, because they satisfy constraints C1 and C2. Execution trace <B, C>, in turn, is considered invalid, as it violates constraint C1 (A is not executed). Likewise, execution trace <A, E, B> is invalid, since it violates constraint C2 (E is executed without prior execution of C).
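To make the notion of trace validity concrete, the following minimal Python sketch (an illustration added here, not part of the thesis or its tooling; the trace representation and function names are assumptions) expresses C1 and C2 as predicates over completed execution traces and checks the example traces discussed above.

```python
# Illustrative sketch: the constraints of PMM as predicates over completed traces.
# A trace is represented as a list of executed activity labels, e.g., ["C", "A", "E"].

def satisfies_c1(trace):
    """C1: A must be executed at least once (completion constraint)."""
    return trace.count("A") >= 1

def satisfies_c2(trace):
    """C2 (precedence): E may only be executed if C was executed before."""
    seen_c = False
    for activity in trace:
        if activity == "C":
            seen_c = True
        elif activity == "E" and not seen_c:
            return False
    return True

def is_valid(trace):
    """A completed trace is valid if it satisfies all constraints of PMM."""
    return satisfies_c1(trace) and satisfies_c2(trace)

if __name__ == "__main__":
    for trace in (["A"], ["C", "A", "E"], ["B", "A"],   # valid traces
                  ["B", "C"], ["A", "E", "B"]):         # invalid traces
        print(trace, "valid" if is_valid(trace) else "invalid")
```

Running the sketch reproduces the classification given above: the first three traces are valid, while the last two violate C1 and C2, respectively.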
3.1.2 Semantics of Declarative Sub–Processes
So far, we introduced declarative process models in general and Declare in particular.
This section, in turn, aims for establishing an understanding of the semantics of
sub–processes in a declarative model. In general, a sub–process is introduced in a
process model via a complex activity, which refers to a process model. When the
complex activity is executed, the referred process model, i.e., the sub–process, is
instantiated. Thereby, sub–processes are viewed as separate process instances, i.e.,
when a complex activity is started, a new instance of the sub–process to which the complex activity refers is created (cf. [167, 177]). The parent process, however, has no
information about the internals of the sub–process, i.e., the sub–process is executed in isolation. In this sense, according to [51, 197, 198] we view sub–processes from an integrated perspective, i.e., the sub–process is seen as a black box. Interaction with the parent process is only done via the sub–process' life–cycle.3 Thereby, the life–cycle state of the complex activity reflects the state of the sub–process [177], e.g., when the sub–process is in state completed, the complex activity must be in state completed as well.

[Figure 3.1: Executing a declarative process model, adapted from [304]. (a) Process model PMM with activities A to F, constraint C1 (A must be executed at least once) and constraint C2 (C must be executed before E can be executed). (b) Process instance PIM, showing for each event on the timeline which activities are enabled and whether the instance may be completed. Execution trace of PIM: <PIM started, B started, B completed, A started, A completed, C started, C completed, E started, E completed, PIM completed>.]
Considering this, it is essential that sub–processes are executed in isolation, as
isolation forbids that constraints can be specified between activities included in
different (sub–)processes. In other words, in a hierarchical declarative process model
with several layers of hierarchy, the constraints of a process model can neither directly
influence the control flow of any parent process, nor directly influence the control
flow of any sub–process on the same layer or a layer below. Please note that control
Footnote 3: We do not take into account the exchange of input and output data here, as we focus on control
flow behavior only.
flow may still be indirectly influenced by restricting the execution of a sub–process,
thereby restricting the execution of the activities contained therein.
To illustrate these concepts, consider the modularized process model PMM in Figure 3.2a. It consists of activity A, which can be executed arbitrarily often. Activity B, in turn, can be executed at most three times (cf. constraint C1). B refers to process model PMB, which contains activities C and D. Here, constraint C2 prescribes that C must be executed at least once whenever process model PMB is executed. Furthermore, C and D are connected by the precedence constraint C3, i.e., D can only be executed if C was executed before. Figure 3.2b shows an example of an execution of PMM. On the left, a timeline lists all events that occur during process execution. To the right, the enablement of activities and whether a process instance can be completed is illustrated. Whenever the area below an activity or process instance is colored white, it indicates that this activity is currently enabled or the process instance can be completed, respectively. The timeline is to be interpreted in the following way: by instantiating PMM (e1), activities A and B become enabled, as no constraints restrict their execution. C and D cannot be executed, as they are confined in PMB and no instance of PMB is running yet. The subsequent execution of A (e2, e3) does not change activity enablement, as A is not involved in any constraint. Then, the start of B (e4) causes the instantiation of PMB (PIB, e5). Hence, C becomes enabled, as it can be executed within PIB. Still, D is not enabled yet, as constraint C3 is not satisfied. After C is executed (e6, e7), the precedence constraint is satisfied, therefore also D becomes enabled. In addition, constraint C2 is satisfied, allowing process instance PIB to complete. After the execution of D (e8, e9), the user decides to complete PIB (e10), causing C and D to no longer be executable and triggering the completion of B (e11). Still, A and B are enabled as they can be executed within process instance PIM. Finally, after PIM is completed by the end user through explicit completion (e12), no activity is enabled anymore.
As described in Section 3.1.1, the completion of a declarative process instance is
restricted by the use of completion constraints. In particular, a process instance is
allowed to be completed by the end user if all completion constraints are satisfied.
This, in turn, implies that a process instance may be completed without executing
any activity, if no completion constraints are present. In the context of process
model PMM in Figure 3.2, execution trace <A, B> is valid, since PMB does not define completion constraints. At this point we also would like to emphasize that activities are always executed within the context of the respective (sub–)process. In this sense, execution trace <A, C> is invalid: C can only be executed if there is an instance of PMB currently being executed.
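The isolation of sub–processes can also be illustrated with a small Python sketch (again an illustration only, not part of the thesis; the activity-to-model mapping and function names are assumptions): each activity may only appear in traces of the (sub–)process that defines it, which is why <A, B> is a valid top-level trace of PMM while <A, C> is not.

```python
# Illustrative sketch: activities are confined to the (sub-)process that defines them.
# Constraints C1-C3 are ignored here; only the isolation aspect is checked.

ACTIVITIES = {
    "PMM": {"A", "B"},   # B is the complex activity referring to PMB
    "PMB": {"C", "D"},
}

def executable_in(trace, process_model):
    """Check that every activity of the trace is defined in the given process model."""
    return all(activity in ACTIVITIES[process_model] for activity in trace)

if __name__ == "__main__":
    print(executable_in(["A", "B"], "PMM"))   # True: valid top-level trace of PMM
    print(executable_in(["A", "C"], "PMM"))   # False: C is confined to PMB
    print(executable_in(["C", "D"], "PMB"))   # True: C and D run within an instance of PMB
```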
[Figure 3.2: Executing a hierarchical declarative process model, adapted from [304]. (a) Process model PMM with activity A and complex activity B (executable at most three times, constraint C1), which refers to process model PMB containing activities C (at least one execution per instance of PMB, constraint C2) and D (preceded by C, constraint C3). (b) Execution of PMM, showing activity enablement and completion along the timeline. Execution trace of PIM: <PIM started, A started, A completed, B started, B completed, PIM completed>. Execution trace of PIB: <PIB started, C started, C completed, D started, D completed, PIB completed>.]
3.1.3 Enhanced Expressiveness
For imperative process models, hierarchical decomposition is viewed as a structural
measure that may impact model understandability [297], but does not influence semantics. In declarative process models, however, hierarchy also has implications for
semantics. More precisely, hierarchy enhances the expressiveness of a declarative
modeling language. The key observation is that by specifying constraints that refer
to complex activities, it is possible to restrict the life–cycle of a sub–process. A
constraint that refers to a complex activity thereby not only influences the complex
activity, but also all activities contained therein. This, in turn, allows for the specification of constraints that apply in a certain context only. Consider for instance
activity C in Figure 3.2. Within process model PMB, C is mandatory, i.e., for any instance of PMB, C must be executed. In the context of process model
PMM, however, C is optional, as it is referred to through complex activity B, which is itself optional—thus also making the execution of C optional.
To illustrate enhanced expressiveness, consider models PMM and PMC in Figure 3.3, which solely use constraints described in Table 3.1. The chain precedence constraint between C and D specifies that for each execution of D, C and therefore PMC has to be executed directly before. When executing PMC, in turn, A has to be executed exactly once and B has to be executed exactly twice (in any order). Hence, the constraint between C and D actually refers to a set of activities, i.e., A and B. For each execution of D, A has to be executed exactly once and B has to be executed exactly twice. In other words, constraints on A and B are only valid in the context of PMC. Such behavior cannot be modeled without hierarchy, using the same set of constraints.
[Figure 3.3: Example of enhanced expressiveness, adapted from [304]. Process model PMM contains activity D and complex activity C, connected by a chain precedence constraint (for each execution of D, C must be executed directly before); the referenced process model PMC contains activity A (must be executed exactly once) and activity B (must be executed exactly twice).]
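The context-dependent behavior described for Figure 3.3 can also be sketched in code. The following Python fragment (an illustration only, not part of the thesis; the nested trace representation is an assumption made for this example) represents each execution of the complex activity C as a completed instance of PMC and checks that every execution of D is directly preceded by such an instance in which A occurred exactly once and B exactly twice.

```python
# Illustrative sketch: hierarchical traces of PMM from Figure 3.3. A top-level trace is
# a list whose elements are either "D" or a pair ("C", inner_trace), where inner_trace
# is the trace of the PMC instance executed by the complex activity C.

def valid_pmc_instance(inner_trace):
    """PMC: A must occur exactly once and B exactly twice (in any order)."""
    return (inner_trace.count("A") == 1 and inner_trace.count("B") == 2
            and all(activity in ("A", "B") for activity in inner_trace))

def valid_pmm_trace(trace):
    """Chain precedence: each execution of D is directly preceded by an execution of C."""
    for position, element in enumerate(trace):
        if element == "D":
            if position == 0 or not isinstance(trace[position - 1], tuple):
                return False
        elif not valid_pmc_instance(element[1]):
            return False
    return True

if __name__ == "__main__":
    print(valid_pmm_trace([("C", ["A", "B", "B"]), "D"]))   # True
    print(valid_pmm_trace(["D"]))                           # False: no C directly before D
    print(valid_pmm_trace([("C", ["A", "B"]), "D"]))        # False: B occurs only once in PMC
```

In the sketch, the counting of A and B is scoped to the instance of PMC that directly precedes D, mirroring the argument that this behavior cannot be expressed without hierarchy using the same set of constraints.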
3.1.4 Impact on Adaptation
Constructing hierarchical models supports top–down analysis, i.e., creating the top–
level model first and further refining complex activities thereafter. While this seems
like a natural way of dealing with complexity, in some cases, it is desirable to transform a flat model to a hierarchical one. In the following, we will argue why refactoring [273], i.e., changing hierarchical structures in a control–flow preserving way,
is only possible under certain conditions for declarative process models. Refactoring requires that any hierarchical model can be translated into a model without
hierarchy, but the same control flow behavior (and vice versa). As discussed, expressiveness is enhanced by hierarchy. In other words, there exists control flow
behavior that can be expressed in a hierarchical model, but not in a model without
hierarchy—cf. Figure 3.3 for an example. Hence, hierarchical models that make use
of the enhanced expressiveness cannot be expressed as a non–modularized model,
i.e., cannot be refactored.
3.2 Cognitive Psychology
So far, we have introduced declarative business process models as well as discussed
their semantics and the usage of sub–processes. In this section, we focus on cognitive
psychology, which constitutes the next basic building block of this thesis. Cognitive
psychology has been investigating internal mental processes, e.g., how humans perceive, memorize, or make decisions, since the 1950s [22]. In the
following, we will introduce basic concepts from cognitive psychology and put them
in the context of business process modeling. In particular, Section 3.2.1 introduces
the concept of search, Section 3.2.2 discusses recognition and, finally, Section 3.2.3
is concerned with inference.
3.2.1 Search
A vast body of research in the area of visual search has been conducted in the last
decades, cf. [287]. Basically, visual search, or search for short, deals with the identification of a target item among a set of distractor items [287]. The target item
thereby refers to the item to be searched for, whereas distractor items impede the
identification of the target item. In the context of process models, search refers to
the task of identifying a single modeling element, e.g., an activity or a start event
in a process model. As indicated in [128], multiple attributes of the same element
can help to improve the search process. Accordingly, for instance, a blue rectangle
(information: blue and rectangle) can be identified quicker than a rectangle (information: rectangle). A systematic investigation into the impact of visual properties
like size, brightness, or color, can be found in [151].
For this work, however, it is sufficient to know that the human perceptual system
is capable of efficiently locating a target item among a set of distractor items. An
example illustrating the impact of the visual representation on search can be found in
Figure 3.4. Model A in Figure 3.4a and Model B in Figure 3.4b are identical except
for the coloring of activity F. Hence, the models can be assumed to be information
equivalent [128], i.e., all information relevant for the business process can be obtained from Model A as well as from Model B. However, with respect to search, the models are not computationally equivalent [128], i.e., search is not equally
efficient in both models. In particular, when searching for activity F in Model B,
an additional visual cue, namely the color grey, is available for search, allowing for
quicker identification. In practice, such highlighting may be used to help the reader
to identify which activities are assigned to a certain role.
[Figure 3.4: Impact of visual notation on search, adapted from [299]. Model A and Model B contain the same activities A to H; in Model B, activity F is additionally highlighted in grey.]
3.2.2 Recognition
Search, as introduced, allows for the identification of a single modeling element in
isolation. The human perceptual system, however, also provides another mechanism
for identifying higher–level, i.e., more complex, objects. In particular, the identification of patterns plays a central role and is referred to as the process of recognition [128]. During recognition, two aspects were identified as primary influence
factors: First, the human ability to recognize information is highly sensitive to the
exact representation. Second, the perceptual system has to be trained specifically
for the recognition of patterns.
Representation of Information
The recognition of patterns highly depends on the exact form in which information is represented [128]. More specifically, information may be represented explicitly, enabling
solutions to be more readily “read–off” [217]. In contrast, information can also be
represented implicitly, requiring the reader to investigate the model stepwise, as
direct perceptual recognition of patterns is not possible anymore. Thereby, explicit
and implicit are not dichotomous—rather, information can be classified along a spectrum of explicitness/implicitness. For recognition, information must be available in
a highly explicit form. To illustrate the concept of recognition, consider Model A
in Figure 3.5a and Model B in Figure 3.5b. The process models are information
equivalent and differ only with respect to the layout of sequence flows. Apparently,
for Model A sequence flows are laid out in a clear and easily readable way, while
process Model B exhibits edge crossings which obscure reading and thereby make
information less explicit. If an experienced process modeler is asked to determine
whether activity B and activity C are mutually exclusive in Model A, a quick look at
the process model will most likely be sufficient for answering this question, as the
pattern “B and C directly connected to XOR split” is directly recognizable. If the
question was asked for process model B, even an experienced process modeler would
have to trace the sequence flows to find out that activities B and C are actually
mutually exclusive. Hence, the pattern is not explicit enough and thus not accessible
to recognition.
[Figure 3.5: Recognition of mutual exclusion pattern, adapted from [299]. Model A and Model B are information equivalent and contain activities A, B and C connected via an XOR split; in Model A the sequence flows are laid out clearly, whereas Model B exhibits edge crossings.]
Schemata for Recognition
Besides the explicit representation of information, recognition depends on whether
the perceptual system has been trained properly, i.e., appropriate schemata [115,
168, 236, 238]—also called productions [128]—are available for the proper identification of patterns. Put differently, a novice who has not acquired appropriate
schemata yet, cannot rely on recognition, while experts can fall back on a variety
of schemata that were acquired throughout the years by working with process models. For instance, an experienced modeler will immediately recognize that activities
B and C of Model A in Figure 3.5 are mutually exclusive, because they are directly connected to an XOR gateway. A novice modeler, in turn, due to the lack of suitable schemata, has to analyze the model in depth and make use of inference, as detailed
in the following.
3.2.3 Inference
So far, we have introduced search and recognition—mechanisms that allow for extracting rather simply structured, local information. Most process models, however, go well beyond the complexity that can be handled by search and recognition. Here,
the human brain as “truly generic problem solver” [248] comes into play. A central
component of the human cognitive system is working memory, which represents a
construct that can maintain and manipulate a limited amount of information for
goal directed behavior [8, 37]. Basically, working memory can be conceptualized as
the activated part of long–term memory [249]. Long–term memory, unlike working
memory, has a theoretically unlimited capacity and stores the knowledge base of a
person, e.g., knowledge about facts, events, rules and procedures, over a long period
of time. It is important to emphasize that the functioning of working memory and
long–term memory is tightly interwoven. For instance, when reading a text, knowledge about speech is a prerequisite for the understanding of the text. In this context,
knowledge about speech is stored in long–term memory, while the actual processing
of the textual information takes place in working memory. Regarding the performance of such tasks, strong empirical evidence was found that working memory
plays an essential role for the outcome, e.g., for language comprehension [114], logic
learning [125], fluid intelligence [34] and the integration of preexisting domain knowledge [98]. However, it is assumed that the capacity of working memory is severely
limited [7]: studies report a capacity of 4 items [21, 37] up to 7±2 items [8, 147].
In addition, information held in the working memory decays after 18–30 seconds if
not rehearsed [248]. To illustrate how severe these limitations are, consider the sequence A–G–K–O–M–L–J. The average human mind is just capable of keeping this
sequence in working memory, and after 18–30 seconds the sequence will be forgotten. Thereby, the amount of working memory currently used is referred to as mental
effort. To measure mental effort, various techniques, such as rating scales, pupillary
responses or heart–rate variability are available [168]. Especially the usage of rating
scales, i.e., to self–rate mental effort, was shown to reliably measure mental effort
and is thus widely adopted [90, 168]. Furthermore, this kind of measurement can
easily be applied, e.g., by using 7–point rating scales. For instance, in [135] mental
effort was assessed using a 7–point rating scale, ranging from (1) very easy to (7)
very hard for the question “How difficult was it for you to learn about lightning from
the presentation you just saw?”.
The importance of working memory was recognized and led to the development and establishment of Cognitive Load Theory (CLT), which is by now widespread and has been empirically validated in numerous studies [9, 168, 236]. Theories of CLT revolve around the limitations of working memory and how these limitations can be overcome—especially the latter question is of particular interest for the understandability of process models. Hence, subsequently, we discuss four phenomena that are
known to influence mental effort. First, we discuss the chunking of information.
Second, we show how process models can support inference through computational
offloading. Third, we introduce external memory, which allows for freeing resources
in working memory. Finally, we discuss how the split–attention effect increases mental effort.
Chunking and Schema Acquisition
How are humans, given the described limitations of working memory, able to recall a
sentence that contains more than 7±2 characters? The answer to this question can be
found in the way how the human mind organizes information. Information is believed
to be stored in interconnected chunks rather than stored in isolation [91, 238]. In this
way, several items are bound together to “form a unified whole” [85]. The process
of aggregating information to a chunk, in turn, is referred to as “chunking” [91].
Coming back to our example, this explanation helps to shed light on the question of how humans can recall an entire sentence. Instead of remembering each character in isolation, characters are grouped into larger chunks of information. Imagine a person trying to remember an entire sentence: one might use one chunk per word, thereby aggregating several characters into a chunk that can be stored in a single slot, effectively reducing mental effort, cf. [38, 91, 162].
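The following minimal sketch (an illustration, not taken from the cited studies) contrasts the two strategies: storing a sentence character by character versus storing one chunk per word. The sentence and the capacity bound of seven items are assumptions chosen purely for illustration.

    # Chunking: the same sentence stored as individual characters or as word-level chunks.
    # The "working memory load" is simply the number of items to be held simultaneously.
    WORKING_MEMORY_CAPACITY = 7  # upper bound of the often-cited 7 +/- 2 items

    def load_without_chunking(sentence: str) -> int:
        return len(sentence)          # every character (including blanks) is one item

    def load_with_chunking(sentence: str) -> int:
        return len(sentence.split())  # one chunk per word, backed by word schemata

    sentence = "Glaciers store most of the fresh water"
    print(load_without_chunking(sentence))  # 38 items, far beyond capacity
    print(load_with_chunking(sentence))     # 7 chunks, just about manageable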
When several chunks of information are integrated in long–term memory, they
form schemata, i.e., well–integrated chunks of knowledge regarding the world, events,
people or actions [7, 12]. These schemata, in turn, help to guide comprehension
and help to organize information more efficiently [111]. Consider, for instance, the
situation when a person remembers a word, such as glacier. As argued above, the
person will probably remember the entire word as chunk instead of remembering
each single character. Thereby, the knowledge of what constitutes a glacier and
how the word is spelled, is regarded as schema. The actual information stored in
working–memory is then called a chunk. In other words, chunks are always based
upon schemata, which, in turn, guide the construction of chunks. The importance
and usage of chunking is most obvious in investigations that look into the differences
between novices and experts. For instance, it was found that chess players store the
positions of tokens in chunks [28, 29]. Thereby, it was also found that the chunk
size of experts was by far larger than the chunk size of novices, providing a potential
explanation for the superiority of experts. Similar results regarding chunking were
also found in different domains, e.g., how football coaches remember moves [79] or
the way how physicians remember diagnostic interviews [36].
To illustrate how chunking potentially influences the understandability of business
process models, an example is provided in Figure 3.6. An inexperienced reader may,
as shown on the left hand side, use three chunks to store the process fragment: one
for each XOR gateway and one for activity A. In contrast, an expert may, as shown
on the right hand side, recognize this process fragment as a pattern for making
activity A optional. In other words, a schema for optional activities is present in long–term memory, thereby allowing the entire process fragment to be stored in one slot in working memory.
Figure 3.6: Chunking of an optional activity [299]
Computational Offloading
In contrast to chunking, which is highly dependent on the internal representation of
information, i.e., how the reader organizes information in the mind, computational
offloading highly depends on the exact external presentation of the business process model, i.e., visualization of the process model. In particular, computational
offloading “refers to the extent to which differential external representations reduce the
amount of cognitive effort required to solve information equivalent problems” [217].
In other words, an external representation may provide features that help the reader
to extract information. Instead of computing and inferring the respective information in the modeler’s mind, information can, as in the process of recognition, more or less be “read off” [217].
To put computational offloading in the context of business process modeling, an
example illustrating the described phenomenon is shown in Figure 3.7. The illustrated process models are information equivalent, i.e., the same execution traces can
be produced based on Model A and Model B. However, Model A was modeled in
BPMN, whereas for Model B the declarative modeling language Declare [175] was
used. Consider the task of listing all execution traces, i.e., process instances, that
are supported by the process model. A reader familiar with BPMN will probably
infer within a few seconds that Model A supports execution traces <A, B, D> and
<A, C, D>. Such information is easy to extract, as BPMN provides an explicit
concept for the representation of execution sequences, namely sequence flows. Thus,
for identifying all possible execution traces, the reader simply follows the process
model’s sequence flows—the computation of all execution traces is offloaded to the
process model. In contrast, for Model B, no explicit representation of sequences is
present. Rather, constraints define the interplay of actions and do not necessarily
specify sequential behavior. Thus, the reader cannot directly read off execution
traces, but has to interpret the constraints in the mind to infer execution traces.
In other words, process model B, while information equivalent to Model A, does
not provide computational offloading for extracting execution traces. Consequently,
even for a reader experienced with Declare, listing all supported traces is far from
trivial.
(a) Model A, modeled in BPMN; (b) Model B, modeled in Declare
Figure 3.7: Computational offloading [299]
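To illustrate what is meant by reading off execution traces, the following sketch encodes Model A’s sequence flows as an assumed successor relation and enumerates all traces simply by following these flows; for Model B no such relation exists, so the same traces would have to be inferred from the constraints. The encoding of the model is an assumption made for illustration only.

    # Model A (imperative): control flow is explicit, so execution traces can be
    # read off by following the sequence flows from the start to the end event.
    FLOWS = {
        "start": ["A"],
        "A":     ["B", "C"],   # XOR split: exactly one branch is taken
        "B":     ["D"],
        "C":     ["D"],
        "D":     ["end"],
    }

    def traces(node="start", prefix=()):
        """Enumerate all execution traces by following the sequence flows."""
        if node == "end":
            yield [activity for activity in prefix if activity != "start"]
            return
        for successor in FLOWS[node]:
            yield from traces(successor, prefix + (node,))

    print(list(traces()))  # [['A', 'B', 'D'], ['A', 'C', 'D']]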
External Memory
Next, we would like to introduce another mechanism that is known for reducing
mental effort, i.e., the amount of working memory slots in use. In particular, we
discuss the concept of external memory in the context of business process models. External memory refers to any information storage outside the human cognitive system, e.g., pencil and paper or a blackboard [128, 217, 236, 248]. Information that
is taken from the working memory and stored in an external memory is then referred
to as a cognitive trace. In the context of a diagram, a cognitive trace would be, e.g., to mark, update or highlight information [217]. Likewise, in the context of process
modeling, the model itself may serve as external memory. When interpreting a
process model, marking an activity as executed while checking whether an execution
trace is supported, can be seen as leaving a cognitive trace.
For the illustration of external memory and cognitive traces, consider the process
model shown in Figure 3.8. Assume the reader wants to verify whether execution trace <A, D, E, F, G, H> is supported by the process model. So far, as indicated by
the bold path and the position of the token, the reader has “mentally executed” the
activities A, D, E, F and G. Without the aid of external memory, the reader has to keep in working memory which activities have been executed, i.e., sub–trace
<A, D, E, F, G>, as well as the position of the token within the process instance.
By writing down the activities executed so far, i.e., by transferring this information
from working memory to external memory (e.g., piece of paper), load on working
memory is reduced. In this particular case, the process model directly allows the reader to
store the “mental token”—either by simply putting a finger on the respective part
of the process model or by marking the location of the token, as shown in Figure 3.8.
Figure 3.8: External memory [299]
Split–Attention Effect
So far, we have focused on phenomena that are known to decrease mental effort.
In the following, we look into the split–attention effect [115, 153, 237], which is
known to increase mental effort. Basically, the split–attention effect occurs as soon
as information from different sources has to be integrated. For instance, when
studying material that consists of separate text and diagrams, the learner has to
keep segments of the text in working memory while searching for the matching
diagrammatic entity [116]. Thereby, two basic effects are distinguished. First, the
reader has to switch attention between different information sources, e.g., text and
diagram. Second, the reader has to integrate different information sources. These
two phenomena in combination are then known to increase mental effort and are
referred to as split–attention effect.
In the context of business process modeling, the split–attention effect can presumably
be observed in any process model that contains sub–processes. As soon as the
reader has to relate to more than one sub–process, the split–attention effect occurs.
Consider, for instance, the process model shown in Figure 3.9. It consists of a top–
level process containing (complex) activities A, B and C. Complex activity A, in
turn, refers to a sub–process containing activities D, E and F. Likewise, complex
activity C refers to a sub–process which contains activities H, I and J. When, for
instance, determining whether activity E is always followed by activity B, one might
use the following strategy. First, activity E is located, i.e., focusing attention on the
respective sub–process. Second, activity B is located, i.e., switching attention to the
top–level process. Third, the relationship between activity E and B is established
by understanding that the execution of activity E is always directly followed by
the execution of activity B, i.e., integrating the execution semantics of the top–level
process with the execution semantics of the sub–process.
Figure 3.9: Split–attention effect [297]
3.3 Cheetah Experimental Platform
As discussed in Chapter 2, empirical research is one of the central research methodologies applied in this thesis. To support the efficient execution of empirical research, Cheetah Experimental Platform (CEP) [188] was implemented. CEP is a joint effort with Jakob Pinggera and is freely available from http://www.cheetahplatform.org.
In the following, we describe how CEP supports efficient empirical research through
experimental workflows.
The motivation for the development of CEP arose from empirical research conducted with Alaska Simulator [187, 270, 271, 277], which is freely available from http://www.alaskasimulator.org. In the course of several experiments (cf. [118, 182, 219, 272, 276, 295]), recurring problems hampering the efficient execution of controlled experiments could be observed. First, for experiments that consisted of several steps, e.g., filling out a survey and subsequently
performing two modeling tasks, it could not be prevented that subjects disregarded
the experimental setup. For instance, subjects sometimes forgot to fill out the survey or to conduct one of the modeling tasks, leading to missing data. Second, we could
observe that certain components, such as a demographical survey, were required
throughout several experiments, but could not be reused efficiently. Third, data
collected during experiments was lost due to, e.g., interrupted network connections.
To tackle these problems, an experimental design is operationalized in CEP by
the usage of an experimental workflow. In other words, CEP provides a simple implementation of a workflow engine, which supports the execution of experimental
workflow activities that were specifically tailored toward the needs of an empirical
study. An example of an experimental workflow operationalizing a single factor experiment [286] is shown in Figure 3.10. In this experimental setup, two modeling
notations, i.e., Notation A and Notation B, should be compared. Furthermore, demographic data as well as personal opinions about the modeling notations should
be collected. When conducting the experiment, subjects are instructed to download
a preconfigured version of CEP and to enter a code. By randomly assigning codes
to subjects, randomization can be achieved. Likewise, by distributing code 1 to
one half of the subjects and distributing code 2 to the other half, a balanced setup
can be achieved. After having downloaded and started the preconfigured version
of CEP, the subject is automatically prompted to enter a code (cf. first activity in
Figure 3.10). Thereby, CEP ensures that only valid codes, as specified in the experimental workflow, are permitted. After having entered a valid code, the demographic
survey is automatically presented to the subject. As soon as the subject finishes the
survey, CEP will automatically open an editor supporting Notation A or Notation
B —depending on the code the subject has entered. After finishing the modeling
task, CEP will automatically show a feedback form. Subsequently, all collected data
is automatically uploaded to a database. For the case that no Internet connection
is available, the subject is prompted to send the data via email.
Enter Code, followed by Show Demographic Survey; code = 1 leads to Provide Modeling Notation A, code = 2 leads to Provide Modeling Notation B; both branches conclude with Show Feedback Form.
Figure 3.10: Experimental workflow: an example
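The sketch below illustrates how such an experimental workflow could be operationalized. It is a hypothetical, simplified stand-in: the activity functions and the dictionary-based branching are assumptions and do not reflect CEP’s actual implementation or API.

    import random

    # Hypothetical sketch of the experimental workflow shown in Figure 3.10.
    # Each function stands in for a reusable CEP component.
    def show_demographic_survey(data):      data["survey"] = "..."
    def provide_modeling_notation_a(data):  data["model"] = "created with Notation A"
    def provide_modeling_notation_b(data):  data["model"] = "created with Notation B"
    def show_feedback_form(data):           data["feedback"] = "..."

    # Branches are selected by the code the subject enters; distributing the codes
    # among subjects realizes randomization and a balanced setup.
    WORKFLOW = {
        1: [show_demographic_survey, provide_modeling_notation_a, show_feedback_form],
        2: [show_demographic_survey, provide_modeling_notation_b, show_feedback_form],
    }

    def run_experiment(code):
        if code not in WORKFLOW:             # only valid codes are accepted
            raise ValueError("invalid code")
        data = {"code": code}
        for activity in WORKFLOW[code]:      # no activity can be skipped
            activity(data)
        return data                          # collected data, to be uploaded or emailed

    print(run_experiment(random.choice([1, 2])))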
By operationalizing an experimental design using an experimental workflow, the
problems described above are resolved, as follows. First, CEP automatically guides
subjects through the experimental workflow. Depending on the code the subject has entered, the respective branches of the experimental workflow are executed. In this way,
CEP ensures that none of the activities are left out. Second, experimental workflows
are composed of reusable components, e.g., a survey component, a modeling editor
or a feedback form. In this way, experimental workflows can be assembled with
moderate programming effort. Third, data collected during the execution of the
experimental workflow is immediately stored locally and transferred to a database
after the experiment is finished. In case no Internet connection is available, the
subject is prompted to send the data via email. Thus, it is ensured that data collected during experiments is not lost.
In this thesis, CEP provides the basis for empirical investigations. In particular, concepts described in Chapter 4 and Chapter 5 are accessible as experimental
workflow activities, i.e., can be used arbitrarily in experimental workflows for empirical investigations. The technical perspective of this integration is sketched in
Figure 3.11. Generic functions, such as determining the duration of an experimental
workflow activity, are provided through Abstract Activity. Subclasses extend Abstract Activity by contributing respective specific functionality. For instance, CEP
provides ready–to–use activities Survey Activity for administering surveys, Messagebox Activity for displaying messages as well as Feedback Activity for leaving feedback. Likewise, the concepts described in Chapter 4 are accessible through classes
Test Driven Modeling Activity and Declarative Modeling Activity. Concepts from
Chapter 5 are accessible through Hierarchy Explorer Activity. For the time being,
it is sufficient to know that these activities can be used arbitrarily in experimental workflows; details are provided in Chapter 4 and Chapter 5, respectively.
Abstract Activity (#id: String, #name: String, +execute(), +getDuration()) with subclasses Survey Activity, Messagebox Activity, Feedback Activity, Test Driven Modeling Activity, Declarative Modeling Activity and Hierarchy Explorer Activity.
Figure 3.11: Experimental workflow activities: examples
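As a rough illustration of this structure, the sketch below mirrors the division between generic functionality (identifiers, duration measurement) in an abstract base class and activity-specific behavior in subclasses. It is written in Python for brevity; the class names follow Figure 3.11, while method bodies and signatures are assumptions rather than CEP’s actual code.

    import time
    from abc import ABC, abstractmethod

    class AbstractActivity(ABC):
        """Generic functionality shared by all experimental workflow activities."""
        def __init__(self, activity_id, name):
            self.activity_id, self.name = activity_id, name
            self._start = self._end = None

        def execute(self):
            self._start = time.time()
            self.run()                       # activity-specific behavior
            self._end = time.time()

        def get_duration(self):
            return self._end - self._start   # generic: how long the activity took

        @abstractmethod
        def run(self): ...

    class SurveyActivity(AbstractActivity):
        def run(self):
            print(f"administering survey '{self.name}'")

    class MessageboxActivity(AbstractActivity):
        def run(self):
            print(f"displaying message '{self.name}'")

    survey = SurveyActivity("s1", "Demographic Survey")
    survey.execute()
    print(survey.get_duration())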
Up to now, we have described the purpose of CEP and how it provides the basis
for empirical research conducted in this thesis. In the following, we would like to give
an overview of the usage of CEP in general. Meanwhile, CEP has been used
in the course of several empirical investigations. For instance, CEP was used for
the investigation of declarative process modeling notations [95, 180, 296, 300, 301,
303], the process of process modeling [30, 31, 183, 185, 186, 192], Concurrent Task
Trees [124], Literate Process Modeling [184], collaborative process modeling [75–
77], Change Pattern [268, 269] as well as general investigations of process modeling
endeavours [74, 94, 190, 278].
Chapter 4
Test Driven Modeling
So far, we have focused on the pillars this thesis is built upon, i.e., we have introduced research methodologies and declarative business process modeling, discussed
concepts from cognitive psychology and described Cheetah Experimental Platform.
In this chapter, we turn toward the application of these concepts with the purpose of
improving the creation, understanding and maintenance of declarative process models. As discussed in Chapter 2, in this work we follow the Design Science Research
Methodology (DSRM) approach [173]. The DSRM comprises six activities that have not only guided our research, but were also used to structure
this chapter. In particular, DSRM specifies the following activities:
(1) Problem identification and motivation
(2) Define the objectives for a solution
(3) Design and development
(4) Demonstration
(5) Evaluation
(6) Communication
Starting with problem identification and motivation (1), a general introduction to
the problem is provided in Section 4.1. To clarify the problem, terminology used
throughout this chapter is introduced in Section 4.2, while Section 4.3 provides a
running example. Addressing defining objectives for a solution (2), we deepen the
discussion about benefits and drawbacks of declarative modeling and use cognitive
psychology to systematically assess declarative process modeling and identify potential shortcomings in Section 4.4. Based upon these findings, we propose respective
countermeasures in Section 4.5, cf. design and development (3). To demonstrate
and operationalize the proposed approach, a prototypical implementation is described in Section 4.6, addressing demonstration (4). The implementation is also
used in the following to empirically validate the proposed concepts in Section 4.7
and Section 4.8, i.e., addressing activity evaluation (5). Limitations of this work
are described in Section 4.9, whereas connections to related work are established in
Section 4.10. Finally, this chapter is concluded with a summary of results in Section 4.11. Regarding the DSRM approach, communication (6) is orthogonal to these
sections, as it is the inherent purpose of this document to communicate results.
4.1 Introduction
In today’s prevalent dynamic business environment, the economic success of an
enterprise depends on its ability to react to various changes like shifts in customer’s attitudes or the introduction of new regulations and exceptional circumstances [129, 194]. Likewise, in health care applications, flexibility is a condition
sine qua non for the adoption of computerized systems [196, 240]. Process–Aware
Information Systems (PAISs) offer a promising perspective on shaping this capability, resulting in growing interest to align information systems in a process–oriented
way [60, 281]. Yet, a critical success factor in applying PAISs is the possibility of flexibly dealing with process changes [129]. To address the need for flexible PAISs, competing paradigms enabling process changes and process flexibility were developed,
e.g., adaptive processes [203, 280], case handling [257], declarative processes [176],
data driven processes [158] and late binding and modeling [216] (for an overview
see [274]).
All these approaches relax the strict separation of build–time (i.e., modeling) and
run–time (i.e., execution), which is typical for plan–driven approaches as realized in
traditional Workflow Management Systems. Depending on the concrete approach,
planning and execution are interwoven to different degrees, resulting in different
levels of decision deferral. The highest degree of decision deferral is fostered by Late
Composition [274] (e.g., as enabled through a declarative approach) which describes
activities that can be performed as well as constraints prohibiting undesired behavior. A declarative approach, therefore, is particularly promising for dynamic and
unpredictable processes [176, 264]. The support for partial workflows [264], allowing
users to defer decisions to run–time [274], the absence of over–specification [176]
and more maneuvering room for end users [176] can all be considered as advantages
commonly attributed to declarative processes. Although the benefits of declarative approaches seem rather evident, they are not widely adopted in practice yet. Declarative processes are only rudimentarily supported and integrated process lifecycle support is not in place yet, while methods and tools for supporting imperative processes are rather advanced (e.g., [275]).
Reasons for the limited adoption of declarative approaches seem to be related
to understandability and maintainability problems [175, 276]. Also, methods and
tools addressing respective issues, even though clearly in demand [209], are still
missing. To approach these issues, we start by analyzing problems associated with
understanding and maintaining declarative business process models with the help
of concepts from cognitive psychology. Then, to tackle these problems, we adopt
well established techniques from the domain of software engineering. More specifically, Test Driven Development [15] and Automated Acceptance Testing [155] are
combined and adapted to better support the declarative process life–cycle. As a result, we provide a first approach toward enhanced usability of declarative process
management systems. However, before presenting the proposed concepts, we will
take one step back and introduce basic terminology as well as a running example in
the following.
4.2 Terminology
We would like to emphasize that our work has to be seen in the context of the
process of process modeling (PPM), i.e., the process of creating a process model
(cf. [31, 186, 192, 233]). Such a process modeling endeavor is commonly described
as iterative and collaborative [106], whereby communication plays an important
role [259]. Typically, several roles are involved in the PPM. In this work, we
subscribe to the view of [250] and particularly focus on the following roles:
• Domain Expert (DE): provides information about the domain and decides
which aspects of the domain are relevant to the model and which are not.
• Model Builder (MB): is responsible for formalizing information provided
by the DE.
• Modeling Mediator (MM): helps the DE to phrase statements concerning
the domain that can be understood by the MB. In addition, the MM helps to
translate questions the MB might have for the DE.
In particular, the PPM is divided into an elicitation dialogue and formalization
dialogue [105]. During the elicitation dialogue, the DE conveys information about
the business domain to the MB, i.e., capturing the requirements of the business
process. In the formalization dialogue, the MB is responsible for transforming this
informal information into a formal process model. Thereby, as argued in [211], much of the information is not only gathered from the DE, but also created through the communication process itself.
4.3 Example
To illustrate the concepts of declarative business processes as well as the proposed
framework and methodology, we introduce a running example (cf. Figure 4.1). The
process model is meant to provide an example of a process within a familiar domain rather than a comprehensive description of how to write a paper. For the sake of
brevity, we will use the following abbreviations:
• I: Come up with Idea
• R: Refine Idea
• W: Write
• S: Submit Paper
• C: Cancel as Deadline Missed
Process Model PMM with activities Come up with Idea (I), Refine Idea (R), Write (W), Submit Paper (S) and Cancel as Deadline Missed (C). The legend explains the notation: cardinality annotations (activity must be executed exactly once, activity can be executed at most once), precedence (X must be executed before Y can be executed), branched succession (X must be followed by Y or Z; Y and Z must be preceded by X) and neg coexistence (X and Y cannot occur in the same process instance).
Figure 4.1: Example of a declarative process, adapted from [302]
The constraints used in Figure 4.1 are taken from [175] and summarized in Table 3.1. For the sake of readability, we will briefly revisit their semantics in the
following. The precedence constraint specifies that the execution of a particular activity requires another activity to be executed before (not necessarily directly). For
instance, the precedence constraint between I and R allows execution traces <I,
W, R, S > and <I, R, W, S >, but prohibits the trace <R, W, S >. The constraint
succession is a refinement of the precedence constraint. In addition to the precedence relation, it demands that the execution of the first activity is followed by the
execution of the second one. For example, the succession constraint between W and
S is satisfied for execution traces <I, W, S >, <I, W, R, S >, but not for <I, R, S >
(W not executed before S ) and <I, R, W > (S not executed after W ). By using
the neg coexistence constraint, it is possible to define that only one of two activities
can be executed for a process instance. For instance, the neg coexistence between C
and S allows execution traces <I, W, S > and <I, W, C >, but prohibits trace <I,
W, S, C >. Finally, the cardinality constraint restricts how often an activity can be
executed. For instance, the cardinality constraint (0..1) on S allows the trace <I,
W, R, S >, but prohibits the trace <I, W, R, S, S >.
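The constraint semantics just described can also be phrased operationally. The following sketch (an illustration, not the Declare implementation) checks a trace, given as a list of activity labels, against the four constraint types and reproduces the example traces discussed above.

    # Minimal checks for the constraint types used in Figure 4.1; a trace is a list
    # of activity labels in execution order.
    def precedence(trace, a, b):
        """Every occurrence of b requires an earlier occurrence of a."""
        seen_a = False
        for activity in trace:
            if activity == a:
                seen_a = True
            elif activity == b and not seen_a:
                return False
        return True

    def response(trace, a, b):
        """Every occurrence of a must eventually be followed by an occurrence of b."""
        pending = False
        for activity in trace:
            if activity == a:
                pending = True
            elif activity == b:
                pending = False
        return not pending

    def succession(trace, a, b):
        # succession = precedence plus the demand that a is eventually followed by b
        return precedence(trace, a, b) and response(trace, a, b)

    def neg_coexistence(trace, a, b):
        return not (a in trace and b in trace)

    def at_most_once(trace, a):
        return trace.count(a) <= 1

    # The example traces discussed above:
    assert precedence(["I", "W", "R", "S"], "I", "R") and not precedence(["R", "W", "S"], "I", "R")
    assert succession(["I", "W", "R", "S"], "W", "S") and not succession(["I", "R", "W"], "W", "S")
    assert neg_coexistence(["I", "W", "C"], "C", "S") and not neg_coexistence(["I", "W", "S", "C"], "C", "S")
    assert at_most_once(["I", "W", "R", "S"], "S") and not at_most_once(["I", "W", "R", "S", "S"], "S")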
Having the constraints’ semantics in mind, process model PMM in Figure 4.1 can be
described in the following way: After an initial idea has been devised, it is possible
to start working on the paper and to refine the idea at any time. If it turns out that
the deadline cannot be met, the work on the paper will be stopped. Otherwise, as
soon as the idea is described sufficiently well, the paper can be submitted.
This example indicates three interesting properties of declarative process models:
First, the declarative nature allows for an elegant specification of business rules,
especially for unstructured processes. For an imperatively modeled process it would
be difficult to deal with multiple (possibly) parallel executions of R and W, in combination with the neg coexistence constraint between C and S. Second, for assessing
whether certain behavior is supported by the process model, the reader has to interpret the constraints mentally. For instance, when checking an execution trace, the
reader has to inspect all constraints step–by–step to evaluate whether the execution
trace is valid. Third, it is not obvious where to start reading the model. As there are
no explicit start nodes, it is up to the reader to figure out where to start—in this case
perhaps by following the precedence and succession constraints. In the following,
we will deepen this discussion and assess the understandability and maintainability
of declarative process models in detail.
4.4 Understandability and Maintainability of Declarative
Process Models
Though declarative business processes provide a big potential for flexible process
execution, their adoption is currently limited—this section elaborates on factors impeding their adoption. In particular, we analyze potential problems of declarative
business process models through the lens of cognitive psychology: Section 4.4.1 discusses problems related to the understanding of declarative process models, whereas
Section 4.4.2 deals with maintainability issues.
4.4.1 Understandability
While there is anecdotal evidence that declarative process models suffer from understandability problems [175, 276], the discussion about potential reasons has been
rather superficial so far. In the following, we examine the understanding of declarative process models from three angles. First, we employ the Cognitive Dimensions Framework (CDF) [92, 93] to analyze the understanding of declarative process
models. Second, we complement these insights by adopting the notion of sequential/circumstantial information. Third, we look into the way how declarative process
models are presumably read.
Cognitive Dimensions Framework
The goal of the Cognitive Dimensions Framework (CDF) [92, 93] is to provide a
framework for assessing almost any cognitive artifact [163], e.g., modeling notations.
The dimensions described by the CDF should allow for describing the structure of the
cognitive artifact and allow for providing an analysis which is also understandable to
persons that are not specialists in Human Computer Interaction (HCI). In this way,
the CDF aims to provide discussion tools in order to raise the level of the discourse.
In the context of this work, two dimensions are of particular interest: hard mental
operations and hidden dependencies.
Hard Mental Operations According to the CDF, hard mental operations are basically defined by two properties [93]. First, the problematic mental operations must
be found on the notational level, rather than solely on the semantic level. In other
words, if a business process is hard to understand due to its size and complexity
(semantic level), the understanding of the process is not considered a hard mental
operation, as defined by the CDF. Rather, if the process could have been modeled more comprehensibly in a different modeling language (notational level), the
interpretation is considered a hard mental operation. As discussed in Section 3.2.3,
declarative process models clearly fulfill this property. In particular, as illustrated
in Figure 3.7, there exist business processes which are easily understandable when
modeled in BPMN, but rather difficult to understand when modeled in Declare.
Hence, problematic mental operations can be traced back to the notational level,
rather than to the semantic level. The second precondition to be satisfied for being
considered a hard mental operation is that the combination of “offending objects
vastly increases difficulty” [93]. This condition, too, is fulfilled by declarative process modeling notations. In particular, it was shown that the semantics of a single
constraint can be grasped without major problems. The combination of several
constraints, however, seems to pose a significant challenge to the reader [97].
To understand why the interpretation of a declarative process model can be considered a hard mental operation, we would like to revisit the concept of computational
offloading [213, 217, 291, 292]. As introduced in Section 3.2.3, computational offloading allows the reader to “offload” computations to a diagram. In other words,
the way how the diagram represents information allows the reader to quickly extract certain information. For instance, in a BPMN model control flow is explicitly
represented by sequence flows (i.e., control edges) and gateways (e.g., AND gateway, XOR gateway). Assume the reader wants to check whether a certain process
instance is supported by an imperative process model. To this end, the reader may
use the control edges to simulate the process instance by tracing through the process model. In this way, the model allows to offload the computation of execution
traces to the process model. Contrariwise, in a declarative process model, the reader
cannot simply trace through the process model, as the control flow is modeled implicitly. Hence, even though both representations (declarative and imperative) are
information equivalent, i.e., the same execution traces can be derived, the declarative process model does not allow the reader to quickly identify process instances.
Rather, the reader has to simulate the process instance entirely in the mind.
Hidden Dependencies A hidden dependency, according to the CDF [93], is a relationship between two components, such that one component is dependent on the
other, but that dependency is not fully visible. In a declarative process model, this
situation applies to the combination of constraints. Since constraints are interconnected, it is not sufficient to look at constraints in isolation, rather, the reader must
also take into account the interplay of constraints. However, this interplay may not
always be obvious, i.e., can be hidden, thus making understanding difficult.
Figure 4.1 also illustrates hidden dependencies. For instance, after activities C
or S have been completed, W cannot be executed anymore. This behavior is introduced by the combination of the cardinality constraints of C and S as well as the
succession constraint of W and the neg coexistence constraint between S and C. If
W was executed after C or S, the process instance could not be completed anymore
as the succession constraint demands the execution of either C or S after W. However, neither C nor S can be executed due to the combination of the cardinality and neg coexistence constraints, and consequently the process instance cannot be completed. Thus, the workflow engine must prohibit the execution of W (cf. [175]),
which is not apparent from looking at the process model.
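To see why such hidden dependencies are hard to spot, one can check mechanically whether a partial trace can still be completed. The sketch below does this by brute force for a small, assumed approximation of PMM’s constraint set (only the constraints named above are encoded, and the bound on the number of remaining steps is likewise an assumption); it confirms that after <I, W, S> a further execution of W makes completion impossible, which is exactly why the engine must prohibit W at that point.

    from itertools import product

    ACTIVITIES = ["I", "R", "W", "S", "C"]

    def satisfied(trace):
        """All encoded constraints hold for a completed trace (approximation of PMM)."""
        followed = lambda a, targets: all(            # branched response: every a is
            any(t in trace[i + 1:] for t in targets)  # eventually followed by a target
            for i, act in enumerate(trace) if act == a)
        preceded = lambda a, b: all(                  # precedence: b needs an earlier a
            a in trace[:i] for i, act in enumerate(trace) if act == b)
        return (trace.count("I") == 1                 # I must be executed exactly once
                and trace.count("S") <= 1             # S at most once
                and trace.count("C") <= 1             # C at most once
                and not ("S" in trace and "C" in trace)   # neg coexistence of S and C
                and preceded("I", "R")                # precedence between I and R
                and preceded("W", "S") and preceded("W", "C")
                and followed("W", {"S", "C"}))        # branched succession from W

    def can_complete(partial, bound=3):
        """Can the partial trace be extended by at most `bound` steps and completed?"""
        for length in range(bound + 1):
            for extension in product(ACTIVITIES, repeat=length):
                if satisfied(partial + list(extension)):
                    return True
        return False

    print(can_complete(["I", "W", "S"]))       # True:  the instance may complete as is
    print(can_complete(["I", "W", "S", "W"]))  # False: the hidden deadlock described above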
Sequential and Circumstantial Information
So far, we have identified that hard mental operations and hidden dependencies
may be present in declarative process models. In the following, we use the notion of sequential and circumstantial information for further analysis. According
to [69, 70], process models exhibit sequential as well as circumstantial information.
Sequential information describes chronological behavior (e.g., A directly follows B),
whereas circumstantial information captures general relations (e.g., A must be executed at least once). Depending on the process modeling language, either sequential
or circumstantial information may be favored, i.e., can be explicitly described by
the modeling language. Thus, in general, modeling languages can be characterized
along a spectrum of explicitness between sequential and circumstantial. Imperative
languages (e.g., BPMN, Petri Nets) reside on the sequential side of the spectrum,
while declarative approaches (e.g., Declare [175]) are settled on the circumstantial
end [69]. Though the circumstantial nature of declarative languages allows for a
specification of highly flexible business process models [175], it inhibits the extraction of sequential information (e.g., execution traces) at the same time. This, in
turn, compromises understandability, as it makes it harder to see whether a process
model supports the execution of a specific process instance or not.
To understand why the extraction of sequential information, such as execution
traces, plays an essential role for understanding, we refer to the validation of process models. Basically, validation refers to the question: “Did we build the right
system?” [52], i.e., whether the process model faithfully represents the business process to be modeled. We would like to emphasize at this point that validation is
orthogonal to the concept of verification. Validation refers to the question whether
the right model is built, whereas verification is concerned with the question, whether
the model is built right [18, 20], i.e., is formally correct. From software engineering,
it is known that “programmers rely heavily upon mental simulation for evaluating
the validity of rules” [117]. Seen in the context of business process modeling, mental
simulation refers to the mental execution of process instances. In other words, the
person who validates the process model checks via mental simulation whether certain process instances, i.e., scenarios, are supported by a process model. However, as
discussed, this simulation relies on sequential information, which is only implicitly
available in declarative process models. This, in turn, poses a significant challenge
for the person who wants to validate the process model.
To illustrate the concept of sequential and circumstantial information, consider
the process model from Figure 4.1. For instance, the precedence constraint between
I and R defines that R can only be executed if I has been completed at least once.
In this way, the constraint conveys sequential information, i.e., that there is no
occurrence of R before any occurrence of I. However, the constraint does not strictly
prescribe all possible sequences involving I and R, but rather allows all process
instances that satisfy this condition, i.e., also conveys circumstantial information.
With respect to the coexistence constraint (cf. Table 3.1), there is an even higher
ratio of circumstantial information: it defines that one out of two activities can be
executed (circumstantial), but not both—the ordering of respective activities is not
taken into account at all (sequential).
Readability of Declarative Process Models
Finally, we would like to turn to the reading of declarative process models. Basically,
it is known that conceptual models are not read at once, but chunk–wise, i.e., bit by
bit [83]. While graph–based notations inherently propose a way of reading imperative process models chunk by chunk—namely from start node(s) to end node(s)—this
approach does not necessarily work for declarative models. Since declarative models
do not have explicit start nodes and end nodes, it is not always obvious where the
reader should start reading the model. In fact, it is not unlikely for a declarative
model to have several starting points, as the focus is put on what should be modeled,
but not precisely how.
Thus, when reading a declarative process model, users have to rely on secondary
notation, e.g., layout [97]. For example, regarding Figure 4.1 one might assume
that the process starts top left and ends bottom right. However, whether this strategy succeeds depends on the person who created the model. Another way of
reading could be to follow precedence constraints and succession constraints to find
the model’s start. In short, it is neither obvious which strategy to choose nor which
one works best.
4.4.2 Maintainability
Up to now, we have discussed the understanding of declarative process models. In
the following, we turn to the maintenance of respective models, i.e., the evolution of
process models due to the introduction of new laws or changes in customer attitude.
Besides gathering the change requirements, the adaptation of declarative process models is far from trivial. With increasing model size, declarative models are not only
difficult to understand, but also hard to maintain. As pointed out in [276]: “it is notoriously difficult to determine which constraints have to be modified and then to test
the newly adapted set of constraints”. An explanation of problems regarding maintainability is provided by the observation that adapting process models involves both
sense–making tasks, i.e., to determine parts of the model that need to be changed,
and action tasks, i.e., to apply the respective changes to the model [93]. While
the action task is rather simple—adding/removing activities and adding/removing
constraints—the sense–making task is far from trivial as detailed in the following.
As illustrated, declarative process models suffer from understandability problems,
thus impeding the sense–making task. Besides, due to the interplay of constraints,
it is hard to see how changes influence other parts of the process model. Consider,
for instance, the introduction of a neg coexistence constraint between R and W in
Figure 4.1. This change not only restricts the relationship between R and W, but
also introduces a deadlock for the execution trace <I, R>. While the execution of
R is still possible, neither C nor S can be executed anymore, because they require
W to be executed before. However, as R and W are mutually exclusive, W cannot be
executed. This example illustrates that local changes to the process model can have
effects on other parts of the model that are not obvious, i.e., the change becomes
global.
4.5 Testing Framework for Declarative Processes
To address understanding and maintenance issues discussed afore, we propose a
framework for the validation of declarative processes, subsequently referred to as
Test Driven Modeling (TDM). Basically, the idea of TDM is to transfer concepts
from software engineering in order to provide computer–based support for the creation and maintenance of declarative process models. In particular, Section 4.5.1
introduces background information about the software engineering techniques Test
Driven Development and Automated Acceptance Testing. Then, Section 4.5.2 illustrates how respective techniques can be adapted to the domain of business processes
and introduces the testing framework. Section 4.5.3 focuses on methodological aspects and their application in the declarative process life–cycle.
4.5.1 Software Testing Techniques
As the proposed framework and methodology build upon techniques from software
engineering, respective techniques are introduced in the following.
Test Driven Development
Software engineering processes typically define the phases of system design and
implementation as well as testing. That means, after the required functionality
has been developed, the software system’s defects are (more or less) systematically
searched for and corrected. The idea of Test Driven Development (TDD), however,
is to interweave the phases of system development and testing [15]. As the name
suggests, automated test cases are specified before the actual production code is
written. Whenever a new feature is introduced, a test case is created to ensure that
the feature is implemented properly. In addition, developers execute all test cases
to verify that existing behavior is preserved and the new feature does not introduce
undesired behavior, i.e., “breaks the existing code” [15]. Studies show that the adoption of TDD indeed leads to improvements with respect to the number of software
defects and design quality (e.g., [27, 84]). It is worthwhile to note that TDD, as a
byproduct, enables regression testing, i.e., testing whether the development of new
functionality in a software system preserves the correctness of the existing system [1].
In particular, test cases that were specified during the development, can directly be
used for regression testing.
Automated Acceptance Testing
Similar to TDD, the idea of Automated Acceptance Testing (AAT) [155] is the creation of executable test cases. AAT, however, focuses on the interaction between
customers and developers. Test cases in AAT need to be understandable to customers without technical background, but still exhibit strict semantics. Thereby, test
cases act as means of communication as they can be understood by both customers
and developers, allowing for a better integration of the customer in the development
process. This, in turn, supports the identification of system requirements [155]. The
automated validation of the developed system against the test cases ensures that
the desired functionality is actually provided by the software system. The test cases
are seen as a contract of acceptance: only if the software system passes all test cases will it be accepted by the customer.
4.5.2 Process Testing Framework Concepts
So far, we have established that TDD and AAT can help to improve understanding and maintaining software. Next, we describe how respective concepts can be
adopted for declarative process models for addressing understandability and maintainability issues, as identified in Section 4.4. In particular, this section starts with
further motivation and an overview of the testing framework before the concepts are
explained in detail.
Besides improving understanding and maintenance, test cases aim to support
the creation of process models in two ways. First, it is known that a fundamental
modeling strategy is to factor the problem, i.e., the business process to be modeled,
into smaller pieces [154]. As test cases typically refer to a specific aspect of the business process, they help to focus on smaller parts of the process, thereby supporting
the factoring of the problem. In this vein, we subscribe to the view of [17]: “We
assume that the domain experts know some or all scenarios of the process to be modelled better than the process itself ”. Second, an automated execution of test cases
allows for a progressive evaluation, which is particularly beneficial for novices [93].
Thereby, test cases help to focus on a particular instance of the problem [154],
another fundamental modeling strategy. Similarly, focusing on the bare essentials
for reproducing a specific (errant) behavior was identified as an expert strategy for
debugging [279].
The intended application of test cases, in particular the interplay of domain expert
(DE), model builder (MB), test cases and process model is illustrated in Figure 4.2.
The DE, possibly with the help of the MB, creates a test case specifying the intended
behavior of the declarative process (1a and 1b). To foster communication between
DE and MB, test cases are represented by a rich graphical user interface, the so–
called test fixture. Since the test fixture is understandable to both the DE and
MB, it serves as base for discussion. To validate a test case, the test case logic is
automatically extracted from the test case (2) and fed into the test engine (3). For
the validation of test cases, the test engine also needs to access the process model
(4). After the test engine has completed the validation process, results are reported
to the MB/DE by indicating whether the test failed and, in case it failed, the
reason why (5). Depending on the feedback, the MB might decide to adapt the
process model (6) or the test case (1b).
Test Case Structure
In our work, we adopt the terminology from UML Testing Profile (UTP). More
specifically, we consider a test case a “specification of one case to test the system
including what to test with, which input, result, and under which conditions. . . A
test case always returns a verdict. The verdict may be pass, fail, inconclusive or
error.” [165]. Test cases are key components of the testing framework and essential
parts of TDD and AAT. A test case for a declarative process model consists of text
explaining the intention of the test case, an execution trace describing the behavior
to be tested, a set of assertions specifying the conditions to be validated, as well as
a graphical representation by a test fixture.
Textual Description The textual description in a test case helps to capture information that cannot be expressed directly in a process model, but is still necessary to
fully understand it. In this way, textual descriptions can be used to document why
certain behavior must be present in the process model. For instance, the intention
The Domain Expert (DE) and Model Builder (MB) interact via a test case, visualized by a test fixture with an execution trace and assertions; from the test case, the test case logic is extracted and passed to the test engine, which validates it against the process model. The test case logic shown in the figure reads: 1) can_not_terminate(1), 2) execute(A,2), 3) is_not_executable(A,3), 4) can_terminate(3), 5) execute(B,4).
Figure 4.2: Framework for the validation of declarative processes, adapted from [302]
of the test case shown in Figure 4.2 is to give an overview of the proposed testing
framework and illustrate the interplay of concepts.
Execution Trace As motivated by AAT, test cases can be seen as executable specifications [155]. Instead of using an informal document describing the requirements of
the process model in natural language, the specification should not only be readable
by humans, but also accessible to automated interpretation. The term execution
thereby refers to the idea of checking the described requirements using a test engine
in an automated manner, i.e., “the test case gets executed”. Within a test case,
execution traces are used to capture behavior to be tested, as they contain all relevant information about the execution of a process, e.g., execution of an activity or
completion of a process instance. Thereby, an execution trace allows restoring an
arbitrary state of a process instance by replaying the steps in the execution trace
on the workflow engine. Thus, execution traces provide the basis for an executable
specification, capturing behavior in a machine–readable form.
Considering the test case illustrated in Figure 4.2, the execution trace can be found
on the left hand side of the test case. The execution trace includes the execution of
two activities: activity A at time 2 and activity B at time 4.
Assertion While execution traces allow for specifying behavior that must be supported by a process model, they do not provide means to specify behavior that must
not be included. For instance, regarding the process model shown in Figure 4.2,
activity A must be executed exactly once. Put differently, activity A must be at
least once in the execution trace, but at the same time not more than once. In
order to support such scenarios, a test case also contains a set of assertions. Using
assertions, it can be validated whether certain behavior is supported/prohibited by
the process model. Since declarative process models require explicit completion, i.e.,
the user must complete the instance explicitly, we differentiate between execution
and completion assertions:
• is executable (a, t): activity a is executable at time t.
• is not executable (a, t): activity a is not executable at time t.
• can complete (t): the process instance can complete at time t.
• can not complete (t): the process instance can not complete at time t.
As discussed in Section 3.1, the behavior specified in a declarative process model
is prescribed by its activities and constraints. The former enumerate tasks which
may be executed, whereas the latter restrict their execution and the completion
of the process instance. Thus, the aforementioned assertions should be sufficient to
cover any condition with respect to control flow, since they can deal with activity
execution as well as process completion.
Considering Figure 4.2, three assertions can be found at the right hand side of
the test case. The assertions are organized in two columns to separate execution
from completion assertions. At time 1, before the execution of activity A, assertion
can not complete (1) (crossed out area on the right) is specified. After A has
been executed, at time 3, is not executable (A,3) (crossed out rectangle on the left)
and can complete (3) (non–crossed area in the right column) can be found.
Test Fixture So far, we illustrated the textual description, the execution trace and
assertions of a test case. While these concepts suffice to specify the behavior to be
tested, it is very likely that the DE cannot cope with these technical details. To enable the DE to read and specify test cases, test fixtures provide an intuitive graphical
representation of a test case. Depending on the situation, a specific visualization
may be best suited. For instance, when control flow constraints should be tested
(e.g., activity B must be preceded by activity A), a simple time–line fixture may
be sufficient (cf. Figure 4.2). However, for testing temporal constraints [127] (e.g.,
activity A can only be executed once per day), a calendar–like fixture might be more
appropriate. Furthermore, it might be beneficial to provide a user interface the DE
is familiar with (e.g., calendar from Microsoft Outlook).
Figure 4.2 shows an example of a test fixture for testing control flow constraints.
It consists of a simple timeline on the left hand side as well as an area for the specification of constraints on the right hand side. Assertion can complete is represented
by an empty rectangle, whereas assertions is not executable and can not complete
are represented by crossed out rectangles.
Test Case Validation
So far, we have introduced test cases that serve as executable specification and are
created by the DE, possibly with the help of the MB. Now, we describe how test
cases can be automatically validated against the process model. Each test case is
responsible for providing a precise definition of the behavior to be tested, i.e., the
execution trace and assertion statements. For the actual execution of the test case,
the test engine provides an artificial environment, in which the process is executed.
The procedure for the validation of a test case is straightforward:
(1) Initialize the test environment, i.e., instantiate the process instance.
(2) For each event in the execution trace and assertion, ascending by time:
a) if event: The test engine interprets the log event and manipulates the
test environment, e.g., execution of an activity. If the log event cannot
be interpreted, e.g., because the activity cannot be executed, the test case
validation will be stopped and the failure reported.
b) if assertion: Test whether the assertion holds for the current state of the
test environment. For the case the condition does not hold, the test case
validation will be stopped and the failure reported.
(3) In case all assertions passed, report that the test case passed. Otherwise,
provide a detailed problem report, e.g., report the constraint that caused the
failure or the state of the process instance when the test case failed.
With respect to the test case illustrated in Figure 4.2, the test engine takes the
test case logic and the process model as input. Then, the engine interprets the test
case logic as follows:
(1) can not complete (1): Check that the process instance cannot be completed
without executing any activity.
(2) execute (A,2): Execute activity A at time 2 and complete it at time 3.
(3) is not executable (A,3): Check that A cannot be executed at time 3.
(4) can complete (3): Test if the process instance can be completed at time 3.
(5) execute (B,4): Execute activity B at time 4 and complete it at time 5.
(6) All events from the execution trace and all assertions could be interpreted
without problems, thus report that the test case passed.
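A compact way to summarize this procedure is the validation loop sketched below. It is an illustration rather than TDM’s actual test engine: the concrete time points are abstracted to the chronological order of steps, and a toy simulator (activity A must be executed exactly once before the instance may complete, activity B is always executable) stands in for the workflow engine.

    # Sketch of the test-case validation loop; a toy simulator replaces the workflow engine.
    class ToySimulator:
        """A must be executed exactly once before completion; B is always executable."""
        def __init__(self):
            self.executed_a = False

        def is_executable(self, activity):
            return activity != "A" or not self.executed_a

        def execute(self, activity):
            if not self.is_executable(activity):
                raise RuntimeError(f"{activity} is not executable")
            if activity == "A":
                self.executed_a = True

        def can_complete(self):
            return self.executed_a

    def validate(test_case_logic, instance):
        """Interpret events and assertions in chronological order; report the verdict."""
        for step, (kind, activity) in enumerate(test_case_logic, start=1):
            try:
                if kind == "execute":
                    instance.execute(activity)
                elif kind == "is_executable" and not instance.is_executable(activity):
                    return f"failed at step {step}: {activity} is not executable"
                elif kind == "is_not_executable" and instance.is_executable(activity):
                    return f"failed at step {step}: {activity} is unexpectedly executable"
                elif kind == "can_complete" and not instance.can_complete():
                    return f"failed at step {step}: instance cannot complete"
                elif kind == "can_not_complete" and instance.can_complete():
                    return f"failed at step {step}: instance can unexpectedly complete"
            except RuntimeError as error:
                return f"failed at step {step}: {error}"
        return "passed"

    # The test case logic interpreted above, with time points reduced to step order:
    logic = [("can_not_complete", None), ("execute", "A"), ("is_not_executable", "A"),
             ("can_complete", None), ("execute", "B")]
    print(validate(logic, ToySimulator()))  # passed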
Meta Model
Up to now, we have described test cases and the automated validation. In the following, we summarize the described concepts in the form of a meta model. As illustrated in
Figure 4.3, TDM’s meta model can be divided into two main parts: the specification
of test cases (upper half) as well as the specification of the business process model
(lower half). A Test Driven Model consists of exactly one Declarative Process Model
and an arbitrary number of Test Cases. A Declarative Process Model consists of at least one Activity as well as an arbitrary number of Constraints. For the sake of
brevity, Figure 4.3 shows three constraints only, i.e., the Response Constraint, the
Precedence Constraint and the Coexistence Constraint. TDM actually supports all
constraints described in [175], for a detailed description of the constraints we refer
to [148]. Besides the specification of a Declarative Process Model, the meta model
in Figure 4.3 describes how Test Cases can be specified. In particular, a Test Case
is built up from a Process Instance, i.e., the execution trace, and an arbitrary number
of Assertions. The Process Instance, in turn, consists of an arbitrary number of
Activity Instances. For each Activity Instance a start Event (i.e., when the activity
instance enters the state started in its life–cycle) as well as an end Event (i.e.,
when the activity instance is completed ) are defined. Similarly, each Assertion is
defined for a certain window by its start– and end Event. Within this window, a
condition that is specified by the Assertion must hold. TDM thereby differentiates between two types of Assertions: an Execution Assertion can be used to verify
whether a certain Activity is executable. The positive flag in Assertion thereby defines whether an Activity is expected to be executable or if the Activity is expected
to be non–executable. Similarly, a Completion Assertion can be used to test whether
the Process Instance can be completed within a specified window.
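The structure of Figure 4.3 can also be paraphrased as a handful of plain Java classes. The sketch below only mirrors the entities and associations of the figure; field names and types are assumptions rather than the actual TDMS implementation.

```java
import java.util.List;

// Sketch of TDM's meta model (cf. Figure 4.3); illustrative only.
class Event { int id; }

class ActivityInstance { Event start; Event end; }           // life-cycle events of one execution

class ProcessInstance { List<ActivityInstance> instances; }  // the execution trace

abstract class Assertion {                                    // condition that must hold in a window
    boolean positive;                                          // expected vs. not expected
    Event start;                                               // window start
    Event end;                                                 // window end
}

class ExecutionAssertion extends Assertion { Activity activity; }  // is the activity executable?
class CompletionAssertion extends Assertion { }                    // can the instance be completed?

class TestCase {
    String name;
    ProcessInstance processInstance;                           // exactly one execution trace
    List<Assertion> assertions;                                // arbitrary number of assertions
}

class Activity { String name; }

abstract class Constraint { }
class ResponseConstraint extends Constraint { Activity activity1, activity2; }
class PrecedenceConstraint extends Constraint { Activity activity1, activity2; }
class CoexistenceConstraint extends Constraint { Activity activity1, activity2; }

class DeclarativeProcessModel {
    List<Activity> activities;                                 // at least one activity
    List<Constraint> constraints;                              // arbitrary number of constraints
}

class TestDrivenModel {
    DeclarativeProcessModel model;                             // exactly one process model
    List<TestCase> testCases;                                  // arbitrary number of test cases
}
```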
Figure 4.3: Meta model of TDM [300] (UML class diagram; the upper half specifies test cases, the lower half the declarative process model; for the sake of brevity only three constraints are shown)

To wrap up, TDM allows for the specification of declarative process models and test cases. Each test case defines a certain scenario, i.e., process instance, that must be supported by the process model. Assertions can thereby be used to test for specific conditions, namely whether an activity is executable as well as whether the process instance can be completed.
4.5.3 Test Driven Modeling and the Declarative Process Life–Cycle
So far, we have elaborated on understandability issues and maintainability issues of
declarative models and introduced a testing framework adopting TDD and AAT.
In the following, we turn toward the declarative process life–cycle, as depicted in
Figure 4.4. We start by looking into the phase of process design and deployment,
where we focus on methodological aspects and discuss how test cases are intended
to drive the creation of declarative process models. Then, we describe how
test cases can provide support in the phase of process operation and evaluation.
Design– and Deployment Phase
In the following, we focus on the phases of process design and process deployment.
In particular, we focus on process specification and process testing, cf. Figure 4.4.
Figure 4.4: Process life–cycle, adapted from [281] (phases Design, Deployment and Operation & Evaluation, with the activities Specify, Implement, Test, Release, Run and Measure)
Test Driven Modeling (TDM) The PPM is a collaborative, but still manual process. By adopting TDD and AAT techniques, process specification and process
testing get closely interwoven. In particular, test cases serve as Modeling Mediator (MM) [250] mediating between DE and MB. As illustrated in Figure 4.5, test
cases provide means to talk about both the business domain and the process model.
The DE is no longer forced to rely solely on the information provided by the MB
(2)—the specification of test cases (4) and their automated validation against the
formal process model (6) provide an unambiguous basis for communicating with the
MB (5). This does not mean that the DE and MB do not communicate directly
anymore. Rather, tests provide an additional communication channel. Thereby,
modeling minutes, which are formulated as test cases instead of informal text, can
be automatically validated against the process model, relieving the MB from manually checking the process model against the informal specification.
It is important to stress that TDM's PPM is of an iterative nature. Rather than
specifying all test cases up–front and modeling the process afterwards against the
specification, test cases and model are refined iteratively. As illustrated in Figure 4.6,
when a new process model is specified, the DE (with the help of the MB) describes a
requirement in the form of a test case. Then, all test cases are validated against the
process model to check whether the specified behavior is already supported by the
model. For the case that all tests pass, the DE and MB can move on with specifying
the next requirement, i.e., test case. If at least one test case fails, the DE and MB
will discuss whether the failed test cases are valid, i.e., capture the business domain
properly. If all test cases are valid, it can be assumed that the model does not capture
the business domain properly and therefore needs to be adapted. However, if the
Figure 4.5: Communication channels, adapted from [134] (Domain Expert, Model Builder, domain, process model and test cases, connected by the numbered communication channels (1) to (6))
discussion reveals that the test case is invalid, the test case needs to be adapted
to represent the business domain properly. In either case, all test cases are run to
ensure that the conducted adaptation had the desired effect. Subsequently, new test
cases are defined/adapted or the model is adapted iteratively until both DE and
MB are satisfied with the test cases and process model.
While the basic idea, as illustrated in Figure 4.6, is to start by specifying test
cases, for some situations one might also start with the modeling part or a non–empty
model. For instance, when working with existing process models it is neither feasible
nor meaningful to start from scratch. Furthermore, depending on the MB’s or DE’s
skills and preferences, it is not necessary to strictly follow the test–before–model idea; for instance, one may start from an initial model capturing the process logic roughly and refine it using test cases. Such deviations from the original TDM process are acceptable—as long as testing and modeling stay interwoven. Otherwise, the benefits of TDM are
likely to be diminished. The idea of TDM is illustrated based on an example of
a modeling session, cf. Figures 4.7 to 4.9. To recapitulate, we use the following
abbreviations:
I: Come up with Idea
W: Write
S: Submit Paper
Starting from an empty process model, the DE outlines general properties of the process: "When writing a publication, you need to have an idea. Then you write the publication and submit it.". Thus, possibly with the help of the MB, the DE inserts
activities I, W and S in the test case’s execution trace (cf. Figure 4.7). Respective
activities are automatically created in the process model. Now, the DE and MB run
Figure 4.6: Test Driven Modeling [302] (iterative cycle of Write Test Case and Run All Test Cases; if a test case fails, it is discussed and either the process model or the test case is adapted)
the test and the test engine reports that the test case passes.
Subsequently, the DE and MB engage in a dialogue of questioning and answering [105]—the MB challenges the model: “So can I submit the paper several times?”.
“You should submit the paper, but, at most once!”, the DE replies and adds: “And
you should only have a single idea—otherwise the reader gets confused.”. Thus,
they adapt the test case capturing this requirement and run it (cf. Figure 4.8).
Apparently, the test case fails as there are no constraints in the model yet. After
ensuring that the requirement is valid, the MB adapts the model—inserts cardinality
constraints on I and S —the test passes (cf. Figure 4.8).
Again, the MB challenges the model and asks: “Is it possible to submit an idea
without paper?”. The DE replies: “No, you always need a written document.” and
together they specify a second test case that ensures that S cannot be executed
without at least one execution of W before. By automatically validating the second
test case, it becomes apparent that S can be executed before W has been finished.
Thus, the MB introduces a precedence constraint between W and S (cf. Figure 4.9).
The given example illustrates the benefits of TDM for the design– and deployment
phase, which are detailed in the following.
Figure 4.7: Test case 1: <I, W, S > proposed by the DE, adapted from [302]
Figure 4.8: Test case 1: introduction of cardinality on I and S, adapted from [302]
Improving Understandability As discussed in Section 4.4.1, declarative process
models do not provide explicit support for sequential information, thereby forcing
the MB to construct respective information in the mind. At this point, the sequential
nature of test cases is exploited: since specification and testing are interwoven, test
cases and models are paired together. Thereby, test cases provide an explicit source
of sequential information. The construction of sequential information is supported
by the automated validation of test cases, thus avoiding hard mental operations.
Put differently, the automated computation of execution traces compensates for
the lack of computational offloading in declarative process models. In addition, by
specifying a respective test case, implicit dependencies between constraints can be
made explicit. This, in turn, helps the MB to deal with hidden dependencies.
According to [15], test cases should focus on a single aspect only. For instance,
the test case shown in Figure 4.9 focuses on the execution of activity S. Thereby, a
chunk of the process model is presented, focusing on a certain aspect of the model
Figure 4.9: Test case 2: introduction of precedence between W and S, adapted from [302]
only. This, in turn, enables the MB to read the model test case by test case, i.e.,
chunk–wise. Besides, the ordering of the activities specified in the test case proposes
a way of reading the process model. Consider, for instance, the test case illustrated
in Figure 4.9: Activities I, W and S are ordered consecutively. Thus, the reader
can assume that the process probably starts with activity I and ends with S.
Foster Communication As pointed out, our approach aims at fostering the communication between DE and MB. Wrapping up, we expect test cases to (1) act as
communication medium between DE and MB that serves as basis for discussion, (2)
structure their dialogue and (3) allow to focus on the modeling task, as test cases
provide a way to automatically ensure that existing behavior is not affected when
changing the model.
Support Schema Evolution Since design is redesign [87], during schema evolution
the same principles as for process specification can be applied. The only difference,
however, is that DE and MB already have a set of test cases as a starting point that is
extended as the process model is re–engineered. In this sense, the use of automated
test cases is also beneficial for supporting schema evolution. First, existing test cases
ensure that desired behavior is preserved by schema evolution, i.e., no unwanted behavior is introduced (cf. regression testing, Section 4.5.1). Second, the specification
of new test cases capturing the behavior to be introduced helps the modeler to determine which constraints need to be changed, addressing the maintainability issues
discussed in Section 4.4.2. Similar to the specification of new process models, the
first step consists of specifying a test case that defines the behavior to be introduced/changed. Afterwards, the MB iteratively refines the test case, creates new
test cases and adapts the model until the desired solution is finally approached. The
automated nature of test cases ensures that neither requirements are forgotten, nor
new requirements contradict existing ones, allowing DE and MB to focus on the
requirement elicitation and the modeling task.
Operation and Evaluation
While declarative processes provide a high degree of flexibility [175], deviations from
the process model can occur nevertheless. In Declare [175], for instance, it is possible
to specify optional constraints, which can be violated during process execution.
These deviations are usually documented using plain text. However, when deviations
occur frequently, it is desirable to ensure that deviations are incorporated during
schema evolution [275].
To support the evolution of business processes over time, we propose to capture
each deviation in the form of a test case. The execution trace of the current process
instance, in combination with a textual description, can directly be transformed into
a test case. The user who deviated is thereby enabled to document the deviation
in a form that can directly be used to guide the upcoming schema evolution. This
means, when redesigning the process schema, the MB runs all test cases for the
respective schema. If all test cases, including the test cases specified in the course of
deviations, pass, the MB knows that the new process schema version also supports
the needs of users who deviated. Otherwise, test cases that fail will be discarded if
the discussion between DE and MB reveals that respective behavior should not be
supported in the new process model version.
Considering the process depicted in Figure 4.1, which allows for the submission
of a paper only, resubmission is not supported. If a user needed to resubmit a
paper, he would deviate from the process by inserting and executing an activity
Resubmit Paper. To assure that the next version of the process model includes this
exceptional case—or at least that the exceptional case is discussed during the schema
evolution—the user creates a new test case using the execution trace of the current
process instance and a textual description to record the reason for the deviation.
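Conceptually, recording such a deviation only requires copying the execution trace of the running instance and attaching the user's textual reason. The following Java fragment is a hypothetical illustration of this idea; the class and method names are assumptions and not part of the actual TDMS implementation.

```java
import java.util.List;

// Hypothetical sketch of recording a deviation as a test case; not the actual TDMS API.
final class DeviationRecorder {

    record ExecutedActivity(String name, int startTime, int endTime) { }

    record DeviationTestCase(List<ExecutedActivity> trace, String reason) { }

    /** Captures the trace of the current process instance together with the user's reason. */
    DeviationTestCase record(List<ExecutedActivity> currentTrace, String reason) {
        // e.g., a trace containing the manually inserted activity "Resubmit Paper"
        return new DeviationTestCase(List.copyOf(currentTrace), reason);
    }
}
```

The trace is copied defensively, since the recorded test case should remain stable even if the running process instance continues to evolve.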
Test Cases and the Process Life–Cycle
So far, the implications on the different phases of the declarative process life–cycle, as
shown in Figure 4.4, were pointed out. While our approach covers all phases—with
focus on process design and deployment—it should be emphasized that support is
not provided in isolation for each phase. More specifically, test cases provide a means
of communication and documentation throughout the process life–cycle. Starting
in the design phase, they aim at improving the understandability of declarative
process models and foster communication between DE and MB. Moving to the
phase of process deployment, the automated nature of test cases provides support
for the validation of the process. During process operation, test cases can be used to
document process deviations. And, again starting at the phase of process design, test
cases specified during process operation provide a valuable starting point for schema
evolution. Thus, it becomes apparent that test cases are neither restricted to a single
phase of the process life–cycle nor to a single iteration of the life–cycle. Rather, test
cases flow through possibly multiple iterations of the process life–cycle, providing
information that cannot be explicitly specified by declarative process models in
isolation.
Figure 4.10: Process– and test case life–cycle [302] (test cases TC1 ... TCn are specified (1) and used for testing (2); during process execution, additional test cases TCn+1 ... TCo are recorded (3))
The life–cycle of test cases is illustrated in Figure 4.10. Test cases are primarily
defined during process specification (1) and can then directly be used for process
testing, i.e., validation (2). During process execution, new test cases (TCn+1 ... TCo) may be created to document deviations (3). These test cases, in turn, can be
used as input for schema evolution (1).
4.5.4 Limitations
Naturally, TDM has to be seen in the light of several limitations. Regarding conceptual aspects, it shall be noted that the focus of TDM is rather narrow. In particular, TDM was developed to support the creation, understanding and maintenance
of declarative process models. Even though it seems plausible that test cases may be
also used for the validation of imperative process models, it remains unclear in how
far concepts described in this work can be transferred. More specifically, we have
argued that a central aspect of test cases is the automated extraction of sequential
information from declarative process models, which usually rather focus on circumstantial information. Hence, when directly applying test cases, as described in this
work, to imperative process models, sequential information is duplicated. In other
words, as test cases and imperative languages both focus on sequential information,
the adoption of test cases does not help to balance the sequential/circumstantial
information mismatch (cf. Section 4.4.1). More likely, the introduction of circumstantial information, such as constraints, will help to facilitate the understanding and validation of imperative process models (cf. [132]).
Another limitation of this work is that TDM focuses on control flow aspects only.
Other perspectives, such as data and resources, were not taken into account yet.
Even though this limits the applicability of TDM, this design decision was taken
deliberately. In particular, the Design Science Research Methodology (DSRM) [173]
adopted in this work envisions an iterative research process. Likewise, it is acknowledged that design–science research efforts may begin with simplified conceptualizations [100]. Hence, we focused on building a solid foundation, which may be used
in future iterations for refining and extending. Regarding the applicability and efficiency of TDM, we would like to stress that TDM was designed to be used as
a collaborative approach, requiring MB and DE to work together closely. Hence,
for the adoption of TDM, two experts willing to collaborate are required. To test
whether the application of TDM is indeed feasible, in the following we describe a
prototypical implementation of TDM’s concepts.
4.6 Test Driven Modeling Suite
In this section we introduce Test Driven Modeling Suite (TDMS), which provides
operational support for TDM. We start by discussing the software components of
TDMS in Section 4.6.1. Then, we describe how TDMS was integrated with existing
frameworks for empirical research and business process execution in Section 4.6.2
and demonstrate the application of TDMS in a modeling session in Section 4.6.3.
Thus, in the sense of [164], this section can be seen as a “proof–by–demonstration”,
i.e., demonstrating the feasibility of TDM by implementing its concepts.
4.6.1 Software Components
To give an overview of the features implemented in TDMS, we have modeled the
scenario described in Section 4.5.3 using TDMS, cf. Figure 4.11. On the left hand
side, TDMS offers a graphical editor for editing test cases (1). To the right, a
graphical editor allows for designing the process model (2). Whenever changes are
conducted, TDMS immediately validates the test cases against the process model
and indicates failed test cases in the test case overview (3). In this case, it lists two
test cases, of which one failed. In addition, TDMS provides a detailed problem
message about failed test cases in (4). In this example, the MB defined that the
trace <I, W, S > must be supported by the process model. In addition, the test case
defines that S cannot be executed before W has been executed. However, as the
relation between W and S is not restricted, it is possible to execute S, causing the
test case to fail. In particular, TDMS informs the user about the failed test case by
highlighting the respective erroneous part in the test case editor (1) and in the test
case overview (3). In addition, TDMS provides a detailed error message to the user
in (4): “Submit Paper (S)” should not have been executable.
Figure 4.11: Screenshot of TDMS
Test Case Editor
As discussed in Section 4.5.2, test cases are a central concept of TDM, have precise semantics for the specification of behavior and still should be understandable
to domain experts. To this end, TDMS provides a calendar–like test case editor as
shown in Figure 4.11 (1). In particular, the test case editor provides support for the
specification of an execution trace on the left hand side and execution assertions as
well as completion assertions at the right hand side. The graphical representation
of assertions slightly deviates from the original design (crossed out and non–crossed
out rectangles). Instead, stop–signs are used for negative assertions, i.e., when an
activity cannot be executed or the process instance cannot be completed. OK–signs,
in turn, are used to visualize positive assertions, i.e., when an activity can be executed or the process instance can be completed. To avoid unnecessary distractions,
the test case editor is deliberately kept simple. The execution trace can be assembled by dragging activities from the process model editor (2) and dropping them at
the respective time in the test case editor (1). Likewise, execution assertions can
be specified in the same way. For completion assertions, the user selects the desired
time frame in the test case editor and uses the context menu to add the assertion.
Declarative Process Model Editor
The declarative process model editor, as shown in Figure 4.11 (2), provides a graphical editor for designing models using the declarative process modeling language Declare [175]. Particularly, it enables the user to create, delete, rename and reposition
activities and allows the user to create, edit and delete constraints. To allow users
to quickly get familiar with the editor, it builds upon the standard user interactions
provided by the Graphical Editor Framework (GEF) (http://www.eclipse.org/gef).
Test Case Creation and Validation
In order to create new test cases or to delete existing test cases, an outline of all
test cases is provided in Figure 4.11 (3). Whenever a test case is created, edited
or deleted, or the process model is changed, TDMS immediately validates all test
cases. By double–clicking on a test case, TDMS opens the respective test case in
the test case editor (1). Whenever a test case fails, TDMS provides a detailed
problem message in the problem overview (4). When double–clicking on a problem
reported in the problem overview, TDMS opens the test case associated with the
problem in test case editor (1) and highlights the problem. Regarding the validation
of test cases, it is important to stress that the validation procedure is performed
automatically, i.e., no user interaction is required to validate the test cases. To this
end, TDMS provides a test engine in which test cases are executed, as shown in
Figure 4.12. Basically, the test engine consists of a declarative process instance that
is executed on a declarative workflow engine within a test environment. Thereby,
TDMS’ process model provides the basis for the process instance. The test cases
steer the execution of the process instance, e.g., instantiating the process instance,
starting activities or completing activities. In addition, test cases may also check the
state of the process instance in the course of evaluating execution– or completion
assertions.
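The immediate validation described above can be pictured as a simple observer: every relevant change to a test case or to the process model triggers a run of all test cases. The sketch below is purely illustrative and makes no claim about TDMS' actual event mechanism; the generic types stand in for TDMS' test case and process model classes.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch only: TDMS-style immediate validation after every relevant change.
final class ImmediateValidation<T, M> {

    private final List<T> testCases;
    private final BiPredicate<T, M> validator;   // true if the test case passes against the model

    ImmediateValidation(List<T> testCases, BiPredicate<T, M> validator) {
        this.testCases = testCases;
        this.validator = validator;
    }

    /** Called whenever a test case or the process model changes; no user interaction needed. */
    void onChange(M currentModel) {
        for (T testCase : testCases) {
            if (!validator.test(testCase, currentModel)) {
                reportProblem(testCase);          // e.g., highlight it in the problem overview (4)
            }
        }
    }

    private void reportProblem(T testCase) {
        System.out.println("Test case failed: " + testCase);
    }
}
```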
Figure 4.12: Testing framework [300] (the test cases steer a declarative process instance, which is defined by TDMS' process model and runs on a declarative workflow engine; test cases also check the instance's state)
As pointed out in Section 4.5.3, the TDM methodology is of an iterative nature, hence
TDMS must also provide respective support. In particular, the iterative creation
of the process model poses a significant challenge, as any relevant change of the
process model requires the validation of test cases.2 However, existing approaches for
supporting Declare either lead to exponential runtime for schema adaptations [175]
or do not support workflow execution [148]. In order to tackle these problems, TDMS
provides its own declarative workflow engine. Similar to the Declare framework,
where constraints are mapped to LTL formulas [177], TDMS’ workflow engine maps
constraints to Java3 classes. In addition, for each process instance, the workflow
engine keeps a list of events to describe its current state. The enablement of an
activity can then be determined as detailed in the following. Based on the current
process instance, a constraint is able to determine whether it restricts the execution
of an activity. The workflow engine consults all defined constraints and determines
for each constraint whether it restricts the execution. If no constraint vetoes, the
activity can be executed. For determining whether the process instance can be
completed, a similar strategy is followed. However, in this case constraints are
asked whether they restrict the completion of the process instance instead.
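A minimal Java sketch of this veto mechanism could look as follows. The Constraint interface, the engine class and the example precedence constraint are assumptions made for illustration; they are not TDMS' actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the veto-based enablement check; not the actual TDMS workflow engine.
final class SketchWorkflowEngine {

    /** One event of the process instance, e.g., activity "A" was "completed" at time 3. */
    record InstanceEvent(String activity, String type, int time) { }

    /** Each constraint decides, based on the instance state, whether it vetoes. */
    interface Constraint {
        boolean restrictsExecutionOf(String activity, List<InstanceEvent> instanceState);
        boolean restrictsCompletion(List<InstanceEvent> instanceState);
    }

    private final List<Constraint> constraints = new ArrayList<>();
    private final List<InstanceEvent> instanceState = new ArrayList<>();

    /** An activity is enabled if no constraint vetoes its execution. */
    boolean isExecutable(String activity) {
        return constraints.stream()
                .noneMatch(c -> c.restrictsExecutionOf(activity, instanceState));
    }

    /** The instance can be completed if no constraint vetoes completion. */
    boolean canComplete() {
        return constraints.stream().noneMatch(c -> c.restrictsCompletion(instanceState));
    }

    /** Schema evolution: adding or removing a constraint only changes the set to be consulted. */
    void addConstraint(Constraint c)    { constraints.add(c); }
    void removeConstraint(Constraint c) { constraints.remove(c); }

    /** Example: precedence(a, b) vetoes executing b before a has been completed. */
    static Constraint precedence(String a, String b) {
        return new Constraint() {
            @Override
            public boolean restrictsExecutionOf(String activity, List<InstanceEvent> state) {
                return activity.equals(b)
                        && state.stream().noneMatch(
                                e -> e.activity().equals(a) && e.type().equals("completed"));
            }
            @Override
            public boolean restrictsCompletion(List<InstanceEvent> state) {
                return false;                      // precedence never blocks completion
            }
        };
    }
}
```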
Whenever a constraint should be added to the process model, it is then sufficient
to add this constraint to the set of constraints to be checked. Similarly, when removing a constraint, the workflow engine does not consider the respective constraint
anymore. While such an approach allows for efficient schema evolution, it does not
support verification mechanisms as provided in, e.g., the Declare framework [177].
To compensate for this shortcoming, TDMS provides an interface to integrate third
party tools for verification, as detailed in the following.
2 Layout operations, for instance, can be ignored here as they do not change the semantics of the process model.
3 http://java.sun.com
4.6.2 Support for Empirical Evaluation, Execution and Verification
So far, we have described how test cases and declarative process models can be
created in TDMS. In the following, we discuss how empirical research as well as the execution and verification of declarative process models are supported.
In particular, we describe how TDMS makes use of CEP’s components for empirical
research and integrates the Declare framework [177] for workflow execution and
process model verification, as illustrated in Figure 4.13 and detailed in the following.
Figure 4.13: Interplay of TDMS, CEP and the Declare framework [300] (test cases and process model created in TDMS, which builds on the Cheetah Experimental Platform, are exported and deployed to the Declare framework's workflow engine; process instances are then executed via the Declare worklist client)
Cheetah Experimental Platform as Basis
One of the design goals of TDMS was to make it amenable for empirical research,
i.e., it should be easy to employ it in experiments and case studies. In addition, data
should be easy to collect and analyze. For this purpose, TDMS was implemented
as an experimental workflow activity of CEP, allowing TDMS to be integrated in
any experimental workflow, i.e., a sequence of activities performed during an experiment, cf. Section 3.3. Furthermore, we used CEP to instrument TDMS, i.e.,
to log each relevant user interaction to a central data storage. This logging mechanism, in combination with CEP’s replay feature, allows the researcher to inspect
step–by–step how TDMS is used to create process models and test cases. Or, even
more sophisticated, such a fine–grained instrumentation allows researchers and practitioners to closely monitor the process of process modeling, i.e., the creation of the
process model, using Modeling Phase Diagrams [192].
To support the step–wise replay of modeling sessions, any relevant user–interaction
in TDMS is supported through a command (cf. Command Pattern [262]). As illustrated in Figure 4.14, any command in TDMS inherits from Abstract Command,
which defines the basic operations for replay: to execute a command and to undo a
Figure 4.14: Command class hierarchy (Abstract Command with execute() and undo(); subclasses such as Create Activity Command, Move Activity Command, Rename Activity Command, Create Response Constraint Command, Create Precedence Constraint Command and Delete Constraint Command)
command. Subclasses extend this behavior accordingly, e.g., Create Activity Command defines the behavior for creating an activity. Similarly, Create Precedence
Constraint Command provides functionality for specifying a precedence constraint.
Consequently, any modeling session can be represented as a sequence of commands.
When conducting a modeling session in TDMS, CEP automatically stores the executed commands. When analyzing the modeling session, in turn, commands are
restored and executed, allowing the researcher to revisit the modeling session step–by–step.
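The hierarchy of Figure 4.14 follows the classical Command pattern. The condensed Java sketch below illustrates how executed commands can be logged and undone for step–wise replay; only the names follow the figure, the class bodies are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustrative sketch of command-based replay (cf. Figure 4.14); not the actual TDMS/CEP code.
abstract class AbstractCommand {
    abstract void execute();   // apply the user interaction to the model
    abstract void undo();      // revert it, enabling step-wise replay in both directions
}

final class CreateActivityCommand extends AbstractCommand {
    private final List<String> activities;
    private final String name;

    CreateActivityCommand(List<String> activities, String name) {
        this.activities = activities;
        this.name = name;
    }

    @Override void execute() { activities.add(name); }
    @Override void undo()    { activities.remove(name); }
}

final class ModelingSession {
    private final Deque<AbstractCommand> log = new ArrayDeque<>();

    /** Every relevant user interaction is executed and stored for later replay. */
    void perform(AbstractCommand command) {
        command.execute();
        log.push(command);
    }

    /** Replay support: undo the most recent command. */
    void stepBack() {
        if (!log.isEmpty()) log.pop().undo();
    }
}
```

A modeling session is then simply the ordered list of stored commands, which can be re-executed step by step during analysis.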
Process Model Verification and Execution
As discussed, the internal workflow engine of TDMS does not support the verification of declarative process models. However, it is known that the combination of
constraints may lead to activities that cannot be executed [175]. In order to ensure that
the process model is free from such dead activities, we make use of the verification
provided in the Declare framework [177]. In particular, as illustrated in Figure 4.13,
the process model is iteratively created in TDMS. For the purpose of verification,
the process model is then converted into a format that can be read by the Declare
framework. Similarly, this export mechanism can be used to execute the process
model in the Declare framework’s workflow engine.
4.6.3 Example
To demonstrate how TDMS can be used to drive the creation of a declarative business process model including test cases, we again turn to the example discussed in
Section 4.5.3. Recall that the example shows how the proposed approach may be
used by a DE and MB to create a process model and test cases describing the process
of writing a publication (cf. Figures 4.15–4.17). Again, we make use of the following
abbreviations:
I: Come up with Idea
W: Write
S: Submit Paper
Initially, TDMS starts up with an empty process model and an empty test case.
Following the process described in Figure 4.6, the DE describes a general version of
the business process: “When writing a publication, you need to have an idea. Then
you write the publication and submit it.”. In the following, possibly with help of
the MB, the DE inserts activities I, W and S in the test case’s execution trace (cf.
Figure 4.15). The user interface of TDMS thereby allows the user to create the activities in the test case editor (left hand side); respective activities are automatically created
in the process model and laid out accordingly (right hand side). In addition, after
each test–case–relevant user interaction, e.g., adding an activity to the test case's
execution trace, TDMS automatically validates the test case against the process
model. As TDMS has automatically created activities I, W and S, the test case
passes.
Figure 4.15: Test case 1: <I, W, S > proposed by the DE
Subsequently, the DE and MB engage in a dialogue of questioning and answering [105]—the MB challenges the model: “So can I submit the paper several times?”.
“You should submit the paper, but, at most once!”, the DE replies and adds: “And
you should only have a single idea—otherwise the reader gets confused.”. To capture these requirements, the MB adapts the test case accordingly. In particular, the
MB adds execution assertions to restrict the execution of I and S, specifying that I
and S cannot be executed more than once. In addition, the MB adds a completion
assertion to specify that the process cannot be completed until I and S have been
executed, thereby requiring that I and S are executed at least once (cf. Figure 4.16).
TDMS immediately validates the changes and reports that the test case fails, as I and S can be executed arbitrarily often. Since DE and MB know that the
test case is valid, the process model has to be adapted to resolve this situation (cf.
Figure 4.6). In particular, the MB adds cardinality constraints on I and S, making
the test case pass, as shown in Figure 4.16.
Figure 4.16: Test case 1: introduction of cardinality on I and S
Again, the MB challenges the model and asks: “Is it possible to submit an idea
without paper?”. The DE replies: “No, you always need a written document.” and
together they specify a second test case that ensures that S cannot be executed
without at least one execution of W before (cf. Figure 4.17). Again, TDMS immediately validates the test case and reports that the test case failed, as there are no
constraints restricting the interplay of W and S. Also in this situation DE and MB
know that the test case is valid, hence the MB needs to adapt the process model. In
particular, the MB slightly changes the layout of the process model to introduce a
precedence constraint between W and S. TDMS, in turn, reacts to this change and
validates the test case. As shown in Figure 4.17, test case and process model are
now consistent, hence the test cases pass.
Figure 4.17: Test case 2: introduction of precedence between W and S
In this section, we described how test cases and process modeling—particularly their interwoven creation—are supported by TDMS. By means of examples, we showed
how TDM could be adopted for driving a modeling session. In the following, we look
into the application of TDM to investigate whether test cases are indeed an adequate
measure for improving the creation, understanding and maintenance of declarative
process models.
4.7 The Influence of TDM on Model Creation: A Case Study
To study the influence of TDM on the creation of declarative process models and
to analyze the communication behavior between DE and MB, we apply TDM in
modeling sessions. In particular, we study the application of TDM in a case study,
i.e., we conduct “an empirical inquiry that investigates a contemporary phenomenon
within its real–life context” [288]. More specifically, we conduct a case study in which
a MB, who was trained in TDM, uses TDMS to capture business processes. Starting
with the definition of research questions to be addressed, we describe the design of
the case study in Section 4.7.1. Subsequently, Section 4.7.2 discusses the previously defined research questions in the light of the collected data. Finally, Section 4.7.3
presents potential limitations and revisits findings for a discussion.
The case study is organized along five research questions (RQ1 to RQ5 ), as described in the following. Please recall that TDM was developed to support the
creation of declarative business process models, hence the research questions focus
on declarative business process modeling. In particular, TDM assumes that test
cases provide an additional communication channel between DE and MB (cf. Figure 4.5). To examine whether test cases are indeed as intuitive as expected and
accessible to DEs, in RQ1 we investigate whether test cases are accepted by DEs as
communication channel.
Research Question RQ1: Are test cases accepted as communication channel by DEs?
In TDM, the process model as well as test cases are available during modeling.
Assuming that RQ1 can be answered positively, i.e., test cases are actually accepted
and used as communication channel, it is still not clear whether test cases are actually
better suited for communication than the process model itself. In particular, TDM
claims that test cases are easier to understand for the DE and hence more likely to be used for communication than the process model. The goal of RQ2 is to find out
whether test cases are indeed favored over the process model, as suggested by TDM.
Research Question RQ2: Are test cases favored over the process model as communication channel?
The ultimate goal of any design science artifact should be to improve over the state
of the art [100]. With respect to the communication between DE and MB, TDM
aims to improve over the state of the art by providing a common basis for discussion
(cf. Section 4.5.3). The goal of RQ3 is to investigate whether the adoption of TDM in
this sense positively influences the communication behavior.
Research Question RQ3: Do test cases help to foster the communication between DE and MB?
Furthermore, TDM claims that the specification of test cases can also be achieved
by a DE. In other words, the user interface provided by TDMS must be designed
so it can be operated by a DE. The goal of RQ4 is therefore to assess whether DEs
indeed think that they are capable of specifying test cases, i.e., whether DEs think
that operating TDMS is easy.
Research Question RQ4: Do DEs think that operating TDMS is easy?
So far, RQ1 to RQ4 covered positive aspects of TDM. However, it must be
assumed that the creation of test cases entails costs, i.e., the MB has to invest
additional time in the creation of test cases. Thus, the goal of RQ5 is to investigate
the additional costs implied by TDM.
Research Question RQ5: What is the overhead associated with the specification of test cases?
4.7.1 Definition and Planning of the Case Study
In the following, we describe the specification of the case study. In particular, we
briefly describe the research methodology on which the design of the case study is based. Subsequently, we turn toward the design of the case study to show how
RQ1 to RQ5 were operationalized.
Case Study Methodology
The modeling methodology, as proposed by TDM, can be seen as a collaborative
approach. It assumes that the process model is created in an iterative process that
requires intense communication between MB and DE. To investigate the communication, we follow the CoPrA approach for the analysis of collaborative modeling
sessions [223]. As illustrated in Figure 4.18, the CoPrA approach consists of three
phases. First, in the data collection phase, the research question and research design
are fixed. Based on the design, data is collected, e.g., by recording communication
protocols. In the context of this work, communication protocols refer to the conversation that takes place between DE and MB. To make the protocols amenable for
analysis, transcription of the recorded audio files is necessary (cf. [64]). Second, in
the data preparation phase, transcribed data is coded according to a coding schema,
which was fixed during the data collection phase. This coding is required to tackle
the problem of “attractive nuisance” [146], i.e., the enormous amounts of data produced in case studies. By breaking down communication protocols to codes, the
complexity of data is reduced and becomes amenable for analysis. Third, in the
last phase, characteristics such as the distribution of codes can be analyzed. The
strength of the CoPrA approach arises from coding communication protocols in a
format that supports the storage of temporal information and makes the protocol amenable to analysis with third–party tools. To this end, as shown in Figure 4.18,
CoPrA makes use of Audittrail Entries, as defined in the well–established process
mining tool PROM [258]. In particular, each Audittrail Entry contains an entry
Workflow Model Element that refers to the code, whereas entry Timestamp stores
the timestamp at which the code was identified and entry Originator refers to the
person the code can be attributed to. In this sense, a transcript obtained in a
modeling session can be represented by storing a list of Audittrail Entries. In this
work, for instance, we analyzed how often the DE and MB referred to test cases
and looked into the temporal distribution of codes. For a detailed description of the
CoPrA approach, we refer the interested reader to [223].
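In code, an Audittrail Entry can be pictured as a simple value object that carries the three entries described above. The record below is a hypothetical illustration and not the actual data structure used by PROM or CoPrA.

```java
import java.time.Instant;

// Hypothetical value object for one coded statement; not the actual PROM Audittrail Entry class.
record AudittrailEntry(
        String workflowModelElement,   // the code, e.g., "ask_test_case"
        Instant timestamp,             // when the coded statement occurred
        String originator) {           // who it is attributed to, e.g., "DE" or "MB"
}

// A coded transcript of a modeling session is then simply an ordered list of such entries, e.g.:
// List<AudittrailEntry> transcript =
//         List.of(new AudittrailEntry("ask_test_case", Instant.now(), "MB"), ...);
```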
Case Study Design
Based on research questions RQ1 to RQ5, the design of the case study was elaborated, resulting in a three–phased process; the population of this case study comprises all MBs and DEs who work with declarative process models. In the first phase,
demographic data, such as age, familiarity with computers and experience in process
modeling are collected. TDM assumes that the DE knows the domain very well, but
is not trained in process modeling. Hence, these assessments are required to ensure
that the DEs participating in the case study comply with this profile. In the second
phase, the modeling sessions take place. For half of the subjects, a MB trained
in TDM leads the modeling session. For the other half of the subjects, the MB
conducts the modeling session using a declarative modeling editor only. During the
modeling session, three data capturing mechanisms are used. To capture communication, audio and video data is recorded (cf. RQ1 to RQ3 ). In addition, TDMS is
employed to gather the created process models and test cases. The collected process
Figure 4.18: Analysis technique for collaboration processes, adapted from [223] (phases Data Collection, Data Preparation and Data Analysis; coded statements are exported as Audittrail Entries consisting of Workflow Model Element, Timestamp and Originator)
models can then be compared to investigate the additional effort for creating test
cases (cf. RQ5 ). To ensure that the results are not biased by unfamiliarity with the
usage of TDMS, it is operated by the MB only. In the third phase, the Perceived
Ease of Use scale of the Technology Acceptance Model (TAM) [47, 50] is presented
to the DE in order to investigate RQ4 (TDMS is easy to use).
To address research questions RQ1 to RQ3 , we developed a coding schema for
coding the transcribed communication logs. In particular, we use a subset of the
negotiation patterns [210] to describe the communication between DE, MB, test
cases and process model. As summarized in Table 4.1, we differentiate between
asking questions (ask ), answering questions (clarify), proposing changes (propose),
expressing consent to a proposal (support) and modeling (model). Orthogonally, we distinguish whether MB and DE refer to the process model when talking (process model),
refer to a test case when talking (test case) or just talk freely without
referring to the process model or a test case (domain). To assess whether DEs take
into account formal properties of the process model, we added code ask notation
to code situations where the DE asks about the modeling notation. For answering
questions about the modeling notation, we use clarify notation.
Category: Ask

ask domain (p,q): Person p states a question q without referring to a test case or the process model. Example: "So you need to connect the mast with the sailing ship?"

ask notation (p,q): Person p states a question q regarding the notation. Example: "Is this a precondition?"

ask process model (p,q): Person p states a question q and refers to the process model. Example: "This one too?" (points at process model)

ask test case (p,q): Person p states a question q and refers to a test case. Example: "Then we start here." (points at test case)

Category: Clarify

clarify domain (p,q,a): Person p gives answer a to question q without referring to a test case or the process model. Example: "Yes, at least once and at most ten times."

clarify notation (p,q,a): Person p gives answer a to question q regarding the notation. Example: "This arrow indicates a precondition."

clarify process model (p,q,a): Person p gives answer a to question q and refers to the process model. Example: "Here, we are currently..." (points at process model)

clarify test case (p,q,a): Person p gives answer a to question q and refers to a test case. Example: "This happens up here, just before..." (points at test case)

Category: Propose/Support

propose domain (p,pr): Person p makes proposal pr without referring to a test case or the process model. Example: "Then, I have to contact the agency."

propose process model (p,pr): Person p makes proposal pr and refers to the process model. Example: "Here, you have to unwrap the mainsail." (points to process model)

propose test case (p,pr): Person p makes proposal pr and refers to a test case. Example: "Then, here, we went on with..." (points to test case)

support (p,pr): Person p expresses consent to proposal pr. Example: "Yes, right."

Category: Model

model process model (p): Person p adapts the process model. Example: "I just need to draw some lines here..." (adapts process model)

model test case (p): Person p adapts a test case. Example: "So I drop this activity there..." (adapts test case)

Table 4.1: Coding schema
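For the later analysis, the coding schema of Table 4.1 can be represented as a small enumeration, which makes counting codes per category straightforward. The enum below is a hypothetical representation for illustration only; the actual analysis relies on the CoPrA tooling and PROM.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical representation of the coding schema of Table 4.1; not part of CoPrA or TDMS.
enum Code {
    ASK_DOMAIN, ASK_NOTATION, ASK_PROCESS_MODEL, ASK_TEST_CASE,
    CLARIFY_DOMAIN, CLARIFY_NOTATION, CLARIFY_PROCESS_MODEL, CLARIFY_TEST_CASE,
    PROPOSE_DOMAIN, PROPOSE_PROCESS_MODEL, PROPOSE_TEST_CASE,
    SUPPORT,
    MODEL_PROCESS_MODEL, MODEL_TEST_CASE;

    /** Category prefix, e.g., ASK_TEST_CASE belongs to category "ASK". */
    String category() {
        int i = name().indexOf('_');
        return i < 0 ? name() : name().substring(0, i);
    }

    /** Counts how often each category occurs in a coded transcript. */
    static Map<String, Long> perCategory(List<Code> transcript) {
        return transcript.stream()
                .collect(Collectors.groupingBy(Code::category, Collectors.counting()));
    }
}
```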
For the operationalization of this setup, we rely on the capabilities of TDMS. As
TDMS is implemented as an experimental workflow activity of CEP (cf. Section 3.3),
it can be seamlessly integrated in the experimental workflow, i.e., CEP guides MB
and DE through the modeling sessions. Data is collected automatically, ensuring
that each modeling session, together with the collected demographic data and the TAM survey, is stored as a separate case of the case study.
4.7.2 Performing the Case Study
In the following, we describe how the case study was performed and investigate
research questions RQ1 to RQ5 . Besides the elaboration of the case study design,
the preparatory phase included the configuration of CEP, acquisition of appropriate
                                    Min.   Max.     M     SD
Familiarity with computers            3      5    4.38   0.70
Familiarity with domain               3      5    4.75   0.66
Familiarity with BPM                  1      2    1.13   0.33
Familiarity with declarative BPM      1      2    1.13   0.33

Table 4.2: Demographic data
devices for capturing audio and video and training the MB in TDM. To find potential
DEs, we asked colleagues, friends and relatives whom we knew to have no experience in Business Process Management (BPM). Before the case study was
started, a small pilot study was conducted to ensure that the collected data is amenable to the envisioned analysis. After minor adaptations (e.g., other software
for video capturing), the case study was started. All in all, eight DEs participated
in the study. Each of them was asked to describe a process from a domain they were
familiar with. In four modeling sessions, the TDM methodology was adopted, i.e., in
addition to the process model, test cases were developed, validated and discussed. In
the other four modeling sessions, the MB used a declarative process modeling editor
only, no test cases were provided.4 To ensure that the TDM methodology was
properly adopted and results were not influenced by lacking familiarity with TDMS
or declarative process models, the MB underwent intensive training in declarative
process modeling, applying TDM and using TDMS. During the modeling sessions,
TDMS was operated by the MB only to prevent the DEs' different levels of tool knowledge from influencing the results. Each of these modeling sessions lasted between
19 and 32 minutes, in total 3 hours and 32 minutes of modeling were captured.
Due to the nature of the case study, i.e., one–on–one sessions and the adoption of
communication protocols, an entirely anonymous data collection was not possible.
However, subjects were informed that all analyses are reported in a confidentiality–preserving way.
After the modeling sessions were finished, we validated whether the participating
DEs actually met the targeted profile, i.e., they had to be experts in their domain,
but should not be familiar with BPM. We used a 5–point rating scale to test for
familiarity with computers in general, the domain, BPM as well as declarative BPM.
4 The experimental material as well as collected data can be downloaded from: http://bpm.q-e.at/experiment/ImpactOnCommunication
Code                     TDM   Declare
ask domain                45     109
ask notation               0       1
ask process model          5       9
ask test case             34       0
clarify domain            63     110
clarify notation           0       3
clarify process model      4      18
clarify test case         34       0
propose domain           113      95
propose process model      0      30
propose test case         26       0
support                   57      48
model process model       65     170
model test case           61       0
Total                    507     593

Table 4.3: Total codes used
The scale ranged from Disagree (1) over Neutral (3) to Agree (5).5 As summarized
in Table 4.2, the case study’s subjects fit the targeted profile of a DE. In particular,
familiarity with the domain was high (M = 4.75, SD = 0.66), while familiarity with
BPM in general (M = 1.13, SD = 0.33) and declarative BPM (M = 1.13, SD = 0.33)
was low. Involved domains were quite diverse and included, e.g., tax auditing,
creation of class schedules, sailing and the renovation of buildings. In addition,
we assessed the age of the participants. Two DEs were between 18 and 29, one DE
between 30 and 45, two DEs between 45 and 60 and three DEs were over 60. Finally,
we also assessed the familiarity with computers. As summarized in Table 4.2, all
participants indicated strong accordance, ruling out that subjects were unfamiliar
with computers.
In the following, the recorded communication protocols of all sessions were transcribed, resulting in a text document with 26,299 words uttered in 1,267 statements.6
5 The study was conducted with native German speakers, hence scale names were translated to avoid misunderstandings. The original scale names were: stimmt nicht, stimmt wenig, stimmt mittelmäßig, stimmt ziemlich and stimmt sehr.
6 We are indebted to Cornelia Haisjackl for supporting the transcription of protocols.
Subsequently, the codes from Table 4.1 were used to code the transcripts. Due to personnel limitations, the coding process was carried out by a single person, i.e., the author, only. As indicated in Table 4.1, not all codes could be identified by looking
author, only. As indicated in Table 4.1, not all codes could be identified by looking
at the transcripts only. For instance, for properly identifying code ask process model,
the coder has to know whether the person referred to the process model. For this
reason, also the video files of the modeling sessions were taken into account while coding. To get an overview of the coding, a summary is listed in Table 4.3. In general, it
can be said that the amount of codes used in total is similar, i.e., 507 for TDM, 593
for Declare modeling. However, the distribution of codes shows differences. Unsurprisingly, codes that refer to test cases were not used for the transcripts of Declare
modeling sessions. Interestingly enough, fewer ask and clarify statements occurred in TDM
sessions (84 versus 119 ask, 101 versus 131 clarify). Please note that the amount
of ask statements does not necessarily need to coincide with the amount of clarify
statements. In fact, several clarify statements may follow an ask statement. For
instance, in some cases a question was picked up later and clarified in more depth.
A further interesting relation can be found between the number of times questions
about the notation were posed (ask notation) and the number of times statements
referred to the process model. 66 statements referred to the process model, but only
once did a DE utter a question about the notation. Knowing that the participating
DEs did not have any experience in process modeling (cf. Table 4.2), this finding is
surprising. In the following, we will first discuss the research questions in the light of
the collected data and then pick one particular case to illustrate a typical modeling
session.
RQ1 : Are Test Cases Accepted as Communication Channel by DEs?
One of the basic claims of TDM is that test cases provide an additional communication channel between DE and MB by providing a common discussion basis. To
investigate this claim, in particular to verify whether test cases are indeed adopted
for communication, we counted how often MB and DE referred to a test case during
the modeling session. Only those statements that unmistakably referred to a test
case were counted. For the identification of such statements two criteria were used.
If the test case was explicitly mentioned, e.g., “So, now we are talking about the
positive case”, we counted the statement as test case related. If this was not the
case and the transcript did not reveal whether the discussion was revolving around a
test case, we consulted the video. Thereby, we checked whether the person pointed
at a test case. If this was the case, the statement was considered to be referring to
a test case.
Considering ask, in total 84 statements were uttered during TDM modeling sessions; 34 (40%) of them referred to a test case. Regarding clarify, in total 101
statements were found, whereby 34 (34%) statements referred to a test case. Finally, regarding propose, a total of 139 statements were found, of which 26 (19%)
referred to a test case. All in all, it can be said that for ask and clarify, a fair share
of the communication was using test cases. Concerning propose, the proportion was
clearly lower. In the following, we would like to provide an explanation for this
phenomenon. During the coding process, we found that DEs preferred to talk freely
about their domain. In other words, their statements were not well–structured and
included aspects that could not be captured in the process model. For instance,
DEs reported from personal experiences, such as how a certain procedure evolved
and improved over time: "Well this took us quite some time, but now, we are pretty
fast!”. For the DE, this information seemed relevant, however, it is not possible
to capture it in the form of a declarative process model. Put differently, it was
not clear a–priori to the DE which information was relevant for the modeling sessions, which is not surprising, as none of the participants was familiar with BPM
(cf. Table 4.2). Speaking in terms of Figure 4.5, the DE requires the MB to filter,
abstract and translate knowledge in a form that can be modeled as test case or process model. Hence, it appears as if test cases cannot replace the abstraction skills
of the MB. Still, 40% of the ask statements and 33% of the clarify statements referred to
test cases. Consequently, we argue that test cases are apparently able to provide an
additional communication channel, preferably for asking questions and clarifying.
In other words, we can positively answer RQ1 : test cases do provide an additional
communication channel. Please note that the observation that test cases cannot replace the abstraction skills of the MB does not contradict TDM. Rather, test cases
should be modeled by DE and MB together. Also, test cases aim at improving the
communication, but are not intended to replace the MB.
RQ2 : Are Test Cases Favored over the Process Model as Communication
Channel?
Even if test cases provide an additional communication channel, it is not clear yet
whether they improve over the process model as a communication channel. TDM claims
that test cases are easier to understand for DEs, hence DEs presumably favor test
cases over process models for communication. To investigate this claim, we look at
the collected data from two perspectives. First, we look at the ratio of test case
based communication versus process model based communication in TDM modeling
sessions. Then, we will compare the communication behavior of TDM and Declare
modeling.
Regarding ask, in TDM modeling sessions 84 statements were uttered. 45 (54%)
were classified as general statements, 5 (6%) referred to the process model, while 34
(40%) referred to a test case. A similar situation can be found for clarify. Here, 101
statements were transcribed, 63 (62%) were classified as general, 4 (4%) referred to
the process model, while 34 (34%) referred to a test case. Finally, for propose, from
a total of 139 statements, 113 (81%) were classified as general, 0 (0%) referred to
the process model, whereas 26 (19%) referred to a test case. Interestingly, for any of
these code categories a similar pattern can be found. Most communication happens
in a general form (54% to 81%), without referring to the process model or a test
case. In addition, a noticeable share of the communication involves test cases (19%
to 40%), while almost no communication is conducted with respect to the process
model (0% to 6%). It is important to note that TDM provides test cases as well as
the process model side–by–side. Hence, even though the idea is to focus on test cases
for communication, such behavior cannot be enforced, i.e., the MB cannot forbid
the DE from talking about the process model. Given this freedom of choice, we conclude
that DEs favor test cases over the process model for communication.
In the following, we complement these findings by comparing TDM with Declare
modeling. With respect to ask, 5 (6%) statements referred to the process model
in TDM, 9 (8%) in Declare modeling. Regarding clarify, 4 (4%) referred to the
process model in TDM, 18 (14%) in Declare modeling. Finally, for propose, 0 (0%)
statements in TDM referred to the process model, in Declare modeling 30 (24%)
could be found. Apparently, supplying test cases seems to distract attention from
the process model, further providing empirical evidence that test cases are favored
over process models. Hence, RQ2 can be answered positively: the collected data
suggests that test cases are favored over the process model in communication.
RQ3 : Do Test Cases Help to Foster the Communication between DE and MB?
To answer RQ3 , we start by defining what fostering communication means and
how it can be measured. In particular, we subscribe to the view of [105] and view
modeling as a “dialogue of questioning and answering”. Indeed, we could observe
exactly this behavior while coding the communication protocols. Initially, the DE
proposes a statement, e.g., by posing a general statement about the domain. If
the statement is clear enough, the MB is able to directly reflect it in a test case or
the process model. Otherwise, the MB has to ask the DE for clarification until the
statement is clear enough. Hence, we assume that the fewer ask and, in turn,
clarify statements are required, the more efficient the communication.
To assess whether the adoption of TDM indeed fosters communication, we counted
the number of ask, clarify and propose statements. In TDM sessions, in total 84
ask statements were uttered, in Declare modeling sessions 119. Similarly, in TDM
sessions 101 clarify statements could be observed, in Declare modeling sessions 131
statements. These results suggest that fewer ask and clarify statements were observed
in TDM modeling sessions. Still, the difference could be traced back to a
different total number of statements. Particularly, modeling sessions lasted between
19 and 32 minutes, potentially leading to a different number of total statements.
To cancel out this influence, we also counted the number of propose statements
and computed the relative occurrence of ask and clarify with respect to propose
statements. In TDM modeling sessions, in total 139 propose statements were found,
for Declare modeling sessions 125. Hence, in TDM modeling sessions, 0.60 ask
statements were uttered per propose statement. In Declare modeling, this value
increased to 0.95 times ask per propose. Similarly, 0.73 clarify statements were
found per propose in TDM sessions, 1.05 for Declare modeling. All in all, both the
total amount of ask and clarify statements and the relative amount of ask and
clarify statements per propose are lower in TDM sessions. Hence, we conclude that also RQ3 can be
answered positively: the adoption of TDM appears to have a positive influence on
communication by making communication more precise. We would like to add at
this point that—even though we consider this argumentation plausible—one might
object that the discussion was rather less detailed than more effective. Given the
data at hand, we cannot entirely rule out this alternative explanation.
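To make the normalization above explicit, the following minimal sketch recomputes the relative frequencies from the statement counts reported in this section; the dictionary layout and variable names are ours and serve purely as illustration.

```python
# Statement counts reported for the TDM and Declare modeling sessions.
counts = {
    "TDM":     {"ask": 84,  "clarify": 101, "propose": 139},
    "Declare": {"ask": 119, "clarify": 131, "propose": 125},
}

for approach, c in counts.items():
    # Normalize by the number of propose statements to cancel out session length.
    print(f"{approach}: "
          f"{c['ask'] / c['propose']:.2f} ask per propose, "
          f"{c['clarify'] / c['propose']:.2f} clarify per propose")
```

Running the sketch reproduces the ratios quoted above: 0.60 and 0.73 for TDM, 0.95 and 1.05 for Declare modeling.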
RQ4 : Do DEs Think that Operating TDMS is Easy?
Research question RQ4 is partly related to TDM and partly related to TDMS. TDM
claims that test cases are easier to understand than a process model. Similarly,
TDMS claims that the graphical user interface for test cases is easy to use. To assess
whether DEs would agree with these statements, we administered the Perceived Ease
of Use questionnaire from the Technology Acceptance Model (TAM) [47, 50] to the
participating DEs after the modeling session. The Perceived Ease of Use scale
consists of six 7–point Likert items, ranging from Extremely Likely (1) over Neither
Likely nor Unlikely (4) to Extremely Unlikely (7). On average, the DEs responded
with 1.9, which approximately relates to Quite Likely (2). Hence, we conclude that
the participating DEs find it quite likely that it would be easy to learn and operate
TDMS.
RQ5 : What is the Overhead Associated with the Specification of Test Cases?
Even though research questions RQ1 to RQ4 indicate positive effects of adopting
TDM, it seems inevitable that the specification of test cases implies additional effort. To estimate the resulting overhead, we collected the declarative process models
created in this study and analyzed their size. In particular, as summarized in Table 4.4, we counted the number of activities and constraints for TDM and Declare modeling sessions.

              Total   Per Model      %   Per Minute      %
Activities
  TDM            74       18.50   (87%)        2.84   (91%)
  Declare        85       21.25  (100%)        3.12  (100%)
Constraints
  TDM            92       23.00   (81%)        3.54   (84%)
  Declare       114       28.50  (100%)        4.19  (100%)
Table 4.4: Process model metrics

The numbers clearly indicate that fewer activities (74 versus 85) and fewer constraints (92 versus 114) were created in TDM modeling sessions. To
compensate for the varying duration of modeling sessions, we have also counted the
number of activities and constraints added per minute. Even though the difference
between TDM and Declare becomes smaller, the numbers clearly indicate that the
creation of test cases involves a considerable overhead. In particular, in Declare
modeling sessions 3.12 activities were added per minute, 2.84 activities were added
in TDM sessions, i.e., indicating a drop from 100% to 91%. Considering constraints,
4.19 constraints were added in Declare sessions, while 3.54 constraints were added
in TDM sessions, i.e., dropping from 100% to 84%.
Knowing that in each TDM session on average 2 test cases with 34.5 activities
(total: 8 test cases, 138 activities) were created, it seems plausible that a considerable overhead was involved for the specification of test cases. Nonetheless, we
would like to stress the point that this additional effort apparently helps to improve
communication and presumably provides support for process model maintenance.
TDM in Action: Example of a Modeling Session
So far, we have established that test cases are accepted as communication medium
(RQ1 ), are favored over the process model for communication (RQ2 ), improve communication efficiency (RQ3 ), are easy to understand when visualized through TDMS
(RQ4 ), but also imply additional costs (RQ5 ). In the following, we complement these
findings with insights from a typical TDM modeling session. In particular, we take
a more fine–grained and dynamic point of view and look into how the process of
process modeling evolved during the modeling session (cf. [192]). To this end, we
partition the modeling session into time slots and analyze the occurrence of codes
per time slot. We put 10 codes in each time slot of the modeling session, as we
identified 10 codes per slot as a meaningful level of granularity.
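To make the slicing procedure concrete, the following minimal sketch shows how a chronologically ordered list of codes could be partitioned into slots of 10 codes and how occurrences of selected codes could then be counted per slot. The helper functions and the abbreviated code labels are ours and do not reflect the tooling actually used for the analysis.

```python
from collections import Counter

def slice_into_slots(codes, slot_size=10):
    """Partition the chronologically ordered list of codes into slots of slot_size codes."""
    return [codes[i:i + slot_size] for i in range(0, len(codes), slot_size)]

def count_per_slot(slots, codes_of_interest):
    """Count, for each slot, how often each code of interest occurs."""
    return [
        {code: Counter(slot)[code] for code in codes_of_interest}
        for slot in slots
    ]

# Hypothetical excerpt of a coded modeling session (chronological order).
session_codes = [
    "propose_domain", "ask_domain", "model_test_case", "model_test_case",
    "clarify_domain", "propose_test_case", "model_process_model", "ask_test_case",
    "model_test_case", "propose_domain", "model_process_model", "clarify_test_case",
]

slots = slice_into_slots(session_codes, slot_size=10)
print(count_per_slot(slots, ["model_test_case", "model_process_model"]))
```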
Consider, for instance, the diagram illustrated in Figure 4.19. On the x–axis,
the modeling session’s time slots are listed. In this particular case, the modeling
session was divided into 16 time slots. Then, for each time slot the occurrence of
statements model test case and model process model were counted, as can be seen on
the y–axis. Please note that, even though each time slot includes 10 statements, the
number of model test case statements and model process model statements in each
time slot do not add up to 10. This is due to the fact that other statements that
are not shown in this diagram, such as ask domain, also occur in modeling sessions.
Figure 4.19: Modeling of test cases versus modeling of process model (occurrences of model test case and model process model codes, y–axis, per time slot 1–16, x–axis)
Knowing the semantics of the diagram, the following observations can be made
from Figure 4.19. First, the creation of test cases preceded the creation of the process
model, indicating that TDM was indeed adopted for modeling. By looking at the
created process model and the communication transcripts, we could observe that in
this case the DE described a tax inspection. The first test case peak from time slot
2 to time slot 4 related to the description of a positive case, i.e., the case when no
additional tax payment had to be performed. Next, from time slot 5 to time slot 7,
the MB consolidated the test case with the process model, hence mainly the process
model was adapted. From time slot 8 to time slot 9, no modeling activities could be
observed. Rather, the DE and MB elaborated on the business process in general. In
time slot 10, we could see from the communication protocols that the MB and DE
revisited and further refined the first test case. Subsequently, in time slot 11, a new
test case was created—the negative case, where additional tax payments had to be
performed. Necessary adaptations to the process model were conducted in time slot
12. The final peak of test case modeling from time slot 13 to time slot 16 related
to a further refinement of the negative test case. Apparently, no more adaptations
to the process model were required, as no more process modeling activity could be
found.
Even though this analysis shows that TDM is feasible and test cases can be used
to drive the creation of declarative process models, it does not provide insights
into the communication behavior. Therefore, we also analyzed the communication
logs with respect to the communication channel used. In particular, as shown in
Figure 4.20, we differentiate between general conversations about the domain, conversations that refer to a test case as well as communication that refers to the
process model. Unsurprisingly, as discussed in RQ2 , most communication is kept on
a general level. A fair amount of communication refers to test cases, while almost no
communication refers to the process model. Interestingly, however, the distribution
of domain communication and test case communication appears to change as the
modeling evolved. Starting with a high amount of domain communication, i.e., talking in general, communication shifts toward the usage of test cases. To quantify this
observation, we counted the amount of codes referring to talking freely, i.e., codes
* domain, and all codes referring to test cases, i.e., codes * test case, in the first and
second half of the modeling session. Thereby, we could observe that the amount of
codes associated with talking freely decreased from 51 statements to 25 statements,
while the amount of codes referring to test cases increased from 10 statements to
32 statements. Knowing that DEs in our study did not have any prior experience
in business process modeling, it seems likely that DEs first had to get used to the
notion of test cases. As this happened within half of a modeling session, we conclude
that test cases are indeed an intuitive instrument for specifying process models.
Figure 4.20: Utilized communication channels (occurrences of domain, test case and process model codes per time slot 1–16)
4.7.3 Limitations and Discussion
Even though the investigation seems quite promising, the presented results are to
be viewed in the light of limitations. First and foremost, the sample size—even
though not untypical for a case study [61]—is a threat to the generalization of
results. In other words, although the DEs in our study accepted and favored test
cases as communication channel, it cannot be guaranteed that this holds for every
DE. Similarly, we made use of a convenience sample, i.e., we used colleagues, friends
and relatives as DEs. To avoid a potential bias, we did not inform DEs about the
goal of the study and informed them only about necessary details, such as the tasks to be
performed. However, the adopted selection strategy must be considered a potential
bias. Furthermore, the modeling sessions were rather short, on average a modeling
session lasted 26 minutes and 30 seconds. In addition, the reported numbers are of
a descriptive nature, as, due to the low sample size, inferential statistics could not be
applied. Finally, due to personnel limitations, the coding was performed only by a single
person, i.e., the author. This, in turn, must be expected to negatively influence
the accuracy and reliability of the coding [119].
Still, the findings of the case study seem quite positive so far. In particular,
it seems as if test cases are accepted as communication channel (RQ1 ) and are
favored over the process model for communication (RQ2 ). Furthermore, test cases
decreased the amount of ask and clarify statements per propose (RQ3 ). In addition,
DEs indicated that they consider learning to operate TDMS easy (RQ4 ),
but the adoption of TDM requires moderate overhead for the creation of test cases
(RQ5 ). In the following, we will underpin these findings with insights gained during
data analysis. One of the central insights of the modeling sessions concerned the
way DEs structured their information. In general, DEs seemed to prefer talking
about their domain in the form of sequential actions, seen from their perspective.
Indeed, words indicating sequential information, such as “then”, “now” or
“afterwards”, were used 837 times. Interestingly enough, this behavior could be observed across
all modeling sessions. As this behavior occurred for TDM and Declare modeling
sessions, we assume that it is the intuitive way for a DE to talk about a domain.
If this was indeed the case, it would explain why test cases were well accepted:
test cases provide such a sequential, instance–based view on a declarative process
model. Contrariwise, a declarative process model rather provides circumstantial
information. Similar observations were made in an empirical investigation examining
how MBs make sense of declarative process models [97]. In particular, the study
showed that MBs seem to prefer a sequential way of reading a declarative process
model. Against this background, it seems natural that DEs prefer test cases for
communication.
Finally, we would like to come back to the assumption that DEs are normally not
able to read formal process models (cf. [107]). In particular, in 66 cases, either the
DE or the MB referred to the process model during the modeling session, indicating
that DEs might have had formal process modeling knowledge. To clarify whether
this was indeed the case, we looked into all statements that referred to the process
model. The analysis revealed that all but one DE referred to activities, but ignored the
constraints. Hence, it appears that DEs are indeed normally not able to grasp the
formal parts of a process model, but may access superficial information, such as
activity names. Support for this theory is also provided by the fact that none of the
DEs had experience in declarative process modeling (cf. Table 4.2) and only once a
DE uttered a question about the notation (cf. Table 4.3). In other words, it seems
implausible that DEs were able to extract more information from the process model
than activity names.
4.8 The Influence of TDM on Model Maintenance:
Experiments
So far, we have empirically investigated the influence of TDM on the creation of
declarative process models, particularly focusing on the communication between
DE and MB. In the following, we turn toward the maintenance of declarative
process models, i.e., the evolution of process models due to, e.g., external factors
like the introduction of new laws or changes in customer attitude. Methodologically,
we thereby rely on controlled experiments, i.e., conduct modeling sessions in the
laboratory under controlled conditions [13]. In particular, this investigation consists
of two controlled experiments. Based upon the experimental design, as described
in Section 4.8.1, the first experiment, subsequently referred to as E1 , is conducted
with a rather small sample size (cf. Section 4.8.2). Insights from this experiment are
then used for a replication, in the following referred to as R1 , with a larger sample
size (cf. Section 4.8.3).
4.8.1 Experimental Definition and Planning
The goal of this investigation is to provide empirical evidence for the positive influence of test cases on the maintenance of declarative process models. This section
introduces the research questions, hypotheses, describes the subjects, objects, factors, factor levels and response variables required for our experiment and presents
the instrumentation and data collection procedure as well as the experimental design.
Research Questions and Hypotheses
The hypotheses tested in this investigation are directly derived from the theoretical considerations presented in Section 4.4 and Section 4.5. In particular, we have
argued that declarative process models lack computational offloading for the computation of execution traces. In addition, declarative process models provide
circumstantial rather than sequential information. The extraction of sequential information, in turn, was shown to be a hard mental operation. As, however, the validation of
process models presumably requires the MB to mentally execute the process model,
a significant impact on the mental effort can be expected. TDM and test cases in
particular provide computer–based support for the computation of execution traces.
Therefore, we expect that the mental effort required for adapting process models
significantly decreases when test cases are provided. This claim, in turn, directly
leads to research question RQ6 and associated hypothesis H1.7
7 To unambiguously refer to research questions, we assigned consecutive numbers to our research questions. RQ1 to RQ5 were already used in Section 4.7, hence we continue here with research question 6.
Research Question RQ6 Does the adoption of test cases lower the mental effort
on the process modeler conducting the change?
Hypothesis H1 The adoption of test cases significantly lowers the mental effort on
the process modeler conducting the change.
Tightly connected to the mental effort required for adapting declarative process
models is the way the adaptations are perceived, i.e., the perceived quality of
the changes. In particular, it is assumed that the human mind works best if it is engaged at a level appropriate to one’s capacities [161], i.e., the required mental effort
is not too high. Likewise, by avoiding hard mental operations and not overloading the working memory, MBs are presumably more confident with the conducted
changes. Indeed, similar results could be found in the domain of software engineering, where experiments showed that having test cases at hand can improve perceived
quality [133]. Similarly, we expect test cases to improve the perceived quality of the
changes applied to declarative process models, as stated in research question RQ7
and its associated hypothesis H2 :
Research Question RQ7 Does the adoption of test cases improve the perceived
quality of the conducted changes?
Hypothesis H2 The adoption of test cases significantly improves the perceived quality of the conducted changes.
Finally, the availability of test cases should also improve the quality of the conducted changes, since test cases provide an automated way of validating the process
model. In addition, as it is known that errors are more likely to occur when the working
memory’s capacity is overloaded [236], we argue that a reduction of mental effort by
providing test cases ultimately leads to lower error rates, i.e., a higher quality of the evolution.
In particular, we expect a positive influence on the quality of process models (the
operationalization of quality will be explained subsequently), as stated in research
question RQ8 and the associated hypothesis H3 :
Research Question RQ8 Does the adoption of test cases improve the quality of
changes conducted during maintenance?
Hypothesis H3 The adoption of test cases significantly improves the quality of
changes conducted during maintenance.
Subjects
The population under examination are process modelers that create and maintain
declarative process models. Therefore, targeted subjects should be at least moderately familiar with BPM and declarative process modeling notation (Declare in
particular). We are not targeting modelers who are not familiar with declarative
process models at all, since we expect that their unfamiliarity blurs the effect of
adopting test cases, as errors may be traced back to a lack of knowledge rather
than to the complexity of the change task.
Factor and Factor Levels
Our experiment’s factor is the adoption of test cases, i.e., whether test cases are
provided while conducting the changes to the process model or not. Thus, we define
the factor to be adoption of test cases with factor levels test cases as well as absence
of test cases.
Objects
The objects of our study are two change assignments, each one performed on a different declarative process model. Please recall that this section describes the general
design of this study. Hence, aspects specific to experiment E1 and replication R1 ,
such as the exact models that were used, are reported in Section 4.8.2 and Section 4.8.3, respectively. The process models and change assignments were designed
carefully to reach a level of complexity that goes well beyond the complexity of a
“toy–example”. To cancel out the influence of domain knowledge [120], we labeled
the models’ activities with letters (e.g., A to H ). Furthermore, to counteract potential confusion caused by an abundance of different modeling elements, no more than
eight distinct constraints were used per model. By providing test support, factor
level adoption of test cases was operationalized. Likewise, by not providing test
support, factor level absence of test cases was operationalized. Finally, we performed a pilot study to ensure that the process models and change assignments are
of appropriate complexity and are not misleading.
The change assignments consist of a list of requirements, so–called invariants, that
hold for the initial model and must not be violated by the changes conducted. In
addition, it must be determined whether the change to be modeled is consistent with
the invariants. If this is the case, the change has to be performed while ensuring that
all invariants are preserved. If a change assignment is identified to be inconsistent,
a short explanation of the inconsistencies must be provided and the change must
not be applied. An example of a change assignment is illustrated in Figure 4.21 (1).
Assume an invariant that C cannot be executed until A has been executed. Further
assume a change assignment to remove the precedence constraint between A and
B. The invariant is valid for this model as C requires B to be executed before and
B requires A to be executed before—thus C cannot be executed before A has been
executed. The change is consistent, as it does not contradict the invariant. However,
removing the precedence constraint between A and B is not enough. In addition,
a new precedence constraint between A and C has to be introduced to satisfy the
invariant, resulting in the process model shown in Figure 4.21 (2).
Figure 4.21: Example of a change assignment [301] (1: initial model with invariant “C preceded by A” and change task “remove precedence between A and B”; 2: adapted model)
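To make the transitivity argument of this example explicit, the following minimal sketch checks whether a precedence invariant is still entailed purely via chains of precedence constraints, modeling each constraint as a directed edge and the invariant as reachability. The representation is ours, deliberately ignores all other Declare constraint types, and is only meant to illustrate the reasoning, not the tool support.

```python
def precedes(constraints, source, target):
    """True if 'source' reaches 'target' via a chain of precedence constraints."""
    frontier, seen = {source}, set()
    while frontier:
        node = frontier.pop()
        seen.add(node)
        successors = {b for (a, b) in constraints if a == node}
        if target in successors:
            return True
        frontier |= successors - seen
    return False

# Initial model of Figure 4.21 (1): A precedes B, B precedes C.
model = {("A", "B"), ("B", "C")}
print(precedes(model, "A", "C"))                    # True: invariant "C preceded by A" is entailed

# Change task: remove the precedence constraint between A and B.
changed = model - {("A", "B")}
print(precedes(changed, "A", "C"))                  # False: invariant no longer entailed ...
print(precedes(changed | {("A", "C")}, "A", "C"))   # True: ... restored by adding "A precedes C"
```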
Response Variables
To test the hypotheses, we define the following response variables: mental effort
(H1 ), perceived quality (H2 ) and quality of the conducted changes (H3 ). To measure
mental effort, we employ a 7–point rating scale, asking subjects to rate the mental
effort expended for conducting the change tasks from Very low (1) over Medium (4)
to Very high (7). As detailed in Section 3.2.3, employing rating scales for measuring
mental effort was shown to be reliable and is widely adopted. To assess perceived
quality, we ask subjects to self–rate the quality, i.e., correctness, of the conducted
changes. The measurement of quality is derived from the change assignments (cf.
Paragraph Objects). In particular, we define quality to be the sum of preserved
(non–violated) invariants, the number of correctly identified inconsistencies as well
as the number of properly performed changes, i.e., we measure whether the new
requirements were modeled appropriately.
To illustrate this notion of quality, consider the process model shown in Figure 4.21
(1). The modeler must 1) determine that the change is consistent, 2) remove the
precedence constraint between A and B to fulfill the change assignment, as well as 3)
introduce a new precedence constraint between A and C to satisfy the invariant—for
each subtask one point can be awarded, i.e., at most 3 points per change assignment.
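The scoring rule can be summarized in a few lines. The following sketch assumes that each change assignment is recorded as three boolean subtask results; this data layout is ours and is used purely to illustrate the measure defined above.

```python
def score_assignment(consistency_correct, change_applied_correctly, invariants_preserved):
    """Quality score for one change assignment: one point per correctly handled subtask."""
    return sum([consistency_correct, change_applied_correctly, invariants_preserved])

# Example from Figure 4.21: consistency correctly identified, precedence A-B removed,
# and a new precedence A-C introduced so that the invariant still holds.
print(score_assignment(True, True, True))   # 3 (maximum per change assignment)
print(score_assignment(True, True, False))  # 2 (invariant violated)
```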
Experimental Design
The experimental design employed in this study is based on the guidelines for designing experiments from [286]. Following these guidelines, a randomized balanced
single factor experiment is conducted with repeated measurements. The experiment
is called randomized, since subjects are assigned to groups randomly. We denote the
experiment as balanced as each factor level (i.e., the adoption of test cases and the
absence of test cases) is applied to the same number of subjects. As only a single
factor is manipulated (i.e., the adoption of test cases), the design is called single factor. To operationalize this setup, we employ the experimental workflow capability
of CEP (cf. Section 3.3). In particular, as shown in Figure 4.22, the workflow starts
by asking the subject to enter a code, which is printed on the assignment sheet given
to the subject. In this way, the experimenters can steer how many subjects will take the
upper branch in the experimental workflow or take the lower branch of the experimental workflow, thereby achieving repeated measurement. Likewise, by randomly
distributing assignment sheets, randomization is achieved. After entering a valid
code, i.e., 3482 or 7198, the subject is prompted to answer a demographical survey
in order to assess the subject’s background. In addition, subjects are informed that
participation is non–mandatory as well as anonymous, so that no personal information is stored and answers cannot be traced back to subjects. Then, subjects
are asked to adapt two process models (M1 and M2 ) according to a given list of
requirements. As shown in Figure 4.22, 10 change tasks have to be performed for
each model. In case the subject enters code 3482, M1 is presented with test
support, while M2 has to be adapted without test support. Contrariwise, for code
7198, M1 has to be adapted without test support, while test support is available
for M2 . Subsequently, regardless of the code, a concluding survey, assessing mental
effort and perceived quality is administered.
Figure 4.22: Experimental design (entry code 3482: M1 with TDMS, M2 with Declare; entry code 7198: M1 with Declare, M2 with TDMS; 10 change tasks per model, framed by a demographic and a concluding survey)
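The counterbalancing described above can be sketched in a few lines of code. The mapping below mirrors the two entry codes from the text, while the function, the data structures and the step labels are ours and do not reflect CEP's actual implementation.

```python
# Entry code on the assignment sheet determines which model is maintained with test support.
CONDITIONS = {
    "3482": [("M1", "with test cases"), ("M2", "without test cases")],
    "7198": [("M1", "without test cases"), ("M2", "with test cases")],
}

def session_plan(entry_code, n_tasks=10):
    """Return the ordered steps of the experimental workflow for a given entry code."""
    if entry_code not in CONDITIONS:
        raise ValueError("unknown code on assignment sheet")
    steps = ["demographic survey"]
    steps += [f"{model}: {n_tasks} change tasks {treatment}"
              for model, treatment in CONDITIONS[entry_code]]
    steps.append("concluding survey")
    return steps

print(session_plan("3482"))
print(session_plan("7198"))
```

Distributing the two codes evenly across randomly handed-out assignment sheets yields the randomized, balanced design with repeated measurement described above.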
Instrumentation and Data Collection Procedure
Similar to the case study described in Section 4.7, we rely on CEP for non–intrusive
data collection. This, as detailed in Section 4.6, allows us to investigate the maintenance tasks in detail by replaying the logged commands step–by–step. In addition,
CEP provides the bare essentials for conducting the change tasks, allowing subjects
to quickly become familiar with the modeling tool. Likewise, it can be ensured that
subjects are not distracted by non–task relevant tool features. Finally, as described
in Section 3.3, CEP supports the automated collection of data, thereby simplifying
the execution of the experiment.
4.8.2 Performing the Experiment (E1 )
Based upon the experimental setup described in Section 4.8.1, the first controlled
experiment (E1 ) was conducted. In the following, we cover aspects regarding the
operation of the experiment, followed by data analysis.
Experimental Operation of E1
Experimental Preparation Preparation for controlled experiment E1 can be divided into the preparation of experimental material as well as the acquisition and
training of subjects. As described in Section 4.8.1, the process models used in this
study should have a reasonable complexity; likewise, the change tasks should not be
trivial. In particular, two models with 7 activities each were prepared for the experiment. In M1 , 7 constraints were used, whereas M2 comprised 8 constraints.8
Please note that these models, even though a seemingly low number of activities and constraints was used, can pose a significant challenge to understanding. In particular,
it was observed that already the combination of two different constraints can be a
challenging task [97]. Hence, we argue that M1 and M2 go well beyond the size of
trivial process models. To ensure that the assignments were clearly formulated, we
conducted a pilot study before the actual experiment to screen the assignments for
potential problems.
8 The material used for experiment E1 can be downloaded from: http://bpm.q-e.at/experiment/ImpactOnMaintenance
Regarding the training and acquisition of subjects, we could rely on a lecture on
BPM, which was held at the University of Innsbruck. In particular, it was possible
to adapt the lectures such that students were given lectures and assignments on
declarative business process modeling before the experiment. More specifically, a
lecture on declarative process models was held two weeks before the experiment. In
addition, students had to work on several modeling assignments using declarative
processes before the experiment took place. One week before the experiment, the
concept of test cases and their usage was demonstrated.
Experimental Execution Controlled experiment E1 was conducted in December
2010 at the University of Innsbruck in the course of a lecture on BPM with 12
participants. Immediately before the experiment, a short lecture revisiting the most
important concepts of TDM and the experimental setup was held. The rest of the
experiment was guided by CEP’s experimental workflow engine (cf. Section 3.3),
leading students through an initial questionnaire, two modeling tasks (one with
the support of test cases and one without the support of test cases), a concluding
questionnaire and a feedback questionnaire, cf. Figure 4.22. The experiment was
concluded with a discussion to exchange students’ experiences and to revisit the
experiment’s key aspects.
Data Analysis of E1
So far, we focused on the setup and execution of controlled experiment E1 . In the
following, we describe the analysis and interpretation of data.
Data Validation Due to the relatively small number of subjects, we were able to
constantly monitor for potential problems or misunderstandings and to immediately
resolve them. For this reason and owing to CEP’s experimental workflow engine, all
subjects were guided successfully through the experiment and no data set had to be
discarded because of disobeying the experimental setup. In addition, we screened
the subjects for familiarity with Declare [175] (the declarative process modeling
language we used in our models), since our research setup requires subjects to be
at least moderately familiar with Declare. As summarized in Table 4.5, subjects
felt competent in using and understanding Declare, even though they just recently
started using Declare. In particular, the mean value for familiarity with Declare,
on a rating scale from 1 to 7, was 3.17 (slightly below average). For confidence
in understanding Declare models a mean value of 3.92 was reached (approximately
average). For perceived competence in creating Declare models, a mean value of 3.83
(approximately average) could be computed. Also, it can be seen that subjects were
rather new to Declare (on average using it since two weeks) and rather new to BPM
in general (on average 9 months experience). Finally, all subjects indicated that
they were students. Summarizing, we conclude that the subjects are rather new to
Declare but feel competent in using and understanding it, thus fitting the targeted subject profile.
                                     Min.   Max.      M     SD
Familiarity with Declare                1      5   3.17   1.27
Confidence understanding Declare        2      5   3.92   1.00
Competence modeling Declare             2      5   3.83   0.94
Months using Declare                    0      2   0.58   0.79
Years experience in BPM                 0      4   0.75   1.22
Table 4.5: Demographical statistics of E1
Descriptive Analysis To give an overview of the experiment’s data, Table 4.6 shows
minimum, maximum, mean and standard deviation of mental effort, perceived quality and quality. The values shown in Table 4.6 suggest that the adoption of test
cases lowers mental effort, increases perceived quality and increases quality, thus
supporting hypotheses H1 , H2 and H3 . In particular, mental effort for tasks that
were supported with test cases (M = 4.25, SD = 0.97) was lower than for tasks without
test case support (M = 5.75, SD = 0.87). Likewise, perceived quality for
tasks with test cases (M = 6.00, SD = 0.85) was higher than perceived quality for
tasks without test cases (M = 4.33, SD = 1.61). Finally, also quality was higher for
tasks with test case support (M = 22.33, SD = 1.56) than for tasks without test
case support (M = 21.92, SD = 1.07). Please recall that mental effort and perceived
quality were measured on rating scales ranging from 1 (Very low) to 7 (Very high).
Quality, according to the definition of the quality measure in Section 4.8.1 (paragraph Response Variables), was calculated by taking into account whether change
tasks were properly identified as feasible, invariants were obeyed and changes conducted correctly. All in all, 10 change tasks had to be performed per model, of which
5 were feasible. In addition, 8 invariants had to be obeyed, hence in total at most
23 points could be reached per model.
                        N   Min.   Max.      M     SD
Mental effort          24      3      7   5.00   1.18
  with test cases      12      3      6   4.25   0.97
  without test cases   12      4      7   5.75   0.87
Perceived quality      24      2      7   5.17   1.52
  with test cases      12      4      7   6.00   0.85
  without test cases   12      2      7   4.33   1.61
Quality                24     19     23  22.13   1.33
  with test cases      12     19     23  22.33   1.56
  without test cases   12     20     23  21.92   1.07
Table 4.6: Descriptive statistics of E1

So far, these observations are merely based on descriptive statistics. For a more
rigid investigation, the hypotheses will be tested for statistical significance in the following.
Hypotheses Testing of E1
Our sample is relatively small and not normally distributed9, thus we follow guidelines
for analyzing small samples and employ non–parametric tests [179]. In particular, we
used SPSS (Version 21.0) for carrying out the Wilcoxon Signed–Rank Test.10
9 We applied the Kolmogorov–Smirnov Test with Lilliefors significance correction to test for normal distribution. Detailed tables listing results can be found in Appendix A.1.
10 Due to repeated measurements the response variables are not independent, thus the Wilcoxon Signed–Rank Test was chosen.
Hypothesis H1 The mental effort for conducting change tasks with test case support (M = 4.25) was significantly lower than the mental effort for change tasks
that were conducted without test case support (M = 5.75) (Wilcoxon Signed–Rank
Test, Z = −2.57, p = 0.01, r = 0.82).
Hypothesis H2 The perceived quality for conducting change tasks with test case
support (M = 6.00) was significantly higher than the perceived quality for change
tasks that were conducted without test case support (M = 4.33) (Wilcoxon Signed–
Rank Test, Z = −2.83, p = 0.005, r = 0.74).
Hypothesis H3 The quality for conducting change tasks with test case support
(M = 22.33) was not significantly higher than the quality for change tasks that
were conducted without test case support (M = 21.92) (Wilcoxon Signed–Rank
Test, Z = −0.86, p = 0.39, r = 0.25).
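For readers who want to reproduce this kind of paired, non–parametric comparison outside SPSS, the following minimal sketch runs a Wilcoxon Signed–Rank Test with SciPy. The ratings shown are hypothetical placeholders and not the experiment's raw data; the test statistic reported by SciPy is the rank sum W rather than the Z value quoted above.

```python
from scipy import stats

# Hypothetical paired mental-effort ratings, one pair per subject (1 = very low .. 7 = very high).
with_test_cases    = [4, 3, 5, 4, 4, 5, 3, 4, 5, 4, 5, 5]
without_test_cases = [6, 5, 6, 5, 6, 7, 5, 6, 6, 5, 6, 6]

# Paired (within-subject) comparison, as the same subjects worked under both factor levels.
result = stats.wilcoxon(with_test_cases, without_test_cases)
print(f"W = {result.statistic}, p = {result.pvalue:.3f}")
```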
Summing up, the hypotheses for mental effort (H1 ) and perceived quality (H2 ) could
be accepted, while the hypothesis for quality (H3 ) had to be rejected. To assess the
strength of the effects, we consider recommendations by Cohen and regard an effect
size of 0.1 a small effect, an effect size of 0.3 a medium effect and an effect size of 0.5
a large effect [32, 33]. Against this background, the effect regarding mental effort
(H1 , r = 0.82) and perceived quality (H2 , r = 0.74) can be considered to be large.
The effect size regarding quality (H3 , r = 0.25), in turn, is medium and differences
are not statistically significant. Reasons, implications and conclusions are discussed
in the following.
Discussion and Limitations
Based on the obtained results we conclude that the adoption of test cases has a
positive influence on mental effort (H1 ) and perceived quality (H2 ). Especially interesting is the fact that, even though quality could not be improved significantly,
apparently the modelers were more confident that they conducted the changes properly. This effect is also known from software engineering, where test cases improve
perceived quality [14, 133]. Indeed, also the follow–up discussion with the subjects
after the experiment revealed that students with a software development background
noticed this similarity, further substantiating our hypotheses on a qualitative
basis.
Regarding quality (H3 ), no statistically significant differences could be observed.
This raises the question whether there is no impact of test cases on quality at all
or if the missing impact can be explained otherwise. To this end, a detailed look
at the distribution of quality offers a plausible explanation: the overall quality is
very high; on average, the measured quality is 22.13 out of a maximum of 23 (cf.
Table 4.6). Thus, approximately 96% of the questions were answered correctly and
tasks were carried out properly. Put differently, the overall quality leaves almost
no room for improvements when adopting test cases—in fact, the sample’s standard
deviation is very low (1.33). Since data “points toward” the positive influence of test
cases on quality (i.e., the mean value is higher, cf. Table 4.6) and due to the low
variance it seems reasonable to assume that a positive correlation exists, however,
the overall high quality blurs expected effects. In other words, we believe that the
ceiling effect [263] was in place, i.e., best performers could not use their full potential,
as the rather low complexity of tasks introduced an artificial ceiling. Thus, in turn,
it seems plausible that by increasing the complexity of change tasks, the positive
influence of test cases can be shown.
As explained in Section 4.8.1, for the evaluation of quality we had to take a close
look at the process models. Thereby, we could observe that the availability of test
cases changed the behavior of the subjects. In particular, subjects who did not have
test cases at hand seemed to be reluctant to change the process model, i.e., tried
to perform the change tasks with a minimum number of change operations. To
quantify and investigate this effect in detail, we counted how many constraints were
created, how many constraints were deleted and how many constraints were adapted.
The results, as listed in Table 4.7, reveal that modelers having test cases at hand
approximately changed twice as many constraints. More specifically, subjects who
had test cases at hand performed on average 32.33 change operations, while subjects without test cases conducted 18.58 change operations. The applied Wilcoxon
Signed–Rank Test (Z = −2.71, p = 0.007, r = 0.78) indicates statistically significant
differences and a large effect size (cf. [32, 33]). Again, we would like to provide a
possible explanation that is inspired by insights from software engineering. In
particular, it was argued that test cases in software engineering improve the developer’s confidence in the source code, in turn increasing the developer’s willingness
to change the software [14, 15]. This experiment’s data supports the assumption
that a similar effect can be observed when declarative process models are combined
with test cases. In fact, significantly more change operations were performed by
modelers who had test cases at hand. Still, the quality did not decrease even though
approximately twice as many change operations were performed. Thus, we conclude
that test cases indeed provide a safety net, thereby increasing willingness to change
while not impacting quality in a negative way.
                        N   Min.   Max.      M     SD
Created constraints    24      5     34  13.04   6.36
  with test cases      12     10     34  16.25   7.49
  without test cases   12      5     13   9.83   2.29
Deleted constraints    24      5     32  11.79   6.36
  with test cases      12      9     32  15.17   7.48
  without test cases   12      5     12   8.42   1.93
Adapted constraints    24      0      7   0.63   1.53
  with test cases      12      0      7   0.92   2.11
  without test cases   12      0      1   0.33   0.49
Total changes          24     11     73  25.46  13.90
  with test cases      12     19     73  32.33  16.91
  without test cases   12     11     24  18.58   3.90
Table 4.7: Performed change operations
Summing up, controlled experiment E1 shows on the basis of empirical data that
the adoption of test cases has a positive influence on the mental effort, perceived
quality and the willingness to change. Regarding the impact on quality, however,
the data did not yield any statistically significant results. A closer look at the
data suggests that these effects were blurred by the overall high quality (cf. ceiling
effect [263]). Although the results sound very promising, it should be mentioned that
the small sample size (12 subjects) constitutes a threat to external validity, i.e., it is
questionable to what extent the results can be generalized. Regarding the generalization
of results, also the process models and change tasks used in E1 are of relevance.
In particular, 10 change tasks were performed on 2 rather small process models,
thereby further limiting the generalization of results. To address these problems, we
have conducted a replication of experiment E1 , particularly focusing on the issues
of quality, sample size and diversity of models. This replication, referred to as R1 ,
is described in the following.
4.8.3 Performing the Replication (R1 )
Replication R1 mainly follows the same experimental design as employed for experiment E1 , except for some adaptations required to address open issues of experiment
E1 . In the following, we describe the experimental design of R1 , before we turn
toward the execution and analysis of R1 .
Experimental Design
To address shortcomings of E1 , we adapted the experimental design, as follows.
First, the experimental workflow and assignments were slightly adapted to minimize
potential confusion. Second, to address the lack of significant differences with respect
to quality in E1 , more complex models were used in R1 , as detailed in the following.11
11 The material used for replication R1 can be downloaded from: http://bpm.q-e.at/experiment/TDMReplication
Adaptations to the Experimental Workflow In experiment E1 , we could observe
two main problems related to the execution of the experiment. First, in E1 , we
asked subjects to perform a series of change tasks to the same process model. The
change tasks were designed such that they could be performed independently of each
other and in arbitrary order. Still, subjects had to keep track of which change tasks
they had performed so far. Even though most subjects were not distracted by this
procedure, as indicated by the high amount of correctly performed change tasks (cf.
Table 4.6), it required the experimenters to repeatedly clarify the experimental setup
to subjects. Second, in E1 we asked subjects to determine whether a change task was
feasible and if this was the case, to perform the change task. Also this procedure
caused confusion among subjects—apparently the setting was not as intuitive as
intended.
To address these problems, we slightly adapted the experimental workflow. In
particular, instead of asking subjects to apply changes to the same process model,
we separated the change tasks from each other. In this way, a new process model
was presented for each change task. A total of 14 change tasks—7 with test case
support, 7 without test case support—had to be performed, requiring 14 different
process models. To ensure that the variance of the process models’ complexity was
kept low, we designed 3 process models and varied the labeling of activities so that
we arrived at 14 process model variations. Besides this separation of change tasks, in
R1 we did not provide change tasks that could not be performed, i.e., change tasks
that were inconsistent with the process model and its invariants. By simplifying
the change task assignments, we could also simplify the calculation of the quality
measure. In particular, we rewarded each correctly performed change task with 1
point—allowing subjects to reach a maximum of 7 points per model.
Adaptations to the Experimental Material As discussed in Section 4.8.2, no significant impact of test cases on the quality of changes could be found. However,
we argued that the lack of differences could be traced back to ceiling effects [263] and
that more complex change tasks would compensate for this shortcoming. To test
this assumption, we increased the complexity of the process models used in R1 .
Particularly, we increased the number of activities from 7 activities in E1 to 16 activities per model in R1 . Similarly, we increased the number of constraints from 7/8
constraints per process model in E1 to 18 constraints in R1 .
Experimental Operation of R1
Experimental Preparation As described above, we slightly adapted the experimental workflow for R1 and created more complex process models. Analogous to
E1 , we conducted a pilot study to ensure that the material and instructions were
comprehensible. To accomplish a larger sample size for R1 , we could rely on an ongoing collaboration with Prof. Reichert from the University of Ulm. In particular,
Prof. Reichert agreed to present TDM and to conduct replication R1 in the course
of a lecture on Business Process Management. Similar to E1 , lectures and assignments on declarative process modeling in general and TDM in particular ensured
that subjects were trained sufficiently before the replication.
Experimental Execution Replication R1 was conducted in December 2011 at the
University of Ulm in the course of a lecture on BPM with 31 participants. As in
E1 , the experiment was guided by CEP’s experimental workflow engine, leading
subjects through an initial questionnaire, 14 change tasks (7 with the support of
test cases and 7 without the support of test cases), a concluding questionnaire and
a feedback questionnaire. Even though in R1 more than twice as many subjects
as in E1 participated, no major problems during the execution of the replication
occurred.
Data Analysis of R1
Up to now, we have described differences between E1 and R1 as well as the execution
of R1 . In the following, we turn toward the analysis and interpretation of data
collected in R1 .
Data Validation Analogous to E1 , subjects were guided using CEP’s experimental
workflow engine. Again, all subjects were successfully guided through the experiment, no data set had to be discarded because of disobeying the experimental setup.
Also, we screened subjects for familiarity with Declare. As summarized in Table 4.8,
the samples of E1 and R1 are similar with respect to their knowledge of Declare.
Even though the values of the sample from R1 are slightly lower, the differences are not
statistically significant. In particular, neither differences with respect to familiarity
with Declare (Mann–Whitney U Test, U = 164.5, p = 0.565), nor differences with respect to understanding Declare (Mann–Whitney U Test, U = 164.0, p = 0.565), nor
differences with respect to competence using Declare (Mann–Whitney U Test, U =
138.0, p = 0.201) are statistically significant. Similar to E1 , 25 subjects indicated
that they were students and 6 subjects indicated an academic background. All in all,
we conclude that the sample of R1 is similar to the sample of E1 with respect to
knowledge about Declare and thus the subjects fit the targeted profile.
                                     Min.   Max.      M     SD
Familiarity with Declare                1      6   2.90   1.45
Confidence understanding Declare        1      6   3.48   1.55
Competence modeling Declare             1      6   3.16   1.53
Months using Declare                    0     15   1.42   3.28
Years experience in BPM                 0      7   1.56   1.99
Table 4.8: Demographical statistics of R1
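Such a between-sample check can also be run with SciPy. The following minimal sketch applies a Mann–Whitney U Test to hypothetical familiarity ratings for the two independent samples; the values are placeholders, since the raw questionnaire data is not reproduced here.

```python
from scipy import stats

# Hypothetical 7-point familiarity ratings for the two independent samples.
familiarity_e1 = [1, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5]            # 12 subjects (E1)
familiarity_r1 = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,
                  4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6,
                  1, 2, 2, 3, 3, 4]                               # 31 subjects (R1)

# Independent samples with ordinal data, hence a non-parametric Mann-Whitney U Test.
u, p = stats.mannwhitneyu(familiarity_e1, familiarity_r1, alternative="two-sided")
print(f"U = {u}, p = {p:.3f}")
```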
Descriptive Analysis To give an overview of the results from R1 , Table 4.9 lists minimum, maximum, mean and standard deviation of mental effort, perceived quality
and quality. Similar to E1 , data of R1 suggests that the adoption of test cases lowers mental effort, increases perceived quality and increases quality, thus supporting
hypotheses H1 , H2 and H3 . More specifically, the average mental effort for change
tasks with test support (M = 3.77, SD = 1.52) was lower than the average mental
effort required for change tasks without test support (M = 5.16, SD = 1.64). The
reported perceived quality of change tasks with test cases (M = 5.84, SD = 1.32)
was higher than the perceived quality for change tasks without test support (M =
4.23, SD = 1.33). Finally, the average quality of changes with test case support
(M = 6.19, SD = 1.01) was higher than the quality of changes without test support
(M = 5.03, SD = 1.82). Please recall that mental effort and perceived quality were
measured on rating scales ranging from 1 (Very low) to 7 (Very high); therefore, the
values in Table 4.9 also range from 1 to 7. Regarding quality, we awarded 1 point for
each correctly performed change task. All in all, 7 change tasks per model had to be
performed, hence in total 7 points could be achieved per model, likewise values for
quality range from 0 to 7. Analogous to E1 , we now turn toward inferential statistics
to test whether these differences are also statistically significant.
                        N   Min.   Max.      M     SD
Mental effort          62      1      7   4.47   1.72
  with test cases      31      1      7   3.77   1.52
  without test cases   31      1      7   5.16   1.64
Perceived quality      62      1      7   5.03   1.55
  with test cases      31      2      7   5.84   1.32
  without test cases   31      1      6   4.23   1.33
Quality                62      0      7   5.61   1.57
  with test cases      31      3      7   6.19   1.01
  without test cases   31      0      7   5.03   1.82
Table 4.9: Descriptive statistics of R1
Hypotheses Testing of R1
Since the sample of R1 is not normally distributed12 , analogous to E1 , we rely on non–parametric statistical tests. In particular, we used SPSS (Version 21.0) for carrying
out the Wilcoxon Signed–Rank Test.13
12 We applied the Kolmogorov–Smirnov Test with Lilliefors significance correction to test for normal distribution. Detailed tables listing results can be found in Appendix A.2.
13 Please recall that due to repeated measurements the response variables are not independent, thus the Wilcoxon Signed–Rank Test was chosen.
Hypothesis H1 The mental effort for conducting change tasks with test case support (M = 3.77) was significantly lower than the mental effort for change tasks
that were conducted without test case support (M = 5.16) (Wilcoxon Signed–Rank
Test, Z = −2.41, p = 0.016, r = 0.43).
Hypothesis H2 The perceived quality for conducting change tasks with test case
support (M = 5.84) was significantly higher than the perceived quality for change
tasks that were conducted without test case support (M = 4.23) (Wilcoxon Signed–
Rank Test, Z = −3.87, p = 0.000, r = 0.69).
Hypothesis H3 The quality for conducting change tasks with test case support
(M = 6.19) was significantly higher than the quality for change tasks that were
conducted without test case support (M = 5.03) (Wilcoxon Signed–Rank Test, Z =
−3.17, p = 0.002, r = 0.57).
Discussion and Limitations
The results obtained in replication R1 indicate that the adoption of test cases has a
positive influence on the maintenance of declarative process models. In particular,
the collected data supports the claim that test cases can help to reduce mental effort (H1 ), increase perceived quality (H2 ) and improve the quality of the conducted
changes (H3 ). It is worthwhile to note that these results could be consistently found
in E1 and R1 and the effect sizes for statistically significant differences were medium
to large. Statistically significant differences for H1 and H2 as well as indications
for the support of H3 —even though not statistically significant—were found in E1 .
Regarding H3 , we argued that the lack of differences was caused by ceiling effects [263],
i.e., 96% of the tasks were performed correctly in E1 , leaving no room for improvement. By providing more complex process models in R1 , this rate dropped to 80%
and we could provide statistically significant support for H3 .
This, in turn, leads us to two conclusions. First, test cases can indeed improve
quality, i.e., reduce the amount of errors, when adapting declarative process models. Second, it appears that test cases work best if a certain level of complexity is
reached. This finding is also in–line with the theoretical analysis of problems in understanding and maintaining declarative process models (cf. Section 4.4). We argued
that declarative process models lack computational offloading (cf. Section 3.2), thus
without the support of test cases, mental effort will quickly rise, in turn increasing
the amount of errors. Even though in E1 the mental effort was significantly higher
when no test cases were provided, the differences were not large enough to provoke
significant changes with respect to the amount of errors committed. One might object at this point that comparing the average mental effort for E1 (M = 5.00) and
R1 (M = 4.47) indicates that the change tasks of R1 were actually easier
than the change tasks of E1 . However, as discussed in [303], subjects usually show
different base levels of mental effort, i.e., mental effort may be perceived differently
by each subject. Put differently, a task that is considered hard by subject A might
be considered easy by subject B, since subject B is used to difficult tasks. Please note
that even though this hampers the comparison between samples, it does not influence
the analyses of E1 and R1 , as the Wilcoxon Signed–Rank Test conducts a within–subject comparison.
                         N   Min.   Max.      M     SD
Created constraints    434      0     24   2.00   2.42
  with test cases      217      1     24   2.36   3.08
  without test cases   217      0     10   1.64   1.42
Deleted constraints    434      0     24   1.82   2.28
  with test cases      217      1     24   2.29   2.99
  without test cases   217      0      9   1.35   1.02
Adapted constraints    434      0      3   0.01   0.16
  with test cases      217      0      3   0.01   0.20
  without test cases   217      0      1   0.01   0.10
Total changes          434      0     48   3.82   4.68
  with test cases      217      1     48   4.66   6.10
  without test cases   217      0     19   2.99   2.32
Table 4.10: Performed change operations in R1
When analyzing the data collected in experiment E1 , we also looked into the
amount of change operations that were conducted. In particular, we found that significantly more change operations were performed when subjects had test cases at
hand. To investigate whether this trend persists in R1 , we counted the amount of operations conducted in R1 . As summarized in Table 4.10, in the course of 434 change
tasks (14 per subject, 31 subjects), subjects required on average 3.82 changes to perform the task. Mostly, change tasks involved the creation of constraints (M = 2.00)
and the deletion of constraints (M = 1.82). Contrariwise, constraints were almost never
reused (M = 0.01), i.e., reconnected between activities. Consistent with E1 , in R1
significantly more change operations were performed when test cases were provided (Wilcoxon Signed–Rank Test, Z = −3.27, p = 0.001, r = 0.59). Even though
the results obtained in R1 appear to be consistent with the findings of E1 , also similar limitations apply. In particular, we argued that the generalization of results in E1
is limited, as only 10 change tasks for 2 process models were performed. Similarly,
in R1 14 change tasks on 3 models were performed, thereby limiting generalization.
Also, neither in E1 nor in R1 could we acquire professionals; rather, students and academics were used as subjects.
4.9 Limitations
Clearly, the work presented in this chapter has to be seen in the light of several
limitations. Regarding conceptual aspects, we have discussed that the focus of TDM
is rather narrow. In particular, TDM focuses on control flow behavior only and
requires MBs and DEs that are willing to collaborate, cf. Section 4.5.2. With
respect to the efficiency of TDM, the case study reported in Section 4.7 revealed
that an overhead of 9% to 16% should be expected.
In addition, limitations regarding the evaluation should also be mentioned. Regarding the case study presented in Section 4.7, the sample size—even though typical
for case studies—is a clear limitation to the generalization of results. In addition,
the modeling sessions lasted on average 26 minutes and 30 seconds, hence the effects of TDM in longer–lasting modeling sessions could not be examined. Similarly, all modeling sessions started from scratch; hence, we could not investigate whether MBs and DEs are able to operate on test cases that were specified by other MBs or DEs.
Also, the results obtained in the controlled experiments investigating the maintenance of declarative process models in Section 4.8 should be generalized with care.
Although results were consistently found in controlled experiment E1 and replication R1 , it should be mentioned that students were used as subjects. Even though
it was shown that in software engineering students may provide an adequate model
for the professional population [108, 195, 215], other studies report significant
differences between students and professionals, e.g., [3]. In particular, it was argued
that studies should not use students and “blindly generalize to a larger population
of software engineering professionals” [235]. Rather, under certain circumstances,
such as an adequate level of commitment, students may be suitable for experimentation [16]. Since the subjects participating in our studies largely performed the
tasks correctly, i.e., 96% correct in E1 , 80% correct in R1 , we conclude that the
students showed considerable commitment. Against this background and the fact
that findings persisted over two experiments, we argue that the results can also be
generalized to the population of professional process modelers.
4.10 Related Work
Basically, research related to this work can be organized along four streams of research: test–driven approaches; scenario–driven approaches; validation, verification and creation of declarative process models; and execution of declarative process models.
Test–Driven Approaches
As discussed in Section 4.5, the development of TDM was inspired by Test Driven
Development (TDD). TDD, in turn, was developed to drive the software development process and to introduce test cases as early as possible in the development
process, thereby improving the quality of the resulting software. Besides theoretical
considerations of TDD [15], empirical investigations looking into the feasibility of
TDD are of interest. In particular, experiments investigating the effect of interweaving software development and testing, just as the TDM methodology interweaves modeling and testing, are most relevant. For instance, [133] conducted a
long–term case study which showed that the adoption of TDD increases perceived
quality. In addition, developers stated that changes could be conducted easier when
TDD was adopted. With respect to code quality, the situation appears not to be
entirely clear. For instance, [62, 84] report on controlled experiments that showed
increased code quality through the adoption of TDD. Contrariwise, experiments
reported in [63, 169] could not show a significant difference between TDD and test–after coding. More generally, benefits of TDD with respect to quality and perceived quality could so far mostly be shown for industrial settings—in semi–industrial or academic contexts, the situation seems less clear [230].
In addition, it should be mentioned that TDM is not the first approach in which
the concepts of TDD are transferred to other domains. In particular, Tort et al.
adapted TDD for the creation and testing of conceptual schemata. Similar to this
work, a comprehensive description of concepts is provided [244, 245], and tool support for the approach is available [243, 246]. In addition, the authors extend
their work toward defining desirable properties for test cases [247]. Even though the
general approach of Tort et al. is similar to this work, the targeted domain, i.e.,
conceptual schemata versus declarative process models, is entirely different, in turn leading to different artifacts being developed.
Scenario–Driven Approaches
In this work, we argued that the specification of test cases will help the MB in
constructing the process model. Even though test cases are validated automatically
against the process model, the creation of the process model is still a manual process.
A similar approach is followed by scenario–driven approaches, where specific aspects
of a system are captured using scenarios, cf. [2, 131]. These scenarios, in turn, are
then used to automatically synthesize the system, e.g., the process models. For instance, Fahland proposes an approach for synthesizing Petri Nets from scenarios [67].
This approach was meanwhile implemented in the GRETA tool [66, 71] and was extended toward the specification of scenarios which can be directly executed [72] or
be used for the specification of decentralized components [68]. In a similar vein,
Desel et al. propose to adopt process mining techniques for the specification of
scenarios [53, 54]. Similarly, the Play–Engine provides scenario–based programming support for Live Sequence Charts [99].
All these approaches distinguish themselves from our work in two ways. First,
in TDM, declarative process models are clearly in the focus, while the mentioned
scenario–based approaches focus on imperative languages, such as Petri Nets or Live
Sequence Charts. Second, all the approaches focus on the automated synthesis of
models, whereas in TDM the model creation is a manual process. Even though the
automated creation of models appears to be a desirable goal, such an approach is
probably not feasible for all purposes. If, for instance, process models are developed
for the enactment of processes, the readability of the model may not be of concern.
However, if a process model is developed for documentation purposes, readability
is a central goal. As argued in [88], automated approaches mostly fail to produce
models that are also readable by humans.
Validation, Verification and Creation of Declarative Process Models
Regarding the validation and verification of process models in general, work in the
area of process compliance checking should be mentioned, e.g., [5, 6]. In contrast to our work, the understandability of declarative languages is not of concern; rather, the focus is put on imperative languages. Likewise, the work of Ly et al. [132] focuses
on validation, but targets adaptive rather than declarative process management systems. Interestingly, with respect to the classification of sequential and circumstantial information, their setting describes the opposite of the setting used in this
work. Their imperative process models exhibit mostly sequential information, while
the adopted measure to improve validity relies on circumstantial information, i.e.,
the usage of constraints. Thus, the aim of improving validity is found in both approaches, however, for entirely different settings. With respect to the verification of declarative process models, in turn, several mechanisms have been proposed. With proper formalization, declarative process models can be verified using
established formal methods [175]. Depending on the concrete approach, checks can be performed a priori, e.g., for the absence of deadlocks [175], or a posteriori, e.g., for the conformance of an execution trace [251]. While these approaches certainly help to
improve syntactical correctness and provide semantic checks a posteriori, they
do not address understandability issues.
Finally, in [126] an algorithm for the automated extraction of declarative process
models from execution traces is proposed. Even though such an approach certainly
helps to facilitate the creation of declarative process models, it does not help to
understand the created models.
Execution of Declarative Process Models
In this work, we focus on the creation and maintenance of declarative process models.
In the following, we give an overview of approaches related to the execution of declarative processes. Most closely related to TDM is certainly the development of the declarative process modeling language Declare [175, 178], since TDMS was implemented
to support the creation of Declare models. Regarding the execution of Declare
workflows, the Declare framework [175–177], which provides support for executing
Declare models, is related. Besides supporting general–purpose workflows, the Declare framework was also applied for the support of Computer–Interpretable Guidelines [159], i.e., supporting clinical guidelines with workflow technology. Likewise,
the Declare framework was applied for the execution of service workflows [253, 254].
Declare is a prominent representative of declarative process modeling, however,
also other researchers have addressed declarative process modeling. For instance,
in [4, 229], similar to Declare, a workflow is considered a set of dependencies between tasks; however, different formalisms are used. In this vein, also Dynamic
Condition Response (DCR) graphs [101, 156, 157] are of interest. DCR graphs,
just like Declare, allow for the specification of declarative business process models,
support the specification of sub–processes [103] and were applied in a case study for
the specification of a cross–organizational case management system [102]. However,
unlike Declare, DCR graphs focus on a set of core constraints instead of allowing for
the specification of arbitrary constraints. Likewise, DCR graphs employ different
formalisms for the operationalization of constraints.
Although declarative process models provide a high degree of flexibility, their
execution may pose a significant challenge. In particular, as argued in [220], due
to the high degree of flexibility, it may not always be clear to the end user which
activity shall be executed next. To counterbalance this problem, in [10, 11] methods
for guiding the end–user through the execution of a declarative process instance are
proposed. In particular, by recommending activities to be executed, the end user shall be supported. More broadly, [96, 220] propose similar methods that can also be applied to imperative process models. Even though these approaches focus on
improving the usability of declarative process models, the focus is put on the phase
of process operation only.
4.11 Summary
In this chapter, we described TDM for supporting the understanding, creation and
maintenance of declarative process models. Regarding the understanding of declarative process models, we argued along the dimensions of the Cognitive Dimensions
Framework and identified hard mental operations and hidden dependencies as problems. Similarly, we discussed that declarative process models mostly provide circumstantial information, while the extraction of sequential information may impose
significant mental effort. As it is known that adapting a model consists of understanding what to change and then performing the change, deficits with respect to understanding presumably also impact the maintenance of declarative process models.
To counteract these problems, we proposed TDM, which adopts testing techniques
from the domain of software engineering. By providing automatically verifiable test
cases, we aim to automate the extraction of sequential information, thereby relieving the MB from hard mental operations. In addition, we envisioned a visualization of test cases that is also readable for DEs. This, in turn, provides an additional communication channel between MBs and DEs, allowing for more efficient communication.
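To illustrate the idea of automatically verifiable test cases, the following sketch checks an execution trace against simplified Declare–style constraints. It is an illustration only, not the semantics or API of TDMS or the Declare framework; the constraint templates and activity names are merely examples.

from typing import Callable, List

Trace = List[str]

def response(a: str, b: str) -> Callable[[Trace], bool]:
    """Every occurrence of a must eventually be followed by b (simplified)."""
    def check(trace: Trace) -> bool:
        return all(b in trace[i + 1:] for i, act in enumerate(trace) if act == a)
    return check

def precedence(a: str, b: str) -> Callable[[Trace], bool]:
    """b may only occur after a has occurred at least once (simplified)."""
    def check(trace: Trace) -> bool:
        seen_a = False
        for act in trace:
            if act == a:
                seen_a = True
            elif act == b and not seen_a:
                return False
        return True
    return check

def run_test_case(trace: Trace, constraints: List[Callable[[Trace], bool]]) -> bool:
    """A test case passes if its trace satisfies every constraint of the model."""
    return all(check(trace) for check in constraints)

# Hypothetical example model with two constraints
model = [precedence("A", "I"), response("A", "I")]
print(run_test_case(["A", "B", "I"], model))   # True
print(run_test_case(["I", "A"], model))        # False

In this way, the sequential information encoded in a trace is checked mechanically, rather than having to be inferred mentally from the circumstantial information of the constraints.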
In order to provide operational support, we implemented the concepts of TDM in
TDMS. To specifically support empirical investigations, TDMS was implemented on
top of CEP. TDMS, in turn, was used in three empirical studies investigating the
influence of TDM on the creation, understanding and maintenance of declarative
process models.
The data collected in the case study investigating the creation of declarative
process models indicates that test cases are accepted as communication medium.
Furthermore, in our study, test cases were favored over the process models as a communication channel and allowed for structuring modeling sessions. Regarding the
maintenance of declarative process models, we conducted a controlled experiment
(E1 ) and a replication (R1 ). Therein, a positive influence of test cases on mental
effort, perceived quality and quality could be observed. Interestingly, significantly
more change operations were performed for change tasks that were provided with
test case support, indicating that test cases can also improve the willingness to
change. As quality nevertheless increased, we conclude that test cases provide a
safety net for conducting changes. Still, it should be mentioned that the creation
of test cases implies a certain overhead (in our study between 9% and 16%). In
addition, TDM currently focuses on control flow only, i.e., other dimensions such
as resources or data are not supported yet, although such an extension could be
achieved by extending the semantics of test cases. Thus, we conclude that TDM—
even though implying a certain overhead for specifying test cases—helps to improve
the creation, understanding and maintenance of declarative process models.
Chapter 5
The Impact of Modularization on
Understandability
In Chapter 4, we have focused on the question of how the application of cognitive
psychology can help to improve the creation, understanding and maintenance of
declarative business process models. In this chapter, we turn toward the application
of cognitive psychology for investigating the impact of modularization on the understandability of a process model. In other words, we shift the focus in two ways.
First, we narrow our research from creating, understanding and maintaining process models toward understanding only, i.e., we focus on the interpretation of process
models. Second, we broaden our perspective on modeling languages. In particular,
besides declarative process modeling languages, we also take into account imperative process modeling languages and consider insights from conceptual modeling
languages in general. As described in Chapter 2, we thereby follow the Design Science Research Methodology (DSRM) approach [173] to guide our research, but also
to structure this chapter. In particular, the remainder of this chapter is organized
along the activities specified by DSRM:
(1) Problem identification and motivation
(2) Define the objectives for a solution
(3) Design and development
(4) Demonstration
(5) Evaluation
(6) Communication
In particular, we start with a general introduction to the problem, i.e., the interplay between modularization and the understanding of a process model, in Section 5.1. Then, we systematically assess the state of the art by conducting a systematic literature review in Section 5.2. Thus, these sections address problem identification and motivation (1) and define the objectives for a solution (2). Next,
we propose a cognitive–psychology–based framework for assessing the impact of
modularization on the understanding of a process model in Section 5.3, addressing
design and development (3) and demonstration (4). Subsequently, Section 5.4 reports on empirical investigations in which the proposed framework is validated in the context of BPMN–based process models. In a similar vein, Section 5.5 reports on an empirical investigation applying the proposed framework
in the context of declarative business process models. Therefore, these sections can
be attributed to activity evaluation (5). Then, limitations of this work are described
in Section 5.6. In Section 5.7, we revisit findings and limitations for a discussion, while the work presented in this chapter is put in the context of existing research
in Section 5.8. Finally, this chapter is concluded with a summary in Section 5.9.
Even though activity communication (6) was not explicitly mentioned so far, we
would like to remark at this point that communication is the inherent purpose of
this document, i.e., it is also addressed in this work.
5.1 Introduction
Using modularization to structure information has for decades been identified as
a viable approach to deal with complexity [170]. Not surprisingly, modularization
is widely used and, for instance, available through nested states in UML Statecharts [166] and sub–processes in BPMN [167] and YAWL [256]. However, in general, “the world does not represent itself to us neatly divided into systems, subsystems... these divisions which we make ourselves” [89]. In this sense, a lively discussion about the proper use of modularization for the analysis and design of information systems as well as its impact on understandability is still ongoing. Even though
considerable progress could be achieved, for instance, by adopting the good decomposition model [265] for the modularization of Event–driven Process Chains [113]
and the application of proper abstraction levels [51, 197], still, it appears that it is
not entirely clear whether and when modularization has a positive influence on the
understandability of a conceptual model in general or a process model in particular.
For instance, researchers who have set out to provide empirical evidence for the
positive effects of modularization reported—contrary to their expectations—a negative influence of modularization on understandability, cf. [40, 45]. More generally, empirical research into the understandability of conceptual models has shown
that modularization can have a positive influence [208], negative influence [45], or no
influence at all [41]. In Business Process Management (BPM), sub–processes have
been recognized as an important factor influencing model understandability [46],
however, there are no definitive guidelines on their use yet. For example, recommendations regarding the size of a sub–process in an imperative process model range from 5–7 model elements [224] through 5–15 model elements [122] up to 50 model
elements [145]. Against this background, we argue that even though considerable
progress concerning the proper application of modularization has been achieved, the
connection between the modularization of a conceptual model and its understandability is not yet fully understood.
In this work, we aim to contribute to a better understanding of the interplay
between modularization and the understandability of a process model by distilling
insights from cognitive psychology and findings from empirical investigations. We
would like to mention at this point that the concepts proposed in this work are
theoretically applicable to any modularized conceptual modeling language, but were
only empirically validated for process modeling languages so far. Hence, we will
refer to conceptual models in the conceptual sections, but refer to the specific modeling language in the empirical validation. In particular, we start by assessing the
state of the art of empirical research into the modularization of conceptual models
through a systematic literature review. Then, to provide a potential explanation
for the diverging findings, we propose a cognitive–psychology–based framework for
assessing the impact of modularization on the understandability of a conceptual
model, i.e., whether a certain modularization of a model has a positive influence,
negative influence, or no influence at all. Finally, we test the proposed framework
empirically in the course of three experiments, thereby applying it to BPMN–based
process models and Declare–based process models.
5.2 Existing Empirical Research into Modularizing
Conceptual Models
Up to now, we have discussed that the impact of modularization is not uniform.
Rather, the influence seems to vary from positive through neutral to negative. In order to
provide a comprehensive analysis, we conducted a systematic literature review [121]
about empirical work investigating the impact of modularization on understanding.
As detailed subsequently, we followed guidelines from [121] and structured our systematic literature review into three phases: planning, performing as well as reporting the
systematic literature review.
5.2.1 Planning the Systematic Literature Review
The planning phase of a systematic literature review consists of two tasks. First,
the need for a systematic literature review is assessed and the goal is defined. Second, based upon the defined goal, the search strategy for conducting the systematic
literature review is elaborated.
Identification of Goal
Conducting a systematic literature review is usually associated with a significant
effort. Even though digital libraries allow for searching by the use of keywords, the
identification of relevant literature has to be performed manually, typically requiring
the reviewer to assess thousands of results for relevance. Hence, the decision for conducting a systematic literature review should be well–justified. The starting point of
this systematic literature review were individual empirical works that show a variable
influence of modularization of a conceptual model on its understanding. However,
these works were identified in a rather unsystematic manner, e.g., coincidentally in
the course of other research activities or by receiving literature recommendations
by co–workers. In this way, we had to assume that the identified literature is non–
representative and may only represent a small fraction of studies. The goal of this
systematic literature review is therefore to counteract this problem and to provide
a comprehensive overview of empirical studies investigating the interplay between
modularization and understanding.
Search Strategy
To operationalize the goal of the systematic literature review, we derived a key–word
pattern that describes any “empirical study investigating the interplay between modularization and understanding of a conceptual model”. Since previous works may have made use of different terminology, we used synonyms for modularization and understanding. In particular, as summarized in Table 5.1, we identified 9 synonyms for
modularization and 2 synonyms for understanding. From these synonyms, in turn,
the key–word patterns for the search in digital libraries were derived. In particular, we used all possible combinations of a synonym for modularization, a synonym for understandability, experiment and model, leading to 18 key–word patterns (9 * 2 * 1 * 1).
If supported by the digital library, we intended to use a boolean combination of all
key–words to avoid duplication, i.e., key–word pattern1 OR key–word pattern2 OR
. . . key–word pattern18 . If such boolean expressions were not supported, we intended
to conduct an individual search for each key–word pattern and to merge results.
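The derivation of the key–word patterns and their boolean combination can be sketched as follows; the exact query syntax of the individual digital libraries is an assumption and varied per portal.

from itertools import product

modularization = ["modularity", "hierarchical", "hierarchy", "decomposition",
                  "refinement", "submodel", "sub-model", "fragment", "module"]
understanding = ["understandability", "comprehensibility"]

# One pattern per combination of a modularization synonym, an understanding
# synonym, "experiment" and "model" (9 * 2 * 1 * 1 = 18 patterns)
patterns = [f'"{m}" AND "{u}" AND "experiment" AND "model"'
            for m, u in product(modularization, understanding)]
assert len(patterns) == 18

# Libraries supporting boolean expressions receive one combined query;
# otherwise each pattern is searched individually and the results are merged.
combined_query = " OR ".join(f"({p})" for p in patterns)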
Search Term          Synonyms
Modularization       Modularity, hierarchical, hierarchy, decomposition, refinement, submodel, sub–model, fragment, module
Understandability    Understandability, comprehensibility
Experiment           Experiment
Model                Model

Table 5.1: Synonyms for key–word patterns
For performing the search, the review protocol specified to rely on the online portals of the most important publishers in the field of computer science, i.e., Springer1 ,
Elsevier2 , ACM3 and IEEE4 . In addition, we included the senior scholars’ basket
of journals to extend the search toward the field of information systems.5 Besides
specifying search terms, the review protocol is required to define criteria for the
inclusion or exclusion of results. We decided to include any publication that reports on empirical work investigating the interplay of modularization and understanding and did not define quality criteria necessary for inclusion. Finally, it is necessary to define which information shall be extracted from the identified literature. In this
work, we decided to extract the following information:
• Investigated modeling language
• Dimension of understanding that was measured (e.g., accuracy, duration)
• Direction of impact (positive, neutral, negative)
5.2.2 Performing the Systematic Literature Review
Following the guidelines from [121], the review protocol was used to guide the search.
The key–word patterns resulted in a total of 10,391 hits that had to be assessed
manually for inclusion. Mostly, the title of an identified work was sufficient for determining inclusion/exclusion. If the title was not informative enough, we consulted
the work’s abstract. Only if the abstract was also not informative enough was the paper inspected in detail. By following this strategy, 10 publications out
1 http://www.springerlink.com
2 http://www.sciencedirect.com
3 http://portal.acm.org
4 http://ieeexplore.ieee.org
5 The senior scholars’ basket of journals is available at: http://www.vvenkatesh.com/isranking. For searching these journals, we relied on Google Scholar: http://scholar.google.at
of 10,391 were classified as relevant. Assuming that similar works are usually connected, we also checked the references used in these 10 publications, leading to the
identification of another 3 studies. Due to personnel limitations, the identification
of relevant literature was performed by a single person, i.e., the author. All in all,
13 publications reporting on studies investigating the impact of modularization on the understanding of conceptual models were found. The insights of these studies
are reported in the following.
5.2.3 Reporting the Systematic Literature Review
As summarized in Table 5.2, we could identify 13 publications reporting on investigations of 6 different modeling languages: Levelled Data Model (LDM) [150], Hierarchical Entity–Relationship Diagram (HERD) [225], Protos [261], UML Class Diagram [166], UML Use Case Diagram [166] and UML Statechart [166]. Even though
13 publications could be identified, Table 5.2 actually lists 10 experiments, since 3
experiments were published in different venues. Basically, these studies adopted a
similar experimental design, as illustrated in Figure 5.1. Based on the particular research question, conceptual models were created that operationalized the concepts
to be investigated, such as quality of decomposition or depth of modularization.
Then, researchers elaborated questions about these models in order to assess their
understandability. Regarding the dimension of understanding that was measured,
we could observe that each study looked into accuracy, i.e., researchers counted how
many questions were answered correctly. In addition, 4 studies also took into account the duration required for answering questions ([43] reported “efficiency”, i.e.,
accuracy divided by duration). Finally, 3 studies investigated the perceived ease of
understanding, i.e., mental effort. Regarding the impact on understanding, 7 studies reported a positive influence, 9 studies reported no influence, while 3 studies reported a negative influence [40]. Please note that these numbers do
not add up to the 10 studies identified in the systematic literature review, as some
studies reported positive, negative as well as neutral results. We also would like
to emphasize at this point that most of the studies investigate differences between
modularized and non–modularized models. Studies [24–26], however, empirically
validate guidelines for modularizing conceptual models. In other words, these works
empirically showed that a certain modularization is required for showing positive
effects.
In general, it can be said that the influence of modularization on the understanding of a conceptual model was investigated for a variety of modeling languages.
Although most studies report a positive influence, some studies could not find any impact on the understanding of a model and 3 studies reported a negative influence.
Language          Study         Finding
LDM               [150]         Accuracy: positive (2 models)
                                Duration: negative (1 model), neutral (1 model)
HERD              [225]         Accuracy: neutral (2 models)
Protos            [206, 208]a   Accuracy: positive (1 model), neutral (1 model)
UMLb              [24]          Accuracy: positive (3 models)
                                Mental Effort: neutral (3 models)
                  [25]          Accuracy: positive (1 experiment, 1 replication)
                                Mental Effort: neutral (1 experiment, 1 replication)
                  [26]          Accuracy: positive (3 models)
                                Mental Effort: neutral (3 models)
UML               [42, 44]a     Accuracy: positive (1 exp.), neutral (4 experiments)
Statecharts       [41]          Accuracy: neutral (1 model)
                                Duration: neutral (1 model)
                  [43]          Efficiencyc: positive (experiment), negative (repl.)
                  [40, 45]a     Accuracy: neutral (experiment), negative (repl.)
                                Duration: neutral (experiment), negative (repl.)

a The same study was published in multiple venues
b UML Class Diagrams, UML Use Case Diagrams and UML Statecharts
c Defined as ratio of accuracy and duration

Table 5.2: Systematic literature review: results
In addition, it was empirically corroborated that the way in which modularization is applied also has an impact on understanding. Against this background, one
might argue that studies reporting a negative influence used models that were badly modularized. However, knowing that all these studies tried to provide evidence that modularization has a positive influence and that the authors are considered specialists in their field, this explanation appears implausible. Rather, we think that
modularization involves a certain trade–off. By introducing modularization, certain
aspects of a model become easier to understand, while other aspects of a model
become harder to understand. It is important to emphasize at this point that these
influences have to be seen as orthogonal to guidelines for modularization, as presented
in [24–26]. Guidelines appear to maximize positive influence, while minimizing negative influences.

Figure 5.1: Typical research model [297] (a modularized model is assessed by answering questions about it, thereby estimating its understandability)

In this vein, we argue that even for a perfectly modularized model,
certain aspects would be easier to understand without modularization. However,
it is currently not entirely clear under which circumstances a positive or a negative influence of modularization on the understanding of a model can be expected. To
approach this issue, in the following, we draw on cognitive psychology to provide a
systematic view on which factors influence understandability.
5.3 A Framework for Assessing Understandability
Up to now, we have focused on existing research into the interplay between modularization and the understanding of a model. In the following, we turn toward offering
a new perspective by proposing a framework for assessing the impact of modularization on understandability. To this end, we start by identifying two opposing forces
that presumably influence the understanding of a model in Section 5.3.1. Then, in
Section 5.3.2 we integrate these opposing forces into a framework for assessing the
interplay between modularization and understandability.
5.3.1 Antagonists of Understanding: Abstraction and Fragmentation
The systematic literature review revealed that typical research setups are centered around
modularized conceptual models as well as questions for assessing their understandability. In this sense, questions and associated answers are the unit of analysis:
If the impact of the applied modularization is positive, questions will be easier to
answer. Contrariwise, if the impact of the applied modularization is negative, questions will be harder to answer. Therefore, we discuss the impact of modularization
on understanding by focusing on the effects on an individual question. In particular,
we describe how the positive influence of modularization can be attributed to the
concept of abstraction, while negative effects can be explained by fragmentation.
Abstraction
Through the introduction of modularization it is possible to group a part of a model
into a sub–model. When referring to such a sub–model, its content is hidden by
providing an abstract description, such as a complex activity in a business process
model or a composite state in a UML Statechart. The concept of abstraction is far
from new and has been known since the 1970s as “information hiding” [170]. In the context of our work, it is of interest to what extent abstraction influences model understandability.
From a theoretical point of view, abstraction should show a positive influence, as abstraction reduces the amount of elements that have to be considered simultaneously,
i.e., abstraction can hide irrelevant information, cf. [150]. However, if positive effects
depend on whether information can be hidden, the way modularization is displayed apparently plays an important role. Here, we assume, similar to [150, 206],
that each sub–model is presented separately. In other words, each sub–model is
displayed in a separate window if viewed on a computer, or printed on a single sheet
of paper. The reader may arrange the sub–models according to personal preferences
and may close a window or put away a paper for hiding information. Thereby, irrelevant information can be hidden from the modeler, leading to decreased mental effort,
as argued in [150]. From the perspective of cognitive psychology, this phenomenon
can be explained by the concept of attention management [128]. During the problem solving process, i.e., answering a question about a model, attention needs to be
guided to certain parts of a model. For instance, when checking whether a certain
execution trace is supported by a process model, activities that are not contained
in the trace are irrelevant for answering the question. Here, abstraction allows for
removing this irrelevant information, supporting the attention management system
and thus reducing mental effort.
To illustrate the impact of abstraction, consider the BPMN–based model shown
in Figure 5.2. Assume the reader wants to determine whether activity I is always
preceded by activity A. In this case, the question can be easily affirmed, since all
execution flows leading to activity I are also connected to activity A. Activities B to
H can be ignored, since they are hidden through complex activities J and K, i.e., are
hidden by abstraction. The value of abstraction becomes particularly visible when
comparing this model with the non–modularized version shown in Figure 5.3. As
both models represent the same business process, again activity I is always preceded
by A. Similar to the process model from Figure 5.2, all sequence flows between
activities A and I have to be checked. However, no sub–processes are present, hence considerably more modeling elements, i.e., the entire content of sub–processes J and K, have to be considered.
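The reasoning behind this example can be mimicked by a simple structural check: activity I is always preceded by activity A if the start event cannot reach I once A is removed from the graph. The sketch below is an illustration only; the edge list is a hypothetical fragment of the model from Figure 5.2 and BPMN gateway semantics are deliberately ignored.

from collections import defaultdict

def always_preceded_by(edges, start, target, predecessor):
    """True if every path from start to target passes through predecessor."""
    graph = defaultdict(list)
    for src, dst in edges:
        if src != predecessor:          # remove the predecessor from the graph
            graph[src].append(dst)
    stack, visited = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return False                # target reached without passing predecessor
        if node not in visited:
            visited.add(node)
            stack.extend(graph[node])
    return True

# Hypothetical edges of the modularized model: A is followed by sub-processes J and K
edges = [("start", "A"), ("A", "J"), ("A", "K"), ("J", "I"), ("K", "I"), ("I", "end")]
print(always_preceded_by(edges, "start", "I", "A"))   # True

In the modularized model, the check only involves the complex activities J and K, whereas in the flat model the full contents of both sub–processes would enter the edge list.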
Figure 5.2: Example of modularization
Besides reducing mental effort by improving attention management, abstraction
presumably supports the identification of higher level patterns. It is known that
the human perceptual system requires little mental effort for recognizing certain patterns [128, 217], e.g., recognizing a well–known person does not require thinking; rather, this information can be directly perceived. Similarly, in conceptual models,
by abstracting and thereby aggregating information, information can presumably be perceived more easily, as discussed in detail in Section 3.2.2.
To illustrate the effect of recognition, we would like to come back to the process models shown in Figure 5.2 and Figure 5.3. Basically, both models represent
the same business process and thus exhibit a similar structure. The start event is
followed by activity A, after which the control flow is split and joined just before
activity I. Directly after activity I, in turn, the process is completed. At this point,
we argue that this structure can be easily recognized in the modularized model from
Figure 5.2, while it takes considerably more time to identify this structure in the
non–modularized model from Figure 5.3. Please note that we selected small models
for demonstration purposes, therefore the presented questions can still be answered
rather easily. At the same time, it can be expected that the positive influence of
abstraction increases when models become larger.
Figure 5.3: Non–modularized version of model from Figure 5.2
Fragmentation
Empirical evidence shows that the influence of modularization ranges from positive through neutral to negative (cf. [40, 41, 150, 208]). To explain the negative influence, we
refer to the fragmentation of the model. When extracting a sub–model, modeling elements are removed from the parent model and placed within the sub–model. When
answering a question that also refers to the content of a sub–model, the modeler has
to switch attention between the parent model and the sub–model. In addition, the
modeler has to mentally integrate the sub–model into the parent model, i.e., interpret the sub–model in the context of the parent model. From the perspective of
cognitive psychology, these phenomena are known to increase mental effort and are
referred to as split–attention effect, as discussed in detail in Section 3.2.3.
To demonstrate this effect, consider the process model from Figure 5.2. To determine whether activity B and activity F are mutually exclusive, a modeler may make
use of the following strategy. First, activity B has to be located in sub–process J
and activity F has to be located in sub–process K. In other words, the modeler has
to split attention between sub–processes J and K for this step. Next, to determine
whether activities B and F are indeed mutually exclusive, the modeler has to integrate sub–processes J and K back into the parent process model, i.e., it is required
to mentally integrate the fragments. Compared to the process model in Figure 5.3,
where it is sufficient to interpret the control flow, splitting attention and mentally
reintegrating sub–processes causes additional effort in the modularized version.
Please note that fragmentation is inevitable as soon as modularization is introduced, even for well–modularized models. Consider, for instance, a modeler who
wants to find all activities that are assigned to a specific role. In this case it is very
likely that the modeler will have to look through several sub–processes to locate
all these activities. Hence, the impact of modularization on the understanding of a
model will depend on whether fragmentation can be compensated by abstraction,
as detailed in the following.
5.3.2 Toward a Cognitive Framework for Assessing Understandability
Up to now, we have discussed two forces that presumably influence the understanding of modularized models. On the positive side, we have identified abstraction, which presumably improves understanding by fostering information hiding and pattern recognition. On the negative side, the splitting of attention and the mental integration of sub–models caused by fragmentation are presumably responsible for decreased understanding. In
the following, we will combine these opposing forces toward a framework for assessing
the impact of modularization on understandability.
When taking into account the interplay of abstraction and fragmentation, it becomes apparent that the impact of modularization on the performance of answering
a question might not be uniform. Rather, each individual question may benefit
from or be impaired by modularization. Typically, understanding is estimated by
averaging the performance of answering questions about a model, as discussed in
Section 5.3.1. Therefore, it is essential to understand how a single question is influenced by modularization. To approach this influence, we propose a framework
that is centered around the concept of mental effort, i.e., the load imposed on the
working memory [168], as discussed in Section 3.2.3. As illustrated in Figure 5.4, we
propose to view the impact of modularization as the result of two opposing forces. In
particular, every question induces a certain mental effort caused by the question’s
complexity, also referred to as intrinsic cognitive load [236]. This value depends
on model–specific factors, e.g., model size, question type or layout, and person–
specific factors, e.g., experience, but is independent of the model’s modularization.
If modularization is present, the resulting mental effort is decreased by abstraction,
particularly by enabling information hiding and pattern recognition. Contrariwise,
fragmentation increases the required mental effort by requiring the modeler to switch attention between fragments and to mentally integrate information. Based on the resulting mental effort, a certain answering performance, e.g., accuracy or duration, can
be expected. We center our framework around mental effort, since it is known that
working memory in general and mental effort in particular are connected to performance (cf. Section 3.2.3). Hence, we assume that mental effort can be used as a
reasonable estimator of performance. In the following, we will discuss the interplay
of abstraction and fragmentation as well as link the size of a model to the framework.
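Although the framework is deliberately qualitative, its core intuition can be summarized in a single illustrative relation. This shorthand is our own illustration for this discussion and not a quantitative model; none of the terms is meant to be measured directly.

% Illustrative shorthand only; the framework itself does not quantify these terms.
\[
  \mathit{ME}(q) \;\approx\; \mathit{ME}_{\mathrm{intrinsic}}(q)
  \;-\; \Delta_{\mathrm{abstraction}}(q)
  \;+\; \Delta_{\mathrm{fragmentation}}(q)
\]

Here, ME(q) denotes the mental effort required for answering a question q, which is induced by the question's complexity, decreased by abstraction and increased by fragmentation.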
Figure 5.4: Framework for assessing understandability [304] (question complexity induces mental effort; modularization enables abstraction, which allows for information hiding and pattern recognition and thereby decreases mental effort; modularization also causes fragmentation, which requires switching attention between fragments and integrating information and thereby increases mental effort; mental effort, in turn, estimates understandability)
Interplay of Abstraction and Fragmentation
According to the model illustrated in Figure 5.4, a question’s complexity induces a
certain mental effort, e.g., locating an activity is easier than validating an execution trace. In addition, mental effort may be decreased by information hiding and
pattern recognition, or increased by the need to switch between sub–models and integrate information. Thereby, abstraction as well as fragmentation occur at the same
time. A model without sub–models apparently cannot benefit from abstraction, nor is it impacted by fragmentation. By introducing modularization, i.e., creating
sub–models, both abstraction and fragmentation are stimulated. Whether the introduction of a new sub–model influences understandability positively or negatively
then depends on whether the influence of abstraction or fragmentation predominates. In the systematic literature review reported in Section 5.2, we could make
two observations in this respect. First, a negative influence of modularization was
reported particularly for rather small models, cf. [40, 43, 45]. Presumably, when
introducing modularization in a small process model, little influence of abstraction
can be expected, as the model is small anyway. However, fragmentation will appear,
regardless of model size. In other words, modularization will most likely show a
negative influence or at best no influence for small models. Second, and similarly,
in [42] a series of experiments for assessing the understandability of UML Statecharts
with composite states is described. For the first four experiments, no significant differences between non–modularized models and modularized models could be found.
Finally, the last experiment showed significantly better results for the modularized
model. The authors identified increased complexity, i.e., model size, as one of the
main factors for this result, strengthening the assumption that a model must be
large enough to benefit from abstraction. While it seems very likely that there is a
certain complexity threshold that must be exceeded so that desired effects can be
observed, it is not yet clear where exactly this threshold lies. Looking into
literature that recommends the size of sub–processes illustrates how difficult it is to
define this threshold: estimates range from 5–7 model elements [224] through 5–15 elements [122] up to 50 elements [145].
5.3.3 Limitations
Even though the proposed framework relies on established concepts from cognitive
psychology and insights from empirical works investigating the interplay between
modularization and the understanding of a conceptual model, certain limitations
apply. First, and foremost, the framework is of a general nature. Although this
basically allows for applying the framework to almost any conceptual modeling language supporting modularization, it does not take into account the characteristics
of specific modeling languages. Particularly, we have identified the integration of
sub–models as a task negatively influencing the understandability of a model. However, the difficulty of this integration task will most likely depend on the specific
modeling language. For instance, when mentally executing a BPMN–based model,
this task mainly refers to transferring the token from the parent process model to
the sub–process. When mentally executing a Declare–based model, the integration
of a sub–process requires the modeler to interpret the constraints of the sub–process
in the context of the parent process. This, in turn, might be a considerably more
difficult task, as discussed in Section 3.1.2. Also, the proposed framework focuses
on the structure of the model, but does not take into account its semantics. Hence,
factors such as redundancy or minimality, as described in the good decomposition
model [25, 266], cannot be taken into account. Furthermore, even though we have
discussed that the size of a model plays an important role for modularization, the
size is only taken into account implicitly, e.g., abstraction will presumably only
appear for large models, whereas fragmentation will presumably occur for any modularized model. We would like to emphasize at this point that we have deliberately
refrained from explicitly integrating size in our framework, as the optimal size of a
sub–model still appears to be unknown. Finally, as discussed, the framework is built
upon insights from cognitive psychology and empirical investigations of modularizing conceptual models, but still requires empirical validation. To compensate for
this shortcoming, we empirically tested the framework, as detailed in the following.
5.4 Evaluation Part I: BPMN
Even though the framework for assessing the impact of modularization on the understanding of a model relies on established concepts from cognitive psychology, it
clearly calls for empirical evaluation. To this end, we conducted a series of experiments in which the framework was validated for two different modeling notations.
In particular, in this section we report on an experiment (E2 ) and a replication
(R2 ), in which we look into modularized models that were created with BPMN.6 As
the experimental designs of experiment E2 and replication R2 are almost identical,
we start by introducing the experimental design in Section 5.4.1. Then, we turn
to describing the execution of E2 in Section 5.4.2 as well as the execution of R2 in
Section 5.4.3.
5.4.1 Experimental Definition and Planning
The goal of experiment E2 and replication R2 is to provide empirical evidence for
the presence of abstraction and fragmentation, as introduced in Section 5.3. To
this end, in the following, we introduce the research questions and hypotheses and describe the subjects, factors, factor levels, objects and response variables required for our
experiment. Then, we present the experimental design as well as the instrumentation
and data collection procedure.
Research Questions and Hypotheses
The research questions investigated in E2 and R2 are directly derived from the
theoretical framework presented in Section 5.3.2. The basic claim of the framework
is that any modularization shows a positive influence through abstraction, but also
a negative influence through fragmentation. Hence, the first research question can
be formulated, as follows:
Research Question RQ9 Is the understanding of a BPMN–based model positively
influenced by abstraction?
To assess the understandability of a model, we rely on measuring the mental effort
expended for answering a question, the accuracy, i.e., percentage of correct answers,
and the duration required for answering the question (mental effort, accuracy and
duration are elaborated in detail in Paragraph Response Variables). Also, we use
the term flat model synonymously for a model that is not modularized. For all
hypotheses, we assume that a business process is available as a modularized version
modelmod and a flat version modelflat . In this way, the hypotheses associated with
RQ9 can be postulated as follows:
6 Similar to research questions, we have decided to number experiments and replications consecutively. E1 and R1 were already reported in Chapter 4, hence we continue with experiment 2 and replication 2.
Hypothesis H4 Questions that are influenced by abstraction, but are not influenced
by fragmentation, require less mental effort in modelmod .
Hypothesis H5 Questions that are influenced by abstraction, but are not influenced
by fragmentation, yield a higher accuracy in modelmod .
Hypothesis H6 Questions that are influenced by abstraction, but are not influenced
by fragmentation, require less time to answer in modelmod .
The second research question refers to the negative influence of modularization,
in our framework captured as fragmentation. Consequently, RQ10 looks into the
negative influence of fragmentation:
Research Question RQ10 Is the understanding of a BPMN–based model negatively
influenced by fragmentation?
Analogous to RQ9 , we look into mental effort, accuracy and duration for the
postulation of hypotheses associated with RQ10 :
Hypothesis H7 Questions that are influenced by fragmentation, but are not influenced by abstraction, require a higher mental effort in modelmod .
Hypothesis H8 Questions that are influenced by fragmentation, but are not influenced by abstraction, yield a lower accuracy in modelmod .
Hypothesis H9 Questions that are influenced by fragmentation, but are not influenced by abstraction, require more time to answer in modelmod .
Subjects
The population examined in E2 and R2 comprises basically all persons who need to interpret BPMN–based models, such as process modelers or system analysts. Therefore,
the subjects participating in our study should be at least moderately familiar with
BPM in general and BPMN in particular. We are not targeting modelers who are
not familiar with BPMN at all, as it has to be assumed that errors they commit would rather be traced back to unfamiliarity with the notation, i.e., BPMN, than to the influence of modularization.
Factor and Factor Levels
In this experimental design, we consider two factors. First, factor modularization
with factor levels flat and modularized refers to the modularization of the model.
Second, factor question type refers to the influence of modularization that should
be investigated. Factor level abstraction refers to questions that are presumably
influenced by abstraction, but are not influenced by fragmentation. Contrariwise,
factor level fragmentation refers to questions that are presumably influenced by
fragmentation, but are not influenced by abstraction.
Objects
The objects of our study are four process models created with BPMN.7 In particular,
we created two process models without modularization and then derived modularized versions from these models, operationalizing factor levels flat and modularized.
When elaborating the models, we ensured that the models contain typical constructs,
such as sequences, concurrency, exclusive choice and looping (cf. [207, 255]). To ensure a consistent layout, we used an automated algorithm for laying out the process
models [94]. In addition, we assume that a certain model size is required for positive
aspects of modularization to appear (cf. Section 5.3). Therefore, we ensured that
the size of the models, i.e., the number of nodes [142], such as activities or gateways,
goes well beyond the recommended size of at most 50 nodes [145]. In particular, as
summarized in Table 5.3, the flat version of Model 1 contains 134 nodes, whereas
the flat version of Model 2 contains 131 nodes (subsequently, we refer Model 1 to
as M1 and Model 2 to as M2 ). When also taking into account edges, i.e., sequence
flows, 298 model elements can be found in M1 and 292 model elements in M2 . Please
note that the amount of start events, end events and sequence flows between the
flat and modularized versions differs, since process fragments that are extracted as
sub–processes were enclosed in a complex activity with a start event and an end
event. Finally, we would like to emphasize that for this study the quality of the
modularization is not of concern. In particular, the framework presented in Section 5.3.2 assumes that for any modularization positive and negative effects can be
found—the goal of this study is to provide empirical evidence that the framework is
indeed able to identify these positive and negative influences.
To operationalize factor question type, in particular factor level abstraction, we
created questions that are presumably influenced by abstraction, but are not influenced by fragmentation. In other words, these questions were designed so that
fewer model elements had to be taken into account when answering the question
7 Material can be downloaded from: http://bpm.q-e.at/experiment/Modularization
                        M1              M2
                        flat    mod.    flat    mod.
Activities              84      84      85      85
Complex activities      –       5       –       7
XOR gateways            30      30      18      18
AND gateways            18      18      26      26
Start events            1       6       1       8
End events              1       6       1       8
Sequence flows          164     174     161     175
Sub–processes           –       5       –       7
Nesting depth           –       2       –       3
Total nodes             134     149     131     152
Total elements          298     323     292     327

Table 5.3: Process models used in E2 and R2
At the same time, we ensured that
not more than one sub–process was necessary for answering the question, thereby
preventing fragmentation. We would like to emphasize at this point that, as discussed in Section 5.3.2, information hiding and pattern recognition always occur at
the same time. As E2 and R2 focus on quantitative data, we designed our questions to particularly focus on information hiding, since information hiding can be
operationalized and measured rather easily in a quantitative setting.8 For operationalizing factor level fragmentation, the opposite of this strategy is applied. The
questions were designed such that the same amount of model elements had to be
considered, no matter whether the question was answered in the modularized or flat
version—thereby preventing abstraction. At the same time, we ensured that fragmentation would occur by forcing the reader to integrate information from several
sub–processes and to switch attention between sub–processes (cf. Section 5.3.1).
Furthermore, we ensured that these questions refer to aspects that are typically
needed for understanding a business process model, i.e., represent typical questions.
To this end, we developed questions that are known to representatively cover aspects
of control–flow, i.e., ordering, concurrency, exclusiveness and repetition [137, 138].
As illustrated in Figure 5.5, for each of these categories, for each model (M1 and M2 ) and for each question type (abstraction and fragmentation) questions were developed, leading to a total of 16 questions (4 * 2 * 2).
8 The investigation of pattern recognition requires methods that are also able to analyze the modeler's reasoning processes in detail. Therefore, in experiment E3 we pursue the exploration of pattern recognition through the adoption of think–aloud techniques, cf. [64].
118
5.4 Evaluation Part I: BPMN
M1
Q1
Q2
Q3
Q4
Q5
Q6
Q7
Q8
M2
Ordering
Concurrency
Exclusiveness
Repetition
A
Q9
F
Q10
A
Q11
F
Q12
A
Q13
F
Q14
A
Q15
F
Q16
Ordering
Concurrency
Exclusiveness
Repetition
A
Question Type
F
A… Abstraction
F... Fragmentation
A
F
A
F
A
F
Figure 5.5: Experimental design: questions
and for each question type (abstraction and fragmentation) questions were developed, leading to a total of 16 questions (4 * 2 * 2). In the following, we refer to these
questions as Q1 to Q16. To ensure that questions operationalizing fragmentation
or abstraction are equally comprehensive, we developed question schemata for each
category. For instance, for category repetition we used the following schema: ’X’ can
be executed several times for a single case. By replacing X with a specific activity
from M1 or M2 , fragmentation/abstraction questions were created. In addition, we
ensured that all questions could be answered by using only information provided in
the process models, i.e., the questions are schema–based comprehension tasks [120].
Likewise, the process models are made available to the subjects while answering the
question, i.e., the tasks can be considered to be read–to–do, cf. [23]. Finally, we use
only closed questions, i.e., each question can be answered by selecting from the following answers: True, False and Don’t Know. We award one point for each correct
answer and zero points for a wrong answer (including Don’t Know ). We deliberately
allow for the option Don’t Know, as otherwise subjects would be forced to guess.
Response Variables
To test hypotheses H4 to H9 , we define the following response variables: mental effort
(H4 , H7 ), accuracy (H5 , H8 ) and duration (H6 , H9 ). To measure mental effort, we
employ a 7–point rating scale, asking subjects to rate the mental effort from Very
low (1) over Medium (4) to Very high (7). As for experiment E1 and replication
R1 , we would like to emphasize that employing rating scales for measuring mental
effort is known to be reliable and is widely adopted (cf. Section 3.2.3). Accuracy,
in turn, is defined as the ratio of correct answers given by a subject divided by the
number of all answers given. Hence, an accuracy of 1.0 describes that all questions
were answered correctly by a subject, while an accuracy of 0.0 indicates that all
questions were answered incorrectly by a subject. Finally, duration is defined as the
duration required for answering a question (measured in seconds), i.e., the amount
of seconds it takes to read the question, interpret the process model and give the
answer.
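To make these definitions concrete, the following minimal Python sketch derives the three response variables from hypothetical per–question answer records; the record layout and field names are illustrative assumptions and do not reflect the actual CEP log format.

from dataclasses import dataclass

@dataclass
class Answer:
    question: str        # e.g., "Q1"
    given: str           # "True", "False" or "Don't Know"
    expected: str        # correct answer, "True" or "False"
    mental_effort: int   # 7-point rating, 1 (Very low) to 7 (Very high)
    seconds: float       # time from presenting the question to confirming the answer

def score(answer: Answer) -> int:
    # One point for a correct answer, zero otherwise ("Don't Know" counts as wrong).
    return 1 if answer.given == answer.expected else 0

def accuracy(answers: list) -> float:
    # Ratio of correct answers to all answers given by the subject.
    return sum(score(a) for a in answers) / len(answers)

def total_mental_effort(answers: list) -> int:
    # Sum of the 7-point ratings; for 4 questions this ranges from 4 to 28.
    return sum(a.mental_effort for a in answers)

def total_duration(answers: list) -> float:
    # Sum of per-question durations in seconds.
    return sum(a.seconds for a in answers)

# Invented example: one subject's answers to four questions.
answers = [
    Answer("Q1", "True", "True", 3, 41.0),
    Answer("Q3", "False", "False", 4, 55.5),
    Answer("Q5", "True", "False", 3, 30.2),
    Answer("Q7", "Don't Know", "True", 2, 27.8),
]
print(accuracy(answers), total_mental_effort(answers), total_duration(answers))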
Experimental Design
As discussed in Section 3.3, Cheetah Experimental Platform (CEP) forms the basis for the empirical research conducted in this thesis. Accordingly, we describe the experimental design in the form of an experimental workflow. As shown in Figure 5.6, the
experimental workflow starts by asking the subject for a code. Similar to E1 and
R1 , this code can be found on the assignment sheets distributed to subjects. By
equally and randomly distributing codes 3563 and 9174, we ensure a randomized
and balanced setup. In addition, the assignment sheets inform subjects that participation is anonymous and that no personal information will be collected. After
having entered a valid code, subjects are guided through a demographic survey and
a familiarization phase, in which subjects are introduced to the user interface. If the
subject has entered code 3563, questions 1–8 will be presented for the modularized
version of M1 , followed by questions 9–16 for the flat version of M2 . If the subject
has entered code 9174, questions 1–8 will be presented for the flat version of M1 ,
followed by questions 9–16 for the modularized version of M2 . Finally, regardless of
the code, a feedback form is shown to the subject.
Figure 5.6: Experimental workflow (Enter Code; Show Demographic Survey; Familiarization Phase; code 3563: Model 1 modularized, Questions 1–8, then Model 2 flat, Questions 9–16; code 9174: Model 1 flat, Questions 1–8, then Model 2 modularized, Questions 9–16; Show Feedback Form)
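As a minimal sketch, the balanced and randomized distribution of the two codes could be realized as follows; the helper function and its name are illustrative and not part of CEP.

import random

# Mapping of codes to the sequence of experimental tasks described above.
ASSIGNMENT = {
    3563: ("Model 1 modularized: questions 1-8", "Model 2 flat: questions 9-16"),
    9174: ("Model 1 flat: questions 1-8", "Model 2 modularized: questions 9-16"),
}

def assignment_codes(n_subjects: int) -> list:
    # Half of the assignment sheets carry each code; shuffling the stack before
    # handing the sheets out yields a randomized and balanced setup.
    codes = [3563, 9174] * (n_subjects // 2)
    random.shuffle(codes)
    return codes

print(assignment_codes(8))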
Instrumentation and Data Collection Procedure
To present the experimental material, i.e., process models and questions, to subjects,
we rely on Hierarchy Explorer provided by CEP.9 Hierarchy Explorer, as shown in
Figure 5.7, provides a basic user interface for the visualization of hierarchical process
models. In particular, on the top left (1), the structure of the process model is shown.
In this particular case, the top–level process is called Scientific Process, having
sub–processes Do systematic literature review and Perform empirical evaluation.
Next to the process structure, in (2) a question about the process model, which is
displayed in (3), is shown. As summarized in Table 5.3, the process models used
in this study contain up to 152 nodes. Thus, locating an activity by reading and comparing activity labels may require significant effort. As this study focuses on reasoning about the control flow of a process model rather than on the identification of activities, we decided to relieve subjects of this tiresome task. In particular,
whenever a question in (2) refers to an activity in the process model shown in (3),
the user interface colors this particular part in the question as well as in the process
model.10 For instance, in Figure 5.7 activities Define the problem and Check if
problem is already solved are colored accordingly in (2) and (3). Whenever a subject
has answered a question, it is possible to navigate to the next question by clicking
on the Next button (4). To constantly inform subjects about their progress, the
current percentage of answered questions is displayed below Overall Progress in (4).
Finally, to make the interaction with Hierarchy Explorer as simple as possible, the
user cannot navigate back to previously answered questions.
Similar to TDMS (cf. Section 4.6), Hierarchy Explorer was implemented as an experimental workflow activity of CEP. This, in turn, allows for a seamless integration
in the experimental workflow (cf. Figure 5.6). In addition, CEP automatically logs
the duration of any experimental workflow activity, thereby automatically capturing the time required for answering, i.e., the response variable duration.
5.4.2 Performing the Experiment (E2 )
Based upon the experimental setup described in Section 5.4.1, the controlled experiment E2 was conducted. Aspects regarding the preparation and operation of E2 , as
well as subsequent data validation and data analysis, are covered in the following.
9 We would like to thank Thomas Porcham for supporting the implementation of Hierarchy Explorer.
10 We think that the coloring of activities is acceptable, since the task of locating activities is not of concern in this study. However, requiring subjects to repeatedly identify activities by comparing activity labels from up to 85 activities most likely causes severe annoyance among subjects, hence impairing motivation.
Figure 5.7: User interface of Hierarchy Explorer
Experimental Operation of E2
Experimental Preparation Similar to E1 , the preparation of E2 can be divided
into the preparation of experimental material as well as the acquisition and training
of subjects. We argued in Section 5.4.1 that the process models should reach a
reasonable size and complexity so that the effects of modularization can be observed.
As summarized in Table 5.3, the employed process models consist of between 131 and 152 nodes, which is clearly above the recommended size of 50 nodes for a sub–process [145]. After having created the process models, assignment sheets describing
the experimental tasks to be conducted were printed. In addition, a pre–configured
version of CEP was compiled and made available through our website. To ensure that
the used material and instructions were comprehensible, we piloted the study and
iteratively refined the material. Regarding the acquisition and training of subjects,
we could rely on an ongoing collaboration with Prof. Reijers from the Eindhoven
University of Technology. In particular, Prof. Reijers agreed to conduct experiment
E2 in the course of a lecture on Business Process Management. BPMN and the
usage of sub–processes were topics already covered in this lecture, hence
participating students were trained accordingly.
Experimental Execution Experiment E2 was conducted in January 2012 at the
Eindhoven University of Technology in the course of a lecture on Business Process
Management with 114 participants. The subjects were given the prepared assignment sheets and advised to follow the instructions detailed on the assignment sheets.
The rest of experiment E2 was guided by CEP’s experimental workflow engine,
leading subjects through an initial questionnaire, a familiarization phase and 16
questions regarding the prepared process models. As illustrated in Figure 5.6, 8 of
the questions were posed for a modularized process model, whereas the remaining 8
questions were posed for a non–modularized process model. Finally, the experiment
was concluded with a feedback form, enabling the subjects to report any problems
or difficulties.
Data Analysis of E2
Up to now, we have described the experimental design of E2 as well as the experimental preparation and experimental operation. In the following, we report on the data validation and data analysis. For this purpose, we used SPSS (Version 21.0),
if not indicated otherwise.
Data Validation The data validation conducted for E2 consisted of two phases. As
shown in Figure 5.7, we colored relevant activities in our process models to support
subjects in the identification of activities. Hence, in the first phase, we screened our
sample for subjects that had problems with identifying colors or were colorblind. To
this end, one of the demographic questions asked subjects to indicate whether they
had trouble identifying colors or were colorblind. Our data analysis showed that 5 out of 114 subjects answered this question affirmatively and were hence removed from the sample, leaving 109 subjects for analysis.
In the second phase we analyzed the demographical data of the remaining 109 subjects. As summarized in Table 5.4, questions 1–7 concerned the modeling proficiency
of the subjects, i.e., confidence and competence using and understanding BPMN,
experience and education. In addition, questions 8–10 asked for details of the models
subjects had created or analyzed. Finally, questions 11–12 concerned the domain
knowledge of subjects. For questions 1–3 and questions 11–12 we employed 7–point
rating scales ranging from Strongly disagree (1) over Neutral (4) to Strongly agree
(7). Subjects indicated average familiarity with BPMN (M = 4.57, SD = 1.20) and
felt rather competent in understanding BPMN (M = 5.14, SD = 1.19). In addition,
subjects indicated on average 7.90 months of experience in BPMN (SD = 11.09)
and 2.58 years of experience in BPM in general (SD = 1.56). Furthermore, subjects
indicated considerable formal training (M = 5.55 days, SD = 5.78) and self–education
(M = 6.83 days, SD = 11.18) during the last year. Questions 8–10 indicate that
subjects had experience in reading process models (M = 52.08, SD = 49.4) and creating process models (M = 26.29, SD = 20.08), however not necessarily experience
with large process models (M = 16.32 activities per model, SD = 6.64). Finally,
with respect to domain knowledge, subjects indicated that they were rather familiar
with scientific work (M = 5.12, SD = 1.08), i.e., the domain used for process model
M1 , but rather unfamiliar with space travel (M = 2.97, SD = 1.36), i.e., the domain
used for process model M2 .
                                            Min.   Max.       M      SD
1. Familiarity with BPMN                       2      7    4.57    1.20
2. Confidence understanding BPMN               2      7    5.14    1.19
3. Competence modeling BPMN                    2      7    4.58    1.14
4. Months using BPMN                           0     75    7.90   11.09
5. Years experience in BPM a                   0      7    2.58    1.56
6. Days of formal training last year           1     40    5.55    5.78
7. Days of self–education last year            0    100    6.83   11.18
8. Process models read last year               8    300   52.08    49.4
9. Process models created last year            1    100   26.29   20.08
10. Avg. number of activities per model        3     40   16.32    6.64
11. Familiarity scientific work                2      7    5.12    1.08
12. Familiarity space travel                   1      6    2.97    1.36

a Three subjects reported implausible values, e.g., 1000 years experience in BPM, hence they were excluded from this overview.

Table 5.4: Demographical statistics of E2
Summarizing, we conclude that the sample meets the demanded requirements, i.e.,
subjects should have received sufficient training in BPMN. Against this background,
we conduct the data analysis and hypothesis testing in the following. In particular,
we analyze RQ9 (positive influence of modularization) and RQ10 (negative influence
of modularization) at three different levels of granularity. First, we give an overview,
i.e., report aggregated values for all modularized models versus aggregated values
for all non–modularized models. Then, we look into the differences for each model
and finally report results for each question.
RQ9 : Is the Understanding of a BPMN–Based Model Positively Influenced by
Abstraction?
As summarized in Section 5.4.1, we expect that abstraction has a positive influence
on mental effort (H4 ), accuracy (H5 ) and duration (H6 ). To test these hypotheses,
we created modularized and non–modularized (flat) versions of models and posed
questions for those models. For RQ9 we created questions that are presumably
influenced by abstraction, but not impaired by fragmentation. Thus, the questions
should be easier to answer in modularized models, i.e., should result in lower
mental effort, higher accuracy and lower duration.
The results of this investigation are listed in Table 5.5. In particular, in the first
column the hypotheses to be tested are listed. The second column shows the average
values for modularized models, whereas the third column shows the average values
for flat models; the difference between average values for modularized models and
average values for flat models can be found in the fourth column. We would like to
remind at this point that each subject was asked to answer 4 questions regarding
abstraction for modularized models and 4 questions regarding abstraction for flat
models. Hence, the values reported in Table 5.5 are the aggregated values for 4
questions. Mental effort was measured using a 7–point rating scale, so the aggregated values range from 4 to 28. Accuracy, in turn, was defined per question to range from 0 to 1, so the aggregated values potentially range from 0 to 4. Likewise, durations are summed up over the 4 questions. As expected, the average mental effort (H4) and duration (H6) were
lower for modularized models, however, accuracy (H5 ) was slightly higher for flat
models. Subsequently, columns five to seven list the results from statistical analysis.
As the data is not normal–distributed11 and, due to repeated measurements, the response variables are not independent, we chose the Wilcoxon Signed–Rank Test. In particular, the test reveals that the mental effort (H4) was significantly lower for modularized models (Z = −4.94, p = 0.000, r = 0.47) and that the duration (H6) was also significantly lower for modularized models (Z = −4.99, p = 0.000, r = 0.48). Considering Cohen's guidelines for effect sizes, an effect size of 0.1 is regarded as a small effect, 0.3 as a medium effect and values around 0.5 as large effects [32, 33]. Therefore, for both comparisons large effects, i.e., values around 0.5, could be observed. For hypothesis accuracy (H5), in turn, no statistically significant difference could be found (Z = −1.07, p = 0.286, r = 0.10), and the effect size can be considered small.
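For illustration, the following hedged Python sketch shows how such a within–subject comparison could be computed with SciPy, including the effect size r = |Z| / sqrt(N) interpreted according to Cohen's guidelines; the input arrays are invented and the zstatistic attribute assumes a reasonably recent SciPy version, so this is a sketch rather than the analysis actually performed in SPSS.

import numpy as np
from scipy.stats import wilcoxon

# Invented per-subject aggregates (e.g., mental effort over 4 questions),
# one value per subject for the modularized and the flat condition.
mod = np.array([10, 12, 14, 11, 9, 13, 12, 15, 10, 11])
flat = np.array([13, 14, 16, 13, 11, 14, 15, 16, 12, 13])

# method="approx" uses the normal approximation and (in recent SciPy versions)
# exposes the Z statistic; r = |Z| / sqrt(N) with N pairs is interpreted with
# Cohen's guidelines (0.1 small, 0.3 medium, around 0.5 large).
result = wilcoxon(mod, flat, method="approx")
z = result.zstatistic
r = abs(z) / np.sqrt(len(mod))
print(f"Z = {z:.2f}, p = {result.pvalue:.3f}, r = {r:.2f}")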
In summary, it can be said that support for hypotheses mental effort (H4) and
duration (H6 ), i.e., statistically significant differences and large effect sizes, could be
found. Hypothesis accuracy (H5 ), however, could not be supported, i.e., differences
were not statistically significant and effect sizes were small. In the following, we
refine our analysis by looking into the results model–by–model, i.e., analyzing M1
and M2 separately.
11 We would like to remind at this point that for the sake of readability tests for normal distribution were moved to Appendix A.3.
Hypothesis             Mod.     Flat        Δ       Z        p      r
H4: Mental effort     11.60    13.10    −1.50   −4.94   0.000a   0.47
H5: Accuracy           3.42     3.50    −0.08   −1.07   0.286    0.10
H6: Duration         117.22   154.48   −37.26   −4.99   0.000a   0.48

a significant at the 0.05 level

Table 5.5: Results for abstraction questions
Abstraction: Results per Model Similar to the previous analysis, we organized our
results in a table. In particular, as shown in Table 5.6, we analyzed hypotheses H4 ,
H5 and H6 separately for M1 and M2 . The first column lists the hypotheses, while
the second column lists the models for which the hypotheses were tested. Columns
three and four list the values for modularized and flat models, respectively, whereas
column five shows the difference between modularized and flat models. The remaining columns six to eight, in turn, list the results of statistical tests. The data is not
normal–distributed, however, the response variables are now independent, because
the models are analyzed separately—thus, we applied the Mann–Whitney U Test. The results, as shown in Table 5.6, are in line with the results from Table 5.5. Hypotheses
mental effort (H4 ) and duration (H6 ) consistently show significant differences, even
though the effect size is reduced to a medium effect. For hypothesis accuracy (H5 ),
in turn, statistically significant differences could be found for M1 (even though opposing the expected direction); for M2 differences were not significant. To explain
the reduction of effect size, we would like to refer to the fact that we used Wilcoxon
Signed–Rank Test, i.e., a paired sample test, in Table 5.5, but used Mann–Whitney
U Test, i.e., an unpaired sample test, in Table 5.6. As paired–sample tests eliminate inter–individual differences and thus reduce the standard error of the difference
between means, paired tests can more accurately detect differences [294], which, in
turn, explains the differences in effect size.
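Analogously, a between–group comparison of independent samples could be sketched as follows; the normal approximation of U used to obtain Z (and thus r) ignores tie corrections and the data is made up, so this only illustrates the type of test applied per model, not the original SPSS computation.

import numpy as np
from scipy.stats import mannwhitneyu

# Invented independent samples: subjects who saw the modularized version of a
# model versus subjects who saw the flat version.
mod = np.array([11, 13, 10, 12, 14, 11, 12, 13])
flat = np.array([13, 15, 12, 14, 16, 13, 15, 14])

u_stat, p_value = mannwhitneyu(mod, flat, alternative="two-sided")

# Effect size via the normal approximation of U (tie correction omitted):
# Z = (U - mu_U) / sigma_U and r = |Z| / sqrt(n1 + n2).
n1, n2 = len(mod), len(flat)
mu_u = n1 * n2 / 2
sigma_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u_stat - mu_u) / sigma_u
r = abs(z) / np.sqrt(n1 + n2)
print(f"U = {u_stat:.2f}, p = {p_value:.3f}, r = {r:.2f}")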
Summarizing, it can be said that the results obtained from analyzing each model
separately for hypotheses H4 , H5 and H6 are in line with the results obtained from
analyzing the models together, i.e., results from Table 5.5. In particular, the results
suggest a significant influence of abstraction on mental effort (H4 ) and duration
(H6 ), but do not support the influence on accuracy (H5 ). In the following, we will
look into mental effort, accuracy and duration for each question. By doing so, we
investigate whether findings are also consistent across questions. Furthermore, in
our opinion, such a fine–grained analysis helps to get a more detailed picture of the
collected data.
Hypothesis          Model     Mod.     Flat        Δ         U        p      r
H4: Mental effort   M1       11.98    13.15    −1.17   1102.50   0.020a   0.22
                    M2       11.19    13.05    −1.86   1090.00   0.016a   0.23
H5: Accuracy        M1        3.32     3.62    −0.30   1142.00   0.015a   0.23
                    M2        3.53     3.39     0.14   1337.00   0.305    0.10
H6: Duration        M1      136.75   179.77   −43.02    764.00   0.000a   0.42
                    M2       96.59   130.54   −33.95    853.00   0.000a   0.37

a significant at the 0.05 level

Table 5.6: Results for abstraction questions (per model)
Abstraction: Results for Mental Effort (H4 ) So far, we have analyzed the impact
of abstraction on the mental effort for all models as well as per model. Now, we
investigate the influence of abstraction on mental effort on a per–question basis.
Please remember that questions that operationalized abstraction and fragmentation
were posed alternately, i.e., Q1 , Q3 , . . . Q15 operationalized abstraction, while Q2 ,
Q4 , . . . Q16 operationalized fragmentation (cf. Section 5.4.1). Thus, the summary
in Table 5.7 only lists odd questions. Similar to the previous tables, in Table 5.7, the
first two columns list the model and question. Then, mental effort for modularized
and flat models as well as the difference thereof are shown. Finally, columns six
to eight list results from testing for statistical significance using Mann–Whitney U
Test. Again, it can be observed that the results are in line with the observations
made so far, i.e., the mental effort for abstraction questions tended to be lower
in modularized models than in flat models.
At the same time, it can also be observed that the differences were not statistically
significant for all questions, even though differences from Table 5.5 and Table 5.6
were statistically significant. This discrepancy can be explained by the assumption
that a response variable, i.e., mental effort in this case, is not only influenced by the
treatment, i.e., abstraction and modularization, but also influenced by other factors,
such as the subject’s knowledge or the phrasing of the question. In addition, it is
assumed that the influence of the treatment is on average larger than the influence
of such interference factors. Thus, when analyzing mental effort per model, the influence of the treatment is accumulated, but also the influence of interference factors
is accumulated. As, however, the average influence of the treatment is larger than
the average influence of interference factors, the difference between the influence of
the treatment and the influence of interference factors grows—hence also differences
are more likely to be found statistically significant. Against this background, also
the results presented in Table 5.7 are in line with the findings presented so far.
Model   Quest.    Mod.    Flat       Δ         U        p      r
M1      Q1        2.75    2.85   −0.10   1366.50   0.452    0.07
        Q3        3.14    3.75   −0.61    968.50   0.001a   0.31
        Q5        3.02    3.58   −0.56   1017.50   0.002a   0.29
        Q7        3.07    2.96    0.11   1385.00   0.519    0.06
M2      Q9        2.85    3.05   −0.20   1291.00   0.223    0.12
        Q11       2.89    3.48   −0.59   1128.00   0.025a   0.21
        Q13       2.87    3.20   −0.33   1239.50   0.120    0.15
        Q15       2.58    3.32   −0.74    958.50   0.001a   0.32

a significant at the 0.05 level

Table 5.7: H4 – Mental effort for abstraction questions
Abstraction: Results for Accuracy (H5 ) Analogous to mental effort (H4 ), in the
following we summarize the average accuracy of each individual question (H5 ). In
particular, Table 5.8 shows the average accuracy, differences between modularized
and flat models as well as results from statistical tests. Here, two observations are
of special interest. First, except for Q15 , observed differences are marginal and non–
significant. Second, results rather point toward a negative influence of abstraction—
even though the only statistically significant difference indicates a positive influence
of abstraction on accuracy. These two observations further substantiate that in E2
the influence of abstraction on accuracy appears to be inconsistent and rather weak.
Not surprisingly, the average effect size from Table 5.8 is 0.10, i.e., only a small effect
could be observed (cf. [32, 33]).
Abstraction: Results for Duration (H6 ) To conclude the analysis of RQ9 , i.e., the
influence of abstraction, we look into the average duration required for answering
questions. As summarized in Table 5.9, subjects tended to be faster when answering questions in modularized models. The observed differences were found to be statistically significant in all but one question (Q9), i.e., are in line with the findings presented up to now. However, we also would like to call attention to Q7: in this case subjects were significantly slower in modularized models. A detailed analysis of Q7 showed that the question was phrased such that subjects had to screen the process models' sub–processes to find all relevant activities.
Model   Quest.    Mod.    Flat       Δ         U        p      r
M1      Q1        0.82    0.94   −0.12   1303.00   0.051    0.19
        Q3        0.79    0.91   −0.12   1306.00   0.086    0.16
        Q5        0.77    0.83   −0.06   1391.50   0.420    0.08
        Q7        0.95    0.94    0.01   1479.50   0.945    0.01
M2      Q9        0.89    0.91   −0.02   1448.50   0.680    0.04
        Q11       0.77    0.75    0.02   1449.00   0.774    0.03
        Q13       0.89    0.91   −0.02   1448.50   0.680    0.04
        Q15       0.98    0.82    0.16   1247.00   0.006a   0.26

a significant at the 0.05 level

Table 5.8: H5 – Accuracy for abstraction questions
As Q7 could then be answered by considering a single sub–process, it did not cause fragmentation, i.e., switching between sub–processes while answering the question. However,
it apparently caused significant overhead for locating the relevant activities, which
explains the higher duration for modularized models. In this sense, Q7 probably did
not operationalize abstraction perfectly. As, however, the remaining questions seem
to counterbalance this issue, i.e., the aggregated values from Table 5.5 and Table 5.6
are still statistically significant, we refrained from excluding Q7 from this analysis.
Model   Quest.     Mod.    Flat        Δ         U        p      r
M1      Q1        32.80   51.89   −19.09    698.00   0.000a   0.46
        Q3        40.00   50.77   −10.77    747.00   0.000a   0.43
        Q5        29.30   47.75   −18.45    658.00   0.000a   0.48
        Q7        34.65   29.37     5.28   1068.00   0.012a   0.24
M2      Q9        36.52   40.11    −3.59   1339.00   0.379    0.08
        Q11       24.98   34.65    −9.67   1020.50   0.005a   0.27
        Q13       22.45   34.41   −11.96    720.00   0.000a   0.44
        Q15       12.63   21.38    −8.75    604.00   0.000a   0.51

a significant at the 0.05 level

Table 5.9: H6 – Duration for abstraction questions
Concluding, it can be said that experiment E2 provides empirical support for the
positive influence of abstraction on mental effort (H4 ) and duration (H6 ), while no
statistically significant influence on accuracy (H5 ) could be found. In the following,
we investigate RQ10 , i.e., the influence of fragmentation.
RQ10 : Is the Understanding of a BPMN–Based Model Negatively Influenced
by Fragmentation?
In the following, we approach RQ10 analogous to RQ9 . More specifically, as summarized in Section 5.4.1, we expect that fragmentation has a negative influence
on mental effort (H7 ), accuracy (H8 ) and duration (H9 ). Therefore, we created
questions that are presumably impaired by fragmentation, but not influenced by
abstraction. Thus, the questions should be harder to answer in modularized models, i.e., should result in higher mental effort, lower accuracy and higher duration.
We again start by taking a look at the hypotheses for the aggregated values of all
questions, report findings from M1 and M2 separately and finally report values for
each question.
Hypothesis             Mod.     Flat       Δ       Z        p      r
H7: Mental effort     16.17    13.06    3.11   −7.20   0.000a   0.69
H8: Accuracy           2.70     3.45   −0.75   −5.28   0.000a   0.51
H9: Duration         176.87   120.08   56.79   −5.92   0.000a   0.57

a significant at the 0.05 level

Table 5.10: Results for fragmentation questions
An overview of the results of RQ10 can be found in Table 5.10. Akin to RQ9 , the
columns list the hypotheses to be tested, values for modularized and flat models,
differences thereof as well as results from statistical tests. Again, we relied on
the Wilcoxon Signed–Rank Test, since the data is not normal–distributed and, due to the repeated measurements, the sample is not independent. The Wilcoxon Signed–Rank Test
shows that mental effort for modularized models (H7 ) was significantly higher (Z =
−7.20, p = 0.000, r = 0.69), accuracy for modularized models (H8 ) was significantly
lower (Z = −5.28, p = 0.000, r = 0.51) and duration for modularized models (H9 )
was significantly higher (Z = −5.92, p = 0.000, r = 0.57). In addition, large effect
sizes, i.e., values ≥0.5, could be observed (cf. [32, 33]). Summarizing, we conclude
that support for hypotheses mental effort (H7 ), accuracy (H8 ) and duration (H9 ),
i.e., statistically significant differences and large effect sizes, could be found. In the
following, we extend our analysis by investigating the results per model.
Fragmentation: Results per Model To determine whether the reported findings
are specific for M1 or M2 , or could be found for both models, we analyzed H7 , H8 and
H9 separately for M1 and M2 . The results of this analysis can be found in Table 5.11,
whereby the columns again list hypotheses, models, values for modularized models
and flat models, differences thereof as well as results from the statistical analysis.
As per RQ9 , the data is not normal–distributed and variables are independent due
to the separate analysis of models, hence we applied Mann–Whitney U Test. It
can be observed that the results from Table 5.11 are in line with previous findings,
i.e., the influence of fragmentation on mental effort (H7 ), accuracy (H8 ) as well
as duration (H9 ) is present for both models and statistically significant. However,
again, effect sizes slightly decrease, which can be traced back to the application of
tests for unpaired samples (cf. [294]).
Hypothesis          Model     Mod.     Flat        Δ         U        p      r
H7: Mental effort   M1       15.64    12.32     3.32    575.00   0.000a   0.53
                    M2       16.74    13.75     2.99    797.00   0.000a   0.40
H8: Accuracy        M1        2.82     3.66    −0.84    806.00   0.000a   0.43
                    M2        2.57     3.25    −0.68   1040.50   0.005a   0.27
H9: Duration        M1      262.94   128.40   134.54    197.00   0.000a   0.75
                    M2      219.27   112.22   107.05    393.00   0.000a   0.63

a significant at the 0.05 level

Table 5.11: Results for fragmentation questions
In summary, it can be observed that results obtained from analyzing M1 and M2
separately are in line with the results obtained from analyzing models together, i.e.,
results from Table 5.10. In the following, we refine our analysis by investigating the
effects on individual questions.
Fragmentation: Results for Mental Effort (H7 ) So far, we have analyzed the
influence on mental effort (H7 ) for all models as well as per model. Now, we look
into the effects on each question individually. We would like to remind that questions
that operationalized abstraction and fragmentation were posed alternately, i.e., Q1 ,
Q3 , . . . Q15 operationalized abstraction, while Q2 , Q4 , . . . Q16 operationalized
fragmentation. For this reason, Table 5.12 lists only even questions. Analogous to
the analysis of H4 , the first two columns list the model and question, while columns
three to five show values for modularized and flat models as well as differences
thereof; columns six to eight report results from statistical tests. It can be observed
that for all questions the average mental effort was higher for modularized models.
In addition, all differences could be found to be statistically significant and effect
sizes range from medium to large, as indicated in columns six to eight. Hence, it
can be said that the results obtained from the question–based analysis are in line with the findings obtained so far.
Model   Quest.    Mod.    Flat      Δ         U        p      r
M1      Q2        3.82    2.74   1.08    697.50   0.000a   0.47
        Q4        4.14    2.83   1.31    491.50   0.000a   0.60
        Q6        4.05    3.62   0.43   1139.00   0.029a   0.21
        Q8        3.63    3.13   0.50   1091.50   0.012a   0.24
M2      Q10       4.75    3.55   1.20    693.50   0.000a   0.47
        Q12       4.09    3.59   0.50   1041.50   0.006a   0.27
        Q14       3.96    3.32   0.64   1002.50   0.003a   0.29
        Q16       3.92    3.29   0.63   1033.00   0.005a   0.27

a significant at the 0.05 level

Table 5.12: H7 – Mental effort for fragmentation questions
Fragmentation: Results for Accuracy (H8 ) Analogous to mental effort (H7 ), in
the following we summarize the average accuracy (H8 ) of each individual question.
As can be seen in Table 5.13, the average accuracy for fragmentation questions
was lower in modularized models, cf. column five. However, differences are only
statistically significant for three questions and effect sizes are lower than for the
previous analyses. As argued for abstraction, this might be traced back to the
assumption that differences are more likely to be found significant when values are
aggregated, as the effect of the treatment sums up. Against this background, also the results from Table 5.13 are in line with the findings obtained so far.
Fragmentation: Results for Duration (H9 ) To conclude RQ10 , we look into the
influence of fragmentation on duration (H9 ) for each question. As summarized in
Table 5.14, for all questions subjects spent more time answering questions in the
modularized versions of the model. In addition, all differences could be found to be
statistically significant and effect sizes range from medium to large. In this sense,
also the results from Table 5.14 are in line with the findings obtained up to now.
Model   Quest.    Mod.    Flat       Δ         U        p      r
M1      Q2        0.75    0.89   −0.14   1281.00   0.066    0.18
        Q4        0.79    0.94   −0.15   1250.00   0.017a   0.23
        Q6        0.84    0.92   −0.08   1357.50   0.172    0.13
        Q8        0.45    0.91   −0.46    802.50   0.000a   0.49
M2      Q10       0.40    0.82   −0.42    853.00   0.000a   0.43
        Q12       0.68    0.82   −0.14   1273.00   0.087    0.16
        Q14       0.81    0.86   −0.05   1416.00   0.522    0.06
        Q16       0.68    0.75   −0.07   1379.00   0.415    0.08

a significant at the 0.05 level

Table 5.13: H8 – Accuracy for fragmentation questions
Summarizing, it can be said that the data obtained in E2 corroborates the positive influence of abstraction (RQ9) as well as the negative influence of fragmentation (RQ10). In particular, for abstraction, hypotheses mental effort (H4) and duration (H6) could be supported, whereas no indication of a significant influence on accuracy (H5) could be found. For fragmentation, in turn, support for mental effort
(H7 ), accuracy (H8 ) and duration (H9 ) could be found. In the following, we look
into the correlation between mental effort and accuracy as well as the correlation
between mental effort and duration.
Correlations with Mental Effort
In Section 5.3.2, we argued that we centered our framework around mental effort,
since it is known to correlate with performance and therefore can be used to estimate understandability. Subsequently, we investigate whether mental effort is indeed
linked to answering performance measured in E2 , i.e., accuracy and duration. In
particular, we computed the average mental effort, accuracy and duration for each
question. Since we expect different values, depending on whether a question was
posed for the modularized or flat version of a process model, we differentiated between questions that were posed for modularized models and questions that were
posed for flat models. Having asked 8 questions per model for 2 models and considering the differentiation between modularized and flat models, in total 32 data points
could be computed (8 * 2 * 2). The first part of this analysis can be found in Figure 5.8, which shows the correlation between mental effort and accuracy. Therein,
the x–axis represents the mental effort, ranging from Extremely low mental effort
(1) to Extremely high mental effort (7). On the y–axis, in turn, the corresponding
accuracy can be found, ranging from 0 (all answers wrong) to 1 (all answers correct).
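A minimal sketch of this aggregation step, assuming a long–format table with one row per subject and question and illustrative column names, could look as follows.

import pandas as pd

# Invented long-format data: one row per subject and question.
df = pd.DataFrame({
    "question":      ["Q1", "Q1", "Q2", "Q2", "Q1", "Q2"],
    "version":       ["modularized", "flat", "modularized", "flat", "flat", "modularized"],
    "mental_effort": [3, 4, 4, 3, 5, 4],
    "correct":       [1, 1, 0, 1, 1, 0],
    "seconds":       [35.0, 52.0, 70.0, 40.0, 48.0, 66.0],
})

# One data point per question and model version; with 16 questions this yields
# the 32 averages of mental effort, accuracy and duration.
points = (df.groupby(["question", "version"])
            .agg(mental_effort=("mental_effort", "mean"),
                 accuracy=("correct", "mean"),
                 duration=("seconds", "mean"))
            .reset_index())
print(points)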
Model   Quest.     Mod.    Flat       Δ        U        p      r
M1      Q2        71.29   36.96   34.33   583.00   0.000a   0.52
        Q4        68.71   22.79   45.92   181.00   0.000a   0.76
        Q6        74.98   40.20   34.78   546.00   0.000a   0.54
        Q8        47.95   28.45   19.50   742.00   0.000a   0.43
M2      Q10       88.07   36.21   51.86   363.00   0.000a   0.65
        Q12       40.28   25.69   14.59   817.00   0.000a   0.39
        Q14       56.53   28.41   28.12   877.00   0.000a   0.35
        Q16       34.40   21.91   12.49   916.00   0.001a   0.33

a significant at the 0.05 level

Table 5.14: H9 – Duration for fragmentation questions
Considering Figure 5.8, three observations can be made. First, most questions
were perceived to be rather easy, as most of the data points can be found to the
left of Neither high nor low mental effort (4). Likewise, for most questions an
accuracy of at least 70% could be computed. Second, the scatter diagram suggests that high mental effort is related to low accuracy. In fact, mental effort
and accuracy show a significant negative correlation according to Pearson Correlation (r(30) = −0.643, p = 0.000), corroborating the assumption that mental effort
and accuracy are related. Third, even though most questions seem to show similar behavior, two questions seem not to fit in. In particular, Q8 and Q10 posed
for modularized models resulted in a below–average accuracy. Although it is not
surprising that some questions were possibly more difficult than others, it seems
that the relation between mental effort and accuracy for Q8 and Q10 differs from that of the other questions. To quantify these differences, we applied simple
linear regression, which shows that mental effort significantly predicts accuracy:
accuracy = 1.37 − 0.16 ∗ mental effort, t(30) = −4.60, p = 0.000 and explains a significant proportion of variance in accuracy, R2 = 0.41, F (1, 30) = 21.16, p = 0.000.
To identify outliers, we computed the residual for each question and calculated the
Median Absolute Deviation (MAD) [82]. In particular, values differing more than
3 times the MAD from the median were considered outliers. Applying this rather
conservative criterion [130], Q8 and Q10 were detected as the only outliers. Therefore, in the following we discuss potential reasons for the unexpected behavior of Q8
and Q10 .
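The following Python sketch illustrates this combination of correlation, simple linear regression and MAD–based outlier detection on the residuals; the data points are invented and the computation is only an approximation of the SPSS analysis reported above.

import numpy as np
from scipy.stats import pearsonr, linregress

# Invented per-question averages (32 such points were computed in E2).
effort = np.array([2.1, 2.6, 3.0, 3.4, 3.8, 4.2, 4.6, 5.0])
accuracy = np.array([0.95, 0.90, 0.88, 0.80, 0.72, 0.70, 0.44, 0.60])

r, p = pearsonr(effort, accuracy)            # correlation between the two variables
fit = linregress(effort, accuracy)           # accuracy = intercept + slope * effort
residuals = accuracy - (fit.intercept + fit.slope * effort)

# Median +/- 3 * MAD criterion on the residuals to flag outlying questions.
med = np.median(residuals)
mad = np.median(np.abs(residuals - med))
outliers = np.abs(residuals - med) > 3 * mad
print(f"r = {r:.3f}, p = {p:.3f}, slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}")
print("outlier flags:", outliers)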
Figure 5.8: Mental effort versus accuracy, adapted from [298] (scatter plot of the average mental effort, x–axis from 1.00 to 7.00, against the average accuracy, y–axis from 0.00 to 1.00, for the 32 data points)

One potential explanation could be that questions Q8 and Q10 were worded imprecisely or were difficult to comprehend. However, as summarized in Table 5.15,
when the same question was asked for flat models, a considerably higher accuracy
was reached. Another potential explanation could be that ordering and repetition
questions were particularly difficult. Again, we would like to refer to Table 5.15
to refute this explanation: except for Q8 and Q10 , the average accuracy was above
average (average accuracy for E2 : 82%). Finally, it may have been the case that
ordering and repetition questions were particularly difficult when posed in modularized models. As described in Section 5.4.1, we posed the same type of questions,
i.e., ordering and repetition in this case, for M1 as well as M2 . However, these
particularly low accuracy values could only be found once for M1 and once for M2 .
                   M1                          M2
              Quest.   Mod.   Flat        Quest.   Mod.   Flat
Ordering      Q7        95%    94%        Q15       98%    82%
              Q8        45%    91%        Q16       68%    91%
Repetition    Q1        82%    94%        Q9        89%    91%
              Q2        75%    89%        Q10       40%    82%

Table 5.15: Average accuracy for ordering and repetition questions
Against this background, also this explanation seems implausible. Hence, due to
lack of alternative explanations, we conclude that these outliers can be traced back to peculiarities of the sample.
Figure 5.9: Mental effort versus duration, adapted from [298] (scatter plot of the average mental effort, x–axis from 1.00 to 7.00, against the average duration in seconds, y–axis from 0.00 to 100.00)
Analogously, we used the same procedure to investigate the correlation between
mental effort and duration. As shown in Figure 5.9, the average mental effort
and average duration required for answering a question seem to be correlated.
Indeed, Pearson Correlation indicates a positive correlation between mental effort and duration (r(30) = 0.742, p = 0.000). Analogous to the correlation between mental effort and accuracy, we applied simple linear regression:
duration = −44.41 + 24.93 ∗ mental effort, t(30) = 6.06, p = 0.000, whereby mental
effort explains a significant proportion of variance in duration: R2 = 0.55, F (1, 30) =
36.70, p = 0.000. Applying the median ± 3 MAD criterion to the residuals confirms that all data
points behave similarly, further corroborating the connection between mental effort
and duration.
Therefore, we conclude that—even though two unexplainable outliers were identified for accuracy—mental effort and accuracy as well as mental effort and duration
are correlated. In other words, E2 corroborates the assumption that mental effort can be used as an estimator for answering performance such as accuracy and
duration. Next, we discuss limitations of E2 before we revisit the findings for a
discussion.
Limitations
Clearly, experiment E2 has to be seen in the light of several limitations, mostly
concerning the generalization of results. Particularly, even though 114 subjects participated in E2 , all of the subjects indicated that they were students. As argued in
Section 4.9, students may provide an adequate model for the professional population, if certain conditions, e.g., an adequate level of commitment, are fulfilled. Since
82% of the questions were answered correctly, we conclude that subjects were seriously interested in properly working on their assignments, indicating commitment.
Furthermore, it should also be noted that experiment E2 involved only two process
models. Even though these models contained typical constructs, e.g., sequences and
concurrency, apparently not all potential constructs can be covered. Likewise, the
process models were modeled using BPMN, i.e., E2 is restricted to a single process
modeling language. Although BPMN can be considered to be a de–facto standard
and the findings therefore contribute to the understanding of a commonly used process modeling language, generalization to other languages is limited. Finally, even
though questions asked in E2 were designed to cover typical modeling constructs, it
may be the case that the questions were not entirely representative.
Discussion
Overall, the data collected in experiment E2 provides support for the positive influence of abstraction (RQ9 ) and the negative influence of fragmentation (RQ10 ) on
the understanding of BPMN–based process models. Thereby, we operationalized the
understanding of a model as the mental effort, accuracy and duration required for
answering a question. In the following, we link these findings with existing insights
from literature and discuss potential reasons for the missing influence of abstraction
on accuracy.
When conducting the systematic literature review assessing the current state of
the art regarding empirical studies investigating the interplay between modularization and understanding, we could observe that researchers seemed to have a rather
positive attitude toward modularization. For instance, Cruz–Lemus et al. [42] state
that “our hypotheses derive from conventional wisdom, which says that hierarchical
modeling mechanisms are helpful in mastering the complexity of a software system”.
In a similar vein, [150, 225] seek support for the positive effects of mechanisms of
modularization. In turn, Cruz–Lemus et al. seem to be surprised that their studies
showed a negative influence of modularization: “this finding goes against conventional wisdom” [42]. Against this background, the findings from E2 improve over
the state of the art by providing empirical evidence that modularization does not necessarily improve all aspects of a model. Rather, it appears that a certain trade–off
is involved in the application of modularization. In particular, E2 empirically substantiates that abstraction and fragmentation may offer a viable explanation for the
interplay of modularization and understandability.
Even though we argue that E2 provides empirical evidence for the positive influence of abstraction, it remains unclear why no influence on accuracy (H5) could
be found. The most obvious explanation for the lack of correlation between abstraction and accuracy in E2 is simply that abstraction does not influence accuracy,
i.e., the postulated framework from Section 5.3.2 is incorrect. However, as argued
in the following, we think that this explanation—even though obvious at first—is
implausible. Having established that mental effort and accuracy as well as mental
effort and duration correlate and knowing that abstraction had a significant influence on mental effort and duration, it seems implausible that no connection between
abstraction and accuracy exists. Hence, we conclude that the lack of significant differences can be traced back to peculiarities of the experimental setup that could not
be detected when analyzing the data of E2 . To test this assumption, we conducted
a replication of E2 with the goal of understanding the interplay between abstraction
and accuracy.
5.4.3 Performing the Replication (R2 )
The replication of experiment E2 , subsequently referred to as R2 , was conducted for
two reasons. First, in E2 we could provide empirical evidence for the positive influence of abstraction on mental effort and duration, but could not find any influence
on accuracy. Hence, the first goal of R2 was to further investigate this unexpected
outcome in detail. Second, by conducting a replication, the findings obtained in E2
should be confirmed by a second empirical investigation. Due to the nature of a
replication, the experimental design of R2 is closely connected to the experimental
design of E2 . Therefore, in the following, we will not repeat the experimental design described in Section 5.4.1 here, but rather discuss the differences between the
experimental designs of E2 and R2. Similar to E2, we then focus on the preparation and operation of R2, as well as data validation and analysis.
Experimental Design
Basically, the experimental design of R2 is identical to the experimental design of
E2, except for one difference: besides asking subjects comprehension questions, we also asked them to justify their answers. To this end, we extended the graphical user
interface of Hierarchy Explorer, as shown in Figure 5.7, by replacing the progress
bar with a text field. Subjects were then asked to answer the question, but also to
briefly explain why they had decided on a certain answer. Through this adaptation,
we intended to extend the measurement of mental effort, accuracy and duration
with more detailed information about accuracy. Particularly, we planned to analyze incorrect answers, trying to investigate the connection between abstraction and
accuracy in more detail. For the analysis of incorrect answers, the adoption of principles from grounded theory [35, 234] in a two–step procedure was planned. First,
in the open coding phase all justifications of incorrect answers were collected. For
each justification, a potential cause for the incorrect answer was determined, e.g.,
lack of knowledge or a misunderstood question. After all justifications had received an initial classification, the procedure was repeated: all classifications were revisited
and adapted if necessary. This procedure was repeated until the classification was
stable, i.e., none of the classifications was changed anymore. Second, in the axial coding phase, classifications were repeatedly grouped into higher–level categories.
Experimental Operation of R2
Experimental Preparation Since R2 builds upon the experimental setup of E2 , all
of the experimental material, i.e., process models and questions, could be reused
for R2. Still, Hierarchy Explorer had to be extended with a text field in order to allow subjects to justify their answers, and the assignment sheets had to be adapted to instruct subjects accordingly. In addition, experimental preparation
included the acquisition of subjects. Unlike for E2 , where a large number of students was available, we could not find any opportunity to conduct R2 in a controlled
setting. Therefore, we decided to conduct R2 as an online study and to acquire
subjects at the University of Innsbruck by recruiting volunteers among students and researchers working in related areas. In addition, we asked researchers we were collaborating with to invite their students and co–workers to participate in the experiment.
We would like to emphasize at this point that we are aware that this procedure will
inevitably lead to a heterogeneous sample and an uncontrolled setting. However,
due to the lack of alternatives, we decided for this procedure and put a particular
emphasis on the data validation for R2 . As the experimental design of R2 was operationalized through an experimental workflow in CEP, the experimental material
could easily be distributed to subjects by handing out respective download links to
a preconfigured version of CEP.
Experimental Execution Replication R2 was started in April 2012 and ended in July 2012; during this time, 48 subjects participated. Even though we could not
control the environment in which subjects were answering the comprehension questions, the experimental workflow of CEP ensured that none of the subjects skipped
a task and that data was collected successfully.
Data Analysis of R2
So far, we have focused on the experimental design of R2 as well as experimental
preparation and experimental operation. Next, we describe the data validation and
data analysis.
Data Validation Replication R2 was conducted in an uncontrolled setting, hence
particular care needs to be taken with respect to the validity of the collected data.
First, and analogous to E2 , we screened our sample for subjects that had problems
in identifying colors. Unlike in E2 , none of the subjects had to be excluded from
the data analysis due to this reason. Second, we could not ensure that subjects
fully focused on their tasks. Rather, it seems likely that the answering process may have been interrupted by, e.g., telephone calls, answering emails or surfing the web. To
assess whether subjects seriously worked on their assignments, we computed the
average accuracy for each subject. Surprisingly, the average accuracy was computed
to be 89%, i.e., 7% higher than for E2 . In addition, 46 out of 48 subjects achieved
an accuracy of at least 75% and all subjects achieved an accuracy of more than
56%. Against this background, it seems plausible to conclude that subjects were
working on their assignments seriously. Similarly, the average mental effort was
3.29, which is approximately Rather low mental effort. The highest reported mental
effort, in turn, was 4.9, which is approximately Rather high mental effort. Hence, we
conclude that none of the subjects was overstrained by the experimental tasks. Still, a high accuracy and the lack of exceedingly high mental effort do not necessarily
indicate that subjects were not interrupted, i.e., measured durations for answering
questions may still fluctuate unexpectedly. To detect these anomalies, we screened
the sample for outliers. As outliers should not be removed without justification [227],
we employed the Median Absolute Deviation (MAD) [82] for detecting outliers.12 In particular, we considered values differing from the median by 3 times the MAD or more as outliers. Applying this rather conservative criterion [130] led to the exclusion of 18 subjects from the analysis of duration. We would like to point out that we did not exclude these subjects from the analysis of mental effort and accuracy, since these values appear to be plausible.

12 SPSS (Version 21.0) does not provide a satisfactory procedure for computing the MAD. Therefore, we exported the data to R [200], computed the MAD and re–imported the data into SPSS.
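A minimal sketch of this screening step, assuming one total answering duration per subject (the values are invented), could look as follows.

import numpy as np

# Invented total answering durations (in seconds), one per subject.
durations = np.array([610.0, 540.0, 720.0, 3900.0, 660.0, 580.0, 700.0, 95.0])

median = np.median(durations)
mad = np.median(np.abs(durations - median))
keep = np.abs(durations - median) <= 3 * mad

print("subjects excluded from the duration analysis:", np.where(~keep)[0])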
Next, we analyzed the subjects' demographical information, as summarized in
Table 5.16. Analogous to E2 , questions 1–7 concerned modeling proficiency, questions 8–10 asked for specifics of models subjects had created or analyzed. Finally,
questions 11–12 covered domain knowledge. For questions 1–3 and questions 11–
12, we employed a 7–point rating scale ranging from Strongly disagree (1) over Neutral (4) to Strongly agree (7). Overall, subjects indicated that they were rather familiar
with BPMN (M = 5.17, SD = 1.40), felt rather confident in understanding BPMN
(M = 5.52, SD = 1.09) and felt rather competent in modeling BPMN–based models (M = 5.13, SD = 1.08). Furthermore, subjects had on average 25.27 months
of experience in modeling BPMN (SD = 19.66) and 3.85 years of experience in
BPM in general (SD = 3.43). In addition, it can be observed that subjects had received formal training (M = 3.00 days, SD = 9.21) and considerable self–education
(M = 15.94 days, SD = 22.61). Questions 8–10 indicate that subjects had experience in reading process models (M = 41.42, SD = 50.12) and creating process models (M = 15.60, SD = 20.32), however, not necessarily experience with large models
(M = 26.29 activities per model, SD = 41.75). Finally, regarding domain knowledge, subjects indicated familiarity with scientific work (M = 5.65, SD = 1.14),
i.e., the domain used for process model M1, but indicated that they were rather unfamiliar with
space travel (M = 3.33, SD = 1.56), i.e., the domain used for process model M2 .
Summarizing, it can be said that—even though some of the participants were
rather new to BPMN—the average subject indicated considerable training and experience in BPMN. Knowing that, of the 48 subjects, 26 indicated that they were students, 16 indicated an academic background and 6 a professional background, this appears plausible. Hence, we conclude that
subjects meet the demanded requirements, i.e., subjects should have received sufficient training in BPMN. Against this background, in the following we continue with
the data analysis and hypothesis testing. Analogous to E2, we approach RQ9 (positive influence of modularization) and RQ10 (negative influence of modularization) at
three levels of granularity. First, we give an overview, i.e., report aggregated values
for all modularized models versus all non–modularized models. Then, we analyze
process models M1 and M2 separately. Finally, we analyze each question separately.
                                            Min.   Max.       M      SD
1. Familiarity with BPMN                       1      7    5.17    1.40
2. Confidence understanding BPMN               3      7    5.52    1.09
3. Competence modeling BPMN                    3      7    5.13    1.08
4. Months using BPMN                           0     82   25.27   19.66
5. Years experience in BPM                     0     20    3.85    3.43
6. Days of formal training last year           0     62    3.00    9.21
7. Days of self–education last year            0    100   15.94   22.61
8. Process models read last year               0    220   41.42   50.12
9. Process models created last year            0    100   15.60   20.32
10. Avg. number of activities per model        0    200   26.29   41.75
11. Familiarity scientific work                3      7    5.65    1.14
12. Familiarity space travel                   1      7    3.33    1.56

Table 5.16: Demographical statistics of R2
RQ9 : Is the Understanding of a BPMN–Based Model Positively Influenced by
Abstraction?
Analogous to E2 , we expect in replication R2 a positive influence of abstraction on
mental effort (H4 ), accuracy (H5 ) and duration (H6 ). As described in Section 5.4.1,
we created questions that are presumably influenced by abstraction, but are not impaired by fragmentation. Hence, these questions should be easier to answer in modularized models, i.e., should result in lower mental effort, higher accuracy and
lower duration.
Hypothesis             Mod.     Flat        Δ       Z        p      r
H4: Mental effort     11.50    12.88    −1.38   −3.15   0.002a   0.45
H5: Accuracy           3.58     3.65    −0.07   −0.53   0.599    0.08
H6: Duration         256.46   322.72   −66.26   −2.09   0.037a   0.38

a significant at the 0.05 level

Table 5.17: Results for abstraction questions
The results regarding RQ9 are summarized in Table 5.17. In particular, the first
column lists hypotheses, columns two to four show values for modularized models,
flat models and the differences thereof. Columns five to seven, in turn, list the
results of statistical tests. Not all data is normal–distributed13 and, due to repeated measurements, the response variables are not independent. Hence, to make the results better comparable with E2, we refrained from using parametric tests and chose the Wilcoxon
Signed–Rank Test to test for statistical significance. In particular, Wilcoxon Signed–
Rank Test shows that mental effort (H4 ) was significantly lower in modularized
models (Z = −3.15, p = 0.002, r = 0.45) and duration (H6 ) was significantly lower
in modularized models (Z = −2.09, p = 0.037, r = 0.38), however, no significant
differences could be found for accuracy (H5 ) in R2 (Z = −0.53, p = 0.599, r = 0.08).
Analogously, the effect sizes for mental effort and duration can be considered large
and medium, respectively, while the effect size for accuracy is small (cf. [32, 33]). We
would like to remind at this point that subjects were asked to justify their answers.
Hence, it appears plausible that the average duration in R2 was almost twice as high as the average duration in E2. In the following, we refine our analysis by analyzing
results for process models M1 and M2 separately.
Abstraction: Results per Model Similar to the previous analysis and analogous to
E2 , we have analyzed the results for M1 and M2 separately. In particular, Table 5.18
lists the hypotheses, models, values for modularized as well as flat models and the
difference thereof. Columns six to eight, in turn, report results from statistical tests.
Analogous to E2 , we applied Mann–Whitney U Test to test for statistical significance, since models were analyzed separately and therefore the response variables
were independent. Again, results are in line with findings obtained so far: mental
effort and duration were on average lower in modularized models, while differences
regarding accuracy appear to be marginal. However, except for duration, differences
are not statistically significant. As discussed in E2 , this can be partially traced back
to the application of statistical tests for unpaired samples, i.e., the application of
Mann–Whitney U Test instead of Wilcoxon Signed–Rank Test. Furthermore, replication R2 was conducted in a rather uncontrolled setting and with a smaller as well
as presumably more heterogeneous sample, hence it can be assumed that these influences pose additional interference factors further hampering the identification of
significant differences.
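Analogously, a minimal sketch of the unpaired, per–model comparison with the Mann–Whitney U Test is shown below; the group arrays are again hypothetical placeholders, and the effect size is derived from the normal approximation of U (ties are ignored in this sketch).

# Minimal sketch: Mann-Whitney U Test for two independent groups (per-model analysis),
# with an effect size r = |Z| / sqrt(N) based on the normal approximation of U.
# The arrays below are hypothetical placeholders, not the data of R2.
import numpy as np
from scipy import stats

effort_m1_mod = np.array([12, 11, 13, 12, 10, 13, 12, 11])   # hypothetical, modularized M1
effort_m1_flat = np.array([13, 14, 12, 13, 14, 12, 13, 14])  # hypothetical, flat M1

res = stats.mannwhitneyu(effort_m1_mod, effort_m1_flat, alternative="two-sided")
n1, n2 = len(effort_m1_mod), len(effort_m1_flat)
mu_u = n1 * n2 / 2.0                                  # mean of U under the null hypothesis
sigma_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)     # standard deviation of U under the null hypothesis
z = (res.statistic - mu_u) / sigma_u                  # normal approximation of U
r = abs(z) / np.sqrt(n1 + n2)                         # effect size r = |Z| / sqrt(N)
print(f"U = {res.statistic:.1f}, p = {res.pvalue:.3f}, r = {r:.2f}")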
In the following, we continue by analyzing the results for mental effort (H4 ),
accuracy (H5 ) and duration (H6 ) per question.
Abstraction: Results for Mental Effort (H4 )  So far, we have analyzed the impact of abstraction on mental effort for all models as well as per model—next, we investigate the mental effort individually for each question.
Hypothesis           Model      Mod.     Flat        Δ        U        p      r
H4: Mental effort    M1        12.00    13.11    −1.11   210.00    0.140   0.21
                     M2        11.14    12.55    −1.41   204.50    0.111   0.23
H5: Accuracy         M1         3.45     3.54    −0.09   274.00    0.885   0.02
                     M2         3.68     3.80    −0.12   246.00    0.356   0.13
H6: Duration         M1       302.66   379.41   −76.75    72.00    0.096   0.30
                     M2       216.03   257.93   −41.90    64.00   0.047a   0.36

a significant at the 0.05 level

Table 5.18: Results for abstraction questions (per model)
We would like to remind at this point that abstraction questions and fragmentation questions were posed alternately, i.e., Q1 , Q3 , . . . Q15 operationalized abstraction, while Q2 , Q4 , . . . Q16 operationalized fragmentation. Hence, Table 5.19 only lists odd questions. In
particular, columns one and two list the model and question, while columns three to five list values for modularized models, flat models and the differences thereof; columns six to eight list results from statistical tests. It can be observed that the mental effort was tendentially lower in modularized models, however, except for Q3 , differences were not statistically significant.
Model   Quest.    Mod.    Flat       Δ        U        p      r
M1      Q1        2.70    2.57    0.13   260.50    0.664   0.06
        Q3        3.00    3.86   −0.86   150.50   0.004a   0.41
        Q5        3.05    3.43   −0.38   218.50    0.170   0.20
        Q7        3.25    3.25    0.00   264.00    0.723   0.05
M2      Q9        2.54    2.90   −0.36   239.50    0.376   0.13
        Q11       3.21    3.40   −0.19   242.00    0.411   0.12
        Q13       2.79    3.15   −0.36   209.00    0.103   0.24
        Q15       2.61    3.10   −0.49   220.00    0.183   0.19

a significant at the 0.05 level

Table 5.19: H4 —Mental effort for abstraction questions
Abstraction: Results for Accuracy (H5 ) Analogous to mental effort (H4 ), in the
following we summarize the average accuracy (H5 ) for each individual question. As
can be seen in Table 5.20, similar to E2 , differences appear to be marginal and non–significant. If anything, the results point toward a slightly negative influence of abstraction on accuracy.
Model   Quest.    Mod.    Flat       Δ        U        p      r
M1      Q1        0.95    1.00   −0.05   266.00    0.237   0.17
        Q3        0.85    0.89   −0.04   268.00    0.661   0.06
        Q5        0.70    0.71   −0.01   276.00    0.915   0.02
        Q7        0.95    0.93    0.02   274.00    0.765   0.04
M2      Q9        0.86    0.95   −0.09   254.00    0.304   0.15
        Q11       0.89    0.95   −0.06   264.00    0.485   0.10
        Q13       0.96    1.00   −0.04   270.00    0.398   0.12
        Q15       0.96    0.90    0.06   262.00    0.369   0.13

Table 5.20: H5 —Accuracy for abstraction questions
Abstraction: Results for Duration (H6 ) To conclude the analysis of RQ9 , we look
into the average duration required for answering questions. The results, as listed in
Table 5.21, show that most questions could be answered quicker in the modularized
models (except for Q7 ; potential reasons were already discussed in Section 5.4.2).
Even though, except for Q1 , differences could not be found to be statistically significant, a clear trend toward the beneficial influence of abstraction can be observed.
Hence, we conclude that regarding RQ9 the findings of E2 could largely be replicated
in R2 . In particular, support for hypotheses mental effort (H4 ) and duration (H6 )
could be found, while hypothesis accuracy (H5 ) could not be supported. However,
compared to E2 , effects observed in replication R2 were less strong, i.e., differences
between groups were less often statistically significant and effect sizes were smaller.
This, as we argue, can be largely traced back to the uncontrolled setting of R2 and
the smaller sample size. In the following, we turn to the investigation of RQ10 , i.e.,
the influence of fragmentation, before the insights from RQ9 and RQ10 are revisited
for a discussion.
Model   Quest.     Mod.     Flat        Δ       U        p      r
M1      Q1        67.26   112.01   −44.75   57.00   0.022a   0.42
        Q3        71.35    98.24   −26.89   67.00    0.061   0.34
        Q5        86.58   100.97   −14.39   83.00    0.228   0.22
        Q7        77.47    68.18     9.29   91.00    0.383   0.16
M2      Q9        66.90    74.80    −7.90   79.00    0.170   0.25
        Q11       59.98    75.27   −15.29   69.00    0.074   0.33
        Q13       53.72    62.08    −8.36   91.00    0.383   0.16
        Q15       35.42    45.78   −10.36   68.00    0.067   0.33

a significant at the 0.05 level

Table 5.21: H6 —Duration for abstraction questions
RQ10 : Is the Understanding of a BPMN–Based Model Negatively Influenced
by Fragmentation?
Analogous to RQ9 , we approach RQ10 by investigating hypotheses mental effort
(H7 ), accuracy (H8 ) and duration (H9 ). As described in Section 5.4.1, we created
questions that are presumably impaired by fragmentation, but are not influenced
by abstraction, i.e., should result in higher mental effort, lower accuracy and higher
duration in modularized models. Again, we test these assumptions at three levels of
granularity. Starting with an analysis of all questions, subsequently values for M1
and M2 are investigated separately, and finally, questions are analyzed individually.
Hypothesis            Mod.     Flat        Δ       Z        p      r
H7: Mental effort    15.88    12.29     3.59   −5.46   0.000a   0.79
H8: Accuracy          3.35     3.67    −0.32   −2.46   0.011a   0.35
H9: Duration        436.50   285.74   150.76   −4.64   0.000a   0.85

a significant at the 0.05 level

Table 5.22: Results for fragmentation questions
An overview of the results of RQ10 can be found in Table 5.22. Analogous to RQ9 ,
it lists hypotheses, values for modularized as well as flat models and the differences
thereof. Then, columns five to seven list results from statistical tests. We would
like to remind at this point that due to repeated measurements response variables
are not independent, hence Wilcoxon Signed–Rank Test was applied. In particular, Wilcoxon Signed–Rank Test indicates that mental effort (H7 ) for modularized
models was significantly higher (Z = −5.46, p = 0.000, r = 0.79), accuracy (H8 )
was significantly lower for modularized models (Z = −2.46, p = 0.011, r = 0.35) and
duration (H9 ) was significantly higher for modularized models (Z = −4.64, p =
0.000, r = 0.85). Furthermore, medium to large effect sizes could be observed
(cf. [32, 33]). Summarizing, data collected in R2 provides empirical support for
hypotheses mental effort (H7 ), accuracy (H8 ) and duration (H9 ), thereby confirming the findings of E2 . Subsequently, we extend our analysis by investigating the
results separately for M1 and M2 .
Fragmentation: Results per Model To determine whether effects observed in R2
were specific for M1 or M2 , we analyzed results separately for each model. The
results can be found in Table 5.23, whereby columns one to five list hypotheses, models, values for modularized as well as flat models, and the differences thereof. Then, columns six to eight list results from statistical tests. We would like to
remind at this point that due to the separate analysis of M1 and M2 , response
variables are independent and hence Mann–Whitney U Test was applied. Basically,
it can be observed that this analysis confirms the findings obtained so far: mental
effort and duration were on average higher for modularized models, while accuracy
was lower. However, differences regarding accuracy were not statistically significant.
Hence, we conclude that these results are in line with findings obtained so far. In
the following, we continue our analysis by investigating mental effort (H7 ), accuracy
(H8 ) and duration (H9 ) for each question.
Hypothesis           Model      Mod.     Flat        Δ        U        p      r
H7: Mental effort    M1        15.60    11.89     3.71   121.50   0.001a   0.48
                     M2        16.07    12.85     3.22   137.50   0.003a   0.43
H8: Accuracy         M1         3.30     3.68    −0.38   207.00    0.078   0.25
                     M2         3.39     3.65    −0.26   222.00    0.165   0.20
H9: Duration         M1       480.60   308.33   172.27    29.00   0.001a   0.63
                     M2       397.91   259.93   137.98    35.00   0.001a   0.58

a significant at the 0.05 level

Table 5.23: Results for fragmentation questions (per model)
Fragmentation: Results for Mental Effort (H7 ) So far, we have analyzed the
influence of fragmentation on mental effort (H7 ) for all models as well as per model,
in the following we analyze the influence on each question. We would like to remind
at this point that questions that operationalized abstraction and fragmentation were
posed alternately, i.e., Q1 , Q3 , . . . Q15 operationalized abstraction, while Q2 , Q4 ,
. . . Q16 operationalized fragmentation, hence Table 5.24 only lists even questions.
Similar to the analysis of H4 , the columns list the models, questions, values for
modularized as well as flat models and the differences thereof. Then, columns six
to eight list results from statistical tests. Overall, it can be observed that the
average mental effort was higher for modularized models. In addition, except for
Q8 differences are also statistically significant, further corroborating the negative
influence of fragmentation on mental effort.
Model   Quest.    Mod.    Flat      Δ        U        p      r
M1      Q2        3.75    2.57   1.18   118.00   0.000a   0.51
        Q4        3.80    2.79   1.01   140.00   0.002a   0.45
        Q6        4.20    3.32   0.88   170.00   0.018a   0.34
        Q8        3.85    3.21   0.64   222.00    0.205   0.18
M2      Q10       4.57    3.90   0.67   177.50   0.024a   0.33
        Q12       3.64    3.00   0.64   184.00   0.037a   0.30
        Q14       3.89    3.05   0.84   151.50   0.005a   0.41
        Q16       3.96    2.90   1.06   146.50   0.004a   0.42

a significant at the 0.05 level

Table 5.24: H7 —Mental effort for fragmentation questions
Fragmentation: Results for Accuracy (H8 ) Analogous to mental effort (H7 ), we
list results regarding accuracy (H8 ) in the following. In particular, as can be seen in
Table 5.25, a trend toward lower accuracy in modularized models can be observed;
for Q8 differences were statistically significant.
Fragmentation: Results for Duration (H9 ) To conclude the analysis of RQ10 ,
we investigate the influence of fragmentation on duration (H9 ). As summarized in
Table 5.26, on average subjects required more time for answering questions in modularized models and except for Q2 and Q12 differences were statistically significant.
Hence, also this analysis is in line with the findings obtained so far.
Summarizing, we conclude that the data obtained in R2 to a large extent confirms the findings from E2 . In particular, R2 also provides empirical evidence for the positive influence of abstraction on mental effort (H4 ) and duration (H6 ).
Model   Quest.    Mod.    Flat       Δ        U        p      r
M1      Q2        0.70    0.86   −0.16   236.00    0.191   0.19
        Q4        0.95    1.00   −0.05   266.00    0.237   0.17
        Q6        1.00    0.89    0.11   250.00    0.135   0.22
        Q8        0.65    0.93   −0.28   202.00   0.016a   0.35
M2      Q10       0.64    0.85   −0.21   222.00    0.115   0.23
        Q12       0.93    0.95   −0.02   274.00    0.765   0.04
        Q14       0.96    0.95    0.01   276.00    0.809   0.03
        Q16       0.86    0.90   −0.04   268.00    0.661   0.06

a significant at the 0.05 level

Table 5.25: H8 —Accuracy for fragmentation questions
The influence on accuracy (H5 ), in turn, remains unclear. Regarding the influence of fragmentation,
R2 provides empirical evidence for the negative influence on mental effort (H7 ),
accuracy (H8 ) and duration (H9 ), corroborating the findings obtained in E2 . In the
following, we first analyze whether correlations between mental effort and accuracy
as well as mental effort and duration also can be found in R2 . Then, to examine the
apparently missing link between abstraction and accuracy, we investigate accuracy
in detail.
Correlations with Mental Effort
In the analysis of E2 , we investigated the correlation between mental effort and
accuracy as well as mental effort and duration. Next, we repeat this analysis for
the data obtained in R2 . In particular, we computed the average mental effort,
accuracy and duration for each question. Thereby, we differentiated between questions that were posed for modularized models and questions that were posed for
non–modularized models, since different mental effort, accuracy and duration had
to be expected. The correlation between mental effort and accuracy can be found in
the scatter diagram in Figure 5.10. We would like to remind at this point that the
x–axis represents the mental effort, ranging from Extremely low mental effort (1)
to Extremely high mental effort (7). The y–axis, in turn, shows accuracy, ranging
from 0 (all answers incorrect) to 1 (all answers correct). It can be observed that—
consistent with E2 —in R2 high mental effort seems to be related to low accuracy. In
fact, mental effort and accuracy show a statistically significant negative correlation
(r(30) = −0.394, p = 0.026).
Model   Quest.     Mod.     Flat       Δ       U        p      r
M1      Q2       125.01   101.28   23.74   75.00    0.124   0.28
        Q4       129.01    58.51   70.50   28.00   0.000a   0.64
        Q6       131.92    91.42   40.50   31.00   0.001a   0.61
        Q8        94.65    57.12   37.53   46.00   0.006a   0.50
M2      Q10      159.13    91.80   67.33   45.00   0.005a   0.51
        Q12       73.42    68.70    4.72   95.00    0.480   0.13
        Q14       94.02    53.88   40.14   36.00   0.002a   0.58
        Q16       71.34    45.55   25.79   47.00   0.007a   0.49

a significant at the 0.05 level

Table 5.26: H9 —Duration for fragmentation questions
Similar to E2 , 4 questions seem not to fit in (Q8 , Q10 , Q12 and Q16 ). To determine whether these points can be considered outliers, we applied simple linear regression: accuracy = 1.14 − 0.08 ∗ mental effort, t(30) = −2.35, p = 0.026, with R2 = 0.16, F (1, 30) = 5.51, p = 0.026. Then, we computed the residuals and considered all residuals differing more than 3 times the Median Absolute Deviation (MAD) [82]
as outliers. Applying this rather conservative criterion [130], none of the data points
was identified as outlier. We would like to remind that in the analysis of E2 we
identified Q8 and Q10 as outliers, but could not provide a plausible explanation why
this was the case. Having established that no outliers could be found for R2 , it
appears likely that the outliers can be traced back to peculiarities of E2 ’s sample.
For instance, all subjects of E2 were attending the same lecture, hence perhaps a
certain aspect of BPMN was taught differently than it was interpreted in our experimental material. Contrariwise, in R2 subjects had rather different backgrounds
and knowledge, presumably counteracting this effect.
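For illustration, the sketch below reproduces the screening procedure described above—correlating mental effort with accuracy, fitting a simple linear regression and flagging residuals that deviate by more than 3 times the MAD—on hypothetical per–question averages. It is not the original analysis script, and the data is invented purely for the example.

# Minimal sketch: correlation, simple linear regression, and MAD-based residual screening.
# The per-question averages below are hypothetical placeholders, not the study's data.
import numpy as np
from scipy import stats

mental_effort = np.array([2.6, 2.9, 3.1, 3.3, 3.5, 3.8, 4.0, 4.3])     # hypothetical
accuracy = np.array([0.97, 0.95, 0.90, 0.88, 0.85, 0.80, 0.92, 0.70])  # hypothetical

r, p = stats.pearsonr(mental_effort, accuracy)        # correlation coefficient and p-value
fit = stats.linregress(mental_effort, accuracy)       # accuracy = intercept + slope * effort

residuals = accuracy - (fit.intercept + fit.slope * mental_effort)
mad = stats.median_abs_deviation(residuals)           # Median Absolute Deviation of the residuals
outliers = np.abs(residuals - np.median(residuals)) > 3 * mad   # conservative 3 * MAD criterion
print(f"r = {r:.3f}, p = {p:.3f}; regression: acc = {fit.intercept:.2f} + {fit.slope:.2f} * effort")
print("outlier flags:", outliers)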
Results regarding the correlation between mental effort and duration can be found
in Figure 5.11. Again, consistent with E2 , high mental effort appears to be associated
with high duration. Indeed, mental effort and duration are statistically significantly positively correlated (r(30) = 0.462, p = 0.008). Analogous to the analysis of mental
effort and accuracy, we applied simple linear regression to test whether all data points
follow a similar behavior: duration = −27.02+32.96∗mental effort, t(30) = 4.24, p =
0.000 with R2 = 0.38, F (1, 30) = 17.98, p = 0.000. To identify potential outliers,
we computed the residual of each data point and considered residuals differing from
the mean more than 3 times the MAD as outliers. Against this rather conservative
criterion [130], none of the residuals was detected as an outlier.
[Figure 5.10: Mental effort versus accuracy — scatter plot of average mental effort per question (x–axis, 1–7) against accuracy (y–axis, 0.00–1.00)]
Hence, we conclude that also in R2 mental effort correlates with accuracy and duration.
Investigating Accuracy
Even though in experiment E2 and replication R2 the postulated hypotheses could largely be confirmed, neither E2 nor R2 provided support for the influence of abstraction on accuracy (H5 ). In the following, we try to clarify whether
there is indeed no connection between abstraction and accuracy or whether other
explanations seem more plausible. As accuracy is defined as the ratio of correct
answers divided by the total amount of answers, it seems essential to understand
the causes for incorrect answers, i.e., error sources. To this end, we asked subjects
to justify their answers by shortly explaining their line of reasoning. As described
in the experimental setup of R2 , we used these justifications to identify potential
reasons for incorrect answers in the open coding phase and subsequently aggregated
reasons to higher level categories. Similar to the coding reported in Section 4.7, due
to personnel limitations only one person, i.e., the author, was responsible for the
coding.
The results of this analysis can be found in Figure 5.12. From a total of 80 errors,
i.e., incorrect answers, 12 errors could not be analyzed due to trivial or missing
justifications, for instance, subjects simply rephrased the question: “both activities could be reachable in parallel”.
[Figure 5.11: Mental effort versus duration — scatter plot of average mental effort per question (x–axis, 1–7) against duration in seconds (y–axis, 0–200)]
For the remaining 68 incorrect answers, various
reasons could be identified. First, different background and terminology posed a
problem for certain subjects. In particular, 6 subjects confused terminology both A
and B and A parallel to B. While both A and B refers to the situation where A and B
occur at some time in a process instance, A parallel to B refers to the situation where
A and B occur at the same time in a process instance. Then, some subjects also
considered infinite looping, i.e., that a process instance is never terminated, but stuck in a loop—which we did not take into account (6 cases, cf. Figure 5.12). Furthermore, 5 subjects confused
can, i.e., there is a process instance in which certain behavior is true, with must,
i.e., a certain behavior must be true for all process instances. Finally, 2 subjects
confused A is executed immediately after B, i.e., A is immediately followed by B, with A is executed after B, i.e., A is at some point followed by B. Besides
different terminology, 14 answers were incorrect simply due to a lack of knowledge,
e.g., subjects confused symbols for OR– and XOR gateways. Interestingly, 10 further
subjects gave a perfectly correct justification, but selected the wrong answer. Finally,
3 subjects did not understand a question and 2 subjects had problems with operating
the user interface.
The remaining 20 errors, in turn, could be traced back to the misinterpretation of the process model.
[Figure 5.12: Distribution of errors. Total errors: 80, thereof 12 where the problem could not be determined and 68 where it could. The 68 analyzable errors break down into: misinterpretation of the model (20; thereof 15 where the explanation indicates a problem according to the framework and 5 not according to the framework), different background/terminology (19; both means parallel: 6, infinite looping: 6, can/must: 5, executed after/immediately after: 2), lack of knowledge (14), correct reasoning but wrong answer chosen (10), question not understood (3), handling of the UI (2).]
From these 20 errors, 15 errors could be explained through
fragmentation, i.e., subjects had problems in integrating sub–processes or lost track
when they switched between sub–processes. This leaves 5 errors that cannot be
explained by the proposed framework, but can rather be traced back to errors that
occurred while interpreting a process model, e.g., overlooking a back–edge in a loop.
Put differently, from 20 errors that occurred when interpreting the process model,
15 can be explained by fragmentation.
From this categorization, two main observations can be made. First, only 29% of
incorrect answers (20 of 68; 12 could not be analyzed) can be traced back to misinterpreting the process model. The remaining 71% seem to be related to problems with
the experimental setup, e.g., not properly worded questions, and subject–specific
factors, e.g., different backgrounds or terminology. Since the framework proposed in
Section 5.3.2 focuses on model–related errors only, these influences have to be seen as
interference. This rather strong interference, combined with the overall low number of errors, i.e., 82% of answers correct in E2 and 89% in R2 , might
be a potential explanation for the—compared to mental effort and duration—rather
weak impact of abstraction and fragmentation on accuracy. However, this does not
explain why accuracy was found to differ statistically significantly for fragmentation
in E2 as well as R2 , but did not differ statistically significantly for abstraction. To
explain this disparity, a look at the effect sizes might provide a potential clarification. Throughout E2 and R2 , effect sizes were smaller in hypotheses regarding
abstraction compared to effect sizes of hypotheses regarding fragmentation. In particular, as summarized in Table 5.5, effect sizes for mental effort and duration for
abstraction were 0.47 and 0.48. For fragmentation, as shown in Table 5.10, values
increased to 0.69 and 0.57, respectively. In R2 , a similar behavior could be observed:
effect sizes for mental effort and duration, as listed in Table 5.17, increased from 0.45
and 0.38 for abstraction to 0.79 and 0.85 for fragmentation (cf. Table 5.22). Hence,
it appears that the influence of fragmentation was on average stronger than that of abstraction. Therefore, a potential explanation could be that the
influence of abstraction was not strong enough to counterbalance these interferences,
leading to non–significant results.
Limitations
So far, we have analyzed data obtained in R2 , in the following we discuss potential
limitations of R2 . First and foremost, replication R2 was conducted in a rather uncontrolled setting with a rather heterogeneous sample. Hence, it must be assumed
that subjects were probably distracted or interrupted while working on the comprehension tasks and therefore results have to be interpreted with care. As, however,
results are in line with findings obtained in E2 , it appears likely that also results
from R2 are valid. Second, replication R2 builds upon the experimental material
from E2 . Hence, limitations regarding experimental material, e.g., usage of a single
modeling language, also apply to R2 . Third, the classification of error sources was
conducted by a single person only, i.e., the author, thereby limiting the accuracy
and reliability of the coding.
Discussion
So far, we have described the experimental setup, experimental execution, data
validation and data analysis of R2 . Subsequently, we revisit the key findings for a
discussion. Basically, it can be observed that findings obtained in E2 could mostly
be replicated in R2 . In particular, repeatedly empirical evidence for the positive
influence of abstraction (RQ9 ) and the negative influence of fragmentation (RQ10 )
could be found. Similarly, it could be substantiated that mental effort and accuracy
as well as mental effort and duration are correlated. It should be mentioned at this
point that effect sizes and correlations were tendentially smaller in R2 , which can be
traced back to the rather uncontrolled setting and heterogeneous sample of R2 as well
as the smaller sample size. In addition, neither E2 nor R2 could provide evidence
that abstraction has a positive influence on accuracy. A detailed investigation of
error sources showed that only approximately 29% of the errors could be traced
back to misinterpretations of the model, i.e., errors that can potentially be explained
through the proposed framework. The remaining 71%, in turn, have to be considered
as interference. Knowing that effect sizes for abstraction were on average smaller
than effect sizes for fragmentation, it appears plausible that interfering influences
were larger than the influence of abstraction on accuracy, thereby masking its effects.
We would like to emphasize at this point that this does not imply that the negative influence of modularization predominates in general. Rather, in this particular
setting, i.e., using the models and questions from E2 and R2 , the measured influence
of fragmentation was stronger than the measured influence of abstraction. Even
though we could not entirely resolve the connection between abstraction and accuracy, we decided not to further refine the empirical investigation regarding BPMN,
but to broaden the study and to also approach declarative process modeling, as
detailed in the following.
5.5 Evaluation Part II: Declare
So far, we have focused on the quantitative empirical evaluation of the proposed
framework from Section 5.3.2 in the context of BPMN–based process models. Subsequently, in experiment E3 we extend this investigation by shifting the focus in two
ways. First, we examine the proposed framework for declarative, i.e., Declare–based,
process models. Second, besides using quantitative data, we also take into account
qualitative data in E3 by employing think–aloud techniques to investigate the subjects’ reasoning processes. Analogous to experiment E2 and replication R2 , we start
by introducing the experimental design in Section 5.5.1. Then, in Section 5.5.2 we
describe the experimental operation and findings of E3 .
5.5.1 Experimental Definition and Planning
The goal of E3 is to provide empirical evidence regarding the effects of abstraction and fragmentation in Declare–based models. In the following, we introduce the research questions and hypotheses and describe the subjects, factors, factor levels, objects and response variables
required for our experiment. Then, we present the experimental design as well as
the instrumentation and data collection procedure.
Research Questions and Hypotheses
The research questions investigated in E3 are directly derived from the theoretical
framework presented in Section 5.3.2. The basic claim of the framework is that
any modularization of a Declare–based process model shows positive effects through
abstraction, but also negative effects through fragmentation. As described in Section 3.1.2, the interpretation of modularized declarative process models requires the
modeler to combine constraints. Likewise, the interpretation of sub–processes requires the modeler to combine the constraints of the sub–process with constraints
from the parent process. Compared to BPMN–based models, where interpretation
is mostly concerned with passing tokens through the process [167], this presumably
constitutes a considerably more difficult task. Hence, it is a–priori not clear whether
process modelers are able to properly perform this task. Therefore, research question RQ11 investigates whether process modelers are basically able to understand
the semantics of sub–processes in declarative process models.
Research Question RQ11 Do process modelers understand the semantics of sub–
processes in Declare–based process models?
Then, research questions RQ12 and RQ13 investigate whether empirical evidence
for the positive influence of modularization, as postulated in Section 5.3.2, can be
found. RQ12 thereby particularly focuses on the role of pattern recognition, whereas
the influence of information hiding is approached in RQ13 :
Research Question RQ12 Is the understanding of a Declare–based model positively
influenced by pattern recognition?
Research Question RQ13 Is the understanding of a Declare–based model positively
influenced by information hiding?
To assess the understandability of a model, analogous to E2 and R2 , we rely on
measuring the mental effort, accuracy and duration required for answering a question
(mental effort, accuracy and duration are elaborated in detail in Paragraph Response
Variables). Again, we use the term flat for a model that is not modularized. In
addition, we assume that business processes are available as a modularized version
modelmod and a flat version modelflat . In this way, the hypotheses associated with
RQ13 can be postulated as follows:
Hypothesis H10 Questions that are influenced by abstraction, but are not influenced by fragmentation, require a lower mental effort in modelmod .
Hypothesis H11 Questions that are influenced by abstraction, but are not influenced by fragmentation, yield a higher accuracy in modelmod .
Hypothesis H12 Questions that are influenced by abstraction, but are not influenced by fragmentation, require less time in modelmod .
Finally, RQ14 explores postulated negative effects of modularization. In particular, RQ14 investigates whether fragmentation, i.e., splitting attention and integration
of sub–processes, decreases understandability:
Research Question RQ14 Is the understanding of a Declare–based model negatively influenced by fragmentation?
Analogous to RQ13 , we also look into mental effort, accuracy and duration for the postulation of hypotheses associated with RQ14 (a compact formalization of H10 to H15 is sketched after the hypotheses below):
Hypothesis H13 Questions that are influenced by fragmentation, but are not influenced by abstraction, require a higher mental effort in modelmod .
Hypothesis H14 Questions that are influenced by fragmentation, but are not influenced by abstraction, yield a lower accuracy in modelmod .
Hypothesis H15 Questions that are influenced by fragmentation, but are not influenced by abstraction, require more time in modelmod .
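For illustration, the direction of the postulated effects can be summarized compactly; the notation (ME, ACC and DUR for mental effort, accuracy and duration, and Q_abs and Q_frag for the questions designed to be influenced by abstraction only and fragmentation only) is introduced here solely for this sketch and is not used elsewhere in the thesis:

% Illustrative restatement of H10-H15 (notation introduced here only).
\begin{align*}
\text{H}_{10}\text{--}\text{H}_{12}\ (q \in Q_{\mathit{abs}}):\quad & \mathit{ME}_{\mathit{mod}}(q) < \mathit{ME}_{\mathit{flat}}(q),\quad \mathit{ACC}_{\mathit{mod}}(q) > \mathit{ACC}_{\mathit{flat}}(q),\quad \mathit{DUR}_{\mathit{mod}}(q) < \mathit{DUR}_{\mathit{flat}}(q)\\
\text{H}_{13}\text{--}\text{H}_{15}\ (q \in Q_{\mathit{frag}}):\quad & \mathit{ME}_{\mathit{mod}}(q) > \mathit{ME}_{\mathit{flat}}(q),\quad \mathit{ACC}_{\mathit{mod}}(q) < \mathit{ACC}_{\mathit{flat}}(q),\quad \mathit{DUR}_{\mathit{mod}}(q) > \mathit{DUR}_{\mathit{flat}}(q)
\end{align*}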
Subjects
The population examined in E3 are all persons that need to interpret declarative
process models, e.g., process modelers and system analysts. To ensure that measured
differences are caused by the impact of modularization rather than by unfamiliarity
with declarative process modeling, subjects need to be sufficiently trained. Even
though we do not require experts, subjects should have a good understanding of
declarative processes’ principles.
Factor and Factor Levels
Our experiment employs a two–factorial design with factor modularization (factor
levels flat and modularized ) and factor question type (factor levels abstraction and
fragmentation). The elaboration of process models with/without sub–processes realizes factor modularization; questions formulated according to the framework from
Section 5.3.2 realize factor question type, as detailed in Paragraph Objects.
Objects
The objects of this experimental design are eight declarative process models that were created as follows. The starting point was four declarative process models,
which were taken from a case study on declarative business process modeling [95],
i.e., from process models describing real–world processes. To make these four models, subsequently referred to as M1 to M4 , amenable for E3 , they underwent the
following steps. First, the models were translated to English (the case study was
conducted in German). Second, inevitable errors occurring in modeling sessions,
such as spelling errors, were corrected. Third, the process models from the case study had been created without the support of sub–processes; hence, a second variant of each process was created that describes the same process, but makes use of sub–processes, leading to
a total of eight declarative process models. As summarized in Table 5.27, the process
models were chosen such that the number of activities and number of constraints
vary. In particular, M1 and M2 have, compared to M3 and M4 , a small number
of activities. In addition, all processes have a different number of constraints. The
number of activities varies between the flat and modularized model, as complex activities had to be introduced in the modularized models. Similarly, the number of
constraints varies, as the processes had to be modeled slightly differently. Since this is
the first study investigating sub–processes in declarative models, we decided to keep
the model’s complexity rather low. In particular, we ensured that not too many
different types of constraints (at most 8) and sub–processes (at most 3) were used.
Likewise, we decided for a maximum nesting level of 1, i.e., none of the sub–processes
referred to another sub–process.
The experiment’s questions are designed as follows. First, for each model, the
subject is asked to describe the process model. The idea of this step is to make
the subject familiar with the process model and to minimize learning effects in
the upcoming questions. In addition, by letting subjects freely describe a process
model, we intend to get further insights into how well models are understood. Second,
for each model 4 categories of representative questions are designed. In particular,
the questions are based on available constraint types [175], i.e., existence, negation and ordering.
                         M1              M2              M3              M4
                      flat   mod.     flat   mod.     flat   mod.     flat   mod.
Activities              11     13        8      9       23     26       23     26
Constraints             19     21        7      9       30     28       45     44
Constraint types         8      8        4      4        7      7        5      5
Sub–processes            –      2        –      1        –      3        –      3
Nesting level            –      1        –      1        –      1        –      1
Domain                Software         Teaching        Electronic      Buying an
                      development                      company         apartment

Table 5.27: Process models used in E3
In addition, trace questions, i.e., whether an execution trace is valid,
are asked to combine aspects of different constraints. For each category of questions, a pair of questions is designed according to the understandability framework
from Section 5.3.2. The first question is designed to profit from abstraction, but not to be impaired by fragmentation. Hence, the question should be easier to answer in the modularized model than in the flat model. The second question, in turn, is designed to not profit from abstraction, but to be impaired by fragmentation. Hence, the question should be easier to answer in the flat model. All in all, for each model 9 questions are provided—the first one looking into the general understanding of declarative processes, the remaining 8 questions alternately operationalizing positive and negative effects of modularization. Finally, it is ensured
that the information provided in the process models is sufficient to answer all questions. In other words, no background knowledge is required for answering questions,
as recommended in [171].
Response Variables The primary response variable of this experimental design is
the level of understanding that subjects display with respect to the process models.
To measure understanding, similar to E2 and R2 , we measure mental effort, accuracy
and duration. For measuring mental effort, we rely on 7–point rating scales, asking
subjects to rate mental effort from Very low (1) over Medium (4) to Very High (7).
As discussed in Section 3.2.3, adopting rating scales for measuring mental effort is
known to be reliable and is widely adopted. Response variable accuracy, in turn,
is defined as the ratio of correct answers divided by the total amount of answers.
Hence, an accuracy of 0.0 means that all questions were answered incorrectly, while
a value of 1.0 indicates that all questions were answered correctly. Then, duration
is defined as the time required for answering a question, measured in seconds, i.e., the number of seconds required for reading the question, interpreting
the process model and giving the answer. In addition, think–aloud protocols are
collected to get insights into the subject’s reasoning processes, e.g., for analyzing
errors and their underlying causes in detail. To this end, similar to replication R2 ,
we adopt principles from grounded theory [35, 234]. In particular, first, in the open
coding phase we mark parts of think–aloud protocols that relate to incorrect answers
and try to determine potential reasons for the mistakes. Once all errors are categorized, the think–aloud protocols are revisited for checking whether the classification is consistent. If categories are found to be inconsistent, they are adapted accordingly. This
procedure is repeated until the classification is stable, i.e., none of the categories is
changed. Second, in the axial coding phase classifications are repeatedly aggregated
to higher–level categories.
Experimental Design
In order to investigate RQ11 to RQ14 , we adopt a combination of qualitative and
quantitative research methods, as detailed in the following.14 The experiment’s
overall process is outlined in Figure 5.13a: First, subjects are randomly, but evenly,
assigned to Group 1 or Group 2. Before starting with data collection, subjects are
informed that due to the adoption of think–aloud anonymous data collection is not
possible. However, it is ensured to subjects that all personal information is kept
confidential at any time. Then, regardless of the group assignment, demographical
data is collected and subjects are presented with introductory assignments. To
support subjects in their task, sheets briefly summarizing the constraints’ semantics
are distributed. Data gathered during the introduction is not used for analysis.
Rather, the introductory tasks allow subjects to familiarize themselves with the
type of tasks to be performed—ambiguities can be resolved at this stage without
influencing the actual data collection.
After the familiarization phase, subjects are confronted with the actual models
designed for data collection. As shown in Figure 5.13a, four declarative business
processes are used; each of them once modeled with the use of sub–processes and once
modeled without sub–processes (the processes are described in detail in Paragraph
Objects). Those four pairs of process models are then distributed between Group 1
and Group 2 such that subjects are confronted with modularized models and flat
models in an alternating manner, cf. Figure 5.13a.
14 The experimental material can be downloaded from: http://bpm.q-e.at/experiment/ModularizationDeclarative
[Figure 5.13: Experimental design of E3 [303]. (a) Overview: Group 1 (n/2 subjects) works on demographics and introduction, then on M1 flat, M2 modularized, M3 flat, M4 modularized; Group 2 (n/2 subjects) works on demographics and introduction, then on M1 modularized, M2 flat, M3 modularized, M4 flat. (b) Questions per model: describe the process model, then 2 trace, 2 existence, 2 negation and 2 ordering questions. (c) Tasks per question: answer the question, assess mental effort, justify mental effort.]
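The counterbalanced assignment of model variants sketched in Figure 5.13a can also be expressed compactly, as in the following sketch; the snippet only illustrates the design (in the experiment, subjects were assigned randomly, but evenly, to the two groups) and all names are introduced here purely for illustration, not taken from any tooling used in E3 .

# Illustrative sketch of the counterbalanced design in Figure 5.13a: both groups see all
# four processes, with the modularized and flat variants alternating between the groups.
GROUP_1 = {"M1": "flat", "M2": "modularized", "M3": "flat", "M4": "modularized"}
GROUP_2 = {"M1": "modularized", "M2": "flat", "M3": "modularized", "M4": "flat"}

def variant_plan(group: int) -> dict:
    """Return the model-variant plan for a group (1 or 2)."""
    return GROUP_1 if group == 1 else GROUP_2

# In E3, subjects were assigned randomly, but evenly, to the groups; a fixed alternation
# is used here only to keep the sketch deterministic.
for subject_index in range(4):
    group = 1 if subject_index % 2 == 0 else 2
    print(f"Subject {subject_index + 1} -> Group {group}: {variant_plan(group)}")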
As detailed in Figure 5.13b, for each model the same procedure is used. First,
the subject is asked to describe what the process is intended to achieve. Second,
the subject is confronted with four pairs of questions which were designed to representatively cover modeling constructs of a declarative process modeling language
(details are presented in Paragraph Objects). For each of the questions, in turn,
a three–step procedure is followed, cf. Figure 5.13c. First, the subject is asked to
answer a question about the model either by Yes, No or Don’t Know. We award one
point for each correct answer and zero points for a wrong answer (including Don’t
Know ). We deliberately introduced the option Don’t Know, as otherwise subjects
would be forced to guess. Second, the subject is asked to assess the expended mental
effort. Third, the subject is asked to explain why they indicated a certain mental effort.
Throughout the experiment, subjects are asked to constantly voice their thoughts,
i.e., to think–aloud, allowing for a detailed analysis of reasoning processes [64].
Instrumentation and Data Collection Procedure
For each question, subjects receive separate sheets of paper showing the process
model, allowing them to use a pencil for highlighting or taking notes. In addition to
recording audio, video recording is used, as video has proven useful to resolve unclear
situations in think–aloud protocols (cf. [302]). Hence, besides collecting quantitative
data in terms of answering questions by ternary choices (Yes, No, Don’t Know ) and
measuring mental effort on a 7–point rating scale, qualitative data in terms of think–
aloud protocols is gathered.
5.5.2 Performing the Experiment (E3 )
Based on the experimental setup described in Section 5.5.1, the controlled experiment (E3 ) was conducted. Aspects regarding the preparation and operation of E3 ,
as well as subsequent data validation and data analysis, are covered in the following.
Experimental Operation of E3
Experimental Preparation  Preparation for the experiment included the elaboration of process models, associated questions and the demographic survey. In addition, we prepared material introducing subjects to the tasks to be performed. In
case subjects required clarification of a constraint’s semantics, we prepared sheets
briefly summarizing the semantics of all involved constraints. To ensure that the
material and instructions were comprehensible, we piloted the study and iteratively
refined the material. Finally, models and questions were printed, and the audio devices and video camera were checked for operability. In parallel, subjects were acquired, and
if necessary, trained in declarative process modeling.
Experimental Execution The experiment was conducted in July 2012 in two locations. First, seven subjects participated at the University of Ulm, followed by two
additional sessions at the University of Innsbruck, i.e., a total of nine subjects participated. To ensure that subjects were sufficiently familiar with declarative process
modeling, all subjects were provided with training material that had to be studied.
Each session was organized as follows: In the beginning, the subject was welcomed to
the experiment and instructed to speak thoughts out aloud. Since the experimental
material consisted of over 100 sheets of paper containing process models and questions,
we needed to ensure that subjects were not distracted by the extent of material to
be processed. To this end, one supervisor was seated left to the subject, a second
supervisor to the right and the sheets containing the experimental material were
then passed from the left to the subject. As soon as the subject had finished the
task, they passed the sheets on to the supervisor to the right. Hence, no more than
a handful of sheets were presented to subjects at once. Meanwhile, a video camera
video–recorded the subject’s activities and audio–recorded any uttered thoughts. At
the end of each session, a discussion followed in order to help subjects reflect on the
experiment and to provide us with feedback.
Data Analysis of E3
So far, we have focused on the experimental design as well as experimental execution
of E3 . Next, we report from data validation and data analysis; for all statistical
analyses we relied on SPSS, Version 21.0.
Data Validation One session was performed per subject, hence we could easily
ensure that the experimental setup was obeyed. In addition, we screened whether
subjects fitted the targeted profile, i.e., were familiar with Business Process Management in general and Declare in particular; results are summarized in Table 5.28.
Demographical questions 1–6 concerned the modeling proficiency of subjects, i.e., familiarity with and confidence in understanding and modeling Declare as well as general BPM experience. Questions
7–9, in turn, asked for details of the models subjects had created or analyzed. Finally, questions 10–13 regarded the domain knowledge of subjects. Thereby, we
employed a 7–point rating scale for questions 1–3 and questions 10–13, ranging from
Strongly disagree (1) over Neutral (4) to Strongly agree (7). Subjects indicated
that they were rather unfamiliar with Declare (M = 3.78, SD = 1.47), but felt
confident in understanding Declare (M = 4.11, SD = 1.45) and modeling Declare
(M = 4.11, SD = 1.37). Also, subjects indicated that they had on average 4.94 years
of experience in BPM in general (SD = 1.50). Furthermore, subjects indicated that
they had received formal training (M = 0.89 days, SD = 1.59) and self–education
(M = 30.33 days, SD = 44.69) during the last year. Questions 7–9 indicate that subjects were experienced in reading process models (M = 75.56, SD = 70.73) and creating process models (M = 24.00, SD = 28.29), however, not necessarily experienced
with large process models (M = 17.67 activities per model, SD = 12.16). Finally,
we assessed whether subjects were familiar with the domains of the process models
used in E3 , i.e., model M1 to model M4 . Subjects indicated that they were rather
familiar with software development (M1 ; M = 5.78, SD = 0.79) and familiar with
teaching (M2 ; M = 5.56, SD = 0.96), but rather unfamiliar with electronic companies (M3 ; M = 3.00, SD = 1.56) and buying apartments (M4 ; M = 3.56, SD = 1.71).
Finally, we assessed the subjects’ professional background: all subjects indicated an
academic background.
Up to now, we have discussed the design and execution of the empirical study and
looked into the demographical data. In the following, we use the gathered data to
investigate RQ11 to RQ14 .
                                           Min.    Max.       M      SD
 1. Familiarity with Declare                  2       6    3.78    1.47
 2. Confidence understanding Declare          2       6    4.11    1.45
 3. Confidence modeling Declare               2       6    4.11    1.37
 4. Years of modeling experience              2       7    4.94    1.50
 5. Days of formal training last year         0       5    0.89    1.59
 6. Days of self–education last year          4     150   30.33   44.69
 7. Process models read last year            10     250   75.56   70.73
 8. Process models created last year          5     100   24.00   28.29
 9. Avg. number of activities per model       5      50   17.67   12.16
10. Familiarity software development          4       7    5.78    0.79
11. Familiarity teaching                      4       7    5.56    0.96
12. Familiarity electronic companies          1       6    3.00    1.56
13. Familiarity buying apartments             1       6    3.56    1.71

Table 5.28: Demographical statistics of E3
RQ11 : Do Process Modelers Understand the Semantics of Sub–Processes in
Declare–Based Process Models?
Using modularization means to abstract certain parts of a declarative process model
by means of sub–processes. However, as soon as the content of a sub–process
is of concern, the sub–process has to be integrated back into the parent process, as
described in Section 3.1.2. For a declarative process model, this implies that the
semantics of constraints referring to the sub–process and constraints within the sub–
process have to be combined. As argued, this task might not be trivial, hence in
RQ11 we investigate whether modelers are basically able to perform this integration
task.
In the following, we approach RQ11 in two steps. First, we classify questions with
respect to correctness, i.e., whether a question was answered correctly. Then, we
turn toward the think–aloud protocols to investigate and classify error sources, as
described in Section 5.5.1. Similar to the classifications performed for the case study
in Section 4.7 and replication R2 in Section 5.4.3, this classification was performed
by a single person, i.e., the author. As illustrated in Figure 5.14, in total 288 questions were asked in this experiment (9 subjects * 4 models * 8 questions = 288).
In the following, we inspect the upper branch in which questions asked for modularized models are summarized. In total, 144 questions were asked for modularized
models, of which 133 (92.3%) were answered correctly and 11 (7.7%) were answered
incorrectly. Apparently, fewer questions were answered incorrectly in flat models: 4
out of 144 (2.8%). However, when looking into error sources, it becomes clear that
modularization is responsible only for a fraction of incorrect answers. In particular,
4 (2.8%) errors could be traced back to integration of constraints, i.e., when subjects
had to combine the semantics of several constraints in order to answer a question.
Another 1 (0.7%) question was answered incorrectly due to an ambiguous wording,
i.e., the subject misunderstood the wording of a question. 2 (1.4%) questions were
answered incorrectly due to insufficient knowledge about declarative process models.
Finally, 4 (2.8%) questions could be traced back to the presence of modularization,
i.e., were answered incorrectly because subjects did not properly understand the
meaning of constraints in sub–processes in the context of the parent process. In
other words, in these cases subjects had troubles understanding the semantics of the
sub–process.
[Figure 5.14: Distribution of errors, adapted from [303]. Of 288 questions in total, 144 concerned modularized models: 133 (92.3%) were answered correctly and 11 (7.7%) incorrectly (integration of constraints: 4, integration of sub–processes: 4, ambiguous question: 1, lack of knowledge: 2). The other 144 concerned flat models: 140 (97.2%) were answered correctly and 4 (2.8%) incorrectly (integration of constraints: 2, ambiguous question: 1, lack of knowledge: 1).]
The main findings are hence as follows. First, modelers of average familiarity with
Declare (cf. Table 5.28) are reasonably capable of interpreting Declare models, as
indicated by the fact that 273 out of 288 (94.8%) questions were answered correctly.
Second, the collected data indicates that modelers are capable of interpreting modularized models (133 out of 144 questions correct, 92.3%); only 4 questions (2.8%) were
answered incorrectly due to modularization. Therefore, we conclude that averagely
trained modelers are able to interpret modularized declarative process models—
however, modularization might also be a potential error source. This finding is also
in–line with the framework presented in Section 5.3.2, i.e., modularization is feasible,
but has to be applied carefully.
Besides showing that modularization is feasible, these findings are also relevant for
declarative process models in general. In particular, in Section 4.4.1, we discussed
that process models with a large number of constraints are hard to understand, as
the modeler has to keep track of all constraints. When analyzing the distribution of
errors in Figure 5.14, this assumption is further substantiated. In particular, without considering errors committed due to modularization, 11 errors were committed in total. Thereof, 5 errors can be attributed to problems with the experimental execution, i.e., in 2 cases a question was worded ambiguously and in a further 3 cases the subject was hampered by a lack of knowledge about declarative process models. The remaining 6 errors were classified as “integration of constraints”, i.e., cases in which subjects
failed to integrate the semantics of several constraints. Hence, it can be concluded
that problems in understanding are not caused by single constraints, but rather the
interplay of several constraints seems to pose a significant challenge. Given this
finding, it seems plausible that the computer–based, automated interpretation of
constraints can lead to significant improvements in the maintenance of declarative
process models, as described in Chapter 4. Having established that modelers are
able to understand the semantics of sub–processes, we now turn to the question in
how far the adoption of modularization generates positive effects.
RQ12 : Is the Understanding of a Declare–Based Model Positively Influenced
by Pattern Recognition?
In Section 5.3.2 we argued that modularization supports the modeler in understanding a process model. In the following, we approach this research question in two
steps. First, we use think–aloud protocols to identify patterns in understanding
declarative process models. Then, we analyze in how far sub–processes support
this process of understanding and how it relates to the understandability framework
presented in Section 5.3.2.
As described in Section 5.5.1, we asked participating subjects to voice their
thoughts. For the investigation of RQ12 , we transcribed the recorded audio files
and analyzed how subjects handled the question in which they were asked to describe the processes’ behavior. The analysis showed that, regardless of whether
sub–processes were present or not, subjects described the process in the order activities were supposedly executed, i.e., tried to describe the process in a sequential
way. Hence, as a first step, subjects skimmed over the process model to find an entry
point where they could start with describing the process: “Ok, this is the, this is
the first activity because it has this init constraint”. Interestingly, subjects seemed
to appreciate when a clear starting point for their explanations could be found: “it
is nice that we have an init activity, so I can start with this”. A declarative process
model, however, does not necessarily have a unique entry point, apparently causing
confusion: “Well. . . gosh. . . I’ve got no clue where to start in this model” 15 . After
having identified an entry point, subjects tried to figure out in which order activities
are to be executed: “And after given duties to the apprentices there should come
these two tasks”. Finally, subjects indicated where the process supposedly ends:
“the process ends with the activity give lessons”.
The sequential way of describing the process models is rather surprising, as it
is known that declarative process models convey circumstantial information, i.e., overall conditions that produce an outcome, rather than sequential information, i.e.,
how the outcome is achieved [69, 70]. In other words, in an imperative model, sequences are made explicit, e.g., through sequence flows in BPMN. In a declarative
process model, however, such information might not be available at all. For instance,
the coexistence constraint (cf. Table 3.1) defines that two activities must occur in
the same process instance (or do not occur at all)—the ordering of the activities is
not prescribed. As subjects still rather talked about declarative process models in a
sequential manner, it appears as if they preferred this kind of information. Interestingly, similar observations could be made in the case study investigating declarative
process modeling described in Section 4.7. Therein, sequential information, such as
“A before B” or “then C” was preferred for communication.
With respect to this work, the question is in how far sub–processes can support
modelers in making sense of the process model. Given that modelers apparently seek
a sequential way of describing the process model, it seems likely that the task of
describing a model gets harder for large models, as the modeler cannot just follow
sequence flows as in BPMN models, but has to infer which activity could be executed
next. Hence, the more activities are present, the more possibilities the modeler has
to rule out. Conversely, sub–processes reduce the number of activities per (sub–
)model, hence simplifying this task. In order to see whether empirical evidence
for this claim can be found, we analyzed the mental effort required for describing
process models. During our analysis, we saw that each subject showed a different
base–level of mental effort. Hence, a comparison of absolute values of mental effort
will be influenced by different base levels. To cancel out this influence and to make
mental effort comparable between subjects, we base our analysis on the relative
15 We allowed subjects to choose their preferred language to avoid unnecessary language barriers. The original quote was uttered in Tyrolean dialect: “jå Oiski! Poiski! Då woas ma jå nit wo ånfangn bei dem bledn Modell”. To improve the comprehensibility of the thesis, we translated the quote to English.
mental effort, i.e., the mental effort expended for answering a question divided by
the average mental effort expended by this subject for answering a question about
a process model. Thus, for instance, a value of 0.78 indicates that the subject
expended 78% of the average mental effort. Contrariwise, a value of 2.00 indicates
that the task was twice as hard as an average task in terms of mental effort.
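Written as a formula (the notation is introduced here only for illustration), the relative mental effort of subject s for question q is:

\[
\mathit{RME}_{s,q} \;=\; \frac{\mathit{ME}_{s,q}}{\tfrac{1}{|Q_s|}\sum_{q' \in Q_s} \mathit{ME}_{s,q'}}
\]

where Q_s denotes the set of all questions answered by subject s; a value of 0.78 thus corresponds to 78% of that subject's average mental effort.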
When comparing the relative mental effort required for describing flat models
(M = 1.68, SD = 0.72) and modularized models (M = 1.63, SD = 0.72), however,
differences turned out to be marginal (0.05). Nevertheless, this result does not contradict the assumption that sub–processes can improve understanding. Rather, we postulated that sub–processes will lower mental effort only for process models of a certain size.
Indeed, if the same analysis is performed for the larger models (M3 and M4 ),
the difference with respect to relative mental effort between flat models (M =
1.93, SD = 0.93) and modularized models (M = 1.55, SD = 0.37) increases to
0.38, i.e., modularized models are easier to understand. Likewise, for the small models, the difference between flat models (M = 1.43, SD = 0.28) and modularized models (M = 1.72, SD = 0.50) amounts to −0.29, i.e., modularized models are harder to understand. These findings are in line with the framework presented in Section 5.3.2: while large models apparently benefit from abstraction, small models are rather impaired by fragmentation.
So far, we have discussed how sub–processes influence modelers in establishing
an understanding of a declarative process model. In the following, we investigate
to what extent the recognition of patterns can support the process modeler. To this end, we now turn to results obtained from M3, as particularly interesting insights
could be found for this process model. M3 captures procedures from a company
selling electronic devices: After having completed initial tasks, employees either
supervise apprentices, handle incoming goods or deal with customer complaints—
in the modularized model, these three procedures are modeled as sub–processes.16
Unsurprisingly, all subjects that received the modularized model recognized these
sub–processes. Interestingly, also all subjects that received the flat model described
the same sub–processes. However, in contrast to subjects that received modularized
models, it took them considerably longer to understand that the model could be
partitioned this way. In order to visualize this relation, we assessed at which point in
time subjects mentioned those sub–processes for the first time. In order to eliminate
fluctuations such as talking speed, we refrained from looking into absolute duration.
Rather, we computed the ratio of the time needed for recognizing the sub–processes
divided by the total duration spent for describing the process model. As illustrated
in Figure 5.15, subjects confronted with the flat model tended to recognize the sub–processes for the first time only toward the end of the task, while subjects confronted with modularized models recognized the sub–processes earlier. In particular, for flat models, subjects mentioned the sub–processes only after having expended 62% of the total time. For modularized models, the average ratio dropped to 17%.

16 Due to size, the process models cannot be reproduced here meaningfully, but can be accessed through: http://bpm.q-e.at/experiment/ModularizationDeclarative
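The ratio itself is a simple per–subject division; the following sketch (hypothetical timings, not the original analysis) illustrates how the reported percentages are obtained.

# Illustrative sketch (hypothetical data): fraction of the total description time
# that elapsed before the sub-processes were mentioned for the first time.
def average_first_mention_ratio(observations):
    # observations: list of (first_mention_seconds, total_description_seconds) pairs,
    # one pair per subject.
    ratios = [first / total for first, total in observations]
    return sum(ratios) / len(ratios)

# Two flat-model subjects mentioning the sub-processes only late in their descriptions:
print(average_first_mention_ratio([(186, 300), (150, 240)]))  # roughly 0.62, i.e., 62%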
Even though the data indicates that sub–processes could be identified earlier, the
question remains why sub–processes were not identified immediately. The answer
to this question can be found in the way subjects described the process models: All
subjects described the process in the order activities were supposedly executed. As
the sub–processes were to be executed after some initial tasks were performed, subjects first described the initial tasks and then the sub–processes. Still, two different
patterns could be observed. Subjects who received the modularized models mentioned the sub–processes and then described their content. Subjects who received flat models rather described the entire model first and stated only toward the end that the model could actually be split according to these sub–processes.
Figure 5.15: Duration until first mentioning of sub–processes [303] (one horizontal bar per subject, labeled Flat or Modularized, on a scale from 0% to 100% of the total description time)
Obviously, it is not surprising that subjects mentioned sub–processes earlier in
modularized models as sub–processes were explicitly represented. However, when
looking into mental effort, similar observations can be made. For flat models a relative mental effort of 2.00 (200%) was computed, for modularized models it dropped
to 1.53 (153%)—providing further evidence that modularization was beneficial in
this case.
Even though these observations provide empirical evidence for the positive influence of pattern recognition for M3 , no indications of pattern recognition could be
found in M1 , M2 and M4 . As indicated in the first part of this research question,
the size of a model has an impact on whether modularization is helping or rather
interfering. Likewise, it can be expected that a certain model size is required for
pattern recognition, explaining why no effects could be found for M1 and M2 . This,
however, does not explain why subjects did not identify sub–processes in M4 —a
potential explanation for this difference can be found in its structure. In particular, the process is to a large extent modeled with precedence constraints, i.e., constraints that restrict the ordering of activities. Hence, subjects could use these
constraints to move through the process model in a sequential way. For M3 , however, such a behavior was not possible, as it also consists of constraints that do not
convey any sequential information at all (e.g., the not coexistence constraint, cf.
Table 3.1). Hence, subjects were forced to approach the process model differently.
Apparently, the strategy was to divide the process model into parts that could be
tackled sequentially—resulting in the described sub–processes.
To summarize, the collected data indicates that sub–processes appear to negatively influence the overall understanding of rather small modularized declarative
process models, but improve understanding if model size increases. In addition,
subjects seemed to approach process models in a sequential manner. When this
was not possible, subjects apparently tried to divide the process model into manageable, potentially sequential chunks. For modularized models, these divisions could
directly be perceived in form of sub–processes, supporting the overall understanding
of the process model.
RQ13 : Is the Understanding of a Declare–Based Model Positively Influenced
by Information Hiding?
Besides fostering the recognition of patterns, we argued that information hiding,
i.e., using sub–processes to abstract from their content, will support modelers. In
particular, removing information irrelevant for conducting the task at hand will
presumably result in a lower mental effort and consequently in higher performance.
To investigate this claim, we elaborated questions that could be answered without
looking into sub–processes. In order to investigate this research question, we first
approach it from a quantitative angle, i.e., we analyze the mental effort, accuracy
and duration of questions. Then, we take a qualitative point of view and inspect
the think–aloud protocols for evidence of information hiding.
For the investigation of RQ12 , we introduced the notion of relative mental effort,
i.e., the mental effort of a question in relation to the average mental effort required.
To approach RQ13, we additionally introduce the notion of relative duration. As described in Section 5.5.1, we employ think–aloud techniques, i.e., subjects constantly
voice their thoughts. During the execution of E3 we observed that subjects have
different preferences regarding the level of detail provided in their answers. Hence,
the durations required for answering questions differ not only with respect to the difficulty of the question, but also depend on personal inclination. As an effort
to counterbalance this effect, we computed the relative duration for each question,
i.e., we divided the duration required for answering a question by the time required
for answering all questions of a process model. The quantitative analysis of RQ13 is
similarly structured to the analyses conducted in E2 and R2 . In particular, we computed the average relative mental effort (H10 ), accuracy (H11 ) and relative duration
(H12 ) for abstraction questions as well as for each model (M1 to M4 ). However,
unlike for E2 and R2, we did not apply statistical tests for model–wise comparisons,
since the groups would be far too small for a meaningful statistical test.17 Likewise,
we refrain from reporting individual values of questions.
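Following the definition given above, relative duration can be computed analogously to relative mental effort; the sketch below uses hypothetical durations and only illustrates the normalization.

# Illustrative sketch (hypothetical data): relative duration, i.e., the time spent on
# one question divided by the time spent on all questions of the same process model.
def relative_durations(durations_in_seconds):
    total = sum(durations_in_seconds)
    return [duration / total for duration in durations_in_seconds]

# Four questions answered in 40 s, 80 s, 60 s and 20 s:
print(relative_durations([40, 80, 60, 20]))  # [0.2, 0.4, 0.3, 0.1]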
Hypothesis           Mod.   Flat   Δ       Z       p        r
H10: Mental effort   0.98   1.05   −0.07   −2.07   0.038a   0.69
H11: Accuracy        0.96   0.99   −0.03   −1.41   0.157    0.47
H12: Duration        1.09   1.13   −0.04   −0.77   0.441    0.26
a significant at the 0.05 level

Table 5.29: Results for abstraction questions
The results of this investigation are summarized in Table 5.29. Similar to E2 and
R2 , the table lists hypotheses, values for modularized as well as flat models and differences thereof. Then, columns five to seven list results from applying statistical tests.
In particular, we applied the Wilcoxon Signed–Rank Test, as our sample is relatively small (cf. [179]).18 Mental effort (H10) was statistically significantly lower for modularized models (Z = −2.07, p = 0.038, r = 0.69),
whereas no statistically significant differences could be found for accuracy (H11 , Z =
−1.41, p = 0.157, r = 0.47) and duration (H12 , Z = −0.77, p = 0.441, r = 0.26).
Summarizing, the data indicates that information hiding decreases mental effort—
however, no statistically significant influence with respect to accuracy and duration
could be observed. When computing values for mental effort, accuracy and duration
for each individual model, as shown in Table 5.30, the same pattern can be observed.
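As an illustration of the statistical procedure (hypothetical paired data; scipy is assumed to be available, and this is not the original analysis script), the test and the effect size r = |Z|/√N can be computed as follows:

# Illustrative sketch (hypothetical data): Wilcoxon Signed-Rank Test on paired relative
# mental effort values, together with the effect size r = |Z| / sqrt(N).
import math
from scipy.stats import wilcoxon

modularized = [0.95, 1.02, 0.90, 1.01, 0.97, 0.99, 0.93, 1.05, 0.96]
flat        = [1.10, 1.05, 1.00, 1.12, 0.99, 1.08, 1.00, 1.09, 1.04]

statistic, p_value = wilcoxon(modularized, flat)   # two-sided test on the paired differences
n = len(modularized)
# Normal approximation of the test statistic to obtain Z (no tie or continuity correction).
mu = n * (n + 1) / 4
sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (statistic - mu) / sigma
r = abs(z) / math.sqrt(n)
print(f"W = {statistic}, p = {p_value:.3f}, Z = {z:.2f}, r = {r:.2f}")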
17 In total, 9 subjects participated in E3 and each subject only worked on the modularized version or the flat version of a model. Hence, when analyzing models individually, the group size is 4 and 5 subjects, respectively.
18 For the sake of completeness, tests for normal distribution can be found in Appendix A.5.
While the influence on mental effort appears to be positive and rather consistent,
values for accuracy and duration seem to fluctuate.
Hypothesis           Model   Mod.   Flat   Δ
H10: Mental effort   M1      1.01   0.97    0.04
                     M2      0.90   0.98   −0.08
                     M3      1.01   1.14   −0.13
                     M4      1.01   1.10   −0.09
H11: Accuracy        M1      0.94   1.00   −0.06
                     M2      1.00   1.00    0.00
                     M3      0.94   1.00   −0.06
                     M4      0.95   0.94    0.01
H12: Duration        M1      1.13   1.07    0.06
                     M2      0.91   1.16   −0.25
                     M3      1.10   1.12   −0.02
                     M4      1.21   1.17    0.04

Table 5.30: Results for abstraction questions (per model)
Considering the statistically significant results for mental effort and knowing that
mental effort and accuracy as well as mental effort and duration correlate (cf. Section 5.4), it seems surprising that no statistically significant differences with respect
to accuracy and duration could be found. In the following, we first discuss potential explanations regarding accuracy and then turn toward duration. Basically, the
high overall accuracy (0.97) and the low standard deviation (0.08) indicate that the
lack of significant differences could be attributed to the ceiling effect [263]. In other
words, the questions were not hard enough or the models were too small to cause a
substantial amount of errors, resulting in low fluctuations of accuracy. In fact, the
average mental effort was 3.43, i.e., between Low mental effort and Neither high nor
low mental effort. Furthermore, it was argued that accuracy is a less sensitive measure than mental effort [298] and thus larger samples are required to show statistically
significant differences. Thus, it seems likely that the lack of significant differences
with respect to accuracy can be traced back to the rather low sample size and the
low complexity of tasks. Regarding duration, we argued that results will depend on
whether subjects spend time detailing their explanations. We tried to minimize this influence by computing the relative duration; however, it cannot be excluded
that the lack of significant results is caused by this interference.
Up to now, we have focused on quantitative data for investigating the influence
of information hiding. In the following, we turn to the think–aloud protocols and
video recordings, discussing qualitative evidence for the utilization of information
hiding. In particular, regardless of whether sub–processes were present or not, a
two–step procedure could be observed. In the first step, subjects identified all activities relevant for answering a question. Apparently depending on personal preference, subjects used a pencil to highlight these activities, or simply placed a finger
on the paper. In cognitive psychology this is referred to as external memory (cf.
Section 3.2.3): the information about which activities have to be considered for answering the question is stored externally instead of taking up the human mind’s working
memory. In the second step, subjects performed the reasoning, i.e., interpreted the
constraints relevant for these activities. Interestingly, after step 1 was performed,
we could observe subjects actively pursuing information hiding. In particular, in
modularized models, sheets of paper that contained sub–processes irrelevant for the question at hand were removed, e.g., “I don’t need this here I think”. A similar
pattern could be observed in the flat models: after having identified which parts of
the model are relevant for answering the question at hand, subjects followed various
strategies for hiding irrelevant information. For instance, a hand was used to cover
up irrelevant parts of the model (“this part of the model cannot be performed”) or
the relevant part of the models was highlighted: “cannot occur, since I’ve got here
some kind of partial process” 19 . Hence, we conclude that information hiding appears to be a strategy that is intuitively followed by subjects. Interestingly, also for
flat models, where all information is present at once, subjects emulated information
hiding by covering up irrelevant parts of the model. Still, as indicated in Table 5.29
and Table 5.30, information hiding seems to be more pronounced in modularized models than in flat models.
RQ14 : Is the Understanding of a Declare–Based Model Negatively Influenced
by Fragmentation?
After having provided empirical evidence that modelers are basically able to understand modularization (RQ11) and for the positive influence of sub–processes (RQ12 and RQ13), we now turn to the postulated negative influence. As argued in Section 5.3.2,
tasks that involve the content of several sub–processes require the modeler to mentally integrate these sub–processes, imposing a higher mental effort and leading to
lower performance. To empirically investigate this claim and similar to RQ13 , we
elaborated questions that presumably do not benefit from abstraction, but suffer
from fragmentation. Hence, such questions should be easier to answer in a flat
model, as they are not negatively influenced by modularization. Similar to RQ13, we start by approaching RQ14 from a quantitative angle and take a qualitative point of view afterwards.

19 Original quote: “cannot occur, da ich hier so’n Teilprozess hab”.
The analysis of results follows the same strategy as applied in RQ13 , i.e., we
computed the relative mental effort, accuracy and relative duration for all models. Likewise, Table 5.31 shows hypotheses, values for modularized as well as flat
models and differences thereof. Similar to RQ13, we applied the Wilcoxon Signed–Rank Test to test for statistically significant differences; results are listed in columns five to seven. In particular, mental effort (H13) was significantly higher in modularized models (Z = −2.07, p = 0.038, r = 0.69), whereas no statistically significant
differences could be found for accuracy (H14 , Z = −1.89, p = 0.059, r = 0.63) and
duration (H15 , Z = −0.77, p = 0.441, r = 0.26). Hence, similar to RQ13 , we could
provide empirical evidence for the negative influence of modularization on mental
effort, but could not show differences with respect to accuracy and duration. Also,
when looking into values computed for each model individually, a similar pattern
can be observed (cf. Table 5.32). In particular, the influence on mental effort seems
to be consistently negative in modularized models, while values for accuracy and duration
apparently fluctuate.
Hypothesis           Mod.   Flat   Δ       Z       p        r
H13: Mental effort   1.02   0.95    0.07   −2.07   0.038a   0.69
H14: Accuracy        0.89   0.96   −0.07   −1.89   0.059    0.63
H15: Duration        0.91   0.87    0.04   −0.77   0.441    0.26
a significant at the 0.05 level

Table 5.31: Results for fragmentation questions
Interestingly, a similar pattern of results as described in RQ13 could be observed.
Again, mental effort was significantly different, while no significant differences with
respect to accuracy and duration could be shown. In RQ13 we argued that results
regarding accuracy were to a certain extent caused by the ceiling effect, i.e., high
accuracy and low standard deviation. In RQ14 further evidence for this assumption
is provided. More specifically, the mean accuracy was lower (0.92 versus 0.97), while
the standard deviation increased (0.13 versus 0.08). In line with these changes, also
the p value computed for the Wilcoxon Signed–Rank Test dropped close to significance (0.059 versus 0.157). Hence, it seems likely that also for RQ14 the lack of significant
differences with respect to accuracy can be traced back to sample size and low
complexity of tasks. Nevertheless, empirical evidence for the negative influence
of modularization in terms of mental effort could be provided. With respect to
duration, we argued that the adoption of think–aloud techniques might have had a considerable influence.
Hypothesis           Model   Mod.   Flat   Δ
H13: Mental effort   M1      0.99   1.03   −0.04
                     M2      1.10   1.02    0.08
                     M3      0.99   0.86    0.13
                     M4      0.99   0.90    0.09
H14: Accuracy        M1      0.75   0.95   −0.20
                     M2      0.95   0.94    0.01
                     M3      0.81   1.00   −0.19
                     M4      1.00   0.94    0.06
H15: Duration        M1      0.87   0.93   −0.06
                     M2      1.09   0.84    0.25
                     M3      0.90   0.88    0.02
                     M4      0.79   0.83   −0.04

Table 5.32: Results for fragmentation questions (per model)
To enhance RQ14 with qualitative insights, we examined the think–aloud protocols for evidence of fragmentation. A particularly explicit case of fragmentation
could be found in question 4 from M2 . Here, subjects were asked to answer how
often Decide on teaching method, contained in sub–process Prepare lessons, could be
executed. Decide on teaching method was constrained to be executed exactly once in
the sub–process Prepare lessons. Prepare lessons, in turn, was also restricted to be
executed exactly once. Hence, subjects had to combine these two constraints to find
out that Decide on teaching method could be executed exactly once. The reasoning
process required to establish this answer can be found in a subject’s think–aloud
protocol: “yes, has to be executed exactly once. . . it is in this sub–process of prepare
lessons. Prepare lessons has to be executed exactly once and also in the sub–process
exactly once. One times one is one” 20 . As described in RQ13 , subjects first located
relevant activities and then interpreted associated constraints. In this particular
case, the subject understood that it had to combine the cardinality constraint on
Decide on teaching method with the cardinality constraint on Prepare lessons, i.e.,
had to integrate these two cardinality constraints. Even though this integration task appears especially easy (“one times one is one”), it emphasizes the problem of fragmentation: it requires the modeler to combine the semantics of (potentially) several constraints. This, in turn, was shown to be the major reason for misinterpreting declarative process models (cf. RQ11), providing further empirical evidence for the negative influence of fragmentation.

20 Original quote: “ja, muss immer genau einmal ausgeführt werden, das is in dem, es is in dem Subprozess von prepare lessons. Prepare lessons muss genau einmal ausgeführt werden und das muss in dem Subprozess genau einmal, und ein mal eins ergibt bei mir auch wieder eins”.
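The integration step itself, i.e., combining the cardinality of the sub–process with the cardinality of the activity it contains, can be made explicit in a small sketch; the helper below is hypothetical and assumes that both cardinalities are given as lower and upper bounds.

# Illustrative sketch: integrating two cardinality constraints. If the sub-process may be
# executed between p_min and p_max times, and the activity between a_min and a_max times
# per execution of the sub-process, the overall bounds are obtained by multiplication.
def combined_cardinality(sub_process_bounds, activity_bounds):
    p_min, p_max = sub_process_bounds
    a_min, a_max = activity_bounds
    return (p_min * a_min, p_max * a_max)

# "Prepare lessons" exactly once, "Decide on teaching method" exactly once within it:
print(combined_cardinality((1, 1), (1, 1)))  # (1, 1), i.e., executed exactly once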
An apparently especially difficult integration task could be found in a fragment
of M1 , cf. Figure 5.16. In particular, the subjects had to assess the statement:
“Write code” has to be executed before “Merge fix” can be executed. To this end,
three facts have to be combined. First, Write code is contained in sub–process
Apply TDD, while Merge fix can be found in sub–process Work with production
software. Second, Apply TDD and Work with production software are connected
by a precedence constraint, hence Apply TDD must be executed before Work with
production software can be executed. Hence, it could mistakenly be inferred that
Write code must be executed before Merge fix can be executed. However, third,
Write code is not necessarily executed when Apply TDD is executed. Rather, Write
test must be executed at least once and consequently also Run tests must be executed
at least once due to the chain response constraint (cf. Table 3.1) between these two
activities. Write code, though, is not required. Therefore, Apply TDD can be
executed without an execution of Write code, thus also Merge fix can be executed
without a prior execution of Write code.

Figure 5.16: Difficult integration task, adapted from [303] (a top–level process fragment in which the complex activities Apply TDD, containing Write test, Run tests and Write code, and Work with production software, containing Merge fix and Test fix, are connected by a precedence constraint; a legend explains the activity, cardinality, precedence and chain response notations)
For illustration purposes, consider the following excerpt from a think–aloud transcript: “Write code has to be, write code, where are you, here, has to be executed
before merge fix can be executed”. Here the subject searches for activities Write
code and Merge fix. Then, the subject examines the relationship between the sub–
processes which contain these activities: “Yes, because before, ahm, before work with
production software which is the sub–process where merge fix is. . . apply TDD has
to be performed before”. Here, the subject apparently falsely integrates the precedence constraint between Apply TDD and Work with production software with the
activities contained therein. Knowing that the subject answered 29 out of 32 (91%)
questions correctly, it can be assumed that the subject tried its best to answer the
questions correctly. Hence, we conclude that this task indeed posed a significant
challenge for the subject.
Limitations
Several limitations apply to E3. First, the empirical evaluation provides promising results; however, the rather low sample size (9 subjects) is a clear
threat to the generalization of results. Similarly, even though the process models
used in this study vary in the number of activities, constraints and sub–processes,
it is not entirely clear whether the obtained results are applicable to every modularized declarative process model. In addition, we considered process models with
a nesting level of one only, i.e., none of the sub–processes was refined using further
sub–processes. As it was shown that an overuse of sub–processes may negatively
impact the understanding of a model [45], the limited nesting level has to be seen as
a further limitation of this study. Similarly, the questions used to assess the understandability can only address a limited number of aspects. Even though questions
were designed to representatively cover several aspects of models (cf. [138]), a bias
favoring certain questions cannot be ruled out entirely. In addition, all participating subjects indicated an academic background, limiting the generalization of results. However, subjects also indicated a profound background in BPM; hence, we argue that they can be seen as proxies for professionals. Finally, the classification of error sources was, due to personnel limitations, performed by a single person only, i.e., the author. Therefore, limitations regarding the accuracy and reliability of the classification must also be acknowledged.
Discussion
The results of E3 show a diversified picture of modularization in declarative models.
Basically, the findings of RQ11 indicate that modelers are able to properly interpret
sub–processes. However, the adoption of sub–processes does not necessarily improve
the understandability of a model. While pattern recognition (RQ12 ) and information hiding (RQ13 ) may lower the mental effort for understanding a process model,
fragmentation (RQ14 ) appears to impose an additional burden on the modeler.
Basically, due to the low sample size of E3 , results should be interpreted with care.
In this sense, comparing findings from E3 with results from E2 and R2 is particularly
valuable for determining which findings could be replicated and seem to be stable.
First and foremost, the impact of modularization on mental effort appears to be
stable, as in E2 , R2 as well as E3 all hypotheses regarding mental effort could be
supported. With respect to accuracy, the situation is less clear. On the one hand,
the negative influence of fragmentation on accuracy could be corroborated in E2
as well as R2, and differences in E3 barely missed statistical significance. On
the other hand, no statistically significant empirical evidence for the influence of
abstraction on accuracy could be found. This, however, may be rather traced back
to problems with the experimental design. Also, the influence on duration could be shown to be statistically significant in E2 and R2, but could not be replicated in
E3 . As argued, this can be attributed to the adoption of think–aloud, which must
be assumed to have a strong influence on the duration. Against this background,
also the results from E3 —even though obtained from a rather small sample—seem
to be plausible. In addition, BPMN and Declare differ considerably with respect
to semantics and modeling constructs. Hence, the results obtained in E3 suggest
that the influence of abstraction and fragmentation, as postulated in Section 5.3.2, is not specific to these two modeling languages, but reflects rather general effects that can also be expected for other conceptual modeling languages.
5.6 Limitations
Up to now, we have identified and discussed existing empirical research into the
modularization of conceptual models through a systematic literature review in Section 5.2. Then, in Section 5.3 we proposed a cognitive–psychology–based framework, providing a new perspective on the interplay between modularization and
understandability of a conceptual model. Subsequently, we empirically validated
the framework for BPMN in Section 5.4 and for Declare in Section 5.5. In the
following, we revisit the limitations of the described work.
As discussed in Section 5.3.3, the proposed framework is of a rather general nature and does not take into account peculiarities of specific modeling languages. In
particular, the framework focuses on the structure of the model, but does not take
into account its semantics. Hence, factors such as redundancy or minimality, as
described in the good decomposition model [25, 266] cannot be taken into account.
Similarly, we clearly acknowledge model size as an important factor, but consider
model size only implicitly. We would like to emphasize at this point that we have
deliberately decided not to take into account model size explicitly, as it appears that
the appropriate size of a sub–model is still unclear. Rather, the intention was to start
with a rather simple, but stable framework which may be extended in future work.
While the general nature basically allows for applying the framework to almost any
conceptual modeling language supporting modularization, at the same time it calls
for an empirical validation of particular modeling languages. In this sense, empirical
investigations E2, R2 and E3 provide empirical support for the framework. In particular, the influence on mental effort could be corroborated in E2, R2 and E3. The influence of fragmentation on accuracy could be empirically substantiated in E2 and R2, whereas no support could be found for the influence of abstraction on accuracy.
Finally, support for the influence of abstraction and fragmentation on duration could
be found in E2 and R2 , whereas no statistically significant differences were found
in E3. Naturally, the results need to be viewed in the light of several limitations.
In particular, the empirical evaluation focuses on BPMN and Declare. Although
results appear to be consistent, it remains unclear whether the findings can directly
be applied to other conceptual modeling languages that support modularization. In
addition, we tried to conduct a comprehensive empirical evaluation by taking into
account different modeling languages and quantitative as well as qualitative data.
Even though we tried to create representative models and questions by taking into
account typical modeling constructs, it cannot be entirely excluded that observed
results are specific to these models. In the following, we revisit the findings of this work
as well as the above described limitations in the course of a discussion.
5.7 Discussion
The central goal of this chapter is to provide a new perspective on the connection
between the modularization of a process model and its understandability. To this
end, drawing upon insights from previous empirical investigations and cognitive psychology, we proposed a framework that claims that the impact on understandability
is determined by the positive influence of abstraction and the negative influence of
fragmentation. In the course of three empirical studies, i.e., E2, R2 and E3, empirical support for the impact on mental effort, accuracy and duration could be found.
It should be emphasized that respective effects could be found largely consistently
across three empirical investigations including two different modeling languages and
subjects of varying background, i.e., students, academics and professionals. Against
this background, it seems likely that the influence of abstraction and fragmentation,
as postulated, is not just a peculiarity of the experimental design, but can generally
be found in BPMN–based and Declare–based models.
Besides providing empirical support for the understandability framework proposed
in Section 5.3.2, the results indicate that the benefits of a modularization depend
on which kind of information should be extracted. In other words, if the question
a modeler is interested in benefits more from abstraction than it is impaired by fragmentation, understandability will presumably improve. Contrariwise, if fragmentation prevails, the model will presumably become more difficult to understand.
Thus, it seems worthwhile to maximize the ratio of abstraction to fragmentation.
In this sense, dynamic process visualizations [19, 123, 204] seem to be promising, as
they allow for visualizing the process model according to the modeler’s demands. In
the context of this work, such a dynamic visualization would ensure that all relevant modeling elements are visible, while irrelevant modeling elements are hidden in
sub–processes. For declarative process models, however, this requires an automated
restructuring of modularized declarative process models. Such techniques are not yet in place and are only possible for process models that do not make use of
enhanced expressiveness (cf. Section 3.1.3).
Regarding the interpretation of statically visualized declarative business process
models, we could identify different strategies in the think–aloud protocols and video
material. Basically, modelers appear to approach declarative process models in a sequential manner, i.e., they tend to describe the process in the order in which activities can be executed. Knowing that imperative process modeling languages, e.g., BPMN, are much more widespread than declarative process modeling languages, one might argue
that this indicates that subjects were biased by the former category of modeling
languages. Similarly, it might have been the case that this behavior was triggered
by the layout of the process models. In particular, the process models were laid
out so that activities that were executed in the beginning of a process instance were
placed top left, whereas activities that were executed at the end of the process instance were placed bottom right. Even though a declarative process model typically
does not prescribe a strict ordering of activities, this layout might have influenced
the subjects. However, it was found that domain experts, i.e., persons unfamiliar
with business process modeling, were also inclined toward sequential behavior (cf.
Section 4.7). Hence, it seems likely that the abstract nature of declarative process
models does not naturally fit the human way of reasoning.
Further, evidence that constraints indeed may pose a considerable challenge for
the modeler could be found in the tasks where subjects were asked to describe a
process model. Therein, we could find indications that for larger process models,
sub–processes helped to divide the model into manageable parts, i.e., the number of interacting constraints seems to play an essential role. Further evidence for
this hypothesis is provided by the finding that subjects intuitively sought to reduce
the number of constraints by, e.g., putting away sheets describing irrelevant sub–
processes or, in a flat model, using the hand to hide irrelevant parts of the model. In
this sense, it seems plausible that an automated interpretation of constraints, as proposed in Chapter 4, can help to improve the understanding and thereby maintenance
of declarative business process models.
In this work we have not considered the granularity of modularizations. Likewise,
we have not investigated whether correct levels of abstraction were applied for sub–
processes, as discussed in detail in [51, 197]. Rather, our work has to be seen
as an orthogonal perspective to these aspects. Even when optimizing granularity
and abstraction levels, a process model may be modularized in various ways. The
framework proposed in this work may then be used as an additional perspective,
helping the modeler to decide on a specific modularization.
Similarly, the results have to be seen in the light of guidelines for modularization.
For instance, according to the good decomposition model [265], proper modularization should satisfy minimality, determinism, losslessness, weak coupling and strong
cohesion. Again, abstraction and fragmentation have to be seen as an additional
perspective. Basically, satisfying the conditions of the good decomposition model
can be related to optimizing the ratio between abstraction and fragmentation. For
instance, achieving strong cohesion clearly aims at increasing abstraction by keeping closely related objects together (non–related objects will have to be placed in
different sub–models to achieve strong cohesion, hence fostering abstraction). Weak
coupling, in turn, aims at minimizing fragmentation by minimizing connections between sub–models and hence decreasing potential switches between sub–models.
Losslessness, i.e., that no information is lost when introducing sub–processes, is not
captured in our framework, as the focus of our work is put on models rather than on
their creation. Finally, achieving minimality, i.e., non–redundancy, and determinism seems desirable for modularization. However, in our opinion these factors are
not necessarily related to decomposition only, but should be rather seen as general
modeling guidelines that also hold for non–modularized models. As our framework
specifically focuses on modularization, we do not see a direct connection to our
framework. Regarding guidelines and specifically the guidance of modelers in creating modularized models, the findings obtained in this work may be used for the
development of recommendations during modeling. Similar to recommendation systems that guide the execution of business processes [96, 220], support microblogging
environments [289, 290] and help to organize semi–structured data [80, 81], modelers
may be supported in the creation of modularized models. In particular, by instrumenting modeling editors, user interactions, such as switching between sub–models
and scrolling, could be logged. An excessive amount of switching could indicate a
high degree of fragmentation, and the modeling environment could recommend that the modeler merge certain parts of the model. Contrariwise, an excessive amount of scrolling may indicate an overly large process model, and the introduction of sub–models may be recommendable. However, these scenarios are of a theoretical nature only so far, and it has to be investigated first whether supporting, yet non–disrupting
recommendations are feasible.
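A minimal sketch of such a recommendation heuristic is given below; the event names, thresholds and recommendation texts are hypothetical and only illustrate how logged interactions could be turned into modularization hints.

# Illustrative sketch (hypothetical event names and thresholds): derive modularization
# hints from logged editor interactions such as switching between sub-models and scrolling.
def recommend(logged_events, switch_threshold=30, scroll_threshold=100):
    switches = sum(1 for event in logged_events if event == "switch_submodel")
    scrolls = sum(1 for event in logged_events if event == "scroll")
    hints = []
    if switches > switch_threshold:
        hints.append("Frequent switching between sub-models: consider merging related sub-models.")
    if scrolls > scroll_threshold:
        hints.append("Extensive scrolling: consider extracting parts of the model into sub-models.")
    return hints

print(recommend(["switch_submodel"] * 45 + ["scroll"] * 20))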
With respect to empirical investigations of modularized models in general, the
interplay of positive and negative influences is also of interest. In particular, the results cast doubt on the extent to which empirical comparisons between flat and modularized models are meaningful if questions are not designed carefully. When focusing
on abstraction questions only, a bias for modularized models must be expected. Contrariwise, when an empirical investigation focuses on fragmentation questions only,
the setup will most likely favor non–modularized models. Hence, merely comparing
modularized models and flat models seems to be too shortsighted. Rather, in the
experimental design, positive and negative effects should be distinguished. Against
this background, seemingly contradicting results from empirical investigations into
modularization can be explained in a plausible way. In works reporting a positive influence, e.g., [150, 208], questions benefiting from abstraction probably prevailed. In inconclusive works, e.g., [41, 225], questions benefiting from abstraction and questions impaired by fragmentation were probably in balance. In works reporting a negative influence, in turn, e.g., [40, 45], questions impaired by fragmentation probably prevailed.
5.8 Related Work
The work presented in this chapter is basically related to three streams of research:
research that investigates the modularization and understandability of conceptual
models, research that deals with guidelines for modularization and research that is
concerned with automated approaches to modularization.
Modularization and Understandability of Conceptual Models
In this work, we discussed characteristics of modularization in conceptual models and
the impact on understandability. The impact of modularization on understandability
was studied in various conceptual modeling languages, such as imperative business
process models [206, 208], ER diagrams [150, 225], UML Statecharts [40, 42–44] and
various other UML diagrams [24–26] (cf. Section 5.2). Even though all these works
empirically investigate the connection between modularization and understanding,
BPMN and Declare are not investigated. Similarly, also the work of McDonald and
Stevenson [136], in which the navigation performance in hierarchical documents is
investigated, should be mentioned. Even though hierarchical structures are investigated in that work as well, navigation rather than comprehension is of concern there.
More generally, also work dealing with the understandability of conceptual models is
related. For instance, in [137] criteria for the systematic analysis of understandability
are described. Similarly, [212] proposes COGEVAL, a cognitive–psychology–based
theoretical framework for assessing the understanding of conceptual models. In this
sense, also the work of Reijers et al., which deals with the understandability of
business process models in general [143, 207], is related. Similarly, in [139] the relationship between the size of imperative process models and error rates is established.
Likewise, in [59, 260] the understanding of process models is assessed through the
adoption of structural metrics. Regarding the understandability of declarative process models in general, also the work presented in Chapter 4 is related. Even though
all these works investigate understandability, modularization is at most addressed
as a side aspect.
Guidelines for Modularization
In this work we provide a new perspective on the link between modularization and
understandability. Thereby, we also hope to advance the understanding of principles
modeling guidelines are based on. In this sense, respective guidelines for the creation
of business process models, e.g., [46, 49, 122, 145], are also of interest. Similarly,
general considerations about modularization, resulting in the good decomposition
model [265] were adopted for the modularization of object–oriented design [25] and
the modularization of Event–driven Process Chains [113]. Likewise, in [51] a development method, which foresees the creation of modularized business processes,
is described. Even though these works provide valuable insights into modularized
models, none of these works deals with declarative process models.
In this work we focused on the outcome of a process modeling endeavor, i.e., the
process model. Recently, researchers have also begun to investigate the process of creating a process model, referred to as the process of process modeling [192]. Similar to this work, the way modelers make sense of a process model while creating it is investigated—for instance, by visualizing the process of process modeling [30].
Similarly, different personalized modeling styles [185, 186] and modeling strategies
were identified [31]. Even though this stream of research appears to be promising,
none of these works investigates the creation of modularized models and rather puts
its current focus on imperative, non–modularized models.
Automated Approaches to Modularization
Besides assessing the understandability of modularization, several authors investigated potential ways of automatically creating modularized models. In particular,
in [193] an approach for automatically aggregating activities based on the most relevant activities of a process model is proposed. Similarly, in [232] an approach
for the automated abstraction of control flow, based on behavioral profiles, is described. Another automated approach for modularization is described in [231]—here,
meronymy relations between activity labels are employed for automated modularization. Similarly, in [73, 201] methods for the automated clustering of ER diagrams
are described and [78] discusses criteria for clustering, such as cohesion and coupling.
A comprehensive overview of related approaches to clustering, in turn, is provided
in [149, 152]. Finally, in [172] an automated approach for the decomposition of
systems is proposed. Even though all of these approaches promise to provide abstraction in an automated way, it is unclear to what extent the created models will be
understandable to the end–user, which is of concern in this work.
5.9 Summary
In this chapter, we proposed and empirically validated a cognitive–psychology–based
framework for assessing the impact of the modularization of a conceptual model on
its understandability. The starting point for this work was formed by a systematic
literature review about empirical research investigating the interplay between modularization and understandability, which showed that—contrary to the expectation
of researchers—the influence of modularization is not always positive. Rather, although most empirical studies reported a positive influence, several studies also reported non–significant differences and a negative influence of modularization. To explain these apparently contradicting findings, we proposed a cognitive–
psychology–based framework that provides a potential explanation. Particularly, we
identified abstraction, i.e., pattern recognition and information hiding, as a positive
influence of modularization. Contrariwise, we attributed the negative influence of
modularization to fragmentation, i.e., the need to switch between sub–models and
to re–integrate information. Depending on whether abstraction or fragmentation
dominates, a positive influence, a negative influence or no influence at all can be
observed. To empirically corroborate this claim, we conducted a series of empirical
studies in which we applied the framework in various settings.
First, experiment E2 focused on BPMN–based models and was conducted as a controlled experiment. The replication R2, in turn, was carried out as an on–line study,
i.e., in a rather uncontrolled setting. Finally, empirical study E3 was conducted
in a controlled setting, but shifted the focus from BPMN toward Declare and also
took into account qualitative data, whereas E2 and R2 mostly collected quantitative data. Throughout these studies, similar effects and patterns could be observed.
First, regardless of the setting, i.e., modeling language, subjects and experimental
material, empirical support for the positive influence of abstraction as well as the
negative influence of fragmentation could be found. If observed effects could not be
found to be statistically significant, plausible alternative explanations, such as peculiarities of the experimental design, could be provided. Throughout E2, R2 and E3,
the influence of modularization on mental effort could be found, i.e., we could find
a statistically significant influence in all studies. Similarly, statistically significant
differences of durations could be found in E2 and R2 , whereas the lack thereof in
E3 could be rather traced back to the application of think–aloud. Finally, empirical
evidence for the influence on accuracy could be found. Even though we could not
identify statistically significant results for the influence of abstraction on accuracy,
we could trace back these unexpected results to the low number of committed errors
and their distribution. Furthermore, the analysis of think–aloud protocols in E3
provided complementary qualitative support. For instance, we could observe subjects deliberately hiding sub–models, thereby exploiting abstraction—particularly information hiding—to ease their task. Contrariwise, we could also notice reasoning
processes involved in the integration of sub–models, observing how fragmentation
complicated the interpretation of models.
Summarizing, we think that the empirical data collected in these studies provides
convincing arguments for the existence of abstraction and fragmentation, as postulated. In this sense, we hope to have advanced the state of the art by providing a
new perspective on the connection between modularization and understandability.
Chapter 6
Summary
In this thesis, we set out to investigate whether concepts from cognitive psychology
can be applied for systematically improving the creation, understanding and maintenance of business process models. To approach this rather broad research question,
we selected two promising areas of application. In particular, in Chapter 3, we built
the foundation for this thesis by transferring concepts from cognitive psychology
to business process modeling. Then, in Chapter 4, we used these insights to analyze potential problems regarding the creation, understanding and maintenance of
declarative business process models. Largely, we could trace back issues to the representation of sequential information in declarative process models as well as hidden
dependencies. These properties, in turn, presumably complicate the understanding
of declarative process models. To counteract these problems, we proposed TDM
for the computer–supported computation of sequential information. Further, we
implemented the concepts of TDM in TDMS and empirically validated the benefits of TDM in a case study, in an experiment and in a replication. Therein, we
found empirical evidence corroborating that TDM can help to support the creation,
understanding and maintenance of declarative process models.
Subsequently, we turned toward the connection between a process model’s modularization and its understandability in Chapter 5. Therein, we conducted a systematic literature review for assessing the state of the art regarding empirical research
investigating the link between modularization and understandability. In the course
of the review, we found that insights seem to vary from positive over neutral to
negative. To provide a potential explanation for these differing findings, we drew
on concepts from cognitive psychology and proposed a framework for assessing the
impact of modularization on the understandability of a process model. Then, to support the efficient empirical validation of the proposed framework, we implemented
Hierarchy Explorer for displaying modularized process models. Hierarchy Explorer,
in turn, was employed in an experiment and a replication, in which the framework was validated for BPMN–based process models. In an additional experiment,
findings were complemented by applying the framework for Declare–based process
models. The findings of these empirical studies show that the proposed framework,
in particular the interplay of abstraction and fragmentation, allows for assessing
the influence of modularization on understandability. In this way, we think that
the framework provides a new perspective on the modularization of process models
and contributes to the creation of modularized process models that are easier to
understand.
In the light of these findings, we conclude that the central research question of
this thesis—whether cognitive psychology can be used to improve the creation, understanding and maintenance of process models—can be clearly affirmed. Moreover,
concepts from cognitive psychology were not only applied to a single purpose or to
a single modeling language, but for three purposes (creation, understanding and
maintenance) and two modeling languages (BPMN and Declare). Thus, we believe
that also other areas, such as research into the layout of process models or comparisons between process modeling languages, could benefit from respective concepts.
Furthermore, it was noted that particularly the comparison of modeling languages
and methods often lacks respective theoretical underpinnings [69]. In this sense, we
think that insights from cognitive psychology could help to put such discussions on
a more objective basis.
Even though the contributions presented in this thesis are self–contained, they
apparently also offer the opportunity for follow–up research. In particular, TDM
currently focuses on control flow only, but could be extended to include data and
resources in test cases. Similarly, TDMS provides support for Declare only; however, it
could be extended toward supporting other declarative modeling languages, such as
DCR graphs. Regarding the framework for assessing the impact of modularization
on understanding, particularly more detailed empirical investigations seem to be
promising. For instance, through the adoption of eye tracking technology [58], the
way humans interpret modularized process models could be investigated in far
more detail. Concluding, we think that bringing together cognitive psychology and
business process modeling provides a fruitful basis for research. In this vein, we
hope that this thesis helps to improve business process modeling, but also fosters
interdisciplinarity.
Appendix A
Tests for Normal Distribution
In this part of the appendix, results from the tests for normal distribution are listed. The results were put in the appendix to avoid cluttering the text with tables.
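For readers who wish to reproduce tests of this kind, the following sketch shows how a Kolmogorov–Smirnov test with Lilliefors correction can be run with standard tooling; the data is hypothetical and statsmodels is assumed to be available; this is not the script used to produce the tables below.

# Illustrative sketch (hypothetical data): Kolmogorov-Smirnov test for normality with
# Lilliefors significance correction, as reported in the tables of this appendix.
from statsmodels.stats.diagnostic import lilliefors

mental_effort_with_test_cases = [4, 5, 5, 6, 4, 5, 6, 7, 5, 4, 6, 5]
d_statistic, p_value = lilliefors(mental_effort_with_test_cases, dist="norm")
print(f"D = {d_statistic:.3f}, p = {p_value:.3f}")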
A.1 Experiment E1
Variable            Group                N    D       p
Mental effort       With test cases      12   0.352   0.000a
                    Without test cases   12   0.280   0.010a
Perceived quality   With test cases      12   0.333   0.001a
                    Without test cases   12   0.160   0.200
Quality             With test cases      12   0.339   0.000a
                    Without test cases   12   0.399   0.000a
Operations          With test cases      12   0.209   0.154
                    Without test cases   12   0.365   0.000a
a significant at the 0.05 level

Table A.1: Kolmogorov–Smirnov Test with Lilliefors significance correction for E1
A.2 Replication R1
Variable            Group                N    D       p
Mental effort       With test cases      31   0.183   0.010a
                    Without test cases   31   0.277   0.000a
Perceived quality   With test cases      31   0.323   0.000a
                    Without test cases   31   0.203   0.002a
Quality             With test cases      31   0.271   0.000a
                    Without test cases   31   0.219   0.001a
Operations          With test cases      31   0.241   0.000a
                    Without test cases   31   0.243   0.000a
a significant at the 0.05 level

Table A.2: Kolmogorov–Smirnov Test with Lilliefors significance correction for R1
A.3 Experiment E2
Variable        Group         N     D       p
Mental effort   Modularized   109   0.100   0.010a
                Flat          109   0.094   0.018a
Accuracy        Modularized   109   0.338   0.000a
                Flat          109   0.380   0.000a
Duration        Modularized   109   0.117   0.001a
                Flat          109   0.118   0.001a
a significant at the 0.05 level

Table A.3: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, abstraction questions, total values
Variable        Model   Group         N    D       p
Mental effort   M1      Modularized   56   0.110   0.091
                        Flat          53   0.142   0.009a
                M2      Modularized   53   0.118   0.063
                        Flat          56   0.107   0.164
Accuracy        M1      Modularized   56   0.310   0.000a
                        Flat          53   0.438   0.000a
                M2      Modularized   53   0.364   0.000a
                        Flat          56   0.323   0.000a
Duration        M1      Modularized   56   0.130   0.020a
                        Flat          53   0.111   0.148
                M2      Modularized   53   0.076   0.200
                        Flat          56   0.230   0.000a
a significant at the 0.05 level

Table A.4: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, abstraction questions, values per model
Model   Question   Group         N    D       p
M1      Q1         Modularized   56   0.199   0.000a
                   Flat          53   0.261   0.000a
        Q3         Modularized   56   0.239   0.000a
                   Flat          53   0.203   0.000a
        Q5         Modularized   56   0.243   0.000a
                   Flat          53   0.260   0.000a
        Q7         Modularized   56   0.223   0.000a
                   Flat          53   0.310   0.000a
M2      Q9         Modularized   53   0.180   0.000a
                   Flat          56   0.240   0.000a
        Q11        Modularized   53   0.204   0.000a
                   Flat          56   0.219   0.000a
        Q13        Modularized   53   0.206   0.000a
                   Flat          56   0.241   0.000a
        Q15        Modularized   53   0.204   0.000a
                   Flat          56   0.248   0.000a
a significant at the 0.05 level

Table A.5: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, abstraction questions, mental effort, values per question
Model   Question   Group         N    D       p
M1      Q1         Modularized   56   0.499   0.000a
                   Flat          53   0.539   0.000a
        Q3         Modularized   56   0.483   0.000a
                   Flat          53   0.531   0.000a
        Q5         Modularized   56   0.475   0.000a
                   Flat          53   0.503   0.000a
        Q7         Modularized   56   0.540   0.000a
                   Flat          53   0.539   0.000a
M2      Q9         Modularized   53   0.533   0.000a
                   Flat          56   0.525   0.000a
        Q11        Modularized   53   0.466   0.000a
                   Flat          56   0.478   0.000a
        Q13        Modularized   53   0.533   0.000a
                   Flat          56   0.525   0.000a
        Q15        Modularized   53   0.499   0.000a
                   Flat          56   0.536   0.000a
a significant at the 0.05 level

Table A.6: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, abstraction questions, accuracy, values per question
Model   Question   Group         N    D       p
M1      Q1         Modularized   56   0.202   0.000a
                   Flat          53   0.119   0.060
        Q3         Modularized   56   0.263   0.000a
                   Flat          53   0.111   0.151
        Q5         Modularized   56   0.173   0.000a
                   Flat          53   0.082   0.200
        Q7         Modularized   56   0.145   0.005a
                   Flat          53   0.178   0.000a
M2      Q9         Modularized   53   0.189   0.000a
                   Flat          56   0.171   0.001a
        Q11        Modularized   53   0.143   0.001a
                   Flat          56   0.118   0.064
        Q13        Modularized   53   0.197   0.000a
                   Flat          56   0.147   0.006a
        Q15        Modularized   53   0.143   0.006a
                   Flat          56   0.163   0.001a
a significant at the 0.05 level

Table A.7: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, abstraction questions, duration, values per question
Variable        Group         N     D       p
Mental effort   Modularized   109   0.122   0.000a
                Flat          109   0.096   0.016a
Accuracy        Modularized   109   0.246   0.000a
                Flat          109   0.372   0.000a
Duration        Modularized   109   0.122   0.000a
                Flat          109   0.081   0.075
a significant at the 0.05 level

Table A.8: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, fragmentation questions, total values
Variable        Model   Group         N    D       p
Mental effort   M1      Modularized   56   0.132   0.016a
                        Flat          53   0.141   0.010a
                M2      Modularized   53   0.134   0.019a
                        Flat          56   0.093   0.200
Accuracy        M1      Modularized   56   0.239   0.000a
                        Flat          53   0.433   0.000a
                M2      Modularized   53   0.250   0.000a
                        Flat          56   0.310   0.000a
Duration        M1      Modularized   56   0.066   0.200
                        Flat          53   0.084   0.200
                M2      Modularized   53   0.177   0.000a
                        Flat          56   0.092   0.200
a significant at the 0.05 level

Table A.9: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, fragmentation questions, values per model
Model  Question  Group        N   D      p
M1     Q2        Modularized  56  0.222  0.000a
                 Flat         53  0.218  0.000a
       Q4        Modularized  56  0.250  0.000a
                 Flat         53  0.263  0.000a
       Q6        Modularized  56  0.162  0.001a
                 Flat         53  0.200  0.000a
       Q8        Modularized  56  0.165  0.001a
                 Flat         53  0.305  0.000a
M2     Q10       Modularized  53  0.278  0.000a
                 Flat         56  0.199  0.000a
       Q12       Modularized  53  0.214  0.000a
                 Flat         56  0.228  0.000a
       Q14       Modularized  53  0.191  0.000a
                 Flat         56  0.206  0.000a
       Q16       Modularized  53  0.242  0.000a
                 Flat         56  0.176  0.000a
a: significant at the 0.05 level

Table A.10: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, fragmentation questions, mental effort, values per question
Model  Question  Group        N   D      p
M1     Q2        Modularized  56  0.466  0.000a
                 Flat         53  0.525  0.000a
       Q4        Modularized  56  0.483  0.000a
                 Flat         53  0.539  0.000a
       Q6        Modularized  56  0.507  0.000a
                 Flat         53  0.536  0.000a
       Q8        Modularized  56  0.367  0.000a
                 Flat         53  0.531  0.000a
M2     Q10       Modularized  53  0.393  0.000a
                 Flat         56  0.499  0.000a
       Q12       Modularized  53  0.431  0.000a
                 Flat         56  0.499  0.000a
       Q14       Modularized  53  0.495  0.000a
                 Flat         56  0.514  0.000a
       Q16       Modularized  53  0.431  0.000a
                 Flat         56  0.466  0.000a
a: significant at the 0.05 level

Table A.11: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, fragmentation questions, accuracy, values per question
Model  Question  Group        N   D      p
M1     Q2        Modularized  56  0.105  0.194
                 Flat         53  0.167  0.001a
       Q4        Modularized  56  0.086  0.200
                 Flat         53  0.184  0.000a
       Q6        Modularized  56  0.115  0.065
                 Flat         53  0.125  0.038a
       Q8        Modularized  56  0.222  0.000a
                 Flat         53  0.075  0.200
M2     Q10       Modularized  53  0.140  0.011a
                 Flat         56  0.092  0.200
       Q12       Modularized  53  0.148  0.005a
                 Flat         56  0.166  0.001a
       Q14       Modularized  53  0.205  0.000a
                 Flat         56  0.160  0.001a
       Q16       Modularized  53  0.112  0.096
                 Flat         56  0.129  0.021a
a: significant at the 0.05 level

Table A.12: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2, fragmentation questions, duration, values per question
A.4 Replication R2
Variable       Group        N   D      p
Mental effort  Modularized  48  0.161  0.003a
               Flat         48  0.112  0.178
Accuracy       Modularized  48  0.386  0.000a
               Flat         48  0.422  0.000a
Duration       Modularized  30  0.091  0.200
               Flat         30  0.197  0.004a
a: significant at the 0.05 level

Table A.13: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, abstraction questions, total values
Variable       Model  Group        N   D      p
Mental effort  M1     Modularized  20  0.150  0.200
                      Flat         28  0.140  0.173
               M2     Modularized  28  0.161  0.061
                      Flat         20  0.134  0.200
Accuracy       M1     Modularized  20  0.347  0.000a
                      Flat         28  0.374  0.000a
               M2     Modularized  28  0.429  0.000a
                      Flat         20  0.487  0.000a
Duration       M1     Modularized  14  0.205  0.113
                      Flat         16  0.157  0.200
               M2     Modularized  16  0.166  0.200
                      Flat         14  0.182  0.200
a: significant at the 0.05 level

Table A.14: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, abstraction questions, values per model
Model  Question  Group        N   D      p
M1     Q1        Modularized  20  0.280  0.000a
                 Flat         28  0.252  0.000a
       Q3        Modularized  20  0.300  0.000a
                 Flat         28  0.203  0.005a
       Q5        Modularized  20  0.278  0.000a
                 Flat         28  0.196  0.007a
       Q7        Modularized  20  0.235  0.005a
                 Flat         28  0.245  0.000a
M2     Q9        Modularized  28  0.221  0.001a
                 Flat         20  0.172  0.124
       Q11       Modularized  28  0.246  0.000a
                 Flat         20  0.250  0.002a
       Q13       Modularized  28  0.307  0.000a
                 Flat         20  0.277  0.000a
       Q15       Modularized  28  0.225  0.001a
                 Flat         20  0.280  0.000a
a: significant at the 0.05 level

Table A.15: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, abstraction questions, mental effort, values per question
Model  Question  Group        N   D      p
M1     Q1        Modularized  20  0.538  0.000a
                 Flat         28  not applicable b
       Q3        Modularized  20  0.509  0.000a
                 Flat         28  0.526  0.000a
       Q5        Modularized  20  0.438  0.000a
                 Flat         28  0.447  0.000a
       Q7        Modularized  20  0.538  0.000a
                 Flat         28  0.536  0.000a
M2     Q9        Modularized  28  0.513  0.000a
                 Flat         20  0.538  0.000a
       Q11       Modularized  28  0.526  0.000a
                 Flat         20  0.538  0.000a
       Q13       Modularized  28  0.539  0.000a
                 Flat         20  not applicable b
       Q15       Modularized  28  0.539  0.000a
                 Flat         20  0.527  0.000a
a: significant at the 0.05 level
b: The variance of the group is 0. As the variance of a normal distribution must be larger than 0, the Kolmogorov–Smirnov Test cannot be applied. Hence, the group is not normally distributed.

Table A.16: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, abstraction questions, accuracy, values per question
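Footnote b marks groups in which every subject gave the same answer. Purely as an illustration (the function name and return convention are ours, not part of the thesis), such degenerate groups could be screened out before applying the test:

    # Illustrative guard for footnote b: a group with zero variance cannot be
    # tested with the (Lilliefors-corrected) KS test and is not normally distributed.
    import numpy as np
    from statsmodels.stats.diagnostic import lilliefors

    def normality_test(sample):
        sample = np.asarray(sample, dtype=float)
        if np.var(sample) == 0.0:
            return None  # "not applicable": zero variance
        return lilliefors(sample, dist="norm")  # (D, p)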
Model  Question  Group        N   D      p
M1     Q1        Modularized  14  0.130  0.200
                 Flat         16  0.196  0.103
       Q3        Modularized  14  0.128  0.200
                 Flat         16  0.114  0.200
       Q5        Modularized  14  0.202  0.128
                 Flat         16  0.120  0.200
       Q7        Modularized  14  0.257  0.013a
                 Flat         16  0.197  0.098
M2     Q9        Modularized  16  0.174  0.200
                 Flat         14  0.163  0.200
       Q11       Modularized  16  0.268  0.003a
                 Flat         14  0.224  0.055
       Q13       Modularized  16  0.136  0.200
                 Flat         14  0.147  0.200
       Q15       Modularized  16  0.107  0.200
                 Flat         14  0.112  0.200
a: significant at the 0.05 level

Table A.17: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, abstraction questions, duration, values per question
Variable       Group        N   D      p
Mental effort  Modularized  48  0.107  0.200
               Flat         48  0.108  0.200
Accuracy       Modularized  48  0.291  0.000a
               Flat         48  0.427  0.000a
Duration       Modularized  30  0.125  0.200
               Flat         30  0.137  0.156
a: significant at the 0.05 level

Table A.18: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, fragmentation questions, total values
Variable       Model  Group        N   D      p
Mental effort  M1     Modularized  20  0.121  0.200
                      Flat         28  0.094  0.200
               M2     Modularized  28  0.171  0.036a
                      Flat         20  0.128  0.200
Accuracy       M1     Modularized  20  0.259  0.001a
                      Flat         28  0.429  0.000a
               M2     Modularized  28  0.312  0.000a
                      Flat         20  0.424  0.000a
Duration       M1     Modularized  14  0.205  0.115
                      Flat         16  0.164  0.200
               M2     Modularized  16  0.122  0.200
                      Flat         14  0.133  0.200
a: significant at the 0.05 level

Table A.19: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, fragmentation questions, values per model
Model  Question  Group        N   D      p
M1     Q2        Modularized  20  0.245  0.003a
                 Flat         28  0.206  0.004a
       Q4        Modularized  20  0.250  0.002a
                 Flat         28  0.300  0.000a
       Q6        Modularized  20  0.185  0.073
                 Flat         28  0.173  0.031a
       Q8        Modularized  20  0.270  0.001a
                 Flat         28  0.218  0.002a
M2     Q10       Modularized  28  0.215  0.002a
                 Flat         20  0.269  0.001a
       Q12       Modularized  28  0.310  0.000a
                 Flat         20  0.200  0.035a
       Q14       Modularized  28  0.220  0.001a
                 Flat         20  0.276  0.000a
       Q16       Modularized  28  0.274  0.000a
                 Flat         20  0.239  0.004a
a: significant at the 0.05 level

Table A.20: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, fragmentation questions, mental effort, values per question
Model  Question  Group        N   D      p
M1     Q2        Modularized  20  0.438  0.000a
                 Flat         28  0.513  0.000a
       Q4        Modularized  20  0.538  0.000a
                 Flat         28  not applicable b
       Q6        Modularized  20  not applicable b
                 Flat         28  0.526  0.000a
       Q8        Modularized  20  0.413  0.000a
                 Flat         28  0.536  0.000a
M2     Q10       Modularized  28  0.411  0.000a
                 Flat         20  0.509  0.000a
       Q12       Modularized  28  0.536  0.000a
                 Flat         20  0.538  0.000a
       Q14       Modularized  28  0.539  0.000a
                 Flat         20  0.538  0.000a
       Q16       Modularized  28  0.513  0.000a
                 Flat         20  0.527  0.000a
a: significant at the 0.05 level
b: The variance of the group is 0. As the variance of a normal distribution must be larger than 0, the Kolmogorov–Smirnov Test cannot be applied. Hence, the group is not normally distributed.

Table A.21: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, fragmentation questions, accuracy, values per question
Model  Question  Group        N   D      p
M1     Q2        Modularized  14  0.205  0.113
                 Flat         16  0.165  0.200
       Q4        Modularized  14  0.217  0.073
                 Flat         16  0.195  0.106
       Q6        Modularized  14  0.099  0.200
                 Flat         16  0.198  0.094
       Q8        Modularized  14  0.147  0.200
                 Flat         16  0.229  0.025a
M2     Q10       Modularized  16  0.214  0.048a
                 Flat         14  0.204  0.118
       Q12       Modularized  16  0.192  0.118
                 Flat         14  0.152  0.200
       Q14       Modularized  16  0.174  0.200
                 Flat         14  0.193  0.166
       Q16       Modularized  16  0.160  0.200
                 Flat         14  0.135  0.200
a: significant at the 0.05 level

Table A.22: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2, fragmentation questions, duration, values per question
A.5 Experiment E3
Variable       Group        N  D      p
Mental effort  Modularized  9  0.155  0.200
               Flat         9  0.151  0.200
Accuracy       Modularized  9  0.459  0.000a
               Flat         9  0.519  0.000a
Duration       Modularized  9  0.188  0.200
               Flat         9  0.146  0.200
a: significant at the 0.05 level

Table A.23: Kolmogorov–Smirnov Test with Lilliefors significance correction for E3, abstraction questions, total values
Variable       Group        N  D      p
Mental effort  Modularized  9  0.155  0.200
               Flat         9  0.151  0.200
Accuracy       Modularized  9  0.245  0.127
               Flat         9  0.414  0.000a
Duration       Modularized  9  0.188  0.200
               Flat         9  0.146  0.200
a: significant at the 0.05 level

Table A.24: Kolmogorov–Smirnov Test with Lilliefors significance correction for E3, fragmentation questions, total values
Appendix B
Supplementary Information
B.1 Process of Thesis Writing
In the vein of the Process of Process Modeling (PPM) [185, 192], we analyzed the process of thesis writing, i.e., the way in which this thesis was written. We thereby focus on the writing part only, i.e., we describe how this document—given a set of published research—evolved. The basis for this analysis is an SVN repository1 in which the LaTeX sources of intermediate versions of this document, also referred to as revisions, are stored. In particular, the document was saved in the repository periodically, e.g., at the end of each working day. Thus, by looking at the intermediate versions of the document, its evolution can be reconstructed. Technically, the analysis was supported by a Bash script that determines all revisions, compiles a PDF file for each revision, computes metrics, e.g., the number of pages, and writes the results to an output file.2
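The original Bash script is linked in the footnote; purely as an illustration, a minimal Python sketch of the same kind of per-revision analysis could look as follows. The repository layout, the file names (thesis.tex, thesis.pdf, evolution.csv) and the metric extraction via pdfinfo and the LaTeX auxiliary files are assumptions made for this sketch, not a description of the actual script.

    # Sketch only: check out each SVN revision, compile, and record simple metrics.
    import csv
    import re
    import subprocess

    def pages_of(pdf):
        # 'pdfinfo' (poppler) prints a line such as "Pages:  236"
        out = subprocess.run(["pdfinfo", pdf], capture_output=True, text=True).stdout
        match = re.search(r"Pages:\s+(\d+)", out)
        return int(match.group(1)) if match else 0

    def count_lines(path, pattern):
        try:
            with open(path, encoding="utf-8", errors="ignore") as f:
                return sum(1 for line in f if pattern in line)
        except FileNotFoundError:
            return 0

    with open("evolution.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["revision", "pages", "references", "figures", "tables"])
        for rev in range(1, 184):  # 183 stored revisions (see text)
            subprocess.run(["svn", "update", "-r", str(rev)], check=True)
            subprocess.run(["pdflatex", "-interaction=nonstopmode", "thesis.tex"], check=False)
            writer.writerow([
                rev,
                pages_of("thesis.pdf"),
                count_lines("thesis.aux", r"\bibcite"),       # one \bibcite per cited reference
                count_lines("thesis.lof", r"\contentsline"),  # list-of-figures entries
                count_lines("thesis.lot", r"\contentsline"),  # list-of-tables entries
            ])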
Applying this procedure yielded data points for 183 revisions; we therefore preprocessed the data and removed all revisions that differed by less than 2 pages. The result of this procedure can be found in Figure B.1: the solid line shows the number of pages, the dotted line the number of references, the dashed line the number of figures and the dash–dotted line the number of tables. Due to the preprocessing, only 58 revisions are shown on the x–axis. Even though this data is clearly influenced by uncontrolled variables, such as holidays, the duration between revisions and workload, some interesting observations—that clearly resemble the PPM—can be made.
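One plausible reading of this preprocessing step (not necessarily the exact rule used) is that a revision is kept only if its page count differs by at least two pages from the last revision that was kept; a small sketch of that reading:

    # Illustrative preprocessing: data and threshold are examples only.
    def preprocess(page_counts, min_delta=2):
        kept = []
        for revision, pages in page_counts:
            if not kept or abs(pages - kept[-1][1]) >= min_delta:
                kept.append((revision, pages))
        return kept

    # e.g. preprocess([(1, 10), (2, 11), (3, 14)]) -> [(1, 10), (3, 14)]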
• Steep Writing Phases As discussed in Chapter 1, most of the research described in this thesis was previously published, hence this thesis builds upon these publications. Pasting the content of these publications into the thesis resulted in a quick increase in the number of pages, as can be seen, e.g., at revisions 11, 17 and 30.
1 SVN is freely available from: http://subversion.apache.org
2 The script is freely available from: http://bpm.q-e.at/misc/ThesisEvolution
[Figure B.1: Process of Thesis Writing. Line chart of the number of pages (solid), references (dotted), figures (dashed) and tables (dash–dotted); y–axis: Amount, x–axis: SVN Revision (Preprocessed), revisions 1–58.]
We would like to remark at this point that the steep increase of pages at revision 52 can rather be traced back to adding the tables listing tests for normal distribution in Appendix A. Likewise, the final steep increase is related to the creation of the preface.
• Steady Writing Phases Of course, this thesis could not be assembled by just pasting publications next to each other. Rather, text was specifically produced for this thesis. Clearly, in such phases, the number of pages increases steadily, but more slowly than in steep writing phases. Examples of steady writing phases can be found at revisions 6–11, 23–29 and 34–44.
• Reconciliation Phases After text was produced—regardless of whether it was pasted or written—the content needs to be reviewed, consolidated and compacted. This, in turn, may lead to a reduction of pages, as can be seen for revisions 6, 15 and 33.
• Chapter Phases Against this background, the chapters of this thesis can also be identified.
Starting with Chapter 2 at revisions 1–6, Chapter 3 follows at revisions 7–11. Chapter 4 is reflected in revisions 12–31 and Chapter 5 relates to revisions 32–52. Finally, Chapter 1 and Chapter 6 as well as the preface can be found in revisions 53–58.
Please note that due to the preprocessing, time–consuming tasks, such as final spell–checking, are not reflected in this diagram. Finally, it is worthwhile to note that from the beginning of the PhD to its end, in total 3,246 cups of coffee were consumed. Put differently, 12.8 cups of coffee were required per page.
B.2 Publications
The research conducted in this thesis was embedded in several international collaborations. In addition, the author was given the opportunity to cooperate in research that is not directly connected to this thesis. To give an overview of all works the author was involved in, Figure B.2 shows all publications—publications directly related to this thesis are marked with a circle. It should also be mentioned that all publications directly related to this thesis were led by the author. In this
sense, the author was responsible for the research described in these publications,
including conceptualization, implementation and empirical validation. Co-authors,
however, supported the work by giving feedback, participating in discussions and
supporting the data collection.
[Figure B.2: All publications, organized by topic. The figure groups the publications [278], [190], [174], [94], [242], [180], [69], [301], [241], [70], [184], [299], [297], [298], [304], [303], [296], [97], [302], [276], [271], [188], [189], [300], [183], [185], [186], [192], [31], [277], [104], [270], [187], [191], [295], [272], [118], [56], [57] and [55] into the topics Imperative Process Models, Declarative Process Models, Understandability, Modularization, Process of Process Modeling, Tools for Empirical Research, Process Flexibility, and Semantic Web and Medical Informatics.]
Abbreviations
Throughout this thesis, we made use of abbreviations. Even though each abbreviation was carefully introduced, all abbreviations used in this thesis are listed in Table B.1 for the sake of completeness and quick lookup.
Abbreviation  Full Name
AAT           Automated Acceptance Testing
ACM           Association for Computing Machinery
BPM           Business Process Management
BPMN          Business Process Model and Notation
CDF           Cognitive Dimensions Framework
CEP           Cheetah Experimental Platform
CLT           Cognitive Load Theory
DCR           Dynamic Condition Response
DE            Domain Expert
DSRM          Design Science Research Methodology
EPC           Event–Driven Process Chain
ER            Entity–Relationship
GEF           Graphical Editing Framework
GRETA         Graphical Runtime Environment for Adaptive Processes
HCI           Human Computer Interaction
HERD          Hierarchical Entity–Relationship Diagram
IEEE          Institute of Electrical and Electronics Engineers
IS            Information Systems
LaTeX         Lamport TeX
LDM           Levelled Data Model
LTL           Linear Temporal Logic
MAD           Median Absolute Deviation
MB            Model Builder
MM            Model Mediator
PAIS          Process Aware Information System
PDF           Portable Document Format
PPM           Process of Process Modeling
RQ            Research Question
SD            Standard Deviation
SPSS          Statistical Package for Social Sciences
SVN           Subversion
TAM           Technology Acceptance Model
TDD           Test Driven Development
TDM           Test Driven Modeling
TDMS          Test Driven Modeling Suite
UI            User Interface
UML           Unified Modeling Language
UTP           UML Testing Profile
YAWL          Yet Another Workflow Language

Table B.1: Abbreviations
Bibliography
[1] H. Agrawal, J. R. Horgan, E. W. Krauser, and S. London. Incremental Regression Testing. In Proc. ICSM’93, pages 348–357, 1993.
[2] D. Amyot and A. Eberlein. An Evaluation of Scenario Notations and Construction Approaches for Telecommunication Systems Development. Telecommunication Systems, 24(1):61–94, 2003.
[3] E. Arisholm and D. I. Sjøberg. Evaluating the Effect of a Delegated versus
Centralized Control Style on the Maintainability of Object-Oriented Software.
IEEE Transactions on Software Engineering, 30(8):521–534, 2004.
[4] P. Attie, M. Singh, A. Sheth, and M. Rusinkiewicz. Specifying and Enforcing
Intertask Dependencies. In Proc. VLDB’93, pages 134–145, 1993.
[5] A. Awad, G. Decker, and M. Weske. Efficient Compliance Checking Using
BPMN-Q and Temporal Logic. In Proc. BPM’08, pages 326–341, 2008.
[6] A. Awad, S. Smirnov, and M. Weske. Resolution of Compliance Violation in
Business Process Models: A Planning-Based Approach. In Proc. OTM’09,
pages 6–23, 2009.
[7] A. Baddeley. Working Memory. Science, 255(5044):556–559, 1992.
[8] A. Baddeley. Working Memory: Theories, Models, and Controversies. Annual
Review of Psychology, 63(1):1–29, 2012.
[9] M. Bannert. Managing cognitive load—recent trends in cognitive load theory.
Learning and Instruction, 12(1):139–146, 2002.
[10] I. Barba, B. Weber, and C. D. Valle. Supporting the Optimized Execution of
Business Processes through Recommendations. In Proc. BPI’11, pages 135–
140, 2012.
[11] I. Barba, B. Weber, C. D. Valle, and A. J. Ramírez. User Recommendations for the Optimized Execution of Business Processes. Data & Knowledge
Engineering, 86:61–84, 2013.
[12] F. Bartlett. Remembering: A Study in Experimental and Social Psychology.
Cambridge University Press, 1932.
[13] V. R. Basili. The Role of Experimentation in Software Engineering: Past,
Current, and Future. In Proc. ICSE’96, pages 442–449, 1996.
[14] K. Beck. Extreme Programming Explained: Embracing Change. Addison-Wesley, 1999.
[15] K. Beck. Test Driven Development: By Example. Addison-Wesley, 2002.
[16] P. Berander. Using Students as Subjects in Requirements Prioritization. In
Proc. ISESE’04, pages 167–176, 2004.
[17] R. Bergenthum, J. Desel, S. Mauser, and R. Lorenz. Construction of Process
Models from Example Runs. In Transactions on Petri Nets and Other Models
of Concurrency II, pages 243–259. Springer, 2009.
[18] A. Bertolino, G. D. Angelis, A. D. Sandro, and A. Sabetta. Is my model
right? Let me ask the expert. Journal of Systems and Software, 84(7):1089–
1099, 2011.
[19] R. Bobrik, M. Reichert, and T. Bauer. Requirements for the Visualization
of System-Spanning Business Processes. In Proc. DEXA’05, pages 948–954,
2005.
[20] B. W. Boehm. Verifying and Validating Software Requirements and Design
Specifications. IEEE Software, 1(1):75–88, 1984.
[21] D. E. Broadbent. The magic number seven after fifteen years. In Studies in
long term memory, pages 3–18. Wiley, 1975.
[22] C. Brown. Cognitive Psychology. Sage Publications, 2006.
[23] J.-M. Burkhardt, F. Détienne, and S. Wiedenbeck. Object-Oriented Program
Comprehension: Effect of Expertise, Task and Phase. Empirical Software
Engineering, 7(2):115–156, 2002.
[24] A. Burton-Jones and P. N. Meso. How Good Are These UML Diagrams? An
Empirical Test of the Wand and Weber Good Decomposition Model. In Proc.
ICIS’02, pages 101–114, 2002.
[25] A. Burton-Jones and P. N. Meso. Conceptualizing Systems for Understanding:
An Empirical Test of Decomposition Principles in Object-Oriented Analysis.
Information Systems Research, 17(1):38–60, 2006.
[26] A. Burton-Jones and P. N. Meso. The Effects of Decomposition Quality and
Multiple Forms of Information on Novices’ Understanding of a Domain from
a Conceptual Model. Journal of the Association for Information Systems,
9(12):748–802, 2008.
[27] G. Canfora, A. Cimitile, F. Garcia, M. Piattini, and C. A. Visaggio. Evaluating Advantages of Test Driven Development: a Controlled Experiment with
Professionals. In Proc. ISESE’06, pages 364–371, 2006.
[28] W. Chase and H. Simon. Perception in Chess. Cognitive Psychology, 4(1):55–
81, 1973.
[29] W. Chase and H. Simon. The Mind’s Eye in Chess. In Visual information
processing, pages 215–281. Academic Press, 1973.
[30] J. Claes, I. Vanderfeesten, J. Pinggera, H. Reijers, B. Weber, and G. Poels.
Visualizing the Process of Process Modeling with PPMCharts. In Proc.
TAProViz’12, pages 744–755, 2013.
[31] J. Claes, I. Vanderfeesten, H. Reijers, J. Pinggera, M. Weidlich, S. Zugal,
D. Fahland, B. Weber, J. Mendling, and G. Poels. Tying Process Model
Quality to the Modeling Process: The Impact of Structuring, Movement, and
Speed. In Proc. BPM’12, pages 33–48, 2012.
[32] J. Cohen. Statistical Power Analysis for the Behavioral Sciences, Second Edition. Lawrence Erlbaum, 1988.
[33] J. Cohen. A Power Primer. Psychological Bulletin, 112(1):155–159, 1992.
[34] A. R. Conway, N. Cowan, M. F. Bunting, D. J. Therriault, and S. R. B.
Minkoff. A latent variable analysis of working memory capacity, short-term
memory capacity, processing speed, and general fluid intelligence. Intelligence,
30(2):163–183, 2002.
[35] J. Corbin and A. Strauss. Basics of Qualitative Research: Techniques and
Procedures for Developing Grounded Theory. SAGE Publications, 2007.
[36] L. D. Couglin and V. L. Patel. Processing of critical information by physicians
and medical students. Journal of Medical Education, 62(10):818–828, 1987.
[37] N. Cowan. Working Memory Capacity. Psycholology Press, 2005.
[38] A. W. Crapo, L. B. Waisel, W. A. Wallace, and T. R. Willemain. Visualization
and the process of modeling: a cognitive-theoretic view. In Proc. KDD’00,
pages 218–226, 2000.
[39] J. Creswell. Research Design: Qualitative, Quantitative and Mixed Method
Approaches. Sage Publications, 2002.
[40] J. Cruz-Lemus, M. Genero, and M. Piattini. Using Controlled Experiments for
Validating UML Statechart Diagrams Measures. In Proc. IWSM-Mensura’07,
pages 129–138, 2008.
[41] J. Cruz-Lemus, M. Genero, M. Piattini, and A. Toval. Investigating the
Nesting Level of Composite States in UML Statechart Diagrams. In Proc.
QAOOSE’05, pages 97–108, 2005.
[42] J. A. Cruz-Lemus, M. Genero, M. E. Manso, S. Morasca, and M. Piattini.
Assessing the understandability of UML statechart diagrams with composite states—A family of empirical studies. Empirical Software Engineering,
25(6):685–719, 2009.
[43] J. A. Cruz-Lemus, M. Genero, M. E. Manso, and M. Piattini. Evaluating
the Effect of Composite States on the Understandability of UML Statechart
Diagrams. In Proc. MODELS’05, pages 113–125, 2005.
[44] J. A. Cruz-Lemus, M. Genero, S. Morasca, and M. Piattini. Using Practitioners for Assessing the Understandability of UML Statechart Diagrams with
Composite States. In Proc. ER Workshops’07, pages 213–222, 2007.
[45] J. A. Cruz-Lemus, M. Genero, M. Piattini, and A. Toval. An Empirical Study
of the Nesting Level of Composite States Within UML Statechart Diagrams.
In Proc. ER Workshops’05, pages 12–22, 2005.
[46] N. Damij. Business process modelling using diagrammatic and tabular techniques. Business Process Management Journal, 13(1):70–90, 2007.
[47] F. Davies. A Technology Acceptance Model for Empirically Testing New End-User Information Systems: Theory and Results. PhD thesis, Sloan School of
Management, 1986.
[48] I. Davies, P. Green, M. Rosemann, M. Indulska, and S. Gallo. How do Practitioners Use Conceptual Modeling in Practice? Data & Knowledge Engineering,
58(3):358–380, 2006.
[49] R. Davies. Business Process Modelling With Aris: A Practical Guide. Springer,
2001.
[50] F. D. Davis. Perceived Usefulness, Perceived Ease of Use, and User Acceptance
of Information Technology. MIS Quarterly, 13(3):319–340, 1989.
[51] M. K. de Weger. Structuring of Business Processes: An architectural approach
to distributed systems development and its application to business processes.
PhD thesis, University of Twente, 1998.
[52] J. Desel. Model Validation—A Theoretical Issue? In Proc. ICATPN’02, pages
23–43, 2002.
[53] J. Desel. From Human Knowledge to Process Models. In Proc. UNISCON’08,
pages 84–95, 2008.
[54] J. Desel, G. Juhás, R. Lorenz, and C. Neumair. Modelling and Validation with
VipTool. In Proc. BPM’03, pages 380–389, 2003.
[55] M. Droop, M. Flarer, J. Groppe, S. Groppe, V. Linnemann, J. Pinggera,
F. Santner, M. Schier, F. Schöpf, H. Staffler, and S. Zugal. Translating XPath
Queries into SPARQL Queries. In Proc. OTM Workshops’07, pages 9–10,
2007.
[56] M. Droop, M. Flarer, J. Groppe, S. Groppe, V. Linnemann, J. Pinggera,
F. Santner, M. Schier, F. Schöpf, H. Staffler, and S. Zugal. Embedding Xpath
Queries into SPARQL Queries. In Proc. ICEIS’08, pages 5–14, 2008.
[57] M. Droop, M. Flarer, J. Groppe, S. Groppe, V. Linnemann, J. Pinggera,
F. Santner, M. Schier, F. Schöpf, H. Staffler, and S. Zugal. Bringing the
XML and Semantic Web Worlds Closer: Transforming XML into RDF and
Embedding XPath into SPARQL. In Proc. ICEIS’08, pages 31–45, 2009.
[58] A. Duchowski. Eye Tracking Methodology. Springer, 2007.
[59] M. Dumas, M. L. Rosa, J. Mendling, R. Mäesalu, H. Reijers, and N. Semenenko. Understanding Business Process Models: The Costs and Benefits of
Structuredness. In Proc. CAiSE’12, pages 31–46, 2012.
[60] M. Dumas, W. M. P. van der Aalst, and A. H. ter Hofstede. Process Aware
Information Systems: Bridging People and Software Through Process Technology. Wiley-Interscience, 2005.
[61] S. Easterbrook, J. Singer, M.-A. Storey, and D. Damian. Selecting Empirical
Methods for Software Engineering Research. In Guide to Advanced Empirical
Software Engineering, pages 285–311. Springer, 2008.
[62] S. H. Edwards. Using Test-Driven Development in the Classroom: Providing Students with Automatic, Concrete Feedback on Performance. In Proc.
EISTA’03, pages 421–426, 2003.
[63] H. Erdogmus, M. Morisio, and M. Torchiano. On the Effectiveness of the Test-First Approach to Programming. IEEE Transactions on Software Engineering,
31(1):226–237, 2005.
[64] K. A. Ericsson and H. A. Simon. Protocol analysis: Verbal reports as data.
MIT Press, 1993.
[65] T. Erl. Service-oriented Architecture: Concepts, Technology, and Design.
Prentice Hall, 2005.
[66] D. Fahland. Oclets—Scenario-Based Modeling with Petri Nets. In Proc.
PETRI NETS’09, pages 223–242, 2009.
[67] D. Fahland. From Scenarios To Components. PhD thesis, Humboldt-Universität zu Berlin, 2010.
[68] D. Fahland and A. Kantor. Synthesizing Decentralized Components from a
Variant of Live Sequence Charts. In Proc. MODELSWARD’13, pages 25–38,
2013.
[69] D. Fahland, J. Mendling, H. Reijers, B. Weber, M. Weidlich, and S. Zugal.
Declarative versus Imperative Process Modeling Languages: The Issue of Understandability. In Proc. EMMSAD’09, pages 353–366, 2009.
[70] D. Fahland, J. Mendling, H. Reijers, B. Weber, M. Weidlich, and S. Zugal.
Declarative vs. Imperative Process Modeling Languages: The Issue of Maintainability. In Proc. ER-BPM’09, pages 65–76, 2009.
[71] D. Fahland and M. Weidlich. Scenario-based process modeling with Greta. In
Proc. BPMDemos’10, 2010, http://ceur-ws.org/Vol-615.
[72] D. Fahland and H. Woith. Towards Process Models for Disaster Response. In
Proc. PM4HDPS’08, pages 254–265, 2008.
[73] P. Feldmann and D. Miller. Entity Model Clustering: Structuring A Data
Model By Abstraction. The Computer Journal, 29(4):348–360, 1986.
[74] K. Figl and B. Weber. Individual Creativity in Designing Business Processes.
In Proc. HC-PAIS’12, pages 294–306, 2012.
[75] S. Forster. Investigating the Collaborative Process of Process Modeling. In
CAiSE’13 Doctoral Consortium, pages 33–41, 2013.
[76] S. Forster, J. Pinggera, and B. Weber. Collaborative Business Process Modeling. In Proc. EMISA’12, pages 81–94, 2012.
[77] S. Forster, J. Pinggera, and B. Weber. Toward an Understanding of the Collaborative Process of Process Modeling. In Proc. CAiSE Forum’13, pages
98–105, 2013.
[78] C. Francalanci and B. Pernici. Abstraction Levels for Entity-Relationship
Schemas. In Proc. ER’94, pages 456–473, 1994.
[79] D. J. Garland and J. R. Barry. Cognitive Advantage in Sport: The Nature of
Perceptual Structures. The American Journal of Psychology, 104(2):211–228,
1991.
[80] W. Gassler, E. Zangerle, and G. Specht. The Snoopy Concept: Fighting heterogeneity in semistructured and collaborative information systems by using
recommendations. In Proc. CTS’11, pages 61–68, 2011.
[81] W. Gassler, E. Zangerle, M. Tschuggnall, and G. Specht. SnoopyDB: narrowing the gap between structured and unstructured information using recommendations. In Proc. HT’10, pages 271–272, 2010.
[82] C. F. Gauss. Bestimmung der Genauigkeit der Beobachtungen. Zeitschrift für
Astronomie und verwandte Wissenschaften, 1:185–197, 1816.
[83] A. Gemino and Y. Wand. Complexity and clarity in conceptual modeling:
Comparison of mandatory and optional properties. Data & Knowledge Engineering, 55(3):301–326, 2005.
[84] B. George and L. Williams. A structured experiment of test-driven development. Information and Software Technology, 46(5):337–342, 2004.
[85] A. L. Gilchrist and N. Cowan. Chunking. In The encyclopedia of human
behavior, vol. 1, pages 476–483. Academic Press, 2012.
[86] J. F. Gilgun. Qualitative Methods in Family Research, chapter Definitions,
Methologies, and Methods in Qualitative Family Research, pages 22–39. Sage
Publications, 1992.
[87] D. J. Gilmore and T. R. Green. Comprehension and recall of miniature programs. International Journal of Man-Machine Studies, 21(1):31–48, 1984.
[88] M. Glinz, C. Seybold, and S. Meier. Simulation-Driven Creation, Validation
and Evolution of Behavioral Requirements Models. In Proc. MBEES’07, pages
103–112, 2007.
[89] J. A. Goguen and F. J. Varela. Systems and Distinctions; Duality and Complementarity. International Journal of General Systems, 5(1):31–43, 1979.
[90] D. Gopher and R. Brown. On the psychophysics of workload: Why bother
with subjective measure? Human Factors: The Journal of the Human Factors
and Ergonomics Society, 26(5):519–532, 1984.
[91] P. Gray. Psychology. Worth Publishers, 2007.
[92] T. R. Green. Cognitive dimensions of notations. In Proc. BCSHCI’89, pages
443–460, 1989.
[93] T. R. Green and M. Petre. Usability Analysis of Visual Programming Environments: A ’Cognitive Dimensions’ Framework. Journal of Visual Languages
and Computing, 7(2):131–174, 1996.
[94] T. Gschwind, J. Pinggera, S. Zugal, H. Reijers, and B. Weber. A Linear Time
Layout Algorithm for Business Process Models. Technical Report RZ3830,
IBM Research, 2012.
[95] C. Haisjackl. Test Driven Modeling meets Declarative Process Modeling—A
Case Study. Master’s thesis, University of Innsbruck, August 2012.
[96] C. Haisjackl and B. Weber. User Assistance during Process Execution—An
Experimental Evaluation of Recommendation Strategies. In Proc. BPI’10,
pages 134–145, 2011.
[97] C. Haisjackl, S. Zugal, P. Soffer, I. Hadar, M. Reichert, J. Pinggera, and
B. Weber. Making Sense of Declarative Process Models: Common Strategies
and Typical Pitfalls. In Proc. BPMDS’13, pages 2–17, 2013.
[98] D. Z. Hambrick and R. W. Engle. Effects of domain knowledge, working
memory capacity, and age on cognitive performance: An investigation of the
knowledge-is-power hypothesis. Cognitive Psychology, 44(4):339–387, 2002.
[99] D. Harel and R. Marelly. Come, Let’s Play: Scenario-Based Programming
Using LSCs and the Play-Engine. Springer-Verlag, 2003.
[100] A. Hevner, S. March, J. Park, and S. Ram. Design Science in Information
Systems Research. MIS Quarterly, 28(1):75–105, 2004.
[101] T. Hildebrandt and R. Mukkamala. Declarative Event-Based Workflow as
Distributed Dynamic Condition Response Graphs. In Proc. PLACES’10, pages
59–73, 2010.
[102] T. Hildebrandt, R. Mukkamala, and T. Slaats.
Designing a Cross-organizational Case Management System using Dynamic Condition Response
Graphs. In Proc. EDOC’11, pages 161–170, 2011.
[103] T. Hildebrandt, R. Mukkamala, and T. Slaats. Nested Dynamic Condition
Response Graphs. In Proc. FSEN’11, pages 343–350, 2012.
[104] B. Holzner, J. Giesinger, J. Pinggera, S. Zugal, F. Schöpf, A. Oberguggenberger, E. Gamper, A. Zabernigg, B. Weber, and G. Rumpold. The Computer-based Health Evaluation Software (CHES): a software for electronic patient-reported outcome monitoring. BMC Medical Informatics and Decision Making, 12(1), 2012.
[105] S. Hoppenbrouwers, L. Lindeman, and E. Proper. Capturing Modeling Processes - Towards the MoDial Modeling Laboratory. In Proc. OTM’06, pages
1242–1252, 2006.
[106] S. Hoppenbrouwers, E. Proper, and T. van der Weide. Formal Modelling as a
Grounded Conversation. In Proc. LAP’05, pages 139–155, 2005.
[107] S. Hoppenbrouwers, E. Proper, and T. Weide. A Fundamental View on the
Process of Conceptual Modeling. In Proc. ER’05, pages 128–143, 2005.
[108] M. Höst, B. Regnell, and C. Wohlin. Using Students as Subjects—A Comparative Study of Students and Professionals in Lead-Time Impact Assessment.
Empirical Software Engingeering, 5(3):201–214, 2000.
[109] A. S. Huff. Mapping Strategic Thought. Wiley, 1990.
[110] M. Indulska, P. Green, J. Recker, and M. Rosemann. Business Process Modeling: Perceived Benefits. In Proc. ER’09, pages 458–471, 2009.
[111] R. Jeffries, A. Turner, P. Polson, and M. Atwood. The Process Involved in
Designing Software. In Cognitive Skills and Their Acquisition, pages 255–283.
Erlbaum, 1981.
[112] T. D. Jick. Mixing Qualitative and Quantitative Methods: Triangulation in
Action. Administrative Science Quarterly, 24(4):602–611, 1979.
[113] F. Johannsen and S. Leist. Wand and Weber’s Decomposition Model in the
Context of Business Process Modeling. Business & Information Systems Engineering, 4(5):271–286, 2012.
[114] M. A. Just and P. A. Carpenter. A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1):122–149,
1992.
[115] S. Kalyuga, P. Ayres, P. Chandler, and J. Sweller. The Expertise Reversal
Effect. Educational Psychologist, 38(1):23–31, 2003.
[116] S. Kalyuga, P. Chandler, and J. Sweller. Managing Split-attention and Redundancy in Multimedia Instruction. Applied Cognitive Psychology, 13(4):351–
371, 1999.
[117] E. Kant and A. Newell. Problem Solving Techniques for the design of algorithms. Information Processing & Management, 20(1–2):97–118, 1984.
[118] A. Kaser, B. Weber, J. Pinggera, and S. Zugal. Handlungsorientierung bei der
Planung von Softwareprojekten. In Proc. TEAP’10, pages 253–253, 2010.
[119] A. E. Kazdin. Artifact, bias, and complexity of assessment: the ABCs of
reliability. Journal of Applied Behavior Analysis, 10(1):141–150, 1977.
[120] V. Khatri, I. Vessey, P. C. V. Ramesh, and S.-J. Park. Understanding Conceptual Schemas: Exploring the Role of Application and IS Domain Knowledge.
Information Systems Research, 17(1):81–99, 2006.
[121] B. Kitchenham. Procedures for performing systematic reviews. Technical
report, Keele University Joint Technical Report TR/SE-0401, 2004.
[122] N. F. Kock. Product Flow, Breadth and Complexity of Business Processes: An
Empirical Study of 15 Business Processes in Three Organizations. Business
Process Re-engineering & Management Journal, 2(2):8–22, 1996.
[123] J. Kolb, K. Kammerer, and M. Reichert. Updatable Process Views for Usercentered Adaption of Large Process Models. In Proc. ICSOC’12, pages 484–
498, 2012.
[124] J. Kolb, M. Reichert, and B. Weber. Using Concurrent Task Trees for
Stakeholder-centered Modeling and Visualization of Business Processes. In
Proc. S-BPM ONE’12, pages 137–151, 2012.
[125] P. C. Kyllonen and D. L. Stephens. Cognitive Abilities as Determinants of Success in Acquiring Logic Skill. Learning and Individual Differences, 2(2):129–
160, 1990.
[126] E. Lamma, P. Mello, M. Montali, F. Riguzzi, and S. Storari. Inducing Declarative Logic-Based Models from Labeled Traces. In Proc. BPM’07, pages 344–
359, 2007.
[127] A. Lanz, B. Weber, and M. Reichert. Time Patterns for Process-aware Information Systems: A Pattern-based Analysis - Revised version. Technical
report, University of Ulm, 2009.
[128] J. H. Larkin and H. A. Simon. Why a Diagram is (Sometimes) Worth Ten
Thousand Words. Cognitive Science, 11(1):65–100, 1987.
[129] R. Lenz and M. Reichert. IT support for healthcare processes—premises,
challenges, perspectives. Data & Knowledge Engineering, 61(1):39–58, 2007.
[130] C. Leys, C. Ley, O. Klein, P. Bernard, and L. Licata. Detecting outliers: Do
not use standard deviation around the mean, use absolute deviation around
the median. Journal of Experimental Social Psychology, 49(4):764–766, 2013.
[131] H. Liang, J. Dingel, and Z. Diskin. A Comparative Survey of Scenario-based
to State-based Model Synthesis Approaches. In Proc. SCESM’06, pages 5–12,
2006.
[132] L. T. Ly, S. Rinderle, and P. Dadam. Integration and verification of semantic constraints in adaptive process management systems. Data & Knowledge
Engineering, 64(1):3–23, 2008.
[133] A. Marchenko, A. Pekka, and T. Ihme. Long-Term Effects of Test-Driven
Development: A Case Study. In XP, pages 13–22, 2009.
[134] K. Masri. Conceptual Model Design for Better Understanding. PhD thesis,
Simon Fraser University, 2009.
[135] R. Mayer and P. Chandler. When learning is just a click away: Does simple
user interaction foster deeper understanding of multimedia messages. Journal
of Educational Psychology, 93(2):390–397, 2001.
[136] S. McDonald and R. J. Stevenson. Disorientation in hypertext: the effects
of three text structures on navigation performance. Applied Ergonomics,
27(1):61–68, 1996.
[137] J. Melcher, J. Mendling, H. Reijers, and D. Seese. On Measuring the Understandability of Process Models. In Proc. BPM Workshops’09, pages 465–476,
2009.
[138] J. Melcher and D. Seese. Towards Validating Prediction Systems for Process Understandability: Measuring Process Understandability. In Proc.
SYNASC’08, pages 564–571, 2008.
[139] J. Mendling. Detection and Prediction of Errors in EPC Business Process
Models. PhD thesis, Vienna University of Economics and Business Administration, 2007.
[140] J. Mendling. Metrics for Process Models: Empirical Foundations of Verification, Error Prediction and Guidelines for Correctness. Springer, 2008.
[141] J. Mendling. Empirical Studies in Process Model Verification. In Transactions
on Petri Nets and Other Models of Concurrency II, volume 5460 of Lecture
Notes in Computer Science, pages 208–224. Springer Berlin Heidelberg, 2009.
[142] J. Mendling, G. Neumann, and W. M. P. van der Aalst. Understanding the
Occurrence of Errors in Process Models based on Metrics. In Proc. CoopIS’07,
pages 113–130, 2007.
[143] J. Mendling, H. Reijers, and J. Cardoso. What Makes Process Models Understandable? In Proc. BPM’07, pages 48–63, 2007.
[144] J. Mendling, H. Reijers, and J. Recker. Activity Labeling in Process Modeling:
Empirical Insights and Recommendations. Information Systems, 35(4):467–
482, 2010.
[145] J. Mendling, H. Reijers, and W. M. P. van der Aalst. Seven process modeling
guidelines (7PMG). Information & Software Technology, 52(2):127–136, 2010.
[146] M. B. Miles. Qualitative Data as an Attractive Nuisance: The Problem of
Analysis. Administrative Science Quarterly, 24(4):590–601, 1979.
[147] G. Miller. The Magical Number Seven, Plus or Minus Two: Some Limits on
Our Capacity for Processing Information. The Psychological Review, 63(2):81–
97, 1956.
[148] M. Montali, M. Pesic, W. M. P. van der Aalst, F. Chesani, P. Mello, and
S. Storari. Declarative Specification and Verification of Service Choreographies. ACM Transactions on the Web, 4(1):1–62, 2010.
[149] D. Moody. A Multi-Level Architecture for Representing Enterprise Data Models. In Proc. ER’97, pages 184–197, 1997.
[150] D. L. Moody. Cognitive Load Effects on End User Understanding of Conceptual Models: An Experimental Analysis. In Proc. ADBIS’04, pages 129–143,
2004.
[151] D. L. Moody. The ”Physics” of Notations: Toward a Scientific Basis for
Constructing Visual Notations in Software Engineering. IEEE Transactions
on Software Engineering, 35(6):756–779, 2009.
[152] D. L. Moody and A. Flitman. A Methodology for Clustering Entity Relationship Models—A Human Information Processing Approach. In Proc. ER’99,
pages 114–130, 1999.
[153] R. Moreno and R. E. Mayer. Cognitive Principles of Multimedia Learning:
The Role of Modality and Contiguity. Journal of Educational Psychology,
91(2):358–368, 1999.
[154] W. T. Morris. On the Art of Modeling. Management Science, 13(12):B–707–
B–717, 1967.
[155] R. Mugridge and W. Cunningham. Fit for Developing Software: Framework
for Integrated Tests. Prentice Hall, 2005.
[156] R. Mukkamala. A Formal Model For Declarative Workflows: Dynamic Condition Response Graphs. PhD thesis, IT University of Copenhagen, 2012.
[157] R. Mukkamala, T. Hildebrandt, and T. Slaats. Towards Trustworthy Adaptive Case Management with Dynamic Condition Response Graphs. In Proc.
EDOC’13, accepted.
[158] D. Müller, M. Reichert, and J. Herbst. A New Paradigm for the Enactment and
Dynamic Adaptation of Data-driven Process Structures. In Proc. CAiSE’08,
pages 48–63, 2008.
[159] N. Mulyar, M. Pesic, W. M. P. van der Aalst, and M. Peleg. Declarative and
Procedural Approaches for Modelling Clinical Guidelines: Addressing Flexibility Issues. In Proc. ProHealth’07, pages 335–346, 2007.
[160] J. Mylopoulos. Information modeling in the time of the revolution. Information
Systems, 23(3/4):127–155, 1998.
[161] J. Nakamura and M. Csikszentmihalyi. The Concept of Flow. In Handbook of
Positive Psychology, pages 89–105. Oxford University Press, 2002.
[162] A. Newell. Unified Theories of Cognition. Harvard University Press, 1990.
[163] D. A. Norman. Cognitive artifacts. Department of Cognitive Science, University of California, San Diego, 1990.
[164] J. F. Nunamaker, M. Chen, and T. D. Purdin. Systems Development in Information Systems Research. Journal of Management Information Systems,
7(3):89–106, 1991.
[165] OMG. UML Testing Profile, Version 1.0. http://www.omg.org/cgi-bin/doc?formal/05-07-07, 2005. Accessed: April 2013.
[166] OMG. UML Version 2.3. http://www.omg.org/spec/UML/2.3/Superstructure/PDF, 2010. Accessed: April 2013.
[167] OMG. BPMN Version 2.0. http://www.omg.org/spec/BPMN/2.0/PDF, 2011.
Accessed: April 2013.
[168] F. Paas, A. Renkl, and J. Sweller. Cognitive Load Theory and Instructional
Design: Recent Developments. Educational Psychologist, 38(1):1–4, 2003.
[169] M. Pančur, M. Ciglarič, M. Trampuš, and T. Vidmar. Towards Empirical
Evaluation of Test-Driven Development in a University Environment. In Proc.
EUROCON’03, pages 83–86, 2003.
[170] D. L. Parnas. On the Criteria to be Used in Decomposing Systems into Modules. Communications of the ACM, 15(12):1053–1058, 1972.
[171] J. Parsons and L. Cole. What do the pictures mean? Guidelines for experimental evaluation of representation fidelity in diagrammatical conceptual
modeling techniques. Data & Knowledge Engineering, 55(3):327–342, 2005.
[172] D. Paulson and Y. Wand. An Automated Approach to Information Systems
Decomposition. IEEE Transactions on Software Engineering, 18(3):174–189,
1992.
[173] K. Peffers, T. Tuunanen, M. Rothenberger, and S. Chatterjee. A Design
Science Research Methodology for Information Systems Research. Journal of
Management Information Systems, 24(3):45–77, 2007.
[174] R. Pérez-Castillo, B. Weber, J. Pinggera, S. Zugal, I. G.-R. de Guzmán, and
M. Piattini. Generating event logs from nonprocess-aware systems enabling
business process mining. Enterprise Information Systems, 5(3):301–335, 2011.
[175] M. Pesic. Constraint-Based Workflow Management Systems: Shifting Control
to Users. PhD thesis, TU Eindhoven, 2008.
[176] M. Pesic, H. Schonenberg, N. Sidorova, and W. M. P. van der Aalst.
Constraint-Based Workflow Models: Change Made Easy. In Proc. CoopIS’07,
pages 77–94, 2007.
[177] M. Pesic, H. Schonenberg, and W. M. P. van der Aalst. DECLARE: Full
Support for Loosely-Structured Processes. In Proc. EDOC’07, pages 287–298,
2007.
[178] M. Pesic and W. M. P. van der Aalst. A Declarative Approach for Flexible
Business Processes Management. In Proc. BPM Workshops’06, pages 169–180,
2006.
[179] M. A. Pett. Nonparametric Statistics for Health Care Research: Statistics for
Small Samples and Unusual Distributions. Sage Publications, 1997.
[180] P. Pichler, B. Weber, S. Zugal, J. Pinggera, J. Mendling, and H. Reijers.
Imperative versus Declarative Process Modeling Languages: An Empirical
Investigation. In Proc. ER-BPM’11, pages 383–394, 2012.
[181] M. Pidd. Tools for Thinking: Modelling in Management Science. Wiley, 2003.
[182] J. Pinggera. Handling Uncertainty in Software Projects—A Controlled Experiment. Master’s thesis, University of Innsbruck, Institute of Computer Science,
2009.
[183] J. Pinggera, M. Furtner, M. Martini, P. Sachse, K. Reiter, S. Zugal, and
B. Weber. Investigating the Process of Process Modeling with Eye Movement
Analysis. In Proc. ER-BPM’12, pages 438–450, 2013.
[184] J. Pinggera, T. Porcham, S. Zugal, and B. Weber. LiProMo—Literate Process
Modeling. In Proc. CAiSE Forum’12, pages 163–170, 2012.
[185] J. Pinggera, P. Soffer, D. Fahland, M. Weidlich, S. Zugal, B. Weber, H. Reijers,
and J. Mendling. Styles in business process modeling: an exploration and a
model. Software & Systems Modeling, 2013, DOI: 10.1007/s10270-013-0349-1.
[186] J. Pinggera, P. Soffer, S. Zugal, B. Weber, M. Weidlich, D. Fahland, H. Reijers,
and J. Mendling. Modeling Styles in Business Process Modeling. In Proc.
BPMDS’12, pages 151–166, 2012.
[187] J. Pinggera, S. Zugal, and B. Weber. Alaska Simulator—Supporting Empirical
Evaluation of Process Flexibility. In Proc. WETICE’09, pages 231–233, 2009.
[188] J. Pinggera, S. Zugal, and B. Weber. Investigating the Process of Process
Modeling with Cheetah Experimental Platform. In Proc. ER-POIS’10, pages
13–18, 2010.
[189] J. Pinggera, S. Zugal, and B. Weber. Investigating the Process of Process
Modeling with Cheetah Experimental Platform. EMISA Forum, 30(2):25–31,
2010.
[190] J. Pinggera, S. Zugal, B. Weber, D. Fahland, M. Weidlich, J. Mendling, and
H. Reijers. How the Structuring of Domain Knowledge Can Help Casual
Process Modelers. In Proc. ER’10, pages 231–237, 2010.
[191] J. Pinggera, S. Zugal, B. Weber, W. Wild, and M. Reichert. Integrating Case-Based Reasoning with Adaptive Process Management. Technical Report TR-CTIT-08-11, Centre for Telematics and Information Technology, University of
Twente, 2008.
[192] J. Pinggera, S. Zugal, M. Weidlich, D. Fahland, B. Weber, J. Mendling, and
H. Reijers. Tracing the Process of Process Modeling with Modeling Phase
Diagrams. In Proc. ER-BPM’11, pages 370–382, 2012.
[193] A. Polyvyanyy, S. Smirnov, and M. Weske. Process Model Abstraction: A
Slider Approach. In Proc. EDOC’08, pages 325–331, 2008.
[194] M. Poppendieck and T. Poppendieck. Implementing Lean Software Development: From Concept to Cash. Addison-Wesley Professional, 2006.
[195] A. Porter and L. Votta. Comparing Detection Methods For Software Requirements Inspections: A Replication Using Professional Subjects. Empirical
Software Engineering, 3(4):355–379, 1998.
[196] S. Quaglini, M. Stefanelli, G. Lanzola, V. Caporusso, and S. Panzarasa.
Flexible guideline-based patient careflow systems. Artificial Intelligence in
Medicine, 22(1):65–80, 2001.
[197] D. Quartel. Action relations. Basic design concepts for behaviour modelling
and refinement. PhD thesis, University of Twente, 1998.
[198] D. Quartel, L. Pires, M. van Sinderen, H. Franken, and C. Vissers. On the
Role of Basic Design Concepts in Behaviour Structuring. Computer networks
and ISDN systems, 29(4):413–436, 1997.
[199] R. Weber. The Problem of the Problem. MIS Quarterly, 27(1):iii–ix, 2003.
[200] R Core Team. R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria, 2013.
[201] O. Rauh and E. Stickel. Entity tree clustering—A method for simplifying ER
designs. In Proc. ER’92, pages 62–78, 1992.
[202] J. C. Recker, M. Rosemann, P. Green, and M. Indulska. Do Ontological
Deficiencies in Modeling Grammars Matter? MIS Quarterly, 35(1):57–79,
2011.
[203] M. Reichert and P. Dadam. ADEPTflex: Supporting Dynamic Changes of
Workflow without Losing Control. Journal of Intelligent Information Systems,
10(2):93–129, 1998.
[204] M. Reichert, J. Kolb, R. Bobrik, and T. Bauer. Enabling Personalized Visualization of Large Business Processes through Parameterizable Views. In Proc.
SAC’12, pages 1653–1660, 2012.
[205] M. Reichert and B. Weber. Enabling Flexibility in Process-Aware Information
Systems: Challenges, Methods, Technologies. Springer, 2012.
[206] H. Reijers and J. Mendling. Modularity in Process Models: Review and Effects.
In Proc. BPM’08, pages 20–35, 2008.
[207] H. Reijers and J. Mendling. A Study into the Factors that Influence the Understandability of Business Process Models. IEEE Transactions on Systems,
Man and Cybernetics, Part A: Systems and Humans, 41(3):449–462, 2011.
[208] H. Reijers, J. Mendling, and R. Dijkman. Human and automatic modularizations of process models to enhance their comprehension. Information Systems,
36(5):881–897, 2011.
[209] H. Reijers, T. Slaats, and C. Stahl. Declarative Modeling—An Academic
Dream or the Future for BPM? In Proc. BPM’13, pages 307–322, 2013.
[210] P. Rittgen. Negotiating Models. In Proc. CAiSE’07, pages 561–573, 2007.
[211] P. Rittgen. Collaborative Modeling—A Design Science Approach. In Proc.
HICSS’09, pages 1–10, 2009.
[212] S. Rockwell and A. Bajaj. COGEVAL: Applying Cognitive Theories to Evaluate Conceptual Models. Advanced Topics in Database Research, 4:255–282,
2005.
[213] Y. Rogers and H. Brignull. Computational Offloading: Supporting Distributed
Team Working Through Visually Augmenting Verbal Communication. In
Proc. CogSci’03, pages 1011–1016, 2003.
[214] J. Rosenberg. Statistical Methods and Measurement. In Guide to Advanced
Empirical Software Engineering, pages 155–184. Springer, 2008.
[215] P. Runeson. Using Students as Experiment Subjects—An Analysis on Graduate and Freshmen Student Data. In Proc. EASE’03, pages 95–102, 2003.
[216] S. W. Sadiq, M. E. Orlowska, and W. Sadiq. Specification and validation of
process constraints for flexible workflows. Information Systems, 30(5):349–378,
2005.
[217] M. Scaife and Y. Rogers. External cognition: how do graphical representations
work? International Journal of Human-Computer Studies, 45(2):185–213,
1996.
[218] A. W. Scheer. ARIS: Business Process Modeling, 3rd ed. Springer, 2000.
[219] M. Schier. Adoption of Decision Deferring Techniques in Plan-driven Software
Projects. Master's thesis, Department of Computer Science,
University of Innsbruck, 2008.
[220] H. Schonenberg, B. Weber, B. van Dongen, and W. M. P. van der Aalst.
Supporting Flexible Processes through Recommendations Based on History.
Proc. BPM’08, pages 51–66, 2008.
[221] M. Schrepfer, J. Wolf, J. Mendling, and H. Reijers. The Impact of Secondary
Notation on Process Model Understanding. In Proc. PoEM’09, pages 161–175,
2009.
[222] C. B. Seaman. Qualitative Methods. In Guide to Advanced Empirical Software
Engineering, pages 35–62. Springer, 2008.
[223] I. Seeber, B. Weber, and R. Maier. CoPrA: A Process Analysis Technique
to Investigate Collaboration in Groups. In Proc. HICCS’12, pages 363–372,
2012.
[224] A. Sharp and P. McDermott. Workflow Modeling: Tools for Process Improvement and Application Development. Artech House, 2011.
[225] P. Shoval, R. Danoch, and M. Balaban. Hierarchical entity-relationship diagrams: the model, method of creation and experimental evaluation. Requirements Engineering, 9(4):217–228, 2004.
[226] K. Siau and M. Rossi. Evaluation techniques for systems analysis and design
modelling methods—a review and comparative analysis. Information Systems
Journal, 21(3):249–268, 2008.
[227] J. P. Simmons, L. D. Nelson, and U. Simonsohn. False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting
Anything as Significant. Psychological Science, 22(11):1359–1366, 2011.
[228] J. Singer, S. E. Sim, and T. C. Lethbridge. Software Engineering Data Collection for Field Studies. In Guide to Advanced Empirical Software Engineering,
pages 9–34. Springer, 2008.
[229] M. P. Singh. Semantical Considerations on Workflows: An Algebra for Intertask Dependencies. In Proc. DBPL’96, pages 1–15, 1996.
[230] M. Siniaalto. Test driven development: empirical body of evidence. Technical
report, Information Technology for European Advancement, 2006.
[231] S. Smirnov, R. Dijkman, J. Mendling, and M. Weske. Meronymy-Based Aggregation of Activities in Business Process Models. In Proc. ER’10, pages
1–14, 2010.
[232] S. Smirnov, M. Weidlich, and J. Mendling. Business Process Model Abstraction Based on Behavioral Profiles. In Proc. ICSOC’10, pages 1–16, 2010.
[233] P. Soffer, M. Kaner, and Y. Wand. Towards understanding the process of
process modeling: Theoretical and empirical considerations. In Proc. ERBPM’11, pages 357–369, 2012.
[234] A. Strauss and J. Corbin. Grounded theory methodology: An overview, pages
273–285. Sage, 1994.
[235] M. Svahnberg, A. Aurum, and C. Wohlin. Using Students as Subjects—an
Empirical Evaluation. In Proc. ESEM’08, pages 288–290, 2008.
[236] J. Sweller. Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2):257–285, 1988.
[237] J. Sweller. Instructional Design in Technical Areas. ACER Press, Camberwell,
1999.
[238] J. Sweller and P. Chandler. Why Some Material Is Difficult to Learn. Cognition
and Instruction, 12(3):185–233, 1994.
[239] S. Taylor and R. Bogdan. Introduction to Qualitative Research Methods. Wiley,
1984.
[240] L. Thom, M. Reichert, C. Iochpe, and J. P. M. de Oliveira. Why Rigid Process Management Technology Hampers Computerized Support of Healthcare
Processes? In Proc. WIM’10, pages 1522–1531, 2010.
[241] V. Torres, S. Zugal, B. Weber, M. Reichert, and C. Ayora. Understandability Issues of Approaches Supporting Business Process Variability. Technical
Report ProS-TR-2012-03, Universidad Politecnica de Valencia, 2012.
[242] V. Torres, S. Zugal, B. Weber, M. Reichert, C. Ayora, and V. Pelechano. A
Qualitative Comparison of Approaches Supporting Business Process Variability. In Proc. rBPM’12, pages 560–572, 2012.
[243] A. Tort and A. Olivé. First Steps Towards Conceptual Schema Testing. In
Proc. CAiSE Forum’09, pages 1–6, 2009.
[244] A. Tort and A. Olivé. An approach to testing conceptual schemas. Data &
Knowledge Engineering, 69(6):598–618, 2010.
[245] A. Tort, A. Olivé, and M.-R. Sancho. An approach to test-driven development
of conceptual schemas. Data & Knowledge Engineering, 70(12):1088–1111,
2011.
[246] A. Tort, A. Olivé, and M.-R. Sancho. The CSTL Processor: A Tool for
Automated Conceptual Schema Testing. In Proc. ER Workshops’11, pages
349–352, 2011.
[247] A. Tort, A. Olivé, and M.-R. Sancho. On Checking Executable Conceptual
Schema Validity by Testing. In Proc. DEXA’12, pages 249–264, 2012.
[248] W. J. Tracz. Computer programming and the human thought process. Software: Practice and Experience, 9(2):127–137, 1979.
[249] N. Unsworth and R. W. Engle. The Nature of Individual Differences in Working Memory Capacity: Active Maintenance in Primary Memory and Controlled Search From Secondary Memory. Psychological Review, 114(1):104–
132, 2007.
[250] P. van Bommel, S. Hoppenbrouwers, E. Proper, and T. van der Weide. Exploring Modelling Strategies in a Meta-modelling Context. In Proc. OTM’06,
pages 1128–1137, 2006.
[251] W. M. P. van der Aalst, H. T. de Beer, and B. van Dongen. Process Mining
and Verification of Properties: An Approach Based on Temporal Logic. In
Proc. OTM’05, pages 130–147, 2005.
[252] W. M. P. Van der Aalst and J. Dehnert. Bridging the Gap between Business
Models and Workflow Specifications. International Journal of Cooperative
Information Systems, 13(3):289–332, 2004.
[253] W. M. P. van der Aalst and M. Pesic. DecSerFlow: Towards a Truly Declarative Service Flow Language. In The Role of Business Processes in Service
Oriented Architectures, number 06291 in Dagstuhl Seminar Proceedings, pages
1–23, 2006.
[254] W. M. P. van der Aalst and M. Pesic. Specifying and Monitoring Service
Flows: Making Web Services Process-Aware. In Test and Analysis of Web
Services, pages 11–55. Springer, 2007.
[255] W. M. P. van der Aalst, A. H. ter Hofstede, B. Kiepuszewski, and A. P. Barros.
Workflow Patterns. Distributed and Parallel Database, 14(3):5–51, 2003.
[256] W. M. P. van der Aalst and A. H. M. ter Hofstede. YAWL: Yet Another
Workflow Language. Information Systems, 30(4):245–275, June 2005.
[257] W. M. P. van der Aalst and M. Weske. Case handling: a new paradigm
for business process support. Data & Knowledge Engineering, 53(2):129–162,
2005.
[258] B. F. van Dongen, A. K. A. de Medeiros, H. M. W. Verbeek, A. J. M. M.
Weijters, and W. M. P. van der Aalst. The ProM framework: A new era in
process mining tool support. In Proc. ICATPN’05, pages 444–454, 2005.
[259] G. V. van Zanten, S. Hoppenbrouwers, and H. Proper. System Development
as a Rational Communicative Process. Journal of Systemics, Cybernetics and
Informatics, 2(4):1–5, 2004.
[260] I. Vanderfeesten, H. Reijers, J. Mendling, W. M. P. van der Aalst, and J. Cardoso. On a Quest for Good Process Models: The Cross-Connectivity Metric.
In Proc. CAiSE’08, pages 480–494, 2008.
[261] E. Verbeek, M. Hattem, H. Reijers, and W. Munk. Protos 7.0: Simulation
Made Accessible. In Proc. APN’05, pages 465–474, 2005.
[262] J. Vlissides, R. Helm, R. Johnson, and E. Gamma. Design Patterns: Elements
of Reusable Object-Oriented Software. Addison-Wesley, 1994.
[263] W. P. Vogt. Dictionary of Statistics & Methodology: A Nontechnical Guide
for the Social Sciences. SAGE Publications, 2011.
[264] J. Wainer, F. Bezerra, and P. Barthelmess. Tucupi: a flexible workflow system
based on overridable constraints. In Proc. SAC’04, pages 498–502, 2004.
[265] Y. Wand and R. Weber. An Ontological Model of an Information System.
IEEE Transactions on Software Engineering, 16(11):1282–1292, 1990.
[266] Y. Wand and R. Weber. Research Commentary: Information Systems and
Conceptual Modeling—A Research Agenda. Information Systems Research,
13(4):363–376, 2002.
[267] E. J. Webb, D. T. Campbell, R. D. Schwartz, L. Sechrest, and J. B. Grove.
Nonreactive Measures in the Social Sciences. Houghton, 1981.
[268] B. Weber, J. Pinggera, V. Torres, and M. Reichert. Change Patterns for Model
Creation: Investigating the Role of Nesting Depth. In Proc. Cognise’13, pages
198–204, 2013.
[269] B. Weber, J. Pinggera, V. Torres, and M. Reichert. Change Patterns in Use:
A Critical Evaluation. In Proc. BPMDS’13, pages 261–276, 2013.
[270] B. Weber, J. Pinggera, S. Zugal, and W. Wild. Alaska Simulator—A Journey
to Planning. In Proc. XP’09, pages 253–254, 2009.
[271] B. Weber, J. Pinggera, S. Zugal, and W. Wild. Alaska Simulator Toolset
for Conducting Controlled Experiments. In Proc. CAiSE Forum’10, pages
205–221, 2010.
[272] B. Weber, J. Pinggera, S. Zugal, and W. Wild. Handling Events During
Business Process Execution: An Empirical Test. In Proc. ER-POIS’10, pages
19–30, 2010.
[273] B. Weber, M. Reichert, J. Mendling, and H. Reijers. Refactoring large process
model repositories. Computers in Industry, 62(5):467–486, 2011.
[274] B. Weber, M. Reichert, and S. Rinderle. Change Patterns and Change Support
Features—Enhancing Flexibility in Process-Aware Information Systems. Data
& Knowledge Engineering, 66(3):438–466, 2008.
[275] B. Weber, M. Reichert, S. Rinderle-Ma, and W. Wild. Providing Integrated
Life Cycle Support in Process-Aware Information Systems. International Journal of Cooperative Information Systems, 18(1):115–165, 2009.
[276] B. Weber, H. Reijers, S. Zugal, and W. Wild. The Declarative Approach to
Business Process Execution: An Empirical Test. In Proc. CAiSE’09, pages
270–285, 2009.
[277] B. Weber, S. Zugal, J. Pinggera, and W. Wild. Experiencing Process Flexibility Patterns with Alaska Simulator. In Proc. BPMDemos’09, pages 13–16,
2009.
[278] M. Weidlich, S. Zugal, J. Pinggera, B. Weber, H. Reijers, and J. Mendling.
The Impact of Sequential and Circumstantial Changes on Process Models. In
Proc. ER-POIS’10, pages 43–54, 2010.
[279] M. Weiser. Programmers Use Slices When Debugging. Communications of the
ACM, 25(7):446–452, 1982.
[280] M. Weske. Workflow Management Systems: Formal Foundation, Conceptual
Design, Implementation Aspects. PhD thesis, University of Münster, 2000.
[281] M. Weske. Business Process Management: Concepts, Methods, Technology.
Springer, 2007.
[282] R. Wieringa. Design Science Methodology: Principles and Practice. In Proc.
ICSE’10, pages 493–494, 2010.
[283] R. Wieringa. Relevance and Problem Choice in Design Science. In Proc.
DESRIST’10, pages 61–76, 2010.
[284] R. Wieringa. Towards a Unified Checklist for Empirical Research in Software
Engineering: First Proposal. In Proc. EASE’12, pages 161–165, 2012.
[285] R. Wieringa, N. Condori-Fernandez, M. Daneva, B. Mutschler, and O. Pastor.
Lessons Learned from Evaluating a Checklist for Reporting Experimental and
Observational Research. In Proc. ESEM’12, pages 157–160, 2012.
[286] C. Wohlin, P. Runeson, M. Höst, M. Ohlsson, B. Regnell, and A. Wesslén.
Experimentation in Software Engineering: An Introduction. Kluwer, 2000.
[287] J. Wolfe. Guided Search 2.0: A revised model of visual search. Psychonomic
Bulletin & Review, 1(2):202–238, 1994.
[288] R. K. Yin. Case Study Research: Design and Methods. Sage, Thousand Oaks,
CA, 2002.
[289] E. Zangerle, W. Gassler, and G. Specht. Recommending #-Tags in Twitter.
In Proc. SASWeb’11, pages 67–78, 2011.
[290] E. Zangerle, W. Gassler, and G. Specht. Using Tag Recommendations to Homogenize Folksonomies in Microblogging Environments. In Proc. SocInfo’11,
pages 113–126, 2011.
[291] J. Zhang. The Nature of External Representations in Problem Solving. Cognitive Science, 21(2):179–217, 1997.
[292] J. Zhang and D. A. Norman. Representations in Distributed Cognitive Tasks.
Cognitive Science, 18(1):87–122, 1994.
[293] X. Zhao, C. Liu, Y. Yang, and W. Sadiq. Aligning Collaborative Business
Processes—An Organization-Oriented Perspective. IEEE Transactions on Systems,
Man, and Cybernetics, Part A: Systems and Humans, 39(6):1152–1164, 2009.
[294] D. W. Zimmerman. A Note on Interpretation of the Paired-Samples t Test.
Journal of Educational and Behavioral Statistics, 22(3):349–360, 1997.
[295] S. Zugal. Agile versus Plan-Driven Approaches to Planning—A Controlled
Experiment. Master’s thesis, University of Innsbruck, October 2008.
[296] S. Zugal, C. Haisjackl, J. Pinggera, and B. Weber. Empirical Evaluation of
Test Driven Modeling. International Journal of Information System Modeling
and Design, 4(2):23–43, 2013.
[297] S. Zugal, J. Pinggera, J. Mendling, H. Reijers, and B. Weber. Assessing the
Impact of Hierarchy on Model Understandability—A Cognitive Perspective.
In Proc. EESSMod’11, pages 123–133, 2011.
[298] S. Zugal, J. Pinggera, H. Reijers, M. Reichert, and B. Weber. Making the
Case for Measuring Mental Effort. In Proc. EESSMod’12, pages 37–42, 2012.
[299] S. Zugal, J. Pinggera, and B. Weber. Assessing Process Models with Cognitive
Psychology. In Proc. EMISA’11, pages 177–182, 2011.
[300] S. Zugal, J. Pinggera, and B. Weber. Creating Declarative Process Models
Using Test Driven Modeling Suite. In Proc. CAiSE Forum’11, pages 16–32,
2011.
[301] S. Zugal, J. Pinggera, and B. Weber. The Impact of Testcases on the Maintainability of Declarative Process Models. In Proc. BPMDS’11, pages 163–177,
2011.
[302] S. Zugal, J. Pinggera, and B. Weber. Toward Enhanced Life-Cycle Support for
Declarative Processes. Journal of Software: Evolution and Process, 24(3):285–
302, 2012.
[303] S. Zugal, P. Soffer, C. Haisjackl, J. Pinggera, M. Reichert, and B. Weber.
Investigating Expressiveness and Understandability of Hierarchy in Declarative Business Process Models. Software & Systems Modeling, 2013, DOI:
10.1007/s10270-013-0356-2.
[304] S. Zugal, P. Soffer, J. Pinggera, and B. Weber. Expressiveness and Understandability Considerations of Hierarchy in Declarative Business Process Models. In
Proc. BPMDS’12, pages 167–181, 2012.