University of Innsbruck
Department of Computer Science

Dissertation

Applying Cognitive Psychology for Improving the Creation, Understanding and Maintenance of Business Process Models

Stefan Zugal

submitted to the Faculty of Mathematics, Computer Science and Physics of the University of Innsbruck in partial fulfillment of the requirements for the degree of “Doktor der Naturwissenschaften”

Advisor: Assoc.–Prof. Dr. Barbara Weber

Innsbruck, 2013

Abstract

Considering the wide–spread adoption of business process modeling, the role of process models has become ever more central. Still, industrial process model collections display a wide range of quality problems, resulting in active research on the quality of process models, business process modeling languages and methods. This thesis contributes to this stream of research by advocating the adoption of concepts from cognitive psychology for improving business process modeling languages and methods. To address this rather broad research statement, this thesis focuses on two particular problems that are approached with the support of cognitive psychology. First, issues related to the creation, understanding and maintenance of declarative process models are analyzed. To counteract these problems, the adoption of test cases, supporting the test driven development and maintenance of declarative process models, is proposed. By developing a prototypical implementation of the proposed concepts and employing it in empirical studies, the feasibility of the proposed approach is demonstrated. More specifically, empirical evidence for the positive influence on the creation and understanding of declarative process models is provided in a case study. Furthermore, beneficial effects on the maintenance of declarative process models are established in the course of two experiments. Second, the focus is shifted toward the interplay between a process model’s modularization and the resulting impact on understandability. A systematic literature review reveals apparently contradictory findings related to the understanding of modularization. To resolve these apparent contradictions, a cognitive–theory–based framework for assessing the impact of a process model’s modularization on its understandability is proposed. The subsequent empirical validation in the course of three experiments provides empirical evidence for the validity of the proposed framework. Summarizing, the creation, understanding and maintenance of declarative process models as well as the connection between a process model’s modularization and its understandability are successfully addressed. Thus, it can be concluded that concepts from cognitive psychology are indeed a promising foundation for improving business process modeling languages and methods.

Zusammenfassung

In Anbetracht der weiten Verbreitung der Geschäftsprozessmodellierung ist die Rolle von Geschäftsprozessmodellen, kurz Prozessmodellen, zentraler denn je. Nichtsdestotrotz weisen industrielle Prozessmodelle immer noch eine Vielzahl von Qualitätsproblemen auf, was wiederum zu reger Forschungsaktivität führte, um Prozessmodelle, Prozessmodellierungssprachen sowie Methoden zur Erstellung von Prozessmodellen zu verbessern. Auch diese Dissertation befasst sich mit diesem Forschungszweig und untersucht, wie Qualitätsverbesserung von Prozessmodellen zielgerichtet durch das Übertragen von Konzepten aus der kognitiven Psychologie erreicht werden kann.
Um diese breite Forschungsfrage zu behandeln, werden zwei spezifische Probleme aus der Prozessmodellierung mit Hilfe von Konzepten aus der kognitiven Psychologie adressiert. Der erste Teil dieser Dissertation befasst sich mit dem Erstellen, Verstehen und Warten von deklarativen Prozessmodellen. Um bestehenden Problemen gegenzusteuern, werden Testfälle vorgeschlagen, die es erlauben, deklarative Prozessmodelle testgetrieben zu entwickeln und zu warten. Die Machbarkeit von Testfällen für deklarative Prozessmodelle wird mit Hilfe einer prototypischen Implementierung und folgender empirischer Validierung demonstriert. Dabei wird im Rahmen einer Fallstudie der positive Einfluss von Testfällen auf die Erstellung von deklarativen Prozessmodellen belegt. Des Weiteren wird der positive Einfluss von Testfällen auf die Wartbarkeit von deklarativen Modellen im Zuge zweier Experimente untermauert. Im zweiten Teil dieser Dissertation wird der Fokus auf die Verständlichkeit von Prozessmodellen gelegt und das Zusammenspiel zwischen der Modularisierung eines Prozessmodells und dessen Verständlichkeit genauer beleuchtet. Im Rahmen einer systematischen Literaturanalyse werden offenbar widersprüchliche Ergebnisse empirischer Forschung identifiziert und mit Hilfe eines auf kognitiver Psychologie basierenden Frameworks aufgelöst. Analog zum ersten Teil dieser Dissertation werden die darin ausgearbeiteten Konzepte im Zuge von drei Experimenten empirisch getestet und validiert. In einem Satz gesagt, adressiert diese Dissertation das Erstellen, die Verständlichkeit und die Wartbarkeit von deklarativen Prozessmodellen sowie das Zusammenspiel der Modularisierung eines Prozessmodells und dessen Verständlichkeit mit Hilfe von Konzepten aus der kognitiven Psychologie. In Anbetracht der Ergebnisse kann geschlussfolgert werden, dass Konzepte aus der kognitiven Psychologie in der Tat einen vielversprechenden Ausgangspunkt für die Verbesserung von Prozessmodellen, Prozessmodellierungssprachen und –methoden bilden.

Acknowledgements

From my subjective point of view, a PhD feels like an exhausting and demanding, but still enjoyable and rewarding journey. In this sense, I want to thank everybody who accompanied and supported this endeavor. First and foremost, I am indebted to my advisor, Barbara Weber, for her continuous support. Thank you for your guidance and your encouragement—not only on a professional, but also on a personal level. The next person who undoubtedly deserves a paragraph of his own is Jakob Pinggera. Thank you for the countless discussions, feedback and suggestions. Also, I highly appreciate your support regarding coffee domination as well as all other not–so–serious projects that lighten up the daily routine. Also, I want to thank my parents and family for giving me a home I always cherish coming back to. Thank you for your sheer endless patience, encouragement and solace. Thank you, Eva Zangerle, for giving me a similarly enjoyable new home. Furthermore, this work would not have been possible without the highly appreciated aid of researchers from other universities. Most of all, I am indebted to the support of Manfred Reichert, Hajo Reijers and Pnina Soffer. Besides your professional support, I highly appreciate our cooperation on a personal level. Thank you Manfred for letting me learn that there are also Germans who appreciate soccer players from Austria. Thank you Hajo for introducing me to the pleasures of whisky and squash. Thank you Pnina for not throwing shoes at me.
If you, dear reader, are wondering why your name has not been mentioned so far, please be forgiving and let me catch up on that right now: Thank you.

Eidesstattliche Erklärung

Ich erkläre hiermit an Eides statt durch meine eigenhändige Unterschrift, dass ich die vorliegende Arbeit selbständig verfasst und keine anderen als die angegebenen Quellen und Hilfsmittel verwendet habe. Alle Stellen, die wörtlich oder inhaltlich den angegebenen Quellen entnommen wurden, sind als solche kenntlich gemacht. Die vorliegende Arbeit wurde bisher in gleicher oder ähnlicher Form noch nicht als Magister–/Master–/Diplomarbeit/Dissertation eingereicht.

Innsbruck, Oktober 2013
Stefan Zugal

Contents

1 Introduction
2 Research Methodology
3 Background
  3.1 Declarative Business Process Models
    3.1.1 Characteristics of Declarative Business Process Models
    3.1.2 Semantics of Declarative Sub–Processes
    3.1.3 Enhanced Expressiveness
    3.1.4 Impact on Adaptation
  3.2 Cognitive Psychology
    3.2.1 Search
    3.2.2 Recognition
    3.2.3 Inference
  3.3 Cheetah Experimental Platform
4 Test Driven Modeling
  4.1 Introduction
  4.2 Terminology
  4.3 Example
  4.4 Understandability and Maintainability of Declarative Process Models
    4.4.1 Understandability
    4.4.2 Maintainability
  4.5 Testing Framework for Declarative Processes
    4.5.1 Software Testing Techniques
    4.5.2 Process Testing Framework Concepts
    4.5.3 Test Driven Modeling and the Declarative Process Life–Cycle
    4.5.4 Limitations
  4.6 Test Driven Modeling Suite
    4.6.1 Software Components
    4.6.2 Support for Empirical Evaluation, Execution and Verification
    4.6.3 Example
  4.7 The Influence of TDM on Model Creation: A Case Study
    4.7.1 Definition and Planning of the Case Study
    4.7.2 Performing the Case Study
    4.7.3 Limitations and Discussion
  4.8 The Influence of TDM on Model Maintenance: Experiments
    4.8.1 Experimental Definition and Planning
    4.8.2 Performing the Experiment (E1)
    4.8.3 Performing the Replication (R1)
  4.9 Limitations
  4.10 Related Work
  4.11 Summary
5 The Impact of Modularization on Understandability
  5.1 Introduction
  5.2 Existing Empirical Research into Modularizing Conceptual Models
    5.2.1 Planning the Systematic Literature Review
    5.2.2 Performing the Systematic Literature Review
    5.2.3 Reporting the Systematic Literature Review
  5.3 A Framework for Assessing Understandability
    5.3.1 Antagonists of Understanding: Abstraction and Fragmentation
    5.3.2 Toward a Cognitive Framework for Assessing Understandability
    5.3.3 Limitations
  5.4 Evaluation Part I: BPMN
    5.4.1 Experimental Definition and Planning
    5.4.2 Performing the Experiment (E2)
    5.4.3 Performing the Replication (R2)
  5.5 Evaluation Part II: Declare
    5.5.1 Experimental Definition and Planning
    5.5.2 Performing the Experiment (E3)
  5.6 Limitations
  5.7 Discussion
  5.8 Related Work
  5.9 Summary
6 Summary
Appendices
A Tests for Normal Distribution
  A.1 Experiment E1
  A.2 Replication R1
  A.3 Experiment E2
  A.4 Replication R2
  A.5 Experiment E3
B Supplementary Information
  B.1 Process of Thesis Writing
  B.2 Publications
Abbreviations
Bibliography

Chapter 1 Introduction

For decades, conceptual models have been used to facilitate the development of information systems [207, 266] and to support practitioners in the analysis of complex business domains [26]. For instance, strategists create mind maps [109], supply chain managers create decision models [181] and system analysts create conceptual models [48] (cf. [26]). In this sense, over the years numerous conceptual modeling languages and associated modeling tools have been proposed [160], as also discussed in [226]: “Even though hundreds of modelling methods are in existence today, practitioners and researchers are zealously ‘producing’ new modelling methods”. In more recent years, business process models, or process models for short, have gained particular attention [48, 207]. The creation of business process models—depicting the way organizations conduct current or future business processes—is a fundamental prerequisite for organizations to engage in Business Process Management (BPM) initiatives [110]. Therefore, it is not surprising that process modeling was found to be one of the highest ranked reasons for engaging in conceptual modeling [48].
In general, business process models are used, for instance, to support the development of process–aware information systems [60], inter–organizational workflows [293], service–oriented architectures [65] and web services [148]. Typically, business process models employ graphical notations to capture which activities, events and states constitute the underlying business process. In this sense, a business process can be defined as a set of connected activities that collectively realize a certain business goal [205, 281]. Business processes can be found in almost every entrepreneurial context, including the handling of insurance claims and the refunding of travel expenses, but also healthcare applications. Similar to other conceptual models, process models are first and foremost required to be intuitive and easily understandable, especially in project phases that are concerned with requirements documentation and communication [252]. Also, business process vendors and practitioners ranked the usage of process models for understanding business processes as a core benefit [110]. Knowing that many companies design and maintain several thousand process models, often also involving non–expert modelers [214], it seems of central importance that respective models are easy to understand and of appropriate quality. Still, it was observed that these large model collections exhibit serious quality issues, apparently negatively influencing their understandability [273]. For instance, error rates between 10% and 20% were found in industrial model collections [141]. These problems regarding the quality of process models resulted in active research into potential reasons and countermeasures. For instance, in [140] syntactic errors, such as deadlocks, were analyzed in industrial process model collections, aiming to establish a connection between structural model properties and error rates. Other researchers investigated the consistency of activity labels [144] and issues related to secondary notation, e.g., layout, of process models [221]. Also, the visual design of modeling languages [151] was discussed and modeling languages were screened for ontological deficiencies [202]. Recently, a new stream of research emerged, shifting the emphasis from the process model toward the act of creating a process model, denoted as the process of process modeling (PPM). Thereby, researchers identified varying strategies that are adopted during the creation of a model [185, 186] and linked particular strategies to the quality of the resulting process model [31]. Moreover, various techniques for visualizing the PPM [30, 192], analysis techniques using eye movement analysis [183] and theoretical considerations were discussed [233]. So far, investigations regarding the PPM have focused on a subset of the Business Process Model and Notation (BPMN) [167]; recently, however, also other modeling notations, such as change patterns [274], were considered [268, 269]. In this thesis, we contribute to these streams of research and aim at improving business process modeling languages and methods by widening our perspective toward the field of cognitive psychology. (The contributions of this thesis can be clearly attributed to the author. At the same time, the author is indebted to continuous feedback, suggestions and discussions that helped to guide this work, cf. the acknowledgements and Appendix B.2. To express the gratitude for this support, “we” instead of “I” is used for the remainder of the thesis.) In particular, we think that the adoption of concepts from cognitive psychology fosters the systematic improvement of business process modeling languages and respective methods.
In this vein, the central research question of this thesis can be formulated as follows: How can we apply concepts from cognitive psychology for systematically improving business process modeling languages and methods? Clearly, this research question is of a rather general nature. Therefore, we selected two particular problems for which we expected that the adoption of cognitive psychology would be beneficial. More specifically, as illustrated in Figure 1.1, promising concepts from cognitive psychology are first put into the context of BPM (cf. Chapter 3). Then, we shift our focus to the creation, understanding and maintenance of declarative business process models, as shown in the upper branch (cf. Chapter 4). Subsequently, we turn toward the link between the modularization and the understandability of a process model, as shown in the lower branch (cf. Chapter 5). In each of these branches, based upon concepts from cognitive psychology, theories and methods are developed for solving the respective problem at hand. Then, to demonstrate their feasibility, operational support for the developed theories and methods is provided through the implementation of respective software tools. Finally, the implemented tools are employed in empirical studies to validate the theories. At this point, we also would like to mention that large parts of this thesis were already published by the author, albeit in a more condensed form. In particular, concepts from cognitive psychology were described in [298, 299], contributions regarding declarative business process models were published in [296, 300–302] and the link between a process model’s modularization and its understandability was published in [297, 303, 304].

Figure 1.1: Overview of thesis (theory, tools and empirical research). The upper branch (creation, understanding, maintenance; Chapter 4) comprises Test Driven Modeling, the Test Driven Modeling Suite, a case study (Innsbruck, 2011), Experiment E1 (Innsbruck, 2010) and Replication R1 (Ulm, 2011). The lower branch (understanding; Chapter 5) comprises the understandability framework, the Hierarchy Explorer, Experiment E2 (Eindhoven, 2012), Replication R2 (online, 2012) and Experiment E3 (Ulm/Innsbruck, 2012). Both branches build on cognitive psychology (Chapter 3).

The remainder of this thesis is structured as follows. Chapter 2 describes the employed research methodology. Then, Chapter 3 introduces background information regarding declarative process models, cognitive psychology and tools for empirical research. Chapter 4 describes the adoption of cognitive psychology for improving the creation, understanding and maintenance of declarative business process models. Chapter 5, in turn, describes the application of concepts from cognitive psychology for investigating the link between a process model’s modularization and its understandability. Finally, Chapter 6 concludes the thesis with a summary and an outlook.

Chapter 2 Research Methodology

In this chapter, we focus on the research methodology applied in this thesis. As discussed in Chapter 1, the main contribution of this thesis targets the creation, understanding and maintenance of business process models.
In particular, building upon concepts from cognitive psychology, as introduced in Chapter 3, Chapter 4 addresses the creation, understanding and maintenance of declarative business process models. Then, Chapter 5 advances research into the interplay between the modularization and the understanding of imperative and declarative business process models. In the following, we describe the generic research methodology we applied. Generally, this research can be attributed to the design science paradigm, “which seeks to create innovations that define the ideas, practices, technical capabilities and products through which the analysis, design, implementation, management and use of information systems can be effectively and efficiently accomplished” [100]. More specifically, in this research we follow the Design Science Research Methodology (DSRM) approach [173], which includes the following activities:

(1) Problem identification and motivation
(2) Definition of the objectives for a solution
(3) Design and development
(4) Demonstration
(5) Evaluation
(6) Communication

Besides guiding the conducted research, the structure of Chapter 4 and Chapter 5 is aligned with the DSRM approach. In particular, as it is known that solutions to irrelevant problems will not be used, problem relevance is of central importance [199, 283]. Hence, both chapters start with a general discussion of the problem and the definition of objectives for a solution. Subsequently, artifacts solving the discussed problems will be designed, developed, demonstrated and evaluated, focusing on research rigor [100]. The final step, communication, focuses on the publication of results and is achieved through this document. By adopting the DSRM approach, we are following the analytic paradigm and a deductive model by “proposing a set of axioms, developing a theory, deriving results and, if possible, verifying the results with empirical observations” [13]. In this thesis, we put a particular emphasis on the empirical evaluation of the proposed artifacts, i.e., activity (5), evaluation. (According to [282], empirical research can play the role of validation or evaluation: validation refers to artifacts that have not yet been transferred to practice, while evaluation refers to the performance of an artifact after it has been transferred to practice. In this work, we focus on validation. However, DSRM defines evaluation more generally as to “observe and measure how well the artifact supports a solution to the problem” [173]. To avoid confusion, we adopt the definition of DSRM, but refer to validation in the sense of [282].) Hence, in the following we briefly discuss the character of the empirical research conducted in this work. As defined in DSRM, the evaluation aims to measure “how well the artifact supports a solution to the problem” [173]. In this sense, the experiments conducted in this research are designed to test hypotheses rather than to explore a new domain, cf. [13]. According to [13], empirical studies can be of descriptive nature, i.e., such studies identify patterns in the data, but do not examine relationships between variables. If the variation of the dependent variable(s) can be attributed to the variation of the independent variable(s), the study is called correlational. Finally, if the treatment variable(s) is the only plausible cause of variation in the dependent variable(s), a cause–effect study is conducted. In this research, we focus mostly on cause–effect studies, in particular the execution of controlled experiments [61]. Thus, the empirical research will be mainly conducted in vitro, i.e., in the laboratory under controlled conditions [13]. For quality assurance, we make use of checklists for the definition, execution and analysis of experiments, as far as applicable [284, 285]. For pragmatic reasons, in particular limitations with respect to budget and personnel resources, experiments will mostly be conducted with students rather than with experts.
As argued in [61], recruiting students is a common practice for obtaining a large group of subjects. However, researchers are then required to critically analyze the external validity of the experiments, i.e., to what extent the obtained results can be generalized. Regarding the empirical investigations conducted in this thesis, we generally follow a mixed–method approach [39], i.e., we combine qualitative and quantitative research methods in order to achieve method triangulation [112], as detailed in the following.

Qualitative Methods

The roots of qualitative research methods can be found in educational research and other social sciences, where qualitative methods were introduced to study the complexities of humans, e.g., motivation, communication or understanding [239]. In this sense, qualitative methods produce qualitative data that is rich in detail, mostly represented as text and pictures, not numbers [86]. This, in turn, allows the researcher to delve into the complexity of the problem, rather than abstracting it away. Likewise, results obtained through qualitative methods are richer in detail and more informative [222]. In this way, qualitative methods help to answer the question of why and how certain phenomena occur. It is important to note that, even though qualitative data cannot be captured using numbers, qualitative data is not necessarily subjective. Rather, the objectivity or subjectivity of data is an orthogonal aspect [222]. In this research, the application of think–aloud techniques [64] and concepts from grounded theory [35] can be attributed to the adoption of qualitative methods. Therein, the basic idea is to ask participants to think out loud while performing a task. This, in turn, allows the researcher to attain a unique view of the problem solving process [228].

Quantitative Methods

While qualitative research methods focus on the question of how phenomena occur, quantitative research methods are mainly concerned with quantifying relationships or comparing two or more groups [39]. In this sense, quantitative research methods are appropriate when testing the effect of some manipulation or activity, as quantitative data allows for comparisons and statistical methods [286]. Just like qualitative data is not necessarily subjective, quantitative data is not objective by default; rather, objectivity has to be seen as an orthogonal factor [222]. It is important to note that qualitative and quantitative methods do not exist in isolation, but can be connected. For instance, by coding think–aloud protocols, qualitative information in the form of transcripts can be transformed into quantitative data, e.g., the number of codes occurring in the protocol.

Method Triangulation

So far, we have discussed the basic nature of qualitative and quantitative research methods. In this research, we combine both kinds of research methods, thereby adopting a mixed–method approach [39].
The combination of research methods has a distinct tradition in the literature on social sciences and has been described as method triangulation [267]. The triangulation metaphor actually originates from the domain of navigation: given multiple viewpoints, the exact position can be determined with greater accuracy. Likewise, by adopting multiple research methods, phenomena can be investigated from different perspectives, allowing researchers to improve the accuracy of their results [112]. The basic idea is thereby to compensate for the weaknesses of one approach with the strengths of a complementary approach [61]. For instance, quantitative methods allow for investigating cause–effect relations, but are not suited for explaining the underlying reasons, i.e., they neglect the question of how phenomena occur. To compensate for this shortcoming, qualitative approaches can help by providing the necessary data. More figuratively, the adoption of qualitative research methods has also been described to “function as the glue that cements the interpretation of multi method results” [112].

Chapter 3 Background

In this chapter, we introduce the background necessary for the understanding of the remainder of this thesis. In particular, Section 3.1 introduces declarative business process models, Section 3.2 discusses concepts from cognitive psychology in the light of business process modeling, and Section 3.3 describes Cheetah Experimental Platform.

3.1 Declarative Business Process Models

The assessment and improvement of declarative process models is one of the major contributions of this thesis. In this section, we aim at building a general understanding of declarative process models by providing the necessary background information. In particular, Section 3.1.1 provides a general introduction to declarative business process models, whereas Section 3.1.2 discusses the semantics of sub–processes in declarative process models. Finally, Section 3.1.3 and Section 3.1.4 discuss peculiarities of modularized declarative process models.

3.1.1 Characteristics of Declarative Business Process Models

There has been a long tradition of modeling business processes in an imperative way. Process modeling languages supporting this paradigm, like BPMN [167], EPC [218] and UML Activity Diagrams [166], are widely used. Recently, declarative approaches have received increasing interest and suggest a fundamentally different way of describing business processes [175]. While imperative models specify exactly how things have to be done, declarative approaches rather focus on what is to be achieved. In other words, instead of explicitly describing the control flow to be followed when executing a process instance, e.g., as done in BPMN, declarative approaches rather focus on the conditions to be achieved when executing a process instance. To discuss the characteristics of declarative process modeling notations, we will particularly focus on the declarative process modeling language Declare [175] (formerly known as ConDec, see http://www.win.tue.nl/declare/2011/11/declare-renaming) in the remainder of this thesis. Likewise, when not explicitly indicated otherwise, we use the term declarative process model synonymously with a Declare–based declarative process model. The declarative process modeling notation Declare focuses on the logic that governs the interplay of actions in the process by describing the activities that can be performed as well as constraints prohibiting undesired behavior. Thereby, constraints can be classified along two dimensions.
First, constraints may be classified as existence constraints, relation constraints or negation constraints [253]. Existence constraints specify how often an activity must be executed within a particular process instance. Relation constraints, in turn, restrict the relation between activities. Finally, negation constraints define negated relations between activities, i.e., they can be seen as negated relation constraints. Second, and orthogonally, constraints can be classified as execution constraints and completion constraints (also referred to as termination constraints, cf. [304]). Execution constraints, on the one hand, restrict the execution of activities, e.g., an activity can be executed at most once. Completion constraints, on the other hand, affect the completion of process instances and specify when process completion is possible; for instance, an activity must be executed at least once before the process can be completed. Here, it is worthwhile to note that declarative process instances must be completed explicitly, i.e., the end user must decide when a process instance, for which all completion constraints are satisfied, should be completed [205]. Most constraints focus either on execution or on completion semantics; however, some constraints combine execution and completion semantics (e.g., the cardinality constraint [175], cf. Table 3.1). To give an overview of typical constraints, Table 3.1 shows examples for each category. An overview of all constraints defined in Declare can be found in [253].

Table 3.1: Definition of constraints (E: affects the execution of activities; C: affects the completion of process instances)

Existence constraints:
  cardinality(a,m,n): a must occur at least m times and at most n times (E, C)
  init(a): a must be the first activity executed in every process instance (E)
  last(a): a must be the last activity executed in every process instance (C)

Relation constraints:
  precedence(a,b): b must be preceded by a (not necessarily directly preceded) (E)
  response(a,b): if a is executed, b must be executed afterwards (not necessarily directly afterwards) (C)
  succession(a,b): combines precedence(a,b) and response(a,b) (E, C)
  chain response(a,b): if a is executed, b must be executed directly afterwards (E, C)
  chain precedence(a,b): before each execution of b, a must be executed directly before (E)
  chain succession(a,b): combines chain precedence(a,b) and chain response(a,b) (E, C)
  coexistence(a,b): if a is executed, b must be executed and vice–versa (C)

Negation constraints:
  neg response(a,b): if a is executed, b must not be executed afterwards (E)
  neg coexistence(a,b): a and b cannot co–occur in any process instance (E)

Note: In [175], the cardinality constraint is approached by three constraints: existence, i.e., an activity must be executed at least n times, exactly, i.e., an activity must be executed exactly n times, and absence, i.e., an activity must be executed at most n−1 times. In this thesis, for improving readability, we merged these three constraints into the cardinality constraint.

To illustrate the concept of declarative processes, a process model (PMM) specified in Declare [175] is shown in Figure 3.1a. It contains activities A to F as well as constraints C1 and C2.
C1 prescribes that A must be executed at least once (i.e., C1 restricts the completion of process instances). C2 specifies that E can only be executed if C has been executed at some point in time before (i.e., C2 imposes restrictions on the execution of activity E). In Figure 3.1b, an example of a process instance (PIM) illustrates the semantics of PMM. Therein, we make use of events to describe relevant changes during process execution, e.g., the instantiation of the process instance or the start and completion of activities. After process instantiation (event e1), activities A, B, C, D and F can be executed. E, however, cannot be executed, as C2 specifies that C must have been executed before (cf. the grey bar below “E”). Furthermore, the process instance cannot be completed, as C1 is not satisfied, i.e., A has not been executed at least once (cf. the grey area below “Completion”). The subsequent execution of B (in e2 B is started, in e3 B is completed) does not cause any changes, as B is not involved in any constraint. However, after A is executed (e4, e5), C1 is satisfied, i.e., A has been executed at least once, and thus PIM can be completed—after e5 the box below “Completion” is white. Then, C is executed (e6, e7), satisfying C2 and consequently allowing E to be executed. Finally, the execution of E (e8, e9) does not affect any constraint, thus no changes with respect to constraint satisfaction can be observed. As all completion constraints are satisfied, PIM can be completed. Please note that declarative process instances have to be completed explicitly, i.e., the end user must decide when to complete the process instance (e10). Completion constraints thereby specify when completion is allowed, i.e., PIM could have been completed at any point in time after e5. As illustrated in Figure 3.1b, a process instance can be specified through a list of events. In the following, we will denote this list as execution trace, e.g., for PIM: <e1, e2, e3, ..., e10>. Considering process model PMM from Figure 3.1a, execution traces <A>, <C, A, E> and <B, A> are considered valid, because they satisfy constraints C1 and C2. Execution trace <B, C>, in turn, is considered invalid, as it violates constraint C1 (A is not executed). Likewise, execution trace <A, E, B> is invalid, since it violates constraint C2 (E is executed without prior execution of C).

Figure 3.1: Executing a declarative process model, adapted from [304]; (a) process model PMM, (b) process instance PIM with execution trace <PIM started, B started, B completed, A started, A completed, C started, C completed, E started, E completed, PIM completed>.
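To make the interplay of execution and completion constraints more concrete, the following minimal sketch checks completed execution traces of PMM against C1 and C2. It is an illustrative assumption for this example only; the list encoding of traces and the function names are not part of Declare or of the tool support discussed later in this thesis.

# Minimal sketch (illustrative assumption): checking completed execution traces of
# PMM against C1 (cardinality) and C2 (precedence).

def satisfies_c1(trace):
    # C1, completion constraint: A must be executed at least once.
    return trace.count("A") >= 1

def satisfies_c2(trace):
    # C2, execution constraint: E may only be executed if C was executed before.
    c_seen = False
    for activity in trace:
        if activity == "E" and not c_seen:
            return False  # E executed without a prior execution of C
        if activity == "C":
            c_seen = True
    return True

def is_valid(trace):
    # A completed trace is valid if all execution and completion constraints hold.
    return satisfies_c1(trace) and satisfies_c2(trace)

print(is_valid(["A"]))            # True:  <A>
print(is_valid(["C", "A", "E"]))  # True:  <C, A, E>
print(is_valid(["B", "A"]))       # True:  <B, A>
print(is_valid(["B", "C"]))       # False: violates C1, A is never executed
print(is_valid(["A", "E", "B"]))  # False: violates C2, E before C

Running the sketch reproduces the classification given above: the first three traces are accepted, whereas <B, C> and <A, E, B> are rejected.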
3.1.2 Semantics of Declarative Sub–Processes

So far, we have introduced declarative process models in general and Declare in particular. This section, in turn, aims at establishing an understanding of the semantics of sub–processes in a declarative model. In general, a sub–process is introduced into a process model via a complex activity, which refers to a process model. When the complex activity is executed, the referred process model, i.e., the sub–process, is instantiated. Thereby, sub–processes are viewed as separate process instances, i.e., when a complex activity is started, a new instance of the sub–process the complex activity refers to is created (cf. [167, 177]). The parent process, however, has no information about the internals of the sub–process, i.e., the sub–process is executed in isolation. In this sense, according to [51, 197, 198], we view sub–processes from an integrated perspective, i.e., the sub–process is seen as a black box. Interaction with the parent process is only done via the sub–process’ life–cycle (we do not take into account the exchange of input and output data here, as we focus on control flow behavior only). Thereby, the life–cycle state of the complex activity reflects the state of the sub–process [177], e.g., when the sub–process is in state completed, the complex activity must be in state completed as well. Considering this, it is essential that sub–processes are executed in isolation, as isolation forbids that constraints are specified between activities included in different (sub–)processes. In other words, in a hierarchical declarative process model with several layers of hierarchy, the constraints of a process model can neither directly influence the control flow of any parent process, nor directly influence the control flow of any sub–process on the same layer or a layer below. Please note that control flow may still be indirectly influenced by restricting the execution of a sub–process, thereby restricting the execution of the activities contained therein.

To illustrate these concepts, consider the modularized process model PMM in Figure 3.2a. It consists of activity A, which can be executed arbitrarily often. Activity B, in turn, can be executed at most three times (cf. constraint C1). B refers to process model PMB, which contains activities C and D. Here, constraint C2 prescribes that C must be executed at least once whenever process model PMB is executed. Furthermore, C and D are connected by the precedence constraint C3, i.e., D can only be executed if C was executed before. Figure 3.2b shows an example of an execution of PMM. On the left, a timeline lists all events that occur during process execution. To the right, the enablement of activities and whether a process instance can be completed is illustrated.
Whenever the area below an activity or process instance is colored white, it indicates that the activity is currently enabled or that the process instance can be completed, respectively. The timeline is to be interpreted as follows: By instantiating PMM (e1), activities A and B become enabled, as no constraints restrict their execution. C and D cannot be executed, as they are confined to PMB and no instance of PMB is running yet. The subsequent execution of A (e2, e3) does not change activity enablement, as A is not related to any constraint. Then, the start of B (e4) causes the instantiation of PMB (PIB, e5). Hence, C becomes enabled, as it can be executed within PIB. Still, D is not enabled yet, as constraint C3 is not satisfied. After C is executed (e6, e7), the precedence constraint is satisfied, therefore also D becomes enabled. In addition, constraint C2 is satisfied, allowing process instance PIB to complete. After the execution of D (e8, e9), the user decides to complete PIB (e10), causing C and D to be no longer executable and triggering the completion of B (e11). Still, A and B are enabled, as they can be executed within process instance PIM. Finally, after PIM is completed by the end user through explicit completion (e12), no activity is enabled anymore. As described in Section 3.1.1, the completion of a declarative process instance is restricted by the use of completion constraints. In particular, a process instance is allowed to be completed by the end user if all completion constraints are satisfied. This, in turn, implies that a process instance may be completed without executing any activity, if no completion constraints are present. In the context of process model PMM in Figure 3.2, execution trace <A, B> is valid, since PMM does not define completion constraints. At this point, we also would like to emphasize that activities are always executed within the context of the respective (sub–)process. In this sense, execution trace <A, C> is invalid: C can only be executed if there is an instance of PMB currently being executed.

Figure 3.2: Executing a hierarchical declarative process model, adapted from [304]; (a) process models PMM and PMB, (b) execution of PMM with execution trace of PIM <PIM started, A started, A completed, B started, B completed, PIM completed> and execution trace of PIB <PIB started, C started, C completed, D started, D completed, PIB completed>.
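The isolation of sub–process instances can also be phrased operationally. The following minimal sketch checks a trace of PMM in which each execution of the complex activity B carries the trace of its own PMB instance; the nested trace encoding and all names are illustrative assumptions rather than the formalization used in this thesis.

# Minimal sketch (illustrative assumption): trace validity for the hierarchical model
# of Figure 3.2. A top-level entry is either ("A",) or ("B", <sub-trace of PMB>);
# every PMB instance is checked in isolation, mirroring the black-box view above.

def valid_pmb(subtrace):
    # C2, completion constraint of PMB: C must be executed at least once.
    if subtrace.count("C") < 1:
        return False
    # C3, precedence: D may only be executed after a prior execution of C.
    c_seen = False
    for activity in subtrace:
        if activity == "D" and not c_seen:
            return False
        if activity == "C":
            c_seen = True
    return True

def valid_pmm(trace):
    # Only A and B are activities of PMM; C1 limits B to at most three executions.
    b_count = 0
    for entry in trace:
        if entry[0] == "A":
            continue
        elif entry[0] == "B":
            b_count += 1
            if not valid_pmb(entry[1]):  # sub-process instance validated in isolation
                return False
        else:
            return False  # e.g. C at the top level: no running PMB instance exists
    return b_count <= 3

print(valid_pmm([("A",), ("B", ["C", "D"])]))  # True:  <A, B> with sub-trace <C, D>
print(valid_pmm([("A",), ("C",)]))             # False: C outside any instance of PMB

In this encoding, <A, C> is rejected immediately, since C is not an activity of PMM and no running instance of PMB provides an execution context for it.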
3.1.3 Enhanced Expressiveness

For imperative process models, hierarchical decomposition is viewed as a structural measure that may impact model understandability [297], but does not influence semantics. In declarative process models, however, hierarchy also has implications on semantics. More precisely, hierarchy enhances the expressiveness of a declarative modeling language. The key observation is that by specifying constraints that refer to complex activities, it is possible to restrict the life–cycle of a sub–process. A constraint that refers to a complex activity thereby not only influences the complex activity, but also all activities contained therein. This, in turn, allows for the specification of constraints that apply in a certain context only. Consider, for instance, activity C in Figure 3.2. In the context of process model PMB, C is mandatory, i.e., for any instance of PMB, C must be executed. In the context of process model PMM, however, C is optional, as it is only referred to through complex activity B, which is optional—thus also making the execution of C optional. To illustrate enhanced expressiveness, consider models PMM and PMC in Figure 3.3, which solely use constraints described in Table 3.1. The chain precedence constraint between C and D specifies that for each execution of D, C, and therefore PMC, has to be executed directly before. When executing PMC, in turn, A has to be executed exactly once and B has to be executed exactly twice (in any order). Hence, the constraint between C and D actually refers to a set of activities, i.e., A and B: for each execution of D, A has to be executed exactly once and B has to be executed exactly twice. In other words, the constraints on A and B are only valid in the context of PMC. Such behavior cannot be modeled without hierarchy, using the same set of constraints.

Figure 3.3: Example of enhanced expressiveness, adapted from [304]; process model PMM contains complex activity C (referring to process model PMC, in which A must be executed exactly once and B exactly twice) and a chain precedence constraint between C and D.

3.1.4 Impact on Adaptation

Constructing hierarchical models supports top–down analysis, i.e., creating the top–level model first and further refining complex activities thereafter. While this seems like a natural way of dealing with complexity, in some cases it is desirable to transform a flat model into a hierarchical one. In the following, we will argue why refactoring [273], i.e., changing hierarchical structures in a control–flow preserving way, is only possible under certain conditions for declarative process models.
Refactoring requires that any hierarchical model can be translated into a model without hierarchy but with the same control flow behavior (and vice versa). As discussed, expressiveness is enhanced by hierarchy. In other words, there exists control flow behavior that can be expressed in a hierarchical model, but not in a model without hierarchy—cf. Figure 3.3 for an example. Hence, hierarchical models that make use of the enhanced expressiveness cannot be expressed as non–modularized models, i.e., they cannot be refactored.

3.2 Cognitive Psychology

So far, we have introduced declarative business process models as well as discussed their semantics and the usage of sub–processes. In this section, we focus on cognitive psychology, which constitutes the next basic building block of this thesis. Cognitive psychology has been investigating internal mental processes, e.g., the way humans perceive, memorize or handle decision making, since the 1950s [22]. In the following, we will introduce basic concepts from cognitive psychology and put them in the context of business process modeling. In particular, Section 3.2.1 introduces the concept of search, Section 3.2.2 discusses recognition and, finally, Section 3.2.3 is concerned with inference.

3.2.1 Search

A vast body of research in the area of visual search has been conducted in the last decades, cf. [287]. Basically, visual search, or search for short, deals with the identification of a target item among a set of distractor items [287]. The target item thereby refers to the item to be searched for, whereas distractor items impede the identification of the target item. In the context of process models, search refers to the task of identifying a single modeling element, e.g., an activity or a start event, in a process model. As indicated in [128], multiple attributes of the same element can help to improve the search process. Accordingly, for instance, a blue rectangle (information: blue and rectangle) can be identified more quickly than a rectangle (information: rectangle). A systematic investigation into the impact of visual properties like size, brightness or color can be found in [151]. For this work, however, it is sufficient to know that the human perceptual system is capable of efficiently locating a target item among a set of distractor items. An example illustrating the impact of the visual representation on search can be found in Figure 3.4. Model A in Figure 3.4a and Model B in Figure 3.4b are identical except for the coloring of activity F. Hence, the models can be assumed to be information equivalent [128], i.e., all information relevant for the business process can be obtained from Model A as well as inferred from Model B. However, with respect to search, the models are not computationally equivalent [128], i.e., search is not equally efficient in both models. In particular, when searching for activity F in Model B, an additional visual cue, namely the color grey, is available for search, allowing for quicker identification. In practice, such highlighting may be used to help the reader identify which activities are assigned to a certain role.

Figure 3.4: Impact of visual notation on search, adapted from [299]; (a) Model A, (b) Model B, identical except for the grey highlighting of activity F.

3.2.2 Recognition

Search, as introduced, allows for the identification of a single modeling element in isolation.
The human perceptual system, however, also provides another mechanism for identifying higher–level, i.e., more complex, objects. In particular, the identification of patterns plays a central role and is referred to as the process of recognition [128]. For recognition, two aspects were identified as primary influence factors: First, the human ability to recognize information is highly sensitive to the exact representation. Second, the perceptual system has to be trained specifically for the recognition of patterns.

Representation of Information

The recognition of patterns highly depends on the exact form in which information is represented [128]. More specifically, information may be represented explicitly, enabling solutions to be more readily “read–off” [217]. In contrast, information can also be represented implicitly, requiring the reader to investigate the model stepwise, as direct perceptual recognition of patterns is not possible anymore. Thereby, explicit and implicit are not dichotomous—rather, information can be classified along a spectrum of explicitness/implicitness. For recognition, information must be available in a highly explicit form. To illustrate the concept of recognition, consider Model A in Figure 3.5a and Model B in Figure 3.5b. The process models are information equivalent and differ only with respect to the layout of sequence flows. Apparently, for Model A the sequence flows are laid out in a clear and easily readable way, while Model B exhibits edge crossings which obscure reading and thereby make information less explicit. If an experienced process modeler is asked to determine whether activity B and activity C are mutually exclusive in Model A, a quick look at the process model will most likely be sufficient for answering this question, as the pattern “B and C directly connected to XOR split” is directly recognizable. If the same question was asked for Model B, even an experienced process modeler would have to trace the sequence flows to find out that activities B and C are indeed mutually exclusive. Hence, the pattern is not explicit enough and thus not accessible to recognition.

Figure 3.5: Recognition of mutual exclusion pattern, adapted from [299]; (a) Model A, (b) Model B with crossing sequence flows.

Schemata for Recognition

Besides the explicit representation of information, recognition depends on whether the perceptual system has been trained properly, i.e., whether appropriate schemata [115, 168, 236, 238]—also called productions [128]—are available for the proper identification of patterns. Put differently, a novice who has not yet acquired appropriate schemata cannot rely on recognition, while experts can fall back on a variety of schemata that were acquired over the years by working with process models. For instance, an experienced modeler will immediately recognize that activities B and C of Model A in Figure 3.5 are mutually exclusive, because they are directly connected to a XOR gateway. A novice modeler, in turn, due to the lack of suitable schemata, has to analyze the model in depth and make use of inference, as detailed in the following.

3.2.3 Inference

So far, we have introduced search and recognition—mechanisms that allow for extracting rather simply–structured, local information. Most process models, however, go well beyond the complexity that can be handled by search and recognition. Here, the human brain as a “truly generic problem solver” [248] comes into play. A central component of the human cognitive system is working memory, which represents a construct that can maintain and manipulate a limited amount of information for goal–directed behavior [8, 37]. Basically, working memory can be conceptualized as the activated part of long–term memory [249]. Long–term memory, unlike working memory, has a theoretically unlimited capacity and stores the knowledge base of a person, e.g., knowledge about facts, events, rules and procedures, over a long period of time. It is important to emphasize that the functioning of working memory and long–term memory is tightly interwoven. For instance, when reading a text, knowledge about language is a prerequisite for the understanding of the text. In this context, knowledge about language is stored in long–term memory, while the actual processing of the textual information takes place in working memory. Regarding the performance of such tasks, strong empirical evidence was found that working memory plays an essential role for the outcome, e.g., for language comprehension [114], logic learning [125], fluid intelligence [34] and the integration of preexisting domain knowledge [98]. However, the capacity of working memory is assumed to be severely limited [7]: studies report a capacity of 4 items [21, 37], up to 7±2 items [8, 147]. In addition, information held in working memory decays after 18–30 seconds if not rehearsed [248]. To illustrate how severe these limitations are, consider the sequence A–G–K–O–M–L–J. The average human mind is just capable of keeping this sequence in working memory, and, after 18–30 seconds, the sequence will be forgotten. Thereby, the amount of working memory currently in use is referred to as mental effort. To measure mental effort, various techniques, such as rating scales, pupillary responses or heart–rate variability, are available [168]. Especially the usage of rating scales, i.e., the self–rating of mental effort, was shown to reliably measure mental effort and is thus widely adopted [90, 168]. Furthermore, this kind of measurement can easily be applied, e.g., by using 7–point rating scales. For instance, in [135] mental effort was assessed using a 7–point rating scale, ranging from (1) very easy to (7) very hard, for the question “How difficult was it for you to learn about lightning from the presentation you just saw?”. The importance of working memory was recognized and led to the development and establishment of Cognitive Load Theory (CLT), which is meanwhile widespread and empirically validated in numerous studies [9, 168, 236]. Theories of CLT revolve around the limitations of working memory and how these limitations can be overcome—especially the latter question is of special interest for the understandability of process models. Hence, subsequently, we discuss four phenomena that are known to influence mental effort. First, we discuss the chunking of information. Second, we show how process models can support inference through computational offloading. Third, we introduce external memory, which allows for freeing resources in working memory. Finally, we discuss how the split–attention effect increases mental effort.

Chunking and Schema Acquisition

How are humans, given the described limitations of working memory, able to recall a sentence that contains more than 7±2 characters? The answer to this question can be found in the way the human mind organizes information.
Information is believed to be stored in interconnected chunks rather than in isolation [91, 238]. In this way, several items are bound together to “form a unified whole” [85]. The process of aggregating information to a chunk, in turn, is referred to as “chunking” [91]. Coming back to our example, this explanation helps to shed light on the question of how humans can recall an entire sentence. Instead of remembering each character in isolation, characters are grouped into chunks of information. Imagine a person trying to remember an entire sentence. One might use one chunk per word, thereby aggregating several characters into a chunk which can then be stored in one slot, effectively reducing mental effort, cf. [38, 91, 162]. When several chunks of information are integrated in long–term memory, they form schemata, i.e., well–integrated chunks of knowledge regarding the world, events, people or actions [7, 12]. These schemata, in turn, help to guide comprehension and help to organize information more efficiently [111]. Consider, for instance, the situation when a person remembers a word, such as glacier. As argued above, the person will probably remember the entire word as a chunk instead of remembering each single character. Thereby, the knowledge of what constitutes a glacier and how the word is spelled is regarded as a schema. The actual information stored in working memory is then called a chunk. In other words, chunks are always based upon schemata, which, in turn, guide the construction of chunks. The importance and usage of chunking is most obvious in investigations that look into the differences between novices and experts. For instance, it was found that chess players store the positions of tokens in chunks [28, 29]. It was also found that the chunk size of experts was by far larger than the chunk size of novices, providing a potential explanation for the superiority of experts. Similar results regarding chunking were also found in different domains, e.g., how football coaches remember moves [79] or the way physicians remember diagnostic interviews [36]. To illustrate how chunking potentially influences the understandability of business process models, an example is provided in Figure 3.6. An inexperienced reader may, as shown on the left hand side, use three chunks to store the process fragment: one for each XOR gateway and one for activity A. In contrast, an expert may, as shown on the right hand side, recognize this process fragment as a pattern for making activity A optional. In other words, in the long–term memory a schema for optional activities is present, thereby allowing to store the entire process fragment in one slot in working memory.
Figure 3.6: Chunking of an optional activity [299]
Computational Offloading
In contrast to chunking, which is highly dependent on the internal representation of information, i.e., how the reader organizes information in the mind, computational offloading highly depends on the exact external presentation of the business process model, i.e., the visualization of the process model. In particular, computational offloading “refers to the extent to which differential external representations reduce the amount of cognitive effort required to solve information equivalent problems” [217]. In other words, an external representation may provide features that help the reader to extract information.
Instead of computing and inferring the respective information in the modeler’s mind, information can, as in the process of recognition, more or less be “read–off” [217]. To put computational offloading in the context of business process modeling, an example illustrating the described phenomenon is shown in Figure 3.7. The illustrated process models are information equivalent, i.e., the same execution traces can be produced based on Model A and Model B. However, Model A was modeled in BPMN, whereas for Model B the declarative modeling language Declare [175] was used. Consider the task of listing all execution traces, i.e., process instances, that are supported by the process model. A reader familiar with BPMN will probably infer within a few seconds that Model A supports execution traces <A, B, D> and <A, C, D>. Such information is easy to extract, as BPMN provides an explicit concept for the representation of execution sequences, namely sequence flows. Thus, for identifying all possible execution traces, the reader simply follows the process model’s sequence flows—the computation of all execution traces is offloaded to the process model. In contrast, for Model B, no explicit representation of sequences is present. Rather, constraints define the interplay of actions and do not necessarily specify sequential behavior. Thus, the reader cannot directly read off execution traces, but has to interpret the constraints in the mind to infer execution traces. In other words, Model B, while information equivalent to Model A, does not provide computational offloading for extracting execution traces. Consequently, even for a reader experienced with Declare, listing all supported traces is far from trivial.
Figure 3.7: Computational offloading (a: Model A, b: Model B) [299]
External Memory
Next, we would like to introduce another mechanism that is known to reduce mental effort, i.e., the number of working memory slots in use. In particular, we discuss the concept of external memory in the context of business process models. External memory refers to any information storage outside the human cognitive system, e.g., pencil and paper or a blackboard [128, 217, 236, 248]. Information that is taken from working memory and stored in an external memory is then referred to as a cognitive trace. In the context of a diagram, a cognitive trace would be, e.g., to mark, update or highlight information [217]. Likewise, in the context of process modeling, the model itself may serve as external memory. When interpreting a process model, marking an activity as executed while checking whether an execution trace is supported can be seen as leaving a cognitive trace. For the illustration of external memory and cognitive traces, consider the process model shown in Figure 3.8. Assume the reader wants to verify whether execution trace <A, D, E, F, G, H> is supported by the process model. So far, as indicated by the bold path and the position of the token, the reader has “mentally executed” the activities A, D, E, F and G. Without the aid of external memory, the reader will have to keep in working memory which activities have been executed, i.e., sub–trace <A, D, E, F, G>, as well as the position of the token within the process instance. By writing down the activities executed so far, i.e., by transferring this information from working memory to external memory (e.g., a piece of paper), load on working memory is reduced.
In this particular case, the process model directly allows storing the “mental token”—either by simply putting a finger on the respective part of the process model or by marking the location of the token, as shown in Figure 3.8.
Figure 3.8: External memory [299]
Split–Attention Effect
So far, we have focused on phenomena that are known to decrease mental effort. In the following, we look into the split–attention effect [115, 153, 237], which is known to increase mental effort. Basically, the split–attention effect occurs as soon as information from different sources has to be integrated. For instance, when studying material that consists of separate text and diagrams, the learner has to keep segments of the text in working memory while searching for the matching diagrammatic entity [116]. Thereby, two basic effects are distinguished. First, the reader has to switch attention between different information sources, e.g., text and diagram. Second, the reader has to integrate different information sources. These two phenomena in combination are then known to increase mental effort and are referred to as the split–attention effect. In the context of business process modeling, the split–attention effect can presumably be observed in any process model that contains sub–processes. As soon as the reader has to relate to more than one sub–process, the split–attention effect occurs. Consider, for instance, the process model shown in Figure 3.9. It consists of a top–level process containing (complex) activities A, B and C. Complex activity A, in turn, refers to a sub–process containing activities D, E and F. Likewise, complex activity C refers to a sub–process which contains activities H, I and J. When, for instance, determining whether activity E is always followed by activity B, one might use the following strategy. First, activity E is located, i.e., focusing attention on the respective sub–process. Second, activity B is located, i.e., switching attention to the top–level process. Third, the relationship between activity E and B is established by understanding that the execution of activity E is always directly followed by the execution of activity B, i.e., integrating the execution semantics of the top–level process with the execution semantics of the sub–process.
Figure 3.9: Split–attention effect [297]
3.3 Cheetah Experimental Platform
As discussed in Chapter 2, empirical research is one of the central research methodologies applied in this thesis. To particularly support the efficient execution of empirical research, Cheetah Experimental Platform (CEP) [188] was implemented. CEP is a joint effort with Jakob Pinggera and is freely available from http://www.cheetahplatform.org. In the following, we describe how CEP supports efficient empirical research through experimental workflows. The motivation for the development of CEP arose from empirical research conducted with Alaska Simulator [187, 270, 271, 277], which is freely available from http://www.alaskasimulator.org. In the course of several experiments (cf. [118, 182, 219, 272, 276, 295]), recurring problems hampering the efficient execution of controlled experiments could be observed. First, for experiments that consisted of several steps, e.g., filling out a survey and subsequently performing two modeling tasks, it could not be prevented that subjects disregarded the experimental setup. For instance, subjects partially forgot to fill out the survey or to conduct one of the modeling tasks, leading to missing data.
Second, we could observe that certain components, such as a demographic survey, were required throughout several experiments, but could not be reused efficiently. Third, data collected during experiments was lost due to, e.g., interrupted network connections. To tackle these problems, an experimental design is operationalized in CEP by the usage of an experimental workflow. In other words, CEP provides a simple implementation of a workflow engine, which supports the execution of experimental workflow activities that were specifically tailored toward the needs of an empirical study. An example of an experimental workflow operationalizing a single factor experiment [286] is shown in Figure 3.10. In this experimental setup, two modeling notations, i.e., Notation A and Notation B, should be compared. Furthermore, demographic data as well as personal opinions about the modeling notations should be collected. When conducting the experiment, subjects are instructed to download a preconfigured version of CEP and to enter a code. By randomly assigning codes to subjects, randomization can be achieved. Likewise, by distributing code 1 to one half of the subjects and distributing code 2 to the other half, a balanced setup can be achieved. After having downloaded and started the preconfigured version of CEP, the subject is automatically prompted to enter a code (cf. first activity in Figure 3.10). Thereby, CEP ensures that only valid codes, as specified in the experimental workflow, are permitted. After having entered a valid code, the demographic survey is automatically presented to the subject. As soon as the subject finishes the survey, CEP will automatically open an editor supporting Notation A or Notation B—depending on the code the subject has entered. After finishing the modeling task, CEP will automatically show a feedback form. Subsequently, all collected data is automatically uploaded to a database. For the case that no Internet connection is available, the subject is prompted to send the data via email.
Figure 3.10: Experimental workflow: an example
By operationalizing an experimental design using an experimental workflow, the problems described above are resolved, as follows. First, CEP automatically guides subjects through the experimental workflow. Depending on the code the subject has entered, the respective branches in the experimental workflow are executed. In this way, CEP ensures that none of the activities are left out. Second, experimental workflows are composed of reusable components, e.g., a survey component, a modeling editor or a feedback form. In this way, experimental workflows can be assembled with moderate programming effort. Third, data collected during the execution of the experimental workflow is immediately stored locally and transferred to a database after the experiment is finished. In case no Internet connection is available, the subject is prompted to send the data via email. Thus, it is ensured that data collected during experiments is not lost. In this thesis, CEP provides the basis for empirical investigations.
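To give an impression of how such an experimental workflow might look when operationalized in code, the following sketch assembles the single factor setup of Figure 3.10 from simple workflow activities. All class and method names are hypothetical and merely illustrate the idea; CEP’s actual API may differ.

import java.util.List;

// Hypothetical sketch of the experimental workflow from Figure 3.10: a subject enters a
// code, fills in a demographic survey, models in Notation A or Notation B depending on
// the code, gives feedback, and the collected data is uploaded afterwards.
class SingleFactorExperiment {

    record Subject(int code) { }

    // A workflow activity is anything that can be executed for a subject.
    interface WorkflowActivity { void execute(Subject subject); }

    static WorkflowActivity step(String name) {
        return subject -> System.out.println("[" + name + "] subject with code " + subject.code());
    }

    public static void main(String[] args) {
        Subject subject = new Subject(1); // code 1 -> Notation A, code 2 -> Notation B

        List<WorkflowActivity> workflow = List.of(
            step("enter code"),
            step("demographic survey"),
            subject.code() == 1 ? step("modeling task: Notation A")
                                : step("modeling task: Notation B"),
            step("feedback form"),
            step("upload data (or send via email if offline)"));

        workflow.forEach(activity -> activity.execute(subject));
    }
}

Because every step is encapsulated as an activity, randomization and balancing reduce to handing out different codes, and none of the steps can be skipped by the subject.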
In particular, concepts described in Chapter 4 and Chapter 5 are accessible as experimental workflow activities, i.e., can be used arbitrarily in experimental workflows for empirical investigations. The technical perspective of this integration is sketched in Figure 3.11. Generic functions, such as determining the duration of an experimental workflow activity, are provided through Abstract Activity. Subclasses extend Abstract Activity by contributing respective specific functionality. For instance, CEP provides the ready–to–use activities Survey Activity for administering surveys, Messagebox Activity for displaying messages as well as Feedback Activity for leaving feedback. Likewise, the concepts described in Chapter 4 are accessible through classes Test Driven Modeling Activity and Declarative Modeling Activity. Concepts from Chapter 5 are accessible through Hierarchy Explorer Activity. For the time being, it is sufficient to know that these activities can be arbitrarily used in experimental workflows; details are provided in Chapter 4 and Chapter 5, respectively.
Figure 3.11: Experimental workflow activities: examples
Up to now, we have described the purpose of CEP and how it provides the basis for empirical research conducted in this thesis. In the following, we would like to give an overview of the usage of CEP in general. In the meantime, CEP has been used in the course of several empirical investigations. For instance, CEP was used for the investigation of declarative process modeling notations [95, 180, 296, 300, 301, 303], the process of process modeling [30, 31, 183, 185, 186, 192], Concurrent Task Trees [124], Literate Process Modeling [184], collaborative process modeling [75–77], Change Patterns [268, 269] as well as general investigations of process modeling endeavours [74, 94, 190, 278].
Chapter 4 Test Driven Modeling
So far, we have focused on the pillars this thesis is built upon, i.e., we have introduced research methodologies and declarative business process modeling, discussed concepts from cognitive psychology and described Cheetah Experimental Platform. In this chapter, we turn toward the application of these concepts with the purpose of improving the creation, understanding and maintenance of declarative process models. As discussed in Chapter 2, in this work we follow the Design Science Research Methodology (DSRM) approach [173]. The DSRM methodology comprises six activities that have not only guided our research, but were also used to structure this chapter. In particular, DSRM specifies the following activities:
(1) Problem identification and motivation
(2) Define the objectives for a solution
(3) Design and development
(4) Demonstration
(5) Evaluation
(6) Communication
Starting with problem identification and motivation (1), a general introduction to the problem is provided in Section 4.1. To clarify the problem, terminology used throughout this chapter is introduced in Section 4.2, while Section 4.3 provides a running example. Addressing the definition of objectives for a solution (2), we deepen the discussion about benefits and drawbacks of declarative modeling and use cognitive psychology to systematically assess declarative process modeling and identify potential shortcomings in Section 4.4.
Based upon these findings, we propose respective countermeasures in Section 4.5, cf. design and development (3). To demonstrate and operationalize the proposed approach, a prototypical implementation is described in Section 4.6, addressing demonstration (4). The implementation is also used in the following to empirically validate the proposed concepts in Section 4.7 and Section 4.8, i.e., addressing activity evaluation (5). Limitations of this work are described in Section 4.9, whereas connections to related work are established in Section 4.10. Finally, this chapter is concluded with a summary of results in Section 4.11. Regarding the DSRM approach, communication (6) is orthogonal to these sections, as it is the inherent purpose of this document to communicate results.
4.1 Introduction
In today’s prevalent dynamic business environment, the economic success of an enterprise depends on its ability to react to various changes like shifts in customers’ attitudes or the introduction of new regulations and exceptional circumstances [129, 194]. Likewise, in health care applications, flexibility is a condition sine qua non for the adoption of computerized systems [196, 240]. Process–Aware Information Systems (PAISs) offer a promising perspective on shaping this capability, resulting in growing interest to align information systems in a process–oriented way [60, 281]. Yet, a critical success factor in applying PAISs is the possibility of flexibly dealing with process changes [129]. To address the need for flexible PAISs, competing paradigms enabling process changes and process flexibility were developed, e.g., adaptive processes [203, 280], case handling [257], declarative processes [176], data driven processes [158] and late binding and modeling [216] (for an overview see [274]). All these approaches relax the strict separation of build–time (i.e., modeling) and run–time (i.e., execution), which is typical for plan–driven approaches as realized in traditional Workflow Management Systems. Depending on the concrete approach, planning and execution are interwoven to different degrees, resulting in different levels of decision deferral. The highest degree of decision deferral is fostered by Late Composition [274] (e.g., as enabled through a declarative approach), which describes activities that can be performed as well as constraints prohibiting undesired behavior. A declarative approach, therefore, is particularly promising for dynamic and unpredictable processes [176, 264]. The support for partial workflows [264], allowing users to defer decisions to run–time [274], the absence of over–specification [176] and more maneuvering room for end users [176] can all be considered advantages commonly attributed to declarative processes. Although the benefits of declarative approaches seem rather evident, they are not widely adopted in practice yet. Declarative processes are only rudimentarily supported and integrated process lifecycle support is not in place yet, while methods and tools for supporting imperative processes are rather advanced (e.g., [275]). Reasons for the limited adoption of declarative approaches seem to be related to understandability and maintainability problems [175, 276]. Also, methods and tools addressing respective issues, even though clearly in demand [209], are still missing.
To approach these issues, we start by analyzing problems associated with understanding and maintaining declarative business process models with the help of concepts from cognitive psychology. Then, to tackle these problems, we adopt well–established techniques from the domain of software engineering. More specifically, Test Driven Development [15] and Automated Acceptance Testing [155] are combined and adapted for better supporting the declarative process life–cycle. As a result, we provide a first approach toward enhanced usability of declarative process management systems. However, before presenting the proposed concepts, we will take one step back and introduce basic terminology as well as a running example in the following.
4.2 Terminology
We would like to emphasize that our work has to be seen in the context of the process of process modeling (PPM), i.e., the process of creating a process model (cf. [31, 186, 192, 233]). Such a process modeling endeavor is commonly described as iterative and collaborative [106], whereby communication plays an important role [259]. Typically, several roles are involved in the PPM. In this work, we subscribe to the view of [250] and particularly focus on the following roles:
• Domain Expert (DE): provides information about the domain and decides which aspects of the domain are relevant to the model and which are not.
• Model Builder (MB): is responsible for formalizing information provided by the DE.
• Modeling Mediator (MM): helps the DE to phrase statements concerning the domain that can be understood by the MB. In addition, the MM helps to translate questions the MB might have for the DE.
In particular, the PPM is divided into an elicitation dialogue and a formalization dialogue [105]. During the elicitation dialogue, the DE conveys information about the business domain to the MB, i.e., capturing the requirements of the business process. In the formalization dialogue, the MB is responsible for transforming this informal information into a formal process model. Thereby, as argued in [211], much of the information is not gathered by the DE only, but created through the communication process itself.
4.3 Example
To illustrate the concepts of declarative business processes as well as the proposed framework and methodology, we introduce a running example (cf. Figure 4.1). The process model is rather meant to provide an example of a process within a familiar domain than a comprehensive description of how to write a paper. For the sake of brevity, we will use the following abbreviations:
I Come up with Idea
R Refine Idea
W Write
S Submit Paper
C Cancel as Deadline Missed
Figure 4.1: Example of a declarative process, adapted from [302]
The constraints used in Figure 4.1 are taken from [175] and summarized in Table 3.1. For the sake of readability, we will briefly revisit their semantics in the following. The precedence constraint specifies that the execution of a particular activity requires another activity to be executed before (not necessarily directly).
For instance, the precedence constraint between I and R allows execution traces <I, W, R, S> and <I, R, W, S>, but prohibits the trace <R, W, S>. The constraint succession is a refinement of the precedence constraint. In addition to the precedence relation, it demands that the execution of the first activity is followed by the execution of the second one. For example, the succession constraint between W and S is satisfied for execution traces <I, W, S> and <I, W, R, S>, but not for <I, R, S> (W not executed before S) and <I, R, W> (S not executed after W). By using the neg coexistence constraint, it is possible to define that only one of two activities can be executed for a process instance. For instance, the neg coexistence between C and S allows execution traces <I, W, S> and <I, W, C>, but prohibits trace <I, W, S, C>. Finally, the cardinality constraint restricts how often an activity can be executed. For instance, the cardinality constraint (0..1) on S allows the trace <I, W, R, S>, but prohibits the trace <I, W, R, S, S>. Having the constraints’ semantics in mind, process PMM in Figure 4.1 can be described in the following way: After an initial idea has been devised, it is possible to start working on the paper and to refine the idea at any time. If it turns out that the deadline cannot be met, the work on the paper will be stopped. Otherwise, as soon as the idea is described sufficiently well, the paper can be submitted. This example indicates three interesting properties of declarative process models: First, the declarative nature allows for an elegant specification of business rules, especially for unstructured processes. For an imperatively modeled process it would be difficult to deal with multiple (possibly) parallel executions of R and W, in combination with the neg coexistence constraint between C and S. Second, for assessing whether certain behavior is supported by the process model, the reader has to interpret the constraints mentally. For instance, when checking an execution trace, the reader has to inspect all constraints step–by–step to evaluate whether the execution trace is valid. Third, it is not obvious where to start reading the model. As there are no explicit start nodes, it is up to the reader to figure out where to start—in this case perhaps by following the precedence and succession constraints. In the following, we will deepen this discussion and assess the understandability and maintainability of declarative process models in detail.
4.4 Understandability and Maintainability of Declarative Process Models
Though declarative business processes provide great potential for flexible process execution, their adoption is currently limited—this section elaborates on factors impeding their adoption. In particular, we analyze potential problems of declarative business process models through the lens of cognitive psychology: Section 4.4.1 discusses problems related to the understanding of declarative process models, whereas Section 4.4.2 deals with maintainability issues.
4.4.1 Understandability
While there is anecdotal evidence that declarative process models suffer from understandability problems [175, 276], the discussion about potential reasons has been rather superficial so far. In the following, we examine the understanding of declarative process models from three angles.
First, we employ the Cognitive Dimensions Framework (CDF) [92, 93] to analyze the understanding of declarative process models. Second, we complement these insights by adopting the notion of sequential/circumstantial information. Third, we look into the way declarative process models are presumably read.
Cognitive Dimensions Framework
The goal of the Cognitive Dimensions Framework (CDF) [92, 93] is to provide a framework for assessing almost any cognitive artifact [163], e.g., modeling notations. The dimensions described by the CDF should allow for describing the structure of the cognitive artifact and for providing an analysis which is also understandable to persons who are not specialists in Human Computer Interaction (HCI). In this way, the CDF aims to provide discussion tools in order to raise the level of the discourse. In the context of this work, two dimensions are of particular interest: hard mental operations and hidden dependencies.
Hard Mental Operations
According to the CDF, hard mental operations are basically defined by two properties [93]. First, the problematic mental operations must be found on the notational level, rather than solely on the semantic level. In other words, if a business process is hard to understand due to its size and complexity (semantic level), the understanding of the process is not considered a hard mental operation, as defined by the CDF. Rather, if the process could have been modeled more comprehensibly in a different modeling language (notational level), the interpretation is considered a hard mental operation. As discussed in Section 3.2.3, declarative process models clearly fulfill this property. In particular, as illustrated in Figure 3.7, there exist business processes which are easily understandable when modeled in BPMN, but rather difficult to understand when modeled in Declare. Hence, problematic mental operations can be traced back to the notational level, rather than to the semantic level. The second precondition to be satisfied for being considered a hard mental operation is that the combination of “offending objects vastly increases difficulty” [93]. Also this condition is fulfilled by a declarative process modeling notation. In particular, it was shown that the semantics of a single constraint can be conceived without major problems. The combination of several constraints, however, seems to pose a significant challenge to the reader [97]. To understand why the interpretation of a declarative process model can be considered a hard mental operation, we would like to revisit the concept of computational offloading [213, 217, 291, 292]. As introduced in Section 3.2.3, computational offloading allows the reader to “offload” computations to a diagram. In other words, the way the diagram represents information allows the reader to quickly extract certain information. For instance, in a BPMN model control flow is explicitly represented by sequence flows (i.e., control edges) and gateways (e.g., AND gateway, XOR gateway). Assume the reader wants to check whether a certain process instance is supported by an imperative process model. To this end, the reader may use the control edges to simulate the process instance by tracing through the process model. In this way, the reader can offload the computation of execution traces to the process model.
Contrariwise, in a declarative process model, the reader cannot simply trace through the process model, as the control flow is modeled implicitly. Hence, even though both representations (declarative and imperative) are information equivalent, i.e., the same execution traces can be derived, the declarative process model does not allow the reader to quickly identify process instances. Rather, the reader has to simulate the process instance entirely in the mind.
Hidden Dependencies
A hidden dependency, according to the CDF [93], is a relationship between two components, such that one component is dependent on the other, but that dependency is not fully visible. In a declarative process model, this situation applies to the combination of constraints. Since constraints are interconnected, it is not sufficient to look at constraints in isolation; rather, the reader must also take into account the interplay of constraints. However, this interplay may not always be obvious, i.e., it can be hidden, thus making understanding difficult. Figure 4.1 also illustrates hidden dependencies. For instance, after activities C or S have been completed, W cannot be executed anymore. This behavior is introduced by the combination of the cardinality constraints of C and S as well as the succession constraint of W and the neg coexistence constraint between S and C. If W was executed after C or S, the process instance could not be completed anymore, as the succession constraint demands the execution of either C or S after W. However, neither C nor S can be executed due to the combination of cardinality and neg coexistence constraints, and consequently the process instance cannot be completed. Thus, the workflow engine must prohibit the execution of W (cf. [175]), which is not apparent from looking at the process model.
Sequential and Circumstantial Information
So far, we have identified that hard mental operations and hidden dependencies may be present in declarative process models. In the following, we use the notion of sequential and circumstantial information for further analysis. According to [69, 70], process models exhibit sequential as well as circumstantial information. Sequential information describes chronological behavior (e.g., A directly follows B), whereas circumstantial information captures general relations (e.g., A must be executed at least once). Depending on the process modeling language, either sequential or circumstantial information may be favored, i.e., can be explicitly described by the modeling language. Thus, in general, modeling languages can be characterized along a spectrum of explicitness between sequential and circumstantial. Imperative languages (e.g., BPMN, Petri Nets) reside on the sequential side of the spectrum, while declarative approaches (e.g., Declare [175]) are settled on the circumstantial end [69]. Though the circumstantial nature of declarative languages allows for a specification of highly flexible business process models [175], it inhibits the extraction of sequential information (e.g., execution traces) at the same time. This, in turn, compromises understandability, as it makes it harder to see whether a process model supports the execution of a specific process instance or not. To understand why the extraction of sequential information, such as execution traces, plays an essential role for understanding, we refer to the validation of process models.
Basically, validation refers to the question: “Did we build the right system?” [52], i.e., whether the process model faithfully represents the business process to be modeled. We would like to emphasize at this point that validation is orthogonal to the concept of verification. Validation refers to the question of whether the right model is built, whereas verification is concerned with the question of whether the model is built right [18, 20], i.e., whether it is formally correct. From software engineering, it is known that “programmers rely heavily upon mental simulation for evaluating the validity of rules” [117]. Seen in the context of business process modeling, mental simulation refers to the mental execution of process instances. In other words, the person who validates the process model checks via mental simulation whether certain process instances, i.e., scenarios, are supported by a process model. However, as discussed, this simulation relies on sequential information, which is only implicitly available in declarative process models. This, in turn, poses a significant challenge for the person who wants to validate the process model. To illustrate the concept of sequential and circumstantial information, consider the process model from Figure 4.1. For instance, the precedence constraint between I and R defines that R can only be executed if I has been completed at least once. In this way, the constraint conveys sequential information, i.e., that there is no occurrence of R before any occurrence of I. However, the constraint does not strictly prescribe all possible sequences involving I and R, but rather allows all process instances that satisfy this condition, i.e., it also conveys circumstantial information. With respect to the coexistence constraint (cf. Table 3.1), there is an even higher ratio of circumstantial information: it defines that one out of two activities can be executed (circumstantial), but not both—the ordering of the respective activities is not taken into account at all (sequential).
Readability of Declarative Process Models
Finally, we would like to turn to the reading of declarative process models. Basically, it is known that conceptual models are not read at once, but chunk–wise, i.e., bit by bit [83]. While graph–based notations inherently propose a way of reading imperative process models chunk by chunk—namely from start node(s) to end node(s)—this approach does not necessarily work for declarative models. Since declarative models do not have explicit start nodes and end nodes, it is not always obvious where the reader should start reading the model. In fact, it is not unlikely for a declarative model to have several starting points, as the focus is put on what should be modeled, but not precisely how. Thus, when reading a declarative process model, users have to rely on secondary notation, e.g., layout [97]. For example, regarding Figure 4.1 one might assume that the process starts top left and ends bottom right. However, whether this strategy succeeds depends on the person who created the model. Another way of reading could be to follow precedence constraints and succession constraints to find the model’s start. In short, it is neither obvious which strategy to choose nor which one works best.
4.4.2 Maintainability
Up to now, we have discussed the understanding of declarative process models.
In the following, we turn to the maintenance of respective models, i.e., the evolution of process models due to the introduction of new laws or changes in customer attitude. Besides gathering the change requirements, the adaptation of declarative process models is far from trivial. With increasing model size, declarative models are not only difficult to understand, but also hard to maintain. As pointed out in [276]: “it is notoriously difficult to determine which constraints have to be modified and then to test the newly adapted set of constraints”. An explanation of problems regarding maintainability is provided by the observation that adapting process models involves both sense–making tasks, i.e., determining the parts of the model that need to be changed, and action tasks, i.e., applying the respective changes to the model [93]. While the action task is rather simple—adding/removing activities and adding/removing constraints—the sense–making task is far from trivial, as detailed in the following. As illustrated, declarative process models suffer from understandability problems, thus impeding the sense–making task. Besides, due to the interplay of constraints, it is hard to see how changes influence other parts of the process model. Consider, for instance, the introduction of a neg coexistence constraint between R and W in Figure 4.1. This change not only restricts the relationship between R and W, but also introduces a deadlock for the execution trace <I, R>. While the execution of R is still possible, neither C nor S can be executed anymore, because they require W to be executed before. However, as R and W are mutually exclusive, W cannot be executed. This example illustrates that local changes to the process model can have effects on other parts of the model that are not obvious, i.e., the change becomes global.
4.5 Testing Framework for Declarative Processes
To address the understanding and maintenance issues discussed above, we propose a framework for the validation of declarative processes, subsequently referred to as Test Driven Modeling (TDM). Basically, the idea of TDM is to transfer concepts from software engineering in order to provide computer–based support for the creation and maintenance of declarative process models. In particular, Section 4.5.1 introduces background information about the software engineering techniques Test Driven Development and Automated Acceptance Testing. Then, Section 4.5.2 illustrates how respective techniques can be adapted to the domain of business processes and introduces the testing framework. Section 4.5.3 focuses on methodological aspects and their application in the declarative process life–cycle.
4.5.1 Software Testing Techniques
As the proposed framework and methodology build upon techniques from software engineering, respective techniques are introduced in the following.
Test Driven Development
Software engineering processes typically define the phases of system design and implementation as well as testing. That is, after the required functionality has been developed, the software system’s defects are (more or less) systematically searched for and corrected. The idea of Test Driven Development (TDD), however, is to interweave the phases of system development and testing [15]. As the name suggests, automated test cases are specified before the actual production code is written.
Whenever a new feature is introduced, a test case is created to ensure that the feature is implemented properly. In addition, developers execute all test cases to verify that existing behavior is preserved and that the new feature does not introduce undesired behavior that “breaks the existing code” [15]. Studies show that the adoption of TDD indeed leads to improvements with respect to the number of software defects and design quality (e.g., [27, 84]). It is worthwhile to note that TDD, as a byproduct, enables regression testing, i.e., testing whether the development of new functionality in a software system preserves the correctness of the existing system [1]. In particular, test cases that were specified during development can directly be used for regression testing.
Automated Acceptance Testing
Similar to TDD, the idea of Automated Acceptance Testing (AAT) [155] is the creation of executable test cases. AAT, however, focuses on the interaction between customers and developers. Test cases in AAT need to be understandable to customers without technical background, but still exhibit strict semantics. Thereby, test cases act as a means of communication as they can be understood by both customers and developers, allowing for a better integration of the customer in the development process. This, in turn, supports the identification of system requirements [155]. The automated validation of the developed system against the test cases ensures that the desired functionality is actually provided by the software system. The test cases are seen as a contract of acceptance: only if the software system passes all test cases will it be accepted by the customer.
4.5.2 Process Testing Framework Concepts
So far, we have established that TDD and AAT can help to improve the understanding and maintenance of software. Next, we describe how respective concepts can be adopted for declarative process models for addressing understandability and maintainability issues, as identified in Section 4.4. In particular, this section starts with further motivation and an overview of the testing framework before the concepts are explained in detail. Besides improving understanding and maintenance, test cases aim at supporting the creation of process models in two ways. First, it is known that a fundamental modeling strategy is to factor the problem, i.e., the business process to be modeled, into smaller pieces [154]. As test cases typically refer to a specific aspect of the business process, they help to focus on smaller parts of the process, thereby supporting the factoring of the problem. In this vein, we subscribe to the view of [17]: “We assume that the domain experts know some or all scenarios of the process to be modelled better than the process itself”. Second, an automated execution of test cases allows for a progressive evaluation, which is particularly beneficial for novices [93]. Thereby, test cases help to focus on a particular instance of the problem [154], another fundamental modeling strategy. Similarly, focusing on the bare essentials for reproducing a specific (errant) behavior was identified as an expert strategy for debugging [279]. The intended application of test cases, in particular the interplay of domain expert (DE), model builder (MB), test cases and process model, is illustrated in Figure 4.2. The DE, possibly with the help of the MB, creates a test case specifying the intended behavior of the declarative process (1a and 1b).
To foster communication between DE and MB, test cases are represented by a rich graphical user interface, the so–called test fixture. Since the test fixture is understandable to both the DE and MB, it serves as a basis for discussion. To validate a test case, the test case logic is automatically extracted from the test case (2) and fed into the test engine (3). For the validation of test cases, the test engine also needs to access the process model (4). After the test engine has completed the validation process, results are reported to the MB/DE by indicating whether the test failed and, in case it failed, the reason why (5). Depending on the feedback, the MB might decide to adapt the process model (6) or the test case (1b).
Figure 4.2: Framework for the validation of declarative processes, adapted from [302]
Test Case Structure
In our work, we adopt the terminology from the UML Testing Profile (UTP). More specifically, we consider a test case a “specification of one case to test the system including what to test with, which input, result, and under which conditions... A test case always returns a verdict. The verdict may be pass, fail, inconclusive or error.” [165]. Test cases are key components of the testing framework and essential parts of TDD and AAT. A test case for a declarative process model consists of text explaining the intention of the test case, an execution trace describing the behavior to be tested, a set of assertions specifying the conditions to be validated, as well as a graphical representation by a test fixture.
Textual Description
The textual description in a test case helps to capture information that cannot be expressed directly in a process model, but is still necessary to fully understand it. In this way, textual descriptions can be used to document why certain behavior must be present in the process model. For instance, the intention of the test case shown in Figure 4.2 is to give an overview of the proposed testing framework and illustrate the interplay of concepts.
Execution Trace
As motivated by AAT, test cases can be seen as executable specifications [155]. Instead of using an informal document describing the requirements of the process model in natural language, the specification should not only be readable by humans, but also accessible to automated interpretation. The term execution thereby refers to the idea of checking the described requirements using a test engine in an automated manner, i.e., “the test case gets executed”. Within a test case, execution traces are used to capture the behavior to be tested, as they contain all relevant information about the execution of a process, e.g., the execution of an activity or the completion of a process instance. Thereby, an execution trace allows restoring an arbitrary state of a process instance by replaying the steps in the execution trace on the workflow engine. Thus, execution traces provide the basis for an executable specification, capturing behavior in a machine–readable form.
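As a rough illustration, the following sketch shows what such a machine–readable trace could look like; the class names are hypothetical and not taken from the actual TDM implementation.

import java.util.List;

// Hypothetical sketch of an execution trace in machine-readable form: each entry
// records which activity is executed at which logical point in time.
record TraceEvent(String activity, int time) { }

record ExecutionTrace(List<TraceEvent> events) {

    // The trace of the test case in Figure 4.2: activity A at time 2, activity B at time 4.
    static ExecutionTrace ofFigure42() {
        return new ExecutionTrace(List.of(new TraceEvent("A", 2), new TraceEvent("B", 4)));
    }

    public static void main(String[] args) {
        System.out.println(ofFigure42().events());
    }
}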
Considering the test case illustrated in Figure 4.2, the execution trace can be found on the left–hand side of the test case. The execution trace includes the execution of two activities: activity A at time 2 and activity B at time 4.
Assertion
While execution traces allow for specifying behavior that must be supported by a process model, they do not provide means to specify behavior that must not be included. For instance, regarding the process model shown in Figure 4.2, activity A must be executed exactly once. Put differently, activity A must appear at least once in the execution trace, but at the same time not more than once. In order to support such scenarios, a test case also contains a set of assertions. Using assertions, it can be validated whether certain behavior is supported/prohibited by the process model. Since declarative process models require explicit completion, i.e., the user must complete the instance explicitly, we differentiate between execution and completion assertions:
• is executable (a, t): activity a is executable at time t.
• is not executable (a, t): activity a is not executable at time t.
• can complete (t): the process instance can complete at time t.
• can not complete (t): the process instance cannot complete at time t.
As discussed in Section 3.1, the behavior specified in a declarative process model is prescribed by its activities and constraints. The former enumerate tasks which may be executed, whereas the latter restrict their execution and the completion of the process instance. Thus, the aforementioned assertions should be sufficient to cover any condition with respect to control flow, since they can deal with activity execution as well as process completion. Considering Figure 4.2, three assertions can be found on the right–hand side of the test case. The assertions are organized in two columns to separate execution from completion assertions. At time 1, before the execution of activity A, assertion can not complete (1) (crossed–out area on the right) is specified. After A has been executed, at time 3, is not executable (A,3) (crossed–out rectangle on the left) and can complete (3) (non–crossed–out area in the right column) can be found.
Test Fixture
So far, we illustrated the textual description, the execution trace and the assertions of a test case. While these concepts suffice to specify the behavior to be tested, it is very likely that the DE cannot cope with these technical details. To enable the DE to read and specify test cases, test fixtures provide an intuitive graphical representation of a test case. Depending on the situation, a specific visualization may be best suited. For instance, when control flow constraints should be tested (e.g., activity B must be preceded by activity A), a simple time–line fixture may be sufficient (cf. Figure 4.2). However, for testing temporal constraints [127] (e.g., activity A can only be executed once per day), a calendar–like fixture might be more appropriate. Furthermore, it might be beneficial to provide a user interface the DE is familiar with (e.g., the calendar from Microsoft Outlook). Figure 4.2 shows an example of a test fixture for testing control flow constraints. It consists of a simple timeline on the left–hand side as well as an area for the specification of constraints on the right–hand side.
Assertion can complete is represented by an empty rectangle, whereas assertions is not executable and can not complete are represented by crossed–out rectangles.
Test Case Validation
So far, we have introduced test cases that serve as executable specification and are created by the DE, possibly with the help of the MB. Now, we describe how test cases can be automatically validated against the process model. Each test case is responsible for providing a precise definition of the behavior to be tested, i.e., the execution trace and assertion statements. For the actual execution of the test case, the test engine provides an artificial environment, in which the process is executed. The procedure for the validation of a test case is straightforward:
(1) Initialize the test environment, i.e., instantiate the process instance.
(2) For each event in the execution trace and each assertion, ascending by time:
a) if event: The test engine interprets the log event and manipulates the test environment, e.g., by executing an activity. If the log event cannot be interpreted, e.g., because the activity cannot be executed, the test case validation will be stopped and the failure reported.
b) if assertion: Test whether the assertion holds for the current state of the test environment. In case the condition does not hold, the test case validation will be stopped and the failure reported.
(3) In case all assertions passed, report that the test case passed. Otherwise, provide a detailed problem report, e.g., report the constraint that caused the failure or the state of the process instance when the test case failed.
With respect to the test case illustrated in Figure 4.2, the test engine takes the test case logic and the process model as input. Then, the engine interprets the test case logic as follows:
(1) can not complete (1): Check that the process instance cannot be completed without executing any activity.
(2) execute (A,2): Execute activity A at time 2 and complete it at time 3.
(3) is not executable (A,3): Check that A cannot be executed at time 3.
(4) can complete (3): Test if the process instance can be completed at time 3.
(5) execute (B,4): Execute activity B at time 4 and complete it at time 5.
(6) All events from the execution trace and all assertions could be interpreted without problems, thus report that the test case passed.
Meta Model
Up to now, we have described test cases and their automated validation. In the following, we summarize the described concepts in the form of a meta model. As illustrated in Figure 4.3, TDM’s meta model can be divided into two main parts: the specification of test cases (upper half) as well as the specification of the business process model (lower half). A Test Driven Model consists of exactly one Declarative Process Model and an arbitrary number of Test Cases. A Declarative Process Model consists of at least one Activity as well as an arbitrary number of Constraints. For the sake of brevity, Figure 4.3 shows three constraints only, i.e., the Response Constraint, the Precedence Constraint and the Coexistence Constraint. TDM actually supports all constraints described in [175]; for a detailed description of the constraints we refer to [148]. Besides the specification of a Declarative Process Model, the meta model in Figure 4.3 describes how Test Cases can be specified. In particular, a Test Case is built up of a Process Instance, i.e., the execution trace, and an arbitrary number of Assertions.
The Process Instance, in turn, consists of an arbitrary number of Activity Instances. For each Activity Instance a start Event (i.e., when the activity instance enters the state started in its life–cycle) as well as an end Event (i.e., when the activity instance is completed) are defined. Similarly, each Assertion is defined for a certain window by its start and end Event. Within this window, a condition that is specified by the Assertion must hold. TDM thereby differentiates between two types of Assertions: an Execution Assertion can be used to verify whether a certain Activity is executable. The positive flag in Assertion thereby defines whether an Activity is expected to be executable or whether the Activity is expected to be non–executable. Similarly, a Completion Assertion can be used to test whether the Process Instance can be completed within a specified window.
Figure 4.3: Meta model of TDM [300]
To wrap up, TDM allows for the specification of declarative process models and test cases. Each test case defines a certain scenario, i.e., process instance, that must be supported by the process model. Assertions can thereby be used to test for specific conditions, namely whether an activity is executable as well as whether the process instance can be completed.
4.5.3 Test Driven Modeling and the Declarative Process Life–Cycle
So far, we have elaborated on understandability and maintainability issues of declarative models and introduced a testing framework adopting TDD and AAT. In the following, we turn toward the declarative process life–cycle, as depicted in Figure 4.4. We start by looking into the phase of process design and deployment, where we focus on methodological aspects and discuss how test cases are intended to drive the creation of declarative process models during these phases. Then, we describe how test cases can provide support in the phase of process operation and evaluation.
Design– and Deployment Phase
In the following, we focus on the phases of process design and process deployment. In particular, we focus on process specification and process testing, cf. Figure 4.4.
Figure 4.4: Process life–cycle, adapted from [281]
Test Driven Modeling (TDM)
The PPM is a collaborative, but still manual process. By adopting TDD and AAT techniques, process specification and process testing become closely interwoven. In particular, test cases serve as a Modeling Mediator (MM) [250], mediating between DE and MB. As illustrated in Figure 4.5, test cases provide means to talk about both the business domain and the process model.
The DE is no longer forced to rely on the information provided by the MB solely (2)—the specification of test cases (4) and their automated validation against the formal process model (6) provide an unambiguous basis for communicating with the MB (5). This does not mean that the DE and MB do not communicate directly anymore. Rather, tests provide an additional communication channel. Thereby, modeling minutes, which are formulated as test cases instead of informal text can be automatically validated against the process model, relieving the MB from manually checking the process model against the informal specification. It is important to stress that the TDM’s PPM is of iterative nature. Rather than specifying all test cases up–front and modeling the process afterwards against the specification, test cases and model are refined iteratively. As illustrated in Figure 4.6, when a new process model is specified, the DE (with the help of the MB) describes a requirement in the form of a test case. Then, all test cases are validated against the process model to check whether the specified behavior is already supported by the model. For the case that all tests pass, the DE and MB can move on with specifying the next requirement, i.e., test case. If at least one test case fails, the DE and MB will discuss whether the failed test cases are valid, i.e., capture the business domain properly. If all test case are valid, it can be assumed that the model does not capture the business domain properly and therefore needs to be adapted. However, if the 46 4.5 Testing Framework for Declarative Processes 4) Test 1 Test 2 6) Test 3 1) 5) Domain Domain Expert 2) 3) Model Builder Figure 4.5: Communication channels, adapted from [134] discussion reveals that the test case is invalid, the test case needs to be adapted to represent the business domain properly. In either case, all test cases are run to ensure that the conducted adaption had the desired effect. Subsequently, new test cases are defined/adapted or the model is adapted iteratively until both DE and MB are satisfied with the test cases and process model. While the basic idea, as illustrated in Figure 4.6, is to start by specifying test cases, for some situations one might also start with the modeling part or a non–empty model. For instance, when working with existing process models it is neither feasible nor meaningful to start from scratch. Furthermore, depending on the MB’s or DE’s skills and preferences, it is not necessary to strictly follow the test–before–model idea, e.g., to start from an initial model capturing the process logic roughly and refine it using test cases. Such deviations from the original TDM process are acceptable—as long as testing and modeling stays interwoven. Otherwise, the benefits of TDM are likely to be diminished. The idea of TDM is illustrated based on an example of a modeling session, cf. Figures 4.7 to 4.9. To recapitulate, we use the following abbreviations: I Come up with Idea W Write S Submit Paper Starting from an empty process model, the DE lines out general properties of the process: “When writing a publication, you need to have an idea. Then you write the publication and submit it.”. Thus, possibly with help of the MB, the DE inserts activities I, W and S in the test case’s execution trace (cf. Figure 4.7). Respective activities are automatically created in the process model. 
Figure 4.6: Test Driven Modeling [302] (iterative cycle: write test case, run all test cases, discuss failed test cases, adapt process model or adapt test case)
Now, the DE and MB run the test and the test engine reports that the test case passes. Subsequently, the DE and MB engage in a dialogue of questioning and answering [105]—the MB challenges the model: “So can I submit the paper several times?”. “You should submit the paper, but, at most once!”, the DE replies and adds: “And you should only have a single idea—otherwise the reader gets confused.”. Thus, they adapt the test case to capture this requirement and run it (cf. Figure 4.8). Apparently, the test case fails, as there are no constraints in the model yet. After ensuring that the requirement is valid, the MB adapts the model—inserting cardinality constraints on I and S—and the test passes (cf. Figure 4.8). Again, the MB challenges the model and asks: “Is it possible to submit an idea without paper?”. The DE replies: “No, you always need a written document.” and together they specify a second test case that ensures that S cannot be executed without at least one execution of W before. By automatically validating the second test case, it becomes apparent that S can be executed before W has been finished. Thus, the MB introduces a precedence constraint between W and S (cf. Figure 4.9). The given example illustrates the benefits of TDM for the design– and deployment phase, which are detailed in the following.
Figure 4.7: Test case 1: <I, W, S> proposed by the DE, adapted from [302]
Figure 4.8: Test case 1: introduction of cardinality on I and S, adapted from [302]

Improving Understandability
As discussed in Section 4.4.1, declarative process models do not provide explicit support for sequential information, thereby forcing the MB to construct respective information in the mind. At this point, the sequential nature of test cases is exploited: since specification and testing are interwoven, test cases and models are paired together. Thereby, test cases provide an explicit source of sequential information. The construction of sequential information is supported by the automated validation of test cases, thus avoiding hard mental operations. Put differently, the automated computation of execution traces compensates for the lack of computational offloading in declarative process models. In addition, by specifying a respective test case, implicit dependencies between constraints can be made explicit. This, in turn, helps the MB to deal with hidden dependencies. According to [15], test cases should focus on a single aspect only. For instance, the test case shown in Figure 4.9 focuses on the execution of activity S. Thereby, a chunk of the process model is presented, focusing on a certain aspect of the model only.
Figure 4.9: Test case 2: introduction of precedence between W and S, adapted from [302]
This, in turn, enables the MB to read the model test case by test case, i.e., chunk–wise.
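To illustrate what such a chunk could look like in terms of the structures sketched in the listing above, the following snippet shows one plausible encoding of test case 2 of Figure 4.9, i.e., S must not be executable before W has been completed. Activity names, time stamps and the helper class are illustrative assumptions building on that sketch rather than TDMS' actual representation.

import java.util.List;

// Illustrative encoding of test case 2 (assumes the LogEvent, ExecutionAssertion and TestCase sketch classes).
class TestCase2Example {
    static TestCase buildTestCase2() {
        TestCase tc = new TestCase();

        LogEvent startI = new LogEvent(); startI.time = 1; startI.activity = "I"; startI.start = true;
        LogEvent endI   = new LogEvent(); endI.time = 2;   endI.activity = "I";   endI.start = false;
        LogEvent startW = new LogEvent(); startW.time = 3; startW.activity = "W"; startW.start = true;
        LogEvent endW   = new LogEvent(); endW.time = 4;   endW.activity = "W";   endW.start = false;
        tc.trace = List.of(startI, endI, startW, endW);

        // Negative execution assertion: before W has been completed, S must not be executable.
        ExecutionAssertion sNotExecutable = new ExecutionAssertion();
        sNotExecutable.activity = "S";
        sNotExecutable.positive = false;
        sNotExecutable.start = 1;
        sNotExecutable.end = 3;          // window closes before W completes at time 4

        tc.assertions = List.of((Assertion) sNotExecutable);
        return tc;
    }
}

Reading the test case in this form mirrors the chunk–wise reading described above: the trace fixes a concrete ordering of I and W, while the assertion makes the otherwise hidden dependency on S explicit.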
Besides, the ordering of the activities specified in the test case proposes a way of reading the process model. Consider, for instance, the test case illustrated in Figure 4.9: Activities I, W and S are ordered consecutively. Thus, the reader can assume that the process probably starts with activity I and ends with S. Foster Communication As pointed out, our approach aims at fostering the communication between DE and MB. Wrapping up, we expect test cases to (1) act as communication medium between DE and MB that serves as basis for discussion, (2) structure their dialogue and (3) allow to focus on the modeling task, as test cases provide a way to automatically ensure that existing behavior is not affected when changing the model. Support Schema Evolution Since design is redesign [87], during schema evolution the same principles as for process specification can be applied. The only difference, however, is that DE and MB have already a set of test cases as starting point that is extended as the process model is re–engineered. In this sense, the use of automated test cases is also beneficial for supporting schema evolution. First, existing test cases ensure that desired behavior is preserved by schema evolution, i.e., no unwanted behavior is introduced (cf. regression testing, Section 4.5.1). Second, the specification of new test cases capturing the behavior to be introduced helps the modeler to determine which constraints need to be changed, addressing the maintainability issues discussed in Section 4.4.2. Similar to the specification of new process models, the first step consists of specifying a test case that defines the behavior to be introduced/changed. Afterwards, the MB iteratively refines the test case, creates new 50 4.5 Testing Framework for Declarative Processes test cases and adapts the model until the desired solution is finally approached. The automated nature of test cases ensures that neither requirements are forgotten, nor new requirements contradict existing ones, allowing DE and MB to focus on the requirement elicitation and the modeling task. Operation and Evaluation While declarative processes provide a high degree of flexibility [175], deviations from the process model can occur nevertheless. In Declare [175], for instance, it is possible to specify optional constraints, which can be violated during process execution. These deviations are usually documented using plain text. However, when deviations occur frequently, it is desirable to ensure that deviations are incorporated during schema evolution [275]. To support the evolution of business processes over time, we propose to capture each deviation in the form of a test case. The execution trace of the current process instance, in combination with a textual description, can directly be transformed into a test case. The user, who deviated, is thereby enabled to document the deviation in a form that can directly be used to guide the upcoming schema evolution. This means, when redesigning the process schema, the MB runs all test cases for the respective schema. If all test cases, including the test cases specified in course of deviations, pass, the MB knows that the new process schema version also supports the needs of users who deviated. Otherwise, test cases that fail will be discarded if the discussion between DE and MB reveals that respective behavior should not be supported in the new process model version. 
Considering the process depicted in Figure 4.1, which allows for the submission of a paper only, resubmission is not supported. If a user needed to resubmit a paper, he would deviate from the process by inserting and executing an activity Resubmit Paper. To assure that the next version of the process model includes this exceptional case—or at least that the exceptional case is discussed during the schema evolution—the user creates a new test case using the execution trace of the current process instance and a textual description to record the reason for the deviation.

Test Cases and the Process Life–Cycle
So far, the implications on the different phases of the declarative process life–cycle, as shown in Figure 4.4, were pointed out. While our approach covers all phases—with a focus on process design and deployment—it should be emphasized that support is not provided in isolation for each phase. More specifically, test cases provide a means of communication and documentation throughout the process life–cycle. Starting in the design phase, they aim at improving the understandability of declarative process models and foster communication between DE and MB. Moving to the phase of process deployment, the automated nature of test cases provides support for the validation of the process. During process operation, test cases can be used to document process deviations. And, again starting at the phase of process design, test cases specified during process operation provide a valuable starting point for schema evolution. Thus, it becomes apparent that test cases are neither restricted to a single phase of the process life–cycle nor to a single iteration of the life–cycle. Rather, test cases flow through possibly multiple iterations of the process life–cycle, providing information that cannot be explicitly specified by declarative process models in isolation.
Figure 4.10: Process– and test case life–cycle [302]
The life–cycle of test cases is illustrated in Figure 4.10. Test cases are primarily defined during process specification (1) and can then directly be used for process testing, i.e., validation (2). During process execution, new test cases (TCn+1, …, TCo) may be created to document deviations (3). These test cases, in turn, can be used as input for schema evolution (1).

4.5.4 Limitations
Clearly, TDM has to be seen in the light of several limitations. Regarding conceptual aspects, it shall be noted that the focus of TDM is rather narrow. In particular, TDM was developed to support the creation, understanding and maintenance of declarative process models. Even though it seems plausible that test cases may also be used for the validation of imperative process models, it remains unclear to what extent the concepts described in this work can be transferred. More specifically, we have argued that a central aspect of test cases is the automated extraction of sequential information from declarative process models, which usually rather focus on circumstantial information. Hence, when directly applying test cases, as described in this work, to imperative process models, sequential information is duplicated. In other words, as test cases and imperative languages both focus on sequential information, the adoption of test cases does not help to balance the sequential/circumstantial information mismatch (cf. Section 4.4.1).
More likely, the introduction of circumstantial information, such as constraints, will help to facilitate model understanding and validity of imperative process models (cf. [132]). Another limitation of this work is that TDM focuses on control flow aspects only. Other perspectives, such as data and resources, were not taken into account yet. Even though this limits the applicability of TDM, this design decision was taken deliberately. In particular, the Design Science Research Methodology (DSRM) [173] adopted in this work envisions an iterative research process. Likewise, it is acknowledged that design–science research efforts may begin with simplified conceptualizations [100]. Hence, we focused on building a solid foundation, which may be used in future iterations for refining and extending. Regarding the applicability and efficiency of TDM, we would like to stress that TDM was designed to be used as a collaborative approach, requiring MB and DE to work together closely. Hence, for the adoption of TDM, two experts willing to collaborate are required. To test whether the application of TDM is indeed feasible, in the following we describe a prototypical implementation of TDM’s concepts. 4.6 Test Driven Modeling Suite In this section we introduce Test Driven Modeling Suite (TDMS), which provides operational support for TDM. We start by discussing the software components of TDMS in Section 4.6.1. Then, we describe how TDMS was integrated with existing frameworks for empirical research and business process execution in Section 4.6.2 and demonstrate the application of TDMS in a modeling session in Section 4.6.3. Thus, in the sense of [164], this section can be seen as a “proof–by–demonstration”, i.e., demonstrating the feasibility of TDM by implementing its concepts. 4.6.1 Software Components To give an overview of the features implemented in TDMS, we have modeled the scenario described in Section 4.5.3 using TDMS, cf. Figure 4.11. On the left hand side, TDMS offers a graphical editor for editing test cases (1). To the right, a 53 Chapter 4 Test Driven Modeling graphical editor allows for designing the process model (2). Whenever changes are conducted, TDMS immediately validates the test cases against the process model and indicates failed test cases in the test case overview (3). In this case, it lists two test cases from which one failed. In addition, TDMS provides a detailed problem message about failed test cases in (4). In this example, the MB defined that the trace <I, W, S > must be supported by the process model. In addition, the test case defines that S cannot be executed before W has been executed. However, as the relation between W and S is not restricted, it is possible to execute S, causing the test case to fail. In particular, TDMS informs the user about the failed test case by highlighting the respective erroneous part in the test case editor (1) and in the test case overview (3). In addition, TDMS provides a detailed error message to the user in (4): “Submit Paper (S)” should not have been executable. Figure 4.11: Screenshot of TDMS Test Case Editor As discussed in Section 4.5.2, test cases are a central concept of TDM, have precise semantics for the specification of behavior and still should be understandable to domain experts. To this end, TDMS provides a calendar–like test case editor as shown in Figure 4.11 (1). 
In particular, the test case editor provides support for the specification of an execution trace on the left hand side and of execution assertions as well as completion assertions on the right hand side. The graphical representation of assertions slightly deviates from the original design (crossed–out and non–crossed–out rectangles). Instead, stop signs are used for negative assertions, i.e., when an activity cannot be executed or the process instance cannot be completed. OK signs, in turn, are used to visualize positive assertions, i.e., when an activity can be executed or the process instance can be completed. To avoid unnecessary distractions, the test case editor is deliberately kept simple. The execution trace can be assembled by dragging activities from the process model editor (2) and dropping them at the respective time in the test case editor (1). Likewise, execution assertions can be specified in the same way. For completion assertions, the user selects the desired time frame in the test case editor and uses the context menu to add the assertion.

Declarative Process Model Editor
The declarative process model editor, as shown in Figure 4.11 (2), provides a graphical editor for designing models using the declarative process modeling language Declare [175]. In particular, it enables the user to create, delete, rename and reposition activities and to create, edit and delete constraints. To allow users to quickly get familiar with the editor, it builds upon the standard user interactions provided by the Graphical Editing Framework (GEF, http://www.eclipse.org/gef).

Test Case Creation and Validation
In order to create new test cases or to delete existing test cases, an outline of all test cases is provided in Figure 4.11 (3). Whenever a test case is created, edited or deleted, or the process model is changed, TDMS immediately validates all test cases. By double–clicking on a test case, TDMS opens the respective test case in the test case editor (1). Whenever a test case fails, TDMS provides a detailed problem message in the problem overview (4). When double–clicking on a problem reported in the problem overview, TDMS opens the test case associated with the problem in the test case editor (1) and highlights the problem. Regarding the validation of test cases, it is important to stress that the validation procedure is performed automatically, i.e., no user interaction is required to validate the test cases. To this end, TDMS provides a test engine in which test cases are executed, as shown in Figure 4.12. Basically, the test engine consists of a declarative process instance that is executed on a declarative workflow engine within a test environment. Thereby, TDMS' process model provides the basis for the process instance. The test cases steer the execution of the process instance, e.g., by instantiating the process instance, starting activities or completing activities. In addition, test cases may also check the state of the process instance in the course of evaluating execution– or completion assertions.
Figure 4.12: Testing framework [300]
As pointed out in Section 4.5.3, the TDM methodology is of iterative nature, hence TDMS must also provide respective support.
In particular, the iterative creation of the process model poses a significant challenge, as any relevant change of the process model requires the validation of the test cases (layout operations, for instance, can be ignored here, as they do not change the semantics of the process model). However, existing approaches for supporting Declare either lead to exponential runtime for schema adaptations [175] or do not support workflow execution [148]. In order to tackle these problems, TDMS provides its own declarative workflow engine. Similar to the Declare framework, where constraints are mapped to LTL formulas [177], TDMS' workflow engine maps constraints to Java classes (http://java.sun.com). In addition, for each process instance, the workflow engine keeps a list of events to describe its current state. The enablement of an activity can then be determined as detailed in the following. Based on the current process instance, a constraint is able to determine whether it restricts the execution of an activity. The workflow engine consults all defined constraints and determines for each constraint whether it restricts the execution. If no constraint vetoes, the activity can be executed. For determining whether the process instance can be completed, a similar strategy is followed; in this case, however, constraints are asked whether they restrict the completion of the process instance. Whenever a constraint should be added to the process model, it is sufficient to add this constraint to the set of constraints to be checked. Similarly, when removing a constraint, the workflow engine simply does not consider the respective constraint anymore. While such an approach allows for efficient schema evolution, it does not support verification mechanisms as provided in, e.g., the Declare framework [177]. To compensate for this shortcoming, TDMS provides an interface to integrate third party tools for verification, as detailed in the following.
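The veto–based enablement check described above can be sketched as follows. The Constraint interface, the class names and the stream–based checks are illustrative assumptions rather than TDMS' actual API; the LogEvent type is the one assumed in the earlier sketch.

import java.util.ArrayList;
import java.util.List;

// A constraint may veto the execution of an activity or the completion of the instance.
interface Constraint {
    boolean restrictsExecution(List<LogEvent> instanceEvents, String activity);
    boolean restrictsCompletion(List<LogEvent> instanceEvents);
}

// Example mapping: precedence(a, b), i.e., b must not be executed before a has been completed.
class PrecedenceConstraint implements Constraint {
    String a, b;
    public boolean restrictsExecution(List<LogEvent> events, String activity) {
        if (!activity.equals(b)) return false;  // the constraint only restricts b
        // veto as long as no completion event of a has been observed
        return events.stream().noneMatch(e -> e.activity.equals(a) && !e.start);
    }
    public boolean restrictsCompletion(List<LogEvent> events) {
        return false;  // a precedence constraint never blocks completion
    }
}

class DeclarativeWorkflowEngine {
    final List<Constraint> constraints = new ArrayList<>();   // schema changes are list operations
    final List<LogEvent> instanceEvents = new ArrayList<>();  // current state of the process instance

    boolean isExecutable(String activity) {   // executable if no constraint vetoes
        return constraints.stream().noneMatch(c -> c.restrictsExecution(instanceEvents, activity));
    }
    boolean canComplete() {
        return constraints.stream().noneMatch(c -> c.restrictsCompletion(instanceEvents));
    }
}

In this design, adding a constraint to the process model amounts to adding an object to the constraints list, and removing a constraint to removing it again, which is what keeps schema adaptations cheap—at the price of not providing the verification facilities of the Declare framework.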
4.6.2 Support for Empirical Evaluation, Execution and Verification
So far, we have described how test cases and declarative process models can be created in TDMS. In the following, we discuss how support for empirical research as well as for the execution and verification of declarative process models is provided. In particular, we describe how TDMS makes use of CEP's components for empirical research and integrates the Declare framework [177] for workflow execution and process model verification, as illustrated in Figure 4.13 and detailed in the following.
Figure 4.13: Interplay of TDMS, CEP and the Declare framework [300]

Cheetah Experimental Platform as Basis
One of the design goals of TDMS was to make it amenable to empirical research, i.e., it should be easy to employ it in experiments and case studies. In addition, data should be easy to collect and analyze. For this purpose, TDMS was implemented as an experimental workflow activity of CEP, allowing TDMS to be integrated in any experimental workflow, i.e., a sequence of activities performed during an experiment, cf. Section 3.3. Furthermore, we used CEP to instrument TDMS, i.e., to log each relevant user interaction to a central data storage. This logging mechanism, in combination with CEP's replay feature, allows the researcher to inspect step by step how TDMS is used to create process models and test cases. Or, even more sophisticated, such a fine–grained instrumentation allows researchers and practitioners to closely monitor the process of process modeling, i.e., the creation of the process model, using Modeling Phase Diagrams [192]. To support the step–wise replay of modeling sessions, any relevant user interaction in TDMS is implemented as a command (cf. Command Pattern [262]). As illustrated in Figure 4.14, any command in TDMS inherits from Abstract Command, which defines the basic operations for replay: to execute a command and to undo a command. Subclasses extend this behavior accordingly, e.g., Create Activity Command defines the behavior for creating an activity. Similarly, Create Precedence Constraint Command provides the functionality for specifying a precedence constraint. Consequently, any modeling session can be represented as a sequence of commands. When conducting a modeling session in TDMS, CEP automatically stores the executed commands. When analyzing the modeling session, in turn, the commands are restored and executed, allowing the researcher to revisit the modeling session step by step.
Figure 4.14: Command class hierarchy
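A minimal sketch of this command structure is given below. Apart from the execute and undo operations named above, all details, such as the fields of the commands and the model facade, are illustrative assumptions.

import java.util.ArrayList;
import java.util.List;

// Every relevant user interaction is expressed as a command that can be executed and undone.
abstract class AbstractCommand {
    abstract void execute();   // apply the change to the model
    abstract void undo();      // revert the change, e.g., when stepping backwards during replay
}

interface DeclarativeModel {   // assumed minimal model facade
    void addActivity(String name);
    void removeActivity(String name);
}

class CreateActivityCommand extends AbstractCommand {
    private final DeclarativeModel model;
    private final String name;
    CreateActivityCommand(DeclarativeModel model, String name) { this.model = model; this.name = name; }
    void execute() { model.addActivity(name); }
    void undo()    { model.removeActivity(name); }
}

class ModelingSessionLog {
    private final List<AbstractCommand> commands = new ArrayList<>();

    void record(AbstractCommand command) {   // during a modeling session: execute and store each interaction
        command.execute();
        commands.add(command);
    }
    void replay() {                          // during analysis: re-execute the stored commands step by step
        for (AbstractCommand command : commands) command.execute();
    }
}

In TDMS, the stored commands are persisted to CEP's central data storage during the session and restored from there for replay; the sketch only illustrates the execute/undo contract and the idea of re-executing a recorded sequence of commands.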
Process Model Verification and Execution
As discussed, the internal workflow engine of TDMS does not support the verification of declarative process models. However, it is known that the combination of constraints may lead to activities that cannot be executed [175]. In order to ensure that the process model is free from such dead activities, we make use of the verification provided by the Declare framework [177]. In particular, as illustrated in Figure 4.13, the process model is iteratively created in TDMS. For the purpose of verification, the process model is then converted into a format that can be read by the Declare framework. Similarly, this export mechanism can be used to execute the process model in the Declare framework's workflow engine.

4.6.3 Example
To demonstrate how TDMS can be used to drive the creation of a declarative business process model including test cases, we again turn to the example discussed in Section 4.5.3. Recall that the example shows how the proposed approach may be used by a DE and MB to create a process model and test cases describing the process of writing a publication (cf. Figures 4.15–4.17). Again, we make use of the following abbreviations: I (Come up with Idea), W (Write), S (Submit Paper).
Initially, TDMS starts up with an empty process model and an empty test case. Following the process described in Figure 4.6, the DE describes a general version of the business process: “When writing a publication, you need to have an idea. Then you write the publication and submit it.”. In the following, possibly with the help of the MB, the DE inserts activities I, W and S in the test case's execution trace (cf. Figure 4.15). The user interface of TDMS thereby allows the activities to be created in the test case editor (left hand side); respective activities are automatically created in the process model and laid out accordingly (right hand side). In addition, after each test–case–relevant user interaction, e.g., adding an activity to the test case's execution trace, TDMS automatically validates the test case against the process model. As TDMS has automatically created activities I, W and S, the test case passes.
Figure 4.15: Test case 1: <I, W, S> proposed by the DE
Subsequently, the DE and MB engage in a dialogue of questioning and answering [105]—the MB challenges the model: “So can I submit the paper several times?”. “You should submit the paper, but, at most once!”, the DE replies and adds: “And you should only have a single idea—otherwise the reader gets confused.”. To capture these requirements, the MB adapts the test case accordingly. In particular, the MB adds execution assertions to restrict the execution of I and S, specifying that I and S cannot be executed more than once. In addition, the MB adds a completion assertion to specify that the process cannot be completed until I and S have been executed, thereby requiring that I and S are executed at least once (cf. Figure 4.16). TDMS immediately validates the changes and reports that the test case fails, as I and S can be executed arbitrarily often. Since DE and MB know that the test case is valid, the process model has to be adapted to resolve this situation (cf. Figure 4.6). In particular, the MB adds cardinality constraints on I and S, making the test case pass, as shown in Figure 4.16.
Figure 4.16: Test case 1: introduction of cardinality on I and S
Again, the MB challenges the model and asks: “Is it possible to submit an idea without paper?”. The DE replies: “No, you always need a written document.” and together they specify a second test case that ensures that S cannot be executed without at least one execution of W before (cf. Figure 4.17). Again, TDMS immediately validates the test case and reports that it fails, as there are no constraints restricting the interplay of W and S. Also in this situation, DE and MB know that the test case is valid, hence the MB needs to adapt the process model. In particular, the MB slightly changes the layout of the process model and introduces a precedence constraint between W and S. TDMS, in turn, reacts to this change and validates the test case. As shown in Figure 4.17, test case and process model are now consistent, hence the test cases pass.
Figure 4.17: Test case 2: introduction of precedence between W and S
In this section, we described how test cases and process modeling—particularly their interwoven creation—are supported by TDMS. By means of examples, we showed how TDM can be adopted for driving a modeling session. In the following, we look into the application of TDM to investigate whether test cases are indeed an adequate measure for improving the creation, understanding and maintenance of declarative process models.

4.7 The Influence of TDM on Model Creation: A Case Study
To study the influence of TDM on the creation of declarative process models and to analyze the communication behavior between DE and MB, we apply TDM in modeling sessions. In particular, we study the application of TDM in a case study, i.e., we conduct “an empirical inquiry that investigates a contemporary phenomenon within its real–life context” [288]. More specifically, we conduct a case study in which a MB, who was trained in TDM, uses TDMS to capture business processes.
Starting with the definition of the research questions to be addressed, we describe the design of the case study in Section 4.7.1. Subsequently, Section 4.7.2 discusses the previously defined research questions in the light of the collected data. Finally, Section 4.7.3 presents potential limitations and revisits the findings for a discussion. The case study is organized along five research questions (RQ1 to RQ5), as described in the following. Please recall that TDM was developed to support the creation of declarative business process models, hence the research questions focus on declarative business process modeling. In particular, TDM assumes that test cases provide an additional communication channel between DE and MB (cf. Figure 4.5). To examine whether test cases are indeed as intuitive as expected and accessible to DEs, in RQ1 we investigate whether test cases are accepted by DEs as communication channel.

Research Question RQ1: Are test cases accepted as communication channel by DEs?

In TDM, the process model as well as test cases are available during modeling. Assuming that RQ1 can be answered positively, i.e., test cases are actually accepted and used as communication channel, it is still not clear whether test cases are actually better suited for communication than the process model itself. In particular, TDM claims that test cases are easier to understand for the DE and hence more likely to be used for communication than the process model. The goal of RQ2 is to find out whether test cases are indeed favored over the process model, as suggested by TDM.

Research Question RQ2: Are test cases favored over the process model as communication channel?

The ultimate goal of any design science artifact should be to improve over the state of the art [100]. With respect to the communication between DE and MB, TDM aims to improve over the state of the art by providing a common basis for discussion (cf. Section 4.5.3). The goal of RQ3 is to investigate whether the adoption of TDM in this sense positively influences the communication behavior.

Research Question RQ3: Do test cases help to foster the communication between DE and MB?

Furthermore, TDM claims that the specification of test cases can also be achieved by a DE. In other words, the user interface provided by TDMS must be designed so that it can be operated by a DE. The goal of RQ4 is therefore to assess whether DEs indeed think that they are capable of specifying test cases, i.e., whether DEs think that operating TDMS is easy.

Research Question RQ4: Do DEs think that operating TDMS is easy?

So far, RQ1 to RQ4 covered positive aspects of TDM. However, it must be assumed that the creation of test cases entails costs, i.e., the MB has to invest additional time in the creation of test cases. Thus, the goal of RQ5 is to investigate the additional costs implied by TDM.

Research Question RQ5: What is the overhead associated with the specification of test cases?

4.7.1 Definition and Planning of the Case Study
In the following, we describe the specification of the case study. In particular, we shortly describe the research methodology on which the design of the case study is based. Subsequently, we turn toward the design of the case study to show how RQ1 to RQ5 were operationalized.

Case Study Methodology
The modeling methodology, as proposed by TDM, can be seen as a collaborative approach. It assumes that the process model is created in an iterative process that requires intense communication between MB and DE.
To investigate the communication, we follow the CoPrA approach for the analysis of collaborative modeling sessions [223]. As illustrated in Figure 4.18, the CoPrA approach consists of three 62 4.7 The Influence of TDM on Model Creation: A Case Study phases. First, in the data collection phase, the research question and research design are fixed. Based on the design, data is collected, e.g., by recording communication protocols. In the context of this work, communication protocols refer to the conversation that takes place between DE and MB. To make the protocols amenable for analysis, transcription of the recorded audio files is necessary (cf. [64]). Second, in the data preparation phase, transcribed data is coded according to a coding schema, which was fixed during the data collection phase. This coding is required to tackle the problem of “attractive nuisance” [146], i.e., the enormous amounts of data produced in case studies. By breaking down communication protocols to codes, the complexity of data is reduced and becomes amenable for analysis. Third, in the last phase, characteristics such as the distribution of codes can be analyzed. The strength of the CoPrA approach arises from coding communication protocols in a format that supports the storage of temporal information and making the protocol amenable for analysis with third party tools. To this end, as shown in Figure 4.18, CoPrA makes use of Audittrail Entries, as defined in the well–established process mining tool PROM [258]. In particular, each Audittrail Entry contains an entry Workflow Model Element that refers to the code, whereas entry Timestamp stores the timestamp at which the code was identified and entry Originator refers to the person the code can be attributed to. In this sense, a transcript obtained in a modeling session can be represented by storing a list of Audittrail Entries. In this work, for instance, we analyzed how often the DE and MB referred to test cases and looked into the temporal distribution of codes. For a detailed description of the CoPrA approach, we refer the interested reader to [223]. Case Study Design Based on research questions RQ1 to RQ5 , the design of the case study was elaborated, resulting in a three–phased process; the population of this case study are all MBs and DEs that are working with declarative process models. In the first phase, demographic data, such as age, familiarity with computers and experience in process modeling are collected. TDM assumes that the DE knows the domain very well, but is not trained in process modeling. Hence, these assessments are required to ensure that the DEs participating in the case study comply with this profile. In the second phase, the modeling sessions take place. For half of the subjects, a MB trained in TDM leads the modeling session. For the other half of the subjects, the MB conducts the modeling session using a declarative modeling editor only. During the modeling session, three data capturing mechanisms are used. To capture communication, audio and video data is recorded (cf. RQ1 to RQ3 ). In addition, TDMS is employed to gather the created process models and test cases. The collected process 63 Data Collection Chapter 4 Test Driven Modeling E.g., Usage of test cases during communication between MB and DE. E.g., Are test cases favored over the process model? Research Design and Research Question Communication Logs Gather Data Data Preparation iterations Coding with Coding Schema propose_testcase ask_process_model clarify_domain ... 
Audittrail Entry Export of Process Information Workflow Model Element Timestamp Data Analysis Originator Computation of Information Figure 4.18: Analysis technique for collaboration processes, adapted from [223] models can then be compared to investigate the additional effort for creating test cases (cf. RQ5 ). To ensure that the results are not biased by unfamiliarity with the usage of TDMS, it is operated by the MB only. In the third phase, the Perceived Ease of Use scale of the Technology Acceptance Model (TAM) [47, 50] is presented to the DE in order to investigate RQ4 (TDMS is easy to use). To address research questions RQ1 to RQ3 , we developed a coding schema for coding the transcribed communication logs. In particular, we use a subset of the negotiation patterns [210] to describe the communication between DE, MB, test cases and process model. As summarized in Table 4.1, we differentiate between asking questions (ask ), answering questions (clarify), proposing changes (propose), expressing consent to a proposal (support) and modeling (model ). Orthogonally, we distinguish whether MB and DE refer to the process model when talking (pro- 64 4.7 The Influence of TDM on Model Creation: A Case Study cess model ), refer to a test case when talking (test case) or just talk freely without referring to the process model or a test case (domain). To assess whether DEs take into account formal properties of the process model, we added code ask notation to code situations where the DE asks about the modeling notation. For answering questions about the modeling notation, we use clarify notation. Category: Ask Code Action Example ask domain (p,q) Person p states a question q without referring to a test case or the process model. “So you need to connect the mast with the sailing ship?” Code Action Example ask notation (p,q) Person p states a question q regarding the notation. “Is this a precondition?” Code Action Example ask process model (p,q) Person p states a question q and refers to the process model. “This one too?” (points at process model) Code Action Example ask test case (p,q) Person p states a question q and refers to a test case. “Then we start here.” (points at test case) Category: Clarify Code Action Example clarify domain (p,q,a) Person p gives answer a to question q without referring to a test case or the process model. “Yes, at least once and at most ten times.” Code Action Example clarify notation (p,q,a) Person p gives answer a to question q regarding the notation. “This arrow indicates a precondition.” Code Action Example clarify process model (p,q,a) Person p gives answer a to question q and refers to the process model. “Here, we are currently. . . ” (points at process model) Code Action Example clarify test case (p,q,a) Person p gives answer a to question q and refers to a test case. “This happens up here, just before. . . ” (points at test case) 65 Chapter 4 Test Driven Modeling Category: Propose/Support Code Action Example propose domain (p,pr) Person p makes proposal pr without referring to a test case or the process model. “Then, I have to contact the agency.” Code Action Example propose process model (p,pr) Person p makes proposal pr and refers to the process model. “Here, you have to unwrap the mainsail.” (points to process model) Code Action Example propose test case (p,pr) Person p makes proposal pr and refers to a test case. “Then, here, we went on with. . . ” (points to test case) Code Action Example support (p,pr) Person p expresses consent to proposal pr. 
“Yes, right.” Category: Model Code Action Example model process model (p) Person p adapts the process model. “I just need to draw some lines here...” (adapts process model) Code Action Example model test case (p) Person p adapts a test case. “So I drop this activity there...” (adapts test case) Table 4.1: Coding schema
For the operationalization of this setup, we rely on the capabilities of TDMS. As TDMS is implemented as an experimental workflow activity of CEP (cf. Section 3.3), it can be seamlessly integrated in the experimental workflow, i.e., CEP guides MB and DE through the modeling sessions. Data is collected automatically, ensuring that each modeling session, together with the collected demographic data and the TAM survey, is stored as a separate case of the case study.

4.7.2 Performing the Case Study
In the following, we describe how the case study was performed and investigate research questions RQ1 to RQ5. Besides the elaboration of the case study design, the preparatory phase included the configuration of CEP, the acquisition of appropriate devices for capturing audio and video, and the training of the MB in TDM.

                                    Min.  Max.  M     SD
Familiarity with computers           3     5    4.38  0.70
Familiarity with domain              3     5    4.75  0.66
Familiarity with BPM                 1     2    1.13  0.33
Familiarity with declarative BPM     1     2    1.13  0.33
Table 4.2: Demographic data

To find potential DEs, we asked colleagues, friends and relatives from whom we knew that they had no experience in Business Process Management (BPM). Before the case study was started, a small pilot study was conducted to ensure that the collected data is amenable to the envisioned analysis. After minor adaptations (e.g., other software for video capturing), the case study was started. All in all, eight DEs participated in the study. Each of them was asked to describe a process from a domain they were familiar with. In four modeling sessions, the TDM methodology was adopted, i.e., in addition to the process model, test cases were developed, validated and discussed. In the other four modeling sessions, the MB used a declarative process modeling editor only; no test cases were provided (the experimental material as well as the collected data can be downloaded from http://bpm.q-e.at/experiment/ImpactOnCommunication). To ensure that the TDM methodology was properly adopted and results were not influenced by lacking familiarity with TDMS or declarative process models, the MB underwent intensive training in declarative process modeling, applying TDM and using TDMS. During the modeling sessions, TDMS was operated by the MB only, to prevent the DEs’ different levels of tool knowledge from influencing the results. Each of these modeling sessions lasted between 19 and 32 minutes; in total, 3 hours and 32 minutes of modeling were captured. Due to the nature of the case study, i.e., one–on–one sessions and the adoption of communication protocols, an entirely anonymous data collection was not possible. However, subjects were informed that all analyses are reported in a confidentiality–preserving way. After the modeling sessions were finished, we validated whether the participating DEs actually met the targeted profile, i.e., they had to be experts in their domain, but should not be familiar with BPM. We used a 5–point rating scale to test for familiarity with computers in general, the domain, BPM as well as declarative BPM.
The scale ranged from Disagree (1) over Neutral (3) to Agree (5) (the study was conducted with native German speakers, hence the scale names were translated to avoid misunderstandings; the original scale names were stimmt nicht, stimmt wenig, stimmt mittelmäßig, stimmt ziemlich and stimmt sehr). As summarized in Table 4.2, the case study's subjects fit the targeted profile of a DE. In particular, familiarity with the domain was high (M = 4.75, SD = 0.66), while familiarity with BPM in general (M = 1.13, SD = 0.33) and declarative BPM (M = 1.13, SD = 0.33) was low. The involved domains were quite diverse and included, e.g., tax auditing, the creation of class schedules, sailing and the renovation of buildings. In addition, we assessed the age of the participants. Two DEs were between 18 and 29, one DE between 30 and 45, two DEs between 45 and 60 and three DEs were over 60. Finally, we also assessed the familiarity with computers. As summarized in Table 4.2, all participants indicated strong accordance, ruling out that subjects were unfamiliar with computers.
In the following, the recorded communication protocols of all sessions were transcribed, resulting in a text document with 26,299 words uttered in 1,267 statements (we are indebted to Cornelia Haisjackl for supporting the transcription of the protocols). Subsequently, the codes from Table 4.1 were used to code the transcripts. Due to personnel limitations, the coding process was carried out by a single person, i.e., the author, only. As indicated in Table 4.1, not all codes could be identified by looking at the transcripts only. For instance, for properly identifying code ask process model, the coder has to know whether the person referred to the process model. For this reason, the video files of the modeling sessions were also taken into account while coding. To get an overview of the coding, a summary is listed in Table 4.3.

Code                     TDM   Declare
ask domain                45    109
ask notation               0      1
ask process model          5      9
ask test case             34      0
clarify domain            63    110
clarify notation           0      3
clarify process model      4     18
clarify test case         34      0
propose domain           113     95
propose process model      0     30
propose test case         26      0
support                   57     48
model process model       65    170
model test case           61      0
Total                    507    593
Table 4.3: Total codes used

In general, it can be said that the total number of codes is similar, i.e., 507 for TDM and 593 for Declare modeling. However, the distribution of codes shows differences. Unsurprisingly, codes that refer to test cases were not used for the transcripts of the Declare modeling sessions. Interestingly enough, less asking and clarifying occurred in the TDM sessions (84 versus 119 ask, 101 versus 131 clarify). Please note that the number of ask statements does not necessarily coincide with the number of clarify statements. In fact, several clarify statements may follow an ask statement; for instance, in some cases a question was picked up later and clarified in more depth. A further interesting relation can be found between the number of times questions about the notation were posed (ask notation) and the number of times statements referred to the process model: 66 statements referred to the process model, but only once did a DE utter a question about the notation.
Knowing that the participating DEs did not have any experience in process modeling (cf. Table 4.2), this finding is surprising. In the following, we will first discuss the research questions in the light of the collected data and then pick one particular case to illustrate a typical modeling session. RQ1 : Are Test Cases Accepted as Communication Channel by DEs? One of the basic claims of TDM is that test cases provide an additional communication channel between DE and MB by providing a common discussion basis. To investigate this claim, in particular to verify whether test cases are indeed adopted for communication, we counted how often MB and DE referred to a test case during the modeling session. Only those statements that unmistakably referred to a test case were counted. For the identification of such statements two criteria were used. If the test case was explicitly mentioned, e.g., “So, now we are talking about the positive case”, we counted the statement as test case related. If this was not the case and the transcript did not reveal whether the discussion was revolving around a test case, we consulted the video. Thereby, we checked whether the person pointed at a test case. If this was the case, the statement was considered to be referring to a test case. Considering ask, in total 84 statements were uttered during TDM modeling sessions, 34 (40%) of them referred to a test case. Regarding clarify, in total 101 69 Chapter 4 Test Driven Modeling statements were found, whereby 34 (34%) statements referred to a test case. Finally, regarding propose, a total of 139 statements were found, of which 26 (19%) referred to a test case. All in all, it can be said that for ask and clarify, a fair share of the communication was using test cases. Concerning propose, the proportion was clearly lower. In the following, we would like to provide an explanation for this phenomenon. During the coding process, we found that DEs preferred to talk freely about their domain. In other words, their statements were not well–structured and included aspects that could not be captured in the process model. For instance, DEs reported from personal experiences, such as how a certain procedure evolved and improved over time: “Well this took as quite some time, but now, we are pretty fast!”. For the DE, this information seemed relevant, however, it is not possible to capture it in the form of a declarative process model. Put differently, it was not clear a–priori to the DE which information was relevant for the modeling sessions, which is not surprising, as none of the participants was familiar with BPM (cf. Table 4.2). Speaking in terms of Figure 4.5, the DE requires the MB to filter, abstract and translate knowledge in a form that can be modeled as test case or process model. Hence, it appears as if test cases cannot replace the abstraction skills of the MB. Still, 40% of the ask statements and 33% clarify statements referred to test cases. Consequently, we argue that test cases are apparently able to provide an additional communication channel, preferably for asking questions and clarifying. In other words, we can positively answer RQ1 : test cases do provide an additional communication channel. Please note that the observation that test cases cannot replace the abstraction skills of the MB does not contradict TDM. Rather, test cases should be modeled by DE and MB together. Also, test cases aim for improving the communication, but are not intended to replace the MB. 
RQ2 : Are Test Cases Favored over the Process Model as Communication Channel? Even if test cases provide an additional communication channel, it is not clear yet whether it improves over the process model as communication channel. TDM claims that test cases are easier to understand for DEs, hence DEs presumably favor test cases over process models for communication. To investigate this claim, we look at the collected data from two perspectives. First, we look at the ratio of test case based communication versus process model based communication in TDM modeling sessions. Then, we will compare the communication behavior of TDM and Declare modeling. Regarding ask, in TDM modeling sessions 84 statements were uttered. 45 (54%) were classified as general statements, 5 (6%) referred to the process model, while 34 70 4.7 The Influence of TDM on Model Creation: A Case Study (40%) referred to a test case. A similar situation can be found for clarify. Here, 101 statements were transcribed, 63 (62%) were classified as general, 4 (4%) referred to the process model, while 34 (34%) referred to a test case. Finally, for propose, from a total of 139 statements, 113 (81%) were classified as general, 0 (0%) referred to the process model, whereas 26 (19%) referred to a test case. Interestingly, for any of these code categories a similar pattern can be found. Most communication happens in a general form (54% to 81%), without referring to the process model or a test case. In addition, a noticeable share of the communication involves test cases (19% to 40%), while almost no communication is conducted with respect to the process model (0% to 6%). It is important to note that TDM provides test cases as well as the process model side–by–side. Hence, even though the idea is to focus on test cases for communication, such behavior cannot be enforced, i.e., the MB cannot forbid the DE to talk about the process model. Given this freedom of choice, we conclude that DEs favor test cases over the process model for communication. In the following, we complement these findings by comparing TDM with Declare modeling. With respect to ask, 5 (6%) statements referred to the process model in TDM, 9 (8%) in Declare modeling. Regarding clarify, 4 (4%) referred to the process model in TDM, 18 (14%) in Declare modeling. Finally, for propose, 0 (0%) statements in TDM referred to the process model, in Declare modeling 30 (24%) could be found. Apparently, supplying test cases seems to distract attention from the process model, further providing empirical evidence that test cases are favored over process models. Hence, RQ2 can be answered positively: the collected data suggests that test cases are favored over the process model in communication. RQ3 : Do Test Cases Help to Foster the Communication between DE and MB? To answer RQ3 , we start by defining what fostering communication means and how it can be measured. In particular, we subscribe to the view of [105] and view modeling as a “dialogue of questioning and answering”. Indeed, we could exactly observe this behavior while coding the communication protocols. Initially, the DE proposes a statement, e.g., by posing a general statement about the domain. If the statement is clear enough, the MB is able to directly reflect it in a test case or the process model. Otherwise, the MB has to ask the DE for clarification until the statement is clear enough. Hence, we assume that the less ask statements and, in turn, clarify statements are required, the more efficient the communication. 
To assess whether the adoption of TDM indeed fosters communication, we counted the number of ask, clarify and propose statements. In TDM sessions, in total 84 ask statements were uttered, in Declare modeling sessions 119. Similarly, in TDM sessions 101 clarify statements could be observed, in Declare modeling sessions 131 statements. These results suggest that fewer ask and clarify statements were observed in TDM modeling sessions. Still, the difference could be attributed to a different total number of statements. In particular, modeling sessions lasted between 19 and 32 minutes, potentially leading to a different number of total statements. To cancel out this influence, we also counted the number of propose statements and computed the relative occurrence of ask and clarify with respect to propose statements. In TDM modeling sessions, in total 139 propose statements were found, in Declare modeling sessions 125. Hence, in TDM modeling sessions, 0.60 ask statements were uttered per propose statement. In Declare modeling, this value increased to 0.95 ask statements per propose statement. Similarly, 0.73 clarify statements were found per propose statement in TDM sessions, 1.05 in Declare modeling. All in all, both the total number of ask and clarify statements and the relative number of ask and clarify statements per propose statement were lower in TDM sessions. Hence, we conclude that RQ3 can also be answered positively: the adoption of TDM appears to have a positive influence on communication by making communication more precise. We would like to add at this point that—even though we consider this argumentation plausible—one might object that the discussion was less detailed rather than more effective. Given the data at hand, we cannot entirely rule out this alternative explanation.

RQ4: Do DEs Think that Operating TDMS is Easy?
Research question RQ4 is partly related to TDM and partly related to TDMS. TDM claims that test cases are easier to understand than a process model. Similarly, TDMS claims that its graphical user interface for test cases is easy to use. To assess whether DEs would agree with these statements, we administered the Perceived Ease of Use questionnaire of the Technology Acceptance Model (TAM) [47, 50] to the participating DEs after the modeling session. The Perceived Ease of Use scale consists of six 7–point Likert items, ranging from Extremely Likely (1) over Neither Likely nor Unlikely (4) to Extremely Unlikely (7). On average, the DEs responded with 1.9, which approximately corresponds to Quite Likely (2). Hence, we conclude that the participating DEs find it quite likely that it would be easy to learn and operate TDMS.

RQ5: What is the Overhead Associated with the Specification of Test Cases?
Even though research questions RQ1 to RQ4 indicate positive effects of adopting TDM, it seems inevitable that the specification of test cases implies additional effort. To estimate the resulting overhead, we collected the declarative process models created in this study and analyzed their size. In particular, as summarized in Table 4.4, we counted the number of activities and constraints for TDM and Declare modeling sessions.

                         Total   Per Model          Per Minute
Activities   TDM           74    18.50  (87%)       2.84  (91%)
             Declare       85    21.25  (100%)      3.12  (100%)
Constraints  TDM           92    23.00  (81%)       3.54  (84%)
             Declare      114    28.50  (100%)      4.19  (100%)
Table 4.4: Process model metrics
The numbers clearly indicate that fewer activities (74 versus 85) and fewer constraints (92 versus 114) were created in TDM modeling sessions. To compensate for the varying duration of modeling sessions, we also counted the number of activities and constraints added per minute. Even though the difference between TDM and Declare becomes smaller, the numbers clearly indicate that the creation of test cases involves a considerable overhead. In particular, in Declare modeling sessions 3.12 activities were added per minute, while 2.84 activities were added per minute in TDM sessions, i.e., indicating a drop from 100% to 91%. Considering constraints, 4.19 constraints were added per minute in Declare sessions, while 3.54 constraints were added per minute in TDM sessions, i.e., dropping from 100% to 84%. Knowing that in each TDM session on average 2 test cases with 34.5 activities (total: 8 test cases, 138 activities) were created, it seems plausible that a considerable overhead was involved in the specification of test cases. Nonetheless, we would like to stress that this additional effort apparently helps to improve communication and presumably provides support for process model maintenance.

TDM in Action: Example of a Modeling Session

So far, we have established that test cases are accepted as communication medium (RQ1), are favored over the process model for communication (RQ2), improve communication efficiency (RQ3), are easy to understand when visualized through TDMS (RQ4), but also imply additional costs (RQ5). In the following, we complement these findings with insights from a typical TDM modeling session. In particular, we take a more fine–grained and dynamic point of view and look into how the process of process modeling evolved during the modeling session (cf. [192]). To this end, we partition the modeling session into time slots and analyze the occurrence of codes per time slot. We put 10 codes in each time slot of the modeling session, as we identified 10 codes per slot as a meaningful level of granularity.

Consider, for instance, the diagram illustrated in Figure 4.19. On the x–axis, the modeling session's time slots are listed. In this particular case, the modeling session was divided into 16 time slots. Then, for each time slot the occurrences of the codes model test case and model process model were counted, as shown on the y–axis. Please note that, even though each time slot includes 10 statements, the numbers of model test case statements and model process model statements in each time slot do not add up to 10. This is due to the fact that other statements that are not shown in this diagram, such as ask domain, also occur in modeling sessions.

Figure 4.19: Modeling of test cases versus modeling of process model (number of model test case and model process model codes per time slot, time slots 1 to 16)
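How the codes of a session can be partitioned into time slots of 10 codes each and tallied per slot, as underlying Figure 4.19 and Figure 4.20, may be sketched as follows; the code names and the example sequence are illustrative only.

```python
def codes_per_slot(codes, slot_size=10,
                   codes_of_interest=("model_test_case", "model_process_model")):
    """Partition a sequence of codes into consecutive slots of slot_size
    codes and count the codes of interest in each slot."""
    slots = [codes[i:i + slot_size] for i in range(0, len(codes), slot_size)]
    return [
        {code: slot.count(code) for code in codes_of_interest}
        for slot in slots
    ]

# Illustrative coded session fragment (two time slots of 10 codes each)
session_codes = (["ask_domain", "model_test_case", "model_test_case", "propose_domain",
                  "model_test_case", "clarify_domain", "model_process_model", "ask_domain",
                  "model_test_case", "propose_domain"] +
                 ["model_process_model"] * 4 + ["propose_domain"] * 6)
for slot_number, counts in enumerate(codes_per_slot(session_codes), start=1):
    print(slot_number, counts)
# 1 {'model_test_case': 4, 'model_process_model': 1}
# 2 {'model_test_case': 0, 'model_process_model': 4}
```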
Knowing the semantics of the diagram, the following observations can be made from Figure 4.19. First, the creation of test cases preceded the creation of the process model, indicating that TDM was indeed adopted for modeling. By looking at the created process model and the communication transcripts, we could observe that in this case the DE described a tax inspection. The first test case peak from time slot 2 to time slot 4 related to the description of a positive case, i.e., the case when no additional tax payment had to be performed. Next, from time slot 5 to time slot 7, the MB consolidated the test case with the process model, hence mainly the process model was adapted. From time slot 8 to time slot 9, no modeling activities could be observed. Rather, the DE and MB elaborated on the business process in general. In time slot 10, we could see from the communication protocols that the MB and DE revisited and further refined the first test case. Subsequently, in time slot 11, a new test case was created—the negative case, where additional tax payments had to be performed. Necessary adaptations to the process model were conducted in time slot 12. The final peak of test case modeling from time slot 13 to time slot 16 related to a further refinement of the negative test case. Apparently, no more adaptations to the process model were required, as no more process modeling activity could be found.

Even though this analysis shows that TDM is feasible and test cases can be used to drive the creation of declarative process models, it does not provide insights into the communication behavior. Therefore, we also analyzed the communication logs with respect to the communication channel used. In particular, as shown in Figure 4.20, we differentiate between general conversations about the domain, conversations that refer to a test case and communication that refers to the process model.

Figure 4.20: Utilized communication channels (number of domain, test case and process model codes per time slot, time slots 1 to 16)

Unsurprisingly, as discussed in RQ2, most communication is kept on a general level. A fair amount of communication refers to test cases, while almost no communication refers to the process model. Interestingly, however, the distribution of domain communication and test case communication appears to change as the modeling evolves. Starting with a high amount of domain communication, i.e., talking in general, communication shifts toward the usage of test cases. To quantify this observation, we counted the number of codes referring to talking freely, i.e., codes * domain, and all codes referring to test cases, i.e., codes * test case, in the first and second half of the modeling session. Thereby, we could observe that the number of codes associated with talking freely decreased from 51 statements to 25 statements, while the number of codes referring to test cases increased from 10 statements to 32 statements. Knowing that the DEs in our study did not have any prior experience in business process modeling, it seems likely that DEs first had to get used to the notion of test cases. As this happened within half of a modeling session, we conclude that test cases are indeed an intuitive instrument for specifying process models.

4.7.3 Limitations and Discussion

Even though the investigation seems quite promising, the presented results are to be viewed in the light of limitations. First and foremost, the sample size—even though not untypical for a case study [61]—is a threat to the generalization of results. In other words, although the DEs in our study accepted and favored test cases as communication channel, it cannot be guaranteed that this holds for every DE. Similarly, we made use of a convenience sample, i.e., we used colleagues, friends and relatives as DEs.
To avoid a potential bias, we did not inform DEs about the goal of the study and only informed them about necessary details, such as the tasks to be performed. However, the adopted selection strategy must be considered a potential bias. Furthermore, the modeling sessions were rather short; on average a modeling session lasted 26 minutes and 30 seconds. In addition, the reported numbers are of descriptive nature, as due to the low sample size inferential statistics could not be applied. Finally, due to personnel limitations, the coding was performed by a single person only, i.e., the author. This, in turn, must be expected to negatively influence the accuracy and reliability of the coding [119].

Still, the findings of the case study seem quite positive so far. In particular, it seems as if test cases are accepted as communication channel (RQ1) and are favored over the process model for communication (RQ2). Furthermore, test cases decreased the number of ask and clarify statements per propose statement (RQ3). In addition, DEs indicated that they think learning to operate TDMS seems to be easy (RQ4), but the adoption of TDM requires moderate overhead for the creation of test cases (RQ5). In the following, we underpin these findings with insights gained during data analysis.

One of the central insights of the modeling sessions concerns the way DEs structured their information. In general, DEs seemed to prefer talking about their domain in the form of sequential actions, seen from their perspective. Indeed, words indicating sequential information, such as “then”, “now” or “afterwards”, were used 837 times. Interestingly enough, this behavior could be observed across all modeling sessions. As this behavior occurred in TDM and Declare modeling sessions alike, we assume that it is the intuitive way for a DE to talk about a domain. If this is indeed the case, it would explain why test cases were well accepted: test cases provide such a sequential, instance–based view on a declarative process model. Contrariwise, a declarative process model rather provides circumstantial information. Similar observations were made in an empirical investigation examining how MBs make sense of declarative process models [97]. In particular, the study showed that MBs seem to prefer a sequential way of reading a declarative process model. Against this background, it seems natural that DEs prefer test cases for communication.

Finally, we would like to come back to the assumption that DEs are normally not able to read formal process models (cf. [107]). In particular, in 66 cases, either the DE or the MB referred to the process model during the modeling session, indicating that DEs might have had formal process modeling knowledge. To clarify whether this was indeed the case, we looked into all statements that referred to the process model. The analysis revealed that all but one DE referred to activities, but ignored the constraints. Hence, it appears that DEs are indeed normally not able to grasp the formal parts of a process model, but may access superficial information, such as activity names. Backup for this theory is also provided by the fact that none of the DEs had experience in declarative process modeling (cf. Table 4.2) and only once did a DE utter a question about the notation (cf. Table 4.3).
In other words, it seems implausible that DEs were able to extract more information from the process model than activity names.

4.8 The Influence of TDM on Model Maintenance: Experiments

So far, we have empirically investigated the influence of TDM on the creation of declarative process models, particularly focusing on the communication between DE and MB. In the following, we turn toward the maintenance of declarative process models, i.e., the evolution of process models due to, e.g., external factors like the introduction of new laws or changes in customer attitude. Methodologically, we rely on controlled experiments, i.e., we conduct modeling sessions in the laboratory under controlled conditions [13]. In particular, this investigation consists of two controlled experiments. Based upon the experimental design described in Section 4.8.1, the first experiment, subsequently referred to as E1, is conducted with a rather small sample size (cf. Section 4.8.2). Insights from this experiment are then used for a replication, in the following referred to as R1, with a larger sample size (cf. Section 4.8.3).

4.8.1 Experimental Definition and Planning

The goal of this investigation is to provide empirical evidence for the positive influence of test cases on the maintenance of declarative process models. This section introduces the research questions and hypotheses, describes the subjects, objects, factors, factor levels and response variables required for our experiment and presents the instrumentation and data collection procedure as well as the experimental design.

Research Questions and Hypotheses

The hypotheses tested in this investigation are directly derived from the theoretical considerations presented in Section 4.4 and Section 4.5. In particular, we have argued that declarative process models lack computational offloading for the computation of execution traces. In addition, declarative process models rather provide circumstantial than sequential information. The extraction of sequential information, in turn, was shown to be a hard mental operation. As, however, the validation of process models presumably requires the MB to mentally execute the process model, a significant impact on the mental effort can be expected. TDM and test cases in particular provide computer–based support for the computation of execution traces. Therefore, we expect that the mental effort required for adapting process models significantly decreases when test cases are provided. This claim, in turn, directly leads to research question RQ6 and associated hypothesis H1.

Footnote 7: To unambiguously refer to research questions, we assigned consecutive numbers to our research questions. RQ1 to RQ5 were already used in Section 4.7, hence we continue here with research question 6.

Research Question RQ6 Does the adoption of test cases lower the mental effort on the process modeler conducting the change?

Hypothesis H1 The adoption of test cases significantly lowers the mental effort on the process modeler conducting the change.

Tightly connected to the mental effort required for adapting declarative process models is the way the adaptations are perceived, i.e., the perceived quality of the changes. In particular, it is assumed that the human mind works best if it is engaged at a level appropriate to one's capacities [161], i.e., the required mental effort is not too high.
Likewise, by avoiding hard mental operations and not overloading the working memory, MBs are presumably more confident about the conducted changes. Indeed, similar results could be found in the domain of software engineering, where experiments showed that having test cases at hand can improve perceived quality [133]. Similarly, we expect test cases to improve the perceived quality of the changes applied to declarative process models, as stated in research question RQ7 and its associated hypothesis H2:

Research Question RQ7 Does the adoption of test cases improve the perceived quality of the conducted changes?

Hypothesis H2 The adoption of test cases significantly improves the perceived quality of the conducted changes.

Finally, the availability of test cases should also improve the quality of the conducted changes, since test cases provide an automated way of validating the process model. In addition, as it is known that errors are more likely to occur when the working memory's capacity is overloaded [236], we argue that a reduction of mental effort by providing test cases ultimately leads to lower error rates, i.e., a higher quality of the evolution. In particular, we expect a positive influence on the quality of process models (the operationalization of quality will be explained subsequently), as stated in research question RQ8 and the associated hypothesis H3:

Research Question RQ8 Does the adoption of test cases improve the quality of changes conducted during maintenance?

Hypothesis H3 The adoption of test cases significantly improves the quality of changes conducted during maintenance.

Subjects

The population under examination are process modelers who create and maintain declarative process models. Therefore, targeted subjects should be at least moderately familiar with BPM and declarative process modeling notation (Declare in particular). We are not targeting modelers who are not familiar with declarative process models at all, since we expect that their unfamiliarity blurs the effect of adopting test cases, as errors may be traced back to lack of knowledge rather than the complexity of the change task.

Factor and Factor Levels

Our experiment's factor is the adoption of test cases, i.e., whether test cases are provided while conducting the changes to the process model or not. Thus, we define the factor to be adoption of test cases with factor levels test cases and absence of test cases.

Objects

The objects of our study are two change assignments, each one performed on a different declarative process model. Please recall that this section describes the general design of this study. Hence, aspects specific to experiment E1 and replication R1, such as the exact models that were used, are reported in Section 4.8.2 and Section 4.8.3, respectively. The process models and change assignments were designed carefully to reach a level of complexity that goes well beyond the complexity of a “toy example”. To cancel out the influence of domain knowledge [120], we labeled the models' activities by letters (e.g., A to H). Furthermore, to counter–steer potential confusion by an abundance of different modeling elements, no more than eight distinct constraints were used per model. By providing test support, the factor level adoption of test cases was operationalized. Likewise, by not providing test support, the factor level absence of test cases was operationalized.
Finally, we performed a pilot study to ensure that the process models and change assignments are of appropriate complexity and are not misleading.

The change assignments consist of a list of requirements, so–called invariants, that hold for the initial model and must not be violated by the changes conducted. In addition, it must be determined whether the change to be modeled is consistent with the invariants. If this is the case, the change has to be performed while ensuring that all invariants are preserved. If a change assignment is identified to be inconsistent, a short explanation of the inconsistencies must be provided and the change must not be applied.

An example of a change assignment is illustrated in Figure 4.21 (1). Assume an invariant that C cannot be executed until A has been executed. Further assume a change assignment to remove the precedence constraint between A and B. The invariant is valid for this model, as C requires B to be executed before and B requires A to be executed before—thus C cannot be executed before A has been executed. The change is consistent, as it does not contradict the invariant. However, removing the precedence constraint between A and B is not enough. In addition, a new precedence constraint between A and C has to be introduced to satisfy the invariant, resulting in the process model shown in Figure 4.21 (2).

Figure 4.21: Example of a change assignment [301] (invariant: C preceded by A; change task: remove the precedence constraint between A and B)

Response Variables

To test the hypotheses, we define the following response variables: mental effort (H1), perceived quality (H2) and quality of the conducted changes (H3). To measure mental effort, we employ a 7–point rating scale, asking subjects to rate the mental effort expended for conducting the change tasks from Very low (1) over Medium (4) to Very high (7). As detailed in Section 3.2.3, employing rating scales for measuring mental effort was shown to be reliable and is widely adopted. To assess perceived quality, we ask subjects to self–rate the quality, i.e., correctness, of the conducted changes. The measurement of quality is derived from the change assignments (cf. paragraph Objects). In particular, we define quality to be the sum of preserved (non–violated) invariants, the number of correctly identified inconsistencies and the number of properly performed changes, i.e., we measure whether the new requirements were modeled appropriately. To illustrate this notion of quality, consider the process model shown in Figure 4.21 (1). The modeler must 1) determine that the change is consistent, 2) remove the precedence constraint between A and B to fulfill the change assignment, and 3) introduce a new precedence constraint between A and C to satisfy the invariant—for each subtask one point can be achieved, i.e., at most 3 points per change assignment. A minimal sketch of how invariants of this kind can be checked against an execution trace is given below.
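The following minimal sketch illustrates, under simplifying assumptions, how a precedence invariant such as “C preceded by A” can be checked against an execution trace and how the three-point score per change assignment could be computed. The function names and example data are illustrative; the actual test case semantics of TDMS are richer than this sketch.

```python
def precedence_holds(trace, before, after):
    """True if every occurrence of 'after' in the trace is preceded
    by at least one earlier occurrence of 'before' (Declare precedence)."""
    seen_before = False
    for activity in trace:
        if activity == before:
            seen_before = True
        elif activity == after and not seen_before:
            return False
    return True

def score_change_assignment(consistency_correct, invariant_preserved, change_performed):
    """One point per subtask: consistency decision, invariant, change (max 3)."""
    return sum([consistency_correct, invariant_preserved, change_performed])

# Illustrative traces for the invariant "C preceded by A" from Figure 4.21
print(precedence_holds(["A", "B", "C"], "A", "C"))  # True, invariant satisfied
print(precedence_holds(["B", "C", "A"], "A", "C"))  # False, C occurs before A
print(score_change_assignment(True, True, False))   # 2 out of 3 points
```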
Experimental Design

The experimental design employed in this study is based on the guidelines for designing experiments from [286]. Following these guidelines, a randomized balanced single factor experiment with repeated measurements is conducted. The experiment is called randomized, since subjects are assigned to groups randomly. We denote the experiment as balanced, as each factor level (i.e., the adoption of test cases and the absence of test cases) is applied to the same number of subjects. As only a single factor is manipulated (i.e., the adoption of test cases), the design is called single factor.

To operationalize this setup, we employ the experimental workflow capability of CEP (cf. Section 3.3). In particular, as shown in Figure 4.22, the workflow starts by asking the subject to enter a code, which is printed on the assignment sheet given to the subject. In this way, the experimenters can steer how many subjects take the upper branch and how many take the lower branch of the experimental workflow, thereby achieving repeated measurement. Likewise, by randomly distributing the assignment sheets, randomization is achieved. After entering a valid code, i.e., 3482 or 7198, the subject is prompted to answer a demographical survey in order to assess the subject's background. In addition, subjects are informed that participation is non–mandatory as well as anonymous, so that neither personal information is stored nor can answers be traced back to subjects. Then, subjects are asked to adapt two process models (M1 and M2) according to a given list of requirements. As shown in Figure 4.22, 10 change tasks have to be performed for each model. In case the subject enters code 3482, M1 is presented with test support, while M2 has to be adapted without test support. Contrariwise, for code 7198, M1 has to be adapted without test support, while test support is available for M2. Subsequently, regardless of the code, a concluding survey assessing mental effort and perceived quality is administered.

Figure 4.22: Experimental design (code entry, demographic survey, two modeling tasks in counterbalanced order, concluding survey)

Instrumentation and Data Collection Procedure

Similar to the case study described in Section 4.7, we rely on CEP for non–intrusive data collection. This, as detailed in Section 4.6, allows to investigate the maintenance tasks in detail by replaying the logged commands step–by–step. In addition, CEP provides the bare essentials for conducting the change tasks, allowing subjects to quickly get familiar with the modeling tool. Likewise, it can be ensured that subjects are not distracted by non–task–relevant tool features. Finally, as described in Section 3.3, CEP supports the automated collection of data, thereby simplifying the execution of the experiment.

4.8.2 Performing the Experiment (E1)

Based upon the experimental setup described in Section 4.8.1, the first controlled experiment (E1) was conducted. In the following, we cover aspects regarding the operation of the experiment, followed by data analysis.

Experimental Operation of E1

Experimental Preparation

Preparation for controlled experiment E1 can be divided into the preparation of experimental material as well as the acquisition and training of subjects. As described in Section 4.8.1, the process models used in this study should have a reasonable complexity; likewise, the change tasks should not be trivial. In particular, two models with 7 activities each were prepared for the experiment. In M1, 7 constraints were used, whereas M2 comprised 8 constraints. Please note that these models, even though a seemingly low number of activities and constraints was used, can pose a significant challenge in understanding. In particular, it was observed that already the combination of two different constraints can be a challenging task [97].
Hence, we argue that M1 and M2 go well beyond the size of trivial process models. To ensure that the assignments were clearly formulated, we conducted a pilot study before the actual experiment to screen the assignments for potential problems.

Footnote 8: The material used for experiment E1 can be downloaded from: http://bpm.q-e.at/experiment/ImpactOnMaintenance

Regarding the training and acquisition of subjects, we could rely on a lecture on BPM, which was held at the University of Innsbruck. In particular, it was possible to adapt the lectures such that students were given lectures and assignments on declarative business process modeling before the experiment. More specifically, a lecture on declarative process models was held two weeks before the experiment. In addition, students had to work on several modeling assignments using declarative processes before the experiment took place. One week before the experiment, the concept of test cases and their usage was demonstrated.

Experimental Execution

Controlled experiment E1 was conducted in December 2010 at the University of Innsbruck in the course of a lecture on BPM with 12 participants. Immediately before the experiment, a short lecture revisiting the most important concepts of TDM and the experimental setup was held. The rest of the experiment was guided by CEP's experimental workflow engine (cf. Section 3.3), leading students through an initial questionnaire, two modeling tasks (one with the support of test cases and one without the support of test cases), a concluding questionnaire and a feedback questionnaire, cf. Figure 4.22. The experiment was concluded with a discussion to exchange students' experiences and to revisit the experiment's key aspects.

Data Analysis of E1

So far, we focused on the setup and execution of controlled experiment E1. In the following, we describe the analysis and interpretation of the data.

Data Validation

Due to the relatively small number of subjects, we were able to constantly monitor for potential problems or misunderstandings and to immediately resolve them. For this reason and owing to CEP's experimental workflow engine, all subjects were guided successfully through the experiment and no data set had to be discarded because the experimental setup had been disobeyed. In addition, we screened the subjects for familiarity with Declare [175] (the declarative process modeling language we used in our models), since our research setup requires subjects to be at least moderately familiar with Declare. As summarized in Table 4.5, subjects felt competent in using and understanding Declare, even though they had just recently started using Declare. In particular, the mean value for familiarity with Declare, on a rating scale from 1 to 7, was 3.17 (slightly below average). For confidence in understanding Declare models a mean value of 3.92 was reached (approximately average). For perceived competence in creating Declare models, a mean value of 3.83 (approximately average) could be computed. Also, it can be seen that subjects were rather new to Declare (on average using it for two weeks) and rather new to BPM in general (on average 9 months of experience). Finally, all subjects indicated that they were students. Summarizing, we conclude that the subjects were rather new to Declare but felt competent in using and understanding Declare, in turn fitting the targeted subject profile.
Table 4.5: Demographical statistics of E1

                                     Min.   Max.      M     SD
Familiarity with Declare                1      5   3.17   1.27
Confidence understanding Declare        2      5   3.92   1.00
Competence modeling Declare             2      5   3.83   0.94
Months using Declare                    0      2   0.58   0.79
Years experience in BPM                 0      4   0.75   1.22

Descriptive Analysis

To give an overview of the experiment's data, Table 4.6 shows minimum, maximum, mean and standard deviation of mental effort, perceived quality and quality. The values shown in Table 4.6 suggest that the adoption of test cases lowers mental effort, increases perceived quality and increases quality, thus supporting hypotheses H1, H2 and H3. In particular, the mental effort for tasks that were supported with test cases (M = 4.25, SD = 0.97) was lower than for tasks without test case support (M = 5.75, SD = 0.87). Likewise, the perceived quality for tasks with test cases (M = 6.00, SD = 0.85) was higher than the perceived quality for tasks without test cases (M = 4.33, SD = 1.61). Finally, also quality was higher for tasks with test case support (M = 22.33, SD = 1.56) than for tasks without test case support (M = 21.92, SD = 1.07). Please recall that mental effort and perceived quality were measured on rating scales ranging from 1 (Very low) to 7 (Very high). Quality, according to the definition of the quality measure in Section 4.8.1 (paragraph Response Variables), was calculated by taking into account whether change tasks were properly identified as feasible, invariants were obeyed and changes were conducted correctly. All in all, 10 change tasks had to be performed per model, of which 5 were feasible. In addition, 8 invariants had to be obeyed, hence in total at most 23 points could be reached per model.

Table 4.6: Descriptive statistics of E1

                              N   Min.   Max.      M     SD
Mental effort                24      3      7   5.00   1.18
  with test cases            12      3      6   4.25   0.97
  without test cases         12      4      7   5.75   0.87
Perceived quality            24      2      7   5.17   1.52
  with test cases            12      4      7   6.00   0.85
  without test cases         12      2      7   4.33   1.61
Quality                      24     19     23  22.13   1.33
  with test cases            12     19     23  22.33   1.56
  without test cases         12     20     23  21.92   1.07

So far, these observations are merely based on descriptive statistics. For a more rigorous investigation, the hypotheses are tested for statistical significance in the following.

Hypotheses Testing of E1

Our sample is relatively small and not normally distributed, thus we follow guidelines for analyzing small samples [179] and employ non–parametric tests. In particular, we used SPSS (Version 21.0) for carrying out the Wilcoxon Signed–Rank Test.

Footnote 9: We applied the Kolmogorov–Smirnov Test with Lilliefors significance correction to test for normal distribution. Detailed tables listing the results can be found in Appendix A.1.

Footnote 10: Due to repeated measurements the response variables are not independent, thus the Wilcoxon Signed–Rank Test was chosen.

Hypothesis H1 The mental effort for conducting change tasks with test case support (M = 4.25) was significantly lower than the mental effort for change tasks that were conducted without test case support (M = 5.75) (Wilcoxon Signed–Rank Test, Z = −2.57, p = 0.01, r = 0.82).

Hypothesis H2 The perceived quality for conducting change tasks with test case support (M = 6.00) was significantly higher than the perceived quality for change tasks that were conducted without test case support (M = 4.33) (Wilcoxon Signed–Rank Test, Z = −2.83, p = 0.005, r = 0.74).
Hypothesis H3 The quality of change tasks conducted with test case support (M = 22.33) was not significantly higher than the quality of change tasks that were conducted without test case support (M = 21.92) (Wilcoxon Signed–Rank Test, Z = −0.86, p = 0.39, r = 0.25).

Summing up, the hypotheses for mental effort (H1) and perceived quality (H2) could be accepted, while the hypothesis for quality (H3) had to be rejected. To assess the strength of the effects, we consider the recommendations by Cohen and regard an effect size of 0.1 as a small effect, an effect size of 0.3 as a medium effect and an effect size of 0.5 as a large effect [32, 33]. Against this background, the effects regarding mental effort (H1, r = 0.82) and perceived quality (H2, r = 0.74) can be considered large. The effect size regarding quality (H3, r = 0.25), in turn, is medium and the differences are not statistically significant. Reasons, implications and conclusions are discussed in the following.

Discussion and Limitations

Based on the obtained results we conclude that the adoption of test cases has a positive influence on mental effort (H1) and perceived quality (H2). Especially interesting is the fact that, even though quality could not be improved significantly, the modelers were apparently more confident that they had conducted the changes properly. This effect is also known from software engineering, where test cases improve perceived quality [14, 133]. Indeed, also the follow–up discussion with the subjects after the experiment revealed that students with a software development background experienced this similarity, further substantiating our hypotheses on a qualitative basis.

Regarding quality (H3), no statistically significant differences could be observed. This raises the question whether there is no impact of test cases on quality at all or whether the missing impact can be explained otherwise. To this end, a detailed look at the distribution of quality offers a plausible explanation: the overall quality is very high, as the quality measured on average is 22.13 out of a maximum of 23 (cf. Table 4.6). Thus, approximately 96% of the questions were answered correctly and tasks were carried out properly. Put differently, the overall quality leaves almost no room for improvements when adopting test cases—in fact, the sample's standard deviation is very low (1.33). Since the data “points toward” the positive influence of test cases on quality (i.e., the mean value is higher, cf. Table 4.6) and due to the low variance, it seems reasonable to assume that a positive correlation exists; however, the overall high quality blurs the expected effects. In other words, we believe that the ceiling effect [263] was in place, i.e., the best performers could not use their full potential, as the rather low complexity of tasks introduced an artificial ceiling. Thus, in turn, it seems plausible that by increasing the complexity of change tasks, the positive influence of test cases can be shown.
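The non-parametric analysis reported above can in principle be reproduced with standard statistics libraries. The following sketch, using SciPy, runs a Wilcoxon Signed-Rank Test on paired ratings and derives an approximate effect size from the normal approximation; the data shown is made up, and the effect size convention (r = |Z| / sqrt(n)) is one common choice rather than necessarily the exact procedure used here.

```python
import numpy as np
from scipy.stats import wilcoxon, norm

# Made-up paired mental effort ratings (one pair per subject)
with_tests = np.array([4, 5, 3, 4, 5, 4, 3, 5, 4, 4, 5, 5])
without_tests = np.array([6, 6, 5, 6, 7, 5, 5, 6, 6, 5, 7, 6])

# Paired, non-parametric comparison (repeated measurements)
statistic, p_value = wilcoxon(with_tests, without_tests)

# Approximate Z from the two-sided p-value, then r = |Z| / sqrt(n)
z = norm.isf(p_value / 2)
r = abs(z) / np.sqrt(len(with_tests))
print(f"W = {statistic}, p = {p_value:.3f}, Z = {z:.2f}, r = {r:.2f}")
```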
As explained in Section 4.8.1, for the evaluation of quality we had to take a close look at the process models. Thereby, we could observe that the availability of test cases changed the behavior of the subjects. In particular, subjects who did not have test cases at hand seemed to be reluctant to change the process model, i.e., they tried to perform the change tasks with a minimum number of change operations. To quantify and investigate this effect in detail, we counted how many constraints were created, how many constraints were deleted and how many constraints were adapted. The results, as listed in Table 4.7, reveal that modelers having test cases at hand changed approximately twice as many constraints. More specifically, subjects who had test cases at hand performed on average 32.33 change operations, while subjects without test cases conducted 18.58 change operations. The applied Wilcoxon Signed–Rank Test (Z = −2.71, p = 0.007, r = 0.78) indicates statistically significant differences and a large effect size (cf. [32, 33]).

We would like to provide a possible explanation that is again inspired by insights from software engineering. In particular, it was argued that test cases in software engineering improve the developer's confidence in the source code, in turn increasing the developer's willingness to change the software [14, 15]. This experiment's data supports the assumption that a similar effect can be observed when declarative process models are combined with test cases. In fact, significantly more change operations were performed by modelers who had test cases at hand. Still, the quality did not decrease, even though approximately twice as many change operations were performed. Thus, we conclude that test cases indeed provide a safety net, thereby increasing the willingness to change while not impacting quality in a negative way.

Table 4.7: Performed change operations

                              N   Min.   Max.      M     SD
Created constraints          24      5     34  13.04   6.36
  with test cases            12     10     34  16.25   7.49
  without test cases         12      5     13   9.83   2.29
Deleted constraints          24      5     32  11.79   6.36
  with test cases            12      9     32  15.17   7.48
  without test cases         12      5     12   8.42   1.93
Adapted constraints          24      0      7   0.63   1.53
  with test cases            12      0      7   0.92   2.11
  without test cases         12      0      1   0.33   0.49
Total changes                24     11     73  25.46  13.90
  with test cases            12     19     73  32.33  16.91
  without test cases         12     11     24  18.58   3.90
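The counting of change operations underlying Table 4.7 can be sketched as follows. The event format is purely illustrative; CEP's actual logs are richer, and the classification into created, deleted and adapted constraints is simplified here.

```python
from collections import Counter

def count_change_operations(events):
    """Tally created, deleted and adapted constraints from a logged session.

    events: list of (operation, element_type) tuples, e.g.
    ("create", "constraint") -- an illustrative, simplified log format.
    """
    counts = Counter(op for op, element in events if element == "constraint")
    return {
        "created": counts["create"],
        "deleted": counts["delete"],
        "adapted": counts["adapt"],
        "total": sum(counts.values()),
    }

# Illustrative session log
session = [("create", "constraint"), ("delete", "constraint"),
           ("create", "activity"), ("adapt", "constraint")]
print(count_change_operations(session))
# {'created': 1, 'deleted': 1, 'adapted': 1, 'total': 3}
```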
Summing up, controlled experiment E1 shows on the basis of empirical data that the adoption of test cases has a positive influence on mental effort, perceived quality and the willingness to change. Regarding the impact on quality, however, the data did not yield any statistically significant results. A closer look at the data suggests that these effects were blurred by the overall high quality (cf. ceiling effect [263]). Although the results sound very promising, it should be mentioned that the small sample size (12 subjects) constitutes a threat to external validity, i.e., it is questionable to what extent the results can be generalized. Regarding the generalization of results, also the process models and change tasks used in E1 are of relevance. In particular, 10 change tasks were performed on 2 rather small process models, thereby further limiting the generalization of results. To address these problems, we conducted a replication of experiment E1, particularly focusing on the issues of quality, sample size and diversity of models. This replication, referred to as R1, is described in the following.

4.8.3 Performing the Replication (R1)

Replication R1 mainly follows the same experimental design as employed for experiment E1, except for some adaptations required to address open issues of experiment E1. In the following, we describe the experimental design of R1, before we turn toward the execution and analysis of R1.

Experimental Design

To address the shortcomings of E1, we adapted the experimental design as follows. First, the experimental workflow and assignments were slightly adapted to minimize potential confusion. Second, to address the lack of significant differences with respect to quality in E1, more complex models were used in R1, as detailed in the following.

Footnote 11: The material used for replication R1 can be downloaded from: http://bpm.q-e.at/experiment/TDMReplication

Adaptations to the Experimental Workflow

In experiment E1, we could observe two main problems related to the execution of the experiment. First, in E1, we asked subjects to perform a series of change tasks on the same process model. The change tasks were designed such that they could be performed independently of each other and in arbitrary order. Still, subjects had to keep track of which change tasks they had performed so far. Even though most subjects were not distracted by this procedure, as indicated by the high number of correctly performed change tasks (cf. Table 4.6), it required the experimenters to repeatedly clarify the experimental setup to subjects. Second, in E1 we asked subjects to determine whether a change task was feasible and, if this was the case, to perform the change task. Also this procedure caused confusion among subjects—apparently the setting was not as intuitive as intended.

To address these problems, we slightly adapted the experimental workflow. In particular, instead of asking subjects to apply changes to the same process model, we separated the change tasks from each other. In this way, a new process model was presented for each change task. A total of 14 change tasks—7 with test case support, 7 without test case support—had to be performed, requiring 14 different process models. To ensure that the variance of the process models' complexity was kept low, we designed 3 process models and varied the labeling of activities so that we arrived at 14 process model variations. Besides this separation of change tasks, in R1 we did not provide change tasks that could not be performed, i.e., change tasks that were inconsistent with the process model and its invariants. By simplifying the change task assignments, we could also simplify the calculation of the quality measure. In particular, we rewarded each correctly performed change task with 1 point—allowing subjects to reach a maximum of 7 points per model.

Adaptations to the Experimental Material

As discussed in Section 4.8.2, no significant impact of test cases on the quality of changes could be found. However, we argued that the lack of differences could be traced back to ceiling effects [263] and that more complex change tasks would compensate for this shortcoming. To test this assumption, we increased the complexity of the process models used in R1. In particular, we increased the number of activities from 7 activities per model in E1 to 16 activities per model in R1. Similarly, we increased the number of constraints from 7/8 constraints per process model in E1 to 18 constraints in R1.
Experimental Operation of R1

Experimental Preparation

As described above, we slightly adapted the experimental workflow for R1 and created more complex process models. Analogous to E1, we conducted a pilot study to ensure that the material and instructions were comprehensible. To achieve a larger sample size for R1, we could rely on an ongoing collaboration with Prof. Reichert from the University of Ulm. In particular, Prof. Reichert agreed to present TDM and to conduct replication R1 in the course of a lecture on Business Process Management. Similar to E1, lectures and assignments on declarative process modeling in general and TDM in particular ensured that subjects were trained sufficiently before the replication.

Experimental Execution

Replication R1 was conducted in December 2011 at the University of Ulm in the course of a lecture on BPM with 31 participants. As in E1, the experiment was guided by CEP's experimental workflow engine, leading subjects through an initial questionnaire, 14 change tasks (7 with the support of test cases and 7 without the support of test cases), a concluding questionnaire and a feedback questionnaire. Even though more than twice as many subjects participated in R1 as in E1, no major problems occurred during the execution of the replication.

Data Analysis of R1

Up to now, we have described the differences between E1 and R1 as well as the execution of R1. In the following, we turn toward the analysis and interpretation of the data collected in R1.

Data Validation

Analogous to E1, subjects were guided using CEP's experimental workflow engine. Again, all subjects were successfully guided through the experiment, and no data set had to be discarded because the experimental setup had been disobeyed. Also, we screened subjects for familiarity with Declare. As summarized in Table 4.8, the samples of E1 and R1 are similar with respect to their knowledge of Declare. Even though the values of the sample from R1 are slightly lower, the differences are not statistically significant. In particular, neither the differences with respect to familiarity with Declare (Mann–Whitney U Test, U = 164.5, p = 0.565), nor the differences with respect to understanding Declare (Mann–Whitney U Test, U = 164.0, p = 0.565), nor the differences with respect to competence using Declare (Mann–Whitney U Test, U = 138.0, p = 0.201) are statistically significant. Similar to E1, 25 subjects indicated that they were students and 6 subjects indicated an academic background. All in all, we conclude that the sample of R1 is similar to the sample of E1 with respect to knowledge about Declare and thus the subjects fit the targeted profile.

Table 4.8: Demographical statistics of R1

                                     Min.   Max.      M     SD
Familiarity with Declare                1      6   2.90   1.45
Confidence understanding Declare        1      6   3.48   1.55
Competence modeling Declare             1      6   3.16   1.53
Months using Declare                    0     15   1.42   3.28
Years experience in BPM                 0      7   1.56   1.99

Descriptive Analysis

To give an overview of the results from R1, Table 4.9 lists minimum, maximum, mean and standard deviation of mental effort, perceived quality and quality. Similar to E1, the data of R1 suggests that the adoption of test cases lowers mental effort, increases perceived quality and increases quality, thus supporting hypotheses H1, H2 and H3.
More specifically, the average mental effort for change tasks with test support (M = 3.77, SD = 1.52) was lower than the average mental effort required for change tasks without test support (M = 5.16, SD = 1.64). The reported perceived quality of change tasks with test cases (M = 5.84, SD = 1.32) was higher than the perceived quality for change tasks without test support (M = 4.23, SD = 1.33). Finally, the average quality of changes with test case support (M = 6.19, SD = 1.01) was higher than the quality of changes without test support (M = 5.03, SD = 1.82). Please recall that mental effort and perceived quality were measured on rating scales ranging from 1 (Very low) to 7 (Very high), therefore also the values in Table 4.9 range from 1 to 7. Regarding quality, we awarded 1 point for each correctly performed change task. All in all, 7 change tasks per model had to be performed, hence in total 7 points could be achieved per model; likewise, the values for quality range from 0 to 7. Analogous to E1, we now turn toward inferential statistics to test whether these differences are also statistically significant.

Table 4.9: Descriptive statistics of R1

                              N   Min.   Max.      M     SD
Mental effort                62      1      7   4.47   1.72
  with test cases            31      1      7   3.77   1.52
  without test cases         31      1      7   5.16   1.64
Perceived quality            62      1      7   5.03   1.55
  with test cases            31      2      7   5.84   1.32
  without test cases         31      1      6   4.23   1.33
Quality                      62      0      7   5.61   1.57
  with test cases            31      3      7   6.19   1.01
  without test cases         31      0      7   5.03   1.82

Hypotheses Testing of R1

Since the sample of R1 is not normally distributed, we rely, analogous to E1, on non–parametric statistical tests. In particular, we used SPSS (Version 21.0) for carrying out the Wilcoxon Signed–Rank Test.

Footnote 12: We applied the Kolmogorov–Smirnov Test with Lilliefors significance correction to test for normal distribution. Detailed tables listing the results can be found in Appendix A.2.

Footnote 13: Please recall that due to repeated measurements the response variables are not independent, thus the Wilcoxon Signed–Rank Test was chosen.

Hypothesis H1 The mental effort for conducting change tasks with test case support (M = 3.77) was significantly lower than the mental effort for change tasks that were conducted without test case support (M = 5.16) (Wilcoxon Signed–Rank Test, Z = −2.41, p = 0.016, r = 0.43).

Hypothesis H2 The perceived quality for conducting change tasks with test case support (M = 5.84) was significantly higher than the perceived quality for change tasks that were conducted without test case support (M = 4.23) (Wilcoxon Signed–Rank Test, Z = −3.87, p = 0.000, r = 0.69).

Hypothesis H3 The quality for conducting change tasks with test case support (M = 6.19) was significantly higher than the quality for change tasks that were conducted without test case support (M = 5.03) (Wilcoxon Signed–Rank Test, Z = −3.17, p = 0.002, r = 0.57).

Discussion and Limitations

The results obtained in replication R1 indicate that the adoption of test cases has a positive influence on the maintenance of declarative process models. In particular, the collected data supports the claim that test cases can help to reduce mental effort (H1), increase perceived quality (H2) and improve the quality of the conducted changes (H3). It is worthwhile to note that these results could be consistently found in E1 and R1 and that the effect sizes for statistically significant differences were medium to large. Statistically significant differences for H1 and H2 as well as indications for the support of H3—even though not statistically significant—were found in E1. Regarding H3, we argued that the lack of differences was caused by ceiling effects [263], i.e., 96% of the tasks were performed correctly in E1, leaving no room for improvement.
By providing more complex process models in R1, this rate dropped to 80% and we could provide statistically significant support for H3.

This, in turn, leads us to two conclusions. First, test cases can indeed improve quality, i.e., reduce the number of errors, when adapting declarative process models. Second, it appears that test cases work best if a certain level of complexity is reached. This finding is also in line with the theoretical analysis of problems in understanding and maintaining declarative process models (cf. Section 4.4). We argued that declarative process models lack computational offloading (cf. Section 3.2), thus without the support of test cases, mental effort will quickly rise, in turn increasing the number of errors. Even though in E1 the mental effort was significantly higher when no test cases were provided, the differences were not large enough to provoke significant changes with respect to the number of errors committed. One might object at this point that comparing the average mental effort for E1 (M = 5.00) and R1 (M = 4.47) indicates that the change tasks of R1 were actually easier than the change tasks of E1. However, as discussed in [303], subjects usually show different base levels of mental effort, i.e., mental effort may be perceived differently by each subject. Put differently, a task that is considered hard by subject A might be considered easy by subject B, since subject B is used to difficult tasks. Please note that even though this hampers the comparison between samples, it does not influence the results of E1 and R1, as the Wilcoxon Signed–Rank Test conducts a within–subject comparison.

When analyzing the data collected in experiment E1, we also looked into the number of change operations that were conducted. In particular, we found that significantly more change operations were performed when subjects had test cases at hand. To investigate whether this trend persists in R1, we counted the number of operations conducted in R1. As summarized in Table 4.10, in the course of 434 change tasks (14 per subject, 31 subjects), subjects required on average 3.82 change operations to perform a task. Mostly, change tasks involved the creation of constraints (M = 2.00) and the deletion of constraints (M = 1.82). Contrariwise, constraints were almost never reused (M = 0.01), i.e., reconnected between activities. Consistent with E1, in R1 significantly more change operations were performed when test cases were provided (Wilcoxon Signed–Rank Test, Z = −3.27, p = 0.001, r = 0.59).

Table 4.10: Performed change operations in R1

                              N   Min.   Max.      M     SD
Created constraints         434      0     24   2.00   2.42
  with test cases           217      1     24   2.36   3.08
  without test cases        217      0     10   1.64   1.42
Deleted constraints         434      0     24   1.82   2.28
  with test cases           217      1     24   2.29   2.99
  without test cases        217      0      9   1.35   1.02
Adapted constraints         434      0      3   0.01   0.16
  with test cases           217      0      3   0.01   0.20
  without test cases        217      0      1   0.01   0.10
Total changes               434      0     48   3.82   4.68
  with test cases           217      1     48   4.66   6.10
  without test cases        217      0     19   2.99   2.32

Even though the results obtained in R1 appear to be consistent with the findings of E1, similar limitations also apply.
In particular, we argued that the generalization of results in E1 is limited, as only 10 change tasks on 2 process models were performed. Similarly, in R1 14 change tasks on 3 models were performed, thereby limiting generalization. Also, neither in E1 nor in R1 could we acquire professionals; rather, students and academics were used as subjects.

4.9 Limitations

Apparently, the work presented in this chapter has to be seen in the light of several limitations. Regarding conceptual aspects, we have discussed that the focus of TDM is rather narrow. In particular, TDM focuses on control flow behavior only and requires MBs and DEs that are willing to collaborate, cf. Section 4.5.2. With respect to the efficiency of TDM, the case study reported in Section 4.7 revealed that an overhead of 9% to 16% should be expected.

In addition, limitations regarding the evaluation should also be mentioned. Regarding the case study presented in Section 4.7, the sample size—even though typical for case studies—is a clear limitation to the generalization of results. In addition, the modeling sessions lasted on average 26 minutes and 30 seconds, hence the effects of TDM in longer–lasting modeling sessions could not be examined. Similarly, all modeling sessions started from scratch, and we did not investigate whether MBs and DEs are able to operate on test cases that were specified by other MBs or DEs. Also, the results obtained in the controlled experiments investigating the maintenance of declarative process models in Section 4.8 should be generalized with care. Although the results were consistently found in controlled experiment E1 and replication R1, it should be mentioned that students were used as subjects. Even though it was shown that in software engineering students may provide an adequate model for the professional population [108, 195, 215], other studies report significant differences between students and professionals, e.g., [3]. In particular, it was argued that studies should not use students and “blindly generalize to a larger population of software engineering professionals” [235]. Rather, under certain circumstances, such as an adequate level of commitment, students may be suitable for experimentation [16]. Since the subjects participating in our studies largely performed the tasks correctly, i.e., 96% correct in E1 and 80% correct in R1, we conclude that the students showed considerable commitment. Against this background and given the fact that the findings persisted over two experiments, we argue that the results can also be generalized to the population of professional process modelers.

4.10 Related Work

Basically, research related to this work can be organized along four different streams of research: test–driven approaches, scenario–driven approaches, validation, verification and creation of declarative process models, and execution of declarative process models.

Test–Driven Approaches

As discussed in Section 4.5, the development of TDM was inspired by Test Driven Development (TDD). TDD, in turn, was developed to drive the software development process and to introduce test cases as early as possible in the development process, thereby improving the quality of the resulting software. Besides theoretical considerations of TDD [15], empirical investigations looking into the feasibility of TDD are of interest.
In particular, experiments investigating the effect of TDD, i.e., interweaving software development and testing the way the TDM methodology interweaves modeling and testing, are of interest. For instance, [133] conducted a long–term case study which showed that the adoption of TDD increases perceived quality. In addition, developers stated that changes could be conducted more easily when TDD was adopted. With respect to code quality, the situation appears not to be entirely clear. For instance, [62, 84] report on controlled experiments that showed increased code quality through the adoption of TDD. Contrariwise, the experiments reported in [63, 169] could not show significant differences between TDD and test–after coding. More generally, benefits of TDD with respect to quality and perceived quality could so far mostly be shown for industrial settings—in semi–industrial or academic contexts, the situation seems less clear [230].

In addition, it should be mentioned that TDM is not the first approach in which the concepts of TDD are transferred to other domains. In particular, Tort et al. adapted TDD for the creation and testing of conceptual schemata. Similar to this work, a comprehensive description of concepts is provided [244, 245], but also tool support for the approach is provided [243, 246]. In addition, the authors extend their work toward defining desirable properties for test cases [247]. Even though the general approach of Tort et al. is similar to this work, the targeted domain, i.e., conceptual schemata versus declarative process models, is entirely different, in turn leading to entirely different artifacts that were developed.

Scenario–Driven Approaches

In this work, we argued that the specification of test cases helps the MB in constructing the process model. Even though test cases are validated automatically against the process model, the creation of the process model is still a manual process. A similar approach is followed by scenario–driven approaches, where specific aspects of a system are captured using scenarios, cf. [2, 131]. These scenarios, in turn, are then used to automatically synthesize the system, e.g., the process models. For instance, Fahland proposes an approach for synthesizing Petri Nets from scenarios [67]. This approach was meanwhile implemented in the GRETA tool [66, 71] and was extended toward the specification of scenarios which can be directly executed [72] or used for the specification of decentralized components [68]. In a similar vein, Desel et al. propose to adopt process mining techniques for the specification of scenarios [53, 54]. Similarly, the Play–Engine provides scenario–based programming support for Live Sequence Charts [99]. All these approaches distinguish themselves from our work in two ways. First, in TDM, declarative process models are clearly in the focus, while the mentioned scenario–based approaches focus on imperative languages, such as Petri Nets or Live Sequence Charts. Second, all these approaches focus on the automated synthesis of models, whereas in TDM the model creation is a manual process. Even though the automated creation of models appears to be a desirable goal, such an approach is probably not feasible for all purposes. If, for instance, process models are developed for the enactment of processes, the readability of the model may not be of concern. However, if a process model is developed for documentation purposes, readability is a central goal.
As argued in [88], automated approaches mostly fail to produce models that are also readable by humans.

Validation, Verification and Creation of Declarative Process Models

Regarding the validation and verification of process models in general, work in the area of process compliance checking should be mentioned, e.g., [5, 6]. In contrast to our work, the understandability of declarative languages is not of concern there; rather, the focus is put on imperative languages. Likewise, the work of Ly et al. [132] focuses on validation, but targets adaptive process management systems instead of declarative ones. Interestingly, with respect to the classification of sequential and circumstantial information, their setting describes the opposite of the setting used in this work. Their imperative process models exhibit mostly sequential information, while the adopted measure to improve validity relies on circumstantial information, i.e., the usage of constraints. Thus, the aim of improving validity is found in both approaches, however, for entirely different settings. With respect to the verification of declarative process models, in turn, several mechanisms have been proposed. With proper formalization, declarative process models can be verified using established formal methods [175]. Depending on the concrete approach, checks can be performed a priori, e.g., the absence of deadlocks [175], or a posteriori, e.g., the conformance of an execution trace [251]. While these approaches definitely help to improve syntactical correctness and provide semantic checks a posteriori, they do not address understandability issues. Finally, in [126] an algorithm for the automated extraction of declarative process models from execution traces is proposed. Even though such an approach certainly helps to facilitate the creation of declarative process models, it does not help to understand the created models.

Execution of Declarative Process Models

In this work, we focus on the creation and maintenance of declarative process models. In the following, we give an overview of approaches related to the execution of declarative processes. Closest related to TDM is certainly the development of the declarative process modeling language Declare [175, 178], since TDMS was implemented to support the creation of Declare models. Regarding the execution of Declare workflows, the Declare framework [175–177], which provides support for executing Declare models, is related. Besides supporting general–purpose workflows, the Declare framework was also applied for the support of Computer–Interpretable Guidelines [159], i.e., supporting clinical guidelines with workflow technology. Likewise, the Declare framework was applied for the execution of service workflows [253, 254]. Declare is a prominent representative of declarative process modeling; however, other researchers have addressed declarative process modeling as well. For instance, in [4, 229], similar to Declare, a workflow is considered a set of dependencies between tasks, however, different formalisms are in place. In this vein, also Dynamic Condition Response (DCR) graphs [101, 156, 157] are of interest. DCR graphs, just like Declare, allow for the specification of declarative business process models, support the specification of sub–processes [103] and were applied in a case study for the specification of a cross–organizational case management system [102].
However, unlike Declare, DCR graphs focus on a set of core constraints instead of allowing for the specification of arbitrary constraints. Likewise, DCR graphs employ different formalisms for the operationalization of constraints. Although declarative process models provide a high degree of flexibility, their execution may pose a significant challenge. In particular, as argued in [220], due to the high degree of flexibility, it may not always be clear to the end user which activity shall be executed next. To counterbalance this problem, in [10, 11] methods for guiding the end user through the execution of a declarative process instance are proposed. In particular, by recommending activities to be executed, the end user shall be supported. More broadly, [96, 220] propose similar methods that can also be applied to imperative process models. Even though these approaches focus on improving the usability of declarative process models, they address the phase of process operation only.

4.11 Summary

In this chapter, we described TDM for supporting the understanding, creation and maintenance of declarative process models. Regarding the understanding of declarative process models, we argued along the dimensions of the Cognitive Dimensions Framework and identified hard mental operations and hidden dependencies as problems. Similarly, we discussed that declarative process models mostly provide circumstantial information, while the extraction of sequential information may impose significant mental effort. As it is known that adapting a model consists of understanding what to change and then performing the change, deficits with respect to understanding presumably also impact the maintenance of declarative process models. To counteract these problems, we proposed TDM, which adopts testing techniques from the domain of software engineering. By providing automatically verifiable test cases, we aim to automate the extraction of sequential information, thereby relieving the MB from hard mental operations. In addition, we envisioned a visualization of test cases that is also readable for DEs. This, in turn, provides an additional communication channel between MBs and DEs, allowing for more efficient communication. In order to provide operational support, we implemented the concepts of TDM in TDMS. To specifically support empirical investigations, TDMS was implemented on top of CEP. TDMS, in turn, was used in three empirical studies investigating the influence of TDM on the creation, understanding and maintenance of declarative process models. The data collected in the case study investigating the creation of declarative process models indicates that test cases are accepted as a communication medium. Furthermore, in our study, test cases were favored over the process model as communication channel and allowed for structuring modeling sessions. Regarding the maintenance of declarative process models, we conducted a controlled experiment (E1) and a replication (R1). Therein, a positive influence of test cases on mental effort, perceived quality and quality could be observed. Interestingly, significantly more change operations were performed for change tasks that were provided with test case support, indicating that test cases can also improve the willingness to change. As quality nevertheless increased, we conclude that test cases provide a safety net for conducting changes.
Still, it should be mentioned that the creation of test cases implies a certain overhead (in our study between 9% and 16%). In addition, TDM currently focuses on control flow only, i.e., other dimensions such as resources or data are not supported yet, although such an extension could be achieved by extending the semantics of test cases. Thus, we conclude that TDM—even though implying a certain overhead for specifying test cases—helps to improve the creation, understanding and maintenance of declarative process models.

Chapter 5 The Impact of Modularization on Understandability

In Chapter 4, we have focused on the question of how the application of cognitive psychology can help to improve the creation, understanding and maintenance of declarative business process models. In this chapter, we turn toward the application of cognitive psychology for investigating the impact of modularization on the understandability of a process model. In other words, we shift the focus in two ways. First, we narrow our research from creating, understanding and maintaining process models toward understanding only, i.e., we focus on the interpretation of process models. Second, we broaden our perspective on modeling languages. In particular, besides declarative process modeling languages, we also take into account imperative process modeling languages and consider insights from conceptual modeling languages in general. As described in Chapter 2, we thereby follow the Design Science Research Methodology (DSRM) approach [173] to guide our research, but also to structure this chapter. In particular, the remainder of this chapter is organized along the activities specified by DSRM:
(1) Problem identification and motivation
(2) Define the objectives for a solution
(3) Design and development
(4) Demonstration
(5) Evaluation
(6) Communication
We start with a general introduction to the problem, i.e., the interplay between modularization and the understanding of a process model, in Section 5.1. Then, we systematically assess the state of the art by conducting a systematic literature review in Section 5.2. These sections thus address problem identification and motivation (1) and define the objectives for a solution (2). Next, we propose a cognitive–psychology–based framework for assessing the impact of modularization on the understanding of a process model in Section 5.3, addressing design and development (3) and demonstration (4). Subsequently, Section 5.4 reports on empirical investigations in which the proposed framework is validated in the context of BPMN–based process models. In a similar vein, Section 5.5 reports on an empirical investigation applying the proposed framework in the context of declarative business process models. Therefore, these sections can be attributed to activity evaluation (5). Then, limitations of this work are described in Section 5.6. In Section 5.7, we revisit findings and limitations for a discussion, whereas the work presented in this chapter is put in the context of existing research in Section 5.8. Finally, this chapter is concluded with a summary in Section 5.9. Even though activity communication (6) was not explicitly mentioned so far, we would like to remark at this point that communication is the inherent purpose of this document and is hence also addressed in this work.
5.1 Introduction

Using modularization to structure information has for decades been identified as a viable approach to deal with complexity [170]. Not surprisingly, modularization is widely used and, for instance, available through nested states in UML Statecharts [166] and sub–processes in BPMN [167] and YAWL [256]. However, in general, “the world does not represent itself to us neatly divided into systems, subsystems... these divisions which we make ourselves” [89]. In this sense, a lively discussion about the proper use of modularization for the analysis and design of information systems as well as its impact on understandability is still going on. Even though considerable progress could be achieved, for instance, by adopting the good decomposition model [265] for the modularization of Event–driven Process Chains [113] and the application of proper abstraction levels [51, 197], it appears that it is still not entirely clear whether and when modularization has a positive influence on the understandability of a conceptual model in general or a process model in particular. For instance, researchers who have set out to provide empirical evidence for the positive effects of modularization reported—contrary to their expectations—a negative influence of modularization on understandability, cf. [40, 45]. More generally, empirical research into the understandability of conceptual models has shown that modularization can have a positive influence [208], a negative influence [45], or no influence at all [41]. In Business Process Management (BPM), sub–processes have been recognized as an important factor influencing model understandability [46]; however, there are no definitive guidelines on their use yet. For example, recommendations regarding the size of a sub–process in an imperative process model range from 5–7 model elements [224] over 5–15 model elements [122] to up to 50 model elements [145]. Against this background, we argue that even though considerable progress concerning the proper application of modularization has been achieved, the connection between the modularization of a conceptual model and its understandability is not yet fully understood. In this work, we aim to contribute to a better understanding of the interplay between modularization and the understandability of a process model by distilling insights from cognitive psychology and findings from empirical investigations. We would like to mention at this point that the concepts proposed in this work are theoretically applicable to any modularized conceptual modeling language, but have so far been empirically validated only for process modeling languages. Hence, we will refer to conceptual models in the conceptual sections, but refer to the specific modeling language in the empirical validation. In particular, we start by assessing the state of the art of empirical research into the modularization of conceptual models through a systematic literature review. Then, to provide a potential explanation for the diverging findings, we propose a cognitive–psychology–based framework for assessing the impact of modularization on the understandability of a conceptual model, i.e., whether a certain modularization of a model has a positive influence, a negative influence, or no influence at all. Finally, we test the proposed framework empirically in the course of three experiments, thereby applying it to BPMN–based process models and Declare–based process models.
5.2 Existing Empirical Research into Modularizing Conceptual Models

Up to now, we have discussed that the impact of modularization is not uniform. Rather, the influence seems to vary from positive over neutral to negative. In order to provide a comprehensive analysis, we conducted a systematic literature review [121] on empirical work investigating the impact of modularization on understanding. As detailed subsequently, we followed the guidelines from [121] and structured our systematic literature review into three phases: planning, performing as well as reporting the systematic literature review.

5.2.1 Planning the Systematic Literature Review

The planning phase of a systematic literature review consists of two tasks. First, the need for a systematic literature review is assessed and the goal is defined. Second, based upon the defined goal, the search strategy for conducting the systematic literature review is elaborated.

Identification of Goal

Conducting a systematic literature review is usually associated with significant effort. Even though digital libraries allow for searching by the use of keywords, the identification of relevant literature has to be performed manually, typically requiring the reviewer to assess thousands of results for relevance. Hence, the decision to conduct a systematic literature review should be well–justified. The starting point of this systematic literature review was a set of individual empirical works that show a varying influence of the modularization of a conceptual model on its understanding. However, these works were identified in a rather unsystematic manner, e.g., coincidentally in the course of other research activities or through literature recommendations from co–workers. Hence, we had to assume that the identified literature is non–representative and may only represent a small fraction of studies. The goal of this systematic literature review is therefore to counteract this problem and to provide a comprehensive overview of empirical studies investigating the interplay between modularization and understanding.

Search Strategy

To operationalize the goal of the systematic literature review, we derived a key–word pattern that describes any “empirical study investigating the interplay between modularization and understanding of a conceptual model”. Since previous works may have made use of different terminology, we used synonyms for modularization and understanding. In particular, as summarized in Table 5.1, we identified 9 synonyms for modularization and 2 synonyms for understanding. From these synonyms, in turn, the key–word patterns for the search in digital libraries were derived. In particular, we used all potential combinations of synonym for modularization, synonym for understandability, experiment and model, leading to 18 key–word patterns (9 * 2 * 1 * 1). If supported by the digital library, we intended to use a boolean combination of all key–words to avoid duplication, i.e., key–word pattern 1 OR key–word pattern 2 OR ... OR key–word pattern 18. If such boolean expressions were not supported, we intended to conduct an individual search for each key–word pattern and to merge the results.
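To illustrate how the key–word patterns can be derived from the synonyms, the following minimal Python sketch enumerates all 9 * 2 * 1 * 1 combinations and merges them into a single boolean query. The sketch only mirrors the procedure described above; the variable names and the exact quoting are illustrative assumptions, as the concrete query syntax differs between digital libraries.

    from itertools import product

    # Synonym lists as given in Table 5.1; "experiment" and "model" are fixed terms.
    modularization = ["modularity", "hierarchical", "hierarchy", "decomposition",
                      "refinement", "submodel", "sub-model", "fragment", "module"]
    understanding = ["understandability", "comprehensibility"]

    # One pattern per combination of synonyms: 9 * 2 * 1 * 1 = 18 key-word patterns.
    patterns = [f"{m} {u} experiment model"
                for m, u in product(modularization, understanding)]
    assert len(patterns) == 18

    # If a digital library supports boolean expressions, the patterns are merged
    # into a single query; otherwise one search per pattern is issued and merged.
    combined_query = " OR ".join(f"({p})" for p in patterns)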
Table 5.1: Synonyms for key–word patterns
Search Term         Synonyms
Modularization      Modularity, hierarchical, hierarchy, decomposition, refinement, submodel, sub–model, fragment, module
Understandability   Understandability, comprehensibility
Experiment          Experiment
Model               Model

For performing the search, the review protocol specified to rely on the online portals of the most important publishers in the field of computer science, i.e., Springer (http://www.springerlink.com), Elsevier (http://www.sciencedirect.com), ACM (http://portal.acm.org) and IEEE (http://ieeexplore.ieee.org). In addition, we included the senior scholars’ basket of journals to extend the search toward the field of information systems (the senior scholars’ basket of journals is available at http://www.vvenkatesh.com/isranking; for searching these journals, we relied on Google Scholar: http://scholar.google.at). Besides specifying search terms, the review protocol is required to define criteria for the inclusion or exclusion of results. We decided to include any publication that reports on empirical work investigating the interplay of modularization and understanding and did not define quality criteria necessary for inclusion. Finally, the review protocol has to define which information shall be extracted from the identified literature. In this work, we decided to extract the following information:
• Investigated modeling language
• Dimension of understanding that was measured (e.g., accuracy, duration)
• Direction of impact (positive, neutral, negative)

5.2.2 Performing the Systematic Literature Review

Following the guidelines from [121], the review protocol was used to guide the search. The key–word patterns resulted in a total of 10,391 hits that had to be assessed manually for inclusion. Mostly, the title of an identified work was sufficient for determining inclusion/exclusion. If the title was not informative enough, we consulted the work’s abstract. Only if the abstract was also not informative enough was the paper inspected in detail. By following this strategy, 10 publications out of 10,391 were classified as relevant. Assuming that similar works are usually connected, we also checked the references used in these 10 publications, leading to the identification of another 3 studies. Due to personnel limitations, the identification of relevant literature was performed by a single person, i.e., the author. All in all, 13 publications reporting on studies investigating the impact of modularization on the understanding of conceptual models were found. The insights of these studies are reported in the following.

5.2.3 Reporting the Systematic Literature Review

As summarized in Table 5.2, we could identify 13 publications reporting on investigations of 6 different modeling languages: Levelled Data Model (LDM) [150], Hierarchical Entity–Relationship Diagram (HERD) [225], Protos [261], UML Class Diagram [166], UML Use Case Diagram [166] and UML Statechart [166]. Even though 13 publications could be identified, Table 5.2 actually lists 10 experiments, since 3 experiments were published in different venues. Basically, these studies adopted a similar experimental design, as illustrated in Figure 5.1. Based on the particular research question, conceptual models were created that operationalized the concepts to be investigated, such as quality of decomposition or depth of modularization.
Then, researchers elaborated questions about these models in order to assess their understandability. Regarding the dimension of understanding that was measured, we could observe that each study looked into accuracy, i.e., researchers counted how many questions were answered correctly. In addition, 4 studies also took into account the duration required for answering questions ([43] reported “efficiency”, i.e., accuracy divided by duration). Finally, 3 studies investigated the perceived ease of understanding, i.e., mental effort. Regarding the impact on understanding, 7 studies reported a positive influence, 9 studies reported no influence, while 3 studies reported a negative influence [40]. Please note that these numbers do not add up to the 10 studies identified in the systematic literature review, as some studies reported positive, negative as well as neutral results. We also would like to emphasize at this point that most of the studies investigate differences between modularized and non–modularized models. Studies [24–26], however, empirically validate guidelines for modularizing conceptual models. In other words, these works empirically showed that a certain way of modularizing is required for positive effects to appear. In general, it can be said that the influence of modularization on the understanding of a conceptual model was investigated for a variety of modeling languages. Although most studies report a positive influence, some studies could not find any impact on the understanding of a model and 3 studies reported a negative influence.

Table 5.2: Systematic literature review: results
Language          Study           Finding
LDM               [150]           Accuracy: positive (2 models); Duration: negative (1 model), neutral (1 model)
HERD              [225]           Accuracy: neutral (2 models)
Protos            [206, 208] (a)  Accuracy: positive (1 model), neutral (1 model)
UML (b)           [24]            Accuracy: positive (3 models); Mental Effort: neutral (3 models)
                  [25]            Accuracy: positive (1 experiment, 1 replication); Mental Effort: neutral (1 experiment, 1 replication)
                  [26]            Accuracy: positive (3 models); Mental Effort: neutral (3 models)
UML Statecharts   [42, 44] (a)    Accuracy: positive (1 exp.), neutral (4 experiments)
                  [41]            Accuracy: neutral (1 model); Duration: neutral (1 model)
                  [43]            Efficiency (c): positive (experiment), negative (repl.)
                  [40, 45] (a)    Accuracy: neutral (experiment), negative (repl.); Duration: neutral (experiment), negative (repl.)
(a) The same study was published in multiple venues
(b) UML Class Diagrams, UML Use Case Diagrams and UML Statecharts
(c) Defined as ratio of accuracy and duration

In addition, it was empirically corroborated that also the way modularization is applied has an impact on understanding. Against this background, one might argue that studies reporting a negative influence used models that were badly modularized. However, knowing that all these studies tried to provide evidence that modularization has a positive influence and that the authors are considered specialists in their field, this explanation appears implausible. Rather, we think that modularization involves a certain trade–off. By introducing modularization, certain aspects of a model become easier to understand, while other aspects of a model become harder to understand. It is important to emphasize at this point that these influences have to be seen as orthogonal to guidelines for modularization, as presented in [24–26].
Guidelines appear to maximize positive influences, while minimizing negative influences. In this vein, we argue that even for a perfectly modularized model, certain aspects would be easier to understand without modularization. However, it is currently not entirely clear under which circumstances a positive or negative influence of modularization on the understanding of a model can be expected. To approach this issue, in the following, we draw on cognitive psychology to provide a systematic view on which factors influence understandability.

Figure 5.1: Typical research model [297]

5.3 A Framework for Assessing Understandability

Up to now, we have focused on existing research into the interplay between modularization and the understanding of a model. In the following, we turn toward offering a new perspective by proposing a framework for assessing the impact of modularization on understandability. To this end, we start by identifying two opposing forces that presumably influence the understanding of a model in Section 5.3.1. Then, in Section 5.3.2 we integrate these opposing forces into a framework for assessing the interplay between modularization and understandability.

5.3.1 Antagonists of Understanding: Abstraction and Fragmentation

The systematic literature review revealed that typical research setups are centered around modularized conceptual models as well as questions for assessing their understandability. In this sense, questions and associated answers are the unit of analysis: if the impact of the applied modularization is positive, questions will be easier to answer. Contrariwise, if the impact of the applied modularization is negative, questions will be harder to answer. Therefore, we discuss the impact of modularization on understanding by focusing on the effects on an individual question. In particular, we describe how the positive influence of modularization can be attributed to the concept of abstraction, while negative effects can be explained by fragmentation.

Abstraction

Through the introduction of modularization it is possible to group a part of a model into a sub–model. When referring to such a sub–model, its content is hidden by providing an abstract description, such as a complex activity in a business process model or a composite state in a UML Statechart. The concept of abstraction is far from new and has been known since the 1970s as “information hiding” [170]. In the context of our work, it is of interest to what extent abstraction influences model understandability. From a theoretical point of view, abstraction should show a positive influence, as it reduces the number of elements that have to be considered simultaneously, i.e., abstraction can hide irrelevant information, cf. [150]. However, if positive effects depend on whether information can be hidden, the way modularization is displayed apparently plays an important role. Here, we assume, similar to [150, 206], that each sub–model is presented separately. In other words, each sub–model is displayed in a separate window if viewed on a computer, or printed on a single sheet of paper. The reader may arrange the sub–models according to personal preferences and may close a window or put away a sheet of paper to hide information. Thereby, irrelevant information can be hidden from the modeler, leading to decreased mental effort, as argued in [150].
From the perspective of cognitive psychology, this phenomenon can be explained by the concept of attention management [128]. During the problem solving process, i.e., answering a question about a model, attention needs to be guided to certain parts of a model. For instance, when checking whether a certain execution trace is supported by a process model, activities that are not contained in the trace are irrelevant for answering the question. Here, abstraction allows for removing this irrelevant information, supporting the attention management system and thus reducing mental effort. To illustrate the impact of abstraction, consider the BPMN–based model shown in Figure 5.2. Assume the reader wants to determine whether activity I is always preceded by activity A. In this case, the question can be easily affirmed, since all execution flows leading to activity I are also connected to activity A. Activities B to H can be ignored, since they are hidden through complex activities J and K, i.e., are hidden by abstraction. The value of abstraction becomes particularly visible when comparing this model with the non–modularized version shown in Figure 5.3. As both models represent the same business process, again activity I is always preceded by A. Similar to the process model from Figure 5.2, all sequence flows between activity A and I have to be checked. However, no sub–processes are present, hence considerably more modeling elements, i.e., the entire content of sub–processes J and K, have to be considered.

Figure 5.2: Example of modularization

Besides reducing mental effort by improving attention management, abstraction presumably supports the identification of higher level patterns. It is known that the human perceptual system requires little mental effort for recognizing certain patterns [128, 217], e.g., recognizing a well–known person does not require deliberate thinking; rather, this information can be perceived directly. Similarly, in conceptual models, by abstracting and thereby aggregating information, information can presumably be perceived more easily, as discussed in detail in Section 3.2.2. To illustrate the effect of recognition, we would like to come back to the process models shown in Figure 5.2 and Figure 5.3. Basically, both models represent the same business process and thus exhibit a similar structure. The start event is followed by activity A, after which the control flow is split and joined just before activity I. Directly after activity I, in turn, the process is completed. At this point, we argue that this structure can be easily recognized in the modularized model from Figure 5.2, while it takes considerably more time to identify this structure in the non–modularized model from Figure 5.3. Please note that we selected small models for demonstration purposes, therefore the presented questions can still be answered rather easily. At the same time, it can be expected that the positive influence of abstraction increases when models become larger.

Figure 5.3: Non–modularized version of the model from Figure 5.2

Fragmentation

Empirical evidence shows that the influence of modularization ranges from positive over neutral to negative (cf. [40, 41, 150, 208]). To explain the negative influence, we refer to the fragmentation of the model. When extracting a sub–model, modeling elements are removed from the parent model and placed within the sub–model.
When answering a question that also refers to the content of a sub–model, the modeler has to switch attention between the parent model and the sub–model. In addition, the modeler has to mentally integrate the sub–model into the parent model, i.e., interpret the sub–model in the context of the parent model. From the perspective of cognitive psychology, these phenomena are known to increase mental effort and are referred to as the split–attention effect, as discussed in detail in Section 3.2.3. To demonstrate this effect, consider the process model from Figure 5.2. To determine whether activity B and activity F are mutually exclusive, a modeler may make use of the following strategy. First, activity B has to be located in sub–process J and activity F has to be located in sub–process K. In other words, the modeler has to split attention between sub–processes J and K for this step. Next, to determine whether activities B and F are indeed mutually exclusive, the modeler has to integrate sub–processes J and K back into the parent process model, i.e., it is required to mentally integrate the fragments. Compared to the process model in Figure 5.3, where it is sufficient to interpret the control flow, splitting attention and mentally reintegrating sub–processes causes additional effort in the modularized version. Please note that fragmentation is inevitable as soon as modularization is introduced, even for well–modularized models. Consider, for instance, a modeler who wants to find all activities that are assigned to a specific role. In this case it is very likely that the modeler will have to look through several sub–processes to locate all these activities. Hence, the impact of modularization on the understanding of a model will depend on whether fragmentation can be compensated by abstraction, as detailed in the following.

5.3.2 Toward a Cognitive Framework for Assessing Understandability

Up to now, we have discussed two forces that presumably influence the understanding of modularized models. On the positive side, we have identified abstraction, which presumably improves understanding by fostering information hiding and pattern recognition. On the negative side, splitting attention and the mental integration of sub–models caused by fragmentation are presumably responsible for decreased understanding. In the following, we will combine these opposing forces into a framework for assessing the impact of modularization on understandability. When taking into account the interplay of abstraction and fragmentation, it becomes apparent that the impact of modularization on the performance of answering a question might not be uniform. Rather, each individual question may benefit from or be impaired by modularization. Typically, understanding is estimated by averaging the performance of answering questions about a model, as discussed in Section 5.3.1. Therefore, it is essential to understand how a single question is influenced by modularization. To approach this influence, we propose a framework that is centered around the concept of mental effort, i.e., the load imposed on the working memory [168], as discussed in Section 3.2.3. As illustrated in Figure 5.4, we propose to view the impact of modularization as the result of two opposing forces. In particular, every question induces a certain mental effort caused by the question’s complexity, also referred to as intrinsic cognitive load [236].
This value depends on model–specific factors, e.g., model size, question type or layout, and person–specific factors, e.g., experience, but is independent of the model’s modularization. If modularization is present, the resulting mental effort is decreased by abstraction, particularly by enabling information hiding and pattern recognition. Contrariwise, fragmentation increases the required mental effort by requiring the modeler to switch attention between fragments and to mentally integrate information. Based on the resulting mental effort, a certain answering performance, e.g., accuracy or duration, can be expected. We center our framework around mental effort, since it is known that working memory in general and mental effort in particular are connected to performance (cf. Section 3.2.3). Hence, we assume that mental effort can be used as a reasonable estimator of performance. In the following, we will discuss the interplay of abstraction and fragmentation as well as link the size of a model to the framework.

Figure 5.4: Framework for assessing understandability [304]

Interplay of Abstraction and Fragmentation

According to the model illustrated in Figure 5.4, a question’s complexity induces a certain mental effort, e.g., locating an activity is easier than validating an execution trace. In addition, mental effort may be decreased by information hiding and pattern recognition, or increased by the need to switch between sub–models and integrate information. Thereby, abstraction as well as fragmentation occur at the same time. A model without sub–models apparently cannot benefit from abstraction, nor is it impacted by fragmentation. By introducing modularization, i.e., creating sub–models, both abstraction and fragmentation are stimulated. Whether the introduction of a new sub–model influences understandability positively or negatively then depends on whether the influence of abstraction or fragmentation predominates. In the systematic literature review reported in Section 5.2, we could make two observations in this respect. First, a negative influence of modularization was reported particularly for rather small models, cf. [40, 43, 45]. Presumably, when introducing modularization in a small process model, little influence of abstraction can be expected, as the model is small anyway. However, fragmentation will appear, regardless of model size. In other words, modularization will most likely show a negative influence or at best no influence for small models. Second, and similarly, in [42] a series of experiments for assessing the understandability of UML Statecharts with composite states is described. For the first four experiments, no significant differences between non–modularized models and modularized models could be found. Finally, the last experiment showed significantly better results for the modularized model. The authors identified increased complexity, i.e., model size, as one of the main factors for this result, strengthening the assumption that a model must be large enough to benefit from abstraction.
While it seems very likely that there is a certain complexity threshold that must be exceeded so that the desired effects can be observed, it is not yet clear where exactly this threshold lies. Looking into literature that recommends the size of sub–processes illustrates how difficult it is to define this threshold: estimations range from 5–7 model elements [224] over 5–15 elements [122] to up to 50 elements [145].

5.3.3 Limitations

Even though the proposed framework relies on established concepts from cognitive psychology and insights from empirical works investigating the interplay between modularization and the understanding of a conceptual model, certain limitations apply. First and foremost, the framework is of a general nature. Although this basically allows for applying the framework to almost any conceptual modeling language supporting modularization, it does not take into account the characteristics of specific modeling languages. Particularly, we have identified the integration of sub–models as a task negatively influencing the understandability of a model. However, the difficulty of this integration task will most likely depend on the specific modeling language. For instance, when mentally executing a BPMN–based model, this task mainly refers to transferring the token from the parent process model to the sub–process. When mentally executing a Declare–based model, the integration of a sub–process requires the modeler to interpret the constraints of the sub–process in the context of the parent process. This, in turn, might be a considerably more difficult task, as discussed in Section 3.1.2. Also, the proposed framework focuses on the structure of the model, but does not take into account its semantics. Hence, factors such as redundancy or minimality, as described in the good decomposition model [25, 266], cannot be taken into account. Furthermore, even though we have discussed that the size of a model plays an important role for modularization, size is only taken into account implicitly, e.g., abstraction will presumably only appear for large models, whereas fragmentation will presumably occur for any modularized model. We would like to emphasize at this point that we have deliberately refrained from explicitly integrating size in our framework, as the optimal size of a sub–model still appears to be unknown. Finally, as discussed, the framework is built upon insights from cognitive psychology and empirical investigations of modularizing conceptual models, but still requires empirical validation. To compensate for this shortcoming, we empirically tested the framework, as detailed in the following.

5.4 Evaluation Part I: BPMN

Even though the framework for assessing the impact of modularization on the understanding of a model relies on established concepts from cognitive psychology, it clearly calls for empirical evaluation. To this end, we conducted a series of experiments in which the framework was validated for two different modeling notations. In particular, in this section we report on an experiment (E2) and a replication (R2), in which we look into modularized models that were created with BPMN (similar to research questions, we number experiments and replications consecutively; E1 and R1 were already reported in Chapter 4, hence we continue with experiment 2 and replication 2). As the experimental designs of experiment E2 and replication R2 are almost identical, we start by introducing the experimental design in Section 5.4.1. Then, we turn to describing the execution of E2 in Section 5.4.2 as well as the execution of R2 in Section 5.4.3.
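To make the trade–off that E2 and R2 are designed to test more tangible, the following sketch expresses the framework from Figure 5.4 as a toy calculation. It is purely illustrative: the framework itself is qualitative, and the weights and counts used here are invented assumptions rather than empirically derived values.

    def estimated_mental_effort(intrinsic_load, hidden_elements, recognized_patterns,
                                attention_switches, integrations,
                                w_hide=1.0, w_pattern=1.0, w_switch=1.0, w_integrate=1.0):
        """Toy estimate of the mental effort for answering one question.
        Abstraction (information hiding, pattern recognition) decreases effort;
        fragmentation (attention switching, integration of sub-models) increases it.
        All weights are illustrative assumptions, not part of the framework."""
        abstraction_benefit = w_hide * hidden_elements + w_pattern * recognized_patterns
        fragmentation_cost = w_switch * attention_switches + w_integrate * integrations
        return max(0.0, intrinsic_load - abstraction_benefit + fragmentation_cost)

    # A question confined to a single sub-process benefits from information hiding:
    print(estimated_mental_effort(10, hidden_elements=6, recognized_patterns=1,
                                  attention_switches=0, integrations=0))   # 3.0
    # A question spanning two sub-processes pays the price of fragmentation:
    print(estimated_mental_effort(10, hidden_elements=0, recognized_patterns=0,
                                  attention_switches=2, integrations=2))   # 14.0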
5.4.1 Experimental Definition and Planning

The goal of experiment E2 and replication R2 is to provide empirical evidence for the presence of abstraction and fragmentation, as introduced in Section 5.3. To this end, in the following, we introduce the research questions and hypotheses and describe the subjects, factors, factor levels, objects and response variables required for our experiment. Then, we present the experimental design as well as the instrumentation and data collection procedure.

Research Questions and Hypotheses

The research questions investigated in E2 and R2 are directly derived from the theoretical framework presented in Section 5.3.2. The basic claim of the framework is that any modularization shows a positive influence through abstraction, but also a negative influence through fragmentation. Hence, the first research question can be formulated as follows:

Research Question RQ9: Is the understanding of a BPMN–based model positively influenced by abstraction?

To assess the understandability of a model, we rely on measuring the mental effort expended for answering a question, the accuracy, i.e., the percentage of correct answers, and the duration required for answering the question (mental effort, accuracy and duration are elaborated in detail in Paragraph Response Variables). Also, we use the term flat model synonymously for a model that is not modularized. For all hypotheses, we assume that a business process is available as a modularized version modelmod and a flat version modelflat. In this way, the hypotheses associated with RQ9 can be postulated as follows:

Hypothesis H4: Questions that are influenced by abstraction, but are not influenced by fragmentation, require less mental effort in modelmod.
Hypothesis H5: Questions that are influenced by abstraction, but are not influenced by fragmentation, yield a higher accuracy in modelmod.
Hypothesis H6: Questions that are influenced by abstraction, but are not influenced by fragmentation, require less time to answer in modelmod.

The second research question refers to the negative influence of modularization, in our framework captured as fragmentation. Consequently, RQ10 looks into the negative influence of fragmentation:

Research Question RQ10: Is the understanding of a BPMN–based model negatively influenced by fragmentation?

Analogous to RQ9, we look into mental effort, accuracy and duration for the postulation of the hypotheses associated with RQ10:

Hypothesis H7: Questions that are influenced by fragmentation, but are not influenced by abstraction, require a higher mental effort in modelmod.
Hypothesis H8: Questions that are influenced by fragmentation, but are not influenced by abstraction, yield a lower accuracy in modelmod.
Hypothesis H9: Questions that are influenced by fragmentation, but are not influenced by abstraction, require more time to answer in modelmod.

Subjects

The population examined in E2 and R2 basically comprises all persons that need to interpret BPMN–based models, such as process modelers or system analysts. Therefore, the subjects participating in our study should be at least moderately familiar with BPM in general and BPMN in particular.
We are not targeting modelers who are not familiar with BPMN at all, as it has to be assumed that committed errors can rather be traced back to unfamiliarity with the notation, i.e., BPMN, than to the influence of modularization.

Factor and Factor Levels

In this experimental design, we consider two factors. First, factor modularization with factor levels flat and modularized refers to the modularization of the model. Second, factor question type refers to the influence of modularization that should be investigated. Factor level abstraction refers to questions that are presumably influenced by abstraction, but are not influenced by fragmentation. Contrariwise, factor level fragmentation refers to questions that are presumably influenced by fragmentation, but are not influenced by abstraction.

Objects

The objects of our study are four process models created with BPMN (the material can be downloaded from http://bpm.q-e.at/experiment/Modularization). In particular, we created two process models without modularization and then derived modularized versions from these models, operationalizing factor levels flat and modularized. When elaborating the models, we ensured that the models contain typical constructs, such as sequences, concurrency, exclusive choice and looping (cf. [207, 255]). To ensure a consistent layout, we used an automated algorithm for laying out the process models [94]. In addition, we assume that a certain model size is required for the positive aspects of modularization to appear (cf. Section 5.3). Therefore, we ensured that the size of the models, i.e., the number of nodes [142], such as activities or gateways, goes well beyond the recommended size of at most 50 nodes [145]. In particular, as summarized in Table 5.3, the flat version of Model 1 contains 134 nodes, whereas the flat version of Model 2 contains 131 nodes (subsequently, we refer to Model 1 as M1 and to Model 2 as M2). When also taking into account edges, i.e., sequence flows, 298 model elements can be found in M1 and 292 model elements in M2. Please note that the number of start events, end events and sequence flows differs between the flat and modularized versions, since process fragments that were extracted as sub–processes were enclosed in a complex activity with their own start event and end event.

Table 5.3: Process models used in E2 and R2
                      M1 flat   M1 mod.   M2 flat   M2 mod.
Activities                 84        84        85        85
Complex activities          –         5         –         7
XOR gateways               30        30        18        18
AND gateways               18        18        26        26
Start events                1         6         1         8
End events                  1         6         1         8
Sequence flows            164       174       161       175
Sub–processes               –         5         –         7
Nesting depth               –         2         –         3
Total nodes               134       149       131       152
Total elements            298       323       292       327

Finally, we would like to emphasize that for this study the quality of the modularization is not of concern. In particular, the framework presented in Section 5.3.2 assumes that for any modularization positive and negative effects can be found—the goal of this study is to provide empirical evidence that the framework is indeed able to identify these positive and negative influences. To operationalize factor question type, in particular factor level abstraction, we created questions that are presumably influenced by abstraction, but are not influenced by fragmentation. In other words, these questions were designed so that fewer model elements had to be taken into account when answering the question in the modularized version (cf. Section 5.3.1).
At the same time, we ensured that not more than one sub–process was necessary for answering the question, thereby preventing fragmentation. We would like to emphasize at this point that, as discussed in Section 5.3.2, information hiding and pattern recognition always occur at the same time. As E2 and R2 focus on quantitative data, we designed our questions to particularly focus on information hiding, since information hiding can be operationalized and measured rather easily in a quantitative setting (the investigation of pattern recognition requires methods that are also able to analyze the modeler’s reasoning processes in detail; therefore, in experiment E3 we pursue the exploration of pattern recognition through the adoption of think–aloud techniques, cf. [64]). For operationalizing factor level fragmentation, the opposite strategy was applied. The questions were designed such that the same number of model elements had to be considered, no matter whether the question was answered in the modularized or flat version—thereby preventing abstraction. At the same time, we ensured that fragmentation would occur by forcing the reader to integrate information from several sub–processes and to switch attention between sub–processes (cf. Section 5.3.1). Furthermore, we ensured that these questions refer to aspects that are typically needed for understanding a business process model, i.e., represent typical questions. To this end, we developed questions that are known to representatively cover aspects of control–flow, i.e., ordering, concurrency, exclusiveness and repetition [137, 138]. As illustrated in Figure 5.5, for each of these categories, for each model (M1 and M2) and for each question type (abstraction and fragmentation), questions were developed, leading to a total of 16 questions (4 * 2 * 2). In the following, we refer to these questions as Q1 to Q16.

Figure 5.5: Experimental design: questions

To ensure that questions operationalizing fragmentation or abstraction are equally comprehensive, we developed question schemata for each category. For instance, for category repetition we used the following schema: ’X’ can be executed several times for a single case. By replacing X with a specific activity from M1 or M2, fragmentation/abstraction questions were created. In addition, we ensured that all questions could be answered by using only information provided in the process models, i.e., the questions are schema–based comprehension tasks [120]. Likewise, the process models were made available to the subjects while answering the questions, i.e., the tasks can be considered to be read–to–do, cf. [23]. Finally, we use only closed questions, i.e., each question can be answered by selecting from the following answers: True, False and Don’t Know. We award one point for each correct answer and zero points for a wrong answer (including Don’t Know). We deliberately allow for the option Don’t Know, as otherwise subjects would be forced to guess.

Response Variables

To test hypotheses H4 to H9, we define the following response variables: mental effort (H4, H7), accuracy (H5, H8) and duration (H6, H9).
To measure mental effort, we employ a 7–point rating scale, asking subjects to rate the mental effort from Very low (1) over Medium (4) to Very high (7). As for experiment E1 and replication R1, we would like to emphasize that employing rating scales for measuring mental effort is known to be reliable and is widely adopted (cf. Section 3.2.3). Accuracy, in turn, is defined as the ratio of correct answers given by a subject divided by the number of all answers given. Hence, an accuracy of 1.0 indicates that all questions were answered correctly by a subject, while an accuracy of 0.0 indicates that all questions were answered incorrectly by a subject. Finally, duration is defined as the time required for answering a question (measured in seconds), i.e., the number of seconds it takes to read the question, interpret the process model and give the answer.

Experimental Design

As discussed in Section 3.3, Cheetah Experimental Platform (CEP) forms the basis for the empirical research conducted in this thesis. Accordingly, we describe the experimental design in the form of an experimental workflow. As shown in Figure 5.6, the experimental workflow starts by asking the subject for a code. Similar to E1 and R1, this code can be found on the assignment sheets distributed to subjects. By equally and randomly distributing codes 3563 and 9174, we ensure a randomized and balanced setup. In addition, the assignment sheets inform subjects that participation is anonymous and that no personal information will be collected. After having entered a valid code, subjects are guided through a demographic survey and a familiarization phase, in which subjects are introduced to the user interface. If the subject has entered code 3563, questions 1–8 will be presented for the modularized version of M1, followed by questions 9–16 for the flat version of M2. If the subject has entered code 9174, questions 1–8 will be presented for the flat version of M1, followed by questions 9–16 for the modularized version of M2. Finally, regardless of the code, a feedback form is shown to the subject.

Figure 5.6: Experimental workflow

Instrumentation and Data Collection Procedure

To present the experimental material, i.e., process models and questions, to subjects, we rely on Hierarchy Explorer provided by CEP (we would like to thank Thomas Porcham for supporting the implementation of Hierarchy Explorer). Hierarchy Explorer, as shown in Figure 5.7, provides a basic user interface for the visualization of hierarchical process models. In particular, on the top left (1), the structure of the process model is shown. In this particular case, the top–level process is called Scientific Process, having sub–processes Do systematic literature review and Perform empirical evaluation. Next to the process structure, in (2) a question about the process model, which is displayed in (3), is shown. As summarized in Table 5.3, the process models used in this study contain up to 152 nodes. Thus, locating an activity by reading and comparing activity labels may cause significant effort. As this study focuses on reasoning about the control flow of a process model rather than on the identification of activities, we decided to relieve subjects of this tiresome task.
In particular, whenever a question in (2) refers to an activity in the process model shown in (3), the user interface colors this particular part in the question as well as in the process model (we think that the coloring of activities is acceptable, since the task of locating activities is not of concern in this study; however, requiring subjects to repeatedly identify activities by comparing activity labels from up to 85 activities would most likely cause severe annoyance among subjects, hence impairing motivation). For instance, in Figure 5.7 activities Define the problem and Check if problem is already solved are colored accordingly in (2) and (3). Whenever a subject has answered a question, it is possible to navigate to the next question by clicking on the Next button (4). To constantly inform subjects about their progress, the current percentage of answered questions is displayed below Overall Progress in (4). Finally, to make the interaction with Hierarchy Explorer as simple as possible, the user cannot navigate back to previously answered questions. Similar to TDMS (cf. Section 4.6), Hierarchy Explorer was implemented as an experimental workflow activity of CEP. This, in turn, allows for a seamless integration into the experimental workflow (cf. Figure 5.6). In addition, CEP automatically logs the duration of any experimental workflow activity, thereby automatically assessing the time required for answering and collecting response variable duration.

Figure 5.7: User interface of Hierarchy Explorer

5.4.2 Performing the Experiment (E2)

Based upon the experimental setup described in Section 5.4.1, the controlled experiment E2 was conducted. Aspects regarding the preparation and operation of E2, as well as the subsequent data validation and data analysis, are covered in the following.

Experimental Operation of E2

Experimental Preparation
Similar to E1, the preparation of E2 can be divided into the preparation of the experimental material as well as the acquisition and training of subjects. We argued in Section 5.4.1 that the process models should reach a reasonable size and complexity so that the effects of modularization can be observed. As summarized in Table 5.3, the employed process models consist of 131 to 152 nodes, which is clearly above the recommended size of a sub–process of 50 nodes [145]. After having created the process models, assignment sheets describing the experimental tasks to be conducted were printed. In addition, a pre–configured version of CEP was compiled and made available through our website. To ensure that the used material and instructions were comprehensible, we piloted the study and iteratively refined the material. Regarding the acquisition and training of subjects, we could rely on an ongoing collaboration with Prof. Reijers from the Eindhoven University of Technology. In particular, Prof. Reijers agreed to conduct experiment E2 in the course of a lecture on Business Process Management. BPMN and the usage of sub–processes were topics already covered in this lecture; hence, participating students were trained accordingly.

Experimental Execution
Experiment E2 was conducted in January 2012 at the Eindhoven University of Technology in the course of a lecture on Business Process Management with 114 participants.
The subjects were given the prepared assignment sheets and advised to follow the instructions detailed on the assignment sheets. The rest of experiment E2 was guided by CEP's experimental workflow engine, leading subjects through an initial questionnaire, a familiarization phase and 16 questions regarding the prepared process models. As illustrated in Figure 5.6, 8 of the questions were posed for a modularized process model, whereas the remaining 8 questions were posed for a non–modularized process model. Finally, the experiment was concluded with a feedback form, enabling the subjects to report any problems or difficulties.

Data Analysis of E2

Up to now, we have described the experimental design of E2 as well as the experimental preparation and experimental operation. In the following, we report on the data validation and data analysis. For this purpose, we used SPSS (Version 21.0), if not indicated otherwise.

Data Validation

The data validation conducted for E2 consisted of two phases. As shown in Figure 5.7, we colored relevant activities in our process models to support subjects in the identification of activities. Hence, in the first phase, we screened our sample for subjects that had problems with identifying colors or were colorblind. To this end, one of the demographic questions asked subjects to indicate whether they had trouble identifying colors or were colorblind. Our data analysis showed that 5 out of 114 subjects affirmed this question and were hence removed from the sample, leaving 109 subjects for analysis.

In the second phase, we analyzed the demographic data of the remaining 109 subjects. As summarized in Table 5.4, questions 1–7 concerned the modeling proficiency of the subjects, i.e., confidence and competence using and understanding BPMN, experience and education. In addition, questions 8–10 asked for details of the models subjects had created or analyzed. Finally, questions 11–12 concerned the domain knowledge of subjects. For questions 1–3 and questions 11–12 we employed 7–point rating scales ranging from Strongly disagree (1) through Neutral (4) to Strongly agree (7). Subjects indicated average familiarity with BPMN (M = 4.57, SD = 1.20) and felt rather confident in understanding BPMN (M = 5.14, SD = 1.19). In addition, subjects indicated on average 7.90 months of experience in BPMN (SD = 11.09) and 2.58 years of experience in BPM in general (SD = 1.56). Furthermore, subjects indicated profound formal training (M = 5.55 days, SD = 5.78) and self–education (M = 6.83 days, SD = 11.18) during the last year. Questions 8–10 indicate that subjects had experience in reading process models (M = 52.08, SD = 49.4) and creating process models (M = 26.29, SD = 20.08), however, not necessarily experience with large process models (M = 16.32 activities per model, SD = 6.64). Finally, with respect to domain knowledge, subjects indicated that they were rather familiar with scientific work (M = 5.12, SD = 1.08), i.e., the domain used for process model M1, but rather unfamiliar with space travel (M = 2.97, SD = 1.36), i.e., the domain used for process model M2.

                                            Min.   Max.       M      SD
   1. Familiarity with BPMN                    2      7    4.57    1.20
   2. Confidence understanding BPMN            2      7    5.14    1.19
   3. Competence modeling BPMN                 2      7    4.58    1.14
   4. Months using BPMN                        0     75    7.90   11.09
   5. Years experience in BPMa                 0      7    2.58    1.56
   6. Days of formal training last year        1     40    5.55    5.78
   7. Days of self–education last year         0    100    6.83   11.18
   8. Process models read last year            8    300   52.08   49.4
   9. Process models created last year         1    100   26.29   20.08
  10. Avg. number of activities per model      3     40   16.32    6.64
  11. Familiarity scientific work              2      7    5.12    1.08
  12. Familiarity space travel                 1      6    2.97    1.36
  a Three subjects reported implausible values, e.g., 1000 years experience in BPM, hence they were excluded from this overview.

Table 5.4: Demographical statistics of E2

Summarizing, we conclude that the sample meets the demanded requirements, i.e., subjects should have received sufficient training in BPMN. Against this background, we conduct the data analysis and hypothesis testing in the following. In particular, we analyze RQ9 (positive influence of modularization) and RQ10 (negative influence of modularization) at three different levels of granularity. First, we give an overview, i.e., report aggregated values for all modularized models versus aggregated values for all non–modularized models. Then, we look into the differences for each model and finally report results for each question.

RQ9: Is the Understanding of a BPMN–Based Model Positively Influenced by Abstraction?

As summarized in Section 5.4.1, we expect that abstraction has a positive influence on mental effort (H4), accuracy (H5) and duration (H6). To test these hypotheses, we created modularized and non–modularized (flat) versions of models and posed questions for those models. For RQ9 we created questions that are presumably influenced by abstraction, but not impaired by fragmentation. Thus, the questions should be easier to answer in modularized models, i.e., should result in lower mental effort, higher accuracy and lower duration.

The results of this investigation are listed in Table 5.5. In particular, in the first column the hypotheses to be tested are listed. The second column shows the average values for modularized models, whereas the third column shows the average values for flat models; the difference between average values for modularized models and average values for flat models can be found in the fourth column. We would like to remind at this point that each subject was asked to answer 4 questions regarding abstraction for modularized models and 4 questions regarding abstraction for flat models. Hence, the values reported in Table 5.5 are the aggregated values for 4 questions. Mental effort was measured using a 7–point rating scale, resulting in a value range from 4 to 28. Accuracy, in turn, was defined to range from 0 to 1 per question; aggregated over 4 questions, values thus potentially range from 0 to 4. Likewise, durations are summed up for 4 questions. As expected, the average mental effort (H4) and duration (H6) were lower for modularized models, however, accuracy (H5) was slightly higher for flat models. Subsequently, columns five to seven list the results from statistical analysis. As the data is not normally distributed11 and, due to repeated measurements, response variables are not independent, we chose Wilcoxon Signed–Rank Test. In particular, Wilcoxon Signed–Rank Test reveals that the mental effort (H4) was significantly lower for modularized models (Z = −4.94, p = 0.000, r = 0.47); also the duration (H6) was significantly lower for modularized models (Z = −4.99, p = 0.000, r = 0.48). Considering Cohen's guidelines for effect sizes, an effect size of 0.1 is regarded as a small effect, 0.3 is regarded as a medium effect and values ranging around 0.5 are considered to be large effects [32, 33].
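As an illustration of how such a paired comparison and its effect size can be computed outside of SPSS, the following sketch applies the Wilcoxon Signed–Rank Test to per–subject scores and derives r from the Z–statistic via the normal approximation; the data, array names and group size are hypothetical and merely mimic the setup described above.

    import math
    import numpy as np
    from scipy import stats

    # Hypothetical per-subject mental-effort scores, aggregated over the
    # 4 abstraction questions (range 4-28), one pair per subject for the
    # two within-subject conditions (modularized vs. flat).
    modularized = np.array([11, 13, 10, 12, 14, 9, 12, 11, 13, 10, 12, 11])
    flat        = np.array([13, 15, 12, 13, 16, 11, 14, 12, 15, 12, 13, 14])

    # Wilcoxon Signed-Rank Test for paired, non-normally distributed samples.
    w_statistic, p_value = stats.wilcoxon(modularized, flat)

    # Effect size r = |Z| / sqrt(N), with N the number of subjects; here Z is
    # recovered from the two-sided p-value via the normal approximation.
    z = stats.norm.isf(p_value / 2)
    r = z / math.sqrt(len(modularized))

    # Interpretation according to Cohen's guidelines
    # (0.1 small, 0.3 medium, 0.5 large).
    label = "small" if r < 0.3 else "medium" if r < 0.5 else "large"
    print(f"W = {w_statistic:.1f}, p = {p_value:.3f}, r = {r:.2f} ({label} effect)")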
Therefore, for both comparisons large effects, i.e., values around 0.5, could be observed. For hypothesis accuracy (H5 ), in turn, no statistically significant differences could be found (Z = −1.07, p = 0.286, r = 0.10), also the effect size can be considered to be small. In summary, it can be said that support for hypothesis mental effort (H4 ) and duration (H6 ), i.e., statistically significant differences and large effect sizes, could be found. Hypothesis accuracy (H5 ), however, could not be supported, i.e., differences were not statistically significant and effect sizes were small. In the following, we refine our analysis by looking into the results model–by–model, i.e., analyzing M1 and M2 separately. 11 We would like to remind at this point that for the sake of readability tests for normal distribution were moved to Appendix A.3. 125 Chapter 5 The Impact of Modularization on Understandability Hypothesis H4 : Mental effort H5 : Accuracy H6 : Duration a Mod. 11.60 3.42 117.22 Flat 13.10 3.50 154.48 Δ −1.50 −0.08 −37.26 Z −4.94 −1.07 −4.99 p 0.000a 0.286 0.000a r 0.47 0.10 0.48 significant at the 0.05 level Table 5.5: Results for abstraction questions Abstraction: Results per Model Similar to the previous analysis, we organized our results in a table. In particular, as shown in Table 5.6, we analyzed hypotheses H4 , H5 and H6 separately for M1 and M2 . The first column lists the hypotheses, while the second column lists the models for which the hypotheses were tested. Columns three and four list the values for modularized and flat models, respectively, whereas column five shows the difference between modularized and flat models. The remaining columns six to eight, in turn, list the results of statistical tests. The data is not normal–distributed, however, the response variables are now independent, because the models are analyzed separately—thus, we applied Mann–Whitney U Test. The results, as shown in Table 5.6 are in line with the results from Table 5.5. Hypotheses mental effort (H4 ) and duration (H6 ) consistently show significant differences, even though the effect size is reduced to a medium effect. For hypothesis accuracy (H5 ), in turn, statistically significant differences could be found for M1 (even though opposing the expected direction); for M2 differences were not significant. To explain the reduction of effect size, we would like to refer to the fact that we used Wilcoxon Signed–Rank Test, i.e., a paired sample test, in Table 5.5, but used Mann–Whitney U Test, i.e., an unpaired sample test, in Table 5.6. As paired–sample tests eliminate inter–individual differences and thus reduce the standard error of the difference between means, paired test can more accurately detect differences [294], which, in turn, explains the differences in effect size. Summarizing, it can be said that the results obtained from analyzing each model separately for hypotheses H4 , H5 and H6 are in line with the results obtained from analyzing the models together, i.e., results from Table 5.5. In particular, the results suggest a significant influence of abstraction on mental effort (H4 ) and duration (H6 ), but do not support the influence on accuracy (H5 ). In the following, we will look into mental effort, accuracy and duration for each question. By doing so, we investigate whether findings are also consistent across questions. Furthermore, in our opinion, such a fine–grained analysis helps to get a more detailed picture of the collected data. 126 5.4 Evaluation Part I: BPMN Hypothesis Model Mod. 
Flat Δ U p r H4 : Mental effort M1 M2 11.98 11.19 13.15 13.05 −1.17 −1.86 1102.50 1090.00 0.020a 0.016a 0.22 0.23 H5 : Accuracy M1 M2 3.32 3.53 3.62 3.39 −0.30 0.14 1142.00 1337.00 0.015a 0.305 0.23 0.10 H6 : Duration M1 M2 136.75 96.59 179.77 130.54 −43.02 −33.95 764.00 853.00 0.000a 0.000a 0.42 0.37 a significant at the 0.05 level Table 5.6: Results for abstraction questions (per model) Abstraction: Results for Mental Effort (H4 ) So far, we have analyzed the impact of abstraction on the mental effort for all models as well as per model. Now, we investigate the influence of abstraction on mental effort on a per–question basis. Please remember that questions that operationalized abstraction and fragmentation were posed alternately, i.e., Q1 , Q3 , . . . Q15 operationalized abstraction, while Q2 , Q4 , . . . Q16 operationalized fragmentation (cf. Section 5.4.1). Thus, the summary in Table 5.7 only lists odd questions. Similar to the previous tables, in Table 5.7, the first two columns list the model and question. Then, mental effort for modularized and flat models as well as the difference thereof are shown. Finally, columns six to eight list results from testing for statistical significance using Mann–Whitney U Test. Again, it can be observed that the results are in line with the observations made so far, i.e., the mental effort for abstraction questions was tendentially lower in modularized models than in flat models. At the same time, it can also be observed that the differences were not statistically significant for all questions, even though differences from Table 5.5 and Table 5.6 were statistically significant. This discrepancy can be explained by the assumption that a response variable, i.e., mental effort in this case, is not only influenced by the treatment, i.e., abstraction and modularization, but also influenced by other factors, such as the subject’s knowledge or the phrasing of the question. In addition, it is assumed that the influence of the treatment is on average larger than the influence of such interference factors. Thus, when analyzing mental effort per model, the influence of the treatment is accumulated, but also the influence of interference factors is accumulated. As, however, the average influence of the treatment is larger than the average influence of interference factors, the difference between the influence of the treatment and the influence of interference factors grows—hence also differences are more likely to be found statistically significant. Against this background, also 127 Chapter 5 The Impact of Modularization on Understandability the results presented in Table 5.7 are in line with the findings presented so far. Model Quest. M1 M2 a Mod. Flat Δ U p r Q1 Q3 Q5 Q7 2.75 3.14 3.02 3.07 2.85 3.75 3.58 2.96 −0.10 −0.61 −0.56 0.11 1366.50 968.50 1017.50 1385.00 0.452 0.001a 0.002a 0.519 0.07 0.31 0.29 0.06 Q9 Q11 Q13 Q15 2.85 2.89 2.87 2.58 3.05 3.48 3.20 3.32 −0.20 −0.59 −0.33 −0.74 1291.00 1128.00 1239.50 958.50 0.223 0.025a 0.120 0.001a 0.12 0.21 0.15 0.32 significant at the 0.05 level Table 5.7: H4 —Mental effort for abstraction questions Abstraction: Results for Accuracy (H5 ) Analogous to mental effort (H4 ), in the following we summarize the average accuracy of each individual question (H5 ). In particular, Table 5.8 shows the average accuracy, differences between modularized and flat models as well as results from statistical tests. Here, two observations are of special interest. 
First, except for Q15, observed differences are marginal and non–significant. Second, results rather point toward a negative influence of abstraction—even though the only statistically significant difference indicates a positive influence of abstraction on accuracy. These two observations further substantiate that in E2 the influence of abstraction on accuracy appears to be inconsistent and rather weak. Not surprisingly, the average effect size from Table 5.8 is 0.10, i.e., only a small effect could be observed (cf. [32, 33]).

  Model   Quest.    Mod.    Flat       Δ         U        p       r
  M1      Q1        0.82    0.94   −0.12   1303.00   0.051    0.19
          Q3        0.79    0.91   −0.12   1306.00   0.086    0.16
          Q5        0.77    0.83   −0.06   1391.50   0.420    0.08
          Q7        0.95    0.94    0.01   1479.50   0.945    0.01
  M2      Q9        0.89    0.91   −0.02   1448.50   0.680    0.04
          Q11       0.77    0.75    0.02   1449.00   0.774    0.03
          Q13       0.89    0.91   −0.02   1448.50   0.680    0.04
          Q15       0.98    0.82    0.16   1247.00   0.006a   0.26
  a significant at the 0.05 level

Table 5.8: H5—Accuracy for abstraction questions

Abstraction: Results for Duration (H6)

To conclude the analysis of RQ9, i.e., the influence of abstraction, we look into the average duration required for answering questions. As summarized in Table 5.9, subjects were tendentially faster when answering questions in modularized models. The observed differences were found to be statistically significant in all but one question (Q9), i.e., are in line with the findings presented up to now. However, we also would like to call attention to Q7: in this case subjects were significantly slower in modularized models. A detailed analysis of Q7 showed that the question was phrased such that subjects had to screen the process models' sub–processes to find all relevant activities. As Q7 could then be answered by considering a single sub–process, it thereby did not cause fragmentation, i.e., switching between sub–processes while answering the question. However, it apparently caused significant overhead for locating the relevant activities, which explains the higher duration for modularized models. In this sense, Q7 probably did not operationalize abstraction perfectly. As, however, the remaining questions seem to counterbalance this issue, i.e., the aggregated values from Table 5.5 and Table 5.6 are still statistically significant, we refrained from excluding Q7 from this analysis.

  Model   Quest.    Mod.    Flat        Δ         U        p       r
  M1      Q1       32.80   51.89   −19.09    698.00   0.000a   0.46
          Q3       40.00   50.77   −10.77    747.00   0.000a   0.43
          Q5       29.30   47.75   −18.45    658.00   0.000a   0.48
          Q7       34.65   29.37     5.28   1068.00   0.012a   0.24
  M2      Q9       36.52   40.11    −3.59   1339.00   0.379    0.08
          Q11      24.98   34.65    −9.67   1020.50   0.005a   0.27
          Q13      22.45   34.41   −11.96    720.00   0.000a   0.44
          Q15      12.63   21.38    −8.75    604.00   0.000a   0.51
  a significant at the 0.05 level

Table 5.9: H6—Duration for abstraction questions

Concluding, it can be said that experiment E2 provides empirical support for the positive influence of abstraction on mental effort (H4) and duration (H6), while no statistically significant influence on accuracy (H5) could be found. In the following, we investigate RQ10, i.e., the influence of fragmentation.

RQ10: Is the Understanding of a BPMN–Based Model Negatively Influenced by Fragmentation?

In the following, we approach RQ10 analogous to RQ9. More specifically, as summarized in Section 5.4.1, we expect that fragmentation has a negative influence on mental effort (H7), accuracy (H8) and duration (H9).
Therefore, we created questions that are presumably impaired by fragmentation, but not influenced by abstraction. Thus, the questions should be harder to answer in modularized models, i.e., should result in higher mental effort, lower accuracy and higher duration. We again start by taking a look at the hypotheses for the aggregated values of all questions, report findings from M1 and M2 separately and finally report values for each question. Hypothesis Mod. H7 : Mental effort H8 : Accuracy H9 : Duration a 16.17 2.70 176.87 Flat 13.06 3.45 120.08 Δ 3.11 −0.75 56.79 Z −7.20 −5.28 −5.92 p 0.000a 0.000a 0.000a r 0.69 0.51 0.57 significant at the 0.05 level Table 5.10: Results for fragmentation questions An overview of the results of RQ10 can be found in Table 5.10. Akin to RQ9 , the columns list the hypotheses to be tested, values for modularized and flat models, differences thereof as well as results from statistical tests. Again, we relied on Wilcoxon Signed–Rank Test, since the data is not normal–distributed and the sample is due to the repeated measurements not independent. Wilcoxon Signed–Rank Test shows that mental effort for modularized models (H7 ) was significantly higher (Z = −7.20, p = 0.000, r = 0.69), accuracy for modularized models (H8 ) was significantly lower (Z = −5.28, p = 0.000, r = 0.51) and duration for modularized models (H9 ) was significantly higher (Z = −5.92, p = 0.000, r = 0.57). In addition, large effect sizes, i.e., values ≥0.5, could be observed (cf. [32, 33]). Summarizing, we conclude that support for hypotheses mental effort (H7 ), accuracy (H8 ) and duration (H9 ), i.e., statistically significant differences and large effect sizes, could be found. In the following, we extend our analysis by investigating the results per model. 130 5.4 Evaluation Part I: BPMN Fragmentation: Results per Model To determine whether the reported findings are specific for M1 or M2 , or could be found for both models, we analyzed H7 , H8 and H9 separately for M1 and M2 . The results of this analysis can be found in Table 5.11, whereby the columns again list hypotheses, models, values for modularized models and flat models, differences thereof as well as results from the statistical analysis. As per RQ9 , the data is not normal–distributed and variables are independent due to the separate analysis of models, hence we applied Mann–Whitney U Test. It can be observed that the results from Table 5.11 are in line with previous findings, i.e., the influence of fragmentation on mental effort (H7 ), accuracy (H8 ) as well as duration (H9 ) is present for both models and statistically significant. However, again, effect sizes slightly decrease, which can be traced back to the application of tests for unpaired samples (cf. [294]). Hypothesis Model Mod. Flat Δ U H7 : Mental effort M1 M2 15.64 16.74 12.32 13.75 3.32 2.99 575.00 797.00 0.000a 0.000a 0.53 0.40 H8 : Accuracy M1 M2 2.82 2.57 3.66 3.25 −0.84 −0.68 806.00 1040.50 0.000a 0.005a 0.43 0.27 H9 : Duration M1 M2 262.94 219.27 128.40 112.22 134.54 107.05 197.00 393.00 0.000a 0.000a 0.75 0.63 a p r significant at the 0.05 level Table 5.11: Results for fragmentation questions In summary, it can be observed that results obtained from analyzing M1 and M2 separately are in line with the results obtained from analyzing models together, i.e., results from Table 5.10. In the following, we refine our analysis by investigating the effects on individual questions. 
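Before turning to the individual questions, the same comparison logic is sketched below on a per–question basis: for each question, the two independent groups (subjects who saw the modularized version versus subjects who saw the flat version) are compared with the Mann–Whitney U Test and an effect size is derived. All data, group sizes and identifiers are hypothetical.

    import math
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    results = []
    for question in ["Q2", "Q4", "Q6", "Q8"]:
        # Hypothetical 7-point mental-effort ratings of two independent groups
        # (each subject answered a given question in only one condition).
        modularized = rng.integers(2, 7, size=55)   # ratings 2..6
        flat = rng.integers(2, 5, size=54)          # ratings 2..4

        u_statistic, p_value = stats.mannwhitneyu(
            modularized, flat, alternative="two-sided")

        # Effect size r = |Z| / sqrt(N), N being the total number of
        # observations; Z is approximated from the two-sided p-value.
        n_total = len(modularized) + len(flat)
        z = stats.norm.isf(p_value / 2)
        r = z / math.sqrt(n_total)

        results.append((question, modularized.mean(), flat.mean(),
                        u_statistic, p_value, r))

    for question, mod_mean, flat_mean, u, p, r in results:
        print(f"{question}: Mod. {mod_mean:.2f}  Flat {flat_mean:.2f}  "
              f"U = {u:.1f}  p = {p:.3f}  r = {r:.2f}")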
Fragmentation: Results for Mental Effort (H7)

So far, we have analyzed the influence on mental effort (H7) for all models as well as per model. Now, we look into the effects on each question individually. We would like to remind that questions that operationalized abstraction and fragmentation were posed alternately, i.e., Q1, Q3, ... Q15 operationalized abstraction, while Q2, Q4, ... Q16 operationalized fragmentation. For this reason, Table 5.12 lists only even questions. Analogous to the analysis of H4, the first two columns list the model and question, while columns three to five show values for modularized and flat models as well as differences thereof; columns six to eight report results from statistical tests. It can be observed that for all questions the average mental effort was higher for modularized models. In addition, all differences could be found to be statistically significant and effect sizes range from medium to large, as indicated in columns six to eight. Hence, it can be said that the results obtained from the question–based analysis are in line with the findings obtained so far.

  Model   Quest.    Mod.    Flat      Δ         U        p       r
  M1      Q2        3.82    2.74   1.08    697.50   0.000a   0.47
          Q4        4.14    2.83   1.31    491.50   0.000a   0.60
          Q6        4.05    3.62   0.43   1139.00   0.029a   0.21
          Q8        3.63    3.13   0.50   1091.50   0.012a   0.24
  M2      Q10       4.75    3.55   1.20    693.50   0.000a   0.47
          Q12       4.09    3.59   0.50   1041.50   0.006a   0.27
          Q14       3.96    3.32   0.64   1002.50   0.003a   0.29
          Q16       3.92    3.29   0.63   1033.00   0.005a   0.27
  a significant at the 0.05 level

Table 5.12: H7—Mental effort for fragmentation questions

Fragmentation: Results for Accuracy (H8)

Analogous to mental effort (H7), in the following we summarize the average accuracy (H8) of each individual question. As can be seen in Table 5.13, the average accuracy for fragmentation questions was lower in modularized models, cf. column five. However, differences are only statistically significant for three questions and effect sizes are lower than for the previous analyses. As argued for abstraction, this might be traced back to the assumption that differences are more likely to be found significant when values are aggregated, as the effect of the treatment sums up. Against this background, also the results from Table 5.13 are in line with findings obtained so far.

  Model   Quest.    Mod.    Flat       Δ         U        p       r
  M1      Q2        0.75    0.89   −0.14   1281.00   0.066    0.18
          Q4        0.79    0.94   −0.15   1250.00   0.017a   0.23
          Q6        0.84    0.92   −0.08   1357.50   0.172    0.13
          Q8        0.45    0.91   −0.46    802.50   0.000a   0.49
  M2      Q10       0.40    0.82   −0.42    853.00   0.000a   0.43
          Q12       0.68    0.82   −0.14   1273.00   0.087    0.16
          Q14       0.81    0.86   −0.05   1416.00   0.522    0.06
          Q16       0.68    0.75   −0.07   1379.00   0.415    0.08
  a significant at the 0.05 level

Table 5.13: H8—Accuracy for fragmentation questions

Fragmentation: Results for Duration (H9)

To conclude RQ10, we look into the influence of fragmentation on duration (H9) for each question. As summarized in Table 5.14, for all questions subjects spent more time answering questions in the modularized versions of the model. In addition, all differences could be found to be statistically significant and effect sizes range from medium to large. In this sense, also the results from Table 5.14 are in line with the findings obtained up to now. Summarizing, it can be said that the data obtained in E2 corroborates the positive influence of abstraction (RQ9) as well as the negative influence of fragmentation
In particular, for abstraction, hypotheses mental effort (H4 ) and duration (H6 ) could be supported, whereas no indication for a significant influence on accuracy could be found (H5 ). For fragmentation, in turn, support for mental effort (H7 ), accuracy (H8 ) and duration (H9 ) could be found. In the following, we look into the correlation between mental effort and accuracy as well as the correlation between mental effort and duration. Correlations with Mental Effort In Section 5.3.2, we argued that we centered our framework around mental effort, since it is known to correlate with performance and therefore can be used to estimate understandability. Subsequently, we investigate whether mental effort is indeed linked to answering performance measured in E2 , i.e., accuracy and duration. In particular, we computed the average mental effort, accuracy and duration for each question. Since we expect different values, depending on whether a question was posed for the modularized or flat version of a process model, we differentiated between questions that were posed for modularized models and questions that were posed for flat models. Having asked 8 questions per model for 2 models and considering the differentiation between modularized and flat models, in total 32 data points could be computed (8 * 2 * 2). The first part of this analysis can be found in Figure 5.8, which shows the correlation between mental effort and accuracy. Therein, the x–axis represents the mental effort, ranging from Extremely low mental effort (1) to Extremely high mental effort (7). On the y–axis, in turn, the corresponding accuracy can be found, ranging from 0 (all answers wrong) to 1 (all answers correct). 133 Chapter 5 The Impact of Modularization on Understandability Model M1 M2 a Quest. Mod. Flat Δ U p r Q2 Q4 Q6 Q8 71.29 68.71 74.98 47.95 36.96 22.79 40.20 28.45 34.33 45.92 34.78 19.50 583.00 181.00 546.00 742.00 0.000a 0.000a 0.000a 0.52 0.76 0.54 0.43 Q10 Q12 Q14 Q16 88.07 40.28 56.53 34.40 36.21 25.69 28.41 21.91 51.86 14.59 28.12 12.49 363.00 817.00 877.00 916.00 0.000a 0.000a 0.000a 0.001a 0.65 0.39 0.35 0.33 0.000a significant at the 0.05 level Table 5.14: H9 —Duration for fragmentation questions Considering Figure 5.8, three observations can be made. First, most questions were perceived to be rather easy, as most of the data points can be found to the left of Neither high nor low mental effort (4). Likewise, for most questions an accuracy of at least 70% could be computed. Second, the scatter diagram suggests that high mental effort is related to low accuracy. In fact, mental effort and accuracy show a significant negative correlation according to Pearson Correlation (r(30) = −0.643, p = 0.000), corroborating the assumption that mental effort and accuracy are related. Third, even though most questions seem to show similar behavior, two questions seem not to fit in. In particular, Q8 and Q10 posed for modularized models resulted in a below–average accuracy. Although it is not surprising that some questions were possibly more difficult than others, it seems that the relation between mental effort and accuracy for Q8 and Q10 differs with respect to the other questions. To quantify these differences, we applied simple linear regression, which shows that mental effort significantly predicts accuracy: accuracy = 1.37 − 0.16 ∗ mental effort, t(30) = −4.60, p = 0.000 and explains a significant proportion of variance in accuracy, R2 = 0.41, F (1, 30) = 21.16, p = 0.000. 
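The correlation and regression analysis reported above can be reproduced along the lines of the following sketch; the 32 per–question averages are replaced by hypothetical values and scipy is used in place of SPSS.

    import numpy as np
    from scipy import stats

    # Hypothetical average mental effort and accuracy for each of the
    # 32 question/condition pairs (16 questions x {modularized, flat}).
    rng = np.random.default_rng(3)
    mental_effort = rng.uniform(2.5, 4.8, size=32)
    accuracy = np.clip(1.37 - 0.16 * mental_effort + rng.normal(0, 0.08, 32), 0, 1)

    # Pearson correlation, reported as r(df) with df = n - 2.
    r, p = stats.pearsonr(mental_effort, accuracy)
    print(f"r({len(accuracy) - 2}) = {r:.3f}, p = {p:.3f}")

    # Simple linear regression: accuracy = intercept + slope * mental effort.
    reg = stats.linregress(mental_effort, accuracy)
    print(f"accuracy = {reg.intercept:.2f} + ({reg.slope:.2f}) * mental effort, "
          f"R^2 = {reg.rvalue ** 2:.2f}")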
To identify outliers, we computed the residual for each question and calculated the Median Absolute Deviation (MAD) [82]. In particular, values differing more than 3 times the MAD from the median were considered outliers. Applying this rather conservative criterion [130], Q8 and Q10 were detected as the only outliers. Therefore, in the following we discuss potential reasons for the unexpected behavior of Q8 and Q10.

One potential explanation could be that questions Q8 and Q10 were worded imprecisely or were difficult to comprehend. However, as summarized in Table 5.15, when the same question was asked for flat models, a considerably higher accuracy was reached. Another potential explanation could be that ordering and repetition questions were particularly difficult. Again, we would like to refer to Table 5.15 to refute this explanation: except for Q8 and Q10, the average accuracy was above average (average accuracy for E2: 82%). Finally, it may have been the case that ordering and repetition questions were particularly difficult when posed in modularized models. As described in Section 5.4.1, we posed the same type of questions, i.e., ordering and repetition in this case, for M1 as well as M2. However, these particularly low accuracy values could only be found once for M1 and once for M2.

Figure 5.8: Mental effort versus accuracy, adapted from [298]

                     M1                            M2
              Quest.   Mod.   Flat          Quest.   Mod.   Flat
  Ordering    Q7       95%    94%           Q15      98%    82%
              Q8       45%    91%           Q16      68%    91%
  Repetition  Q1       82%    94%           Q9       89%    91%
              Q2       75%    89%           Q10      40%    82%

Table 5.15: Average accuracy for ordering and repetition questions

Against this background, also this explanation seems implausible. Hence, due to lack of alternative explanations, we conclude that these outliers can be traced back to peculiarities of the sample.

Analogously, we used the same procedure to investigate the correlation between mental effort and duration. As shown in Figure 5.9, the average mental effort and average duration required for answering a question seem to be correlated. Indeed, Pearson Correlation indicates a positive correlation between mental effort and duration (r(30) = 0.742, p = 0.000). Analogous to the correlation between mental effort and accuracy, we applied simple linear regression: duration = −44.41 + 24.93 ∗ mental effort, t(30) = 6.06, p = 0.000, whereby mental effort explains a significant proportion of variance in duration: R2 = 0.55, F (1, 30) = 36.70, p = 0.000. Applying the median ± 3 MAD criterion to the residuals confirms that all data points behave similarly, further corroborating the connection between mental effort and duration.

Figure 5.9: Mental effort versus duration, adapted from [298]

Therefore, we conclude that—even though two unexplainable outliers were identified for accuracy—mental effort and accuracy as well as mental effort and duration are correlated. In other words, E2 corroborates the assumption that mental effort can be used as an estimator for answering performance such as accuracy and duration. Next, we discuss limitations of E2 before we revisit the findings for a discussion.

Limitations

Clearly, experiment E2 has to be seen in the light of several limitations, mostly concerning the generalization of results.
Particularly, even though 114 subjects participated in E2 , all of the subjects indicated that they were students. As argued in Section 4.9, students may provide an adequate model for the professional population, if certain conditions, e.g., an adequate level of commitment, are fulfilled. Since 82% of the questions were answered correctly, we conclude that subjects were seriously interested in properly working on their assignments, indicating commitment. Furthermore, it should also be noted that experiment E2 involved only two process models. Even though these models contained typical constructs, e.g., sequences and concurrency, apparently not all potential constructs can be covered. Likewise, the process models were modeled using BPMN, i.e., E2 is restricted to a single process modeling language. Although BPMN can be considered to be a de–facto standard and the findings therefore contribute to the understanding of a commonly used process modeling language, generalization to other languages is limited. Finally, even though questions asked in E2 were designed to cover typical modeling constructs, it may be the case that the questions were not entirely representative. Discussion Overall, the data collected in experiment E2 provides support for the positive influence of abstraction (RQ9 ) and the negative influence of fragmentation (RQ10 ) on the understanding of BPMN–based process models. Thereby, we operationalized the understanding of a model as the mental effort, accuracy and duration required for answering a question. In the following, we link these findings with existing insights from literature and discuss potential reasons for the missing influence of abstraction on accuracy. When conducting the systematic literature review assessing the current state of the art regarding empirical studies investigating the interplay between modularization and understanding, we could observe that researchers seemed to have a rather positive attitude toward modularization. For instance, Cruz–Lemus et al. [42] state that “our hypotheses derive from conventional wisdom, which says that hierarchical modeling mechanisms are helpful in mastering the complexity of a software system”. In a similar vein, [150, 225] seek support for the positive effects of mechanisms of modularization. In turn, Cruz–Lemus et al. seem to be surprised that their studies 137 Chapter 5 The Impact of Modularization on Understandability showed a negative influence of modularization: “this finding goes against conventional wisdom” [42]. Against this background, the findings from E2 improve over the state of the art by providing empirical evidence that modularization not necessarily improves all aspects of a model. Rather, it appears that a certain trade–off is involved in the application of modularization. In particular, E2 empirically substantiates that abstraction and fragmentation may offer a viable explanation for the interplay of modularization and understandability. Even though we argue that E2 provides empirical evidence for the positive influence, of abstraction, it remains unclear why no influence on accuracy (H5 ) could be found. The most obvious explanation for the lack of correlation between abstraction and accuracy in E2 is simply that abstraction does not influence accuracy, i.e., the postulated framework from Section 5.3.2 is incorrect. However, as argued in the following, we think that this explanation—even though obvious at first—is implausible. 
Having established that mental effort and accuracy as well as mental effort and duration correlate and knowing that abstraction had a significant influence on mental effort and duration, it seems implausible that no connection between abstraction and accuracy exists. Hence, we conclude that the lack of significant differences can be traced back to peculiarities of the experimental setup that could not be detected when analyzing the data of E2 . To test this assumption, we conducted a replication of E2 with the goal of understanding the interplay between abstraction and accuracy. 5.4.3 Performing the Replication (R2 ) The replication of experiment E2 , subsequently referred to as R2 , was conducted for two reasons. First, in E2 we could provide empirical evidence for the positive influence of abstraction on mental effort and duration, but could not find any influence on accuracy. Hence, the first goal of R2 was to further investigate this unexpected outcome in detail. Second, by conducting a replication, the findings obtained in E2 should be confirmed by a second empirical investigation. Due to the nature of a replication, the experimental design of R2 is closely connected to the experimental design of E2 . Therefore, in the following, we will not repeat the experimental design described in Section 5.4.1 here, but rather discuss the differences between the experimental designs of E2 and R2 . Similar to E2 , then we focus on the preparation and operation of R2 , as well as data validation and analysis. 138 5.4 Evaluation Part I: BPMN Experimental Design Basically, the experimental design of R2 is identical to the experimental design of E2 , except for one difference: Besides asking subjects comprehension questions, we asked subjects to justify their answer. To this end, we extended the graphical user interface of Hierarchy Explorer, as shown in Figure 5.7, by replacing the progress bar with a text field. Subjects were then asked to answer the question, but also to briefly explain why they had decided for a certain answer. Through this adaption, we intended to extend the measurement of mental effort, accuracy and duration with more detailed information about accuracy. Particularly, we planned to analyze incorrect answers, trying to investigate the connection between abstraction and accuracy in more detail. For the analysis of incorrect answers, the adoption of principles from grounded theory [35, 234] in a two–step procedure was planned. First, in the open coding phase all justifications of incorrect answers were collected. For each justification, a potential cause for the incorrect answer was determined, e.g., lack of knowledge or misunderstood questions. After all justifications received an initial classification, the procedure was repeated, all classifications were revisited and adapted if necessary. This procedure was repeated until the classification was stable, i.e., none of the classifications was changed anymore. Second, in the axial coding phase classifications were repeatedly grouped to higher level categories. Experimental Operation of R2 Experimental Preparation Since R2 builds upon the experimental setup of E2 , all of the experimental material, i.e., process models and questions, could be reused for R2 . Still, Hierarchy Explorer had to be extended with a text field in order to allow subjects justifying their answer and the assignments sheets had to be adapted to instruct subjects to justify their answers. 
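To illustrate the two–phase coding procedure outlined above, the following sketch tallies hypothetical open codes for incorrect answers and groups them into higher–level (axial) categories; the codes, categories and counts are invented for illustration and do not reflect the actual coding.

    from collections import Counter

    # Open coding: one (iteratively revised) code per justification of an
    # incorrect answer. All entries are hypothetical.
    open_codes = [
        "confused 'both A and B' with 'A parallel to B'",
        "overlooked loop in sub-process",
        "confused 'both A and B' with 'A parallel to B'",
        "misread the question",
        "overlooked loop in sub-process",
        "justification too vague to classify",
    ]

    # Axial coding: open codes grouped into higher-level categories.
    categories = {
        "terminology / background": ["confused 'both A and B' with 'A parallel to B'"],
        "model reading error": ["overlooked loop in sub-process"],
        "question comprehension": ["misread the question"],
        "not analyzable": ["justification too vague to classify"],
    }

    code_counts = Counter(open_codes)
    for category, codes in categories.items():
        total = sum(code_counts[code] for code in codes)
        print(f"{category}: {total} incorrect answer(s)")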
In addition, experimental preparation included the acquisition of subjects. Unlike for E2 , where a large number of students was available, we could not find any opportunity to conduct R2 in a controlled setting. Therefore, we decided to conduct R2 as an online study and to acquire subjects at the University of Innsbruck by acquiring volunteers among students and researchers working in related areas. In addition, we asked researchers we were collaborating with to ask students and co–workers to participate in the experiment. We would like to emphasize at this point that we are aware that this procedure will inevitably lead to a heterogeneous sample and an uncontrolled setting. However, due to the lack of alternatives, we decided for this procedure and put a particular emphasis on the data validation for R2 . As the experimental design of R2 was operationalized through an experimental workflow in CEP, the experimental material 139 Chapter 5 The Impact of Modularization on Understandability could easily be distributed to subjects by handing out respective download links to a preconfigured version of CEP. Experimental Execution Replication R2 was started in April 2012 and ended in July 2012, during this time 48 subjects participated. Even though we could not control the environment in which subjects were answering the comprehension questions, the experimental workflow of CEP ensured that none of the subjects skipped a task and that data was collected successfully. Data Analysis of R2 So far, we have focused on the experimental design of R2 as well as experimental preparation and experimental operation. Next, we describe the data validation and data analysis. Data Validation Replication R2 was conducted in an uncontrolled setting, hence particular care needs to be taken with respect to the validity of the collected data. First, and analogous to E2 , we screened our sample for subjects that had problems in identifying colors. Unlike in E2 , none of the subjects had to be excluded from the data analysis due to this reason. Second, we could not ensure that subjects fully focused on their tasks. Rather, it seems likely that the answering process may were interrupted by, e.g., telephone calls, answering emails or surfing the web. To assess whether subjects seriously worked on their assignments, we computed the average accuracy for each subject. Surprisingly, the average accuracy was computed to be 89%, i.e., 7% higher than for E2 . In addition, 46 out of 48 subjects achieved an accuracy of at least 75% and all subjects achieved an accuracy of more than 56%. Against this background, it seems plausible to conclude that subjects were working on their assignments seriously. Similarly, the average mental effort was 3.29, which is approximately Rather low mental effort. The highest reported mental effort, in turn, was 4.9, which is approximately Rather high mental effort. Hence, we conclude that none of the subjects was overstrained by the experimental tasks. Still, a high accuracy and the lack of exceedingly high mental effort does not necessarily indicate that subjects were not interrupted, i.e., measured durations for answering questions may still fluctuate unexpectedly. To detect these anomalies, we screened the sample for outliers. As outliers should not be removed without justification [227], we employed the Median Absolute Deviation (MAD) [82] for detecting outliers.12 In 12 SPSS (Version 21.0) does not provide a satisfactory procedure for computing the MAD. 
Therefore, we exported data to R [200], computed MAD and re–imported data in SPSS. 140 5.4 Evaluation Part I: BPMN particular, we considered values differing from the median 3 times the MAD or more as outliers. Applying this rather conservative criterion [130] led to the exclusion of 18 subjects for analyzing duration. We would like to point out that we did not exclude these subjects from analyzing mental effort and accuracy, since these values appear to be plausible. Next, we analyzed the subject’s demographical information, as summarized in Table 5.16. Analogous to E2 , questions 1–7 concerned modeling proficiency, questions 8–10 asked for specifics of models subjects had created or analyzed. Finally, questions 11–12 covered domain knowledge. For questions 1–3 and questions 11– 12, we employed a 7–point rating scale ranging from Strongly (1) over Neutral (4) to Strongly agree (7). Overall, subjects indicated that they were rather familiar with BPMN (M = 5.17, SD = 1.40), felt rather confident in understanding BPMN (M = 5.52, SD = 1.09) and felt rather competent in modeling BPMN–based models (M = 5.13, SD = 1.08). Furthermore, subjects had on average 25.27 months of experience in modeling BPMN (SD = 19.66) and 3.85 years of experience in BPM in general (SD = 3.43). In addition, it can be observed that subjects had received formal training (M = 3.00 days, SD = 9.21) and considerable self–education (M = 15.94 days, SD = 22.61). Questions 8–10 indicate that subjects had experience in reading process models (M = 41.42, SD = 50.12) and creating process models (M = 15.60, SD = 20.32), however, not necessarily experience with large models (M = 26.29 activities per model, SD = 41.75). Finally, regarding domain knowledge, subjects indicated familiarity with scientific work (M = 5.65, SD = 1.14), i.e., the domain used for process model M1 , but indicated rather unfamiliarity with space travel (M = 3.33, SD = 1.56), i.e., the domain used for process model M2 . Summarizing, it can be said that—even though some of the participants were rather new to BPMN—the average subject indicated considerable training and experience in BPMN. Knowing that from 48 subjects, 26 subjects indicated that they were students, but 16 subjects indicated academic background and 6 subjects indicated professional background, this appears plausible. Hence, we conclude that subjects meet the demanded requirements, i.e., subjects should have received sufficient training in BPMN. Against this background, in the following we continue with the data analysis and hypothesis testing. Analogous to E2 , we approach R9 (positive influence of modularization) and RQ10 (negative influence of modularization) at three levels of granularity. First, we give an overview, i.e., report aggregated values for all modularized models versus all non–modularized models. Then, we analyze process models M1 and M2 separately. Finally, we analyze each question separately. 141 Chapter 5 The Impact of Modularization on Understandability Min. Max. M SD Familiarity with BPMN Confidence understanding BPMN Competence modeling BPMN Months using BPMN Years experience in BPM Days of formal training last year Days of self–education last year 1 3 3 0 0 0 0 7 7 7 82 20 62 100 5.17 5.52 5.13 25.27 3.85 3.00 15.94 1.40 1.09 1.08 19.66 3.43 9.21 22.61 8. Process models read last year 9. Process models created last year 10. Avg. number of activities per model 0 0 0 220 100 200 41.42 15.60 26.29 50.12 20.32 41.75 11. Familiarity scientific work 12. 
Familiarity space travel 3 1 7 7 5.65 3.33 1.14 1.56 1. 2. 3. 4. 5. 6. 7. Table 5.16: Demographical statistics of R2 RQ9 : Is the Understanding of a BPMN–Based Model Positively Influenced by Abstraction? Analogous to E2 , we expect in replication R2 a positive influence of abstraction on mental effort (H4 ), accuracy (H5 ) and duration (H6 ). As described in Section 5.4.1, we created questions that are presumably influenced by abstraction, but are not imposed by fragmentation. Hence, these questions should be easier to be answered in modularized model, i.e., should result in lower mental effort, higher accuracy and lower duration. Hypothesis Mod. Flat Δ Z H4 : Mental effort H5 : Accuracy H6 : Duration 11.50 3.58 256.46 12.88 3.65 322.72 −1.38 −0.07 −66.26 −3.15 −0.53 −2.09 a p 0.002a 0.599 0.037a r 0.45 0.08 0.38 significant at the 0.05 level Table 5.17: Results for abstraction questions The results regarding RQ9 are summarized in Table 5.17. In particular, the first column lists hypotheses, columns two to four show values for modularized models, flat models and the differences thereof. Columns five to seven, in turn, list the 142 5.4 Evaluation Part I: BPMN results of statistical tests. Not all data is normal–distributed13 and due to repeated measurements, response variables are not independent. Hence, to make results better comparable with E2 , we refrained from using parametric tests and chose Wilcoxon Signed–Rank Test to test for statistical significance. In particular, Wilcoxon Signed– Rank Test shows that mental effort (H4 ) was significantly lower in modularized models (Z = −3.15, p = 0.002, r = 0.45) and duration (H6 ) was significantly lower in modularized models (Z = −2.09, p = 0.037, r = 0.38), however, no significant differences could be found for accuracy (H5 ) in R2 (Z = −0.53, p = 0.599, r = 0.08). Analogously, the effect sizes for mental effort and duration can be considered large and medium, respectively, while the effect size for accuracy is small (cf. [32, 33]). We would like to remind at this point that subjects were asked to justify their answers. Hence, it appears plausible that the average duration in R2 were almost twice as high as the average duration in E2 . In the following, we refine our analysis by analyzing results for process models M1 and M2 separately. Abstraction: Results for Model Similar to the previous analysis and analogous to E2 , we have analyzed the results for M1 and M2 separately. In particular, Table 5.18 lists the hypotheses, models, values for modularized as well as flat models and the difference thereof. Columns six to eight, in turn, report results from statistical tests. Analogous to E2 , we applied Mann–Whitney U Test to test for statistical significance, since models were analyzed separately and therefore the response variables were independent. Again, results are in line with findings obtained so far: mental effort and duration were on average lower in modularized models, while differences regarding accuracy appear to be marginal. However, except for duration, differences are not statistically significant. As discussed in E2 , this can be partially traced back to the application of statistical tests for unpaired samples, i.e., the application of Mann–Whitney U Test instead of Wilcoxon Signed–Rank Test. 
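The role of paired versus unpaired tests mentioned above can be illustrated with a small simulation: when every subject has an individual baseline, a paired test on the pairwise differences detects a small, consistent shift that an unpaired test on the same data may miss. The numbers below are made up solely for this illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n = 24
    baseline = rng.normal(13.0, 3.0, n)        # large inter-individual differences
    modularized = baseline - 1.0 + rng.normal(0, 0.8, n)  # small, consistent shift
    flat = baseline + rng.normal(0, 0.8, n)

    # Paired comparison: inter-individual differences cancel out in the
    # pairwise differences, so the shift is comparatively easy to detect.
    _, p_paired = stats.wilcoxon(modularized, flat)

    # Unpaired comparison of the same values, as if they stemmed from two
    # separate groups: the baseline variance masks the shift.
    _, p_unpaired = stats.mannwhitneyu(modularized, flat, alternative="two-sided")

    print(f"paired (Wilcoxon Signed-Rank): p = {p_paired:.4f}")
    print(f"unpaired (Mann-Whitney U):     p = {p_unpaired:.4f}")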
Furthermore, replication R2 was conducted in a rather uncontrolled setting and with a smaller as well as presumably more heterogeneous sample, hence it can be assumed that these influences pose additional interference factors further hampering the identification of significant differences. In the following, we continue by analyzing the results for mental effort (H4 ), accuracy (H5 ) and duration (H6 ) per question. Abstraction: Results for Mental Effort (H4 ) So far, we have analyzed the impact of abstraction on mental effort for all models as well as per model—next, we investi13 We would like to remind at this point that detailed results of tests for normal distribution can be found Appendix A.4. 143 Chapter 5 The Impact of Modularization on Understandability Hypothesis Model Mod. Flat Δ U p r H4 : Mental effort M1 M2 12.00 11.14 13.11 12.55 −1.11 −1.41 210.00 204.50 0.140 0.111 0.21 0.23 H5 : Accuracy M1 M2 3.45 3.68 3.54 3.80 −0.09 −0.12 274.00 246.00 0.885 0.356 0.02 0.13 H6 : Duration M1 M2 302.66 216.03 379.41 257.93 −76.75 −41.90 72.00 64.00 0.096 0.047a 0.30 0.36 a significant at the 0.05 level Table 5.18: Results for abstraction questions (per model) gate the mental effort individually for each question. We would like to remind at this point that abstraction questions and fragmentation questions were posed alternately, i.e., Q1 , Q3 , . . . Q15 operationalized abstraction, while Q2 , Q4 , . . . Q16 operationalized fragmentation. Hence, results listed in Table 5.19 only list odd question. In particular columns one and two list the model and question, while columns three to five list values for modularized models, flat models and the differences thereof; columns six to eight list results from statistical tests. It can be observed that the mental effort was tendentially lower in modularized models, however, except for Q3 , differences were not statically significant. Model Quest. M1 M2 a Mod. Flat Δ U p Q1 Q3 Q5 Q7 2.70 3.00 3.05 3.25 2.57 3.86 3.43 3.25 0.13 −0.86 −0.38 0.00 260.50 150.50 218.50 264.00 0.664 0.004a 0.170 0.723 0.06 0.41 0.20 0.05 Q9 Q11 Q13 Q15 2.54 3.21 2.79 2.61 2.90 3.40 3.15 3.10 −0.36 −0.19 −0.36 −0.49 239.50 242.00 209.00 220.00 0.376 0.411 0.103 0.183 0.13 0.12 0.24 0.19 significant at the 0.05 level Table 5.19: H4 —Mental effort for abstraction questions 144 r 5.4 Evaluation Part I: BPMN Abstraction: Results for Accuracy (H5 ) Analogous to mental effort (H4 ), in the following we summarize the average accuracy (H5 ) for each individual question. As can be seen in Table 5.20, similar to E2 , differences appear to be marginal and non–significant. Rather, similar to E2 , results point toward a negative influence of abstraction on accuracy. Model Quest. Mod. Flat Δ U p r M1 Q1 Q3 Q5 Q7 0.95 0.85 0.70 0.95 1.00 0.89 0.71 0.93 −0.05 −0.04 −0.01 0.02 266.00 268.00 276.00 274.00 0.237 0.661 0.915 0.765 0.17 0.06 0.02 0.04 M2 Q9 Q11 Q13 Q15 0.86 0.89 0.96 0.96 0.95 0.95 1.00 0.90 −0.09 −0.06 −0.04 0.06 254.00 264.00 270.00 262.00 0.304 0.485 0.398 0.369 0.15 0.10 0.12 0.13 Table 5.20: H5 —Accuracy for abstraction questions Abstraction: Results for Duration (H6 ) To conclude the analysis of RQ9 , we look into the average duration required for answering questions. The results, as listed in Table 5.21, show that most questions could be answered quicker in the modularized models (except for Q7 ; potential reasons were already discussed in Section 5.4.2). 
Even though, except for Q1, differences could not be found to be statistically significant, a clear trend toward the beneficial influence of abstraction can be observed. Hence, we conclude that regarding RQ9 the findings of E2 could largely be replicated in R2. In particular, support for hypotheses mental effort (H4) and duration (H6) could be found, while hypothesis accuracy (H5) could not be supported. However, compared to E2, effects observed in replication R2 were less strong, i.e., differences between groups were less often statistically significant and effect sizes were smaller. This, as we argue, can be largely traced back to the uncontrolled setting of R2 and the smaller sample size. In the following, we turn to the investigation of RQ10, i.e., the influence of fragmentation, before the insights from RQ9 and RQ10 are revisited for a discussion.

  Model   Quest.    Mod.     Flat        Δ        U        p       r
  M1      Q1       67.26   112.01   −44.75    57.00   0.022a   0.42
          Q3       71.35    98.24   −26.89    67.00   0.061    0.34
          Q5       86.58   100.97   −14.39    83.00   0.228    0.22
          Q7       77.47    68.18     9.29    91.00   0.383    0.16
  M2      Q9       66.90    74.80    −7.90    79.00   0.170    0.25
          Q11      59.98    75.27   −15.29    69.00   0.074    0.33
          Q13      53.72    62.08    −8.36    91.00   0.383    0.16
          Q15      35.42    45.78   −10.36    68.00   0.067    0.33
  a significant at the 0.05 level

Table 5.21: H6—Duration for abstraction questions

RQ10: Is the Understanding of a BPMN–Based Model Negatively Influenced by Fragmentation?

Analogous to RQ9, we approach RQ10 by investigating hypotheses mental effort (H7), accuracy (H8) and duration (H9). As described in Section 5.4.1, we created questions that are presumably impaired by fragmentation, but are not influenced by abstraction, i.e., should result in higher mental effort, lower accuracy and higher duration in modularized models. Again, we test these assumptions at three levels of granularity. Starting with an analysis of all questions, subsequently values for M1 and M2 are investigated separately, and finally, questions are analyzed individually.

  Hypothesis             Mod.     Flat        Δ       Z        p       r
  H7: Mental effort     15.88    12.29     3.59   −5.46   0.000a   0.79
  H8: Accuracy           3.35     3.67    −0.32   −2.46   0.011a   0.35
  H9: Duration         436.50   285.74   150.76   −4.64   0.000a   0.85
  a significant at the 0.05 level

Table 5.22: Results for fragmentation questions

An overview of the results of RQ10 can be found in Table 5.22. Analogous to RQ9, it lists hypotheses, values for modularized as well as flat models and the differences thereof. Then, columns five to seven list results from statistical tests. We would like to remind at this point that due to repeated measurements response variables are not independent, hence Wilcoxon Signed–Rank Test was applied. In particular, Wilcoxon Signed–Rank Test indicates that mental effort (H7) for modularized models was significantly higher (Z = −5.46, p = 0.000, r = 0.79), accuracy (H8) was significantly lower for modularized models (Z = −2.46, p = 0.011, r = 0.35) and duration (H9) was significantly higher for modularized models (Z = −4.64, p = 0.000, r = 0.85). Furthermore, medium to large effect sizes could be observed (cf. [32, 33]). Summarizing, data collected in R2 provides empirical support for hypotheses mental effort (H7), accuracy (H8) and duration (H9), thereby confirming the findings of E2. Subsequently, we extend our analysis by investigating the results separately for M1 and M2.
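As a side note, the aggregated per–subject values underlying Table 5.22 (and, analogously, Table 5.10 for E2) can be derived from the per–question logs roughly as sketched below; the log format and all field names are hypothetical.

    from collections import defaultdict

    # Hypothetical per-question log entries:
    # (subject, condition, question, mental_effort, answered_correctly, duration_s)
    answers = [
        ("s01", "modularized", "Q2", 4, False, 71.3),
        ("s01", "modularized", "Q4", 5, True, 68.7),
        ("s01", "flat", "Q10", 3, True, 36.2),
        ("s01", "flat", "Q12", 4, True, 25.7),
    ]

    # Per subject and condition: mental effort summed over the questions
    # (range 4-28 for 4 questions), accuracy as the number of correct answers
    # (range 0-4) and duration as the summed answering time in seconds.
    scores = defaultdict(lambda: {"mental_effort": 0, "accuracy": 0, "duration": 0.0})
    for subject, condition, _question, effort, correct, duration in answers:
        entry = scores[(subject, condition)]
        entry["mental_effort"] += effort
        entry["accuracy"] += int(correct)
        entry["duration"] += duration

    for (subject, condition), entry in sorted(scores.items()):
        print(subject, condition, entry)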
Fragmentation: Results per Model To determine whether effects observed in R2 were specific for M1 or M2 , we analyzed results separately for each model. The results can be found in Table 5.23, whereby columns one to five list hypotheses, models, values for modularized as well as flat models, and the differenced thereof. Then, columns five to eight list results from statistical tests. We would like to remind at this point that due to the separate analysis of M1 and M2 , response variables are independent and hence Mann–Whitney U Test was applied. Basically, it can be observed that this analysis confirms the findings obtained so far: mental effort and duration were on average higher for modularized models, while accuracy was lower. However, differences regarding accuracy were not statistically significant. Hence, we conclude that these results are in line with findings obtained so far. In the following, we continue our analysis by investigating mental effort (H7 ), accuracy (H8 ) and duration (H9 ) for each question. Hypothesis Model Mod. Flat Δ U H4 : Mental effort M1 M2 15.60 16.07 11.89 12.85 3.71 3.22 121.50 137.50 0.001a 0.003a 0.48 0.43 H5 : Accuracy M1 M2 3.30 3.39 3.68 3.65 −0.38 −0.26 207.00 222.00 0.078 0.165 0.25 0.20 H6 : Duration M1 M2 480.60 397.91 308.33 259.93 172.27 137.98 29.00 35.00 0.001a 0.001a 0.63 0.58 a p r significant at the 0.05 level Table 5.23: Results for fragmentation questions (per model) Fragmentation: Results for Mental Effort (H7 ) So far, we have analyzed the influence of fragmentation on mental effort (H7 ) for all models as well as per model, 147 Chapter 5 The Impact of Modularization on Understandability in the following we analyze the influence on each question. We would like to remind at this point that questions that operationalized abstraction and fragmentation were posed alternately, i.e., Q1 , Q3 , . . . Q15 operationalized abstraction, while Q2 , Q4 , . . . Q16 operationalized fragmentation, hence Table 5.24 only lists even questions. Similar to the analysis of H4 , the columns list the models, questions, values for modularized as well as flat models and the differences thereof. Then, columns six to eight list results from statistical tests. Overall, it can be observed that the average mental effort was higher for modularized models. In addition, except for Q8 differences are also statistically significant, further corroborating the negative influence of fragmentation on mental effort. Model Quest. M1 M2 a Mod. Flat Δ U p r Q2 Q4 Q6 Q8 3.75 3.80 4.20 3.85 2.57 2.79 3.32 3.21 1.18 1.01 0.88 0.64 118.00 140.00 170.00 222.00 0.000a 0.002a 0.018a 0.205 0.51 0.45 0.34 0.18 Q10 Q12 Q14 Q16 4.57 3.64 3.89 3.96 3.90 3.00 3.05 2.90 0.67 0.64 0.84 1.06 177.50 184.00 151.50 146.50 0.024a 0.037a 0.005a 0.004a 0.33 0.30 0.41 0.42 significant at the 0.05 level Table 5.24: H7 —Mental effort for fragmentation questions Fragmentation: Results for Accuracy (H8 ) Analogous to mental effort (H7 ), we list results regarding accuracy (H8 ) in the following. In particular, as can be seen in Table 5.25, a trend toward lower accuracy in modularized models can be observed; for Q8 differences were statistically significant. Fragmentation: Results for Duration (H9 ) To conclude the analysis of RQ10 , we investigate the influence of fragmentation on duration (H9 ). As summarized in Table 5.26, on average subjects required more time for answering questions in modularized models and except for Q2 and Q12 differences were statistically significant. 
Model  Quest.  Mod.   Flat    Δ      U       p       r
M1     Q2      0.70   0.86   −0.16   236.00  0.191   0.19
M1     Q4      0.95   1.00   −0.05   266.00  0.237   0.17
M1     Q6      1.00   0.89    0.11   250.00  0.135   0.22
M1     Q8      0.65   0.93   −0.28   202.00  0.016a  0.35
M2     Q10     0.64   0.85   −0.21   222.00  0.115   0.23
M2     Q12     0.93   0.95   −0.02   274.00  0.765   0.04
M2     Q14     0.96   0.95    0.01   276.00  0.809   0.03
M2     Q16     0.86   0.90   −0.04   268.00  0.661   0.06
a significant at the 0.05 level
Table 5.25: H8—Accuracy for fragmentation questions

Model  Quest.  Mod.    Flat    Δ      U      p       r
M1     Q2      125.01  101.28  23.74  75.00  0.124   0.28
M1     Q4      129.01   58.51  70.50  28.00  0.000a  0.64
M1     Q6      131.92   91.42  40.50  31.00  0.001a  0.61
M1     Q8       94.65   57.12  37.53  46.00  0.006a  0.50
M2     Q10     159.13   91.80  67.33  45.00  0.005a  0.51
M2     Q12      73.42   68.70   4.72  95.00  0.480   0.13
M2     Q14      94.02   53.88  40.14  36.00  0.002a  0.58
M2     Q16      71.34   45.55  25.79  47.00  0.007a  0.49
a significant at the 0.05 level
Table 5.26: H9—Duration for fragmentation questions

Hence, also this analysis is in line with the findings obtained so far. Summarizing, we conclude that the data obtained in R2 to a large extent confirms the findings from E2. Particularly, also R2 provides empirical evidence for the positive influence of abstraction on mental effort (H4) and duration (H6). The influence on accuracy (H5), in turn, remains unclear. Regarding the influence of fragmentation, R2 provides empirical evidence for the negative influence on mental effort (H7), accuracy (H8) and duration (H9), corroborating the findings obtained in E2. In the following, we first analyze whether the correlations between mental effort and accuracy as well as mental effort and duration can also be found in R2. Then, to examine the apparently missing link between abstraction and accuracy, we investigate accuracy in detail.

Correlations with Mental Effort

In the analysis of E2, we investigated the correlation between mental effort and accuracy as well as mental effort and duration. Next, we repeat this analysis for the data obtained in R2. In particular, we computed the average mental effort, accuracy and duration for each question. Thereby, we differentiated between questions that were posed for modularized models and questions that were posed for non–modularized models, since different mental effort, accuracy and duration had to be expected. The correlation between mental effort and accuracy can be found in the scatter diagram in Figure 5.10. We would like to remind at this point that the x–axis represents the mental effort, ranging from Extremely low mental effort (1) to Extremely high mental effort (7). The y–axis, in turn, shows accuracy, ranging from 0 (all answers incorrect) to 1 (all answers correct). It can be observed that—consistent with E2—in R2 high mental effort seems to be related to low accuracy. In fact, mental effort and accuracy show a statistically significant negative correlation (r(30) = −0.394, p = 0.026). Similar to E2, 4 questions seem not to fit in (Q8, Q10, Q12 and Q16). To determine whether these points can be considered outliers, we applied simple linear regression: accuracy = 1.14 − 0.08 ∗ mental effort, t(30) = −2.35, p = 0.026, with R2 = 0.16, F(1, 30) = 5.51, p = 0.026. Then, we computed the residuals and considered all residuals differing by more than 3 times the Median Absolute Deviation (MAD) [82] as outliers. Applying this rather conservative criterion [130], none of the data points was identified as an outlier.
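For illustration, the regression and the 3×MAD residual screening described above can be sketched as follows; the values are placeholders rather than the per-question averages obtained in R2, and SciPy's linregress is used here merely to make the procedure concrete.

    # Illustrative computation of the regression-based outlier screening
    # (placeholder data, not the per-question averages of R2).
    import numpy as np
    from scipy import stats

    mental_effort = np.array([2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.8, 5.1])     # placeholders
    accuracy = np.array([0.95, 0.92, 0.90, 0.85, 0.82, 0.78, 0.74, 0.70])  # placeholders

    # Simple linear regression: accuracy = intercept + slope * mental effort.
    reg = stats.linregress(mental_effort, accuracy)
    residuals = accuracy - (reg.intercept + reg.slope * mental_effort)

    # 3 x MAD criterion: flag residuals deviating from the median residual by
    # more than three times the median absolute deviation.
    mad = np.median(np.abs(residuals - np.median(residuals)))
    outliers = np.abs(residuals - np.median(residuals)) > 3 * mad

    print(f"accuracy = {reg.intercept:.2f} + ({reg.slope:.2f}) * mental effort")
    print(f"Pearson r = {reg.rvalue:.3f}, p = {reg.pvalue:.3f}")
    print("Outlier indices:", np.where(outliers)[0])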
We would like to remind that in the analysis of E2 we identified Q8 and Q10 as outliers, but could not provide a plausible explanation why this was the case. Having established that no outliers could be found for R2, it appears likely that those outliers can be traced back to peculiarities of E2's sample. For instance, all subjects of E2 were attending the same lecture, hence perhaps a certain aspect of BPMN was taught differently than it was interpreted in our experimental material. Contrariwise, in R2 subjects had rather different backgrounds and knowledge, presumably counteracting this effect.

Figure 5.10: Mental effort versus accuracy

Results regarding the correlation between mental effort and duration can be found in Figure 5.11. Again, consistent with E2, high mental effort appears to be associated with high duration. Indeed, mental effort and duration are statistically significantly positively correlated (r(30) = 0.462, p = 0.008). Analogous to the analysis of mental effort and accuracy, we applied simple linear regression to test whether all data points follow a similar behavior: duration = −27.02 + 32.96 ∗ mental effort, t(30) = 4.24, p = 0.000, with R2 = 0.38, F(1, 30) = 17.98, p = 0.000. To identify potential outliers, we computed the residual of each data point and considered residuals differing from the mean by more than 3 times the MAD as outliers. Against this rather conservative criterion [130], none of the residuals was detected as an outlier. Hence, we conclude that also in R2 mental effort correlates with accuracy and duration.

Figure 5.11: Mental effort versus duration

Investigating Accuracy

Even though in experiment E2 and replication R2 the postulated hypotheses could largely be confirmed, neither in E2 nor in R2 support for the influence of abstraction on accuracy (H5) could be found. In the following, we try to clarify whether there is indeed no connection between abstraction and accuracy or whether other explanations seem more plausible. As accuracy is defined as the ratio of correct answers divided by the total number of answers, it seems essential to understand the causes for incorrect answers, i.e., error sources. To this end, we asked subjects to justify their answers by shortly explaining their line of reasoning. As described in the experimental setup of R2, we used these justifications to identify potential reasons for incorrect answers in the open coding phase and subsequently aggregated reasons into higher–level categories. Similar to the coding reported in Section 4.7, due to personnel limitations only one person, i.e., the author, was responsible for the coding. The results of this analysis can be found in Figure 5.12. From a total of 80 errors, i.e., incorrect answers, 12 errors could not be analyzed due to trivial or missing justifications; for instance, subjects simply rephrased the question: "both activities could be reachable in parallel". For the remaining 68 incorrect answers, various reasons could be identified. First, different background and terminology posed a problem for certain subjects. In particular, 6 subjects confused the terminology both A and B and A parallel to B.
While both A and B refers to the situation where A and B occur at some time in a process instance, A parallel to B refers to the situation where A and B occur at the same time in a process instance. Then, some subjects also considered infinite looping, i.e., that a process instance is never terminated, but stuck in a loop—which we did not take into account. Furthermore, 5 subjects confused can, i.e., there is a process instance in which certain behavior is true, with must, i.e., a certain behavior must be true for all process instances. Finally, 2 subjects confused A is executed immediately after B, i.e., A is immediately followed by B, with A is executed after B, i.e., after A at some point B follows. Besides different terminology, 14 answers were incorrect simply due to a lack of knowledge, e.g., subjects confused the symbols for OR– and XOR gateways. Interestingly, 10 further subjects gave a perfectly correct justification, but selected the wrong answer. Finally, 3 subjects did not understand a question and 2 subjects had problems with operating the user interface. The remaining 20 errors, in turn, could be traced back to the misinterpretation of the process model.

Total errors (80)
  Could not determine problem (12)
  Could determine problem (68)
    Different background / terminology (19)
      Both means parallel (6)
      Infinite looping (6)
      Can / must (5)
      Executed after / immediately after (2)
    Lack of knowledge (14)
    Correct reasoning, wrong answer chosen (10)
    Did not understand question (3)
    Handling of UI (2)
    Misinterpretation of model (20)
      Explanation indicates problem according to framework (15)
      Not according to framework (5)
Figure 5.12: Distribution of errors

From these 20 errors, 15 errors could be explained through fragmentation, i.e., subjects had problems in integrating sub–processes or lost track when they switched between sub–processes. This leaves 5 errors that cannot be explained by the proposed framework, but can rather be traced back to errors that occurred while interpreting a process model, e.g., overlooking a back–edge in a loop. Put differently, of the 20 errors that occurred when interpreting the process model, 15 can be explained by fragmentation. From this categorization, two main observations can be made. First, only 29% of incorrect answers (20 of 68; 12 could not be analyzed) can be traced back to misinterpreting the process model. The remaining 71% seem to be related to problems with the experimental setup, e.g., not properly worded questions, and subject–specific factors, e.g., different backgrounds or terminology. Since the framework proposed in Section 5.3.2 focuses on model–related errors only, these influences have to be seen as interference. This rather strong interference, combined with the low number of occurring errors, i.e., 82% answers correct in E2 and 89% answers correct in R2, might be a potential explanation for the—compared to mental effort and duration—rather weak impact of abstraction and fragmentation on accuracy. However, this does not explain why accuracy was found to differ statistically significantly for fragmentation in E2 as well as R2, but did not differ statistically significantly for abstraction. To explain this disparity, a look at the effect sizes might provide a potential clarification. Throughout E2 and R2, effect sizes were smaller for hypotheses regarding abstraction compared to effect sizes of hypotheses regarding fragmentation.
In particular, as summarized in Table 5.5, effect sizes for mental effort and duration for abstraction were 0.47 and 0.48. For fragmentation, as shown in Table 5.10, values increased to 0.69 and 0.57, respectively. In R2, a similar behavior could be observed: effect sizes for mental effort and duration, as listed in Table 5.17, increased from 0.45 and 0.38 for abstraction to 0.79 and 0.85 for fragmentation (cf. Table 5.22). Hence, it appears that the influence of fragmentation was on average stronger than the average influence of abstraction. Therefore, a potential explanation could be that the influence of abstraction was not strong enough to counterbalance these interferences, leading to non–significant results.

Limitations

So far, we have analyzed the data obtained in R2; in the following, we discuss potential limitations of R2. First and foremost, replication R2 was conducted in a rather uncontrolled setting with a rather heterogeneous sample. Hence, it must be assumed that subjects were probably distracted or interrupted while working on the comprehension tasks, and therefore results have to be interpreted with care. As, however, the results are in line with the findings obtained in E2, it appears likely that the results from R2 are also valid. Second, replication R2 builds upon the experimental material from E2. Hence, limitations regarding the experimental material, e.g., the usage of a single modeling language, also apply to R2. Third, the classification of error sources was conducted by a single person only, i.e., the author, thereby limiting the accuracy and reliability of the coding.

Discussion

So far, we have described the experimental setup, experimental execution, data validation and data analysis of R2. Subsequently, we revisit the key findings for a discussion. Basically, it can be observed that the findings obtained in E2 could mostly be replicated in R2. In particular, empirical evidence for the positive influence of abstraction (RQ9) and the negative influence of fragmentation (RQ10) could repeatedly be found. Similarly, it could be substantiated that mental effort and accuracy as well as mental effort and duration are correlated. It should be mentioned at this point that effect sizes and correlations were tendentially smaller in R2, which can be traced back to the rather uncontrolled setting and heterogeneous sample of R2 as well as the smaller sample size. In addition, neither E2 nor R2 could provide evidence that abstraction has a positive influence on accuracy. A detailed investigation of error sources showed that only approximately 29% of the errors could be traced back to misinterpretations of the model, i.e., errors that can potentially be explained through the proposed framework. The remaining 71%, in turn, have to be considered as interference. Knowing that effect sizes for abstraction were on average smaller than effect sizes for fragmentation, it appears plausible that interfering influences were larger than the influence of abstraction on accuracy, thereby covering effects. We would like to emphasize at this point that this does not imply that the negative influence of modularization predominates in general. Rather, in this particular setting, i.e., using the models and questions from E2 and R2, the measured influence of fragmentation was stronger than the measured influence of abstraction.
Even though we could not entirely resolve the connection between abstraction and accuracy, we decided not to further refine the empirical investigation regarding BPMN, but to broaden the study and to also approach declarative process modeling, as detailed in the following. 5.5 Evaluation Part II: Declare So far, we have focused on the quantitative empirical evaluation of the proposed framework from Section 5.3.2 in the context of BPMN–based process models. Subsequently, in experiment E3 we extend this investigation by shifting the focus in two ways. First, we examine the proposed framework for declarative, i.e., Declare–based, process models. Second, besides using quantitative data, we also take into account qualitative data in E3 by employing think–aloud techniques to investigate the subjects’ reasoning processes. Analogous to experiment E2 and replication R2 , we start by introducing the experimental design in Section 5.5.1. Then, in Section 5.5.2 we describe the experimental operation and findings of E3 . 5.5.1 Experimental Definition and Planning The goal of E3 is to provide empirical evidence of abstraction and fragmentation in Declare–based models. In the following, we introduce the research questions, hypotheses, describe the subjects, factors, factor levels, objects and response variables required for our experiment. Then, we present the experimental design as well as the instrumentation and data collection procedure. 155 Chapter 5 The Impact of Modularization on Understandability Research Questions and Hypotheses The research questions investigated in E3 are directly derived from the theoretical framework presented in Section 5.3.2. The basic claim of the framework is that any modularization of a Declare–based process model shows positive effects through abstraction, but also negative effects through fragmentation. As described in Section 3.1.2, the interpretation of modularized declarative process models requires the modeler to combine constraints. Likewise, the interpretation of sub–processes requires the modeler to combine the constraints of the sub–process with constraints from the parent process. Compared to BPMN–based models, where interpretation is mostly concerned with passing tokens through the process [167], this presumably constitutes a considerably more difficult task. Hence, it is a–priori not clear whether process modelers are able to properly perform this task. Therefore, research question RQ11 investigates whether process modelers are basically able to understand the semantics of sub–processes in declarative process models. Research Question RQ11 Do process modelers understand the semantics of sub– processes in Declare–based process models? Then, research questions RQ12 and RQ13 investigate whether empirical evidence for the positive influence of modularization, as postulated in Section 5.3.2, can be found. RQ12 thereby particularly focuses on the role of pattern recognition, whereas the influence of information hiding is approached in RQ13 : Research Question RQ12 Is the understanding of a Declare–based model positively influenced by pattern recognition? Research Question RQ13 Is the understanding of a Declare–based model positively influenced by information hiding? To assess the understandability of a model, analogous to E2 and R2 , we rely on measuring the mental effort, accuracy and duration required for answering a question (mental effort, accuracy and duration are elaborated in detail in Paragraph Response Variables). 
Again, we use the term flat for a model that is not modularized. In addition, we assume that business processes are available as a modularized version modelmod and a flat version modelf lat . In this way, the hypotheses associated with RQ13 can be postulated as follows: 156 5.5 Evaluation Part II: Declare Hypothesis H10 Questions that are influenced by abstraction, but are not influenced by fragmentation, require a lower mental effort in modelmod . Hypothesis H11 Questions that are influenced by abstraction, but are not influenced by fragmentation, yield a higher accuracy in modelmod . Hypothesis H12 Questions that are influenced by abstraction, but are not influenced by fragmentation, require less time in modelmod . Finally, RQ14 explores postulated negative effects of modularization. In particular, RQ14 investigates whether fragmentation, i.e., splitting attention and integration of sub–processes, decreases understandability: Research Question RQ14 Is the understanding of a Declare–based model negatively influenced by fragmentation? Analogous to RQ13 , we also look into mental effort, accuracy and duration for the postulation of hypotheses associated with RQ14 : Hypothesis H13 Questions that are influenced by fragmentation, but are not influenced by abstraction, require a higher mental effort in modelmod . Hypothesis H14 Questions that are influenced by fragmentation, but are not influenced by abstraction, yield a lower accuracy in modelmod . Hypothesis H15 Questions that are influenced by fragmentation, but are not influenced by abstraction, require more time in modelmod . Subjects The population examined in E3 are all persons that need to interpret declarative process models, e.g., process modelers and system analysts. To ensure that measured differences are caused by the impact of modularization rather than by unfamiliarity with declarative process modeling, subjects need to be sufficiently trained. Even though we do not require experts, subjects should have a good understanding of declarative processes’ principles. 157 Chapter 5 The Impact of Modularization on Understandability Factor and Factor Levels Our experiment employs a two–factorial design with factor modularization (factor levels flat and modularized ) and factor question type (factor levels abstraction and fragmentation). The elaboration of process models with/without sub–processes realizes factor modularization, questions formulated according to the framework from Section 5.3.2 realize factor question type, as detailed in Paragraph Objects. Objects The objects of this experimental design are eight declarative process models that were created, as follows. The starting point were four declarative process models, which were taken from a case study on declarative business process modeling [95], i.e., from process models describing real–world processes. To make these four models, subsequently referred to as M1 to M4 , amenable for E3 , they underwent the following steps. First, the models were translated to English (the case study was conducted in German). Second, inevitable errors occurring in modeling sessions, such as spelling errors, were corrected. Third, the process models were created without the support of sub–processes. Hence, a second variant of each process was created that describes the same process, but makes use of sub–processes, leading to a total of eight declarative process models. As summarized in Table 5.27, the process models were chosen such that the number of activities and number of constraints vary. 
In particular, M1 and M2 have, compared to M3 and M4, a small number of activities. In addition, all processes have a different number of constraints. The number of activities varies between the flat and the modularized model, as complex activities had to be introduced in the modularized models. Similarly, the number of constraints varies, as the processes had to be modeled slightly differently. Since this is the first study investigating sub–processes in declarative models, we decided to keep the models' complexity rather low. In particular, we ensured that not too many different types of constraints (at most 8) and sub–processes (at most 3) were used. Likewise, we decided on a maximum nesting level of 1, i.e., none of the sub–processes referred to another sub–process.

                   M1             M2             M3             M4
                   flat   mod.    flat   mod.    flat   mod.    flat   mod.
Activities         11     13      8      9       23     26      23     26
Constraints        19     21      7      9       30     28      45     44
Constraint types   8      8       4      4       7      7       5      5
Sub–processes      –      2       –      1       –      3       –      3
Nesting level      –      1       –      1       –      1       –      1
Domain             Software development  Teaching  Electronic company  Buying an apartment
Table 5.27: Process models used in E3

The experiment's questions are designed as follows. First, for each model, the subject is asked to describe the process model. The idea of this step is to make the subject familiar with the process model and to minimize learning effects in the upcoming questions. In addition, by letting subjects freely describe a process model, we intend to gain further insights into how well the models are understood. Second, for each model 4 categories of representative questions are designed. In particular, the questions are based on the available constraint types [175], i.e., existence, negation and ordering. In addition, trace questions, i.e., whether an execution trace is valid, are asked to combine aspects of different constraints. For each category of questions, a pair of questions is designed according to the understandability framework from Section 5.3.2. The first question is designed to profit from abstraction, but not to be impaired by fragmentation. Hence, the question should be easier to answer in the modularized model than in the flat model. The second question, in turn, is designed to not profit from abstraction, but to be impaired by fragmentation. Hence, the question should be easier to answer in the flat model. All in all, for each model 9 questions are provided—the first one looking into the general understanding of declarative processes, the remaining 8 questions alternately operationalizing positive and negative effects of modularization. Finally, it is ensured that the information provided in the process models is sufficient to answer all questions. In other words, no background knowledge is required for answering the questions, as recommended in [171].

Response Variables

The primary response variable of this experimental design is the level of understanding that subjects display with respect to the process models. To measure understanding, similar to E2 and R2, we measure mental effort, accuracy and duration. For measuring mental effort, we rely on 7–point rating scales, asking subjects to rate mental effort from Very low (1) over Medium (4) to Very high (7). As discussed in Section 3.2.3, adopting rating scales for measuring mental effort is known to be reliable and is widely adopted. Response variable accuracy, in turn, is defined as the ratio of correct answers divided by the total number of answers.
Hence, an accuracy of 0.0 means that all questions were answered incorrectly, while a value of 1.0 indicates that all questions were answered correctly. Then, duration is defined as the time required for answering a question, measured in seconds, i.e., the number of seconds required for reading the question, interpreting the process model and giving the answer. In addition, think–aloud protocols are collected to gain insights into the subjects' reasoning processes, e.g., for analyzing errors and their underlying causes in detail. To this end, similar to replication R2, we adopt principles from grounded theory [35, 234]. In particular, first, in the open coding phase we mark parts of the think–aloud protocols that relate to incorrect answers and try to determine potential reasons for the mistakes. Once all errors are categorized, the think–aloud protocols are revisited to check whether the classification is consistent. If categories are found to be inconsistent, they are adapted accordingly. This procedure is repeated until the classification is stable, i.e., none of the categories is changed. Second, in the axial coding phase the classifications are repeatedly aggregated into higher–level categories.

Experimental Design

In order to investigate RQ11 to RQ14, we adopt a combination of qualitative and quantitative research methods, as detailed in the following.14 The experiment's overall process is outlined in Figure 5.13a: First, subjects are randomly, but evenly, assigned to Group 1 or Group 2. Before starting with data collection, subjects are informed that due to the adoption of think–aloud anonymous data collection is not possible. However, subjects are assured that all personal information is kept confidential at all times. Then, regardless of the group assignment, demographical data is collected and subjects are presented with introductory assignments. To support subjects in their task, sheets briefly summarizing the constraints' semantics are distributed. Data gathered during the introduction is not used for analysis. Rather, the introductory tasks allow subjects to familiarize themselves with the type of tasks to be performed—ambiguities can be resolved at this stage without influencing the actual data collection. After the familiarization phase, subjects are confronted with the actual models designed for data collection. As shown in Figure 5.13a, four declarative business processes are used; each of them once modeled with the use of sub–processes and once modeled without sub–processes (the processes are described in detail in Paragraph Objects). These four pairs of process models are then distributed between Group 1 and Group 2 such that subjects are confronted with modularized models and flat models in an alternating manner, cf. Figure 5.13a.

14 The experimental material can be downloaded from: http://bpm.q-e.at/experiment/ModularizationDeclarative

Figure 5.13: Experimental design of E3 [303]
  (a) Overview: Group 1 (n/2 subjects): demographics, introduction; M1 flat, M2 modularized, M3 flat, M4 modularized. Group 2 (n/2 subjects): demographics, introduction; M1 modularized, M2 flat, M3 modularized, M4 flat.
  (b) Questions per model: describe process model, 2 trace questions, 2 existence questions, 2 negation questions, 2 ordering questions.
  (c) Tasks per question: answer question, assess mental effort, justify mental effort.

As detailed in Figure 5.13b, for each model the same procedure is used.
First, the subject is asked to describe what the process is intended to achieve. Second, the subject is confronted with four pairs of questions which were designed to representatively cover modeling constructs of a declarative process modeling language (details are presented in Paragraph Objects). For each of the questions, in turn, a three–step procedure is followed, cf. Figure 5.13c. First, the subject is asked to answer a question about the model either by Yes, No or Don’t Know. We award one point for each correct answer and zero points for a wrong answer (including Don’t Know ). We deliberately introduced the option Don’t Know, as otherwise subjects would be forced to guess. Second, the subject is asked to assess the expended mental effort. Third, the subject is asked to explain why it indicated a certain mental effort. Throughout the experiment, subjects are asked to constantly voice their thoughts, i.e., to think–aloud, allowing for a detailed analysis of reasoning processes [64]. Instrumentation and Data Collection Procedure For each question, subjects receive separate sheets of paper showing the process model, allowing them to use a pencil for highlighting or taking notes. In addition to recording audio, video recording is used, as video has proven useful to resolve unclear situations in think–aloud protocols (cf. [302]). Hence, besides collecting quantitative data in terms of answering questions by ternary choices (Yes, No, Don’t Know ) and 161 Chapter 5 The Impact of Modularization on Understandability measuring mental effort on a 7–point rating scale, qualitative data in terms of think– aloud protocols is gathered. 5.5.2 Performing the Experiment (E3 ) Based on the experimental setup described in Section 5.5.1, the controlled experiment (E3 ) was conducted. Aspects regarding the preparation and operation of E3 , as well as subsequent data validation and data analysis, are covered in the following. Experimental Operation of E3 Experimental Preparation Preparation for the experiment included the elaboration of process models, associated questions and the demographic survey. In addition, we prepared material introducing subjects with the tasks to be performed. In case subjects required clarification of a constraint’s semantics, we prepared sheets briefly summarizing the semantics of all involved constraints. To ensure that the material and instructions were comprehensible, we piloted the study and iteratively refined the material. Finally, models and questions were printed, audio devices and video camera were checked for operability. In parallel, subjects were acquired, and if necessary, trained in declarative process modeling. Experimental Execution The experiment was conducted in July 2012 in two locations. First, seven subjects participated at the University of Ulm, followed by two additional sessions at the University of Innsbruck, i.e., a total of nine subjects participated. To ensure that subjects were sufficiently familiar with declarative process modeling, all subjects were provided with training material that had to be studied. Each session was organized as follows: In the beginning, the subject was welcomed to the experiment and instructed to speak thoughts out aloud. Since the experimental material consisted over 100 sheets of paper containing process models and questions, we needed to ensure that subjects were not distracted by the extent of material to be processed. 
To this end, one supervisor was seated to the left of the subject and a second supervisor to the right; the sheets containing the experimental material were then passed from the left to the subject. As soon as the subject had finished a task, the sheets were passed on to the supervisor on the right. Hence, no more than a handful of sheets were presented to subjects at once. Meanwhile, a video camera recorded the subject's activities on video and any uttered thoughts on audio. At the end of each session, a discussion followed in order to help subjects reflect on the experiment and to provide us with feedback.

Data Analysis of E3

So far, we have focused on the experimental design as well as the experimental execution of E3. Next, we report on data validation and data analysis; for all statistical analyses we relied on SPSS, Version 21.0.

Data Validation

One session was performed per subject, hence we could easily ensure that the experimental setup was obeyed. In addition, we screened whether subjects fitted the targeted profile, i.e., were familiar with Business Process Management in general and Declare in particular; results are summarized in Table 5.28. Demographical questions 1–6 concerned the modeling proficiency of subjects, i.e., familiarity with Declare, confidence in understanding and modeling Declare as well as general BPM knowledge. Questions 7–9, in turn, asked for details of the models subjects had created or analyzed. Finally, questions 10–13 regarded the domain knowledge of subjects. Thereby, we employed a 7–point rating scale for questions 1–3 and questions 10–13, ranging from Strongly disagree (1) over Neutral (4) to Strongly agree (7). Subjects indicated that they were rather unfamiliar with Declare (M = 3.78, SD = 1.47), but felt confident in understanding Declare (M = 4.11, SD = 1.45) and modeling Declare (M = 4.11, SD = 1.37). Also, subjects indicated that they had on average 4.94 years of experience in BPM in general (SD = 1.50). Furthermore, subjects indicated that they had received formal training (M = 0.89 days, SD = 1.59) and self–education (M = 30.33 days, SD = 44.69) during the last year. Questions 7–9 indicate that subjects were experienced in reading process models (M = 75.56, SD = 70.73) and creating process models (M = 24.00, SD = 28.29), however, not necessarily experienced with large process models (M = 17.67 activities per model, SD = 12.16). Finally, we assessed whether subjects were familiar with the domains of the process models used in E3, i.e., model M1 to model M4. Subjects indicated that they were rather familiar with software development (M1; M = 5.78, SD = 0.79) and familiar with teaching (M2; M = 5.56, SD = 0.96), but rather unfamiliar with electronic companies (M3; M = 3.00, SD = 1.56) and buying apartments (M4; M = 3.56, SD = 1.71). Finally, we assessed the subjects' professional background: all subjects indicated an academic background.

                                            Min.   Max.     M      SD
1.  Familiarity with Declare                  2      6     3.78    1.47
2.  Confidence understanding Declare          2      6     4.11    1.45
3.  Confidence modeling Declare               2      6     4.11    1.37
4.  Years of modeling experience              2      7     4.94    1.50
5.  Days of formal training last year         0      5     0.89    1.59
6.  Days of self–education last year          4    150    30.33   44.69
7.  Process models read last year            10    250    75.56   70.73
8.  Process models created last year          5    100    24.00   28.29
9.  Avg. number of activities per model       5     50    17.67   12.16
10. Familiarity software development          4      7     5.78    0.79
11. Familiarity teaching                      4      7     5.56    0.96
12. Familiarity electronic companies          1      6     3.00    1.56
13. Familiarity buying apartments             1      6     3.56    1.71
Table 5.28: Demographical statistics of E3

Up to now, we have discussed the design and execution of the empirical study and looked into the demographical data. In the following, we use the gathered data to investigate RQ11 to RQ14.

RQ11: Do Process Modelers Understand the Semantics of Sub–Processes in Declare–Based Process Models?

Using modularization means to abstract certain parts of a declarative process model by means of sub–processes. However, as soon as the content of a sub–process is of concern, the sub–process has to be integrated back into the parent process, as described in Section 3.1.2. For a declarative process model, this implies that the semantics of constraints referring to the sub–process and constraints within the sub–process have to be combined. As argued, this task might not be trivial, hence in RQ11 we investigate whether modelers are basically able to perform this integration task. In the following, we approach RQ11 in two steps. First, we classify questions with respect to correctness, i.e., whether a question was answered correctly. Then, we turn toward the think–aloud protocols to investigate and classify error sources, as described in Section 5.5.1. Similar to the classifications performed for the case study in Section 4.7 and replication R2 in Section 5.4.3, this classification was performed by a single person, i.e., the author.

As illustrated in Figure 5.14, in total 288 questions were asked in this experiment (9 subjects * 4 models * 8 questions = 288). In the following, we inspect the upper branch, in which the questions asked for modularized models are summarized. In total, 144 questions were asked for modularized models, of which 133 (92.3%) were answered correctly and 11 (7.7%) were answered incorrectly. Apparently, fewer questions were answered incorrectly in flat models: 4 out of 144 (2.8%). However, when looking into error sources, it becomes clear that modularization is responsible only for a fraction of the incorrect answers. In particular, 4 (2.8%) errors could be traced back to the integration of constraints, i.e., when subjects had to combine the semantics of several constraints in order to answer a question. Another 1 (0.7%) question was answered incorrectly due to ambiguous wording, i.e., the subject misunderstood the wording of a question. 2 (1.4%) questions were answered incorrectly due to insufficient knowledge about declarative process models. Finally, 4 (2.8%) questions could be traced back to the presence of modularization, i.e., were answered incorrectly because subjects did not properly understand the meaning of constraints in sub–processes in the context of the parent process. In other words, in these cases subjects had trouble understanding the semantics of the sub–process.

Questions total (288)
  Questions modularized (144)
    Correctly answered (133, 92.3%)
    Incorrectly answered (11, 7.7%)
      Integration of constraints (4, 2.8%)
      Integration of sub–processes (4, 2.8%)
      Ambiguous question (1, 0.7%)
      Lack of knowledge (2, 1.4%)
  Questions flat (144)
    Correctly answered (140, 97.2%)
    Incorrectly answered (4, 2.8%)
      Integration of constraints (2, 1.4%)
      Ambiguous question (1, 0.7%)
      Lack of knowledge (1, 0.7%)
Figure 5.14: Distribution of errors, adapted from [303]

The main findings are hence as follows. First, modelers averagely familiar with Declare (cf.
Table 5.28) are reasonably capable of interpreting Declare models, as indicated by the fact that 273 out of 288 (94.8%) questions were answered correctly. Second, the collected data indicates that modelers are capable of interpreting modularized models (133 out of 144 question correct, 92.3%), only 4 questions (2.8%) were answered incorrectly due to modularization. Therefore, we conclude that averagely trained modelers are able to interpret modularized declarative process models— 165 Chapter 5 The Impact of Modularization on Understandability however, modularization might also be a potential error source. This finding is also in–line with the framework presented in Section 5.3.2, i.e., modularization is feasible, but has to be applied carefully. Besides showing that modularization is feasible, these findings are also relevant for declarative process models in general. In particular, in Section 4.4.1, we discussed that process models with a large number of constraints are hard to understand, as the modeler has to keep track of all constraints. When analyzing the distribution of errors in Figure 5.14, this assumption is further substantiated. In particular, without considering errors conducted due to modularization, 11 errors were committed in total. Thereof, 5 errors can be attributed to problems with the experimental execution, i.e., in 2 cases a question was worded ambiguously and in further 3 cases the subject was hindered by lacking knowledge about declarative process models. The remaining 6 errors were classified as “integration of constraints”, i.e., when subjects failed to integrate the semantics of several constraints. Hence, it can be concluded that problems in understanding are not caused by single constraints, but rather the interplay of several constraints seems to pose a significant challenge. Given this finding, it seems plausible that the computer–based, automated interpretation of constraints can lead to significant improvements in the maintenance of declarative process models, as described in Chapter 4. Having established that modelers are able to understand the semantics of sub–processes, we now turn to the question in how far the adoption of modularization generates positive effects. RQ12 : Is the Understanding of a Declare–Based Model Positively Influenced by Pattern Recognition? In Section 5.3.2 we argued that modularization supports the modeler in understanding a process model. In the following, we approach this research question in two steps. First, we use think–aloud protocols to identify patterns in understanding declarative process models. Then, we analyze in how far sub–processes support this process of understanding and how it relates to the understandability framework presented in Section 5.3.2. As described in Section 5.5.1, we asked participating subjects to voice their thoughts. For the investigation of RQ12 , we transcribed the recorded audio files and analyzed how subjects handled the question in which they were asked to describe the processes’ behavior. The analysis showed that, regardless of whether sub–processes were present or not, subjects described the process in the order activities were supposedly executed, i.e., tried to describe the process in a sequential way. Hence, as first step, subjects skimmed over the process model to find an entry point where they could start with describing the process: “Ok, this is the, this is 166 5.5 Evaluation Part II: Declare the first activity because it has this init constraint”. 
Interestingly, subjects seemed to appreciate when a clear starting point for their explanations could be found: “it is nice that we have an init activity, so I can start with this”. A declarative process model, however, does not necessarily have an unique entry point, apparently causing confusion: “Well. . . gosh. . . I’ve got no clue where to start in this model” 15 . After having identified an entry point, subjects tried to figure out in which order activities are to be executed: “And after given duties to the apprentices there should come these two tasks”. Finally, subjects indicated where the process supposedly ends: “the process ends with the activity give lessons”. The sequential way of describing the process models is rather surprising, as it is known that declarative process models rather convey circumstantial information, i.e., overall conditions that produce an outcome, than sequential information, i.e., how the outcome is achieved [69, 70]. In other words, in an imperative model, sequences are made explicit, e.g., through sequence flows in BPMN. In a declarative process model, however, such information might not be available at all. For instance, the coexistence constraint (cf. Table 3.1) defines that two activities must occur in the same process instance (or do not occur at all)—the ordering of the activities is not prescribed. As subjects still rather talked about declarative process models in a sequential manner, it appears as if they preferred this kind of information. Interestingly, similar observations could be made in the case study investigating declarative process modeling described in Section 4.7. Therein, sequential information, such as “A before B” or “then C” was preferred for communication. With respect to this work, the question is in how far sub–processes can support modelers in making sense of the process model. Given that modelers apparently seek for a sequential way of describing the process model, it seems likely that the task of describing a model gets harder for large models, as the modeler cannot just follow sequence flows as in BPMN models, but has to infer which activity could be executed next. Hence, the more activities are present, the more possibilities the modeler has to rule out. Conversely, sub–processes reduce the number of activities per (sub– )model, hence simplifying this task. In order to see whether empirical evidence for this claim can be found, we analyzed the mental effort required for describing process models. During our analysis, we saw that each subject showed a different base–level of mental effort. Hence, a comparison of absolute values of mental effort will be influenced by different base levels. To cancel out this influence and to make mental effort comparable between subjects, we base our analysis on the relative 15 We allowed subjects to choose their preferred language to avoid unnecessary language barriers. The original quote was uttered in Tyrolean dialect: “jå Oiski! Poiski! Då woas ma jå nit wo ånfangn bei dem bledn Modell”. To improve the comprehensibility of the thesis, we translated the quote to English. 167 Chapter 5 The Impact of Modularization on Understandability mental effort, i.e., the mental effort expended for answering a question divided by the average mental effort expended by this subject for answering a question about a process model. Thus, for instance, a value of 0.78 indicates that the subject expended 78% of the average mental effort. 
Contrariwise, a value of 2.00 indicates that the task was twice as hard as an average task in terms of mental effort. When comparing the relative mental effort required for describing flat models (M = 1.68, SD = 0.72) and modularized models (M = 1.63, SD = 0.72), however, differences turned out to be marginal (0.05). Nevertheless, this result does not contradict the assumption that sub–processes can improve understanding. Rather, we postulated that mental effort will be lower for process models of a certain size. Indeed, if the same analysis is performed for the larger models (M3 and M4 ), the difference with respect to relative mental effort between flat models (M = 1.93, SD = 0.93) and modularized models (M = 1.55, SD = 0.37) increases to 0.38, i.e., modularized models are easier to understand. Likewise, for small models the difference between flat models (M = 1.43, SD = 0.28) and modularized models (M = 1.72, SD = 0.50) increases to −0.29, i.e., modularized models are harder to understand. These findings are in–line with the framework presented in Section 5.3.2: While large models apparently benefit from abstraction, small models are rather impaired by fragmentation. So far, we have discussed how sub–processes influence modelers in establishing an understanding of a declarative process model. In the following, we investigate in how far the recognition of patterns can support the process modeler. To this end, we now turn to results obtained from M3 , as particular interesting insights could be found for this process model. M3 captures procedures from a company selling electronic devices: After having completed initial tasks, employees either supervise apprentices, handle incoming goods or deal with customer complaints— in the modularized model, these three procedures are modeled as sub–processes.16 Unsurprisingly, all subjects that received the modularized model recognized these sub–processes. Interestingly, also all subjects that received the flat model described the same sub–processes. However, in contrast to subjects that received modularized models, it took them considerably longer to understand that the model could be partitioned this way. In order to visualize this relation, we assessed at which point in time subjects mentioned those sub–processes for the first time. In order to eliminate fluctuations such as talking speed, we refrained from looking into absolute duration. Rather, we computed the ratio of the time needed for recognizing the sub–processes divided by the total duration spent for describing the process model. As illustrated 16 Due to size, the process models cannot be reproduced here meaningfully, but can be accessed through: http://bpm.q-e.at/experiment/ModularizationDeclarative 168 5.5 Evaluation Part II: Declare in Figure 5.15, subjects confronted with the flat model tended to recognize the sub– processes for the first time toward the end of the task only, while subjects confronted with modularized models recognized the sub–processes earlier. In particular, for flat models, subjects mentioned sub–process after having expended 62% of the total time. For modularized models, the average ratio dropped to 17%. Even though the data indicates that sub–processes could be identified earlier, the question remains why sub–processes were not identified immediately. The answer to this question can be found in the way subjects described the process models: All subjects described the process in the order activities were supposedly executed. 
As the sub–processes were to be executed after some initial tasks were performed, subjects first described the initial tasks and then the sub–processes. Still, two different patterns could be observed. Subjects who received the modularized models mentioned the sub–processes and then described their content. Subjects who received flat models rather described the entire model first and toward the end stated that they thought the model could actually be split according to these sub–processes.

Figure 5.15: Duration until first mentioning of sub–processes [303]

Obviously, it is not surprising that subjects mentioned sub–processes earlier in modularized models, as sub–processes were explicitly represented there. However, when looking into mental effort, similar observations can be made. For flat models a relative mental effort of 2.00 (200%) was computed, for modularized models it dropped to 1.53 (153%)—providing further evidence that modularization was beneficial in this case. Even though these observations provide empirical evidence for the positive influence of pattern recognition for M3, no indications of pattern recognition could be found in M1, M2 and M4. As indicated in the first part of this research question, the size of a model has an impact on whether modularization is helping or rather interfering. Likewise, it can be expected that a certain model size is required for pattern recognition, explaining why no effects could be found for M1 and M2. This, however, does not explain why subjects did not identify sub–processes in M4—a potential explanation for this difference can be found in its structure. In particular, the process is to a large extent modeled with precedence constraints, i.e., a constraint that restricts the ordering of activities. Hence, subjects could use these constraints to move through the process model in a sequential way. For M3, however, such a behavior was not possible, as it also consists of constraints that do not convey any sequential information at all (e.g., the not coexistence constraint, cf. Table 3.1). Hence, subjects were forced to approach the process model differently. Apparently, the strategy was to divide the process model into parts that could be tackled sequentially—resulting in the described sub–processes.

To summarize, the collected data indicates that sub–processes appear to negatively influence the overall understanding of rather small modularized declarative process models, but improve understanding as model size increases. In addition, subjects seemed to approach process models in a sequential manner. When this was not possible, subjects apparently tried to divide the process model into manageable, potentially sequential chunks. For modularized models, these divisions could directly be perceived in the form of sub–processes, supporting the overall understanding of the process model.

RQ13: Is the Understanding of a Declare–Based Model Positively Influenced by Information Hiding?

Besides fostering the recognition of patterns, we argued that information hiding, i.e., using sub–processes to abstract from their content, will support modelers. In particular, removing information irrelevant for conducting the task at hand will presumably result in a lower mental effort and consequently in higher performance.
To investigate this claim, we elaborated questions that could be answered without looking into sub–processes. In order to investigate this research question, we first approach it from a quantitative angle, i.e., we analyze the mental effort, accuracy and duration of questions. Then, we take a qualitative point of view and inspect the think–aloud protocols for evidence of information hiding. For the investigation of RQ12, we introduced the notion of relative mental effort, i.e., the mental effort of a question in relation to the average mental effort required. To approach RQ13, we additionally introduce the notion of relative duration. As described in Section 5.5.1, we employ think–aloud techniques, i.e., subjects constantly voice their thoughts. During the execution of E3 we observed that subjects have different preferences regarding the level of detail provided in their answers. Hence, the durations required for answering questions differ not only with respect to the difficulty of the question, but are also dependent on personal inclination. As an effort to counterbalance this effect, we computed the relative duration for each question, i.e., we divided the duration required for answering a question by the time required for answering all questions of a process model.

The quantitative analysis of RQ13 is structured similarly to the analyses conducted in E2 and R2. In particular, we computed the average relative mental effort (H10), accuracy (H11) and relative duration (H12) for abstraction questions as well as for each model (M1 to M4). However, unlike for E2 and R2, we did not apply statistical tests for model–wise comparisons, since the groups would be far too small for a meaningful statistical test.17 Likewise, we refrain from reporting individual values of questions.

Hypothesis           Mod.   Flat    Δ      Z      p       r
H10: Mental effort   0.98   1.05   −0.07  −2.07  0.038a  0.69
H11: Accuracy        0.96   0.99   −0.03  −1.41  0.157   0.47
H12: Duration        1.09   1.13   −0.04  −0.77  0.441   0.26
a significant at the 0.05 level
Table 5.29: Results for abstraction questions

The results of this investigation are summarized in Table 5.29. Similar to E2 and R2, the table lists hypotheses, values for modularized as well as flat models and the differences thereof. Then, columns five to seven list results from applying statistical tests. In particular, we applied Wilcoxon Signed–Rank Test, as our sample is relatively small (cf. [179]).18 It can be seen that mental effort (H10) was statistically significantly lower for modularized models (Z = −2.07, p = 0.038, r = 0.69), whereas no statistically significant differences could be found for accuracy (H11, Z = −1.41, p = 0.157, r = 0.47) and duration (H12, Z = −0.77, p = 0.441, r = 0.26). Summarizing, the data indicates that information hiding decreases mental effort—however, no statistically significant influence with respect to accuracy and duration could be observed. When computing values for mental effort, accuracy and duration for each individual model, as shown in Table 5.30, the same pattern can be observed.

17 In total, 9 subjects participated in E3 and each subject only worked on either the modularized or the flat version of a model. Hence, when analyzing models individually, the group size is 4 and 5 subjects, respectively.
18 For the sake of completeness, tests for normal distribution can be found in Appendix A.5.
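As an aside, the per-subject normalization behind relative mental effort can be illustrated with a minimal sketch; the ratings below are placeholders for a single subject, not data collected in E3, and relative duration is obtained analogously using the duration denominator described above.

    # Illustrative computation of relative mental effort for one subject
    # (placeholder ratings on the 7-point scale, not data collected in E3).
    import numpy as np

    ratings = np.array([3, 4, 5, 2, 4, 3, 5, 4])  # one subject's ratings for one model

    # Each rating is divided by the subject's average rating, so a value of
    # 0.78 means 78% of this subject's average mental effort and 2.00 means
    # twice the average.
    relative_effort = ratings / ratings.mean()

    print(relative_effort.round(2))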
While the influence on mental effort appears to be positive and rather consistent, values for accuracy and duration seem to fluctuate.

Hypothesis           Model   Mod.   Flat    Δ
H10: Mental effort   M1      1.01   0.97    0.04
                     M2      0.90   0.98   −0.08
                     M3      1.01   1.14   −0.13
                     M4      1.01   1.10   −0.09
H11: Accuracy        M1      0.94   1.00   −0.06
                     M2      1.00   1.00    0.00
                     M3      0.94   1.00   −0.06
                     M4      0.95   0.94    0.01
H12: Duration        M1      1.13   1.07    0.06
                     M2      0.91   1.16   −0.25
                     M3      1.10   1.12   −0.02
                     M4      1.21   1.17    0.04
Table 5.30: Results for abstraction questions (per model)

Considering the statistically significant results for mental effort and knowing that mental effort and accuracy as well as mental effort and duration correlate (cf. Section 5.4), it seems surprising that no statistically significant differences with respect to accuracy and duration could be found. In the following, we first discuss potential explanations regarding accuracy and then turn toward duration. Basically, the high overall accuracy (0.97) and the low standard deviation (0.08) indicate that the lack of significant differences could be attributed to the ceiling effect [263]. In other words, the questions were not hard enough or the models were too small to cause a substantial number of errors, resulting in low fluctuations of accuracy. In fact, the average mental effort was 3.43, i.e., between Low mental effort and Neither high nor low mental effort. Furthermore, it was argued that accuracy is a less sensitive measure than mental effort [298] and thus larger samples are required to show statistically significant differences. Thus, it seems likely that the lack of significant differences with respect to accuracy can be traced back to the rather low sample size and the low complexity of the tasks. Regarding duration, we argued that results will depend on whether subjects spend time detailing their explanations. We tried to minimize this influence by computing the relative duration; however, it cannot be excluded that the lack of significant results is caused by this interference.

Up to now, we have focused on quantitative data for investigating the influence of information hiding. In the following, we turn to the think–aloud protocols and video recordings, discussing qualitative evidence for the utilization of information hiding. In particular, regardless of whether sub–processes were present or not, a two–step procedure could be observed. In the first step, subjects identified all activities relevant for answering a question. Apparently depending on personal preference, subjects used a pencil to highlight these activities, or simply placed a finger on the paper. In cognitive psychology this is referred to as external memory (cf. Section 3.2.3): the information about which activities have to be considered for answering the question is stored externally instead of taking up the human mind's working memory. In the second step, subjects performed the reasoning, i.e., interpreted the constraints relevant for these activities. Interestingly, after step 1 was performed, we could observe subjects actively pursuing information hiding. In particular, in modularized models, sheets of paper that contained sub–processes irrelevant for the question at hand were removed, e.g., "I don't need this here I think".
A similar pattern could be observed in the flat models: after having identified which parts of the model are relevant for answering the question at hand, subjects followed various strategies for hiding irrelevant information. For instance, a hand was used to cover up irrelevant parts of the model (“this part of the model cannot be performed”) or the relevant part of the model was highlighted: “cannot occur, since I’ve got here some kind of partial process”19. Hence, we conclude that information hiding appears to be a strategy that is intuitively followed by subjects. Interestingly, also for flat models, where all information is present at once, subjects emulated information hiding by covering up irrelevant parts of the model. Still, as indicated in Table 5.29 and Table 5.30, information hiding seems to be more present in modularized models than in flat models.

19 Original quote: “cannot occur, da ich hier so’n Teilprozess hab”.

RQ14: Is the Understanding of a Declare–Based Model Negatively Influenced by Fragmentation?

After having provided empirical evidence that modelers are basically able to understand modularization (RQ11) and for the positive influence of sub–processes (RQ12 and RQ13), we now turn to the postulated negative influence. As argued in Section 5.3.2, tasks that involve the content of several sub–processes require the modeler to mentally integrate these sub–processes, imposing a higher mental effort and leading to lower performance. To empirically investigate this claim, and similar to RQ13, we elaborated questions that presumably do not benefit from abstraction, but suffer from fragmentation. Hence, such questions should be easier to answer in a flat model, as they are not negatively influenced by modularization. Similar to RQ13, we start by approaching RQ14 from a quantitative angle and take a qualitative point of view afterwards.

The analysis of results follows the same strategy as applied in RQ13, i.e., we computed the relative mental effort, accuracy and relative duration for all models. Likewise, Table 5.31 shows hypotheses, values for modularized as well as flat models and the differences thereof. Similar to RQ13, we applied the Wilcoxon Signed–Rank Test to test for statistically significant differences; results are listed in columns five to seven. In particular, mental effort (H13) was significantly higher in modularized models (Z = −2.07, p = 0.038, r = 0.69), whereas no statistically significant differences could be found for accuracy (H14, Z = −1.89, p = 0.059, r = 0.63) and duration (H15, Z = −0.77, p = 0.441, r = 0.26). Hence, similar to RQ13, we could provide empirical evidence for the negative influence of modularization on mental effort, but could not show differences with respect to accuracy and duration. Also, when looking into the values computed for each model individually, a similar pattern can be observed (cf. Table 5.32). In particular, the influence on mental effort seems to be consistently negative in modularized models, whereas values for accuracy and duration apparently fluctuate.

Hypothesis            Mod.   Flat   Δ       Z       p        r
H13: Mental effort    1.02   0.95    0.07   −2.07   0.038a   0.69
H14: Accuracy         0.89   0.96   −0.07   −1.89   0.059    0.63
H15: Duration         0.91   0.87    0.04   −0.77   0.441    0.26
a significant at the 0.05 level

Table 5.31: Results for fragmentation questions

Interestingly, a similar pattern of results as described in RQ13 could be observed.
Again, mental effort was significantly different, while no significant differences with respect to accuracy and duration could be shown. In RQ13 we argued that the results regarding accuracy were to a certain extent caused by the ceiling effect, i.e., high accuracy and low standard deviation. RQ14 provides further evidence for this assumption. More specifically, the mean accuracy was lower (0.92 versus 0.97), while the standard deviation increased (0.13 versus 0.08). In line with these changes, the p value computed for the Wilcoxon Signed–Rank Test also dropped to near significance (0.059 versus 0.157). Hence, it seems likely that also for RQ14 the lack of significant differences with respect to accuracy can be traced back to the sample size and the low complexity of tasks. Nevertheless, empirical evidence for the negative influence of modularization in terms of mental effort could be provided. With respect to duration, we argued that the adoption of think–aloud techniques might have posed a considerable influence.

Hypothesis            Model   Mod.   Flat   Δ
H13: Mental effort    M1      0.99   1.03   −0.04
                      M2      1.10   1.02    0.08
                      M3      0.99   0.86    0.13
                      M4      0.99   0.90    0.09
H14: Accuracy         M1      0.75   0.95   −0.20
                      M2      0.95   0.94    0.01
                      M3      0.81   1.00   −0.19
                      M4      1.00   0.94    0.06
H15: Duration         M1      0.87   0.93   −0.06
                      M2      1.09   0.84    0.25
                      M3      0.90   0.88    0.02
                      M4      0.79   0.83   −0.04

Table 5.32: Results for fragmentation questions (per model)

To enhance RQ14 with qualitative insights, we examined the think–aloud protocols for evidence of fragmentation. A particularly explicit case of fragmentation could be found in question 4 from M2. Here, subjects were asked how often Decide on teaching method, contained in sub–process Prepare lessons, could be executed. Decide on teaching method was constrained to be executed exactly once in the sub–process Prepare lessons. Prepare lessons, in turn, was also restricted to be executed exactly once. Hence, subjects had to combine these two constraints to find out that Decide on teaching method could be executed exactly once. The reasoning process required to establish this answer can be found in a subject’s think–aloud protocol: “yes, has to be executed exactly once. . . it is in this sub–process of prepare lessons. Prepare lessons has to be executed exactly once and also in the sub–process exactly once. One times one is one”20. As described in RQ13, subjects first located relevant activities and then interpreted the associated constraints. In this particular case, the subject understood that it had to combine the cardinality constraint on Decide on teaching method with the cardinality constraint on Prepare lessons, i.e., had to integrate these two cardinality constraints. Even though this integration task appears especially easy (“one times one is one”), it emphasizes the problem of fragmentation: it requires the modeler to combine the semantics of (potentially) several constraints. This, in turn, was shown to be the major reason for misinterpreting declarative process models (cf. RQ11), providing further empirical evidence for the negative influence of fragmentation. An apparently especially difficult integration task could be found in a fragment of M1, cf. Figure 5.16.

20 Original quote: “ja, muss immer genau einmal ausgeführt werden, das is in dem, es is in dem Subprozess von prepare lessons. Prepare lessons muss genau einmal ausgeführt werden und das muss in dem Subprozess genau einmal, und ein mal eins ergibt bei mir auch wieder eins”.
In particular, the subjects had to assess the statement: “Write code” has to be executed before “Merge fix” can be executed. To this end, three facts have to be combined. First, Write code is contained in sub–process Apply TDD, while Merge fix can be found in sub–process Work with production software. Second, Apply TDD and Work with production software are connected by a precedence constraint, i.e., Apply TDD must be executed before Work with production software can be executed. From these two facts, it could mistakenly be inferred that Write code must be executed before Merge fix can be executed. However, third, Write code is not necessarily executed when Apply TDD is executed. Rather, Write test must be executed at least once and consequently also Run tests must be executed at least once due to the chain response constraint (cf. Table 3.1) between these two activities. Write code, though, is not required. Therefore, Apply TDD can be executed without an execution of Write code, thus also Merge fix can be executed without a prior execution of Write code. For illustration purposes, consider the following excerpt from a think–aloud transcript: “Write code has to be, write code, where are you, here, has to be executed before merge fix can be executed”. Here the subject searches for the activities Write code and Merge fix. Then, the subject examines the relationship between the sub–processes which contain these activities: “Yes, because before, ahm, before work with production software which is the sub–process where merge fix is. . . apply TDD has to be performed before”. Here, the subject apparently falsely integrates the precedence constraint between Apply TDD and Work with production software with the activities contained therein. Knowing that the subject answered 29 out of 32 (91%) questions correctly, it can be assumed that the subject tried its best to answer the questions correctly. Hence, we conclude that this task indeed posed a significant challenge for the subject.

Figure 5.16: Difficult integration task, adapted from [303]
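The constraint reasoning over the Figure 5.16 fragment can also be retraced programmatically. The following minimal sketch (Python) is an illustration only, not the formal Declare semantics or any tool used in this thesis: it encodes the constraints named above as simple checks over sub–traces and exhibits an execution in which all of them are satisfied, although Write code is never executed before Merge fix.

def at_least_once(trace, activity):
    # Existence constraint: the activity occurs at least once in the trace.
    return trace.count(activity) >= 1

def precedence(trace, first, then):
    # Precedence constraint: before any occurrence of `then`,
    # at least one `first` must have occurred.
    seen_first = False
    for event in trace:
        if event == first:
            seen_first = True
        elif event == then and not seen_first:
            return False
    return True

def chain_response(trace, first, then):
    # Chain response constraint: every `first` is immediately followed by `then`.
    return all(trace[i + 1:i + 2] == [then]
               for i, event in enumerate(trace) if event == first)

# Candidate execution: Apply TDD is performed (containing Write test and Run
# tests, but no Write code); afterwards Work with production software is
# performed and Merge fix is executed.
top_level = ["Apply TDD", "Work with production software"]
apply_tdd = ["Write test", "Run tests"]
work_with_production_software = ["Test fix", "Merge fix"]

# The constraints named in the discussion above hold for this execution ...
assert precedence(top_level, "Apply TDD", "Work with production software")
assert at_least_once(apply_tdd, "Write test")
assert chain_response(apply_tdd, "Write test", "Run tests")

# ... yet Merge fix occurs although Write code was never executed.
assert "Merge fix" in work_with_production_software
assert "Write code" not in apply_tdd
print("'Write code' does not have to be executed before 'Merge fix'.")

The sketch merely spells out the integration step that the subjects had to perform mentally; it is exactly this step that the erroneous interpretation described above skips.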
Limitations

Apparently, several limitations apply to E3. First, the empirical evaluation provides promising results; however, the rather low sample size (9 subjects) is a clear threat to the generalization of results. Similarly, even though the process models used in this study vary in the number of activities, constraints and sub–processes, it is not entirely clear whether the obtained results are applicable to every modularized declarative process model. In addition, we considered process models with a nesting level of one only, i.e., none of the sub–processes was refined using further sub–processes. As it was shown that an overuse of sub–processes may negatively impact the understanding of a model [45], the limited nesting level has to be seen as a further limitation of this study. Similarly, the questions used to assess understandability can only address a limited number of aspects. Even though questions were designed to representatively cover several aspects of the models (cf. [138]), a bias favoring certain questions cannot be ruled out entirely. In addition, all participating subjects indicated an academic background, limiting the generalization of results. However, subjects also indicated a profound background in BPM; hence, we argue that they can be seen as proxies for professionals. Finally, due to personnel limitations, the classification of error sources was performed by a single person only, i.e., the author. Therefore, limitations regarding the accuracy and reliability of the classification must also be acknowledged.

Discussion

The results of E3 show a diversified picture of modularization in declarative models. Basically, the findings of RQ11 indicate that modelers are able to properly interpret sub–processes. However, the adoption of sub–processes does not necessarily improve the understandability of a model. While pattern recognition (RQ12) and information hiding (RQ13) may lower the mental effort for understanding a process model, fragmentation (RQ14) appears to impose an additional burden on the modeler. Basically, due to the low sample size of E3, results should be interpreted with care. In this sense, comparing findings from E3 with results from E2 and R2 is particularly valuable for determining which findings could be replicated and seem to be stable. First and foremost, the impact of modularization on mental effort appears to be stable, as in E2, R2 as well as E3 all hypotheses regarding mental effort could be supported. With respect to accuracy, the situation is less clear. On the one hand, the negative influence of fragmentation on accuracy could be corroborated in E2 as well as R2, and differences in E3 barely missed statistical significance. On the other hand, no statistically significant empirical evidence for the influence of abstraction on accuracy could be found. This, however, may rather be traced back to problems with the experimental design. Also, the influence on duration could be shown to be statistically significant in E2 and R2, but could not be replicated in E3. As argued, this can be attributed to the adoption of think–aloud, which must be assumed to have a strong influence on the duration. Against this background, also the results from E3—even though obtained from a rather small sample—seem to be plausible. In addition, BPMN and Declare differ considerably with respect to semantics and modeling constructs. Hence, the results obtained in E3 suggest that the influence of abstraction and fragmentation, as postulated in Section 5.3.2, is not only specific to these two modeling languages, but can be considered a rather general effect that can also be expected for other conceptual modeling languages.

5.6 Limitations

Up to now, we have identified and discussed existing empirical research into the modularization of conceptual models through a systematic literature review in Section 5.2. Then, in Section 5.3 we proposed a cognitive–psychology–based framework, providing a new perspective on the interplay between modularization and understandability of a conceptual model. Subsequently, we empirically validated the framework for BPMN in Section 5.4 and for Declare in Section 5.5. In the following, we revisit the limitations of the described work.
As discussed in Section 5.3.3, the proposed framework is of a rather general nature and does not take into account peculiarities of specific modeling languages. In particular, the framework focuses on the structure of the model, but does not take into account its semantics. Hence, factors such as redundancy or minimality, as described in the good decomposition model [25, 266], cannot be taken into account. Similarly, we clearly acknowledge model size as an important factor, but consider model size only implicitly. We would like to emphasize at this point that we have deliberately decided not to take model size into account explicitly, as the appropriate size of a sub–model appears to be still unclear. Rather, the intention was to start with a rather simple, but stable framework which may be extended in future work. While its general nature basically allows for applying the framework to almost any conceptual modeling language supporting modularization, at the same time it calls for an empirical validation for particular modeling languages. In this sense, the empirical investigations E2, R2 and E3 provide empirical support for the framework. In particular, the influence on mental effort could be corroborated in E2, R2 and E3; the influence of fragmentation on accuracy could be empirically substantiated in E2 and R2, while no support could be found for the influence of abstraction on accuracy. Finally, support for the influence of abstraction and fragmentation on duration could be found in E2 and R2, whereas no statistically significant differences were found in E3.

Apparently, the results need to be viewed in the light of several limitations. In particular, the empirical evaluation focuses on BPMN and Declare. Although the results appear to be consistent, it remains unclear whether the findings can directly be applied to other conceptual modeling languages that support modularization. In addition, we tried to conduct a comprehensive empirical evaluation by taking into account different modeling languages and quantitative as well as qualitative data. Even though we tried to create representative models and questions by taking into account typical modeling constructs, it cannot be entirely excluded that the observed results are specific to these models. In the following, we revisit the findings of this work as well as the above described limitations in the course of a discussion.

5.7 Discussion

The central goal of this chapter is to provide a new perspective on the connection between the modularization of a process model and its understandability. To this end, drawing upon insights from previous empirical investigations and cognitive psychology, we proposed a framework that claims that the impact on understandability is determined by the positive influence of abstraction and the negative influence of fragmentation. In the course of three empirical studies, i.e., E2, R2 and E3, empirical support for the impact on mental effort, accuracy and duration could be found. It should be emphasized that the respective effects could be found largely consistently across three empirical investigations including two different modeling languages and subjects of varying background, i.e., students, academics and professionals.
Against this background, it seems likely that the influence of abstraction and fragmentation, as postulated, is not just a peculiarity of the experimental design, but can generally be found in BPMN–based and Declare–based models. Besides providing empirical support for the understandability framework proposed in Section 5.3.2, the results indicate that the benefits of a modularization depend on which kind of information should be extracted. In other words, if the question a modeler is interested in benefits more from abstraction than it is impaired by fragmentation, understandability will presumably improve. Contrariwise, if fragmentation prevails, the model will presumably become more difficult to understand. Thus, it seems worthwhile to maximize the ratio of abstraction to fragmentation. In this sense, dynamic process visualizations [19, 123, 204] seem to be promising, as they allow for visualizing the process model according to the modeler’s demands. In the context of this work, such a dynamic visualization would ensure that all relevant modeling elements are visible, while irrelevant modeling elements are hidden in sub–processes. For declarative process models, however, this requires an automated restructuring of modularized declarative process models. Such techniques are not in place yet and are only possible for process models that do not make use of enhanced expressiveness (cf. Section 3.1.3).

Regarding the interpretation of statically visualized declarative business process models, we could identify different strategies in the think–aloud protocols and video material. Basically, modelers appear to approach declarative process models in a sequential manner, i.e., they tend to describe the process in the order in which activities can be executed. Knowing that imperative process modeling languages, e.g., BPMN, are much more widespread than declarative process modeling languages, one might argue that this indicates that subjects were biased by the former category of modeling languages. Similarly, it might have been the case that this behavior was triggered by the layout of the process models. In particular, the process models were laid out so that activities that were executed at the beginning of a process instance were placed top left, whereas activities that were executed at the end of the process instance were placed bottom right. Even though a declarative process model typically does not prescribe a strict ordering of activities, this layout might have influenced the subjects. However, it was found that domain experts, i.e., persons unfamiliar with business process modeling, were also inclined toward sequential behavior (cf. Section 4.7). Hence, it seems likely that the abstract nature of declarative process models does not naturally fit the human way of reasoning.

Further evidence that constraints may indeed pose a considerable challenge for the modeler could be found in the tasks where subjects were asked to describe a process model. Therein, we could find indications that for larger process models, sub–processes helped to divide the model into manageable parts, i.e., the number of interacting constraints seems to play an essential role. Further evidence for this hypothesis is provided by the finding that subjects intuitively sought to reduce the number of constraints by, e.g., putting away sheets describing irrelevant sub–processes or, in a flat model, using the hand to hide irrelevant parts of the model.
In this sense, it seems plausible that an automated interpretation of constraints, as proposed in Chapter 4, can help to improve the understanding and thereby the maintenance of declarative business process models.

In this work we have not considered the granularity of modularizations. Likewise, we have not investigated whether correct levels of abstraction were applied for sub–processes, as discussed in detail in [51, 197]. Rather, our work has to be seen as an orthogonal perspective to these aspects. Even when optimizing granularity and abstraction levels, a process model may be modularized in various ways. The framework proposed in this work may then be used as an additional perspective, helping the modeler to decide for a specific modularization. Similarly, the results have to be seen in the light of guidelines for modularization. For instance, according to the good decomposition model [265], a proper modularization should satisfy minimality, determinism, losslessness, weak coupling and strong cohesion. Again, abstraction and fragmentation have to be seen as an additional perspective. Basically, satisfying the conditions of the good decomposition model can be related to optimizing the ratio between abstraction and fragmentation. For instance, achieving strong cohesion clearly aims at increasing abstraction by keeping closely related objects together (non–related objects will have to be placed in different sub–models to achieve strong cohesion, hence fostering abstraction). Weak coupling, in turn, aims at minimizing fragmentation by minimizing connections between sub–models and hence decreasing potential switches between sub–models. Losslessness, i.e., that no information is lost when introducing sub–processes, is not captured in our framework, as the focus of our work is put on models rather than on their creation. Finally, achieving minimality, i.e., non–redundancy, and determinism seems desirable for modularization. However, in our opinion these factors are not necessarily related to decomposition only, but should rather be seen as general modeling guidelines that also hold for non–modularized models. As our framework specifically focuses on modularization, we do not see a direct connection to our framework.

Regarding guidelines and specifically the guidance of modelers in creating modularized models, the findings obtained in this work may be used for the development of recommendations during modeling. Similar to recommendation systems that guide the execution of business processes [96, 220], support microblogging environments [289, 290] and help to organize semi–structured data [80, 81], modelers may be supported in the creation of modularized models. In particular, by instrumenting modeling editors, user interactions, such as switching between sub–models and scrolling, could be logged. An excessive amount of switching could indicate a high degree of fragmentation and the modeling environment could recommend the modeler to merge certain parts of the model. Contrariwise, an excessive amount of scrolling may indicate an overly large process model and the introduction of sub–models may be recommendable (see the sketch below). However, these scenarios are of a theoretical nature only so far and it has to be investigated first whether supporting, yet non–disrupting recommendations are feasible.
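The following sketch (Python) illustrates how such a recommendation could look; the event names, thresholds and the simple counting heuristic are purely illustrative assumptions and would have to be calibrated empirically before being of practical use.

from collections import Counter

def recommend(interaction_log, switch_threshold=30, scroll_threshold=100):
    # interaction_log: list of event types logged by an instrumented editor,
    # e.g., "switch_submodel", "scroll", "create_activity" (hypothetical names).
    counts = Counter(interaction_log)
    recommendations = []
    if counts["switch_submodel"] > switch_threshold:
        # Frequent switching between sub-models hints at fragmentation.
        recommendations.append("Consider merging closely related sub-models.")
    if counts["scroll"] > scroll_threshold:
        # Excessive scrolling hints at an overly large (sub-)model.
        recommendations.append("Consider extracting parts of the model into sub-models.")
    return recommendations

# Example usage with an invented interaction log.
log = ["scroll"] * 120 + ["switch_submodel"] * 5 + ["create_activity"] * 40
for hint in recommend(log):
    print(hint)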
With respect to empirical investigations of modularized models in general, the interplay of positive and negative influences is also of interest. In particular, the results cast doubt on how meaningful empirical comparisons between flat and modularized models are if questions are not designed carefully. In particular, when focusing on abstraction questions only, a bias for modularized models must be expected. Contrariwise, when an empirical investigation focuses on fragmentation questions only, the setup will most likely favor non–modularized models. Hence, merely comparing modularized models and flat models seems to be too shortsighted. Rather, positive and negative effects should be distinguished in the experimental design. Against this background, seemingly contradicting results from empirical investigations into modularization can be explained in a plausible way. In works reporting a positive influence, e.g., [150, 208], questions benefiting from abstraction probably prevailed. In inconclusive works, e.g., [41, 225], questions benefiting from abstraction and questions impaired by fragmentation were probably in balance. In works reporting a negative influence, in turn, e.g., [40, 45], questions impaired by fragmentation probably prevailed.

5.8 Related Work

The work presented in this chapter is basically related to three streams of research: research that investigates the modularization and understandability of conceptual models, research that deals with guidelines for modularization and research that is concerned with automated approaches to modularization.

Modularization and Understandability of Conceptual Models

In this work, we discussed characteristics of modularization in conceptual models and the impact on understandability. The impact of modularization on understandability was studied for various conceptual modeling languages, such as imperative business process models [206, 208], ER diagrams [150, 225], UML Statecharts [40, 42–44] and various other UML diagrams [24–26] (cf. Section 5.2). Even though all these works empirically investigate the connection between modularization and understanding, BPMN and Declare are not investigated. Similarly, the work of McDonald and Stevenson [136], in which the navigation performance in hierarchical documents is investigated, should also be mentioned. Even though hierarchical structures are investigated in that work as well, navigation rather than comprehension is of concern. More generally, work dealing with the understandability of conceptual models is also related. For instance, in [137] criteria for the systematic analysis of understandability are described. Similarly, [212] proposes COGEVAL, a cognitive–psychology–based theoretical framework for assessing the understanding of conceptual models. In this sense, the work of Reijers et al., which deals with the understandability of business process models in general [143, 207], is related as well. Similarly, in [139] the relationship between the size of imperative process models and error rates is established. Likewise, in [59, 260] the understanding of process models is assessed through the adoption of structural metrics. Regarding the understandability of declarative process models in general, the work presented in Chapter 4 is also related. Even though all these works investigate understandability, modularization is at most addressed as a side aspect.

Guidelines for Modularization

In this work we provide a new perspective on the link between modularization and understandability. Thereby, we also hope to advance the understanding of the principles modeling guidelines are based on.
In this sense, respective guidelines for the creation of business process models, e.g., [46, 49, 122, 145], are also of interest. Similarly, general considerations about modularization, resulting in the good decomposition model [265], were adopted for the modularization of object–oriented design [25] and the modularization of Event–driven Process Chains [113]. Likewise, in [51] a development method, which foresees the creation of modularized business processes, is described. Even though these works provide valuable insights into modularized models, none of them deals with declarative process models. In this work we focused on the outcome of a process modeling endeavor, i.e., the process model. Recently, researchers have also begun to investigate the process of creating a process model, referred to as the process of process modeling [192]. Similar to this work, the way modelers make sense of a process model while creating it is investigated—for instance, by visualizing the process of process modeling [30]. Similarly, different personalized modeling styles [185, 186] and modeling strategies were identified [31]. Even though this stream of research appears to be promising, none of these works investigates the creation of modularized models; rather, this stream of research currently focuses on imperative, non–modularized models.

Automated Approaches to Modularization

Besides assessing the understandability of modularization, several authors investigated potential ways of automatically creating modularized models. In particular, in [193] an approach for automatically aggregating activities based on the most relevant activities of a process model is proposed. Similarly, in [232] an approach for the automated abstraction of control flow, based on behavioral profiles, is described. Another automated approach to modularization is described in [231]—here, meronymy relations between activity labels are employed for automated modularization. Similarly, in [73, 201] methods for the automated clustering of ER diagrams are described and [78] discusses criteria for clustering, such as cohesion and coupling. A comprehensive overview of related approaches to clustering, in turn, is provided in [149, 152]. Finally, in [172] an automated approach for the decomposition of systems is proposed. Even though all of these approaches promise to provide abstraction in an automated way, it is unclear to what extent the created models will be understandable to the end–user, which is of concern in this work.

5.9 Summary

In this chapter, we proposed and empirically validated a cognitive–psychology–based framework for assessing the impact of the modularization of a conceptual model on its understandability. The starting point for this work was formed by a systematic literature review of empirical research investigating the interplay between modularization and understandability, which showed that—contrary to the expectation of researchers—the influence of modularization is not always positive. Rather, although most empirical studies reported a positive influence, several studies also reported non–significant differences or a negative influence of modularization. To explain these apparently contradicting findings, we proposed a cognitive–psychology–based framework that provides a potential explanation. Particularly, we identified abstraction, i.e., pattern recognition and information hiding, as a positive influence of modularization.
Contrariwise, we attributed the negative influence of modularization to fragmentation, i.e., the need to switch between sub–models and to re–integrate information. Depending on whether abstraction or fragmentation dominates, a positive influence, a negative influence or no influence at all can be observed. To empirically corroborate this claim, we conducted a series of empirical studies in which we applied the framework in various settings. First, in experiment E2 we focused on BPMN–based models and conducted a controlled experiment. The replication R2, in turn, was carried out as an on–line study, i.e., in a rather uncontrolled setting. Finally, empirical study E3 was conducted in a controlled setting, but shifted the focus from BPMN toward Declare and also took into account qualitative data, whereas E2 and R2 mostly collected quantitative data. Throughout these studies, similar effects and patterns could be observed. First, regardless of the setting, i.e., modeling language, subjects and experimental material, empirical support for the positive influence of abstraction as well as the negative influence of fragmentation could be found. Where observed effects could not be shown to be statistically significant, plausible alternative explanations, such as peculiarities of the experimental design, could be provided. Throughout E2, R2 and E3, the influence of modularization on mental effort could be found, i.e., we could find a statistically significant influence in all studies. Similarly, statistically significant differences in duration could be found in E2 and R2, whereas the lack thereof in E3 could rather be traced back to the application of think–aloud. Finally, empirical evidence for the influence on accuracy could be found. Even though we could not identify statistically significant results for the influence of abstraction on accuracy, we could trace these unexpected results back to the low number of committed errors and their distribution. Furthermore, the analysis of think–aloud protocols in E3 provided complementary qualitative support. For instance, we could observe subjects deliberately hiding sub–models, thereby drawing on abstraction—particularly information hiding—for relief. Contrariwise, we could also observe the reasoning processes involved in the integration of sub–models, and how fragmentation complicated the interpretation of models. Summarizing, we think that the empirical data collected in these studies provides convincing arguments for the existence of abstraction and fragmentation, as postulated. In this sense, we hope to have advanced the state of the art by providing a new perspective on the connection between modularization and understandability.

Chapter 6 Summary

In this thesis, we set out to investigate whether concepts from cognitive psychology can be applied for systematically improving the creation, understanding and maintenance of business process models. To approach this rather broad research question, we selected two promising areas of application. In particular, in Chapter 3, we built the foundation for this thesis by transferring concepts from cognitive psychology to business process modeling. Then, in Chapter 4, we used these insights to analyze potential problems regarding the creation, understanding and maintenance of declarative business process models. Largely, we could trace these issues back to the representation of sequential information in declarative process models as well as to hidden dependencies.
These properties, in turn, presumably complicate the understanding of declarative process models. To counteract these problems, we proposed TDM for the computer–supported computation of sequential information. Further, we implemented the concepts of TDM in TDMS and empirically validated the benefits of TDM in a case study, in an experiment and in a replication. Therein, we found empirical evidence corroborating that TDM can help to support the creation, understanding and maintenance of declarative process models. Subsequently, we turned toward the connection between a process model’s modularization and its understandability in Chapter 5. Therein, we conducted a systematic literature review for assessing the state of the art regarding empirical research investigating the link between modularization and understandability. In the course of the review, we found that insights seem to vary from positive over neutral to negative. To provide a potential explanation for these differing findings, we drew on concepts from cognitive psychology and proposed a framework for assessing the impact of modularization on the understandability of a process model. Then, to support the efficient empirical validation of the proposed framework, we implemented Hierarchy Explorer for displaying modularized process models. Hierarchy Explorer, in turn, was employed in an experiment and a replication, in which the framework was validated for BPMN–based process models. In an additional experiment, findings were complemented by applying the framework to Declare–based process models. The findings of these empirical studies show that the proposed framework, in particular the interplay of abstraction and fragmentation, allows for assessing the influence of modularization on understandability. In this way, we think that the framework provides a new perspective on the modularization of process models and contributes to the creation of modularized process models that are easier to understand.

In the light of these findings, we conclude that the central research question of this thesis—whether cognitive psychology can be used to improve the creation, understanding and maintenance of process models—can be clearly affirmed. Moreover, concepts from cognitive psychology were not only applied to a single purpose or a single modeling language, but to three purposes (creation, understanding and maintenance) and two modeling languages (BPMN and Declare). Thus, we believe that other areas, such as research into the layout of process models or comparisons between process modeling languages, could also benefit from respective concepts. Furthermore, it was noted that particularly the comparison of modeling languages and methods often lacks respective theoretical underpinnings [69]. In this sense, we think that insights from cognitive psychology could help to put such discussions on a more objective basis. Even though the contributions presented in this thesis are self–contained, they apparently also offer the opportunity for follow–up research. In particular, TDM currently focuses on control flow only, but could be extended to include data and resources in test cases. Similarly, TDMS provides support for Declare only; however, it could be extended toward supporting other declarative modeling languages, such as DCR graphs. Regarding the framework for assessing the impact of modularization on understanding, particularly more detailed empirical investigations seem to be promising.
For instance, through the adoption of eye tracking technology [58], the way how humans interpret modularized process models could be investigated in far more detail. Concluding, we think that bringing together cognitive psychology and business process modeling provides a fruitful basis for research. In this vein, we hope that this thesis helps to improve business process modeling, but also fosters interdisciplinarity. 188 Appendix A Tests for Normal Distribution In this part of the appendix, results from tests for normal distributions are listed. The results were put in the appendix to avoid the text being cluttered by tables. A.1 Experiment E1 Variable Group N D Mental effort With test cases Without test cases 12 12 0.352 0.280 0.000a 0.010a Perceived quality With test cases Without test cases 12 12 0.333 0.160 0.001a 0.200 Quality With test cases Without test cases 12 12 0.339 0.399 0.000a 0.000a Operations With test cases Without test cases 12 12 0.209 0.365 0.154 0.000a a p significant at the 0.05 level Table A.1: Kolmogorov–Smirnov Test with Lilliefors significance correction for E1 189 Appendix A Tests for Normal Distribution A.2 Replication R1 Variable Group N D Mental effort With test cases Without test cases 31 31 0.183 0.277 0.010a 0.000a Perceived quality With test cases Without test cases 31 31 0.323 0.203 0.000a 0.002a Quality With test cases Without test cases 31 31 0.271 0.219 0.000a 0.001a Operations With test cases Without test cases 31 31 0.241 0.243 0.000a 0.000a a p significant at the 0.05 level Table A.2: Kolmogorov–Smirnov Test with Lilliefors significance correction for R1 A.3 Experiment E2 Variable Group N D Mental effort Modularized Flat 109 109 0.100 0.094 0.010a 0.018a Accuracy Modularized Flat 109 109 0.338 0.380 0.000a 0.000a Duration Modularized Flat 109 109 0.117 0.118 0.001a 0.001a a p significant at the 0.05 level Table A.3: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , abstraction questions, total values 190 A.3 Experiment E2 Variable Model Group N D Mental effort M1 Modularized Flat 56 53 0.110 0.142 0.091 0.009a M2 Modularized Flat 53 56 0.118 0.107 0.063 0.164 M1 Modularized Flat 56 53 0.310 0.438 0.000a 0.000a M2 Modularized Flat 53 56 0.364 0.323 0.000a 0.000a M1 Modularized Flat 56 53 0.130 0.111 0.020a 0.148 M2 Modularized Flat 53 56 0.076 0.230 0.2000 0.000a Accuracy Duration a p significant at the 0.05 level Table A.4: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , abstraction questions, values per model 191 Appendix A Tests for Normal Distribution Model Question Group N D M1 Q1 Modularized Flat 56 53 0.199 0.261 0.000a 0.000a Q3 Modularized Flat 56 53 0.239 0.203 0.000a 0.000a Q5 Modularized Flat 56 53 0.243 0.260 0.000a 0.000a Q7 Modularized Flat 56 53 0.223 0.310 0.000a 0.000a Q9 Modularized Flat 53 56 0.180 0.240 0.000a 0.000a Q11 Modularized Flat 53 56 0.204 0.219 0.000a 0.000a Q13 Modularized Flat 53 56 0.206 0.241 0.000a 0.000a Q15 Modularized Flat 53 56 0.204 0.248 0.000a 0.000a M2 a p significant at the 0.05 level Table A.5: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , abstraction questions, mental effort, values per question 192 A.3 Experiment E2 Model Question Group N D M1 Q1 Modularized Flat 56 53 0.499 0.539 0.000a 0.000a Q3 Modularized Flat 56 53 0.483 0.531 0.000a 0.000a Q5 Modularized Flat 56 53 0.475 0.503 0.000a 0.000a Q7 Modularized Flat 56 53 0.540 0.539 0.000a 0.000a Q9 Modularized Flat 53 56 0.533 0.525 0.000a 0.000a Q11 Modularized Flat 53 56 
0.466 0.478 0.000a 0.000a Q13 Modularized Flat 53 56 0.533 0.525 0.000a 0.000a Q15 Modularized Flat 53 56 0.499 0.536 0.000a 0.000a M2 a p significant at the 0.05 level Table A.6: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , abstraction questions, accuracy, values per question 193 Appendix A Tests for Normal Distribution Model Question Group N D M1 Q1 Modularized Flat 56 53 0.202 0.119 0.000a 0.060 Q3 Modularized Flat 56 53 0.263 0.111 0.000a 0.151 Q5 Modularized Flat 56 53 0.173 0.082 0.000a 0.200 Q7 Modularized Flat 56 53 0.145 0.178 0.005a 0.000a Q9 Modularized Flat 53 56 0.189 0.171 0.000a 0.001a Q11 Modularized Flat 53 56 0.143 0.118 0.001a 0.064 Q13 Modularized Flat 53 56 0.197 0.147 0.000a 0.006a Q15 Modularized Flat 53 56 0.143 0.163 0.006a 0.001a M2 a p significant at the 0.05 level Table A.7: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , abstraction questions, duration, values per question Variable Group N D Mental effort Modularized Flat 109 109 0.122 0.096 0.000a 0.016a Accuracy Modularized Flat 109 109 0.246 0.372 0.000a 0.000a Duration Modularized Flat 109 109 0.122 0.081 0.000a 0.075 a p significant at the 0.05 level Table A.8: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , fragmentation questions, total values 194 A.3 Experiment E2 Variable Model Group N D Mental effort M1 Modularized Flat 56 53 0.132 0.141 0.016a 0.010a M2 Modularized Flat 53 56 0.134 0.093 0.019a 0.200 M1 Modularized Flat 56 53 0.239 0.433 0.000a 0.000a M2 Modularized Flat 53 56 0.250 0.310 0.000a 0.000a M1 Modularized Flat 56 53 0.066 0.084 0.200 0.200 M2 Modularized Flat 53 56 0.177 0.092 0.000a 0.200 Accuracy Duration a p significant at the 0.05 level Table A.9: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , fragmentation questions, values per model 195 Appendix A Tests for Normal Distribution Model Question Group N D M1 Q2 Modularized Flat 56 53 0.222 0.218 0.000a 0.000a Q4 Modularized Flat 56 53 0.250 0.263 0.000a 0.000a Q6 Modularized Flat 56 53 0.162 0.200 0.001a 0.000a Q8 Modularized Flat 56 53 0.165 0.305 0.001a 0.000a Q10 Modularized Flat 53 56 0.278 0.199 0.000a 0.000a Q12 Modularized Flat 53 56 0.214 0.228 0.000a 0.000a Q14 Modularized Flat 53 56 0.191 0.206 0.000a 0.000a Q16 Modularized Flat 53 56 0.242 0.176 0.000a 0.000a M2 a p significant at the 0.05 level Table A.10: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , fragmentation questions, mental effort, values per question 196 A.3 Experiment E2 Model Question Group N D M1 Q2 Modularized Flat 56 53 0.466 0.525 0.000a 0.000a Q4 Modularized Flat 56 53 0.483 0.539 0.000a 0.000a Q6 Modularized Flat 56 53 0.507 0.536 0.000a 0.000a Q8 Modularized Flat 56 53 0.367 0.531 0.000a 0.000a Q10 Modularized Flat 53 56 0.393 0.499 0.000a 0.000a Q12 Modularized Flat 53 56 0.431 0.499 0.000a 0.000a Q14 Modularized Flat 53 56 0.495 0.514 0.000a 0.000a Q16 Modularized Flat 53 56 0.431 0.466 0.000a 0.000a M2 a p significant at the 0.05 level Table A.11: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , fragmentation questions, accuracy, values per question 197 Appendix A Tests for Normal Distribution Model Question Group N D M1 Q2 Modularized Flat 56 53 0.105 0.167 0.194 0.001a Q4 Modularized Flat 56 53 0.086 0.184 0.200 0.000a Q6 Modularized Flat 56 53 0.115 0.125 0.065 0.038a Q8 Modularized Flat 56 53 0.222 0.075 0.000a 0.200 Q10 Modularized Flat 53 56 0.140 0.092 0.011a 0.200 Q12 Modularized Flat 53 56 
0.148 0.166 0.005a 0.001a Q14 Modularized Flat 53 56 0.205 0.160 0.000a 0.001a Q16 Modularized Flat 53 56 0.112 0.129 0.096 0.021a M2 a p significant at the 0.05 level Table A.12: Kolmogorov–Smirnov Test with Lilliefors significance correction for E2 , fragmentation questions, duration, values per question 198 A.4 Replication R2 A.4 Replication R2 Variable Group N D Mental Effort Modularized Flat 48 48 0.161 0.112 0.003a 0.178 Accuracy Modularized Flat 48 48 0.386 0.422 0.000a 0.000a Duration Modularized Flat 30 30 0.091 0.197 0.200 0.004a a p significant at the 0.05 level Table A.13: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , abstraction questions, total values 199 Appendix A Tests for Normal Distribution Variable Model Group N D p Mental Effort M1 Modularized Flat 20 28 0.150 0.140 0.200 0.173 M2 Modularized Flat 28 20 0.161 0.134 0.061 0.200 M1 Modularized Flat 20 28 0.347 0.374 0.000a 0.000a M2 Modularized Flat 28 20 0.429 0.487 0.000a 0.000a M1 Modularized Flat 14 16 0.205 0.157 0.113 0.200 M2 Modularized Flat 16 14 0.166 0.182 0.200 0.200 Accuracy Duration a significant at the 0.05 level Table A.14: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , abstraction questions, values per model 200 A.4 Replication R2 Model Question Group N D M1 Q1 Modularized Flat 20 28 0.280 0.252 0.000a 0.000a Q3 Modularized Flat 20 28 0.300 0.203 0.000a 0.005a Q5 Modularized Flat 20 28 0.278 0.196 0.000a 0.007a Q7 Modularized Flat 20 28 0.235 0.245 0.005a 0.000a Q9 Modularized Flat 28 20 0.221 0.172 0.001a 0.124 Q11 Modularized Flat 28 20 0.246 0.250 0.000a 0.002a Q13 Modularized Flat 28 20 0.307 0.277 0.000a 0.000a Q15 Modularized Flat 28 20 0.225 0.280 0.001a 0.000a M2 a p significant at the 0.05 level Table A.15: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , abstraction questions, mental effort, values per question 201 Appendix A Tests for Normal Distribution Model Question Group N M1 Q1 Modularized Flat 20 28 0.538 0.000a Not applicableb Q3 Modularized Flat 20 28 0.509 0.526 0.000a 0.000a Q5 Modularized Flat 20 28 0.438 0.447 0.000a 0.000a Q7 Modularized Flat 20 28 0.538 0.536 0.000a 0.000a Q9 Modularized Flat 28 20 0.513 0.538 0.000a 0.000a Q11 Modularized Flat 28 20 0.526 0.538 0.000a 0.000a Q13 Modularized Flat 28 20 0.539 0.000a Not applicableb Q15 Modularized Flat 28 20 0.539 0.527 M2 a b D p 0.000a 0.000a significant at the 0.05 level The variance of the group is 0. As the variance of a normal distribution must be larger than 0, Kolmogorov–Smirnov Test cannot be applied. Likewise, the group is not normal distributed. 
Table A.16: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , abstraction questions, accuracy, values per question 202 A.4 Replication R2 Model Question Group N D p M1 Q1 Modularized Flat 14 16 0.130 0.196 0.200 0.103 Q3 Modularized Flat 14 16 0.128 0.114 0.200 0.200 Q5 Modularized Flat 14 16 0.202 0.120 0.128 0.200 Q7 Modularized Flat 14 16 0.257 0.197 0.013a 0.098 Q9 Modularized Flat 16 14 0.174 0.163 0.200 0.200 Q11 Modularized Flat 16 14 0.268 0.224 0.003a 0.055 Q13 Modularized Flat 16 14 0.136 0.147 0.200 0.200 Q15 Modularized Flat 16 14 0.107 0.112 0.200 0.200 M2 a significant at the 0.05 level Table A.17: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , abstraction questions, duration, values per question Variable Group N D p Mental effort Modularized Flat 48 48 0.107 0.108 0.200 0.200 Accuracy Modularized Flat 48 48 0.291 0.427 0.000a 0.000a Duration Modularized Flat 30 30 0.125 0.137 0.200 0.156 a significant at the 0.05 level Table A.18: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , fragmentation questions, total values 203 Appendix A Tests for Normal Distribution Variable Model Group N D p Mental Effort M1 Modularized Flat 20 28 0.121 0.094 0.200 0.200 M2 Modularized Flat 28 20 0.171 0.128 0.036a 0.200 M1 Modularized Flat 20 28 0.259 0.429 0.001a 0.000a M2 Modularized Flat 28 20 0.312 0.424 0.000a 0.000a M1 Modularized Flat 14 16 0.205 0.164 0.115 0.200 M2 Modularized Flat 16 14 0.122 0.133 0.200 0.200 Accuracy Duration a significant at the 0.05 level Table A.19: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , fragmentation questions, values per model 204 A.4 Replication R2 Model Question Group N D M1 Q2 Modularized Flat 20 28 0.245 0.206 0.003a 0.004a Q4 Modularized Flat 20 28 0.250 0.300 0.002a 0.000a Q6 Modularized Flat 20 28 0.185 0.173 0.073 0.031a Q8 Modularized Flat 20 28 0.270 0.218 0.001a 0.002a Q10 Modularized Flat 28 20 0.215 0.269 0.002a 0.001a Q12 Modularized Flat 28 20 0.310 0.200 0.000a 0.035a Q14 Modularized Flat 28 20 0.220 0.276 0.001a 0.000a Q16 Modularized Flat 28 20 0.274 0.239 0.000a 0.004a M2 a p significant at the 0.05 level Table A.20: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , fragmentation questions, mental effort, values per question 205 Appendix A Tests for Normal Distribution Model Question Group N D M1 Q2 Modularized Flat 20 28 0.438 0.513 Q4 Modularized Flat 20 28 0.538 0.000a Not applicableb Q6 Modularized Flat 20 28 Not applicableb 0.526 0.000a Q8 Modularized Flat 20 28 0.413 0.536 0.000a 0.000a Q10 Modularized Flat 28 20 0.411 0.509 0.000a 0.000a Q12 Modularized Flat 28 20 0.536 0.538 0.000a 0.000a Q14 Modularized Flat 28 20 0.539 0.538 0.000a 0.000a Q16 Modularized Flat 28 20 0.513 0.527 0.000a 0.000a M2 a b p 0.000a 0.000a significant at the 0.05 level The variance of the group is 0. As the variance of a normal distribution must be larger than 0, Kolmogorov–Smirnov Test cannot be applied. Likewise, the group is not distributed. 
Table A.21: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , fragmentation questions, accuracy, values per question 206 A.4 Replication R2 Model Question Group N D p M1 Q2 Modularized Flat 14 16 0.205 0.165 0.113 0.200 Q4 Modularized Flat 14 16 0.217 0.195 0.073 0.106 Q6 Modularized Flat 14 16 0.099 0.198 0.200 0.094 Q8 Modularized Flat 14 16 0.147 0.229 0.200 0.025a Q10 Modularized Flat 16 14 0.214 0.204 0.048a 0.118 Q12 Modularized Flat 16 14 0.192 0.152 0.118 0.200 Q14 Modularized Flat 16 14 0.174 0.193 0.200 0.166 Q16 Modularized Flat 16 14 0.160 0.135 0.200 0.200 M2 a significant at the 0.05 level Table A.22: Kolmogorov–Smirnov Test with Lilliefors significance correction for R2 , fragmentation questions, duration, values per question 207 Appendix A Tests for Normal Distribution A.5 Experiment E3 Variable Group N D p Mental Effort Modularized Flat 9 9 0.155 0.151 0.200 0.200 Accuracy Modularized Flat 9 9 0.459 0.519 0.000a 0.000a Duration Modularized Flat 9 9 0.188 0.146 0.200 0.200 a significant at the 0.05 level Table A.23: Kolmogorov–Smirnov Test with Lilliefors significance correction for E3 , abstraction questions, total values Variable Group N D p Mental Effort Modularized Flat 9 9 0.155 0.151 0.200 0.200 Accuracy Modularized Flat 9 9 0.245 0.414 0.127 0.000a Duration Modularized Flat 9 9 0.188 0.146 0.200 0.200 a significant at the 0.05 level Table A.24: Kolmogorov–Smirnov Test with Lilliefors significance correction for E3 , fragmentation questions, total values 208 Appendix B Supplementary Information B.1 Process of Thesis Writing In the vein of the Process of Process Modeling (PPM) [185, 192], we analyzed the process of thesis writing, i.e., the way how this thesis was written. We thereby focus on the writing part only, i.e., we describe how this document—given a set of published research—evolved. The basis for this analysis is a SVN repository1 in which the LATEX sources of intermediate versions of this document, also referred to as revisions, are stored. In particular, the document was periodically, e.g., at the end of each working day, saved in the repository. Thus, by looking at the intermediate versions of the document, its evolution can be reconstructed. Technically, the analysis was supported by a Bash script that determines all revisions, compiles a PDF file for reach revision, computes metrics, e.g., number of pages, and writes the results to an output file.2 Applying this procedure led to data points from 183 revisions, therefore we preprocessed the data and removed all revisions that differed less than 2 pages. The result of this procedure can be found in Figure B.1: the solid line shows the amount of pages, the dotted line shows the amount of references, the dashed line shows the amount of figures and the dash–dotted line shows the amount of tables. Due to the preprocessing, only 58 revisions are shown on the x–axis. Even though this data is clearly influenced by uncontrolled variables, such as holidays, durations between revisions and workload, some interesting observations—that clearly resemble the PPM—can be made. • Steep Writing Phases As discussed in Chapter 1, most of the research described in this thesis was previously published, hence also this thesis builds upon these publications. 
Pasting the content of these publications into the thesis resulted in a quick increase of page numbers, as can be seen in, e.g., revisions 11, 17 and 30. We would like to remark at this point that the steep increase of pages at revision 52 can rather be traced back to adding the tables listing tests for normal distribution in Appendix A. Likewise, the final steep increase is related to the creation of the preface.
• Steady Writing Phases Of course, this thesis could not be assembled by just pasting publications next to each other. Rather, text was specifically produced for this thesis. Clearly, in such phases, the number of pages increases steadily, but more slowly than in steep writing phases. Examples of steady writing phases can be found from revisions 6–11, 23–29 and 34–44.
• Reconciliation Phases After text was produced—regardless of whether it was pasted or written—the content needs to be reviewed, consolidated and compacted. This, in turn, may lead to a reduction of pages, as can be seen for revisions 6, 15 and 33.
• Chapter Phases Against this background, also the chapters of this thesis can be found. Starting with Chapter 2 at revisions 1–6, Chapter 3 follows at revisions 7–11. Chapter 4 is reflected through revisions 12–31 and Chapter 5 is related to revisions 32–52. Finally, in revisions 53–58 Chapter 1 and Chapter 6 as well as the preface can be found.

1 SVN is freely available from: http://subversion.apache.org
2 The script is freely available from: http://bpm.q-e.at/misc/ThesisEvolution

Figure B.1: Process of Thesis Writing (amount of pages, references, figures and tables per preprocessed SVN revision)

Please note that, due to the preprocessing, time–consuming tasks, such as final spell–checking, are not reflected in this diagram. Finally, it is worthwhile to note that from the beginning of the PhD to its end, in total 3,246 cups of coffee were consumed. Put differently, for each page 12.8 cups of coffee were required.

B.2 Publications

The research conducted in this thesis was embedded in several international cooperations. Likewise, the author was given the opportunity to cooperate in research that is not directly connected to this thesis. To give an overview of all works the author was involved in, all publications can be found in Figure B.2—publications directly related to this thesis are marked with a circle. It should also be mentioned that all publications directly related to this thesis were led by the author. In this sense, the author was responsible for the research described in these publications, including conceptualization, implementation and empirical validation. Co-authors, however, supported the work by giving feedback, participating in discussions and supporting the data collection.

Figure B.2: All publications, organized by topic (Imperative Process Models, Declarative Process Models, Understandability, Modularization, Process of Process Modeling, Tools for Empirical Research, Process Flexibility, Semantic Web and Medical Informatics)

Abbreviations

Throughout this thesis, we made use of abbreviations.
Even though we carefully introduced abbreviations, for the sake of completeness and quick lookup, all abbreviations used in this thesis are listed in Table B.1.

Abbreviation  Full Name
AAT           Automated Acceptance Testing
ACM           Association for Computing Machinery
BPM           Business Process Management
BPMN          Business Process Model and Notation
CDF           Cognitive Dimensions Framework
CEP           Cheetah Experimental Platform
CLT           Cognitive Load Theory
DCR           Dynamic Condition Response
DE            Domain Expert
DSRM          Design Science Research Methodology
EPC           Event–Driven Process Chain
ER            Entity–Relationship
GEF           Graphical Editing Framework
GRETA         Graphical Runtime Environment for Adaptive Processes
HCI           Human Computer Interaction
HERD          Hierarchical Entity–Relationship Diagram
IEEE          Institute of Electrical and Electronics Engineers
IS            Information Systems
LaTeX         Lamport TeX
LDM           Levelled Data Model
LTL           Linear Temporal Logic
MAD           Median Absolute Deviation
MB            Model Builder
MM            Model Mediator
PAIS          Process Aware Information System
PDF           Portable Document Format
PPM           Process of Process Modeling
RQ            Research Question
SD            Standard Deviation
SPSS          Statistical Package for Social Sciences
SVN           Subversion
TAM           Technology Acceptance Model
TDD           Test Driven Development
TDM           Test Driven Modeling
TDMS          Test Driven Modeling Suite
UI            User Interface
UML           Unified Modeling Language
UTP           UML Testing Profile
YAWL          Yet Another Workflow Language

Table B.1: Abbreviations

Bibliography

[1] H. Agrawal, J. R. Horgan, E. W. Krauser, and S. London. Incremental Regression Testing. In Proc. ICSM’93, pages 348–357, 1993.
[2] D. Amyot and A. Eberlein. An Evaluation of Scenario Notations and Construction Approaches for Telecommunication Systems Development. Telecommunication Systems, 24(1):61–94, 2003.
[3] E. Arisholm and D. I. Sjøberg. Evaluating the Effect of a Delegated versus Centralized Control Style on the Maintainability of Object-Oriented Software. IEEE Transactions on Software Engineering, 30(8):521–534, 2004.
[4] P. Attie, M. Singh, A. Sheth, and M. Rusinkiewicz. Specifying and Enforcing Intertask Dependencies. In Proc. VLDB’93, pages 134–145, 1993.
[5] A. Awad, G. Decker, and M. Weske. Efficient Compliance Checking Using BPMN-Q and Temporal Logic. In Proc. BPM’08, pages 326–341, 2008.
[6] A. Awad, S. Smirnov, and M. Weske. Resolution of Compliance Violation in Business Process Models: A Planning-Based Approach. In Proc. OTM’09, pages 6–23, 2009.
[7] A. Baddeley. Working Memory. Science, 255(5044):556–559, 1992.
[8] A. Baddeley. Working Memory: Theories, Models, and Controversies. Annual Review of Psychology, 63(1):1–29, 2012.
[9] M. Bannert. Managing cognitive load—recent trends in cognitive load theory. Learning and Instruction, 12(1):139–146, 2002.
[10] I. Barba, B. Weber, and C. D. Valle. Supporting the Optimized Execution of Business Processes through Recommendations. In Proc. BPI’11, pages 135–140, 2012.
[11] I. Barba, B. Weber, C. D. Valle, and A. J. Ramírez. User Recommendations for the Optimized Execution of Business Processes. Data & Knowledge Engineering, 86:61–84, 2013.
[12] F. Bartlett. Remembering: A Study in Experimental and Social Psychology. Cambridge University Press, 1932.
[13] V. R. Basili. The Role of Experimentation in Software Engineering: Past, Current, and Future. In Proc. ICSE’96, pages 442–449, 1996.
[14] K. Beck. Extreme Programming Explained: Embracing Change. Addison-Wesley, 1999.
[15] K. Beck. Test Driven Development: By Example. Addison-Wesley, 2002.
[16] P. Berander.
Using Students as Subjects in Requirements Prioritization. In Proc. ISESE’04, pages 167–176, 2004. [17] R. Bergenthum, J. Desel, S. Mauser, and R. Lorenz. Construction of Process Models from Example Runs. In Transactions on Petri Nets and Other Models of Concurrency II, pages 243–259. Springer, 2009. [18] A. Bertolino, G. D. Angelis, A. D. Sandro, and A. Sabetta. Is my model right? Let me ask the expert. Journal of Systems and Software, 84(7):1089– 1099, 2011. [19] R. Bobrik, M. Reichert, and T. Bauer. Requirements for the Visualization of System-Spanning Business Processes. In Proc. DEXA’05, pages 948–954, 2005. [20] B. W. Boehm. Verifying and Validating Software Requirements and Design Specifications. IEEE Software, 1(1):75–88, 1984. [21] D. E. Broadbent. The magic number seven after fifteen years. In Studies in long term memory, pages 3–18. Wiley, 1975. [22] C. Brown. Cognitive Psychology. Sage Publications, 2006. [23] J.-M. Burkhardt, F. Détienne, and S. Wiedenbeck. Object-Oriented Program Comprehension: Effect of Expertise, Task and Phase. Empirical Software Engineering, 7(2):115–156, 2002. [24] A. Burton-Jones and P. N. Meso. How Good Are These UML Diagrams? An Empirical Test of the Wand and Weber Good Decomposition Model. In Proc. ICIS’02, pages 101–114, 2002. 216 Bibliography [25] A. Burton-Jones and P. N. Meso. Conceptualizing Systems for Understanding: An Empirical Test of Decomposition Principles in Object-Oriented Analysis. Information Systems Research, 17(1):38–60, 2006. [26] A. Burton-Jones and P. N. Meso. The Effects of Decomposition Quality and Multiple Forms of Information on Novices’ Understanding of a Domain from a Conceptual Model. Journal of the Association for Information Systems, 9(12):748–802, 2008. [27] G. Canfora, A. Cimitile, F. Garcia, M. Piattini, and C. A. Visaggio. Evaluating Advantages of Test Driven Development: a Controlled Experiment with Professionals. In Proc. ISESE’06, pages 364–371, 2006. [28] W. Chase and H. Simon. Perception in Chess. Cognitive Psychology, 4(1):55– 81, 1973. [29] W. Chase and H. Simon. The Mind’s Eye in Chess. In Visual information processing, pages 215–281. Academic Press, 1973. [30] J. Claes, I. Vanderfeesten, J. Pinggera, H. Reijers, B. Weber, and G. Poels. Visualizing the Process of Process Modeling with PPMCharts. In Proc. TAProViz’12, pages 744–755, 2013. [31] J. Claes, I. Vanderfeesten, H. Reijers, J. Pinggera, M. Weidlich, S. Zugal, D. Fahland, B. Weber, J. Mendling, and G. Poels. Tying Process Model Quality to the Modeling Process: The Impact of Structuring, Movement, and Speed. In Proc. BPM’12, pages 33–48, 2012. [32] J. Cohen. Statistical Power Analysis for the Behavioral Sciences, Second Edition. Lawrence Erlbaum, 1988. [33] J. Cohen. A Power Primer. Psychological Bulletin, 112(1):155–159, 1992. [34] A. R. Conway, N. Cowan, M. F. Bunting, D. J. Therriault, and S. R. B. Minkoff. A latent variable analysis of working memory capacity, short-term memory capacity, processing speed, and general fluid intelligence. Intelligence, 30(2):163–183, 2002. [35] J. Corbin and A. Strauss. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. SAGE Publications, 2007. [36] L. D. Couglin and V. L. Patel. Processing of critical information by physicians and medical students. Journal of Medical Education, 62(10):818–828, 1987. 217 Bibliography [37] N. Cowan. Working Memory Capacity. Psycholology Press, 2005. [38] A. W. Crapo, L. B. Waisel, W. A. Wallace, and T. R. Willemain. 
Visualization and the process of modeling: a cognitive-theoretic view. In Proc. KDD’00, pages 218–226, 2000. [39] J. Creswell. Research Design: Qualitative, Quantitative and Mixed Method Approaches. Sage Publications, 2002. [40] J. Cruz-Lemus, M. Genero, and M. Piattini. Using Controlled Experiments for Validating UML Statechart Diagrams Measures. In Proc. IWSM-Mensura’07, pages 129–138, 2008. [41] J. Cruz-Lemus, M. Genero, M. Piattini, and A. Toval. Investigating the Nesting Level of Composite States in UML Statechart Diagrams. In Proc. QAOOSE’05, pages 97–108, 2005. [42] J. A. Cruz-Lemus, M. Genero, M. E. Manso, S. Morasca, and M. Piattini. Assessing the understandability of UML statechart diagrams with composite states—A family of empirical studies. Empirical Software Engineering, 25(6):685–719, 2009. [43] J. A. Cruz-Lemus, M. Genero, M. E. Manso, and M. Piattini. Evaluating the Effect of Composite States on the Understandability of UML Statechart Diagrams. In Proc. MODELS’05, pages 113–125, 2005. [44] J. A. Cruz-Lemus, M. Genero, S. Morasca, and M. Piattini. Using Practitioners for Assessing the Understandability of UML Statechart Diagrams with Composite States. In Proc. ER Workshops’07, pages 213–222, 2007. [45] J. A. Cruz-Lemus, M. Genero, M. Piattini, and A. Toval. An Empirical Study of the Nesting Level of Composite States Within UML Statechart Diagrams. In Proc. ER Workshops’05, pages 12–22, 2005. [46] N. Damij. Business process modelling using diagrammatic and tabular techniques. Business Process Management Journal, 13(1):70–90, 2007. [47] F. Davies. A Technology Acceptance Model for Empirically Testing New EndUser Information Systems: Theory and Results. PhD thesis, Sloan School of Management, 1986. [48] I. Davies, P. Green, M. Rosemann, M. Indulska, and S. Gallo. How do Practitioners Use Conceptual Modeling in Practice? Data & Knowledge Engineering, 58(3):358–380, 2006. 218 Bibliography [49] R. Davies. Business Process Modelling With Aris: A Practical Guide. Springer, 2001. [50] F. D. Davis. Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly, 13(3):319–340, 1989. [51] M. K. de Weger. Structuring of Business Processes: An architectural approach to distributed systems development and its application to business processes. PhD thesis, University of Twente, 1998. [52] J. Desel. Model Validation—A Theoretical Issue? In Proc. ICATPN’02, pages 23–43, 2002. [53] J. Desel. From Human Knowledge to Process Models. In Proc. UNISCON’08, pages 84–95, 2008. [54] J. Desel, G. Juhás, R. Lorenz, and C. Neumair. Modelling and Validation with VipTool. In Proc. BPM’03, pages 380–389, 2003. [55] M. Droop, M. Flarer, J. Groppe, S. Groppe, V. Linnemann, J. Pinggera, F. Santner, M. Schier, F. Schöpf, H. Staffler, and S. Zugal. Translating XPath Queries into SPARQL Queries. In Proc. OTM Workshops’07, pages 9–10, 2007. [56] M. Droop, M. Flarer, J. Groppe, S. Groppe, V. Linnemann, J. Pinggera, F. Santner, M. Schier, F. Schöpf, H. Staffler, and S. Zugal. Embedding Xpath Queries into SPARQL Queries. In Proc. ICEIS’08, pages 5–14, 2008. [57] M. Droop, M. Flarer, J. Groppe, S. Groppe, V. Linnemann, J. Pinggera, F. Santner, M. Schier, F. Schöpf, H. Staffler, and S. Zugal. Bringing the XML and Semantic Web Worlds Closer: Transforming XML into RDF and Embedding XPath into SPARQL. In Proc. ICEIS’08, pages 31–45, 2009. [58] A. Duchowski. Eye Tracking Methodology. Springer, 2007. [59] M. Dumas, M. L. Rosa, J. Mendling, R. Mäesalu, H. Reijers, and N. Semenenko. 
Understanding Business Process Models: The Costs and Benefits of Structuredness. In Proc. CAiSE’12, pages 31–46, 2012. [60] M. Dumas, W. M. P. van der Aalst, and A. H. ter Hofstede. Process Aware Information Systems: Bridging People and Software Through Process Technology. Wiley-Interscience, 2005. 219 Bibliography [61] S. Easterbrook, J. Singer, M.-A. Storey, and D. Damian. Selecting Empirical Methods for Software Engineering Research. In Guide to Advanced Empirical Software Engineering, pages 285–311. Springer, 2008. [62] S. H. Edwards. Using Test-Driven Development in the Classroom: Providing Students with Automatic, Concrete Feedback on Performance. In Proc. EISTA’03, pages 421–426, 2003. [63] H. Erdogmus, M. Morisio, and M. Torchiano. On the Effectiveness of the TestFirst Approach to Programming. IEEE Transactions on Software Engineering, 31(1):226–237, 2005. [64] K. A. Ericsson and H. A. Simon. Protocol analysis: Verbal reports as data. MIT Press, 1993. [65] T. Erl. Service-oriented Architecture: Concepts, Technology, and Design. Prentice Hall, 2005. [66] D. Fahland. Oclets—Scenario-Based Modeling with Petri Nets. In Proc. PETRI NETS’09, pages 223–242, 2009. [67] D. Fahland. From Scenarios To Components. Universität zu Berlin, 2010. PhD thesis, Humboldt- [68] D. Fahland and A. Kantor. Synthesizing Decentralized Components from a Variant of Live Sequence Charts. In Proc. MODELSWARD’13, pages 25–38, 2013. [69] D. Fahland, J. Mendling, H. Reijers, B. Weber, M. Weidlich, and S. Zugal. Declarative versus Imperative Process Modeling Languages: The Issue of Understandability. In Proc. EMMSAD’09, pages 353–366, 2009. [70] D. Fahland, J. Mendling, H. Reijers, B. Weber, M. Weidlich, and S. Zugal. Declarative vs. Imperative Process Modeling Languages: The Issue of Maintainability. In Proc. ER-BPM’09, pages 65–76, 2009. [71] D. Fahland and M. Weidlich. Scenario-based process modeling with Greta. In Proc. BPMDemos’10, 2010, http://ceur-ws.org/Vol-615. [72] D. Fahland and H. Woith. Towards Process Models for Disaster Response. In Proc. PM4HDPS’08, pages 254–265, 2008. [73] P. Feldmann and D. Miller. Entity Model Clustering: Structuring A Data Model By Abstraction. The Computer Journal, 29(4):348–360, 1986. 220 Bibliography [74] K. Figl and B. Weber. Individual Creativity in Designing Business Processes. In Proc. HC-PAIS’12, pages 294–306, 2012. [75] S. Forster. Investigating the Collaborative Process of Process Modeling. In CAiSE’13 Doctoral Consortium, pages 33–41, 2013. [76] S. Forster, J. Pinggera, and B. Weber. Collaborative Business Process Modeling. In Proc. EMISA’12, pages 81–94, 2012. [77] S. Forster, J. Pinggera, and B. Weber. Toward an Understanding of the Collaborative Process of Process Modeling. In Proc. CAiSE Forum’13, pages 98–105, 2013. [78] C. Francalanci and B. Pernici. Abstraction Levels for Entity-Relationship Schemas. In Proc. ER’94, pages 456–473, 1994. [79] D. J. Garland and J. R. Barry. Cognitive Advantage in Sport: The Nature of Perceptual Structures. The American Journal of Psychology, 104(2):211–228, 1991. [80] W. Gassler, E. Zangerle, and G. Specht. The Snoopy Concept: Fighting heterogeneity in semistructured and collaborative information systems by using recommendations. In Proc. CTS’11, pages 61–68, 2011. [81] W. Gassler, E. Zangerle, M. Tschuggnall, and G. Specht. SnoopyDB: narrowing the gap between structured and unstructured information using recommendations. In Proc. HT’10, pages 271–272, 2010. [82] C. F. Gauss. Bestimmung der Genauigkeit der Beobachtungen. 
Zeitschrift für Astronomie und verwandte Wissenschaften, 1:185–197, 1816. [83] A. Gemino and Y. Wand. Complexity and clarity in conceptual modeling: Comparison of mandatory and optional properties. Data & Knowledge Engineering, 55(3):301–326, 2005. [84] B. George and L. Williams. A structured experiment of test-driven development. Information and Software Technology, 46(5):337–342, 2004. [85] A. L. Gilchrist and N. Cowan. Chunking. In The encyclopedia of human behavior, vol. 1, pages 476–483. Academic Press, 2012. [86] J. F. Gilgun. Qualitative Methods in Family Research, chapter Definitions, Methologies, and Methods in Qualitative Family Research, pages 22–39. Sage Publications, 1992. 221 Bibliography [87] D. J. Gilmore and T. R. Green. Comprehension and recall of miniature programs. International Journal of Man-Machine Studies, 21(1):31–48, 1984. [88] M. Glinz, C. Seybold, and S. Meier. Simulation-Driven Creation, Validation and Evolution of Behavioral Requirements Models. In Proc. MBEES’07, pages 103–112, 2007. [89] J. A. Goguen and F. J. Varela. Systems and Distinctions; Duality and Complementarity. International Journal of General Systems, 5(1):31–43, 1979. [90] D. Gopher and R. Brown. On the psychophysics of workload: Why bother with subjective measure? Human Factors: The Journal of the Human Factors and Ergonomics Society, 26(5):519–532, 1984. [91] P. Gray. Psychology. Worth Publishers, 2007. [92] T. R. Green. Cognitive dimensions of notations. In Proc. BCSHCI’89, pages 443–460, 1989. [93] T. R. Green and M. Petre. Usability Analysis of Visual Programming Environments: A ’Cognitive Dimensions’ Framework. Journal of Visual Languages and Computing, 7(2):131–174, 1996. [94] T. Gschwind, J. Pinggera, S. Zugal, H. Reijers, and B. Weber. A Linear Time Layout Algorithm for Business Process Models. Technical Report RZ3830, IBM Research, 2012. [95] C. Haisjackl. Test Driven Modeling meets Declarative Process Modeling—A Case Study. Master’s thesis, University of Innsbruck, August 2012. [96] C. Haisjackl and B. Weber. User Assistance during Process Execution—An Experimental Evaluation of Recommendation Strategies. In Proc. BPI’10, pages 134–145, 2011. [97] C. Haisjackl, S. Zugal, P. Soffer, I. Hadar, M. Reichert, J. Pinggera, and B. Weber. Making Sense of Declarative Process Models: Common Strategies and Typical Pitfalls. In Proc. BPMDS’13, pages 2–17, 2013. [98] D. Z. Hambrick and R. W. Engle. Effects of domain knowledge, working memory capacity, and age on cognitive performance: An investigation of the knowledge-is-power hypothesis. Cognitive Psychology, 44(4):339–387, 2002. [99] D. Harel and R. Marelly. Come, Let’s Play: Scenario-Based Programming Using LSCs and the Play-Engine. Springer-Verlag, 2003. 222 Bibliography [100] A. Hevner, S. March, J. Park, and S. Ram. Design Science in Information Systems Research. MIS Quarterly, 28(1):75–105, 2004. [101] T. Hildebrandt and R. Mukkamala. Declarative Event-Based Workflow as Distributed Dynamic Condition Response Graphs. In Proc. PLACES’10, pages 59–73, 2010. [102] T. Hildebrandt, R. Mukkamala, and T. Slaats. Designing a Crossorganizational Case Management System using Dynamic Condition Response Graphs. In Proc. EDOC’11, pages 161–170, 2011. [103] T. Hildebrandt, R. Mukkamala, and T. Slaats. Nested Dynamic Condition Response Graphs. In Proc. FSEN’11, pages 343–350, 2012. [104] B. Holzner, J. Giesinger, J. Pinggera, S. Zugal, F. Schöpf, A. Oberguggenberger, E. Gamper, A. Zabernigg, B. Weber, and G. Rumpold. 
The Computerbased Health Evaluation Software (CHES): a software for electronic patientreported outcome monitoring. BMC Medical Informatics and Decision Making, 12(1), 2012. [105] S. Hoppenbrouwers, L. Lindeman, and E. Proper. Capturing Modeling Processes - Towards the MoDial Modeling Laboratory. In Proc. OTM’06, pages 1242–1252, 2006. [106] S. Hoppenbrouwers, E. Proper, and T. van der Weide. Formal Modelling as a Grounded Conversation. In Proc. LAP’05, pages 139–155, 2005. [107] S. Hoppenbrouwers, E. Proper, and T. Weide. A Fundamental View on the Process of Conceptual Modeling. In Proc. ER’05, pages 128–143, 2005. [108] M. Höst, B. Regnell, and C. Wohlin. Using Students as Subjects—A Comparative Study of Students and Professionals in Lead-Time Impact Assessment. Empirical Software Engingeering, 5(3):201–214, 2000. [109] A. S. Huff. Mapping Strategic Thought. Wiley, 1990. [110] M. Indulska, P. Green, J. Recker, and M. Rosemann. Business Process Modeling: Perceived Benefits. In Proc. ER’09, pages 458–471, 2009. [111] R. Jeffries, A. Turner, P. Polson, and M. Atwood. The Process Involved in Designing Software. In Cognitive Skills and Their Acquisition, pages 255–283. Erlbaum, 1981. 223 Bibliography [112] T. D. Jick. Mixing Qualitative and Quantitative Methods: Triangulation in Action. Administrative Science Quarterly, 24(4):602–611, 1979. [113] F. Johannsen and S. Leist. Wand and Weber’s Decomposition Model in the Context of Business Process Modeling. Business & Information Systems Engineering, 4(5):271–286, 2012. [114] M. A. Just and P. A. Carpenter. A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1):122–149, 1992. [115] S. Kalyuga, P. Ayres, P. Chandler, and J. Sweller. The Expertise Reversal Effect. Educational Psychologist, 38(1):23–31, 2003. [116] S. Kalyuga, P. Chandler, and J. Sweller. Managing Split-attention and Redundancy in Multimedia Instruction. Applied Cognitive Psychology, 13(4):351– 371, 1999. [117] E. Kant and A. Newell. Problem Solving Techniques for the design of algorithms. Information Processing & Management, 20(1–2):97–118, 1984. [118] A. Kaser, B. Weber, J. Pinggera, and S. Zugal. Handlungsorientierung bei der Planung von Softwareprojekten. In Proc. TEAP’10, pages 253–253, 2010. [119] A. E. Kazdin. Artifact, bias, and complexity of assessment: the ABCs of reliability. Journal of Applied Behavior Analysis, 10(1):141–150, 1977. [120] V. Khatri, I. Vessey, P. C. V. Ramesh, and S.-J. Park. Understanding Conceptual Schemas: Exploring the Role of Application and IS Domain Knowledge. Information Systems Research, 17(1):81–99, 2006. [121] B. Kitchenham. Procedures for performing systematic reviews. Technical report, Keele University Joint Technical Report TR/SE-0401, 2004. [122] N. F. Kock. Product Flow, Breadth and Complexity of Business Processes: An Empirical Study of 15 Business Processes in Three Organizations. Business Process Re-engineering & Management Journal, 2(2):8–22, 1996. [123] J. Kolb, K. Kammerer, and M. Reichert. Updatable Process Views for Usercentered Adaption of Large Process Models. In Proc. ICSOC’12, pages 484– 498, 2012. 224 Bibliography [124] J. Kolb, M. Reichert, and B. Weber. Using Concurrent Task Trees for Stakeholder-centered Modeling and Visualization of Business Processes. In Proc. S-BPM ONE’12, pages 137–151, 2012. [125] P. C. Kyllonen and D. L. Stephens. Cognitive Abilities as Determinants of Success in Acquiring Logic Skill. Learning and Individual Differences, 2(2):129– 160, 1990. [126] E. 
Lamma, P. Mello, M. Montali, F. Riguzzi, and S. Storari. Inducing Declarative Logic-Based Models from Labeled Traces. In Proc. BPM’07, pages 344– 359, 2007. [127] A. Lanz, B. Weber, and M. Reichert. Time Patterns for Process-aware Information Systems: A Pattern-based Analysis - Revised version. Technical report, University of Ulm, 2009. [128] J. H. Larkin and H. A. Simon. Why a Diagram is (Sometimes) Worth Ten Thousand Words. Cognitive Science, 11(1):65–100, 1987. [129] R. Lenz and M. Reichert. IT support for healthcare processes—premises, challenges, perspectives. Data & Knowledge Engineering, 61(1):39–58, 2007. [130] C. Leys, C. Ley, O. Klein, P. Bernard, and L. Licata. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4):764–766, 2013. [131] H. Liang, J. Dingel, and Z. Diskin. A Comparative Survey of Scenario-based to State-based Model Synthesis Approaches. In Proc. SCESM’06, pages 5–12, 2006. [132] L. T. Ly, S. Rinderle, and P. Dadam. Integration and verification of semantic constraints in adaptive process management systems. Data & Knowledge Engineering, 64(1):3–23, 2008. [133] A. Marchenko, A. Pekka, and T. Ihme. Long-Term Effects of Test-Driven Development A Case Study. In XP, pages 13–22, 2009. [134] K. Masri. Conceptual Model Design for Better Understanding. PhD thesis, Simon Fraser University, 2009. [135] R. Mayer and P. Chandler. When learning is just a click away: Does simple user interaction foster deeper understanding of multimedia messages. Journal of Educational Psychology, 93(2):390–397, 2001. 225 Bibliography [136] S. McDonald and R. J. Stevenson. Disorientation in hypertext: the effects of three text structures on navigation performance. Applied Ergonomics, 27(1):61–68, 1996. [137] J. Melcher, J. Mendling, H. Reijers, and D. Seese. On Measuring the Understandability of Process Models. In Proc. BPM Workshops’09, pages 465–476, 2009. [138] J. Melcher and D. Seese. Towards Validating Prediction Systems for Process Understandability: Measuring Process Understandability. In Proc. SYNASC’08, pages 564–571, 2008. [139] J. Mendling. Detection and Prediction of Errors in EPC Business Process Models. PhD thesis, Vienna University of Economics and Business Administration, 2007. [140] J. Mendling. Metrics for Process Models: Empirical Foundations of Verification, Error Prediction and Guidelines for Correctness. Springer, 2008. [141] J. Mendling. Empirical Studies in Process Model Verification. In Transactions on Petri Nets and Other Models of Concurrency II, volume 5460 of Lecture Notes in Computer Science, pages 208–224. Springer Berlin Heidelberg, 2009. [142] J. Mendling, G. Neumann, and W. M. P. van der Aalst. Understanding the Occurrence of Errors in Process Models based on Metrics. In Proc. CoopIS’07, pages 113–130, 2007. [143] J. Mendling, H. Reijers, and J. Cardoso. What Makes Process Models Understandable? In Proc. BPM’07, pages 48–63, 2007. [144] J. Mendling, H. Reijers, and J. Recker. Activity Labeling in Process Modeling: Empirical Insights and Recommendations. Information Systems, 35(4):467– 482, 2010. [145] J. Mendling, H. Reijers, and W. M. P. van der Aalst. Seven process modeling guidelines (7PMG). Information & Software Technology, 52(2):127–136, 2010. [146] M. B. Miles. Qualitative Data as an Attractive Nuisance: The Problem of Analysis. Administrative Science Quarterly, 24(4):590–601, 1979. [147] G. Miller. 
The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. The Psychological Review, 63(2):81– 97, 1956. 226 Bibliography [148] M. Montali, M. Pesic, W. M. P. van der Aalst, F. Chesani, P. Mello, and S. Storari. Declarative Specification and Verification of Service Choreographies. ACM Transactions on the Web, 4(1):1–62, 2010. [149] D. Moody. A Multi-Level Architecture for Representing Enterprise Data Models. In Proc. ER’97, pages 184–197, 1997. [150] D. L. Moody. Cognitive Load Effects on End User Understanding of Conceptual Models: An Experimental Analysis. In Proc. ADBIS’04, pages 129–143, 2004. [151] D. L. Moody. The ”Physics” of Notations: Toward a Scientific Basis for Constructing Visual Notations in Software Engineering. IEEE Transactions on Software Engineering, 35(6):756–779, 2009. [152] D. L. Moody and A. Flitman. A Methodology for Clustering Entity Relationship Models—A Human Information Processing Approach. In Proc. ER’99, pages 114–130, 1999. [153] R. Moreno and R. E. Mayer. Cognitive Principles of Multimedia Learning: The Role of Modality and Contiguity. Journal of Educational Psychology, 91(2):358–368, 1999. [154] W. T. Morris. On the Art of Modeling. Management Science, 13(12):B–707– B–717, 1967. [155] R. Mugridge and W. Cunningham. Fit for Developing Software: Framework for Integrated Tests. Prentice Hall, 2005. [156] R. Mukkamala. A Formal Model For Declarative Workflows-Dynamic Condition Response Graphs. PhD thesis, IT University of Copenhagen, 2012. [157] R. Mukkamala, T. Hildebrandt, and T. Slaats. Towards Trustworthy Adaptive Case Management with Dynamic Condition Response Graphs. In Proc. EDOC’13, accepted. [158] D. Müller, M. Reichert, and J. Herbst. A New Paradigm for the Enactment and Dynamic Adaptation of Data-driven Process Structures. In Proc. CAiSE’08, pages 48–63, 2008. [159] N. Mulyar, M. Pesic, W. M. P. van der Aalst, and M. Peleg. Declarative and Procedural Approaches for Modelling Clinical Guidelines: Addressing Flexibility Issues. In Proc. ProHealth’07, pages 335–346, 2007. 227 Bibliography [160] J. Mylopoulos. Information modeling in the time of the revolution. Information Systems, 23(3/4):127–155, 1998. [161] J. Nakamura and M. Csikszentmihalyi. The Concept of Flow. In Handbook of Positive Psychology, pages 89–105. Oxford University Press, 2002. [162] A. Newell. Unified Theories of Cognition. Harvard University Press, 1990. [163] D. A. Norman. Cognitive artifacts. Department of Cognitive Science, University of California, San Diego, 1990. [164] J. F. Nunamaker, M. Chen, and T. D. Purdin. Systems Development in Information Systems Research. Journal of Management Information Systems, 7(3):89–106, 1991. [165] OMG. UML Testing Profile, Version 1.0. http://www.omg.org/cgibin/doc?formal/05-07-07, 2005. Accessed: April 2013. [166] OMG. UML Version 2.3. http://www.omg.org/spec/UML/2.3/Superstructure /PDF, 2010. Accessed: April 2013. [167] OMG. BPMN Version 2.0. http://www.omg.org/spec/BPMN/2.0/PDF, 2011. Accessed: April 2013. [168] F. Paas, A. Renkl, and J. Sweller. Cognitive Load Theory and Instructional Design: Recent Developments. Educational Psychologist, 38(1):1–4, 2003. [169] M. Pančur, M. Ciglarič, M. Trampuš, and T. Vidmar. Towards Empirical Evaluation of Test-Driven Development in a University Environment. In Proc. EUROCON’03, pages 83–86, 2003. [170] D. L. Parnas. On the Criteria to be Used in Decomposing Systems into Modules. Communications of the ACM, 15(12):1053–1058, 1972. [171] J. Parsons and L. Cole. 
What do the pictures mean? Guidelines for experimental evaluation of representation fidelity in diagrammatical conceptual modeling techniques. Data & Knowledge Engineering, 55(3):327–342, 2005. [172] D. Paulson and Y. Wand. An Automated Approach to Information Systems Decomposition. IEEE Transactions on Software Engineering, 18(3):174–189, 1992. 228 Bibliography [173] K. Peffers, T. Tuunanen, M. Rothenberger, and S. Chatterjee. A Design Science Research Methodology for Information Systems Research. Journal of Management Information Systems, 24(3):45–77, 2007. [174] R. Pérez-Castillo, B. Weber, J. Pinggera, S. Zugal, I. G.-R. de Guzmán, and M. Piattini. Generating event logs from nonprocess-aware systems enabling business process mining. Enterprise Information Systems, 5(3):301–335, 2011. [175] M. Pesic. Constraint-Based Workflow Management Systems: Shifting Control to Users. PhD thesis, TU Eindhoven, 2008. [176] M. Pesic, H. Schonenberg, N. Sidorova, and W. M. P. van der Aalst. Constraint-Based Workflow Models: Change Made Easy. In Proc. CoopIS’07, pages 77–94, 2007. [177] M. Pesic, H. Schonenberg, and W. M. P. van der Aalst. DECLARE: Full Support for Loosely-Structured Processes. In Proc. EDOC’07, pages 287–298, 2007. [178] M. Pesic and W. M. P. van der Aalst. A Declarative Approach for Flexible Business Processes Management. In Proc. BPM Workshops’06, pages 169–180, 2006. [179] M. A. Pett. Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions. Sage Publications, 1997. [180] P. Pichler, B. Weber, S. Zugal, J. Pinggera, J. Mendling, and H. Reijers. Imperative versus Declarative Process Modeling Languages: An Empirical Investigation. In Proc. ER-BPM’11, pages 383–394, 2012. [181] M. Pidd. Tools for Thinking: Modelling in Management Science. Wiley, 2003. [182] J. Pinggera. Handling Uncertainty in Software Projects—A Controlled Experiment. Master’s thesis, University of Innsbruck, Institute of Computer Science, 2009. [183] J. Pinggera, M. Furtner, M. Martini, P. Sachse, K. Reiter, S. Zugal, and B. Weber. Investigating the Process of Process Modeling with Eye Movement Analysis. In Proc. ER-BPM’12, pages 438–450, 2013. [184] J. Pinggera, T. Porcham, S. Zugal, and B. Weber. LiProMo—Literate Process Modeling. In Proc. CAiSE Forum’12, pages 163–170, 2012. 229 Bibliography [185] J. Pinggera, P. Soffer, D. Fahland, M. Weidlich, S. Zugal, B. Weber, H. Reijers, and J. Mendling. Styles in business process modeling: an exploration and a model. Software & Systems Modeling, 2013, DOI: 10.1007/s10270-013-0349-1. [186] J. Pinggera, P. Soffer, S. Zugal, B. Weber, M. Weidlich, D. Fahland, H. Reijers, and J. Mendling. Modeling Styles in Business Process Modeling. In Proc. BPMDS’12, pages 151–166, 2012. [187] J. Pinggera, S. Zugal, and B. Weber. Alaska Simulator—Supporting Empirical Evaluation of Process Flexibility. In Proc. WETICE’09, pages 231–233, 2009. [188] J. Pinggera, S. Zugal, and B. Weber. Investigating the Process of Process Modeling with Cheetah Experimental Platform. In Proc. ER-POIS’10, pages 13–18, 2010. [189] J. Pinggera, S. Zugal, and B. Weber. Investigating the Process of Process Modeling with Cheetah Experimental Platform. EMISA Forum, 30(2):25–31, 2010. [190] J. Pinggera, S. Zugal, B. Weber, D. Fahland, M. Weidlich, J. Mendling, and H. Reijers. How the Structuring of Domain Knowledge Can Help Casual Process Modelers. In Proc. ER’10, pages 231–237, 2010. [191] J. Pinggera, S. Zugal, B. Weber, W. Wild, and M. Reichert. 
Integrating CaseBased Reasoning with Adaptive Process Management. Technical Report TRCTIT-08-11, Centre for Telematics and Information Technology, University of Twente, 2008. [192] J. Pinggera, S. Zugal, M. Weidlich, D. Fahland, B. Weber, J. Mendling, and H. Reijers. Tracing the Process of Process Modeling with Modeling Phase Diagrams. In Proc. ER-BPM’11, pages 370–382, 2012. [193] A. Polyvyanyy, S. Smirnov, and M. Weske. Process Model Abstraction: A Slider Approach. In Proc. EDOC’08, pages 325–331, 2008. [194] M. Poppendieck and T. Poppendieck. Implementing Lean Software Development: From Concept to Cash. Addison-Wesley Professional, 2006. [195] A. Porter and L. Votta. Comparing Detection Methods For Software Requirements Inspections: A Replication Using Professional Subjects. Empirical Software Engineering, 3(4):355–379, 1998. 230 Bibliography [196] S. Quaglini, M. Stefanelli, G. Lanzola, V. Caporusso, and S. Panzarasa. Flexible guideline-based patient careflow systems. Artificial Intelligence in Medicine, 22(1):65–80, 2001. [197] D. Quartel. Action relations. Basic design concepts for behaviour modelling and refinement. PhD thesis, University of Twente, 1998. [198] D. Quartel, L. Pires, M. van Sinderen, H. Franken, and C. Vissers. On the Role of Basic Design Concepts in Behaviour Structuring. Computer networks and ISDN systems, 29(4):413–436, 1997. [199] W. R. The Problem of the Problem. MIS Quarterly, 27(1):iii–ix, 2003. [200] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2013. [201] O. Rauh and E. Stickel. Entity tree clustering—A method for simplifying ER designs. In Proc. ER’92, pages 62–78, 1992. [202] J. C. Recker, M. Rosemann, P. Green, and M. Indulska. Do Ontological Deficiencies in Modeling Grammars Matter? MIS Quarterly, 35(1):57–79, 2011. [203] M. Reichert and P. Dadam. ADEPTflex: Supporting Dynamic Changes of Workflow without Losing Control. Journal of Intelligent Information Systems, 10(2):93–129, 1998. [204] M. Reichert, J. Kolb, R. Bobrik, and T. Bauer. Enabling Personalized Visualization of Large Business Processes through Parameterizable Views. In Proc. SAC’12, pages 1653–1660, 2012. [205] M. Reichert and B. Weber. Enabling Flexibility in Process-Aware Information Systems: Challenges, Methods, Technologies. Springer, 2012. [206] H. Reijers and J. Mendling. Modularity in Process Models: Review and Effects. In Proc. BPM’08, pages 20–35, 2008. [207] H. Reijers and J. Mendling. A Study into the Factors that Influence the Understandability of Business Process Models. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 41(3):449–462, 2011. [208] H. Reijers, J. Mendling, and R. Dijkman. Human and automatic modularizations of process models to enhance their comprehension. Information Systems, 36(5):881–897, 2011. 231 Bibliography [209] H. Reijers, T. Slaats, and C. Stahl. Declarative Modeling—An Academic Dream or the Future for BPM? In Proc. BPM’13, pages 307–322, 2013. [210] P. Rittgen. Negotiating Models. In Proc. CAiSE’07, pages 561–573, 2007. [211] P. Rittgen. Collaborative Modeling—A Design Science Approach. In Proc. HICSS’09, pages 1–10, 2009. [212] S. Rockwell and A. Bajaj. COGEVAL: Applying Cognitive Theories to Evaluate Conceptual Models. Advanced Topics in Database Research, 4:255–282, 2005. [213] Y. Rogers and H. Brignull. Computational Offloading: Supporting Distributed Team Working Through Visually Augmenting Verbal Communication. In Proc. 
CogSci’03, pages 1011–1016, 2003. [214] J. Rosenberg. Statistical Methods and Measurement. In Guide to Advanced Empirical Software Engineering, pages 155–184. Springer, 2008. [215] P. Runeson. Using Students as Experiment Subjects—An Analysis on Graduate and Freshmen Student Data. In Proc. EASE’03, pages 95–102, 2003. [216] S. W. Sadiq, M. E. Orlowska, and W. Sadiq. Specification and validation of process constraints for flexible workflows. Information Systems, 30(5):349–378, 2005. [217] M. Scaife and Y. Rogers. External cognition: how do graphical representations work? International Journal of Human-Computer Studies, 45(2):185–213, 1996. [218] A. W. Scheer. ARIS: Business Process Modeling, 3rd ed. Springer, 2000. [219] M. Schier. Adoption of Decision Deferring Techniques in Plan-driven Software Projects. Master’s thesis, Master Thesis, Department of Computer Science, University of Innsbruck, 2008. [220] H. Schonenberg, B. Weber, B. van Dongen, and W. M. P. van der Aalst. Supporting Flexible Processes through Recommendations Based on History. Proc. BPM’08, pages 51–66, 2008. [221] M. Schrepfer, J. Wolf, J. Mendling, and H. Reijers. The Impact of Secondary Notation on Process Model Understanding. In Proc. PoEM’09, pages 161–175, 2009. 232 Bibliography [222] C. B. Seaman. Qualitative Methods. In Guide to Advanced Empirical Software Engineering, pages 35–62. Springer, 2008. [223] I. Seeber, B. Weber, and R. Maier. CoPrA: A Process Analysis Technique to Investigate Collaboration in Groups. In Proc. HICCS’12, pages 363–372, 2012. [224] A. Sharp and P. McDermott. Workflow Modeling: Tools for Process Improvement and Application Development. Artech House, 2011. [225] P. Shoval, R. Danoch, and M. Balabam. Hierarchical entity-relationship diagrams: the model, method of creation and experimental evaluation. Requirements Engineering, 9(4):217–228, 2004. [226] K. Siau and M. Rossi. Evaluation techniques for systems analysis and design modelling methods—a review and comparative analysis. Information Systems Journal, 21(3):249–268, 2008. [227] J. P. Simmons, L. D. Nelson, and U. Simonsohn. False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11):1359–1366, 2011. [228] J. Singer, S. E. Sim, and T. C. Lethbridge. Software Engineering Data Collection for Field Studies. In Guide to Advanced Empirical Software Engineering, pages 9–34. Springer, 2008. [229] M. P. Singh. Semantical Considerations on Workflows: An Algebra for Intertask Dependencies. In Proc. DBPL’96, pages 1–15, 1996. [230] M. Siniaalto. Test driven development: empirical body of evidence. Technical report, Information Technology for European Advancement, 2006. [231] S. Smirnov, R. Dijkman, J. Mendling, and M. Weske. Meronymy-Based Aggregation of Activities in Business Process Models. In Proc. ER’10, pages 1–14, 2010. [232] S. Smirnov, M. Weidlich, and J. Mendling. Business Process Model Abstraction Based on Behavioral Profiles. In Proc. ICSOC’10, pages 1–16, 2010. [233] P. Soffer, M. Kaner, and Y. Wand. Towards understanding the process of process modeling: Theoretical and empirical considerations. In Proc. ERBPM’11, pages 357–369, 2012. 233 Bibliography [234] A. Strauss and J. Corbin. Grounded theory methodology: An overview, pages 273–285. Sage, 1994. [235] M. Svahnberg, A. Aurum, and C. Wohlin. Using Students as Subjects—an Empirical Evaluation. In Proc. ESEM’08, pages 288–290, 2008. [236] J. Sweller. 
Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2):257–285, 1988. [237] J. Sweller. Instructional Design in Technical Areas. ACER Press, Camberwell, 1999. [238] J. Sweller and P. Chandler. Why Some Material Is Difficult to Learn. Cognition and Instruction, 12(3):185–233, 1994. [239] S. Taylor and R. Bogdan. Introduction to Qualitative Research Methods. Wiley, 1984. [240] L. Thom, M. Reichert, C. Iochpe, and J. P. M. de Oliveira. Why Rigid Process Management Technology Hampers Computerized Support of Healthcare Processes? In Proc. WIM’10, pages 1522–1531, 2010. [241] V. Torres, S. Zugal, B. Weber, M. Reichert, and C. Ayora. Understandability Issues of Approaches Supporting Business Process Variability. Technical Report ProS-TR-2012-03, Universidad Politecnica de Valencia, 2012. [242] V. Torres, S. Zugal, B. Weber, M. Reichert, C. Ayora, and V. Pelechano. A Qualitative Comparison of Approaches Supporting Business Process Variability. In Proc. rBPM’12, pages 560–572, 2012. [243] A. Tort and A. Olivé. First Steps Towards Conceptual Schema Testing. In Proc. CAiSE Forum’09, pages 1–6, 2009. [244] A. Tort and A. Olivé. An approach to testing conceptual schemas. Data & Knowledge Engineering, 69(6):598–618, 2010. [245] A. Tort, A. Olivé, and M.-R. Sancho. An approach to test-driven development of conceptual schemas. Data & Knowledge Engineering, 70(12):1088–1111, 2011. [246] A. Tort, A. Olivé, and M.-R. Sancho. The CSTL Processor: A Tool for Automated Conceptual Schema Testing. In Proc. ER Workshops’11, pages 349–352, 2011. 234 Bibliography [247] A. Tort, A. Olivé, and M.-R. Sancho. On Checking Executable Conceptual Schema Validity by Testing. In Proc. DEXA’12, pages 249–264, 2012. [248] W. J. Tracz. Computer programming and the human thought process. Software: Practice and Experience, 9(2):127–137, 1979. [249] N. Unsworth and R. W. Engle. The Nature of Individual Differences in Working Memory Capacity: Active Maintenance in Primary Memory and Controlled Search From Secondary Memory. Psychological Review, 114(1):104– 132, 2007. [250] P. van Bommel, S. Hoppenbrouwers, E. Proper, and T. van der Weide. Exploring Modelling Strategies in a Meta-modelling Context. In Proc. OTM’06, pages 1128–1137, 2006. [251] W. M. P. van der Aalst, H. T. de Beer, and B. van Dongen. Process Mining and Verification of Properties: An Approach Based on Temporal Logic. In Proc. OTM’05, pages 130–147, 2005. [252] W. M. P. Van der Aalst and J. Dehnert. Bridging the Gap between Business Models and Workflow Specifications. International Journal of Cooperative Information Systems, 13(3):289–332, 2004. [253] W. M. P. van der Aalst and M. Pesic. DecSerFlow: Towards a Truly Declarative Service Flow Languages. In The Role of Business Processes in Service Oriented Architectures, number 06291 in Dagstuhl Seminar Proceedings, pages 1–23, 2006. [254] W. M. P. van der Aalst and M. Pesic. Specifying and Monitoring Service Flows: Making Web Services Process-Aware. In Test and Analysis of Web Services, pages 11–55. Springer, 2007. [255] W. M. P. van der Aalst, A. H. ter Hofstede, B. Kiepuszewski, and A. P. Barros. Workflow Patterns. Distributed and Parallel Database, 14(3):5–51, 2003. [256] W. M. P. van der Aalst and A. H. M. ter Hofstede. YAWL: Yet Another Workflow Language. Information Systems, 30(4):245–275, June 2005. [257] W. M. P. van der Aalst and M. Weske. Case handling: a new paradigm for business process support. Data & Knowledge Engineering, 53(2):129–162, 2005. 235 Bibliography [258] B. F. 
van Dongen, A. K. A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, and W. M. P. van der Aalst. The ProM framework: A new era in process mining tool support. In Proc. ICATPN’05, pages 444–454, 2005. [259] G. V. van Zanten, S. Hoppenbrouwers, and H. Proper. System Development as a Rational Communicative Process. Journal of Systemics, Cybernetics and Informatics, 2(4):1–5, 2004. [260] I. Vanderfeesten, H. Reijers, J. Mendling, W. M. P. van der Aalst, and J. Cardoso. On a Quest for Good Process Models: The Cross-Connectivity Metric. In Proc. CAiSE’08, pages 480–494, 2008. [261] E. Verbeek, M. Hattem, H. Reijers, and W. Munk. Protos 7.0: Simulation Made Accessible. In Proc. APN’05, pages 465–474, 2005. [262] J. Vlissides, R. Helm, R. Johnson, and E. Gamma. Design Patterns. Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994. [263] W. P. Vogt. Dictionary of Statistics & Methodology: A Nontechnical Guide for the Social Sciences. SAGE Publications, 2011. [264] J. Wainer, F. Bezerra, and P. Barthelmess. Tucupi: a flexible workflow system based on overridable constraints. In Proc. SAC’04, pages 498–502, 2004. [265] Y. Wand and R. Weber. An Ontological Model of an Information System. IEEE Transactions on Software Engineering, 16(11):1282–1292, 1990. [266] Y. Wand and R. Weber. Research Commentary: Information Systems and Conceptual Modeling—A Research Agenda. Information Systems Research, 13(4):363–376, 2002. [267] E. J. Webb, D. T. Campbell, R. D. Schwartz, L. Sechrest, and J. B. Grove. Nonreactive Measures in the Social Sciences. Houghton, 1981. [268] B. Weber, J. Pinggera, V. Torres, and M. Reichert. Change Patterns for Model Creation: Investigating the Role of Nesting Depth. In Proc. Cognise’13, pages 198–204, 2013. [269] B. Weber, J. Pinggera, V. Torres, and M. Reichert. Change Patterns in Use: A Critical Evaluation. In Proc. BPMDS’13, pages 261–276, 2013. [270] B. Weber, J. Pinggera, S. Zugal, and W. Wild. Alaska Simulator—A Journey to Planning. In Proc. XP’09, pages 253–254, 2009. 236 Bibliography [271] B. Weber, J. Pinggera, S. Zugal, and W. Wild. Alaska Simulator Toolset for Conducting Controlled Experiments. In Proc. CAiSE Forum’10, pages 205–221, 2010. [272] B. Weber, J. Pinggera, S. Zugal, and W. Wild. Handling Events During Business Process Execution: An Empirical Test. In Proc. ER-POIS’10, pages 19–30, 2010. [273] B. Weber, M. Reichert, J. Mendling, and H. Reijers. Refactoring large process model repositories. Computers in Industry, 62(5):467–486, 2011. [274] B. Weber, M. Reichert, and S. Rinderle. Change Patterns and Change Support Features—Enhancing Flexibility in Process-Aware Information Systems. Data & Knowledge Engineering, 66(3):438–466, 2008. [275] B. Weber, M. Reichert, S. Rinderle-Ma, and W. Wild. Providing Integrated Life Cycle Support in Process-Aware Information Systems. International Journal of Cooperative Information Systems, 18(1):115–165, 2009. [276] B. Weber, H. Reijers, S. Zugal, and W. Wild. The Declarative Approach to Business Process Execution: An Empirical Test. In Proc. CAiSE’09, pages 270–285, 2009. [277] B. Weber, S. Zugal, J. Pinggera, and W. Wild. Experiencing Process Flexibility Patterns with Alaska Simulator. In Proc. BPMDemos’09, pages 13–16, 2009. [278] M. Weidlich, S. Zugal, J. Pinggera, B. Weber, H. Reijers, and J. Mendling. The Impact of Sequential and Circumstantial Changes on Process Models. In Proc. ER-POIS’10, pages 43–54, 2010. [279] M. Weiser. Programmers Use Slices When debugging. 
Communications of the ACM, 25(7):446–452, 1982. [280] M. Weske. Workflow Management Systems: Formal Foundation, Conceptual Design, Implementation Aspects. PhD thesis, University of Münster, 2000. [281] M. Weske. Business Process Management: Concepts, Methods, Technology. Springer, 2007. [282] R. Wieringa. Design Science Methodology: Principles and Practice. In Proc. ICSE’10, pages 493–494, 2010. 237 Bibliography [283] R. Wieringa. Relevance and Problem Choice in Design Science. In Proc. DESRIST’10, pages 61–76, 2010. [284] R. Wieringa. Towards a Unified Checklist for Empirical Research in Software Engineering: First proposal. In Proc. EASE’12, pages 161–165, 2012. [285] R. Wieringa, N. Condori-Fernandez, M. Daneva, B. Mutschler, and O. Pastor. Lessons Learned from Evaluating a Checklist for Reporting Experimental and Observational Research. In Proc. ESEM’12, pages 157–160, 2012. [286] C. Wohlin, R. Runeson, M. Halst, M. Ohlsson, B. Regnell, and A. Wesslen. Experimentation in Software Engineering: an Introduction. Kluwer, 2000. [287] J. Wolfe. Guided Search 2.0 A revised model of visual search. Psychonomic Bulletin & Review, 21(2):202–238, 1994. [288] R. K. Yin. Case Study Research: Design and Methods. Sage, Thousand Oaks, CA, 2002. [289] E. Zangerle, W. Gassler, and G. Specht. Recommending#-Tags in Twitter. In Proc. SASWeb’11, pages 67–78, 2011. [290] E. Zangerle, W. Gassler, and G. Specht. Using Tag Recommendations to Homogenize Folksonomies in Microblogging Environments. In Proc. SocInfo’11, pages 113–126, 2011. [291] J. Zhang. The Nature of External Representations in Problem Solving. Cognitive Science, 21(2):179–217, 1997. [292] J. Zhang and D. A. Norman. Representations in Distributed Cognitive Tasks. Cognitive Science, 18(1):87–122, 1995. [293] X. Zhao, C. Liu, Y. Yang, and W. Sadiq. Aligning Collaborative Business Processes—An Organization-Oriented Perspective. Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 39(6):1152–1164, 2009. [294] D. W. Zimmerman. A Note on Interpretation of the Paired-Samples t Test. Journal of Educational and Behavioral Statistics, 22(3):349–360, 1997. [295] S. Zugal. Agile versus Plan-Driven Approaches to Planning—A Controlled Experiment. Master’s thesis, University of Innsbruck, October 2008. 238 Bibliography [296] S. Zugal, C. Haisjackl, J. Pinggera, and B. Weber. Empirical Evaluation of Test Driven Modeling. International Journal of Information System Modeling and Design, 4(2):23–43, 2013. [297] S. Zugal, J. Pinggera, J. Mendling, H. Reijers, and B. Weber. Assessing the Impact of Hierarchy on Model Understandability—A Cognitive Perspective. In Proc. EESSMod’11, pages 123–133, 2011. [298] S. Zugal, J. Pinggera, H. Reijers, M. Reichert, and B. Weber. Making the Case for Measuring Mental Effort. In Proc. EESSMod’12, pages 37–42, 2012. [299] S. Zugal, J. Pinggera, and B. Weber. Assessing Process Models with Cognitive Psychology. In Proc. EMISA’11, pages 177–182, 2011. [300] S. Zugal, J. Pinggera, and B. Weber. Creating Declarative Process Models Using Test Driven Modeling Suite. In Proc. CAiSE Forum’11, pages 16–32, 2011. [301] S. Zugal, J. Pinggera, and B. Weber. The Impact of Testcases on the Maintainability of Declarative Process Models. In Proc. BPMDS’11, pages 163–177, 2011. [302] S. Zugal, J. Pinggera, and B. Weber. Toward Enhanced Life-Cycle Support for Declarative Processes. Journal of Software: Evolution and Process, 24(3):285– 302, 2012. [303] S. Zugal, P. Soffer, C. Haisjackl, J. Pinggera, M. Reichert, and B. Weber. 
Investigating Expressiveness and Understandability of Hierarchy in Declarative Business Process Models. Software & Systems Modeling, 2013, DOI: 10.1007/s10270-013-0356-2. [304] S. Zugal, P. Soffer, J. Pinggera, and B. Weber. Expressiveness and Understandability Considerations of Hierarchy in Declarative Business Process Models. In Proc. BPMDS’12, pages 167–181, 2012. 239