A Protocol of a Systematic Mapping Study for Domain-Specific Languages

Tomaž Kosar¹, Sudev Bohra², Marjan Mernik¹

¹ University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ulica 17, 2000 Maribor, Slovenia
² Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA

1. Introduction

This document describes the protocol for a Systematic Mapping Study (SMS) on Domain-Specific Languages (DSLs) and is meant as an electronic supplement to the paper [17]. Because DSL research is spreading into many software development methodologies (e.g., Generative Programming, Product Lines, Software Factories, Language-Oriented Programming, and Model-Driven Engineering), vast areas of application domains (e.g., control systems, data-intensive applications, embedded systems, security, simulation, testing, web), and different development approaches (e.g., external and internal DSLs), it is hard to obtain complete knowledge of the DSL research field and to foresee DSL research trends. Indeed, there is a substantial body of publications on DSL research (see, for example, some expert literature reviews from the past [6][20]). Therefore, the main objective is to perform an SMS [13, 23] on DSLs to better understand the DSL research field, identify research trends, and uncover possible open issues. The work on this SMS started in July 2013 and finished in October 2014 (with some corrections in the period May-August 2015). Several guidelines were studied before starting the SMS on DSLs ([13, 23]), as well as reports on other SMSs (e.g., [8, 2, 1, 19, 9, 10, 18]). The guidelines for performing Systematic Reviews (SRs) in Software Engineering (SE) [13] outline three phases: planning the review, conducting the review, and reporting the review. The sub-tasks of planning the review are [13]:

• Identification of the need for a review. For this SMS see the discussion in Section 2.

• Commissioning a review.
According to [13], this step is not required for a research team undertaking a review for its own needs, or for one undertaken by a student during his/her project or thesis (BSc, MSc, PhD).

• Specifying the research questions. For this SMS see Section 3.

Preprint submitted to Elsevier, October 28, 2015

• Developing a review protocol. For this SMS see Section 4.

• Evaluating the review protocol. One of the benefits of SRs is that the review protocol is precisely defined before the review is conducted. This ensures that the results will be unbiased and reliable (at least to some extent). However, due to a lack of appropriate funding, the review protocol will not be re-examined by a group of independent experts for this SMS.

The sub-tasks of conducting the review are [13]:

• Identification of relevant primary studies relating to the research questions, using the search strategy defined in the review protocol (see Section 4).

• Selection of primary studies based on previously defined inclusion/exclusion criteria, identifying those primary studies that provide direct evidence about the research questions. The identified primary studies for this SMS will be published on a project web page.

• Study quality assessment, where it is investigated whether quality differences provide an explanation for differences in study results (e.g., results from controlled experiments can be more trustworthy than those from an observational study), or which serves as a means of weighting the importance of individual studies when results are being synthesised. This step is optional for SMSs and will be skipped in this SMS, too. On the other hand, only peer-reviewed papers will be considered, and hence a weak form of quality assessment will be achieved in this SMS.

• Data extraction and monitoring, where the data extraction forms will be designed to collect the information needed to address the review questions and the study quality criteria (if included in the study).
The data extraction form is presented in Section 5.

• Data synthesis, summarizing the results of the included primary studies.

The sub-tasks of reporting the review are [13]:

• Specifying dissemination mechanisms. As this is an academic study, we assume that dissemination of results will be achieved by publishing them in a scientific journal and summarising them on a web page, where the basic results can be found by practitioners.

• Formatting the main report, either as a technical/project report, a BSc/MSc/PhD thesis, or conference or journal papers. We will submit the work to a journal.

• Evaluating the report. This will be achieved in our case by submitting the work to a peer-reviewed journal.

The outlined three phases (planning the review, conducting the review, and reporting the review) with the aforementioned sub-tasks were later simplified in [23] into five stages:

• defining research questions,
• conducting a search for primary studies,
• screening primary studies based on inclusion/exclusion criteria,
• classifying the primary studies, and
• data extraction and aggregation.

This simplified structure has been adopted by many researchers (e.g., [8, 24, 7, 19, 10, 18]), and even by the authors of the original guidelines [15]. Hence, we will adopt it as well. We will also take into account the good practices and lessons learned from applying the SR process within an SE domain [4].

2. Identification of the need for a review

The first step in performing an SR, according to [13], is to ensure that an SR is necessary in the first place. Researchers should identify and review any existing SRs of the research topic under investigation. Although the annotated DSL bibliography [6] is not an example of an SR, more than 70 publications were summarised in it. Furthermore, DSL terminology, DSL advantages and disadvantages, as well as DSL design methodologies and DSL implementation techniques, were discussed. This work can be classified as an 'expert literature review' paper.
Yet another 'expert literature review' paper is our own survey paper on DSLs [20], where more than 150 primary studies were used to classify DSL research work and to find patterns during the various phases of DSL development (decision, analysis, design, implementation). However, this work was published 10 years ago, in 2005; hence it is reasonable to ask what the research space of the literature in the field of DSLs has been since the survey paper on DSLs was published. The first attempt at an SMS for DSLs was the work [22], which unfortunately was unsatisfactory and needed to be repeated. In particular, we did not find study [22] very useful, as its authors classify the primary studies regarding a research focus with respect to keywords found in the primary studies, and not according to already-established research foci within the DSL field. Although the authors of [22] follow the guidelines on how to perform SMSs in SE [23], the outcome, a classification of research focus (in the authors' words, a DSL research type) into ADL (Architecture Description Language), DSAL (Domain-Specific Aspect Language), DSML (Domain-Specific Modeling Language), external DSL, internal DSL, method or process, technique, and tools, is far from satisfactory. Our SMS will not be an exact replication of the SMS in [22], due to differences in research questions and classifications, as well as in the inclusion of primary studies (we want to concentrate solely on DSLs, and DSMLs will be excluded). Overall, there is a need for an SMS on DSLs (due to DSLs' broad nature) to summarise recent knowledge about DSLs, in order to draw more general conclusions about DSL research and to discover promising directions for DSL research in the near future.

3. Specifying the research questions

In this section we report on the research questions and the rationale behind them.

RQ1: What has been the research space of the literature within the field of DSLs since the survey paper on DSLs [20] was published 10 years ago?
RQ2: What have been the trends and demographics of the literature within the field of DSLs since the survey on DSLs [20] was published 10 years ago?

The research question RQ1 will be further split into three sub-questions.

RQ1.1 Type of contribution: What is the main contribution of DSL studies with respect to techniques/methods, tools, processes, and measurements? A particular study falls into:

• the 'DSL development techniques/methods' category if the study's main contribution is a technique/method for any DSL development phase: domain analysis, design, implementation (e.g., DSL compiler), validation, and maintenance (e.g., DSL evolution);

• the 'DSL development tools' category if the study's main contribution is a tool that supports one or more phases of DSL development (domain analysis, design, implementation, validation, and maintenance);

• the 'DSL processes' category if the study's main contribution is the description of a flow from one phase into another (e.g., how outputs from one phase can be used as inputs for another phase), or the DSL development process is discussed within a wider context of software engineering (e.g., integration within a larger project), or a particular DSL process is described (DSL debugging, DSL testing, DSL usability testing);

• the 'DSL measurement' category if the study's main contribution is a proposal or application of metrics regarding the effectiveness of DSL approaches (e.g., measuring the comprehensibility of DSL programs, measuring the productivity of DSL users).

Note that we will not differentiate between 'techniques' and 'methods', to support easier classification and replication of this study, as was also suggested in [26]. The rationale for this decision is that the difference is subtle and as such might cause problems during classification.
Moreover, even the authors of primary studies might use different criteria, and hence the result of such a classification would be rather unreliable. We will apply the following definition from [26]: "A method or technique is a solution to one or more problems. It might be argued that a method is usually more related to a process which involves humans in the loop while a technique is usually more focused on the automated part of such a process. As for distinguishing between a method and a technique, the standard glossaries for software engineering, e.g. IEEE 729 (IEEE 1983), do not allow a sharp distinction between 'method' and 'technique'." On the other hand, the SMS on DSLs conducted by [22] used the following classification: method/process, technique, and tools. Moreover, this classification was mixed with a research focus area: ADL, DSAL, DSML, external DSL, internal DSL. Overall, such a classification was rather unconvincing.

RQ1.2 Type of research: What types of research methods have been used in DSL studies? Many different proposals for classifying the types of research methods exist (e.g., [25], [11], [27]), but most SMSs (e.g., [8, 21, 24, 22, 18]) have followed the guidelines [23] for classifying the types of research methods defined in [27], due to their simplicity. Some other possible classifications are presented in [25] (formal theory, design and modelling, empirical work, hypothesis testing, other) or in [11] (conceptual analysis, conceptual analysis/mathematical, concept implementation (proof of concept), case study, data analysis, field study, laboratory experiment, literature review/analysis, mathematical proof, and simulation). Indeed, some SMSs (e.g., [1]) have used a partial classification from [11] and classified papers as: experiment, case study, conceptual analysis/mathematical, descriptive, literature review, and survey. Such a classification might be even harder to use.
We have decided to use the classification suggested for SMSs in [23], which is based on [27]:

• Opinion paper, which reports the authors' opinions as to whether a certain technique/method/tool/process/measurement is good or bad.

• Experience paper, which reports the authors' experience of certain techniques/methods/tools/processes/measurements, as used in practice.

• Philosophical/conceptual paper, which provides a taxonomy of a research field or a conceptual framework for structuring the phenomena under investigation/design.

• Solution proposal, which proposes a certain technique/method/tool/process/measurement as a solution to a particular problem and explains it with a small example or a good line of argumentation. However, the technique/method/tool/process/measurement has not yet been implemented.

• Validation research, where a certain technique/method/tool/process/measurement is implemented as a solution to a problem and validated by simulations, prototyping, experiments, systematic mathematical analysis, mathematical proofs of properties, etc. However, the implementation has not yet been evaluated in practice.

• Evaluation research, where a certain technique/method/tool/process/measurement as a solution to a problem is implemented and validated in practice. Its benefits and drawbacks are evaluated by controlled experiments, observational studies, or case studies.

However, we decided to include some improvements to the classification process, with the aim of making it more reliable and replicable. In [23] it was suggested that the types of research methods could be further classified into empirical research (validation research and evaluation research) and non-empirical research (opinion paper, experience paper, philosophical/conceptual paper, and solution proposal). However, it seems that this coarse-grained classification has not been widely adopted by SR researchers so far.
In our opinion, this broader classification is very useful for obtaining a broader picture of the research field and is more reliable than the fine-grained classification. For the classification to be as uniform as possible, we have also specified sufficient conditions for each particular research type, as an example of an agreed interpretation [28]. A sufficient condition for an experience paper is that the authors' opinion was gained from practical experience. A sufficient condition for a philosophical/conceptual paper is that, with the gained knowledge, the authors were able to propose a taxonomy or a new conceptual framework. A sufficient condition for a solution proposal is that, with the gained knowledge, the authors have proposed a new solution, which has not yet been implemented. A sufficient condition for validation research is that the proposed new solution is implemented and hence validated at least by prototyping. Finally, a sufficient condition for evaluation research is that a solution has been evaluated in practice by a controlled experiment (randomised experiment or quasi-experiment), an observational study, or a case study. These conditions also create an ordering amongst research types: opinion paper, experience paper, philosophical/conceptual paper, solution proposal, validation research, evaluation research. It might be argued that such an ordering of research types is too simplistic or unsuitable. For example, one might value validation research more than evaluation research, as in the former a new solution (technique/method/tool/process/measurement) for a problem has been invented and scientifically validated (e.g., with a mathematical proof), whilst in the latter this already implemented solution has only been validated in practice, finding its true benefits (or drawbacks) in practice.
As yet another example, one might value a philosophical/conceptual paper more than a solution proposal, since developing a good taxonomy of a research field requires complete knowledge and often some generalisation skills, whilst proposing a certain technique/method/tool/process/measurement as a solution to a particular problem requires no such complete understanding of a research field. Our aim regarding the ordering is not to trigger such debates or to imply such conclusions (e.g., that evaluation research is valued more than validation research). Instead, our ordering simply reflects a natural research life-cycle, which should end with validation in practice. This is vividly described in [25] as: "Quite the contrary - new ideas are needed more than ever. But computer scientists must find out how good these ideas are and use experimentation to guide them to the profitable ones." We are convinced that such an ordering of research types with stated sufficient conditions, in addition to the two-level classification, will improve the reliability of research type classification in SMSs in the future.

RQ1.3 Focus area: Which research topics have been investigated in DSL studies? The following DSL development phases, based on the DSL survey paper [20], will be included: domain analysis, design, implementation, validation, and maintenance.

The research question RQ2, which is about the trends and demographics of DSL research, will be further split into four sub-questions.

• RQ2.1 Publication count by year: What is the annual number of publications within this field?

• RQ2.2 Top cited papers: Which DSL primary studies used in this SMS are cited the most?

• RQ2.3 Active institutions: Rather than identifying individual researchers within the DSL field, we opted for identifying DSL groups working at particular institutions. This will be measured by the number of published papers.
How are DSL groups connected (i.e., which groups co-author the same primary studies)?

• RQ2.4 Top venues: Which venues (e.g., journals, conferences, workshops) are the main targets of DSL papers? Note that there have been no specialised DSL journals or conferences spanning many years. Hence, it would be interesting to know where DSL researchers have mostly published.

4. The protocol

According to Kitchenham and Charters [13], a protocol is "a plan that describes the conduct of a proposed systematic literature review." In this section the protocol of our SMS is described: in particular, which search string is used, how the search for primary studies is conducted, what the inclusion and exclusion criteria are, and the rules for classifying primary studies.

4.1. The Search String

There are many synonyms for DSL, such as: application-oriented language, special purpose language, specialised language, task-specific language, application language, and little language. The guidelines [13] suggest that synonyms should be used within the search string in order to broaden the literature search. In order to eliminate some threats to the validity of this study due to possible omission of synonyms from the search string, a pilot literature search was performed during protocol design to verify whether these synonyms were still being used in the research literature during the period 2006-2012. The results showed that the following synonyms are more or less unused nowadays (the increase in the number of hits was less than 0.05%): application-oriented language, application language, and task-specific language. The following synonyms have been used rarely (the increase in the number of hits was 1%-2%): little language and special purpose language. The synonym 'specialised language' has been used more often than the other synonyms (the increase in the number of hits was 5.5%).
However, most of these publications did not describe DSL research (they were from the field of linguistics), and the number of relevant hits was lower than for 'little language'. Because these synonyms are rarely used nowadays, we will exclude them from the search string. As the acronym 'DSL' is omnipresent nowadays and used in some papers without introduction, we will include it in the search string. As this SMS was started during the Summer of 2013, it would have been possible to also include some recent primary studies published in 2013. However, in that case the replication of this study would be extremely hard to achieve, because publications from the second half of 2013 would have to be excluded manually. In view of the aforementioned reasons, it was decided to use the following elementary search string:

("domain-specific language" OR "DSL") AND year > 2005 AND year < 2013

The rationale for the search string is now fully explained.

4.2. Conducting the search for primary studies

According to the guidelines in [13], all relevant studies should be found whilst performing an SR; this recommendation was later relaxed for SMSs [14]. Indeed, as indicated in [28], it is more likely that we will only deal with a subset of all relevant publications. In order to have a rough indication of how many relevant publications exist, and to decide how to conduct the search, we did a preliminary search on the following Digital Libraries (DLs) (Table 1), which were available to us. Preliminary screening has also shown that most of the hits are relevant primary studies. For such a broad topic as DSLs, we have become convinced that all relevant primary studies cannot be identified, and that the inclusion of statistical methods would be needed in any case to produce a proper generalisation.
We decided, due to the broadness of publications on DSL research, that our search for primary studies, automatic or manual, will be based on the margin of error (confidence interval) [5]. We will include DLs until the requested margin of error is smaller than or equal to 5%. The margin of error in other sciences is commonly set between 5% and 10% [3, 12, 16]. Hence, our search for primary studies will be driven by the level of precision. When the margin of error is small enough, an interpretation of the data can be made with high confidence. We can add a manual search at any point in the suggested process (as well as in the case when all DLs are exhausted). In the case that the specified margin of error cannot be achieved and no new primary studies can be found by automatic or manual search, we will stop the search process and report on the achieved margin of error. The suggested process is described in Figure 1.

[Figure 1: SMS procedure. The flowchart proceeds from start through: defining research questions; defining a search string and inclusion/exclusion criteria; then, while more relevant publications can be found: selecting a new digital library or adding new publications by manual search, screening relevant publications, and classifying primary studies with data extraction; once the requested margin of error is achieved, aggregating results and reporting results.]

In the above process the order of DLs is unspecified. We have decided to start with the ISI Web of Science, as its number of hits was the highest. If the number of DLs from Table 1 proves insufficient with respect to the requested margin of error, we will add new DLs (e.g., Google Scholar) or perform a manual search.
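The stopping rule driven by the margin of error can be sketched as follows. This is an illustrative computation only: it assumes a simple random sample, the conservative proportion p = 0.5, a 95% confidence level (z = 1.96), and the finite population correction from [5]; the protocol itself does not prescribe these exact formulas.

```python
import math

def margin_of_error(n_sampled: int, population: int,
                    z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error for an estimated proportion, with the finite
    population correction (Cochran, 1977). p = 0.5 is the worst case."""
    se = math.sqrt(p * (1 - p) / n_sampled)                       # standard error
    fpc = math.sqrt((population - n_sampled) / (population - 1))  # finite population correction
    return z * se * fpc

# With the 1815 candidate publications from Table 1, find the smallest
# sample for which the margin of error drops to 5% or below.
n = 1
while margin_of_error(n, 1815) > 0.05:
    n += 1
print(n)  # 318 under these illustrative assumptions
```

Under these assumptions, roughly 318 of the 1815 candidates would have to be screened before the 5% threshold is met; with more DLs (a larger population) this number grows only slowly, which is why the level of precision, rather than exhaustiveness, can drive the search.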
The search string specified in Section 4.1 has been customised for the particular DLs as follows:

ISI Web of Science:
(((TS=("domain-specific language") OR TS=(DSL) OR TI=("domain-specific language") OR TI=(DSL)) AND PY=(2006-2012)) AND SU=(Computer Science))

IEEE Xplore:
(((("Publication Title":domain-specific language) OR "Publication Title":DSL) OR "Author Keywords":domain-specific language) OR "Author Keywords":DSL)
additional constraint: Publication Year: 2006-2012

ACM Digital Library:
("domain-specific language" OR DSL) and (Keywords:"domain-specific language" OR Keywords:DSL)
additional constraint: Publication Year: 2006-2012

Science Direct:
pub-date > 2005 and pub-date < 2013 and TITLE-ABSTR-KEY("domain-specific language") or TITLE-ABSTR-KEY(DSL)
additional constraint: [All Sources(Computer Science)]

Table 1: Preliminary identification of relevant publications

Digital Library        accessible at                    no. of publications
ISI Web of Science     http://sub3.webofknowledge.com                   792
IEEE Xplore            http://ieeexplore.ieee.org                       527
ACM Digital Library    http://dl.acm.org                                361
Science Direct         http://www.sciencedirect.com                     135
                                                                   Σ   1815

4.2.1. Inclusion/exclusion criteria

The following inclusion/exclusion criteria have been defined. Similar criteria can be found in many other SMSs (e.g., [8, 21, 24, 22, 18]).

The inclusion criteria:

• the study must address DSL research,
• peer-reviewed studies published in journals, conferences, and workshops,
• the study must be written in English,
• the study must be accessible electronically, and
• computer science literature.
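The inclusion criteria above can be sketched as a simple screening predicate. This is only an illustration: the record fields and their names are hypothetical, since the actual screening in this SMS is performed manually on titles, keywords, and abstracts.

```python
from dataclasses import dataclass

# Venue types counted as peer-reviewed in this protocol.
PEER_REVIEWED_VENUES = {"journal", "conference", "workshop"}

@dataclass
class Candidate:
    addresses_dsl: bool  # judged from title, keywords, and abstract;
                         # per the protocol, borderline cases stay True here
                         # and may be excluded later, in the classification phase
    venue_type: str      # e.g. "journal", "conference", "workshop", "book"
    language: str
    accessible: bool     # electronically accessible
    field: str           # e.g. "computer science"

def include(c: Candidate) -> bool:
    """Apply the inclusion criteria of Section 4.2.1 to one candidate."""
    return (c.addresses_dsl
            and c.venue_type in PEER_REVIEWED_VENUES
            and c.language == "English"
            and c.accessible
            and c.field == "computer science")
```

For example, a DSL journal paper in English passes the predicate, while a book chapter or a non-English publication does not.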
The exclusion criteria:

• irrelevant publications that lie outside the core DSL research field, which also excludes DSML, modelware, and MDE publications, visual/graphical languages (based on graph grammars or other formalisms), and publications mentioning DSLs only as future work;
• non-peer-reviewed studies (abstracts, tutorials, editorials, slides, talks, tool demonstrations, posters, panels, keynotes, technical reports);
• peer-reviewed studies not published in journals, conferences, or workshops (e.g., PhD theses, books, patents);
• publications not in English;
• electronically non-accessible publications; and
• non-computer-science literature.

The inclusion and exclusion criteria will be applied to the titles, keywords, and abstracts. In those cases where it is not completely clear from the title, keywords, and abstract that a publication really addresses DSL research, the publication will be temporarily included, but it might be excluded during the next phase (the classification phase), when the whole publication (not only the abstract) will be read. Hence, only publications that are clearly outside the scope will be excluded during this phase, and possible mistakes made during this phase will thus be largely eliminated. The search for primary studies and the application of the inclusion/exclusion criteria will be done by the first and second authors.

4.2.2. Classifying the papers

Classification is one of the most critical and time-consuming steps when conducting SMSs. With regard to our research questions, it is not expected that all relevant information can be inferred from the abstracts. Hence, the classification will be done based on reading the whole primary study. It was recommended [13] that the classification of primary studies be done by at least two authors, and in the case of disagreement the authors need to decide how to resolve it.
However, such a process only increases the reliability of classification inside the group, where the opinions of the more senior researchers would probably prevail anyway. It was shown in [28] that this process did not improve the reliability of classifications between different research groups: in the case of two independent SMSs on product line testing, 22 out of 33 papers were classified differently [28]. For these reasons we decided that the classification of the primary studies will be done by the third author, who is the most experienced in terms of published relevant DSL research publications. We are convinced that the reliability of classifications between different research groups will, in this manner, not be lower than reported in [28], and can only be increased by more precise guidelines on how to classify primary studies (e.g., having a "standardized classification scheme with an agreed interpretation" [28]). However, we acknowledge that this decision might introduce some bias, which is a valid threat to validity. To mitigate this threat, at least partially, some improvements to the classification process have been suggested: the two-level classification and the introduction of sufficient conditions (see Section 3). During this phase similar studies will also be identified (e.g., if a conference paper has a more recent journal version, we will include only the latter). For this reason we will start classifying backwards, from 2012 to 2006. In order to avoid fatigue during classification (a possible threat to validity), classification will be performed in blocks of at most two hours, followed by breaks of at least one hour.

5. The extraction form

Based on the research questions (Section 3) and on the protocol specified in Section 4, the following extraction form has been proposed (Figure 2), which indicates what data will be extracted. To support this extraction form, a web-based application has been developed (Figure 3).
6. Divergences from the original Protocol

In our SMS [17] we slightly diverged from the protocol described in Section 4. During the reviewing process of [17], a mistake was found in the computation of the margin of error. This mistake was corrected in the period from May to August 2015, by classifying another 86 primary studies from the ACM Digital Library, thus achieving the requested margin of error. However, given that it was impossible to ensure a random sample of papers, the ambition to use the margin of error was discarded [17].

[Figure 2: The extraction form for DSL systematic mapping study]

[Figure 3: Web application for DSL systematic mapping study]

References

[1] A. Ampatzoglou, S. Charalampidou, I. Stamelos. Research state of the art on GoF design patterns: A mapping study. Journal of Systems and Software, 86(7), 1945–1964, 2013.

[2] S. Barney, K. Petersen, M. Svahnberg, A. Aurum, H. Barney. Software quality trade-offs: A systematic map. Information and Software Technology, 54(7), 651–662, 2012.

[3] J.E. Bartlett II, J.W. Kotrlik, C.C. Higgins. Organizational Research: Determining Appropriate Sample Size in Survey Research. Information Technology, Learning, and Performance, 19(1), 43–50, 2001.

[4] P. Brereton, B. Kitchenham, D. Budgen, M. Turner, M. Khalil. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software, 80(4), 571–583, 2007.

[5] W.G. Cochran. Sampling Techniques. John Wiley & Sons, New York, NY, 1977.

[6] A. van Deursen, P. Klint, J. Visser. Domain-specific languages: an annotated bibliography. ACM SIGPLAN Notices, 35(6), 26–36, 2000.

[7] F. Elberzhager, A. Rosbach, J. Munch, R. Eschbach. Reducing test effort: A systematic mapping study on existing approaches. Information and Software Technology, 54(10), 1092–1106, 2012.

[8] E. Engström, P. Runeson. Software product line testing - a systematic mapping study. Information and Software Technology, 53(1), 2–13, 2011.

[9] A.M.
Fernández-Sáez, M. Genero, M.R.V. Chaudron. Empirical studies concerning the maintenance of UML diagrams and their use in the maintenance of code: A systematic mapping study. Information and Software Technology, 55(7), 1119–1142, 2013.

[10] V. Garousi, A. Mesbah, A. Betin-Can, S. Mirshokraie. A Systematic Mapping Study on Web Application Testing. Information and Software Technology, 55(8), 1374–1396, 2013.

[11] R. Glass, V. Ramesh, I. Vessey. An analysis of research in the computing disciplines. Communications of the ACM, 47(6), 89–94, 2004.

[12] G.D. Israel. Determining Sample Size. Program Evaluation and Organizational Development, PEOD-6, University of Florida, Institute of Food and Agriculture Sciences, 1992.

[13] B. Kitchenham, S. Charters. Guidelines for performing systematic literature reviews in software engineering. EBSE Technical Report, Keele University, 2007.

[14] B. Kitchenham, D. Budgen, P. Brereton. The value of mapping studies - a participant-observer case study. Proceedings of the 14th International Conference on Evaluation and Assessment in Software Engineering (EASE'10), pages 25–33, 2010.

[15] B. Kitchenham, D. Budgen, P. Brereton. Using mapping studies as the basis for further research - A participant-observer case study. Information and Software Technology, 53(6), 638–651, 2011.

[16] S.S. Kohles, J.B. Roberts, M.L. Upton, C.G. Wilson, L.J. Bonassar, A.L. Schlichting. Direct perfusion measurements of cancellous bone anisotropic permeability. Journal of Biomechanics, 34(9), 1197–1202, 2001.

[17] T. Kosar, S. Bohra, M. Mernik. Domain-Specific Languages: A Systematic Mapping Study. Information and Software Technology, Submitted, 2015.

[18] M.A. Laguna, Y. Crespo. A systematic mapping study on software product line evolution: From legacy system reengineering to product line refactoring. Science of Computer Programming, 78(8), 1010–1034, 2013.

[19] A. Mehmood, D.N.A. Jawawi. Aspect-oriented model-driven code generation: A systematic mapping study.
Information and Software Technology, 55(2), 395–411, 2013.

[20] M. Mernik, J. Heering, A.M. Sloane. When and how to develop domain-specific languages. ACM Computing Surveys, 37(4), 316–344, 2005.

[21] P.A. da Mota Silveira Neto, I. do Carmo Machado, J.D. McGregor, E.S. de Almeida, S.R. de Lemos Meira. A systematic mapping study of software product lines testing. Information and Software Technology, 53(5), 407–423, 2011.

[22] L.M. do Nascimento, D. Leite Viana, P.A.M. Silveira Neto, D.A.O. Martins, V. Cardoso Garcia, S.R.L. Meira. A Systematic Mapping Study on Domain-Specific Languages. Proceedings of the 7th International Conference on Software Engineering Advances (ICSEA'12), pages 179–187, 2012.

[23] K. Petersen, R. Feldt, S. Mujtaba, M. Mattsson. Systematic Mapping Studies in Software Engineering. Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering (EASE'08), pages 71–80, 2008.

[24] I.F. Silva, P.A. da Mota Silveira Neto, P. O'Leary, E. Santana de Almeida, S.R. de Lemos Meira. Agile software product lines: a systematic mapping study. Software: Practice and Experience, 41(8), 899–920, 2011.

[25] W. Tichy, P. Lukowicz, L. Prechelt, E. Heinz. Experimental evaluation in computer science: a quantitative study. Journal of Systems and Software, 28(1), 9–18, 1995.

[26] P. Tonella, M. Torchiano, B. Du Bois, T. Systä. Empirical studies in reverse engineering: state of the art and future trends. Empirical Software Engineering, 12(5), 551–571, 2007.

[27] R. Wieringa, N. Maiden, N. Mead, C. Rolland. Requirements engineering paper classification and evaluation criteria: a proposal and a discussion. Requirements Engineering, 11(1), 102–107, 2006.

[28] C. Wohlin, P. Runeson, P.A. da Mota Silveira Neto, E. Engström, I. do Carmo Machado, E. Santana de Almeida. On the reliability of mapping studies in software engineering. Journal of Systems and Software, 86(10), 2594–2610, 2013.