Conflict-aware Optimal Scheduling of Code Clone
Refactoring: A Constraint Programming Approach
Minhaz F. Zibran
Chanchal K. Roy
Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada S7N 5C9
Email: {minhaz.zibran, chanchal.roy}@usask.ca
Abstract—Duplicated code, also known as code clones, is one
of the malicious ‘code smells’ that often needs to be removed
through refactoring to enhance maintainability. Among all
the potential refactoring opportunities, the choice and order of a
set of refactoring activities may have a distinguishable effect on the
design/code quality. Moreover, there may be dependencies and
conflicts among those refactorings. The organization may also
impose priorities on certain refactoring activities. Addressing all
these conflicts, priorities, and dependencies, manual formulation
of an optimal refactoring schedule is very expensive, if not
impossible. Therefore, an automated refactoring scheduler is
necessary, which will maximize benefit and minimize refactoring
effort. In this paper, we present a refactoring effort model, and
propose a constraint programming approach for conflict-aware
optimal scheduling of code clone refactoring.
I. INTRODUCTION
Duplicated code, or code clone, is a well-known code smell.
From the maintenance perspective, the existence of code clones
may increase maintenance effort, and so the amount of code
clones should be minimized by applying active refactoring.
There are many refactoring patterns, not all of which are
directly applicable to code clone refactoring. The applicability
of certain refactoring activities largely depends on the context.
So, for code clones, refactoring activities and the relevant
contexts need to be identified in the first place. The consequence of clone refactoring should also be taken into account.
The effort required for applying a certain refactoring on the
underlying code clones should also be minimized to keep the
maintenance cost within reach. The application of a subset of
refactorings from a set of applicable refactoring activities may
result in a distinguishable impact on the overall code quality.
Moreover, there may be sequential dependencies and conflicts
among the refactoring activities. Hence, it is also necessary
that, from all refactoring candidates a subset of non-conflicting
refactoring activities be selected and ordered (for application)
such that the quality of the code base is maximized while the
required effort is minimized [24].
Software refactoring is often performed with the aid of
graph transformation tools [17], where the available refactorings are applied at random, without having been scheduled [14]. Usually, the application order of the semi-automated
refactorings is determined implicitly by human practitioners.
But this is inefficient and error-prone. While experienced
engineers may do it well, inexperienced engineers may produce
a poor or infeasible schedule. The challenge is likely to be more
severe for refactoring legacy systems, or when a practitioner
new to the code base has to devise the refactoring schedule.
Therefore, automated (or semi-automated) scheduling that performs selection and ordering of refactorings from a set of
all refactoring candidates is a justified need.
In this regard, this paper makes two contributions. First,
for estimating the refactoring effort, we introduce an effort
model for refactoring code clones in an object-oriented (OO) code
base. Second, taking into account the effort model and a wide
variety of possible hard and soft constraints, we formulate
clone refactoring scheduling as a constraint satisfaction optimization problem (CSOP), and solve it by applying a constraint
programming (CP) technique that aims to maximize benefit
while minimizing the refactoring effort. To the best of our
knowledge, ours is the first refactoring effort model for OO
systems, and we are the first to apply the CP technique in
software refactoring scheduling.
The remainder of the paper is organized as follows. Section II identifies the refactoring patterns that are suitable for
code clone refactoring. In Section III, we describe our clone
refactoring effort model. Section IV discusses how the effect
of refactoring can be estimated. In Section V, we describe the
possible constraints on refactorings, and Section VI presents
our CSOP formulation of the refactoring scheduling problem.
Section VII discusses the related work, and finally
Section VIII concludes the paper with our directions for future
research.
II. CLONE REFACTORING OPERATIONS
Among the software refactoring patterns [6], we find that
extract method (EM), pull-up method (PM), extract class (EC)
and rename refactor (RR) are suitable for clone refactoring,
and we refer to them as refactoring operators.
For code clone refactoring, these refactoring operators will
operate on groups of clone fragments having two or more
members. We refer to such clone-groups as the refactoring
operands. Thus, a refactoring activity (or simply, refactoring) $r$ can be formalized as $r = \langle op_r, g_r \rangle$, where $op_r \in \{EM, PM, EC, RR\}$ and $g_r$ is the clone-group on which the
refactoring operator $op_r$ operates.
III. ESTIMATION OF REFACTORING EFFORT
The effort required for code clone refactoring is likely
to depend on the type of refactoring operator and operand.
Moreover, refactoring cloned code snippets that are scattered
across different locations of the code base and/or inheritance
hierarchy may require relatively more effort than
refactoring clones residing cohesively at a certain location of
the code base. To address these issues, we propose a code
clone refactoring effort model for procedural and OO software
systems.
Suppose a group of clones $g = \{c_1, c_2, c_3, \ldots, c_n\}$ is
extracted as a set of refactoring candidates, where $c_i$ ($1 \le i \le n$) is a clone fragment inside method $m_i$, which is a
member of class $C_i$ hosted in file $F_i$ contained in directory
$D_i$. Mathematically,
$$c_i \curvearrowright m_i \curvearrowright C_i \curvearrowright F_i \curvearrowright D_i, \quad \text{for object-oriented code,}$$
$$c_i \curvearrowright m_i \curvearrowright F_i \curvearrowright D_i, \quad \text{for procedural code.}$$
Here, the symbol $\curvearrowright$ indicates the containment relationship: $x \curvearrowright y$
means that $x$ is contained in $y$; in other words, $y$ contains $x$.
The relationship is transitive, i.e., $x \curvearrowright y \curvearrowright z \Rightarrow x \curvearrowright z$. Thus, the set $C(g)$ of all classes hosting the clone fragments in $g$ can be defined as $C(g) = \{C_i \mid \forall\, c_i \in g,\ c_i \curvearrowright C_i\}$.
A. Context Understanding Effort
The applicability of refactoring to certain code clones is
largely dependent on the context. Therefore, before refactoring, the developer needs to understand the context pertaining
to the refactoring candidate at hand. For understanding the
context, the developer needs to examine two things: the caller-callee delegation of methods and the inheritance hierarchy.
Effort for Understanding Method Delegation. To understand the method delegation involving the concerned code
clone $c_i$, the developer needs to understand the chain of methods that can be reached from $c_i$ via the caller-callee relationship.
Let $M_r(c_i)$ be the set of all such methods. The developer will
also need to comprehend the set $M_f(c_i)$ of all the methods
from which $c_i$ can be reached via the caller-callee relationship.
Then, the set of methods that must be investigated to understand the delegation concerning $c_i$ is
$delegation(c_i) = M_f(c_i) \cup M_r(c_i) \cup \{m_i\}$. For understanding
delegation concerning the clone fragments in $g$, the set of
all methods that must be examined becomes
$$Delegation(g) = \bigcup_{c_i \in g} delegation(c_i).$$
Thus, for the group $g$ of refactoring
candidates, the total effort for understanding method delegation
can be estimated as
$$E_d(g) = \sum_{m \in Delegation(g)} LOC(m),$$
where $LOC(m)$ denotes the lines of code of method $m$.
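The computation of $E_d(g)$ can be illustrated with a minimal Python sketch. It assumes that a call graph is already available as a dictionary mapping each method to the methods it calls, and it approximates reachability from a clone fragment $c_i$ by reachability from its host method $m_i$. The names `calls`, `loc`, and `host_methods` are hypothetical and are not part of the model above.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from `start` by following edges in `graph`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def delegation_effort(calls, loc, host_methods):
    """Estimate E_d(g) as the summed LOC of all methods in Delegation(g).

    calls:        dict mapping a method name to the set of methods it calls
    loc:          dict mapping a method name to its lines of code
    host_methods: the methods m_i that contain the clone fragments of g
    """
    # Reverse the call graph so that callers can be traversed as well.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    delegation = set()
    for m in host_methods:
        delegation |= reachable(calls, m)    # approximates M_r(c_i)
        delegation |= reachable(callers, m)  # approximates M_f(c_i)
        delegation.add(m)                    # the host method m_i itself
    return sum(loc[m] for m in delegation)


calls = {"main": {"a"}, "a": {"b"}, "b": set(), "x": {"y"}, "y": set()}
loc = {"main": 10, "a": 20, "b": 5, "x": 7, "y": 3}
print(delegation_effort(calls, loc, ["a"]))
```

Because the union is taken over the whole group, a method shared by several clone fragments contributes its LOC only once.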
Effort for Understanding Inheritance Hierarchy. Let $C_p(g)$ be the set of all lowest/closest common superclasses of all pairs of classes in $C(g)$. The developer also
needs to understand those classes in the inheritance hierarchy
that override or reference the method $m_i$ containing
any code clone $c_i \in g$. Let $C_s(g)$ be the set of all such
classes. Then $C_h(g) = C_p(g) \cup C_s(g) \cup C(g)$ becomes
the set of all classes that must be examined to understand
the inheritance hierarchy concerning the code clones in $g$,
and the effort $E_h(g)$ required for this can be estimated as
$$E_h(g) = \sum_{C \in C_h(g)} LOC(C).$$
B. Effort for Code Modifications
To perform refactoring on the refactoring candidates, the
developer usually needs to modify portions of source code.
Token Modification Effort. A developer’s source code
modification activities typically include modifications to
the program tokens (e.g., identifier renaming). Let $T = \{t_1, t_2, t_3, \ldots, t_k\}$ be the set of tokens such that a token
$t_i \in T$ is required to be modified to $t'_i$, and the edit distance
between $t_i$ and $t'_i$ is denoted as $\delta(t_i, t'_i)$. Then the total
effort $E_t(g)$ for token modifications can be estimated as
$$E_t(g) = \sum_{i=1}^{k} \delta(t_i, t'_i).$$
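As an illustration of the token modification effort, the following Python sketch sums edit distances over a list of token renamings. The text does not fix a particular edit distance, so the choice of Levenshtein distance here is an assumption, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two tokens (assumed choice for delta)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def token_effort(renames):
    """Estimate E_t(g) for an iterable of (t_i, t_i') pairs."""
    return sum(edit_distance(old, new) for old, new in renames)


print(token_effort([("tmp", "total"), ("i", "idx")]))
```

Under this choice, renaming a short identifier to a long descriptive one costs more than correcting a single character, which matches the intuition that larger textual changes take more effort.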
Code Relocation Effort. When developers need to
move a piece of code from one place to another, they typically
select a block of adjacent statements and relocate them all at
once. Hence, the code relocation effort $E_r(g)$ can be estimated
as $E_r(g) = |\beta|$, where $\beta$ is the set of all non-adjacent blocks
of code that need to be relocated to perform the refactoring.
C. Navigation Effort
The effort for source code comprehension, modification, and relocation also depends on the number of files and directories involved, and their distribution. To capture this, our effort model
includes the notion of navigation effort, $E_n(g)$, calculated as
follows.
$$E_n(g) = |F_d(g) \cup F_h(g)| + |D_d(g) \cup D_h(g)| + DCH(g) + DFH(g)$$
where, with $\curvearrowright$ denoting containment,
$$\begin{aligned}
F_d(g) &= \{F_i \mid m_i \curvearrowright F_i,\ m_i \in Delegation(g)\} \\
F_h(g) &= \{F_i \mid C_i \curvearrowright F_i,\ C_i \in C_h(g)\} \\
D_d(g) &= \{D_i \mid F_i \curvearrowright D_i,\ F_i \in F_d(g)\} \\
D_h(g) &= \{D_i \mid F_i \curvearrowright D_i,\ F_i \in F_h(g)\} \\
DCH(g) &= \max_{C_i, C_j \in C_h(g)} \{\partial(C_i, C_j)\} \\
DFH(g) &= \max_{F_i, F_j \in F_d(g) \cup F_h(g)} \{\eth(F_i, F_j)\}
\end{aligned}$$
Here, DCH(g) refers to the dispersion of class hierarchy
having ∂(Ci , Cj ) denoting the distance between class Ci
and class Cj in the inheritance hierarchy. More detail about
DCH(g) can be found elsewhere [8]. DF H(g) is a similar
metric that captures the dispersion of files, and ð(Fi , Fj )
denotes the distance between files Fi and Fj in the file-system
hierarchy.
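The navigation effort can be sketched in Python as follows. The exact definition of the file distance is not given above, so this sketch assumes it is the number of tree edges on the path between two files through their closest common ancestor directory. The class-hierarchy dispersion $DCH(g)$ is taken as a precomputed input, and all function and parameter names are hypothetical.

```python
from itertools import combinations
from pathlib import PurePosixPath


def file_distance(f1, f2):
    """Assumed file distance: edges between two files in the directory tree."""
    p1, p2 = PurePosixPath(f1).parts, PurePosixPath(f2).parts
    common = 0
    for a, b in zip(p1, p2):
        if a != b:
            break
        common += 1
    # Steps up from f1 to the common ancestor plus steps down to f2.
    return (len(p1) - common) + (len(p2) - common)


def navigation_effort(files_d, files_h, dch):
    """Estimate E_n(g) from F_d(g), F_h(g), and a precomputed DCH(g)."""
    files = set(files_d) | set(files_h)                  # F_d(g) U F_h(g)
    dirs = {str(PurePosixPath(f).parent) for f in files}  # D_d(g) U D_h(g)
    dfh = max((file_distance(a, b)
               for a, b in combinations(sorted(files), 2)), default=0)
    return len(files) + len(dirs) + dch + dfh


print(navigation_effort({"src/a/X.java", "src/a/Z.java"},
                        {"src/b/Y.java", "src/a/X.java"}, dch=2))
```

Taking the directories of the combined file set is equivalent to forming $D_d(g) \cup D_h(g)$, since each directory set is derived from the corresponding file set.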
Thus, the total effort $E(g)$ needed to refactor clone-group
$g$ is estimated as
$$E(g) = w_d \times E_d(g) + w_h \times E_h(g) + w_t \times E_t(g) + w_r \times E_r(g) + w_n \times E_n(g)$$
where $w_d$, $w_h$, $w_t$, $w_r$, and $w_n$ are respectively the weights on
the efforts for understanding method delegation, understanding
inheritance hierarchy, token modification, code relocation, and
navigation. By default, they are set to one, but the software
engineer may assign different weights to penalize certain types
of efforts.
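A small Python sketch of the weighted sum is given below. The `EffortComponents` container and the sample values are hypothetical; only the formula itself is taken from the model.

```python
from dataclasses import dataclass


@dataclass
class EffortComponents:
    """The five effort terms for one clone-group g (hypothetical container)."""
    delegation: float   # E_d(g), in lines of code
    hierarchy: float    # E_h(g), in lines of code
    tokens: float       # E_t(g), in edit operations
    relocation: float   # E_r(g), in relocated blocks
    navigation: float   # E_n(g), in files, directories, and distances


def total_effort(e, w_d=1.0, w_h=1.0, w_t=1.0, w_r=1.0, w_n=1.0):
    """Weighted total effort E(g); every weight defaults to one."""
    return (w_d * e.delegation + w_h * e.hierarchy + w_t * e.tokens
            + w_r * e.relocation + w_n * e.navigation)


effort = EffortComponents(delegation=35, hierarchy=120, tokens=6,
                          relocation=2, navigation=11)
print(total_effort(effort))
print(total_effort(effort, w_h=0.5))
```

Since the terms are measured in different units (lines of code, edit operations, and counts), the weights may also serve to bring them onto a comparable scale, although the default of one leaves them as they are.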
IV. PREDICTION OF REFACTORING BENEFITS
The expected benefit from code clone refactoring is the
structural improvement in the code base, which should also
enhance the software design quality. Obvious expected benefits
include reduced lines of code (LOC), less redundant code,
and so on. For procedural code, procedural metrics (e.g.,
LOC, Cyclomatic Complexity) as well as structural metrics
(e.g., fan-in, fan-out, and information flow) can be used to
estimate software quality after refactoring. For object-oriented
systems, these metrics can be supplemented by object-oriented
design metric suites, such as QMOOD [2] and the Chidamber-Kemerer [4] metric suite. Quantitative or qualitative estimation
of the effect of refactoring on quality metrics is possible
before the actual application of the refactoring [3], [15], [18],
[21], [22]. For example, Higo et al. [7] developed a tool to
estimate the effect of refactoring on the Chidamber-Kemerer
quality metrics before the actual application of the refactorings.
Having chosen a suitable set of quality attributes, let $\mathcal{Q} = \{q_1, q_2, q_3, \ldots, q_\eta\}$ be the set of quality attribute values before
refactoring, and $\mathcal{Q}_r = \{q'_1, q'_2, q'_3, \ldots, q'_\eta\}$ be the estimated
values of those quality attributes after applying refactoring $r$.
The improvement in quality can be assessed by comparing
the quality before and after refactoring. Hence, the total gain
in quality $Q_r$ from refactoring $r$ can be estimated as
$$Q_r = \sum_{j=1}^{\eta} \vartheta_j \times (q'_j - q_j),$$
where $\vartheta_j$ is the weight on the $j$th quality
attribute. By default, $\vartheta_j = 1$, but the software practitioner can
assign different values to impose more or less emphasis on
certain quality attributes.
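The quality gain can be sketched in Python as a weighted sum of attribute differences. The function name and the sample values are hypothetical.

```python
def quality_gain(before, after, weights=None):
    """Estimate Q_r as the weighted sum of (q'_j - q_j) over all attributes.

    before:  attribute values q_j measured before refactoring
    after:   estimated attribute values q'_j after refactoring
    weights: optional weights, one per attribute (all default to 1)
    """
    if len(before) != len(after):
        raise ValueError("before and after must cover the same attributes")
    if weights is None:
        weights = [1.0] * len(before)
    if len(weights) != len(before):
        raise ValueError("one weight per quality attribute is required")
    return sum(w * (a - b) for w, a, b in zip(weights, after, before))


print(quality_gain([4, 7], [6, 6]))
print(quality_gain([4, 7], [6, 6], weights=[2, 1]))
```

As written, a positive term corresponds to an increase in the attribute value. For attributes where a lower value is preferable (LOC, for instance), a negative weight or a rescaled attribute would presumably be needed; this is an assumption of the sketch rather than something the formula states.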
V. REFACTORING CONSTRAINTS
Among the applicable refactoring activities, there may be
conflicts and dependencies [16]. Application of one refactoring may cause elements of other refactorings to disappear, and
thus invalidate their applicability [3], [15], [16]. Besides such
mutual exclusion of conflicting refactorings, the application of
one refactoring may reveal new refactoring opportunities, as
suggested by Lee et al. [15]. We understand that this is largely
due to the composite structure of certain refactoring patterns,
where one larger refactoring is composed of several smaller
core refactorings [1]. For example, when extract superclass
refactoring is applied, pull-up method is also applied as a part
of it. In other words, pull-up method, at times, may require
extraction of a new superclass.
There may also be sequential dependencies between refactoring activities [15], [16]. Constraints of mutual inclusion
may also arise when the application of one refactoring necessitates the application of certain other refactorings [23]. Moreover,
the organization’s management may also impose priorities on
certain refactoring activities [3], for example, lower priorities
on refactoring clones in the critical parts of the system. We
identify such priorities as soft constraints, besides the following
three types of hard constraints.
Definition 1 (Sequential dependency): Two refactorings $r_i$
and $r_j$ are said to have a sequential dependency if $r_i$ cannot be
applied after $r_j$. This is denoted as $r_j \nrightarrow r_i$.
Definition 2 (Mutual exclusion): Two refactorings $r_i$
and $r_j$ are said to be mutually exclusive if both $r_i \nrightarrow r_j$ and
$r_i \nleftarrow r_j$ hold. The mutual exclusion between $r_i$ and $r_j$ is
denoted as $r_i \nleftrightarrow r_j$ or $r_j \nleftrightarrow r_i$.
Definition 3 (Mutual inclusion): Two refactorings $r_i$ and $r_j$
are said to be mutually inclusive if, whenever $r_i$ is applied, $r_j$ must also
be applied before or after $r_i$, and vice versa. This is denoted
as $r_i \leftrightarrow r_j$ or $r_j \leftrightarrow r_i$.
The complete independence of $r_i$ and $r_j$ is expressed as $r_i \perp r_j$.
VI. FORMULATION OF REFACTORING SCHEDULE
Addressing all the hard and soft constraints, it becomes very
difficult to compute an optimal refactoring schedule that
maximizes code quality while minimizing effort. Such a
problem is known to be NP-hard [3], [14], [15]. Finding
the optimum solution for such problems becomes
too expensive (time-consuming) in practice, and so a feasible optimal
solution is desired. However, the problem is by nature a CSOP.
Therefore, we model the problem as a CSOP and solve it
by applying a constraint programming technique, which, to the best of our
knowledge, has not been tried before.
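Having identified the set $R$ of all potential refactoring
activities, we define two decision variables $X_r$ and $Y_r$, such
that,
$$X_r = \begin{cases} 0 & \text{if } r \in R \text{ is not chosen} \\ 1 & \text{if } r \in R \text{ is chosen} \end{cases}$$
$$Y_r = \begin{cases} 0 & \text{if } r \in R \text{ is not chosen} \\ k & \text{if } r \in R \text{ is chosen as the } k\text{th activity} \end{cases}$$
where $1 \le k \le |R|$.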
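We also define a $|R| \times |R|$ constraint matrix $Z$ to capture
the sequential dependencies between refactorings $r_i$ and $r_j$,
such that,
$$Z_{ij} = \begin{cases}
0 & \text{if } r_i \perp r_j \\
1 & \text{if } r_i \nleftrightarrow r_j \\
+2 & \text{if } r_j \nrightarrow r_i \text{ and } r_i \leftrightarrow r_j \\
-2 & \text{if } r_i \nrightarrow r_j \text{ and } r_i \leftrightarrow r_j \\
+3 & \text{if } r_j \nrightarrow r_i, \text{ but neither } r_i \leftrightarrow r_j \text{ nor } r_i \nleftrightarrow r_j \\
-3 & \text{if } r_i \nrightarrow r_j, \text{ but neither } r_i \leftrightarrow r_j \text{ nor } r_i \nleftrightarrow r_j
\end{cases}$$
Also note that $Z_{ij} = -Z_{ji}$ or $Z_{ij} = Z_{ji} = 1$, for all $\langle i, j \rangle$.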
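To make the encoding concrete, the following Python sketch builds such a matrix from three lists of relations. The function name and its input format are hypothetical. A pair `(i, j)` in `precedes` is read as "refactoring i cannot be applied after refactoring j". Because the matrix as defined has no code for a mutually inclusive pair without a sequential dependency, the sketch rejects that case instead of guessing a value.

```python
def build_constraint_matrix(n, precedes, exclusive=(), inclusive=()):
    """Build Z for refactorings r_0 .. r_{n-1} (0-based indices).

    precedes:  pairs (i, j) meaning r_i cannot be applied after r_j
    exclusive: unordered pairs of mutually exclusive refactorings
    inclusive: unordered pairs of mutually inclusive refactorings
    """
    precedes = {tuple(p) for p in precedes}
    excl = {frozenset(p) for p in exclusive}
    incl = {frozenset(p) for p in inclusive}
    if excl & incl:
        raise ValueError("a pair cannot be both exclusive and inclusive")

    z = [[0] * n for _ in range(n)]   # 0: completely independent
    for pair in excl:
        i, j = tuple(pair)
        z[i][j] = z[j][i] = 1         # 1: mutual exclusion

    ordered = set()
    for i, j in precedes:
        pair = frozenset((i, j))
        if pair in excl or (j, i) in precedes:
            raise ValueError("list conflicting pairs under 'exclusive'")
        code = 2 if pair in incl else 3
        z[i][j], z[j][i] = code, -code  # antisymmetric entries
        ordered.add(pair)

    if incl - ordered:
        raise ValueError("no code exists for inclusion without an ordering")
    return z


for row in build_constraint_matrix(3, [(0, 1)], exclusive=[(1, 2)]):
    print(row)
```

The resulting matrix satisfies the property noted above: every entry is either the negation of its transposed entry or, for mutually exclusive pairs, equal to it with value 1.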
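Let $\rho_r$ be the priority on the refactoring $r$ that operates
on clone-group $g_r$. The CSOP formulation of the refactoring
scheduling problem can be as follows.
$$\text{maximize} \quad \sum_{r \in R} X_r \left(\rho_r Q_r - E(g_r)\right) \tag{1}$$
subject to,
$$X_r + Y_r \neq 1, \quad \forall\, r \in R \tag{2}$$
$$X_{r_i} + X_{r_j} = 2 \Rightarrow Y_{r_i} \neq Y_{r_j}, \quad \forall\, r_i, r_j \in R \tag{3}$$
$$Z_{ij} - Z_{ji} > 0 \Rightarrow Y_{r_i} < Y_{r_j}, \quad \forall\, r_i, r_j \in R \tag{4}$$
$$Z_{ij} - Z_{ji} < 0 \Rightarrow Y_{r_i} > Y_{r_j}, \quad \forall\, r_i, r_j \in R \tag{5}$$
$$|Z_{ij}| = 1 \Rightarrow X_{r_i} + X_{r_j} \le 1, \quad \forall\, r_i, r_j \in R \tag{6}$$
$$|Z_{ij}| = 2 \Rightarrow X_{r_i} + X_{r_j} = 2, \quad \forall\, r_i, r_j \in R \tag{7}$$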
Here, Equation 1 is the objective function for maximizing the code quality and minimizing the refactoring effort
while rewarding refactoring activities having higher priorities.
Equation 2 ensures that the decision variables $X_r$ and $Y_r$
are kept consistent as their values are assigned. Equation 3
enforces that no two refactorings are scheduled at the same
point in the sequence. Equation 4 and Equation 5 impose
the sequential dependency constraints (i.e., $r_i \nrightarrow r_j$) on
feasible schedules. Mutual exclusion (i.e., $r_i \nleftrightarrow r_j$) and mutual
inclusion (i.e., $r_i \leftrightarrow r_j$) constraints are enforced by Equation 6
and Equation 7 respectively.
We have implemented the CSOP model by applying constraint
programming using OPL (Optimization Programming Language) in IBM ILOG CPLEX Optimization Studio
12.2.
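For very small instances, the formulation can be explored without a CP solver. The Python sketch below is not the OPL implementation mentioned above; it is an exhaustive search that transcribes Equations 2 to 7 literally and evaluates Equation 1 for every assignment. It assumes that the pairwise constraints apply to distinct refactorings only, uses 0-based indices, and takes hypothetical per-refactoring values for priority, quality gain, and effort.

```python
from itertools import product


def satisfies(x, y, z):
    """Check Equations 2-7 for selection vector x and position vector y."""
    n = len(x)
    for r in range(n):
        if x[r] + y[r] == 1:                              # Equation 2
            return False
    for i in range(n):
        for j in range(n):
            if i == j:                                    # assumption: i != j
                continue
            if x[i] + x[j] == 2 and y[i] == y[j]:         # Equation 3
                return False
            diff = z[i][j] - z[j][i]
            if diff > 0 and not y[i] < y[j]:              # Equation 4
                return False
            if diff < 0 and not y[i] > y[j]:              # Equation 5
                return False
            if abs(z[i][j]) == 1 and x[i] + x[j] > 1:     # Equation 6
                return False
            if abs(z[i][j]) == 2 and x[i] + x[j] != 2:    # Equation 7
                return False
    return True


def solve(z, priority, gain, effort):
    """Return (objective value, schedule) maximizing Equation 1, or None."""
    n = len(z)
    best_value, best = float("-inf"), None
    for x in product((0, 1), repeat=n):
        value = sum(x[r] * (priority[r] * gain[r] - effort[r])
                    for r in range(n))
        if value <= best_value:
            continue                      # cannot improve on the incumbent
        for y in product(range(n + 1), repeat=n):
            if satisfies(x, y, z):
                best_value, best = value, (x, y)
                break
    if best is None:
        return None
    x, y = best
    # The schedule lists the chosen refactorings in order of position.
    schedule = sorted((r for r in range(n) if x[r]), key=lambda r: y[r])
    return best_value, schedule


# r_0 cannot be applied after r_1; r_1 and r_2 are mutually exclusive.
z = [[0, 3, 0], [-3, 0, 1], [0, 1, 0]]
print(solve(z, priority=[1, 1, 1], gain=[10, 8, 9], effort=[4, 3, 1]))
```

The search ranges over the full domain of every $Y_r$ because Equation 2, as written, only rules out the combinations whose sum is exactly one. The enumeration grows exponentially with $|R|$, so the sketch is useful only for checking the constraints on toy inputs, whereas a constraint solver prunes the search space.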
VII. RELATED WORK
The work of Bouktif et al. [3], Lee et al. [15], and Liu et
al. [14] closely relates to ours. Bouktif et al. [3] formulated the
refactoring problem as a constrained Knapsack problem and
applied a metaheuristic genetic algorithm (GA) to obtain an
optimal solution. However, they ignored both the constraints
and consequences of the refactorings. Lee et al. [15] applied
ordering messy GA (OmeGA), whereas Liu et al. [14] applied a heuristic algorithm to schedule code clone refactoring
activities. Both of those works took into account the conflicts
and sequential dependencies among the refactoring activities,
but missed the constraints of mutual inclusion and refactoring
effort. Our work differs significantly from all of those works in
two ways. First, for computing the refactoring schedule, we
applied a constraint programming approach, which is different
from theirs. Second, we took into account a wide range of
refactoring constraints, some of which they ignored. Although
Bouktif et al. [3] proposed a clone refactoring effort model,
their model was for procedural code only, whereas our effort
model is applicable to both procedural and OO source
code.
O’Keeffe et al. [13] conducted an empirical comparison
of simulated annealing (SA), GA, and multiple ascent hill-climbing techniques in scheduling refactoring activities in five
software systems written in Java. They reported that the hill-climbing approach outperformed both the GA and SA techniques.
One of our immediate future tasks is to compare our technique
with those approaches.
A number of methodologies [5], [12], [19], [20], [23] and
metric based tools such as CCShaper [9] and Aries [8] have
been proposed for semi-automated extraction of code clones
as refactoring candidates. Several tools, such as Libra [10]
and CnP [11], have been developed for providing support for
simultaneous modification of code clones. Our work is not on
finding potential clones for refactoring or providing editing
support to apply refactorings. Rather, we focus on efficient
scheduling of those refactoring candidates, which is missing
in those tools. However, the metrics used in those tools can be
exploited to estimate the refactoring effort and expected gain.
VIII. CONCLUSION AND FUTURE WORK
In this paper, we present our ongoing work towards conflict-aware optimal scheduling of code clone refactorings. To estimate the refactoring effort, we have proposed an effort model
for refactoring code clones in OO source code. Considering
diverse categories of refactoring constraints, we have modelled
the clone refactoring problem as a CSOP, and implemented the
model using the CP technique. To the best of our knowledge,
ours is the first refactoring effort model for OO code,
and our CP approach is a technique that no one has previously
reported applying in this regard.
To evaluate our approach, we are currently working to apply
it to several open-source and proprietary code bases. Our
immediate future work includes a comparison of our approach
with other techniques such as GA and SA. We believe that
our constraint programming approach is capable of efficiently
computing an optimal feasible schedule of code clone refactoring
activities.
Acknowledgments: This work is supported in part by the
Natural Sciences and Engineering Research Council of Canada
(NSERC).
REFERENCES
[1] D. Advani, Y. Hassoun, and S. Counsell. Understanding the complexity of
refactoring in software systems: a tool-based approach. Intl. J. Gen. Sys., 35(3):
329–346, 2006.
[2] J. Bansiya and C.G. Davis. A hierarchical model for object-oriented design quality
assessment. IEEE Trans. Softw. Engg., 28(1): 4–17, 2002.
[3] S. Bouktif, G. Antoniol, M. Neteler, and E. Merlo. A Novel Approach to Optimize
Clone Refactoring Activity. In GECCO, July 8 –12, 2006.
[4] S. Chidamber and C. Kemerer. A metric suite for object-oriented design. IEEE
Trans. Softw. Engg., 25(5): 476–493, 1994.
[5] S. Ducasse, M. Rieger, and G. Golomingi. Tool Support for Refactoring Duplicated
OO Code. In WOOT, pp. 177–178, 1999.
[6] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts. Refactoring: Improving
the Design of Existing Code. Addison Wesley Professional, 1999.
[7] Y. Higo, Y. Matsumoto, S. Kusumoto, and K. Inoue. Refactoring Effect Estimation
based on Complexity Metrics. In ASWEC, pp. 219–228, 2008.
[8] Y. Higo, T. Kamiya, S. Kusumoto, and K. Inoue. ARIES: Refactoring Support
Tool Code Clone. In 3-WoSQ, pp. 1 – 4, 2005.
[9] Y. Higo, T. Kamiya, S. Kusumoto, and K. Inoue. Refactoring Support Based on
Code Clone Analysis. PROFES, LNCS 3009, pp. 220–233, Springer-Verlag, 2004.
[10] Y. Higo, Y. Ueda, S. Kusumoto, and K. Inoue. Simultaneous Modification Support
based on Code Clone Analysis. In APSEC. pp. 262–269, 2007.
[11] D. Hou, P. Jablonski, and F. Jacob. CnP: Towards an environment for the proactive
management of copy-and-paste programming. In ICPC, pp. 238–242, 2009.
[12] E. Kodhai, V. Vijayakumar, G. Balabaskaran, T. Stalin, and B. Kanagaraj. Method
Level Detection and Removal of Code Clones in C and Java Programs using
Refactoring. In IJJCET, pp. 93–95, 2010.
[13] M. O’Keeffe and M. Ó Cinnéide. Search-based refactoring: an empirical study. J.
Softw. Maint. Evol.: Res. Pract., 20: 345 – 364, 2008.
[14] H. Liu, G. Li, Z. Ma, and W. Shao. Conflict-aware schedule of software refactorings. IET Softw., 2(5): 446–460, 2008.
[15] S. Lee, G. Bae, H. S. Chae, D. Bae, and Y. R. Kwon. Automated
scheduling for clone-based refactoring using a competent GA. Softw. Pract. Exper.,
Wiley Online Library, 2010.
[16] T. Mens, G. Taentzer, and O. Runge. Analysing refactoring dependencies using
graph transformation. J. Softw. and Syst. Modeling, 6(3): 269–285, 2007.
[17] J. Pérez, Y. Crespo, B. Hoffmann, and Tom Mens. A case study to evaluate the
suitability of graph transformation tools for program refactoring. Intl. J. Softw.
Tools Tech. Transfer, 12: 183–199, 2010.
[18] H. Sahraoui, R. Godin, and T. Miceli. Can metrics help to bridge the gap between
the improvement of OO design quality and its automation?. In ICSM, pp. 154–162,
2000.
[19] S. Schulze, M. Kuhlemann, and M. Rosenmüller. Towards a Refactoring Guideline
Using Code Clone Classification. In WRT, pp. 6:1–6:4, 2008.
[20] S. Schulze and M. Kuhlemann. Advanced Analysis for Code Clone Removal. In
WSR, 2009.
[21] F. Simon, F. Steinbrucker, and C. Lewerentz. Metrics based refactoring. In CSMR,
pp. 30–38, 2001.
[22] L. Tahvildari and K. Kontogiannis. A metric-based approach to enhance design
quality through meta-pattern transformations. In CSMR, pp. 183–192, 2003.
[23] N. Yoshida, Y. Higo, T. Kamiya, S. Kusumoto, and K. Inoue. On refactoring
support based on code clone dependency relation. In METRICS, 10 pp., 2005.
[24] M. F. Zibran and C. K. Roy. Towards Flexible Code Clone Detection, Management,
and Refactoring in IDE. In IWSC, 2 pp., 2011 (to appear).
