
Cross Product Line Analysis
Ora Wulf-Hadash
Department of Information Systems, University of Haifa, Haifa 31905, Israel
972-4-9590805
[email protected]

Iris Reinhartz-Berger
Department of Information Systems, University of Haifa, Haifa 31905, Israel
972-4-8288502
[email protected]
ABSTRACT
Due to the increase in market competition and to mergers and acquisitions of
companies, different software product lines (SPLs) may exist
under the same roof. These SPLs may be developed applying
different domain analysis processes, but they are likely not disjoint.
Cross product line analysis aims to examine the common and
variable aspects of different SPLs for improving maintenance and
future development of related SPLs. Currently different SPL
artifacts, or more accurately feature models, are compared,
matched, and merged for supporting scalability, increasing
modularity and reuse, synchronizing feature model versions, and
modeling multiple SPLs for software supply chains. However, in
all these cases the focus is on creating valid merged models from
the input feature models. Furthermore, the terminology used in all
the input feature models is assumed to be the same, namely
similar features are named the same. As a result, these methods
cannot simply be applied to feature models that represent different
SPLs. In this work we propose adapting similarity metrics and text
clustering techniques in order to enable cross product line
analysis. This way, feature models that use different terminologies
in the same domain can be analyzed in order to improve
the management of the involved SPLs. Preliminary results reveal
that the suggested method helps systematically analyze the
commonality and variability between related SPLs, potentially
suggesting improvements to existing SPLs and to the maintenance
of sets of SPLs.
Keywords
Feature Diagram Matching, Feature Diagram Merging, Feature
Clustering, Feature Similarity, Empirical Evaluation
1. INTRODUCTION
Due to the increase in market competition, companies cannot afford
to focus on single SPLs and need to develop several SPLs for
different customers, requirements, etc. Usually all these SPLs are
somehow related, e.g., they belong to the domain in which the
company specializes, but they include different common and variable
aspects. As an example, consider the domain of mobile phones.
The largest seller of mobile devices (according to Gartner's report of
the first quarter of 2012), Samsung, manages several
different SPLs, some of which are: Galaxy S, Galaxy Note, and
Samsung Nexus. While these SPLs differ in their features, e.g.,
Samsung Galaxy Note has a large screen, whereas Samsung
Galaxy S is relatively small, it is important to be able to
systematically analyze the commonality and variability of these
SPLs in order to improve productivity by considering uniting the
maintenance and future development of similar features (or even
SPLs) or by improving the artifacts of a specific SPL based on
artifacts of other SPLs in the same domain.
In another scenario, consider mergers or acquisitions of
companies. Each company has developed and maintained its own
SPL or SPLs. The merger or acquisition of the companies yields
the existence of different SPLs with different kinds of overlaps.
Furthermore, the terminologies used in the different SPLs in this
case may differ due to the development of the artifacts in different
companies. Here again it is important to analyze the commonality
and variability of these SPLs in order to improve the management
of these SPLs. As an example of this scenario, consider the merger
of Sony and Ericsson in 2001 [28].
A large group of methods concentrates on representing SPLs as
feature models [6]. Some of these methods further support
automated analysis of these feature models ‎[3], such as checking
product validity, calculating the number of products in a SPL, and
identifying void features. Several studies also examine the
relationships between feature models for supporting
scalability ‎[25], increasing modularity and reuse ‎[2], ‎[4],
synchronizing feature model versions ‎[17], ‎[27], and modeling
multiple SPLs for software supply chains ‎[7], ‎[8]. To this end, the
studies mainly suggest composing feature models describing
different SPL aspects [1]. Furthermore, they assume that the
same terminology is used in the input feature models, check the structural
similarity of the given models, and search for similar portions
that are later used as anchors for merging. While this strategy is
perfectly suitable for handling feature models that represent
different aspects of the same SPL or different SPLs that use the
same underlying terminology, it is not enough for performing
commonality and variability analysis of feature models of
different SPLs that were potentially developed in different
departments or even in different companies and thus do not
necessarily share the same terminology (as is the case in mergers
and acquisitions).
In the current work, we call for cross product line analysis,
namely conducting commonality and variability analysis of
related SPLs for improving the management of current and future
SPLs. In particular, the input of the suggested method is a set of
feature diagrams representing different SPLs. The input is
processed in three main steps (see Figure 1). First, during the
Feature Similarity Calculation step, the set of feature diagrams is
analyzed using linguistic and structural techniques for finding
similar features. This step is important for aligning the different
terminologies that may be used for developing the artifacts of the
different SPLs. In the second step, Feature Clustering, an
agglomerative clustering technique is used for creating groups
(clusters) of similar features that may represent variants of the
same features. Finally, in the Cluster Analysis step, the clusters
from the previous step as well as their relationships are analyzed
to provide recommendations for improving individual SPLs and
the management of the whole set of SPLs.
Figure 1. An overview of the suggested method: a set of feature diagrams is processed by the Feature Similarity Calculation step (producing pairs of features and their degrees of similarity), the Feature Clustering step (producing clusters of similar features), and the Cluster Analysis step (producing recommendations for improvements)
The rest of the paper is structured as follows. Section 2 describes
and exemplifies the method. Section 3 presents preliminary results
regarding the method outputs. Section 4 includes related work and
discusses the benefits and limitations of the suggested method
with respect to the related work. Finally, Section 5 concludes and
refers to future research.
2. THE CROSS PRODUCT LINE ANALYSIS METHOD
As noted, the input for the suggested method is a set of feature
diagrams, each representing an SPL in the same domain. As an
example of such an input, consider Figure 2, which includes two
feature diagrams in the mobile phones domain. The SPL presented
in Figure 2(a) supports utility functions (namely, voice calls and
messaging services), three types of screens, and optional extras
that include a camera, mp3, or mp4. The second SPL, presented in
Figure 2(b), supports calls, message services, two types of
displays, and optional media capabilities in the form of a camera
or mp3. As can be seen, these two SPLs differ in the features they
support (e.g., the second SPL does not support mp4), the ways
they structure the features (e.g., calls appear in the first SPL under
utility functions, whereas in the second SPL it appears directly
under the diagram root), and the terminologies they use (e.g.,
'extras' vs. 'media', 'screen' vs. 'display', and so on). To overcome
these kinds of differences, the method first measures the degree of
similarity between pairs of features and afterwards groups similar
features, enabling commonality and variability analysis of clusters
rather than individual features. These steps of the method are
elaborated next.
2.1 Feature Similarity Calculation
In order to define the common aspects of the input feature diagrams, the method measures the similarity in the feature names and their context (i.e., where they appear in the feature diagrams with respect to their ancestors and descendants).
2.1.1 Similarity of Feature Names
We utilize linguistic measurements for calculating the similarity
of feature names. Many of the linguistic measurements (see [5],
for example) use WordNet, which is a lexical database of
English [29]. The benefits of WordNet are that it is large, rich,
freely available online, and general-purpose; hence, it can be used
for SPLs that belong to different domains. Note, however, that
WordNet also has shortcomings in the context of cross SPL
analysis. The features in technological domains can be
represented as abbreviations or commonly known acronyms,
which are sometimes not recognized as meaningful words by
WordNet. In other cases, the same word may have different
meanings depending on the domain and the context. To overcome
these deficiencies, we currently added the ability to import user-defined acronyms for certain domains. In the future, we intend to
improve this step with Wikipedia-based semantic analysis
methods, such as the one proposed in ‎[13].
For measuring the similarity of two features, we adopted Dao and
Simpson's similarity measurement between two phrases ‎[10],
which is a simple and straightforward metric that does not require
a large corpus of statistics. The following formula defines feature
name similarity.
Definition 1 (Feature Name Similarity). Let f1 and f2 be two features. Feature name similarity, NSim, is calculated as follows:

NSim(f_1,f_2) = \frac{\sum_{i=1}^{m} \max_{1 \le j \le n} WPSim(t_i,u_j) + \sum_{j=1}^{n} \max_{1 \le i \le m} WPSim(t_i,u_j)}{m+n}

Where:
t1…tm and u1…un are the words in the names of features f1 and f2, respectively (m and n are the numbers of words in the names of f1 and f2);
WPSim(t_i,u_j) = \frac{2 \cdot N3}{N1 + N2 + 2 \cdot N3} is Wu and Palmer's formula [30] for comparing two words (see explanations in Figure 3);
LCS is the least common superconcept of ti and uj in WordNet;
N1 is the number of nodes on the shortest path from ti to LCS in WordNet;
N2 is the number of nodes on the shortest path from uj to LCS in WordNet;
N3 is the number of nodes on the shortest path from LCS to the root in WordNet.

Figure 3. Calculating similarity between terms that are hierarchically related (a fragment of the WordNet hierarchy showing the terms ti and uj, their least common superconcept LCS, and the paths N1, N2, and N3)

Figure 2. Two feature diagrams of mobile phones: (a) 'mobile phone' with 'utility functions' ('voice call' and 'messaging', the latter with 'Text Message' and 'Voice Message'), 'screen' ('basic', 'color', and 'high resolution'), and optional 'extras' ('camera', 'mp3', 'mp4'), where 'camera' requires 'high resolution'; (b) 'mobile phone' with 'calls', 'message service' ('SMS – short message service', 'MMS – multimedia message service', and 'EMS – enhanced message service'), 'display' ('low resolution' and 'colour'), and optional 'media' ('camera' or 'mp3'), where 'camera' excludes 'low resolution'
As an example of calculating the name similarity of two features,
consider the feature 'Short Message Service' (SMS), which
appears in Figure 2(b), and the feature 'Text Message' that appears
in Figure 2(a). Table 1 summarizes the pair-wise name similarity
values of these features, while the formula below calculates their
feature name similarity.
NSim(\text{'short message service'}, \text{'text message'}) = \frac{(0.52 + 1.00 + 0.55) + (0.62 + 1.00)}{3 + 2} \approx 0.74
Table 1. The pair-wise similarity values of 'short message service' and 'text message'

            text     message
short       0.52     0.43
message     0.62     1.00
service     0.55     0.46
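For illustration, the name similarity of Definition 1 can be computed along the lines of the following minimal Python sketch. It assumes NLTK's WordNet interface for the word-level Wu and Palmer measure and restricts the lookup to noun senses; the function names are illustrative only, and the evaluation reported in Section 3 actually uses the Perl module WordNet::Similarity::wup for this step.

# Minimal sketch of Definition 1 (feature name similarity).
# Assumes NLTK and its WordNet corpus are installed; noun senses only.
from nltk.corpus import wordnet as wn

def word_similarity(w1, w2):
    """Best Wu-Palmer similarity over the noun senses of two words."""
    scores = [s1.wup_similarity(s2)
              for s1 in wn.synsets(w1, pos=wn.NOUN)
              for s2 in wn.synsets(w2, pos=wn.NOUN)]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)

def name_similarity(name1, name2):
    """NSim: best word-to-word matches in both directions,
    normalized by the total number of words in the two names."""
    t, u = name1.lower().split(), name2.lower().split()
    if not t or not u:
        return 0.0
    forward = sum(max(word_similarity(ti, uj) for uj in u) for ti in t)
    backward = sum(max(word_similarity(ti, uj) for ti in t) for uj in u)
    return (forward + backward) / (len(t) + len(u))

# e.g., name_similarity('short message service', 'text message')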
2.1.2 Similarity of Feature Context
As the inputs of our method are feature diagrams, which are
structured trees of features and not plain lists, the method considers
the context in which a feature appears and not just its name. The
descendants (sub-features) are highly important for determining
the context of the (ancestor) feature. Note that similar structures
of completely different features may not indicate their potential
relatedness. However, the similarity of features whose names and
structures are similar should be higher than the similarity of
features which only share similar names. For defining the context
similarity of features, the method considers their immediate
descendants (namely, mandatory, optional, alternative, and 'or'
sub-features). Let f1, f’1, f2, and f’2 be features, such that f’1 is a
sub-feature of f1 and f’2 is a sub-feature of f2 (see Figure 4). If f’1
and f’2 are similar (i.e., their similarity measurement considering
both their names and context with respect to their descendants is
higher than some threshold), then the similarity of f1 and f2 should
increase. Note that the increase in similarity is percolated from
the leaves of the feature diagram to its root; thus, assuming the
input feature diagrams are structured as trees, the percolation
process terminates. The following definition defines feature
similarity taking into consideration both feature names and
context.
Figure 4. Percolating similarity through relationships: if sub-feature f'1 of f1 and sub-feature f'2 of f2 are similar, the similarity of the potentially similar features f1 and f2 is increased
Definition 2 (Feature Similarity). Feature similarity of features f1 and f2 is calculated using the following formula:

Sim(f_1,f_2) = NSim(f_1,f_2) + (1 - NSim(f_1,f_2)) \cdot \sum_{k=1}^{m} Sim'(f_1'^{\,k}, f_2'^{\,k})

Where {f'1} are the sub-features of f1, {f'2} are the sub-features of f2, the sum ranges over the m pairs (f'1, f'2) of sub-features satisfying Sim(f'_1,f'_2) > threshold, and Sim'(f'_1,f'_2) = \frac{Sim(f'_1,f'_2)}{\max(|\{f'_1\}|, |\{f'_2\}|)}.
In order to determine the threshold for similar features (namely,
the similarity threshold), different algorithms may be used. As
will be explained and demonstrated in Section 3, we chose to use
AdaBoost [11], an adaptive machine learning algorithm, for this
purpose.
Three important characteristics of the above formula are: (1) the
value of similarity is always between 0 and 1; (2) the similarity of
features increases proportionally to the degree of similarity of
their sub-features; and (3) the similarity of features increases
proportionally to the number of similar sub-features.
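A sketch of how this similarity can be computed bottom-up over the feature trees is given below (Python). The Feature class, the normalization by the larger number of sub-features, and the passed-in name_sim function (standing for the name similarity of Definition 1) are illustrative assumptions of the sketch.

# Sketch of Definition 2: name similarity boosted by similar sub-features.
from dataclasses import dataclass, field

@dataclass
class Feature:
    name: str
    children: list = field(default_factory=list)

def feature_similarity(f1, f2, threshold, name_sim):
    base = name_sim(f1.name, f2.name)
    if not f1.children or not f2.children:
        return base
    denom = max(len(f1.children), len(f2.children))
    boost = 0.0
    for c1 in f1.children:
        for c2 in f2.children:
            child_sim = feature_similarity(c1, c2, threshold, name_sim)
            if child_sim > threshold:   # only "similar enough" sub-feature pairs count
                boost += child_sim / denom
    return min(1.0, base + (1 - base) * boost)   # keep the result in [0, 1]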
As an example of feature similarity calculation, consider the
features 'messaging' from Figure 2(a) and 'message service' from
Figure 2(b). The name similarity of these features is 0.58, whereas
their overall similarity, taking into consideration their sub-features,
is much higher: 0.72.
At the current stage, the method only checks the existence of
relationships between features and not the types of these
relationships (e.g., mandatory vs. optional features). In the future,
we plan to examine the impacts of the different relationship types
on similarity and to improve the definition of similarity
accordingly, e.g., by assigning a higher weight to mandatory
relationships.
Note that the values of both name and overall similarity have no
absolute meaning, but only relative ones (“more similar than”,
“less similar than”). Thus, other techniques are required to better
understand the degree of similarity of the different feature
diagrams or the SPLs that they represent. For this purpose, we
utilize feature clustering, as described next.
2.2 Feature Clustering
Clustering is the process of grouping a set of objects into classes
of similar objects ‎[21]. In our research, the objects are features
that are represented via their names. Thus, document or text
clustering techniques are relevant ‎[26], ‎[15]. In particular, we use
a variation of the agglomerative hierarchical clustering technique.
This technique [18] is a bottom-up clustering approach, which
gets the number of expected clusters as a parameter and starts
by putting each object in a separate cluster. Then, in each iteration,
the algorithm agglomerates (merges) the closest pair of clusters,
based on the distances between the clusters. The
algorithm continues until the number of expected clusters is
reached.
We chose this algorithm for the following reasons. First,
it is known as one of the most accurate clustering
techniques ‎[26], ‎[18]. Since the clustering quality has a great
impact on the analysis results, accuracy is very important in our
case. Second, the distance between two clusters reflects the degree
of similarity between their features. Starting with each feature in a
different cluster will prevent grouping features when they are not
similar enough. However, the agglomerative hierarchical
clustering algorithm requires determining the number of clusters
a-priori. This number cannot be determined in our case as it varies
depending on the size of the SPLs and their degree of variability.
Therefore, we modified the stopping criterion of the algorithm: the
two closest clusters are merged as long as the distance between them
(which reflects their similarity) is not smaller than the similarity
threshold. This way we ensure that features that are too different
will not be put in the same cluster.
Three types of distances between clusters are commonly
mentioned in the literature [22], [16] (see Figure 5): (1) Single-link: the distance between the two closest features of the clusters;
(2) Complete-link: the distance between the two farthest features
of the clusters; and (3) Average-link: the average of pair-wise
distances between features in the two clusters. Our algorithm
supports the three types of distances and the user can choose the
preferable type. For example, in cases where the domain is narrow
and there are many similar features, the 'complete-link' distance
may be more appropriate to create a refined division to clusters
(namely, more clusters where each cluster includes fewer but
more similar features). If the domain is wide, on the other hand,
the 'single-link' may be better to avoid stopping cluster merging
too early. In any case, after selecting a distance type, the algorithm
calculates all distances between clusters according to the selected
type.
Figure 5. Types of distances between clusters: single link, complete link, and average link
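The three linkage types can be expressed directly over the pair-wise feature similarities, as in the following sketch. Here the "distance" is measured on the similarity scale, so higher values mean closer clusters, and feat_sim is assumed to be a symmetric mapping from feature pairs to their overall similarity (the FeatSim matrix described below).

# Cluster "distances" on the similarity scale (higher = closer).
def single_link(c1, c2, feat_sim):
    # similarity of the two closest features of the clusters
    return max(feat_sim[(f1, f2)] for f1 in c1 for f2 in c2)

def complete_link(c1, c2, feat_sim):
    # similarity of the two farthest features of the clusters
    return min(feat_sim[(f1, f2)] for f1 in c1 for f2 in c2)

def average_link(c1, c2, feat_sim):
    # average of the pair-wise similarities between the two clusters
    return sum(feat_sim[(f1, f2)] for f1 in c1 for f2 in c2) / (len(c1) * len(c2))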
The feature clustering algorithm, which is presented in Listing 1,
gets a two dimensional matrix, named FeatSim. The cell i,j of this
matrix represents the (overall) similarity between feature i and
feature j. The feature clustering algorithm starts with creating a
hash of clusters, each holding one feature that appears in FeatSim.
Then, iteratively, the algorithm measures the distance between
clusters (utilizing single-, complete-, or average-link) and merges
the closest clusters as long as their distance is greater than the
similarity threshold – th.
FeatureClustering(FeatSim){
    // Initialize a hash of clusters, each holding a
    // single feature from FeatSim
    FeatClst = InitializeHash(FeatSim)
    // find the two closest clusters based on the selected distance type
    (i, j) = FindClosestClusters(FeatClst, FeatSim)
    // merge as long as the closest clusters are close enough
    While Distance(FeatClst[i], FeatClst[j], FeatSim) ≥ th {
        // merge the two closest clusters
        merge(i, j, FeatClst)
        (i, j) = FindClosestClusters(FeatClst, FeatSim)
    }
    Return FeatClst
}
Listing 1. The algorithm for feature clustering
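The following Python sketch mirrors Listing 1. It assumes the symmetric feat_sim mapping and one of the linkage functions sketched above (the evaluation in Section 3 uses the complete-link variant); the data representation is illustrative only.

# Agglomerative clustering with the modified stopping criterion of Listing 1.
def feature_clustering(features, feat_sim, th, linkage):
    clusters = [frozenset([f]) for f in features]   # one feature per cluster
    while len(clusters) > 1:
        # find the pair of clusters with the highest linkage similarity
        i, j = max(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]], feat_sim))
        if linkage(clusters[i], clusters[j], feat_sim) < th:
            break                                   # no close enough clusters remain
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters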
For the two mobile phone SPLs in Figure 2, the feature clustering
could create the following clusters:
Cluster 1: voice call, calls
Cluster 2: High resolution, Low resolution, color, colour, basic
Cluster 3: screen, display
Cluster 4: media, extras
Cluster 5: camera, mp3, mp4
Note that some of the clusters were mainly created due to the high
similarity in the feature names, e.g., cluster 3, whereas some
clusters emerged due to similar feature contexts (e.g., cluster 4).
In other cases, both name and context similarity contributed to the
cluster creation.
2.3 Cluster Analysis
Once the feature clusters are created, we can analyze the features
included in each cluster, as well as the cluster relationships. The
benefits of such an analysis are threefold. First, features that
belong to the same cluster are assumed to be similar. Thus, if they
belong to different SPLs it may be beneficial to manage them
together or even consider their union for future development and
maintenance. Second, clusters that are “tightly” connected are
likely to include related features. If these features are not included
in some SPLs, it may be advisable to recommend their inclusion
for enhancing the specific SPLs. Finally, the clusters and their
relationships may enable extracting the domain terminology. In
particular, they can help map different terminologies and identify
dependencies between terms or concepts, assisting in aligning the
existing SPLs and supporting their potential union.
2.3.1 Internal Analysis of Clusters
Several reasons may cause features to fall into the same cluster:
(1) the features are identical or almost identical (i.e., the values of
the similarity measurement are almost 1). Examples of such a case
are the features ‘color’-‘colour’ in cluster 2 and ‘screen’-‘display’
in cluster 3 of our example; (2) the features are not identical but
“similar enough”, potentially justifying their consideration as
different specializations or variants of the same abstract feature.
Examples of such a case are the features camera, mp3, and mp4 in
cluster 5 of our example; and (3) the names of the features are
different but the context in which they are used is so similar that
their overall similarity is relatively high. Examples of such a case
are the features ‘media’ and ‘extras’ in cluster 4 of our example.
In all these cases we may wish to recommend mutual management
of these features, and their associated artifacts. In other words,
inclusion of features from different feature diagrams, representing
different SPLs, in the same cluster may indicate a high degree
of similarity between the corresponding features, potentially
calling for managing these features together. Alternatively, we
may wish to name each cluster and use these names for renaming
the corresponding features in the different SPLs. This way we will
align the different terminologies used in the SPLs and enable their
potential merge for different purposes, such as interoperability.
However, it is important to notice that sometimes the same fact
may be expressed via different, mainly complementary, constraints.
For example, in both feature diagrams the camera requires high
resolution and no low resolution. However,
this constraint is expressed differently: in Figure 2(a) it is
expressed as 'camera' requires 'high resolution', while in Figure
2(b) – as 'camera' excludes 'low resolution'. In the future, we will
employ mining techniques to automatically detect such situations.
2.3.2 Inter-Cluster Analysis
The features in the different clusters may be related, inducing
relationships between the clusters. In other words, two clusters are
related if the features they contain are related in the input feature
diagrams. However, some of the clusters may be tightly related,
namely relationships between the corresponding features exist in
many of the involved SPLs, while others are loosely related or not
related at all. We want to use only tight relationships for
our cross product line analysis, as such relationships may capture
“knowledge” about the set of the given SPLs. Thus, we use the
following definition of cluster relationship strength.
Definition 3 (cluster relationship strength): The strength of a
relationship from cluster C1 to cluster C2 is defined as the ratio
between the number of SPLs involved in the relationship2 and the
total number of SPLs whose features appear in at least one of the
clusters. Formally expressed:
Strength(C_1 \rightarrow C_2) = \frac{|\{\text{SPLs involved in the relationship from } C_1 \text{ to } C_2\}|}{|\{\text{SPLs whose features appear in } C_1 \text{ or in } C_2\}|}

2 i.e., the number of SPLs in which there exist features f1 and f2 such that f1 ∈ C1, f2 ∈ C2, and f2 is a sub-feature of f1 in the feature diagram of that SPL.
In our example (with only two SPLs), we found tight relationships
between clusters 4 (including 'media', 'extras') and 5 (including
'camera', 'mp3', 'mp4'), namely the two feature diagrams include
relationships between features that belong to these clusters. The
relationship between a cluster that includes 'mobile phone' and a
cluster that includes 'utility functions' is loose, as only one feature
diagram, Figure 2(a), includes a relevant relationship.
For each two clusters that are connected via a relationship whose
strength is greater than a predefined threshold (e.g., two thirds of
the involved SPLs), we can recommend that SPLs whose features
appear in the source cluster, but which are missing features from the
target cluster, add the appropriate features from the target cluster, as
these refine features that already exist in the SPLs. This way the
method can improve the input feature diagrams by detailing them
and increasing the possible reuse among the SPLs. As an example
consider clusters 3 (screen, display) and 2 (basic, color, colour,
high resolution, low resolution). The relationship between these
clusters can be considered tight (relationships between features
that belong to these clusters exist in both diagrams). Thus, the
method will recommend adding the features ‘basic’ and ‘high
resolution’ under the feature 'display' in the SPL presented in
Figure 2(b) and the feature ‘low resolution’ under the feature
'screen' in the SPL presented in Figure 2(a).
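The strength computation and the resulting recommendation rule can be sketched as follows. Each SPL is assumed to be given as a set of (parent, sub-feature) pairs taken from its feature diagram, and clusters as sets of feature names; following the screen/display example above, the sketch recommends the target-cluster features that an SPL does not yet include.

# Sketch of Definition 3 and the recommendation rule of Section 2.3.2.
def relationship_strength(c_source, c_target, spls):
    involved = [spl for spl in spls
                if any(parent in c_source and child in c_target
                       for (parent, child) in spl)]
    relevant = [spl for spl in spls
                if any(f in c_source or f in c_target
                       for (parent, child) in spl for f in (parent, child))]
    return len(involved) / len(relevant) if relevant else 0.0

def recommend_features(c_source, c_target, spls, strength_th=2/3):
    if relationship_strength(c_source, c_target, spls) <= strength_th:
        return {}
    recommendations = {}
    for idx, spl in enumerate(spls):
        features = {f for (parent, child) in spl for f in (parent, child)}
        missing = c_target - features
        if features & c_source and missing:
            recommendations[idx] = missing   # features to add under the source feature
    return recommendations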
3. PRELIMINARY RESULTS
In order to evaluate the proposed method, we implemented it,
assuming that the input feature diagrams are given in the common
Simple XML Feature Model format (SXFM) [24]. We further
used the Perl module WordNet::Similarity::wup [20] for
calculating feature name similarity. A list including all the
features in the input diagrams was created and sent to the module,
which returned the similarity between each pair of features in the
list. For implementing the feature clustering algorithm, we used
the complete-link distance in order to create a refined division of
features into clusters.
The input feature diagrams were taken from S.P.L.O.T, an
academic feature diagrams repository ‎[24]. This repository
includes about 220 feature diagrams in different domains, which
were extracted from academic publications and other relevant
sources. The criteria for including feature diagrams in that
repository, as listed on the S.P.L.O.T web site, are: (1) Consistency:
All models are guaranteed to be consistent (contain at least one
valid configuration); (2) Correctness: None of the models contain
dead features; and (3) Transparency: All models identify their
authors (or related literature) and provide some contact
information. Thus, we could assume the validity of these models
(at the cost of using relatively simple examples).
We examined the domains that are included in S.P.L.O.T
repository and selected the mobile phones domain, due to the
existence of seven different feature diagrams in the repository and
the ability to get relevant information for creating additional ones.
We indeed modeled two additional feature diagrams based on the
supplementary material we found on this domain and added some
challenges to better evaluate our method. In particular, we added
synonyms and antonyms and we modeled the hierarchies of
features using different nesting structures. Table 2 lists the nine
feature diagrams we used in the evaluation, along with the number
of features in each diagram (marked #F in the table) and the
number of levels (marked #L). As can be seen, the diagrams are
quite simple, since we wanted to be able to examine the method
outputs manually. In the future we will evaluate the method on
more complicated feature diagrams in order to examine its
scalability.
To calculate the similarity threshold, we calculated first the name
similarity of all pairs of features from the nine input diagrams. We
then sampled 100 pairs whose values of name similarity ranged
from low (almost 0) to high (almost 1). We asked six human
graders, who have a strong technical background and experience in
mobile device architecture, to grade the similarity of each
selected pair of features on a scale that ranged from 1 to 10. Based
on the results, we identified different features (grades 1-7) and
similar features (grades 8-10). For 96% of the feature pairs, all
graders classified the pair in the same similarity category. For
pairs on which no consensus was reached, we selected the
similarity category of the majority.
Table 2. The feature diagrams used in the evaluation

Name                   Creator           #F   #L   Source
MobilePhone            ISARDA            10   3    model_20120110_139114401.xml
Mobile Phone           Sergio Segura     20   4    model_20100322_955726153.xml
Mobile Phone           AE UNB            25   4    model_20101119_1472596180.xml
Phone                  Lenita            10   3    model_20101111_1790887308.xml
Mobile Phone Example   ETSII             10   3    model_20120110_1719396361.xml
Mobile phone           Rayco             10   3    model_20120110_1094246588.xml
Cell Phone             Sebastian Oster   15   4    model_20100308_1032655961.xml
Mobile Phone 1         Self              21   4
Mobile Phone 2         Self              21   4
Figure 6 shows a histogram of the number of pairs classified by
the human graders as similar or different for each range of name similarity
values (as calculated using Definition 1). As can be noticed, many
of the feature pairs whose similarity values fall into the range 0.7-0.8
were classified as different by the human graders.
Only above a similarity value of about 0.9 were all pairs of features
classified as similar. In order to obtain the exact similarity threshold, we used
AdaBoost [11], which, as noted, is an adaptive machine learning
algorithm. We chose AdaBoost because it is fast, simple, and easy
to program; it has no parameters to tune (except for the number of
rounds); and it requires no prior knowledge about the weak
learner and thus it can be flexibly combined with any method for
finding weak hypotheses ‎[12]. In addition, often boosting does not
suffer from over-fitting. Therefore, the classifier found on a
specific domain can be generalized and used as a pre-defined
threshold for the clustering algorithm in other domains as well.
The similarity threshold calculated using AdaBoost is 0.88.
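As an illustration of how such a threshold can be derived, the sketch below feeds the graded sample into scikit-learn's AdaBoostClassifier and reads the decision boundary off the one-dimensional similarity axis. The use of scikit-learn and the grid scan are assumptions of the sketch rather than the exact procedure followed.

# Deriving a similarity threshold from human-graded pairs with AdaBoost.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def learn_similarity_threshold(similarities, labels, n_estimators=50):
    """similarities: NSim values in [0, 1]; labels: 1 = graded 'similar', 0 = 'different'."""
    X = np.asarray(similarities).reshape(-1, 1)
    clf = AdaBoostClassifier(n_estimators=n_estimators).fit(X, np.asarray(labels))
    # scan the similarity axis and take the lowest value classified as 'similar'
    grid = np.linspace(0.0, 1.0, 1001).reshape(-1, 1)
    similar = grid[clf.predict(grid) == 1]
    return float(similar.min()) if similar.size else 1.0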
After running the feature clustering part of our method on the set
of the (nine) aforementioned diagrams, we got 29 clusters, 14
tight relationships (namely, at least two thirds of the involved
SPLs refer to them to some extent), and 15 loose relationships.
Each cluster included between 1 and 19 features.
In order to evaluate whether the cluster analysis yields reasonable
recommendations, we created a questionnaire with 12 statements
based on the principles described in Section 2.3. We then asked
mobile phone users to grade their degree of agreement with each
statement, on a scale of ‘completely agree’, ‘partially agree’, and
‘disagree’. The respondents could also mark that they do not have
sufficient information on the subject and hence cannot decide on
their degree of agreement with the statement (“don’t know”). This
way we wanted to check whether the recommendations of our
method sound reasonable to users (of mobile phones). In the
future, we will examine other viewpoints of more technical
stakeholders, such as developers and maintainers.

Figure 6. A histogram of human classification (similar vs. different) with respect to similarity values; the x-axis shows similarity values in ranges of 0.1 (from 0-0.1 to 0.9-1.0) and the y-axis shows the number of feature pairs
Six of the questions in the questionnaire were based on the
internal analysis of clusters and referred to features within the
same cluster. In this case, we aimed to check whether mobile
phone users perceive the features as similar enough. Examples of
questions in this category are:
• Various media capabilities (such as mp3, mp4, and camera) are extras to mobile phones.
• High resolution, low resolution, basic, and color are all characteristics of a mobile phone screen/display.
• In mobile phones, screen and display are synonyms.
The other six statements in the questionnaire were derived by
applying the inter-cluster analysis of relationships. Here we aimed to
check whether recommendations to add features that were missing
in specific feature diagrams but exist in others are indeed justified.
Examples of the statements in this category are:
• A mobile phone which supports messaging services is likely to support SMS.
• Two common settings of a mobile phone are its operating system and its support for Java.
• A clock utility is a basic function in a mobile phone.
Fifty information systems students filled in the questionnaire. Four of
them had worked in service departments of mobile phone companies
for several years. The other 46 respondents were familiar with the
domain as users for 10 years on average. 18 of the respondents
evaluated their familiarity with the domain as ‘very good’, 22 as
‘good’, only 5 as ‘poor’, and 1 as ‘unfamiliar’.
The analysis of the answers reveals that overall in about 80% of
the cases the respondents agreed with the statements (in 59% of
the cases they completely agreed and in 20% they partially
agreed). Only in 16% of the cases did the respondents disagree with
the specified statement. Dividing the questions according to their
origin in the cluster analysis step, we found similar degrees of
agreement in questions based on internal analysis of clusters and
in questions derived from inter-cluster analysis of relationships
(see Table 3).
Despite the high degree of agreement with the statements derived
from the cluster analysis step, we noticed a high degree of
disagreement on three specific statements (40% of disagreement).
Table 3. Degrees of agreement with cluster analysis outputs

                    Internal analysis   Inter-cluster        Overall
                    of clusters         analysis of rel.
completely agree    59.00%              59.87%               59.43%
partially agree     21.67%              18.73%               20.20%
disagree            15.33%              17.39%               16.36%
don't know           4.00%               4.01%                4.01%
The first statement claimed that in mobile phones ‘screen’ and
‘display’ are synonyms. While this is the case in WordNet (name
similarity of 0.95), many respondents probably interpreted
‘screen’ as the physical device that is characterized by resolution,
size, etc., while ‘display’ was interpreted as referring to the way
the mobile phone visualizes applications (e.g., using different
drivers). This distinction sounds reasonable, but it did not appear
in our input feature diagrams, in which both ‘display’ and ‘screen’
had similar (or even identical) sub-features.
The second statement with a high degree of disagreement claimed
that mobile phones with basic functions are most likely to have
games. Here we believe that the respondents disagreed with the
statement since they referred to different kinds of mobile phones,
the simplest of which do not include games or game support at all.
In our case, however, 80% of the SPLs that included the features
‘utility functions’ or ‘basic functions’ also included the features
‘game’ or ‘play’.
Finally, the third statement with a high degree of disagreement
claimed that two common settings of a mobile phone are its
operating system and its support for Java. Here we believe that the
respondents tended to refer to settings as features (or definitions)
that can be controlled by users. Indeed, both the operating system
and Java support are not features that can be modified by
users, but they are relevant for the configuration of mobile
phones and thus should be recommended for SPLs that do not
include them.
Although our results are promising, only further evaluation may
indicate whether our results can be generalized to other (more
complicated) cases.
4. RELATED WORK: COMPARISON OF
FEATURE MODELS
Comparison of feature models has been studied for
different reasons [1]. Here we review the main purposes and
differentiate our work from the related studies.
First, comparison is done when composing feature models to
support scalability of SPLs. It is recommended to divide feature
models that represent SPLs with large numbers of features and a
high degree of variability ‎[23]. Segura et al. ‎[25] further suggest
using graph transformations for automating the merge of feature
models. They present a catalogue of 30 visual, technology-independent rules that describe how to build a feature model
including all the products represented by two given feature
models. The main assumption in this category of studies is that the
different feature models include identical portions that can be
characterized by their feature names and attributes. These portions
can serve as anchors for merging. In our case, we cannot assume
any syntactical overlap between the feature models, as the feature
models may be developed in separate domain analysis processes
in different companies. Instead we assume a set of feature
diagrams, each representing a different SPL. In case a single SPL
is represented in different feature diagrams, we will need to run a
merging method prior to using our cross SPL analysis method.
Second, comparison is also done when weaving aspects into base
feature models that represent SPLs. Acher et al. [2] propose two
main operators to compose feature models: insert, which enables
inserting features from a crosscutting feature model into a base
feature model, and merge, which enables putting together features
from two separate feature models, when none of the two clearly
crosscuts the other. Bošković et al. ‎[4] describe a method, called
AoFM, for facilitating modeling of aspects in feature models. To
this end, they introduce the concepts of join point models, aspects,
aspect feature models, pointcut feature models, patterns, and
composition rules. In particular, the patterns and composition
rules describe all the changes that an aspect in AoFM can impose
on a base concern, i.e., the addition and removal of feature
diagram elements. Studies in this category assume the existence of
composition rules or operations that can be performed on the base
models when weaving a given aspect model. Furthermore, the
aspect and base models are defined over the same terminology,
easing the task of matching, or finding similar, features. In our
case, we do not aim to weave or integrate the different feature
diagrams, but to analyze the commonality and variability among
the input SPLs (as represented in their feature diagrams) in order
to recommend improvements to particular SPLs or to the
maintenance and future development of a set of SPLs.
Third, comparison of feature models can be done for
synchronization purposes. Thüm et al. ‎[27] propose an algorithm
for automatically determining the relationships between two
feature models in terms of a set of configurations. As SPLs and
their feature models evolve over time, they classify the
modifications of a feature model as refactorings, specializations,
generalizations, or arbitrary edits. They further present an
algorithm that computes the change classification of two feature
models (before and after the edit). Kim and Czarnecki ‎[17] also
suggest synchronizing existing configurations of a feature model
that have evolved over time. They refer to the following changes
that can be performed on feature models: addition, removal,
changing attribute, relocation, and cardinality changes. In all these
cases, the assumption is that the feature models emerged
from the same source (feature model) by applying different
sequences of operations from a pre-defined list of possible ones.
In our work, the sources of the feature models may represent SPLs
that were developed in different departments of the same
company or even in different companies (as is the case with
mergers and acquisitions of companies).
Finally, feature models of different SPLs may be compared and
merged. In this context, Czarnecki and Eisenecker ‎[8] suggest
using dependency rules between high-level and low-level features
to deal with different feature models for different product types.
Czarnecki et al. ‎[9] introduce the concept of staged configuration
for composing separate feature models that model decisions taken
by different stakeholders. Staged configuration can be achieved
“either by stepwise specialization of feature models or by multi-level configuration, where the configuration choices available in
each stage are defined by separate feature models”. Hartmann and
Trew [14] introduce the concept of a Context Variability model,
which contains the primary drivers for variation. The Context
Variability model constrains the feature model and enables
modeling multiple SPLs for software supply chains. Hartmann
and Trew further suggest how to merge multi-product-line feature
models. In all these cases the different feature models are assumed
to use the same terminology. In particular, one of the main
underlying assumptions is that similar features have the same
name. Thus, the main challenges are merging and yielding valid
outputs (i.e., valid merged models). In our case we did not assume
the same terminology and thus we had to apply similarity metrics
and clustering techniques to identify similar features and group
them accordingly. Furthermore, we are not interested in a merged
model, but in recommendations for improvements to the SPL
artifacts and to the overall management of the SPLs.
5. SUMMARY AND FUTURE WORK
Analysis of the commonality and variability within families of
SPLs is highly important for improving maintenance, future
development, and management. Currently different feature models
are mainly merged for different purposes, assuming the usage of
the same terminology in all input models and focusing on creating
valid merged models. In this research we introduce a method for
conducting cross product line analysis by using linguistic and
structural similarity techniques for measuring feature similarity
and an agglomerative hierarchical clustering technique for
analyzing the commonality and variability of the examined SPLs.
Preliminary results indicate that the method may have potential
for recommending improvements to the management of related
features.
Future research includes several directions. First, so far we mainly
used WordNet for similarity calculations. However, as there are a
wide variety of domains which require wide and up-to-date corpora
of terms, additional techniques for measuring semantic similarity
should be evaluated (e.g., the one that is based on
Wikipedia ‎[13]). In addition, the impact of the different similarity
types, i.e., name and context similarity, on the overall similarity
needs to be examined, especially for differently structured product
lines in the same domain.
Second, currently the feature clustering is done based on the
existence of relationships and does not refer to their types.
However, mandatory features may indicate a stronger
relationship than optional features, as the first kind points to
essential characteristics of the product line, while the second kind
may point “only” to possible added values. In addition,
dependencies should be analyzed as well.
Third, the usability of the method can be improved by
representing recommendations for changes for each examined
SPL. This can be done by introducing mining techniques for
automatically analyzing the clustering results.
Fourth, the efficiency, especially in terms of scalability, of the
method should be further evaluated. In particular, as currently the
method bottleneck is the agglomerative clustering algorithm3,
further clustering techniques should be evaluated, checking the
quality of their results with respect to their time complexity.
Finally, additional evaluations of the method are required. In
particular, additional, more complicated domains need to be
examined, and different sources of feature diagrams
(besides the S.P.L.O.T repository) need to be explored. The
evaluation also needs to examine different points of view besides
that of users, such as those of developers and maintainers.
3 The time complexity of the agglomerative clustering algorithm varies, depending on the selected distance type, from O(n²) for single-link to O(n² log n) for complete-link [19], where n is the number of features in all input diagrams.
REFERENCES
[1] Acher, M., Collet, P., Lahire, P., France, R. (2010) Comparing approaches to implement feature model composition. Modelling Foundations and Applications, pp. 3-19.
[2] Acher, M., Collet, P., Lahire, P., France, R. (2009) Composing feature models. 2nd International Conference on Software Language Engineering (SLE'09), LNCS 5969, pp. 62-81.
[3] Benavides, D., Segura, S., Ruiz-Cortés, A. (2010) Automated analysis of feature models 20 years later: A literature review. Information Systems 35 (6), pp. 615-636.
[4] Bošković, M., Mussbacher, G., Bagheri, E., Amyot, D., Gašević, D., Hatala, M. (2011) Aspect-oriented feature models. Models in Software Engineering, pp. 110-124.
[5] Budanitsky, A., Hirst, G. (2006) Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics 32 (1), pp. 13-47.
[6] Chen, L., Ali Babar, M., Ali, N. (2009) Variability management in software product lines: a systematic review. Proceedings of the 13th International Software Product Line Conference, pp. 81-90.
[7] Clements, P., Northrop, L. (2001) Software Product Lines: Practices and Patterns. Addison-Wesley.
[8] Czarnecki, K., Eisenecker, U.W. (2000) Generative Programming: Methods, Tools, and Applications. Addison-Wesley, Boston.
[9] Czarnecki, K., Helsen, S., Eisenecker, U. (2005) Staged configuration through specialization and multilevel configuration of feature models. Software Process: Improvement and Practice 10 (2), pp. 143-169.
[10] Dao, T.N., Simpson, T. (2005) Measuring similarity between sentences. opensvn repository.
[11] Freund, Y., Schapire, R.E. (1997) A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55, pp. 119-139.
[12] Freund, Y., Schapire, R., Abe, N. (1999) A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence 14, pp. 771-780.
[13] Gabrilovich, E., Markovitch, S. (2007) Computing semantic relatedness using Wikipedia-based explicit semantic analysis. Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 1606-1611.
[14] Hartmann, H., Trew, T. (2008) Using feature diagrams with context variability to model multiple product lines for software supply chains. IEEE Software Product Line Conference (SPLC'08), pp. 12-21.
[15] Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31 (8), pp. 651-666.
[16] Kamvar, S. D., Klein, D., Manning, C. D. (2002).
Interpreting and extending classical agglomerative
clustering algorithms using a model-based approach.
Proceedings of 19th International Conference on Machine
Learning, pp. 283-290.
[17] Kim, C.H.P., Czarnecki, K. (2005) Synchronizing
cardinality-based feature models and their specializations.
ECMDA-FA'05, LNCS 3748, pp. 331-348.
[18] Kurita, T. (1991). An efficient agglomerative clustering
algorithm using a heap. Pattern Recognition 24 (3), pp. 205-209.
[19] Manning, C. D., Raghavan, P., Schütze, H. (2008)
Introduction to information retrieval. Cambridge University
Press Cambridge, Vol. 1, pp. 378-395.
[20] Pedersen, T., Patwardhan, S., Michelizzi, J. (2004)
WordNet::Similarity: measuring the relatedness of concepts,
Demonstration Papers at HLT-NAACL 2004, Association
for Computational Linguistics, pp. 38-41.
[21] Perumal, P., Nedunchezhian, R. (2011). Performance
Analysis of Standard k-Means Clustering Algorithm on
Clustering TMG format Document Data. International
Journal of Computer Applications in Engineering Sciences I
(IV), pp. 406-412.
[22] Rasmussen, E. (1992). Clustering algorithms. Information
Retrieval: data structures and algorithms, pp. 419-442.
[23] Reiser, M.O., Weber, M. (2007) Multi-level feature trees: A
pragmatic approach to managing highly complex product
families. Requirements Engineering 12(2), pp. 57–75.
[24] S.P.L.O.T Software Product Lines Online Tools,
http://www.splot-research.org/.
[25] Segura, S., Benavides, D., Ruiz-Cortés, A., Trinidad, P.
(2008) Automated merging of feature models using graph
transformations. GTTSE ’07, LNCS 5235, pp. 489–505.
[26] Steinbach, M., Karypis, G., Kumar, V. (2000). A
comparison of document clustering techniques. KDD
Workshop on Text Mining, pp. 525-526.
[27] Thüm, T., Batory, D., Kästner, C. (2009) Reasoning about
edits to feature models. IEEE International Conference on
Software Engineering (ICSE'09), pp. 254-264.
[28] Wikipedia, Sony Mobile Communications, http://en.wikipedia.org/wiki/Sony_Mobile_Communications.
[29] WordNet: a lexical database for English, http://wordnet.princeton.edu/.
[30] Wu, Z., Palmer, M. (1994). Verb semantics and lexical selection. Association for Computational Linguistics, pp. 133-138.