Cross Product Line Analysis

Ora Wulf-Hadash, Department of Information Systems, University of Haifa, Haifa 31905, Israel, 972-4-9590805, [email protected]
Iris Reinhartz-Berger, Department of Information Systems, University of Haifa, Haifa 31905, Israel, 972-4-8288502, [email protected]

ABSTRACT
Due to the increase in market competition and mergers and acquisitions of companies, different software product lines (SPLs) may exist under the same roof. These SPLs may be developed applying different domain analysis processes, but they are likely not disjoint. Cross product line analysis aims to examine the common and variable aspects of different SPLs for improving the maintenance and future development of related SPLs. Currently, different SPL artifacts, or more accurately feature models, are compared, matched, and merged for supporting scalability, increasing modularity and reuse, synchronizing feature model versions, and modeling multiple SPLs for software supply chains. However, in all these cases the focus is on creating valid merged models from the input feature models. Furthermore, the terminology used in all the input feature models is assumed to be the same, namely, similar features are named the same. As a result, these methods cannot be simply applied to feature models that represent different SPLs. In this work we propose adapting similarity metrics and text clustering techniques in order to enable cross product line analysis. This way, feature models that use different terminologies in the same domain can be analyzed in order to improve the management of the involved SPLs. Preliminary results reveal that the suggested method helps systematically analyze the commonality and variability between related SPLs, potentially suggesting improvements to existing SPLs and to the maintenance of sets of SPLs.

Keywords
Feature Diagram Matching, Feature Diagram Merging, Feature Clustering, Feature Similarity, Empirical Evaluation

1.
INTRODUCTION

Due to the increase in market competition, companies cannot afford to focus on single SPLs and need to develop several SPLs for different customers, requirements, etc. Usually all these SPLs are somehow related, e.g., they belong to the domain in which the company specializes, but they include different common and variable aspects. As an example, consider the domain of mobile phones. The largest seller of mobile devices (according to Gartner's report of the first quarter of 2012), Samsung, manages several different SPLs, some of which are: Galaxy S, Galaxy Note, and Samsung Nexus. While these SPLs differ in their features, e.g., Samsung Galaxy Note has a large screen, whereas Samsung Galaxy S is relatively small, it is important to be able to systematically analyze the commonality and variability of these SPLs in order to improve productivity, by considering uniting the maintenance and future development of similar features (or even SPLs) or by improving the artifacts of a specific SPL based on artifacts of other SPLs in the same domain. In another scenario, consider mergers or acquisitions of companies. Each company has developed and maintained its own SPL or SPLs. The merger or acquisition of the companies yields the existence of different SPLs with different kinds of overlaps.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. VaMoS '13, January 23-25, 2013, Pisa, Italy. Copyright 2013 ACM 978-1-4503-1541-8/13/01…$15.00.
Furthermore, the terminologies used in the different SPLs in this case may differ due to the development of the artifacts in different companies. Here again it is important to analyze the commonality and variability of these SPLs in order to improve the management of these SPLs. As an example of this scenario, consider the merger of Sony and Ericsson in 2001 [28]. A large group of methods concentrates on representing SPLs as feature models [6]. Some of the methods further support automated analysis of these feature models [3], such as checking product validity, calculating the number of products in a SPL, and identifying void features. Several studies also examine the relationships between feature models for supporting scalability [25], increasing modularity and reuse [2], [4], synchronizing feature model versions [17], [27], and modeling multiple SPLs for software supply chains [7], [8]. To this end, the studies mainly suggest composing feature models describing different SPL aspects [1]. Furthermore, they assume that the same terminology is used in the input feature models, check the structural similarity of the given models, and search for similar portions that are later used as anchors for merging. While this strategy is perfectly suitable for handling feature models that represent different aspects of the same SPL, or different SPLs that use the same underlying terminology, it is not enough for performing commonality and variability analysis of feature models of different SPLs that were potentially developed in different departments or even in different companies and thus do not necessarily share the same terminology (as is the case in mergers and acquisitions). In the current work, we call for cross product line analysis, namely, conducting commonality and variability analysis of related SPLs for improving the management of current and future SPLs. In particular, the input of the suggested method is a set of feature diagrams representing different SPLs.
The input is processed in three main steps (see Figure 1). First, during the Feature Similarity Calculation step, the set of feature diagrams is analyzed using linguistic and structural techniques for finding similar features. This step is important for aligning the different terminologies that may be used for developing the artifacts of the different SPLs. In the second step, Feature Clustering, an agglomerative clustering technique is used for creating groups (clusters) of similar features that may represent variants of the same features. Finally, in the Cluster Analysis step, the clusters from the previous step, as well as their relationships, are analyzed to provide recommendations for improving individual SPLs and the management of the whole set of SPLs.

Figure 1. An overview of the suggested method (a set of feature diagrams feeds Feature Similarity Calculation, which yields pairs of features and their degrees of similarity; Feature Clustering yields clusters of similar features; Cluster Analysis yields recommendations for improvements)

The rest of the paper is structured as follows. Section 2 describes and exemplifies the method. Section 3 presents preliminary results regarding the method outputs. Section 4 includes related work and discusses the benefits and limitations of the suggested method with respect to the related work. Finally, Section 5 concludes and refers to future research.

2. The Cross Product Line Analysis Method

As noted, the input for the suggested method is a set of feature diagrams, each representing a SPL in the same domain. As an example of such an input, consider Figure 2, which includes two feature diagrams in the mobile phones domain. The SPL presented in Figure 2(a) supports utility functions (namely, voice calls and messaging services), three types of screens, and optional extras that include a camera, mp3, or mp4. The second SPL, presented in Figure 2(b), supports calls, message services, two types of displays, and optional media capabilities in the form of a camera or mp3. As can be seen, these two SPLs differ in the features they support (e.g., the second SPL does not support mp4), the ways they structure the features (e.g., calls appear in the first SPL under utility functions, whereas in the second SPL they appear directly under the diagram root), and the terminologies they use (e.g., 'extras' vs. 'media', 'screen' vs. 'display', and so on). To overcome these kinds of differences, the method first measures the degree of similarity between pairs of features and afterwards groups similar features, enabling commonality and variability analysis of clusters rather than of individual features. These steps of the method are elaborated next.

2.1 Feature Similarity Calculation

In order to define the common aspects of the input feature diagrams, the method measures the similarity in the feature names and their context (i.e., where they appear in the feature diagrams with respect to their ancestors and descendants).

2.1.1 Similarity of Feature Names

We utilize linguistic measurements for calculating the similarity of feature names. Many of the linguistic measurements (see [5], for example) use WordNet, which is a lexical database of English [29]. The benefits of WordNet are that it is large, rich, freely available online, and general-purpose; hence, it can be used for SPLs that belong to different domains. Note, however, that WordNet also has shortcomings in the context of cross SPL analysis. Features in technological domains may be represented as abbreviations or commonly known acronyms which are sometimes not recognized as meaningful words by WordNet. In other cases, the same word may have different meanings depending on the domain and the context. To overcome these deficiencies, we currently added the ability to import user-defined acronyms for certain domains. In the future, we intend to improve this step with Wikipedia-based semantic analysis methods, such as the one proposed in [13].
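As an illustration of the acronym-import step, the following sketch (hypothetical names and table entries, not the paper's implementation) expands user-defined acronyms into plain words before any WordNet lookup:

```python
# Illustrative, user-imported acronym table for the mobile phones domain.
ACRONYMS = {
    "sms": "short message service",
    "mms": "multimedia message service",
    "ems": "enhanced message service",
}

def normalize_feature_name(name, acronyms=ACRONYMS):
    """Lower-case a feature name, split it into words, and expand any
    word that appears in the user-defined acronym table."""
    words = []
    for w in name.lower().split():
        words.extend(acronyms.get(w, w).split())
    return words
```

After this normalization, every word of a feature name is a candidate for a WordNet lookup, regardless of whether the modeler wrote 'SMS' or 'short message service'.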
For measuring the similarity of two features, we adopted Dao and Simpson's similarity measurement between two phrases [10], which is a simple and straightforward metric that does not require a large corpus of statistics. The following formula defines feature name similarity.

Definition 1 (Feature Name Similarity). Let f1 and f2 be two features. Feature name similarity, NSim, is calculated as follows:

NSim(f1, f2) = (Σ_{i=1..m} max_j WSim(t_i, u_j) + Σ_{j=1..n} max_i WSim(t_i, u_j)) / (m + n)

Where:
- t1…tm and u1…un are the words in the names of features f1 and f2, respectively (m and n are the numbers of words in the names of f1 and f2);
- WSim(t_i, u_j) = 2·N3 / (N1 + N2 + 2·N3) is Wu and Palmer's formula [30] for comparing two words (see explanations in Figure 3);
- LCS is the least common superconcept of t_i and u_j in WordNet;
- N1 is the number of nodes on the shortest path from t_i to LCS in WordNet;
- N2 is the number of nodes on the shortest path from u_j to LCS in WordNet;
- N3 is the number of nodes on the shortest path from LCS to the root in WordNet.

Figure 3. Calculating similarity between terms that are hierarchically related

Figure 2. Two feature diagrams of mobile phones: (a) 'mobile phone' with mandatory 'utility functions' ('voice call' and 'messaging', the latter with 'Text Message' and 'Voice Message'), an alternative group of 'basic', 'color', and 'high resolution' under 'screen', and optional 'extras' ('camera', 'mp3', 'mp4'), where 'camera' requires 'high resolution'; (b) 'mobile phone' with mandatory 'calls' and 'message service' ('SMS' – short message service, 'MMS' – multimedia message service, 'EMS' – enhanced message service), an alternative group of 'low resolution' and 'colour' under 'display', and optional 'media' ('camera', 'mp3'), where 'camera' excludes 'low resolution'

As an example of calculating the name similarity of two features, consider the feature 'Short Message Service' (SMS), which appears in Figure 2(b), and the feature 'Text Message', which appears in Figure 2(a). Table 1 summarizes the pair-wise word similarity values of these features; their feature name similarity is obtained by applying the above formula to these values.

Table 1. The pair-wise similarity values of 'short message service' and 'text message'

          text    message
short     0.52    0.43
message   0.62    1.00
service   0.55    0.46

2.1.2 Similarity of Feature Context

As the inputs of our method are feature diagrams, which are structured trees of features and not plain lists, the method considers the context in which a feature appears and not just its name. The descendants (sub-features) are highly important for determining the context of the (ancestor) feature. Note that similar structures of completely different features may not indicate their potential relatedness. However, the similarity of features whose names and structures are similar should be higher than the similarity of features which only share similar names. For defining the context similarity of features, the method considers their immediate descendants (namely, mandatory, optional, alternative, and 'or' sub-features). Let f1, f'1, f2, and f'2 be features, such that f'1 is a sub-feature of f1 and f'2 is a sub-feature of f2 (see Figure 4). If f'1 and f'2 are similar (i.e., their similarity measurement, considering both their names and their context with respect to their descendants, is higher than some threshold), then the similarity of f1 and f2 should increase. Note that the increase in similarity is percolated from the leaves of the feature diagram to its root; thus, the percolation process will terminate, assuming the input feature diagrams are structured as trees. The following definition defines feature similarity taking into consideration both feature names and context.

Figure 4. Percolating similarity through relationships (if the sub-features f'1 and f'2 are similar, the similarity of the potentially similar features f1 and f2 increases)

Definition 2 (Feature Similarity). Feature similarity of features f1 and f2 is calculated using the following formula:

Sim(f1, f2) = NSim(f1, f2) + (1 − NSim(f1, f2)) · Σ Sim'(f'1, f'2)

Where {f'1} are the sub-features of f1, {f'2} are the sub-features of f2, the sum ranges over the m pairs (f'1, f'2) satisfying Sim(f'1, f'2) > threshold, Sim'(f'1, f'2) = Sim(f'1, f'2) / (|{f'1}|·|{f'2}|), and m is the number of pairs (f'1, f'2) satisfying Sim(f'1, f'2) > threshold.

In order to determine the threshold for similar features (namely, the similarity threshold), different algorithms may be used. As will be explained and demonstrated in Section 3, we chose to use AdaBoost [11], which is an adaptive machine learning algorithm, for this purpose. Three important characteristics of the above formula are: (1) the value of the similarity is always between 0 and 1; (2) the similarity of features increases proportionally to the degree of similarity of their sub-features; and (3) the similarity of features increases proportionally to the number of similar sub-features.

As an example of feature similarity calculation, consider the features 'messaging' from Figure 2(a) and 'message service' from Figure 2(b). The name similarity of these features is 0.58, whereas their overall similarity, taking their sub-features into consideration, is much higher – 0.72.

At the current stage, the method only checks the existence of relationships between features and not the types of these relationships (e.g., mandatory vs. optional features). In the future, we plan to examine the impacts of the different relationship types on similarity and to improve the definition of similarity accordingly, e.g., by assigning a higher weight to mandatory relationships. Note that the values of both name and overall similarity have no absolute meaning, but only relative ones ("more similar than", "less similar than"). Thus, other techniques are required to better understand the degree of similarity of the different feature diagrams or the SPLs that they represent. For this purpose, we utilize feature clustering, as described next.

2.2 Feature Clustering

Clustering is the process of grouping a set of objects into classes of similar objects [21]. In our research, the objects are features that are represented via their names.
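The two definitions above can be sketched as follows. This is a hedged reconstruction: `wsim` stands in for any word-level measure (such as Wu and Palmer's), and the context weighting in `feature_sim` is one way to satisfy the three characteristics stated for Definition 2, not necessarily the paper's exact formula.

```python
def name_sim(words1, words2, wsim):
    """Definition 1: average of the best word-to-word similarities in
    both directions, wsim being a word-level similarity measure."""
    s1 = sum(max(wsim(t, u) for u in words2) for t in words1)
    s2 = sum(max(wsim(t, u) for t in words1) for u in words2)
    return (s1 + s2) / (len(words1) + len(words2))

def feature_sim(f1, f2, wsim, threshold=0.88):
    """Definition 2 (illustrative weighting): name similarity, increased by
    sub-feature pairs whose similarity exceeds the threshold, percolated
    bottom-up. A feature is {'name': [words], 'subs': [features]}."""
    base = name_sim(f1["name"], f2["name"], wsim)
    subs1, subs2 = f1["subs"], f2["subs"]
    if not subs1 or not subs2:
        return base
    pair_sims = [feature_sim(a, b, wsim, threshold)
                 for a in subs1 for b in subs2]
    similar = [s for s in pair_sims if s > threshold]
    # Each similar pair contributes its normalized similarity, so the result
    # stays in [0, 1] and grows with both the number and the degree of
    # similar sub-feature pairs.
    context = sum(s / len(pair_sims) for s in similar)
    return base + (1 - base) * context
```

With a toy word measure, two features whose names differ but whose sub-features coincide end up far more similar than their name similarity alone, which is exactly the percolation effect described above.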
Thus, document or text clustering techniques are relevant [26], [15]. In particular, we use a variation of the agglomerative hierarchical clustering technique. This technique [18] is a bottom-up clustering approach, which gets the number of expected clusters as a parameter and starts by putting each object in a separate cluster. Then, in each iteration, the algorithm agglomerates (merges) the closest pair of clusters, calculating the distances between the different clusters. The algorithm continues until the number of expected clusters is reached. We chose this algorithm for the following reasons. First, it is known as one of the most accurate clustering techniques [26], [18]. Since the clustering quality has a great impact on the analysis results, accuracy is very important in our case. Second, the distance between two clusters reflects the degree of similarity between their features. Starting with each feature in a different cluster prevents grouping features that are not similar enough. However, the agglomerative hierarchical clustering algorithm requires determining the number of clusters a-priori. This number cannot be determined in our case, as it varies depending on the size of the SPLs and their degree of variability. Therefore, we modified the stopping criterion of the algorithm: the two closest clusters are merged as long as the similarity between them is not smaller than the similarity threshold. This way we ensure that too-different features will not be put in the same cluster. Three types of distances between clusters are commonly mentioned in the literature [22], [16] (see Figure 5): (1) Single-link: the distance between the two closest features of the clusters; (2) Complete-link: the distance between the two farthest features of the clusters; and (3) Average-link: the average of the pair-wise distances between the features in the two clusters. Our algorithm supports the three types of distances, and the user can choose the preferable type.
For example, in cases where the domain is narrow and there are many similar features, the 'complete-link' distance may be more appropriate, as it creates a refined division into clusters (namely, more clusters, where each cluster includes fewer but more similar features). If the domain is wide, on the other hand, 'single-link' may be better, as it avoids stopping the cluster merging too early. In any case, after selecting a distance type, the algorithm calculates all the distances between clusters according to the selected type.

Figure 5. Types of distances between clusters (single-link, complete-link, and average-link)

The feature clustering algorithm, which is presented in Listing 1, gets a two-dimensional matrix, named FeatSim. Cell i,j of this matrix represents the (overall) similarity between feature i and feature j. The algorithm starts by creating a hash of clusters, each holding one feature that appears in FeatSim. Then, iteratively, the algorithm measures the distance between clusters (utilizing single-, complete-, or average-link) and merges the closest clusters as long as their distance (here, the similarity value) is at least the similarity threshold – th.

FeatureClustering(FeatSim) {
  // Initialization: a hash of clusters, each holding a
  // single feature from FeatSim
  FeatClst = InitializeHash(FeatSim)
  Do { // A merging iteration
    // find the two closest clusters based on the
    // selected distance method
    (i, j) = FindClosestClusters(FeatClst, FeatSim)
    If Distance(FeatClst[i], FeatClst[j], FeatSim) ≥ th
      // merge the two closest clusters
      Merge(i, j, FeatClst)
  // merge as long as there are close clusters
  } while Distance(FeatClst[i], FeatClst[j], FeatSim) ≥ th
  Return FeatClst
}

Listing 1. The algorithm for feature clustering

The feature clustering could create the following clusters in our example of the two mobile phone SPLs in Figure 2:

Cluster 1: voice call, calls
Cluster 2: high resolution, low resolution, color, colour, basic
Cluster 3: screen, display
Cluster 4: media, extras
Cluster 5: camera, mp3, mp4

Note that some of the clusters were mainly created due to the high similarity in the feature names (e.g., cluster 3), whereas some clusters emerged due to similar feature contexts (e.g., cluster 4). In other cases, both name and context similarity contributed to the cluster creation.

2.3 Cluster Analysis

Once the feature clusters are created, we can analyze the features included in each cluster, as well as the cluster relationships. The benefits of such an analysis are threefold. First, features that belong to the same cluster are assumed to be similar. Thus, if they belong to different SPLs, it may be beneficial to manage them together or even consider their union for future development and maintenance. Second, clusters that are "tightly" connected are likely to include related features. If these features are not included in some SPLs, it may be advisable to recommend their inclusion for enhancing the specific SPLs. Finally, the clusters and their relationships may enable extracting the domain terminology. In particular, they can help map different terminologies and identify dependencies between terms or concepts, assisting in aligning the existing SPLs and supporting their potential union.

2.3.1 Internal Analysis of Clusters

Several reasons may cause features to fall into the same cluster: (1) the features are identical or almost identical (i.e., the values of the similarity measurement are almost 1).
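A runnable sketch of Listing 1 is given below, with the three linkage options. Note that, as in the listing, "distance" is the similarity value itself (higher means closer), so merging continues while the closest pair of clusters is at least as similar as th:

```python
def feature_clustering(feat_sim, th=0.88, linkage="complete"):
    """Agglomerative clustering over a similarity matrix feat_sim, given
    as a dict of dicts: feat_sim[i][j] = overall similarity of features
    i and j. Returns a list of clusters (sets of feature names)."""
    clusters = [{f} for f in feat_sim]  # one cluster per feature

    def closeness(c1, c2):
        sims = [feat_sim[i][j] for i in c1 for j in c2]
        if linkage == "single":
            return max(sims)            # closest pair of features
        if linkage == "complete":
            return min(sims)            # farthest pair of features
        return sum(sims) / len(sims)    # average-link

    while len(clusters) > 1:
        # find the two closest clusters under the selected linkage
        s, a, b = max((closeness(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if s < th:
            break                       # no pair is similar enough to merge
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters
```

With complete-link, the strictest of the three options, a cluster is only merged when all cross-pairs are similar enough, which matches the "refined division" motivation above.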
Examples of such a case are the features 'color'–'colour' in cluster 2 and 'screen'–'display' in cluster 3 of our example; (2) the features are not identical but "similar enough", potentially justifying their consideration as different specializations or variants of the same abstract feature. Examples of such a case are the features 'camera', 'mp3', and 'mp4' in cluster 5 of our example; and (3) the names of the features are different, but the context in which they are used is so similar that their overall similarity is relatively high. Examples of such a case are the features 'media' and 'extras' in cluster 4 of our example. In all these cases we may wish to recommend mutual management of these features and their associated artifacts. In other words, the inclusion of features from different feature diagrams, representing different SPLs, in the same cluster may indicate a high degree of similarity between the corresponding features, potentially calling for managing these features together. Alternatively, we may wish to name each cluster and use these names for renaming the corresponding features in the different SPLs. This way we align the different terminologies used in the SPLs and enable their potential merge for different purposes, such as interoperability. However, it is important to notice that sometimes different representations of the same fact may use different, mainly complementary, features. For example, in both feature diagrams a camera requires high resolution and no low resolution. However, this constraint is expressed differently: in Figure 2(a) it is expressed as 'camera' requires 'high resolution', while in Figure 2(b) – as 'camera' excludes 'low resolution'. In the future, we will employ mining techniques to automatically detect such situations.

2.3.2 Inter-Cluster Analysis

The features in the different clusters may be related, inducing relationships between the clusters.
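In simple cases, the complementary-constraint situation just described can already be detected mechanically. The sketch below (an illustration only, not the planned mining technique) rewrites an 'excludes' constraint into its equivalent 'requires' form when the excluded feature sits in a two-way alternative group, so that the two phrasings of the camera/resolution fact become directly comparable:

```python
def normalize_constraints(constraints, alternative_groups):
    """Rewrite ('excludes', src, dst) into the equivalent ('requires',
    src, other) when dst belongs to an alternative (xor) group with
    exactly one remaining sibling; other constraints pass through."""
    result = set()
    for kind, src, dst in constraints:
        if kind == "excludes":
            for group in alternative_groups:
                if dst in group and len(group) == 2:
                    (other,) = group - {dst}
                    result.add(("requires", src, other))
                    break
            else:
                result.add((kind, src, dst))
        else:
            result.add((kind, src, dst))
    return result
```

The rewrite is only sound for binary alternative groups; with three or more alternatives, excluding one member does not determine which of the remaining ones is required.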
In other words, two clusters are related if the features they contain are related in the input feature diagrams. However, some of the clusters may be tightly related, namely, relationships between the corresponding features exist in many of the involved SPLs, while others are loosely related or not related at all. We want to use only tightly related relationships for our cross product line analysis, as tight relationships may capture "knowledge" on the set of the given SPLs. Thus, we use the following definition of cluster relationship strength.

Definition 3 (Cluster Relationship Strength). The strength of a relationship from cluster C1 to cluster C2 is defined as the ratio between the number of SPLs involved in the relationship (i.e., the number of SPLs in which there exist features f1 and f2 such that f1∈C1, f2∈C2, and f2 is a sub-feature of f1 in the feature diagram of that SPL) and the total number of SPLs whose features appear in at least one of the clusters. Formally expressed:

Strength(C1, C2) = |SPLs involved in the relationship from C1 to C2| / |SPLs whose features appear in C1 or C2|

In our example (with only two SPLs), we found a tight relationship between clusters 4 (including 'media', 'extras') and 5 (including 'camera', 'mp3', 'mp4'), namely, the two feature diagrams include relationships between features that belong to these clusters. The relationship between a cluster that includes 'mobile phone' and a cluster that includes 'utility functions' is loose, as only one feature diagram, Figure 2(b), includes a relevant relationship. For each two clusters connected via a relationship whose strength is greater than a predefined threshold (e.g., two thirds of the involved SPLs), we can recommend, to SPLs whose features appear in the source cluster but which do not include features from the target cluster, to add appropriate features from the target cluster, as these refine features already existing in the SPLs. This way the method can improve the input feature diagrams by detailing them and increasing the possible reuse among the SPLs.
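Definition 3 can be sketched as follows (the SPL encoding, a feature set plus a set of (parent, child) edges, is an assumption made for illustration):

```python
def relationship_strength(c1, c2, spls):
    """Definition 3: the fraction of the SPLs whose features appear in
    c1 or c2 that contain a sub-feature edge from a c1 feature to a c2
    feature. Each SPL is {'features': set, 'edges': set of (parent, child)}."""
    involved = relevant = 0
    for spl in spls:
        if spl["features"] & (c1 | c2):
            relevant += 1  # this SPL's features appear in one of the clusters
            if any(p in c1 and c in c2 for (p, c) in spl["edges"]):
                involved += 1  # the SPL realizes the c1 -> c2 relationship
    return involved / relevant if relevant else 0.0
```

A strength of 1.0 (as for clusters 4 and 5 in the two-SPL example) means every relevant SPL realizes the relationship; a recommendation threshold such as two thirds then separates tight from loose relationships.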
As an example, consider clusters 3 (screen, display) and 2 (basic, color, colour, high resolution, low resolution). The relationship between these clusters can be considered tight (relationships between features that belong to these clusters exist in both diagrams). Thus, the method will recommend adding the features 'basic' and 'high resolution' under the feature 'display' in the SPL presented in Figure 2(b), and the feature 'low resolution' under the feature 'screen' in the SPL presented in Figure 2(a).

3. PRELIMINARY RESULTS

In order to evaluate the proposed method, we implemented it, assuming that the input feature diagrams are given in the common Simple XML Feature Model (SXFM) format [24]. We further used the Perl module WordNet::Similarity::wup [20] for calculating feature name similarity. A list including all the features in the input diagrams was created and sent to the module, which returned the similarity between each pair of features in the list. For implementing the feature clustering algorithm, we used the complete-link distance in order to create a refined division of features into clusters. The input feature diagrams were taken from S.P.L.O.T., an academic feature diagram repository [24]. This repository includes about 220 feature diagrams in different domains, which were extracted from academic publications and other relevant sources. The criteria for including feature diagrams in that repository, as listed on the S.P.L.O.T. web site, are: (1) Consistency: all models are guaranteed to be consistent (contain at least one valid configuration); (2) Correctness: none of the models contain dead features; and (3) Transparency: all models identify their authors (or related literature) and provide some contact information. Thus, we could assume the validity of these models (at the cost of using relatively simple examples).
We examined the domains included in the S.P.L.O.T. repository and selected the mobile phones domain, due to the existence of seven different feature diagrams in the repository and the ability to obtain relevant information for creating additional ones. We indeed modeled two additional feature diagrams based on the supplementary material we found on this domain and added some challenges to better evaluate our method. In particular, we added synonyms and antonyms, and we modeled the hierarchies of features using different nesting structures. Table 2 lists the nine feature diagrams we used in the evaluation, along with the number of features in each diagram (marked #F in the table) and the number of levels (marked #L). As can be seen, the diagrams are quite simple, since we wanted to be able to examine the method outputs manually. In the future we will evaluate the method on more complicated feature diagrams in order to examine its scalability.

To calculate the similarity threshold, we first calculated the name similarity of all pairs of features from the nine input diagrams. We then sampled 100 pairs whose name similarity values ranged from low (almost 0) to high (almost 1). We requested six human graders, who have a strong technical background and experience in mobile device architecture, to grade the similarity of each selected pair of features on a scale from 1 to 10. Based on the results, we identified different features (grades 1-7) and similar features (grades 8-10). For 96% of the feature pairs, all graders classified the pair in the same similarity category. For pairs on which no consensus was reached, we selected the similarity category of the majority.

Table 2. The feature diagrams used in the evaluation

Name                  | Creator         | #F | #L | Source
MobilePhone           | ISARDA          | 10 | 3  | model_20120110_139114401.xml
Mobile Phone          | Sergio Segura   | 20 | 4  | model_20100322_955726153.xml
Mobile Phone          | AE              | 25 | 4  | model_20101119_1472596180.xml
Phone                 | UNB             | 10 | 3  | model_20101111_1790887308.xml
Mobile Phone Example  | Lenita          | 10 | 3  | model_20120110_1719396361.xml
Mobile phone          | ETSII Rayco     | 10 | 3  | model_20120110_1094246588.xml
Cell Phone            | Sebastian Oster | 15 | 4  | model_20100308_1032655961.xml
Mobile Phone 1        | Self            | 21 | 4  |
Mobile Phone 2        | Self            | 21 | 4  |

Figure 6 shows a histogram of the number of pairs classified by humans as similar or different for each range of name similarity values (as calculated using Definition 1). As can be noticed, in the range of 0.7-0.8, many of the feature pairs whose similarity value falls into this range were classified as different by the human judges. Only around a similarity value of 0.9 were all pairs of features whose similarity is above this value classified by the human judges as similar. In order to obtain the exact similarity threshold, we used AdaBoost [11], which, as noted, is an adaptive machine learning algorithm. We chose AdaBoost because it is fast, simple, and easy to program; it has no parameters to tune (except for the number of rounds); and it requires no prior knowledge about the weak learner and thus can be flexibly combined with any method for finding weak hypotheses [12]. In addition, boosting often does not suffer from over-fitting. Therefore, the classifier found on a specific domain can be generalized and used as a pre-defined threshold for the clustering algorithm in other domains as well. The similarity threshold calculated using AdaBoost is 0.88. After running the feature clustering part of our method on the set of the (nine) aforementioned diagrams, we got 29 clusters, 14 tight relationships (namely, at least two thirds of the involved SPLs refer to them to some extent), and 15 loose relationships. Each cluster included between 1 and 19 features.
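For intuition, a single decision stump (the kind of weak learner AdaBoost typically combines) already illustrates how such a threshold can be derived from the human-graded pairs. The sketch below is a simplified stand-in for, not a reproduction of, the AdaBoost procedure used in the paper:

```python
def best_threshold(pairs):
    """Pick the similarity cut-off that best separates human-labelled
    'similar' (True) from 'different' (False) pairs. Each candidate
    threshold is tried as a decision stump: predict 'similar' iff the
    similarity value is at least the threshold."""
    candidates = sorted({sim for sim, _ in pairs})
    best, best_errors = 0.0, len(pairs) + 1
    for th in candidates:
        errors = sum((sim >= th) != label for sim, label in pairs)
        if errors < best_errors:
            best, best_errors = th, errors
    return best
```

AdaBoost iteratively reweights the misclassified pairs and combines many such stumps; on well-separated data, the two approaches land on essentially the same cut-off.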
In order to evaluate whether the cluster analysis yields reasonable recommendations, we created a questionnaire with 12 statements based on the principles described in Section 2.3. We then asked mobile phone users to grade their degree of agreement with each statement on a scale of 'completely agree', 'partially agree', and 'disagree'. The respondents could also mark that they had no sufficient information on the subject and hence could not decide on their degree of agreement with the statement ("don't know"). This way we wanted to check whether the recommendations of our method sound reasonable to users (of mobile phones). In the future we will examine other viewpoints of more technical stakeholders, such as developers and maintainers.

Figure 6. A histogram of human classification (similar vs. different) with respect to name similarity values

Six of the questions in the questionnaire were based on the internal analysis of clusters and referred to features within the same cluster. In this case, we aimed to check whether mobile phone users perceive the features as similar enough. Examples of questions in this category are:

- Various media capabilities (such as mp3, mp4, and camera) are extras to mobile phones.
- High resolution, low resolution, basic, and color are all characteristics of a mobile phone screen/display.
- In mobile phones, screen and display are synonyms.

The other six statements in the questionnaire were derived by applying the inter-cluster analysis of relationships. Here we aimed to check whether recommendations to add features that were missing in specific feature diagrams but exist in others are indeed justified.
Examples of the statements in this category are:
- A mobile phone which supports messaging services is likely to support SMS.
- Two common settings of a mobile phone are its operating system and its support for Java.
- A clock utility is a basic function in a mobile phone.

Fifty information systems students filled in the questionnaire. Four of them had worked in service departments of mobile phone companies for several years. The other 46 respondents were familiar with the domain as users for 10 years on average. 18 of the respondents evaluated their familiarity with the domain as 'very good', 22 as 'good', only 5 as 'poor', and 1 as 'unfamiliar'. The analysis of the answers reveals that overall, in about 80% of the cases the respondents agreed with the statements (in 59% of the cases they completely agreed and in 20% they partially agreed). Only in 16% of the cases did the respondents disagree with the specified statement. Dividing the statements according to their origin in the cluster analysis step, we found similar degrees of agreement for statements based on the internal analysis of clusters and for statements derived from the inter-cluster analysis of relationships (see Table 3):

Internal analysis of clusters:   59.00%  21.67%  15.33%  4.00%
Inter-cluster analysis of rel.:  59.87%  18.73%  17.39%  4.01%
Overall:                         59.43%  20.20%  16.36%  4.01%
(completely agree / partially agree / disagree / don't know)

Despite the high degree of agreement with the statements derived from the cluster analysis step, we noticed a high degree of disagreement on three specific statements (40% disagreement). The first statement claimed that in mobile phones 'screen' and 'display' are synonyms. While this is the case in WordNet (name similarity of 0.95), many respondents probably interpreted 'screen' as the physical device that is characterized by resolution, size, etc., while 'display' was interpreted as referring to the way the mobile phone visualizes applications (e.g., using different drivers).
This distinction sounds reasonable, but it did not appear in our input feature diagrams, in which both 'display' and 'screen' had similar (or even identical) sub-features. The second statement with a high degree of disagreement claimed that mobile phones with basic functions are most likely to have games. Here we believe that the respondents disagreed with the statement since they referred to different kinds of mobile phones, the simplest of which do not include games or game support at all. In our case, however, 80% of the SPLs that included the features 'utility functions' or 'basic functions' also included the features 'game' or 'play'. Finally, the third statement with a high degree of disagreement claimed that two common settings of a mobile phone are its operating system and its support for Java. Here we believe that the respondents tended to refer to settings as features (or definitions) that can be controlled by users. Indeed, both the operating system and the Java support are not features that can be modified by users, but they are relevant for the configuration of mobile phones and thus should be recommended for SPLs that do not include them. Although our results are promising, only further evaluation can indicate whether they can be generalized to other (more complicated) cases.

4. RELATED WORK: COMPARISON OF FEATURE MODELS
Comparison of feature models has been studied for different reasons [1]. Here we review the main purposes and differentiate our work from the related studies. First, comparison is done when composing feature models to support the scalability of SPLs. It is recommended to divide feature models that represent SPLs with large numbers of features and a high degree of variability [23]. Segura et al. [25] further suggest using graph transformations for automating the merge of feature models.
They present a catalogue of 30 visual, technology-independent rules that describe how to build a feature model that includes all the products represented by two given feature models. The main assumption in this category of studies is that the different feature models include identical portions that can be characterized by their feature names and attributes. These portions can serve as anchors for merging. In our case, we cannot assume any syntactical overlap between the feature models, as they may have been developed in separate domain analysis processes in different companies. Instead, we assume a set of feature diagrams, each representing a different SPL. If a single SPL is represented in several feature diagrams, a merging method needs to be applied prior to using our cross product line analysis method. Second, comparison is also done when weaving aspects into base feature models that represent SPLs. Acher et al. [2] propose two main operators to compose feature models: insert, which enables inserting features from a crosscutting feature model into a base feature model, and merge, which enables putting together features from two separate feature models when neither clearly crosscuts the other. Bošković et al. [4] describe a method, called AoFM, for facilitating the modeling of aspects in feature models. To this end, they introduce the concepts of join point models, aspects, aspect feature models, pointcut feature models, patterns, and composition rules. In particular, the patterns and composition rules describe all the changes that an aspect in AoFM can impose on a base concern, i.e., the addition and removal of feature diagram elements. Studies in this category assume the existence of composition rules or operations that can be performed on the base models when weaving a given aspect model. Furthermore, the aspect and base models are defined over the same terminology, easing the task of matching (i.e., finding similar) features.
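The name-anchoring assumption underlying these composition approaches can be made concrete with a toy union merge (a sketch only; it assumes the simple {feature: children} encoding shown here and ignores the variability constraints that approaches such as [25] preserve):

```python
def merge_by_name(fm1, fm2):
    """Union-merge two feature models encoded as {feature: set of
    children}; identically named features serve as anchors and are
    treated as the same node."""
    merged = {}
    for fm in (fm1, fm2):
        for feature, children in fm.items():
            merged.setdefault(feature, set()).update(children)
    return merged

base = {"Phone": {"Calls", "Screen"}, "Screen": {"Basic"}}
other = {"Phone": {"Calls", "Media"}, "Media": {"Camera"}}
print(merge_by_name(base, other))
```

With different terminologies (e.g., 'Screen' in one model and 'Display' in the other), no anchor is found and the merge silently keeps the two features apart, which is precisely the limitation that motivates the similarity-based matching in this work.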
In our case, we do not aim to weave or integrate the different feature diagrams, but to analyze the commonality and variability among the input SPLs (as represented in their feature diagrams) in order to recommend improvements to particular SPLs or to the maintenance and future development of a set of SPLs. Third, comparison of feature models can be done for synchronization purposes. Thüm et al. [27] propose an algorithm for automatically determining the relationship between two feature models in terms of their sets of configurations. As SPLs and their feature models evolve over time, they classify modifications of a feature model as refactorings, specializations, generalizations, or arbitrary edits. They further present an algorithm that computes the change classification of two feature models (before and after the edit). Kim and Czarnecki [17] also suggest synchronizing existing configurations of a feature model that has evolved over time. They refer to the following changes that can be performed on feature models: additions, removals, attribute changes, relocations, and cardinality changes. In all these cases, the assumption is that the feature models emerged from the same source (feature model) by applying different sequences of operations from a pre-defined list of possible ones. In our work, the feature models may represent SPLs that were developed in different departments of the same company or even in different companies (as is the case with mergers and acquisitions of companies). Finally, feature models of different SPLs may be compared and merged. In this context, Czarnecki and Eisenecker [8] suggest using dependency rules between high-level and low-level features to deal with different feature models for different product types. Czarnecki et al. [9] introduce the concept of staged configuration for composing separate feature models that capture decisions taken by different stakeholders.
Staged configuration can be achieved "either by stepwise specialization of feature models or by multi-level configuration, where the configuration choices available in each stage are defined by separate feature models". Hartmann and Trew [14] introduce the concept of a Context Variability model, which contains the primary drivers for variation. The Context Variability model constrains the feature model and enables modeling multiple SPLs for software supply chains. Hartmann and Trew further suggest how to merge multi product line feature models. In all these cases the different feature models are assumed to use the same terminology; in particular, a central underlying assumption is that similar features have the same name. Thus, the main challenges are merging and yielding valid outputs (i.e., valid merged models). In our case we did not assume the same terminology, and thus we had to apply similarity metrics and clustering techniques to identify similar features and group them accordingly. Furthermore, we are not interested in a merged model, but in recommendations for improvements to the SPL artifacts and to the overall management of the SPLs.

5. SUMMARY AND FUTURE WORK
Analysis of the commonality and variability within families of SPLs is highly important for improving their maintenance, future development, and management. Currently, different feature models are mainly merged for different purposes, assuming the usage of the same terminology in all input models and focusing on creating valid merged models. In this research we introduce a method for conducting cross product line analysis by using linguistic and structural similarity techniques for measuring feature similarity and an agglomerative hierarchical clustering technique for analyzing the commonality and variability of the examined SPLs. Preliminary results indicate that the method has potential for recommending improvements to the management of related features.
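The clustering step of the method can be sketched roughly as follows (difflib's string ratio is a crude stand-in for the paper's WordNet-based name similarity, the 0.88 threshold is reused from the evaluation, and the feature data is invented): single-link agglomerative clustering merges clusters as long as some cross-cluster pair of features exceeds the similarity threshold, and a cluster is then labelled 'tight' if at least two thirds of the SPLs contribute a feature to it.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    # crude lexical stand-in for the WordNet-based name similarity
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster_features(features, threshold=0.88):
    """Single-link agglomerative clustering of (spl, feature) pairs:
    repeatedly merge two clusters whose closest cross-pair similarity
    is above the pre-defined threshold."""
    clusters = [[f] for f in features]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(name_similarity(a[1], b[1]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters

def is_tight(cluster, n_spls):
    # 'tight' relationship: at least two thirds of the SPLs refer to it
    return len({spl for spl, _ in cluster}) >= 2 * n_spls / 3

features = [("PL1", "screen"), ("PL2", "Screen"), ("PL3", "display"),
            ("PL1", "camera"), ("PL2", "Camera")]
clusters = cluster_features(features)  # 'display' stays in its own cluster
```

Note that a purely lexical ratio cannot relate 'screen' to 'display'; the WordNet-based similarity used in the method is what makes such synonym pairs cluster together.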
Future research includes several directions. First, so far we have mainly used WordNet for similarity calculations. However, as the wide variety of domains requires broad and up-to-date corpora of terms, additional techniques for measuring semantic similarity should be evaluated (e.g., the Wikipedia-based one [13]). In addition, the impact of the different similarity types, i.e., name and context similarity, on the overall similarity needs to be examined, especially for differently structured product lines in the same domain. Second, currently the feature clustering is based on the existence of relationships and does not refer to their types. However, mandatory features may indicate a stronger relationship than optional features, as the former point to essential characteristics of the product line, while the latter may point "only" to possible added values. In addition, dependencies should be analyzed as well. Third, the usability of the method can be improved by presenting recommendations for changes to each examined SPL. This can be done by introducing mining techniques for automatically analyzing the clustering results. Fourth, the efficiency, and especially the scalability, of the method should be further evaluated. In particular, as the method's current bottleneck is the agglomerative clustering algorithm³, further clustering techniques should be evaluated, weighing the quality of their results against their time complexity. Finally, additional evaluations of the method are required. In particular, additional, more complicated domains need to be examined, and different sources of feature diagrams (besides the S.P.L.O.T repository) need to be explored. The evaluation also needs to consider different points of view besides that of users, such as those of developers and maintainers.
³ The time complexity of the agglomerative clustering algorithm varies, depending on the selected distance type, from O(n^2) for single-link to O(n^2 log n) for complete-link [19], where n is the number of features in all input diagrams.

REFERENCES
[1] Acher, M., Collet, P., Lahire, P., France, R. (2010) Comparing approaches to implement feature model composition. Modelling Foundations and Applications, pp. 3-19.
[2] Acher, M., Collet, P., Lahire, P., France, R. (2009) Composing feature models. 2nd International Conference on Software Language Engineering (SLE'09), LNCS 5969, pp. 62-81.
[3] Benavides, D., Segura, S., Ruiz-Cortés, A. (2010) Automated analysis of feature models 20 years later: a literature review. Information Systems 35 (6), pp. 615-636.
[4] Bošković, M., Mussbacher, G., Bagheri, E., Amyot, D., Gašević, D., Hatala, M. (2011) Aspect-oriented feature models. Models in Software Engineering, pp. 110-124.
[5] Budanitsky, A., Hirst, G. (2006) Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics 32 (1), pp. 13-47.
[6] Chen, L., Ali Babar, M., Ali, N. (2009) Variability management in software product lines: a systematic review. Proceedings of the 13th International Software Product Line Conference, pp. 81-90.
[7] Clements, P., Northrop, L. (2001) Software Product Lines: Practices and Patterns. Addison-Wesley.
[8] Czarnecki, K., Eisenecker, U.W. (2000) Generative Programming: Methods, Tools, and Applications. Addison-Wesley, Boston.
[9] Czarnecki, K., Helsen, S., Eisenecker, U. (2005) Staged configuration through specialization and multilevel configuration of feature models. Software Process: Improvement and Practice 10 (2), pp. 143-169.
[10] Dao, T.N., Simpson, T. (2005) Measuring similarity between sentences. opensvn repository.
[11] Freund, Y., Schapire, R.E. (1997) A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55, pp. 119-139.
[12] Freund, Y., Schapire, R., Abe, N. (1999) A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence 14, pp. 771-780.
[13] Gabrilovich, E., Markovitch, S. (2007) Computing semantic relatedness using Wikipedia-based explicit semantic analysis. Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 1606-1611.
[14] Hartmann, H., Trew, T. (2008) Using feature diagrams with context variability to model multiple product lines for software supply chains. IEEE Software Product Line Conference (SPLC'08), pp. 12-21.
[15] Jain, A.K. (2010) Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31 (8), pp. 651-666.
[16] Kamvar, S.D., Klein, D., Manning, C.D. (2002) Interpreting and extending classical agglomerative clustering algorithms using a model-based approach. Proceedings of the 19th International Conference on Machine Learning, pp. 283-290.
[17] Kim, C.H.P., Czarnecki, K. (2005) Synchronizing cardinality-based feature models and their specializations. ECMDA-FA'05, LNCS 3748, pp. 331-348.
[18] Kurita, T. (1991) An efficient agglomerative clustering algorithm using a heap. Pattern Recognition 24 (3), pp. 205-209.
[19] Manning, C.D., Raghavan, P., Schütze, H. (2008) Introduction to Information Retrieval. Cambridge University Press, Cambridge, Vol. 1, pp. 378-395.
[20] Pedersen, T., Patwardhan, S., Michelizzi, J. (2004) WordNet::Similarity: measuring the relatedness of concepts. Demonstration Papers at HLT-NAACL 2004, Association for Computational Linguistics, pp. 38-41.
[21] Perumal, P., Nedunchezhian, R. (2011) Performance analysis of the standard k-means clustering algorithm on clustering TMG-format document data. International Journal of Computer Applications in Engineering Sciences I (IV), pp. 406-412.
[22] Rasmussen, E. (1992) Clustering algorithms. Information Retrieval: Data Structures and Algorithms, pp. 419-442.
[23] Reiser, M.O., Weber, M. (2007) Multi-level feature trees: a pragmatic approach to managing highly complex product families. Requirements Engineering 12 (2), pp. 57-75.
[24] S.P.L.O.T - Software Product Lines Online Tools, http://www.splot-research.org/.
[25] Segura, S., Benavides, D., Ruiz-Cortés, A., Trinidad, P. (2008) Automated merging of feature models using graph transformations. GTTSE'07, LNCS 5235, pp. 489-505.
[26] Steinbach, M., Karypis, G., Kumar, V. (2000) A comparison of document clustering techniques. KDD Workshop on Text Mining, pp. 525-526.
[27] Thüm, T., Batory, D., Kästner, C. (2009) Reasoning about edits to feature models. IEEE International Conference on Software Engineering (ICSE'09), pp. 254-264.
[28] Wikipedia: Sony Mobile Communications, http://en.wikipedia.org/wiki/Sony_Mobile_Communications.
[29] WordNet: a lexical database for English, http://wordnet.princeton.edu/.
[30] Wu, Z., Palmer, M. (1994) Verbs semantics and lexical selection. Association for Computational Linguistics, pp. 133-138.