Business Component Identification - A Formal Approach
Hemant Jain
University of Wisconsin – Milwaukee
Naresh Chalimeda
Navin Ivaturi
Balarama Reddy
Tata Consultancy Services
Abstract
Component Based Software Development is carried
out in two phases: Component Building and Application
Assembly. The key to building business components is a
formal approach for identifying the components. This
paper describes such an approach, which assists in
identifying the components from an Analysis Level
Object Model representing a business domain. The
approach makes use of a clustering algorithm, certain
constraints, a predefined rule and a set of heuristics. The
approach has been implemented in a tool named
‘CompMaker’ and was used for identifying components
for an auto insurance claims domain.
1. Introduction
Component Based Software Development (CBSD) is
likely to revolutionize the process of building
applications. It advocates an approach whereby
applications would be assembled from pre-built parts
known as business components. A business component is
the software implementation of an autonomous business
concept or business process [3]. Thus, in CBSD the
process of building a business application can be
considered to consist of two stages: Component
Fabrication (building the Business Components) and
Application Assembly (building a Business Application
from Components). The Component Fabrication stage of
CBSD consists of various phases: Domain Analysis and
Modeling, Component Identification, Component Design
& Implementation, Acceptance and Roll Out &
Deployment. This paper focuses on the ‘Identification
Phase’ of Business Component Fabrication. The
Component Identification phase groups closely related
classes of a business domain into components.
2. Business Component Fabrication
The goal of the fabrication process is to design
business components that can be reused within the same
domain and may possibly be reused across domains. The
challenge for the designer is to identify components that
can be developed in a cost-effective manner, are suitable
for reuse, are easy to assemble into applications and to
maintain, and allow the end application to be customized
through proper selection and assembly of components.
Identifying reusable artifacts is recognized as one of
the greatest difficulties in classical software reuse [1].
Although the design issues of traditional reusable
software artifacts such as code are discussed in the
literature [8], the design issues of reusable business
components are not adequately addressed [7,11].
Business components differ from traditional software
artifacts, and therefore the design process must account
for those differences. For instance, traditional reusable
artifacts (e.g., code segments, objects, etc.) are mostly
fine-grained and portray a low-level technical-oriented
representation of the domain. Components on the other
hand are more coarse-grained and are intended to
provide a high-level business-oriented representation of
the domain. The fine-grained technical-oriented nature
of traditional reusable artifacts such as objects prevents
managers from working with them effectively.
However, the coarse-grained business-oriented approach
in components allows managers to identify the
components that satisfy their business requirements, and
subsequently assemble them into full-scale business
applications. In addition to granularity, the following
key differences between components and traditional
reusable artifacts have been identified [2,6,11]:
1. A component is a self-contained executable program
that provides a specific service.
Proceedings of the Fifth International Enterprise Distributed Object Computing Conference (EDOC’01)
0-7695-1345-X/01 $10.00 © 2001 IEEE
2. A component has an interface, which is used to
communicate with other components.
3. A component could be used in a context that is
unanticipated by its initial designers.
Hence, the design of components requires a unique
perspective. This paper presents a formal approach for
business component design. The approach supports the
component designer in making trade-offs between
multiple conflicting managerial goals, such as the
reduced development cost and increased reusability
identified above.
The next section describes a formal approach for
component identification.
3. Business Component Identification
This is a crucial phase in the overall Component
Fabrication process. The approach proposed for
identifying business components uses an analysis level
domain model as input. We assume that the domain
modeling has been done using an object-oriented
approach. Thus, the domain model represents significant
object classes (using UML notation), the structural
relationships between object classes, and the use cases
and sequence/interaction diagrams representing the
dynamic relationships between the classes. A clustering
approach is used to obtain an initial set of components.
Consideration of supertype-subtype relationships and a
set of heuristics enhance and refine the solution obtained
from the clustering algorithm. The alternative solutions
are evaluated based on the managerial goals, measured in
terms of technical characteristics such as coupling,
cohesion, complexity, etc. [12]. The approach is
described in detail in the following sub-sections.
3.1. The Clustering Algorithm
The process of component identification begins by
grouping related classes of an analysis level domain
model. A clustering approach is used to arrive at the
initial grouping. Clustering approaches can be classified
as hierarchical or non-hierarchical. Hierarchical
clustering techniques are further divided into
Agglomerative and Divisive techniques. An
Agglomerative method involves a series of successive
mergers whereas a Divisive method involves a series of
successive divisions [5].
The approach proposed here makes use of a
Hierarchical Agglomerative clustering algorithm for
grouping the classes of the analysis level domain model.
The strength of the relationships (static and dynamic)
between the classes of the domain model is used as the
basis for clustering the classes. The technique proceeds
through a series of successive binary mergers
(agglomerations), initially of individual entities (classes)
and later of clusters formed during the previous stages.
The classes having the highest relationship strengths are
grouped first. The process continues until a cut-off
point is reached. The process of computing relationship
strength is described next.
3.2. Computing Class Relationship Strength
The clustering algorithm groups the classes on the
basis of the strength of relationships between classes.
For computing strength of relationships between classes,
static and dynamic relationships are used. Static
relationships [9] are computed based on the associations
between classes and the dynamic relationships are
computed based on use cases and sequence diagrams.
The static relationship represents the way various
classes are related to each other. The use of static
relationship in the clustering process ensures that only
the related classes are clustered together. On the other
hand, the dynamic relationship represents the way various
classes interact through messaging to support various
business processes. Use cases and the corresponding
sequence diagrams are used as a basis for computing
dynamic relationship between classes. Use cases are
assigned relative weights based on their importance to
the domain. The importance to the domain can be based
on the criticality of the business process supported by the
use case, its frequency, or any other considerations. The
total relationship strength between a pair of classes is
computed as follows:
Consider a scenario in which Class i and Class j are
structurally related and are used in one or more use
cases, representing dynamic relationship between them.
The strength of the static relationship (Sij) between
classes i and j can be defined as:
Sij = Ws × Nij
Where
Ws = the static association weight.
Nij = the total number of associations between class i
and j.
The strength of the dynamic relationship (Dij)
between classes i and j is defined as:
$$D_{ij} = \sum_{p \in P} U_{pi} \, U_{pj} \, W_p \, V_{ijp}$$
Where,
P = Set of use cases
Upi = 1 if use case p needs class i,
0 if use case p does not need class i
(Upj is defined in the same way for class j)
Wp = Weight assigned to use case p
Vijp = Number of messages between class i & j in use
case p
Sij and Dij are each scaled to the range 0 to 1. The
designer has the option of assigning relative importance
(RI) to the static and dynamic relationships. The total
strength (TSij) of the relationship between two classes is
computed as:
TSij = ( RIs * Sij + RId * Dij)
Where, RIs + RId = 1.0
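The three formulas above can be traced with a small sketch. The Python below is illustrative only and is not taken from CompMaker: the function names, the dictionary layout for associations and use cases, and the class names are assumptions. The text says only that Sij and Dij are scaled to the range 0 to 1; dividing by the largest observed value is one possible choice and is assumed here.

```python
from itertools import combinations


def scale(values):
    """Scale strengths to [0, 1] by dividing by the largest value (an assumption)."""
    top = max(values.values(), default=0)
    return {pair: (v / top if top else 0.0) for pair, v in values.items()}


def total_strength(classes, associations, use_cases, ws=1.0, ri_s=0.5, ri_d=0.5):
    """Return {pair: TSij} for every unordered pair of (uniquely named) classes."""
    if abs(ri_s + ri_d - 1.0) > 1e-9:
        raise ValueError("RIs + RId must equal 1.0")
    pairs = [frozenset(p) for p in combinations(classes, 2)]
    # Sij = Ws * Nij
    static = {p: ws * associations.get(p, 0) for p in pairs}
    # Dij = sum over use cases p of Upi * Upj * Wp * Vijp
    dynamic = {}
    for pair in pairs:
        i, j = tuple(pair)
        dynamic[pair] = sum(
            (i in uc["classes"]) * (j in uc["classes"])
            * uc["weight"] * uc["messages"].get(pair, 0)
            for uc in use_cases
        )
    s, d = scale(static), scale(dynamic)
    # TSij = RIs * Sij + RId * Dij
    return {p: ri_s * s[p] + ri_d * d[p] for p in pairs}


# Hypothetical three-class model with two weighted use cases.
CP = frozenset({"Claim", "Policy"})
PV = frozenset({"Policy", "Vehicle"})
CV = frozenset({"Claim", "Vehicle"})
associations = {CP: 2, PV: 1}
use_cases = [
    {"weight": 3, "classes": {"Claim", "Policy"}, "messages": {CP: 4}},
    {"weight": 1, "classes": {"Claim", "Policy", "Vehicle"},
     "messages": {CP: 1, PV: 2}},
]
ts = total_strength(["Claim", "Policy", "Vehicle"], associations, use_cases)
```

In this example the Claim-Policy pair has both the most associations and the most weighted messages, so it receives the highest total strength and would be the first candidate for merging.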
Another factor called ‘Threshold Limit’ of the
relationship strengths is also used during the clustering
process. Threshold Limit denotes the stage at which the
clustering algorithm puts an end to the series of
successive mergers of classes. The designer can assign a
value to this factor, thereby indicating the point at which
the algorithm needs to stop the clustering process. The
clustering process can also be constrained by defining
the ‘Minimum number of components desired’ and the
‘Maximum number of classes that are allowed in a
component’.
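A minimal sketch of how the Threshold Limit and the two constraints could control a hierarchical agglomerative procedure is given below. It is not the CompMaker implementation. The paper does not state how the strength between two multi-class clusters is derived from the pairwise TSij values, so average linkage is assumed, and the function and parameter names are hypothetical.

```python
def cluster(classes, strength, threshold=0.0, min_components=1, max_classes=None):
    """Greedy agglomerative clustering over pairwise strengths.

    strength maps frozenset({i, j}) to TSij; missing pairs count as 0.
    """
    clusters = [frozenset([c]) for c in classes]

    def link(a, b):
        # Average linkage (assumption): mean TS over all cross-cluster pairs.
        total = sum(strength.get(frozenset({x, y}), 0.0) for x in a for y in b)
        return total / (len(a) * len(b))

    # Stop once the minimum number of components has been reached.
    while len(clusters) > min_components:
        candidates = [
            (link(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j and (max_classes is None or len(a) + len(b) <= max_classes)
        ]
        if not candidates:
            break  # every remaining merger would exceed max_classes
        best, i, j = max(candidates)
        if best < threshold:
            break  # strongest remaining link is below the Threshold Limit
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Illustrative strengths for four hypothetical classes.
strength = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"C", "D"}): 0.8,
    frozenset({"B", "C"}): 0.3,
}
result = cluster(["A", "B", "C", "D"], strength, threshold=0.5)
```

With a threshold of 0.5 the two strong pairs are merged first, and the weak B-C link is not enough to join the resulting clusters.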
3.3. Enhancement of the Clustering Solution
Placing the classes that are related through
‘inheritance’ in a single component can enhance the
component identification solution obtained from the
clustering algorithm.
Taking the technical characteristics of the component
design into consideration, one has to strive for tight
cohesion within a component and loose coupling
between components. Cohesion refers to the strength of
association between elements (classes) in a component
[12]. On the other hand, coupling refers to the extent to
which classes within the component relate to other
classes, which are not in that component [12]. If there is
inheritance between classes, then it is more appropriate
to place those classes in the same component because of
the strong relationship (cohesion) between them. If the
classes related through inheritance were distributed
across components, then it would result in an increase in
dependency (coupling) between components. The ideal
scenario is one in which the cohesion within a
component is maximized and the coupling between
components is minimized. In this approach, we replicate
each superclass by adding it to every component that
contains one or more of its subclasses.
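The replication rule can be sketched as follows. This is an illustration under stated assumptions rather than the tool's code: single inheritance is assumed, superclasses are replicated transitively (the text does not say whether indirect superclasses are included), and the function and class names are hypothetical.

```python
def replicate_superclasses(components, parent_of):
    """Copy each superclass into every component that holds one of its subclasses.

    components: list of sets of class names
    parent_of:  {subclass: superclass} (single inheritance assumed)
    """
    result = []
    for comp in components:
        enriched = set(comp)
        for cls in comp:
            ancestor = parent_of.get(cls)
            # Walk up the hierarchy; stop at the root or at a class already present.
            while ancestor is not None and ancestor not in enriched:
                enriched.add(ancestor)
                ancestor = parent_of.get(ancestor)
        result.append(enriched)
    return result


# Hypothetical clustering result and inheritance relationships.
components = [{"AutoClaim", "Adjuster"}, {"TheftClaim"}, {"Policy"}]
parent_of = {"AutoClaim": "Claim", "TheftClaim": "Claim"}
enhanced = replicate_superclasses(components, parent_of)
```

Here the superclass Claim ends up in both components that contain one of its subclasses, while the third component is unchanged.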
3.4. Evaluation of Solution
Vitharana [12] identified five managerial goals of
the component developer (cost effectiveness, ease of
assembly, customization, reusability and maintainability)
and five technical features of component design
(coupling, cohesion, number of components, size of
component and complexity) that are closely related to the
managerial goals. He identified the relationship
coefficients between the technical features and
managerial goals from a survey. We adopt Vitharana’s
model for evaluating the component identification
solutions.
3.5. Heuristics
The Component Identification approach makes use of
a set of heuristics for further refining the initial solution
obtained from the clustering algorithm. The following
two types of heuristics are supported:
• Automated
• Manual
Automated Heuristics: These heuristics are performed
by the system when the designer opts for them. Amongst
the automated heuristics, the various options available to
the designer are: Add heuristics, Move heuristics and
Exchange heuristics.
Each of these heuristics is
described here.
Add heuristics: In this type of heuristics, redundant
assignment of classes to multiple components is used to
arrive at a more desirable solution. At each iteration, a
class is added to a component and the solution is
evaluated in terms of the managerial goals associated
with it. Since the evaluation model contains multiple
conflicting objectives, a set of non-dominated solutions
is generated and presented to the designer. The process
is similar to the one used in [4]. Figure 1 depicts an
iteration of add heuristics.
Figure 1. Add Heuristics
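The notion of a non-dominated solution can be made concrete with a short sketch. A solution is dominated if some other solution is at least as good on every managerial goal and strictly better on at least one. The code is illustrative: the function names are hypothetical, and the goal values are invented for the example rather than taken from the paper.

```python
def dominates(a, b, minimize=frozenset()):
    """True if goal vector a is no worse than b on every goal and better on one.

    Higher values are better, except at the indices listed in `minimize`.
    """
    better_or_equal = True
    strictly_better = False
    for k, (x, y) in enumerate(zip(a, b)):
        if k in minimize:
            x, y = -x, -y  # flip the sign so that "higher is better" holds
        if x < y:
            better_or_equal = False
        elif x > y:
            strictly_better = True
    return better_or_equal and strictly_better


def non_dominated(solutions, minimize=frozenset()):
    """Keep only the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(t, s, minimize) for t in solutions)
    ]


# Invented goal vectors: (development effort, reusability).
candidates = [(8.0, 5.0), (7.5, 5.0), (7.8, 5.6), (8.2, 5.5)]
front = non_dominated(candidates, minimize={0})
```

Only the two candidates that trade lower development effort against higher reusability remain in `front`; the designer then chooses between them.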
Move Heuristics: In this type of heuristics, a class from
a component is moved to another component during each
iteration. The managerial goal values are computed after
every iteration. As in the case of Add heuristics, only the
non-dominated solutions are displayed. Figure 2 depicts
an iteration of Move heuristics. During the iteration,
Class A is moved from Component 1 to Component 2.
Unlike Add heuristics, classes are not redundantly
assigned to components.
Exchange Heuristics:
This heuristic operates by
making even exchanges of classes between components.
During an iteration of Exchange heuristics, a class from a
component is exchanged with a class from another
component. Figure 3 depicts the exchange of Classes A
and X between Components 1 and 2 respectively.
Figure 2. Move Heuristics
Figure 3. Exchange Heuristics
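Each automated heuristic can be viewed as generating candidate solutions from the current one, each of which would then be evaluated against the managerial goals. The generators below are a sketch of that idea, not the CompMaker code. The function names are hypothetical, the move and exchange generators assume that no class appears in more than one component, and a move may leave a component empty.

```python
def add_neighbors(components):
    """Yield solutions in which one class is copied into another component."""
    for src, source in enumerate(components):
        for cls in sorted(source):
            for dst, target in enumerate(components):
                if dst != src and cls not in target:
                    new = [set(c) for c in components]
                    new[dst].add(cls)  # redundant assignment: cls stays in src
                    yield new


def move_neighbors(components):
    """Yield solutions in which one class is moved to another component."""
    for src, source in enumerate(components):
        for cls in sorted(source):
            for dst in range(len(components)):
                if dst != src:
                    new = [set(c) for c in components]
                    new[src].remove(cls)
                    new[dst].add(cls)
                    yield new


def exchange_neighbors(components):
    """Yield solutions in which two components swap one class each."""
    for i in range(len(components)):
        for j in range(i + 1, len(components)):
            for a in sorted(components[i]):
                for b in sorted(components[j]):
                    new = [set(c) for c in components]
                    new[i].remove(a)
                    new[i].add(b)
                    new[j].remove(b)
                    new[j].add(a)
                    yield new


# Two components, as in the figures: Component 1 = {A, B}, Component 2 = {X}.
current = [{"A", "B"}, {"X"}]
adds = list(add_neighbors(current))
moves = list(move_neighbors(current))
swaps = list(exchange_neighbors(current))
```

For example, `moves` contains the solution in which Class A has been moved from Component 1 to Component 2, and `swaps` contains the one in which Classes A and X have been exchanged.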
Manual Heuristics: Unlike automated Heuristics, which
are performed by the system, manual heuristics are
carried out by the designer (or any other person who
possesses the domain knowledge). If the designer feels
that a particular class is more appropriate in another
component, he/she can move the class to that component.
The manual heuristic is designed to give the designer an
opportunity to fine-tune the components.
4. Implementation of the Approach
The research team at the University of Wisconsin,
Milwaukee, has used the above approach to build a
component identification tool, ‘CompMaker’. The
research program is a joint collaboration between Tata
Consultancy Services (TCS), Asia’s largest software
consultancy firm, and the University of Wisconsin,
Milwaukee.
4.1. The CompMaker Tool
CompMaker is a Java-based application, built
using JBuilder Version 3.5 in a Windows NT
environment. The steps involved in using this tool are
briefly described below:
• Initially, a UML-based Object Model representing
the domain under consideration is developed. This
model comprises use case diagrams, sequence
diagrams and class diagrams.
• The model data are extracted by executing a script
in the object-modeling tool (Adex Modeling
Framework) [10].
• Once the component identification tool opens the
model, it displays all the use cases that are present
in the model and allows the user to assign weights
to the use cases.
• Other weights required by the model and
constraints are then specified.
• The clustering algorithm is then run, and the initial
solution, evaluated in terms of managerial goals, is
displayed.
• The user can choose to apply automated heuristics
to further refine the initial solution obtained from
the clustering algorithm.
• Once a set of alternate non-dominated solutions is
obtained, the user can modify the solution
manually.
• Any solution thus obtained can be saved and
retrieved at a later stage.
4.2. Application of the Approach to an Auto-Insurance Claims System
The component identification approach was applied
to the auto-insurance claims domain. A team of domain
experts from TCS developed the object model for this
domain. The model contained 57 classes and 8 use cases.
Each use case was assigned a weight based on its
importance, as determined by a TCS expert. Equal weights
were assigned to static and dynamic relationships. The
cluster constraints, namely the minimum number of
components and the maximum number of classes in a
component, were assigned values of 20 and 3,
respectively.
The clustering algorithm was then executed. The
enhanced version of the initial component identification
solution was displayed after incorporating the
‘inheritance rule’ described above. The solution
contained 26 components, and the managerial goals,
represented on a ten-point scale, were computed (higher
values are more desirable, except for development effort,
where lower values are better). The values of the goals
obtained are shown in the first row of Table 1.
Heuristics were used to further refine this solution.
Exchange heuristics were applied first. The second row of
Table 1 shows the solution obtained. We see that the
values are better in terms of development effort. Note
that the managerial goal values provide a relative
comparison between solutions. Thus, all other values
being equal, a solution with reusability of 8 is better than
a solution with reusability of 7. The solution was then
subjected to move heuristics. This resulted in a set of
non-dominated solutions. The values for the selected
solution are shown in the third row of Table 1. In this
solution one can see that, although the cost has increased
slightly, all the other values have improved.
Table 1. Component Identification Solutions

Solution             Dev. Efforts  Ease of Assembly  Customization  Reusability  Maintainability
Initial Solution     8.17          5.52              5.93           5.48         8.48
Exchange Heuristics  8.15          5.53              5.93           5.48         8.48
Move Heuristics      8.16          5.60              5.96           5.52         8.51
Add Heuristics       8.10          5.64              5.96           5.52         8.48

In the next step, add heuristics were applied. From
the resulting set of non-dominated solutions, a preferred
solution was selected; its managerial goal values are
shown in the fourth row of Table 1. This solution
improves on two factors, cost and ease of assembly, at
the expense of maintainability. Overall the solution
seems to have improved. Manual heuristics were
performed by moving classes from one component to
another where it seemed more appropriate.
This procedure was repeated with a different set of
values for the clustering constraints: a minimum of 10
components and a maximum of 6 classes per component.
The solution yielded 20 components.
The TCS expert judged the final solution to be a
satisfactory design and found the tool useful.
5. Conclusion
There is a dearth of literature that
discusses a standard methodology for identifying
components from a set of classes. The component
identification approach discussed in this paper represents
a formal approach to identifying components. Such an
approach is likely to enhance the subsequent phase of
component assembly.
References
[1] Apte, U. and C. S. Sankar, “Reusability-based Strategy for
Development of Information Systems: Implementation
Experience of a Bank,” MIS Quarterly, 14, 4 (December 1990),
421-433.
[2] Brown, A. W. and K. C. Wallnau, “Engineering of
Component-Based Systems,” in A. W. Brown (Ed.), Selected
Papers from the Software Engineering Institute, IEEE
Computer Society Press, Los Alamitos, CA, 1996.
[3] Herzum, P. and O. Sims, Business Component Factory: A
Comprehensive Overview of Component-Based Development
for the Enterprise, John Wiley & Sons, Inc., 2000.
[4] Jain, H., “A Comprehensive Model for the Design of
Distributed Computer Systems,” IEEE Transactions on Software
Engineering, Vol. SE-13, No. 10, October 1987.
[5] Johnson, R. and D. Wichern, Applied Multivariate
Statistical Analysis, Prentice-Hall, Inc., 1998.
[6] Kythe, D. K., “The Promise of Distributed Business
Components,” AT&T Technical Journal, Vol. 75, No. 2,
March/April 1996, pp. 20-28.
[7] Makrygiannis, N., “Toward Mass-Customized Information
Systems,” in T. Jell (Ed.), Component-based Software
Engineering, Cambridge University Press, Cambridge, UK,
1998.
[8] Mili, H., F. Mili, and A. Mili, “Reusing Software: Issues
and Research Directions,” IEEE Transactions on Software
Engineering, Vol. 21, No. 6, June 1995, pp. 528-561.
[9] Rosenberg, D. and K. Scott, Use Case Driven Object
Modeling with UML: A Practical Approach, Addison Wesley
Longman, Inc., 1999.
[10] Tata Consultancy Services, ADEX modeling Framework,
Version 1.7, 1999.
[11] Szyperski, C., Component Software: Beyond Object-Oriented Programming, ACM Press, New York, 1998.
[12] Vitharana, P., Designing and Managing Reusable
Business Components, Ph.D. Dissertation, University of
Wisconsin-Milwaukee, 2000.