A SOA-ENABLED
ENTERPRISE CONTENT MANAGEMENT SYSTEM
A thesis proposal
Presented to the
Department of Information and Computing Sciences
Utrecht University
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
in
Computer Science
by
Lee Provoost
February 2006
UTRECHT UNIVERSITY
The Undersigned Faculty Committee Approves the
thesis proposal of Lee Provoost:
A SOA-Enabled
Enterprise Content Management System
prof. dr. S. D. Swierstra, Chair
Software Technology Group
dr. A. Bijlsma, Thesis Supervisor
Software Technology Group
prof. dr. J. van den Berg, Thesis Supervisor
Content and Knowledge Engineering Group
Approval Date
ABSTRACT OF THE THESIS PROPOSAL
A SOA-Enabled
Enterprise Content Management System
by
Lee Provoost
Master of Science in Computer Science
Utrecht University, 2006
Enterprise Content Management (ECM) systems are widely recognized as a key asset in
today’s companies. Such systems are necessary to streamline and manage the flow of huge
amounts of data inside a corporate environment. Every company changes or evolves during its
lifecycle, and Service-Oriented Architecture (SOA) tries to address this problem of change.
These “Adaptive Enterprises” (as described by HP) could benefit from an Enterprise Content
Management system built upon the principles of SOA. This master’s thesis tries to design an
architecture for such a system. However, the field of SOA is very young, and many of its
technologies and standards are in their infancy and still need a lot of work to reach
production status. As part of this project, I’ll try to address several of these issues.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
CHAPTER
1 INTRODUCTION
  1.1 Short description of thesis project
  1.2 Purpose of this document
2 CONTEXT
  2.1 Enterprise Content Management System
    2.1.1 Definition
    2.1.2 Implementations and status report
  2.2 Service-Oriented Architecture
    2.2.1 Definition
    2.2.2 Value of a SOA architecture
    2.2.3 Implementations and status report
  2.3 Business Process Execution Language
    2.3.1 Definition
    2.3.2 Implementations and status report
  2.4 Human Based Web Service
    2.4.1 Definition
    2.4.2 Implementations and status report
  2.5 Enterprise Service Bus
    2.5.1 Definition
    2.5.2 Implementations and status report
  2.6 Service Component Architecture
    2.6.1 Definition
    2.6.2 Implementations and status report
    2.6.3 Considerations
3 THESIS PROJECT
  3.1 Research Questions
  3.2 Own Contribution
    3.2.1 Participate in open source projects
    3.2.2 SOA - ECMS Framework
    3.2.3 Master thesis
  3.3 Approach
    3.3.1 Maintain a public blog
    3.3.2 Participate in discussions
    3.3.3 Share intermediate results
  3.4 Project Deliverables
    3.4.1 “Must have” (Priority 1)
    3.4.2 “Great to have” (Priority 2)
    3.4.3 “Nice to have” (Priority 3)
  3.5 Time Planning
  3.6 Conclusion
BIBLIOGRAPHY
LIST OF FIGURES
Figure 2.1 AIIM’s ECM architecture
Figure 2.2 A simplified SOA architecture
Figure 2.3 IBM SOA software stack
Figure 2.4 Simplified ESB Architecture
CHAPTER 1
INTRODUCTION
1.1 SHORT DESCRIPTION OF THESIS PROJECT
This master’s thesis project tries to design an Enterprise Content Management (ECM)
system according to the principles of Service-Oriented Architecture (SOA). While there are
several commercial companies working on service-enabling their software stack, there is no
ECM that has been built from scratch with the ideas of SOA in mind. The field of SOA is very
young and most technologies and standards are still in their infancy and still require lots of
effort to become production-ready. The goal of this project is to identify weak points in the
current standards and implementations and try to come up with solutions for them. At the end,
a prototype of an open-source SOA-enabled ECM system should be delivered.
1.2 PURPOSE OF THIS DOCUMENT
This thesis proposal describes the context and the project of my master’s thesis.
It first describes the technologies and standards that I will need in my work. This is
important because definitions are often vague and everyone tends to have their own
understanding of what a certain technology means. I don’t want to pretend that my definitions
are the perfect ones, but they serve as a basis for a common understanding between my
supervisors and me. Secondly, I’ll explain what I am going to do during my thesis project.
I’ll clarify my research questions, my approach, my contributions to the field and my project
deliverables, and I will give a time schedule.
CHAPTER 2
CONTEXT
This chapter introduces the main concepts that I will use in my thesis. I will start
with my understanding of Service-Oriented Architecture and Enterprise Content Management,
because these two terms are quite vague and everybody seems to have their own definition.
Then I will take a closer look at some other technologies that I might need; during my thesis
I’ll decide which of them I will use. For each technology, some implementations will be
discussed, with a status report on whether each is still in beta or already a mature product.
That report can serve as a basis for identifying needs that we can try to address as part of
the thesis project.¹
2.1 ENTERPRISE CONTENT MANAGEMENT SYSTEM
Definition from the Association for Information and Image Management (AIIM)²:
”Enterprise Content Management is the technologies used to Capture, Manage, Store,
Preserve and Deliver content and documents related to organizational processes. ECM tools
and strategies allow the management of an organization’s unstructured information, wherever
that information exists. The ECM industry provides information management solutions to help
users: guarantee business continuity; enable employee, partner, and customer collaboration;
ensure legal and regulatory compliance and reduce costs through process streamlining and
standardization.” [1]
2.1.1 Definition
Before explaining each part of the AIIM definition given above, let me first introduce
the terms Content Management System (CMS) [49] and Document Management System (DMS) [47].
There is a lot of confusion about these terms: they are often used interchangeably with each
other and with the term Enterprise Content Management (ECM) [50]. After the definitions of
CMS and DMS, we’ll take a closer look at what an ECM is.
¹ I should note that the technologies that I will use in my final work are not limited to the
ones described in this chapter. In my final work, some other technologies might be introduced.
² Also known as the Enterprise Content Management Association.
2.1.1.1 CONTENT MANAGEMENT SYSTEM
According to the CMS Wiki [46], the history of electronic Content Management Systems
dates back to the mid-1970s, when electronic publishing was done on mainframe systems.
Nowadays, when people talk about a CMS, they are usually referring to web-based Content
Management Systems, which are only a specific subset of Content Management Systems.
Apparently CNET already had an internally developed Web Content Management System in 1995,
which was spun off a bit later into Vignette (now a leading ECM vendor).
The CMS Wiki defines several core concepts:
1) Content needs to be stored in a repository. That can be either on the file system or
in a database.
2) WYSIWYG³ editing of content.
3) Workflow automation of tasking sequences and business rules.
4) Content check-in and check-out
5) Version control
6) Linking of content with each other
7) Metadata of content (e.g. date changed, author, editors, etc.)
8) Reuse of content in other content
9) Multi-channel Delivery (print, PDF, PDA, cell phone, etc.)
10) Personalization
11) Multiple languages and localization
12) Separation of creation, management and delivery of content.
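To make concepts 4) and 5) concrete, here is a minimal sketch of check-in/check-out with version control. It is only an illustration: the names `Document`, `Repository`, `check_out` and `check_in` are my own assumptions and do not come from the CMS Wiki or from any existing product, and the repository is kept in memory instead of on a file system or in a database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """A content item with its version history (hypothetical model)."""
    name: str
    versions: list = field(default_factory=list)  # oldest version first
    locked_by: Optional[str] = None                # user holding the check-out


class Repository:
    """In-memory stand-in for a file-system or database repository."""

    def __init__(self):
        self._docs = {}

    def add(self, name, content, author):
        doc = Document(name)
        doc.versions.append({"content": content, "author": author})
        self._docs[name] = doc

    def check_out(self, name, user):
        # Check-out: lock the document so nobody else can change it.
        doc = self._docs[name]
        if doc.locked_by is not None:
            raise PermissionError(f"{name} is checked out by {doc.locked_by}")
        doc.locked_by = user
        return doc.versions[-1]["content"]

    def check_in(self, name, user, content):
        # Check-in: store a new version and release the lock.
        doc = self._docs[name]
        if doc.locked_by != user:
            raise PermissionError(f"{user} has not checked out {name}")
        doc.versions.append({"content": content, "author": user})
        doc.locked_by = None
        return len(doc.versions)  # version number, counting from 1

    def history(self, name):
        # Metadata: who wrote each version, oldest first.
        return [v["author"] for v in self._docs[name].versions]


repo = Repository()
repo.add("policy.txt", "draft", author="alice")
text = repo.check_out("policy.txt", "bob")
new_version = repo.check_in("policy.txt", "bob", text + " (reviewed)")
```

Old versions are never overwritten in this sketch, so a rollback would only need to copy an earlier entry of `versions` to the end of the list.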
However, this definition by the CMS Wiki is rather strict. Most systems that are
tagged as “CMS” do not meet all these requirements, especially in the field of Web
Content Management Systems. These systems are very popular nowadays, because Wikis
(like Wikipedia [52]) and weblogs (like Movable Type [2] and WordPress [53]) are also
considered to be Content Management Systems. While a Wiki adheres to most of the rules
described above, a blog does not. There are also several frameworks available for creating
your own Web Content Management System, like Mambo [19] and Zope [4].
³ “WYSIWYG is an acronym for What You See Is What You Get. Relating to or being a
word-processing or desktop publishing system in which the screen displays text exactly as it
will be printed [42].”
Wikipedia [49] adds that there are actually several categories of Content Management
Systems, of which Enterprise Content Management is one. Next to ECM systems, there are
also Web Content Management Systems (as described earlier), Publications Management
Systems (for publishing manuals, books, etc.) and several others.
2.1.1.2 DOCUMENT MANAGEMENT SYSTEM
Originally a Document Management System was used to manage scanned documents.
Companies wanted to keep track of paper documents and store them electronically. We can
distinguish three kinds of Document Management Systems: Electronic Document Management
Systems, Integrated Document Management Systems and Physical Paper Document Management
Systems (also known as Document Imaging Systems).
Electronic Document Management Systems manage (as their name implies) electronic
documents like Microsoft Word documents. The difference from the original DMS is that the
document is also created electronically and is not the result of scanning a paper document.
Such a system offers the possibility to sign documents electronically and can support legal
requirements like Sarbanes-Oxley⁴. Last but not least, all documents are under
rights-management control, so there are restrictions on who can access, view, print or modify
a document. An Electronic Document Management System is typically part of an Enterprise
Content Management System.
An Integrated Document Management System is a DMS that is integrated with a
document-authoring tool, like Microsoft Word. In this case, a person can open a document
from the repository directly in Word, edit it and save it. The Integrated Document
Management System takes care of version management and access rights.
Physical Paper Document Management Systems are in fact the traditional Document
Management Systems that manage scanned documents. Usually the user is required to add
some tags, to ease the search for a particular document. With the popularity of Optical
Character Recognition (OCR) software, these systems are able to recognize text and store the
text together with the scanned document.
⁴ The purpose of the Sarbanes-Oxley Act of 2002 is to protect investors by improving the
accuracy and reliability of corporate disclosures made pursuant to the securities laws, and
for other purposes. The act followed the huge Enron financial scandal in the United States.
More information about the Sarbanes-Oxley Act can be found in [43].
2.1.1.3 ENTERPRISE CONTENT MANAGEMENT
Let me repeat the definition from AIIM: “Enterprise Content Management is the
technologies used to Capture, Manage, Store, Preserve and Deliver content and documents
related to organizational processes.” The five actions can also be found in the architectural
schema from AIIM (Fig. 2.1). I’ll explain them further:
1) Capture: Generating, capturing, preparing and processing analog and electronic
information. The source can be digital (EDI documents, XML documents, ...) or
analog (scanned invoices, ...).
2) Manage: Management, processing and use of information.
   - Document management: check-in/check-out, versioning, search, navigation and
     visualization.
   - Collaboration: Whiteboards for brainstorming, appointment scheduling,
     project management, jointly usable information databases, etc.
   - Web Content Management: The Web interface to our ECM. Tasks include
     conversion from and to Web formats, editing and creation of documents, access
     control (like guest access), etc.
   - Business Process Management: Workflow functionality, process and data
     monitoring, Enterprise Application Integration (see further in this chapter:
     Enterprise Service Bus) and business intelligence (e.g. rules).
3) Store: In this definition, storing content is different from preserving content.
The “store” action is used for the temporary storage of information that does not
require archiving. We use repositories for this action, which can be either the file
system or a database.
4) Deliver: This is also called “output management”. Here we take care of
delivering the content to the end-user. This includes converters (for instance
PDF), compression (images), layout (for instance with XSLT), access control (the
right user accesses the right document), etc.
5) Preserve: Preserving is the long-term archiving of the content. Nowadays the
archive is often stored on a Storage Area Network (SAN, [5]) or
Network-Attached Storage (NAS, [5]) instead of the traditional backup tapes. The
preserve action also includes the work necessary to keep your data accessible. For
older data, for instance, it may be necessary to convert it to a new file
format, or to emulate viewers to access it.
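The five actions can be read as the operations of one small interface. The sketch below is a toy model under my own assumptions: the class `MiniECM`, its method names and the HTML conversion are hypothetical and are not taken from AIIM or from any vendor. "Manage" is reduced to a simple search, and "Preserve" to moving an item from working storage to an archive.

```python
class MiniECM:
    """Toy model of the five ECM actions; not a real ECM design."""

    def __init__(self):
        self.repository = {}  # Store: working storage
        self.archive = {}     # Preserve: long-term storage

    def capture(self, doc_id, raw_text, source):
        # Capture: accept digital or scanned input and record its origin.
        return {"id": doc_id, "body": raw_text.strip(), "source": source}

    def store(self, item):
        # Store: keep the item in the working repository.
        self.repository[item["id"]] = item

    def manage(self, query):
        # Manage: reduced here to a case-insensitive search.
        q = query.lower()
        return sorted(
            doc_id
            for doc_id, item in self.repository.items()
            if q in item["body"].lower()
        )

    def deliver(self, doc_id, user, allowed_users):
        # Deliver: check access, then convert to an output format (HTML here).
        if user not in allowed_users:
            raise PermissionError(f"{user} may not read {doc_id}")
        return "<p>" + self.repository[doc_id]["body"] + "</p>"

    def preserve(self, doc_id):
        # Preserve: move the item from working storage to the archive.
        self.archive[doc_id] = self.repository.pop(doc_id)


ecm = MiniECM()
ecm.store(ecm.capture("inv-1", "  Invoice 42\n", source="scan"))
hits = ecm.manage("invoice")
page = ecm.deliver("inv-1", "alice", allowed_users={"alice"})
ecm.preserve("inv-1")
```

Keeping `repository` and `archive` as separate stores mirrors the distinction the definition makes between temporary storage and long-term archiving.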
2.1.2 Implementations and status report
I will give here a brief overview of some popular ECM systems. During my thesis
phase, I will have a thorough look at them to find out what these systems offer and how I can
integrate these features in my framework. Right now, I’ll take the five actions within an ECM
(Capture, Manage, Store, Preserve and Deliver) as a guide for my review. You will see that the
commercial vendors have a more extensive portfolio and often they have several products to
fulfill one of the five actions, instead of just one product.
2.1.2.1 COMMERCIAL
1) Hummingbird [16]: Hummingbird has seven products, each with several features
that fit into the “five-action” model. For each action, I’ll mention the products with
their specific task.
   - Capture: Hummingbird Enterprise (capturing); Hummingbird DOCS (image
     capture + annotation); Hummingbird ImageBASIC (image capture)
   - Manage: Hummingbird Enterprise (version control + workflow +
     collaboration); Hummingbird Business Intelligence; Hummingbird SearchServer;
     Hummingbird DOCS (routing of documents)
   - Store: Hummingbird Enterprise; Hummingbird DOCS
   - Preserve: Hummingbird Enterprise (archiving)
   - Deliver: Hummingbird Enterprise (security); Hummingbird ImageBASIC
     (image viewer)
2) EMC Documentum [6]: Documentum claims to have 80 products that fall into four
categories: content services, process services, repository services and integration
services. They are trying to evolve their product stack towards a SOA model, which
makes Documentum particularly interesting for me to take a closer look at.
   - Capture: Content services (capture + edit)
   - Manage: Process services (collaboration + business process management);
     Content services (search)
   - Store: Repository services (storage); Content services (library services)
   - Preserve: Repository services (archiving)
   - Deliver: Content services (transformation + publishing/distribution)
2.1.2.2 OPEN SOURCE
1) Alfresco [45]: Alfresco calls itself “the Open Source alternative for Enterprise
Content Management”. They deliver a full-featured open-source ECM, but also a
commercial version. The big difference lies in the support and in some enterprise
features like single sign-on, clustering, LDAP authentication, etc.
   - Capture: (none listed)
   - Manage: Check-in/check-out; version control; metadata on document
     editing (who created, who updated, etc.); team collaboration; integrated workflow;
     document lifecycle management; advanced search
   - Store: Virtual file system
   - Preserve: (none listed)
   - Deliver: Transformation to different file formats like PDF, Flash and
     PowerPoint; document security; user management
2) eZ systems [7]: eZ systems has quite a good package for Web Content
Management, with a built-in blog, forum and Wiki. However, calling their package
Enterprise Content Management goes a bit too far.
   - Capture: (none listed)
   - Manage: Version control with rollback; workflow system
   - Store: (none listed)
   - Preserve: (none listed)
   - Deliver: Automatic image conversion; preview, drafts; content stored in
     XML, which can be converted to all kinds of formats
2.1.2.3 CONCLUSION
The commercial ECM vendors seem to be able to span the whole lifecycle of content,
from capture to delivery. In my opinion, they also deserve the label “Enterprise”. In the
field of open-source ECM, products often call themselves “Enterprise”, but they are merely
Web Content Management Systems geared towards Wiki-style usage. Alfresco, however, is a
very convincing open-source product and worth a closer look during my thesis period. Last
but not least, the open-source products seem to lack the “capture” and “preserve” parts of
the ECM lifecycle, but that is not much of a problem for my project. During my thesis I have
to do a more extensive study of existing ECM products and extract the features that I want
to include in my framework.
2.2 SERVICE-ORIENTED ARCHITECTURE
2.2.1 Definition
Service-Oriented Architecture (further abbreviated as SOA) is an architectural pattern
that defines the use of services to support the requirements of software users. Services are
software components that allow remote access over standard protocols and provide declarative
descriptions of their requirements and capabilities. These services have well-defined
interfaces that are independent of platform, programming language and operating system; in
other words, the services are loosely coupled. These heterogeneous services can interact
with each other in a uniform and universal manner (also called interoperability).
This loose coupling between services benefits companies that adopt a SOA strategy,
because it allows evolutionary and even radical change in the internal IT infrastructure
without a complete rewrite of the software. A business’s need to change can be caused by
partnerships, mergers, acquisitions, a changed business focus, changing policies, etc.
Therefore big vendors like HP like to call companies that have adopted a SOA strategy
“Adaptive Enterprises”, and IBM uses the term “On-Demand Business”. The latter refers to
the fact that change can occur in how things are done or work, as necessary, on demand.
2.2.1.1 SOA SERVICE CHARACTERISTICS
1) Stateless: A SOA service should operate independently of other services, without
pre-conditions and side effects. Therefore a service should be provided with all
the data necessary to do its job.
2) Technology independent: The underlying technology of the service does not
matter: neither the platform, nor the programming language, nor the
programming paradigm (object-oriented, functional, ...).
3) Well-defined, standard interface
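The three characteristics can be illustrated with a very small service. This is only a sketch under my own assumptions: the function `summarize_service` and its request fields `text` and `max_words` are hypothetical. JSON stands in for a technology-independent message format here; a real SOA would more likely use a standardized interface description and protocol.

```python
import json


def summarize_service(request_json: str) -> str:
    """Stateless service: every call carries all the data it needs.

    Interface (hypothetical): the request is a JSON object with the
    fields "text" and "max_words"; the reply is a JSON object with the
    fields "summary" and "truncated".
    """
    request = json.loads(request_json)
    words = request["text"].split()
    limit = request["max_words"]
    reply = {
        "summary": " ".join(words[:limit]),
        "truncated": len(words) > limit,
    }
    return json.dumps(reply)


# A consumer only needs the message format, not the implementation.
message = json.dumps({"text": "Capture Manage Store Deliver", "max_words": 2})
first = json.loads(summarize_service(message))
second = json.loads(summarize_service(message))
```

Because the function keeps no state between calls, `first` and `second` are identical, and the service could be replaced by an implementation in another language without the consumer noticing.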
2.2.1.2 SOA COMPONENTS
Based on figure 2.2, I’ll introduce the components of a SOA.
1) Service provider: A service provider offers a service and registers itself with a
service registry to allow discovery by a service consumer. The execution of a service is
always initiated by the request of a service consumer.
2) Service consumer: A service consumer requests a service from a service provider
and supplies it with data. If the service provider sends a response, the
service consumer will process this result.
3) Service discovery: As in a basic Web Services architecture (which does not
include components like an Enterprise Service Bus and Business Process
Management), a services repository is important for dynamically discovering services
that can execute a certain task. In a SOA environment this services
repository is a key component, but alas it is often ignored in real-life implementations.
One of the major problems is that the discovery of services is not as dynamic as
foreseen. Initiatives like “Semantically-empowered Service-Oriented
Architectures” try to use the power of the Semantic Web to enrich services in a
SOA to allow dynamic discovery and negotiation [8].
4) Service binding: Once a service consumer has discovered an appropriate
provider, a binding is established at run-time.
5) Services orchestration and choreography: The difference between service
orchestration and service choreography is subtle and usually not well understood.
Stefan Tilkov made an analogy in his weblog [44] to clarify the difference. I
quote: ”In orchestration, there’s someone - the conductor - who tells everybody
in the orchestra what to do and makes sure they all play in sync. In choreography,
every dancer follows a pre-defined plan - everyone independently of the others.”
While this can help at first to understand the subtle difference between the general
definition of choreography and orchestration, we still need to relate it to services.
Wikipedia [51] describes orchestration as: ”Sequencing services and providing
additional logic to process data. Does not include data presentation.”, while it
has a more extensive definition for choreography: ”Broadly, a choreography
defines how a party interacts with other external parties, for example in terms of
the order of message exchange, or, the path of navigation through a service. The
party, from whose perspective the choreography is viewed, may either be the
client (which could be an orchestration), meaning the choreography defines the
conversation with a service, or the deployed service itself, in which multiple
clients may be involved.” To sum up, we can see orchestration as some kind of
executable process. We use for instance BPEL (described further in this chapter)
to define business processes that can be executed on an orchestration engine.
Choreography could be seen then as some kind of multi-party collaboration. We
can describe externally observable interactions between services.
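The components above can be sketched in a few lines. This is an illustration under my own assumptions, not a description of an existing registry standard or product: `ServiceRegistry`, `ocr_service`, `index_service` and `archive_scan` are hypothetical names, the providers are plain functions, and the registry lives in memory. The function `archive_scan` plays the role of the conductor in the orchestration analogy.

```python
class ServiceRegistry:
    """Hypothetical in-memory registry: capability name -> providers."""

    def __init__(self):
        self._providers = {}

    def register(self, capability, provider):
        # Service provider side: announce what you can do.
        self._providers.setdefault(capability, []).append(provider)

    def discover(self, capability):
        # Service consumer side: look up a provider at run time.
        providers = self._providers.get(capability)
        if not providers:
            raise LookupError(f"no provider registered for {capability!r}")
        return providers[0]


def ocr_service(request):
    # Stand-in provider: pretends to recognize text in a scanned image.
    return {"text": request["image"].split(":", 1)[1]}


def index_service(request):
    # Stand-in provider: extracts sorted, unique search terms.
    return {"terms": sorted(set(request["text"].lower().split()))}


def archive_scan(registry, image):
    """Orchestration: one central process decides the order of the calls."""
    ocr = registry.discover("ocr")      # discovery, then run-time binding
    index = registry.discover("index")
    recognized = ocr({"image": image})
    return index({"text": recognized["text"]})


registry = ServiceRegistry()
registry.register("ocr", ocr_service)
registry.register("index", index_service)
result = archive_scan(registry, "scan:Invoice for invoice 42")
```

In a choreography there would be no `archive_scan`: each service would know on its own which message to expect and which party to contact next.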
2.2.2 Value of a SOA architecture
Regardless of the technical values (loosely coupled, interoperable, standards based,
etc.), we still need to consider how we can “sell” this concept to business people. Without
enterprise adoption, SOA will never take off. That’s why it is interesting to follow the articles
of important journalists in the area of software architecture, CIOs, CEOs and other important
persons in the field of computer science. While whitepapers from the big vendors might be a
little biased, some of their arguments are worth taking into account.
HP’s “executive overview whitepaper” specifically targets senior management (and
thus the decision makers) and is titled “Adaptive Enterprise: business and IT synchronized to
capitalize on change” [15]. As mentioned before, HP wants to emphasize the keyword
“change” and uses it throughout all their examples: for instance, a company that explores new
markets, takes over other companies, or grows. It is interesting to note that they tell a sound
and reasonable story without mentioning any technical aspects like loosely coupled interfaces
and interoperability.
An interesting side effect of adopting a SOA strategy is mentioned by Joe
McKendrick, SOA market watcher from ZDNet [25]. He describes the strategy of a company
that sells light decoration. They run an old and unsupported version of the ERP software
PeopleSoft, but refuse to follow the upgrade path forced by the vendor. They highly
customized their PeopleSoft implementation, wrapped the back-end services in a SOA
“coating” and wrote the integration brokers themselves. That is a very interesting aspect of
adopting a SOA strategy: you can happily integrate your older legacy software into your new
SOA environment. It then no longer matters when you upgrade that part of the
infrastructure. Moreover, older systems often represent several man-years of work, and it
would be very uneconomical to throw that away.
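Such a SOA “coating” can be pictured as a thin adapter around an old routine. The sketch below is purely illustrative and has nothing to do with the actual PeopleSoft interfaces: `legacy_lookup_order` is a made-up stand-in for a back-end routine that uses fixed-width records, and `order_status_service` is a hypothetical wrapper that exposes it through a simple, self-describing message.

```python
def legacy_lookup_order(record: str) -> str:
    """Made-up legacy routine using fixed-width records.

    Input:  6 characters order id.
    Output: 6 characters order id followed by 10 characters status.
    """
    order_id = record[:6].strip()
    return f"{order_id:<6}{'SHIPPED':<10}"


def order_status_service(request: dict) -> dict:
    """Service wrapper: translates messages to and from the legacy format."""
    record = f"{request['order_id']:<6}"   # pad the id to the legacy width
    reply = legacy_lookup_order(record)    # untouched legacy code
    return {
        "order_id": reply[:6].strip(),
        "status": reply[6:16].strip().lower(),
    }


status = order_status_service({"order_id": "A17"})
```

Consumers only see the dictionary-based interface, so the legacy routine behind it can be upgraded or replaced later without changing them.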
Last but not least, McKendrick [26] mentions a report from the AberdeenGroup that
calculated that the world’s largest companies could together save 53 billion US dollars in
the next five years if they adopt a SOA strategy. It is often unclear how such numbers are
derived, but the figure at least gives an idea, and cost reduction is often a good motivator
for management to opt for change.
2.2.3 Implementations and status report
As I already mentioned earlier, SOA is an architectural pattern and not something you
can buy in a shop, boxed like Microsoft’s Office Suite. Therefore the definition of what kind
of software can be tagged as SOA is really hard to make.
The big vendors push their whole middleware stack as ”SOA-enabled”. Usually this
means that using their application servers, BPEL implementations, etc you can develop an
enterprise application based on SOA. I will have a look at some commercial SOA solutions
from the big vendors but I won’t spend much time on it.
However, I will invest more time in the search for open-source SOA components and
software, because I will need them for my own ECM implementation. I will compare several
products and give an initial status report. A thorough comparison will be made during my
thesis period.
2.2.3.1 COMMERCIAL
1) IBM (International Business Machines) [23]
Slogan: On-Demand Business
Product overview: IBM is one of the biggest and earliest proponents of SOA
technology. Moreover, their whole business model can be captured in their
“On-Demand Business” slogan. A full and thorough overview of all their SOA-enabled
software is outside the scope of this proposal, but I found a good picture (Fig. 2.3)
that gives an overview of their SOA products. Detailed information on the
technologies used can be found on their website [22].
Evaluation: There is no doubt that IBM has one of the most mature and extensive
SOA offerings in their software stack. They were at the root of several SOA
standards and are eager to push SOA the way they think it should be. A nice
example is the Service Component Architecture technology. Lots of code has
been donated to the open-source community to drive the adoption of SOA, and
they continuously push the SOA technologies further, for example with the upcoming
BPEL4People [24] standard.
2) HP (Hewlett-Packard) [13]
Slogan: The adaptive enterprise
Product overview: HP’s focus is on delivering management software and
consulting services [14], not concrete SOA applications, SOA development
tools or SOA middleware software. Their consulting staff is trained to give
support for third-party vendors like SAP, BEA and JBoss. The management
software is known as the HP SOA Manager, formerly known as HP
OpenView.
So what can the SOA Manager do? The main focus is Web Services
Management. It contains an overview of all the available services and provides
full monitoring of aspects like performance, security, availability, etc. It can also
be used to build virtual business services on top of the Web Services, so it allows
orchestration of services. The SOA Manager consists of four components: Web
Services Management, Management Integration Platform, Business Services
Catalog and Business Service Designer, and Identity Management component⁵.
Evaluation: HP’s “Adaptive Enterprise” aims to be an answer for changing
enterprises: “change” in the sense that enterprises grow, shift their focus to other
products and markets, merge with other enterprises, etc. To “adapt” to these
changes, enterprises need a SOA-based solution that lets them address
new situations quickly. The executive whitepaper from HP [15] stresses that time is
a critical factor in staying ahead of the competition and that, using a Service-Oriented
Architecture, “an enterprise’s IT team can quickly and easily reassemble and
reconfigure core application and infrastructure services into a wide range of new
business services”.
In contrast to IBM, HP focuses on management tools and consulting services for
companies that want to adopt SOA. They mainly use third-party software to
address the needs of their customers, while the hardware comes from their
own catalogue. HP has a strong line of server systems and operating systems, so in a
way SOA is another way to generate income for their hardware department.
3) Microsoft [28]
Slogan: (None as far as I know)
Product overview: Microsoft’s SOA efforts are combined in the Windows
Communication Foundation (WCF), previously known under the code name
Indigo. It is meant to supersede the programming models of technologies like
ASMX (ASP.NET Web Services) and MSMQ (Microsoft Message Queuing) and tries to
offer a one-stop-shopping solution. It will be integrated with the upcoming Windows operating system
releases and the .NET technology stack. According to ZDNet blogger Dana
Gardner [12], the main difference between IBM and Microsoft is that “IBM
envisions a future where the new productivity action is above the interfaces,
runtimes and discrete messaging protocols. By elevating process innovation
above the older platforms and embracing open source tools and frameworks, IBM
and BEA see SOA abstracting much of an enterprise’s legacy assets and resources
into standards-based services that can be modeled and deployed as general and
easily tuned processes, regardless of their origins.” According to Gardner,
Microsoft’s goal is to keep their SOA platform closed and to encourage
developers to build services solely on .NET technology. However, integration of
third-party services into their Enterprise Service Bus is supported, and in the
architectural overview of the WCF [28] Microsoft states that SOAP will be the
message protocol, so their SOA stack should be interoperable.
5 The Identity Management component enables secure web services, secure provisioning, and secure command and control.
Evaluation: In contrast to other big vendors like IBM and HP, Microsoft
does not use its SOA commitment in its marketing campaigns. You even have
to search quite hard to find anything on their site, although a Google search shows
that they host a large SOA section on MSDN (the URL can be found in the
reference list). Right now it is too early to give an opinion on Microsoft’s SOA
strategy, but Microsoft typically produces for a mass market,
so when the market demands SOA, Microsoft will presumably be ready to deliver
it.
2.2.3.2 OPEN SOURCE
1) Apache http://www.apache.org
Slogan: N/A
Product overview:
ActiveMQ: Message Queue
Axis: SOAP implementation
Geronimo: J2EE 1.4 Certified Application Server
jUDDI: UDDI Service Registry
Synapse: Web Services Mediation Framework
ServiceMix: Enterprise Service Bus
Tuscany: Implementation of the Service Component Architecture
Woden: WSDL 2.0 implementation
Agila: Business Process Management
Ode: Business Process Management
Evaluation: A lot of code and work has been donated to the Apache Software
Foundation by industry giants such as IBM, HP and others. While the projects are
usually of high quality, they originate from different code bases and are therefore
not yet consistent with each other. There is still a lot of work to do
to align Apache’s SOA software stack into a homogeneous and competitive
product, but I am convinced that if we want to make an open-source SOA stack
happen, we all have to put our shoulders under the Apache projects.
Therefore I will pay a lot of attention to Apache’s product stack for my
SOA-enabled ECM system implementation.
2.2.3.3 CONCLUSION
It is clear that the big vendors are ready for this “next big thing” called SOA. This is
not surprising, considering that the whole SOA (r)evolution was started and is driven by industry.
Usually a standard is defined first, a reference implementation is developed and then industry
follows. Here it is the other way around: the standards were derived from the implementations
of the big vendors. Although there are some standards, a lot of technologies are not
standardized yet (or are in the process of being standardized) and basically every company has
its own interpretation of what exactly a SOA is and how far you can take it.
The developments in the open-source world are heavily backed by industry (the big
vendors are donating code and engineers to, for instance, Apache projects), but the products
still need a lot of work. One of the big SOA projects at Apache, called Tuscany, is still in
incubation (footnote 6) and needs a lot of work to even reach some kind of beta status. The
open-source community will most likely need another six months to a year before it can
deliver a product at the level of Apache Tomcat and Apache httpd. Because of this immaturity
of the open-source implementations, there are plenty of opportunities for me to join some
projects and help with testing, debugging, donating code, etc.
The big vendors are pushing their marketing departments to evangelize their SOA
product stacks; the only one that is a bit quieter about the whole SOA buzz is Microsoft.
Right now there is no consistent open-source SOA software stack yet, but this might
change soon because the Apache Software Foundation is working hard on it. Despite the
lack of an integrated open-source SOA stack, individual open-source components that are
already quite mature are available, like Apache’s ServiceMix [21] and ObjectWeb’s Celtix [29].
2.3 BUSINESS PROCESS EXECUTION LANGUAGE
The Business Process Execution Language (BPEL) was born out of the merger of
IBM’s Web Services Flow Language (WSFL) and Microsoft’s XLANG; its first official name was
BPEL4WS 1.0. Later on BEA, SAP and Siebel joined the group and submitted BPEL4WS
1.1 to the standards group OASIS. The current version of BPEL is called WS-BPEL 2.0, but
it is usually referred to simply as BPEL.
6 The Incubator project was created in October of 2002 to provide an entry path to the Apache Software
Foundation for projects and code bases wishing to become part of the Foundation’s efforts. Code donations from
external organizations and existing external projects wishing to move to Apache will enter through the Incubator.
http://incubator.apache.org
2.3.1 Definition
To better understand what BPEL means, I will first introduce some concepts before giving the
actual definition.
1) Business Process:
The necessary steps or activities to complete a business transaction.
The activities can be performed either by applications or by humans.
Usually long-running.
Multiple parties are involved, often external parties.
2) Programming in the large: According to Wikipedia [48], “Programming in the
large can refer to programming code that represents the high-level state
transition logic of a system. This logic encodes information such as when to wait
for messages, when to send messages, when to compensate for failed non-ACID
transactions, etc.”
BPEL is an XML-based programming language for defining and managing Web
Service orchestrations or processes. It controls the overall sequence of invocations as well as the
actual invocation of the collaborating services. To this end, BPEL offers
constructs such as assignments, case logic, sequences, etc.
BPEL is thus good at executing a series of activities and interacting with multiple
parties, but it has its limits. BPEL is not designed to let people participate as a service
(like a manager who has to approve a contract) and cannot handle complex business processes
(like processes that evolve or change during their long-running execution).
Oracle’s “Weaving Web Services Together” article [32] on their developer website
identifies several questions that together explain why you would need a
language like BPEL:
1) How does one handle asynchronous Web service invocations where the called
Web service is long-running and has to call back at a later time into the
originating Web service?
2) How does one correlate requests across many in-flight business processes?
3) How does one invoke Web services in parallel rather than in sequence?
4) How does one undo a long-running transaction in which there has been a failure?
5) How does one compose larger business processes out of smaller business
processes?
6) How does one guarantee reliable message delivery?
These are typical questions that arise when you are dealing with long-lived processes.
Several companies have tried to solve these issues with their own proprietary solutions, but
that only resulted in vendor lock-in for their customers. BPEL, however, is an official standard
and should be vendor-independent.
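To make two of the questions above more concrete (running activities in sequence and undoing a long-running transaction after a failure), here is a minimal sketch in Python. It is not BPEL and does not use any BPEL engine. The names `Step`, `run_process`, `place_order` and the stand-in services (`reserve_stock`, `charge_card`, etc.) are hypothetical and only serve as illustration. Each step carries a compensating action, and completed steps are compensated in reverse order when a later step fails.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One activity in a process: an action plus its compensating action."""
    name: str
    action: Callable[[dict], None]
    compensate: Callable[[dict], None]


def run_process(steps: List[Step], context: dict) -> bool:
    """Run the steps in sequence; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action(context)
            completed.append(step)
        except Exception as exc:
            context["log"].append(f"{step.name} failed: {exc}")
            for done in reversed(completed):
                done.compensate(context)
            return False
    return True


# Hypothetical services that stand in for Web Service invocations.
def reserve_stock(ctx: dict) -> None:
    ctx["log"].append("stock reserved")


def release_stock(ctx: dict) -> None:
    ctx["log"].append("stock released")


def charge_card(ctx: dict) -> None:
    if not ctx["card_valid"]:
        raise ValueError("card declined")
    ctx["log"].append("card charged")


def refund_card(ctx: dict) -> None:
    ctx["log"].append("card refunded")


ORDER_PROCESS = [
    Step("reserve", reserve_stock, release_stock),
    Step("charge", charge_card, refund_card),
]


def place_order(card_valid: bool) -> dict:
    """Run the order process and return the context, including the log."""
    ctx = {"card_valid": card_valid, "log": []}
    ctx["ok"] = run_process(ORDER_PROCESS, ctx)
    return ctx
```

A real BPEL engine additionally persists the process state, correlates asynchronous callbacks and can run activities in parallel. The sketch leaves all of that out.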
2.3.2 Implementations and status report
Here I will discuss some open-source BPEL engines. Such engines are often not visible
as separate products, because they are usually integrated in an Enterprise Service Bus or in SOA
frameworks.
2.3.2.1 OPEN SOURCE
1) Apache Agila BPEL (formerly known as Twister) [9]: The Twister project has
been donated to the Apache Software Foundation and is now in incubation under
the name Agila. The Agila wiki does not contain architectural documents at
the moment (at least none that are accessible from the wiki), so it is not
completely clear what it is capable of. However, a quick look at the user’s guide
shows that Agila provides ways to define user roles and let users
participate in the business process. It is not clear whether it conforms to the new
BPEL4People standard or not.
2) bexee - BPEL Execution Engine [3]: bexee is a research product by students
of the Berner Fachhochschule in Switzerland. Although it is not really clear what it is
actually capable of, it seems to be a basic BPEL engine. It is presented as a kind
of experimental sandbox to play with and test new technologies and approaches.
While the enterprise value of this product is rather low, it is worth a
closer look because it is extremely well documented (it is a
school research project) and can therefore help me to better understand
some general concepts of BPEL engines.
3) Intalio PXE [18]: Intalio’s PXE (acquired from FiveSight) is a well-known
BPEL engine that is used in Sun Java Studio, Apache ServiceMix, SymphonySoft
Mule, etc. It supports both BPEL4WS 1.1 and WS-BPEL 2.0. As far as I
understand, the BPEL code is compiled, which allows static checking and
analysis. PXE can run stand-alone or be embedded in another product. To
conclude: PXE is quite a mature product that is also the basis for Intalio’s
commercial BPEL engine.
2.3.2.2 CONCLUSION
There are several good and mature open-source BPEL engines available, and you see
that they are often acquired by commercial companies (like Intalio’s PXE) or adopted by open-source
organizations (like Apache’s Agila). At the moment we can only hope that these commercial
companies keep the software open-source. The school project bexee is interesting to me for
the architectural information it can provide, but it is not suitable for deployment in a
production environment. A good alternative for our framework could be Apache’s Agila, but
our choice also depends on which Enterprise Service Bus we will choose. For instance, Intalio’s
PXE is part of Apache’s ServiceMix Enterprise Service Bus.
2.4 HUMAN BASED WEB SERVICE
Certain tasks require human judgment or expertise and cannot be done by a machine.
The participation of people in service compositions is quite a new aspect of Service-Oriented
Architecture and essential for my project. IBM and SAP recently proposed an extension to the
BPEL standard [37], called BPEL4People, to realize the concept of human-based web
services.
2.4.1 Definition
The ultimate goal of automating all business processes is not achievable at the
moment. Certain activities require human involvement, like the approval of certain requests or
the handling of exceptional situations that the system cannot deal with.
In our SOA system, a Human Based Web Service does not differ from a regular Web
Service as far as its environment is concerned (footnote 7). When our Human Based Web Service is called, it notifies
the assigned person(s) and waits (usually asynchronously) for a response from the user. You
can of course build in mechanisms to remind the user when it takes too long, or to re-assign the task to
someone else.
The big advantage of implementing an interaction with a user in this fashion is that it
can be replaced by software. Suppose that an insurance system no longer needs
human approval. Then we can easily write a piece of software that checks certain
parameters, without any change to the rest of the system.
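The replaceability argument can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `ApprovalService`, `HumanApproval`, `RuleBasedApproval` and `handle_claim` are hypothetical, the human is simulated by a callback, and the call is synchronous, whereas a real Human Based Web Service would notify the person and wait asynchronously.

```python
from typing import Callable, Protocol


class ApprovalService(Protocol):
    """The service contract that the rest of the system depends on."""

    def approve(self, claim: dict) -> bool: ...


class HumanApproval:
    """Delegates the decision to a person (simulated here by a callback)."""

    def __init__(self, ask_person: Callable[[dict], bool]) -> None:
        self._ask_person = ask_person

    def approve(self, claim: dict) -> bool:
        return self._ask_person(claim)


class RuleBasedApproval:
    """Software replacement: approves claims up to a fixed amount."""

    def __init__(self, max_amount: float) -> None:
        self._max_amount = max_amount

    def approve(self, claim: dict) -> bool:
        return claim["amount"] <= self._max_amount


def handle_claim(claim: dict, approver: ApprovalService) -> str:
    """Caller code: identical for the human and the automated service."""
    return "paid" if approver.approve(claim) else "rejected"
```

Because `handle_claim` only depends on the `approve` contract, swapping `HumanApproval` for `RuleBasedApproval` requires no change in the calling code.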
However, we may also have to deal with more complex situations where we need
double-checking. Imagine that certain insurance policies for high-profile companies need approval
from two different persons. This is not so obvious to represent in our SOA environment with
Human Based Web Services. These problems are addressed in the upcoming standard by
SAP and IBM, called BPEL4People [37]. BPEL4People is an extension on top of BPEL and
right now not (yet?) part of WS-BPEL 2.0.
7 Recall that the implementation details of a Web Service are not important to its consumers, so this makes sense.
I have not been able to find formal specifications of the BPEL4People extension to discuss, but a
whitepaper describes four situations (patterns) where BPEL4People could be of use, so I will
discuss these here. In the meantime, I have contacted some developers at IBM to ask whether
they have some kind of reference implementation or formal specification that I can use.
1) 4-Eyes principle: Also known as the separation of duties. For security reasons or
perhaps just out of company policy, certain actions need involvement of two or
more persons (for instance approval of a loan). It is even possible that the persons
are not allowed to know who the other person is. This can be addressed in
BPEL4People.
2) Escalation: You can put certain time limits on the execution of a task by a person.
When these limits are exceeded, an escalation mechanism starts. It notifies certain
people or reschedules the tasks.
3) Nominations: It is not always clear who should handle a certain task. The
nominations mechanism allows a supervisor to manually assign a task to the best
suited person.
4) Chained execution: It can be possible that a certain task requires several steps to
be executed. It is possible that this is some side effect of an earlier step in the
process and thus unpredictable. Instead of scheduling all these steps as new tasks
in the notification list of the user, we just chain them and present them as one
task. This can be presented to the user in some kind of wizard fashion.
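Two of these patterns, the 4-eyes principle and escalation, can be sketched as follows. Since no formal BPEL4People specification was available to me, this is not the BPEL4People API. The class `HumanTask`, its fields and the use of seconds as the time unit are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class HumanTask:
    """A task that needs approval from several distinct persons before a deadline."""
    name: str
    required_approvals: int = 2      # 4-eyes principle: two different persons
    deadline: float = 0.0            # absolute time in seconds (assumed unit)
    approvers: Set[str] = field(default_factory=set)
    escalated_to: Optional[str] = None

    def approve(self, person: str) -> None:
        # A set ignores a second approval by the same person.
        self.approvers.add(person)

    def is_complete(self) -> bool:
        return len(self.approvers) >= self.required_approvals

    def check_escalation(self, now: float, supervisor: str) -> bool:
        """Escalate once if the deadline has passed and the task is incomplete."""
        overdue = now > self.deadline and not self.is_complete()
        if overdue and self.escalated_to is None:
            self.escalated_to = supervisor
            return True
        return False
```

Nomination and chained execution would need a task list and a notion of process state, which are beyond the scope of this sketch.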
2.4.2 Implementations and status report
2.4.2.1 OPEN SOURCE
None as far as I know.
2.4.2.2 CONCLUSION
The initiators of this extension have of course built it into their own commercial
products, but there are no open-source implementations available. As far as I know, the only other vendor
that supplies a BPEL4People-compatible product is Intalio, and that product is not open-source.
I emailed someone at IBM to get more information about an implementation or formal
specifications, and I will probably work on this during my thesis phase.
2.5 ENTERPRISE SERVICE BUS
The Enterprise Service Bus (hereafter abbreviated as ESB) unifies and connects
services, applications and resources within an enterprise. It is important to note that ESB is an
architectural pattern that can be implemented in various ways; it is not a concrete
application.
2.5.1 Definition
The ESB architectural pattern is not a new invention of the SOA era. The idea of
middleware that interconnects distributed applications, as in Distributed COM and
CORBA, has existed for a long time and is often called Enterprise Application Integration
(EAI). While previous approaches also integrate applications, coordinate resources and
manipulate information, ESB takes this to the next level. Now we can access all kinds of
software (in the form of services), independent of the platform, programming language and
programming paradigm (object-oriented, procedural, functional, etc.).
Both the service providers and the service requestors operate independently without
really knowing the other’s origin. The responsibility for the actual delivery of the messages
and for aspects like quality of service (QoS) lies with the ESB.
The nice thing is that the ESB is completely transparent. Neither the developers nor
the service providers and consumers need to know in advance that there will be an ESB as a
transport layer. You can even deploy an application that was never meant to run in an
ESB environment.
Schmidt et al. define the essential characteristics of an Enterprise Service Bus in the
IBM Technical Journal for SOA [36] as: “the meta-data that describes service requestors and
providers, mediations and their operations on the information that flows between requestors
and providers, and the discovery, routing and matchmaking that realizes a dynamic and
autonomic SOA.”
1) Meta-data: Explicit declaration of the capabilities and requirements of interaction
endpoints. This is stored in a service registry for easy discovery. The more
information is provided, the easier the matchmaking will be. The
matchmaking principle used is that the service requestor adapts to the service provider,
so the requestor needs sufficient information about the requirements and
capabilities of the provider.
2) Mediations: The role of a mediation is to satisfy the integration and operational
requirements within the infrastructure. This can be converting from one transport
protocol to another, validation of content, caching, etc.
3) Routing: Messages can be (re-)routed based on rules and content.
Sometimes vendors denote the Enterprise Service Bus as Enterprise Application
Integration Middleware as we can see in IBM’s simplified ESB architecture picture (Fig. 2.4).
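The three characteristics above (meta-data in a registry, mediations, and content-based routing) can be illustrated with a toy in-memory bus. This is a sketch under my own assumptions and does not model any of the ESB products discussed later. The class `MiniBus`, its methods and the demo services `archive` and `review` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Handler = Callable[[Message], object]


class MiniBus:
    """Toy bus: a registry with meta-data, a mediation chain and content-based routes."""

    def __init__(self) -> None:
        self._registry: Dict[str, Tuple[dict, Handler]] = {}
        self._mediations: List[Callable[[Message], Message]] = []
        self._routes: List[Tuple[Callable[[Message], bool], str]] = []

    def register(self, name: str, metadata: dict, handler: Handler) -> None:
        """Store a provider together with the meta-data that describes it."""
        self._registry[name] = (metadata, handler)

    def discover(self, capability: str) -> List[str]:
        """Return the names of all providers that declare the capability."""
        return sorted(
            name
            for name, (meta, _) in self._registry.items()
            if capability in meta.get("capabilities", [])
        )

    def add_mediation(self, mediation: Callable[[Message], Message]) -> None:
        self._mediations.append(mediation)

    def add_route(self, predicate: Callable[[Message], bool], service: str) -> None:
        self._routes.append((predicate, service))

    def send(self, message: Message) -> object:
        """Apply all mediations, then deliver to the first route that matches."""
        for mediation in self._mediations:
            message = mediation(message)
        for predicate, service in self._routes:
            if predicate(message):
                _, handler = self._registry[service]
                return handler(message)
        raise LookupError("no route matches message")


def build_demo_bus() -> MiniBus:
    """Wire up two hypothetical document services behind the bus."""
    bus = MiniBus()
    bus.register("archive", {"capabilities": ["store"]},
                 lambda m: f"archived {m['title']}")
    bus.register("review", {"capabilities": ["store", "approve"]},
                 lambda m: f"review {m['title']}")
    # Mediation: normalize the title before delivery.
    bus.add_mediation(lambda m: {**m, "title": str(m["title"]).strip().lower()})
    # Content-based routing: contracts go to review, everything else to the archive.
    bus.add_route(lambda m: m.get("type") == "contract", "review")
    bus.add_route(lambda m: True, "archive")
    return bus
```

The requestor only calls `send`. Which provider receives the message is decided by the bus, which is the transparency property described earlier in this section.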
2.5.2 Implementations and status report
The idea of an Enterprise Service Bus is not new and therefore the ESB is one of the
most mature components in the Service-Oriented Architecture. I will have a look at three
open-source ESBs that are viable candidates for my framework. The final decision will be
made during my thesis period.
2.5.2.1 OPEN SOURCE
1) ObjectWeb Celtix [31]: Celtix is IONA’s [41] open-source Java ESB, hosted at
the ObjectWeb community. You can consider it the lightweight, open-source
variant of IONA’s commercial ESB Artix [40]. Celtix is quite a young product,
which you notice when you look at the feature list on their website.
It has basic support for HTTP 1.1 and SOAP 1.1, and also supports the Java
Message Service (based on ActiveMQ), WS-Addressing (to address web
services and messages), policies and JAX-WS 2.0 (formerly known as JAX-RPC).
A closer look at their documentation shows that the goal of Celtix
is to be fully JBI compliant. JBI is a specification that describes a pluggable
integration container that is strongly focused on Web Services, built with WSDL
and other WS technologies at its core (footnote 8). While the documentation is quite limited
and vague on the JBI support, I learned that they want to achieve two goals.
They want to use Celtix as a JBI container that can host JBI-compliant
components, but they also want to allow you to expose Celtix components as
JBI components and deploy them in other JBI containers, so that
you can build components with Celtix and then deploy them in another JBI
environment.
8 JBI, or Java Business Integration, is also known as JSR-208 and is therefore an official Java standard. [33]
2) SymphonySoft Mule [38]: SymphonySoft’s Mule positions itself as a lightweight
messaging framework. The feature list includes message delivery, message
transformations, dispatching messages, pooling and threading, exception
handling, transaction management and lifecycle management. On the technology
side it supports JBI, BPEL for orchestration and the other standard
ESB technologies. One of the big advantages of Mule is its flexibility in usage:
we can use it in a client/server, P2P or ESB environment. Event processing
can be synchronous, asynchronous or request-response. Last but not least, there
is support for content-based and rule-based routing of messages. The bottom line is that this
ESB is much more developed than ObjectWeb’s Celtix and offers most of the
functionality that I have in mind for my framework.
3) Apache ServiceMix [10]: ServiceMix has been donated by LogicBlaze (an open-source
SOA software house) to the Apache Software Foundation and is
built completely upon JSR-208, the Java Business Integration standard.
Listing all the features of ServiceMix would require another chapter, so I will only sum
up the most important ones. Basically, they thought about almost everything:
from support for scripting languages (through the JSR-223 scripting
standard) to caching, schema validation, security, notifications, etc. Next to the
standard support for transports like JMS (with ActiveMQ) and HTTP, it
supports RSS, Jabber (the open chat protocol), FTP, email, etc. The fact
that this product is so full-featured makes it harder to improve, but I noticed that
there is for instance no support (yet?) for BPEL4People (the human-based Web
Service technology), and there will probably be some more shortcomings.
2.5.2.2 CONCLUSION
As I pointed out, the open-source ESBs are quite mature and some are even
production-ready. At first sight SymphonySoft’s Mule and Apache’s ServiceMix are the most
suitable for my framework. Martin LaMonica pointed out in an article on News.com [20] that
ObjectWeb’s Celtix and Apache’s ServiceMix will cooperate and share code, and will presumably
merge into one ESB in the future. Therefore it might be wise to go for ServiceMix, but I will
take a closer look at this matter during my thesis period. In addition, the described ESBs
are generic frameworks and may need some tweaking for my ECMS purposes. I still have
to figure out how to deal with documents, because documents are not “just”
messages, especially unique documents that contain, for instance, specific Service-Level
Agreements. Because ESB technology is quite mature, I will probably have more time to
invest in the ECMS aspects than in the underlying ESB technologies (though
BPEL4People support is lacking, so that may be worth investigating).
2.6 SERVICE COMPONENT ARCHITECTURE
Service Component Architecture (SCA) is a specification that describes a model for
building applications and systems using a Service-Oriented Architecture. It is an
industry-driven initiative backed by IBM, BEA, Interface21, IONA, Oracle, SAP, Siebel and
Sybase [39].
2.6.1 Definition
In November 2005, a consortium of vendors (footnote 9) released a whitepaper [39] in which
they explain their view on what a SOA framework should look like. Because a Service
Component Architecture (SCA) is just one interpretation of what a SOA could look like, it
sticks to the same principles that I described earlier. It is therefore more interesting to
look at where it differs from the more “traditional” approach of building
SOA frameworks:
1) Integrated system: In contrast to the more traditional approach, an SCA tries to
offer a single package. So the functionalities of for instance an Enterprise Service
Bus and a service orchestration engine are included.
2) BPEL as first class citizen: a BPEL process for service orchestration can be used
as a service inside an SCA.
3) Services != Web Services: Besides a Web Service, a service within an SCA can also be
exposed through a messaging system (like JMS), CORBA IIOP, etc. This is achieved with
bindings.
4) Infrastructure capabilities: Infrastructure capabilities like security and
transactions are handled by policies and are completely independent of the
implementation code of the services.
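The last two points (bindings and policies that stay outside the implementation code) can be illustrated with a small Python sketch. It does not follow the SCA specification or the Tuscany code base. The names `Component`, `require_role`, `store_document` and `decode_kv`, as well as the two binding labels `in-memory` and `text`, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable

Implementation = Callable[[dict], dict]
Policy = Callable[[Implementation], Implementation]


def require_role(role: str) -> Policy:
    """Security policy, applied around an implementation without changing it."""
    def policy(impl: Implementation) -> Implementation:
        def guarded(request: dict) -> dict:
            if role not in request.get("roles", []):
                raise PermissionError(f"role {role!r} required")
            return impl(request)
        return guarded
    return policy


class Component:
    """One implementation, reachable through several bindings."""

    def __init__(self, implementation: Implementation,
                 policies: Iterable[Policy] = ()) -> None:
        wrapped = implementation
        for policy in policies:
            wrapped = policy(wrapped)
        self._call = wrapped
        self._bindings: Dict[str, Callable[[object], dict]] = {}

    def bind(self, binding_name: str, decode: Callable[[object], dict]) -> None:
        """A binding translates a transport-specific payload into a request."""
        self._bindings[binding_name] = decode

    def invoke(self, binding_name: str, raw: object) -> dict:
        request = self._bindings[binding_name](raw)
        return self._call(request)


def store_document(request: dict) -> dict:
    """Business logic only: it knows nothing about transports or security."""
    return {"stored": request["doc"]}


def decode_kv(raw: str) -> dict:
    """Decode a payload such as 'doc=report;roles=editor,admin'."""
    fields = dict(part.split("=", 1) for part in raw.split(";"))
    return {"doc": fields["doc"], "roles": fields.get("roles", "").split(",")}
```

The implementation `store_document` stays the same whether it is reached through the `in-memory` or the `text` binding, and the role check is attached as a policy instead of being coded into the service.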
2.6.2 Implementations and status report
2.6.2.1 OPEN SOURCE
At the moment there is only one project that tries to implement the Service
Component Architecture specification, and that is Apache’s Tuscany. The project is still in
early beta and under heavy development. The members of the consortium behind SCA
developed the original code, and the current Apache Tuscany project is under incubation.
Because Tuscany will be the reference implementation for SCA, everything we know about SCA will
be implemented in it, so there is no point in repeating the features here.
9 BEA, IBM, Interface21, IONA, Oracle, SAP, Siebel and Sybase
2.6.2.2 CONCLUSION
I’ve been following the Apache Software Foundation’s mailing list for
Tuscany; the project is still in early beta and not at all production-ready. Tuscany appears to be
the only available implementation of SCA, and no noteworthy
application has yet been built on it that could verify the maturity of the specification and platform. I have
seen several emails about re-architecting core components, so I cannot say
that Tuscany is stable yet. There is, for instance, still a lot of work to do on the bindings with
other languages. The team working on it seems very enthusiastic and new
people join the development every day, so I expect that Tuscany will eventually become a
production-ready reference implementation for SCA.
2.6.3 Considerations
This industry-pushed approach to SOA currently has one disadvantage: it is not a
standard. It is backed by a large consortium of companies, but does not get support from Sun
Microsystems. The latter believes in an Enterprise Service Bus that sticks to the earlier
described Java Business Integration. Therefore the SCA specification has not been ratified by the Java
Community Process. Sun Microsystems is not the only one that doesn’t believe in SCA: Microsoft
does not support it either, perhaps because it is a threat to their Windows Communication
Foundation. JBOSS does not support it either. In a recent article [27] Marc Fleury (JBOSS’
CEO) said that he is worried that the big vendors want to dominate the SOA market,
bypass standards organizations and set their own standards to kill commercial open-source
companies like JBOSS.
But will this prevent SCA from becoming an (un)official standard? If such a large
vendor group backs it, it may well become a standard one day.
History has shown that technologies often become de facto standards due to their popularity.
Although SCA is heavily pushed by, for instance, IBM, its position feels a bit ambiguous. IBM has
its complete WebSphere platform with a BPEL engine, an Enterprise Service Bus and the whole
SOA stack, and yet it also puts money and effort into SCA. Of course it is wise to bet
on several horses, but perhaps SCA can be seen as the “next-generation” SOA?
SCA is certainly very promising. Instead of wondering which ESB, BPEL engine
and other infrastructure software you are going to use, you just go for an SCA framework. However,
as long as SCA is not a standard, there is a danger of vendor lock-in. It is nevertheless likely that
this integrated approach will attract many implementers of SOA-enabled systems.
It is interesting for my project to look at this approach and decide whether I will go for
the traditional SOA or the SCA way. Both have their advantages and disadvantages. I will
make a decision based on a more thorough comparative study during my thesis project. That
decision will also depend on whether we allow non-standard technologies and
products.
Figure 2.1. AIIM’s ECM architecture
Figure 2.2. A simplified SOA architecture
Figure 2.3. IBM SOA software stack
Figure 2.4. Simplified ESB Architecture.
CHAPTER 3
THESIS PROJECT
This third chapter of my thesis proposal answers questions like the why, the how and
the when of my thesis. It goes deeper into the research questions, my own contribution, the
taken approach, the project deliverables and the thesis planning.
3.1 RESEARCH QUESTIONS
1) SOA-enabled ECMS: The initiator of the thesis project is the Content and
Knowledge Engineering group. Their interest is the feasibility of an Enterprise
Content Management System based on the concepts of a Service-Oriented
Architecture. Their main interest lies in the field of
information science, and they focus on questions like: how do we deal with the
document flow in a SOA?
2) SOA status report: From a software technology point of view, I am interested in
how a SOA works. Because most of the technologies and
implementations are in their infancy (a lot of them are still in alpha or beta
phase), I will make a status report of the standards, technologies and
implementations in order to identify the weak spots. The next step would be to
try to come up with ideas and solutions to fill these “gaps”. It would be nice to
see some of my work actually incorporated in standards,
technologies or implementations.
3) Future of SOA: Service-Oriented Architecture is one of those technologies that
are heavily hyped. While it is dangerous to make predictions, we can
look at history and draw conclusions from it. I want to make a
comparison with earlier distributed technologies like CORBA and
look at aspects like the rate of market adoption, the installed base, the impact on the
IT industry, etc. Based on these data, we can probably draw some
conclusions and try to predict a trend for the future.
3.2 OWN CONTRIBUTION
Here I will discuss my own efforts in the area of Service-Oriented Architecture. While
one of the goals is to deliver a SOA framework for an Enterprise Content Management
System, a lot of effort will be spent on giving knowledge back to the SOA community. This
can take the form of ideas, but also source code, help in projects, testing, etc. The only problem may be that
it is hard to keep track of all these efforts.
3.2.1 Participate in open source projects
SOA technologies are industry-driven and right now the big vendors each have their own
implementation of a SOA. Some of the technologies are already standardized,
like the BPEL Web Service orchestration standard. However, there is no real reference
implementation yet and the tools provided by the open-source community are still in their infancy.
Noteworthy projects include (but are not limited to) the following:
1) Apache Tuscany: Open-source implementation of a Service Component
Architecture framework. Hosted by the Apache Software Foundation, but still in
incubation. A lot of work can be done on bindings with other technologies and
languages. Because of the “immature” nature of this product, there is room for
suggestions to make things better.
2) Apache Axis2-C: Axis is a Web Service server that can be embedded in other
application servers (like JBOSS). Axis2 is still in beta and work needs to be done on
the bindings with other languages.
3) Apache Enterprise Service Bus: The code base of this project has been donated
by commercial companies and therefore needs work to be polished and integrated with
the other Apache projects.
4) Own open-source BPEL4People implementation (with formal specifications).
5) Other projects will follow.
3.2.2 SOA - ECMS Framework
I will incorporate my ideas and research findings in a framework. Other projects can
build upon this framework in the future. Perhaps I can also donate some code to
open source projects, but I should take a closer look at the open source licenses for that.
Perhaps the project can be hosted at SourceForge or some other project server. Of course it
can also be hosted on the Software Technology Group’s servers.
3.2.3 Master thesis
The results of my research will be documented and bundled in a final work. The added
value of this work will include, but is not limited to, a status report of SOA technologies
and implementations, feasibility of a SOA architecture, added value of a SOA architecture,
feasibility of a SOA-enabled ECMS, and others. A comparative
study of SOA technologies is important, because several technologies overlap,
resulting in multiple approaches using different technologies. Moreover, a status report of
the technologies and projects will give insight into the maturity of SOA and will also be
the basis for formulating improvements and solutions for incomplete technologies. I
realize that a master’s thesis is not the perfect way to give back ideas and knowledge to the
SOA community; therefore, intermediate results and ideas that can stand on their own
will be published as articles or papers.
3.3 APPROACH
3.3.1 Maintain a public blog
I will put ideas and intermediate results on my public blog. I hope that some
discussions contribute to my thesis. It’s available from: http://blogs.webcoder.be/lee
3.3.2 Participate in discussions
Several web sites have forums and discussion areas to exchange and discuss ideas.
Participating in these will be a good way to verify my ideas and points of view.
Some web sites with discussion places for SOA-related technologies:
1) IBM developerWorks articles:
http://www-128.ibm.com/developerworks/webservices
2) IBM developerWorks blogs:
http://www-128.ibm.com/developerworks/blogs/index.jspa
3) Apache Tuscany mailing list: [email protected]
4) Apache Axis2-C mailing list: [email protected]
5) ZDNet’s Blogs: http://blogs.zdnet.com
6) Several websites will follow
3.3.3 Share intermediate results
I will try to share intermediate results as soon as possible, for instance on
my blog. I will also try to submit articles to websites like The Register [35], ZDNet [54] and
others. Although it is hard to contribute to a site like IBM’s developerWorks, I will try to
submit some articles there as well.
3.4 PROJECT DELIVERABLES
It is obvious that my deliverables will be my thesis document and a prototype of the
SOA-enabled ECMS framework, but my supervisors advised me to prioritize the subprojects
of my thesis project. Therefore I introduce three categories, in decreasing order of priority:
”must have”, ”great to have” and ”nice to have”. Each subproject will also have an
indication of estimated workload. In
combination with the time slots that are defined in the next section, we can roughly estimate
the time needed for the project. Considering that each time slot can hold two units of
workload, I can schedule one two-unit subproject in such a time slot, or two one-unit
subprojects. Of course this is just an estimate and not an exact science. When work advances
quicker than expected, I can reschedule and use three units of work per time slot.
While it is not explicitly mentioned, contributions to technologies (like an
enhancement proposal for BPEL4People) are part of the corresponding subprojects. This is because
you only detect shortcomings once you are deeply involved with a certain technology or
project, so they cannot be identified beforehand. Other activities, like participating in
discussions and blogs, are also not explicitly mentioned. Together with the thesis document,
I’ll hand in a brief overview of the ”side-work” I’ve done, probably with statistics about the
amount of time spent on each part of the thesis project.
3.4.1 ”Must have” (Priority 1)
1) Thesis document
2) Case study Elsevier: Work out the processes and workflow of the paper
submission and handling system at Elsevier (suggestion of prof. van den Berg).
This will make the project more concrete and will actually verify whether the
results of my research and my prototype ”work”. In this particular case we have
to think about: participants (authors, reviewers, publishers, ...), workflow of
documents, notifications, editing of documents, approval of participants,
annotations, how to deal with multiple authors, how to deal with changes to
documents (change the document? add metadata? ...), etc. (Estimated workload: 1)
3) Generalize case study into a framework: While the Elsevier case study makes our
project more concrete and easier to understand, we still need to transform the case
study into a ”SOA version” and eventually generalize it, to make it suitable for all
kinds of ECM applications. (Estimated workload: 1)
4) Security aspect: Look at aspects like authentication. Is a user authorized to view
certain parts of documents? How do we deal with issues like non-repudiation,
preventing message alteration, etc.? Will a traditional (single sign-on)
authentication suffice or do we need other variants like a rule-based system? Can
we trust a service or build a trust relation with it? (Estimated workload: 1)
5) Technical architecture: We know what a SOA looks like and what kind of
components we need. Now we have to decide which approach we are
taking. Will we use the more standards-based approach with BPEL, ESB and WS
or shall we go for the Service Component Architecture pushed by IBM and BEA? In this
phase we choose the approach and the concrete applications, frameworks
and/or technologies. The resulting framework should consist of open source code
that can be freely redistributed. (Estimated workload: 1)
6) User interface: How are we presenting the data to the end user, especially with
respect to the involvement of users in human-based web services? An often-used
approach in this Internet era is a web interface that makes use of portal
technologies. (Estimated workload: 1)
7) Business Process Management: This is one of the more challenging parts of my
thesis project. The area of BPM is very cluttered and most projects (especially
open-source ones) are immature and still in development. There are also several ways
to achieve BPM, such as with or without a rule engine, with or without an ESB, which
version of BPEL, how to integrate BPEL4People, etc. So the first issue will be to
determine what we want and come up with an open-source solution. Almost
certainly, I will have to work on the projects themselves to get everything working and
polished. (Estimated workload: 3)
8) Internal document format: In our ECMS, we often have to deal with documents.
This subproject looks at how we can represent document data inside our
system. A possible candidate could be the new OpenDocument format [30] that is
used in the new OpenOffice office suite. However, our ECMS should be able to
handle other data too. How are we going to deal with documents written in a
proprietary format (like Microsoft’s Word) or with
binary data like images, PDF, video, etc.? (Estimated workload: 1)
9) SOA Monitoring and administration: Ideally we want to monitor our SOA
environment. Are our services up and running? How much load do they have?
What processes are active or pending? In the case of our ECMS: where are our
documents? Can we replace services at runtime and how are we dealing with
running processes that are using these services? As part of my project, I can design
and develop an open source SOA/BPEL monitoring tool. It can be
web-based, but also a fat client. Right now only the big vendors offer commercial
packages, so such a tool would be useful. The need for good SOA governance has
also been identified recently by Colleen Frye of SearchWebservices.com [11].
(Estimated workload: 2)
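To make the monitoring questions in item 9 more concrete (are our services up, and how many processes are active?), the following is a minimal sketch in Python. It is only an illustration under stated assumptions: the names ServiceStatus, poll_services and unreachable are hypothetical, and each "probe" is assumed to be a function that returns the number of active processes of a service and raises an exception when the service cannot be reached. A real tool would have to obtain this information from the BPEL engine or the ESB, which is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceStatus:
    """Result of probing one service once (hypothetical data model)."""
    name: str
    up: bool
    active_processes: int


def poll_services(probes: Dict[str, Callable[[], int]]) -> List[ServiceStatus]:
    """Call every probe once, in alphabetical order of service name.

    Assumption: a probe returns the number of active processes and raises
    an exception when the service does not respond.
    """
    report = []
    for name, probe in sorted(probes.items()):
        try:
            report.append(ServiceStatus(name, True, probe()))
        except Exception:
            # Any failure of the probe is treated as "service is down".
            report.append(ServiceStatus(name, False, 0))
    return report


def unreachable(report: List[ServiceStatus]) -> List[str]:
    """Names of the services that did not respond."""
    return [status.name for status in report if not status.up]


if __name__ == "__main__":
    # Stand-in probes; a real probe would query the actual service.
    def review_probe() -> int:
        return 3

    def notification_probe() -> int:
        raise ConnectionError("no response")

    statuses = poll_services(
        {"review": review_probe, "notification": notification_probe}
    )
    for status in statuses:
        print(status)
    print("unreachable:", unreachable(statuses))
```

Keeping the probes as plain functions is a deliberate simplification: the same report could then feed a web-based front end or a fat client, whichever the monitoring tool ends up using.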
3.4.2 ”Great to have” (Priority 2)
1) SOA market study: What is the current adoption of SOA technology and what is the
prediction for the (near) future? (This will probably be based on research by companies like
Gartner.) (Estimated workload: 0.5)
2) Comparison of the evolution of SOA adoption and success with older comparable
technologies like CORBA, RMI, DCOM, etc. (Estimated workload: 0.5)
3) Service State: Is it possible for these services to have some kind of ”state”?
(Although this is a bit contradictory to the SOA idea, it can be useful sometimes.)
(Estimated workload: 0.5)
4) Atomic operations: How can we deal with atomic operations (transactions)?
(Estimated workload: 0.5)
5) Predicting failure of long-running BPEL processes: One of the problems with
long-running BPEL processes is that they can fail somewhere in the middle.
Imagine that we are working with the latest WS-BPEL 2.0 standard with the
added BPEL4People extension. Then it is possible that someone forgot to assign
a person to a certain task, resulting in the halting of the process. If we
set a time-out of 3 days before our process reports a failure, we lose a lot of
time and resources. I was wondering if we could have some system (perhaps an
extension to BPEL) that can somehow predict the failure or the success of a
long-running process before the process starts. Right now I don’t know how this
could be done and it is not part of the standard as far as I know, so this would be
worth investigating. (Estimated workload: 1.5)
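As a first intuition for item 5, the specific example given there (a task without an assigned person) can be caught by a static check before the process starts. The sketch below, in Python, is only an assumption about what such a check could look like: HumanTask, ProcessDefinition and preflight_problems are hypothetical names, the data model is not taken from WS-BPEL or BPEL4People, and the check covers just this one class of failure rather than predicting failure in general.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HumanTask:
    """A task that needs a person (hypothetical, not a BPEL4People type)."""
    name: str
    assignee: Optional[str] = None  # None models "someone forgot to assign a person"


@dataclass
class ProcessDefinition:
    """A long-running process, reduced to its list of human tasks."""
    name: str
    human_tasks: List[HumanTask] = field(default_factory=list)


def preflight_problems(process: ProcessDefinition) -> List[str]:
    """Return one message per human task that has no assignee.

    An empty list means this particular check found nothing; it does not
    guarantee that the process will succeed.
    """
    return [
        f"task '{task.name}' has no assignee"
        for task in process.human_tasks
        if not task.assignee
    ]


if __name__ == "__main__":
    submission = ProcessDefinition(
        "paper submission",
        [HumanTask("review"), HumanTask("approve", assignee="editor")],
    )
    for problem in preflight_problems(submission):
        print(problem)
```

Running such a check when a process is started would report the missing assignee immediately, instead of after the time-out expires. Whether other causes of failure can be predicted in a similar way is exactly the open question of this subproject.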
3.4.3 ”Nice to have” (Priority 3)
1) Semantically-Empowered Service-Oriented Architecture (SESA): Use Semantic
Web technologies to make service discovery and negotiation more dynamic.
This will be based on a paper that I’ve written for the Semantic Web course on
the fusion of Semantic Web and Service-Oriented Architecture [34]. (Estimated
workload: 3)
2) Service-enable SAP: Take a look at the product stack of a big ERP vendor like
SAP and see how we can re-architect its product/technology stack according
to the ideas of SOA. We can also ask whether my proposed framework would suit
this purpose. (Estimated workload: 0.5)
3) P2P: ECM systems will probably have some kind of central system that
keeps track of the documents, notifications, changes, etc. However, would it be
feasible to have a P2P-based system? How would we then deal with versions?
How would we then deal with notifications? Is every client then subscribed to all
the other clients that have documents, so that every client is also some kind of service,
or... ? (Estimated workload: 1)
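The workload estimates above can be added up per priority category. The following Python sketch does this arithmetic, using only the estimates listed in sections 3.4.1 to 3.4.3 and the rule that a time slot holds two units of workload. The names WORKLOAD and slots_needed are illustrative, and the thesis document itself is left out because it has no estimate.

```python
import math

SLOT_CAPACITY = 2  # units of workload per time slot, as stated in section 3.4

# Estimated workloads in the order in which the subprojects are listed.
WORKLOAD = {
    "must have": [1, 1, 1, 1, 1, 3, 1, 2],       # items 2) to 9) of section 3.4.1
    "great to have": [0.5, 0.5, 0.5, 0.5, 1.5],  # items 1) to 5) of section 3.4.2
    "nice to have": [3, 0.5, 1],                 # items 1) to 3) of section 3.4.3
}


def slots_needed(units: float) -> int:
    """Number of two-unit time slots needed for the given workload."""
    return math.ceil(units / SLOT_CAPACITY)


if __name__ == "__main__":
    for priority, estimates in WORKLOAD.items():
        total = sum(estimates)
        print(f"{priority}: {total} units, about {slots_needed(total)} time slot(s)")
```

Under these estimates the ”must have” subprojects add up to 11 units (six time slots), the ”great to have” subprojects to 3.5 units and the ”nice to have” subprojects to 4.5 units.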
3.5 TIME PLANNING
While it is always difficult to make a strict plan, it can be useful to keep an eye on
the progress with respect to the finish date. I’ll assume that there will be a meeting
with my thesis supervisors every two weeks to discuss the work and certain issues. Perhaps it would
be useful to arrange a fixed meeting time for period 3 and period 4.
1) 13/02/06 - 26/02/06: Unofficial start of my thesis project. This period will be
mainly reserved for literature study. One of the priorities will be, for instance, to
read the IBM Systems Journal on SOA [17], a reviewed, 250-page collection of
top-quality research papers on Service-Oriented Architecture. Of course the
literature study is not limited to this journal; I still have a whole list of articles
that I need to read, but due to time constraints I’ve just marked them as ”to be
read” until now. I will also use this period to set up my server and development
environment and think about the practical issues and technical details.
2) 27/02/06 - 12/03/06: Part 1:
Case study Elsevier
Generalize case study into a framework
3) 13/03/06 - 26/03/06: Part 2:
Technical architecture
Business Process Management
4) 27/03/06 - 09/04/06: Part 3:
Business Process Management
5) 10/04/06 - 23/04/06: Part 4:
SOA Monitoring and Administration
6) 24/04/06 - 07/05/06: Part 5:
Internal Document Format
Security
7) 08/05/06 - 21/05/06: Part 6:
User Interface
Predicting failure of long-running BPEL processes
8) 22/05/06 - 04/06/06: Part 7: As advised by my thesis supervisors, I will reserve
this time slot as a buffer. I can either use it to catch up or, in case I have time
left, tackle the ”great to have” and the ”nice to have” subprojects.
9) 05/06/06 - 18/06/06: Part 8: As advised by my thesis supervisors, I will reserve
this time slot as a buffer. I can either use it to catch up or, in case I have time
left, tackle the ”great to have” and the ”nice to have” subprojects.
10) 19/06/06 - 29/06/06: Finalization. This should be the period where my thesis
project is under review by the thesis committee and where I can do my thesis
defense.
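The planning above can be checked against the rule from section 3.4 that a time slot normally holds two units of workload, and three when work advances quicker than expected. The Python sketch below performs this check for Parts 1 to 6. One thing in it is an assumption and not stated in the planning: the three units of Business Process Management are split as one unit in Part 2 and two units in Part 3. The names PLAN, slot_loads and classify are illustrative.

```python
SLOT_CAPACITY = 2     # normal units of workload per time slot (section 3.4)
STRETCH_CAPACITY = 3  # allowed when work advances quicker than expected

# Subprojects per time slot with their estimated workload.
PLAN = {
    "Part 1": {"Case study Elsevier": 1, "Generalize case study": 1},
    # Assumption: BPM (3 units in total) is split 1 + 2 over Parts 2 and 3.
    "Part 2": {"Technical architecture": 1, "Business Process Management": 1},
    "Part 3": {"Business Process Management": 2},
    "Part 4": {"SOA Monitoring and Administration": 2},
    "Part 5": {"Internal Document Format": 1, "Security": 1},
    "Part 6": {"User Interface": 1, "Predicting failure": 1.5},
}


def slot_loads(plan):
    """Total planned workload per time slot."""
    return {part: sum(items.values()) for part, items in plan.items()}


def classify(load):
    """Compare a slot's load with the normal and the stretch capacity."""
    if load <= SLOT_CAPACITY:
        return "fits"
    if load <= STRETCH_CAPACITY:
        return "needs stretch"
    return "overbooked"


if __name__ == "__main__":
    for part, load in slot_loads(PLAN).items():
        print(f"{part}: {load} units, {classify(load)}")
```

With this split, Parts 1 to 5 each carry exactly two units, while Part 6 carries 2.5 units and therefore relies on the three-unit stretch or on the buffer slots that follow it.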
3.6 CONCLUSION
During the literature study I noticed that there is room for an open source
SOA-enabled ECM system. There are several open source projects that call themselves
”enterprise” but they are merely Web Content Management Systems with some advanced
features. As professor van den Berg pointed out during the first meeting, there is not a ”real”
SOA-enabled ECM system available yet. Some companies advertise their product as
”SOA-ready” or expose several functions as services, but none has built an ECM
system from the ground up with Service-Oriented Architecture in mind.
A possible direction for this project is to use as many existing components as possible
(for instance from the Apache Software Foundation) and try to release it in the future as an ECM system.
Although a lot of good SOA software already exists, the young nature of the field
of Service-Oriented Architecture leaves enough opportunities to improve existing standards
and products.
I realize that I can’t deliver a full-featured, production-ready Enterprise Content
Management System in my limited thesis period, but my work can serve as a starting point for
future collaborations between the Software Technology group and the Content and Knowledge
Engineering group. Students from both groups could, for instance, extend and improve the product.
BIBLIOGRAPHY
[1] AIIM. Aiim - the enterprise content management association. URL:
http://www.aiim.org/.
[2] Six Apart. Movable type. URL: http://www.sixapart.com/movabletype/.
[3] bexee. bexee - bpel execution engine. URL: http://bexee.sourceforge.net/.
[4] Zope Corporation. Zope. URL: http://www.zope.com/.
[5] Bram Dons. Opslagnetwerken: DAS, SAN en NAS. Academic Service, 2003. ISBN 90
395 2142 5.
[6] EMC. Emc documentum. URL: http://www.documentum.com/.
[7] eZ systems. ez systems. URL: http://ez.no/.
[8] Dieter Fensel. Semantically-empowered service-oriented architecture (sesa). Technical
report, DERI International, 2005. URL:
http://www.cs.umd.edu/users/hendler/2005/Fensel.pdf.
[9] Apache Software Foundation. Apache agila bpel. URL:
http://wiki.apache.org/agila/.
[10] Apache Software Foundation. Apache servicemix. URL: http://servicemix.org.
[11] Colleen Frye. Soa governance. URL:
http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci1164393,00.html.
[12] Dana Gardner. Has microsoft missed the soa bus? URL:
http://blogs.zdnet.com/BTL/?p=1879.
[13] Hewlett-Packard. Adaptive enterprise. URL:
http://h71028.www7.hp.com/enterprise/cache/6842-0-0-225-121.html.
[14] Hewlett-Packard. Hp press release: Hp expands soa services, opens center to help
customers enhance business performance. URL:
http://www.hp.com/hpinfo/newsroom/press/2005/050628c.html.
[15] Hewlett-Packard. Adaptive enterprise: Business and it synchronized to capitalize on
change. Technical report, Hewlett-Packard, 2005. URL:
http://h71028.www7.hp.com/enterprise/downloads/Adaptive%20Enterprise%20Overview%20WP_4AA0-0760ENW,Rev%201.pdf.
[16] Hummingbird. Hummingbird. URL: http://www.hummingbird.com/.
[17] IBM. Ibm systems journal: Service-oriented architecture. URL:
http://www.research.ibm.com/journal/sj44-4.html.
[18] Intalio. Intalio pxe. URL: http://pxe.fivesight.com/wiki/display/PXE/Home.
[19] Miro International. Mambo. URL: http://www.mamboserver.com/.
[20] Martin LaMonica. Open source projects intertwine for integration. URL:
http://news.com.com/Open-source+projects+intertwine+for+integration/2100-7344_3-5844789.html?tag=nef.
[21] LogicBlaze. Servicemix. URL: http://servicemix.codehaus.org/.
[22] International Business Machines. New to soa and web services. URL:
http://www-128.ibm.com/developerworks/webservices/newto/#7.
[23] International Business Machines. On-demand business. URL:
http://www-306.ibm.com/e-business/ondemand/us/index.html.
[24] International Business Machines. Ws-bpel for people. URL:
http://www-128.ibm.com/developerworks/webservices/library/specification/ws-bpel4people/.
[25] Joe McKendrick. Controlling one’s own fate, through soa. URL:
http://blogs.zdnet.com/service-oriented/?p=524&part=rss&tag=feed&subj=zdblog.
[26] Joe McKendrick. Is soa a 53 billion dollar secret? URL:
http://blogs.zdnet.com/service-oriented/?p=520&part=rss&tag=feed&subj=zdblog.
[27] Michael Meehan. Jboss’ marc fleury on soa standards, java and paranoia, part 2. URL:
http://searchwebservices.techtarget.com/qna/0,289202,sid26_gci1155951,00.html.
[28] Microsoft. .net architecture center: Service-oriented architecture. URL:
http://msdn.microsoft.com/architecture/soa/default.aspx.
[29] ObjectWeb Open Source Middleware. Celtix: Open source java enterprise service bus.
URL: http://celtix.objectweb.org/.
[30] OASIS. Open document format for office applications (opendocument) v1.0. URL:
http://www.oasis-open.org/committees/download.php/16053/OpenDocument-v1.0-os_tagged.pdf.
[31] ObjectWeb. Objectweb celtix. URL: http://celtix.objectweb.org/.
[32] Oracle. Weaving web services together. URL:
http://www.oracle.com/technology/oramag/oracle/04-jul/o44dev_web.html.
[33] Java Community Process. Jsr 208: Java business integration. URL:
http://www.jcp.org/en/jsr/detail?id=208.
[34] Lee Provoost and Erwan Bornier. Service-oriented architecture and the semantic web: a
killer combination? URL: http://lee.webcoder.be/papers/sesa.pdf, February
2006.
[35] The Register. The register. URL: http://www.theregister.co.uk.
[36] M.-T. Schmidt, B. Hutchison, P. Lambros, and R. Phippen. Elements of the
service-oriented-architecture infrastructure, volume 44, chapter The Enterprise Service
Bus: Making service-oriented architecture real, page 781. IBM, 2005. URL:
http://www.research.ibm.com/journal/sj/444/schmidt.pdf.
[37] SDN. Ws-bpel extension for people. Technical report, SAP Developer Network, 2005.
URL: https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/library/uuid/cfab6fdd-0501-0010-bc82-f5c2414080ed.
[38] SymphonySoft. Symphonysoft mule. URL: http://mule.codehaus.org/.
[39] BEA Systems, IBM, Interface21, IONA, Oracle, SAP, Siebel, and Sybase. Service
component architecture. URL:
http://www.iona.com/devcenter/sca/SCA_White_Paper1_09.pdf.
[40] IONA Technologies. Iona artix. URL:
http://www.iona.com/products/artix/welcome.htm.
[41] IONA Technologies. Iona technologies. URL: http://www.iona.com/.
[42] Fourth Edition The American Heritage Dictionary of the English Language. Wysiwyg.
URL: http://dictionary.reference.com/search?q=wysiwyg.
[43] One Hundred Seventh Congress Of the United States of America. Sarbanes-oxley act of
2002. URL:
http://files.findlaw.com/news.findlaw.com/hdocs/docs/gwbush/sarbanesoxley072302.pdf.
[44] Stefan Tilkov. Choreography vs. orchestration. URL:
http://www.innoq.com/blog/st/2005/02/16/choreography_vs_orchestration.html.
[45] Tommy. Alfresco releases open source enterprise content management solution. URL:
http://www.linuxelectrons.com/article.php/20051110092348936.
[46] CMS Wiki. History of cms. URL:
http://72.14.207.104/search?q=cache:VZ7jW_gtOTIJ:www.cmswiki.com/tiki-index.php%3Fpage%3DHistoryOfCMS+history+cms&hl=en&lr=&client=safari&strip=1.
[47] Wikipedia. Document management system. URL:
http://en.wikipedia.org/wiki/Document_management_system.
[48] Wikipedia. Bpel. URL: http://en.wikipedia.org/wiki/BPEL.
[49] Wikipedia. Content management system. URL:
http://en.wikipedia.org/wiki/Content_management_system.
[50] Wikipedia. Enterprise content management system. URL:
http://en.wikipedia.org/wiki/Enterprise_content_management_System.
[51] Wikipedia. Service-oriented architecture. URL:
http://en.wikipedia.org/wiki/Service-oriented_architecture.
[52] Wikipedia. Wikipedia. URL: http://www.wikipedia.org.
[53] WordPress. Wordpress. URL: http://wordpress.org/.
[54] ZDNet. Zdnet. URL: http://www.zdnet.com/.