A SOA-ENABLED ENTERPRISE CONTENT MANAGEMENT SYSTEM

A thesis proposal
Presented to the Department of Information and Computing Sciences, Utrecht University
In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science

by Lee Provoost
February 2006

UTRECHT UNIVERSITY

The Undersigned Faculty Committee Approves the thesis proposal of Lee Provoost:
A SOA-Enabled Enterprise Content Management System

prof. dr. S. D. Swierstra, Chair, Software Technology Group
dr. A. Bijlsma, Thesis Supervisor, Software Technology Group
prof. dr. J. van den Berg, Thesis Supervisor, Content and Knowledge Engineering Group

Approval Date

ABSTRACT OF THE THESIS PROPOSAL

A SOA-Enabled Enterprise Content Management System
by Lee Provoost
Master of Science in Computer Science
Utrecht University, 2006

An Enterprise Content Management (ECM) system is widely recognized as a key asset in today's companies. Such systems are necessary to streamline and manage the flow of huge amounts of data inside a corporate environment. Every company changes or evolves during its lifecycle, and Service-Oriented Architecture (SOA) tries to address this problem of change. These "Adaptive Enterprises" (as described by HP) could benefit from an Enterprise Content Management system built upon the principles of SOA. This master's thesis tries to design an architecture for such a system. However, the field of SOA is very young, and many technologies and standards are in their infancy and still need a lot of work to reach production status. As part of this project, I'll try to address several of these issues.

TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
CHAPTER
1 INTRODUCTION
  1.1 Short description of thesis project
  1.2 Purpose of this document
2 CONTEXT
  2.1 Enterprise Content Management System
    2.1.1 Definition
    2.1.2 Implementations and status report
  2.2 Service-Oriented Architecture
    2.2.1 Definition
    2.2.2 Value of a SOA architecture
    2.2.3 Implementations and status report
  2.3 Business Process Execution Language
    2.3.1 Definition
    2.3.2 Implementations and status report
  2.4 Human Based Web Service
    2.4.1 Definition
    2.4.2 Implementations and status report
  2.5 Enterprise Service Bus
    2.5.1 Definition
    2.5.2 Implementations and status report
  2.6 Service Component Architecture
    2.6.1 Definition
    2.6.2 Implementations and status report
    2.6.3 Considerations
3 THESIS PROJECT
  3.1 Research Questions
  3.2 Own Contribution
    3.2.1 Participate in open source projects
    3.2.2 SOA - ECMS Framework
    3.2.3 Master thesis
  3.3 Approach
    3.3.1 Maintain a public blog
    3.3.2 Participate in discussions
    3.3.3 Share intermediate results
  3.4 Project Deliverables
    3.4.1 "Must have" (Priority 1)
    3.4.2 "Great to have" (Priority 2)
    3.4.3 "Nice to have" (Priority 3)
  3.5 Time Planning
  3.6 Conclusion
BIBLIOGRAPHY

LIST OF FIGURES

Figure 2.1 AIIM's ECM architecture
Figure 2.2 A simplified SOA architecture
Figure 2.3 IBM SOA software stack
Figure 2.4 Simplified ESB Architecture

CHAPTER 1
INTRODUCTION

1.1 SHORT DESCRIPTION OF THESIS PROJECT

This master's thesis project tries to design an Enterprise Content Management (ECM) system according to the principles of Service-Oriented Architecture (SOA). While several commercial companies are working on service-enabling their software stack, there is no ECM system that has been built from scratch with the ideas of SOA in mind. The field of SOA is very young, and most technologies and standards are still in their infancy and require a lot of effort to become production-ready. The goal of this project is to identify weak points in the current standards and implementations and to try to come up with solutions for them. At the end, a prototype of an open-source SOA-enabled ECM system should be delivered.

1.2 PURPOSE OF THIS DOCUMENT

This thesis proposal describes the context and the project of my master's thesis. It first describes the technologies and standards that I will need in my work. This is important because definitions are often vague and everyone usually has their own understanding of what a certain technology means. I don't want to pretend that my definitions are the perfect ones, but they serve as a basis for a common understanding between my supervisors and me. Secondly, I'll explain what I am going to do during my thesis project. I'll clarify my research questions, my approach, my contributions to the field and my project deliverables, and I will give a time plan.
CHAPTER 2
CONTEXT

This chapter introduces the main concepts that I will use in my thesis. I will start with my understanding of Service-Oriented Architecture and Enterprise Content Management, because these two terms are quite vague and everybody seems to have their own definition. Then I will take a closer look at some other technologies that I might need; during my thesis I'll decide which of them I will use. For each technology, some implementations will be discussed, together with a status report stating whether the implementation is still in beta or already a mature product. That report can serve as a basis for identifying needs that we can try to address as part of the thesis project. [footnote 1]

2.1 ENTERPRISE CONTENT MANAGEMENT SYSTEM

Definition from the Association for Information and Image Management (AIIM) [footnote 2]: "Enterprise Content Management is the technologies used to Capture, Manage, Store, Preserve and Deliver content and documents related to organizational processes. ECM tools and strategies allow the management of an organization's unstructured information, wherever that information exists. The ECM industry provides information management solutions to help users: guarantee business continuity; enable employee, partner, and customer collaboration; ensure legal and regulatory compliance and reduce costs through process streamlining and standardization." [1]

2.1.1 Definition

Before I explain each part of the AIIM definition given above, let me first introduce the terms Content Management System (CMS) [49] and Document Management System (DMS) [47]. There is a lot of confusion about these terms: they are often used interchangeably with each other and with the term Enterprise Content Management (ECM) [50]. After the definitions of CMS and DMS, we'll take a closer look at what an ECM is.

Footnote 1: I should note that the technologies that I will use in my final work are not limited to the ones described in this chapter.
In my final work, some other technologies might be introduced.
Footnote 2: Also known as the Enterprise Content Management Association.

2.1.1.1 CONTENT MANAGEMENT SYSTEM

According to the CMS Wiki [46], the history of electronic Content Management Systems dates back to the mid-seventies, when electronic publishing was done on mainframe systems. Nowadays, when people talk about a CMS, they are usually referring to web-based Content Management Systems, which are only a specific subset of Content Management Systems. Apparently CNET already had an internally developed Web Content Management System in 1995, which was spun off a bit later into Vignette (now a leading ECM vendor).

The CMS Wiki defines several core concepts:
1) Content needs to be stored in a repository. That can be either on the file system or in a database.
2) WYSIWYG [footnote 3] editing of content.
3) Workflow automation of tasking sequences and business rules.
4) Content check-in and check-out.
5) Version control.
6) Linking of content with each other.
7) Metadata of content (e.g. date changed, author, editors, etc.).
8) Reuse of content in other content.
9) Multi-channel delivery (print, PDF, PDA, cell phone, etc.).
10) Personalization.
11) Multiple languages and localization.
12) Separation of creation, management and delivery of content.

Footnote 3: "WYSIWYG is an acronym for What You See Is What You Get. Relating to or being a word-processing or desktop publishing system in which the screen displays text exactly as it will be printed" [42].

However, this definition by the CMS Wiki is rather strict. Most systems that are tagged as "CMS" do not adhere to all these requirements, especially in the field of Web Content Management Systems. These systems are very popular nowadays, because wikis (like Wikipedia [52]) and weblogs (such as those built with Movable Type [2] and WordPress [53]) are also considered to be Content Management Systems. While a wiki adheres to most of the rules described above, a blog does not.
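To make some of the core concepts listed above more concrete (repository storage, check-in and check-out, version control and metadata), the following is a minimal sketch in Python. It is only an illustration under stated assumptions: the class and method names (Repository, check_out, check_in, history) are hypothetical and do not come from the CMS Wiki or from any existing product, and an in-memory dictionary stands in for the file system or database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Version:
    """One revision of a content item, with basic metadata (author, date)."""
    number: int
    body: str
    author: str
    changed: datetime


@dataclass
class ContentItem:
    """A content item with a linear version history and a check-out lock."""
    name: str
    versions: List[Version] = field(default_factory=list)
    checked_out_by: Optional[str] = None


class Repository:
    """Hypothetical in-memory stand-in for a file-system or database repository."""

    def __init__(self) -> None:
        self._items: Dict[str, ContentItem] = {}

    def create(self, name: str, body: str, author: str) -> Version:
        """Add a new item and store its first version."""
        item = ContentItem(name)
        self._items[name] = item
        return self._add_version(item, body, author)

    def check_out(self, name: str, user: str) -> str:
        """Lock the item for `user` and return the latest body for editing."""
        item = self._items[name]
        if item.checked_out_by is not None:
            raise PermissionError(f"{name} is checked out by {item.checked_out_by}")
        item.checked_out_by = user
        return item.versions[-1].body

    def check_in(self, name: str, user: str, body: str) -> Version:
        """Store a new version and release the lock held by `user`."""
        item = self._items[name]
        if item.checked_out_by != user:
            raise PermissionError(f"{name} is not checked out by {user}")
        item.checked_out_by = None
        return self._add_version(item, body, user)

    def history(self, name: str) -> List[Version]:
        """Return all versions of the item, oldest first."""
        return list(self._items[name].versions)

    def _add_version(self, item: ContentItem, body: str, author: str) -> Version:
        version = Version(len(item.versions) + 1, body, author,
                          datetime.now(timezone.utc))
        item.versions.append(version)
        return version
```

In this sketch a second check_out on an item that is already locked raises an error, which is the behaviour that distinguishes check-in/check-out from plain version control. A wiki-style system could keep the version history and simply drop the lock.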
There are also several frameworks available for creating your own Web Content Management System, like Mambo [19] and Zope [4].

Wikipedia [49] adds that there are actually several categories of Content Management Systems, of which Enterprise Content Management is one. Next to ECM systems, there are also Web Content Management Systems (as described earlier), Publications Management Systems (for publishing manuals, books, etc.) and several others.

2.1.1.2 DOCUMENT MANAGEMENT SYSTEM

Originally, a Document Management System was used to manage scanned documents. Companies wanted to keep track of paper documents and to store them electronically. We can distinguish three kinds of Document Management Systems: Electronic Document Management Systems, Integrated Document Management Systems and Physical Paper Document Management Systems (also known as Document Imaging Systems).

Electronic Document Management Systems manage (as their name implies) electronic documents like Microsoft Word documents. The difference with the original DMS is that the document is also created electronically and is not the result of scanning a paper document. Such a system offers the possibility to sign documents electronically and can support legal requirements like Sarbanes-Oxley [footnote 4]. Last but not least, all documents are under rights-management control, so there are restrictions on who can access, view, print or modify a document. An Electronic Document Management System is typically part of an Enterprise Content Management System.

An Integrated Document Management System is a DMS that is integrated with a document-authoring tool, like Microsoft Word. In this case, a person could open a document from the repository directly in Word, edit it and save it. The Integrated Document Management System takes care of version management and access rights.

Physical Paper Document Management Systems are in fact the traditional Document Management Systems that manage scanned documents.
Usually the user is required to add some tags to ease the search for a particular document. With the popularity of Optical Character Recognition (OCR) software, these systems are able to recognize text and store it together with the scanned document.

Footnote 4: The Sarbanes-Oxley Act of 2002 is intended to protect investors by improving the accuracy and reliability of corporate disclosures made pursuant to the securities laws, and for other purposes. This act came after the huge Enron financial scandal in the United States. More information about the Sarbanes-Oxley Act can be found in [43].

2.1.1.3 ENTERPRISE CONTENT MANAGEMENT

Let me recall the definition from AIIM: "Enterprise Content Management is the technologies used to Capture, Manage, Store, Preserve and Deliver content and documents related to organizational processes." The five actions can also be found in the architectural schema from AIIM (Fig. 2.1). I'll explain them further:
1) Capture: Generating, capturing, preparing and processing analog and electronic information. The source can be digital (EDI documents, XML documents, ...) or analog (scanned invoices, ...).
2) Manage: Management, processing and use of information.
   a) Document management: check-in/check-out, versioning, search, navigation and visualization.
   b) Collaboration: whiteboards for brainstorming, appointment scheduling, project management, jointly usable information databases, etc.
   c) Web Content Management: the Web interface to our ECM. Tasks include conversion from and to Web formats, editing and creation of documents, access control (like guest access), etc.
   d) Business Process Management: workflow functionality, process and data monitoring, Enterprise Application Integration (see Enterprise Service Bus further in this chapter) and business intelligence (e.g. rules).
3) Store: In this definition, storing content is different from preserving content. The "store" action is used for the temporary storage of information that does not require archiving.
We use repositories for this action, which can be either the file system or a database.
4) Deliver: This is also called "output management". Here we take care of delivering the content to the end user. This includes converters (for instance to PDF), compression (images), layout (for instance with XSLT), access control (the right user accesses the right document), etc.
5) Preserve: Preserving is the long-term archiving of the content. Nowadays the archive is often stored on a Storage Area Network (SAN, [5]) or Network-Attached Storage (NAS, [5]) instead of the traditional backup tapes. The preserve action also includes the work necessary to keep your data accessible. For older data, for instance, it may be necessary to convert them to a new file format or to emulate viewers in order to access them.

2.1.2 Implementations and status report

Here I give a brief overview of some popular ECM systems. During my thesis phase, I will take a thorough look at them to find out what these systems offer and how I can integrate these features in my framework. For now, I'll take the five actions within an ECM (Capture, Manage, Store, Preserve and Deliver) as a guide for my review. You will see that the commercial vendors have a more extensive portfolio and often have several products, instead of just one, to fulfill one of the five actions.

2.1.2.1 COMMERCIAL

1) Hummingbird [16]: Hummingbird has seven products, each with several features that fit into the "five-action" model. I'll mention the product with the specific task.
   Capture: Hummingbird Enterprise (capturing); Hummingbird DOCS (image capture + annotation); Hummingbird ImageBASIC (image capture)
   Manage: Hummingbird Enterprise (version control + workflow + collaboration); Hummingbird Business Intelligence; Hummingbird SearchServer; Hummingbird DOCS (routing of documents)
   Store: Hummingbird Enterprise; Hummingbird DOCS
   Preserve: Hummingbird Enterprise (archiving)
   Deliver: Hummingbird Enterprise (security); Hummingbird ImageBASIC (image viewer)
2) EMC Documentum [6]: Documentum claims to have 80 products that fall into four categories: content services, process services, repository services and integration services. They are trying to evolve their product stack towards a SOA model, which makes Documentum really interesting for me to take a closer look at.
   Capture: Content services (capture + edit)
   Manage: Process services (collaboration + business process management); Content services (search)
   Store: Repository services (storage); Content services (library services)
   Preserve: Repository services (archiving)
   Deliver: Content services (transformation + publishing/distribution)

2.1.2.2 OPEN SOURCE

1) Alfresco [45]: Alfresco calls itself "the Open Source alternative for Enterprise Content Management". They deliver a full-featured open-source ECM, but also a commercial version. The main difference lies in the support and in some enterprise features like single sign-on, clustering, LDAP authentication, etc.
   Capture: (none listed)
   Manage: Check-in/check-out; version control; metadata on document editing (who created, who updated, etc.); team collaboration; integrated workflow; document lifecycle management; advanced search
   Store: Virtual file system
   Preserve: (none listed)
   Deliver: Transformation to different file formats like PDF, Flash and PowerPoint; document security; user management
2) eZ systems [7]: eZ systems has quite a good package for Web Content Management with built-in blog, forum and Wiki.
However, calling their package Enterprise Content Management is a bit too much.
   Capture: (none listed)
   Manage: Version control with rollback; workflow system
   Store: (none listed)
   Preserve: (none listed)
   Deliver: Automatic image conversion; preview, drafts; content stored in XML that can be converted to all kinds of formats

2.1.2.3 CONCLUSION

The commercial ECM vendors seem to be able to span the whole lifecycle of content, from capture to delivery. In my opinion they also deserve the label "Enterprise". In the field of open-source ECM, products often call themselves "Enterprise" while they are merely Web Content Management Systems geared towards Wiki-style usage. Alfresco, however, is a very convincing open-source product and worth a closer look during my thesis period. Last but not least, the open-source products seem to lack the "capture" and "preserve" parts of the ECM lifecycle, but that is not much of a problem for my project. During my thesis I have to do a more extensive study of existing ECM products and distill the features that I want to include in my framework.

2.2 SERVICE-ORIENTED ARCHITECTURE

2.2.1 Definition

Service-Oriented Architecture (further abbreviated as SOA) is an architectural pattern that defines the use of services to support the requirements of software users. Services are software components that allow remote access over standard protocols and provide declarative descriptions of their requirements and capabilities. These services have well-defined interfaces that are independent of platform, programming language and operating system; in other words, the services are loosely coupled. These heterogeneous services can interact with each other in a uniform and universal manner (also denoted as interoperability). This loose coupling between services benefits companies that adopt a SOA strategy, because it allows evolutionary and even radical change in the internal IT infrastructure without a whole rewrite of the software.
The need of a business to change can be caused by partnerships, mergers, acquisitions, a changed business focus, changing policies, etc. Therefore big vendors like HP like to call companies that have adopted a SOA strategy "Adaptive Enterprises", and IBM uses the term "On-Demand Business". The latter refers to the fact that the way things are done or work can change as necessary, on demand.

2.2.1.1 SOA SERVICE CHARACTERISTICS

1) Stateless: A SOA service should operate independently of other services, without pre-conditions and side effects. Therefore a service should be provided with all the data necessary to do its job.
2) Technology independent: It does not matter what the underlying technology of the service is: neither the platform, nor the programming language, nor the programming paradigm (object-oriented, functional, ...).
3) Well-defined, standard interface.

2.2.1.2 SOA COMPONENTS

Based on Figure 2.2, I'll introduce the components of a SOA.
1) Service provider: A service provider offers a service and registers itself with a service registry to allow discovery by a service consumer. A service is always started by a request from a service consumer.
2) Service consumer: A service consumer requests a service from a service provider and supplies it with data. If the service provider sends a response, the service consumer processes the result.
3) Service discovery: As in a basic Web Services architecture (which does not include components like an Enterprise Service Bus and Business Process Management), a services repository is important for dynamically discovering services that can execute a certain task. In a SOA environment this services repository is a key component, but alas it is often ignored in real-life implementations. One of the major problems is that the discovery of services is not as dynamic as foreseen.
Initiatives like "Semantically-empowered Service-Oriented Architectures" try to use the power of the Semantic Web to enrich services in a SOA to allow dynamic discovery and negotiation [8].
4) Service binding: Once a service consumer has discovered an appropriate provider, a binding is established at run-time.
5) Services orchestration and choreography: The difference between service orchestration and service choreography is subtle and usually not well understood. Stefan Tilkov made an analogy in his weblog [44] to clarify the difference. I quote: "In orchestration, there's someone - the conductor - who tells everybody in the orchestra what to do and makes sure they all play in sync. In choreography, every dancer follows a pre-defined plan - everyone independently of the others." While this helps at first to understand the subtle difference between the general meanings of choreography and orchestration, we still need to relate it to services. Wikipedia [51] describes orchestration as: "Sequencing services and providing additional logic to process data. Does not include data presentation.", while it has a more extensive definition for choreography: "Broadly, a choreography defines how a party interacts with other external parties, for example in terms of the order of message exchange, or, the path of navigation through a service. The party, from whose perspective the choreography is viewed, may either be the client (which could be an orchestration), meaning the choreography defines the conversation with a service, or the deployed service itself, in which multiple clients may be involved." To sum up, we can see orchestration as a kind of executable process: we use for instance BPEL (described further in this chapter) to define business processes that can be executed on an orchestration engine. Choreography can then be seen as a kind of multi-party collaboration, in which we describe the externally observable interactions between services.
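The interplay of the components above (provider, consumer, registry, run-time binding and orchestration) can be sketched in a few lines of Python. This is a deliberately simplified illustration and not an implementation of any standard: plain in-process function calls stand in for remote calls, and the names ServiceRegistry, convert_to_pdf, archive and publish_process are hypothetical. No SOAP, WSDL, UDDI or BPEL is involved.

```python
from typing import Callable, Dict, List

# A service is modelled as a stateless function from a request message
# to a response message; both are plain dictionaries.
Message = Dict[str, object]
Service = Callable[[Message], Message]


class ServiceRegistry:
    """Hypothetical in-memory registry: providers register, consumers discover."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[Service]] = {}

    def register(self, capability: str, service: Service) -> None:
        """Called by a service provider to announce a capability."""
        self._providers.setdefault(capability, []).append(service)

    def discover(self, capability: str) -> Service:
        """Called by a consumer; the binding is established at run time."""
        providers = self._providers.get(capability)
        if not providers:
            raise LookupError(f"no provider for {capability!r}")
        return providers[0]


# Two illustrative providers. Each receives all the data it needs in the
# request and keeps no state between calls.
def convert_to_pdf(request: Message) -> Message:
    return {"document": f"{request['document']}.pdf"}


def archive(request: Message) -> Message:
    return {"archived": request["document"]}


def publish_process(registry: ServiceRegistry, document: str) -> Message:
    """Orchestration: one central process decides the order of the calls."""
    converted = registry.discover("convert")({"document": document})
    return registry.discover("archive")(converted)


registry = ServiceRegistry()
registry.register("convert", convert_to_pdf)
registry.register("archive", archive)
result = publish_process(registry, "report")
```

Because the consumer (publish_process) only knows the capability name and the message format, a provider can be replaced without touching the process, which is the loose coupling the definition refers to. Choreography is not shown here: it would describe only the observable message exchange between the parties, without a central process that drives them.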

2.2.2 Value of a SOA architecture

Regardless of the technical merits (loosely coupled, interoperable, standards based, etc.), we still need to ask how we can "sell" this concept to business people. Without enterprise adoption, SOA will never take off. That is why it is interesting to follow the articles of important journalists in the area of software architecture, CIOs, CEOs and other important persons in the field of computer science. While whitepapers from the big vendors might be a little biased, some of their arguments are worth taking into account.

HP's "executive overview whitepaper" specifically targets senior management (and thus the decision makers) and is titled "Adaptive Enterprise: business and IT synchronized to capitalize on change" [15]. As mentioned before, HP wants to emphasize the keyword "change" and uses it throughout all its examples: for instance, a company that explores new markets, that takes over other companies, that grows... It is interesting to note that they bring a sound and reasonable story without mentioning any technical aspects like loosely coupled interfaces and interoperability.

An interesting side effect of adopting a SOA strategy is mentioned by Joe McKendrick, SOA market watcher at ZDNet [25]. He describes the strategy of a company that sells light decoration. They run an old and unsupported version of the ERP software PeopleSoft but refuse to follow the upgrade path forced by the vendor. They heavily customized their PeopleSoft implementation, wrapped the back-end services in a SOA "coating" and wrote the integration brokers themselves. That is a very interesting aspect of adopting a SOA strategy: you can happily integrate your older legacy software in your new SOA environment. It then no longer matters when you upgrade that part of the infrastructure. Moreover, older systems often contain several man-years of work, and it would be very uneconomical to throw that away.
Last but not least, McKendrick [26] mentions a report from the AberdeenGroup that calculated that the world's largest companies could together save 53 billion US dollars in the next five years if they adopt a SOA strategy. It is often vague how such numbers are arrived at, but it gives at least an idea, and cost reduction is often a good motivator for management to choose change.

2.2.3 Implementations and status report

As I mentioned earlier, SOA is an architectural pattern and not something you can buy boxed in a shop, like Microsoft Office. Therefore it is really hard to define what kind of software can be tagged as SOA. The big vendors push their whole middleware stack as "SOA-enabled". Usually this means that, using their application servers, BPEL implementations, etc., you can develop an enterprise application based on SOA. I will have a look at some commercial SOA solutions from the big vendors, but I won't spend much time on them.

I will, however, invest more time in the search for open-source SOA components and software, because I will need them for my own ECM implementation. I will compare several products and give an initial status report. A thorough comparison will be made during my thesis period.

2.2.3.1 COMMERCIAL

1) IBM (International Business Machines) [23]
   Slogan: On-Demand Business
   Product overview: IBM is one of the biggest and earliest pushers of SOA technology. Moreover, their whole business model can be captured in their "On-Demand Business". A full and thorough overview of all their SOA-enabled software is out of the scope of this proposal, but I found a good picture (Fig. 2.3) that gives an overview of their SOA products. Detailed information on the technologies used can be found on their website [22].
   Evaluation: There is no doubt that IBM has one of the most mature and extensive SOA offerings in their software stack.
They were at the root of several SOA standards and are eager to push SOA in the direction they think it should go. A nice example is the Service Component Architecture technology. A lot of code has been donated to the open-source community to drive the adoption of SOA, and they continuously push the SOA technologies further, as with the upcoming BPEL4People [24] standard.
2) HP (Hewlett-Packard) [13]
   Slogan: The adaptive enterprise
   Product overview: HP's focus is on delivering management software and consulting services [14], and not on concrete SOA applications, SOA development tools or SOA middleware. Their consulting staff is trained to give support for third-party vendors like SAP, BEA and JBoss. The management software is known as the HP SOA Manager, formerly known as HP OpenView. So what can the SOA Manager do? The main focus is Web Services management. It contains an overview of all the available services and provides full monitoring of aspects like performance, security, availability, etc. It can also be used to build virtual business services on top of the Web Services, so it allows orchestration of services. The SOA Manager consists of four components: Web Services Management, Management Integration Platform, Business Services Catalog and Business Service Designer, and the Identity Management component [footnote 5].
   Evaluation: HP's "Adaptive Enterprise" wants to be an answer for changing enterprises, "change" in the sense that enterprises become bigger, focus on other products and markets, merge with other enterprises, etc. To "adapt" to these changes, enterprises need a SOA-based solution so that they can quickly address new situations. The executive whitepaper from HP [15] really stresses that time is a critical factor in staying ahead of the competition and that, using a Service-Oriented Architecture, "an enterprise's IT team can quickly and easily reassemble and reconfigure core application and infrastructure services into a wide range of new business services".
In contrast to IBM, HP focuses on management tools and consulting services for companies that want to adopt SOA. They mainly use third-party software to address the needs of their customers, and for the hardware they can shop in their own store. HP has impressive server systems and operating systems, so in a way SOA is just another way to generate income for their hardware department.

3) Microsoft [28]
Slogan: (None as far as I know)
Product overview: Microsoft’s SOA efforts are combined in their Windows Communication Foundation (WCF), previously known under the code name Indigo. It replaces technologies like ASMX (ASP.NET Web Services) and MSMQ (Microsoft Message Queue) and tries to offer a one-stop-shopping solution. It will be integrated with their upcoming Windows operating system releases and .NET technology stack. According to ZDNet blogger Dana Gardner [12], the main difference between IBM and Microsoft is that “IBM envisions a future where the new productivity action is above the interfaces, runtimes and discrete messaging protocols. By elevating process innovation above the older platforms and embracing open source tools and frameworks, IBM and BEA see SOA abstracting much of an enterprise’s legacy assets and resources into standards-based services that can be modeled and deployed as general and easily tuned processes, regardless of their origins.” According to Gardner, Microsoft’s goal is to keep their SOA platform closed, and they want to encourage developers to build services solely on .NET technology. However, integration of third-party services into their Enterprise Service Bus is supported, and in the architectural overview of the WCF [28] Microsoft states that SOAP will be the message protocol and thus that their SOA stack will be interoperable.

Footnote 5: The Identity Management component enables secure web services, secure provisioning, and secure command and control.
Evaluation: In contrast to other big vendors like IBM and HP, Microsoft does not use its SOA commitment in its marketing campaigns. You even have to search quite hard to find something on their site, although a Google search shows that they host a big SOA page on MSDN (the URL can be found in the reference list). Right now it is too early to give an opinion on Microsoft’s SOA strategy, but we know that Microsoft’s strategy is to produce for a mass market, and when the market demands SOA, Microsoft will be ready to deliver it.

2.2.3.2 OPEN SOURCE

1) Apache http://www.apache.org
Slogan: N/A
Product overview:
ActiveMQ: Message Queue
Axis: SOAP implementation
Geronimo: J2EE 1.4 Certified Application Server
jUDDI: UDDI Service Registry
Synapse: Web Services Mediation Framework
ServiceMix: Enterprise Service Bus
Tuscany: Implementation of the Service Component Architecture
Woden: WSDL 2.0 implementation
Agila: Business Process Management
Ode: Business Process Management
Evaluation: A lot of code and work has been donated to the Apache Software Foundation by industry giants such as IBM, HP and others. While the projects are usually of high quality, they all originate from different code bases and are therefore not consistent with each other. There is still a lot of work to do to align Apache’s SOA software stack into a homogeneous and competitive product, but I am convinced that if we want to make an open-source SOA stack happen, we all have to put our shoulders under the Apache projects. Therefore I will pay a lot of attention to Apache’s product stack for my SOA-enabled ECM system implementation.

2.2.3.3 CONCLUSION

It is certain that the big vendors are ready for this “next big thing” called SOA. This is not surprising, considering that the whole SOA (r)evolution was started and is driven by industry. Usually the standards are made first, a reference implementation is developed and then industry follows.
Here it is the other way around: the standards were derived from the implementations of the big vendors. Although there are some standards, a lot of technologies are not standardized yet (or are in the process of being standardized), and basically every company has its own interpretation of what exactly a SOA is and how far you can take it. The developments in the open-source world are heavily backed by industry (the big vendors are donating code and engineers to, for instance, Apache projects), but the products still need a lot of work. One of the big SOA projects at Apache, called Tuscany, is still in incubation (footnote 6) and needs a lot of work to reach even some kind of beta status. The open-source community will most likely need another six months to a year before it can deliver a product at the level of Apache Tomcat and Apache httpd. Due to this immaturity of the open source implementations, there are a lot of opportunities for me to jump into some projects and help, whether by testing, debugging, donating code, etc. The big vendors are pushing their marketing departments to evangelize their SOA product stacks; the only one that is a bit more silent about the whole SOA buzz is Microsoft. Right now we don’t have a consistent open-source SOA software stack, but things might change soon because the Apache Software Foundation is working hard on it. Despite the lack of a complete open source SOA stack, impressive and very mature open source software is already available, like Apache’s ServiceMix [21] and ObjectWeb’s Celtix [29].

2.3 BUSINESS PROCESS EXECUTION LANGUAGE

The Business Process Execution Language (BPEL) was born out of the merger of IBM’s Web Services Flow Language and Microsoft’s XLANG, and its first official name was BPEL4WS 1.0. Later on BEA, SAP and Siebel joined the group and submitted BPEL4WS 1.1 to the standards group OASIS.
The current version of BPEL is called WS-BPEL 2.0, but it is usually referred to as just BPEL.

Footnote 6: The Incubator project was created in October of 2002 to provide an entry path to the Apache Software Foundation for projects and code bases wishing to become part of the Foundation’s efforts. Code donations from external organizations and existing external projects wishing to move to Apache will enter through the Incubator. http://incubator.apache.org

2.3.1 Definition

To better understand what BPEL means, I will first introduce some concepts before giving the actual definition.

1) Business Process: The necessary steps or activities to complete a business transaction. The activities can be performed either by applications or by humans. A business process is usually long-running and involves multiple parties, often external ones.
2) Programming in the large: According to Wikipedia [48], “Programming in the large can refer to programming code that represents the high-level state transition logic of a system. This logic encodes information such as when to wait for messages, when to send messages, when to compensate for failed non-ACID transactions, etc.”

BPEL is an XML-based programming language for defining and managing Web Service orchestrations or processes. It controls the overall sequence of invocations as well as the actual invocation of the collaborating services. To achieve this, BPEL provides language constructs like assignments, case logic, sequences, etc. So BPEL is good at executing a series of activities and interacting with multiple parties, but it has its limits. BPEL is not designed to let people participate as a service (like a manager who has to approve a contract), and it can’t handle complex business processes (like processes that can evolve or change during their long-running execution).
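To make the constructs mentioned above more concrete, the following is a minimal sketch in Python, not BPEL itself (real BPEL is XML and runs inside a BPEL engine). It only mirrors the control flow of a small process: a sequence of activities containing a service invocation, an assignment and case logic. The services check_credit, ship_order and reject_order, the order fields and the limit of 1000 are hypothetical and serve purely as illustration.

```python
from typing import Dict


# Hypothetical partner services. In BPEL these would be invocations of
# Web Services described by WSDL, not local Python functions.
def check_credit(order: Dict) -> Dict:
    return {"approved": order["amount"] <= 1000}


def ship_order(order: Dict) -> Dict:
    return {"status": "shipped", "order_id": order["id"]}


def reject_order(order: Dict) -> Dict:
    return {"status": "rejected", "order_id": order["id"]}


def run_order_process(order: Dict) -> Dict:
    """A sequence of activities: invoke, assign, then case logic."""
    variables: Dict = {}

    # Step 1 of the sequence: invoke a collaborating service.
    credit = check_credit(order)

    # Step 2: assignment of part of the reply to a process variable.
    variables["approved"] = credit["approved"]

    # Step 3: case logic decides which service is invoked next.
    if variables["approved"]:
        return ship_order(order)
    return reject_order(order)
```

Calling run_order_process({"id": 1, "amount": 500}) follows the approval branch, while an amount above the assumed limit follows the rejection branch. The orchestration function only decides the order of the calls; the actual work stays in the services.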
Oracle’s “Weaving Web Services Together” article [32] on their developer website identifies several questions, which also explain why you would need a language like BPEL:

1) How does one handle asynchronous Web service invocations where the called Web service is long-running and has to call back at a later time into the originating Web service?
2) How does one correlate requests across many in-flight business processes?
3) How does one invoke Web services in parallel rather than in sequence?
4) How does one undo a long-running transaction in which there has been a failure?
5) How does one compose larger business processes out of smaller business processes?
6) How does one guarantee reliable message delivery?

These are typical questions that arise when you are dealing with long-living processes. Several companies have tried to solve these issues with their own proprietary solutions, but that only resulted in vendor lock-in for the customers. BPEL, however, is an official standard and should be vendor-independent.

2.3.2 Implementations and status report

I will discuss some open source BPEL engines here, but these are often not directly visible: BPEL engines are usually integrated in an Enterprise Service Bus or in SOA frameworks.

2.3.2.1 OPEN SOURCE

1) Apache Agila BPEL (formerly known as Twister) [9]: The Twister project has been donated to the Apache Software Foundation and is now in incubation under the name Agila. The Agila wiki does not contain architectural documents at the moment (at least none accessible from the wiki), so it is not completely clear what it is capable of. However, a quick look at the user’s guide shows that Agila makes it possible to define user roles and let users participate in the business process. It is not clear whether it conforms to the new BPEL4People standard.
2) bexee - BPEL Execution Engine [3]: bexee is a research product by students of the Berner Fachhochschule in Switzerland. Although it is not really clear what it is capable of, it seems to be a basic BPEL engine. It is presented as an experimental sandbox to play with and test new technologies and approaches. While the enterprise value of this product is rather low, it is worth a closer look because it is extremely well documented (it is a school research project) and can therefore help me better understand some general concepts of BPEL engines.
3) Intalio PXE [18]: Intalio’s PXE (acquired from FiveSight) is a well-known BPEL engine that is used in Sun Java Studio, Apache ServiceMix, SymphonySoft Mule, etc. It supports both BPEL4WS 1.1 and WS-BPEL 2.0. As far as I understand, the BPEL code is compiled so that static checking and analysis can be done. PXE can run stand-alone or be embedded in another product. To conclude: PXE is quite a mature product that is also the basis for Intalio’s commercial BPEL engine.

2.3.2.2 CONCLUSION

There are several good and mature open source BPEL engines available, and you see that they are often acquired by commercial companies (like Intalio’s PXE) or open-source organizations (like Apache’s Agila). At the moment we can only hope that these commercial companies keep the software open-source. The school project bexee is interesting to me for the architectural information it can provide, but it is not suitable for deployment in a production environment. A good alternative for our framework could be Apache’s Agila, but our choice also depends on which Enterprise Service Bus we choose. For instance, Intalio’s PXE is part of Apache’s ServiceMix Enterprise Service Bus.

2.4 HUMAN BASED WEB SERVICE

Certain tasks require human judgment or expertise and cannot be done by a machine.
The participation of people in service compositions is quite a new aspect of Service-Oriented Architecture and essential for my project. IBM and SAP recently proposed an extension to the BPEL standard [37], called BPEL4People, to realize the concept of human based web services.

2.4.1 Definition

The ultimate goal of automating all business processes is not achievable at the moment. Certain activities require the involvement of humans, such as the approval of certain requests or exceptional situations that the system cannot handle. In our SOA system, a Human Based Web Service looks no different to its environment than a regular Web Service (footnote 7). So when our Human Based Web Service is called, it notifies the assigned person(s) and waits (usually asynchronously) for a response from the user. You can of course build in mechanisms to remind the user when it takes too long, or to re-assign the task to someone else. The big advantage of implementing an interaction with a user in this fashion is that it can be replaced by software. Let’s say that in an insurance system human approval is no longer needed. Then we can easily write a piece of software that checks certain parameters, without any change to the rest of the system.

Footnote 7: This makes sense if you remember that the implementation details of a Web Service are not important.

However, we may also have to deal with more complex situations where we need double-checking. Imagine that certain insurance policies for high-profile companies need approval from two different persons. This is not so obvious to represent in our SOA environment with Human Based Web Services. These problems are addressed in the upcoming standard by SAP and IBM called BPEL4People [37]. BPEL4People is an extension on top of BPEL and is right now not (yet?) part of WS-BPEL 2.0.
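The idea that a human approval and a software check can sit behind the same service interface, including the double-checking case, can be sketched as follows. This is only an illustration in Python under stated assumptions, not BPEL4People: the names ApprovalService, AutomatedApproval, HumanApproval and handle_claim are hypothetical, and the ask callback stands in for "notify the person and wait for the answer", which a real service would do asynchronously.

```python
from typing import Callable, Dict, List, Protocol


class ApprovalService(Protocol):
    """Interface the rest of the system sees (hypothetical, for illustration)."""

    def approve(self, claim: Dict) -> bool: ...


class AutomatedApproval:
    """Software replacement: approves claims up to a fixed amount."""

    def __init__(self, max_amount: int) -> None:
        self.max_amount = max_amount

    def approve(self, claim: Dict) -> bool:
        return claim["amount"] <= self.max_amount


class HumanApproval:
    """Human based service: asks `required` distinct persons for a decision."""

    def __init__(self, approvers: List[str],
                 ask: Callable[[str, Dict], bool], required: int = 1) -> None:
        distinct = list(dict.fromkeys(approvers))  # drop duplicates, keep order
        if required < 1 or required > len(distinct):
            raise ValueError("not enough distinct approvers")
        self.approvers = distinct[:required]
        self.ask = ask  # assumed callback: notify a person, return the answer

    def approve(self, claim: Dict) -> bool:
        # Every selected person must agree; required=2 is the double check.
        return all(self.ask(person, claim) for person in self.approvers)


def handle_claim(service: ApprovalService, claim: Dict) -> str:
    """Caller code: identical for human and automated approval."""
    return "paid" if service.approve(claim) else "denied"
```

Because handle_claim depends only on the approve operation, swapping HumanApproval for AutomatedApproval requires no change in the calling code, which is the advantage described above. Reminders and re-assignment are left out of the sketch.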
I could not find the formal specification of the BPEL4People extension, but a whitepaper describes four situations (patterns) where BPEL4People could be of use, so I will discuss these here. In the meantime, I have contacted some developers at IBM to ask whether they have a reference implementation or formal specification that I can use.

1) 4-Eyes principle: Also known as separation of duties. For security reasons, or perhaps just because of company policy, certain actions need the involvement of two or more persons (for instance the approval of a loan). It is even possible that the persons are not allowed to know who the other person is. This can be addressed in BPEL4People.
2) Escalation: You can put time limits on the execution of a task by a person. When these limits are exceeded, an escalation mechanism starts, which notifies certain people or reschedules the task.
3) Nominations: It is not always clear who should handle a certain task. The nomination mechanism allows a supervisor to manually assign a task to the best-suited person.
4) Chained execution: A task may require several steps to be executed, and some of these steps may be a side effect of an earlier step in the process and thus unpredictable. Instead of scheduling all these steps as new tasks in the notification list of the user, we chain them and present them as one task, for example in a wizard-like fashion.

2.4.2 Implementations and status report

2.4.2.1 OPEN SOURCE

None as far as I know.

2.4.2.2 CONCLUSION

The initiators of this extension have of course built it into their own commercial products, but there are no open-source implementations available. There is only one vendor, Intalio, that supplies a BPEL4People-compatible product, but that product is not open-source.
I emailed someone from IBM to get more information about an implementation or formal specification, and I will probably work on this during my thesis phase.

2.5 ENTERPRISE SERVICE BUS

The Enterprise Service Bus (hereafter abbreviated as ESB) unifies and connects services, applications and resources within an enterprise. It is important to note that the ESB is an architectural pattern that can be implemented in various ways, so it is not a concrete application.

2.5.1 Definition

The ESB architectural pattern is not a new invention of the SOA era. The idea of middleware that interconnects distributed applications, as in Distributed COM and CORBA, has existed for a long time and is often called Enterprise Application Integration (EAI). While previous approaches also integrate applications, coordinate resources and manipulate information, the ESB takes this to the next level. Now we can access all kinds of software (in the form of services), independent of the platform, programming language and programming paradigm (object-oriented, procedural, functional, etc.). Both the service providers and the service requestors operate independently, without really knowing the other’s origin. The responsibility for the actual delivery of the messages and for aspects like quality of service (QoS) lies with the ESB. The nice thing is that this ESB is completely transparent. Neither the developers nor the service providers and consumers need to know in advance that there will be an ESB as a transport layer. What is more, you can deploy an application that was never meant to run in an ESB environment. Schmidt et al.
define the essential characteristics of an Enterprise Service Bus in the IBM Technical Journal for SOA [36] as: “the meta-data that describes service requestors and providers, mediations and their operations on the information that flows between requestors and providers, and the discovery, routing and matchmaking that realizes a dynamic and autonomic SOA.”

1) Meta-data: Explicit declaration of the capabilities and requirements of interaction endpoints. This is stored in a service registry for easy discovery. The more information is provided, the easier the matchmaking will be. The matchmaking principle used is that the service requestor adapts to the service provider, but for this the requestor needs sufficient information about the requirements and capabilities of the provider.
2) Mediations: The role of a mediation is to satisfy the integration and operational requirements within the infrastructure. This can be conversion from one transport protocol to another, validation of content, caching, etc.
3) Routing: Messages can be (re-)routed based on rules and content.

Sometimes vendors denote the Enterprise Service Bus as Enterprise Application Integration middleware, as we can see in IBM’s simplified ESB architecture picture (Fig. 2.4).

2.5.2 Implementations and status report

The idea of an Enterprise Service Bus is not new, and therefore the ESB is one of the most mature components in the Service-Oriented Architecture. I will have a look at three open source ESBs that are viable candidates for my framework. The final decision will be taken during my thesis period.

2.5.2.1 OPEN SOURCE

1) ObjectWeb Celtix [31]: Celtix is IONA’s [41] open source Java ESB, hosted at the ObjectWeb community. You can consider it the lightweight, open source variant of IONA’s commercial ESB Artix [40]. Celtix is quite a young product, as you can tell from the feature list on their website.
They have basic support for HTTP 1.1 and SOAP 1.1, and the Java Message Service (based on ActiveMQ), WS-Addressing (to address web services and messages), policies and JAX-WS 2.0 (formerly known as JAX-RPC) are also supported. A closer look at their documentation shows that the goal of Celtix is to be fully JBI compliant. JBI is a specification that describes a pluggable integration container that is strongly focused on Web Services, built with WSDL and other WS technologies at its core (footnote 8). While the documentation is quite limited and vague on JBI support, I learned that they want to achieve two goals: to use Celtix as a JBI container that can host JBI-compliant components, and to let you expose Celtix components as JBI components and deploy them in other JBI containers. This allows you to build components with Celtix and then deploy them in another JBI environment.

Footnote 8: JBI, or Java Business Integration, is also known as JSR-208 and is therefore an official Java standard. [33]

2) SymphonySoft Mule [38]: SymphonySoft’s Mule positions itself as a lightweight messaging framework. The feature list includes message delivery, message transformations, dispatching messages, pooling and threading, exception handling, transaction management and lifecycle management. On the technology side it supports JBI, BPEL for orchestration and the other standard ESB technologies. One of the big advantages of Mule is its flexibility in usage: we can use it in a client/server, P2P or ESB environment. Event processing can be synchronous, asynchronous or request-response. Last but not least, there is support for content-based and rule-based routing of messages. Bottom line: this ESB is much more developed than ObjectWeb’s Celtix and offers most of the functionality that I have in mind for my framework.
3) Apache ServiceMix [10]: ServiceMix has been donated by LogicBlaze (an open source SOA software house) to the Apache Software Foundation and is completely built upon JSR-208, the Java Business Integration standard. Listing all the features of ServiceMix would take another chapter, so I will only sum up the most important ones. Basically, they thought about almost everything, from support for scripting languages (through the JSR-223 scripting standard) to caching, schema validation, security, notifications, etc. Next to the standard support for transport protocols like JMS (with ActiveMQ) and HTTP, it supports RSS, Jabber (the open source chat protocol), FTP, email, etc. The fact that this product is full-featured makes it harder to improve, but I noticed that there is, for instance, no support (yet?) for BPEL4People (the Human Based Web Service technology), and there will probably be some more shortcomings.

2.5.2.2 CONCLUSION

As I pointed out, the open source ESBs are very mature and some are even production-ready. At first sight SymphonySoft’s Mule and Apache’s ServiceMix are the most suitable for my framework. Martin LaMonica pointed out in an article on News.com [20] that ObjectWeb’s Celtix and Apache’s ServiceMix will cooperate and share code, and will presumably merge into one ESB in the future. Therefore it might be wise to go for ServiceMix, but I will take a closer look at this matter during my thesis period. In addition, the ESBs described are generic frameworks and may need some tweaking for my ECMS purpose. I still have to figure out how we are going to deal with documents, because documents are not “just” messages, especially unique documents that contain, for instance, specific Service-Level Agreements. Because ESB technology is quite mature, I will probably have more time to invest in the ECMS aspects instead of the underlying ESB technologies (though BPEL4People support is lacking, so that may be worth investigating).
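The three ESB characteristics from section 2.5.1 (a registry of providers, mediations, and rule- or content-based routing) can be illustrated with a toy in-process bus. This is a minimal sketch in Python under my own assumptions; it does not reflect the API of Celtix, Mule or ServiceMix, and the names MiniBus, normalize_type and the message fields are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Mediation = Callable[[Message], Message]
Rule = Tuple[Callable[[Message], bool], str]


class MiniBus:
    """Toy bus: registry lookup, mediation chain, content-based routing."""

    def __init__(self) -> None:
        self.registry: Dict[str, Callable[[Message], object]] = {}
        self.mediations: List[Mediation] = []
        self.rules: List[Rule] = []

    def register(self, name: str, provider: Callable[[Message], object]) -> None:
        # Stand-in for the service registry; real meta-data would be richer.
        self.registry[name] = provider

    def add_mediation(self, mediation: Mediation) -> None:
        self.mediations.append(mediation)

    def add_rule(self, predicate: Callable[[Message], bool], target: str) -> None:
        self.rules.append((predicate, target))

    def send(self, message: Message) -> object:
        # Mediations run first (validation, conversion, ...).
        for mediate in self.mediations:
            message = mediate(message)
        # The first matching rule decides which provider gets the message.
        for predicate, target in self.rules:
            if predicate(message):
                return self.registry[target](message)
        raise LookupError("no route for message")


def normalize_type(message: Message) -> Message:
    """Example mediation: return a copy with a lower-cased 'type' field."""
    out = dict(message)
    out["type"] = str(out.get("type", "")).lower()
    return out
```

A requestor only calls send and never addresses a provider directly; which provider handles a document is decided by the rules on the bus. For the ECMS case, a rule on a field such as the document type would be one conceivable way to treat contracts differently from ordinary messages.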
2.6 SERVICE COMPONENT ARCHITECTURE

Service Component Architecture (SCA) is a specification that describes a model for building applications and systems using a Service-Oriented Architecture. It is an industry-driven effort backed by IBM, BEA, Interface21, IONA, Oracle, SAP, Siebel and Sybase [39].

2.6.1 Definition

In November 2005, a consortium of vendors (footnote 9) released a whitepaper [39] in which they explain their view on what a SOA framework should look like. Because a Service Component Architecture (SCA) is just one interpretation of what a SOA could look like, it sticks to the same principles that I described earlier. It is therefore more interesting to look at where it differs from the more “traditional” approach of building SOA frameworks:

1) Integrated system: In contrast to the more traditional approach, SCA tries to offer a single package, so the functionality of, for instance, an Enterprise Service Bus and a service orchestration engine is included.
2) BPEL as first-class citizen: A BPEL process for service orchestration can be used as a service inside an SCA.
3) Services != Web Services: Besides a Web Service, a service within an SCA can also be a messaging system (like JMS), CORBA IIOP, etc. This is achieved with bindings.
4) Infrastructure capabilities: Infrastructure capabilities like security and transactions are handled by policies and are completely independent of the implementation code of the services.

2.6.2 Implementations and status report

2.6.2.1 OPEN SOURCE

At the moment there is only one project that tries to implement the Service Component Architecture specification, and that is Apache’s Tuscany. The project is still in early beta and under heavy development. The members of the consortium behind SCA developed the original code, and the current Apache Tuscany project is under incubation.
Because Tuscany will be the reference implementation for SCA, everything we know about SCA will be implemented in it, so there is no point in repeating the features here.

Footnote 9: BEA, IBM, Interface21, IONA, Oracle, SAP, Siebel and Sybase

2.6.2.2 CONCLUSION

I have been monitoring the Apache Software Foundation’s mailing list for Tuscany, and the project is still in early beta and not at all production-ready. Tuscany appears to be one of the only available implementations of SCA, and so far nothing noteworthy has been built with it that would verify the maturity of the specification and the platform. I have seen several emails about re-architecting several core components, so I can’t say that Tuscany is stable yet. There is, for instance, still a lot of work to do on the bindings with other languages. The team working on it seems very enthusiastic, and more people join the development every day, so eventually Tuscany should become a production-ready reference implementation for SCA.

2.6.3 Considerations

This industry-pushed approach to SOA has one disadvantage right now: it is not a standard. It is backed by a large consortium of companies, but does not get support from Sun Microsystems. The latter believes in an Enterprise Service Bus that sticks to the Java Business Integration standard described earlier. Therefore the SCA specification is not ratified by the Java Community Process. It is not only Sun Microsystems that doesn’t believe in SCA; Microsoft does not support it either, perhaps because it is a threat to their Windows Communication Foundation. JBOSS does not support it either. In a recent article [27], Marc Fleury (JBOSS’ CEO) said that he is worried that the big vendors want to dominate the SOA market, bypass standards organizations and set their own standards to kill commercial open source companies like JBOSS. But will this keep SCA from becoming an (un)official standard?
If such a large vendor group backs it, it may well become a standard one day. History has shown that technologies often become de facto standards due to their popularity. Although SCA is heavily pushed by, for instance, IBM, its position feels a bit ambiguous: IBM has its complete WebSphere platform with a BPEL engine, an Enterprise Service Bus and the whole SOA stack, and it also puts money and effort into SCA. Of course it is wise to bet on several horses, but perhaps SCA can be seen as the “next-generation” SOA? SCA is certainly very promising. Instead of wondering which ESB, BPEL engine and other infrastructure software you are going to use, you just go for an SCA framework. But as long as SCA is not a standard, there is a danger of vendor lock-in. It is nevertheless likely that this integrated aspect will attract lots of implementers of SOA-enabled systems. It is interesting for my project to look at this approach and decide whether I will go for the traditional SOA or the SCA way. Both have their advantages and disadvantages. I will make a decision based on a more thorough comparison study during my thesis project. That decision will also depend on whether we allow non-standard technologies and products.

Figure 2.1. AIIM’s ECM architecture
Figure 2.2. A simplified SOA architecture
Figure 2.3. IBM SOA software stack
Figure 2.4. Simplified ESB Architecture.

CHAPTER 3
THESIS PROJECT

This third chapter of my thesis proposal answers questions like the why, the how and the when of my thesis. It goes deeper into the research questions, my own contribution, the approach taken, the project deliverables and the thesis planning.

3.1 RESEARCH QUESTIONS

1) SOA-enabled ECMS: The initiator of the thesis project is the Content and Knowledge Engineering group. Their interest is the feasibility of an Enterprise Content Management System based on the concepts of a Service-Oriented Architecture.
Their main interest lies of course in the field of information science, and they focus on questions like: how do we deal with the document flow in a SOA?
2) SOA status report: From a software technology point of view, I am interested in how a SOA works. Because most of the technologies and implementations are in their infancy (a lot of them are still in alpha or beta phase), I will make a status report of the standards, technologies and implementations in order to identify the weak spots. The next step would be to come up with ideas and solutions to fill these “gaps”. It would be nice to see some of my work actually incorporated in standards, technologies or implementations.
3) Future of SOA: Service-Oriented Architecture is one of those technologies that are heavily overhyped. While it is dangerous to make predictions, we can look at history and draw conclusions from it. I want to make a comparison with earlier distributed technologies, for instance CORBA, and look at aspects like the rate of market adoption, the install base, the impact on the IT industry, etc. Based on these data, we can probably draw some conclusions and try to predict a trend for the future.

3.2 OWN CONTRIBUTION

Here I will discuss my own efforts in the area of Service-Oriented Architecture. While one of the goals will be to deliver a SOA framework for an Enterprise Content Management System, a lot of effort will be spent on giving knowledge back to the SOA community. This can be ideas, but also source code, help in projects, testing, etc. The only problem may be that it is hard to keep track of all these efforts.

3.2.1 Participate in open source projects

SOA technologies are industry-driven, and right now the big vendors each have their own implementation of a SOA. Some of the technologies are already standardized, like the BPEL Web Service orchestration standard.
However, there is no real reference implementation yet, and the tools provided by the open source community are still in their infancy. Noteworthy projects include (but are not limited to) the following:

1) Apache Tuscany: Open source implementation of a Service Component Architecture framework. Hosted by the Apache Software Foundation, but still in incubation. A lot of work can be done on bindings with other technologies and languages. Because of the “immature” nature of this product, there is room for suggestions to make things better.
2) Apache Axis2-C: Axis is a Web Service server that can be embedded in other application servers (like JBOSS). Axis2 is still in beta, and work needs to be done on the bindings with other languages.
3) Apache Enterprise Service Bus: The code base of this project has been donated by commercial companies and so needs work to be polished and integrated with the other Apache projects.
4) Own open-source BPEL4People implementation (with formal specifications).
5) Other projects will follow.

3.2.2 SOA - ECMS Framework

I will incorporate my ideas and research findings in a framework. This framework can be built upon in future projects. Perhaps I can also donate some code to open source projects, but I should take a closer look at the open source licenses for that. Perhaps the project can be hosted at SourceForge or some other project server. Of course it can also be hosted on the Software Technology Group’s servers.

3.2.3 Master thesis

The results of my research will be documented and bundled in a final work. The added value of this work will include, but is not limited to, a status report of SOA technologies and implementations, the feasibility of a SOA, the added value of a SOA, the feasibility of a SOA-enabled ECMS, and others.
A comparative study of SOA technologies is important because several technologies overlap, which results in multiple approaches using different technologies. Moreover, a status report of the technologies and projects will give insight into the maturity of SOA, and it will also be the basis for formulating improvements and solutions for incomplete technologies. I realize that a master thesis is not the perfect way to give ideas and knowledge back to the SOA community; therefore, intermediate results and ideas that can stand on their own will be published as articles or papers.

3.3 APPROACH

3.3.1 Maintain a public blog

I will put ideas and intermediate results on my public blog. I hope that some of the discussions there will contribute to my thesis. It is available at: http://blogs.webcoder.be/lee

3.3.2 Participate in discussions

Several web sites have forums and discussion corners to exchange and discuss ideas. Participating in these will be a perfect way to verify my ideas and point of view. Some web sites with discussion places for SOA-related technologies:

1) IBM developerWorks articles: http://www-128.ibm.com/developerworks/webservices
2) IBM developerWorks blogs: http://www-128.ibm.com/developerworks/blogs/index.jspa
3) Apache Tuscany mailing list: [email protected]
4) Apache Axis2-C mailing list: [email protected]
5) ZDNet’s Blogs: http://blogs.zdnet.com
6) Several websites will follow

3.3.3 Share intermediate results

I will try to share intermediate results as soon as possible, for instance on my blog. I will also try to submit an article to websites like The Register [35], ZDNet [54] and others. Although it is hard to contribute to a site like IBM’s developerWorks, I will try to submit some articles there as well.

3.4 PROJECT DELIVERABLES

It is obvious that my deliverables will be my thesis document and a prototype of the SOA-enabled ECMS framework, but my supervisors advised me to prioritize the subprojects of my thesis project. Therefore I introduce three categories: “must have”, “great to have” and “nice to have”¹. Each subproject also has an indication of its estimated workload. In combination with the time slots that are defined in the next section, we can roughly estimate the time needed for the project. Considering that each time slot can hold two units of workload, I can schedule one subproject of two units in such a time slot, or two one-unit subprojects. Of course, this is just an estimate and not an exact science. When work advances more quickly than expected, I can reschedule and use three units of work per time slot.

While it is not explicitly mentioned, aspects like contributions to technologies (such as an enhancement proposal for BPEL4People) are part of the proposed subprojects. This is because you only detect shortcomings when you are deeply involved with a certain technology or project, so they cannot be identified beforehand. Other activities, like participating in discussions and blogs, are also not explicitly mentioned. Together with the thesis document, I’ll hand in a brief overview of the “side-work” I’ve done, probably with statistics about the amount of time spent on each part of the thesis project.

3.4.1 “Must have” (Priority 1)

1) Thesis document
2) Case study Elsevier: Work out the processes and workflow of the paper submission and handling system at Elsevier (suggestion of prof. van den Berg). This will make the project more concrete and will actually verify whether the results of my research and my prototype “work”.
In this particular case we have to think about: participants (authors, reviewers, publishers, ...), the workflow of a document, notifications, editing of documents, approval by participants, annotations, how to deal with multiple authors, how to deal with changes to documents (change the document? add metadata? ...), etc. (Estimated workload: 1)
3) Generalize case study into a framework: While the Elsevier case study makes our project more concrete and easier to understand, we still need to transform the case study into a “SOA version” and eventually generalize it, to make it suitable for all kinds of ECM applications. (Estimated workload: 1)
4) Security aspect: Look at aspects like authentication. Is a user authorized to view certain parts of documents? How do we deal with issues like non-repudiation, prevention of message tampering, etc.? Will a traditional (single sign-on) authentication suffice, or do we need other variants like a rule-based system? Can we trust a service or build a trust relation with it? (Estimated workload: 1)

¹ Priority: “must have” > “great to have” > “nice to have”

5) Technical architecture: We know what a SOA looks like and what kind of components we need. Now we have to decide which approach to take. Will we use the more standards-based approach with BPEL, ESB and WS, or shall we go for the Service Component Architecture pushed by IBM and BEA? In this phase we choose the approach and the concrete applications, frameworks and/or technologies. The resulting framework should consist of open source code that can be freely redistributed. (Estimated workload: 1)
6) User interface: How do we present the data to the end user, especially with respect to the involvement of users in human-based web services? An often-used approach in this Internet era is a web interface that makes use of portal technologies. (Estimated workload: 1)
7) Business Process Management: This is one of the more challenging parts of my thesis project.
The area of BPM is very cluttered, and most projects (especially open-source ones) are immature and still in development. There are also several ways to achieve BPM: with or without a rule engine, with or without an ESB, which version of BPEL to use, how to integrate BPEL4People, etc. So the first issue will be to determine what we want and to come up with an open-source solution. Almost certainly, I will have to work on the projects themselves to get everything working and polished. (Estimated workload: 3)
8) Internal document format: In our ECMS, we often have to deal with documents. This subproject looks at how we can represent document data inside our system. A possible candidate is the new OpenDocument format [30] that is used in the new OpenOffice office suite. However, our ECMS should be able to handle other data too. How are we going to deal with documents written in a proprietary format (like Microsoft’s Word), and how are we going to deal with binary data like images, PDF, video, etc.? (Estimated workload: 1)
9) SOA monitoring and administration: Ideally we want to monitor our SOA environment. Are our services up and running? How much load do they have? Which processes are active or pending? In the case of our ECMS: where are our documents? Can we replace services at runtime, and how do we deal with running processes that are using these services? As part of my project, I can design and develop an open source SOA/BPEL monitoring tool. It can be web-based, but also a fat client. Right now only the big vendors offer commercial packages, so such a tool would be useful. The need for good SOA governance has also been identified recently by Colleen Frye of SearchWebservices.com [11]. (Estimated workload: 2)

3.4.2 “Great to have” (Priority 2)

1) SOA market study: What is the adoption of SOA technology now, and what is the prediction for the (near) future? (This will probably be based on research by companies like Gartner.)
(Estimated workload: 0.5)
2) Comparison of the evolution of SOA adoption and success with older comparable technologies like CORBA, RMI, DCOM, etc. (Estimated workload: 0.5)
3) Service state: Is it possible for these services to have some kind of “state”? (Although this is a bit contradictory to the SOA idea, it can sometimes be useful.) (Estimated workload: 0.5)
4) Atomic operations: How can we deal with atomic operations (transactions)? (Estimated workload: 0.5)
5) Predicting failure of long-running BPEL processes: One of the problems with long-running BPEL processes is that they can fail somewhere in the middle. Imagine that we are working with the latest WS-BPEL 2.0 standard with the added BPEL4People extension. It is then possible that someone forgot to assign a person to a certain task, which halts the process. If we set a time-out of 3 days before our process reports a failure, we lose a lot of time and resources. I wonder whether we could have some system (perhaps an extension to BPEL) that can somehow predict the failure or success of a long-running process before the process starts. Right now I don’t know how this could be done, and as far as I know it is not part of the standard, so this would be worth investigating. (Estimated workload: 1.5)

3.4.3 “Nice to have” (Priority 3)

1) Semantically-Empowered Service-Oriented Architecture (SESA): Use Semantic Web technologies to make service discovery and negotiation more dynamic. This will be based on a paper that I’ve written for the Semantic Web course on the fusion of the Semantic Web and Service-Oriented Architecture [34]. (Estimated workload: 3)
2) Service-enable SAP: Take a look at the product stack of a big ERP vendor like SAP and see how we can re-architect its product/technology stack according to the ideas of SOA. We can also ask whether my proposed framework would suit this purpose.
(Estimated workload: 0.5)
3) P2P: These ECM systems will probably have some kind of central system that keeps track of the documents, notifications, changes, etc. However, would it be feasible to have a P2P-based system? How do we then deal with versions? How do we then deal with notifications? Is every client then subscribed to all the other clients that have documents, so that every client is also some kind of service, or ...? (Estimated workload: 1)

3.5 TIME PLANNING

While it is always difficult to draw up a strict plan, it is useful to keep an eye on the progress with respect to the finish date. I’ll assume that there will be a meeting with my thesis supervisors every two weeks to discuss the work and specific issues. Perhaps it would be useful to arrange a fixed meeting moment for period 3 and period 4.

1) 13/02/06 - 26/02/06: Unofficial start of my thesis project. This period will mainly be reserved for literature study. One of the priorities will be, for instance, to read the IBM Systems Journal on SOA [17], a reviewed, 250-page collection of top-quality research papers on Service-Oriented Architecture. Of course, the literature study is not limited to this journal. I still have a whole list of articles that I need to read, but due to time constraints I’ve only marked them as “to be read” until now. I will also use this period to set up my server and development environment and to think about the practical issues and technical details.
2) 27/02/06 - 12/03/06: Part 1: Case study Elsevier; Generalize case study into a framework
3) 13/03/06 - 26/03/06: Part 2: Technical architecture; Business Process Management
4) 27/03/06 - 09/04/06: Part 3: Business Process Management
5) 10/04/06 - 23/04/06: Part 4: SOA monitoring and administration
6) 24/04/06 - 07/05/06: Part 5: Internal document format; Security
7) 08/05/06 - 21/05/06: Part 6: User interface; Predicting failure
8) 22/05/06 - 04/06/06: Part 7: As advised by my thesis supervisors, I will reserve this time slot as a buffer. I can either use it to catch up or, in case I have time left, tackle the “great to have” and “nice to have” subprojects.
9) 05/06/06 - 18/06/06: Part 8: As advised by my thesis supervisors, I will reserve this time slot as a buffer. I can either use it to catch up or, in case I have time left, tackle the “great to have” and “nice to have” subprojects.
10) 19/06/06 - 29/06/06: Finalization. This should be the period in which my thesis project is under review by the thesis committee and in which I can do my thesis defense.

3.6 CONCLUSION

During the literature study I noticed that there is room for an open source SOA-enabled ECM system. There are several open source projects that call themselves “enterprise”, but they are merely Web Content Management Systems with some advanced features. As professor van den Berg pointed out during the first meeting, there is not yet a “real” SOA-enabled ECM system available. Some companies promote their product as “SOA-ready” or expose several functions as services, but none has built an ECM system from the ground up with Service-Oriented Architecture in mind. A possible direction for this project is to use as many existing components as possible (for instance from the Apache Software Foundation) and try to release the result in the future as an ECM system.
Although a lot of good SOA software already exists, the young nature of the field of Service-Oriented Architecture leaves plenty of opportunities to improve existing standards and products. I realize that I can’t deliver a full-featured, production-ready Enterprise Content Management System in my limited thesis period, but my work can serve as a starting point for future collaborations between the Software Technology group and the Content and Knowledge Engineering group. Students from both groups could, for instance, extend and improve the product.

BIBLIOGRAPHY

[1] AIIM. Aiim - the enterprise content management association. URL: http://www.aiim.org/.
[2] Six Apart. Movable type. URL: http://www.sixapart.com/movabletype/.
[3] bexee. bexee - bpel execution engine. URL: http://bexee.sourceforge.net/.
[4] Zope Corporation. Zope. URL: http://www.zope.com/.
[5] Bram Dons. Opslagnetwerken: DAS, SAN en NAS. Academic Service, 2003. ISBN 90 395 2142 5.
[6] EMC. Emc documentum. URL: http://www.documentum.com/.
[7] eZ systems. ez systems. URL: http://ez.no/.
[8] Dieter Fensel. Semantically-empowered service-oriented architecture (sesa). Technical report, DERI International, 2005. URL: http://www.cs.umd.edu/users/hendler/2005/Fensel.pdf.
[9] Apache Software Foundation. Apache agila bpel. URL: http://wiki.apache.org/agila/.
[10] Apache Software Foundation. Apache servicemix. URL: http://servicemix.org.
[11] Colleen Frye. Soa governance. URL: http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci1164393,00.html.
[12] Dana Gardner. Has microsoft missed the soa bus? URL: http://blogs.zdnet.com/BTL/?p=1879.
[13] Hewlett-Packard. Adaptive enterprise. URL: http://h71028.www7.hp.com/enterprise/cache/6842-0-0-225-121.html.
[14] Hewlett-Packard. Hp press release: Hp expands soa services, opens center to help customers enhance business performance. URL: http://www.hp.com/hpinfo/newsroom/press/2005/050628c.html.
[15] Hewlett-Packard.
Adaptive enterprise: Business and it synchronized to capitalize on change. Technical report, Hewlett-Packard, 2005. URL: http://h71028.www7.hp.com/enterprise/downloads/Adaptive%20Enterprise%20Overview%20WP_4AA0-0760ENW,Rev%201.pdf.
[16] Hummingbird. Hummingbird. URL: http://www.hummingbird.com/.
[17] IBM. Ibm systems journal: Service-oriented architecture. URL: http://www.research.ibm.com/journal/sj44-4.html.
[18] Intalio. Intalio pxe. URL: http://pxe.fivesight.com/wiki/display/PXE/Home.
[19] Miro International. Mambo. URL: http://www.mamboserver.com/.
[20] Martin LaMonica. Open source projects intertwine for integration. URL: http://news.com.com/Open-source+projects+intertwine+for+integration/2100-7344_3-5844789.html?tag=nef.
[21] LogicBlaze. Servicemix. URL: http://servicemix.codehaus.org/.
[22] International Business Machines. New to soa and web services. URL: http://www-128.ibm.com/developerworks/webservices/newto/#7.
[23] International Business Machines. On-demand business. URL: http://www-306.ibm.com/e-business/ondemand/us/index.html.
[24] International Business Machines. Ws-bpel for people. URL: http://www-128.ibm.com/developerworks/webservices/library/specification/ws-bpel4people/.
[25] Joe McKendrick. Controlling one’s own fate, through soa. URL: http://blogs.zdnet.com/service-oriented/?p=524&part=rss&tag=feed&subj=zdblog.
[26] Joe McKendrick. Is soa a 53 billion dollar secret? URL: http://blogs.zdnet.com/service-oriented/?p=520&part=rss&tag=feed&subj=zdblog.
[27] Michael Meehan. Jboss’ marc fleury on soa standards, java and paranoia, part 2. URL: http://searchwebservices.techtarget.com/qna/0,289202,sid26_gci1155951,00.html.
[28] Microsoft. .net architecture center: Service-oriented architecture. URL: http://msdn.microsoft.com/architecture/soa/default.aspx.
[29] ObjectWeb Open Source Middleware. Celtix: Open source java enterprise service bus. URL: http://celtix.objectweb.org/.
[30] OASIS.
Open document format for office applications (opendocument) v1.0. URL: http://www.oasis-open.org/committees/download.php/16053/OpenDocument-v1.0-os_tagged.pdf.
[31] ObjectWeb. Objectweb celtix. URL: http://celtix.objectweb.org/.
[32] Oracle. Weaving web services together. URL: http://www.oracle.com/technology/oramag/oracle/04-jul/o44dev_web.html.
[33] Java Community Process. Jsr 208: Java business integration. URL: http://www.jcp.org/en/jsr/detail?id=208.
[34] Lee Provoost and Erwan Bornier. Service-oriented architecture and the semantic web: a killer combination? URL: http://lee.webcoder.be/papers/sesa.pdf, February 2006.
[35] The Register. The register. URL: http://www.theregister.co.uk.
[36] M.-T. Schmidt, B. Hutchison, P. Lambros, and R. Phippen. Elements of the service-oriented-architecture infrastructure, volume 44, chapter The Enterprise Service Bus: Making service-oriented architecture real, page 781. IBM, 2005. URL: http://www.research.ibm.com/journal/sj/444/schmidt.pdf.
[37] SDN. Ws-bpel extension for people. Technical report, SAP Developer Network, 2005. URL: https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/library/uuid/cfab6fdd-0501-0010-bc82-f5c2414080ed.
[38] SymphonySoft. Symphonysoft mule. URL: http://mule.codehaus.org/.
[39] BEA Systems, IBM, Interface21, IONA, Oracle, SAP, Siebel, and Sybase. Service component architecture. URL: http://www.iona.com/devcenter/sca/SCA_White_Paper1_09.pdf.
[40] IONA Technologies. Iona artix. URL: http://www.iona.com/products/artix/welcome.htm.
[41] IONA Technologies. Iona technologies. URL: http://www.iona.com/.
[42] Fourth Edition The American Heritage Dictionary of the English Language. Wysiwyg. URL: http://dictionary.reference.com/search?q=wysiwyg.
[43] One Hundred Seventh Congress Of the United States of America. Sarbanes-oxley act of 2002. URL: http://files.findlaw.com/news.findlaw.com/hdocs/docs/gwbush/sarbanesoxley072302.pdf.
[44] Stefan Tilkov. Choreography vs.
orchestration. URL: http://www.innoq.com/blog/st/2005/02/16/choreography_vs_orchestration.html.
[45] Tommy. Alfresco releases open source enterprise content management solution. URL: http://www.linuxelectrons.com/article.php/20051110092348936.
[46] CMS Wiki. History of cms. URL: http://72.14.207.104/search?q=cache:VZ7jW_gtOTIJ:www.cmswiki.com/tiki-index.php%3Fpage%3DHistoryOfCMS+history+cms&hl=en&lr=&client=safari&strip=1.
[47] Wikipedia. Document management system. URL: http://en.wikipedia.org/wiki/Document_management_system.
[48] Wikipedia. Bpel. URL: http://en.wikipedia.org/wiki/BPEL.
[49] Wikipedia. Content management system. URL: http://en.wikipedia.org/wiki/Content_management_system.
[50] Wikipedia. Enterprise content management system. URL: http://en.wikipedia.org/wiki/Enterprise_content_management_System.
[51] Wikipedia. Service-oriented architecture. URL: http://en.wikipedia.org/wiki/Service-oriented_architecture.
[52] Wikipedia. Wikipedia. URL: http://www.wikipedia.org.
[53] WordPress. Wordpress. URL: http://wordpress.org/.
[54] ZDNet. Zdnet. URL: http://www.zdnet.com/.