EDMS - Open Source Solutions

Open Source Solutions for Document Management Systems
Thomas Choppy – ECM consultant
Patrick Nerden – Solutions consultant
[1] PREAMBLE
[1.1] Smile
Smile is a company comprised of engineers specialising in the implementation of
open source solutions and the integration of systems based on open source solutions.
Smile is a member of APRIL, an association focused on the promotion and protection
of free software.
With over 290 employees in France, and 320 throughout the world (September
2009), Smile is the leading French Open Source solution company.
Since around the year 2000, Smile has been actively monitoring the technology
market, allowing us to identify, test and assess the most promising open source
solutions. We can then present our clients with the strongest, most sustainable,
most efficient products available.
This approach has given rise to a whole range of white papers covering various
application sectors: content management (2004); portals (2005); business
intelligence (2006); PHP frameworks (2007); virtualisation (2007); digital document
management (2008); and ERPs (2008). Among the works published in 2009, the
“Open Source VPNs” and “Open Source flow controls and Firewalls” articles, within
the “Systems and Infrastructures” collection, are also of interest.
Each of these works offers a selection of the best open source solutions in the
relevant domain, their respective qualities, and feedback on operational use.
As stable open source solutions slowly gain ground in new sectors, Smile will be
present to offer customers the benefit of these solutions risk free. Smile appears in
the French I.T. market as the integration service provider of choice, to assist major
companies in adopting the best open source solutions.
Smile has also developed a range of service offers over the last few years. A
consultancy department has assisted our clients since 2005 through pre-project
phases, solution research, and project support. In 2000, Smile created a graphics
studio which in 2007 became known as The Interactive Media Agency. This agency
offers not only graphic design services, but also e-marketing, editorial, and rich
interface expertise. Smile also has an agency specialising in Third-party Application
Maintenance, application support and application processing. Smile offices can be
found in Paris, Lyon, Nantes, Bordeaux and Montpellier, with a presence in Spain,
Switzerland, Ukraine and Morocco.
© Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden
[1.1.1] Some Smile references
a) Web sites
Laboratoires Boiron, Foncia, Crédit Coopératif, EMI Music, Salon de l’Agriculture,
Mazars, Areva, Société Générale, Gîtes de France, Patrice Pichet, Groupama, EcoEmballage, CFnews, CEA, Prisma Pub, Véolia, NRJ, JCDecaux, Larousse, 01
Informatique, Spie, PSA, Boiron, Dassault-Systèmes, Action Contre la Faim, BNP
Paribas, Air Pays de Loire, Forum des Images, IFP, BHV, ZeMedical, Gallimard,
Cheval Mag, Afssaps, CNIL…
b) Portals and Intranets
Eurosport, HEC, Bouygues Telecom, Prisma, Veolia, Arjowiggins, INA, Primagaz,
Croix Rouge, Invivo, Faceo, Château de Versailles, Ipsos, VSC Technologies, Sanef,
Explorimmo, Bureau Veritas, Région Centre, Dassault Systèmes, Fondation
d’Auteuil, Korian, PagesJaunes Annonces, Primagaz…
c) Electronic Document Management and ECM
Agefiph, Primagaz, UCFF, Apave, Géoservices, Renault F1 Team, INRIA, CIDJ,
SNCD, Ecureuil Gestion, CS informatique, Serimax, Véolia Propreté, NetasQ,
Corep, Packetis, Alstom Power Services, Mazars…
d) E-business
Furet du Nord, Camif Collectivité, La Halle, De Dietrich, Adenclassifieds, Macif,
Gîtes de France, GPdis, Longchamp, Projectif, ETS, Bain & Spa, Yves Rocher,
Bouygues Immobilier, Nestlé, Stanhome, AVF Périmédical, CCI, Pompiers de
France, Commissariat à l’Energie Atomique…
e) Business Intelligence and ERP
Lafarge, Groupe Accueil, Anevia, Projectif, Xinek, Companeo, Advans, Point P,
Mindscape, Loyalty Experts, Cecim, Espace Loggia, Nouvelles Frontières, France24,
La Poste, HomeCineSolutions, Vocatis, Skyrock, France Domicile, Polyexpert,
Cadremploi, Cmonjob, Meilleurmobile.com…
f) Infrastructure and Hosting
Kantar, Pierre Audoin Consultants, Rexel, Motor Presse, OSEO, Sport24, SETRAG,
Canal-U, Institut Mutualiste Montsouris, ETS, Ionis, Osmoz, SIDEL, Atel Hotels,
Cadremploi, Institut Français du Pétrole, Mutualité Française…
[1.2] This white paper
This document aims to present our approach to document management in relation to
content management, and to assist you in the selection of a software solution for
your projects.
With this in mind, we have analysed the responses of a range of open source
solutions to concrete problems, and created a methodological approach to
assist you with the implementation of your project.
This is neither a solution directory nor a theoretical approach to the management of
document content, but rather a report on the reality of needs and of the Open Source
EDM (Electronic Document Management) market.
While Open Source provides excellent solutions to a number of needs, specific
document management know-how remains, ahead of tools, the main challenge of EDM or
ECM (Enterprise Content Management) projects.
As with previous White Papers published by Smile, this work attempts to draw
together:
•
A general outline of the underlying document management notions,
upon which EDM project methodology is based.
•
A description of the main functions required by this type of project and
the issues that arise with these functions.
•
An inventory of the main open source content management solutions
•
A presentation of the best tools, an assessment of their strengths,
limitations, and maturity.
This White Paper is the fruit of collective feedback. We welcome any comments or
thoughts you have on the subject.
[1.3] Version 1.0
Version 1.0 appeared shortly after Smile first took an interest in the Open Source
EDM domain and in particular with the arrival of a team specifically dedicated to
this area.
We wanted to present document management both in terms of structure and
operations in order to allow the reader to form their own opinion on the subject and
equip them with the knowledge to successfully implement their project.
[1.4] Version 2.0
This update, compiled between October 2009 and January 2010, contains many
important changes both in terms of layout and content.
We have updated the section on developments in the Open Source EDM solutions
market together with the methodological section which was updated based on
feedback we received.
Key additions concern mainly:
•
SaaS - most Open Source solution providers now include a SaaS
(Software as a Service) solution version.
•
Important improvements in the integration of applications with
workstations, particularly Microsoft Office suite.
•
Function updates for each software solution in relation to their
developments, and their roadmap.
•
How the solutions have evolved and rate in relation to each other:
eXo DMS has now been added to the initial panel and Freedom ECM
has been removed from the comparative study
[1.5] Summary
Document management is designed to provide a secure, traceable, organised,
collaborative mode of management for company documents.
In choosing an Open Source solution, one benefits from a quality product, shaped by
the best responses to real user needs, at minimal cost.
EDM and ECM solutions differ mainly in terms of the digital objects and functions
that they deal with. EDM is limited to digital documents, often office documents,
while ECM encompasses all of the company’s digital content, notably images, site
content, and documents sent by company management systems.
Open Source reached maturity in this area in the mid-2000s and now offers stable
solutions of a very high functional and technical standard, ever better adapted
to market needs.
[1.6] Open Source at company level
[1.6.1] Non-Open Source document management
A number of solutions exist, each providing its own range of functions and responses to
known document management issues.
EMC’s Documentum, IBM’s FileNet, Microsoft’s SharePoint, Open Text’s solution of
the same name, Interwoven (acquired by Autonomy) and the Ever Group’s Ever
Team are among the best known in the non-Open Source domain. These are
generally high-end solutions that have been on the content management market for
over a decade and which have, over time, gradually integrated a great number of
functions, well beyond document content management. They are relatively heavy,
ageing solutions, adapted to the complex problems that very large companies face.
The market of proprietary document management solutions has followed the same
trend as that of Web content management tools, some years ago. That is to say, the
solutions which have survived are in niche markets and incorporate important
business savoir-faire, or are very high-end and expensive with a reputation which
continues to bring them clients.
One must understand that a vendor with commercial objectives does not have
only the client’s interests at heart. While vendors do operate in a competitive
market, and their product must be better than that of the competition, once their
position is well established, a vendor may come to the conclusion that:
•
their product must be open, but not too much so, in order to maintain
control over the client (vendor lock-in)
•
their product must be highly capable, but not overly so, as a rise in the
number of servers results in an increase in the number of licences sold
•
their product must be solid, but complex, to ensure the continued sale
of services such as support
•
their product must be useful, but more importantly it must be
attractive. In actual fact decision-makers are often removed from the
realities of the field and a well-packaged product, which may be limited
from a development point-of-view or may not meet specific operational
needs, may well be selected as the best option, even though it may be of
very little real benefit to the company.
We are not suggesting that proprietary vendors are calculating to the point where
they would make their products worse than they could be, but that their strategic
priority is not necessarily based on this criterion.
The “2009: Magic Quadrant for Enterprise Content Management 1” analysis refers to
several important trends in relation to ECM, including:
•
The continued growth of the content management market, while a
number of I.T. markets are contracting.
•
Furthermore, content management is perceived more and more
frequently as a major asset to a company. As a result, the
solutions which manage a company’s digital assets must be better integrated
into its information system, better controlled by internal teams and
cost less to develop and maintain.
These are the very strengths of Open Source solutions.
[1.6.2] The decision to use Open Source solutions
Each year, Open Source gains ground at remarkable speed in new areas of application.
New players have emerged in the form of Open Source vendors, and the
soundness of their business model has been proven. The solutions are more and
more mature, and are now real alternatives to long-established proprietary solutions.
Let’s have a look at the selection criteria that relate to the open source nature of
these solutions.
One of the main reasons for choosing to use an Open Source solution is the
financial benefit. There is always some cost involved when implementing a
solution, even if it is only in terms of time spent on training and implementation.
Numerous studies have shown, however, that the total cost of Open Source solution
projects is significantly less than proprietary solution projects, in the long term.
Savings can be anywhere from 20% to 80%, depending on the level of maturity of
the open source solution in the relevant area.
The cost of the proprietary solution licence is, of course, the main factor that is
mentioned: it represents a major initial investment, before the benefits of the
solution can even be proven. Added to this is the cost of services, which tend to be
more expensive than in the domain of Open Source solutions, as the openness of a
product facilitates knowledge dissemination. Finally, maintenance and development
costs also tend to be lower.
The evolution of Open Source solutions does not depend on their profitability or on
marketing factors. A product will live on for as long as there is an interested community.
Dissemination of the know-how associated with their implementation is guaranteed
by the fact that there are practically no barriers to acquiring the solution (downloads
and documentation are free and easily accessible).
1 http://www.gartner.com
Standardisation, adhering to norms and standards, and openness all come as
second nature to Open Source developers, who strive to develop efficient solutions
rather than reinventing the wheel.
The possibility of making modifications to the source is fundamental in theory, but
risky in practice. As such it is not in these terms that openness is appreciated, but in
terms of the capacity to accept extensions and to interface with other applications.
Sustainability: free access to the source is a fundamental guarantee of long-lasting
sustainability, even though, it must be emphasised, there is absolutely no need for the
client company to be able to master this source code.
As regards durability, the worst that can happen to an open source solution is that
the community slowly loses interest in it, generally in favour of a more promising
solution. As such, the product may one day need to be changed. This winding-down
process is always slow, giving users more than enough time to migrate to a different
solution.
In the case of an Open Source vendor, it must be said that if one day it were found to
be in decline, it would always be possible for the community to take over the product
and its development. This is the principle of Open Source licences.
As these solutions gradually reach maturity, their low cost is no longer their
most appealing feature; it is the other advantages of Open Source solutions
which make them the most attractive option.
Content
[1] PREAMBLE
[1.1] SMILE
[1.2] THIS WHITE PAPER
[1.3] VERSION 1.0
[1.4] VERSION 2.0
[1.5] SUMMARY
[1.6] OPEN SOURCE AT COMPANY LEVEL
[2] GENERAL
[2.1] WHAT NEED IS THERE FOR A DOCUMENT MANAGEMENT SOLUTION?
[2.2] CONCEPT SERVICES
[2.3] FROM EDM TO ECM
[2.4] THE MAIN ISSUES OF EDM
[3] DOCUMENT MANAGEMENT SOLUTIONS
[3.1] ALFRESCO
[3.2] NUXEO
[3.3] EXO DMS
[3.4] KNOWLEDGE TREE
[3.5] JAHIA
[3.6] OTHER SOLUTIONS
[4] FUNCTIONS
[4.1] METADATA
[4.2] VERSION MANAGEMENT
[4.3] CLASSIFICATION REPOSITORY
[4.4] SEARCH ENGINE
[4.5] EDM INTEGRATION
[4.6] DIGITALISATION
[4.7] MANAGEMENT OF PERMISSIONS
[4.8] COLLABORATIVE FUNCTIONS
[4.9] WORKFLOWS
[4.10] MANAGEMENT RULES
[4.11] LIFECYCLE MANAGEMENT
[4.12] IMPORT/EXPORT
[4.13] EMAIL MANAGEMENT
[4.14] FILE MANAGEMENT
[4.15] TECHNICAL INTEGRATION
[5] CONCLUSION
[2] GENERAL
[2.1] What need is there for a document management
solution?
The decision to deploy a document management solution still often results from a
crisis or a tension in the management of information; a situation which necessitates
the implementation of a more structured organisation, traceability or improved
usability. I.T. often offers the best means of providing a suitable solution.
Crises can manifest themselves in various ways: for example, the
inability to call up a complete client file within a reasonable amount of time, or
to find the latest version of an electronic document which represents days of work.
Though insidious, the most frequent crisis in our highly digital era is probably
“loss by dilution”, i.e. dilution of important information in a pool that is far too
vast. This results in loss of knowledge or memory at organisation level.
Tensions result from chronic difficulties in the use of documents or from time lost. Time
spent searching for information is the factor most frequently mentioned. The reuse
of existing documents is also problematic when independent documents circulate
throughout the organisation, and the efficiency of work can be greatly reduced by a lack of
organisation, traceability or even visibility in how documents are organised. The
outcome is loss of productivity.
Furthermore, we know that document management makes up part of the quality
process (ISO 9001 and 14001 in particular) and, while the use of a document
management software product is not required in order to comply with these
standards, it most certainly helps to apply good practice in regard to document
management, beyond strict application of quality criteria.
Return on investment can be difficult to calculate for this type of application. It
depends on the evaluation of criteria that are often intangible, the most delicate
being the before/after comparison, which is difficult to quantify numerically.
Nonetheless, it remains the most convincing method for proving the value of a project, and
we often find common-sense and contextual metrics which are quite satisfactory.
For this type of application, whenever possible, it is worth assessing objective data
and setting new objectives regarding improvements. Take for example: the time
mail processing takes, in the case of a correspondence EDM; the rate of correct
application of norms and procedures, in the case of a quality EDM; the processing
time for contract renewal, for a contract solution; or the rate of duplicates of a given
image, for a media management application.
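As an illustration of such a contextual metric, the duplicate rate of a media repository can be estimated by hashing file contents. The following is only a minimal sketch under our own assumptions: the function name `duplicate_rate` is hypothetical, and only byte-identical copies are counted as duplicates (resized or re-encoded images would not be detected).

```python
import hashlib


def duplicate_rate(files: list[bytes]) -> float:
    """Return the share of files that are byte-identical copies of another file.

    Assumption: two files are duplicates only if their contents are
    strictly identical, which is detected by comparing SHA-256 digests.
    """
    if not files:
        return 0.0
    # One digest per distinct content; copies collapse onto the same digest.
    unique_digests = {hashlib.sha256(content).hexdigest() for content in files}
    return 1 - len(unique_digests) / len(files)


# Four stored images, one of which is an exact copy of another.
repository = [b"image-a", b"image-a", b"image-b", b"image-c"]
print(f"Duplicate rate: {duplicate_rate(repository):.0%}")
```

Measured before and after deployment, a figure of this kind gives one concrete element of the before/after comparison.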
[2.2] Concept services
Here we attempt to position document management in relation to a certain number
of areas.
Below you will find not all of the concepts of the document management
environment, but those which we feel are essential to our analysis of EDMS.
[2.2.1] Electronic documents
While the notion of documents is straightforward in the material world, its meaning
must be specified in relation to electronic documents or files.
We base our approach to document management on the definitions given by the ISO:
“A collection of information that is identified as a unit intended for human
perception and which can be read by a machine” and “Recorded
information which can be treated as a unit in a documentation process”.
The file format is to electronic documents what paper is to physical documents. A
file is created, modified and made legible by an application. A “.doc” document for
example is created, modified and made legible by the Microsoft Word application.
The information is computer data included in the envelope of the file.
We will see later that EDM applications introduce the notion of document objects,
which conceptually bring together the file and its associated metadata.
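To make the notion of a document object more concrete, here is a minimal sketch of how a file and its metadata might be held together. It is not taken from any of the solutions studied: the class name `DocumentObject`, its fields and the sample metadata keys are hypothetical and chosen for illustration only.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DocumentObject:
    """Hypothetical document object: one file plus its descriptive metadata."""

    file_path: Path
    metadata: dict[str, str] = field(default_factory=dict)

    @property
    def file_format(self) -> str:
        # Simplification: the file extension stands in for the file format.
        return self.file_path.suffix.lstrip(".").lower()

    def read_bytes(self) -> bytes:
        # The computer data carried inside the file "envelope".
        return self.file_path.read_bytes()


def demo() -> DocumentObject:
    # Create a throwaway file so that the example is self-contained.
    path = Path(tempfile.mkdtemp()) / "contract.DOC"
    path.write_bytes(b"example file content")
    return DocumentObject(path, {"title": "Supply contract", "author": "J. Doe"})


doc = demo()
print(doc.file_format, doc.metadata["title"])
```

The point of the sketch is simply that the metadata travels with the file as a single unit, which is what allows an EDM application to search, classify and secure documents independently of their file format.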
[2.2.2] Content
The term “content” is used to describe a coherent informative component. The
document object, as mentioned above, is content. The term applies to all digital
information that has meaning.
Content is generally structured, that is to say made up of fields of information.
We refer to this as documentary or structured content.
It is sometimes difficult to differentiate between content in general and an electronic
document. It is often the context of use and the processes carried out which make it
possible to distinguish the nature of the information and the functional area it relates
to, and to decide which type of tool is most suitable to manage it.
For illustration purposes, here are some examples of items which can be considered
as documents and/or as content: a news flash on a news site, a press review, an
image and its caption, product description text, a document notice, a product
catalogue in PDF format, a documentary module in SCORM format.
[2.2.3] Document management
The objective: to manage the storage, sharing and retrieval of electronic documents.
These are mainly file management systems which include content (information) and
style (the presentation envelope).
The priority: electronic document management.
The focus: storage, sharing, search.
Terms used: EDM – Electronic Document Management, EDM(S) - Electronic
Document Management (Systems), DMS – Document Management Systems.
[2.2.4] Management of web content
The objective: manage the drafting, validation, and online publishing of web site
content.
The priority: publishing information online.
The focus: dissemination and retrieval of structured content and/or edits, with
contributions as a secondary issue.
Terms used: WCM - Web Content Management, CMS – Content Management
System, often used to describe web content management.
[2.2.5] Content management
The objective: enable the management of digital content, using design,
dissemination, use, search and archiving features.
Content management can be considered as a superset of EDM and WCM. The
content management solutions generally include procedural aspects of workflows
(BPM - Business Process Management) and collaborative work (Groupware).
The priority: ECM (Enterprise Content Management) is the most recent concept of
digital information management, as it addresses both structured and non-structured
information, at all steps of the digital content lifecycle.
The focus: content management in the larger sense, i.e. the integration of EDM and
WCM tools and sometimes even a portal.
Terms used: CMS – Content Management System, ECM – Enterprise Content
Management, BPM – Business Process Management, EIM - Enterprise Information
Management.
We sometimes come across the term “content management” used in a strict sense
to mean “management of web content”. This is due to the historical link with
CMS, the precursor of this type of tool, which focused solely on internet web sites.
[2.2.6] Multimedia content management
The objective: manage the specifics of Digital Assets: images, music and videos.
The priority: image and sound content, navigation and search on a repository,
specific copyright management.
The focus: metadata, navigation, search, DRM (Digital Rights Management),
management of large volumes.
Terms used: DAM – Digital Asset Management.
This is a specialised form of EDM. There are a number of common features such as
categorisation, permissions management and lifecycle management. Even if the
metadata specific to author rights management can be modelled with an EDM tool,
certain features such as image manipulation, copyright management, thumbnail
extraction, and selection baskets are specific to DAM.
[2.2.7] Workflows (BPM)
The objective: formalise and streamline processes by linking individual tasks and
passing them from one party to another.
The priority: management of processes and forms. BPM covers the specific
interaction needs of those involved in the activity of the organisation, be they humans
or systems, as the two often interact via processes.
The focus: management of tasks attributed to user profiles. Nevertheless, the term
usually refers to a tool which makes it possible to “orchestrate” web services produced by
different applications with a view to organising simple individual actions to produce
a complex result.
Some tools are specialised according to the objects they handle: for example, a
“Docflow” for documents.
Terms used: BPM – Business Process Management; form, workflow, lifecycle
management.
[2.2.8] Archiving – Record Management (RM)
The objective: enable management of a document’s phases after its period of use
(Length of Administrative Value).
The priority: the preservation of archive records.
The focus: all archive management processes to guarantee the consideration,
safekeeping, reliability and durability of company archives.
RM applications are more and more frequently “blended” into EDM applications. It
has been shown that archiving is better taken into consideration when it is managed
right from creation of an element (file, document, etc.).
Terms used: EAS – Electronic Archiving System, RM – Record Management,
archiving, research value, lifecycle management.
[2.2.9] Collaborative work
The objective: facilitate groupwork by means of special tools, often real-time or
deferred communication or interaction tools (e.g. email, shared agendas, etc.).
The priority: to facilitate teamwork.
The focus: to share operational information.
Terms used: groupware – teamwork, Chat (instant messaging), Blog (personal
mode of communication using themes), Wiki (mode for sharing information via the
co-edition of pages), shared Agenda, Google Wave (a mail, chat, blog, wiki tool).
[2.2.10] Search engine
The objective: to allow existing information to be found.
The priority: indexing or interrogation of content databases.
The focus: allowing users to find information. Search tools rely either on
the indexing of content or, in the case of metaengines, on the use of external
indexes, together with tools to improve relevance (the appropriateness of what the
search engine produces in relation to the search carried out).
Terms used: Search engine, SEO (Search Engine Optimisation), Findability, Meta
engine, Crawler, Search operator.
[2.3] From EDM to ECM
For the last few years the term ECM has been used more and more frequently to
replace EDM and DMS.
Beyond being a “buzz term”, this evolution from “document management” to
“enterprise content management” reflects a certain reality.
a) EDM scope
Electronic Document Management deals with digital documents. In this context an
EDM solution can integrate:
•
Tools: mainly digitalisation, storage, circulation, dissemination and
search tools
•
Business features: application of quality procedures, lifecycle
management, management rules, interaction with business
applications etc.
•
Technical specifications: file format transformation, multimedia file
previews, Web pages, structured content, groupwork, etc.
“Pure” EDM applications barely exist anymore: they almost always integrate
features “borrowed” from the areas mentioned above, or business domains.
b) The ECM concept
Enterprise Content Management gathers together software solutions that
offer functions capable of managing a company’s entire digital content.
This means acquiring or capturing electronic information (structured or
document) to manage this information (storage, editing, distribution) while
meeting user requirements (ergonomics, functionality) and functioning in
line with company processes (security, reliability, processes).
[Figure: positioning of EDM, WCM and ECM relative to related areas. Labels: IS application integration, Portal, Collaborative, Records management, EDM, Archiving, WCM, Search, IS information report, ECM.]
This graph aims to position the EDM, WCM and ECM areas in relation to each
other and to a certain number of other related areas.
Finally, we note that the notion of a company portal is frequently
associated with that of ECM. Though the functional crossovers are slight,
the all-encompassing nature of ECM solutions must not overshadow the
fundamentally different purposes of the two: a portal exposes the tools (services)
of the organisation, whereas ECM manages its content.
Nonetheless, the capacity of an ECM or EDM solution to integrate into a portal is
an important feature, particularly when your company’s I.T. service does not wish to
merge all information access solutions into one application.
c) The evolution of tools
EDM solutions were focused on file management, then on the consideration of
specific documents covering more and more document types (mail, forms, images,
videos, etc.). They have naturally evolved to better integrate document management
processes, notably by a better consideration of the edit context and the structure of
document files (Microsoft Word, Open Office, PDF, etc.) which will soon allow the
processing of documents as a content aggregation.
The functional scope of EDM solutions is developing to resolve different
problems linked to use: on the one hand by integrating the management of structured
content alongside that of document content (sometimes referred to as semi-structured
content), and on the other by enabling complex collaborative actions.
ECM covers the area of EDM and extends to other enterprise content, notably Web
content (WCM) or content managed by other applications. Integrating the content of
other applications in this way is called federation.
d) Transfer of methods
Development, in terms of content management, follows two distinct patterns: 1. an
increase in terms of managed content, and 2. an adaptation of management methods
to this content, derived from document management.
We note that a number of ECM methods come from EDM. Content is often
considered to be a file (or a collection of files) and the actions that we apply resemble
those that a user is familiar with applying to documents.
Two factors explain this transfer of methods between EDM and ECM: ECM
solutions often come from EDM vendors and, above all, transposing the paradigm of
the paper document to electronic content allows the user to do away with part
of the complexity carried by the nature of electronic content.
We note that users are more open to managing content which they see as a coherent
whole than content which they consider to be pieces of “something” which has no
structure and from which it is difficult to obtain a return (structure).
Basically: a web site page is easier to understand and manipulate than several
pieces of content (e.g. 3 images, a banner ad., and several text blocks), which, when
viewed on a screen form a web page.
[2.4] The main issues of EDM
Electronic Document Management (EDM) is an area which is, above all,
organisational, as opposed to Web Content Management (WCM) which deals mainly
with technical issues.
Web content management tools are focused on the dissemination of structured
content: formatting, publishing, accessibility etc. Document management
tools, by contrast, are concerned with upstream processes, such as the possibility of
carrying out elaborate indexing or integrating processing rules (e.g. workflow,
transformation, conditional alerts, etc.).
[Figure: CMS versus EDM. Labels: Contributors, CMS, EDM, Web users / readers, User player.]
As such it is very important that the future users are taken into account
when implementing an EDM solution.
We have noticed that the success of EDM application deployment is mostly
based on the correct use of a certain number of tools and compliance with
the principles outlined below.
The ergonomics of the application must be adapted to the use of a management
application, used by a large number of people, in the same way as a mail application,
for example.
[2.4.1] Metadata structuring
The management of electronic documents often seems like managing in the dark. In
the absence of open formats (see below), the document-file is a dense and
impenetrable object for all applications apart from the one which created it, and
therefore also for the EDM application, which at the beginning only knows the
document name (file name) and its type (MIME type).
Metadata is the information associated with the managed documents. It is used to
describe the document, and provide complementary information on the document
which can then be used. Metadata is useful for the information it provides directly,
but more importantly it is what search functions, and selective processes applied to
documents, are based on.
In order to provide advanced functions, an EDMS must allow exploitable
structured information to be associated with documents. This is what metadata is
really all about.
Standard metadata includes, for example, the title, author, description, language,
and publication date.
There are a number of metadata standards, such as Dublin Core, which defines
15 core metadata elements. Different business sectors have their own standards
(e.g. music, architecture, health, etc.).
In order to address a maximum number of needs, solutions must therefore manage
different metadata sets based on document types and provide for a wide
range of metadata types: textual information (text fields), lists of values from
reference tables, dates or numbers on which management rules can be calculated,
etc.
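As a rough illustration of the point above, the following Python sketch models metadata sets that vary by document type, with typed fields (text, value list, date, number). It is only a sketch: the class and function names, the "contract" and "memo" document types and their fields are invented for the example and do not come from any of the solutions discussed in this white paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class Field:
    """One metadata field: a name, a Python type and an optional value list."""
    name: str
    kind: type
    required: bool = False
    allowed: Optional[tuple] = None  # values from a reference table, if any


# Hypothetical metadata sets: a common core plus fields per document type.
COMMON = (
    Field("title", str, required=True),
    Field("author", str),
    Field("language", str, allowed=("en", "fr", "de")),
)
SCHEMAS = {
    "contract": COMMON + (
        Field("signature_date", date, required=True),
        Field("amount", float),
    ),
    "memo": COMMON,
}


def validate(doc_type: str, metadata: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the metadata is valid."""
    errors = []
    fields = {f.name: f for f in SCHEMAS[doc_type]}
    for name in metadata:
        if name not in fields:
            errors.append(f"unknown field: {name}")
    for f in fields.values():
        if f.name not in metadata:
            if f.required:
                errors.append(f"missing required field: {f.name}")
            continue
        value = metadata[f.name]
        if not isinstance(value, f.kind):
            errors.append(f"{f.name}: expected {f.kind.__name__}")
        elif f.allowed is not None and value not in f.allowed:
            errors.append(f"{f.name}: {value!r} is not in the reference list")
    return errors
```

A date field such as the hypothetical signature_date is the kind of value on which a management rule (an expiry alert, for example) could then be calculated.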
[2.4.2] Classification repositories
Most content management solutions integrate one or more classification
repositories.
When we talk about professional organisation, specific professions and document
management procedures, it is crucial that we have a structure which serves
as the backbone of the document management system: this is the role of
the classification repository. It is one of the most important tools, and the
way it is used is one of the most important factors in the choice of a
solution.
The tools can be simple or elaborate, but must, at a bare minimum, allow a
classification scheme to be defined, i.e. a hierarchical tree structure under which
documents are filed. They can go as far as managing business vocabularies with
synonyms, links of semantic proximity, multi-hierarchies and the translation of
terms, referred to as thesauri or ontologies.
The use of a classification repository relates to different functions: help with
indexing, glossaries, advanced searches (taking synonyms and semantic proximity
into consideration, for example), or faceted browsing.
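To make the link between a classification repository and advanced search more concrete, here is a minimal Python sketch of a hierarchical scheme with synonyms, in which a search term is expanded to its synonyms and narrower terms. The Term class, the expand function and the sample labels are assumptions made for the example, not the data model of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A node in a hypothetical classification scheme."""
    label: str
    synonyms: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def add(self, child: "Term") -> "Term":
        self.children.append(child)
        return child


def expand(root: Term, query: str) -> set:
    """Return the matching term, its synonyms and all narrower terms, lower-cased."""
    wanted = query.lower()

    def names(term: Term) -> set:
        return {term.label.lower()} | {s.lower() for s in term.synonyms}

    def collect(term: Term) -> set:
        labels = names(term)
        for child in term.children:
            labels |= collect(child)
        return labels

    def search(term: Term) -> set:
        if wanted in names(term):
            return collect(term)
        for child in term.children:
            found = search(child)
            if found:
                return found
        return set()

    return search(root)


# Illustrative scheme: the labels are invented for the example.
root = Term("Documents")
legal = root.add(Term("Legal", synonyms={"Juridical"}))
legal.add(Term("Contract", synonyms={"Agreement"}))
legal.add(Term("Litigation"))
root.add(Term("Finance"))
```

A search on "juridical" would then also cover documents filed under "Contract" or "Litigation", which is the kind of behaviour a thesaurus-backed search provides.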
[2.4.3] Taking the lifecycle into consideration
To optimise document resource management, the lifecycle (which extends from
document creation up to and beyond operational utility) must be managed.
A document is first conceived by putting a number of processes into place before it
can be used. For example, a contract is drafted, modified, verified, printed, signed
and then sent, often in paper format. The lifecycle includes all of these phases, from
drafting to definitive archiving or disposal.
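The contract example can be sketched as a small state machine. This is only an illustration under stated assumptions: the state names and the allowed transitions below are our own reading of the phases listed above, not the lifecycle model of any specific EDM solution.

```python
# Allowed transitions for the contract example; the state names are assumptions.
TRANSITIONS = {
    "draft": {"modified", "verified"},
    "modified": {"modified", "verified"},
    "verified": {"modified", "printed"},
    "printed": {"signed"},
    "signed": {"sent"},
    "sent": {"archived", "disposed"},
    "archived": {"disposed"},
    "disposed": set(),
}


class Document:
    """A document whose lifecycle state can only follow the allowed transitions."""

    def __init__(self, name: str):
        self.name = name
        self.state = "draft"
        self.history = ["draft"]  # kept for traceability

    def can_move_to(self, new_state: str) -> bool:
        return new_state in TRANSITIONS[self.state]

    def move_to(self, new_state: str) -> None:
        if not self.can_move_to(new_state):
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)
```

Recording the history of states is one simple way of keeping the traceability that electronic documents otherwise tend to lose.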
The lifecycle of electronic documents must be managed at least as carefully as for
paper documents. Due to the simplicity of duplication, transmission and storage of
electronic documents, the number of electronic documents is considerably higher
than that of paper documents. This leads to a number of different problems, notably
traceability between versions, capacity to find specific information or even the sheer
volume of data to be stored.
[2.4.4] Going paperless and digital transformation
EDM solutions are now capable of managing different types of documents, be these
resulting from an office application or from a digitalisation process.
Digitalisation or converting to a “paperless” system refers to the transformation of a
paper document to electronic format. E.g. the transformation of a paper document
into an electronic office document, a number of forms to a database, a film to a
multimedia file etc.
For a number of years companies have set themselves the objective of “going
paperless”, even though they admit that paper will probably never be completely
done away with within their companies. Nonetheless, any content that is digital
benefits from the numerous advantages of the digital format.
Some organisations have integrated this and their information management plans
are focused on “all-in-one” solutions which benefit from modes of access and
management which are unified for all content, be it physical or digital.
[3] DOCUMENT MANAGEMENT SOLUTIONS
[3.1] Alfresco
Alfresco (2) is an English software vendor, founded in 2005 by former directors of
Documentum and Business Objects. The company is today present in England,
France and the United States.
This is a J2EE solution which offers all of the usual features: metadata, document
types, document and advanced workflows, category management, collaboration tools,
search, management of several independent databases, Web content management.
The tool sets itself apart with some distinctive features:
•
a resolutely collaborative focus both in relation to functions and
interfaces
•
a good standard of functional and technical architecture, which allows
it to serve as the EDMS of very large companies. Alfresco is
very open and can be extended using numerous well-documented APIs and
tools, ranging from basic scripts, to web services, to Java components.
•
management rules, directly accessible to users, which allow
part of the functional adjustments to be made at manager level and
not at developer or administrator level.
•
a web content management component, in line with document
management. This module focuses on the management of web content
and not on its distribution to remote servers
(2) http://www.alfresco.com
•
ease of integration with the workstation
•
a Records Management module which provides all of the functionality
needed to help organisations capture, classify, control and dispose of a
wide range of corporate records, certified to the U.S. Department of
Defense standard 5015.02
•
SURF: an interface design framework
•
advanced technical modules, such as load balancing or the
management of several instances in one single installation
•
excellent visibility on an international level, due to its communication,
marketing activity, and large number of clients.
Alfresco is available in two versions: a free “Community” version under GPL
and an “Enterprise” version, which requires an annual subscription and offers access to
vendor guarantees, together with intermediate updates. The cost of this subscription
depends on different factors: the Service Level Agreement (SLA), the modules
implemented, and the number of processors used.
This solution has a strong development dynamic, and a large community of users
and developers. The Alfresco Forge (3) site hosts a number of plug-ins.
The 3.2 version, released in January 2010, integrates a number of
improvements, the following of which we found to be of particular interest:
(3) http://forge.alfresco.com/
•
An overhaul of the collaborative interface Alfresco Share, on the SURF
framework
•
Improvement of the WCM module, in terms of performance, but also in
terms of functionality
•
Improvement of mail integration. It is possible not only to send mail
directly to each object of the Alfresco server (to store content or
comment on a document, for example), but also to go through an IMAP
mailbox directly.
•
The addition of the Records Management (RM) and Information
Lifecycle Management (ILM) module to manage the life of documents.
This module is certified by the U.S. Department of Defense (DoD).
•
Support for the Amazon Elastic Compute Cloud (EC2),
with management facilities and “multi-tenant functions” in particular
(databases partitioned in one single installation)
•
The integration of a first CMIS implementation (the future
interoperability standard for EDM systems).
•
Integration of the Windows SharePoint Services (WSS) communication
protocol, allowing use of the EDM as a document repository directly
from Microsoft Office products.
[3.2] Nuxeo
Nuxeo (4) is a French company, which has been producing Open Source solutions since
2001. The first Nuxeo solution, CPS, was developed in Zope/Python. Nuxeo
Enterprise Platform is the result of its migration to Java in 2007. The company is
today present in France and the United States.
It is a full enterprise content management solution, in the Java J2EE environment:
metadata, document types, advanced workflows, category management,
collaboration functions, search, complex content management (web files, multi files,
and structured files), and multi database management.
(4) http://www.nuxeo.com
Some distinguishing features of this tool are listed below:
•
its entirely graphic theme editor, which allows the interface to be customised
•
the notion of “relations”, which allows typed links to be made between
content. The typing is reciprocal and can connect elements both
internal and external to Nuxeo (URL), for example, “is the translation
of/is translated from” or “has the attachment/is the attachment of”
•
its standard interface, which is both directly usable for basic EDM
projects and equipped with a user-friendly layout (drag and drop, right-click, tab presentation etc.)
•
vocabulary management, which allows an operations administrator to
administer the value lists in every application
•
the publication section notion, which allows the user work space to be
completely decoupled from what is presented to different audiences
•
a totally modular architecture which facilitates the development,
maintenance and reuse of additional functions. The technical quality of
the solution allows it to be integrated by small companies and
international groups alike and even to be integrated as a document
management component for other projects (the ESUP-portal for
example).
•
the Nuxeo notification engine, both powerful (triggering alerts on
several elements) and extensible (by mail, RSS, etc.).
•
“Nuxeo Studio”, for facilitated configuration, see below
The 5.3 version, released in October 2009 brings a number of important
improvements, in technical and ergonomic terms and even in terms of
marketing:
•
the availability of widget technology (Web gadgets) with OpenSocial
integration
•
a web site document creation framework based on Nuxeo. This allows
content to be managed, as with an EDM, and previewed, as with a
CMS. Integration with CMSs (notably eZ Publish, end of 2009) allows
the management and publication of content to be separated
•
the introduction of wiki collaborative tools (and soon blogs) directly in
the standard interface
•
mail management has also been included: directories can
automatically retrieve mails from a designated account
•
a very efficient Office document annotation tool which meets user
needs in terms of ease of use
•
integration of the SharePoint communication protocol (WSS), allowing
use of the EDMS as a document repository, directly from Microsoft Office
products
•
synchronisation between document databases (with SyncML), which
notably allows the distribution of part of the document
database to be managed
•
a version of Nuxeo focused on Digital Asset Management (DAM) i.e.
management of multimedia content, which notably integrates an
adapted interface, video format management tools, and image
management features
•
Nuxeo Studio, available on subscription, is a graphic configuration tool
which gives access to a number of options, such as document
types, lifecycle definition, certain interface graphic elements, or even
basic configuration of a Nuxeo project
•
A mail management tool, to appear in Q1 2010, adapted to
incoming and outgoing mail flows, notably with
specific processing (trays), procedural (workflow), and
ergonomic adaptations.
•
Implementation of an intermediary version of the CMIS standard
In collaboration with Smile, an offline mode to consult document databases
on systems that are not connected to the Nuxeo server is in the completion
stages.
[3.3] Exo DMS
eXo is a French software vendor founded in 2003 by Frenchman Benjamin Mestrallet, with
financing from the U.S. Department of Defense (DoD). The company has a presence
in France, Vietnam, Ukraine and Tunisia.
The initial portal integration application, extended with multiple modules and technical
components, rapidly evolved into the eXo Platform suite, covering all of the needs of a
modern ECM in an integrated manner: management of an integration portal (eXo
Portal), web content management (eXo WCM), document management (eXo DMS, Document Management System), workflow management (notably through integration of
the Bonita project) and even a Web O.S., a sort of portal in the form of a workstation
used to provide a virtual office, for example.
Different software suites have been packaged by the vendor to create “finished
product” solutions, e.g. eXo Collaboration Suite with mail, address book, calendar,
and instant messaging features, or eXo Knowledge Suite with FAQ and Forum tools.
eXo DMS occupies a central role at the heart of this suite, as it is the component
which is used to store all files. We will only talk about this component in this white
paper, bearing in mind that it belongs to a greater whole.
The eXo DMS application has some very interesting features, the following of which
we found to be of particular interest:
•
A standard interface which natively integrates Windows explorer
facilities: click-drag, keyboard shortcut, several different display
modes, etc.
•
The global content database (JCR) for all content (both Web and
document) can be used via WebDAV, FTP and CIFS
•
Advanced office integration, with a special Microsoft Office plug-in and
Open Office integration
•
The capability of greatly adapting user interfaces based on profile. eXo
DMS, like all eXo applications, is based on eXo Portal, the portal
management application.
Two important features distinguish the eXo DMS solution from all the other ECM
and EDM solutions described in this white paper:
•
It is part of a greater whole and is of far less interest outside of the eXo
Platform context
•
It is a technically rather than functionally oriented application. It
needs to be integrated and used within the framework of a solution packaged by the
vendor in order to appreciate its quality and to get the most
out of it
For these reasons, we have chosen not to systematically include it in our
comparative reports, as taking it into consideration outside of the eXo
Platform suite context makes any comparison unfair.
[3.4] Knowledge Tree
Knowledge Tree (5) is an EDM solution developed by the South African company
JamWarehouse.
Knowledge Tree has a full range of functions and several modules which allow good
integration into the office environment.
The Open Source version of Knowledge Tree integrates most of the package, but
several modules, notably those which concern integration with the workstation (hot
folder, navigation, Microsoft Office integration, scanner management application),
are only available under commercial licence. The comparison between the different
versions is outlined in detail on the vendor’s web site.
(5) http://www.knowledgetree.com/
The application is well designed overall, providing a simple and efficient EDM.
There are several features which we particularly like:
•
A simplified, immediately operational standard interface
•
A highly-advanced search option, which is sure to satisfy even the most
demanding user in terms of the complex multi-criteria searches in
particular
•
Administrative functions, accessible to an administrator with no
particular technical skills, for all configurations: creation of document
types
•
Virtual navigation modes implemented by default, notably by
document type
•
The ergonomics of the module integrated into Microsoft Office, which,
unlike its competitors, allows metadata to be managed from within Office.
Version 3.7, tested within the framework of this white paper, offers several
important improvements:
•
Management of a full spectrum of metadata types, notably the date,
which had been missing
•
A technical overhaul considerably improving the solution’s
performance, notably via partnership with Zend
•
The integration of two recent protocols: CMIS, which allows
Knowledge Tree to be queried via standardised web services, and OpenSearch,
which allows its search engine to be queried, and responses obtained, via
standardised methods
A number of languages such as French, Spanish, and Portuguese have also been
recently integrated into the community version.
[3.5] Jahia
Jahia is an integrated web portal and content management solution from the vendor
of the same name. It is available in two versions: the Community Edition and the Enterprise
Edition. The first is entirely Open Source under GPL v2; the second is under
commercial licence and integrates functions that are specific to companies.
This solution mainly meets web and document content management needs, together
with a number of portal needs (aggregation), notably through the JSR 168 standard.
Jahia can also manage a file repository compliant with the JCR standard (JSR 170), which can
be linked to different web content and integrates well with the office environment
thanks to WebDAV, CIFS and SMB access.
Nevertheless, EDM functions in the strictest sense of the term are not very
advanced; their interest lies essentially in the use which can be made of
document content in a web site context.
The release of version 6, mid-2009, marked a turning point for Jahia, both in terms
of its economic model, going from an open but proprietary licence to a far more open
source model, and in terms of the progress the new version has made.
Version 6 has brought some major improvements, notably in terms of document
management:
•
Access to media library files via CIFS, SMB and FTP
•
Improvement of document search functions, the capability of saving
queries
•
A simplified file management interface with improved ergonomics
(right click in particular)
•
The capability of mapping document sources external to Jahia, directly
in the media library
Jahia remains a solution focused more on web content management. The
consideration of files (documents) is essentially developed from this perspective.
Jahia remains nonetheless, an excellent means of publishing document content. It is
often necessary to use it in conjunction with an Enterprise EDM in order to take
document management issues into account.
[3.6] Other solutions
Several other EDM solutions exist in the world of Open Source. Their level of quality
varies and, in general, these tools are too limited to be reliable in an enterprise
context, and therefore for us to recommend them.
These solutions may, nonetheless, correspond to very specific needs or to a
particular technological context: Freedom, for example, for its capacity to create
business applications, or Quotero for its .NET development.
We do not recommend the following, due to limited durability or a low level of
functionality: DocMgr, OpenEDM, myDMS, and eDMS.
[3.6.1] Maarch
Maarch (6) is a PHP-based solution from the French company Maerys. The
solution includes several applications based on the Maarch framework: Entreprise
1.0, to Maarch 2.7 EDM, LetterBox and Archive in motion.
•
LetterBox interfaces with a scanner and manages the lifecycle of mail: interface
with the digitalisation tool, reception, validation, response processing, and search
tool. It comes with a set of functions and an interface that is totally oriented
towards mail management.
•
Archive in motion is an application which manages electronic and physical
archives
•
Maarch Entreprise allows different categories of documents to be stored and
includes the functions that are required by an enterprise EDMS
(6) http://www.maarch.org/
This solution, though relatively new, is of interest to us in certain specific
contexts, notably in relation to the verticals offered by the vendor: mail management,
archive management etc.
We also note that the community around the solution is small and mainly
concentrated around the vendor.
Finally, the PHP framework focus in version 3 seems to us to fill a niche in the PHP
area. This focus makes it a suitable solution for designing enterprise document
applications.
[3.6.2] Freedom
Freedom is an EDM tool developed and deployed by the French company Anakeen.
The solution is focused on content management and the design of applications for
professionals, via multiple configuration options.
Version 3.0 is out in beta 2 at the time of writing. As such assessment of the solution
is not complete.
Despite a relatively concise interface, the tool is quite complete: with a good
selection of metadata types, document types, version management, search, a
classification scheme, document composition, calendar, address book, etc.
The main drawback with Freedom ECM is that the interfaces are not very user-friendly and configuration is complicated, making it an application better suited to developers. These
aspects do, however, seem to have been improved in version 3.0.
In addition, the community around the tool is small and French only,
with few client references and few integrating partners listed by the vendor.
[3.6.3] Quotero
Quotero is a new .NET solution (v1.0 dates back to March 2009) released by the
service company and software vendor Core-techs.
This application, though far from what we expect from a modern EDM, presents an
interesting technological core.
The product is centred around three components: the document server, the web
query interface, and the thick client. Certain components are only available
in the commercial licence version, with the Open Source version limited to basic
functions.
We found the following to be the most interesting functions: indexing and full text
search, reservation (check-in / check-out), Email and RSS alerts, document
workflows with jBPM, links between documents, and graphic configuration of
metadata.
The following interesting features are only available in the commercial version:
•
Opening and modification of documents from Microsoft Office and Open
Office
•
Document drag & drop from Windows and Linux via a client application
The vendor seems to be the solution’s only integrator, and the community is centred
mainly on the vendor. As such the Open Source element is of limited interest.
As such the Quotero solution is only relevant in relation to the technology on which
it has been developed, i.e. .NET. It has been surpassed in every way by other
solutions.
[3.6.4] LogicalDOC
LogicalDOC (7), formerly Contineo, is a document management tool published by the
Italian company Logical Objects together with some independent developers.
Two versions of the solution are available: the Open Source edition, and the
Commercial edition, which has extra functions (8) and includes support.
Some of the functions proposed include: version management, document dispatch by
mail or generated link, webmail integration, document language management,
discussions.
LogicalDOC offers the functions that are essential to any EDM tool, together with
some interesting features such as graphic rights management, via a ‘tick box’ grid.
Use of the Open Source version is limited, however:
•
No office integration, which also means that the solution is not a
collaborative one
•
There are no import/export functions
•
Authentication against a company directory is only available with the enterprise
edition
•
The document architecture is very simple, which greatly limits its
extensibility
Modules complete the Open Source version, with some interesting, sometimes
indispensable features:
•
Optical Character Recognition (OCR) and integration with digitisation tools
(7) http://www.logicaldoc.com/
(8) Version comparison: http://www.logicaldoc.com/en/products/compare-products.html
•
Use of recent AutoCAD and Microsoft Office formats
•
A document workflow which can be configured via a graphic interface
•
Traceability functions (audit)
•
The capability of integrating mails like other documents, directly on
the server
LogicalDOC is suitable for simple needs that are closely aligned with the few strong
points of the solution. The Open Source element is possibly more of a marketing
strategy than an economic model.
[4] FUNCTIONS
In this chapter we present the main functions of document management solutions,
and we indicate how each tool rates.
This means describing not only the feature itself, but also the impacts that it will
have in a project context.
The best Open Source EDM applications are now very advanced and tend to include
a number of features on the fringes of EDM. Furthermore, certain applications from
other areas integrate EDM functions, in close relation to business functions.
As such, the solution designers capitalise on the experience acquired in other
domains: documentation, search engines, office automation, digitalisation, process
representation, etc. The functions which are most frequently integrated with
document management solutions are: collaboration, structured content management
and workflows.
As all of the solutions carry out the essential functions, most of our attention is
focused on the optimisation and sophistication of these functions, together with the
availability of high-end functions, with a view to choosing a solution adapted
to each context.
[4.1] Metadata
Indexing is the central function of EDM tools; it consists, first of all, of
attaching metadata to documents.
[4.1.1] Types of documents
Each document type can be described by metadata. Each document will therefore
have its own metadata and may be associated with management
rules that use this metadata.
Note that it can be useful to index several files with one set of metadata.
The relationship between the document notice and the stored files should ideally be flexible
enough to allow zero to “n” files to be attached to a single notice.
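As an illustration, the zero-to-“n” relationship between a notice and its files can be sketched as follows. This is only a minimal model under our own assumptions: the class names `Notice` and `StoredFile` and the field names are hypothetical and do not come from any of the solutions reviewed.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    """One physical file kept in the repository (hypothetical model)."""
    name: str
    size_bytes: int


@dataclass
class Notice:
    """A document notice: one set of metadata describing zero to n files."""
    doc_type: str
    metadata: dict
    files: list = field(default_factory=list)

    def attach(self, stored_file: StoredFile) -> None:
        """Attach one more file to this notice."""
        self.files.append(stored_file)


# A notice may exist without any file (for example a paper-only record).
paper_contract = Notice("contract", {"title": "Lease 2009"})

# The same set of metadata can also index several files at once.
scanned = Notice("contract", {"title": "Supply agreement"})
scanned.attach(StoredFile("agreement.pdf", 184_320))
scanned.attach(StoredFile("annex-a.pdf", 52_224))
```

Keeping the file list separate from the metadata is what lets a notice be created before, or independently of, any electronic file.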
a) Metadata (notice) structure
The metadata associated with a document is generally entered using a form.
It is essential that the level of detail in the description of documents can be
adjusted for each project. A balance must be found between the completeness of the
information, the use that will be made of it, and the amount of work or development
that populating the metadata will represent.
While it is sometimes counter-productive to require the entry of 15 metadata fields for a
common document, certain uses may, on the contrary, call for large sets of
metadata.
The structure of document type metadata must:
•
Allow relevant indexing that accurately reflects the documents described,
and encourage the user to complete it
•
Cover all of the information relevant to use: search, of course, but also
processes applied to documents such as alerts, traceability, display, etc.
It is important to avoid both over-indexing (too much metadata for the type of
document) and free indexing (too few controls). The former leads to
indexing costs that are too high in relation to the benefit obtained; the latter leads to
errors in use (noise and silence issues in particular).
b) Functions based on the types of information
These are functions which make indexing more reliable and make it possible to carry out
processes specific to document types, for example:
•
Multivalue fields, i.e. fields in which several values can be entered
•
Consistency rules on a field, and between fields: for example a date
format for a day, a positive number for a price, etc.
•
Calculated fields, which make a field value depend on one or more
other values or conditions
•
Links between documents and the “types” of these associations, for
example a mail which “has the annex…” or a contract which “relates
to…” a specific file
Defining and structuring document types is a fundamental phase of
the implementation of an EDM. It should in no way be neglected.
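The first three kinds of function listed above can be sketched in a few lines. This is a minimal illustration, not the behaviour of any particular product: the field names (`keywords`, `price`, `issue_date`), the 30-day payment delay and the function names are all assumptions made for the example.

```python
from datetime import date, timedelta


def validate(notice: dict) -> list:
    """Return a list of error messages; an empty list means the notice is valid."""
    errors = []
    # Multivalue field: a list holding at least one value.
    keywords = notice.get("keywords")
    if not isinstance(keywords, list) or not keywords:
        errors.append("keywords: at least one value is required")
    # Consistency rule on a field: a price must be a positive number.
    price = notice.get("price")
    if not isinstance(price, (int, float)) or price <= 0:
        errors.append("price: must be a positive number")
    # Consistency rule on a field: a day must follow a date format.
    try:
        date.fromisoformat(notice.get("issue_date"))
    except (TypeError, ValueError):
        errors.append("issue_date: expected a date in YYYY-MM-DD format")
    return errors


def payment_due(notice: dict, delay_days: int = 30) -> date:
    """Calculated field: its value depends on another field of the notice."""
    return date.fromisoformat(notice["issue_date"]) + timedelta(days=delay_days)


invoice = {"keywords": ["supplier", "2009"], "price": 1250.0, "issue_date": "2009-09-01"}
```

Here `validate(invoice)` returns an empty list, while a notice with no keyword, a negative price and a date typed as `01/09/2009` would produce three error messages.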
[4.1.2] Manual indexing
a) Data-entry interface
The sheet (or form) describing a document is often the user’s first point of entry. It is
also the form which demands the most effort from the contributor, who must enter
information.
To this end, and taking ergonomics, reliability and productivity into account, EDM
solutions must offer various facilities, either directly or via a minor integration, for
example:
•
Copy/paste between the electronic document and the metadata. This is
particularly appropriate when the electronic document results from
digitalisation and when it can be viewed on the same screen
as the indexing form
•
Help with data entry wherever possible, i.e.:
•
Control lists which make it possible to verify the content (semantics) and the form
(spelling in particular) of the data entered
•
Tick boxes or radio buttons for multiple choice
•
Dialogue boxes adapted to reference value tables (listing possible
choices). This can range from a simple drop-down list to interfaces with
auto-completion or a navigation tool (alphabetical index, tree) in the reference
values
•
Suggestions for a given field (see “Metadata induction” in [4.1.3])
b) Reference tables
The aim is to offer value lists wherever possible, to limit the questions the user
has to ask themselves, to make the data entered more relevant, and thus to improve
the use of metadata.
Among the reference tables, we find, for example:
•
classification repositories (see [4.3] “Classification repository”)
•
value lists, either fixed or enriched through data entry, e.g. the names of document
authors already in the system, or the list of services offered by the
organisation
The objective of these tables is to offer help in entering data and to provide
limitations and controls during data entry.
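Both roles of a reference table, help (auto-completion) and control, can be sketched as follows. The class `ReferenceTable`, its `open_list` flag and the author names are hypothetical and serve only to illustrate the distinction between fixed lists and lists enriched by data entry.

```python
class ReferenceTable:
    """A list of allowed values used to assist and control data entry."""

    def __init__(self, values, open_list=False):
        # Values are looked up case-insensitively but stored as first entered.
        self._values = {value.casefold(): value for value in values}
        self.open_list = open_list  # True: the list is enriched by data entry

    def suggest(self, prefix, limit=5):
        """Help: propose known values starting with what the user has typed."""
        typed = prefix.casefold()
        found = [v for key, v in self._values.items() if key.startswith(typed)]
        return sorted(found)[:limit]

    def check(self, value):
        """Control: return the reference spelling, or reject an unknown value."""
        cleaned = value.strip()
        key = cleaned.casefold()
        if key in self._values:
            return self._values[key]
        if self.open_list:
            self._values[key] = cleaned
            return cleaned
        raise ValueError(f"{cleaned!r} is not in the reference table")


authors = ReferenceTable(["Durand", "Dupont", "Martin"], open_list=True)
departments = ReferenceTable(["Accounting", "Legal"])  # fixed list
```

With these tables, `authors.suggest("du")` proposes both matching names, `authors.check(" dupont ")` normalises the spelling, and an unknown department is refused because that list is fixed.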
[4.1.3] Automatic indexing
a) Extracting metadata
There are several ways to extract metadata, for example:
•
Retrieving information
The file system automatically attaches information to each computer file
which can be relevant as metadata: the file name, creation and
modification dates, size, storage location, etc.
EDM solutions can retrieve some of this information. It is also possible, notably when
taking over legacy content, to process this information in order to reconstruct
keyword indexing, for example by breaking down the storage location
or the structure of a file name into a succession of terms which can be
used for indexing.
•
Extracting structured data
Certain file formats have a readable structure, exposed through their properties. This is
notably the case for a number of open formats, such as Open Document Format
(ODF), but also for Microsoft Office formats.
When these structures are known and documented, automated processes can be used to extract
relevant information directly from the file, and this information can then be used in
the indexing form.
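The first approach, retrieving file system information and breaking down a path into keywords, can be sketched as follows. The function names and the example path are hypothetical; the creation date is left out because its availability depends on the platform.

```python
import re
from datetime import datetime
from pathlib import Path


def filesystem_metadata(path: Path) -> dict:
    """Collect information that the file system already holds about a file."""
    info = path.stat()
    return {
        "file_name": path.name,
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime),
        "location": str(path.parent),
    }


def keywords_from_path(path: Path) -> list:
    """Break down the folders and the file name into candidate keywords."""
    keywords = []
    for part in list(path.parent.parts) + [path.stem]:
        # Split on anything that is not a letter or a digit (and on "_").
        for token in re.split(r"[\W_]+", part):
            if token and token.lower() not in keywords:
                keywords.append(token.lower())
    return keywords


legacy_file = Path("Contracts/2009/Supplier_ACME-renewal.pdf")
candidate_keywords = keywords_from_path(legacy_file)
```

For this path the candidate keywords are the folder names followed by the terms of the file name. In a real migration they would still need to be checked against a reference table before being stored as metadata.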
b) Metadata induction
Some elaborate solutions can automatically determine the metadata which is
the most relevant for indexing a document.
This type of metadata induction is often carried out by:
•
Recognition tools, which find character strings in a document that are present
in repositories, in order to allow the user to add these to the metadata
•
Statistics tools, which analyse the character strings which appear
most often and which are therefore probably the most representative of the
content
•
Semantic tools, capable of automatically extracting the words and
expressions which are most relevant, or even of recognising whether these are
keywords, dates, titles, etc.
A combination of these different approaches is often used.
The most advanced tools can process a large volume of information
very rapidly; the downside is reduced indexing relevance compared
to manual indexing.
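The recognition and statistics approaches to metadata induction can be illustrated very simply. This is a naive sketch under our own assumptions (a tiny English stop-word list, unaccented text, hypothetical function names); real tools rely on far richer linguistic resources.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "was", "until", "and", "for", "with"}


def recognise(text: str, repository_terms: list) -> list:
    """Recognition: repository terms that appear as whole words in the text."""
    lowered = text.lower()
    return [
        term for term in repository_terms
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    ]


def most_frequent(text: str, count: int = 3) -> list:
    """Statistics: the words which appear most often, stop words excluded."""
    words = re.findall(r"[a-z]+", text.lower())
    frequencies = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return [word for word, _ in frequencies.most_common(count)]


text = (
    "The supplier contract was renewed. The contract binds the supplier "
    "ACME until 2010. Contract annex attached."
)
suggested = recognise(text, ["ACME", "Globex", "annex"])
frequent = most_frequent(text, 2)
```

Both results are only suggestions: as noted above, it is the user who decides whether to add them to the metadata.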
Alfresco document types are managed by a combination of aspects. An aspect
contains standard metadata, with or without constraints, which can be applied to
any document. An aspect and its metadata can be added manually or
automatically based on file content, its name, its location or its properties. An
aspect can also modify the behaviour of a document in the repository: audit
and version management functions are activated by the corresponding aspect.
Aspects are added by XML configuration.
eXo DMS offers advanced metadata management. Several types of objects can
be managed, together with complex objects. This management can notably
use taxonomies. Metadata structuring can be managed directly in the eXo
DMS interface.
FreeDom makes it possible to create document categories which each have a specific set
of metadata. Numerous types are available and the constraints are defined in PHP
code. Data entry remains manual.
Jahia also supports content types which carry different metadata. The
nature of web content metadata and that of document metadata are noticeably
different. Information is entered manually via the web interface; entry can be
automated for PDF or MP3 files.
Knowledge Tree allows the administrator to create document types and
metadata via the web administration interface. Metadata is entered
manually, but it is possible to configure filters to automate certain
extractions.
LogicalDoc uses templates to differentiate documents. These templates are
defined via the graphic interface. Metadata is entered manually.
Maarch makes it possible to create different types of documents. Data is entered
manually. Entry can be optimised using different processes, notably rules relating
to the document’s position in the tree-structure.
Nuxeo uses the notion of facets to add new types of documents. This notion
also includes other more functional document features such as the ability
to contain other documents, or version tracking for example. Metadata can
be entered manually, or automatically using file characteristics or content.
New document types are created from XML schemas (XSD), as are data
entry and display screens, and are added to the solution’s architecture using
new plug-ins (extensions). Nuxeo Studio, the service associated with vendor
support, offers a graphic configuration interface.
[4.2] Version management
The management of document versions is one of the domains where EDM provides
important benefits.
In a group work context, especially with the exchange of multiple emails, it rapidly
becomes complicated to know with certainty which is the latest or the relevant version of a
document.
Version management makes it possible to trace the evolution of a document and, through a
reservation system (check-in/check-out), to guarantee that a user can take over a
document and modify it in the document database without the risk of concurrent
modifications.
[4.2.1] Reservation (check-in/check-out)
Check-in/check-out can function technically in different ways, but must always
guarantee that when a user makes a reservation the document is locked until a
condition is met. This condition is generally a check-in. It can also be decided that a
check-out is released automatically after a certain amount of time.
Technically, check-out can be triggered automatically when the user opens a
document, and check-in once the document is closed. However, the
reservation of a document can also imply that the user takes the document out of
the system while modifying it, and for this reason a declarative system is required.
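The reservation principle, including the optional automatic release after a delay, can be sketched as follows. The class and method names are hypothetical, and the clock is passed in as a parameter only so that the timeout can be demonstrated without waiting.

```python
import time


class ReservationError(Exception):
    """Raised when a document is reserved by somebody else."""


class Reservations:
    """Declarative check-out/check-in with an optional automatic release."""

    def __init__(self, timeout_seconds=None, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self._locks = {}  # document id -> (user, time of check-out)

    def holder(self, doc_id):
        """Return the user holding the document, or None if it is free."""
        lock = self._locks.get(doc_id)
        if lock is None:
            return None
        user, since = lock
        expired = (
            self.timeout_seconds is not None
            and self.clock() - since >= self.timeout_seconds
        )
        if expired:
            del self._locks[doc_id]  # the check-out is freed automatically
            return None
        return user

    def check_out(self, doc_id, user):
        current = self.holder(doc_id)
        if current is not None and current != user:
            raise ReservationError(f"{doc_id} is reserved by {current}")
        self._locks[doc_id] = (user, self.clock())

    def check_in(self, doc_id, user):
        if self.holder(doc_id) != user:
            raise ReservationError(f"{doc_id} is not reserved by {user}")
        del self._locks[doc_id]
```

The lock is attached to the document identifier, not to an open editing session, which is what allows a user to take the document out of the system while modifying it.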
[4.2.2] Version incrementation
Version incrementation is generally an automatic counter which assigns a
sequential number to successive versions of a document.
More elaborate incrementation methods can also be used, for example to take
into account the notion of minor/major versions, or pre-established business rules
which define the construction of version numbers.
In any case, the version history is kept and it must be possible to view a
previous version of a document. Each document modification must give rise to a new
version. These two points are essential for the overall traceability of the document
management system.
It must be possible to trigger specific management rules in relation to conditions
arising from versioning, for example the creation of a major version, the time which has
elapsed since the last major version, or the identification of changes between
two versions.
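A minor/major numbering scheme with a preserved history can be sketched as follows. The numbering convention (first version “1.0”, a major version resets the minor counter) is an assumption chosen for the example; each solution defines its own rules.

```python
class VersionedDocument:
    """Keeps every version of a document with a major.minor number."""

    def __init__(self, content, comment=""):
        # Each history entry is ((major, minor), content, comment).
        self.history = [((1, 0), content, comment)]

    @property
    def version(self):
        major, minor = self.history[-1][0]
        return f"{major}.{minor}"

    def modify(self, content, major=False, comment=""):
        """Every modification gives rise to a new version."""
        last_major, last_minor = self.history[-1][0]
        number = (last_major + 1, 0) if major else (last_major, last_minor + 1)
        self.history.append((number, content, comment))
        return self.version

    def content_of(self, version):
        """View a previous version without altering the history."""
        for (major, minor), content, _ in self.history:
            if f"{major}.{minor}" == version:
                return content
        raise KeyError(version)


doc = VersionedDocument("draft")
doc.modify("draft, typo fixed", comment="proofreading")
doc.modify("approved text", major=True, comment="validated")
```

A management rule triggered by versioning, such as a notification when a major version is created, would naturally hook into `modify`.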
Alfresco, Nuxeo, eXo DMS and Knowledge Tree offer advanced
reservation and version management functions, together with the distinction
between minor and major versions, version comments, and automatic
reservation when a document is edited online.
FreeDom and LogicalDoc also offer reservation and versions, but the
possibilities, configuration in particular, lag somewhat behind those of the
other solutions.
Jahia offers these features, but only for web content. There is no version
management for documents.
Maarch does not offer these types of features.
[4.3] Classification repository
A classification repository is, above all, a structured compilation of keywords or
expressions, also referred to as business “vocabularies”.
EDM solutions offer implementations ranging from basic to elaborate, both in terms of the
complexity of repositories (hierarchy, types of links, etc.) and in terms of
possible uses.
The extensibility of solutions is of particular interest, as it allows the repository
model to be extended based on needs (see “Ontology” in [4.3.1]).
[4.3.1] Repository types
a) Glossary
This is a list of useful terms relevant to the given context, normally listed
alphabetically. It is sometimes possible to include a definition.
b) Classification scheme
A classification scheme is a group of terms arranged hierarchically.
It is a repository used to carry out physical classifications (paper or otherwise).
Shelf marks or codes are often linked with each term. The traditional use of the
classification scheme consists in physically arranging the documents by allocating
a single term of the classification to each.
In electronic usage, it becomes possible to assign several classification scheme terms
to the same document, which gives a greater number of “keys” with which to find it. This is
multi-categorisation.
c) Thesaurus
In addition to the logical structure and hierarchy of a classification scheme, the thesaurus
introduces several notions which enrich the use of document repositories. The thesaurus is a
list of precisely defined terms used in a given area of application.
Several ISO standards define the relationships between the terms, called “descriptors”,
which make up a thesaurus. Among the most common, we would like to point out:
•
A “generic” term corresponds to the “father” in the hierarchy. A term
can have several “fathers”.
•
A “specific” term refers to the “son” in the hierarchy
•
A “used for” term refers to synonyms that are not kept for use in the
repository, but which can be used in place of the descriptor.
•
A related term (“see also”), which defines the transverse links between
descriptors, due to their semantic closeness.
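The four relationships above map onto a small data structure. This sketch is only illustrative: the class `Thesaurus`, its method names and the sample terms are hypothetical and do not follow the exact vocabulary of any ISO standard.

```python
class Thesaurus:
    """Terms linked by generic/specific, "used for" and "see also" relations."""

    def __init__(self):
        self.generic = {}   # term -> set of "fathers" (several are allowed)
        self.used_for = {}  # synonym not kept -> term used in its place
        self.see_also = {}  # term -> set of semantically close terms

    def add(self, term, generic=(), used_for=(), see_also=()):
        self.generic.setdefault(term, set()).update(generic)
        for father in generic:
            self.generic.setdefault(father, set())
        for synonym in used_for:
            self.used_for[synonym] = term
        for other in see_also:
            # "See also" is a transverse link, so it is stored both ways.
            self.see_also.setdefault(term, set()).add(other)
            self.see_also.setdefault(other, set()).add(term)

    def preferred(self, word):
        """Replace a synonym by the term kept in the repository."""
        return self.used_for.get(word, word)

    def specific(self, term):
        """The "sons" of a term in the hierarchy."""
        return sorted(t for t, fathers in self.generic.items() if term in fathers)


thesaurus = Thesaurus()
thesaurus.add("car", generic=["vehicle", "road transport"],
              used_for=["automobile"], see_also=["garage"])
thesaurus.add("truck", generic=["vehicle"])
```

Storing “used for” as a lookup from synonym to term is what later allows indexing to be homogenised automatically.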
d) Management table
As regards EDM, the management table serves to establish retention policies, i.e.
the rules which determine the retention of electronic documents.
The management table is generally constructed based on a classification scheme.
A management table associates the following information with each term:
•
The retention period: the period can be optional (in office),
whereas an archiving date is mandatory
•
The “final outcome” at the end of the archive period
•
The reference texts which justify the retention period (quality
procedure, law, regulations, etc.), in order to allow reviews
The purpose of this tool is to associate retention periods and
conditions with each element of the documentary body (document or file) based on a
classification. This tool is essential to the application of retention rules.
Its usefulness is specific to the field of archiving and preserving essential records,
i.e. records management.
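A management table can be represented as a simple lookup keyed by classification term. The terms, durations, outcomes and reference texts below are purely illustrative assumptions and must not be read as actual legal conservation periods.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rule:
    """What the management table records for one classification term."""
    years: int
    final_outcome: str  # what happens at the end of the archive period
    reference: str      # text which justifies the duration


# Hypothetical table built on a classification scheme.
MANAGEMENT_TABLE = {
    "/Accounting/Invoices": Rule(10, "destroy", "illustrative accounting rule"),
    "/HR/Payslips": Rule(5, "destroy", "illustrative internal procedure"),
    "/Board/Minutes": Rule(30, "permanent archive", "illustrative by-law"),
}


def end_of_conservation(term: str, archived_on: date) -> date:
    """Date on which the final outcome applies to a document."""
    target_year = archived_on.year + MANAGEMENT_TABLE[term].years
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # 29 February archived documents ending in a non-leap year.
        return archived_on.replace(year=target_year, day=28)


due = end_of_conservation("/Accounting/Invoices", date(2009, 9, 15))
```

Because the rule is attached to the classification term, a document inherits its conservation duration simply by being classified.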
e) Ontology
Ontologies work on the same principle as thesauruses, i.e. by creating relationships
between terms. An ontology is an “extensible” repository, i.e. it has been designed to
evolve.
Some examples of these relationship links: “has the supplier”/“is the supplier to”; “is
a subsidiary of”/“is the parent company of”.
Ontologies are not implemented as such in EDM solutions; some solutions nonetheless make it
possible to implement this type of tool, thanks to the XML structuring of their classification repository.
Unlike a thesaurus, an ontology has no predetermined or limited set of
relationship types. It is therefore possible to create as many relationship links as you
like between the terms in an ontology, and to associate constraints and
inference rules with them.
The use of ontologies is still relatively limited today and is essentially based on the Resource
Description Framework (RDF), a W3C specification used to model
information.
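The idea of free relationship types combined with an inference rule can be sketched with subject/relationship/object triples, the shape of statement that RDF also uses. This is a plain Python illustration, not an RDF implementation; the class name, the company names and the single inference rule (deducing the inverse link) are assumptions.

```python
# Pairs of inverse relationships, modelled on the examples given in the text.
PAIRS = [
    ("has the supplier", "is the supplier to"),
    ("is a subsidiary of", "is the parent company of"),
]
INVERSE = {}
for forward, backward in PAIRS:
    INVERSE[forward] = backward
    INVERSE[backward] = forward


class Ontology:
    """A set of (subject, relationship, object) statements."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relationship, obj):
        self.triples.add((subject, relationship, obj))
        # Inference rule: an inverse relationship is deduced automatically.
        inverse = INVERSE.get(relationship)
        if inverse is not None:
            self.triples.add((obj, inverse, subject))

    def objects(self, subject, relationship):
        return sorted(
            o for s, r, o in self.triples if s == subject and r == relationship
        )


ontology = Ontology()
ontology.add("ACME", "has the supplier", "Globex")
ontology.add("Globex", "is a subsidiary of", "Initech")
ontology.add("ACME", "sponsors", "City marathon")  # any new link type is allowed
```

Nothing in the structure restricts the relationship names, which is the difference from the thesaurus: new link types are added simply by using them.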
The concept of ontologies is at the core of the semantic web, the evolution conceived by
Tim Berners-Lee to make the web more “intelligent” by allowing machines to
make pertinent associations between content independently.
In the framework of a document application, ontologies provide extremely powerful,
adaptable tools, both for the document corpus and for user profiles.
An ontology can, for example, define business relations which can be used by a
search engine, based on the user profile or their search “perspective”. Ontologies can also
be used to define contextual classification schemes based on settings which indicate which
ontological links are used.
[4.3.2] Use of repositories
Let’s take a look at some uses of repositories.
Classification repositories are designed to facilitate the use of the document
database, notably by reducing noise and silence during searches or other
types of EDM uses.
a) Restoring content
Tree-structure navigation allows the user to find a document according to a logical
hierarchy, or even several different logical hierarchies.
These hierarchies are referred to as “facets”: the user is offered the
same objects (document contents) from different points of view
(facets).
As such, if content is categorised under /Europe/France/Paris, but
also under /Establishments/Restaurants/3-star, the user can reach this content through
a navigation which corresponds to the geographical logic (or “facet”) or to the
gastronomic logic (or “facet”).
This approach is very powerful, as it offers each user a view and use of the
document database which corresponds to their own needs.
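Facet navigation over multi-categorised content can be sketched in a few lines, reusing the two category paths of the example. The document titles and the function name `browse` are hypothetical.

```python
def browse(documents: dict, facet_path: str) -> list:
    """Documents categorised at or below a node of one of the hierarchies."""
    prefix = facet_path.rstrip("/") + "/"
    return sorted(
        title
        for title, categories in documents.items()
        if any((category + "/").startswith(prefix) for category in categories)
    )


documents = {
    "Le Gourmet review": [
        "/Europe/France/Paris",
        "/Establishments/Restaurants/3-star",
    ],
    "Louvre visitor guide": [
        "/Europe/France/Paris",
        "/Establishments/Museums",
    ],
}

by_geography = browse(documents, "/Europe/France")
by_gastronomy = browse(documents, "/Establishments/Restaurants")
```

The same review is reached through either facet, while the geographical facet also returns the museum guide.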
b) Multi-repository management
In a professional context, we are often led to present the same piece of information to
different audiences. We must sometimes manage several controlled vocabularies, for
example a list of keywords and a list of organisational units.
Multi-repository management involves handling certain difficult cases such as
homonyms, links between repositories, etc. Most of the EDM applications studied have a
single complex classification repository (if indeed they have any); it is therefore
important to check, where needed, that several repositories can coexist.
c) Management of synonyms
This means not only being able to manage synonyms and acronyms of a given
vocabulary, but also to make their use as transparent as possible for the user.
The most useful features are:
•
Easy updating of the synonym database, perhaps by the users
themselves
•
Automatic use of synonyms during a search on the main term (see
Search tool, below)
•
Automatic detection and replacement of synonyms by the term used in
indexing, in order to homogenise indexing
d) Search tool
Repository relationships can be used in different ways by the search engine.
Expanding queries
One possible use of rich repositories is to define the semantic environment
of a term. Some search engines are thus capable of “expanding the search”
automatically by using the relationships indicated in the repository.
For example, a Boolean “OR” can automatically indicate that a search carried
out on the term “car” also concerns everything to do with “automobile”, “vehicle”,
etc. All documents which contain these terms will then appear in the search
results (reducing silence).
The engine can also extend the query by semantic proximity, for example between
fisherman, boat, fish, etc.
The search “extenders” can also be based on syntactic or semantic analysers, which
automatically construct correspondence dictionaries from the analysis of indexed
documents.
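Query expansion through repository relationships can be sketched as follows. The synonym table and the sample documents are assumptions, and the matching is deliberately naive (whole words only, no lemmatisation, so plurals are not matched).

```python
import re

# Hypothetical extract of a repository: term -> related terms.
RELATED_TERMS = {"car": ["automobile", "vehicle"]}


def expand(term: str) -> list:
    """The term itself followed by the terms related to it in the repository."""
    return [term] + RELATED_TERMS.get(term, [])


def as_boolean_query(term: str) -> str:
    """The expanded query as it could be shown to the user."""
    return " OR ".join(expand(term))


def search(term: str, documents: dict) -> list:
    """Identifiers of the documents containing any of the expanded terms."""
    wanted = set(expand(term))
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if wanted & set(re.findall(r"\w+", text.lower()))
    )


documents = {
    "a": "Company car policy",
    "b": "Vehicle fleet report",
    "c": "Travel expenses",
}
results = search("car", documents)
```

Without expansion only document “a” would be found; with it, the fleet report is also returned, which is the reduction of silence described above.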
Suggestions made to the user
The analysis of the repository keywords found in the notices returned by a query presents
another opportunity to use existing repositories.
For example, the user can be offered terms close to those most
frequently found in their search results.
This type of use is often more fruitful when a syntactic or semantic engine is used to
make these keyword associations.
Cluster representation
This refers to arranging a group of documents together according to a tree-structure
of keywords which relate to these documents.
When this regrouping is made from a pre-established list of values in one or more
fields, it is called categorisation.
The term cluster is generally reserved for solutions which propose a dynamic
calculation based on the analysis of terms present in the documents themselves.
This type of feature is used to represent search results in the form of a tree-structure
arranged by theme or domain, for example. The user thus immediately sees the
themes represented in the documents found and can, where relevant, refine their
query by adding or excluding certain cluster terms.
Alfresco offers a classification scheme in the form of a category hierarchy.
Alfresco can also display classification repository content according to the
category tree-structure.
eXo DMS uses its taxonomy management feature to categorise
documents. The classification of documents is very similar to that of a file
system.
Freedom uses a classification scheme which can be explored via a tree-structure. Searches can also be carried out in this classification scheme.
Jahia allows for a classification scheme. This can be used to index documents
when they are associated with content.
Knowledge Tree makes it possible to create glossaries and classification schemes. Facet
navigation is generated automatically for certain types of content.
LogicalDoc uses a glossary which can be explored letter by letter to find the
associated documents.
Maarch offers the creation of document types, on which searches can be
carried out.
Nuxeo offers a powerful repository management system (vocabularies) which
makes it possible to structure all repositories. They are used in particular to index
documents, to type the links between documents, and for facet navigation.
Quotero offers a browsable classification scheme. It is possible to carry out
searches within this classification scheme.
[4.4] Search engine
This is of course an essential part of information management applications.
Three non-exclusive trends exist in EDM solutions:
•
Integration of a search engine: Open Source solutions often integrate
Lucene, the reference Open Source engine
•
Use of database search, which can limit the functions available
(notably lemmatisation, truncation, etc.)
•
Openness to commercial (proprietary) search solutions via connectors
It is therefore necessary to know how we want to search, notably in terms of
functions and the scope of the search.
[4.4.1] Basic functions
At a minimum, the EDM search engine must:
•
Index fact sheets, i.e. the document’s metadata
•
Index electronic documents in full text, i.e. taking into consideration the
content of electronic files, for all those which include textual content
•
Filter search results according to user read permissions, i.e. displaying
only documents that the user has read access to
•
Allow searches on the entire document and on one or more specific metadata
fields
•
Allow the use of reference tables in search interfaces
•
Allow searches using several criteria associated with Boolean operators
(AND, OR, EXCEPT), together with brackets
•
Allow documentary approaches to be combined, on metadata and on the full text
•
Allow middle and right-hand truncation, implicitly or by the use of
wildcard characters which replace one or more characters (often a star “*”
or question mark “?”)
•
Allow search pages to be customised, if possible several different pages, to
correspond to user requirements
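Three of the basic functions listed above (search on specific metadata fields, truncation with “*” and “?”, and filtering by read permission) can be combined in a short sketch. It uses Python's standard `fnmatch` module for the wildcards; the notices, field names and user names are hypothetical, and Boolean operators are expressed with Python's own `and`, `or` and `not`.

```python
from fnmatch import fnmatchcase


def matches(notice: dict, field: str, pattern: str) -> bool:
    """True if a word of the given metadata field matches the pattern."""
    words = str(notice.get(field, "")).lower().split()
    return any(fnmatchcase(word, pattern.lower()) for word in words)


def search(notices: list, user: str, criteria) -> list:
    """Identifiers of the notices the user may read and which meet the criteria."""
    return [
        notice["id"]
        for notice in notices
        if user in notice["readers"] and criteria(notice)
    ]


notices = [
    {"id": 1, "title": "Supplier contract", "author": "Durand",
     "readers": {"alice", "bob"}},
    {"id": 2, "title": "Contractor list", "author": "Martin",
     "readers": {"alice"}},
    {"id": 3, "title": "Contract annex", "author": "Durand",
     "readers": {"bob"}},
]

# Right-hand truncation on the title.
truncated = search(notices, "alice", lambda n: matches(n, "title", "contra*"))

# The same search EXCEPT documents written by Martin.
excluded = search(
    notices, "alice",
    lambda n: matches(n, "title", "contra*") and not matches(n, "author", "martin"),
)
```

The permission filter is applied inside `search` itself, so a document the user cannot read never appears in the results, whatever the criteria.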
[4.4.2] Advanced functions
Elaborate engines are often considered in order to handle advanced functions such
as:
•
Allowing results to be saved, in the form of a “basket” for example, allowing
users to gather a selection
•
Allowing search strategies to be saved, i.e. the criteria used for the search, in
order to relaunch this strategy easily
•
Successive searches in order to narrow down a query in relation to the
results of the previous search
•
Extending a search, to include synonyms for example
•
Federating the search across several document databases
•
Automated approximations, spelling suggestions and lemmatisation (i.e.
search on the root of a word)
•
Allowing searches in natural language or by example, i.e. inferring
the search equation from a phrase or text interpreted by the engine
•
Suggesting results close to the search results, by different means:
•
explicit, via links between documents specified in the indexing of
each
•
implicit, via connections which result from semantic or statistical
calculations
•
Automatically detecting “named entities”, i.e. proper nouns (people, places) and
their relationships
The search function is central in EDM projects and often the aspect which is
most beneficial for users, on condition that it is adapted to meet their specific
needs.
Among the commonly mentioned needs, we find for example: tree-structure
navigation, search on date intervals, basic search (for Google fans), search by
reference, and federated search covering both the EDM and third-party applications.
It is therefore essential that user expectations be analysed to define search needs. This is
a good starting point on which to build the architecture of the target application (data
structure and ergonomics).
All of the tools offer a minimum metadata search feature.
Alfresco uses the Lucene engine to index content, documents and metadata.
This integration provides a powerful and highly configurable native solution.
The advanced search screen of the standard interface is configurable and settings
can be saved for future use.
eXo DMS offers a complete search function on file content and metadata.
Settings make it possible to define synonyms and a first level of Lucene configuration.
Freedom allows searches to be carried out and saved. The tool makes it possible to build
dedicated search screens, and reports in search result format.
Jahia also allows full search on all office documents and metadata, with
Lucene integration. Engine functions are powerful and highly configurable.
KnowledgeTree integrates a simple, powerful search engine, which offers
several practical options (search history, search by content type, search by
location, etc.). But the most interesting feature is the advanced search
interface, which makes it possible to build extremely sophisticated searches, potentially
combining all document database criteria.
LogicalDoc only takes into consideration the simplest types of documents in
its community version. The search functions nonetheless include two
interesting features: an advanced search screen which is highly configurable
by the user; and a “search by similarity” feature (based on word frequency).
Maarch integrates a search module which is relatively limited by default, but
which covers the basics, i.e. office documents and metadata search.
Nuxeo uses the database engine by default. Another engine such as Lucene or
Solr, for example, can also be used. The standard interface functions are good,
but limited to the search functions of the underlying database.
Quotero offers a graphic configurator of document types with customised
metadata.
More advanced search functions are not natively available and require integration
or configuration.
Among the non-open source search engines which offer considerable added advantages,
notably in terms of semantics, are Sinequa, Antidot, Exalead, Polyspot, Autonomy
and Fast.
[4.5] EDM integration
When the EDM application targets a large part of the production of office
documents, integration of the EDMS with user desktop applications is
decisive.
In this context, the ergonomics and efficiency of the work, and the fluidity between the EDMS
and the applications using digital documents, will be the criteria which determine the
acceptance of the tool and therefore the success of the project.
Two main levels of integration can be considered, and the choice affects both the
complexity of the technical environment and the quality of the ergonomics.
All of the solutions presented offer a Web interface to interact with the EDMS.
[4.5.1] Storage space
The EDMS is presented as a storage space, used similarly to a network disk: with file
management capabilities, navigation between different levels, or even the ability
to move an element directly from the user workstation using “drag and drop”.
The main interest of this approach is that the change for users is minor compared
with the simple file servers they are used to, which facilitates acceptance of the EDM
tool. The downside is the loss of elements such as metadata and the search
interface. This approach is most often complemented by a Web or office interface
dedicated to the EDMS which provides these features.
The document repository can be used in the usual user environment; this use takes
into account the management of EDM permissions, but does not take full
advantage of the underlying document management application.
Technologically, this approach is made possible by interfaces such as WebDAV,
CIFS, WSS (Microsoft) or even FTP. These technologies are implemented as an extra
layer on top of the document database, making it possible to access the EDMS with standard
workstation tools.
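To give an idea of what a workstation tool exchanges with the EDMS over WebDAV, the sketch below shows the body of a folder listing request (the PROPFIND method) and reads a sample answer. No request is actually sent: the response and the `/edms/...` paths are invented for the example, and a real server would return more properties.

```python
import xml.etree.ElementTree as ET

DAV = {"D": "DAV:"}

# Body a client could send with "PROPFIND <folder>" and the header "Depth: 1".
PROPFIND_BODY = (
    '<D:propfind xmlns:D="DAV:"><D:prop>'
    "<D:resourcetype/><D:getcontentlength/>"
    "</D:prop></D:propfind>"
)

# Hypothetical "207 Multi-Status" answer for a folder holding one file.
SAMPLE_RESPONSE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/edms/contracts/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/edms/contracts/lease.pdf</D:href>
    <D:propstat>
      <D:prop>
        <D:resourcetype/>
        <D:getcontentlength>184320</D:getcontentlength>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""


def parse_listing(multistatus_xml: str) -> list:
    """Turn a Multi-Status body into (href, is_folder, size) tuples."""
    entries = []
    for response in ET.fromstring(multistatus_xml).findall("D:response", DAV):
        href = response.findtext("D:href", namespaces=DAV)
        is_folder = response.find(".//D:collection", DAV) is not None
        size = response.findtext(".//D:getcontentlength", namespaces=DAV)
        entries.append((href, is_folder, int(size) if size else None))
    return entries


listing = parse_listing(SAMPLE_RESPONSE)
```

The listing only carries file-system style properties such as type and size, which illustrates why metadata and search are lost when the EDMS is used purely as a storage space.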
[4.5.2] Access via office applications
This means allowing access to the document application from the applications associated
with documents (for example Microsoft Office, OpenOffice, AutoCAD, Photoshop,
etc.), thereby improving the fluidity of document creation and modification
processes.
Users want to be able to carry out common tasks using their office tools: opening,
indexing, saving in the document database, viewing workflow progress, etc.
The advantage of this approach is that users’ habits are not disrupted
in common work phases, while they are provided with a tool which is far more
extensive than a traditional file system.
This approach requires major investment when it is not implemented by the
document application.
Note: when Microsoft opened its WSS protocol code, used between
SharePoint and Microsoft Office, numerous EDM applications chose to
integrate it. This enables Microsoft Office and Windows to be used as if they were in
contact with a SharePoint server, i.e. integrating components into the office
suite and using a “network disk” under Windows.
These plug-ins allow the following functions, when implementation is
complete: open, edit, reservation, version access, workflow access and
navigation.
Alfresco has, for some time, been the precursor in the area of office
integration, notably by proposing database access via network sharing (in
CIFS), FTP access or integration into Microsoft office with a special module.
WSS integration is today operational and replaces all of these functions
offering more again.
eXo DMS offers WebDAV, CIFS and FTP access to its content repository.
Two modules, which are installed on work stations, allow integration with
Microsoft Office and OpenOffice suites.
Freedom offers WebDAV access to the repository.
Jahia offers integration limited to workstations, via media library navigation
using Windows Explorer. Management of versions and reservations is not
included.
Knowledge Tree has WebDAV access and complementary modules (not open source)
allowing excellent integration into Windows (repository
navigation via a dedicated navigator and the “hot-folder” feature), Microsoft
Office (document edits) and Outlook (mail indexing). These extensions are
exclusive to the commercial version.
LogicalDoc offers WebDAV access to the repository. The solution also offers
an interesting feature for referencing documents held on remote network
workstations. This indexing includes authentication on the workstation, and even
the default inclusion of a template and tag for the documents referenced.
Maarch does not natively offer integration with the user workstation.
Nevertheless, scanner integration for the rapid viewing of
digitalised documents in the Maarch interface is available with Maarch
LetterBox.
Nuxeo offers a WebDAV and WSS interface for document access. WSS
integration is operational and now allows a number of standard office
integration functions. A document can be added using drag and drop
between the workstation and the web interface. Finally, Nuxeo automates
online modification from the Web interface in Microsoft Office or
OpenOffice, via the LiveEdit plug-in.
Quotero offers WebDAV integration. It has also announced Microsoft Office and
OpenOffice integration and access via rich access, which we have not been able
to test.
[4.6] Digitalisation
In general, document management applications manage neither digitalisation nor
associated processes – these are covered by dedicated solutions, which deal with the
initial phases, from conversion of the physical medium to the “injection” of the
document into the EDMS. The document application takes over business processes such
as indexing once the file has been “injected”.
The functions discussed below fall into the category of document management, but are
not in general handled directly by EDM solutions.
[4.6.1] Preindexing and scanner management
This is the first function to be taken into account in the chain of dematerialisation
(i.e. going from material physical form to immaterial or electronic form).
It is generally carried out via an application installed on the system connected to the
scanner which has carried out the digitalisation, but can be implemented via a Web
interface.
As the aim is to facilitate, as much as possible, the process which includes
digitalisation and entering of the first metadata tags of a document (referred to as
preindexing), we try to optimise the interface proposed to the operator to digitalise
and preindex the documents in the EDMS.
Still often perceived by operators as tedious, this initial phase is crucial to
the overall success of the dematerialisation application.
It is here that specific metadata (digitalisation date, sender, recipient, invoice
amount, etc.) is entered. The correctness and completeness of this data will
determine, to a large extent, the efficiency of subsequent processes and so the overall
value of the system.
Batch management practices and the introduction of automatic recognition make it
possible to improve these processes.
[4.6.2] Automatic recognition
We find several categories of automatic recognition tools. Their function is always to
efficiently transcribe information from physical media (paper) into
its electronic equivalent, while minimising loss, errors and human intervention.
These tools are largely dependent on the quality of digitalisation: angle, definition,
visibility of text, etc. These considerations are even more important when automatic
recognition has been planned.
Some automatic recognition acronyms:
•
OCR: Optical Character Recognition
“Good OCR” reaches rates of recognition of over 95% in a digitalised
document and can reformat tables, recognise styles, etc.
•
ICR: Intelligent Character Recognition
Focused on the recognition of cursive writing (handwriting), this type
of application is frequently used by banks, for example, to process
cheques.
•
ADR: Automatic Document Reading
To acquire structured data in a database in relation to previously
identified fields.
•
ADR: Automatic Document Recognition
The aim is to direct documents to the right recognition process (above).
This notably makes it possible to apply specific rules based on the type of
document recognised.
Based on complex algorithms, these tools are largely dependent on the quality of
recognition and the use of linguistic and business dictionaries. The best ones use
learning and suggest recognition by association.
The integration of these functions in an EDMS often consists in adding a
dedicated application and designing an “injector” (or connector) to bring the
results of digitalisation flows into the EDMS.
We feel that there is no fully satisfactory Open Source automatic recognition
solution.
Alfresco interfaces with Kofax Ascent Capture, a commercial scanner
management and automatic content recognition solution.
LogicalDoc offers this type of integration in its commercial version.
eXo DMS offers integration with Kofax via WebDAV.
Freedom does not offer this type of integration.
Knowledge Tree integrates a specific tool in its commercial edition, which
allows both scanner management and OCR. This is a simple digitalisation
chain tool which makes it possible to manage the scanner, carry out simple operations
on the generated file and index this file in the EDMS.
Maarch offers a standard interface with Kofax Ascent Capture or directly with the
Fujitsu ScanSnap scanner. A complete digitalisation chain tool, perfectly
adapted to the management of mail, is offered in the dedicated vertical:
LetterBox.
Nuxeo has extension points for integration with digitalisation solutions,
notably in the framework of its mail management vertical to appear in Q1
2010: Nuxeo Correspondence.
Jahia does not offer this type of integration natively.
[4.7] Management of permissions
The management of permissions is a thorny subject in EDM, as in the
majority of management applications likely to have diverse users.
Permissions management is generally based on associating
authorisations with elements of the document database. These
authorisations make it possible to define the overall permissions of a group or user.
[4.7.1] Management levels
The management of permissions must be sufficiently precise, at a minimum by tree
structure (directory in a file system), and if possible at the level of each document.
Some systems manage permissions at metadata level, in order to manage extremely
important confidentiality issues. The management of permissions also takes into
account possible actions based on the role (or profile) of each user (see the paragraph
below).
Permissions must be able to be set precisely
and in a decentralised manner, that is to say that it must be the managers who use
the EDMS, and not the developers, who set and modify these permissions. This makes
it possible to define how the notion of confidentiality is implemented in the EDMS.
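As a minimal sketch of the two management levels described above, the following Python example resolves permissions by walking up the tree structure until an entry concerning the user is found, with an optional entry at document level. The paths, group names and the `effective_permissions` helper are hypothetical; they are not taken from any of the solutions presented.

```python
from pathlib import PurePosixPath

# Hypothetical access-control entries: path -> {group: set of permissions}.
# Entries can sit on a folder (tree level) or on a single document.
ACL = {
    "/": {"staff": {"read"}},
    "/finance": {"finance": {"read", "write"}, "staff": set()},
    "/finance/payroll.xls": {"hr": {"read"}},
}


def effective_permissions(path, groups):
    """Return the permissions granted to a user's groups on a path.

    The closest entry naming one of the user's groups wins: an entry on
    the document is checked first, then its folder, then each parent
    folder up to the root.
    """
    node = PurePosixPath(path)
    for candidate in [node, *node.parents]:
        entry = ACL.get(str(candidate))
        if entry is None:
            continue
        matching = [perms for group, perms in entry.items() if group in groups]
        if matching:
            return set().union(*matching)
    return set()
```

In this sketch, the empty set assigned to "staff" on "/finance" stops the "read" permission inherited from the root, which is one simple way of expressing confidentiality at folder level.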
[4.7.2] Profile management
A user can accumulate different profiles, based on their role within a department
and their hierarchical level, but also on their role in a cross-functional process,
for example as a web site contributor, manager of a client account, etc.
The management of permissions can itself require permission, which allows local
administrators to be in charge of permissions delegated to a certain section of the
document database.
As far as possible, it is preferable to define permissions at a meta level, in
the company directory. This means using the groups of the central directory in the
EDM system, where only the allocation of roles to groups on document objects is
managed. The EDMS thus assigns groups the roles that they can play on document
objects in a totally impersonal manner, the management of individuals being delegated
to the company directory.
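The separation between the directory and the EDMS can be sketched as follows. This is an illustration under our own assumptions: the `DIRECTORY` dictionary stands in for an LDAP or AD directory, and the folder names, group names and roles are invented for the example.

```python
# Hypothetical snapshot of the company directory: user -> groups.
# In practice this information would come from the LDAP or AD directory.
DIRECTORY = {
    "alice": {"marketing", "web-contributors"},
    "bob": {"accounting"},
}

# The only thing managed in the EDMS: which role a group plays on a folder.
ROLE_ASSIGNMENTS = [
    ("/website", "web-contributors", "contributor"),
    ("/website", "marketing", "reader"),
    ("/invoices", "accounting", "manager"),
]


def roles_for(user, folder):
    """Roles a user holds on a folder, derived only from group membership."""
    groups = DIRECTORY.get(user, set())
    return {
        role
        for target, group, role in ROLE_ASSIGNMENTS
        if target == folder and group in groups
    }
```

No individual appears in `ROLE_ASSIGNMENTS`: moving a person from one department to another only requires a change in the directory, and a user who belongs to several groups accumulates the corresponding roles.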
[4.7.3] Directory and SSO
Document applications can rely on LDAP or AD type directories and Single
Sign-On (SSO) systems for the management of user identities.
The solutions differ notably in their capacity to synchronise
with the directory, their capacity to interface rapidly with the SSO systems on the
market and the possibility of creating users for the EDMS outside the directory.
All of the solutions presented herein are capable of interfacing with a
directory. They all offer sufficiently precise management of
permissions.
Alfresco, eXo DMS, Nuxeo and Jahia support the configuration of an SSO
system and authentication against several directories.
Alfresco also offers a secondary user database in the framework of its Share
interface, so that external users can be invited.
eXo DMS makes it possible to define permissions on a number of levels: contents,
interface elements, certain functions, etc. These behaviours are notably
inherited from the portal component on which eXo DMS is based.
Freedom, LogicalDoc and Maarch support authentication via an LDAP
directory.
Knowledge Tree also offers a dynamic permissions management mode
based on document metadata, on top of its standard model.
Nuxeo also allows the definition of negative permissions, i.e. the capability to
withhold access as required.
[4.8] Collaborative functions
The EDMS centralises enterprise documentation and offers de facto a single
document repository, notably for office documents. This is where collaborative
functions really take off, well beyond the basic tool, i.e. mail.
Several concepts exist in the collaborative editing of documents, when a user wishes
to edit a document present in the EDMS:
•
Explicit reservation: A working copy is created so that the user can
edit it from their workstation. The original remains in read-only access on the
EDMS. This principle is referred to as reservation or
“check-out”. Once the document has been modified, the user puts the
new version online and frees up their reservation. The document is in
general versioned (with an incremented version number) and made
available once again: this is referred to as “check-in” or freeing up a
reservation.
•
Online modification: the user who wishes to modify the document,
carries out this modification online, i.e. on the server version. As long
as the document is open on the user’s system, the document is
reserved, and so in read-only mode for other users. Once the user closes the document
on their system, the reservation is freed up. This procedure is the
easiest for the user, but it has limitations, as it necessitates remaining
“online”.
•
Concurrent modifications: Users edit the document in real time from
their system. If the document has been opened by several users at the
same time, the EDMS manages the concurrent edit by highlighting
differences and requesting action. This practice is rare, as
office document formats do not lend themselves well to this type of
edit.
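The explicit reservation cycle can be illustrated with a short Python sketch. The `Document` class and its method names are hypothetical and simply mirror the “check-out” and “check-in” vocabulary used above; a real EDMS would also persist versions and handle abandoned reservations.

```python
class ReservationError(Exception):
    """Raised when a reservation rule is violated."""


class Document:
    """Hypothetical document supporting explicit reservation."""

    def __init__(self, content):
        self.versions = [content]   # version 1 is the initial content
        self.reserved_by = None     # user currently holding the reservation

    def check_out(self, user):
        """Reserve the document and hand the user a working copy."""
        if self.reserved_by is not None:
            raise ReservationError(f"already reserved by {self.reserved_by}")
        self.reserved_by = user
        return self.versions[-1]

    def check_in(self, user, new_content):
        """Store a new version and free up the reservation."""
        if self.reserved_by != user:
            raise ReservationError("document is not reserved by this user")
        self.versions.append(new_content)
        self.reserved_by = None
        return len(self.versions)   # incremented version number
```

While one user holds the reservation, any other call to `check_out` fails, which corresponds to the read-only state of the original on the EDMS.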
We also like some other functions such as: modification alerts, mail
dispatch from the EDMS, document comments and discussions, and
previews.
From a collaborative point-of-view, reservation and version
management are essential features.
Alfresco supports collaborative editing via reservation, online
modification, and integration into Microsoft Office. Alfresco offers alert tools
which can be configured by the administrator. Finally, Alfresco provides an
interface development framework: Alfresco SURF. Share, the interface based
on SURF, has a collaborative site generator tool. Share also provides content
management tools such as a wiki, a forum, a blog, and an iCal-compatible
calendar. Each user has a control panel which displays all information
relating to the sites that they are involved in.
eXo DMS offers few integrated, directly usable, collaborative functions. A
vertical dedicated to collaboration in the eXo suite offers a number of tools:
calendar, mail, forum, etc.
Knowledge Tree provides reservation and Microsoft Office integration
features in its commercial edition; we note that integration into Microsoft
Office relies on a highly intuitive integration with the user’s system. The
solution also makes it possible to create Word and Excel documents directly from the
EDMS Web interface.
Nuxeo supports collaborative editing via reservation, online
modification, integration into Microsoft Office, or via the LiveEdit module to
be deployed on client systems. It is also possible to create a document
directly in the EDMS without having to first create it on the user’s system.
Nuxeo also offers alert tools which can be configured by the administrator or
user. Finally, Nuxeo has developed an interface framework called
WebEngine which offers an alternative interface (like a mini website) on part
of the Nuxeo repository.
Maarch does not offer this type of function. This is a good example of
Maarch’s focus; it is not designed to manage the collaborative aspects of
document management but rather incoming & outgoing flows and archives.
[4.9] Workflows
Two types of workflows (processes) are implemented in the framework of content
management solutions. This is one of the boundaries between the area of EDM and
that of ECM. In EDM solutions, we find workflows applied to documents. In ECM
solutions, we also find automated procedures, outside any document context: this
is referred to as Business Process Management (BPM) or business workflows.
Different levels of management processes are addressed by the solutions:
•
Document workflows, for example: document validation, approval,
distribution, etc.
•
Business workflows, for example: file instruction, data processing, form
digitalisation, etc.
[4.9.1] Document workflows
This is where documents follow a chain of validations, often to be published or
archived: we sometimes come across the term “Docflow”. Different methods are
implemented by the tools.
Workflows are generally based on standard tool functions: management of
permissions, mail dispatch, moving a file, etc.
The tools which make it possible to create workflows on demand can be based on the
rules management underlying the application. This makes it possible to add and
combine different functions, using a combination of simple rules.
Workflows can also be simple developments which are not very configurable, notably in
terms of the number of workflow steps, notifications or results.
[4.9.2] BPM or business workflows
The aim is to enable the digitalisation of procedures, whether they have a link with
documents or not. The implemented tools are workflow engines, i.e. applications
dedicated to the configuration and execution of processes.
Document applications, when they offer elaborate process management,
integrate Open Source components to “power” them.
There are a number of excellent open source workflow engine projects, generally
using Java technology. These tools, e.g. Intalio, Bonita,
jBPM, ProcessMaker, Orchestra and OSWorkflow, are relatively complex.
As with an Open Source document management solution, it is advisable to:
•
Question their durability, in relation to community, user and
functional level criteria
•
Ensure that the BPM engine is of good quality and well implemented
into the EDMS, notably in terms of user management (when they are
common), the persistence of data or design of processes in relation to
their interaction with document functions.
a) Workflow presentation
Several process presentation methods exist.
The most elaborate engines are based on graphic presentation of workflows, on
which configuration can be carried out, while the underlying modelling is often in
XML. Transcription in the engine can be simple or rich, from the simple
interpretation of steps up to the possibility of graphically configuring alerts,
conditions, interaction with other applications, or even specific processes (scripts).
b) Implementation in the EDMS
Integration of a workflow engine into a document management tool is relatively
important, notably as regards the management of permissions and roles, interaction
between document objects (document, file) and workflow objects (processes, steps,
etc.) and the interfaces of the two tools.
Interface integration usually masks the workflow interface and gives the user the
impression that they are using one single application.
The management of permissions must span both the EDMS and the BPM engine.
Users have roles in processes; they can be recipients of certain tasks based on
processes. In parallel, they have permissions on the document database and are, as
such, authorised to carry out actions on certain document objects. The integration of
a workflow engine must therefore handle this link properly to avoid situations where, for
example, a user is a task recipient, but does not have the permissions they need to
access data in order to carry out this task.
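One way of guarding against this situation is to check document permissions at the moment a task is assigned. The sketch below is only an illustration under our own assumptions: the two dictionaries and the `can_perform` and `assign` helpers are hypothetical and do not describe how jBPM, Bonita or any other engine is actually integrated.

```python
# Hypothetical data: permissions held on the document database, and
# the permission each workflow task needs on the attached document.
DOCUMENT_PERMISSIONS = {
    "contract.pdf": {"alice": {"read", "write"}, "bob": {"read"}},
}
TASK_REQUIREMENTS = {"review": "read", "amend": "write"}


def can_perform(user, task, document):
    """True if the user holds the permission that the task requires."""
    needed = TASK_REQUIREMENTS[task]
    granted = DOCUMENT_PERMISSIONS.get(document, {}).get(user, set())
    return needed in granted


def assign(user, task, document):
    """Assign a task, refusing recipients who could not carry it out."""
    if not can_perform(user, task, document):
        raise PermissionError(
            f"{user} lacks the permission needed for '{task}' on {document}"
        )
    return {"task": task, "document": document, "assignee": user}
```

An alternative design would grant the missing permission temporarily for the duration of the task; the check above simply makes the inconsistency visible as early as possible.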
c) Control panels and monitoring
One of the objectives of BPM is to enable the monitoring and traceability of
processes.
All “log” functions and their use must therefore be given extra attention when choosing
the implementation of a BPM solution in the EDMS.
At the very minimum, each user must be able to view the tasks that they have
pending. At a deeper level, indicators available also include: the list of tasks of
hierarchical subordinates, the history of completed tasks, the list of tasks for groups
the user belongs to, use statistics at individual level, group level etc.
This information is usually presented in the EDM interface, without it being
necessary to have a third party interface, except when the BPM solution is used to
“drive” processes which have nothing to do with document management. In this case,
it can be preferable to group all of the tasks in a control panel that is external to
the EDM.
Alfresco has a very complete tool based on a workflow engine (jBPM). It is
well integrated with the application and makes it possible to create business workflows.
Furthermore, Alfresco has a basic workflow design tool based on the use of its
integrated rules engine. This second type of workflow can be created by a
functional user.
eXo DMS integrates Bonita as its engine focused on the modelling and
automation of document workflows. Bonita is also used in the eXo suite to
manage different processes. It is the most complete BPM tool of those
integrated into the solutions presented herein.
Jahia includes a “BPM server” in its Professional and Enterprise editions.
Knowledge Tree uses the ProcessMaker engine. Less complete than jBPM,
it is, however, easier to configure, and its integration, which is done in the form of
a module, is satisfactory. Processes are mainly document-oriented.
LogicalDoc offers a graphically configurable document workflow system in
its enterprise version.
Maarch has a workflow system developed in PHP. We note, nonetheless, that it does
offer a graphic presentation of workflows.
Nuxeo has a very complete tool based on a workflow engine (jBPM). The
basic implementation which is proposed is highly configurable and makes it possible to
create different types of business workflows.
[4.10] Management rules
[4.10.1] Management functions
Managing documents involves having functions which take into account the
management rules inherent to the relevant businesses. In relation to
documents, these rules are notably expressed as actions on: acquisition, notifications,
conservation, transformations, etc.
We can find various different functions depending on the application, for example:
•
transformation of an office document into PDF
•
calculation of a chronological number to display in a document title block
based on metadata
•
dissemination of information based on different criteria: a previously
saved search strategy, a keyword, the validation of a document or a
batch of documents, etc.
•
notification by email, RSS feeds, in the dashboard, etc.
•
the definition of display formats for indexing forms, for example by
user profile, business perspective or document type, in
order to highlight the fields that are most pertinent for each
•
conditional attribution of metadata to a document, for example, in
relation to the person who indexes it or the file it belongs to
•
image manipulation, in a media library approach. This avoids
taking the image out of the application to carry out simple actions:
trim, rotate, shade, etc.
•
collaborative exchange around a document or file, for example by forum
type functions, via document annotations etc.
[4.10.2] Rules engine
Following the same concept as the management functions above, some tools use a rules
engine which makes it possible to configure more complex actions and combine rules.
These rules constitute a chain of unit actions, for example, copy, transform,
metadata feed, mail dispatch, etc. They can also be more complex and reflect
business needs.
Furthermore, it is often easier to implement new functions in a rules engine. This
reflects the extensible nature of the architecture, constructed in such a way as to
allow new functions to be added which were not natively planned.
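To make the idea of a chain of unit actions more concrete, here is a minimal Python sketch. It is not the rules engine of any solution presented: the rule format, the `apply_rules` function and the two actions are hypothetical, and the PDF conversion is a stand-in that only renames the file.

```python
def convert_to_pdf(doc):
    """Unit action: stand-in for a real conversion, only renames the file."""
    stem = doc["name"].rsplit(".", 1)[0]
    return {**doc, "name": stem + ".pdf"}


def tag_validated(doc):
    """Unit action: feed a metadata field."""
    return {**doc, "metadata": {**doc["metadata"], "status": "validated"}}


# A rule pairs a condition with a chain of unit actions (all hypothetical).
RULES = [
    {
        "when": lambda doc: doc["name"].endswith(".odt"),
        "actions": [convert_to_pdf, tag_validated],
    },
]


def apply_rules(doc, rules=RULES):
    """Run every matching rule, passing the document through its actions."""
    for rule in rules:
        if rule["when"](doc):
            for action in rule["actions"]:
                doc = action(doc)
    return doc
```

Adding a function that was not natively planned amounts to writing one more unit action and referencing it in a rule, which illustrates the extensibility mentioned above.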
PHP and Java applications really stand out here: PHP applications due to their
speed of development, Java applications due to an extensible technical architecture
which allows new functions to be added.
Alfresco has a number of management functions. These are largely
extensible, be it by existing plug-ins or by development. Alfresco also
integrates a rules engine. Some rules can be manipulated via the standard
Web interface, thus facilitating their use.
eXo DMS does not offer a rules engine. Nonetheless, the solution does
integrate a large choice of technical tools which allow the implementation of
business rules.
KnowledgeTree offers few specific management functions. There are,
nonetheless, numerous modules which allow integration with other
applications or the inclusion of new functions on the interface. Some of these
modules are in the commercial edition.
LogicalDoc does not offer a rules engine, but makes it possible to subscribe to document
or directory modifications or to automatically unzip the content of a zip file.
Maarch does not have a rules engine. The solution implements, nonetheless,
a number of management rules to carry out specific processes, notably in
LetterBox or relative to physical archive management functions.
Nuxeo has a number of management functions which can be envisaged
almost systematically as modules. This makes the application, overall,
extremely flexible and modular. Nuxeo also offers a rules engine, but this is
not available in the interface, and so remains a powerful tool available only to
developers. Nuxeo Studio will soon integrate the manipulation of engine
rules, which will lead to their use by operation administrators.
[4.11] Lifecycle management
[4.11.1] Conservation policies
The conservation, preservation and securing of information are increasingly part
of document management projects, as companies begin to recognise the value of
the documents that they are in daily contact with.
Conservation policies have existed for some time in the physical domain (paper), but
they have only recently been applied to electronic data.
The disciplines covering the functions linked to archiving make it possible to meet
the needs of conservation policies: we talk of archiving or “Records Management”.
“Records Management” is sometimes considered to be better adapted to the digital
domain, in so far as it takes into account a sub-selection which corresponds to “vital
documents”, i.e. documents necessary for the activity of the organisation.
The definition of a conservation policy in a document management application
allows:
•
to define content groups, in relation to types or indexing criteria
•
to link each of these groups, to conservation actions, destruction
processes, durations, conservation formats etc.
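A conservation policy of this kind can be represented as a simple table linking each content group to its duration, conservation format and final action. The following Python sketch is given as an illustration only: the `ConservationRule` structure, the `action_due` helper and the durations are our own assumptions, and real durations depend on the applicable regulations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConservationRule:
    """Links a content group to a duration, a format and a final action."""
    document_type: str        # criterion defining the content group
    years: int                # conservation duration
    conservation_format: str  # format in which the content is kept
    final_action: str         # what happens when the duration expires


# Illustrative values only; real durations depend on applicable regulations.
POLICY = {
    rule.document_type: rule
    for rule in (
        ConservationRule("invoice", 10, "PDF/A", "destroy"),
        ConservationRule("contract", 30, "PDF/A", "review"),
    )
}


def action_due(document_type, archived_on, today):
    """Return the final action once the duration has elapsed, else None."""
    rule = POLICY[document_type]
    try:
        expiry = archived_on.replace(year=archived_on.year + rule.years)
    except ValueError:  # 29 February with a non-leap expiry year
        expiry = archived_on.replace(year=archived_on.year + rule.years, day=28)
    return rule.final_action if today >= expiry else None
```

For example, `action_due("invoice", date(2009, 9, 1), date(2019, 9, 1))` returns "destroy", whereas the same call one day earlier returns `None`.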
[4.11.2] Archiving
Archiving can be approached in different ways depending on the application context,
from a simple “archive” tag on elements considered to be archives to placing
content “offline”, for example.
The necessity and the complexity of taking digital archiving into consideration will
grow with the volume and the importance of information managed.
As such, the applications integrate functions which allow:
•
to manage conservation information
•
to manage file formats in the EDMS based on the duration of their
relevance
•
to automatically trigger archiving processes
•
to choose whether or not searches include
archived elements
Different types of processes are planned to meet archive durability,
completeness, reliability, and traceability issues.
Among the functions which can be considered, let us mention:
•
freezing archived elements, for example by including digital signatures
to ensure that the element cannot be altered
•
requesting authorisation for each archive element, in terms of date and
responsibility
•
managing content and metadata storage systems to ensure that elements
are conserved on systems that are best suited to the use of these
elements (long life, frequent access, etc.).
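The principle of freezing an element can be sketched with a fingerprint computed at archiving time, here using SHA-256 from Python's standard `hashlib` module. This is a deliberately simplified stand-in: a fingerprint only detects alteration, whereas a true digital signature also involves a private key and a certificate, which are not shown. The `freeze` and `is_unaltered` names are hypothetical.

```python
import hashlib


def freeze(content: bytes) -> dict:
    """Record the fingerprint of an element at the moment it is archived."""
    return {"sha256": hashlib.sha256(content).hexdigest(), "size": len(content)}


def is_unaltered(content: bytes, seal: dict) -> bool:
    """Check that the content still matches the fingerprint taken at freeze time."""
    return (
        len(content) == seal["size"]
        and hashlib.sha256(content).hexdigest() == seal["sha256"]
    )
```

The seal would typically be stored with the metadata of the archived element, separately from the content itself.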
[4.11.3] File formats
File format management in a document application is of particular relevance.
Unlike paper documents, which all use the same material, i.e.
paper, the format of digital content (files) raises several critical issues.
a) Functions linked to format
Most EDM solutions can store any type of file; however, certain functions are only
available for certain formats.
•
Full-text search, i.e. searching within the files themselves. Document formats
are still often opaque and locked. Nonetheless, it is increasingly
possible to extract their text content, which can then be indexed and so
found by search engines. The most common formats (PDF,
Microsoft Office, HTML) nearly always appear among those
recognised. But can the same be said for specific formats such as DWG
(AutoCAD) or MP3 (for textual tags)?
•
Previews, i.e. the possibility of opening a file without having to open
the associated application. This notion is particularly useful for files
that are very large (e.g. image or video files), as the application makes it possible
to rapidly view part of the content, without downloading the full file.
•
Extracting information and metadata in particular. Most file
formats use a specific method of managing the metadata that they use.
Knowing these methods or using a standard (e.g. OpenDocument,
JPEG or TIFF) makes it easier to recover metadata entered by the
application or the devices which generated the files. This list is
not exhaustive.
b) Open formats
Open and/or standard formats are currently of great interest with the increased
importance of office format standardisation. This phenomenon, which we have been
observing for some years in technical domains where interoperability needs
are very strong, has become an issue for many projects.
There are two main format issues to be taken into consideration when implementing
an EDM project:
•
The first is to use an open, documented, free (not subject to patent or
rights of use) and very widely used format to guarantee that the files
will stand the test of time
•
The second is to use a format with a known structure which allows
content to be edited. This allows file content manipulation functions:
transformations, combinations, and even adding data to a file itself
(the EDM reference or contents table).
c) Conservation formats
As the documents stored in the system must often be conserved for several years
(over 5 years), their conservation must include their preservation, i.e. at the very
minimum, that they remain legible.
The question of conservation arises more and more frequently in
relation to electronic documents, in so far as they now replace paper, whereas a few
years ago they were a backup of paper documents.
The file formats capable of guaranteeing the readability of documents for over 10
years are either simple, very widely used formats that have become de facto standards,
for example text files (.txt), of which XML files are among the most common and the
most useful for structured data, or formats standardised at international level such
as PDF/A (ISO 19005-1:2005), PDF/E (ISO 24517-1 pending) or ODF (ISO/IEC
26300:2006 - Open Document Format).
If these documents carry a digital signature, the issue becomes more complex (see
below).
[4.11.4] Probative value digital archiving
New issues have arisen in the area of archiving, giving rise to terms such as “legal”
or “probative value” archives i.e. the conservation in electronic form of documents
which may serve as legal proof.
This aspect of archiving is not generally covered by the EDM solution itself, but by
ancillary functions which manage digital signatures, relations with trusted third
parties, and conservation traceability.
For the implementation of a legal archive, it is necessary to guarantee:
•
Integrity: that archives cannot have been modified
•
Authenticity: a nominative and verifiable signature
•
Traceability: the life of every document must be known with no room for
error, notably via certified and contiguous time-stamping (with no
discontinuity)
•
Auditability: the system must be able to be verified to prove that
these processes cannot be altered
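The idea of contiguous time-stamping, with no discontinuity, can be illustrated by a journal in which each entry embeds the fingerprint of the previous one. This hash-chain technique is our own choice for the example, not a mechanism described by the solutions presented; the function names are hypothetical, and the timestamps are plain strings rather than stamps certified by a trusted third party.

```python
import hashlib
import json


def _digest(body):
    """Fingerprint of an entry body, computed on a canonical JSON form."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(journal, document_id, event, timestamp):
    """Append an event whose fingerprint covers the previous entry's fingerprint."""
    previous = journal[-1]["hash"] if journal else ""
    body = {"document": document_id, "event": event,
            "timestamp": timestamp, "previous": previous}
    journal.append({**body, "hash": _digest(body)})


def is_contiguous(journal):
    """Verify that every entry links to its predecessor and was not modified."""
    previous = ""
    for entry in journal:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["previous"] != previous or entry["hash"] != _digest(body):
            return False
        previous = entry["hash"]
    return True
```

Modifying an entry, or removing one from the middle of the journal, breaks the chain and is detected by `is_contiguous`; truncation at the end would require an additional safeguard, such as publishing the latest fingerprint to a third party.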
Alfresco offers two approaches to lifecycle management: a DoD 5015.2
certified Records Management module, which notably makes it possible to manage an
archiving classification scheme, and a lifecycle management module (ILM –
Information Lifecycle Management) dedicated to the management of content
storage media based on their characteristics, notably conservation. These
two are complementary modules of the commercial version.
eXo DMS offers management functions associated with content publication
cycles, mainly with a view to Web distribution.
Maarch has numerous functions linked to archive management which take
physical archives (archive boxes) and the creation of autonomous database
archives on CD-ROM into consideration. This is the solution’s “history”
function. We note nonetheless that this approach is not suitable for the
management of a complex conservation policy.
Nuxeo offers the management of content lifecycles and content
conservation policies via standard functions: metadata, classification schemes
and content lifecycle management. We note that these functions are not
specific to an archive module, but available across the whole application.
The other solutions do not offer specific lifecycle or archive management
functions.
[4.12] Import/export
Import and export functions are fundamental in an EDM solution; they make it
possible to move content from one solution to another and so to remain in
charge of one’s choice of tools.
[4.12.1] Mass imports/exports
The objective here is to allow a large number of documents to be injected or
extracted rapidly. These functions handle metadata and files and, where
possible, permissions.
The best solutions offer dedicated tools or low-level APIs which interact
with the document database to inject or extract content, and to add or modify the
metadata or permissions associated with content.
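As an illustration of what such a low-level import/export interface has to carry, the sketch below moves files, metadata and permissions together and reports the records it rejects. The repository is a plain dictionary and the names `ImportRecord`, `bulk_import` and `bulk_export`, as well as the rule that a "title" is mandatory, are assumptions made for the example; none of them comes from the tools reviewed here.

```python
from dataclasses import dataclass, field


@dataclass
class ImportRecord:
    path: str
    content: bytes
    metadata: dict = field(default_factory=dict)
    permissions: dict = field(default_factory=dict)  # e.g. {"alice": "read"}


def bulk_import(repository, records, required_metadata=("title",)):
    """Inject records into the repository and return the rejected ones.

    Each rejection is a (path, missing_metadata_keys) pair, so that a large
    batch can be corrected and replayed without starting again from scratch.
    """
    rejected = []
    for record in records:
        missing = [key for key in required_metadata if key not in record.metadata]
        if missing:
            rejected.append((record.path, missing))
            continue
        repository[record.path] = record
    return rejected


def bulk_export(repository, prefix=""):
    """Extract every record whose path starts with the prefix, sorted by path."""
    return [
        record
        for path, record in sorted(repository.items())
        if path.startswith(prefix)
    ]
```

Keeping metadata and permissions inside the exported record is what lets a batch be replayed into another solution without losing the document's context.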
[4.12.2] Physical exports
There are two options here.
A definitive export, used for archiving in particular. This consists of “exporting”
certain documents and their metadata to “long term” media, often slower than
server hard drives (CDs, DVDs, DAT, etc.). Ideally the EDM keeps a
record of documents exported in this way, so that their metadata can still be searched.
An export for offline consultation. This consists of exporting part of the
document database to the user system or to a digital medium with its own consultation
interface.
Alfresco allows the importation and exportation, in the form of archives, of
complete sections of the repository, including metadata together with rules
and permissions in XML format. Alfresco also manages the publication of
content to remote servers via its WCM module, dedicated to web site content.
eXo DMS offers technical import-export functions at the level of its JCR
(content database) implementation.
Jahia has a module which can import/export all data (content and permissions
in XML, together with related files).
Knowledge Tree is capable of importing the content of a zip file together
with common metadata, and exporting a selection of files.
Maarch manages the importation and exportation of data via a
module available under commercial licence.
Nuxeo has a module which manages document and metadata imports and
exports, including permissions. These functions can be driven from the
command line via the “Nuxeo Shell”. Finally, the solution allows
users to export a selection of documents in zip format.
[4.13] Email management
Mail management is a document management function in high demand, both as
regards the pieces of information which contribute to the collective memory of
the company and for regulatory purposes (Sarbanes-Oxley or Basel II impose email
conservation rules).
Knowledge management issues involve sharing and capitalising on information,
even if its degree of formalisation is weak, which is typically the case with emails.
Storing and sharing information facilitates team work. Controls, be they
regulatory or quality-related, are also becoming more and more demanding. They are
pushing organisations to conserve all exchanges with their employees and
commercial partners.
The more or less automatic pooling of emails may appear to be a solution to these
problems.
[4.13.1] Selection of emails to be archived
It is very difficult to have a completely automated mail archiving process without
the inconvenience of storing a huge volume of spam or private messages (which
should be kept out of the “shared” environment).
EDM solutions must therefore offer selection processes for mail archiving. These can
rely on a plug-in integrated into the mail client (Mozilla, Outlook, etc.) or on the
automatic importation of mails sent to a specific mail address
([email protected], for example).
[4.13.2] Mail management
The management of mails poses several problems, both technical and methodological.
For one, there is no homogeneity as to the number or format of attachments. The
format of emails themselves is variable (HTML or text, or proprietary formats depending
on the mail client used). Furthermore, email exchanges usually involve
more than two individuals, so multiple useless copies are saved in the system.
This is only a sample of the issues that arise in relation to email management.
The EDMS should offer, in particular, solutions which:
• Manage email and attachment conservation formats, in order to ensure
optimal restoration
• Detect duplicates to avoid storing multiple identical mails, and offer
collaborative work processes, for example when a slightly modified
attachment is exchanged in several successive mails
• Index mail and attachment content and manage links between content
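The duplicate detection and metadata extraction described above can be sketched with Python's standard `email` package. This is a simplified illustration under stated assumptions: the `archive` and `mail_fingerprint` names and the dictionary used as a store are hypothetical, and relying on the `Message-ID` header (with a content hash as fallback) is one possible strategy among others, not the method used by any of the products reviewed.

```python
import hashlib
from email import message_from_string
from email.message import Message


def mail_fingerprint(message: Message) -> str:
    """Identify a mail by its Message-ID, or by a hash when the header is absent."""
    message_id = message.get("Message-ID")
    if message_id:
        return message_id.strip()
    digest = hashlib.sha256()
    for header in ("From", "Date", "Subject"):
        digest.update((message.get(header) or "").encode("utf-8"))
    for part in message.walk():
        if not part.is_multipart():
            # Body text and attachments all contribute to the fingerprint.
            digest.update(part.get_payload(decode=True) or b"")
    return digest.hexdigest()


def archive(store: dict, raw_mail: str) -> bool:
    """Store the mail's basic metadata unless the same mail is already archived."""
    message = message_from_string(raw_mail)
    key = mail_fingerprint(message)
    if key in store:
        return False  # duplicate: another recipient already archived this mail
    store[key] = {
        "sender": message.get("From"),
        "recipients": message.get("To"),
        "subject": message.get("Subject"),
    }
    return True
```

With this approach, the copies of one message received by several colleagues collapse into a single archived entry, whichever recipient files it first.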
Alfresco makes it possible to file emails from Outlook using drag and drop. The
basic information (sender, recipient, and subject) is extracted automatically.
Mail and attachment content is available in full-text search. In Alfresco, each
document or folder has its own email address, which allows emails to be
“addressed” to it. Behaviour can be configured: a folder will, in this way, store
the mail and its attachments; a document can receive an annotation
corresponding to the email’s content, etc.
eXo DMS does not directly take email management into account.
Nevertheless, the eXo suite includes a tool for this specific use which is built on
the same eXo DMS storage.
Nuxeo makes it possible to file emails from Outlook using drag and drop. Mail and
attachment content is available in full-text search. Nuxeo also offers a “mail”
type folder which can be linked to an email client mailbox and which
automatically retrieves all of its messages into Nuxeo. This has multiple
uses, notably making a shared mailbox easier to use by making it
available in the document database.
Knowledge Tree offers excellent integration into Outlook with its
commercial version. Emails can be archived and their metadata
automatically extracted; repository files can also be attached when a mail is
sent.
The other tools do not include these types of functions.
[4.14] File management
[4.14.1] “File management” notions
Similar to the management of files on a work station, which itself derives from the
filing of paper documents, the layout of EDM solutions as regards
“file management” follows a logic users are familiar with. Here, a file is not a
simple directory, but a set of documents linked by a common purpose.
The notion of what a file is can be a little hazy; below are definitions of a file in
three different contexts:
• From an everyday professional perspective, the file consists of a
number of documents which have been grouped together to form a
coherent whole, and from which users can accomplish tasks (project or
instruction files, for example)
• From a document perspective, the file brings together documents with
common characteristics, a theme or metadata for example (a press
article file or a company financial analysis file)
• From an information system perspective, the file digitally
groups information from different parts of the information system to
facilitate use (the client file, for example)
These approaches can be complementary.
There are two fundamental differences between paper and electronic files:
• The digital format allows multi-positioning. The same document can
be present in several files, used by several users with different
business needs
• The digital format avoids folder duplication and splitting. The
same folder can be shared by several entities, geographically or
functionally distinct, allowing each to manage its content without it
being necessary to create several sub-folders. With paper, sharing part of the
information pool often leads to duplicated files which then take on a life
of their own
[4.14.2] File management tools
Several important management features are found in EDM tools, for example
file information (metadata) and associated tasks within the
framework of a more tightly-knit workflow.
Other functions can be adapted in the EDMS or developed as completely separate
modules.
Below we group the most frequently encountered functions.
a) Information sharing
• Definition of meta-information by file
• The possibility of structuring a file by grouping documents via their
metadata or sub-folders
• Links between folders (hierarchy, theme, regulation, business type,
client type, etc.)
• User permissions management (initiator, processing agent,
supervisor, decision maker, etc.) and their respective rights to each
part (see folder structuring above).
b) File/Folder management
• automatic triggers, for example for archiving or reminders in the
framework of a procedure or upon receipt of an element
• interaction between documents and the data of a third party application or
a database (for forms in particular)
• application of rules to the sub-groups of documents which make up a folder
• verification of the completeness of a folder in relation to known lists
• the possibility of generating documents from templates and conditions or
file characteristics, for example a form or letter scan
• workflow process management to automate transitions and allow
automatic follow-up of instructions
c) Filing
• The relationship between physical and digital elements can be
important in the case where elements in paper format are of a legal
nature
• Identification of a folder: rules of constitution, naming rules, terms, etc.
d) Traceability
• Any action on any file/folder must be logged to allow audits of
file/folder processing
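Two of the functions listed above, the verification of completeness against a known list and the automatic trigger upon receipt of an element, can be sketched in a few lines. The sketch assumes that a folder is a simple list of documents each carrying a "type"; the function names and document types are hypothetical and chosen only for the example.

```python
def missing_documents(folder_documents, required_types):
    """Return the required document types absent from the folder, sorted."""
    present = {document["type"] for document in folder_documents}
    return sorted(set(required_types) - present)


def receive_document(folder_documents, document, required_types, on_complete):
    """Add a document, then fire the trigger once the folder is complete."""
    folder_documents.append(document)
    if not missing_documents(folder_documents, required_types):
        on_complete(folder_documents)
```

In an EDM tool, the `on_complete` callback would typically start the next workflow step or schedule the folder for archiving.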
Alfresco manages files/folders by combining directories, metadata,
management rules and audit functions. Faceted views give the various
individuals involved in company processes distinct views. The notion of
“custom” views of each folder makes it easy to add information or simple folder
management tools (check-lists or progress status, for example).
eXo DMS manages folders as a type of content, that is to say that they
can have metadata and all associated functions. While no specific function is
implemented, everything has been anticipated to allow its development upon
request.
Freedom manages folders and virtual binders in which to file documents.
Jahia manages dynamic folders based on metadata, but does not
integrate management rules natively.
Maarch manages dynamic folders which correspond to configured
queries and are attributed to users based on their role. This behaviour is
available in the LetterBox application, which offers well-established mail
folder management.
Nuxeo manages the notion of files/folders and provides a notion of sections,
which makes it possible to place each piece of content in a folder with as many sections as
are needed. In practice, directories, metadata, management rules
and audits are combined. Faceted views give the different individuals involved in
company processes distinct views. The notion of a section provides a
supplementary abstraction, transverse to and independent of metadata.
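The dynamic folders mentioned above for Jahia and Maarch can be thought of as saved queries on metadata rather than as physical directories. The following sketch shows the idea under simple assumptions: documents are dictionaries with a "metadata" entry, and `dynamic_folder` is a hypothetical helper, not a function of either product.

```python
def dynamic_folder(documents, **criteria):
    """Return the documents whose metadata matches every given criterion."""
    return [
        document
        for document in documents
        if all(
            document["metadata"].get(key) == value
            for key, value in criteria.items()
        )
    ]


documents = [
    {"name": "contract.pdf", "metadata": {"client": "ACME", "type": "contract"}},
    {"name": "invoice.pdf", "metadata": {"client": "ACME", "type": "invoice"}},
    {"name": "offer.pdf", "metadata": {"client": "Globex", "type": "contract"}},
]

# The same document appears in two folders without being copied.
client_folder = dynamic_folder(documents, client="ACME")
contract_folder = dynamic_folder(documents, type="contract")
```

Here `contract.pdf` belongs both to the ACME client folder and to the contracts folder, which is the multi-positioning that a paper file cannot offer.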
[4.15] Technical integration
EDM tools can be looked at in different ways as regards technical integration. They
can be considered as applications of their own, without real links to the information
system apart from user management and the ergonomics offered. Or they can be
considered as a part of a global system, where they serve as centralised storage for
all company files.
[4.15.1] Interface configuration
An EDM tool can be very complete, but offer an interface which doesn’t suit target
users. While with Open Source solutions it is always possible to make modifications
directly to the code of the application itself, this risks making updates more
complicated. Some solutions address this problem by allowing their interface to be
modified by configuration or via special tools.
Alfresco has two relatively monolithic interfaces and a development
framework. “Alfresco U.I.” offers a large number of functions as standard,
many of which can be adapted by XML configuration: wizards and available
actions, together with some behaviours. Customisations can be packaged in the
form of a module or developed in “custom view” format as an add-on to the
standard interface, which makes them easy to reapply during updates.
Share, the second interface, is built on the Surf framework. Share is
focused solely on collaborative work. We note that the principle of a home
page which is totally configurable by the user is common to the two
standard interfaces. Several other interfaces, more or less mature, exist
in the Alfresco community.
eXo DMS is built on the eXo portal, a solution that is itself dedicated to the
creation of Web portals. Due to this, all of the elements of the eXo DMS
interface are configurable and extensible. The interface offered natively is
relatively technical and requires adaptation. Remember that eXo DMS is
most often used by users with semi-technical profiles.
Jahia offers a very user-friendly native interface. Given its focus on content
publication, the native interface in question is that of the back office. The
rendering interface, which is designed specifically for each site, offers great
flexibility in adapting to user needs. We note that the in-site editing mode,
which adds components directly in the pages of the generated sites, offers great
rendering flexibility.
Knowledge Tree makes it easy to modify the text of its various interface
messages, but no specific interface customisation tool is provided
beyond the home page. Adaptations are most often carried out
directly in the code of the interface.
LogicalDoc offers some interface options, but does not have a specific tool
for designing them. Adaptations are carried out by the use of themes or
skins.
Freedom and Maarch do not have a specific tool for modifying
the interface beyond the functional configurations offered. Adaptations
are therefore carried out in the interface code.
Nuxeo has a complete and ergonomic standard interface, together with a
theme editor which makes it possible to configure certain aspects interactively.
For more advanced modifications, the whole interface (functionally
and graphically) is made up of a plug-in system, so that the
modifications developed can be packaged and easily applied to subsequent
versions. Nuxeo Studio also provides great flexibility in the
configuration of certain interface elements: home page, logo, data entry,
modification and advanced search screens. We also note that the home page
is totally configurable by the user via the use of OpenSocial gadgets.
[4.15.2] Applicative integration
Integration allows content management applications to
communicate with other applications, usually those which produce or use content.
To this end, several standards and technologies are available.
a) Web services
REST and SOAP services are the two main styles used to expose and exchange
business functions when creating web APIs.
b) SOAP Web services
A specification based on XML, in the standard SOAP (Simple Object Access
Protocol) format. SOAP web services carry out exchanges between a client and a server via
the HTTP protocol. They expose business entry points
(endpoints) described in a contract in the standard WSDL (Web Services
Description Language) format.
c) REST Web services
Simpler and lighter than SOAP, REST services
(REpresentational State Transfer) rely solely on the HTTP protocol for their inputs
and outputs. They are not restricted to XML, and their exchanges
between the client and the server are less verbose than those of SOAP, making them
more efficient.
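The difference in verbosity between the two styles can be seen by building the same "fetch a document" call both ways. The sketch below only constructs the requests and sends nothing. The server `edm.example.com`, the `getDocument` operation, the `DocumentService` endpoint and the `/api/documents/` path are all hypothetical and do not correspond to the API of any product reviewed here; only the SOAP 1.1 envelope namespace is standard.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
BASE_URL = "http://edm.example.com"  # hypothetical EDM server


def soap_request(document_id):
    """Wrap the call in an XML envelope and POST it to a single endpoint."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, "getDocument")  # hypothetical operation name
    ET.SubElement(call, "documentId").text = document_id
    return urllib.request.Request(
        f"{BASE_URL}/services/DocumentService",
        data=ET.tostring(envelope, encoding="utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": "getDocument",
        },
        method="POST",
    )


def rest_request(document_id):
    """Address the document by URL and let the HTTP verb express the action."""
    return urllib.request.Request(
        f"{BASE_URL}/api/documents/{document_id}",
        headers={"Accept": "application/json"},
        method="GET",
    )
```

The REST variant needs no request body at all: the resource is identified by its URL and the action by the HTTP verb, whereas the SOAP variant carries both inside the XML envelope.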
d) CMIS: Content Management Interoperability Services
This is an emerging standard to unify access to web content and documents in
particular, probably one of the most promising since JCR or WebDAV. This standard
provides unified technical access to the content management tools
which expose this interface. In other words, it allows an application such as eZ Publish
(a website development CMS) to exploit the content stored in any of the following
repositories: Alfresco, Nuxeo, or Knowledge Tree, regardless of which one is used.
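The principle behind such a standard, a single contract implemented by several repositories, can be illustrated as follows. This is a conceptual sketch only: it does not use the actual CMIS bindings, and the `ContentRepository` contract, the two in-memory repositories and the `publish` function are hypothetical names introduced for the example.

```python
from abc import ABC, abstractmethod


class ContentRepository(ABC):
    """Hypothetical uniform contract, standing in for what a standard defines."""

    @abstractmethod
    def get_document(self, path: str) -> bytes:
        """Return the raw content stored at the given path."""


class RepositoryA(ContentRepository):
    """Stores content directly by path."""

    def __init__(self):
        self._nodes = {"/press/release.txt": b"Release stored in A"}

    def get_document(self, path):
        return self._nodes[path]


class RepositoryB(ContentRepository):
    """Stores content by numeric identifier, with a separate path index."""

    def __init__(self):
        self._index = {"/press/release.txt": 7}
        self._blobs = {7: b"Release stored in B"}

    def get_document(self, path):
        return self._blobs[self._index[path]]


def publish(repository: ContentRepository, path: str) -> str:
    """Client code, such as a web CMS, that never sees the storage details."""
    return repository.get_document(path).decode("utf-8")
```

The `publish` function works unchanged with either repository, which is the independence from the underlying tool that the standard aims to provide.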
e) Simplified interfaces
Another easy way of integrating the EDM tool into the information system is to
provide one or more simplified user interfaces, each limited to a specific use. These can
be inserted into a portal or a business application, for example.
Solutions which offer interface design frameworks are best able to meet this
need.
f) Portlets in a portal
The tool can supply a certain number of “portlets”, interactive mini-applications
which are plugged into a J2EE portal and which give direct access to certain
functions.
Certain EDM applications can natively expose certain components or pages
as portlets (dashboards, search or browsing, for example).
We invite you to consult our White Paper on Open Source Portals for further
information on this.
Alfresco has a large range of APIs: Web services, REST and also
webscripts, which can be used to create simplified interfaces based on
HTML and JavaScript.
eXo DMS is an integration solution. As seen above, it is used
mainly within the framework of the eXo Platform suite. eXo DMS mainly
behaves like a content management interface used by other platform
components. As such, eXo DMS has all of the elements necessary to be
integrated, although these elements are largely geared towards other components of the
eXo suite.
Jahia, apart from its role as a J2EE portal, makes few functions available to
access and manipulate its own content. It positions itself as an aggregator.
It is possible to retrieve XML content via web services, and files via their
WebDAV address.
Knowledge Tree offers web services.
LogicalDoc offers some web services.
Freedom and Maarch offer practically no integration possibilities as
standard.
Nuxeo has a number of web services and remoting technologies (EJB, SOAP
and REST), together with a plug-in system which makes it easy to extend the tool’s
functions. The availability of the Eclipse RCP rich client interface
facilitates application development on the work station and heavy
processing.
[4.15.3] SaaS mode
2009 saw the emergence of numerous solutions in Software as a Service (SaaS)
mode. Content management applications, notably EDM applications, did not escape
the trend and saw the dawn of “Cloud” computing solutions.
This mode makes it possible to tackle EDM projects from a new angle, by seeing the application
as a service and no longer as a solution. Deployment is easier, as it is industrialised,
often by the editor itself. The downside is that custom development is often more
limited.
This type of product is still often seen either as a way to launch a project very
quickly, as a complete infrastructure can be set up in a few hours, or as the first phase
in a more ambitious project (using the Cloud as a pilot project).
Alfresco offers an Amazon EC2 machine image for its community and enterprise
editions (with a 30-day limit).
Nuxeo offers an instance rental service packaged on the Amazon EC2
platform.
KnowledgeTree offers KnowledgeTreeLive versions, also hosted on
Amazon EC2. This version notably offers specific commercial plug-ins such
as integration with Zoho’s online office solutions.
Smile offers a rapid deployment approach designed to quickly build
prototypes or simple applications for its clients’ everyday uses.
[5] CONCLUSION
For companies, the aim of document management is to manage what is often vital
information. In many cases, document management makes up an important
strategic part of the information system, easily as important as the old file server
which exists in the large majority of companies and the loss of which would be
catastrophic.
Document management projects are projects unto themselves, as they combine
technical and functional aspects. The success of these projects is of paramount
importance to the company.
The Open Source aspect of these solutions offers some major advantages: solutions
which are solid and durable, multiple services, and an editor who can provide
support and a service continuity guarantee (see our White Paper on this subject).
Open Source editors offer solutions focused on two axes: guarantees (support and
maintenance) and the availability of advanced functions.
Some may complain that there is a cost of use, and it is true that a purist view of
Open Source would expect none. Nonetheless, Open Source EDM
solutions are editor products: they are not the product of large
foundations or communities, and so the editor obviously needs some source of
revenue to develop its product.
Open Source offers the possibility of prototyping a project or equipping an initial
pilot area with a solution at a low cost: as the subscription is not a right of use, it is
not obligatory. It also brings functionality which is improved by community
contributions, that is, by users responding to other users’ needs.
But, as soon as use becomes central or critical, it is preferable to rely on contractual
editor support and on the functions available in the enterprise version.
As you will have learned while attentively reading this paper, three products stand
out due to their quality and their ability to support EDM projects:
Alfresco, Nuxeo, and Knowledge Tree. Two further solutions, eXo DMS and
Jahia, correspond to different types of project needs.
While these solutions may not work in the same way, they rival each other on
advanced functions, the essential functions being covered by each. All have an
excellent level of support, from the editor, from a network of integrators and from their
community. The choice is then made based on advanced functions, questions of
technical architecture and/or an economic model which optimises the value
of each project.
When document management projects include Web and/or portal components, we
can turn to tools which have EDM functions, such as Jahia, eXo Platform, or eZ
Publish. We would, nonetheless, like to mention the WCM module from Alfresco
and WebEngine from Nuxeo, which already offer interesting solutions to Web
issues.
Our objective was not only to present these solutions, but also our concept of
document management. We hope that you have found this paper informative and we
would be happy to advise you on your future projects.