Best Practices for Alfresco Replication, Backup and Disaster Recovery

Richard McKnight, Principal Consultant
Brian Long, Principal Consultant
Agenda
• About the Problem Space
• Target Clients and Use Cases
• Overview of Functionality
• About our Implementation
By the end of this presentation, you should have enough information to make an informed decision about BC/DR and global replication.
The Problem Space
Moving data around the globe to support distributed teams, and Business Continuity and Disaster Recovery (BC/DR)
What Problems Are We Solving?
BC/DR
• Ensure continued operations after the loss of a data center
• Resume operations as quickly as possible
• Lose as little content as possible when operations resume
Global Replication
• Distribute content around the globe
• Support global collaboration around content
• Control where content is replicated
These two distinct requirements come up often with larger customers.
Global Replication Challenges
Global replication ≠ WAN-level HA clustering.
During outages, trade-offs must be made between consistency and availability.
In global read/write use cases, conflict resolution must be implemented.
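Before conflicts can be resolved, they have to be detected. A minimal sketch, assuming each repository stamps an object with a version vector (a per-repository update counter). This is a common technique, not necessarily the one used in the implementation described here, and `record_local_update`, `compare` and the repository ids are hypothetical. Two updates conflict when neither vector dominates the other.

```python
from typing import Dict

# Hypothetical model: repository id -> number of updates that repository has made to the object.
VersionVector = Dict[str, int]


def record_local_update(vv: VersionVector, repo_id: str) -> VersionVector:
    """Return a new vector reflecting one more update made at `repo_id`."""
    updated = dict(vv)
    updated[repo_id] = updated.get(repo_id, 0) + 1
    return updated


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'before', 'after' or 'concurrent' (a conflict)."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


base: VersionVector = {}
london = record_local_update(base, "london")  # edited in London during a WAN outage
tokyo = record_local_update(base, "tokyo")    # edited in Tokyo at the same time
print(compare(london, tokyo))  # prints concurrent
```

A "concurrent" result is exactly the case where availability was chosen over consistency during the outage, and a conflict resolver has to step in once the regions reconnect.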
BC/DR Requirements
Recovery Point Objective (RPO): the maximum period of data loss, measured back from the point of failure, that could be tolerated.
Recovery Time Objective (RTO): the length of time the organization could tolerate a service interruption.
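Both objectives can be checked against a proposed replication schedule with simple arithmetic. A minimal sketch, assuming periodic asynchronous replication: a change made just after a cycle captures its data is not sent until the next cycle and then needs the transfer lag to land, so the worst-case data loss is roughly interval + transfer lag. The `ReplicationPlan` class and its fields are illustrative assumptions, with all durations in minutes.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPlan:
    # Illustrative fields; all durations are in minutes.
    interval: float       # time between replication cycles
    transfer_lag: float   # worst-case time for one cycle to land at the remote site
    failover_time: float  # time to detect the outage and bring the remote site into service

    def worst_case_data_loss(self) -> float:
        # A change made just after a cycle captures its data waits up to one full
        # interval for the next cycle, then up to transfer_lag to arrive remotely.
        return self.interval + self.transfer_lag

    def meets(self, rpo: float, rto: float) -> bool:
        return self.worst_case_data_loss() <= rpo and self.failover_time <= rto


plan = ReplicationPlan(interval=15, transfer_lag=5, failover_time=60)
print(plan.worst_case_data_loss())  # prints 20
```

With these numbers an RPO of 30 minutes is met but an RPO of 10 minutes is not; shortening the interval improves the RPO, while the RTO depends on how quickly the remote site can take over.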
Common BC/DR #Fail
#  Transaction          DB                      File
1  Add 10 files         Completely transferred  10 files transferred
2  Add 100 more files   Completely transferred  99 files transferred
3  Update 20 objects    Completely transferred  All updates complete
4  Delete 20 files      Completely transferred  N/A
5  Add 10 more files    Partially transferred   7 files transferred
A common first instinct is to implement BC/DR as a continuous backup to a remote region, with the database and the content files copied independently.
Transaction #5 would always be thrown out, because the database would not commit a partial transaction.
Because transaction #2 is incomplete in the file store (one of its 100 files is missing), it and all subsequent transactions would also need to be discarded, so the usable recovery point falls back to the end of transaction #1.
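The reasoning behind the table can be made concrete. A minimal sketch, assuming the database and the file store are replicated independently and that each transaction records how many files it added: walk the transactions in commit order and stop at the first one whose database commit or files are incomplete. `TxnStatus` and `last_consistent_txn` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TxnStatus:
    txn_id: int
    db_committed: bool   # did the remote database commit this transaction?
    files_expected: int  # content files the transaction added (0 for updates and deletes)
    files_present: int   # files that actually reached the remote file store


def last_consistent_txn(txns: List[TxnStatus]) -> int:
    """Return the id of the last transaction at which DB and files agree (0 if none)."""
    last_good = 0
    for t in txns:  # assumed to be in commit order
        if not t.db_committed or t.files_present < t.files_expected:
            break  # this transaction and every later one must be discarded
        last_good = t.txn_id
    return last_good


# The rows of the table above; updates and deletes are modeled as adding no files.
history = [
    TxnStatus(1, True, 10, 10),
    TxnStatus(2, True, 100, 99),
    TxnStatus(3, True, 0, 0),
    TxnStatus(4, True, 0, 0),
    TxnStatus(5, False, 10, 7),
]
print(last_consistent_txn(history))  # prints 1
```

A single missing file in transaction #2 costs transactions #2 through #5, even though the database replica holds four of them.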
Who Is Asking for This and Why?
A review of the types of customers asking for this and how they intend to use it.
Financial Services
Many of these firms have offices around the globe. Their drivers are:
• Making content available globally
• Ensuring that content is available at alternate sites to cover disaster recovery events
• Controlling where content is replicated, for compliance purposes
Oil and Gas
An oil company asked about using Alfresco for the documentation used on its oil rigs.
• Content was created and maintained in a central repository.
• Some content might be created on a rig and sent back to the central repository.
• The network connection between the central data center and an oil rig can suffer extended outages.
General Disaster Recovery
A few organizations have approached us about disaster recovery. Their drivers were:
• Performance on the active side
• The best RPO possible
• A consistent view of which content exists
• Balancing the cost of the solution against the RTO
Large Multinational Corporations
A couple of clients/prospects have asked about global collaboration:
• Multiple data centers around the globe (more than 10)
• Specific replication policies for different collections of content
• Read/write collaboration across the globe
• Control over how aggressively replication occurs between regions
Comparing Infrastructure and Application Replication
Infrastructure
• Occurs outside of the Alfresco repository services
• Complete replica of the repository
• No change needed to the application
Application
• Managed by the Alfresco repository services
• Can be tuned to prioritize critical content
• Supports selective replication for different business requirements
• Supports cross-regional collaboration
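Prioritizing critical content in application-level replication can be as simple as ordering the outbound queue. A minimal sketch, assuming each node is assigned a numeric priority (lower is more urgent) by some business rule; `ReplicationQueue` and the file names are hypothetical. Python's `heapq` with an insertion counter as tie-breaker keeps items of equal priority in FIFO order.

```python
import heapq
import itertools
from typing import List, Tuple


class ReplicationQueue:
    """Hypothetical outbound queue: lowest priority number is sent first, FIFO within a priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker that preserves insertion order

    def enqueue(self, node_ref: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), node_ref))

    def drain(self) -> List[str]:
        """Pop everything in the order it would be replicated."""
        out = []
        while self._heap:
            _, _, node_ref = heapq.heappop(self._heap)
            out.append(node_ref)
        return out


q = ReplicationQueue()
q.enqueue("marketing.pdf", 5)
q.enqueue("trade-confirmation.pdf", 0)  # critical content jumps the queue
q.enqueue("policy.docx", 5)
print(q.drain())  # prints ['trade-confirmation.pdf', 'marketing.pdf', 'policy.docx']
```

Infrastructure-level replication has no equivalent knob, because it copies blocks or files without knowing which content matters most.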
Functional Overview
So what exactly have we created?
Functional Requirements
• A set of federated repositories
• The ability to create “distributed objects” that are replicated across a selected set of those repositories
• The ability to know the replication state of any object
• Repository-specific access control
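Knowing the replication state of a distributed object amounts to tracking one state per selected target repository. A minimal sketch, assuming three states per target; `ReplicaState`, `DistributedObject` and the repository names are hypothetical and are not the implementation's data model.

```python
from enum import Enum
from typing import Dict, Iterable, List


class ReplicaState(Enum):
    PENDING = "pending"
    REPLICATED = "replicated"
    FAILED = "failed"


class DistributedObject:
    """Hypothetical model of one object replicated to a selected subset of federated repositories."""

    def __init__(self, node_id: str, targets: Iterable[str]) -> None:
        self.node_id = node_id
        # Every selected target starts out pending; repositories not listed are never targets.
        self.states: Dict[str, ReplicaState] = {t: ReplicaState.PENDING for t in targets}

    def mark(self, repo: str, state: ReplicaState) -> None:
        if repo not in self.states:
            raise KeyError(f"{repo} is not a replication target of {self.node_id}")
        self.states[repo] = state

    def fully_replicated(self) -> bool:
        return all(s is ReplicaState.REPLICATED for s in self.states.values())

    def pending_targets(self) -> List[str]:
        # Targets still pending or failed, sorted for stable reporting.
        return sorted(r for r, s in self.states.items() if s is not ReplicaState.REPLICATED)


obj = DistributedObject("contract-42", ["london", "new-york", "singapore"])
obj.mark("london", ReplicaState.REPLICATED)
obj.mark("new-york", ReplicaState.FAILED)
print(obj.pending_targets())  # prints ['new-york', 'singapore']
```

Raising on an unknown repository keeps the "selected set" explicit: an object is never silently replicated somewhere it was not meant to go.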
Non-Functional Requirements
• Efficient replication
• The ability to control the progress of replication
• Resilience in the face of outages
• Cluster awareness
• The ability to add new repositories to, and remove repositories from, the federation
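Resilience in the face of outages (the oil-rig link is the extreme case) usually means store-and-forward delivery: events stay queued locally until the remote repository acknowledges them. A minimal sketch, assuming a `send` callback that returns True on acknowledgement; `StoreAndForward` and `retry_delay` are hypothetical. Delivery stops at the first failure so events arrive in their original order, and the wait between attempts doubles up to a cap.

```python
from collections import deque
from typing import Callable, Deque, List


def retry_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry `attempt` (0-based): doubles each time, up to `cap`."""
    return min(cap, base * (2 ** attempt))


class StoreAndForward:
    """Hypothetical sender that keeps events queued while the remote repository is unreachable."""

    def __init__(self, send: Callable[[str], bool]) -> None:
        self._send = send              # returns True when the remote acknowledges the event
        self._pending: Deque[str] = deque()
        self.failures = 0              # consecutive failed flush attempts

    def submit(self, event: str) -> None:
        self._pending.append(event)

    def flush(self) -> float:
        """Send queued events in order; return seconds to wait before the next flush (0 if done)."""
        while self._pending:
            if not self._send(self._pending[0]):
                self.failures += 1
                return retry_delay(self.failures - 1)
            self._pending.popleft()  # drop an event only after it is acknowledged
        self.failures = 0
        return 0.0

    def backlog(self) -> List[str]:
        return list(self._pending)
```

Capping the delay keeps a long outage from pushing the next attempt hours out, which would otherwise stretch the effective RPO after the link comes back.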
About our Implementation
Business Continuity
High Availability + BC
Disaster Recovery + HA/BC
Flexibility / Configurability
• Wide Range of Business Requirements
• Wide Range of Technical Requirements
• Segmented Multi-Stage Replication
Node Segments & Events
Segments:
• Stub
• Property/Properties (non-content)
• Association(s)
• Access Control(s)
• Content(s)
• Etc…
Events:
• DeleteNode
• MoveNode
• DeleteAssociation
• InvalidateProperties
• InvalidateContents
• Etc…
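Segmented replication means a node is not shipped as one unit: each segment travels as its own event and can be scheduled differently. A minimal sketch, assuming the stub must exist remotely before other segments attach to it and that a DeleteNode event makes any still-queued segments for that node redundant. `SEGMENT_ORDER`, `Node`, `segment_events` and `apply_delete` are hypothetical names, and the ordering is an assumption rather than the documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Assumed ordering: the stub must exist remotely before other segments can attach to it.
SEGMENT_ORDER = ["stub", "properties", "associations", "acls", "content"]

Event = Tuple[str, str, Any]  # (node_id, segment or event name, payload)


@dataclass
class Node:
    node_id: str
    segments: Dict[str, Any] = field(default_factory=dict)


def segment_events(node: Node) -> List[Event]:
    """Split a node into separately replicable segment events, in dependency order."""
    return [(node.node_id, name, node.segments[name])
            for name in SEGMENT_ORDER if name in node.segments]


def apply_delete(queue: List[Event], node_id: str) -> List[Event]:
    """A DeleteNode event makes any still-queued segment events for that node redundant."""
    remaining = [e for e in queue if e[0] != node_id]
    remaining.append((node_id, "DeleteNode", None))
    return remaining


n = Node("n1", {"content": b"%PDF", "stub": "document", "properties": {"name": "a.pdf"}})
print([name for _, name, _ in segment_events(n)])  # prints ['stub', 'properties', 'content']
```

Dropping queued segments on delete avoids shipping content across a WAN only to remove it on arrival.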
Scheduling Options
• Synchronous
• Asynchronous
• Batch
  • Including previously failed actions
• On-Demand
  • Typically for content
• Never
  • Typically for invalidate events
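The scheduling options can be read as a per-segment policy table. A minimal sketch, assuming a policy keyed by segment or event type; the On-Demand and Never choices follow the typical uses listed above, while the other mappings (stub synchronous, properties asynchronous, associations batch) are chosen only for illustration. The lists stand in for real transport calls, and `Mode`, `POLICY` and `Scheduler` are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"
    BATCH = "batch"
    ON_DEMAND = "on-demand"
    NEVER = "never"


# Hypothetical policy; only the content and invalidate entries mirror the slide.
POLICY: Dict[str, Mode] = {
    "stub": Mode.SYNCHRONOUS,
    "properties": Mode.ASYNCHRONOUS,
    "associations": Mode.BATCH,
    "content": Mode.ON_DEMAND,
    "InvalidateContents": Mode.NEVER,
}


class Scheduler:
    def __init__(self) -> None:
        self.sent_now: List[str] = []
        self.async_queue: List[str] = []
        self.batch: List[str] = []
        self.failed: List[str] = []
        self.on_demand: List[str] = []

    def schedule(self, kind: str, event: str) -> None:
        mode = POLICY.get(kind, Mode.ASYNCHRONOUS)
        if mode is Mode.SYNCHRONOUS:
            self.sent_now.append(event)      # would be sent inside the user's transaction
        elif mode is Mode.ASYNCHRONOUS:
            self.async_queue.append(event)   # would be sent shortly after commit
        elif mode is Mode.BATCH:
            self.batch.append(event)         # waits for the next batch run
        elif mode is Mode.ON_DEMAND:
            self.on_demand.append(event)     # fetched only when a remote user asks for it
        # Mode.NEVER: the event is not replicated at all

    def next_batch(self) -> List[str]:
        """A batch run retries previously failed actions before new ones."""
        run, self.failed, self.batch = self.failed + self.batch, [], []
        return run
```

Putting content on demand keeps large binaries off the WAN until someone in the remote region actually opens them, while the small stub and property segments keep the remote view of what exists current.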
What Is Distributable?
• Folders declared as distribution roots
  • Configured by an administrator
• All children of distribution roots
  • Except nodes with ignored aspects:
    • Renditions
    • Thumbnails
    • Etc…
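The distributability rule can be expressed as a path check plus an aspect filter. A minimal sketch, assuming nodes are identified by repository paths; the root path and the aspect names `ex:rendition` and `ex:thumbnail` are placeholders, not the aspects the implementation actually ignores. Comparing against `PurePosixPath.parents` rather than a string prefix avoids treating a sibling such as `global-legal-archive` as a child of `global-legal`.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set

# Hypothetical administrator configuration; the path and aspect names are placeholders.
DISTRIBUTION_ROOTS = [PurePosixPath("/Company Home/Sites/global-legal")]
IGNORED_ASPECTS: Set[str] = {"ex:rendition", "ex:thumbnail"}


def is_distributable(path: str, aspects: Iterable[str]) -> bool:
    """A node is distributable if it is a root or below one, and carries no ignored aspect."""
    p = PurePosixPath(path)
    under_root = any(p == root or root in p.parents for root in DISTRIBUTION_ROOTS)
    return under_root and not (set(aspects) & IGNORED_ASPECTS)


print(is_distributable("/Company Home/Sites/global-legal/contracts/a.pdf", []))  # prints True
```

Skipping renditions and thumbnails is a bandwidth choice: derived artifacts can be regenerated at each site from the replicated content.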
Journal
Transport
Conflict Resolution
• Simple/smart merging by default
• Supports custom implementations
  • Spring pluggable
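In the implementation described here, custom resolvers are wired in through Spring; the sketch below only mirrors the idea of a swappable strategy in Python, and `ConflictResolver`, `PropertyMerge` and `merge_with` are hypothetical names. The default shown is one plausible reading of "simple/smart merging", assumed rather than documented: a property-level three-way merge against the last common version, where a property changed on only one side is taken from that side and a property changed differently on both sides is kept from the local side and flagged.

```python
from typing import Any, Dict, Protocol, Set, Tuple

Props = Dict[str, Any]  # property name -> value; None stands in for "absent"


class ConflictResolver(Protocol):
    def resolve(self, base: Props, local: Props, remote: Props) -> Tuple[Props, Set[str]]: ...


class PropertyMerge:
    """Three-way merge per property; keys changed differently on both sides are flagged."""

    def resolve(self, base: Props, local: Props, remote: Props) -> Tuple[Props, Set[str]]:
        merged: Props = {}
        conflicts: Set[str] = set()
        for key in set(base) | set(local) | set(remote):
            b, l, r = base.get(key), local.get(key), remote.get(key)
            if l == r:
                value = l            # both sides agree (or neither changed it)
            elif l == b:
                value = r            # only the remote side changed it
            elif r == b:
                value = l            # only the local side changed it
            else:
                value = l            # true conflict: keep local, flag for review
                conflicts.add(key)
            if value is not None:    # None means the property was removed
                merged[key] = value
        return merged, conflicts


def merge_with(resolver: ConflictResolver, base: Props, local: Props, remote: Props):
    """Any object with a matching resolve() can be swapped in here."""
    return resolver.resolve(base, local, remote)
```

For example, if London retitles a document while Tokyo reassigns its owner, both changes survive the merge; only a property edited differently in both regions needs a human or a custom policy.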
Contact Information
Richard McKnight
@rmknightstar
Brian Long