SharePoint with Large Archives Gets Some TLC

Presented by Tony Yin and Dan Wilkens
Presenter: Dan Wilkens
Software Product Analyst
AvePoint
@TheDanWilkens
Presenter: Tony Yin
SharePoint Architect
Jenner & Block
@yintony
Thank you for being here today
June 10th, 2014
Welcome to Dan and Tony’s Archiving Adventure
Best practices and real-world examples for handling large SharePoint
archiving and being excellent to each other….
Think of SharePoint
as the world, and
content as the
people living in it.
People have a defined lifecycle.
They are born.
They grow up.
They become a lawyer 
They fulfill a role.
They get older.
They pass on.
Why is this
important?
If you think of content as
people, think of SharePoint
as the world they live in.
Let's pretend Chicago is a
SQL Server for SharePoint.
If all the people that were
ever born in Chicago
continued to live here,
never died and never left,
what would happen?
One word: chaos.
Chicago’s infrastructure
would break down.
• Overcrowding
• Traffic jams
• Nowhere to live
People born outside
Chicago would have to live
in a different place.
Chaos Continues
Let’s say that President
Obama is a SharePoint
admin. He sees that his
Chicago SQL Server is filled
with people (content).
Now he has to put all these
new people somewhere. He
decides to steer them to a
new place - Atlanta.
What happens next?
Ever Watch the Walking Dead?
The same process will repeat in Atlanta until it is full of obsolete, undead content that
lingers in SQL and crowds out anything useful. It will repeat anywhere else the same
model is used, and you will end up with a full-blown zombie SharePoint world.
Insanity: doing the same thing over and over again and
expecting different results.
- commonly attributed to Albert Einstein
How Do You Prevent this Situation from Happening?
• People should leave the city to enjoy retirement
• Cheaper, better places for them to live at this stage in life
• Places with people of similar ages/interests to interact
with and enjoy retirement
• Allow younger residents to move into the city and
establish their relevance/importance
• Prevent the zombie apocalypse from happening
• The end of a person’s lifecycle is expiration, and this must
be allowed to occur
• Proper planning for local/national infrastructure management
• City planners must monitor growth and quality of life
• If overcrowding occurs, the city must provide space to build new
homes, build new highways/trains to the suburbs, and repurpose older areas for development
What does any of this have to do with SharePoint?
Introducing The SharePoint Content Lifecycle
The SharePoint Content Lifecycle
• Content is created, uploaded or migrated
• Content is accessed and edited regularly
• Unused content is retained
• Removal or archiving of unused content
Putting things into context
• Just as people should leave the city to enjoy retirement, the same
applies for SharePoint content. When content is ready for
retirement, admins must consider removing content from SQL.
• Cheaper, better places for content to live at this stage in its lifecycle
• Local or network file shares/systems
• Advanced storage systems (NetApp, IBM, etc.)
• Cloud storage - public, private, hybrid
• Reduce total cost of ownership by establishing multi-tier
storage policies based on retention
• Allow for fresher, relevant content to populate SQL Servers
• Has an effect on overall SharePoint performance
• Can re-tool architecture according to business needs
I see. Go on….
The SharePoint content lifecycle must be considered for
enterprise-level archiving, and must end in expiration
• The end of a person’s lifecycle is expiration, and this
applies to content, sites, and site collections in SharePoint
• Retention periods need to be set on content to know how
long to keep it according to state legal & compliance rules
• However, that period eventually ends, and content that is
not expired at that point will continue to take up space in SQL or
external storage devices
• Steps must be taken to expire this content to preserve
space and reduce IT costs
Question: What is one of the most important things a
person can do as they get older?
Answer:
Plan for retirement!
Just as people plan for their
retirement, SharePoint
admins must take steps to
plan content archival,
retention and expiration.
As in life, the earlier you
plan for retirement, the
better off you will be in the
long run.
Planning for Retirement, SharePoint Style…
Key considerations when planning out an
enterprise archive strategy:
• Assess current architecture
• Define what archiving means to you
• Storage and cost considerations
• Define a system of record
• Planning for records retention and content
expiration
• Integration with eDiscovery platforms and
governance-based software
• Automation, workflows and more
Assess Your Current Architecture
• Think about what systems and what versions of those
systems your organization is currently using
• On-premises SharePoint (2003, 2007, 2010 or 2013)
• SharePoint Online/Office 365
• Hybrid deployment
• Public cloud, private cloud, hybrid cloud
• Outside SharePoint
• Other enterprise content management systems
• Email systems
• File shares
• Practice management solutions
• eDiscovery and early case assessment systems
Assess Your Current Architecture
• Think about how your users are using each system to
consider what approach to take
• Migration to unify experience?
• Training necessary
• User adoption
• Interruption of business activities
• Think about your long-term content management needs
• Planning a move from on-premises to cloud
• Planning a migration from version to version
• Planning for a hybrid deployment
• Think about your total cost of ownership and strain on IT to
manage various systems
Real-World Example: Jenner & Block
Farm Architecture
• SharePoint 2010 with SP1, all virtual
• 2 web front ends, 1 application server, and 1 FAST search
server
• 2 SQL Server 2008 R2, 1 for content dbs, 1 for all other
dbs; with SAN volumes
• NAS for BLOBs with RBS, with daily snapshots
• Moved from 1 matter per site collection to 1 client site per site
collection
• 1.2 TB in the farm
Define what Archiving Means to Your Organization
How will you plan for retirement? What does that mean to you?
• Does that mean you'll move to Florida?
• Does that mean you'll give your house to your kids and go live
in an adult community?
• Does that mean you'll sell all your possessions and become a
monk?
The same considerations must go into archiving methodology:
• Keep everything in SharePoint
• Utilize SharePoint records management functionalities
• In-place? Records Center?
• Delete & externalize completely out of SharePoint
• Move BLOBs out of SQL while keeping content readable in SharePoint
• Export to another ECM/file system
• Print it out and put it in a file cabinet (please don’t)
Question: What is something that a financial advisor
will always tell you to do with your investments?
Diversify your
portfolio!
Just as you wouldn’t put your
financial eggs all in one basket,
you shouldn't do it for archiving
content. You should consider
diversification to maximize ROI
for storage.
For SharePoint, that's all about
reducing your IT overhead costs
and taking content off
expensive storage systems and
moving it to lower-cost models.
Storage and Cost Considerations
SharePoint’s native storage architecture:
Web Front Ends
User Data
Application Server
Application Server
SQL Server on SAN
Improving cost and architecture….
Web Front Ends
User Data
Application Server
Application Server
Cloud Storage (non-sensitive)
NAS/File Share
SQL Server Storage (sensitive) on SAN
Real-World Example: Jenner & Block
Over the last year, the size has grown by almost half a terabyte
Some interesting stats
• Top 5 content dbs in GB
• 67.2, 43.6, 42.3, 39.0, 37.5
• Top site sizes in GB
• 53, 40, 29, 29, 22
• Top Doc sizes in MB
• 800, 711, 710, 622, 610
Cost / Benefit Analysis and Best Practices
• Establish a storage plan for archiving that correlates with
content activity and retention stages
• Consider the following:
• Active/WIP
• Records
• Non-active but on retention
• Content soon to expire
• Develop a multi-tier policy for records retention and
archiving based around a sliding cost scale
Tier 0: SQL Server
Tier 1: Move to highest-cost storage
Tier 2: Lower-cost storage
Tier 3: Lowest-cost storage
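A multi-tier policy like the one above can be sketched in a few lines of Python. This is only an illustration: the stage names mirror the list above, but the stage-to-tier mapping and the `choose_tier` helper are assumptions made for the example, not something SharePoint or any vendor tool provides.

```python
from enum import Enum


class Stage(Enum):
    ACTIVE = "active/WIP"
    RECORD = "record"
    RETAINED = "non-active but on retention"
    EXPIRING = "content soon to expire"


# Hypothetical mapping from retention stage to storage tier.
# A real policy would come from your own cost/benefit analysis.
TIER_BY_STAGE = {
    Stage.ACTIVE: 0,    # Tier 0: SQL Server
    Stage.RECORD: 1,    # Tier 1: highest-cost storage
    Stage.RETAINED: 2,  # Tier 2: lower-cost storage
    Stage.EXPIRING: 3,  # Tier 3: lowest-cost storage
}


def choose_tier(stage: Stage) -> int:
    """Return the storage tier number assigned to a retention stage."""
    return TIER_BY_STAGE[stage]
```

Keeping the mapping in one table means the policy can be reviewed and changed without touching the code that moves content.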
Best Practices
• Consider sensitivity of data for each storage system
• Sensitive content may not be optimal for the cloud
• PII, financial information, SSNs, etc.
• However, non-essential records without PII can be
pruned and placed in the cloud
• Consider third-party tools that allow you to centrally
manage the distribution of content among these different
storage platforms
• Set up workflows/work with 3rd party tools that allow you
to automate the process and set retention policies
Define A System of Record
• How do you define a record?
• Do you differentiate between works in progress and
final records? If so, how?
• Content types, metadata, managed metadata?
• Do you use Content Type Hubs?
• Content Organizer and Drop-off Libraries?
• Do you use 3rd party vendor tools?
• Do you use site mailboxes for Exchange integration?
• Will you store SharePoint records in SharePoint?
• If so, what method?
• In-Place Records Management
• Record Center
• Will you store records externally?
Planning for Retirement, SharePoint Style…
Planning for retention
• Define content-specific retention policies
• Opt for granular retention policies that can be defined
at multiple object levels (documents, lists, sites, etc.)
• How to deal with state-specific legal rules for retention
• Ex: Illinois requires 7-year retention
• Regulatory and compliance retention standards
• Ex: SEC broker-dealer regulations
• Once defined, how to automate and schedule records
retention policies
• Using information management policies?
• Using third-party tools?
• Building custom applications?
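If you go the custom-application route, the core of a retention check is a date calculation. The sketch below assumes a retention period counted in whole years from a start date you choose (for example, the date a matter closed); the function names are hypothetical, and the 7-year figure is used only because it is the example given above.

```python
from datetime import date


def expiration_date(start: date, retention_years: int) -> date:
    """Return the date on which the retention period ends."""
    target_year = start.year + retention_years
    try:
        return start.replace(year=target_year)
    except ValueError:
        # start is Feb 29 and the target year is not a leap year;
        # this sketch assumes rolling forward to Mar 1 is acceptable.
        return date(target_year, 3, 1)


def is_expired(start: date, retention_years: int, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today >= expiration_date(start, retention_years)
```

For example, `expiration_date(date(2014, 6, 10), 7)` gives June 10, 2021. How the start date is defined (creation, last modification, matter close) is a policy decision, not a technical one.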
Integration Points with Other Software
• Integration with eDiscovery platforms and governance-based
software
• Ability to search archived content
• Ability to restore and export archived content
• Ability to set policies/proactively prevent users from
breaking rules
• Integration with early case assessment, technology-assisted review and
document review platforms for larger firms
• Exportable load files in consumable file formats
• Integration with tools that provide scanning and classification of
content as it enters SharePoint
• Identify and protect sensitive data
• Prevent it from moving to the cloud
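As a rough illustration of scanning content before it is routed to storage, the Python sketch below flags text containing something shaped like a US Social Security number and keeps it out of the cloud. The pattern, the function names, and the target labels are all assumptions for the example; real classification tools check far more than one pattern.

```python
import re

# Assumed pattern: SSNs written as 123-45-6789. Illustrative only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_sensitive_data(text: str) -> bool:
    """True if the text contains something shaped like an SSN."""
    return SSN_PATTERN.search(text) is not None


def allowed_targets(text: str) -> list[str]:
    """Return the storage targets this content may be moved to."""
    if contains_sensitive_data(text):
        return ["on-premises"]
    return ["on-premises", "cloud"]
```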
Once your policies are defined, look to automation
to speed up your processes.
Planning for Retirement, SharePoint Style…
• Automate expiration of content
• Expire records & identify stale SharePoint content for removal
• SharePoint site deletion policies
• SharePoint information management policies
• 3rd party tools to automate expiration based on retention
policies
• Use workflows to allow end-users to approve/reject archiving of
SharePoint content
• No one is closer / more familiar with the data
• They know if it’s obsolete
• Protect yourself
• Be sure to provide notifications and audit logging of events
• Ensure defensible deletion via digital shredding
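The steps above (find stale content, let owners approve or reject, log every outcome) can be sketched in Python as follows. Everything here is hypothetical: the `Item` type, the one-year idle threshold, and the shape of the audit log are assumptions for illustration, and a real implementation would read from SharePoint and write to a durable log.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Item:
    path: str
    last_modified: date


def find_stale(items, today, max_idle_days=365):
    """Return items not modified within the last max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [item for item in items if item.last_modified < cutoff]


def apply_decisions(stale, decisions, audit_log):
    """Return only owner-approved items; append every outcome to audit_log."""
    labels = {True: "approved", False: "rejected", None: "pending"}
    approved = []
    for item in stale:
        decision = decisions.get(item.path)  # None means no answer yet
        audit_log.append((item.path, labels[decision]))
        if decision is True:
            approved.append(item)
    return approved
```

Items with no decision are logged as pending and left alone, so nothing is archived without an explicit approval.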
Real-World Solutions: Jenner & Block
Solutions/Recommendations/Best Practices
• Evaluate physical and logical architectures
• Too many site collections? Site collection per client or per
matter? Information sharing?
• 3rd party RBS solution, with NAS
• Smart site provisioning:
• Pre-defined site templates
• Choose the content database with the fewest sites and smallest size
• Controlled site creation, ensure security, governance,
consistency
• Enforce upload limit
• Report site usage and aging to site owners
• Version control
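The database selection rule in smart site provisioning can be sketched in Python. This assumes one reading of the rule: pick the content database with the fewest sites, and break ties by smallest size. The `ContentDb` type and `pick_database` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContentDb:
    name: str
    site_count: int
    size_gb: float


def pick_database(dbs: list[ContentDb]) -> ContentDb:
    """Pick the database with the fewest sites, then the smallest size."""
    if not dbs:
        raise ValueError("no content databases available")
    return min(dbs, key=lambda db: (db.site_count, db.size_gb))
```

With made-up databases holding 12, 8 and 8 sites, the function chooses between the two 8-site databases by size. A different weighting (size first, or a combined score) would only change the `key` function.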
Questions
We’ll now open it up for questions
Thank You