Lecture 1

Lecture Two
Data Centre Governance & Maintenance
Work & People Organisation
Changes Governance (1/5)
… then you better start swimmin'
or you'll sink like a stone
for the times they are a-changin' …
• A Data Centre is a living thing, experiencing continual change
• Good Data Centre governance requires forecasting the changes,
analyzing their impact, planning and controlling the corrective and
compliance actions, and verifying the results …
… then you better start swimmin' …
Changes Governance (2/5)
“… Dad, my PC doesn’t respond ! …”
What did you change?
• 80% of service interruptions are caused by operator error or poor
change control (Gartner)
Changes Governance (3/5)
• Changes may concern any Data Centre component (building,
hardware, software, people, …) and may originate from internal or
external causes
• As a first, rough classification we can distinguish between
“ordinary” and “extraordinary” changes
• A main difference between these categories lies in how they are
handled: ordinary changes are generally managed through
well-defined, consolidated procedures, whereas extraordinary
changes often require an “ad hoc” project
Changes Governance (4/5)
• Common causes for “ordinary” changes are:
• Users’ requests
• Legislation requirements
• Technological innovations
• Actions for budget control
• Accidents and mistakes
• Ordinary changes are very frequent (hourly/daily) and their life-cycle
is generally short to medium (hours to a few weeks). They impact a
limited number of Data Centre components. Managing them involves few
resources and a low-to-medium effort
Changes Governance (5/5)
• Examples of causes for “extraordinary” changes are:
• Great technical or regulatory transformations
• Wide company reorganizations
• Site relocations and consolidations
• Big and unpredicted accidents (“disasters”)
• Extraordinary changes are sporadic and their life-cycle is long
(many months to years). Their impact usually cuts across all the Data
Centre components. Managing them involves many resources and
requires a huge effort. These resources are generally organized as a
dedicated project team with its own leader
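The distinction between ordinary and extraordinary changes can be sketched as a simple rule. This is only an illustration: the `Change` fields, the `classify` function and the 30-day threshold are assumptions chosen to echo the figures above (weeks vs. months, limited vs. cross-cutting impact), not an established standard.

```python
from dataclasses import dataclass

# Component areas, taken from the list of Data Centre components above.
ALL_AREAS = {"building", "hardware", "software", "people"}


@dataclass
class Change:
    description: str
    expected_duration_days: int   # estimated life-cycle of the change
    affected_areas: set           # subset of ALL_AREAS


def classify(change, max_ordinary_days=30):
    """Return "ordinary" or "extraordinary".

    Assumed rule: a change is extraordinary when it lasts longer than
    `max_ordinary_days` or touches every component area.
    """
    long_running = change.expected_duration_days > max_ordinary_days
    cross_cutting = change.affected_areas >= ALL_AREAS
    return "extraordinary" if long_running or cross_cutting else "ordinary"


patch = Change("Apply a fix requested by users", 2, {"software"})
relocation = Change("Site relocation", 540, set(ALL_AREAS))

print(classify(patch))       # ordinary
print(classify(relocation))  # extraordinary
```

In practice the decision is a management judgement; a rule like this only helps to route a change either to a standard procedure or to an "ad hoc" project.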
Organising the Work (1/11)
DEMAND MGMT
• Collect the requests (from users, market, legislation)
• Assign the proper priorities
PROBLEM MGMT
• Analyse the problems
• Identify the repairing changes
CHANGE MGMT
• Design and plan the required changes
• Apply the changes
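One way to picture how Demand and Problem Management feed Change Management is a shared priority queue. The sketch below is hypothetical: the class names, the "lower number = more urgent" convention and the sample requests are assumptions made for illustration only.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

_sequence = count()  # tie-breaker so equal priorities keep arrival order


@dataclass(order=True)
class ChangeRequest:
    priority: int  # lower number = more urgent (assumed convention)
    order: int = field(default_factory=lambda: next(_sequence))
    source: str = field(default="", compare=False)   # "demand" or "problem"
    summary: str = field(default="", compare=False)


class ChangeQueue:
    """Backlog of Change Management, fed by Demand and Problem Management."""

    def __init__(self):
        self._heap = []

    def submit(self, priority, source, summary):
        request = ChangeRequest(priority, source=source, summary=summary)
        heapq.heappush(self._heap, request)

    def next_change(self):
        """Return the most urgent request, or None when the backlog is empty."""
        return heapq.heappop(self._heap) if self._heap else None


queue = ChangeQueue()
queue.submit(3, "demand", "New report requested by users")
queue.submit(1, "problem", "Repair for a recurring batch failure")
queue.submit(3, "demand", "New retention rule required by legislation")

ordered = [queue.next_change().summary for _ in range(3)]
print(ordered)
```

The repairing change identified by Problem Management is served first because it was given the highest priority; the two demand requests follow in arrival order.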
Organising the Work (2/11)
• Service continuity must always be protected …
• … so changes must be tested in a similar but separate
“environment”.
Common Data Centre environments:
Development
Test
Trial
Production
Organising the Work (3/11)
Development Environment
• It’s a “laboratory” where new changes are designed and developed.
• This environment is generally used for software changes, most often
for application software changes. However, a development
environment may be used for system software changes as well. It’s
extremely rare to use it for hardware changes.
• The environment is equipped with the tools technicians use to
produce and modify the software. It usually contains a library
(“Repository”) where all the versions of the software are stored: the old,
the current and those still under development.
• In smaller Data Centres the development environment is often merged
with the test environment
Organising the Work (4/11)
Test Environment (1/3)
• This environment is used to test the changes built in the development
environment
• The changes must be tested to verify that:
• They fit the purposes they were designed for.
• They do not generate problems.
• To test the changes, the changed components (usually software) must
run under conditions similar to those of the production
environment: so the test environment is required to be “similar” to
the production one
Organising the Work (5/11)
Test Environment (2/3)
• The test environment is usually just similar (not “equal”) to the
production one for economic reasons. Duplicating the production
environment only to test the changes would be extremely expensive
and usually isn’t necessary. For example, to test new software for a
bank’s cash-dispenser network with 1,500 devices, it’s enough to set
up a test network with 4-5 devices (preferably including all the different
models used in the production network).
• It isn’t rare to find different “parallel” environments to test different
changes at the same time.
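The cash-dispenser example can be sketched as a small selection routine: pick one device per model, then add a few more at random. The inventory below (device ids, four models) is invented for illustration, and `pick_test_devices` is a hypothetical helper, not part of any real tool.

```python
import random


def pick_test_devices(devices, extra=0, seed=0):
    """Pick one device per model, plus `extra` additional random devices.

    `devices` is a list of (device_id, model) pairs.
    """
    rng = random.Random(seed)
    by_model = {}
    for device_id, model in devices:
        by_model.setdefault(model, []).append(device_id)
    # One representative per model, so every model in production is covered.
    chosen = [rng.choice(ids) for ids in by_model.values()]
    already = set(chosen)
    remaining = [d for d, _ in devices if d not in already]
    chosen += rng.sample(remaining, min(extra, len(remaining)))
    return chosen


# Hypothetical production network: 1,500 devices of 4 different models.
devices = [(f"ATM-{i:04d}", f"model-{i % 4}") for i in range(1500)]
test_devices = pick_test_devices(devices, extra=1)
print(len(test_devices))  # 5
```

Five devices are enough here because the selection is driven by model coverage, not by the size of the production network.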
Organising the Work (6/11)
Test Environment (3/3)
• Data is one of the main topics to study when designing a test
environment
• At first we might think that the best solution is to run the tests with
a perfect copy of the production data. However, this choice has
three shortcomings:
1. Cost: the volume of production data is often excessive for test purposes
2. Security: some production data are confidential and must not be accessed
by the technicians running the test
3. Reliability: sometimes the real data are only a “subset” of all the
possible data, so some theoretically possible cases are never tested
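The three shortcomings suggest a common recipe: sample the production data, mask the confidential fields, and add synthetic edge cases. The sketch below assumes rows are plain dictionaries; the field names and the `build_test_data` helper are hypothetical, and the unsalted hash is only a stand-in for a real masking scheme.

```python
import hashlib
import random


def mask(value):
    """Replace a confidential value with a stable token.

    Illustration only: a truncated, unsalted hash is not strong anonymization.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def build_test_data(production_rows, confidential_fields, sample_size, edge_cases, seed=0):
    rng = random.Random(seed)
    # Cost: keep only a sample of the production rows.
    sample = rng.sample(production_rows, min(sample_size, len(production_rows)))
    # Security: mask confidential fields before the technicians can see them.
    masked = [
        {k: mask(str(v)) if k in confidential_fields else v for k, v in row.items()}
        for row in sample
    ]
    # Reliability: add hand-written cases that the real data may never contain.
    return masked + list(edge_cases)


production = [
    {"account": f"IT{i:06d}", "holder": f"Customer {i}", "balance": i * 10}
    for i in range(1000)
]
edge_cases = [{"account": "EDGE-0001", "holder": "Synthetic", "balance": -1}]

test_data = build_test_data(production, {"account", "holder"}, 50, edge_cases)
print(len(test_data))  # 51
```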
Organising the Work (7/11)
Trial Environment (1/2)
• The trial is a sort of “test PLUS” environment. Its purpose is a “dry run” of
the changes, i.e. the last complete test of the system before its delivery
to the production environment.
• The main characteristic of a trial vs. a test environment is its closer
affinity to the production environment, which is required to guarantee the
effectiveness of the test. For example, “closer affinity” means a more recent
copy of the production environment (a test environment may have been
generated some time ago). Furthermore, a trial environment may have
characteristics that a test environment lacks: for example, security
systems that are usually “disabled” in the test environment in order to speed
up the test runs.
Organising the Work (8/11)
Trial Environment (2/2)
• Another common characteristic of a trial environment (not always
present in a test environment) is the capacity to simulate the
production “workload”. Specific tools are available that can stress the
systems by generating “transaction flows” comparable to the real
workload, in terms of both volume and statistical distribution
• In smaller Data Centres the trial environment is often missing and
the last run before delivery is usually done in the test
environment
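A minimal sketch of a "transaction flow" generator follows. It assumes Poisson arrivals (exponential inter-arrival times) for the volume and a weighted mix of transaction types for the statistical distribution; both the model and the sample figures are assumptions, and real load-testing tools are far richer.

```python
import random


def generate_workload(duration_s, rate_per_s, mix, seed=0):
    """Yield (timestamp, transaction_type) pairs.

    Arrivals follow a Poisson process with mean `rate_per_s` per second;
    transaction types are drawn according to the weights in `mix`.
    """
    rng = random.Random(seed)
    types, weights = zip(*mix.items())
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        yield t, rng.choices(types, weights=weights)[0]
        t += rng.expovariate(rate_per_s)


# Hypothetical mix for a cash-dispenser network.
mix = {"withdrawal": 0.70, "balance": 0.25, "transfer": 0.05}
events = list(generate_workload(duration_s=60, rate_per_s=50, mix=mix))
print(len(events))  # close to 60 * 50 = 3000 on average
```

Replaying such a stream against the trial environment exercises it with a volume and a type distribution comparable to production, provided the rate and the mix are measured from the real workload.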
Organising the Work (9/11)
Production Environment
• It’s the environment where the real services are delivered to the real users
• Its main characteristic must be perfect isolation from the other
environments (development, test and trial), if present. The other
environments are usually much less protected and reliable, and if the
isolation is not dependable the production environment may be
affected by problems occurring elsewhere
• The best isolation is achieved using two completely distinct Data Centres:
one for production and the other for development, test and trial
together. However, less expensive solutions that still work can be
designed using distinct hardware in the same site, or even distinct virtual
environments on the same hardware
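The path of a change through the environments can be sketched as an ordered list with a promotion rule. The function names and the "checks passed" flag are assumptions for illustration; the optional `available` argument models smaller Data Centres where some environments are missing.

```python
ENVIRONMENTS = ["development", "test", "trial", "production"]


def promotion_path(available=ENVIRONMENTS):
    """Return the environments a change passes through, in order."""
    return [env for env in ENVIRONMENTS if env in available]


def promote(current, passed_checks, available=ENVIRONMENTS):
    """Return the next environment for a change, or raise if it may not move."""
    path = promotion_path(available)
    if current not in path:
        raise ValueError(f"unknown environment: {current}")
    position = path.index(current)
    if position == len(path) - 1:
        raise ValueError(f"{current} is the last environment")
    if not passed_checks:
        raise ValueError(f"checks failed in {current}; the change stays there")
    return path[position + 1]


print(promote("development", True))   # test

# A smaller Data Centre without a trial environment.
small_site = ["development", "test", "production"]
print(promote("test", True, small_site))  # production
```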
Organising the Work (10/11)
The Software Lifecycle (1/2)
• The “Lifecycle” of the software is characterized by some typical
phases:
1. Design
2. Development
3. Test
4. Delivery and possible deployment
5. Error correction and functional changes
6. Disuse
• Software is usually delivered in successive “releases” and the phases
repeat cyclically, release after release
Organising the Work (11/11)
The Software Lifecycle (2/2)
• When replacing the current release of a software product with a new one, it is often
important to choose between two approaches: “phased” vs. “big-bang” delivery.
[Figure: delivery timelines with preparation steps labelled PREP-1, PREP-2 and PREP]
• Consider:
• Release preparation time
• Concurrent changes
• Interactions with other internal/external systems
• Test complexity
• Is the date your own choice? (… hardly ever!)
• The phased approach is generally less “painful” but requires more work
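One possible reading of the two approaches, sketched below, is to split the delivery targets (for example, branches) into waves: a single wave is a big-bang delivery, several waves are a phased one. The `delivery_plan` helper and the branch names are hypothetical; a phased delivery could equally be split by function rather than by target.

```python
def delivery_plan(targets, phases=1):
    """Split `targets` into `phases` waves; phases=1 is a big-bang delivery."""
    if phases < 1:
        raise ValueError("phases must be >= 1")
    phases = min(phases, len(targets)) or 1
    size, extra = divmod(len(targets), phases)
    waves, start = [], 0
    for i in range(phases):
        # The first `extra` waves take one more target each.
        end = start + size + (1 if i < extra else 0)
        waves.append(targets[start:end])
        start = end
    return waves


branches = [f"branch-{i}" for i in range(10)]
big_bang = delivery_plan(branches)
phased = delivery_plan(branches, phases=3)

print(len(big_bang))              # 1
print([len(w) for w in phased])   # [4, 3, 3]
```

Each extra wave is a further preparation and test cycle, which is where the additional work of the phased approach comes from.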
Organising the People (1/8)
• The teams of people working in a Data Centre are typically organized
with the following structure:
Management
Staff
Applications
Systems
Operations
Organising the People (2/8)
Applications
• They deal with the lifecycle of the Application Software
• It’s usually possible to distinguish two kinds of roles:
• Analysts: who analyze the users’ requests and design the general characteristics of
the software to be built. They choose the software’s functionality and its general
technical architecture as well (the tools to be used, the structure of the modules, etc.)
• Programmers: who, following the general design drawn up by the analysts, “write the
code”
• Usually, in a medium-to-large organization, the Applications “division” is
structured in two or more “departments”, one for each “Applications
Family” (for example, in a bank it’s usual to find the departments
“Accounts”, “Financial”, “Loans”, “Web-banking”, etc.)
Organising the People (3/8)
Systems (1/2)
• The people working in this division deal with the “Systems”, i.e. hardware,
system software and network. In a medium-to-large organization the division is
usually structured in three “departments”:
Network
Software
Hardware
• “Software” and “Hardware” usually deal with non-network SW & HW (i.e.
computers, storage, etc.), while “Network” deals with both the HW & SW of the
network. That’s because network components are more tightly
linked to each other than non-network ones.
Organising the People (4/8)
Systems (2/2)
• Each department, mainly in big organizations, may be structured in smaller,
highly specialized teams (for example, Software people may be organized in
teams dealing with operating systems, database systems, middleware,
etc.)
• In larger organizations there is usually a team dedicated to
“Peripheral Systems”. Sometimes it’s located inside the Network
department, sometimes not. It deals with systems outside the Data
Centre (e.g. personal computers, “branch servers”, etc.)
• For the Systems specialists too, just as for the Applications ones, it’s
usually possible to distinguish between System Analysts (dealing with the
general structure of the systems they manage) and System Programmers
(with more technical and operational skills)
Organising the People (5/8)
Operations (1/2)
• While the Applications and Systems divisions deal with the design,
development and maintenance of the Data Centre components, the
Operations division is responsible for its day-to-day functioning.
• The Operations division is responsible for the “Service Levels”
negotiated with the users, in terms of service time, performance,
problem resolution times, etc.
• Because of these responsibilities, the Operations division must be the
“only and absolute owner” of the production environment. No one
else can apply any change to production components without the
Operations division’s authorization.
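The ownership rule can be sketched as a gate in front of the production environment: a change is applied only if Operations has authorized it, whoever applies it. The class, the change identifier and the division names used as strings are assumptions for illustration.

```python
class ProductionEnvironment:
    """Production components; only Operations may authorize changes to them."""

    def __init__(self):
        self._authorized = set()   # change ids approved by Operations
        self.applied = []          # (change_id, applied_by) pairs

    def authorize(self, change_id, approver_division):
        if approver_division != "Operations":
            raise PermissionError("only the Operations division can authorize")
        self._authorized.add(change_id)

    def apply(self, change_id, applied_by):
        if change_id not in self._authorized:
            raise PermissionError(f"{change_id} lacks Operations authorization")
        self.applied.append((change_id, applied_by))


production = ProductionEnvironment()

try:
    production.apply("CHG-001", applied_by="Systems")
except PermissionError as error:
    print(error)   # CHG-001 lacks Operations authorization

production.authorize("CHG-001", approver_division="Operations")
production.apply("CHG-001", applied_by="Systems")
print(production.applied)   # [('CHG-001', 'Systems')]
```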
Organising the People (6/8)
Operations (2/2)
• The Operations division too, in medium-to-large organizations, is often
structured in smaller teams. Usually:
• Computer Room: deals with starting and stopping systems and applications and
keeping them working properly. This team usually works 24/7
• Storage: deals with data maintenance and data recovery
• NOC (“Network Operation Centre”): deals with the functioning of the network
components
• Help Desk: responsible for the communications between the users and the
Data Centre. The Help Desk phone number must be the only one dialed by
the users to report malfunctions or other problems
Organising the People (7/8)
Staff (1/2)
• The “Staff” is not always present (but it certainly is in larger
organizations) and consists of one or more teams with miscellaneous
tasks. These tasks have two characteristics:
• They concern the whole Data Centre (i.e. they cut across several or all of its
components or functions)
• The Data Centre Management must have full and direct view and control over
them (which is why the Staff teams report directly to
the management)
• Each Staff team is generally very small, made up of two or three
professionals highly skilled in the matter they deal with
Organising the People (8/8)
Staff (2/2)
• Some Staff teams that are usually (though not always) present are:
• Security: deals with the physical and logical security systems, user authentication
and authorization, etc. When present, this team usually deals with Disaster
Recovery systems and procedures as well
• Procurement: deals with the whole procurement life-cycle, including preparing the
cost budget, negotiating with suppliers (sometimes by means of specific
invitations to tender), drawing up and monitoring contracts, supervising
payments, etc.
• Standards and Documentation: a team responsible for setting, maintaining and documenting
all the “working rules” for the functioning of the Data Centre. For example: what the
responsibilities of each team in each division are, which technical architectures and tools
are eligible as “Standard”, what the “naming conventions” are for all the Data Centre
components, etc.
Data Centres actually …
… a few numbers about environments …
• The case of a medium-to-large Italian public administration (P.A.) Data Centre
… and an example of extraordinary project …
• A merger of 2 firms: Application unification and Site consolidation
Environments in a medium-to-large Italian P.A.
Data Centre (1/3)
• The Site:
Environments in a medium-to-large Italian P.A.
Data Centre (2/3)
• The Hardware:
Environments in a medium-to-large Italian P.A.
Data Centre (3/3)
• The Environments:
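[Table: virtual environments by platform (AIX, Windows, Linux, Mainframe) and by type (Production, Test / Trial, Service (VM)); cell positions are not recoverable]
AIX / Windows / Linux values, in source order: 246, 101, 192, 24, 49, ---
Mainframe values, in source order: 1, ---, 1
Total: 614 virtual environments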
2 Firms merge: Application unification and
Site consolidation – (1/3)
Application unification:
• From 2 different Application Systems to a unified one
• “Application System” unification means “Application Software” + “System
Software” unification
• Usually the unified system is x% of the Firm-A system + y% of the Firm-B system +
z% brand-new
Site consolidation:
• From 2 different Sites to a unified one
• Usually the unified site is either the Firm-A or the Firm-B site; a
third, brand-new site is very rare
2 Firms merge: Application unification and
Site consolidation – (2/3)
Strategies (PRO/CON): (A) appl.unif. & then site cons. VS (B)
site cons. & then appl.unif.
(A) appl.unif. → site cons.
PRO
• The Unified Site sizing is exactly equal to the sum of the starting Sites
as they are at the end of the Appl. Unification
CON
• The Consolidation savings can be achieved only after the Appl.
Unification, losing this benefit for many months
(B) site cons. → appl.unif.
PRO
• The Consolidation savings can be achieved a few months after the start-up
• The Appl. Unification process is easier if carried out in one single Site
• People integration is promoted immediately and the whole process will run
faster
CON
• The Unified Site sizing must somewhat exceed the sum of the starting Sites
as they are before the Appl. Unification
2 Firms merge: Application unification and
Site consolidation – (3/3)
Consider:
• HW & SW equipment in peripheral branches (Application Unification
usually requires a mass upgrade or replacement, which is a lengthy
process)
• People education, both in the Data Centres and in peripheral
branches: the latter may require many months
• Conversion of one of the original sites into a Disaster Recovery site or
a development/test site (or both)