s01 - UBC ECE

(Slides are taken from the presentations by
Alan Ganek, Alfred Spector, Jeff Kephart of IBM)
Trillions of heterogeneous
computing devices connected to
the Internet
Dream of Pervasive Computing …
or Nightmare!
Core of the Problem
• Complexity in the systems themselves and in the operating
environment
– As systems become more interconnected and
diverse, architects are less able to anticipate and
design interactions among components
– So decisions are pushed to run time (late binding),
e.g., hot-plug, JVM, JIT compilation, service
discovery, mobile agents, …
• Complexity management means human intervention
and IT costs
Need Complexity Management
• But complexity is beyond what humans can
handle
– Take the human out of the control loop: autonomic
• Even though we are moving along this
direction, is there any systematic way of
addressing this issue?
• Autonomic Computing
Complex Heterogeneous Infrastructures
Are a Reality!
[Figure: a typical enterprise infrastructure with dozens of systems and applications, hundreds of components, and thousands of tuning parameters. Labeled parts: Directory and Security Services; Existing Applications and Data; Business Data; DNS Server; Web Server; Data Server; Web Application Server; Storage Area Network; Data; BPs and External Services.]
Industry Trends
• Administration of systems is increasingly difficult
– 100s of configuration, tuning parameters for DB2
• Heterogeneous systems are increasingly connected
– Integration becoming ever more difficult
• Architects can't plan interactions among
components
– Increasingly dynamic; frequently with unanticipated
components
• More burden must be assumed at run time
– But human administrators can't assume the burden
• 6:1 cost ratio between storage administration and storage
• 40% of outages are due to operator error
• Need self-managing computing systems
– Behavior specified by sys admins via high-level policies
– System and its components figure out how to carry out
policies
Autonomic Computing Vision
• “Intelligent” open systems that…
– Manage complexity
– “Know” themselves
– Continuously tune themselves
– Adapt to unpredictable conditions
– Prevent and recover from failures
– Provide a safe environment
• Self-management:
– free administrators from details of operations
– provide peak performance 24/7
– Concentrate on high-level decisions and policies
Self-managing Systems That …
• Increase Responsiveness
– Adapt to dynamically changing environments
• Business Resiliency
– Discover, diagnose, and act to prevent disruptions
• Operational Efficiency
– Tune resources and balance workloads to maximize use of IT resources
• Secure Information and Resources
– Anticipate, detect, identify, and protect against attacks
(Diagram label: Aware/Proactive)
Self-Configuring Example:
DB2 Configuration Advisor
Self-Healing Example:
IBM Electronic Service Agent
Self Optimizing:
Enterprise Workload Management
[Figure: heterogeneous, distributed components working together. Labeled parts: Web Application Servers; Data and Transaction Servers; Appliance Servers; Internet; Business Partners; Internet/Extranet.]
Self-tuning, end-to-end performance management
• Dynamic allocation of network resources
• Workload balancing & routing
• Cross platform reporting
• Policy-based for various classes of users & applications
Self-Protecting Example:
IBM Tivoli Risk Manager
• Rapid / automated analysis of complex situations
• Automate incident response
• Protect systems and data
• Help prevent service disruptions
[Figure labels: Risk Manager; Security Event Correlation Engine; Event Database; Application Server; Risk Mgr IDS Rules; Intrusion Detection System (IDS); Intrusion Detection; Firewall; Intranet; Web Server; Router; Internet.]
"The Tivoli security management software portfolio is
helping our clients extend their businesses to the
Internet while providing security and privacy..."
Mark Ford, Principal, Deloitte & Touche
Evolving towards Self-management
• Self-configure
– Today: Corporate data centers are multi-vendor, multi-platform. Installing, configuring, integrating systems is time-consuming, error-prone.
– The Autonomic Future: Automated configuration of components, systems according to high-level policies; rest of system adjusts automatically. Seamless, like adding new cell to body or new individual to population.
• Self-heal
– Today: Problem determination in large, complex systems can take a team of programmers weeks.
– The Autonomic Future: Automated detection, diagnosis, and repair of localized software/hardware problems.
• Self-optimize
– Today: Software stacks (e.g., DB2) have hundreds of nonlinear tuning parameters; many new ones with each release.
– The Autonomic Future: Components and systems will continually seek opportunities to improve their own performance and efficiency.
• Self-protect
– Today: Manual detection and recovery from attacks and cascading failures.
– The Autonomic Future: Automated defense against malicious attacks or cascading failures; use early warning to anticipate and prevent system-wide failures.
Evolving to Autonomic Computing
(Five levels, from Manual to Autonomic)
• Level 1: Basic
– Characteristics: Multiple sources of system generated data
– Skills: Requires extensive, highly skilled IT staff
– Benefits: Basic requirements met
• Level 2: Managed
– Characteristics: Consolidation of data and actions through management tools
– Skills: IT staff analyzes and takes actions
– Benefits: Greater system awareness; improved productivity
• Level 3: Predictive
– Characteristics: System monitors, correlates and recommends actions
– Skills: IT staff approves and initiates actions
– Benefits: Reduced dependency on deep skills; faster/better decision making
• Level 4: Adaptive
– Characteristics: System monitors, correlates and takes action
– Skills: IT staff manages performance against SLAs
– Benefits: Balanced human/system interaction; IT agility and resiliency
• Level 5: Autonomic
– Characteristics: Integrated components dynamically managed by business rules/policies
– Skills: IT staff focuses on enabling business needs
– Benefits: Business policy drives IT management; business agility and resiliency
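The practical difference between the levels is who acts on a finding: staff alone (Levels 1 and 2), staff on the system's recommendation (Level 3), or the system itself (Levels 4 and 5). The following is only an illustrative sketch; the `Maturity` enum, the `handle_finding` function, and its return strings are hypothetical names, not part of any IBM product.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """The five maturity levels named on the slide."""
    BASIC = 1
    MANAGED = 2
    PREDICTIVE = 3
    ADAPTIVE = 4
    AUTONOMIC = 5


def handle_finding(level: Maturity, recommendation: str,
                   staff_approves: bool = False) -> str:
    """Describe what happens to a recommended action at a given level."""
    if level <= Maturity.MANAGED:
        # Levels 1-2: IT staff analyzes the data and takes actions itself.
        return "staff analyzes and acts manually"
    if level == Maturity.PREDICTIVE:
        # Level 3: the system recommends; IT staff approves and initiates.
        if staff_approves:
            return f"staff initiated: {recommendation}"
        return f"recommended, awaiting approval: {recommendation}"
    # Levels 4-5: the system takes the action itself.
    return f"system executed: {recommendation}"
```

For example, `handle_finding(Maturity.PREDICTIVE, "add index")` leaves the action pending until a person approves it, while the same call at `Maturity.ADAPTIVE` carries it out.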
IBM’s Architecture Model
• Intelligent control loop:
– Implementing self-managing attributes involves
an intelligent control loop
Control Loops Delivered in 2 Ways
Combinations of
Management Tools
Resource
Provider
Autonomic Element - Structure
• Fundamental atom of the architecture
– Managed element(s)
• Database, storage
– Autonomic manager
• Responsible for:
– Providing its service
– Managing own behavior in accordance with policies
– Interacting with other autonomic elements
[Figure: An Autonomic Element. An Autonomic Manager (Monitor, Analyze, Plan, Execute, Knowledge) with its own Sensors and Effectors sits above a Managed Element, which also exposes Sensors and Effectors.]
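The structure of an autonomic element can be sketched as a small control loop: the manager monitors the managed element through a sensor, analyzes the reading against a policy held in its knowledge, plans a change, and executes it through an effector. This is a minimal sketch under stated assumptions: the worker-pool resource, the `max_load` policy, and all class and method names are hypothetical, chosen only to illustrate the loop.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedElement:
    """Hypothetical managed resource, e.g. a pool of worker processes."""
    load: float = 0.5   # fraction of capacity in use, read via the sensor
    workers: int = 2    # changed via the effector

    def sensor(self) -> dict:
        return {"load": self.load, "workers": self.workers}

    def effector(self, workers: int) -> None:
        # The same demand is spread over the new number of workers.
        demand = self.load * self.workers
        self.workers = workers
        self.load = demand / workers


@dataclass
class AutonomicManager:
    element: ManagedElement
    # Knowledge: the policy plus a history shared by all four phases.
    knowledge: dict = field(
        default_factory=lambda: {"max_load": 0.8, "history": []})

    def monitor(self) -> dict:
        reading = self.element.sensor()
        self.knowledge["history"].append(reading)
        return reading

    def analyze(self, reading: dict) -> bool:
        # True when the policy threshold is exceeded.
        return reading["load"] > self.knowledge["max_load"]

    def plan(self, reading: dict) -> int:
        # Simplest possible plan: add one worker.
        return reading["workers"] + 1

    def execute(self, workers: int) -> None:
        self.element.effector(workers)

    def run_once(self) -> bool:
        """One pass of the loop; returns True if an action was taken."""
        reading = self.monitor()
        if self.analyze(reading):
            self.execute(self.plan(reading))
            return True
        return False
```

Calling `run_once()` repeatedly (for instance on a timer) keeps the element within policy without an administrator in the loop; the administrator only sets `max_load`.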
Autonomic Manager Substructure
• Interfaces
– Alerts, events & problem analysis request interface
– SLA/Policy interface, interprets & translates into "control logic"
• Sensors, Effectors
• Monitor: Filters, Simple Correlators, Metric Managers
• Analyze: Analysis Engines, Rules Engines
• Plan: Policy Interpreter, Policy Validations, Policy Transforms, Policy Resolution, Plan Generators
• Execute: Workflow Engine, Service Dispatcher, Scheduler Engine, Distribution Engine
• Knowledge: Topology, Calendar, Recent Activity Log, Policy
Autonomic Elements - Interaction
• Relationships
– Dynamic, ephemeral
– Formed by agreement
• May be negotiated
– Full spectrum
• Peer-to-peer
• Hierarchical
– Subject to policies
Multiple Contexts for Autonomic
Behavior
• Business Solutions (Business Policies, Processes, Contracts)
– Customer Relationship Management, Enterprise Resource Planning
• Groups of Elements (Inter-element self-management)
– Server Farm, Enterprise Network, Storage Pool
• System Elements (Intra-element self-management)
– Database, Network, Servers, Storage, Applications, Devices, Middleware
Levels of Maturity
Autonomic Computing Requires Core
Technologies
• Enabled capabilities
– Solution Management
– Dynamic Provisioning
– Auto-Update
– End-to-end Problem Determination
– Heterogeneous Workload Management
– Policy-based Management
– Automated Root Cause Analysis
– Identity/Security Management
– Auto-Detection
• Core technologies
– Data Collection (Logging/Tracing)
– Install/Dependency Management
– Administrative Console Infrastructure
– Policy Infrastructure
– Provisioning
Integrated Solutions Console for
Common System Administration
Customer pain point:
Complexity of operations
• Value:
– One consistent interface
across product portfolio
– Common runtime
infrastructure and
development tools based
on industry standards,
component reuse
– Provides a presentation
framework for other
autonomic core technologies
Standards-based:
J2EE, JSR168
Log and Trace Tool for Problem
Determination
Customer pain point:
Difficulty in analyzing problems in multi-component systems
• Value:
– Introduces standard
interfaces and formats for
logging and tracing
– Central point of interaction
with multiple data sources
– Correlated views of data
– Reduced time spent in
problem analysis
[Figure labels: Data Producers (A Log, B Log, eServer Log); Parser or Embedded adapter per log; Common situations and data model; Logging Agents; Standard Interface; Collectors; Data Store; Data Exploiters (Analysis Engine, Viewer, ISC).]
Standards-based:
JSR47, Apache
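The idea of parsers that map product-specific logs onto a common data model, so that one viewer can show a correlated view, can be sketched as follows. This is an illustration only: the two log formats, the `CommonEvent` fields, and the function names are hypothetical and do not reproduce the actual tool's formats or interfaces.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonEvent:
    """Hypothetical common data model shared by all log sources."""
    timestamp: str  # ISO 8601, so events sort chronologically as strings
    source: str
    severity: str
    message: str


def parse_a(line: str) -> CommonEvent:
    # Hypothetical format A: "<timestamp> <severity> <message>"
    timestamp, severity, message = line.split(" ", 2)
    return CommonEvent(timestamp, "A", severity, message)


# Hypothetical format B: "[<severity>] <timestamp> | <message>"
_FORMAT_B = re.compile(r"\[(\w+)\] (\S+) \| (.*)")


def parse_b(line: str) -> CommonEvent:
    match = _FORMAT_B.fullmatch(line)
    if match is None:
        raise ValueError(f"not a format-B line: {line!r}")
    severity, timestamp, message = match.groups()
    return CommonEvent(timestamp, "B", severity, message)


def correlated_view(log_a: list[str], log_b: list[str]) -> list[CommonEvent]:
    """Merge both logs into one time-ordered list of common events."""
    events = [parse_a(line) for line in log_a]
    events += [parse_b(line) for line in log_b]
    return sorted(events, key=lambda event: event.timestamp)
```

Once every source is expressed in the same model, analysis code only needs to understand `CommonEvent`, not each product's native log layout.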
Install/Config Package for new
Solutions
Customer pain point:
Difficulty of deployment in complex systems
• Value:
– One consistent software installation
technology across all products
– Consistent and up-to-date configuration
and dependency data, key to building
self-configuring autonomic systems
– Reduced deployment time with fewer errors
– Reduced software maintenance time,
improved analysis of failed system
components
– Component-based install for IBM and non-IBM products
Standards-based: OGSA, Web Services
Partnering with InstallShield
Package Structure (produced by the install package developer):
• GUI Interface
• Meta-Data: Deployment Descriptor
– Name, UUID, Vendor, Version
– Configuration Properties: Install Input, Runtime Attributes
– Dependencies: HW, SW, OS, Configuration
– Extensions: Install Actions, Verification Actions, Configuration Actions
• Product Files (binaries, etc.)
• Custom Extensions
– Dependency Checkers
– Install Actions
– Verification Actions
– Configuration Actions
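How dependency data in a deployment descriptor supports self-configuration can be illustrated with a small dependency checker that runs before any install action. This is a sketch under stated assumptions: the `Descriptor` fields and the `check_dependencies` function are hypothetical simplifications, not the real package format.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """Hypothetical, much-reduced deployment descriptor."""
    name: str
    version: tuple
    # Maps a required package name to the minimum version it must have.
    dependencies: dict = field(default_factory=dict)


def check_dependencies(package: Descriptor, installed: dict) -> list:
    """Return a list of problems; an empty list means it is safe to install.

    `installed` maps package names to the Descriptor already on the system.
    """
    problems = []
    for dep, minimum in package.dependencies.items():
        found = installed.get(dep)
        if found is None:
            problems.append(f"{dep} is not installed")
        elif found.version < minimum:
            problems.append(
                f"{dep} {found.version} is older than required {minimum}")
    return problems
```

Because the check reads only descriptor data, the same checker works for any product that ships a descriptor, which is the point of a single installation technology across products.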
Policy Tools for Policy-based Management
Customer pain point:
Complexity of product and systems management
• Value:
– Uniform cross-product policy
definition and management
infrastructure, needed for
delivering system-wide self-management capabilities
– Simplifies management of
multiple products; reduced
TCO
– Easier to dynamically change
configuration in on-demand
environment
[Figure labels: Definition; Validation; Local Repository; Distribution (push or pull); Enforcement Points; Resources; Monitor; Analysis; Facts; Activate; Adaptation; Implement.]
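The policy path named on this slide (definition, validation, distribution, enforcement at each resource) can be sketched in a few lines. Everything here is an assumption made for illustration: the threshold-style `Policy`, the `EnforcementPoint` class, and the push-style `distribute` function are hypothetical and do not describe the actual policy tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Definition: a hypothetical threshold policy on one metric."""
    name: str
    metric: str
    max_value: float


def validate(policy: Policy, known_metrics: set) -> None:
    """Validation: reject policies that no resource could evaluate."""
    if policy.metric not in known_metrics:
        raise ValueError(f"unknown metric {policy.metric!r}")
    if policy.max_value <= 0:
        raise ValueError("max_value must be positive")


class EnforcementPoint:
    """Enforcement: evaluates received policies against a resource's facts."""

    def __init__(self, resource_name: str):
        self.resource_name = resource_name
        self.policies = []

    def receive(self, policy: Policy) -> None:
        self.policies.append(policy)

    def violations(self, facts: dict) -> list:
        # A missing metric is treated as 0.0, i.e. not in violation.
        return [p.name for p in self.policies
                if facts.get(p.metric, 0.0) > p.max_value]


def distribute(policy: Policy, points: list, known_metrics: set) -> None:
    """Distribution (push): validate once, then send to every point."""
    validate(policy, known_metrics)
    for point in points:
        point.receive(policy)
```

Validating before distribution means an invalid policy never reaches any enforcement point, so every resource sees the same consistent set of policies.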
Technologies for Implementing Autonomic
Managers
Customer pain point: How to implement end-to-end
autonomic solutions
Value:
• Components to simplify the incorporation of
autonomic functions into applications
– Building blocks for self-management
– Monitoring, analysis, planning and execution
components
– Including autonomic computing technologies,
grid tools, and services
• Pluggable
– Defines interfaces and provides
implementations for each major toolkit
component
Standards-based:
OGSA, W3C
Summary of Autonomic Computing
Architecture
• Based on a distributed, service-oriented
architectural approach
– Every component provides or consumes services
– Policy-based management
• Autonomic elements
– Make every component resilient, robust, self-managing
– Behavior is specified and driven by policies
• Relationships between autonomic elements
– Based on agreements established and maintained by
autonomic elements
– Governed by policies
– Give rise to resiliency, robustness, self-management of
system
The Metaphor
The autonomic nervous system acts without requiring our
conscious involvement
- when we run, it increases
our heart and breathing
rate
Integrating Biology and Information Technology:
The Autonomic Computing Metaphor
• Current programming paradigms, methods,
management tools are inadequate to handle the
scale, complexity, dynamism and heterogeneity of
emerging systems
• Nature has evolved to cope with scale, complexity,
heterogeneity, dynamism and unpredictability, lack
of guarantees
– self configuring, self adapting, self optimizing, self healing,
self protecting, highly decentralized, heterogeneous
architectures that work
• Goal of autonomic computing is to build a self-managing system that addresses these challenges
using high level guidance
– Unlike AI, duplication of human thought is not the ultimate
goal