QoS-driven Lifecycle Management of
Service-oriented Distributed Real-time &
Embedded Systems
Aniruddha Gokhale
www.dre.vanderbilt.edu/~gokhale
Assistant Professor
ISIS, Dept. of EECS
Vanderbilt University
Nashville, Tennessee
February 16th, 2006
www.dre.vanderbilt.edu
Service-oriented Style of Distributed Real-time & Embedded Systems
– Regulating & adapting to (dis)continuous
changes in runtime environments
• e.g., online prognostics, dependable
upgrades
– Satisfying tradeoffs between multiple
(often conflicting) QoS demands
• e.g., secure, real-time, reliable, etc.
– Satisfying QoS demands in face of
fluctuating and/or insufficient resources
• e.g., mobile ad hoc networks (MANETs)
Characteristics of SOA-style DRE Systems
• Manifestation of Service-Oriented Architectures (SOA) in the distributed
real-time & embedded (DRE) systems space
– Applications composed of one or more “operational strings” of services
– A service is a component or an assembly of components
– Dynamic (re)deployment of services into operational strings is necessary
– New class of QoS (performance + survivability) requirements
• Realized using enabling component middleware technologies, e.g., CCM,
.NET, and J2EE
QoS Issues for SOA-style DRE Systems
[Figure: component assembly C1 to C5 with a failover unit]
• Per-component concern – choice of implementation
– Depends on resources and on compatibility with other components in the assembly
• Communication concern – choice of communication mechanism used
• Assembly concerns – what components to assemble dynamically?
What order? What configurations end-to-end are valid?
• Failure recovery concern – what is the unit of failover?
• Sharing concern – shared components will need proactive survivability
since their failure affects several services simultaneously
• Availability concern – what is the degree of redundancy? What
replication styles to use? Does it apply to whole assembly?
• Deployment concern – how to select resources? Risk alleviation?
Tangled Concerns in SOA-style DRE Systems
• Demonstrates numerous tangled para-functional concerns
• Significant sources of variability that affect end-to-end QoS
(performance + survivability)
• Separation of concerns & managing variability is the key, at:
– Design-time
– Deployment-time
– Run-time
(1) Design-time Variability Management
in SOA-style DRE Systems
• Focus on Separation of Concerns
• “What if” Analysis
• Analytical methods
• Simulation methods
• Model-driven generative programming for “what if”
• Understanding the impact of individual concerns
• Students involved:
• Krishnakumar Balasubramanian, Jaiganesh Balasubramanian, Gan
Deng, Amogh Kavimandan, James Hill, Sumant Tambe, Arundhati
Kogekar, Dimple Kaul
Work partly supported by DARPA PCES program (PI),
DARPA ARMS Program, PI on subcontracts from
Lockheed Martin ATL, & NSF CSR-SMA Program, PI
Separation of Concerns using CoSMIC
[Figure: CoSMIC tool chain. Roles shown: Component Developer, Component
Assembler, Component Packager, Component Configurator, Deployment Planner,
Component Deployer, and System Analyzer. Numbered steps: (1) developing
(IDML & PICML), (2) assembles, (3) packages (PICML), (4) configures
(OCML, QoSML), (5) planning, (6) deployment, (7) analysis & benchmarking
(Cadena & BGML), (8) reconfiguration & replanning, (9) design feedback.
Frameworks shown: DAnCE Framework and RACE Framework.]
• Project lead and PI, DARPA PCES program
• CoSMIC project focuses on separation of deployment and
configuration concerns
• Model-driven generative programming framework
• Complementary technology to CIAO and DAnCE middleware
• www.dre.vanderbilt.edu/cosmic
Analysis & Benchmarking
• CoSMIC tools, e.g., PICML, used for separation of concerns in operational strings
• Captures the data model of the OMG D&C specification
• Synthesis of static deployment plans for DRE components
• New capabilities being added for static deployment planning
Work supported by DARPA PCES Program, PI
Case Study for “What if” Analysis: Virtual Router
• Network services need support for efficient (de)multiplexing, dispatching
and routing/forwarding
• e.g., VPN service provided by a virtual router
[Figure: VPN topology with customer edge (CE) routers in VPN1, VPN2, and
VPN3, virtual routers (VR) hosted at provider edge (PE) nodes, Level 1
service providers, a Level 2 service provider, Backbone 1, Backbone 2, and
a firewall. Annotations: "Multiple tunnels to customer edge or virtual
routers" and "Multiple tunnels to backbone or virtual routers".]
• Provides differentiated
services to customers,
e.g., prioritized service
• VPN setup messages must be efficiently (de)multiplexed, serviced,
and forwarded
• Implemented using
middleware
• Need to estimate
capacity of the system
at design-time
Problem boils down to capacity planning and estimating
performance of configured middleware
Performance Analysis of Reactor Pattern in VR
[Figure: customer edge (CE) routers of VPN1 connected to virtual routers (VR) at a provider edge (PE) node]
• Customers send VPN setup messages to router
• VPN setup messages manifest as events at the VR
• VR must service these events (e.g., resource
allocation) and honor the prioritized service, if any
• Accepted messages are forwarded
• Events could be dropped in overload conditions
The Reactor architectural pattern allows event-driven applications to
demultiplex & dispatch service requests that are delivered to an
application from one or more clients.
• Reactor pattern decouples the
detection, demultiplexing, & dispatching
of events from the handling of events
• Participants include the Reactor, Event
handle, Event demultiplexer, abstract
and concrete event handlers
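The participants listed above can be illustrated with a minimal sketch. It assumes Python's standard `selectors` and `socket` modules as a stand-in for the middleware; the `Reactor` class, `run_demo`, and the message contents are hypothetical names chosen for illustration and are not taken from the slides.

```python
import selectors
import socket


class Reactor:
    """Minimal single-threaded reactor on top of a select()-based demultiplexer."""

    def __init__(self):
        # SelectSelector wraps select(); it plays the event demultiplexer role.
        self._demux = selectors.SelectSelector()

    def register(self, handle, handler):
        # Associate a handle (here, a socket) with a concrete event handler.
        self._demux.register(handle, selectors.EVENT_READ, handler)

    def handle_events(self, timeout=0.1):
        # Detect ready handles, then dispatch each one to its handler.
        ready = self._demux.select(timeout)
        for key, _mask in ready:
            key.data(key.fileobj)
        return len(ready)

    def close(self):
        self._demux.close()


def run_demo():
    """Send one setup message per customer class and let the reactor dispatch them."""
    handled = []
    reactor = Reactor()
    pairs = [socket.socketpair() for _ in range(2)]
    for cls, (_client, server) in enumerate(pairs, start=1):
        # Concrete event handler: read the message and record the class served.
        reactor.register(server, lambda s, c=cls: handled.append((c, s.recv(64))))
    for cls, (client, _server) in enumerate(pairs, start=1):
        client.sendall(f"vpn-setup-{cls}".encode())
    while len(handled) < 2:
        reactor.handle_events()
    reactor.close()
    for client, server in pairs:
        client.close()
        server.close()
    return sorted(handled)
```

Calling `run_demo()` registers two handlers, sends one message to each, and returns the dispatched messages sorted by class. The handlers never touch `select()` directly, which is the decoupling the pattern is meant to provide.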
Modeling VR Capabilities in a Reactor
• Consider VPN service for two customer classes
– Reactor accepts and handles two types of input events
• Differentiated services for two classes
– Events are handled in prioritized order
[Figure: Model of a single-threaded, select-based reactor implementation.
Incoming events from the network arrive at a select-based event
demultiplexer with Poisson arrival rates λ1 and λ2, wait in buffers of
capacity N1 and N2, and are served by event handlers with exponential
service rates μ1 and μ2.]
• Each event type has a separate queue to hold the incoming events. Buffer
capacity for events of type one is N1 and of type two is N2.
• Event arrivals are Poisson for type one and type two events with rates
λ1 and λ2, resp.
• Event service time is exponential for type one and type two events with
rates μ1 and μ2, resp.
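The queueing assumptions above can be explored with a small stochastic simulation. The sketch below is a simplification, not the SRN model used in the following slides: it assumes a single server with non-preemptive priority for type one events and ignores the reactor's snapshot behavior, so its estimates are not expected to reproduce the SRN results. The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(lam=(0.5, 0.5), mu=(2.0, 2.0), cap=(5, 5), horizon=20000.0, seed=1):
    """Simulate two finite buffers served by one prioritized server.

    Returns a list of per-class (throughput, loss probability) estimates.
    """
    rng = random.Random(seed)
    queue = [0, 0]        # waiting events per class
    in_service = None     # class index being served, or None when idle
    arrived = [0, 0]
    lost = [0, 0]
    served = [0, 0]
    now = 0.0
    while True:
        # Competing exponential clocks: two arrival streams plus the active service.
        service_rate = mu[in_service] if in_service is not None else 0.0
        total = lam[0] + lam[1] + service_rate
        now += rng.expovariate(total)
        if now >= horizon:
            break
        pick = rng.random() * total
        if in_service is None or pick < lam[0] + lam[1]:
            cls = 0 if pick < lam[0] else 1
            arrived[cls] += 1
            if in_service is None:
                in_service = cls          # idle server takes the event directly
            elif queue[cls] < cap[cls]:
                queue[cls] += 1           # buffer the event
            else:
                lost[cls] += 1            # buffer full: drop the event
        else:
            served[in_service] += 1
            # Non-preemptive priority: type one (index 0) is always picked first.
            if queue[0] > 0:
                queue[0] -= 1
                in_service = 0
            elif queue[1] > 0:
                queue[1] -= 1
                in_service = 1
            else:
                in_service = None
    return [(served[c] / horizon, lost[c] / max(arrived[c], 1)) for c in range(2)]
```

Running `simulate(cap=(1, 1))` and `simulate(cap=(5, 5))` side by side gives a rough feel for how buffer capacity trades against event loss under these simplified assumptions.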
Performance Metrics of Interest for Reactor
• Throughput:
- Number of events that can be processed per unit time.
- Applications such as telecommunications call processing.
• Queue length:
- Queuing for the event handler queues.
- Appropriate scheduling policies for applications with real-time requirements.
• Total number of events:
- Total number of events in the system.
- Scheduling decisions.
- Resource provisioning required to sustain system demands.
• Probability of event loss:
- Events discarded due to lack of buffer space.
- Safety-critical systems.
- Levels of resource provisioning.
• Response time:
- Time taken to service the incoming event.
- Bounded response time for real-time systems.
Performance Analysis using Stochastic Reward Nets
[Figure: SRN notation legend (place, token, transition, immediate
transition, inhibitor arc) shown on the reactor model: (a) places B1, B2,
S1, S2 with transitions A1, A2, Sn1, Sn2, Sr1, Sr2 and buffer sizes N1, N2;
(b) places StSnpSht and SnpShtInProg with transitions T_SrvSnpSht and
T_EndSnpSht.]
• Stochastic Reward Nets (SRNs) are an extension of Generalized
Stochastic Petri Nets (GSPNs), which are an extension of Petri Nets.
• Extend the modeling power of GSPNs by allowing:
– Guard functions
– Marking-dependent arc multiplicities
– General transition probabilities
– Reward rates at the net level
• Allow model specification at a level closer to intuition.
• Solved using tools such as SPNP (Stochastic Petri Net Package).
Modeling the Reactor using SRN (1/2)
[Figure: SRN model of the reactor, parts (a) and (b), annotated with event
arrival, service queue, drop events on overflow, prioritized service,
servicing the event, and service completion.]
• Models arrivals, queuing, and prioritized service of events.
• Transitions A1 and A2: Event arrivals.
• Places B1 and B2: Buffers/queues.
• Places S1 and S2: Service of the events.
• Transitions Sr1 and Sr2: Service completions.
• Inhibitor arcs: From place B1 to transition A1 with multiplicity N1 (likewise B2, A2, N2)
- Prevents firing of transition A1 when there are N1 tokens in place B1.
• Inhibitor arc from place S1 to transition Sr2:
- Offers prioritized service to an event of type one over an event of type two.
- Prevents firing of transition Sr2 when there is a token in place S1.
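The enabling rules for arrivals and service completions can be written as predicates over a marking. This is a sketch of the rules as stated on the slide, not SPNP input; it additionally assumes that Sr1 and Sr2 each need a token in S1 and S2 respectively, which the slide implies but does not state. The function name and the dictionary-based marking are hypothetical.

```python
def enabled_timed_transitions(marking, n1, n2):
    """Return the arrival/completion transitions enabled in a marking.

    marking maps place names (B1, B2, S1, S2) to token counts.
    """
    enabled = set()
    # Inhibitor arc B1 -> A1 with multiplicity N1: A1 is blocked at N1 tokens.
    if marking["B1"] < n1:
        enabled.add("A1")
    # Same rule for the second event type (B2, A2, N2).
    if marking["B2"] < n2:
        enabled.add("A2")
    # Assumption: a service completion needs an event in service.
    if marking["S1"] > 0:
        enabled.add("Sr1")
    # Inhibitor arc S1 -> Sr2: type two cannot complete while type one is in S1.
    if marking["S2"] > 0 and marking["S1"] == 0:
        enabled.add("Sr2")
    return enabled
```

For example, with a full type one buffer and tokens in both S1 and S2, only A2 and Sr1 remain enabled, which is how the net drops overflow arrivals and prioritizes type one service.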
Modeling the Reactor using SRN (2/2)
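[Figure: SRN model of the reactor: (a) arrival, buffer, and service subnet
(A1, A2, B1, B2, Sn1, Sn2, S1, S2, Sr1, Sr2, N1, N2); (b) snapshot subnet
(StSnpSht, T_SrvSnpSht, SnpShtInProg, T_EndSnpSht).]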
• Process of taking successive snapshots
• Reactor waits for new events when currently enabled events are
handled
• Sn1 enabled: Token in StSnpSht & Tokens in B1 & No Token in S1.
• Sn2 enabled: Token in StSnpSht & Tokens in B2 & No Token in S2.
• T_SrvSnpSht enabled: Token in S1 and/or S2.
• T_EndSnpSht enabled: No token in S1 and S2.
• Sn1 and Sn2 have same priority
• T_SrvSnpSht lower priority than Sn1 and Sn2
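The snapshot rules above can be sketched as a function that returns the enabled immediate transitions for a marking. Two details are assumptions read off the figure rather than stated in the bullets: T_SrvSnpSht is taken to need a token in StSnpSht, and T_EndSnpSht a token in SnpShtInProg. The function name and marking representation are hypothetical.

```python
def enabled_snapshot_transitions(marking):
    """Return the snapshot-related transitions enabled in a marking.

    marking maps place names (StSnpSht, SnpShtInProg, B1, B2, S1, S2)
    to token counts.
    """
    start = marking["StSnpSht"] > 0
    selected = set()
    # Sn1/Sn2: snapshot started, events buffered, and no event of that type in service.
    if start and marking["B1"] > 0 and marking["S1"] == 0:
        selected.add("Sn1")
    if start and marking["B2"] > 0 and marking["S2"] == 0:
        selected.add("Sn2")
    if selected:
        # Sn1 and Sn2 share one priority level and outrank T_SrvSnpSht.
        return selected
    in_service = marking["S1"] > 0 or marking["S2"] > 0
    # Assumption: T_SrvSnpSht consumes the StSnpSht token.
    if start and in_service:
        return {"T_SrvSnpSht"}
    # Assumption: T_EndSnpSht consumes the SnpShtInProg token.
    if marking["SnpShtInProg"] > 0 and not in_service:
        return {"T_EndSnpSht"}
    return set()
```

The early return encodes the priority rule: servicing of a snapshot starts only once no further events can be moved into service.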
VR SRN: Performance Estimates
• SRN model solved using Stochastic Petri Net Package (SPNP) to obtain
estimates of performance metrics.
• Parameter values: λ1 = 0.5/sec, λ2 = 0.5/sec, μ1 = 2.0/sec, μ2 = 2.0/sec.
• Two cases: N1 = N2 = 1, and N1 = N2 = 5.

Perf. metric     N1 = N2 = 1           N1 = N2 = 5
                 #1        #2          #1        #2
Throughput       0.37/s    0.37/s      0.40/s    0.40/s
Queue length     0.065     0.065       0.12      0.12
Total events     0.25      0.27        0.32      0.35
Loss probab.     0.065     0.065       0.00026   0.00026

Observations:
• Probability of event loss is higher when the buffer space is 1
• Total number of events of type two is higher than type one.
• Events of type two stay in the system longer than events of type one.
• May degrade the response time of event requests for class 2 customers
compared to requests from class 1 customers
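The response-time observation can be checked with Little's law. This is a back-of-the-envelope sketch that assumes the "Total events" row is the mean number of events in the system and the "Throughput" row is the rate of accepted events; the symbols R_i, E_i, and T_i are introduced here for illustration.

```latex
R_i = \frac{E_i}{T_i}
\qquad\text{(mean response time = mean events in system / throughput)}

N_1 = N_2 = 1:\quad
R_1 \approx \frac{0.25}{0.37} \approx 0.68\ \text{s},\qquad
R_2 \approx \frac{0.27}{0.37} \approx 0.73\ \text{s}

N_1 = N_2 = 5:\quad
R_1 \approx \frac{0.32}{0.40} = 0.80\ \text{s},\qquad
R_2 \approx \frac{0.35}{0.40} \approx 0.88\ \text{s}
```

Under these assumptions, type two events spend somewhat longer in the system than type one events in both cases, consistent with the prioritized service given to type one.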
VR SRN: Sensitivity Analysis
• Analyze the sensitivity of performance metrics to variations in input
parameter values.
• Vary λ1 from 0.5/sec to 2.0/sec.
• Values of other parameters: λ2 = 0.5/sec, μ1 = 2.0/sec, μ2 = 2.0/sec, N1 = N2 = 5.
• Compute performance measures for each one of the input values.
[Figure: Throughput (y-axis, 0 to 1.6) plotted against Lambda1 (x-axis, tick labels 0.4 to 2)]
Observations:
• Throughput of event requests from customer class #1 increases, but rate
of increase declines.
• Throughput of event requests from customer class #2 remains
unchanged.
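The diminishing rate of increase for class #1 is the behavior expected of any finite-buffer queue approaching saturation. The sketch below illustrates it with the closed-form M/M/1/K model, which is a swapped-in stand-in and not the SRN: it treats class #1 in isolation, ignores class #2 and the snapshot mechanism, and assumes K = 5 to mirror N1 = 5. The function names are hypothetical.

```python
def mm1k_throughput(lam, mu, k):
    """Accepted-event rate of an M/M/1/K queue (K = max events in the system)."""
    rho = lam / mu
    if rho == 1.0:
        # Limit of the blocking formula as rho -> 1.
        p_block = 1.0 / (k + 1)
    else:
        p_block = (1.0 - rho) * rho**k / (1.0 - rho ** (k + 1))
    return lam * (1.0 - p_block)


def sweep(lams, mu=2.0, k=5):
    """Evaluate throughput for each arrival rate in lams."""
    return [(lam, mm1k_throughput(lam, mu, k)) for lam in lams]


if __name__ == "__main__":
    for lam, thr in sweep([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]):
        print(f"lambda1={lam:.2f}/s  throughput={thr:.3f}/s")
```

As the arrival rate approaches the service rate, a growing share of arrivals is blocked, so each additional unit of offered load yields less additional throughput.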
Middleware Pattern Simulations in OMNeT++
• OMNeT++ is a discrete event simulator for networked systems
• Developers write C++ code for simulation
• www.omnetpp.org
[Figure: OMNeT++ simulation workflow. Elements shown: .ned files (Mod,
Submod1, Submod2), module sources (Mod_n.h/.cpp, Submod1.h/.cpp,
Submod2.h/.cpp), OMNeT++ initialization file, OMNeT++ message file,
simulation kernel, UI library, statistics, output vector file, output
scalar file, visualization and animation.]
The Simulation Model for Reactor
[Figure: simulation model modules: Event Generator, Synchronous Event
Demultiplexer, Reactor, Event Handlers with queues, Statistics Collector]
Addressing Middleware Variability Challenges
Although middleware provides reusable building blocks that
capture commonalities, these blocks and their compositions
incur variabilities that impact performance in significant ways.
• Compositional Variability
  • Incurred due to variations in the compositions of these building blocks
  • Need to address compatibility in the compositions and individual
    configurations
  • Dictated by needs of the domain
  • E.g., Leader-Follower makes no sense in a single-threaded Reactor
• Per-Block Configuration Variability
  • Incurred due to variations in implementations & configurations for a
    patterns-based building block
  • E.g., single-threaded versus thread-pool based reactor implementation
    dimension that crosscuts the event demultiplexing strategy (e.g.,
    select, poll, WaitForMultipleObjects)
[Figure: Reactor feature diagram. Event handling strategy: single threaded,
thread pool. Event demultiplexing strategy: select, poll,
WaitForMultipleObjects, Qt, Tk.]
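The two kinds of variability can be made concrete by enumerating a configuration space and filtering out incompatible compositions. The sketch below uses the strategies named on the slide plus one hypothetical axis (`CONCURRENCY_PATTERNS`) to express the Leader-Follower example; the only compatibility rule encoded is the one the slide gives.

```python
from itertools import product

EVENT_HANDLING = ("single-threaded", "thread-pool")
DEMUX = ("select", "poll", "WaitForMultipleObjects", "Qt", "Tk")
# Hypothetical axis used only to express the Leader-Follower example.
CONCURRENCY_PATTERNS = ("none", "leader-follower")


def is_valid(handling, demux, pattern):
    """Compositional rule from the slide: no Leader-Follower on a single thread."""
    return not (handling == "single-threaded" and pattern == "leader-follower")


def valid_configurations():
    """Enumerate per-block choices, keeping only compatible compositions."""
    space = product(EVENT_HANDLING, DEMUX, CONCURRENCY_PATTERNS)
    return [cfg for cfg in space if is_valid(*cfg)]
```

Each surviving tuple is one candidate configuration whose performance would need to be estimated, which is why the number of "what if" cases grows quickly with every added dimension.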
Automation Goals for “What if” Analysis
Applying design-time performance analysis techniques to estimate
the impact of variability in middleware-based DRE systems
[Figure: invariant models of patterns have variability woven in to produce
refined models of patterns; the refined models are composed into a composed
system model; plots show system response against workload for a refined
pattern model and for the composed system]
• Build and validate performance
models for invariant parts of
middleware building blocks
• Weaving of variability concerns
manifested in a building block into
the performance models
• Compose and validate
performance models of building
blocks mirroring the anticipated
software design of DRE systems
• Estimate end-to-end performance
of composed system
• Iterate until design meets
performance requirements
Automating & Scaling the “What if” Process
• Model-driven Generative technologies
• Developed the SRN Modeling Language (SRNML) in GME
• Applied C-SAW framework (from Univ of Alabama, Birmingham) for
model scalability
R&D supported by NSF CSR-SMA Program in collaboration with
Dr. Jeff Gray (UAB) and Dr. Swapna Gokhale (UConn)
Analyzing Impact of Individual Concerns
Engineering Mechanics – Statics & Dynamics – for analyzing impact of concerns?
• Borrow concepts from physical systems to analyze the impact of individual
concerns on end-to-end system
• Method of joints, method of sections, free body diagrams, equilibrium
conditions
Engineering Mechanics for DRE Systems
[Figure: component assembly C1 to C5 with a failover unit]
A concern is viewed as a “force”
Challenges
• Directionality – are concerns vectors?
• Rigidity – are assemblies rigid or deformable?
• Force distribution – does a concern have components along Cartesian
axes?
• Well-defined structures – do software components have properties like
trusses?
• Second order effects – transient effects showing up elsewhere
• Notion of friction – these are probably the capacities of resources
(2) Deployment-time Intelligence
• Near optimal deployment planning decisions
• Specialized middleware stacks
• Students involved:
• Arvind Krishna (graduated), Jaiganesh Balasubramanian, Gan
Deng, Dimple Kaul, Arundhati Kogekar, Amogh Kavimandan
Work partly supported by DARPA ARMS Program,
PI on subcontracts from Lockheed Martin ATL
Deployment Challenges
• Service workloads and resource capacity issues – service placement depends on
workloads and available resources
• Component accessibility patterns – component survivability depends on its sharing degree
• Differentiated levels of service – affects resource provisioning and survivability strategies
• Service failover – different failover possibilities, e.g., as a whole or part assembly or one
component at a time
• Resource sharing – increases the risk of component(s) requiring proactive survivability
strategy
• No one-size-fits-all dependability strategy – cannot dictate one FT strategy on all services
Service Placement Problem
• A resource configuration is a tuple RC = (C, D, HC, EC) where:
  • C: a set of computation nodes, each attributed by:
    • PI(c): processing index (capacity)
    • MI(c): memory index
    • RI(c): reliability index
  • D: a set of data access units of types in {Ai, Sj}
  • HC: C → ℘(D): a map associating each c in C with a set of data access
    units
  • EC ⊆ C × C: a set of communication links, each attributed by:
    • BI(e): bandwidth index
    • RI(e): reliability index
[Figure: example resource configuration with computation nodes C1 to C4 and
data access units A1, A2, A3, S1, S3, S4]
• System performance can be measured in a variety of ways. Considering a
task assignment TA: T → C:
  • Resource utilization: for processing, it is defined as the average of
    all task processing utilization, given as
    $$PU(TA) = \frac{1}{|C|} \sum_{c \in C} \frac{\sum_{t \in TA(c)} PI(t)}{PI(c)}$$
  • Memory utilization MU(TA) and link utilization LU(TA) can be defined
    similarly
  • System utilization factor: the weighted sum percentage of utilizing the
    system resources
    $$SU(TA) = \alpha_1 PU(TA) + \alpha_2 MU(TA) + \alpha_3 LU(TA)$$
• Reliability is harder to measure. In general, the reliability of a given computation string is the
product of the reliability indices of the underlying nodes and communication edges.
• The reliability factor RF(TA) for a given task assignment, TA, depends on:
  • The reliability of all its computation strings.
  • The group reliability of the underlying nodes (taking into account their relative distances).
  • The resource utilization of the system: the more the system hardware is utilized, the less reliable it is.
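The utilization and reliability definitions above translate directly into code. The sketch below is illustrative only: the function names, the dictionary-based data layout, and the numbers in the usage note are assumptions, and MU(TA) and LU(TA) are passed in as precomputed values since the slide defines them only by analogy.

```python
from math import prod


def processing_utilization(task_pi, node_pi, assignment):
    """PU(TA): mean over nodes of (sum of PI of assigned tasks) / PI(node).

    task_pi maps task -> PI(t), node_pi maps node -> PI(c),
    assignment maps task -> node (the task assignment TA).
    """
    load = {node: 0.0 for node in node_pi}
    for task, node in assignment.items():
        load[node] += task_pi[task]
    return sum(load[c] / node_pi[c] for c in node_pi) / len(node_pi)


def system_utilization(pu, mu, lu, weights):
    """SU(TA): weighted sum of processing, memory, and link utilization."""
    w1, w2, w3 = weights
    return w1 * pu + w2 * mu + w3 * lu


def string_reliability(reliability_indices):
    """Reliability of a computation string: product of the RI values of
    its underlying nodes and communication edges."""
    return prod(reliability_indices)
```

For instance, with nodes of capacity 10 and 20 carrying loads of 5 and 15, `processing_utilization` averages 0.5 and 0.75. Nodes with no assigned tasks still count in the average, as the sum over all c in C requires.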
Specializations via Generative Programming
[Figure: specialized middleware stack (demuxing & dispatching, marshaling,
protocol adapter) beneath a CONTAINER hosting a Component, a Component
Lifecycle Manager, and a crosscutting, configurable QoS Property Manager
covering (1) concurrency, (2) security, (2) persistence,
(3) instrumentation, (4) others]
• GME-based POSAML language for
POSA2 pattern language
• Generative programming to synthesize
FOCUS and AspectC++ rules
• Synthesize specialized middleware stacks
for distributed deployment of operational
strings.
(3) Run-time QoS-aware Mechanisms
• Focus on Autonomic Mechanisms
• Survivability & Fault tolerance
• Students involved:
• Jaiganesh Balasubramanian, Sumant Tambe, Jules
White, Nishanth Shankaran
Work supported by DARPA ARMS Program, PI on subcontracts
from Lockheed Martin ATL, BBN Technologies, & Telcordia
Distributed Virtual Container Approach
[Figure: a virtual container spanning primary and secondary replicas]
• Virtual Container Concept for Component M/W
  • Based on a virtualization idea
  • Spans boundaries across all the replicas, which could
    be placed on different physical nodes
  • Provides a single point for resource provisioning &
    component programming
• Salient features
  • Seamless environment for configuring FT, LB, online
    swapping
  • Handles fine-grained checkpointing across all the
    replicas in virtual container
  • Reliable multicast & state synchronization confined to
    a virtual container
  • Maintains information about how the replicas are
    connected to the external component assemblies
  • Provides an operating context for the
    components/assemblies requiring QoS
  • Relieves programmer from having to configure the
    middleware for QoS support
• Clients are oblivious to replication
  • Normal container programming model
  • Middleware hides the virtualization details
Run-time QoS & Survivability Mechanisms
• A configurable approach to survivability including micro- (infrastructure) & macro- (assembly &
operational string) level strategies
• Micro-level strategies monitor
infrastructure state to make
proactive decisions at
• Component level (swapping &
migration)
• Middleware level (configurations)
• Component Server Level (process
resource allocations)
• Node level (multiple components)
• Macro-level strategies monitor
assembly health to make failover
decisions
• Failover based on type of failover unit
• Affects service placement decisions
• May involve load balancing
• State synchronization issues
• Replication styles (hidden by FT strategies)
• Initial prototype developed using Component-Integrated ACE ORB (CIAO) & Deployment & Configuration
Engine (DAnCE) (www.dre.vanderbilt.edu)
Research Summary
R&D in new, holistic approaches to end-to-end QoS management
in services-enabled distributed real-time & embedded systems
[Figure: system layers: Applications, Middleware, OS & Protocols, Hardware]

Research Challenge: Managing problem space variability
  Research Approach:
  • Model-driven generative approach to separation of concerns
  Benefits:
  • Enhance the state-of-art in MDD and AOSD technologies

Research Challenge: Design-time “What-if” analysis using generative prog mechanisms
  Research Approach:
  • Variety of analysis techniques, including non-traditional
  Benefits:
  • Generative technologies for automated analysis
  • Application of Engineering Mechanics

Research Challenge: Deployment-time intelligent decisions
  Research Approach:
  • New applications of constraints optimization theory
  • Middleware specializations
  Benefits:
  • Near optimal deployment
  • Specialized middleware stacks

Research Challenge: Run-time mechanisms
  Research Approach:
  • Multilevel, proactive QoS mgmt schemes
  • Virtualization ideas
  Benefits:
  • Largely autonomic
  • Survivable systems