Physical Programming:
Beyond Mere Logic
Bran Selic
Rational Software Canada
What I am Hoping For
[Figure: “Theory and Practice of Software”]
2
The Ideal and the Real
PLATN
By focussing on the imperfect world of physical reality we
may miss the essence
Software seems much closer to the “ideal” world
3
The Software World
Fundamental design principle: separate program logic
from the underlying implementation technology


separation of concerns
software portability
[Figure: three layers, top to bottom: Program Logic; HL Programming Languages; Computing Environment & Technology]
4
The Real-Time Software World
Key question: How long will it take?
The quantitative characteristics of the computing
environment encroach upon the purity of the logic

software design involves engineering tradeoffs
[Figure: three layers, top to bottom: Program Logic; HL Programming Languages; Computing Environment & Technology]
5
A Simple Programming Application
Traverse a transactions log database and print all
transactions pertaining to a specific account
[Figure: a CPU connected to a Printer and a DB]
open (DB);
for i := 1 to DB.size do
    record := read (DB);
    if (record.acctNo = myAccount) then
        print (record);
enddo;
close (DB);
6
Porting to a Distributed Environment
Can it really be this simple?
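[Figure: a CPU with a Printer, connected over a Network to replicated DB servers, each with its own CPU and DB]
Each local call is replaced by its RPC counterpart (open → RPC_open, read → RPC_read, close → RPC_close):
RPC_open (DB);
for i := 1 to DB.size do
    record := RPC_read (DB);
    if (record.acctNo = myAccount) then
        print (record);
enddo;
RPC_close (DB);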
7
Some (Unstated!) Assumptions
The CPU and database are fast enough for the needs of
the application

e.g. random access database hardware
The CPU and database fail as a unit

i.e., no need to contend with failures of the database
Communication is reliable


order preserving
exactly once semantics
A system never has anything more important to do than
what it is doing at the moment
8
Partial Failures
Distributed systems can exhibit partial failures

fault tolerance: ability to recover from partial failures
Issue: failure recovery strategy



fault detection
failure recovery
fault diagnosis
Issue: how do other sites detect that a site has failed?


(apparent) lack of activity/response
how do we distinguish between a failed site and a lost
message?
• Timeout is the only general mechanism available

how long do we wait?
• Tradeoff between responsiveness vs. degree of certainty
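The timeout mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `TimeoutFailureDetector`, the site name `"db1"`, and the 2-second timeout are hypothetical and do not come from the slides. The clock is injectable so the behaviour can be shown without real waiting.

```python
import time


class TimeoutFailureDetector:
    """Suspects a site once nothing has been heard from it for `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock        # injectable clock, so the demo needs no sleeping
        self.last_heard = {}      # site name -> time of the last message received

    def heard_from(self, site):
        """Record that a message (or heartbeat) from `site` has just arrived."""
        self.last_heard[site] = self.clock()

    def suspects(self, site):
        """True if `site` has been silent for longer than the timeout."""
        last = self.last_heard.get(site)
        if last is None:
            return True           # never heard from: treated as suspect
        return self.clock() - last > self.timeout


# Demo with a fake clock (hypothetical values).
now = [0.0]
detector = TimeoutFailureDetector(timeout=2.0, clock=lambda: now[0])
detector.heard_from("db1")
now[0] = 1.5
early = detector.suspects("db1")   # still within the timeout
now[0] = 2.5
late = detector.suspects("db1")    # silent for longer than the timeout
```

A suspicion is not a diagnosis: a slow site and a lost message look exactly the same to this detector. A shorter timeout improves responsiveness and raises the rate of false suspicions, which is the tradeoff named above.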
9
A More Realistic Distribution Scenario
Dealing with partial failures
DB := locate_database (Network) exception abort;
RPC_open (DB) exception do
    DB := locate_database (Network) exception abort;
enddo;
for i := 1 to DB.size do
    record := RPC_read (DB) exception do
        DB := locate_database (Network) exception abort;
        for j := 1 to (i-1) do
            RPC_read (DB) exception abort;
        enddo;
        retry;
    enddo;
    if (record.acctNo = myAccount) then
        print (record);
enddo;
RPC_close (DB);
Most of the code is in the exception handlers!
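For comparison, here is a minimal Python sketch of the same failover idea. Everything in it is an assumption made for illustration: the `Replica` class, the `SiteFailure` exception, and the sample log are hypothetical, and reads are indexed, so the replay loop over the first i-1 records is not needed here.

```python
class SiteFailure(Exception):
    """Raised by a (hypothetical) replica that can no longer be reached."""


class Replica:
    """In-memory stand-in for one replicated database server."""

    def __init__(self, records, fail_after=None):
        self.records = records
        self.fail_after = fail_after   # successful reads allowed before failing
        self.reads = 0

    def read(self, index):
        if self.fail_after is not None and self.reads >= self.fail_after:
            raise SiteFailure(index)
        self.reads += 1
        return self.records[index]


def matching_transactions(replicas, account):
    """Scan the log once, switching replicas on failure; give up when none is left."""
    candidates = iter(replicas)
    db = next(candidates)
    found = []
    i = 0
    while i < len(db.records):
        try:
            record = db.read(i)
        except SiteFailure:
            db = next(candidates, None)    # stand-in for locate_database
            if db is None:
                raise RuntimeError("no replica available") from None
            continue                       # retry the same position
        if record["acctNo"] == account:
            found.append(record)
        i += 1
    return found


log = [{"acctNo": 7, "amt": 10}, {"acctNo": 3, "amt": 5}, {"acctNo": 7, "amt": -2}]
primary = Replica(log, fail_after=1)   # fails on its second read
backup = Replica(log)
result = matching_transactions([primary, backup], account=7)
```

Even in this simplified form, the failure-handling branch takes about as much code as the scan itself.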
10
Asynchronous Events and Fault Tolerance
Partial system failures are only one kind of event that
may need to be handled in the course of execution of a
distributed program
Others:

high-priority situations (e.g., imminent deadlines)

aborts
These events are often unpredictable


may occur at any point in the execution of a program
fault tolerance requires that we deal with them whenever they
occur and whatever they are
11
Revisiting An Old Assumption
Is the traditional “main path” focussed programming style
appropriate when exceptions are the rule?
[Figure: the main path Step N → Step N+1 → Step N+2, with “Exception!” branches leading off to Handler AN, Handler AN+1, and Handler B]
12
Asynchronous Event Handling
This is nicely captured by the state-event matrix of finite
state machines
[Figure: state-event matrix. States: Step N, Step N+1, Step N+2. Events: Event A, Event B, Event S, etc. Cell entries: Handler AN, Handler AN+1, Handler AN+2, Handler B.]
13
A Conclusion
In an event-driven and deadline-based application, a
state machine-based programming model may be more
appropriate than the traditional algorithmic (“main path”)
programming model
The environment strikes back

the program logic is strongly affected by the environment
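A state machine-based model can be sketched as a table keyed by (state, event) pairs. The Python below is an illustration only: the state and event names are hypothetical, chosen to echo the transaction-log example, and are not taken from the slides.

```python
from enum import Enum, auto


class State(Enum):
    OPENING = auto()
    READING = auto()
    DONE = auto()
    ABORTED = auto()


class Event(Enum):
    OK = auto()
    END_OF_LOG = auto()
    SITE_FAILED = auto()
    ABORT = auto()


# One entry per (state, event) cell of the state-event matrix.
TRANSITIONS = {
    (State.OPENING, Event.OK): State.READING,
    (State.OPENING, Event.SITE_FAILED): State.OPENING,   # relocate and reopen
    (State.READING, Event.OK): State.READING,
    (State.READING, Event.END_OF_LOG): State.DONE,
    (State.READING, Event.SITE_FAILED): State.OPENING,
}


def step(state, event):
    """Return the next state for one event; terminal states absorb everything."""
    if state in (State.DONE, State.ABORTED):
        return state
    if event is Event.ABORT:
        return State.ABORTED        # handled the same way in every live state
    return TRANSITIONS.get((state, event), state)   # unlisted pairs are ignored


def run(events, state=State.OPENING):
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state


recovered = run([Event.OK, Event.OK, Event.SITE_FAILED, Event.OK, Event.END_OF_LOG])
aborted = run([Event.OK, Event.ABORT])
```

In this style no event is an "exception" to a main path: a site failure is just another column of the table, and the response to it is defined for every state.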
14
Communication Media Failures
Message loss


due to hardware failures
due to software failures (e.g., buffer overflow)
Message reordering



due to different paths
due to variable delays (e.g., due to variable message lengths)
retransmission due to fault-tolerant protocols
Message duplication


due to faulty hardware
retransmission due to fault-tolerant protocols
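One common way for a receiver to mask duplication and reordering is to number the messages. The Python sketch below assumes the sender attaches a sequence number starting at 0; the class name `InOrderReceiver` and the payloads are hypothetical. Loss cannot be masked locally, so the sketch only reports the gaps.

```python
class InOrderReceiver:
    """Delivers payloads in sequence order, dropping duplicates."""

    def __init__(self):
        self.next_seq = 0      # next sequence number to deliver
        self.pending = {}      # out-of-order messages held back, keyed by seq
        self.delivered = []    # payloads released to the application, in order

    def receive(self, seq, payload):
        if seq < self.next_seq or seq in self.pending:
            return             # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def missing(self):
        """Sequence numbers below the highest buffered one that have not arrived."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]


receiver = InOrderReceiver()
for seq, payload in [(1, "b"), (0, "a"), (1, "b"), (3, "d")]:
    receiver.receive(seq, payload)
```

After this run, "a" and "b" have been delivered in order, the second copy of message 1 has been dropped, and message 3 is held back until message 2 arrives or is retransmitted.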
15
Transmission Delays
Possibility of out of date status information
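[Figure: an observer at one Processing Site asks “State?” of a component at another Processing Site that switches between on and off; the reply “on” may no longer be true when it arrives]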
16
Relativistic Effects
Relativistic effects:

different observers see different event orderings (due to
different and variable transmission delays)
[Figure: timeline in which notifier1 and notifier2 emit events E1 and E2; clientA and clientB receive them in opposite orders]
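The effect is easy to reproduce with arithmetic on send times and delays. In this Python sketch the send times and the per-client delays are hypothetical numbers chosen for illustration.

```python
def arrival_order(send_times, delays):
    """Order in which one observer sees the events, given its per-event delays."""
    arrival = {event: send_times[event] + delays[event] for event in send_times}
    return sorted(arrival, key=arrival.get)


send_times = {"E1": 0.0, "E2": 1.0}     # E1 is emitted before E2

# Each client has its own transmission delay from each notifier.
client_a = arrival_order(send_times, {"E1": 5.0, "E2": 1.0})
client_b = arrival_order(send_times, {"E1": 1.0, "E2": 3.0})
```

Here clientA sees E2 before E1 while clientB sees E1 before E2, although both events were emitted exactly once and in a fixed order.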
17
Distribution Transparencies
Providing supporting layers of functionality that shield the
application from the undesirable effects of distribution

e.g., reliable communication protocols
[Figure: a client and a server at two Processing Sites, each communicating through a Reliable Comm Service layered over the Communications Medium]
18
Impossibility Result No.1
It is not possible to guarantee that agreement
can be reached in finite time over an
asynchronous communication medium, if the
medium is lossy or one of the distributed sites
can fail

Fischer, M., N. Lynch, and M. Paterson,
“Impossibility of Distributed Consensus with One
Faulty Process,” Journal of the ACM, 32(2), April
1985.
19
Impossibility Result No.2
Even when communication is fully reliable, it is
not possible to guarantee common knowledge if
communication delays are unbounded

Halpern, J. Y., and Y. Moses, “Knowledge and
Common Knowledge in a Distributed Environment,”
Journal of the ACM, 37(3), 1990.
20
The “End-To-End” Argument
Transparency mechanisms are intended to protect the
application from observing the undesirable effects of
distribution

Most transparency types require distributed agreement!
The end-to-end argument [Saltzer et al.]:

if transparency cannot be guaranteed, the application is not
really shielded from the effects of distribution
the overhead of introducing transparency mechanisms may
not be justified
21
Stepping Back...
Most distribution problems are a consequence of the
encroachment of the physical world into the pliable and
limitless “logical” world of software

the problem is fundamental (e.g., the end-to-end argument)
Traditional Programming = Logic
Physical Programming = Logic + Physics


like traditional engineers, software designers must take into
account the raw material out of which they spin their logic
finite resources, finite delays, finite reliability...
22
Quality of Service Concepts
The physical characteristics of software can be specified
using the general notion of Quality of Service (QoS):
a specification of how well a service is (to be) performed


e.g. throughput, capacity, response time
usually a quantitative measure
QoS specifications are two sided:

offered QoS: the QoS that is offered to clients

required QoS: the QoS required by a client
23
Resources and Quality of Service
Resource: an element whose functional capacity is
limited, directly or indirectly, by the finite capacities of the
underlying physical computing environment
The services of a resource are characterized by one or
more QoS attributes

capacity, reliability, availability, response time, etc.
[Figure: a Client places a Resource Demand on service S1 of a Resource; the client's RequiredQoS is matched against the resource's OfferedQoS]
{RequiredQoS ≤ OfferedQoS}
24
Simple Example
Concurrent tasks accessing a monitor with known
response time characteristics
[Figure: Client1 and Client2 each call access ( ) on myMonitor]
Required QoS: Client1 {Deadline = 5 ms}, Client2 {Deadline = 3 ms}
Offered QoS: myMonitor {MaxExecutionTime = 4 ms}
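The monitor example can be checked mechanically. The Python sketch below uses the three numbers from the slide and assumes the deadlines pair with the clients in the order shown (Client1: 5 ms, Client2: 3 ms). The contention model is also an assumption made for illustration: a caller may have to wait for every other client to complete one access first.

```python
MAX_EXECUTION_MS = 4.0                            # offered QoS of myMonitor
DEADLINES_MS = {"Client1": 5.0, "Client2": 3.0}   # required QoS (assumed pairing)


def worst_case_response_ms(n_clients, max_execution_ms):
    """Assumed model: wait for all other clients' accesses, then run our own."""
    return n_clients * max_execution_ms


def check(deadlines, max_execution_ms, contention):
    """Map each client to True if its deadline is met in the worst case."""
    n = len(deadlines) if contention else 1
    worst = worst_case_response_ms(n, max_execution_ms)
    return {client: worst <= deadline for client, deadline in deadlines.items()}


isolated = check(DEADLINES_MS, MAX_EXECUTION_MS, contention=False)
contended = check(DEADLINES_MS, MAX_EXECUTION_MS, contention=True)
```

Without contention only Client1's requirement is met, since 4 ms fits within 5 ms and not within 3 ms. Under the assumed contention model the worst case is 8 ms and neither deadline can be guaranteed.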
25
Types and “Physical” Types
The purpose of types is to tell us about the externally
relevant properties of software components so that we
can validate whether they are being used appropriately
Physical types: type specifications that incorporate QoS
characteristics
Answer two key engineering questions:

can this component support the “load” intended for it?

what does this component require to support its offered QoS?
26
Physical Type Example
A semaphore type:
class Semaphore {
    {heap= 10 bytes}    -- required QoS
    {CPU 5 MIPS}        -- required QoS
    get(){proc 0.4*CPU us;stack=4 bytes};
    rel(){proc 0.4*CPU us;stack=4 bytes};
}
Usage:
mySema : Semaphore;
mySema.get() {proc 3 us}    -- req. QoS
27
Violation of Encapsulation?
Aren’t the offered QoS characteristics a consequence of
the implementation?
Not necessarily...
The offered QoS characteristics can and should be
defined independently of the implementation

the “worst-case” numbers of traditional engineering
The contractual obligations that the component designer
is willing to assume
28
Physical Type Checking
Can physical types be statically checked?



The good news: Yes, they can (in most cases)
The bad news: this typically requires complex analysis methods
(queueing network analysis, schedulability analysis, etc.)
but then, model checking and theorem proving are not simple
either
Some issues:



Typically, QoS-based analyses cannot be done incrementally: the full system context is required
but then, the same holds for many formal verification methods
Each type of QoS (e.g., bandwidth, CPU performance)
combines differently
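To illustrate what "combines differently" can mean, the Python sketch below applies three commonly used composition rules for elements connected in series. These rules are assumptions for illustration, not taken from the slides: bandwidth is limited by the slowest link, delays add up, and availabilities multiply when failures are independent.

```python
from math import prod


def path_bandwidth(links_mbit_s):
    """Bandwidth of a path: the slowest link is the bottleneck."""
    return min(links_mbit_s)


def path_delay(delays_ms):
    """Delay of a path: per-hop delays accumulate."""
    return sum(delays_ms)


def series_availability(availabilities):
    """Availability of elements that must all work (assumes independent failures)."""
    return prod(availabilities)


bandwidth = path_bandwidth([100, 70, 1000])      # hypothetical link speeds
delay = path_delay([2, 5, 1])                    # hypothetical hop delays
availability = series_availability([0.5, 0.5])   # hypothetical, chosen for easy arithmetic
```

A checker for physical types therefore needs a separate composition rule for each QoS attribute, which is part of why the analysis needs the full system context.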
29
Required QoS
Like all guarantees, the offered QoS is contingent on the
component getting what it needs to do its job
There are two distinct dimensions to this:


the peer dimension
the layering dimension
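[Figure: Client uses service S1 of ResourceA, which uses service S2 of ResourceB (peer dimension); ResourceA and ResourceB each use the CPU service of a Physical Processor (layering dimension)]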
30
Logical Viewpoint
Example: logical view of aircraft simulator software
[Figure: interconnected logical components: INSTRUCTOR STATION, AIRFRAME, ATMOSPHERE MODEL, CONTROL SURFACES, GROUND MODEL, PILOT CONTROLS, ENGINES]
31
Engineering (Realization) Viewpoint
The realization of a specific set of logical components
using facilities of the run-time environment
[Figure: two Processors connected by an Ethernet LAN, hosting OS processes, each with a stack, that communicate through TCP/IP sockets]
32
Viewpoints and Mappings
[Figure: realization mappings from the Logical Viewpoint (INSTRUCTOR STATION, AIRFRAME, ATMOSPHERE MODEL, CONTROL SURFACES, GROUND MODEL, PILOT CONTROLS, ENGINES) onto the Engineering Viewpoint (Processors, OS processes, stacks, TCP/IP sockets, Ethernet LAN)]
33
The Engineering Viewpoint
The engineering viewpoint represents the “raw material”
out of which we construct the logical viewpoint

the quality of the outcome is only as good as the quality of the
ingredients that are put in

as in all true engineering, the quantitative aspects of the
logical model are often crucial (How long will it take? How
much will be required?…)
34
Distributed Systems Dilemma
Dilemma: How can we account for the engineering
characteristics of the system without prematurely and
possibly unnecessarily committing to a specific
technology?
Proposed solution: Include in the logical model a generic
(technology-neutral) specification of the
required/expected characteristics of the engineering
environment
35
Viewpoint Separation
Required Environment: a technology-neutral environment
specification required by the logical elements of a model
[Figure: the Logical Viewpoint rests on a Required Environment, which can be realized either by Engineering Viewpoint (alternative A), built from UNIX processes, or by Engineering Viewpoint (alternative B), built from WinNT processes]
36
Required Environment Specifications
What a logical component needs in order to perform its
function according to spec
Logical element (client): Airframe
Required QoS values: CPU: 3 MIPS; Mem: 2 MB; Bandw.: 70 Mbit/s
    | realization mapping
Engineering elements (resources): CPU, LAN
Offered QoS values: CPU: 3 MIPS, 20 MB; LAN: 100 Mbit/s
37
Required Environment Partitions
Logical elements often share common QoS requirements
[Figure: the logical components (INSTRUCTOR STATION, AIRFRAME, ATMOSPHERE MODEL, CONTROL SURFACES, GROUND MODEL, ENGINES, PILOT CONTROLS) grouped into QoS domains (e.g., failure unit, uniform comm properties)]
38
QoS Domains
Specify a domain in which certain QoS values apply
throughout:

failure characteristics (failure modes, availability, reliability)

CPU speeds

communications characteristics (delay, throughput, capacity)

etc.
The QoS values of a domain can be compared against
those of a concrete engineering environment to see if a
given environment is adequate for a specific model
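Such a comparison can be sketched in Python using the Airframe figures from the "Required Environment Specifications" slide (3 MIPS, 2 MB, 70 Mbit/s required; 3 MIPS, 20 MB, 100 Mbit/s offered). The `QoS` record, its field names, and the second environment with a slower LAN are hypothetical, added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoS:
    cpu_mips: float
    memory_mb: float
    bandwidth_mbit_s: float


def shortfalls(required, offered):
    """Names of the attributes for which the environment offers less than required."""
    return [name for name in ("cpu_mips", "memory_mb", "bandwidth_mbit_s")
            if getattr(offered, name) < getattr(required, name)]


required = QoS(cpu_mips=3, memory_mb=2, bandwidth_mbit_s=70)
env_a = QoS(cpu_mips=3, memory_mb=20, bandwidth_mbit_s=100)   # values from the slide
env_b = QoS(cpu_mips=3, memory_mb=20, bandwidth_mbit_s=10)    # hypothetical slower LAN

adequate_a = shortfalls(required, env_a)
adequate_b = shortfalls(required, env_b)
```

An empty list means the environment is adequate for the model. All three attributes here are of the "more is better" kind; an attribute such as delay would need the comparison reversed.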
39
“Physical” Programming
The notions of QoS and QoS domains enable the design
of distributed systems that properly account for the
effects of distribution and other non-transparent physical
phenomena, while allowing for a high degree of
portability and technology independence
They are also the basis for formal verification of
realization mappings
{required QoS ≤ QoS of the proposed engineering environment}
May also be used to automatically synthesize
engineering environments that satisfy a given QoS
specification of a logical model
40
Conclusions and an Appeal...
The physical aspects of software will not go away


ignoring them can be perilous especially when working with
distributed systems
most interesting software systems of the future will be
distributed and will have stringent dependability requirements
(“cannot reboot the Internet”)
What is needed is a proper theoretical framework for
dealing with physical types
The QoS framework described here is currently being
incorporated into a profile of UML for real-time
applications
41