Design and Analysis of
Real-Time Systems
Predictability and
Predictable Microarchitectures
Jan Reineke
Advanced Lecture, Summer 2013
Notion of Predictability
Oxford Dictionary:
• predictable = adjective, able to be predicted
• to predict = verb, state that a specified event will happen in the future
Fuzzy term in the WCET community.
May refer to the ability to predict:
• the WCET precisely,
• the execution time precisely,
• the WCET efficiently.
How are these related?
Ability to predict the WCET precisely
In theory we can precisely “predict” (rather: determine) the WCET of most systems:
• enumerate all inputs,
• enumerate all initial states of the microarchitecture,
• enumerate all possible environments.
However, this is of course not feasible in practice.
→ Predictability of the WCET is not the “right goal”.
Contrast with the ability to predict the execution time:
→ related to the variability in execution times.
[Figure: variability of execution times, shown as a distribution of execution times over a frequency axis. The possible execution times range from the BCET to the WCET; an analysis guarantees a lower bound LB below the BCET and an upper bound UB above the WCET, the gap between the WCET and UB being the overestimation.]
Related to predictability of execution time.
How close to the WCET can we safely push UB with “reasonable” analysis effort?
Notion of Predictability
Fuzzy term in the WCET community.
May refer to the ability to predict:
• the WCET precisely,
• the execution time precisely → execution-time predictability,
• the WCET efficiently → analyzability.
Challenges to Timing Predictability
What does the execution time depend on?
1. The input, determining which path is taken through the program.
2. The state of the hardware platform: due to caches, pipelines, speculation, etc.
3. Interferences from the environment: external interferences as seen from the analyzed task on shared busses, caches, and memory.
[Figure: multi-core platform with complex CPUs and private L1 caches sharing an L2 cache and the main memory]
Uncertainty about
• program inputs,
• the initial state of the microarchitecture, and
• activity in the environment (e.g. other cores in a multi-core), resulting in interference,
→ introduces variability in execution times, and thus decreases execution-time predictability,
→ introduces non-determinism in the analysis, and thus decreases analyzability.
Two Ways to Increase Predictability
1. Reduce uncertainty.
2. Reduce the influence of uncertainty on
   a. the variability of execution times, and/or
   b. analysis efficiency.
1. Reduce Uncertainty
• Reduce the number of program inputs? Difficult…
• Reduce the number of micro-architectural states:
  e.g. eliminate the branch predictor, the cache, out-of-order execution…
[Figure: complex multi-core platform (complex CPUs with out-of-order execution, branch prediction, private L1 caches, a shared L2 cache, and main memory) contrasted with a simple CPU with only an L1 cache and main memory]
If done naively, this reverses many micro-architectural developments…
→ decreases performance.
Key question: how to reduce uncertainty without sacrificing performance?
2.a) Reducing Influence of Uncertainty on
Variability of Execution Times
If a source of uncertainty has no influence on
execution times, it is irrelevant for timing analysis.
Example: Temporal Isolation between Cores
Temporal isolation: the timing of a program on one core is independent of the activity on the other cores.
• Formally:
  T(P1, ⟨p1, c1, p2, c2⟩) = T(P1, ⟨p1, c1, p2′, c2′⟩) = T_isolated(P1, ⟨p1, c1⟩)
• Can be exploited in WCET analysis:
  WCET(P1) = max_{p1,c1,p2,c2} T(P1, ⟨p1, c1, p2, c2⟩) = max_{p1,c1} T_isolated(P1, ⟨p1, c1⟩)
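To make the exploitation concrete, here is a minimal sketch (not from the slides; the timing function and the input/state sets are hypothetical) showing that, under temporal isolation, the brute-force maximum over the co-runner's inputs and states coincides with the maximum over the task's own inputs and states.

import itertools

# Hypothetical isolated timing of P1, depending only on its own input p1
# and its own initial hardware state c1.
def T_isolated(p1, c1):
    return 3 * p1 + (5 if c1 == "cold" else 1)

# Temporal isolation: the full timing function ignores the co-runner (p2, c2).
def T(p1, c1, p2, c2):
    return T_isolated(p1, c1)

P1_inputs, C1_states = [0, 1, 2], ["cold", "warm"]
P2_inputs, C2_states = [0, 1], ["cold", "warm"]

wcet_full = max(T(p1, c1, p2, c2) for p1, c1, p2, c2 in
                itertools.product(P1_inputs, C1_states, P2_inputs, C2_states))
wcet_isolated = max(T_isolated(p1, c1)
                    for p1, c1 in itertools.product(P1_inputs, C1_states))
assert wcet_full == wcet_isolated == 11   # enumerating p2 and c2 is unnecessary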
Temporal Isolation
How to achieve it?
• Partition resources in space and/or time:
  the resource then appears to each client like a slower and/or smaller private resource.
• Examples:
  - time-division multiple access (TDMA) arbitration in shared busses (see the sketch below),
  - partitioned shared caches.
• Why not simply provide private resources then?
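To illustrate why TDMA arbitration makes a shared bus analyzable, the following sketch (not from the slides; the parameters and the assumption that an access must complete within a single slot are made up for illustration) bounds the worst-case latency of one bus access when each core owns one slot per TDMA round.

def tdma_worst_case_latency(n_cores, slot_len, access_len):
    # Each core owns one slot of slot_len cycles per round of n_cores slots.
    assert access_len <= slot_len
    # Worst case: the request arrives just after the latest point in its own
    # slot at which the access could still complete, so it waits for the rest
    # of its slot plus all other cores' slots before it can start.
    wait = (access_len - 1) + (n_cores - 1) * slot_len
    return wait + access_len

# Hypothetical parameters: 4 cores, slots of 10 cycles, accesses of 4 cycles.
print(tdma_worst_case_latency(4, 10, 4))   # -> 37 cycles

The bound depends only on the TDMA schedule, not on what the other cores actually do, which is precisely the temporal isolation property.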
2.b) Reducing Influence of Uncertainty on
Analysis Efficiency
Does non-determinism have to be a problem for analyzability?
→ Timing anomalies
→ Domino effects
→ Lack of timing compositionality
• Eliminate timing anomalies, e.g. stall the pipeline on a cache miss and use LRU replacement.
• Eliminate domino effects, e.g. use LRU rather than FIFO replacement.
Timing Anomalies
Timing anomaly = counterintuitive scenario in which the “local worst case” does not imply the “global worst case”.
Example: Scheduling Anomaly
[Figure: two schedules of tasks A–E on Resource 1 and Resource 2, differing in when task C becomes ready; the locally better situation leads to the longer overall schedule.]
Recommended literature:
R. L. Graham, “Bounds on multiprocessing timing anomalies”, SIAM Journal on Applied Mathematics, 1969.
(http://epubs.siam.org/doi/abs/10.1137/0117039)
Timing Anomalies
Example: Speculation Anomaly
[Figure: two executions of an access to A, followed by a branch and an access to C. If A is a cache hit, the branch condition has not been evaluated yet when fetch reaches the branch, so B is speculatively prefetched and misses; this memory access may induce additional cache misses later on. If A is a cache miss, the branch condition has already been evaluated by the time the miss is resolved, so no prefetching occurs and C is fetched directly.]
Timing Anomalies
Example: Cache Timing Anomaly of FIFO
Access sequence b, c, b, d, c on a 2-way FIFO cache, starting from two initial states:

Access:        b      c      b      d      c
From [a, b]:   hit    miss   miss   miss   miss    4 misses (global worst case)
               [a,b]  [c,a]  [b,c]  [d,b]  [c,d]
From [a, ?]:   miss   miss   hit    miss   hit     3 misses
               [b,a]  [c,b]  [c,b]  [d,c]  [d,c]

The local worst case (the first access to b misses) leads to only 3 misses overall, while the local best case (the first access to b hits) leads to 4 misses, the global worst case.
Similar examples exist for PLRU and MRU. Impossible for LRU.
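The example can be replayed mechanically. The following sketch (not from the slides; simulate is a hypothetical helper) runs the access sequence on a 2-way cache under FIFO and, for comparison, under LRU replacement.

def simulate(policy, initial_state, accesses):
    """Return (number of misses, final state); the state lists the blocks,
    most recently inserted (FIFO) or most recently used (LRU) first."""
    state = list(initial_state)
    misses = 0
    for block in accesses:
        if block in state:
            if policy == "LRU":          # LRU reorders on hits,
                state.remove(block)      # FIFO does not.
                state.insert(0, block)
        else:
            misses += 1
            state.pop()                  # evict the last block
            state.insert(0, block)
    return misses, state

accesses = ["b", "c", "b", "d", "c"]
for policy in ("FIFO", "LRU"):
    for initial in (["a", "b"], ["a", None]):
        misses, final = simulate(policy, initial, accesses)
        print(policy, initial, "->", misses, "misses, final state", final)

# FIFO: starting from [a, b] (first access to b hits) yields 4 misses, starting
# from [a, None] (first access to b misses) yields only 3 -- the local worst
# case does not imply the global worst case. Under LRU, on this sequence, the
# local worst case coincides with the global worst case (4 vs. 3 misses).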
Timing Anomalies
Consequences for Timing Analysis
In the presence of timing anomalies, a timing analysis cannot make decisions “locally”: it needs to consider all cases.
State of the art: integrated WCET analysis.
Drawback: efficiency → may yield a “state explosion problem”.
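A minimal sketch (not from the slides; purely illustrative) of why considering all cases is expensive: if neither the hit nor the miss case can be discarded locally, the analysis must keep both successors after every access, so the number of cases grows exponentially.

def explored_cases(n_accesses):
    cases = 1
    for _ in range(n_accesses):
        cases *= 2              # keep both the hit and the miss successor
    return cases

print(explored_cases(30))       # -> 1073741824 cases for only 30 accesses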
Timing Anomalies
Open Analysis and Design Challenges
• How to determine whether a given timing model exhibits timing anomalies?
• How to construct processors without timing anomalies?
  - Caches: LRU replacement.
  - No speculation.
  - Other aspects: “halt” everything upon every “timing accident” → possibly very inefficient.
• How to construct a conservative timing model without timing anomalies?
  - Can we e.g. add a “safety margin” to the local worst case?
Domino Effects
• Intuitively: domino effect = “unbounded” timing anomaly.
• Examples:
  - pipelines (e.g. PowerPC 755),
  - caches (FIFO, PLRU, MRU, …).
Domino Effects
Example: Cache Domino Effect of FIFO
The access sequence of the timing-anomaly example, again on a 2-way FIFO cache:

Access:        b      c      b      d      c
From [a, b]:   hit    miss   miss   miss   miss    4 misses
               [a,b]  [c,a]  [b,c]  [d,b]  [c,d]
From [a, ?]:   miss   miss   hit    miss   hit     3 misses
               [b,a]  [c,b]  [c,b]  [d,c]  [d,c]

[Figure: the sequence is continued with further accesses that bring the two cache states to [a, b] and [b, a], again at the cost of 4 vs. 3 misses; since the two states still differ in their FIFO order, the construction can be repeated and the difference in the number of misses grows without bound.]
Similar examples exist for PLRU and MRU. Impossible for LRU.
Domino Effects
Example: FIFO Cache
Access sequence e, d, f, b on a 4-way FIFO cache, starting from two initial states:

Access    State 1      Hit/Miss 1   State 2      Hit/Miss 2
(init)    [a,b,c,d]                 [b,a,e,d]
e         [e,a,b,c]    M            [b,a,e,d]    H
d         [d,e,a,b]    M            [b,a,e,d]    H
f         [f,d,e,a]    M            [f,b,a,e]    M
b         [b,f,d,e]    M            [f,b,a,e]    H
                       4 misses                  1 miss

The resulting states are equal to the initial ones up to renaming → the sequence can be extended arbitrarily (see the sketch below).
Similar examples exist for PLRU and MRU. Impossible for LRU.
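Because the states after one round equal the initial states up to renaming, the rounds can be chained. A small sketch (not from the slides; the renaming is spelled out explicitly as a hypothetical illustration) repeats the renamed sequence and shows that the gap grows by 3 misses per round.

def fifo_round(state, accesses):
    """Replay accesses on a FIFO cache state (most recently inserted first)."""
    misses = 0
    state = list(state)
    for block in accesses:
        if block not in state:
            misses += 1
            state.pop()                   # evict the oldest block
            state.insert(0, block)        # insert the new block in front
    return misses, state

# Renaming under which the states after one round equal the initial states.
rename = {"a": "b", "b": "f", "f": "c", "c": "d", "d": "e", "e": "a"}

state1, state2 = ["a", "b", "c", "d"], ["b", "a", "e", "d"]
accesses = ["e", "d", "f", "b"]
misses1 = misses2 = 0
for round_no in range(5):
    m1, state1 = fifo_round(state1, accesses)
    m2, state2 = fifo_round(state2, accesses)
    misses1, misses2 = misses1 + m1, misses2 + m2
    print(f"after round {round_no + 1}: {misses1} vs {misses2} misses")
    accesses = [rename[b] for b in accesses]   # continue with renamed blocks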
Domino Effects
Open Analysis and Design Challenges
Exactly as with timing anomalies:
• How to determine whether a given timing model exhibits domino effects?
• How to construct processors without domino effects?
• How to construct a conservative timing model without domino effects?
Timing Compositionality
Motivation
• Some timing accidents are hard or even impossible to statically exclude at any particular program point:
  - interference on a shared bus: depends on the behavior of tasks executed on other cores,
  - interference on a cache in preemptively scheduled systems,
  - DRAM refreshes.
• But it may be possible to make cumulative statements about the number of these accidents.
[Figure: multi-core platform with complex CPUs and private L1 caches sharing an L2 cache and the main memory]
Timing Compositionality
Intuitive Meaning
The timing of a program can be decomposed into contributions by different “components”, e.g.
• pipeline,
• cache (non-preempted),
• cache-related preemption delay,
• bus interference,
• DRAM refreshes,
• …
Timing compositionality, for example a decomposition into pipeline and cache:
T_{pipeline,cache}(P, ⟨p, c⟩) = T_{pipeline}(P, ⟨p⟩) + T_{cache}(P, ⟨c⟩)
Timing Compositionality
Application in Timing Analysis
Given timing compositionality,
T_{pipeline,cache}(P, ⟨p, c⟩) = T_{pipeline}(P, ⟨p⟩) + T_{cache}(P, ⟨c⟩),
the components (here: pipeline and cache) can also be analyzed separately:
WCET_{pipeline,cache}(P) = max_{p,c} T_{pipeline,cache}(P, ⟨p, c⟩)
                         = max_p T_{pipeline}(P, ⟨p⟩) + max_c T_{cache}(P, ⟨c⟩)
                         = WCET_{pipeline}(P) + WCET_{cache}(P)
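A tiny numeric sketch (not from the slides; the timing tables are made up) of why the additive decomposition lets the two maxima be taken separately:

# Hypothetical per-component timings: the pipeline part depends only on the
# path p, the cache part only on the initial cache state c.
T_pipeline = {"path1": 40, "path2": 55, "path3": 47}
T_cache = {"cold": 30, "warm": 12}

wcet_joint = max(T_pipeline[p] + T_cache[c] for p in T_pipeline for c in T_cache)
wcet_separate = max(T_pipeline.values()) + max(T_cache.values())
assert wcet_joint == wcet_separate == 85   # the components can be analyzed separately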
Timing Compositionality
Example: “Cache-aware” Response-Time Analysis [Altmeyer et al.]
In preemptive scheduling, preempting tasks may “disturb” the cache contents of preempted tasks.
[Figure: timeline showing the response time of task T2 (arrival aT2, deadline dT2), which is preempted by task T1 (arrival aT1, deadline dT1).]
The additional misses due to preemption are referred to as the cache-related preemption delay (CRPD).
Timing decomposition:
• worst-case execution time of T2 without preemptions: C2,
• worst-case execution time of T1 without preemptions: C1,
• additional cost of T1 preempting T2: CRPD_{1,2} = #additional misses · BRT.
→ Response time of T2: R2 ≤ C2 + #preemptions · (C1 + CRPD_{1,2})
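The bound can be plugged into a standard response-time iteration. A minimal sketch (not from the slides): beyond the slide, it assumes that T1 is periodic and that the number of preemptions is bounded by ceil(R2 / period of T1); all numbers are made up.

import math

def response_time(C2, C1, T1_period, crpd_1_2, limit=10_000):
    R = C2
    while R <= limit:
        preemptions = math.ceil(R / T1_period)
        R_new = C2 + preemptions * (C1 + crpd_1_2)
        if R_new == R:          # fixed point reached: bound on R2 found
            return R
        R = R_new
    return None                 # no convergence within the limit

# Hypothetical numbers: C2 = 8, C1 = 2, T1 period = 10, CRPD_{1,2} = 1.
print(response_time(8, 2, 10, 1))   # -> 14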
Timing Compositionality
Open Analysis and Design Challenges
• How to check whether a given decomposition of a timing model is valid?
• How to compute bounds on the cost of individual events, such as cache misses (BRT in the previous example) or bus stalls?
• How to build a microarchitecture in a way that permits a sound and precise decomposition of its timing?
Summary:
Approaches to Increase Predictability
[Diagram: the uncertainty about inputs determines the set of possible executions, which in turn determines the analysis efficiency.]
• Uncertainty about inputs (program inputs, initial state of the microarchitecture, tasks on other cores): reduce its size by simplifying the microarchitecture, e.g. eliminate the cache, branch prediction, etc.
• Possible executions: reduce the influence of uncertainty on executions, e.g. by temporal isolation.
• Analysis efficiency: decouple the analysis efficiency from the number of executions: eliminate timing anomalies and domino effects, and achieve timing compositionality, e.g. by LRU replacement or by stalling the pipeline upon a cache miss.
Summary
• (Fuzzy) notions of timing predictability
• Important related notions:
  - timing anomalies,
  - domino effects,
  - timing compositionality,
  - temporal isolation.
• Two ways of increasing predictability:
  1. reduce uncertainty,
  2. reduce the influence of uncertainty.