Challenges for Worst-case Execution Time Analysis of Multi-core Architectures
Jan Reineke, Saarland University, Computer Science
IRIT, Toulouse, September 18, 2014


The Real-Time Context: Hard Real-Time Systems

- Safety-critical applications: avionics, automotive, train industries, manufacturing control.
  Examples: a side airbag in a car must react in < 10 msec; crankshaft-synchronous tasks must react in < 45 µsec.
- Embedded controllers must finish their tasks within given time bounds.
- Developers would like to know the Worst-Case Execution Time (WCET) to give a guarantee.


The Timing Analysis Problem

Given embedded software with timing requirements, to be executed on a given microarchitecture: can we guarantee that the timing requirements are always met?


What does the execution time depend on?

1. The input, determining which path is taken through the program.
2. The state of the hardware platform: due to caches, pipelines, speculation, etc.
3. Interference from the environment: external interference, as seen from the analyzed task, on shared busses, caches, and memory.

[Diagrams: the hardware platform grows from a simple CPU attached to a memory, to a complex CPU (out-of-order execution, branch prediction, etc.) with an L1 cache and main memory.]

(A toy model illustrating these three dependencies is sketched below.)
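To make these dependencies concrete, here is a purely illustrative toy model (all block names, latencies, and the interference penalty are assumptions, not measurements): the same program's execution time varies with the input (which path is taken), with the initial cache state, and with contention on a shared bus.

```python
# Toy model (illustrative only) of the three sources of execution-time variation.

def exec_time(inp, cached, bus_interference_cycles):
    time = 0
    path = "then" if inp > 0 else "else"          # 1) the input selects the path
    accessed = {"then": ["a", "b"], "else": ["a", "c", "d"]}[path]
    for block in accessed:
        if block in cached:                       # 2) hardware (cache) state
            time += 1
        else:
            time += 10 + bus_interference_cycles  # 3) interference from tasks
            cached.add(block)                     #    on other cores (assumed cost)
    return time

print(exec_time(1, set(), 0))        # 20: 'then' path, cold cache, no interference
print(exec_time(1, {"a", "b"}, 0))   #  2: same path, warm cache
print(exec_time(-1, set(), 5))       # 45: 'else' path, cold cache, bus contention
```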
What does the execution time depend on? (continued)

On a multi-core platform, several complex CPUs, each with a private L1 cache, share an L2 cache and main memory; interference from the environment (point 3 above) then arises on exactly these shared resources.


Example of Influence of Microarchitectural State

x = a + b;   compiles to:   LOAD r2, _a;   LOAD r1, _b;   ADD r3, r2, r1

[Figure: access times of this sequence on the Motorola MPC5xx and the PowerPC 755, varying widely with the microarchitectural state. Courtesy of Reinhard Wilhelm, "Timing Analysis and Timing Predictability" tutorial, ISCA 2010.]


Example of Influence of Corunning Tasks in Multicores

Radojkovic et al. (ACM TACO, 2012), on the Intel Atom and the Intel Core 2 Quad: up to a 14x slow-down due to interference on the shared L2 cache and the memory controller.


Challenges

1. Modeling: How to construct sound timing models?
2. Analysis: How to precisely and efficiently bound the WCET?
3. Design: How to design microarchitectures that enable precise and efficient WCET analysis?


The Modeling Challenge

Microarchitecture → Timing Model.
A timing model is a formal specification of the microarchitecture's timing.
An incorrect timing model may lead to an incorrect WCET bound.
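What could such a timing model look like formally? A minimal sketch, assuming a toy single-level cache and unit-latency ALU operations (the interface and all latencies are illustrative assumptions, not the model of any real processor): the timing model is a transition function from a microarchitectural state and an instruction to a successor state and a cycle count, and WCET analysis explores this transition system instead of the physical hardware.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    cached: frozenset   # memory blocks currently in the cache
    # a real model would also track pipeline occupancy, predictor state, ...

def step(state, instr):
    """Transition function: (state, instruction) -> (next state, elapsed cycles)."""
    op, operand = instr
    if op == "load":
        if operand in state.cached:
            return state, 1                                           # hit: 1 cycle
        return replace(state, cached=state.cached | {operand}), 10    # miss: 10 cycles
    return state, 1                                                   # ALU op: 1 cycle

program = [("load", "_a"), ("load", "_b"), ("add", None)]
state, total = State(frozenset()), 0
for instr in program:
    state, cycles = step(state, instr)
    total += cycles
print(total)   # 21 cycles from an empty cache in this toy model
```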
Current Process of Deriving Timing Models

Today, timing models are derived manually from the microarchitecture.
→ Time-consuming, and
→ error-prone.


1. Future Process of Deriving Timing Models

Microarchitecture (VHDL model) → Timing Model.
Derive the timing model automatically from a formal specification of the microarchitecture.
→ Less manual effort, thus less time-consuming, and
→ provably correct.
2. Future Process of Deriving Timing Models

Perform measurements on the hardware and infer a model.
Derive the timing model automatically from measurements on the hardware, using ideas from automata learning.
→ No manual effort, and
→ (under certain assumptions) provably correct.
→ Also useful to validate assumptions about the microarchitecture.


Proof-of-concept: Automatic Modeling of the Cache Hierarchy

- The cache model is an important part of the timing model.
- It can be characterized by a few parameters:
  - ABC: associativity, block size, capacity
  - the replacement policy
- [Diagram: a cache with N cache sets, A ways per set (associativity = A), and B bytes per block (block size = B).]
- chi [Abel and Reineke, RTAS 2013] derives all of these parameters fully automatically.


Example: Intel Core 2 Duo E6750, L1 Data Cache

[Plot: number of L1 misses as a function of the measured working-set size. The miss count jumps once the working set exceeds the capacity of 32 KB, and the steps in the curve reveal a way size of 4 KB.]


Replacement Policy

Approach inspired by methods to learn finite automata, heavily specialized to the problem domain.
It discovered a, to our knowledge, undocumented replacement policy of the Intel Atom D525.
[Diagram: cache-state transitions of the inferred policy.]
More information: http://embedded.cs.uni-saarland.de/chi.php


Modeling Challenge: Future Work

Extend automation to other parts of the microarchitecture:
- translation lookaside buffers, branch predictors
- shared caches in multicores, including their coherency protocols
- out-of-order pipelines?


The Analysis Challenge

Consider all possible program inputs and all possible initial states of the hardware:

$\mathrm{WCET}_H(P) := \max_{i \in \mathrm{Inputs}} \ \max_{h \in \mathrm{States}(H)} \mathrm{ET}_H(P, i, h)$
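As a toy illustration of this definition (the cost model below is an assumption, not the analysis used in practice), one could in principle enumerate all inputs and all initial hardware states and take the maximum execution time; even with only ten memory blocks this already means about a million (input, state) combinations.

```python
from itertools import product

HIT_TIME, MISS_PENALTY = 1, 10
BLOCKS = range(10)

def execution_time(accessed, initially_cached):
    """Stand-in for ET_H(P, i, h): one pass over the blocks selected by the input."""
    return sum(HIT_TIME if b in initially_cached else MISS_PENALTY
               for b in accessed)

# every subset of BLOCKS, used both as a possible input (which blocks the
# program touches) and as a possible initial cache content
subsets = [frozenset(b for b in BLOCKS if (mask >> b) & 1)
           for mask in range(2 ** len(BLOCKS))]

wcet = max(execution_time(i, h) for i, h in product(subsets, subsets))
print(wcet)   # 100, after enumerating 2^20 (~1 million) combinations for this toy
```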
Explicitly evaluating ET for all inputs and all hardware states is not feasible in practice: there are simply too many.
⇒ Need for abstraction, and thus approximation!


The Analysis Challenge: State of the Art

Caches, branch target buffers:
- Precise and efficient abstractions for LRU [Ferdinand, 1999] (see the sketch after this list).
- Not-so-precise but efficient abstractions for FIFO, PLRU, MRU [Grund and Reineke, 2008-2011].
- Reasonably precise quantitative analyses for FIFO, MRU [Guan et al., 2012-2014].

Complex pipelines:
- Precise but very inefficient; little abstraction.
- Major challenge: timing anomalies.

Shared resources, e.g. busses, shared caches, DRAM:
- No realistic approaches yet.
- Major challenge: interference between hardware threads → the execution time depends on co-running tasks → need timing compositionality or isolation.
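For LRU, the abstractions referenced above track bounds on the ages of memory blocks. The following is a minimal sketch in the spirit of such a "must" analysis for a single cache set (the associativity and the access sequence are assumed for illustration; real analyses handle all sets and the full control flow): a block whose age bound stays below the associativity is guaranteed to be cached, so the access is a hit on every path.

```python
ASSOC = 4   # associativity of the analyzed cache set (assumed)

def update(state, block):
    """Abstract transformer for an access: 'state' maps blocks to upper bounds
    on their LRU age (0 = most recently used); unknown blocks are absent."""
    old = state.get(block, ASSOC)               # unknown blocks act like age >= ASSOC
    new_state = {}
    for b, age in state.items():
        if b == block:
            continue
        new_age = age + 1 if age < old else age  # only younger blocks age
        if new_age < ASSOC:                      # bound reaching ASSOC: may be evicted
            new_state[b] = new_age
    new_state[block] = 0
    return new_state

def join(s1, s2):
    """Control-flow merge: keep blocks cached on both paths, with the larger bound."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

def classify(state, block):
    return "always hit" if block in state else "unknown"

s = {}
for b in ["a", "b", "c", "a"]:
    print(b, classify(s, b))   # the second access to 'a' is classified "always hit"
    s = update(s, b)
```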
The Design Challenge

Wanted: a multi-/many-core architecture that is timing-predictable, i.e. admits precise and efficient WCET analysis, and delivers high performance. (This goal is behind the vision of PREcision-Timed, PRET, machines: performance and predictability.)


Approaches to Increase Predictability

Inputs = program inputs + initial state of the microarchitecture + tasks on other cores.
1. Reduce the uncertainty about the inputs.
2. Reduce the influence of the uncertainty on the possible executions.
3. Decouple the analysis efficiency from the number of possible executions.


1. Reduce Uncertainty

a) Eliminate performance-enhancing features, e.g. the branch predictor, caches, out-of-order execution.
If done naively, this reverses many microarchitectural developments and decreases performance.
Key question: how to reduce uncertainty without sacrificing performance?

b) Replace features by alternatives with smaller state spaces:
- caches → scratchpad memories
- complex pipeline → simple VLIW or thread-interleaved pipeline
- DRAM controller with a closed-page policy

c) Establish a fixed microarchitectural state "once in a while":
- by the microarchitecture: partially flush the pipeline at basic-block boundaries [Rochange and Sainrat, Computing Frontiers 2005]
- by a sync instruction, placed by the compiler [Maksoud and Reineke, RTNS 2014]


2. Reduce the Influence of Uncertainty on Execution Times

If a particular source of uncertainty has no influence on the execution time of the program under analysis, it is irrelevant for timing analysis.
Principal example: temporal isolation.


Temporal Isolation

Temporal isolation between cores: the timing of a program on one core is independent of the activity on the other cores.

Formally: $T(P_1, \langle p_1, c_1, p_2, c_2 \rangle) = T_{\mathrm{isolated}}(P_1, \langle p_1, c_1 \rangle)$,
where $\langle p_1, c_1 \rangle$ and $\langle p_2, c_2 \rangle$ denote the microarchitectural state associated with core 1 and core 2, respectively.

This can be exploited in WCET analysis:
$\mathrm{WCET}(P_1) = \max_{p_1, c_1, p_2, c_2} T(P_1, \langle p_1, c_1, p_2, c_2 \rangle) = \max_{p_1, c_1} T_{\mathrm{isolated}}(P_1, \langle p_1, c_1 \rangle)$


Temporal Isolation: How to achieve it?

Partition resources in space and/or time; the resource then appears to each client like a slower and/or smaller private resource.
Examples:
- time-division multiple access (TDMA) arbitration in shared busses (see the sketch below)
- partitioned shared caches
- the PRET DRAM controller = time division of the data/control bus + space division of the DRAM internals
Why not simply provide private resources then?
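To see why TDMA arbitration gives temporal isolation on a shared bus, consider this small sketch; the number of cores, the slot length, and the access duration are assumed values, and the bound assumes an access must fit entirely within the requesting core's own slot.

```python
NUM_CORES = 4
SLOT_LEN = 8      # bus cycles per TDMA slot (assumed)
ACCESS_LEN = 8    # cycles needed to complete one bus access (assumed, <= SLOT_LEN)

def worst_case_bus_delay():
    # Worst case: the request arrives when only ACCESS_LEN - 1 cycles remain in
    # the core's own slot, so it waits out the rest of its slot, then the slots
    # of all other cores, and is finally served at the start of the next slot.
    return (ACCESS_LEN - 1) + (NUM_CORES - 1) * SLOT_LEN + ACCESS_LEN

print(worst_case_bus_delay())   # 39 cycles, regardless of what the other cores do
```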
3. Decouple Analysis Efficiency from the Number of Executions

1. Abstraction: represent many concrete executions by a few abstract ones.
   - The genius is in the choice of abstraction.
   - Usually requires some sort of monotonicity.
2. Disregard most executions, e.g. follow only the local worst case.
   - Requires the absence of timing anomalies and domino effects.
3. Divide and conquer: analyze different components separately.
   - Requires timing compositionality.


Timing Anomalies

A timing anomaly is a counterintuitive scenario in which the "local worst case" does not imply the "global worst case".

Example: scheduling anomaly.
[Diagram: tasks A-E scheduled on two resources; completing a task earlier leads to a longer overall schedule.]
Bounds on multiprocessing timing anomalies. R. L. Graham, SIAM Journal on Applied Mathematics, 1969. (http://epubs.siam.org/doi/abs/10.1137/0117039)

Example: speculation anomaly.
If the access to A hits the cache, the branch condition has not been evaluated yet, so B is prefetched speculatively; this memory access may induce additional cache misses later on. If the access to A misses, the branch condition is already evaluated by the time the access completes, and no prefetching occurs. The local best case (a hit) can thus lead to the global worst case.

Example: cache timing anomaly of FIFO.
For a 2-way FIFO cache and the access sequence b, c, b, d, c:
- starting from state [a, b], the first access to b hits, yet the sequence incurs 4 misses in total (final state [c, d]);
- starting from state [a, ?], the first access to b misses (the local worst case), yet the sequence incurs only 3 misses in total (final state [d, c]).
Similar examples exist for PLRU and MRU; this is impossible for LRU.


Timing Anomalies: Consequences for Timing Analysis

In the presence of timing anomalies, a timing analysis cannot make decisions "locally": it needs to consider all cases.
State of the art: integrated WCET analysis. Drawback: efficiency; it may run into a "state explosion problem".


Timing Anomalies: Open Analysis and Design Challenges

- How to determine whether a given timing model exhibits timing anomalies?
- How to construct processors without timing anomalies?
  - Caches: LRU replacement; no speculation.
  - Other aspects: "halt" everything upon every "timing accident" → possibly very inefficient.
- How to construct a conservative timing model without timing anomalies?
  - Can we, e.g., add a "safety margin" to the local worst case?


Domino Effects

Intuitively, a domino effect is an "unbounded" timing anomaly.
Examples: pipelines (e.g. the PowerPC 755); caches (FIFO, PLRU, MRU, ...).

Example: cache domino effect of FIFO.
Extending the FIFO anomaly above, the access pattern can be repeated so that the two executions return to their respective cache states after each iteration, incurring 4 misses versus 3 misses per iteration; the difference in execution time thus grows without bound. Similar examples exist for PLRU and MRU; this is impossible for LRU. (A small simulation reproducing the underlying 4-versus-3-miss behavior follows below.)


Domino Effects: Open Analysis and Design Challenges

Exactly as with timing anomalies:
- How to determine whether a given timing model exhibits domino effects?
- How to construct processors without domino effects?
- How to construct a conservative timing model without domino effects?
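The FIFO behavior above can be reproduced with a few lines of simulation. The sketch below models a 2-way fully associative FIFO cache (the representation is an illustrative choice, not the slides' tooling) and confirms the 4-versus-3-miss counts for the access sequence b, c, b, d, c from the two initial states.

```python
def simulate_fifo(initial, accesses, ways=2):
    """Count misses of a FIFO cache; 'initial' lists blocks from most recently
    inserted to oldest. On a hit, FIFO does not reorder its contents."""
    cache = list(initial)          # cache[0] = most recently inserted
    misses = 0
    for block in accesses:
        if block not in cache:     # miss: insert the block, evict the oldest entry
            misses += 1
            cache.insert(0, block)
            if len(cache) > ways:
                cache.pop()
    return misses

accesses = ["b", "c", "b", "d", "c"]
print(simulate_fifo(["a", "b"], accesses))   # first access hits  -> 4 misses overall
print(simulate_fifo(["a"], accesses))        # first access misses -> 3 misses overall
```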
Timing Compositionality: Motivation

Some timing accidents are hard or even impossible to statically exclude at any particular program point:
- interference on a shared bus, which depends on the behavior of the tasks executed on the other cores,
- interference on a cache in preemptively scheduled systems,
- DRAM refreshes.
But it may be possible to make cumulative statements about the number of such accidents.


Timing Compositionality: Intuitive Meaning

The timing of a program can be decomposed into contributions by different "components", e.g.
- pipeline
- cache (non-preempted)
- cache-related preemption delay
- bus interference
- DRAM refreshes
- ...

Example, decomposition into pipeline and cache contributions:
$T_{\mathrm{pipeline,cache}}(P, \langle p, c \rangle) = T_{\mathrm{pipeline}}(P, \langle p \rangle) + T_{\mathrm{cache}}(P, \langle c \rangle)$


Timing Compositionality: Application in Timing Analysis

Given such a decomposition (and monotonicity), the components, here pipeline and cache, can also be analyzed separately:
$\mathrm{WCET}_{\mathrm{pipeline,cache}}(P) = \max_{p,c} T_{\mathrm{pipeline,cache}}(P, \langle p, c \rangle) \le \max_{p} T_{\mathrm{pipeline}}(P, \langle p \rangle) + \max_{c} T_{\mathrm{cache}}(P, \langle c \rangle) = \mathrm{WCET}_{\mathrm{pipeline}}(P) + \mathrm{WCET}_{\mathrm{cache}}(P)$

Hahn, Reineke, Wilhelm: Towards Compositionality in Execution Time Analysis - Definition and Challenges. In CRTS 2013.


Timing Compositionality: Example: "Cache-aware" Response-Time Analysis

In preemptive scheduling, preempting tasks may "disturb" the cache contents of preempted tasks [Altmeyer et al.].
[Figure: timeline with arrivals and deadlines of tasks T1 and T2; T1 preempts T2, causing additional misses after the preemption.]
The additional misses due to preemption are referred to as the cache-related preemption delay (CRPD).

Timing decomposition:
- WCET of T2 without preemptions: C2
- WCET of T1 without preemptions: C1
- additional cost of T1 preempting T2: CRPD_{1,2} = BRT x (number of additional misses), where BRT is the block reload time
→ Response time of T2: $R_2 \le C_2 + \#\mathrm{preemptions} \cdot (C_1 + \mathrm{CRPD}_{1,2})$
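As a worked illustration of this bound (all task parameters below are assumed numbers, not from the slides), the response time can be computed with the usual fixed-point iteration, bounding the number of preemptions by ceil(R2 / T1).

```python
from math import ceil

C1, T1 = 2.0, 10.0        # WCET and period of the preempting task T1 (assumed)
C2 = 12.0                 # WCET of T2 without preemptions (assumed)
BRT = 0.1                 # block reload time: cost of one additional miss (assumed)
ADDITIONAL_MISSES = 6     # bound on misses caused by one preemption (assumed)
CRPD = BRT * ADDITIONAL_MISSES

def response_time():
    # Fixed-point iteration: R = C2 + ceil(R / T1) * (C1 + CRPD).
    # Converges for these values; a real analysis would also bound the iteration.
    r = C2
    while True:
        r_next = C2 + ceil(r / T1) * (C1 + CRPD)
        if r_next == r:
            return r
        r = r_next

print(response_time())    # 17.2 for these assumed parameters
```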
Timing Compositionality: Open Analysis and Design Challenges

- How to check whether a given decomposition of a timing model is valid?
- How to compute bounds on the cost of individual events, such as cache misses (BRT in the previous example) or bus stalls?
- How to build a microarchitecture that permits a sound and precise decomposition of its timing, beyond simply stalling the whole system?
- Evaluate the costs and benefits of compositional analyses.


Emerging Challenge: Microarchitecture Selection & Configuration

Given embedded software with timing requirements and a family of microarchitectures (a platform), which configuration should be chosen?
Choices include:
- processor frequency
- sizes and latencies of local memories
- latency and bandwidth of the interconnect
- presence of a floating-point unit
- ...
Goal: select a microarchitecture that
a) satisfies all timing requirements, and
b) minimizes cost/size/energy.
Reineke and Doerfert: Architecture-Parametric Timing Analysis. In RTAS 2014.


Summary: Approaches to Increase Predictability

Inputs = program inputs + initial state of the microarchitecture + tasks on other cores.
1. Reduce the uncertainty, e.g. by simplifying the microarchitecture (eliminate caches, branch prediction, etc.).
2. Reduce the influence of the uncertainty on the executions, e.g. by temporal isolation.
3. Decouple the analysis efficiency from the number of executions: eliminate timing anomalies and domino effects, and achieve timing compositionality, e.g. by LRU replacement or by stalling the pipeline upon a cache miss.


Conclusions

Challenges in modeling, analysis, and design remain.
Progress based on machine learning, abstraction, and partitioning has been made.
Thank you for your attention!


Some References

- A Compiler Optimization to Increase the Efficiency of WCET Analysis. M. A. Maksoud and J. Reineke. In RTNS, 2014.
- Architecture-Parametric Timing Analysis. J. Reineke and J. Doerfert. In RTAS, 2014.
- Selfish-LRU: Preemption-Aware Caching for Predictability and Performance. J. Reineke, S. Altmeyer, D. Grund, S. Hahn, C. Maiza. In RTAS, 2014.
- Towards Compositionality in Execution Time Analysis - Definition and Challenges. S. Hahn, J. Reineke, and R. Wilhelm. In CRTS, 2013.
- Impact of Resource Sharing on Performance and Performance Prediction: A Survey. A. Abel, F. Benz, J. Doerfert, B. Dörr, S. Hahn, F. Haupenthal, M. Jacobs, A. H. Moin, J. Reineke, B. Schommer, and R. Wilhelm. In CONCUR, 2013.
- Measurement-based Modeling of the Cache Replacement Policy. A. Abel and J. Reineke. In RTAS, 2013.
- A PRET Microarchitecture Implementation with Repeatable Timing and Competitive Performance. I. Liu, J. Reineke, D. Broman, M. Zimmer, and E. A. Lee. In ICCD, 2012.
- PRET DRAM Controller: Bank Privatization for Predictability and Temporal Isolation. J. Reineke, I. Liu, H. D. Patel, S. Kim, and E. A. Lee. In CODES+ISSS, 2011.