Performance Measurement

A Quantitative Basis for Design
Parallel programming is an optimization problem. It must take several factors into account:
– execution time
– scalability
– efficiency
It must also take the costs into account:
– memory requirements
– implementation costs
– maintenance costs
– etc.
Mathematical performance models are used to assess these costs and predict performance.

Defining Performance
How do you define parallel performance? What do you define it in terms of? Consider:
– Distributed databases
– An image processing pipeline
– A nuclear weapons testbed

Metrics for Performance
– Efficiency
– Speedup
– Scalability
– Others...

Some Terms
s(n,p) = speedup for problem size n on p processors
o(n) = serial portion of the computation
p(n) = parallel portion of the computation
c(n,p) = time for communication
Speed1 = o(n) + p(n)
SpeedP = o(n) + p(n)/p + c(n,p)

Efficiency
The fraction of time a processor spends doing useful work:
E = T1 / (p * Tp)
E = (o(n) + p(n)) / (p * o(n) + p(n) + p * c(n,p))
What about when p * Tp < T1?
– Does cache make a processor work at 110%?

Speedup
What is Speed?
S = Speed1 / SpeedP
What algorithm for Speed1? What is the work performed? How much work?
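The speedup and efficiency definitions above can be sketched in a few lines of Python. This is a minimal illustration of the model, not code from the lecture; the function names and the sample values for o(n), p(n), and c(n,p) are invented for the example.

```python
def speedup(o, p_work, c, p):
    """Speedup = (o(n) + p(n)) / (o(n) + p(n)/p + c(n,p))."""
    return (o + p_work) / (o + p_work / p + c)

def efficiency(o, p_work, c, p):
    """Efficiency = T1 / (p * Tp), i.e. speedup divided by p."""
    return speedup(o, p_work, c, p) / p

# Hypothetical workload: 10 s serial part, 90 s parallelizable part,
# 5 s of communication, 8 processors.
s = speedup(10, 90, 5, 8)     # 100 / (10 + 11.25 + 5)
e = efficiency(10, 90, 5, 8)  # fraction of each processor's time doing useful work
```

With these numbers the communication term c(n,p) keeps the speedup well below the ideal value of 8, which is exactly the effect the model is meant to expose.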
Speedup (More Detail)
Using the terms above:
Speed1 = o(n) + p(n)
SpeedP = o(n) + p(n)/p + c(n,p)
Speedup = (o(n) + p(n)) / (o(n) + p(n)/p + c(n,p))

More on Speedup
[Figure: execution time vs. number of processors, split into computation and communication components]
Computation time decreases as we add processors, but communication time increases.

Two Kinds of Speedup
Relative
– Uses the parallel algorithm on 1 processor
– Most common
– Useful for determining algorithm scalability
Absolute
– Uses the best known serial algorithm
– Eliminates overheads in the calculation
– Useful for expressing absolute performance
Story: prime number generation

Amdahl's Law
Every algorithm has a sequential component, and the sequential component limits speedup.
Example: ¾ of the program can be parallelized, ¼ is sequential. Suppose each ¼ of the program takes 1 unit of time. With enough processors the parallel ¾ takes negligible time, so:
Speedup = 1-processor time / n-processor time = 4/1 = 4
In general, if s is the sequential fraction, the maximum speedup is 1/s.

Amdahl's Law (Derivation)
Speedup = (o(n) + p(n)) / (o(n) + p(n)/p + c(n,p))
Speedup <= (o(n) + p(n)) / (o(n) + p(n)/p)
Let s = o(n) / (o(n) + p(n)) be the inherently sequential fraction. Then p(n) = o(n)(1/s - 1), and:
Speedup <= (o(n)/s) / (o(n) + o(n)(1/s - 1)/p)
Speedup <= 1 / (s + (1 - s)/p)
[Figure: maximum speedup as a function of the serial fraction s]

Speedup
Algorithm A
– Serial execution time is 10 sec.
– Parallel execution time is 2 sec.
Algorithm B
– Serial execution time is 2 sec.
– Parallel execution time is 1 sec.
What if I told you A = B?

Speedup
Conventional speedup is defined as the reduction in execution time. Consider running a problem on a slow parallel computer and on a faster one:
– Same serial component
– Speedup will be lower on the faster computer.

Logic
The art of thinking and reasoning in strict accordance with the limitations and incapacities of the human misunderstanding. The basis of logic is the syllogism, consisting of a major premise, a minor premise, and a conclusion.
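The closed form of Amdahl's Law derived above is easy to check numerically. A small Python sketch (the function name and sample values are illustrative, not from the slides):

```python
def amdahl_speedup(s, p):
    """Amdahl's Law upper bound: 1 / (s + (1 - s)/p),
    where s is the inherently sequential fraction."""
    return 1.0 / (s + (1.0 - s) / p)

# The slide's example: 1/4 of the program is sequential (s = 0.25).
on_4_procs = amdahl_speedup(0.25, 4)
# As p grows without bound, the speedup approaches the 1/s = 4 limit.
limit = amdahl_speedup(0.25, 10**9)
```

Note that even with only 4 processors the bound (about 2.29) is already well below the asymptotic limit of 4, which is why the sequential fraction dominates the analysis.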
Example
Major premise: Sixty men can do a piece of work sixty times as quickly as one man.
Minor premise: One man can dig a post-hole in sixty seconds.
Conclusion: Sixty men can dig a post-hole in one second.

Speedup and Amdahl's Law
Conventional speedup penalizes faster absolute speed. The assumption that task size stays constant as computing power increases exaggerates the task overhead. Scaling the problem size reduces these distortion effects.

Solution
Gustafson introduced scaled speedup: scale the problem size as you increase the number of processors. It can be calculated in two ways:
– Experimentally
– With analytical models

Traditional Speedup (Strong Scaling)
Speedup = T1(N) / TP(N)
where Tx(y) is the time taken to solve a problem of size y on x processors.

Scaled Speedup (Weak Scaling)
Speedup = T1(PN) / TP(PN)
Traditional speedup reduces the work done by each processor as we add processors; scaled speedup keeps the work per processor constant as we add processors.

Scaled Speedup (Derivation)
Speedup <= (o(n) + p(n)) / (o(n) + p(n)/p)
The parallel time o(n) + p(n)/p can be divided into a serial piece and a parallel piece:
s = o(n) / (o(n) + p(n)/p)    and    (1 - s) = (p(n)/p) / (o(n) + p(n)/p)
Now solve for o(n) and p(n) respectively:
o(n) = (o(n) + p(n)/p) * s
p(n) = (o(n) + p(n)/p) * (1 - s) * p
Substituting these back into the speedup equation yields:
Speedup <= s + (1 - s) * p
or equivalently:
Speedup <= p + (1 - p) * s
where s is the fraction of time spent in serial code, s = o(n) / t(n,k), and t(n,k) is the time of the parallel program for size n on k processors. Thus the maximum speedup with p < k processors is:
Speedup <= p + (1 - p) * s

[Figure: traditional speedup; the measured curve falls below the ideal line as processors are added]
[Figure: scaled speedup for small, medium, and large problems; larger problems track the ideal line more closely]

Scaled Speedup vs. Amdahl's Law
Amdahl's Law determines speedup by taking a serial computation and predicting how quickly it could be done in parallel. Scaled speedup begins with a parallel computation and estimates how much faster the parallel computation is than the same
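The two equivalent forms of the scaled-speedup bound derived above can be verified directly. A short Python sketch (names and sample values are illustrative):

```python
def gustafson_speedup(s, p):
    """Scaled (Gustafson) speedup bound: s + (1 - s) * p,
    where s is the serial fraction of the parallel run's time."""
    return s + (1.0 - s) * p

def gustafson_speedup_alt(s, p):
    """The algebraically equivalent form: p + (1 - p) * s."""
    return p + (1.0 - p) * s

# Hypothetical case: 5% serial fraction on 64 processors.
bound = gustafson_speedup(0.05, 64)       # 0.05 + 0.95 * 64
bound_alt = gustafson_speedup_alt(0.05, 64)
```

Unlike Amdahl's bound, this grows linearly in p for fixed s, which is exactly the optimism that scaling the problem size buys.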
computation on a serial processor.
Strong scaling is defined as how the solution time varies with the number of processors for a fixed total problem size. Weak scaling is defined as how the solution time varies with the number of processors for a fixed problem size per processor.

Determining Scaled Speedup
– Time problem size n on 1 processor
– Time problem size 2n on 2 processors
– Time problem size 2n on 1 processor
– Time problem size 4n on 4 processors
– Time problem size 4n on 1 processor
– etc.
Plot the curve.

Performance Measurement
There is no perfect way to measure and report performance. Wall-clock time seems to be the best, but how much work do you do? Best bet: develop a model that fits experimental results.

A Parallel Programming Model
Goal: define an equation that predicts execution time as a function of:
– Problem size
– Number of processors
– Number of tasks
– Etc.
T = f(N, P, ...)
Execution time can be broken up into computing, communicating, and idling:
T = Tcomp + Tcomm + Tidle

Computation Time
Normally depends on problem size. Also depends on machine characteristics:
– Processor speed
– Memory system
– Etc.
Often obtained experimentally.

Communication Time
The amount of time spent sending and receiving messages. Most often it is calculated as:
– Cost of sending a single message * #messages
Single message cost:
– T = startup_time + time_to_send_one_word * #words

Idle Time
Difficult to determine. This is often the time spent waiting for a message to be sent to you. It can be avoided by overlapping communication and computation.
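The communication-cost model above (a startup latency plus a per-word transfer cost, multiplied by the message count) can be written as a small Python sketch. The parameter values are hypothetical, chosen only to exercise the formula:

```python
def message_time(startup, per_word, n_words):
    """Cost of one message: startup latency + per-word transfer time."""
    return startup + per_word * n_words

def comm_time(startup, per_word, n_words, n_messages):
    """Total communication time: single-message cost * number of messages."""
    return n_messages * message_time(startup, per_word, n_words)

# Hypothetical network: 50 us startup, 0.01 us per word,
# 1000-word messages, 200 messages per iteration.
t = comm_time(50e-6, 0.01e-6, 1000, 200)
```

For short messages the startup term dominates (here it is 5x the transfer term), which is why batching many small messages into fewer large ones is a standard optimization under this model.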
Finite Difference Example
Finite difference code:
– 512 x 512 x 5 elements (n x n x z)
– Nine-point stencil
– Row-wise decomposition: each processor gets (n/p) * n * z elements
– 16 IBM RS6000 workstations connected via Ethernet

Finite Difference Model
Execution time (per iteration):
– ExTime = (Tcomp + Tcomm) / P
Communication time (per iteration):
– Tcomm = 2 (lat + 2*n*z*bw)
Computation time:
– Estimate using some sample code

Estimated Performance
What was wrong? Ethernet is a shared bus, so change the computation of Tcomm:
– Reduce the bandwidth
– Scale the message volume by the number of processors sending concurrently
– Tcomm = 2 (lat + 2*n*z*bw * P/2)

Using Analytical Models
– Examine the control flow of the algorithm.
– Find a general algebraic form for the complexity (execution time).
– Fit the curve with experimental data.
– If the fit is poor, find the missing terms and repeat.
– Calculate the scaled speedup using the formula.

Example
Serial time = 2 + 12N seconds
Parallel time = 4 + 12N/P + 5P seconds
Let N/P = 128. The scaled speedup for 4 processors is:
Speedup = C1(PN) / CP(PN) = (2 + 12*(4*128)) / (4 + 12*(4*128)/4 + 5*4) = 6146 / 1560 ≈ 3.94

Performance Evaluation
Identify the data:
– Execution time
– Be sure to examine a range of data points
Design the experiments to obtain the data:
– Make sure the experiment measures what you intend to measure.
– Remember: execution time is the maximum time taken by any processor.
– Repeat your experiments many times.
– Validate the data by designing a model.
Report the data:
– Report all information that affects execution.
– Keep results separate from conclusions.
– Present the data in an easily understandable format.
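The worked scaled-speedup example can be reproduced with a few lines of Python that simply encode the slide's two timing formulas:

```python
def serial_time(N):
    """Modeled serial execution time: 2 + 12N seconds."""
    return 2 + 12 * N

def parallel_time(N, P):
    """Modeled parallel execution time: 4 + 12N/P + 5P seconds."""
    return 4 + 12 * N / P + 5 * P

# Weak scaling: fix the work per processor at N/P = 128,
# so on P = 4 processors the total problem size is N = 512.
P = 4
N = 128 * P
scaled = serial_time(N) / parallel_time(N, P)  # 6146 / 1560
```

The model's 5P term (an overhead that grows with the processor count) is what keeps the scaled speedup at about 3.94 rather than the ideal 4.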