851-0585-04L – Modeling and Simulating Social Systems with MATLAB
Lecture 9 – Simulation Speed & Parallelization
Karsten Donnay and Stefano Balietti
Chair of Sociology, in particular of Modeling and Simulation
© ETH Zürich | 2011-11-21

Schedule of the course
Dates: 26.09., 03.10., 10.10., 17.10., 24.10., 31.10., 07.11., 14.11., 21.11., 28.11., 05.12., 12.12., 19.12.
Phases: Introduction to MATLAB; Working on projects (seminar thesis); Introduction to social-science modeling and simulations; Handing in seminar thesis and giving a presentation (19.12.)
2011-11-21 K. Donnay & S. Balietti / [email protected] [email protected]

Goals of Lecture 9: students will
1. Refresh the knowledge on continuous simulations acquired in lecture 8 through a brief repetition of the main points.
2. Get an overview of potential issues that might affect computational performance/speed in MATLAB and learn strategies to avoid performance loss.
3. Note that this lecture emphasizes both the importance of efficient program design and strategies to avoid MATLAB-specific performance issues.
4. Understand the basics of MATLAB's Parallel Computing Toolbox.
5. Put some of the above concepts into practice with code examples.

Repetition
- Transition from discrete spatial models to continuous spatial models, where computational agents are treated analogously to particles in physics.
- Random walk introduced as one generic microscopic agent dynamic that macroscopically leads to diffusion of the whole ensemble of agents (micro-macro link).
- The concept of the random walk is not limited to spatial dynamics; it has also been used successfully, for example, in the modeling of financial markets.
- Continuous models vs. discrete models (CA, lattice): depending on the context, one or the other can be more suitable; both approaches are used in the literature!
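To recall the micro-macro link, here is a minimal sketch (not the lecture's own code) of unbiased one-dimensional random walkers: every agent takes a step of -1 or +1 per time step, and the mean squared displacement (MSD) of the whole ensemble grows linearly in time, which is the signature of diffusion. The names nAgents and nSteps and their values are illustrative assumptions.

```matlab
% Minimal sketch: unbiased 1D random walk of many agents.
% nAgents and nSteps are illustrative values, not from the lecture.
nAgents = 10000;   % number of agents
nSteps  = 100;     % number of time steps

% Each step is -1 or +1 with equal probability (rows = time, columns = agents)
steps = 2 * (rand(nSteps, nAgents) > 0.5) - 1;

% Cumulative sum over time gives each agent's position after every step
positions = cumsum(steps, 1);

% Mean squared displacement of the whole ensemble after each step
msd = mean(positions.^2, 2);

% Diffusion: for unit steps the MSD grows linearly, msd(t) is close to t
fprintf('MSD after %d steps: %.1f (diffusive prediction: %d)\n', ...
        nSteps, msd(end), nSteps);
```

The individual trajectories are random, but the ensemble quantity msd is predictable; plotting msd against 1:nSteps would show the straight line expected for diffusion.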
How To Optimize Your Program: General Remarks
- Plan your program/work flow before implementing it.
- Only perform the minimal number of computational steps necessary to obtain results.
- Store intermediate results and re-use them.
- Use global variables with caution.
- Do not visualize in real time if not necessary.

How To Optimize Your Program: Strategy
- The design and structure of your program is where performance is most easily gained.
- Use functions to run your program; functions are what MATLAB optimizes best for performance!
- Subroutines improve both the performance and the readability of your code.
- In a second step, optimize each routine for individual speed and test its performance.

How To Optimize Your Program: Possible Program Structure
- Define and initialize all variables in a script file that serves as the main file of your program.
- Execute the actual program as a function that takes the above variables as inputs.
- Automated analysis, visualization, etc. of the simulation output may then be implemented in the main file, using the outputs of the function that runs the actual program.

MATLAB-specific Performance Issues: Memory Preallocation
- Data structures (arrays, cell arrays) created without a specified size have to be resized whenever they grow; MATLAB then often has to find a larger contiguous block of memory and copy the existing data into it.
- Resizing data structures is bad for both performance and memory efficiency!
- Whenever possible, initialize a data structure so that it does not have to be resized while the program is running.
- Use the commands zeros(n,m) and cell(n,m).

MATLAB-specific Performance Issues: Limit the Complexity of the Program
- MATLAB has a limit on the complexity of program code it can interpret!
- Subdivide your program into routines, each of which is independently implemented as a function.
- Avoid excessive use of if… else… statements, in particular multiply nested expressions.
- "Minimalistic" programming will not only improve performance but also greatly improve readability!

MATLAB-specific Performance Issues: Variable Casting
- Do not recast your data structures during program execution.
- In particular, avoid storing the "wrong" data type in a data structure (for example a complex number or a string in an array of type double); MATLAB is otherwise forced to recast the data structure!
- The MATLAB default number format is double; if you explicitly need another number format, use for example zeros(10, 'int32') to initialize the array (here a 10×10 array of type int32).

MATLAB-specific Performance Issues: Short-circuit Operators
- In logical operations, make use of the short-circuit operators that MATLAB provides, i.e. use && and || instead of & and |.
- Advantage: in a compound logical statement, MATLAB evaluates the second expression only if the first one does not already determine the result. For && it stops if the first expression is false, for || if it is true; e.g. (x >= 3) && (y > 4) stops after the first expression if x < 3.
- Evaluating as few Boolean expressions as possible is a direct performance boost.

MATLAB-specific Performance Issues: Overloading Built-in Functions
- MATLAB is optimized for the performance of its built-in functions.
- Unlike, for example, in C++, overloading a built-in function can lead to performance losses, since you might interfere with particular optimizations in the built-in function.
- MATLAB is closed-source software, so you usually do not know what you are tampering with. Our advice: stay away from overloading built-in functions!
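The Memory Preallocation and Variable Casting advice can be checked with a small experiment. The following is a sketch under illustrative assumptions (problem size n, squared indices as data): it fills the same array once by growing it element by element and once after preallocating it with zeros, and it preallocates an int32 array directly instead of recasting later. The measured gap depends on the machine and MATLAB version, since the JIT compiler can soften the cost of growing arrays.

```matlab
% Growing an array inside a loop vs. filling a preallocated array.
% n is an illustrative problem size; timings depend on machine and version.
n = 2e4;

% Version 1: no preallocation, the array grows in every iteration
tic;
xGrow = [];
for i = 1:n
    xGrow(i) = i^2;   % may force MATLAB to reallocate and copy the array
end
tGrow = toc;

% Version 2: allocate the full array once with zeros, then fill it
tic;
xPre = zeros(1, n);
for i = 1:n
    xPre(i) = i^2;
end
tPre = toc;

% Non-default type: preallocate directly as int32 instead of recasting later
counts = zeros(1, n, 'int32');

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrow, tPre);
```

Both versions produce identical results; only the memory handling differs, which is why preallocation is a change you can make without touching the logic of a routine.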
MATLAB-specific Performance Issues: MATLAB Specifics
- MATLAB is best optimized for the execution of functions; use them instead of scripts whenever possible.
- If you need to store data structures outside your program, use the MATLAB save and load commands; they are superior to routines like fread or fwrite (faster and less memory fragmentation).
- Avoid running CPU- and/or memory-intensive programs at the same time as MATLAB, or give MATLAB a higher priority.

MATLAB-specific Performance Issues: Vectorizing
- A large performance gain can be achieved by vectorizing for or while loops.
- Instead of iteratively calculating the results, the loops are expressed as matrix operations, for which MATLAB is optimized.
- E.g. instead of

    for i = 1:10
        x(i) = i^2;
    end

  simply use

    indices = 1:10;
    x = indices.^2;

MATLAB-specific Performance Issues: Vectorizing
- There are a number of functions specifically used in the context of vectorized computations, for example meshgrid or reshape.
- There is exhaustive documentation of vectorization techniques available on the Internet.
- A quick overview and a list of useful MATLAB functions for vectorizing computations may be found here, a more detailed introduction here.

A word of caution…
- The fact that MATLAB is optimized for matrix operations does not mean that a vectorized routine is faster in every case!
- Vectorizing a computation can be quite costly (e.g. when it needs large temporary arrays) and can greatly affect performance.
- MATLAB uses a just-in-time (JIT) compiler that recognizes high-level commands and replaces them with native machine instructions. This can, for example, greatly accelerate loops!
- In the end, you have to test your code to see what is faster!
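To test "what is faster" in practice, a typical agent-model computation can be timed in both styles. This sketch assumes N agents at random positions in the unit square (illustrative data, not from the lecture) and computes all pairwise distances once with nested loops and once vectorized with meshgrid. The vectorized version holds several N-by-N temporary arrays, which is exactly the memory cost mentioned in the word of caution.

```matlab
% Pairwise distances between N agents: nested loops vs. meshgrid.
% N and the random positions are illustrative, not from the lecture.
N = 300;
x = rand(N, 1);   % x coordinates of the agents
y = rand(N, 1);   % y coordinates of the agents

% Loop version (with preallocated result)
tic;
dLoop = zeros(N, N);
for i = 1:N
    for j = 1:N
        dLoop(i, j) = sqrt((x(i) - x(j))^2 + (y(i) - y(j))^2);
    end
end
tLoop = toc;

% Vectorized version: meshgrid builds N-by-N copies of the coordinates,
% with Xj(i,j) = x(j) and Xi(i,j) = x(i); the same holds for y
tic;
[Xj, Xi] = meshgrid(x, x);
[Yj, Yi] = meshgrid(y, y);
dVec = sqrt((Xi - Xj).^2 + (Yi - Yj).^2);
tVec = toc;

% The vectorized version trades memory (several N-by-N arrays) for speed
fprintf('loops: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
```

For very large N the temporaries grow quadratically, so the loop (accelerated by the JIT compiler) or a chunked computation may become the better choice; only measuring tells.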
Measuring performance
- MATLAB has a profiler that tracks the performance of your code line by line.
- Start it with profile on; the command profile viewer stops the profiler and displays the results.
- When using the profiler, optimize the slowest sections of your code first, since that is where the performance gain is largest.
- It is important to test the code under the full load of the program; it might perform differently for small and large data structures!
- The MATLAB documentation of the profiler may be found here.

Measuring performance
- Alternatively, use the tic and toc commands to test a particular routine: tic starts the timer, and toc reads it and displays the elapsed time.
- It is often convenient to store the timer result in a variable, e.g. elapsedTime = toc.
- You can also time several routines simultaneously by storing a timer ID, ticID = tic, and passing it to the corresponding timing command, toc(ticID).
- Documentation may be found here.

Parallelization in MATLAB: When Is It Useful?
- MATLAB's parallel computing features are useful when performing a large number of independent operations.
- E.g., when the same simulation has to be run for multiple parameter combinations, parallelization boosts performance.
- Note that the data structures and program routines have to be parallelizable.
- You also need multiple cores on your computer or access to a computational cluster.
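The timer-ID form of tic and toc from the Measuring performance slides can be sketched as follows. The two timed "routines" (a summation loop and a sort of random numbers) are illustrative placeholders; each toc(ID) reports the time since its own tic, so overlapping measurements do not interfere.

```matlab
% Timing two routines independently with timer IDs (ticID = tic).
% The two "routines" are illustrative placeholders.
tAll = tic;                 % timer for the whole block

tSum = tic;                 % timer for routine 1: a summation loop
s = 0;
for k = 1:1e5
    s = s + k;
end
elapsedSum = toc(tSum);     % seconds since tSum was started

tSort = tic;                % timer for routine 2: sorting random numbers
v = sort(rand(1, 1e6));
elapsedSort = toc(tSort);   % seconds since tSort was started

elapsedAll = toc(tAll);     % seconds since tAll was started (covers both)

fprintf('loop: %.4f s, sort: %.4f s, total: %.4f s\n', ...
        elapsedSum, elapsedSort, elapsedAll);

% For a line-by-line breakdown, the profiler can be used instead:
%   profile on; <run your code>; profile viewer
```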
Parallelization in MATLAB: The Toolbox
- The toolbox comes with a number of high-level programming constructs such as the parfor loop.
- A number of built-in routines automatically parallelize their work stream if your hardware allows it.
- A built-in distributed computing interface supports various schedulers like Platform LSF®, Microsoft® Windows® Compute Cluster Server & HPC Server 2008 or Altair PBS Pro®.
- The newest MATLAB version (as of 2011) also supports GPU computing.

How to use parallel for loops: parfor

    matlabpool open N      % N = how many parallel workers, e.g. matlabpool open 4
    clear A
    parfor i = 1:8
        A(i) = i;
    end
    A
    matlabpool close

The real number of workers you can use depends on the number of cores of your CPU. (In MATLAB R2013b and later, matlabpool is replaced by parpool(N) to open a pool and delete(gcp) to close it.)

Batch
Because we still need our laptop while running MATLAB…

    j = batch('myScript');  % run myScript.m in the background
    wait(j);                % wait for the job to finish
    load(j);                % load the job results into the workspace
    destroy(j);             % remove the job (delete(j) in newer MATLAB versions)

The Multi-Agent Simulator Revisited
Programming an agent-based simulator often goes in five steps:

    Initialization          - initial state; parameters; environment
    Time loop               - processing each time step
        Agents loop         - processing each agent
            Update state    - updating agent i at time t
        end
    end
    Save data               - for further analysis

The Multi-Agent Simulator Revisited
Main Program: 1. Initialization, 2. Simulation, 3. Saving Results, followed by Data Analysis
- Keep 1., 2., 3. separated (as much as possible).
- Create interfaces between the components.
- Make input and output more re-usable with objects.
- Separate data generation and data analysis.

The Multi-Agent Simulator Revisited
Object definition:
- Objects can be used to store both input and output.
- A simple way to define objects is using struct:

    myObj = struct('property_A', 1, ...
                   'property_B', 2);
    myObj.property_A    % access a property; gives 1

The Multi-Agent Simulator Revisited
The main program:
1. Loads the configuration file: load('conf_file');
2. Generates all the combinations of parameter sets specified in the configuration file
3. Launches N simulations per parameter set
4. Saves the results

The Multi-Agent Simulator Revisited
Initialization: should be placed in a separate file containing:
- Model variables: one (or more) combinations of model parameters; number of simulations per parameter set (single combination)
- 'Global' settings: relevant directories (log, dump…), dummies, etc.
- Other info: name, description, version, date…
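Using struct objects for both input and output can be sketched for a single run as follows. All field names (n_agents, steps, dt, finalPositions, msd) and the stand-in simulation, a Gaussian random walk, are hypothetical choices for illustration; a real simulator would call its own simulation function here. Note that the variable is called params rather than input, to avoid shadowing the built-in function input.

```matlab
% Sketch: storing the input and output of one run in structs.
% All field names and the stand-in simulation are hypothetical.
params = struct('n_agents', 50, 'steps', 100, 'dt', 0.01);

% Stand-in simulation: 1D random walk with Gaussian steps of variance dt
simulate = @(p) cumsum(sqrt(p.dt) * randn(p.steps, p.n_agents), 1);

positions = simulate(params);   % rows = time steps, columns = agents

output = struct('finalPositions', positions(end, :), ...
                'msd', mean(positions(end, :).^2));

% Keep the output together with the input that produced it
result = struct('input', params, 'output', output);
```

Saving result with save(fileName, 'result') then stores each output next to the exact parameters that produced it, which keeps data generation and data analysis cleanly separated.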
Initialization file example

    % GLOBAL Conf
    simName = 'MyCoolSym';
    dumpDir = 'dump/';
    RUNS = 10;
    globals = struct('dumpDir', dumpDir, 'RUNS', RUNS);

    % MODEL Conf (notice: we are using vectors here!)
    dts = [0.01];           % time_step
    steps = [2];            % number of simulation steps
    n_agents = 10:10:100;   % number of agents
    model = struct('dts', dts, 'steps', steps, 'n_agents', n_agents);

    % Put all together
    init = struct('name', simName, 'globals', globals, 'model', model);
    save(simName);          % creates a loadable file (MyCoolSym.mat)

The Multi-Agent Simulator Revisited: Parameter Sweeping
The combinatorial exploration of the parameter space (step 2 of the main program) is also called 'Parameter Sweeping'. Its purpose is to determine the behavior of the system in different parameter ranges. It is normally executed by "exploding" the input vectors of parameters into multiple independent parameter sets. Nested for loops are one solution.
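Instead of one for loop per parameter, the parameter vectors can also be exploded into a flat list of independent parameter sets and processed in a single loop, which is the shape that parfor needs. This is a sketch, not the lecture's method: the vectors mirror the initialization file example (a second time step is added for illustration), and simulation is a hypothetical stand-in that returns one number per run. If no parallel pool is available, parfor simply runs serially.

```matlab
% Sketch: exploding parameter vectors into independent parameter sets.
% The vectors mirror the init file example; the simulation is a stand-in.
n_agents = 10:10:100;   % number of agents
dts      = [0.01 0.1];  % time steps (a second value added for illustration)
RUNS     = 10;          % repetitions per parameter set

% ndgrid forms every combination; (:) flattens them into columns
[A, D] = ndgrid(n_agents, dts);
paramSets = [A(:), D(:)];          % one row = one parameter set
nSets = size(paramSets, 1);

% Hypothetical stand-in simulation returning one scalar per run:
% mean squared position of nA random walkers after time 1
simulation = @(nA, dt) mean(sum(sqrt(dt) * randn(round(1 / dt), nA), 1).^2);

results = zeros(nSets, RUNS);      % preallocated: one row per parameter set
parfor s = 1:nSets                 % runs serially if no parallel pool exists
    p = paramSets(s, :);           % this iteration's parameter set
    row = zeros(1, RUNS);
    for r = 1:RUNS
        row(r) = simulation(p(1), p(2));
    end
    results(s, :) = row;           % sliced output: each iteration fills its row
end
```

Because every row of paramSets is independent, the iterations can be distributed over workers without changing the logic; the nested-loop version remains equivalent for serial runs.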
Example of exploding parameter vectors into parameter sets

    for i1 = 1:size(param_As, 2)
        param_A = param_As(i1);
        for i2 = 1:size(param_Bs, 2)
            param_B = param_Bs(i2);
            % Repeat the same simulation nRUNS times
            for rCount = 1:nRUNS
                simulation(param_A, param_B);
            end
        end
    end

The Multi-Agent Simulator Revisited
Saving the results (step 4 of the main program): save(fileName, 'results')
- It is good practice to save the output of the simulation together with the input object that was used to run it.
- Save each parameter set into a separate folder: mkdir('foldername')
- Saved results may already include the average, or this computation can be left to the data analysis part.

Projects
There are no exercises today; please work on your projects! We would like to remind you that the oral project presentations will start in the week of the 19th of December. All projects are due by midnight on Friday, 16th of December.

References
- MATLAB documentation on performance
- Manual on writing fast MATLAB code
- MATLAB Technical Note on code vectorization
- MATLAB Parallel Computing Toolbox
- Short-circuit operators in MATLAB
- https://github.com/msssm/lectures_files/optimization/