Getting the Best Performance from V9 Threaded PROC SORT
Scott Mebust, System Developer, Base Information Technology
Copyright © 2005, SAS Institute Inc. All rights reserved.

The (Unofficial) SAS Skydiving Team

Keys to Sorting Performance
  - Know the conditions
  - Observe actual performance
  - Understand theoretical performance
  - Make adjustments

Know the Conditions
  - System
  - SAS
  - Sort job

Observe Actual Performance
  - Monitor system activity
  - Examine the SAS log
  - Measure system capabilities

Identify and Observe Sorting Phases
I/O Bound, External, Single-Threaded
  - Sort phase
  - Merge phase

Measure Storage Device Sequential Transfer Rates
From Within SAS
  - Create a large dataset (e.g. 4 x RAM)
  - Read the dataset, dumping to _NULL_
  - Ensure real time ≫ CPU time
  - Compute transfer rates (R):
      $R_{read} = F / t_{read}$
      $R_{write} = F / t_{write}$
    where
      F: size of the dataset (bytes)
      t: real time (seconds)

Measure In-Core Sorting Costs
  [Figure: CPU Time Per Observation. Normalized CPU time (seconds, 0 to 5.00E-07) against number of observations (10,000 to 1E+08, log scale); series "Actual" and "Predicted", with the "Small job overhead" region annotated.]

Understand Theoretical Performance
  - Classify the job
  - Estimate SORT running time
  - Consider estimation hazards

Classify the Job: Performance Limitation
  - Compute bound
  - I/O bound
  - Mixed
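The transfer-rate measurement and the real-time versus CPU-time check described above can be sketched as a few lines of arithmetic. This is only an illustration, not part of the original deck: the function names, the sample timings, and the cutoff ratio of 5 used to decide that real time "greatly exceeds" CPU time are all assumptions. The real and CPU times would come from the SAS log for the read and write steps.

```python
def transfer_rate(file_bytes: float, real_seconds: float) -> float:
    """Sequential transfer rate R = F / t, in bytes per second."""
    if real_seconds <= 0:
        raise ValueError("real time must be positive")
    return file_bytes / real_seconds


def is_io_bound(real_seconds: float, cpu_seconds: float, ratio: float = 5.0) -> bool:
    """True when real time exceeds CPU time by at least `ratio`.

    The cutoff ratio is an assumed rule of thumb, not a value from the slides.
    """
    return real_seconds >= ratio * cpu_seconds


# Hypothetical measurements for an 8 GiB dataset (timings as read from the SAS log).
F = 8 * 1024**3
r_read = transfer_rate(F, real_seconds=100.0)   # read step, dumping to _NULL_
r_write = transfer_rate(F, real_seconds=160.0)  # step that created the dataset

print(f"R_read  = {r_read / 1e6:.1f} MB/s")
print(f"R_write = {r_write / 1e6:.1f} MB/s")
print("measurement is I/O bound:", is_io_bound(real_seconds=100.0, cpu_seconds=8.0))
```

If the check fails (real time is close to CPU time), the step was limited by the processor or served from file cache rather than by the storage device, and the computed rate should not be used as a device transfer rate.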
Classify the Job: Size
  - Internal:     $(F + O) / M \le 1$
  - External:     $(F + O) / M > 1$
  - Single-pass:  $(F + O) / M \le M / B$
  - Multi-pass:   $(F + O) / M > M / B$
  where
    F: size of input dataset
    O: size of internal sorting overhead
    M: size of RAM
    B: utility file page (block) size

Estimate the Running Time: Internal Sort, I/O Bound
  - Input to RAM (sequential read):   $t_{read} = F / R_{read}$
  - RAM to Output (sequential write): $t_{write} = F / R_{write}$
  - Total: $t = t_{read} + t_{write}$
  where
    t: real time (sec)
    F: dataset size (bytes)
    R: transfer rate (bytes/sec)

Estimate the Running Time: Single-Pass External, I/O Bound
  - Input to RAM (sequential read):   $t_1 = F / R_{read}$
  - RAM to Temp (sequential write):   $t_2 = U / R_{write}$
  - Temp to RAM (random read):        $t_3 = ?$
  - RAM to Output (sequential write): $t_4 = F / R_{write}$
  - Total: $t = t_1 + t_2 + t_3 + t_4$
  where
    U: utility file size (bytes)

Utility File Read Time
  File size
    - Single-threaded: $U = F + o$
    - Multi-threaded:  $U = F$
    where
      F: size of input dataset
      o: number of observations × sort key length
  Number of pages
    $n_{pages} = U / B$
    where
      B: utility file page (block) size
  Best case (sequential) read time
    $t_{read} = U / R_{read}$
  Worst case (random) read time
    $t_{read} = n_{pages} \, (s + r + B / R_{read})$
    where
      s: average positional latency
      r: average rotational latency

Multi-Pass External Sorting
  Number of sorted runs
    $n_{runs} = (F + O) / M$
  Number of utility file passes
    $n_{passes} = \ln(n_{runs}) / \ln(n_{buffers})$
    where $n_{buffers} = M / B$ is the maximum external merge order
  and
    F: size of input dataset
    O: size of internal sorting overhead
    M: SORTSIZE
    B: utility file page (block) size

Estimate the Running Time: Single-Pass External, Compute Bound
  - Total: $t = t_{sort} + t_{merge} + t_{write}$
  - Input to RAM (sort): $t_{sort} = ?$
  - RAM to Output (sequential write): $t_{write} = F / R_{write}$
  - Temp (merge): $t_{merge} = ?$
  - Temp to RAM: random read

Single-Pass External, Compute Bound
  Utility file creation time
    $t_{sort} = n_{runs} \, t_{run}$
    where
      $n_{runs} = (F + O) / M$
      $t_{run}$ is the time required to perform an in-memory sort of the number of observations in a single run:
      $run_{obs} = n_{obs} / n_{runs}$
      $n_{obs}$ is the total number of observations in the dataset
  Utility file merge time, compute bound
    $t_{merge} \approx t_{sort}$
  Utility file merge time, I/O bound
    (as previously described for the I/O bound utility file read time)
    Best case:  $t_{merge} = U / R_{read}$
    Worst case: $t_{merge} = n_{pages} \, (s + r + B / R_{read})$

Consider Estimation Hazards
  - File cache effects
  - Pseudo-internal sorting (thrashing)
  - Pseudo-external sorting (file cache)
  - Limitations within each sorting phase

Make Adjustments
  - Determine if there is a problem
  - Identify the problem
  - Alter the conditions
  - Re-evaluate

Identify the Problem
  - Processing speed
  - Memory
  - External storage

Alter the Conditions
  - Memory settings
  - Library to storage device mappings
  - Utility file location
  - Utility file page size
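The size classification and the I/O bound running-time estimates from the "Understand Theoretical Performance" slides can be combined into a small calculator, which makes it easier to see how a change to memory settings or utility file page size moves the estimate. The sketch below is illustrative only: the function names and all sample values (file size, SORTSIZE, page size, transfer rates, latencies) are assumptions, the boundary cases are assigned to the smaller class, and rounding the number of passes up to a whole number is a choice made here rather than something stated in the slides.

```python
import math


def classify(F, O, M, B):
    """Classify a sort job by size from n_runs = (F + O) / M and n_buffers = M / B."""
    n_runs = (F + O) / M      # number of sorted runs
    n_buffers = M / B         # maximum external merge order
    if n_runs <= 1:
        return "internal"
    if n_runs <= n_buffers:
        return "single-pass external"
    return "multi-pass external"


def n_passes(F, O, M, B):
    """Utility file passes, ln(n_runs) / ln(n_buffers), rounded up (an assumption)."""
    n_runs = (F + O) / M
    n_buffers = M / B
    if n_runs <= 1:
        return 0              # internal sort: no utility file passes
    if n_buffers <= 1:
        raise ValueError("memory must hold more than one utility file page")
    return math.ceil(math.log(n_runs) / math.log(n_buffers))


def utility_read_time(U, B, R_read, s, r):
    """Return (best, worst) utility file read time in seconds."""
    n_pages = U / B
    best = U / R_read                          # purely sequential read
    worst = n_pages * (s + r + B / R_read)     # a seek and a rotation per page
    return best, worst


def single_pass_io_bound(F, U, B, R_read, R_write, s, r):
    """Return (best, worst) total time t = t1 + t2 + t3 + t4 in seconds."""
    t1 = F / R_read           # sequential read of the input
    t2 = U / R_write          # sequential write of the utility file
    t4 = F / R_write          # sequential write of the output
    t3_best, t3_worst = utility_read_time(U, B, R_read, s, r)
    return t1 + t2 + t3_best + t4, t1 + t2 + t3_worst + t4


# Hypothetical job: 10 GB input, 2 GB overhead, 1 GB SORTSIZE, 64 KiB pages,
# 100 MB/s read, 80 MB/s write, 8 ms positional and 4 ms rotational latency.
F, O, M, B = 10e9, 2e9, 1e9, 65536
U = F  # utility file size, taking the case where U equals F
print(classify(F, O, M, B), "with", n_passes(F, O, M, B), "utility file pass(es)")
best, worst = single_pass_io_bound(F, U, B, R_read=100e6, R_write=80e6, s=0.008, r=0.004)
print(f"estimated running time: {best:.0f} s (best) to {worst:.0f} s (worst)")
```

The gap between the best and worst case comes entirely from the per-page latency term, which is why the utility file page size and the utility file location appear among the conditions worth altering.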