Introduction to Parallel Processing

Introduction to Parallel Processing Transcript
Introduction to Parallel Processing Transcript was developed by Michelle Buchecker. Additional
contributions were made by Cheryl Doninger, Glenn Horton, Merry Rabb, and Christine Riddiough.
Editing and production support was provided by the Curriculum Development and Support Department.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of
SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product
names are trademarks of their respective companies.
Introduction to Parallel Processing Transcript
Copyright © 2009 SAS Institute Inc. Cary, NC, USA. All rights reserved. Printed in the United States of
America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
Book code E1411, course code RLSPCN06, prepared date 31Mar2009.
RLSPCN06_001
ISBN 978-1-59994-969-7
For Your Information
Table of Contents
Lecture Description
Prerequisites
Introduction to Parallel Processing
1. Basics of Parallel Processing
2. Threading
Lecture Description
This is the first e-lecture of a five-lecture series on parallel processing with SAS/CONNECT software.
This lecture teaches you the basics of using SAS/CONNECT software to perform parallel processing on a
single machine or across multiple machines.
To learn more…
For information on other courses in the curriculum, contact the SAS Education
Division at 1-800-333-7660, or send e-mail to [email protected]. You can also
find this information on the Web at support.sas.com/training/ as well as in the
Training Course Catalog.
For a list of other SAS books that relate to the topics covered in this
Course Notes, USA customers can contact our SAS Publishing Department at
1-800-727-3228 or send e-mail to [email protected]. Customers outside the
USA, please contact your local SAS office.
Also, see the Publications Catalog on the Web at support.sas.com/pubs for a
complete list of books and a convenient order form.
Prerequisites
Before listening to this lecture, you should be able to
• write DATA and PROC steps
• understand error messages in the SAS log and debug your programs.
Introduction to Parallel Processing
Welcome to the Introduction to Parallel Processing e-lecture. My name is Michelle and I will be your
instructor for this lecture. I have been an instructor for SAS for over 15 years and my specialties include
SAS/CONNECT software.
Lectures Available
• Introduction to Parallel Processing
• Using Parallel Processing on a Single Machine (Scaling Up)
• Using Parallel Processing Across Multiple Machines (Scaling Out)
• Pipeline Parallelism
• Managing Asynchronous Execution
This lecture series consists of five separate lectures. The first lecture is Introduction to Parallel
Processing. The second lecture is Using Parallel Processing on a Single Machine (Scaling Up). The third
lecture is Using Parallel Processing Across Multiple Machines (Scaling Out). The fourth lecture is
Pipeline Parallelism. And the fifth lecture is Managing Asynchronous Execution.
This is the first lecture in the series. We encourage you to listen to all five lectures to get a full
understanding of how to perform parallel processing.
In this lecture you will learn the basics of parallel processing, which is part of SAS/CONNECT software,
as well as threading, which is part of Base SAS software.
1. Basics of Parallel Processing
So let’s start off with the basics.
Objectives
• Define parallel processing concepts.
• Identify the benefits of parallel processing.
• Define independent parallelism.
In this section you will learn about the basics of parallel processing, what the benefits are, and what is
meant by independent parallelism.
Technology Yesterday
In years past, life was simple. Each computer had only one CPU and was not connected to another
computer. Programs ran faster on the machine that had the fastest processor, assuming no one else was
using the processor at that time. When you submitted your program, it ran in a top-down approach. We
didn’t have a word for it at the time, but now we call this technique synchronous processing. That is what
happened in SAS 6.
Synchronous Processing
[Figure: DATA Step A, PROC SORT A, DATA Step B, and PROC SORT B running one after another along an elapsed-time axis that starts at 0.]
Synchronous processing is when the execution of tasks follows one after the other.
In this example, I need to run a DATA step to read in data to create data set A, and then sort data set A.
Once that completes, the DATA step that reads in different data to create data set B starts, and then
that data set is sorted. So the total elapsed time is the sum of the elapsed times of all four steps.
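As a minimal sketch of the synchronous flow just described, the four steps might be coded as follows. The raw file names, the variable names, and the BY variable are hypothetical; they are not taken from the lecture slides.

```sas
/* Hypothetical inputs: source_a.dat and source_b.dat each hold an id and an amount. */

/* Step 1: read the first raw file into data set A. */
data a;
   infile 'source_a.dat';
   input id amount;
run;

/* Step 2: sort A. This step does not start until step 1 has finished. */
proc sort data=a;
   by id;
run;

/* Step 3: read the second raw file into data set B, only after A has been sorted. */
data b;
   infile 'source_b.dat';
   input id amount;
run;

/* Step 4: sort B. */
proc sort data=b;
   by id;
run;
```

Because each step waits for the one before it, the elapsed time of the whole program is the sum of the four step times reported in the SAS log.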
Parallel Processing Using MP CONNECT
Multiprocess (MP) CONNECT enables SAS tasks or sessions to
execute in parallel, and coordinates and synchronizes all results
into a single parent (or client) session.
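This simultaneous processing can occur on a single machine that has multiple processors, on multiple
machines that are networked together, or on a combination of both.
MP CONNECT is part of SAS/CONNECT software.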
The purpose of MP CONNECT is to reduce overall elapsed time by having multiple SAS sessions
executing code at once and then synchronizing the results in the parent, also known as the client, SAS
session. This parallel processing allows multiple tasks to be executed at the same time.
This simultaneous processing can occur on a single machine that has multiple processors, multiple
machines that are networked together, or a combination of both. If you are doing simultaneous processing
of SAS programs and coordinating the results on just one machine, then SAS/CONNECT must be
installed on that machine.
Benefits of Parallel Processing
The purposes of parallel processing (also known as multiprocessing
or asynchronous processing) include the following:
• execute independent tasks in parallel (SAS® 8)
• execute select dependent tasks in parallel (SAS®9)
• take advantage of multiple processors on a single symmetric multiprocessing (SMP) machine
With parallel processing you can execute independent tasks in parallel starting in SAS 8, or dependent
tasks in parallel using pipeline parallelism in SAS®9. Both cases require SAS/CONNECT software. This
allows you to take advantage of a machine with multiple processors on it, also known as an SMP or
symmetric multiprocessing machine.
• take advantage of each processor on a network of machines
• complete a job in less total elapsed time than it would take to execute the same job serially
• increase usage of underutilized CPUs
  – exploit current investment
  – prevent further monetary outlay for hardware
If you don’t have an SMP machine, that’s OK. Just farm out the processing to multiple machines on your
network to use the benefits of parallel processing. Either way, the whole goal of parallel processing is to
reduce the overall elapsed time that a job takes to run. This way you can take advantage of underutilized
CPUs and don’t have to buy additional hardware.
Networked Machines
[Figure: a central parent machine connected to Machine 1 through Machine 5.]
Most of us don’t work 24 hours a day, so your computer is idle at your desk while you are getting some
much deserved rest, and the same is true of your coworkers. Parallel processing helps you take advantage
of that computer downtime. This is an example of grid computing, which is simply a network of
computers that are used to do parallel processing to solve one application problem.
Grid Enabling a SAS Application
[Figure: the input data and the SAS program are split into sub-tasks, each sub-task runs on a grid node and produces a partial result, and the partial results are joined into the final result.]
This diagram shows the idea behind executing a long-running SAS job on a grid. The first step is to split
the job into multiple sub-tasks. These sub-tasks are distributed to the nodes of the grid. Executing these
sub-tasks results in many partial result sets that are then aggregated or joined together at the end to form
the final result.
SAS implements this advanced grid environment by using SAS Grid Manager and the Platform Suite for
SAS, which are separate components of the SAS Enterprise Business Intelligence Server.
This course does not go into more detail on grid computing using the SAS Grid Manager, but if you are
interested there is a recorded lecture series devoted to that topic in detail.
Note: Some type of software or grid middleware is needed in order to accomplish this parallel distributed
execution and aggregation of results.
The most basic grid environment consists of a SAS application that utilizes the parallel distributed
processing capabilities of SAS/CONNECT to run sub-components of an application in parallel across
nodes in a grid. This can be accomplished by using the %DISTRIBUTE macro, which serves as a
wrapper for the SAS/CONNECT functionality and provides simple yet powerful load balancing
of the workload across the nodes in the grid. This can result in performance gains in the range of 95-99%
for those applications that consist of many replicate runs of the same fundamental task (such as
applications doing BY-group processing) or tasks made up of multiple independent units of work (such as
large-scale parallel scoring).
A more advanced grid environment consists of using SAS Grid Manager and the Platform Suite for SAS,
which are separate components of the SAS Enterprise Business Intelligence Server.
Independent Parallelism
Today a computer can have multiple processors, or be part of a network.
SAS 8 software has the ability through MP CONNECT to take
advantage of additional processors for different steps.
[Figure: Data Source A followed by PROC SORT A runs alongside Data Source B followed by PROC SORT B, on a shared elapsed-time axis that starts at 0.]
Starting in SAS Version 8, SAS introduced parallel processing. Parallel processing was available in SAS
8 as long as the steps you wanted to execute in parallel were independent of one another. Independent
parallelism is when the execution of separate tasks does not have any interdependencies.
For instance, I have a DATA step to read in data to create data set A and then sort data set A. While that
is happening, I can also run a DATA step to read in different data to create data set B and sort that.
Because the processing for data set A has nothing to do with the processing for data set B, this is called
independent parallelism. This will reduce the overall elapsed time it takes the code to run. Once the
longest task has finished, in our case processing B data, then I can merge the two data sets together.
To take advantage of this type of processing, which is known as MP CONNECT, you need to have
multiple CPUs, whether they are multiple CPUs on the same physical machine or single CPUs on a
number of different machines. Either way, SAS/CONNECT software is still required on those machines.
So even if you have a single machine with two CPUs and you want to run SAS steps in parallel, you must
have SAS/CONNECT on that one machine.
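The independent parallelism described above might be sketched with MP CONNECT as follows. This is an illustration under stated assumptions, not the course's own program: it assumes both child sessions run on the same multiprocessor machine as the parent, and the session names (taskA, taskB), file names, and variable names are hypothetical.

```sas
/* Start two child SAS sessions on this machine.                          */
/* "!sascmd" reuses the command that started the parent session.          */
signon taskA sascmd="!sascmd";
signon taskB sascmd="!sascmd";

/* WAIT=NO returns control to the parent immediately,                     */
/* so taskB can be submitted while taskA is still running.                */
rsubmit taskA wait=no;
   data a;
      infile 'source_a.dat';
      input id amount;
   run;
   proc sort data=a;
      by id;
   run;
endrsubmit;

rsubmit taskB wait=no;
   data b;
      infile 'source_b.dat';
      input id amount_b;
   run;
   proc sort data=b;
      by id;
   run;
endrsubmit;

/* Hold the parent here until both child sessions have finished. */
waitfor _all_ taskA taskB;

/* Bring each child's log and output back into the parent session. */
rget taskA;
rget taskB;

/* Point parent librefs at the WORK libraries of the child sessions. */
libname worka slibref=work server=taskA;
libname workb slibref=work server=taskB;

/* Merge the two sorted data sets in the parent. */
data both;
   merge worka.a workb.b;
   by id;
run;

/* End the child sessions. */
signoff taskA;
signoff taskB;
```

The WAITFOR statement is what synchronizes the results: the merge cannot safely start until the longer of the two tasks has completed.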
Sample Business Scenarios
Uses for multiprocessing for independent parallelism tasks include:
• multiple analyses of a single SAS data set
• concurrent creation of multiple summary data sets
• concurrent extraction of data from multiple data sources to be loaded into a data warehouse
So what are some uses of doing parallel processing? Well one use is to perform multiple analyses of the
same SAS data set. You can also use it for creation of multiple data sets at the same time, as well as to
grab data from different data sources to populate a data warehouse.
Multiple Analyses of a Single SAS Data Set
[Figure: the FREQ, TABULATE, and UNIVARIATE procedures running at the same time along an elapsed-time axis that starts at 0.]
So let’s say I have a single data set and wish to do a PROC FREQ, a PROC TABULATE, and a PROC
UNIVARIATE on that data. Using parallel processing, I can have these steps run simultaneously and the
overall elapsed time is only as long as it takes the longest PROC to run.
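One way this scenario could look in code is sketched below. It assumes the data set sits in a permanent library that every session can reach; the library path, the data set name (sales), the variable names (region, amount), and the session names are all hypothetical.

```sas
/* Start one child session per analysis on the same machine. */
signon t1 sascmd="!sascmd";
signon t2 sascmd="!sascmd";
signon t3 sascmd="!sascmd";

/* Each child assigns the shared library read-only and runs one procedure. */
rsubmit t1 wait=no;
   libname shared '/data/shared' access=readonly;
   proc freq data=shared.sales;
      tables region;
   run;
endrsubmit;

rsubmit t2 wait=no;
   libname shared '/data/shared' access=readonly;
   proc tabulate data=shared.sales;
      class region;
      var amount;
      table region, amount*(n mean);
   run;
endrsubmit;

rsubmit t3 wait=no;
   libname shared '/data/shared' access=readonly;
   proc univariate data=shared.sales;
      var amount;
   run;
endrsubmit;

/* Wait for all three analyses, then collect their logs and output. */
waitfor _all_ t1 t2 t3;
rget t1;
rget t2;
rget t3;

signoff t1;
signoff t2;
signoff t3;
```

All three procedures only read the data set, so none of them depends on the others.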
Concurrent Creation of Multiple Summaries
[Figure: Create Summary1, Create Summary2, through Create Summaryn running at the same time along an elapsed-time axis that starts at 0.]
Similarly, if I had to read the same data set and create multiple summaries of the data, this would be a
good use of parallel processing.
Concurrent Data Extraction
[Figure: Extract Oracle Data, Read/Summarize SAS Data Set, and Read/Summarize Raw Data File run at the same time, followed by Merge Data, along an elapsed-time axis that starts at 0.]
And lastly, if I have an Oracle table, a SAS data set, and a raw data file that all need to be read,
manipulated, and loaded into a data warehouse or data mart, I can use parallel processing techniques
to do the reading and manipulating parts at the same time. When the last one completes, an added
instruction merges the data together.
Divide and Conquer
Distribute processing for jobs that consist of many replicate runs
of a fundamental task:
• Monte Carlo methods (such as simulating statistical tests)
• global integer optimization (such as searching for optimal designs)
• mining massive data sets
So with asynchronous execution we can take a “divide and conquer” approach to performing tasks against
our data that may consist of replicate runs, for example, Monte Carlo simulation methods, global integer
optimization, and mining massive data sets.
Divide and Conquer
You might want to divide a program into many smaller sub-units to
take advantage of faster processors and repeatedly send these smaller
sub-units to idle processors until the entire program is complete.
One approach is to divide the program into equal pieces and submit each piece separately to whatever
processor is available. But if I have a fast processor, a medium processor, and a slow processor, the fast
processor finishes quickly and then sits idle while your program waits on the slowest processor.
An alternative approach is to break the program into even smaller chunks and send the first three chunks
to the three idle processors. Then when one processor is finished, send the next piece of code to that
processor. And when the next processor finishes, send the next piece of code in line to that one. And
continue repeating this process until all of the subunits of the program have been processed.
Considerations
When working with multiprocessing applications, consider the following:
• overhead
• load factor
• requirements on the I/O subsystem
• interdependencies of data between steps
• dividing the data
Parallel processing is not a one-size-fits-all solution. There are many factors to take into consideration.
Specifically, when doing parallel processing SAS is actually starting up additional SAS sessions on your
computer, so there is some amount of overhead as those sessions start up. Also, if your machine is already
heavily utilized, it won’t have the extra capacity to help with the parallelism.
Parallel processing often uses more I/O, and you have to think carefully through the
dependencies of your data as it goes from step to step.
You also could choose to divide a large data set into smaller ones to do the processing on and then bring
the data back together.
Dependent Parallelism
[Figure: DATA Step A and PROC SORT A overlapping in time, followed by PROC PRINT A, along an elapsed-time axis that starts at 0.]
Regardless of whether you use sequential processing or parallel processing of independent steps, the data
between steps still needs to be written out to disk. In SAS®9, SAS introduced pipeline parallelism, which
removes the need to write the data to disk between dependent steps. This process allows you to pipe the data
directly from one step to the next through a TCP/IP pipe and allows some steps to work in parallel. Because
you are no longer writing the data to disk, you save both disk space and I/O time. Pipeline
parallelism is part of SAS/CONNECT software.
Dependent parallelism is when different steps that run at the same time work on the same data. In this
example, the sort for data set A starts shortly after the DATA step has started, but before the DATA step
has finished. The two steps share the same data, so this is dependent parallelism. To achieve this, you use
pipeline parallelism techniques. Because the PROC PRINT step has to have all of the sorted data before it
can start, it cannot overlap with the PROC SORT.
This topic is covered in more detail in the Pipeline Parallelism e-lecture.
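As a rough preview only, the piping described above might be sketched as follows. The sketch assumes both sessions run on one machine and uses the SASESOCK engine of SAS/CONNECT; the port name pipe1 is a placeholder that must resolve to a free TCP port at your site, and the session, file, and variable names are hypothetical.

```sas
signon p1 sascmd="!sascmd";
signon p2 sascmd="!sascmd";

/* Writer: the DATA step sends its observations into the pipe instead of to disk. */
rsubmit p1 wait=no;
   libname outlib sasesock ":pipe1";
   data outlib.a;
      infile 'source_a.dat';
      input id amount;
   run;
endrsubmit;

/* Reader: PROC SORT reads from the same pipe while the DATA step is still writing. */
/* PROC PRINT needs the complete sorted data set, so it runs after the sort.        */
rsubmit p2 wait=no;
   libname inlib sasesock ":pipe1";
   proc sort data=inlib.a out=sorted_a;
      by id;
   run;
   proc print data=sorted_a;
   run;
endrsubmit;

waitfor _all_ p1 p2;
rget p1;
rget p2;
signoff p1;
signoff p2;
```

Data set A is never written to disk in this sketch; only the sorted result is stored, in the WORK library of session p2.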
2. Threading
Now that you have an overview of parallel processing, let’s take a look at threading.
Objectives
• Define how threading works.
• Specify tuning options.
• Compare benchmarks.
In this section you will learn what threading is, what tuning options are available, and see some
benchmark statistics comparing threading versus non-threading.
Threading
Starting with SAS®9, some individual steps have the ability to take
advantage of additional processors.
[Figure: three timelines compared. SAS Version 6 runs DATA Step A, PROC SORT A, DATA Step B, and PROC SORT B one after another. SAS Version 8 independent parallelism runs DATA Step A and PROC SORT A alongside DATA Step B and PROC SORT B. SAS®9 threading parallelism, part of Base SAS software, splits PROC SORT A itself across processors.]
http://support.sas.com/rnd/scalability/procs/index.html
Starting with SAS 9, a single step can take advantage of multiple CPUs and run in a threading
environment. This is a type of parallel processing.
Threading takes advantage of multiple CPUs by dividing processing among the available CPUs. Even if
your site does not use an SMP machine, some types of threading can be performed using a single CPU.
So if my PROC SORT on data set A is running on a multiple processor machine, SAS can run part of the
sort on one processor and part on another processor and coordinate the results. This is part of Base SAS
software. Only a select number of PROC steps support this technology. Those steps can be found at the
Web address shown on this slide.
Note: Multi-threading takes advantage of SMPs and provides performance gains for two types of SAS
processes:
• threaded I/O
• threaded applications processing
Threaded I/O aids applications that can process data faster than the data can be delivered to the
application. When an application cannot keep the available CPUs busy, the application is said to be I/O-bound.
Threaded application processing aids applications that receive data faster than they can perform the
necessary processing on that data. These applications are referred to as CPU-bound.
Threaded technology and multiple CPUs alleviate these problems.
Threading is part of Base SAS and is implemented for the following SAS procedures:
• MEANS/SUMMARY
• REPORT
• SORT
• SQL
• TABULATE
• REG
• ROBUSTREG
• GLM
• LOESS
• DMREG
• DMINE
Multi-Threaded Processing
[Figure: PROC SORT DATA=A divided among the processors of an SMP machine, which execute simultaneously.]
So let’s say that I have an SMP machine that has three processors on it, and I submit a PROC SORT
which is one of the threaded procedures. The operating system then can use one CPU to start sorting some
of the data, a second CPU can sort more of the data, and the third CPU can sort the rest of the data.
Eventually all of that data must be combined back together, so one CPU will handle that part. Now, you
as a programmer do not get to control whether this happens. The operating system may have other tasks
that it needs to work on, so one CPU may be dedicated to working on something unrelated to your program.
The threaded procedures just give the operating system the option to split the one PROC step across
multiple CPUs.
Tuning Options for Threaded Procedures
Options to control tuning include:
• CPUCOUNT option – controls number of CPUs to utilize; does not control number of threads
• NOTHREADS option in the PROC statement – prevents a thread-enabled procedure from using multiple threads
Now if you are sharing a machine like a UNIX box with others in your organization, they may not be too
thrilled with you if you start taking over all of the processors. You may need to perform tuning on your
machine to restrict the number of threads available to prevent an individual session from monopolizing
the machine. You can restrict the number of CPUs by using the CPUCOUNT= option.
If you prefer your PROC step to run single-threaded, you can use the NOTHREADS option in that PROC
statement.
The number of CPUs to use for processing in each SAS session can be
controlled with the CPUCOUNT system option.
OPTIONS CPUCOUNT=1-1024 | ACTUAL ;
1-1024 is the number of CPUs that SAS assumes are available for use by thread-enabled applications.
ACTUAL is the number of CPUs that SAS detects are available for a specific session (default).
The syntax for the CPUCOUNT= option is OPTIONS CPUCOUNT= and then a number that ranges from
1 to 1024. So if you specify CPUCOUNT=2, then a maximum of two CPUs are used by that SAS session.
You could also specify CPUCOUNT=ACTUAL, and that means however many CPUs are on that
machine, that’s how many I want. This is the default value.
If you did use a number that was greater than the actual number of CPUs on that machine, then SAS does
not magically create extra processors, but changes the count back to ACTUAL.
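A short sketch of the two tuning options in use is shown here. The data set A and the BY variable id are hypothetical, and the value 2 is only an example.

```sas
/* Tell SAS to assume that two CPUs are available to thread-enabled procedures in this session. */
options cpucount=2;

/* Write the current CPUCOUNT setting to the SAS log. */
proc options option=cpucount;
run;

/* Run this one sort single-threaded, whatever the CPUCOUNT setting is. */
proc sort data=a nothreads;
   by id;
run;

/* Return to the default: the number of CPUs that SAS detects. */
options cpucount=actual;
```

CPUCOUNT= is a session-wide system option, while NOTHREADS applies only to the PROC statement on which it appears.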
How Much Better Is Threading? PROC SORT
So we ran some benchmarks just to see how much better threading was than not threading. We varied our
data using anywhere from 1 million to 10 million observations in a data set, and then ran those data sets
through a PROC SORT with four variables on our BY statement using a four-way UNIX machine. There
really wasn’t much difference with fewer than two million observations. But after that, notice a remarkable
improvement in overall elapsed time when we used threading in SAS 9.1.3 (the bottom line).
When we re-ran the code with the NOTHREADS option (the top line), the results were pretty similar to
what happened when we ran it in SAS 8.2.
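If you want to try a comparison like this on your own machine, a sketch along the following lines could be a starting point. It is not the benchmark program behind the slide: the data set, the variable names, and the five million observations are arbitrary choices, and your timings will depend on your hardware.

```sas
/* FULLSTIMER adds detailed timing and resource statistics for each step to the SAS log. */
options fullstimer;

/* Build a hypothetical test data set with four sort keys. */
data big;
   do i = 1 to 5000000;
      k1 = ceil(100 * ranuni(1));
      k2 = ceil(100 * ranuni(1));
      k3 = ceil(100 * ranuni(1));
      k4 = ceil(100 * ranuni(1));
      output;
   end;
run;

/* Threaded sort. */
proc sort data=big out=sorted_t threads;
   by k1 k2 k3 k4;
run;

/* The same sort, forced to run single-threaded. */
proc sort data=big out=sorted_n nothreads;
   by k1 k2 k3 k4;
run;
```

Compare the real time and CPU time that the log reports for the two sorts. The second sort may benefit from data that the operating system has already cached, so it is worth repeating the run with the two sorts in the opposite order.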
How Much Better Is Threading? PROC SUMMARY
Now we ran that same basic scenario but this time using PROC SUMMARY and four class variables.
Notice how even at only one million observations we see an improvement in overall elapsed time. And
the elapsed time stays remarkably low even with ten million observations, whereas the non-threaded runs in
SAS 9.1.3 and SAS 8.2 are very similar to each other, and their elapsed time rises more steeply as the
number of observations grows.
Threading and Independent Parallelism
[Figure: SAS®9 threading and independent parallelism. SAS/CONNECT software runs DATA Step A and DATA Step B in parallel, and Base SAS software threads PROC SORT A and PROC SORT B across processors.]
Even with threading, of course you can still perform independent parallelism in SAS 9. So I can process
my A data and B data at the same time including any PROC steps that are thread-enabled. However,
because we are performing independent parallelism, this still requires SAS/CONNECT software.
So to recap, threading means that a single PROC step is taking advantage of multiple CPUs on a
machine and is part of Base SAS, while asynchronous execution of different steps is taking advantage of
multiple CPUs on a single machine, or single or multiple CPUs on more than one machine, and is part of
SAS/CONNECT software.
Summary
Traditionally each computer operates as a stand-alone unit without sharing processing power. By using
the capabilities in parallel processing, the processors can work together simultaneously to reduce the
overall elapsed time of your job.
Credits
Introduction to Parallel Processing was developed by M. Michelle
Buchecker. Additional contributions were made by Cheryl Doninger,
Glenn Horton, Merry Rabb, and Chris Riddiough.
That concludes the Introduction to Parallel Processing e-lecture.
Comments?
We would like to hear what you think.
• Do you have any comments about this lecture?
• Did you find the information in this lecture useful?
• What other e-lectures would you like to see SAS develop in the future?
Please e-mail your comments to
[email protected]
If you have any comments on this lecture or other lectures you would like to see, please e-mail
[email protected].
Copyright
SAS and all other SAS Institute Inc. product or service names
are registered trademarks or trademarks of SAS Institute Inc.
in the USA and other countries.
® indicates USA registration. Other brand and product names
are trademarks of their respective companies.
Copyright © 2009 by SAS Institute Inc., Cary, NC 27513, USA.
All rights reserved.
Thank you for your time.