Summary
HRP223 – 2009
October 28, 2009
Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties.
Unauthorized reproduction of this presentation, or any portion of it, may result in
severe civil and criminal penalties and will be prosecuted to the maximum extent possible
under the law.
It is broken!!!
• Yesterday a student found that right-clicking on nodes in the flowchart brought up the wrong menu and/or the nodes did not respond when she clicked on them. I was able to replicate the problem by using Run Branch starting from data that no longer existed (it had been in the WORK library and I had restarted the project). If this happens to you, try to get a screen shot so we can send it to SAS, and then use the Project Maintenance option on the Tools menu.
Don’t use the same dataset name
• I have not replicated it yet, but I think you can also cause EG to come unglued if you create a dataset with the Query Builder on one process flowchart and then create a different dataset with the same name (different variables but the same name) on a second process flowchart.
You know …
• How to create a table from scratch
• How to import tables
– from external sources like Excel, or using export/import code from databases
• How to create tables
– from a single existing table
• with selected variables
• with recoded variables
• with or without subsets of the records
– from multiple tables
• by adding columns (joins)
• by adding sets of records (set operators)
• With code or the GUI
Create a New Table
• The GUI is the easiest way.
• Look in the optional textbooks for the class to learn the syntax for code.
Callouts from the example program:
• Missing numbers are just a period (.).
• Missing characters are just spaces (not tabs).
• $ means a character string. 10. means 10 characters wide.
• The age variable starts in column 11.
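As a minimal sketch of the layout those callouts describe, here is a data step that builds a table from datalines. The table name work.people and the data values are hypothetical; name is read as a 10-character string from columns 1-10, and age is read starting in column 11.

```sas
/* Hypothetical table built from scratch.                                  */
/* name: character ($), 10 wide, columns 1-10.                             */
/* age:  numeric, starts in column 11, 3 wide.                             */
/* A lone . is a missing number; blank columns are a missing character.   */
/* Columns are counted from the start of each data line.                   */
data work.people;
  input name $10. @11 age 3.;
  datalines;
Alice     34
Bob       .
          51
;
run;

proc print data=work.people;
run;
```

The printout should show Bob with a missing age and the third row with a blank name.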
Importing
• The most bulletproof way to import is to use the import wizard.
• You can also write a program with proc import.
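A minimal proc import sketch for an Excel workbook. The file path, sheet name, and output table are hypothetical. Depending on your SAS release and licensed modules, you may need DBMS=EXCEL instead of DBMS=XLSX.

```sas
/* Hypothetical path, sheet, and output table: change them to match your file. */
proc import datafile="C:\hrp223\data\survey.xlsx"
            out=work.survey
            dbms=xlsx
            replace;
  sheet="Sheet1";   /* which worksheet to read */
  getnames=yes;     /* the first row holds the variable names */
run;
```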
Code
• If you write any code, be sure to load my keyboard macros. Once you have a program node open in a flowchart, you add the macros to both SAS and EG by using the Program menu.
• The import macro gives you the shell of a program to import Excel files.
From a Database
• If you load data that came with an import/export program, you will probably need to add the path to the data file in the infile statement.
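A sketch of the kind of change meant here, assuming a hypothetical comma-separated file visits.txt with a header row and the variables id, visitDate, and weight. Usually the only edit you need is the full path in the infile statement.

```sas
/* Hypothetical program that came with a comma-separated file. The usual fix */
/* is to put the full path to where you saved the file in the INFILE line.   */
data work.visits;
  infile "C:\hrp223\data\visits.txt" dlm="," dsd firstobs=2 truncover;
  input id visitDate :mmddyy10. weight;
  format visitDate mmddyy10.;
run;
```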
Importing Advice
• It is a good idea to import the source into a permanent library.
• After importing, use the Query Builder or a Program node to copy all the variables into a new data set. This node can be tweaked later to fix the problems that you identify.
– If you do not do this, you will have to change the links so that the analyses point to the cleaned/fixed data instead of the imported data.
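In code, the same advice might look like the sketch below. The library name hrp223, its path, and the table names rawSurvey and survey are hypothetical; the import is assumed to have already written the raw data to hrp223.rawSurvey.

```sas
/* Hypothetical permanent library holding the imported raw data. */
libname hrp223 "C:\hrp223\data";

/* Copy every variable into a new table. Cleaning statements get added here */
/* later, and the analyses always point at hrp223.survey, not the raw copy.  */
data hrp223.survey;
  set hrp223.rawSurvey;
run;
```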
Creating New Datasets From 1 Table
• Name the query and new table.
• Drag the entire table or individual variables to the Select Data pane.
• In the Select Data pane, pick variables, then click the properties button.
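In code, a query that keeps only selected variables might look like the sketch below. The table work.people and its variables are hypothetical, and the code the Query Builder generates will look somewhat different.

```sas
/* Hypothetical source table. */
data work.people;
  input id name $ age gender $;
  datalines;
1 Alice 34 F
2 Bob 51 M
;
run;

/* Keep only the id and age variables in a new table. */
proc sql;
  create table work.ageOnly as
    select id, age
    from work.people;
quit;
```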
Changing a Variable
• Computed Columns > New… > Recoded column > pick a variable.
• Notice the other tabs for selecting what to change to a new value.
• SAS allows 28 different missing numeric values: the ordinary . plus the special missing values ._ and .A through .Z.
Bad Ages Recoded to NULL
• If you get data from a program that uses bogus numbers to flag problems in a numeric field, replace those values with different NULL (special missing) values: .A, .B, etc. When you do descriptive statistics, these values will be excluded automatically.
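A minimal sketch of this recode, assuming a hypothetical source where 999 and 888 were used as problem codes for age. The meanings attached to .A and .B are made up for illustration.

```sas
/* Hypothetical raw data where 999 and 888 were used as codes for problems. */
data work.rawAges;
  input id age;
  datalines;
1 34
2 999
3 51
4 888
;
run;

data work.cleanAges;
  set work.rawAges;
  if age = 999 then age = .A;        /* hypothetical meaning: not recorded */
  else if age = 888 then age = .B;   /* hypothetical meaning: refused */
run;

/* .A and .B are missing values, so the mean uses only 34 and 51. */
proc means data=work.cleanAges n nmiss mean;
  var age;
run;
```

PROC MEANS should report N=2, NMISS=2, and a mean of 42.5, while the data set still records which problem code each bad age had.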
Removing/Choosing Records
• Right-click on the variables you want to use for dropping records, or use the Filter Data tab.
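A filter ends up as a where clause. A minimal sketch with a hypothetical table of ages; note that SAS treats a missing number as smaller than any other number, so the missing age is dropped too.

```sas
/* Hypothetical table of ages, including one missing value. */
data work.ages;
  input id age;
  datalines;
1 15
2 34
3 .
4 70
;
run;

/* Keep only the records that pass the filter. */
proc sql;
  create table work.adults as
    select *
    from work.ages
    where age >= 18;
quit;
```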
Advanced Changes and Comparisons
• You can use the Advanced Expression dialog box to do complex tasks like editing and combining text variables.
– catt(), lowcase(), compress(), compbl()
• SAS has built-in Perl-style regular expression processing (the PRX functions), as well as SOUNDEX for phonetic spellings and Levenshtein edit distances (COMPLEV) for measuring dissimilarity between strings.
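A small data step that tries each of these functions on made-up strings. The variable names are hypothetical; the functions are the standard SAS ones named above.

```sas
data work.strings;
  length raw clean squeezed joined $40;
  raw      = "  Tinky    WINKY  ";
  clean    = lowcase(compbl(strip(raw)));  /* trim ends, squeeze blanks, lowercase */
  squeezed = compress(raw);                /* remove every blank */
  joined   = catt("Tinky", "-", "Winky");  /* concatenate after trimming blanks */

  /* PRXMATCH gives the position of the first match, or 0 if none; i = ignore case */
  looksLikeTinky = (prxmatch("/tinky\s*winky/i", raw) > 0);

  /* SOUNDEX gives spellings that sound alike the same code */
  soundsSame = (soundex("Smith") = soundex("Smyth"));

  /* COMPLEV gives the Levenshtein edit distance between two strings */
  editDistance = complev("Tinkywinkey", "Tinky Winkey");
run;
```

Here clean ends up as "tinky winky", squeezed as "TinkyWINKY", and joined as "Tinky-Winky".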
Working with Several Tables
• Joins add columns to a base table.
[Diagram: Table 1 and Table 2 side by side, forming a New Table]
• Set operations add (or subtract) records.
[Diagram: Table 1 stacked on top of Table 2, forming a New Table]
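Both ideas can also be sketched with a data step, as an alternative to the PROC SQL approach these slides use. The tables and variables are hypothetical. MERGE with BY adds columns (both inputs must be sorted by the key), and SET with two tables stacks records.

```sas
/* Two tiny hypothetical tables that share the key variable id. */
data work.table1;
  input id age;
  datalines;
1 34
2 51
;
run;

data work.table2;
  input id weight;
  datalines;
1 150
2 180
;
run;

/* Join (add columns): match rows by id. Inputs must be sorted by id. */
data work.joined;
  merge work.table1 work.table2;
  by id;
run;

/* Set operation (add records): stack table2 under table1. */
/* age is missing on the table2 rows and weight on the table1 rows. */
data work.stacked;
  set work.table1 work.table2;
run;
```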
Commonly Used Joins
[Diagram: Table 1 and Table 2 combined into a New Table by an inner join and by a left join]
• Inner Join: keep only records where you can match IDs in both tables.
• Left Join: keep all records from the left table and the matching records from the right table. Use NULL for the right table's variables in unmatched records.
One to Many Joins
• All of the SQL joins that I have mentioned work with either a 1 to 1 match of key variables across tables or a 1 to many match. But you need to be aware of how many records are in each table.
[Diagrams: one-to-many Inner and Left joins]
• Double check the size of the new table.
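A sketch of a one-to-many left join with a size check, using hypothetical patients (one row each) and visits (several rows per patient) tables.

```sas
/* Hypothetical tables: one row per patient, several visits per patient. */
data work.patients;
  input id sex $;
  datalines;
1 F
2 M
3 F
;
run;

data work.visits;
  input id visit weight;
  datalines;
1 1 150
1 2 152
2 1 180
;
run;

proc sql;
  create table work.patientVisits as
    select p.id, p.sex, v.visit, v.weight
    from work.patients as p left join work.visits as v
      on p.id = v.id;

  /* Double check the size: 2 rows for id 1, 1 for id 2, 1 (no visit) for id 3. */
  select count(*) as nRows from work.patientVisits;
quit;
```

The new table has 4 rows even though the base table has only 3. An inner join would give 3 rows because patient 3 has no visits.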
Cartesian Joins
• If there are duplicate key values in both tables and you do not join on a second variable, SQL will form every combination of the matching records. For each key value, the number of new records is the product of the number of records with that value in each table.
[Diagram: Inner Join on Family]
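A minimal sketch of that blow-up, using hypothetical parents and kids tables that share only the key family.

```sas
/* Hypothetical tables with duplicate key values in both. */
data work.parents;
  input family $ parent $;
  datalines;
Smith Ann
Smith Bob
;
run;

data work.kids;
  input family $ kid $;
  datalines;
Smith Cal
Smith Dee
Smith Eve
;
run;

/* Joining on family alone pairs every parent with every kid: 2 x 3 = 6 rows. */
proc sql;
  create table work.familyJoin as
    select p.family, p.parent, k.kid
    from work.parents as p inner join work.kids as k
      on p.family = k.family;
quit;
```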
PROC SQL - Set Operators
No GUI
• Outer Union Corresponding
– concatenates (keeps all rows, matching columns by name)
• Union
– unique rows from both queries
• Except
– rows in the first query that are not in the second
• Intersect
– rows common to both queries
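A minimal sketch of all four operators on two hypothetical one-column tables, a (ids 1, 2, 3) and b (ids 2, 3, 4).

```sas
data work.a;
  input id;
  datalines;
1
2
3
;
run;

data work.b;
  input id;
  datalines;
2
3
4
;
run;

proc sql;
  /* All 6 rows, duplicates kept. */
  create table work.stacked as
    select * from work.a
    outer union corr
    select * from work.b;

  /* Unique rows from both: ids 1, 2, 3, 4. */
  create table work.unioned as
    select * from work.a
    union
    select * from work.b;

  /* Rows in a that are not in b: id 1. */
  create table work.onlyInA as
    select * from work.a
    except
    select * from work.b;

  /* Rows in both: ids 2 and 3. */
  create table work.inBoth as
    select * from work.a
    intersect
    select * from work.b;
quit;
```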
How does a data step typically work?
• The data statement says make this (or these) data set(s).
1. SAS then reads every line down to the run statement and gathers a list of all variables used.
• This list is called the program data vector (PDV).
2. It then sets all the variables to missing.
3. It then carries out the instructions on each line of the data step program, in the order that the lines are written.
4. Then it writes all the variables out to the new dataset.
5. It then repeats the process if there is more data.
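One way to watch these steps happen is to write the PDV to the log with putlog _all_. This is a minimal sketch with a made-up two-row table. A variable read by set (x here) keeps its value from the previous row until set reads the next row. A variable created in the step (y) is reset to missing at the top of each pass.

```sas
/* Hypothetical two-row source table. */
data work.numbers;
  input x;
  datalines;
10
20
;
run;

/* PUTLOG _ALL_ writes every PDV variable (plus _N_ and _ERROR_) to the log. */
data work.pdvDemo;
  putlog "Top of pass: " _all_;     /* y is missing here on every pass */
  set work.numbers;                 /* load the next row into the PDV */
  y = x * 2;
  putlog "Before output: " _all_;   /* the values about to be written */
run;
```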
How SAS Processes a Dataset(1)
• In the example below, SAS will look in the existing dataset
called Teletubbies and it will find two variables, teletubby
and thing. Then it will find the variable called kid.
• Then it will do the instructions in order.
data Teletubbies2; *name of a new data set;
set Teletubbies; *load 1 observation of data;
kid = "Andrew"; * fill in the blank;
output; *write the variables to teletubbies2;
return; *return to the top of the step;
run; *end of these instructions;
The Set Statement
set Teletubbies;
• This line tells SAS to load one row of data from the data set Teletubbies into the PDV. The first time this line runs, the first row of data is loaded into the PDV; each later time, the next row is loaded.
• When the set statement tries to read past the last row, the data step is done.
Variable Assignment
• In the example, the word Andrew is assigned to the variable kid. Values are always assigned from the expression on the right side into the variable named on the left.
kid = "Andrew";
(Assignment goes from right to left.)
• If a variable appears on both the left and right side of an equal sign, the right side is calculated using the original value, and the result is written to the variable on the left as its new value.
aNumber = aNumber + 4;
(aNumber on the left receives the new value; aNumber on the right supplies the original value.)
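A tiny sketch you can run to see this. The data set name work.assign is hypothetical.

```sas
data work.assign;
  aNumber = 1;
  aNumber = aNumber + 4;   /* the right side uses the original 1; the result 5 replaces it */
  putlog aNumber=;         /* writes aNumber=5 to the log */
run;
```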
How SAS Processes a Dataset(2)
• If you do not include the output and return
statements, SAS will do them automatically. So, the
previous data step would typically be written like
this.
data Teletubbies2;
set Teletubbies;
kid = "Andrew";
run;
How SAS Processes a Dataset(3)
• If, if-else, or select statements are typically used to conditionally assign values in a data step.
– If: one possibility
– If-else: two possibilities
– Select-when-otherwise-end: multiple possibilities
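A sketch of all three forms on a small made-up Teletubbies table. The variables isPo, height, and color are hypothetical.

```sas
data work.teletubbies;
  length teletubby $12;
  input teletubby $ 1-12;
  datalines;
Tinky Winky
Dipsy
Laa-Laa
Po
;
run;

data work.decisions;
  set work.teletubbies;
  length height $5 color $6;

  /* If: one possibility (isPo stays missing for everyone else) */
  if teletubby = "Po" then isPo = 1;

  /* If-else: two possibilities */
  if teletubby = "Tinky Winky" then height = "tall";
  else height = "short";

  /* Select-when-otherwise-end: multiple possibilities */
  select (teletubby);
    when ("Tinky Winky") color = "purple";
    when ("Dipsy")       color = "green";
    when ("Laa-Laa")     color = "yellow";
    when ("Po")          color = "red";
    otherwise            color = "?";
  end;
run;
```

The otherwise clause matters: if no when matches and there is no otherwise, the select statement stops the step with an error.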
Error Trapping
• String comparisons are exact: “Tinkywinkey” is not “Tinky Winkey” … Bad Teletubby.
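One way to trap this kind of error is to compare each value against the list of valid names and flag everything else. This is a minimal sketch with made-up data; the flag badName is hypothetical.

```sas
data work.teletubbies;
  length teletubby $12;
  input teletubby $ 1-12;
  datalines;
Tinky Winky
Tinkywinkey
Po
;
run;

data work.checked;
  set work.teletubbies;
  /* Anything not exactly on the list of valid names gets flagged. */
  if teletubby not in ("Tinky Winky", "Dipsy", "Laa-Laa", "Po") then do;
    putlog "WARNING: unexpected name " teletubby= "on row " _n_=;
    badName = 1;
  end;
run;
```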
Test Your Understanding
data test3a test3b;
set source;
if isMale = 1 then output test3a;
hasCancer = 1;
output test3b;
run;
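To check your answer, you could run the quiz step against a tiny source table and print both outputs. The two rows below are made up. Predict both printouts before running it.

```sas
/* Hypothetical source table: one male, one not. */
data work.source;
  input id isMale;
  datalines;
1 1
2 0
;
run;

/* The quiz step, unchanged. */
data test3a test3b;
  set source;
  if isMale = 1 then output test3a;
  hasCancer = 1;
  output test3b;
run;

proc print data=test3a;
run;

proc print data=test3b;
run;
```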
Common Ground … where
• Both SQL and data step programming use where statements to select which records are included in the new dataset.
• With data steps, the variables used in the where statement must already exist in the source dataset. Use if to check variables created in the data step.
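A minimal sketch of the difference, with a hypothetical table. Writing where ageIn10 < 70 instead of the if would fail, because ageIn10 is not a variable in the source table.

```sas
data work.people;
  input id age;
  datalines;
1 15
2 34
3 70
;
run;

data work.adults;
  set work.people;
  where age >= 18;     /* age already exists in work.people, so where works */
  ageIn10 = age + 10;  /* created in this step */
  if ageIn10 < 70;     /* subsetting if: keep only rows where this is true */
run;
```

Only id 2 survives: id 1 is removed by the where, and id 3 by the if.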
where
• The syntax for where is identical in SQL and data steps.
• Differences vs. if statements: each main point below works in where only; the sub points under it work in either where or if.
– x between y and z
• x >= y and x <= z
• y <= x <= z
– string1 ? string2 or string1 contains string2
• index(string1, string2) > 0
– string1 =* string2
• soundex(string1) = soundex(string2)
– x is null or x is missing
• missing(x)
– string1 like "U%of%A%"
• use regular expressions (the PRX functions)
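A sketch of the same filter written both ways, with a made-up table. The where version uses operators that only work in where; the if version uses the equivalents that work anywhere. The regular expression /^U.*of.*A/ is one way to express like "U%of%A%".

```sas
data work.people;
  infile datalines dsd;
  length school $30;
  input id age school $;
  datalines;
1,34,University of Arizona
2,51,Stanford University
3,.,Univ of Alabama
;
run;

/* Where-only operators */
data work.whereVersion;
  set work.people;
  where age between 30 and 60
        and school contains "of"
        and school like "U%of%A%";
run;

/* The same tests written so they also work with if */
data work.ifVersion;
  set work.people;
  if 30 <= age <= 60
     and index(school, "of") > 0
     and prxmatch("/^U.*of.*A/", school) > 0;
run;
```

Both versions should keep only id 1: id 2 fails the pattern, and id 3 has a missing age.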
where Syntax
• The where statement, like all SAS statements, begins with a keyword (where) and ends in a semicolon.
– where isDead = "false";
– where isDead ne "true";
– where missing(gender);
– where salary > 100000;
– where country in ("USA", "Japan", "UK");
– where country in ("USA" "Japan" "UK");
where Syntax
• Arithmetic
– where salary/12 > 10000;
– where (salary /12) * 1.20 ge 9900;
– where salary + bonus < 120000;
• Logical
– where gender ne "M" and salary >= 50000;
– where gender ne "M" or salary >= 50000;
– where country = "UK" or country = "UTAH";
– where country not in ("USA", "AU");
Make Decisions
• SAS has many operations available to help you make decisions.
– Comparisons: = eq, ~= ne, < lt, > gt, <= le, >= ge, in ( )
– Logic: not, & and, | or, in ( )
• Not requires the expression following it to be false.
• & (and) requires both operands to be true.
• | (or) requires at least one operand to be true.
• In ( ) requires at least one comparison to be true.
– Math operations: + - * / ** (exponentiation)
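Logical Decisions & Compound Expressions
• Common tests and common problems:
where YODeath < YOBirth;
where Sex = "M" and numPreg > 0;
where Sex="M" and numPreg > 0 or ageLMP > 0; *** bad ***;
where Sex="M" and (numPreg > 0 or ageLMP > 0); *** good ***;
– Because and is evaluated before or, the bad version means (Sex="M" and numPreg > 0) or ageLMP > 0, which keeps every record with ageLMP > 0 regardless of sex.
– Moral: Use parentheses generously with ands and ors.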
Where is everywhere
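The illustration for this slide did not survive. One common case is that the same where syntax also works inside procedures, either as a where statement or as a where= data set option. Here is a hedged sketch with a hypothetical table.

```sas
data work.people;
  input id sex $ age;
  datalines;
1 F 34
2 M 51
3 F 70
;
run;

/* where statement inside a procedure: print only people over 40. */
proc print data=work.people;
  where age > 40;
run;

/* where= data set option: average age of the females only. */
proc means data=work.people(where=(sex = "F")) mean;
  var age;
  output out=work.femaleAge mean=meanAge;
run;
```

The mean here should be 52, the average of 34 and 70.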
Numeric Data and Looping
• Say somebody tells you to simulate rolling dice. The formula to do this says:
– generate a random number between 0 and 1
– multiply it by 6
– round up to the next integer
data die;
* the seed 22 picks which stream of numbers between 0 and 1 you get;
aNumber = ranuni(22);
die = ceil(6*aNumber);
* generate a random integer between 1 and 6 in one line;
* the seed only matters on the first ranuni call in a step, so this continues the seed 22 stream;
dieDie = ceil(6*ranuni(78687632));
output; * write to the new dataset;
return; * go to the top and try to read in data;
run;
Doing Stuff Repeatedly
• How to roll two dice:
data dice;
do x = 1 to 2 by 1;
roll= ceil(6*ranuni(78687632));
output;
end;
return; * go to the top and try to read in data;
run;
Craps…
• In the dice game “craps” you throw two dice and the
number you roll determines if you win or lose. How do you
simulate rolling 10 pairs of dice?
data craps ;
do trial = 1 to 10;
do dieNumber = 1 to 2;
roll = ceil(6*ranuni(78687632));
output;
end;
end;
return;
run;
Summing
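The content of this slide did not survive. As a hedged sketch of one kind of summing that fits here, the craps simulation can add up the two dice in each trial. The data set name crapsTotals and the variable total are hypothetical.

```sas
data work.crapsTotals;
  do trial = 1 to 10;
    total = 0;                          /* reset the running sum for each pair */
    do dieNumber = 1 to 2;
      roll = ceil(6*ranuni(78687632));  /* one die: an integer from 1 to 6 */
      total = total + roll;             /* add this die to the pair total */
    end;
    output;                             /* one row per trial with the pair total */
  end;
  drop dieNumber roll;
run;
```

Writing one row per trial, instead of one per die, keeps the total next to the trial number it belongs to.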