DSCI 325: Handout 6 ‒ More on Manipulating Data in SAS

Spring 2017
CREATING VARIABLES IN SAS: A WRAP-UP
As you have already seen several times, SAS variables can be created with an assignment
statement in a DATA step. The assignment statement evaluates the expression on the right side
of the equal sign and stores the result in the variable whose name is specified on the left side of
the equal sign. For example, consider the following statements.
DATA employees;
Salary = 40000;
Gender = 'M';
Hire_Date = '29JAN2017'D;
RUN;
PROC PRINT;
FORMAT Hire_Date DATE9.;
RUN;
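The resulting data set contains a single observation: Salary is 40000, Gender is M, and
Hire_Date prints as 29JAN2017 because of the DATE9. format.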
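An assignment statement can also compute a new variable from existing ones. The following is a minimal sketch of that idea; the data set name employees_pay, the variables Monthly_Salary and Years_To_2020, and the arithmetic are hypothetical and are not part of the original example.

```sas
/* Sketch: assignment statements whose right side is an expression. */
/* employees_pay, Monthly_Salary, and Years_To_2020 are hypothetical names. */
DATA employees_pay;
   Salary = 40000;
   Hire_Date = '29JAN2017'D;
   Monthly_Salary = Salary / 12;             /* numeric expression */
   Years_To_2020 = 2020 - YEAR(Hire_Date);   /* expression that calls a function */
RUN;

PROC PRINT DATA=employees_pay;
   FORMAT Hire_Date DATE9.;
RUN;
```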
Review of Arithmetic Operators:
Operator   Definition        Priority
**         Exponentiation    1
-          Negative prefix   1
*          Multiplication    2
/          Division          2
+          Addition          3
-          Subtraction       3
Operations of Priority 1 are performed before operations of Priority 2, etc. Consecutive
operations with the same priority are performed from right to left within priority 1 and from
left to right within priorities 2 and 3. Parentheses can be used to control the order of operations.
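The priority rules above can be checked with a short DATA step that writes its results to the SAS log. This is only an illustrative sketch; the variable names a through e are arbitrary.

```sas
/* Sketch: order of operations in SAS expressions. */
/* DATA _NULL_ runs the step without creating a data set. */
DATA _NULL_;
   a = 2 ** 3 ** 2;    /* priority 1, right to left: 2 ** (3 ** 2) = 512 */
   b = -3 ** 2;        /* ** is applied before the negative prefix: -(3 ** 2) = -9 */
   c = 10 - 4 - 3;     /* priority 3, left to right: (10 - 4) - 3 = 3 */
   d = 2 + 3 * 4;      /* multiplication before addition: 14 */
   e = (2 + 3) * 4;    /* parentheses override the priorities: 20 */
   PUT a= b= c= d= e=; /* writes the values to the log */
RUN;
```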
Review of Date Functions:
SAS date functions can be used to either create SAS date values or extract information from
existing SAS date values. For example, consider the following:
Function            Description
YEAR(SAS-date)      Extracts the year and returns a four-digit value for year.
QTR(SAS-date)       Extracts the quarter and returns a number from 1 to 4.
MONTH(SAS-date)     Extracts the month and returns a number from 1 to 12.
DAY(SAS-date)       Extracts the day of the month and returns a number from 1 to 31.
WEEKDAY(SAS-date)   Extracts the day of the week and returns a number from 1 to 7,
                    where 1 represents Sunday, 2 represents Monday, and so on.
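As a quick illustration, the sketch below applies each of these functions to the hire date used earlier and writes the results to the log. The variable names are hypothetical. MDY, which builds a SAS date value from a month, day, and year, is not listed in the table above and is included only as one example of a function that creates a date.

```sas
/* Sketch: extracting parts of a SAS date value. */
DATA _NULL_;
   Hire_Date = '29JAN2017'D;
   Hire_Year    = YEAR(Hire_Date);      /* 2017 */
   Hire_Qtr     = QTR(Hire_Date);       /* 1 */
   Hire_Month   = MONTH(Hire_Date);     /* 1 */
   Hire_Day     = DAY(Hire_Date);       /* 29 */
   Hire_Weekday = WEEKDAY(Hire_Date);   /* 1, because 29 January 2017 was a Sunday */
   Same_Date    = MDY(1, 29, 2017);     /* creates the same SAS date value as Hire_Date */
   PUT Hire_Year= Hire_Qtr= Hire_Month= Hire_Day= Hire_Weekday=;
   PUT Hire_Date= DATE9. Same_Date= DATE9.;
RUN;
```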
THE DROP AND KEEP STATEMENTS
You can use the DROP statement to specify variables you want to omit from the output data
set(s). Conversely, the KEEP statement specifies the names of the variables you want to be
written to the output data set(s).
DATA employees2;
SET employees;
KEEP Salary Hire_Date;
RUN;
PROC PRINT data=employees2; RUN;
Complete the DROP statement in the code below that would yield the same result for the
employees2 data set:
DATA employees2;
SET employees;
DROP ____________;
RUN;
PROC PRINT data=employees2; RUN;
Alternatives to the DROP and KEEP statements are the DROP= and KEEP= data set options
placed in the DATA statement.
DATA employees2 (KEEP = Salary Hire_Date);
SET employees;
RUN;
DATA employees2 (DROP = Gender);
SET employees;
RUN;
PROC PRINT data=employees2; RUN;
PROC CONTENTS DATA=employees2; RUN;
Note that the DROP= and KEEP= data set options can be used in situations where the DROP
and KEEP statements cannot. In particular, the DROP= and KEEP= data set options can be used
in any PROC step to control which variables are used in the procedure:
DATA employees2;
SET employees;
RUN;
PROC PRINT DATA=employees2 (KEEP = Salary Hire_Date);
RUN;
PROC CONTENTS DATA=employees2;
RUN;
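The same idea works with DROP= in a PROC step. The sketch below rebuilds the small employees data set from the start of the handout so that it can be run on its own. Because the option only limits what the procedure reads, the stored data set still contains all three variables, which PROC CONTENTS confirms.

```sas
/* Sketch: DROP= as a data set option in a PROC step. */
DATA employees;
   Salary = 40000;
   Gender = 'M';
   Hire_Date = '29JAN2017'D;
RUN;

/* Gender is excluded from the printed output only */
PROC PRINT DATA=employees (DROP = Gender);
   FORMAT Hire_Date DATE9.;
RUN;

/* The employees data set itself is unchanged */
PROC CONTENTS DATA=employees;
RUN;
```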
SUBSETTING OBSERVATIONS
We can subset observations in SAS using the WHERE statement, the subsetting IF statement, or
the IF-THEN-DELETE statement.
The WHERE statement
This statement selects only the observations that meet a particular condition. For example, it is
used below to create a new data set (GradeA) that contains only the students who earned an A.
DATA Grades3;
SET Hooks.Grades_missing;
TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;
IF FinalPercent=. THEN Grade='Incomplete';
ELSE IF FinalPercent >= 0.90 THEN Grade='A';
ELSE IF FinalPercent >= 0.80 THEN Grade='B';
ELSE IF FinalPercent >= 0.70 THEN Grade='C';
ELSE IF FinalPercent >= 0.60 THEN Grade='D';
ELSE Grade='F';
RUN;
DATA GradeA;
SET Grades3;
WHERE Grade='A';
RUN;
PROC PRINT Data=GradeA;
VAR FirstName LastName Final FinalPercent Grade;
RUN;
Note that the WHERE statement selects observations before they are brought into the program
data vector, so it can reference only variables that already exist in the input data set. As a result,
the following code would produce an error because the data set Hooks.Grades_missing does not
contain the variable Grade.
DATA GradeA;
SET Hooks.Grades_missing;
TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;
IF FinalPercent=. THEN Grade='Incomplete';
ELSE IF FinalPercent >= 0.90 THEN Grade='A';
ELSE IF FinalPercent >= 0.80 THEN Grade='B';
ELSE IF FinalPercent >= 0.70 THEN Grade='C';
ELSE IF FinalPercent >= 0.60 THEN Grade='D';
ELSE Grade='F';
WHERE Grade='A';
RUN;
The Subsetting IF statement
This statement continues processing only those observations that meet the specified condition.
For example, consider the following.
DATA GradeA;
SET Hooks.Grades_missing;
TotalQuiz = SUM(Quiz1,Quiz2,Quiz3,Quiz4,Quiz5,Quiz6,Quiz7,Quiz8,Quiz9,Quiz10,Quiz11,Quiz12);
TotalExam = SUM(Exam1,Exam2,Exam3);
FinalPercent = (TotalQuiz + TotalExam + EC + Final)/640;
IF FinalPercent=. THEN Grade='Incomplete';
ELSE IF FinalPercent >= 0.90 THEN Grade='A';
ELSE IF FinalPercent >= 0.80 THEN Grade='B';
ELSE IF FinalPercent >= 0.70 THEN Grade='C';
ELSE IF FinalPercent >= 0.60 THEN Grade='D';
ELSE Grade='F';
IF Grade='A';
RUN;
PROC PRINT Data=GradeA;
VAR FirstName LastName Final FinalPercent Grade;
RUN;
Note that, unlike the WHERE statement, the subsetting IF statement is evaluated after an
observation has been brought into the program data vector, so it can reference variables created
earlier in the same DATA step. It determines whether an observation continues to be processed:
when the IF expression is false, SAS stops processing that observation and does not write it to
the output data set.
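The difference between WHERE and the subsetting IF can be seen on a small, self-contained example. The data set scores and its values are made up for illustration and are not part of the course data.

```sas
/* Sketch: WHERE versus subsetting IF on hypothetical data. */
DATA scores;
   INPUT Name $ Exam1 Exam2;
   DATALINES;
Ann 95 91
Bob 72 68
Cal 88 97
;
RUN;

/* WHERE works here because Exam1 already exists in scores */
DATA high_exam1;
   SET scores;
   WHERE Exam1 >= 90;
   Average = (Exam1 + Exam2) / 2;
RUN;

/* Average is created in this step, so a subsetting IF is required */
DATA high_average;
   SET scores;
   Average = (Exam1 + Exam2) / 2;
   IF Average >= 90;
RUN;

PROC PRINT DATA=high_exam1; RUN;    /* Ann only */
PROC PRINT DATA=high_average; RUN;  /* Ann and Cal */
```

Replacing the subsetting IF in the second step with WHERE Average >= 90 would fail for the reason described earlier: Average is not a variable in the input data set.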
The IF-THEN-DELETE statement
This statement can be used as an alternative to the subsetting IF statement. Instead of stating
which observations to keep, it states which observations to remove. For example, consider the
following.
DATA GradeA;
SET Grades3;
IF Grade NE 'A' THEN DELETE;
RUN;
PROC PRINT Data=GradeA;
VAR FirstName LastName Final FinalPercent Grade;
RUN;
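One way to convince yourself that the two approaches give the same result is to build both data sets and compare them. The sketch below assumes a small hypothetical data set, grades_demo, and uses PROC COMPARE to check that the outputs match.

```sas
/* Sketch: subsetting IF and IF-THEN-DELETE select the same observations. */
/* grades_demo and its values are hypothetical. */
DATA grades_demo;
   INPUT Name $ Grade $;
   DATALINES;
Ann A
Bob C
Cal A
;
RUN;

/* Keep the observations where the condition is true */
DATA keep_a;
   SET grades_demo;
   IF Grade = 'A';
RUN;

/* Remove the observations where the opposite condition is true */
DATA delete_not_a;
   SET grades_demo;
   IF Grade NE 'A' THEN DELETE;
RUN;

/* Reports any differences between the two data sets */
PROC COMPARE BASE=keep_a COMPARE=delete_not_a;
RUN;
```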