HRP 223 - 2008 - Stanford University

HRP223 2008
Topic 4 – Making and Looking at Data
Copyright © 1999-2008 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties.
Unauthorized reproduction of this presentation, or any portion of it, may result in
severe civil and criminal penalties and will be prosecuted to maximum extent possible
under the law.
Toy Data
• While it is of little use in real life, SAS lets you
manually enter data.
• First make a library so the data will be
permanently stored.
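In code, a library is assigned with a LIBNAME statement. A minimal sketch, assuming a hypothetical library name `toy` and a hypothetical folder that already exists on your machine:

```sas
/* "toy" and the folder path are hypothetical; point it at a folder that exists. */
libname toy "C:\Projects\HRP223\toy";
```

Datasets written as `toy.something` land in that folder and are still there the next time you start SAS.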
Toy Data
• Tell it to make the dataset:
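The same kind of hand-typed dataset can be sketched in code with a DATA step and DATALINES. The variable names and counts below are made up for illustration and are not the values from the slides:

```sas
/* Hypothetical toy data: 4 records that stand for 300 people. */
data work.cancer;
   length cancerName $ 10;
   input cancerName $ exposed count;
   datalines;
Lung 1 100
Lung 0 50
Colon 1 60
Colon 0 90
;
run;
```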
If you type a number as the first
value of a character variable, EG
converts the column to numeric.
Right click on the column
headings to change them back if
this inadvertently happens.
Professional programmers code 0 as “no” and 1 as “yes” but create
a format to make reports pretty.
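A minimal sketch of such a format, assuming a hypothetical format name `yesNo`. The stored values stay 0 and 1; only the display changes:

```sas
/* Hypothetical format name yesNo. */
proc format;
   value yesNo 0 = "No"
               1 = "Yes";
run;

data _null_;
   exposed = 1;
   put exposed= yesNo.;   /* writes exposed=Yes to the log */
run;
```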
Open the data, then make it editable by unchecking the
read-only option.
When you come back…
• If you return to the project it will have
forgotten about the formats you applied.
• Add a one-line program to tell it what libraries
(folders) have formats stored in them.
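The one-line program is an OPTIONS statement with FMTSEARCH. A sketch, assuming the formats were saved in a hypothetical library named `toy` (the LIBNAME line is only needed if the library is not already assigned):

```sas
/* Hypothetical library name and path. */
libname toy "C:\Projects\HRP223\toy";

/* Look for formats in the toy library as well as in work. */
options fmtsearch=(toy work);
```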
This little program shows the details on formats in a library.
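One way to write that program is PROC FORMAT with the FMTLIB option. Again `toy` is a hypothetical library name:

```sas
/* Hypothetical library name and path. */
libname toy "C:\Projects\HRP223\toy";

/* List every format stored in the toy library, with its values and labels. */
proc format library=toy fmtlib;
run;
```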
These 4 records really represent 300 people. So if you were to
do a frequency count on the cancer name variable, you would
get the wrong count.
Notice that it uses the labels.
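In code, the fix for summarized records is a WEIGHT statement in PROC FREQ, which corresponds to the Frequency count role in the task. A sketch using made-up data and variable names:

```sas
/* Hypothetical toy data: 4 records that stand for 300 people. */
data work.cancer;
   length cancerName $ 10;
   input cancerName $ exposed count;
   datalines;
Lung 1 100
Lung 0 50
Colon 1 60
Colon 0 90
;
run;

proc freq data=work.cancer;
   tables cancerName;
   weight count;   /* each record counts as "count" people, not as 1 */
run;
```

Without the WEIGHT statement each cancer name would be counted twice (once per record); with it, the counts add up to 300.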
If you find the label “The FREQ
Procedure” annoying, turn it off
in the options Tasks > Tasks
General pane.
This is the same as the code:
ods noproctitle;
You can also set or remove
default titles and footnotes here.
Fix the title also:
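A short sketch of both fixes in code, using the `sashelp.class` sample dataset that ships with SAS. The title text is made up for illustration:

```sas
ods noproctitle;                  /* suppress "The FREQ Procedure" */
title "Students by Sex";          /* hypothetical title text */

proc freq data=sashelp.class;
   tables sex;
run;

title;                            /* clear the title so it does not carry over */
```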
Setting the Order
• There are options to set the order that the results print.
If the options don’t work, make a format.
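In code, the ordering choice is the ORDER= option on the PROC FREQ statement. A sketch using the `sashelp.class` sample dataset:

```sas
/* order=freq lists the most common value first.                          */
/* Other choices are order=data, order=formatted and order=internal.      */
proc freq data=sashelp.class order=freq;
   tables age;
run;
```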
Ordering the Information
• When data is sorted in format order, the first
“letter” of the alphabet is blank. So put a
leading space in the format for the things you
want listed first.
I added a leading blank before the Y.
One format is numeric.
The other format is character.
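A sketch of what the two formats might look like in code. The format names and the stored values ("N"/"Y" for the character variable) are assumptions for illustration:

```sas
proc format;
   /* Numeric format: for a variable stored as 0/1. */
   value yn  0 = "No"
             1 = " Yes";    /* leading blank makes Yes sort before No */

   /* Character format: the name starts with $; for values stored as "N"/"Y". */
   value $yn "N" = "No"
             "Y" = " Yes";
run;
```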
Two Categorical Variables
• You can do similar voodoo with two
categorical variables:
… no idea why Frequency count shows first on the task roles.
Specify What is a Row vs. a Column
• Drag your outcome variable over first.
• Drag the exposure variable over second.
The character variable lists No before Yes.
This will replace values in a character
variable so this is a character format.
Notice the leading space
before the Y.
You could go back and
manually change the format
by clicking on the column
heading in the data set but I
recommend just applying it in
the analysis.
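Applying the format only in the analysis looks roughly like this in code. The data, variable names and format name are made up for illustration. In a TABLES request written as `a * b`, the first variable forms the rows and the second forms the columns:

```sas
proc format;
   value yn 0 = "No"
            1 = " Yes";    /* leading blank lists Yes first */
run;

/* Hypothetical 2 x 2 data: 4 records that stand for 300 people. */
data work.twoByTwo;
   input exposed cancer count;
   datalines;
1 1 40
1 0 60
0 1 20
0 0 180
;
run;

proc freq data=work.twoByTwo order=formatted;
   tables exposed * cancer;       /* rows = exposure, columns = outcome */
   weight count;
   format exposed cancer yn.;     /* applies to this step only; the dataset is unchanged */
run;
```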
Be aware that all the
common statistics are
here so you do not
need to learn the code.
Use the Preview code
button to see if you
have the right options
set.
Summarizing Numeric Data
• Begin with a graphic.
– Remember that you want to show both central
tendency and variability.
– You have already briefly seen the Summary
Statistics and Distribution Analysis menu options
(aka proc means and proc univariate).
• I want you to know how to summarize large
and small datasets.
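For reference, a sketch of the code behind those two menu options, using the `sashelp.class` sample dataset that ships with SAS:

```sas
/* Summary Statistics: central tendency and variability in one table. */
proc means data=sashelp.class n mean std min max;
   var height;
run;

/* Distribution Analysis: detailed statistics plus a histogram. */
proc univariate data=sashelp.class;
   var height;
   histogram height;
run;
```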
Numeric Data
• Say somebody tells you to simulate rolling dice. The formula to do this
says:
– generate a random number between 0 and 1
– multiply it by 6
– round up to the closest integer
data die;
   * the 22 is the seed, which says which list of numbers between 0 and 1 to use;
   aNumber = ranuni(22);
   die = ceil(6*aNumber);
   * generate a random integer between 1 and 6 in one step;
   dieDie = ceil(6*ranuni(78687632));
   output; * write to the new dataset;
   return; * go to the top and try to read in data;
run;
Doing Stuff Repeatedly
• How to roll two dice:
data dice;
   do x = 1 to 2 by 1;
      roll = ceil(6*ranuni(78687632));
      output;
   end;
   return; * go to the top and try to read in data;
run;
Craps…
• In the dice game “craps” you throw two dice and the
number you roll determines if you win or lose. How do you
simulate rolling 10 pairs of dice?
data craps;
   do trial = 1 to 10;
      do dieNumber = 1 to 2;
         roll = ceil(6*ranuni(78687632));
         output;
      end;
   end;
   return;
run;
The Total
• Calculate the sum across
the rolls using Summary
Statistics on the Describe
menu.
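The same total can be sketched in code with PROC MEANS. The output dataset name `totals` and the variable name `total` are assumptions for illustration:

```sas
data craps;
   do trial = 1 to 10;
      do dieNumber = 1 to 2;
         roll = ceil(6*ranuni(78687632));
         output;
      end;
   end;
run;

/* One row per trial holding the sum of the two dice.               */
/* nway keeps only the per-trial rows and drops the grand total row. */
proc means data=craps sum nway;
   class trial;
   var roll;
   output out=totals sum=total;
run;
```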
Total on a Trial
Do the histogram on the summary data.
Crank up the number of simulations. Turn off the histograms for each trial.
Generate a histogram based on the 1000 trials.
I want to fix the way the histogram is binned.
When the code is open, push any key and
it will make a copy of the code which you
can edit.
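A sketch of what the edited code might look like, assuming the goal is one bin per possible total (2 through 12). The dataset and variable names `totals` and `total` are hypothetical:

```sas
data craps;
   do trial = 1 to 1000;
      do dieNumber = 1 to 2;
         roll = ceil(6*ranuni(78687632));
         output;
      end;
   end;
run;

/* noprint: build the per-trial totals without printing 1000 rows of statistics. */
proc means data=craps noprint nway;
   class trial;
   var roll;
   output out=totals sum=total;
run;

/* midpoints= puts one bar on each possible total instead of the default bins. */
proc univariate data=totals;
   var total;
   histogram total / midpoints = 2 to 12 by 1;
run;
```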
Do Loops
• Loops are used whenever you need to repeatedly do something.
Say you wanted to read in 24 lines of data, where the first 6 records
are from 1 treatment, the next 6 are from a 2nd, etc.
More Condensed
• The group could be a counter that goes from 1
to 4.
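A sketch of the counter version with nested loops. The variable names and the 24 values are made up for illustration, and the trailing `@@` is an assumption that lets several values share one input line:

```sas
data treatments;
   do group = 1 to 4;            /* treatment counter */
      do record = 1 to 6;        /* 6 records per treatment */
         input response @@;      /* @@ holds the line so the next value can be read from it */
         output;
      end;
   end;
   datalines;
12 15 11 14 13 16
22 25 21 24 23 26
32 35 31 34 33 36
42 45 41 44 43 46
;
run;
```

The result has 24 rows, with `group` equal to 1 for the first 6 values, 2 for the next 6, and so on.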
How to Summarize
• You can get a boxplot or a histogram with only
6 values but they will not be very informative.
Only a Few
• If you only have a few data points, you should
consider a mean and dot plot. SAS doesn’t
have one built in so I made a macro to do it.
• Macros are self-contained blocks of code that
do complex things.
– A good macro is like a function. You pass it a few
arguments and it returns an answer. You don’t
need to look at how its guts work.
The plotit Macro
• You paste in the macro, beginning with the
%macro line and ending with the %mend line.
• Then you invoke the macro using the name
following the %macro statement:
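The pattern can be sketched with a small stand-in macro. This is not the course's plotit macro; the name `showMean` and its parameters are hypothetical:

```sas
/* Definition: everything from %macro to %mend. */
%macro showMean(data=, var=);
   proc means data=&data n mean std;
      var &var;
   run;
%mend showMean;

/* Invocation: a percent sign, the macro name, then the arguments. */
%showMean(data=sashelp.class, var=height)
```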
Macro Stuff
• Macros can do simple formulas like calculating
an age.
• Or really ugly stuff like validating dates.
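As an illustration of the simple case, here is a sketch of a function-style age macro. The macro name and the formula are one common formulation, not necessarily the one used in class:

```sas
/* Hypothetical macro: expands to an expression, so there is no semicolon inside it. */
/* Counts the months between the dates, subtracts 1 if the day of the month has not  */
/* been reached yet, then converts whole months to whole years.                      */
%macro age(dob, asOf);
   floor((intck('month', &dob, &asOf) - (day(&asOf) < day(&dob))) / 12)
%mend age;

data _null_;
   age = %age('15MAR1970'd, '14MAR2008'd);
   put age=;   /* writes age=37 to the log: the 38th birthday is one day away */
run;
```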
Function Help
• The books for the class have lists of frequently
used functions but you probably want to
bookmark the function help in EG as well as
using SAS OnlineDoc.
Add it to the Favorites page.
• Highlight a word in the right window pane and
then press Ctrl+F to find words.
Dummy records in the HW
• Recall there was a dummy record at the
beginning of the Homework datasets. Why?
– Columns of data in Excel are allowed to take
arbitrary widths. So if you have a “last-name”
column it will import into a database as having the
width of the longest name.
– If you import a second dataset and it has a
different length and you try to append them
together a database will choke.
– You can use a dummy record to make sure the
columns have the same length.
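The length problem can be reproduced in a few lines of code. The sketch below uses made-up datasets and fixes the width with an explicit LENGTH statement, which is a different technique from the dummy record but has the same effect of forcing one common width:

```sas
/* Two hypothetical datasets whose lastName columns have different widths. */
data part1;
   length lastName $ 5;
   lastName = "Smith";
run;

data part2;
   length lastName $ 12;
   lastName = "Featherstone";
run;

/* Without the length statement, part1 would set the width to 5 */
/* and "Featherstone" would be cut to "Feath".                  */
data combined;
   length lastName $ 12;
   set part1 part2;
run;
```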
Combine two datasets