sas - Stanford University

Working with Data in Windows
HRP223 – 2009
Sept 28th, 2009
Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties.
Unauthorized reproduction of this presentation, or any portion of it, may result in
severe civil and criminal penalties and will be prosecuted to the maximum extent possible
under the law.
Sources of Data
• Toy data
– For statistics classes, you may be able to type the
data directly into a SAS code file.
• Excel
– For small amounts of HIPAA-safe data you can use
Excel with validation.
• Text files with columns of numbers and text
– Exports created by databases frequently provide a text
file full of data and a program for loading it into SAS.
• SAS
– Native SAS datasets created by somebody else.
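As a minimal sketch of the "toy data" case above, small datasets can be typed straight into a code file with a DATA step. The dataset name, variable names, and values here are made up for illustration.

```sas
/* Hypothetical toy dataset typed directly into the code file. */
data work.toy;
    input id age sex $;   /* $ marks sex as a character variable */
    datalines;
1 34 M
2 29 F
3 41 F
;
run;

/* List the rows to confirm they were read as intended. */
proc print data=work.toy;
run;
```

This approach only suits a handful of rows; anything larger is better kept in a separate data file.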
Recognize File Types
• Windows adds a period and a suffix that is a
couple letters long to the names of files to
indicate what program uses the file. By default
the suffix is hidden.
Follow these steps to show file
extensions (suffixes) in Vista.
[Screenshot: steps numbered 1 to 5; step 3 is "Uncheck".]
Show File Extensions (Suffixes) in XP
[Screenshot: steps numbered 1 to 5; step 3 is "Uncheck".]
Types of Files
.pdf Adobe portable document format
.zip archives full of compressed data
.xls Excel prior to 2007
.xlsx Excel 2007 and later
.csv comma separated values (text which Excel likes)
.txt text files
.sas SAS code files
.egp Enterprise Guide projects
.sas7bdat SAS data files
.htm or .html web pages
SAS and EG files
• .sas files are text files full of instructions that a
programmer can write and/or edit.
• .egp files are not.
Searching
• Because the contents of .egp files are
incomprehensible (without special tools) you will
have trouble searching for things inside of
projects.
• This affects me when I can’t remember the name
of a project and, to find it, I want to search for key
words in the code (like the principal investigator’s
name or the name of the source data file).
– I cannot find a tool to search the contents of all the
.egp files on my hard drive.
Files in Enterprise Guide
• You can (and should) save SAS code files outside of
the EG project to make it easy to search.
• Most people create EG projects that reference data
files that live outside of EG.
– SAS datasets
– Excel files
– Text files full of data
Shortcuts
• Windows indicates a “shortcut” to a file that lives
elsewhere with an arrow in the bottom left corner
of an icon.
• EG uses the same symbol to denote a shortcut to a
file outside of the project.
EG and Code
• You can write and store your “code”
instructions to SAS inside of the EG project or
you can create a shortcut to the code file
which lives outside of EG.
Right click and choose New > Program
Look at the process flow
No shortcut icon
External SAS files
• You can easily save a code file outside of the
project by choosing Save Program As… from
the File menu or by clicking Save or Save As…
on the program tab (when the code is
open).
Where are SAS Data Sets Stored
• While SAS can refer to files using their Windows
path, it is easier to type a short name instead of a
long path.
• SAS calls the short names “libraries”.
• EG automatically knows about a couple places
where data can be stored.
– It creates a temporary work folder whenever EG
starts.
– It creates a permanent sasuser folder when EG is
installed.
• The locations for data are called libraries.
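A short sketch of how library names show up in code: a dataset is addressed as library.dataset, or by its full Windows path in quotes. The dataset name and the folder below are hypothetical; sashelp.class is a sample dataset that ships with SAS.

```sas
/* Two-level name: library.dataset. WORK is the temporary library. */
data work.heights;
    set sashelp.class;          /* sample data shipped with SAS */
    keep name height;
run;

/* Refer to the dataset by its short library name... */
proc print data=work.heights;
run;

/* ...or by a full Windows path (hypothetical folder and file; */
/* this step only works if that file actually exists).         */
proc print data="C:\Projects\demo\heights.sas7bdat";
run;
```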
Libraries
• By default the data goes into the sasuser
library. This is a very bad idea.
• You will end up with every file in one
folder.
• Anybody using SAS can access that folder.
So there are significant HIPAA issues.
• Right click on a file and pick Properties to
see where it is stored.
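If you prefer code to the Properties dialog, the PATHNAME function reports the folder behind a library. This sketch writes the locations of the two built-in libraries to the log.

```sas
/* Write the physical folder of each library to the SAS log. */
%put WORK is stored in: %sysfunc(pathname(work));
%put SASUSER is stored in: %sysfunc(pathname(sasuser));
```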
Libraries
• You can see the contents of libraries by going
to the Server List window and opening the
local libraries “file drawer.”
If you accidentally close the
window use the View menu.
Double click the
dataset to browse it.
Change the Default File Location
• On every machine you use, you should change
the default file location to the work library. Do
this once per machine.
Click 1st
Click 2x
Other Options
• To save your sanity, make this change to the
options.
Check this box on
Custom Code for Graphics
• Analyses in SAS 9.2 can have extra high
resolution graphics. Permanently turn them
on.
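The kind of statement involved is sketched below, assuming the custom code simply switches ODS Graphics on; the exact text used in the course's option dialog is not shown in these slides. The regression is only there to produce a plot and uses the sashelp.class sample data.

```sas
/* Turn on ODS Graphics so procedures produce their statistical plots. */
ods graphics on;

/* Any procedure that supports ODS Graphics will now draw its plots. */
proc reg data=sashelp.class;
    model weight = height;
run;
quit;
```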
Permanent Store
• I suggest that you save your data into the
temporary work library by default.
• If you have a huge file which you only want to
import once, or if you want to keep a
permanent copy of a SAS data file, you will
want to set up a permanent library.
– This is just a fancy way of specifying what folder
SAS should use to save the .sas7bdat data files.
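A minimal sketch of setting up a permanent library with the LIBNAME statement. The library name hrp223 and the folder are assumptions for illustration; the folder must already exist.

```sas
/* Point the short name hrp223 at a folder (hypothetical path). */
libname hrp223 "C:\Projects\hrp223\data";

/* Saving with that library name writes class_copy.sas7bdat into the folder. */
data hrp223.class_copy;
    set sashelp.class;          /* sample data shipped with SAS */
run;
```

The data file stays in the folder after EG closes; only the short name has to be assigned again in the next session.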
Loading Data The Easy Way
• First fix the problematic registry entries that
are described in the instructions on installing
SAS.
www.stanford.edu/class/hrp223/2009/install/
• If a column in Excel mixes character and numeric
values, programs that read the data (including
SAS) can drop the cells that have character
data without warning.
R
SAS
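For readers who import with code rather than the wizard, the sketch below shows one way to ask SAS to keep mixed columns as text. It assumes the Excel engine (SAS/ACCESS Interface to PC Files) is available; the file path, sheet name, and output dataset name are hypothetical. It does not replace the registry fix described above.

```sas
/* Hypothetical workbook and sheet; adjust to your own file. */
proc import datafile="C:\Projects\hrp223\raw\visits.xls"
            out=work.visits
            dbms=excel
            replace;
    sheet="Sheet1";
    getnames=yes;   /* first row holds the variable names */
    mixed=yes;      /* read mixed numeric/character columns as character */
run;
```

After importing, compare the row and missing-value counts with the spreadsheet to confirm nothing was dropped.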
Importing the Easy Way
• The most bulletproof way for importing with
EG 4.2 is to use the import wizard.
1st rename the node to
match the library name
2nd add a line to the flowchart connecting
the library to the import. It just looks good.
Playing with data
• Once the data is imported you can add code
“nodes” to the flowchart or use the graphical
user interface to tweak the data and do
analyses.
Quick and easy subset and sorting
Complex changes
It gives you more options as
you add in sort variables.
SQL is built behind the scenes.
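To give a sense of the SQL behind a subset-and-sort task, here is a hand-written sketch of roughly equivalent code. It is not the exact code EG generates; the output table name and the filter are made up, and sashelp.class is a sample dataset that ships with SAS.

```sas
/* Subset the rows and sort them in one step. */
proc sql;
    create table work.teens as
    select name, sex, age, height
    from sashelp.class
    where age >= 13             /* the subset */
    order by sex, age;          /* the sort variables */
quit;
```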
Context-sensitive menus help you
describe the data you are browsing.
Descriptive Statistics