NESUG 18
Ins & Outs
From five hundred spreadsheets to one SAS® data set in 3 easy steps:
Driving IMPORT with data-driven macro variables
Christianna S. Williams, University of North Carolina at Chapel Hill
ABSTRACT
In a clinical trial, we had sleep data on 90 study participants over 30 one-week study periods. These
data were originally stored in Excel® spreadsheets, with separate workbooks for each study period and
a separate worksheet within those workbooks for each participant. Additionally complicating matters,
not all subjects participated in all study periods, so the number and the “identities” of the worksheets
varied among study periods. Thoughtful and systematic naming of workbook files and the worksheets
they contained, along with construction of a “control” SAS® data set indicating which people took part
in which study periods, facilitated the development of a few fairly simple SAS macros to automate the
IMPORTation of more than 500 spreadsheets and the concatenation of these into a single file for
statistical analysis. This paper will present these macros, and demonstrate how the “control” dataset
was used to create macro variables that were passed as parameters to the macro that read the
workbook files. The methods used should be broadly applicable to many situations in research or
business where data need to be combined from many files (be they external files or SAS data sets) to
streamline this process and allow it to be reproducible and data-driven.
BACKGROUND
Many people with Alzheimer’s Disease and other dementias suffer from depression and have disturbed
sleep patterns, with frequent nighttime waking and excessive daytime drowsiness. There is evidence to
suggest that increasing daytime exposure to light can improve mood and sleep patterns in persons with
dementia. However, in many nursing homes light levels are kept quite dim, making it difficult for
nursing home residents to gain exposure to beneficial daytime light. One technique for increasing
exposure to therapeutic light is for individuals to sit in front of a light box. However, nursing home
residents with dementia may not comply with sitting in front of a light box long enough to get
a therapeutic “dose” of high-intensity light. Thus, we conducted a clinical trial in which computer-controlled, high-intensity, low-glare lights were installed in the public areas of two long-term care
facilities, one in North Carolina and one in Oregon. The study had a crossover design so that four
treatment conditions (high light in the morning, high light in the afternoon, high light all day, and
standard lighting) alternated at three-week intervals for 22 intervals (or periods) in the North Carolina
site and 8 periods in the Oregon facility.
During the third week of each study period, study residents wore wrist actigraphs, which are watch-like
devices that contain an accelerometer that records arm movements in one-minute intervals. Analysis
of these counts with specialized software provides estimates of night-time sleep for each of the seven
nights that the actigraph was worn. This scoring is done manually for each resident for each study
period, and the scored results are output to an Excel spreadsheet. Thus, a separate spreadsheet is
constructed for each resident for each study period. Although 90 residents participated over the entire
course of the study, new residents were enrolled and others left the study throughout its duration
because of new admissions, discharges, and other reasons. As a result, there was a different set of
residents for each study period – and thus the sleep data in the Excel files were for a different set of
residents for each period.
In order to conduct statistical analyses of these data (e.g. to test whether nighttime sleep improves with
increased daytime light exposure), I had to read the data from all of these spreadsheets and assemble
them into a single SAS data set, which could then be merged with the treatment codes and other
resident or period-specific information. In all there were about 500 spreadsheets that needed to be
IMPORTed into SAS, while keeping track of which spreadsheet was for which resident for which study
period! The actual SAS code to read each spreadsheet is, of course, quite straightforward. What I
wanted to avoid was having a SAS program that contained 500 PROC IMPORT steps! And – even
more importantly – I wanted to ensure that I had sleep data for each study participant for each of the
study periods in which he/she was enrolled in the study. This issue was particularly critical for quality
control and data validation because of the complicated study design and the fact that each actigraph
file had to be scored and assigned its identifiers individually. This paper details how the required
analysis data set was constructed, in the following three steps:
(1) Write a simple program that will IMPORT a single, specified spreadsheet.
(2) Incorporate that program into a macro, with parameters that will identify the worksheet(s) to
read.
(3) Use a “control” data set to provide values for the parameters needed to generate the calls to
this macro, so that all the necessary worksheets can be read.
THE STARTING POINT
The spreadsheets with the scored sleep data were organized so that there was a separate workbook
(*.xls file) for each study period, and each of these workbooks contained a worksheet for each resident.
Each worksheet had only seven rows of data – one for each night of sleep data for that resident for that
study period. The directory structure for the project dictated that the workbook files were stored in a
different directory for each study period and site. The study periods were assigned sequential letters of
the alphabet (A through V in North Carolina [site 1] and A through H in Oregon [site 2]), and the
directories and workbook files were named accordingly; hence, the sub-directory for period A in the NC
facility was named “PERIOD 1A” and the workbook file was named SLEEP_1A, and the file for period C
in Oregon was SLEEP_2C and was located in the subdirectory “period 2C”. Individual worksheets
were named with the study ID (called RESID in this study) of the resident (e.g. 11501, 11507, etc.). An
example of one of the files is shown in Figure 1.
Figure 1. Portion of Excel workbook (SLEEP_1A.xls) containing scored sleep data for Period A at Site 1. The
worksheet shown is for RESID 11512. All seven rows of the worksheet are shown but only a portion of the
columns. Tabs at the bottom indicate other worksheets for other participants in this study period. Column labels
are identical for all worksheets.
CODE TO READ ONE WORKSHEET
The following is the SAS code to read a single worksheet (Code 1).
PROC IMPORT OUT = WORK.A1_11512
            DATAFILE = "S:\lighting\period 1A\sleep_1A.xls" DBMS=EXCEL2000 REPLACE;
   RANGE = "11512$";
   GETNAMES = yes;
RUN;
DATA A1_11512_r ;
SET A1_11512 (WHERE = (resid NE . AND date NE .));
ATTRIB site LENGTH=3 LABEL='Study Site'
period LENGTH=$1 LABEL='Study Period'
night LENGTH=3 LABEL='Counter for Night' ;
night = _N_ ;
site = 1;
period = 'A' ;
RUN;
Code 1. SAS code to read a single worksheet in a single spreadsheet, corresponding to the sleep
data for RESID 11512, site 1, study period A.
This example utilizes the sleep data for resident ID 11512 in Period A at site 1. The PROC IMPORT
reads the specified range of the specified workbook file and writes a temporary SAS data set named
A1_11512, which derives its variable names from the first row of the worksheet (because GETNAMES
= YES). The resulting data set has seven observations. The subsequent DATA step does a little
clean-up (in case of blank rows, which sometimes occur after the data rows) and adds the site and
period identifiers to the file as well as a counter for the night of data collection.
Now, I could copy and paste this chunk of code 500 times, modifying the identifying parameters
(Resident ID, study site, and study period) each time, but in addition to making a very unwieldy SAS
program, I’d be sure to make a lot of errors.
MACRO TO READ ONE WORKSHEET
So, the first improvement I made was to incorporate this code into a macro to which I needed to pass
three parameters – RESID, SITE and PERIOD, which jointly indicate which spreadsheet is to be read.
The result of one call to the %IMPRT macro, shown below (Code 2), is a data set that is identical to the
one resulting from Code 1.
%MACRO imprt(resid,site=,period=);
* Read the Excel file ;
PROC IMPORT OUT= WORK.&period&site._&resid
DATAFILE = "C:\CSW\NESUG\NESUG05\MyPapers\IMPORT\sleep_&site&period..xls"
DBMS=EXCEL2000 REPLACE;
RANGE="'&resid.$'";
GETNAMES=YES;
RUN;
* Delete empty rows, add identifiers ;
DATA &period&site._&resid._r ;
SET &period&site._&resid (WHERE = (resid NE . AND date NE .));
ATTRIB site LENGTH=3 LABEL='Study Site'
period LENGTH=$1 LABEL='Study Period'
night LENGTH=3 LABEL='Counter for Night'
;
site = %eval(&site) ;
period = "&period" ;
night = _N_ ;
RUN;
* Append to file of sleep data for this period and site;
PROC APPEND BASE = sleep_&site&period DATA=&period&site._&resid._r ;
RUN;
%MEND imprt ;
%imprt(11512,site=1,period=A) ;
%imprt(11515,site=1,period=A) ;
Code 2. SAS macro code to read a single worksheet in a single spreadsheet for each call to the
macro and append the resulting SAS data set to the data set of all sleep data. This code shows two
calls to the macro, one for RESID 11512, site 1, period A and one for RESID 11515, site=1,
period=A.
Within each iteration of the macro, we APPEND the most recently read sleep data to a data set
including all previously read sleep data. So, this program is a little better…now I just have to issue 500
calls to the macro, and I would end up with a data set containing all the scored sleep data. While the
program would be substantially more readable and easier to update than the Code 1 example, it would
still be quite prone to error – I have to know exactly which macro calls to make.
THE KEY TO SIMPLIFICATION: A “CONTROL” DATA SET
At this point, I knew there had to be a better way…a way to automate the specification of the
parameters for the macro calls. And then I had an epiphany: the set of parameters (i.e. all
combinations of RESID, SITE and PERIOD) for which there is sleep data – and corresponding
worksheets – is itself data! So, I should get that information into a data set and use that data set to
direct or control the macro calls. Luckily, our project manager – who was, of course, keeping track of
which residents participated in which study periods and, among those, which had provided valid sleep
data – could provide me with this information. These participation data were incorporated into a SAS
data set, the first 35 observations of which are shown in Figure 2. Because not all enrolled participants
were willing to wear the actigraph, or due to occasional actigraph malfunction, some participants do not
have valid sleep data. These are indicated by SLEEPDATA = 0.
SLEEP_CTRL

period    resid    site    sleepdata
  A       11501      1         1
  A       11507      1         1
  A       11508      1         1
  A       11509      1         1
  A       11510      1         1
  A       11512      1         1
  A       11515      1         1
  A       11524      1         1
  A       11525      1         1
  A       11526      1         0
  A       11527      1         1
  A       11528      1         1
  A       11529      1         1
  A       11530      1         1
  A       11531      1         1
  A       11532      1         1
  B       11501      1         1
  B       11508      1         1
  B       11509      1         1
  B       11510      1         1
  B       11512      1         1
  B       11515      1         1
  B       11525      1         1
  B       11527      1         1
  B       11528      1         1
  B       11529      1         1
  B       11530      1         1
  B       11531      1         1
  B       11534      1         0
  B       11535      1         1
  B       11536      1         1
  B       11537      1         1
  C       11501      1         1
  C       11508      1         1
  C       11509      1         1
Figure 2. PRINT of the first 35 observations of the
“control” data set, which indicates which RESIDs
contributed sleep data to which study periods.
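As a quick check of the control data set itself, a frequency table of site by period for the records with SLEEPDATA = 1 shows how many worksheets to expect in each workbook. This is a sketch, not part of the original program; it assumes the IN libref and SLEEP_CTRL data set shown in Figure 2:

```sas
* Tally residents with valid sleep data by site and period ;
* (sketch; assumes IN.SLEEP_CTRL as shown in Figure 2) ;
PROC FREQ DATA = in.sleep_ctrl ;
   WHERE sleepdata = 1 ;
   TABLES site*period / NOROW NOCOL NOPERCENT ;
RUN;
```

The cell count for site 1, period A should be 15, agreeing with the number of worksheet tabs in SLEEP_1A.xls.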
So, how do we turn DATA step variable values into macro variable values so that they can be passed
as parameters to the %IMPRT macro? The SYMPUT call routine is perfectly suited to this task! I
decided that I needed to count the number of residents in each study period for each site and use a
macro %DO loop to call the macro that many times. I first created a macro, titled %DOPER (for “do
period”), that, when fully developed, would direct the calls of the %IMPRT macro. The first step, a
DATA _NULL_ step (shown in Code 3), contains two invocations of the SYMPUT routine. The first
invocation, denoted by (1), creates a macro variable for each iteration of the DATA step (i.e. for each
observation read by the SET statement). The name of the macro variable will be the concatenation of
period (‘A’, ‘B’, ‘C’…), site (‘1’ or ‘2’ – I converted the numeric variable SITE to the character variable
SITEC to avoid problems with leading blanks), and the automatic SAS DATA step variable _N_. The
value assigned to this macro variable will be the _N_th resident ID that meets the criteria specified by
the WHERE clause on the SET statement (i.e. corresponding to the desired site and period and having
sleep data). When the DATA step reaches the last qualifying observation in the control data set,
another macro variable is created with CALL SYMPUT, denoted by (2) in Code 3. The name of this
macro variable will be the concatenation of “NUM_” with site and period.
%MACRO doper(per,loc) ;
DATA _NULL_ ;
   SET in.sleep_ctrl
       (WHERE = (sleepdata=1 AND site=&loc AND period="&per"))
       END = lastobs;
   * (1) create macro variable for each Resident ID ;
   sitec = PUT(site,1.) ;
   CALL SYMPUT(TRIM(period)||sitec||"_"||LEFT(TRIM(_N_)),resid) ;
   * (2) get number of observations in each block for each site ;
   IF lastobs THEN CALL
      SYMPUT("NUM_"||sitec||"_"||trim(period),LEFT(TRIM(_N_))) ;
RUN;
%PUT _USER_ ;  * (3) ;
%MEND doper ;
%doper(A,1) ;
Code 3. The beginnings of the macro %DOPER, which will eventually be used to specify calls to the
%IMPRT macro. This piece consists of a DATA _NULL_ step to read the “control” data set for a
given site and study period and store the RESIDs with sleep data for that period as macro variables.
This code also creates a macro variable that holds the total number of study participants with sleep
data for that site and period.
If I include the statement “%PUT _USER_;” in the code above, denoted by (3), and call the %DOPER
macro, a listing of the macro variables and values created by calling the macro for period A, site 1 is
written to the SAS log. This list is shown in Figure 3.
DOPER PER A
DOPER LOC 1
DOPER A1_1 11501
DOPER A1_2 11507
DOPER A1_3 11508
DOPER A1_4 11509
DOPER A1_5 11510
DOPER A1_6 11512
DOPER A1_7 11515
DOPER A1_8 11524
DOPER A1_9 11525
DOPER A1_10 11527
DOPER A1_11 11528
DOPER A1_12 11529
DOPER A1_13 11530
DOPER A1_14 11531
DOPER A1_15 11532
DOPER NUM_1_A 15
Figure 3. Portion of the SAS log,
showing the macro variables and
their values created by the call to
%DOPER shown in Code 3.
All of the macro variables are preceded by “DOPER” in the log because they are all local to the
%DOPER macro. The first two macro variables shown are the parameters that were passed to the
macro when it was invoked (&LOC and &PER). The next 15 macro variables are those that were
created by the first invocation of CALL SYMPUT in Code 3, and their values correspond to the 1st
through 15th RESID that had sleep data in PERIOD A for SITE 1. Inspection of the listing of the
SLEEP_CTRL data set (Figure 2) shows that this is the case. Finally, the last macro variable shown in
Figure 3 (NUM_1_A) has the value 15, which is again the total number of RESIDs who should have
Excel worksheets to read for this study period. This information will be put to use in the next
enhancement of the %DOPER macro.
So, now we have all the information we need to direct the IMPORTing of the sleep data for a specified
period and site. Adding just a few lines of code to the %DOPER macro will use this information to
generate the appropriate calls to the %IMPRT macro. The complete macro is shown in Code 4.
%MACRO doper(per,loc) ;
DATA _NULL_ ;
   SET in.sleep_ctrl
       (WHERE = (sleepdata=1 AND site=&loc AND period = "&per"))
       END = lastobs;
   * create macro variable for each Resident ID ;
   sitec = PUT(site,1.) ;
   CALL SYMPUT(TRIM(period)||sitec||"_"||LEFT(TRIM(_N_)),resid) ;
   * get number of observations in each block for each site ;
   IF lastobs THEN CALL
      SYMPUT("NUM_"||sitec||"_"||trim(period),LEFT(TRIM(_N_))) ;
RUN;
* (1) this code will import the data for all residents in this period
  by calling the %imprt macro once for each RESID in the period ;
%DO x = 1 %TO &&num_&loc._&per ;
   %imprt(&&&per&loc._&x,site=&loc,period=&per) ;
%END;
* (2) append this period's data to the study-wide data set ;
PROC APPEND BASE=AllSleep DATA=sleep_&loc&per;
RUN;
%MEND doper ;
* (3) call %DOPER once for each period and site ;
%doper(A,1) ;
%doper(B,1) ;
%doper(C,1) ;
%doper(D,1) ;
%doper(E,1) ;
. . .
%doper(V,1) ;
%doper(A,2) ;
%doper(B,2) ;
. . .
%doper(H,2) ;
Code 4. The updated %DOPER macro, which uses the macro variables created by the CALL
SYMPUT routine to specify the calls to the %IMPRT macro. Once the data for all residents in one
period of the study are read, the resulting data set is APPENDed to the data set with all the sleep
data for the study. The %DOPER macro is called once for each study period and site.
The most significant change to the macro is the addition of the %DO loop, denoted by (1) in Code 4.
These three statements contain a rather intimidating number of ampersands – so let’s take them apart.
The %DO statement creates a macro variable &X, which is incremented by 1 each time the program
passes through the loop. The ugly macro expression “&&num_&loc._&per” tells the loop when to
stop…so what does it resolve to?
In the first pass through &&num_&loc._&per (for the first call to the %DOPER macro where we have
specified site 1 and period A), && resolves to &, &loc resolves to 1, and &per resolves to A. So, we
have &num_1_A for the next pass. The macro variable &num_1_A was created and assigned a value
by the second invocation of CALL SYMPUT in Code 3. It is the number of RESID’s who have sleep
data for period A at site 1, and, as shown in Figure 3, it has a value of 15. So, when the macro
processor gets through with it, that %DO statement setting up the loop reads %DO X = 1 %TO 15. Of
course, the next statement, which calls the %IMPRT macro, specifies what will happen 15 times.
Again, let’s take apart this macro call to figure out what parameters we’re passing to the %IMPRT
macro. The last two parameters are easy: &loc resolves to 1, and &per resolves to A. We know
we want the first parameter to resolve to a RESID that has data for this period, but how is that feat
accomplished? Starting with &&&per&loc._&x – in the first pass, && resolves to &, &per resolves to A,
&loc resolves to 1, and &x (the index for the %DO loop) resolves to 1 at the first iteration of the %DO
loop, 2 the second time, and so on. So, after the macro facility does the first round of resolution of
&&&per&loc._&x, it has translated to &A1_1. Again, &A1_1 is one of the macro variables created by
CALL SYMPUT (see Code 3, (1)): it has the value of the first RESID in PERIOD A for SITE 1, and thus
corresponds to 11501 (see Figures 2 and 3). Hence in the first trip through the %DO loop, the following
call is issued to the %IMPRT macro: %imprt(11501,site=1,period=A). And the appropriate worksheet is read
from the appropriate workbook file. In the second loop, the first macro parameter will get resolved to
&A1_2, which corresponds to the second RESID in PERIOD A for SITE 1, which we can see from the
output in Figure 3 is 11507. So this is the worksheet that gets IMPORTed, and APPENDed to the data
set for this site and period. This looping continues until &X reaches 15, at which point the
worksheet for the 15th RESID gets IMPORTed and APPENDed. Then, when &X is incremented to 16,
the %DO loop doesn’t iterate again. The PROC APPEND step denoted by (2) in Code 4 simply takes
the data set constructed by the concatenation of all the sleep data for period A at site 1 and APPENDs
it to a data set, ALLSLEEP, that will eventually include all the sleep data for all study periods at both
sites. The code denoted by (3) in Code 4 shows that the %DOPER macro would be called once for
each period and site. After the final call has been executed, the ALLSLEEP data set will contain the
data from all 502 spreadsheets. Mission accomplished!
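The two rounds of ampersand resolution described above can be checked in isolation with %LET and %PUT. This is a minimal sketch using made-up assignments in place of the CALL SYMPUT variables:

```sas
%LET loc = 1 ;
%LET per = A ;
%LET num_1_A = 15 ;
%LET A1_1 = 11501 ;
%LET x = 1 ;

* First pass: && -> &, &loc -> 1, &per -> A, yielding &num_1_A ;
* Second pass: &num_1_A -> 15, so the log shows 15 ;
%PUT &&num_&loc._&per ;

* First pass: && -> &, &per -> A, &loc -> 1, &x -> 1, yielding &A1_1 ;
* Second pass: &A1_1 -> 11501, so the log shows 11501 ;
%PUT &&&per&loc._&x ;
```

Running small %PUT experiments like this is an easy way to convince yourself of what a multi-ampersand expression will resolve to before embedding it in a macro.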
VALIDATION
Ok, but what if the information in the control data set (SLEEP_CTRL) doesn’t match up with the
spreadsheets? It’s always a good idea to think about all the things that could go wrong. For example,
what if there is a record in the SLEEP_CTRL data set indicating that there should be a worksheet for
a given site, period, and RESID combination, but the workbook file doesn’t contain a worksheet for
that RESID? Say the following call to the %IMPRT macro is generated:
%imprt(11504,site=1,period=A) ;
This would happen if there were an observation with RESID=11504, SITE=1, PERIOD=’A’ and
SLEEPDATA=1 in the SLEEP_CTRL data set. We see in Figure 1 that there is no worksheet for
11504 in the file SLEEP_1A.xls. Such a macro call does generate several error messages in the log,
as shown in Figure 4. As it turns out, this error does not cause the program to crash – zero
observations get APPENDed to the SLEEP_1A data set, but the processing will continue unscathed
with the next legitimate call to the %IMPRT macro. There is one important exception to this, however,
which is if the faulty macro call is the first one within a site-period combination. This poses a problem
because the SAS dataset for that site-period doesn’t yet exist, and so the first APPEND (where the
BASE data set doesn’t yet exist) creates a dataset (e.g. SLEEP_1A) with 0 observations and 3
variables (SITE, PERIOD and NIGHT). This then causes the program to bomb when the next %IMPRT
call is executed, because APPEND¹ only works when the data set being APPENDed has the same
variables as the existing BASE data set. In this scenario, the faulty BASE data set does not contain all
the sleep variables that are in the data set that SAS is attempting to APPEND. Now, there are
workarounds (e.g. setting up an empty dataset with all the right variables before processing begins for
each site-period). However, I am a staunch believer in READING the SAS log, and paying attention to
all WARNINGs and ERRORs, and generally doing what one can to avoid them. In this case, a
mismatch between the SLEEP_CTRL data set and the spreadsheets would indicate that either the
SLEEP_CTRL data set had an observation that shouldn’t be there or the EXCEL file for that site-period
was missing a sheet. From a project management point of view, one should determine where the
discrepancy lies – and fix it – rather than trying to write a SAS program that will run no matter what. The
same would be true if an entire EXCEL file were missing or misnamed.
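In the same spirit, such mismatches can also be caught programmatically by comparing the control data set against the final analysis data set. This is a sketch, not from the original program; it assumes the ALLSLEEP and IN.SLEEP_CTRL data set names used above:

```sas
* List control records with no corresponding imported sleep data ;
* (sketch; assumes ALLSLEEP and IN.SLEEP_CTRL as above) ;
PROC SQL ;
   CREATE TABLE not_imported AS
   SELECT c.resid, c.site, c.period
   FROM in.sleep_ctrl AS c
   WHERE c.sleepdata = 1
     AND NOT EXISTS ( SELECT 1
                      FROM allsleep AS s
                      WHERE s.resid  = c.resid
                        AND s.site   = c.site
                        AND s.period = c.period ) ;
QUIT ;
```

Any rows in NOT_IMPORTED point to worksheets that were expected but never read, and each one should be investigated rather than papered over.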
ERROR: Describe error: The Microsoft Jet database engine could not find
the object ''11504$''. Make sure the object exists and that you spell its
name and the path name correctly.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
ERROR: File WORK.A1_11504.DATA does not exist.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.A1_11504_R may be incomplete. When this step
was stopped there were 0 observations and 3 variables.
NOTE: Appending WORK.A1_11504_R to WORK.SLEEP_1A.
WARNING: Variable RESID was not found on DATA file.
WARNING: Variable Date was not found on DATA file.
WARNING: Variable Bed_time was not found on DATA file.
WARNING: Variable Get_up_time was not found on DATA file.
WARNING: Variable Time_in_bed was not found on DATA file.
<SNIP>
NOTE: There were 0 observations read from the data set WORK.A1_11504_R.
NOTE: 0 observations added.
NOTE: The data set WORK.SLEEP_1A has 14 observations and 35 variables.
Figure 4. Excerpts from the SAS log showing errors that occur when a macro call is generated for a
non-existent worksheet within an existing workbook file. As long as this is not the first call to the
%IMPRT macro within a site-period combination, the program will proceed without ill effects.
CONCLUSIONS
I had several purposes in mind with this paper – and with the application upon which it is based. The
most obvious is to demonstrate and explain the method I used to greatly simplify and automate the
construction of a single SAS data set from a very large number of EXCEL spreadsheets. This made
my program a lot prettier and less error-prone, and I’ve used an identical method to read and
concatenate external files containing other types of data for other projects. I also believe that several
aspects of the method are broadly applicable to other fields. The key innovation is the use of a
“control” data set that contains the information needed to direct the required processing. This
processing could be the reading of a large number of external files (as in my application), or it could be
a particular type of analysis that needed to be completed using an existing data set for many sets of
parameters, where those sets of parameters could be enumerated in the “control” data set. One of the
appealing aspects of this strategy, of course, is that, provided certain structural elements of the
“control” data set stay consistent, its actual content can change over time, yet the program that is
doing the processing won’t need to change. In this way, we see that the line between “code” and “data”
can be crossed.
Another aim of this paper was to show a bit of how a program evolves: starting with the very simple
program that is the guts of the processing task (the PROC IMPORT code, in this application); then,
given that this task had to be repeated a mind-numbing number of times with changes to only a few
parameters, folding that simple task into a macro; and finally, upon recognizing that this would still
require far too many macro calls, using data to generate those macro calls. This is the approach I
often take with a complex programming task
– I visualize it as peeling an onion in reverse. One starts with the core of what one needs to
accomplish and builds outward from that, testing that it works properly (still tastes like an onion?) at
each step. In fact, I could probably have added another layer to this particular onion – by using the
data to specify not only the RESIDs for each site and period but the names of the sites and periods
themselves. But that seemed like overkill, since for this project, that aspect was not going to change.
Finally, I’d like to convey the message that automating a task in this way – building a good-looking and
functional (dare I say elegant) program – is not only good programming practice (readable,
reproducible, maintainable). It also allows one to keep building one’s skills (by not always taking the
“brute force” approach), and it’s such a blast when it works!
¹ I am ignoring the FORCE option that is available in PROC APPEND (which would allow concatenation of data sets with
different variables or variable attributes) because in this application having different variables on the data sets to be
APPENDed indicates that there is a problem with the input data, and I want it to cause an ERROR.
REFERENCES
For more on SYMPUT see any of the following:
1. Carpenter, Art. 1998. Carpenter’s Complete Guide to the SAS® Macro Language, Cary, NC:
SAS Institute Inc, 242 pp.
2. Burlew, Michele M. 1998. SAS® Macro Programming Made Easy. Cary, NC: SAS Institute Inc,
280 pp.
3. SAS Institute, Inc. 2002. SAS Macro Language: Reference. http://v9doc.sas.com/sasdoc/
ACKNOWLEDGMENTS
I am supremely grateful to my colleague Lauren Cohen for her careful reading of and constructive
commenting on an earlier version of this paper.
SAS is a Registered Trademark of the SAS Institute, Inc. of Cary, North Carolina. Excel is a registered
trademark of the Microsoft Corporation. ® indicates US registration. Other brand and product names
are registered trademarks or trademarks of their respective companies.
CONTACT INFORMATION
Feel free to contact the author with questions or comments:
Christianna S. Williams, PhD
Cecil G. Sheps Center for Health Services Research
University of North Carolina at Chapel Hill
725 Martin Luther King Blvd.
Campus Box # 7590
Chapel Hill, North Carolina 27599
Email: [email protected]