Check All that Apply

Check it Out:
A Macro for Neatly Displaying 'Check All that Apply' Responses
Elizabeth A. Roth, RAND, Santa Monica, CA

ABSTRACT
The multiple-response or "Check All that Apply" (CATA)
question type is commonly used in many data collection
settings. Although all variables of the multi-response question
are similar and should be grouped together for frequency
display, it is often easiest to generate separate frequencies for
each variable. This paper introduces a macro that neatly
displays frequencies of all variables in a CATA as a table, and
compares the usefulness of this table to other ways of tabulating
CATA questions.

INTRODUCTION
We commonly encounter data from surveys and other sources
that allow for multiple responses to one question. This type of
question lets people report that they have more than
one kind of pet or that they heard about a product from more
than one source and, most recently, allows us to affiliate
ourselves with more than one racial or ethnic background in the
U.S. decennial Census.

The simplest way in SAS to store the responses to CATA
questions is as a set of independent but related variables. For
data step processing we can put them in arrays so they are
treated similarly, but there is no convenient equivalent in SAS
reporting that shows the interrelatedness of the components of
the CATA question. The cata.sas macro developed in this paper
is one way to neatly display frequencies of CATA questions so
their relationship to each other is apparent.

CHECK ALL THAT APPLY QUESTIONS
If we store the elements of a CATA question as separate
variables, we can generate frequencies of each variable
independently of the others to see the distributions of responses.
This strategy gives us the information we need, but it is
cumbersome in the amount of output generated, and it does not
visually communicate the related nature of the elements of a
CATA question. Furthermore, since variables are defined as
either checked or not checked, most of the output is redundant;
if we know the total number of respondents and the number who
did not check the element, then we automatically know how
many did check it.

The following program reads in the California cities that the ten
respondents in a fictitious survey traveled to in the past 12
months.

data catadata;
   label la         = "Los Angeles"
         sbarbara   = "Santa Barbara"
         sdiego     = "San Diego"
         Sfrancisco = "San Francisco"
         SLObispo   = "San Louis Obispo" ;
   input LA SBarbara SDiego SFrancisco
         SLObispo ;
cards;
[ten data lines, one per respondent, each holding five 0/1 values]
run;

PROC FREQ on these variables generates the following output
(due to space limitations I don't generate percents or cumulative
statistics):

The FREQ Procedure

Los Angeles
la            Frequency
0                     5
1                     5

Santa Barbara
sbarbara      Frequency
0                     7
1                     3

San Diego
sdiego        Frequency
0                     4
1                     6

San Francisco
Sfrancisco    Frequency
0                     7
1                     3

San Louis Obispo
SLObispo      Frequency
0                     4
1                     6

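The PROC FREQ call behind this output is not shown. A minimal sketch of a call that would produce frequency-only tables like these is given here, assuming the catadata dataset from the data step already exists. The NOPERCENT and NOCUM options are an assumption about how the percents and cumulative statistics were suppressed, not something the paper states.

```sas
/* Sketch only: assumes catadata has already been created.            */
/* NOPERCENT drops the percent column; NOCUM drops the cumulative     */
/* frequency and cumulative percent columns of each one-way table.    */
proc freq data=catadata;
   tables LA SBarbara SDiego SFrancisco SLObispo / nopercent nocum;
run;
```

Even with those options, each variable still gets its own two-row table, which is the bulk the CATA macro is meant to remove.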
The CATA Macro
We can use the SAS macro detailed at the end of this paper to
display these questions in a more compact format. The use of
this macro relies on the assumption that the CATA questions
have all been cleaned and all missing data is explained. This
macro displays only the number of respondents who checked each
item, and therefore doesn't deal with missing data.

The macro assumes that the variables are formatted with a value
label of "Yes" for those questions that are checked. Format
associations are made in a macro called "fmtstmt". Variable
labels must also be defined. The name of the dataset containing
the variables must be defined in the macro variable
&dataset. Finally, the macro allows room for a title to be added
to the CATA frequency report.

The macro's strategy is to loop through a list of variables,
generate a frequency for each variable, and write out the
frequencies to a dataset. It uses PROC TRANSPOSE to convert the
variable names to variable labels, concatenates all of the
frequencies for each variable in the list, orders them by
descending frequency, and reports on the list of all variables.

The following program uses the cata macro to display the
frequencies of the checked cities.

%include "cata.sas";
%let dataset=catadata;

proc format ;
   value yesno
      0,2 = "No"
      1   = "Yes" ;
run;

%macro fmtstmt ;
   LA SBarbara SDiego SFrancisco SLObispo
   yesno.;
%mend;

%cata (LA SBarbara SDiego SFrancisco
       SLObispo,
       title=Destination)

This code, with the cata.sas macro, generates the following
output:

Destination          Yes
San Diego              6
San Louis Obispo       6
Los Angeles            5
Santa Barbara          3
San Francisco          3

CONCLUSION
Using the CATA macro detailed in the appendix can
greatly reduce the amount of output needed when
reporting on frequencies of the elements of a
check-all-that-apply-type question. The macro does not report on
missing data, so variables going into the macro must be
cleaned and missing data must be explored beforehand.
The output generated here is useful, concise, and visually
communicates the interrelatedness of the variables.

ACKNOWLEDGEMENTS
I would like to thank Paul Shekelle and Sally Morton of
the Southern California Evidence-based Practice Center
for the support and opportunity to develop, test, and
extensively implement this macro.

CONTACT INFORMATION
Elizabeth A. Roth
RAND
1700 Main Street
Santa Monica, CA 90407
(310) 393-0411
[email protected]

APPENDIX
/* Count the number of words in a string */
%macro words(string);
   %local count word;
   %let count=1;
   %let word=%qscan(&string,&count,%str( ));
   %do %while(&word ne);
      %let count=%eval(&count+1);
      %let word=%qscan(&string,&count,%str( ));
   %end;
   %eval(&count-1)
%mend words;
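
As a quick check of the word-counting macro, the sketch below calls it on the five city variables from the example. It assumes %words has already been compiled, and the macro variable name cities is hypothetical. The second call uses the built-in COUNTW function through %SYSFUNC as a possible alternative; that substitution is an assumption and is not part of the original macro.

```sas
/* Assumes %words (defined above) has already been compiled.          */
%let cities = LA SBarbara SDiego SFrancisco SLObispo;   /* hypothetical name */

/* Count the words with the appendix macro. */
%put Number of variables: %words(&cities);

/* Possible built-in alternative (not used in the original macro):    */
/* COUNTW with a blank as the only delimiter.                         */
%put Number of variables: %sysfunc(countw(&cities, %str( )));
```

Both calls are expected to report five words for this list.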

/* Generate frequencies of CATA questions */
%macro cata(varlist, title="");
   %let nvars = %words(&varlist); ** Generate number of elements in the CATA question ;

   ** Loop over each element ;
   %do tab4varl=1 %to &nvars ;

      ** Retrieve the variable name ;
      %let tab4var = %upcase(%scan(&varlist,&tab4varl));

      ** Generate frequencies on the variable, outputting the results to a dataset ;
      proc freq noprint
         data=&dataset;
         tables &tab4var
            / out=&tab4var (keep=&tab4var count);
         format %fmtstmt ;
      run;

      ** Transpose the dataset to get the variable label and frequencies on one line ;
      proc transpose
         out=&tab4var;
         id &tab4var ;
         var &tab4var count ;
      run;

      ** Concatenate the frequency with previous frequencies and clean up the dataset ;
      data %upcase(%scan(&varlist,1)) (drop=_label_ _name_ i) ;
         attrib variable length=$35;
         /* The first semicolon ends the %IF; the second ends the SET statement */
         set %upcase(%scan(&varlist,1))
             %if &tab4varl > 1 %then _last_ ;
             ;
         if upcase(_name_)=upcase("&tab4var") then do;
            variable=_label_;
            delete;
         end;
         else _label_=variable;
         retain variable ;
         %if &tab4varl=&nvars %then %do;
            array vars{*} _numeric_ ;
            do i=1 to dim(vars) ;
               if vars[i]=. then vars[i]=0 ;
            end;
         %end;
         %else %do; i=1; %end;
      run;
   %end ; /* Doing for each variable in the list */

   ** Sort the dataset, highest number of checked items first ;
   proc sort ;
      by descending yes ;
   run;

   ** Print the final table ;
   proc report nowd headline ;
      column variable yes ;
      define variable / display "&title" ;
      define yes / display ;
   run;
%mend; /* cata */
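
To make the macro's strategy easier to follow, a single pass of its loop can be written out by hand for one variable. This is a sketch, not part of the original macro: it assumes the catadata dataset and the yesno. format from the example program already exist, and the dataset names LA_freq and LA_t are hypothetical.

```sas
/* One pass of the %cata loop, written out for the variable LA.       */
/* Assumes catadata and the yesno. format are already defined.        */
proc freq data=catadata noprint;
   tables LA / out=LA_freq (keep=LA count);
   format LA yesno.;
run;

/* The formatted values of LA (No, Yes) become the new column names;  */
/* the variable label of LA is carried along in _LABEL_.              */
proc transpose data=LA_freq out=LA_t;
   id LA;
   var LA count;
run;

/* Inspect the transposed result. */
proc print data=LA_t;
run;
```

The transposed dataset is expected to hold one row for LA and one for COUNT. The macro's data step keeps the COUNT row, attaches the variable label taken from the other row, and appends the result to the running table before the final sort and report.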