Open Problem Solutions 2013

Open Problem for SUAVe User Group Meeting, November 26, 2013 (UVic)
Background
The data in a SAS dataset is organized into variables and observations, which equate to columns and rows, respectively. While the
order of rows (or observations) is often relevant to data step processing, the order of the variables in a dataset doesn't
matter. But sometimes the order of the variables matters to people.
Sometimes we want to interactively view a SAS dataset. When there are a large number of variables in the dataset, it
may be awkward to scroll back and forth between variables we're interested in. When you use a data step to create a
dataset, SAS decides on the order of variables, basically adding each variable as soon as its required storage space is
known.
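A minimal sketch of this behaviour follows. The dataset name order_demo and the variable names zeta and alpha are hypothetical, chosen only for illustration. The LENGTH statement makes zeta known to the data step before alpha, so zeta should be listed as variable 1 even though alpha sorts first alphabetically.

```sas
/* Hypothetical example: variable position follows first appearance, not name */
data order_demo;
  length zeta 8;   /* zeta is declared first, so it takes position 1 */
  alpha = 1;       /* alpha is first seen here, so it takes position 2 */
  zeta  = 2;       /* assigning zeta later does not move it */
run;

/* ORDER=VARNUM lists the variables by their position in the dataset */
proc contents data=order_demo order=varnum;
run;
```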
The Problem
For this problem, let's assume you've received a dataset that has a large number of variables in an inconvenient order.
You want to create a new version of the dataset where the variables have been rearranged into alphabetical order.
When interactively viewing the new dataset, finding specific variables will be much easier while scrolling left and right.
There are two levels at which you can attack this problem. You can come up with a solution that works for the problem
dataset only, or a more general solution that will work for any SAS dataset. The former requires a bit of typing, but the
latter may be a much more difficult task! Feel free to tackle either problem.
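As a rough sketch of the first, dataset-specific level, the variable names can simply be typed in the desired order on a RETAIN statement placed before the SET statement. The datasets have and want and the three variable names are hypothetical stand-ins; for the problem dataset the list would hold all 40 names.

```sas
/* Hypothetical three-variable dataset standing in for the real one */
data have;
  zulu  = 1;
  Bravo = 2;
  alpha = 3;
run;

/* Naming the variables before SET fixes their positions in the new dataset */
data want;
  retain alpha Bravo zulu;
  set have;
run;
```

Because RETAIN only names the variables here, their types and lengths still come from the input dataset.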
The code below can be executed to create the sample dataset, named random, which has 40 randomly named variables.
(Because the seed is fixed, everyone should be working with the same data.) Please run that code first, and then write
your solution to make a new copy of the dataset with the variables sorted in alphabetical order.
Please be prepared to present your solution at the November SUAVe meeting. Good luck!
The code to create the random dataset follows.
* Create the open problem dataset, named RANDOM;
%let seed = 456789;   * seed for variable names;
%let seed2 = 9731;    * seed for values in dataset;
data _null_;
  length var_name prev_var_name $ 4;
  call execute('data random;');
  call execute('drop i;');
  * create 50 observations;
  call execute('do i = 1 to 50;');
  * create 40 variables;
  do j = 1 to 40;
    * names are each 4 characters long;
    do k = 1 to 4;
      substr(var_name, k, 1) = byte(64 + ceil(ranuni(&seed) * 26) +
                                    ((ranuni(&seed) > 0.5) * 32));
    end;
    if (j = 1) then
      call execute(var_name || " = round(i ** (ranuni(&seed2) * 2), 0.01);");
    else
      call execute(var_name || ' = round(' || prev_var_name ||
                   ' * 1.5, 0.01);');
    prev_var_name = var_name;
  end;
  call execute('output;');
  call execute('end;');
  call execute('run;');
run;
The Solutions:
1. Antoine Lalumière, Canadian Forest Service
****************************************************************************************;
* Solution to the SUAVe 26NOV2013 Open Problem - Not using the results of PROC CONTENTS
or PROC DATASETS to reorder variables. *;
* Antoine Lalumière, NRCan, Canadian Wood Fibre Centre, Pacific Forestry Centre,
[email protected].
*;
****************************************************************************************;
* Have a preliminary look at the variable formats. The transpose procedure works
differently for numeric and character variables. ;
* Note that the ORDER=VARNUM orders the results by variable number. ;
options ps=32767 ls=64 nodate nonumber nocenter ;
proc contents data=random ORDER=VARNUM; run;
options ls=256;
* Have a look at the original dataset. ;
proc print data=random;
title 'Original random dataset';
run;
* Transpose the entire dataset - Rows become columns, and columns become rows. ;
proc transpose data=random out=transposed;
run;
proc print data=transposed;
title 'Transposed random dataset';
run;
* Sort the transposed dataset by _NAME_ to order the variables (each of which is now on a
row-record) by the original variable name. ;
* Note that the SORTSEQ option is used to disregard case when sorting. ;
proc sort SORTSEQ=LINGUISTIC(STRENGTH=PRIMARY) data=transposed out=trsorted;
by _NAME_;
run;
proc print data=trsorted;
title 'Transposed sorted dataset';
run;
* Transpose the sorted dataset again, turning columns back into rows, and rows back into
columns. ;
proc transpose data=trsorted out=sorted(drop=_NAME_);
run;
proc print data=sorted;
title 'Final sorted dataset';
run;
* Have a last look at the variables to ensure the variable re-ordering went smoothly. ;
* Note that the ORDER=CASECOLLATE option is used to order the results without regard to
case. ;
* If this option is not used, PROC CONTENTS orders upper case results before lower case.;
options ls=64;
proc contents data=sorted
ORDER=CASECOLLATE; run;
* Have fun with the TRANSPOSE procedure! ;
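One caveat with the double-transpose approach: when no VAR statement is given, PROC TRANSPOSE transposes only the numeric variables, so any character variables would be left out of the result. The sketch below is one possible pre-check, not part of the original solution. It counts the character variables in WORK.RANDOM using DICTIONARY.COLUMNS; the macro variable name n_char is an assumption introduced for this example. For the sample dataset, whose variables are all created by numeric assignments, the count should be zero.

```sas
/* Sketch: count the character variables in WORK.RANDOM before transposing.  */
/* n_char is a hypothetical macro variable name introduced for this example. */
proc sql noprint;
  select count(*) into :n_char
    from dictionary.columns
    where libname = 'WORK' and memname = 'RANDOM' and type = 'char';
quit;

/* Re-assigning with %let strips the leading blanks that INTO leaves behind */
%let n_char = &n_char;
%put Character variables in RANDOM: &n_char;
```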
2. Joel Choy, BC Ministry of Health
proc datasets;
  contents data=random order=ignorecase out=columns ;
run;
quit;
data columns_out (keep=name);
set columns;
run;
filename retainf 'H:\temp\retains.txt';
data _null_;
  length retain_literal $6. ;
  retain_literal = 'retain';
  semicolon_literal = ';';
  set columns_out;
  file retainf;
  put @01 retain_literal    $6.
      @08 name              $10.
      @20 semicolon_literal $1.;
run;
data random_out;
%include 'H:\temp\retains.txt';
set random;
run;
3. Peter Ott, BC Ministry of Forests, Lands & NRO
proc contents data=random out=vars(keep=varnum name) noprint;
run;
proc sort data=vars; *this step is not really necessary since proc contents already
sorted in alphabetical order;
by name;
run;
data _null_;
length var_names $ 999.; *this will hopefully be long enough;
set vars end=last;
retain var_names '';
var_names=catx(' ', var_names, name); *removes leading and trailing blanks, inserts a
delimiter, and returns a concatenated character string;
if last then call symputx('ordered', var_names); *stick the finished list into a macro variable;
run;
%put &ordered;
data new;
array allvars{*} &ordered; *array statement overrides original order - could also use
attrib, format, informat, retain or length statements;
set random;
run;
4. Dale Starr, BC Ministry of Health
/*
*----------------------------------------------------------------------------------*
* SUAVe (SAS Users Association of Victoria, eh?) meeting Nov 26, 2013.             *
* Open Problem, extract SAS column names and change column order - Three solutions *
*----------------------------------------------------------------------------------*
*/
/*
Solution 1
PROC CONTENTS Method to create a Macro variable containing the list of column names.
*/
*--------------------------------------------------------------------------------*;
* Use PROC CONTENTS to get column names. The varnum variable is unnecessary.      ;
* Note the default SAS sort order for the output dataset is that the column names ;
* are sorted by UPPERCASE alphabetical then LOWERCASE alphabetical.               ;
*--------------------------------------------------------------------------------*;
PROC CONTENTS DATA = work.RANDOM OUT = var_list01 (KEEP = name varnum) NOPRINT;
RUN;
*-------------------------------------------------------------------------------*;
* This step is unnecessary.                                                      ;
* Sort data back into the original column name order to compare against original ;
* input file.                                                                    ;
*-------------------------------------------------------------------------------*;
PROC SORT DATA = var_list01 OUT = srt_var_lst01;
BY varnum;
RUN;
*-------------------------------------------------------------------------------*;
* Use PROC SQL to write the column names into a Macro variable                   ;
* i.e. INTO :<varname> , note SEPARATED BY " " to be used in a DATA step         ;
*-------------------------------------------------------------------------------*;
PROC SQL NOPRINT;
SELECT name
INTO :macro_var_lst01 SEPARATED BY " "
FROM srt_var_lst01
ORDER BY UPPER(name);
QUIT;
* Check the contents of the Macro variable.;
%PUT &macro_var_lst01;
*----------------------------------------------------------------------------------*;
* Create final data set using a DATA step. Could also be done with PROC SQL        ;
* Use Macro variable in the RETAIN statement to put columns in alphabetical order. ;
*----------------------------------------------------------------------------------*;
DATA Out_file01;
RETAIN &macro_var_lst01;
SET work.Random;
RUN;
/*
END PROC CONTENTS Method to create a Macro variable containing the list of column
names.
*/
/*
Solution 2
DICTIONARY.COLUMNS Method to create a Macro variable containing the list of column
names.;
*/
*-------------------------------------------------------------------------------*;
* Create macro in one step using DICTIONARY.COLUMNS                             *;
* Note SEPARATED BY ", " to be used in a PROC SQL                               *;
*-------------------------------------------------------------------------------*;
PROC SQL NOPRINT;
SELECT name
INTO :macro_var_lst02 SEPARATED BY ", "
FROM dictionary.columns
WHERE memname = 'RANDOM'
AND libname = 'WORK'
ORDER BY UPPER(name);
QUIT;
* Check the contents of the Macro variable.;
%PUT &macro_var_lst02;
*----------------------------------------------------------------------------------*;
* Create final data set using PROC SQL. Could also be done with a data step.       ;
*----------------------------------------------------------------------------------*;
PROC SQL;
CREATE TABLE Out_file02 AS
SELECT &macro_var_lst02
FROM work.RANDOM;
QUIT;
/*
END DICTIONARY.COLUMNS Method to create a Macro variable containing the list of column
names.;
*/
/*
Solution 3
SASHELP.VCOLUMN Method to create a Macro variable containing the list of column
names.;
*/
*--------------------------------------------------------------------------------*;
* Use SAS system data set sashelp.vcolumn in a DATA step to get column names.     ;
* Note the only required variable is NAME.                                        ;
*--------------------------------------------------------------------------------*;
DATA var_list03;
SET sashelp.vcolumn;
WHERE libname = 'WORK' AND memname = 'RANDOM';
KEEP name varnum;
RUN;
*-------------------------------------------------------------------------------*;
* Use PROC SQL to write the column names into a Macro variable.                  ;
* Note I am creating two macro variables, one separated by comma the other by a ;
* space in order to create the final data sets with a DATA step and PROC SQL.    ;
*-------------------------------------------------------------------------------*;
PROC SQL NOPRINT;
SELECT name,
name AS name2
INTO :macro_var_lst03 SEPARATED BY ", ",
:macro_var_lst03b SEPARATED BY " "
FROM var_list03
ORDER BY UPPER(name),
UPPER(name2);
QUIT;
* Check the contents of the Macro variables.;
%PUT &macro_var_lst03;
%PUT &macro_var_lst03b;
*----------------------------------------------------------------------------------*;
* Create final data set. A DATA step or PROC SQL could be used.                   *;
*----------------------------------------------------------------------------------*;
PROC SQL;
CREATE TABLE Out_file03 AS
SELECT &macro_var_lst03
FROM work.RANDOM;
QUIT;
DATA Out_file03b;
RETAIN &macro_var_lst03b;
SET work.Random;
RUN;
* END SASHELP.VCOLUMN Method to create a Macro variable containing the list of column
names.;
5. Aijun Yang, BC Ministry of Health
/* Use PROC SQL to create alphabetical list of variable names from dictionary tables */
/* Define libname if libname is not WORK */
/* Note the values of libname and memname need to be in upper case */
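%macro reorder(libnm, datain, dataout);
  proc sql noprint;
    /* ORDER BY makes the alphabetical order explicit rather than relying on DISTINCT */
    select distinct name into :varlist separated by ' '
      from dictionary.columns
      where libname="&libnm" and memname="&datain"
      order by name;
  quit;
  data &dataout;
    retain &varlist;
    /* read from the requested library, not just WORK */
    set &libnm..&datain;
  run;
%mend;
%reorder(%upcase(work), %upcase(random), reorder_random);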