How Paul Bunyan Cleared the Log Jam:
Creating a Useful Table from Information within Log Files
Beth Worrall, Kaiser Permanente, Denver, CO

ABSTRACT
Frequently SAS programs are submitted on a regular basis
(weekly, monthly, or annually), and it is desirable to compare
log files between different time periods. Such comparisons can
detect any drastic differences in the data sets routinely created
and in the information they contain. To facilitate this comparison,
a table can be created which summarizes information contained
within logs. This summary may include the names of data sets,
the number of variables within each data set, and the number of
observations within each data set. To accomplish this, the
following statements are used:

   • INFILE with the LENGTH= option
   • INPUT with the $VARYING#. informat
   • INDEX functions
   • DO UNTIL loops

This paper will describe the above tactics and examine each of
them in an actual example where SAS logs are created on a
monthly basis. After reading the log files and creating a data set
with the information the log contains, it will be shown how the
results can be easily presented.

INTRODUCTION
To more accurately examine information contained in one log
file or compare information between two similar log files, the
files can be read line by line into a SAS character variable. This
is accomplished through the combination of INFILE and INPUT
statements. The resulting variable can then be separated into
various parts, such as library names, data set names, number of
observations, and number of variables. This process is done
utilizing the INDEX function and DO UNTIL loops. Finally,
the results can be easily presented in PROC PRINT.

This paper describes an example where information within two
different log files, each resulting from the same program
submitted at the beginning of two separate months, is compared.
Both log files are identified as ALLINFO2.LOG in a FILENAME
statement, but each file resides in a different directory. The log
files are then read separately via a simple macro containing
macro variables for the data set being created and the time
period from which the log file was generated. The end data set
contains information from the log file, such as libraries and data
sets created, and number of observations and variables contained
within these data sets. The entire code is shown in Appendix 1.

READING A SINGLE LINE OF DATA
The INFILE and INPUT statements together allow lines of text
of varying lengths to be read as observations of a character
variable. The LENGTH= option on the INFILE statement sets a
variable, in this case len, to the length of the line being read.
The $VARYING#. informat on the INPUT statement then allows
lines of varying lengths to be read, where the length of each line
must be specified, in this case as len.

   INFILE &logfile LENGTH=len;
   INPUT log_text $VARYING150. len;

In this example, &logfile is read, line by line, where each line
can contain up to 150 characters, as specified by the 150
following $VARYING. The variable log_text is created, where
each observation is a line of text from the log file.

FINDING THE FIRST STRING:
IDENTIFYING SAS LIBRARIES
Now that the lines of text have been read into the data set, the
next step in the example is to find the library names. The INDEX
function is used to find, within log_text, the string
"The data set", which immediately precedes the library name.

Two new variables, start and n, are then created. Start is the
position of the cursor at the end of the first desired string. This
is 20 because "The data set" is actually preceded by the string
"NOTE:". N will aid in the creation of the library name.

   IF INDEX(log_text,'The data set') THEN DO;
      n=0;
      start=20;
   END;

To continue, SAS looks to see if start is greater than 0, which
indicates there is a string of interest. If start is greater than 0, a
new variable called nextchar is created which contains the first
letter of the library name. The name of the library,
ds_lib&time, is then set to the value of nextchar.

   IF start>0;
   nextchar=SUBSTR(log_text,start+n,1);
   ds_lib&time=LEFT(nextchar);
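
As a minimal sketch of the two steps described so far, the following self-contained program writes three sample log lines to a temporary file, reads them back with the LENGTH= option and $VARYING150., and flags the line that contains 'The data set'. The fileref demo_log, the sample log lines, and the variables line_len and found are assumptions made for illustration and are not part of the program in this paper.

```sas
/* Hypothetical scratch file; the fileref demo_log is an assumption */
FILENAME demo_log TEMP;

/* Write a few sample log lines of different lengths */
DATA _NULL_;
   FILE demo_log;
   PUT 'NOTE: The data set DEMOCO.YREND has 100 observations and 5 variables.';
   PUT 'NOTE: DATA statement used:';
   PUT '      real time           0.05 seconds';
RUN;

/* Read each line whole, whatever its length, into log_text */
DATA flagged;
   LENGTH log_text $150;
   INFILE demo_log LENGTH=len;
   INPUT log_text $VARYING150. len;
   line_len = len;   /* len itself is not written to the data set, so keep a copy */
   found = (INDEX(log_text,'The data set') > 0);   /* 1 if the string is present */
RUN;

PROC PRINT DATA=flagged;
RUN;
```

INDEX returns the position of the first occurrence of the string, or 0 when it is absent, which is why a nonzero result can be used directly as a condition in an IF statement.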

READING THE WHOLE LIBRARY NAME:
WHEN TO QUIT?
Now that the first letter of the library has been identified, a DO
loop is used to append other letters which are part of the library
name. With each loop, n is incremented by one, the next letter
of the library is set to nextchar, and the library name
(ds_lib&time) is appended with nextchar. Notice that, in this
step, the TRIM and LEFT functions are used to eliminate extra
spaces. N and ds_lib&time are then retained for the next
iteration of the loop. The loop ends when nextchar is equal to
".". At that point, the library name is determined to be complete.

   DO UNTIL (nextchar='.');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n,1);
      ds_lib&time=TRIM(ds_lib&time)
                  ||LEFT(nextchar);
      RETAIN n ds_lib&time nextchar;
   END;

THE NEXT STRING:
IDENTIFYING DATA SET NAMES
The next desired variable is the data set name, ds_name&time.
This immediately follows the library name, so its first letter can
be identified by incrementing (start+n) by one. For example, if
the library name is DEMOCO., the value of start is 20 and n is
equal to the number of characters in the library name (in this case
seven), so the next character to be read is at position 28. A DO
loop allows characters to be read and appended to
ds_name&time until a space is reached (nextchar=' '). At this
point, the variable ds_name&time is determined to be
complete.

   ds_name&time=' ';
   nextchar=SUBSTR(log_text,start+n+1,1);
   ds_name&time=TRIM(ds_name&time)
                ||LEFT(nextchar);
   DO UNTIL (nextchar=' ');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n+1,1);
      ds_name&time=TRIM(ds_name&time)
                   ||LEFT(nextchar);
      RETAIN n ds_name&time nextchar;
   END;
   ds_name&time=LEFT(ds_name&time);

IDENTIFYING THE NUMBER OF
OBSERVATIONS AND VARIABLES
The next two steps are almost identical to the previous step. The
only difference lies in the fact that a new position needs to be
determined. The number of observations follows five characters
after the end of the data set name (after " has "). Suppose the
name of the data set is YREND. Then the correct position is
identified by start (now equal to 20), plus n (now equal to five,
the number of characters in YREND), plus five (the number of
characters between the last character of the data set name and
the first character of the number of observations). A DO loop
follows, as before.

   num_obs&time=' ';
   nextchar=SUBSTR(log_text,start+n+5,1);
   num_obs&time=TRIM(num_obs&time)
                ||LEFT(nextchar);
   DO UNTIL (nextchar=' ');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n+5,1);
      num_obs&time=TRIM(num_obs&time)
                   ||LEFT(nextchar);
      RETAIN n num_obs&time nextchar;
   END;
   num_obs&time=LEFT(num_obs&time);

The number of variables is identified in a similar manner, shown
below:

   num_vars&time=' ';
   nextchar=SUBSTR(log_text,start+n+22,1);
   num_vars&time=TRIM(num_vars&time)
                 ||LEFT(nextchar);
   DO UNTIL (nextchar=' ');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n+22,1);
      num_vars&time=TRIM(num_vars&time)
                    ||LEFT(nextchar);
      RETAIN n num_vars&time nextchar;
   END;

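
For comparison only, and not the approach used in this paper, the same fields can be pulled out word by word with the SCAN function instead of character-by-character loops. This is a minimal sketch that assumes the log note has the standard form "NOTE: The data set LIB.NAME has N observations and M variables."; the sample line and the variable full_name are assumptions made for illustration. Unlike the loop in this paper, which keeps the trailing period in the library name, the sketch returns the library name without it.

```sas
DATA _NULL_;
   LENGTH log_text $150 full_name $41 ds_lib $10 ds_name $20;
   /* Sample log line, assumed for illustration */
   log_text = 'NOTE: The data set DEMOCO.YREND has 100 observations and 5 variables.';

   IF INDEX(log_text,'The data set') THEN DO;
      /* Blank-delimited words: NOTE:(1) The(2) data(3) set(4) LIB.NAME(5) */
      /* has(6) N(7) observations(8) and(9) M(10) variables.(11)           */
      full_name = SCAN(log_text, 5, ' ');
      ds_lib    = SCAN(full_name, 1, '.');
      ds_name   = SCAN(full_name, 2, '.');
      num_obs   = INPUT(SCAN(log_text, 7, ' '), 8.);
      num_vars  = INPUT(SCAN(log_text, 10, ' '), 8.);
      PUT ds_lib= ds_name= num_obs= num_vars=;
   END;
RUN;
```

The word positions hold only while the note keeps this exact wording, so the sketch is no more robust to a changed log format than the fixed offsets of 5 and 22 used in the loops.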
THE FINAL COMPARISON
After the macro has been called twice, once for each of the two
log files, the two resulting data sets are merged for comparison
of the information they contain. A difference between the
number of observations for each data set in each time period is
computed, along with the difference between the number of
variables. The results can then be easily presented using PROC
PRINT, where the user specifies which elements are of interest.
In this manner, it is easy to examine differences between months
in terms of the number of observations in each data set, or to
quickly examine how many observations result in different
versions of a data set. Part of the output is shown in Appendix
2.
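
The merge and difference step can be sketched on its own as follows. The two small input data sets stand in for the output of the macro, and their values are invented for illustration. The sketch converts the character counts with the INPUT function before subtracting; subtracting the character variables directly also works, because SAS converts them automatically and writes a note to the log.

```sas
/* Hypothetical stand-ins for the data sets the macro creates */
DATA previous;
   LENGTH ds_name1 $20 num_obs1 $7 num_vars1 $3;
   INPUT ds_name1 $ num_obs1 $ num_vars1 $;
   DATALINES;
YREND 100 5
QTR1 250 12
;

DATA current;
   LENGTH ds_name2 $20 num_obs2 $7 num_vars2 $3;
   INPUT ds_name2 $ num_obs2 $ num_vars2 $;
   DATALINES;
YREND 120 5
QTR1 250 13
;

DATA both;
   MERGE previous current;   /* one-to-one merge with no BY statement */
   diff_obs = INPUT(num_obs2, 8.) - INPUT(num_obs1, 8.);
   diff_var = INPUT(num_vars2, 8.) - INPUT(num_vars1, 8.);
RUN;

PROC PRINT DATA=both LABEL;
   VAR ds_name1 ds_name2 num_obs1 num_obs2 diff_obs
       num_vars1 num_vars2 diff_var;
RUN;
```

Because the merge has no BY statement, rows are paired by position. The comparison is therefore meaningful only when both logs list the same data sets in the same order, which is why printing ds_name1 next to ds_name2 is a useful check.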

CONCLUSION
This paper has provided an example of how to read specific
elements from a log file and put them into a SAS data set.
Although the example used illustrates how to find the names of
data sets and associated number of observations and variables, it
can also be applied to tables (resulting from SQL code) or
reading an output file for specific elements. Any text file can
be read in a similar manner which, many times, allows for easier
examination and presentation of various files.

ACKNOWLEDGEMENTS
Thanks to Mel Widawski for his presentation on a similar
subject. It allowed me to create an automated process which,
thankfully, does not require me to print log files anymore.
Thanks to Nikki Carroll for all her help in revising this paper so
it was more understandable.

REFERENCES
Mel Widawski,
"Reading an Entire Line of Data of Unknown Length Into a
Character Variable"
Proceedings of the 7th Annual Western Users of SAS Software
Regional Users Group Conference, LA, 1999.

SAS Institute, Inc. (1990), SAS Language: Reference, Version 6,
First Edition, Cary, NC: SAS Institute, Inc.

CONTACT INFORMATION
Beth Worrall
Analytic Resource Center
Kaiser Permanente
Denver, CO
Elizabeth.E. [email protected] org

APPENDIX 1
FILENAME previous
   'c:/mydocs/sas/wuss01/wuss01_1/allinfo2.log';
FILENAME current
   'c:/mydocs/sas/wuss01/wuss01_2/allinfo2.log';

%MACRO chk_str(logfile,time);
DATA &logfile;
   LENGTH log_text $150 ds_lib&time $10
          ds_name&time $20 nextchar $1
          num_obs&time $7 num_vars&time $3;
   INFILE &logfile LENGTH=len;
   INPUT log_text $varying150. len;

   /*** Get name of library and dataset ***/
   IF INDEX(log_text,'The data set') THEN DO;
      n=0;
      start=20;
   END;
   IF start>0;
   nextchar=SUBSTR(log_text,start+n,1);
   ds_lib&time=LEFT(nextchar);
   DO UNTIL (nextchar='.');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n,1);
      ds_lib&time=TRIM(ds_lib&time)
                  ||LEFT(nextchar);
      RETAIN n ds_lib&time nextchar;
   END;
   ds_name&time=' ';
   nextchar=SUBSTR(log_text,start+n+1,1);
   ds_name&time=TRIM(ds_name&time)
                ||LEFT(nextchar);
   DO UNTIL (nextchar=' ');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n+1,1);
      ds_name&time=TRIM(ds_name&time)
                   ||LEFT(nextchar);
      RETAIN n ds_name&time nextchar;
   END;
   ds_name&time=LEFT(ds_name&time);

   /*** Get # of observations in dataset ***/
   num_obs&time=' ';
   nextchar=SUBSTR(log_text,start+n+5,1);
   num_obs&time=TRIM(num_obs&time)
                ||LEFT(nextchar);
   DO UNTIL (nextchar=' ');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n+5,1);
      num_obs&time=TRIM(num_obs&time)
                   ||LEFT(nextchar);
      RETAIN n num_obs&time nextchar;
   END;
   num_obs&time=LEFT(num_obs&time);

   /*** Get number of variables in dataset ***/
   num_vars&time=' ';
   nextchar=SUBSTR(log_text,start+n+22,1);
   num_vars&time=TRIM(num_vars&time)
                 ||LEFT(nextchar);
   DO UNTIL (nextchar=' ');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n+22,1);
      num_vars&time=TRIM(num_vars&time)
                    ||LEFT(nextchar);
      RETAIN n num_vars&time nextchar;
   END;

   LABEL ds_lib&time="LIBRARY &time"
         ds_name&time="DATASET NAME &time"
         num_vars&time="NUMBER VARS &time"
         num_obs&time="NUMBER OBS &time";
RUN;
%MEND chk_str;

%chk_str(previous,1);
%chk_str(current,2);

DATA both;
   MERGE previous current;
   diff_obs=num_obs2-num_obs1;
   diff_var=num_vars2-num_vars1;
RUN;

TITLE1 "KAISER PERMANENTE COLORADO REGION";
TITLE2 "Review of LOG";
TITLE3 "Timeframe: March to April, 2001";
PROC PRINT DATA=both LABEL;
   VAR ds_lib1 ds_lib2 ds_name1 ds_name2
       num_obs1 num_obs2 num_vars1 num_vars2
       diff_obs diff_var;
RUN;