How Paul Bunyan Cleared the Log Jam: Creating a Useful Table from Information within Log Files

Beth Worrall, Kaiser Permanente, Denver, CO

ABSTRACT

Frequently SAS programs are submitted on a regular basis (weekly, monthly, or annually), and it is desirable to compare log files between different time periods. Such comparisons can detect any drastic differences in the data sets routinely created and in the information they contain. To facilitate this comparison, a table can be created which summarizes information contained within logs. This summary may include the names of data sets, the number of variables within each data set, and the number of observations within each data set. To accomplish this, the following statements are used:

   > INFILE with the LENGTH= option
   > INPUT with the $VARYING#. format
   > INDEX functions
   > DO UNTIL loops

This paper will describe the above tactics and examine each of them in an actual example where SAS logs are created on a monthly basis. After reading the log files and creating a data set with the information the log contains, it will be shown how the results can be easily presented.

INTRODUCTION

To more accurately examine information contained in one log file or compare information between two similar log files, the files can be read line by line into a SAS character variable. This is accomplished through the combination of INFILE and INPUT statements. The resulting variable can then be separated into various parts, such as library names, data set names, number of observations, and number of variables. This process is done utilizing the INDEX function and DO UNTIL loops. Finally, the results can be easily presented with PROC PRINT.

This paper describes an example where information within two different log files, each resulting from the same program submitted at the beginning of two separate months, is compared. Both log files are identified as ALLINFO2.LOG in a filename statement, but each file resides in a different directory. The log files are then read separately via a simple macro containing macro variables for the data set being created and the time period from which the log file was generated. The end data set contains information from the log file, such as libraries and data sets created, and number of observations and variables contained within these data sets. The entire code is shown in Appendix 1.

READING A SINGLE LINE OF DATA

The INFILE and INPUT statements together allow lines of text of varying lengths to be read as observations of a character variable. The LENGTH= option on the INFILE statement sets a variable, in this case len, to the length of the line being read. The $VARYING#. format on the INPUT statement then allows lines of varying lengths to be read, where the length of each line must be specified, in this case as len.

   INFILE &logfile LENGTH=len;
   INPUT log_text $VARYING150. len;

In this example, &logfile is read, line by line, where each line can contain up to 150 characters, as specified by the 150 following $VARYING. The variable log_text is created, where each observation is a line of text from the log file.

FINDING THE FIRST STRING: IDENTIFYING SAS LIBRARIES

Now that the lines of text have been read into the data set, the next step in the example is to find the library names. The INDEX function is used to find, within log_text, the string "The data set", which immediately precedes the library name.

Two new variables, start and n, are then created. Start is the position of the cursor at the end of the first desired string. This is 20 because "The data set" is actually preceded by the string "NOTE:". N will aid in the creation of the library name.

   IF INDEX(log_text,'The data set') THEN DO;
      n=0;
      start=20;
   END;

To continue, SAS looks to see if start is greater than 0, which indicates there is a string of interest. If start is greater than 0, a new variable called nextchar is created which contains the first letter of the library name. The name of the library, ds_lib&time, is then set to the value of nextchar.

   IF start>0;
   nextchar=SUBSTR(log_text,start+n,1);
   ds_lib&time=LEFT(nextchar);

READING THE WHOLE LIBRARY NAME: WHEN TO QUIT?

Now that the first letter of the library has been identified, a DO loop is used to append other letters which are part of the library name. With each loop, n is incremented by one, the next letter of the library is set to nextchar, and the library name (ds_lib&time) is appended with nextchar. Notice that, in this step, the TRIM and LEFT functions are used to eliminate extra spaces. N and ds_lib&time are then retained for the next iteration of the loop. The loop ends when nextchar is equal to ".". At that point, the library name is determined to be complete.

   DO UNTIL (nextchar='.');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n,1);
      ds_lib&time=TRIM(ds_lib&time)||LEFT(nextchar);
      RETAIN n ds_lib&time nextchar;
   END;

THE NEXT STRING: IDENTIFYING DATA SET NAMES

The next desired variable is the data set name, ds_name&time. This immediately follows the library name, so its first letter can be identified by incrementing (start+n) by one. For example, if the library name is DEMOCO., the value of start is 20 and n is 6 when the loop stops at the period (the library name, period included, occupies positions 20 through 26), so the next character to be read, at start+n+1, is at position 27. A DO loop allows characters to be read and appended to ds_name&time until a space is reached (nextchar=' '). At this point, the variable ds_name&time is determined to be complete.

   ds_name&time=' ';
   nextchar=SUBSTR(log_text,start+n+1,1);
   ds_name&time=TRIM(ds_name&time)||LEFT(nextchar);
   DO UNTIL (nextchar=' ');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n+1,1);
      ds_name&time=TRIM(ds_name&time)||LEFT(nextchar);
      RETAIN n ds_name&time nextchar;
   END;
   ds_name&time=LEFT(ds_name&time);

IDENTIFYING THE NUMBER OF OBSERVATIONS AND VARIABLES

The next two steps are almost identical to the previous step. The only difference lies in the fact that a new position needs to be determined. The number of observations follows five characters after the end of the data set name (after " has "). Suppose the name of the data set is YREND. Then the correct position is identified by start (still equal to 20), plus n (which has advanced by a further five, the number of characters in YREND), plus five (the number of characters between the last character of the data set name and the first character of the number of observations). A DO loop follows, as before.

   num_obs&time=' ';
   nextchar=SUBSTR(log_text,start+n+5,1);
   num_obs&time=TRIM(num_obs&time)||LEFT(nextchar);
   DO UNTIL (nextchar=' ');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n+5,1);
      num_obs&time=TRIM(num_obs&time)||LEFT(nextchar);
      RETAIN n num_obs&time nextchar;
   END;
   num_obs&time=LEFT(num_obs&time);

The number of variables is identified in a similar manner, shown below:

   num_vars&time=' ';
   nextchar=SUBSTR(log_text,start+n+22,1);
   num_vars&time=TRIM(num_vars&time)||LEFT(nextchar);
   DO UNTIL (nextchar=' ');
      n=n+1;
      nextchar=SUBSTR(log_text,start+n+22,1);
      num_vars&time=TRIM(num_vars&time)||LEFT(nextchar);
      RETAIN n num_vars&time nextchar;
   END;

THE FINAL COMPARISON

After the macro has been called twice, once for each of the two log files, the two resulting data sets are merged for comparison of the information they contain.
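A minimal sketch of this merge is given here, using two small hypothetical data sets in place of the macro's output. The sample rows and the variable same_name are assumptions made for illustration and do not come from the original program. Because MERGE is used without a BY statement, observations are paired by position, so the sketch also flags any row where the data set names from the two periods do not agree.

```sas
/* Hypothetical sample rows standing in for the output of the macro */
DATA previous;
   LENGTH ds_lib1 $10 ds_name1 $20 num_obs1 $7 num_vars1 $3;
   INPUT ds_lib1 $ ds_name1 $ num_obs1 $ num_vars1 $;
   DATALINES;
DEMOCO. YREND 1000 12
DEMOCO. MEMBERS 5000 30
;
RUN;

DATA current;
   LENGTH ds_lib2 $10 ds_name2 $20 num_obs2 $7 num_vars2 $3;
   INPUT ds_lib2 $ ds_name2 $ num_obs2 $ num_vars2 $;
   DATALINES;
DEMOCO. YREND 1100 12
DEMOCO. MEMBERS 5200 31
;
RUN;

/* One-to-one merge: with no BY statement, row 1 pairs with row 1, and so on */
DATA both;
   MERGE previous current;
   /* same_name (hypothetical) is 1 when the paired rows describe the same data set */
   same_name = (ds_name1 = ds_name2);
RUN;

PROC PRINT DATA=both;
RUN;
```

If a data set were created in only one of the two months, the rows would no longer line up by position, and same_name would show 0 for the affected rows.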
A difference between the number of observations for each data set in each time period is computed, along with the difference between the number of variables. The results can then be easily presented using PROC PRINT, where the user specifies which elements are of interest. In this manner, it is easy to examine differences between months in terms of the number of observations in each data set, or to quickly examine how many observations result in different versions of a data set. Part of the output is shown in Appendix 2.

CONCLUSION

This paper has provided an example of how to read specific elements from a log file and put them into a SAS data set. Although the example used illustrates how to find the names of data sets and the associated number of observations and variables, it can also be applied to tables (resulting from SQL code) or to reading an output file for specific elements. Any text file can be read in a similar manner, which often allows for easier examination and presentation of various files.

ACKNOWLEDGEMENTS

Thanks to Mel Widawski for his presentation on a similar subject; it allowed me to create an automated process which, thankfully, does not require me to print log files anymore. Thanks to Nikki Carroll for all her help in revising this paper so it was more understandable.

REFERENCES

Mel Widawski, "Reading an Entire Line of Data of Unknown Length Into a Character Variable," Proceedings of the Annual Western Users of SAS Software Regional Users Group Conference, LA, 1999.

SAS Institute, Inc. (1990), SAS Language: Reference, Version 6, First Edition, Cary, NC: SAS Institute, Inc.

CONTACT INFORMATION

Beth Worrall
Analytic Resource Center
Kaiser Permanente
Denver, CO
Elizabeth.E. [email protected] org

APPENDIX 1

   FILENAME previous 'c:/mydocs/sas/wuss01/wuss01_1/allinfo2.log';
   FILENAME current 'c:/mydocs/sas/wuss01/wuss01_2/allinfo2.log';

   %MACRO chk_str(logfile,time);
   DATA &logfile;
      LENGTH log_text $150 ds_lib&time $10 ds_name&time $20
             nextchar $1 num_obs&time $7 num_vars&time $3;
      INFILE &logfile LENGTH=len;
      INPUT log_text $varying150. len;

      /*** Get name of library and dataset ***/
      IF INDEX(log_text,'The data set') THEN DO;
         n=0;
         start=20;
      END;
      IF start>0;
      nextchar=SUBSTR(log_text,start+n,1);
      ds_lib&time=LEFT(nextchar);
      DO UNTIL (nextchar='.');
         n=n+1;
         nextchar=SUBSTR(log_text,start+n,1);
         ds_lib&time=TRIM(ds_lib&time)||LEFT(nextchar);
         RETAIN n ds_lib&time nextchar;
      END;
      ds_name&time=' ';
      nextchar=SUBSTR(log_text,start+n+1,1);
      ds_name&time=TRIM(ds_name&time)||LEFT(nextchar);
      DO UNTIL (nextchar=' ');
         n=n+1;
         nextchar=SUBSTR(log_text,start+n+1,1);
         ds_name&time=TRIM(ds_name&time)||LEFT(nextchar);
         RETAIN n ds_name&time nextchar;
      END;
      ds_name&time=LEFT(ds_name&time);

      /*** Get # of observations in dataset ***/
      num_obs&time=' ';
      nextchar=SUBSTR(log_text,start+n+5,1);
      num_obs&time=TRIM(num_obs&time)||LEFT(nextchar);
      DO UNTIL (nextchar=' ');
         n=n+1;
         nextchar=SUBSTR(log_text,start+n+5,1);
         num_obs&time=TRIM(num_obs&time)||LEFT(nextchar);
         RETAIN n num_obs&time nextchar;
      END;
      num_obs&time=LEFT(num_obs&time);

      /*** Get number of variables in dataset ***/
      num_vars&time=' ';
      nextchar=SUBSTR(log_text,start+n+22,1);
      num_vars&time=TRIM(num_vars&time)||LEFT(nextchar);
      DO UNTIL (nextchar=' ');
         n=n+1;
         nextchar=SUBSTR(log_text,start+n+22,1);
         num_vars&time=TRIM(num_vars&time)||LEFT(nextchar);
         RETAIN n num_vars&time nextchar;
      END;

      LABEL ds_lib&time="LIBRARY &time"
            ds_name&time="DATASET NAME &time"
            num_vars&time="NUMBER VARS &time"
            num_obs&time="NUMBER OBS &time";
   RUN;
   %MEND chk_str;

   %chk_str(previous,1);
   %chk_str(current,2);

   DATA both;
      MERGE previous current;
      diff_obs=num_obs2-num_obs1;
      diff_var=num_vars2-num_vars1;
   RUN;

   TITLE1 "KAISER PERMANENTE COLORADO REGION";
   TITLE2 "Review of LOG";
   TITLE3 "Timeframe: March to April, 2001";
   PROC PRINT DATA=both LABEL;
      VAR ds_lib1 ds_lib2 ds_name1 ds_name2
          num_obs1 num_obs2 num_vars1 num_vars2
          diff_obs diff_var;
   RUN;
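As a check on the position arithmetic used in the macro, the offsets can be tried on a single sample line. This is a sketch only: the log line is hypothetical (written in the "NOTE: The data set ... has ... observations and ... variables." layout the macro expects), and the variables dot_pos, first_lib, and first_name are introduced here for illustration. The INDEX function is used to locate the period directly, so that n takes the value it would hold when the first DO UNTIL loop stops.

```sas
DATA _NULL_;
   LENGTH log_text $150;
   /* Hypothetical log line in the layout the macro expects */
   log_text = 'NOTE: The data set DEMOCO.YREND has 1000 observations and 12 variables.';

   start   = 20;                      /* position of the first character of the library name */
   dot_pos = INDEX(log_text, '.');    /* position of the period that ends the library name   */
   n       = dot_pos - start;         /* value of n when the library loop stops              */

   first_lib  = SUBSTR(log_text, start, 1);          /* first letter of the library name  */
   first_name = SUBSTR(log_text, start + n + 1, 1);  /* first letter of the data set name */

   /* Write the values to the log for inspection */
   PUT first_lib= dot_pos= n= first_name=;
RUN;
```

With this sample line, the period falls at position 26, so n should come out as 6 and first_name as Y, the character at position 27.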