Paper TT14

Consistency Check: QC Across Outputs for Inconsistencies

John Morrill, Quintiles, Inc., Kansas City, MO
David J. Austin, PRA International, Charlottesville, VA

ABSTRACT

Executing a SAS program produces a log containing many notes, warnings, and error messages regarding the program. There is no such log regarding the fitness of the output itself. For example, no SAS warning is issued when a set of outputs contains variant usages of the same term. This paper describes a naïve but effective approach to detecting such inconsistencies across outputs. All words (actually, any set of characters separated by spaces) in a document are reported together, sorted and with duplicates eliminated. This report brings out inconsistencies by showing similar elements adjacent to each other. The costs and benefits of reducing the volume of this report with several methods are discussed. Example input and output, along with all SAS code, are provided. The simplicity of this approach makes this paper appropriate for beginners, while its usefulness makes it worthy of consideration for all levels of SAS® users.

INTRODUCTION

This paper is organized into five sections. The first section describes the problem and introduces an example pharmaceutical input along with our approach to detecting inconsistencies, a SAS program we call Con_check (for consistency check). The second section describes a Con_check report on the example input. The third presents three methods of improving the usefulness of this report, discussing the benefits and costs of each. The fourth discusses limitations of Con_check and how to get around some of them. The fifth wraps up several miscellaneous issues. The appendices provide all example input, output, and code discussed in this paper.

1. THE PROBLEM AND THE CON_CHECK APPROACH

Consider a set of tables and listings summarizing a clinical study. A traditional spell-checking program could be used to identify aberrant spelling, and a visual check is often used to identify other problems. There are at least two shortcomings to these approaches. First, since both the spell-checker and the visual check deal with words in their context, overall consistency is not an objective. Second, both methods check the same word multiple times, and a traditional spell-checker examines one word at a time, flagging anything not found in its dictionary, so the flagged output may be unnecessarily voluminous. Con_check attempts to mitigate these shortcomings. Con_check removes words from their original context and compares them with similar occurrences, increasing the chance of finding inconsistent usages. Also, Con_check presents only one occurrence of a given word, eliminating time-consuming redundancy and minimizing flagged output volume.

Here is how Con_check works. First, note the example pharmaceutical output included in Appendix 1: this is a compilation of tables designed to fit on a single page, contrived with dozens of errors and inconsistencies for illustration purposes. This SAS output becomes input to Con_check; it is converted to a text file and read into a data set consisting of a single observation for each “word” (used here to mean a collection of characters not containing the space character). Duplicate observations are removed, and the resulting Con_check report is found in Appendix 2. This report is sorted so that words starting with the same characters (both alphabetic and non-alphabetic) are grouped together.
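A minimal sketch of this word-splitting step follows; the full program, with the flagging and reporting described later, appears in Appendix 3. The text-file name here is hypothetical, and PROC SORT with NODUPKEY stands in for Appendix 3's counting step, which keeps a frequency for each word instead of simply discarding duplicates.

data words(keep=w);
  infile "outputs.txt" missover length=reclen;  * hypothetical name of the converted text file *;
  length line w $800.;
  input line $varying800. reclen;
  line = tranwrd(line, '09'x, ' ');             * convert tabs to spaces (see Section 5) *;
  cnt = 1;
  do until (w = ' ');                           * walk the space-delimited "words" *;
    w = scan(line, cnt, ' ');
    if w ne ' ' then output;                    * one observation per word *;
    cnt + 1;
  end;
run;

proc sort data=words nodupkey;                  * sort; keep one occurrence of each word *;
  by w;
run;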
Visually scanning the resulting report provides the opportunity to detect inconsistent usages of words in the original pharmaceutical input.

2. USING THE CON_CHECK REPORT TO DETECT INCONSISTENCIES

Appendix 2 is the Con_check output using all words from Appendix 1 as input. Appendix 2 is divided into page 1 (the shorter words) and page 2 (the longer words). Page 1 has 6 panels, created with the PROC REPORT PANELS option. Each panel contains three variables. The variable Z is a code described at length in the third section below; an understanding of Z is not necessary for Con_check output to be useful. The variable W contains “words.” Words as used in this paper include regular words but also any collection of characters not containing a space. The variable C is a frequency count of each word. Page 2 of Appendix 2 contains the same variables as page 1 as well as W_L, the length of the word. The tables below demonstrate how to use the Con_check report to detect inconsistencies.

Selected potential inconsistencies in the first panel:

   &dose                     This may be an unresolved macro variable, missed in the SAS log.
   13 (
    2 (%)                    Probably not cause for concern.
    1 (%)by                  Likely a missing space.
   (0.0$)  (0.0%)            Probably the $ should be replaced with a %.
   (Beats/minutes)           Should this be (Beats/minute) [without the plural minutes]?
   (C02)  (CO2)              Incorrect characters: zero in place of the capital letter O.
   (cp/ug)  (cp/µg)          Incorrect characters: u in place of the Greek letter µ.
   (IU/mL)  (IU/ml)          Again, incorrect characters: lower case l in place of the capital letter L.
   (IU/mL)-                  Also, should a space precede the hyphen?
   (mm  (mmHg)               Spacing issues.
   (N=  (N=38)               Again, spacing issues. Familiarity with the output or looking at a few of
                             these instances in context will reveal if these are intended.
   )  ,  .  /                May be worth investigating as irregular.

Selected inconsistencies involving numbers:

   .)  0.0.                  Perhaps this properly ends a sentence but maybe it is a programming error.
   02                        As the only number like this with a preceding zero, this bears investigating.
   02NOV2005  03NOV2005      Are all dates to be on the same day?
   2004                      Do we expect to see dates from only the current year?
   88mg                      Spacing issues.
   1 14.8.1
   2 14.8.3                  A case where the frequency is important in identifying a potential problem can
                             be seen here. These are the table numbers appearing at the top of each output
                             in Appendix 1. Since we have two instances of 14.8.3, we may want to
                             investigate whether this is due to properly having multiple pages of the same
                             output or due to having mistakenly placed 14.8.3 where 14.8.2 is intended.
                             Here, the latter is the case. This illustrates that Con_check is valuable for
                             what is not shown (essentially a frequency of zero for the missing 14.8.2) as
                             well as what is shown.
   1 0.1178
   1 0.4148
   3 0.8894                  Frequency is also important here. Upon investigation, all three cases of
                             0.8894 turn out to be p-values. This makes one suspicious that the models
                             producing these p-values may improperly be mis-specified as identical.

In general, Con_check’s output should be considered merely as candidates for further action – checking these candidates in context will reveal whether further action is merited. Clearly, not all words will be cause for concern. In fact, a completely consistent output will have no words that are cause for concern. The high incidence of questionable words in this paper is due to the contrived, error-laden Appendix 1.
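The frequency C is produced, as in Appendix 3, by a PROC SORT followed by a DATA step that counts within BY groups (Section 5 notes why PROC FREQ is avoided). A minimal sketch, applied to the one-observation-per-word data set from the earlier sketch, before duplicates are removed:

proc sort data=words;
  by w;
run;

data counts;
  set words;
  by w;
  retain c;
  if first.w then c = 1;   * start the count for each distinct word *;
  else c + 1;              * increment on each repeated occurrence *;
  if last.w then output;   * one observation per distinct word, with its frequency *;
run;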
(Though this example is contrived for illustration purposes, most of the errors placed in Appendix 1 were detected in actual use of early versions of Con_check.) In fact, the sparse rate of errors in a typical output is motivation for finding ways to eliminate words that are not a problem. These ways are explored in Section 3 below. Also, while a feature of Con_check is to remove words from their context in order to identify inconsistent usages, it is important to check the context to determine which usage is preferred. Here are some more cases.

Selected inconsistencies in the second and third panels:

   <.0001
   <0.0001                   The first is produced with the SAS format pvalue6.4. Should these two be made
                             consistent?
   8 AUC
   1 AUC-
   1 AUCas                   Most instances have a space following AUC. One, however, has a hyphen
                             immediately following and another seemingly has a space missing.

Selected inconsistencies in the last three panels:

   3 IL-10
   1 IL=10                   Should the one instance with the equal sign be hyphenated instead?
   1 MANCOVA
   1 MANOVA.                 Should one of these be changed to the other or are the two different models
                             actually used in this study?
   AcmeAce                   Is a space missing?
   1 C-Peptide
   1 C-peptide               Should the P following the hyphen be capitalized?
   1 DAFT
   2 DRAFT                   DAFT is a valid word but is probably not what is intended.
   doesd  dosed              The first should probably be “dosed.”
   ears                      This seemed odd. Upon investigation it should instead be “years.”
   Enrolled
   Enrolled"                 Stray quote mark?
   2 gamma,
   1 gamna,                  The one case with the “n” should have two “m’s.”
   MMDDDDYYYY
   MMDDDYYYY                 3 or 4 Ds? Actually, should this not be DDMMMYYYY?
   N  n                      Are both okay or should one style be selected?
   1 Plabeco
   4 Placebo                 Transposed letters in one instance.
   1 TFN
   2 TNF                     One of these is likely wrong.
   Trail                     This is a good word but seemed odd. Upon investigation it should have been
                             Trial as in “Trial Medication.”
   1 ta.sas
   1 tb.sas
   1 tcc.sas                 The third case does not follow the pattern of two letter SAS program names.

Finally, page 2 shows lines of different length – perhaps this is intended but it merits investigation. Other inconsistencies will be covered in Section 3, along with methods for making the Con_check report more usable.

3. MAKING THE CON_CHECK REPORT MORE USABLE

As was discussed above, the Appendix 1 output is contrived to contain many errors and inconsistencies and thus makes the Con_check report found in Appendix 2 clearly worth reviewing. In reality, the Con_check report is typically (hopefully) very sparse of actual problems. A low number of errors per page in part reproduces the original problem: it dissuades careful consideration of the report and causes errors to remain undetected. In this section we cover three methods of increasing the error density by eliminating words that are of no concern, improving the usefulness of Con_check. These three methods correspond to levels of the variable Z on the Con_check report and include (Z=1) stripping selected characters so that fewer lines are output to the Con_check report, (Z=2) eliminating numbers, and (Z=4 and Z=8) eliminating properly spelled words. Eliminating all words to which Con_check assigns a Z value above 1 would clearly generate the smallest possible report, but this comes at a cost. For illustration purposes in this paper, no such flagged words were eliminated.

The first method of eliminating words that do not merit an individual line in the report is to strip off characters which unnecessarily differentiate the base word. Note that there are seven versions of the word Study in Appendix 2.
Aside from capitalization, six versions differ only in the punctuation [the right parenthesis, comma, period, colon, and semi-colon] at the end of the word. Words containing these characters at the end are marked by adding 1 to the Z variable. By stripping these characters, six lines in this report could be reduced to a single line containing Study. Similarly, stripping the period from the end of Placebi in the fifth panel turns it into a word that can be spell-checked (see the spelling discussion below). The downside of naively stripping characters is that one loses the ability to mark words that should be flagged. For example, by eliminating the period and the right parenthesis, neither the .) nor the 0.0. in the first panel is marked as a potential problem. And adding a hyphen to the set of Z=1 characters would further reduce the lines used but would also reduce the number of potential problems flagged. Eliminating characters from the beginning of words would also have both benefits and costs.

The second method of reducing Con_check output is to eliminate numbers. Words flagged as numbers are marked by adding 2 to the Z variable. Note that Z is cumulative, so that Z=3 indicates a word that was stripped of selected characters as described above (setting Z to 1) and whose stripped form is a number (adding 2 to Z to make 3). The part of the table describing numbers in Section 2 above gives some of the benefits of leaving numbers in the report. This must be balanced against the tremendous volume of numbers in, for example, a set of listings.

The third method of reducing Con_check output is to eliminate properly spelled words. A standard word list found on the internet was used to assign Z=4, and a specialized pharmaceutical word list was used to assign Z=8. Again, Z is cumulative, so that a word with a Z=5 value such as Treatment, (with a comma on the end) gets a value of 1 from stripping off the comma and a value of 4 from appearing in the standard word list. The word limit:,.); (clearly a contrived case) scores Z=5 for the same reason: the selected characters are stripped, leaving a properly spelled word. Many properly spelled words could be eliminated by using this method, but this would be at the cost of eliminating valid yet unintended words such as colleted (the Appendix 1 context reveals that this word should be collected), Chang (appears in our standard word list but should be Change), DAFT, and continuos (should be continuous).

4. CON_CHECK LIMITATIONS AND APPROACHES TO OVERCOME THESE LIMITATIONS

Con_check is intended to supplement the quality control process, not replace it. As a naïve approach, it cannot be depended on to catch grammatical errors. In addition, big-picture consistency or context-sensitive issues are not flagged. For example, in Appendix 1 Con_check does not notice that Note: is missing from the third output’s footnote, nor that the line separating the title from the header is missing from the first output. And in the final output’s footnote, the word also improperly appears both at the end of the first line and at the beginning of the second line. And by removing words from their context, Con_check does not highlight alignment issues.

Con_check relies heavily on consistency among the initial letters of similar words. An inconsistency at the beginning of a word will keep the inconsistent word from showing up near the intended word, decreasing the chance that the pair will be noticed as inconsistency candidates. The word zubject is an example of this.
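A minimal sketch of the reverse-sorting idea developed below, assuming the one-observation-per-word data set ANY built in Appendix 3:

data endings;
  set any;
  w_rev = left(reverse(w));  * the word spelled backwards becomes the sort key *;
run;

proc sort data=endings;
  by w_rev;
run;

proc print data=endings noobs;
  var w;                     * print the original word; zubject now sorts near subject *;
run;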
Some progress in overcoming this may be made by sorting based on reverse(word) while still printing the un-reversed word, as sketched above. This will gather words with similar endings together, potentially revealing inconsistencies which would otherwise go unnoticed. Singular and plural forms of words, along with other suffixes, throw a wrench into this method. Perhaps a scoring scheme could be created to assign similarity between words for comparison purposes.

In its present form, the Con_check unit is a single word. In reality, it would be valuable to check consistency among units longer than a single word. For example, perhaps a given analyte is expected to be followed by only one form of a unit rather than both the original and the standard unit. And some words are frequently found together – trial medication is an example from Appendix 1. Also, hyphenated words extending beyond a single line are presently considered separate words (due to converting the original document to a text file prior to processing). By extending the concept of unit to selected multiple-word phrases, another level of inconsistency could be detected. Extend the unit to be a complete sentence and the grammatical checking of sophisticated word-processing programs can be utilized. Extend the unit to be an entire footnote and another level of consistency can be checked. For hints on how to process multiple-word units, see Section 4.2 of Morrill, 2005, the source of the idea for Con_check.

5. MISCELLANEOUS

Note that any file is eligible material: a Statistical Analysis Plan, a set of mock tables or shells, a title and footnote file, an entire clinical study report, journal articles, books, even the Gettysburg Address. Con_check has been successfully used on parts of these (except the last, which cannot be improved!). An innovative suggestion was made by Emily Peterson: run Con_check on mock tables to produce a set of words that should appear in the actual tables and listings, then use this set as the specialized dictionary to flag unexpected words in the pharmaceutical outputs programmed from these mock tables. In general, a “before and after” approach can be used to detect changes in a variety of circumstances: run Con_check, save the results, make desired changes to the Con_check input, run Con_check again, and compare results.

In practice, multiple SAS outputs are gathered into a single document for processing. We use a Perl script (written by Doug Moore) to gather MS Word documents into a single file, then convert this to a text file. To avoid an overwhelming volume of numbers in data listing output, one could take only the first page of each listing and still obtain a representative sample. One could narrow the focus to footnote or title sections and avoid the body with all its numbers altogether. And one may wish to avoid, or run separately, AE and Medical History listings, since verbatim terms (think of all the misspellings such as toxicity vs. tocixity) may cloud the results.

Regarding the spell check, note that the standard word list is in lower-case letters; the Z values of Con_check are assigned after converting all letters to lower case, though the original form is displayed. The specialized pharmaceutical word list used in Con_check consists of just a few words for illustration purposes. In practice it could be generated over time by adding appropriate words or could be generated from a medical dictionary. A correct set of units would be valuable to include in the specialized word list.
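Flagging words found in the specialized list is a simple match-merge on the lower-cased, stripped form of each word. The sketch below follows the Special Spell Check step of Appendix 3, where SPECIAL holds the specialized words in variable WP and ANYT holds the words with selected trailing characters stripped:

proc sort data=special;
  by wp;
run;

data any_p;
  set anyt;           * words with selected trailing characters stripped *;
  wp = lowcase(wt);   * match on the lower-cased, stripped form *;
run;

proc sort data=any_p;
  by wp;
run;

data anyp(keep=w p wp);
  merge any_p(in=a) special(in=b);
  by wp;
  if a and b;         * the word appears in the specialized list *;
  p = 8;              * contributes 8 to the cumulative Z code *;
run;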
Note that the specialized list saves at most only one line of Con_check output per word in the specialized list.

A couple of things regarding Con_check’s code. First, Con_check converts tabs to spaces. This is important since tabs are not displayed in Con_check’s output; leaving a tab adjacent to a word would result in words showing up as separate cases in the Con_check output for no visible reason. Second, due to the limit on the number of characters actually compared by PROC FREQ in some versions of SAS, a PROC SORT along with a DATA step is used to accurately summarize the units in Con_check.

REFERENCES

Morrill, J.M., “Programming Tips and Examples for Your Toolkit, II,” Proceedings of the 2005 Annual Conference of the Pharmaceutical Industry SAS Users Group (PharmaSUG), May 2005, http://www.pharmasug.org/2005/bestpapers.htm or http://www.lexjansen.com/pharmasug

ACKNOWLEDGMENTS

The authors wish to thank Indra Fernando, Doug Moore, John Gorden, and Emily Peterson for valuable review and suggestions.

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged.

John Morrill
Quintiles, Inc., P.O. Box 9708
Kansas City, MO 64134
816-767-6000
[email protected]

David J. Austin
PRA International, 4105 Lewis & Clark Dr.
Charlottesville, VA 22911-5801
434-951-3000
[email protected]

BIO INFORMATION

John is an Analyst in Statistical Programming at Quintiles, Inc. in Kansas City. He first used SAS in 1985 and has used it full-time in the pharmaceutical industry since 1998. David is an Analysis Programmer for PRA International in Charlottesville, Virginia, and has worked in the pharmaceutical industry since 1997.

KEYWORDS

title, footnote, text file, inconsistent, aberrant, spelling, quality control, consistency, consistent

Appendix 1

Table 14.8.1                                        Acme Ace-PROT1A   ta.sas   DRAFT - 03NOV2005 15:26   Page 1 of 1

Multivariate Analysis of ABC Level AUC per day (cp/ug) and Maximum Total of TNF alpha, IFN gamma, IL-6, and IL-2
Circulating Cytokines During Study, Per-Protocol Population [ Enrolled ]

                                                         Adjusted            95% Confidence Interval of
Treatment Group                              n           Geometric Mean      Adjusted Geometric Mean         p-Value
______________________________________________________________________________________________________________________________________
Simultaneous Estimates
ABC level AUC
   AcmeAce                                   35          65                  ( 19 , 228 )
   Plabeco                                   12          21                  ( 0 , 6 )
Total TFN alpha, IFN gamna, IL-6, and IL-2
   AcmeAce                                   35          591.3               ( 517.4, 5425.4)
   Placebo                                   12          76.6                ( 35.8, 163.9)
______________________________________________________________________________________________________________________________________
Simultaneous Treatment Comparisons: AcmeAce / Placebo
ABC Level AUC                                            65                  ( 12 , 346 )                    <.0001
Total TNF alpha, IFN gamma, IL-6, and IL-2               33.8                ( 16.5, 69.5)                   <.0001
______________________________________________________________________________________________________________________________________
Partial Correlation Coefficient from Error SSCP Matrix                                                       p-value
______________________________________________________________________________________________________________________________________
-0.1239                                                                                                      0.8894
______________________________________________________________________________________________________________________________________
Note: MANOVA. fixed-effect. Treatment-Emergent Treatmemt-emergent. viait visit.
AUC AUC- (0.0%) (0.0$) (IU/mL) (IU/ml) colleted GADA GADA- Lympcyte Lymphocyte Lymphocytes Lymphocytess Mddication Chang
MedDra Subgrops Summries Comcomitant (Beats/mininute) ears

                                                    Acme Ace-PROT1A   tb.sas   DAFT - 03NOV2005 14:46   Page 1 of 1

Table 14.8.3
Analysis of Maximum Lymphopenia Degree (%) After the First Treatment and ABC Level AUC per Day (cp/µg)
During the Study After the First Treatment, Per-protocol Population [ Intent-to-Treat. ]
______________________________________________________________________________________________________________________________________
Treatment   Variable                       N    Mean   St. Dev.   Meadian   Minimum   Maximum   p=value
____________________________________________________________________________________________________________________________________
AcmeAce     Maximum lymphopenia degree     35   16     3          16        10        22
            ABC level AUC (C02)            35   491    665        152       0         3090      0.1178
Placebo     Maximum lymphopenia degree     12   0      1          0         0         3
            ABC level AUC (CO2)            12   3      10         0         0         34        0.4148
______________________________________________________________________________________________________________________________________
Note: Lymphoneia degree defined as number of units (%) below lower normal limit:,.); Seven zubjects dosed with 88 mg of study drug.
MANCOVA fixd-effect bseline C-peptide AUCas a continuos. ABC level AUC (Beats/minutes) (mm Hg) (mmHg) Enrolled" 02 2004 Treatment.

                                                    Acme Ace-PROT1A   tcc.sas   DRAFT - 02NOV2005 16:38   Page 1 of 1

Table 14.8.3
Analysis of IL-10 Circulating Cytokine at Day 2, 4 Hours Post Infusion Subset by Occurrence of Rash Adverse Event
Data through Month 18 for 78 Randomized Subjects, Intent-to-Treat Population
______________________________________________________________________________________________________
                                      AcmeAce (N= 40)                      Placebo (N=38)
                                      IL-10 at Endpoint                    IL-10 at Endpoint
                                      0 n(%)          >0 n(%)              0 n(%)          >0 n(%)
______________________________________________________________________________________________________
Rash During Study
   Yes                                21 (100.0)      9 ( 44.3)            10 ( 54.4)      3 ( 50.0)
   No                                  . (    .)      5 ( 55.7)            22 ( 45.6)      3 ( 50.0)
P-value                               <0.0001                              0.8894
P-value Controlling for Treatment                                          0.8894
______________________________________________________________________________________________________________________________________
IL=10 zero and non-zero values and precense of rash were analyzed within each treatment group using Fisher's exact test and also
also analyzed overall using Mantel-Haenszel test(controlling for treatment group). &dose subjects doesd with 88mg of study drug.
MMDDDYYYY MMDDDDYYYY Trail MEdication. Porbably. Exan Normal? Placebi. Subejct. Urianalysis. 0.0. Contuing. Study. Study)
Life-threatning. Sequale. Ongoign. Checst X-Ray. Inclusiom Criteria. Exclusin. Completd. C-Peptide (%)by (IU/mL)- Study; Study:

Appendix 2, Page 1, Shorter lengths. Input=2006 TT14 Con_check_app1, z=code (strip=1, number=2, spell=4, special=8), c=frequency, w=word

Z C W
. 1 &dose . 13 ( 1 2 (%) . 1 (%)by 1 1 (0.0$) 1 1 (0.0%) 1 1 (100.0) 1 1 (Beats/minutes) 1 1 (C02) 1 1 (CO2)
1 1 (cp/ug) 1 1 (cp/µg) 1 1 (IU/mL) 1 1 (IU/ml) . 1 (IU/mL). 1 (mm 1 1 (mmHg) . 1 (N= 1 1 (N=38) 1 3 ) 1 3 ,
. 3 2 1 -0.1239 1 1 . 1 1 .) . 1 / . 9 0 1 1 0.0. 2 1 0.1178 2 1 0.4148 3 0.8894 2 1 02 . 1 02NOV2005 . 2 03NOV2005
2 8 1 2 3 10 2 5 12 . 1 14.8.1 . 2 14.8.3 . 1 14:46 2 1 152 . 1 15:26 2 2 16 3 1 16.5, 3 1 163.9) . 1 16:38 2 1 18 2 1 19
Z C 1 1 9 1 1 1 . 1 1 1 . 1 . 2 . 1 . 8
Z C W   Z C W   Z C W
3 1 2, 4 2 analyzed 4 1 Estimates 2 1 2004 4 8 and 4 1 Event 2 2 21 4 1 Appendix 4 1 exact 2 2 22 4 1 as
. 1 Exan 2 1 228 4 3 at 1 1 Exclusin. 2 5 3 . 8 AUC 4 2 First 2 1 3090 . 1 AUC. 1 Fisher's 2 1 33.8 . 1 AUCas
. 1 fixd-effect 2 1 34 4 1 below 9 1 fixed-effect. 2 1 346 . 1 bseline 4 3 for 2 4 35 4 1 by 4 1 from 3 1 35.8, . 1 C-Peptide
. 1 GADA 2 1 4 . 1 C-peptide . 1 GADA3 1 40) 4 1 Chang 5 2 gamma, 3 1 44.3) . 1 Checst 1 1 gamna, 3 1 45.6) 4 2 Circulating
4 2 Geometric 2 1 491 4 1 Coefficient 4 1 Group 2 1 5 4 1 colleted 4 1 group 3 2 50.0) . 1 Comcomitant 5 1 group). 3 1 517.4,
5 1 Comparisons: 1 1 Hg) 3 1 54.4) 1 1 Completd. 4 1 Hours 3 1 5425.4) 4 1 Confidence . 3 IFN 3 1 55.7) 5 1 continuos. . 3 IL-10
2 1 591.3 4 1 Controlling . 3 IL-2 2 1 6 1 1 Contuing. 1 3 IL-6, 2 2 65 4 1 Correlation . 1 IL=10 2 1 665 5 1 Criteria.
. 1 Inclusiom 3 1 69.5) 4 1 Cytokine 4 1 Infusion 2 1 76.6 4 1 Cytokines 8 1 Intent-to-Treat 2 1 78 4 1 DAFT 4 1 Interval 2 1 88
4 1 Data 4 3 Level . 1 88mg 4 2 Day 4 4 level 2 1 9 4 1 day 5 1 limit:,.); . 1 95% 4 1 defined 4 1 lower . 2 <.0001 4 1 Degree
. 1 Lympcyte . 1 <0.0001 4 3 degree 12 1 Lymphocyte . 2 >0 5 1 Dev. 4 1 Lymphocytes . 1 a . 1 doesd . 1 Lymphocytess . 7 ABC
4 1 dosed . 1 Lymphoneia . 3 Ace-PROT1A 4 2 DRAFT 8 1 Lymphopenia 4 3 Acme 5 2 drug. 8 2 lymphopenia . 5 AcmeAce 4 3 During
. 1 MANCOVA 4 2 Adjusted 4 1 each 1 1 MANOVA. 4 1 Adverse 4 1 ears . 1 Mantel-Haenszel 4 2 After 4 2 Endpoint 4 1 Matrix
5 3 alpha, 4 1 Enrolled 4 5 Maximum 4 2 also . 1 Enrolled" . 1 Mddication 4 3 Analysis 4 1 Error . 1 Meadian
Z C W   Z C W
4 3 Mean 5 1 Study) . 1 MedDra 5 1 Study, 5 1 MEdication. 5 1 Study. . 1 mg 5 1 Study: 4 1 Minimum 5 1 Study; . 1 MMDDDDYYYY
1 1 Subejct. . 1 MMDDDYYYY . 1 Subgrops 4 1 Month 4 1 subjects 4 1 Multivariate 5 1 Subjects, . 1 N 4 1 Subset . 1 n . 1 Summries
1 4 n(%) . 1 ta.sas 4 1 No 4 3 Table . 1 non-zero . 1 tb.sas 4 1 normal . 1 tcc.sas . 1 Normal? 4 1 test 5 2 Note: . 1 TFN
4 1 number 4 3 the 4 1 Occurrence 4 1 through 4 13 of . 2 TNF 1 1 Ongoign. 4 3 Total 4 1 overall 4 1 Trail 8 2 P-value
4 5 Treatment 8 1 p-Value 4 2 treatment 8 1 p-value 5 1 Treatment, . 1 p=value 5 1 Treatment. 4 3 Page 4 1 units 4 1 Partial
1 1 Urianalysis. 4 2 per 4 2 using . 1 Per-Protocol 4 1 values . 1 Per-protocol 4 1 Variable . 1 Plabeco . 1 viait 1 1 Placebi.
5 1 visit. 4 4 Placebo 4 1 were 4 3 Population 4 2 with 1 1 Porbably. 4 1 within 4 1 Post 1 1 X-Ray. . 1 precense 4 1 Yes
4 1 Randomized 4 1 zero 4 2 Rash . 1 zubjects 4 1 rash . 2 [ 1 1 Sequale. . 2 ] 4 1 Seven 4 2 Simultaneous . 1 SSCP 1 1 St.
4 2 Study 4 2 study

Appendix 2, Page 2, Longer lengths. Input=2006 TT14 Con_check_app1, z=code (strip=1, number=2, spell=4, special=8), c=frequency, w_l=length, w=word

W_L  W
16   (Beats/mininute)
16   Intent-to-Treat.
16   Life-threatning.
16   test(controlling
19   Treatmemt-emergent.
18   Treatment-Emergent
126  ______________________________________________________________________________________________________________________________
132  ____________________________________________________________________________________________________________________________________
134  ______________________________________________________________________________________________________________________________________

* Appendix 3, PharmaSUG TT14 (2006), John Morrill & David Austin, Con_check.sas *;

options nocenter mprint mlogic macrogen symbolgen source2 pagesize=51 ls=154
        nodate nonumber validvarname=upcase msglevel=I;

* Difficulty with cards statement within a macro. *;
* Could populate with assignment statements or external data set. *;
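* Flow of the program below: read each word of the converted text file into  *;
* data set ANY (one observation per distinct word, with frequency C); then,  *;
* as requested by the macro parameters, strip selected trailing characters   *;
* (t=1), identify numbers (n=2), and spell check against the standard (s=4)  *;
* and specialized (p=8) word lists; finally, merge the flags into the        *;
* cumulative code Z and report shorter and longer words separately.          *;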
data special(where=(wp NE '1 2 3'));
  length wp $800.;
  input wp;
  cards;
intent-to-treat
p-value
analytes
fixed-effect
lymphopenia
lymphocyte
1 2 3
;
run;

* Testing tab replacement *;
data special;
  set special;
  wp_1=tranwrd(wp,' ','x');    * Change tab to space. *;
  wp_2=tranwrd(wp,'09'x,'y');  * Change tab to space. *;
run cancel;
proc print; run cancel;

%macro loop(ids=,strip=Y,numbers=Y,spell=Y,special=Y);

filename book  "C:\Documents and Settings\nm11923\My Documents\PharmaSUG_2006\&ids..txt";
filename spell "C:\Documents and Settings\nm11923\My Documents\PharmaSUG_2006\word.lst";
libname  psug  "C:\Documents and Settings\nm11923\My Documents\PharmaSUG_2006\";

data any(keep=w); *(drop=w: i:);
  infile book missover length=reclen; * obs=55400;
  length line w $800.;
  input line $varying800. reclen;
  *line=tranwrd(line,' ',' ');   * Change tab to space. *;
  line=tranwrd(line,'09'x,' ');  * Change tab to space. *;
  line=tranwrd(line,'--',' ');
  cnt=1;
  do until(w eq ' ');
    w=scan(line,cnt,' ');
    if w ne ' ' then output;
    cnt+1;
  end;
run;

* Chose retain in data step due to length issues in PROC FREQ *;
proc sort data=any; by w; run;
data any;
  set any;
  retain c;
  by w;
  if first.w then c=1;
  else c+1;
  w_l=length(w);
  if last.w then output;
run;

***************************************************************;
** Strip selected non-alphabetic characters from ends of word (wt) - BEGIN. **;
%if %substr(%upcase(&strip),1,1)=Y %then %do;
data anyt(keep=w t wt);
  set any; *(where=(substr(upcase(w),1,1) IN ('L' '0' '1' '2' '3' '4' '5' '6')));
  wr=reverse(w);
  do until (substr(left(wr),1,1) NOT IN (':' ',' '.' ')' ';'));
    if (substr(left(wr),1,1) IN (':' ',' '.' ')' ';')) then do;
      wr=substr(left(wr),2);
      t=1;
    end;
  end;
  wt=left(reverse(wr));
run;
proc print width=min data=anyt(where=(t));
  format w wt $30.;
run cancel;
%end;
** Strip selected non-alphabetic characters from ends of word (wt) - END. **;

** Identify numbers - BEGIN. **;
* Note that t is used as input here!!!!! *;
%if %substr(%upcase(&numbers),1,1)=Y %then %do;
data anyn(keep=w n wn t);
  *set any; *(where=(substr(upcase(w),1,1) IN ('L' '0' '1' '2' '3' '4' '5' '6' '7' '8' '8')));
  set anyt;
  if input(wt,?? best.) then do;  * Checks if number: avoid message - thanks Doug! *;
    n=2;
    wn=input(wt,best.);
  end;
run;
proc print width=min data=anyn(where=(n));
  format w $30.;
run cancel;
%end;
** Identify numbers - END. **;

** Spell Check - BEGIN. **;
%if %substr(%upcase(&spell),1,1)=Y %then %do;
* Read in word list to create SAS data set. *;
data psug.spell;
  infile spell missover length=reclen obs=200200;
  length spell $800.;
  input spell $varying800. reclen;
run cancel;
proc print width=min data=psug.spell(obs=54); run cancel;
data any_s;
  set anyt;  * Use anyt here instead of any *;
  ws=lowcase(wt);
run;
proc sort data=any_s; *(where=(substr(upcase(w),1,1) IN ('L' 'M' 'T' 'U' '0')));
  by ws;
run;
data anys(keep=w s ws);
  merge any_s(in=a) psug.spell(in=b rename=(spell=ws));
  by ws;
  if a AND b;
  s=4;
run;
proc sort data=anys; by w; run;
proc print width=min data=anys(where=(s));
  format w $30.;
run cancel;
%end;
** Spell Check - END. **;

** Special Spell Check - BEGIN. **;
%if %substr(%upcase(&special),1,1)=Y %then %do;
proc sort data=special; by wp; run;
data any_p;
  set anyt;  * Use anyt rather than any. *;
  wp=lowcase(wt);
run;
proc sort data=any_p; by wp; run;
data anyp(keep=w p wp);
  merge any_p(in=a) special(in=b);
  by wp;
  if a AND b;
  p=8;
run;
proc sort data=anyp; by w; run;
proc print width=min data=anyp(where=(p));
  format w: $30.;
run cancel;
%end;
** Special Spell Check - END. **;
data any_final;
  merge any(in=a) anyt(drop=t) anyn anys anyp;
  by w;
  if a;
  z=sum(of t n s p);
  w_sort=upcase(w);
run;
proc sort data=any_final; by w_sort; run;
proc print width=min;
  format w wt $30.;
run cancel;

title "Appendix 2.1, Shorter lengths. Input=2006 TT14 &ids, z=code (strip=1, number=2, spell=4, special=8), c=frequency, w=word";
proc report data=any_final(where=(length(w)<=15)) panels=6 split='*****';
  columns z c w;
  define z / width=2  spacing=1;
  define c / width=2  spacing=1;
  define w / width=15 flow spacing=1;
run;

title "Appendix 2.2, Longer lengths. Input=2006 TT14 &ids, z=code (strip=1, number=2, spell=4, special=8), c=frequency, w_l=length, w=word";
proc print data=any_final(where=(length(w)>15)) noobs;
  var z c w_l w;
run cancel;
proc report data=any_final(where=(length(w)>15)) split='*****';
  columns z c w_l w;
  define z   / width=2   spacing=1;
  define c   / width=2   spacing=1;
  define w_l / width=5   spacing=1;
  define w   / width=135 flow spacing=1;
run;

%mend loop;

%loop(ids=Con_check_app1);

* Could parameterize 15 (width of column) - changes panels *;