ARROW Statistical Graphic System

Paper: PO19
Cheng Jun Tian, Johnson & Johnson PRD, Titusville, New Jersey, 08560
Qin Li, Johnson & Johnson PRD, Titusville, New Jersey, 08560
Jiangfan Li, Johnson & Johnson PRD, Titusville, New Jersey, 08560
ABSTRACT
ARROW Statistical Graphic System (ARROW/GR), written in SAS®, presents informative statistical graphs
for clinical data analysis done by Johnson & Johnson Pharmaceutical Research & Development (J&JPRD).
ARROW/GR is a sub-system of the ARROW Clinical Data Analysis System, focusing on data visualization.
It applies to Phase 1 through Phase 4 clinical trials. ARROW/GR currently provides eight types of statistical
graphs: mean plot, vertical bar chart, horizontal bar chart, box plot, line plot, dot plot, survival curve plot,
and point estimates with confidence intervals plot. With ARROW/GR, graphic presentation becomes more
convenient, reliable, and repeatable, yet remains flexible.
ARROW/GR is a modularized SAS® macro package with a well-organized structure. It sets up J&JPRD’s
graphic standards by applying graphic templates and a consistent selection of colors, symbols, and line types.
The graphic data are the output from well-validated ARROW statistical analysis modules. Hence statistical
graphs produced by ARROW/GR are reliable and easy to validate.
INTRODUCTION
Data visualization has become more important in clinical data analyses and presentations. It is also playing
a more significant role in regulatory submissions. In version 9.1, enhanced features in SAS/GRAPH®
procedures, statements, and annotations make it easier for users to produce informative graphics. However,
to get desirable graphic presentations, a SAS® programmer needs to understand SAS/GRAPH® procedures,
statements, annotation data, and their relationships, as well as their relationships with other Base SAS®
statements and procedures. Producing graphics in SAS® is still a challenge for many clinical trial
programmers. These challenges may include controlling the output destination and format; selecting fonts,
lines, and symbols; and displaying graphics in multiple panels per page with common titles, footnotes, and
legends. Another challenge is the requirement to reproduce pharmaceutical trial output reliably, for example
in mass production.
The ARROW Statistical Graphic System encapsulates these challenges of graphic programming inside SAS®
macros, while leaving users with friendly interfaces in the form of function-oriented macro parameters.
For example, self-explanatory macro parameters such as pagesize=, layout=, textsize=, lines=,
symbols=, outcolor=, and outfile= give users a wide selection of page orientation, display layout (single or
multi-panel display), relative text size, line type, data point symbol, color, and output file format.
Built on decades of experience in application to clinical trial data analyses, the ARROW Statistical Graphic
System provides 8 selected core macros. These 8 macros generate the mean/median plot, vertical bar
chart, horizontal bar chart, box plot, line plot, dot plot, survival curve, and plot of point estimates
with confidence intervals. Users provide inputs to the 8 core macros through their macro parameters.
Within each of the 8 macros, supporting macros perform tasks such as system initialization, device
selection, graphic template setting, and annotation data derivation behind the scenes.
The purpose of this paper is to introduce the capability of the ARROW Statistical Graphic System by
presenting different types of graphics the system can produce. Technical details of the programming will not
be discussed. You are always welcome to contact the authors for any further discussion.
SCOPE OF ANALYSIS
The ARROW Statistical Graphic System covers parallel and crossover clinical trial designs. It is applicable to
data for a single study as well as data for integrated studies. The system presents results of three major
types of analysis: categorical data analysis, continuous data analysis, and survival data analysis. For
continuous data, the system will display in the graphics: number of values assessed (N), mean and/or
confidence interval at different confidence levels, standard deviation (SD) or standard error (SE), L.S.
mean, and standard error of the L.S. mean. For categorical data, the system will display: number of values
assessed (N) and the number of observations in each category. For survival data, the system will display:
number of subjects at risk, confidence intervals for survival rates, and the p-value for the distribution
comparison, as obtained from a log-rank test.
GRAPHIC FEATURES
We will use a macro call for a vertical bar chart to demonstrate common features of the ARROW Statistical
Graphic System. These features, described after the macro call, apply to all 8 core graphic macros of the system.
%grvbar(inds=,
if= ,
by=,
above=1,
yvar=,
yvarf=,
ylabel=,
yorder=,
yref=,
xvar= ,
xvarf=,
grgroup=,
grgroupf=,
stat=,
statlbl=,
barstat=,
barstatf=,
cisel=no,
title='Figure 1: Distribution of Subjects ...',
headnote='(Free Headnote)',
footnote='(Free Footnote)',
pgmid=yes,
pagesize=landscape,
layout=2X2,
textsize=140,
outcolor=yes,
frame=no,
patterns=,
outfile=eps,
split=no,
cleanup=yes);
An ARROW Statistical Graphic macro starts with input data (inds=). Parameter IF (if=) defines an “if”
statement, which can be used to further subset the data. This feature is especially useful when you want to
set up a macro loop to produce plots for different subgroups. Use of the parameter BY (by=) will result in
separate graphics for each value of the designated by variable.
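The subgroup loop mentioned above could be sketched as follows. This is only an illustration under stated assumptions: the real %grvbar lives in the ARROW library, so a stand-in stub that merely echoes its inputs is defined here to keep the sketch runnable. The wrapper macro %bar_by_subgroup, the data set adsl, and the variable sex are hypothetical names, and we assume that if= accepts the subsetting condition as text.

```sas
/* Stand-in stub (NOT the real ARROW macro): it only echoes its inputs */
/* so that the loop below can run without the ARROW library.           */
%macro grvbar(inds=, if=, by=, title=, outfile=);
  %put NOTE: grvbar called with inds=&inds if=&if title=&title outfile=&outfile;
%mend grvbar;

/* Hypothetical wrapper: one %grvbar call per subgroup value.          */
%macro bar_by_subgroup(inds=, var=, values=);
  %local i val;
  %let i = 1;
  %let val = %scan(&values, &i, %str( ));
  %do %while (%length(&val) > 0);
    /* Assumption: if= takes the subsetting condition as text.         */
    %grvbar(inds=&inds,
            if=%str(&var = "&val"),
            title=%str(Distribution of Subjects: &var &val),
            outfile=eps);
    %let i = %eval(&i + 1);
    %let val = %scan(&values, &i, %str( ));
  %end;
%mend bar_by_subgroup;

/* Example call: one chart for each value of the hypothetical variable sex */
%bar_by_subgroup(inds=adsl, var=sex, values=F M);
```

With the stub in place, the log shows one line per subgroup. Replacing the stub with the library macro would leave the loop itself unchanged.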
Parameter title= defines the first title in a single quoted string. The second title is always the protocol and
analysis population, which are predefined as macro variables in other parts of an output program. Parameter
headnote= allows you to add one more title. It can also be set to blank. Parameter footnote= provides
single or multiple footnotes in single quoted strings. Multiple footnotes should be separated by the special
character “@”. Parameter pgmid= has options yes or no. If yes, the name of the graphic program will be
displayed with the page number at the bottom of the output. Pagesize= has three options: half, portrait, or
landscape. Selecting half will display the graphic in the top half of a portrait page. Selecting portrait or
landscape will display a graphic on the full page. Layout= defines the output template. As in the sample,
layout=2X2 produces up to four graphics per page in 2 rows by 2 columns. Textsize= defines the relative text
size for graphic symbols, axis labels, and the text of by variables. Textsize=100 means the original size as defined
by the ARROW Statistical Graphic System. Outcolor= provides a selection of colors. Options are: no, yes, or a
user-defined stream of colors. Outcolor=no will generate graphics in grey. Outcolor=yes will generate outputs in
ARROW Statistical Graphic System predefined colors. An example of a user-defined color stream could be:
outcolor=red green blue. Frame= has values yes or no. Frame=yes will draw a frame around the graphic.
Outfile= defines the output name and format. Selections of output format are: eps, ps, png, emf, html, tiff, jpeg.
Split= has options yes or no. For outputs with multiple pages, split=yes will produce multiple single-page
outputs, which is convenient for inserting graphics into Word documents. (Figure 1)
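The “@” convention for multiple footnotes can be illustrated with a short DATA step. How the ARROW macros handle this internally is not described in this paper, so the following is only one possible approach using Base SAS functions. The footnote text is made up for illustration.

```sas
/* Sketch: split an "@"-delimited string into FOOTNOTEn statements.   */
/* This is not ARROW's internal code; the text below is hypothetical. */
data _null_;
  length raw text $200;
  raw = 'Note: ITT population.@CI = confidence interval.';
  /* SAS supports FOOTNOTE1 through FOOTNOTE10 */
  do line = 1 to min(countw(raw, '@'), 10);
    text = strip(scan(raw, line, '@'));
    put line= text=;
    /* Queue a global statement such as: footnote1 "Note: ITT population."; */
    call execute(catx(' ', cats('footnote', line), quote(strip(text))) || ';');
  end;
run;
```

The queued FOOTNOTE statements run after the DATA step ends, so they apply to any output produced afterwards.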
In the following sections, we will present specific features of each individual graphic module using examples.
Options of these macro parameters won’t be discussed in detail.
MEAN/MEDIAN PLOT
Plots the mean or median of a given variable, with the option of presenting confidence intervals.
Figure 2 displays the mean of laboratory results with its 95% confidence interval by study visit and laboratory
test. It is a 3X2 display (layout=3X2) with a user-specified color stream (outcolor=red green orange blue). These
plots share a common legend. The statistic displayed is the count (N). The statistics can also include the mean or
median.
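The statistics behind a plot like Figure 2 (N, mean, and 95% confidence limits by treatment and visit) can be derived with PROC MEANS. The sketch below uses simulated data and hypothetical variable names (trt, visit, result). In the ARROW system these statistics come from the validated analysis modules, so this only illustrates the kind of input data a mean plot needs.

```sas
/* Hypothetical toy data: one lab result per subject, treatment, and visit */
data lab;
  call streaminit(2009);
  do trt = 'A', 'B';
    do visit = 1 to 4;
      do subj = 1 to 20;
        result = 50 + 2*visit + rand('normal', 0, 5);
        output;
      end;
    end;
  end;
run;

/* N, mean, and two-sided 95% confidence limits of the mean per cell */
proc means data=lab noprint nway alpha=0.05;
  class trt visit;
  var result;
  output out=lab_stats n=n mean=mean lclm=lcl uclm=ucl;
run;

proc print data=lab_stats noobs;
  var trt visit n mean lcl ucl;
run;
```

Changing alpha= changes the confidence level. For example, alpha=0.16 yields 84% limits.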
Figure 3 shows an example of breaking the joints (joint=1 to 4, 5 to 6) in a mean plot. This feature is useful when
you want to group observations by time points or values of other x-axis variables.
Figure 4 shows a plot of the median with its 84% confidence interval. The medians of each treatment group at
each time point are also annotated under the plot. To get the annotation, you need to set parameter
stat=median, and set the format of the median with parameter statf=4.2. In addition, the scale of the y-axis can be
set to log10 or loge.
VERTICAL BAR CHART
In addition to the vertical bar chart displayed in the previous section for common graphic features, additional
examples of vertical bar charts are displayed below (Figures 5 and 6).
Figure 5 shows the distribution of age groups within each treatment group. Percentages are shown at the top of
each bar.
In Figure 6, a vertical bar chart is used to display the mean change from baseline for each treatment group at
different time points. The mean changes are also displayed at the top of each bar. Artificial reference lines
are placed to demonstrate the reference line feature. These reference lines can be dynamic as long as the
reference values are saved in a variable in the input data.
HORIZONTAL BAR CHART
The horizontal bar chart is similar to the vertical bar chart. It presents the data in the other orientation.
Figure 7 shows another option for displaying bars: in patterns (patterns=e l5 x5 r5 s).
BOX PLOT
Produces a box plot with the option of different interpolation methods.
Figure 8 shows a box plot with option interpol=boxt and option joint=all.
In Figure 9, options are set as interpol=hilo and joint=blank. An artificial reference line is set on the y-axis
(yref=0).
Figure 10 shows an example of a multi-panel box plot.
LINE PLOT
A line plot is useful for presenting data for individual subjects over time.
Figure 11 presents a line plot with two y-axes. It displays a subject’s ECG values and ECG changes over time.
Figure 12 shows lab test results of individual subjects over time.
DOT PLOT
Produces scatter plots or needle plots with options of interpolation methods and fitted or reference lines.
Figure 13 shows a scatter plot with fitted lines.
Figure 14 shows a needle plot.
Figure 15 shows a scatter plot with multiple reference lines.
Figure 16 is an interesting application of a dot plot. It presents counts of adverse events for each treatment
group.
SURVIVAL CURVE
Produces time-to-event plots with options for confidence intervals and display of statistics.
Figure 17 displays a Kaplan-Meier curve with 95% confidence intervals. Censored observations are marked
in the plot with different symbols for each treatment group.
Figure 18 presents an upward survival curve annotated with the number of subjects left.
Figure 19 shows survival curves without censored observations displayed. In addition to the number of
subjects left, p-values and hazard ratios are also provided by annotation. Note that the number of survival
curves is different in the two plots, but a common legend is preserved.
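For readers without access to the ARROW macros, a Kaplan-Meier plot with confidence limits, numbers at risk, and a log-rank test can be approximated with PROC LIFETEST. The sketch below is not the ARROW implementation, which builds on SAS/GRAPH® and annotation data. It assumes ODS Graphics (SAS 9.2 or later) and uses simulated data with hypothetical variable names.

```sas
/* Hypothetical toy data: event time in months with random censoring */
data surv;
  call streaminit(19);
  do trt = 'A', 'B';
    do subj = 1 to 40;
      /* event time: exponential, longer on arm B; censoring time: uniform */
      t_event = rand('exponential') * ifn(trt = 'A', 10, 15);
      t_cens  = rand('uniform') * 24;
      months  = min(t_event, t_cens);
      censor  = (t_cens < t_event);   /* 1 = censored, 0 = event */
      output;
    end;
  end;
run;

ods graphics on;
/* Kaplan-Meier curves with confidence limits and numbers at risk; */
/* the STRATA statement requests the log-rank test across arms.    */
proc lifetest data=surv plots=survival(cl atrisk);
  time months*censor(1);
  strata trt / test=logrank;
run;
ods graphics off;
```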
PLOT OF POINT ESTIMATES WITH CONFIDENCE INTERVALS
Plots any point estimate with its confidence interval. It may be used to plot odds ratios.
Figure 20 shows mean change with confidence intervals at each time point.
Figure 21 shows cure rate by different categories.
In summary, the ARROW Statistical Graphic System provides users with a set of powerful macros with a friendly
interface and enhanced graphic presentations. The system builds challenging features into macros, such as
displaying a common title, footnotes, and legend for multi-panel graphics on the same page, and annotating
statistical analysis results in graphics, and it provides graphic tools that are easy to use.
With the ARROW Statistical Graphic System, generating desirable graphics does not require a seasoned
SAS® programmer.
The ARROW Statistical Graphic System is an add-on subsystem to other ARROW Statistical Analysis
Systems developed in J&JPRD. ARROW Graphic System macros accept data from the data preparation or
analysis modules of other ARROW subsystems. This architecture guarantees the consistency between the
ARROW tabulation and graphic displays. The system architecture demonstrates the advantage of sharing
the data and analysis results in the different displays. It also allows for the reliable reproduction of analysis
results.
As with other ARROW system macros, ARROW Statistical Graphic System macros are stored in a fully
validated and well-maintained central library. All programmers can use these macros and have read-only
access to all code. Version updates and validation are strictly controlled, but user interventions are
supported.
All ARROW System macros follow a thoughtfully designed architecture, which is open to modifications and
additions. New modules can easily be added to the current system if there is demand. The ARROW
system’s open architecture also allows smooth interchange with other software systems, such as Microsoft
Office, Visual Basic, C/C++, SPLUS/R, etc.
CONCLUSION
The ARROW Statistical Graphic System established a clinical trial graphical style and standard followed by
J&JPRD. It facilitates data visualization and presentation through the use of simple macro calls.
The ARROW Statistical Graphic System improved programming efficiency by utilizing centrally maintained
macros, and avoided repeated development of similar code by different trial teams. Graphics generated
by the fully validated ARROW Statistical Graphic System are more reliable than those produced individually.
REFERENCES
[1] ARROW Clinical Data Analysis System, Paper PO17, presented at PharmaSUG 2009.
[2] ARROW General Categorical Analysis System, Paper AD17, presented at PharmaSUG 2009.
[3] ARROW Generic Data Listing System, Paper AD15, presented at PharmaSUG 2009.
ACKNOWLEDGEMENTS
The authors would like to thank Johnson & Johnson PRD for providing a working and research environment,
and for supporting conference participation. The authors would also like to thank all their colleagues for applying
these macro tools on clinical projects, and for providing feedback. Real trial applications are the
motivation and objective of the ARROW Statistical Graphic System.
CONTACT INFORMATION
Your comments are appreciated. Please contact the authors at:
Cheng Jun Tian, Qin Li, Jiangfan Li
J&J PRD
P.O. Box 200
1125 Trenton-Harbourton Road
Titusville, NJ 08560-1504
Email: [email protected], [email protected], [email protected]