9 August, 2022

Let’s go for a walk in the forest!


A complete guide to producing a high-quality forest plot with SAS

The forest plot is a powerful graphical representation that summarizes the strength of the statistical association between variables (e.g., predictors or treatment arms) and clinical outcomes across multiple subgroups, within a common scientific question.

Here we present a simple method to generate a forest plot using the SAS® Graph Template Language (GTL), one of the most flexible ways to produce these highly customizable figures.

Our study case

For our example, we selected a bone marrow cancer study whose data are publicly available online; you can find the data on the full SAS code page. Our aim was to show visually the effectiveness of the treatment (Methotrexate) compared with placebo within various subgroup populations.

Here is the forest plot that we aim to create in this project (figure 1):


Figure 1: Forest plot showing the statistical comparisons of clinical efficacy between Methotrexate and Placebo across various patient subgroups.

Don’t worry, it’s easier than it looks! Just follow these few steps.

The 5 steps to producing a beautiful forest plot

The procedure described below is based on previous work (Matange 2014, 2016; Hebbar 2015).

Step 1: The Input dataset

The first step in generating a forest plot is to create an input dataset containing all the information needed to display the data and the graphical elements in the final plot.

In Figure 1, the main information reported is:

  • Subgroup names
  • Patient counts
  • Hazard ratio plot (point estimates with confidence intervals)
  • Hazard ratio values
  • P-values

Here is what the input dataset used to generate this plot looks like:


Figure 2: Input dataset supporting the creation of the forest plot

The input dataset contains 11 variables. The main ones are described below.

  • Subgroup: The subgroups for which the association was estimated.
  • Count: The number of patients in each subgroup, with the corresponding percentage, calculated with PROC FREQ.
  • HzRatio: The hazard ratio with its 95% CI, as displayed text. The estimate and its confidence limits were derived with PROC PHREG.
  • Mean/Low/High: Mean is the point estimate of the hazard ratio; Low and High are the lower and upper limits of its confidence interval, respectively.
  • P-Value: The p-value associated with the hazard ratio in the given subgroup. It can also be derived with PROC PHREG.
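The derivations described above might look roughly like the sketch below. It is not the article's code: the analysis dataset WORK.STUDY and its variables SUBGRP, TRT ('Methotrexate' or 'Placebo'), TIME and STATUS (1 = event, 0 = censored) are hypothetical, and the sketch assumes one Cox model per subgroup with treatment as the only covariate. The header rows for each subgroup category would still have to be added afterwards.

```sas
/* Sketch only: WORK.STUDY and its variables (subgrp, trt, time, status) */
/* are hypothetical placeholders, not the article's dataset.            */

/* Count column: number and percentage of patients per subgroup */
proc freq data=work.study noprint;
  tables subgrp / out=work.counts;      /* OUT= holds COUNT and PERCENT */
run;

/* Mean/Low/High and P-Value: one Cox model per subgroup */
proc sort data=work.study out=work.study_s;
  by subgrp;
run;

ods output ParameterEstimates=work.pe;  /* all BY groups in one dataset */
proc phreg data=work.study_s;
  by subgrp;
  class trt(ref='Placebo');
  model time*status(0) = trt / risklimits;  /* adds HR confidence limits */
run;

/* Combine and build the text columns shown in the figure */
data work.forest_input;
  merge work.counts(keep=subgrp count percent)
        work.pe(keep=subgrp hazardratio hrlowercl hruppercl probchisq);
  by subgrp;
  length countpct hzratio $40;
  mean   = hazardratio;
  low    = hrlowercl;
  high   = hruppercl;
  pvalue = probchisq;
  /* e.g. "25 (30.5%)" and "0.85 (0.60-1.20)" */
  countpct = catx(' ', count, cats('(', put(percent, 5.1), '%)'));
  hzratio  = catx(' ', put(mean, 5.2),
                  cats('(', put(low, 5.2), '-', put(high, 5.2), ')'));
run;
```

The display strings are built with CATX/CATS so that the leading blanks produced by PUT are stripped before concatenation.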

Step 2 : The structural information

The objective of this step is to add supporting variables to the input dataset that control the structure of the plot, namely INDENT, ID, REF and SQUARESIZE.

  • Ref: Controls row properties along the Y axis by giving the positions at which the row bands (reference lines) are drawn. In the template, we use the dynamic variables _headercolor, _rowcolor and _thick to set the colour and thickness of these bands; their values can be changed when the template is called in PROC SGRENDER.
  • ID: Controls the font weight of the subgroup labels.
  • Indent: Controls the indentation of the subgroup labels.
  • Squaresize: Determines the marker size according to the subgroup population size.
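A minimal sketch of how these structural variables could be derived is shown below. It assumes the Step 1 dataset is stored as WORK.FOREST_INPUT (a hypothetical name), is already sorted in plot order, and marks category header rows with a missing MEAN. OBSID (a hypothetical row-position variable), the ID values 1 and 2, the every-other-row banding rule and the scaling of SQUARESIZE by the largest subgroup are illustrative choices, not necessarily those of the full code.

```sas
/* Sketch only: assumes WORK.FOREST_INPUT is in plot order, with header */
/* rows (missing MEAN) followed by their subgroup-level rows.            */

/* Largest subgroup size, used to scale the markers (an assumption) */
proc sql noprint;
  select max(count) into :maxn trimmed from work.forest_input;
quit;

data work.forest;
  set work.forest_input;
  obsid = _n_;                      /* Y position of the row            */
  if missing(mean) then do;         /* category header row              */
    indent = 0;                     /* not indented                     */
    id     = 1;                     /* text group 1 (e.g. bold)         */
  end;
  else do;                          /* subgroup level row               */
    indent = 1;                     /* indented one level               */
    id     = 2;                     /* text group 2 (normal weight)     */
  end;
  if mod(obsid, 2) = 0 then ref = obsid;   /* band every other row      */
  if not missing(mean) then
    squaresize = count / &maxn;     /* relative size, 0 < size <= 1     */
run;
```

Rows where REF is missing simply get no band, because REFERENCELINE ignores missing values.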

Step 3 : The SAS® template

Using PROC TEMPLATE, we create a graph template written in the Graph Template Language (GTL) to lay out the five columns of the figure.

The Graph Template Language (GTL) is part of the Output Delivery System (ODS) Graphics software. GTL gives the user more control and flexibility than other graphical procedures: it can be used both to modify graphs based on procedure-driven templates and to create completely customized graphs that would not be feasible with a procedure-driven template.

GTL also makes it simpler to include features that used to be difficult to add, such as a table of data inside the graph area, or several plots of the same or different types (scatter plot, step plot, box plot, etc.) in the same figure.

We will go through the GTL code section by section and highlight the important parts and their purpose.

Part 1: The basic structure

The first step is to name the template in the DEFINE STATGRAPH statement. We then declare a few dynamic variables that control the colour and thickness of the row bands (lines). These are referenced in the REFERENCELINE statement of each section of the graph.

This SAS® code sets the basic structure of our graph. We use LAYOUT LATTICE with 5 columns (one for each of the 5 elements of the final figure shown in Figure 1).

The COLUMNWEIGHTS option sets the relative width of each column; the weights should add up to 1.
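The skeleton below illustrates this basic structure under stated assumptions: the template name ForestSketch and the column weights are hypothetical. It only defines the outer structure; the column layouts described in Part 3 would be placed inside the lattice.

```sas
/* Sketch of the outer structure only (template name is hypothetical). */
proc template;
  define statgraph ForestSketch;
    dynamic _headercolor _rowcolor _thick;   /* values supplied by SGRENDER */
    begingraph / border=false;
      /* 5 columns: Subgroup | Count | HR plot | HR (95% CI) | P-value */
      /* COLUMNWEIGHTS are relative widths that add up to 1            */
      layout lattice / columns=5
                       columnweights=(0.25 0.15 0.30 0.20 0.10);
        /* one LAYOUT OVERLAY per column goes here (see Part 3) */
      endlayout;
    endgraph;
  end;
run;
```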

Part 2: The figure header

This SAS® code defines the header section of the graph. The SIDEBAR statement defines the header area; inside it we use a lattice of 2 rows and 5 columns to show a label for each column.

The ENTRY statements contain the header label for each column. The first row is left blank and the labels are placed in the second row.
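A sketch of such a header is shown below. The label texts are placeholders (the exact wording of Figure 1 is not reproduced), and the hypothetical column weights are repeated in the header grid so that each label lines up with its column.

```sas
proc template;
  define statgraph ForestHeaderSketch;       /* hypothetical name */
    begingraph / border=false;
      layout lattice / columns=5 columnweights=(0.25 0.15 0.30 0.20 0.10);
        sidebar / align=top;
          /* 2 x 5 grid: blank first row, column labels in the second */
          layout lattice / rows=2 columns=5
                           columnweights=(0.25 0.15 0.30 0.20 0.10);
            entry " "; entry " "; entry " "; entry " "; entry " ";
            entry halign=left "Subgroup" / textattrs=(weight=bold);
            entry "Patients (%)"         / textattrs=(weight=bold);
            entry "Hazard ratio"         / textattrs=(weight=bold);
            entry "HR (95% CI)"          / textattrs=(weight=bold);
            entry "P-value"              / textattrs=(weight=bold);
          endlayout;
        endsidebar;
        /* column overlays (Part 3) follow here */
      endlayout;
    endgraph;
  end;
run;
```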

Part 3: The columns

This SAS® code creates the first column, “Subgroup”. The X and Y axes are defined with the XAXISOPTS and YAXISOPTS options of the overlay layout. REFERENCELINE lets us use the dynamic variables declared in Part 1 to style the horizontal line of each row.

AXISTABLE, a newly introduced statement, enables the user to write short, simple code within a single overlay layout. Here, AXISTABLE is used to present the information inside the overlay layout’s inner margin.
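The subgroup column could be sketched as a standalone template like the one below. It assumes the plot dataset carries a hypothetical row-position variable OBSID along with SUBGROUP, REF, INDENT and ID; in the full figure this overlay would be the first cell of the 5-column lattice. The Y axis is reversed so that the first row appears at the top.

```sas
proc template;
  define statgraph SubgroupColSketch;        /* hypothetical name */
    dynamic _rowcolor _thick;
    begingraph / border=false;
      layout overlay / walldisplay=none border=false
                       yaxisopts=(reverse=true display=none)
                       xaxisopts=(display=none);
        /* row band at each REF value, styled through dynamic variables */
        referenceline y=ref / lineattrs=(thickness=_thick color=_rowcolor);
        /* subgroup text: INDENT sets indentation, ID sets the text group */
        axistable y=obsid value=subgroup / indentweight=indent
                                          textgroup=id display=(values);
      endlayout;
    endgraph;
  end;
run;
```

The count column can be built the same way, with a second AXISTABLE whose VALUE= points to the count text variable.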

The second section creates the column showing the number and percentage of patients in each subgroup.

The third section creates the hazard ratio plot. The graphical part is drawn with SCATTERPLOT using the variables Mean, Low and High: the marker shows the point estimate and the error bars show the confidence interval. For this representation, we chose a logarithmic X axis (LOGOPTS option), although a linear axis (LINEAROPTS) can also be used. The SIZERESPONSE option makes the marker size reflect the population of each subgroup.
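A standalone sketch of the hazard ratio column follows, using the same hypothetical OBSID variable. The dashed reference line at HR = 1 and the base-10 log axis are illustrative choices; SIZERESPONSE= in SCATTERPLOT assumes a SAS 9.4 maintenance release that supports it.

```sas
proc template;
  define statgraph HRColSketch;              /* hypothetical name */
    dynamic _rowcolor _thick;
    begingraph / border=false;
      layout overlay / walldisplay=none border=false
                       yaxisopts=(reverse=true display=none)
                       xaxisopts=(type=log label="Hazard ratio"
                                  logopts=(base=10));
        referenceline y=ref / lineattrs=(thickness=_thick color=_rowcolor);
        /* marker = point estimate, bars = CI, size follows SQUARESIZE */
        scatterplot y=obsid x=mean / xerrorlower=low xerrorupper=high
                    sizeresponse=squaresize
                    markerattrs=(symbol=squarefilled);
        /* vertical line at HR = 1 (no difference between arms) */
        referenceline x=1 / lineattrs=(pattern=shortdash);
      endlayout;
    endgraph;
  end;
run;
```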

The hazard ratio and its 95% confidence limits are shown as text in the fourth column, using the HzRatio variable.

The last section creates the P-value column and adds the footnote with an ENTRYFOOTNOTE statement. Finally, ENDLAYOUT and ENDGRAPH close the layout and the graph template.
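The two text columns and the footnote could be sketched as follows. PVALUE is an assumed name for the p-value variable and the footnote text is a placeholder; in the full template these overlays would be cells 4 and 5 of the main lattice rather than a separate 2-column lattice.

```sas
proc template;
  define statgraph TextColsSketch;           /* hypothetical name */
    dynamic _rowcolor _thick;
    begingraph / border=false;
      entryfootnote halign=left "Footnote text (placeholder)";
      layout lattice / columns=2 columnweights=(0.6 0.4);
        /* column 4: HR (95% CI) as text */
        layout overlay / walldisplay=none border=false
                         yaxisopts=(reverse=true display=none)
                         xaxisopts=(display=none);
          referenceline y=ref / lineattrs=(thickness=_thick color=_rowcolor);
          axistable y=obsid value=hzratio / display=(values);
        endlayout;
        /* column 5: p-value as text */
        layout overlay / walldisplay=none border=false
                         yaxisopts=(reverse=true display=none)
                         xaxisopts=(display=none);
          referenceline y=ref / lineattrs=(thickness=_thick color=_rowcolor);
          axistable y=obsid value=pvalue / display=(values);
        endlayout;
      endlayout;
    endgraph;
  end;
run;
```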

Step 4 : Style creation (optional step)

We modified the default HTMLBlue style to enhance the visualization and to control the fonts shown in the graph more easily.
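A possible version of such a style is sketched below. The name ListingFP is the one used later in this article and the parent HTMLBlue matches the text above; the font families and sizes are assumptions.

```sas
/* Sketch: style derived from HTMLBlue with explicit graph fonts. */
proc template;
  define style styles.ListingFP;
    parent = styles.htmlblue;
    style GraphFonts from GraphFonts /
      'GraphDataFont'  = ("<sans-serif>, <MTsans-serif>", 7pt)
      'GraphValueFont' = ("<sans-serif>, <MTsans-serif>", 7pt)
      'GraphLabelFont' = ("<sans-serif>, <MTsans-serif>", 8pt, bold);
  end;
run;
```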

Step 5 : The final graph creation

Once the input dataset and the figure template are both set, we can finally create the forest plot with PROC SGRENDER. For better visualization, we use a modified style (ListingFP) in the ODS LISTING statement.

We call the template in PROC SGRENDER and pass values for the dynamic variables that the GTL code uses for the colour and thickness of the row bands. In Figure 1, both header and body rows are kept white, but other colours can be selected.

The image format and resolution can be set with the IMAGEFMT= option (ODS GRAPHICS statement) and the IMAGE_DPI= option (ODS LISTING statement), respectively. The SGANNO= option can also be used to add annotations to the graph; as this is a broad topic, we will cover it in a separate article.
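Putting it together, a rendering call could look like this. WORK.FOREST and ForestSketch are hypothetical names standing in for the final dataset and the Step 3 template; the image size, file name, 300 dpi resolution and band thickness are illustrative values, while the white colours follow the description above.

```sas
/* Output settings: size, file name and format of the image */
ods graphics / reset width=8in height=5in
               imagename="ForestPlot" imagefmt=png;
/* LISTING destination with the modified style and 300 dpi output */
ods listing style=ListingFP image_dpi=300
            gpath="%sysfunc(pathname(work))";

proc sgrender data=work.forest template=ForestSketch;
  /* values for the dynamic variables declared in the template */
  dynamic _headercolor="white" _rowcolor="white" _thick=12;
run;
```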

The full SAS code for this project can be found here.

Conclusion

The purpose of this article was to simplify GTL so that it helps not only those who already create graphs using traditional or SG procedures, but also those who are entirely new to graph programming. Forest plots can be created with other methods as well; however, GTL gives the user much-needed flexibility.
