Report production in R

Rmarkdown is great: it mixes plain text and markdown formatting with R code to produce PDF (via LaTeX), Word or markdown documents. Unfortunately, it doesn’t quite do everything I need out of the box. There are a few things I need for a full solution for preparing complete research documents:

cross-referenced tables and figures
an easy way to go from regression models to tables
good tables

Cross referencing

The captioner package creates nice captions which can be cross-referenced in text. First, one creates an object to hold the captions. The captions are then added to the object. Finally, captions are called at the same point as the table or figure.

 library(captioner)
 fig_nums <- captioner(prefix = "Figure")
 fig_nums(name = "plot1", "A caption describing the plot")

 ## [1] "Figure  1: A caption describing the plot"

The caption can then be printed by calling the object.

fig_nums("plot1")

Figure 1: A caption describing the plot

data(mtcars)
library(ggplot2)
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point()
p

It can also be used inline, e.g.

As shown in `r fig_nums("plot1")`
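When citing a figure in running text, the full caption is usually more than is wanted. A minimal sketch of one way to get just the label, assuming the `display` argument of the function returned by captioner():

```r
library(captioner)

# Same setup as above: create the caption store and register one caption
fig_nums <- captioner(prefix = "Figure")
fig_nums(name = "plot1", caption = "A caption describing the plot")

# display = "cite" returns only the prefix and number, without the caption text
cite <- fig_nums("plot1", display = "cite")
cite
```

Inline, this would be written as `r fig_nums("plot1", display = "cite")`.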

Going from regression models to table output

The default printed output of a regression model is not particularly nice and cannot easily be used in a report.

data(infert)
head(infert)

##   education age parity induced case spontaneous stratum pooled.stratum
## 1    0-5yrs  26      6       1    1           2       1              3
## 2    0-5yrs  42      1       1    1           0       2              1
## 3    0-5yrs  39      6       2    1           0       3              4
## 4    0-5yrs  34      4       2    1           0       4              2
## 5   6-11yrs  35      3       1    1           1       5             32
## 6   6-11yrs  36      4       2    1           1       6             36

m1 <- glm(case ~ spontaneous + stratum + parity, data = infert, family = "binomial")
m1

## 
## Call:  glm(formula = case ~ spontaneous + stratum + parity, family = "binomial", 
##     data = infert)
## 
## Coefficients:
## (Intercept)  spontaneous      stratum       parity  
##   -0.626993     1.255312    -0.006535    -0.288837  
## 
## Degrees of Freedom: 247 Total (i.e. Null);  244 Residual
## Null Deviance:	    316.2 
## Residual Deviance: 279 	AIC: 287
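A report-ready table can be pulled out of such a model by hand with base R alone. The following is only a sketch of what that manual route might look like, not necessarily the approach used before; it assumes Wald confidence intervals (via confint.default) are acceptable, so the limits can differ slightly from profile-likelihood intervals. The name or_table is purely illustrative.

```r
data(infert)
m1 <- glm(case ~ spontaneous + stratum + parity, data = infert, family = "binomial")

# Coefficient matrix with columns: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- coef(summary(m1))

# Wald 95% intervals on the log-odds scale (assumption: Wald, not profile likelihood)
ci <- confint.default(m1)

# Exponentiate estimates and limits to get odds ratios
or_table <- data.frame(
  term      = rownames(coefs),
  estimate  = exp(coefs[, "Estimate"]),
  conf.low  = exp(ci[, 1]),
  conf.high = exp(ci[, 2]),
  p.value   = coefs[, "Pr(>|z|)"],
  row.names = NULL
)
or_table
```

Every piece has to be extracted, transformed and reassembled separately, which is tedious to repeat for each model.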

I used to have a really slow way of accessing the components of the model to get to a useful output. The broom package made things much easier:

library(broom)
tidy(m1, exponentiate = TRUE, conf.int = TRUE)

##          term  estimate   std.error statistic      p.value  conf.low
## 1 (Intercept) 0.5341956 0.438123690 -1.431087 1.524052e-01 0.2234415
## 2 spontaneous 3.5089314 0.221620511  5.664239 1.476784e-08 2.3022730
## 3     stratum 0.9934862 0.006413124 -1.019019 3.081940e-01 0.9809588
## 4      parity 0.7491344 0.138150300 -2.090743 3.655108e-02 0.5656926
##   conf.high
## 1 1.2555246
## 2 5.5043957
## 3 1.0060144
## 4 0.9747095

At this point one could export to CSV, format a table in Excel and copy-paste it into Word. However, that would fail to make full use of the potential of Rmarkdown. knitr has a function called kable, which can produce simple tables. Unfortunately, they are too simple for my needs. I favour pixiedust, which does almost everything I need it to.
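For comparison, the kable route might look like the sketch below. To keep it self-contained it builds the odds ratios with base R rather than broom; the names or_df and tab are illustrative only.

```r
library(knitr)

data(infert)
m1 <- glm(case ~ spontaneous + stratum + parity, data = infert, family = "binomial")

# Odds ratios from the fitted coefficients (base R only, no broom needed here)
or_df <- data.frame(
  term     = names(coef(m1)),
  estimate = exp(unname(coef(m1)))
)

# kable() returns the lines of a markdown table, rounded to 2 decimal places
tab <- kable(or_df, format = "markdown", digits = 2,
             col.names = c("Term", "Odds ratio"))
tab
```

This is fine for a quick table, but borders and per-cell formatting such as p-value strings need something more, which is where pixiedust comes in.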

library(pixiedust)

## Additional documentation is being constructed at http://nutterb.github.io/pixiedust/index.html

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

m1 %>% tidy(exponentiate = TRUE, conf.int = TRUE) %>%
  select(term, estimate, conf.low, conf.high, p.value) %>%
  dust() %>%
  sprinkle(cols = c("term", "estimate", "conf.low", "conf.high"), 
           round = 2) %>%
  sprinkle(rows = 1, border = c("top")) %>%
  sprinkle(rows = 4, border = c("bottom")) %>%
  sprinkle(cols = "p.value", fn = quote(pvalString(value))) %>% 
  sprinkle_colnames(term = "Term", p.value = "P-value") %>%
  sprinkle_print_method("markdown")

Term	estimate	conf.low	conf.high	P-value
(Intercept)	0.53	0.22	1.26	0.15
spontaneous	3.51	2.3	5.5	< 0.001
stratum	0.99	0.98	1.01	0.31
parity	0.75	0.57	0.97	0.037

The vignette has further details. In addition, pixiedust has a nice method for inline estimates:

`r dust_inline(m1, term = "spontaneous", label = "OR", fun = exp)`

Which renders: Spontaneous abortions were associated with increasing odds of infertility (OR = 3.51; 95% CI: 2.3 - 5.5; P < 0.001)

Publication quality graphs

Graphs out of ggplot2 are almost perfect, but there are a few places in which they fall short:

Antialiasing - this depends on the graphics device rather than on ggplot2 itself, and with the default device some lines can appear jagged
Arranging multiple graphs in a single figure

Cowplot provides a nice way of arranging figures in grids and also provides an attractive, professional-looking theme.

library(cowplot)

## 
## Attaching package: 'cowplot'

## The following object is masked from 'package:ggplot2':
## 
##     ggsave

p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point()
q <- ggplot(data = mtcars, aes(x = wt, y = hp)) + geom_point()
z <- plot_grid(p,q, labels = c("A", "B"))
z

Cairo adds the ability to export with antialiasing. The first plot below uses the standard Windows device for comparison; the second uses Cairo.

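library(knitr)
# Baseline for comparison: the standard Windows png device
# (type = "windows" is only available on Windows)
opts_chunk$set(dev="png", 
               dev.args=list(type="windows"),
               dpi=300)
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_line()
p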

library(Cairo)
# Switch the png device to Cairo so that lines are antialiased
opts_chunk$set(dev="png", 
               dev.args=list(type="cairo"),
               dpi=300)
p