Literate programming using R and knitr

2012-12-17

Introduction

Motivation

Literate programming

Sweave: R and LaTeX

Sweave properties

A new hope

Markdown + knitr + pandoc

Pure markdown

% A simple markdown example
% by Yours Truly
% In the year 2929

Markdown is a simple text formatting standard that aims to be human 
readable. It is inspired by old-school email formatting.

It supports _italic_ and __bold__ text formatting. It supports simple 
lists as in

* important item,
* other important item

which can of course be ordered and/or nested. Images can easily be 
integrated, as in

![Picture from Wikimedia Commons](figure/Corcovado.jpg)

Pandoc conversion

pandoc -s --mathjax PureMarkdown.md -o PureMarkdown.html

pandoc PureMarkdown.md -o PureMarkdown.pdf

pandoc PureMarkdown.md -o PureMarkdown.docx

pandoc PureMarkdown.md -o PureMarkdown.rtf

pandoc PureMarkdown.md -o PureMarkdown.odt

pandoc PureMarkdown.md -o PureMarkdown.epub
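The six commands above differ only in the output extension, so they can be generated in a loop. The following is a minimal R sketch, assuming pandoc is on the PATH; the helper name pandoc_args() is hypothetical. Note that PDF output goes through LaTeX, so it also needs pdflatex.

```r
# Hypothetical helper: build the pandoc argument vector for one output format.
# Only HTML gets the standalone (-s) and --mathjax flags, as in the commands above.
pandoc_args <- function(input, ext) {
  output <- sub("\\.md$", paste0(".", ext), input)
  flags <- if (ext == "html") c("-s", "--mathjax") else character(0)
  c(flags, input, "-o", output)
}

formats <- c("html", "pdf", "docx", "rtf", "odt", "epub")

# Run the conversions only if pandoc is found and the input file exists.
if (nzchar(Sys.which("pandoc")) && file.exists("PureMarkdown.md")) {
  for (ext in formats) {
    status <- system2("pandoc", pandoc_args("PureMarkdown.md", ext))
    if (status != 0) warning("pandoc failed for format: ", ext)
  }
}
```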

R + markdown

An .Rmd file is essentially a markdown file containing R code chunks, 
delimited by lines of triple backticks (grave accents), with the chunk 
options given in braces on the opening line, like so:
```{r}
require(MASS)
data(anorexia)
summary(anorexia)
```
Here, the `r` in braces marks the chunk as R code: when weaving, the 
code is evaluated by R and its output is inserted into the document 
(by default, right after the echoed code).
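To make the chunk structure concrete, here is a toy "tangler" that collects the code lines of every R chunk. It is a hypothetical illustration of the syntax only, not how knitr parses documents; knitr provides its own purl() function for tangling.

```r
# Hypothetical toy tangler: collect the code lines of every ```{r ...} chunk.
extract_chunks <- function(lines) {
  code <- character(0)
  inside <- FALSE
  for (line in lines) {
    if (!inside && grepl("^```\\{r.*\\}[[:space:]]*$", line)) {
      inside <- TRUE            # opening fence with chunk header
    } else if (inside && grepl("^```[[:space:]]*$", line)) {
      inside <- FALSE           # closing fence
    } else if (inside) {
      code <- c(code, line)     # a line of R code inside the chunk
    }
  }
  code
}

rmd <- c("Some prose.",
         "```{r}",
         "require(MASS)",
         "data(anorexia)",
         "summary(anorexia)",
         "```",
         "More prose.")
extract_chunks(rmd)
```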

Usually, chunks are given names, as well as extra options that control 
how they are evaluated and displayed during weaving, like so:
```{r myPlot, echo=FALSE, dpi=300, fig.cap="Weight gain by treatment"}
boxplot(Postwt-Prewt~Treat, anorexia)
```
Here, the chunk is called `myPlot`, the R code generating the output 
is not shown (`echo=FALSE`), the figure resolution is set to 300 dpi, 
and the figure gets a caption.
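Chunk options are written as R expressions, so R's own parser can split such a header into a label and named options. The sketch below is a hypothetical helper, parse_chunk_header(), not knitr's internal parser; it uses alist() so that the bare label myPlot is kept as a symbol instead of being evaluated.

```r
# Hypothetical parser for a chunk header such as
#   {r myPlot, echo=FALSE, dpi=300, fig.cap="Weight gain by treatment"}
parse_chunk_header <- function(header) {
  # Strip the leading "{r" (plus spaces or a comma) and the trailing "}".
  inner <- sub("^\\{r[[:space:],]*", "", header)
  inner <- sub("\\}[[:space:]]*$", "", inner)
  # alist() parses the pieces without evaluating them.
  args <- eval(parse(text = paste0("alist(", inner, ")")))
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  unnamed <- nms == ""
  # The first unnamed entry, if any, is the chunk label.
  label <- if (any(unnamed)) as.character(args[[which(unnamed)[1]]]) else NA_character_
  list(label = label, options = args[!unnamed])
}

hdr <- '{r myPlot, echo=FALSE, dpi=300, fig.cap="Weight gain by treatment"}'
str(parse_chunk_header(hdr))
```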

Weaving using knitr

At the R command line:

> require(knitr)
> knit("RMarkdown.Rmd")

weaves the file RMarkdown.Rmd into the pure markdown file RMarkdown.md, which can be converted as before:

pandoc -s --mathjax RMarkdown.md -o RMarkdown.html

pandoc RMarkdown.md -o RMarkdown.pdf

pandoc RMarkdown.md -o RMarkdown.docx
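To see what weaving actually produces, the following minimal sketch, assuming knitr is installed, writes a tiny .Rmd file (the name Tiny.Rmd is made up for illustration) to a temporary directory, knits it and prints the resulting markdown. The chunk output appears as lines prefixed with `##`.

```r
# Minimal sketch, assuming knitr is installed: weave a tiny .Rmd in a
# temporary directory and look at the markdown that knit() writes.
if (requireNamespace("knitr", quietly = TRUE)) {
  dir <- tempfile("knitdemo")
  dir.create(dir)
  rmd <- file.path(dir, "Tiny.Rmd")
  writeLines(c("Some prose.",
               "",
               "```{r}",
               "1 + 1",
               "```"), rmd)
  # knit() returns the path of the output file it wrote.
  md <- knitr::knit(rmd, output = file.path(dir, "Tiny.md"), quiet = TRUE)
  md_lines <- readLines(md)
  cat(md_lines, sep = "\n")
}
```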

Summary: Markdown + knitr + pandoc

  1. Editor: write markdown plus code chunks

  2. R: weave using the function knit() from package knitr

  3. At the command line: convert via pandoc

Good to know

LP and applied statistics

Who are you writing for?

Application: Fully automated report

Application: Reproducible research

Application: Documented pipeline

Application: Annotated code

Application: Draft report(s)

Endgame

Summary

  1. A new way to implement LP for statistics

    • Fairly intuitive, readable markup
    • State-of-the-art weaving/tangling
    • Lightweight formatting
    • Write once, convert to multiple formats
  2. LP for statistics is harder than for CS

  3. Challenge: re-think workflows to use LP efficiently

Requirements & Installation

  1. Recent version of R

  2. Packages knitr and ascii (from CRAN)

  3. Pandoc

  4. Recommended: pdflatex

  5. A good editor or IDE - a very personal choice (Geany, WinEdt, RStudio, Eclipse, vim…)
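The package and tool requirements can be checked from within R. This is a minimal sketch with a hypothetical helper, check_requirements(); it only reports what is found (packages via requireNamespace(), command-line tools via Sys.which()) and does not check the R version.

```r
# Hypothetical helper: report which of the listed requirements are available.
check_requirements <- function(pkgs = c("knitr", "ascii"),
                               tools = c("pandoc", "pdflatex")) {
  # TRUE if the package namespace can be loaded.
  have_pkg <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  # TRUE if the executable is found on the PATH.
  have_tool <- nzchar(Sys.which(tools))
  names(have_tool) <- tools
  c(have_pkg, have_tool)
}

status <- check_requirements()
print(status)
# Missing CRAN packages could then be installed with, e.g.:
# install.packages(c("knitr", "ascii"))
```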

About this document

This presentation was written as a pure markdown file and converted to HTML via

pandoc -s -S -t slidy --mathjax LitProg_20121217.md \
       -o LitProg_20121217.html

The current versions can be found at

http://www.meb.ki.se/~aleplo/LP2012/LitProg_20121217.html

and

http://www.meb.ki.se/~aleplo/LP2012/LitProg_20121217.md