U.S. Geological Survey, Utah Cooperative Fish and Wildlife Research Unit, and
Department of Wildland Resources, Utah State University, Logan, UT 84322-5230 USA
This vignette shows how to build RMarkdown documents and output in a variety of different formats. In fact, this is a RMarkdown document built within RStudio.
Markdown is a form of document markup language. A document markup language is a way to distinguish basic text from other document elements, such as those that occur during an editorial review. For example, by clicking the “REVIEW” tab on a MS Word document you initiate a “markup” process where a reviewer’s comments and editorial suggestions are shown on the document.
Markdown still involves embedding additional elements into a document, but has a slightly different purpose. It evolved as a means to embed formatting syntax into a document and have it rapidly and easily converted to a desired format. You “markdown” text with document formatting such as boldface, headers, lists, mathematical symbols and equations, pictures, and links to other files or webpages. In short, just about anything you see on a webpage, journal article, or book can be constructed using Markdown. Output formats are fairly flexible, including .html for browsers and many different text formats.
In the case of R, Markdown, as an example of a document markup language, provides a means to embed - and actually run - R code from within a single document, all using simple syntax. This is in addition to the text formatting mentioned above.
Cool! Right?
Think about this for a moment in your role as a research scientist: by using Markdown you can fully replicate the analytical processes that underlie your research, showing the code you employ and the output it generates, be it numeric or graphical, plus text explanations of those processes. It can be, for all intents and purposes, your research metadata.
Although not a necessary requirement, it is easiest to configure and generate Markdown output from within RStudio. However, command–line code from Plain Vanilla R, which is still used by Curmudgeonly Dinosaurs like me, can also be used to generate Markdown documents. From this point forward I’ll call documents integrating R coding RMarkdown documents so as to distinguish them from other types of Markdown documents.
The most common outputs are .html, MS Word, .pdf, or LaTeX. RMarkdown is quite powerful, and there exists an ever-expanding list of customized Markdown templates including, for example, a template for all journals published by Elsevier. You can also build a book, a webpage, or just about any form of document you can imagine.
But you should probably start simple. And that’s what this vignette is all about.
Hopefully some of the advantages as outlined above answer this question.
But another, pragmatic reason relates to classes, especially mine. Historically, I’ve accepted just about any format for answers to exercises, quizzes, and tests in courses I teach. MS PowerPoint or Word have been the most common. However, I now require RMarkdown as the means for evaluating coursework.
This is for two reasons. A first reason is that as you begin your R programming adventures, you’ll quickly learn that no “right” answer for implementing R code to meet an analysis objective exists. You, as a student and researcher, will find a personal solution. Other students and colleagues, one of them being perhaps more clever, might find a more efficient approach to a R coding issue. Because one of the easiest ways for you to learn R is to see examples, RMarkdown documents provide a great way for me to post - and you to see - “clever” answers to R coding problems.
The second reason is one of convenience for me as the instructor, and you as the student. I get a consistent, easy-to-read and evaluate document that includes your R code, the results of that code, as well as the rationale for your R analytical pathway, and your interpretation of your results. You, the student, get a fully documented process by which you performed an analysis for future reference, a metaphorical trail of breadcrumbs by which you can answer the time-honored question all analysts eventually face:
“How the **** did I actually do that analysis??!!??”
We’re going to use features found in RStudio as the foundation for instruction in building RMarkdown documents. As we shall see, it has the advantage of some simple drop-downs for building, compiling, and editing the “code” used to build a RMarkdown document.
Note, however, that all the RMarkdown code as shown below will also work perfectly well in Plain Vanilla R. So learn the RMarkdown coding process and worry later about how to compile the document.
The steps below will help you build a simple RMarkdown document. There are many, many more options for creating RMarkdown documents, but the best approach to learning RMarkdown is to first learn how to build simple documents. You can then experiment with, and learn to apply, more sophisticated aspects of RMarkdown to your documents.
To build a Markdown document from within RStudio ….
Note again that although we’re first learning how to build a RMarkdown document from within the RStudio editor, all the text and associated RMarkdown formatting could be entered into a basic text editor (e.g., Notepad++ for Windows 7/8/10, TextMate for macOS), or even as a MS Word document. [NOTE: However, be careful using the latter; MS Word carries special characters across depending on font (e.g., “” (curly quotes) in Times New Roman) that may not compile properly.]
Open RStudio. You will first need to install the R package rmarkdown and its dependencies. Note that your version of R must be \(\geqq\) 3.0. You should install the most current version of RStudio as well, as it has recent features related to RMarkdown.
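The installation step can be sketched from the R console as follows. This is a minimal sketch, not the only way to install the package: the variable name r_ok is made up for illustration, and the interactive() guard is an assumption added so the snippet does not try to contact CRAN when run inside a non-interactive script.

```r
# Check the running R version against the minimum noted above (3.0)
r_ok <- getRversion() >= "3.0.0"
print(r_ok)

# Install rmarkdown (and its dependencies, such as knitr) only if it is
# missing, and only in an interactive session
if (!requireNamespace("rmarkdown", quietly = TRUE) && interactive()) {
  install.packages("rmarkdown")
}
```

Within RStudio the same installation can also be done from the Packages pane.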
Create a new RMarkdown document
From the upper left of the RStudio window
This will open a GUI dialog box like that below asking which of the three basic RMarkdown documents types (i.e., HTML, PDF, Word) you’d like to build. It also shows dialog boxes for Title and Author. For now, change the Title to Homework 1 and Author to Your Name. Next, make sure the Document button on the upper left side is highlighted (blue color) and click on document type HTML. This will open a default RStudio RMarkdown document destined to be compiled in .html format. The date of the document defaults to the date when the document is first opened.
For now ignore the text in the default document. Merely highlight and delete the parts shown in blue, much as you would in a Word document.
Once the deletion is done,
and name the file something like hmwk1.rmd, where the .rmd file extension identifies the file as a R markdown document.
Close RStudio. You’ve now created and saved a RMarkdown template.
Open an existing RMarkdown document
Pretty simple:
and navigate to the directory and open the file. Use the hmwk1.rmd we saved above. It will appear as:
You now have the backbone template of a RMarkdown document.
This section explains what the YAML is, and shows how to add text and basic formatting code (syntax) to that text.
All RMarkdown documents begin with a YAML header, which originally meant “Yet Another Markup Language,” but has since morphed to mean “YAML Ain’t Markup Language.” The YAML header controls the format of the RMarkdown document. A basic YAML for a .html document generated from within RStudio looks like the one found in our hmwk1.rmd file:
#### What a basic YAML looks like
---
title: "Homework 1"
author: "Student R. Me"
date: "August 15, 2018"
output: html_document
---
Of the four YAML elements shown above, only the output: is truly needed. This YAML requests output be a html_document. Some possible, but by no means all, RMarkdown output types include:
Other document control features, such as a .css file that controls .html output, or a MS Word document template to control Word output, can be added to the YAML. More on that a bit later.
#### Nothing special about adding text
This is text. Just type away to your heart's content ....
Use two hard returns to separate paragraphs (blocks) of text.
Click here to see what the output looks like. We’ll learn more about how to configure text for bold, italics, and other text formatting features next.
The most common use of horizontal separators is to distinguish among document sections. You can add one, two, or any number of line separators. Note that a hard return and a blank line are required after the first separator to generate two lines!
#### Coding for line separators
The sequence of --- (three hyphens) adds a line separator to the document.
---
The sequence of two sets of --- (three hyphens) separated by a blank line
---
---
adds two lines of separators.
Headers can be up to 6 levels, with each level preceded by the number of # (pound or hashtag) symbols representing the header level.
#### Adding different levels of headers
# Header1
## Header2
### Header3
#### Header4
##### Header5
###### Header6
Examine the output, showing both headers and line separators. Note that by the time you’ve reached Header6 the font is quite small relative to the text. You can create headers with more than 6 # symbols; however, at some point headers \(\gt\) 6 revert to the base pitch.
Words and phrases can be emphasized using Bold, Italics, and Combinations of each. There’s even a Blockquote option which allows for (pithy?) phrases or paragraphs to be highlighted.
#### Boldface and Italics
Encapsulate a word with ** (two asterisks) or __ (two underscores)
to **Bold** or __Bold__ text.
For italics, use a * (single asterisk) or _ (single underscore)
for *emphasis* or _emphasis_.
Use a sequence of *** (three asterisks) or ___ (three underscores)
to create ***bolded, italicized words***.
There should be no spaces between the markers and the first and last
characters of the word or phrase to be bolded or italicized.
#### The Blockquote
Use the > (greater than) symbol to create a
> Blockquote of (typically pithy) text.
Our RMarkdown document so far includes basic text, line separators and headers to distinguish among document sections, and boldface and italics. There’s even formatting to add blockquotes that highlight text sections.
Let’s next add lists.
Lists use the * (asterisk), + (plus), or - (hyphen) for an unordered list; simply add a hard return and another *, +, or - to create the second item in the list. For indenting, use either a tab or at least four spaces.
A numbered list begins with the desired number, say 1, and followed with successive numbers.
Note that RMarkdown has some limitations for classical numeric listing of 1, then 1.1, then 1.1.1, etc. These are better controlled using a .css template from within the YAML.
#### Unordered Lists
* Build an unordered Markdown list using the * (asterisk)
* Add a second list element
* at least 4 spaces (or a tab) before the * results in indenting
#### Ordered Lists
1. Build an ordered Markdown list using numbers
2. Add a second list element
1. at least 4 spaces (or a tab) before an ordered sub-list
So far we’ve seen how to code a basic RMarkdown document. However, we’ve not yet seen how to actually build the desired document type from the code. So how is it done?
The process of converting the RMarkdown text to the preferred document type (e.g., HTML, PDF, Word) involves several steps. The process first involves the function render from within package rmarkdown, which feeds the .rmd file (what we’re now learning how to build) to package knitr. knitr executes the R code and creates a new markdown output file (.md), which is then converted to the desired document type by Pandoc, a stand-alone document converter that is bundled with RStudio.
Whew! Probably too much detail, right?
Luckily, these steps have been condensed into a single dropdown command within RStudio. The process is called Knit.
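Outside RStudio, the same render step can be run from the R console. The sketch below assumes the rmarkdown package and Pandoc are both available. It writes a throwaway hmwk1.rmd into a temporary directory, so the file contents and the names rmd_file, can_render, and html_file are illustrative only.

```r
# Write a minimal RMarkdown file to a temporary location (illustrative content)
rmd_file <- file.path(tempdir(), "hmwk1.rmd")
writeLines(c(
  "---",
  "title: \"Homework 1\"",
  "output: html_document",
  "---",
  "",
  "This is **bold** text."
), rmd_file)

# Render only if rmarkdown and Pandoc can be found on this machine
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # render() runs knitr on the .rmd and then hands the .md to Pandoc;
  # it returns the path of the finished document
  html_file <- rmarkdown::render(rmd_file, output_format = "html_document",
                                 quiet = TRUE)
  print(html_file)
}
```

Changing output_format to "word_document" or "pdf_document" requests the other basic document types; PDF output additionally needs a LaTeX installation.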
From within RStudio open the hmwk1.rmd template saved at the beginning. For now it should only show the YAML. Add some headers and text, some boldface and italics, a list, and even a blockquote if desired. Something similar to that shown below.
Once that is done, click the \(\blacktriangledown\) (caret) by Knit and select the type of document desired. For now select HTML.
The R Markdown panel (lower left of your RStudio panels) will scroll through output, and a new window will pop up with the knitted document. Assuming no errors (we’ll deal with those in a while …) you now have a completed .html document that can be opened in any browser. The .html document will also be saved in your default working directory.
You’re done. You’ve built your first RMarkdown document in .html format.
So far we’ve built a basic RMarkdown document and exported it as a .html format. Next, let’s consider some more creative aspects of RMarkdown documents, including:
Linking websites and existing files involves two coding symbologies: the [] (two square brackets) followed immediately by () (left and right parentheses), as in [](). Do not place a blank space between the [] and () symbol sets. Content inside the [] becomes the highlighted text for the link, while text inside the () is the actual http:// or https:// website location.
#### Coding a link to a website
[LinkText](WebAddress)
You can make links explicit, as in "Click [here](https://www.rstudio.com/) to access RStudio."
[here] becomes the highlighted link to click.
The actual link, enclosed by the (), is not visible in the RMarkdown document.
Or, you can highlight entire phrases, such as
"The [R-project homepage](https://www.r-project.org/) is the starting location to learn about R."
In this example, the [R-project homepage] phrase is the highlighted link to click.
Files can be a bit trickier, mostly because of file pathnames. For now, we’ll assume everything is nested under your OS home directory.
#### Coding to link to a file
This is how to link a file, such as a file called [MyPDF](markdown_basics05.pdf).
Let’s see what links look like in a finalized Markdown document displayed in a browser.
Coding for pictures and video is not too different from that for files, subject only to some file-type constraints. Although work-arounds do exist, pictures embedded in RMarkdown should be bitmap- rather than vector-based. Common bitmap image formats include .png, .jpg, .gif, .tif, and .bmp. In general, .png files are considered excellent for web-based output (small file size, no loss in quality), although .jpg can do better for photographs, especially those rich in color. .tif formats are equally good for rich colors, but suffer from large file size.
You can convert among the many different image formats using an image editor. Some great freeware image editors include IrfanView, InkScape, ImageMagick, or GIMP.
Embedding a picture
Code to embed a picture can be of one of two types, a RMarkdown-based linkage or .html. In the RMarkdown version, as shown below, the ! indicates an image, the [] can be used for a caption or reference for the picture, and the () holds the picture filename.
The .html version merely uses the .html <img /> code. If you wish to control the number of pixels presented (i.e., make the picture bigger or smaller) then use .html coding. Alternatively, use one of the image editors noted above to create the desired picture size in pixels, and then use the RMarkdown linkage.
Two cautionary notes:
The RMarkdown version:– Coding for insertion of a picture is below. Note that although a picture caption can be added, it is not required.
#### RMarkdown coding with a picture caption
![PictureCaption](PictureFilename.Type)
![Source: http://mikemclin.net/markdown-syntax-language/](markdown_image1.png)
#### RMarkdown coding without a picture caption
![](PictureFilename.Type)
The .html version:– Results using HTML coding will be the same as Markdown coding. Note that any caption - here, the source of the picture - must be added as a line of text below the picture. You can control the size of the picture using the height= and width= options in the <img /> tag. Options for the HTML <img /> tag can be found here.
#### HTML coding with default size (pixels) of picture
<img src="PictureFilename.Type" />
<img src="markdown_image1.png" />
Source: http://mikemclin.net/markdown-syntax-language/
#### HTML coding with specified width and height in pixels
<img src="markdown_image1.png" height="67px" width="100px" />
Source: http://mikemclin.net/markdown-syntax-language/
Check out the updated homework file with embedded pictures!
Embedding video
Video within Markdown is restricted as of now to YouTube only. This will undoubtedly change over time (for example see: package vembedr). You will need to establish a YouTube account, and upload video into your personal My Channel. After uploading to My Channel you will receive the video’s unique identifier.
Construction of data and text tables in RMarkdown can be frustrating and irritating, as well as visually unappealing when output, principally because RMarkdown does not actually support tables per se. All forms of table construction allow for inclusion of a table description. By default they extend across the entire webpage width when in .html format.
Several variations of types of tables exist. The examples here are for simple tables only; additional methods for constructing tables based on R packages will be illustrated below when we begin to embed code into RMarkdown documents. Note also that the table output from even the simple code shown below will change if different .css output control files are attached to the RMarkdown document.
We will explore the different options for table construction below, and examine output from each at the end of this section.
The pipe table
The simplest table code to implement is the so-called pipe table. The term pipe refers to the | (vertical bar) symbol, which is used to separate cells and the information each cell contains. A pipe table requires column headers, either as text or blank; separators based on -- (hyphens) between the header and the cells; and, as noted above, the | (vertical bar) pipe symbol. Default is left alignment; the : (colon) is used to change alignment as shown below. A hard return, followed by a : (colon) symbol and text, creates a table description. The description defaults to above the table information when the table is output.
#### Coding a simple pipe table.
| Default | Left | Right | Center |
|---------|:-----|------:|:------:|
| 12 | 34 | 56 | 78 |
| 901 | 234 | 567 | 890 |
| 1 | 2 | 3 | 4 |
: Table 1. Demonstration of pipe table syntax.
There’s a reasonably friendly website for building pipe tables, too. It handles .csv files as well.
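Pipe-table syntax can also be generated from R rather than typed by hand. The sketch below assumes the knitr package is installed, in a version recent enough to accept format = "pipe". The object names tbl_data and pipe_tbl are made up for illustration, with values copied from Table 1.

```r
# Values copied from the pipe table example (Table 1)
tbl_data <- data.frame(
  Default = c(12, 901, 1),
  Left    = c(34, 234, 2),
  Right   = c(56, 567, 3),
  Center  = c(78, 890, 4)
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # align takes one letter per column: "l" left, "r" right, "c" center
  pipe_tbl <- knitr::kable(tbl_data, format = "pipe",
                           align = c("l", "l", "r", "c"))
  print(pipe_tbl)  # prints the | and : syntax, ready to paste into a document
}
```

The second printed line is the demarcation row, where the position of each : (colon) encodes the alignment requested through align.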
You can “force” pipe tables to wrap text by altering the relative widths of the first (header), second (demarcation), and third through n (table information) rows. In the example below the width of text plus space in the header (1st) row in Column 1 of the table is identical to the width of the demarcation (2nd) row. In Column 2, the width of the 2nd row is greater than the header width and the text in the table information row is greater as well. This forces a wrap-around.
#### Coding a pipe table with wrap-around.
| Column 1 | Column 2 | Column 3 | Column 4 |
|:---------|:-----------|:-----------|:----------|
| Some | This is an exceedingly verbose sentence that should wrap around | It wraps around | The |
| column 1 | in | because the pipe length in the 2nd row | last column |
| text | Column 2 | is greater than the pipe length in the 1st row. | is 4 |
The line table
An alternative is the line table. Here, the top row (blue color in RStudio) in the table consists of multiple -’s (hyphens) in a row. The second, header row consists of the column names. The third row (red color in RStudio), also a sequence of -’s, sets the widths of the columns, and determines column alignment based on the location of the column names relative to the width of the -’s. Default alignment is left. To control left alignment, the column header aligns on the left of the -’s; right aligns with the right side of the - sequence. Centering occurs when there is an extra - on either side of the column name.
Unlike a pipe table, absolute width in a line table can be controlled by the combined length of the third row of -’s. In the first example, the table is a “minimal width” table where just enough third row -’s are added to control column alignment. The second has a longer sequence of -’s in column 4. This will expand the column width.
#### Line table of minimal width
-----------------------------
Default Left   Right  Center
------- ----- ------ --------
12      34        56    78
901     234      567   890
1       2          3     4
----------------------------
#### Line table of expanded width in column 4
-----------------------------
Default Left   Right        Center
------- ----- ------ ---------------------
12      34        56          78
901     234      567          890
1       2          3           4
----------------------------
Compare the different table configurations arising from the pipe and line table coding. In particular, note the wrap-around text in the pipe table, and the expansion of column 4 width in the line table. Realistically, I’ve never been able to get a RMarkdown table “correct” in the first, let alone the first several tries. It often requires playing around with different width configurations until you obtain output that is visually appealing.
One of the more powerful aspects of RMarkdown is the ability to use mathematical symbols and build (even complex) mathematical equations, and have those portrayed in the final RMarkdown document. Mathematical symbols and equations in RMarkdown use TeX math, which, although it relies on LaTeX syntax for the symbol or equation, is a bit different in implementation.
The difference is that TeX math uses the $ (dollar sign) symbol to start and stop mathematical symbols used in isolation or as part of an equation, while LaTeX uses the \ (backslash) only. The implementation difference arises because the \ (backslash) symbol is an escape character in RMarkdown. Within RMarkdown, the first $ must be followed by a character (not a blank space), while the last $ requires a character on its left. The closing $ cannot be followed by a digit.
Consider the following examples:
They represent in-line embedding of symbols and equations.
#### LaTeX coding for the in-line examples above
* Sample $\bar{x}$ temperature ($\pm$ 2-SD) was 11.8 (2.3) $^\circ$C
* The Pythagorean theorem is $x^2 + y^2 = z^2$
* The simple linear regression equation is: $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
More complex equations, designed to be represented as stand-alone lines in a document, can also be constructed, as with the sample variance example shown next.
The equation for sample variance is: \[s^2 = \frac{\sum\limits_{i=1}^{n}(x_i - \bar{x})^2} {n - 1}\]
In this example, the equation is centered rather than left-aligned. This was accomplished using two $$ (dollar sign) symbols at the beginning and end of the equation rather than one $.
#### LaTeX coding for a centered, stand-alone equation
The equation for sample variance is:
$$s^2 = \frac{\sum\limits_{i=1}^{n}(x_i - \bar{x})^2} {n - 1}$$
The code to embed mathematical symbols and build equations relies on LaTeX syntax. One of the better websites for LaTeX coding syntax is the AoPS (Art of Problem Solving) Wiki, while a convenient two-page “cheatsheet” built by Winston Chang and licensed by Creative Commons can be accessed from here.
In the first part we learned how to add text to a RMarkdown document, including how to:
Once the text was written, we saved and compiled (i.e., “knitted”) the RMarkdown document in a .html format.
Now we’ll see how to embed R code into the document, and control the execution and output of the code depending on the desired final RMarkdown document. Specifically, we will:
These are the final elements needed to build a RMarkdown document of your own.
Chunk is the term that refers to R code to be rendered in a RMarkdown document. A chunk can include one or more lines of R code implemented as a block of code, or a chunk can be executed inline, directly from within the text of a sentence. Chunks can be embedded by typing directly into the document, or by using the drop-down chunk icon on the toolbar.
The basic block chunk begins with ```{r} (three backticks ``` then {r} with no spacing), follows with a hard return for each line of code in the block, and ends with ```.
An inline chunk of code begins with `r, follows with a single space then executable R code Code2Execute, and ends with `.
#### The basic R Markdown chunk as a block of code
```{r}
```
#### The basic R Markdown chunk as inline, executed R code
`r Code2Execute`
The default execution of a block chunk returns the R code plus results of the code, while an inline chunk returns a value and embeds it directly into a sentence.
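The default behavior of the two chunk types can be checked without saving a file. The sketch below assumes the knitr package is installed and knits a tiny in-memory document with knitr::knit(text = ...). The document text and the names src and out are made up for illustration.

```r
# A tiny RMarkdown source: one block chunk and one inline chunk
src <- c(
  "```{r}",
  "1 + 1",
  "```",
  "",
  "The sum is `r 1 + 1`."
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() returns the knitted markdown as text when given text = ...
  out <- knitr::knit(text = src, quiet = TRUE)
  cat(out)
}
```

In the knitted markdown the block chunk shows both the code and its result (prefixed with ##), while the inline chunk is replaced by its value inside the sentence.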
Setting local chunk options
Common options to modify output to the RMarkdown document include:
These options can be set locally within a chunk, or set globally for all chunks. Within a chunk, options appear inline within the {r} syntax. Multiple options can be set, with each separated by a , (comma). Although RStudio shows some of the available chunk options, a more comprehensive discussion of possible chunk options is found on the knitr homepage.
#### Naming a chunk
```{r ChunkName}
```
#### Return the results but not the code
```{r echo=F}
```
#### Return the results, and only lines 4:7 of the code
```{r echo=4:7}
```
#### Return the code but do not evaluate
```{r eval=F}
```
#### Naming a chunk, and suppressing return of code
```{r ChunkName, echo=F}
```
Setting global chunk options
Global options set chunk options throughout the entire document, such as the suppression of R messages and warnings. Global options should be inserted after the YAML, and before the document text begins.
#### Setting global options; here, suppress R warnings and messages
```{r global-options}
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
```
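The effect of a global option can be seen by knitting a small in-memory document. This is a sketch that assumes the knitr package is installed. The document text, the chunk label, and the names src and out are illustrative, and only message = FALSE is set to keep the example short.

```r
# First chunk sets a global option; second chunk emits a message and a result
src <- c(
  "```{r global-options, include=FALSE}",
  "knitr::opts_chunk$set(message = FALSE)",
  "```",
  "",
  "```{r}",
  "message(\"loading data ...\")",
  "1 + 2",
  "```"
)

if (requireNamespace("knitr", quietly = TRUE)) {
  out <- knitr::knit(text = src, quiet = TRUE)
  cat(out)  # the result of 1 + 2 appears; the message text does not
}
```

The include=FALSE option on the first chunk is a design choice: the chunk still runs, but neither its code nor its output is written to the document.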
Let’s execute R code embedded in a RMarkdown document. For an example we will use a small cross-classified data structure examining fish selection by males and females of the piscivorous osprey Pandion haliaetus. Three types of fish - bass, shad, and sunfish - were available. A simple contingency analysis will be applied to the data.
This first chunk example imports a .csv file and examines the resulting dataframe structure. Note that output lines are preceded with ## (two pound symbols); this helps distinguish code from output. All chunks in this example accept the default and return both the R code being evaluated and the output. This is so you can understand the chunk rendering process in RMarkdown.
# import data and examine dataframe characteristics
f1 <- read.csv("data/ospreypreybysex.csv", header = T)
str(f1) # examine dataframe structure
## 'data.frame': 6 obs. of 3 variables:
## $ ospreysex: Factor w/ 2 levels "Female","Male": 2 1 2 1 2 1
## $ fishspp : Factor w/ 3 levels "Bass","Shad",..: 3 3 1 1 2 2
## $ count : int 59 72 14 21 189 138
Next, examine the cross-classified table of fish species captured by each sex.
# cross-tabulation of captures by fish spp and osprey sex
f2 <- xtabs(count ~ ospreysex + fishspp, data = f1) # cross-tab the data
f2 # cross-tab output
## fishspp
## ospreysex Bass Shad Sunfish
## Female 21 138 72
## Male 14 189 59
Here, we write the \(\chi^2\) output to the object f3 and see what output parameters are available from f3. We’ll explain why we did this in Example #2.
# chi-sq test of cross-tab data
f3 <- chisq.test(f2) # chisq test
f3 # chi-sq output
##
## Pearson's Chi-squared test
##
## data: f2
## X-squared = 8.7294, df = 2, p-value = 0.01272
ls(f3) # available output parameters from the chi-sq analysis
## [1] "data.name" "expected" "method" "observed" "p.value" "parameter"
## [7] "residuals" "statistic" "stdres"
If this were a classroom exercise, I - an instructor - would most likely wish to see all the code and output as shown in Example #1. That way I could evaluate your work and grade the exercise.
However, consider other circumstances where you wish to render the various chunk outputs into a sentence (or paragraph) of plain Ol’ King’s English. That sentence might be something like:
A \(\chi^2\) analysis indicates that male and female ospreys differentially selected fish species while foraging (\(\chi^2\) = 8.729, p = 0.013, df = 2).
Wow! The analytical “answers” were (somehow) directly incorporated into the sentence. How was the code that produced that sentence written?
First, examine the output from above of the \(\chi^2\) analysis. Note the sentence has captured the output \(\chi^2\) statistic, the degrees of freedom (df), and the p-value, and incorporated those values directly into the sentence using inline chunk code. The code that generated the sentence looks like this:
#### chunk code as inline
A $\chi^2$ analysis indicates that male and female ospreys differentially selected fish
species while foraging ($\chi^2$ = `r round(f3$statistic, 3)`,
*p* = `r round(f3$p.value, 3)`, *df* = `r f3$parameter`).
We first embedded mathematical syntax to print the symbology \(\chi^2\). Next notice how inline chunk coding was used three times to extract specified information from the output of the \(\chi^2\) analysis, the object f3 we created earlier. The ls(f3) call had returned the output objects of the analysis, of which statistic, p.value, and parameter represent, respectively, the values of the \(\chi^2\), p-value, and df’s. (NOTE: Selection of correct output objects by name assumes working knowledge of R analysis and outputs.) These values were extracted using standard R coding, such as f3$statistic. We also applied the round() function to the extracted values to reduce the number of decimal places carried over into the sentence.
There is nothing fancy here: these are basic R operations for extracting information from an analysis. We’ve merely used RMarkdown syntax to extract these values and embed them into a document.
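The extraction can be tried without the original .csv file. The sketch below rebuilds the osprey data frame from the counts printed earlier, so it is a stand-in for the read.csv() import rather than the author’s file. The names chi_sq, p_val, and df_val are made up for illustration.

```r
# Rebuild the osprey data from the values printed in the dataframe output
f1 <- data.frame(
  ospreysex = c("Male", "Female", "Male", "Female", "Male", "Female"),
  fishspp   = c("Sunfish", "Sunfish", "Bass", "Bass", "Shad", "Shad"),
  count     = c(59, 72, 14, 21, 189, 138)
)

f2 <- xtabs(count ~ ospreysex + fishspp, data = f1)  # cross-tab the data
f3 <- chisq.test(f2)                                 # chi-sq test

# The same expressions used inside the inline chunks
chi_sq <- round(unname(f3$statistic), 3)
p_val  <- round(f3$p.value, 3)
df_val <- unname(f3$parameter)

cat("chi-sq =", chi_sq, ", p =", p_val, ", df =", df_val, "\n")
```

The unname() calls only drop the labels R attaches to the statistic and the degrees of freedom; the values are the ones the inline chunks place in the sentence.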
A repeat of something stated above regarding tables in RMarkdown:
“Construction of data and text tables in RMarkdown can be frustrating and irritating, as well as visually unappealing when output, principally because RMarkdown does not actually support tables per se.”
Solutions do exist, but before we can produce more visually appealing tables in RMarkdown, some basic understanding of format control in a RMarkdown document is needed.
Control of RMarkdown output - and the tables the document contains - can occur at two separate levels. The first is default RMarkdown syntax that controls the final rendering of the document. Thus far we’ve only implemented .html output, as specified in the YAML. Consequently, output is converted to table coding conventions associated with .html. It is a minimalist approach.
A second approach is to control output of the entire RMarkdown document by using CSS control. Use of CSS control for RMarkdown output is discussed in detail below under Advanced RMarkdown topics, so it won’t be discussed here. Instead we’ll focus on ways to apply some R packages to create tables.
(Full Disclosure: I’ve been cheating thus far; this document has CSS control, while the output you’ve clicked on and followed is RMarkdown default .html without any CSS control.)
Use of R packages for table control
There are several packages in R that provide control over table construction. These include the function kable within package knitr, and the packages pander and xtable. Here we’ll explore how these packages improve how tables look visually, as well as how these packages and their functions can work directly with basic R code to generate tables.
We will use the osprey data set (f1) from above. The data in the R dataframe look like this:
## ospreysex fishspp count
## 1 Male Sunfish 59
## 2 Female Sunfish 72
## 3 Male Bass 14
## 4 Female Bass 21
## 5 Male Shad 189
## 6 Female Shad 138
Build a table using the kable function from package knitr
Assuming your data are in a standard R dataframe, one method to build a table is to load the library knitr and apply the function kable to the data frame, here f1. Important options to consider include col.names =, where more “readable” labels can be used as table column headers; digits =, which applies a specified rounding to table output; and a vector of data alignments using align =. The caption = option allows for inclusion of a table description.
#### build a table using fxn kable from package knitr
```{r}
library(knitr) # load library: knitr
kable(f1, col.names = c("Sex", "Fish Species", "Frequency"), align = c("l", "c", "r"),
      caption = "Fish capture frequency by osprey sex")
```
The output `kable` table from the code above looks like this. As mentioned above (Section 7.3), this form of table construction in a .html environment spans the entire width of the browser page.
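The `digits =` option is not exercised in the chunk above because the osprey counts are integers. The following is a minimal sketch only: it re-creates `f1` from the values printed earlier and adds a hypothetical column, `prop` (the share of all captures), purely to show how rounding might be applied.

```r
library(knitr)

# re-create the osprey data frame from the values printed above
f1 <- data.frame(ospreysex = rep(c("Male", "Female"), times = 3),
                 fishspp   = rep(c("Sunfish", "Bass", "Shad"), each = 2),
                 count     = c(59, 72, 14, 21, 189, 138))

# hypothetical column for illustration: proportion of all captures
f1p <- transform(f1, prop = count / sum(count))

# digits = 3 rounds numeric columns to three decimal places
tab.k <- kable(f1p, digits = 3,
               col.names = c("Sex", "Fish Species", "Frequency", "Proportion"),
               align = c("l", "c", "r", "r"),
               caption = "Fish capture frequency and proportion by osprey sex")
tab.k
```

Note that `align =` needs one entry per column, so adding a column means adding an alignment letter as well.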
Build a table using the package xtable
`xtable` has many of the same output options as the `kable` function, but also allows for more control of table output. A necessary option to include within the chunk for a `xtable` is `results="asis"`, which passes the generated markup through unchanged so that it is rendered as a table. Otherwise the raw code for .html, or for other formats such as LaTeX, is displayed.
Consequently modify the chunk as:
#### chunk code to portray the rendered table
```{r, results="asis"}
```
`xtable` also requires a `print()` wrapper or the default output is returned as LaTeX code rather than text. The option `type =` specifies the form of the output, as in .html. As with `kable`, some available formatting options include digits, alignments, and captions. Row numbers are printed by default; use `include.rownames = FALSE` to turn them off.
#### build a table using xtable
```{r, results="asis"}
library(xtable)
# xtable(f1) NOT RUN: would return a default table of LaTeX code
print(xtable(f1), type = "html", include.rownames = FALSE) # rendered table in html format
```
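To see what `results="asis"` actually passes through to the document, the markup can be captured as a string rather than printed. This is an illustrative sketch, not part of the original workflow: it re-creates `f1` from the values printed above and uses the `print.results = FALSE` argument of `print.xtable()`; the object name `html.tab` is made up for the example.

```r
library(xtable)

# re-create the osprey data frame from the values printed above
f1 <- data.frame(ospreysex = rep(c("Male", "Female"), times = 3),
                 fishspp   = rep(c("Sunfish", "Bass", "Shad"), each = 2),
                 count     = c(59, 72, 14, 21, 189, 138))

# print.results = FALSE returns the markup instead of writing it to the output
html.tab <- print(xtable(f1), type = "html", include.rownames = FALSE,
                  print.results = FALSE)

# peek at the start of the raw .html that the chunk would emit
cat(substr(html.tab, 1, 300), "\n")
```

Without `results="asis"` this raw markup is what appears in the knitted document; with it, the browser receives the markup directly and draws the table.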
Note that basic `xtable` output includes `|` (vertical bar) as a separator between table cells, and that the data seem “squished” in appearance. This appearance, as well as many other aspects of table appearance, can be controlled from within `xtable`. The xtable Gallery is an excellent .pdf vignette full of examples on how to format tables.
Assume you wish to change the sequential cell alignment to left, center, right. This is the option `align =` where `r`, `l`, and `c` represent right, left, and center, respectively. Assign the desired option with the initial `xtable` call and save it as an object (here, `tab1`). Then apply the `print()` wrapper to the object. An oddity to note: `xtable` automatically includes row numbers and you must format for that column, even if your intention is to suppress row numbers in the final rendering of the text with `print()`.
Adding “readable” column names can be done within `xtable`, but it can be ponderous. Because output from `xtable` inherits the original dataframe, it is easier to apply `names()` to the `xtable` object prior to rendering using `print()`.
One consideration regarding formatting in `xtable` deserves mention. Above, the `print()` wrapper was placed outside the `xtable()` call and all table syntax rendered in a single line of code. For reason(s) I cannot fully figure out, format control seemingly works best if it is applied and saved as an `xtable` object, with the object then rendered from inside the `print()` wrapper. Attempting all formatting operations within a single call often fails.
#### a xtable with alignment, new column names, and suppressed row names
```{r, results="asis"}
# add format: alignment
library(xtable)
# xtable(f1) NOT RUN: would return a default table of LaTeX code
tab1 <- xtable(f1, align = "llcr") # row-name column first, then left, center, right
names(tab1) <- c("Sex", "Fish Species", "Frequency") # rename columns
print(tab1, type = "html", include.rownames = FALSE) # rendered table in html format
```
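The row-number oddity can be checked directly. In the sketch below (again using a re-created `f1`, with `bad` as a made-up object name), the alignment stored in the `xtable` object has one more entry than the data frame has columns, and an alignment string that omits the row-name column is expected to be rejected.

```r
library(xtable)

# re-create the osprey data frame from the values printed above
f1 <- data.frame(ospreysex = rep(c("Male", "Female"), times = 3),
                 fishspp   = rep(c("Sunfish", "Bass", "Shad"), each = 2),
                 count     = c(59, 72, 14, 21, 189, 138))

tab1 <- xtable(f1, align = "llcr")   # first letter is for the row-name column
align(tab1)                          # stored alignment, one letter per column
length(align(tab1)) == ncol(f1) + 1  # row names count as an extra column

# three letters for three data columns leaves nothing for the row names
bad <- try(xtable(f1, align = "lcr"), silent = TRUE)
inherits(bad, "try-error")
```

This holds even when `include.rownames = FALSE` is later passed to `print()`, because the alignment is fixed when the `xtable` object is built.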
Compare the two outputs – one without `results="asis"` and the other with some minimal formatting.
Build a table using the package pander
Package `pander` is another means of formatting tables for output. An added benefit of `pander` (as well as `xtable`, actually) is that it can access results of a statistical analysis and print a table of those results.
As with `xtable`, it is simply easier to create “readable” column names outside the `pander()` call; here, the variables were renamed before the rendering. `justify =` controls alignment, and a `caption =` can be added if desired. A large variety of other pander options are available as well.
#### build a table using pander
```{r}
library(pander)
f1.tab <- f1 # work on a copy so f1 keeps its original column names for later chunks
names(f1.tab) <- c("Sex", "Fish Species", "Frequency") # rename columns
pander(f1.tab, caption = "Fish capture frequency by osprey sex",
       justify = c("right", "center", "center")) # rendered table in html format
```
Recall from above we accessed the results of the \(\chi^2\) analysis (object `f3`) on osprey foraging, and used inline chunk code to present the results as text. Assume instead you wish for a table of those results. Simply call the saved analytical object, here `f3`, from inside `pander()`. Note you can even embed mathematical symbols in the caption.
#### render statistical output as a table: chi-sq test from above
```{r}
library(pander)
pander(f3, caption = "$\\chi^2$ test of fish capture frequency by osprey sex")
```
You can see the `pander` table outputs here.
A (mostly complete) list of statistical analyses that `pander` can render as tables is found by loading the `pander` library and invoking `methods(pander)`. Current offerings include:
## [1] pander.anova* pander.aov*
## [3] pander.aovlist* pander.Arima*
## [5] pander.call* pander.cast_df*
## [7] pander.character* pander.clogit*
## [9] pander.coxph* pander.cph*
## [11] pander.CrossTable* pander.data.frame*
## [13] pander.data.table* pander.Date*
## [15] pander.default* pander.density*
## [17] pander.describe* pander.ets*
## [19] pander.evals* pander.factor*
## [21] pander.formula* pander.ftable*
## [23] pander.function* pander.glm*
## [25] pander.Glm* pander.gtable*
## [27] pander.htest* pander.image*
## [29] pander.irts* pander.list*
## [31] pander.lm* pander.lme*
## [33] pander.logical* pander.lrm*
## [35] pander.manova* pander.matrix*
## [37] pander.microbenchmark* pander.name*
## [39] pander.nls* pander.NULL*
## [41] pander.numeric* pander.ols*
## [43] pander.orm* pander.polr*
## [45] pander.POSIXct* pander.POSIXlt*
## [47] pander.prcomp* pander.randomForest*
## [49] pander.rapport* pander.rlm*
## [51] pander.sessionInfo* pander.smooth.spline*
## [53] pander.stat.table* pander.summary.aov*
## [55] pander.summary.aovlist* pander.summary.glm*
## [57] pander.summary.lm* pander.summary.lme*
## [59] pander.summary.manova* pander.summary.nls*
## [61] pander.summary.polr* pander.summary.prcomp*
## [63] pander.summary.rms* pander.summary.survreg*
## [65] pander.summary.table* pander.survdiff*
## [67] pander.survfit* pander.survreg*
## [69] pander.table* pander.tabular*
## [71] pander.ts* pander.zoo*
## see '?methods' for accessing help and source code
`pander` covers the bulk of traditional statistical analyses, but quoting directly from the pander: An R Pandoc Writer site:
“If you think that pander lacks support for any other R class(es), please feel free to open a ticket suggesting a new feature or submit pull request and we will be happy to extend the package.”
In short, the site is open to suggestions for additional forms of statistical analysis to be rendered with `pander`. In addition, you could create your own function if desired by emulating one of the other functions - that’s where some of the true power of R comes into play!
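As a rough illustration of writing your own method, the sketch below defines a made-up class, `fishtally`, and a matching `pander.fishtally()` function. Neither is part of `pander`; both names are hypothetical, and the method simply reshapes the object into a data frame and hands it to the existing data frame method.

```r
library(pander)

# hypothetical class: a named vector of total captures per fish species
# (totals are the male + female counts from the osprey data above)
tally <- structure(c(Sunfish = 131, Bass = 35, Shad = 327), class = "fishtally")

# hypothetical S3 method: convert to a data frame, then reuse pander's own method
pander.fishtally <- function(x, ...) {
  df <- data.frame(Species = names(x), Count = as.numeric(unclass(x)))
  pander(df, ...)
}

# dispatches to pander.fishtally() because of the object's class
pander(tally, caption = "Total fish captures by species")
```

Emulating an existing method in this way keeps all of the usual `pander` options, such as `caption =` and `justify =`, available through the `...` argument.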
Embedding a R-generated figure or plot still uses chunk code, but with different options for configuration. Beginning simple, consider a barplot of the frequencies of the three groups of fish captured by male and female ospreys and represented in our previous data object `f1`. Note there are no chunk options; just the R code for a barplot is embedded in the chunk. Default figure placement is left justification.
#### basic chunk code for embedding a figure
```{r}
# embed a plot
# plot code here ....
```
Here’s the code …
#### basic barplot inserted as figure in RMarkdown document
```{r}
# embed a barplot; assumes object f1 & f2 from above
freq.r <- range(0, signif(max(f1$count), 1) + 10) # use raw data for ylim= range
bplot1 <- barplot(f2, ylim = freq.r, xlab = "Fish Species",
ylab = "Capture Frequency (n)", space = c(0, 0.5), col = c("black", "grey"), beside = T)
abline(h = 0) # add line to bottom of plot
# add legend
legend("topleft", c("Female", "Male"), fill = c("black", "grey"), bty = "n", cex = 1.25)
title(main = "Fish Species Capture Frequency by Osprey Sex")
text(x = bplot1, y = c(f2[1:6]), labels = c(f2[1:6]), pos = 3) # adds f2 counts above bars
```
… and resultant barplot of the osprey fish captures by males and females introduced above.
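The barplot chunk relies on objects `f1` and `f2` built earlier in the vignette and not repeated here. The sketch below is one plausible way to build them, offered as an assumption rather than the original code: `f1` is re-created from the printed values, and `f2` is assumed to be a sex-by-species cross-tabulation of the counts.

```r
# re-create the osprey data frame from the values printed above
f1 <- data.frame(ospreysex = rep(c("Male", "Female"), times = 3),
                 fishspp   = rep(c("Sunfish", "Bass", "Shad"), each = 2),
                 count     = c(59, 72, 14, 21, 189, 138))

# assumed form of f2: rows are osprey sex, columns are fish species
f2 <- xtabs(count ~ ospreysex + fishspp, data = f1)
f2

# same y-axis range calculation as in the barplot chunk
freq.r <- range(0, signif(max(f1$count), 1) + 10)
freq.r
```

With this construction the rows sort alphabetically as Female then Male, which agrees with the legend order used in the barplot chunk.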
Some options for RMarkdown figures
Two common figure chunk options are to specify figure width and height, and to create an outfile version of the figure. R's own bitmap devices (e.g., .png) default to 480 \(\times\) 480 pixels at 72 dpi, which translates into an approximate 6.7 \(\times\) 6.7 inch size; note that `html_document` output applies its own defaults (`fig_width: 7` and `fig_height: 5`, in inches) unless you override them. The option `fig.path="PATH/"` sets the directory to which figure files are written; `PATH` must be followed by a `/` (forward slash).
IMPORTANT: The specified directory path is relative to the default working directory for the RMarkdown document. Otherwise a full path must be given.
#### set fig size as 4X4 & establish file outfile path
```{r, fig.width=4, fig.height=4, fig.path="outfigs/"}
```
It may be beneficial to simultaneously plot a vector-based plot (e.g., .pdf, .eps) for archiving along with a bitmap-based (e.g., .png) plot for the document. Simply suppress the R code and output using the chunk option `include=F`, and outfile the figure to the desired directory.
#### suppress all output, & build .pdf figure for output
```{r, include=F, fig.width=4, fig.height=4}
# create a duplicate of figure in pdf outfile
pdf("PATH4FILE/FIGURENAME.pdf")
# plot code would follow here
# code code code ....
dev.off()
```
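A filled-in version of the pattern above might look like the following sketch. The file name and the use of `tempdir()` are placeholders chosen so the example runs anywhere; in practice you would point `pdf()` at your own archive directory.

```r
# re-create the osprey data frame from the values printed above
f1 <- data.frame(ospreysex = rep(c("Male", "Female"), times = 3),
                 fishspp   = rep(c("Sunfish", "Bass", "Shad"), each = 2),
                 count     = c(59, 72, 14, 21, 189, 138))

# placeholder output location; substitute your own PATH4FILE/FIGURENAME.pdf
pdf.file <- file.path(tempdir(), "osprey_barplot.pdf")

pdf(pdf.file, width = 4, height = 4)             # open the pdf device (inches)
spp.total <- tapply(f1$count, f1$fishspp, sum)   # total captures per species
barplot(spp.total, xlab = "Fish Species", ylab = "Capture Frequency (n)")
dev.off()                                        # close the device to write the file

file.exists(pdf.file)
```

Forgetting `dev.off()` is the usual reason a .pdf outfile turns up empty or unreadable.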
Note there exists potential to alter figure placement by using the chunk option `fig.align="center"`, which uses the .html tag `<img />` rather than the RMarkdown `![]()` syntax. However, the `align` attribute of `<img />` is not supported within HTML5, so it is best to not use alignments other than the default in .html documents. See the CSS HTML site for details on `<img />` tag control.
Be careful with sizing as it sometimes clips figure output unless the width:height ratio is considered. This, however, is the same issue faced with any plot in R. If you wish global control over figure characteristics, such as a desire for all figures to be of same width and height, or a common outfile directory, these can be inserted as options under the `{r global_options}`, much as shown for warnings above.
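A minimal sketch of such global figure control follows, assuming the 4 \(\times\) 4 size and the `outfigs/` directory used in the chunk example above. In a document these lines would sit inside the `{r global_options, include=FALSE}` chunk.

```r
library(knitr)

# apply the same figure size and outfile directory to every later chunk
opts_chunk$set(fig.width = 4, fig.height = 4, fig.path = "outfigs/")

# confirm what is currently set
opts_chunk$get("fig.width")
opts_chunk$get("fig.path")
```

Options given in an individual chunk header still override these global values for that chunk.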
Last, do not forget it is R coding that determines characteristics of your RMarkdown figure. This means any type of plot that is possible in R can be embedded into your document. Check out results of the options for figures we discussed here.
The `knitr` website lists all available chunk options for plots and figures, including figure control for documents other than .html output.
Assume you have been instructed to complete a formal analysis and write-up of prey capture by male and female osprey using the data from above. The write-up would include a list of objectives, the R code used for analysis, the output, and some measure of interpretation.
One possible rendition of this homework assignment is here.
You’re done! You should now be able to build a RMarkdown document that integrates R coding and analysis directly into a .html document.
As noted at the start of this document, there’s almost unlimited potential for RMarkdown-based documents. If you made it through the first section of this document, you can easily create a .html document. And if you’re in one of my classes, a RMarkdown document is how I’ll be expecting your homework to be delivered to me. So experiment with different ways to document your work; that’s the best way to learn RMarkdown syntax.
Even with an almost unlimited potential, there are several advanced topics for RMarkdown to be considered. These include, in no particular order:
Setting a working directory other than default
R itself defaults to your home directory, which is `C:\Users\USRNAME\Documents` in a Windows OS, `/Users/USRNAME` in macOS, and `/home/USRNAME` on Linux builds. (When a document is knit, `knitr` evaluates the chunks in the directory that holds the .Rmd file unless told otherwise.) Depending on how you prefer to organize your workflow on a personal computer, these defaults may, or may not, be desired. They are not for me, given the co-mingling of research, day-to-day work life, and teaching responsibilities on the same computer.
So exactly how do you “set” a home directory different from default?
Because the rendering process works through package `knitr`, the working directory for `knitr` is set with the following option, `root.dir =`:
```{r global_options, include=FALSE}
knitr::opts_knit$set(root.dir = "~/PATH/WORK_DIR")
```
Setting the working directory occurs within the `{r global_options}` chunk. It is best that the RMarkdown document options be set immediately after the YAML. Operationally, once this directory is set any future directory changes should be nested under this working directory. Although it is possible to trick RMarkdown into changing among many different directories within different chunks, it is generally not recommended. Use the `root.dir` as shown above and then nest directories underneath. You can always use full paths within a chunk, but this can become tedious.
Inserting comments into a RMarkdown HTML document
The standard format for a .html comment – leading as `<!--` and ending as `-->` – can be inserted into a RMarkdown document.
<!-- HTML comment that will be ignored -->
The comment will be ignored during rendering of the RMarkdown document. I often use them as metaphorical breadcrumbs to help remind me of options I have set.
#### HTML comments as breadcrumbs
<!-- set root directory here -->
<!-- suppress R warnings -->
```{r global_options, include=FALSE}
knitr::opts_knit$set(root.dir = "~/PATH/WORK_DIR")
knitr::opts_chunk$set(warning=FALSE)
```
Rendering and knitting a RMarkdown document from outside RStudio
RStudio is by far the easiest approach for creating a RMarkdown document.
Even I – an acknowledged Curmudgeonly Dinosaur – use it. However, as noted at the start of this vignette, command line code from within Plain Vanilla R can be used as well to create a RMarkdown document. I do not show how to use command line RMarkdown here. Simple google searches using phrases like “command line r markdown” return numerous hits that can be explored if desired.
My recommended Bottom Line, however, is to use RStudio to build RMarkdown documents.
One of the principal ways to control RMarkdown document formatting is through use of so-called templates, be it a template for HTML output, Word, or PDF. In the case of .html, this control is best implemented through use of CSS control. Not surprisingly, there already exist several “canned” .css-type control files available for RMarkdown within the RStudio environment. They are not represented within RStudio as .css files per se, but are instead referred to as “Highlight” and “Theme”; nonetheless, they operate in the same fashion.
For example, by now you (should) have noticed that the output of the .html document describing how to use RMarkdown differs from the example outputs. This is because my RMarkdown has .css control while the rendered examples are in the RStudio default for .html. This default has no .css control.
So how would you change the stylistic formatting?
Adding stylistic control from within RStudio
Open and examine the final RMarkdown document in your browser. Keep it there; it will serve as a reference as we add some stylistic control options and render different html documents. Next, from within RStudio open the .rmd file that built the .html output you have open in your browser. Note that it includes, at the end of its filename, `01`. This is because we’ll be making different versions of this using different stylistic formats, and the `01` will be a counter.
Stylistic control is added in the YAML. To embed stylistic control, click the \(\blacktriangledown\) (caret) to the right of the tool symbol, and navigate the drop-down to Output Options (light blue color):
Click the Output Options. This opens a new drop-down. At the top you can choose HTML, Word, or PDF. For now, make sure HTML is selected. Note that the syntax highlighting and Apply theme boxes are checked, and that they show default.
Here is where you can add one of RStudio’s “canned” formats. First, make sure you’ve opened the `markdown_basicsFINAL01.rmd` file in RStudio. Next, follow the process above and pick a syntax highlight from the list. I selected haddock (mostly because it is a wonderful fish for eating, not for any specific RMarkdown reason). Before hitting Ok be sure you are looking at the YAML in the `markdown_basicsFINAL01.rmd`. Hit Ok, and observe how the YAML has changed. It now includes, nested under the YAML `output: html_document:`, a new option called `highlight: haddock`.
#### YAML now includes RStudio highlight option that links to a .css-type file
---
title: "A Final **R**Markdown Document in HTML Format"
author: "Student R. Me"
date: "14 August, 2018"
output:
  html_document:
    highlight: haddock
---
Last, save the document as version `02`, and then knit. Open the version `02` .html in your browser and compare to the `01` .html version. The only difference is that the code syntax is now colored.
Next, pick a theme (I chose cerulean, liking deep sky blues) and save the change with your chosen theme as version `03`. Once again the YAML will change, with `theme: cerulean` now nested under `output: html_document:`, and so will the knitted document.
Try as many highlights and themes as you wish.
Adding CSS control from within RStudio
Much like `highlight:` and `theme:`, the `css:` call is placed within the YAML `output: html_document:` section. An example would be `css: FILENAME.css`.
#### YAML now includes css: option that explicitly links to a .css file
---
title: "A Final **R**Markdown Document in HTML Format"
author: "Student R. Me"
date: "14 August, 2018"
output:
  html_document:
    css: FILENAME.css
    highlight: haddock
    theme: null
---
You can either build your own .css file, obtain one from elsewhere, or obtain and modify an existing .css file. They are relatively simple to edit from within any basic text editor. A great place to start for RMarkdown-related .css files is this GitHub site, which includes a dozen or so different .css files. There’s a companion website that allows you to see differences among the RStudio styles. Check it out.
Because the .css file controls stylistic elements such as font, size, and color, it is best to either not include the `highlight:` and `theme:` options within the YAML, or set them to `null`, if a CSS control is being implemented.
As previously noted, there exist numerous formatting options for Markdown documents. Two default formats within RStudio are Word and PDF, both of which are accessible from the same drop-down menu location where `Knit HTML` is found.
MS Word format
To convert the RMarkdown document to a Word format simply click `Knit to Word` on the dropdown shown above. That’s all there is to it.
The document returned is typically “Read-Only,” and might require saving or changing the View (i.e., View => Edit Document) so that it can be edited if desired. This is largely system dependent and can only be determined by experimentation on your part. The `Knit` process does a reasonably good job of converting most figures, although occasionally you may have to return to your R code and rescale some of the plot elements, such as the `cex =` used in text labels or legends. Note below how the YAML in a `Knit to Word` document looks; output has changed from `html_document` to `word_document`.
#### Basic MS Word YAML
---
title: "A Final **R**Markdown Document in HTML Format"
author: "Student R. Me"
date: "10 October 2016"
output:
  word_document
---
But what if you do not like the Word defaults, which here are blue headers and a 12-point Cambria font, among others?
Much like we used a .css file to control format of .html documents, we can use a Word template to control the stylistic format. However, unlike there being several .css-type defaults for .html format in RStudio, as well as several .css found on GitHub and elsewhere, there are few “canned” Word templates available. You need to build one. The steps are reasonably simple:
Once the knitted Word document is open, from the `Home` tab click the lower right of the `Styles` section. This will open an adjoining panel of Styles options to the right.
Highlight a segment of the document, such as text in the first paragraph. Next, scroll down the `Styles` panel until you find a highlighted section (here, First Paragraph). This represents the Word Style associated with that section of the document. Click the First Paragraph symbol \(\blacktriangledown\) (caret), which opens a drop-down. Click `Modify`, and then change the format to whatever is preferred. The `Modify` option merely implements any of the possible format options found in a standard Word document.
Repeat this process for all snippets of output that you wish to format. Once completed, save the document, giving it some sort of name like `word_RMarkdown_templateV01.docx`. The final step is to reference the Word template in the YAML using the `reference_docx:` option. As the RMarkdown file is processed, it uses the formats specified in the `reference_docx:` file. There will be, on occasion, formats that simply will not cross over from RMarkdown to a Word document, such as system calls. You will learn what these are mostly through personal trial and error.
#### MS Word YAML with reference to a Word template
---
title: "A Final **R**Markdown Document in HTML Format"
author: "Student R. Me"
date: "10 October 2016"
output:
  word_document:
    reference_docx: word-styles-reference-01.docx
---
One final note.
The process to construct a Word document using RMarkdown is very well described in a post by R. Layton, and I encourage you to check it out. He goes into much greater detail than I describe here. Remember, you need only build this Word template once and then access it when you wish to use RMarkdown to construct a MS Word document. It can take a while until you are satisfied with the formatting, but once the template is constructed and saved it is very easy to build Word documents that mirror your preferred .html format.
PDF format
Conversion to .pdf directly from RMarkdown is somewhat complex, mostly because it requires a full installation of TeX.
The Lazy Person’s Approach: Don’t fight City Hall. Build the MS Word document using a template of your choice, then convert the document to .pdf in Word.