MODULE 3.7 Data Reshaping in R

baseR-V2016.2 - Data Management and Manipulation using R

Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016


Objective:

  • How to reshape data - the transposition of rows and columns



Let’s begin the topic of data reshaping in R with …

Some learning questions:

  • What does the concept of data reshaping mean?

  • How do I transpose my data - that is, make columns the rows and rows the columns?

  • How do I convert cross-tabulated data in a spreadsheet into a format for analysis in R?


Some Background - Why You Should Care


Data reshaping refers to processes that:

  • Transpose rows and columns, or
  • Convert data from a table format to [row,column] format used in most R analyses

A common circumstance is data in a spreadsheet table format as below:

Here, the actual data - the frequencies - are in cells that are cross-classified by the variable FishSpp as rows,and two columns (Male, Female), one for each sex. The goal of reshaping applied to these data is to create new columns of ospreysex (Male, Female) and the frequency (count) of each fishspp captured by ospreysex. The desired outcome looks like:


Some Initilaization …


Data from Exercise #6 (objects f1, m1, m2 ,m3, m4, t1, and w1) were saved as mod3data.RData. Some of these objects will be needed, so load them first into your workspace.

# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
  getwd()  # in correct directory ?  
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
  list.files(pattern = ".RData") # is it there ?
## [1] "mod3data.RData"
  load("mod3data.RData")  # load it
  ls()  # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"

If the objects are not there, or you did not save an .RData from Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.


Transposing Data in R


t(DataObject) converts (transposes) rows to columns and columns to rows. It is a naive call with no options.

# assume object f1 from Exercise #6
  f1  # examine f1 before transpotion
##   FishSpp Male Female
## 1 Sunfish   59     72
## 2    Bass   14     21
## 3    Shad  189    138
  t(f1)  # f1 after transposition; rows and columns reversed from above
##         [,1]      [,2]   [,3]  
## FishSpp "Sunfish" "Bass" "Shad"
## Male    " 59"     " 14"  "189" 
## Female  " 72"     " 21"  "138"

Other Reshape Options - The Package reshape (or reshape2)


The packages reshape and reshape2 performs more complex operations on a data object. The function melt(DataObject) can converts cross-classified table data, as that shown above, into a [row, column] format.

f1  # assume data object f1 from Exercise #6
##   FishSpp Male Female
## 1 Sunfish   59     72
## 2    Bass   14     21
## 3    Shad  189    138
library(reshape)  # load package reshape
## Warning: package 'reshape' was built under R version 3.2.3
f2 <- melt(f1) # reshape using melt converts to [rows,columns]
## Using FishSpp as id variables
names(f2)[2:3] <- c("Sex", "Count"); f2  # re-label data w/names & examine; [row, column] format
##   FishSpp    Sex Count
## 1 Sunfish   Male    59
## 2    Bass   Male    14
## 3    Shad   Male   189
## 4 Sunfish Female    72
## 5    Bass Female    21
## 6    Shad Female   138

Note that R returns a message “Using FishSpp as id variables.” This means that melt() performs the reshaping on the variable FishSpp and transposes (i.e., pivots) the other two columns, Male and Female.

Controlling the pivot:
But what if you wanted to perform a transposition by fixing one or more variables and transposing the others? (Think of an .xls/.xlsx “pivot table”)

# import a new data set f1mod.csv;  NOTE it has 2 Lakes but is still a cross-classified data table
f1mod <- read.csv("f1mod.csv", header = T); f1mod  # examine
##     Lake FishSpp Male Female
## 1 Spring Sunfish   59     72
## 2 Spring    Bass   14     21
## 3 Spring    Shad  189    138
## 4  Clear Sunfish   63     95
## 5  Clear    Bass    7     32
## 6  Clear    Shad  126     98
# first do a reshape and see what happens
melt(f1mod) # reshape & examine
## Using Lake, FishSpp as id variables
##      Lake FishSpp variable value
## 1  Spring Sunfish     Male    59
## 2  Spring    Bass     Male    14
## 3  Spring    Shad     Male   189
## 4   Clear Sunfish     Male    63
## 5   Clear    Bass     Male     7
## 6   Clear    Shad     Male   126
## 7  Spring Sunfish   Female    72
## 8  Spring    Bass   Female    21
## 9  Spring    Shad   Female   138
## 10  Clear Sunfish   Female    95
## 11  Clear    Bass   Female    32
## 12  Clear    Shad   Female    98

It worked!

The data output looks logical and is in the desired [row,column] format. As a function, melt() defaults to a transposition where the last column in the data becomes the “values” - that is, the data, which here is the frequencies. The column next to last column becomes “variable” (here, the osprey sex), and the first two columns in sequence become the “id variables” Lake and FishSpp on which the pivot occurs.

Can I alter how the pivot occurs?
You can alter how the pivot occurs by using the option id =c(), where c(Var2Use) is the variables, and their sequence, on which the pivot will occur. Let’s compare by switching the sequence of variable Lake and FishSpp. Output now has FishSpp in the first column and Lake in the second.

melt(f1mod, id =c("FishSpp", "Lake")) # reshape w/different sequence & examine
##    FishSpp   Lake variable value
## 1  Sunfish Spring     Male    59
## 2     Bass Spring     Male    14
## 3     Shad Spring     Male   189
## 4  Sunfish  Clear     Male    63
## 5     Bass  Clear     Male     7
## 6     Shad  Clear     Male   126
## 7  Sunfish Spring   Female    72
## 8     Bass Spring   Female    21
## 9     Shad Spring   Female   138
## 10 Sunfish  Clear   Female    95
## 11    Bass  Clear   Female    32
## 12    Shad  Clear   Female    98

Summary of Module 3.7 functions


Basic calls related to data reshaping are:

  • t() => Naive transposition
  • melt() => Reshape (“Pivot”) cross-classified data

An important option for melt() is:

  • id =c(Vars2Use) => Controls the variables used, and their sequence, in the reshape

Exercise #8


Data for this exercise are in: ../baseR-V2016.2/data/exercise_dat.

The data file rodent_seedchoice.csv is a spreadsheet-based representation of seed preferences by rodents. Variables diff_ are the difference between seeds chosen and seeds presented, where “_" are the 3 seed choices. This is a commonly found “table” format data structure that must be reshaped for analysis in R.

  • Import these data into R.
  • Using reshape techniques, convert this .csv file into an R data object, where the 3 seed choices are a labeled column (seed), with the associated difference value a column named diff as well.
    IMPORTANT: This reshape will occur only for a subset of the columns, not all columns.
  • Save the data structure as rodents_seedchoice.RData

END MODULE 3.7


Printable Version