Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016
Data reshaping refers to processes that:
A common circumstance is data in a spreadsheet table format as below:
Here, the actual data - the frequencies - are in cells that are cross-classified by the variable FishSpp as rows,and two columns (Male, Female), one for each sex. The goal of reshaping applied to these data is to create new columns of ospreysex (Male, Female) and the frequency (count) of each fishspp captured by ospreysex. The desired outcome looks like:
Data from Exercise #6 (objects f1, m1, m2 ,m3, m4, t1, and w1) were saved as mod3data.RData
. Some of these objects will be needed, so load them first into your workspace.
# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
getwd() # in correct directory ?
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
list.files(pattern = ".RData") # is it there ?
## [1] "mod3data.RData"
load("mod3data.RData") # load it
ls() # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"
If the objects are not there, or you did not save an .RData
from Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.
t(DataObject)
converts (transposes) rows to columns and columns to rows. It is a naive call with no options.
# assume object f1 from Exercise #6
f1 # examine f1 before transpotion
## FishSpp Male Female
## 1 Sunfish 59 72
## 2 Bass 14 21
## 3 Shad 189 138
t(f1) # f1 after transposition; rows and columns reversed from above
## [,1] [,2] [,3]
## FishSpp "Sunfish" "Bass" "Shad"
## Male " 59" " 14" "189"
## Female " 72" " 21" "138"
The packages reshape and reshape2 performs more complex operations on a data object. The function melt(DataObject)
can converts cross-classified table data, as that shown above, into a [row, column] format.
f1 # assume data object f1 from Exercise #6
## FishSpp Male Female
## 1 Sunfish 59 72
## 2 Bass 14 21
## 3 Shad 189 138
library(reshape) # load package reshape
## Warning: package 'reshape' was built under R version 3.2.3
f2 <- melt(f1) # reshape using melt converts to [rows,columns]
## Using FishSpp as id variables
names(f2)[2:3] <- c("Sex", "Count"); f2 # re-label data w/names & examine; [row, column] format
## FishSpp Sex Count
## 1 Sunfish Male 59
## 2 Bass Male 14
## 3 Shad Male 189
## 4 Sunfish Female 72
## 5 Bass Female 21
## 6 Shad Female 138
Note that R returns a message “Using FishSpp as id variables.” This means that melt()
performs the reshaping on the variable FishSpp and transposes (i.e., pivots) the other two columns, Male and Female.
Controlling the pivot:
But what if you wanted to perform a transposition by fixing one or more variables and transposing the others? (Think of an .xls/.xlsx “pivot table”)
# import a new data set f1mod.csv; NOTE it has 2 Lakes but is still a cross-classified data table
f1mod <- read.csv("f1mod.csv", header = T); f1mod # examine
## Lake FishSpp Male Female
## 1 Spring Sunfish 59 72
## 2 Spring Bass 14 21
## 3 Spring Shad 189 138
## 4 Clear Sunfish 63 95
## 5 Clear Bass 7 32
## 6 Clear Shad 126 98
# first do a reshape and see what happens
melt(f1mod) # reshape & examine
## Using Lake, FishSpp as id variables
## Lake FishSpp variable value
## 1 Spring Sunfish Male 59
## 2 Spring Bass Male 14
## 3 Spring Shad Male 189
## 4 Clear Sunfish Male 63
## 5 Clear Bass Male 7
## 6 Clear Shad Male 126
## 7 Spring Sunfish Female 72
## 8 Spring Bass Female 21
## 9 Spring Shad Female 138
## 10 Clear Sunfish Female 95
## 11 Clear Bass Female 32
## 12 Clear Shad Female 98
It worked!
The data output looks logical and is in the desired [row,column] format. As a function, melt()
defaults to a transposition where the last column in the data becomes the “values” - that is, the data, which here is the frequencies. The column next to last column becomes “variable” (here, the osprey sex), and the first two columns in sequence become the “id variables” Lake and FishSpp on which the pivot occurs.
Can I alter how the pivot occurs?
You can alter how the pivot occurs by using the option id =c()
, where c(Var2Use)
is the variables, and their sequence, on which the pivot will occur. Let’s compare by switching the sequence of variable Lake and FishSpp. Output now has FishSpp in the first column and Lake in the second.
melt(f1mod, id =c("FishSpp", "Lake")) # reshape w/different sequence & examine
## FishSpp Lake variable value
## 1 Sunfish Spring Male 59
## 2 Bass Spring Male 14
## 3 Shad Spring Male 189
## 4 Sunfish Clear Male 63
## 5 Bass Clear Male 7
## 6 Shad Clear Male 126
## 7 Sunfish Spring Female 72
## 8 Bass Spring Female 21
## 9 Shad Spring Female 138
## 10 Sunfish Clear Female 95
## 11 Bass Clear Female 32
## 12 Shad Clear Female 98
Basic calls related to data reshaping are:
t()
=> Naive transpositionmelt()
=> Reshape (“Pivot”) cross-classified dataAn important option for melt()
is:
id =c(Vars2Use)
=> Controls the variables used, and their sequence, in the reshapeData for this exercise are in: ../baseR-V2016.2/data/exercise_dat.
The data file rodent_seedchoice.csv is a spreadsheet-based representation of seed preferences by rodents. Variables diff_ are the difference between seeds chosen and seeds presented, where “_" are the 3 seed choices. This is a commonly found “table” format data structure that must be reshaped for analysis in R.