Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016
Module #4 - Data Manipulation in R
Data manipulation serves many different purposes, all of which are related to your research objectives. Some of the more common purposes in ecological analysis include:
Modifying or combining so-called “raw” data to create new variables.
These operations typically require use of logical and/or mathematical operators, and are often based on a formula or known arithmetic relationship.
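A minimal sketch of creating new variables from "raw" data. It assumes a small hypothetical data frame of tree diameters (`trees`, `dbh_cm`); neither is part of the Module 3 objects. One new variable comes from a known formula, basal area = π(DBH/200)² in m² with DBH in cm. The other comes from a logical operator.

```r
# Hypothetical tree data (not from mod3data.RData): DBH in cm for five trees
trees <- data.frame(tree = 1:5,
                    dbh_cm = c(12.5, 30.2, 45.0, 8.1, 22.7))

# New variable from a known formula: basal area (m^2) = pi * (DBH / 200)^2
trees$ba_m2 <- pi * (trees$dbh_cm / 200)^2

# New variable from a logical operator: TRUE for trees at least 20 cm DBH
trees$large <- trees$dbh_cm >= 20

trees
```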
Extracting subsets of data meeting some condition.
Here, selections of rows and/or columns are made from a data object. These extractions can be simple (just certain columns, i.e., variables) or more complex, such as selecting the observations of a variable that meet a specified condition.
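A short sketch of both kinds of extraction, using hypothetical bird-count data (`birds`, `site`, `species`, `count` are illustrative names only). Square-bracket indexing and `subset()` are two common base-R routes to the same result.

```r
# Hypothetical bird count data
birds <- data.frame(site    = c("A", "A", "B", "B", "C"),
                    species = c("robin", "wren", "robin", "jay", "wren"),
                    count   = c(4, 0, 7, 2, 3))

# Simple extraction: just certain columns (variables)
birds[, c("species", "count")]

# Conditional extraction: only rows where count exceeds 2
birds[birds$count > 2, ]

# The same row condition with subset(), keeping only two columns
subset(birds, count > 2, select = c(site, count))
```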
The splitting of data into logical subgroups on which separate analyses are to be performed.
These splits require a grouping variable measured on an ordinal or nominal scale.
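A minimal sketch of splitting on a nominal variable, assuming hypothetical plot data with a `habitat` factor. `split()` returns one data frame per group so that each can be analyzed separately. `tapply()` combines the split and the summary in one call.

```r
# Hypothetical plot data; habitat is a nominal grouping variable
plots <- data.frame(habitat = factor(c("forest", "grass", "forest", "grass", "wetland")),
                    biomass = c(12.1, 4.3, 15.8, 5.0, 9.4))

# split() returns a list with one data frame per habitat level
by_habitat <- split(plots, plots$habitat)
names(by_habitat)

# Apply the same analysis (here, a mean) to each subgroup separately
sapply(by_habitat, function(d) mean(d$biomass))

# tapply() splits and summarizes in a single step
tapply(plots$biomass, plots$habitat, mean)
```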
Using a common variable to merge or combine separate data objects into a single data object.
For a merge to work properly, the data objects must share one or more common variables.
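A sketch of merging on a common variable, assuming two hypothetical data objects that share a `site` column. By default, `merge()` keeps only the rows whose key appears in both objects. With `all = TRUE`, it keeps every row and fills unmatched cells with `NA`.

```r
# Two hypothetical data objects sharing the common variable "site"
counts  <- data.frame(site = c("A", "B", "C"), count = c(4, 7, 3))
habitat <- data.frame(site = c("A", "B", "D"), cover = c("forest", "grass", "wetland"))

# Default: only sites found in both objects (A and B)
inner <- merge(counts, habitat, by = "site")
inner

# all = TRUE: every site from either object, with NA where unmatched
outer <- merge(counts, habitat, by = "site", all = TRUE)
outer
```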
The sorting, ordering, and ranking of variables in data objects.
These are processes that reconfigure observations in data objects based on specified criteria.
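The three operations are easy to confuse, so here is a small comparison on a hypothetical vector `x` that contains a tie. `sort()` returns the values, `order()` returns the positions that would sort them, and `rank()` returns each value's rank. Only `order()` is used to rearrange the rows of a data frame.

```r
x <- c(5.2, 1.8, 9.4, 1.8, 3.3)

sort(x)    # 1.8 1.8 3.3 5.2 9.4  (the values, ascending)
order(x)   # 2 4 5 1 3            (positions that would sort x)
rank(x)    # 4.0 1.5 5.0 1.5 3.0  (ties share the average rank by default)

# Reorder the rows of a (hypothetical) data frame, largest value first
df <- data.frame(site = c("A", "B", "C", "D", "E"), value = x)
df[order(df$value, decreasing = TRUE), ]
```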
Simple programming using loops.
A loop is a means to implement the same set of code some established number of times. Often this involves analytical operations consistently repeated on many data objects that are identical in structure but differ in their input data.
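A minimal sketch of a `for` loop. It assumes three hypothetical data objects with identical structure (columns `temp` and `rich`) held in a list, with simulated values. The loop runs the same analysis, a correlation, once per object, which is a known number of iterations.

```r
# Three hypothetical, identically structured data objects (simulated values)
set.seed(1)
sites <- list(siteA = data.frame(temp = rnorm(10, 15), rich = rpois(10, 8)),
              siteB = data.frame(temp = rnorm(10, 18), rich = rpois(10, 6)),
              siteC = data.frame(temp = rnorm(10, 12), rich = rpois(10, 10)))

# Pre-allocate storage for one result per data object
r_vals <- numeric(length(sites))
names(r_vals) <- names(sites)

# The same code repeated a known number of times (once per object)
for (i in seq_along(sites)) {
  d <- sites[[i]]
  r_vals[i] <- cor(d$temp, d$rich)
}
r_vals
```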
Simple programming using functions.
As with a loop, a function packages analytical operations so they are applied consistently, but a function can be applied to variables in data objects that differ in structure and in the variables they contain. A loop runs a known number of iterations. A function, once built, can be called whenever and as often as it is needed.
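A small sketch of a user-written function. The name `summarize_var` and the two data frames are hypothetical. The function takes any data frame and a variable name, so it works on objects whose structure and variable names differ.

```r
# Hypothetical function: n, mean, and SD of one numeric variable
summarize_var <- function(data, var) {
  x <- data[[var]]
  c(n    = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE))
}

# Two hypothetical data objects that differ in structure and variables
fish  <- data.frame(length_mm = c(120, 135, NA, 150))
soils <- data.frame(plot = 1:3, pH = c(5.6, 6.1, 6.4))

summarize_var(fish, "length_mm")   # n = 3, mean = 135, sd = 15
summarize_var(soils, "pH")
```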
Aspects of these types of data manipulation in R are explored in the nine elements comprising this Module.
Data from Exercise #6 (objects f1, m1, m2, m3, m4, t1, and w1) were saved as mod3data.RData. Some of these objects will be needed for Module 4, so load them into your workspace first.
# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
getwd() # in correct directory ?
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
list.files(pattern = "\\.RData$") # is it there ? (pattern is a regular expression)
## [1] "mod3data.RData"
load("mod3data.RData") # load it
ls() # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"
If the objects are not there, or you did not save an .RData file from Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.
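Rather than comparing the `ls()` output by eye, you can let R report any missing objects. The sketch below takes the expected names from the object list above. `setdiff()` returns the expected names that are not found in the workspace.

```r
# Objects Module 4 expects (from the Exercise #6 list)
needed <- c("f1", "m1", "m2", "m3", "m4", "t1", "w1")

# Any expected names not present in the current workspace
missing_objs <- setdiff(needed, ls())

if (length(missing_objs) > 0) {
  message("Missing objects: ", paste(missing_objs, collapse = ", "))
} else {
  message("All Module 4 objects are loaded.")
}
```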