MODULE 4 Data Manipulation in R

baseR-V2016.2 - Data Management and Manipulation using R

Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016


Objective:

  • Explore commonly applied types of data manipulation in R used in ecological studies



Module #4 - Data Manipulation in R


  • 4.1 - Mathematical, logical operators and functions
  • 4.2 - New variables and objects in R
  • 4.3 - Extracting subsets of data
  • 4.4 - Splitting data
  • 4.5 - Data control structures - the conditional
  • 4.6 - Merging and combining data
  • 4.7 - Sorting, ranking, and ordering
  • 4.8 - Looping in R
  • 4.9 - Functions in R

What is Meant by Data Manipulation in R?


Data manipulation serves many different purposes, all of which are related to your research objectives. Some of the more common purposes in ecological analysis include:

Modifying or combining so-called “raw” data to create new variables.
These operations typically require logical and/or mathematical operators, and are often based on a formula or known arithmetic relationship.

  • EXAMPLE: Transforming 0-360\(^\circ\) aspect measurements to a 0-1 (dry to wet) scale
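
As a minimal sketch of such a transformation, the code below rescales aspect with a cosine so that 0 is the driest aspect and 1 the wettest. The choice of 30 degrees as the wettest aspect, and the names aspect_deg, wet_aspect, and aspect01, are assumptions made for illustration only; this is one of several possible formulas, not necessarily the one used later in this Module.

```r
# Hypothetical aspect transform: 0 = driest aspect, 1 = wettest aspect
# ASSUMPTION: the wettest aspect is 30 degrees; change wet_aspect for your system
aspect_deg <- c(30, 120, 210, 300, 360)   # made-up aspect measurements
wet_aspect <- 30

# convert the offset from the wettest aspect to radians
aspect_rad <- (aspect_deg - wet_aspect) * pi / 180

# cosine ranges from -1 to 1; shift and halve to get a 0-1 scale
aspect01 <- (1 + cos(aspect_rad)) / 2
round(aspect01, 3)
```

Under this assumption an aspect of 30 degrees maps to 1 and the opposite aspect, 210 degrees, maps to 0.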

Extracting subsets of data meeting some condition.
Here, selections of rows and/or columns are made from a data object. These extractions can be simple, such as selecting certain columns (variables), or more complex, such as selecting observations for a variable only if they meet a specified condition.

  • EXAMPLE: Selecting Var1, Var3 and Var5 from a data object containing Var1,\(\dots\),Var10
  • EXAMPLE: Selecting Row10 to Row20 from data object of Row1 to Row100

or selections based on conditionals

  • EXAMPLE: Selecting Var1 (Sex) where Sex=Male, and Var3 (Age) where Age $$5
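
The three kinds of extraction above might look like the following sketch. The data objects dat and animals are invented for illustration, and the Age condition (Age > 5) is an assumed comparison chosen only to show the syntax.

```r
set.seed(1)
# Hypothetical data object: 100 rows, variables Var1 through Var10
dat <- as.data.frame(matrix(rnorm(1000), nrow = 100, ncol = 10))
names(dat) <- paste0("Var", 1:10)

cols <- dat[, c("Var1", "Var3", "Var5")]  # selected columns only
rows <- dat[10:20, ]                      # rows 10 through 20, all columns

# Hypothetical data object for a conditional selection
animals <- data.frame(Sex = c("Male", "Female", "Male", "Male"),
                      Species = c("sppAA", "sppBB", "sppAA", "sppCC"),
                      Age = c(3, 7, 6, 9),
                      stringsAsFactors = FALSE)

# ASSUMPTION: the Age condition is "greater than 5"; substitute your own
sel <- animals[animals$Sex == "Male" & animals$Age > 5, c("Sex", "Age")]
sel
```

Square brackets take the row condition before the comma and the column selection after it; leaving either side empty keeps all rows or all columns.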

The splitting of data into logical subgroups on which separate analyses are to be performed.
These splits require grouping variables measured on an ordinal or nominal scale.

  • EXAMPLE: Mean of Var3 (Weight) by Var1 (Sex=[Male, Female]) and Var2 (Species=[sppAA, sppBB, sppCC])
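
A minimal sketch of this kind of split, using an invented animals data object, is shown below. Both tapply() and aggregate() are base R; which one is more convenient depends on whether a table or a data frame is wanted as output.

```r
# Hypothetical data: two observations per Sex x Species combination
animals <- data.frame(
  Sex     = rep(c("Male", "Female"), times = 6),
  Species = rep(c("sppAA", "sppBB", "sppCC"), each = 4),
  Weight  = c(10, 8, 12, 10, 20, 16, 22, 18, 30, 24, 32, 26),
  stringsAsFactors = FALSE
)

# mean Weight for each Sex x Species subgroup, returned as a table
wt_tab <- tapply(animals$Weight, list(animals$Sex, animals$Species), mean)
wt_tab

# the same means returned as a data frame, one row per subgroup
grp_means <- aggregate(Weight ~ Sex + Species, data = animals, FUN = mean)
grp_means
```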

Using a common variable to merge or combine separate data objects into a single data object.
For a merge to work properly there must be one or more variables in common among the data objects.

  • EXAMPLE: Use the common variable LocID in data objects X1=(LocID, Var1, Var2) and X2=(LocID, Var3, \(\dots\), Var10) to merge them into data object X3=(LocID, Var1,\(\dots\),Var10)

A combine, in contrast, simply adds rows from different data objects together.

  • EXAMPLE: Combine data object X1=(Row1,\(\dots\),Row10) with data object X2=(Row11,\(\dots\),Row20) to create X3=(Row1,\(\dots\),Row20)
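
Both operations can be sketched with base R's merge() and rbind(). All data objects and values below are invented; R1, R2, and R3 are hypothetical names used here for the row-combining case so they do not overwrite the merge objects.

```r
# Merge: two hypothetical data objects sharing the variable LocID
X1 <- data.frame(LocID = 1:4,
                 Var1 = c(2.1, 3.4, 1.8, 2.9),
                 Var2 = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)
X2 <- data.frame(LocID = c(4, 2, 3, 1),      # note the different row order
                 Var3 = c(40, 20, 30, 10))

X3 <- merge(X1, X2, by = "LocID")            # rows are matched on LocID
X3

# Combine: stack rows from objects that have the same variables
R1 <- data.frame(ID = 1:10,  Val = 101:110)
R2 <- data.frame(ID = 11:20, Val = 111:120)
R3 <- rbind(R1, R2)
dim(R3)
```

Because merge() matches on the values of LocID, the row order of the inputs does not matter; rbind() instead requires that the column names agree.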

The sorting, ordering, and ranking of variables in data objects.
These are processes that reconfigure observations in data objects based on specified criteria.

  • EXAMPLE: Sorting Var1 (Habitat=[Hab1,Hab2,Hab3]) by alpha-numeric sequence
  • EXAMPLE: Ordering Var1 (Habitat=[Hab1,Hab2,Hab3]) by Var2 (Quality) and Var3 (Quantity)
  • EXAMPLE: Assigning a number (rank) to the ranked Habitats (Var1) by Quality (Var2)
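
The difference between the three operations is easiest to see side by side. The sketch below uses an invented hab data object; treating a higher Quality value as the better habitat (rank 1) is an assumption made for the example.

```r
# Hypothetical habitat data
hab <- data.frame(Habitat  = c("Hab3", "Hab1", "Hab2"),
                  Quality  = c(2, 3, 2),
                  Quantity = c(50, 10, 40),
                  stringsAsFactors = FALSE)

# sort() returns the values of one variable in alpha-numeric sequence
sorted_hab <- sort(hab$Habitat)

# order() returns row positions, so whole rows can be rearranged;
# here by Quality first, with ties broken by Quantity
hab_ord <- hab[order(hab$Quality, hab$Quantity), ]

# rank() assigns a number to each observation without moving any rows
# ASSUMPTION: the highest Quality receives rank 1, hence the minus sign
hab$QualRank <- rank(-hab$Quality, ties.method = "min")
hab
```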

Simple programming using loops.
A loop runs the same block of code an established number of times. Often this involves analytical operations repeated consistently on many data objects that are identical in structure but differ in their input data.

  • EXAMPLE: One hundred data objects DF1,\(\dots\),DF100, each with Var1,\(\dots\),Var10, and where a model (e.g., simple regression) is constructed for each data object
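
A loop of this kind might be sketched as follows. To keep the example short it simulates 5 data objects with two variables each rather than 100 objects with ten variables; the list df_list, the simulated values, and the model Var1 ~ Var2 are all assumptions made for illustration.

```r
set.seed(42)
n_df <- 5                                  # use 100 for the full example

# build hypothetical data objects DF1, ..., DF5 and hold them in a list
df_list <- vector("list", n_df)
for (i in seq_len(n_df)) {
  x <- rnorm(20)
  df_list[[i]] <- data.frame(Var1 = 2 * x + rnorm(20, sd = 0.1), Var2 = x)
}
names(df_list) <- paste0("DF", seq_len(n_df))

# fit the same simple regression to each data object and keep the slope
slopes <- numeric(n_df)
for (i in seq_len(n_df)) {
  fit <- lm(Var1 ~ Var2, data = df_list[[i]])
  slopes[i] <- coef(fit)["Var2"]
}
round(slopes, 2)
```

Holding the data objects in a list lets the loop index them by position instead of referring to 100 separate object names.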

Simple programming using functions.
As with loops, functions can be built to apply the same analytical operations consistently to variables in many data objects, even when those objects differ in structure and in the variables they contain. Unlike a loop, which runs for a known number of iterations, a function is written once and then called whenever it is needed.

  • EXAMPLE: Data objects DF1,\(\dots\),DF100, each with different subsets of Var1,\(\dots\),Var100, and where an analysis (Mean, 2SD, N) is repeated on the variables in the data objects
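
One way such a function could look is sketched below. The name summarize_vars and the data objects DF1 and DF2 are hypothetical; the function simply reports the mean, two standard deviations, and the number of non-missing observations for every numeric variable it finds.

```r
# Hypothetical helper: Mean, 2SD, and N for each numeric variable in df
summarize_vars <- function(df) {
  num <- df[sapply(df, is.numeric)]        # keep numeric variables only
  data.frame(
    Variable = names(num),
    Mean = sapply(num, mean, na.rm = TRUE),
    SD2  = 2 * sapply(num, sd, na.rm = TRUE),
    N    = sapply(num, function(v) sum(!is.na(v))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# two made-up data objects with different variables
DF1 <- data.frame(Var1 = c(1, 2, 3, 4, 5), Var2 = c(10, 20, 30, NA, 50))
DF2 <- data.frame(Var7 = c(2, 4, 6), Site = c("a", "b", "c"))

s1 <- summarize_vars(DF1)
s2 <- summarize_vars(DF2)   # the non-numeric Site variable is skipped
s1
s2
```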

Aspects of these types of data manipulation in R are explored in the nine elements comprising this Module.


Some Initialization …


Data from Exercise #6 (objects f1, m1, m2, m3, m4, t1, and w1) were saved as mod3data.RData. Some of these objects will be needed for Module 4, so load them into your workspace first.

# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
  getwd()  # in correct directory ?  
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
  list.files(pattern = "\\.RData$") # is it there ? (pattern is a regular expression)
## [1] "mod3data.RData"
  load("mod3data.RData")  # load it
  ls()  # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"

If the objects are not there, or you did not save an .RData from Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.


START MODULE 4

