MODULE 3 Data Management in R

baseR-V2016.2 - Data Management and Manipulation using R

Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016


Objective:

  • Data in R can be of many different types and forms, and these differences must be understood before working in R





Elements of Module #3 - Data Management in R - include:

  • 3.1 - Data cautions and caveats
  • 3.2 - Data class and mode
  • 3.3 - Data distinctions
  • 3.4 - Data input
  • 3.5 - Accessing variables
  • 3.6 - Coercing data
  • 3.7 - Reshaping data
  • 3.8 - Data checking
  • 3.9 - Data output

What is meant by data management in R?


Data come in several measurement scales: nominal, ordinal, and interval/ratio. In R, data on each scale are typically stored as a particular class, such as integer or numeric (floating point) for interval/ratio data, or character or factor for nominal data, among others. Analyses in R depend on knowing how the scale of your data relates to R data class, and converting data from one class to another is often necessary before an analysis can proceed.
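As a minimal sketch of how scale relates to class, the lines below build a few small vectors and inspect them with class(). All object names and values here are hypothetical illustrations, not data from this module.

```r
# Hypothetical example vectors; names and values are for illustration only
counts  <- c(3L, 7L, 12L)            # whole-number counts stored as integer
heights <- c(1.5, 2.25, 3.0)         # measurements stored as numeric (floating point)
codes   <- c("spp126", "spp202")     # nominal labels stored as character

class(counts)    # "integer"
class(heights)   # "numeric"
class(codes)     # "character"

# Numbers that arrive as text must be converted before arithmetic
raw  <- c("10", "20", "30")          # character, e.g., after a poorly typed import
vals <- as.numeric(raw)              # convert character to numeric
mean(vals)                           # 20
```

Calling mean(raw) directly would return NA with a warning, which is one reason class should be checked before an analysis.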

Data management also ensures proper assignment of data class, such as factor (e.g., sex) with levels (male, female), or character (e.g., spp126 as a code for Juniperus monosperma), so that the data represent logical interpretive groupings. Again, assigning the proper data class is fundamental to analyses in R.
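The factor and character cases described above might look like the following sketch. The vectors are made up for illustration, and the second species code (spp065) is a hypothetical placeholder.

```r
# Hypothetical data: sex recorded for four individuals
sex <- factor(c("male", "female", "female", "male"),
              levels = c("male", "female"))   # set the level order explicitly
levels(sex)      # "male" "female"
table(sex)       # counts per level: male 2, female 2

# Species codes kept as character; spp065 is a made-up code
spp <- c("spp126", "spp126", "spp065")
is.character(spp)    # TRUE

# Convert to factor when the codes should act as grouping levels
spp_f <- factor(spp)
nlevels(spp_f)       # 2
```

Whether a code is better kept as character or converted to factor depends on the analysis; the conversion itself is a single call in either direction.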

Understanding the data input source (e.g., MS Excel, extraction from a GIS), and how each external data source handles data scale and class (e.g., how missing values are coded), also affects R analyses.
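One way this can play out is sketched below, assuming a source that codes missing values as -9999 or as blank fields. The column names, values, and the -9999 code are all assumptions made for illustration; the text is read from an in-memory character vector rather than a real file.

```r
# Hypothetical CSV content; -9999 and a blank field stand in for missing values
csv_lines <- c("plot,elev,spp",
               "1,1520,spp126",
               "2,-9999,spp126",
               "3,1710,")

# Tell read.csv() which strings should become NA on import
dat <- read.csv(text = csv_lines,
                na.strings = c("", "NA", "-9999"),
                stringsAsFactors = FALSE)

str(dat)               # check the class assigned to each column
colSums(is.na(dat))    # number of missing values per column
```

Without the na.strings argument, -9999 would be imported as a valid elevation and would silently distort any summary of that column.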

A common data management issue, for example, is how best to import data from different sources and standardize them for analysis in R. In addition, it is often necessary to “reshape” data, such as transposing rows and columns.
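Two simple forms of reshaping can be sketched in base R: transposing with t(), and converting a "wide" table to a "long" one with reshape(). The objects, column names, and values below are hypothetical.

```r
# Hypothetical 2 x 3 matrix: plots in rows, variables in columns
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("plot1", "plot2"), c("a", "b", "c")))
tm <- t(m)       # transpose: variables in rows, plots in columns
dim(tm)          # 3 2

# Hypothetical wide table: one count column per year
wide <- data.frame(plot = c(1, 2), y2015 = c(5, 8), y2016 = c(6, 9))

# Wide to long: one row per plot-by-year combination
long <- reshape(wide, direction = "long",
                varying = c("y2015", "y2016"),
                v.names = "count",
                timevar = "year", times = c(2015, 2016),
                idvar   = "plot")
nrow(long)       # 4
```

The long layout is often the one expected by modeling and plotting functions, while the wide layout is common in spreadsheets.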

Aspects of data management are explored in the nine elements of this Module.


START MODULE 3

