MODULE 3.2 Data Mode and Class in R

baseR-V2016.2 - Data Management and Manipulation using R

Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016


Objective:

  • Understand the different types of R objects, with an emphasis on mode and class



Let’s begin with …

Some background on the distinctions between R data mode and class


R distinguishes among data (R objects) in many ways. Although there are many different types of R objects, the two most commonly encountered distinctions are class and mode. Many R-based analyses require that objects be of a specified class or mode; incorrect specification can lead to errors in analysis.

Nomenclature for data, and the structures that contain data, can be confusing in R. Much of the confusion occurs because the same terms are applied to different data structures and are used differently across analyses. In the case of class and mode, R often reports the same value for both; for example, a character string has class = character and mode = character.


Data mode vs. data class


Basic modes of R data objects include: numeric, complex, character, logical, list, and function. For an ecologist, the modes of R data we deal with most often are:

Mode                         Examples
Numeric (integer)         => 1, 27
Numeric (floating point)  => 3.14, 0.0067
Character                 => “A”, “pimo”, “1”
Logical                   => TRUE, FALSE
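
As a minimal sketch of the examples listed here, the base R functions mode() and typeof() can be used to ask R what it thinks a value is. The variable names are hypothetical, and the L suffix (which requests integer storage) is an addition for illustration; without it, R stores a bare number such as 27 as floating point.

```r
# Hypothetical example values, one vector per row of the mode table.
n_int <- c(1L, 27L)          # the L suffix requests integer storage
n_dbl <- c(3.14, 0.0067)     # floating point
txt   <- c("A", "pimo", "1") # quotes make "1" text, not a number
flag  <- c(TRUE, FALSE)

# mode() reports the mode of each object.
mode(n_int)    # "numeric"
mode(n_dbl)    # "numeric"
mode(txt)      # "character"
mode(flag)     # "logical"

# typeof() reports how R stores the values internally,
# which is where integer and floating point differ.
typeof(n_int)  # "integer"
typeof(n_dbl)  # "double"
typeof(27)     # "double" (no L suffix, so not stored as an integer)
```

Both kinds of number share mode = numeric; only the storage type separates them.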

Distinctions based on mode relate to characteristics of your data (if it helps, think of the measurement associated with a variable).

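Class is a property of the R object itself. Commonly encountered kinds of objects are: scalar, vector, matrix, and data.frame. (Strictly, R has no separate scalar class: a scalar is a vector of length 1, and class() applied to a plain vector reports the type of its contents, such as numeric or character.) A vector or matrix holds data of a single mode, whereas a data.frame can mix modes from column to column. If it sounds confusing, it is and can be. It sometimes helps to think of an R class as representing different types of data objects.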
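
The short sketch below, using hypothetical object names, suggests how vectors and matrices behave with respect to mode. In base R a plain vector or matrix holds a single mode, so combining values of different modes in one vector causes R to convert them to a common mode.

```r
v     <- c(1, 27, 3.14)         # numeric vector
m     <- matrix(1:6, nrow = 2)  # 2 x 3 matrix of integers
mixed <- c(1, "A", TRUE)        # different modes combined in one vector

class(v)       # "numeric"
is.matrix(m)   # TRUE
mode(m)        # "numeric"

# R converts every element of the mixed vector to character.
mixed          # "1" "A" "TRUE"
mode(mixed)    # "character"

# A single value (a "scalar") is just a vector of length 1.
is.vector(5)   # TRUE
length(5)      # 1
```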

For example, consider a 4-column spreadsheet. The spreadsheet can be of class = data.frame, where the mode of Col1 = numeric (integer), Col2 = character, Col3 = numeric (floating point), and Col4 = logical. Similarly, a scalar could be of mode = character (“1”) or mode = numeric (1).

Col1  Col2  Col3  Col4
1     A     1.23  T
2     B     4.56  F
3     C     7.89  F
4     D     1.01  F
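
The spreadsheet above can be rebuilt in R as a rough sketch. The object name d is hypothetical, and stringsAsFactors = FALSE is an assumption added so that Col2 stays as character; in the R versions this module was tested on, data.frame() would otherwise convert text columns to factors.

```r
# Rebuild the 4-column spreadsheet as a data.frame.
d <- data.frame(
  Col1 = 1:4,                          # numeric (integer)
  Col2 = c("A", "B", "C", "D"),        # character
  Col3 = c(1.23, 4.56, 7.89, 1.01),    # numeric (floating point)
  Col4 = c(TRUE, FALSE, FALSE, FALSE), # logical
  stringsAsFactors = FALSE             # keep Col2 as character, not factor
)

class(d)          # "data.frame"
sapply(d, mode)   # Col1 "numeric", Col2 "character", Col3 "numeric", Col4 "logical"

# The same value stored two ways: as text and as a number.
mode("1")         # "character"
mode(1)           # "numeric"
```

Here class() describes the object as a whole, while sapply(d, mode) reports the mode of each column in turn.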

More specialized data classes (e.g., factor, date and time, missing value NA) are discussed later in Module #3.


Why should you care about class and mode?


Many R functions and packages require columns (variables) to be of a particular mode or class before analysis can proceed.

  • EXAMPLE: ANOVA requires each grouping variable to be of class = factor
  • EXAMPLE: Count data should be of mode = numeric (integer)
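
To illustrate the ANOVA example, here is a small sketch using made-up data; the data frame plots, its columns, and all of its values are hypothetical. Note that R does not necessarily stop with an error when the class is wrong. With site codes stored as numbers, aov() fits a single slope across the codes rather than comparing three groups.

```r
# Hypothetical data: biomass measured in four plots at each of three sites.
plots <- data.frame(
  site    = rep(c(1, 2, 3), each = 4),
  biomass = c(4.1, 3.8, 4.4, 4.0,
              5.2, 5.6, 5.1, 5.4,
              4.6, 4.9, 4.5, 4.7)
)

# Site codes stored as numbers: R treats site as a continuous variable.
class(plots$site)                 # "numeric"
fit_numeric <- aov(biomass ~ site, data = plots)
length(coef(fit_numeric))         # 2 (intercept + one slope)

# Site stored as a factor: R treats site as three separate groups.
plots$site_f <- factor(plots$site)
class(plots$site_f)               # "factor"
fit_factor <- aov(biomass ~ site_f, data = plots)
length(coef(fit_factor))          # 3 (intercept + two site contrasts)
```

The two fits answer different questions, which is why it pays to confirm the class of a grouping variable before running the analysis.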

Mixing incompatible classes of data in an R analysis can lead to frustrating errors. We will learn how to check for and change (coerce) class and mode in Module 3.6.


END MODULE 3.2

