Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016
Many analytical procedures in R require that objects be of a certain data class or mode. Data imported from external sources often end up with a class or mode that is not compatible with the desired R function.
For example, an analysis of variance may require a variable to be of class factor when the data were read into the data object as class numeric. While we learned how to control some of this during data import (Module 3.4), situations often arise where data characteristics need to be changed after the initial import.
In R, this process is called coercion.
Data from Exercise #6 (objects f1, m1, m2, m3, m4, t1, and w1) were saved as mod3data.RData. Some of these objects will be needed, so load them into your workspace first.
# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
getwd() # in correct directory ?
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
list.files(pattern = ".RData") # is it there ?
## [1] "mod3data.RData"
load("mod3data.RData") # load it
ls() # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"
If the objects are not there, or you did not save an .RData file from Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.
Data imported from external sources (e.g., with read.csv()) are assigned a specific R class depending on the nature of the element values (i.e., the recorded measurements). Once imported, the data will have a mode as well.
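To see this class assignment at work without an external file, the sketch below reads a small CSV from a string through the text= argument of read.csv(). The column names and values are made up for illustration. stringsAsFactors = FALSE is set explicitly because the default treatment of text columns changed from factor to character in R 4.0.0, after the versions this module was tested on.

```r
# Hypothetical inline CSV; text= lets read.csv() read from a string instead of a file
dat <- read.csv(text = "site,count,length,species\nA,3,12.5,Bass\nB,5,14.1,Shad",
                stringsAsFactors = FALSE)  # keep text columns as character in any R version
sapply(dat, class)  # site: "character", count: "integer", length: "numeric", species: "character"
sapply(dat, mode)   # site: "character", count: "numeric", length: "numeric", species: "character"
```

Whole numbers became integer and decimals became numeric, while both share the mode numeric.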
Determining class:
The rules behind this class assignment are not always obvious, but the easiest way to check is to call class(DataObject) or class(DataObject$VariableName).
These calls return the class of the data object and the class of a single variable within it, respectively.
# assume object f1 from Exercise #6
class(f1) # class of data object
## [1] "data.frame"
names(f1) # names in data object
## [1] "FishSpp" "Male" "Female"
class(f1$FishSpp) # class of var1
## [1] "factor"
class(f1$Male) # class of var2
## [1] "integer"
class(f1$Female) # class of var3
## [1] "integer"
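Note that class() can return more than one class. An ordered factor, for example, has the classes "ordered" and "factor", so a test such as class(x) == "factor" returns one logical per class. The base function inherits() checks whether a given class is among them. A minimal sketch with a made-up variable named sizes:

```r
# Hypothetical ordered factor; it carries two classes
sizes <- factor(c("small", "large", "small"),
                levels = c("small", "large"), ordered = TRUE)
class(sizes)               # "ordered" "factor"
class(sizes) == "factor"   # FALSE TRUE: compared element by element
inherits(sizes, "factor")  # TRUE: "factor" is among the classes
```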
In addition to calls that return the object class, data objects can be queried using is.OfClass(), where OfClass is a known R class (see Module 3.2). Queries return a logical TRUE or FALSE. Some common queries are:
is.numeric() | is.character() | is.factor() | is.vector() | is.matrix() | is.data.frame()
# assume object f1 from Exercise #6
is.data.frame(f1) # is f1 a dataframe?
## [1] TRUE
is.vector(f1$FishSpp) # is f1$FishSpp a vector?
## [1] FALSE
is.numeric(f1$FishSpp) # is f1$FishSpp numeric?
## [1] FALSE
is.factor(f1$FishSpp) # is f1$FishSpp factor?
## [1] TRUE
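The FALSE returned by is.vector(f1$FishSpp) above can be surprising. is.vector() returns TRUE only when an object has no attributes other than names, and a factor carries levels and class attributes. A small sketch with hypothetical values:

```r
v <- c("Sunfish", "Bass", "Shad")
is.vector(v)                  # TRUE: a plain character vector
fac <- factor(v)
is.vector(fac)                # FALSE: the factor has extra attributes
names(attributes(fac))        # includes "levels" and "class"
is.vector(as.character(fac))  # TRUE: coercing back drops those attributes
```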
Let’s learn a shortcut …
Rather than testing each variable independently as before, use str() to determine the class of all variables in a data object at once. This is much simpler than querying each variable one by one.
# assume object f1 from Exercise #6
names(f1) # what are the variable names in f1?
## [1] "FishSpp" "Male" "Female"
f1 # examine data
## FishSpp Male Female
## 1 Sunfish 59 72
## 2 Bass 14 21
## 3 Shad 189 138
str(f1) # find variable classes in data object f1
## 'data.frame': 3 obs. of 3 variables:
## $ FishSpp: Factor w/ 3 levels "Bass","Shad",..: 3 1 2
## $ Male : int 59 14 189
## $ Female : int 72 21 138
In the example above, str() indicates that f1 is of class data.frame, that FishSpp is a factor (shown as Factor), and that Male and Female are integers (shown as int).
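A compact alternative to str(), not part of the original module, is sapply(), which applies class() to every variable and returns the results as a named vector. So that the block runs on its own, it first rebuilds a stand-in for f1 from the values printed above; factor() is used explicitly because data.frame() no longer converts text to factors by default in R 4.0.0 and later.

```r
# Stand-in for f1, rebuilt from the printed values (hypothetical copy)
f1 <- data.frame(FishSpp = factor(c("Sunfish", "Bass", "Shad")),
                 Male    = c(59L, 14L, 189L),   # the L suffix makes these integers
                 Female  = c(72L, 21L, 138L))
sapply(f1, class)  # FishSpp: "factor", Male: "integer", Female: "integer"
```

If a variable has more than one class (e.g., an ordered factor), sapply() may return a list or matrix instead of a simple character vector, so str() remains the more general check.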
Determining mode:
Just as with class(), you can use mode() to ascertain the R mode of data objects and the elements they contain.
# assume object f1 from Exercise #6
mode(f1) # mode of data object
## [1] "list"
mode(f1$FishSpp) # mode of var1
## [1] "numeric"
mode(f1$Male) # mode of var2
## [1] "numeric"
mode(f1$Female) # mode of var3
## [1] "numeric"
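A factor reports the mode "numeric" because R stores it as integer codes that index a vector of levels. typeof(), levels(), and as.integer() make this visible; the species names below reuse those in f1, and the codes match the 3 1 2 shown by str(f1) earlier.

```r
fish <- factor(c("Sunfish", "Bass", "Shad"))
mode(fish)        # "numeric"
typeof(fish)      # "integer": values are stored as integer codes
levels(fish)      # "Bass" "Shad" "Sunfish" (sorted alphabetically by default)
as.integer(fish)  # 3 1 2: each value's position in levels(fish)
```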
Data may be coerced to an appropriate data class or mode using as.NewClassOrMode(), where some options for NewClassOrMode are:
as.numeric() | as.character() | as.factor() | as.vector() | as.matrix() | as.data.frame()
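Because factors are stored as integer codes, applying as.numeric() directly to a factor returns those codes, not the numbers the labels represent. Converting to character first, or indexing the numeric levels by the factor, recovers the intended values (the approach recommended in the R FAQ). The elevations below are hypothetical:

```r
# Hypothetical elevations that were imported as a factor
elev <- factor(c("3230", "1878", "3047"))
as.numeric(elev)                # 3 1 2: the internal codes, NOT the elevations
as.numeric(as.character(elev))  # 3230 1878 3047: coerce to character first
as.numeric(levels(elev))[elev]  # 3230 1878 3047: converts each level only once
```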
WARNING !!
as.NewClassOrMode() coercions are temporary: each call returns a coerced copy and leaves the original object unchanged. To make a coercion permanent, the variable in the R data object, or the data object itself, must be overwritten.
# an example of temporary vs. permanent coercion
x <- c("pied", "pimo"); x # simple vector of characters
## [1] "pied" "pimo"
is.factor(x) # are values in x factors?
## [1] FALSE
str(x) # what is class of x?
## chr [1:2] "pied" "pimo"
as.factor(x) # coerce into factors
## [1] pied pimo
## Levels: pied pimo
is.factor(x) # factor yet? why not?
## [1] FALSE
x <- as.factor(x); x # MUST overwrite x to make it permanent as factor
## [1] pied pimo
## Levels: pied pimo
is.factor(x) # finally !!
## [1] TRUE
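The same rule applies to a variable inside a data frame: the coerced column must be assigned back into the data object. A sketch with a hypothetical data frame named tags:

```r
tags <- data.frame(id = c("pied", "pimo"), stringsAsFactors = FALSE)
as.factor(tags$id)             # returns a factor copy; tags itself is unchanged
is.factor(tags$id)             # FALSE
tags$id <- as.factor(tags$id)  # overwrite the column to keep the change
is.factor(tags$id)             # TRUE
```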
Class can also be specified during data import using the colClasses = c() option within read.FileType() calls. Some basic, commonly used class options (see Module 3.2) are logical, integer, numeric, character, factor, and Date.
# set variable class during import; make sure you're in the data directory !!
m1c <- read.csv("m1.csv", header = TRUE,
colClasses = c("factor", "factor", "integer", "numeric", "numeric", "numeric"))
str(m1c) # check class in dataframe
## 'data.frame': 19 obs. of 6 variables:
## $ catno : Factor w/ 19 levels "17573","17574",..: 19 1 2 3 4 5 6 7 8 9 ...
## $ sex : Factor w/ 2 levels "F","M": 2 1 2 1 1 2 1 1 2 1 ...
## $ elev : int 1878 3230 3230 3047 3047 3047 3047 3047 3047 3047 ...
## $ conlen: num 22.4 NA NA 20.4 21.7 ...
## $ zygbre: num 12.6 12.4 11.8 11.4 12.1 ...
## $ lstiob: num 4.83 4.28 4.45 4.36 4.51 4.45 4.13 4.26 4.32 3.94 ...
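colClasses can also be tried without a file by reading CSV text from a string. One class is given per column, in column order. The site names, dates, and counts below are hypothetical; a "Date" column is converted with as.Date(), which by default expects dates written year-month-day (e.g., 2016-08-15).

```r
# Inline CSV with hypothetical values
obs <- read.csv(text = "site,date,count\nA,2016-08-15,3\nB,2016-08-16,5",
                colClasses = c("factor", "Date", "integer"))
str(obs)                    # site: Factor, date: Date, count: int
obs$date[2] - obs$date[1]   # Date columns support date arithmetic
```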
Although class can be assigned during import, doing so is cumbersome for datasets with many variables unless the import is a continuous, repeated task, such as weather data received daily. For one-time imports, you are better off using str() to examine the imported data and then adjusting class or mode as needed.
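When several variables need the same adjustment after import, they can be coerced together by applying a function over a subset of columns with lapply() and assigning the result back into the data frame. The data frame morph and its values are hypothetical, loosely modeled on m1:

```r
# Hypothetical data frame; catno and sex should be treated as grouping variables
morph <- data.frame(catno = c(17573L, 17574L, 17575L),
                    sex   = c("M", "F", "M"),
                    elev  = c(1878L, 3230L, 3047L),
                    stringsAsFactors = FALSE)
str(morph)                                              # examine classes first
to_factor <- c("catno", "sex")                          # columns to coerce
morph[to_factor] <- lapply(morph[to_factor], as.factor) # overwrite them in place
str(morph)                                              # catno and sex are now factors
```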
Basic calls related to class and mode identification and coercion are:
class() => Returns the class of an object or of a variable within it
str() => Returns the structure of an object, including the class of each variable
mode() => Returns the mode of an object or of a variable within it
rownames() => Returns row names in object in sequence
is.ClassOrMode() => Returns TRUE or FALSE depending on whether an object is of the queried class or mode
Data for this exercise are in: ../baseR-V2016.2/data/exercise_dat.
Examine the zapusmorph.csv file. The periods (“.”) represent “missing values” in the input .csv file.
read.FileType() call.
Two columns in the coyotebehav.csv data object are labeled “date” and “time.” These are Excel-based.
Challenge Exercise:
The data object built with fish_recapture.csv has a tag date and a recapture date by tag ID (an individual fish).
HINT: Each day will need to be subtracted from the next day in the date column. See if you can use some of what you learned in Module 3.5 to solve this problem.