MODULE 4.5 Conditional Statements in R

baseR-V2016.2 - Data Management and Manipulation using R

Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016


Objective:

  • Apply one or more conditionals to data objects and control operations of R statements



Let’s start (as we always do !!) with …

Some Learning Questions:

  • What common conditionals, and their syntax, are available in R?

  • Can I create nested conditionals?


Some Background


Calls that control the flow of analysis are called conditionals. Conditionals determine whether a specified condition is met, then direct subsequent analysis or action accordingly.


Some Initialization Before We Proceed …


Data from Exercise #6 (objects f1, m1, m2, m3, m4, t1, and w1) were saved as mod3data.RData. Some of these objects will be needed, so load them into your workspace first.

# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
  getwd()  # in correct directory ?  
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
  list.files(pattern = "\\.RData$") # is it there ? (pattern is a regular expression)
## [1] "mod3data.RData"
  load("mod3data.RData")  # load it
  ls()  # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"

If the objects are not there, or you did not save an .RData from Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.


Basic Conditionals and Syntax in R


Some basic conditionals:

Function    Syntax
if          if (condition) statement
if - else   if (condition) statement1 else statement2
ifelse      ifelse (condition, yes, no)
while       while (condition) statement
repeat      repeat (statement)
switch      switch (condition, statement1, statement2, … , statement)
for         for (variable in list) statement
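The looping calls in the table (for, while, repeat) are not covered in this module. As a rough orientation only, a minimal sketch of their syntax might look like the following; the object names total, n, and k are made up for illustration.

```r
# for: run the statement once for each element of a vector
total <- 0
for (i in 1:5) total <- total + i

# while: keep running the statement as long as the condition is TRUE
n <- 1
while (n < 100) n <- n * 2

# repeat: loop until an explicit break is reached
k <- 0
repeat {
  k <- k + 1
  if (k >= 3) break
}

total; n; k  # examine
```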

Conditional Statements - The if()


The conditional if(Condition) Statement executes one or more R statements when Condition is met. Multiple statements must be inside {} (curly brackets), each on its own line or separated by semicolons, as in {Statement1; Statement2}.
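A minimal sketch of both forms, using a single made-up value (temp is hypothetical, not part of the course data):

```r
# hypothetical single value; not from the course data
temp <- 31

# one statement: braces are optional
if (temp > 30) print("hot")

# several statements: wrap them in {}, one per line (or separated by ;)
if (temp > 30) {
  temp.f <- temp * 9 / 5 + 32  # convert Celsius to Fahrenheit
  print(temp.f)
}
```

Because temp holds exactly one value, the condition is a single TRUE or FALSE and no warning is returned.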

WARNING!!
The if() conditional can only test a single element, not a vector of elements. Consequently, when applied to a vector (column) of values in a data object, it checks whether the first - and only the first - observation meets Condition. If the first observation does not meet the condition, no statements are executed on any part of the data object. In the R versions this module was tested on, a non-fatal Warning is returned, indicating that although R executed the code, you should be concerned. From R 4.2.0 onward, the same code stops with an error instead.

# assume vector x as below
x <- c(1, 2, 3, 4, 5)
# goal is a return vector in which x is multiplied by 2 wherever x < 3
# expected values would be:  2, 4, 3, 4, 5 given only the 1st two elements are <3
if (x < 3) x * 2
## Warning in if (x < 3) x * 2: the condition has length > 1 and only the
## first element will be used
## [1]  2  4  6  8 10

Notice two things in the response. First is the warning: the condition has length > 1 and only the first element will be used. Because the first element was, indeed, < 3, all elements of x were multiplied by 2. Second, the code executed. It just gave you the wrong answer.

Try again, only this time make the condition > 3.

# goal is a return vector in which x is multiplied by 2 wherever x > 3
# expected values would be:  1, 2, 3, 8, 10 given only the last two elements are >3
if (x > 3) x * 2
## Warning in if (x > 3) x * 2: the condition has length > 1 and only the
## first element will be used

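This time notice no code was executed, because the first element in x did not meet the condition of being > 3. The warning can be avoided by nesting the condition inside the any() function, as in if(any(Condition)), which reduces the whole vector to a single TRUE or FALSE. Be aware that the statement is still applied to every element of x, so the result is still not the element-by-element answer we want.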

# using any() fxn: condition is TRUE if at least one element of x is > 3
# desired values are still:  1, 2, 3, 8, 10 given only the last two elements are >3
if (any(x > 3)) x * 2
## [1]  2  4  6  8 10

As we will see below, there are better options than nesting any() inside the if(), even though it is perfectly legitimate code.
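Two such options, sketched here on the same vector x, are ifelse() and logical indexing. The object y is introduced only so that x itself is left unchanged.

```r
x <- c(1, 2, 3, 4, 5)

# option 1: ifelse() tests each element of x in turn
ifelse(x > 3, x * 2, x)
## [1]  1  2  3  8 10

# option 2: logical indexing changes only the elements meeting the condition
y <- x  # work on a copy so x is left unchanged
y[y > 3] <- y[y > 3] * 2
y
## [1]  1  2  3  8 10
```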


Conditional Statements - The if() else


The if (Condition) Statement1 else Statement2 conditional executes different statements depending on whether Condition is met. Statement1 is executed only if Condition is met; otherwise Statement2 is executed. Multiple statements can be performed, but they must be enclosed in {} (curly brackets).

if() else suffers from the same logic constraint found in if(); it too evaluates only the first element in the column.

# assume data m1 from Exercise #6; data are skull characteristics of jumping mice
head(m1, 2)  # examine data
##   catno sex elev conlen zygbre lstiob
## 1  9316   M 1878  22.37  12.64   4.83
## 2 17573   F 3230     NA  12.38   4.28
# goal: use conditional to create new variable; convert elev to categories "hi" & "lo"
if (m1$elev < 3100) m1$elev.cat = "lo" else m1$elev.cat = "hi"
## Warning in if (m1$elev < 3100) m1$elev.cat = "lo" else m1$elev.cat = "hi":
## the condition has length > 1 and only the first element will be used
head(m1, 2)  # examine
##   catno sex elev conlen zygbre lstiob elev.cat
## 1  9316   M 1878  22.37  12.64   4.83       lo
## 2 17573   F 3230     NA  12.38   4.28       lo

Notice the same Warning … is returned, and that the assignment of the new category is not correct. The second observation should have elev.cat = “hi” given elev for that observation is not <3100.


Some Observations on Use of if() and if() else


if() and if() else should not be applied when the Condition being evaluated is a vector. They are best used only when the condition is a single element. In most applications the condition is an element not related to the data object being manipulated. We will see better examples of using if() and if() else in Modules 4.8 (Looping) and 4.9 (Functions).
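As a small illustration of a single-element condition, the sketch below checks the type of a vector before working with it. The vector elev and the message text are hypothetical; is.numeric() returns one TRUE or FALSE no matter how long the vector is, which is exactly what if() expects.

```r
# hypothetical elevations; not the m1 data
elev <- c(1878, 3230, 3230)

# is.numeric() returns a single TRUE/FALSE for the whole vector
if (is.numeric(elev)) {
  msg <- "elev is numeric; safe to summarize"
} else {
  msg <- "elev is not numeric; convert it first"
}
msg  # examine
```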

The ifelse() is better suited for dealing with vectors.


Conditional Statements - The ifelse()


The ifelse(Condition, Statement1, Statement2) conditional tests Condition for every element and returns one value per element: the corresponding value of Statement1 where Condition is met, and of Statement2 where it is not. Statement1 and Statement2 are usually values (or vectors of values) to return rather than blocks of code. Where Condition evaluates to NA, NA is returned.

Unlike if and if-else, ifelse works with vectors. Thus it can be applied to a column of data within a data object.

# assume data m1 from Exercise #6; data are skull characteristics of jumping mice
# conditional to create new variable; convert elev to categories "hi" & "lo"
m1$elev.cat <- ifelse(m1$elev < 3100, "lo", "hi")
head(m1, 2)  # examine
##   catno sex elev conlen zygbre lstiob elev.cat
## 1  9316   M 1878  22.37  12.64   4.83       lo
## 2 17573   F 3230     NA  12.38   4.28       hi
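ifelse() calls can also be nested when more than two categories are needed. The sketch below is one way this might look; the vector elev and the 2500 and 3100 cut-offs are hypothetical and chosen only for illustration.

```r
# hypothetical elevations; the 2500 and 3100 cut-offs are illustrative only
elev <- c(1878, 3230, 2750, NA)

# the "no" argument of the outer ifelse() is itself an ifelse()
elev.cat3 <- ifelse(elev < 2500, "lo",
                    ifelse(elev < 3100, "mid", "hi"))
elev.cat3  # examine; the NA elevation stays NA
```

Each element is tested against the first condition, and only those failing it are passed on to the second.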

Conditional Statements - The which()


which(Condition) returns the position(s) of the elements meeting Condition; when applied to a column of a data object, these are the row number(s). It is a useful call for extracting or identifying observations meeting the condition.

# assume data m1 from Exercise #6; data are skull characteristics of jumping mice
which(m1$elev >= 3200)  # return obs number of data row where elev>=3200
## [1] 2 3
m1[which(m1$elev >= 3200), ]  # return data rows where elev>=3200
##   catno sex elev conlen zygbre lstiob elev.cat
## 2 17573   F 3230     NA  12.38   4.28       hi
## 3 17574   M 3230     NA  11.75   4.45       hi
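One practical difference between which() and a bare logical condition shows up when the column contains NA. A minimal sketch, using a made-up data frame d rather than m1:

```r
# hypothetical data frame; values are illustrative only
d <- data.frame(id = 1:4, elev = c(1878, 3230, NA, 3300))

# logical index: the NA comparison adds a row filled with NA
d[d$elev >= 3200, ]

# which() drops the NA, so only rows 2 and 4 are returned
d[which(d$elev >= 3200), ]
```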

Conditional Statements - The switch()


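The switch(WhichStatement2Use, Statement1, Statement2, ... , StatementN) conditional applies a different Statement depending on the value of the switch condition: a value of 1 executes Statement1, a value of 2 executes Statement2, and so on. Note that a switch value outside the number of statements returns NULL invisibly, so nothing is printed.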

# assume data m1 from Exercise #6; data are skull characteristics of jumping mice
test <- 1  # condition that identifies statement to execute
switch(test, mean(m1$elev), median(m1$elev), sd(m1$elev)) # statement #1, mean()
## [1] 3000.789
test <- 2  # executes statement 2, median
switch(test, mean(m1$elev), median(m1$elev), sd(m1$elev)) # statement #2, median()
## [1] 3047
test <- 3  # executes statement 3, sd
switch(test, mean(m1$elev), median(m1$elev), sd(m1$elev)) # statement #3, sd()
## [1] 310.0377
test <- 4  # tries to execute statement 4; no statement 4 so NULL returned
switch(test, mean(m1$elev), median(m1$elev), sd(m1$elev)) # statement #4 ???

WARNING!!
Be careful and avoid a switch value for which there is no corresponding statement. The lack of an error return or warning makes it appear as though your code is operating perfectly, when in fact it is not.
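Two possible safeguards are sketched below, on a hypothetical vector elev rather than the m1 data. The first captures the result and tests it for NULL; the second uses character names for the statements, in which case an unnamed final argument acts as a default.

```r
elev <- c(2600, 3047, 3230)  # hypothetical values, not the m1 data

# numeric switch: capture the result and test for NULL
test <- 4
out <- switch(test, mean(elev), median(elev), sd(elev))
if (is.null(out)) message("no statement #", test, "; nothing was computed")

# character switch: an unnamed final argument acts as the default
stat <- "range"
out2 <- switch(stat,
               mean   = mean(elev),
               median = median(elev),
               "unknown statistic")
out2  # examine
```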


Summary of Module 4.5 functions


Basic calls related to conditionals are:

  • if() {} => Execute R statement(s) when condition is met
  • if() {} else {} => Execute statement 1 if condition met; if not execute statement 2
  • ifelse() => For each element of a vector, return value 1 if condition met; if not return value 2
  • which() => Find row(s) in data object that meet condition
  • switch() => Apply different Statement(s) depending on condition

Remember, the if() and if() else calls work only on single elements, not vectors. Use ifelse() if the application is to test conditions for each element in a vector.


Exercise #15


Data for this exercise are in: ../baseR-V2016.2/data/exercise_dat.

The dataset bearclawpoppy.csv consists of poppy presence/absence [P, A] and a set of associated environmental variables. The data must be ‘cleaned’ prior to building a predictive model, including:

  • All missing values (‘99’, ‘blank’) should be converted to NA
  • Condensing landform variables according to the rules:
    • (‘af’, ‘aflb’, ‘afub’, ‘afwb’, ‘afws’) = ‘af_type’
    • (‘bfr’, ‘bfrlb’, ‘bfrwb’, ‘bfrwbws’, ‘bfrws’) = ‘bf_type’
    • (‘lb’, ‘lbaf’, ‘lbub’, ‘lbwb’) = ‘lb_type’
    • (‘wb’, ‘wblb’, ‘wbub’, ‘wbws’, ‘ws’, ‘wslb’) = ‘ws_type’
    You MUST over-write existing var = LANDFORM
  • Creating new variable presab where:
    • plant (‘poppydead’, ‘poppyalive’, ‘poppy’) = 1;
    • plant (‘absence’) = 0; and
    • plant (‘buckwheat’) = NA

HINT: Think nested conditionals in conjunction with the %in% operator.
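To show the general pattern without working the exercise, here is a minimal sketch on an unrelated, made-up vector. The names fruit and grp and all category values are hypothetical.

```r
# hypothetical vector; unrelated to the exercise data
fruit <- c("lime", "plum", "lemon", "kale", "cherry")

# %in% returns TRUE/FALSE for each element: is it a member of the set?
fruit %in% c("lime", "lemon")
## [1]  TRUE FALSE  TRUE FALSE FALSE

# nested ifelse(): test set membership; return NA when nothing matches
grp <- ifelse(fruit %in% c("lime", "lemon"), "citrus",
              ifelse(fruit %in% c("plum", "cherry"), "stone", NA))
grp  # examine
```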


END MODULE 4.5

