MODULE 4.9 Custom Functions in R

baseR-V2016.2 - Data Management and Manipulation using R

Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016


Objective:

  • Construct customized functions from sets of R statements



Let’s start with the usual …

Learning Questions:

  • How do I build a function?

  • What is considered the input to a function?

  • Can I use (i.e., nest) functions inside functions?

  • How do I structure output from a function?


Some Background


Functions are perhaps the most powerful aspect of R. They receive input arguments, perform requested operations, and output results in any requested format. They are also fast, especially when constructed to operate on vectors.
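To make the point about vectors concrete, here is a minimal sketch using a hypothetical function, double.it, and a few made-up values. Because the * operator is itself vectorized, a single call handles the whole vector. The element-by-element sapply() version returns the same values but calls the function once per element, which is typically what makes such code slower.

```r
# hypothetical function: doubles its input; * works on whole vectors
double.it <- function(x) {
  x * 2
}

vals <- c(1.5, 2, 10)                # made-up example values

whole <- double.it(vals)             # one call, whole vector at once
each  <- sapply(vals, double.it)     # one call per element

whole                                # 3 4 20
identical(whole, each)               # TRUE: same result either way
```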

Although it is not a hard rule, it is best to build a function dedicated to a single task, that is, one that returns a single output element. A function should also be standalone in scope, and not nested inside other functions. This makes it transportable: able to operate across many different analyses dealing with different input and output structures.

Above all, a function solves a specific, targeted analytical operation.


Some Initialization Before We Proceed …


Data from Exercise #6 (objects f1, m1, m2, m3, m4, t1, and w1) were saved as mod3data.RData. Some of these objects will be needed, so load them into your workspace first.

# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
  getwd()  # in correct directory ?  
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
  list.files(pattern = ".RData") # is it there ?
## [1] "mod3data.RData"
  load("mod3data.RData")  # load it
  ls()  # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"

If the objects are not there, or you did not save an .RData file from Exercise #6, return to Module 3.4, Exercise #6, and re-import the data before proceeding further.


Some Free Advice on Custom Functions


First consider whether a function is appropriate:

  • Does the function perform an analysis or process that is repeated with some regularity?
    EXAMPLES: data cleaning, construction of a table of summary statistics or metrics

Function construction itself requires a logical approach.

  • What external values or objects are fed to the function?
  • Where do these come from? Extracted from other objects? “Manual” entry?
  • What sequence of statements is performed within the function?
  • Is intermediate output desired? If so, what is this?
  • What is the nature of the desired final output from the function operation?

Consider writing out the function operations in King’s English prior to writing the code in R.


The Basic Custom Function in R - One Argument, One Statement


The basic syntax for a custom R function is FunctionName = function(Argument(s)) {Statement(s)}; the <- assignment operator, used in the examples below, works the same way. Every function is assigned a name, FunctionName; it becomes an object in your workspace and is called by that name. Argument(s) represent the input data objects, which typically range from one to several. Each input argument is given a generic variable name within the function. Statement(s) can be any valid R statements; the value of the last statement evaluated is what the function returns.
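Each part of that syntax can be inspected with the base R functions formals() (the arguments) and body() (the statements). A minimal sketch, using a hypothetical one-line function named add.one:

```r
# hypothetical example of FunctionName <- function(Argument) {Statement}
add.one <- function(x1) {
  x1 + 1
}

class(add.one)           # "function": it is an object in the workspace
names(formals(add.one))  # "x1": the argument name(s)
body(add.one)            # the statement(s) between the braces
add.one(4)               # called by name; returns 5
```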

Let’s start with a simple function. This function is designed to convert measurements in meters to feet. It receives one argument.

# assume m1 from Exercise #6; skull characteristics of jumping mice
# Note elev data in m; you desire a new column in feet
  head(m1, 3)  # data structure
##   catno sex elev conlen zygbre lstiob
## 1  9316   M 1878  22.37  12.64   4.83
## 2 17573   F 3230     NA  12.38   4.28
## 3 17574   M 3230     NA  11.75   4.45
# function purpose: convert elev in m to ft
  convert1.m2ft <- function(x1) {   # function name; argument x1
   x1 * 3.28 }                      # statement operation on argument

Examine the function above. First, we have assigned it the name convert1.m2ft. It is a good practice to name functions after the task(s) they perform. We next used the function() call with an argument labeled x1. There is nothing special about the use of x1; you can use any label name for the argument.
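To see that the label itself does not matter, the sketch below defines a second, hypothetical version (convert1b.m2ft) that calls its argument meters, then feeds both versions the same few elevations typed in by hand.

```r
# original version: argument labelled x1
convert1.m2ft <- function(x1) {
  x1 * 3.28
}
# hypothetical variant: same statement, argument labelled meters
convert1b.m2ft <- function(meters) {
  meters * 3.28
}

elev.m <- c(1878, 3230, 3047)  # elevations in meters, entered by hand

convert1.m2ft(elev.m)          # 6159.84 10594.40 9994.16
identical(convert1.m2ft(elev.m), convert1b.m2ft(elev.m))  # TRUE
```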

Now look at the R statement inside the function: it refers to the argument by its label, x1, and multiplies whatever x1 holds by the scalar 3.28. We’ll come back to this label x1 when we run the function below. Once you build and submit the function, it resides in your workspace.

# is the function in workspace ??
  ls(pattern = "convert1")
## [1] "convert1.m2ft"
# what does the function look like ??
  convert1.m2ft  # examine function
## function(x1) {   # function name; argument x1
##    x1 * 3.28 }

Now call the function and feed it the input data object m1$elev. By supplying m1$elev as the argument, you are substituting m1$elev for the label x1. Thus every place in the function that refers to x1 actually uses m1$elev, which is then multiplied by the specified scalar 3.28.

# call the function; feed function the column elev, which is elev in meters
  convert1.m2ft(m1$elev)  # call fxn; assign argument
##  [1]  6159.84 10594.40 10594.40  9994.16  9994.16  9994.16  9994.16
##  [8]  9994.16  9994.16  9994.16  9994.16  8796.96  8895.36 10433.68
## [15] 10433.68 10433.68 10433.68 10433.68  9846.56
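The substitution described above happens only inside the call. In the sketch below, a workspace object that also happens to be named x1 (a name clash set up on purpose for illustration) is left untouched; inside the function, x1 refers to whatever was supplied as the argument. The argument can also be supplied by name.

```r
convert1.m2ft <- function(x1) {
  x1 * 3.28
}

x1 <- 100                    # unrelated workspace object with the same name
elev.m <- c(1878, 3230)      # elevations in meters, entered by hand

convert1.m2ft(elev.m)        # 6159.84 10594.40: x1 is bound to elev.m here
convert1.m2ft(x1 = elev.m)   # same call, argument supplied by name
x1                           # still 100
```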

Notice the output is returned to your console as a vector. You could just as easily attach the output to the data object.

# add function output to dataframe m1
  m1$elev.ft <- convert1.m2ft(m1$elev)  # call fxn; assign argument and link output to data object m1
# examine m1; is the column there ??
  head(m1, 3)  # 1st 3 lines of data object m1
##   catno sex elev conlen zygbre lstiob  elev.ft
## 1  9316   M 1878  22.37  12.64   4.83  6159.84
## 2 17573   F 3230     NA  12.38   4.28 10594.40
## 3 17574   M 3230     NA  11.75   4.45 10594.40

You have now built your first function.


The Basic Custom Function in R - Two Arguments, One Statement


Imagine, for example, that you generate an R object elsewhere - say a simple scalar output - and now want to feed that to a function that also includes data from another, second object. This means two arguments will be fed into the function.

We will use the simple example of conversion of meters to feet again to illustrate how two arguments can be fed to a function.

# function purpose: use two arguments to convert elev in m to ft
  conversion <- 3.28  # an outside scalar; can come from anywhere
  convert2.m2ft <- function(x1, x2) {   # function name; arguments x1, x2
    x1 * x2 }                           # statement operation on arguments
  convert2.m2ft(m1$elev, conversion)    # 2 arguments fed to function
##  [1]  6159.84 10594.40 10594.40  9994.16  9994.16  9994.16  9994.16
##  [8]  9994.16  9994.16  9994.16  9994.16  8796.96  8895.36 10433.68
## [15] 10433.68 10433.68 10433.68 10433.68  9846.56

You can supply as many arguments as you wish to a function. However, at some point too many arguments overwhelm the purpose of the function: to return the result(s) of a specific, targeted analytical operation. Keep this in mind as you build functions.
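One common way to keep the argument list manageable is to give an argument that rarely changes a default value, so it only needs to be supplied when something different is wanted. The sketch below assumes a hypothetical variant, convert3.m2ft, whose second argument defaults to the 3.28 used in this module (1/0.3048, about 3.2808, is the more precise factor). Arguments supplied by name can also be given in any order.

```r
# hypothetical variant: x2 defaults to 3.28 when not supplied
convert3.m2ft <- function(x1, x2 = 3.28) {
  x1 * x2
}

elev.m <- c(1878, 3230)                # elevations in meters, entered by hand

convert3.m2ft(elev.m)                  # uses the default x2 = 3.28
convert3.m2ft(elev.m, 1 / 0.3048)      # override with a more precise factor
convert3.m2ft(x2 = 3.28, x1 = elev.m)  # named arguments, any order
```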


The Basic Custom Function in R - Two or More Arguments, Two or More Statements


A function can take several arguments and contain several statements. In this case, assume you wish to take the formal, full scientific name of a species (the genus and epithet) and convert it to a new variable consisting of 4-character codes.

# assume t1 from Exercise #6; data on tree presence and 3 topographic variables
  head(t1, 2)  # data structure
##   locid     genus    epithet elev   aspect    slope    rough presabs
## 1     1 Juniperus monosperma 1304 24.96899 1.034836 5.162961       0
## 2     2 Juniperus monosperma 1623 30.59265 2.190933 8.642208       0
# build function: 2 arguments, 3 statements
# function purpose: create new 4-char names for tree species from formal names
  code.4char <- function(x1, x2) {      # x1 = genus, x2 = epithet
    e <- substr(x2, 1, 2)               # statement1: 1st 2 letters of epithet
    g <- tolower(substr(x1, 1, 2))      # statement2: 1st 2 letters of genus, lowercase
    paste(g, e, sep = "")               # statement3: combine into 4-char code
  }

Call the function by providing the two arguments, t1$genus and t1$epithet.

# call function with 2 arguments
  code.4char(t1$genus, t1$epithet)
##  [1] "jumo" "jumo" "jumo" "jumo" "jumo" "jumo" "jumo" "jumo" "jumo" "jumo"

As before, the output could be appended to the existing data object t1 if desired.
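A sketch of that step, using a small hypothetical stand-in for t1 (the real t1 comes from Exercise #6); the new column name spp.code is likewise only an illustration. Because substr(), tolower(), and paste() all work on whole vectors, each row gets its own code in a single call.

```r
code.4char <- function(x1, x2) {     # x1 = genus, x2 = epithet
  e <- substr(x2, 1, 2)              # 1st 2 letters of epithet
  g <- tolower(substr(x1, 1, 2))     # 1st 2 letters of genus, lowercase
  paste(g, e, sep = "")              # combine into a 4-char code
}

# hypothetical stand-in for t1 with two species
t1.demo <- data.frame(genus   = c("Juniperus", "Pinus"),
                      epithet = c("monosperma", "edulis"),
                      stringsAsFactors = FALSE)

t1.demo$spp.code <- code.4char(t1.demo$genus, t1.demo$epithet)
t1.demo                              # spp.code column: "jumo" "pied"
```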


Controlling Output from a Custom Function


Examine the code.4char() function above. Its last line has no assignment: paste(g, e, sep = "") is simply evaluated, and that value becomes the function’s output. So what happened to the function’s internal objects g and e?

Objects created inside a function, like g and e, are local to it: they exist only while the function runs and are not returned unless you explicitly ask for them. By default, only the value of the last expression evaluated is returned. This fits the typical reason for developing a function noted above: to return a specific, targeted result. In most cases intermediate results are not considered useful.
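A quick check of that point, assuming a fresh workspace in which no objects named g or e were created outside the function:

```r
code.4char <- function(x1, x2) {
  e <- substr(x2, 1, 2)
  g <- tolower(substr(x1, 1, 2))
  paste(g, e, sep = "")              # last expression: its value is returned
}

out <- code.4char("Juniperus", "monosperma")
out                                                  # "jumo"
exists("g", envir = globalenv(), inherits = FALSE)   # FALSE: g stayed local
exists("e", envir = globalenv(), inherits = FALSE)   # FALSE: so did e
```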

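return() exits the function and hands back whatever object it is given. To return several named objects at once, bundle them in a list, as in return(list(a = a, b = b)); the output is then of class list.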
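A sketch of returning several named results at once, using a hypothetical variant, code.4char.parts, that hands back the two intermediate pieces along with the final code:

```r
# hypothetical variant: return intermediate objects as a named list
code.4char.parts <- function(x1, x2) {
  e <- substr(x2, 1, 2)
  g <- tolower(substr(x1, 1, 2))
  return(list(genus2 = g, epithet2 = e, code = paste(g, e, sep = "")))
}

out <- code.4char.parts("Juniperus", "monosperma")
class(out)    # "list"
names(out)    # "genus2" "epithet2" "code"
out$code      # "jumo"
```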

Functions can also be stored outside your analysis script, in a separate file, and read into a session with source().
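A minimal sketch of sourcing a function kept in its own file. So that it runs anywhere, the block writes a small function file to a temporary location first; in practice you would keep a file of your own (for example a hypothetical my_functions.R) and give source() its path.

```r
# write a one-function file to a temporary location (for illustration only)
fxn.file <- tempfile(fileext = ".R")
writeLines(c("convert1.m2ft <- function(x1) {",
             "  x1 * 3.28",
             "}"),
           fxn.file)

source(fxn.file)                  # runs the file; convert1.m2ft now exists
convert1.m2ft(c(1878, 3230))      # 6159.84 10594.40
```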

# f.bad: no return(); only the value of the last assignment (z4) comes back, invisibly
f.bad <- function(x, y) {
 z1 <- 2*x + y
 z2 <- x + 2*y
 z3 <- 2*x + 2*y
 z4 <- x/y
}


# f.good: return() hands back all four results as one named vector
f.good <- function(x, y) {
 z1 <- 2*x + y
 z2 <- x + 2*y
 z3 <- 2*x + 2*y
 z4 <- x/y
 return(c(z1 = z1, z2 = z2, z3 = z3, z4 = z4))
}
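What the two styles return can be seen with shortened, hypothetical stand-ins (f.quiet and f.loud) that keep only z1 and z4 so the block runs on its own. A function whose last statement is an assignment still returns that value, but invisibly, so nothing prints unless the result is captured; the earlier results are simply lost.

```r
# shortened stand-in for f.bad: last statement is an assignment
f.quiet <- function(x, y) {
  z1 <- 2*x + y
  z4 <- x/y
}
# shortened stand-in for f.good: explicit return of both results
f.loud <- function(x, y) {
  z1 <- 2*x + y
  z4 <- x/y
  return(c(z1 = z1, z4 = z4))
}

f.quiet(1, 2)          # prints nothing: z4 is returned invisibly
res <- f.quiet(1, 2)   # ...but it can still be captured
res                    # 0.5; z1 is lost
f.loud(1, 2)           # z1 = 4, z4 = 0.5
```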

# sem: standard error of the mean, sqrt(var(x)/n)
sem = function(x) {
  sqrt(var(x)/length(x))
}
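A usage sketch for sem() with made-up values, cross-checked against the equivalent sd(x)/sqrt(n). Note that a single NA in x makes the result NA, because var() does not drop missing values by default.

```r
sem = function(x) {
  sqrt(var(x)/length(x))           # standard error of the mean
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)     # made-up values
sem(x)                             # about 0.756
all.equal(sem(x), sd(x) / sqrt(length(x)))   # TRUE
sem(c(x, NA))                      # NA: missing values are not dropped
```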

Summary of Module 4.9 Functions


Basic calls related to building custom functions are:

  • function() {} => Builds a custom function from argument(s) and statement(s)
  • return() => Returns a specified object from inside a function
  • list() => Bundles several named objects into a single function output
  • source() => Reads in and runs R code, such as function definitions, from a file

Exercise #18


Data for this exercise are in: ../baseR-V2016.2/data/exercise_dat.

In Exercises #14 and #16, summary datasets were created on counts of male grouse by lek within complexes. In #14, these were 3 descriptive stats; #16 merged these into a single dataset. Assume this represents an annual exercise to be carried out once lek counts are completed each spring. Now write a function, say AnnualLekSummary, that:

  • Imports the lek data (assume it has been “cleaned”);
  • Selects from the data a specified year for which statistics are desired;
  • Calculates mean, 2SD, n, and 90%CI for the specified year within each lek complex;
  • Writes the yearly-based output as a separate *.csv file.

END MODULE 4.9

