MODULE 4.2 New Variables and Objects

baseR-V2016.2 - Data Management and Manipulation using R

Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016


Objective:

  • Create and add new variables to an R data object



Let’s begin with …

Some Learning Questions:

  • How do I create a new variable from one or more existing variables in a data object?

  • How do I add this new variable to an existing data object?

  • How do I create new variables as objects outside an existing data object?

  • How do I create new data objects?


Some Background


The generation of new variables, or modification of existing variables, is central to data manipulation. Within R there exist numerous mathematical functions and operators (see Module 4.1) that allow you to manipulate data. The outcome of these manipulations can:

  • Return a modified or new variable to the console;
  • Write variable modifications to an R object as new variables; or
  • Over-write existing variables (replace) in a data object.

New objects can also be created as standalone variables in the R workspace, or as new data objects for export to your external OS.
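
The three outcomes listed above can be sketched with a small stand-in data frame. The object d and its values are hypothetical and are not part of the course data; this is only a minimal illustration.

```r
# hypothetical stand-in data frame; not one of the course objects
d <- data.frame(len = c(10.4, 12.2, 9.8), wid = c(4, 5, 3))

d$len / d$wid              # 1. result returned to console only; d is unchanged
d$ratio <- d$len / d$wid   # 2. result written to d as a new variable
d$len <- round(d$len)      # 3. existing variable len over-written (replaced)
names(d)                   # d now holds len, wid, and ratio
```

Note that the third form discards the original values of len, so over-write only when the original values are no longer needed.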


Some Initialization Before We Proceed …


Data from Exercise #6 (objects f1, m1, m2, m3, m4, t1, and w1) were saved as mod3data.RData. Some of these objects will be needed, so load them into your workspace first.

# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
  getwd()  # in correct directory ?  
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
  list.files(pattern = "\\.RData$") # is it there ? (pattern is a regular expression)
## [1] "mod3data.RData"
  load("mod3data.RData")  # load it
  ls()  # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"

If the objects are not there, or you did not save an .RData from Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.


Creating New Variables in an Existing Dataframe


Assume you wish to create a new variable based on one or more operations applied to one or more variables in your data object. The first step is the operation itself; the second is attaching the output from the operation to the existing data object.

Start by revisiting the dataset m1.

# examine the m1 dataset
head(m1, 2)  # 1st 2 lines of m1; note variable names
##   catno sex elev conlen zygbre lstiob
## 1  9316   M 1878  22.37  12.64   4.83
## 2 17573   F 3230     NA  12.38   4.28

There are many different ways new variables can be added to an existing data object. Here, we simply subtract one variable in m1 from another (zygbre from conlen) and call the new variable diff. The cbind() call computes diff and binds it to the data object m1 as a new column.

# create a new variable by subtraction
m1 <- cbind(m1, diff = m1$conlen - m1$zygbre)  # create new var diff and bind to m1
head(m1, 2)  # note new var diff in m1
##   catno sex elev conlen zygbre lstiob diff
## 1  9316   M 1878  22.37  12.64   4.83 9.73
## 2 17573   F 3230     NA  12.38   4.28   NA

Note how R dealt with the NAs. If one of the variables in the operation is NA, R automatically assigns NA to the output.
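
The NA behaviour can be checked on its own with two short vectors. The vectors below are hypothetical, patterned on the first two rows of m1 shown above, and the sketch assumes nothing else about the course data.

```r
# hypothetical values patterned on the first two rows of m1
conlen <- c(22.37, NA)
zygbre <- c(12.64, 12.38)

conlen - zygbre          # second element is NA because conlen[2] is NA
is.na(conlen - zygbre)   # logical flag marking which results are missing
sum(is.na(conlen))       # count of missing values in conlen
```

Counting NAs with sum(is.na()) before an operation is a quick way to anticipate how many missing values the new variable will carry.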

Although the approach above works quite well, it is not as intuitively obvious as simply assigning the outcome of a manipulation to a new variable. Note we use the $ symbol to link the new variable - cl_rat, the outcome of a division operation - to the existing data object m1.

# add new variable 
m1$cl_rat <- m1$conlen/m1$lstiob  # create new var using operator division
head(m1, 2)  # examine new var; note significant digits
##   catno sex elev conlen zygbre lstiob diff  cl_rat
## 1  9316   M 1878  22.37  12.64   4.83 9.73 4.63147
## 2 17573   F 3230     NA  12.38   4.28   NA      NA

Any R function can be applied when creating new variables, such as round() below. round() does exactly what it says - it rounds a value to a specified number of decimal places as set by the digits = option. (Use signif() to round to a number of significant digits instead.)

# round existing variable and add as new variable to m1
m1$cl_ratR1 <- round(m1$cl_rat, digits = 2)  # round cl_rat to 2 decimal places
head(m1, 2)  # examine cl_ratR1 after round
##   catno sex elev conlen zygbre lstiob diff  cl_rat cl_ratR1
## 1  9316   M 1878  22.37  12.64   4.83 9.73 4.63147     4.63
## 2 17573   F 3230     NA  12.38   4.28   NA      NA       NA
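
Decimal places and significant digits are easy to confuse, so a brief side-by-side comparison may help. The value x below is simply the first cl_rat value from the output above; the last line uses an arbitrary number chosen for illustration.

```r
x <- 4.63147                    # first cl_rat value from the output above
round(x, digits = 2)            # 2 decimal places     -> 4.63
signif(x, digits = 2)           # 2 significant digits -> 4.6
round(1234.5678, digits = -2)   # negative digits round left of the decimal -> 1200
```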

In most cases it is simpler to combine several R functions in a single line of code.

m1$cl_ratR2 <- round(m1$conlen/m1$lstiob, digits = 2)  # combine operations

Thus, you can apply several functions in sequence (1 code line) to generate and add a new variable to an existing data object.
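
Base R also provides transform() as another route to the same result. The sketch below is an alternative rather than the approach this module uses, and the data frame m is a hypothetical two-row stand-in for m1 built from the values printed above.

```r
# hypothetical two-row stand-in for m1 (values copied from the head() output)
m <- data.frame(conlen = c(22.37, NA),
                zygbre = c(12.64, 12.38),
                lstiob = c(4.83, 4.28))

# transform() lets expressions refer to column names directly (no m$ prefix)
m <- transform(m, cl_rat = round(conlen / lstiob, digits = 2))
m   # cl_rat added as a new column; NA carried through for row 2
```

Whether to use $ assignment, cbind(), or transform() is largely a matter of readability; all three leave the new variable attached to the data object.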


New Variable Objects Outside an Existing Dataframe


You can use variables in data objects to create new variables that themselves become objects in the workspace rather than being attached to an existing data object. One common example is summary statistics.

mean(m1$conlen)  # mean of conlen; NOTE missing values in the data return NA
## [1] NA
mean(m1$conlen, na.rm = TRUE)  # option na.rm = TRUE drops NAs; mean returned to console but NOT saved anywhere
## [1] 21.70333
xbar <- mean(m1$conlen, na.rm = TRUE)  # mean as workspace object
ls()  # is xbar in workspace ??  ... yes
## [1] "f1"   "m1"   "m2"   "m3"   "m4"   "t1"   "w1"   "xbar"
xbar  # call object
## [1] 21.70333

We will increasingly create standalone objects in the workspace as we progress through Module 4.
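
Once stored, standalone objects can themselves feed further calculations. The following is a minimal sketch using a hypothetical vector x (not course data); the object names x_n, x_mean, x_sd, and x_se are made up for illustration.

```r
# hypothetical measurement vector with one missing value
x <- c(22.37, NA, 21.5, 20.9)

x_n    <- sum(!is.na(x))          # number of non-missing observations
x_mean <- mean(x, na.rm = TRUE)   # mean, ignoring NA
x_sd   <- sd(x, na.rm = TRUE)     # standard deviation, ignoring NA
x_se   <- x_sd / sqrt(x_n)        # standard error built from stored objects
ls(pattern = "^x_")               # all four now exist in the workspace
```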


New Variables from Character Strings


More complex operations can be implemented using character strings.

Let’s create a new variable that:

  • First extracts a specified subset of characters from 2 separate character strings in a data object;
  • Converts upper case letters to lower case; and
  • Pastes the extractions into a single new variable that is added to the data object.

Two R functions, substr(DataObject, FromPos, ToPos) and paste(CharObject1, CharObject2, sep = ), will be used. In substr(), FromPos and ToPos indicate the positions in the character string where the substring starts and stops, respectively. CharObject1 and CharObject2 in paste() are two extracted or existing character strings (any number of character objects can be pasted). The sep = option specifies the character that “binds” the strings; sep = "" (two quotes with nothing between them) indicates no separating character between the character objects.
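
As a minimal illustration of these calls on their own, before they are applied to t1, consider two plain character strings. The strings a and b are hypothetical and are not taken from the course data.

```r
# hypothetical character strings, not taken from the course data
a <- "Pinus"
b <- "edulis"

substr(a, 1, 2)                   # characters 1 to 2    -> "Pi"
tolower(substr(a, 1, 2))          # lower case           -> "pi"
paste(a, b)                       # default sep = " "    -> "Pinus edulis"
paste(a, b, sep = "_")            # custom separator     -> "Pinus_edulis"
paste("a", "b", "c", sep = "")    # more than 2 strings  -> "abc"
```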

# create new character string from existing character vars in data object
t1[1:2, ]  # assume t1 from Exercise #6; show rows=1 to 2
##   locid     genus    epithet elev   aspect    slope    rough presabs
## 1     1 Juniperus monosperma 1304 24.96899 1.034836 5.162961       0
## 2     2 Juniperus monosperma 1623 30.59265 2.190933 8.642208       0
g <- tolower(substr(t1$genus, 1, 2)); g  # extract 1st 2 chars of variable genus; convert to lower case
##  [1] "ju" "ju" "ju" "ju" "ju" "ju" "ju" "ju" "ju" "ju"
e <- substr(t1$epithet, 1, 2); e  # extract 1st 2 chars of variable epithet
##  [1] "mo" "mo" "mo" "mo" "mo" "mo" "mo" "mo" "mo" "mo"
sppcode <- paste(g, e, sep = "")  # paste chars together; new 4-char var called sppcode
t1 <- cbind(t1, sppcode)  # bind the new var to t1
head(t1, 2)  # examine new var sppcode
##   locid     genus    epithet elev   aspect    slope    rough presabs
## 1     1 Juniperus monosperma 1304 24.96899 1.034836 5.162961       0
## 2     2 Juniperus monosperma 1623 30.59265 2.190933 8.642208       0
##   sppcode
## 1    jumo
## 2    jumo

The code above is done in several steps. Try to do it all in a single line of code!

You can also simply create a new character string and add it to the data object:

# add character scalar to data object
sppnum <- "spp69"  # create scalar spp code
t1 <- cbind(t1, sppnum)  # bind the new var to t1
head(t1, 2)  # examine new var sppnum
##   locid     genus    epithet elev   aspect    slope    rough presabs
## 1     1 Juniperus monosperma 1304 24.96899 1.034836 5.162961       0
## 2     2 Juniperus monosperma 1623 30.59265 2.190933 8.642208       0
##   sppcode sppnum
## 1    jumo  spp69
## 2    jumo  spp69
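
The single value "spp69" fills every row because R recycles a length-1 value to match the number of rows. A minimal sketch of the same behaviour using $ assignment, with a hypothetical three-row data frame d:

```r
d <- data.frame(id = 1:3)   # hypothetical 3-row data frame
d$src <- "spp69"            # single value is repeated for every row
d                           # all 3 rows now carry "spp69"
```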

New Objects from Character Strings


A new object can be created from an existing object and added to your current workspace using the assign(NewObject, OldObject) call. NewObject is a character string giving the name of the newly created object in the workspace, while OldObject is the object whose value is copied into the new object. This call is especially useful inside loops and functions, where an object name might change in each loop iteration or as the function progresses.

# re-import the t1 dataset from Exercise #6; removes the vars added above
t1 <- read.csv("t1.csv", header = TRUE)
head(t1, 2) # examine 
##   locid     genus    epithet elev   aspect    slope    rough presabs
## 1     1 Juniperus monosperma 1304 24.96899 1.034836 5.162961       0
## 2     2 Juniperus monosperma 1623 30.59265 2.190933 8.642208       0

Next, build some scalars that will be pasted together as the new object name, J.mono, using assign().

spp <- "J"  # create a character scalar "J"
epi <- "mono"  # create a character scalar "mono"
assign(paste(spp, ".", epi, sep = ""), t1)  # build object name "J.mono"; assign t1 to it

Then examine the object J.mono and check to see if the data there are the same as t1, which is the old object having the data you wish copied into the new object J.mono.

ls()  # is object J.mono in workspace? yes ...
##  [1] "e"       "epi"     "f1"      "g"       "J.mono"  "m1"      "m2"     
##  [8] "m3"      "m4"      "spp"     "sppcode" "sppnum"  "t1"      "w1"     
## [15] "xbar"
head(J.mono, 2)  # check new data object J.mono; same as t1?  yes ...
##   locid     genus    epithet elev   aspect    slope    rough presabs
## 1     1 Juniperus monosperma 1304 24.96899 1.034836 5.162961       0
## 2     2 Juniperus monosperma 1623 30.59265 2.190933 8.642208       0
head(t1, 2)
##   locid     genus    epithet elev   aspect    slope    rough presabs
## 1     1 Juniperus monosperma 1304 24.96899 1.034836 5.162961       0
## 2     2 Juniperus monosperma 1623 30.59265 2.190933 8.642208       0
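
Because the object name passed to assign() is just a character string, it can be rebuilt on each pass of a loop. The following is a sketch only; the codes and the values assigned are made up for illustration, and get() is used to retrieve an object by its character name.

```r
# hypothetical example: build one object per species code inside a loop
codes <- c("jumo", "pied", "pipo")   # made-up 4-character codes

for (i in seq_along(codes)) {
  obj_name <- paste("n", codes[i], sep = ".")   # e.g. "n.jumo"
  assign(obj_name, i * 10)                      # object name changes each iteration
}

ls(pattern = "^n\\.")   # names of the objects created in the loop
get("n.pied")           # retrieve an object by its character name -> 20
```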

Summary of Module 4.2 functions


Basic calls related to the generation of new variables are:

  • cbind() => Bind objects by column
  • round() => Round to selected number of decimal places
  • substr() => Extract selected sequence of characters from object of mode=character
  • tolower() => Convert upper to lower case (toupper() is reverse)
  • paste() => Paste sequence of objects into new character string
  • assign() => Assign a value to an object whose name is given as a character string

Exercise #12


Data for this exercise are in: ../baseR-V2016.2/data/exercise_dat.

The file tmax_all.dbf consists of a large number (N=116,441) of observations of maximum temperatures (tmax; × 10) on a 10,000 m grid across western North America. Variables are named tmax_XX, where XX represents the month from 1=Jan to 12=Dec.

  • Import the file and determine data object characteristics
  • Are there any missing tmax values?
  • Create variables for mean Winter (Dec, Jan, Feb), Spring (etc ..), Summer, and Fall tmax, as well as an annual mean tmax, for each UNIQUEID (i.e., row)
  • The values are to be rounded to integers
  • Add these new variables to your current data object
  • Create new mean monthly (i.e., column) tmax variables and bind all into a new data object

END MODULE 4.2

