MODULE 2.5 Directories and Workspace in R

baseR-V2016.2 - Data Management and Manipulation using R

Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016


Objective(s):

  • Understanding directories and workspace in R

Let’s begin by discovering …



What is the workspace in R?


The R workspace is temporary space in your computer’s memory (RAM) that “disappears” at the end of an R session. All data, analyses, and output are stored as objects in the R workspace. When you exit R, the workspace disappears, as do all of your objects.

A cautionary tale !!!
Objects are forever and irrevocably lost at the end of your R session unless you have saved them.
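You can watch objects live and die in the workspace with a few throwaway objects. The names a and b below are arbitrary examples. ls() lists the objects in the workspace (the global environment), and exists() reports whether a named object can still be found.

```r
# create two throwaway objects in the workspace (names are arbitrary examples)
a <- c(2.1, 3.4)
b <- "treeshifts"
ls()          # a and b now appear among your workspace objects
exists("a")   # TRUE while the object lives in the workspace
rm(a)         # delete a from the workspace
exists("a")   # FALSE: a is gone, and only a saved copy could bring it back
```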


Beware the “\” and “/” Difference in Windows vs. R !!!


Windows uses a \ (backslash) to separate the directories in a path:

  • C:\Users\tce\Documents\stats for Windows directory access

R uses a / (forward slash) to separate the directories in a path:

  • C:/Users/tce/Documents/stats for R directory access on Windows 7/8/10

An alternative to R’s single / is a doubled backslash, \\, as in C:\\Users\\tce\\Documents\\stats. A single \ does not work, because R reads it as the start of an escape sequence inside a string. R GUIs also offer dropdown menus for changing directories. Although R can handle directory names that contain spaces, try to avoid creating them.
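Because the doubled backslash trips up many first-time users, here is a small sketch of equivalent ways to write the same Windows path in R. The path is the example used above. gsub() with fixed = TRUE swaps backslashes for forward slashes without any regular-expression rules, and file.path() assembles a path from its pieces.

```r
# inside an R string each Windows backslash must be doubled
winpath <- "C:\\Users\\tce\\Documents\\stats"
cat(winpath, "\n")   # displays C:\Users\tce\Documents\stats

# swap every backslash for a forward slash (fixed = TRUE: plain text, not a regex)
rpath <- gsub("\\", "/", winpath, fixed = TRUE)
rpath                # "C:/Users/tce/Documents/stats"

# file.path() joins the pieces with "/" on every operating system
file.path("C:", "Users", "tce", "Documents", "stats")
```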

This is a non-issue on Mac OS and Linux, which have retained the historical / (forward slash) as the directory separator.


Directories and pathnames in R


There are two principal calls related to directories in R: getwd(), which returns the current working directory of your R session, and setwd(), which changes the working directory to another, specified directory. When changing directories, you must enclose the desired directory path in paired “” (quotes).

The getwd() call:
Find out where you are using getwd(). (NOTE: Your answer will be different; mine defaulted to the directory where this baseR course resides on my computer.)

# navigation
  getwd()  # where am I?
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2"

Assigning a directory path as an object:
It is often convenient to store directory paths as R objects. Such an object simply holds a directory path as a character string, which makes it easier to change directories by object name rather than by typing out the entire path.

For example, the first call below assigns the default directory to the object baseR. You can give these objects any name. The next two objects (treedat and climdat) are fictitious, but they could logically represent data directories at two different locations, one on your computer’s internal boot drive and the other on an external drive.

# assigning objects directory pathnames; some examples
  baseR <- getwd()  # my current default directory
# assume a project called "treeshifts" with data in 2 different directories
  treedat <- "C:/Users/tce/Documents/stats/treeshifts/data"  # path to data for project treeshifts
  climdat <- "E:/treeshifts/data/climate"  # path to presumed climate data stored on external drive

# what do the paths as objects look like ???
  baseR; treedat; climdat
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2"
## [1] "C:/Users/tce/Documents/stats/treeshifts/data"
## [1] "E:/treeshifts/data/climate"
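Path objects also combine with file names, so you can point at a file without changing directories at all. A minimal sketch: file.path() joins a directory object and a file name with /, and basename() and dirname() split the result back apart. The file name plots.csv is hypothetical, and nothing is read from disk.

```r
treedat <- "C:/Users/tce/Documents/stats/treeshifts/data"
# plots.csv is a hypothetical file name used only for illustration
plotfile <- file.path(treedat, "plots.csv")
plotfile            # "C:/Users/tce/Documents/stats/treeshifts/data/plots.csv"
# basename() and dirname() split a path back into its parts
basename(plotfile)  # "plots.csv"
dirname(plotfile)   # "C:/Users/tce/Documents/stats/treeshifts/data"
# the full path could then go straight into a reader such as read.csv(plotfile)
```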

Changing your directory using setwd():
setwd() changes from one directory to another. First, change to a directory (here, data) that is nested under your current working directory. The call below uses the full pathname; it is tedious but safe, because you can see exactly where you are going. A shorter, relative version, which works only if the new directory is nested directly under the current working directory, is shown as a commented-out example.

# full pathname approach using ~ for your personal default
  setwd("~/words/classes/baseR_ALLversions/baseR-V2016.2/data")
# NOT RUN; short path assuming data is nested under your current working directory
# setwd("data")
  getwd()  # where am I ??? Now in data directory
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data"

Next, assume you wish to move somewhere else to access other data. This is where pathnames as objects come in handy: you need only specify the object name in the setwd() call rather than type a full pathname. Recall that we created the object baseR as the pathname to our base working directory. Now just call setwd(baseR), and setwd() will use the object to set the working directory.

# return to the base directory using the pathname object
  setwd(baseR)  # baseR holds the directory path as an object
  getwd()  # where am I ??? Back at the original root ...
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2"
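setwd() also returns, invisibly, the directory you were in before the change. Capturing that value is one way to step into another directory and come back without storing the original path yourself. The sketch below visits tempdir(), R’s per-session temporary directory, only because it is guaranteed to exist.

```r
start <- getwd()              # remember where we began (used for the check at the end)
old <- setwd(tempdir())       # move to the session's temporary directory; old holds the previous path
getwd()                       # now inside the temporary directory
setwd(old)                    # return to the previous directory
identical(getwd(), start)     # TRUE: back where we started
```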

While many R GUI/IDE shells handle directory changes through dropdowns, directory paths as objects are useful when you have complex R code that shifts among many different directories (e.g., data spread over 3 different hard drives, with different directories on each drive). Or you can simply dump all of your 3,258 data files into a single directory; that is entirely your call, based on how you personally organize your research.
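When code moves among several directories, collecting the path objects in one named vector makes it easy to check which locations are actually available before any setwd() call. A minimal sketch reusing this module’s example paths; the two treeshifts directories are fictitious and will normally report FALSE. Note that dir.exists() needs R 3.2.0 or later; on older versions file.exists() also works for directories.

```r
# example locations from this module; substitute your own drives and folders
paths <- c(base = getwd(),
           tree = "C:/Users/tce/Documents/stats/treeshifts/data",
           clim = "E:/treeshifts/data/climate")
found <- dir.exists(paths)   # one TRUE/FALSE per path, in the same order
names(paths)[found]          # which of the locations exist on this machine
# visit only the directories that exist, list a few files, then come back
for (p in paths[found]) {
  old <- setwd(p)
  print(head(list.files()))
  setwd(old)
}
```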


The default directory in an R session


R uses your “home” directory in Windows/Mac OS/Linux as the default. Home in Windows is something similar to C:\Users\tce\Documents, where “tce” is the user (you, although specific to me in this case). For Mac OS the default for me would be /Users/tce, while in Linux it would be /home/tce. The ~ (tilde) symbol is a shortcut to this default directory and can be used in R to access your home in all OSs. Note that you can configure R so that any desired directory serves as “home.”

# ~ as a shortcut to your 'home' in W7/W8/W10, eg ~ => C:\Users\USER\Documents
  getwd()  # where am I now?  I want to go home ...
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2"
  setwd("~/")  # set work dir to home
  getwd()  # am I there?  yes ...
## [1] "C:/Users/tce/Documents"
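To see exactly what ~ stands for on your own machine, path.expand() replaces a leading ~ with the full home directory path. The stats subfolder below is only an example; path.expand() rewrites the text of the path and does not check that the folder exists.

```r
path.expand("~")        # full path of your home directory on this machine
path.expand("~/stats")  # the same home path with /stats appended
```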

Saving R objects


The distinction between objects and workspace is important. The workspace is, basically, an analogue of a folder, while objects can be considered as files in that folder. Thus you can save (i) the entire workspace with all objects, or (ii) individual objects by themselves.

Assume objects x, y, and z are created in your workspace. Use save() to save one or more objects, as in save(ObjectName(s), file = "FileName.RData"), where ObjectName(s) are the objects to save and FileName is your choice of a name for the file. save() writes to the current working directory unless a different path is specified. By convention, FileName should end with a .RData extension.

save.image("ImageName.RData") saves the entire workspace, where ImageName is a filename you choose. As above, ImageName should end with a .RData extension.

# some R objects to save
  x <- 1  # object x; x is assigned value of 1
  y <- 2  # object y; y is assigned value of 2
  z <- 3  # object z; z is assigned value of 3
  ls()  # objects in workspace
## [1] "baseR"   "climdat" "treedat" "x"       "y"       "z"
# save objects; NOTE saves to current working directory
  getwd()  # where objects will be saved
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2"
  save(x, y, file = "exampleobjects.RData")  # save 2 specified R objects (x,y)
  save.image("MyWorkToday.RData")  # save entire workspace: all 6 objects listed above

If the objects were saved, they will be in your current working directory. list.files() is a useful command to see what is stored in your current working directory, especially when combined with the pattern = option.

# are the 2 .RData objects present ??? yes ...
  list.files(pattern = "\\.RData$")  # files in your working directory whose names end in .RData
## [1] "exampleobjects.RData" "MyWorkToday.RData"
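Two refinements are worth knowing here. save() can also take object names as a character vector through its list argument, which helps when the names are built in code. And because pattern in list.files() is a regular expression, "\\.RData$" matches only names ending in .RData, whereas a bare . matches any character. The sketch writes to tempdir() so your course files are untouched; the file name xy_check.RData is hypothetical.

```r
x <- 1; y <- 2
# hypothetical scratch file in R's temporary directory
f <- file.path(tempdir(), "xy_check.RData")
save(list = c("x", "y"), file = f)   # equivalent to save(x, y, file = f)
file.exists(f)                       # TRUE once the file is written
# list only files whose names end in .RData
list.files(tempdir(), pattern = "\\.RData$")
```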

When ending your R session, you will be prompted to save the entire workspace image unless you have configured your R working environment otherwise. This is your last chance to save! Remember, once you end an R session all work is lost and cannot be recovered unless it has been saved.


Loading saved R objects and workspaces for a new R session


The call load("FileName.RData") adds previously saved objects or workspaces to your current R session. Unless you give the full path to the file, your current working directory must be where the .RData file is saved, or an error will be returned. Previously you saved two .RData files, exampleobjects.RData and MyWorkToday.RData. To see how load() works, use rm() to remove the “x” and “y” objects, then load exampleobjects.RData.

# restore previous workspace and objects
  ls()  # what is there?
## [1] "baseR"   "climdat" "treedat" "x"       "y"       "z"
  rm(x, y)  # remove objects as test
  ls()  # what is left?
## [1] "baseR"   "climdat" "treedat" "z"
# load some .RData objects
  setwd(baseR)  # change to appropriate directory
  load("exampleobjects.RData")  # load objects
  ls()  # are x, y now present?
## [1] "baseR"   "climdat" "treedat" "x"       "y"       "z"
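Two details of load() are handy at this point. It invisibly returns the names of the objects it restored, and its envir argument can direct those objects into a separate environment instead of your workspace, where an existing object of the same name would otherwise be replaced without warning. The sketch saves a scratch copy to tempdir() first so it runs on its own; the file name is hypothetical.

```r
x <- 1; y <- 2
# hypothetical scratch copy in R's temporary directory
f <- file.path(tempdir(), "exampleobjects_copy.RData")
save(x, y, file = f)

restored <- load(f)   # load() invisibly returns the names of the objects it restored
restored              # "x" "y"

# loading into a new environment lets you inspect a file without touching workspace objects
e <- new.env()
load(f, envir = e)
ls(e)                 # "x" "y"
get("y", envir = e)   # 2
```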

Make sure you are in the correct directory before loading a .RData file or an error will be returned.

Other options for loading .RData files are “double-clicking” the file, which opens a completely new R session and loads the file, or “click-n-dragging” the file directly into your R console window. In RStudio you will have to configure the “double-clicking” option.


Options for an R session


Numerous options can be customized for a specific R session, or they can be stored and used in all your R sessions. Use help(options) to see what these are. A simple example is to set the number of significant digits printed to 4 (the default is 7). Once you have set an option, it remains in effect for the rest of your current R session.

# options
  help(options)  # show available options; NOTE will open in your browser
## starting httpd help server ...
##  done
# default printing uses 7 significant digits
  x <- 1.23456789; x  # lots of digits ...
## [1] 1.234568
# set a significant digits option
  options(digits = 4); x  # print 4 significant digits
## [1] 1.235
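Because an option persists for the rest of the session, it can be useful to change it only temporarily. options() returns the previous values of whatever it changes, so a sketch like the one below can put them back afterwards.

```r
x <- 1.23456789
before <- getOption("digits")   # current setting (7 unless you changed it)
op <- options(digits = 4)       # options() returns the old values as a list
x                               # prints with 4 significant digits
options(op)                     # restore the previous setting
getOption("digits") == before   # TRUE
```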

You can modify the Rprofile.site file to reflect personal preferences. These would be the defaults for all your future sessions. In Windows, the file is found at: C:\Program Files\R\R-X.Y.Z\etc, where R-X.Y.Z would be your R version. An example of the basic Rprofile.site, with some guidance on how it works, is shown below.

########################################
# Sample Rprofile.site file
#
# You can customize the R environment through a site initialization file or a directory
#   initialization file. 
# R will always source the Rprofile.site file first. On Windows, the file is in the
#   C:\Program Files\R\R-n.n.n\etc directory. 
# You can also place a .Rprofile file in any directory that you are going to run R from or in
#   the user home directory.
#
# At startup, R will source the Rprofile.site file. It will then look for a .Rprofile file to
#   source in the current working directory. 
# If it doesn't find it, it will look for one in the user's home directory. There are two
#   special functions you can place in these files. 
# .First() will be run at the start of the R session and .Last() will be run at the end of
#   the session. 
#
# Things you might want to change
# options(papersize="a4")
# options(editor="notepad")
# options(pager="internal")
# options(editor = "C:/Program Files (x86)/Notepad++/notepad++.exe") 
# R interactive prompt
# options(prompt="> ")
# options(continue="+ ")
#
# to prefer HTML help
# options(help_type = "html")
# to prefer plain-text help
# options(help_type = "text")
#
# General options
# options(tab.width = 2)
# options(width = 130)
# options(graphics.record=TRUE)
########################################
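If you are unsure where these files live on your system, R can report the locations itself. R.home("etc") gives the etc folder of the R installation you are running. Whether Rprofile.site or a personal .Rprofile actually exists varies by setup (and the R_PROFILE and R_PROFILE_USER environment variables can point elsewhere), so the checks below may well return FALSE.

```r
# site-wide startup file for the R installation currently running
site <- file.path(R.home("etc"), "Rprofile.site")
site
file.exists(site)
# personal startup files: current working directory first, then home directory
file.exists(".Rprofile")
file.exists(file.path(path.expand("~"), ".Rprofile"))
```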

The command history


R keeps track of all commands submitted. You can use the “up” and “down” arrow keys to recall recently submitted commands. Other options for command history include history(), which shows the 25 most recently submitted commands, and history(max.show = Inf), which recalls all commands that have been submitted.

# command history
  history()  # short history
  history(max.show = Inf) # full history

Both history calls open a pop-up window listing the submitted commands. In RStudio, history() output appears in the top-right panel under the History tab.

You can configure your R environment to automatically save this history every time you exit, or use the associated GUI File => Save to file… to save the commands as a text file.
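From code, savehistory() and loadhistory() in the utils package do the same job as the menu. They depend on an interactive console that keeps a history and can fail in a non-interactive script, so this sketch guards them with interactive(). The file name is hypothetical.

```r
# hypothetical file name; written to the current working directory
hist_file <- "today_commands.Rhistory"
if (interactive()) {
  savehistory(hist_file)    # write this session's command history as plain text
  # loadhistory(hist_file)  # in a later session, bring those commands back
}
```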


Commenting your R code


Comments are a great way to leave “bread-crumbs” explaining the use of single lines or snippets of code. Comments can be in-line with R code or on separate lines. Comments start with #; all text to the right of # on that line is ignored.

# use comments to leave bread crumbs behind your codespeak
# they can be as single or multiple lines without code or ...
  y <- 42  # ... after code and in the same line

Exercise #2


  • Build a set of at least 3 pathnames pointing to internal or external drives
    • Two of these should be the directories for the course data, /exercise_dat and /powerpoint_dat
  • Change to each path with setwd() and use getwd() to confirm where you are as each is activated
  • Save the paths as a .RData file, using whatever file name you wish
  • Access your last 25 R commands and save them as an *.r (or other text) file
    • Recommend saving in a Windows folder for R code labeled r_code
  • Exit R, re-open R, and load the previously saved pathnames
    • Change to your R data directory; what is there?

END MODULE 2.5

