Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016
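Sorting can be tricky in R, and knowing the logic behind the different types of sorting is crucial. There are two basic sorts in R: sort(), which orders the values of a single vector or column, and order(), which returns the row positions used to rearrange an entire data object.
You can also rank the values in a data object from low to high with rank().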
Data from Exercise #6 (objects f1, m1, m2, m3, m4, t1, and w1) were saved as mod3data.RData. Some of these objects will be needed, so load them first into your workspace.
# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
getwd() # in correct directory ?
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
list.files(pattern = "\\.RData$") # is it there ? (pattern is a regular expression, so escape the dot)
## [1] "mod3data.RData"
load("mod3data.RData") # load it
ls() # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"
If the objects are not there, or you did not save an .RData file from Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.
Confusion often occurs because sort() in R is not equivalent to the sort command common to spreadsheets like Excel.
Instead, sort(ObjectVar) in R orders the values of a single variable from low to high (or high to low). The other variables are not carried along during the sort. The default is ascending order, which can be reversed using the option decreasing = T.
sort() works only on a vector or on a single column of a data object, so for multi-column data objects you must specify the column to sort. You cannot sort the whole data object, and you cannot attempt a multi-column (variable) sort. In both cases an error will be returned.
# assume data m1 from Exercise #6; skull characteristics of jumping mice
# NOT RUN will lead to an error; go ahead and try anyway to see what error will look like
sort(m1) # sort w/out specifying var leads to error
sort(m1[, 2:3]) # try multiple var sort; error returned
names(m1) # possible vars to sort
## [1] "catno" "sex" "elev" "conlen" "zygbre" "lstiob"
# application of sort to a column variable
m1$elev # current ordering of var=elev
## [1] 1878 3230 3230 3047 3047 3047 3047 3047 3047 3047 3047 2682 2712 3181
## [15] 3181 3181 3181 3181 3002
sort(m1$elev) # ordering after sort; NOTE low to high ranking
## [1] 1878 2682 2712 3002 3047 3047 3047 3047 3047 3047 3047 3047 3181 3181
## [15] 3181 3181 3181 3230 3230
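The sketch below illustrates the decreasing option and the fact that sort() returns only the sorted values. It uses a small made-up vector and data frame (elev_toy and d_toy are hypothetical names, not objects from Exercise #6), so it should run without mod3data.RData loaded.

```r
# toy elevations; hypothetical values, NOT the m1 data
elev_toy <- c(3047, 1878, 3230, 2682)

asc  <- sort(elev_toy)                     # default: low to high
desc <- sort(elev_toy, decreasing = TRUE)  # reversed: high to low
asc
desc

# sort() returns a new vector; the data object itself is not rearranged
d_toy <- data.frame(id = 1:4, elev = elev_toy)
sorted_elev <- sort(d_toy$elev)
d_toy$id                                   # rows of d_toy are still in original order
```

Because only the sorted values come back, the link between each elevation and its id is lost in sorted_elev; that is the reason order() is used when the other columns need to follow.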
order() is the equivalent of sort in a spreadsheet like Excel (get it ??). It returns the row positions that put the ordering variable(s) in sequence, so it can be applied to a data object of 1 or more columns (variables). order() defaults to low to high, which can be changed using the decreasing = T option.
A single-column order is DataObject[order(OrderVar), Vars2CarryAlong]. Note that Vars2CarryAlong is not required: leaving the position after the , (comma) empty carries along all variables in the DataObject. A multi-variable order is accomplished as order(OrderVar1, OrderVar2, … , OrderVarN); ties in the first variable are broken by the second, and so on.
Similarly, you can use a sequence of columns (e.g., 1:6) to indicate which variables should be carried along with the ordering.
Note that with order() the original row numbers are not changed; each row keeps its original row label in the reordered object.
# assume data m1 from Exercise #6; skull characteristics of jumping mice
head(m1, 4) # current order of m1
## catno sex elev conlen zygbre lstiob
## 1 9316 M 1878 22.37 12.64 4.83
## 2 17573 F 3230 NA 12.38 4.28
## 3 17574 M 3230 NA 11.75 4.45
## 4 17575 F 3047 20.41 11.44 4.36
# order by sex; column calls used
o1 <- m1[order(m1[, 2]), 1:6]; head(o1, 4)
## catno sex elev conlen zygbre lstiob
## 2 17573 F 3230 NA 12.38 4.28
## 4 17575 F 3047 20.41 11.44 4.36
## 5 17576 F 3047 21.70 12.06 4.51
## 7 17578 F 3047 20.89 11.08 4.13
# order by single column sex; column calls NOT used
o2 <- m1[order(m1$sex), ]; head(o2, 4)
## catno sex elev conlen zygbre lstiob
## 2 17573 F 3230 NA 12.38 4.28
## 4 17575 F 3047 20.41 11.44 4.36
## 5 17576 F 3047 21.70 12.06 4.51
## 7 17578 F 3047 20.89 11.08 4.13
# order by sex & elev; column calls NOT used
o3 <- m1[order(m1$sex, m1$elev), ]; head(o3, 4)
## catno sex elev conlen zygbre lstiob
## 12 26500 F 2682 21.93 12.49 4.24
## 13 26566 F 2712 22.05 12.84 4.53
## 4 17575 F 3047 20.41 11.44 4.36
## 5 17576 F 3047 21.70 12.06 4.51
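The calls above all order from low to high. As a minimal sketch of the other cases described earlier (descending order, mixed directions, and carrying along only selected columns), the example below uses a small made-up data frame; d_toy and its values are hypothetical and only loosely mimic m1. Negating a numeric column inside order() is one common way to reverse just that column.

```r
# toy data; hypothetical values, NOT the m1 data
d_toy <- data.frame(
  id   = 1:5,
  sex  = c("M", "F", "M", "F", "F"),
  elev = c(1878, 3230, 3047, 2682, 3002),
  stringsAsFactors = FALSE
)

# order by elev, high to low; all columns carried along
by_elev_desc <- d_toy[order(d_toy$elev, decreasing = TRUE), ]

# order by sex (ascending), then elev high to low within each sex;
# the minus sign reverses the numeric column only
by_sex_elev <- d_toy[order(d_toy$sex, -d_toy$elev), ]

# order by elev but carry along only two named columns
carried <- d_toy[order(d_toy$elev), c("id", "elev")]

by_elev_desc
by_sex_elev
carried
```

The row labels printed on the left are the original row numbers, which makes it easy to trace each row back to its position in d_toy.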
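There are three options for ordering an object with NA:
na.last = T => NA values are placed last (the default for order())
na.last = F => NA values are placed first
na.last = NA => rows with NA in the ordering variable are dropped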
# assume data m1 from Exercise #6; skull characteristics of jumping mice
head(m1, 4) # data in m1 before sort
## catno sex elev conlen zygbre lstiob
## 1 9316 M 1878 22.37 12.64 4.83
## 2 17573 F 3230 NA 12.38 4.28
## 3 17574 M 3230 NA 11.75 4.45
## 4 17575 F 3047 20.41 11.44 4.36
# sort m1 by conlen; NA last
o4 <- m1[order(m1$conlen, na.last = T), ]; tail(o4, 4) # order & check tail for NA
## catno sex elev conlen zygbre lstiob
## 2 17573 F 3230 NA 12.38 4.28
## 3 17574 M 3230 NA 11.75 4.45
## 8 17579 F 3047 NA 11.43 4.26
## 9 17580 M 3047 NA 11.72 4.32
# sort m1 by conlen; NA first
o5 <- m1[order(m1$conlen, na.last = F), ]; head(o5, 4) # order & check head for NA
## catno sex elev conlen zygbre lstiob
## 2 17573 F 3230 NA 12.38 4.28
## 3 17574 M 3230 NA 11.75 4.45
## 8 17579 F 3047 NA 11.43 4.26
## 9 17580 M 3047 NA 11.72 4.32
# sort m1 by conlen; drop NA
o6 <- m1[order(m1$conlen, na.last = NA), ]; head(o6, 4) # order & drop all NA
## catno sex elev conlen zygbre lstiob
## 11 17582 F 3047 20.32 11.38 4.15
## 4 17575 F 3047 20.41 11.44 4.36
## 7 17578 F 3047 20.89 11.08 4.13
## 10 17581 F 3047 21.11 NA 3.94
# dim m1=original data object; o6 is after order & drop NA
dim(m1); dim(o6)
## [1] 19 6
## [1] 15 6
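To see the three na.last settings side by side, the sketch below applies order() to a short made-up vector (x_toy is a hypothetical name and its values are invented). It also shows a related point worth checking on your own data: sort() drops NA values unless told otherwise.

```r
# toy vector with two NA; hypothetical values
x_toy <- c(22.4, NA, 20.4, NA, 21.7)

na_end   <- order(x_toy, na.last = TRUE)   # NA positions at the end:   3 5 1 2 4
na_front <- order(x_toy, na.last = FALSE)  # NA positions at the front: 2 4 3 5 1
na_drop  <- order(x_toy, na.last = NA)     # NA positions removed:      3 5 1

# sort() removes NA by default, so its result can be shorter than the input
sorted_default <- sort(x_toy)                  # 20.4 21.7 22.4
sorted_keep_na <- sort(x_toy, na.last = TRUE)  # 20.4 21.7 22.4 NA NA

length(x_toy); length(sorted_default)
```

The shorter result from sort() matters if you try to assign it back to a column of a data object, since the lengths will no longer match.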
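rank(DataObject$Var2Rank) returns the rank of each value in the column, i.e., the position the value would hold if the column were sorted. rank() does not add anything to the data object by itself; assign the result to a new variable, e.g., ranks, to store it in the data object. The default for rank() is ascending; ties are averaged.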
# assume data m1 from Exercise #6; skull characteristics of jumping mice
# rank var=conlen w/NA=last
m1$ranks <- rank(m1$conlen, na.last = T); head(m1, 4) # rank var=conlen w/NA=last
## catno sex elev conlen zygbre lstiob ranks
## 1 9316 M 1878 22.37 12.64 4.83 13
## 2 17573 F 3230 NA 12.38 4.28 16
## 3 17574 M 3230 NA 11.75 4.45 17
## 4 17575 F 3047 20.41 11.44 4.36 2
# rank var=elev w/ties as average
m1$ranks <- rank(m1$elev); head(m1, 4) # rank var=elev w/ties=average
## catno sex elev conlen zygbre lstiob ranks
## 1 9316 M 1878 22.37 12.64 4.83 1.0
## 2 17573 F 3230 NA 12.38 4.28 18.5
## 3 17574 M 3230 NA 11.75 4.45 18.5
## 4 17575 F 3047 20.41 11.44 4.36 8.5
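Averaging is only one way to treat ties. The sketch below, on a made-up vector (elev_toy is a hypothetical name), compares a few of the ties.method choices that rank() accepts, shows one way to rank from high to low by negating the values, and shows na.last = "keep", which leaves NA as NA instead of giving it a rank.

```r
# toy elevations with one tie; hypothetical values
elev_toy <- c(3047, 1878, 3047, 3230)

r_avg   <- rank(elev_toy)                         # ties averaged: 2.5 1.0 2.5 4.0
r_min   <- rank(elev_toy, ties.method = "min")    # ties share the lowest rank: 2 1 2 4
r_first <- rank(elev_toy, ties.method = "first")  # ties ranked by position: 2 1 3 4

# rank from high to low by ranking the negated values
r_desc  <- rank(-elev_toy)                        # 2.5 4.0 2.5 1.0

# keep NA as NA rather than assigning it a rank
r_keep  <- rank(c(2, NA, 1), na.last = "keep")    # 2 NA 1
```

Which tie rule is appropriate depends on how the ranks will be used; averaged ranks are the usual choice for rank-based statistics, while "min" matches the way competition placings are often reported.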
The calls below provide a comparison of how sort(), order(), and rank() operate.
# compare output from sort, order, & rank
m1$sort <- sort(m1$elev) # sort var=elev
m1$ranks <- rank(m1$elev) # rank var=elev
m1$order <- order(m1$elev) # order var=elev
m1[c("elev", "sort", "order", "ranks")] # examine output
## elev sort order ranks
## 1 1878 1878 1 1.0
## 2 3230 2682 12 18.5
## 3 3230 2712 13 18.5
## 4 3047 3002 19 8.5
## 5 3047 3047 4 8.5
## 6 3047 3047 5 8.5
## 7 3047 3047 6 8.5
## 8 3047 3047 7 8.5
## 9 3047 3047 8 8.5
## 10 3047 3047 9 8.5
## 11 3047 3047 10 8.5
## 12 2682 3047 11 2.0
## 13 2712 3181 14 3.0
## 14 3181 3181 15 15.0
## 15 3181 3181 16 15.0
## 16 3181 3181 17 15.0
## 17 3181 3181 18 15.0
## 18 3181 3230 2 15.0
## 19 3002 3230 3 4.0
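One way to read the table above: the order column lists which original rows to take, in sequence, to obtain the sort column, while the ranks column says where each original value would land. The sketch below checks these relationships on a made-up vector without ties (elev_toy is a hypothetical name); with ties, the last identity would hold only for ties.method = "first".

```r
# toy elevations, no ties; hypothetical values
elev_toy <- c(3047, 1878, 3230, 2682)

o <- order(elev_toy)   # row positions that sort the vector: 2 4 1 3
s <- sort(elev_toy)    # sorted values: 1878 2682 3047 3230
r <- rank(elev_toy)    # rank of each original value: 3 1 4 2

# indexing by order() reproduces sort()
same_as_sort <- identical(elev_toy[o], s)

# applying order() to the order gives the ranks (no ties here)
same_as_rank <- all(order(o) == r)

same_as_sort
same_as_rank
```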
Basic calls related to sorting, ordering, and ranking data objects are:
sort() => Selected column only; no other columns carried along
order() => Column(s) of interest; other columns carried along
rank() => Returns order of the value in the column