MODULE 4.7 Sorting, Ordering, and Ranking

baseR-V2016.2 - Data Management and Manipulation using R

Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016


Objective:

  • Sort, order, and rank data in an R data object



Let’s begin again with …

Some Learning Objectives:

  • What are the differences among sorting, ranking and ordering in R?

  • Can I sort a data object by 1 or more columns (variables)?

  • From low to high? High to low?

  • How does ranking deal with identical values in a column?

  • What does ordering do to a data object?


Some Background


So-called sorting can be tricky in R, and knowing the logic behind the different types of sorting in R is crucial. There are two basic sorts in R:

  • On selected variables only; no other variables carried along
  • On selected variables, and all other variables carried along

You can also rank the values of a variable from low to high.


Some Initialization Before We Proceed …


Data from Exercise #6 (objects f1, m1, m2, m3, m4, t1, and w1) were saved as mod3data.RData. Some of these objects will be needed, so load them into your workspace first.

# load objects from Exercise #6; should have saved as mod3data.RData
# REMEMBER your directory path will be different .. I'm a long-winded instructor
  getwd()  # in correct directory ?  
## [1] "C:/Users/tce/Documents/words/classes/baseR_ALLversions/baseR-V2016.2/data/powerpoint_dat"
  list.files(pattern = "\\.RData$") # is it there ? (pattern is a regular expression)
## [1] "mod3data.RData"
  load("mod3data.RData")  # load it
  ls()  # check workspace; objects present ?
## [1] "f1" "m1" "m2" "m3" "m4" "t1" "w1"

If the objects are not there, or you did not save an .RData from Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.


Sorting a Single Variable


Confusion often occurs because sort() in R is not equivalent to the sort command common to spreadsheets like EXCEL.

Instead, sort(ObjectVar) in R orders the values of a single variable from low to high (or high to low); the other variables are not carried along during the sort. The default is ascending order, which can be reversed using the option decreasing = T. By default sort() also drops NA values (na.last = NA).

For multi-column data objects you must specify a column to sort: sort() works on a vector or a single column of a data object, not on the data object as a whole. Likewise, you cannot sort on several columns (variables) at once. In both cases an error is returned.

# assume data m1 from Exercise #6; skull characteristics of jumping mice
# NOT RUN will lead to an error; go ahead and try anyway to see what error will look like
  sort(m1)  # sort w/out specifying var leads to error
  sort(m1[, 2:3])  # try multiple var sort; error returned
  names(m1)  # possible vars to sort
## [1] "catno"  "sex"    "elev"   "conlen" "zygbre" "lstiob"
# application of sort to a column variable
  m1$elev  # current ordering of var=elev
##  [1] 1878 3230 3230 3047 3047 3047 3047 3047 3047 3047 3047 2682 2712 3181
## [15] 3181 3181 3181 3181 3002
  sort(m1$elev)  # ordering after sort; NOTE low to high order
##  [1] 1878 2682 2712 3002 3047 3047 3047 3047 3047 3047 3047 3047 3181 3181
## [15] 3181 3181 3181 3230 3230
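A minimal sketch of the sort() options described above, using a small hypothetical vector elev_demo rather than the Exercise #6 data, so it runs without mod3data.RData. It shows the default ascending order, decreasing = TRUE, and what happens to NA values.

```r
# hypothetical toy vector (NOT the Exercise #6 data); includes one NA
elev_demo <- c(3230, 1878, 3047, NA, 2682)

asc  <- sort(elev_demo)                     # low to high; NA dropped (default na.last = NA)
desc <- sort(elev_demo, decreasing = TRUE)  # high to low
keep <- sort(elev_demo, na.last = TRUE)     # NA kept and placed at the end

asc   # 1878 2682 3047 3230
desc  # 3230 3047 2682 1878
keep  # 1878 2682 3047 3230   NA
```

Because sort() silently drops NA by default, the sorted vector can be shorter than the original column.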

Ordering a Data Object - Variable(s) as the Key


order() is the equivalent of sort in a spreadsheet like EXCEL (get it ??). Strictly, order() returns the row indices that would put the variable(s) in order; using those indices to subset the rows of a data object reorders the whole object. order() defaults to low to high, and can be changed using the decreasing = T option.

A single-column order has the form DataObject[order(OrderVar), Vars2CarryAlong]. Vars2CarryAlong is optional: leaving the space after the , (comma) empty carries along all variables in the DataObject. A multi-variable order is written as order(OrderVar1, OrderVar2, … , OrderVarN), where ties in OrderVar1 are broken by OrderVar2, and so on.

Similarly, you can use a sequence of variables to indicate which additional variables should be carried along with the ordering.

Note that ordering with order() does not renumber the rows: each row keeps its original row name (number).

# assume data m1 from Exercise #6; skull characteristics of jumping mice
  head(m1, 4)  # current order of m1
##   catno sex elev conlen zygbre lstiob
## 1  9316   M 1878  22.37  12.64   4.83
## 2 17573   F 3230     NA  12.38   4.28
## 3 17574   M 3230     NA  11.75   4.45
## 4 17575   F 3047  20.41  11.44   4.36
# order by sex; column numbers used
  o1 <- m1[order(m1[, 2]), 1:6]; head(o1, 4)
##   catno sex elev conlen zygbre lstiob
## 2 17573   F 3230     NA  12.38   4.28
## 4 17575   F 3047  20.41  11.44   4.36
## 5 17576   F 3047  21.70  12.06   4.51
## 7 17578   F 3047  20.89  11.08   4.13
# order by single column sex; column name ($) used, not column numbers
  o2 <- m1[order(m1$sex), ]; head(o2, 4)
##   catno sex elev conlen zygbre lstiob
## 2 17573   F 3230     NA  12.38   4.28
## 4 17575   F 3047  20.41  11.44   4.36
## 5 17576   F 3047  21.70  12.06   4.51
## 7 17578   F 3047  20.89  11.08   4.13
# order by sex, then elev; column names ($) used, not column numbers
  o3 <- m1[order(m1$sex, m1$elev), ]; head(o3, 4) 
##    catno sex elev conlen zygbre lstiob
## 12 26500   F 2682  21.93  12.49   4.24
## 13 26566   F 2712  22.05  12.84   4.53
## 4  17575   F 3047  20.41  11.44   4.36
## 5  17576   F 3047  21.70  12.06   4.51
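The same idea on a small hypothetical data frame mice (a stand-in for m1, not the real data) makes two points explicit: order() itself returns row indices, not values, and you choose which columns are carried along. Negating a numeric key, as in -mice$elev, is one common way to mix ascending and descending keys; it works only for numeric variables.

```r
# hypothetical stand-in for m1 (NOT the Exercise #6 data)
mice <- data.frame(catno = c(101, 102, 103, 104, 105),
                   sex   = c("M", "F", "M", "F", "F"),
                   elev  = c(1878, 3230, 3047, 2682, 3047),
                   stringsAsFactors = FALSE)

# order() returns the row positions that would sort the keys
idx <- order(mice$sex, mice$elev)
idx  # 4 5 2 1 3

# use the indices as row subscripts; empty column slot = carry along all columns
by_sex_elev <- mice[idx, ]
rownames(by_sex_elev)  # original row names kept: "4" "5" "2" "1" "3"

# sex ascending, elev descending (negation works for numeric keys only)
by_sex_elev_desc <- mice[order(mice$sex, -mice$elev), ]

# carry along only selected columns, highest elev first
high_first <- mice[order(mice$elev, decreasing = TRUE), c("catno", "elev")]
```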

Ordering and Missing Values (NA)


There are three options for ordering an object with NA in the ordering variable:

  • Place NA at the bottom of the order (the default); use option na.last = T
  • Place NA at the top of the order; use option na.last = F
  • Drop rows with NA in the ordering variable; use option na.last = NA
# assume data m1 from Exercise #6; skull characteristics of jumping mice
  head(m1, 4)  # data in m1 before sort
##   catno sex elev conlen zygbre lstiob
## 1  9316   M 1878  22.37  12.64   4.83
## 2 17573   F 3230     NA  12.38   4.28
## 3 17574   M 3230     NA  11.75   4.45
## 4 17575   F 3047  20.41  11.44   4.36
# sort m1 by conlen; NA last
  o4 <- m1[order(m1$conlen, na.last = T), ]; tail(o4, 4)  # order & check tail for NA
##   catno sex elev conlen zygbre lstiob
## 2 17573   F 3230     NA  12.38   4.28
## 3 17574   M 3230     NA  11.75   4.45
## 8 17579   F 3047     NA  11.43   4.26
## 9 17580   M 3047     NA  11.72   4.32
# sort m1 by conlen; NA first
  o5 <- m1[order(m1$conlen, na.last = F), ]; head(o5, 4)  # order & check head for NA
##   catno sex elev conlen zygbre lstiob
## 2 17573   F 3230     NA  12.38   4.28
## 3 17574   M 3230     NA  11.75   4.45
## 8 17579   F 3047     NA  11.43   4.26
## 9 17580   M 3047     NA  11.72   4.32
# sort m1 by conlen; drop NA
  o6 <- m1[order(m1$conlen, na.last = NA), ]; head(o6, 4)  # order & drop rows with NA in conlen
##    catno sex elev conlen zygbre lstiob
## 11 17582   F 3047  20.32  11.38   4.15
## 4  17575   F 3047  20.41  11.44   4.36
## 7  17578   F 3047  20.89  11.08   4.13
## 10 17581   F 3047  21.11     NA   3.94
# dim m1=original data object; o6 is after order & drop NA
  dim(m1); dim(o6)  
## [1] 19  6
## [1] 15  6
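A minimal sketch with a hypothetical data frame skull (toy values, not m1) shows the three na.last settings as raw indices. As in o6 above, na.last = NA drops only rows whose ordering variable is NA; NA in other columns survives. If the goal is to drop rows with NA in any column, base R's complete.cases() is one option.

```r
# hypothetical toy data (NOT m1); NA in the ordering variable and in zygbre
skull <- data.frame(id     = 1:5,
                    conlen = c(22.4, NA, 20.4, NA, 21.7),
                    zygbre = c(12.6, 12.4, NA, 11.4, 12.1))

na_bottom <- order(skull$conlen)                   # default na.last = TRUE
na_top    <- order(skull$conlen, na.last = FALSE)  # NA indices first
na_drop   <- order(skull$conlen, na.last = NA)     # NA indices removed

na_bottom  # 3 5 1 2 4
na_top     # 2 4 3 5 1
na_drop    # 3 5 1

# rows with NA in conlen are gone, but row 3 (zygbre = NA) is still there
ordered_no_na <- skull[na_drop, ]

# drop rows with NA in ANY column instead
fully_complete <- ordered_no_na[complete.cases(ordered_no_na), ]
```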

Ranking Values in a Variable Column


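rank(DataObject$Var2Rank) returns a vector giving the rank of each value in the column, i.e., the position that value would take if the column were sorted from low to high. Assigning the result to a new variable, e.g., DataObject$ranks, adds it to the data object. The default for rank() is ascending; ties are averaged, and NA values are ranked last (na.last = T).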

# assume data m1 from Exercise #6; skull characteristics of jumping mice
# rank var=conlen w/NA=last
  m1$ranks <- rank(m1$conlen, na.last = T); head(m1, 4) # rank var=conlen w/NA=last
##   catno sex elev conlen zygbre lstiob ranks
## 1  9316   M 1878  22.37  12.64   4.83    13
## 2 17573   F 3230     NA  12.38   4.28    16
## 3 17574   M 3230     NA  11.75   4.45    17
## 4 17575   F 3047  20.41  11.44   4.36     2
# rank var=elev w/ties as average
  m1$ranks <- rank(m1$elev); head(m1, 4)  # rank var=elev w/ties=average
##   catno sex elev conlen zygbre lstiob ranks
## 1  9316   M 1878  22.37  12.64   4.83   1.0
## 2 17573   F 3230     NA  12.38   4.28  18.5
## 3 17574   M 3230     NA  11.75   4.45  18.5
## 4 17575   F 3047  20.41  11.44   4.36   8.5
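rank() can also handle ties in other ways through its ties.method argument ("average" is the default; "min" and "first" are among the alternatives), and ranking from high to low can be done by ranking the negated values. A short sketch on a hypothetical vector, not the m1 data:

```r
# hypothetical toy vector with one tie (3230 appears twice)
elev_demo <- c(1878, 3230, 3230, 3047, 2682)

r_avg   <- rank(elev_demo)                         # ties averaged (default)
r_min   <- rank(elev_demo, ties.method = "min")    # ties share the lowest rank
r_first <- rank(elev_demo, ties.method = "first")  # ties broken by position
r_desc  <- rank(-elev_demo)                        # high to low ranking

r_avg    # 1.0 4.5 4.5 3.0 2.0
r_min    # 1 4 4 3 2
r_first  # 1 4 5 3 2
r_desc   # 5.0 1.5 1.5 3.0 4.0

# NA handling: na.last = "keep" leaves NA as NA instead of ranking it last
r_keep <- rank(c(22.4, NA, 20.4), na.last = "keep")
r_keep   # 2 NA 1
```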

A Comparison of Sorting, Ranking, and Ordering


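The calls below provide a comparison of how sort(), order(), and rank() operate on var=elev. Only the ranks column lines up row by row with elev; the sort and order columns are stored alongside purely for comparison, so a value in those columns does not describe the row it sits in.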

# compare output from sort, order, & rank
  m1$sort <- sort(m1$elev)  # sort var=elev
  m1$ranks <- rank(m1$elev)  # rank var=elev
  m1$order <- order(m1$elev)  # order var=elev
  m1[c("elev", "sort", "order", "ranks")]   # examine output
##    elev sort order ranks
## 1  1878 1878     1   1.0
## 2  3230 2682    12  18.5
## 3  3230 2712    13  18.5
## 4  3047 3002    19   8.5
## 5  3047 3047     4   8.5
## 6  3047 3047     5   8.5
## 7  3047 3047     6   8.5
## 8  3047 3047     7   8.5
## 9  3047 3047     8   8.5
## 10 3047 3047     9   8.5
## 11 3047 3047    10   8.5
## 12 2682 3047    11   2.0
## 13 2712 3181    14   3.0
## 14 3181 3181    15  15.0
## 15 3181 3181    16  15.0
## 16 3181 3181    17  15.0
## 17 3181 3181    18  15.0
## 18 3181 3230     2  15.0
## 19 3002 3230     3   4.0
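The three functions are closely related, which a small hypothetical vector (no ties) makes concrete: indexing a vector by its order() gives the same result as sort(), and applying order() twice gives the ranks.

```r
# hypothetical toy vector with no ties
x <- c(3230, 1878, 3047, 2682)

s <- sort(x)    # the values, low to high
o <- order(x)   # positions in x of those sorted values
r <- rank(x)    # where each element of x falls in the sorted order

s  # 1878 2682 3047 3230
o  # 2 4 3 1
r  # 4 1 3 2

# x[o] reproduces sort(x); order(order(x)) reproduces rank(x)
same_as_sort <- all(x[o] == s)
same_as_rank <- all(order(o) == r)
```

With tied values, order(order(x)) matches rank(x, ties.method = "first") rather than the default averaged ranks.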

Summary of Module 4.7 Functions


Basic calls related to sorting, ordering, and ranking data objects are:

  • sort() => Selected column only; no other columns carried along
  • order() => Column(s) of interest; other columns carried along
  • rank() => Returns the rank of each value in the column (its position from low to high; ties averaged by default)

END MODULE 4.7

