Trick or tips 004 {R}

August 13, 2019

  R tips trickortips
  base utils graphics microbenchmark

Kevin Cazelles  

Marie-Hélène Brice  


Trick or Tips?

Ever stumbled upon a code chunk that made you say: "I should have known this ¶ø?!@~&* piece of code long ago!" Chances are you have, frustratingly, just like we have, and on multiple occasions too.

In comes Trick or Tips!

Trick or Tips is a series of blog posts that each present 5 -- hopefully helpful -- coding tips for a specific programming language. Posts should be short (i.e. no more than 5 lines of code, max 80 characters per line, except when appropriate) and provide tips of many kinds: a function, a way of combining functions, a single argument, a note about the philosophy of the language and its practical consequences, tricks to improve the way you code, good practices, etc.

Note that while some tips might be obvious to careful documentation readers (God bless them for their wisdom), we do our best to present what we find very useful and underrated. By the way, there are undoubtedly similar initiatives on the web (e.g. the "One R Tip a Day" Twitter account). Also, feel free to comment below with tip ideas or a post of code tips of your own, which we will be happy to incorporate into Trick or Tips. Enjoy and get ready to frustratingly appreciate our tips!

Trick or tips 0004

Today’s menu:

  1. Subset an array with a matrix
  2. nzchar()
  3. Do you need to use return()?
  4. invisible()
  5. bquote() and substitute()

Subset an array with a matrix

Let’s consider two arrays of letters: the first has two dimensions (i.e. a matrix) and the second one has 3.

(arr2 <- array(LETTERS[1:9], dim = c(3,3)))
#R>       [,1] [,2] [,3]
#R>  [1,] "A"  "D"  "G" 
#R>  [2,] "B"  "E"  "H" 
#R>  [3,] "C"  "F"  "I"
class(arr2)
#R>  [1] "matrix" "array"
(arr3 <- array(LETTERS[1:18], dim = c(3, 3, 2)))
#R>  , , 1
#R>       [,1] [,2] [,3]
#R>  [1,] "A"  "D"  "G" 
#R>  [2,] "B"  "E"  "H" 
#R>  [3,] "C"  "F"  "I" 
#R>  , , 2
#R>       [,1] [,2] [,3]
#R>  [1,] "J"  "M"  "P" 
#R>  [2,] "K"  "N"  "Q" 
#R>  [3,] "L"  "O"  "R"
class(arr3)
#R>  [1] "array"

Let’s say you need to extract specific values based on their position. To subset a single element, say "G", there are a couple of options, but I guess the most common approach is to use [ with one value per dimension:

arr2[1, 3]
#R>  [1] "G"
arr3[1, 3, 1]
#R>  [1] "G"

or with a single value giving the position of the element:

arr2[7]
#R>  [1] "G"
arr3[7]
#R>  [1] "G"
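
For what it's worth, that single position follows R's column-major storage order. Here is a minimal sketch (the names i, j and pos are ours, for illustration only) that computes the position of the element in row i and column j of a 3 x 3 array of letters:

```r
arr2 <- array(LETTERS[1:9], dim = c(3, 3))
i <- 1  # row
j <- 3  # column
# columns are stored one after the other, so we skip (j - 1) full columns
pos <- (j - 1) * nrow(arr2) + i
arr2[pos]
```

This is the same element as the one obtained with one value per dimension.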

Now consider the case where you have a vector of positions (one value per dimension of the array). In this case, beware the orientation of the vector!

# with the line below, we get the 1st and 3rd elements because c(1, 3) acts as
# a column vector: each value is read as a single position
arr2[c(1, 3)]
#R>  [1] "A" "C"
# whereas with a row vector (a one-row matrix), we obtain the element in the
# 1st row and the 3rd column
arr2[t(c(1, 3))]
#R>  [1] "G"

And for more than one element, you need to use a matrix with one row per element to be subset:

(mat <- rbind(c(1,3), c(2,2)))
#R>       [,1] [,2]
#R>  [1,]    1    3
#R>  [2,]    2    2
arr2[mat]
#R>  [1] "G" "E"

Similarly, with an array of 3 dimensions, the matrix will have three columns and as many rows as there are elements:

# Let us subset `E`, `C`, `O` and `C` (again)
(msub <- rbind(c(2,2,1), c(3, 1, 1), c(3, 2, 2), c(3, 1, 1)))
#R>       [,1] [,2] [,3]
#R>  [1,]    2    2    1
#R>  [2,]    3    1    1
#R>  [3,]    3    2    2
#R>  [4,]    3    1    1
arr3[msub]
#R>  [1] "E" "C" "O" "C"
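
You do not always have to build such a matrix by hand. As a hedged sketch, the base functions which() (with arr.ind = TRUE) and arrayInd() both return index matrices with one row per element, which can then be used to subset or to assign. The object name pos below is ours:

```r
arr3 <- array(LETTERS[1:18], dim = c(3, 3, 2))
# one row of indices per matching element (here a single one)
(pos <- which(arr3 == "O", arr.ind = TRUE))
# the index matrix can be used to subset...
arr3[pos]
# ...and to assign
arr3[pos] <- "o"
arr3[3, 2, 2]
# arrayInd() converts single positions into an index matrix
arrayInd(c(5, 3, 15), dim(arr3))
```

The last call should return the indices of the 5th, 3rd and 15th elements, i.e. the first three rows of a matrix such as msub.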

Two additional comments. First, we should always keep in mind that data frames and arrays are different:

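# turn the matrix into a data frame (the name df2 is our choice here)
df2 <- as.data.frame(arr2)
# this gives you the 1st and 3rd **entire columns**
df2[c(1, 3)]
#R>    V1 V3
#R>  1  A  G
#R>  2  B  H
#R>  3  C  I
# this still gives you the element in the 1st row and the 3rd column
df2[t(c(1, 3))]
#R>  [1] "G"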

Second, if you are a tidyverse user, there is a new article dealing with subassignment with tibble 😎.


nzchar()

You may already be aware of nchar(), a function that returns the number of characters of each element of a character vector:

nchar(c("insil", "eco", ""))
#R>  [1] 5 3 0

nzchar() returns TRUE for every character string in the vector that has at least 1 character:

vec <- c("insil", "eco", "")
# is there any empty character string in `vec`?
any(!nzchar(vec))
#R>  [1] TRUE

Interesting, but let’s dig deeper: I can think of no fewer than 3 ways of writing an equivalent expression with at least one more character:

nchar(vec) > 0
!! nchar(vec)
nchar(vec) & 1

One more character… so why bother? 💡 It should be a matter of performance! Let’s check that out with the cool 📦 microbenchmark:

library(microbenchmark)
microbenchmark(nchar(vec) > 0, !! nchar(vec), nchar(vec) & 1, nzchar(vec),
  times = 1000L)
#R>  Unit: nanoseconds
#R>             expr  min   lq   mean median   uq   max neval
#R>   nchar(vec) > 0  900 1000 1059.9   1000 1100  2900  1000
#R>     !!nchar(vec) 1000 1100 1158.6   1100 1200  3900  1000
#R>   nchar(vec) & 1 1000 1100 1205.3   1100 1200 12500  1000
#R>      nzchar(vec)    0    0   66.7    100  100   500  1000

Yep yep! nzchar() is indeed way faster 🚀!
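
One caveat before swapping one for the other: as far as we can tell from ?nzchar, the two approaches do not treat missing values the same way. A small sketch (the vector x is made up for the example):

```r
x <- c("insil", "", NA)
# by default, a missing string counts as non-empty
nzchar(x)                 # TRUE FALSE TRUE
# keepNA = TRUE propagates the missing value instead
nzchar(x, keepNA = TRUE)  # TRUE FALSE NA
# the nchar() version returns NA for the missing string
nchar(x) > 0              # TRUE FALSE NA
```

So the four expressions benchmarked above agree on vec, but they may differ as soon as the vector contains NA.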

Do you need to use return()?

If you have already written your own function, you have probably used return() to specify what your function should return. There are programming languages where this instruction is mandatory, but not in R! Check out the documentation ?return:

If the end of a function is reached without calling ‘return’, the value of the last evaluated expression is returned.

Let me write 2 functions:

add_v <- function(x, y) {
   x + y
}
add_v2 <- function(x, y) {
   return(x + y)
}

add_v() and add_v2() are equivalent! So… do we care? Well, you must bear in mind that whenever return() is encountered, the evaluation of the set of expressions within the function is stopped and therefore some time can be saved:

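foo <- function(x) {
  out <- 0
  # every check is evaluated; they run in ascending order so the last match wins
  if (x > 1) out <- 1
  if (x > 2) out <- 2
  if (x > 3) out <- 3
  out
}
foo2 <- function(x) {
  out <- 0
  # return() exits right away, so the remaining checks are skipped
  if (x > 3) return(3)
  if (x > 2) return(2)
  if (x > 1) return(1)
  out
}
microbenchmark(foo(4), foo2(4), times = 1e5)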
#R>  Unit: nanoseconds
#R>      expr min  lq    mean median  uq      max neval
#R>    foo(4) 400 500 920.786    500 600 18631300 1e+05
#R>   foo2(4) 200 300 459.573    300 400  3311300 1e+05
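
Beyond speed, an early return() is handy for guard clauses that deal with special cases first. A minimal sketch, assuming a made-up helper called safe_log() (not a base R function):

```r
# safe_log() is a hypothetical helper written for this example
safe_log <- function(x) {
  # leave early when the input cannot be logged
  if (!is.numeric(x)) return(NA_real_)
  if (any(x <= 0)) return(NA_real_)
  # last evaluated expression: no return() needed
  log(x)
}
safe_log("a")
safe_log(-1)
safe_log(1)
```

The first two calls should stop at a guard clause, whereas the last one reaches the end of the function.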


invisible()

Let’s keep talking about what functions return. The function invisible() allows you to return an invisible copy of an object, meaning that nothing is (apparently) returned if the result is not assigned:

add_v <- function(x, y) {
  x + y
}
add_i <- function(x, y) {
  invisible(x + y)
}
add_v(2, 3)
#R>  [1] 5
add_i(2, 3)
res <- add_i(2, 3)
res
#R>  [1] 5

But… why? As explained in the documentation (?invisible):

This function can be useful when it is desired to have functions return values which can be assigned, but which do not print when they are not assigned.

This is indeed helpful when you have a function that creates a plot (and you don’t normally need to assign the result) for which you sometimes need to use an object that was created during the evaluation of the function:

plot_logy <- function(x, y) {
  # create ty
  ty <- log10(y + 1)
  plot(x, ty)
  # return ty without printing it
  invisible(ty)
}
plot_logy(0:10, 0:10)

# get ty
ty <- plot_logy(0:10, 0:10)
ty
#R>   [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513 0.8450980
#R>   [8] 0.9030900 0.9542425 1.0000000 1.0413927
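
If you are unsure whether a function returns its value invisibly, base R has withVisible(), and wrapping a call in parentheses forces the result to be printed. A short sketch that redefines two small adding functions so that it runs on its own:

```r
add_v <- function(x, y) x + y
add_i <- function(x, y) invisible(x + y)
# the `visible` element tells whether the value would be printed
withVisible(add_v(2, 3))$visible
withVisible(add_i(2, 3))$visible
# parentheses make the invisible result visible again
(add_i(2, 3))
```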

bquote() and substitute()

When using mathematical annotations, we sometimes need to include the value of a variable. In such cases, bquote() or substitute() are the functions you need (rather than expression(), which you may already be familiar with).

If you opt for bquote(), then variables to be evaluated must be wrapped in parentheses and preceded by a dot, e.g. .(var). If you choose substitute(), then the variables replaced will be the ones included in the list passed as argument env (which can also be an environment).

Let’s use both functions to add mathematical expressions to an empty plot:

delta <- 1.5
plot(c(0,1), c(0,1), type = "n", axes = FALSE, ann = FALSE)
text(0.5, .75, labels = bquote(beta^j == .(delta) + bold("h")), cex = 3)
text(0.5, .25, labels = substitute(alpha[i] == a + delta, env = list(a = 2)), cex = 3)
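
To see what each function hands over to text(), here is a small sketch that builds the same two expressions without any plot and inspects them with deparse(). The names e1 and e2 are ours:

```r
delta <- 1.5
# .(delta) is replaced by the value of delta
e1 <- bquote(beta^j == .(delta) + bold("h"))
# only `a` is in the list, so `delta` stays a symbol (drawn as the Greek letter)
e2 <- substitute(alpha[i] == a + delta, env = list(a = 2))
deparse(e1)
deparse(e2)
```

In other words, with substitute() a variable is replaced only if it is listed in env, whereas bquote() evaluates whatever is wrapped in .().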


That’s all folks!

Session info
#R>  R version 4.2.2 (2022-10-31)
#R>  Platform: x86_64-pc-linux-gnu (64-bit)
#R>  Running under: Ubuntu 22.04.1 LTS
#R>  Matrix products: default
#R>  BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/
#R>  LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/
#R>  locale:
#R>   [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#R>   [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#R>  attached base packages:
#R>  [1] stats     graphics  grDevices utils     datasets  methods   base     
#R>  other attached packages:
#R>  [1] microbenchmark_1.4.9   inSilecoRef_0.0.1.9000
#R>  loaded via a namespace (and not attached):
#R>   [1] tidyselect_1.2.0  xfun_0.35         bslib_0.4.1       vctrs_0.5.1      
#R>   [5] generics_0.1.3    miniUI_0.1.1.1    htmltools_0.5.4   yaml_2.3.6       
#R>   [9] utf8_1.2.2        rlang_1.0.6       later_1.3.0       pillar_1.8.1     
#R>  [13] jquerylib_0.1.4   httpcode_0.3.0    glue_1.6.2        DBI_1.1.3        
#R>  [17] lifecycle_1.0.3   plyr_1.8.8        stringr_1.5.0     blogdown_1.15    
#R>  [21] htmlwidgets_1.5.4 evaluate_0.18     knitr_1.41        fastmap_1.1.0    
#R>  [25] httpuv_1.6.6      curl_4.3.3        fansi_1.0.3       highr_0.9        
#R>  [29] Rcpp_1.0.9        xtable_1.8-4      promises_1.2.0.1  backports_1.4.1  
#R>  [33] DT_0.26           cachem_1.0.6      jsonlite_1.8.4    rcrossref_1.2.0  
#R>  [37] mime_0.12         digest_0.6.30     stringi_1.7.8     bookdown_0.30    
#R>  [41] dplyr_1.0.10      shiny_1.7.3       bibtex_0.5.0      cli_3.4.1        
#R>  [45] tools_4.2.2       magrittr_2.0.3    sass_0.4.4        tibble_3.1.8     
#R>  [49] RefManageR_1.4.0  crul_1.3          pkgconfig_2.0.3   ellipsis_0.3.2   
#R>  [53] xml2_1.3.3        timechange_0.1.1  lubridate_1.9.0   assertthat_0.2.1 
#R>  [57] rmarkdown_2.18    httr_1.4.4        R6_2.5.1          compiler_4.2.2
