Trick or tips 004 {R}

August 13, 2019

  R tips trickortips
  base utils graphics microbenchmark

Kevin Cazelles

Marie-Hélène Brice

   

Trick or Tips

Ever stumbled on a code chunk that made you say: "I should have known this f_ piece of code long ago!" Chances are you have, frustratingly, just like we have, and on multiple occasions too. In comes Trick or Tips!

Trick or Tips is a series of blog posts that each present 5 -- hopefully helpful -- coding tips for a specific programming language. Posts should be short (i.e. no more than 5 lines of code, max 80 characters per line, except when appropriate) and provide tips of many kinds: a function, a way of combining functions, a single argument, a note about the philosophy of the language and its practical consequences, tricks to improve the way you code, good practices, etc.

Note that while some tips might be obvious for careful documentation readers (God bless them for their wisdom), we do our best to present what we find very useful and underestimated. By the way, there are undoubtedly similar initiatives on the web (e.g. "One R Tip a Day" Twitter account). Last, feel free to comment below with tip ideas or a post of code tips of your own, which we will be happy to incorporate into our next post.

Enjoy and get ready to frustratingly appreciate our tips!

Subset an array with a matrix

Let’s consider two arrays of letters: the first has two dimensions (i.e. a matrix) and the second one has 3.

(arr2 <- array(LETTERS[1:9], dim = c(3,3)))
#R>       [,1] [,2] [,3]
#R>  [1,] "A"  "D"  "G" 
#R>  [2,] "B"  "E"  "H" 
#R>  [3,] "C"  "F"  "I"
class(arr2)
#R>  [1] "matrix" "array"
(arr3 <- array(LETTERS[1:18], dim = c(3, 3, 2)))
#R>  , , 1
#R>  
#R>       [,1] [,2] [,3]
#R>  [1,] "A"  "D"  "G" 
#R>  [2,] "B"  "E"  "H" 
#R>  [3,] "C"  "F"  "I" 
#R>  
#R>  , , 2
#R>  
#R>       [,1] [,2] [,3]
#R>  [1,] "J"  "M"  "P" 
#R>  [2,] "K"  "N"  "Q" 
#R>  [3,] "L"  "O"  "R"
class(arr3)
#R>  [1] "array"

Let’s say you need to subset a specific set of values based on the position of the elements. To subset a single element, say "G", there are a couple of options, but I guess the most common approach is to use [ with one value per dimension:

arr2[1,3]
#R>  [1] "G"
arr3[1,3,1]
#R>  [1] "G"

or with a single value giving the position of the element:

arr2[7]
#R>  [1] "G"
arr3[7]
#R>  [1] "G"
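
The single-value form works because R stores arrays in column-major order: positions are counted down the first column, then the second one, and so on. As a small sketch (the arrays are re-created so that the chunk runs on its own, and `pos2` and `pos3` are names chosen for the illustration), base R's arrayInd() converts such a position back into one index per dimension:

```r
arr2 <- array(LETTERS[1:9], dim = c(3, 3))
arr3 <- array(LETTERS[1:18], dim = c(3, 3, 2))
# position 7 in the 3 x 3 matrix: row 1, column 3
(pos2 <- arrayInd(7, dim(arr2)))
# same position in the 3 x 3 x 2 array: row 1, column 3, slice 1
(pos3 <- arrayInd(7, dim(arr3)))
# the result is a one-row matrix, so it can be used directly as an index
arr3[pos3]
```

Going the other way needs no helper, since a single position can always be passed to [ as is.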

Now consider the case where you have a vector of positions (one value per dimension of the array). In this case, beware of the shape of the index: a plain vector and a one-row matrix behave differently!

# a plain vector has no dimensions, so each value is read as a position: we get the 1st and 3rd elements
arr2[c(1,3)]
#R>  [1] "A" "C"
# t() turns the vector into a 1 x 2 matrix (a row vector): we get the element in the 1st row and the 3rd column
arr2[t(c(1,3))]
#R>  [1] "G"

And for more than one element, you need to use a matrix with one row per element to be subset:

(mat <- rbind(c(1,3), c(2,2)))
#R>       [,1] [,2]
#R>  [1,]    1    3
#R>  [2,]    2    2
arr2[mat]
#R>  [1] "G" "E"

Similarly, with an array of 3 dimensions, the matrix will have three columns and as many rows as there are elements:

# Let us subset `E`, `C`, `O` and `C` (again)
(msub <- rbind(c(2,2,1), c(3, 1, 1), c(3, 2, 2), c(3, 1, 1)))
#R>       [,1] [,2] [,3]
#R>  [1,]    2    2    1
#R>  [2,]    3    1    1
#R>  [3,]    3    2    2
#R>  [4,]    3    1    1
arr3[msub]
#R>  [1] "E" "C" "O" "C"
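
Index matrices do not have to be typed by hand. As a hedged sketch, which() with arr.ind = TRUE returns exactly this kind of matrix (one row per match, one column per dimension), and the same matrix can also be used on the left-hand side of an assignment. The names `idx` and `arr2_low` are made up for the illustration:

```r
arr2 <- array(LETTERS[1:9], dim = c(3, 3))
# locate "E" and "G": one row per match, listed in column-major order
(idx <- which(arr2 == "E" | arr2 == "G", arr.ind = TRUE))
arr2[idx]
# matrix indexing also works for assignment: lower-case the two matches
arr2_low <- arr2
arr2_low[idx] <- tolower(arr2[idx])
arr2_low
```

Because matches are listed in column-major order, "E" (2nd column) comes before "G" (3rd column).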

Two additional comments. First, we should always keep in mind that data frames and arrays are different:

# this gives you the 1st and 3rd **entire columns**
as.data.frame(arr2)[c(1,3)]
#R>    V1 V3
#R>  1  A  G
#R>  2  B  H
#R>  3  C  I
# this still gives you the element in the 1st row and the 3rd column
as.data.frame(arr2)[t(c(1,3))]
#R>  [1] "G"
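
The same distinction holds for several elements. A minimal sketch, assuming the data frame is built from arr2 as above (`df2`, `cols` and `cells` are illustrative names only): a plain vector keeps selecting whole columns, as it would for a list, whereas a two-column matrix picks individual cells.

```r
arr2 <- array(LETTERS[1:9], dim = c(3, 3))
df2 <- as.data.frame(arr2)
# a plain vector selects columns, as with a list
cols <- df2[c(1, 3)]
dim(cols)
# a two-column matrix selects cells: (row 1, col 3) and (row 2, col 2)
cells <- df2[rbind(c(1, 3), c(2, 2))]
cells
```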

Second, if you are a tidyverse user, there is a new article dealing with subassignment with tibbles 😎.

nzchar()

You may already be aware of nchar(), a function that returns the number of characters of each element of a character vector:

nchar(c("insil", "eco", ""))
#R>  [1] 5 3 0

nzchar() returns TRUE for every character string in the vector that has at least 1 character:

vec <- c("insil", "eco", "")
nzchar(vec)
#R>  [1]  TRUE  TRUE FALSE
# is there any empty character string in `vec`?
any(!nzchar(vec))
#R>  [1] TRUE

Interesting, but let’s dig deeper: I can think of no fewer than 3 ways of writing an equivalent expression with one more character:

nchar(vec) > 0
#R>  [1]  TRUE  TRUE FALSE
!! nchar(vec)
#R>  [1]  TRUE  TRUE FALSE
nchar(vec) & 1
#R>  [1]  TRUE  TRUE FALSE

One more character… so why bother? 💡 It should be a matter of performance! Let’s check that out with the cool 📦 microbenchmark:

library(microbenchmark)
microbenchmark(nchar(vec) > 0, !! nchar(vec), nchar(vec) & 1, nzchar(vec),
  times = 1000L)
#R>  Unit: nanoseconds
#R>             expr min  lq     mean median   uq  max neval
#R>   nchar(vec) > 0 842 892  938.491    911  932 5631  1000
#R>     !!nchar(vec) 922 962 1005.029    981 1002 3777  1000
#R>   nchar(vec) & 1 942 982 1026.216   1002 1031 2194  1000
#R>      nzchar(vec)  69  90   96.917     90  100 1533  1000

Yep yep! nzchar() is indeed way faster 🚀!
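
Speed aside, a common use is to drop empty strings from a vector. The sketch below also shows one subtlety worth keeping in mind: by default nzchar() reports TRUE for a missing string, unless keepNA = TRUE is set (an argument found in reasonably recent versions of R). The names `kept` and `flags` are only there for the illustration:

```r
vec <- c("insil", "eco", "", NA)
# keep the non-empty strings (NA is kept too: nzchar(NA) is TRUE by default)
(kept <- vec[nzchar(vec)])
# with keepNA = TRUE, missing strings yield NA instead of TRUE
(flags <- nzchar(vec, keepNA = TRUE))
```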

Do you need to use return()?

If you have already written your own function, you must have used return() to specify what your function should return. In some programming languages this instruction is mandatory, but not in R! Check out the documentation ?return:

If the end of a function is reached without calling ‘return’, the value of the last evaluated expression is returned.

Let me write 2 functions:

add_v <- function(x, y) {
   x + y
}
add_v2 <- function(x, y) {
   return(x + y)
}

add_v() and add_v2() are equivalent! So… do we care? Well, you must bear in mind that whenever return() is encountered, the evaluation of the set of expressions within the function is stopped and therefore some time can be saved:

foo <- function(x) {
  out <- 0
  if (x > 3) out <- 3
  if (x > 2) out <- 2
  if (x > 1) out <- 1
  return(out)
}
foo2 <- function(x) {
  out <- 0
  if (x > 3) return(3)
  if (x > 2) return(2)
  if (x > 1) return(1)
  return(out)
}
# note: foo(4) returns 1 whereas foo2(4) returns 3; only the timings are compared here
microbenchmark(foo(4), foo2(4), times = 1e5)
#R>  Unit: nanoseconds
#R>      expr min  lq     mean median  uq      max neval
#R>    foo(4) 430 461 566.5423    461 472  3749391 1e+05
#R>   foo2(4) 310 341 511.5477    351 361 12611775 1e+05
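
Early exits are where return() earns its keep. As a sketch, the hypothetical helper below (first_negative() is not part of any package) stops scanning as soon as it finds a negative value, and otherwise relies on the last evaluated expression:

```r
# hypothetical helper: position of the first negative value, NA if none
first_negative <- function(x) {
  for (i in seq_along(x)) {
    if (x[i] < 0) return(i)  # leave the function right away
  }
  NA_integer_  # reached only if the loop ends; no return() needed
}
first_negative(c(3, 1, -2, 5, -7))
first_negative(1:3)
```

The first call stops at the third element and never looks at the remaining values; the second one runs through the whole vector and falls back on NA.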

invisible()

Let’s keep talking about what functions return. The function invisible() allows you to return an invisible copy of an object, meaning that nothing is (apparently) returned if the result is not assigned:

add_v <- function(x, y) {
  x + y
}
##
add_i <- function(x, y) {
  invisible(x + y)
}
add_v(2, 3)
#R>  [1] 5
add_i(2, 3)
res <- add_i(2, 3)
res
#R>  [1] 5

But… why? As explained in the documentation (?invisible):

This function can be useful when it is desired to have functions return values which can be assigned, but which do not print when they are not assigned.

This is indeed helpful when you have a function that creates a plot (so you don’t normally need to assign the result) but for which you sometimes need an object that was created during the evaluation of the function:

plot_logy <- function(x, y) {
  # create ty
  ty <- log10(y + 1)
  plot(x, ty)
  invisible(ty)
}
plot_logy(0:10, 0:10)
# get ty
ty <- plot_logy(0:10, 0:10)
ty
#R>   [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513 0.8450980 0.9030900 0.9542425
#R>  [10] 1.0000000 1.0413927
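
To check what is going on, base R provides withVisible(), which reports whether a value would be printed. A small sketch, re-defining add_i() so that the chunk is self-contained (`vis` is an illustrative name); wrapping a call in parentheses is a quick way to print an invisible result:

```r
add_i <- function(x, y) {
  invisible(x + y)
}
# the value is there, but it is flagged as not visible
vis <- withVisible(add_i(2, 3))
vis$value
vis$visible
# parentheses force printing
(add_i(2, 3))
```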

bquote() and substitute()

When using mathematical annotations, we sometimes need to include the value of a variable. In such cases, bquote() or substitute() are the functions you need (rather than expression(), which you may already be familiar with).

If you opt for bquote(), then the variables to be evaluated must be put in parentheses and preceded by a dot, e.g. .(var). If you choose substitute(), then the variables evaluated will be the ones included in the list passed as argument env (which can also be an environment).

Let’s use both functions to add mathematical expressions to an empty plot:

delta <- 1.5
plot(c(0,1), c(0,1), type = "n", axes = FALSE, ann = FALSE)
text(0.5, .75, labels = bquote(beta^j == .(delta) + bold("h")), cex = 3)
text(0.5, .25, labels = substitute(alpha[i] == a + delta, env = list(a = 2)), cex = 3)
box()
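
To see what each function actually substitutes, one can inspect the returned expressions without plotting anything. A minimal sketch, where `e1` and `e2` are illustrative names and deparse() is only used to turn the results into strings:

```r
delta <- 1.5
# bquote(): only what is wrapped in .() is replaced by its value
e1 <- bquote(beta^j == .(delta) + bold("h"))
deparse(e1)
# substitute(): only the names listed in `env` are replaced; `delta` is not
# in the list, so it stays a symbol (and is drawn as the Greek letter)
e2 <- substitute(alpha[i] == a + delta, env = list(a = 2))
deparse(e2)
```

This explains why the second label in the plot shows a Greek delta rather than 1.5, even though a variable named delta exists in the workspace.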

That’s all folks!

Display information relative to the R session used to render this post.
sessionInfo()
#R>  R version 4.4.2 (2024-10-31)
#R>  Platform: x86_64-pc-linux-gnu
#R>  Running under: Ubuntu 22.04.5 LTS
#R>  
#R>  Matrix products: default
#R>  BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#R>  LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#R>  
#R>  locale:
#R>   [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
#R>   [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
#R>   [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#R>  
#R>  time zone: UTC
#R>  tzcode source: system (glibc)
#R>  
#R>  attached base packages:
#R>  [1] stats     graphics  grDevices utils     datasets  methods   base     
#R>  
#R>  other attached packages:
#R>  [1] microbenchmark_1.5.0 inSilecoRef_0.1.1   
#R>  
#R>  loaded via a namespace (and not attached):
#R>   [1] sass_0.4.9        generics_0.1.3    xml2_1.3.6        blogdown_1.19     stringi_1.8.4    
#R>   [6] httpcode_0.3.0    digest_0.6.37     magrittr_2.0.3    evaluate_1.0.1    bookdown_0.41    
#R>  [11] fastmap_1.2.0     plyr_1.8.9        jsonlite_1.8.9    backports_1.5.0   crul_1.5.0       
#R>  [16] promises_1.3.2    bibtex_0.5.1      jquerylib_0.1.4   cli_3.6.3         shiny_1.10.0     
#R>  [21] rlang_1.1.4       cachem_1.1.0      yaml_2.3.10       tools_4.4.2       dplyr_1.1.4      
#R>  [26] httpuv_1.6.15     DT_0.33           rcrossref_1.2.0   curl_6.0.1        vctrs_0.6.5      
#R>  [31] R6_2.5.1          mime_0.12         lifecycle_1.0.4   stringr_1.5.1     fs_1.6.5         
#R>  [36] htmlwidgets_1.6.4 miniUI_0.1.1.1    pkgconfig_2.0.3   pillar_1.10.0     bslib_0.8.0      
#R>  [41] later_1.4.1       glue_1.8.0        Rcpp_1.0.13-1     xfun_0.49         tibble_3.2.1     
#R>  [46] tidyselect_1.2.1  knitr_1.49        xtable_1.8-4      htmltools_0.5.8.1 rmarkdown_2.29   
#R>  [51] compiler_4.4.2

Edits

Feb 6, 2023 -- Remove redundant headers.