R tips
Tags: trickortips, base, utils, graphics, microbenchmark
Today’s menu:
nzchar()
return()?
invisible()
bquote() and substitute()
Let’s consider two arrays of letters: the first has two dimensions (i.e. a matrix) and the second one has 3.
(arr2 <- array(LETTERS[1:9], dim = c(3,3)))
#R> [,1] [,2] [,3]
#R> [1,] "A" "D" "G"
#R> [2,] "B" "E" "H"
#R> [3,] "C" "F" "I"
class(arr2)
#R> [1] "matrix" "array"
(arr3 <- array(LETTERS[1:18], dim = c(3, 3, 2)))
#R> , , 1
#R>
#R> [,1] [,2] [,3]
#R> [1,] "A" "D" "G"
#R> [2,] "B" "E" "H"
#R> [3,] "C" "F" "I"
#R>
#R> , , 2
#R>
#R> [,1] [,2] [,3]
#R> [1,] "J" "M" "P"
#R> [2,] "K" "N" "Q"
#R> [3,] "L" "O" "R"
class(arr3)
#R> [1] "array"
Let’s say you need to subset a specific set of values based on the position of the elements. To subset a single element, say "G", there are a couple of options, but I guess the most common approach is to use [ with one value per dimension:
arr2[1,3]
#R> [1] "G"
arr3[1,3,1]
#R> [1] "G"
or with a single value giving the position of the element (arrays are stored column by column, so "G" is the 7th element):
arr2[7]
#R> [1] "G"
arr3[7]
#R> [1] "G"
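Both forms are linked by this column-major storage: in a matrix with n rows, element [i, j] sits at position i + (j - 1) * n. A minimal sketch going both ways, using base R's arrayInd() to turn a position back into one subscript per dimension (the names `pos` and `subs` are just for illustration):

```r
arr2 <- array(LETTERS[1:9], dim = c(3, 3))

# from subscripts to position: row + (col - 1) * number of rows
pos <- 1 + (3 - 1) * nrow(arr2)
pos
#R> [1] 7

# from position back to subscripts: a one-row matrix, one column per dimension
subs <- arrayInd(pos, dim(arr2))
subs
#R>      [,1] [,2]
#R> [1,]    1    3

# such a one-row matrix can itself be used to subset
arr2[subs]
#R> [1] "G"
```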
Now consider the case where you have a vector of positions (one value per dimension of the array). In this case, beware of the orientation of the vector!
# with the line below, we get the 1st and 3rd elements because a plain (column) vector is read as a set of positions
arr2[c(1,3)]
#R> [1] "A" "C"
# whereas with a row vector (i.e. a one-row matrix), we obtain the element in the 1st row and the 3rd column
arr2[t(c(1,3))]
#R> [1] "G"
And for more than one element, you need to use a matrix with one row per element to be subset:
(mat <- rbind(c(1,3), c(2,2)))
#R> [,1] [,2]
#R> [1,] 1 3
#R> [2,] 2 2
arr2[mat]
#R> [1] "G" "E"
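Such an index matrix does not have to be typed by hand: which() with arr.ind = TRUE returns one row per matching element and one column per dimension, so its output can be fed straight back into [. A short sketch (the target letters are arbitrary):

```r
arr2 <- array(LETTERS[1:9], dim = c(3, 3))

# the comparison keeps the dimensions of arr2, so which() can return
# subscripts: one row per match, (2, 2) for "E" then (1, 3) for "G"
idx <- which(arr2 == "E" | arr2 == "G", arr.ind = TRUE)

# matches come in column-major order, hence "E" before "G"
arr2[idx]
#R> [1] "E" "G"
```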
Similarly, with an array of 3 dimensions, the matrix will have three columns and as many rows as there are elements to subset:
# Let us subset `E`, `C`, `O` and `C` (again)
(msub <- rbind(c(2,2,1), c(3, 1, 1), c(3, 2, 2), c(3, 1, 1)))
#R> [,1] [,2] [,3]
#R> [1,] 2 2 1
#R> [2,] 3 1 1
#R> [3,] 3 2 2
#R> [4,] 3 1 1
arr3[msub]
#R> [1] "E" "C" "O" "C"
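Each row of the index matrix corresponds to a position in the underlying vector, still counted column by column: for dimensions d1, d2 and d3, element [i, j, k] sits at i + (j - 1) * d1 + (k - 1) * d1 * d2. The helper below, `sub_to_pos()` (a hypothetical name, not a base R function), sketches that conversion for any number of dimensions and checks it against matrix subsetting:

```r
arr3 <- array(LETTERS[1:18], dim = c(3, 3, 2))
msub <- rbind(c(2, 2, 1), c(3, 1, 1), c(3, 2, 2), c(3, 1, 1))

# hypothetical helper: converts a matrix of subscripts (one row per element,
# one column per dimension) into positions in column-major order
sub_to_pos <- function(subs, dims) {
  # multiplier of each dimension: 1, d1, d1 * d2, ...
  mult <- cumprod(c(1, dims[-length(dims)]))
  # offsets are 0-based, hence the - 1 before and the + 1 after
  as.vector((subs - 1) %*% mult) + 1
}

pos <- sub_to_pos(msub, dim(arr3))
pos
#R> [1]  5  3 15  3
arr3[pos]
#R> [1] "E" "C" "O" "C"
```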
Two additional comments. First, we should always keep in mind that data frames and arrays are different:
# this gives you the 1st and 3rd **entire columns**
as.data.frame(arr2)[c(1,3)]
#R> V1 V3
#R> 1 A G
#R> 2 B H
#R> 3 C I
# this still gives you the element in the 1st row and the 3rd column
as.data.frame(arr2)[t(c(1,3))]
#R> [1] "G"
Second, if you are a tidyverse user, there is a new article dealing with subassignment with tibble 😎.
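Back to the first comment: to reach a single cell of a data frame, the [row, column] form is probably the clearest option. And because [ converts a data frame with as.matrix() when it is indexed by a matrix, the one-row-per-element trick shown for arrays works as well. A short sketch; stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version:

```r
arr2 <- array(LETTERS[1:9], dim = c(3, 3))
df2 <- as.data.frame(arr2, stringsAsFactors = FALSE)

# one cell: row 1, column 3
df2[1, 3]
#R> [1] "G"

# several cells at once, one row per element, as for arrays
df2[rbind(c(1, 3), c(2, 2))]
#R> [1] "G" "E"
```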
nzchar()
You may already be aware of nchar(), a function that returns the number of characters of each element of a character vector:
nchar(c("insil", "eco", ""))
#R> [1] 5 3 0
nzchar() returns TRUE for every character string in the vector that has at least 1 character:
vec <- c("insil", "eco", "")
nzchar(vec)
#R> [1] TRUE TRUE FALSE
# is there any empty character string in `vec`?
any(!nzchar(vec))
#R> [1] TRUE
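Because nzchar() returns one logical value per element, it also slots directly into subsetting and into all()/any(). A small sketch reusing the same `vec`:

```r
vec <- c("insil", "eco", "")

# keep only the non-empty strings
vec[nzchar(vec)]
#R> [1] "insil" "eco"

# are all strings non-empty?
all(nzchar(vec))
#R> [1] FALSE
```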
Interesting, but let’s dig deeper: I can think of no fewer than 3 equivalent expressions that are only one character longer (not counting spaces):
nchar(vec) > 0
#R> [1] TRUE TRUE FALSE
!! nchar(vec)
#R> [1] TRUE TRUE FALSE
nchar(vec) & 1
#R> [1] TRUE TRUE FALSE
One more character… so why bother? 💡 It should be a matter of performance! Let’s check that out with the cool 📦 microbenchmark:
library(microbenchmark)
microbenchmark(nchar(vec) > 0, !! nchar(vec), nchar(vec) & 1, nzchar(vec),
times = 1000L)
#R> Unit: nanoseconds
#R> expr min lq mean median uq max neval
#R> nchar(vec) > 0 900 1000 1059.9 1000 1100 2900 1000
#R> !!nchar(vec) 1000 1100 1158.6 1100 1200 3900 1000
#R> nchar(vec) & 1 1000 1100 1205.3 1100 1200 12500 1000
#R> nzchar(vec) 0 0 66.7 100 100 500 1000
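Yep yep! nzchar() is indeed way faster 🚀! Keep in mind that these timings come from a 3-element vector and will vary from one machine to another.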
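Speed is not the only difference, though: the alternatives above are only equivalent when there are no missing values. By default, nzchar() treats NA as a non-empty string (its keepNA argument defaults to FALSE), while nchar() returns NA for a missing string, and that NA then propagates through the comparison. A quick check with a made-up vector:

```r
vec_na <- c("insil", NA, "")

nzchar(vec_na)
#R> [1]  TRUE  TRUE FALSE
nchar(vec_na) > 0
#R> [1]  TRUE    NA FALSE

# keepNA = TRUE makes nzchar() return NA for missing strings as well
nzchar(vec_na, keepNA = TRUE)
#R> [1]  TRUE    NA FALSE
```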
return()?
If you have already written your own function, you have probably used return() to specify what your function should return. There are programming languages where this instruction is mandatory, but not R! Check out the documentation (?return):
If the end of a function is reached without calling ‘return’, the value of the last evaluated expression is returned.
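The "last evaluated expression" can also be an if/else statement, in which case the value of the branch that ran is returned. One subtlety is worth knowing, shown in the small sketch below (the function names are arbitrary): an if without else whose condition is FALSE evaluates to NULL, returned invisibly.

```r
sign_label <- function(x) {
  # the whole if/else is the last expression: its value is returned
  if (x > 0) "positive" else "non-positive"
}
sign_label(2)
#R> [1] "positive"
sign_label(-1)
#R> [1] "non-positive"

only_positive <- function(x) {
  # no else branch: NULL (invisible) is returned when x <= 0
  if (x > 0) "positive"
}
only_positive(-1)          # prints nothing
is.null(only_positive(-1))
#R> [1] TRUE
```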
Let me write 2 functions:
add_v <- function(x, y) {
x + y
}
add_v2 <- function(x, y) {
return(x + y)
}
add_v() and add_v2() are equivalent! So… do we care? Well, bear in mind that as soon as return() is evaluated, the function exits and the remaining expressions are skipped, so some time can be saved:
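foo <- function(x) {
out <- 0
# all three tests are always evaluated; they go from least to most
# restrictive so that the last one that is TRUE sets the value
if (x > 1) out <- 1
if (x > 2) out <- 2
if (x > 3) out <- 3
return(out)
}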
foo2 <- function(x) {
out <- 0
if (x > 3) return(3)
if (x > 2) return(2)
if (x > 1) return(1)
return(out)
}
microbenchmark(foo(4), foo2(4), times = 1e5)
#R> Unit: nanoseconds
#R> expr min lq mean median uq max neval
#R> foo(4) 400 500 920.786 500 600 18631300 1e+05
#R> foo2(4) 200 300 459.573 300 400 3311300 1e+05
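Early returns are not the only way to skip the remaining tests: chaining the conditions with else if gives the same behaviour, since R stops at the first condition that is TRUE. A sketch of an equivalent of foo2() written that way (foo3 is a hypothetical name, and it was not part of the benchmark above):

```r
foo3 <- function(x) {
  # conditions are tested in order and evaluation stops at the first TRUE one;
  # the value of the branch taken is the last evaluated expression
  if (x > 3) {
    3
  } else if (x > 2) {
    2
  } else if (x > 1) {
    1
  } else {
    0
  }
}
foo3(4)
#R> [1] 3
foo3(2.5)
#R> [1] 2
foo3(0)
#R> [1] 0
```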
invisible()
Let’s keep talking about what functions return. The function invisible() returns an invisible copy of an object, meaning that the value is still returned but nothing is printed when it is not assigned:
add_v <- function(x, y) {
x + y
}
##
add_i <- function(x, y) {
invisible(x + y)
}
add_v(2, 3)
#R> [1] 5
add_i(2, 3)  # nothing is printed
res <- add_i(2, 3)
res
#R> [1] 5
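An invisible value is not lost, it is simply not auto-printed. Wrapping the call in parentheses (or in print()) displays it, and base R's withVisible() reports whether a value would be printed. A quick check, redefining add_v() and add_i() so the block runs on its own:

```r
add_v <- function(x, y) x + y
add_i <- function(x, y) invisible(x + y)

# parentheses force the value to be printed
(add_i(2, 3))
#R> [1] 5

# withVisible() returns the value along with its visibility flag
withVisible(add_v(2, 3))$visible
#R> [1] TRUE
withVisible(add_i(2, 3))$visible
#R> [1] FALSE
```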
But… why? As explained in the documentation (?invisible):
This function can be useful when it is desired to have functions return values which can be assigned, but which do not print when they are not assigned.
This is indeed helpful when you have a function that creates a plot (whose result you don’t normally assign) but for which you sometimes need an object that was created during the evaluation of the function:
plot_logy <- function(x, y) {
# create ty
ty <- log10(y + 1)
plot(x, ty)
invisible(ty)
}
plot_logy(0:10, 0:10)
# get ty
ty <- plot_logy(0:10, 0:10)
ty
#R> [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513 0.8450980
#R> [8] 0.9030900 0.9542425 1.0000000 1.0413927
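The same idea is behind a common convention for print methods: they display something and then return their argument invisibly, so that an explicit call to print() does not print twice and the object can still be reused. A minimal sketch, with a made-up "temperature" class:

```r
# hypothetical S3 class, for illustration only
t1 <- structure(21.5, class = "temperature")

print.temperature <- function(x, ...) {
  cat("Temperature:", unclass(x), "\n")
  # return the object itself, invisibly
  invisible(x)
}

t1
#R> Temperature: 21.5
res <- print(t1)
#R> Temperature: 21.5
identical(res, t1)
#R> [1] TRUE
```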
bquote() and substitute()
When using mathematical annotations, we sometimes need to include the value of a variable. In such cases, bquote() or substitute() are the functions you need (rather than expression(), which you may already be familiar with).
If you opt for bquote(), then the variables to be evaluated must be wrapped in parentheses and preceded by a dot, e.g. .(var). If you choose substitute(), then the variables replaced are those found in the list passed as argument env (which can also be an environment).
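To see what the two functions actually build, the resulting language objects can be inspected directly, no plot needed. A small sketch: deparse() turns each call back into text, and an expression() version is included for contrast, since it leaves the variable unevaluated.

```r
delta <- 1.5

# bquote(): only the part wrapped in .() is evaluated
e1 <- bquote(beta^j == .(delta))
deparse(e1)
#R> [1] "beta^j == 1.5"

# substitute(): only symbols found in the env list are replaced;
# delta is not in the list, so it stays a symbol (drawn as a Greek letter)
e2 <- substitute(alpha[i] == a + delta, env = list(a = 2))
deparse(e2)
#R> [1] "alpha[i] == 2 + delta"

# expression(): nothing is evaluated
deparse(expression(beta^j == delta)[[1]])
#R> [1] "beta^j == delta"
```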
Let’s use both functions to add mathematical expressions to an empty plot:
delta <- 1.5
plot(c(0,1), c(0,1), type = "n", axes = FALSE, ann = FALSE)
text(0.5, .75, labels = bquote(beta^j == .(delta) + bold("h")), cex = 3)
text(0.5, .25, labels = substitute(alpha[i] == a + delta, env = list(a = 2)), cex = 3)
box()
sessionInfo()
#R> R version 4.2.2 (2022-10-31)
#R> Platform: x86_64-pc-linux-gnu (64-bit)
#R> Running under: Ubuntu 22.04.1 LTS
#R>
#R> Matrix products: default
#R> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#R> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
#R>
#R> locale:
#R> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#R> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#R> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#R> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#R>
#R> attached base packages:
#R> [1] stats graphics grDevices utils datasets methods base
#R>
#R> other attached packages:
#R> [1] microbenchmark_1.4.9 inSilecoRef_0.0.1.9000
#R>
#R> loaded via a namespace (and not attached):
#R> [1] tidyselect_1.2.0 xfun_0.35 bslib_0.4.1 vctrs_0.5.1
#R> [5] generics_0.1.3 miniUI_0.1.1.1 htmltools_0.5.4 yaml_2.3.6
#R> [9] utf8_1.2.2 rlang_1.0.6 later_1.3.0 pillar_1.8.1
#R> [13] jquerylib_0.1.4 httpcode_0.3.0 glue_1.6.2 DBI_1.1.3
#R> [17] lifecycle_1.0.3 plyr_1.8.8 stringr_1.5.0 blogdown_1.15
#R> [21] htmlwidgets_1.5.4 evaluate_0.18 knitr_1.41 fastmap_1.1.0
#R> [25] httpuv_1.6.6 curl_4.3.3 fansi_1.0.3 highr_0.9
#R> [29] Rcpp_1.0.9 xtable_1.8-4 promises_1.2.0.1 backports_1.4.1
#R> [33] DT_0.26 cachem_1.0.6 jsonlite_1.8.4 rcrossref_1.2.0
#R> [37] mime_0.12 digest_0.6.30 stringi_1.7.8 bookdown_0.30
#R> [41] dplyr_1.0.10 shiny_1.7.3 bibtex_0.5.0 cli_3.4.1
#R> [45] tools_4.2.2 magrittr_2.0.3 sass_0.4.4 tibble_3.1.8
#R> [49] RefManageR_1.4.0 crul_1.3 pkgconfig_2.0.3 ellipsis_0.3.2
#R> [53] xml2_1.3.3 timechange_0.1.1 lubridate_1.9.0 assertthat_0.2.1
#R> [57] rmarkdown_2.18 httr_1.4.4 R6_2.5.1 compiler_4.2.2