Trick or Tips
Ever stumbled on a code chunk that made you say:
"I should have known this f_ piece of code long ago!"
Chances are you have, frustratingly, just like we have, and on multiple occasions too. In comes Trick or Tips!
Trick or Tips is a series of blog posts that each
present 5 -- hopefully helpful -- coding tips for a specific programming
language. Each tip should be short (i.e. no more than 5 lines of code,
max 80 characters per line, except when appropriate), and tips can be of
many kinds: a function, a way of combining functions, a single argument,
a note about the philosophy of the language and its practical consequences,
tricks to improve the way you code, good practices, etc.
Note that while some tips might be obvious to careful documentation readers
(God bless them for their wisdom), we do our best to present what we find very
useful and underestimated. By the way, there are undoubtedly similar initiatives on the web (e.g.
the "One R Tip a Day" Twitter account). Last, feel free to comment below with tip ideas or a post of code tips of your own, which we will be happy to incorporate into our next post.
Enjoy and get ready to frustratingly appreciate our tips!
The apropos() function
A powerful way to look for a function whose name you can barely remember,
directly in R, i.e. without googling!
```r
apropos('Sys')
#R> [1] ".First.sys" ".sys.timezone" "R_system_version" "sys.call" "sys.calls"
#R> [6] "Sys.chmod" "Sys.Date" "sys.frame" "sys.frames" "sys.function"
#R> [11] "Sys.getenv" "Sys.getlocale" "Sys.getpid" "Sys.glob" "Sys.info"
#R> [16] "sys.load.image" "Sys.localeconv" "sys.nframe" "sys.on.exit" "sys.parent"
#R> [21] "sys.parents" "Sys.readlink" "sys.save.image" "Sys.setenv" "Sys.setFileTime"
#R> [26] "Sys.setLanguage" "Sys.setlocale" "Sys.sleep" "sys.source" "sys.status"
#R> [31] "Sys.time" "Sys.timezone" "Sys.umask" "Sys.unsetenv" "Sys.which"
#R> [36] "system" "system.file" "system.time" "system2"
```
You can also take advantage of regular expressions to narrow down your search:
```r
apropos('^Sys')
#R> [1] "sys.call" "sys.calls" "Sys.chmod" "Sys.Date" "sys.frame"
#R> [6] "sys.frames" "sys.function" "Sys.getenv" "Sys.getlocale" "Sys.getpid"
#R> [11] "Sys.glob" "Sys.info" "sys.load.image" "Sys.localeconv" "sys.nframe"
#R> [16] "sys.on.exit" "sys.parent" "sys.parents" "Sys.readlink" "sys.save.image"
#R> [21] "Sys.setenv" "Sys.setFileTime" "Sys.setLanguage" "Sys.setlocale" "Sys.sleep"
#R> [26] "sys.source" "sys.status" "Sys.time" "Sys.timezone" "Sys.umask"
#R> [31] "Sys.unsetenv" "Sys.which" "system" "system.file" "system.time"
#R> [36] "system2"
```
Or even better:
```r
apropos('^Sys.*time$', ignore.case = FALSE)
#R> [1] "Sys.time"
```
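Beyond the pattern itself, apropos() also takes a mode argument that restricts matches to objects of a given type, and a where argument that reports where each match was found. The sketch below is only an illustration: the pattern '^read\\.' is an arbitrary choice, and the exact results depend on which packages are attached in your session.

```r
# Keep only functions whose name starts with "read." (the pattern is an
# arbitrary example; results depend on the attached packages)
readers <- apropos("^read\\.", mode = "function")
head(readers)

# where = TRUE names each match by its position on the search path
apropos("^Sys\\.time$", where = TRUE, ignore.case = FALSE)
```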
The table() function
Oftentimes we wish to extract the frequency of certain elements in a dataset.
There is a very useful function that allows us to achieve this quite efficiently:
table(). Let’s see how this works:
```r
df <- data.frame(data = sample(1:5, 20, replace = TRUE))
table(df$data)
#R>
#R> 1 2 3 4 5
#R> 5 3 6 1 5
```
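Since the example above relies on sample(), its counts change on every run. As a complementary sketch, here is a small hand-written vector (hypothetical values) showing a few things that are often done with the result of table(): sorting the counts, converting them to proportions, and extracting the most frequent value.

```r
# A fixed vector, so the counts do not depend on random sampling
x <- c("b", "a", "c", "a", "b", "a")

counts <- table(x)
sort(counts, decreasing = TRUE)   # most frequent values first
prop.table(counts)                # relative frequencies (they sum to 1)
names(which.max(counts))          # label of the most frequent value
```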
You can also get the frequency for a data.frame with multiple columns. For example,
if you observed species at a site throughout multiple years and wanted to know
the frequency of observations per species per year:
```r
df <- data.frame(
  observations = paste0('species', sample(1:5, 50, replace = TRUE)),
  year = sort(sample(2015:2018, 50, replace = TRUE))
)
table(df)
#R>             year
#R> observations 2015 2016 2017 2018
#R>     species1    3    1    1    4
#R>     species2    3    2    1    2
#R>     species3    3    3    2    4
#R>     species4    0    4    3    1
#R>     species5    1    2    5    5
```
You can actually do the same with more than two columns:
```r
df$atr1 <- rep(c("val1", "val2"), each = 25)
tb <- table(df)
tb
#R> , , atr1 = val1
#R>
#R>             year
#R> observations 2015 2016 2017 2018
#R>     species1    3    1    1    0
#R>     species2    3    2    0    0
#R>     species3    3    3    0    0
#R>     species4    0    4    0    0
#R>     species5    1    2    2    0
#R>
#R> , , atr1 = val2
#R>
#R>             year
#R> observations 2015 2016 2017 2018
#R>     species1    0    0    0    4
#R>     species2    0    0    1    2
#R>     species3    0    0    2    4
#R>     species4    0    0    3    1
#R>     species5    0    0    3    5
```
As you can see, in such cases you will have to deal with arrays:
```r
tb[, , 1]
#R>             year
#R> observations 2015 2016 2017 2018
#R>     species1    3    1    1    0
#R>     species2    3    2    0    0
#R>     species3    3    3    0    0
#R>     species4    0    4    0    0
#R>     species5    1    2    2    0
```
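Positional indexing works, but a multi-dimensional table can also be sliced by name, and collapsed over one dimension with margin.table(). The following is a minimal sketch on a small hand-made data set (the values are hypothetical, chosen so the counts are easy to check by eye).

```r
# Small hand-made data set (hypothetical values)
df <- data.frame(
  observations = c("species1", "species1", "species2", "species2", "species2"),
  year = c(2015, 2016, 2015, 2015, 2016),
  atr1 = c("val1", "val2", "val1", "val2", "val2")
)
tb <- table(df)

dim(tb)                          # 2 species x 2 years x 2 attribute values
tb[, , "val1"]                   # slice by name rather than by position
tb["species2", "2015", "val2"]   # a single cell
margin.table(tb, c(1, 2))        # sum over atr1: back to species x year
```

Indexing by name tends to be more robust than indexing by position, since it keeps working if the order of the levels changes.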
Going one step further, by combining table() with paste0() (see
fish and tips 001 for an explanation of this useful function!), you can create
your desired data.frame:
```r
as.data.frame(table(paste0(df$year, '_', df$observations)))
#R>             Var1 Freq
#R> 1  2015_species1    3
#R> 2  2015_species2    3
#R> 3  2015_species3    3
#R> 4  2015_species5    1
#R> 5  2016_species1    1
#R> 6  2016_species2    2
#R> 7  2016_species3    3
#R> 8  2016_species4    4
#R> 9  2016_species5    2
#R> 10 2017_species1    1
#R> 11 2017_species2    1
#R> 12 2017_species3    2
#R> 13 2017_species4    3
#R> 14 2017_species5    5
#R> 15 2018_species1    4
#R> 16 2018_species2    2
#R> 17 2018_species3    4
#R> 18 2018_species4    1
#R> 19 2018_species5    5
```
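As a possible alternative to pasting the columns together, as.data.frame() can be applied directly to a multi-column table, which keeps each variable in its own column. Note one difference, illustrated in this sketch with hypothetical data: combinations that were never observed appear with a frequency of 0, whereas the paste0() approach simply omits them.

```r
df <- data.frame(
  observations = c("species1", "species1", "species2", "species2"),
  year = c(2015, 2016, 2015, 2015)
)

# as.data.frame() on a table returns one row per combination of levels,
# including combinations that were never observed (Freq = 0)
long <- as.data.frame(table(df))
long

# Drop the zero counts if only observed combinations are wanted
long[long$Freq > 0, ]
```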
Everything but 0
This is a well-known trick among developers that may be useful for many beginners.
In R, when performing a logical test, every numeric value is considered TRUE
except 0 (which is FALSE):
```r
0 == FALSE
!0
!1
!7.45
#R> [1] TRUE
#R> [1] TRUE
#R> [1] FALSE
#R> [1] FALSE
```
This can actually be very helpful, for instance when we are testing whether
or not a vector is empty!
```r
vec0 <- 1:7
vec1 <- vec0[vec0 > 5]
vec2 <- vec0[vec0 > 7]
!(length(vec1))
!(length(vec2))
#R> [1] FALSE
#R> [1] TRUE
```
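In practice, this test usually ends up inside an if statement. Here is a minimal sketch, where describe() is a hypothetical helper written only for this illustration. The last line shows a caveat worth keeping in mind: NA is neither TRUE nor FALSE, so negating it returns NA.

```r
# Hypothetical helper: report whether a vector is empty before using it
describe <- function(x) {
  if (!length(x)) {
    return("empty")
  }
  paste(length(x), "element(s)")
}

vec0 <- 1:7
describe(vec0[vec0 > 5])   # two elements: 6 and 7
describe(vec0[vec0 > 7])   # nothing is greater than 7

# Caveat: NA is neither TRUE nor FALSE, so negating it still gives NA
!NA
```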
expand.grid() vs. combn()
If you often create empty data frames, you are very likely already familiar
with the expand.grid() function:
```r
expand.grid(LETTERS[1:4], LETTERS[5:6])
#R>   Var1 Var2
#R> 1    A    E
#R> 2    B    E
#R> 3    C    E
#R> 4    D    E
#R> 5    A    F
#R> 6    B    F
#R> 7    C    F
#R> 8    D    F
```
But if you are looking for unique combinations (think about all combinations of
games in a tournament of four teams), you may feel that expand.grid() is not
what you need:
```r
expand.grid(LETTERS[1:4], LETTERS[1:4])
#R>    Var1 Var2
#R> 1     A    A
#R> 2     B    A
#R> 3     C    A
#R> 4     D    A
#R> 5     A    B
#R> 6     B    B
#R> 7     C    B
#R> 8     D    B
#R> 9     A    C
#R> 10    B    C
#R> 11    C    C
#R> 12    D    C
#R> 13    A    D
#R> 14    B    D
#R> 15    C    D
#R> 16    D    D
```
In comes combn():
```r
combn(LETTERS[1:5], 2)
#R>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#R> [1,] "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "D"
#R> [2,] "B"  "C"  "D"  "E"  "C"  "D"  "E"  "D"  "E"  "E"
```
As you can see, you need to specify the number of elements per combination,
since combn() can compute combinations of any size:
```r
combn(LETTERS[1:5], 4)
#R>      [,1] [,2] [,3] [,4] [,5]
#R> [1,] "A"  "A"  "A"  "A"  "B"
#R> [2,] "B"  "B"  "B"  "C"  "C"
#R> [3,] "C"  "C"  "D"  "D"  "D"
#R> [4,] "D"  "E"  "E"  "E"  "E"
```
Also if you want a data frame, a small extra step is required:
```r
as.data.frame(t(combn(LETTERS[1:5], 2)))
#R>    V1 V2
#R> 1   A  B
#R> 2   A  C
#R> 3   A  D
#R> 4   A  E
#R> 5   B  C
#R> 6   B  D
#R> 7   B  E
#R> 8   C  D
#R> 9   C  E
#R> 10  D  E
```
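To come back to the tournament example, here is a short sketch with four hypothetical teams. It uses the FUN argument of combn() to turn each pair into a label, and, for comparison, shows one way to get the same pairs out of expand.grid() by filtering its rows.

```r
teams <- LETTERS[1:4]   # four hypothetical teams

# FUN is applied to each combination; here each pair is pasted into a label
games <- combn(teams, 2, FUN = paste, collapse = "-")
games
length(games) == choose(length(teams), 2)

# The same pairs via expand.grid(), keeping only rows where Var1 sorts
# before Var2 (this drops self-matches and mirrored duplicates)
grid <- expand.grid(Var1 = teams, Var2 = teams, stringsAsFactors = FALSE)
grid[grid$Var1 < grid$Var2, ]
```

The filtering approach builds all 16 rows before discarding 10 of them, so combn() is likely the more natural choice when only unique combinations are needed.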
Writing outside the margins
If you are always thinking outside the box you may want to learn how to plot
something outside the margins! This is possible using the xpd parameter of
the par() function.
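```r
par(mfrow = c(1, 2))
# Left panel: default clipping, the line stops at the plot region
plot(c(0, 2), c(0, 2))
lines(c(-1, 3), c(1, 1), lwd = 4)
## Right panel: xpd = TRUE lets the line extend into the margins
par(xpd = TRUE)
plot(c(0, 2), c(0, 2))
lines(c(-1, 3), c(1, 1), lwd = 4)
```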
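A common use of xpd is to place a legend in the margin. The sketch below is one possible way to do it, with arbitrary margin and inset values picked for illustration. It also stores the previous graphical parameters so they can be restored afterwards, since par() settings otherwise persist for the following plots.

```r
# Save the current settings while changing them, so they can be restored
op <- par(xpd = TRUE, mar = c(5, 4, 4, 8))   # widen the right margin

plot(1:10, pch = 1)
points(10:1, pch = 2)

# A negative inset pushes the legend past the right edge of the plot region;
# it is only drawn there because xpd = TRUE disables clipping to that region
legend("topright", inset = c(-0.3, 0), legend = c("up", "down"), pch = 1:2)

par(op)   # restore the previous xpd and mar values
```

With xpd = TRUE, drawing is clipped to the figure region; xpd = NA goes one step further and clips to the whole device region.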
See you next post!
Display information relative to the R session used to render this post.
```r
sessionInfo()
#R> R version 4.4.2 (2024-10-31)
#R> Platform: x86_64-pc-linux-gnu
#R> Running under: Ubuntu 22.04.5 LTS
#R>
#R> Matrix products: default
#R> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#R> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#R>
#R> locale:
#R> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
#R> [5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
#R> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#R>
#R> time zone: UTC
#R> tzcode source: system (glibc)
#R>
#R> attached base packages:
#R> [1] stats graphics grDevices utils datasets methods base
#R>
#R> other attached packages:
#R> [1] inSilecoRef_0.1.1
#R>
#R> loaded via a namespace (and not attached):
#R> [1] sass_0.4.9 generics_0.1.3 xml2_1.3.6 blogdown_1.19 stringi_1.8.4
#R> [6] httpcode_0.3.0 digest_0.6.37 magrittr_2.0.3 evaluate_1.0.1 bookdown_0.41
#R> [11] fastmap_1.2.0 plyr_1.8.9 jsonlite_1.8.9 backports_1.5.0 crul_1.5.0
#R> [16] promises_1.3.2 bibtex_0.5.1 jquerylib_0.1.4 cli_3.6.3 shiny_1.10.0
#R> [21] rlang_1.1.4 cachem_1.1.0 yaml_2.3.10 tools_4.4.2 dplyr_1.1.4
#R> [26] httpuv_1.6.15 DT_0.33 rcrossref_1.2.0 curl_6.0.1 vctrs_0.6.5
#R> [31] R6_2.5.1 mime_0.12 lifecycle_1.0.4 stringr_1.5.1 fs_1.6.5
#R> [36] htmlwidgets_1.6.4 miniUI_0.1.1.1 pkgconfig_2.0.3 pillar_1.10.0 bslib_0.8.0
#R> [41] later_1.4.1 glue_1.8.0 Rcpp_1.0.13-1 systemfonts_1.1.0 xfun_0.49
#R> [46] tibble_3.2.1 tidyselect_1.2.1 knitr_1.49 xtable_1.8-4 htmltools_0.5.8.1
#R> [51] svglite_2.1.3 rmarkdown_2.29 compiler_4.4.2
```
Edits
Apr 26, 2022 -- Beautify code source and add session info section.