Trick or Tips 003 {R}

February 11, 2018

  R tips trickortips
  base utils graphics

Kevin Cazelles, David Beauchesne

   

Trick or Tips

Ever stumbled upon a code chunk that made you say: "I should have known this f_ piece of code long ago!" Chances are you have, frustratingly, just like we have, and on multiple occasions too. In comes Trick or Tips!

Trick or Tips is a series of blog posts that each present 5 -- hopefully helpful -- coding tips for a specific programming language. Posts should be short (i.e. no more than 5 lines of code, max 80 characters per line, except when appropriate) and provide tips of many kinds: a function, a way of combining functions, a single argument, a note about the philosophy of the language and its practical consequences, tricks to improve the way you code, good practices, etc.

Note that while some tips might be obvious to careful documentation readers (God bless them for their wisdom), we do our best to present what we find very useful and underrated. By the way, there are undoubtedly similar initiatives on the web (e.g. the "One R Tip a Day" Twitter account). Last, feel free to comment below with tip ideas or a post of code tips of your own, which we will be happy to incorporate into our next post.

Enjoy and get ready to frustratingly appreciate our tips!


The apropos() function

A powerful way to look for a function whose name you can barely remember, directly in R, i.e. without googling!

apropos('Sys')
#R>   [1] ".First.sys"       ".sys.timezone"    "R_system_version" "sys.call"         "sys.calls"       
#R>   [6] "Sys.chmod"        "Sys.Date"         "sys.frame"        "sys.frames"       "sys.function"    
#R>  [11] "Sys.getenv"       "Sys.getlocale"    "Sys.getpid"       "Sys.glob"         "Sys.info"        
#R>  [16] "sys.load.image"   "Sys.localeconv"   "sys.nframe"       "sys.on.exit"      "sys.parent"      
#R>  [21] "sys.parents"      "Sys.readlink"     "sys.save.image"   "Sys.setenv"       "Sys.setFileTime" 
#R>  [26] "Sys.setLanguage"  "Sys.setlocale"    "Sys.sleep"        "sys.source"       "sys.status"      
#R>  [31] "Sys.time"         "Sys.timezone"     "Sys.umask"        "Sys.unsetenv"     "Sys.which"       
#R>  [36] "system"           "system.file"      "system.time"      "system2"

You can also take advantage of regular expressions to narrow down your search:

apropos('^Sys')
#R>   [1] "sys.call"        "sys.calls"       "Sys.chmod"       "Sys.Date"        "sys.frame"      
#R>   [6] "sys.frames"      "sys.function"    "Sys.getenv"      "Sys.getlocale"   "Sys.getpid"     
#R>  [11] "Sys.glob"        "Sys.info"        "sys.load.image"  "Sys.localeconv"  "sys.nframe"     
#R>  [16] "sys.on.exit"     "sys.parent"      "sys.parents"     "Sys.readlink"    "sys.save.image" 
#R>  [21] "Sys.setenv"      "Sys.setFileTime" "Sys.setLanguage" "Sys.setlocale"   "Sys.sleep"      
#R>  [26] "sys.source"      "sys.status"      "Sys.time"        "Sys.timezone"    "Sys.umask"      
#R>  [31] "Sys.unsetenv"    "Sys.which"       "system"          "system.file"     "system.time"    
#R>  [36] "system2"

Or even better:

apropos('^Sys.*time$', ignore.case = FALSE)
#R>  [1] "Sys.time"
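
Two other arguments of apropos() can help narrow things down further. The following is only a minimal sketch: it assumes a default session in which the base and utils packages are attached, and the object names hits_any and hits_fun are ours, chosen for illustration.

```r
# ignore.case is TRUE by default, so a lower-case pattern still finds Sys.time
hits_any <- apropos("^sys\\.time$")

# mode = "function" keeps only functions; "\\." matches a literal dot
hits_fun <- apropos("^read\\.", ignore.case = FALSE, mode = "function")

print(hits_any)
print(head(hits_fun))
```

Escaping the dot matters: in a regular expression an unescaped "." matches any character, which is why '^Sys' above also returned "system".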

The table() function

Oftentimes we wish to extract the frequency of certain elements in a dataset. There is a very useful function that allows us to achieve this quite efficiently: table(). Let’s see how this works:

df <- data.frame(data = sample(1:5, 20, replace = TRUE))
table(df$data)
#R>  
#R>  1 2 3 4 5 
#R>  5 3 6 1 5
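
Since the counts above come from a random sample, here is a small sketch on a fixed vector (x is a made-up example, not data from this post) showing two common follow-ups: turning counts into proportions with prop.table() and ranking them with sort().

```r
x <- c(1, 1, 2, 3, 3, 3)

counts <- table(x)                          # frequency of each value
props  <- prop.table(counts)                # counts divided by their total
sorted <- sort(counts, decreasing = TRUE)   # most frequent value first

most_common <- names(sorted)[1]
print(props)
print(most_common)
```

Note that the names of a table are character strings, so most_common is "3" rather than the number 3.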

You can also get the frequency for a data.frame with multiple columns. For example, if you observed species at a site throughout multiple years and wanted to know the frequency of observations per species per year:

df <- data.frame(
  observations = paste0('species', sample(1:5, 50, replace = TRUE)),
  year = sort(sample(2015:2018, 50, replace = TRUE))
)
table(df)
#R>              year
#R>  observations 2015 2016 2017 2018
#R>      species1    3    1    1    4
#R>      species2    3    2    1    2
#R>      species3    3    3    2    4
#R>      species4    0    4    3    1
#R>      species5    1    2    5    5

You can actually do so for more than two columns.

df$atr1 <- rep(c("val1", "val2"), each = 25)
tb <- table(df)
tb
#R>  , , atr1 = val1
#R>  
#R>              year
#R>  observations 2015 2016 2017 2018
#R>      species1    3    1    1    0
#R>      species2    3    2    0    0
#R>      species3    3    3    0    0
#R>      species4    0    4    0    0
#R>      species5    1    2    2    0
#R>  
#R>  , , atr1 = val2
#R>  
#R>              year
#R>  observations 2015 2016 2017 2018
#R>      species1    0    0    0    4
#R>      species2    0    0    1    2
#R>      species3    0    0    2    4
#R>      species4    0    0    3    1
#R>      species5    0    0    3    5

As you can see, in such a case, you will have to deal with arrays:

tb[, , 1]
#R>              year
#R>  observations 2015 2016 2017 2018
#R>      species1    3    1    1    0
#R>      species2    3    2    0    0
#R>      species3    3    3    0    0
#R>      species4    0    4    0    0
#R>      species5    1    2    2    0
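
Arrays produced by table() can also be indexed by name and collapsed along a dimension. The sketch below uses a tiny hand-made data frame (not the random one above) so that the values are reproducible; margin.table() is one possible way to sum over the third dimension.

```r
df <- data.frame(
  observations = c("species1", "species1", "species2", "species2"),
  year = c(2015, 2016, 2015, 2015),
  atr1 = c("val1", "val2", "val1", "val2")
)
tb <- table(df)

# Index by dimension names rather than by position
one_cell <- tb["species2", "2015", "val2"]

# Sum over atr1 to get back a species-by-year table
by_species_year <- margin.table(tb, c(1, 2))
print(by_species_year)
```

Indexing by name is usually more robust than tb[, , 1], since it does not depend on the order of the levels.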

With further development and by combining table() with paste0() (see fish and tips 001 for an explanation of this useful function!), you can create your desired data.frame:

as.data.frame(table(paste0(df$year, '_', df$observations)))
#R>              Var1 Freq
#R>  1  2015_species1    3
#R>  2  2015_species2    3
#R>  3  2015_species3    3
#R>  4  2015_species5    1
#R>  5  2016_species1    1
#R>  6  2016_species2    2
#R>  7  2016_species3    3
#R>  8  2016_species4    4
#R>  9  2016_species5    2
#R>  10 2017_species1    1
#R>  11 2017_species2    1
#R>  12 2017_species3    2
#R>  13 2017_species4    3
#R>  14 2017_species5    5
#R>  15 2018_species1    4
#R>  16 2018_species2    2
#R>  17 2018_species3    4
#R>  18 2018_species4    1
#R>  19 2018_species5    5
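
As an alternative sketch, as.data.frame() can be applied to the multi-column table directly, which keeps one column per variable instead of a pasted key. The data frame below is a small made-up example; unlike the paste0() approach, combinations that were never observed are kept with a frequency of 0, so we filter them out in a second step.

```r
df <- data.frame(
  observations = c("species1", "species1", "species2", "species3", "species3"),
  year = c(2015, 2016, 2015, 2016, 2016)
)

# One row per species/year combination, including unobserved ones
long <- as.data.frame(table(df), responseName = "Freq")

# Keep only the combinations that were actually observed
long_nonzero <- long[long$Freq > 0, ]
print(long_nonzero)
```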

Everything but 0

This is a well-known trick among developers that may be useful for many beginners. In R, when a numeric is used in a logical test, 0 is treated as FALSE and every other number as TRUE:

0 == FALSE
!0
!1
!7.45
#R>  [1] TRUE
#R>  [1] TRUE
#R>  [1] FALSE
#R>  [1] FALSE

This can actually be very helpful, for instance when we are testing whether or not a vector is empty!

vec0 <- 1:7
vec1 <- vec0[vec0 > 5]
vec2 <- vec0[vec0 > 7]
!(length(vec1))
!(length(vec2))
#R>  [1] FALSE
#R>  [1] TRUE
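
To illustrate how this coercion can be used in practice, here is a short sketch; safe_mean() is a hypothetical helper written for this example, not a base R function. One caveat worth keeping in mind: NA is neither TRUE nor FALSE, so it stays NA when coerced.

```r
# Coercion rule: 0 becomes FALSE, any other number TRUE, NA stays NA
coerced <- as.logical(c(0, 1, -2, 0.5, NA))
print(coerced)

# Hypothetical helper: only compute the mean when the vector is not empty
safe_mean <- function(x) {
  if (length(x)) mean(x) else NA_real_
}

safe_mean(c(1, 2, 3))
safe_mean(numeric(0))
```

Writing if (length(x)) is equivalent to if (length(x) > 0); which one reads better is largely a matter of taste.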

expand.grid() vs. combn()

If you often create empty data frames, you are very likely already familiar with the expand.grid() function:

expand.grid(LETTERS[1:4], LETTERS[5:6])
#R>    Var1 Var2
#R>  1    A    E
#R>  2    B    E
#R>  3    C    E
#R>  4    D    E
#R>  5    A    F
#R>  6    B    F
#R>  7    C    F
#R>  8    D    F

But if you are looking for unique combinations (think about all combinations of games in a tournament of four teams), you may feel that expand.grid() is not what you need:

expand.grid(LETTERS[1:4], LETTERS[1:4])
#R>     Var1 Var2
#R>  1     A    A
#R>  2     B    A
#R>  3     C    A
#R>  4     D    A
#R>  5     A    B
#R>  6     B    B
#R>  7     C    B
#R>  8     D    B
#R>  9     A    C
#R>  10    B    C
#R>  11    C    C
#R>  12    D    C
#R>  13    A    D
#R>  14    B    D
#R>  15    C    D
#R>  16    D    D

In comes combn():

combn(LETTERS[1:5], 2)
#R>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#R>  [1,] "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "D"  
#R>  [2,] "B"  "C"  "D"  "E"  "C"  "D"  "E"  "D"  "E"  "E"

As you can see, you need to specify the number of elements per combination, as combn() can compute combinations of any size:

combn(LETTERS[1:5], 4)
#R>       [,1] [,2] [,3] [,4] [,5]
#R>  [1,] "A"  "A"  "A"  "A"  "B" 
#R>  [2,] "B"  "B"  "B"  "C"  "C" 
#R>  [3,] "C"  "C"  "D"  "D"  "D" 
#R>  [4,] "D"  "E"  "E"  "E"  "E"

Also, if you want a data frame, a small extra step is required:

as.data.frame(t(combn(LETTERS[1:5], 2)))
#R>     V1 V2
#R>  1   A  B
#R>  2   A  C
#R>  3   A  D
#R>  4   A  E
#R>  5   B  C
#R>  6   B  D
#R>  7   B  E
#R>  8   C  D
#R>  9   C  E
#R>  10  D  E
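
Going back to the four-team tournament, the sketch below shows the FUN argument of combn(), which applies a function to each combination, and checks the result against a filtered expand.grid(). The names teams, games, grid and unique_pairs are ours, and filtering with home < away is just one way to drop mirrored and self pairs.

```r
teams <- LETTERS[1:4]

# FUN is applied to each pair; extra arguments (collapse) are passed on to it
games <- combn(teams, 2, FUN = paste, collapse = " vs ")
print(games)

# Same pairs from expand.grid(), keeping each unordered pair only once
grid <- expand.grid(home = teams, away = teams, stringsAsFactors = FALSE)
unique_pairs <- grid[grid$home < grid$away, ]

# choose() gives the expected number of combinations
n_games <- choose(length(teams), 2)
```

Setting stringsAsFactors = FALSE matters here, because the < comparison is not meaningful for unordered factors.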

Writing outside the margins

If you are always thinking outside the box you may want to learn how to plot something outside the margins! This is possible using the xpd parameter of the par() function.

par(mfrow = c(1, 2))
plot(c(0, 2), c(0, 2))
lines(c(-1, 3), c(1, 1), lwd = 4)
# Second panel: xpd = TRUE clips to the figure region instead of the plot region
par(xpd = TRUE)
plot(c(0, 2), c(0, 2))
lines(c(-1, 3), c(1, 1), lwd = 4)
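
A typical use of xpd is placing a legend in the margin. The following is a sketch under a few assumptions of ours: the margin sizes and the inset value are arbitrary, and we draw on a null PDF device (pdf(NULL)) only so that the example runs without creating a file. Saving the value returned by par() lets you restore the previous settings afterwards.

```r
pdf(NULL)  # assumption: null device, nothing is written to disk

# Widen the right margin and allow drawing in the figure region
old <- par(mar = c(5, 4, 4, 8), xpd = TRUE)
xpd_during <- par("xpd")

plot(c(0, 2), c(0, 2))
# A negative inset pushes the legend to the right of the plot region
legend("topright", inset = c(-0.3, 0), legend = c("a", "b"), pch = 1:2)

par(old)   # restore the previous margins and clipping
xpd_after <- par("xpd")
dev.off()
```

Besides TRUE and FALSE, xpd also accepts NA, in which case plotting is clipped to the whole device region.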

See you next post!

Display information relative to the R session used to render this post.
sessionInfo()
#R>  R version 4.4.2 (2024-10-31)
#R>  Platform: x86_64-pc-linux-gnu
#R>  Running under: Ubuntu 22.04.5 LTS
#R>  
#R>  Matrix products: default
#R>  BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#R>  LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#R>  
#R>  locale:
#R>   [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
#R>   [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
#R>   [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#R>  
#R>  time zone: UTC
#R>  tzcode source: system (glibc)
#R>  
#R>  attached base packages:
#R>  [1] stats     graphics  grDevices utils     datasets  methods   base     
#R>  
#R>  other attached packages:
#R>  [1] inSilecoRef_0.1.1
#R>  
#R>  loaded via a namespace (and not attached):
#R>   [1] sass_0.4.9        generics_0.1.3    xml2_1.3.6        blogdown_1.19     stringi_1.8.4    
#R>   [6] httpcode_0.3.0    digest_0.6.37     magrittr_2.0.3    evaluate_1.0.1    bookdown_0.41    
#R>  [11] fastmap_1.2.0     plyr_1.8.9        jsonlite_1.8.9    backports_1.5.0   crul_1.5.0       
#R>  [16] promises_1.3.2    bibtex_0.5.1      jquerylib_0.1.4   cli_3.6.3         shiny_1.10.0     
#R>  [21] rlang_1.1.4       cachem_1.1.0      yaml_2.3.10       tools_4.4.2       dplyr_1.1.4      
#R>  [26] httpuv_1.6.15     DT_0.33           rcrossref_1.2.0   curl_6.0.1        vctrs_0.6.5      
#R>  [31] R6_2.5.1          mime_0.12         lifecycle_1.0.4   stringr_1.5.1     fs_1.6.5         
#R>  [36] htmlwidgets_1.6.4 miniUI_0.1.1.1    pkgconfig_2.0.3   pillar_1.10.0     bslib_0.8.0      
#R>  [41] later_1.4.1       glue_1.8.0        Rcpp_1.0.13-1     systemfonts_1.1.0 xfun_0.49        
#R>  [46] tibble_3.2.1      tidyselect_1.2.1  knitr_1.49        xtable_1.8-4      htmltools_0.5.8.1
#R>  [51] svglite_2.1.3     rmarkdown_2.29    compiler_4.4.2

Edits

Apr 26, 2022 -- Beautify code source and add session info section.